enDAQ Custom Analysis

Introduction

This notebook and accompanying webinar was developed and released by the enDAQ team. This is the fifth “chapter” of our series on Python for Mechanical Engineers:

  1. Get Started with Python

  2. Introduction to Numpy & Pandas for Data Analysis

  3. Introduction to Plotly for Plotting Data

  4. Introduction of the enDAQ Library

    • There are lots of examples in this!

  5. More Examples! (Today’s Webinar)

    • Frequency Analysis (FFTs and PSDs)

    • Simple Shock Response Spectrums

    • Peak Analysis

    • Preview endaq.batch

To sign up for future webinars and watch previous ones, visit our webinars page.

Installation

Available on PyPi via: > pip install endaq

For the most recent features that are still under development, you can also use pip to install endaq directly from GitHub: > pip install git+https://github.com/MideTechnology/endaq-python.git@development

[ ]:
!pip install -q endaq
!pip install -q kaleido #this is for rendering images with plotly
exit() #needed in Colab because they pre-load some libraries, wouldn't be neccessary if running locally
     |████████████████████████████████| 71 kB 3.0 MB/s
     |████████████████████████████████| 11.3 MB 13.0 MB/s
     |████████████████████████████████| 62 kB 749 kB/s
     |████████████████████████████████| 38.1 MB 1.3 MB/s
     |████████████████████████████████| 93 kB 1.0 MB/s
     |████████████████████████████████| 83 kB 1.0 MB/s
     |████████████████████████████████| 25.3 MB 51.8 MB/s
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires pandas~=1.1.0; python_version >= "3.0", but you have pandas 1.3.4 which is incompatible.
google-colab 1.0.0 requires requests~=2.23.0, but you have requests 2.26.0 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
albumentations 0.1.12 requires imgaug<0.2.7,>=0.2.5, but you have imgaug 0.2.9 which is incompatible.
     |████████████████████████████████| 79.9 MB 1.2 MB/s

[ ]:
import endaq

endaq.plot.utilities.set_theme(theme='endaq')

import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio; pio.renderers.default = "iframe"
import pandas as pd
import numpy as np
import scipy

PSDs & FFTs

Simple Sine Wave

[ ]:
time = np.linspace(0,2,200,endpoint=False)

sine_waves = pd.DataFrame(index=time)
sine_waves['0.8g @ 2 Hz'] = 0.8*np.sin(2*np.pi*2 * time)
sine_waves['1g @ 3 Hz'] = np.sin(2*np.pi*3 * time)
sine_waves['0.6g @ 5 Hz'] = 0.6*np.sin(2*np.pi*5 * time)
sine_waves['0.5g @ 4 & 6 Hz'] = 0.5*np.sin(2*np.pi*4 * time) + 0.5*np.sin(2*np.pi*6 * time)
sine_waves['0.3g @ 7 Hz'] = 0.3*np.sin(2*np.pi*7 * time)

sine_waves
0.8g @ 2 Hz 1g @ 3 Hz 0.6g @ 5 Hz 0.5g @ 4 & 6 Hz 0.3g @ 7 Hz
0.00 0.000000 0.000000 0.000000 0.000000 0.000000
0.01 0.100267 0.187381 0.185410 0.308407 0.127734
0.02 0.198952 0.368125 0.352671 0.583150 0.231154
0.03 0.294500 0.535827 0.485410 0.794687 0.290575
0.04 0.385403 0.684547 0.570634 0.921177 0.294686
... ... ... ... ... ...
1.95 -0.470228 -0.809017 -0.600000 -0.951057 -0.242705
1.96 -0.385403 -0.684547 -0.570634 -0.921177 -0.294686
1.97 -0.294500 -0.535827 -0.485410 -0.794687 -0.290575
1.98 -0.198952 -0.368125 -0.352671 -0.583150 -0.231154
1.99 -0.100267 -0.187381 -0.185410 -0.308407 -0.127734

200 rows × 5 columns

[ ]:
fig = px.line(sine_waves)
fig.update_layout(
    title_text='Comparison of Fabricated Sine Waves',
    yaxis_title_text='Acceleration (g)',
    xaxis_title_text='Time (s)',
    legend_title_text=''
)

Now to compute the PSD on this using endaq.calc.psd.welch(), see docs

[ ]:
psd = endaq.calc.psd.welch(sine_waves, bin_width=0.5)
[ ]:
fig = px.line(psd)
fig.update_layout(
    title_text='Comparison of the PSD of Two Fabricated Sine Waves',
    yaxis_title_text='Acceleration (g^2/Hz)',
    xaxis_title_text='Frequency (Hz)',
    xaxis_type='log',
    legend_title_text=''
)

PSDs are the more appropriate way to analyze vibration data for a number of reasons (see blog) but we typically see people want to see the FFT because it’s units are easier to intuitively understand.

In my experience the best way to do this is to compute a PSD using our function with a few modifications: - Use scaling as parseval which means it is g^2 instead of g^2/Hz - Use a boxcar window so in effect there is no windowing - Define a very fine bin width - Set the overlap between FFTs to 0 (not necessary if the bin width is small enough) - Scale from g^2 as RMS to g-peak via **2 * (2**0.5)

[ ]:
fft = endaq.calc.psd.welch(
    sine_waves,
    scaling='parseval',
    window='boxcar',
    noverlap=0,
    bin_width=0.1,
)
fft = fft**0.5 * (2**0.5) #scale from g^2 as RMS to g-peak

fig = px.line(fft)
fig.update_layout(
    title_text='Comparison of the FFT (from PSD) of Two Fabricated Sine Waves',
    yaxis_title_text='Acceleration (g)',
    xaxis_title_text='Frequency (Hz)',
    xaxis_type='log',
    legend_title_text='',
    template='endaq_light'
)
/usr/local/lib/python3.7/dist-packages/scipy/signal/spectral.py:1966: UserWarning:

nperseg = 1000 is greater than input length  = 200, using nperseg = 200

Real Sine Wave

Let’s load a IDE file with get_doc docs.

[ ]:
doc = endaq.ide.get_doc('https://info.endaq.com/hubfs/100Hz_shake_cal.ide')
endaq.ide.get_channel_table(doc)
  channel name type units start end duration samples rate
0 8.0 X (100g) Acceleration g 00:00.0218 00:25.0532 00:25.0314 126575 5000.07 Hz
1 8.1 Y (100g) Acceleration g 00:00.0218 00:25.0532 00:25.0314 126575 5000.07 Hz
2 8.2 Z (100g) Acceleration g 00:00.0218 00:25.0532 00:25.0314 126575 5000.07 Hz
3 80.0 X (40g) Acceleration g 00:00.0218 00:25.0547 00:25.0328 100934 3985.00 Hz
4 80.1 Y (40g) Acceleration g 00:00.0218 00:25.0547 00:25.0328 100934 3985.00 Hz
5 80.2 Z (40g) Acceleration g 00:00.0218 00:25.0547 00:25.0328 100934 3985.00 Hz
6 36.0 Pressure/Temperature:00 Pressure Pa 00:00.0217 00:25.0702 00:25.0485 27 1.06 Hz
7 36.1 Pressure/Temperature:01 Temperature °C 00:00.0217 00:25.0702 00:25.0485 27 1.06 Hz
8 70.0 X Quaternion q 00:00.0284 00:24.0682 00:24.0397 2424 99.35 Hz
9 70.1 Y Quaternion q 00:00.0284 00:24.0682 00:24.0397 2424 99.35 Hz
10 70.2 Z Quaternion q 00:00.0284 00:24.0682 00:24.0397 2424 99.35 Hz
11 70.3 W Quaternion q 00:00.0284 00:24.0682 00:24.0397 2424 99.35 Hz
12 59.0 Control Pad Pressure Pressure Pa 00:00.0251 00:25.0399 00:25.0148 252 10.02 Hz
13 59.1 Control Pad Temperature Temperature °C 00:00.0251 00:25.0399 00:25.0148 252 10.02 Hz
14 59.2 Relative Humidity Relative Humidity RH 00:00.0251 00:25.0399 00:25.0148 252 10.02 Hz
15 76.0 Lux Light Ill 00:00.0000 00:25.0278 00:25.0278 100 3.96 Hz
16 76.1 UV Light Index 00:00.0000 00:25.0278 00:25.0278 100 3.96 Hz

Now get the actual data out of one channel with to_pandas() docs.

Then generate a plot with rolling_min_max_envelope() (docs) that will instead of plotting all data points make a shaded plot which will look identical to plotting all points but be more responsive and faster (not entirely necessary on this dataset, but becomes so with larger ones).

[ ]:
accel = endaq.ide.to_pandas(doc.channels[80], time_mode='seconds')
accel = accel-accel.median() #remove DC offset
endaq.plot.rolling_min_max_envelope(
    accel,
    plot_as_bars=True,
    desired_num_points=1000,
    opacity=0.7
)

Here we’ll plot the raw data around the peak event with around_peak() docs.

[ ]:
endaq.plot.around_peak(accel, num=500)
[ ]:
fft = endaq.calc.psd.welch(
    accel[4.5:9.5], #just do it in time of first dwell
    scaling='parseval',
    window='boxcar',
    noverlap=0,
    bin_width=0.5,
)
fft = fft**0.5 * (2**0.5) #scale from g^2 as RMS to g-peak

fig = px.line(fft[:500])
fig.update_layout(
    title_text='FFT (from PSD) of Real World 10.3g Sine Wave',
    yaxis_title_text='Acceleration (g)',
    xaxis_title_text='Frequency (Hz)',
    legend_title_text='',
)

Comparing FFT Options

FFT Option Overview

There are a lot of different ways to compute a FFT in Python. I’ve been using Welch’s method so far but I want to compare that to some other more “direct” methods. So let’s compare them! First let’s conveniently wrap our manipulation of Welch’s method PSD into a FFT inside a function.

[ ]:
def welch_fft(df, bin_width=0.5):
  fft = endaq.calc.psd.welch(df, bin_width=bin_width, scaling='parseval', window='boxcar', noverlap=0)
  return fft**0.5 * (2**0.5) #scale from g^2 as RMS to g-peak

Now let’s wrap something around Numpy’s FFT functions for a real discrete Fourier transform.

Notice here we actually have phase information! So we’ll return that too.

[ ]:
def numpy_fft(df):
  """
  Using Numpy's rfft functions compute a discrete Fourier Transform
  """
  freq = np.fft.rfftfreq(len(df), d=endaq.calc.utils.sample_spacing(df))
  df_fft = pd.DataFrame(
    np.fft.rfft(df.to_numpy(), axis=0),
    index=pd.Series(freq, name="frequency (Hz)"),
    columns=df.columns
  )

  df_mag = df_fft.apply(np.abs, raw=True) / len(df_fft)
  df_phase = df_fft.apply(np.angle, raw=True)

  return df_mag, df_phase

Now we will use the FFTW algorithm which is available in the pyFFTW library under a GPL license (which makes it potentially difficult for us to use because we use the more premissive MIT license).

First let’s download it.

[ ]:
!pip install -q pyfftw
     |████████████████████████████████| 2.6 MB 5.3 MB/s

Now let’s use it in a function which allows for a drop-in replacement to the Numpy code. This algorithm is generally regarded as the fatest for computing discrete Fourier transforms - so we’ll put it to the test!

[ ]:
import pyfftw

def fftw_fft(df):
  """
  Using the FFTW algorithm, compute a discrete Fourier Transform
  """
  freq = pyfftw.interfaces.numpy_fft.rfftfreq(len(df), d=endaq.calc.utils.sample_spacing(df))
  df_fft = pd.DataFrame(
    pyfftw.interfaces.numpy_fft.rfft(df.to_numpy(), axis=0),
    index=pd.Series(freq, name="frequency (Hz)"),
    columns=df.columns
  )

  df_mag = df_fft.apply(np.abs, raw=True) / len(df_fft)
  df_phase = df_fft.apply(np.angle, raw=True)

  return df_mag, df_phase

Now let’s see a FFT result with this library. Note though this will be relatively large to plot…

[ ]:
fft, phase = fftw_fft(accel[4.5:9.5])

fig = px.line(fft[80:120])
fig.update_layout(
    title_text='FFT using FFTW of Real World 10.3g Sine Wave',
    yaxis_title_text='Acceleration (g)',
    xaxis_title_text='Frequency (Hz)',
    legend_title_text='',
)

Remember one of the benefits of the DFT is to get phase. Although not particularly useful for this dataset let’s plot it, notice the shift for the Z axis at the drive frequency.

[ ]:
fig = px.line(phase[80:120])
fig.update_layout(
    title_text='Phase using FFTW of Real World 10.3g Sine Wave',
    yaxis_title_text='Phase Angle (radians)',
    xaxis_title_text='Frequency (Hz)',
    legend_title_text='',
)

FFT Option Comparison

Now let’s do the fun part to compare the three approaches! First let’s make a sine wave with a bit over 1M points.

[ ]:
time = np.linspace(0,200,2**20,endpoint=False)

sine_waves = pd.DataFrame(index=time)
sine_waves['10g @ 100 Hz'] = 10*np.sin(2*np.pi*100 * time)
sine_waves['8g @ 99 Hz'] = 8*np.sin(2*np.pi*99 * time)
sine_waves['6g @ 100.25 Hz'] = 6*np.sin(2*np.pi*100.25 * time)
[ ]:
fig = px.line(sine_waves[:0.01])
fig.update_layout(
    title_text='Comparison of Fabricated Sine Waves',
    yaxis_title_text='Acceleration (g)',
    xaxis_title_text='Time (s)',
    legend_title_text=''
)
[ ]:
from time import process_time
import timeit
pyfftw.forget_wisdom() #FFTW will be faster on subsequent runs

lengths = [2**14, 16411, 2**17, 131101, 2**20, 1000003]

ffts = pd.DataFrame()
times = pd.DataFrame()
def melt_fft(fft,name,L):
  fft =  fft.reset_index().melt(id_vars='frequency (Hz)')
  fft['Algo'] = name
  fft['Length'] = L
  return fft

def time_fft(func):
    return np.array(timeit.repeat(func, number=1, repeat=10)).mean()

def time_pyfftw(x, threads=1):
    pyfftw.forget_wisdom()
    rfft = pyfftw.builders.rfft(x, threads=threads, auto_align_input=True)
    rfft()
    return np.array(timeit.repeat(lambda: rfft(), number=1, repeat=10)).mean()

for L in lengths:
  x = sine_waves.iloc[:L, 0].to_numpy()

  t1 = process_time()
  wfft_bin = welch_fft(sine_waves.iloc[:L], bin_width=0.5)

  t2 = process_time()
  wfft = welch_fft(sine_waves.iloc[:L], bin_width=1/(L*endaq.calc.utils.sample_spacing(sine_waves)))

  t3 = process_time()
  nfft, phase = numpy_fft(sine_waves.iloc[:L])

  t4 = process_time()
  fftw, phase = fftw_fft(sine_waves.iloc[:L])

  t5 = process_time()

  times_t = pd.DataFrame(
      {'Welch w/ 0.5 Hz Bin':t2-t1,
       'Welch': t3-t2,
       'Numpy':t4-t3,
       'FFTW':t5-t4,
       'Numpy - No OH': time_fft(lambda: np.fft.rfft(x)),
       'FFTW - No OH': time_pyfftw(x, threads=1)
      },
      index=pd.Series([L],name='Length')
  )
  times = pd.concat([times,times_t.reset_index().melt(id_vars='Length')])

  wfft_bin = melt_fft(wfft_bin.copy(),'Welch w/ 0.5 Hz Bin',L)
  wfft = melt_fft(wfft.copy(),'Welch',L)
  nfft = melt_fft(nfft.copy(),'Numpy',L)
  fftw = melt_fft(fftw.copy(),'FFTW',L)

  ffts = pd.concat([ffts,wfft_bin,wfft,nfft,fftw])
[ ]:
times['str_len'] = times['Length'].astype('str')
fig = px.bar(
    times,
    x='str_len',
    y='value',
    color='variable',
    log_y=True,
    barmode='group',
    labels={'value':'Computation Time (s)','str_len':'Array Length','variable':''}
)
fig.show()
[ ]:
fig = px.scatter(
    times,
    x='str_len',
    y='value',
    color='variable',
    log_y=True,
    log_x=True,
    labels={'value':'Computation Time (s)','str_len':'Array Length','variable':''}
)
fig.show()

So what does this mean!? FFTW is the fastest as expected, but only if we first structure the data in a more efficient way. But typically you will not have the data structured in this “optimal” way for FFTW which means the time it takes to restucture it is real.

Long story short for this audience, using Welch’s method is fastest!

[ ]:
fig = px.line(
    ffts[(ffts['frequency (Hz)']>95) & (ffts['frequency (Hz)']<105)],
    x='frequency (Hz)',
    y='value',
    color='Algo',
    facet_col='Length',
    facet_row='variable',
    title='FFTs: Acceleration (g) vs Frequency (Hz)',
    labels={'value':'','frequency (Hz)':''}

)
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
fig.update_layout(width=1200,height=1000)
fig.show()