%matplotlib inline
import numpy as np
from numpy import sin, pi, arange
import matplotlib.pyplot as plt
from ipywidgets import interact_manual, interact
Let's create a model of a simulated signal using a DataFrame. Each row represensts a sine wave component. The columns encode the frequency, amplitude, and phase of each sine wave component.
import pandas as pd
model = {}
model['components'] = pd.DataFrame({
'freq': [3, 10, 5 ,15, 35],
'amplitude': [5, 15, 10, 5, 7],
'phase': [1/7, 1/8, 1, 1/2, -1/4],
})
model['noise'] = {'magnitude': 2, 'seed': 12321}
model
We can write a function to generate a signal, taking in the model and sampling frequency as input arguments, like below.
def generate_signal(model, sampling_freq):
time = arange(-1, 1 + 1/sampling_freq, 1/sampling_freq)
signals = []
for i in model['components'].index:
component_i = model['components'].iloc[i]
amplitude = component_i['amplitude']
freq = component_i['freq']
phase = component_i['phase']
signal_i = amplitude * sin(2*pi*freq*time + phase)
signals.append(signal_i)
signals = np.array(signals)
signal = np.sum(signals, axis=0)
# generate noise
np.random.seed(model['noise']['seed'])
noise = model['noise']['magnitude'] * np.random.randn(signals.shape[1])
# add noise to signal
signal = signal + noise
return signal
def plot_signal(signal):
plt.figure(figsize=(20,3))
plt.plot(signal)
plt.show()
signal = generate_signal(model, 500)
plot_signal(signal)
Exercise
Using DataFrame to represent the simulated signal gives us the ability to select a subset of components to generate a filtered version of the signal. We can enhance the function generate_signal
by taking advantage of DataFrame's built-in selection capability. Implement the function generate_signal_with_selector
below.
def generate_signal_with_selector(model, sampling_freq, selector_func):
time = arange(-1, 1 + 1/sampling_freq, 1/sampling_freq)
selected_indices = selector_func(model['components'])
selected_components = model['components'][selected_indices]
## BEGIN SOLUTION
signals = []
for i in selected_components.index:
component_i = selected_components.loc[i]
# create a row of signal and append to signals
# combine the signal
signal = np.sum(signals, axis=0)
# generate noise
# add noise to signal
return signal
## END SOLUTION
# TEST 1
select_greater_than_10 = lambda df: df['freq'] > 10
signal = generate_signal_with_selector(model, 500, select_greater_than_10)
plot_signal(signal)
# The plot should show the wave form composed of sine waves whose frequencies are greater than 10
# TEST 2
select_less_than_20 = lambda df: df['freq'] < 20
signal = generate_signal_with_selector(model, 500, select_less_than_20)
plot_signal(signal)
# The plot should show the wave form composed of sine waves whose frequencies are less than 20
Frequency Domain
In the previous example, we generated a complex signal composed of multiple sine waves oscillating at different frequencies. Typically in data analysis, we only observe the signal and are trying to uncover the generative processes that gave rise to the signal. In this section, we will introduce the frequency domain and how we can identify if there are any frequencies oscillating at a consistent frequency in our signal using the fourier transform. The fourier transform is essentially convolving different frequencies of sine waves with our data.
One important assumption to note is that the fourier transformations assume that your signals are stationary, which means that the generative processes giving rise to the oscillations do not vary over time. See this video for a more in depth discussion on stationarity. In practice, this assumption is rarely true. Often it can be useful to use other techniques such as wavelets to look at time x frequency representations. We will not be covering wavelets here, but see this series of videos for more information.
Discrete Time Fourier Transform
We will gain an intution of how the fourier transform works by building our own discrete time fourier transform. See this accompanying video for a more in depth overview.
The discrete Fourier transform of variable $x$ at frequency $f$ is
$X_f = \sum\limits_{k=0}^{n-1} x_k e^{-i2\pi f(k-1)n^{-1}}$ where $n$ refers to the number of data points in vector $x$, and the capital letter $X_f$ is the fourier coefficient of time series variable $x$ at frequency $f$.
Essentially, we create a bank of sine waves at different frequencies that are linearly spaced. We will compute n-1 sine waves as the zero frequency component will be the mean offset over the entire signal and will simply be zero in our example.
Notice that we are computing complex sine waves using the np.exp
function instead of the np.sin
function. 1j
is how we can specify a complex number in python. We can extract the real components using np.real
or the imaginary using np.imag
. See this video for more information about complex numbers.
Now let's create a bank of n-1 linearly spaced complex sine waves.
import numpy as np
from numpy import exp
sampling_freq = 500
n = sampling_freq * 2 + 1
time = np.arange(0, n, 1) / n
sine_waves = []
for i in range(n):
sine_waves.append(exp(-1j*2*pi*i*time))
sine_waves = np.array(sine_waves)
sine_waves.shape
Let's look at the first 5 waves to see their frequencies. Remember the first basis function is zero frequency component and reflects the mean offset over the entire signal.
f,a = plt.subplots(nrows=5, figsize=(10,5), sharex=True)
for i in range(0,5):
a[i].plot(sine_waves[i,:])
plt.xlabel('Time', fontsize=18)
plt.tight_layout()
We can visualize all of the sine waves simultaneously using a heatmap representation. Each row is a different sine wave, and columns reflect time. The intensity of the value is like if the sine wave was coming towards and away rather than up and down. Notice how it looks like that the second half of the sine waves appear to be a mirror image of the first half. This is because the first half contain the positive frequencies, while the second half contains the negative frequencies. Negative frequencies capture sine waves that travel in reverse order around the complex plane compared to that travel forward. This becomes more relevant with the hilbert transform, but for the purposes of this tutorial we will be ignoring the negative frequencies.
plt.figure(figsize = (15,15))
plt.imshow(np.real(sine_waves))
plt.ylabel('Frequency', fontsize=18)
plt.xlabel('Time', fontsize=18)
Now let's take the dot product of each of the sine wave basis set with our signal
signal = generate_signal(model, 500)
fourier = np.dot(signal, sine_waves)
fourier.shape
Now that we have computed the fourier transform, we might want to examine the results. The fourier transform provides a 3-D representation of the data including frquency, power, and phase. Typically, the phase information is ignored when plotting the results of a fourier analysis. The traditional way to view the information is plot the data as amplitude on the y-axis and frequency on the x-axis. We will extract amplitude by taking the absolute value of the fourier coefficients. Remember that we are only focusing on the positive frequencies (the 1st half of the sine wave basis functions).
Here the x axis simply reflects the index of the frequency. The actual frequency is $N/2 + 1$ as we are only able estimate frequencies that are half the sampling frequency, this is called the Nyquist frequency. Watch this video to hear more about frequencies and zero padding.
plt.figure(figsize=(20,5))
plt.plot(np.abs(fourier[0:int(np.ceil(len(fourier)/2))]))
plt.xlabel('Frequency (index)', fontsize=18)
plt.ylabel('Amplitude', fontsize=18)
Notice that there are 5 different frequencies that have varying amplitudes. Recall that when we simulated this data we added 5 different sine waves with different frequencies and amplitudes.
freq = [3, 10, 5 ,15, 35]
amplitude = [5, 15, 10, 5, 7]
Let's zoom in a bit more to see this more clearly and also add the correct frequency labels in $Hz$.
plt.figure(figsize=(20,5))
plt.plot(np.arange(0, 80)/2, np.abs(fourier)[0:80])
plt.xlabel('Frequency (Hz)', fontsize=18)
plt.ylabel('Amplitude', fontsize=18)
Exercise
The plot above shows five spikes corresponding to the five frequencies used to generate the simulated signal. Now, repeat the code above to compute the fourier (i.e., by taking a doct product) of a "filtered" version of the simulated signal generated using your implementation of generate_signal_with_selector
. Create a similar plot like above except only a selected subset of the spikes should be seen.
select_less_than_20 = lambda df: df['freq'] < 20
a_filtered_signal = generate_signal_with_selector(model, 500, select_less_than_20)
### BEGIN SOLUTION
signal = a_filtered_signal#generate_signal(model, 500)
fourier = np.dot(signal, sine_waves)
fourier.shape
plt.figure(figsize=(20,5))
plt.plot(np.arange(0, 80)/2, np.abs(fourier)[0:80])
plt.xlabel('Frequency (Hz)', fontsize=18)
plt.ylabel('Amplitude', fontsize=18)
### END SOLUTION
# Expected: A plot that looks like above except there is no spike at Frequency = 35
Inverse Fourier Transform
The fourier transform allows you to represent a time series in the frequency domain. We can also invert this transformation and create a time series with only the frequency domain information using the inverse fourier transform.
$x_k = \sum\limits_{k=0}^{n-1} X_k e^{i2\pi f(k-1)n^{-1}}$
Notice that we are computing the dot product between the complex sine wave and the fourier coefficients $X$ instead of the time series data $x$.
sampling_freq = 500
signal = generate_signal(model, 500)
# plot the simulated signal
plt.figure(figsize=(20,5))
plt.plot(signal)
plt.ylabel('Intensity', fontsize=18)
plt.xlabel('Time', fontsize=18)
n = sampling_freq * 2 + 1
time = np.arange(0, n, 1) / n
sine_waves = []
for i in range(n):
sine_waves.append(exp(-1j*2*pi*i*time))
sine_waves = np.array(sine_waves)
# fourier transform
fourier = np.dot(signal, sine_waves)
# inverse fourier transform
inv_fourier = np.dot(fourier, sine_waves)
plt.figure(figsize=(20,5))
plt.plot(inv_fourier)
plt.ylabel('Intensity', fontsize=18)
plt.xlabel('Time', fontsize=18)
Fast Fourier Transform
The discrete time fourier transform is useful to understand the relationship between the time and frequency domains. However, in practice this method is rarely used as there are more faster and efficient methods to perform this computation. One popular algorithm is called the fast fourier transform (FFT). This function is also in numpy np.fft.fft
.
from numpy.fft import fft, ifft
# generate a signal
signal = generate_signal(model, 500)
# compute fast fourier transformation
fourier_fft = fft(signal)
plt.figure(figsize=(20,5))
plt.plot((np.arange(0,80)/2), np.abs(fourier_fft[0:80]))
plt.ylabel('Amplitude', fontsize=18)
plt.xlabel('Frequency (Hz)', fontsize=18)
Let's generate a new signal using the same model by selecting only those whose freq is less or equal to 10. Using fft, the result now shows three spies
We can also use the ifft
to perform an inverse fourier transform
plt.figure(figsize=(20,5))
plt.plot(ifft(fourier))
plt.ylabel('Intensity', fontsize=18)
plt.xlabel('Time', fontsize=18)
select_less_than_20 = lambda df: df['freq'] < 20
a_filtered_signal = generate_signal_with_selector(model, 500, select_less_than_20)
# compute and plot the fourier of `a_filtered_signal`
# plot the ifft of the fourier of `a_filtered_signal`
Convolution Theorem
Convolution in the time domain is the same multiplication in the frequency domain. This means that time domain convolution computations can be performed much more efficiently in the frequency domain via simple multiplication. (The opposite is also true that multiplication in the time domain is the same as convolution in the frequency domain. Watch this video for an overview of the convolution theorem and convolution in the frequency domain.
Filters
Filters can be classified as finite impulse response (FIR) or infinite impulse response (IIR). These terms describe how a filter responds to a single input impulse. FIR filters have a response that ends at a disrete point in time, while IIR filters have a response that continues indefinitely.
Filters are constructed in the frequency domain and several properties that need to be considers.
- ripple in the pass-band
- attenuation in the stop-band
- steepness of roll-off
- filter order (i.e., length for FIR filters)
- time-domain ringing
In general, there is a frequency by time tradeoff. The sharper something is in frequency, the broader it is in time, and vice versa.
Here we will use IIR butterworth filters as an example.
High Pass
High pass filters only allow high frequency signals to remain, effectively removing any low frequency information.
Here we will construct a high pass butterworth filter and plot it in frequency space.
Note: this example requires using scipy 1.2.1+. To upgrade run !pip install scipy --upgrae
and then reset your kernel.
from scipy.signal import butter, filtfilt, freqz
filter_order = 3
frequency_cutoff = 25
sampling_frequency = 500
# Create the filter
b, a = butter(filter_order, frequency_cutoff, btype='high', output='ba', fs=sampling_frequency)
def rad_sample_to_hz(x, fs):
return (x*fs)/(2*np.pi)
def plot_filter(b, a, fs):
plt.figure(figsize=(20,5))
w, h = freqz(b, a, worN=512*2, whole=False)
plt.plot(rad_sample_to_hz(w, fs), abs(h), linewidth=3)
plt.ylabel('Gain', fontsize=18)
plt.xlabel('Frequency', fontsize=18)
plot_filter(b, a, sampling_frequency)
Notice how the gain scales from [0,1]? Filters can be multiplied by the FFT of a signal to apply the filter in the frequency domain. When the resulting signal is transformed back in the time domain using the inverse FFT, the new signal will be filtered. This can be much faster than applying filters in the time domain.
The filter_order parameter adjusts the sharpness of the cutoff in the frequency domain. Try playing with different values to see how it changes the filter plot.
@interact_manual(frequency_cutoff = 50, sampling_frequency = 500)
def test_butter_filter(frequency_cutoff, sampling_frequency):
b, a = butter(filter_order, frequency_cutoff, btype='high', output='ba', fs=sampling_frequency)
plot_filter(b, a, sampling_frequency)
What does the filter look like in the temporal domain? Let's take the inverse FFT and plot it to see what it looks like as a kernel in the temporal domain. Notice how changing the filter order adds more ripples in the time domain.
from scipy.signal import sosfreqz
filter_order = 8
sos = butter(filter_order, frequency_cutoff, btype='high', output='sos', fs=sampling_frequency)
w_sos, h_sos = sosfreqz(sos)
plt.plot(ifft(h_sos)[0:100], linewidth=3)
plt.ylabel('Amplitude', fontsize=18)
plt.xlabel('Time', fontsize=18)
Now let's apply the filter to our data. We will be applying the filter to the signal in the time domain using the filtfilt
function. This is a good default option, even though there are several other functions to apply the filter. filtfilt
applies the filter forward and then in reverse ensuring that there is zero-phase distortion.
filter_order = 8
frequency_cutoff = 25
sampling_frequency = 500
b, a = butter(filter_order, frequency_cutoff, btype='high', output='ba', fs=sampling_frequency)
signal = generate_signal(model, 500)
filtered = filtfilt(b, a, signal)
plt.figure(figsize=(20,5))
plt.plot(signal[0:200], linewidth=1)
plt.plot(filtered[0:200], linewidth=2)
plt.ylabel('Intensity', fontsize=18)
plt.xlabel('Time', fontsize=18)
plt.legend(['Original','Filtered'], fontsize=18)
selected = generate_signal_with_selector(model, 500, lambda df: df['freq'] > 25)
plt.figure(figsize=(20,5))
## BEGIN SOLUTION
## END SOLUTION
plt.ylabel('Intensity', fontsize=18)
plt.xlabel('Time', fontsize=18)
plt.legend(['Selected','Filtered'], fontsize=18)
from scipy.signal import butter, filtfilt
filter_order = 2
frequency_cutoff = 10
sampling_frequency = 500
signal = generate_signal(model, 500)
# Create the filter
b, a = butter(filter_order, frequency_cutoff, btype='low', output='ba', fs=sampling_frequency)
# Apply the filter
filtered = filtfilt(b, a, signal)
plt.figure(figsize=(20,5))
plt.plot(signal[0:600], linewidth=2)
plt.plot(filtered[0:600], linewidth=4)
plt.ylabel('Intensity', fontsize=18)
plt.xlabel('Time', fontsize=18)
plt.legend(['Original','Filtered'], fontsize=18)
What does the filter look like?
filter_order = 2
frequency_cutoff = 10
sampling_frequency = 500
# Create the filter
b, a = butter(filter_order, frequency_cutoff, btype='low', output='ba', fs=sampling_frequency)
plot_filter(b, a, sampling_frequency)
selected = generate_signal_with_selector(model, 500, lambda df: df['freq'] <= 10)
# TODO: create a plot to compare two signals
filter_order = 2
lowcut = 7
highcut = 13
# Create the filter
b, a = butter(filter_order, [lowcut, highcut], btype='bandpass', output='ba', fs=sampling_frequency)
# Apply the filter
filtered = filtfilt(b, a, signal)
plt.figure(figsize=(20,5))
plt.plot(signal, linewidth=2)
plt.plot(filtered, linewidth=4)
plt.ylabel('Intensity', fontsize=18)
plt.xlabel('Time', fontsize=18)
plt.legend(['Original','Filtered'], fontsize=18)
# TODO: create a plot to compare 'filtered' to a simulated filtered signal
# obtained by `generate_signal_with_selected`
filter_order = 2
lowcut = 8
highcut = 12
# Create the filter
b,a = butter(filter_order, [lowcut, highcut], btype='bandstop', output='ba', fs=sampling_frequency)
# Plot the filter
plot_filter(b, a, sampling_frequency)
# Apply the filter
filtered = filtfilt(b, a, signal)
plt.figure(figsize=(20,5))
plt.plot(signal[0:200], linewidth=1)
plt.plot(filtered[0:200], linewidth=2)
plt.ylabel('Intensity', fontsize=18)
plt.xlabel('Time', fontsize=18)
plt.legend(['Original','Filtered'], fontsize=18)
# TODO: create a plot to compare 'filtered' to a simulated filtered signal
# obtained by `generate_signal_with_selected`
Exercise
Apply what you've learned from this lab to analyze a "bad" signal to uncover the good signal hidden within. Let's suppose we somehow know that the bad signal is the result of a good signal being affected by a drift and a strong high frequenct noises.
Let's load the two signals and plot them.
good = np.load('good_signal.npy')
bad = np.load('bad_signal.npy')
plt.figure(figsize=(20,5))
plt.plot(bad, linewidth=1)
plt.plot(good, linewidth=2)
Now, write code to recover the good signal and plot the result. Discuss what the properties of the high frequency noise and drift might be.
### BEGIN SOLUTION
### END SOLUTION