Chapter 2: Signal Processing#

by Julien Emile-Geay, University of Southern California

Preamble#

Often timeseries have to be treated prior to analysis. This is a delicate game: altering data is a cardinal sin, so one needs to be very aware of one’s doing prior to engaging in this activity. Here we show a few examples where such treatment may be warranted.

Goals#

  • Become familiar with the various filtering methods and some of their arguments

  • Become familiar with the various detrending methods and some of their arguments

Reading Time: 20 min

Keywords#

Filtering, Detrending.

Pre-requisites#

Some timeseries analysis, ideally

Relevant Packages#

pyleoclim, pandas, matplotlib, numpy

Let us first load necessary packages:

import matplotlib.pyplot as plt
import pyleoclim as pyleo
import numpy as np
import pandas as pd

Filtering#

The goal of filtering is to enhance part of a signal, or suppress one or more parts. Pyleoclim incorporates a vast array of filters, some inherited from the SciPy library. To filter a Series object, simply apply the .filter() method. One innovation of our package is that is makes it easy to specify either a cutoff scale (period) or a cutoff frequency.

Let’s use data fron a Hawai’i seismogram and isolate certain oscillations.

df = pd.read_csv('../data/hawaii.e.dat',sep='\s+',names=['time','ground_motion'])
gm = pyleo.Series(time = df['time']/1000,time_unit='s',
                  value = df['ground_motion']*1000, value_name = 'amplitude (m)',
                  label = 'Hawaii.e',
                  auto_time_params=False,
                  verbose=False)
gm.plot()
<>:1: SyntaxWarning: invalid escape sequence '\s'
<>:1: SyntaxWarning: invalid escape sequence '\s'
/var/folders/bf/_x19bm694857h_hrw44j0p7w0000gn/T/ipykernel_42628/265003603.py:1: SyntaxWarning: invalid escape sequence '\s'
  df = pd.read_csv('../data/hawaii.e.dat',sep='\s+',names=['time','ground_motion'])
(<Figure size 1000x400 with 1 Axes>,
 <Axes: xlabel='time [s]', ylabel='amplitude (m)'>)
../_images/24739b8bcae6ca1c8f9532cb32eb85e6bec7461516f4440a2bffa3e4636f5bcc.png

Filter Types#

Lowpass filter#

Let’s use a Butterworth filter with a cutoff_scale of 5 to remove all frequencies with a period of less than 5s. This is known as a low-pass filter.

gm_low = gm.filter(cutoff_scale=5)
fig, ax = gm.plot()
gm_low.plot(ax=ax,label=gm.label + ' (5s lowpass)')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 1
----> 1 gm_low = gm.filter(cutoff_scale=5)
      2 fig, ax = gm.plot()
      3 gm_low.plot(ax=ax,label=gm.label + ' (5s lowpass)')

File ~/Documents/GitHub/Pyleoclim_util/pyleoclim/core/series.py:1485, in Series.filter(self, cutoff_freq, cutoff_scale, method, keep_log, **kwargs)
   1362 ''' Filtering methods for Series objects using four possible methods:
   1363     - `Butterworth <https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.butter.html>`_
   1364     - `Lanczos <http://scitools.org.uk/iris/docs/v1.2/examples/graphics/SOI_filtering.html>`_
   (...)
   1482 
   1483 '''
   1484 if not self.is_evenly_spaced():
-> 1485     raise ValueError('This  method assumes evenly-spaced timeseries, while the input is not. Use the ".interp()", ".bin()" or ".gkernel()" methods prior to ".filter()".')
   1487 new = self.copy()
   1489 mu = np.mean(self.value) # extract the mean

ValueError: This  method assumes evenly-spaced timeseries, while the input is not. Use the ".interp()", ".bin()" or ".gkernel()" methods prior to ".filter()".

Ooops. We got an error message. It must be that the time axis is uneven. Let’s use the resolution() function to visualize what’s happening.

gm.resolution().plot()
(<Figure size 1000x400 with 1 Axes>,
 <Axes: xlabel='time [s]', ylabel='resolution [s]'>)
../_images/ab5b5a848a1a721b1a29023e64dce2e0eaeab60d36c52d6982a9a8326d3873a3.png

It appears that the resolution jumps at the 100th second, for some reason. Let’s interpolate to get an even resolution of 1ms, and check that nothing crazy is happening:

ts = gm.interp(step=0.001)
fig, ax = gm.plot(label='original',alpha=0.4)
ts.plot(ax=ax, label='interpolated',alpha=0.4)
<Axes: xlabel='time [s]', ylabel='amplitude (m)'>
../_images/5648a4d664fade12be1be628f583de378361d93ea02f5f894122aea911ef8c23.png

The two series are indistinguishable, so we are safe to proceed.

lp = ts.filter(cutoff_scale=2)
fig, ax = ts.plot(label='original')
lp.plot(ax=ax, label='low-pass')
<Axes: xlabel='time [s]', ylabel='amplitude (m)'>
../_images/fa34005586b1ceb16da07cc6128fbee218a3a858bf0840abf459e2fd5673d961.png

One can see edge effects at the end of the series. This is because the series starts and ends with different values. Tapering would mitigate that.

lpa = ts.filter(cutoff_scale=2, pad='ARIMA')  # DOES NOT WORK
fig, ax = ts.plot(label='original')
lpa.plot(ax=ax, label='low-pass, ARIMA padding')
<Axes: xlabel='time [s]', ylabel='amplitude (m)'>
../_images/7dc94819615d7a48eeaa87d1fe832e2819a17697ee1c023ec0b5d4014f144756.png

High-pass filter#

The opposite operation is a high-pass filter, which keeps only frequencies higher than a given cutoff. To generate a high-pass filter, you can simply subtract the low-pass filtered version from the original:

hp = lp.copy() # copy the series into a new object
hp.value = ts.value - lp.value
fig, ax = ts.plot(label='original')
hp.plot(ax=ax, label='high-pass', alpha=0.8)
<Axes: xlabel='time [s]', ylabel='amplitude (m)'>
../_images/c58a2c636c54d275224657a4808e9b8bbae0726c99cff109c872da1ae38305fa.png

Band-pass filter#

If you want to isolate variability in a band of frequencies (say 1-2s), simply pass a 2-list of cutoff scales:

bp = ts.filter(cutoff_scale=[1,2])
fig, ax = ts.plot(label='original')
bp.plot(ax=ax, label='1-2s band-pass', alpha=0.8)
<Axes: xlabel='time [s]', ylabel='amplitude (m)'>
../_images/b8ab4591900ffa052525dd94a6d848e0c9a0af72d163fa9082984be05800264d.png

Notch filter#

Conversely, if you wanted to remove all variability between 1-2s (a notch in frequency space), you would subtract the bandpass-filtered version.

notch = bp.copy() # copy the series into a new object
notch.value = ts.value - bp.value
fig, ax = ts.plot(label='original')
hp.plot(ax=ax, label='1-2s notch', alpha=0.8)
<Axes: xlabel='time [s]', ylabel='amplitude (m)'>
../_images/fea044f155d95d01e4ca048dad91a2dbff548e4102d7306594b3e83366e14c10.png

Window Carpentry#

In the above we only used the default method (Butterworth) to generate the filter coefficient. SciPy’s Finite Impulse Response function makes it easy to apply many different types of windows,

# from scipy import signal
# windows = ['boxcar','triang','blackman','lanczos']
# fig, axs = plt.subplots(4,2,figsize=(10,8))
# Nx = 31

# for i, win in enumerate(windows):
#     win_dow = signal.get_window(win,Nx)
#     axs[i,0].plot(win_dow, label = win + ' window')
#     filt = ts.filter(cutoff_scale=2, method='firwin',window=win)
#     ts.plot(ax=axs[i,1],label='original',xlabel='')
#     filt.plot(ax=axs[i,1],label='2s lowpass, '+win,xlabel='')

Detrending#

gmst = pyleo.utils.load_dataset('HadCRUT5')
gmst.plot()
(<Figure size 1000x400 with 1 Axes>,
 <Axes: xlabel='Time [year C.E.]', ylabel='GMST [$^{\\circ}$C]'>)
../_images/31c3b620e247d59d232944883f9ff33b18c6e3c0b8d055ca0696040e9b3c2476.png

The “trend” here is complex, and likely nonlinear. If we wanted to describe it linearly, one would have to distinguish 4 periods: cooling in the late 19th century, warming from 1900 to 1940, cooling from 1940 to 1970, followed by steady warming.

Built-in detrending methods in Pyleoclim#

Let’s apply 3 methods using their default parameters: linear detrending, empirical mode decomposition, and a Savitzky-Golay filter, which is meant to isolate the lowest-order trend:

methods  = ['linear', 'emd', 'savitzky-golay']
fig, axs = plt.subplots(3,1,figsize=(12,8))
axs= axs.flatten()
for i, mthd in enumerate(methods):
    gmst_dtd = gmst.detrend(method=mthd,keep_log=True)
    trend = gmst.copy()
    trend.value = gmst_dtd.log[0]['previous_trend'] # retrieve the trend from the log    
    gmst.plot(ax=axs[i])
    trend.plot(ax=axs[i],label='trend, ' + mthd)
    gmst_dtd.plot(ax=axs[i], label='detrended')
../_images/ddb3d37a14fe80b1f83cb4985f3b524e8ac00dde15e5f7d8e73034ca6fc2eb1c.png

The linear trend here does a decent job at capturing first-order behavior, but the two nonlinear methods are out to lunch. Savitzky-Golay does not have a tuning knob, so that’s that. However, EMD does allow you to pick different modes (\(n=1\) is the default).

Tweaking parameters#

Let’s try modes 1-4 (\(n=0, 1 ,2 ,3\)) and see which fits best.

fig, axs = plt.subplots(4,1,figsize=(12,10))
axs= axs.flatten()
for n in range(4):
    gmst_dtd = gmst.detrend(method='emd',n=n,keep_log=True)
    trend = gmst.copy()
    trend.value = gmst_dtd.log[0]['previous_trend'] # retrieve the trend from the log    
    gmst.plot(ax=axs[n])
    trend.plot(ax=axs[n],label=rf'EMD trend, $n={n}$')
    gmst_dtd.plot(ax=axs[n], label='detrended')
../_images/5550b330b96c7bc7c7aa17c4640cbda7930095712652386f34dd21a9e1f1bb9f.png

The choice is subjective, but it would be reasonable to pick \(n=2\) as characterizing the low-order nonlinear trend. This flexibility of EMD is the reason why it is the default choice in Pyleoclim.

SSA detrending#

Another option to isolate a non-linear trend is Singular Spectrum Analysis. If there is a prominent trend, it is often the first mode coming out of that analysis. Let’s see this:

gmst_ssa = gmst.ssa()
gmst_ssa.screeplot()
(<Figure size 600x400 with 1 Axes>,
 <Axes: title={'center': 'SSA scree plot'}, xlabel='Mode index $i$', ylabel='$\\lambda_i$'>)
../_images/d5c1045f78a1daa685fc9420fc9f048105320e6ec482597d377227b795e8f632.png

Indeed, the first mode accounts for an overwhelming fraction of the variance:

print("{:.2f}".format(gmst_ssa.pctvar[0])+'%')
83.35%

This is what the mode looks like:

gmst_ssa.modeplot(index=0)
(<Figure size 1000x500 with 3 Axes>,
 <Axes: title={'center': 'RC Spectrum (mtm)'}, xlabel='Period [year]', ylabel='PSD'>)
../_images/6a8784bd4ac8c418738365e0620f4bfa18d74488a689f8bb53c088daaee1466e.png

To remove this mode from the original series, you would do:

gmst_ssa = gmst.ssa(trunc='var') # this will select the mode(s) that account for a certain fraction of variance, here 80% by default. 
ssa_trend = gmst_ssa.RCseries
ssa_trend.plot()
(<Figure size 1000x400 with 1 Axes>,
 <Axes: xlabel='Time [year C.E.]', ylabel='GMST [$^{\\circ}$C]'>)
../_images/98158d97976d8f50aee9610c56de3b7ea545c974ca8d59d73d8a9a1c16f85412.png
gmst_ssa_dtd = gmst.copy() # copy the original
gmst_ssa_dtd.value = gmst.value - ssa_trend.value # remove SSA trend from original
gmst_ssa_dtd.label = gmst.label + ' (SSA detrended)'
fig, ax = gmst.plot()
ssa_trend.plot(ax=ax,label='SSA trend')
gmst_ssa_dtd.plot(ax=ax)
<Axes: xlabel='Time [year C.E.]', ylabel='GMST [$^{\\circ}$C]'>
../_images/83372d6ace8e6d6ddddd3493384daa750865a48bce9681eacdffe9ffb44624a7.png

This pre-processing allows to better isolate oscillatory behavior. To see this, let’s look at the spectra of the original and detrended versions:

gmst.spectral(method='mtm',settings={'NW':2}).plot()
gmst_ssa_dtd.spectral(method='mtm',settings={'NW':2}).plot()
(<Figure size 1000x400 with 1 Axes>,
 <Axes: xlabel='Period [year]', ylabel='PSD'>)
../_images/ae99d48fcf5c049613f3633add62d8dcdf600dad1d144e9b544ffd68ea671dbb.png ../_images/6cadb2cc5e663310a3c21a92f0e6089eac746f8a756ddafc27012e85e05452dd.png

We see that detrending has removed much of the variability at scales longer than ~30y, allowing to hone in on various peaks near 3.5, 6, 10 and 20 years. To see if those are significant, however, you would need to apply signif_test(), which we illustrate in another tutorial.

Takeways#

  • Pyleoclim enables easy filtering and detrending

  • Interactive visualization is a key part of the process, to make sure that the signal processing is achieving what it is supposed to do.