Filtering and Detrending with Pyleoclim#

Alexander James & Julien Emile-Geay

Preamble#

Goals:#

  • Become familiar with the various filtering methods and some of their arguments

  • Become familiar with the various detrending methods and some of their arguments

Reading Time:

10 minutes

Keywords#

Signal Processing; Visualization

Pre-requisites#

None. This tutorial assumes basic knowledge of Python. If you are not familiar with this coding language, check out this tutorial: http://linked.earth/LeapFROGS/.

Relevant Packages#

Pandas; Matplotlib

Data Description#

Benthic foraminifera \(\delta^{18}O\) data from the LR04 benthic stack ship (for free!) with pyleoclim

Demonstration#

Let’s import pyleoclim, load and plot the data:

%load_ext watermark

import pyleoclim as pyleo
lr04 = pyleo.utils.load_dataset('LR04')
fig, ax = lr04.plot(invert_yaxis=True)
../_images/a816019a1490890b7d4a9d0cf98d186c6212397b88f5d6e8eb85930887d431ca.png

Detrending#

To detrend the data using pyleoclim we simply call the .detrend() method on a Series object. There are several detrending methods which are covered more thoroughly by the documentation.

In this case, we’ll use the EMD (Empirical Mode Decomposition) method. This decomposes the series into different modes and removes the last one on the assumption that it describes the trend in the series.

ts_emd = lr04.detrend(method='emd', preserve_mean=True, keep_log=True)
fig, ax = lr04.plot(label='Original', invert_yaxis=True)
ts_emd.plot(label='EMD',ax=ax)
<Axes: xlabel='Age [ky BP]', ylabel='$\\delta^{18} \\mathrm{O}$ [‰]'>
../_images/4a1ce7dd0cb522336ee8e7a36a50268bb8cb37149f8fa44e41ef16a7bb1a90be.png

You can also pass method-dependent keyword arguments for each of the different methods. For example, with EMD, if we specify a value for the keyword n, it will remove the n smoothest modes rather than the single smoothest mode (by default, n=1).

ts_emd_n = lr04.detrend(n=3, preserve_mean=True)
fig, ax = ts_emd.plot(label='EMD with n = 1', invert_yaxis=True)
ts_emd_n.plot(label='EMD with n = 3',ax=ax)
<Axes: xlabel='Age [ky BP]', ylabel='$\\delta^{18} \\mathrm{O}$ [‰]'>
../_images/79d184fff563cae1f45075a791b093f00deefd59ee4ce102e2b2cfe83141c67a.png

As you can see, this isn’t really necessary for this record, but it does come in handy on occasion, and other methods can be more sensitive to parameter values, so it’s good to be aware that some configurability exists. Note that these keyword arguments are method dependent, meaning that you’ll need to consult the documentation for each in order to make sure you know which keyword arguments can be used with each method.

Note that we passed two optional arguments: one to preserve the mean of the original series (preserve_mean = True), so it would plot nicely along the original; the other is to keep the log, so we can easily extract the trend. To plot it, you can do this:

trend_emd = ts_emd.copy()
trend_emd.value = ts_emd.log[0]['previous_trend']
trend_emd.label = 'EMD trend'
trend_emd.plot(invert_yaxis=True)
(<Figure size 1000x400 with 1 Axes>,
 <Axes: xlabel='Age [ky BP]', ylabel='$\\delta^{18} \\mathrm{O}$ [‰]'>)
../_images/549aa5f1e9691061bbcbc5d732fe9e6e77100c50485a087d545dcf9b831b763c.png

There are other methods available to us such as linear, savitzky-golay, and constant. The latter option is not terribly useful, as it just removes a constant from the series. Linear detrending is generally a sensible path, but it is obvious here that the trend is nonlinear. savitzky-golay is quite good at extracting those:

ts_sg = lr04.detrend(method='savitzky-golay', preserve_mean=True, keep_log=True)
fig, ax = lr04.plot(label='Original', invert_yaxis=True)
ts_sg.plot(label='Savitzky-Golay detrending',ax=ax, color = 'C3')
/Users/julieneg/Documents/GitHub/Pyleoclim_util/pyleoclim/utils/tsutils.py:988: UserWarning: Timeseries is not evenly-spaced, interpolating...
  warnings.warn("Timeseries is not evenly-spaced, interpolating...")
<Axes: xlabel='Age [ky BP]', ylabel='$\\delta^{18} \\mathrm{O}$ [‰]'>
../_images/3eb837ec7893577001bd0283d2521972188bb4312687063df9dc1d9b164d4ede.png

Note that the method also assumes evenly-spaced data, and having found unevenly-spaced data, it applied interpolation under the hood to get what it needed. If the regridding choice matters, it would be better to do this outside of detrend(). In any case, the result is rather similar to EMD, so let’s isolate the trend for a more quantitative comparison:

trend_sg = ts_sg.copy()
trend_sg.value = ts_sg.log[0]['previous_trend']
trend_sg.label = 'SG trend'
fig, ax = trend_emd.plot(invert_yaxis=True, color = 'C1')
trend_sg.plot(ax=ax, color = 'C3')
<Axes: xlabel='Age [ky BP]', ylabel='$\\delta^{18} \\mathrm{O}$ [‰]'>
../_images/0540979428a9c9c12fa0efd1d53ad4f572967c74462de265263d195356916a69.png

Savitzky-Golay is indeed very good at isolating the lowest-frequency component. Note that the millennial-scale wiggles present with EMD would show up in a spectrum, so be exceeding careful when applying detrending prior to spectral analysis, as it could create artifacts.

Filtering#

Technically, Savitzky-Golay is a filter, and can be used as such. Pyleoclim incorporates a vast array of filters, some inherited from the SciPy library. To filter a Series object, simply apply the .filter() method. One innovation of our package is that is makes it easy to specify either a cutoff scale (period) or a cutoff frequency.

Let’s use the lanczos filter with a cutoff_scale of 80 to remove all frequencies with a period of less than 80ka (we pass the number 80 because our time axis is already in units of ka). This is known as a low-pass filter. We have to interpolate the series because the Lanczos method expects an evenly-spaced series. See the uniform time sampling tutorial for more information on interpolation and other methods of creating evenly-spaced time series.

ts_low = lr04.interp().filter(method='lanczos',cutoff_scale=80)
fig, ax = lr04.plot(label='Original')
ts_low.plot(label='80ky lowpass',ax=ax, invert_yaxis=True)
<Axes: xlabel='Age [ky BP]', ylabel='$\\delta^{18} \\mathrm{O}$ [‰]'>
../_images/ac6a65718ec37bc95c71c7bdb40d13afaca80366935a2651e6b80b2350c3df0a.png

One could also pass a cutoff frequency (cutoff_freq=...) , though we find the cutoff scale more user-friendly.

If we look at the results of performing wavelet analysis on our series before and after filtering, we can clearly see the effect of our low pass filter. You can learn more about spectral analysis in Pyleoclim in the spectral analysis tutorial.

fig, ax = lr04.wavelet(method='wwz').plot()
OMP: Info #276: omp_set_nested routine deprecated, please use omp_set_max_active_levels instead.
../_images/b1256c59cec55071aa5923f01162fe8aff5acd1b95e2e2337cdabd0c54bd1514.png
fig, ax = ts_low.wavelet(method='wwz').plot()
../_images/d27fc2895f37a0ba70f4f147168a870cd7d11d635d88a3e5d1af8997d5bb04b4.png

If we want to apply a band pass filter, we can pass a list of values to our cutoff_scale argument.

ts_band = lr04.interp().filter(method='butterworth',cutoff_scale=[10,80])
fig, ax = lr04.plot(label='Original')
ts_band.plot(label='Bandpass (10 to 80 ka)',ax=ax, invert_yaxis=True)
<Axes: xlabel='Age [ky BP]', ylabel='$\\delta^{18} \\mathrm{O}$ [‰]'>
../_images/4186d2e4be0e04b20c2dfd92fba697ae07ee1eecc6376db991897f39f9e29a26.png

Checking our work by running wavelet analysis (done below) shows that we’ve isolated the components of the series whose periodicities lie between 10 and 80ka.

fig, ax = ts_band.wavelet(method='cwt').plot()
../_images/df6b6d79354c431435d3562a736b74f0c8f7d19971d516bc9798e9ab3e127697.png

To apply a high pass filter, we simply apply a lowpass filter and then subtract the values of the low pass filter series from our original series.

#Applying a high pass filter with a cutoff_scale of 50

ts_high = lr04.copy()
ts_high.value -= lr04.interp().filter(method='lanczos',cutoff_scale=50).value
fig, ax = lr04.plot(label='Original',invert_yaxis=True)
ts_high.plot(label='High pass',ax=ax)
<Axes: xlabel='Age [ky BP]', ylabel='$\\delta^{18} \\mathrm{O}$ [‰]'>
../_images/772777916bfcdb4605362c71771fe62fdd08bfe5a6629ec2f9ec6a072d6c5dc6.png

Though removal of the long term trend in our series is an obvious consequence of our high pass filter, wavelet analysis should show us how the frequency profile of this series has changed.

fig, ax = ts_high.wavelet(method='wwz').plot()
../_images/81dac29998a24318ec2ef628b774996f2c3784f3ad67533a59ed22eee1d120d0.png

It’s not perfect, but the power of the components of the series with longer periodicities has decreased.

As with .detrend(), .filter() accepts method-specific keyword arguments. For example, in the case of the savitzky-golay method, we can pass window_length to adjust the size of the filter window.

ts_sg_low = lr04.interp().filter(method='savitzky-golay',cutoff_scale=80)
ts_sg_win = lr04.interp().filter(method='savitzky-golay',cutoff_scale=80,window_length=81)
fig, ax = ts_sg_low.plot(label='Low pass')
ts_sg_win.plot(label='Adjusted low pass',ax=ax, invert_yaxis=True)
<Axes: xlabel='Age [ky BP]', ylabel='$\\delta^{18} \\mathrm{O}$ [‰]'>
../_images/f077ee247a32e1c45e6fcc1f304608acc478675ad57037510de00e4e275944fc.png

There are a number of other methods we could use here including: butterworth, lanczos, and firwin. Each has its own keyword arguments. Exploring these (and their documentation), is left as an exercise for the reader.

Finally, note that singular spectrum analysis is another option to isolate particular components of a signal. As the dedicated tutorial shows, it can function for detrending or filtering as well.

%watermark -n -u -v -iv -w
Last updated: Mon Mar 04 2024

Python implementation: CPython
Python version       : 3.11.7
IPython version      : 8.20.0

pyleoclim: 0.13.1b0

Watermark: 2.4.3