Overview of Pyleoclim: Efficient & Flexible Timeseries Analysis#

Authors#

Deborah Khider1, Julien Emile-Geay2, Alexander James2, Feng Zhu3

1 Information Sciences Institute, University of Southern California 2 Department of Earth Sciences, University of Southern California 3 Nanjing University of Information Science and Technology

Author1 = {“name”: “Deborah Khider”, “affiliation”: “Information Sciences Institute, University of Southern California”, “email”: “khider@usc.edu”, “orcid”: “0000-0001-7501-8430”} Author2 = {“name”: “Julien Emile-Geay”, “affiliation”: “Department of Earth Sciences, University of Southern California”, “email”: “julieneg@usc.edu”, “orcid”: “0000-0001-5920-4751”} Author3 = {“name”: “Alexander James”, “affiliation”: “Department of Earth Sciences, University of Southern California”, “email”: “akjames@usc.edu”, “orcid”: “0000-0001-8561-3188”} Author4 = {“name”: “Feng Zhu”, “affiliation”: “Nanjing University of Information Science and Technology”, “email”: “fzhu@nuist.edu”, “orcid”: “0000-0002-9969-2953”}

Preamble#

Goals:#

  • Get acquainted with the Pyleoclim package, namely the objects and their associated methods

  • Learn to create a Series object from a csv file

  • Call the default methods for plotting, spectral and wavelet analysis

  • Construct workflows with Pyleoclim

Reading Time:

10 minutes

Keywords#

Visualization; Signal Processing; Spectral Analysis; Wavelet Analysis; Method Cascading

Pre-requisites#

None. This tutorial assumes basic knowledge of Python. If you are not familiar with this coding language, check out this tutorial: http://linked.earth/ec_workshops_py/.

Relevant Packages#

Pandas; Matplotlib

Data Description#

Sea-surface temperature from Kaplan (1998) averaged over the NINO3 (5N-5S, 150W-190E)) region.

Demonstration#

Let’s import the packages needed for this tutorial:

import pyleoclim as pyleo
import pandas as pd

Step 1: Create a Series object from a csv file#

To do so, we will first read the data from a csv file and load into a Pandas DataFrame:

df = pd.read_csv('../data/wtc_test_data_nino_even.csv')
df.head()
t air nino
0 1871.000000 87.36090 -0.358250
1 1871.083333 -21.83460 -0.292458
2 1871.166667 -5.52632 -0.143583
3 1871.250000 75.73680 -0.149625
4 1871.333333 105.82000 -0.274250

Next we create a Series object from the columns of the DataFrame:

ts_nino = pyleo.Series(time =  df['t'], value = df['nino'], label = 'Kaplan Niño3 SST',
                  time_name = 'Year', value_name = 'NINO3 index',
                  time_unit = 'CE',   value_unit = '$^{\circ}$C', verbose=False)            

Let’s make a simple plot:

fig, ax = ts_nino.plot()
../_images/301deb386e47b7391a27db6c1f773972d4f4bf0debb9e55d4553960cc30aa03e.png

You can also make a warming stripe for this Series:

fig, ax = ts_nino.stripes(ref_period=(1961,1990), show_xaxis=True)
../_images/993bcf4745f0c7671be8549716dcfcac6e74d938b8a162b76e45b679500c7f0a.png

Step 2: Pre-processing#

Pyleoclim has multiple functionalities to pre-process a timeseries, including standardizing, detrending, interpolation, and much more. You can learn about the various pre-processing steps in the basic Series manipulations, filtering and detrending, and uniform time sampling tutorials. As an example of one treatment, let’s standardize the data and plot it against the original values.

ts_nino_std = ts_nino.standardize()

fig, ax = ts_nino.plot(label='original', zorder=99) # a high zorder ensures this is plotted on top
ax = ts_nino_std.plot(label='standardized', ax=ax, lgd_kwargs={'ncol': 2})
../_images/af9103e9ddc08c3974e76b02de579dbe68d7d49ee9c5310995b21abd3c2b195c.png

For more on graphics customizations, see figures with multiple panels.

Step 3: Spectral Analysis#

Pyleoclim makes it simple to perform spectral analysis. Calling the .spectral() method will yield a PSD (power spectral density) object of the corresponding series. Unless specified otherwise, .spectral() uses the Lomb-Scargle periodogram (method='lomb_scargle), which can handle both evenly-spaced data and unevenly-spaced data with great speed. However, it has known biases, so it may be appropriate to specify an alternative method, like the Multi-Taper Method, which can only handle evenly-spaced data. For our example data, both methods apply.

psd_wwz = ts_nino_std.spectral()  # method='lomb-scargle' by default 
psd_mtm = ts_nino_std.spectral(method='mtm')  # method='mtm' 

We can then plot the results of our analyses on the same figure:

fig, ax = psd_wwz.plot(label='Lomb-Scargle',zorder=99)
ax = psd_mtm.plot(ax=ax, label='MTM')
../_images/167e7d3f8d2df6e4171000784c13239f41911ca2fc4da9aec494a6dbe25f108d.png

To identify notable periodicities, we need to perform some significance test. Currently, Pyleoclim supports the test against an AR(1) benchmark, which can be performed by simply calling the signif_test() method, which uses 200 surrogate series by default:

psd_mtm_signif = ts_nino_std.spectral(method='mtm').signif_test()
Performing spectral analysis on individual series: 100%|██████████| 200/200 [00:10<00:00, 18.50it/s]

Calling the plot() method on our psd_mtm_signif object will return a plot of the PSD along with the AR(1) threshold curve, which we can use to identify the significant cycles.

fig, ax = psd_mtm_signif.plot()
../_images/929b93c7053780f332f2797f5b341bf91332129bb95b613090b4db7e1d602d26.png

For more details, see the spectral analysis notebook.

Step 4: Wavelet Analysis#

The current version of Pyleoclim supports the wwz method for wavelet analysis on unevenly-spaced records and the cwt method for evenly-spaced records. wwz is more flexible than cwt in the sense that it can be used on any record, but is much slower (no free lunch!). To perform wavelet analysis we simply call the wavelet() method, which will in turn create a Scalogram object. Since our data is evenly spaced, we’ll use cwt, which is the default:

scal = ts_nino_std.wavelet() 
fig, ax = scal.plot() 
../_images/82dced48af0f014b6da9c2a54352208a2bbcd96985712b7d533e43ff6aad078d.png

Notice how the title got built automatically from the figure metadata. We can also perform significance analysis the same way we did when conducting spectral analysis:

scal_signif = ts_nino_std.wavelet().signif_test()
Performing wavelet analysis on individual series: 100%|██████████| 200/200 [00:03<00:00, 56.20it/s]

In this case, regions of significance (those that exceed the 95% confidence threshold found using AR1 surrogates) will be outlined in white.

fig, ax = scal_signif.plot()
../_images/a43e5a4fcdc3064d9ef648c3ed228dfc1bcc66352ec43589fc9d8dcf373fe3c9.png

For more details, see the wavelet analysis notebook.

One more thing: method cascading#

Life is short. Pyleoclim provides a very cool feature to make life a little bit easier: “method cascading”. This feature allows us to string together the steps of a multi-step workflow in just one line.

fig, ax = ts_nino.standardize().spectral(method='mtm').anti_alias().signif_test().plot()
Performing spectral analysis on individual series: 100%|██████████| 200/200 [00:13<00:00, 15.29it/s]
../_images/e7ab21e4dc1901dd7cc9aab6efbf2d4e6d9a3a9849f75df67e4f7fbf299edb6a.png

Same thing with wavelet analysis:

fig, ax = ts_nino.standardize().wavelet().signif_test().plot()
Performing wavelet analysis on individual series: 100%|██████████| 200/200 [00:04<00:00, 45.31it/s]
../_images/e5fdac80cfe54474667491b1ce5419e2149a7e5071ebb4acad3e47b9dbb85e1f.png

If you think some of this could serve your goals, try the other notebooks in this repository.