Chapter 3: Measures of association#

by Julien Emile-Geay & Jordan Landers, University of Southern California

In this notebook we explore the relationship between two well-known climate indices: the Southern Oscillation Index (SOI) and the North Atlantic Oscillation (NAO) index. The NAO and SOI have been alleged to show some relationship, albeit a subtle one, in some seasons/epochs. Specifically, we will explore:

  • effects of trends

  • effects of autocorrelation

  • various measures of association

  • various methods of establishing the significance of a relationship

Data Wrangling#

  • The NAO data are from NCEP

  • The SOI data ship with Pyleoclim, which houses a few datasets that make it easy to experiment with real-world geoscientific timeseries. To see what is available, do:

import pyleoclim as pyleo
pyleo.utils.available_dataset_names()
['SOI',
 'NINO3',
 'HadCRUT5',
 'AIR',
 'LR04',
 'AACO2',
 'EDC-dD',
 'GISP2',
 'cenogrid_d18O',
 'cenogrid_d13C']

To load the SOI, simply do:

SOI = pyleo.utils.load_dataset('SOI')
SOI
{'archiveType': 'Instrumental', 'label': 'Southern Oscillation Index'}
None
Time [year C.E.]
1951.000000    1.5
1951.083333    0.9
1951.166667   -0.1
1951.250000   -0.3
1951.333333   -0.7
              ... 
2019.583333   -0.1
2019.666667   -1.2
2019.750000   -0.4
2019.833333   -0.8
2019.916667   -0.6
Name: SOI [mb], Length: 828, dtype: float64
SOI.plot()
(<Figure size 1000x400 with 1 Axes>,
 <Axes: xlabel='Time [year C.E.]', ylabel='SOI [mb]'>)

That was the easy part. For the NAO index, we have to work a little harder.

import pandas as pd
url = 'https://www.cpc.ncep.noaa.gov/products/precip/CWlink/pna/norm.nao.monthly.b5001.current.ascii.table'
df_nao = pd.read_csv(url, sep=r'\s+', header=0, index_col=0)
df_nao
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1950 0.92 0.40 -0.36 0.73 -0.59 -0.06 -1.26 -0.05 0.25 0.85 -1.26 -1.02
1951 0.08 0.70 -1.02 -0.22 -0.59 -1.64 1.37 -0.22 -1.36 1.87 -0.39 1.32
1952 0.93 -0.83 -1.49 1.01 -1.12 -0.40 -0.09 -0.28 -0.54 -0.73 -1.13 -0.43
1953 0.33 -0.49 -0.04 -1.67 -0.66 1.09 0.40 -0.71 -0.35 1.32 1.04 -0.47
1954 0.37 0.74 -0.83 1.34 -0.09 -0.25 -0.60 -1.90 -0.44 0.60 0.40 0.69
... ... ... ... ... ... ... ... ... ... ... ... ...
2020 1.34 1.26 1.01 -1.02 -0.41 -0.15 -1.23 0.12 0.98 -0.65 2.54 -0.30
2021 -1.11 0.14 0.73 -1.43 -1.24 0.77 0.03 -0.28 -0.21 -2.29 -0.18 0.29
2022 1.08 1.68 0.77 -0.36 0.71 -0.12 -0.09 1.47 -1.61 -0.72 0.69 -0.15
2023 1.25 0.92 -1.11 -0.63 0.39 -0.58 -2.17 -1.16 -0.44 -2.03 -0.32 1.94
2024 0.21 1.09 -0.21 -0.78 NaN NaN NaN NaN NaN NaN NaN NaN

75 rows × 12 columns

Here we have a challenge: the data present themselves as a 2-dimensional table, and need to be reshaped into a 1-D series by combining months and years in a sequential fashion. Fear not! pandas can do this!

First, we turn the index into a column called Year:

df_nao = df_nao.reset_index().rename(columns={'index':'Year'})
df_nao
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
0 1950 0.92 0.40 -0.36 0.73 -0.59 -0.06 -1.26 -0.05 0.25 0.85 -1.26 -1.02
1 1951 0.08 0.70 -1.02 -0.22 -0.59 -1.64 1.37 -0.22 -1.36 1.87 -0.39 1.32
2 1952 0.93 -0.83 -1.49 1.01 -1.12 -0.40 -0.09 -0.28 -0.54 -0.73 -1.13 -0.43
3 1953 0.33 -0.49 -0.04 -1.67 -0.66 1.09 0.40 -0.71 -0.35 1.32 1.04 -0.47
4 1954 0.37 0.74 -0.83 1.34 -0.09 -0.25 -0.60 -1.90 -0.44 0.60 0.40 0.69
... ... ... ... ... ... ... ... ... ... ... ... ... ...
70 2020 1.34 1.26 1.01 -1.02 -0.41 -0.15 -1.23 0.12 0.98 -0.65 2.54 -0.30
71 2021 -1.11 0.14 0.73 -1.43 -1.24 0.77 0.03 -0.28 -0.21 -2.29 -0.18 0.29
72 2022 1.08 1.68 0.77 -0.36 0.71 -0.12 -0.09 1.47 -1.61 -0.72 0.69 -0.15
73 2023 1.25 0.92 -1.11 -0.63 0.39 -0.58 -2.17 -1.16 -0.44 -2.03 -0.32 1.94
74 2024 0.21 1.09 -0.21 -0.78 NaN NaN NaN NaN NaN NaN NaN NaN

75 rows × 13 columns

Next, we invoke pandas’ powerful melt function to unpivot the table:

df_nao = df_nao.melt(id_vars='Year', var_name='Month', value_name='NAO')
df_nao
Year Month NAO
0 1950 Jan 0.92
1 1951 Jan 0.08
2 1952 Jan 0.93
3 1953 Jan 0.33
4 1954 Jan 0.37
... ... ... ...
895 2020 Dec -0.30
896 2021 Dec 0.29
897 2022 Dec -0.15
898 2023 Dec 1.94
899 2024 Dec NaN

900 rows × 3 columns

Next we combine year and month into a date, then ditch the superfluous year and month columns:

df_nao['date']=(df_nao
          .assign(Date=lambda x: pd.to_datetime(x['Year'].astype(str) + '-' + x['Month'], format='%Y-%b')))['Date']             
df_nao = df_nao.drop(columns=['Year', 'Month']).set_index('date').sort_index()
df_nao
NAO
date
1950-01-01 0.92
1950-02-01 0.40
1950-03-01 -0.36
1950-04-01 0.73
1950-05-01 -0.59
... ...
2024-08-01 NaN
2024-09-01 NaN
2024-10-01 NaN
2024-11-01 NaN
2024-12-01 NaN

900 rows × 1 columns

Note the use of a lambda function to rework month and year on the fly. We now have a proper dataframe whose time axis is a genuine datetime index, which Pyleoclim will know what to do with:

df_nao.index
DatetimeIndex(['1950-01-01', '1950-02-01', '1950-03-01', '1950-04-01',
               '1950-05-01', '1950-06-01', '1950-07-01', '1950-08-01',
               '1950-09-01', '1950-10-01',
               ...
               '2024-03-01', '2024-04-01', '2024-05-01', '2024-06-01',
               '2024-07-01', '2024-08-01', '2024-09-01', '2024-10-01',
               '2024-11-01', '2024-12-01'],
              dtype='datetime64[ns]', name='date', length=900, freq=None)
df_nao['NAO'].plot()
<Axes: xlabel='date'>

Now all we need to do to turn this into a Pyleoclim Series object is to sprinkle a little metadata.

metadata = {
        'time_unit': 'years CE',
        'time_name': 'Time',
        'value_name': 'NAO',
        'label': 'North Atlantic Oscillation',
        'archiveType': 'Instrumental',
        'importedFrom': 'NOAA/NCEI',
    }

NAO = pyleo.Series.from_pandas(df_nao['NAO'],metadata)
NAO.plot()
(<Figure size 1000x400 with 1 Axes>,
 <Axes: xlabel='Time [years CE]', ylabel='NAO'>)

It turns out that this version has gaps, which we need to fill by interpolation so that other methods work properly:

NAOi = NAO.interp(step=1/12)
fig, ax = NAO.plot()
NAOi.plot(ax=ax, label='interpolated')
<Axes: xlabel='Time [years CE]', ylabel='NAO'>

Let us now look at SOI and NAO side by side:

ms = NAOi & SOI
ms.stackplot()
(<Figure size 640x480 with 3 Axes>,
 {0: <Axes: ylabel='NAO'>,
  1: <Axes: ylabel='SOI [mb]'>,
  'x_axis': <Axes: xlabel='Time [year C.E.]'>})

The series have comparable lengths, and SOI seems, by eye, to display stronger interannual variability. We could confirm this with spectral analysis, but that is not the focus of this tutorial. Let’s dive into exploring their relationship, if any.

Measuring Association#

Here we will use Pyleoclim’s correlation() function to measure the association between SOI and NAO.

corr_base = SOI.correlation(NAOi, method='built-in')
print(corr_base)
  correlation    p-value  signif. (α: 0.05)
-------------  ---------  -------------------
    0.0011608       0.97  False

We see a very small correlation, with (under this particular test) a large p-value that does not allow us to claim that this correlation is significant (i.e. remarkable) in any way. At this point you should be asking yourself what the underlying null hypothesis is. This link might be helpful in recalling that, as well as the assumptions of this standard test.
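To make that null hypothesis concrete, here is a minimal sketch of what the built-in method essentially does: apply the classical Pearson test (scipy.stats.pearsonr) to the two series once they are aligned on a common time axis. The variable names below are illustrative, not part of Pyleoclim's API, and the p-value is derived under the assumption that the samples are IID draws from a bivariate normal distribution.

import numpy as np
from scipy import stats

# align the two series on a common time axis, as correlation() does internally
nao_c, soi_c = (NAOi & SOI).common_time().series_list
r, p = stats.pearsonr(nao_c.value, soi_c.value)
# under the null of zero correlation, t = r*sqrt((n-2)/(1-r**2)) follows a
# Student's t distribution with n-2 degrees of freedom, whence the p-value
print(f'r = {r:.4f}, p = {p:.3f}')   # should be close to the table above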

Note that the function allows one to use a number of different statistics to measure association:

for stat in ['pearsonr','spearmanr','pointbiserialr','kendalltau','weightedtau']:
     corr = SOI.correlation(NAOi, statistic=stat, number=200)
     print(f'{stat}(NAO,SOI)={corr.r:.3f}, p={corr.p:.3f}')   
pearsonr(NAO,SOI)=0.001, p=0.980
spearmanr(NAO,SOI)=-0.002, p=0.965
pointbiserialr(NAO,SOI)=0.001, p=0.975
kendalltau(NAO,SOI)=-0.001, p=0.980
weightedtau(NAO,SOI)=-0.035, p=0.710

We see that no matter which way we slice it, we get very small numbers, which appear insignificant with respect to the default test.

Spurious Correlations#

In the geosciences, there are two processes that can artificially inflate correlations between otherwise unrelated variables:

  1. smoothing

  2. common trends

Smoothing-enhanced correlations#

To see this effect, let’s smooth both series with a bandpass filter that retains scales between 2 and 50 years:

ms_md = ms.filter(method='butterworth',cutoff_scale=[2, 50])
ms_md.plot()
(<Figure size 1000x400 with 1 Axes>,
 <Axes: xlabel='Time [year C.E.]', ylabel='value'>)
corr_md=ms_md.series_list[0].correlation(ms_md.series_list[1],method='built-in')
print(corr_md)
  correlation  p-value    signif. (α: 0.05)
-------------  ---------  -------------------
     0.306343  < 1e-18    True

We see that with sufficient smoothing, we have increased the correlation to the point where it is now highly significant (note the infinitesimal p-value). But BEWARE! The built-in test for Pearson’s correlation, as recalled above, assumes that the data are IID: independent and identically distributed. After smoothing so heavily, neighboring values resemble each other a lot, so independence is lost (see course notes for more details). We therefore need more sophisticated methods to gauge the p-value. First, let’s use a common analytical approximation that adjusts for the loss of degrees of freedom (see course notes); the gist of the adjustment is sketched below, and it is coded up in Pyleoclim as ttest.
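The adjustment replaces the nominal sample size with an effective sample size derived from the lag-1 autocorrelation of each series. The function below is an illustrative sketch of that idea, not Pyleoclim’s exact implementation (which estimates the autocorrelation more carefully).

import numpy as np
from scipy import stats

def corr_ttest_sketch(y1, y2, alpha=0.05):
    '''Pearson correlation with a t-test based on an effective sample size (sketch).'''
    r = np.corrcoef(y1, y2)[0, 1]
    N = len(y1)
    g1 = np.corrcoef(y1[:-1], y1[1:])[0, 1]   # lag-1 autocorrelation of y1
    g2 = np.corrcoef(y2[:-1], y2[1:])[0, 1]   # lag-1 autocorrelation of y2
    Ne1 = N * (1 - g1) / (1 + g1)             # effective sample size for y1
    Ne2 = N * (1 - g2) / (1 + g2)             # effective sample size for y2
    Ne = np.sqrt(Ne1 * Ne2)                   # combine the two estimates
    dof = Ne - 2                              # effective degrees of freedom
    t = np.abs(r) * np.sqrt(dof / (1 - r**2))
    p = 2 * stats.t.sf(t, df=dof)             # two-sided p-value
    return r, p, p < alpha

Let us now call the ttest method on the smoothed series: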

corr_md_dof=ms_md.series_list[0].correlation(ms_md.series_list[1],method='ttest')
print(corr_md_dof)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[17], line 1
----> 1 corr_md_dof=ms_md.series_list[0].correlation(ms_md.series_list[1],method='ttest')
      2 print(corr_md_dof)

File ~/Documents/GitHub/Pyleoclim_util/pyleoclim/core/series.py:3556, in Series.correlation(self, target_series, alpha, statistic, method, number, timespan, settings, seed, common_time_kwargs, mute_pbar)
   3554 if method == 'ttest':
   3555     if statistic == 'pearsonr':
-> 3556         stat, signf, pval = corrutils.corr_ttest(ts0.value, ts1.value, alpha=alpha)
   3557         signif = bool(signf)
   3558     else:

File ~/Documents/GitHub/Pyleoclim_util/pyleoclim/utils/correlation.py:234, in corr_ttest(y1, y2, alpha)
    231 Ney2 = N * (1-g2) / (1+g2)
    233 Ne = stats.mstats.gmean([Ney1+Ney2])
--> 234 assert Ne >= 10, 'Too few effective d.o.f. to apply this method!'
    236 df = Ne - 2
    237 t = np.abs(r) * np.sqrt(df/(1-r**2))

AssertionError: Too few effective d.o.f. to apply this method!

That didn’t work! As the error message indicates, the smoothing has lowered the effective degrees of freedom to such an extent that this method cannot be applied. Let’s instead use a parametric Monte Carlo test: AR(1) benchmarks.
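The idea behind this test is to fit an AR(1) model to each series, generate many synthetic (surrogate) pairs with the same persistence but no built-in relationship, and compare the observed correlation to the distribution of correlations obtained on those surrogate pairs. The sketch below is illustrative only; Pyleoclim’s ar1sim method fits the AR(1) parameters more carefully.

import numpy as np

def ar1_surrogate(y, rng):
    '''One AR(1) surrogate with roughly the same lag-1 autocorrelation and variance as y.'''
    g = np.corrcoef(y[:-1], y[1:])[0, 1]      # lag-1 autocorrelation
    sigma = np.std(y) * np.sqrt(1 - g**2)     # innovation standard deviation
    z = np.empty(len(y))
    z[0] = rng.normal(0, np.std(y))
    for i in range(1, len(y)):
        z[i] = g * z[i-1] + rng.normal(0, sigma)
    return z

def ar1_corr_pvalue(y1, y2, number=1000, seed=222):
    '''Two-sided p-value of corr(y1, y2) against AR(1) surrogate pairs (sketch).'''
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(y1, y2)[0, 1]
    r_null = np.array([np.corrcoef(ar1_surrogate(y1, rng), ar1_surrogate(y2, rng))[0, 1]
                       for _ in range(number)])
    return r_obs, np.mean(np.abs(r_null) >= np.abs(r_obs))

Pyleoclim does all of this for us: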

corr_md_ar1=ms_md.series_list[0].correlation(ms_md.series_list[1],method='ar1sim', number=5000, seed = 222)
print(corr_md_ar1)
Evaluating association on surrogate pairs: 100%|██████████| 5000/5000 [00:03<00:00, 1487.20it/s]
  correlation    p-value  signif. (α: 0.05)
-------------  ---------  -------------------
     0.306343       0.35  False

The correlation has not changed, but see what happened to the p-value! We can actually quantify this:

corr_md_ar1.p/corr_md.p
1.8258770939061737e+18

This is how much larger the p-value is under this test! So we went from highly significant to insignificant. Would we have found the same with the non-parametric Monte Carlo test (phase randomization)?
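Phase randomization builds surrogates that preserve each series’ power spectrum (and hence its autocorrelation structure) while scrambling the Fourier phases, which destroys any common timing between the two series. Here is a rough sketch of how one such surrogate could be built; it is illustrative only, and Pyleoclim’s phaseran method takes care of the details.

import numpy as np

def phase_randomize(y, rng):
    '''Surrogate with the same power spectrum as y but randomized Fourier phases (sketch).'''
    ybar = np.mean(y)
    F = np.fft.rfft(y - ybar)                 # Fourier transform of the centered series
    phases = rng.uniform(0, 2*np.pi, size=F.size)
    phases[0] = 0.0                           # keep the zero-frequency term real
    if len(y) % 2 == 0:
        phases[-1] = 0.0                      # keep the Nyquist term real as well
    return np.fft.irfft(np.abs(F) * np.exp(1j*phases), n=len(y)) + ybar

Applying the phase-randomization test: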

corr_md_pr=ms_md.series_list[0].correlation(ms_md.series_list[1],method='phaseran', number=5000, seed = 222)
print(corr_md_pr)
Evaluating association on surrogate pairs: 100%|██████████| 5000/5000 [00:03<00:00, 1434.84it/s]
  correlation    p-value  signif. (α: 0.05)
-------------  ---------  -------------------
     0.306343       0.02  True

In this case, the two methods disagree, and phase-randomization appears more lenient. Disagreements between methods are frequent when there are relatively few degrees of freedom, and are cause for caution.

Take-home message: beware the effect of smoothing on degrees of freedom, and therefore on assessments of significance.

Trend-enhanced correlations#

Now we look at a slightly different issue: how the presence of common trends may make otherwise unrelated timeseries appear artificially related. First, let’s apply the common_time method to obtain a common time vector for both series, from which we’ll construct the trend:

msc = ms.common_time()
tc = msc.series_list[0].time # would get the same answer with msc.series_list[1].time

The baseline trend will look something like this, which we’ll superimpose on the NAO and SOI indices, respectively.

trend = (tc-tc.mean())/tc.ptp() # create trend as a NumPy array
import matplotlib.pyplot as plt
plt.plot(tc, trend) # plot it
[<matplotlib.lines.Line2D at 0x1406ca890>]
scales = [0.5, 1, 2]
for method in ['built-in','ttest','ar1sim','phaseran']:
    for s in scales:
        scaled_trend = s*trend # scale the trend
        nao_t = msc.series_list[0].copy() # create copy of the NAO series
        nao_t.value += scaled_trend # add trend
        soi_t = msc.series_list[1].copy() # create copy of the SOI series
        soi_t.value += scaled_trend # add trend
        corr_t = nao_t.correlation(soi_t,method=method)
        print(f"Scale = {s}, using {method} method\n")
        print(corr_t)
Scale = 0.5, using built-in method

  correlation    p-value  signif. (α: 0.05)
-------------  ---------  -------------------
    0.0279778       0.42  False

Scale = 1, using built-in method

  correlation  p-value    signif. (α: 0.05)
-------------  ---------  -------------------
    0.0924265  < 1e-2     True

Scale = 2, using built-in method

  correlation  p-value    signif. (α: 0.05)
-------------  ---------  -------------------
     0.277871  < 1e-15    True

Scale = 0.5, using ttest method

  correlation    p-value  signif. (α: 0.05)
-------------  ---------  -------------------
    0.0279778       0.46  False

Scale = 1, using ttest method

  correlation    p-value  signif. (α: 0.05)
-------------  ---------  -------------------
    0.0924265       0.02  True

Scale = 2, using ttest method

  correlation  p-value    signif. (α: 0.05)
-------------  ---------  -------------------
     0.277871  < 1e-12    True
Evaluating association on surrogate pairs: 100%|██████████| 1000/1000 [00:00<00:00, 1523.83it/s]
Scale = 0.5, using ar1sim method

  correlation    p-value  signif. (α: 0.05)
-------------  ---------  -------------------
    0.0279778       0.44  False
Evaluating association on surrogate pairs: 100%|██████████| 1000/1000 [00:00<00:00, 1531.31it/s]
Scale = 1, using ar1sim method

  correlation    p-value  signif. (α: 0.05)
-------------  ---------  -------------------
    0.0924265       0.03  True
Evaluating association on surrogate pairs: 100%|██████████| 1000/1000 [00:00<00:00, 1537.74it/s]
Scale = 2, using ar1sim method

  correlation  p-value    signif. (α: 0.05)
-------------  ---------  -------------------
     0.277871  < 1e-6     True
Evaluating association on surrogate pairs: 100%|██████████| 1000/1000 [00:00<00:00, 1452.27it/s]
Scale = 0.5, using phaseran method

  correlation    p-value  signif. (α: 0.05)
-------------  ---------  -------------------
    0.0279778       0.55  False
Evaluating association on surrogate pairs: 100%|██████████| 1000/1000 [00:00<00:00, 1513.72it/s]
Scale = 1, using phaseran method

  correlation    p-value  signif. (α: 0.05)
-------------  ---------  -------------------
    0.0924265       0.15  False
Evaluating association on surrogate pairs: 100%|██████████| 1000/1000 [00:00<00:00, 1389.75it/s]
Scale = 2, using phaseran method

  correlation  p-value    signif. (α: 0.05)
-------------  ---------  -------------------
     0.277871  < 1e-2     True

As we dial up the amplitude of the trend (0.5, 1, and 2 times the baseline), the correlation rises from 0.028 to 0.09 to 0.28. At the smallest amplitude, all four methods agree that the correlation is insignificant; at the largest, all of them declare it significant. The intermediate case (scale = 1) is where the methods diverge: the built-in test, the t-test with adjusted degrees of freedom (DOFs), and the AR(1) benchmarks all call a correlation of 0.09 significant, while phase randomization does not. In this case, phase randomization is the most conservative. This is why it is the default method in Pyleoclim.

Take-home message: common trends can easily create the appearance of correlations (see Tyler Vigen’s excellent website) and really complicate assessments of significance. If the trend is not relevant to your question, we recommend removing it prior to computing correlations, e.g. using Pyleoclim's detrend().
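For instance, using the trend-contaminated series from the previous cell, the workflow could look like the following sketch (assuming a linear detrending is appropriate here, since the added trend is linear; detrend() supports other methods as well):

nao_d = nao_t.detrend(method='linear')   # remove a linear trend from the NAO copy
soi_d = soi_t.detrend(method='linear')   # remove a linear trend from the SOI copy
corr_d = nao_d.correlation(soi_d, method='phaseran', number=1000)
print(corr_d)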

Takeaways#

Not only is correlation not indicative of causation, but spurious correlations abound, often driven by smoothing, trends, or short sample sizes. Some null hypotheses are more stringent than others; the built-in one assumes IID data, an assumption that is hardly ever satisfied in the geosciences. Make sure you carefully match your null hypothesis to your data/problem.

If it is causality you are after, more advanced tools are available, such as Empirical Dynamical Modeling. It is well worth a dive into the underlying theory, but it is undoubtedly more complex.