{
"cells": [
{
"cell_type": "markdown",
"id": "49c1c16c-d5d7-40f1-a190-17beb9164984",
"metadata": {},
"source": [
"\n",
"\n",
"# Principal Component Analysis with Pyleoclim\n",
"\n",
"by [Julien Emile-Geay](https://orcid.org/0000-0001-5920-4751), [Deborah Khider](https://orcid.org/0000-0001-7501-8430), [Alexander James](https://orcid.org/0000-0001-8561-3188)\n",
"\n",
"## Preamble\n",
"A chief goal of multivariate data analysis is data reduction, which attempts to condense the information contained in a potentially high-dimensional dataset into a few interpretable patterns. Chief among data reduction techniques is Principal Component Analysis (PCA), which organizes the data into orthogonal patterns ranked by decreasing share of variance: the first pattern accounts for the largest share, followed by the second, third, and so on. In geophysical timeseries, the first pattern (\"mode\") is often associated with the longest time scale (e.g. a secular trend). PCA is therefore very useful for exploratory data analysis, and sometimes helps test theories of climate change (e.g. by comparing simulated to observed patterns of change). \n",
"\n",
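"To make the idea of variance-ranked orthogonal patterns concrete, here is a minimal sketch of PCA via the singular value decomposition, using NumPy only. It is not how Pyleoclim implements PCA; the synthetic records, the `loadings` vector and all variable names are assumptions made purely for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Synthetic example (illustrative only): 5 records sharing a linear trend, plus noise\n",
"rng = np.random.default_rng(0)\n",
"n_time, n_series = 200, 5\n",
"t = np.linspace(0, 1, n_time)\n",
"loadings = np.array([1.0, 0.8, 0.6, -0.5, 0.3])  # hypothetical weights on the trend\n",
"X = np.outer(t, loadings) + 0.1 * rng.standard_normal((n_time, n_series))\n",
"\n",
"# Standardize each record (zero mean, unit variance) so that no record dominates\n",
"Z = (X - X.mean(axis=0)) / X.std(axis=0)\n",
"\n",
"# SVD of the standardized data matrix: Z = U @ diag(s) @ Vt\n",
"U, s, Vt = np.linalg.svd(Z, full_matrices=False)\n",
"\n",
"pcs = U * s                       # principal components (time series), one per column\n",
"eofs = Vt.T                       # spatial patterns (orthonormal columns)\n",
"pctvar = s**2 / np.sum(s**2)      # fraction of variance accounted for by each mode\n",
"```\n",
"\n",
"In this sketch, `pctvar` is sorted in decreasing order by construction, and the leading column of `pcs` should track the shared trend, i.e. the longest time scale in the synthetic data.\n",
"\n",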
"### Goals\n",
"\n",
"This notebook shows how to apply PCA within Pyleoclim. For more details, see [Emile-Geay (2017), chap 12](http://dx.doi.org/10.6084/m9.figshare.1014336). \n",
"It also walks you through Monte-Carlo PCA (MC-PCA), the variant of PCA suited to the analysis of records that come with age ensembles. \n",
"\n",
"**Reading Time: 15 min**\n",
"\n",
"### Keywords\n",
"Principal Component Analysis, Singular Value Decomposition, Data Reduction\n",
"\n",
"### Pre-requisites\n",
"None\n",
"\n",
"### Relevant Packages\n",
"statsmodels, matplotlib, pylipd\n",
"\n",
"## Data Description\n",
"- For PCA, we use the Euro2k database, compiled by the European working group of the [PAGES 2k paleotemperature compilation](http://dx.doi.org/10.1038/sdata.2017.88), which gathers proxies from multiple archives, mainly wood, coral, lake sediment and documentary archives. \n",
"- For MC-PCA, we use the [SISAL v2 database](http://dx.doi.org/10.1038/sdata.2017.88) of speleothem records.\n",
"\n",
"## Demonstration\n",
"\n",
"Let's load packages first:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "024c5495-ab9d-4729-84f5-0c652f866c09",
"metadata": {},
"outputs": [],
"source": [
"%load_ext watermark\n",
"\n",
"import pyleoclim as pyleo\n",
"import numpy as np\n",
"import seaborn as sns"
]
},
{
"cell_type": "markdown",
"id": "ae3abfca-5a0d-40e1-a5f3-c8301b5844be",
"metadata": {},
"source": [
"### Data Wrangling and Processing\n",
"To load this dataset, we use [pylipd](https://pylipd.readthedocs.io). We first import everything into a pandas DataFrame:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4e7d0efe-e06e-4d24-87be-c123c3039b53",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading 16 LiPD files\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 16/16 [00:00<00:00, 32.59it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loaded..\n"
]
},
{
"data": {
"text/html": [
"
\n", " | dataSetName | \n", "archiveType | \n", "geo_meanLat | \n", "geo_meanLon | \n", "geo_meanElev | \n", "paleoData_variableName | \n", "paleoData_values | \n", "paleoData_units | \n", "paleoData_proxy | \n", "paleoData_proxyGeneral | \n", "time_variableName | \n", "time_values | \n", "time_units | \n", "depth_variableName | \n", "depth_values | \n", "depth_units | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "Ocn-RedSea.Felis.2000 | \n", "Coral | \n", "27.8500 | \n", "34.3200 | \n", "-6.0 | \n", "d18O | \n", "[-4.12, -3.82, -3.05, -3.02, -3.62, -3.96, -3.... | \n", "permil | \n", "d18O | \n", "None | \n", "year | \n", "[1995.583, 1995.417, 1995.25, 1995.083, 1994.9... | \n", "yr AD | \n", "None | \n", "None | \n", "None | \n", "
1 | \n", "Ant-WAIS-Divide.Severinghaus.2012 | \n", "Borehole | \n", "-79.4630 | \n", "-112.1250 | \n", "1766.0 | \n", "temperature | \n", "[-29.607, -29.607, -29.606, -29.606, -29.605, ... | \n", "degC | \n", "borehole | \n", "None | \n", "year | \n", "[8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,... | \n", "yr AD | \n", "None | \n", "None | \n", "None | \n", "
2 | \n", "Ant-WAIS-Divide.Severinghaus.2012 | \n", "Borehole | \n", "-79.4630 | \n", "-112.1250 | \n", "1766.0 | \n", "uncertainty_temperature | \n", "[1.327, 1.328, 1.328, 1.329, 1.33, 1.33, 1.331... | \n", "degC | \n", "None | \n", "None | \n", "year | \n", "[8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,... | \n", "yr AD | \n", "None | \n", "None | \n", "None | \n", "
3 | \n", "Asi-SourthAndMiddleUrals.Demezhko.2007 | \n", "Borehole | \n", "55.0000 | \n", "59.5000 | \n", "1900.0 | \n", "temperature | \n", "[0.166, 0.264, 0.354, 0.447, 0.538, 0.62, 0.68... | \n", "degC | \n", "borehole | \n", "None | \n", "year | \n", "[800, 850, 900, 950, 1000, 1050, 1100, 1150, 1... | \n", "yr AD | \n", "None | \n", "None | \n", "None | \n", "
4 | \n", "Ocn-AlboranSea436B.Nieto-Moreno.2013 | \n", "Marine sediment | \n", "36.2053 | \n", "-4.3133 | \n", "-1108.0 | \n", "temperature | \n", "[18.79, 19.38, 19.61, 18.88, 18.74, 19.25, 18.... | \n", "degC | \n", "alkenone | \n", "None | \n", "year | \n", "[1999.07, 1993.12, 1987.17, 1975.26, 1963.36, ... | \n", "yr AD | \n", "None | \n", "None | \n", "None | \n", "