Working with Age Ensembles#

by Alexander James and Deborah Khider

Preamble#

Quantifying chronological uncertainties, and how they influence the understanding of past changes in Earth systems, is a unique and fundamental challenge of the paleogeosciences. Without robust error determination, it is impossible to properly assess the extent to which past changes occurred simultaneously across regions, accurately estimate rates of change or the duration of abrupt events, or attribute causality – all of which limit our capacity to apply paleogeoscientific understanding to modern and future processes. With regards to the construction of an age-depth model, studies typically calculate a single “best” estimate (often the posterior median or mean), use this model to place measured paleoclimatic or paleoenvironmental data on a timescale, and then proceed to analyze the record with little to no reference to the uncertainties generated as part of the age-modeling exercise, however rigorous in its own right.

Luckily, the vast majority of modern age-uncertainty quantification techniques estimate uncertainties by generating an “ensemble” of plausible age models: hundreds or thousands of plausible alternate age–depth relationships that are consistent with radiometric age estimates, the depositional or accumulation processes of the archive, and the associated uncertainties. By taking advantage of these ensembles to evaluate how the results of an analysis vary across the ensemble members, age-based uncertainty can be more robustly accounted for. A rising number of paleoclimate records now incorporate such age ensembles, which the LiPD format was designed to store. However, properly leveraging these often unwieldy ensemble objects for analysis can be cumbersome. In this notebook we:

  • Show how to translate ensemble age-depth relationships into ensemble age-paleovariable relationships

  • Show how to apply this transformation in two different data contexts.

The goal is to provide boilerplate code for specific solutions as well as a launching point for researchers to craft custom solutions as necessary. Further reading on age models (as well as an R implementation for their generation) can be found in McKay et al. (2021) (the paper from which much of this preamble was lifted).

Goals#

  • Create an EnsembleSeries object from two different file types:

    • A LiPD file

    • A NOAA file plus a CSV

  • Create an EnsembleSeries object using ensembles based on ChronData tables and ensembles based on PaleoData tables.

Reading time:

Keywords#

LiPD; Age ensembles

Pre-requisites#

This tutorial assumes basic knowledge of Python. If you are not familiar with this coding language, check out this tutorial.

Relevant Packages#

PyLiPD; Pandas

Data Description#

This tutorial makes use of the following datasets:

  • Crystal cave: McCabe-Glynn, S., Johnson, K., Strong, C. et al. Variable North Pacific influence on drought in southwestern North America since AD 854. Nature Geosci 6, 617–621 (2013). https://doi.org/10.1038/ngeo1862

  • MD98-2181: Khider, D., Jackson, C. S., and Stott, L. D., Assessing millennial-scale variability during the Holocene: A perspective from the western tropical Pacific, Paleoceanography, 29, 143–159 (2014). https://doi.org/10.1002/2013PA002534

Demonstration#

Let’s import the necessary packages:

%load_ext watermark

import ast

import pyleoclim as pyleo
import numpy as np
import pandas as pd

from pylipd.lipd import LiPD

Ensemble series from chron ensemble#

In the following two examples we will create EnsembleSeries objects from ensembles where uncertainty is represented by variations in the time axis. This structure is typically used to represent dating uncertainty.
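Under the hood, the key operation is straightforward: each ensemble member defines an age-depth curve, which is interpolated onto the depths at which the paleovariable was measured. A minimal numpy sketch of the idea with made-up numbers (illustrative only; not Pyleoclim's actual implementation):

```python
import numpy as np

# Hypothetical data: 4 dated depths, 3 ensemble members,
# and 6 depths at which the paleovariable was measured
chron_depth = np.array([0.0, 10.0, 20.0, 30.0])             # depths of the age estimates
chron_ages = np.array([[100.0, 110.0,  90.0],
                       [200.0, 215.0, 190.0],
                       [310.0, 320.0, 300.0],
                       [400.0, 430.0, 395.0]])              # one column per ensemble member
paleo_depth = np.array([2.0, 7.0, 12.0, 18.0, 25.0, 29.0])  # depths of the measurements

# Interpolate each member's age-depth curve onto the measurement depths
paleo_ages = np.column_stack([
    np.interp(paleo_depth, chron_depth, chron_ages[:, k])
    for k in range(chron_ages.shape[1])
])
print(paleo_ages.shape)  # (6, 3): one plausible age per measurement, per member
```

The result is an ensemble of plausible ages for every measured value, which is exactly what the methods demonstrated below produce for us.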

Example from LiPD#

For this example we’ll use the Crystal Cave record.

The easiest way to load data from a LiPD file in Python is to use the PyLiPD package. Tutorials on basic usage of the PyLiPD package can be found in the PyLiPD Tutorials GitHub repository.

#Create a path to the data
filename = '../data/Crystal.McCabe-Glynn.2013.lpd'

#Initialize the lipd object
D = LiPD()

#Load the data
D.load(filename)
Loading 1 LiPD files
100%|██████████| 1/1 [00:00<00:00,  4.62it/s]
Loaded..

You can create a DataFrame containing the relevant information about the ensembles:

  • ensembleTable: the full name of the table (if one exists)

  • ensembleVariableValues: each column represents one member of the ensemble

  • depth information (name, values, and units)

Note that the DataFrame includes information about units, which may be needed to reconcile the age-depth relationship found in the ensemble table with the one stored alongside the paleo data.

Check the units!
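In this dataset both depth vectors turn out to be in mm, so no conversion is needed. Had the units differed (say, ensemble depths in mm but paleo depths in cm), converting to a common unit would be required before any depth matching; a quick hypothetical sketch:

```python
import numpy as np

ensemble_depth_mm = np.array([1.0, 5.0, 10.0])  # hypothetical ensemble depths in mm
paleo_depth_cm = np.array([0.05, 0.5, 1.0])     # hypothetical paleo depths in cm

# Convert both to a common unit (here: mm) before any depth-based interpolation
paleo_depth_mm = paleo_depth_cm * 10.0  # -> 0.5, 5.0, 10.0 mm
```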
#Pull the ensemble tables into a dataframe
ensemble_df = D.get_ensemble_tables()
ensemble_df
datasetName ensembleTable ensembleVariableName ensembleVariableValues ensembleVariableUnits ensembleDepthName ensembleDepthValues ensembleDepthUnits notes
0 Crystal.McCabe-Glynn.2013 http://linked.earth/lipd/Crystal.McCabe-Glynn.... Year [[2007.0, 2007.0, 2008.0, 2007.0, 2007.0, 2007... yr AD depth [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0... mm None

You can flatten the information contained in a LiPD file into a DataFrame that can be used for further querying. The columns correspond to metadata properties while each row corresponds to a specific variable (a column in the csv file).

#Pull the paleo data into a list. We use all the available data set names because our file only contains one dataset
df = D.get_timeseries_essentials(D.get_all_dataset_names()[0], mode='paleo')
df
dataSetName archiveType geo_meanLat geo_meanLon geo_meanElev paleoData_variableName paleoData_values paleoData_units paleoData_proxy paleoData_proxyGeneral time_variableName time_values time_units depth_variableName depth_values depth_units
0 Crystal.McCabe-Glynn.2013 Speleothem 36.59 -118.82 1386.0 d18o [-8.01, -8.23, -8.61, -8.54, -8.6, -9.08, -8.9... permil None None age [2007.7, 2007.0, 2006.3, 2005.6, 2004.9, 2004.... yr AD depth [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0... mm

Let’s grab some information that we will need at a later time:

paleoDepth = df['depth_values'].iloc[0]
paleoValues = df['paleoData_values'].iloc[0]
paleoTime = df['time_values'].iloc[0]

#It's wise to make sure our units all make sense so we'll pull these as well
paleo_depth_units = df['depth_units'].iloc[0]

#The stored value name and value unit are horrendously formatted, so we'll hard code them using info from the dataframe
value_name = 'd18O'
value_unit = 'permil VPDB'

#We can access the row of interest in our ensemble table via indexing by 0 (because there's just the one row anyway)
chronDepth = ensemble_df.iloc[0]['ensembleDepthValues']
chronValues = ensemble_df.iloc[0]['ensembleVariableValues']

#Getting depth units, time name, and time units from our ensemble table
ensemble_depth_units = ensemble_df.iloc[0]['ensembleDepthUnits']

#The way time name and units are stored in our ensemble dataframe are a bit wonky, so we'll do some organization of our own
time_name = 'Time'
time_unit = f'{ensemble_df.iloc[0]["ensembleVariableName"]} {ensemble_df.iloc[0]["ensembleVariableUnits"]}'
print(f'Num rows in chronValues: {chronValues.shape[0]}, Length of chronDepth: {len(chronDepth)}')
Num rows in chronValues: 1053, Length of chronDepth: 1053

All looks well. Let’s make an EnsembleSeries object using the from_AgeEnsembleArray method. To use this functionality, it’s often easier to create a general Series object first so some of the metadata (e.g., name and units) can be reused when the EnsembleSeries is created.

The function takes the following arguments:

  • series – A Series object with the values to be mapped and needed metadata;

  • age_array (np.array) – An array of ages to map the values to; in our example, this is the chronValues variable.

  • value_depth (vector) – An array of depths corresponding to the series values; in our example, this is the paleoDepth variable.

  • age_depth (vector) – An array of depths corresponding to the age array; in our example, this is the chronDepth variable.

  • extrapolate (bool) – Whether to extrapolate the age array to the value depth. By default, this is set to True.

Note: This function also works with `EnsembleGeoSeries`. If `depth` is a vector in the `GeoSeries` object, then `value_depth` does not need to be specified.
ts = pyleo.Series(time = paleoTime, value = paleoValues, time_name = time_name,
                 value_name = value_name, time_unit = time_unit,
                 value_unit = value_unit, label = df['dataSetName'].iloc[0], verbose=False)


ensemble = pyleo.EnsembleSeries.from_AgeEnsembleArray(series = ts, age_array = chronValues, value_depth = paleoDepth,
                                                     age_depth = chronDepth, verbose = False)

And voila! We can leverage all of pyleoclim’s wonderful ensemble functionalities.

ensemble.common_time().plot_envelope(figsize=(16,10))
(<Figure size 1600x1000 with 1 Axes>,
 <Axes: xlabel='Time [Year yr AD]', ylabel='d18O [permil VPDB]'>)
../_images/7001904596068dc18a93c9050285a5131a39ca9239d59bdec3a76623933bff25.png
Exporting ensemble information to Numpy Arrays and Pandas DataFrame#

In some instances, it may be useful to return the age information stored in a LiPD file directly into a numpy.array or a pandas.DataFrame for use outside of Pyleoclim. You can do this using the to_array and to_dataframe methods.

Let’s recover the age ensemble and store it in a numpy.array. The to_array function takes the following arguments:

  • axis (str, [‘time’, ‘value’], optional) – Whether to return the ensemble along the value or time axis. The default is ‘value’.

  • labels (bool, [True,False], optional) – Whether to return a separate list with the timeseries labels. The default is True.

vals, headers = ensemble.to_array(axis='time')
vals
array([[2007., 2007., 2008., ..., 2007., 2007., 2007.],
       [2007., 2007., 2008., ..., 2007., 2007., 2007.],
       [2006., 2006., 2008., ..., 2006., 2006., 2006.],
       ...,
       [ 985.,  957.,  896., ..., 1000., 1005.,  655.],
       [ 985.,  957.,  895., ..., 1000., 1004.,  655.],
       [ 985.,  957.,  894., ..., 1000., 1004.,  654.]])

Another option is to return a pandas.DataFrame using the to_dataframe method.
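If you would rather assemble the frame yourself, the array returned by to_array converts directly; a short sketch using a stand-in array (the real vals above has one row per data point and one column per ensemble member):

```python
import numpy as np
import pandas as pd

# Stand-in for the (n_points x n_members) array returned by to_array
vals = np.array([[2007., 2007., 2008.],
                 [2006., 2006., 2008.],
                 [ 985.,  957.,  896.]])

# One column per ensemble member, labeled member_0, member_1, ...
ens_frame = pd.DataFrame(vals, columns=[f'member_{i}' for i in range(vals.shape[1])])
print(ens_frame.shape)  # (3, 3)
```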

Example from NOAA#

For this example we’ll use the MD98-2181 record from Khider et al. (2014). We will source the CSV file containing the age ensemble for this record’s SST values from the Linked Earth wiki, though a local CSV file could also be used with minimal changes.

#Loading the ensemble tables. Note that column 0 corresponds to depth

ensemble_url = 'https://wiki.linked.earth/wiki/images/7/79/MD982181.Khider.2014.chron1model1ensemble.csv'

ensemble_df = pd.read_csv(ensemble_url,header=None)

ensemble_df
0 1 2 3 4 5 6 7 8 9 ... 991 992 993 994 995 996 997 998 999 1000
0 1.0 207.301109 -21.273103 359.894201 3.950743 69.791337 195.918556 477.506539 254.071725 18.043039 ... 200.832011 29.981959 159.156188 -30.030879 10.998169 256.535747 232.055674 61.422668 -24.770834 475.549540
1 2.5 210.598385 -6.095219 363.537231 19.011842 79.372074 202.387822 478.016110 257.074060 29.545646 ... 207.932983 38.840758 167.586782 -18.344618 17.123170 262.956721 235.486451 71.243925 -18.329537 475.748343
2 5.0 216.040623 19.369267 369.630612 44.252513 95.396047 213.157541 478.839723 262.061321 48.814435 ... 219.792022 53.661308 181.704918 1.189714 27.295888 273.711759 241.180553 87.665771 -7.626766 476.046104
3 6.5 219.280420 34.728605 373.297041 59.463559 105.037349 219.613424 479.321568 265.045696 60.422642 ... 226.919006 62.580463 190.208029 12.937785 33.382424 280.190421 244.585561 97.544358 -1.220809 476.208644
4 7.0 220.356966 39.859076 374.520562 64.542743 108.254684 221.764597 479.480550 266.039429 64.298267 ... 229.296199 65.557071 193.046673 16.857451 35.409002 282.353374 245.719046 100.840599 0.912427 476.260687
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
332 827.0 9434.248215 9545.771005 9596.976203 9588.136195 9438.630060 9611.671864 9514.355046 9613.503470 9515.947093 ... 9441.197068 9509.291601 9584.105141 9487.163021 9432.573204 9528.250540 9577.714737 9442.864368 9570.410846 9500.091340
333 836.0 9662.543323 9749.147272 9818.449843 9818.782493 9638.010047 9835.377593 9714.793979 9837.351019 9721.120860 ... 9644.065902 9728.433304 9795.061924 9676.390919 9623.447075 9734.211473 9803.351304 9649.143650 9780.641448 9718.151948
334 839.0 9739.420208 9817.390937 9892.732202 9896.475674 9705.027332 9910.431322 9781.949176 9912.636525 9790.081021 ... 9712.162198 9802.252278 9865.803631 9739.802457 9687.383488 9803.283922 9879.292011 9718.456343 9851.235118 9791.439448
335 843.0 9842.408624 9908.664339 9992.061086 10000.572811 9794.731539 10010.805766 9871.703033 10013.435160 9882.382981 ... 9803.252697 9901.159253 9960.389782 9824.560618 9772.826640 9895.641914 9981.000988 9811.218375 9945.682572 9889.530993
336 846.0 9919.958299 9977.298254 10066.739084 10078.966925 9862.230432 10086.278763 9939.153975 10089.299381 9951.834789 ... 9871.758062 9975.645163 10031.496854 9888.262156 9837.032501 9965.076292 10057.571280 9881.008911 10016.722863 9963.337556

337 rows × 1001 columns

The format here is different from the one we encountered from the LiPD file, so we’ll do things a bit differently. Outside of LiPD (and sometimes within it), the format of age ensemble tables can vary considerably. We’ll show how to approach formatting this table in a minute. First, let’s look at the paleo data we’re working with.

url = 'https://www.ncei.noaa.gov/pub/data/paleo/contributions_by_author/khider2014/khider2014-sst.txt'


df = pd.read_csv(url, sep = '\t', skiprows=137)

df
depth_cm Mg/Ca-g.rub-w d18Og.rub-w age_calyrBP age-2.5% age-34% age-68% age-97.5% SST SST-2.5% SST-34% SST-68% SST-97.5% δ18Osw δ18Osw 2.5% δ18Osw 34% δ18Osw 68% δ18Osw 97.5%
0 1.0 5.21 -999.990 199 -63 96 237 428 27.80 26.88 27.63 28.04 28.74 -999.99 -999.99 -999.99 -999.99 -999.99
1 2.5 5.04 -2.830 205 -53 104 242 429 27.41 26.47 27.23 27.65 28.35 0.02 -0.32 -0.05 0.10 0.35
2 3.0 -999.99 -2.810 207 -50 107 244 430 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99
3 4.0 -999.99 -2.780 211 -44 112 248 432 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99
4 5.0 5.10 -2.788 216 -38 117 252 433 27.55 26.58 27.36 27.79 28.49 0.09 -0.25 0.01 0.17 0.43
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
450 831.0 -999.99 -2.580 9604 9450 9563 9649 9764 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99 -999.99
451 836.0 5.14 -2.650 9722 9565 9677 9770 9886 27.64 26.67 27.46 27.87 28.59 0.01 -0.33 -0.06 0.09 0.34
452 839.0 5.77 -2.620 9793 9634 9747 9844 9960 28.95 28.08 28.77 29.17 29.88 0.30 -0.04 0.23 0.38 0.63
453 843.0 5.45 -2.640 9888 9726 9839 9942 10059 28.30 27.42 28.14 28.53 29.19 0.13 -0.20 0.07 0.22 0.47
454 846.0 5.42 -2.530 9960 9794 9909 10015 10134 28.24 27.33 28.06 28.47 29.16 0.22 -0.10 0.15 0.31 0.56

455 rows × 18 columns

The missing value field was -999.99; let’s change it to NaN.

df.replace(-999.99, np.NaN, inplace=True)
df
depth_cm Mg/Ca-g.rub-w d18Og.rub-w age_calyrBP age-2.5% age-34% age-68% age-97.5% SST SST-2.5% SST-34% SST-68% SST-97.5% δ18Osw δ18Osw 2.5% δ18Osw 34% δ18Osw 68% δ18Osw 97.5%
0 1.0 5.21 NaN 199 -63 96 237 428 27.80 26.88 27.63 28.04 28.74 NaN NaN NaN NaN NaN
1 2.5 5.04 -2.830 205 -53 104 242 429 27.41 26.47 27.23 27.65 28.35 0.02 -0.32 -0.05 0.10 0.35
2 3.0 NaN -2.810 207 -50 107 244 430 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 4.0 NaN -2.780 211 -44 112 248 432 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 5.0 5.10 -2.788 216 -38 117 252 433 27.55 26.58 27.36 27.79 28.49 0.09 -0.25 0.01 0.17 0.43
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
450 831.0 NaN -2.580 9604 9450 9563 9649 9764 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
451 836.0 5.14 -2.650 9722 9565 9677 9770 9886 27.64 26.67 27.46 27.87 28.59 0.01 -0.33 -0.06 0.09 0.34
452 839.0 5.77 -2.620 9793 9634 9747 9844 9960 28.95 28.08 28.77 29.17 29.88 0.30 -0.04 0.23 0.38 0.63
453 843.0 5.45 -2.640 9888 9726 9839 9942 10059 28.30 27.42 28.14 28.53 29.19 0.13 -0.20 0.07 0.22 0.47
454 846.0 5.42 -2.530 9960 9794 9909 10015 10134 28.24 27.33 28.06 28.47 29.16 0.22 -0.10 0.15 0.31 0.56

455 rows × 18 columns

As in the Crystal Cave example, loading the data is fairly simple. Most of the legwork is getting our data into the appropriate format. We can pull the necessary paleo data from the NOAA dataframe. Note that we need to check the column names, as unusual syntax can sometimes throw a wrench in the works. In this example the depth column is in fact depth_cm, and the SST column name carries a trailing space ('SST ').

df.columns
Index(['depth_cm', 'Mg/Ca-g.rub-w', 'd18Og.rub-w', 'age_calyrBP', 'age-2.5%',
       'age-34%', 'age-68%', 'age-97.5%', 'SST ', 'SST-2.5%', 'SST-34%',
       'SST-68%', 'SST-97.5%', 'δ18Osw', 'δ18Osw 2.5%', 'δ18Osw 34%',
       'δ18Osw 68%', 'δ18Osw 97.5%'],
      dtype='object')
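A defensive option when column names carry stray whitespace like this is to strip it up front. Note that we do not do this below (the cells that follow index with the original 'SST ' name); a small self-contained sketch:

```python
import pandas as pd

raw = pd.DataFrame({'depth_cm': [1.0], 'SST ': [27.8]})  # note the trailing space

# Apply str.strip to every column name: 'SST ' -> 'SST'
clean = raw.rename(columns=str.strip)
print(list(clean.columns))  # ['depth_cm', 'SST']
```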

Let’s create a GeoSeries object (rather than a plain Series) as the basis for the EnsembleGeoSeries, to illustrate the functionality that takes depth from the GeoSeries object directly.

paleoValues = df['SST '].to_numpy().flatten()
paleoDepth = df['depth_cm'].to_numpy().flatten()
age = df['age_calyrBP'].to_numpy().flatten() 

value_name = 'SST'
value_unit = 'deg C'

time_name = 'Age'
time_unit = 'Year BP'

Now for the ensemble values.

ensemble_df
0 1 2 3 4 5 6 7 8 9 ... 991 992 993 994 995 996 997 998 999 1000
0 1.0 207.301109 -21.273103 359.894201 3.950743 69.791337 195.918556 477.506539 254.071725 18.043039 ... 200.832011 29.981959 159.156188 -30.030879 10.998169 256.535747 232.055674 61.422668 -24.770834 475.549540
1 2.5 210.598385 -6.095219 363.537231 19.011842 79.372074 202.387822 478.016110 257.074060 29.545646 ... 207.932983 38.840758 167.586782 -18.344618 17.123170 262.956721 235.486451 71.243925 -18.329537 475.748343
2 5.0 216.040623 19.369267 369.630612 44.252513 95.396047 213.157541 478.839723 262.061321 48.814435 ... 219.792022 53.661308 181.704918 1.189714 27.295888 273.711759 241.180553 87.665771 -7.626766 476.046104
3 6.5 219.280420 34.728605 373.297041 59.463559 105.037349 219.613424 479.321568 265.045696 60.422642 ... 226.919006 62.580463 190.208029 12.937785 33.382424 280.190421 244.585561 97.544358 -1.220809 476.208644
4 7.0 220.356966 39.859076 374.520562 64.542743 108.254684 221.764597 479.480550 266.039429 64.298267 ... 229.296199 65.557071 193.046673 16.857451 35.409002 282.353374 245.719046 100.840599 0.912427 476.260687
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
332 827.0 9434.248215 9545.771005 9596.976203 9588.136195 9438.630060 9611.671864 9514.355046 9613.503470 9515.947093 ... 9441.197068 9509.291601 9584.105141 9487.163021 9432.573204 9528.250540 9577.714737 9442.864368 9570.410846 9500.091340
333 836.0 9662.543323 9749.147272 9818.449843 9818.782493 9638.010047 9835.377593 9714.793979 9837.351019 9721.120860 ... 9644.065902 9728.433304 9795.061924 9676.390919 9623.447075 9734.211473 9803.351304 9649.143650 9780.641448 9718.151948
334 839.0 9739.420208 9817.390937 9892.732202 9896.475674 9705.027332 9910.431322 9781.949176 9912.636525 9790.081021 ... 9712.162198 9802.252278 9865.803631 9739.802457 9687.383488 9803.283922 9879.292011 9718.456343 9851.235118 9791.439448
335 843.0 9842.408624 9908.664339 9992.061086 10000.572811 9794.731539 10010.805766 9871.703033 10013.435160 9882.382981 ... 9803.252697 9901.159253 9960.389782 9824.560618 9772.826640 9895.641914 9981.000988 9811.218375 9945.682572 9889.530993
336 846.0 9919.958299 9977.298254 10066.739084 10078.966925 9862.230432 10086.278763 9939.153975 10089.299381 9951.834789 ... 9871.758062 9975.645163 10031.496854 9888.262156 9837.032501 9965.076292 10057.571280 9881.008911 10016.722863 9963.337556

337 rows × 1001 columns

chronDepth = ensemble_df[0].to_numpy() #The depth values in this case are stored in column 0

chronValues = ensemble_df[ensemble_df.columns.values[1:]].to_numpy() 

Checking that our dimensions are as we expect:

print(f'Num rows in chronValues: {chronValues.shape[0]}, Length of chronDepth: {len(chronDepth)}')
Num rows in chronValues: 337, Length of chronDepth: 337

Let’s create an EnsembleGeoSeries object from a GeoSeries object. The main advantage of doing so is handling NaNs in the timeseries: upon creation of the GeoSeries object, the time, depth, and value axes are aligned without missing values, ensuring consistency among these vectors when the EnsembleGeoSeries object is created:
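Conceptually, that alignment amounts to dropping, in lockstep, every index where any of the three vectors is missing. A minimal numpy sketch of the idea (not Pyleoclim's actual code):

```python
import numpy as np

# Hypothetical vectors with a NaN in the value series
time = np.array([199., 205., 207., 211.])
value = np.array([27.8, 27.41, np.nan, 27.55])
depth = np.array([1.0, 2.5, 3.0, 4.0])

# Keep only the indices where all three vectors are finite
mask = np.isfinite(time) & np.isfinite(value) & np.isfinite(depth)
time, value, depth = time[mask], value[mask], depth[mask]
print(len(time))  # 3
```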

ts = pyleo.GeoSeries(time = age, value = paleoValues, value_name = value_name, verbose=False,
                  time_name = time_name, value_unit = value_unit, time_unit = time_unit, depth = paleoDepth, lat = 6.45, lon = 125.83)

ensemble = pyleo.EnsembleGeoSeries.from_AgeEnsembleArray(geo_series = ts, age_array = chronValues,
                                                     age_depth = chronDepth, verbose = False)

And again we can leverage Pyleoclim’s ensemble capabilities!

ensemble.common_time().plot_envelope(figsize=(16,10))
(<Figure size 1600x1000 with 1 Axes>,
 <Axes: xlabel='Age [Year BP]', ylabel='SST [deg C]'>)
../_images/ad8d5897bd3d7e20c51f2fb18060a72f836ca54162e768710f55b1edce7ed652.png

Ensemble series from paleo ensemble#

In the following example we’ll create an EnsembleGeoSeries object where uncertainty is represented by variations in the value axis. This structure is more typically used to represent measurement uncertainty, though depending on the age modeling method used it can also be used to represent dating uncertainty.

In order to do this we will use the from_PaleoEnsembleArray method attached to the EnsembleGeoSeries class. Note that this method is also available from the EnsembleSeries class.
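Whichever axis carries the uncertainty, summarizing an ensemble ultimately comes down to taking quantiles across members at each point. A small numpy illustration of that idea on synthetic data (this is the concept behind plot_envelope, not its implementation):

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical value ensemble: 50 depths x 1000 members
ensemble_values = 28.0 + 0.5 * rng.standard_normal((50, 1000))

# Median and a 95% band across members, one value per depth
median = np.quantile(ensemble_values, 0.5, axis=1)
lo, hi = np.quantile(ensemble_values, [0.025, 0.975], axis=1)
print(median.shape, lo.shape, hi.shape)  # (50,) (50,) (50,)
```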

In order to get at a paleo based ensemble table we’ll create a custom PyLiPD query. If you want to learn more about how to construct these queries for yourself, we recommend checking out the PyLiPD Tutorials.

D = LiPD()

D.load(['../data/MD982181.Khider.2014.lpd'])
Loading 1 LiPD files
100%|██████████| 1/1 [00:00<00:00,  1.30it/s]
Loaded..

query = """
    PREFIX le: <http://linked.earth/ontology#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?datasetName ?ensembleTable ?ensembleVariableName ?ensembleVariableValues ?ensembleVariableUnits ?ensembleDepthName ?ensembleDepthValues ?ensembleDepthUnits ?notes ?methodobj ?methods
    WHERE {
        ?ds a le:Dataset .
        ?ds le:hasName ?datasetName .
            FILTER regex(?datasetName, "[dsname].*", "i").

        ?ds le:hasPaleoData ?paleo .
        ?paleo le:modeledBy ?model .
        ?model le:hasEnsembleTable ?ensembleTable .
            OPTIONAL{?ensembleTable le:notes ?notes .}

        ?ensembleTable le:hasVariable ?ensvar .
        ?ensvar le:hasName ?ensembleVariableName .
            FILTER regex(lcase(?ensembleVariableName), "[ensembleVarName].*", "i").
        ?ensvar le:hasValues ?ensembleVariableValues .
        OPTIONAL{?ensvar le:hasUnits ?ensembleVariableUnitsObj .
                ?ensembleVariableUnitsObj rdfs:label ?ensembleVariableUnits .
                VALUES ?ensembleVariableUnits {"deg C"} .}

        ?ensembleTable le:hasVariable ?ensdepthvar .
        ?ensdepthvar le:hasName ?ensembleDepthName .
            FILTER regex(lcase(?ensembleDepthName), "[ensembleDepthVarName].*", "i").
            FILTER (?ensembleDepthName = 'depth')
        ?ensdepthvar le:hasValues ?ensembleDepthValues .
            OPTIONAL{?ensdepthvar le:hasUnits ?ensembleDepthUnitsObj .
                    ?ensembleDepthUnitsObj rdfs:label ?ensembleDepthUnits.}
    
       }
"""

_,ens_df = D.query(query)

ens_df
datasetName ensembleTable ensembleVariableName ensembleVariableValues ensembleVariableUnits ensembleDepthName ensembleDepthValues ensembleDepthUnits notes methodobj methods
0 MD982181.Khider.2014 http://linked.earth/lipd/paleo0model0ensemble0 realization 1 [[1.0, 28.2528873873423, 27.6829378220138, 28.... deg C depth [1.0, 2.5, 5.0, 6.5, 7.0, 8.0, 11.5, 13.0, 15.... cm SST ensemble from the Bayesian calibration des... None None
1 MD982181.Khider.2014 http://linked.earth/lipd/paleo0model0ensemble0 depth [1.0, 2.5, 5.0, 6.5, 7.0, 8.0, 11.5, 13.0, 15.... None depth [1.0, 2.5, 5.0, 6.5, 7.0, 8.0, 11.5, 13.0, 15.... cm SST ensemble from the Bayesian calibration des... None None
2 MD982181.Khider.2014 http://linked.earth/lipd/paleo0model1ensemble0 realization 1 [[2.5, -0.1929763844799, -0.266404906905685, 0... None depth [2.5, 5.0, 6.5, 7.0, 8.0, 11.5, 13.0, 15.0, 16... cm d18Osw ensemble from the Bayesian calibration ... None None
3 MD982181.Khider.2014 http://linked.earth/lipd/paleo0model1ensemble0 depth [2.5, 5.0, 6.5, 7.0, 8.0, 11.5, 13.0, 15.0, 16... None depth [2.5, 5.0, 6.5, 7.0, 8.0, 11.5, 13.0, 15.0, 16... cm d18Osw ensemble from the Bayesian calibration ... None None

Having returned a reasonable-looking dataframe, we’ll pull the data we need. We have to apply some transformations to get the values into proper array format, since LiPD stores them as strings. Built-in PyLiPD functions take care of this for you; we only have to worry about it here because we’re using a custom query.

Everything else we do after this is almost identical to our other examples; the only difference is that paleoValues is now a matrix and chronValues is a vector.

ens_df['ensembleVariableValues'] = ens_df['ensembleVariableValues'].apply(lambda row : np.array(ast.literal_eval(row)))
ens_df['ensembleDepthValues'] = ens_df['ensembleDepthValues'].apply(lambda row : np.array(ast.literal_eval(row)))
paleo_df = D.get_timeseries_essentials()
paleo_row = paleo_df[paleo_df['paleoData_variableName']=='sst']
paleo_row
dataSetName archiveType geo_meanLat geo_meanLon geo_meanElev paleoData_variableName paleoData_values paleoData_units paleoData_proxy paleoData_proxyGeneral time_variableName time_values time_units depth_variableName depth_values depth_units
6 MD982181.Khider.2014 Marine sediment 6.45 125.83 -2114.0 sst [27.8, 27.41, nan, nan, 27.55, 27.16, 27.35, 2... deg C None None age_calyrbp [199.0, 205.0, 207.0, 211.0, 216.0, 222.0, 224... yr BP depth_cm [1.0, 2.5, 3.0, 4.0, 5.0, 6.5, 7.0, 8.0, 9.0, ... cm
paleoValues = ens_df[ens_df['ensembleVariableUnits']=='deg C']['ensembleVariableValues'].to_numpy()[0] #Select the realization row (units 'deg C'), leaving out the depth rows
paleoDepth = ens_df[ens_df['ensembleVariableUnits']=='deg C']['ensembleDepthValues'].to_numpy()[0]

value_name = "SST"
value_unit = "deg C"

chronValues = paleo_row['time_values'].to_numpy()[0]
chronDepth = paleo_row['depth_values'].to_numpy()[0]

time_name = 'Time'
time_unit = 'Years BP'

Let’s create a GeoSeries object as the basis for our EnsembleGeoSeries object:

ts = pyleo.GeoSeries(time = chronValues, value = paleo_row['paleoData_values'].iloc[0], 
                     time_name =  time_name, time_unit = time_unit, verbose=False,
                     value_name = value_name, value_unit = value_unit,
                     label = 'MD98-2181', archiveType = 'Marine sediment',
                     depth = chronDepth, depth_name = 'Depth', depth_unit = 'cm', lat = paleo_df['geo_meanLat'].iloc[0],
                    lon = paleo_df['geo_meanLon'].iloc[0])

ensemble = pyleo.EnsembleGeoSeries.from_PaleoEnsembleArray(geo_series = ts, paleo_array = paleoValues, age_depth = ts.depth, paleo_depth = paleoDepth)

We can then apply ensemble methods, as usual.

ensemble.common_time().spectral().plot_envelope(figsize=(16,10))
Performing spectral analysis on individual series: 100%|██████████| 1000/1000 [00:05<00:00, 178.28it/s]
(<Figure size 1600x1000 with 1 Axes>,
 <Axes: xlabel='Period [Years BP]', ylabel='PSD'>)
../_images/5e1442dac23784315970b0d5bc9b02c7dd338112be16679e0e61b5bfb3a91ba5.png
%watermark -n -u -v -iv -w
Last updated: Mon Aug 26 2024

Python implementation: CPython
Python version       : 3.10.14
IPython version      : 8.24.0

pyleoclim: 1.0.0b0
numpy    : 1.26.4
pandas   : 2.1.4

Watermark: 2.4.3