Basic manipulation of pylipd.LiPD objects#

Authors#

by Deborah Khider

Preamble#

Goals:#

  • Extract a LiPD time series for analysis

  • Remove/pop LiPD datasets from an existing LiPD object

Reading Time: 5 minutes

Keywords#

LiPD; query

Pre-requisites#

None. This tutorial assumes basic knowledge of Python and Pandas. If you are not familiar with this coding language and the Pandas library, check out this tutorial: http://linked.earth/ec_workshops_py/.

Relevant Packages#

Pandas, pylipd

Data Description#

This notebook uses the following datasets, in LiPD format:

  • Nurhati, I. S., Cobb, K. M., & Di Lorenzo, E. (2011). Decadal-scale SST and salinity variations in the central tropical Pacific: Signatures of natural and anthropogenic climate change. Journal of Climate, 24(13), 3294–3308. doi:10.1175/2011jcli3852.1

  • PAGES2k Consortium (2017): A global multiproxy database for temperature reconstructions of the Common Era. Sci Data 4, 170088. doi:10.1038/sdata.2017.88

from pylipd.lipd import LiPD

Demonstration#

Extract time series data from LiPD formatted datasets#

If you are familiar with the R utilities, one useful function is the ability to expand “timeseries” structures. This capability was also present in the previous iteration of the Python utilities, and PyLiPD retains it to ease the transition.

If you’re unsure about what a “timeseries” is in the LiPD context, read this page.

Working with one dataset#

First, let’s load a single dataset:

data_path = '../data/Ocn-Palmyra.Nurhati.2011.lpd'
D = LiPD()
D.load(data_path)
Loading 1 LiPD files
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 36.64it/s]
Loaded..

Now let’s get all the timeseries for this dataset. Note that the get_timeseries function requires the dataset names to be passed explicitly, which is useful if you only want to expand a single dataset from your LiPD object. You can also use the get_all_dataset_names function in the call to expand all datasets:

ts_list = D.get_timeseries(D.get_all_dataset_names())

type(ts_list)
Extracting timeseries from dataset: Ocn-Palmyra.Nurhati.2011 ...
dict

Note that the above function returns a dictionary that organizes the extracted timeseries by dataset name:

ts_list.keys()
dict_keys(['Ocn-Palmyra.Nurhati.2011'])

Each timeseries is then stored in a list of dictionaries, each preserving the essential metadata for a time/depth and value pair:

type(ts_list['Ocn-Palmyra.Nurhati.2011'])
list
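For instance, you can peek at the first dictionary in that list to see which metadata fields were expanded (a quick check; the exact keys vary with the dataset):

first_ts = ts_list['Ocn-Palmyra.Nurhati.2011'][0]  # first expanded timeseries
sorted(first_ts.keys())[:10]  # show only the first ten keys, for brevity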

Although the information is present, it is not easy to navigate or query across the various lists. One simple way of doing so is to convert the list into a pandas.DataFrame:

ts_list, df = D.get_timeseries(D.get_all_dataset_names(), to_dataframe=True)

df
Extracting timeseries from dataset: Ocn-Palmyra.Nurhati.2011 ...
mode time_id archiveType originalDataURL dataContributor dataSetName geo_meanLon geo_meanLat geo_meanElev geo_type ... paleoData_proxyObservationType paleoData_sensorGenus paleoData_notes paleoData_proxy paleoData_iso2kUI paleoData_interpretation paleoData_qCCertification paleoData_ocean2kID paleoData_inCompilation paleoData_pages2kID
0 paleoData age Coral http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=... {'name': 'HLF MNE'} Ocn-Palmyra.Nurhati.2011 -162.13 5.87 -10.0 Feature ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 paleoData age Coral http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=... {'name': 'HLF MNE'} Ocn-Palmyra.Nurhati.2011 -162.13 5.87 -10.0 Feature ... d18O Porites d18Osw (residuals calculated from coupled SrCa... d18O NaN NaN NaN NaN NaN NaN
2 paleoData age Coral http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=... {'name': 'HLF MNE'} Ocn-Palmyra.Nurhati.2011 -162.13 5.87 -10.0 Feature ... Sr/Ca Porites ; paleoData_variableName changed - was origina... Sr/Ca CO11NUPM01BT1 [{'scope': 'climate', 'variableDetail': 'sea@s... MNE, NJA PacificNurhati2011 Ocean2k_v1.0.0 Ocn_129
3 paleoData age Coral http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=... {'name': 'HLF MNE'} Ocn-Palmyra.Nurhati.2011 -162.13 5.87 -10.0 Feature ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 paleoData age Coral http://hurricane.ncdc.noaa.gov/pls/paleox/f?p=... {'name': 'HLF MNE'} Ocn-Palmyra.Nurhati.2011 -162.13 5.87 -10.0 Feature ... d18O Porites Duplicate of modern d18O record presented in C... d18O CO11NUPM01B [{'scope': 'climate', 'variableDetail': 'sea_s... MNE, NJA NaN NaN NaN

5 rows × 88 columns

You can now use all the pandas functionality for filtering and querying dataframes. First, let’s have a look at the available properties, which correspond to the column headers:

df.columns
Index(['mode', 'time_id', 'lipdVersion', 'googleDataURL',
       'googleSpreadSheetKey', 'pub1_author', 'pub1_url', 'pub1_urldate',
       'pub1_institution', 'pub1_citeKey', 'pub1_title', 'pub1_DOI',
       'pub2_author', 'pub2_year', 'pub2_doi', 'pub2_title', 'pub2_hasLink',
       'pub2_publisher', 'pub2_volume', 'pub2_citeKey', 'pub2_pages',
       'pub2_dataUrl', 'pub2_journal', 'pub2_DOI', 'pub3_author', 'pub3_title',
       'pub3_hasLink', 'pub3_dataUrl', 'pub3_journal', 'pub3_issue',
       'pub3_pages', 'pub3_doi', 'pub3_citeKey', 'pub3_volume',
       'pub3_publisher', 'pub3_year', 'pub3_DOI', 'dataContributor',
       'originalDataURL', 'dataSetName', 'geo_meanLon', 'geo_meanLat',
       'geo_meanElev', 'geo_type', 'geo_ocean', 'geo_pages2kRegion',
       'geo_siteName', 'hasUrl', 'studyName', 'createdBy',
       'googleMetadataWorksheet', 'archiveType', 'tableType',
       'paleoData_measurementTableMD5', 'paleoData_paleoDataTableName',
       'paleoData_filename', 'paleoData_googleWorkSheetKey',
       'paleoData_measurementTableName', 'year', 'yearUnits',
       'paleoData_archiveType', 'paleoData_hasMeanValue',
       'paleoData_hasMinValue', 'paleoData_inferredVariableType',
       'paleoData_TSid', 'paleoData_resolution_hasMeanValue',
       'paleoData_resolution_hasMinValue',
       'paleoData_resolution_hasMedianValue',
       'paleoData_resolution_hasMaxValue', 'paleoData_resolution_units',
       'paleoData_dataType', 'paleoData_hasMedianValue',
       'paleoData_variableType', 'paleoData_number', 'paleoData_missingValue',
       'paleoData_variableName', 'paleoData_wDSPaleoUrl',
       'paleoData_description', 'paleoData_hasMaxValue', 'paleoData_units',
       'paleoData_values', 'paleoData_proxyObservationType',
       'paleoData_useInGlobalTemperatureAnalysis', 'paleoData_notes',
       'paleoData_sensorGenus', 'paleoData_sensorSpecies', 'paleoData_proxy',
       'paleoData_compilation_nest', 'paleoData_interpretation',
       'paleoData_inCompilation'],
      dtype='object')

Let’s have a look at the paleoData_variableName column to see what’s available:

df['paleoData_variableName']
0     year
1     d18O
2     d18O
3     year
4    Sr/Ca
Name: paleoData_variableName, dtype: object

All columns get extracted, which is why year appears as a paleo variable, with its associated values stored in paleoData_values. Notice that there are also two variables named d18O. Since this is a coral record, it stands to reason that one corresponds to the measured \(\delta^{18}O\) of the coral and the other to the \(\delta^{18}O\) of the seawater. Let’s have a look at the notes field:

df[['paleoData_variableName','paleoData_notes']]
paleoData_variableName paleoData_notes
0 year NaN
1 d18O d18Osw (residuals calculated from coupled SrCa...
2 d18O Duplicate of modern d18O record presented in C...
3 year NaN
4 Sr/Ca ; paleoData_variableName changed - was origina...

Indeed, one is the measurement on the coral and the other on the seawater. Querying is hardly necessary on such a small dataset; however, it becomes useful when looking at a collection of files, as shown in the next example (working with multiple datasets).
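For instance, one quick way to disambiguate the two d18O series with pandas is to match on the notes field (a sketch; the 'd18Osw' substring is specific to this dataset):

# Keep only the seawater d18O record by matching on the notes field
mask = (df['paleoData_variableName'] == 'd18O') & df['paleoData_notes'].str.contains('d18Osw', na=False)
df.loc[mask, ['paleoData_variableName', 'paleoData_notes']]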

To extract by row index (here extracting the Sr/Ca record):

df_cut = df.iloc[4,:]

df_cut
mode                                                                  paleoData
time_id                                                                     age
lipdVersion                                                                 1.3
googleDataURL                 https://docs.google.com/spreadsheets/d/1AIFOcq...
googleSpreadSheetKey               1AIFOcqDtbZ5O4YCCnH5K-lNdzmI1UNmGnUpCJFNN-YQ
                                                    ...                        
paleoData_sensorSpecies                                                   lutea
paleoData_proxy                                                           Sr/Ca
paleoData_compilation_nest                                             MNE, NJA
paleoData_interpretation      [{'scope': 'climate', 'variableDetail': 'sea@s...
paleoData_inCompilation                                          Ocean2k_v1.0.0
Name: 4, Length: 90, dtype: object
df_cut['paleoData_variableName']
'Sr/Ca'

This can be very useful when working with the Pyleoclim software, since a pyleoclim.Series can be initialized from the information contained in df_cut. Working with PyLiPD and Pyleoclim together is the subject of several other tutorials.
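As a rough sketch of what that hand-off could look like (assuming pyleoclim is installed; the dedicated tutorials cover the recommended workflow in detail):

import pyleoclim as pyleo

# Build a Series from the row extracted above (df_cut holds the Sr/Ca record)
ts = pyleo.Series(
    time=df_cut['year'],                       # time axis from the 'year' column
    value=df_cut['paleoData_values'],          # measured values
    time_name='Time',
    time_unit=df_cut['yearUnits'],
    value_name=df_cut['paleoData_variableName'],
    value_unit=df_cut['paleoData_units'],
    label=df_cut['dataSetName'],
)
ts.plot()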

Working with such a large dataframe can be overwhelming and is often not needed. Therefore, PyLiPD has a nifty function called get_timeseries_essentials that grabs the essential information for each record: the dataset name, its geographical location, the time/depth values and units, and the variable information, including archive and proxy types:

df_essential = D.get_timeseries_essentials()

df_essential
dataSetName archiveType geo_meanLat geo_meanLon geo_meanElev paleoData_variableName paleoData_values paleoData_units paleoData_proxy paleoData_proxyGeneral time_variableName time_values time_units depth_variableName depth_values depth_units
0 Ocn-Palmyra.Nurhati.2011 Coral 5.87 -162.13 -10.0 d18O [-5.41, -5.47, -5.49, -5.43, -5.48, -5.53, -5.... permil d18O None year [1998.29, 1998.21, 1998.13, 1998.04, 1997.96, ... yr AD None None None
1 Ocn-Palmyra.Nurhati.2011 Coral 5.87 -162.13 -10.0 d18O [0.39, 0.35, 0.35, 0.35, 0.36, 0.22, 0.33, 0.3... permil d18O None year [1998.21, 1998.13, 1998.04, 1997.96, 1997.88, ... yr AD None None None
2 Ocn-Palmyra.Nurhati.2011 Coral 5.87 -162.13 -10.0 Sr_Ca [8.96, 8.9, 8.91, 8.94, 8.92, 8.89, 8.87, 8.81... mmol/mol Sr/Ca None year [1998.29, 1998.21, 1998.13, 1998.04, 1997.96, ... yr AD None None None

The metadata (i.e., the column names) available through this function will always remain the same and are as follows:

df_essential.columns
Index(['dataSetName', 'archiveType', 'geo_meanLat', 'geo_meanLon',
       'geo_meanElev', 'paleoData_variableName', 'paleoData_values',
       'paleoData_units', 'paleoData_proxy', 'paleoData_proxyGeneral',
       'time_variableName', 'time_values', 'time_units', 'depth_variableName',
       'depth_values', 'depth_units'],
      dtype='object')
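Because these columns never change, downstream code can rely on them directly. For instance, here is a minimal matplotlib sketch plotting the Sr/Ca record from the third row (styling kept deliberately simple):

import matplotlib.pyplot as plt

row = df_essential.iloc[2]  # the Sr/Ca record
plt.plot(row['time_values'], row['paleoData_values'])
plt.xlabel(f"{row['time_variableName']} ({row['time_units']})")
plt.ylabel(f"{row['paleoData_variableName']} ({row['paleoData_units']})")
plt.title(row['dataSetName'])
plt.show()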

Working with multiple datasets#

path = '../data/Pages2k/'

D_dir = LiPD()
D_dir.load_from_dir(path)
Loading 16 LiPD files
100%|██████████████████████████████████████████| 16/16 [00:00<00:00, 112.46it/s]
Loaded..

Let’s expand into our essential dataframe:

df_dir = D_dir.get_timeseries_essentials()

Let’s have a look at the dataframe:

df_dir.head()
dataSetName archiveType geo_meanLat geo_meanLon geo_meanElev paleoData_variableName paleoData_values paleoData_units paleoData_proxy paleoData_proxyGeneral time_variableName time_values time_units depth_variableName depth_values depth_units
0 Ocn-RedSea.Felis.2000 Coral 27.8500 34.3200 -6.0 d18O [-4.12, -3.82, -3.05, -3.02, -3.62, -3.96, -3.... permil d18O None year [1995.583, 1995.417, 1995.25, 1995.083, 1994.9... yr AD None None None
1 Ant-WAIS-Divide.Severinghaus.2012 Borehole -79.4630 -112.1250 1766.0 uncertainty_temperature [1.327, 1.328, 1.328, 1.329, 1.33, 1.33, 1.331... degC None None year [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,... yr AD None None None
2 Ant-WAIS-Divide.Severinghaus.2012 Borehole -79.4630 -112.1250 1766.0 temperature [-29.607, -29.607, -29.606, -29.606, -29.605, ... degC borehole None year [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,... yr AD None None None
3 Asi-SourthAndMiddleUrals.Demezhko.2007 Borehole 55.0000 59.5000 1900.0 temperature [0.166, 0.264, 0.354, 0.447, 0.538, 0.62, 0.68... degC borehole None year [800, 850, 900, 950, 1000, 1050, 1100, 1150, 1... yr AD None None None
4 Ocn-AlboranSea436B.Nieto-Moreno.2013 Marine sediment 36.2053 -4.3133 -1108.0 temperature [18.79, 19.38, 19.61, 18.88, 18.74, 19.25, 18.... degC alkenone None year [1999.07, 1993.12, 1987.17, 1975.26, 1963.36, ... yr AD None None None

The size of this dataframe is:

df_dir.shape
(25, 16)

So we expanded into 25 timeseries.
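Before filtering, a quick pandas summary shows how those records break down by archive type:

# Count the expanded timeseries per archive type
df_dir['archiveType'].value_counts()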

Let’s have a look at the available variables:

df_dir['paleoData_variableName'].unique()
array(['d18O', 'uncertainty_temperature', 'temperature', 'Mg_Ca', 'notes',
       'Uk37', 'trsgi', 'MXD'], dtype=object)

Let’s assume we are only interested in the temperature data:

df_temp = df_dir[df_dir['paleoData_variableName']=='temperature']
df_temp.head()
dataSetName archiveType geo_meanLat geo_meanLon geo_meanElev paleoData_variableName paleoData_values paleoData_units paleoData_proxy paleoData_proxyGeneral time_variableName time_values time_units depth_variableName depth_values depth_units
2 Ant-WAIS-Divide.Severinghaus.2012 Borehole -79.4630 -112.1250 1766.0 temperature [-29.607, -29.607, -29.606, -29.606, -29.605, ... degC borehole None year [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,... yr AD None None None
3 Asi-SourthAndMiddleUrals.Demezhko.2007 Borehole 55.0000 59.5000 1900.0 temperature [0.166, 0.264, 0.354, 0.447, 0.538, 0.62, 0.68... degC borehole None year [800, 850, 900, 950, 1000, 1050, 1100, 1150, 1... yr AD None None None
4 Ocn-AlboranSea436B.Nieto-Moreno.2013 Marine sediment 36.2053 -4.3133 -1108.0 temperature [18.79, 19.38, 19.61, 18.88, 18.74, 19.25, 18.... degC alkenone None year [1999.07, 1993.12, 1987.17, 1975.26, 1963.36, ... yr AD None None None
6 Ocn-FeniDrift.Richter.2009 Marine sediment 55.5000 -13.9000 -2543.0 temperature [12.94, 10.99, 10.53, 10.44, 11.39, 13.38, 10.... degC None None year [1998, 1987, 1975, 1962, 1949, 1936, 1924, 191... yr AD depth_bottom [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, ... cm
7 Ocn-FeniDrift.Richter.2009 Marine sediment 55.5000 -13.9000 -2543.0 temperature [12.94, 10.99, 10.53, 10.44, 11.39, 13.38, 10.... degC None None year [1998, 1987, 1975, 1962, 1949, 1936, 1924, 191... yr AD depth_top [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, ... cm
df_temp.shape
(11, 16)

which leaves us with 11 timeseries.

Let’s assume that you want everything that is not related to time, depth, and uncertainty. To keep the rows that are relevant to our problem, you can use the DataFrame.query function available in Pandas:

df_filt = df_dir.query("paleoData_variableName in ('temperature','MXD','Mg_Ca','d18O','trsgi', 'Uk37')")
df_filt.head()
dataSetName archiveType geo_meanLat geo_meanLon geo_meanElev paleoData_variableName paleoData_values paleoData_units paleoData_proxy paleoData_proxyGeneral time_variableName time_values time_units depth_variableName depth_values depth_units
0 Ocn-RedSea.Felis.2000 Coral 27.8500 34.3200 -6.0 d18O [-4.12, -3.82, -3.05, -3.02, -3.62, -3.96, -3.... permil d18O None year [1995.583, 1995.417, 1995.25, 1995.083, 1994.9... yr AD None None None
2 Ant-WAIS-Divide.Severinghaus.2012 Borehole -79.4630 -112.1250 1766.0 temperature [-29.607, -29.607, -29.606, -29.606, -29.605, ... degC borehole None year [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,... yr AD None None None
3 Asi-SourthAndMiddleUrals.Demezhko.2007 Borehole 55.0000 59.5000 1900.0 temperature [0.166, 0.264, 0.354, 0.447, 0.538, 0.62, 0.68... degC borehole None year [800, 850, 900, 950, 1000, 1050, 1100, 1150, 1... yr AD None None None
4 Ocn-AlboranSea436B.Nieto-Moreno.2013 Marine sediment 36.2053 -4.3133 -1108.0 temperature [18.79, 19.38, 19.61, 18.88, 18.74, 19.25, 18.... degC alkenone None year [1999.07, 1993.12, 1987.17, 1975.26, 1963.36, ... yr AD None None None
5 Eur-SpannagelCave.Mangini.2005 Speleothem 47.1000 11.6000 2347.0 d18O [-7.49, -7.41, -7.36, -7.15, -7.28, -6.99, -6.... permil d18O None year [1935.0, 1932.0, 1930.0, 1929.0, 1929.0, 1928.... yr AD None None None
df_filt.shape
(22, 16)

which leaves us with 22 timeseries.
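An equivalent way to express this filter is by exclusion, keeping everything except the unwanted variables (a sketch using isin; it should return the same 22 rows):

# Same selection, expressed by excluding the variables we do not want
exclude = ['uncertainty_temperature', 'notes']
df_filt2 = df_dir[~df_dir['paleoData_variableName'].isin(exclude)]
df_filt2.shape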

Removing and popping datasets out of a LiPD object#

You can also remove (i.e., delete the corresponding dataset from the LiPD object) or pop (i.e., delete the corresponding dataset from the LiPD object and return it) datasets from a LiPD object. Note that these functionalities behave similarly to the methods of the same names on Python lists. These functions underpin more advanced filtering and querying capabilities that we will discuss in later tutorials.

First let’s make a copy of D_dir:

D_test = D_dir.copy()
print(D_test.get_all_dataset_names())
['Ocn-RedSea.Felis.2000', 'Ant-WAIS-Divide.Severinghaus.2012', 'Asi-SourthAndMiddleUrals.Demezhko.2007', 'Ocn-AlboranSea436B.Nieto-Moreno.2013', 'Eur-SpannagelCave.Mangini.2005', 'Ocn-FeniDrift.Richter.2009', 'Eur-LakeSilvaplana.Trachsel.2010', 'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006', 'Ocn-SinaiPeninsula_RedSea.Moustafa.2000', 'Eur-NorthernSpain.Martin-Chivelet.2011', 'Arc-Kongressvatnet.D_Andrea.2012', 'Eur-CoastofPortugal.Abrantes.2011', 'Eur-SpanishPyrenees.Dorado-Linan.2012', 'Eur-FinnishLakelands.Helama.2014', 'Eur-NorthernScandinavia.Esper.2012', 'Eur-Stockholm.Leijonhufvud.2009']

And let’s remove Eur-Stockholm.Leijonhufvud.2009, which corresponds to the last entry in the list above:

D_test.remove('Eur-Stockholm.Leijonhufvud.2009')

print(D_test.get_all_dataset_names())
['Ocn-RedSea.Felis.2000', 'Ant-WAIS-Divide.Severinghaus.2012', 'Asi-SourthAndMiddleUrals.Demezhko.2007', 'Ocn-AlboranSea436B.Nieto-Moreno.2013', 'Eur-SpannagelCave.Mangini.2005', 'Ocn-FeniDrift.Richter.2009', 'Eur-LakeSilvaplana.Trachsel.2010', 'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006', 'Ocn-SinaiPeninsula_RedSea.Moustafa.2000', 'Eur-NorthernSpain.Martin-Chivelet.2011', 'Arc-Kongressvatnet.D_Andrea.2012', 'Eur-CoastofPortugal.Abrantes.2011', 'Eur-SpanishPyrenees.Dorado-Linan.2012', 'Eur-FinnishLakelands.Helama.2014', 'Eur-NorthernScandinavia.Esper.2012']

Now let’s pop Eur-NorthernScandinavia.Esper.2012 from D_test:

d_eur = D_test.pop('Eur-NorthernScandinavia.Esper.2012')

Now let’s have a look at d_eur:

print(d_eur.get_all_dataset_names())
['Eur-NorthernScandinavia.Esper.2012']

It contains the dataset we are expecting. Let’s have a look at D_test:

print(D_test.get_all_dataset_names())
['Ocn-RedSea.Felis.2000', 'Ant-WAIS-Divide.Severinghaus.2012', 'Asi-SourthAndMiddleUrals.Demezhko.2007', 'Ocn-AlboranSea436B.Nieto-Moreno.2013', 'Eur-SpannagelCave.Mangini.2005', 'Ocn-FeniDrift.Richter.2009', 'Eur-LakeSilvaplana.Trachsel.2010', 'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006', 'Ocn-SinaiPeninsula_RedSea.Moustafa.2000', 'Eur-NorthernSpain.Martin-Chivelet.2011', 'Arc-Kongressvatnet.D_Andrea.2012', 'Eur-CoastofPortugal.Abrantes.2011', 'Eur-SpanishPyrenees.Dorado-Linan.2012', 'Eur-FinnishLakelands.Helama.2014']

The dataset was removed from D_test in the process. Hence, it is always prudent to make a copy of the original object when using the remove and pop functionalities.

You can also remove/pop more than one dataset at a time:

rem = ['Ocn-RedSea.Felis.2000','Ant-WAIS-Divide.Severinghaus.2012']

D_test.remove(rem)
print(D_test.get_all_dataset_names())
['Asi-SourthAndMiddleUrals.Demezhko.2007', 'Ocn-AlboranSea436B.Nieto-Moreno.2013', 'Eur-SpannagelCave.Mangini.2005', 'Ocn-FeniDrift.Richter.2009', 'Eur-LakeSilvaplana.Trachsel.2010', 'Ocn-PedradeLume-CapeVerdeIslands.Moses.2006', 'Ocn-SinaiPeninsula_RedSea.Moustafa.2000', 'Eur-NorthernSpain.Martin-Chivelet.2011', 'Arc-Kongressvatnet.D_Andrea.2012', 'Eur-CoastofPortugal.Abrantes.2011', 'Eur-SpanishPyrenees.Dorado-Linan.2012', 'Eur-FinnishLakelands.Helama.2014']