Understanding Data Providers: NOAA & PANGAEA

Overview

PyleoTUPS integrates with two major paleoclimate data repositories to provide researchers with unified access to paleoclimate datasets. Understanding how these repositories work is essential for effectively using PyleoTUPS.

Data Provider:

In PyleoTUPS, a “Data Provider” is a paleoclimate repository that:

PyleoTUPS works with two data providers:

a. NOAA
b. PANGAEA

PyleoTUPS acts as a bridge between you and these repositories, handling API calls, data parsing, and format conversion so you don’t have to.


NOAA NCEI Paleoclimate Database

What is NOAA?

The National Oceanic and Atmospheric Administration (NOAA), through its National Centers for Environmental Information (NCEI), hosts the World Data Service for Paleoclimatology, one of the world’s largest collections of paleoclimate data.

Understanding the NOAA Data Structure

Study (or "Individual Dataset")
├── Sites (with geographic coordinates)
│   └── Paleo Data 
│       └── Data Tables (spreadsheet-like table)
│           └── Files (text, CSV, Excel)
└── Metadata
    ├── Authors/Investigators
    ├── Funding Information
    ├── Publication Citation
    └── Links to raw files

Key Concepts:

NOAA datasets are organized hierarchically:

[noaa_ER_diagram.png]

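Entity Relations: In NOAA, data is organized in a hierarchical, one-to-many structure: a study can contain multiple sites, a site can contain multiple paleo data entries, and each paleo data entry can contain multiple data tables.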
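The one-to-many hierarchy can be sketched with plain dataclasses. This is a minimal illustration only, not PyleoTUPS' internal model: the class names, field names, IDs, and URLs below are all hypothetical. It also shows how such a hierarchy could be flattened into a lookup from data table ID to file URLs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataTable:
    table_id: str
    file_urls: list[str] = field(default_factory=list)


@dataclass
class PaleoData:
    tables: list[DataTable] = field(default_factory=list)


@dataclass
class Site:
    name: str
    lat: float
    lon: float
    paleo_data: list[PaleoData] = field(default_factory=list)


@dataclass
class Study:
    study_id: str
    sites: list[Site] = field(default_factory=list)


def index_files(study: Study) -> dict[str, list[str]]:
    """Flatten the hierarchy into a table_id -> file URLs lookup."""
    return {
        table.table_id: table.file_urls
        for site in study.sites
        for paleo in site.paleo_data
        for table in paleo.tables
    }


# Hypothetical study: one site, one paleo data entry, two data tables.
study = Study(
    study_id="study-1",
    sites=[
        Site(
            name="Example Lake",
            lat=35.0,
            lon=-90.0,
            paleo_data=[
                PaleoData(
                    tables=[
                        DataTable("table-a", ["https://example.org/a.txt"]),
                        DataTable("table-b", ["https://example.org/b.csv"]),
                    ]
                )
            ],
        )
    ],
)

file_index = index_files(study)
```

With an index like this, finding the files for a data table is a single dictionary lookup rather than a walk through every study, site, and paleo data entry.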

NOAA API Endpoints

PyleoTUPS uses the NOAA NCEI Paleo Study Search API:

Base URL: https://www.ncei.noaa.gov/access/paleo-search/api/study/search.json

The API accepts a rich set of query parameters [View complete list here]:

| Category | Parameter | Example |
|---|---|---|
| Identifiers | noaa_id, xml_id | noaa_id=13156 |
| Text | search_text | search_text="younger dryas" |
| People | investigators | investigators="Smith, JS" |
| Location | locations, min_lat, max_lat, min_lon, max_lon | min_lat=30, max_lat=40 |
| Data Type | data_type_id | 4 (Corals), 18 (Tree Ring) |
| Variables | variable_name (cvWhats), cv_materials, cv_seasonalities | variable_name="Radial growth" |
| Time | earliest_year, latest_year, time_format, time_method | earliest_year=-8000 |
| Elevation | min_elevation, max_elevation | min_elevation=0, max_elevation=3000 |
| Pagination | limit, skip | limit=50, skip=100 |

How PyleoTUPS Uses NOAA

When you call NOAADataset.search_studies( <params> ):

  1. Query Building → Translates Pythonic parameter names to NOAA API names

  2. API Request → Makes HTTP GET request to the NOAA study search endpoint

  3. Response Parsing → Receives JSON containing study metadata and file URLs

  4. Data Registration → Stores studies internally and builds indexes for efficient file lookups

  5. Returns → A DataFrame summarizing found studies

Each study returned includes file URLs pointing to text/CSV/Excel files hosted on NOAA servers.
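The query-building step can be sketched as a simple rename of keyword arguments followed by URL encoding. This is a minimal sketch, not PyleoTUPS' actual code: the function name `build_search_url` is hypothetical, and the API-side names in `PARAM_MAP` are assumptions that should be checked against the NOAA API documentation.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ncei.noaa.gov/access/paleo-search/api/study/search.json"

# Assumed mapping from Pythonic names to NOAA API names (illustrative only).
PARAM_MAP = {
    "noaa_id": "NOAAStudyId",
    "search_text": "searchText",
    "data_type_id": "dataTypeId",
    "min_lat": "minLat",
    "max_lat": "maxLat",
    "min_lon": "minLon",
    "max_lon": "maxLon",
    "earliest_year": "earliestYear",
    "latest_year": "latestYear",
    "limit": "limit",
    "skip": "skip",
}


def build_search_url(**params) -> str:
    """Translate Pythonic keyword arguments into a study search URL."""
    unknown = set(params) - set(PARAM_MAP)
    if unknown:
        raise ValueError(f"Unsupported parameters: {sorted(unknown)}")
    # Drop parameters that were left unset and rename the rest.
    query = {PARAM_MAP[k]: v for k, v in params.items() if v is not None}
    return f"{BASE_URL}?{urlencode(query)}"


url = build_search_url(data_type_id=18, min_lat=30, max_lat=40, limit=20)
print(url)
```

The resulting URL could then be fetched with any HTTP client and the JSON response parsed into study records, which corresponds to steps 2 and 3 above.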

Example NOAA Workflow

import pyleotups as pt

ds = pt.NOAADataset()

# Search by ID (direct lookup)
df = ds.search_studies(noaa_id=13156)

# Search by location
df = ds.search_studies(min_lat=30, max_lat=40, min_lon=-100, max_lon=-80, limit=20)

# Search by data type (e.g., Tree rings)
df = ds.search_studies(data_type_id=18, limit=50)

# Get data from a study
# (replace the placeholder with a data table ID from the search results)
df_data = ds.get_data("some_datatable_id")

PANGAEA Database

What is PANGAEA?

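PANGAEA is a data publisher for Earth and environmental science, operated jointly by the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research (AWI) and the Center for Marine Environmental Sciences (MARUM) at the University of Bremen. It hosts interdisciplinary datasets, with a growing collection of paleoclimate studies.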

PANGAEA Data Organization

PANGAEA organizes datasets differently than NOAA:

Dataset (standalone publication)
├── Metadata
│   ├── Title and description
│   ├── Authors/Investigators
│   ├── Publication DOI
│   ├── Funding Information
│   └── Topics
├── Data Tables
│   ├── Columns (parameters with units and descriptions)
│   ├── Geographic locations (one or more, often one per row)
│   └── Rows (measurements or observations)
└── Related Datasets
    └── Child datasets or related publications

Key Concepts:

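Unlike a NOAA study, a PANGAEA dataset contains only one data table, i.e., a single CSV/TSV-style file. However, that one dataset can still contain multiple events (for example, several sampling locations), so rows in the same table may belong to different events.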
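To illustrate a single table holding several events, the sketch below groups the rows of a small tab-delimited table by event. The table contents are made up, and the "Event" column name is an assumption about how a multi-event table labels its rows.

```python
import csv
import io
from collections import defaultdict

# Made-up tab-delimited table; column names and values are assumptions.
TABLE = (
    "Event\tDepth [m]\td18O [per mil]\n"
    "CORE-A\t0.10\t-1.2\n"
    "CORE-A\t0.20\t-1.4\n"
    "CORE-B\t0.10\t-0.9\n"
)

# Collect the rows of the single table under the event they belong to.
rows_by_event = defaultdict(list)
for row in csv.DictReader(io.StringIO(TABLE), delimiter="\t"):
    rows_by_event[row["Event"]].append(row)

for event, rows in rows_by_event.items():
    print(event, len(rows))
```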

PANGAEA Query Interface

PANGAEA uses a filter-based search model with advanced query syntax:

Base URL: https://www.pangaea.de/advanced/search.php

Query parameters and operators [View complete list here]:

| Feature | Syntax | Example |
|---|---|---|
| Full-text | q=<text> | q=stable isotopes |
| Author | author:<name> | author:"Khider, D" |
| Parameter | parameter:<name> | parameter:"δ18O" |
| Topic | topic=<topic> | topic="Paleontology" |
| Geographic | Bounding box | minlon=-100&maxlon=-80&minlat=30&maxlat=40 |
| Operators | AND, OR, NOT | (isotopes OR δ18O) AND paleoclimate |
| Field Search | property:value | Multiple field combinations |

Logical Operators:

PyleoTUPS constructs this PANGAEA query for you.

How PyleoTUPS Uses PANGAEA

When you call PangaeaDataset.search_studies(**kwargs):

  1. Query Building → Translates Python parameters into PANGAEA query syntax

  2. Query Execution → Makes requests to the PANGAEA search API via the pangaeapy library

  3. Result Processing → Retrieves dataset metadata and constructs summary DataFrames

  4. ID Registration → Stores dataset IDs and metadata for later data retrieval

  5. Returns → A DataFrame summarizing found datasets

PyleoTUPS uses the pangaeapy library (an existing wrapper for the PANGAEA API) under the hood to handle low-level API interactions.
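The query-building step can be sketched as string assembly using the field syntax from the table above. This is a minimal sketch under stated assumptions: `build_pangaea_query` and `build_bbox` are hypothetical helper names, not PyleoTUPS functions, and handing the result to pangaeapy is deliberately left out.

```python
def quoted(value: str) -> str:
    """Wrap a value in double quotes so multi-word terms stay together."""
    return f'"{value}"'


def build_pangaea_query(search_text=None, author=None, variable_name=None) -> str:
    """Combine the filters that were supplied into one AND-joined query string."""
    clauses = []
    if search_text:
        # Parenthesize free text so operators inside it stay grouped.
        clauses.append(f"({search_text})")
    if author:
        clauses.append(f"author:{quoted(author)}")
    if variable_name:
        clauses.append(f"parameter:{quoted(variable_name)}")
    return " AND ".join(clauses)


def build_bbox(min_lat, max_lat, min_lon, max_lon) -> dict:
    """Validate geographic bounds and return them under PANGAEA-style keys."""
    if min_lat > max_lat or min_lon > max_lon:
        raise ValueError("Minimum bounds must not exceed maximum bounds")
    return {"minlon": min_lon, "maxlon": max_lon, "minlat": min_lat, "maxlat": max_lat}


query = build_pangaea_query(search_text="isotopes OR δ18O", author="Khider, D")
bbox = build_bbox(min_lat=-10, max_lat=10, min_lon=120, max_lon=160)
print(query)
print(bbox)
```

Keeping the text query and the bounding box separate mirrors the table above, where field filters use `property:value` syntax while geographic bounds are passed as their own parameters.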

Example PANGAEA Workflow

import pyleotups as pt

ds = pt.PangaeaDataset()

# Search by ID (direct lookup)
df = ds.search_studies(study_ids=830587)

# Search by text
df = ds.search_studies(search_text="stable isotopes", limit=20)

# Search by parameter
df = ds.search_studies(variable_name="δ18O", limit=20)

# Search with geographic bounds
df = ds.search_studies(min_lat=-10, max_lat=10, min_lon=120, max_lon=160)

# Get data from a dataset
df_data = ds.get_data(830587)

Comparison: NOAA vs. PANGAEA

Data Model

| Aspect | NOAA | PANGAEA |
|---|---|---|
| Structure | Hierarchical (Study → Site → PaleoData → DataTable) | Flat (Dataset with multiple parameters) |
| Geography | Multiple sites per study | One or more events/locations per dataset |
| Primary Focus | Paleoclimate proxy records | Interdisciplinary geoscience data |
| File Formats | Legacy text formats, NOAA Templated Text formats, CSV, Excel | Standardized table format (tab-delimited), NetCDF |
| Metadata | Rich hierarchical structure | Standardized metadata fields |

Query Capabilities

| Feature | NOAA | PANGAEA |
|---|---|---|
| ID-Based Search | Yes (NOAA ID, XML ID) | Yes (DOI, numeric ID) |
| Full-Text | Yes (Oracle syntax) | Yes (faceted search) |
| Variable Filter | Via cvWhats (controlled vocab) | Via parameter name (text-based) |
| Geographic | Bounding box | Bounding box |
| Time Range | Explicit earliest/latest year | Implicit in data timestamps |
| Data Type | Filtered via dataTypeID | Filtered via topic |
| Authors | Supported | Supported |
| Multi-value Logic | AND/OR operators | AND/OR operators |

Data Access

| Aspect | NOAA | PANGAEA |
|---|---|---|
| Files | Links to remote files (text, CSV, Excel) | Tables accessed via API or download |
| Parsing | Complex legacy formats → requires dedicated parser | Standardized format → no parsing needed |
| File Handling | PyleoTUPS downloads and parses | pangaeapy handles retrieval |
| Metadata in Data | Embedded within files | Separate dataset-level metadata |

For PyleoTUPS Users


Summary

PyleoTUPS bridges NOAA and PANGAEA by:

  1. Normalizing search parameters into repository-specific query formats

  2. Abstracting repository differences so users think in terms of paleoclimate concepts, not backend APIs

  3. Parsing diverse file formats (especially NOAA’s legacy formats)

  4. Providing unified access to both searches and data retrieval

  5. Preserving metadata throughout the data pipeline

In the next section, we’ll explore how PyleoTUPS’ architecture enables this unified interface.