
Querying the NOAA NCEI Database¶
Authors¶
Preamble¶
In a previous chapter, you learned how to create a NOAADataset object and its many functionalities. We won’t be revisiting those here; just know that regardless of how many datasets are returned, the functions will apply.
This tutorial demonstrates how to query the database. A list of search parameters is available on the NOAA website and is summarized below:
xml_id : str, optional Specify the internal XML document ID. Must be an exact match (e.g., ‘1840’).
noaa_id : str, optional Provide the unique NOAA Study ID as a number (e.g., ‘13156’).
search_text : str, optional General text search across study content. Supports wildcards (%) and logical operators (AND, OR). Examples: ‘younger dryas’, ‘loess AND stratigraphy’
data_publisher : str, default ‘NOAA’ Choose from: ‘NOAA’, ‘NEOTOMA’, or ‘PANGAEA’. Example: ‘NOAA’
data_type_id : str, optional Filter by data type. Use one or more type IDs separated by ‘|’. Available IDs: 1: BOREHOLE, 2: CLIMATE FORCING, 3: CLIMATE RECONSTRUCTIONS, 4: CORALS AND SCLEROSPONGES, 6: HISTORICAL, 7: ICE CORES, 8: INSECT, 9: LAKE LEVELS, 10: LOESS, 11: PALEOCLIMATIC MODELING, 12: FIRE HISTORY, 13: PALEOLIMNOLOGY, 14: PALEOCEANOGRAPHY, 15: PLANT MACROFOSSILS, 16: POLLEN, 17: SPELEOTHEMS, 18: TREE RING, 19: OTHER COLLECTIONS, 20: INSTRUMENTAL, 59: SOFTWARE, 60: REPOSITORY Example: ‘4’, ‘4|18’
investigators : str or list[str], optional Investigator(s) in the form "LastName, Initials". Lists are joined with |.
investigators_and_or : {“and”,“or”}, default “or” Logical combiner when multiple investigators are supplied. Only sent when 2+ items.
locations : str or list[str], optional Location(s) as hierarchical strings using > (e.g., "Continent>Africa>Kenya"). Lists joined with |.
locations_and_or : {“and”,“or”}, default “or” Logical combiner for multiple locations. Only sent when 2+ items.
keywords : str or list[str], optional Controlled keyword(s); hierarchies with >. Lists joined with |.
keywords_and_or : {“and”,“or”}, default “or” Logical combiner for multiple keywords. Only sent when 2+ items.
species : str or list[str], optional Four-letter tree species codes (uppercase enforced). Lists joined with |.
species_and_or : {“and”,“or”}, default “or” Logical combiner for multiple species. Only sent when 2+ items.
variable_name : str or list[str], optional Refers to PaST “cvWhats” terms (hierarchies with >). Lists joined with |.
variable_name_and_or : {“and”,“or”}, default “or” Logical combiner for multiple cvWhats/variable_name. Only sent when 2+ items.
cv_materials : str or list[str], optional PaST “Material” terms (hierarchies with >). Lists joined with |.
cv_materials_and_or : {“and”,“or”}, default “or” Logical combiner for multiple cv_materials. Only sent when 2+ items.
cv_seasonalities : str or list[str], optional PaST “Seasonality” terms (e.g., "annual" or "3-month>Aug-Oct"). Lists joined with |.
cv_seasonalities_and_or : {“and”,“or”}, default “or” Logical combiner for multiple cv_seasonalities. Only sent when 2+ items.
min_lat, max_lat : int, optional Latitude bounds in whole degrees (–90..90).
min_lon, max_lon : int, optional Longitude bounds in whole degrees (–180..180).
min_elevation, max_elevation : int, optional Elevation bounds in meters (integers; negative allowed).
earliest_year, latest_year : int, optional Year bounds (integers; negative allowed). If provided without time settings, time_format defaults to 'CE'.
time_format : {“CE”,“BP”}, optional Interpretation of years. If omitted with a time window, defaults to 'CE'.
time_method : {“overAny”,“entireOver”,“overEntire”}, optional How to apply the time window (overlap, envelop, or within).
reconstruction : bool or str, optional Accepts True/False or strings (case-insensitive) like "true"|"yes"|"y"|"1" → 'Y' and "false"|"no"|"n"|"0" → 'N'. None omits the filter.
recent : bool, default False If True, restrict to studies from the last ~2 years (newest first).
limit : int, default 100 Number of studies to return (PyleoTUPS default).
display : bool, default False If True, render a small preview after parsing.
skip : int, optional Number of studies to skip (for pagination). Use with limit to page through results. Example: limit=10, skip=10 returns items 11–20.
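Two conventions from the list above are easy to misread, so here is a minimal sketch of them in plain Python. The helper names (`multi_value_params`, `normalize_reconstruction`) are hypothetical and are not part of PyleoTUPS; they only mirror the behavior described in the summary: lists are joined with `|`, the and/or combiner is sent only for two or more items, and true/false-like values map to 'Y'/'N'.

```python
from urllib.parse import urlencode


def multi_value_params(name, values, and_or="or"):
    """Hypothetical helper: join values with '|' and add the combiner for 2+ items."""
    if isinstance(values, str):
        values = [values]
    params = {name: "|".join(values)}
    if len(values) >= 2:
        # The combiner only matters when there is more than one value.
        params[f"{name}AndOr"] = and_or
    return params


def normalize_reconstruction(value):
    """Hypothetical helper: map bool/str input to 'Y', 'N', or None (filter omitted)."""
    if value is None:
        return None
    if isinstance(value, bool):
        return "Y" if value else "N"
    text = str(value).strip().lower()
    if text in {"true", "yes", "y", "1"}:
        return "Y"
    if text in {"false", "no", "n", "0"}:
        return "N"
    raise ValueError(f"Unrecognized reconstruction value: {value!r}")


one = urlencode(multi_value_params("investigators", "Khider, D."))
two = urlencode(multi_value_params("investigators", ["Khider, D.", "Jackson, C."]))
print(one)
print(two)
print(normalize_reconstruction("Yes"), normalize_reconstruction(False))
```

The encoded strings have the same shape as the `investigators=...` fragments that appear in the request URLs logged later in this tutorial.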
Goals¶
Perform queries on NOAA NCEI using their search parameters.
Understand rate limits for API queries and how to perform large queries using the skip and limit parameters
Add two NOAADataset objects
Prerequisite¶
Understanding of NOAA datasets and associated search API
Reading time¶
15 min
Let’s import our packages!
import pyleotups as pt
import pandas as pd
Study Query¶
In the notebook introducing the NOAADataset object, we performed a basic query using the NOAA dataset ID. If you are more comfortable using the NOAA graphical interface, you can always use the resulting ID to work with PyleoTUPS. However, this forces you to create a new NOAADataset object for each query.
Let’s search for our demo dataset again from Clemens et al. (2021):
ds1=pt.NOAADataset()
res1 = ds1.search_studies(noaa_id=33213)
[2026-04-28 08:31:51,522][INFO] - search_studies: Limit defaulted to 100 (PyleoTUPS).
[2026-04-28 08:31:51,523][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataPublisher=NOAA&NOAAStudyId=33213&limit=100
Parsing NOAA studies: 100%|██████████| 1/1 [00:00<00:00, 2314.74it/s]
[2026-04-28 08:31:52,024][INFO] - Retrieved 1 studies.
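The Request URL printed in the log is an ordinary query string, which makes it handy for debugging. As a small illustration using only the standard library (this is not how PyleoTUPS builds its requests internally, which is not shown here), the same URL can be assembled from the parameters and decoded back again:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://www.ncei.noaa.gov/access/paleo-search/study/search.json"

# Parameter names are copied from the logged Request URL above.
params = {"dataPublisher": "NOAA", "NOAAStudyId": 33213, "limit": 100}
url = f"{BASE_URL}?{urlencode(params)}"
print(url)

# Reading a logged URL back into its parameters works the same way in reverse.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

Decoding a logged URL this way is a quick check that the query sent to NOAA is the one you intended.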
Now, let’s say we want to also have a look at the study by Bhattacharya et al. (2022), which is available through this NOAA landing page:
ds2=pt.NOAADataset()
res2 = ds2.search_studies(noaa_id=36778)
[2026-04-28 08:31:53,570][INFO] - search_studies: Limit defaulted to 100 (PyleoTUPS).
[2026-04-28 08:31:53,570][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataPublisher=NOAA&NOAAStudyId=36778&limit=100
Parsing NOAA studies: 100%|██████████| 1/1 [00:00<00:00, 2381.77it/s]
[2026-04-28 08:31:53,955][INFO] - Retrieved 1 studies.
You can combine them in the same object as such:
ds = ds1 + ds2
ds.get_summary()
Investigator Query¶
One of the most common searches is to look up the datasets produced by an investigator. For instance, let’s have a look at some of the studies contributed by one of the authors of these tutorials:
ds = pt.NOAADataset()
res = ds.search_studies(investigators = "Khider, D.")
[2026-04-28 08:31:57,942][INFO] - search_studies: Limit defaulted to 100 (PyleoTUPS).
[2026-04-28 08:31:57,943][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataPublisher=NOAA&limit=100&investigators=Khider%2C+D.
Parsing NOAA studies: 100%|██████████| 3/3 [00:00<00:00, 4725.09it/s]
[2026-04-28 08:31:58,579][INFO] - Retrieved 3 studies.
Let’s have a look at the results:
display(res)
What if I am really interested in that last study? Well, one solution is to create a search with the StudyID (they come in handy!), or one can refine the search with another investigator.
res = ds.search_studies(investigators = ["Khider, D.", "Jackson, C."])
display(res)
[2026-04-28 08:32:02,205][INFO] - search_studies: Limit defaulted to 100 (PyleoTUPS).
[2026-04-28 08:32:02,205][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataPublisher=NOAA&limit=100&investigators=Khider%2C+D.%7CJackson%2C+C.&investigatorsAndOr=or
Parsing NOAA studies: 100%|██████████| 4/4 [00:00<00:00, 20535.15it/s]
[2026-04-28 08:32:02,836][INFO] - Retrieved 4 studies.
Oops! Not exactly what you were expecting, right? Now, I have 4 studies. Let’s have a look at the investigator field:
for value in res['Investigators']:
    print(value)
Judson Partin, Terrence Quinn, Chuan-Chou Shen, Julien Emile-Geay, Frederick Taylor, Christopher Maupin, Ke Lin, Charles Jackson, Jay Banner, Daniel Sinclair, Chih-An Huh
Deborah Khider, Lowell Stott, Julien Emile-Geay, Robert Thunell, Doug Hammond
Hai Cheng, R. Lawrence Edwards, Deborah Khider, Ashish Sinha, Lowell Stott, Justin Reuter
Deborah Khider, Charles Jackson, Lowell Stott
The reason is that NOAA uses or by default. So we have been searching for studies whose investigators are either Charles Jackson or Deborah Khider. This is how you perform an and search:
res = ds.search_studies(investigators = ["Khider, D.", "Jackson, C."], investigators_and_or = 'and')
display(res)
[2026-04-28 08:32:05,775][INFO] - search_studies: Limit defaulted to 100 (PyleoTUPS).
[2026-04-28 08:32:05,775][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataPublisher=NOAA&limit=100&investigators=Khider%2C+D.%7CJackson%2C+C.&investigatorsAndOr=and
Parsing NOAA studies: 100%|██████████| 1/1 [00:00<00:00, 7810.62it/s]
[2026-04-28 08:32:06,686][INFO] - Retrieved 1 studies.
Which is the study we wanted!
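To make the difference between the two combiners concrete, here is a small sketch that replays the logic locally on the four investigator strings printed earlier. It is only an illustration of "or" versus "and" matching; the real filtering happens on the NOAA server, and the `matches` helper is a hypothetical name.

```python
# Investigator strings copied from the four studies printed above.
studies = [
    "Judson Partin, Terrence Quinn, Chuan-Chou Shen, Julien Emile-Geay, "
    "Frederick Taylor, Christopher Maupin, Ke Lin, Charles Jackson, "
    "Jay Banner, Daniel Sinclair, Chih-An Huh",
    "Deborah Khider, Lowell Stott, Julien Emile-Geay, Robert Thunell, Doug Hammond",
    "Hai Cheng, R. Lawrence Edwards, Deborah Khider, Ashish Sinha, "
    "Lowell Stott, Justin Reuter",
    "Deborah Khider, Charles Jackson, Lowell Stott",
]
wanted = ["Deborah Khider", "Charles Jackson"]


def matches(investigators, names, combiner="or"):
    """Return True if any ('or') or all ('and') names appear in the string."""
    test = all if combiner == "and" else any
    return test(name in investigators for name in names)


or_hits = [s for s in studies if matches(s, wanted, "or")]
and_hits = [s for s in studies if matches(s, wanted, "and")]
print(len(or_hits), len(and_hits))  # prints: 4 1
```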
Geographical Query¶
One of the most common types of searches is a geographical query. Let’s look for all the datasets within 5°S-5°N and 109-125°E, roughly corresponding to the Indo-Pacific Warm Pool.
res = ds.search_studies(max_lat=5, min_lat=-5, max_lon=109,
                        min_lon=125)
[2026-04-28 08:32:09,784][INFO] - search_studies: Limit defaulted to 100 (PyleoTUPS).
[2026-04-28 08:32:09,785][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataPublisher=NOAA&limit=100&minLat=-5&maxLat=5&minLon=125&maxLon=109
Parsing NOAA studies: 100%|██████████| 100/100 [00:00<00:00, 2671.58it/s]
[2026-04-28 08:32:11,846][INFO] - Retrieved 100 studies.
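The log message above asks us to check that the results match the intended region, and the request URL shows exactly which bounds were sent (here `minLon=125` and `maxLon=109`). As a minimal sketch, a small helper can validate the ranges and put each pair in ascending order before a query is made. `ordered_bounds` is a hypothetical function written for this illustration, not part of PyleoTUPS, and it assumes the box does not cross the antimeridian, where a western edge larger than the eastern edge can be intentional.

```python
def ordered_bounds(min_lat, max_lat, min_lon, max_lon):
    """Hypothetical helper: check ranges and order each pair so that min <= max."""
    checks = [
        ("latitude", min_lat, 90),
        ("latitude", max_lat, 90),
        ("longitude", min_lon, 180),
        ("longitude", max_lon, 180),
    ]
    for label, value, limit in checks:
        if not -limit <= value <= limit:
            raise ValueError(f"{label} {value} is outside [-{limit}, {limit}]")
    lat_lo, lat_hi = sorted((min_lat, max_lat))
    lon_lo, lon_hi = sorted((min_lon, max_lon))
    return {"min_lat": lat_lo, "max_lat": lat_hi,
            "min_lon": lon_lo, "max_lon": lon_hi}


box = ordered_bounds(min_lat=-5, max_lat=5, min_lon=125, max_lon=109)
print(box)
```

Because the dictionary keys match the parameter names listed in the preamble, the result could be unpacked into a query with `**box`.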
We got 100 datasets, which means we may have hit the limit. Let’s increase to 500 and see what happens:
res = ds.search_studies(max_lat=5, min_lat=-5, max_lon=109,
                        min_lon=125, limit=500)
[2026-04-28 08:32:15,452][INFO] - search_studies: Limit set to 500.
[2026-04-28 08:32:15,452][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataPublisher=NOAA&limit=500&minLat=-5&maxLat=5&minLon=125&maxLon=109
Parsing NOAA studies: 100%|██████████| 388/388 [00:00<00:00, 10904.66it/s]
[2026-04-28 08:32:22,098][INFO] - Retrieved 388 studies.
We have 388 studies now, well below the limit of 500, so it seems we collected all the records. Let’s have a look at the first few rows of the results:
display(res.head())
Let’s refine this search for DataType ‘Paleoceanography’. If you are not familiar with how NOAA organizes its database and what the data types are, have a look at this page for a list of data types (under products). For convenience, a list of data types and corresponding IDs is provided at the top of this page and is available on the NOAA API documentation page.
res = ds.search_studies(max_lat=5, min_lat=-5, max_lon=109,
                        min_lon=125, data_type_id=14, limit=100)
display(res.head())
[2026-04-28 08:32:28,539][INFO] - search_studies: Limit defaulted to 100 (PyleoTUPS).
[2026-04-28 08:32:28,540][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataTypeID=14&dataPublisher=NOAA&limit=100&minLat=-5&maxLat=5&minLon=125&maxLon=109
Parsing NOAA studies: 100%|██████████| 100/100 [00:00<00:00, 11020.53it/s]
[2026-04-28 08:32:30,316][WARNING] - Retrieved 100 studies, which is the specified limit. Consider increasing the limit parameter to fetch more studies.
[2026-04-28 08:32:30,317][INFO] - Retrieved 100 studies.
Because the data type filter does not return only Paleoceanography studies, we recommend also filtering the Pandas DataFrame itself.
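A minimal sketch of such a DataFrame filter is shown below. It runs on a toy table with made-up rows, and the column name `DataType` is an assumption made for illustration; check `res.columns` for the name actually used in your results.

```python
import pandas as pd

# Toy stand-in for a results table; IDs and rows are made up for illustration.
toy = pd.DataFrame({
    "StudyID": [1, 2, 3],
    "DataType": ["PALEOCEANOGRAPHY", "CORALS AND SCLEROSPONGES", "PALEOCEANOGRAPHY"],
})

# Keep only rows whose data type mentions paleoceanography (case-insensitive).
mask = toy["DataType"].str.contains("paleoceanography", case=False, regex=False)
paleo_only = toy[mask]
print(paleo_only)
```

A substring match is used so that rows listing several data types would still be kept; use an equality test if you want studies tagged with that type alone.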
Variable queries¶
Let’s have a look at another type of query often performed in the course of a paleoclimate investigation: searching by variable_name. In NOAA NCEI, the actual term is cvWhats (cv stands for the PaST Thesaurus’ Controlled Vocabulary). We harmonized the term for consistency.
The CV terms are programmatically accessible through their name or ID. For a list, see this page.
Let’s have a look at a query for temperature. We will display only the first 10 results for this demo.
res = ds.search_studies(variable_name = 'Temperature', limit=100)
display(res.head(10))  # return the first 10 rows instead of the default 5
[2026-04-28 08:43:12,777][INFO] - search_studies: Limit defaulted to 100 (PyleoTUPS).
[2026-04-28 08:43:12,781][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataPublisher=NOAA&limit=100&cvWhats=Temperature
Parsing NOAA studies: 100%|██████████| 100/100 [00:00<00:00, 10546.40it/s]
[2026-04-28 08:43:15,331][WARNING] - Retrieved 100 studies, which is the specified limit. Consider increasing the limit parameter to fetch more studies.
[2026-04-28 08:43:15,331][INFO] - Retrieved 100 studies.
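When a query matches more studies than the limit, you can page through the results with the skip and limit parameters. Let’s have a look at how this would work in practice, using the example above.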
Let’s create our first dataset and query for the first 5 entries:
ds1 = pt.NOAADataset()
res1 = ds1.search_studies(variable_name = 'Temperature', limit=5)  # retrieve the first five datasets
display(res1)
[2026-04-28 08:45:01,516][INFO] - search_studies: Limit set to 5.
[2026-04-28 08:45:01,516][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataPublisher=NOAA&limit=5&cvWhats=Temperature
Parsing NOAA studies: 100%|██████████| 5/5 [00:00<00:00, 3373.80it/s]
[2026-04-28 08:45:03,006][WARNING] - Retrieved 5 studies, which is the specified limit. Consider increasing the limit parameter to fetch more studies.
[2026-04-28 08:45:03,007][INFO] - Retrieved 5 studies.
As you can see, we obtain the same first five datasets as our previous query. Let’s now retrieve the next 5:
ds2 = pt.NOAADataset()
res2 = ds2.search_studies(variable_name = 'Temperature', limit=5, skip = 5)  # retrieve the next 5 (limit) while skipping the first 5 (skip) that we already have
display(res2)
[2026-04-28 08:56:08,979][INFO] - search_studies: Limit set to 5.
[2026-04-28 08:56:08,980][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataPublisher=NOAA&limit=5&skip=5&cvWhats=Temperature
Parsing NOAA studies: 100%|██████████| 5/5 [00:00<00:00, 5309.25it/s]
[2026-04-28 08:56:10,361][WARNING] - Retrieved 5 studies, which is the specified limit. Consider increasing the limit parameter to fetch more studies.
[2026-04-28 08:56:10,362][INFO] - Retrieved 5 studies.
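The two calls above follow a general paging pattern: keep `limit` fixed and advance `skip` by `limit` until a page comes back short. The sketch below illustrates the loop with a stand-in `fetch_page` function that serves made-up study IDs from a local list, so it runs without network access. Both function names are hypothetical; in practice each page would come from a `search_studies` call.

```python
def fetch_page(limit, skip):
    """Stand-in for one search call: returns up to `limit` fake IDs after `skip`."""
    all_ids = list(range(1, 13))  # pretend the server holds 12 studies
    return all_ids[skip:skip + limit]


def fetch_all(page_size):
    """Collect every result by advancing `skip` until a page comes back short."""
    results, skip = [], 0
    while True:
        page = fetch_page(limit=page_size, skip=skip)
        results.extend(page)
        if len(page) < page_size:
            break  # a short (or empty) page means nothing is left
        skip += page_size
    return results


studies = fetch_all(page_size=5)
print(len(studies))  # prints: 12
```

When the total is an exact multiple of the page size, the loop makes one extra request that returns an empty page and then stops.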
If you need to conduct large queries, we recommend using the skip parameter to obtain all the results and add them up:
ds = ds1+ds2
display(ds.get_summary().head(10))
Time queries¶
Another useful query is a search within time constraints. To do so, NOAA NCEI has three categories of parameters:
earliest_year, latest_year : int, optional Year bounds (integers; negative allowed).
time_format : {“CE”,“BP”}, optional Interpretation of years. If omitted with a time window, defaults to 'CE'.
time_method : {“overAny”,“entireOver”,“overEntire”}, optional How to apply the time window (overlap, envelop, or within).
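The two time formats describe the same timeline with different origins. By the usual convention, 'BP' counts years before 1950 CE, so a rough conversion is a single subtraction. The helper names below are illustrative, the sketch ignores the absence of a year 0 in the CE calendar, and whether NOAA applies exactly this convention is an assumption worth verifying against its documentation.

```python
BP_REFERENCE_YEAR = 1950  # conventional "present" for years BP


def bp_to_ce(year_bp):
    """Convert years before present (1950) to a CE year (negative = BCE)."""
    return BP_REFERENCE_YEAR - year_bp


def ce_to_bp(year_ce):
    """Convert a CE year to years before present (1950)."""
    return BP_REFERENCE_YEAR - year_ce


print(bp_to_ce(10000), bp_to_ce(0))  # prints: -8050 1950
```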

Let’s look for datasets representing temperature within the Indo-Pacific Warm Pool (10°S-10°N, 110°E-130°E) over the Holocene (0-10000 years BP):
ds = pt.NOAADataset()
res = ds.search_studies(variable_name = 'Temperature', earliest_year = 10000, latest_year=0, time_format = 'BP', min_lat = -10,
                        max_lat = 10, min_lon = 110, max_lon = 130, limit = 10)
display(res)
[2026-04-28 10:08:13,686][INFO] - search_studies: Limit set to 10.
[2026-04-28 10:08:13,686][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataPublisher=NOAA&limit=10&cvWhats=Temperature&minLat=-10&maxLat=10&minLon=110&maxLon=130&earliestYear=10000&latestYear=0&timeFormat=BP
Parsing NOAA studies: 100%|██████████| 10/10 [00:00<00:00, 4045.43it/s]
[2026-04-28 10:08:14,498][WARNING] - Retrieved 10 studies, which is the specified limit. Consider increasing the limit parameter to fetch more studies.
[2026-04-28 10:08:14,499][INFO] - Retrieved 10 studies.
By default, the search was done as overAny, which includes datasets that span much longer or shorter periods than the Holocene. Let’s perform the search with entireOver, which requires the datasets to cover the entire time period, hence removing the shorter records:
res = ds.search_studies(variable_name = 'Temperature', earliest_year = 10000, latest_year=0, time_format = 'BP', min_lat = -10,
                        max_lat = 10, min_lon = 110, max_lon = 130, time_method = 'entireOver', limit = 10)
display(res)
[2026-04-28 10:49:55,057][INFO] - search_studies: Limit set to 10.
[2026-04-28 10:49:55,058][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataPublisher=NOAA&limit=10&cvWhats=Temperature&minLat=-10&maxLat=10&minLon=110&maxLon=130&earliestYear=10000&latestYear=0&timeFormat=BP&timeMethod=entireOver
Parsing NOAA studies: 100%|██████████| 10/10 [00:00<00:00, 6211.02it/s]
[2026-04-28 10:49:55,846][WARNING] - Retrieved 10 studies, which is the specified limit. Consider increasing the limit parameter to fetch more studies.
[2026-04-28 10:49:55,847][INFO] - Retrieved 10 studies.
Similarly, using overEntire removes datasets that cover much longer timespans than the Holocene:
res = ds.search_studies(variable_name = 'Temperature', earliest_year = 10000, latest_year=0, time_format = 'BP', min_lat = -10,
                        max_lat = 10, min_lon = 110, max_lon = 130, time_method = 'overEntire', limit = 10)
display(res)
[2026-04-28 10:51:10,592][INFO] - search_studies: Limit set to 10.
[2026-04-28 10:51:10,593][INFO] - search_studies: Input Query includes geographical bounds. Inspect the results to ensure they match your intended region as one study can contain sites across various parts of the world.
Request URL: https://www.ncei.noaa.gov/access/paleo-search/study/search.json?dataPublisher=NOAA&limit=10&cvWhats=Temperature&minLat=-10&maxLat=10&minLon=110&maxLon=130&earliestYear=10000&latestYear=0&timeFormat=BP&timeMethod=overEntire
Parsing NOAA studies: 100%|██████████| 9/9 [00:00<00:00, 6410.04it/s]
[2026-04-28 10:51:11,404][INFO] - Retrieved 9 studies.
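One way to keep the three time_method options straight is to think of them as interval tests between a record's time span and the query window. The sketch below is a reading of the descriptions given in this tutorial (overlap, envelop, within), not NOAA's implementation; the function name and the example records are made up for illustration.

```python
def time_match(rec_min, rec_max, win_min, win_max, method="overAny"):
    """Compare a record span with a query window (same units, min <= max)."""
    if method == "overAny":      # record overlaps the window at all
        return rec_min <= win_max and rec_max >= win_min
    if method == "entireOver":   # record covers the entire window
        return rec_min <= win_min and rec_max >= win_max
    if method == "overEntire":   # record lies entirely within the window
        return rec_min >= win_min and rec_max <= win_max
    raise ValueError(f"Unknown time method: {method!r}")


holocene = (0, 10000)  # years BP
records = {
    "short": (2000, 6000),
    "long": (0, 150000),
    "partial": (8000, 20000),
}

for method in ("overAny", "entireOver", "overEntire"):
    kept = [name for name, span in records.items()
            if time_match(*span, *holocene, method=method)]
    print(method, kept)
```

With these made-up records, overAny keeps all three, entireOver keeps only the long record, and overEntire keeps only the short one.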
Summary¶
In this tutorial, you have learned how to use the search_studies function and its many parameters to query the NOAA NCEI paleoclimatology database. In the next tutorial, we will have a look at the PANGAEA search.
- Clemens, S. C., Yamamoto, M., Thirumalai, K., Giosan, L., Richey, J. N., Nilsson-Kerr, K., Rosenthal, Y., Anand, P., & McGrath, S. M. (2021). Remote and local drivers of Pleistocene South Asian summer monsoon precipitation: A test for future predictions. Science Advances, 7(23). 10.1126/sciadv.abg3848
- Bhattacharya, T., Feng, R., Tierney, J. E., Rubbelke, C., Burls, N., Knapp, S., & Fu, M. (2022). Expansion and Intensification of the North American Monsoon During the Pliocene. AGU Advances, 3(6). 10.1029/2022av000757