{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Accessing CMIP6 output with `intake`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "A large swath of CMIP6 output is hosted by the [AWS Open Data Sponsorship Program](https://aws.amazon.com/opendata/public-datasets/) and by Google as part of [Google Cloud Public Datasets](https://cloud.google.com/public-datasets). This notebook will demonstrate how to access, query, and request data of interest from a particular source's CMIP6 holdings using [intake-esm](https://github.com/intake/intake-esm). Additionally, we will look at how to concatenate output that may not be on the same calendar.\n", "\n", "1. Browse the CMIP6 catalog\n", "1. Select CMIP6 output of interest\n", "1. Concatenate experiments in time (including converting calendars when required)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "| Concepts | Importance | Notes |\n", "| --- | --- | --- |\n", "| Xarray | Helpful | |\n", "| CMIP6 | Helpful | Familiarity with metadata structure |\n", "\n", "- **Time to learn**: 15 min\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from collections import defaultdict, Counter\n", "\n", "import pandas as pd\n", "import numpy as np\n", "\n", "import intake\n", "import xarray as xr\n", "import nc_time_axis\n", "import pyleoclim as pyleo\n", "import cftime" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## CMIP6 Catalog" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's worth taking a minute to explore the catalog and how it's organized. 
Never fear: opening the datastore doesn't load all holdings into memory. That matters here, because at the time of writing the catalog listed more than 510k assets, each one a Zarr store. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
pangeo-cmip6 catalog with 7674 dataset(s) from 514818 asset(s):\n", "
\n", "
| | unique |\n", "
|---|---|\n", "
| activity_id | 18 |\n", "
| institution_id | 36 |\n", "
| source_id | 88 |\n", "
| experiment_id | 170 |\n", "
| member_id | 657 |\n", "
| table_id | 37 |\n", "
| variable_id | 700 |\n", "
| grid_label | 10 |\n", "
| zstore | 514818 |\n", "
| dcpp_init_year | 60 |\n", "
| version | 736 |\n", "
| derived_variable_id | 0 |\n", "
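The summary above counts the distinct values in each catalog column. Here is a minimal sketch. It assumes the catalog table is available as a pandas DataFrame; intake-esm exposes it as the `.df` attribute of the object returned by `intake.open_esm_datastore`. With that table, `DataFrame.nunique` gives the same kind of summary. The rows below are hypothetical stand-ins so the example runs offline. The dataset count rests on a further assumption: that datasets are formed by grouping on the six columns making up a key such as `PMIP.MRI.MRI-ESM2-0.past1000.Amon.gn`.

```python
import pandas as pd

# Hypothetical stand-in for the catalog table (in practice something like col.df).
catalog = pd.DataFrame(
    {
        'activity_id': ['HighResMIP', 'HighResMIP', 'CMIP', 'PMIP'],
        'institution_id': ['CMCC', 'CMCC', 'MRI', 'MRI'],
        'source_id': ['CMCC-CM2-HR4', 'CMCC-CM2-HR4', 'MRI-ESM2-0', 'MRI-ESM2-0'],
        'experiment_id': ['highresSST-present', 'highresSST-present', 'historical', 'past1000'],
        'table_id': ['Amon', 'Amon', 'Amon', 'Amon'],
        'variable_id': ['ps', 'rsds', 'tas', 'tas'],
        'grid_label': ['gn', 'gn', 'gn', 'gn'],
    }
)

# Distinct values per column, analogous to the unique column in the summary.
unique_counts = catalog.nunique()
print(unique_counts)

# One row per asset, so the asset count is the number of rows.
n_assets = len(catalog)

# Assumed grouping: one dataset per combination of these six columns.
dataset_keys = ['activity_id', 'institution_id', 'source_id',
                'experiment_id', 'table_id', 'grid_label']
n_datasets = catalog.groupby(dataset_keys).ngroups
print(n_assets, n_datasets)
```

On the real catalog, the gap between asset and dataset counts comes from many variables and members sharing one dataset key.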
\n", "| | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version |\n", "
|---|---|---|---|---|---|---|---|---|---|---|---|\n", "
| 0 | HighResMIP | CMCC | CMCC-CM2-HR4 | highresSST-present | r1i1p1f1 | Amon | ps | gn | gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... | NaN | 20170706 |\n", "
| 1 | HighResMIP | CMCC | CMCC-CM2-HR4 | highresSST-present | r1i1p1f1 | Amon | rsds | gn | gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... | NaN | 20170706 |\n", "
| 2 | HighResMIP | CMCC | CMCC-CM2-HR4 | highresSST-present | r1i1p1f1 | Amon | rlus | gn | gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... | NaN | 20170706 |\n", "
| 3 | HighResMIP | CMCC | CMCC-CM2-HR4 | highresSST-present | r1i1p1f1 | Amon | rlds | gn | gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... | NaN | 20170706 |\n", "
| 4 | HighResMIP | CMCC | CMCC-CM2-HR4 | highresSST-present | r1i1p1f1 | Amon | psl | gn | gs://cmip6/CMIP6/HighResMIP/CMCC/CMCC-CM2-HR4/... | NaN | 20170706 |\n", "
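Selecting output of interest amounts to filtering rows of this table. On the datastore itself, that is done with intake-esm's `search` method. The sketch below reproduces the same kind of filter in plain pandas on hypothetical rows. This can help check what a query will match before opening any data.

```python
import pandas as pd

# Hypothetical catalog rows; real rows would come from the datastore table.
rows = pd.DataFrame(
    {
        'source_id': ['MRI-ESM2-0', 'MRI-ESM2-0', 'MRI-ESM2-0', 'CMCC-CM2-HR4'],
        'experiment_id': ['past1000', 'historical', 'historical', 'highresSST-present'],
        'table_id': ['Amon', 'Amon', 'Omon', 'Amon'],
        'variable_id': ['tas', 'tas', 'tos', 'tas'],
    }
)

# Allowed values per column: a row is kept only if every column matches.
query = {
    'source_id': ['MRI-ESM2-0'],
    'experiment_id': ['past1000', 'historical'],
    'table_id': ['Amon'],
    'variable_id': ['tas'],
}
mask = pd.Series(True, index=rows.index)
for column, allowed in query.items():
    mask &= rows[column].isin(allowed)

selected = rows[mask]
print(selected)
```

An equivalent query on the datastore object (called `col` here purely for illustration) would be `col.search(source_id='MRI-ESM2-0', experiment_id=['past1000', 'historical'], table_id='Amon', variable_id='tas')`.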
\n", "| institution_id | experiment_id | source_id | pr | ta | tas | ua | total |\n", "
|---|---|---|---|---|---|---|---|\n", "
| AWI | historical | AWI-ESM-1-1-LR | 2.0 | 3.0 | 3.0 | 5.0 | 13.0 |\n", "
| | lgm | AWI-ESM-1-1-LR | 1.0 | 1.0 | 1.0 | 0.0 | 3.0 |\n", "
| | midHolocene | AWI-ESM-1-1-LR | 0.0 | 1.0 | 0.0 | 1.0 | 2.0 |\n", "
| INM | historical | INM-CM4-8 | 2.0 | 2.0 | 2.0 | 3.0 | 9.0 |\n", "
| | lgm | INM-CM4-8 | 1.0 | 0.0 | 1.0 | 0.0 | 2.0 |\n", "
| | midHolocene | INM-CM4-8 | 0.0 | 1.0 | 0.0 | 1.0 | 2.0 |\n", "
| MIROC | historical | MIROC-ES2L | 41.0 | 31.0 | 42.0 | 33.0 | 147.0 |\n", "
| | lgm | MIROC-ES2L | 1.0 | 1.0 | 1.0 | 1.0 | 4.0 |\n", "
| | past1000 | MIROC-ES2L | 1.0 | 1.0 | 0.0 | 0.0 | 2.0 |\n", "
| MPI-M | historical | MPI-ESM1-2-LR | 20.0 | 23.0 | 30.0 | 28.0 | 101.0 |\n", "
| | lgm | MPI-ESM1-2-LR | 1.0 | 1.0 | 1.0 | 1.0 | 4.0 |\n", "
| | midHolocene | MPI-ESM1-2-LR | 0.0 | 1.0 | 0.0 | 1.0 | 2.0 |\n", "
| MRI | hist-volc | MRI-ESM2-0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |\n", "
| | historical | MRI-ESM2-0 | 23.0 | 17.0 | 18.0 | 21.0 | 79.0 |\n", "
| | midHolocene | MRI-ESM2-0 | 0.0 | 1.0 | 0.0 | 1.0 | 2.0 |\n", "
| | past1000 | MRI-ESM2-0 | 1.0 | 1.0 | 1.0 | 0.0 | 3.0 |\n", "
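The table above counts catalog entries for each institution, experiment and model, broken down by variable, with a row total. The code that produced it is not shown here. With that caveat, the sketch below uses `pd.pivot_table` with `aggfunc='count'` to build a table of the same shape from hypothetical rows. It sets `fill_value=0` for combinations that have no entries.

```python
import pandas as pd

# Hypothetical rows: one per catalog entry (one member of one variable).
entries = pd.DataFrame(
    {
        'institution_id': ['AWI', 'AWI', 'AWI', 'MRI', 'MRI'],
        'experiment_id': ['historical', 'historical', 'lgm', 'past1000', 'past1000'],
        'source_id': ['AWI-ESM-1-1-LR'] * 3 + ['MRI-ESM2-0'] * 2,
        'variable_id': ['pr', 'tas', 'pr', 'tas', 'ua'],
        'member_id': ['r1i1p1f1'] * 5,
    }
)

# Count entries per (institution, experiment, model) and variable.
counts = pd.pivot_table(
    entries,
    index=['institution_id', 'experiment_id', 'source_id'],
    columns='variable_id',
    values='member_id',
    aggfunc='count',
    fill_value=0,
)

# Row total across variables, like the total column in the table above.
counts['total'] = counts.sum(axis=1)
print(counts)
```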
<xarray.Dataset>\n", "Dimensions: (lat: 160, bnds: 2, lon: 320, member_id: 1,\n", " dcpp_init_year: 1, time: 13980, plev: 19)\n", "Coordinates:\n", " * lat (lat) float64 -89.14 -88.03 -86.91 ... 86.91 88.03 89.14\n", " lat_bnds (lat, bnds) float64 -90.0 -88.59 -88.59 ... 88.59 88.59 90.0\n", " * lon (lon) float64 0.0 1.125 2.25 3.375 ... 356.6 357.8 358.9\n", " lon_bnds (lon, bnds) float64 -0.5625 0.5625 0.5625 ... 358.3 359.4\n", " * time (time) object 0850-01-16 12:00:00 ... 2014-12-16 12:00:00\n", " time_bnds (time, bnds) object dask.array<chunksize=(12000, 2), meta=np.ndarray>\n", " * member_id (member_id) object 'r1i1p1f1'\n", " * dcpp_init_year (dcpp_init_year) float64 nan\n", " * plev (plev) float64 1e+05 9.25e+04 8.5e+04 ... 1e+03 500.0 100.0\n", " height float64 2.0\n", "Dimensions without coordinates: bnds\n", "Data variables:\n", " pr (member_id, dcpp_init_year, time, lat, lon) float32 dask.array<chunksize=(1, 1, 308, 160, 320), meta=np.ndarray>\n", " ta (member_id, dcpp_init_year, time, plev, lat, lon) float32 dask.array<chunksize=(1, 1, 26, 19, 160, 320), meta=np.ndarray>\n", " tas (member_id, dcpp_init_year, time, lat, lon) float32 dask.array<chunksize=(1, 1, 439, 160, 320), meta=np.ndarray>\n", " ua (member_id, dcpp_init_year, time, plev, lat, lon) float32 dask.array<chunksize=(1, 1, 35, 19, 160, 320), meta=np.ndarray>\n", "Attributes: (12/44)\n", " Conventions: CF-1.7 CMIP-6.2\n", " activity_id: PMIP\n", " branch_method: no parent\n", " cmor_version: 3.5.0\n", " data_specs_version: 01.00.31\n", " experiment: last millennium\n", " ... ...\n", " intake_esm_attrs:member_id: r1i1p1f1\n", " intake_esm_attrs:table_id: Amon\n", " intake_esm_attrs:grid_label: gn\n", " intake_esm_attrs:version: 20200120\n", " intake_esm_attrs:_data_format_: zarr\n", " intake_esm_dataset_key: PMIP.MRI.MRI-ESM2-0.past1000.Amon.gn
<xarray.DataArray 'tas' (member_id: 1, dcpp_init_year: 1, time: 13980)>\n", "dask.array<getitem, shape=(1, 1, 13980), dtype=float32, chunksize=(1, 1, 600), chunktype=numpy.ndarray>\n", "Coordinates:\n", " lat float64 -9.533\n", " lon float64 110.2\n", " * time (time) object 0850-01-16 12:00:00 ... 2014-12-16 12:00:00\n", " * member_id (member_id) object 'r1i1p1f1'\n", " * dcpp_init_year (dcpp_init_year) float64 nan\n", " height float64 2.0\n", "Attributes:\n", " cell_measures: area: areacella\n", " cell_methods: area: time: mean\n", " comment: near-surface (usually, 2 meter) air temperature\n", " history: 2020-01-04T03:36:44Z altered by CMOR: Treated scalar dime...\n", " long_name: Near-Surface Air Temperature\n", " original_name: TA\n", " standard_name: air_temperature\n", " units: K
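The series above is `tas` at a single grid cell (lat -9.533, lon 110.2). It still carries the length-1 `member_id` and `dcpp_init_year` dimensions. The sketch below runs on a small synthetic dataset; the grid, values and target coordinates are made up for illustration. It picks the nearest grid cell with `sel(..., method='nearest')` and drops the singleton dimensions with `squeeze`.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a gridded dataset with the same singleton dimensions.
lat = np.linspace(-20.0, 20.0, 5)   # -20, -10, 0, 10, 20
lon = np.linspace(100.0, 120.0, 5)  # 100, 105, 110, 115, 120
time = np.arange(3)
data = np.random.default_rng(0).normal(288.0, 1.0, size=(1, 1, 3, 5, 5)).astype('float32')

ds = xr.Dataset(
    {'tas': (('member_id', 'dcpp_init_year', 'time', 'lat', 'lon'), data)},
    coords={
        'member_id': ['r1i1p1f1'],
        'dcpp_init_year': [np.nan],
        'time': time,
        'lat': lat,
        'lon': lon,
    },
)

# Nearest grid cell to a hypothetical site, then drop the length-1 dimensions.
point = ds['tas'].sel(lat=-9.5, lon=110.2, method='nearest').squeeze()
print(point)
```

After `squeeze`, `member_id` and `dcpp_init_year` remain as scalar coordinates, so the metadata is not lost.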
Warning
\n", " Experiments do not necessarily fall on the same calendar, even those carried out using the same model. \n", "