API Reference#
EnsembleTS#
- class pens.ens.EnsembleTS(time=None, value=None, label=None, time_name=None, time_unit=None, value_name=None, value_unit=None)#
Ensemble Timeseries
Note that an annual reconstruction is assumed, so the time axis is in years. The ensemble values should have shape (nt, nEns), where nt is the number of years and nEns is the number of ensemble members.
- distance(y=None, order=1, nsamples=None)#
Compute the distance between a target y and the ensemble object.
- Parameters:
y (array-like, length n) – trace/plume whose probability is to be assessed. If None, the distance is computed between every possible pair of trajectories within the ensemble. If specified, must have n == self.nt.
order (int, or inf) – Order of the norm. inf means numpy’s inf object. The default is 1.
nsamples (int) – number of samples to use (to speed up computation for very large ensembles)
- Returns:
dist
- Return type:
numpy array, dimension (self.nEns,)
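To make the notion of a per-member distance concrete, here is a minimal NumPy sketch. It assumes the distance is the order-`order` norm of the difference between the target and each member, taken along the time axis; the helper name `ensemble_distance` is hypothetical, and any normalization applied by the library itself is not documented here and is not reproduced.

```python
import numpy as np

def ensemble_distance(ens, y, order=1):
    """Norm of (member - y) along time, for every member of ens (nt, nEns).

    Hypothetical helper: an illustration, not the pens implementation.
    """
    ens = np.asarray(ens, dtype=float)
    y = np.asarray(y, dtype=float)
    if y.shape[0] != ens.shape[0]:
        raise ValueError("y must have the same length as the time axis (nt)")
    # Broadcast y against every column, then take a vector norm over time
    return np.linalg.norm(ens - y[:, None], ord=order, axis=0)

rng = np.random.default_rng(0)
ens = rng.normal(size=(50, 8))   # nt = 50 years, nEns = 8 members
y = ens[:, 3].copy()             # target identical to member 3
dist = ensemble_distance(ens, y, order=1)
print(dist.shape)                # one distance per ensemble member
```

Passing `order=np.inf` gives the largest absolute deviation over time instead of the sum of absolute deviations.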
- from_df(df, time_column=None, value_columns=None)#
Load data from a pandas.DataFrame
- Parameters:
df (pandas.DataFrame) – The pandas.DataFrame object.
time_column (str) – The label of the column for the time axis.
value_columns (list of str) – The list of the labels for the value axis of the ensemble members.
- get_means_and_trends(segment_length=10, step=10, xm=np.linspace(-0.5, 1.5, 200), bw='silverman')#
Extract trend distributions from EnsembleTS object via Gaussian Kernel Density Estimation
- Parameters:
segment_length (int, optional) – Number of elements (time steps) in each segment. The default is 10.
step (int, optional) – Step size used to advance from one segment to the next. The default is 10.
xm (NumPy array, optional) – axis over which the KDE is calculated. The default is np.linspace(-0.5, 1.5, 200).
bw (str, scalar or callable, optional) – The method used to calculate the estimator bandwidth. This can be “scott”, “silverman”, a scalar constant or a callable. If a scalar, this will be used directly as kde.factor. If a callable, it should take a gaussian_kde instance as its only parameter and return a scalar. The default is “silverman”; if None, “scott” is used.
- Returns:
new – DESCRIPTION.
- Return type:
TYPE
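As a quick check of the default evaluation grid for the KDE, the sketch below rebuilds `np.linspace(-0.5, 1.5, 200)` and inspects its spacing. It only illustrates the `xm` argument; it does not call the method itself.

```python
import numpy as np

# Default grid over which the kernel density estimate is evaluated
xm = np.linspace(-0.5, 1.5, 200)

dx = xm[1] - xm[0]               # constant spacing: 2 / 199
print(xm.size, xm[0], xm[-1])
print(round(dx, 8))
```

A coarser or wider grid can be supplied through `xm` if the trends in a given ensemble fall outside this range.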
- hdi_score(y, prob=0.9)#
Computes HDI score for target series y
- Parameters:
y (array-like, length n) – trace whose intensity of probability (“likelihood”) is to be assessed. Must have n == self.nt.
prob (float) – probability for which the highest density interval will be computed. The default is 0.9.
- Returns:
score (the score (scalar))
HDI (the n x 2 array)
- line_density(figsize=[10, 4], cmap='Greys', color_scale='linear', bins=None, num_fine=None, xlabel=None, ylabel=None, title=None, ylim=None, xlim=None, title_kwargs=None, ax=None, **pcolormesh_kwargs)#
Plot the timeseries 2-D histogram
- Parameters:
cmap (str) – The colormap for the histogram.
color_scale (str) – The scale of the colorbar; should be either ‘linear’ or ‘log’.
bins (list/tuple of 2 floats) – The number of bins for each axis: nx, ny = bins.
- load_nc(path, time_name='time', var=None)#
Load data from a .nc file with xarray
- Parameters:
path (str) – The path of the .nc file.
var (str) – The name of variable to load. Note that we assume the first axis of the loaded variable is time.
time_name (str) – The name of the time axis.
- make_labels()#
Initialization of plot labels based on object metadata
- Returns:
time_header (str) – Label for the time axis
value_header (str) – Label for the value axis
- plot(figsize=[12, 4], xlabel=None, ylabel=None, title=None, ylim=None, xlim=None, legend_kwargs=None, title_kwargs=None, ax=None, **plot_kwargs)#
Plot the raw values (multiple series)
- plot_hdi(prob=0.9, median=True, figsize=[12, 4], color='tab:blue', xlabel=None, ylabel=None, label=None, title=None, ylim=None, xlim=None, alpha=0.2, legend_kwargs=None, title_kwargs=None, ax=None, **plot_kwargs)#
Plot the highest density interval (HDI) of the ensemble. h/t: ArviZ code: https://arviz-devs.github.io/arviz/_modules/arviz/stats/stats.html#hdi
- Parameters:
prob (float) – probability for which the highest density interval will be computed. The default is 0.9.
median (bool) – If True (default), the posterior median is added.
figsize (tuple, optional) – dimensions of the figure. The default is [12, 4].
xlabel (str, optional) – Label for x axis. The default is None.
ylabel (str, optional) – Label for y axis. The default is None.
label (str, optional) – Label for the plotted objects; useful for multi-plots. If None (default) is specified, will attempt to use the object’s label.
title (str, optional) – The title of the figure. The default is None.
ylim (tuple or list, optional) – The limits of the y-axis. The default is None.
xlim (tuple or list, optional) – The limits of the x-axis. The default is None.
alpha (float, optional) – Transparency of the HDI shading. The default is 0.2.
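legend_kwargs (dict, optional) – The keyword arguments for the ax.legend() function. The default is None.
title_kwargs (dict, optional) – The keyword arguments for the figure title. The default is None.
ax (matplotlib.axes, optional) – The matplotlib.axes object. If set, the image will be plotted in the existing ax. The default is None.
**plot_kwargs (dict) – Additional keyword arguments passed to the underlying plotting call.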
- Returns:
DESCRIPTION.
- Return type:
TYPE
- plot_qs(figsize=[10, 4], qs=[0.025, 0.25, 0.5, 0.75, 0.975], color='indianred', xlabel=None, ylabel=None, title=None, ylim=None, xlim=None, alphas=[0.3, 0.1], plot_kwargs=None, legend_kwargs=None, title_kwargs=None, ax=None, plot_trend=True)#
Plot the quantiles
- Parameters:
figsize (list, optional) – The size of the figure. Defaults to [10, 4].
qs (list, optional) – The list to denote the quantiles plotted. Defaults to [0.025, 0.25, 0.5, 0.75, 0.975].
color (str, optional) – The basic color for the quantile envelopes. Defaults to ‘indianred’.
xlabel (str, optional) – The label for the x-axis. Defaults to ‘Year (CE)’.
ylabel (str, optional) – The label for the y-axis. Defaults to None.
title (str, optional) – The title of the figure. Defaults to None.
ylim (tuple or list, optional) – The limit of the y-axis. Defaults to None.
xlim (tuple or list, optional) – The limit of the x-axis. Defaults to None.
alphas (list, optional) – The alphas for the quantile envelopes. Defaults to [0.3, 0.1].
plot_kwargs (dict, optional) – The keyword arguments for the ax.plot() function. Defaults to None.
legend_kwargs (dict, optional) – The keyword arguments for the ax.legend() function. Defaults to None.
title_kwargs (dict, optional) – The keyword arguments for the ax.set_title() function. Defaults to None.
ax (matplotlib.axes, optional) – The matplotlib.axes object. If set the image will be plotted in the existing ax. Defaults to None.
plot_trend (bool, optional) – If True, will plot the trend analysis result if it exists. Defaults to True.
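The quantile envelopes described by `qs` can be illustrated with plain NumPy. This is a sketch on synthetic data, assuming the quantiles are taken across ensemble members at each time step and that symmetric pairs (outermost first) form the shaded envelopes; the pairing logic is an assumption, not the library's code.

```python
import numpy as np

rng = np.random.default_rng(1)
ens = rng.normal(size=(100, 500))        # synthetic ensemble: nt = 100, nEns = 500
qs = [0.025, 0.25, 0.5, 0.75, 0.975]

# Quantiles across members (axis=1): one series per requested quantile
q_series = np.quantile(ens, qs, axis=1)  # shape (len(qs), nt)

# Pair the quantiles symmetrically: (2.5%, 97.5%), then (25%, 75%)
envelopes = [(q_series[i], q_series[-1 - i]) for i in range(len(qs) // 2)]
median = q_series[len(qs) // 2]
print(q_series.shape, len(envelopes))
```

Each `(lower, upper)` pair could then be passed to `ax.fill_between` with the matching entry of `alphas`, and `median` drawn as a line.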
- plot_traces(num_traces=5, figsize=[10, 4], title=None, label=None, seed=None, indices=None, xlim=None, ylim=None, color=None, ax=None, plot_legend=True, lgd_kwargs=None, xlabel=None, ylabel=None, lw=0.5, alpha=0.1)#
Plot EnsembleTS as a subset of traces.
- Parameters:
num_traces (int, optional) – Number of traces to plot, chosen at random. Default is 5.
figsize (list, optional) – The figure size. The default is [10, 4].
xlabel (str, optional) – x-axis label. The default is None.
ylabel (str, optional) – y-axis label. The default is None.
title (str, optional) – Plot title. The default is None.
label (str, optional) – Label to use on the plot legend. Automatically generated if not provided.
seed (int, optional) – Seed for the random number generator. Useful for reproducibility. The default is None. Disregarded if indices is not None.
indices (int, optional) – (0-based) indices of the traces. The default is None. If provided, supersedes “seed” and “num_traces”.
xlim (list, optional) – x-axis limits. The default is None.
ylim (list, optional) – y-axis limits. The default is None.
color (str, optional) – Color of the traces. The default uses the property cycler: https://matplotlib.org/stable/gallery/color/color_cycle_default.html
alpha (float, optional) – Transparency of the lines representing the multiple members. The default is 0.1.
linestyle ({'-', '--', '-.', ':', '', (offset, on-off-seq), ...}) – Set the linestyle of the line
lw (float, optional) – Width of the lines representing the multiple members. The default is 0.5.
savefig_settings (dict, optional) –
the dictionary of arguments for plt.savefig(); some notes below:
“path” must be specified; it can be any existing or non-existing path, with or without a suffix; if the suffix is not given in “path”, it will follow “format”
“format” can be one of {“pdf”, “eps”, “png”, “ps”}. The default is None.
ax (matplotlib.axes, optional) – Matplotlib axes on which to draw the plot. The default is None.
plot_legend (bool; {True,False}, optional) – Whether to plot the legend. The default is True.
lgd_kwargs (dict, optional) – Parameters for the legend. The default is None.
- Returns:
fig (matplotlib.figure) – the figure object from matplotlib. See [matplotlib.pyplot.figure](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.figure.html) for details.
ax (matplotlib.axis) – the axis object from matplotlib. See [matplotlib.axes](https://matplotlib.org/api/axes_api.html) for details.
- plume_distance(y=None, max_dist=1, num=100, q=0.5, order=1, spread_stat='IQR', dist=None, nsamples=None)#
Compute the (quantile-based) characteristic distance between a plume (ensemble) and another object (whether a single trace or another plume). Searches for quantile q of the “proximity probability” distribution.
- Parameters:
y (array-like, length self.nt) – trace/plume whose probability is to be assessed
q (float) – Quantile from which the characteristic distance is derived. Default = 0.5 (median)
order (int, or inf) – Order of the norm. inf means numpy’s inf object. The default is 1.
spread_stat (str) –
Statistic to be used for distributional spread. Accepted choices are:
SD: the standard deviation, appropriate for Gaussian situations
IQR (default): the interquartile range (a non-parametric measure, robust and resistant)
HDI: the 95% highest-density interval, returned as a NumPy 2-array
dist (array-like, length self.nEns) – if provided, uses this as vector of distances. Otherwise it is computed internally
nsamples (int) – number of samples to use from the ensemble. Default is None, which uses all samples.
- Returns:
eps_q (float) – Representative distance at quantile q (in same units as self or y)
eps_spread (float) – Measure of distributional spread
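One way to read "searching for quantile q of the proximity probability distribution" is sketched below. It assumes that `max_dist` and `num` define a grid of candidate tolerances and that the characteristic distance is the smallest tolerance at which the proximity probability reaches `q`; both the interpretation and the helper name `characteristic_distance` are assumptions, not the documented behavior.

```python
import numpy as np

def characteristic_distance(dist, q=0.5, max_dist=1.0, num=100):
    """Smallest eps on a grid such that P(dist <= eps) >= q.

    Hypothetical helper illustrating a grid search over tolerances.
    """
    dist = np.asarray(dist, dtype=float)
    eps_grid = np.linspace(0.0, max_dist, num)
    # Proximity probability for each candidate tolerance
    prob = np.array([(dist <= e).mean() for e in eps_grid])
    idx = int(np.argmax(prob >= q))      # first grid point reaching q
    if prob[idx] < q:
        raise ValueError("max_dist is too small to reach quantile q")
    return eps_grid[idx]

dist = np.array([0.1, 0.2, 0.3, 0.4])
eps_q = characteristic_distance(dist, q=0.5, max_dist=1.0, num=101)
print(eps_q)
```

Under this reading, the result approximates the q-quantile of the distances, with a resolution set by `max_dist / (num - 1)`.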
- proximity_prob(y, eps, order=1, dist=None, nsamples=None)#
Compute the probability P that the trace y is within a distance eps of the ensemble object.
- Parameters:
y (array-like, length self.nt) – trace/plume whose proximity is to be assessed
eps (array of float64) – numerical tolerance for the distance.
order (int, or inf) – Order of the norm. inf means numpy’s inf object. The default is 1.
dist (array-like, length self.nEns) – if provided, uses this as vector of distances. Otherwise it is computed internally
- Returns:
P – Probability that the trace y is within a distance eps of the ensemble object
- Return type:
float in [0,1]
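The proximity probability can be pictured as an empirical fraction. The sketch below assumes that P is the share of ensemble members whose distance to `y` is at most `eps` (whether the library uses `<=` or `<` is not stated); the helper name `proximity_prob_sketch` is hypothetical, and the distances are supplied directly, as with the `dist` argument.

```python
import numpy as np

def proximity_prob_sketch(dist, eps):
    """Fraction of members within distance eps, for each value of eps."""
    dist = np.asarray(dist, dtype=float)
    eps = np.atleast_1d(np.asarray(eps, dtype=float))
    # Compare every tolerance against every member distance, then average
    return (dist[None, :] <= eps[:, None]).mean(axis=1)

dist = np.array([0.2, 0.5, 0.9, 1.4])    # precomputed member distances
P = proximity_prob_sketch(dist, [0.0, 0.5, 2.0])
print(P)
```

By construction the result is non-decreasing in `eps` and bounded by 0 and 1.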
- random_paths(model='fGn', param=None, p=1, trend=None, seed=None)#
Generate p random walks through the ensemble according to a given parametric model with random parameter sampling
- Parameters:
model (str) –
Stochastic model for the temporal behavior. Accepted choices are:
unif: resample uniformly from the posterior distribution
ar: autoregressive model, see https://www.statsmodels.org/dev/tsa.html#univariate-autoregressive-processes-ar
fGn: fractional Gaussian noise, see https://stochastic.readthedocs.io/en/stable/noise.html#stochastic.processes.noise.FractionalGaussianNoise
power-law: aka Colored Noise, see https://stochastic.readthedocs.io/en/stable/noise.html#stochastic.processes.noise.ColoredNoise
param (variable type [default is None]) –
parameter of the model.
unif: no parameter
ar: param is the result from fitting statsmodels AutoReg.fit() (with zero-lag term)
fGn: param is the Hurst exponent, H (float)
power-law: param is the spectral exponent beta (float)
Under allowable values, fGn and power-law should return equivalent results as long as H = (beta+1)/2 is in [0, 1)
p (int) – number of series to export
trend (array, length self.nt) – general trend of the ensemble. If None, it is calculated as the ensemble mean. If provided, it will be added to the ensemble.
seed (int) – seed for the random generator (provided for reproducibility)
- Returns:
new
- Return type:
EnsembleTS object containing the p series
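For the simplest model, `unif`, a plausible reading is that each path picks one ensemble member independently at every time step. The sketch below implements that reading with NumPy and also encodes the stated relation H = (beta + 1)/2; the function names are hypothetical, and the handling of `trend` is left out.

```python
import numpy as np

def uniform_paths(ens, p=1, seed=None):
    """Draw p paths by sampling one member per time step (assumed 'unif' behavior)."""
    ens = np.asarray(ens)
    nt, n_ens = ens.shape
    rng = np.random.default_rng(seed)
    # One random member index for every (time step, path) pair
    idx = rng.integers(0, n_ens, size=(nt, p))
    return np.take_along_axis(ens, idx, axis=1)   # shape (nt, p)

def hurst_from_beta(beta):
    """Hurst exponent matching spectral exponent beta: H = (beta + 1) / 2."""
    return (beta + 1.0) / 2.0

ens = np.arange(12).reshape(4, 3)        # nt = 4, nEns = 3
paths = uniform_paths(ens, p=5, seed=0)
print(paths.shape, hurst_from_beta(0.0))
```

Because members are drawn independently at each step, such paths carry no temporal persistence, which is what the `ar`, `fGn` and `power-law` options are meant to add.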
- sample_nearest(target, metric='MSE')#
Get the nearest sample path against the target series
Note that metric is used only for the final distance calculation.
- slice(timespan)#
Slicing the timeseries with a timespan (tuple or list)
- Parameters:
timespan (tuple or list) – The list of time points for slicing, whose length must be even. When there are n time points, the output Series includes n/2 segments. For example, if timespan = [a, b], then the sliced output includes one segment [a, b]; if timespan = [a, b, c, d], then the sliced output includes segment [a, b] and segment [c, d].
- Returns:
new – The sliced EnsembleTS object.
- Return type:
EnsembleTS
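The pairing of time points into segments can be sketched on raw arrays as follows. This assumes that both endpoints of each segment are included, which the description does not state; `slice_ensemble` is a hypothetical helper working on a time vector and an (nt, nEns) value array.

```python
import numpy as np

def slice_ensemble(time, value, timespan):
    """Keep the rows whose time falls in any [start, end] pair of timespan."""
    if len(timespan) % 2 != 0:
        raise ValueError("timespan must contain an even number of time points")
    time = np.asarray(time)
    value = np.asarray(value)
    mask = np.zeros(time.shape, dtype=bool)
    # Consecutive pairs (a, b), (c, d), ... each define one segment
    for start, end in zip(timespan[0::2], timespan[1::2]):
        mask |= (time >= start) & (time <= end)
    return time[mask], value[mask]

time = np.arange(1000, 1010)
value = np.arange(20).reshape(10, 2)     # nt = 10, nEns = 2
t_new, v_new = slice_ensemble(time, value, [1001, 1003, 1007, 1008])
print(t_new)
```

With `timespan = [a, b]` the loop runs once and the result is the single segment [a, b].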
- subsample(nsamples, seed=None)#
Thin out original ensemble by drawing nsamples at random
- Parameters:
nsamples (int) – number of samples to draw at random from the original ensemble. If nsamples >= self.nEns, the object is returned unchanged.
seed (int) – seed for the random generator (provided for reproducibility)
- Returns:
res – Downsized object.
- Return type:
EnsembleTS
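Thinning an ensemble amounts to selecting a random subset of columns. The sketch below assumes sampling without replacement (not stated in the description) and mirrors the documented rule that nothing changes when `nsamples >= nEns`; `subsample_members` is a hypothetical helper.

```python
import numpy as np

def subsample_members(value, nsamples, seed=None):
    """Return nsamples randomly chosen columns of value (nt, nEns)."""
    value = np.asarray(value)
    n_ens = value.shape[1]
    if nsamples >= n_ens:
        return value                     # nothing to thin out
    rng = np.random.default_rng(seed)
    # Assumption: members are drawn without replacement
    cols = rng.choice(n_ens, size=nsamples, replace=False)
    return value[:, cols]

value = np.arange(40).reshape(4, 10)     # nt = 4, nEns = 10
small = subsample_members(value, 3, seed=42)
print(small.shape)
```

Fixing `seed` makes the selection reproducible across runs.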
- to_df(time_column=None, value_column='ens')#
Convert an EnsembleTS to a pandas.DataFrame
- Parameters:
time_column (str) – The label of the column for the time axis.
value_column (str) – The base column label for the ensemble members. By default, the columns for the members will be labeled as “ens.0”, “ens.1”, “ens.2”, etc.
- to_pyleo(**kwargs)#
Convert to a pyleoclim.EnsembleSeries or pyleoclim.Series object
- Parameters:
kwargs (keyword arguments) – keyword arguments for a pyleoclim.Series object
- trace_rank(y)#
Computes ensemble rank (expressed as percentile) for trace y
- Parameters:
y (array-like, length n) – trace whose rank within the ensemble is to be assessed. Must have n == self.nt.
- Returns:
percent
- Return type:
array-like, length n
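A percentile rank at each time step can be sketched as the share of members that do not exceed the trace. This is an assumed definition (the treatment of ties in particular is a guess), and `trace_rank_sketch` is a hypothetical helper rather than the library method.

```python
import numpy as np

def trace_rank_sketch(ens, y):
    """Percentage of members <= y at each time step; ens has shape (nt, nEns)."""
    ens = np.asarray(ens, dtype=float)
    y = np.asarray(y, dtype=float)
    return 100.0 * (ens <= y[:, None]).mean(axis=1)

ens = np.array([[1.0, 2.0, 3.0, 4.0],
                [1.0, 2.0, 3.0, 4.0]])   # nt = 2, nEns = 4
percent = trace_rank_sketch(ens, [2.5, 0.0])
print(percent)
```

Values near 0 or 100 flag time steps where the trace lies at the edge of, or outside, the ensemble spread.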
Utilities#
- pens.utils.hdi1d(ary, hdi_prob, skipna=True)#
Compute highest density interval over a 1d array. h/t: Arviz code: https://arviz-devs.github.io/arviz/_modules/arviz/stats/stats.html#hdi
- Parameters:
ary (NumPy array) – values over which to compute HDI
hdi_prob (float) – probability
skipna (bool) – flag to decide whether to drop NaNs (defaults to True)
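The following sketch shows the standard "narrowest interval" approach to a 1-D highest density interval, in the spirit of the ArviZ code credited above. It is an illustration under that assumption, not a copy of `pens.utils.hdi1d`; the name `hdi_1d_sketch` is hypothetical.

```python
import numpy as np

def hdi_1d_sketch(ary, hdi_prob, skipna=True):
    """Narrowest interval containing roughly hdi_prob of the samples."""
    ary = np.asarray(ary, dtype=float).ravel()
    if skipna:
        ary = ary[~np.isnan(ary)]
    n = ary.size
    srt = np.sort(ary)
    k = int(np.floor(hdi_prob * n))      # index offset spanned by each candidate
    widths = srt[k:] - srt[:n - k]       # width of every candidate interval
    if widths.size == 0:
        raise ValueError("too few samples for the requested hdi_prob")
    i = int(np.argmin(widths))           # narrowest candidate wins
    return np.array([srt[i], srt[i + k]])

samples = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 100, np.nan])
hdi = hdi_1d_sketch(samples, 0.8)
print(hdi)
```

In this example the outlier at 100 is excluded because the interval starting at 0 is far narrower than the one ending at 100.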
- pens.utils.means_and_trends_ensemble(var, segment_length, step, years)#
Calculates the means and trends on an ensemble array. Uses statsmodels’ OLS method.
- Parameters:
var (2-D NumPy array) – ensemble values, with dimensions [time, ens member]
segment_length (int) – number of elements in each block
step (int) – step size
years (1-D NumPy array) – the years of the time axis
- Returns:
means – Means of every segment.
trends – Trends over every segment.
idxs – The first and last index of every segment, for record-keeping.
tm – Median time point of each block.
Author: Julien Emile-Geay, based on code by Michael P. Erb. Date: March 8, 2018
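The segment-wise computation can be sketched with NumPy alone. Note the swap: this version uses `np.polyfit` for the linear fit rather than statsmodels' OLS, and it reports trends per unit of `years`; the units used by the library function are not stated here. The name `segment_means_and_trends` is hypothetical.

```python
import numpy as np

def segment_means_and_trends(var, segment_length, step, years):
    """Means and linear trends of var (time, member) over sliding segments."""
    var = np.asarray(var, dtype=float)
    years = np.asarray(years, dtype=float)
    means, trends, idxs, tm = [], [], [], []
    for start in range(0, var.shape[0] - segment_length + 1, step):
        stop = start + segment_length
        seg_years, seg = years[start:stop], var[start:stop, :]
        means.append(seg.mean(axis=0))
        # polyfit fits every column at once; row 0 holds the slopes
        trends.append(np.polyfit(seg_years, seg, deg=1)[0])
        idxs.append((start, stop - 1))   # first and last index of the segment
        tm.append(np.median(seg_years))
    return np.array(means), np.array(trends), np.array(idxs), np.array(tm)

years = np.arange(2000, 2030)
var = np.column_stack([0.1 * (years - 2000), 1.0 - 0.2 * (years - 2000)])
means, trends, idxs, tm = segment_means_and_trends(var, 10, 10, years)
print(trends)
```

With `step` equal to `segment_length` the segments do not overlap; a smaller `step` yields overlapping windows.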
- pens.utils.standardize(x, scale=1, axis=0, ddof=0, eps=0.001)#
Centers and normalizes a time series. Constant or nearly constant time series are not rescaled.
- Parameters:
x (array) – vector of (real) numbers as a time series, NaNs allowed
scale (real) – A scale factor used to scale a record to match a given variance
axis (int or None) – axis along which to operate, if None, compute over the whole array
ddof (int) – degrees of freedom correction in the calculation of the standard deviation
eps (real) – a threshold to determine if the standard deviation is too close to zero
- Returns:
z (array) – The standardized time series (z-score), Z = (X - E[X])/std(X)*scale, NaNs allowed
mu (real) – The mean of the original time series, E[X]
sig (real) – The standard deviation of the original time series, std[X]
References
Tapio Schneider’s MATLAB code: tapios/RegEM
The zscore function in SciPy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.zscore.html
See also
pyleoclim.utils.tsutils.preprocess
pre-processes a time series using standardization and detrending.
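The z-score formula given for `standardize`, Z = (X - E[X])/std(X)*scale, can be sketched as follows. The sketch assumes NaN-aware statistics and interprets "not rescaled" as centering without dividing when the standard deviation is at or below `eps`; `standardize_sketch` is a hypothetical name, not the library function.

```python
import numpy as np

def standardize_sketch(x, scale=1, axis=0, ddof=0, eps=1e-3):
    """Center and scale x, leaving (near-)constant series unscaled."""
    x = np.asarray(x, dtype=float)
    mu = np.nanmean(x, axis=axis, keepdims=True)
    sig = np.nanstd(x, axis=axis, ddof=ddof, keepdims=True)
    # Assumption: when sig <= eps the series is only centered, not divided
    divisor = np.where(sig > eps, sig, 1.0)
    z = (x - mu) / divisor * scale
    return z, mu.squeeze(), sig.squeeze()

x = np.array([1.0, 2.0, 3.0, 4.0, np.nan])
z, mu, sig = standardize_sketch(x)
print(float(mu), float(sig))

z_const, _, _ = standardize_sketch(np.ones(5))   # constant series
print(z_const)
```

NaNs in the input stay NaN in `z`, while the mean and standard deviation are computed from the remaining values.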