API Reference


EnsembleTS

class pens.ens.EnsembleTS(time=None, value=None, label=None, time_name=None, time_unit=None, value_name=None, value_unit=None)

Ensemble Timeseries

Note that an annual reconstruction is assumed, so the time axis is in years. The ensemble values (the value argument) should have shape (nt, nEns), where nt is the number of years and nEns is the number of ensemble members.
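As an illustration of the (nt, nEns) layout, the NumPy sketch below builds a synthetic ensemble with one row per year and one column per member. The data are made up; only the array shapes follow the description above.

```python
import numpy as np

# Synthetic (made-up) ensemble: 100 annual values for 20 members
rng = np.random.default_rng(0)
years = np.arange(1900, 2000)                        # time axis in years, length nt
nt, nEns = years.size, 20
value = rng.normal(size=(nt, nEns)).cumsum(axis=0)   # one random walk per column

member_3 = value[:, 3]                        # one trajectory: every year for member 3
year_1950 = value[years == 1950, :].ravel()   # every member's value in 1950

# Per the constructor signature, these would be passed as
# EnsembleTS(time=years, value=value)
```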

distance(y=None, order=1, nsamples=None)

Compute the distance between a target y and the ensemble object.

Parameters:

y (array-like, length n) – trace/plume whose probability is to be assessed. If None, the distance is computed between every possible pair of trajectories within the ensemble. If specified, must have n == self.nt.
order (int or inf) – Order of the norm (see np.linalg.norm); use np.inf for the infinity norm. The default is 1.
nsamples (int) – number of samples to use (to speed up computation for very large ensembles)

See also

np.linalg.norm: https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html

Returns: dist
Return type: numpy array, dimension (self.nEns,)
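A minimal NumPy sketch of the computation described above, assuming the distance is the plain np.linalg.norm of the difference taken over the time axis, with no normalization and no nsamples subsampling; the pens implementation may differ in these details. The name distance_sketch is hypothetical.

```python
import numpy as np

def distance_sketch(value, y=None, order=1):
    """Norm of the difference between y and each member (column) of value.

    If y is None, return the distance for every distinct pair of members (i < j).
    Assumption: no normalization by nt is applied.
    """
    value = np.asarray(value, dtype=float)
    if y is None:
        nEns = value.shape[1]
        i, j = np.triu_indices(nEns, k=1)             # all pairs i < j
        return np.linalg.norm(value[:, i] - value[:, j], ord=order, axis=0)
    y = np.asarray(y, dtype=float)
    if y.shape[0] != value.shape[0]:
        raise ValueError("y must have length nt")
    # Vector norm of each column of the difference (axis=0 is time)
    return np.linalg.norm(value - y[:, None], ord=order, axis=0)
```

With order=np.inf, each distance is the largest absolute deviation over time.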

from_df(df, time_column=None, value_columns=None)

Load data from a pandas.DataFrame

Parameters:

df (pandas.DataFrame) – The pandas.DataFrame object.
time_column (str) – The label of the column for the time axis.
value_columns (list of str) – The list of the labels for the value axis of the ensemble members.
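A minimal pandas sketch of the table layout that from_df reads: one row per time step, one column per member. The column names are hypothetical, and the extraction shown is an assumption about how time_column and value_columns map onto the time and value arrays, not the pens source.

```python
import pandas as pd

# Hypothetical wide-format table: one time column, one column per member
df = pd.DataFrame({
    "year": [2000, 2001, 2002],
    "m0": [0.1, 0.2, 0.3],
    "m1": [0.0, 0.4, 0.1],
})

time_column = "year"          # would be passed as time_column
value_columns = ["m0", "m1"]  # would be passed as value_columns

time = df[time_column].to_numpy()      # length nt
value = df[value_columns].to_numpy()   # shape (nt, nEns)
```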

hdi_score(y, prob=0.9)

Compute the HDI score of the target series y.

Parameters:

y (array-like, length n) – trace whose intensity of probability (“likelihood”) is to be assessed. Must have n == self.nt.
prob (float) – probability for which the highest density interval will be computed. The default is 0.9.

Returns:

score (float) – the score (a scalar)
HDI (numpy array) – the n x 2 array of HDI bounds
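The docstring does not define the score itself, so the sketch below only illustrates the HDI part: for each time step it finds the narrowest interval holding a fraction prob of the members (the ArviZ approach credited elsewhere on this page). The coverage_score function, which reports the fraction of time steps at which y falls inside the HDI, is a hypothetical stand-in, not the pens scoring rule.

```python
import numpy as np

def hdi_rows(value, prob=0.9):
    """Per-row HDI: narrowest interval holding a fraction prob of each row.

    Sort each row, then minimize the width of windows spanning
    floor(prob * n) steps. NaNs are not handled in this sketch.
    """
    s = np.sort(np.asarray(value, dtype=float), axis=1)   # sort members at each time
    n = s.shape[1]
    k = int(np.floor(prob * n))
    widths = s[:, k:] - s[:, : n - k]        # (nt, n - k) candidate widths
    i = np.argmin(widths, axis=1)            # narrowest window per time step
    rows = np.arange(s.shape[0])
    return np.column_stack([s[rows, i], s[rows, i + k]])

def coverage_score(value, y, prob=0.9):
    """Hypothetical score: fraction of time steps where y lies inside the HDI."""
    HDI = hdi_rows(value, prob)
    y = np.asarray(y, dtype=float)
    inside = (y >= HDI[:, 0]) & (y <= HDI[:, 1])
    return inside.mean(), HDI
```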

line_density(figsize=[10, 4], cmap='Greys', color_scale='linear', bins=None, num_fine=None, xlabel=None, ylabel=None, title=None, ylim=None, xlim=None, title_kwargs=None, ax=None, **pcolormesh_kwargs)

Plot a 2-D histogram (line density) of the ensemble traces.

Parameters:

cmap (str) – The colormap for the histogram.
color_scale (str) – The scale of the colorbar; should be either ‘linear’ or ‘log’.
bins (list/tuple of 2 ints) – The number of bins for each axis: nx, ny = bins.

References

https://matplotlib.org/3.6.0/gallery/statistics/time_series_histogram.html
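Following the approach of the matplotlib gallery example referenced above, the sketch below interpolates every member onto a fine time grid and bins all (time, value) points with np.histogram2d. Linear interpolation, the default num_fine=200 and the default bins are assumptions of this sketch; pens may choose them differently.

```python
import numpy as np

def line_density_counts(time, value, num_fine=200, bins=(50, 40)):
    """2-D histogram of all traces after linear interpolation onto a fine grid."""
    time = np.asarray(time, dtype=float)
    value = np.asarray(value, dtype=float)
    nEns = value.shape[1]
    t_fine = np.linspace(time.min(), time.max(), num_fine)
    # Interpolate every member (column) onto the fine time grid
    v_fine = np.column_stack([np.interp(t_fine, time, value[:, j]) for j in range(nEns)])
    x = np.tile(t_fine, nEns)     # time coordinate, repeated once per member
    y = v_fine.T.ravel()          # member 0 values, then member 1, ...
    counts, xedges, yedges = np.histogram2d(x, y, bins=bins)
    return counts, xedges, yedges
```

The result can be drawn with ax.pcolormesh(xedges, yedges, counts.T, cmap='Greys'); the transpose puts time on the horizontal axis.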

load_nc(path, time_name='time', var=None)

Load data from a .nc file with xarray

Parameters:

path (str) – The path of the .nc file.
var (str) – The name of the variable to load. Note that the first axis of the loaded variable is assumed to be time.
time_name (str) – The name of the time axis.

make_labels()

Initialize plot labels based on the object metadata.

Returns:

time_header (str) – Label for the time axis
value_header (str) – Label for the value axis

plot(figsize=[12, 4], xlabel=None, ylabel=None, title=None, ylim=None, xlim=None, legend_kwargs=None, title_kwargs=None, ax=None, **plot_kwargs)

Plot the raw values (multiple series).

plot_hdi(prob=0.9, median=True, figsize=[12, 4], color='tab:blue', xlabel=None, ylabel=None, label=None, title=None, ylim=None, xlim=None, alpha=0.2, legend_kwargs=None, title_kwargs=None, ax=None, **plot_kwargs)

Plot the highest density interval (HDI) of the ensemble, optionally with the median.

h/t: Arviz code: https://arviz-devs.github.io/arviz/_modules/arviz/stats/stats.html#hdi

Parameters:

prob (float) – probability for which the highest density interval will be computed. The default is 0.9.
median (bool) – If True (default), the posterior median is added.
figsize (tuple, optional) – dimensions of the figure. The default is [12, 4].
xlabel (str, optional) – Label for x axis. The default is None.
ylabel (str, optional) – Label for y axis. The default is None.
label (str, optional) – Label for the plotted objects; useful for multi-plots. If None (default) is specified, will attempt to use the object’s label.
title (str, optional) – Plot title. The default is None.
ylim (list, optional) – y-axis limits. The default is None.
xlim (list, optional) – x-axis limits. The default is None.
alpha (float, optional) – Transparency of the HDI envelope. The default is 0.2.
legend_kwargs (dict, optional) – Keyword arguments for the legend. The default is None.
title_kwargs (dict, optional) – Keyword arguments for the title. The default is None.
ax (matplotlib.axes.Axes, optional) – Existing axes on which to draw the plot. The default is None.
**plot_kwargs – Additional keyword arguments for the matplotlib plotting call.

Returns:

DESCRIPTION.

Return type:

TYPE

plot_qs(figsize=[10, 4], qs=[0.025, 0.25, 0.5, 0.75, 0.975], color='indianred', xlabel=None, ylabel=None, title=None, ylim=None, xlim=None, alphas=[0.3, 0.1], plot_kwargs=None, legend_kwargs=None, title_kwargs=None, ax=None, plot_trend=True)

Plot the quantiles

Parameters:

figsize (list, optional) – The size of the figure. Defaults to [10, 4].
qs (list, optional) – The list to denote the quantiles plotted. Defaults to [0.025, 0.25, 0.5, 0.75, 0.975].
color (str, optional) – The basic color for the quantile envelopes. Defaults to ‘indianred’.
xlabel (str, optional) – The label for the x-axis. Defaults to None.
ylabel (str, optional) – The label for the y-axis. Defaults to None.
title (str, optional) – The title of the figure. Defaults to None.
ylim (tuple or list, optional) – The limit of the y-axis. Defaults to None.
xlim (tuple or list, optional) – The limit of the x-axis. Defaults to None.
alphas (list, optional) – The alphas for the quantile envelopes. Defaults to [0.3, 0.1].
plot_kwargs (dict, optional) – The keyword arguments for the ax.plot() function. Defaults to None.
legend_kwargs (dict, optional) – The keyword arguments for the ax.legend() function. Defaults to None.
title_kwargs (dict, optional) – The keyword arguments for the ax.set_title() function. Defaults to None.
ax (matplotlib.axes.Axes, optional) – The matplotlib axes object. If set, the plot is drawn on the existing axes. Defaults to None.
plot_trend (bool, optional) – If True, the trend analysis result is plotted if it exists. Defaults to True.
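The envelopes drawn by plot_qs can be computed with np.quantile along the member axis. The sketch below pairs the quantiles symmetrically (outermost with outermost) and returns the middle quantile as the median line; that pairing is an assumption read off the default qs, and which alpha goes with which envelope is not specified here.

```python
import numpy as np

def quantile_envelopes(value, qs=(0.025, 0.25, 0.5, 0.75, 0.975)):
    """Per-time quantiles of the ensemble, grouped into symmetric envelopes."""
    qs = np.sort(np.asarray(qs, dtype=float))
    Q = np.quantile(value, qs, axis=1)       # shape (len(qs), nt)
    m = len(qs)
    # Pair the outermost quantiles first: (q0, q_{m-1}), (q1, q_{m-2}), ...
    envelopes = [(Q[i], Q[m - 1 - i]) for i in range(m // 2)]
    median = Q[m // 2] if m % 2 == 1 else None
    return envelopes, median
```

Each (lower, upper) pair could then be shaded with ax.fill_between(time, lower, upper, alpha=...).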

plot_traces(num_traces=5, figsize=[10, 4], title=None, label=None, seed=None, indices=None, xlim=None, ylim=None, color=None, ax=None, plot_legend=True, lgd_kwargs=None, xlabel=None, ylabel=None, lw=0.5, alpha=0.1)

Plot EnsembleTS as a subset of traces.

Parameters:

num_traces (int, optional) – Number of traces to plot, chosen at random. The default is 5.
figsize (list, optional) – The figure size. The default is [10, 4].
xlabel (str, optional) – x-axis label. The default is None.
ylabel (str, optional) – y-axis label. The default is None.
title (str, optional) – Plot title. The default is None.
label (str, optional) – Label to use on the plot legend. Automatically generated if not provided.
seed (int, optional) – Seed for the random number generator, useful for reproducibility. The default is None. Ignored if indices is not None.
indices (list of int, optional) – (0-based) indices of the traces to plot. The default is None. If provided, supersedes seed and num_traces.
xlim (list, optional) – x-axis limits. The default is None.
ylim (list, optional) – y-axis limits. The default is None.
color (str, optional) – Color of the traces. The default uses the property cycler: https://matplotlib.org/stable/gallery/color/color_cycle_default.html
alpha (float, optional) – Transparency of the lines representing the ensemble members. The default is 0.1.
linestyle ({'-', '--', '-.', ':', '', (offset, on-off-seq), ...}) – Set the linestyle of the line.
lw (float, optional) – Width of the lines representing the ensemble members. The default is 0.5.
savefig_settings (dict, optional) –
the dictionary of arguments for plt.savefig(); some notes below:
- “path” must be specified; it can be any existing or non-existing path, with or without a suffix; if the suffix is not given in “path”, it will follow “format”
- “format” can be one of {“pdf”, “eps”, “png”, “ps”}. The default is None.
ax (matplotlib.axes.Axes, optional) – Matplotlib axes on which to draw the plot. The default is None.
plot_legend (bool, optional) – Whether to plot the legend. The default is True.
lgd_kwargs (dict, optional) – Parameters for the legend. The default is None.

Returns:

fig (matplotlib.figure.Figure) – the figure object from matplotlib. See [matplotlib.pyplot.figure](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.figure.html) for details.
ax (matplotlib.axes.Axes) – the axes object from matplotlib. See [matplotlib.axes](https://matplotlib.org/api/axes_api.html) for details.
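The selection rule described for num_traces, seed and indices can be sketched as follows: explicit indices win, otherwise num_traces members are drawn at random without replacement. The use of numpy's Generator API is an assumption, so the same seed will not necessarily pick the same traces as pens does.

```python
import numpy as np

def choose_traces(nEns, num_traces=5, seed=None, indices=None):
    """Pick member indices to plot: explicit indices win, else a seeded random draw."""
    if indices is not None:
        return np.atleast_1d(np.asarray(indices, dtype=int))
    rng = np.random.default_rng(seed)
    num = min(num_traces, nEns)                 # cannot draw more members than exist
    return rng.choice(nEns, size=num, replace=False)
```

The chosen columns could then be drawn with ax.plot(time, value[:, idx], lw=0.5, alpha=0.1).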

plume_distance(y=None, max_dist=1, num=100, q=0.5, order=1, spread_stat='IQR', dist=None, nsamples=None)

Compute the (quantile-based) characteristic distance between a plume (ensemble) and another object (whether a single trace or another plume). Searches for quantile q of the “proximity probability” distribution.

Parameters:

y (array-like, length self.nt) – trace/plume whose probability is to be assessed
q (float) – Quantile from which the characteristic distance is derived. The default is 0.5 (the median).
order (int or inf) – Order of the norm (see np.linalg.norm); use np.inf for the infinity norm. The default is 1.
spread_stat (str) – Statistic to be used for distributional spread. Choices: ‘SD’, ‘IQR’ or ‘HDI’. SD is the standard deviation, appropriate for Gaussian situations; IQR (default) is the interquartile range (a non-parametric measure, robust and resistant); HDI returns the 95% highest-density interval as a NumPy 2-array.
dist (array-like, length self.nEns) – If provided, used as the vector of distances; otherwise it is computed internally.
nsamples (int) – number of samples to use from the ensemble. Default is None, which uses all samples.

See also

np.linalg.norm: https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html

Returns:

eps_q (float) – Representative distance at quantile q (in same units as self or y)
eps_spread (float) – Measure of distributional spread
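Because the proximity probability P(eps) is the fraction of members whose distance is at most eps, i.e. the empirical CDF of the distances, its q-quantile is the q-quantile of the distances. The sketch below uses np.quantile directly; the max_dist and num parameters suggest pens instead searches over a grid of eps values, so results may differ slightly. Only the 'SD' and 'IQR' spread statistics are sketched.

```python
import numpy as np

def plume_distance_sketch(dist, q=0.5, spread_stat="IQR"):
    """Characteristic distance eps_q and a spread measure from member distances."""
    dist = np.asarray(dist, dtype=float)
    # q-quantile of the empirical distribution of distances
    eps_q = np.quantile(dist, q)
    if spread_stat == "SD":
        spread = dist.std()
    elif spread_stat == "IQR":
        q25, q75 = np.quantile(dist, [0.25, 0.75])
        spread = q75 - q25
    else:
        raise ValueError("this sketch only handles 'SD' and 'IQR'")
    return eps_q, spread
```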

proximity_prob(y, eps, order=1, dist=None, nsamples=None)

Compute the probability P that the trace y is within a distance eps of the ensemble object.

Parameters:

y (array-like, length self.nt) – trace/plume whose proximity is to be assessed
eps (array of float64) – numerical tolerance for the distance.
order (int or inf) – Order of the norm (see np.linalg.norm); use np.inf for the infinity norm. The default is 1.
dist (array-like, length self.nEns) – If provided, used as the vector of distances; otherwise it is computed internally.

See also

np.linalg.norm: https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html

Returns: P – Probability that the trace y is within a distance eps of the ensemble object
Return type: float in [0,1]
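A minimal sketch of the probability defined above: compute each member's distance to y, then take the fraction of members within eps. Treating "within" as inclusive (dist <= eps) and using the unnormalized np.linalg.norm over time are assumptions. Broadcasting lets eps be a scalar or an array of tolerances.

```python
import numpy as np

def proximity_prob_sketch(value, y, eps, order=1):
    """Fraction of ensemble members whose distance to y is at most eps."""
    value = np.asarray(value, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = np.linalg.norm(value - y[:, None], ord=order, axis=0)   # one per member
    eps = np.asarray(eps, dtype=float)
    # Compare every tolerance with every distance, then average over members
    within = dist[None, :] <= eps.reshape(-1, 1)
    return within.mean(axis=1).reshape(eps.shape)
```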

random_paths(model='fGn', param=None, p=1, trend=None, seed=None)

Generate p random walks through the ensemble according to a given parametric model with random parameter sampling.

Parameters:

model (str) –
Stochastic model for the temporal behavior. Accepted choices are:
- unif: resample uniformly from the posterior distribution
- ar: autoregressive model, see https://www.statsmodels.org/dev/tsa.html#univariate-autoregressive-processes-ar
- fGn: fractional Gaussian noise, see https://stochastic.readthedocs.io/en/stable/noise.html#stochastic.processes.noise.FractionalGaussianNoise
- power-law: aka Colored Noise, see https://stochastic.readthedocs.io/en/stable/noise.html#stochastic.processes.noise.ColoredNoise
param (variable type [default is None]) –
parameter of the model.
- unif: no parameter
- ar: param is the result from fitting statsmodels AutoReg.fit() (with zero-lag term)
- fGn: param is the Hurst exponent, H (float)
- power-law: param is the spectral exponent beta (float)
For admissible parameter values, fGn and power-law should return equivalent results as long as H = (beta+1)/2 is in [0, 1).
p (int) – number of series to export
trend (array, length self.nt) – general trend of the ensemble. If None, it is calculated as the ensemble mean. If provided, it will be added to the ensemble.
seed (int) – seed for the random generator (provided for reproducibility)

Returns:

new

Return type:

EnsembleTS object containing the p series
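The 'unif' option is the simplest of the models above. The sketch below reads "resample uniformly from the posterior distribution" as drawing, independently at each time step, the value of a randomly chosen member; that reading is an assumption, and the sketch ignores the trend argument and the parametric models.

```python
import numpy as np

def random_paths_unif(value, p=1, seed=None):
    """'unif'-style paths: at each time step, copy the value of a random member."""
    value = np.asarray(value, dtype=float)
    nt, nEns = value.shape
    rng = np.random.default_rng(seed)
    picks = rng.integers(0, nEns, size=(nt, p))      # member index per (time, path)
    return np.take_along_axis(value, picks, axis=1)  # shape (nt, p)
```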

sample_nearest(target, metric='MSE')

Get the ensemble member (sample path) nearest to the target series.

Note that metric is used only for the final distance calculation.
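As a point of comparison, a brute-force version of the idea ranks every member by mean squared error against the target and keeps the best one. The note above suggests pens may select the path differently and apply metric only at the end, so this is a simplified sketch, not the pens algorithm.

```python
import numpy as np

def sample_nearest_sketch(value, target):
    """Return (index, path) of the member with the smallest MSE to target."""
    value = np.asarray(value, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = np.mean((value - target[:, None]) ** 2, axis=0)   # one MSE per member
    j = int(np.argmin(mse))
    return j, value[:, j]
```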

slice(timespan)

Slice the timeseries with a timespan (tuple or list).

Parameters: timespan (tuple or list) – The list of time points for slicing, whose length must be even. When there are n time points, the output includes n/2 segments. For example, if timespan = [a, b], then the sliced output includes one segment [a, b]; if timespan = [a, b, c, d], then the sliced output includes segment [a, b] and segment [c, d].
Returns: new – The sliced EnsembleTS object.
Return type: EnsembleTS
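The pairing of timespan entries into segments can be sketched as a boolean mask over the time axis. Inclusive segment bounds are an assumption of this sketch; the mask would then select rows with value[mask, :].

```python
import numpy as np

def slice_mask(time, timespan):
    """Boolean mask selecting time points inside segments [a, b], [c, d], ..."""
    time = np.asarray(time)
    if len(timespan) % 2 != 0:
        raise ValueError("timespan must contain an even number of time points")
    mask = np.zeros(time.shape, dtype=bool)
    # Consecutive entries form (start, end) pairs
    for start, end in zip(timespan[0::2], timespan[1::2]):
        mask |= (time >= start) & (time <= end)   # inclusive bounds (assumed)
    return mask
```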

subsample(nsamples, seed=None)

Thin out the original ensemble by drawing nsamples members at random.

Parameters:

nsamples (int) – number of samples to draw at random from the original ensemble. If nsamples >= self.nEns, the object is returned unchanged.
seed (int) – seed for the random generator (provided for reproducibility)

Returns:

res – Downsized object.

Return type:

EnsembleTS
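A minimal sketch of the documented behavior, assuming members are drawn without replacement with numpy's Generator; pens may use a different random generator, so the same seed need not select the same members.

```python
import numpy as np

def subsample_sketch(value, nsamples, seed=None):
    """Keep nsamples randomly chosen members (columns), drawn without replacement."""
    value = np.asarray(value)
    nEns = value.shape[1]
    if nsamples >= nEns:
        return value                      # returned unchanged, as documented
    rng = np.random.default_rng(seed)
    cols = rng.choice(nEns, size=nsamples, replace=False)
    return value[:, cols]
```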

to_df(time_column=None, value_column='ens')

Convert an EnsembleTS to a pandas.DataFrame

Parameters:

time_column (str) – The label of the column for the time axis.
value_column (str) – The base column label for the ensemble members. By default, the columns for the members will be labeled as “ens.0”, “ens.1”, “ens.2”, etc.
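A pandas sketch of the column naming described above ("ens.0", "ens.1", ...). The time column label "time" is a hypothetical default chosen for this sketch; the label pens uses when time_column is None is not stated here.

```python
import numpy as np
import pandas as pd

def to_df_sketch(time, value, time_column="time", value_column="ens"):
    """Wide DataFrame: one time column plus one column per member (ens.0, ens.1, ...)."""
    value = np.asarray(value)
    data = {time_column: np.asarray(time)}
    for j in range(value.shape[1]):
        data[f"{value_column}.{j}"] = value[:, j]
    return pd.DataFrame(data)
```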

to_pyleo(**kwargs)

Convert to a pyleoclim.EnsembleSeries or pyleoclim.Series object.

Parameters: kwargs (keyword arguments) – keyword arguments for a pyleoclim.Series object

trace_rank(y)

Compute the ensemble rank (expressed as a percentile) of trace y.

Parameters: y (array-like, length n) – trace whose rank within the ensemble is to be assessed. Must have n == self.nt.
Returns: percent
Return type: array-like, length n
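A minimal sketch of a per-time percentile rank: the percentage of members less than or equal to y at each time step. This "weak" convention (the same as scipy.stats.percentileofscore with kind='weak') is an assumption; pens may treat ties differently.

```python
import numpy as np

def trace_rank_sketch(value, y):
    """Percentile rank of y[t] among the ensemble members at each time t."""
    value = np.asarray(value, dtype=float)
    y = np.asarray(y, dtype=float)
    # Percentage of members <= y at each time step
    return 100.0 * np.mean(value <= y[:, None], axis=1)
```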

Utilities

pens.utils.hdi1d(ary, hdi_prob, skipna=True)

Compute the highest density interval (HDI) over a 1-D array. h/t: Arviz code: https://arviz-devs.github.io/arviz/_modules/arviz/stats/stats.html#hdi

Parameters:

ary (NumPy array) – values over which to compute the HDI
hdi_prob (float) – probability
skipna (bool) – whether to drop NaNs. The default is True.
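The ArviZ approach credited above sorts the values and picks the narrowest window that spans floor(hdi_prob * n) steps. The sketch below follows that recipe for a 1-D array; it is an illustration, not a copy of the pens source.

```python
import numpy as np

def hdi1d_sketch(ary, hdi_prob, skipna=True):
    """Narrowest interval containing a fraction hdi_prob of the values."""
    ary = np.asarray(ary, dtype=float).ravel()
    if skipna:
        ary = ary[~np.isnan(ary)]
    ary = np.sort(ary)
    n = ary.size
    k = int(np.floor(hdi_prob * n))      # steps spanned by each candidate window
    widths = ary[k:] - ary[: n - k]      # widths of all candidate windows
    if widths.size == 0:
        raise ValueError("too few elements for interval calculation")
    i = int(np.argmin(widths))
    return np.array([ary[i], ary[i + k]])
```

Unlike an equal-tailed interval, the HDI excludes an isolated outlier (100 in the test values) when a narrower window covers the same fraction.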

pens.utils.standardize(x, scale=1, axis=0, ddof=0, eps=0.001)

Center and normalize a time series. Constant or nearly constant time series are not rescaled.

Parameters:

x (array) – vector of (real) numbers as a time series, NaNs allowed
scale (real) – A scale factor used to scale a record to match a given variance
axis (int or None) – axis along which to operate, if None, compute over the whole array
ddof (int) – degrees of freedom correction in the calculation of the standard deviation
eps (real) – a threshold to determine if the standard deviation is too close to zero

Returns:

z (array) – The standardized time series (z-score), Z = (X - E[X])/std(X)*scale, NaNs allowed
mu (real) – The mean of the original time series, E[X]
sig (real) – The standard deviation of the original time series, std[X]
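A 1-D sketch of the formula above, Z = (X - E[X])/std(X)*scale, with NaNs ignored in the mean and standard deviation and (nearly) constant series returned unchanged. It does not handle the axis argument for multi-dimensional input.

```python
import warnings
import numpy as np

def standardize_1d(x, scale=1, ddof=0, eps=1e-3):
    """z = (x - mean) / std * scale for a 1-D series; NaNs ignored in mean/std."""
    x = np.asarray(x, dtype=float)
    mu = np.nanmean(x)
    sig = np.nanstd(x, ddof=ddof)
    if abs(sig) < eps:
        # (Nearly) constant series: leave it unchanged instead of dividing by ~0
        warnings.warn("Constant or nearly constant time series not rescaled.")
        return x.copy(), mu, sig
    z = (x - mu) / sig * scale
    return z, mu, sig
```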

References

Tapio Schneider’s MATLAB code: tapios/RegEM

The zscore function in SciPy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.zscore.html

See also

pyleoclim.utils.tsutils.preprocess: pre-processes a time series using standardization and detrending.
