API Reference

EnsembleTS

class pens.ens.EnsembleTS(time=None, value=None, label=None, time_name=None, time_unit=None, value_name=None, value_unit=None)

Ensemble Timeseries

Note that an annual reconstruction is assumed, so the time axis is in years. The ensemble values should have shape (nt, nEns), where nt is the number of years and nEns is the number of ensemble members.

distance(y=None, order=1, nsamples=None)

Compute the distance between a target y and the ensemble object.

Parameters:
  • y (array-like, length n) – Trace/plume whose probability is to be assessed. If None, the distance is computed between every possible pair of trajectories within the ensemble. If specified, must have n == self.nt.

  • order (int, or inf) – Order of the norm. inf means numpy’s inf object. The default is 1.

  • nsamples (int) – number of samples to use (to speed up computation for very large ensembles)

Returns:

dist

Return type:

numpy array, dimension (self.nEns,)
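As a rough illustration of what a trace-to-ensemble distance can look like, the sketch below computes the order-p vector norm of the difference between y and each ensemble member. It is written in plain Python and does not call the library; `member_distances` is a hypothetical helper, and any normalization or subsampling the actual method may apply is left out.

```python
import math

def member_distances(ens, y, order=1):
    """Distance between trace y (length nt) and each column of ens (nt x nEns)."""
    nt, n_ens = len(ens), len(ens[0])
    if len(y) != nt:
        raise ValueError("y must have the same length as the time axis")
    dists = []
    for j in range(n_ens):
        # Absolute differences between member j and the target, over time
        diffs = [abs(ens[t][j] - y[t]) for t in range(nt)]
        if order == math.inf:
            dists.append(max(diffs))  # infinity norm: largest deviation
        else:
            dists.append(sum(d ** order for d in diffs) ** (1.0 / order))
    return dists

# Toy ensemble: 3 time steps, 2 members (assumed data, for illustration only)
ens = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
y = [0.0, 1.0, 2.0]
d1 = member_distances(ens, y, order=1)
dinf = member_distances(ens, y, order=math.inf)
```

The result has one entry per ensemble member, which matches the documented return dimension (self.nEns,).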

from_df(df, time_column=None, value_columns=None)

Load data from a pandas.DataFrame

Parameters:
  • df (pandas.DataFrame) – The pandas.DataFrame object.

  • time_column (str) – The label of the column for the time axis.

  • value_columns (list of str) – The list of the labels for the value axis of the ensemble members.

Extract trend distributions from EnsembleTS object via Gaussian Kernel Density Estimation

Parameters:
  • segment_length (int, optional) – Number of elements in each block. The default is 10.

  • step (int, optional) – Step size between successive blocks. The default is 10.

  • xm (NumPy array, optional) – Axis over which the KDE is calculated. The default is np.linspace(-0.5,1.5,200).

  • bw (str, scalar or callable, optional) – The method used to calculate the estimator bandwidth. This can be “scott”, “silverman”, a scalar constant or a callable. If a scalar, this will be used directly as kde.factor. If a callable, it should take a gaussian_kde instance as only parameter and return a scalar. If None (default), “scott” is used.

Returns:

new

hdi_score(y, prob=0.9)

Computes HDI score for target series y

Parameters:
  • y (array-like, length n) – Trace whose intensity of probability (“likelihood”) is to be assessed. Must have n == self.nt.

  • prob (float) – probability for which the highest density interval will be computed. The default is 0.9.

Returns:

  • score (scalar) – The HDI score.

  • HDI (n x 2 array) – The highest density interval.

line_density(figsize=[10, 4], cmap='Greys', color_scale='linear', bins=None, num_fine=None, xlabel=None, ylabel=None, title=None, ylim=None, xlim=None, title_kwargs=None, ax=None, **pcolormesh_kwargs)

Plot the timeseries 2-D histogram

Parameters:
  • cmap (str) – The colormap for the histogram.

  • color_scale (str) – The scale of the colorbar; should be either ‘linear’ or ‘log’.

  • bins (list/tuple of 2 floats) – The number of bins for each axis: nx, ny = bins.

load_nc(path, time_name='time', var=None)

Load data from a .nc file with xarray

Parameters:
  • path (str) – The path of the .nc file.

  • var (str) – The name of variable to load. Note that we assume the first axis of the loaded variable is time.

  • time_name (str) – The name of the time axis.

make_labels()

Initialization of plot labels based on object metadata

Returns:

  • time_header (str) – Label for the time axis

  • value_header (str) – Label for the value axis

plot(figsize=[12, 4], xlabel=None, ylabel=None, title=None, ylim=None, xlim=None, legend_kwargs=None, title_kwargs=None, ax=None, **plot_kwargs)

Plot the raw values (multiple series)

plot_hdi(prob=0.9, median=True, figsize=[12, 4], color='tab:blue', xlabel=None, ylabel=None, label=None, title=None, ylim=None, xlim=None, alpha=0.2, legend_kwargs=None, title_kwargs=None, ax=None, **plot_kwargs)

Plot the highest density interval (HDI) of the ensemble, optionally with the median. h/t: Arviz code: https://arviz-devs.github.io/arviz/_modules/arviz/stats/stats.html#hdi

Parameters:
  • prob (float) – probability for which the highest density interval will be computed. The default is 0.9.

  • median (bool) – If True (default), the posterior median is added.

  • figsize (tuple, optional) – dimensions of the figure. The default is [12, 4].

  • xlabel (str, optional) – Label for x axis. The default is None.

  • ylabel (str, optional) – Label for y axis. The default is None.

  • label (str, optional) – Label for the plotted objects; useful for multi-plots. If None (default) is specified, will attempt to use the object’s label.

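  • title (str, optional) – The title of the figure. The default is None.

  • ylim (tuple or list, optional) – The limits of the y-axis. The default is None.

  • xlim (tuple or list, optional) – The limits of the x-axis. The default is None.

  • alpha (float, optional) – Transparency of the shaded interval. The default is 0.2.

  • legend_kwargs (dict, optional) – Keyword arguments for the ax.legend() function. The default is None.

  • title_kwargs (dict, optional) – Keyword arguments for the figure title. The default is None.

  • ax (matplotlib.axes, optional) – The matplotlib.axes object. If set, the image will be plotted in the existing ax. The default is None.

  • **plot_kwargs (dict) – Additional keyword arguments for plotting.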

plot_qs(figsize=[10, 4], qs=[0.025, 0.25, 0.5, 0.75, 0.975], color='indianred', xlabel=None, ylabel=None, title=None, ylim=None, xlim=None, alphas=[0.3, 0.1], plot_kwargs=None, legend_kwargs=None, title_kwargs=None, ax=None, plot_trend=True)

Plot the quantiles

Parameters:
  • figsize (list, optional) – The size of the figure. Defaults to [10, 4].

  • qs (list, optional) – The list to denote the quantiles plotted. Defaults to [0.025, 0.25, 0.5, 0.75, 0.975].

  • color (str, optional) – The basic color for the quantile envelopes. Defaults to ‘indianred’.

  • xlabel (str, optional) – The label for the x-axis. Defaults to ‘Year (CE)’.

  • ylabel (str, optional) – The label for the y-axis. Defaults to None.

  • title (str, optional) – The title of the figure. Defaults to None.

  • ylim (tuple or list, optional) – The limit of the y-axis. Defaults to None.

  • xlim (tuple or list, optional) – The limit of the x-axis. Defaults to None.

  • alphas (list, optional) – The alphas for the quantile envelopes. Defaults to [0.3, 0.1].

  • plot_kwargs (dict, optional) – The keyword arguments for the ax.plot() function. Defaults to None.

  • legend_kwargs (dict, optional) – The keyword arguments for the ax.legend() function. Defaults to None.

  • title_kwargs (dict, optional) – The keyword arguments for the ax.set_title() function. Defaults to None.

  • ax (matplotlib.axes, optional) – The matplotlib.axes object. If set, the image will be plotted in the existing ax. Defaults to None.

  • plot_trend (bool, optional) – If True, will plot the trend analysis result if it exists. Defaults to True.

plot_traces(num_traces=5, figsize=[10, 4], title=None, label=None, seed=None, indices=None, xlim=None, ylim=None, color=None, ax=None, plot_legend=True, lgd_kwargs=None, xlabel=None, ylabel=None, lw=0.5, alpha=0.1)

Plot EnsembleTS as a subset of traces.

Parameters:
  • num_traces (int, optional) – Number of traces to plot, chosen at random. Default is 5.

  • figsize (list, optional) – The figure size. The default is [10, 4].

  • xlabel (str, optional) – x-axis label. The default is None.

  • ylabel (str, optional) – y-axis label. The default is None.

  • title (str, optional) – Plot title. The default is None.

  • label (str, optional) – Label to use on the plot legend. Automatically generated if not provided.

  • seed (int, optional) – Seed for the random number generator. Useful for reproducibility. The default is None. Disregarded if indices is not None.

  • indices (int, optional) – (0-based) indices of the traces. The default is None. If provided, supersedes “seed” and “num_traces”.

  • xlim (list, optional) – x-axis limits. The default is None.

  • ylim (list, optional) – y-axis limits. The default is None.

  • color (str, optional) – Color of the traces. The default uses the property cycler: https://matplotlib.org/stable/gallery/color/color_cycle_default.html

  • alpha (float, optional) – Transparency of the lines representing the multiple members. The default is 0.1.

  • linestyle ({'-', '--', '-.', ':', '', (offset, on-off-seq), ...}) – Set the linestyle of the line

  • lw (float, optional) – Width of the lines representing the multiple members. The default is 0.5.

  • savefig_settings (dict, optional) –

    the dictionary of arguments for plt.savefig(); some notes below:

    • “path” must be specified; it can be any existing or non-existent path, with or without a suffix; if the suffix is not given in “path”, it will follow “format”

    • “format” can be one of {“pdf”, “eps”, “png”, “ps”}. The default is None.

  • ax (matplotlib.ax, optional) – Matplotlib axis on which to return the plot. The default is None.

  • plot_legend (bool; {True,False}, optional) – Whether to plot the legend. The default is True.

  • lgd_kwargs (dict, optional) – Parameters for the legend. The default is None.

plume_distance(y=None, max_dist=1, num=100, q=0.5, order=1, spread_stat='IQR', dist=None, nsamples=None)

Compute the (quantile-based) characteristic distance between a plume (ensemble) and another object (whether a single trace or another plume). Searches for quantile q of the “proximity probability” distribution.

Parameters:
  • y (array-like, length self.nt) – trace/plume whose probability is to be assessed

  • q (float) – Quantile from which the characteristic distance is derived. Default = 0.5 (median)

  • order (int, or inf) – Order of the norm. inf means numpy’s inf object. The default is 1.

  • spread_stat (str) – Statistic to be used for distributional spread. Choices: ‘SD’, ‘IQR’ or ‘HDI’. SD is the standard deviation, appropriate for Gaussian situations. IQR (default) is the interquartile range (a non-parametric measure, robust and resistant). HDI returns the 95% highest-density interval as a NumPy 2-array.

  • dist (array-like, length self.nEns) – if provided, uses this as vector of distances. Otherwise it is computed internally

  • nsamples (int) – number of samples to use from the ensemble. Default is None, which uses all samples.

Returns:

  • eps_q (float) – Representative distance at quantile q (in same units as self or y)

  • eps_spread (float) – Measure of distributional spread
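One way to read "quantile q of the proximity probability distribution" is as the smallest tolerance eps for which a fraction q of the members lies within eps of the target. The sketch below searches a regular grid for that value, given a precomputed vector of distances. It is an assumption-laden illustration in plain Python, not the library's algorithm: `characteristic_distance`, `max_eps` and `n_grid` are hypothetical names chosen here, and the spread statistic is not computed.

```python
def characteristic_distance(dist, q=0.5, max_eps=1.0, n_grid=101):
    """Smallest eps on a regular grid whose proximity probability reaches q."""
    n = len(dist)
    for i in range(n_grid):
        eps = max_eps * i / (n_grid - 1)
        # Fraction of members within eps of the target
        prob = sum(1 for d in dist if d <= eps) / n
        if prob >= q:
            return eps
    return None  # q is not reached anywhere on [0, max_eps]

# Assumed distances between a target trace and four ensemble members
dist = [0.1, 0.3, 0.4, 0.8]
eps_median = characteristic_distance(dist, q=0.5, max_eps=1.0, n_grid=11)
eps_missing = characteristic_distance(dist, q=0.5, max_eps=0.2, n_grid=11)
```

With these inputs, half of the members are within 0.3 of the target, so the grid search stops there; when the grid upper bound is too small, no value is found and the sketch returns None.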

proximity_prob(y, eps, order=1, dist=None, nsamples=None)

Compute the probability P that the trace y is within a distance eps of the ensemble object.

Parameters:
  • y (array-like, length self.nt) – trace/plume whose proximity is to be assessed

  • eps (array of float64) – numerical tolerance for the distance.

  • order (int, or inf) – Order of the norm. inf means numpy’s inf object. The default is 1.

  • dist (array-like, length self.nEns) – if provided, uses this as vector of distances. Otherwise it is computed internally

Returns:

P – Probability that the trace y is within a distance eps of the ensemble object

Return type:

float in [0,1]
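A minimal sketch of the idea, assuming the probability is estimated as the fraction of ensemble members whose distance to y does not exceed eps. The function name `proximity_probability` and the sample distances are hypothetical; the library may estimate P differently.

```python
def proximity_probability(dist, eps):
    """Fraction of ensemble members whose distance to the trace is <= eps."""
    if not dist:
        raise ValueError("dist must not be empty")
    return sum(1 for d in dist if d <= eps) / len(dist)

# Assumed vector of distances, one entry per ensemble member
dist = [0.2, 0.5, 0.9, 1.4]
p_half = proximity_probability(dist, eps=0.5)
p_all = proximity_probability(dist, eps=2.0)
```

Passing a precomputed distance vector mirrors the documented dist argument, which avoids recomputing distances when several tolerances are evaluated.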

random_paths(model='fGn', param=None, p=1, trend=None, seed=None)

Generate p random walks through the ensemble according to a given parametric model with random parameter sampling

Returns:

new

Return type:

EnsembleTS object containing the p series

sample_nearest(target, metric='MSE')

Get the nearest sample path against the target series

Note that metric is used only for the final distance calculation.

slice(timespan)

Slice the timeseries with a timespan (tuple or list)

Parameters:

timespan (tuple or list) – The list of time points for slicing, whose length must be even. When there are n time points, the output Series includes n/2 segments. For example, if timespan = [a, b], then the sliced output includes one segment [a, b]; if timespan = [a, b, c, d], then the sliced output includes segment [a, b] and segment [c, d].

Returns:

new – The sliced EnsembleTS object.

Return type:

EnsembleTS
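The pairing of time points described above can be sketched as follows. This plain-Python illustration assumes that both ends of each segment are inclusive, which the description does not state; `slice_by_timespan` is a hypothetical helper operating on a list of times and a list of rows.

```python
def slice_by_timespan(time, values, timespan):
    """Keep rows whose time falls inside any [start, end] pair of timespan."""
    if len(timespan) % 2 != 0:
        raise ValueError("timespan must contain an even number of time points")
    # [a, b, c, d] -> [(a, b), (c, d)]
    pairs = list(zip(timespan[0::2], timespan[1::2]))
    keep = [i for i, t in enumerate(time)
            if any(start <= t <= end for start, end in pairs)]
    return [time[i] for i in keep], [values[i] for i in keep]

# Ten years of a two-member toy ensemble (assumed data)
time = list(range(1000, 1010))
values = [[float(t), float(t) + 1.0] for t in time]
new_time, new_values = slice_by_timespan(time, values, [1001, 1003, 1007, 1008])
```

Here the four time points yield two segments, [1001, 1003] and [1007, 1008], and the rows outside both are dropped.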

subsample(nsamples, seed=None)

Thin out original ensemble by drawing nsamples at random

Parameters:
  • nsamples (int) – number of samples to draw at random from the original ensemble. If nsamples >= self.nEns, the object is returned unchanged.

  • seed (int) – seed for the random generator (provided for reproducibility)

Returns:

res – Downsized object.

Return type:

EnsembleTS
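A small sketch of seeded thinning, assuming members are drawn without replacement (the description does not say which sampling scheme is used). `subsample_members` is a hypothetical stand-in that works on a list of rows shaped (nt, nEns).

```python
import random

def subsample_members(values, nsamples, seed=None):
    """Keep nsamples randomly chosen columns (members) of a (nt x nEns) table."""
    n_ens = len(values[0])
    if nsamples >= n_ens:
        return values  # nothing to thin out
    rng = random.Random(seed)  # local generator, so the seed is reproducible
    cols = sorted(rng.sample(range(n_ens), nsamples))
    return [[row[j] for j in cols] for row in values]

# Toy ensemble: 3 time steps, 6 members (assumed data)
values = [[t * 10 + j for j in range(6)] for t in range(3)]
thin_a = subsample_members(values, 3, seed=42)
thin_b = subsample_members(values, 3, seed=42)
```

Calling the function twice with the same seed selects the same members, which is the behavior the seed argument is meant to provide.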

to_df(time_column=None, value_column='ens')

Convert an EnsembleTS to a pandas.DataFrame

Parameters:
  • time_column (str) – The label of the column for the time axis.

  • value_column (str) – The base column label for the ensemble members. By default, the columns for the members will be labeled as “ens.0”, “ens.1”, “ens.2”, etc.
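The column labeling described above can be illustrated without the library. The sketch builds a dictionary of columns with labels of the form "ens.0", "ens.1", and so on; `ensemble_columns` is a hypothetical helper, and the "time" label used for the time column is an assumption, since the documented default is None.

```python
def ensemble_columns(time, values, time_column="time", value_column="ens"):
    """Build a column-oriented table with one labeled column per member."""
    table = {time_column: list(time)}
    for j in range(len(values[0])):
        # Member j is labeled "<value_column>.<j>"
        table[f"{value_column}.{j}"] = [row[j] for row in values]
    return table

# Two time steps, three members (assumed data)
table = ensemble_columns([1, 2], [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
```

A dictionary of this shape can be passed directly to the pandas.DataFrame constructor, which treats each key as a column label.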

to_pyleo(**kwargs)

Convert to a pyleoclim.EnsembleSeries or pyleoclim.Series object

Parameters:

kwargs (keyword arguments) – keyword arguments for a pyleoclim.Series object

trace_rank(y)

Computes ensemble rank (expressed as percentile) for trace y

Parameters:

y (array-like, length n) – Trace whose rank within the ensemble is to be assessed. Must have n == self.nt.

Returns:

percent

Return type:

array-like, length n
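A percentile rank per time step can be sketched as the share of ensemble members that do not exceed the trace at that step. This is one common convention and is assumed here; the library may handle ties or interpolation differently. `percentile_rank` is a hypothetical helper.

```python
def percentile_rank(ens, y):
    """Percentile of y[t] within the ensemble values at each time step t."""
    ranks = []
    for row, target in zip(ens, y):
        # Count members at or below the target value at this time step
        below = sum(1 for v in row if v <= target)
        ranks.append(100.0 * below / len(row))
    return ranks

# Two time steps, four members (assumed data)
ens = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
y = [2.5, 5.0]
ranks = percentile_rank(ens, y)
```

The output has the same length as y, in line with the documented return type.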

Utilities

pens.utils.hdi1d(ary, hdi_prob, skipna=True)

Compute highest density interval over a 1d array. h/t: Arviz code: https://arviz-devs.github.io/arviz/_modules/arviz/stats/stats.html#hdi

Parameters:
  • ary (NumPy array) – values over which to compute HDI

  • hdi_prob (float) – probability for which the highest density interval will be computed

  • skipna (bool) – flag to decide whether to drop NaNs (defaults to True)
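For readers unfamiliar with the highest density interval, the sketch below finds the narrowest interval that spans a fraction hdi_prob of the sorted samples, which is the usual sample-based definition. It is written in plain Python as an illustration and is not the library's code; `shortest_interval` is a hypothetical name, and in this sketch NaNs are always expected to be dropped before sorting.

```python
import math

def shortest_interval(samples, hdi_prob=0.9, skipna=True):
    """Narrowest interval containing a fraction hdi_prob of the samples."""
    data = [x for x in samples if not (skipna and math.isnan(x))]
    data.sort()
    n = len(data)
    span = int(math.floor(hdi_prob * n))  # index offset between interval ends
    n_intervals = n - span
    if n_intervals < 1:
        raise ValueError("hdi_prob is too large for the number of samples")
    # Width of every candidate interval [data[i], data[i + span]]
    widths = [data[i + span] - data[i] for i in range(n_intervals)]
    start = widths.index(min(widths))
    return data[start], data[start + span]

# Assumed samples: a tight cluster near 1, a few outliers, and one NaN
samples = [0.0, 1.0, 1.1, 1.2, 1.3, 1.4, 5.0, 9.0, float("nan")]
lo, hi = shortest_interval(samples, hdi_prob=0.5)
```

The interval lands on the dense cluster rather than being centered on the median, which is what distinguishes an HDI from an equal-tailed quantile interval.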

Calculates the means and trends on an ensemble array

Uses statsmodels’ OLS method

Parameters:
  • var (2d numpy array) – [time, ens member]

  • segment_length (int) – number of elements in each block

  • step (int) – step size

  • years (1d numpy array)

Returns:

  • means – Means of every segment.

  • trends – Trends over every segment.

  • idxs – The first and last index of every segment, for record-keeping.

  • tm – Median time point of each block.

Author: Julien Emile-Geay, based on code by Michael P. Erb. Date: March 8, 2018
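The block-wise computation can be sketched in plain Python. Note that this illustration swaps in the closed-form least-squares slope instead of statsmodels' OLS, and that `block_means_and_trends` is a hypothetical name; the handling of trailing samples that do not fill a whole block is also an assumption (they are ignored here).

```python
import statistics

def block_means_and_trends(var, years, segment_length=10, step=10):
    """Per-block mean and least-squares slope for every ensemble member."""
    nt, n_ens = len(var), len(var[0])
    means, trends, idxs, tm = [], [], [], []
    for start in range(0, nt - segment_length + 1, step):
        stop = start + segment_length  # exclusive end of the block
        t = years[start:stop]
        t_bar = sum(t) / segment_length
        sxx = sum((ti - t_bar) ** 2 for ti in t)
        block_means, block_trends = [], []
        for j in range(n_ens):
            v = [var[i][j] for i in range(start, stop)]
            v_bar = sum(v) / segment_length
            sxy = sum((ti - t_bar) * (vi - v_bar) for ti, vi in zip(t, v))
            block_means.append(v_bar)
            block_trends.append(sxy / sxx)  # closed-form OLS slope
        means.append(block_means)
        trends.append(block_trends)
        idxs.append((start, stop - 1))  # first and last index of the block
        tm.append(statistics.median(t))
    return means, trends, idxs, tm

# Assumed data: member 0 rises by 2 units per year, member 1 is flat
years = list(range(2000, 2020))
var = [[2.0 * (yr - 2000), 5.0] for yr in years]
means, trends, idxs, tm = block_means_and_trends(var, years)
```

With 20 years and the default block length and step of 10, the sketch produces two non-overlapping blocks; choosing a step smaller than the block length would give overlapping blocks.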

pens.utils.standardize(x, scale=1, axis=0, ddof=0, eps=0.001)

Centers and normalizes a time series. Constant or nearly constant time series are not rescaled.

Parameters:
  • x (array) – vector of (real) numbers as a time series, NaNs allowed

  • scale (real) – A scale factor used to scale a record to match a given variance

  • axis (int or None) – axis along which to operate, if None, compute over the whole array

  • ddof (int) – degrees of freedom correction in the calculation of the standard deviation

  • eps (real) – a threshold to determine if the standard deviation is too close to zero

Returns:

  • z (array) – The standardized time series (z-score), Z = (X - E[X])/std(X)*scale, NaNs allowed

  • mu (real) – The mean of the original time series, E[X]

  • sig (real) – The standard deviation of the original time series, std[X]

References

Tapio Schneider’s MATLAB code: tapios/RegEM

The zscore function in SciPy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.zscore.html

See also

pyleoclim.utils.tsutils.preprocess

pre-processes a time series using standardization and detrending.
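The z-score formula given for standardize, Z = (X - E[X])/std(X)*scale, can be sketched for a single 1-D series as follows. This is a plain-Python illustration under stated assumptions, not the library's implementation: `standardize_series` is a hypothetical name, the axis argument is omitted, and a nearly constant series is assumed to be centered with its standard deviation treated as 1.

```python
import math

def standardize_series(x, scale=1.0, ddof=0, eps=1e-3):
    """Return (z, mu, sig) for a 1-D series; NaNs are ignored in the statistics."""
    valid = [v for v in x if not math.isnan(v)]
    mu = sum(valid) / len(valid)
    var = sum((v - mu) ** 2 for v in valid) / (len(valid) - ddof)
    sig = math.sqrt(var)
    # Nearly constant series: avoid dividing by a standard deviation close to zero
    divisor = 1.0 if sig < eps else sig
    z = [(v - mu) / divisor * scale for v in x]  # NaNs propagate into z
    return z, mu, sig

z, mu, sig = standardize_series([1.0, 2.0, 3.0, 4.0, 5.0])
z_const, mu_const, sig_const = standardize_series([7.0, 7.0, 7.0])
```

The eps threshold is what keeps a flat series from being blown up by division by a tiny standard deviation; in that case the sketch only removes the mean.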