Detect a shift in the mean and/or variance of a dataset

detectShift() detects a shift in the mean and/or variance of a dataset and assesses its significance, given age and data uncertainty, relative to a robust null hypothesis. It uses changepoint::cpt.mean(), changepoint::cpt.var(), or changepoint::cpt.meanvar() from the changepoint package, propagates input or modelled time and/or value ensembles, and summarizes their likelihoods relative to a robust null hypothesis (see ?testNullHypothesis).
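The core idea of a change point in the mean can be illustrated without the changepoint package. The base-R sketch below picks the single split that minimizes the total within-segment sum of squares, which matches the least-squares idea behind a single mean shift. It is a simplified stand-in for what changepoint::cpt.mean() does, not that package's implementation. The helper name find_mean_shift() is hypothetical.

```r
# Hypothetical helper (not part of changepoint or detectShift):
# locate ONE shift in the mean by exhaustive least-squares search.
# Returns the index of the last observation before the shift.
find_mean_shift <- function(x, min.seg = 2) {
  n <- length(x)
  stopifnot(n >= 2 * min.seg)
  candidates <- min.seg:(n - min.seg)
  # Total within-segment sum of squared deviations for each candidate split
  cost <- vapply(candidates, function(k) {
    left <- x[1:k]
    right <- x[(k + 1):n]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  }, numeric(1))
  candidates[which.min(cost)]
}

set.seed(1)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 3))  # true shift after index 50
find_mean_shift(x)
```

Real change point methods add a penalty or test statistic to decide whether a shift exists at all, and can find several shifts. This sketch always returns exactly one split.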

detectShift(
  ltt = NA,
  time = NA,
  vals = NA,
  time.variable.name = NA,
  vals.variable.name = NA,
  time.units = NA,
  vals.units = NA,
  dataset.name = NA,
  surrogate.method = "isospectral",
  summary.bin.step = 100,
  summary.bin.vec = NA,
  null.hypothesis.n = 100,
  null.quantiles = c(0.95, 0.9),
  time.range = NA,
  calc.deltas = TRUE,
  ...
)

Arguments

ltt

A LiPD-timeseries-tibble: a tibble or data.frame containing the variable(s) of interest and a time variable (age, year or time), along with their metadata, arranged in rows. If ltt = NA, one is created from the other inputs.

time

If ltt is not provided, input a vector or matrix of time (year or age) data. If it is a multicolumn matrix, the columns are time-ensemble members.

vals

If ltt is not provided, input a vector or matrix of paleoData. If it is a multicolumn matrix, the columns are value-ensemble members.
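When time and vals are matrices, each column is one ensemble member. One common way to propagate both sources of uncertainty is to pair randomly drawn time and value members and repeat the analysis for every pair. The sketch below assumes that approach; it is not a description of detectShift()'s internals. The helper propagate_ensembles() and the toy statistic jump_time() are hypothetical.

```r
# Hypothetical helper: apply fun(time, vals) to n randomly paired
# time- and value-ensemble members (matrix columns) and collect the results.
propagate_ensembles <- function(time, vals, fun, n = 100) {
  time <- as.matrix(time)  # a plain vector becomes a one-member ensemble
  vals <- as.matrix(vals)
  stopifnot(nrow(time) == nrow(vals))
  vapply(seq_len(n), function(i) {
    t.col <- sample.int(ncol(time), 1)
    v.col <- sample.int(ncol(vals), 1)
    fun(time[, t.col], vals[, v.col])
  }, numeric(1))
}

set.seed(42)
age <- seq(0, 990, by = 10)                         # 100 nominal ages
# Toy 20-member age ensemble: each member is the nominal ages plus an offset
time.ens <- matrix(age, nrow = 100, ncol = 20) + rep(rnorm(20, sd = 15), each = 100)
vals <- c(rep(0, 50), rep(2, 50))                   # step after row 50
# Toy statistic: time of the largest jump between consecutive values
jump_time <- function(t, v) t[which.max(abs(diff(v)))]
shift.times <- propagate_ensembles(time.ens, vals, fun = jump_time, n = 200)
quantile(shift.times, c(0.05, 0.5, 0.95))
```

The spread of shift.times reflects only the age uncertainty here, because vals has a single member.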

time.variable.name

If ltt is not provided, specify the name of the time variable (typically 'age' or 'year')

vals.variable.name

If ltt is not provided, specify the name of the paleo variable (e.g., 'd18O' or 'temperature'). Alternatively, if ltt is provided but has more rows than expected, this name is used to try to select the correct row.

time.units

If ltt is not provided, specify the units of the time variable (typically 'yr BP' or 'CE')

vals.units

If ltt is not provided, specify the units of the paleo variable (e.g., 'permil' or 'degrees C')

dataset.name

If ltt is not provided, specify the dataset name

surrogate.method

Method used to generate surrogate data for hypothesis testing. Options include:

'isospectral': (Default) Following Ebisuzaki (1997), generates "isospectral" surrogates by scrambling the phases of the data while preserving their power spectrum. Uses the rEDM::make_surrogate_data() or rEDM::SurrogateData() function, depending on the rEDM version
'isopersistent': Generates surrogates by simulating from an autoregressive process of order 1 (AR(1)), which has been fit to the data. Uses the geoChronR::createSyntheticTimeseries() function
'shuffle': Randomly shuffles the data to create surrogates. Uses the rEDM::make_surrogate_data() or rEDM::SurrogateData() function, depending on the rEDM version
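To show what each surrogate type preserves, here are minimal base-R sketches of the three ideas. These are stand-ins written under that assumption, not the rEDM or geoChronR functions named above. The function names isospectral_surrogate(), ar1_surrogate() and shuffle_surrogate() are hypothetical.

```r
# Isospectral (phase-randomized) surrogate, after Ebisuzaki (1997):
# keep every Fourier amplitude, draw new random phases.
isospectral_surrogate <- function(x) {
  n <- length(x)
  stopifnot(n >= 3)
  f <- fft(x - mean(x))
  amp <- Mod(f)
  half <- floor((n - 1) / 2)           # positive frequencies below Nyquist
  idx <- 2:(half + 1)
  g <- complex(n)                      # g[1] = 0 (zero mean; mean re-added below)
  g[idx] <- amp[idx] * exp(1i * runif(half, 0, 2 * pi))
  g[n + 2 - idx] <- Conj(g[idx])       # Hermitian symmetry -> real-valued output
  if (n %% 2 == 0) g[n / 2 + 1] <- f[n / 2 + 1]  # Nyquist term kept as is
  Re(fft(g, inverse = TRUE)) / n + mean(x)
}

# Isopersistent surrogate: simulate an AR(1) process fit to the data
ar1_surrogate <- function(x) {
  xc <- x - mean(x)
  phi <- acf(xc, lag.max = 1, plot = FALSE)$acf[2]  # lag-1 autocorrelation
  innov.sd <- sd(xc) * sqrt(1 - phi^2)               # matches the data variance
  as.numeric(arima.sim(list(ar = phi), n = length(x), sd = innov.sd)) + mean(x)
}

# Shuffle surrogate: same values, random order (destroys all autocorrelation)
shuffle_surrogate <- function(x) sample(x)

set.seed(3)
x <- as.numeric(arima.sim(list(ar = 0.7), n = 256)) + 10
s.iso <- isospectral_surrogate(x)
s.ar1 <- ar1_surrogate(x)
s.shuf <- shuffle_surrogate(x)
```

Each method keeps a different property of the data. The isospectral surrogate keeps the full power spectrum, the AR(1) surrogate keeps only the variance and lag-1 persistence, and the shuffle keeps only the value distribution.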

summary.bin.step

Width of the time bins over which the results are summarized, in the units of the time variable (default = 100)

summary.bin.vec

Optional vector defining the summary bins; if provided, it supersedes summary.bin.step (default = NA)
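The interplay between summary.bin.step and summary.bin.vec can be sketched as follows. The sketch assumes the bins are edges spanning the data at a regular step, unless an explicit vector is given. The helper make_summary_bins() is hypothetical, and the package's exact bin construction may differ.

```r
# Hypothetical helper: build summary bin edges from a step size, unless an
# explicit vector is supplied (mirroring summary.bin.vec superseding
# summary.bin.step).
make_summary_bins <- function(time, step = 100, bin.vec = NA) {
  if (!all(is.na(bin.vec))) return(sort(bin.vec))
  lo <- floor(min(time, na.rm = TRUE) / step) * step
  hi <- ceiling(max(time, na.rm = TRUE) / step) * step
  seq(lo, hi, by = step)
}

bins <- make_summary_bins(c(12, 480, 1234))          # edges 0, 100, ..., 1300
custom <- make_summary_bins(c(12, 480, 1234), bin.vec = c(1000, 0, 500))

# Count detected shift times per bin (bin i spans [bins[i], bins[i + 1]))
shift.times <- c(150, 160, 720)
counts <- tabulate(findInterval(shift.times, bins), nbins = length(bins) - 1)
```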

null.hypothesis.n

How many simulations to run for null hypothesis testing (default = 100)

null.quantiles

Quantiles to report as output from null hypothesis testing (default = c(0.95, 0.9))
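A per-bin statistic from the data could be compared against quantiles of the same statistic computed from null.hypothesis.n surrogate runs. The sketch below shows one such comparison. It assumes a matrix with one row per null simulation and one column per summary bin. The helper summarize_null() and its output column names are hypothetical, not the package's output format (see ?testNullHypothesis for the actual procedure).

```r
# Hypothetical helper: compare an observed per-bin statistic with
# quantiles of the same statistic across null-hypothesis simulations.
summarize_null <- function(observed, null.mat, probs = c(0.95, 0.9)) {
  # null.mat: one row per simulation, one column per summary bin
  q <- apply(null.mat, 2, quantile, probs = probs, names = FALSE)
  q <- matrix(q, nrow = length(probs))   # stay a matrix even for one quantile
  out <- data.frame(observed = observed)
  for (i in seq_along(probs)) {
    out[[paste0("q", probs[i] * 100)]] <- q[i, ]
    out[[paste0("exceeds.q", probs[i] * 100)]] <- observed > q[i, ]
  }
  out
}

# Toy null distribution: 101 simulations x 3 bins, values 0, 0.01, ..., 1
null.mat <- matrix(rep(seq(0, 1, length.out = 101), 3), ncol = 3)
res <- summarize_null(observed = c(0.5, 0.99, 0.92), null.mat = null.mat)
res
```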

time.range

Optional time range, given as a minimum and maximum, to which the analysis is restricted (default = NA)

calc.deltas

Calculate the difference in means between change point sections (default = TRUE)
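The quantity described by calc.deltas can be illustrated with a short sketch. It assumes that each change point index marks the last observation before a change, and that a delta is the later segment's mean minus the earlier one. The helper segment_deltas() is hypothetical.

```r
# Hypothetical helper: mean of each segment delimited by change points,
# and the difference between consecutive segment means (later minus earlier).
segment_deltas <- function(vals, cpts) {
  n <- length(vals)
  ends <- c(sort(cpts), n)               # last index of each segment
  starts <- c(1, head(ends, -1) + 1)     # first index of each segment
  means <- mapply(function(s, e) mean(vals[s:e]), starts, ends)
  list(means = means, deltas = diff(means))
}

res <- segment_deltas(c(1, 1, 1, 4, 4, 4, 2, 2), cpts = c(3, 6))
res$means   # segment means 1, 4, 2
res$deltas  # 3, then -2
```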

...

Additional arguments passed on to changeFun

Value

A tibble of output data and metadata. Each row represents one summary bin (time step).