Sampling Module

class corrai.sampling.Sample(parameters, is_dynamic=True, results=<factory>)[source]

Bases: object

Container for simulation samples and results.

Each Sample instance stores parameter values and the corresponding simulation results. It supports indexing, aggregation, plotting, and integration with sampling strategies.

Handles both dynamic and static models.

Parameters:

parameters (list of Parameter) – List of model parameters used to generate the samples.

Variables:
  • parameters (list of Parameter) – Parameters associated with this sample.

  • is_dynamic (bool, default True) – Whether stored results are time series held in a DataFrame (dynamic models) or a Series of floats (static models).

  • values (DataFrame of shape (n_samples, n_parameters)) – Numerical values of the sampled parameters, one column per parameter.

  • results (Series of DataFrames) – Simulation results for each sample. Each element is typically a pandas DataFrame indexed by time, containing model outputs.

parameters: list[Parameter]
is_dynamic: bool = True
values: DataFrame
results: Series
get_pending_index()[source]

Identify which samples have not yet been simulated.

Returns:

Boolean mask of length len(self), where True indicates a sample without results.

Return type:

ndarray of bool
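
The pending mask can be illustrated without corrai. The snippet below is a minimal sketch, assuming results are held in a plain list in which None or an empty container marks a sample that has not been simulated yet; pending_mask is a hypothetical helper, not part of the library.

```python
def pending_mask(results):
    """Return True for every sample that has no stored result yet."""
    # Assumption: a missing result is either None or an empty container.
    return [r is None or len(r) == 0 for r in results]


results = [[20.1, 20.4], None, [], [19.8]]
mask = pending_mask(results)

# Positions of the samples that still need a simulation run
pending_ids = [i for i, is_pending in enumerate(mask) if is_pending]
```

A mask of this kind is what lets a sampler re-run only the missing simulations instead of the whole design.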

get_parameters_intervals()[source]

Return parameter intervals.

Returns:

Lower and upper bounds for each parameter.

Return type:

ndarray of shape (n_parameters, 2)

Raises:
  • NotImplementedError – If any parameter has type ‘Integer’.

  • ValueError – If parameters are not of type ‘Real’.

get_list_parameter_value_pairs(idx=None)[source]

Map parameter objects to their sampled values.

Parameters:

idx (int, list of int, ndarray, or slice, optional) – Indices of samples to retrieve. Defaults to all.

Returns:

Nested list where each inner list corresponds to a sample.

Return type:

list of list of (Parameter, value)
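
The shape of the returned structure may be easier to see on a toy case. This is a sketch only: Param is a hypothetical stand-in for corrai's Parameter, values is a plain nested list, and parameter_value_pairs is an illustrative helper rather than the library implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for corrai's Parameter; only the name is kept here.
Param = namedtuple("Param", ["name"])


def parameter_value_pairs(parameters, values, idx=None):
    """Pair each parameter with its value, one inner list per selected sample."""
    if idx is None:
        rows = values                      # all samples
    elif isinstance(idx, int):
        rows = [values[idx]]               # a single sample
    elif isinstance(idx, slice):
        rows = values[idx]                 # a contiguous range
    else:
        rows = [values[i] for i in idx]    # an explicit list of indices
    return [list(zip(parameters, row)) for row in rows]


params = [Param("a"), Param("b")]
values = [[1.0, 2.0], [3.0, 4.0]]
pairs = parameter_value_pairs(params, values, idx=1)
```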

get_dimension_less_values(idx=slice(None, None, None))[source]

Normalize parameter values to [0, 1].

Parameters:

idx (int, list, ndarray, or slice, optional) – Indices of samples to normalize. Defaults to all.

Returns:

Dimensionless parameter values, scaled using their defined intervals.

Return type:

ndarray of shape (n_selected, n_parameters)
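
Scaling to [0, 1] with the parameter intervals amounts to a min-max normalization, (value - lower) / (upper - lower). The sketch below shows that arithmetic on plain lists; to_unit_interval and the sample data are assumptions made for illustration, and the library itself may handle edge cases differently.

```python
def to_unit_interval(values, intervals):
    """Scale each column of `values` to [0, 1] using (lower, upper) bounds."""
    scaled = []
    for row in values:
        scaled.append(
            [(v - low) / (high - low) for v, (low, high) in zip(row, intervals)]
        )
    return scaled


intervals = [(1.0, 10.0), (0.0, 4.0)]  # one (lower, upper) pair per parameter
values = [[1.0, 2.0], [5.5, 4.0]]      # two samples, two parameters
unit_values = to_unit_interval(values, intervals)
```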

add_samples(values, results=None)[source]

Add new samples and optionally their results.

Parameters:
  • values (ndarray of shape (n_samples, n_parameters)) – Sampled parameter values to add.

  • results (list of DataFrame, optional) – Simulation results corresponding to values. If None, empty DataFrames are stored.

Raises:

AssertionError – If results length does not match values length.

get_aggregated_time_series(indicator, method='mean', agg_method_kwarg=None, reference_time_series=None, freq=None, prefix='aggregated')[source]

Aggregate sample results using a specified statistical or error metric.

This method extracts the specified indicator column, and aggregates the time series across simulations using the given method. If a reference time series is provided, metrics that require ground truth (e.g., mean_absolute_error) are supported.

If freq is provided, the aggregation is done over time bins, producing a table of simulation runs versus time periods.

Only works for dynamic models.

Parameters:
  • indicator (str) – The column name in each DataFrame to extract and aggregate.

  • method (str, default="mean") –

    The aggregation method to use. Supported methods include:

    • "mean"

    • "sum"

    • "nmbe"

    • "cv_rmse"

    • "mean_squared_error"

    • "mean_absolute_error"

  • agg_method_kwarg (dict, optional) – Additional keyword arguments to pass to the aggregation function.

  • reference_time_series (pandas.Series, optional) – Reference series (y_true) to compare each simulation against. Required for error-based methods such as “mean_absolute_error”. Must have the same datetime index and length as the individual simulation results.

  • freq (str or pandas.Timedelta or datetime.timedelta, optional) – If provided, aggregate the time series within bins of this frequency (e.g., “d” for daily, “h” for hourly). The result will be a DataFrame where each row corresponds to a simulation and each column to a time bin.

  • prefix (str, default="aggregated") – Prefix to use for naming the output column when freq is not specified.

Returns:

If freq is not provided, returns a one-column DataFrame containing the aggregated metric per simulation, indexed by the same index as results.

If freq is provided, returns a DataFrame indexed by simulation IDs (same as results.index), with columns representing each aggregated time bin.

Return type:

pandas.DataFrame

Raises:

ValueError – If the shapes of results and reference_time_series are incompatible, or if the datetime index is missing or invalid.

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from corrai.base.parameter import Parameter
>>> from corrai.sampling import Sample
>>> sample = Sample(
...     parameters=[
...         Parameter("a", interval=(1, 10)),
...         Parameter("b", interval=(1, 10)),
...     ]
... )
>>> t = pd.date_range("2009-01-01", freq="h", periods=2)
>>> res_1 = pd.DataFrame({"a": [1, 2]}, index=t)
>>> res_2 = pd.DataFrame({"a": [3, 4]}, index=t)
>>> sample.add_samples(np.array([[1, 2], [3, 4]]), [res_1, res_2])
>>> # No frequency aggregation: one aggregated value per simulation
>>> sample.get_aggregated_time_series("a")
   aggregated_a
0           1.5
1           3.5
>>> # With frequency aggregation: one value per time bin per simulation
>>> ref = pd.Series(
...     [1, 1], index=pd.date_range("2009-01-01", freq="h", periods=2)
... )
>>> sample.get_aggregated_time_series(
...     indicator="a",
...     method="mean_absolute_error",
...     reference_time_series=ref,
...     freq="h",
... )
   2009-01-01 00:00:00  2009-01-01 01:00:00
0                  0.0                  1.0
1                  2.0                  3.0

get_static_results_as_df()[source]
get_score_df(indicator, reference_time_series, scoring_methods=None, resample_rule=None, resample_agg_method='mean')[source]

Compute scoring metrics for a given indicator across all sample results.

This method evaluates the performance of dynamic model predictions by comparing them against a reference time series. It supports multiple scoring metrics (R², NMBE, CV(RMSE), MAE, RMSE, max error) and optional resampling of data.

Parameters:
  • indicator (str) – Name of the indicator/variable to evaluate from the simulation results. Must be a valid column in the sample results DataFrame.

  • reference_time_series (pd.Series) – Ground truth or measured time series data to compare against.

  • scoring_methods (list of str or callable, optional) –

    List of scoring methods to apply. Can be:

    • String values from SCORE_MAP: "r2", "nmbe", "cv_rmse", "mae", "rmse", "max"

    • Custom callable functions with signature func(y_true, y_pred) -> float

    If None, all methods are used. Default is None.

  • resample_rule (str, pd.Timedelta or dt.timedelta, optional) – Resampling frequency for aggregating the time series data before scoring. Examples: "D" (daily), "h" (hourly), "ME" (month end). If None, no resampling is performed. Default is None.

  • resample_agg_method (str, optional) – Aggregation method to use when resampling. Common values include: "mean", "sum", "min", "max", "median". Default is "mean".

Returns:

DataFrame containing scoring metrics for each sample.

  • Index: sample identifiers from self.results

  • Columns: metric names (e.g., "r2_score", "nmbe", "cv_rmse")

  • Values: computed metric values (float)

The DataFrame’s index name is set to the resampling rule or the inferred frequency of the reference time series.

Return type:

pd.DataFrame

Raises:

NotImplementedError – If the model is not dynamic (self.is_dynamic == False).

Notes

The scoring metrics available in SCORE_MAP are:

  • r2: R² score (coefficient of determination)

  • nmbe: Normalized Mean Bias Error

  • cv_rmse: Coefficient of Variation of Root Mean Squared Error

  • mae: Mean Absolute Error

  • rmse: Root Mean Squared Error

  • max: Maximum absolute error

When resampling is applied, both the predicted and reference time series are resampled using the same rule and aggregation method to ensure alignment.
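
For reference, the less convention-dependent metrics can be written in a few lines. The functions below are a plain-Python sketch using the standard textbook definitions, not the library's SCORE_MAP entries; they follow the func(y_true, y_pred) -> float signature expected of custom callables. NMBE and CV(RMSE) are left out because their normalization conventions vary between sources.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def max_error(y_true, y_pred):
    """Largest absolute deviation."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
metrics = [("mae", mae), ("rmse", rmse), ("max", max_error), ("r2", r2)]
scores = {name: func(y_true, y_pred) for name, func in metrics}
```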

Examples

Basic usage with default metrics:

>>> import pandas as pd
>>> import numpy as np
>>> # Assuming 'sample' is an instance of Sample class with results
>>> reference = pd.Series(
...     np.random.randn(100),
...     index=pd.date_range("2023-01-01", periods=100, freq="h"),
... )
>>> scores = sample.get_score_df(
...     indicator="temperature", reference_time_series=reference
... )
>>> print(scores)
            r2_score      nmbe   cv_rmse       mae      rmse       max
0    0.85234  0.012345  0.234567  1.234567  1.567890  3.456789
1    0.82156  0.023456  0.345678  1.345678  1.678901  3.567890
...

Using specific metrics and daily resampling:

>>> scores = sample.get_score_df(
...     indicator="Energy",
...     reference_time_series=reference,
...     scoring_methods=["r2", "rmse", "mae"],
...     resample_rule="D",
...     resample_agg_method="sum",
... )
>>> print(scores)
          r2_score      rmse       mae
D
0  0.91234  12.34567  10.12345
1  0.89123  13.45678  11.23456
...

See also

sklearn.metrics.r2_score

R² metric implementation

sklearn.metrics.mean_absolute_error

MAE metric implementation

sklearn.metrics.root_mean_squared_error

RMSE metric implementation

plot_hist(indicator, method='mean', unit='', agg_method_kwarg=None, reference_time_series=None, bins=30, colors='orange', reference_value=None, reference_label='Reference', show_rug=False, title=None)[source]

Plot histogram of aggregated results.

Parameters:
  • indicator (str) – Name of the indicator column to plot.

  • method (str, default="mean") – Aggregation method.

  • unit (str, optional) – Unit of the indicator.

  • agg_method_kwarg (dict, optional) – Additional kwargs for aggregation.

  • reference_time_series (Series, optional) – Reference time series.

  • bins (int, default=30) – Histogram number of bins.

  • colors (str, default="orange") – Color of the histogram.

  • reference_value (int, float, optional) – Add a vertical dashed red line at this value, for example to compare the distribution against an expected value.

  • reference_label (str, optional) – Label of the reference value line in the legend. Default is "Reference".

  • show_rug (bool, default=False) – If True, display rug plot below histogram.

  • title (str, optional) – Custom title.

Returns:

Plotly histogram figure.

Return type:

go.Figure

plot_sample(indicator, reference_timeseries=None, title=None, y_label=None, x_label=None, alpha=0.5, show_legends=False, round_ndigits=2, quantile_band=0.75, type_graph='area')[source]

Plot simulation results with different visualization modes.

This function allows visualization of multiple simulation samples, either as a scatter plot of all samples or as an aggregated area with min–max envelope, median, and quantile bands.

Only works for dynamic models.

Parameters:
  • indicator (str) – Column name to extract from each sample's result DataFrame.

  • reference_timeseries (pandas.Series, optional) – A reference time series to plot alongside simulations (e.g., measured data).

  • title (str, optional) – Plot title.

  • y_label (str, optional) – Label for the y-axis.

  • x_label (str, optional) – Label for the x-axis.

  • alpha (float, default=0.5) – Opacity for scatter markers when type_graph="scatter".

  • show_legends (bool, default=False) – Whether to display legends for each individual sample trace when type_graph="scatter".

  • round_ndigits (int, default=2) – Number of digits for rounding parameter values in legend strings.

  • quantile_band (float, default=0.75) – Upper quantile to display when type_graph="area". Both (1 - quantile_band) and quantile_band are drawn as dotted lines, e.g. 0.75 → 25% and 75%.

  • type_graph ({"area", "scatter"}, default="area") –

    Visualization mode:

    • "scatter" : plot all samples individually as scatter markers.

    • "area" : plot aggregated area with min–max envelope, median line, and quantile bands.

Return type:

Figure
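
The statistics behind the "area" mode can be sketched independently of plotting. The code below is an illustration under stated assumptions: each simulation is a plain list of values on a shared time axis, quantile uses simple linear interpolation, and envelope is a hypothetical helper whose output keys are chosen here for readability.

```python
import statistics


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def envelope(series_list, quantile_band=0.75):
    """Per-timestep min, lower quantile, median, upper quantile, and max."""
    rows = []
    for step_values in zip(*series_list):  # all samples at one timestep
        s = sorted(step_values)
        rows.append(
            {
                "min": s[0],
                "q_low": quantile(s, 1 - quantile_band),
                "median": statistics.median(s),
                "q_high": quantile(s, quantile_band),
                "max": s[-1],
            }
        )
    return rows


# Five hypothetical simulations, two timesteps each
samples = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
bands = envelope(samples)
```

With quantile_band=0.75 the two dotted lines correspond to the 25% and 75% quantiles, as described for the parameter above.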

Examples

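>>> # Assuming 'sample' is a Sample with results, 'ref' a reference Series,
>>> # and "temperature" a column of the result DataFrames
>>> fig = sample.plot_sample("temperature", reference_timeseries=ref)
>>> fig.show()
>>> fig = sample.plot_sample(
...     "temperature", reference_timeseries=ref, type_graph="scatter"
... )
>>> fig.show()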
plot_pcp(indicators_configs, color_by=None, title='Parallel Coordinates Samples', html_file_path=None)[source]

Plot an interactive parallel coordinates plot (PCP) of the samples.

This method compares model parameters against aggregated indicators from simulation results. It supports both dynamic and static models.

For dynamic models, the specified indicators are aggregated across time using the provided functions (e.g., “mean”, “sum”, error metrics). For static models, the indicators are taken directly from the stored results.

Parameters:
  • indicators_configs (list of str or list of tuple) –

    Configuration of indicators to include in the plot.

    • For dynamic models, each element must be a tuple of the form: (indicator_name, method) or (indicator_name, method, reference_series).

      Here:
      • indicator_name : str Column name in the simulation results to aggregate.

      • method : str or Callable Aggregation function or metric to apply.

      • reference_series : pandas.Series, optional Reference time series required for error-based methods (e.g., mean absolute error).

    • For static models, a simple list of indicator names (str) is sufficient.

  • color_by (str, optional) – Name of a parameter or result column to use for coloring the PCP lines. If None, all lines are plotted in the same color.

  • title (str, default="Parallel Coordinates Samples") – Title of the plot.

  • html_file_path (str, optional) – If provided, saves the interactive plot as an HTML file at the specified path.

Returns:

The generated parallel coordinates figure. The figure can be displayed interactively in a Jupyter notebook, web browser, or exported to HTML.

Return type:

plotly.graph_objects.Figure

Raises:

ValueError – If the indicators_configs are incompatible with the model type (dynamic vs static).

See also

get_aggregated_time_series

For details on supported aggregation methods and how indicator values are computed for dynamic models.
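
The data behind a parallel coordinates plot is one record per sample, holding the parameter values plus one aggregated value per configured indicator. The sketch below builds such records from (indicator_name, method) tuples; build_pcp_table, the "method_indicator" column naming, and the dict-based results are assumptions made for this illustration.

```python
def build_pcp_table(param_names, values, results, indicators_configs):
    """One dict per sample: parameter values plus aggregated indicators."""
    aggregators = {"mean": lambda xs: sum(xs) / len(xs), "sum": sum}
    table = []
    for row, result in zip(values, results):
        record = dict(zip(param_names, row))
        for indicator, method in indicators_configs:
            # `method` is either a key of `aggregators` or a callable
            func = aggregators[method] if isinstance(method, str) else method
            label = method if isinstance(method, str) else method.__name__
            record[f"{label}_{indicator}"] = func(result[indicator])
        table.append(record)
    return table


values = [[1.0, 2.0], [3.0, 4.0]]
results = [{"T": [20.0, 22.0]}, {"T": [18.0, 19.0]}]  # hypothetical time series
table = build_pcp_table(["a", "b"], values, results, [("T", "mean"), ("T", max)])
```

Each key of a record then becomes one vertical axis of the plot.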

__init__(parameters, is_dynamic=True, results=<factory>)
class corrai.sampling.Sampler(parameters, model, simulation_options=None)[source]

Bases: SampleMethodsMixin

Abstract base class for parameter samplers.

A Sampler generates parameter sets according to a chosen sampling method and runs simulations of a given model.

Parameters:
  • parameters (list of Parameter) – List of parameters to be sampled.

  • model (Model) – The model to simulate for each sample.

  • simulation_options (dict, optional) – Options passed to the simulation (e.g., time range, timestep).

Variables:

sample (Sample) – Container holding parameter values and simulation results.

__init__(parameters, model, simulation_options=None)[source]
property parameters
property values
property results
add_sample(*args, **kwargs)[source]

Generate new samples and optionally run simulations.

Must be implemented in subclasses.

Returns:

The newly generated sample values.

Return type:

ndarray

append_sample_from_param_dict(param_dict, simulation_kwargs=None)[source]

Add a new sample from a parameter dictionary and simulate it.

This method appends a single sample to the existing sample set by providing parameter values as a dictionary. The keys must correspond to the name property of the Parameter objects in the parameters. After adding the sample, it automatically runs a simulation for the newly added parameters.

Parameters:
  • param_dict (dict of {str: int, float, or str}) – Dictionary mapping parameter names to their values. Keys must match the name attribute of Parameter objects in self.parameters. All parameters must be present in the dictionary.

  • simulation_kwargs (dict, optional) – Additional keyword arguments to pass to the simulation method. These can include simulation-specific options such as solver settings, output options, or other model-specific parameters. Default is None.

Returns:

The method modifies the sample in place by adding a new row to self.sample and stores the simulation results in self.results.

Return type:

None

Raises:

AssertionError – If any parameter name in param_dict is not found in self.values.columns, indicating missing or misspelled parameter names.

Notes

  • The newly added sample is assigned the index self.values.index[-1], which corresponds to the last index after appending.

  • This method is useful for manual parameter exploration, adding specific test cases, or iteratively building samples based on optimization results.
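
The name check described under Raises can be pictured as follows. This is a sketch of the idea only: row_from_param_dict is a hypothetical helper that validates the keys and orders the values to match the parameter names; the library's own checks and error messages may differ.

```python
def row_from_param_dict(param_names, param_dict):
    """Order the values of `param_dict` to match `param_names`."""
    unknown = set(param_dict) - set(param_names)
    missing = set(param_names) - set(param_dict)
    assert not unknown, f"Unknown parameter names: {sorted(unknown)}"
    assert not missing, f"Missing parameter names: {sorted(missing)}"
    return [param_dict[name] for name in param_names]


names = ["par_x1", "par_x2", "par_x3"]
# Key order in the dictionary does not matter; the row follows `names`.
row = row_from_param_dict(names, {"par_x3": 2.0, "par_x1": 1.5, "par_x2": -0.5})
```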

Examples

Add a single sample with specific parameter values:

>>> from corrai.base.parameter import Parameter
>>> from corrai.base.model import IshigamiDynamic
>>> from corrai.sampling import LHSSampler
>>>
>>> # Define parameters
>>> params = [
...     Parameter("par_x1", (-3.14, 3.14), model_property="x1"),
...     Parameter("par_x2", (-3.14, 3.14), model_property="x2"),
...     Parameter("par_x3", (-3.14, 3.14), model_property="x3"),
... ]
>>>
>>> # Create the sampler
>>> simulation_opts = {
...     "start": "2023-01-01 00:00:00",
...     "end": "2023-01-01 23:00:00",
...     "timestep": "h",
... }
>>> sampler = LHSSampler(
...     parameters=params,
...     model=IshigamiDynamic(),
...     simulation_options=simulation_opts,
... )
>>>
>>> # Add a specific sample
>>> new_params = {"par_x1": 1.5, "par_x2": -0.5, "par_x3": 2.0}
>>> sampler.append_sample_from_param_dict(new_params)
>>> print(sampler.values.tail(1))
      par_x1  par_x2  par_x3
0        1.5    -0.5     2.0

See also

add_samples

Add multiple samples at once using a numpy array

simulate_at

Simulate a specific sample by its index

Parameter

Class defining model parameters with bounds and properties

simulate_at(idx=None, n_cpu=1, simulation_kwargs=None)[source]
simulate_pending(n_cpu=1, simulation_kwargs=None)[source]
class corrai.sampling.RealSampler(parameters, model, simulation_options=None)[source]

Bases: Sampler

Abstract base class for samplers that only support real-valued parameters.

Provides utilities for interoperability with SALib.

Parameters:
  • parameters (list of Parameter) – Parameters to sample. All must have ptype=’Real’.

  • model (Model) – Model to simulate.

  • simulation_options (dict, optional) – Simulation options to pass to the model.

Raises:

ValueError – If any parameter is not of type ‘Real’.

get_salib_problem()[source]

Returns a SALib-compatible problem definition.
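
SALib describes a problem as a dictionary with the keys "num_vars", "names", and "bounds". The sketch below builds one from (name, interval) pairs; salib_problem is a hypothetical helper for illustration, and the exact content returned by get_salib_problem may include more than these three keys.

```python
def salib_problem(parameters):
    """Build a SALib-style problem dict from (name, (lower, upper)) pairs."""
    return {
        "num_vars": len(parameters),
        "names": [name for name, _ in parameters],
        "bounds": [list(interval) for _, interval in parameters],
    }


problem = salib_problem([("par_x1", (-3.14, 3.14)), ("par_x2", (0.0, 1.0))])
```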

__init__(parameters, model, simulation_options=None)[source]
class corrai.sampling.MorrisSampler(parameters, model, simulation_options=None)[source]

Bases: RealSampler

Elementary Effects (Morris) sampler.

Uses SALib’s Morris method to generate trajectories and samples.

Parameters:
  • parameters (list of Parameter) – Real-valued parameters to sample.

  • model (Model) – Model to simulate.

  • simulation_options (dict, optional) – Simulation options for the model.

add_sample(N, num_levels=4, simulate=True, n_cpu=1, **kwargs)[source]

Generate samples using the Morris method.

__init__(parameters, model, simulation_options=None)[source]
add_sample(N, num_levels=4, simulate=True, n_cpu=1, simulation_kwargs=None, **morris_kwargs)[source]

Generate new samples using the Morris method and optionally run simulations.

Returns:

The newly generated sample values.

Return type:

ndarray
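
To give an intuition for what a Morris design looks like, the sketch below builds trajectories in the unit hypercube: each one starts on a grid of num_levels values and changes one parameter at a time, so a trajectory over k parameters has k + 1 points. This is a simplified variant written for illustration (even num_levels, upward steps only), not SALib's generator, and morris_trajectory is a hypothetical name.

```python
import random


def morris_trajectory(n_params, num_levels=4, rng=None):
    """One trajectory of n_params + 1 points in the unit hypercube."""
    rng = rng or random.Random()
    # Step size commonly used with an even number of levels
    delta = num_levels / (2 * (num_levels - 1))
    # Base levels chosen so that base + delta stays inside [0, 1]
    grid = [i / (num_levels - 1) for i in range(num_levels // 2)]
    point = [rng.choice(grid) for _ in range(n_params)]
    trajectory = [list(point)]
    # Visit the parameters in random order, moving one at a time
    for dim in rng.sample(range(n_params), n_params):
        point[dim] += delta
        trajectory.append(list(point))
    return trajectory


rng = random.Random(0)
trajectories = [morris_trajectory(3, num_levels=4, rng=rng) for _ in range(5)]
n_points = sum(len(t) for t in trajectories)  # 5 trajectories of 3 + 1 points
```

Under these assumptions the number of model runs grows as N * (k + 1), which is why the method is often used for screening many parameters.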

class corrai.sampling.FASTSampler(parameters, model, simulation_options=None)[source]

Bases: RealSampler

FAST sampler.

Uses the Fourier Amplitude Sensitivity Test (FAST) to generate samples.

Parameters:
  • parameters (list of Parameter) – Real-valued parameters.

  • model (Model) – Model to simulate.

  • simulation_options (dict, optional) – Options for simulation.

add_sample(N, M=4, simulate=True, n_cpu=1, **kwargs)[source]

Generate samples using FAST.

__init__(parameters, model, simulation_options=None)[source]
add_sample(N, M=4, simulate=True, n_cpu=1, **fast_kwargs)[source]

Generate new samples using FAST and optionally run simulations.

Returns:

The newly generated sample values.

Return type:

ndarray

class corrai.sampling.RBDFASTSampler(parameters, model, simulation_options=None)[source]

Bases: RealSampler

RBD-FAST sampler.

Generates samples for the Random Balance Designs Fourier Amplitude Sensitivity Test (RBD-FAST).

Parameters:
  • parameters (list of Parameter) – Real-valued parameters.

  • model (Model) – Model to simulate.

  • simulation_options (dict, optional) – Options for simulation.

add_sample(N, simulate=True, n_cpu=1, **kwargs)[source]

Generate samples using RBD-FAST.

__init__(parameters, model, simulation_options=None)[source]
add_sample(N, simulate=True, n_cpu=1, **rbdfast_kwargs)[source]

Generate new samples using RBD-FAST and optionally run simulations.

Returns:

The newly generated sample values.

Return type:

ndarray

class corrai.sampling.LHSSampler(parameters, model, simulation_options=None)[source]

Bases: RealSampler

Latin Hypercube sampler.

Uses scipy.stats.qmc.LatinHypercube to generate stratified samples in the unit hypercube.

Parameters:
  • parameters (list of Parameter) – Real-valued parameters.

  • model (Model) – Model to simulate.

  • simulation_options (dict, optional) – Options for simulation.

add_sample(n, rng=None, simulate=True, **kwargs)[source]

Generate n samples using LHS.

__init__(parameters, model, simulation_options=None)[source]
add_sample(n, rng=None, simulate=True, n_cpu=1, simulation_kwargs=None, **lhs_kwargs)[source]

Generate n new samples using Latin Hypercube sampling and optionally run simulations.

Returns:

The newly generated sample values.

Return type:

ndarray
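
The stratification idea behind Latin Hypercube sampling can be sketched in plain Python: for each parameter, [0, 1) is split into n equal strata, one point is drawn in each, the order is shuffled, and the result is scaled to the parameter interval. This is an illustrative sketch, not scipy.stats.qmc.LatinHypercube; latin_hypercube and its arguments are assumptions.

```python
import random


def latin_hypercube(n, intervals, rng=None):
    """Draw n stratified samples, then scale them to the given intervals."""
    rng = rng or random.Random()
    columns = []
    for low, high in intervals:
        # One random point inside each of the n equal-width strata of [0, 1)
        strata = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(strata)  # decouple the strata order between parameters
        columns.append([low + u * (high - low) for u in strata])
    return [list(row) for row in zip(*columns)]


samples = latin_hypercube(4, [(-3.14, 3.14), (0.0, 10.0)], rng=random.Random(42))
```

Every parameter range is covered evenly even for small n, which is the main advantage over independent uniform draws.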

class corrai.sampling.SobolSampler(parameters, model, simulation_options=None)[source]

Bases: RealSampler

Sobol sequence sampler.

Generates low-discrepancy quasi-random samples using SALib’s Sobol generator.

Parameters:
  • parameters (list of Parameter) – Real-valued parameters.

  • model (Model) – Model to simulate.

  • simulation_options (dict, optional) – Options for simulation.

add_sample(N, simulate=True, n_cpu=1, scramble=True, calc_second_order=True, **kwargs)

Generate Sobol samples.

__init__(parameters, model, simulation_options=None)[source]
add_sample(N, simulate=True, n_cpu=1, scramble=True, *, calc_second_order=True, simulation_kwargs=None, **sobol_kwargs)[source]

Generate new Sobol samples and optionally run simulations.

Returns:

The newly generated sample values.

Return type:

ndarray
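
When budgeting simulations it helps to know how many rows a Sobol design produces. SALib's Sobol sampler documents N * (2D + 2) rows when second-order indices are requested and N * (D + 2) otherwise, where D is the number of parameters. The helper below encodes that rule; it assumes that N and calc_second_order are forwarded unchanged to SALib, which this page does not confirm, and sobol_sample_size is a hypothetical name.

```python
def sobol_sample_size(n_base, n_params, calc_second_order=True):
    """Number of rows produced by a Saltelli-style Sobol design."""
    if calc_second_order:
        return n_base * (2 * n_params + 2)
    return n_base * (n_params + 2)


sizes = {
    "second_order": sobol_sample_size(1024, 3),
    "first_order_only": sobol_sample_size(1024, 3, calc_second_order=False),
}
```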