Sampling Module
- class corrai.sampling.Sample(parameters, is_dynamic=True, results=<factory>)[source]
Bases: object
Container for simulation samples and results.
Each Sample instance stores parameter values and the corresponding simulation results. It supports indexing, aggregation, plotting, and integration with sampling strategies.
Handles both dynamic and static models.
- Parameters:
parameters (list of Parameter) – List of model parameters used to generate the samples.
- Variables:
parameters (list of Parameter) – Parameters associated with this sample.
is_dynamic (bool, default True) – Specifies whether stored results are time series in a DataFrame (dynamic models) or a Series of floats (static models).
values (ndarray of shape (n_samples, n_parameters)) – Numerical values of the sampled parameters.
results (Series of DataFrames) – Simulation results for each sample. Each element is typically a pandas DataFrame indexed by time, containing model outputs.
- is_dynamic: bool = True
- values: DataFrame
- results: Series
- get_pending_index()[source]
Identify which samples have not yet been simulated.
- Returns:
Boolean mask of length len(self), where True indicates a sample without results.
- Return type:
ndarray of bool
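The idea behind the mask can be pictured with a minimal plain-Python sketch. It assumes that a sample counts as pending when its stored result is empty, which is consistent with add_samples storing empty DataFrames when no results are given. The helper name pending_mask and the list-based results are illustrative only and are not part of corrai.

```python
def pending_mask(results):
    """Return True for every sample whose result container is empty."""
    return [len(result) == 0 for result in results]


# Three samples: the second one has not been simulated yet.
# Plain lists stand in for the per-sample result DataFrames (assumption).
results = [[20.1, 20.4], [], [19.8, 20.0]]
mask = pending_mask(results)

# Indices of the samples that still need a simulation run.
pending_indices = [i for i, is_pending in enumerate(mask) if is_pending]
```

A mask of this kind is convenient because it can be used directly to select the rows of values that still have to be simulated.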
- get_parameters_intervals()[source]
Return parameter intervals.
- Returns:
Lower and upper bounds for each parameter.
- Return type:
ndarray of shape (n_parameters, 2)
- Raises:
NotImplementedError – If any parameter has type ‘Integer’.
ValueError – If parameters are not of type ‘Real’.
- get_list_parameter_value_pairs(idx=None)[source]
Map parameter objects to their sampled values.
- Parameters:
idx (int, list of int, ndarray, or slice, optional) – Indices of samples to retrieve. Defaults to all.
- Returns:
Nested list where each inner list corresponds to a sample.
- Return type:
list of list of (Parameter, value)
- get_dimension_less_values(idx=slice(None, None, None))[source]
Normalize parameter values to [0, 1].
- Parameters:
idx (int, list, ndarray, or slice, optional) – Indices of samples to normalize. Defaults to all.
- Returns:
Dimensionless parameter values, scaled using their defined intervals.
- Return type:
ndarray of shape (n_selected, n_parameters)
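As a rough illustration, scaling to [0, 1] from parameter intervals is usually a min-max transform. The sketch below assumes the linear form (x - low) / (high - low); the source does not state the exact formula, and the helper name to_unit_interval is hypothetical.

```python
def to_unit_interval(values, intervals):
    """Min-max scale each row of `values` with per-parameter (low, high) bounds."""
    scaled = []
    for row in values:
        scaled.append(
            [(x - low) / (high - low) for x, (low, high) in zip(row, intervals)]
        )
    return scaled


intervals = [(1.0, 10.0), (-2.0, 2.0)]  # one (low, high) pair per parameter
values = [[1.0, 0.0], [5.5, 2.0]]  # shape (n_samples, n_parameters)

dimensionless = to_unit_interval(values, intervals)
# -> [[0.0, 0.5], [0.5, 1.0]]
```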
- add_samples(values, results=None)[source]
Add new samples and optionally their results.
- Parameters:
values (ndarray of shape (n_samples, n_parameters)) – Sampled parameter values to add.
results (list of DataFrame, optional) – Simulation results corresponding to values. If None, empty DataFrames are stored.
- Raises:
AssertionError – If results length does not match values length.
- get_aggregated_time_series(indicator, method='mean', agg_method_kwarg=None, reference_time_series=None, freq=None, prefix='aggregated')[source]
Aggregate sample results using a specified statistical or error metric.
This method extracts the specified indicator column, and aggregates the time series across simulations using the given method. If a reference time series is provided, metrics that require ground truth (e.g., mean_absolute_error) are supported.
If freq is provided, the aggregation is done over time bins, producing a table of simulation runs versus time periods.
Only works for dynamic models.
- Parameters:
indicator (str) – The column name in each DataFrame to extract and aggregate.
method (str, default="mean") – The aggregation method to use. Supported methods include: "mean", "sum", "nmbe", "cv_rmse", "mean_squared_error", "mean_absolute_error".
agg_method_kwarg (dict, optional) – Additional keyword arguments to pass to the aggregation function.
reference_time_series (pandas.Series, optional) – Reference series (y_true) to compare each simulation against. Required for error-based methods such as “mean_absolute_error”. Must have the same datetime index and length as the individual simulation results.
freq (str or pandas.Timedelta or datetime.timedelta, optional) – If provided, aggregate the time series within bins of this frequency (e.g., “d” for daily, “h” for hourly). The result will be a DataFrame where each row corresponds to a simulation and each column to a time bin.
prefix (str, default="aggregated") – Prefix to use for naming the output column when freq is not specified.
- Returns:
If freq is not provided, returns a one-column DataFrame containing the aggregated metric per simulation, indexed by the same index as results.
If freq is provided, returns a DataFrame indexed by simulation IDs (same as results.index), with columns representing each aggregated time bin.
- Return type:
pandas.DataFrame
- Raises:
ValueError – If the shapes of results and reference_time_series are incompatible, or if the datetime index is invalid or missing.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from corrai.base.parameter import Parameter
>>> from corrai.sampling import Sample
>>> sample = Sample(
...     parameters=[
...         Parameter("a", interval=(1, 10)),
...         Parameter("b", interval=(1, 10)),
...     ]
... )
>>> t = pd.date_range("2009-01-01", freq="h", periods=2)
>>> res_1 = pd.DataFrame({"a": [1, 2]}, index=t)
>>> res_2 = pd.DataFrame({"a": [3, 4]}, index=t)
>>> sample.add_samples(np.array([[1, 2], [3, 4]]), [res_1, res_2])
>>> # No frequency aggregation: one aggregated value per simulation
>>> sample.get_aggregated_time_series("a")
   aggregated_a
0           1.5
1           3.5
>>> # With frequency aggregation: one value per time bin per simulation
>>> ref = pd.Series(
...     [1, 1], index=pd.date_range("2009-01-01", freq="h", periods=2)
... )
>>> sample.get_aggregated_time_series(
...     indicator="a",
...     method="mean_absolute_error",
...     reference_time_series=ref,
...     freq="h",
... )
   2009-01-01 00:00:00  2009-01-01 01:00:00
0                  0.0                  1.0
1                  2.0                  3.0
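To make the difference between the two output shapes concrete, the following plain-Python sketch recomputes a mean absolute error with and without binning, using the same numbers as the example above. It is only an illustration of the arithmetic: the helper names are hypothetical, lists stand in for pandas objects, and binning is expressed as a number of time steps per bin rather than a pandas frequency string.

```python
from statistics import fmean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and simulated values."""
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def binned_error(y_true, y_pred, steps_per_bin):
    """Apply the metric separately to consecutive bins of `steps_per_bin` steps."""
    return [
        mean_absolute_error(
            y_true[start : start + steps_per_bin],
            y_pred[start : start + steps_per_bin],
        )
        for start in range(0, len(y_true), steps_per_bin)
    ]


reference = [1, 1]  # same values as `ref` in the example above
simulations = {0: [1, 2], 1: [3, 4]}  # same values as res_1 and res_2

# No binning: one scalar per simulation.
per_simulation = {
    run: mean_absolute_error(reference, series)
    for run, series in simulations.items()
}

# Hourly bins on hourly data: every bin holds exactly one time step.
per_bin = {
    run: binned_error(reference, series, steps_per_bin=1)
    for run, series in simulations.items()
}
```

Here per_bin reproduces the 2 x 2 table shown in the example, while per_simulation collapses each run to a single value.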
- get_score_df(indicator, reference_time_series, scoring_methods=None, resample_rule=None, resample_agg_method='mean')[source]
Compute scoring metrics for a given indicator across all sample results.
This method evaluates the performance of dynamic model predictions by comparing them against a reference time series. It supports multiple scoring metrics (R², NMBE, CV(RMSE), MAE, RMSE, max error) and optional resampling of data.
- Parameters:
indicator (str) – Name of the indicator/variable to evaluate from the simulation results. Must be a valid column in the sample results DataFrame.
reference_time_series (pd.Series) – Ground truth or measured time series data to compare against.
scoring_methods (list of str or callable, optional) – List of scoring methods to apply. Can be:
- String values from SCORE_MAP: "r2", "nmbe", "cv_rmse", "mae", "rmse", "max"
- Custom callable functions with signature func(y_true, y_pred) -> float
If None, all methods are used. Default is None.
resample_rule (str, pd.Timedelta or dt.timedelta, optional) – Resampling frequency for aggregating the time series data before scoring. Examples: "D" (daily), "h" (hourly), "ME" (month end). If None, no resampling is performed. Default is None.
resample_agg_method (str, optional) – Aggregation method to use when resampling. Common values include: "mean", "sum", "min", "max", "median". Default is "mean".
- Returns:
DataFrame containing scoring metrics for each sample.
- Index: sample identifiers from self.results
- Columns: metric names (e.g., "r2_score", "nmbe", "cv_rmse")
- Values: computed metric values (float)
The DataFrame’s index name is set to the resampling rule or the inferred frequency of the reference time series.
- Return type:
pd.DataFrame
- Raises:
NotImplementedError – If the model is not dynamic (self.is_dynamic == False).
Notes
The scoring metrics available in SCORE_MAP are:
- r2: R² score (coefficient of determination)
- nmbe: Normalized Mean Bias Error
- cv_rmse: Coefficient of Variation of Root Mean Squared Error
- mae: Mean Absolute Error
- rmse: Root Mean Squared Error
- max: Maximum absolute error
When resampling is applied, both the predicted and reference time series are resampled using the same rule and aggregation method to ensure alignment.
Examples
Basic usage with default metrics:
>>> import pandas as pd
>>> import numpy as np
>>> # Assuming 'sample' is an instance of Sample class with results
>>> reference = pd.Series(
...     np.random.randn(100),
...     index=pd.date_range("2023-01-01", periods=100, freq="h"),
... )
>>> scores = sample.get_score_df(
...     indicator="temperature", reference_time_series=reference
... )
>>> print(scores)
   r2_score      nmbe   cv_rmse       mae      rmse       max
0   0.85234  0.012345  0.234567  1.234567  1.567890  3.456789
1   0.82156  0.023456  0.345678  1.345678  1.678901  3.567890
...
Using specific metrics and daily resampling:
>>> scores = sample.get_score_df(
...     indicator="Energy",
...     reference_time_series=reference,
...     scoring_methods=["r2", "rmse", "mae"],
...     resample_rule="D",
...     resample_agg_method="sum",
... )
>>> print(scores)
   r2_score      rmse       mae
D
0   0.91234  12.34567  10.12345
1   0.89123  13.45678  11.23456
...
See also
sklearn.metrics.r2_score: R² metric implementation
sklearn.metrics.mean_absolute_error: MAE metric implementation
sklearn.metrics.root_mean_squared_error: RMSE metric implementation
- plot_hist(indicator, method='mean', unit='', agg_method_kwarg=None, reference_time_series=None, bins=30, colors='orange', reference_value=None, reference_label='Reference', show_rug=False, title=None)[source]
Plot histogram of aggregated results.
- Parameters:
indicator (str) – Name of the indicator column to plot.
method (str, default="mean") – Aggregation method.
unit (str, optional) – Unit of the indicator.
agg_method_kwarg (dict, optional) – Additional kwargs for aggregation.
reference_time_series (Series, optional) – Reference time series.
bins (int, default=30) – Histogram number of bins.
colors (str, default="orange") – Color of the histogram.
reference_value (int, float, optional) – Adds a vertical dashed red line at the reference value, for comparison with an expected value.
reference_label (str, optional) – Label of the reference value line displayed in the legend. Default is “Reference”.
show_rug (bool, default=False) – If True, display rug plot below histogram.
title (str, optional) – Custom title.
- Returns:
Plotly histogram figure.
- Return type:
go.Figure
- plot_sample(indicator, reference_timeseries=None, title=None, y_label=None, x_label=None, alpha=0.5, show_legends=False, round_ndigits=2, quantile_band=0.75, type_graph='area')[source]
Plot simulation results with different visualization modes.
This function allows visualization of multiple simulation samples, either as a scatter plot of all samples or as an aggregated area with min–max envelope, median, and quantile bands.
Only works for dynamic models.
- Parameters:
indicator (str, optional) – Column name to extract if inner elements are DataFrames with multiple columns.
reference_timeseries (pandas.Series, optional) – A reference time series to plot alongside simulations (e.g., measured data).
title (str, optional) – Plot title.
y_label (str, optional) – Label for the y-axis.
x_label (str, optional) – Label for the x-axis.
alpha (float, default=0.5) – Opacity for scatter markers when type_graph="scatter".
show_legends (bool, default=False) – Whether to display legends for each individual sample trace when type_graph="scatter".
round_ndigits (int, default=2) – Number of digits for rounding parameter values in legend strings.
quantile_band (float, default=0.75) – Upper quantile to display when type_graph="area". Both (1 - quantile_band) and quantile_band are drawn as dotted lines, e.g. 0.75 → 25% and 75%.
type_graph ({"area", "scatter"}, default="area") – Visualization mode:
- "scatter": plot all samples individually as scatter markers.
- "area": plot aggregated area with min–max envelope, median line, and quantile bands.
- Return type:
Figure
Examples
>>> fig = plot_sample(results, reference_timeseries=ref)
>>> fig.show()
>>> fig = plot_sample(results, reference_timeseries=ref, type_graph="scatter")
>>> fig.show()
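The statistics drawn in "area" mode can be sketched without any plotting. The plain-Python example below computes, for every time step, the min–max envelope, the median, and the two quantiles implied by quantile_band. The linear interpolation rule and the helper names are assumptions made for illustration; corrai may rely on pandas or NumPy quantiles with a different interpolation.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def envelope(samples, quantile_band=0.75):
    """Summarize samples (one list per simulation) time step by time step."""
    summary = []
    for values_at_t in zip(*samples):
        ordered = sorted(values_at_t)
        summary.append(
            {
                "min": ordered[0],
                "low": quantile(ordered, 1 - quantile_band),
                "median": quantile(ordered, 0.5),
                "high": quantile(ordered, quantile_band),
                "max": ordered[-1],
            }
        )
    return summary


# Five simulations, two time steps each.
samples = [
    [0.0, 10.0],
    [1.0, 20.0],
    [2.0, 30.0],
    [3.0, 40.0],
    [4.0, 50.0],
]
bands = envelope(samples)  # quantile_band=0.75 gives the 25% and 75% lines
```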
- plot_pcp(indicators_configs, color_by=None, title='Parallel Coordinates — Samples', html_file_path=None)[source]
This method produces an interactive parallel coordinates plot (PCP) that allows comparison of model parameters against aggregated indicators from simulation results. It supports both dynamic and static models.
For dynamic models, the specified indicators are aggregated across time using the provided functions (e.g., “mean”, “sum”, error metrics). For static models, the indicators are taken directly from the stored results.
- Parameters:
indicators_configs (list of str or list of tuple) – Configuration of indicators to include in the plot.
For dynamic models, each element must be a tuple of the form (indicator_name, method) or (indicator_name, method, reference_series), where:
- indicator_name (str): Column name in the simulation results to aggregate.
- method (str or Callable): Aggregation function or metric to apply.
- reference_series (pandas.Series, optional): Reference time series required for error-based methods (e.g., mean absolute error).
For static models, a simple list of indicator names (str) is sufficient.
color_by (str, optional) – Name of a parameter or result column to use for coloring the PCP lines. If None, all lines are plotted in the same color.
title (str, default="Parallel Coordinates — Samples") – Title of the plot.
html_file_path (str, optional) – If provided, saves the interactive plot as an HTML file at the specified path.
- Returns:
The generated parallel coordinates figure. The figure can be displayed interactively in a Jupyter notebook, web browser, or exported to HTML.
- Return type:
plotly.graph_objects.Figure
- Raises:
ValueError – If the indicators_configs are incompatible with the model type (dynamic vs static).
See also
get_aggregated_time_series: For details on supported aggregation methods and how indicator values are computed for dynamic models.
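The two accepted shapes of indicators_configs can be illustrated with a small validation sketch. The helper normalize_config below is hypothetical and is not corrai's implementation; it only shows how 2-tuples, 3-tuples, and plain strings could be brought to a common form, and when a ValueError of the kind described above would be appropriate.

```python
def normalize_config(config, is_dynamic):
    """Return (indicator_name, method, reference_series) or raise ValueError."""
    if not is_dynamic:
        # Static models: a plain indicator name is enough.
        if not isinstance(config, str):
            raise ValueError("Static models expect plain indicator names.")
        return (config, None, None)
    # Dynamic models: (name, method) or (name, method, reference_series).
    if not isinstance(config, tuple) or len(config) not in (2, 3):
        raise ValueError("Dynamic models expect (name, method[, reference]) tuples.")
    name, method, *rest = config
    reference = rest[0] if rest else None
    return (name, method, reference)


reference = [20.0, 21.0, 22.0]  # stand-in for a pandas.Series (assumption)
dynamic_configs = [
    ("temperature", "mean"),
    ("temperature", "mean_absolute_error", reference),
]
normalized = [normalize_config(c, is_dynamic=True) for c in dynamic_configs]
```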
- __init__(parameters, is_dynamic=True, results=<factory>)
- class corrai.sampling.Sampler(parameters, model, simulation_options=None)[source]
Bases: SampleMethodsMixin
Abstract base class for parameter samplers.
A Sampler generates parameter sets according to a chosen sampling method and runs simulations of a given model.
- Parameters:
- Variables:
sample (Sample) – Container holding parameter values and simulation results.
- property parameters
- property values
- property results
- add_sample(*args, **kwargs)[source]
Generate new samples and optionally run simulations.
Must be implemented in subclasses.
- Returns:
The newly generated sample values.
- Return type:
ndarray
- append_sample_from_param_dict(param_dict, simulation_kwargs=None)[source]
Add a new sample from a parameter dictionary and simulate it.
This method appends a single sample to the existing sample set by providing parameter values as a dictionary. The keys must correspond to the name property of the Parameter objects in parameters. After adding the sample, it automatically runs a simulation for the newly added parameters.
- Parameters:
param_dict (dict of {str: int, float, or str}) – Dictionary mapping parameter names to their values. Keys must match the name attribute of Parameter objects in self.parameters. All parameters must be present in the dictionary.
simulation_kwargs (dict, optional) – Additional keyword arguments to pass to the simulation method. These can include simulation-specific options such as solver settings, output options, or other model-specific parameters. Default is None.
- Returns:
The method modifies the sample in place by adding a new row to self.sample and stores the simulation results in self.results.
- Return type:
None
- Raises:
AssertionError – If any parameter name in param_dict is not found in self.values.columns, indicating missing or misspelled parameter names.
Notes
The newly added sample is assigned the index self.values.index[-1], which corresponds to the last index after appending.
This method is useful for manual parameter exploration, adding specific test cases, or iteratively building samples based on optimization results.
Examples
Add a single sample with specific parameter values:
>>> from corrai.base.parameter import Parameter
>>> from corrai.base.model import IshigamiDynamic
>>> from corrai.sampling import LHSSampler
>>>
>>> # Define parameters
>>> params = [
...     Parameter("par_x1", (-3.14, 3.14), model_property="x1"),
...     Parameter("par_x2", (-3.14, 3.14), model_property="x2"),
...     Parameter("par_x3", (-3.14, 3.14), model_property="x3"),
... ]
>>>
>>> # Create sample object
>>> simulation_opts = {
...     "start": "2023-01-01 00:00:00",
...     "end": "2023-01-01 23:00:00",
...     "timestep": "h",
... }
>>> sample = LHSSampler(
...     parameters=params,
...     model=IshigamiDynamic(),
...     simulation_options=simulation_opts,
... )
>>>
>>> # Add a specific sample
>>> new_params = {"par_x1": 1.5, "par_x2": -0.5, "par_x3": 2.0}
>>> sample.append_sample_from_param_dict(new_params)
>>> print(sample.values.tail(1))
   par_x1  par_x2  par_x3
0     1.5    -0.5     2.0
See also
add_samples: Add multiple samples at once using a numpy array
simulate_at: Simulate a specific sample by its index
Parameter: Class defining model parameters with bounds and properties
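Because every parameter must be present and every key must match a parameter name, it can be convenient to check a dictionary before handing it to append_sample_from_param_dict. The helper below is a hypothetical pre-check written in plain Python, not corrai code. It raises ValueError rather than the AssertionError documented above, and it also returns the values in parameter order.

```python
def row_from_param_dict(param_dict, parameter_names):
    """Order the values of `param_dict` like `parameter_names`, checking the keys."""
    unknown = set(param_dict) - set(parameter_names)
    missing = set(parameter_names) - set(param_dict)
    if unknown or missing:
        raise ValueError(f"unknown={sorted(unknown)}, missing={sorted(missing)}")
    return [param_dict[name] for name in parameter_names]


# Parameter names as used in the example above.
names = ["par_x1", "par_x2", "par_x3"]

# Key order in the dictionary does not matter.
row = row_from_param_dict({"par_x3": 2.0, "par_x1": 1.5, "par_x2": -0.5}, names)
# -> [1.5, -0.5, 2.0]
```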
- class corrai.sampling.RealSampler(parameters, model, simulation_options=None)[source]
Bases: Sampler
Abstract base class for samplers that only support real-valued parameters.
Provides utilities for interoperability with SALib.
- Parameters:
- Raises:
ValueError – If any parameter is not of type ‘Real’.
- class corrai.sampling.MorrisSampler(parameters, model, simulation_options=None)[source]
Bases: RealSampler
Elementary Effects (Morris) sampler.
Uses SALib’s Morris method to generate trajectories and samples.
- Parameters:
- class corrai.sampling.FASTSampler(parameters, model, simulation_options=None)[source]
Bases: RealSampler
FAST sampler.
Uses the Fourier Amplitude Sensitivity Test (FAST) to generate samples.
- Parameters:
- class corrai.sampling.RBDFASTSampler(parameters, model, simulation_options=None)[source]
Bases: RealSampler
RBD-FAST sampler.
Generates samples for the Random Balance Designs Fourier Amplitude Sensitivity Test (RBD-FAST).
- Parameters:
- class corrai.sampling.LHSSampler(parameters, model, simulation_options=None)[source]
Bases: RealSampler
Latin Hypercube sampler.
Uses scipy.stats.qmc.LatinHypercube to generate stratified samples in the unit hypercube.
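For readers unfamiliar with stratified sampling, the basic Latin hypercube idea can be sketched in plain Python: each dimension of the unit hypercube is cut into n_samples equal strata, and every stratum receives exactly one point. This is only a conceptual sketch with hypothetical helper names; scipy.stats.qmc.LatinHypercube offers further options that are not reproduced here, and how LHSSampler scales the unit samples to parameter intervals is assumed to be linear.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Draw points in [0, 1)^n_dims with one point per stratum and dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append([(s + rng.random()) / n_samples for s in strata])
    # Transpose the per-dimension columns into per-sample rows.
    return [list(point) for point in zip(*columns)]


def scale(points, intervals):
    """Map unit-hypercube points linearly onto (low, high) intervals."""
    return [
        [low + u * (high - low) for u, (low, high) in zip(point, intervals)]
        for point in points
    ]


rng = random.Random(42)  # fixed seed for reproducibility
unit_points = latin_hypercube(n_samples=5, n_dims=2, rng=rng)
points = scale(unit_points, [(-3.14, 3.14), (1.0, 10.0)])
```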
- Parameters:
- class corrai.sampling.SobolSampler(parameters, model, simulation_options=None)[source]
Bases: RealSampler
Sobol sequence sampler.
Generates low-discrepancy quasi-random samples using SALib’s Sobol generator.
- Parameters:
- add_sample(N, simulate=True, n_cpu=1, scramble=True, calc_second_order=True, **kwargs)
Generate Sobol samples.
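When planning a Sobol analysis it helps to know how many model runs a given N implies. SALib's Sobol sampling scheme returns N * (2D + 2) rows when second-order indices are requested and N * (D + 2) rows otherwise, where D is the number of parameters. The sketch below assumes that add_sample forwards N and calc_second_order to SALib unchanged; the helper name sobol_row_count is hypothetical.

```python
def sobol_row_count(n_base, n_parameters, calc_second_order=True):
    """Number of parameter sets produced by SALib's Sobol sampling scheme."""
    if calc_second_order:
        return n_base * (2 * n_parameters + 2)
    return n_base * (n_parameters + 2)


# Three parameters, base sample size N = 1024 (a power of 2, as SALib recommends).
rows_second_order = sobol_row_count(1024, 3)  # 1024 * 8 = 8192
rows_first_order = sobol_row_count(1024, 3, calc_second_order=False)  # 1024 * 5 = 5120
```

With simulate=True every one of these rows triggers a simulation, so the total cost grows linearly with both N and the number of parameters.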