Surrogate Module
- class corrai.surrogate.ModelTrainer(model, test_size=0.2, random_state=42)[source]
Bases: object
- __init__(model, test_size=0.2, random_state=42)[source]
Initialize a ModelTrainer instance for training a machine learning model.
- Parameters:
model – A scikit-learn compatible model pipeline for training and prediction. Stored as the model_pipe attribute.
test_size (float) – The proportion of the dataset to set aside as the test set (default: 0.2).
random_state (int) – Seed for random number generation to ensure reproducibility (default: 42).
The ModelTrainer prepares data for training and evaluation of the specified model.
Attributes:
- test_size: The proportion of data to be used as the test set.
- model_pipe: The machine learning model pipeline to be trained.
- random_state: Seed for random number generation.
- x_train: Training data features.
- x_test: Test data features.
- y_train: Training data labels.
- y_test: Test data labels.
- _is_trained: A boolean indicating if the model has been trained.
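As a rough illustration of how test_size and random_state interact, the sketch below shuffles row indices with a seeded generator and holds out a fraction of them. The helper split_train_test is hypothetical and is not part of corrai; ModelTrainer's actual splitting logic may differ, for example in how the test count is rounded.

```python
import math
import random


def split_train_test(x, y, test_size=0.2, random_state=42):
    """Hypothetical helper: seeded shuffle, then hold out a test fraction."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    indices = list(range(len(x)))
    # A seeded generator makes the shuffle, and therefore the split, repeatable.
    random.Random(random_state).shuffle(indices)
    # Assumption: the test count is rounded up.
    n_test = math.ceil(test_size * len(indices))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [x[i] for i in train_idx]
    x_test = [x[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


features = [[float(i)] for i in range(10)]
labels = [2.0 * i for i in range(10)]
x_train, x_test, y_train, y_test = split_train_test(features, labels)
```

Calling the helper twice with the same random_state returns the same partition, which is the reproducibility property the parameter is meant to provide.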
- property test_nmbe_score
- property test_cvrmse_score
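The two properties above are not described further here. The sketch below assumes the widely used definitions of NMBE (normalized mean bias error) and CV(RMSE) (coefficient of variation of the root mean squared error), both expressed as a percentage of the mean observed value. The sign convention (predicted minus observed) and the division by n rather than n - 1 are assumptions; corrai's own normalization may differ.

```python
import math


def nmbe(y_true, y_pred):
    """Normalized mean bias error, in percent of the mean observed value."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Assumption: bias is predicted minus observed; the opposite sign is also common.
    bias = sum(p - t for t, p in zip(y_true, y_pred)) / n
    return 100.0 * bias / mean_true


def cv_rmse(y_true, y_pred):
    """Root mean squared error, in percent of the mean observed value."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Assumption: the squared errors are averaged over n, not n - 1.
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return 100.0 * rmse / mean_true


observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
nmbe_value = nmbe(observed, predicted)
cv_rmse_value = cv_rmse(observed, predicted)
```

NMBE lets positive and negative errors cancel, so it measures systematic bias, whereas CV(RMSE) penalizes every deviation and measures overall scatter.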
- class corrai.surrogate.MultiModelSO(models=None, cv=3, scoring='neg_mean_squared_error', fine_tuning=True, tuning_n_iter=None, use_continuous_distributions=False, n_jobs=-1, random_state=None)[source]
Bases: BaseEstimator, RegressorMixin
Multi-model selection and optimization wrapper for scikit-learn regressors.
This class automates model training, cross-validation scoring, model selection, and optional fine-tuning via grid search. It compares multiple candidate models and selects the one with the best cross-validation performance according to a specified scoring metric.
- Parameters:
models (list[str]) – List of model keys to evaluate. Must be a subset of MODEL_MAP: "TREE_REGRESSOR", "RANDOM_FOREST", "LINEAR_REGRESSION", "LINEAR_SECOND_ORDER", "LINEAR_THIRD_ORDER", "SUPPORT_VECTOR", "MULTI_LAYER_PERCEPTRON". If None (default), all models in MODEL_MAP are evaluated.
cv (int) – Number of cross-validation folds to use for model comparison.
fine_tuning (bool) – If True, perform a grid search on the best model to fine-tune its hyperparameters.
scoring (str) – Scoring function to evaluate models. Should be a valid scikit-learn scorer string (e.g. "r2", "neg_mean_absolute_error").
n_jobs (int) – Number of parallel jobs for cross-validation and grid search. -1 means using all available cores.
random_state (int) – Random seed for reproducibility.
- Variables:
model_map (dict) – Dictionary mapping model keys to fitted estimator instances.
best_model_key (str) – The key of the best-performing model after training.
_is_fitted (bool) – Whether the estimator has been fitted.
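The selection step described above (score every candidate by cross-validation, keep the one with the best mean score) can be sketched in plain Python. This is a toy illustration and not corrai's implementation: the candidate keys MEAN_BASELINE and SINGLE_FEATURE_LINE, the helper functions, and the data are all hypothetical, and the folds are contiguous and unshuffled for simplicity.

```python
import statistics


def fit_mean(x, y):
    """Baseline candidate: always predict the training mean."""
    mean_y = statistics.fmean(y)
    return lambda x_new: [mean_y for _ in x_new]


def fit_line(x, y):
    """Candidate: ordinary least squares on a single feature."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x_new: [intercept + slope * xi for xi in x_new]


def neg_mse(y_true, y_pred):
    """Negated mean squared error, so that higher is better."""
    return -statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def k_fold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for contiguous folds."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if not start <= i < stop]
        yield train, test
        start = stop


def cross_val_scores(fit, x, y, n_folds=3):
    """Fit on each training fold and score on the held-out fold."""
    scores = []
    for train, test in k_fold_indices(len(x), n_folds):
        predict = fit([x[i] for i in train], [y[i] for i in train])
        y_pred = predict([x[i] for i in test])
        scores.append(neg_mse([y[i] for i in test], y_pred))
    return scores


candidates = {"MEAN_BASELINE": fit_mean, "SINGLE_FEATURE_LINE": fit_line}
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9, 13.2, 14.8, 17.1]

mean_scores = {
    key: statistics.fmean(cross_val_scores(fit, x, y, n_folds=3))
    for key, fit in candidates.items()
}
# With a "neg_*" scorer the best model is the one with the highest score.
best_model_key = max(mean_scores, key=mean_scores.get)
```

Because scorers such as neg_mean_squared_error are negated losses, taking the maximum of the mean scores picks the model with the smallest error.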
- fine_tune(X, y, model=None, verbose=3)[source]
Perform grid search hyperparameter tuning on a given model.
Examples
>>> import pandas as pd
>>> from sklearn.datasets import load_diabetes
>>> from sklearn.model_selection import train_test_split
>>> from corrai.surrogate import MultiModelSO
>>>
>>> data = load_diabetes(as_frame=True)
>>> X = data.data
>>> y = data.target
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.2, random_state=42
... )
>>>
>>> model = MultiModelSO(
...     models=["LINEAR_REGRESSION", "RANDOM_FOREST"], cv=5, fine_tuning=False
... )
>>> model.fit(X_train, y_train, verbose=True)
=== Training results ===
Cross validation neg_mean_squared_error scores of 5 folds
                   mean(neg_mean_squared_error)  std(neg_mean_squared_error)
RANDOM_FOREST                      -3143.015307                   355.466814
LINEAR_REGRESSION                  -3425.368758                   525.460964
>>> y_pred = model.predict(X_test)
>>> y_pred.head()
              0
287  139.547558
211  179.517208
72   134.038756
321  291.417029
73   123.789659

>>> # Fast configuration and training (development)
>>> model = MultiModelSO(
...     models=["LINEAR_REGRESSION", "RANDOM_FOREST", "MULTI_LAYER_PERCEPTRON"],
...     cv=3,
...     fine_tuning=True,
...     tuning_n_iter=TUNING_N_ITER_BY_MODEL,
...     use_continuous_distributions=False,
...     n_jobs=-1,
... )

>>> # Optimal configuration (production)
>>> model = MultiModelSO(
...     models=None,
...     cv=5,
...     fine_tuning=True,
...     tuning_n_iter=TUNING_N_ITER_BY_MODEL,
...     use_continuous_distributions=True,
...     n_jobs=-1,
...     random_state=42,
... )
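fine_tune is described above as a grid search over hyperparameters. The sketch below shows the general idea of an exhaustive grid search in plain Python: every combination in a parameter grid is scored and the best one is kept. It is not corrai's implementation. The toy nearest-neighbour regressor, the leave-one-out scoring, and the parameter names n_neighbors and weighted are assumptions chosen only to keep the example self-contained.

```python
import itertools


def knn_predict(x_train, y_train, x_new, n_neighbors, weighted):
    """Toy 1-D nearest-neighbour regressor used as the model to tune."""
    pairs = sorted(zip(x_train, y_train), key=lambda pair: abs(pair[0] - x_new))
    nearest = pairs[:n_neighbors]
    if not weighted:
        return sum(target for _, target in nearest) / len(nearest)
    # Inverse-distance weights; the small constant guards against zero distance.
    weights = [1.0 / (abs(xi - x_new) + 1e-9) for xi, _ in nearest]
    total = sum(w * target for w, (_, target) in zip(weights, nearest))
    return total / sum(weights)


def leave_one_out_neg_mse(x, y, **params):
    """Score one parameter combination; higher (closer to zero) is better."""
    errors = []
    for i in range(len(x)):
        x_train, y_train = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        errors.append((y[i] - knn_predict(x_train, y_train, x[i], **params)) ** 2)
    return -sum(errors) / len(errors)


def grid_search(x, y, param_grid):
    """Evaluate every combination in param_grid and return the best one."""
    names = list(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((leave_one_out_neg_mse(x, y, **params), params))
    best_score, best_params = max(results, key=lambda item: item[0])
    return best_params, best_score, results


x = [float(i) for i in range(10)]
y = [xi ** 2 for xi in x]
param_grid = {"n_neighbors": [1, 2, 3], "weighted": [False, True]}
best_params, best_score, results = grid_search(x, y, param_grid)
```

The cost of a grid search grows with the product of the grid sizes (here 3 x 2 = 6 evaluations), which is why fine-tuning only the best model keeps the search affordable.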
- __init__(models=None, cv=3, scoring='neg_mean_squared_error', fine_tuning=True, tuning_n_iter=None, use_continuous_distributions=False, n_jobs=-1, random_state=None)[source]
- property feature_names_in_
- set_fit_request(*, verbose: bool | None | str = '$UNCHANGED$') -> MultiModelSO
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
verbose (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for verbose parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, model: bool | None | str = '$UNCHANGED$') -> MultiModelSO
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
model (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for model parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> MultiModelSO
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
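The four request options listed for the set_*_request methods (True, False, None, or a string alias) can be modelled with a small toy router. The code below is only a simplified mental model written for this page. It does not reproduce scikit-learn's routing machinery, and the functions update_requests and route are hypothetical names.

```python
UNCHANGED = "$UNCHANGED$"


def update_requests(current, **changes):
    """Return a copy of the requests; UNCHANGED keeps the existing value."""
    updated = dict(current)
    for name, value in changes.items():
        if value != UNCHANGED:
            updated[name] = value
    return updated


def route(requests, user_metadata):
    """Return the keyword arguments a toy meta-estimator would forward."""
    forwarded = {}
    for name, request in requests.items():
        if request is True:
            # Requested: forward it if the user supplied it, otherwise ignore.
            if name in user_metadata:
                forwarded[name] = user_metadata[name]
        elif request is False:
            # Not requested: never forwarded.
            continue
        elif request is None:
            # Not requested, and supplying it is treated as an error.
            if name in user_metadata:
                raise ValueError(f"{name!r} was provided but not requested")
        else:
            # String alias: the user passes the alias, the method gets the real name.
            if request in user_metadata:
                forwarded[name] = user_metadata[request]
    return forwarded


requests = {"sample_weight": None, "verbose": False}
requests = update_requests(requests, sample_weight="weights", verbose=UNCHANGED)
forwarded = route(requests, {"weights": [1.0, 2.0, 3.0]})
```

Here the alias "weights" is what the caller passes to the meta-estimator, while the wrapped method still receives the value under its original name sample_weight.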
- class corrai.surrogate.StaticScikitModel(scikit_model, target_name=None)[source]
Bases: Model
Wrapper class for static surrogate MultiModelSingleOutput class and scikit-learn regressors within the Corrai framework.
This class adapts corrai’s MultiModelSO and scikit-learn models to the Model interface, enabling parameter-to-property mapping and simulation execution. It is intended for non-dynamic (static) models where outputs are single values or vectors rather than time-dependent series.
- Parameters:
scikit_model (MultiModelSO | RegressorMixin) – The underlying scikit-learn model or a Corrai MultiModelSO meta-estimator.
target_name (str) – Name of the output variable. Required when scikit_model is not an instance of MultiModelSO.
- Variables:
is_dynamic (bool) – Always False for this wrapper, since it represents static models.
scikit_model (MultiModelSO or RegressorMixin) – The wrapped scikit-learn model used for predictions.
target_name (str) – Output variable name.
- Raises:
ValueError – If target_name cannot be inferred and is not provided.
- simulate(property_dict=None, simulation_options=None, **simulation_kwargs)[source]
Run the scikit-learn model prediction.
Combines provided parameter values and simulation options into a feature vector, validates compatibility with the underlying model, and returns predictions as a pandas Series.
- Parameters:
property_dict (dict[str, str | int | float]) – Mapping from feature names to values to use for prediction.
simulation_options (dict) – Additional feature overrides or configuration parameters to include in the feature vector. These values override those in property_dict if keys overlap.
**simulation_kwargs – Extra keyword arguments for future extensions (currently unused).
- Returns:
Prediction results with index [self.target_name].
- Return type:
Series
- Raises:
ValueError – If unknown feature names are provided.
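The merge-and-validate behaviour described for simulate can be sketched without corrai or pandas. Everything below is a hypothetical stand-in: build_feature_row, toy_simulate, toy_predict, and the feature names are invented for illustration, a plain dict replaces the returned Series, and rejecting missing features is an assumption that the description above does not state.

```python
def build_feature_row(feature_names, property_dict=None, simulation_options=None):
    """Merge both mappings into one ordered feature row."""
    # simulation_options is unpacked last, so it wins when keys overlap.
    merged = {**(property_dict or {}), **(simulation_options or {})}
    unknown = sorted(set(merged) - set(feature_names))
    if unknown:
        raise ValueError(f"Unknown feature names: {unknown}")
    # Assumption: a feature without a value is also rejected.
    missing = [name for name in feature_names if name not in merged]
    if missing:
        raise ValueError(f"Missing feature values: {missing}")
    return [merged[name] for name in feature_names]


def toy_simulate(predict, feature_names, target_name,
                 property_dict=None, simulation_options=None):
    """Build the row, call the model, and label the result with target_name."""
    row = build_feature_row(feature_names, property_dict, simulation_options)
    return {target_name: predict(row)}


def toy_predict(row):
    """Stand-in for a fitted regressor's prediction on one row."""
    wall_u_value, window_ratio = row
    return 50.0 * wall_u_value + 100.0 * window_ratio


result = toy_simulate(
    toy_predict,
    feature_names=["wall_u_value", "window_ratio"],
    target_name="heating_demand",
    property_dict={"wall_u_value": 0.5, "window_ratio": 0.2},
    simulation_options={"window_ratio": 0.4},
)
```

Ordering the row by the model's own feature names, rather than by the order of the input dictionaries, keeps each value aligned with the column the model was trained on.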