mercurial.utils.validation module

Validation utilities: cross‑validation, regularization, early stopping.

class mercurial.utils.validation.CrossValidator(n_splits: int = 5, shuffle: bool = True, random_seed: int = 42)

Bases: object

k‑fold cross‑validation for Atlas case simulations.

Splits cases into training and validation folds, runs simulations, and returns performance metrics per fold.

Methods

validate(case_functions, parameter_sets[, ...])

Perform cross‑validation.

validate(case_functions: Dict[str, Callable], parameter_sets: List[Dict[str, Any]], metric: Callable[[Dict], float] = <function CrossValidator.<lambda>>) -> Dict[str, Any]

Perform cross‑validation.

Parameters:
case_functions : dict

Mapping case_name -> function that runs simulation and returns results dict.

parameter_sets : list of dict

Different parameter configurations to evaluate (e.g., different λ values).

metric : callable

Function to compute accuracy/score from simulation results.

Returns:
results : dict

Contains fold metrics, mean scores, std devs per parameter set.
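The fold bookkeeping described above can be sketched in plain Python. This is a minimal illustration, not the CrossValidator implementation: the names k_fold_indices and cross_validate_sketch are hypothetical, each case function is assumed to accept one parameter dict, and any fitting on the training folds is omitted because the reference does not describe it.

```python
import random
from statistics import mean, pstdev
from typing import Any, Callable, Dict, List


def k_fold_indices(n_items: int, n_splits: int = 5, shuffle: bool = True,
                   random_seed: int = 42) -> List[List[int]]:
    """Partition range(n_items) into n_splits disjoint validation folds."""
    if n_items < n_splits:
        raise ValueError("need at least one case per fold")
    indices = list(range(n_items))
    if shuffle:
        # A seeded generator keeps the split reproducible.
        random.Random(random_seed).shuffle(indices)
    # Striding spreads any remainder over the first folds.
    return [indices[i::n_splits] for i in range(n_splits)]


def cross_validate_sketch(
    case_functions: Dict[str, Callable[[Dict[str, Any]], Dict]],
    parameter_sets: List[Dict[str, Any]],
    metric: Callable[[Dict], float],
    n_splits: int = 5,
) -> List[Dict[str, Any]]:
    """Score every parameter set on the held-out cases of each fold."""
    names = sorted(case_functions)
    folds = k_fold_indices(len(names), n_splits)
    results = []
    for params in parameter_sets:
        # Assumption: a case function is called with the parameter dict.
        fold_scores = [
            mean(metric(case_functions[names[i]](params)) for i in fold)
            for fold in folds
        ]
        results.append({
            "params": params,
            "fold_scores": fold_scores,
            "mean": mean(fold_scores),
            "std": pstdev(fold_scores),
        })
    return results
```

With ten cases and five splits, every fold holds two cases, so the mean of the fold means equals the mean over all cases.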

class mercurial.utils.validation.EarlyStopping(patience: int = 5, min_delta: float = 0.0001)

Bases: object

Early stopping to prevent overfitting during iterative calibration. Stops when validation loss stops improving.

Methods

step(current_loss, current_params)

Update state.

reset()

reset()

step(current_loss: float, current_params: Any) -> bool

Update state. Returns True if training should continue, False if it should stop.
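A patience-based stopping rule of this kind can be sketched as follows. The class EarlyStoppingSketch is hypothetical and only mirrors the documented signature. The improvement test (loss must drop by more than min_delta) and the stored best_params attribute are assumptions about typical behaviour, not confirmed details of EarlyStopping.

```python
from typing import Any, Optional


class EarlyStoppingSketch:
    """Stop once the loss has failed to improve for `patience` steps."""

    def __init__(self, patience: int = 5, min_delta: float = 1e-4) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.reset()

    def reset(self) -> None:
        """Forget the best loss seen so far and the stale-step counter."""
        self.best_loss = float("inf")
        self.best_params: Optional[Any] = None
        self.stale_steps = 0

    def step(self, current_loss: float, current_params: Any) -> bool:
        """Record one evaluation; return True to continue, False to stop."""
        # Assumption: an improvement must exceed min_delta to count.
        if current_loss < self.best_loss - self.min_delta:
            self.best_loss = current_loss
            self.best_params = current_params
            self.stale_steps = 0
        else:
            self.stale_steps += 1
        return self.stale_steps < self.patience
```

In a calibration loop, the caller would break out as soon as step(...) returns False and then read the best parameters from the stopper.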

class mercurial.utils.validation.Regularization(lambda_reg: float = 0.01)

Bases: object

L2 regularization (ridge) for free energy or loss function. Adds λ * ||θ||₂² to the objective.

Methods

gradient(parameters)

Gradient of penalty: 2λ * θ.

penalty(parameters)

Compute L2 penalty: λ * Σ θ_i².

gradient(parameters: ndarray) -> ndarray

Gradient of penalty: 2λ * θ.

penalty(parameters: ndarray) -> float

Compute L2 penalty: λ * Σ θ_i².
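The two formulas can be checked with a few lines of plain Python. The functions l2_penalty and l2_gradient below are hypothetical stand-ins that operate on lists rather than ndarrays so the sketch runs without NumPy. They implement exactly λ * Σ θ_i² and 2λ * θ as stated above.

```python
from typing import List, Sequence


def l2_penalty(parameters: Sequence[float], lambda_reg: float = 0.01) -> float:
    """Return lambda * sum(theta_i ** 2)."""
    return lambda_reg * sum(theta * theta for theta in parameters)


def l2_gradient(parameters: Sequence[float], lambda_reg: float = 0.01) -> List[float]:
    """Return the gradient of the penalty, 2 * lambda * theta, per component."""
    return [2.0 * lambda_reg * theta for theta in parameters]


def finite_difference(parameters: Sequence[float], index: int,
                      lambda_reg: float = 0.01, eps: float = 1e-6) -> float:
    """Central-difference estimate of d(penalty)/d(theta_index)."""
    up = list(parameters)
    down = list(parameters)
    up[index] += eps
    down[index] -= eps
    return (l2_penalty(up, lambda_reg) - l2_penalty(down, lambda_reg)) / (2 * eps)
```

The finite-difference helper is included only to confirm that the analytic gradient matches the penalty. For θ = (3, 4) and λ = 0.1, the penalty is 0.1 * 25 = 2.5 and the gradient is (0.6, 0.8).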

mercurial.utils.validation.bootstrap_confidence_intervals(scores: List[float], n_resamples: int = 1000, ci: float = 0.95) -> Tuple[float, float, float]

Compute bootstrap CI for mean of scores. Returns (mean, lower, upper).
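One common way to obtain such an interval is the percentile bootstrap, sketched below. The function bootstrap_ci_sketch and its seed argument are hypothetical. Whether the library uses the percentile method or another bootstrap variant is not stated in this reference.

```python
import random
from statistics import mean
from typing import List, Tuple


def bootstrap_ci_sketch(scores: List[float], n_resamples: int = 1000,
                        ci: float = 0.95, seed: int = 0) -> Tuple[float, float, float]:
    """Percentile-bootstrap interval for the mean; returns (mean, lower, upper)."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    resampled_means = sorted(
        mean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - ci) / 2.0
    lower_idx = int(alpha * n_resamples)
    upper_idx = min(n_resamples - 1, int((1.0 - alpha) * n_resamples))
    return mean(scores), resampled_means[lower_idx], resampled_means[upper_idx]
```

Both bounds are means of values drawn from scores, so they always lie between the smallest and largest score.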

mercurial.utils.validation.generate_permuted_scores(model_func, X: ndarray, y: ndarray, n_permutations: int = 100, cv_folds: int = 5) -> List[List[float]]

Generate permuted scores by shuffling the target variable y.

Parameters:
model_func : callable

Function that takes (X_train, y_train, X_test) and returns predictions.

X : np.ndarray

Feature matrix.

y : np.ndarray

Target values (to be shuffled).

n_permutations : int

Number of permutations.

cv_folds : int

Number of cross‑validation folds.

Number of cross‑validation folds.

Returns:
list of list

For each permutation, a list of fold scores.
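The shuffle-then-cross-validate procedure might look like the sketch below, written with plain lists instead of ndarrays. The function permuted_scores_sketch and its seed argument are hypothetical. Scoring each fold by negative mean absolute error and assigning folds by striding are assumptions, since the reference does not say which score or split the library uses.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def permuted_scores_sketch(
    model_func: Callable[[list, list, list], Sequence[float]],
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    n_permutations: int = 100,
    cv_folds: int = 5,
    seed: int = 0,
) -> List[List[float]]:
    """Return one list of fold scores per shuffled copy of y."""
    rng = random.Random(seed)
    n = len(y)
    all_scores = []
    for _ in range(n_permutations):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the link between X and y
        fold_scores = []
        for k in range(cv_folds):
            test_idx = list(range(k, n, cv_folds))
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            preds = model_func(
                [X[i] for i in train_idx],
                [y_perm[i] for i in train_idx],
                [X[i] for i in test_idx],
            )
            # Assumed score: negative MAE, so that higher is better.
            mae = mean(abs(p - y_perm[i]) for p, i in zip(preds, test_idx))
            fold_scores.append(-mae)
        all_scores.append(fold_scores)
    return all_scores
```

The nested list has the shape that permutation_test expects for shuffled_scores: one inner list of cv_folds scores per permutation.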

mercurial.utils.validation.learning_curve_analysis(train_scores: List[float], val_scores: List[float], train_sizes: List[int], baseline_error: float | None = None) -> Dict

Analyze learning curves for underfitting signs using relative metrics.

Parameters:
train_scores : list

Training scores (e.g., MAE) at different training set sizes.

val_scores : list

Validation scores.

train_sizes : list

Number of training samples used.

baseline_error : float, optional

Error of trivial baseline (e.g., predict mean). If provided, used to judge underfitting.

Returns:
dict

Contains:

- ‘converged’: bool (if validation score plateaued)
- ‘gap’: float (final train - val gap)
- ‘underfitting_suspected’: bool (if model is not clearly better than baseline)
- ‘final_val_score’: float
- ‘improvement_over_baseline’: float (if baseline provided)

mercurial.utils.validation.permutation_test(actual_scores: List[float], shuffled_scores: List[List[float]], n_permutations: int = 1000) -> Dict[str, float]

Perform a permutation test to assess statistical significance.

Parameters:
actual_scores : list of float

Model accuracy scores on the original data (e.g., cross‑validation folds).

shuffled_scores : list of list of float

For each permutation, a list of scores (same length as actual_scores) obtained by shuffling the relationship between inputs and outputs.

n_permutations : int

Number of permutations performed.

Returns:
dict

Contains ‘p_value’ (two‑tailed), ‘mean_shuffled’, ‘std_shuffled’, ‘original_mean’, and ‘is_significant’ (True if p < 0.05).
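A two-tailed permutation p-value over mean scores could be computed as in the sketch below. The function permutation_test_sketch is hypothetical. It assumes the test statistic is the mean score, measures extremeness as distance from the mean of the shuffled means, and applies the common +1 correction to the count. The library may differ on any of these points.

```python
from statistics import mean, pstdev
from typing import Dict, List


def permutation_test_sketch(actual_scores: List[float],
                            shuffled_scores: List[List[float]],
                            alpha: float = 0.05) -> Dict[str, float]:
    """Two-tailed test: is the original mean unusual among shuffled means?"""
    original_mean = mean(actual_scores)
    perm_means = [mean(scores) for scores in shuffled_scores]
    mean_shuffled = mean(perm_means)
    observed = abs(original_mean - mean_shuffled)
    # Count shuffled runs at least as far from the null centre as the original.
    extreme = sum(1 for m in perm_means if abs(m - mean_shuffled) >= observed)
    # Assumed +1 correction, so the p-value is never exactly zero.
    p_value = (extreme + 1) / (len(perm_means) + 1)
    return {
        "p_value": p_value,
        "mean_shuffled": mean_shuffled,
        "std_shuffled": pstdev(perm_means),
        "original_mean": original_mean,
        "is_significant": p_value < alpha,
    }
```

With this correction the smallest attainable p-value is 1 / (number of permutations + 1), so at least 20 permutations are needed before a result can fall below 0.05.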

mercurial.utils.validation.underfitting_detection(train_errors: List[float], val_errors: List[float], baseline_error: float, threshold_ratio: float = 0.8) -> Dict

Detect underfitting by comparing model performance to baseline.

Parameters:
train_errors : list

Training errors (e.g., MAE) for each fold or epoch.

val_errors : list

Validation errors (same length).

baseline_error : float

Error of a trivial baseline (e.g., predicting mean).

threshold_ratio : float

If model’s validation error > baseline_error * threshold_ratio, consider it underfitting (baseline is better or comparable).

Returns:
dict

Contains:

- ‘is_underfitting’: bool
- ‘reason’: str
- ‘model_val_error’: float
- ‘baseline_error’: float
- ‘improvement_ratio’: float (baseline / model error, >1 means model better)
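The threshold rule can be illustrated with a short sketch. The function underfitting_sketch is hypothetical. It assumes that the model's validation error is the mean of val_errors and it reports the mean training error only in the reason string. The reference does not say how the library aggregates either list.

```python
from statistics import mean
from typing import Any, Dict, List


def underfitting_sketch(train_errors: List[float], val_errors: List[float],
                        baseline_error: float,
                        threshold_ratio: float = 0.8) -> Dict[str, Any]:
    """Flag underfitting when validation error exceeds the scaled baseline."""
    model_val_error = mean(val_errors)  # assumption: aggregate by mean
    cutoff = baseline_error * threshold_ratio
    is_underfitting = model_val_error > cutoff
    if model_val_error > 0:
        improvement_ratio = baseline_error / model_val_error
    else:
        improvement_ratio = float("inf")
    comparison = "above" if is_underfitting else "at or below"
    reason = "validation error %.4g is %s cutoff %.4g (mean train error %.4g)" % (
        model_val_error, comparison, cutoff, mean(train_errors))
    return {
        "is_underfitting": is_underfitting,
        "reason": reason,
        "model_val_error": model_val_error,
        "baseline_error": baseline_error,
        "improvement_ratio": improvement_ratio,
    }
```

With a baseline error of 1.2 and the default ratio of 0.8, the cutoff is 0.96: a mean validation error of 1.0 is flagged even though it beats the baseline, while 0.5 is not.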