mercurial.utils.validation module
Validation utilities: cross‑validation, regularization, early stopping.
- class mercurial.utils.validation.CrossValidator(n_splits: int = 5, shuffle: bool = True, random_seed: int = 42)
Bases: object
k‑fold cross‑validation for Atlas case simulations.
Splits cases into training and validation folds, runs simulations, and returns performance metrics per fold.
Methods
validate(case_functions, parameter_sets[, ...]) - Perform cross‑validation.
- validate(case_functions: Dict[str, Callable], parameter_sets: List[Dict[str, Any]], metric: Callable[[Dict], float] = <function CrossValidator.<lambda>>) -> Dict[str, Any]
Perform cross‑validation.
- Parameters:
- case_functions : dict
Mapping from case name to a function that runs the simulation and returns a results dict.
- parameter_sets : list of dict
Different parameter configurations to evaluate (e.g., different λ values).
- metric : callable
Function that computes an accuracy/score value from a simulation results dict.
- Returns:
- results : dict
Contains per‑fold metrics plus the mean and standard deviation of the scores for each parameter set.
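The fold construction described above can be sketched as follows. This is a minimal illustration of k-fold splitting over case names, not the actual CrossValidator implementation; the helper name kfold_case_splits and the round-robin fold assignment are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple


def kfold_case_splits(
    case_names: Sequence[str],
    n_splits: int = 5,
    shuffle: bool = True,
    random_seed: int = 42,
) -> List[Tuple[List[str], List[str]]]:
    """Return (train_cases, val_cases) pairs, one per fold (hypothetical helper)."""
    names = list(case_names)
    if n_splits < 2 or n_splits > len(names):
        raise ValueError("n_splits must be between 2 and the number of cases")
    if shuffle:
        # A seeded generator keeps the split reproducible across runs.
        random.Random(random_seed).shuffle(names)
    # Round-robin assignment: fold i receives every n_splits-th case.
    folds = [names[i::n_splits] for i in range(n_splits)]
    splits = []
    for i, val in enumerate(folds):
        train = [name for j, fold in enumerate(folds) if j != i for name in fold]
        splits.append((train, val))
    return splits


cases = [f"case_{i}" for i in range(10)]
for train, val in kfold_case_splits(cases, n_splits=5):
    print(len(train), val)
```

Each case appears in exactly one validation fold, so every case is scored once on a model that did not see it during training.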
- class mercurial.utils.validation.EarlyStopping(patience: int = 5, min_delta: float = 0.0001)
Bases: object
Early stopping to prevent overfitting during iterative calibration. Stops when validation loss stops improving.
Methods
step(current_loss, current_params) - Update state.
reset
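A patience-based stopping rule of this kind might look like the sketch below. It is not the library's code: the class name EarlyStoppingSketch, the boolean return value of step, and the attributes best_loss, best_params, and stale_steps are assumptions. The reference above only states that step updates state and that reset exists.

```python
from typing import Any, Optional


class EarlyStoppingSketch:
    """Illustrative patience-based early stopping (hypothetical class)."""

    def __init__(self, patience: int = 5, min_delta: float = 1e-4) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.reset()

    def reset(self) -> None:
        """Forget all history so the object can be reused."""
        self.best_loss = float("inf")
        self.best_params: Optional[Any] = None
        self.stale_steps = 0

    def step(self, current_loss: float, current_params: Any) -> bool:
        """Record one validation loss; return True when training should stop."""
        if current_loss < self.best_loss - self.min_delta:
            # Improvement larger than min_delta: remember it and reset the counter.
            self.best_loss = current_loss
            self.best_params = current_params
            self.stale_steps = 0
        else:
            self.stale_steps += 1
        return self.stale_steps >= self.patience


stopper = EarlyStoppingSketch(patience=3, min_delta=1e-4)
losses = [1.0, 0.8, 0.79, 0.795, 0.80, 0.81]
stopped_at = None
for i, loss in enumerate(losses):
    if stopper.step(loss, {"iteration": i}):
        stopped_at = i
        break
print(stopped_at, stopper.best_loss, stopper.best_params)
```

Keeping the best parameters alongside the best loss lets the caller roll back to the last improving iteration after stopping.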
- class mercurial.utils.validation.Regularization(lambda_reg: float = 0.01)
Bases: object
L2 regularization (ridge) for free energy or loss function. Adds λ * ||θ||₂² to the objective.
Methods
gradient(parameters) - Gradient of penalty: 2λ * θ.
penalty(parameters) - Compute L2 penalty: λ * Σ θ_i².
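The two formulas above, λ * Σ θ_i² and 2λ * θ, can be written out directly. The standalone functions l2_penalty and l2_gradient below are hypothetical stand-ins for the class methods, and plain Python sequences are assumed for the parameters.

```python
from typing import List, Sequence


def l2_penalty(parameters: Sequence[float], lambda_reg: float = 0.01) -> float:
    """Return lambda * sum(theta_i ** 2)."""
    return lambda_reg * sum(theta * theta for theta in parameters)


def l2_gradient(parameters: Sequence[float], lambda_reg: float = 0.01) -> List[float]:
    """Return the element-wise gradient of the penalty, 2 * lambda * theta_i."""
    return [2.0 * lambda_reg * theta for theta in parameters]


theta = [1.0, -2.0, 0.5]
print(l2_penalty(theta, lambda_reg=0.1))   # approximately 0.525
print(l2_gradient(theta, lambda_reg=0.1))  # approximately [0.2, -0.4, 0.1]
```

The gradient points away from zero in every coordinate, so subtracting it during optimization shrinks each parameter toward zero in proportion to its size.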
- mercurial.utils.validation.bootstrap_confidence_intervals(scores: List[float], n_resamples: int = 1000, ci: float = 0.95) -> Tuple[float, float, float]
Compute a bootstrap confidence interval for the mean of scores. Returns (mean, lower, upper).
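One common way to obtain such an interval is the percentile bootstrap, sketched below. Whether the library uses the percentile method is not stated in this reference; the function name bootstrap_ci_sketch and the seed argument are assumptions added for reproducibility.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_ci_sketch(
    scores: Sequence[float],
    n_resamples: int = 1000,
    ci: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Percentile-bootstrap CI for the mean; returns (mean, lower, upper)."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - ci) / 2.0
    lower = means[int(alpha * (n_resamples - 1))]
    upper = means[int((1.0 - alpha) * (n_resamples - 1))]
    return statistics.fmean(scores), lower, upper


mean, lower, upper = bootstrap_ci_sketch([0.80, 0.82, 0.78, 0.85, 0.79])
print(f"mean={mean:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

With only a handful of fold scores the interval is coarse, because the resampled means can take only a limited number of distinct values.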
- mercurial.utils.validation.generate_permuted_scores(model_func, X: ndarray, y: ndarray, n_permutations: int = 100, cv_folds: int = 5) -> List[List[float]]
Generate permuted scores by shuffling the target variable y.
- Parameters:
- model_func : callable
Function that takes (X_train, y_train, X_test) and returns predictions.
- X : np.ndarray
Feature matrix.
- y : np.ndarray
Target values (to be shuffled).
- n_permutations : int
Number of permutations.
- cv_folds : int
Number of cross‑validation folds.
Number of cross‑validation folds.
- Returns:
- list of list
For each permutation, a list of fold scores.
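The procedure can be illustrated with the following sketch, which shuffles y, runs a k-fold loop, and scores each fold. Several details are assumptions because the reference does not specify them: plain Python lists stand in for the NumPy arrays, the fold score is the mean absolute error, and the names permuted_scores_sketch, mean_model, and seed are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def permuted_scores_sketch(
    model_func: Callable,
    X: Sequence,
    y: Sequence[float],
    n_permutations: int = 100,
    cv_folds: int = 5,
    seed: int = 0,
) -> List[List[float]]:
    """For each permutation of y, return one score (assumed: MAE) per fold."""
    rng = random.Random(seed)
    n = len(y)
    folds = [list(range(i, n, cv_folds)) for i in range(cv_folds)]
    all_scores = []
    for _ in range(n_permutations):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the relationship between X and y
        fold_scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            preds = model_func(
                [X[i] for i in train_idx],
                [y_perm[i] for i in train_idx],
                [X[i] for i in test_idx],
            )
            errors = [abs(p - y_perm[i]) for p, i in zip(preds, test_idx)]
            fold_scores.append(sum(errors) / len(errors))
        all_scores.append(fold_scores)
    return all_scores


def mean_model(X_train, y_train, X_test):
    """Trivial model for the demo: always predict the training mean."""
    mean_y = sum(y_train) / len(y_train)
    return [mean_y for _ in X_test]


X = [[float(i)] for i in range(10)]
y = [2.0 * i for i in range(10)]
scores = permuted_scores_sketch(mean_model, X, y, n_permutations=3, cv_folds=5)
print(len(scores), len(scores[0]))
```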
- mercurial.utils.validation.learning_curve_analysis(train_scores: List[float], val_scores: List[float], train_sizes: List[int], baseline_error: float | None = None) -> Dict
Analyze learning curves for underfitting signs using relative metrics.
- Parameters:
- train_scores : list
Training scores (e.g., MAE) at different training set sizes.
- val_scores : list
Validation scores.
- train_sizes : list
Number of training samples used.
- baseline_error : float, optional
Error of trivial baseline (e.g., predict mean). If provided, used to judge underfitting.
- Returns:
- dict
Contains:
  - 'converged': bool (if validation score plateaued)
  - 'gap': float (final train - val gap)
  - 'underfitting_suspected': bool (if model is not clearly better than baseline)
  - 'final_val_score': float
  - 'improvement_over_baseline': float (if baseline provided)
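A few of the returned fields can be illustrated with a short sketch. The plateau rule (relative change between the last two validation scores below rel_tol) and the definition of improvement_over_baseline as a relative error reduction are assumptions, since the reference does not give the formulas. The 'underfitting_suspected' flag is left out for the same reason, and learning_curve_sketch is a hypothetical name.

```python
from typing import Dict, List, Optional


def learning_curve_sketch(
    train_scores: List[float],
    val_scores: List[float],
    train_sizes: List[int],
    baseline_error: Optional[float] = None,
    rel_tol: float = 0.05,  # assumed plateau tolerance, not a library parameter
) -> Dict:
    """Summarise a learning curve under the assumptions stated above."""
    if not (len(train_scores) == len(val_scores) == len(train_sizes) >= 2):
        raise ValueError("inputs must have equal length and at least two points")
    final_val = val_scores[-1]
    prev_val = val_scores[-2]
    report = {
        # Assumed rule: the last step changed the validation score by < rel_tol.
        "converged": abs(final_val - prev_val) <= rel_tol * abs(prev_val),
        # Final train score minus final validation score, as documented.
        "gap": train_scores[-1] - final_val,
        "final_val_score": final_val,
    }
    if baseline_error is not None and baseline_error > 0:
        # Assumed definition: fraction of the baseline error removed by the model.
        report["improvement_over_baseline"] = (
            baseline_error - final_val
        ) / baseline_error
    return report


report = learning_curve_sketch(
    train_scores=[0.10, 0.15, 0.18, 0.19],
    val_scores=[0.50, 0.30, 0.22, 0.215],
    train_sizes=[50, 100, 200, 400],
    baseline_error=0.4,
)
print(report)
```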
- mercurial.utils.validation.permutation_test(actual_scores: List[float], shuffled_scores: List[List[float]], n_permutations: int = 1000) -> Dict[str, float]
Perform a permutation test to assess statistical significance.
- Parameters:
- actual_scores : list of float
Model accuracy scores on the original data (e.g., cross‑validation folds).
- shuffled_scores : list of list of float
For each permutation, a list of scores (same length as actual_scores) obtained by shuffling the relationship between inputs and outputs.
- n_permutations : int
Number of permutations performed.
- Returns:
- dict
Contains 'p_value' (two‑tailed), 'mean_shuffled', 'std_shuffled', 'original_mean', and 'is_significant' (True if p < 0.05).
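A two-tailed permutation p-value can be computed along the lines below. This is a sketch under stated assumptions rather than the library's implementation: the test statistic is the mean score, extremeness is measured as distance from the mean of the shuffled distribution, the p-value uses the (count + 1) / (n + 1) convention, and the number of permutations is taken from len(shuffled_scores) instead of a separate argument. The name permutation_test_sketch is hypothetical.

```python
import random
import statistics
from typing import Dict, List, Sequence


def permutation_test_sketch(
    actual_scores: Sequence[float],
    shuffled_scores: Sequence[Sequence[float]],
    alpha: float = 0.05,
) -> Dict[str, float]:
    """Two-tailed permutation test on the mean score (illustrative only)."""
    original_mean = statistics.fmean(actual_scores)
    perm_means: List[float] = [statistics.fmean(s) for s in shuffled_scores]
    mean_shuffled = statistics.fmean(perm_means)
    observed = abs(original_mean - mean_shuffled)
    # Count permutations at least as far from the null centre as the real result.
    extreme = sum(1 for m in perm_means if abs(m - mean_shuffled) >= observed)
    p_value = (extreme + 1) / (len(perm_means) + 1)
    return {
        "p_value": p_value,
        "mean_shuffled": mean_shuffled,
        "std_shuffled": statistics.pstdev(perm_means),
        "original_mean": original_mean,
        "is_significant": p_value < alpha,
    }


rng = random.Random(0)
actual = [0.90, 0.92, 0.88, 0.91, 0.89]
# Synthetic chance-level scores, used here only to exercise the function.
shuffled = [[rng.uniform(0.4, 0.6) for _ in actual] for _ in range(99)]
result = permutation_test_sketch(actual, shuffled)
print(result["p_value"], result["is_significant"])
```

With the +1 correction the smallest attainable p-value is 1 / (n + 1), so at least 20 permutations are needed before a result can fall below 0.05.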
- mercurial.utils.validation.underfitting_detection(train_errors: List[float], val_errors: List[float], baseline_error: float, threshold_ratio: float = 0.8) -> Dict
Detect underfitting by comparing model performance to baseline.
- Parameters:
- train_errors : list
Training errors (e.g., MAE) for each fold or epoch.
- val_errors : list
Validation errors (same length).
- baseline_error : float
Error of a trivial baseline (e.g., predicting mean).
- threshold_ratio : float
If the model's validation error exceeds baseline_error * threshold_ratio, the model is considered underfitting (the baseline is better or comparable).
- Returns:
- dict
Contains:
  - 'is_underfitting': bool
  - 'reason': str
  - 'model_val_error': float
  - 'baseline_error': float
  - 'improvement_ratio': float (baseline / model error, >1 means model better)
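The threshold rule and the improvement ratio can be sketched as below. Aggregating val_errors by their mean is an assumption, as is the wording of the 'reason' string. train_errors is omitted because the reference does not say how it enters the decision, and underfitting_sketch is a hypothetical name.

```python
import statistics
from typing import Dict, Sequence


def underfitting_sketch(
    val_errors: Sequence[float],
    baseline_error: float,
    threshold_ratio: float = 0.8,
) -> Dict:
    """Flag underfitting when the mean validation error is close to the baseline."""
    model_val_error = statistics.fmean(val_errors)  # assumed aggregation: mean
    threshold = baseline_error * threshold_ratio
    is_underfitting = model_val_error > threshold
    if model_val_error > 0:
        improvement_ratio = baseline_error / model_val_error
    else:
        improvement_ratio = float("inf")  # a zero-error model beats any baseline
    comparison = "exceeds" if is_underfitting else "is within"
    return {
        "is_underfitting": is_underfitting,
        "reason": f"validation error {model_val_error:.3f} {comparison} "
                  f"threshold {threshold:.3f}",
        "model_val_error": model_val_error,
        "baseline_error": baseline_error,
        "improvement_ratio": improvement_ratio,
    }


print(underfitting_sketch([0.9, 1.0, 1.1], baseline_error=1.0))
print(underfitting_sketch([0.4, 0.5, 0.6], baseline_error=1.0))
```

With the default threshold_ratio of 0.8, a model must reduce the baseline error by at least 20% to avoid being flagged.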