skdatasets.utils.experiment.create_experiments

skdatasets.utils.experiment.create_experiments(*, datasets: Mapping[str, Union[Bunch, Callable[..., Bunch], Tuple[Callable[..., Bunch], Union[Mapping[str, Any], str]]]], estimators: Mapping[str, Union[EstimatorProtocol[Any, Any], Callable[..., EstimatorProtocol[Any, Any]], Tuple[Callable[..., EstimatorProtocol[Any, Any]], Union[Mapping[str, Any], str]]]], storage: sacred.observers.base.RunObserver | str, config: Optional[Union[Mapping[str, Any], str]] = None, inner_cv: Union[CVSplitter, Iterable[Tuple[ndarray[Any, dtype[int]], ndarray[Any, dtype[int]]]], int, None, Literal[False, 'dataset']] = False, outer_cv: Union[CVSplitter, Iterable[Tuple[ndarray[Any, dtype[int]], ndarray[Any, dtype[int]]]], int, None, Literal[False, 'dataset']] = None, save_estimator: bool = False, save_train: bool = False) → Sequence[Experiment]

Create several Sacred experiments.

It receives a set of estimators and datasets and creates Sacred experiment objects for them.

Parameters:
  • datasets (Mapping) –

    Mapping where each key is the name for a dataset and each value is either:

    • A sklearn.utils.Bunch with the fields explained in Dataset structure. Only data and target are mandatory.

    • A function receiving arbitrary config values and returning a sklearn.utils.Bunch object like the one explained above.

    • A tuple with such a function and additional configuration (either a mapping or a filename).
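A minimal sketch of the three accepted forms of a datasets entry, assuming scikit-learn and NumPy are installed. The factory make_iris and the field names train_indices and test_indices are illustrative assumptions (check them against the Dataset structure section for your skdatasets version); only data and target are documented as mandatory.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.utils import Bunch


def make_iris(test_size=0.25, seed=0):
    """Hypothetical dataset factory: iris with a random train/test split.

    The train_indices/test_indices field names are an assumption.
    """
    X, y = load_iris(return_X_y=True)
    rng = np.random.default_rng(seed)
    permutation = rng.permutation(len(y))
    n_test = int(round(test_size * len(y)))
    return Bunch(
        data=X,
        target=y,
        train_indices=permutation[n_test:],
        test_indices=permutation[:n_test],
    )


X, y = load_iris(return_X_y=True)

datasets = {
    # 1. A Bunch with only the mandatory fields.
    "iris_plain": Bunch(data=X, target=y),
    # 2. A function returning a Bunch, called with config values.
    "iris_factory": make_iris,
    # 3. A (function, config) tuple; the config could also be a filename.
    "iris_configured": (make_iris, {"test_size": 0.3, "seed": 1}),
}
```

The tuple form keeps the factory reusable: the same function can back several entries that differ only in their configuration.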

  • estimators (Mapping) –

    Mapping where each key is the name for an estimator and each value is either:

    • A scikit-learn compatible estimator.

    • A function receiving arbitrary config values and returning a scikit-learn compatible estimator.

    • A tuple with such a function and additional configuration (either a mapping or a filename).
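The same three forms apply to estimators. The sketch below is illustrative only: make_knn_search is a hypothetical factory, and a GridSearchCV is used because it has a cv parameter, which makes it relevant to inner_cv.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def make_knn_search(max_k=5):
    """Hypothetical estimator factory: a k-NN grid search over n_neighbors."""
    return GridSearchCV(
        KNeighborsClassifier(),
        param_grid={"n_neighbors": list(range(1, max_k + 1))},
    )


estimators = {
    # 1. A ready-made scikit-learn estimator.
    "svc": SVC(C=1.0),
    # 2. A function returning an estimator, called with config values.
    "knn_factory": make_knn_search,
    # 3. A (function, config) tuple; the config could also be a filename.
    "knn_configured": (make_knn_search, {"max_k": 9}),
}
```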

  • storage (sacred.observers.RunObserver or str) – Where the experiments will be stored. Either a Sacred observer (for example, one that stores runs in a MongoDB database) or the name of a directory, in which case a file observer is used.

  • config (Mapping, str or None, default None) – A mapping or filename with additional configuration for the experiment.
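A small sketch of the two ways to supply config, using only the standard library. It assumes the filename form is loaded the way Sacred loads config files (Sacred's add_config accepts JSON files, among other formats); the key shown is an illustrative placeholder, not a documented option of this function.

```python
import json
import os
import tempfile

# Option 1: pass the configuration directly as a mapping.
# The "seed" key is only an illustrative placeholder.
config_mapping = {"seed": 0}

# Option 2: write the same values to a JSON file and pass its filename.
config_dir = tempfile.mkdtemp()
config_path = os.path.join(config_dir, "experiment_config.json")
with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config_mapping, f, indent=2)

# Either config_mapping or config_path could then be passed as config=...
```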

  • inner_cv (CV-like object, "dataset" or False, default False) –

    For estimators that perform cross-validation (those that have a cv parameter), this sets the cross-validation strategy, as follows:

    • If False, the original value of cv is left unchanged.

    • If "dataset", the sklearn.utils.Bunch objects for the datasets must have an inner_cv attribute, which will be the one used.

    • Otherwise, cv is changed to this value.
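The three inner_cv rules can be sketched as a small helper. resolve_inner_cv is hypothetical and only mirrors the documented behavior; it is not the library's internal code, and whether create_experiments copies the estimator before changing cv is an assumption here.

```python
from sklearn.base import clone
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.utils import Bunch


def resolve_inner_cv(estimator, inner_cv, dataset):
    """Hypothetical helper mirroring the documented inner_cv rules."""
    # inner_cv=False, or an estimator without a cv parameter: leave untouched.
    if inner_cv is False or "cv" not in estimator.get_params(deep=False):
        return estimator
    # "dataset": take the splitter stored in the dataset Bunch.
    if isinstance(inner_cv, str) and inner_cv == "dataset":
        inner_cv = dataset.inner_cv
    # Otherwise: set cv to the given value on a fresh, unfitted copy.
    return clone(estimator).set_params(cv=inner_cv)
```

Cloning before set_params keeps the caller's estimator object unmodified, so the same estimator can be reused across datasets with different inner splitters.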

  • outer_cv (CV-like object, "dataset" or False, default None) –

    The strategy used to evaluate different partitions of the data, as follows:

    • If False, use only one partition: the one specified in the dataset. Thus the sklearn.utils.Bunch objects for the datasets should define at least a train and a test partition.

    • If "dataset", the sklearn.utils.Bunch objects for the datasets must have an outer_cv attribute, which will be the one used.

    • Otherwise, this will be passed to sklearn.model_selection.check_cv() and the resulting cross validator will be used to define the partitions.
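The outer_cv rules can be sketched as a hypothetical helper that returns (train, test) index pairs. Assumptions: the single-partition case reads train_indices and test_indices fields, and a dataset-provided outer_cv is also normalized through check_cv (the text above only says it "will be the one used"). With outer_cv=None, check_cv falls back to 5 folds, stratified for classifiers with a binary or multiclass target.

```python
from sklearn.model_selection import check_cv


def outer_partitions(dataset, outer_cv, classifier=True):
    """Hypothetical helper mirroring the documented outer_cv rules.

    Returns a list of (train_indices, test_indices) pairs.
    """
    if outer_cv is False:
        # One partition, taken from the dataset (field names assumed).
        return [(dataset.train_indices, dataset.test_indices)]
    if isinstance(outer_cv, str) and outer_cv == "dataset":
        # Use the splitter stored in the dataset Bunch.
        outer_cv = dataset.outer_cv
    # check_cv turns None, an int, a splitter or an iterable of splits
    # into a cross-validator object.
    cv = check_cv(outer_cv, dataset.target, classifier=classifier)
    return list(cv.split(dataset.data, dataset.target))
```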

  • save_estimator (bool, default False) – Whether to save the fitted estimator. This is useful for debugging and for obtaining extra information in some cases, but for some estimators it can consume a lot of storage.

  • save_train (bool, default False) – If True, also compute and store the score on the training data.
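A hypothetical sketch of what save_estimator and save_train control for a single partition. The function evaluate_partition and the result keys are illustrative assumptions; what skdatasets actually records, and under which names, is not stated here.

```python
from sklearn.base import clone


def evaluate_partition(estimator, X, y, train, test,
                       save_estimator=False, save_train=False):
    """Hypothetical sketch: fit on one partition and collect results."""
    fitted = clone(estimator).fit(X[train], y[train])
    result = {"test_score": fitted.score(X[test], y[test])}
    if save_train:
        # One extra scoring pass over the training data.
        result["train_score"] = fitted.score(X[train], y[train])
    if save_estimator:
        # The whole fitted model is kept, which may be large.
        result["estimator"] = fitted
    return result
```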

Returns:

experiments – Sequence of Sacred experiments, ready to be run.

Return type:

Sequence of sacred.Experiment
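Since the returned experiments are ready to be run, a minimal driver only needs to call run() on each one; in Sacred, Experiment.run() executes the experiment and returns a Run object. The helper run_all is hypothetical, and the commented-out call reuses the parameter names from the signature above with illustrative values.

```python
from typing import Any, Iterable, List


def run_all(experiments: Iterable[Any]) -> List[Any]:
    """Hypothetical helper: run each experiment in turn, collect the Runs."""
    return [experiment.run() for experiment in experiments]


# Typical use (needs skdatasets and sacred installed, so left commented):
# experiments = create_experiments(
#     datasets=datasets,
#     estimators=estimators,
#     storage="results",  # a directory name selects a file observer
# )
# runs = run_all(experiments)
```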