Dataset structure
=================

Most of the repositories available in scikit-datasets provide their datasets
in some regular format. In that case, the corresponding ``fetch`` function in
scikit-datasets converts the data to a standardized format, similar to the one
used in scikit-learn, but with new optional fields for additional features
that some repositories include, such as indices for train, validation and test
partitions.

.. note::
    Data in the CRAN repository is unstructured, and thus there is no
    ``fetch`` function for it. The data is returned in the original format.

The structure is a :external:class:`~sklearn.utils.Bunch` object with the
following fields:

- ``data``: The matrix of observed data. A 2d NumPy array, ready to be used
  with scikit-learn tools. Each row corresponds to a different observation
  while each column is a particular feature. For datasets with train,
  validation and test partitions, the whole data is included here. Use
  ``train_indices``, ``validation_indices`` and ``test_indices`` to select
  each partition.
- ``target``: The target of the classification or regression problem. This is
  a 1d NumPy array except for multioutput problems, in which case it is a 2d
  array where each column corresponds to a different output.
- ``DESCR``: A human-readable description of the dataset.
- ``feature_names``: The list of feature names, if the repository has that
  information available.
- ``target_names``: For classification problems, this corresponds to the names
  of the different classes, if available. Note that scikit-learn uses this
  field in some cases for naming the outputs in multioutput problems. As we
  will try to maintain compatibility with scikit-learn, the meaning of this
  field could change in future versions.
- ``train_indices``: Indices of the elements of the train partition, if
  available in the repository.
- ``validation_indices``: Indices of the elements of the validation partition,
  if available in the repository.
- ``test_indices``: Indices of the elements of the test partition, if
  available in the repository.
- ``inner_cv``: A :external:term:`CV splitter` object, usable for cross
  validation and hyperparameter selection, if the repository provides a cross
  validation strategy (such as using a particular validation partition).
- ``outer_cv``: A Python iterable over different train and test partitions,
  when they are provided in the repository.
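
As a rough illustration of how these fields fit together, the sketch below
builds a small ``Bunch`` by hand instead of calling a real ``fetch`` function,
so every value in it is synthetic. It assumes that ``inner_cv`` is a
:external:class:`~sklearn.model_selection.PredefinedSplit` defined over the
train and validation rows taken together; the splitter a given repository
actually provides may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, PredefinedSplit
from sklearn.utils import Bunch

# Synthetic stand-in for the object returned by a ``fetch`` function.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
y = np.array([0, 1] * 6)

# Assumption: the splitter is defined relative to the train + validation rows.
# -1 marks rows that are always used for training, 0 marks the validation fold.
test_fold = np.array([-1] * 6 + [0] * 3)

dataset = Bunch(
    data=X,
    target=y,
    DESCR="Synthetic example, not a real dataset.",
    feature_names=["f0", "f1", "f2"],
    target_names=["negative", "positive"],
    train_indices=np.arange(0, 6),
    validation_indices=np.arange(6, 9),
    test_indices=np.arange(9, 12),
    inner_cv=PredefinedSplit(test_fold),
)

# ``data`` holds every observation; the index fields select each partition.
X_train = dataset.data[dataset.train_indices]
X_test = dataset.data[dataset.test_indices]
y_test = dataset.target[dataset.test_indices]

# Hyperparameter selection on train + validation, driven by ``inner_cv``.
train_val = np.concatenate([dataset.train_indices, dataset.validation_indices])
search = GridSearchCV(
    LogisticRegression(),
    param_grid={"C": [0.1, 1.0]},
    cv=dataset.inner_cv,
)
search.fit(dataset.data[train_val], dataset.target[train_val])

# Final evaluation on the held-out test partition.
test_score = search.score(X_test, y_test)
```

Because optional fields are only present when the repository supplies them,
code that handles several datasets may want to check for a field (for example
with ``"test_indices" in dataset``) before using it.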