leaspy.io.data.dataset
======================

.. py:module:: leaspy.io.data.dataset


Classes
-------

.. autoapisummary::

   leaspy.io.data.dataset.Dataset


Module Contents
---------------

.. py:class:: Dataset(data, *, no_warning = False)

   Data container based on :class:`torch.Tensor`, used to run algorithms.


   :Parameters:

       **data** : :class:`~leaspy.io.data.Data`
           Create `Dataset` from `Data` object

       **no_warning** : :obj:`bool`, default False
           Whether to deactivate warnings that are emitted by methods of this dataset instance.
           We may want to deactivate them because we rebuild a dataset per individual in scipy minimize.
           Indeed, all relevant warnings certainly occurred for the overall dataset.

   :Attributes:

       **headers** : :obj:`list` [:obj:`str`]
           Feature names

       **dimension** : :obj:`int`
           Number of features

       **n_individuals** : :obj:`int`
           Number of individuals

       **indices** : :obj:`list` [:class:`~leaspy.utils.typing.IDType`]
           Order of patients

       **event_time** : :obj:`torch.FloatTensor`
           Time of the event. If the event is censored, the time corresponds to the last patient observation

       **event_bool** : :obj:`torch.BoolTensor`
           Boolean indicating whether the event is observed or censored: 1 observed, 0 censored

       **n_visits_per_individual** : :obj:`list` [:obj:`int`]
           Number of visits per individual

       **n_visits_max** : :obj:`int`
           Maximum number of visits for one individual

       **n_visits** : :obj:`int`
           Total number of visits

       **n_observations_per_ind_per_ft** : :obj:`torch.LongTensor`, shape (n_individuals, dimension)
           Number of observations (not taking into account missing values) per individual per feature

       **n_observations_per_ft** : :obj:`torch.LongTensor`, shape (dimension,)
           Total number of observations per feature

       **n_observations** : :obj:`int`
           Total number of observations

       **timepoints** : :obj:`torch.FloatTensor`, shape (n_individuals, n_visits_max)
           Ages of patients at their different visits

       **values** : :obj:`torch.FloatTensor`, shape (n_individuals, n_visits_max, dimension)
           Values of patients for each visit for each feature

       **mask** : :obj:`torch.FloatTensor`, shape (n_individuals, n_visits_max, dimension)
           Binary mask associated with values.

           * If 1: value is meaningful
           * If 0: value is meaningless (either was nan or does not correspond to a real visit; only here for padding)

       **L2_norm_per_ft** : :obj:`torch.FloatTensor`, shape (dimension,)
           Sum of all non-nan squared values, feature per feature

       **L2_norm** : scalar :obj:`torch.FloatTensor`
           Sum of all non-nan squared values

       **no_warning** : :obj:`bool`, default False
           Whether to deactivate warnings that are emitted by methods of this dataset instance.
           We may want to deactivate them because we rebuild a dataset per individual in scipy minimize.
           Indeed, all relevant warnings certainly occurred for the overall dataset.

       **_one_hot_encoding** : :obj:`dict` [:obj:`bool`, :obj:`torch.LongTensor`]
           Values of patients for each visit for each feature, but tensorized into a one-hot encoding (pdf or sf).
           Shapes of tensors are (n_individuals, n_visits_max, dimension, max_ordinal_level [-1 when `sf=True`])

   :Raises:

       :exc:`.LeaspyInputError`
           If data, model or algo are not compatible with one another.

   .. !! processed by numpydoc !!

   .. py:attribute:: n_individuals


   .. py:attribute:: indices


   .. py:attribute:: headers
      :type:  list[leaspy.utils.typing.FeatureType]


   .. py:attribute:: dimension
      :type:  int


   .. py:attribute:: n_visits
      :type:  int


   .. py:attribute:: timepoints
      :type:  Optional[torch.FloatTensor]
      :value: None


   .. py:attribute:: values
      :type:  Optional[torch.FloatTensor]
      :value: None


   .. py:attribute:: mask
      :type:  Optional[torch.FloatTensor]
      :value: None


   .. py:attribute:: n_observations
      :type:  Optional[int]
      :value: None


   .. py:attribute:: n_observations_per_ft
      :type:  Optional[torch.LongTensor]
      :value: None


   .. py:attribute:: n_observations_per_ind_per_ft
      :type:  Optional[torch.LongTensor]
      :value: None


   .. py:attribute:: n_visits_per_individual
      :type:  Optional[list[int]]
      :value: None


   .. py:attribute:: n_visits_max
      :type:  Optional[int]
      :value: None


   .. py:attribute:: event_time_name
      :type:  Optional[str]


   .. py:attribute:: event_bool_name
      :type:  Optional[str]


   .. py:attribute:: event_time
      :type:  Optional[torch.FloatTensor]
      :value: None


   .. py:attribute:: event_bool
      :type:  Optional[torch.IntTensor]
      :value: None


   .. py:attribute:: covariate_names
      :type:  Optional[list[str]]


   .. py:attribute:: covariates
      :type:  Optional[torch.IntTensor]
      :value: None


   .. py:attribute:: L2_norm_per_ft
      :type:  Optional[torch.FloatTensor]
      :value: None


   .. py:attribute:: L2_norm
      :type:  Optional[torch.FloatTensor]
      :value: None


   .. py:attribute:: no_warning
      :value: False


   .. py:method:: get_times_patient(i)

      Get ages for patient number ``i``


      :Parameters:

          **i** : :obj:`int`
              The index of the patient (not its identifier)

      :Returns:

          :obj:`torch.Tensor`, shape (n_obs_of_patient,)
              Contains float

      .. !! processed by numpydoc !!


   .. py:method:: get_event_patient(idx_patient)

      Get ages at event for patient number ``idx_patient``


      :Parameters:

          **idx_patient** : :obj:`int`
              The index of the patient (not its identifier)

      :Returns:

          :obj:`tuple` [:obj:`torch.Tensor`, :obj:`torch.Tensor`], shape (n_obs_of_patient,)
              Contains float

      .. !! processed by numpydoc !!


   .. py:method:: get_covariates_patient(idx_patient)

      Get covariates for patient number ``idx_patient``


      :Parameters:

          **idx_patient** : :obj:`int`
              The index of the patient (not its identifier)

      :Returns:

          :obj:`torch.Tensor`, shape (n_obs_of_patient,)
              Contains float

      :Raises:

          :exc:`.ValueError`
              If the dataset has no covariates.

      .. !! processed by numpydoc !!


   .. py:method:: get_values_patient(i, *, adapt_for_model=None)

      Get values for patient number ``i``, with nans.


      :Parameters:

          **i** : :obj:`int`
              The index of the patient (not its identifier)

          **adapt_for_model** : None (default) or :class:`~leaspy.models.mcmc_saem_compatible.McmcSaemCompatibleModel`
              The values returned are suited for this model. In particular:

              * For a model with `noise_model='ordinal'`, returns one-hot-encoded values [P(X = l), l=0..ordinal_max_level]
              * For a model with `noise_model='ordinal_ranking'`, returns survival function values [P(X > l), l=0..ordinal_max_level-1]

              If None, the raw values are returned, whatever the model is.

      :Returns:

          :obj:`torch.Tensor`, shape (n_obs_of_patient, dimension [, extra_dimension_for_ordinal_models])
              Contains float or nans

      .. !! processed by numpydoc !!


   .. py:method:: to_pandas(apply_headers = False)

      Convert dataset to a `DataFrame` with ['ID', 'TIME'] index, with all covariates, events and repeated measures if
      apply_headers is False, and only the repeated measures otherwise.


      :Parameters:

          **apply_headers** : :obj:`bool`
              If True, select only the columns that are needed for leaspy fit (headers attribute)

      :Returns:

          :obj:`pandas.DataFrame`
              DataFrame with index ['ID', 'TIME'] and columns corresponding to the features, events and covariates.

      :Raises:

          :exc:`.LeaspyInputError`
              If the index of the DataFrame is not unique or contains invalid values.

      .. !! processed by numpydoc !!


   .. py:method:: move_to_device(device)

      Moves the dataset to the specified device.


      :Parameters:

          **device** : :obj:`torch.device`
              ..

      .. !! processed by numpydoc !!


   .. py:method:: get_one_hot_encoding(*, sf, ordinal_infos)

      Builds the one-hot encoding of ordinal data once and for all and returns it.


      :Parameters:

          **sf** : :obj:`bool`
              Whether the vector should be the survival function [1(X > l), l=0..max_level-1]
              instead of the probability density function [1(X=l), l=0..max_level]

          **ordinal_infos** : :class:`~leaspy.utils.typing.KwargsType`
              All the hyperparameters concerning ordinal modelling (in particular the maximum level per feature)

      :Returns:

          :obj:`torch.LongTensor`
              One-hot encoding of data values.

      :Raises:

          :exc:`.LeaspyInputError`
              If the values are not non-negative integers or if the features in `ordinal_infos` are not consistent with the dataset headers.

      .. !! processed by numpydoc !!
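The two encodings named in ``get_one_hot_encoding`` (the indicator vector [1(X=l), l=0..max_level] and its survival-function counterpart [1(X > l), l=0..max_level-1]) can be illustrated for a single ordinal observation. The following is a minimal plain-Python sketch, not leaspy's implementation: ``encode_ordinal`` is a hypothetical helper introduced only for illustration, and it works on one integer rather than on the padded tensors the ``Dataset`` holds.

```python
# Illustrative sketch only: `encode_ordinal` is a hypothetical helper,
# not part of leaspy. It encodes ONE ordinal observation, whereas the
# Dataset stores such vectors for every individual, visit and feature.

def encode_ordinal(level, max_level, *, sf):
    """Encode one ordinal observation taking values in {0, ..., max_level}.

    sf=False -> indicator of the observed level: [1(X = l), l = 0..max_level]
    sf=True  -> survival-function indicator:     [1(X > l), l = 0..max_level-1]
    """
    if not isinstance(level, int) or not 0 <= level <= max_level:
        raise ValueError(
            f"level must be an integer in [0, {max_level}], got {level!r}"
        )
    if sf:
        # One entry per threshold l: is the observed level strictly above l?
        return [int(level > l) for l in range(max_level)]
    # One entry per possible level l: is the observed level exactly l?
    return [int(level == l) for l in range(max_level + 1)]


pdf_vec = encode_ordinal(2, 3, sf=False)
sf_vec = encode_ordinal(2, 3, sf=True)
print(pdf_vec)  # [0, 0, 1, 0]
print(sf_vec)   # [1, 1, 0]
```

The survival-function vector has one entry fewer than the indicator vector, which matches the last tensor dimension documented for ``_one_hot_encoding`` (max_ordinal_level, minus 1 when ``sf=True``).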