leaspy.io.data.dataset
======================

.. py:module:: leaspy.io.data.dataset


Classes
-------

.. autoapisummary::

   leaspy.io.data.dataset.Dataset


Module Contents
---------------

.. py:class:: Dataset(data, *, no_warning = False)

   
   Data container based on :class:`torch.Tensor`, used to run algorithms.


   :Parameters:

       **data** : :class:`~leaspy.io.data.Data`
           The `Data` object from which the `Dataset` is built.

       **no_warning** : :obj:`bool`, default False
           Whether to deactivate the warnings emitted by methods of this dataset instance.
           This is useful when a dataset is rebuilt for each individual (as in scipy minimize),
           since all relevant warnings have presumably already been emitted for the overall dataset.

   :Attributes:

       **headers** : :obj:`list` [:obj:`str`]
           Feature names

       **dimension** : :obj:`int`
           Number of features

       **n_individuals** : :obj:`int`
           Number of individuals

       **indices** : :obj:`list` [:class:`~leaspy.utils.typing.IDType`]
           Order of patients

       **event_time** : :obj:`torch.FloatTensor`
           Time of the event. If the event is censored, the time corresponds to the last observation of the patient.

       **event_bool** : :obj:`torch.BoolTensor`
           Indicates whether the event was observed or censored: 1 observed, 0 censored

       **n_visits_per_individual** : :obj:`list` [:obj:`int`]
           Number of visits per individual

       **n_visits_max** : :obj:`int`
           Maximum number of visits for one individual

       **n_visits** : :obj:`int`
           Total number of visits

       **n_observations_per_ind_per_ft** : :obj:`torch.LongTensor`, shape (n_individuals, dimension)
           Number of observations (not taking into account missing values) per individual per feature

       **n_observations_per_ft** : :obj:`torch.LongTensor`, shape (dimension,)
           Total number of observations per feature

       **n_observations** : :obj:`int`
           Total number of observations

       **timepoints** : :obj:`torch.FloatTensor`, shape (n_individuals, n_visits_max)
           Ages of patients at their different visits

       **values** : :obj:`torch.FloatTensor`, shape (n_individuals, n_visits_max, dimension)
           Values of patients for each visit for each feature

       **mask** : :obj:`torch.FloatTensor`, shape (n_individuals, n_visits_max, dimension)
           Binary mask associated with ``values``:

           * 1: the value is meaningful
           * 0: the value is meaningless (it was either NaN or padding that does not correspond to a real visit)

       **L2_norm_per_ft** : :obj:`torch.FloatTensor`, shape (dimension,)
           Sum of all non-nan squared values, feature per feature

       **L2_norm** : scalar :obj:`torch.FloatTensor`
           Sum of all non-nan squared values

       **no_warning** : :obj:`bool`, default False
           Whether to deactivate the warnings emitted by methods of this dataset instance.
           This is useful when a dataset is rebuilt for each individual (as in scipy minimize),
           since all relevant warnings have presumably already been emitted for the overall dataset.

       **_one_hot_encoding** : :obj:`dict` [:obj:`bool`, :obj:`torch.LongTensor`]
           Values of patients for each visit and each feature, tensorized into a one-hot encoding (pdf or sf).
           The tensors have shape (n_individuals, n_visits_max, dimension, max_ordinal_level [-1 when `sf=True`]).


   :Raises:

       :exc:`.LeaspyInputError`
           If the data, model, or algorithm are not compatible with each other.


   ..
       !! processed by numpydoc !!
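
   The padded layout described by ``timepoints``, ``values`` and ``mask`` can be pictured without leaspy or torch. The sketch below is an illustration only: plain Python lists stand in for tensors, the helper ``pad_visits`` and the toy data are hypothetical (not part of the leaspy API), and it assumes that padded and missing entries are stored as 0 in ``values``.

   ```python
   import math

   def pad_visits(values_per_individual):
       """Pad each individual's visits to a common length and build a 0/1 mask.

       `values_per_individual` holds one list of visits per individual; each
       visit is a list of feature values, with NaN marking a missing value.
       """
       n_visits_per_individual = [len(v) for v in values_per_individual]
       n_visits_max = max(n_visits_per_individual)
       dimension = len(values_per_individual[0][0])

       values, mask = [], []
       for visits in values_per_individual:
           padded_vals, padded_mask = [], []
           for k in range(n_visits_max):
               if k < len(visits):
                   # Real visit: mask out NaNs and store 0 in their place.
                   row = visits[k]
                   padded_mask.append([0.0 if math.isnan(x) else 1.0 for x in row])
                   padded_vals.append([0.0 if math.isnan(x) else x for x in row])
               else:
                   # Padding: no real visit at this position.
                   padded_mask.append([0.0] * dimension)
                   padded_vals.append([0.0] * dimension)
           values.append(padded_vals)
           mask.append(padded_mask)
       return values, mask, n_visits_per_individual

   nan = float("nan")
   raw = [
       [[0.1, 0.2], [0.3, nan]],  # individual 0: two visits, one missing value
       [[0.5, 0.6]],              # individual 1: a single visit
   ]
   values, mask, n_visits_per_individual = pad_visits(raw)

   # Counterparts of the documented summary attributes, on the toy data.
   n_observations = int(sum(m for ind in mask for visit in ind for m in visit))
   L2_norm = sum(x * x for ind in values for visit in ind for x in visit)
   ```

   In this toy case the mask marks five meaningful entries out of eight, and only those contribute to the sum of squares.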

   .. py:attribute:: n_individuals


   .. py:attribute:: indices


   .. py:attribute:: headers
      :type:  list[leaspy.utils.typing.FeatureType]


   .. py:attribute:: dimension
      :type:  int


   .. py:attribute:: n_visits
      :type:  int


   .. py:attribute:: timepoints
      :type:  Optional[torch.FloatTensor]
      :value: None


   .. py:attribute:: values
      :type:  Optional[torch.FloatTensor]
      :value: None


   .. py:attribute:: mask
      :type:  Optional[torch.FloatTensor]
      :value: None


   .. py:attribute:: n_observations
      :type:  Optional[int]
      :value: None


   .. py:attribute:: n_observations_per_ft
      :type:  Optional[torch.LongTensor]
      :value: None


   .. py:attribute:: n_observations_per_ind_per_ft
      :type:  Optional[torch.LongTensor]
      :value: None


   .. py:attribute:: n_visits_per_individual
      :type:  Optional[list[int]]
      :value: None


   .. py:attribute:: n_visits_max
      :type:  Optional[int]
      :value: None


   .. py:attribute:: event_time_name
      :type:  Optional[str]


   .. py:attribute:: event_bool_name
      :type:  Optional[str]


   .. py:attribute:: event_time
      :type:  Optional[torch.FloatTensor]
      :value: None


   .. py:attribute:: event_bool
      :type:  Optional[torch.IntTensor]
      :value: None


   .. py:attribute:: covariate_names
      :type:  Optional[list[str]]


   .. py:attribute:: covariates
      :type:  Optional[torch.IntTensor]
      :value: None


   .. py:attribute:: L2_norm_per_ft
      :type:  Optional[torch.FloatTensor]
      :value: None


   .. py:attribute:: L2_norm
      :type:  Optional[torch.FloatTensor]
      :value: None


   .. py:attribute:: no_warning
      :value: False


   .. py:method:: get_times_patient(i)

      
      Get ages for patient number ``i``


      :Parameters:

          **i** : :obj:`int`
              The index of the patient (warning: this is not the patient's identifier)


      :Returns:

          :obj:`torch.Tensor`, shape (n_obs_of_patient,)
              Contains float


      ..
          !! processed by numpydoc !!
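
   Because ``get_times_patient`` expects a position rather than a patient identifier, an identifier has to be converted first. A minimal sketch, in which a plain list stands in for a dataset's ``indices`` attribute and the identifiers are made up:

   ```python
   # Hypothetical identifiers, in the order the dataset would store them.
   indices = ["patient-A", "patient-B", "patient-C"]

   # Position of a given identifier; list.index raises ValueError if it is unknown.
   i = indices.index("patient-B")
   ```

   The resulting ``i`` is the kind of value the ``i`` and ``idx_patient`` parameters refer to.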


   .. py:method:: get_event_patient(idx_patient)

      
      Get ages at event for patient number ``idx_patient``


      :Parameters:

          **idx_patient** : :obj:`int`
              The index of the patient (warning: this is not the patient's identifier)


      :Returns:

          :obj:`tuple` [:obj:`torch.Tensor`, :obj:`torch.Tensor`] , shape (n_obs_of_patient,)
              Contains float


      ..
          !! processed by numpydoc !!


   .. py:method:: get_covariates_patient(idx_patient)

      
      Get covariates for patient number ``idx_patient``


      :Parameters:

          **idx_patient** : :obj:`int`
              The index of the patient (warning: this is not the patient's identifier)


      :Returns:

          :obj:`torch.Tensor`, shape (n_obs_of_patient,)
              Contains float


      :Raises:

          :exc:`.ValueError`
              If the dataset has no covariates.


      ..
          !! processed by numpydoc !!


   .. py:method:: get_values_patient(i, *, adapt_for_model=None)

      
      Get values for patient number ``i``, with nans.


      :Parameters:

          **i** : :obj:`int`
              The index of the patient (warning: this is not the patient's identifier)

          **adapt_for_model** : None (default) or :class:`~leaspy.models.mcmc_saem_compatible.McmcSaemCompatibleModel`
              The values returned are suited for this model. In particular:

              * For a model with `noise_model='ordinal'`, one-hot-encoded values are returned [P(X = l), l=0..ordinal_max_level]
              * For a model with `noise_model='ordinal_ranking'`, survival function values are returned [P(X > l), l=0..ordinal_max_level-1]

              If None, the raw values are returned, whatever the model is.


      :Returns:

          :obj:`torch.Tensor`, shape (n_obs_of_patient, dimension [, extra_dimension_for_ordinal_models])
              Contains float or nans


      ..
          !! processed by numpydoc !!
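
   One way to picture "values with nans" is to combine a patient's padded ``values`` with the matching ``mask`` and keep only the real visits. The sketch below does this on plain Python lists; the function ``values_with_nans`` and the data are hypothetical, and it does not claim to reproduce the library's implementation.

   ```python
   import math

   def values_with_nans(values_i, mask_i, n_visits_i):
       """Return the first `n_visits_i` visits, with masked entries set to NaN."""
       return [
           [x if m == 1 else float("nan") for x, m in zip(row, mask_row)]
           for row, mask_row in zip(values_i[:n_visits_i], mask_i[:n_visits_i])
       ]

   # One patient with 2 real visits, padded to 3 visits, for 2 features.
   values_i = [[0.1, 0.2], [0.3, 0.0], [0.0, 0.0]]
   mask_i = [[1, 1], [1, 0], [0, 0]]

   patient_values = values_with_nans(values_i, mask_i, n_visits_i=2)
   ```

   The padding row is dropped, and the masked entry of the second visit becomes NaN again.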


   .. py:method:: to_pandas(apply_headers = False)

      
      Convert the dataset to a `DataFrame` indexed by ['ID', 'TIME']. If ``apply_headers`` is False, the
      DataFrame contains all covariates, events and repeated measures; otherwise it contains only the repeated measures.


      :Parameters:

          **apply_headers** : :obj:`bool`
              If True, keep only the columns needed for a leaspy fit (those listed in the ``headers`` attribute)


      :Returns:

          :obj:`pandas.DataFrame`
              DataFrame with index ['ID', 'TIME'] and columns corresponding to the features, events and covariates.


      :Raises:

          :exc:`.LeaspyInputError`
              If the index of the DataFrame is not unique or contains invalid values.


      ..
          !! processed by numpydoc !!


   .. py:method:: move_to_device(device)

      
      Moves the dataset to the specified device.


      :Parameters:

          **device** : :obj:`torch.device`
              The device to which the dataset is moved.


      ..
          !! processed by numpydoc !!


   .. py:method:: get_one_hot_encoding(*, sf, ordinal_infos)

      
      Builds the one-hot encoding of ordinal data once and for all and returns it.


      :Parameters:

          **sf** : :obj:`bool`
              Whether the vector should be the survival function [1(X > l), l=0..max_level-1]
              instead of the probability density function [1(X=l), l=0..max_level]

          **ordinal_infos** : :class:`~leaspy.utils.typing.KwargsType`
              All the hyperparameters concerning ordinal modelling (in particular the maximum level per feature)


      :Returns:

          :obj:`torch.LongTensor`
              One-hot encoding of data values.


      :Raises:

          :exc:`.LeaspyInputError`
              If the values are not non-negative integers or if the features in `ordinal_infos` are not consistent with the dataset headers.


      ..
          !! processed by numpydoc !!
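
The two encodings selected by ``sf`` can be illustrated on a single ordinal value. This is a minimal sketch in plain Python, not the library's implementation: ``encode_ordinal`` is a hypothetical helper, and ``max_level`` stands for the maximum level of one feature.

```python
def encode_ordinal(x, max_level, *, sf):
    """Encode integer level `x` (0 <= x <= max_level) as a 0/1 vector.

    sf=False: indicator of X = l for l = 0..max_level (length max_level + 1)
    sf=True:  indicator of X > l for l = 0..max_level - 1 (length max_level)
    """
    if not isinstance(x, int) or not 0 <= x <= max_level:
        raise ValueError(f"expected an integer in [0, {max_level}], got {x!r}")
    if sf:
        return [int(x > l) for l in range(max_level)]
    return [int(x == l) for l in range(max_level + 1)]

pdf_vector = encode_ordinal(2, max_level=3, sf=False)  # [0, 0, 1, 0]
sf_vector = encode_ordinal(2, max_level=3, sf=True)    # [1, 1, 0]
```

The survival-function vector is one element shorter than the one-hot vector, which matches the ``-1`` noted in the documented tensor shape for ``sf=True``.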