leaspy.io.data.dataset
Classes
Dataset: Data container based on torch.Tensor, used to run algorithms.
Module Contents
- class Dataset(data, *, no_warning=False)
Data container based on torch.Tensor, used to run algorithms.
- Parameters:
- data
Data The Data object from which the Dataset is created
- no_warning
bool, default False Whether to deactivate the warnings emitted by methods of this dataset instance. This is useful because a dataset is rebuilt per individual in scipy minimize, and all relevant warnings have already been emitted for the overall dataset.
- Attributes:
- headers
list[str] Features names
- dimension
int Number of features
- n_individuals
int Number of individuals
- indices
list[IDType] Order of patients
- event_time
torch.FloatTensor Time of the event. If the event is censored, the time corresponds to the patient's last observation
- event_bool
torch.BoolTensor Whether the event was observed or censored: 1 if observed, 0 if censored
- n_visits_per_individual
list[int] Number of visits per individual
- n_visits_max
int Maximum number of visits for one individual
- n_visits
int Total number of visits
- n_observations_per_ind_per_ft
torch.LongTensor, shape (n_individuals, dimension) Number of observations (not taking into account missing values) per individual per feature
- n_observations_per_ft
torch.LongTensor, shape (dimension,) Total number of observations per feature
- n_observations
int Total number of observations
- timepoints
torch.FloatTensor, shape (n_individuals, n_visits_max) Ages of patients at their different visits
- values
torch.FloatTensor, shape (n_individuals, n_visits_max, dimension) Values of patients for each visit for each feature
- mask
torch.FloatTensor, shape (n_individuals, n_visits_max, dimension) Binary mask associated to values. If 1, the value is meaningful. If 0, the value is meaningless (it was either nan or does not correspond to a real visit and is only there for padding)
- L2_norm_per_ft
torch.FloatTensor, shape (dimension,) Sum of all non-nan squared values, feature per feature
- L2_norm
scalar torch.FloatTensor Sum of all non-nan squared values
- no_warning
bool, default False Whether to deactivate the warnings emitted by methods of this dataset instance. This is useful because a dataset is rebuilt per individual in scipy minimize, and all relevant warnings have already been emitted for the overall dataset.
- _one_hot_encoding
dict[bool, torch.LongTensor] Values of patients for each visit for each feature, tensorized into a one-hot encoding (pdf or sf). The tensors have shape (n_individuals, n_visits_max, dimension, max_ordinal_level [-1 when sf=True])
- Raises:
LeaspyInputError If data, model or algo are not compatible together.
- n_individuals
- indices
- headers: list[FeatureType]
- no_warning = False
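The padded layout described by the attributes above (timepoints, values, mask and the derived counts) can be illustrated with a minimal plain-Python sketch. This is not the library's implementation: Dataset stores torch tensors, whereas the sketch uses nested lists so that it runs without dependencies. The input dictionary `visits` and the feature names are hypothetical, and storing masked entries as 0 is an assumption.

```python
import math

# Hypothetical ragged input: per-individual visits as (age, [feature values]).
# float("nan") marks a missing measurement.
visits = {
    "A": [(70.0, [0.1, 0.5]), (71.0, [0.2, float("nan")])],
    "B": [(65.0, [0.3, 0.4])],
}
headers = ["feat_1", "feat_2"]

indices = list(visits)  # order of patients
dimension = len(headers)
n_individuals = len(indices)
n_visits_per_individual = [len(visits[pid]) for pid in indices]
n_visits_max = max(n_visits_per_individual)
n_visits = sum(n_visits_per_individual)

# Padded containers, initialised to 0 (assumption: masked entries hold 0).
timepoints = [[0.0] * n_visits_max for _ in indices]
values = [[[0.0] * dimension for _ in range(n_visits_max)] for _ in indices]
mask = [[[0.0] * dimension for _ in range(n_visits_max)] for _ in indices]

for i, pid in enumerate(indices):
    for j, (age, feats) in enumerate(visits[pid]):
        timepoints[i][j] = age
        for k, v in enumerate(feats):
            if not math.isnan(v):  # nan entries keep value 0 and mask 0
                values[i][j][k] = v
                mask[i][j][k] = 1.0

# Counts and norms only take entries with mask == 1 into account.
n_observations_per_ft = [
    int(sum(mask[i][j][k] for i in range(n_individuals) for j in range(n_visits_max)))
    for k in range(dimension)
]
n_observations = sum(n_observations_per_ft)
L2_norm_per_ft = [
    sum(
        mask[i][j][k] * values[i][j][k] ** 2
        for i in range(n_individuals)
        for j in range(n_visits_max)
    )
    for k in range(dimension)
]
```

In this sketch, patient "B" has a single visit, so its second slot is pure padding and its mask row is all zeros, while the nan of patient "A" is masked for one feature only.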
- get_times_patient(i)
Get ages for patient number i.
- Parameters:
- i
int The index of the patient (<!> not its identifier)
- Returns:
torch.Tensor, shape (n_obs_of_patient,) Contains float
- Parameters:
i (int)
- Return type:
torch.FloatTensor
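Because the argument is a position in the dataset rather than a patient identifier, an identifier has to be converted first. The following plain-Python sketch illustrates that distinction and the removal of padding. The helper `times_of_patient` and the sample data are hypothetical stand-ins, not the library's code.

```python
# Hypothetical padded data for two patients (padding stored as 0.0).
indices = ["patient-A", "patient-B"]  # order of patients
n_visits_per_individual = [3, 2]
timepoints = [
    [70.0, 71.5, 73.0],
    [65.0, 66.0, 0.0],  # last entry is padding, not a real visit
]


def times_of_patient(i):
    """Return the real visit ages of the patient at position i (padding dropped)."""
    return timepoints[i][: n_visits_per_individual[i]]


# Convert the identifier to a position before querying.
i = indices.index("patient-B")
times_b = times_of_patient(i)
```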
- get_event_patient(idx_patient)
Get ages at event for patient number idx_patient.
- Parameters:
- idx_patient
int The index of the patient (<!> not its identifier)
- Returns:
tuple[torch.Tensor, torch.Tensor], shape (n_obs_of_patient,) Contains float
- Parameters:
idx_patient (int)
- Return type:
- get_covariates_patient(idx_patient)
Get covariates for patient number idx_patient.
- Parameters:
- idx_patient
int The index of the patient (<!> not its identifier)
- Returns:
torch.Tensor, shape (n_obs_of_patient,) Contains float
- Raises:
ValueError If the dataset has no covariates.
- Parameters:
idx_patient (int)
- Return type:
torch.IntTensor
- get_values_patient(i, *, adapt_for_model=None)
Get values for patient number i, with nans.
- Parameters:
- i
int The index of the patient (<!> not its identifier)
- adapt_for_model
None (default) or McmcSaemCompatibleModel The values returned are suited for this model. In particular:
For a model with noise_model='ordinal', one-hot-encoded values are returned [P(X = l), l=0..ordinal_max_level]
For a model with noise_model='ordinal_ranking', survival function values are returned [P(X > l), l=0..ordinal_max_level-1]
If None, the raw values are returned, whatever the model is.
- Returns:
torch.Tensor, shape (n_obs_of_patient, dimension [, extra_dimension_for_ordinal_models]) Contains float or nans
- Parameters:
i (int)
- Return type:
torch.FloatTensor
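Returning values "with nans" for the raw case (adapt_for_model=None) can be pictured as dropping the padding rows and putting nan back wherever the mask is 0. The sketch below shows that idea in plain Python under stated assumptions: the helper `values_with_nans` is hypothetical, and masked entries are assumed to be stored as 0. It does not reproduce the library's tensor code.

```python
import math

# One patient's padded row: 3 visit slots x 2 features (hypothetical data).
values = [
    [0.1, 0.5],
    [0.2, 0.0],  # second feature was missing at this visit
    [0.0, 0.0],  # padding slot, not a real visit
]
mask = [
    [1.0, 1.0],
    [1.0, 0.0],
    [0.0, 0.0],
]
n_visits_of_patient = 2


def values_with_nans(values, mask, n_visits):
    """Keep the first n_visits rows and restore nan where the mask is 0."""
    return [
        [v if m == 1.0 else float("nan") for v, m in zip(row_v, row_m)]
        for row_v, row_m in zip(values[:n_visits], mask[:n_visits])
    ]


patient_values = values_with_nans(values, mask, n_visits_of_patient)
```

The result has one row per real visit, so the padding slot disappears while the genuinely missing measurement shows up as nan.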
- to_pandas(apply_headers=False)
Convert dataset to a DataFrame with [‘ID’, ‘TIME’] index, with all covariates, events and repeated measures if apply_headers is False, and only the repeated measures otherwise.
- Parameters:
- apply_headers
bool If True, select only the columns needed for a leaspy fit (the headers attribute)
- Returns:
pandas.DataFrame DataFrame with index ['ID', 'TIME'] and columns corresponding to the features, events and covariates.
- Raises:
LeaspyInputError If the index of the DataFrame is not unique or contains invalid values.
- Parameters:
apply_headers (bool)
- Return type:
pandas.DataFrame
- move_to_device(device)
Moves the dataset to the specified device.
- Parameters:
- device
torch.device
- Parameters:
device (torch.device)
- Return type:
None
- get_one_hot_encoding(*, sf, ordinal_infos)
Builds the one-hot encoding of ordinal data once and for all and returns it.
- Parameters:
- sf
bool Whether the vector should be the survival function [1(X > l), l=0..max_level-1] instead of the probability density function [1(X=l), l=0..max_level]
- ordinal_infos
KwargsType All the hyperparameters concerning ordinal modelling (in particular the maximum level per feature)
- Returns:
torch.LongTensor One-hot encoding of data values.
- Raises:
LeaspyInputError If the values are not non-negative integers or if the features in ordinal_infos are not consistent with the dataset headers.
- Parameters:
sf (bool)
ordinal_infos (KwargsType)
- Return type:
torch.LongTensor
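The two encodings selected by sf can be illustrated for a single ordinal value with a minimal plain-Python sketch. The function `encode_ordinal` is hypothetical and returns lists, whereas the method returns a torch.LongTensor for the whole dataset. Only the non-negative integer check is mirrored here, with a ValueError standing in for LeaspyInputError. How values above the maximum level are treated is not covered by this sketch.

```python
def encode_ordinal(x, max_level, *, sf):
    """Encode one ordinal value x with levels 0..max_level.

    sf=False: indicator of the level itself, [1(X = l), l = 0..max_level]
    sf=True:  survival-function form, [1(X > l), l = 0..max_level - 1]
    """
    if not isinstance(x, int) or x < 0:
        raise ValueError("ordinal values must be non-negative integers")
    if sf:
        return [int(x > level) for level in range(max_level)]
    return [int(x == level) for level in range(max_level + 1)]


pdf_encoding = encode_ordinal(2, 3, sf=False)  # [0, 0, 1, 0]
sf_encoding = encode_ordinal(2, 3, sf=True)    # [1, 1, 0]
```

The sf form has one entry fewer than the pdf form, which matches the "-1 when sf=True" note on the last tensor dimension.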