leaspy.io.data.dataset

Classes

Dataset

Data container based on torch.Tensor, used to run algorithms.

Module Contents

class Dataset(data, *, no_warning=False)

Data container based on torch.Tensor, used to run algorithms.

Parameters:

data (Data): The Data object from which the Dataset is created.
no_warning (bool, default False): Whether to silence the warnings emitted by methods of this dataset instance. This is useful because a dataset is rebuilt for each individual inside scipy's minimize, and every relevant warning has already been emitted for the overall dataset.

Attributes:

headers (list[str]): Feature names
dimension (int): Number of features
n_individuals (int): Number of individuals
indices (list[IDType]): Identifiers of the individuals, in the order in which they are stored
event_time (torch.FloatTensor): Time of the event; if the event is censored, the time of the patient's last observation
event_bool (torch.BoolTensor): Whether the event was observed or censored: 1 if observed, 0 if censored
n_visits_per_individual (list[int]): Number of visits per individual
n_visits_max (int): Maximum number of visits for a single individual
n_visits (int): Total number of visits
n_observations_per_ind_per_ft (torch.LongTensor, shape (n_individuals, dimension)): Number of observations (missing values excluded) per individual per feature
n_observations_per_ft (torch.LongTensor, shape (dimension,)): Total number of observations per feature
n_observations (int): Total number of observations
timepoints (torch.FloatTensor, shape (n_individuals, n_visits_max)): Ages of the patients at each of their visits
values (torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)): Value of each feature at each visit of each patient
mask (torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)): Binary mask associated with values: 1 if the value is meaningful, 0 if it is meaningless (either it was NaN or it does not correspond to a real visit and is only padding)
L2_norm_per_ft (torch.FloatTensor, shape (dimension,)): Sum of all non-NaN squared values, per feature
L2_norm (scalar torch.FloatTensor): Sum of all non-NaN squared values
no_warning (bool, default False): Whether to silence the warnings emitted by methods of this dataset instance (useful when a dataset is rebuilt per individual inside scipy's minimize, since all relevant warnings were already emitted for the overall dataset)
_one_hot_encoding (dict[bool, torch.LongTensor]): Values of the patients for each visit and each feature, encoded either as one-hot probability density (pdf) or as survival function (sf), and cached per value of sf. Tensor shapes are (n_individuals, n_visits_max, dimension, max_ordinal_level [-1 when sf=True])

Raises:

LeaspyInputError: If the data, model, or algorithm are not compatible with one another.
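The attributes above are all derived from the same padded layout. The following pure-Python sketch (nested lists standing in for the torch tensors) shows one plausible way they relate: individuals are padded to n_visits_max visits, the mask zeroes out both missing values and padding, and the observation counts and L2 norms only sum over masked-in entries. The raw input format, the patient identifiers, and filling masked slots of values with 0.0 are assumptions for illustration, not leaspy's actual construction code.

```python
import math

# Hypothetical raw input: per-individual lists of visits (age, [feature values]),
# with math.nan marking a missing observation.
raw = {
    "patient-A": [(70.0, [0.2, math.nan]), (71.0, [0.3, 0.5])],
    "patient-B": [(65.5, [0.1, 0.4])],
}

indices = list(raw)  # storage order of the individuals
dimension = 2
n_visits_per_individual = [len(raw[pid]) for pid in indices]
n_visits_max = max(n_visits_per_individual)
n_visits = sum(n_visits_per_individual)

# Pad every individual to n_visits_max visits. Assumption: masked slots of
# `values` are filled with 0.0 so that they never contribute to sums.
values, mask = [], []
for pid in indices:
    visits = raw[pid]
    v_rows, m_rows = [], []
    for j in range(n_visits_max):
        if j < len(visits):
            feats = visits[j][1]
            v_rows.append([0.0 if math.isnan(x) else x for x in feats])
            m_rows.append([0.0 if math.isnan(x) else 1.0 for x in feats])
        else:  # padding visit: nothing meaningful here
            v_rows.append([0.0] * dimension)
            m_rows.append([0.0] * dimension)
    values.append(v_rows)
    mask.append(m_rows)

# Observation counts only include entries where the mask is 1.
n_observations_per_ind_per_ft = [
    [int(sum(mask[i][j][k] for j in range(n_visits_max))) for k in range(dimension)]
    for i in range(len(indices))
]
n_observations_per_ft = [
    sum(row[k] for row in n_observations_per_ind_per_ft) for k in range(dimension)
]
n_observations = sum(n_observations_per_ft)

# Sum of squared non-NaN values, feature per feature, then overall.
L2_norm_per_ft = [
    sum(
        mask[i][j][k] * values[i][j][k] ** 2
        for i in range(len(indices))
        for j in range(n_visits_max)
    )
    for k in range(dimension)
]
L2_norm = sum(L2_norm_per_ft)
```

Multiplying by the mask before summing is what lets padding and missing values coexist in a dense rectangular layout without skewing the statistics.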


n_individuals

indices

headers: list[FeatureType]

dimension: int

n_visits: int

timepoints: torch.FloatTensor | None = None

values: torch.FloatTensor | None = None

mask: torch.FloatTensor | None = None

n_observations: int | None = None

n_observations_per_ft: torch.LongTensor | None = None

n_observations_per_ind_per_ft: torch.LongTensor | None = None

n_visits_per_individual: list[int] | None = None

n_visits_max: int | None = None

event_time_name: str | None

event_bool_name: str | None

event_time: torch.FloatTensor | None = None

event_bool: torch.IntTensor | None = None

covariate_names: list[str] | None

covariates: torch.IntTensor | None = None

L2_norm_per_ft: torch.FloatTensor | None = None

L2_norm: torch.FloatTensor | None = None

no_warning = False

get_times_patient(i)

Get the ages at the visits of patient number i.

Parameters:

i (int): The position of the patient in the dataset (warning: not its identifier)

Returns:

torch.FloatTensor, shape (n_obs_of_patient,): The ages of the patient at their visits


Return type:

torch.FloatTensor
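Because i is a position rather than an identifier, callers that start from a patient ID first need its position in indices. The minimal pure-Python sketch below illustrates that lookup and the trimming of a padded timepoints row to the patient's real visits. It assumes padding slots hold 0.0 and that trimming uses n_visits_per_individual; the function is a hypothetical stand-in, not leaspy's implementation.

```python
# Hypothetical padded data for two individuals (n_visits_max = 3).
indices = ["patient-A", "patient-B"]  # identifiers, in storage order
n_visits_per_individual = [3, 1]
timepoints = [
    [70.0, 71.0, 72.5],
    [65.5, 0.0, 0.0],  # assumption: the last two slots are padding
]


def get_times_patient(i: int):
    """Return the real visit ages of the individual at position i (not its ID)."""
    return timepoints[i][: n_visits_per_individual[i]]


# Starting from an identifier: look up its position in `indices` first.
i = indices.index("patient-B")
times_b = get_times_patient(i)
```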

get_event_patient(idx_patient)

Get ages at event for patient number idx_patient

Parameters:

idx_patient (int): The position of the patient in the dataset (warning: not its identifier)

Returns:

tuple[torch.Tensor, torch.Tensor], shape (n_obs_of_patient,): Contains floats


Return type:

tuple[Tensor, Tensor]
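The event attributes follow a standard right-censoring convention: event_bool is 1 when the event was observed and 0 when it was censored, in which case event_time is the age at the patient's last observation. The sketch below encodes that convention for one individual. The helper name and its inputs are hypothetical; it illustrates the documented semantics rather than how leaspy actually fills these tensors.

```python
def event_for_patient(visit_ages, event_age=None):
    """Hypothetical helper returning (event_time, event_bool) for one individual.

    event_bool is 1 if the event was observed and 0 if it was censored;
    a censored individual gets the age of their last observation as event_time.
    """
    if event_age is not None:
        return event_age, 1  # event observed at event_age
    return max(visit_ages), 0  # censored at the last observation
```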

get_covariates_patient(idx_patient)

Get covariates for patient number idx_patient

Parameters:

idx_patient (int): The position of the patient in the dataset (warning: not its identifier)

Returns:

torch.IntTensor, shape (n_obs_of_patient,): The covariates of the patient

Raises:

ValueError: If the dataset has no covariates.


Return type:

torch.IntTensor

get_values_patient(i, *, adapt_for_model=None)

Get the values of patient number i, with NaNs for missing observations.

Parameters:

i (int)

The position of the patient in the dataset (warning: not its identifier)

adapt_for_model (None or McmcSaemCompatibleModel, default None)

The returned values are adapted to this model. In particular:

For a model with noise_model=’ordinal’, returns one-hot-encoded values [P(X = l), l=0..ordinal_max_level]

For a model with noise_model=’ordinal_ranking’, returns survival function values [P(X > l), l=0..ordinal_max_level-1]

If None, the raw values are returned, regardless of the model.

Returns:

torch.Tensor, shape (n_obs_of_patient, dimension [, extra_dimension_for_ordinal_models]): Contains floats or NaNs


Return type:

torch.FloatTensor
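For an observed ordinal value, the two adapted forms described above reduce to indicator vectors: the pdf form has a single 1 at the observed level, and the sf form has 1 for every level strictly below it. The pure-Python sketch below shows both encodings for a single value. The function name is hypothetical and it handles one scalar, whereas the method works on whole tensors.

```python
def encode_ordinal(x, max_level, sf=False):
    """Encode an observed ordinal value x in {0, ..., max_level}.

    sf=False -> one-hot pdf vector [1(x == l) for l = 0..max_level]
    sf=True  -> survival vector    [1(x > l)  for l = 0..max_level-1]
    """
    if sf:
        return [1 if x > l else 0 for l in range(max_level)]
    return [1 if x == l else 0 for l in range(max_level + 1)]


pdf = encode_ordinal(2, 3)
sf = encode_ordinal(2, 3, sf=True)
```

The two forms carry the same information: each sf entry equals the sum of the pdf entries above that level, which is why the sf vector is one entry shorter.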

to_pandas(apply_headers=False)

Convert the dataset to a DataFrame indexed by [‘ID’, ‘TIME’]. If apply_headers is False, the DataFrame contains all covariates, events, and repeated measures; otherwise it contains only the repeated measures.

Parameters:

apply_headers (bool): If True, keep only the columns needed for a leaspy fit (the headers attribute)

Returns:

pandas.DataFrame: DataFrame with index [‘ID’, ‘TIME’] and columns corresponding to the features, events and covariates.

Raises:

LeaspyInputError: If the index of the DataFrame is not unique or contains invalid values.


Return type:

DataFrame
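Going back to a long [‘ID’, ‘TIME’] format essentially undoes the padding: padded visits are skipped and masked entries become NaN again. The standard-library sketch below flattens hypothetical padded lists into one record per real visit, covering only the repeated measures (as with apply_headers=True). The feature names "memory" and "language" and all data values are invented for illustration, and events and covariates are ignored.

```python
import math

# Hypothetical padded data (n_visits_max = 2, dimension = 2).
indices = ["patient-A", "patient-B"]
headers = ["memory", "language"]
n_visits_per_individual = [2, 1]
timepoints = [[70.0, 71.0], [65.5, 0.0]]
values = [[[0.2, 0.0], [0.3, 0.5]], [[0.1, 0.4], [0.0, 0.0]]]
mask = [[[1, 0], [1, 1]], [[1, 1], [0, 0]]]


def to_records():
    """Flatten padded data into one record per real (ID, TIME) visit."""
    records = []
    for i, pid in enumerate(indices):
        for j in range(n_visits_per_individual[i]):  # padded visits are skipped
            row = {"ID": pid, "TIME": timepoints[i][j]}
            for k, name in enumerate(headers):
                # Masked entries were missing in the input: restore them as NaN.
                row[name] = values[i][j][k] if mask[i][j][k] else math.nan
            records.append(row)
    return records
```

With pandas installed, passing these records to pandas.DataFrame and calling set_index(["ID", "TIME"]) gives a frame with the documented index and one column per feature.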

move_to_device(device)

Moves the dataset to the specified device.

Parameters:

device (torch.device): The target device


Return type:

None

get_one_hot_encoding(*, sf, ordinal_infos)

Build the one-hot encoding of ordinal data (computed only once, then cached) and return it.

Parameters:

sf (bool): Whether the vector should be the survival function [1(X > l), l=0..max_level-1] instead of the probability density function [1(X = l), l=0..max_level]
ordinal_infos (KwargsType): All hyperparameters related to ordinal modelling (in particular, the maximum level of each feature)

Returns:

torch.LongTensor: One-hot encoding of data values.

Raises:

LeaspyInputError: If the values are not non-negative integers or if the features in ordinal_infos are not consistent with the dataset headers.


Return type:

torch.LongTensor
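"Once and for all" together with the _one_hot_encoding attribute (a dict keyed by bool) suggests a build-once cache keyed by sf. The sketch below shows that pattern on plain lists. It is a hypothetical stand-in: a single max_level replaces the per-feature levels in ordinal_infos, and ValueError replaces LeaspyInputError so that the snippet has no dependencies.

```python
class OrdinalCache:
    """Hypothetical stand-in showing a build-once encoding cache keyed by `sf`."""

    def __init__(self, values, max_level):
        self.values = values  # flat list of observed ordinal levels
        self.max_level = max_level  # simplification of ordinal_infos
        self._one_hot_encoding = {}  # sf (bool) -> encoded rows
        self.n_builds = 0  # counts how many times an encoding was computed

    def get_one_hot_encoding(self, *, sf):
        if sf not in self._one_hot_encoding:
            self.n_builds += 1
            # Ordinal levels must be non-negative integers.
            if any(not isinstance(v, int) or v < 0 for v in self.values):
                raise ValueError("Ordinal values must be non-negative integers.")
            if sf:  # survival function: 1(X > l), l = 0..max_level-1
                enc = [[int(v > l) for l in range(self.max_level)] for v in self.values]
            else:  # pdf: 1(X == l), l = 0..max_level
                enc = [[int(v == l) for l in range(self.max_level + 1)] for v in self.values]
            self._one_hot_encoding[sf] = enc
        return self._one_hot_encoding[sf]


cache = OrdinalCache([0, 2, 1], max_level=2)
pdf = cache.get_one_hot_encoding(sf=False)
```

Keying the cache by sf means switching between the pdf and sf forms costs one build each, and repeated calls during fitting return the same object.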