leaspy.io.data.dataset

Classes

Dataset

Data container based on torch.Tensor, used to run algorithms.

Module Contents

class Dataset(data, *, no_warning=False)

Data container based on torch.Tensor, used to run algorithms.

Parameters:
data : Data

The Data object from which the Dataset is built.

no_warning : bool, default False

Whether to deactivate the warnings emitted by methods of this dataset instance. This is useful because a dataset is rebuilt for each individual during scipy minimize, by which point all relevant warnings have already been emitted for the overall dataset.

Attributes:
headers : list [str]

Feature names

dimension : int

Number of features

n_individuals : int

Number of individuals

indices : list [IDType]

Order of patients

event_time : torch.FloatTensor

Time of the event. If the event is censored, the time corresponds to the last observation of the patient.

event_bool : torch.BoolTensor

Whether the event was observed: 1 if observed, 0 if censored.

n_visits_per_individual : list [int]

Number of visits per individual

n_visits_max : int

Maximum number of visits for one individual

n_visits : int

Total number of visits

n_observations_per_ind_per_ft : torch.LongTensor, shape (n_individuals, dimension)

Number of observations (excluding missing values) per individual and per feature

n_observations_per_ft : torch.LongTensor, shape (dimension,)

Total number of observations per feature

n_observations : int

Total number of observations

timepoints : torch.FloatTensor, shape (n_individuals, n_visits_max)

Ages of patients at their different visits

values : torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)

Values of patients for each visit and each feature

mask : torch.FloatTensor, shape (n_individuals, n_visits_max, dimension)

Binary mask associated with values. If 1, the value is meaningful. If 0, the value is meaningless: it was either NaN or it does not correspond to a real visit and is only there for padding.

L2_norm_per_ft : torch.FloatTensor, shape (dimension,)

Sum of all non-NaN squared values, feature by feature

L2_norm : scalar torch.FloatTensor

Sum of all non-NaN squared values

no_warning : bool, default False

Whether to deactivate the warnings emitted by methods of this dataset instance. This is useful because a dataset is rebuilt for each individual during scipy minimize, by which point all relevant warnings have already been emitted for the overall dataset.

_one_hot_encoding : dict [bool, torch.LongTensor]

Values of patients for each visit and each feature, tensorized into a one-hot encoding (pdf or sf). The tensors have shape (n_individuals, n_visits_max, dimension, max_ordinal_level [-1 when sf=True]).

Raises:
LeaspyInputError

If data, model or algo are not compatible with each other.
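The relationship between the padded attributes, the mask, the observation counts and the L2 norms can be sketched without any library. The snippet below is only an illustration, not leaspy's implementation: it uses nested lists in place of tensors, the input dictionary `visits` is hypothetical, and filling padded or missing entries with 0.0 is an assumption.

```python
import math

nan = float("nan")

# Hypothetical ragged input: for each patient, a list of (age, [value per feature]).
# NaN marks a missing value.
visits = {
    "id-1": [(70.0, [0.1, 0.5]), (71.5, [0.2, nan])],
    "id-2": [(65.0, [nan, 0.4])],
}

indices = list(visits)                      # order of patients
dimension = 2                               # number of features
n_visits_per_individual = [len(v) for v in visits.values()]
n_visits_max = max(n_visits_per_individual)
n_visits = sum(n_visits_per_individual)

# Padded containers, shapes (n_individuals, n_visits_max[, dimension]).
# Assumption: padded and missing entries are stored as 0.0.
timepoints = [[0.0] * n_visits_max for _ in indices]
values = [[[0.0] * dimension for _ in range(n_visits_max)] for _ in indices]
mask = [[[0.0] * dimension for _ in range(n_visits_max)] for _ in indices]

for i, pid in enumerate(indices):
    for j, (age, obs) in enumerate(visits[pid]):
        timepoints[i][j] = age
        for k, x in enumerate(obs):
            if not math.isnan(x):           # real, non-missing observation
                values[i][j][k] = x
                mask[i][j][k] = 1.0

# Counts and norms only take into account entries where the mask is 1.
cells = [(i, j) for i in range(len(indices)) for j in range(n_visits_max)]
n_observations_per_ft = [
    int(sum(mask[i][j][k] for i, j in cells)) for k in range(dimension)
]
n_observations = sum(n_observations_per_ft)
L2_norm_per_ft = [
    sum(mask[i][j][k] * values[i][j][k] ** 2 for i, j in cells)
    for k in range(dimension)
]
L2_norm = sum(L2_norm_per_ft)
```

In this toy example the second patient has a single visit, so its second visit slot is pure padding and its mask row is all zeros.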

Parameters:
n_individuals
indices
headers: list[FeatureType]
dimension: int
n_visits: int
timepoints: torch.FloatTensor | None = None
values: torch.FloatTensor | None = None
mask: torch.FloatTensor | None = None
n_observations: int | None = None
n_observations_per_ft: torch.LongTensor | None = None
n_observations_per_ind_per_ft: torch.LongTensor | None = None
n_visits_per_individual: list[int] | None = None
n_visits_max: int | None = None
event_time_name: str | None
event_bool_name: str | None
event_time: torch.FloatTensor | None = None
event_bool: torch.IntTensor | None = None
covariate_names: list[str] | None
covariates: torch.IntTensor | None = None
L2_norm_per_ft: torch.FloatTensor | None = None
L2_norm: torch.FloatTensor | None = None
no_warning = False
get_times_patient(i)

Get ages for patient number i

Parameters:
i : int

The positional index of the patient (caution: not its identifier)

Returns:
torch.Tensor, shape (n_obs_of_patient,)

Contains float

Parameters:

i (int)

Return type:

torch.FloatTensor
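Since the result has shape (n_obs_of_patient,), the padding is presumably dropped before the ages are returned. A minimal sketch of that idea follows, using plain lists instead of tensors; the helper `times_of_patient` and the sample data are hypothetical and do not reproduce the library code.

```python
# Hypothetical padded data for two individuals (n_visits_max = 3).
indices = ["id-1", "id-2"]            # patient identifiers, in dataset order
timepoints = [
    [70.0, 71.5, 73.0],
    [65.0, 0.0, 0.0],                 # one real visit, then padding
]
n_visits_per_individual = [3, 1]


def times_of_patient(i: int) -> list:
    """Ages of the patient at position i, with the padding removed."""
    return timepoints[i][: n_visits_per_individual[i]]


# The argument is a position, so an identifier has to be looked up first.
position = indices.index("id-2")
ages = times_of_patient(position)     # [65.0]
```

The lookup through `indices` illustrates why the parameter is documented as an index rather than an identifier.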

get_event_patient(idx_patient)

Get ages at event for patient number idx_patient

Parameters:
idx_patient : int

The positional index of the patient (caution: not its identifier)

Returns:
tuple [torch.Tensor, torch.Tensor], shape (n_obs_of_patient,)

Contains float

Parameters:

idx_patient (int)

Return type:

tuple[Tensor, Tensor]
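The returned pair presumably corresponds to the event_time and event_bool attributes of the patient. The censoring convention described for those attributes can be sketched as follows; `encode_event` and its arguments are hypothetical names introduced for illustration only.

```python
from typing import Optional, Tuple


def encode_event(
    event_age: Optional[float], last_observation_age: float
) -> Tuple[float, bool]:
    """Return (event_time, event_bool) under the documented convention.

    event_age is None when the event was never observed (censored case).
    """
    if event_age is None:
        # Censored: the time is that of the last observation of the patient.
        return last_observation_age, False
    # Observed: the time is the age at which the event occurred.
    return event_age, True


observed = encode_event(72.0, last_observation_age=73.0)   # (72.0, True)
censored = encode_event(None, last_observation_age=73.0)   # (73.0, False)
```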

get_covariates_patient(idx_patient)

Get covariates for patient number idx_patient

Parameters:
idx_patient : int

The positional index of the patient (caution: not its identifier)

Returns:
torch.Tensor, shape (n_obs_of_patient,)

Contains float

Raises:
ValueError

If the dataset has no covariates.

Parameters:

idx_patient (int)

Return type:

torch.IntTensor

get_values_patient(i, *, adapt_for_model=None)

Get values for patient number i, with nans.

Parameters:
i : int

The positional index of the patient (caution: not its identifier)

adapt_for_model : None (default) or McmcSaemCompatibleModel

The returned values are adapted to this model. In particular:

  • For a model with noise_model='ordinal', one-hot-encoded values are returned [P(X = l), l=0..ordinal_max_level]

  • For a model with noise_model='ordinal_ranking', survival function values are returned [P(X > l), l=0..ordinal_max_level-1]

If None, the raw values are returned, whatever the model is.

Returns:
torch.Tensor, shape (n_obs_of_patient, dimension [, extra_dimension_for_ordinal_models])

Contains floats or NaNs

Parameters:

i (int)

Return type:

torch.FloatTensor
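For the raw case (adapt_for_model=None), "with nans" suggests that missing values are restored from the mask once the padded visits are dropped. The sketch below shows that idea with plain lists. It rests on the assumption that masked entries are stored as 0.0, and `values_with_nans` is a hypothetical helper, not the library's code.

```python
nan = float("nan")

# Hypothetical padded data for one individual: 3 visit slots, 2 real visits, 2 features.
values_i = [[0.1, 0.5], [0.2, 0.0], [0.0, 0.0]]
mask_i = [[1.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
n_visits_i = 2


def values_with_nans(values_i, mask_i, n_visits_i):
    """Drop the padded visits, then put NaN back wherever the mask is 0."""
    return [
        [x if m == 1.0 else nan for x, m in zip(row_values, row_mask)]
        for row_values, row_mask in zip(values_i[:n_visits_i], mask_i[:n_visits_i])
    ]


out = values_with_nans(values_i, mask_i, n_visits_i)
# out has 2 rows (n_obs_of_patient) and 2 columns (dimension);
# the second feature of the second visit is NaN.
```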

to_pandas(apply_headers=False)

Convert the dataset to a DataFrame indexed by ['ID', 'TIME']. If apply_headers is False, the DataFrame contains all covariates, events and repeated measures; otherwise it contains only the repeated measures.

Parameters:
apply_headers : bool

If True, keep only the columns needed for a leaspy fit (the headers attribute)

Returns:
pandas.DataFrame

DataFrame with index ['ID', 'TIME'] and columns corresponding to the features, events and covariates.

Raises:
LeaspyInputError

If the index of the DataFrame is not unique or contains invalid values.

Parameters:

apply_headers (bool)

Return type:

DataFrame

move_to_device(device)

Moves the dataset to the specified device.

Parameters:
device : torch.device

The device to move the dataset to.

Parameters:

device (device)

Return type:

None

get_one_hot_encoding(*, sf, ordinal_infos)

Builds the one-hot encoding of ordinal data once and for all and returns it.

Parameters:
sf : bool

Whether the vector should be the survival function [1(X > l), l=0..max_level-1] instead of the probability density function [1(X=l), l=0..max_level]

ordinal_infos : KwargsType

All the hyperparameters concerning ordinal modelling (in particular the maximum level per feature)

Returns:
torch.LongTensor

One-hot encoding of data values.

Raises:
LeaspyInputError

If the values are not non-negative integers or if the features in ordinal_infos are not consistent with the dataset headers.

Parameters:

sf (bool)

ordinal_infos (KwargsType)

Return type:

torch.LongTensor
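The difference between the two encodings selected by sf can be illustrated on a single ordinal value. This is a minimal sketch that follows the formulas given for the sf parameter; the function `one_hot` and its `max_level` argument are hypothetical and work on one integer rather than on the whole values tensor.

```python
def one_hot(x: int, max_level: int, *, sf: bool) -> list:
    """Encode one ordinal value x in 0..max_level.

    sf=False: indicator of X = l for l = 0..max_level      (length max_level + 1)
    sf=True:  indicator of X > l for l = 0..max_level - 1  (length max_level)
    """
    if not 0 <= x <= max_level:
        raise ValueError(f"expected an integer in [0, {max_level}], got {x}")
    if sf:
        return [int(x > level) for level in range(max_level)]
    return [int(x == level) for level in range(max_level + 1)]


pdf_encoding = one_hot(2, max_level=3, sf=False)   # [0, 0, 1, 0]
sf_encoding = one_hot(2, max_level=3, sf=True)     # [1, 1, 0]
```

The survival-function encoding is one element shorter than the pdf encoding, which matches the "-1 when sf=True" noted in the shape of _one_hot_encoding.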