leaspy.io.data
Submodules
- leaspy.io.data.abstract_dataframe_data_reader
- leaspy.io.data.covariate_dataframe_data_reader
- leaspy.io.data.data
- leaspy.io.data.dataset
- leaspy.io.data.event_dataframe_data_reader
- leaspy.io.data.factory
- leaspy.io.data.individual_data
- leaspy.io.data.joint_dataframe_data_reader
- leaspy.io.data.visit_dataframe_data_reader
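Attributes
- DataframeDataReaderFactoryInput
Classes
- AbstractDataframeDataReader: Methods to convert pandas.DataFrame to Leaspy-compliant data containers.
- Data: Main data container for a collection of individuals.
- Dataset: Data container based on torch.Tensor, used to run algorithms.
- EventDataframeDataReader: Methods to convert pandas.DataFrame to Leaspy-compliant data containers for event data only.
- DataframeDataReaderNames: Enumeration defining the possible names for dataframe data readers.
- IndividualData: Container for an individual's data.
- JointDataframeDataReader: Methods to convert pandas.DataFrame to Leaspy-compliant data containers for event data and longitudinal data.
- VisitDataframeDataReader: Methods to convert pandas.DataFrame to Leaspy-compliant data containers for longitudinal data only.
Functions
- dataframe_data_reader_factory: Factory for dataframe data readers.
Package Contents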
- class AbstractDataframeDataReader
Methods to convert pandas.DataFrame to Leaspy-compliant data containers.
- time_rounding_digits = 6
- individuals: dict[IDType, IndividualData]
- read(df, *, drop_full_nan=True, sort_index=False, warn_empty_column=True)
Read the input dataframe (called automatically in __init__).
- Parameters:
- df: pandas.DataFrame
The dataframe to read.
- drop_full_nan: bool
Whether to drop rows full of NaNs (the index is not taken into account).
- sort_index: bool
Whether to lexsort the index. (False by default so as not to break the many downstream tests that check order.)
- warn_empty_column: bool
Whether to warn when there are empty columns.
- Return type:
None
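As a rough illustration of the preprocessing that the drop_full_nan and sort_index flags describe, the sketch below builds a long-format frame indexed by ID and TIME with plain pandas. It is not leaspy's implementation: the helper preprocess and the feature columns feat_a/feat_b are hypothetical, and applying time_rounding_digits to the TIME column is an assumption based only on the attribute's name.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format input: one row per (subject, visit).
raw = pd.DataFrame(
    {
        "ID": ["s1", "s1", "s2", "s2"],
        "TIME": [60.1234567, 61.5, 70.0, 71.0],
        "feat_a": [0.1, 0.2, np.nan, 0.4],
        "feat_b": [0.5, np.nan, np.nan, 0.7],
    }
)


def preprocess(
    df: pd.DataFrame,
    *,
    drop_full_nan: bool = True,
    sort_index: bool = False,
    time_rounding_digits: int = 6,
) -> pd.DataFrame:
    """Hypothetical helper, not part of leaspy."""
    out = df.copy()
    # Assumption: timepoints are rounded to `time_rounding_digits` decimals.
    out["TIME"] = out["TIME"].round(time_rounding_digits)
    out = out.set_index(["ID", "TIME"])
    if drop_full_nan:
        # Drop visits whose feature values are all NaN (the index is not checked).
        out = out.dropna(how="all")
    if sort_index:
        # Lexicographic sort on (ID, TIME).
        out = out.sort_index()
    return out


clean = preprocess(raw)
print(clean)
```

Here the visit ("s2", 70.0) is dropped because both of its feature values are NaN, while the partially observed visit ("s1", 61.5) is kept.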
- class Data
Bases: collections.abc.Iterable
Main data container for a collection of individuals.
It can be iterated over and sliced, both of these operations being applied to the underlying individuals attribute.
- Attributes:
- individuals
Dict[IDType,IndividualData] Included individuals and their associated data
- iter_to_idx
Dict[int,IDType] Maps an integer index to the associated individual ID
- headers
List[FeatureType] Feature names
- dimension
int Number of features
- n_individuals
int Number of individuals
- n_visits
int Total number of visits
- cofactors
List[FeatureType] Feature names corresponding to cofactors
- event_time_name
str Name of the column that stores the event time in the original dataframe
- event_bool_name
str Name of the column that stores the event indicator (observed or censored) in the original dataframe
- individuals: dict[IDType, IndividualData]
- headers: list[FeatureType] | None = None
- property cofactors: list[FeatureType]
Feature names corresponding to cofactors.
- Returns:
list[FeatureType]: List of feature names corresponding to cofactors.
- Return type:
list[FeatureType]
- load_cofactors(df, *, cofactors=None)
Load cofactors from a pandas.DataFrame into the Data object.
- Parameters:
- df: pandas.DataFrame
The dataframe where the cofactors are stored. Its index should be ID, the identifier of subjects, and it should uniquely index the dataframe (i.e. one row per individual).
- cofactors: list[FeatureType], optional
Names of the column(s) of the dataframe to load as cofactors. If None (default), all the columns of the input dataframe are loaded as cofactors.
- Parameters:
df (DataFrame)
cofactors (Optional[list[FeatureType]])
- Return type:
None
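The cofactor dataframe must be indexed by ID with exactly one row per individual. The pandas sketch below, with hypothetical cofactor columns sex and apoe4, checks that requirement and reshapes the selected columns into the {cofactor_name: cofactor_value} dictionaries that IndividualData.cofactors is documented to hold. It does not call leaspy itself.

```python
import pandas as pd

# Hypothetical cofactor table: one row per individual, indexed by ID.
cofactors_df = pd.DataFrame(
    {"ID": ["s1", "s2", "s3"], "sex": ["F", "M", "F"], "apoe4": [0, 1, 2]}
).set_index("ID")

# The index must uniquely identify subjects (one row per individual).
if not cofactors_df.index.is_unique:
    raise ValueError("The cofactor dataframe must have one row per ID.")

# Mimic cofactors=None (all columns) versus an explicit subset of columns.
selected = None  # or e.g. ["sex"]
columns = list(cofactors_df.columns) if selected is None else selected

# Per-individual dictionaries of the form {cofactor_name: cofactor_value}.
per_individual = cofactors_df[columns].to_dict(orient="index")
print(per_individual["s2"])
```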
- static from_csv_file(path, data_type='visit', *, pd_read_csv_kws={}, facto_kws={}, **df_reader_kws)
Create a Data object from a CSV file.
- Parameters:
- path: str
Path to the CSV file to load (with extension).
- data_type: str
Type of data to read. Can be 'visit' or 'event'.
- pd_read_csv_kws: dict
Keyword arguments that are sent to pandas.read_csv().
- facto_kws: dict
Keyword arguments that are sent to dataframe_data_reader_factory().
- **df_reader_kws
Keyword arguments that are sent to the AbstractDataframeDataReader returned by dataframe_data_reader_factory().
- Returns:
Data: A Data object containing the data from the CSV file.
- Return type:
Data
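The keyword groups of from_csv_file target different stages. The sketch below only exercises the first one with pandas: a dictionary standing in for pd_read_csv_kws is unpacked into pandas.read_csv to read a semicolon-separated file held in memory. The commented-out leaspy call, the file name data.csv and the column names are illustrative assumptions.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated CSV with ID, TIME and two features.
csv_text = "ID;TIME;feat_a;feat_b\ns1;60.0;0.1;0.5\ns1;61.5;0.2;0.6\ns2;70.0;0.3;0.4\n"

# Options forwarded verbatim to pandas.read_csv.
pd_read_csv_kws = {"sep": ";", "dtype": {"ID": str}}
df = pd.read_csv(io.StringIO(csv_text), **pd_read_csv_kws)

# With leaspy installed, the corresponding call would presumably be:
# data = Data.from_csv_file("data.csv", "visit", pd_read_csv_kws={"sep": ";"})
print(df.columns.tolist())  # ['ID', 'TIME', 'feat_a', 'feat_b']
```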
- to_dataframe(*, cofactors=None, reset_index=True)
Convert the Data object to a pandas.DataFrame.
- Parameters:
- cofactors: list[FeatureType] or str, optional
Cofactors to include in the DataFrame. If None (default), no cofactors are included. If "all", all the available cofactors are included.
- reset_index: bool, optional
Whether to reset index levels in the output. Default: True
- Returns:
pandas.DataFrame: A DataFrame containing the individuals' ID, timepoints and associated observations (and, optionally, cofactors).
- Raises:
LeaspyDataInputError: If the Data object does not contain any cofactors.
LeaspyTypeError: If the cofactors argument is not of a valid type.
- Parameters:
cofactors (Optional[Union[list[FeatureType], str]])
reset_index (bool)
- Return type:
pandas.DataFrame
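The cofactors argument of to_dataframe accepts three forms: None, the string "all", or a list of names. A plain-Python sketch of that dispatch follows. resolve_cofactors is a hypothetical helper, and it raises the built-in ValueError and TypeError as stand-ins for leaspy's LeaspyDataInputError and LeaspyTypeError.

```python
from typing import Optional, Union


def resolve_cofactors(
    requested: Optional[Union[list[str], str]], available: list[str]
) -> list[str]:
    """Hypothetical helper mirroring the documented `cofactors` semantics."""
    if requested is None:
        # Default: no cofactor columns in the output.
        return []
    if not available:
        # Stand-in for LeaspyDataInputError: there is nothing to include.
        raise ValueError("The Data object does not contain any cofactors.")
    if requested == "all":
        return list(available)
    if isinstance(requested, list):
        return list(requested)
    # Stand-in for LeaspyTypeError.
    raise TypeError("cofactors must be None, 'all' or a list of names.")


print(resolve_cofactors("all", ["sex", "apoe4"]))  # ['sex', 'apoe4']
```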
- static from_dataframe(df, data_type='visit', factory_kws={}, **kws)
Create a Data object from a pandas.DataFrame.
- Parameters:
- df: pandas.DataFrame
Dataframe containing ID, TIME and features.
- data_type: str
Type of data to read. Can be 'visit', 'event' or 'joint'.
- factory_kws: dict
Keyword arguments that are sent to dataframe_data_reader_factory().
- **kws
Keyword arguments that are sent to the DataframeDataReader.
- Returns:
Data
- Return type:
Data
- static from_individual_values(indices, timepoints=None, values=None, headers=None, event_time_name=None, event_bool_name=None, event_time=None, event_bool=None, covariate_names=None, covariates=None)
Construct Data from a collection of individual data points.
- Parameters:
- indices: List[IDType]
List of the individuals' unique IDs.
- timepoints: List[List[float]]
For each individual i, list of timepoints associated with the observations. The number of such timepoints is noted n_timepoints_i.
- values: List[array-like[float, 2D]]
For each individual i, two-dimensional array-like object containing observed data points. Its expected shape is (n_timepoints_i, n_features).
- headers: List[FeatureType]
Feature names. The number of features is noted n_features.
- Returns:
Data: A Data object containing the individuals and their data.
- Parameters:
headers (Optional[list[FeatureType]])
event_time_name (Optional[str])
event_bool_name (Optional[str])
- Return type:
Data
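The shape contract of from_individual_values (one list of n_timepoints_i timepoints and one (n_timepoints_i, n_features) array per individual, with n_features equal to len(headers)) can be checked with NumPy before building the Data object. check_individual_values below is a hypothetical helper, not part of leaspy.

```python
import numpy as np


def check_individual_values(indices, timepoints, values, headers):
    """Hypothetical validation of the inputs documented for from_individual_values."""
    if not (len(indices) == len(timepoints) == len(values)):
        raise ValueError("indices, timepoints and values need one entry per individual.")
    if len(set(indices)) != len(indices):
        raise ValueError("Individual IDs must be unique.")
    n_features = len(headers)
    for idx, times, vals in zip(indices, timepoints, values):
        vals = np.asarray(vals, dtype=float)
        # Each individual's array must be (n_timepoints_i, n_features).
        if vals.shape != (len(times), n_features):
            raise ValueError(
                f"{idx}: expected shape {(len(times), n_features)}, got {vals.shape}"
            )
    return n_features


indices = ["s1", "s2"]
timepoints = [[60.0, 61.5], [70.0]]
values = [[[0.1, 0.5], [0.2, 0.6]], [[0.3, 0.4]]]
headers = ["feat_a", "feat_b"]
n_features = check_individual_values(indices, timepoints, values, headers)
# With leaspy installed, presumably:
# data = Data.from_individual_values(indices, timepoints, values, headers=headers)
```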
- static from_individuals(individuals, headers=None, event_time_name=None, event_bool_name=None, covariate_names=None)
Construct Data from a list of individuals.
- Parameters:
- individuals: List[IndividualData]
List of individuals.
- headers: List[FeatureType]
List of feature names.
- Returns:
Data: A Data object containing the individuals and their data.
- Parameters:
individuals (list[IndividualData])
headers (Optional[list[FeatureType]])
event_time_name (Optional[str])
event_bool_name (Optional[str])
- Return type:
Data
- extract_longitudinal_only()
Extract longitudinal data from the Data object.
- Returns:
Data: A Data object containing only longitudinal data.
- Raises:
LeaspyDataInputError: If the Data object does not contain any longitudinal data.
- Return type:
Data
- class Dataset(data, *, no_warning=False)
Data container based on torch.Tensor, used to run algorithms.
- Parameters:
- data: Data
The Data object from which the Dataset is created.
- no_warning: bool, default False
Whether to deactivate the warnings emitted by methods of this dataset instance. This is useful because a dataset is rebuilt per individual in scipy minimize, and all relevant warnings have already been emitted for the overall dataset.
- Attributes:
- headers
list[str] Features names
- dimension
int Number of features
- n_individuals
int Number of individuals
- indices
list[IDType] Order of patients
- event_time
torch.FloatTensor Time of the event; if the event is censored, it corresponds to the time of the patient's last observation
- event_bool
torch.BoolTensor Whether the event was observed (1) or censored (0)
- n_visits_per_individual
list[int] Number of visits per individual
- n_visits_max
int Maximum number of visits for one individual
- n_visits
int Total number of visits
- n_observations_per_ind_per_ft
torch.LongTensor, shape (n_individuals, dimension) Number of observations (not taking into account missing values) per individual per feature
- n_observations_per_ft
torch.LongTensor, shape (dimension,) Total number of observations per feature
- n_observations
int Total number of observations
- timepoints
torch.FloatTensor, shape (n_individuals, n_visits_max) Ages of patients at their different visits
- values
torch.FloatTensor, shape (n_individuals, n_visits_max, dimension) Values of patients for each visit for each feature
- mask
torch.FloatTensor, shape (n_individuals, n_visits_max, dimension) Binary mask associated with values. 1: the value is meaningful. 0: the value is meaningless (either it was NaN, or it does not correspond to a real visit and is only there for padding)
- L2_norm_per_ft
torch.FloatTensor, shape (dimension,) Sum of all non-nan squared values, feature per feature
- L2_norm: scalar
torch.FloatTensor Sum of all non-NaN squared values
- no_warning
bool, default False Whether to deactivate the warnings emitted by methods of this dataset instance (see the no_warning parameter).
- _one_hot_encoding
dict[bool, torch.LongTensor] Values of patients for each visit for each feature, tensorized into a one-hot encoding (pdf or sf). Tensor shapes are (n_individuals, n_visits_max, dimension, max_ordinal_level [-1 when sf=True])
- Raises:
LeaspyInputError: If data, model or algo are not compatible together.
- n_individuals
- indices
- headers: list[FeatureType]
- no_warning = False
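The padded tensors listed above can be pictured with NumPy: each individual's visits are written into a (n_individuals, n_visits_max, dimension) array, and mask is 1 only where a real, non-NaN value exists. This sketch uses NumPy instead of torch to stay dependency-light, its variable names only echo the attribute names, and storing NaNs as 0 in the padded values array is an assumption of the sketch rather than leaspy's documented behaviour.

```python
import numpy as np

# Hypothetical per-individual data: timepoints and (n_visits_i, dimension) values.
times = {"s1": [60.0, 61.5, 63.0], "s2": [70.0]}
vals = {"s1": [[0.1, 0.5], [0.2, np.nan], [0.3, 0.7]], "s2": [[np.nan, 0.4]]}

indices = list(times)  # order of patients
n_individuals = len(indices)
dimension = 2
n_visits_per_individual = [len(times[idx]) for idx in indices]
n_visits_max = max(n_visits_per_individual)

# Padded containers; padding entries stay at 0 and are masked out.
timepoints = np.zeros((n_individuals, n_visits_max))
values = np.zeros((n_individuals, n_visits_max, dimension))
mask = np.zeros((n_individuals, n_visits_max, dimension))

for row, idx in enumerate(indices):
    n = n_visits_per_individual[row]
    v = np.asarray(vals[idx], dtype=float)
    timepoints[row, :n] = times[idx]
    # 1 where the value is meaningful, 0 for NaNs (padded visits are already 0).
    mask[row, :n] = ~np.isnan(v)
    # Assumption of this sketch: NaNs are stored as 0 in the padded values array.
    values[row, :n] = np.nan_to_num(v, nan=0.0)

n_observations_per_ind_per_ft = mask.sum(axis=1).astype(int)  # (n_individuals, dimension)
n_observations_per_ft = n_observations_per_ind_per_ft.sum(axis=0)  # (dimension,)
n_observations = int(n_observations_per_ft.sum())
L2_norm_per_ft = (mask * values**2).sum(axis=(0, 1))  # (dimension,)
L2_norm = float(L2_norm_per_ft.sum())
```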
- get_times_patient(i)
Get ages for patient number i.
- Parameters:
- i: int
The index of the patient (not its identifier).
- Returns:
torch.Tensor, shape (n_obs_of_patient,)
Contains floats.
- Parameters:
i (int)
- Return type:
torch.FloatTensor
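Because get_times_patient takes a position rather than an identifier, an ID must first be mapped through the indices attribute. Under the padded layout described for Dataset, the patient's times would then be the first n_visits_per_individual[i] entries of row i. The NumPy sketch below shows this; times_for_id is a hypothetical helper, the padding value 0 is arbitrary, and the slicing is an assumption about the layout rather than leaspy's code.

```python
import numpy as np

# Hypothetical padded layout, as in the Dataset attributes.
indices = ["s1", "s2"]
n_visits_per_individual = [3, 1]
timepoints = np.array([[60.0, 61.5, 63.0], [70.0, 0.0, 0.0]])


def times_for_id(patient_id):
    """Hypothetical helper: map an ID to its position, then drop padding."""
    i = indices.index(patient_id)  # position i, not the identifier
    return timepoints[i, : n_visits_per_individual[i]]


print(times_for_id("s2"))  # [70.]
```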
- get_event_patient(idx_patient)
Get ages at event for patient number idx_patient.
- Parameters:
- idx_patient: int
The index of the patient (not its identifier).
- Returns:
tuple[torch.Tensor, torch.Tensor], shape (n_obs_of_patient,)
Contains floats.
- Parameters:
idx_patient (int)
- Return type:
tuple[torch.Tensor, torch.Tensor]
- get_covariates_patient(idx_patient)
Get covariates for patient number idx_patient.
- Parameters:
- idx_patient: int
The index of the patient (not its identifier).
- Returns:
torch.Tensor, shape (n_obs_of_patient,)
Contains float
- Raises:
ValueError: If the dataset has no covariates.
- Parameters:
idx_patient (int)
- Return type:
torch.IntTensor
- get_values_patient(i, *, adapt_for_model=None)
Get values for patient number i, with NaNs.
- Parameters:
- i: int
The index of the patient (not its identifier).
- adapt_for_model: None (default) or McmcSaemCompatibleModel
The values returned are adapted to this model. In particular:
For a model with noise_model='ordinal', one-hot-encoded values [P(X = l), l=0..ordinal_max_level] are returned.
For a model with noise_model='ordinal_ranking', survival function values [P(X > l), l=0..ordinal_max_level-1] are returned.
If None, the raw values are returned, whatever the model is.
- Returns:
torch.Tensor, shape (n_obs_of_patient, dimension [, extra_dimension_for_ordinal_models])
Contains floats or NaNs.
- Parameters:
i (int)
- Return type:
torch.FloatTensor
- to_pandas(apply_headers=False)
Convert the dataset to a DataFrame indexed by ['ID', 'TIME']. If apply_headers is False, the DataFrame contains all covariates, events and repeated measures; otherwise it contains only the repeated measures.
- Parameters:
- apply_headers: bool
If True, select only the columns needed for a leaspy fit (the headers attribute).
- Returns:
pandas.DataFrame: DataFrame with index ['ID', 'TIME'] and columns corresponding to the features, events and covariates.
- Raises:
LeaspyInputError: If the index of the DataFrame is not unique or contains invalid values.
- Parameters:
apply_headers (bool)
- Return type:
pandas.DataFrame
- move_to_device(device)
Move the dataset to the specified device.
- Parameters:
- device: torch.device
- Parameters:
device (device)
- Return type:
None
- get_one_hot_encoding(*, sf, ordinal_infos)
Build the one-hot encoding of ordinal data once and for all, and return it.
- Parameters:
- sf: bool
Whether the vector should be the survival function [1(X > l), l=0..max_level-1] instead of the probability density function [1(X = l), l=0..max_level].
- ordinal_infos: KwargsType
All the hyperparameters concerning ordinal modelling (in particular the maximum level per feature).
- Returns:
torch.LongTensor: One-hot encoding of the data values.
- Raises:
LeaspyInputError: If the values are not non-negative integers or if the features in ordinal_infos are not consistent with the dataset headers.
- Parameters:
sf (bool)
ordinal_infos (KwargsType)
- Return type:
torch.LongTensor
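For a feature with maximum level max_level, the pdf encoding has max_level + 1 slots holding 1(X = l), while the sf encoding has max_level slots holding 1(X > l). A NumPy sketch for a single feature follows; one_hot is a hypothetical function that ignores the per-feature ordinal_infos bookkeeping and the padding mask.

```python
import numpy as np


def one_hot(x, max_level, sf):
    """Hypothetical encoder for non-negative integer levels of one feature."""
    x = np.asarray(x)
    if np.any(x < 0) or np.any(x > max_level) or np.any(x != np.floor(x)):
        raise ValueError("Values must be integers in [0, max_level].")
    x = x.astype(int)
    if sf:
        levels = np.arange(max_level)  # l = 0..max_level-1
        return (x[..., None] > levels).astype(np.int64)
    levels = np.arange(max_level + 1)  # l = 0..max_level
    return (x[..., None] == levels).astype(np.int64)


x = [0, 2, 3]
pdf = one_hot(x, max_level=3, sf=False)
sf_ = one_hot(x, max_level=3, sf=True)
```

The sf encoding can also be derived from the pdf one with a reverse cumulative sum over levels, dropping the first slot, since 1(X > l) equals the sum of 1(X = k) for k > l.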
- class EventDataframeDataReader(*, event_time_name='EVENT_TIME', event_bool_name='EVENT_BOOL', nb_events=None)
Bases: leaspy.io.data.abstract_dataframe_data_reader.AbstractDataframeDataReader
Methods to convert pandas.DataFrame to Leaspy-compliant data containers for event data only.
- Parameters:
- event_time_name: str
Name of the column in the dataframe that contains the event time.
- event_bool_name: str
Name of the column in the dataframe that indicates whether the event is censored or not.
- event_time_name = 'EVENT_TIME'
- event_bool_name = 'EVENT_BOOL'
- nb_events = None
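With the default column names, an event table can be pictured as one event time and one event indicator per individual. The pandas sketch below builds such a frame and checks those two properties; the one-row-per-ID constraint and the 0/1 encoding are inferred from the event_time/event_bool attribute descriptions of IndividualData, and the multiple-event case (nb_events) is not covered.

```python
import pandas as pd

event_time_name, event_bool_name = "EVENT_TIME", "EVENT_BOOL"  # documented defaults

# Hypothetical event table: one row per individual.
events = pd.DataFrame(
    {
        "ID": ["s1", "s2", "s3"],
        event_time_name: [65.2, 72.0, 80.4],
        # 1 = event observed, 0 = censored (time is then the last observation).
        event_bool_name: [1, 0, 1],
    }
).set_index("ID")

if not events.index.is_unique:
    raise ValueError("Expected one event row per individual.")
if not events[event_bool_name].isin([0, 1]).all():
    raise ValueError("Event indicator must be 0 (censored) or 1 (observed).")

n_observed = int(events[event_bool_name].sum())
n_censored = len(events) - n_observed
```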
- DataframeDataReaderFactoryInput
- class DataframeDataReaderNames(*args, **kwds)
Bases: enum.Enum
Enumeration defining the possible names for dataframe data readers.
- EVENT = 'event'
- VISIT = 'visit'
- JOINT = 'joint'
- COVARIATE = 'covariate'
- classmethod from_string(reader_name)
Return the enum member corresponding to the given string.
- Parameters:
- reader_name: str
The name of the reader, case-insensitive.
- Returns:
DataframeDataReaderNames: The corresponding enum member.
- Raises:
NotImplementedError: If the provided reader_name does not match any of the enum members. The error message lists the valid names.
- Parameters:
reader_name (str)
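from_string resolves a case-insensitive name to an enum member and lists the valid names when it fails. A self-contained sketch of that pattern follows; ReaderName is a hypothetical re-creation of DataframeDataReaderNames, not an import from leaspy.

```python
from enum import Enum


class ReaderName(Enum):
    """Hypothetical stand-in for DataframeDataReaderNames."""

    EVENT = "event"
    VISIT = "visit"
    JOINT = "joint"
    COVARIATE = "covariate"

    @classmethod
    def from_string(cls, reader_name: str) -> "ReaderName":
        try:
            # Enum lookup by value, after normalising the case.
            return cls(reader_name.lower())
        except ValueError:
            valid = ", ".join(repr(member.value) for member in cls)
            raise NotImplementedError(
                f"Reader {reader_name!r} is not implemented. Valid names: {valid}."
            ) from None


print(ReaderName.from_string("Visit"))  # ReaderName.VISIT
```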
- dataframe_data_reader_factory(reader, **kwargs)
Factory for dataframe data readers.
- Parameters:
- reader: str or AbstractDataframeDataReader or dict[str, …]
If an AbstractDataframeDataReader, the instance is returned as is.
If a string, a new instance of the appropriate class is returned (with optional parameters kwargs).
If a dictionary, it must contain the 'name' key and other initialization parameters.
- **kwargs
Optional parameters for initializing the requested reader when reader is a string.
- Returns:
AbstractDataframeDataReader: The desired dataframe data reader.
- Raises:
LeaspyModelInputError: If reader is not supported.
- Parameters:
reader (DataframeDataReaderFactoryInput)
- Return type:
AbstractDataframeDataReader
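The factory pattern described above can be sketched in plain Python: a reader instance passes through unchanged, while a name is normalised, looked up in a registry, and instantiated with the remaining keyword arguments. All class names below are hypothetical placeholders for leaspy's reader classes, the registry is an assumption about how the dispatch might be organised, and the dictionary form is left out.

```python
from typing import Union


class BaseReader:
    """Hypothetical stand-in for AbstractDataframeDataReader."""


class VisitReader(BaseReader):
    pass


class EventReader(BaseReader):
    def __init__(self, *, event_time_name="EVENT_TIME", event_bool_name="EVENT_BOOL"):
        self.event_time_name = event_time_name
        self.event_bool_name = event_bool_name


# Hypothetical registry mapping reader names to classes.
REGISTRY = {"visit": VisitReader, "event": EventReader}


def reader_factory(reader: Union[str, BaseReader], **kwargs) -> BaseReader:
    if isinstance(reader, BaseReader):
        # An existing reader instance is returned as is.
        return reader
    if isinstance(reader, str):
        name = reader.lower()
        if name not in REGISTRY:
            raise ValueError(f"Reader {reader!r} is not supported: {sorted(REGISTRY)}")
        # Extra keyword arguments are forwarded to the reader's constructor.
        return REGISTRY[name](**kwargs)
    raise TypeError(f"Unsupported reader specification: {type(reader).__name__}")


ev = reader_factory("Event", event_time_name="T_EVENT")
```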
- class IndividualData(idx)
Container for an individual's data.
- Parameters:
- idx: IDType
Unique ID
- Attributes:
- idx
IDType Unique ID
- timepoints
np.ndarray[float] 1D array of timepoints associated with the observations
- observations
np.ndarray[float] Observed data points, of shape (n_timepoints, n_features)
- cofactors
dict[FeatureType, Any] Cofactors in the form {cofactor_name: cofactor_value}
- event_time
float Time of the event; if the event is censored, it corresponds to the time of the patient's last observation
- event_bool
bool Whether the event was observed (1) or censored (0)
- Parameters:
idx (IDType)
- cofactors: dict[FeatureType, Any]
- add_observations(timepoints, observations)
Include new observations and associated timepoints.
- add_cofactors(cofactors)
Include new cofactors.
- Parameters:
- cofactors: dict[FeatureType, Any]
Cofactors to include, in the form {name: value}.
- Parameters:
cofactors (dict[FeatureType, Any])
- Return type:
None
- to_frame(headers, event_time_name, event_bool_name, covariate_names)
Convert the individual data to a pandas DataFrame.
- Returns:
pd.DataFrame: DataFrame containing the individual's data with the following columns:
ID: Unique identifier for the individual
TIME: Timepoints associated with the observations
Observations: Observed data points for each feature
Event Time: Time of the event (if any)
Event Boolean: Boolean indicating whether the event was observed (1) or censored (0)
Covariates: Values of the covariates for the individual
- Return type:
pd.DataFrame
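The column layout returned by to_frame can be mimicked with pandas for a single individual. In this sketch, the column names (EVENT_TIME, EVENT_BOOL, sex and the feature names) and the choice to repeat the event and covariate values on every visit row are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical content of one IndividualData instance.
idx = "s1"
timepoints = np.array([60.0, 61.5])
observations = np.array([[0.1, 0.5], [0.2, 0.6]])  # shape (n_timepoints, n_features)
headers = ["feat_a", "feat_b"]
event_time, event_bool = 65.2, True
covariates = {"sex": 0}

frame = pd.DataFrame(observations, columns=headers)
frame.insert(0, "ID", idx)
frame.insert(1, "TIME", timepoints)
# Assumption: event and covariate values are repeated on every visit row.
frame["EVENT_TIME"] = event_time
frame["EVENT_BOOL"] = int(event_bool)  # 1 observed, 0 censored
for name, value in covariates.items():
    frame[name] = value
```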
- class JointDataframeDataReader(*, event_time_name='EVENT_TIME', event_bool_name='EVENT_BOOL', nb_events=None)
Bases: leaspy.io.data.abstract_dataframe_data_reader.AbstractDataframeDataReader
Methods to convert pandas.DataFrame to Leaspy-compliant data containers for event data and longitudinal data.
- Parameters:
- event_time_name: str
Name of the column in the dataframe that contains the event time.
- event_bool_name: str
Name of the column in the dataframe that indicates whether the event is censored or not.
- tol_diff = 0.001
- visit_reader
- event_reader
- property dimension: int | None
Number of longitudinal outcomes in the dataset.
- Return type:
Optional[int]
- property long_outcome_names: list[FeatureType]
Names of the longitudinal outcomes in the dataset.
- Return type:
list[FeatureType]
- class VisitDataframeDataReader
Bases: leaspy.io.data.abstract_dataframe_data_reader.AbstractDataframeDataReader
Methods to convert pandas.DataFrame to Leaspy-compliant data containers for longitudinal data only.
- Raises:
LeaspyDataInputError