Quickstart with Leaspy
This example demonstrates how to quickly use Leaspy with properly formatted data.
Leaspy uses its own data container. To use it correctly, you need to provide either a CSV file or a pandas.DataFrame in long format.
Below is an example of synthetic longitudinal data illustrating how to use Leaspy:
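from leaspy.datasets import load_dataset

# Load the synthetic Alzheimer's disease example dataset.
alzheimer_df = load_dataset("alzheimer")
print(alzheimer_df.columns)
# Keep four of the available outcomes for this example.
alzheimer_df = alzheimer_df[["MMSE", "RAVLT", "FAQ", "FDG PET"]]
print(alzheimer_df.head())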
Index(['E-Cog Subject', 'E-Cog Study-partner', 'MMSE', 'RAVLT', 'FAQ',
       'FDG PET', 'Hippocampus volume ratio'],
      dtype='object')
                      MMSE     RAVLT       FAQ   FDG PET
ID     TIME
GS-001 73.973183  0.111998  0.510524  0.178827  0.454605
       74.573181  0.029991  0.749223  0.181327  0.450064
       75.173180  0.121922  0.779680  0.026179  0.662006
       75.773186  0.092102  0.649391  0.156153  0.585949
       75.973183  0.203874  0.612311  0.320484  0.634809
The data correspond to repeated visits (TIME index) of different participants (ID index).
Each visit records four outcomes: MMSE, RAVLT, FAQ, and FDG PET.
Warning
You MUST include both ID and TIME, either as indices or as columns.
The remaining columns should correspond to the observed variables
(also called features or endpoints).
Each feature should have its own column, and each visit should occupy one row.
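The layout described above can be sketched with plain pandas. This is a minimal illustration, not part of Leaspy itself: the participant IDs, visit times, and the column names `score_a` and `score_b` are hypothetical, and the two checks are only suggestions for catching common formatting mistakes before the table is handed to Leaspy.

```python
import pandas as pd

# Hypothetical toy data: two participants, one row per visit.
visits = pd.DataFrame(
    {
        "ID": ["P-01", "P-01", "P-01", "P-02", "P-02"],
        "TIME": [70.0, 70.5, 71.2, 65.3, 66.1],  # assumed: age at visit
        "score_a": [0.10, 0.14, 0.21, 0.30, 0.38],
        "score_b": [0.05, 0.07, 0.12, 0.22, 0.25],
    }
)

# Both ID and TIME must be present, and a (ID, TIME) pair should identify one visit.
missing = {"ID", "TIME"} - set(visits.columns)
assert not missing, f"missing required columns: {missing}"
assert not visits.duplicated(subset=["ID", "TIME"]).any(), "duplicated visit"

# Same layout as the example dataset: a sorted (ID, TIME) MultiIndex,
# with one column per feature.
long_df = visits.set_index(["ID", "TIME"]).sort_index()
print(long_df)
```

Keeping ID and TIME as ordinary columns should work equally well, since the text above allows either form.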
Warning
Leaspy supports linear and logistic models.
The features MUST be increasing over time.
For logistic models, data must be rescaled between 0 and 1.
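One possible way to satisfy both constraints is min-max rescaling, with scores that decline over time flipped so that they increase. The sketch below uses only pandas; the helper `rescale_for_logistic` and the two score columns are hypothetical and are not Leaspy functions. It rescales with the observed minimum and maximum, whereas in practice the theoretical bounds of each scale may be a better choice.

```python
import pandas as pd


def rescale_for_logistic(df, decreasing=()):
    """Min-max rescale each column to [0, 1] and flip the decreasing ones.

    Hypothetical helper: assumes every column has at least two distinct
    values, otherwise the division below would be by zero.
    """
    lo, hi = df.min(), df.max()
    scaled = (df - lo) / (hi - lo)
    for col in decreasing:
        # A score that declines with progression becomes an increasing one.
        scaled[col] = 1.0 - scaled[col]
    return scaled


raw = pd.DataFrame(
    {
        "memory_score": [30.0, 28.0, 25.0, 21.0],  # hypothetical, declines
        "disability_score": [0.0, 2.0, 5.0, 9.0],  # hypothetical, rises
    }
)
scaled = rescale_for_logistic(raw, decreasing=["memory_score"])
print(scaled)
```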
from leaspy.io.data import Data
data = Data.from_dataframe(alzheimer_df)
See also
For a deeper understanding of the Data and Dataset classes, including
iteration, cofactors, and best practices, see the
Data Containers Guide.
The core functionality of Leaspy is to estimate the group-average trajectory of the variables measured in a population. To do this, you need to choose a model. For example, a logistic model can be initialized and fitted as follows:
from leaspy.models import LogisticModel
model = LogisticModel(name="test-model", source_dimension=2)
model.fit(
    data,
    "mcmc_saem",
    seed=42,
    n_iter=100,
    progress_bar=False,
)
model.summary()
Fit with `AlgorithmName.FIT_MCMC_SAEM` took: 1.82s
================================================================================
Model Summary
================================================================================
Model Name: test-model
Model Type: LogisticModel
Features (4): MMSE, RAVLT, FAQ, FDG PET
Sources (2): Source 0 (s0), Source 1 (s1)
Observation Models: gaussian-scalar
Neg. Log-Likelihood: -7951.8618
Training Metadata
--------------------------------------------------------------------------------
Algorithm: AlgorithmName.FIT_MCMC_SAEM
Seed: 42
Iterations: 100
Data Context
--------------------------------------------------------------------------------
Subjects: 200
Visits: 1975
Total Observations: 7900
Leaspy Version: 2.0.1
================================================================================
Population Parameters
--------------------------------------------------------------------------------
betas_mean:
         s0       s1
b0   0.0542   0.0574
b1  -0.0900  -0.0068
b2   0.0664  -0.0534
               MMSE    RAVLT      FAQ  FDG PET
log_g_mean   1.5245  -0.8326   0.5165  -0.3691
                MMSE    RAVLT      FAQ  FDG PET
log_v0_mean  -3.3761  -3.5254  -2.2545  -3.6828
Individual Parameters
--------------------------------------------------------------------------------
tau_mean [78.5381]
tau_std [8.5363]
xi_std [0.5165]
Noise Model
--------------------------------------------------------------------------------
noise_std 0.0735
================================================================================
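The population parameters in the summary are reported on a log scale. The sketch below converts them back using only the standard library. It rests on two assumptions that should be checked against the model documentation: that `log_g_mean` and `log_v0_mean` are natural logarithms, and that the logistic curve is parameterized so that its value at the reference time is 1 / (1 + g), with v0 the progression rate at that time.

```python
import math

# Values copied from the model summary above.
features = ["MMSE", "RAVLT", "FAQ", "FDG PET"]
log_g_mean = [1.5245, -0.8326, 0.5165, -0.3691]
log_v0_mean = [-3.3761, -3.5254, -2.2545, -3.6828]

values_at_reference = {}
for name, log_g, log_v0 in zip(features, log_g_mean, log_v0_mean):
    g = math.exp(log_g)    # assumed: natural log
    v0 = math.exp(log_v0)  # assumed: rate at the reference time
    # Assumed parameterization: curve value at the reference time is 1 / (1 + g).
    values_at_reference[name] = 1.0 / (1.0 + g)
    print(f"{name:8s} g={g:.3f} v0={v0:.4f} value={values_at_reference[name]:.3f}")
```

Under these assumptions, a smaller g means the feature is further along its logistic curve at the reference time.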
Leaspy can also estimate the individual trajectories of each participant.
This is done using a personalization algorithm, here scipy_minimize:
individual_parameters = model.personalize(
    data, "scipy_minimize", seed=0, progress_bar=False, use_jacobian=False
)
print(individual_parameters.to_dataframe())
Personalize with `AlgorithmName.PERSONALIZE_SCIPY_MINIMIZE` took: 38.26s
        sources_0  sources_1        tau         xi
ID
GS-001   0.519938   0.350398  78.325272  -0.347083
GS-002  -0.727231  -0.153210  77.355064  -0.584110
GS-003  -0.231240  -0.893911  77.242165   0.068400
GS-004   0.139597  -0.115736  78.953514   0.428237
GS-005   0.236304  -1.879540  85.565277  -0.010133
...           ...        ...        ...        ...
GS-196   0.479973  -1.056671  73.667122   0.313890
GS-197   0.532045   1.018136  81.426926  -0.557547
GS-198  -0.119706  -0.098844  84.578064   0.161188
GS-199  -0.015778  -2.901355  94.292450  -0.156172
GS-200   0.926342  -0.821031  77.081177   0.782181
[200 rows x 4 columns]
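The `tau` and `xi` columns can be read as a time shift and a log speed factor. The sketch below turns a few rows of the table above into more directly interpretable quantities. It assumes that each participant's timeline is rescaled by exp(xi) around their own tau, and that TIME is expressed in years; the helper `describe` is hypothetical and not part of Leaspy.

```python
import math

tau_mean = 78.5381  # population value from the model summary

# A few rows copied from the personalization output.
individual = {
    "GS-001": {"tau": 78.325272, "xi": -0.347083},
    "GS-004": {"tau": 78.953514, "xi": 0.428237},
    "GS-199": {"tau": 94.292450, "xi": -0.156172},
}


def describe(params, tau_mean):
    """Hypothetical helper: speed factor and time shift relative to the average."""
    return {
        # Assumed: exp(xi) > 1 means faster than average progression.
        "speed_factor": math.exp(params["xi"]),
        # Assumed: positive shift means later than average, in TIME units.
        "shift": params["tau"] - tau_mean,
    }


described = {pid: describe(p, tau_mean) for pid, p in individual.items()}
for pid, d in described.items():
    print(f"{pid}: speed x{d['speed_factor']:.2f}, shift {d['shift']:+.2f}")
```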
We have seen how to fit a model and personalize it to individuals. Leaspy also provides plotting functions to visualize the results. The next section shows how to plot the group-average trajectory and the individual trajectories using the Parkinson’s disease dataset.
To go further:
See the User Guide and full API documentation.
Explore additional examples.