Simulating Data with Leaspy
This example demonstrates how to use Leaspy to simulate longitudinal data based on a fitted model.
The following imports bring in the required modules and load the synthetic Parkinson dataset from Leaspy. A logistic model will be fitted on this dataset and then used to simulate new longitudinal data.
from leaspy.datasets import load_dataset
from leaspy.io.data import Data
df = load_dataset("parkinson")
The clinical and imaging features of interest are selected and the DataFrame is converted
into a Leaspy Data object that can be used for model fitting.
data = Data.from_dataframe(
    df[
        [
            "MDS1_total",
            "MDS2_total",
            "MDS3_off_total",
            "SCOPA_total",
            "MOCA_total",
            "REM_total",
            "PUTAMEN_R",
            "PUTAMEN_L",
            "CAUDATE_R",
            "CAUDATE_L",
        ]
    ]
)
A logistic model with a two-dimensional latent space is initialized.
from leaspy.models import LogisticModel
model = LogisticModel(name="test-model", source_dimension=2)
The model is fitted to the data using the MCMC-SAEM algorithm, running 100 iterations with the progress bar disabled.
model.fit(
    data,
    "mcmc_saem",
    n_iter=100,
    progress_bar=False,
)
Fit with `AlgorithmName.FIT_MCMC_SAEM` took: 4.17s
The parameters for simulating patient visits are defined. These parameters specify the number of patients, the visit spacing, and the timing variability.
visit_params = {
    "patient_number": 5,
    "visit_type": "random",  # The visit type could also be 'dataframe' with df_visits.
    # "df_visits": df_test  # Example for custom visit schedule.
    "first_visit_mean": 0.0,  # The mean of the first visit age/time.
    "first_visit_std": 0.4,  # The standard deviation of the first visit age/time.
    "time_follow_up_mean": 11,  # The mean follow-up time.
    "time_follow_up_std": 0.5,  # The standard deviation of the follow-up time.
    "distance_visit_mean": 2 / 12,  # The mean spacing between visits in years.
    "distance_visit_std": 0.75 / 12,  # The standard deviation of the spacing between visits in years.
    "min_spacing_between_visits": 1,  # The minimum allowed spacing between visits.
}
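To make the role of these timing parameters concrete, the following minimal sketch draws one visit schedule from them. It is an illustration under stated assumptions, not Leaspy's implementation: the helper `sample_visit_times` is hypothetical, every quantity is assumed to be a Gaussian draw, and any spacing below `min_spacing_between_visits` is assumed to be raised to that minimum.

```python
import random


def sample_visit_times(params, rng):
    """Draw one patient's visit times (hypothetical helper, not part of Leaspy)."""
    min_gap = params["min_spacing_between_visits"]
    if min_gap <= 0:
        raise ValueError("min_spacing_between_visits must be positive")

    # Assumption: the first visit and the follow-up duration are Gaussian draws.
    first = rng.gauss(params["first_visit_mean"], params["first_visit_std"])
    follow_up = max(
        0.0, rng.gauss(params["time_follow_up_mean"], params["time_follow_up_std"])
    )

    times = [first]
    while True:
        # Assumption: each spacing is a Gaussian draw, raised to the minimum if smaller.
        gap = max(
            min_gap,
            rng.gauss(params["distance_visit_mean"], params["distance_visit_std"]),
        )
        if times[-1] + gap > first + follow_up:
            break  # the next visit would fall after the end of follow-up
        times.append(times[-1] + gap)
    return times


# Same values as the dictionary above, restricted to the timing parameters.
timing = {
    "first_visit_mean": 0.0,
    "first_visit_std": 0.4,
    "time_follow_up_mean": 11,
    "time_follow_up_std": 0.5,
    "distance_visit_mean": 2 / 12,
    "distance_visit_std": 0.75 / 12,
    "min_spacing_between_visits": 1,
}

visits = sample_visit_times(timing, random.Random(0))
print(len(visits), [round(t, 2) for t in visits])
```

With these values the drawn spacings (mean 2/12) fall far below the minimum of 1, so under the clamping assumption consecutive visits end up one time unit apart.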
A new longitudinal dataset is simulated from the fitted model using the specified parameters.
df_sim = model.simulate(
    algorithm="simulate",
    features=[
        "MDS1_total",
        "MDS2_total",
        "MDS3_off_total",
        "SCOPA_total",
        "MOCA_total",
        "REM_total",
        "PUTAMEN_R",
        "PUTAMEN_L",
        "CAUDATE_R",
        "CAUDATE_L",
    ],
    visit_parameters=visit_params,
)
Simulate with `simulate` took: 0.03s
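Conceptually, simulating from a fitted model amounts to drawing individual parameters, evaluating the model curve at each visit time, and adding observation noise. The sketch below illustrates that idea for a single feature. It is a simplified stand-in, not Leaspy's algorithm: the function names, the plain logistic curve, the Gaussian noise, and all numeric defaults are assumptions chosen for illustration.

```python
import math
import random


def logistic(t, t0, rate):
    """Logistic curve with values in (0, 1), equal to 0.5 at t == t0."""
    return 1.0 / (1.0 + math.exp(-rate * (t - t0)))


def simulate_patient(visit_times, rng, t0=70.0, rate=0.2,
                     tau_std=5.0, xi_std=0.5, noise_std=0.05):
    """Simulate one feature for one patient (hypothetical, simplified)."""
    # Individual parameters (assumed Gaussian): a time shift and a log-acceleration.
    tau = rng.gauss(0.0, tau_std)
    xi = rng.gauss(0.0, xi_std)

    values = []
    for t in visit_times:
        # Reparametrized time: the patient's own onset (tau) and pace (exp(xi)).
        t_individual = t0 + math.exp(xi) * (t - t0 - tau)
        noisy = logistic(t_individual, t0, rate) + rng.gauss(0.0, noise_std)
        # Keep observations inside the [0, 1] range of the logistic curve.
        values.append(min(1.0, max(0.0, noisy)))
    return values


visit_times = [67.0 + k for k in range(5)]
print(simulate_patient(visit_times, random.Random(0)))
```

Setting `noise_std=0.0` in this sketch returns the noiseless individual trajectory, which is non-decreasing in time because the acceleration factor `exp(xi)` is always positive.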
The simulated data is then inspected as a pandas DataFrame; its first ten rows are displayed below.
df_sim.head(10)
| | ID | TIME | MDS1_total | MDS2_total | MDS3_off_total | SCOPA_total | MOCA_total | REM_total | PUTAMEN_R | PUTAMEN_L | CAUDATE_R | CAUDATE_L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 67.0 | 0.056451 | 0.058039 | 0.230159 | 0.062639 | 0.059808 | 0.114918 | 0.821016 | 0.714938 | 0.588145 | 0.526200 |
| 1 | 0 | 68.0 | 0.002603 | 0.027123 | 0.202961 | 0.081504 | 0.188143 | 0.253846 | 0.664004 | 0.864308 | 0.507004 | 0.616428 |
| 2 | 0 | 69.0 | 0.105405 | 0.046672 | 0.181350 | 0.098864 | 0.089694 | 0.242480 | 0.726539 | 0.663749 | 0.544198 | 0.588475 |
| 3 | 0 | 70.0 | 0.141412 | 0.215193 | 0.141201 | 0.338784 | 0.062953 | 0.304016 | 0.720180 | 0.801322 | 0.684520 | 0.622316 |
| 4 | 0 | 71.0 | 0.090930 | 0.115712 | 0.241580 | 0.188332 | 0.112838 | 0.313152 | 0.753145 | 0.799685 | 0.742272 | 0.700213 |
| 5 | 0 | 72.0 | 0.154567 | 0.161833 | 0.292684 | 0.425679 | 0.064513 | 0.214714 | 0.785086 | 0.703949 | 0.532459 | 0.654753 |
| 6 | 0 | 73.0 | 0.135401 | 0.200666 | 0.212516 | 0.164426 | 0.147430 | 0.243140 | 0.721107 | 0.951013 | 0.882440 | 0.801490 |
| 7 | 0 | 74.0 | 0.282612 | 0.249392 | 0.336619 | 0.197981 | 0.236832 | 0.218571 | 0.855503 | 0.755934 | 0.831121 | 0.675773 |
| 8 | 0 | 75.0 | 0.136870 | 0.180738 | 0.255311 | 0.161814 | 0.075605 | 0.194809 | 0.751832 | 0.853540 | 0.822887 | 0.730259 |
| 9 | 0 | 76.0 | 0.172981 | 0.251232 | 0.425894 | 0.443622 | 0.101551 | 0.231374 | 0.949818 | 0.850079 | 0.836381 | 0.516746 |
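A quick sanity check on simulated output can catch obvious problems before further analysis. The sketch below uses the first three rows of the table above and verifies two properties: feature values lie in [0, 1] (an assumption, suggested by the logistic model and the values shown), and visit times increase within each patient. The helper `check_simulated_rows` is hypothetical and uses plain Python rather than any Leaspy or pandas call.

```python
# First three rows of the table above: (ID, TIME, feature values).
rows = [
    (0, 67.0, [0.056451, 0.058039, 0.230159, 0.062639, 0.059808,
               0.114918, 0.821016, 0.714938, 0.588145, 0.526200]),
    (0, 68.0, [0.002603, 0.027123, 0.202961, 0.081504, 0.188143,
               0.253846, 0.664004, 0.864308, 0.507004, 0.616428]),
    (0, 69.0, [0.105405, 0.046672, 0.181350, 0.098864, 0.089694,
               0.242480, 0.726539, 0.663749, 0.544198, 0.588475]),
]


def check_simulated_rows(rows):
    """Return a list of human-readable problems (hypothetical helper)."""
    problems = []
    last_time = {}  # most recent visit time seen for each patient ID
    for patient_id, time, values in rows:
        # Assumption: features are scaled to the [0, 1] range.
        if any(not 0.0 <= v <= 1.0 for v in values):
            problems.append(f"ID {patient_id}, TIME {time}: value outside [0, 1]")
        # Visits of a given patient should be listed in increasing time order.
        if patient_id in last_time and time <= last_time[patient_id]:
            problems.append(f"ID {patient_id}: TIME {time} is not increasing")
        last_time[patient_id] = time
    return problems


print(check_simulated_rows(rows))
```

An empty list means that no issue was found in the rows checked.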
This concludes the simulation example using Leaspy. Stay tuned for more examples on model fitting and analysis!