3. Models

3.1. Introduction to Spatio-Temporal Models

3.1.1. Temporal Random Effects

Individual temporal variability for patient \(i\) is modeled with the latent disease age \(\psi_i(t)\) :

\[ \psi_i(t) = e^{\xi_i}(t - \tau_i) + t_0 \]

where:

The longitudinal \( \gamma_i(t)\) and survival \(S_i(t)\) processes are derived from \( \psi_i(t) \).

Key Hypothesis: Longitudinal and survival processes are linked by a shared latent disease age.

3.1.2. Spatial Random Effects

Disease presentation variability is captured by space-shifts : \(\mathbf{w}_i = \mathbf{A} \mathbf{s}_i\) where:

  • \(\mathbf{A}\): mixing matrix (dimension reduction with \(N_s \leq K-1 \) independent sources: \(N_s\) being the number of sources and \(K\) the number of outcomes).

  • \( \mathbf{s}_i\): Independent sources

For identifiability, \( \mathbf{A} \) is defined as a linear combination of an orthonormal basis \( (\mathbf{B}_k)_{1 \leq k \leq K} \) orthogonal to \( \text{Span}(\mathbf{v}_0) \) [36], so that:

\[ \mathbf{A} = (\mathbf{B}\beta)^T \]

with \(\beta\) the matrix of coefficients.

Each event ( \(l\) ) has a survival shift:

\[ u_{i,l} = \sum_{m=1}^{N_s} \zeta_{l,m} s_{i,m} \]

Interpretation: Space shifts (\(w_{i,k}\)) are more interpretable than sources (\(s_i \)), as they encapsulate total spatial variability effects.

3.2. Logistic Model

3.2.1. Definition

The logistic model is a statistical model used to describe sigmoidal trajectories of longitudinal outcomes over time, typically biomarkers or clinical scores. This model is particularly useful when outcomes exhibit a monotonic progression from one asymptote to another, also called as bounded outcome scores, which is common in disease progression [20]. The logistic model is governed by parameters, which control its speed, inflection point, and asymmetry.

In Leaspy, the logistic model is implemented using a non-linear mixed-effects framework, where the disease evolution is described through a latent time variable (latent disease age) shared across outcomes, and individual variations are modeled through subject-specific parameters [8].

This model provides interpretable components:

  • A population-level trajectory defined by a logistic function,

  • Individual deviations from this curve through time reparametrization and spatial shifts,

  • The ability to simulate, estimate, and personalize trajectories for unseen individuals.

3.2.2. Data

A logistic model is relevant when you have:

  • Longitudinal repeated measurements of one or more continuous outcomes (e.g., cognitive scores, clinical scores, biomarkers)

  • Monotonic or sigmoidal progression of outcomes over time (typically increasing or decreasing to a plateau)

  • No event or survival information required, unlike joint models

To fit a logistic model, you need a dataframe with the following columns:

  • ID: Patient identifier

  • TIME: Time of measurement

  • One or more columns representing the longitudinal outcomes (e.g., OUTCOME_1, OUTCOME_2, …)

For the importation of dataframe:

dataset = dataframe.set_index(["ID", "TIME"]).sort_index()
print(dataset.head())

                        OUTCOME_1  OUTCOME_2
      ID    TIME
132-S2-0  81.661          0.44444    0.04000
          82.136          0.60000    0.56000
          82.682          0.39267    0.04000
          83.139          0.58511    0.30000
          83.691          0.57044    0.05040

data_logistic = Data.from_dataframe(dataset, "visit")

3.2.3. Mathematical background

The logistic trajectory for outcome \(k\), and from the latent disease age \(\psi_i(t)\) is defined by:

\[ \gamma_{i,k}(t) = \left[ 1 + g_k \times \exp\left( -\frac{(1+g_k)^2}{g_k} \left( v_{0,k}(\psi_i(t) - t_0) + w_{i,k} \right) \right) \right]^{-1} \]

where:

  • \(\gamma_{i,k}(t)\) is the modeled outcome value,

  • \(t_0\) is the population reference time

  • \(v_{0,k}\) is the speed of progression for outcome \(k\) at reference time \(t_0\),

  • \(w_{i,k}\) is the individual space shift for outcome \(k\).

  • \(\frac{1}{1+g_k}\) is the value of the logistic curve at \(t_0\)

  • \(\psi_i(t)\) is the latent disease age, please have a look to part introduction to spatio-temporal models for more details.

3.3. Joint Model

3.3.1. Definition

Joint models are a class of statistical models that simultaneously analyze longitudinal data and survival data [2, 14]. Unlike traditional approaches that treat these processes separately, joint models integrate them into a unified framework, recognizing that they often share underlying biological mechanisms—for example, a slowly progressing biomarker may signal an increased risk of a clinical event. By linking the two submodels—typically through shared random effects [34] or latent processes [33], fraitly [35], models account for their interdependence, reducing biases from informative dropout or measurement error [38].

In Leaspy, the joint model [27] is implemented as a longitudinal spatio-temporal model, and a survival model, that are linked through a shared latent disease age, and, in the case of multiple longitudinal outcomes, spatial random effects (see description in first paragraph of this page). This approach allows for the incorporation of both temporal and spatial random effects, providing a more comprehensive understanding of the underlying disease process.

3.3.2. Data

A joint model is relevant when you have:

  • Longitudinal measurements as repeated biomarker readings, clinical scores

  • Time-to-event outcomes as survival, dropout, or failure events

  • A suspected association between the longitudinal process and event risk

You must have one dataframe with the following columns:

  • ID: Patient identifier

  • TIME: Time of measurement

  • EVENT_TIME: Time of event

  • EVENT_BOOL: Event indicator (1 if event occurred, 0 if censored and 2 if competing event)

For one patient, the event time and event bool are the same for each row.

For the importation of dataframe:

dataset = dataframe.set_index(["ID", "TIME"]).sort_index()
print(dataset.head())

                                OUTCOME_1  OUTCOME_2  EVENT_TIME  EVENT_BOOL
        ID              TIME                                                           
132-S2-0              81.661      0.44444    0.04000        84.0           1
           82.13600000000001      0.60000    0.00000        84.0           1
                      82.682      0.39267    0.04000        84.0           1
                      83.139      0.58511    0.00000        84.0           1
                      83.691      0.57044    0.00000        84.0           1
184-S2-0              80.994      0.32600    0.00000        83.4           0
           81.47199999999998      0.35556    0.00000        83.4           0
           82.08800000000001      0.40000    0.00000        83.4           0
                      82.488      0.31111    0.00000        83.4           0

data_joint = Data.from_dataframe(dataset, "joint")

3.3.3. Mathematical background

3.3.3.1. Longitudinal Submodel

The longitudinal submodel that can be used here is the logistic model, please have a look to part description logistic model for more details.

3.3.3.2. Survival Submodel

Cause-Specific Weibull Hazards (for competing risks):

This submodel captures how variations in the progression of longitudinal disease outcomes influence the probability and timing of multiple clinical events, while accounting for censoring and competing risks. To achieve this, the leaspy joint model uses a cause-specific hazard structure.

For each event \(l\) and patient \(i\), we model a cause-specific hazard \(h_{i,l}(t)\) [32]. This framework allows us to estimate the risk of each event separately and account for the presence of competing risks (where one event precludes others).

A Weibull distribution is used to model time-to-event data due to its flexibility in representing:

  • Increasing, decreasing, or constant hazard shapes,

  • Dependence on two interpretable parameters:

    • Scale parameter \(\nu_l\),

    • Shape parameter \(\rho_l\).

The hazard is modulated via a Cox-proportional hazard framework to incorporate the effect of longitudinal biomarkers using survival shifts.

Let \(u_{i} = \zeta s_{i}\), \(\zeta\) being the survival shift and \(s\) sources. The cause-specific hazard \(h_{i,l}(t)\) for event \(l\) and subject \(i\) is defined as:

\[ h_{i,l}(t) = h_{0,i,l}(t) \cdot \exp(u_{i,l}) \]

Expanding the baseline hazard \(h_{0,i,l}(t)\), we get:

\[ h_{i,l}(t) = \frac{\rho_l e^{\xi_i}}{\nu_l} \cdot \left( \frac{e^{\xi_i} (t - \tau_i)}{\nu_l} \right)^{\rho_l - 1} \cdot \exp(u_{i,l}) \]

Where:

  • \(\nu_l\): Scale parameter of the Weibull distribution for event \(l\),

  • \(\rho_l\): Shape parameter of the Weibull distribution for event \(l\),

  • \(\xi_i\): Subject-specific parameter,

  • \(\tau_i\): Estimated reference time for individual \(i\),

Survival function \(S_{i,l}(t)\) is then defined as:

\[ S_{i,l}(t) = \exp\left( -\left( \frac{e^{\xi_i (t - \tau_i)}}{\nu_l} \right)^{\rho_l} \exp(u_{i,l}) \right) \]

Finally, CIF for event \(l\) and subject \(i\) is defined as:

\[ \text{CIF}_{i,l}(t) = \int_0^t h_{i,l}(x) \prod_{q=1}^L S_{i,q}(x) \, dx \ = \int_0^t \rho_l e^{\xi_i} \left( \frac{e^{\xi_i (x - \tau_i)}}{\nu_l} \right)^{\rho_l - 1} \exp(u_{i,l}) S_{i,l}(x) \prod_{q=1}^L S_{i,q}(x) \, dx \]

3.3.3.3. Model summary

For patient \(i\), outcome \(k\), and event \(l\):

\[\begin{split} \begin{cases} \psi_i(t) = e^{\xi_i}(t - \tau_i) + t_0 \\ \mathbf{w}_i = \mathbf{A} \mathbf{s}_i \\ \mathbf{u}_i = \zeta \mathbf{s}_i \\ \gamma_{i,k}(t) = \left[ 1 + g_k \exp\left( -\frac{v_{0,k}(1+g_k)^2}{g_k} e^{\xi_i}(t - \tau_i) + w_{i,k} \right) \right]^{-1} \\ S_{i,l}(t) = \exp\left( -\left( \frac{e^{\xi_i (t - \tau_i)}}{\nu_l} \right)^{\rho_l} \exp(u_{i,l}) \right) \end{cases} \end{split}\]

In practice in leaspy, to use the joint model, you need to precise “joint” in Leaspy object creation, then you can use it to fit, personnalize, estimate and simulate.

leaspy_joint = JointModel(nb_events=2, source_dimension=3)
leaspy_joint.fit(data_joint, nb_iter=1000, nb_burnin=500)

For estimation, it is the CIF that is outputted by the model. Note that for prediction purposes, the CIF is corrected using the survival probability at the time of the last observed visit, following common practice in other packages [3].

3.4. Mixture Model

3.4.1. Definition

Mixture models are a class of statistical models that represent a population as a combination of underlying subpopulations, each described by its own probability distribution [23]. Standard mixed effects models assume a certain homogeneity, in the sense that inter-patient variability is considered as random variation around a fixed reference. Mixture models explicitly recognize that observed data may arise from distinct latent groups—for example, patients may cluster into different onset times, rates of disease progression or more advanced clinical scores. This formulation captures heterogeneity in the data, allowing for flexible modeling of complex, multimodal patterns. Estimation is typically carried out using an adaptation of MCMC-SAEM for mixture models as described in [23], which iteratively refine both the component parameters and the posterior probabilities of membership. By providing a probabilistic clustering framework, mixture models not only identify hidden structure but also quantify uncertainty in subgroup assignment, making them widely applicable in biomedical sciences.

In Leaspy the mixture model is implemented as an adaptation of the spatio-temporal logistic model where the individual parameters (tau, xiand sources) come from a mixture of gaussian distributions with a number of components defined by the user.

3.4.2. Data

The same rules apply as for the standard logistic model.

3.4.3. Mathematical background

To build a mixture model, we add another layer upon the hierarchical structure of the model. We suppose that each subgroup \(c\) can be described by a different set of \((\overline{\tau}^c, \overline{\xi}^c, \overline{s}^c)\) and has a respective probability of occurrence \(\pi^c\). We suppose that the population parameters come from normal distributions as in the standard model, while the individual parameters are sampled from a mixture of gaussians with mixing probabilities \(\pi^c\).

The individual parameters (random effects):

\[\begin{split} \begin{cases} \xi_i \sim \sum_{c=1}^{n_c} \pi^c \mathcal{N} (\overline\xi^c, \sigma_{\xi}^c) \\ \tau_i \sim \sum_{c=1}^{n_c} \pi^c \mathcal{N} (\overline\tau^c, \sigma_{\tau}^c)\\ s_{il} \sim \sum_{c=1}^{n_c} \pi^c \mathcal{N} (\overline s^c, 1) \end{cases} \end{split}\]

3.4.4. Model summary

To use the mixture model in Leaspy you need to choose the number of cluster you wish to estimate beforehand.

from leaspy.models import LogisticMultivariateMixtureModel

leaspy_mixture = LogisticMultivariateMixtureModel(source_dimension=1, n_clusters=2, dimension=3)
leaspy_mixture.fit(data_logistic,  "mcmc_saem", n_iter=1000)

3.5. References

[ASGFKD20]

Maha Alsefri, Maria Sudell, Marta García-Fiñana, and Ruwanthi Kolamunnage-Dona. Bayesian joint modelling of longitudinal and time to event data: a methodological review. BMC Medical Research Methodology, 20(1):94, April 2020. URL: https://doi.org/10.1186/s12874-020-00976-2 (visited on 2022-07-26), doi:10.1186/s12874-020-00976-2.

[ARTL17]

Eleni-Rosalina Andrinopoulou, D Rizopoulos, Johanna Jm Takkenberg, and E Lesaffre. Combined dynamic predictions using joint models of two longitudinal outcomes and competing risk data. Statistical Methods in Medical Research, 26(4):1787–1801, August 2017. URL: http://journals.sagepub.com/doi/10.1177/0962280215588340 (visited on 2024-02-09), doi:10.1177/0962280215588340.

[DPTrouveA13]

Stanley Durrleman, Xavier Pennec, Alain Trouvé, and Nicholas Ayache. Toward a comprehensive framework for the spatiotemporal statistical analysis of longitudinal shape data. International journal of computer vision, 103(1):22–59, 2013.

[ICC10]

Joseph G. Ibrahim, Haitao Chu, and Liddy M. Chen. Basic concepts and methods for joint models of longitudinal and survival data. Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology, 28(16):2796–2801, June 2010. doi:10.1200/JCO.2009.25.0654.

[LRT07]

E. Lesaffre, D. Rizopoulos, and R. Tsonaka. The logistic transform for bounded outcome scores. Biostatistics, 8(1):72–85, 2007. URL: https://academic.oup.com/biostatistics/article-lookup/doi/10.1093/biostatistics/kxj034, doi:10.1093/biostatistics/kxj034.

[MP00] (1,2)

Geoffrey J. McLachlan and David Peel. Finite Mixture Models. Wiley, New York, 2000. ISBN 978-0-471-00626-8.

[Ort24]

Juliette Ortholand. Joint modelling of events and repeated observations : an application to the progression of Amyotrophic Lateral Sclerosis. phdthesis, Sorbonne Université, September 2024. URL: https://theses.hal.science/tel-04770912 (visited on 2025-05-07).

[PG78]

R. L. Prentice and L. A. Gloeckler. Regression analysis of grouped survival data with application to breast cancer data. Biometrics, 34(1):57–67, 1978. URL: http://www.jstor.org/stable/2529588 (visited on 2025-05-15).

[PLSTJG14]

Cécile Proust-Lima, Mbéry Séne, Jeremy MG Taylor, and Hélène Jacqmin-Gadda. Joint latent class models for longitudinal and time-to-event data: A review. Statistical Methods in Medical Research, 23(1):74–90, February 2014. Publisher: SAGE Publications Ltd STM. URL: https://doi.org/10.1177/0962280212445839 (visited on 2024-04-18), doi:10.1177/0962280212445839.

[RG11]

Dimitris Rizopoulos and Pulak Ghosh. A Bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a time-to-event. Statistics in Medicine, 30(12):1366–1380, 2011. _eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/sim.4205. URL: https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.4205 (visited on 2023-03-28), doi:10.1002/sim.4205.

[RMG12]

Virginie Rondeau, Yassin Mazroui, and Juan R. Gonzalez. \textbf frailtypack : An \textit R Package for the Analysis of Correlated Survival Data with Frailty Models Using Penalized Likelihood Estimation or Parametrical Estimation. Journal of Statistical Software, 2012. URL: http://www.jstatsoft.org/v47/i04/ (visited on 2024-04-09), doi:10.18637/jss.v047.i04.

[SACD17]

Jean-Baptiste Schiratti, Stéphanie Allassonnière, Olivier Colliot, and Stanley Durrleman. A Bayesian Mixed-Effects Model to Learn Trajectories of Changes from Repeated Manifold-Valued Observations. Journal of Machine Learning Research, 18(133):1–33, 2017. URL: http://jmlr.org/papers/v18/17-197.html.

[TD04]

Anastasios A. Tsiatis and Marie Davidian. JOINT MODELING OF LONGITUDINAL AND TIME-TO-EVENT DATA: AN OVERVIEW. Statistica Sinica, 14(3):809–834, 2004. URL: http://www.jstor.org/stable/24307417 (visited on 2025-12-15), arXiv:24307417.