The Chronic Heart Failure Data
A study conducted in Belgium from 2008 to 2010 evaluated the impact of a telemonitoring program on mortality and rehospitalization rates in chronic heart failure (CHF) patients (Dendale et al., 2011). Heart failure is a condition in which the heart struggles to pump sufficient blood. In the study, 80 patients took daily remote measurements of systolic and diastolic blood pressure, heart rate, and weight, using specialized devices provided at hospital discharge. These longitudinal measurements were tracked for approximately six months. Sixteen patients experienced rehospitalizations: 13 once, 2 twice, and 1 three times. Additional patient characteristics were recorded at the study's baseline.
These baseline characteristics include sex, age, heart rhythm, NT-proBNP (a marker of cardiac muscle fiber stretch), patient fitness as classified by the New York Heart Association (NYHA), and the left ventricular ejection fraction (LVEF), an overall measure of heart performance.
The data are analyzed in Chapters 3 and 4.
The Liver Cirrhosis Data
A randomized clinical trial conducted in Copenhagen aimed to determine whether Prednisone could extend survival in patients with liver cirrhosis (Andersen et al., 1993). Liver cirrhosis is a progressive disease characterized by declining liver function due to injury. Between 1962 and 1969, patients were enrolled and randomly assigned to receive either Prednisone or a placebo, with follow-up visits scheduled at 3, 6, and 12 months post-treatment, and annually thereafter until 1974. The analysis includes 251 patients in the Prednisone group and 237 in the placebo group, with various biochemical markers collected, including bilirubin, albumin, and prothrombin. The quasi-continuous prothrombin index is of particular interest as an indicator of the severity of liver fibrosis. Further details are given in Chapter 5.
Data on Recurrent Asthma Attacks in Children
These data have also been used by Molenberghs et al. (2010) and Duchateau and Janssen (2008). The setting is a prevention trial in which children between 6 and 24 months old, at high risk of developing asthma, are randomized to the study drug or a placebo before experiencing any asthmatic attack; subsequent attacks are recorded. Because of the nature of the condition, patients typically experience multiple events, so the data are clustered within patients. Throughout the observation period, patients have varying at-risk times, defined by the intervals between attacks or by periods without observation. The data are presented in Table 2.1 in a calendar-time format, in which an at-risk period runs from the end of one event to the onset of the next.
[Table 2.1: Recurrent asthma attack data in calendar-time format; columns: Patient ID, Drug, Begin, End, Status.]
These data are considered in Chapter 6, in the context of frailty models in survival.
Accident Insurance Policies Data
Böhning (2000) examines claims data from 9,461 accident insurance policies issued by La Royale Belge Insurance Company; the data are originally attributed to Thyrion (1960) and were also used by Simar (1976) and Carlin and Louis (1996). The dataset records, for a particular year, the number of policies reporting a given number of claims.
Table 2.2: Accident insurance policies data of Thyrion (1960).
We use these data in the context of finite mixture models, within Chapter 6.
Time-to-insemination Data
Data on time to insemination in dairy heifer cows are important for understanding the calving interval, which dairy farmers aim to keep between 12 and 13 months; the interval from parturition to first insemination largely determines this timeline. As described by Duchateau et al. (2005), the time-to-insemination data are clustered within herds, and cows that were not inseminated, or that were culled before insemination, give rise to censoring. The analysis emphasizes the covariate parity, distinguishing "primiparous" from "multiparous" cows, a focus further explored by Duchateau and Janssen (2008).
These data are considered in Chapter 6, in the context of survival frailty models.
National Track Records for Women
The data consist of national record times for seven women's track events, namely the 100, 200, 400, 800, 1500, and 3000 meters, and the marathon (Johnson and Wichern, 2007). The record times are compiled from a total of 54 countries.
These data are considered in Chapter 6, in the context of factor analysis.
The 2005 United States’ National Youth Risk Behavior Survey Data
The US Centers for Disease Control and Prevention conducted a survey of youths in grades 9–12 to examine various health-risk behaviors, including alcohol and drug use, sexual behavior, dietary habits, and physical activity. The Youth Risk Behavior Surveillance System aims to monitor trends in these behaviors and to evaluate the effectiveness of interventions. The analysis focuses on 12 key questions related to smoking, alcohol consumption, drug use, and sexual behavior.
(2009) have previously extensively analyzed these variables, in the context of latent class models.
These data are considered in Chapter 6, in the context of latent class models.
Framework for Longitudinal and Time-to-event Data With Overdispersion
This chapter presents a joint model that integrates conjugate and normal random effects for analyzing multiple outcomes, focusing on settings where at least one outcome is non-Gaussian and one is a survival outcome. The conjugate random effects relax the restrictive mean-variance assumptions of models for non-Gaussian outcomes, while the normal random effects capture the correlation from repeated measurements within subjects and the relationship between the different outcomes. The normal random effects alone, however, may not adequately address overdispersion, an issue that is commonly overlooked. Through a case study on chronic heart failure, we demonstrate that adopting the extended framework can enhance model fit and influence the outcome of significance tests.
Taking advantage of the ease of analytical integration over the conjugate random effects, the framework is easily implemented, using maximum likelihood, in standard software.
Introduction
In HIV/AIDS research, it is common to gather various outcomes from study subjects, including the time until the onset of AIDS or death, as well as viral load measurements.
In cancer research, biological markers and time-to-event outcomes, such as survival or disease recurrence, are often tracked over time. For instance, prostate cancer studies use prostate-specific antigen measurements post-treatment to monitor disease recurrence (Yu et al., 2004, 2008). In liver cirrhosis, researchers collected various biochemical markers, including bilirubin and albumin, alongside survival time (Andersen et al., 1993). Similarly, in cardiology, telemonitoring allowed repeated measurement of vital signs in chronic heart failure patients while also recording time to rehospitalization, which can occur multiple times (Dendale et al., 2011; Njagi et al., 2013a). In veterinary research, dairy cows had protein and urea concentrations measured over time, with the time to first insemination recorded, the time-to-event outcome itself being repeated within clustered herds (Duchateau and Janssen, 2008).
The objectives of statistical analysis can vary widely, often necessitating a joint modelling approach. Researchers may focus on the distribution of one outcome given another, such as analyzing time-to-event data in relation to the underlying longitudinal biomarker process (Tsiatis and Davidian, 2004; Verbeke et al., 2010; Rizopoulos, 2011). It may also be of interest whether covariates influence all outcomes jointly. Research questions can extend beyond the original forms of the outcomes, allowing analyses of modified versions, such as a dichotomized continuous longitudinal outcome alongside a survival outcome. Consequently, statisticians may encounter a virtually limitless array of outcome combinations in their work.
Molenberghs et al. (2010) expanded on their earlier work (2007) by focusing on models for binomial, count, and survival data, which typically imply a mean-variance relationship. To address overdispersion, where data deviate from this relationship, random effects with a conjugate distribution are commonly integrated into these models.
The beta-binomial and negative-binomial models are key examples for binomial and count data. In hierarchical settings, generalized linear mixed models are commonly employed to address the correlation induced by repeated measurements within the same experimental unit. Since overdispersion and hierarchical structure often coexist, a "combined" model is proposed that integrates both conjugate and normal random effects. This model addresses the mean, variance, and correlation structures of hierarchical non-Gaussian outcomes, allowing a more flexible mean-variance relationship while accounting for the hierarchical correlation. Case studies show that adopting the combined framework can substantially enhance model fit and potentially influence hypothesis tests for covariate effects. Even when simpler models are ultimately preferred, the extended framework remains a valuable tool for assessing goodness-of-fit.
Joint modelling, particularly through the shared-parameter framework, is a prevalent method for analyzing longitudinal and time-to-event data. This approach involves specifying submodels for the longitudinal outcome and for the time-to-event, interconnected via a shared latent structure that may be parametric or non-parametric. In the parametric case, shared normal random effects are commonly used to capture the correlation among the longitudinal measurements and the association between the time-to-event and longitudinal outcomes. The field is advancing rapidly, and the major reviews show that the focus has been on the univariate time-to-event case, often in conjunction with a continuous longitudinal outcome. When a parametric shared latent structure, typically normal, is employed, the emphasis tends to be on the correlation from the repeated measures in the longitudinal data and on the association with the time-to-event outcome. It should be noted, however, that even univariate survival outcomes can exhibit overdispersion.
In this chapter, we introduce an extended shared random effects framework for the joint analysis of various outcome types, including time-to-event, continuous, binary, and count data. The approach accommodates repeated measures for each outcome type and can be extended to analyze multiple outcome types simultaneously. Its flexibility arises from the combination of conjugate and normal random effects in the shared-parameter model, which allows it to capture the complexities of non-Gaussian repeated measures. Building on Molenberghs et al. (2010), we thus enhance the flexibility of shared-parameter joint modelling, as demonstrated through a case study showing improved model fit and an impact on significance tests. We favour conjugate random effects for their ease of analytical integration, which facilitates model estimation through partial marginalization. We also show how closed-form expressions for the fully marginalized joint probabilities can be derived, although the presence of infinite series may complicate their direct use.
We first review the Weibull-gamma-normal, probit-beta-normal, and Poisson-gamma-normal models of Molenberghs et al. (2010), which form the building blocks of our framework. Section 3.3 presents the modelling framework, Section 3.4 discusses estimation, and Section 3.5 focuses on the data analysis.
Review of Ingredients
Weibull-gamma-normal Model
For repeated measures time-to-event outcomes, the authors specify, for the $k$th survival time in cluster $i$,
\[
f(t_{ik} \mid \psi_{ik}, b_i) = \lambda_k \rho_k t_{ik}^{\rho_k-1}\,\psi_{ik}\, e^{\mu_{ik}+d_{ik}}\,\exp\!\left(-\lambda_k t_{ik}^{\rho_k}\,\psi_{ik}\, e^{\mu_{ik}+d_{ik}}\right),
\]
where $\mu_{ik}$ is a fixed-effects linear predictor, $\psi_{ik} \sim \text{Gamma}(\alpha_k, \beta_k)$, $d_{ik} = w_{ik}'b_i$, and $b_i \sim N(0, D)$. The baseline hazard is assumed to have a parametric form, leading to a Weibull distribution for the event times. The gamma random effects, conjugate to the Weibull model, are taken to be independent, although correlation among the repeated measures is induced through the normal random effects. Alternatively, one may assume dependence between the gamma random effects, or use a common gamma random effect for the entire cluster, $\psi_i \sim \text{Gamma}(\alpha, \beta)$.
Probit-beta-normal Model
For repeated measures binary outcomes, the authors focus on the probit-beta-normal model, which uses the probit link, in contrast to the logistic-beta-normal model, which uses a logit link. The probit-beta-normal model allows closed-form expressions, while approximations for back-transformation are available in the logistic case. Let $Y_{ij}$ denote the $j$th measurement within cluster $i$, $j = 1, \ldots, n_i$.
The model is then
\[
Y_{ij} \sim \text{Bernoulli}(\pi_{ij}), \qquad \pi_{ij} = \theta_{ij}\,\Phi\!\left(x_{ij}'\gamma + z_{ij}'b_i\right), \qquad \theta_{ij} \sim \text{Beta}(\alpha_1, \beta_1),
\]
where $\gamma$ is a vector of fixed-effects parameters and the model combines normal and conjugate random effects. The vectors $x_{ij}$ and $z_{ij}$ are rows of the design matrices $X_i$ and $Z_i$ for the fixed and random effects, respectively; this convention applies wherever these vectors are used.
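For later reference, integrating the beta random effect out of this specification gives the success probability conditional on the normal random effects alone,
\[
P(Y_{ij} = 1 \mid b_i) = E(\theta_{ij})\,\Phi\!\left(x_{ij}'\gamma + z_{ij}'b_i\right) = \frac{\alpha_1}{\alpha_1+\beta_1}\,\Phi\!\left(x_{ij}'\gamma + z_{ij}'b_i\right),
\]
which is the form that reappears in (3.10) below.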
Poisson-gamma-normal Model
For repeated measures count data, the authors consider, with $Y_{ij}$, $j = 1, \ldots, n_i$, the $j$th count in the $i$th cluster, the following model:
\[
P(Y_{ij} = y_{ij} \mid \theta_{ij}, b_i) = \frac{1}{y_{ij}!}\left(\theta_{ij} e^{\tau_{ij} + z_{ij}'b_i}\right)^{y_{ij}} e^{-\theta_{ij} e^{\tau_{ij} + z_{ij}'b_i}}. \qquad (3.7)
\]
In this model, the random effects $\theta_{ij}$ follow a Gamma$(\alpha_j, \beta_j)$ distribution, and the fixed-effects parameters are collected in the vector $\zeta$ through $\tau_{ij} = x_{ij}'\zeta$. The approach again combines normal and conjugate random effects, and the gamma random effects can be configured flexibly, as in the Weibull-gamma-normal case.
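As a consistency check, assuming the shape-scale gamma parameterization implicit in (3.17) below, integrating $\theta_{ij}$ out of the Poisson kernel yields the negative-binomial-type probability that later appears as the count factor of (3.17):
\[
P(Y_{ij} = y_{ij} \mid b_i) = \frac{1}{y_{ij}!}\left(e^{\tau_{ij}+z_{ij}'b_i}\right)^{y_{ij}}
\frac{\Gamma(y_{ij}+\alpha_j)}{\Gamma(\alpha_j)\,\beta_j^{\alpha_j}\left(e^{\tau_{ij}+z_{ij}'b_i}+\frac{1}{\beta_j}\right)^{y_{ij}+\alpha_j}}.
\]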
Linear Mixed Model
Generalized linear models cover various data types, including continuous outcomes, and are grounded in the exponential family. For continuous, normally distributed outcomes, the mean and the variance are separate parameters. To account for intracluster correlation in repeated continuous outcomes, normally distributed random effects are used, the normal distribution being conjugate to the normal linear model. For the $j$th measurement within cluster $i$,
\[
Y_{ij} \mid b_i \sim N\!\left(x_{ij}'\beta + z_{ij}'b_i,\; \sigma^2\right) \qquad (3.8)
\]
is the hierarchically specified linear mixed model, where $\beta$ is a vector of fixed-effects parameters.
Flexible Joint Modelling Framework
Case 2: Repeated Time-to-event and Repeated Binary Outcomes
We specify a probit-beta-normal model for the binary outcome submodel, as outlined in (3.4), (3.5), and (3.6), for the $j$th measurement within cluster $i$, $j = 1, \ldots, n_i$. For the time-to-event outcome submodel, we specify a Weibull-gamma-normal model, following the specifications in Section 3.3.1. We assume conditional independence between the binary and survival processes, given the common normal random effects. Integrating the beta random effects out of (3.4)-(3.6) implies that
\[
f(y_{ij} \mid b_i) = \frac{1}{\alpha_1+\beta_1}\,(K_{ij}\alpha_1)^{y_{ij}}\,[(1-K_{ij})\alpha_1+\beta_1]^{1-y_{ij}}, \qquad (3.10)
\]
where
\[
K_{ij} = \Phi\!\left(x_{ij}'\gamma + z_{ij}'b_i\right), \qquad (3.11)
\]
so that the joint conditional model takes the form
\[
f(\boldsymbol{y}_i, \boldsymbol{t}_i \mid \boldsymbol{b}_i, \boldsymbol{\psi}_i) = \prod_j \frac{1}{\alpha_1+\beta_1}\,(K_{ij}\alpha_1)^{y_{ij}}\,[(1-K_{ij})\alpha_1+\beta_1]^{1-y_{ij}} \times \prod_k \lambda_k\rho_k t_{ik}^{\rho_k-1}\psi_{ik}\, e^{\mu_{ik}+d_{ik}}\,\exp\!\left(-\lambda_k t_{ik}^{\rho_k}\psi_{ik}\, e^{\mu_{ik}+d_{ik}}\right). \qquad (3.12)
\]
The conditioning here is only on the gamma and normal random effects, given that the beta random effects in the binary outcome submodel have been integrated out.
Case 3: Repeated Time-to-event and Repeated Count Outcomes
For the repeated counts submodel, we use the Poisson-gamma-normal model, as outlined in (3.7). For the time-to-event outcome submodel, we adopt the Weibull-gamma-normal model, following the specifications in Section 3.3.1. We again assume conditional independence between the two processes, given the shared normal random effects. The joint model, conditional on the random effects, then takes the form
\[
f(\boldsymbol{y}_i, \boldsymbol{t}_i \mid \boldsymbol{b}_i, \boldsymbol{\theta}_i, \boldsymbol{\psi}_i) = \prod_j \frac{1}{y_{ij}!}\left(\theta_{ij} e^{\tau_{ij}+z_{ij}'b_i}\right)^{y_{ij}} e^{-\theta_{ij} e^{\tau_{ij}+z_{ij}'b_i}} \times \prod_k \lambda_k\rho_k t_{ik}^{\rho_k-1}\psi_{ik}\, e^{\mu_{ik}+d_{ik}}\,\exp\!\left(-\lambda_k t_{ik}^{\rho_k}\psi_{ik}\, e^{\mu_{ik}+d_{ik}}\right). \qquad (3.13)
\]
Note that here we have two sets of gamma random effects: one set in the survival process and the other in the count process.
Case 4: Repeated Binary and Repeated Continuous Outcomes
In this case we consider a commonly encountered scenario with two outcome submodels: a linear mixed model for the continuous outcome and a probit-beta-normal model for the binary outcome. We maintain the conditional independence assumption of the previous sections. To distinguish the two outcomes, we denote the continuous outcome by $Y_{1i}$ and the binary outcome by $Y_{2i}$; for the binary outcome, $Y_{2ik}$ denotes the $k$th measurement within cluster $i$, $k = 1, \ldots, p_i$.
\[
Y_{2ik} \sim \text{Bernoulli}(\pi_{ik}), \qquad \pi_{ik} = \theta_{ik}\,\Phi\!\left(x_{ik}'\gamma + z_{ik}'b_i\right), \qquad \theta_{ik} \sim \text{Beta}(\alpha_1, \beta_1).
\]
Again, integrating out the beta random effects, these specifications for the binary process imply that
\[
f(y_{2ik} \mid b_i) = \frac{1}{\alpha_1+\beta_1}\,(M_{ik}\alpha_1)^{y_{2ik}}\,[(1-M_{ik})\alpha_1+\beta_1]^{1-y_{2ik}}, \qquad (3.14)
\]
where $M_{ik} = \Phi\!\left(x_{ik}'\gamma + z_{ik}'b_i\right)$. The joint model for the continuous and the binary process, this time conditional only on the normal random effects, is then
\[
f(\boldsymbol{y}_{2i}, \boldsymbol{y}_{1i} \mid b_i) = \frac{1}{(2\pi)^{n_i/2}|\Sigma_i|^{1/2}}\, e^{-\frac{1}{2}\left[\boldsymbol{y}_{1i}-(X_i\beta+Z_ib_i)\right]'\Sigma_i^{-1}\left[\boldsymbol{y}_{1i}-(X_i\beta+Z_ib_i)\right]} \times \prod_k \frac{1}{\alpha_1+\beta_1}\,(M_{ik}\alpha_1)^{y_{2ik}}\,[(1-M_{ik})\alpha_1+\beta_1]^{1-y_{2ik}}. \qquad (3.15)
\]
Estimation and Inference
The fully marginalized joint model probabilities can be derived, allowing parameter estimation through maximum likelihood by specifying the marginal likelihood contributions. This is illustrated for Cases 1 and 3, with the relevant calculations detailed in Appendix A.1. While the marginal probabilities can be specified directly for estimation, the presence of infinite series may complicate this approach in all cases except the binary-continuous one. The combination of analytical integration over the conjugate random effects with numerical integration over the normal random effects, as proposed by Molenberghs et al. (2007), offers a practical estimation route. In this partial marginalization approach, the conjugate random effects are integrated out analytically while the normal random effects are retained for numerical integration, for example with the SAS procedure NLMIXED. For estimation purposes, we therefore focus on the expressions for the joint distribution marginalized over the conjugate random effects but conditional on the normal random effects, which we give for the four cases discussed.
Case 1: Repeated Time-to-event and Repeated Continuous Outcomes:
We consider model (3.9) and integrate over the gamma random effects. We then have the following joint distribution, conditional on the normal random effects:
\[
f(\boldsymbol{t}_i, \boldsymbol{y}_i \mid b_i) = \frac{1}{(2\pi)^{n_i/2}|\Sigma_i|^{1/2}}\, e^{-\frac{1}{2}\left[\boldsymbol{y}_i-(X_i\beta+Z_ib_i)\right]'\Sigma_i^{-1}\left[\boldsymbol{y}_i-(X_i\beta+Z_ib_i)\right]} \times \prod_k \lambda_k\rho_k t_{ik}^{\rho_k-1} e^{\mu_{ik}+d_{ik}}\, \frac{\alpha}{\beta^{\alpha}\left(\lambda_k t_{ik}^{\rho_k} e^{\mu_{ik}+d_{ik}} + \frac{1}{\beta}\right)^{\alpha+1}}. \qquad (3.16)
\]
Case 2: Repeated Time-to-event and Repeated Binary Outcomes:
Considering model (3.12) and integrating over the gamma random effects leads to the joint distribution conditional on the normal random effects:
\[
f(\boldsymbol{t}_i, \boldsymbol{y}_i \mid \boldsymbol{b}_i) = \prod_j \frac{1}{\alpha_1+\beta_1}\,(K_{ij}\alpha_1)^{y_{ij}}\,[(1-K_{ij})\alpha_1+\beta_1]^{1-y_{ij}} \times \prod_k \lambda_k\rho_k t_{ik}^{\rho_k-1} e^{\mu_{ik}+d_{ik}}\, \frac{\alpha}{\beta^{\alpha}\left(\lambda_k t_{ik}^{\rho_k} e^{\mu_{ik}+d_{ik}} + \frac{1}{\beta}\right)^{\alpha+1}},
\]
with $K_{ij}$ as given in (3.11).
Case 3: Repeated Time-to-event and Repeated Count Outcomes:
We integrate over both $\boldsymbol{\Theta}_i$ and $\boldsymbol{\Psi}_i$ from model (3.13), yielding
\[
f(\boldsymbol{t}_i, \boldsymbol{y}_i \mid \boldsymbol{b}_i) = \prod_j \frac{1}{y_{ij}!}\left(e^{\tau_{ij}+z_{ij}'b_i}\right)^{y_{ij}} \frac{\Gamma(y_{ij}+\alpha_j)}{\Gamma(\alpha_j)\,\beta_j^{\alpha_j}\left(e^{\tau_{ij}+z_{ij}'b_i}+\frac{1}{\beta_j}\right)^{y_{ij}+\alpha_j}} \times \prod_k \lambda_k\rho_k t_{ik}^{\rho_k-1} e^{\mu_{ik}+d_{ik}}\, \frac{\alpha_k}{\beta_k^{\alpha_k}\left(\lambda_k t_{ik}^{\rho_k} e^{\mu_{ik}+d_{ik}}+\frac{1}{\beta_k}\right)^{\alpha_k+1}} \qquad (3.17)
\]
as the joint model, conditional on the normal random effects.
Case 4: Repeated Binary and Repeated Continuous Outcomes:
For this case, (3.15) already provides the required expression, given that the beta random effect has already been marginalized out of the model for the binary outcome.
These expressions are all that is required to use, for example, the SAS procedure NLMIXED for estimation and inference.
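To make the partial-marginalization route concrete outside SAS, the sketch below is a minimal illustration, not the thesis's actual program (which uses NLMIXED; see Appendix A.2): it approximates a cluster's fully marginal log-likelihood by standard, non-adaptive Gauss-Hermite quadrature over a scalar normal random effect, given any function returning the conditional log-likelihood with the conjugate random effects already integrated out, such as (3.16) or (3.17). The function name and arguments are illustrative.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def marginal_loglik_gh(cond_loglik, sigma_b, n_nodes=20):
    """Approximate log of the integral of f(data_i | b) * N(b; 0, sigma_b^2) db
    by Gauss-Hermite quadrature; cond_loglik(b) must return the cluster's
    conditional log-likelihood given the normal random effect b."""
    nodes, weights = hermgauss(n_nodes)          # weight function exp(-x^2)
    b_vals = np.sqrt(2.0) * sigma_b * nodes      # change of variable b = sqrt(2)*sigma_b*x
    log_terms = np.array([cond_loglik(b) for b in b_vals])
    log_terms += np.log(weights) - 0.5 * np.log(np.pi)
    m = log_terms.max()                          # log-sum-exp for numerical stability
    return m + np.log(np.exp(log_terms - m).sum())
```

Summing such contributions over clusters and handing the result to a generic optimizer reproduces, in spirit, what the NLMIXED general-likelihood route does with adaptive quadrature.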
Note that though we here focus on maximum likelihood estimation, other estimation methods can also be considered; for instance, a Bayesian approach.
3.5 Analysis of the Chronic Heart Failure Data
This section focuses on the data introduced in Section 2.1. We jointly model the risk of rehospitalization and the mean number of abnormal heart rate measurements, with LVEF as a baseline covariate. Note that the survival outcome in this analysis is recurrent.
We conduct a joint analysis of recurrent rehospitalization and counts derived from the dichotomized longitudinal heart rate. Heart rate is classified as "normal" (50–90, coded as 0) or "abnormal" (above 90, coded as 1); values below 50 are excluded. For each non-hospitalization period, we count the "abnormal" heart rate measurements, which yields a repeated count response for patients who were rehospitalized and therefore had multiple non-hospitalization intervals during the study.
As a covariate, we considered the baseline left ventricular ejection fraction, LVEF.
LVEF measures the percentage of blood ejected from the ventricle at each heartbeat. It is categorized into two groups: preserved ejection, defined as LVEF greater than 45% (coded as 0), and reduced ejection, defined as LVEF of 45% or less (coded as 1) (Dendale et al., 2011).
In the time-to-rehospitalization submodel, the risk function for patient $i$ during the $k$th risk period is
\[
f_{ik}(t \mid \psi_i, b_i) = \lambda\rho t^{\rho-1}\,\psi_i\, e^{\mu_i+d_i}\,\exp\!\left\{-\lambda\left(t_{2ik}^{\rho}-t_{1ik}^{\rho}\right)\psi_i\, e^{\mu_i+d_i}\right\}, \qquad (3.18)
\]
where $d_i = \kappa b_i$ and $b_i \sim N(0, \sigma_b^2)$. Here $t_{1ik}$ and $t_{2ik}$ denote the start and the end of the risk period, with $t_{1ik} \leq t \leq t_{2ik}$; the at-risk periods are represented in calendar-time format (Duchateau and Janssen, 2008). Further, $\mu_i = X_i\xi$, where $X_i$ equals 1 if the subject has reduced ejection and 0 otherwise. The gamma random effect $\psi_i$ is assumed to follow a one-parameter gamma distribution,
\[
\psi_i \sim \text{Gamma}\!\left(\alpha, \alpha^{-1}\right), \qquad (3.19)
\]
for identifiability purposes (Duchateau and Janssen, 2008).
In the repeated heart rate counts submodel, we analyzed the frequency of abnormal heart rate measurements for each subject, denoted as Yij, during the jth period when the subject was not hospitalized.
The model is
\[
P(Y_{ij} = y_{ij} \mid \theta_i, b_i) = \frac{1}{y_{ij}!}\left(\theta_i e^{\tau_{ij}+b_i}\right)^{y_{ij}} e^{-\theta_i e^{\tau_{ij}+b_i}}, \qquad \tau_{ij} = \log(\text{time}) + \zeta_0 + \zeta_1 X_i,
\]
with $X_i$ the baseline left ventricular ejection fraction indicator. The offset term $\log(\text{time})$ adjusts for the varying lengths of the periods over which the counts are taken. As noted by Molenberghs et al. (2007), identifiability issues arise with the parameters of the gamma distribution; to address this, $\theta_i$ is assumed to follow a one-parameter gamma distribution, $\theta_i \sim \text{Gamma}\!\left(\alpha_2, \alpha_2^{-1}\right)$.
In line with (3.17), and taking our specifications into account, the joint model for the survival and count outcomes, conditional on the normal random effect, takes the form
\[
f(y_{ij}, t \mid b_i) = \prod_j \frac{1}{y_{ij}!}\left(e^{\tau_{ij}+b_i}\right)^{y_{ij}} \frac{\Gamma(y_{ij}+\alpha_2)\,\alpha_2^{\alpha_2}}{\Gamma(\alpha_2)\left(e^{\tau_{ij}+b_i}+\alpha_2\right)^{y_{ij}+\alpha_2}} \times \lambda\rho t^{\rho-1} e^{\mu_i+d_i}\, \frac{\alpha^{\alpha+1}}{\left\{\lambda\left(t_{2ik}^{\rho}-t_{1ik}^{\rho}\right)e^{\mu_i+d_i}+\alpha\right\}^{\alpha+1}}. \qquad (3.21)
\]
By assuming that the likelihood contribution of censored observations is that of survival probabilities, we allow censoring of the survival outcome as follows:
\[
f(y_{ij}, t \mid b_i) = \prod_j \frac{1}{y_{ij}!}\left(e^{\tau_{ij}+b_i}\right)^{y_{ij}} \frac{\Gamma(y_{ij}+\alpha_2)\,\alpha_2^{\alpha_2}}{\Gamma(\alpha_2)\left(e^{\tau_{ij}+b_i}+\alpha_2\right)^{y_{ij}+\alpha_2}} \times \left(\lambda\rho t^{\rho-1} e^{\mu_i+d_i}\right)^{\delta_i} \frac{\Gamma(\alpha+\delta_i)\,\alpha^{\alpha}}{\Gamma(\alpha)\left\{\lambda\left(t_{2ik}^{\rho}-t_{1ik}^{\rho}\right)e^{\mu_i+d_i}+\alpha\right\}^{\alpha+\delta_i}}, \qquad (3.22)
\]
where $\delta_i$ is a censoring indicator, taking the value 1 if the observation is an event time, and 0 otherwise.
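As an illustration, and not the thesis's SAS code, the function below evaluates the conditional log-likelihood (3.22) for one patient given the normal random effect. The argument names are hypothetical; arrays are indexed by non-hospitalization period (counts) and by at-risk period (survival). A function of this form can be plugged into the quadrature sketch shown earlier.

```python
import numpy as np
from scipy.special import gammaln

def chf_cond_loglik(y, log_time, x_i, t1, t2, delta,
                    zeta0, zeta1, alpha2, lam, rho, xi, kappa, alpha, b_i):
    """log f(y_i, t_i | b_i) as in (3.22): a negative-binomial factor per
    non-hospitalization period and a gamma-frailty Weibull factor per
    calendar-time at-risk interval, with censoring indicator delta."""
    tau = log_time + zeta0 + zeta1 * x_i          # offset log(time) plus covariate effect
    eta = np.exp(tau + b_i)
    ll_counts = np.sum(-gammaln(y + 1) + y * (tau + b_i)
                       + gammaln(y + alpha2) + alpha2 * np.log(alpha2)
                       - gammaln(alpha2) - (y + alpha2) * np.log(eta + alpha2))
    mu_d = xi * x_i + kappa * b_i                 # mu_i + d_i, with d_i = kappa * b_i
    haz = lam * rho * t2 ** (rho - 1.0) * np.exp(mu_d)   # hazard at end of period
    cum = lam * (t2 ** rho - t1 ** rho) * np.exp(mu_d)   # cumulative hazard over period
    ll_surv = np.sum(delta * np.log(haz)
                     + gammaln(alpha + delta) + alpha * np.log(alpha) - gammaln(alpha)
                     - (alpha + delta) * np.log(cum + alpha))
    return ll_counts + ll_surv
```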
The analysis was conducted using the SAS procedure NLMIXED; the program is given in Appendix A.2. To accommodate the different likelihood contributions of the two outcomes, we used the general likelihood feature of the procedure, similar to the approach of Molenberghs and Verbeke (2005) for joint models of continuous and discrete outcomes. Initial values were derived from separate, independent Poisson and survival models for the count and survival outcomes, respectively, and the results are summarized in Table 3.1. The results of our extended joint model are compared with those of the conventional model, which includes only a normal random effect to address the correlation from repeated measurements and the association between the responses; the extended model additionally incorporates gamma random effects, allowing a more flexible mean-variance relationship in the submodels. A significance level of 5% was used.
Under the extended model, the joint effect of ejection status on the mean number of abnormal heart rate measurements and on the risk of rehospitalization is not statistically significant (p=0.1650). Patients with reduced ejection have, on average, 3.3531 times as many abnormal heart rate measurements as those with preserved ejection, a borderline result (p=0.0594). The risk of rehospitalization for patients with reduced ejection is estimated to be 5.5168 times that for patients with preserved ejection, but this is not statistically significant (p=0.6498). The positive estimate of the scale factor, κ, indicates that a higher mean number of abnormal heart rate measurements is associated with an increased risk of rehospitalization, although this result is also not statistically significant (p=0.3201).
Comparing the extended and the conventional model, an AIC-based assessment indicates that the extended model improves model fit without sacrificing parsimony. While the effect of ejection status on the mean number of abnormal heart rate measurements is borderline significant in the extended model (p=0.0901), the conventional model shows a significant scale factor (p=0.0022), in contrast to the extended model. Both models yield similar results for the joint effect of ejection status on the two processes, with p-values of 0.1650 and 0.1648, respectively. It is known that overly restrictive variance functions in univariate generalized linear models for non-Gaussian outcomes can lead to incorrect standard errors; although joint modelling involves multiple outcomes, overly simplistic variance structures may pose similar issues. This will be explored further in upcoming simulation studies assessing the impact of omitting conjugate random effects on specific model parameters across various scenarios.
We also evaluate the predictive ability of the extended model relative to the conventional model for the time-to-event outcome. To assess predictive performance, we use the concordance approach of Harrell et al. (1996), which considers pairs of patients in which at least one member has experienced the event of interest.
A pair is "usable" if the event times can be ordered. For each usable pair, the predicted probability of surviving to a specific time point should be higher for the member with the longer survival time; pairs for which this holds are "concordant." The proportion of "usable" pairs for which concordance holds provides the concordance index (Harrell et al., 1996).
Both the extended and the conventional model were fitted to the first 90 days of data and used to estimate the probabilities of remaining rehospitalization-free. These predictions were then compared with the rehospitalization events actually observed from day 91 to day 180, from which the concordance index was computed. Only "usable" pairs were considered, with the observed time taken as that of the patient's first rehospitalization event in the concerned period (the day 91–day 180 period).
Under the extended model, and once again assuming (3.19) for the distribution of $\psi_i$, the survival probabilities described above are obtained by integrating the gamma frailty out of the Weibull survival function.
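A minimal sketch of this integration, using the Laplace transform of a Gamma$(\alpha, \alpha^{-1})$ variable:
\[
P(T > t \mid b_i) = E_{\psi_i}\!\left[\exp\{-\lambda t^{\rho}\psi_i \exp(\mu_i+d_i)\}\right]
= \left\{1 + \frac{\lambda t^{\rho}\exp(\mu_i+d_i)}{\alpha}\right\}^{-\alpha}.
\]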
On the other hand, under the conventional model, they are of the form
\[
P(T > t \mid b_i) = \exp\!\left\{-\lambda t^{\rho}\exp(\mu_i+d_i)\right\}. \qquad (3.24)
\]
Estimates of these probabilities were computed from the maximum likelihood estimates of the respective model parameters; for the normal random effects, the empirical Bayes estimates were used.
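A minimal sketch of the concordance computation described above (hypothetical names; pred_surv holds the predicted rehospitalization-free probabilities, time and event the observed day 91–180 outcomes; crediting tied predictions with one half is the usual convention rather than something stated in the text):

```python
def concordance_index(pred_surv, time, event):
    """Harrell-type concordance: for each usable pair (orderable event times,
    the shorter of which is an observed event), the longer-surviving member
    should have the larger predicted survival probability."""
    n = len(time)
    usable = concordant = ties = 0
    for i in range(n):
        for j in range(i + 1, n):
            if event[i] and time[i] < time[j]:
                short, long_ = i, j
            elif event[j] and time[j] < time[i]:
                short, long_ = j, i
            else:
                continue                      # pair not usable
            usable += 1
            if pred_surv[long_] > pred_surv[short]:
                concordant += 1
            elif pred_surv[long_] == pred_surv[short]:
                ties += 1
    return (concordant + 0.5 * ties) / usable
```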
A total of 434 pairs were usable. Under the extended model, 328 of these were concordant, 105 discordant, and 1 pair had tied predicted survival probabilities, yielding the concordance index for that model. Under the conventional model, 310 pairs were concordant, 123 discordant, and 1 pair had tied probability predictions, giving the corresponding index.
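Crediting the tied pair with one half, as is conventional (an assumption; the text does not state its tie handling), these tallies correspond to roughly $(328 + 0.5)/434 \approx 0.757$ for the extended model and $(310 + 0.5)/434 \approx 0.715$ for the conventional model.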
Therefore, the extended model seems capable of better discriminating between patients who are going to be rehospitalized within the 90 days, and the ones who will not.
It is useful to relate our results to the concept of dynamic prediction, as outlined by Rizopoulos (2011, 2012a,b). In that approach, patient-specific probabilities of surviving to future time points are calculated from a fitted joint model, based on the patient's information available up to a given moment, and dynamic discrimination indices are subsequently derived from these probabilities. In computing our rehospitalization-free probabilities, we incorporate the previously recorded patient data through the empirical Bayes estimates of the random effects, which are the modes of the posterior distribution given the observed patient data. Calculating discrimination indices in the context of recurrent events, however, remains complex and requires further research.
Table 3.1: Chronic Heart Failure Data. Parameter estimates (standard errors) for the extended joint repeated counts/recurrent time-to-event model and for the conventional analysis.
[The table also reports, for each model, the joint effect of LVEF status (p-value) and the model fit by the Akaike information criterion (AIC).]
Discussion
This chapter has presented a flexible joint modelling framework that accommodates the correlation from repeated measurements and the associations between the various outcomes. It also relaxes the restrictive mean-variance relationship in models for non-Gaussian outcomes by incorporating conjugate random effects, while allowing straightforward estimation of the framework in standard software through partial marginalization.
Our analysis of chronic heart failure data demonstrates that our enhanced framework improves model fit and maintains simplicity, while also delivering superior predictive performance and influencing significance tests.
Further research is needed to assess, through simulation, the effects of omitting conjugate random effects, focusing on how this omission impacts specific model parameters across various scenarios. These scenarios will vary factors such as the level of overdispersion in the non-Gaussian outcome, the extent of censoring, and the length of the longitudinal sequences. The framework also opens up related areas of research, including the derivation of marginal joint correlation functions, which could be valuable in fields such as surrogate marker evaluation and psychometrics.
A Joint Survival-Longitudinal Modelling Approach for the Dynamic Prediction of Rehospitalization in Telemonitored Chronic Heart Failure Patients
Telemonitoring in chronic heart failure enables clinicians to remotely monitor daily patient biometrics, such as blood pressure and heart rate, in order to predict rehospitalization and inform intervention decisions. Given the high rehospitalization rates among heart failure patients, this strategy is important for clinical management. This chapter introduces a dynamic prediction method that computes patient-specific conditional survival probabilities and their confidence intervals from a joint model for time-to-rehospitalization and a time-varying, potentially error-contaminated biomarker. We also assess the biomarker's ability to differentiate between patients who will be rehospitalized within a specified time frame and those who will not. Besides providing a statistical modelling solution to a problem not previously addressed in the literature, the approach gives clinicians a valuable additional tool on which to base their intervention decisions, and thus contributes substantially to heart failure management.
Introduction
Joint modelling of longitudinal and time-to-event data has advanced considerably over the past few decades, addressing challenges such as error contamination and the intermittent observation of longitudinal outcomes. The evolution runs from the simple Last Value Carried Forward method, through two-stage procedures, to the contemporary shared-parameter joint modelling approaches (Verbeke et al., 2010; Tsiatis and Davidian, 2004). Numerous studies have contributed to this field, with key references including Pawitan and Self (1993), DeGruttola and Tu (1994), Taylor et al. (1994), Faucett and Thomas (1996), and Hogan and Laird (1997, 1998); comprehensive overviews are provided by Tsiatis and Davidian (2004) and Yu et al. (2004).
Classical survival analysis has paid considerable attention to a model's ability to differentiate between patients who will experience the event of interest and those who will not. Harrell et al. (1996) introduced an index akin to the area under the receiver operating characteristic (ROC) curve, comparing survival probabilities among comparable subjects. Antolini et al. (2005) extended this discrimination index to time-dependent covariates, using a time-dependent area under the curve (AUC) to assess discriminative ability at various time points; other key contributions include Zheng and Heagerty (2007) and Heagerty and Zheng (2005). Within the joint modelling framework, discrimination has received less emphasis. Yu et al. (2008) explored a Bayesian method for predicting recurrence probabilities using a joint longitudinal survival-cure model. Rizopoulos (2011) concentrated on dynamic prediction of conditional survival probabilities, using a Monte Carlo-based method to compute these predictions and their confidence intervals, and proposed a general definition of prediction rules that takes the longitudinal history and varying threshold values into account, paralleling the approach of Antolini et al. (2005).
The same author further introduces a dynamic discrimination index, derived from time-dependent AUCs, as an overall measure of discriminative ability. The methodology is relevant across various fields of medical research, including HIV studies, where CD4 counts are used as a marker to calculate conditional survival probabilities tailored to individual patients based on their available information (Rizopoulos, 2012b).
Other applications include prothrombin measurements in liver cirrhosis research, as well as studies in breast cancer, underscoring the broad relevance of dynamic prediction and discrimination.
This chapter investigates the dynamic prediction of rehospitalization in telemonitored chronic heart failure (CHF) patients. CHF is a condition in which the heart struggles to pump sufficient blood, triggering compensatory mechanisms that can eventually result in cardiac decompensation. Given that severe heart failure patients experience rehospitalization rates as high as 50% annually, accurately predicting decompensation is crucial for effective clinical management. Previous studies, including Chin and Goldman (1997), Lewin et al. (2005), Chaudhry et al. (2007), Zhang et al. (2009), and Dendale et al. (2011), have highlighted the difficulty of assessing a patient's risk of rehospitalization.
A novel aspect here is the application of statistical modelling to predict rehospitalizations in telemonitored chronic heart failure patients, an area previously unexplored. Using the time-to-event outcome together with the longitudinal biomarkers, we exploit dynamic prediction within the joint modelling framework to enhance telemonitoring-based heart failure management. Our solution builds on Rizopoulos (2012b, 2011). The first step fits a shared random effects joint model that accounts for the time-varying and error-contaminated nature of the biomarkers, addressing measurement error that is often overlooked by existing methods. The second step computes patient-specific conditional survival probabilities, providing a concrete basis for intervention within a specified time window.
These conditional survival probabilities and their confidence intervals are computed following Rizopoulos (2011), and are dynamically updated as additional biomarker measurements become available, equipping physicians with a valuable tool on which to base their intervention decisions. In addition, quantifying the discriminative ability of the biomarker (Rizopoulos, 2011, 2012b) allows physicians to judge the accuracy of the predictions generated by the statistical model.
The Joint Model
Specification, Assumptions, and Estimation
Recent work has emphasized models that connect a time-to-event outcome with a longitudinal outcome through a shared latent structure. These models can either impose distributional assumptions on the latent structure or leave it unspecified; Tsiatis and Davidian (2004) review both the parametric and the conditional score approaches. Verbeke et al. (2010) and Rizopoulos (2011), among others, consider shared random effects joint models that assume normality for the random effects. For each subject, the observed event time, $T_i$, is defined as the minimum of the true event time, $T_i^*$, and the censoring time, $C_i$.
Specifically, the hazard is modelled as
\[
h_i(t \mid \mathcal{M}_i(t), w_i) = h_0(t)\exp\!\left\{\gamma'w_i + \alpha m_i(t)\right\}, \qquad (4.1)
\]
where $\mathcal{M}_i(t)$ denotes the history of the unobserved longitudinal process up to time $t$, $\alpha$ quantifies the influence of this process on the hazard, $h_0(t)$ is the baseline hazard, and $w_i$ is a vector of baseline covariates with associated parameter vector $\gamma$. Risk ratios for unit changes in the baseline covariates are given by $\exp(\gamma)$, and for a unit change in the longitudinal covariate by $\exp(\alpha)$. The observed longitudinal value for subject $i$ at time $t$ is related to the unobserved true value $m_i(t)$ through
\[
y_i(t) = m_i(t) + \varepsilon_i(t) = x_i'(t)\beta + z_i'(t)b_i + \varepsilon_i(t), \qquad (4.2)
\]
where $\varepsilon_i(t) \sim N(0, \sigma^2)$ is a measurement error assumed independent of the random effects $b_i \sim N(0, D)$, and $x_i(t)$ and $z_i(t)$ are the fixed- and random-effects design vectors.
The observed longitudinal outcome at time $t$ therefore comprises the true value, $m_i(t)$, contaminated by a random error term, $\varepsilon_i(t)$; the true value is represented by a mixed model. Note that this formulation assumes the longitudinal outcome can be observed at any time $t$, which is normally not the case, since measurements are only taken intermittently, at time points $t_{ij}$. As discussed in Verbeke et al. (2010), the aim is therefore to estimate $m_i(t)$ using the available measurements, $y_i(t_{ij})$, $j = 1, \ldots, n_i$, combined with model (4.2).
As discussed by the above-mentioned authors, the likelihood contribution of the $i$th patient is
\[
f(T_i, \delta_i, \boldsymbol{y}_i; \theta) = \int f(T_i, \delta_i \mid b_i; \theta)\, f(\boldsymbol{y}_i \mid b_i; \theta)\, f(b_i; \theta)\, db_i,
\]
with $\theta$ the parameter vector, $\boldsymbol{y}_i$ the longitudinal information for the $i$th subject, $\delta_i$ the event indicator, and
\[
f(T_i, \delta_i \mid b_i; \theta) = \left[h_0(T_i)\exp\{\gamma'w_i + \alpha m_i(T_i)\}\right]^{\delta_i} \times \exp\!\left\{-\int_0^{T_i} h_0(s)\exp\{\gamma'w_i + \alpha m_i(s)\}\, ds\right\}.
\]
This expression reflects the assumption that the censoring mechanism and the visiting process are non-informative; Tsiatis and Davidian (2004) give a comprehensive overview of the assumptions commonly made in this context.
Various parameterizations can be used for the true marker $m_i(t)$ in (4.1), including the true value, the true trajectory, or a combination of both; for highly non-linear longitudinal profiles, splines or higher-order polynomials can be used (Rizopoulos, 2012a). The model therefore offers considerable flexibility.
Predicted Conditional Survival Probabilities
We now consider computing survival probabilities for a new subject, based on the fitted joint model and the subject's available longitudinal measurements, $\mathcal{Y}_i(t) = \{y_i(s); 0 \leq s \leq t\}$. As Rizopoulos (2011) points out, the availability of longitudinal information up to time $t$ implies that the subject has survived to that point, and the longitudinal process acts as a time-dependent endogenous covariate. The quantity of interest is therefore the conditional probability of surviving beyond a time $u > t$, given survival up to $t$,
\[
\pi_i(u \mid t) = \Pr\{T_i^* \geq u \mid T_i^* > t, \mathcal{Y}_i(t), D_n; \theta\},
\]
where $D_n = \{T_i, \delta_i, \boldsymbol{y}_i; i = 1, \ldots, n\}$ denotes the sample on which the joint model was fitted (Rizopoulos, 2011). A Bayesian formulation allows standard errors to be obtained. The subject-specific survival probabilities are based on three components: first, the parameters $\theta$, drawn from their approximate normal posterior, centred at the maximum likelihood estimates with the asymptotic covariance matrix; second, the subject's random effects, drawn via a Metropolis-Hastings algorithm with multivariate $t$ proposals; and third, the ratio of conditional survival probabilities $S_i(u \mid \mathcal{M}_i(u))/S_i(t \mid \mathcal{M}_i(t))$, evaluated at the drawn parameters and random effects.
By repeating these steps we obtain a Monte Carlo sample of $\pi_i(u \mid t)$, from which standard errors and confidence intervals can be derived. Technical details are given in Rizopoulos (2011).
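A schematic of this Monte Carlo scheme is sketched below; draw_theta, draw_random_effects and cond_surv are placeholders for the three components just described and do not refer to any particular package.

```python
import numpy as np

def dynamic_surv_prob(u, t, y_hist, draw_theta, draw_random_effects, cond_surv,
                      n_draws=200, level=0.95):
    """Monte Carlo estimate of pi_i(u | t) = Pr(T* >= u | T* > t, Y_i(t)):
    1. draw theta from its approximate normal posterior (MLEs, asympt. cov.),
    2. draw the subject's random effects given theta and the history,
    3. evaluate the ratio S_i(u | M_i(u)) / S_i(t | M_i(t))."""
    draws = np.empty(n_draws)
    for m in range(n_draws):
        theta = draw_theta()                         # step 1
        b = draw_random_effects(theta, y_hist, t)    # step 2 (e.g. Metropolis-Hastings)
        draws[m] = cond_surv(u, theta, b) / cond_surv(t, theta, b)  # step 3
    lo, hi = np.percentile(draws, [50 * (1 - level), 100 - 50 * (1 - level)])
    return np.median(draws), (lo, hi)
```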
Prospective Accuracy: Time-dependent AUCs and the Dynamic Discrimination Index
An important aspect of the predictive performance of a joint model is discrimination: the model's ability to differentiate between patients who will experience the event of interest and those who will not. We focus on this aspect, using ROC methodology.
The conditional survival probabilities are updated as new measurements become available, and can be used to distinguish patients likely to experience the event within a specified time frame from those who are not, a distinction that is central to intervention decisions. Specifically, given longitudinal measurements $\mathcal{Y}_i(t)$, interest lies in the time window $(t, t + \Delta t]$. A prediction rule is based on $\pi_i(t + \Delta t \mid t)$: values less than or equal to a threshold $c$ flag a success (occurrence of the event in the window), while values greater than $c$ flag a failure. Sensitivity and specificity are then defined accordingly.
The AUC at time $t$, $AUC(t)$, is then obtained by varying the threshold $c$.
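The empirical computation can be sketched as follows (illustrative only, and deliberately simplified: subjects censored inside the window are ignored, whereas the actual methodology handles them more carefully):

```python
import numpy as np

def time_dependent_auc(pi_pred, event_in_window):
    """Empirical AUC(t) for the rule 'pi_i(t + dt | t) <= c flags an event in
    (t, t + dt]': sweep the threshold c, record sensitivity and specificity,
    and integrate the resulting ROC curve by the trapezoidal rule."""
    pi_pred = np.asarray(pi_pred, dtype=float)
    event_in_window = np.asarray(event_in_window, dtype=bool)
    thresholds = np.unique(np.concatenate(([0.0, 1.0], pi_pred)))
    sens, spec = [], []
    for c in thresholds:
        flagged = pi_pred <= c
        sens.append(flagged[event_in_window].mean())      # true positive rate
        spec.append((~flagged)[~event_in_window].mean())  # true negative rate
    fpr = 1.0 - np.array(spec)
    order = np.argsort(fpr)
    return np.trapz(np.array(sens)[order], fpr[order])
```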
Analysis of the Chronic Heart Failure Data
Model Formulation
The joint model, comprising (4.1) and (4.2), was used to analyze the time to first hospitalization together with each longitudinal marker in turn. Initially, baseline covariates were excluded, so that the linear predictor of the survival sub-model contained only the biomarker effect, while the fixed-effects structure of the longitudinal sub-model contained only a linear time effect.
We refer to this as the first step model. In a second step, each baseline covariate is considered in turn and included in both the survival and the longitudinal sub-model; the random-effects structure contains a random intercept. A Weibull baseline risk function is assumed for $h_0(t)$, conveniently re-parameterized as
\[
h_i(t \mid \mathcal{M}_i(t), w_i) = \rho t^{\rho-1}\exp\!\left\{\gamma_0 + \gamma'w_i + \alpha m_i(t)\right\},
\]
where $\rho$ is the shape parameter and $\exp(\gamma_0)$ plays the role of the scale parameter.
Diastolic Blood Pressure
We illustrate the dynamic prediction of conditional survival probabilities using the diastolic blood pressure measurements of two patients over the first 100 days, selected with attention to their patterns of missing data. Considering their measurements during the first 20, 40, 60, 80, and 100 days, we computed predicted conditional survival probabilities for the subsequent time points, using 200 Monte Carlo samples and taking medians. The results are shown in Figures 4.2 and 4.3, which display the longitudinal measurements and the corresponding conditional survival probability curves, with solid lines indicating medians and dashed lines confidence intervals. The first patient's diastolic blood pressure consistently clusters below 60, in contrast to the second patient's measurements, which remain higher, and this is reflected in their respective survival probability profiles.
The conditional survival probability for the first patient, whose diastolic blood pressure lies below 60, decreases considerably more than that of the second patient. This is consistent with the first patient being at higher risk of hypotension, so that a lower probability of surviving over time is reasonable to expect compared with the second patient.
Figure 4.4 illustrates how the conditional probabilities of surviving an extra 20, 40, 60, and 80 days are updated for the two patients as additional measurements become available, using measurements from the first 20 up to the first 100 days in steps of 20 days, and the same number of Monte Carlo samples. The large dots represent median estimates and the lines confidence intervals. We next consider the model's ability to distinguish between subjects who will be rehospitalized and those who will not, with AUC calculations pre-specified every two weeks.
Time points were thus taken at 14-day intervals, and time windows $\Delta t$ of 2, 4, 8, and 16 days were considered. The resulting AUCs and dynamic discrimination indices (DDIs) are displayed in Table 4.1.
[Figure 4.2: Conditional survival probabilities at each of the remaining time points until study end; panel (a) considers measurements during the first 20 days, panel (b) during the first 40 days.]
The AUCs indicate varying discriminative ability for the different time windows at the different time points, from a high of 0.9552 for the 2-day window at 6 weeks (42 days). To summarize discriminative ability over time, the DDIs provide a weighted average of the AUC values, with weights reflecting the number of patients still at risk. The DDIs range from 0.4875 for the 2-day prediction window to 0.5814 for the 8-day prediction window.
[Figure 4.3: Conditional survival probabilities at each of the remaining time points until study end; panel (a) considers measurements during the first 60 days, panel (b) during the first 80 days.]
In the second step, the focus shifts to the overall discriminative value gained by adjusting for each baseline covariate. Table 4.2 presents the DDIs, computed with the same time-point and time-window specifications as before.
Relative to the first step model, correcting for NTproBNP improves the overall discriminative ability for all time windows, with an index exceeding 0.7 for the 8-day window. Patient age also contributes positively, with the index exceeding 0.6 for both the 8-day and the 16-day windows.
[Figure 4.4: Conditional probabilities of surviving an extra 20, 40, 60, and 80 days, with each additional 20 days of measurement; panels (a) and (b) correspond to Patient 1 and Patient 2, respectively.]
Table 4.1: Diastolic Blood Pressure, First Step Model. AUCs and DDIs.
Time window ∆t   Time point t   AUC(t)   DDI
Table 4.2: Diastolic Blood Pressure, DDIs for different time windows.
Second Step Covariate controlled for:
∆t First Step NTproBNP Heart Rhythm NYHA Sex LVEF Age
Systolic Blood Pressure, Heart Rate, and Weight
For diastolic blood pressure, we have examined the dynamic prediction of conditional survival probabilities, their updating as additional measurements become available, the way the longitudinal profiles are reflected in these probabilities, and the discriminative ability of the first and second step models. The same can be done for systolic blood pressure, heart rate, and weight; here we focus on their discriminative ability, using the same time-point and time-window specifications.
For systolic blood pressure, several variables were transformed for numerical stability: the systolic blood pressure measurements were rescaled to unit magnitude by dividing by the largest value. In the second step model we controlled for NTproBNP, NYHA, sex, LVEF, and age, applying a square root transformation to NTproBNP and rescaling age by dividing by the minimum value; in the model that controlled for heart rhythm, systolic blood pressure was transformed using the cube root. The results, given in Appendix B.1, show a discriminative ability of at least 0.6 for all time windows in the first step model, with patient age having the largest positive effect on that ability.
Heart rate values were likewise rescaled for numerical stability by dividing by the maximum value. Table 4.3 presents the DDIs for the different time windows, showing a discriminative ability exceeding 0.65 in the first step model. Controlling for NTproBNP most improved the discriminative ability for the 2-, 4-, and 16-day windows, while the NYHA classification had the largest impact for the 8-day window.
Weight values were rescaled in the same way, and in the second step model age was rescaled as for systolic blood pressure. Appendix B.2 shows that the first step model yielded discrimination indices of between 0.3877 and 0.5020, while patient age had the highest positive effect on the discriminative ability.
Overall Findings
Diastolic blood pressure measurements alone showed limited discriminative ability; cardiac muscle fiber stretch (NTproBNP) and patient age enhanced it most. Systolic blood pressure measurements exhibited moderate discriminative power, with patient age the most influential additional factor. Heart rate measurements showed moderate to good discriminative ability, with the DDI for the 8-day time window exceeding 0.7, and cardiac muscle fiber stretch having the greatest additional impact. Patient age was also important in enhancing the discriminative ability of the longitudinal weight measurements. In sensitivity analyses, adopting a more complex random effects structure, with both random intercepts and slopes, markedly improved the discriminative ability.
Sensitivity Analysis
The joint model offers considerable flexibility: the hazard for an event can depend on the true value of the biomarker, its trajectory, or previous values; the true biomarker can be estimated through complex fixed- and random-effects structures, using higher-order polynomials or splines; and the baseline hazard can be given a piecewise-constant or spline representation, alongside the traditional parametric options.
We examine several of these modelling options with respect to the discriminative ability and the fit of the models for heart rate, focusing on the baseline hazard function, the lag of the true longitudinal covariate value, and the random-effects structure. The linear predictor of the survival sub-model is taken as $\gamma'w_i + \alpha m_i\{\max(t-k, 0)\}$, where $k$ denotes the lag, with lags of zero and one corresponding to the current and the previous value of the covariate, respectively. Two baseline hazard functions are considered: the Weibull and a piecewise constant function with knots at equal percentiles of the observed event times. We further compare models with a random intercept only to models with both a random intercept and a random slope; the results are summarized in Table 4.4.
In this sensitivity analysis, varying the assumptions about the lag of the longitudinal covariate and the baseline hazard function had minimal impact for a given random-effects structure. Extending the random-effects structure to include both a random intercept and a random slope, however, markedly improves the discriminative ability, demonstrating the value of the more complex structure in this context.
Table 4.4: Heart Rate DDI under various assumptions.
Random effects Lag Baseline hazard ∆t DDI AIC
The Akaike Information Criterion (AIC) values for the different models are also provided to evaluate model fit, which, along with the DDI, aids model selection. Among the models considered, the Weibull baseline hazard model with both a random intercept and a random slope, and with the hazard depending on the current value of the covariate (lag zero), has the lowest AIC, indicating the best fit.
Discussion
This chapter has discussed the role of dynamic prediction in supporting physicians' intervention decisions for telemonitored chronic heart failure (CHF) patients. Dynamically updated conditional survival probabilities and their confidence intervals provide valuable input for such decisions. The chapter has also examined the ability of the various biomarkers to distinguish patients at risk of rehospitalization from those who are not. The method therefore offers both a sound statistical modelling approach to predicting rehospitalizations and a practical tool for heart failure management.
Dynamic prediction has here focused on the time to first hospitalization, so methodological advancements are needed to address recurrent events. Furthermore, while the biomarkers have been analyzed individually, it would be valuable to evaluate them jointly, taking their interrelationships into account; software development to support dynamic prediction and accuracy measurement with multiple longitudinal biomarkers is needed. A challenge is that incorporating marker-specific random effects complicates computation as the number of random effects increases.
Missingness at Random in a Generalized Shared-Parameter Joint Modelling Framework for Longitudinal and Time-to-Event Data, and Sensitivity Analysis
This chapter explores the relationship between missing data concepts and the joint modelling of longitudinal and time-to-event outcomes. We propose an extended shared random effects joint model and define missing at random in a way that aligns with the traditional missing data frameworks. The complexities inherent in the joint modelling of longitudinal and time-to-event data are emphasized, and a sensitivity analysis is conducted within the extended random effects model. The approach is illustrated using data from a study in liver cirrhosis.
Introduction
In the missing data context, three primary modelling frameworks have been established: selection models (SEM), pattern-mixture models (PMM), and shared-parameter models (SPM) (Molenberghs and Kenward, 2007). The SEM and PMM approaches differ in how they factorize the joint distribution of the data and the process leading to missing values. The SPM framework, in contrast, posits that the observed data and the missing value processes depend on latent variables, and are assumed independent conditional on these variables.
Rubin's (1976) classification of missing value processes, formulated within the SEM framework, identifies three categories: missing completely at random (MCAR), where the missingness mechanism is independent of the outcomes; missing at random (MAR), where it depends only on observed outcomes; and missing not at random (MNAR), where it depends on unobserved outcomes. This taxonomy has also been translated to the PMM framework (Molenberghs et al., 1998) and to the SPM framework (Creemers et al., 2011).
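In selection-model notation, writing $\boldsymbol{y}_i = (\boldsymbol{y}_i^o, \boldsymbol{y}_i^m)$ for the observed and missing parts of the outcome vector and $\boldsymbol{r}_i$ for the missingness indicators, the taxonomy amounts to restrictions on the missingness model:
\[
\text{MCAR: } f(\boldsymbol{r}_i \mid \boldsymbol{y}_i^o, \boldsymbol{y}_i^m, \boldsymbol{\psi}) = f(\boldsymbol{r}_i \mid \boldsymbol{\psi}), \qquad
\text{MAR: } f(\boldsymbol{r}_i \mid \boldsymbol{y}_i^o, \boldsymbol{y}_i^m, \boldsymbol{\psi}) = f(\boldsymbol{r}_i \mid \boldsymbol{y}_i^o, \boldsymbol{\psi}),
\]
with MNAR covering any remaining dependence on $\boldsymbol{y}_i^m$; here $\boldsymbol{\psi}$ collects the parameters of the missingness process.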
Models for missing data rely on unverifiable assumptions about the missing value mechanism, which makes sensitivity analysis important (Verbeke and Molenberghs, 2000; Molenberghs and Verbeke, 2005). The stability of the inferences across a range of such assumptions indicates how much caution is needed when interpreting the results. While sensitivity analysis has mainly been developed within the SEM and PMM frameworks, Creemers et al. (2010) explored its application in the SPM context.
In the joint longitudinal and time-to-event setting, data collection includes both longitudinal measurements and a time-to-event outcome, such as prostate-specific antigen levels in prostate cancer patients together with the time to disease recurrence (Law et al., 2002; Yu et al., 2004, 2008); similarly, in HIV/AIDS research, viral load and CD4 cell count are tracked alongside the time to AIDS onset or death (DeGruttola and Tu, 1994; Rizopoulos, 2011). The objectives of such studies are typically threefold: to analyze the survival outcome while accounting for the longitudinal covariate, to analyze the longitudinal outcome while addressing the potentially non-random dropout caused by the event, and to study the association between the two outcomes
(Tsiatis and Davidian, 2004; Rizopoulos et al., 2009; Verbeke et al., 2010; Rizopoulos, 2012a).
The first objective is commonly achieved within the SPM framework, linking a time-to-event sub-model with the longitudinal process through a shared latent structure, such as a normal random effect, under conditional independence (Tsiatis and Davidian, 2004; Verbeke et al., 2010; Rizopoulos, 2011, 2012a). Complications include measurement error in the longitudinal covariate, the fact that it is observed only at clinic visits, and possible censoring of the time-to-event outcome (Tsiatis and Davidian, 2004).
The joint density in this setting involves not only the censoring process but also the visiting and measurement processes (Tsiatis and Davidian, 2004); the visiting process determines the time points at which measurements are available (Rizopoulos, 2012a). In likelihood inference it is assumed that censoring and visiting may depend on past visit times and observed longitudinal measurements, but not on future measurements or on the event time, an assumption in the spirit of MAR. These assumptions are unverifiable from the data at hand, raising sensitivity concerns. Rizopoulos (2012a) points to the use of different parameterizations of the longitudinal process within the survival sub-model as a route for sensitivity analysis.
This chapter develops the link between missing data and joint longitudinal and time-to-event models, bringing the two frameworks conceptually in line. We propose an extended shared random effects joint model, inspired by Creemers et al. (2011), adapted to the setting of joint longitudinal and time-to-event data. The additional intricacy is that the data can be coarsened in more than one way: the longitudinal sequence can be incomplete and the time-to-event outcome can be censored, and both can occur simultaneously. Coarsening refers to the discrepancy between the observed data and the more detailed, potentially counterfactual, full data.
Within the extended framework, we provide a characterization of MAR, consistent with its characterization in the missing data setting. We point out the complexity involved in model formulation within the extended framework, and we use the extended random-effects structure as a basis for sensitivity analysis.
The chapter is organized as follows. We first review missing data concepts and the various modelling frameworks, with emphasis on the characterization of MAR within each, and we discuss the generalized shared-parameter model (GSPM) framework of Creemers et al. (2011) and its MAR characterization. Section 5.3 turns to the joint modelling of longitudinal and time-to-event data, draws the parallel with missing data, and presents our extended framework together with its MAR characterization. Section 5.4 discusses the complexities of model formulation, Section 5.5 presents the sensitivity analysis together with an illustrative application, and Section 5.6 contains concluding remarks.