Survival Analysis Regression
Survival analysis is a statistical methodology to study the occurrence of an event over time. It is referred to as survival analysis because it was originally derived in contexts where the event was death, but the event under study need not be death. Examples from the social sciences where survival analysis can be used are studies that investigate time from marriage until separation or divorce and intervals between births.
A graphical representation of typical survival data is depicted in Figure 1, which shows study recruitment over time. For each of four participants, study entry is indicated as t0. Occurrence of an event is indicated by a square. If no event is observed during the study period, the last known event-free time point is marked with a circle. For these participants the “time until the event occurred” cannot be specified, and such observations are said to be censored. A censored observation can arise because a participant is lost to follow-up during the observation period or because the observation period is limited, that is, the event might occur some time after the observation period has ended. Censoring of this type is called right censoring. A right-censored observation indicates that the event, if it happens, will take place after contact with the participant is lost or after the observation period ends. Analysis of the data is based not on chronologic time but on a different time scale, the “time from t0” (Figure 2).
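The change of time scale described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original study design: the function name `time_from_entry`, the four participants, and all dates are hypothetical, and time is measured in days purely for convenience.

```python
from datetime import date
from typing import Optional, Tuple


def time_from_entry(
    entry: date,
    end_of_follow_up: date,
    event_date: Optional[date] = None,
) -> Tuple[int, bool]:
    """Return (days since study entry t0, event_observed) for one participant.

    `end_of_follow_up` is the last date on which the participant was known
    to be event-free, or the end of the observation period.
    """
    if end_of_follow_up < entry:
        raise ValueError("follow-up cannot end before study entry")
    # The event counts only if it falls inside the observed window.
    if event_date is not None and entry <= event_date <= end_of_follow_up:
        return (event_date - entry).days, True
    # Otherwise the observation is right censored at the end of follow-up.
    return (end_of_follow_up - entry).days, False


# Hypothetical participants: (id, entry date, end of follow-up, event date or None)
participants = [
    ("A", date(2001, 1, 15), date(2004, 12, 31), date(2003, 6, 1)),
    ("B", date(2001, 9, 1), date(2004, 12, 31), None),   # event-free at study end
    ("C", date(2002, 3, 10), date(2003, 2, 1), None),    # lost to follow-up
    ("D", date(2002, 11, 5), date(2004, 12, 31), date(2004, 4, 20)),
]

# Re-express every record on the "time from t0" scale.
durations = [
    (pid, *time_from_entry(entry, end, event))
    for pid, entry, end, event in participants
]

for pid, days, observed in durations:
    status = "event" if observed else "right censored"
    print(f"{pid}: {days} days from t0 ({status})")
```

Each participant is reduced to a pair (time from t0, event indicator), which is the form in which survival data are usually passed to regression software.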
Survival analysis regression aims to investigate and quantify the impact that subject and study factors have on the time until the event occurs. These factors are often measured at study entry (t0) for each individual participant, and their effect on time to event is quantified via the hazard function of the survival time distribution. The hazard function models the rate at which events occur as a function of subject and study factors. Parametric and semiparametric methods are available for survival analysis regression. For details see Hosmer and Lemeshow (1999) or Kleinbaum and Klein (2005). The most frequently used model for analyzing survival data is the Cox proportional hazards model (a semiparametric model). It assumes that the hazard rates of any two participants remain proportional over time but does not make distributional assumptions regarding survival times. Examples of parametric methods are the Weibull and accelerated failure time models, which assume specific statistical distributions for survival times. The Weibull model also satisfies the proportional hazards assumption, whereas accelerated failure time models in general do not.
Standard models assume independence between observations, but extensions (frailty models) are available to accommodate dependencies between observations. Such dependencies might arise if, for example, participants are family members. Extensions also exist to accommodate multiple events, competing events, and factors that might change over time. The extension to multiple events and competing events is conceptually straightforward. Analysis involving factors that might change over time, by contrast, is both technically and conceptually more involved. Survival analysis regression has been used extensively and successfully in various fields to quantify the impact of different factors on time to event.
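The hazard function and the proportional hazards assumption mentioned above can be written out explicitly. The notation below is an assumption introduced for illustration and does not appear in the original entry: T is the survival time, x = (x_1, ..., x_p) are the factors measured at study entry, h_0(t) is an unspecified baseline hazard, and the β_k are regression coefficients.

```latex
% Hazard function: instantaneous event rate at time t among those still event-free
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

% Cox proportional hazards model: baseline hazard scaled by the factors
h(t \mid x) = h_0(t) \exp(\beta_1 x_1 + \cdots + \beta_p x_p)

% Hazard ratio for two participants with factor values x and x^{*}
\frac{h(t \mid x)}{h(t \mid x^{*})} = \exp\!\left( \sum_{k=1}^{p} \beta_k (x_k - x_k^{*}) \right)
```

Because the baseline hazard cancels in the ratio, the hazard ratio does not depend on t, which is what "proportional over time" means. Leaving h_0(t) unspecified is what makes the model semiparametric.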
SEE ALSO Censoring, Left and Right; Censoring, Sample; Regression
BIBLIOGRAPHY
Hosmer, David W., Jr., and Stanley Lemeshow. 1999. Applied Survival Analysis: Regression Modeling of Time to Event Data. New York: Wiley.
Kleinbaum, David G., and Mitchel Klein. 2005. Survival Analysis: A Self-Learning Text. New York: Springer.
Susanne May
David W. Hosmer Jr.