Intuition for the cumulative hazard function (survival analysis)

17

I'm trying to get intuition for each of the main functions in actuarial science (specifically for the Cox Proportional Hazards Model). Here's what I have so far:

  • f(x): starting at the start time, the probability distribution of when you will die.
  • F(x): just the cumulative distribution. At time T, what % of the population will be dead?
  • S(x): 1 − F(x). At time T, what % of the population will be alive?
  • h(x): hazard function. At a given time T, of the people still alive, this can be used to estimate how many people will die in the next time interval, or if interval->0, 'instantaneous' death probability.
  • H(x): cumulative hazard. No idea.

What's the idea behind combining hazard values, especially when they are continuous? If we use a discrete example with death rates across four seasons, and the hazard function is as follows:

  • Starting at Spring, everyone is alive, and 20% will die
  • Now in Summer, of those remaining, 50% will die
  • Now in Fall, of those remaining, 75% will die
  • Final season is Winter. Of those remaining, 100% will die

Then the cumulative hazard is 20%, 70%, 145%, 245%?? What does that mean, and why is this useful?

Jon
1
Your T's should be x's, or vice versa.
Glen_b -Reinstate Monica
5
Regarding h(x), you have a mistake (although it's a very common confusion). You write, "interval->0, 'instantaneous' death probability". A correct statement would be 'instantaneous death rate'. This cannot be a probability because it is a probability divided by dt; moreover, it could be >1.
gung - Reinstate Monica

Answers:

6

Combining the proportions dying as you do does not give you the cumulative hazard. The hazard rate in continuous time is the conditional probability, per unit of time, that an event happens during a very short interval:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t \mid T > t)}{\Delta t}$$

The cumulative hazard is the integral of the (instantaneous) hazard rate over age or time. It is like summing probabilities, but since Δt is very small, each of these probabilities is also a small number (e.g. the hazard rate of dying may be around 0.004 at ages around 30). The hazard rate is conditional on not having experienced the event before t, so for a population the cumulative hazard can exceed 1.
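
To make the distinction concrete, here is a minimal R sketch using the four-season numbers from the question. It assumes the standard continuous-time relation S(t) = exp(-H(t)), which is not stated in this answer, and all variable names are made up for illustration. It compares the running total of the seasonal death proportions with the cumulative hazard implied by the survival curve.

```r
# Conditional death proportions from the question (discrete seasons)
q <- c(Spring = 0.20, Summer = 0.50, Fall = 0.75, Winter = 1.00)

S <- cumprod(1 - q)     # share of the starting population still alive after each season
naive_sum <- cumsum(q)  # the running total from the question: 0.20, 0.70, 1.45, 2.45
H <- -log(S)            # cumulative hazard implied by S(t) = exp(-H(t)) (assumed relation)

round(data.frame(q, S, naive_sum, H), 3)
```

The two columns differ at every season, and H is infinite in Winter: under this relation, certain death by a finite time corresponds to an unbounded cumulative hazard.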

You can look up a human mortality life table, although that is a discrete-time formulation, and try accumulating the age-specific death rates mx.

If you use R, here's a little example of approximating these functions from number of deaths at each 1-year age interval:

dx <-  c(3184L, 268L, 145L, 81L, 64L, 81L, 101L, 50L, 72L, 76L, 50L, 
         62L, 65L, 95L, 86L, 120L, 86L, 110L, 144L, 147L, 206L, 244L, 
         175L, 227L, 182L, 227L, 205L, 196L, 202L, 154L, 218L, 279L, 193L, 
         223L, 227L, 300L, 226L, 256L, 259L, 282L, 303L, 373L, 412L, 297L, 
         436L, 402L, 356L, 485L, 495L, 597L, 645L, 535L, 646L, 851L, 689L, 
         823L, 927L, 878L, 1036L, 1070L, 971L, 1225L, 1298L, 1539L, 1544L, 
         1673L, 1700L, 1909L, 2253L, 2388L, 2578L, 2353L, 2824L, 2909L, 
         2994L, 2970L, 2929L, 3401L, 3267L, 3411L, 3532L, 3090L, 3163L, 
         3060L, 2870L, 2650L, 2405L, 2143L, 1872L, 1601L, 1340L, 1095L, 
         872L, 677L, 512L, 376L, 268L, 186L, 125L, 81L, 51L, 31L, 18L, 
         11L, 6L, 3L, 2L)

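x <- 0:(length(dx) - 1)            # age at the start of each 1-year interval
surv_end <- sum(dx) - cumsum(dx)   # number still alive at the end of each interval
keep <- surv_end > 0               # drop the last age, where nobody survives (division by zero)
hx <- dx[keep] / surv_end[keep]    # approximate hazard: deaths relative to survivors
Hx <- cumsum(hx)                   # approximate cumulative hazard

plot(x[keep], hx, type = "l", xlab = "age", ylab = "h(t)", main = "h(t)", log = "y")
plot(x[keep], Hx, type = "l", xlab = "age", ylab = "H(t)", main = "H(t)")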

Hope this helps.

martin
Is it correct to say that h(t)*dt is the probability of an event occurring in an interval of length dt around t? therefore, the value h(t) is the probability of an event occurring within 1 unit of time centered around t. This would only be the case if h(t)<=1
gagak
10

The book "An Introduction to Survival Analysis Using Stata" (2nd Edition) by Mario Cleves has a good chapter on that topic.

You can find the chapter on Google Books, p. 13-15. But I would advise reading the whole of chapter 2.

Here is the short form:

  • "it measures the total amount of risk that has been accumulated up to time t" (p. 8)
  • count data interpretation: "it gives the number of times we would expect (mathematically) to observe failures [or other events] over a given period, if only the failure event were repeatable" (p. 13)
elevendollar
5

I'd HAZARD a guess that it's noteworthy owing to its use in diagnostic plots:

(1) In the Cox proportional hazards model $h(x) = e^{\beta^{\mathrm T} z} h_0(x)$, where $\beta$ and $z$ are the coefficient and covariate vectors respectively, & $h_0(x)$ is the baseline hazard function; & so $\log H(x) = \beta^{\mathrm T} z + \log H_0(x)$, where $H_0(x)$ is the cumulative baseline hazard. If you plot the estimate $\log \hat H(x)$ against $x$, different covariate patterns follow parallel curves, provided the proportional hazards assumption is correct.

(2) In the Weibull model $h(x) = \frac{\alpha}{\theta}\left(\frac{x}{\theta}\right)^{\alpha-1}$, where $\theta$ & $\alpha$ are the scale & shape parameters respectively; & so $\log H(x) = \alpha \log x - \alpha \log \theta$. If you plot the estimate $\log \hat H(x)$ against $\log x$, you get a straight line with slope $\hat\alpha$ & intercept $-\hat\alpha \log \hat\theta$, provided the Weibull assumption is correct. And of course a slope near to 1 suggests an exponential model might fit.
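
The Weibull check in (2) can be tried on simulated data. The sketch below is only an illustration under stated assumptions: the shape and scale values are hypothetical, there is no censoring, and the cumulative hazard is estimated with the Nelson-Aalen estimator, which is my choice here and not something prescribed above.

```r
set.seed(42)
alpha <- 1.5   # hypothetical shape parameter
theta <- 2     # hypothetical scale parameter
n <- 2000

# Uncensored event times, sorted so that the risk set shrinks by one at each event
t_sorted <- sort(rweibull(n, shape = alpha, scale = theta))

# Nelson-Aalen estimate: each event adds 1 / (number still at risk)
at_risk <- n:1
H_hat <- cumsum(1 / at_risk)

# Straight-line fit of log H against log x
fit <- lm(log(H_hat) ~ log(t_sorted))
slope <- unname(coef(fit)[2])
intercept <- unname(coef(fit)[1])

plot(log(t_sorted), log(H_hat), type = "l", xlab = "log x", ylab = "log H(x)")
abline(fit, lty = 2)

# Compare the fitted line with the values the Weibull model predicts
c(slope = slope, alpha = alpha, intercept = intercept, predicted = -alpha * log(theta))
```

If the Weibull assumption holds, the plotted curve should hug the dashed line, with the fitted slope close to alpha.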

An intuitive interpretation of H(x) is the expected number of deaths of an individual up to time x if the individual were to be resurrected after each death (without resetting time to zero).
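
This "resurrection" reading can be checked by simulation. The following is a rough sketch, assuming a Weibull hazard with made-up parameters: deaths that recur without resetting the clock form a non-homogeneous Poisson process, which on the transformed time scale u = H(x) becomes a unit-rate Poisson process. The function name count_deaths and all parameter values are hypothetical.

```r
set.seed(1)
alpha <- 1.5                         # hypothetical Weibull shape
theta <- 2                           # hypothetical Weibull scale
H <- function(x) (x / theta)^alpha   # Weibull cumulative hazard
x_end <- 3                           # follow each individual up to this time
n_sim <- 20000

count_deaths <- function() {
  # On the clock u = H(x), successive deaths are separated by Exp(1) gaps
  u <- 0
  n <- 0
  repeat {
    u <- u + rexp(1)
    if (u > H(x_end)) break
    n <- n + 1
  }
  n
}

counts <- replicate(n_sim, count_deaths())
c(simulated_mean = mean(counts), H_at_x_end = H(x_end))
```

The average number of simulated deaths per individual should land close to H(x_end), and it can exceed 1, which is why H(x) cannot be read as a probability.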

Scortchi - Reinstate Monica
3

In paraphrasing what @Scortchi is saying, I would emphasize that the cumulative hazard function does not have a nice interpretation, and as such I would not try to use it as a way to interpret results; telling a non-statistical researcher that the cumulative hazards are different will most likely result in an "mm-hm" answer and then they'll never ask about the subject again, and not in a good way.

However, the cumulative hazard function turns out to be very useful mathematically, for example as a general way to link the hazard function and the survival function. So it's important to know what the cumulative hazard is and how it can be used in various statistical methods. But in general, I don't think it's particularly useful to think about real data in terms of cumulative hazards.
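
For reference, the link mentioned above can be written out as follows, assuming a continuous survival time with density f and S(0) = 1 (the notation follows the question):

```latex
\begin{align*}
h(t) &= \frac{f(t)}{S(t)} = -\frac{d}{dt}\log S(t), \\
H(t) &= \int_0^t h(u)\,du = -\log S(t), \\
S(t) &= \exp\{-H(t)\}.
\end{align*}
```

So any one of h, H and S determines the other two.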

Cliff AB