What is the intuitive difference between a random variable converging in probability and a random variable converging in distribution?
I have read many definitions and mathematical equations, but they have not really helped. (Please bear in mind that I am an undergraduate student studying econometrics.)
How can a random variable converge to a single number, but also converge to a distribution?
distributions
random-variable
convergence
intuition
Answers:
Let's say you have $N$ balls in a box. You can pick them one by one. After you have picked $k$ balls, I ask you: what is the mean weight of the balls in the box? Your best answer would be $\bar{x}_k = \frac{1}{k}\sum_{i=1}^{k} x_i$. Do you realize that $\bar{x}_k$ is itself a random value? It depends on which $k$ balls you happened to pick first.
Now, if you keep drawing balls, at some point there will be no balls left in the box, and you will get $\bar{x}_N \equiv \mu$.
So what we get is the random sequence $\bar{x}_1, \dots, \bar{x}_k, \dots, \bar{x}_N, \bar{x}_N, \bar{x}_N, \dots$, which stays at the constant $\mu$ from the $N$-th term onward.
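The ball example can be tried out directly. The following is only a small sketch: the ball weights are made-up numbers and `running_means` is a helper name I introduce for illustration. It draws the balls in a random order and records the running mean after each draw.

```python
import random

def running_means(weights, seed=0):
    """Draw all balls in a random order and return the running means after each draw."""
    rng = random.Random(seed)
    order = weights[:]          # copy, so the caller's list is left untouched
    rng.shuffle(order)          # a random drawing order, without replacement
    means, total = [], 0.0
    for k, w in enumerate(order, start=1):
        total += w
        means.append(total / k)
    return means

weights = [3.0, 5.0, 8.0, 4.0, 10.0]   # hypothetical ball weights
mu = sum(weights) / len(weights)        # true mean weight of the box

for seed in range(3):
    # Early terms differ from run to run; the last term is always mu.
    print([round(m, 2) for m in running_means(weights, seed)])
```

Different seeds give different early terms, but every run ends at the same value, which is the sense in which this random sequence settles on a constant.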
Next, let's take uniform random numbers $e_1, e_2, \dots$, where $e_i \in [0, 1]$. Let's look at the random sequence $\xi_1, \xi_2, \dots$, where $\xi_k = \frac{1}{\sqrt{k/12}} \sum_{i=1}^{k} \left(e_i - \frac{1}{2}\right)$. Each $\xi_k$ is a random value, because all of its terms are random values. We cannot predict what $\xi_k$ will be. However, it turns out that we can claim that the probability distribution of $\xi_k$ looks more and more like the standard normal $N(0,1)$. That is how distributions converge.
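To see the distribution of $\xi_k$ approaching the standard normal, one can compare an empirical CDF value with the normal CDF at the same point. This is a rough simulation sketch, not a proof; the function names, the evaluation point $1.0$ and the number of draws are my own choices.

```python
import math
import random

def xi(k, rng):
    """One realisation of xi_k = (1 / sqrt(k/12)) * sum_{i=1}^k (e_i - 1/2)."""
    s = sum(rng.random() - 0.5 for _ in range(k))
    return s / math.sqrt(k / 12)

def normal_cdf(x):
    """CDF of the standard normal N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def empirical_cdf_at(x, k, draws=20_000, seed=1):
    """Fraction of simulated xi_k values that are <= x."""
    rng = random.Random(seed)
    return sum(xi(k, rng) <= x for _ in range(draws)) / draws

for k in (1, 2, 30):
    # The gap between the two numbers should shrink as k grows.
    print(k, empirical_cdf_at(1.0, k), normal_cdf(1.0))
```

Any single $\xi_{30}$ is still unpredictable; it is only the probabilities attached to it that settle down.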
It's not clear how much intuition a reader of this question might have about convergence of anything, let alone of random variables, so I will write as if the answer is "very little". Something that might help: rather than thinking "how can a random variable converge", ask how a sequence of random variables can converge. In other words, it's not just a single variable, but a (very long!) list of variables, and the ones later in the list are getting closer and closer to ... something. Perhaps a single number, perhaps an entire distribution. To develop an intuition, we need to work out what "closer and closer" means. The reason there are so many modes of convergence for random variables is that there are several kinds of "closeness" one could measure.
First let's recap convergence of sequences of real numbers. In $\mathbb{R}$ we can use Euclidean distance $|x-y|$ to measure how close $x$ is to $y$. Consider $x_n = \frac{n+1}{n} = 1 + \frac{1}{n}$. Then the sequence $x_1, x_2, x_3, \dots$ starts $2, \frac{3}{2}, \frac{4}{3}, \frac{5}{4}, \frac{6}{5}, \dots$ and I claim that $x_n$ converges to $1$. Clearly $x_n$ is getting closer to $1$, but it's also true that $x_n$ is getting closer to $0.9$. For instance, from the third term onwards, the terms in the sequence are a distance of $0.5$ or less from $0.9$. What matters is that they are getting arbitrarily close to $1$, but not to $0.9$. No terms in the sequence ever come within $0.05$ of $0.9$, let alone stay that close for subsequent terms. In contrast $x_{20} = 1.05$, so it is $0.05$ from $1$, and all subsequent terms are within $0.05$ of $1$.
I could be stricter and demand terms get and stay within $0.001$ of $1$, and in this example I find this is true for the terms $N = 1000$ and onwards. Moreover I could choose any fixed threshold of closeness $\epsilon$, no matter how strict (except for $\epsilon = 0$, i.e. the term actually being $1$), and eventually the condition $|x_n - x| < \epsilon$ will be satisfied for all terms beyond a certain term (symbolically: for $n > N$, where the value of $N$ depends on how strict an $\epsilon$ I chose). For more sophisticated examples, note that I'm not necessarily interested in the first time that the condition is met - the next term might not obey the condition, and that's fine, so long as I can find a term further along the sequence for which the condition is met and stays met for all later terms. An example is $x_n = 1 + \frac{\sin(n)}{n}$, which also converges to $1$ and can again be checked with $\epsilon = 0.05$.
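The difference between "first gets within $\epsilon$" and "stays within $\epsilon$" can be checked numerically for the two sequences above. This is a small sketch with helper names of my own; it scans a finite horizon, which suffices here because $|\sin(n)/n| \le 1/n$ rules out any violation beyond $n = 1/\epsilon = 20$.

```python
import math

def first_within(x, limit, eps, horizon):
    """Smallest n <= horizon with |x(n) - limit| < eps, or None if there is none."""
    for n in range(1, horizon + 1):
        if abs(x(n) - limit) < eps:
            return n
    return None

def last_violation(x, limit, eps, horizon):
    """Largest n <= horizon with |x(n) - limit| >= eps, or 0 if there is none."""
    last = 0
    for n in range(1, horizon + 1):
        if abs(x(n) - limit) >= eps:
            last = n
    return last

def simple(n):
    return 1 + 1 / n

def wobbly(n):
    return 1 + math.sin(n) / n

eps, horizon = 0.05, 1000
print(first_within(wobbly, 1.0, eps, horizon), last_violation(wobbly, 1.0, eps, horizon))
print(last_violation(simple, 1.0, eps, horizon))
# Against the wrong limit 0.9 the violations never stop: the result equals the horizon.
print(last_violation(simple, 0.9, eps, horizon))
```

For the oscillating sequence the first hit comes well before the last violation, which is why the definition asks for the condition to stay met rather than merely to be met once.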
Now consider $X \sim U(0,1)$ and the sequence of random variables $X_n = \left(1 + \frac{1}{n}\right) X$. This is a sequence of RVs with $X_1 = 2X$, $X_2 = \frac{3}{2}X$, $X_3 = \frac{4}{3}X$ and so on. In what senses can we say this is getting closer to $X$ itself?
Since $X_n$ and $X$ are random variables, not just single numbers, the condition $|X_n - X| < \epsilon$ is now an event: even for a fixed $n$ and $\epsilon$ this might or might not occur. Considering the probability of it being met gives rise to convergence in probability. For $X_n \overset{p}{\to} X$ we want the complementary probability $P(|X_n - X| \ge \epsilon)$ - intuitively, the probability that $X_n$ is somewhat different (by at least $\epsilon$) to $X$ - to become arbitrarily small, for sufficiently large $n$. For a fixed $\epsilon$ this gives rise to a whole sequence of probabilities, $P(|X_1 - X| \ge \epsilon)$, $P(|X_2 - X| \ge \epsilon)$, $P(|X_3 - X| \ge \epsilon)$, $\dots$ and if this sequence of probabilities converges to zero (as happens in our example) then we say $X_n$ converges in probability to $X$. Note that probability limits are often constants: for instance in regressions in econometrics, we see $\operatorname{plim}(\hat{\beta}) = \beta$ as we increase the sample size $n$. But here $\operatorname{plim}(X_n) = X \sim U(0,1)$. Effectively, convergence in probability means that it's unlikely that $X_n$ and $X$ will differ by much on a particular realisation - and I can make the probability of $X_n$ and $X$ being further than $\epsilon$ apart as small as I like, so long as I pick a sufficiently large $n$.
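In this example the sequence of probabilities can be written down exactly: $|X_n - X| = X/n$, so $P(|X_n - X| \ge \epsilon) = P(X \ge n\epsilon) = \max(0, 1 - n\epsilon)$. The sketch below, with function names of my own, compares that formula with a simple simulation.

```python
import random

def exact_prob(n, eps):
    """P(|X_n - X| >= eps) for X ~ U(0,1) and X_n = (1 + 1/n) X, i.e. P(X >= n * eps)."""
    return max(0.0, 1.0 - n * eps)

def simulated_prob(n, eps, draws=100_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        x = rng.random()
        x_n = (1 + 1 / n) * x      # X_n is built from the same draw of X
        hits += abs(x_n - x) >= eps
    return hits / draws

eps = 0.05
for n in (1, 5, 10, 20, 50):
    print(n, exact_prob(n, eps), simulated_prob(n, eps))
```

Note that $X_n$ and $X$ are computed from the same draw; convergence in probability is a statement about the pair, not just about their separate distributions.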
A different sense in which $X_n$ becomes closer to $X$ is that their distributions look more and more alike. I can measure this by comparing their CDFs. In particular, pick some $x$ at which $F_X(x) = P(X \le x)$ is continuous (in our example $X \sim U(0,1)$ so its CDF is continuous everywhere and any $x$ will do) and evaluate the CDFs of the sequence of $X_n$s there. This produces another sequence of probabilities, $P(X_1 \le x)$, $P(X_2 \le x)$, $P(X_3 \le x)$, $\dots$ and this sequence converges to $P(X \le x)$. The CDFs evaluated at $x$ for each of the $X_n$ become arbitrarily close to the CDF of $X$ evaluated at $x$. If this result holds true regardless of which $x$ we picked, then $X_n$ converges to $X$ in distribution. It turns out this happens here, and we should not be surprised since convergence in probability to $X$ implies convergence in distribution to $X$. Note that it can't be the case that $X_n$ converges in probability to a particular non-degenerate distribution, but converges in distribution to a constant. (Which was possibly the point of confusion in the original question? But note a clarification later.)
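For this example the CDFs are available in closed form: $P(X_n \le x) = P\left(X \le \frac{n}{n+1}x\right)$, which is $\frac{n}{n+1}x$ clipped to $[0,1]$. A minimal sketch (function names are mine) that prints the gap to $F_X(x)$ at a few points:

```python
def cdf_x(x):
    """CDF of X ~ U(0, 1)."""
    return min(1.0, max(0.0, x))

def cdf_xn(x, n):
    """CDF of X_n = (1 + 1/n) X, using P(X_n <= x) = P(X <= x * n / (n + 1))."""
    return cdf_x(x * n / (n + 1))

for x in (0.25, 0.5, 0.9):
    gaps = [abs(cdf_xn(x, n) - cdf_x(x)) for n in (1, 10, 100, 1000)]
    print(x, gaps)   # each list of gaps shrinks towards zero
```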
For a different example, let $Y_n \sim U\left(1, \frac{n+1}{n}\right)$. We now have a sequence of RVs, $Y_1 \sim U(1, 2)$, $Y_2 \sim U\left(1, \frac{3}{2}\right)$, $Y_3 \sim U\left(1, \frac{4}{3}\right)$, $\dots$ and it is clear that the probability distribution is degenerating to a spike at $y = 1$. Now consider the degenerate distribution $Y = 1$, by which I mean $P(Y = 1) = 1$. It is easy to see that for any $\epsilon > 0$, the sequence $P(|Y_n - Y| \ge \epsilon)$ converges to zero so that $Y_n$ converges to $Y$ in probability. As a consequence, $Y_n$ must also converge to $Y$ in distribution, which we can confirm by considering the CDFs. Since the CDF $F_Y(y)$ of $Y$ is discontinuous at $y = 1$ we need not consider the CDFs evaluated at that value, but for the CDFs evaluated at any other $y$ we can see that the sequence $P(Y_1 \le y)$, $P(Y_2 \le y)$, $P(Y_3 \le y)$, $\dots$ converges to $P(Y \le y)$ which is zero for $y < 1$ and one for $y > 1$. This time, because the sequence of RVs converged in probability to a constant, it converged in distribution to a constant also.
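The reason for skipping the discontinuity point shows up clearly if the CDFs are evaluated there anyway. In this sketch (function names are mine), $F_{Y_n}(1) = 0$ for every $n$ while $F_Y(1) = 1$, yet at every other $y$ the CDFs do converge.

```python
def cdf_yn(y, n):
    """CDF of Y_n ~ U(1, 1 + 1/n): zero below 1, rising linearly to one at 1 + 1/n."""
    return min(1.0, max(0.0, n * (y - 1.0)))

def cdf_y(y):
    """CDF of the constant Y = 1."""
    return 1.0 if y >= 1.0 else 0.0

for y in (0.9, 1.0, 1.01):
    # At y = 1.0 the sequence stays at 0 and never approaches cdf_y(1.0) = 1.
    print(y, [cdf_yn(y, n) for n in (1, 10, 100, 1000)], cdf_y(y))
```

So insisting on convergence at $y = 1$ would wrongly rule out a sequence that is plainly collapsing onto the constant $1$.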
Some final clarifications:
In my mind, the existing answers all convey useful points, but they do not make an important distinction clear between the two modes of convergence.
Let $X_n$, $n = 1, 2, \dots$, and $Y$ be random variables. For intuition, imagine $X_n$ are assigned their values by some random experiment that changes a little bit for each $n$, giving an infinite sequence of random variables, and suppose $Y$ gets its value assigned by some other random experiment.
If $X_n \overset{p}{\to} Y$, we have, by definition, that the probability of $Y$ and $X_n$ differing from each other by some arbitrarily small amount approaches zero as $n \to \infty$, for as small an amount as you like. Loosely speaking, far out in the sequence of $X_n$, we are confident $X_n$ and $Y$ will take values very close to each other.
On the other hand, if we only have convergence in distribution and not convergence in probability, then we know that for large $n$, $P(X_n \le x)$ is almost the same as $P(Y \le x)$, for almost any $x$. Note that this does not say anything about how close the values of $X_n$ and $Y$ are to each other. For example, if $Y \sim N(0, 10^{10})$, and thus $X_n$ is also distributed pretty much like this for large $n$, then it seems intuitively likely that the values of $X_n$ and $Y$ will differ by quite a lot in any given observation. After all, if there is no restriction on them other than convergence in distribution, they may very well for all practical purposes be independent $N(0, 10^{10})$ variables.
(In some cases it may not even make sense to compare $X_n$ and $Y$; maybe they're not even defined on the same probability space. This is a more technical note, though.)
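The independence scenario is easy to simulate. In this sketch I assume, purely for illustration, that every $X_n$ is an independent draw from the same normal distribution as $Y$, so convergence in distribution holds trivially; the standard deviation $10^5$ corresponds to a variance of $10^{10}$, and the function name is my own.

```python
import random

def fraction_far_apart(sigma, gap, draws=20_000, seed=0):
    """Estimate P(|X_n - Y| >= gap) when X_n and Y are independent N(0, sigma^2)."""
    rng = random.Random(seed)
    far = 0
    for _ in range(draws):
        x_n = rng.gauss(0.0, sigma)   # same distribution as Y for every n
        y = rng.gauss(0.0, sigma)     # drawn independently of x_n
        far += abs(x_n - y) >= gap
    return far / draws

sigma = 1e5                            # standard deviation, i.e. variance 1e10
print(fraction_far_apart(sigma, gap=sigma))
```

The estimated probability does not depend on $n$ at all, so it cannot shrink to zero: identical distributions, but no convergence in probability.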
If you're learning econometrics, you're probably wondering about this in the context of a regression model. There, the coefficient estimator converges to a degenerate distribution, that is, to a constant. But something else does have a non-degenerate limiting distribution.
$\hat{\beta}_n$ converges in probability to $\beta$ if the necessary assumptions are met. This means that by choosing a large enough sample size $n$, the estimator will be as close as we want to the true parameter, with the probability of it being farther away as small as we want. If you think of plotting the histogram of $\hat{\beta}_n$ for various $n$, it will eventually be just a spike centered on $\beta$.
In what sense does $\hat{\beta}_n$ converge in distribution? It also converges to a constant, not to a normally distributed random variable. If you compute the variance of $\hat{\beta}_n$ you see that it shrinks with $n$, so it goes to zero as $n$ grows, which is why the estimator goes to a constant. What does converge to a normally distributed random variable is $\sqrt{n}(\hat{\beta}_n - \beta)$. If you take the variance of that you'll see that it neither shrinks nor grows with $n$. In very large samples, this quantity will be approximately $N(0, \sigma^2)$ under standard assumptions. We can then use this approximation to approximate the distribution of $\hat{\beta}_n$ in that large sample.
But you are right that the limiting distribution of $\hat{\beta}_n$ is also a constant.
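Both statements about the variance can be checked by simulation. As a simplifying assumption, this sketch uses the sample mean of $n$ normal draws as a stand-in for a regression estimator; the values `BETA = 2.0` and `SIGMA = 3.0` and the function names are hypothetical choices of mine.

```python
import math
import random
import statistics

BETA, SIGMA = 2.0, 3.0   # hypothetical true parameter and noise standard deviation

def simulate_estimates(n, reps=1000, seed=0):
    """reps independent copies of beta_hat_n, here the mean of n draws from N(BETA, SIGMA^2)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(BETA, SIGMA) for _ in range(n)) / n for _ in range(reps)]

def variances(n):
    """Simulated variance of beta_hat_n and of sqrt(n) * (beta_hat_n - BETA)."""
    est = simulate_estimates(n)
    scaled = [math.sqrt(n) * (b - BETA) for b in est]
    return statistics.pvariance(est), statistics.pvariance(scaled)

for n in (10, 100, 400):
    raw, scaled = variances(n)
    # raw shrinks roughly like SIGMA^2 / n; scaled stays near SIGMA^2.
    print(n, round(raw, 4), round(scaled, 2))
```

The first column of variances collapsing towards zero is the spike at $\beta$; the second column staying put is what leaves room for a non-degenerate normal limit.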
Let me try to give a very short answer, using some very simple examples.
Convergence in distribution
Let $X_n \sim N\left(\frac{1}{n}, 1\right)$ for all $n$; then $X_n$ converges to $X \sim N(0,1)$ in distribution. However, the randomness in the realization of $X_n$ does not change over time. If we have to predict the value of $X_n$, the expectation of our error does not change over time.
Convergence in probability
Now, consider the random variable $Y_n$ that takes the value $0$ with probability $1 - \frac{1}{n}$ and $1$ otherwise. As $n$ goes to infinity, we are more and more sure that $Y_n$ will equal $0$. Hence, we say $Y_n$ converges in probability to $0$. Note that this also implies $Y_n$ converges in distribution to $0$.
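The two examples can be put side by side with exact formulas. In this sketch (function names are mine), the CDF gap for $X_n \sim N(1/n, 1)$ shrinks, but the expected absolute error of the best guess for $X_n$ is $\sqrt{2/\pi}$ for every $n$, whereas for $Y_n$ the probability of missing the limit $0$ is $1/n$.

```python
import math

def normal_cdf(x, mean=0.0):
    """CDF of N(mean, 1)."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))

def cdf_gap(n, x):
    """|P(X_n <= x) - P(X <= x)| with X_n ~ N(1/n, 1) and X ~ N(0, 1)."""
    return abs(normal_cdf(x, mean=1.0 / n) - normal_cdf(x))

def expected_abs_error_xn():
    """E|X_n - 1/n| for X_n ~ N(1/n, 1); this is sqrt(2/pi) whatever n is."""
    return math.sqrt(2.0 / math.pi)

def prob_yn_nonzero(n):
    """P(Y_n != 0) = P(|Y_n - 0| >= eps) for any 0 < eps <= 1."""
    return 1.0 / n

for n in (1, 10, 100, 1000):
    print(n, cdf_gap(n, 0.3), expected_abs_error_xn(), prob_yn_nonzero(n))
```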