Where does the central limit theorem (CLT) come from?


A very simple version of the central limit theorem is the Lindeberg–Lévy CLT below:

$$\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^n X_i\right)-\mu\right)\ \xrightarrow{d}\ N(0,\sigma^2).$$

I do not understand why there is a $\sqrt{n}$ on the left-hand side. The Lyapunov CLT says

$$\frac{1}{s_n}\sum_{i=1}^{n}(X_i-\mu_i)\ \xrightarrow{d}\ N(0,1),$$

but why $s_n$ and not $\sqrt{s_n}$? Can anyone tell me what these factors, such as $\sqrt{n}$ and $\frac{1}{s_n}$, are, and how we get them in the theorem?
Flying pig
This is explained at stats.stackexchange.com/questions/3734. That answer is long, because it asks for "intuition." It concludes, "This simple approximation, however, suggests how de Moivre might originally have suspected that there is a universal limiting distribution, that its logarithm is a quadratic function, and that the proper scale factor $s_n$ must be proportional to $\sqrt{n}$...."
whuber
Intuitively, if all $\sigma_i=\sigma$ then $s_n=\sqrt{\sum \sigma_i^2}=\sqrt{n}\,\sigma$, and the 2nd line follows from the 1st:

$$\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^n X_i\right)-\mu\right)=\frac{1}{\sqrt{n}}\sum_{i=1}^n (X_i-\mu)\ \xrightarrow{d}\ N(0,\sigma^2)$$

divide by $\sigma = s_n/\sqrt{n}$ (of course the Lyapunov condition, which combines all the $\sigma_i$, is another question):

$$\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^n (X_i-\mu)}{s_n/\sqrt{n}}=\frac{1}{s_n}\sum_{i=1}^n (X_i-\mu_i)\ \xrightarrow{d}\ N(0,1)$$
Sextus Empiricus

Answers:


Great question (+1)!!

You will remember that for independent random variables $X$ and $Y$, $Var(X+Y) = Var(X) + Var(Y)$ and $Var(aX) = a^2\,Var(X)$. So the variance of $\sum_{i=1}^n X_i$ is $\sum_{i=1}^n \sigma^2 = n\sigma^2$, and the variance of $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ is $n\sigma^2/n^2 = \sigma^2/n$.
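A quick simulation makes these two variance facts concrete (my own sketch, not part of the original answer; the Gaussian distribution and the constants are arbitrary choices):

```python
import numpy as np

# Illustrative check: with n i.i.d. draws of variance sigma^2, the sum has
# variance n * sigma^2 and the mean has variance sigma^2 / n.
rng = np.random.default_rng(0)
n, sigma = 50, 2.0
samples = rng.normal(loc=1.0, scale=sigma, size=(100_000, n))

var_sum = samples.sum(axis=1).var()    # ~ n * sigma^2 = 200
var_mean = samples.mean(axis=1).var()  # ~ sigma^2 / n = 0.08

print(var_sum, var_mean)
```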

This is for the variance. To standardize a random variable, you divide it by its standard deviation. As you know, the expected value of $\bar{X}$ is $\mu$, so the variable

$$\frac{\bar{X} - E(\bar{X})}{\sqrt{Var(\bar{X})}} = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma}$$

has expected value 0 and variance 1. So if it tends to a Gaussian, it has to be the standard Gaussian $N(0,1)$. Your formulation in the first equation is equivalent: by multiplying the left-hand side by $\sigma$ you set the variance to $\sigma^2$.
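This standardization can be watched numerically. The sketch below is my own illustration (the Exponential(1) example, where $\mu = \sigma = 1$, is an arbitrary choice): the standardized sample mean has moments close to those of $N(0,1)$.

```python
import numpy as np

# Standardize the sample mean of i.i.d. Exponential(1) variables
# (mu = 1, sigma = 1) and check the first two moments.
rng = np.random.default_rng(1)
n = 500
x = rng.exponential(scale=1.0, size=(20_000, n))

z = np.sqrt(n) * (x.mean(axis=1) - 1.0) / 1.0  # sqrt(n) * (xbar - mu) / sigma
print(z.mean(), z.var())  # near 0 and 1
```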

Regarding your second point, I believe that the equation shown above illustrates that you have to divide by the standard deviation, and not by its square root, to standardize, which explains why you use $s_n$ (an estimator of $\sigma$) and not $\sqrt{s_n}$.

Addendum: @whuber suggested discussing why the scaling is by $\sqrt{n}$. He does so there, but because his answer is very long I will try to capture the essence of his argument (which is a reconstruction of de Moivre's thinking).

If you add up a large number of +1's and −1's, you can approximate the probability that the sum will be $j$ by elementary counting. The log of this probability is proportional to $-j^2/n$. So if we want the probability above to converge to a constant as $n$ becomes large, we have to use a normalizing factor in $O(\sqrt{n})$.

Using modern (post-de Moivre) mathematical tools, you can see the approximation mentioned above by noting that the probability sought is

$$P(j)=\binom{n}{n/2+j}2^{-n}=\frac{n!}{2^n\,(n/2+j)!\,(n/2-j)!}$$

which we approximate with Stirling's formula:

$$P(j)\approx \frac{n^n\,e^{n/2+j}\,e^{n/2-j}}{2^n\,e^n\,(n/2+j)^{n/2+j}(n/2-j)^{n/2-j}} = \left(\frac{1}{1+2j/n}\right)^{n/2+j}\left(\frac{1}{1-2j/n}\right)^{n/2-j}.$$

$$\log P(j)\approx -(n/2+j)\log(1+2j/n)-(n/2-j)\log(1-2j/n)\approx -2j(n/2+j)/n+2j(n/2-j)/n \propto -j^2/n.$$
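The quadratic falloff is easy to verify from the exact binomial probabilities (my own check, not part of the original answer; the convention here is that $n/2+j$ of the $n$ steps are $+1$):

```python
from math import comb, log

# log P(j) - log P(0) behaves like -2 * j^2 / n, i.e. proportionally to
# -j^2 / n, where P(j) = C(n, n/2 + j) / 2^n.
n = 10_000
log_p0 = log(comb(n, n // 2)) - n * log(2)
for j in (10, 50, 100):
    log_pj = log(comb(n, n // 2 + j)) - n * log(2)
    print(j, log_pj - log_p0, -2 * j**2 / n)
```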
gui11aume
Please see my comments to previous answers by Michael C. and guy.
whuber
Seems like the first equation (LL CLT) s/b $\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^n X_i\right)-\mu\right)\ \xrightarrow{d}\ N(0,1)$ rather than $N(0,\sigma^2)$.
If you parametrize the Gaussian with the mean and the variance (not the standard deviation), then I believe the OP's formula is correct.
gui11aume
$\frac{\bar{X}-E(\bar{X})}{\sqrt{Var(\bar{X})}} = \sqrt{n}\,\frac{\bar{X}-\mu}{\sigma}\ \xrightarrow{d}\ N(0,1)$; if we multiply $\frac{\bar{X}-E(\bar{X})}{\sqrt{Var(\bar{X})}}$ by $\sigma$ we get what was shown by the OP (the $\sigma$'s cancel): namely $\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^n X_i\right)-\mu\right)$. But we know that $Var(aX)=a^2\,Var(X)$, where in this case $a=\sigma$ and $Var(X)$ is 1, so the distribution is $N(0,\sigma^2)$.
B_Miner
Gui, if not too late I wanted to make sure I had this correct. If we assume $\frac{\bar{X}-E(\bar{X})}{\sqrt{Var(\bar{X})}} = \sqrt{n}\,\frac{\bar{X}-\mu}{\sigma}\ \xrightarrow{d}\ N(0,1)$ and we multiply by a constant ($\sigma$), the expected value of this quantity (i.e. $\sqrt{n}(\bar{X}-\mu)$), which was zero, is still zero, as $E[aX]=a\,E[X] \Rightarrow \sigma\cdot 0=0$. Is this correct?
B_Miner

There is a nice theory of what kinds of distributions can be limiting distributions of sums of random variables. A nice resource is the following book by Petrov, which I personally enjoyed immensely.

It turns out that if you investigate limits of the type

$$\frac{1}{a_n}\sum_{i=1}^n X_i - b_n, \qquad (1)$$

where the $X_i$ are independent random variables, only certain distributions can appear as limits.

There is a lot of mathematics going on there, which boils down to several theorems that completely characterize what happens in the limit. One such theorem is due to Feller:

Theorem. Let $\{X_n;\, n=1,2,\ldots\}$ be a sequence of independent random variables, let $V_n(x)$ be the distribution function of $X_n$, and let $a_n$ be a sequence of positive constants. In order that

$$\max_{1\le k\le n} P(|X_k|\ge \varepsilon a_n)\to 0, \text{ for every fixed } \varepsilon>0$$

and

$$\sup_x\left|P\!\left(a_n^{-1}\sum_{k=1}^n X_k<x\right)-\Phi(x)\right|\to 0$$

it is necessary and sufficient that

$$\sum_{k=1}^n\int_{|x|\ge\varepsilon a_n}dV_k(x)\to 0 \text{ for every fixed } \varepsilon>0,$$

$$a_n^{-2}\sum_{k=1}^n\left(\int_{|x|<a_n}x^2\,dV_k(x)-\left(\int_{|x|<a_n}x\,dV_k(x)\right)^2\right)\to 1$$

and

$$a_n^{-1}\sum_{k=1}^n\int_{|x|<a_n}x\,dV_k(x)\to 0.$$

This theorem then gives you an idea of what $a_n$ should look like.

The general theory in the book is constructed in such a way that the norming constants are not restricted in any way, but the final theorems, which give necessary and sufficient conditions, do not leave any room for norming constants other than $\sqrt{n}$.
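As a concrete (and entirely my own) illustration of how the truncated-variance condition forces $a_n \sim \sqrt{n}$: for i.i.d. Uniform(−1, 1) variables ($\sigma^2 = 1/3$, and the centering integral vanishes by symmetry), the ratio tends to 1 only when $a_n = \sqrt{n}\,\sigma$.

```python
import numpy as np

# Monte Carlo version of a_n^{-2} * sum_k E[X_k^2 ; |X_k| < a_n] for i.i.d.
# Uniform(-1, 1) variables: the sum of truncated second moments is ~ n/3,
# so the ratio tends to 1 only for a_n = sqrt(n/3) = sqrt(n) * sigma.
rng = np.random.default_rng(2)
n = 1_000_000
x = rng.uniform(-1, 1, size=n)

ratios = {}
for label, a_n in [("n**0.25", n**0.25),
                   ("sqrt(n)*sigma", np.sqrt(n / 3)),
                   ("n", float(n))]:
    ratios[label] = (x[np.abs(x) < a_n] ** 2).sum() / a_n**2
    print(label, ratios[label])  # diverges / ~1 / ~0 respectively
```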

mpiktas

$s_n$ represents the sample standard deviation of the sample mean. $s_n^2$ is the sample variance of the sample mean and equals $S_n^2/n$, where $S_n^2$ is the sample estimate of the population variance. Since $s_n = S_n/\sqrt{n}$, that explains how $\sqrt{n}$ appears in the first formula. Note that there would be a $\sigma$ in the denominator if the limit were $N(0,1)$, but the limit is given as $N(0,\sigma^2)$. Since $S_n$ is a consistent estimator of $\sigma$, it is used in the second equation to take $\sigma$ out of the limit.
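A small simulation shows that substitution in action (my sketch; the Exponential(1) example and sample sizes are arbitrary): replacing $\sigma$ with the consistent estimate $S_n$ still leaves a statistic with mean about 0 and variance about 1.

```python
import numpy as np

# Studentized mean: sqrt(n) * (xbar - mu) / S_n with S_n the sample SD.
# For i.i.d. Exponential(1) data, mu = sigma = 1.
rng = np.random.default_rng(3)
n = 2_000
x = rng.exponential(scale=1.0, size=(5_000, n))

t = np.sqrt(n) * (x.mean(axis=1) - 1.0) / x.std(axis=1, ddof=1)
print(t.mean(), t.var())  # near 0 and 1
```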

Michael R. Chernick
What about the other (more basic and important) part of the question: why $s_n$ and not some other measure of dispersion?
whuber
@whuber That may be up for discussion but it was not part of the question. The OP just wanted to know why $s_n$ and $\sqrt{n}$ appear in the formula for the CLT. Of course $S_n$ is there because it is consistent for $\sigma$, and in that form of the CLT $\sigma$ is removed.
Michael R. Chernick
To me it's not at all clear that $s_n$ is present because it is "consistent for $\sigma$". Why wouldn't that also imply, say, that $s_n$ should be used to normalize extreme-value statistics (which would not work)? Am I missing something simple and self-evident? And, to echo the OP, why not use $\sqrt{s_n}$ -- after all, that is consistent for $\sqrt{\sigma}$!
whuber
The theorem as stated has convergence to N(0,1), so to accomplish that you either have to know σ and use it or use a consistent estimate of it which works by Slutsky's theorem I think. Was I that unclear?
Michael R. Chernick
I don't think you were unclear; I just think that an important point may be missing. After all, for many distributions we can obtain a limiting normal distribution by using the IQR instead of sn--but then the result is not as neat (the SD of the limiting distribution depends on the distribution we begin with). I'm just suggesting that this deserves to be called out and explained. It will not be quite as obvious to someone who does not have the intuition developed by 40 years of standardizing all the distributions they encounter!
whuber

Intuitively, if $Z_n\to N(0,\sigma^2)$ for some $\sigma^2$, we should expect that $Var(Z_n)$ is roughly equal to $\sigma^2$; it seems like a pretty reasonable expectation, though I don't think it is necessary in general. The reason for the $\sqrt{n}$ in the first expression is that the variance of $\bar{X}_n-\mu$ goes to 0 like $\frac{1}{n}$, and so the $\sqrt{n}$ is inflating the variance so that the expression just has variance equal to $\sigma^2$. In the second expression, the term $s_n$ is defined to be $\sqrt{\sum_{i=1}^n Var(X_i)}$, while the variance of the numerator grows like $\sum_{i=1}^n Var(X_i)$, so we again have that the variance of the whole expression is a constant (1 in this case).

Essentially, we know something "interesting" is happening with the distribution of $\bar{X}_n := \frac{1}{n}\sum_i X_i$, but if we don't properly center and scale it we won't be able to see it. I've heard this described sometimes as needing to adjust the microscope. If we don't blow up (e.g.) $\bar{X}_n-\mu$ by $\sqrt{n}$, then we just have $\bar{X}_n-\mu\to 0$ in distribution by the weak law; an interesting result in its own right but not as informative as the CLT. If we inflate by any factor $a_n$ which is dominated by $\sqrt{n}$, we still get $a_n(\bar{X}_n-\mu)\to 0$, while any factor $a_n$ which dominates $\sqrt{n}$ gives $a_n(\bar{X}_n-\mu)\to\infty$. It turns out $\sqrt{n}$ is just the right magnification to be able to see what is going on in this case (note: all convergence here is in distribution; there is another level of magnification which is interesting for almost sure convergence, which gives rise to the law of the iterated logarithm).
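The microscope metaphor can be made numeric. This is my own sketch (uniform variables, so $\mu = 1/2$ and $\sigma = 1/\sqrt{12}$): scaling $\bar{X}_n - \mu$ by $n^{0.25}$ makes the spread shrink, by $n^{0.75}$ makes it grow, and only $n^{0.5}$ holds it steady.

```python
import numpy as np

# Standard deviation of n**p * (xbar - mu) for growing n: it behaves like
# sigma * n**(p - 1/2), so p = 1/2 is the only scale that neither
# collapses to 0 nor diverges.
rng = np.random.default_rng(4)
results = {}
for n in (100, 10_000):
    xbar = rng.uniform(0, 1, size=(1_000, n)).mean(axis=1)
    results[n] = {p: (n**p * (xbar - 0.5)).std() for p in (0.25, 0.5, 0.75)}
    print(n, results[n])
```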

guy
A more fundamental question, which ought to be addressed first, is why the SD is used to measure dispersion. Why not the absolute central $k$th moment for some other value of $k$? Or why not the IQR or any of its relatives? Once that is answered, simple properties of covariance immediately give the $\sqrt{n}$ dependence (as @Gui11aume has recently explained).
whuber
@whuber I agree, which is why I presented this as heuristic. I'm not certain it is amenable to a simple explanation, though I'd love to hear one. For me I'm not sure that I have a simpler, explainable reason past "because the square term is the relevant term in the Taylor expansion of the characteristic function once you subtract off the mean."
guy