Where does the central limit theorem (CLT) come from?


A very simple version of the central limit theorem is the Lindeberg–Lévy CLT below:

$$\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^n X_i\right)-\mu\right)\ \xrightarrow{d}\ N(0,\sigma^2).$$

I do not understand why there is a $\sqrt{n}$ on the left-hand side. The Lyapunov CLT says

$$\frac{1}{s_n}\sum_{i=1}^{n}(X_i-\mu_i)\ \xrightarrow{d}\ N(0,1),$$

but why $s_n$ and not $\sqrt{s_n}$? Can anyone tell me what these factors, such as $\sqrt{n}$ and $\frac{1}{s_n}$, are, and how we get them in the theorem?
Flying pig
This is explained at stats.stackexchange.com/questions/3734. That answer is long, because it asks for "intuition." It concludes, "This simple approximation, however, suggests how de Moivre might originally have suspected that there is a universal limiting distribution, that its logarithm is a quadratic function, and that the proper scale factor $s_n$ must be proportional to $\sqrt{n}$...."
whuber
Intuitively, if all $\sigma_i=\sigma$ then $s_n=\sqrt{\sum \sigma_i^2}=\sqrt{n}\,\sigma$, and the 2nd line follows from the 1st:

$$\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^n X_i\right)-\mu\right)=\frac{1}{\sqrt{n}}\sum_{i=1}^n (X_i-\mu)\ \xrightarrow{d}\ N(0,\sigma^2)$$

divide by $\sigma = s_n/\sqrt{n}$ (of course the Lyapunov condition, which combines all the $\sigma_i$, is another question):

$$\frac{\frac{1}{\sqrt{n}}\sum_{i=1}^n (X_i-\mu)}{s_n/\sqrt{n}}=\frac{1}{s_n}\sum_{i=1}^n (X_i-\mu_i)\ \xrightarrow{d}\ N(0,1)$$
Sextus Empiricus

Answers:


Great question (+1)!!

You will remember that for independent random variables $X$ and $Y$, $Var(X+Y) = Var(X) + Var(Y)$ and $Var(aX) = a^2\,Var(X)$. So the variance of $\sum_{i=1}^n X_i$ is $\sum_{i=1}^n \sigma^2 = n\sigma^2$, and the variance of $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ is $n\sigma^2/n^2 = \sigma^2/n$.
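A quick simulation makes these two variance facts concrete (my own sketch, not part of the original answer; the Gaussian distribution and the constants are arbitrary choices):

```python
import numpy as np

# Illustrative check: with n i.i.d. draws of variance sigma^2, the sum has
# variance n * sigma^2 and the mean has variance sigma^2 / n.
rng = np.random.default_rng(0)
n, sigma = 50, 2.0
samples = rng.normal(loc=1.0, scale=sigma, size=(100_000, n))

var_sum = samples.sum(axis=1).var()    # ~ n * sigma^2 = 200
var_mean = samples.mean(axis=1).var()  # ~ sigma^2 / n = 0.08

print(var_sum, var_mean)
```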

This is for the variance. To standardize a random variable, you divide it by its standard deviation. As you know, the expected value of $\bar{X}$ is $\mu$, so the variable

$$\frac{\bar{X} - E(\bar{X})}{\sqrt{Var(\bar{X})}} = \sqrt{n}\,\frac{\bar{X} - \mu}{\sigma}$$

has expected value 0 and variance 1. So if it tends to a Gaussian, it has to be the standard Gaussian $N(0,1)$. Your formulation in the first equation is equivalent: by multiplying the left-hand side by $\sigma$ you set the variance to $\sigma^2$.
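This standardization can be watched numerically. The sketch below is my own illustration (the Exponential(1) example, where $\mu = \sigma = 1$, is an arbitrary choice): the standardized sample mean has moments close to those of $N(0,1)$.

```python
import numpy as np

# Standardize the sample mean of i.i.d. Exponential(1) variables
# (mu = 1, sigma = 1) and check the first two moments.
rng = np.random.default_rng(1)
n = 500
x = rng.exponential(scale=1.0, size=(20_000, n))

z = np.sqrt(n) * (x.mean(axis=1) - 1.0) / 1.0  # sqrt(n) * (xbar - mu) / sigma
print(z.mean(), z.var())  # near 0 and 1
```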

Regarding your second point, I believe that the equation shown above illustrates that you have to divide by the standard deviation, and not by its square root, to standardize, which explains why you use $s_n$ (an estimator of $\sigma$) and not $\sqrt{s_n}$.

Addendum: @whuber suggested discussing why the scaling is by $\sqrt{n}$. He does so there, but because his answer is very long I will try to capture the essence of his argument (which is a reconstruction of de Moivre's thinking).

If you add up a large number of +1's and −1's, you can approximate the probability that the sum will be $j$ by elementary counting. The log of this probability is proportional to $-j^2/n$. So if we want the probability above to converge to a constant as $n$ becomes large, we have to use a normalizing factor in $O(\sqrt{n})$.

Using modern (post-de Moivre) mathematical tools, you can see the approximation mentioned above by noting that the probability sought is

$$P(j)=\binom{n}{n/2+j}2^{-n}=\frac{n!}{2^n\,(n/2+j)!\,(n/2-j)!}$$

which we approximate with Stirling's formula:

$$P(j)\approx \frac{n^n\,e^{n/2+j}\,e^{n/2-j}}{2^n\,e^n\,(n/2+j)^{n/2+j}(n/2-j)^{n/2-j}} = \left(\frac{1}{1+2j/n}\right)^{n/2+j}\left(\frac{1}{1-2j/n}\right)^{n/2-j}.$$

$$\log P(j)\approx -(n/2+j)\log(1+2j/n)-(n/2-j)\log(1-2j/n)\approx -2j(n/2+j)/n+2j(n/2-j)/n \propto -j^2/n.$$
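The quadratic falloff is easy to verify from the exact binomial probabilities (my own check, not part of the original answer; the convention here is that $n/2+j$ of the $n$ steps are $+1$):

```python
from math import comb, log

# log P(j) - log P(0) behaves like -2 * j^2 / n, i.e. proportionally to
# -j^2 / n, where P(j) = C(n, n/2 + j) / 2^n.
n = 10_000
log_p0 = log(comb(n, n // 2)) - n * log(2)
for j in (10, 50, 100):
    log_pj = log(comb(n, n // 2 + j)) - n * log(2)
    print(j, log_pj - log_p0, -2 * j**2 / n)
```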
gui11aume
Please see my comments to previous answers by Michael C. and guy.
whuber
Seems like the first equation (LL CLT) s/b $\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^n X_i\right)-\mu\right)\ \xrightarrow{d}\ N(0,1)$ rather than $N(0,\sigma^2)$.
If you parametrize the Gaussian with the mean and the variance (not the standard deviation), then I believe the OP's formula is correct.
gui11aume
$\frac{\bar{X}-E(\bar{X})}{\sqrt{Var(\bar{X})}} = \sqrt{n}\,\frac{\bar{X}-\mu}{\sigma}\ \xrightarrow{d}\ N(0,1)$; if we multiply $\frac{\bar{X}-E(\bar{X})}{\sqrt{Var(\bar{X})}}$ by $\sigma$ we get what was shown by the OP (the $\sigma$'s cancel): namely $\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^n X_i\right)-\mu\right)$. But we know that $Var(aX)=a^2\,Var(X)$, where in this case $a=\sigma$ and $Var(X)$ is 1, so the distribution is $N(0,\sigma^2)$.
B_Miner
Gui, if not too late I wanted to make sure I had this correct. If we assume $\frac{\bar{X}-E(\bar{X})}{\sqrt{Var(\bar{X})}} = \sqrt{n}\,\frac{\bar{X}-\mu}{\sigma}\ \xrightarrow{d}\ N(0,1)$ and we multiply by a constant ($\sigma$), the expected value of this quantity (i.e. $\sqrt{n}(\bar{X}-\mu)$), which was zero, is still zero, as $E[aX]=a\,E[X] \Rightarrow \sigma\cdot 0=0$. Is this correct?
B_Miner

There is a nice theory of what kinds of distributions can be limiting distributions of sums of random variables. A nice resource is the following book by Petrov, which I personally enjoyed immensely.

It turns out that if you investigate limits of the type

$$\frac{1}{a_n}\sum_{i=1}^n X_i - b_n, \qquad (1)$$

where the $X_i$ are independent random variables, only certain distributions can appear as limits.

There is a lot of mathematics going on there, which boils down to several theorems that completely characterize what happens in the limit. One such theorem is due to Feller:

Theorem. Let $\{X_n;\, n=1,2,\ldots\}$ be a sequence of independent random variables, let $V_n(x)$ be the distribution function of $X_n$, and let $a_n$ be a sequence of positive constants. In order that

$$\max_{1\le k\le n} P(|X_k|\ge \varepsilon a_n)\to 0, \text{ for every fixed } \varepsilon>0$$

and

$$\sup_x\left|P\!\left(a_n^{-1}\sum_{k=1}^n X_k<x\right)-\Phi(x)\right|\to 0$$

it is necessary and sufficient that

$$\sum_{k=1}^n\int_{|x|\ge\varepsilon a_n}dV_k(x)\to 0 \text{ for every fixed } \varepsilon>0,$$

$$a_n^{-2}\sum_{k=1}^n\left(\int_{|x|<a_n}x^2\,dV_k(x)-\left(\int_{|x|<a_n}x\,dV_k(x)\right)^2\right)\to 1$$

and

$$a_n^{-1}\sum_{k=1}^n\int_{|x|<a_n}x\,dV_k(x)\to 0.$$

This theorem then gives you an idea of what $a_n$ should look like.

The general theory in the book is constructed in such a way that the norming constants are not restricted in any way, but the final theorems, which give necessary and sufficient conditions, do not leave any room for norming constants other than $\sqrt{n}$.
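As a concrete (and entirely my own) illustration of how the truncated-variance condition forces $a_n \sim \sqrt{n}$: for i.i.d. Uniform(−1, 1) variables ($\sigma^2 = 1/3$, and the centering integral vanishes by symmetry), the ratio tends to 1 only when $a_n = \sqrt{n}\,\sigma$.

```python
import numpy as np

# Monte Carlo version of a_n^{-2} * sum_k E[X_k^2 ; |X_k| < a_n] for i.i.d.
# Uniform(-1, 1) variables: the sum of truncated second moments is ~ n/3,
# so the ratio tends to 1 only for a_n = sqrt(n/3) = sqrt(n) * sigma.
rng = np.random.default_rng(2)
n = 1_000_000
x = rng.uniform(-1, 1, size=n)

ratios = {}
for label, a_n in [("n**0.25", n**0.25),
                   ("sqrt(n)*sigma", np.sqrt(n / 3)),
                   ("n", float(n))]:
    ratios[label] = (x[np.abs(x) < a_n] ** 2).sum() / a_n**2
    print(label, ratios[label])  # diverges / ~1 / ~0 respectively
```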

mpiktas

$s_n$ represents the sample standard deviation of the sample mean. $s_n^2$ is the sample variance of the sample mean and equals $S_n^2/n$, where $S_n^2$ is the sample estimate of the population variance. Since $s_n = S_n/\sqrt{n}$, that explains how $\sqrt{n}$ appears in the first formula. Note that there would be a $\sigma$ in the denominator if the limit were $N(0,1)$, but the limit is given as $N(0,\sigma^2)$. Since $S_n$ is a consistent estimator of $\sigma$, it is used in the second equation to take $\sigma$ out of the limit.
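A small simulation shows that substitution in action (my sketch; the Exponential(1) example and sample sizes are arbitrary): replacing $\sigma$ with the consistent estimate $S_n$ still leaves a statistic with mean about 0 and variance about 1.

```python
import numpy as np

# Studentized mean: sqrt(n) * (xbar - mu) / S_n with S_n the sample SD.
# For i.i.d. Exponential(1) data, mu = sigma = 1.
rng = np.random.default_rng(3)
n = 2_000
x = rng.exponential(scale=1.0, size=(5_000, n))

t = np.sqrt(n) * (x.mean(axis=1) - 1.0) / x.std(axis=1, ddof=1)
print(t.mean(), t.var())  # near 0 and 1
```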

Michael R. Chernick
What about the other (more basic and important) part of the question: why $s_n$ and not some other measure of dispersion?
whuber
@whuber That may be up for discussion but it was not part of the question. The OP just wanted to know why $s_n$ and $\sqrt{n}$ appear in the formula for the CLT. Of course $S_n$ is there because it is consistent for $\sigma$, and in that form of the CLT $\sigma$ is removed.
Michael R. Chernick
To me it's not at all clear that $s_n$ is present because it is "consistent for $\sigma$". Why wouldn't that also imply, say, that $s_n$ should be used to normalize extreme-value statistics (which would not work)? Am I missing something simple and self-evident? And, to echo the OP, why not use $\sqrt{s_n}$ -- after all, that is consistent for $\sqrt{\sigma}$!
whuber
The theorem as stated has convergence to N(0,1), so to accomplish that you either have to know σ and use it or use a consistent estimate of it which works by Slutsky's theorem I think. Was I that unclear?
Michael R. Chernick
I don't think you were unclear; I just think that an important point may be missing. After all, for many distributions we can obtain a limiting normal distribution by using the IQR instead of sn--but then the result is not as neat (the SD of the limiting distribution depends on the distribution we begin with). I'm just suggesting that this deserves to be called out and explained. It will not be quite as obvious to someone who does not have the intuition developed by 40 years of standardizing all the distributions they encounter!
whuber

Intuitively, if $Z_n\to N(0,\sigma^2)$ for some $\sigma^2$, we should expect that $Var(Z_n)$ is roughly equal to $\sigma^2$; it seems like a pretty reasonable expectation, though I don't think it is necessary in general. The reason for the $\sqrt{n}$ in the first expression is that the variance of $\bar{X}_n-\mu$ goes to 0 like $\frac{1}{n}$, and so the $\sqrt{n}$ is inflating the variance so that the expression just has variance equal to $\sigma^2$. In the second expression, the term $s_n$ is defined to be $\sqrt{\sum_{i=1}^n Var(X_i)}$, while the variance of the numerator grows like $\sum_{i=1}^n Var(X_i)$, so we again have that the variance of the whole expression is a constant (1 in this case).

Essentially, we know something "interesting" is happening with the distribution of $\bar{X}_n := \frac{1}{n}\sum_i X_i$, but if we don't properly center and scale it we won't be able to see it. I've heard this described sometimes as needing to adjust the microscope. If we don't blow up (e.g.) $\bar{X}_n-\mu$ by $\sqrt{n}$, then we just have $\bar{X}_n-\mu\to 0$ in distribution by the weak law; an interesting result in its own right but not as informative as the CLT. If we inflate by any factor $a_n$ which is dominated by $\sqrt{n}$, we still get $a_n(\bar{X}_n-\mu)\to 0$, while any factor $a_n$ which dominates $\sqrt{n}$ gives $a_n(\bar{X}_n-\mu)\to\infty$. It turns out $\sqrt{n}$ is just the right magnification to be able to see what is going on in this case (note: all convergence here is in distribution; there is another level of magnification which is interesting for almost sure convergence, which gives rise to the law of the iterated logarithm).
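The microscope metaphor can be made numeric. This is my own sketch (uniform variables, so $\mu = 1/2$ and $\sigma = 1/\sqrt{12}$): scaling $\bar{X}_n - \mu$ by $n^{0.25}$ makes the spread shrink, by $n^{0.75}$ makes it grow, and only $n^{0.5}$ holds it steady.

```python
import numpy as np

# Standard deviation of n**p * (xbar - mu) for growing n: it behaves like
# sigma * n**(p - 1/2), so p = 1/2 is the only scale that neither
# collapses to 0 nor diverges.
rng = np.random.default_rng(4)
results = {}
for n in (100, 10_000):
    xbar = rng.uniform(0, 1, size=(1_000, n)).mean(axis=1)
    results[n] = {p: (n**p * (xbar - 0.5)).std() for p in (0.25, 0.5, 0.75)}
    print(n, results[n])
```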

guy
A more fundamental question, which ought to be addressed first, is why the SD is used to measure dispersion. Why not the absolute central $k$th moment for some other value of $k$? Or why not the IQR or any of its relatives? Once that is answered, simple properties of covariance immediately give the $\sqrt{n}$ dependence (as @Gui11aume has recently explained).
whuber
@whuber I agree, which is why I presented this as heuristic. I'm not certain it is amenable to a simple explanation, though I'd love to hear one. For me I'm not sure that I have a simpler, explainable reason past "because the square term is the relevant term in the Taylor expansion of the characteristic function once you subtract off the mean."
guy