Asymptotic distribution of the sample variance of a non-normal sample


This is a more general treatment of the issue posed by this question. After deriving the asymptotic distribution of the sample variance, we can apply the Delta method to arrive at the corresponding distribution for the standard deviation.

Let a sample of size $n$ of i.i.d. non-normal random variables $\{X_i\},\; i=1,\dots,n$, with mean $\mu$ and variance $\sigma^2$. Set the sample mean and the sample variance as

$$\bar x=\frac 1n\sum_{i=1}^n X_i,\qquad s^2=\frac 1{n-1}\sum_{i=1}^n (X_i-\bar x)^2$$

We know that

$$E(s^2)=\sigma^2,\qquad \operatorname{Var}(s^2)=\frac 1n\left(\mu_4-\frac{n-3}{n-1}\sigma^4\right)$$

where $\mu_4=E(X_i-\mu)^4$, and we restrict our attention to distributions for which the moments that need to exist and be finite do exist and are finite.
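The variance formula above can be checked numerically. The following is a minimal sketch, assuming Exp(1) data (my choice for illustration, not part of the question), for which $\sigma^2=1$ and $\mu_4=9$; the function names are hypothetical.

```python
import numpy as np

def var_of_sample_variance(n, mu4, sigma2):
    """Var(s^2) for an iid sample of size n: (mu4 - (n-3)/(n-1) * sigma^4) / n."""
    return (mu4 - (n - 3) / (n - 1) * sigma2**2) / n

def simulate_sample_variances(n, reps, rng):
    """Draw `reps` samples of size n from Exp(1); return their unbiased sample variances."""
    x = rng.exponential(scale=1.0, size=(reps, n))
    return x.var(axis=1, ddof=1)  # ddof=1 gives the 1/(n-1) estimator

rng = np.random.default_rng(0)
n, reps = 20, 200_000
s2 = simulate_sample_variances(n, reps, rng)

# Assumption: Exp(1), so sigma^2 = 1 and mu4 = E(X - 1)^4 = 9.
theory = var_of_sample_variance(n, mu4=9.0, sigma2=1.0)
print("mean of s^2:       ", s2.mean())
print("simulated Var(s^2):", s2.var())
print("formula Var(s^2):  ", theory)
```

The simulated mean of $s^2$ should sit near $\sigma^2$, and the simulated variance near the value given by the formula, up to Monte Carlo error.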

Does it hold that

$$\sqrt n\,(s^2-\sigma^2)\xrightarrow{d} N\left(0,\ \mu_4-\sigma^4\right)\,?$$
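For reference, this is how the Delta method step mentioned at the top would go if the conjectured limit holds. It is a sketch under the assumptions $\sigma^2>0$ and $\mu_4>\sigma^4$, using $g(x)=\sqrt{x}$, so that $g'(\sigma^2)=1/(2\sigma)$:

```latex
\sqrt{n}\,\bigl(g(s^2)-g(\sigma^2)\bigr)
  \xrightarrow{d} N\!\left(0,\ \bigl[g'(\sigma^2)\bigr]^2(\mu_4-\sigma^4)\right)
\quad\Longrightarrow\quad
\sqrt{n}\,(s-\sigma)
  \xrightarrow{d} N\!\left(0,\ \frac{\mu_4-\sigma^4}{4\sigma^2}\right)
```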
Alecos Papadopoulos
Heh. I just posted in the other thread, not realizing you had posted this. There are a number of things to be found on the CLT applied to the variance (such as p3-4 here for example). Nice answer btw.
Glen_b -Reinstate Monica
Thanks. Yes, I had found this. But they miss the case @whuber pointed out. They even provide a Bernoulli example with general $p$! (bottom of p. 4). I am extending my answer to cover the $p=1/2$ case also.
Alecos Papadopoulos
Yes, I saw that they consider the Bernoulli but did not consider that special case. I think the mention of the difference for the scaled Bernoulli (the equal-probability dichotomous case) is one of the reasons (among several others) why it is valuable to discuss it in an answer here rather than just in comments, not least because it can then be searched for.
Glen_b -Reinstate Monica

Answers:


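To side-step the dependencies that arise when we consider the sample variance, we write

$$(n-1)s^2=\sum_{i=1}^n\Big((X_i-\mu)-(\bar x-\mu)\Big)^2$$

$$=\sum_{i=1}^n(X_i-\mu)^2-2\sum_{i=1}^n\Big((X_i-\mu)(\bar x-\mu)\Big)+\sum_{i=1}^n(\bar x-\mu)^2$$

and after a little manipulation, using $\sum_{i=1}^n(X_i-\mu)=n(\bar x-\mu)$,

$$=\sum_{i=1}^n(X_i-\mu)^2-n(\bar x-\mu)^2$$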

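Therefore

$$\sqrt n(s^2-\sigma^2)=\frac{\sqrt n}{n-1}\sum_{i=1}^n(X_i-\mu)^2-\sqrt n\sigma^2-\frac{\sqrt n}{n-1}n(\bar x-\mu)^2$$

Manipulating,

$$\sqrt n(s^2-\sigma^2)=\frac{\sqrt n}{n-1}\sum_{i=1}^n(X_i-\mu)^2-\sqrt n\,\frac{n-1}{n-1}\sigma^2-\frac{\sqrt n}{n-1}n(\bar x-\mu)^2$$

$$=\frac{n\sqrt n}{n-1}\frac 1n\sum_{i=1}^n(X_i-\mu)^2-\sqrt n\,\frac{n-1}{n-1}\sigma^2-\frac{\sqrt n}{n-1}n(\bar x-\mu)^2$$

$$=\frac n{n-1}\left[\sqrt n\left(\frac 1n\sum_{i=1}^n(X_i-\mu)^2-\sigma^2\right)\right]+\frac{\sqrt n}{n-1}\sigma^2-\frac n{n-1}\sqrt n(\bar x-\mu)^2$$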

The term $n/(n-1)$ becomes unity asymptotically. The term $\frac{\sqrt n}{n-1}\sigma^2$ is deterministic and goes to zero as $n\to\infty$.

We also have $\sqrt n(\bar x-\mu)^2=\left[\sqrt n(\bar x-\mu)\right]\cdot(\bar x-\mu)$. The first component converges in distribution to a Normal, the second converges in probability to zero. Then by Slutsky's theorem the product converges in probability to zero,

$$\sqrt n(\bar x-\mu)^2\xrightarrow{p}0$$
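A quick numerical illustration of this vanishing term, as a sketch only: it assumes Exp(1) data (so $\mu=\sigma^2=1$, my choice for illustration) and uses the fact that $E\left[\sqrt n(\bar x-\mu)^2\right]=\sigma^2/\sqrt n$. The helper name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 1.0  # assumption: Exp(1) data, so the mean and the variance both equal 1

def mean_remainder(n, reps=4_000):
    """Monte Carlo average of sqrt(n) * (xbar - mu)^2 over `reps` samples of size n."""
    x = rng.exponential(scale=1.0, size=(reps, n))
    xbar = x.mean(axis=1)
    return float(np.mean(np.sqrt(n) * (xbar - mu) ** 2))

remainders = {n: mean_remainder(n) for n in (10, 100, 1_000)}
for n, value in remainders.items():
    # The expectation is sigma^2 / sqrt(n), which shrinks to zero as n grows.
    print(n, value, 1.0 / np.sqrt(n))
```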

We are left with the term

$$\left[\sqrt n\left(\frac 1n\sum_{i=1}^n(X_i-\mu)^2-\sigma^2\right)\right]$$

Alerted by a lethal example offered by @whuber in a comment to this answer, we want to make certain that $(X_i-\mu)^2$ is not constant. Whuber pointed out that if $X_i$ is a Bernoulli$(1/2)$ then this quantity is a constant. So, excluding variables for which this happens (perhaps other dichotomous ones, not just $0/1$ binary?), for the rest we have

$$E(X_i-\mu)^2=\sigma^2,\qquad \operatorname{Var}\left[(X_i-\mu)^2\right]=\mu_4-\sigma^4$$

and so the term under investigation is a usual subject matter of the classical Central Limit Theorem, and

$$\sqrt n\,(s^2-\sigma^2)\xrightarrow{d} N\left(0,\ \mu_4-\sigma^4\right)$$

Note: the above result of course also holds for normally distributed samples, but in that case we also have a finite-sample chi-square distributional result available.
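The limiting variance can be checked by simulation. This is a minimal sketch, assuming Exp(1) data (an illustrative choice, not from the answer), for which $\sigma^2=1$ and $\mu_4-\sigma^4=8$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 500, 10_000

# Assumption: Exp(1) samples, so sigma^2 = 1 and mu4 = 9.
x = rng.exponential(scale=1.0, size=(reps, n))
s2 = x.var(axis=1, ddof=1)       # unbiased sample variance of each sample
z = np.sqrt(n) * (s2 - 1.0)      # sqrt(n) * (s^2 - sigma^2)

limit_variance = 9.0 - 1.0**2    # mu4 - sigma^4
print("mean of z:    ", z.mean())
print("variance of z:", z.var(), "vs. limit", limit_variance)
```

At moderate $n$ the simulated distribution is still noticeably right-skewed, so only the first two moments are compared here.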

Alecos Papadopoulos
+1 There's no reason to check general dichotomous distributions because they are all scale and location versions of the Bernoulli: the analysis for the Bernoulli suffices. My simulations (out to sample sizes of 101000) confirm the $\chi^2_1$ result.
whuber
@whuber Thanks for checking. You're right of course about the Bernoulli being the mother of them all.
Alecos Papadopoulos
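The degenerate Bernoulli$(1/2)$ case discussed above can be illustrated numerically. This is a sketch with hypothetical helper names. It uses two facts: for $0/1$ data $s^2=\frac{n}{n-1}\hat p(1-\hat p)$, and for a Bernoulli$(p)$ variable $\mu_4-\sigma^4=p(1-p)(1-2p)^2$, which vanishes at $p=1/2$. The comparison value $p=0.3$ is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)

def bernoulli_moment_gap(p):
    """mu4 - sigma^4 for Bernoulli(p), which simplifies to p(1-p)(1-2p)^2."""
    return p * (1 - p) * (1 - 2 * p) ** 2

def scaled_stat(p, n, reps):
    """Simulate sqrt(n) * (s^2 - sigma^2) for Bernoulli(p) samples of size n."""
    k = rng.binomial(n, p, size=reps)        # number of ones in each sample
    phat = k / n
    s2 = n / (n - 1) * phat * (1 - phat)     # unbiased sample variance of 0/1 data
    return np.sqrt(n) * (s2 - p * (1 - p))

# With p = 1/2, (X_i - mu)^2 equals 1/4 for every observation.
x = rng.integers(0, 2, size=1_000)
squared_dev_is_constant = bool(np.all((x - 0.5) ** 2 == 0.25))

n, reps = 10_000, 50_000
z_half = scaled_stat(0.5, n, reps)    # degenerate case
z_other = scaled_stat(0.3, n, reps)   # non-degenerate comparison
print("constant squared deviation:", squared_dev_is_constant)
print("p=0.5 variance:", z_half.var(), "limit:", bernoulli_moment_gap(0.5))
print("p=0.3 variance:", z_other.var(), "limit:", bernoulli_moment_gap(0.3))
```

For $p=1/2$ the $\sqrt n$-scaled statistic collapses toward zero, so a different scaling is needed to obtain a non-degenerate limit in that case.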

You already have a detailed answer to your question but let me offer another one to go with it. Actually, a shorter proof is possible based on the fact that the distribution of

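$$S^2=\frac{1}{n-1}\sum_{i=1}^n\left(X_i-\bar X\right)^2$$

does not depend on $E(X)=\xi$, say. Asymptotically, it also does not matter whether we change the factor $\frac{1}{n-1}$ to $\frac 1n$, which I will do for convenience. We then have

$$\sqrt n\left(S^2-\sigma^2\right)=\sqrt n\left[\frac 1n\sum_{i=1}^n X_i^2-\bar X^2-\sigma^2\right]$$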

And now we assume without loss of generality that $\xi=0$ and we notice that

$$\sqrt n\,\bar X^2=\frac{1}{\sqrt n}\left(\sqrt n\,\bar X\right)^2$$

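has probability limit zero, since the second term is bounded in probability (by the CLT and the continuous mapping theorem), i.e. it is $O_p(1)$. The asymptotic result now follows from Slutsky's theorem and the CLT, since

$$\sqrt n\left[\frac 1n\sum X_i^2-\sigma^2\right]\xrightarrow{D} N\left(0,\tau^2\right)$$

where $\tau^2=\operatorname{Var}\left\{X^2\right\}=E\left(X^4\right)-\left(E\left(X^2\right)\right)^2$. And that will do it.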
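A small numerical check of the two ingredients used here: that $S^2$ is unchanged by a location shift, and that the limiting variance is $\tau^2$. This is a sketch assuming Uniform$(-1,1)$ data (my choice for illustration), for which $\xi=0$, $\sigma^2=1/3$, $E(X^4)=1/5$ and so $\tau^2=4/45$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 400, 10_000

# Assumption: Uniform(-1, 1) data, so xi = 0, sigma^2 = 1/3, E(X^4) = 1/5.
x = rng.uniform(-1.0, 1.0, size=(reps, n))
s2 = x.var(axis=1, ddof=0)                 # the 1/n version used in this answer

# Location invariance: shifting every observation by 5 leaves S^2 unchanged.
s2_shifted = (x[:100] + 5.0).var(axis=1, ddof=0)
max_shift_diff = float(np.max(np.abs(s2[:100] - s2_shifted)))

sigma2 = 1.0 / 3.0
tau2 = 1.0 / 5.0 - sigma2**2               # E(X^4) - (E(X^2))^2 = 4/45
z = np.sqrt(n) * (s2 - sigma2)
print("max difference after shift:", max_shift_diff)
print("variance of z:", z.var(), "vs. tau^2 =", tau2)
```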

JohnK
This is certainly more economical. But please reconsider how innocuous the $E(X)=0$ assumption is. For example, it excludes the case of a Bernoulli ($p=1/2$) sample, and as I mention at the end of my answer, for such a sample, this asymptotic result does not hold.
Alecos Papadopoulos
@AlecosPapadopoulos Indeed but the data can always be centered, right? I mean
$$\sum_{i=1}^n\left(X_i-\mu-(\bar X-\mu)\right)^2=\sum_{i=1}^n\left(X_i-\bar X\right)^2$$
and we can work with these variables. For the Bernoulli case, is there something stopping us from doing so?
JohnK
@AlecosPapadopoulos Oh yeah, I see the problem.
JohnK
I have written a small piece on the matter, I think it is time to upload it in my blog. I will notify you in case you are interested to read it. The asymptotic distribution of the sample variance in this case is interesting, and even more the asymptotic distribution of the sample standard deviation. These results hold for any $p=1/2$ dichotomous random variable.
Alecos Papadopoulos
Dumb question, but how can we assume that $S^2$ is ancillary if the $X_i$ are not normal? Or is $S^2$ always ancillary (w.r.t. mean parametrization I guess) but only independent of the sample mean when the sample mean is a complete sufficient statistic (i.e. normally distributed) by Basu's theorem?
Chill2Macht

The excellent answers by Alecos and JohnK already derive the result you are after, but I would like to note something else about the asymptotic distribution of the sample variance.

It is common to see asymptotic results presented using the normal distribution, and this is useful for stating the theorems. However, practically speaking, the purpose of an asymptotic distribution for a sample statistic is that it allows you to obtain an approximate distribution when $n$ is large. There are lots of choices you could make for your large-sample approximation, since many distributions have the same asymptotic form. In the case of the sample variance, it is my view that an excellent approximating distribution for large $n$ is given by:

$$\frac{S_n^2}{\sigma^2}\sim\frac{\text{Chi-Sq}(\text{df}=DF_n)}{DF_n},$$

where $DF_n\equiv 2/\mathbb V(S_n^2/\sigma^2)=2n/(\kappa-(n-3)/(n-1))$ and $\kappa=\mu_4/\sigma^4$ is the kurtosis parameter. This distribution is asymptotically equivalent to the normal approximation derived from the theorem (the chi-squared distribution converges to normal as the degrees-of-freedom tends to infinity). Despite this equivalence, this approximation has various other properties you would like your approximating distribution to have:

  • Unlike the normal approximation derived directly from the theorem, this distribution has the correct support for the statistic of interest. The sample variance is non-negative, and this distribution has non-negative support.

  • In the case where the underlying values are normally distributed, this approximation is actually the exact sampling distribution. (In this case we have $\kappa=3$ which gives $DF_n=n-1$, which is the standard form used in most texts.) It therefore constitutes a result that is exact in an important special case, while still being a reasonable approximation in more general cases.
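The degrees-of-freedom formula and the resulting approximation can be sketched in code. The function name `df_n` is hypothetical, and the Exp(1) example ($\sigma^2=1$, $\kappa=9$) is an assumption chosen for illustration. In practice $\kappa$ would have to be estimated from the data.

```python
import numpy as np

def df_n(n, kappa):
    """Degrees of freedom DF_n = 2n / (kappa - (n-3)/(n-1)) of the scaled chi-squared approximation."""
    return 2.0 * n / (kappa - (n - 3) / (n - 1))

# Normal data have kappa = 3, which recovers the textbook value n - 1.
print(df_n(10, 3.0))

rng = np.random.default_rng(5)
n, reps, kappa = 50, 20_000, 9.0   # assumption: Exp(1) data, sigma^2 = 1, kappa = 9

x = rng.exponential(scale=1.0, size=(reps, n))
ratio = x.var(axis=1, ddof=1)      # simulated S_n^2 / sigma^2 (sigma^2 = 1 here)

df = df_n(n, kappa)
approx = rng.chisquare(df, size=reps) / df   # draws from Chi-Sq(DF_n) / DF_n

qs = [0.05, 0.5, 0.95]
print("simulated quantiles:  ", np.quantile(ratio, qs))
print("chi-squared quantiles:", np.quantile(approx, qs))
print("variances:", ratio.var(), approx.var(), 2.0 / df)
```

By construction the approximation matches the mean and variance of $S_n^2/\sigma^2$. The quantile comparison gives a rough sense of how well the shape matches for a skewed parent distribution.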


Derivation of the above result: Approximate distributional results for the sample mean and variance are discussed at length in O'Neill (2014), and this paper provides derivations of many results, including the present approximating distribution.

This derivation starts from the limiting result in the question:

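$$\sqrt n\left(S_n^2-\sigma^2\right)\sim N\left(0,\sigma^4(\kappa-1)\right).$$

Re-arranging this result we obtain the approximation:

$$\frac{S_n^2}{\sigma^2}\sim N\left(1,\frac{\kappa-1}{n}\right).$$

Since the chi-squared distribution is asymptotically normal, as $DF\rightarrow\infty$ we have:

$$\frac{\text{Chi-Sq}(DF)}{DF}\rightarrow\frac{1}{DF}N(DF,2DF)=N\left(1,\frac{2}{DF}\right).$$

Taking $DF_n\equiv 2/\mathbb V(S_n^2/\sigma^2)$ (which yields the above formula) gives $DF_n\rightarrow 2n/(\kappa-1)$ which ensures that the chi-squared distribution is asymptotically equivalent to the normal approximation from the limiting theorem.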

Reinstate Monica
One empirically interesting question is which of these two asymptotic results works better in finite samples under various underlying data distributions.
lzstat
Yes, I think that would be a very interesting (and publishable) simulation study. Since the present formula is based on kurtosis-correction of the variance of the sample variance, I would expect that the present result would work best when you have an underlying distribution with a kurtosis parameter that is far from mesokurtic (i.e., when the kurtosis-correction matters most). Since the kurtosis would need to be estimated from the sample, it is an open question as to when there would be a substantial improvement in overall performance.
Reinstate Monica