Approximation error of the confidence interval for the mean when $n \geq 30$

15

Let $\{X_i\}_{i=1}^n$ be a family of iid random variables taking values in $[0,1]$, with mean $\mu$ and variance $\sigma^2$. A simple confidence interval for the mean, using $\sigma$ whenever it is known, is given by

$$P(|\bar{X} - \mu| > \varepsilon) \leq \frac{\sigma^2}{n\varepsilon^2} \leq \frac{1}{n\varepsilon^2}. \qquad (1)$$

Also, because $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is asymptotically distributed as a standard normal random variable, the normal distribution is sometimes used to "build" an approximate confidence interval.
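For a quick numerical sense of the two recipes (a sketch of mine, with illustrative values $n = 30$, a 95% level, and the crude bound $\sigma \leq 1$ from (1)):

alpha <- 0.05; n <- 30
eps_chebyshev <- 1 / sqrt(alpha * n)           # solve 1/(n * eps^2) = alpha in (1)
eps_normal <- qnorm(1 - alpha / 2) / sqrt(n)   # CLT recipe with sigma <= 1
c(eps_chebyshev, eps_normal)                   # ~0.816 vs ~0.358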


On multiple-choice statistics exams, I had to use this approximation instead of $(1)$ whenever $n \geq 30$. I have always felt very uncomfortable with this (more than you can imagine), since the approximation error is not quantified.


  • Why use the normal approximation rather than $(1)$?

  • I do not want, ever again, to apply the $n \geq 30$ rule blindly. Are there good references that can support me in refusing to do so, and that provide appropriate alternatives? ($(1)$ is an example of what I consider an appropriate alternative.)

Here, while $\sigma$ and $E[|X|^3]$ are unknown, they are easily bounded.

Please note that my question is primarily a reference request about confidence intervals, and is therefore different from the questions suggested as partial duplicates here and here. It is not answered there.

Olivier
source
2
You should probably improve the estimates found in classical references, and exploit the fact that $X_i$ lies in $(0,1)$, which, as you noted, gives information about the moments. The magic tool, I believe, will be the Berry–Esseen theorem!
Yves
1
with that constraint, the variance cannot be larger than 0.25, much better than 1, isn't it?
carlo

Answers:

3

Why use the normal approximation?

It is as simple as saying that it is always better to use more information than less. Equation (1) uses Chebyshev's theorem. Note how it does not use any information about the shape of your distribution, i.e. it works for any distribution with the given variance. Hence, if you use some information about the shape of your distribution, you should get a better estimate. If you know that your distribution is Gaussian, then by using this knowledge you get a better estimate.

Since you are already applying the central limit theorem, why not use the Gaussian approximation for the bounds? They will be better, in fact tighter (or sharper), because these estimates are based on knowledge of the shape, which is additional information.
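To make "tighter" concrete, here is a small comparison (my illustration, not from the answer): Chebyshev's distribution-free bound versus the exact Gaussian tail probability at $k$ standard deviations.

k <- 1:3
chebyshev <- 1 / k^2        # distribution-free upper bound on P(|X - mu| >= k*sigma)
gaussian <- 2 * pnorm(-k)   # exact tail probability when X is Gaussian
round(rbind(k, chebyshev, gaussian), 4)   # 1.0000/0.3173, 0.2500/0.0455, 0.1111/0.0027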

The rule of thumb of 30 is a myth that benefits from confirmation bias. It keeps being copied from one book to another. Once I found a reference suggesting this rule in a paper from the 1950s. As far as I remember, it was not solid proof; it was some kind of empirical study. Basically, the only reason it is used is because it sort of works: you don't often see it violated.

UPDATE: Look for the paper by Zachary R. Smith and Craig S. Wells, "Central Limit Theorem and Sample Size". They present an empirical study of convergence to the CLT for different kinds of distributions. The magic number 30 does not work in many cases, of course.

Aksakal
source
+1 for a sensible explanation. But isn't there a risk of using information that is not correct? The CLT says nothing about the distribution of $\bar{X}$ for fixed $n$.
Olivier
True, the CLT says nothing about the finite-sample distribution, and neither do asymptotic equations. But they undeniably carry useful information, which is why limiting relations are used everywhere. The problem with Chebyshev's is that it is so loose that it is rarely used outside the classroom. For example, at one standard deviation the bound it gives is $1/k^2 = 1$: hardly practical information.
Aksakal
Yet for $X$ taking the values 0 or 1 with equal probability, your application of Chebyshev is sharp. ;) The problem is that Chebyshev, applied to a sample mean, can never remain sharp as $n$ grows.
Olivier
I didn't know about the Smith and Wells paper; I tried to reproduce it in R and could not recover their conclusions...
Alex Nelson
9

The problem with using Chebyshev's inequality to obtain an interval for the true value is that it only gives you a lower bound for the probability, which moreover is sometimes trivial or, when non-trivial, may give a very wide confidence interval. We have

$$P(|\bar{X} - \mu| > \varepsilon) = 1 - P(\bar{X} - \varepsilon \leq \mu \leq \bar{X} + \varepsilon)$$

$$\Longrightarrow P(\bar{X} - \varepsilon \leq \mu \leq \bar{X} + \varepsilon) \geq 1 - \frac{1}{n\varepsilon^2}$$

We see that, depending also on the sample size, if we decrease $\varepsilon$ "too much" we get the trivial answer "the probability is greater than zero".

Moreover, what we get from this approach is a conclusion of the form "the probability that $\mu$ falls in $[\bar{X} \pm \varepsilon]$ is equal to or greater than..."

But let's assume that we are fine with this, and let $p_{\min}$ denote the minimum probability with which we are comfortable. Then we want

$$1 - \frac{1}{n\varepsilon^2} = p_{\min} \;\Longrightarrow\; \varepsilon = \frac{1}{\sqrt{(1 - p_{\min})\,n}}$$

With small sample sizes and a high desired minimum probability, this may give an unsatisfactorily wide confidence interval. For example, for $p_{\min} = 0.9$ and $n = 100$ we get $\varepsilon \approx 0.316$, which, for instance for the variable treated by the OP that is bounded in $[0,1]$, appears to be too big to be useful.
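A one-line check of the numbers above (my sketch):

p_min <- 0.9; n <- 100
1 / sqrt((1 - p_min) * n)   # epsilon = 0.3162278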

But the approach is valid, and distribution-free, and so there may be instances where it can be useful.

One may want to check also the Vysochanskij–Petunin inequality mentioned in another answer, which holds for continuous unimodal distributions and refines Chebyshev's inequality.

Alecos Papadopoulos
source
I don't agree that a problem with Chebyshev is that it only gives a lower bound for the probability. In a distribution-free setting, a lower bound is the best we can hope for. The important questions are: is Chebyshev sharp? Is the length of the Chebyshev C.I. systematically over-estimated for a fixed level $\alpha$? I answered this in my post, from a particular point of view. However, I'm still trying to understand if Chebyshev for a sample mean will always fail to be sharp, in a stronger sense.
Olivier
The length of the CI is not being estimated, since there does not exist some single unknown length, so I am not sure what you mean by the word "over-estimation" here. Different methods provide different CIs, which we can then of course attempt to evaluate and assess.
Alecos Papadopoulos
Over-estimation was a bad choice of words, thanks for pointing it out. By "systematically over-estimated length" I meant that the method for obtaining a C.I. always yields something larger than necessary.
Olivier
1
@Olivier Generally speaking, the Chebyshev Inequality is known to be a loose inequality, and so used more as a tool in theoretical derivations and proofs rather than in applied work.
Alecos Papadopoulos
2
@Olivier "Generally speaking" covers your qualification, I would say.
Alecos Papadopoulos
7

The short answer is that it can go pretty badly, but only if one or both tails of the sampling distribution are really fat.

This R code generates a million sets of 30 gamma-distributed variables and takes their means; it can be used to get a sense of what the sampling distribution of the mean looks like. If the normal approximation works as intended, the results should be approximately normal with mean 1 and variance 1/(30 * shape).

f = function(shape){replicate(1E6, mean(rgamma(30, shape, shape)))}
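For instance (my usage sketch, not from the original answer), one can overlay the CLT prediction on the simulated means:

m <- f(1.0)                           # one million simulated means (takes a while)
hist(m, breaks = 200, freq = FALSE)
curve(dnorm(x, mean = 1, sd = sqrt(1/30)), add = TRUE, col = 'red')   # CLT prediction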

When shape is 1.0, the gamma distribution becomes an exponential distribution, which is pretty non-normal. Nevertheless, the non-Gaussian parts mostly average out, and so the Gaussian approximation isn't so bad:

[histogram & density plot]

There's clearly some bias, and it would be good to avoid that when possible. But honestly, that level of bias probably won't be the biggest problem facing a typical study.

That said, things can get much worse. With f(0.01), the histogram looks like this:

[histogram]

Log-transforming the 30 sampled data points before averaging helps a lot, though:

[histogram]
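A sketch of that variant (my code, mirroring f above; the function name g is mine):

g <- function(shape){replicate(1E6, mean(log(rgamma(30, shape, shape))))}
hist(g(0.01), breaks = 200)   # far closer to bell-shaped than f(0.01)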

In general, distributions with long tails (on one or both sides of the distribution) will require the most samples before the Gaussian approximation starts to become reliable. There are even pathological cases where there will literally never be enough data for the Gaussian approximation to work, but you'll probably have more serious problems in that case (because the sampling distribution doesn't have a well-defined mean or variance to begin with).

David J. Harris
source
I find the experiment very pertinent and interesting. I won't take this as the answer, however, as it does not address the crux of the problem.
Olivier
1
what's the crux?
David J. Harris
Your answer does not provide rigorous footing for sound statistical practice. It only gives examples. Note, also, that the random variables I consider are bounded, which greatly changes what the worst possible case is.
Olivier
@Glen_b: this answer isn't so relevant to your revised version of the question. Should I just leave it here, or would you recommend something else?
David J. Harris
3

Problem with the Chebyshev confidence interval

As mentioned by Carlo, we have $\sigma^2 \leq \frac{1}{4}$. This follows from $\operatorname{Var}(X) \leq \mu(1 - \mu) \leq \frac{1}{4}$ for a $[0,1]$-valued variable. Therefore a confidence interval for $\mu$ is given by

$$P(|\bar{X} - \mu| \geq \varepsilon) \leq \frac{1}{4n\varepsilon^2}.$$
The problem is that the inequality is, in a certain sense, quite loose when $n$ gets large. An improvement is given by Hoeffding's bound, shown below. However, we can also demonstrate how bad it can get using the Berry–Esseen theorem, pointed out by Yves. Let $X_i$ have variance $\frac{1}{4}$, the worst possible case. The theorem implies that $$P\left(|\bar{X} - \mu| \geq \frac{\varepsilon}{2\sqrt{n}}\right) \leq 2\,\mathrm{SF}(\varepsilon) + \frac{8}{\sqrt{n}},$$ where $\mathrm{SF}$ is the survival function of the standard normal distribution. In particular, with $\varepsilon = 16$, we get $\mathrm{SF}(16) \approx 6.4 \times 10^{-58}$ (according to Scipy), so that essentially
$$P\left(|\bar{X} - \mu| \geq \frac{8}{\sqrt{n}}\right) \leq \frac{8}{\sqrt{n}} + 0, \qquad (\star)$$
whereas the Chebyshev inequality implies
$$P\left(|\bar{X} - \mu| \geq \frac{8}{\sqrt{n}}\right) \leq \frac{1}{256}.$$
Note that I did not try to optimize the bound given in $(\star)$; the result here is only of conceptual interest.
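These numbers can be checked in R rather than Scipy (my sketch; the value n = 1e7 is my illustrative choice, large enough that the Berry–Esseen bound drops below Chebyshev's constant):

pnorm(16, lower.tail = FALSE)   # SF(16) ~ 6.4e-58, negligible
n <- 1e7
c(berry_esseen = 2 * pnorm(16, lower.tail = FALSE) + 8 / sqrt(n),   # ~0.0025
  chebyshev = 1 / 256)                                              # ~0.0039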

Comparing the lengths of the confidence intervals

Consider the $(1 - \alpha)$-level confidence interval lengths $Z(\alpha, n)$ and $C(\alpha, n)$ obtained using the normal approximation (with $\sigma = \frac{1}{2}$) and the Chebyshev inequality, respectively. It turns out that $C(\alpha, n)$ is a constant times bigger than $Z(\alpha, n)$, independently of $n$. Precisely, for all $n$,

$$C(\alpha, n) = \kappa(\alpha)\, Z(\alpha, n), \qquad \kappa(\alpha) = \left(\mathrm{ISF}\left(\tfrac{\alpha}{2}\right)\sqrt{\alpha}\right)^{-1},$$
where $\mathrm{ISF}$ is the inverse survival function of the standard normal distribution. I plot the multiplicative constant below.

[plot of the multiplicative constant $\kappa(\alpha)$]

In particular, the 95% level confidence interval obtained using the Chebyshev inequality is about 2.3 times bigger than the same level confidence interval obtained using the normal approximation.
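The constant is easy to reproduce (my sketch; the ISF is implemented via qnorm):

kappa <- function(alpha) 1 / (qnorm(alpha / 2, lower.tail = FALSE) * sqrt(alpha))
kappa(0.05)   # ~2.28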


Using Hoeffding's bound

Hoeffding's bound gives

$$P(|\bar{X} - \mu| \geq \varepsilon) \leq 2 e^{-2n\varepsilon^2}.$$
Thus a $(1 - \alpha)$-level confidence interval for $\mu$ is
$$(\bar{X} - \varepsilon,\; \bar{X} + \varepsilon), \qquad \varepsilon = \sqrt{-\frac{\ln\frac{\alpha}{2}}{2n}},$$
of length $H(\alpha, n) = 2\varepsilon$. I plot below the lengths of the different confidence intervals (Chebyshev inequality: $C$; normal approximation ($\sigma = 1/2$): $Z$; Hoeffding's inequality: $H$) for $\alpha = 0.05$.

[plot of the lengths $C$, $Z$, and $H$ for $\alpha = 0.05$]
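A numerical version of the comparison (my sketch; the sample sizes are arbitrary):

alpha <- 0.05; n <- c(30, 100, 1000)
C <- 1 / (2 * sqrt(alpha * n))              # Chebyshev half-width, using sigma^2 <= 1/4
Z <- qnorm(1 - alpha / 2) * 0.5 / sqrt(n)   # normal-approximation half-width
H <- sqrt(-log(alpha / 2) / (2 * n))        # Hoeffding half-width
round(rbind(n, C, Z, H), 3)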

Olivier
source
Very interesting! I have some corrections to suggest, though, together with a big puzzlement: first, you should drop the absolute value from the statement of Hoeffding's inequality; it is $P(\bar{X} - \mu \geq \varepsilon) \leq e^{-2n\varepsilon^2}$ or $P(|\bar{X} - \mu| \geq \varepsilon) \leq 2 e^{-2n\varepsilon^2}$. The second correction is less important: $\alpha$ is generally taken to be 0.05 or lower, while 0.95 is referred to as $1 - \alpha$; it's a bit confusing to see them switched in your post.
carlo
Last and more important: I found your result incredible, so I tried to replicate it in R and I got the completely opposite result: the normal approximation gives smaller confidence intervals for me! This is the code I used: curve(sqrt(-log(.025)/2/x), to = 100, col = 'red', xlab = 'n', ylab = 'half interval') # Hoeffding ; curve(qnorm(.975, 0, .5/sqrt(x)), to = 100, add = T, col = 'darkgreen') # normal approximation
carlo
0

Let's start with the number 30: as anyone will say, it's a rule of thumb. But how can we find a number that fits our data better? It's actually mostly a matter of skewness: even the strangest distributions converge to normal quickly if they are symmetric and continuous, while skewed data will be much slower. I remember learning that a binomial distribution can be properly approximated by a normal when its variance is greater than 9. For this example, consider also that discrete distributions have the additional problem that they need large numbers to simulate continuity. But think about this: a symmetric binomial distribution reaches that variance with n = 36, whereas if p = 0.1 instead, n must go up to 100 (a variable transformation, however, would help a lot)! See the quick check below.
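The arithmetic behind those two sample sizes (my check):

ceiling(9 / (0.5 * 0.5))   # n = 36 for the symmetric case p = 0.5
ceiling(9 / (0.1 * 0.9))   # n = 100 for p = 0.1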

If you only want to use the variance, dropping the Gaussian approximation, consider the Vysochanskij–Petunin inequality over Chebyshev's. It needs the assumption of a unimodal distribution of the mean, but this is a very safe one with any sample size, I'd say, greater than 2.
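For reference (my addition, quoting the inequality's standard form): for a unimodal $X$ with mean $\mu$ and variance $\sigma^2$,
$$P(|X - \mu| \geq k\sigma) \leq \frac{4}{9k^2} \qquad \text{for } k \geq \sqrt{8/3} \approx 1.63,$$
i.e. a factor-$\frac{4}{9}$ improvement over Chebyshev's $1/k^2$ in that range.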

carlo
source
Could you add a reference for the "Vysochanskij–Petunin inequality"? Never heard of it!
kjetil b halvorsen
wikipedia docet
carlo
Can you express the rate of convergence in terms of the skewness? Why is a sample size of, you'd say, 2 enough for unimodality? How is the Vysochanskij–Petunin inequality an improvement over Chebyshev if you need to double or triple the sample size for it to apply?
Olivier
I did a quick Google search and found that the binomial distribution is indeed often used to explain the different sample sizes needed for skewed data, but I didn't find, and I guess there is no, accepted "rate of convergence in terms of the skewness".
carlo
The Vysochanskij–Petunin inequality is more efficient than Chebyshev's, so it doesn't need a larger sample at all, but it has some constraints on its use: first, you have to have a continuous distribution; then, it has to be unimodal (no local modes are allowed). It may seem strange to drop the normality assumption only to adopt another one, but if your data is not discrete, the sample mean should eliminate local modes even with very small samples. The fact is that the mean has much of a bell-shaped distribution and, even if it can be skewed or have fat tails, it quickly comes to have only one mode.
carlo