Relation between confidence intervals and statistical hypothesis testing for the t-test

31

It is well known that confidence intervals and statistical hypothesis tests are strongly related. My question focuses on comparing the means of two groups on a numerical variable. Suppose the hypothesis is tested using a t-test. On the other hand, one can compute confidence intervals for the means of both groups. Is there any relation between overlapping confidence intervals and rejection of the null hypothesis that the means are equal (in favor of the alternative that the means differ, a two-sided test)? For example, a test could reject the null hypothesis if the confidence intervals do not overlap.

hypothesis-testing confidence-interval Lan

31

Yes, there are some simple relationships between confidence interval comparisons and hypothesis tests in a wide range of practical settings. However, in addition to verifying that the CI procedures and the t-test are appropriate for our data, we must check that the sample sizes are not too different and that the two sets have similar standard deviations. We also should not attempt to derive highly precise p-values from comparing two confidence intervals, but should be glad to develop effective approximations.

In trying to reconcile the two replies already given (by @John and @Brett), it helps to be mathematically explicit. A formula for a symmetric two-sided confidence interval appropriate for the setting of this question is

$\text{CI} = m \pm \frac{t_\alpha(n) s}{\sqrt{n}}$

where $m$ is the sample mean of $n$ independent observations, $s$ is the sample standard deviation, $2\alpha$ is the desired test size (maximum false positive rate), and $t_\alpha(n)$ is the upper $1-\alpha$ percentile of the Student t distribution with $n-1$ degrees of freedom. (This slight departure from conventional notation simplifies the exposition by obviating any need to fuss over the $n$ vs $n-1$ distinction, which will be inconsequential anyway.)

Using subscripts $1$ and $2$ to distinguish two independent sets of data for comparison, with $1$ corresponding to the larger of the two means, a non-overlap of confidence intervals is expressed by the inequality (lower confidence limit 1) $\gt$ (upper confidence limit 2); viz.,
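As a small illustration of this formula, here is a minimal Python sketch. The data are hypothetical, and the critical value $t_\alpha(n)$ is passed in by the caller (taken from a t table) rather than computed, so the function name `mean_ci` and its `t_crit` parameter are assumptions of this sketch only.

```python
import math
import statistics

def mean_ci(data, t_crit):
    """Symmetric two-sided CI  m +/- t_crit * s / sqrt(n)  for the mean of `data`.

    t_crit plays the role of t_alpha(n); it is supplied by the caller
    (e.g. from a t table) and is not computed here.
    """
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / math.sqrt(n)
    return m - half_width, m + half_width

# Hypothetical data with n = 5. The value 2.776 is the tabulated upper 2.5%
# point of Student's t with 4 degrees of freedom, giving a 95% interval.
sample = [4.0, 5.0, 6.0, 5.0, 5.0]
lower, upper = mean_ci(sample, t_crit=2.776)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```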

$m_1 - \frac{t_\alpha(n_1) s_1}{\sqrt{n_1}} \gt m_2 + \frac{t_\alpha(n_2) s_2}{\sqrt{n_2}}.$

This can be made to look like the t-statistic of the corresponding hypothesis test (to compare the two means) with simple algebraic manipulations, yielding

$\frac{m_1-m_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \gt \frac{s_1\sqrt{n_2}t_\alpha(n_1) + s_2\sqrt{n_1}t_\alpha(n_2)}{\sqrt{n_1 s_2^2 + n_2 s_1^2}}.$

The left hand side is the statistic used in the hypothesis test; it is usually compared to a percentile of a Student t distribution with $n_1+n_2$ degrees of freedom: that is, to $t_\alpha(n_1+n_2)$. The right hand side is a biased weighted average of the original t distribution percentiles.

The analysis so far justifies the reply by @Brett: there appears to be no simple relationship available. However, let's probe further. I am inspired to do so because, intuitively, a non-overlap of confidence intervals ought to say something!

First, notice that this form of the hypothesis test is valid only when we expect $s_1$ and $s_2$ to be at least approximately equal. (Otherwise we face the notorious Behrens-Fisher problem and its complexities.) Upon checking the approximate equality of the $s_i$, we could then create an approximate simplification in the form
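For readers who want the intermediate algebra, here is one way to spell it out, using only the symbols already defined. Moving the means to one side of the non-overlap inequality gives

$m_1 - m_2 \gt \frac{t_\alpha(n_1) s_1}{\sqrt{n_1}} + \frac{t_\alpha(n_2) s_2}{\sqrt{n_2}}.$

The standard error of the difference is positive and can be written over a common denominator:

$\sqrt{s_1^2/n_1 + s_2^2/n_2} = \frac{\sqrt{n_2 s_1^2 + n_1 s_2^2}}{\sqrt{n_1 n_2}}.$

Dividing both sides of the inequality by this quantity multiplies each term on the right by $\sqrt{n_1 n_2}$, turning $s_1/\sqrt{n_1}$ into $s_1\sqrt{n_2}$ and $s_2/\sqrt{n_2}$ into $s_2\sqrt{n_1}$, over the common denominator $\sqrt{n_1 s_2^2 + n_2 s_1^2}$. That is the right hand side of the inequality displayed above.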

$\frac{m_1-m_2}{s\sqrt{1/n_1 + 1/n_2}} \gt \frac{\sqrt{n_2}t_\alpha(n_1) + \sqrt{n_1}t_\alpha(n_2)}{\sqrt{n_1 + n_2}}.$

Here, $s \approx s_1 \approx s_2$. Realistically, we should not expect this informal comparison of confidence limits to have the same size as $\alpha$. Our question then is whether there exists an $\alpha'$ such that the right hand side is (at least approximately) equal to the correct t statistic. Namely, for what $\alpha'$ is it the case that

$t_{\alpha'}(n_1+n_2) = \frac{\sqrt{n_2}t_\alpha(n_1) + \sqrt{n_1}t_\alpha(n_2)}{\sqrt{n_1 + n_2}}\text{?}$

It turns out that for equal sample sizes, $\alpha$ and $\alpha'$ are connected (to pretty high accuracy) by a power law. For instance, here is a log-log plot of the two for the cases $n_1=n_2=2$ (lowest blue line), $n_1=n_2=5$ (middle red line), and $n_1=n_2=\infty$ (highest gold line). The middle green dashed line is an approximation described below. The straightness of these curves indicates a power law. The relationship varies with $n=n_1=n_2$, but not much.

Plot 1
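Points on curves like these can be computed numerically by solving the equation for $\alpha'$. The following is only a rough sketch using the Python standard library: the t distribution's tail is obtained by Simpson's rule and its percentile by bisection, which are numerical shortcuts chosen for this illustration rather than anything prescribed in the answer. It also treats the argument of $t_\alpha(\cdot)$ directly as the degrees of freedom, which is an assumption; the answer notes that the $n$ vs $n-1$ distinction is inconsequential. All function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, via Simpson's rule on [0, x] (steps must be even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_quantile_upper(alpha, df):
    """Value x with P(T > x) = alpha (for alpha < 0.5), found by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > alpha:  # expand until the root is bracketed
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def implied_test_size(alpha, n1, n2):
    """One-sided alpha' solving t_alpha'(n1+n2) = weighted average of t_alpha(n_i).

    Assumption of this sketch: n1, n2 and n1+n2 are used as degrees of freedom.
    """
    t1 = t_quantile_upper(alpha, n1)
    t2 = t_quantile_upper(alpha, n2)
    rhs = (math.sqrt(n2) * t1 + math.sqrt(n1) * t2) / math.sqrt(n1 + n2)
    return t_upper_tail(rhs, n1 + n2)

for n in (2, 5):
    print(n, 2 * implied_test_size(0.025, n, n))
```

Under that degrees-of-freedom convention, the value for $n_1=n_2=2$ and $2\alpha=.05$ should come out close to the $.0037$ quoted near the end of this answer; a different convention would shift the small-sample numbers somewhat.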

The answer does depend on the set $\{n_1, n_2\}$ , but it is natural to wonder how much it really varies with changes in the sample sizes. In particular, we could hope that for moderate to large sample sizes (maybe $n_1 \ge 10, n_2 \ge 10$ or thereabouts) the sample size makes little difference. In this case, we could develop a quantitative way to relate $\alpha'$ to $\alpha$ .

This approach turns out to work provided the sample sizes are not too different from each other. In the spirit of simplicity, I will report an omnibus formula for computing the test size $\alpha'$ corresponding to the confidence interval size $\alpha$ . It is

$\alpha' \approx e \alpha^{1.91};$

that is,

$\alpha' \approx \exp(1 + 1.91\log(\alpha)).$

This formula works reasonably well in these common situations:

Both sample sizes are close to each other, $n_1 \approx n_2$ , and $\alpha$ is not too extreme ( $\alpha \gt .001$ or so).
One sample size is within about three times the other and the smallest isn't too small (roughly, greater than $10$ ) and again $\alpha$ is not too extreme.
One sample size is within three times the other and $\alpha \gt .02$ or so.
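The omnibus formula is simple enough to evaluate directly. As a minimal sketch (the function name `approx_test_size` is made up for this example), the following converts a few common two-sided confidence interval sizes $2\alpha$ into the approximate two-sided test sizes $2\alpha'$:

```python
import math

def approx_test_size(alpha):
    """Omnibus approximation alpha' ~ e * alpha**1.91 (one-sided sizes)."""
    return math.exp(1 + 1.91 * math.log(alpha))

for two_alpha in (0.1, 0.05, 0.01, 0.005):
    # Halve to get the one-sided alpha, apply the formula, then double again.
    two_alpha_prime = 2 * approx_test_size(two_alpha / 2)
    print(f"2*alpha = {two_alpha:<6} 2*alpha' ~ {two_alpha_prime:.1e}")
```

The conditions listed above still apply: the approximation is only meant for samples of comparable size and standard deviation, and for $\alpha$ that is not too extreme.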

The relative error (correct value divided by the approximation) in the first situation is plotted here, with the lower (blue) line showing the case $n_1=n_2=2$ , the middle (red) line the case $n_1=n_2=5$ , and the upper (gold) line the case $n_1=n_2=\infty$ . Interpolating between the latter two, we see that the approximation is excellent for a wide range of practical values of $\alpha$ when sample sizes are moderate (around 5-50) and otherwise is reasonably good.

Plot 2

This is more than good enough for eyeballing a bunch of confidence intervals.

To summarize, the failure of two $2\alpha$ -size confidence intervals of means to overlap is significant evidence of a difference in means at a level equal to $2e \alpha^{1.91}$ , provided the two samples have approximately equal standard deviations and are approximately the same size.

I'll end with a tabulation of the approximation for common values of $2\alpha$ .

$2\alpha$    $2\alpha'$
0.1          0.02
0.05         0.005
0.01         0.0002
0.005        0.00006

For example, when a pair of two-sided 95% CIs ( $2\alpha=.05$ ) for samples of approximately equal sizes do not overlap, we should take the means to be significantly different, $p \lt .005$ . The correct p-value (for equal sample sizes $n$ ) actually lies between $.0037$ ( $n=2$ ) and $.0056$ ( $n=\infty$ ).
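The $n=\infty$ end of that range can be checked with the standard library alone, because the t percentiles become normal percentiles in the limit. This is a sketch under that large-sample assumption; the function name and its `ratio` parameter ($n_1/n_2$, held fixed as both sizes grow) are introduced here only for illustration.

```python
import math
from statistics import NormalDist

def limiting_test_size(alpha, ratio=1.0):
    """One-sided alpha' in the large-sample limit, with ratio = n1 / n2 fixed.

    Dividing numerator and denominator of the right hand side by sqrt(n2)
    gives (1 + sqrt(ratio)) * z_alpha / sqrt(1 + ratio).
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha)
    rhs = (1 + math.sqrt(ratio)) * z / math.sqrt(1 + ratio)
    return 1 - std_normal.cdf(rhs)

print(2 * limiting_test_size(0.025))             # equal sample sizes
print(2 * limiting_test_size(0.025, ratio=3.0))  # one sample three times larger
```

With equal sizes the right hand side reduces to $\sqrt{2}\,z_\alpha$, and the two-sided result should agree with the $.0056$ quoted above. With unequal sizes the multiplier drops below $\sqrt{2}$, so non-overlap corresponds to a somewhat larger p-value.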

This result justifies (and I hope improves upon) the reply by @John. Thus, although the previous replies appear to be in conflict, both are (in their own ways) correct.

whuber

7

No, not a simple one at least.

There is, however, an exact correspondence between the t-test of difference between two means and the confidence interval for the difference between the two means.

If the confidence interval for the difference between two means contains zero, a t-test for that difference would fail to reject null at the same level of confidence. Likewise if the confidence interval does not contain 0, the t-test would reject the null.

This is not the same as overlap between confidence intervals for each of the two means.
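A small numerical sketch may make the distinction concrete. It works from hypothetical summary statistics and, as a simplifying assumption, uses a normal critical value in place of the t percentile (reasonable only for large samples) together with the unpooled standard error of the difference; the function name and inputs are invented for this illustration.

```python
import math
from statistics import NormalDist

def compare_means(m1, se1, m2, se2, level=0.95):
    """Large-sample comparison of two independent means from summary statistics."""
    # Normal critical value, standing in for the t percentile (assumption).
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci1 = (m1 - z * se1, m1 + z * se1)
    ci2 = (m2 - z * se2, m2 + z * se2)
    diff = m1 - m2
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    ci_diff = (diff - z * se_diff, diff + z * se_diff)
    return {
        "cis_overlap": ci1[0] <= ci2[1] and ci2[0] <= ci1[1],
        "diff_ci_excludes_zero": not (ci_diff[0] <= 0 <= ci_diff[1]),
        "test_rejects": abs(diff) / se_diff > z,
    }

# Hypothetical means and standard errors.
print(compare_means(m1=10.0, se1=0.4, m2=8.7, se2=0.4))
```

In this example the two individual intervals overlap, yet the interval for the difference excludes zero and the test rejects. The last two entries always agree with each other, since they are the same inequality rearranged; the first one need not agree with either.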

Brett

The reply by @John, which although at present is not quite right in the details, correctly points out that yes, you can relate overlaps of CIs to test p-values. The relationship is not any more complex than the t-test itself. This has the appearance of contradicting your primary conclusion as stated in the first line. How would you resolve this difference?

whuber

I don't think they are contradictory. I can add some caveats. But, in the general sense, without additional assumptions and knowledge about parameters outside of the presentation of the interval (the variance, the sample size) the response stands as is. No, not a simple one at least.

Brett

5

Under typical assumptions of equal variance, yes, there is a relationship. If the bars overlap by less than the length of one bar * sqrt(2) then a t-test would find them to be significantly different at alpha = 0.05. If the ends of the bars just barely touch then a difference would be found at 0.01. If the confidence intervals for the groups are not equal one typically takes the average and applies the same rule.

Alternatively, if the width of a confidence interval around one of the means is w then the least significant difference between two values is w * sqrt(2). This is simple when you think of the denominator in the independent groups t-test, sqrt(2*MSE/n), and the factor for the CI, which is sqrt(MSE/n).

(95% CIs assumed)

There's a simple paper on making inferences from confidence intervals around independent means here. It will answer this question and many other related ones you may have.

Cumming, G., & Finch, S. (2005, March). Inference by eye: confidence intervals, and how to read pictures of data. American Psychologist, 60(2), 170-180.

John

2

I believe you also need to assume both groups have the same size.

whuber

approximately, yes ...

John
