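Let $X_1, X_2, \ldots$ be a sequence of iid random variables sampled from an alpha-stable distribution with stability parameter $\alpha = 3/2$.
Now consider the sequence $Y_1, Y_2, \ldots$, where $Y_j$ is formed from the product $X_{3j-2}X_{3j-1}X_{3j}$, for $j = 1, 2, \ldots$.
I want to estimate the $0.01$-percentile of $Y$.
My idea is to run a kind of Monte Carlo simulation: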
l = 1;
while(l <= max_iterations)
{
    Generate $X_1, X_2, \ldots, X_{3n}$ and compute $Y_1, Y_2, \ldots, Y_{n}$;
    Compute the $0.01$-percentile of the current repetition;
    Compute the mean of the $0.01$-percentiles of all iterations performed so far;
    Compute the variance of the $0.01$-percentiles of all iterations performed so far;
    Calculate a confidence interval for the estimate of the $0.01$-percentile;
    if(confidence interval is small enough)
        break;
    l = l + 1;
}
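A runnable sketch of the loop above, in Python with only the standard library, may help make the procedure concrete. It rests on several assumptions that are not fixed by the text: the $X_i$ are taken to be symmetric ($\beta = 0$) stable with $\alpha = 3/2$, an assumed scale of 1 and an assumed location of 1; $Y_j$ is taken to be exactly the product $X_{3j-2}X_{3j-1}X_{3j}$ (hypothetical); draws use the Chambers-Mallows-Stuck method; and the stopping rule "small enough" is interpreted, hypothetically, as a 95% half-width below 5% of the current mean. All function names are illustrative.

```python
import math
import random
import statistics


def stable_symmetric(alpha, rng):
    """One draw from a standard symmetric alpha-stable law (beta = 0, scale 1,
    location 0) via the Chambers-Mallows-Stuck method; requires alpha != 1."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * v) / math.cos(v) ** (1 / alpha)
            * (math.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))


def simulate_y(n, rng, alpha=1.5, scale=1.0, loc=1.0):
    """Draw X_1..X_{3n} and return Y_1..Y_n.
    HYPOTHETICAL: Y_j = X_{3j-2} * X_{3j-1} * X_{3j}; scale and loc are assumed."""
    xs = [loc + scale * stable_symmetric(alpha, rng) for _ in range(3 * n)]
    return [xs[3 * j] * xs[3 * j + 1] * xs[3 * j + 2] for j in range(n)]


def sample_quantile(values, q):
    """Order statistic whose rank is closest to n*q (clamped to 1..n)."""
    s = sorted(values)
    k = min(max(round(len(s) * q), 1), len(s))
    return s[k - 1]


def monte_carlo_percentile(n=300, q=0.01, max_iterations=2000, min_iterations=30,
                           rel_tol=0.05, z=1.96, seed=1):
    """Repeat the simulation until the CLT-based 95% half-width of the mean
    sample quantile drops below rel_tol * |mean|, or max_iterations is reached.
    Needs min_iterations >= 2 so that a sample standard deviation exists."""
    rng = random.Random(seed)
    estimates = []
    for i in range(1, max_iterations + 1):  # the counter advances on every pass
        estimates.append(sample_quantile(simulate_y(n, rng), q))
        if i >= min_iterations:
            mean = statistics.fmean(estimates)
            half_width = z * statistics.stdev(estimates) / math.sqrt(i)
            if half_width < rel_tol * abs(mean):
                break
    mean = statistics.fmean(estimates)
    half_width = z * statistics.stdev(estimates) / math.sqrt(len(estimates))
    return mean, half_width, len(estimates)


if __name__ == "__main__":
    mean, half_width, iterations = monte_carlo_percentile(max_iterations=500)
    print(f"mean sample 0.01-quantile: {mean:.3f} +/- {half_width:.3f} "
          f"after {iterations} iterations")
```

Note that the interval produced this way is for the expected value of the per-iteration sample percentile; whether that matches the true percentile is exactly what question 2 below asks.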
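Denote by $Z^{(i)}$ the sample $0.01$-percentile from iteration $i$, by $\bar{Z}_m$ the mean of the $m$ sample percentiles computed so far, and by $S_m^2$ their sample variance. To compute an appropriate confidence interval for $\mu = E\left(Z^{(i)}\right)$, I resort to the strong form of the Central Limit Theorem:
Let $W_1, W_2, \ldots$ be a sequence of iid random variables with $E(W_i) = \mu$ and $0 < \operatorname{Var}(W_i) = \sigma^2 < \infty$. Define the sample mean as $\bar{W}_m = \frac{1}{m}\sum_{i=1}^{m} W_i$. Then $\sqrt{m}\,(\bar{W}_m - \mu)/\sigma$ has a limiting standard normal distribution, that is,
$$\lim_{m\to\infty} P\left(\frac{\sqrt{m}\,(\bar{W}_m - \mu)}{\sigma} \le x\right) = \Phi(x) \quad \text{for all } x \in \mathbb{R},$$
where $\Phi$ is the standard normal CDF,
and Slutsky's theorem (applied with $W_i = Z^{(i)}$, using the fact that $S_m$ converges in probability to $\sigma$) to conclude that
$$\frac{\sqrt{m}\,(\bar{Z}_m - \mu)}{S_m} \xrightarrow{d} N(0, 1).$$
Then an approximate $100(1-\gamma)\%$ confidence interval for $\mu$ is
$$\left[\bar{Z}_m - \Phi^{-1}(1-\gamma/2)\,\frac{S_m}{\sqrt{m}},\; \bar{Z}_m + \Phi^{-1}(1-\gamma/2)\,\frac{S_m}{\sqrt{m}}\right].$$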
Questions:
1) Is my approach correct? How can I justify applying the CLT? That is, how can I show that the variance is finite? (Do I have to look at the variance of $Y$? Because I don't think it is finite...)
2) How can I show that the average of all the computed sample percentiles converges to the true value of the percentile? (I should use order statistics, but I'm unsure how to proceed; references are appreciated.)
Answer:
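The variance of $Y$ is not finite. This is because an alpha-stable variable $X$ with $\alpha = 3/2$ (a Holtsmark distribution) has a finite expectation $\mu$, but its variance is infinite. If $Y$ had a finite variance $\sigma^2$, then by exploiting the independence of the $X_i$ and the definition of variance we could compute
$$\sigma^2 = \operatorname{Var}(Y) = \operatorname{Var}(X_1X_2X_3) = E\left(X_1^2X_2^2X_3^2\right) - \left(E(X_1X_2X_3)\right)^2 = E\left(X^2\right)^3 - \mu^6 = \left(\operatorname{Var}(X) + \mu^2\right)^3 - \mu^6 = \operatorname{Var}(X)^3 + 3\mu^2\operatorname{Var}(X)^2 + 3\mu^4\operatorname{Var}(X).$$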
This cubic equation in $\operatorname{Var}(X)$ has at least one real solution (and at most three), implying that $\operatorname{Var}(X)$ would be finite, but it is not. This contradiction proves the claim.
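The algebra in the display above can be checked exactly on a distribution that does have a finite variance. The sketch below uses a hypothetical $X$, uniform on $\{0, 1, 3\}$, chosen only for illustration; it enumerates all 27 outcomes of $(X_1, X_2, X_3)$ with exact rational arithmetic and compares $\operatorname{Var}(X_1X_2X_3)$ with $(\operatorname{Var}(X) + \mu^2)^3 - \mu^6$.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite-variance distribution: X uniform on {0, 1, 3}
support = [Fraction(0), Fraction(1), Fraction(3)]
prob = Fraction(1, len(support))


def expect(f, k):
    """Exact E[f(X_1, ..., X_k)] for k iid copies of X, by enumeration."""
    return sum(prob ** k * f(*xs) for xs in product(support, repeat=k))


mu = expect(lambda x: x, 1)
var_x = expect(lambda x: x * x, 1) - mu ** 2

# Left side: Var(X1 X2 X3) computed directly from the joint distribution
var_prod = (expect(lambda a, b, c: (a * b * c) ** 2, 3)
            - expect(lambda a, b, c: a * b * c, 3) ** 2)
# Right side: the cubic (Var(X) + mu^2)^3 - mu^6
cubic = (var_x + mu ** 2) ** 3 - mu ** 6

print(var_prod, cubic, var_prod == cubic)  # prints: 22904/729 22904/729 True
```

For the stable case the same identity fails to help precisely because $E(X^2)$ is infinite, so no finite $\sigma^2$ can satisfy it.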
Let's turn to the second question.
Any sample quantile converges (in probability) to the true quantile as the sample grows large. The next few paragraphs prove this general point.
Let the associated probability be $q = 0.01$ (or any other value strictly between $0$ and $1$). Write $F$ for the distribution function, so that $Z_q = F^{-1}(q)$ is the $q$th quantile.
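All we need to assume is that $F^{-1}$ (the quantile function) is continuous. This assures us that for any $\epsilon > 0$ there are probabilities $q^- < q$ and $q^+ > q$ for which
$$Z_{q^-} = F^{-1}(q^-) = Z_q - \epsilon, \qquad Z_{q^+} = F^{-1}(q^+) = Z_q + \epsilon,$$
and that as $\epsilon \to 0$, the limit of the interval $[q^-, q^+]$ is $\{q\}$.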
Consider any iid sample of size $n$. The number of elements of this sample that are less than $Z_{q^-}$ has a Binomial$(n, q^-)$ distribution, because each element independently has a chance $q^-$ of being less than $Z_{q^-}$. The Central Limit Theorem (the usual one!) implies that for sufficiently large $n$, the number of elements less than $Z_{q^-}$ has approximately a Normal distribution with mean $nq^-$ and variance $nq^-(1-q^-)$ (to an arbitrarily good approximation). Let $\Phi$ be the CDF of the standard Normal distribution. The chance that this count exceeds $nq$ is therefore arbitrarily close to
$$1 - \Phi\left(\frac{nq - nq^-}{\sqrt{nq^-(1-q^-)}}\right) = 1 - \Phi\left(\sqrt{n}\,\frac{q - q^-}{\sqrt{q^-(1-q^-)}}\right).$$
Because the argument of $\Phi$ on the right-hand side is a fixed positive multiple of $\sqrt{n}$, it grows arbitrarily large as $n$ grows. Since $\Phi$ is a CDF, its value approaches $1$, showing that the limiting value of this probability is zero.
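The limiting argument can be checked numerically. The Python sketch below computes, for an illustrative choice $q = 0.01$ and $q^- = 0.005$ (the latter picked only for the example), the exact Binomial probability that more than $nq$ sample elements fall below $Z_{q^-}$, next to the Normal approximation from the display above. The helper names are hypothetical.

```python
import math


def binom_upper_tail(n, p, threshold):
    """Exact P(N > threshold) for N ~ Binomial(n, p), each term in log space."""
    total = 0.0
    for k in range(math.floor(threshold) + 1, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


def normal_approx_tail(n, p, q):
    """1 - Phi(sqrt(n) * (q - p) / sqrt(p * (1 - p))), as in the text."""
    z = math.sqrt(n) * (q - p) / math.sqrt(p * (1 - p))
    return 0.5 * math.erfc(z / math.sqrt(2))


q, q_minus = 0.01, 0.005          # q_minus is an illustrative choice of q^-
sizes = (100, 1_000, 10_000)
tails = [binom_upper_tail(n, q_minus, n * q) for n in sizes]
approx = [normal_approx_tail(n, q_minus, q) for n in sizes]
for n, exact, normal in zip(sizes, tails, approx):
    print(f"n = {n:>6}: exact {exact:.3e}, normal approximation {normal:.3e}")
```

At small $n$ the two columns differ noticeably (with $n = 100$ the expected count $nq^-$ is only $0.5$, far too small for the Normal approximation), but both shrink toward zero as $n$ grows, which is all the argument needs.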
In words: with probability approaching $1$, at most $nq$ of the sample elements are less than $Z_{q^-}$. An analogous argument shows that, with probability approaching $1$, at least $nq$ of the sample elements are less than $Z_{q^+}$. Together, these imply that the $q$ quantile of a sufficiently large sample is extremely likely to lie between $Z_{q^-} = Z_q - \epsilon$ and $Z_{q^+} = Z_q + \epsilon$.
That's all we need in order to know that the simulation will work. You may choose any desired degree of accuracy $\epsilon$ and confidence level $1-\alpha$, and know that for a sufficiently large sample size $n$, the order statistic closest to $nq$ in that sample will have a chance of at least $1-\alpha$ of being within $\epsilon$ of the true quantile $Z_q$.
Having established that a simulation will work, the rest is easy. Confidence limits can be obtained from limits for the Binomial distribution and then back-transformed. Further explanation (for the $q = 0.50$ quantile, but generalizing to all quantiles) can be found in the answers at Central limit theorem for sample medians.
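One common way to carry out that Binomial step is the distribution-free order-statistic interval, sketched below in Python; it is offered as an illustration, not as the exact procedure the linked answers use. If $B \sim \text{Binomial}(n, q)$ counts sample points below $Z_q$ and $F$ is continuous, then $P(X_{(l)} \le Z_q < X_{(u)}) = P(l \le B \le u-1)$, so choosing ranks $l$ and $u$ that cut off at most $\alpha/2$ in each Binomial tail gives coverage of at least $1-\alpha$. The helper names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Binomial(n, p) probability of k successes, computed in log space."""
    return math.exp(math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))


def quantile_ci_ranks(n, q, alpha):
    """Ranks (l, u) such that [X_(l), X_(u)] covers the true q quantile with
    probability >= 1 - alpha for continuous F. Rank 0 stands for -infinity and
    rank n + 1 for +infinity. Also returns the exact coverage P(l <= B <= u-1)."""
    below = [0.0]                       # below[k] = P(B < k) for k = 0..n+1
    for k in range(n + 1):
        below.append(below[-1] + binom_pmf(k, n, q))
    l = max(k for k in range(n + 1) if below[k] <= alpha / 2)
    u = min(k for k in range(1, n + 2) if 1.0 - below[k] <= alpha / 2)
    return l, u, below[u] - below[l]


if __name__ == "__main__":
    for n in (300, 3000):
        l, u, coverage = quantile_ci_ranks(n, 0.01, 0.05)
        print(f"n = {n}: ranks ({l}, {u}), coverage {coverage:.4f}")
```

To back-transform, sort the simulated values and report the $l$th and $u$th smallest as the confidence limits. With $n = 300$ and $q = 0.01$ only about three points are expected below $Z_q$, so the lower rank comes out as $0$ (an unbounded lower limit); a larger $n$ is needed for a two-sided interval.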
The $q = 0.01$ quantile of $Y$ is negative. Its sampling distribution is highly skewed. To reduce the skew, this figure shows a histogram of the logarithms of the negated sample $0.01$ quantiles from 1,000 simulated samples, each consisting of $n = 300$ values of $Y$.
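A text-mode histogram in the spirit of that figure can be sketched in Python as follows. The model details are assumptions, not taken from the text: symmetric ($\beta = 0$) stable $X_i$ with $\alpha = 3/2$, assumed scale 1 and assumed location 1, drawn with the Chambers-Mallows-Stuck method, and $Y_j$ taken hypothetically to be the product $X_{3j-2}X_{3j-1}X_{3j}$. The function names are illustrative.

```python
import math
import random


def stable_symmetric(alpha, rng):
    """Standard symmetric alpha-stable draw (Chambers-Mallows-Stuck, alpha != 1)."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * v) / math.cos(v) ** (1 / alpha)
            * (math.cos((1 - alpha) * v) / w) ** ((1 - alpha) / alpha))


def log_neg_quantiles(num_samples=1000, n=300, q=0.01, alpha=1.5,
                      scale=1.0, loc=1.0, seed=2):
    """log(-sample q quantile) for num_samples simulated samples of n values of Y.
    HYPOTHETICAL model: Y_j = X_{3j-2} X_{3j-1} X_{3j}; scale, loc are assumed."""
    rng = random.Random(seed)
    k = max(1, round(n * q))            # rank of the order statistic used
    out = []
    for _ in range(num_samples):
        xs = [loc + scale * stable_symmetric(alpha, rng) for _ in range(3 * n)]
        ys = sorted(xs[3 * j] * xs[3 * j + 1] * xs[3 * j + 2] for j in range(n))
        z = ys[k - 1]
        if z < 0:                       # practically always true under this model
            out.append(math.log(-z))
    return out


def text_histogram(values, bins=12, width=50):
    """Print a crude horizontal histogram and return the bin counts."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    top = max(counts)
    for i, c in enumerate(counts):
        print(f"{lo + i * step:8.2f} | " + "#" * round(width * c / top))
    return counts


if __name__ == "__main__":
    text_histogram(log_neg_quantiles())
```

Working on the log scale of the negated quantiles mirrors the skew-reducing transformation described above; the shape obtained will depend on the assumed scale, location, and definition of $Y$.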