If I want the probability of 9 successes in 16 trials, with each trial having a probability of 0.6, I can use the binomial distribution. What can I use if each of the 16 trials has a different probability of success?
Answers:
This is the sum of 16 (presumably independent) Binomial trials. The assumption of independence allows us to multiply probabilities. Thus, after two trials with probabilities $p_1$ and $p_2$ of success, the chance of success on both trials is $p_1 p_2$, the chance of no successes is $(1-p_1)(1-p_2)$, and the chance of one success is $p_1(1-p_2) + (1-p_1)p_2$. That last expression owes its validity to the fact that the two ways of getting exactly one success are mutually exclusive: at most one of them can actually happen. That means their probabilities add.
Using these two rules (probabilities of independent events multiply, and probabilities of mutually exclusive events add), you can work out the answer for, say, 16 trials with probabilities $p_1, \ldots, p_{16}$. To do so, you need to account for all the ways of obtaining each given number of successes (such as 9). There are $\binom{16}{9} = 11440$ ways to achieve 9 successes. One of them, for example, occurs when trials 1, 2, 4, 5, 6, 11, 12, 14, and 15 are successes and the others are failures. The successes had probabilities $p_1, p_2, p_4, p_5, p_6, p_{11}, p_{12}, p_{14}$, and $p_{15}$, and the failures had probabilities $1-p_3, 1-p_7, \ldots, 1-p_{13}, 1-p_{16}$. Multiplying these 16 numbers gives the chance of this particular sequence of outcomes. Summing this number along with the 11,439 remaining numbers gives the answer.
Of course you would use a computer.
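As a concrete illustration, here is a minimal R sketch of this direct enumeration (the sketch is mine, not whuber's; the probabilities $p_i = i/17$ are the example values used later in this thread):

    # Direct enumeration of all C(16, 9) = 11440 success patterns
    p <- (1:16) / 17                    # example probabilities used elsewhere in this thread
    k <- 9
    sets <- combn(length(p), k)         # each column picks which trials succeed
    sum(apply(sets, 2, function(s) prod(p[s]) * prod(1 - p[-s])))
    # exact P(exactly 9 successes); gives 0.198268... for these p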
With more than 16 trials, there is a need to approximate the distribution. Provided none of the probabilities $p_i$ and $1-p_i$ become too small, a Normal approximation tends to work well. With this method you note that the expectation of the sum of $n$ trials is $\mu = p_1 + p_2 + \cdots + p_n$ and (because the trials are independent) the variance is $\sigma^2 = p_1(1-p_1) + p_2(1-p_2) + \cdots + p_n(1-p_n)$. You then pretend the distribution of the sum is Normal with mean $\mu$ and standard deviation $\sigma$. The answers tend to be good for computing probabilities corresponding to proportions of successes that differ from $\mu$ by no more than a few multiples of $\sigma$. As $n$ grows large, the approximation becomes ever more accurate and works for even larger multiples of $\sigma$ away from $\mu$.
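A quick R sketch of this Normal approximation (mine, not whuber's; the continuity correction of $\pm 0.5$ is a standard refinement that the answer does not spell out):

    # Normal approximation to P(S = 9) with a continuity correction
    p     <- (1:16) / 17
    mu    <- sum(p)                     # mean of the sum
    sigma <- sqrt(sum(p * (1 - p)))     # sd of the sum (independent trials)
    pnorm(9.5, mu, sigma) - pnorm(8.5, mu, sigma)   # about 0.197 here, vs the exact 0.198268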
One alternative to @whuber's normal approximation is to use "mixing" probabilities, or a hierarchical model. This applies when the $p_i$ are similar in some way, and you can model this with a probability distribution $p_i \sim \mathrm{Dist}(\theta)$ having density function $g(p\,|\,\theta)$ indexed by some parameter $\theta$. You get an integral equation of sorts:

$$\Pr(S = 9\,|\,\theta) = \binom{16}{9}\int_0^1 p^9 (1-p)^7\, g(p\,|\,\theta)\, dp$$
The binomial probability arises from setting $g(p\,|\,\theta) = \delta(p - \theta)$; the normal approximation arises (I think) from setting $g(p\,|\,\theta) = g(p\,|\,\mu,\sigma) = \frac{1}{\sigma}\,\phi\!\left(\frac{p-\mu}{\sigma}\right)$ (with $\mu$ and $\sigma$ as defined in @whuber's answer) and then noting that the "tails" of this PDF fall off sharply around the peak.
You could also use a beta distribution, which leads to a simple analytic form and which need not suffer from the "small $p$" problem that the normal approximation does, because the beta is quite flexible. Use a $\mathrm{Beta}(\alpha, \beta)$ distribution with $\alpha, \beta$ set by the solutions of the following equations (these are the "minimum KL divergence" estimates):
$$\psi(\alpha) - \psi(\alpha + \beta) = \frac{1}{16}\sum_{i=1}^{16}\log(p_i)$$
$$\psi(\beta) - \psi(\alpha + \beta) = \frac{1}{16}\sum_{i=1}^{16}\log(1 - p_i)$$
where $\psi(\cdot)$ is the digamma function, closely related to the harmonic series.
We get the "beta-binomial" compound distribution:

$$\Pr(S = 9\,|\,\alpha, \beta) = \binom{16}{9}\,\frac{B(\alpha + 9,\; \beta + 7)}{B(\alpha, \beta)}$$
This distribution converges towards a normal distribution in the case that @whuber points out, but it should give reasonable answers for small $n$ and skewed $p_i$. It will not work for multimodal $p_i$, as the beta distribution has only one peak. But you can easily fix this by simply using $M$ beta distributions for the $M$ modes. You break up the integral from $0 < p < 1$ into $M$ pieces so that each piece has a unique mode (and enough data to estimate parameters), and fit a beta distribution within each piece. Then add up the results, noting that the change of variables $p = \frac{x - L}{U - L}$ for $L < x < U$ turns each piece back into a standard beta integral on the unit interval.
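To sketch how this beta approximation plays out in practice (the code is my illustration, not from the answer: it solves the two digamma equations numerically with base R's optim and digamma, then evaluates the beta-binomial formula at $s = 9$):

    # Fit Beta(a, b) by the minimum-KL conditions, then apply the beta-binomial pmf
    p  <- (1:16) / 17                    # the uniform-grid example from this thread
    t1 <- mean(log(p)); t2 <- mean(log(1 - p))
    obj <- function(u) {                 # u = log(c(a, b)) keeps the parameters positive
      a <- exp(u[1]); b <- exp(u[2])
      (digamma(a) - digamma(a + b) - t1)^2 + (digamma(b) - digamma(a + b) - t2)^2
    }
    ab <- exp(optim(c(0, 0), obj)$par)   # roughly a = b = 1.32 for these p
    choose(16, 9) * beta(ab[1] + 9, ab[2] + 7) / beta(ab[1], ab[2])  # approximate P(S = 9)

For these widely spread $p_i$ it returns about $0.068$, close to $1/17$; a later answer in this thread explains why the approximation is poor precisely when the $p_i$ are spread out.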
We seek $P(S=9)$, which is the coefficient of $t^9$ in the probability generating function

$$\mathrm{pgf} = \prod_{i=1}^{16}\left(1 - p_i + p_i\, t\right).$$
ALL DONE. This produces the exact symbolic solution as a function of the $p_i$. The answer is rather long to print on screen, but it is entirely tractable, and takes less than $\frac{1}{100}$th of a second to evaluate using Mathematica on my computer.
Examples
If $p_i = \frac{i}{17}$, $i = 1$ to $16$, then:

$$P(S=9) = \frac{9647941854334808184}{48661191875666868481} = 0.198268\ldots$$
If $p_i = \frac{\sqrt{i}}{17}$, $i = 1$ to $16$, then: $P(S=9) = 0.000228613\ldots$
More than 16 trials?
With more than 16 trials, there is no need to approximate the distribution. The above exact method works just as easily for examples with, say, $n = 50$ or $n = 100$. For instance, when $n = 50$, it takes less than $\frac{1}{10}$th of a second to evaluate the entire pmf (i.e. at every value $s = 0, 1, \ldots, 50$) using the code below.
Mathematica code
Given a vector of $p_i$ values, say:
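    (* illustrative example values: p_i = i/17, as in the comment one-liners below *)
    p = Range[16]/17;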
... here is some Mathematica code to do everything required (sketched below along the lines of the one-liners quoted in the comments):
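    (* reconstruction sketch using the pgf approach; pgfS, Expand and Coefficient
       are the identifiers that appear in the discussion below *)
    pgfS = Expand[Times @@ (1 - p + p t)];   (* probability generating function of S *)
    Coefficient[pgfS, t, 9]                  (* exact P(S = 9): the coefficient of t^9 *)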
To derive the entire pmf:
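    CoefficientList[pgfS, t]   (* P(S = s) for s = 0, 1, ..., 16 *)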
... or use the even neater and faster version (thanks to a suggestion from Ray Koopman below):
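    With[{p = Range@16/17}, N @ CoefficientList[Times @@ (1 - p + p*t), t]]   (* from Ray Koopman's comment below *)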
For an example with $n = 1000$, it takes just 1 second to calculate pgfS, and then 0.002 seconds to derive the entire pmf using CoefficientList, so it is extremely efficient.
Comments:

Ray Koopman: With[{p = Range@16/17}, N@Coefficient[Times@@(1-p+p*t),t,9]] gives the probability of 9 successes, and With[{p = Range@16/17}, N@CoefficientList[Times@@(1-p+p*t),t]] gives the probabilities of 0, ..., 16 successes.

wolfies: I have swapped the Table for the Range. Your use of CoefficientList is very nice! I've added an Expand to the code above, which speeds the direct approach up enormously. Even so, CoefficientList is even faster than a ParallelTable. It does not make much difference for smaller n, but CoefficientList will also be a real practical advantage when n is really large.

I saw @wolfies' comment, and my attempt at a response to it revealed an important problem with my other answer, which I will discuss later.
Specific Case (n=16)
There is a fairly efficient way to code up the full distribution by using the "trick" of using base 2 (binary) numbers in the calculation. It only requires 4 lines of R code to get the full distribution of $Y = \sum_{i=1}^n Z_i$, where $\Pr(Z_i = 1) = p_i$. Basically, there are a total of $2^n$ choices of the vector $z = (z_1, \ldots, z_n)$ that the binary variables $Z_i$ could take. Now suppose we number each distinct choice from $1$ up to $2^n$. This on its own is nothing special, but now suppose that we represent the "choice number" using base 2 arithmetic. Now take $n = 3$ so I can write down all the choices: there are $2^3 = 8$ of them. Then $1, 2, 3, 4, 5, 6, 7, 8$ in "ordinary numbers" becomes $1, 10, 11, 100, 101, 110, 111, 1000$ in "binary numbers". Now suppose we write these as four-digit numbers: $0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000$. Now look at the last 3 digits of each number: $001$ can be thought of as $(Z_1 = 0, Z_2 = 0, Z_3 = 1) \implies Y = 1$, etc. Counting in binary form provides an efficient way to organise the summation. Fortunately, there is an R function which can do this binary conversion for us, called intToBits(x), and if we convert the raw binary form into a numeric via as.numeric(intToBits(x)), we will get a vector with 32 elements, each element being a digit of the base-2 version of our number (read from right to left, not left to right). Using this trick combined with some other R vectorisations, we can calculate the probability that $y = 9$ in 4 lines of R code:
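A sketch of those lines (my reconstruction, not the original code; it implements exactly the binary-counting recipe just described):

    # Enumerate all 2^n binary outcome vectors via intToBits, then add up the
    # probabilities of those with exactly y successes (reconstruction sketch)
    p <- (1:16) / 17; y <- 9; n <- length(p)
    z <- t(sapply(0:(2^n - 1), function(x) as.numeric(intToBits(x))[1:n]))
    logpr <- z %*% log(p) + (1 - z) %*% log(1 - p)   # log-probability of each outcome vector
    sum(exp(logpr[rowSums(z) == y]))                 # exact P(Y = 9) = 0.198268... for p_i = i/17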
Plugging in the uniform case $p^{(1)}_i = \frac{i}{17}$ and the square-root case $p^{(2)}_i = \frac{\sqrt{i}}{17}$ gives the full distribution for $y$ in each case.
So for the specific problem of $y$ successes in 16 trials, the exact calculations are straightforward. This also works for a number of probabilities up to about $n = 20$; beyond that, you are likely to start running into memory problems, and different computing tricks are needed.
Note that by applying my suggested "beta distribution" approach we get parameter estimates of $\alpha = \beta = 1.3206$, and this gives a probability estimate that is nearly uniform in $y$, giving an approximate value of $\Pr(y = 9) = 0.06799 \approx \frac{1}{17}$. This seems strange given that a density of a beta distribution with $\alpha = \beta = 1.3206$ closely approximates the histogram of the $p_i$ values. What went wrong?
General Case
I will now discuss the more general case, and why my simple beta approximation failed. Basically, writing $(y\,|\,n,p) \sim \mathrm{Binom}(n,p)$ and then mixing over $p$ with another distribution $p \sim f(\theta)$ actually makes an important assumption: that we can approximate the actual probability with a single binomial probability, the only remaining problem being which value of $p$ to use. One way to see this is to use a mixing density which is discrete uniform over the actual $p_i$. So we replace the beta distribution $p \sim \mathrm{Beta}(a,b)$ with the discrete density $p \sim \sum_{i=1}^{16} w_i\, \delta(p - p_i)$. Then the mixing approximation can be expressed in words as: choose a $p_i$ value with probability $w_i$, and assume all Bernoulli trials have this probability. Clearly, for such an approximation to work well, most of the $p_i$ values should be similar to one another. This basically means that for @wolfies' uniform grid of values, $p_i = \frac{i}{17}$, the beta mixing distribution gives a woefully bad approximation. It also explains why the approximation is much better for $p_i = \frac{\sqrt{i}}{17}$: those values are less spread out.
The mixing then uses the observed $p_i$ to average over all possible choices of a single $p$. Because "mixing" is like a weighted average, it cannot possibly do any better than using the single best $p$. So if the $p_i$ are sufficiently spread out, no single $p$ can provide a good approximation to all of them.
One thing I did say in my other answer was that it may be better to use a mixture of beta distributions over a restricted range, but this still won't help here, because it is still mixing over a single $p$. What makes more sense is to split the interval $(0,1)$ into pieces and fit a binomial within each piece. For example, we could choose $(0, 0.1, 0.2, \ldots, 0.9, 1)$ as our splits and fit ten binomials, one within each $0.1$ range of probability. Basically, within each split, we would fit a simple approximation, such as a binomial with probability equal to the average of the $p_i$ in that range. If we make the intervals small enough, the approximation becomes arbitrarily good. But note that all this does is leave us having to deal with a sum of independent binomial trials with different probabilities, instead of Bernoulli trials. However, the previous part of this answer showed that we can do the exact calculations provided the number of binomials is sufficiently small, say 10 to 15 or so; a sketch of the grouped approximation follows below.
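For illustration, here is a minimal R sketch of this grouped approximation (my own, not from the answer; it simply convolves the group binomials at the group-mean probabilities, rather than using the counting trick developed next):

    # Grouped approximation: g groups of size k, each replaced by Binomial(k, group mean)
    p <- (1:16) / 17
    k <- 4                                  # group size (so g = 4 groups here)
    pbar <- colMeans(matrix(p, nrow = k))   # mean probability within each group
    pmf <- 1                                # pmf of an empty sum
    for (q in pbar) {
      bin <- dbinom(0:k, k, q)              # pmf of this group's binomial
      new <- rep(0, length(pmf) + k)
      for (s in 0:k) new[seq_along(pmf) + s] <- new[seq_along(pmf) + s] + pmf * bin[s + 1]
      pmf <- new                            # convolve it into the running pmf
    }
    pmf[10]                                 # approximate P(Y = 9)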
To extend the Bernoulli-based answer to a binomial-based one, we simply "re-interpret" what the $Z_i$ variables are. We simply state that $Z_i = I(X_i > 0)$; this reduces to the original Bernoulli-based $Z_i$, but now says which binomials the successes are coming from. So the case $(Z_1 = 0, Z_2 = 0, Z_3 = 1)$ now means that all the "successes" come from the third binomial, and none from the first two.
Note that this is still "exponential", in that the number of calculations is something like $k^g$, where $g$ is the number of binomials and $k$ is the group size; you have $Y \approx \sum_{j=1}^{g} X_j$ where $X_j \sim \mathrm{Bin}(k, p_j)$. But this is better than the $2^{gk}$ that you'd be dealing with by using Bernoulli random variables. For example, suppose we split the $n = 16$ probabilities into $g = 4$ groups with $k = 4$ probabilities in each group. This gives $4^4 = 256$ calculations, compared to $2^{16} = 65536$.
By choosing $g = 10$ groups, and noting that the limit was about $n = 20$, which is about $10^7$ cells, we can effectively use this method to increase the maximum $n$ to $n = 50$.
If we make a cruder approximation by lowering $g$, we will increase the "feasible" size for $n$; $g = 5$ means that you can have an effective $n$ of about $125$. Beyond this, the normal approximation should be extremely accurate.
The (in general intractable) pmf is

$$\Pr(S = k) = \sum_{A:\;|A| = k}\; \prod_{i \in A} p_i \prod_{j \notin A} \left(1 - p_j\right),$$

where the sum runs over all size-$k$ subsets $A$ of $\{1, \ldots, n\}$.
For the $p_i$'s used in wolfies' answer ($p_i = \frac{i}{17}$), this reproduces $\Pr(S = 9) = 0.198268\ldots$ from above.
When $n$ grows, use a convolution.
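A minimal R sketch of the convolution approach (my illustration: fold in one trial at a time, $O(n^2)$ overall, and numerically stable for large $n$):

    # Build the Poisson-binomial pmf by convolving in one Bernoulli trial at a time
    poibin_pmf <- function(p) {
      pmf <- 1                                  # pmf of the empty sum: P(S = 0) = 1
      for (q in p) pmf <- c(pmf * (1 - q), 0) + c(0, pmf * q)
      pmf                                       # pmf[k + 1] = P(S = k)
    }
    poibin_pmf((1:16) / 17)[10]                 # P(S = 9) = 0.198268..., matching the exact result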
Comment: there is R code in the solution to the same problem (with different values of the $p_i$) that is extremely efficient and handles much, much larger values of $n$; it took … seconds using the R code and 0.00058 seconds using Wolfies' Mathematica code (estimated by solving it 1000 times).