Dari mana asal syarat penuh dalam pengambilan sampel Gibbs?

15

Algoritma MCMC seperti Metropolis-Hastings dan Gibbs sampling adalah cara pengambilan sampel dari distribusi posterior bersama.

Saya pikir saya mengerti dan dapat mengimplementasikan metropolis-hasting dengan cukup mudah - Anda cukup memilih titik awal, dan 'berjalan di ruang parameter' secara acak, dipandu oleh kepadatan posterior dan kepadatan proposal. Pengambilan sampel Gibbs tampaknya sangat mirip tetapi lebih efisien karena hanya memperbarui satu parameter pada satu waktu, sementara yang lain tetap konstan, efektif berjalan di ruang dengan cara ortogonal.

Untuk melakukan ini, Anda memerlukan persyaratan penuh dari setiap parameter secara analitis dari *. Tetapi dari mana datangnya kondisi penuh ini? Untuk mendapatkan penyebut Anda perlu memarginalkan sambungan lebih darix1. Itu sepertinya banyak pekerjaan yang harus dilakukan secara analitik jika ada banyak parameter, dan mungkin tidak dapat ditelusuri jika distribusi bersama tidak terlalu 'bagus'. Saya menyadari bahwa jika Anda menggunakan konjugasi di seluruh model, kondisi penuh mungkin mudah, tetapi harus ada cara yang lebih baik untuk situasi yang lebih umum.

P(x1|x2, , xn)=P(x1, , xn)P(x2, , xn)
x1

Semua contoh pengambilan sampel Gibbs yang saya lihat online menggunakan contoh mainan (seperti pengambilan sampel dari multivarian normal, di mana kondisional hanya normalnya sendiri), dan tampaknya menghindari masalah ini.

* Atau apakah Anda memerlukan persyaratan penuh dalam bentuk analitik? Bagaimana program seperti winBUGS melakukannya?

cespinoza
sumber
1
Pengambilan sampel Gibbs biasanya kurang efisien daripada Metropolis-Hastings karena ia berjalan satu dimensi pada satu waktu ...
Xi'an
Gibbs sampling lebih efisien pada setiap langkah individu, tetapi mungkin memerlukan mengerikan langkah lebih banyak untuk konvergen - dan berakhir kurang efisien untuk hasil yang baik secara keseluruhan.
Lutz Prechelt

Jawaban:

7

Ya, Anda benar, distribusi bersyarat perlu ditemukan secara analitis, tetapi saya pikir ada banyak contoh di mana distribusi bersyarat penuh mudah ditemukan, dan memiliki bentuk yang jauh lebih sederhana daripada distribusi bersama.

P(X1,,Xn)XiXiXi1Xi+1Pr(Xi|X1,,Xi)=Pr(Xi|Xi1,Xi+1)

gabgoh
sumber
To add to this answer, you need not marginalize the other variables as originally stated in the question. All you need to do is to 're-organize' Pr(Xi|Xi1,Xi+1) such that you recognize the result as a known pdf and you are done. As long as you are able to re-organize the above everything else (i.e., all other constants, the integral in the denominator etc) will equal the appropriate constant for the pdf to integrate to 1.
3
They don't need to be found analytically. All full conditionals are proportional to the joint distribution, for instance. And that's all that's needed for Metropolis-Hastings.
Tristan
1
@Tristan of course. I am, however, talking about gibbs sampling.
gabgoh
1
They don't need to be found analytically for Gibbs Sampling. You just need to be able to sample, somehow, from the conditional; whether you can write down how to do this in a pretty analytic statement is not relevant.
guest
1
Actually, there is no need for an analytic full conditional: all that is required for Gibbs sampling to be implemented is the ability to simulate from the full conditionals.
Xi'an
11

I think you've missed the main advantage of algorithms like of Metropolis-Hastings. For Gibbs sampling, you will need to sample from the full conditionals. You are right, that is rarely easy to do. The main advantage of Metropolis-Hastings algorithms is that you can still sample one parameter at a time, but you only need to know the full conditionals up to proportionality. This is because the denominators cancel in the acceptance criteria function

The unnormalized full conditionals are often available. For instance, in your example P(x1|x2,...,xn)P(x1,...,xn), which you have. You don't need to do any integrals analytically. In most applications, a lot more will likely cancel too.

Programs like WinBugs/Jags typically take Metropolis-Hastings or slice sampling steps that only require the conditionals up to proportionality. These are easily available from the DAG. Given conjugacy, they also sometimes take straight Gibbs steps or fancy block stops.

Tristan
sumber
1
Hey, thanks! I think that point about not needing the norming constant for metropolis-hastings is exactly the information I needed to make sense of all of this. I think, because the GS in WinBUGS stands for gibbs sampling, I was under the impression that gibbs superseded M-H and that the software was using gibbs exclusively.
cespinoza
3
The term Gibbs sampling is often used to imply that you sample one parameter at a time, even if you don't use the original idea of sampling directly from the full conditionals. All the software samples individual parameters or blocks of parameters in sequence, but the actual step type varies a lot depending on what works best.
Tristan
2
Almost whenever you can implement Gibbs, you can also implement Metropolis-Hastings alternatives. Higher efficiency comes from mixing both approaches.
Xi'an
This should imho be the accepted answer.
NoBackingDown