What is the difference between Markov Random Fields and Conditional Random Fields?


If I fix the values of the observed nodes of an MRF, does it become a CRF?

some one

Answers:


Okay, I found the answer myself:

Conditional Random Fields (CRFs) are a special case of Markov Random Fields (MRFs).

1.5.4 Conditional Random Fields

A Conditional Random Field (CRF) is a form of MRF that defines a posterior for variables x given data z, as with the hidden MRF above. Unlike the hidden MRF, however, the factorization into the data likelihood P(z | x) and the prior P(x) is not made explicit [288]. This allows complex dependencies of x on z to be written directly in the posterior distribution, without the factorization being made explicit. (Given P(x | z), such factorizations always exist, however, and in fact many of them, so there is no suggestion that the CRF is more general than the hidden MRF, only that it may be more convenient to work with.)

Source: Blake, Kohli and Rother: Markov Random Fields for Vision and Image Processing. 2011

A conditional random field or CRF (Lafferty et al. 2001), sometimes called a discriminative random field (Kumar and Hebert 2003), is just a version of an MRF in which all the clique potentials are conditioned on the input features: [...]

The advantage of a CRF over an MRF is analogous to the advantage of a discriminative classifier over a generative classifier (see Section 8.6), namely, we do not need to "waste resources" modeling things that we always observe. [...]

The disadvantage of CRFs over MRFs is that they require labeled training data, and they are slower to train [...]

Source: Kevin P. Murphy: Machine Learning: A Probabilistic Perspective

Answering my question:

If I fix the values of the observed nodes of an MRF, does it become a CRF?

Yes. Fixing the values is the same as conditioning on them. However, you should note that there are differences in training as well.

Watching the many lectures on PGMs (probabilistic graphical models) on Coursera helped me a lot.

Martin Thoma

MRF vs Bayes nets: Loosely (but commonly) speaking, there are two kinds of graphical models: undirected graphical models and directed graphical models (there are further kinds too, e.g., the Tanner graph). The former are also known as Markov Random Fields / Markov networks, and the latter as Bayes / Bayesian networks. (Sometimes the independence assumptions in both can be represented by chordal graphs.)

Markov implies the way the distribution factorizes, and random field means a particular distribution among those defined by the undirected model.

CRF vs MRF: When some variables are observed, we can use the same undirected graph representation and parameterization to encode the conditional distribution P(Y | X), where Y is a set of target variables and X is a (disjoint) set of observed variables.

And the only difference lies in the normalization: for a standard Markov network the normalization term sums over both X and Y, but for a CRF it sums over Y only.
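To make that difference concrete, here is a minimal sketch with a single, arbitrary made-up potential phi over two binary variables. It contrasts the MRF partition function, which sums over both X and Y, with the per-observation CRF normalizer, which sums over Y only:

```python
import itertools

# Toy pairwise model over two binary variables X and Y with one clique
# potential phi(x, y). The numbers are arbitrary, purely for illustration.
def phi(x, y):
    return 2.0 if x == y else 1.0  # prefers agreement between x and y

states = [0, 1]

# Standard Markov network: the partition function sums over BOTH X and Y.
Z_mrf = sum(phi(x, y) for x, y in itertools.product(states, states))
p_joint = {(x, y): phi(x, y) / Z_mrf
           for x, y in itertools.product(states, states)}

# CRF view: X is observed, so the normalizer sums over Y only,
# separately for each observed value of X.
def p_cond(y, x_obs):
    Z_x = sum(phi(x_obs, yp) for yp in states)
    return phi(x_obs, y) / Z_x
```

For any fixed x, both constructions yield the same conditional P(Y | X = x); the CRF simply never spends normalization effort on X.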

References:

  1. Undirected graphical models (Markov random fields)
  2. Probabilistic Graphical Models: Principles and Techniques (The MIT Press, 2009)
  3. Markov random fields
Lerner Zhang

Let's contrast conditional inference under MRFs with modeling using a CRF, settling on definitions along the way, and then address the original question.

MRF

A Markov Random Field (MRF) with respect to a graph G is

  1. a set of random variables (or random "elements" if you like) corresponding to the nodes in G (thus, a "random field")
  2. with a joint distribution that is Markov with respect to G; that is, the joint probability distribution associated with this MRF is subject to the Markov constraint given by G: any variable Vi is conditionally independent of each non-neighboring variable Vj given Vi's neighbors N(Vi). In this case, it is said that the joint probability distribution P({Vi}) factorizes according to G.

Conditional Inference Under an MRF

Since an MRF represents a joint distribution over many variables that obeys Markov constraints, we can compute conditional probability distributions given observed values of some variables.

For example, if I have a joint distribution over four random variables: IsRaining, SprinklerOn, SidewalkWet, and GrassWet, then on Monday I might want to infer the joint probability distribution over IsRaining and SprinklerOn given that I have observed SidewalkWet=False and GrassWet=True. On Tuesday, I might want to infer the joint probability distribution over IsRaining and SprinklerOn given that I have observed SidewalkWet=True and GrassWet=True.

In other words, we can use the same MRF model to make inferences in these two different situations, but we wouldn't say that we've changed the model. In fact, although we observed SidewalkWet and GrassWet in both cases described here, the MRF itself doesn't have "observed variables" per se---all variables have the same status in the eyes of the MRF, so the MRF also models, e.g., the joint distribution of SidewalkWet and GrassWet.
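This kind of inference by enumeration can be sketched in a few lines; the weights below are invented purely for illustration and are not from any standard example:

```python
import itertools

# Brute-force conditional inference in a toy joint over four binary
# variables. Variable order: (IsRaining, SprinklerOn, SidewalkWet, GrassWet).
def weight(r, s, sw, g):
    w = 1.0
    if sw == (r or s):  # sidewalk tends to be wet iff rain or sprinkler
        w *= 4.0
    if g == (r or s):   # likewise for the grass
        w *= 4.0
    return w

joint = {a: weight(*a) for a in itertools.product([0, 1], repeat=4)}

def conditional(r, s, sidewalk_wet, grass_wet):
    """P(IsRaining=r, SprinklerOn=s | SidewalkWet, GrassWet) by enumeration."""
    num = joint[(r, s, sidewalk_wet, grass_wet)]
    den = sum(joint[(rp, sp, sidewalk_wet, grass_wet)]
              for rp, sp in itertools.product([0, 1], repeat=2))
    return num / den

# Monday: SidewalkWet=False, GrassWet=True.  Tuesday: both True.
monday = conditional(1, 0, 0, 1)
tuesday = conditional(1, 0, 1, 1)
```

Monday's and Tuesday's queries use the very same joint table; only the values we condition on change, so the model itself is untouched.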

CRF

In contrast, we can define a Conditional [Markov] Random Field (CRF) with respect to a graph G as

  1. a set of random variables corresponding to the nodes in G, a subset {X1, ..., Xn} of which are assumed to always be observed, with the remaining variables being {Y1, ..., Ym}
  2. with a conditional distribution P(Y1, ..., Ym | X1, ..., Xn) that is Markov with respect to G

The Difference

For both MRFs and CRFs, we typically fit a model that we can then use for conditional inference in diverse settings (as in the rain example above). However, while the MRF has no consistently designated "observed variables" and needs a joint distribution over all variables that adheres to the Markov constraints of G, a CRF:

  1. designates a subset of variables as "observed"

  2. only defines a conditional distribution on non-observed given observed variables; it does not model the probability of the observed variables (if distributions are expressed in terms of parameters, this is often seen as a benefit since parameters are not wasted in explaining the probability of things that will always be known)

  3. needs only obey Markov constraints with respect to the unobserved variables (i.e. the distribution over unobserved variables can depend arbitrarily on the observed variables while inference is at least as tractable as for the MRF on G)

Since a CRF does not need to obey Markov constraints on the observed variables {Xi}, these are typically not even shown in graphical representations of a CRF (a possible point of confusion). Instead, the CRF on G is defined as an MRF on a reduced graph that contains nodes only for the {Yi}s, where the parameters of the joint distribution of the {Yi}s are functions of the {Xi}s, thus conditionally defining a distribution of the {Yi}s given the {Xi}s.

Example

As a final example, the following linear-chain MRF would indicate that all of the Yi variables are conditionally independent of X1, X2, ..., Xn-1 given a known value of Xn:

[figure: linear-chain MRF X1 - X2 - ... - Xn - Y1 - Y2 - ... - Ym]

In contrast, a CRF defined on the same G with the same designation of {Xi}s as being always observed, would allow for distributions of the {Yi}s that depend arbitrarily on any of the {Xi}s.
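A minimal sketch of such a chain, with invented feature functions, shows the defining property: every factor may read the entire observed sequence x, while the normalizer enumerates only the label configurations y:

```python
import itertools
import math

# Sketch of a linear-chain CRF P(y_1..y_m | x_1..x_n). The feature
# functions are made up for illustration; the key point is that each
# factor depends arbitrarily on ALL of x, while normalization is over y.
def log_potential(i, y_prev, y_i, x):
    trans = 0.5 if y_prev == y_i else 0.0  # pairwise transition score
    unary = 0.1 * y_i * sum(x)             # reads the whole observed sequence
    return trans + unary

def conditional_prob(y_seq, x):
    states = [0, 1]
    def score(ys):
        return sum(log_potential(i, ys[i - 1] if i else 0, ys[i], x)
                   for i in range(len(ys)))
    log_z = math.log(sum(math.exp(score(ys))
                         for ys in itertools.product(states, repeat=len(y_seq))))
    return math.exp(score(y_seq) - log_z)
```

Note that x never appears in the sum inside log_z: the observed sequence only parameterizes the potentials, exactly as in the "recipe" above.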

Conclusion

So, although ("yes") the conditional distribution of an MRF on G given designated observed variables can be considered a CRF with respect to G (since it defines a conditional distribution that obeys the Markov constraints of G), it is somewhat degenerate and does not achieve the generality of CRFs on G. Instead, the appropriate recipe is: given an MRF on G, define an MRF on the non-observed subset of G with its parameters expressed as the output of parameterized functions of the observed variables, then train those function parameters to maximize the likelihood of the resulting conditional MRFs on labeled data.

In addition to the potential savings in model parameters, the increased expressiveness of the conditional model, and the retention of inference efficiency, a final important point about the CRF recipe is that, for discrete models (and a large subset of non-discrete models), despite the expressiveness of the CRF family, the log-likelihood can be expressed as a convex function of the function parameters, allowing for global optimization with gradient descent.
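As a tiny illustration of that convexity point: a CRF over a single binary label with log-linear potentials reduces to logistic regression, whose negative log-likelihood is convex, so plain gradient descent reaches the global optimum. The dataset and step size below are made up for the sketch:

```python
import math

# Single-label log-linear CRF == logistic regression. The normalizer
# sums over the two label states, giving log(1 + exp(s)).
data = [([1.0, x], 1 if x > 0 else 0)
        for x in [-2.0, -1.0, 0.5, 1.5, 2.0]]  # (features, label)

def nll(w):
    """Convex negative log-likelihood: -sum_i [y_i*s_i - log(1 + exp(s_i))]."""
    total = 0.0
    for feats, y in data:
        s = sum(wi * fi for wi, fi in zip(w, feats))
        total -= y * s - math.log(1.0 + math.exp(s))
    return total

w = [0.0, 0.0]
for _ in range(2000):                      # plain gradient descent
    grad = [0.0, 0.0]
    for feats, y in data:
        s = sum(wi * fi for wi, fi in zip(w, feats))
        p1 = 1.0 / (1.0 + math.exp(-s))    # model's P(y=1 | x)
        for j, fj in enumerate(feats):
            grad[j] += (p1 - y) * fj
    w = [wi - 0.1 * gj for wi, gj in zip(w, grad)]
```

Because the objective is convex, no restarts or careful initialization are needed; the same property carries over to general discrete log-linear CRFs.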

See also: the original CRF paper and this tutorial

user3780389