Intuisi untuk Ekspektasi Bersyarat dari aljabar-

20

Misalkan ( Ω , F , μ )(Ω,F,μ) menjadi ruang probabilitas, diberi variabel acak ξ : Ω Rξ:ΩR dan σ-σ aljabar GFGF kita dapat membuat variabel acak baru E [ ξ | G ]E[ξ|G] , yang merupakan harapan bersyarat.


Apa sebenarnya intuisi untuk berpikir tentang E [ ξ | G ]E[ξ|G] ? Saya mengerti intuisi sebagai berikut:

(i) E [ ξ | A ] diE[ξ|A] mana AA adalah suatu peristiwa (dengan probabilitas positif).

(ii) E [ ξ | η ] diE[ξ|η] mana ηη adalah variabel acak diskrit.

Tetapi saya tidak dapat memvisualisasikan E [ ξ | G ]E[ξ|G] . Saya memahami matematika itu, dan saya mengerti bahwa itu didefinisikan sedemikian rupa untuk menggeneralisasi kasus-kasus sederhana yang dapat kita visualisasikan. Tetapi bagaimanapun juga saya tidak menemukan cara berpikir ini berguna. Itu tetap menjadi objek misterius bagi saya.


Misalnya, misalkan A menjadi peristiwa dengan μ ( A ) > 0 . Bentuk σ -algebra G = { , A , A c , Ω } , yang dihasilkan oleh A . Kemudian E [ ξ | G ] ( ω ) akan sama dengan 1Aμ(A)>0σG={,A,Ac,Ω}AE[ξ|G](ω)μ ( A )AξjikaωA, dan sama dengan11μ(A)AξωAμ ( A c )AcξjikaohmA. Dengan kata lain,E[ξ| G](ω)=E[ξ| A]jikaωA, danE[ξ| G](ω)=E[ξ| Ac]jikaωAc.1μ(Ac)AcξωAE[ξ|G](ω)=E[ξ|A]ωAE[ξ|G](ω)=E[ξ|Ac]ωAc

Bagian yang membingungkan adalah ω Ω , jadi mengapa kita tidak menulis saja E [ ξ | G ] ( ω ) = E [ ξ | Ω ] = E [ ξ ] ? Mengapa kami mengganti E [ ξ | G ] oleh E [ ξ | Sebuah  atau  A c ] tergantung pada apakah atau tidak ohm A , tetapi tidak diizinkan untuk mengganti E [ωΩE[ξ|G](ω)=E[ξ|Ω]=E[ξ]E[ξ|G]E[ξ|A or Ac]ωAξ | G ] oleh E [ ξ ] ?E[ξ|G]E[ξ]


Catatan. Dalam menanggapi pertanyaan ini, jangan menjelaskan ini dengan menggunakan definisi ekspektasi kondisional yang ketat. Aku mengerti itu. Yang ingin saya pahami adalah apa yang diharapkan oleh penghitungan bersyarat dan mengapa kita menolak satu di tempat yang lain.

Nicolas Bourbaki
sumber

Jawaban:

16

Salah satu cara untuk berpikir tentang representasi kondisional adalah sebagai proyeksi ke aljabar G - σ .σG

enter image description here( dari Wikimedia commons )

Ini benar-benar benar ketika berbicara tentang variabel acak kuadrat-integrable; dalam hal ini E [ ξ | G ] sebenarnya adalah proyeksi ortogonal dari variabel random ξ ke ruang bagian dari L 2 ( Ω ) yang terdiri dari variabel-variabel acak terukur sehubungan dengan G . Dan sebenarnya ini bahkan ternyata benar dalam beberapa hal untuk variabel acak L 1 melalui pendekatan oleh variabel acak L 2 .E[ξ|G]ξL2(Ω)GL1L2

(Lihat komentar untuk referensi.)

Jika seseorang menganggap σ - aljabar sebagai mewakili berapa banyak informasi yang kami miliki (interpretasi yang sesuai dengan teori proses stokastik), maka σ - aljabar yang lebih besar berarti lebih banyak kemungkinan kejadian dan dengan demikian lebih banyak informasi tentang kemungkinan hasil, sementara σ yang lebih kecil - aljabar berarti lebih sedikit kemungkinan kejadian dan dengan demikian lebih sedikit informasi tentang kemungkinan hasil.σσσ

Oleh karena itu, memproyeksikan F -measurable variabel acak ξ ke kecil σ - aljabar G berarti mengambil perkiraan terbaik kami untuk nilai ξ diberikan informasi yang lebih terbatas yang tersedia dari G .FξσGξG

Dengan kata lain, hanya diberikan informasi dari G , dan bukan seluruh informasi dari F , E [ ξ | G ] dalam arti yang ketat, tebakan terbaik kami untuk apa variabel acak ξ adalah.GFE[ξ|G]ξ


Sehubungan dengan contoh Anda, saya pikir Anda mungkin membingungkan variabel acak dan nilainya. Variabel acak X adalah fungsi yang domainnya adalah ruang acara; itu bukan angka. Dengan kata lain, X : Ω R , X { f | f : ohm R } sedangkan untuk ω ohm , X ( ω ) R .XX:ΩR  X{f | f:ΩR}ωΩX(ω)R

Notasi untuk ekspektasi bersyarat, menurut pendapat saya, benar-benar buruk, karena itu adalah variabel acak itu sendiri, yaitu juga fungsi . Sebaliknya, harapan (reguler) dari variabel acak adalah angka . Ekspektasi bersyarat dari variabel acak adalah jumlah yang sama sekali berbeda dari ekspektasi variabel acak yang sama, yaitu, E [ ξ | G ] bahkan tidak "ketik-cek" dengan E [ ξ ] .E[ξ|G]E[ξ]

Dengan kata lain, menggunakan simbol E untuk menunjukkan ekspektasi reguler dan kondisional adalah penyalahgunaan notasi yang sangat besar, yang menyebabkan banyak kebingungan yang tidak perlu.E

Semua itu dikatakan, perhatikan bahwa E [ ξ | G ] ( ω ) adalah angka (nilai variabel acak E [ ξ | G ] dievaluasi pada nilai ω ), tetapi E [ ξ | Ω ] adalah variabel acak, tetapi ternyata variabel acak konstan (yaitu trivial degenerate), karena σ- aljabar yang dihasilkan oleh Ω , { , Ω }E[ξ|G](ω)E[ξ|G]ωE[ξ|Ω]σΩ{,Ω} is trivial/degenerate, and then technically speaking the constant value of this constant random variable, is E[ξ]E[ξ], where here EE denotes regular expectation and thus a number, not conditional expectation and thus not a random variable.

Also you seem to be confused about what the notation E[ξ|A]E[ξ|A] means; technically speaking it is only possible to condition on σσalgebras, not on individual events, since probability measures are only defined on complete σσalgebras, not on individual events. Thus, E[ξ|A]E[ξ|A] is just (lazy) shorthand for E[ξ|σ(A)]E[ξ|σ(A)], where σ(A)σ(A) stands for the σσalgebra generated by the event AA, which is {,A,Ac,Ω}{,A,Ac,Ω}. Note that σ(A)=G=σ(Ac)σ(A)=G=σ(Ac); in other words, E[ξ|A]E[ξ|A], E[ξ|G]E[ξ|G], and E[ξ|Ac]E[ξ|Ac] are all different ways to denote the exact same object.

Finally I just want to add that the intuitive explanation I gave above explains why the constant value of the random variable E[ξ|Ω]=E[ξ|σ(Ω)]=E[ξ|{,Ω}]E[ξ|Ω]=E[ξ|σ(Ω)]=E[ξ|{,Ω}] is just the number E[ξ]E[ξ] -- the σσalgebra {,Ω}{,Ω} represents the least possible amount of information we could have, in fact essentially no information, so under this extreme circumstance the best possible guess we could have for which random variable ξξ is is the constant random variable whose constant value is E[ξ]E[ξ].

Note that all constant random variables are L2L2 random variables, and they are all measurable with respect to the trivial σσ-algebra {,Ω}{,Ω}, so indeed we do have that the constant random E[ξ]E[ξ] is the orthogonal projection of ξξ onto the subspace of L2(Ω)L2(Ω) consisting of random variables measurable with respect to {,Ω}{,Ω}, as was claimed.

Chill2Macht
sumber
2
@William I disagree with you about the use of E[ξ|A]E[ξ|A] as a ran var. Many books define E[ξ|A]E[ξ|A] to be a number, not a ran var. It is the best possible estimate of ξ|Aξ|A. This is a useful notion and highly intuitive. Disregarding it completely, just because you have a generalized notion of cond exp as a ran var is wrong from a pedagogical point-of-view. I am not confused about what a r.v. is, nor do I see how anything I wrote would lead you to thinking like that.
Nicolas Bourbaki
1
@William Thinking of cond expe as an estimate to the ran var with GG representing information, is something I have seen said before but I never gave it that much thought and tried to find a different way of visualizing cond expec. Using your suggestion, I am going to write up a simple example, and post it as an answer, for myself, and for other people. Perhaps, some people can then elaborate on my example and give a more exotic one.
Nicolas Bourbaki
1
@NicolasBourbaki I recommend that you look at p.221 of the 4th edition of Durrett's Probability - Theory and Examples. I can refer you to other sources discussing this as well. In any case, it is not really a matter of opinion -- in the most general case, a conditional expectation is a random variable, and conditioning is only done with respect to σσalgebras; conditioning with respect to an event is conditioning with respect to the σσalgebra generated by the event, and conditioning with respect to a random variable is conditioning w.r.t. the σσ-algebra generated by the RV
Chill2Macht
3
@William And I can refer you to sources which do define the cond. exep. of an event to be a real number. I do not know why you are so stuck on this point. One can define it any way, as long as the notions are not mixed up. For pedagogical reasons, teaching a class on prob. theory, and instantly jumping into the most general def., is not illuminating. In either case, it really does not matter in this discussion, and your complaint is about notation/semantics.
Nicolas Bourbaki
1
@NicolasBourbaki Chapter 5 of Whittle's Probability via Expectation gives a very good account (in my opinion) of both characterizations of conditional expectation, and explains well how each definition relates to and is motivated by the other definition. You are right that the distinction is one more of semantics. My enthusiasm for the more general definition stems (I think) from reading this chapter (5 of Whittle's Probability via Expectation), which made (I believe) good arguments about how the more general definition is in some ways easier to understand.
Chill2Macht
3

I am going to try to elaborate what William suggested.

Let ΩΩ be the sample space of tossing a coin twice. Define the ran. var. ξξ to be the num. of heads that occur in the experiment. Clearly, E[ξ]=1E[ξ]=1. One way of thinking of what 11, as an expec. value, represents is as the best possible estimate for ξξ. If we had to take a guess for what value ξξ would take, we would guess 11. This is because E[(ξ1)2]E[(ξa)2]E[(ξ1)2]E[(ξa)2] for any real number aa.

Denote by A={HT,HH}A={HT,HH} to be the event that the first outcome is a head. Let G={,A,Ac,Ω}G={,A,Ac,Ω} be the σσ-alg. gen. by AA. We think of GG as representing what we know after the first toss. After the first toss, either heads occured, or heads did not occur. Hence, we are either in the event AA or AcAc after the first toss.

If we are in the event AA, then the best possible estimate for ξξ would be E[ξ|A]=1.5E[ξ|A]=1.5, and if we are in the event AcAc, then the best possible estimate for ξξ would be E[ξ|Ac]=0.5E[ξ|Ac]=0.5.

Now define the ran. var. η(ω)η(ω) to be either 1.51.5 or 0.50.5 depending on whether or not ωAωA. This ran. var. ηη, is a better approximation than 1=E[ξ]1=E[ξ] since E[(ξη)2]E[(ξ1)2]E[(ξη)2]E[(ξ1)2].

What ηη is doing is providing the answer to the question: what is the best estimate of ξξ after the first toss? Since we do not know the information after the first toss, ηη will depend on AA. Once the event GG is revealed to us, after the first toss, the value of ηη is determined and provides the best possible estimate for ξξ.

The problem with using ξξ as its own estimate, i.e. 0=E[(ξξ)2]E[(ξη)2]0=E[(ξξ)2]E[(ξη)2] is as follows. ξ is not well-defined after the first toss. Say the outcome of the experiment is ω with first outcome being heads, we are in the event A, but what is ξ(ω)=? We do not know from just the first toss, that value is ambiguous to us, and so ξ is not well-defined. More formally, we say that ξ is not G-measurable i.e. its value is not well-defined after the first toss. Thus, η is the best possible estimate of ξ after the first toss.

Perhaps, somebody here can come up with a more sophisticated example using the sample space [0,1], with ξ(ω)=ω, and G some non-trivial σ-algebra.

Nicolas Bourbaki
sumber
1

Although you request not to use the formal definition, I think that the formal definition is probably the best way of explaining it.

Wikipedia - conditional expectation:

Then a conditional expectation of X given H, denoted as E(XH), is any H-measurable function ( ΩRn) which satisfies:

HE(XH)dP=HXdPfor eachHH

Firstly, it is a H-measurable function. Secondly it has to match the expectation over every measurable (sub)set in H. So for an event,A, the sigma algebra is {A,AC,,Ω}, so clearly it is set as you specified in your question for ωA/Ac. Similarly for any discrete random variable ( and combinations of them), we list out all primitive events and assign the expectation given that primitive event.

Now consider tossing a coin an infinite number of times, where at each toss i, you get 1/2i, if your coin is tails then your total winnings are X=i=112ici where ci = 1 for tails and 0 for heads. Then X is a real random variable on [0,1]. After n coin tosses, you know the value of X to precision 1/2n, eg after 2 coin tosses it is in [0,1/4], [1/4,1/2], [1/2,3/4] or [3/4,1] - after every coin toss, your associated sigma algebra is getting finer and finer, and similarly the conditional expectation of X is getting more and more precise.

Hopefully this example of a real valued random variable with a sequence of sigma algebras getting finer and finer (Filtration) gets you away from the purely event based intuition you are used to, and clarifies its purpose.

seanv507
sumber
I apologize, but I downvoted this question. It does not answer what I originally asked. Nor does it provide any new information that I did not know before.
Nicolas Bourbaki
What I am trying to suggest to you is you do not understand the formal definition as well as you think you do (as the other answer also suggested), so unless you work through what is unintuitive with the formal definition you will not progress.
seanv507
I understand the formal definition just fine. The questions that I asked, I know how to answer them when working from the formal definitions. The 'other answer', was trying to explain my question without using the definition of con. exp.
Nicolas Bourbaki