Consider the simple linear model:
$$y = X'\beta + \epsilon$$
where $\epsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$
My question is, given $E(X'X)$
* I assumed, writing this, that getting $E(R^2)$
EDIT1
Using the solution derived by Stéphane Laurent (see below), we can get a non-trivial upper bound on $E(R^2)$.
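Stéphane Laurent derived the following: $R^2 \sim B(p-1, n-p, \lambda)$ where
$$\lambda = \frac{\|X'\beta - E(X)'\beta 1_n\|^2}{\sigma^2}$$
So
$$E(R^2) = E\left(\frac{\chi^2_{p-1}(\lambda)}{\chi^2_{p-1}(\lambda) + \chi^2_{n-p}}\right) \leq \frac{E(\chi^2_{p-1}(\lambda))}{E(\chi^2_{p-1}(\lambda)) + E(\chi^2_{n-p})}$$
where $\chi^2_k(\lambda)$ is a noncentral chi-square variable with $k$ degrees of freedom and noncentrality parameter $\lambda$. Since $E(\chi^2_k(\lambda)) = k + \lambda$, the bound is
$$\frac{\lambda + p - 1}{\lambda + n - 1}$$
It is very tight (much tighter than I would have expected to be possible).
For example, using: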
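rho <- 0.75                       # common correlation between the regressors
p <- 10                           # number of coefficients, intercept included
n <- 25 * p                       # sample size
Su <- matrix(rho, p - 1, p - 1)   # correlation matrix of the p - 1 non-constant regressors
diag(Su) <- 1
su <- 1                           # presumably the error standard deviation sigma
set.seed(123)
bet <- runif(p)                   # regression coefficients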
The mean of $R^2$ over 1000 simulations is 0.960819. The theoretical upper bound above gives 0.9609081. The bound seems to be equally precise across many values of $R^2$. Truly astonishing!
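The setup above stops before the simulation itself. A minimal sketch of how such a comparison could be run is given below. It is not the original poster's code: it assumes the design matrix is drawn once (via a Cholesky factor of `Su`) and then held fixed, that `su` is the error standard deviation, and that $\lambda$ is computed from that fixed design, so the numbers it prints need not match those quoted above.

```r
# Sketch: Monte Carlo mean of R^2 versus the bound (lambda + p - 1) / (lambda + n - 1).
# Assumptions (not from the original post): X is drawn once and held fixed,
# su is the error standard deviation, lambda is computed from the fixed design.
rho <- 0.75
p   <- 10
n   <- 25 * p
Su  <- matrix(rho, p - 1, p - 1)
diag(Su) <- 1
su  <- 1
set.seed(123)
bet <- runif(p)

# Draw the p - 1 correlated regressors with a Cholesky factor; the intercept is added below.
Z  <- matrix(rnorm(n * (p - 1)), n, p - 1) %*% chol(Su)
X  <- cbind(1, Z)
mu <- drop(X %*% bet)

# Noncentrality parameter: squared norm of the centered mean vector over sigma^2.
lambda <- sum((mu - mean(mu))^2) / su^2
bound  <- (lambda + p - 1) / (lambda + n - 1)

# Simulate R^2 from repeated OLS fits on the same design.
nsim <- 1000
r2 <- replicate(nsim, {
  y <- mu + rnorm(n, sd = su)
  summary(lm(y ~ Z))$r.squared
})

cat("mean R^2:", mean(r2), " bound:", bound, "\n")
```

Holding the design fixed keeps the simulation in line with the distributional result, which is conditional on $X$.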
EDIT2:
After further research, it appears that the quality of the upper-bound approximation to $E(R^2)$ gets better as $\lambda + p$ increases (and, all else equal, $\lambda$ increases with $n$).
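One way to look at the gap without simulation noise is to compute $E(R^2)$ directly. The sketch below assumes the standard representation of a noncentral chi-square as a Poisson($\lambda/2$) mixture of central chi-squares, under which $E(R^2)$ is a weighted sum of central Beta means. The function names `exact_mean_r2` and `upper_bound`, the truncation rule for the series, and the grid of $\lambda$ values are illustrative choices, not part of the original post.

```r
# Sketch: E(R^2) from the Poisson-mixture series, compared with the bound.
# exact_mean_r2, upper_bound and the truncation rule are hypothetical helpers.
exact_mean_r2 <- function(lambda, p, n, jmax = NULL) {
  # Truncate the series far into the upper tail of the Poisson(lambda / 2) weights.
  if (is.null(jmax)) jmax <- ceiling(lambda / 2 + 20 * sqrt(lambda / 2 + 1) + 50)
  j <- 0:jmax
  w <- dpois(j, lambda / 2)
  # Given J = j, R^2 is a central Beta with mean (p - 1 + 2j) / (n - 1 + 2j).
  sum(w * (p - 1 + 2 * j) / (n - 1 + 2 * j))
}

upper_bound <- function(lambda, p, n) (lambda + p - 1) / (lambda + n - 1)

p <- 10
n <- 250
lambdas <- c(1, 10, 100, 1000, 10000)
gap <- sapply(lambdas, function(l) upper_bound(l, p, n) - exact_mean_r2(l, p, n))
print(data.frame(lambda = lambdas, gap = gap))
```

The printed `gap` column is the difference between the bound and the series value, so it shows how the approximation behaves as $\lambda$ varies for fixed $p$ and $n$.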
Answer:
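Any linear model can be written $Y = \mu + \sigma G$ where $G$ has the standard normal distribution on $\mathbb{R}^n$ and $\mu$ is assumed to belong to a linear subspace $W$ of $\mathbb{R}^n$. In your case $W = \mathrm{Im}(X)$.
Let $[1] \subset W$ be the one-dimensional linear subspace generated by the vector $(1, 1, \ldots, 1)$. Taking $U = [1]$ below, the $R^2$ is closely related to the classical Fisher statistic
$$F = \frac{\|P_Z Y\|^2/(m-\ell)}{\|P_W^\perp Y\|^2/(n-m)},$$
for the hypothesis test of $H_0\colon \{\mu \in U\}$, where $U \subset W$ is a linear subspace, $Z = U^\perp \cap W$ is the orthogonal complement of $U$ in $W$, $m = \dim(W)$ and $\ell = \dim(U)$ (so $m = p$ and $\ell = 1$ in your situation). Here $P_V$ denotes the orthogonal projection onto a subspace $V$ and $P_V^\perp = I - P_V$.
Indeed,
$$\frac{\|P_Z Y\|^2}{\|P_W^\perp Y\|^2} = \frac{R^2}{1-R^2}$$
because $R^2 = \|P_Z Y\|^2 / \|P_U^\perp Y\|^2$ and $\|P_U^\perp Y\|^2 = \|P_Z Y\|^2 + \|P_W^\perp Y\|^2$.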
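The relation between $F$ and $R^2$ can be checked numerically: rearranged, it says $F = \frac{R^2}{1-R^2} \cdot \frac{n-m}{m-\ell}$. The following sketch uses made-up data (the sizes, coefficients and seed are arbitrary assumptions) and compares this expression with the overall $F$ statistic reported by `summary.lm`.

```r
# Sketch: the overall F statistic recovered from R^2 (here m = p and l = 1).
set.seed(1)
n <- 50
p <- 4                                   # illustrative sizes only
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
y <- drop(X %*% c(1, 0.5, -0.3, 0.2)) + rnorm(n)

s  <- summary(lm(y ~ X[, -1]))           # lm adds its own intercept
r2 <- s$r.squared

f_from_r2 <- (r2 / (1 - r2)) * (n - p) / (p - 1)
f_lm      <- unname(s$fstatistic["value"])

print(all.equal(f_from_r2, f_lm))
```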
Obviously $P_Z Y = P_Z \mu + \sigma P_Z G$ and $P_W^\perp Y = \sigma P_W^\perp G$.
When $H_0\colon \{\mu \in U\}$ is true then $P_Z \mu = 0$ and therefore
$$F = \frac{\|P_Z G\|^2/(m-\ell)}{\|P_W^\perp G\|^2/(n-m)} \sim F_{m-\ell,\,n-m}$$
In the general situation we have to deal with $P_Z Y = P_Z \mu + \sigma P_Z G$ when $P_Z \mu \neq 0$. In this general case one has $\|P_Z Y\|^2 \sim \sigma^2 \chi^2_{m-\ell}(\lambda)$, the noncentral $\chi^2$ distribution with $m-\ell$ degrees of freedom and noncentrality parameter $\lambda = \frac{\|P_Z \mu\|^2}{\sigma^2}$, and then $F \sim F_{m-\ell,\,n-m}(\lambda)$ (noncentral Fisher distribution). This is the classical result used to compute the power of $F$-tests.
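As an illustration of that power computation, here is a short sketch using the `ncp` argument of R's `pf`. The helper name `f_power`, the level `alpha = 0.05` and the example values of $\lambda$, $m$, $\ell$ and $n$ are assumptions chosen for the example.

```r
# Sketch: power of the F-test of H0 at level alpha under noncentrality lambda.
# f_power is a hypothetical helper; the argument values below are illustrative.
f_power <- function(lambda, m, l, n, alpha = 0.05) {
  # Critical value from the central F distribution (the null distribution).
  crit <- qf(1 - alpha, df1 = m - l, df2 = n - m)
  # Probability of exceeding it under the noncentral F distribution.
  pf(crit, df1 = m - l, df2 = n - m, ncp = lambda, lower.tail = FALSE)
}

pw <- f_power(lambda = c(0, 5, 20), m = 10, l = 1, n = 250)
print(pw)
```

With $\lambda = 0$ the function returns the level of the test, and the power grows with $\lambda$.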
The classical relation between the Fisher distribution and the Beta distribution holds in the noncentral situation too. Finally, $R^2$ has the noncentral beta distribution with "shape parameters" $m-\ell$ and $n-m$ and noncentrality parameter $\lambda$. I think the moments are available in the literature but they are possibly highly complicated.
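For the first moment at least, a fairly compact expression seems to be available. The sketch below assumes the standard Poisson-mixture representation of the noncentral $\chi^2$ distribution; the mixing variable $J$ is notation introduced here, not taken from the argument above.

$$
\begin{aligned}
\chi^2_{m-\ell}(\lambda) \mid J &\sim \chi^2_{m-\ell+2J}, \qquad J \sim \mathrm{Poisson}(\lambda/2), \\
E(R^2 \mid J = j) &= \frac{m-\ell+2j}{n-\ell+2j} = 1 - \frac{n-m}{n-\ell+2j}, \\
E(R^2) &= \sum_{j \geq 0} e^{-\lambda/2} \frac{(\lambda/2)^j}{j!} \, \frac{m-\ell+2j}{n-\ell+2j}.
\end{aligned}
$$

Since $j \mapsto 1 - \frac{n-m}{n-\ell+2j}$ is concave and $E(J) = \lambda/2$, Jensen's inequality gives

$$
E(R^2) \leq \frac{m-\ell+\lambda}{n-\ell+\lambda},
$$

which reads $\frac{\lambda+p-1}{\lambda+n-1}$ when $m = p$ and $\ell = 1$.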
Finally let us write down $P_Z \mu$. Note that $P_Z = P_W - P_U$. One has $P_U \mu = \bar{\mu} 1$ when $U = [1]$, and $P_W \mu = \mu$. Hence $P_Z \mu = \mu - \bar{\mu} 1$, where here $\mu = X\beta$ for the unknown parameter vector $\beta$.
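The identity $P_Z \mu = \mu - \bar{\mu} 1$ can be verified by building the projection matrices explicitly. In the sketch below the design, `beta` and `sigma` are arbitrary values chosen for illustration, and `proj` is a hypothetical helper that forms the orthogonal projector onto the column space of a full-rank matrix.

```r
# Sketch: check P_Z mu = mu - mean(mu) * 1 and compute lambda from it.
# proj, beta and sigma are illustrative assumptions.
set.seed(42)
n <- 30
p <- 3
X     <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
beta  <- c(2, 1, -1)
sigma <- 1.5
mu    <- drop(X %*% beta)

# Orthogonal projector onto Im(A), assuming A has full column rank.
proj <- function(A) A %*% solve(crossprod(A), t(A))

P_W <- proj(X)                  # projector onto W = Im(X)
P_U <- proj(matrix(1, n, 1))    # projector onto U = [1]
P_Z <- P_W - P_U                # projector onto Z, the complement of U in W

pz_mu  <- drop(P_Z %*% mu)
lambda <- sum(pz_mu^2) / sigma^2

print(all.equal(pz_mu, mu - mean(mu)))
print(lambda)
```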