If so, what is it? If not, why not?
For a sample on the line, the median minimizes the total absolute deviation. It would seem natural to extend the definition to $\mathbb{R}^2$, etc., but I've never seen it. But then, I've been out in left field for a long time.
multivariate-analysis
spatial
median
phv3773
Answers:
I am not sure there is one accepted definition of a multivariate median. The one I am familiar with is Oja's median point, which minimizes the sum of the volumes of the simplices formed over subsets of the points. (See the link for a technical definition.)
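To make the Oja criterion concrete, here is a minimal 2-D sketch, where the simplices are triangles formed by a candidate point and every pair of data points. The function names (`triangle_area`, `oja_objective`, `oja_median_grid`) and the coarse grid search are assumptions made for illustration; a grid search only approximates the minimizer and is not how the Oja median is normally computed.

```python
from itertools import combinations


def triangle_area(p, q, r):
    """Area of the triangle with 2-D vertices p, q, r."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0


def oja_objective(theta, points):
    """Sum of the areas of all triangles formed by theta and a pair of data points."""
    return sum(triangle_area(theta, p, q) for p, q in combinations(points, 2))


def oja_median_grid(points, steps=50):
    """Approximate minimizer of the Oja objective on a grid over the bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [
        (x_lo + (x_hi - x_lo) * i / steps, y_lo + (y_hi - y_lo) * j / steps)
        for i in range(steps + 1)
        for j in range(steps + 1)
    ]
    return min(grid, key=lambda theta: oja_objective(theta, points))


square = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(oja_objective((1, 1), square))  # 4.0
print(oja_objective((0, 0), square))  # 6.0
```

For the four corners of a square the objective is smallest at the center, which matches what one would expect of a median.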
Update: The site referenced for the Oja definition above also has a nice paper covering a number of definitions of the multivariate median.
As @Ars said, there is no single accepted definition (and this is a good point). There are several general families of ways to generalize quantiles to $\mathbb{R}^d$; I think the most significant are:
Generalized quantile process. Let $P_n(A)$ be the empirical measure (the proportion of observations in $A$). Then, with $\mathbb{A}$ a well-chosen subset of the Borel sets of $\mathbb{R}^d$ and $\lambda$ a real-valued measure, you can define the empirical quantile function:
$$U_n(t) = \inf\{\lambda(A) : P_n(A) \ge t,\ A \in \mathbb{A}\}.$$
Suppose you can find one $A_t$ that attains the minimum. Then the set (or an element of the set) $A_{1/2-\epsilon} \cap A_{1/2+\epsilon}$ gives you the median when $\epsilon$ is made small enough. The usual definition of the median is recovered with $\mathbb{A} = \{\,]-\infty, x] : x \in \mathbb{R}\}$ and $\lambda(]-\infty, x]) = x$. @Ars's answer falls into that framework, I guess. Tukey's halfspace location can be obtained using $\mathbb{A}(a) = \{H_x = \{t \in \mathbb{R}^d : \langle a, t \rangle \le x\}\}$ and $\lambda(H_x) = x$ (with $x \in \mathbb{R}$, $a \in \mathbb{R}^d$).
Variational definition and M-estimation. The idea here is that the $\alpha$-quantile $Q_\alpha$ of a random variable $Y$ in $\mathbb{R}$ can be defined through a variational equality.
The most common definition uses the quantile regression function $\rho_\alpha$ (also known as the pinball loss; guess why?): $Q_\alpha = \arg\inf_{x \in \mathbb{R}} E[\rho_\alpha(Y - x)]$. The case $\alpha = 1/2$ gives $\rho_{1/2}(y) = |y|$ (up to a constant factor, which does not change the minimizer), and you can generalize that to higher dimensions using $\ell^1$ distances, as done in @Srikant's answer. This is the theoretical median, but it gives you the empirical median if you replace the expectation with the empirical expectation (the sample mean).
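As a rough illustration of the variational definition in one dimension, the sketch below minimizes the empirical pinball risk. It assumes the common form $\rho_\alpha(u) = u(\alpha - 1\{u < 0\})$, and the helper names `pinball_loss` and `empirical_quantile` are made up for this example. Because the empirical risk is piecewise linear and convex, a minimizer can always be found among the sample values, so the search is restricted to those.

```python
def pinball_loss(u, alpha):
    """Pinball loss rho_alpha(u) = u * (alpha - 1{u < 0})."""
    return u * (alpha - (1.0 if u < 0 else 0.0))


def empirical_quantile(sample, alpha):
    """Sample value that minimizes the mean pinball loss (first one if tied)."""
    def risk(x):
        return sum(pinball_loss(y - x, alpha) for y in sample) / len(sample)

    return min(sorted(sample), key=risk)


sample = [1, 2, 3, 4, 100]
print(empirical_quantile(sample, 0.5))  # 3, the ordinary sample median
```

With $\alpha = 1/2$ the loss is $|u|/2$, so the minimizer is the usual median and the outlier 100 has no effect on it.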
Obviously there are bridges between the different formulations. They are not all obvious...
There are distinct ways to generalize the concept of median to higher dimensions. One not yet mentioned, but which was proposed long ago, is to construct a convex hull, peel it away, and iterate for as long as you can: what's left in the last hull is a set of points that are all candidates to be "medians."
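The peeling idea can be sketched in a few lines for 2-D data. This is an illustrative version only: it assumes points are given as tuples, uses Andrew's monotone chain for the hull (a choice made here, not something the answer specifies), and treats points lying on a hull edge but not at a vertex as interior. The names `convex_hull` and `peel` are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices via Andrew's monotone chain, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def peel(points):
    """Remove hull layers until the next removal would leave nothing."""
    remaining = list(set(points))
    while True:
        hull = set(convex_hull(remaining))
        inner = [p for p in remaining if p not in hull]
        if not inner:
            return sorted(remaining)  # the innermost layer: the "median" candidates
        remaining = inner


cloud = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 1), (3, 1), (3, 3), (1, 3), (2, 2)]
print(peel(cloud))  # [(2, 2)]
```

In general the innermost layer contains several points, which is why the result is a set of candidates rather than a single median.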
"Head-banging" is another more recent attempt (c. 1980) to construct a robust center to a 2D point cloud. (The link is to documentation and software available at the US National Cancer Institute.)
The principal reason why there are multiple distinct generalizations and no one obvious solution is that $\mathbb{R}^1$ has a natural ordering but $\mathbb{R}^2$, $\mathbb{R}^3$, ... do not.
The geometric median is the point with the smallest average Euclidean distance from the samples.
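The geometric median has no closed form in general, but it can be approximated iteratively. The sketch below uses Weiszfeld's iteration, which is one standard choice and not something this answer prescribes. The function names, the starting point (the centroid), and the handling of an iterate that lands on a data point (that point is simply skipped) are simplifying assumptions. It needs Python 3.8+ for `math.dist`.

```python
import math


def total_distance(center, points):
    """Sum of Euclidean distances from center to every point."""
    return sum(math.dist(center, p) for p in points)


def geometric_median(points, iterations=200, tol=1e-9):
    """Approximate the geometric median with Weiszfeld's iteration."""
    dim = len(points[0])
    # Start from the centroid (coordinate-wise mean).
    current = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(iterations):
        weight_sum = 0.0
        weighted = [0.0] * dim
        for p in points:
            d = math.dist(current, p)
            if d < tol:
                continue  # simplification: skip a coincident point to avoid dividing by zero
            w = 1.0 / d
            weight_sum += w
            for i in range(dim):
                weighted[i] += w * p[i]
        if weight_sum == 0.0:
            break
        updated = [v / weight_sum for v in weighted]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current


quad = [(0, 0), (1, 0), (0, 1), (10, 10)]
print(geometric_median(quad))  # close to (0.5, 0.5); the centroid is (2.75, 2.75)
```

The example shows the robustness that motivates the definition: the far-away point (10, 10) drags the centroid with it but barely moves the geometric median.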
The Tukey halfspace median can be extended to >2 dimensions using DEEPLOC, an algorithm due to Struyf and Rousseeuw; see here for details.
The algorithm is used to approximate the point of greatest depth efficiently; naive methods which attempt to determine this exactly usually run afoul of (the computational version of) "the curse of dimensionality", where the runtime required to calculate a statistic grows exponentially with the number of dimensions of the space.
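For intuition about what "depth" means here, the following 2-D sketch estimates the halfspace depth of a point by scanning a fixed set of directions. It is not DEEPLOC; it is a naive illustration with hypothetical names (`halfspace_depth`, `n_directions`). Because only finitely many directions are tried, the returned value is an upper bound on the true depth, and a small tolerance is used so that points on the boundary line count as inside the closed half-plane.

```python
import math


def halfspace_depth(theta, points, n_directions=360):
    """Approximate Tukey depth of theta: the fewest points in any scanned
    closed half-plane whose boundary line passes through theta."""
    best = len(points)
    for k in range(n_directions):
        angle = 2.0 * math.pi * k / n_directions
        ax, ay = math.cos(angle), math.sin(angle)
        count = sum(
            1
            for p in points
            if ax * (p[0] - theta[0]) + ay * (p[1] - theta[1]) >= -1e-12
        )
        best = min(best, count)
    return best


cloud = [(1, 1), (1, -1), (-1, 1), (-1, -1), (0, 0)]
print(halfspace_depth((0, 0), cloud))  # 3
print(halfspace_depth((1, 1), cloud))  # 1
print(halfspace_depth((5, 5), cloud))  # 0, the point lies outside the cloud

# Deepest sample point, as a crude stand-in for the halfspace median.
print(max(cloud, key=lambda p: halfspace_depth(p, cloud)))  # (0, 0)
```

Scanning directions like this scales poorly as the dimension grows, which is the practical difficulty the paragraph above describes.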
A definition that comes close to it, for unimodal distributions, is the Tukey halfspace median.
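I do not know if any such definition exists, but I will try to extend the standard definition of the median to $\mathbb{R}^2$. I will use the following notation:
$X$, $Y$: the random variables associated with the two dimensions.
$m_x$, $m_y$: the corresponding medians.
$f(x,y)$: the joint density of $X$ and $Y$.
To extend the definition of the median to $\mathbb{R}^2$, we choose $m_x$ and $m_y$ to minimize the following:
$$E\big[\,|(X,Y) - (m_x,m_y)|\,\big]$$
The problem now is that we need a definition for what we mean by:
$$|(x,y) - (m_x,m_y)|$$
The above is in a sense a distance metric, and several candidate definitions are possible.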
Euclidean metric
$$|(x,y) - (m_x,m_y)| = \sqrt{(x-m_x)^2 + (y-m_y)^2}$$
Computing the median under the Euclidean metric requires computing the expectation of the above with respect to the joint density $f(x,y)$.
Taxicab metric
$$|(x,y) - (m_x,m_y)| = |x-m_x| + |y-m_y|$$
Computing the median under the taxicab metric involves computing the medians of $X$ and $Y$ separately, as the metric is separable in $x$ and $y$.
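The separability of the taxicab case is easy to check on a sample, with the expectation replaced by a sum over observations. This is a small sketch under that assumption; `taxicab_median` and `total_taxicab_distance` are names invented for the example.

```python
from statistics import median


def taxicab_median(points):
    """Coordinate-wise median: minimizes the total taxicab (L1) distance."""
    xs, ys = zip(*points)
    return (median(xs), median(ys))


def total_taxicab_distance(m, points):
    """Sum of |x - m_x| + |y - m_y| over the sample."""
    return sum(abs(x - m[0]) + abs(y - m[1]) for x, y in points)


sample = [(0, 0), (1, 5), (2, 1), (10, 2), (3, 3)]
m = taxicab_median(sample)
print(m)                                  # (2, 2)
print(total_taxicab_distance(m, sample))  # 19
```

Note that (2, 2) is not one of the observations, and that the result depends on the choice of coordinate axes: rotating the data changes the coordinate-wise median, whereas the Euclidean version is rotation invariant.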