Which is the largest of a set of normally distributed random variables?

14

I have random variables $X_0,X_1,\dots,X_n$. $X_0$ has a normal distribution with mean $\mu>0$ and variance $1$. The random variables $X_1,\dots,X_n$ are normally distributed with mean $0$ and variance $1$. All of them are mutually independent.

Let $E$ denote the event that $X_0$ is the largest of these, i.e., $X_0 > \max(X_1,\dots,X_n)$. I want to calculate or estimate $\Pr[E]$. I'm looking for an expression for $\Pr[E]$ as a function of $\mu$ and $n$, or a reasonable estimate or approximation for $\Pr[E]$.

In my application, $n$ is fixed ( $n=61$ ) and I want to find the smallest value for $\mu$ that makes $\Pr[E] \ge 0.99$ , but I'm curious about the general question as well.

probability normal-distribution D.W.

How large is $n$? There ought to be some good asymptotic expressions based on large-sample theory.

whuber

@whuber, thanks! I edited the question: in my case $n=61$. Even if $n=61$ isn't large enough to count as large, if there are good asymptotic estimates in the case where $n$ is large, that'd be interesting.

D.W.

5

Using numerical integration, $\mu \approx 4.91912496$.

whuber

14

The calculation of such probabilities has been studied extensively by communications engineers under the name $M$-ary orthogonal signaling, where the model is that one of $M$ equal-energy, equally likely orthogonal signals is transmitted and the receiver attempts to decide which one was transmitted by examining the outputs of $M$ filters matched to the signals. Conditioned on the identity of the transmitted signal, the sample outputs of the matched filters are (conditionally) independent unit-variance normal random variables. The sample output of the filter matched to the transmitted signal is a $N(\mu,1)$ random variable, while the outputs of all the other filters are $N(0,1)$ random variables.

The conditional probability of a correct decision (which in the present context is the event $C = \{X_0 > \max_i X_i\}$) conditioned on $X_0 = \alpha$ is $P(C \mid X_0 = \alpha) = \prod_{i=1}^n P\{X_i < \alpha \mid X_0 = \alpha\} = \left[\Phi(\alpha)\right]^n$ where $\Phi(\cdot)$ is the cumulative distribution function of a standard normal random variable, and hence the unconditional probability is $P(C) = \int_{-\infty}^{\infty}P(C \mid X_0 = \alpha) \phi(\alpha-\mu)\,\mathrm d\alpha = \int_{-\infty}^{\infty}\left[\Phi(\alpha)\right]^n \phi(\alpha-\mu)\,\mathrm d\alpha$ where $\phi(\cdot)$ is the standard normal density function. There is no closed-form expression for the value of this integral, which must be evaluated numerically. Engineers are also interested in the complementary event, that the decision is in error, but do not like to compute this as $P\{X_0 < \max_i X_i\} = P(E) = 1-P(C)$ (here $E$ stands for the error event, not the event $E$ of the question) because this requires very careful evaluation of the integral for $P(C)$ to an accuracy of many significant digits, and such evaluation is both difficult and time-consuming. Instead, the integral for $1-P(C)$ can be integrated by parts to get $P\{X_0 < \max_i X_i\} = \int_{-\infty}^{\infty} n \left[\Phi(\alpha)\right]^{n-1}\phi(\alpha) \Phi(\alpha - \mu)\,\mathrm d\alpha.$ This integral is easier to evaluate numerically, and its value as a function of $\mu$ is graphed and tabulated (though unfortunately only for $n \leq 20$) in Chapter 5 of Telecommunication Systems Engineering by Lindsey and Simon, Prentice-Hall 1973, Dover Press 1991. Alternatively, engineers use the union bound or Bonferroni inequality $\begin{align*} P\{X_0 < \max_i X_i\} &= P\left\{(X_0 < X_1)\cup (X_0 < X_2) \cup \cdots \cup (X_0 < X_n)\right\}\\ &\leq \sum_{i=1}^{n}P\{X_0 < X_i\}\\ &= nQ\left(\frac{\mu}{\sqrt{2}}\right) \end{align*}$ where $Q(x) = 1-\Phi(x)$ is the complementary cumulative normal distribution function.
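
For readers who want the integration-by-parts step behind the error-probability integral spelled out, one way to write it (a sketch, using $\int \phi(\alpha-\mu)\,\mathrm d\alpha = 1$ and taking $u = 1-\left[\Phi(\alpha)\right]^n$, $v = \Phi(\alpha-\mu)$) is

$\begin{align*} 1 - P(C) &= \int_{-\infty}^{\infty}\left(1-\left[\Phi(\alpha)\right]^n\right)\phi(\alpha-\mu)\,\mathrm d\alpha\\ &= \Big[\left(1-\left[\Phi(\alpha)\right]^n\right)\Phi(\alpha-\mu)\Big]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} n\left[\Phi(\alpha)\right]^{n-1}\phi(\alpha)\,\Phi(\alpha-\mu)\,\mathrm d\alpha\\ &= \int_{-\infty}^{\infty} n\left[\Phi(\alpha)\right]^{n-1}\phi(\alpha)\,\Phi(\alpha-\mu)\,\mathrm d\alpha. \end{align*}$

The boundary term vanishes because $1-\left[\Phi(\alpha)\right]^n \to 0$ as $\alpha \to \infty$ and $\Phi(\alpha-\mu) \to 0$ as $\alpha \to -\infty$.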

From the union bound, we see that $P\{X_0 < \max_i X_i\}$ is bounded above by $60\cdot Q(\mu/\sqrt{2})$, and this bound equals the desired value $0.01$ at $\mu = 5.09\ldots$. This is slightly larger than the more exact value $\mu = 4.919\ldots$ obtained by @whuber by numerical integration.
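
The comparison between the two values of $\mu$ can be reproduced with a short script. The following is only a sketch using the Python standard library, not the method either commenter actually used: the function names (`error_prob`, `union_bound`, `solve_mu`), the Simpson's-rule grid on $[-8, 10]$, and the bisection bracket $[0, 10]$ are all assumptions made for illustration, and $n = 61$ is taken from the question.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def error_prob(mu, n, lo=-8.0, hi=10.0, steps=2000):
    """P{X_0 < max_i X_i} by composite Simpson's rule applied to the
    integrated-by-parts integrand n * Phi(a)^(n-1) * phi(a) * Phi(a - mu).
    The limits and step count are illustrative choices; steps must be even."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        a = lo + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * n * Phi(a) ** (n - 1) * phi(a) * Phi(a - mu)
    return total * h / 3.0

def union_bound(mu, n):
    """n * Q(mu / sqrt(2)), using Q(x) = 1 - Phi(x) = Phi(-x)."""
    return n * Phi(-mu / math.sqrt(2.0))

def solve_mu(f, target, lo=0.0, hi=10.0, tol=1e-9):
    """Bisection for the mu at which the decreasing function f hits target."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid   # error still too large: need a bigger mu
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 61  # value from the question
mu_exact = solve_mu(lambda m: error_prob(m, n), 0.01)
mu_bound = solve_mu(lambda m: union_bound(m, n), 0.01)
print(f"numerical integration: mu ~ {mu_exact:.4f}")
print(f"union bound:           mu ~ {mu_bound:.4f}")
```

Because the union bound overestimates the error probability, the $\mu$ it yields is conservative: it should come out somewhat larger than the value from the integral.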

More discussion and details about $M$-ary orthogonal signaling can be found on pp. 161-179 of my lecture notes for a class on communication systems.

Dilip Sarwate

4

A formal answer:

The probability distribution (density) for the maximum of $N$ i.i.d. variates is: $p_N(x)= N p(x) \Phi^{N-1}(x)$ where $p$ is the probability density and $\Phi$ is the cumulative distribution function.

From this you can calculate the probability that $X_0$ is greater than the $N-1$ other ones via $P(E) = (N-1) \int_{-\infty}^{\infty} \int_y^{\infty} p(x_0) p(y) \Phi^{N-2}(y) dx_0 dy$

You may need to look into various approximations in order to tractably deal with this for your specific application.
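
As a sanity check on any such formula or approximation, $\Pr[E]$ can also be estimated by straightforward simulation. The sketch below is an illustration only: the function name `estimate_pr_e`, the trial count, and the fixed seed are assumptions, and it uses the question's convention of one $N(\mu,1)$ variable against $n$ independent $N(0,1)$ variables.

```python
import random

def estimate_pr_e(mu, n, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[X_0 > max(X_1, ..., X_n)]."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    wins = 0
    for _ in range(trials):
        x0 = rng.gauss(mu, 1.0)
        # X_0 wins only if every zero-mean variable falls below it.
        if all(rng.gauss(0.0, 1.0) < x0 for _ in range(n)):
            wins += 1
    return wins / trials

# With mu = 0 all n + 1 variables are exchangeable, so the exact value is 1/(n+1).
print(estimate_pr_e(mu=0.0, n=3))
print(estimate_pr_e(mu=2.0, n=10))
```

Simulation is a poor tool for the question's target of $\Pr[E] \ge 0.99$, since resolving an error probability near $0.01$ precisely needs a very large number of trials, but it is a quick way to catch a mistake in an analytic expression at moderate $\mu$.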

Dave

6

+1 Actually, the double integral simplifies into a single integral since $\int_y^\infty p(x_0)\,\mathrm dx_0 = 1 - \Phi(y-\mu)$, giving $P(E) = 1 - (N-1)\int_{-\infty}^\infty \Phi^{N-2}(y)p(y)\Phi(y-\mu)\,\mathrm dy$, which is the same as in my answer.

Dilip Sarwate
