I am taking Andrew Ng's Machine Learning course through Coursera. For the equations, superscripts are used instead of subscripts. For example, $x^{(i)}$ is used in place of $x_i$.
Apparently, this is common practice. My question is: why use superscripts rather than subscripts? Superscripts are already used for exponentiation. Admittedly, I can seemingly tell the superscript-index and exponentiation use cases apart by noting whether parentheses are present, but it still seems confusing.
machine-learning
notation
entpnerd
Is $i$ indexing over the size of the dataset, or over the elements of the vector $x$? If the former, it is completely standard. If the latter, it is not standard at all. And the reason superscripts are used is that sometimes you also want to refer to the elements of a vector using subscripts.

Answers:
If $x$ denotes a vector $x \in \mathbb{R}^m$, then $x_i$ is the standard notation for the $i$-th coordinate of $x$, that is, $x = (x_1, x_2, \ldots, x_m) \in \mathbb{R}^m$.
If you have a collection of $n$ such vectors, how do you denote the $i$-th one? You cannot write $x_i$, since that already has the other standard meaning. So sometimes people write $x^{(i)}$ instead, and that, I believe, is why Andrew Ng does it.
That is, $x^{(i)}$ denotes the $i$-th vector in the collection, and $x^{(i)}_j$ denotes its $j$-th coordinate.
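As a rough illustration of this double-index idea (a minimal sketch; the names `X`, `n`, and `m` and the toy numbers are mine, not from the course), a dataset of $n$ vectors in $\mathbb{R}^m$ can be stored as an $n \times m$ array, where the row index plays the role of the superscript $(i)$ and the column index plays the role of the subscript $j$:

```python
import numpy as np

# Hypothetical dataset: n = 4 training examples, each a vector in R^m with m = 3.
X = np.arange(12.0).reshape(4, 3)

i, j = 2, 1          # pick an example index and a coordinate index
x_i = X[i]           # the whole i-th example, i.e. x^{(i)}
x_i_j = X[i, j]      # its j-th coordinate, i.e. x^{(i)}_j

print(x_i)    # [6. 7. 8.]
print(x_i_j)  # 7.0
```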
The use of superscripts as you have described is, I believe, not very common in machine learning literature. I'd have to review Ng's course notes to confirm, but if he uses that notation there, I would say he is the origin of its proliferation. This is a possibility. Either way, not to be too unkind, I don't think many of the online course students are publishing literature on machine learning, so this notation is not very common in the actual literature. After all, these are introductory courses in machine learning, not PhD-level courses.
What is very common is to use superscripts to denote the iteration of an algorithm. For example, you could write an iteration of Newton's method as

$$\theta^{(t+1)} = \theta^{(t)} - H(\theta^{(t)})^{-1} \, \nabla_{\theta^{(t)}}$$

where $H(\theta^{(t)})$ is the Hessian and $\nabla_{\theta^{(t)}}$ is the gradient.
(...yes this is not quite the best way to implement Newton's method due to the inversion of the Hessian matrix...)
Here, $\theta^{(t)}$ represents the value of $\theta$ in the $t$-th iteration. This is the most common (but certainly not the only) use of superscripts that I am aware of.
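For concreteness, here is a minimal sketch of that iteration-superscript idea in NumPy. The quadratic objective, the matrices, and the helper names `grad` and `hess` are assumptions for illustration, not from the answer; the loop variable `t` plays the role of the superscript $(t)$:

```python
import numpy as np

# Toy objective: f(theta) = 0.5 * theta^T A theta - b^T theta, with A positive definite,
# so the gradient is A theta - b and the Hessian is the constant matrix A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(theta):
    return A @ theta - b

def hess(theta):
    return A

theta = np.zeros(2)                      # theta^{(0)}
for t in range(10):
    # theta^{(t+1)} = theta^{(t)} - H(theta^{(t)})^{-1} * grad(theta^{(t)}),
    # solving a linear system instead of explicitly inverting the Hessian.
    theta = theta - np.linalg.solve(hess(theta), grad(theta))

print(theta)  # converges to A^{-1} b = [0.6, -0.8]
```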
EDIT: To clarify, the original question appeared to suggest that the ML notation $x^{(i)}$ was equivalent to the statistics notation $x_i$. In my answer, I state that this is not truly prevalent in ML literature. That is true. However, as pointed out by @amoeba, there is plenty of superscript notation in ML literature for data, but in those cases $x^{(i)}$ does not typically mean the $i$-th observation of a single vector $x$.
(...it is not the same as = or == in most programming languages; it introduces a constraint or definition rather than an actual assignment or equality check.)

In mathematics, superscripts are used left and right depending on the field. The choice is always historical legacy, nothing more. Whoever first got into the field set the convention of using sub- or superscripts.
Two examples. Superscripts are used to denote derivatives: $f^{(n)}(x)$ is the $n$-th derivative of $f(x)$.
In tensor algebra both superscripts and subscripts are used heavily for the same kind of thing: $R^i_j$ could mean $i$ rows and $j$ columns. It's quite expressive, e.g. $T^i_k = R^i_j C^j_k$.
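A quick sketch of that contraction with NumPy's `einsum` (the concrete matrices here are made up for illustration): the repeated index $j$ is summed over, which for two-index tensors is just ordinary matrix multiplication of $R$ and $C$.

```python
import numpy as np

R = np.array([[1.0, 2.0], [3.0, 4.0]])   # R^i_j
C = np.array([[5.0, 6.0], [7.0, 8.0]])   # C^j_k

# T^i_k = R^i_j C^j_k : the repeated index j is contracted (summed over)
T = np.einsum('ij,jk->ik', R, C)

print(np.allclose(T, R @ C))  # True: this contraction is matrix multiplication
```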
Also, I remember using scripts before letters (prescripts) in physics, e.g. ${}^i_j B^l_k$. I think it was with tensors.
Hence, the choice of superscripts by Ng is purely historical too. There's no real reason to use or not use them, or prefer them to subscripts. Actually, I believe that here ML people are using tensor notation. They definitely are well versed in the subject, e.g. see this paper.