Stopping criteria for iterative linear solvers applied to nearly singular systems

Consider $Ax=b$ with $A$ nearly singular, meaning that $A$ has a very small eigenvalue $\lambda_0$. The usual stopping criterion of an iterative method is based on the residual $r_n:=b-Ax_n$ and lets the iteration stop when $\|r_n\|/\|r_0\|<\mathrm{tol}$, where $n$ is the iteration number. But in the case we are considering, there could be a large error $v$ living in the eigenspace associated with the small eigenvalue $\lambda_0$, which gives only a small residual $Av=\lambda_0v$. Suppose the initial residual $r_0$ is large; then it may happen that we stop at $\|r_n\|/\|r_0\|<\mathrm{tol}$ while the error $x_n-x$ is still large. What is a better error indicator in this case? Is $\|x_{n}-x_{n-1}\|$ a good candidate?

linear-algebra Hui Zhang

You might want to think about your definition of "nearly singular". The matrix $I \cdot \epsilon$ (with $\epsilon\ll 1$ and $I$ the identity matrix) has very small eigenvalues, but is as far from singular as any matrix could be.

David Ketcheson

Also, $||r_n/r_0||$ seems like the wrong notation. $||r_n||/||r_0||$ is more typical, no?

Bill Barth

Yes, you are right, Bill! I will fix this mistake.

Hui Zhang

What about $\| b - Ax \| / \| b \|$? And what is your algorithm, exactly?

shuhalo

Addendum: I think the following paper pretty much addresses the ill-conditioned systems you are worried about, at least if you are using CG: Axelsson, Kaporin: Error norm estimation and stopping criteria in preconditioned conjugate gradient iterations. DOI: 10.1002/nla.244

shuhalo

Answers:

Please never use the difference between successive iterates to define a stopping criterion. This misdiagnoses stagnation as convergence. Most nonsymmetric matrix iterations are not monotone, and even GMRES in exact arithmetic with no restarts can stagnate for an arbitrary number of iterations (up to the dimension of the matrix) before converging suddenly. See the examples in Nachtigal, Reddy, and Trefethen (1993).
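The stagnation described above can be reproduced on a tiny example. The following is a minimal sketch, assuming a cyclic shift matrix with $b = e_1$ (a standard textbook-style case, not taken from the cited paper) and a hypothetical helper `gmres_iterates` that computes the unrestarted GMRES iterates by dense least squares rather than by the Arnoldi process. The step $\|x_k - x_{k-1}\|$ is zero for the first $n-1$ iterations even though the error has not decreased at all.

```python
import numpy as np

def gmres_iterates(A, b, num_iters):
    """Unrestarted GMRES iterates x_1..x_m from x_0 = 0, via dense least squares.

    Uses the raw Krylov basis [b, Ab, A^2 b, ...], which is fine for this tiny
    example but too ill-conditioned for general use.
    """
    basis = [b.astype(float)]
    iterates = []
    for _ in range(num_iters):
        K = np.column_stack(basis)
        # x_k = K y minimizes ||b - A K y|| over the k-dimensional Krylov space.
        y, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
        iterates.append(K @ y)
        basis.append(A @ basis[-1])
    return iterates

n = 8
A = np.roll(np.eye(n), 1, axis=0)  # cyclic shift: A e_j = e_{j+1}, A e_n = e_1
b = np.zeros(n)
b[0] = 1.0
x_exact = np.linalg.solve(A, b)

previous = np.zeros(n)  # x_0 = 0
steps, residuals, errors = [], [], []
for x_k in gmres_iterates(A, b, n):
    steps.append(np.linalg.norm(x_k - previous))
    residuals.append(np.linalg.norm(b - A @ x_k))
    errors.append(np.linalg.norm(x_k - x_exact))
    previous = x_k

for k, (s, r, e) in enumerate(zip(steps, residuals, errors), start=1):
    print(f"k={k}: step={s:.1e}  residual={r:.1e}  error={e:.1e}")
```

A stopping test on the step size would accept $x_1 = 0$ here, while the residual correctly reports that nothing has been gained until iteration $n$.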

A better way to define convergence

We are usually interested in the accuracy of our solution more than in the size of the residual. Specifically, we might like to guarantee that the difference between an approximate solution $x_n$ and the exact solution $x$ satisfies $|x_n - x| < c$ for some user-specified $c$. It turns out that one can achieve this by finding an $x_n$ such that $|A x_n - b| < c\epsilon$, where $\epsilon$ is the smallest singular value of $A$, because

$$\begin{aligned} |x_n - x| &= |A^{-1} A (x_n - x)| \\ & \le \frac 1 \epsilon |A x_n - A x| \\ & = \frac 1 \epsilon |A x_n - b| \\ & < \frac 1 \epsilon \cdot c \epsilon = c \end{aligned}$$

where we have used that $1/\epsilon$ is the largest singular value of $A^{-1}$ (second line) and that $x$ exactly solves $A x = b$ (third line).
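The bound above, and the failure mode from the question, can be checked numerically. The following is a minimal sketch, assuming a synthetic $20\times 20$ matrix built from a prescribed SVD with smallest singular value $10^{-6}$; all names and sizes are illustrative choices. The approximate solution carries an error of size $0.5$ along the singular vector belonging to $\epsilon$, so the relative residual is tiny while the error is not, and the error still respects $|x_n - x| \le |A x_n - b| / \epsilon$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20

# Build A = U diag(s) V^T with one very small singular value.
Q_left, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q_right, _ = np.linalg.qr(rng.standard_normal((n, n)))
singular_values = np.linspace(1.0, 0.1, n)
singular_values[-1] = 1e-6
A = Q_left @ np.diag(singular_values) @ Q_right.T

x = rng.standard_normal(n)  # exact solution
b = A @ x

# Approximate solution whose error lies along the right singular vector
# associated with the smallest singular value.
x_n = x + 0.5 * Q_right[:, -1]

sigma_min = np.linalg.svd(A, compute_uv=False)[-1]
residual_norm = np.linalg.norm(A @ x_n - b)
relative_residual = residual_norm / np.linalg.norm(b)  # r_0 = b when x_0 = 0
error_norm = np.linalg.norm(x_n - x)
error_bound = residual_norm / sigma_min

print(f"relative residual    = {relative_residual:.2e}")
print(f"error                = {error_norm:.2e}")
print(f"residual / sigma_min = {error_bound:.2e}")
```

In this constructed case the bound is attained (up to rounding), because the error is aligned with the worst-case direction.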

Estimating the smallest singular value $\epsilon$

An accurate estimate of the smallest singular value $\epsilon$ is usually not directly available from the problem, but it can be estimated as a byproduct of a conjugate gradient or GMRES iteration. Note that although estimates of the largest eigenvalues and singular values are usually quite good after only a few iterations, an accurate estimate of the smallest eigenvalue or singular value is usually only obtained once convergence is reached. Before convergence, the estimate will generally be significantly larger than the true value. This suggests that you must actually solve the equations before you can define the correct tolerance $c\epsilon$. An automatic convergence tolerance that takes a user-provided accuracy $c$ for the solution and estimates the smallest singular value $\epsilon$ from the current state of the Krylov method might converge too early because the estimate of $\epsilon$ is much larger than the true value.
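The overestimate can be seen in a small experiment. The following is a minimal sketch, assuming a symmetric positive definite diagonal test matrix (so the smallest singular value equals the smallest eigenvalue) and a plain Lanczos iteration with full reorthogonalization, which builds the same Krylov spaces as CG. The function `lanczos_smallest_ritz` and the spectrum are illustrative choices; this is not how PETSc implements its estimator.

```python
import numpy as np

def lanczos_smallest_ritz(A, v0, m):
    """Smallest Ritz value after each of m Lanczos steps (symmetric A)."""
    n = v0.size
    Q = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    q = v0 / np.linalg.norm(v0)
    estimates = []
    for k in range(m):
        Q[:, k] = q
        w = A @ q
        alpha[k] = q @ w
        # Full reorthogonalization (two passes) against all Lanczos vectors.
        for _ in range(2):
            w = w - Q[:, :k + 1] @ (Q[:, :k + 1].T @ w)
        # Tridiagonal projection of A onto the current Krylov space.
        T = (np.diag(alpha[:k + 1])
             + np.diag(beta[:k], 1)
             + np.diag(beta[:k], -1))
        estimates.append(np.linalg.eigvalsh(T)[0])
        if k < m - 1:
            beta[k] = np.linalg.norm(w)
            q = w / beta[k]
    return np.array(estimates)

lam_min = 1e-4
eigenvalues = np.concatenate(([lam_min], np.linspace(1.0, 10.0, 49)))
A = np.diag(eigenvalues)
v0 = np.ones(eigenvalues.size)

estimates = lanczos_smallest_ritz(A, v0, m=30)
for k in (1, 2, 5, 10, 30):
    print(f"after {k:2d} steps: estimate of smallest eigenvalue = {estimates[k - 1]:.3e}")
```

The estimates approach the true value from above, so a tolerance $c\epsilon$ computed from an early estimate is too loose.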

Notes

The discussion above also works with $A$ replaced by the left-preconditioned operator $P^{-1}A$ and the preconditioned residual $P^{-1} (A x_n - b)$, or with the right-preconditioned operator $A P^{-1}$ and the error $P (x_n - x)$. If $P^{-1}$ is a good preconditioner, the preconditioned operator will be well-conditioned. For left preconditioning, this means that the preconditioned residual can be made small, but the true residual may not be. For right preconditioning, $|P(x_n - x)|$ is easily made small, but the true error $|x_n-x|$ may not be. This explains why left preconditioning is better for making the error small, while right preconditioning is better for making the residual small (and for debugging unstable preconditioners).
See this answer for more discussion of the norms minimized by GMRES and CG.
Estimates of extreme singular values can be monitored using -ksp_monitor_singular_value with any PETSc program. See KSPComputeExtremeSingularValues() to compute the singular values from code.
When using GMRES to estimate singular values, it is crucial that restarts not be used (e.g. -ksp_gmres_restart 1000 in PETSc).

Jed Brown

[Comment text not recovered; the surviving expressions are $P^{-1}r$, $P^{-1}A$, $P^{-1}\delta x$ and $AP^{-1}$.]

Good point, I edited my answer. Note that the case with right preconditioning gives you control of $P\delta x$; unwinding the preconditioner (applying $P^{-1}$) typically amplifies low-energy modes in the error.

Jed Brown

Another way of looking at this problem is to consider the tools from discrete inverse problems, that is, problems which involve solving $Ax=b$ or $\min ||Ax-b||_2$ where $A$ is very ill-conditioned (i.e. the ratio between the first and last singular value $\sigma_1/\sigma_n$ is large).

Here, we have several methods for choosing the stopping criterion, and for an iterative method, I would recommend the L-curve criterion since it only involves quantities that are available already (DISCLAIMER: My advisor pioneered this method, so I am definitely biased towards it). I have used this with success in an iterative method.

The idea is to monitor the residual norm $\rho_k=||Ax_k-b||_2$ and the solution norm $\eta_k=||x_k||_2$, where $x_k$ is the $k$-th iterate. As you iterate, the points $(\rho_k,\eta_k)$ begin to draw the shape of an L in a loglog(rho,eta) plot, and the point at the corner of that L is the optimal choice.

This allows you to implement a criterion where you keep an eye on when you have passed the corner (i.e. looking at the gradient of $(\rho_k,\eta_k)$ ), and then choose the iterate that was located at the corner.

The way I did it involved storing the last 20 iterates, and if the gradient $\left|\frac{\log(\eta_k)-\log(\eta_{k-1})}{\log(\rho_k)-\log(\rho_{k-1})}\right|$ was larger than some threshold for 20 successive iterations, I knew that I was on the vertical part of the curve and that I had passed the corner. I then took the first iterate in my array (i.e. the one 20 iterations ago) as my solution.
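The rule just described can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function name `lcurve_stop_index`, the default threshold of 10, and the synthetic curve are illustrative choices, not values from the answer, and a real solver would keep the last 20 iterates themselves rather than only their norms.

```python
import math

def lcurve_stop_index(rho, eta, threshold=10.0, window=20):
    """Index of the iterate to keep, or None if the corner has not been passed.

    rho[k] and eta[k] are the residual norm and solution norm of iterate k.
    The corner counts as passed once the log-log slope |d log(eta) / d log(rho)|
    has exceeded `threshold` for `window` successive iterations; the iterate
    from `window` iterations earlier is then returned.
    """
    run = 0
    for k in range(1, len(rho)):
        d_rho = math.log(rho[k]) - math.log(rho[k - 1])
        d_eta = math.log(eta[k]) - math.log(eta[k - 1])
        if d_rho == 0.0:
            # Residual did not move: vertical if the solution norm changed.
            slope = math.inf if d_eta != 0.0 else 0.0
        else:
            slope = abs(d_eta / d_rho)
        run = run + 1 if slope > threshold else 0
        if run == window:
            return k - window
    return None

# Synthetic L-curve: 30 "horizontal" steps (residual drops, norm barely grows),
# then 25 "vertical" steps (residual stalls, norm grows quickly).
rho, eta = [1.0], [1.0]
for _ in range(30):
    rho.append(rho[-1] * 0.5)
    eta.append(eta[-1] * 1.01)
for _ in range(25):
    rho.append(rho[-1] * 0.999)
    eta.append(eta[-1] * 1.5)

corner = lcurve_stop_index(rho, eta)
print("iterate chosen at the corner:", corner)
```

On this synthetic curve the function returns the last iterate of the horizontal branch, which is the corner by construction.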

There are also more detailed methods for finding the corner, and these work better but require storing a significant number of iterates. Play around with it a bit. If you are in MATLAB, you can use the toolbox Regularization Tools, which implements some of this (specifically the "corner" function is applicable).

Note that this approach is particularly suitable for large-scale problems, since the extra computing time involved is minuscule.

OscarB

Thanks a lot! So in the loglog(rho,eta) plot we begin at the right of the L-curve and end at the top of the L, is that right? I just do not know the principle behind this criterion. Can you explain why it always behaves like an L-curve and why we choose the corner?

Hui Zhang

You're welcome :-D. For an iterative method, we always begin from the right and end at the top. It behaves as an L due to the noise in the problem - the vertical part happens at $||Ax-b||_2=||e||_2$, where $e$ is the noise vector, $b_{exact}=b+e$. For more analysis see Hansen, P. C., & O'Leary, D. P. (1993). The use of the L-curve in the regularization of discrete ill-posed problems. SIAM Journal on Scientific Computing, 14. Note that I just made a slight update to the post.

OscarB

@HuiZhang: it isn't always an L. If the regularization is ambiguous it may be a double L, leading to two candidates for the solution, one with gross features better resolved, the other with certain details better resolved. (And of course, more complex shapes may appear.)

Arnold Neumaier

Does the L-curve apply to ill-conditioned problems where there should be a unique solution? That is, I'm interested in problems Ax = b where b is known "exactly" and A is nearly singular but still technically invertible. It would seem to me that if you use something like GMRES the norm of your current guess of x doesn't change too much over time, especially after the first however many iterations. It seems to me that the vertical part of the L-curve occurs because there is no unique/valid solution in an ill-posed problem; would this vertical feature be present in all ill-conditioned problems?

nukeguy

At some point, you will reach such a vertical line, typically because the numerical errors in your solution method result in ||Ax-b|| not decreasing. However, you are right that in such noise-free problems the curve does not always look like an L, meaning that you typically have a few corners to choose from and choosing one over the other can be hard. I believe that the paper I referenced in my comment above discusses noise-free scenarios briefly.