PostgreSQL: SSL SYSCALL error: EOF detected


First, I searched and found several posts about this error. Most of them point to either a RAM problem or an SSL problem. I tried to rule out SSL by adding sslmode=disable on the command line:

 psql -U waypoint -d waypoint -W -c "alter table telemetria_data.historico alter clase type smallint, alter valor type real[], alter power type smallint, alter voltaje type real;" -h localhost -v sslmode=disable

But the same message appears:

SSL SYSCALL error: EOF detected
connection to server was lost

As for the possible memory problem, I don't know how to troubleshoot it.

The data structure is the one described in this question and, as you may figure out, this is going to be a very long-running query, since the ALTER TABLE has to work through all of the inherited tables.

OS:

Linux ip-10-1-0-9 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux

PostgreSQL:

PostgreSQL 9.4.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

Update 2017-06-01 13:50 GMT

Command changed to (following @Daniel Vérité's recommendation):

time PGSSLMODE=disable psql -U waypoint -d waypoint -W -c "alter table telemetria_data.historico alter clase type smallint, alter valor type real[], alter power type smallint, alter voltaje type real;" -h localhost

The error then changed to the following:

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
connection to server was lost

Update 2017-06-01 15:34 GMT

Found several entries in the log /var/log/postgresql/postgresql-9.4-main.log like these:

2017-06-01 13:48:49 UTC [22899-357] LOG:  checkpoints are occurring too frequently (19 seconds apart)
2017-06-01 13:48:49 UTC [22899-358] HINT:  Consider increasing the configuration parameter "checkpoint_segments".

So I'll proceed with the suggested hint.

I also found this group of entries, which actually refer to a crash and the subsequent recovery:

2017-06-01 13:49:04 UTC [4982-17] LOG:  server process (PID 6569) was terminated by signal 9: Killed
2017-06-01 13:49:04 UTC [4982-18] DETAIL:  Failed process was running: alter table telemetria_data.historico alter clase type smallint, alter valor type real[], alter power type smallint, alter voltaje type real;
2017-06-01 13:49:04 UTC [4982-19] LOG:  terminating any other active server processes
2017-06-01 13:49:04 UTC [22902-2] WARNING:  terminating connection because of crash of another server process
2017-06-01 13:49:04 UTC [22902-3] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-06-01 13:49:04 UTC [22902-4] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2017-06-01 13:49:04 UTC [16383-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:04 UTC [16384-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:04 UTC [16386-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:04 UTC [16385-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:04 UTC [4982-20] LOG:  all server processes terminated; reinitializing
2017-06-01 13:49:05 UTC [16402-1] LOG:  database system was interrupted; last known up at 2017-06-01 13:48:45 UTC
2017-06-01 13:49:05 UTC [16403-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:05 UTC [16404-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:05 UTC [16414-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:05 UTC [16415-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16452-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16453-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16462-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16463-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16472-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16473-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16482-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:06 UTC [16483-1] waypoint@waypoint FATAL:  the database system is in recovery mode
2017-06-01 13:49:09 UTC [16402-2] LOG:  database system was not properly shut down; automatic recovery in progress
2017-06-01 13:49:09 UTC [16402-3] LOG:  redo starts at 11EC/9960F440
2017-06-01 13:49:21 UTC [16402-4] LOG:  unexpected pageaddr 11E6/52726000 in log segment 00000001000011EC000000C9, offset 7495680
2017-06-01 13:49:21 UTC [16402-5] LOG:  redo done at 11EC/C9723D60
2017-06-01 13:49:32 UTC [16402-6] LOG:  MultiXact member wraparound protections are now enabled
2017-06-01 13:49:32 UTC [4982-21] LOG:  database system is ready to accept connections

Any advice on this last log section?

The OOM killer is enabled, and the following is the output in /var/log/messages:

Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.672817] psql invoked oom-killer: gfp_mask=0x2000d0, order=2, oom_score_adj=0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.675652] psql cpuset=/ mems_allowed=0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.677274] CPU: 1 PID: 16367 Comm: psql Not tainted 3.16.0-4-amd64 #1 Debian 3.16.36-1+deb8u2
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680406] Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  0000000000000000 ffffffff815123b5 ffff88003dcda1d0 0000000000000000
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  ffffffff8150ff8d 0000000000000000 ffffffff810d6e3f 0000000000000000
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  ffffffff81516d2e 0000000000000200 ffffffff810689d3 ffffffff810c43e4
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] Call Trace:
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff815123b5>] ? dump_stack+0x5d/0x78
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff8150ff8d>] ? dump_header+0x76/0x1e8
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff810d6e3f>] ? smp_call_function_single+0x5f/0xa0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81516d2e>] ? mutex_lock+0xe/0x2a
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff810689d3>] ? put_online_cpus+0x23/0x80
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff810c43e4>] ? rcu_oom_notify+0xc4/0xe0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81153d1c>] ? do_try_to_free_pages+0x4ac/0x520
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff811427dd>] ? oom_kill_process+0x21d/0x370
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff8114239d>] ? find_lock_task_mm+0x3d/0x90
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81142f43>] ? out_of_memory+0x473/0x4b0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81148e0f>] ? __alloc_pages_nodemask+0x9ef/0xb50
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81065c86>] ? copy_process.part.25+0x116/0x1c50
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffffa00f6bba>] ? call_filldir+0x9a/0x160 [ext4]
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81058301>] ? __do_page_fault+0x1d1/0x4f0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff811ac3f9>] ? get_empty_filp+0xc9/0x1c0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff81067990>] ? do_fork+0xe0/0x3d0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff811c6a1c>] ? __alloc_fd+0x7c/0x120
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff815188f9>] ? stub_clone+0x69/0x90
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.680557]  [<ffffffff8151858d>] ? system_call_fast_compare_end+0x10/0x15
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.734210] Mem-Info:
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.735151] Node 0 DMA per-cpu:
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.736512] CPU    0: hi:    0, btch:   1 usd:   0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.738300] CPU    1: hi:    0, btch:   1 usd:   0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.740072] Node 0 DMA32 per-cpu:
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.741335] CPU    0: hi:  186, btch:  31 usd:   0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.743092] CPU    1: hi:  186, btch:  31 usd:   0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.745210] active_anon:370484 inactive_anon:549110 isolated_anon:24
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.745210]  active_file:240 inactive_file:1425 isolated_file:0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.745210]  unevictable:0 dirty:173 writeback:0 unstable:0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.745210]  free:20427 slab_reclaimable:9729 slab_unreclaimable:3425
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.745210]  mapped:567547 shmem:587500 pagetables:4209 bounce:0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.745210]  free_cma:0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.760872] Node 0 DMA free:15224kB min:184kB low:228kB high:276kB active_anon:228kB inactive_anon:188kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:144kB shmem:204kB slab_reclaimable:4kB slab_unreclaimable:80kB kernel_stack:80kB pagetables:4kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.777646] lowmem_reserve[]: 0 3757 3757 3757
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.779553] Node 0 DMA32 free:69956kB min:44868kB low:56084kB high:67300kB active_anon:1481708kB inactive_anon:2196252kB active_file:1772kB inactive_file:1748kB unevictable:0kB isolated(anon):96kB isolated(file):0kB present:3915776kB managed:3849676kB mlocked:0kB dirty:0kB writeback:0kB mapped:2267676kB shmem:2349796kB slab_reclaimable:38712kB slab_unreclaimable:13620kB kernel_stack:2032kB pagetables:16832kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:14 all_unreclaimable? no
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.795757] lowmem_reserve[]: 0 0 0 0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.797650] Node 0 DMA: 11*4kB (EM) 8*8kB (EM) 1*16kB (E) 2*32kB (UE) 1*64kB (E) 1*128kB (E) 2*256kB (UE) 2*512kB (EM) 3*1024kB (UEM) 3*2048kB (EMR) 1*4096kB (M) = 15228kB
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.804011] Node 0 DMA32: 13752*4kB (UEM) 85*8kB (EM) 54*16kB (M) 43*32kB (M) 17*64kB (M) 15*128kB (M) 10*256kB (M) 3*512kB (M) 2*1024kB (M) 0*2048kB 1*4096kB (R) = 71176kB
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.811528] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.815461] 588017 total pagecache pages
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.816967] 0 pages in swap cache
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.818205] Swap cache stats: add 0, delete 0, find 0/0
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.820215] Free swap  = 0kB
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.821475] Total swap = 0kB
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.822543] 982941 pages RAM
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.823640] 0 pages HighMem/MovableOnly
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.825110] 16525 pages reserved
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.826310] 0 pages hwpoisoned
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.827473] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.830523] [  159]     0   159     8242      800      21        0             0 systemd-journal
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.833717] [  162]     0   162    10200      135      22        0         -1000 systemd-udevd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.836679] [  316]     0   316     6351     1726      14        0             0 dhclient
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.839533] [  351]     0   351     7181       72      18        0             0 cron
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.842112] [  353]     0   353     4964       68      14        0             0 systemd-logind
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.844894] [  362]   107   362    10531       96      26        0          -900 dbus-daemon
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.847504] [  376]   106   376     8345      154      21        0             0 ntpd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.850085] [  377]     0   377    65721      457      30        0             0 rsyslogd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.852755] [  388]     0   388     3909       39      12        0             0 agetty
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.855305] [  389]     0   389     3864       40      13        0             0 agetty
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.857914] [  451]     0   451    13796      168      29        0         -1000 sshd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.860562] [  481]  1002   481    26362     5081      54        0             0 perfmon_loop.rb
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.863918] [  486]  1002   486    15211     3146      31        0             0 cht_perfmon
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.867273] [  625]     0   625     9560      144      22        0             0 master
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.869878] [  630]   108   630    10164      234      24        0             0 qmgr
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.872356] [ 3443]     0  3443    20130      213      41        0             0 sshd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.874903] [ 3445]  1000  3445    20164      222      39        0             0 sshd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.877451] [ 3446]  1000  3446     3176       43       9        0             0 sftp-server
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.880261] [ 4982]   105  4982   614831    42946     132        0          -900 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.882988] [ 6634]     0  6634     1570       23       9        0             0 collectdmon
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.885784] [ 6635]     0  6635   174485      156      36        0             0 collectd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.888347] [22899]   105 22899   615399   541666    1105        0             0 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.890970] [22900]   105 22900   615395    14251      88        0             0 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.893836] [22901]   105 22901   615088     4252      53        0             0 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.896365] [22902]   105 22902   615305     1316      60        0             0 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.899083] [22903]   105 22903    21336      378      40        0             0 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.901954] [ 2946]   108  2946    10076      137      22        0             0 pickup
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.904462] [ 6376]     0  6376    20130      213      42        0             0 sshd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.907440] [ 6378]  1000  6378    20130      209      40        0             0 sshd
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.910534] [ 6379]  1000  6379     5795      151      16        0             0 bash
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.913144] [ 6382]     0  6382    11515      107      28        0             0 sudo
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.915635] [ 6383]     0  6383    11895       96      27        0             0 su
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.918149] [ 6384]   105  6384     5796      139      16        0             0 bash
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.920900] [ 6561]   105  6561    18289      236      40        0             0 psql
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.923394] [ 6569]   105  6569   925161   853454    1718        0             0 postgres
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.926097] [16319]     0 16319    10865       95      25        0             0 cron
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.928611] [16320]     0 16320    10865       95      25        0             0 cron
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.931063] [16321]     0 16321    10865       95      25        0             0 cron
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.933758] [16322]     0 16322    10865       95      25        0             0 cron
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.936197] [16323]  1000 16323     1084       20       7        0             0 sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.938762] [16324]  1000 16324     1084       20       7        0             0 sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.941450] [16325]  1000 16325     1084       21       7        0             0 sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.943849] [16326]  1000 16326     1084       21       6        0             0 sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.946409] [16327]  1000 16327     3612       54      12        0             0 telemetria.sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.949258] [16328]  1000 16328     3613       57      12        0             0 instantaneo.sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.952633] [16329]  1000 16329    21335     4808      48        0             0 mon-put-instanc
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.955744] [16330]  1000 16330     3612       54      12        0             0 conexiones.sh
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.958555] [16366]  1000 16366    10744     1513      26        0             0 psql
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.961216] [16367]  1000 16367    10735     1512      26        0             0 psql
Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.963722] [16368]  1000 16368     7725     1044      19        0             0 aws
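
One way to read the process table above is to rank the entries by their rss column. The following is a minimal Python sketch, assuming rss is counted in 4 kB pages (the usual page size on x86_64) and using three rows copied from the dump. The helper name parse_oom_rows is made up for illustration.

```python
import re

# Three rows copied from the kernel dump above (timestamps stripped).
SAMPLE = """\
[ 4982]   105  4982   614831    42946     132        0          -900 postgres
[22899]   105 22899   615399   541666    1105        0             0 postgres
[ 6569]   105  6569   925161   853454    1718        0             0 postgres
"""

PAGE_KIB = 4  # assumption: 4 kB pages, the x86_64 default

# Columns: [pid] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)"
    r"\s+(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+(?P<swapents>\d+)"
    r"\s+(?P<adj>-?\d+)\s+(?P<name>\S+)"
)


def parse_oom_rows(text):
    """Return (pid, name, rss_kib) tuples, largest resident set first."""
    rows = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            rows.append((int(m["pid"]), m["name"], int(m["rss"]) * PAGE_KIB))
    return sorted(rows, key=lambda r: r[2], reverse=True)


for pid, name, rss_kib in parse_oom_rows(SAMPLE):
    print(f"{pid:>6} {name:<10} {rss_kib / 1024:8.0f} MiB")
```

On these rows the backend with PID 6569, the one the PostgreSQL log reports as killed while running the ALTER TABLE, has by far the largest resident set. Note that rss also counts shared memory pages a process has touched, so the per-process figures overlap and should not simply be added up.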

Update 2017-06-01 16:19 GMT

Settings changed to:

checkpoint_segments = 100       # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30s        # range 30s-1h

And I filled up the hard drive :( I generously increased checkpoint_segments without first checking the available space. Luckily I'm testing this procedure in a non-production environment. So I may have to clone the production server once again, or is there a way to free the temporary space that was used and is now wasted?

ERROR:  could not extend file "base/16384/3940428": No space left on device
HINT:  Check free disk space.
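
For a rough idea of how much disk the new setting can claim for WAL alone, the 9.4 documentation gives an upper bound of about (2 + checkpoint_completion_target) * checkpoint_segments + 1 segment files of 16 MB each. Below is a minimal Python sketch of that arithmetic, assuming checkpoint_completion_target is at its default of 0.5 (it is not shown in the question). The function name is made up for illustration.

```python
SEGMENT_MB = 16  # WAL segment size, as noted in the postgresql.conf comment


def max_wal_mb(checkpoint_segments, completion_target=0.5):
    """Approximate upper bound on WAL (pg_xlog) size in MB for PostgreSQL 9.4."""
    files = (2 + completion_target) * checkpoint_segments + 1
    return int(files) * SEGMENT_MB


print(max_wal_mb(3))    # the 9.4 default of 3 segments
print(max_wal_mb(100))  # the value set above
```

At checkpoint_segments = 100 that comes to roughly 4 GB of WAL. The file named in the error lives under base/, though, so the table rewrite done by ALTER TABLE ... TYPE probably also needs room for a second copy of the data while it runs; the WAL figure is likely only part of the picture.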

As per @dezso's question, the memory overcommit values are as follows:

vm.nr_overcommit_hugepages = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
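
These values mean the kernel is in heuristic overcommit mode (vm.overcommit_memory = 0), in which the OOM killer can be invoked when memory runs out. For comparison, under strict mode (vm.overcommit_memory = 2) the kernel caps committed memory at roughly swap plus overcommit_ratio percent of RAM. Below is a minimal Python sketch of that arithmetic, taking the RAM page count and swap size from the kernel dump above and assuming 4 kB pages. It ignores huge pages and reserved pages, so treat the result as approximate.

```python
PAGE_KIB = 4  # assumption: 4 kB pages


def commit_limit_kib(ram_pages, swap_kib, overcommit_ratio):
    """Approximate CommitLimit under vm.overcommit_memory = 2."""
    return swap_kib + ram_pages * PAGE_KIB * overcommit_ratio // 100


limit = commit_limit_kib(ram_pages=982941, swap_kib=0, overcommit_ratio=50)
print(f"{limit} kB (~{limit / 1024 / 1024:.2f} GiB)")
```

With no swap and a ratio of 50 the limit comes out a little under 2 GiB, so switching to strict mode on this instance would likely also mean raising vm.overcommit_ratio or adding swap.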

Update 2017-06-01 18: 107 GMT

The server instance is an AWS c4.large (2 vCPU, 3.75 GB RAM)

Some more parameters from postgresql.conf:

shared_buffers = 2GB            # min 128kB
work_mem = 32MB             # min 64kB
max_connections =800            # (change requires restart)
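
A back-of-the-envelope check of these settings against the instance's RAM, as a Python sketch. It assumes the rough rule that each connection may use at least one work_mem allocation at a time (a single query can use several, so this is not a strict bound), and the helper name is made up for illustration.

```python
def worst_case_mb(shared_buffers_mb, work_mem_mb, max_connections):
    """Rough ceiling: shared_buffers plus one work_mem per connection."""
    return shared_buffers_mb + work_mem_mb * max_connections


ram_mb = 3.75 * 1024  # c4.large, 3.75 GB
need_mb = worst_case_mb(shared_buffers_mb=2048, work_mem_mb=32, max_connections=800)
print(f"potential demand: {need_mb} MB, RAM: {ram_mb:.0f} MB")
print(f"ratio: {need_mb / ram_mb:.1f}x")
```

Even without the per-connection term, shared_buffers alone is more than half of the 3.75 GB, which leaves little headroom for the backends themselves.
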
Gonzalo Vasquez

Looks like a backend crash, possibly an out-of-memory condition. Check the PostgreSQL server log, and if the OS is Linux and the OOM killer is enabled, the kernel will have logged something at the time of the failure.
Daniel Vérité
@DanielVérité added the PostgreSQL and kernel log entries to the question
Gonzalo Vasquez
postgresql.org/docs/current/static/… - apart from this, how much RAM does your instance have? What are shared_buffers, work_mem and max_connections set to?
dezso
@dezso added... I will have to follow the tips in your link, but only once I "revive" the instance from the disk full issue ;)
Gonzalo Vasquez

Answers:


My guess is that you're running PostgreSQL on AWS. If so, you need to state this in the question and tag it.

The last process mentioned in the kernel dump is actually aws, so that seems like a safe bet:

Jun  1 13:49:04 ip-10-1-0-9 kernel: [260956.963722] [16368]  1000 16368     7725     1044      19        0             0 aws

As far as PostgreSQL is concerned, this just implies a connection failure.

As for troubleshooting what is triggering the oom_killer on AWS, you have two options:

  • Lower work_mem below the threshold that triggers the oom_killer. Whichever query is then planned differently because of the change in memory, or runs slower, is the one that is holding you up.
  • Set log_statement and read the last query issued before the out-of-memory crash.
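
The first option can be tried for a single session without editing postgresql.conf, because libpq passes the PGOPTIONS environment variable to the backend as command-line options. Below is a minimal Python sketch that only builds the environment and argument list for the psql call from the question. The 4MB default and the function name are illustrative assumptions, not recommended values.

```python
import os
import shlex

SQL = (
    "alter table telemetria_data.historico alter clase type smallint, "
    "alter valor type real[], alter power type smallint, "
    "alter voltaje type real;"
)


def build_psql_call(work_mem="4MB"):
    """Return (env, argv) for a psql run with a lower per-session work_mem."""
    env = dict(os.environ)
    env["PGSSLMODE"] = "disable"                  # as in the question's update
    env["PGOPTIONS"] = f"-c work_mem={work_mem}"  # applied when the backend starts
    argv = ["psql", "-U", "waypoint", "-d", "waypoint", "-h", "localhost",
            "-W", "-c", SQL]
    return env, argv


env, argv = build_psql_call()
print("PGOPTIONS=" + shlex.quote(env["PGOPTIONS"]), shlex.join(argv))
# To actually run it: subprocess.run(argv, env=env, check=True)
```

For the second option, log_statement is usually set in postgresql.conf (for example log_statement = 'all') and picked up on a configuration reload.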
Evan Carroll