PostgreSQL: SSL SYSCALL error: EOF detected

Question

First of all, I searched and found several posts about this error; most of them point to either a RAM problem or an SSL problem. I tried to rule out the SSL possibility by adding sslmode=disable on the command line:

 psql -U waypoint -d waypoint -W -c "alter table telemetria_data.historico alter clase type smallint, alter valor type real[], alter power type smallint, alter voltaje type real;" -h localhost -v sslmode=disable

But the same message appeared:

SSL SYSCALL error: EOF detected
connection to server was lost

As for the possible memory problem, I don't know how to troubleshoot it.

The data structure is the one described in this question and, as you can see, it would take a very long query to complete the full alter table across all the inherited tables.

OS:

Linux ip-10-1-0-9 3.16.0-4-AMD64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux

PostgreSQL:

PostgreSQL 9.4.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

Update 2017-06-01 13:50 GMT

Changed the command (following @Daniel Vérité's recommendations):

time PGSSLMODE=disable psql -U waypoint -d waypoint -W -c "alter table telemetria_data.historico alter clase type smallint, alter valor type real[], alter power type smallint, alter voltaje type real;" -h localhost

The error message changed to:

server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
connection to server was lost

Update 2017-06-01 15:34 GMT

Found several log entries (in /var/log/postgresql/postgresql-9.4-main.log) like these:

2017-06-01 13:48:49 UTC [22899-357] LOG: checkpoints are occurring too frequently (19 seconds apart)
2017-06-01 13:48:49 UTC [22899-358] HINT: Consider increasing the configuration parameter "checkpoint_segments".

So I will follow the suggested hint.

I also found this group of entries, which actually refer to the crash and the subsequent recovery:

2017-06-01 13:49:04 UTC [4982-17] LOG: server process (PID 6569) was terminated by signal 9: Killed
2017-06-01 13:49:04 UTC [4982-18] DETAIL: Failed process was running: alter table telemetria_data.historico alter clase type smallint, alter valor type real[], alter power type smallint, alter voltaje type real;
2017-06-01 13:49:04 UTC [4982-19] LOG: terminating any other active server processes
2017-06-01 13:49:04 UTC [22902-2] WARNING: terminating connection because of crash of another server process
2017-06-01 13:49:04 UTC [22902-3] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-06-01 13:49:04 UTC [22902-4] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-06-01 13:49:04 UTC [16383-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:04 UTC [16384-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:04 UTC [16386-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:04 UTC [16385-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:04 UTC [4982-20] LOG: all server processes terminated; reinitializing
2017-06-01 13:49:05 UTC [16402-1] LOG: database system was interrupted; last known up at 2017-06-01 13:48:45 UTC
2017-06-01 13:49:05 UTC [16403-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:05 UTC [16404-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:05 UTC [16414-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:05 UTC [16415-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:06 UTC [16452-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:06 UTC [16453-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:06 UTC [16462-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:06 UTC [16463-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:06 UTC [16472-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:06 UTC [16473-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:06 UTC [16482-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:06 UTC [16483-1] waypoint@waypoint FATAL: the database system is in recovery mode
2017-06-01 13:49:09 UTC [16402-2] LOG: database system was not properly shut down; automatic recovery in progress
2017-06-01 13:49:09 UTC [16402-3] LOG: redo starts at 11EC/9960F440
2017-06-01 13:49:21 UTC [16402-4] LOG: unexpected pageaddr 11E6/52726000 in log segment 00000001000011EC000000C9, offset 7495680
2017-06-01 13:49:21 UTC [16402-5] LOG: redo done at 11EC/C9723D60
2017-06-01 13:49:32 UTC [16402-6] LOG: MultiXact member wraparound protections are now enabled
2017-06-01 13:49:32 UTC [4982-21] LOG: database system is ready to accept connections

Any suggestions about this last part of the log?

The OOM killer is enabled, and this is the output in /var/log/messages:

Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.672817] psql invoked oom-killer: gfp_mask=0x2000d0, order=2, oom_score_adj=0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.675652] psql cpuset=/ mems_allowed=0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.677274] CPU: 1 PID: 16367 Comm: psql Not tainted 3.16.0-4-AMD64 #1 Debian 3.16.36-1+deb8u2
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680406] Hardware name: Xen HVM domU, BIOS 4.2.Amazon 11/11/2016
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] 0000000000000000 ffffffff815123b5 ffff88003dcda1d0 0000000000000000
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] ffffffff8150ff8d 0000000000000000 ffffffff810d6e3f 0000000000000000
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] ffffffff81516d2e 0000000000000200 ffffffff810689d3 ffffffff810c43e4
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] Call Trace:
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff815123b5>] ? dump_stack+0x5d/0x78
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff8150ff8d>] ? dump_header+0x76/0x1e8
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff810d6e3f>] ? smp_call_function_single+0x5f/0xa0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff81516d2e>] ? mutex_lock+0xe/0x2a
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff810689d3>] ? put_online_cpus+0x23/0x80
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff810c43e4>] ? rcu_oom_notify+0xc4/0xe0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff81153d1c>] ? do_try_to_free_pages+0x4ac/0x520
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff811427dd>] ? oom_kill_process+0x21d/0x370
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff8114239d>] ? find_lock_task_mm+0x3d/0x90
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff81142f43>] ? out_of_memory+0x473/0x4b0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff81148e0f>] ? __alloc_pages_nodemask+0x9ef/0xb50
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff81065c86>] ? copy_process.part.25+0x116/0x1c50
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffffa00f6bba>] ? call_filldir+0x9a/0x160 [ext4]
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff81058301>] ? __do_page_fault+0x1d1/0x4f0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff811ac3f9>] ? get_empty_filp+0xc9/0x1c0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff81067990>] ? do_fork+0xe0/0x3d0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff811c6a1c>] ? __alloc_fd+0x7c/0x120
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff815188f9>] ? stub_clone+0x69/0x90
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.680557] [<ffffffff8151858d>] ? system_call_fast_compare_end+0x10/0x15
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.734210] Mem-Info:
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.735151] Node 0 DMA per-cpu:
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.736512] CPU 0: hi: 0, btch: 1 usd: 0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.738300] CPU 1: hi: 0, btch: 1 usd: 0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.740072] Node 0 DMA32 per-cpu:
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.741335] CPU 0: hi: 186, btch: 31 usd: 0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.743092] CPU 1: hi: 186, btch: 31 usd: 0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.745210] active_anon:370484 inactive_anon:549110 isolated_anon:24
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.745210] active_file:240 inactive_file:1425 isolated_file:0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.745210] unevictable:0 dirty:173 writeback:0 unstable:0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.745210] free:20427 slab_reclaimable:9729 slab_unreclaimable:3425
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.745210] mapped:567547 shmem:587500 pagetables:4209 bounce:0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.745210] free_cma:0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.760872] Node 0 DMA free:15224kB min:184kB low:228kB high:276kB active_anon:228kB inactive_anon:188kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:144kB shmem:204kB slab_reclaimable:4kB slab_unreclaimable:80kB kernel_stack:80kB pagetables:4kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.777646] lowmem_reserve[]: 0 3757 3757 3757
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.779553] Node 0 DMA32 free:69956kB min:44868kB low:56084kB high:67300kB active_anon:1481708kB inactive_anon:2196252kB active_file:1772kB inactive_file:1748kB unevictable:0kB isolated(anon):96kB isolated(file):0kB present:3915776kB managed:3849676kB mlocked:0kB dirty:0kB writeback:0kB mapped:2267676kB shmem:2349796kB slab_reclaimable:38712kB slab_unreclaimable:13620kB kernel_stack:2032kB pagetables:16832kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:14 all_unreclaimable? no
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.795757] lowmem_reserve[]: 0 0 0 0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.797650] Node 0 DMA: 11*4kB (EM) 8*8kB (EM) 1*16kB (E) 2*32kB (UE) 1*64kB (E) 1*128kB (E) 2*256kB (UE) 2*512kB (EM) 3*1024kB (UEM) 3*2048kB (EMR) 1*4096kB (M) = 15228kB
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.804011] Node 0 DMA32: 13752*4kB (UEM) 85*8kB (EM) 54*16kB (M) 43*32kB (M) 17*64kB (M) 15*128kB (M) 10*256kB (M) 3*512kB (M) 2*1024kB (M) 0*2048kB 1*4096kB (R) = 71176kB
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.811528] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.815461] 588017 total pagecache pages
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.816967] 0 pages in swap cache
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.818205] Swap cache stats: add 0, delete 0, find 0/0
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.820215] Free swap = 0kB
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.821475] Total swap = 0kB
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.822543] 982941 pages RAM
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.823640] 0 pages HighMem/MovableOnly
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.825110] 16525 pages reserved
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.826310] 0 pages hwpoisoned
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.827473] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.830523] [ 159] 0 159 8242 800 21 0 0 systemd-journal
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.833717] [ 162] 0 162 10200 135 22 0 -1000 systemd-udevd
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.836679] [ 316] 0 316 6351 1726 14 0 0 dhclient
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.839533] [ 351] 0 351 7181 72 18 0 0 cron
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.842112] [ 353] 0 353 4964 68 14 0 0 systemd-logind
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.844894] [ 362] 107 362 10531 96 26 0 -900 dbus-daemon
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.847504] [ 376] 106 376 8345 154 21 0 0 ntpd
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.850085] [ 377] 0 377 65721 457 30 0 0 rsyslogd
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.852755] [ 388] 0 388 3909 39 12 0 0 agetty
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.855305] [ 389] 0 389 3864 40 13 0 0 agetty
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.857914] [ 451] 0 451 13796 168 29 0 -1000 sshd
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.860562] [ 481] 1002 481 26362 5081 54 0 0 perfmon_loop.rb
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.863918] [ 486] 1002 486 15211 3146 31 0 0 cht_perfmon
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.867273] [ 625] 0 625 9560 144 22 0 0 master
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.869878] [ 630] 108 630 10164 234 24 0 0 qmgr
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.872356] [ 3443] 0 3443 20130 213 41 0 0 sshd
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.874903] [ 3445] 1000 3445 20164 222 39 0 0 sshd
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.877451] [ 3446] 1000 3446 3176 43 9 0 0 sftp-server
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.880261] [ 4982] 105 4982 614831 42946 132 0 -900 postgres
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.882988] [ 6634] 0 6634 1570 23 9 0 0 collectdmon
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.885784] [ 6635] 0 6635 174485 156 36 0 0 collectd
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.888347] [22899] 105 22899 615399 541666 1105 0 0 postgres
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.890970] [22900] 105 22900 615395 14251 88 0 0 postgres
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.893836] [22901] 105 22901 615088 4252 53 0 0 postgres
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.896365] [22902] 105 22902 615305 1316 60 0 0 postgres
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.899083] [22903] 105 22903 21336 378 40 0 0 postgres
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.901954] [ 2946] 108 2946 10076 137 22 0 0 pickup
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.904462] [ 6376] 0 6376 20130 213 42 0 0 sshd
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.907440] [ 6378] 1000 6378 20130 209 40 0 0 sshd
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.910534] [ 6379] 1000 6379 5795 151 16 0 0 bash
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.913144] [ 6382] 0 6382 11515 107 28 0 0 Sudo
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.915635] [ 6383] 0 6383 11895 96 27 0 0 su
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.918149] [ 6384] 105 6384 5796 139 16 0 0 bash
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.920900] [ 6561] 105 6561 18289 236 40 0 0 psql
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.923394] [ 6569] 105 6569 925161 853454 1718 0 0 postgres
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.926097] [16319] 0 16319 10865 95 25 0 0 cron
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.928611] [16320] 0 16320 10865 95 25 0 0 cron
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.931063] [16321] 0 16321 10865 95 25 0 0 cron
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.933758] [16322] 0 16322 10865 95 25 0 0 cron
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.936197] [16323] 1000 16323 1084 20 7 0 0 sh
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.938762] [16324] 1000 16324 1084 20 7 0 0 sh
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.941450] [16325] 1000 16325 1084 21 7 0 0 sh
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.943849] [16326] 1000 16326 1084 21 6 0 0 sh
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.946409] [16327] 1000 16327 3612 54 12 0 0 telemetria.sh
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.949258] [16328] 1000 16328 3613 57 12 0 0 instantaneo.sh
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.952633] [16329] 1000 16329 21335 4808 48 0 0 mon-put-instanc
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.955744] [16330] 1000 16330 3612 54 12 0 0 conexiones.sh
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.958555] [16366] 1000 16366 10744 1513 26 0 0 psql
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.961216] [16367] 1000 16367 10735 1512 26 0 0 psql
Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.963722] [16368] 1000 16368 7725 1044 19 0 0 aws
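To make the process table easier to read: the rss column is counted in pages, so a few lines of Python can convert the largest entries to MiB. This is only a sketch. It assumes the usual 4 kB page size on x86_64, the helper name pages_to_mib is made up, and the figures are copied from the dump. As far as I know, rss here also counts shared memory the process has touched, so the postgres numbers overlap with the shmem total and should not simply be added together.

```python
PAGE_KIB = 4  # assumption: 4 kB pages, the x86_64 default


def pages_to_mib(pages: int) -> float:
    """Convert a page count from the OOM dump to MiB."""
    return pages * PAGE_KIB / 1024


# (pid, rss in pages), copied from the process table in the dump
rows = [
    (4982, 42946),    # postmaster (oom_score_adj -900)
    (22899, 541666),  # the process that logged the checkpoint warnings
    (6569, 853454),   # backend that was running the ALTER TABLE
]
TOTAL_RAM_PAGES = 982941  # "982941 pages RAM"
SHMEM_PAGES = 587500      # "shmem:587500" from Mem-Info

for pid, rss in sorted(rows, key=lambda r: r[1], reverse=True):
    print(f"postgres {pid}: {pages_to_mib(rss):.0f} MiB resident")
print(f"shared memory in use: {pages_to_mib(SHMEM_PAGES):.0f} MiB")
print(f"total RAM: {pages_to_mib(TOTAL_RAM_PAGES):.0f} MiB")
```

On these numbers the killed backend (PID 6569) had roughly 3.3 GiB resident, shared memory included, on a machine with about 3.75 GiB of RAM and no swap.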

Update 2017-06-01 16:19 GMT

Changed the parameters to:

checkpoint_segments = 100 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30s # range 30s-1h

And I filled up the hard disk :( I raised checkpoint_segments generously without first checking the available space. Fortunately, I am testing this procedure in a non-production environment. So I may have to clone the production server again, or is there a way to free the temporary space that was used and is now wasted?

ERROR: could not extend file "base/16384/3940428": No space left on device
HINT: Check free disk space.
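For reference, the PostgreSQL 9.4 documentation gives the normal upper bound on the number of WAL segment files as (2 + checkpoint_completion_target) * checkpoint_segments + 1, at 16 MB each. Here is a small sketch of that arithmetic. It assumes checkpoint_completion_target is still at its default of 0.5 (the question does not show it) and that wal_keep_segments is not the larger term. The function name is made up.

```python
SEGMENT_MB = 16  # default WAL segment size


def max_wal_mb(checkpoint_segments: int, completion_target: float = 0.5) -> float:
    """Normal ceiling on pg_xlog size, per the 9.4 documentation formula."""
    files = (2 + completion_target) * checkpoint_segments + 1
    return files * SEGMENT_MB


print(max_wal_mb(3))    # 9.4 default of checkpoint_segments = 3
print(max_wal_mb(100))  # the value set above
```

Under those assumptions pg_xlog may grow to about 4 GB with the new setting, against roughly 136 MB at the default. The file named in the error lives under base/, which is table data, not WAL. The ALTER TABLE rewrite needs room for a new copy of each table, so WAL is probably only part of the space consumed.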

Following @deszo's question, the memory overcommit values are:

vm.nr_overcommit_hugepages = 0
vm.overcommit_kbytes = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
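These values mean the kernel is in heuristic overcommit mode (vm.overcommit_memory = 0), in which overcommit_ratio is not used. For comparison, under strict mode (vm.overcommit_memory = 2, which the PostgreSQL documentation suggests) the commit limit is swap plus overcommit_ratio percent of RAM. The rough sketch below computes that figure from the RAM and swap totals in the kernel dump. It assumes 4 kB pages and ignores huge pages (none are configured here). The function name is made up, and the kernel's own figure will differ slightly because it excludes reserved pages.

```python
def commit_limit_kb(ram_kb: int, swap_kb: int, overcommit_ratio: int) -> int:
    """Approximate CommitLimit under vm.overcommit_memory = 2 (no huge pages)."""
    return ram_kb * overcommit_ratio // 100 + swap_kb


ram_kb = 982941 * 4  # "982941 pages RAM", assuming 4 kB pages
limit = commit_limit_kb(ram_kb, swap_kb=0, overcommit_ratio=50)
print(f"{limit / 1024:.0f} MiB")
```

With no swap and a ratio of 50 the limit comes out below 2 GiB, so switching to strict mode on this machine would probably also mean raising overcommit_ratio or adding swap.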

Update 2017-06-01 18:10 GMT

The server instance is an AWS c4.large (2 vCPU, 3.75 GB RAM).

Some additional parameters from postgresql.conf:

shared_buffers = 2GB # min 128kB
work_mem = 32MB # min 64kB
max_connections = 800 # (change requires restart)
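As a back-of-the-envelope check of these settings against 3.75 GB of RAM: work_mem is a per-operation limit, so each sort or hash in each session may use up to that much. The sketch below counts only one such operation per connection, which makes it an optimistic estimate of the worst case. The function name is made up.

```python
def worst_case_mb(shared_buffers_mb: int, work_mem_mb: int, max_connections: int) -> int:
    """shared_buffers plus one work_mem-sized operation per connection."""
    return shared_buffers_mb + work_mem_mb * max_connections


total = worst_case_mb(shared_buffers_mb=2048, work_mem_mb=32, max_connections=800)
ram_mb = 3.75 * 1024  # c4.large
print(total, "MB potentially requested vs", ram_mb, "MB of RAM")
```

That is about 27 GB of potential demand on a 3.75 GB machine, and shared_buffers alone takes more than half of the RAM.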

Evan Carroll · Answer

I assume you are running PostgreSQL on AWS. If so, you should say so in the question and the tags.

The last process mentioned in the kernel dump is in fact aws, so that seems like a safe bet:

Jun 1 13:49:04 ip-10-1-0-9 kernel: [260956.963722] [16368] 1000 16368 7725 1044 19 0 0 aws

As far as PostgreSQL is concerned, this simply means a connection failure.

As for troubleshooting what triggers the oom_killer on aws, you have two options:

Lower work_mem below the threshold that triggers oom_killer. Whichever query then gets planned differently, or runs slower, because of the reduced memory is the one holding you up.
Set log_statement and read the last queries issued before the out-of-memory crash.
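The second option can be sketched as a small log scan. With log_statement = 'all' in postgresql.conf, each statement is logged on a line containing "statement: ", and the kill shows up as "was terminated by signal 9". The function below returns the last few statements seen before the first such kill line. Its name and the sample log lines are made up for illustration, and the matching may need adjusting to your log_line_prefix.

```python
from collections import deque


def statements_before_kill(lines, keep=5):
    """Return the last `keep` logged statements seen before a signal 9 kill."""
    recent = deque(maxlen=keep)
    marker = "statement: "
    for line in lines:
        if "was terminated by signal 9" in line:
            return list(recent)
        if marker in line:
            recent.append(line.split(marker, 1)[1].strip())
    return []  # no kill found in the log


# Hypothetical sample, shaped like the log entries in the question
sample = [
    "2017-06-01 13:40:00 UTC [6569-1] LOG:  statement: select 1;",
    "2017-06-01 13:41:00 UTC [6569-2] LOG:  statement: alter table t alter c type real;",
    "2017-06-01 13:49:04 UTC [4982-17] LOG:  server process (PID 6569) was terminated by signal 9: Killed",
]
print(statements_before_kill(sample))
```

In practice you would pass an open file handle for the server log (the question uses /var/log/postgresql/postgresql-9.4-main.log) instead of the sample list.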