web-dev-qa-db-fra.com

Random freezes on Kubuntu 20.04

This is the first time I have run into problems with Ubuntu on my machine. I recently replaced my PC's SSD with a brand-new one; it works really well under Windows and its firmware is up to date.

Hardware

  • Kingston A2000 NVMe 500GB (Btrfs and XFS)
  • Hybrid graphics (Intel HD 530, NVIDIA GeForce GTX 950M)

Software

  • NVIDIA driver 440 (from the official repositories, PRIME profile: on-demand)
  • CUDA driver (from the official repository)
  • Linux kernel 5.4.0-42-generic (Secure Boot enabled)

Sometimes, while I am using my laptop, KWin stops working: I cannot open the application launcher, although I can still switch windows with Alt+Tab. After a few seconds the screen freezes completely, I cannot move the mouse, and the temperature starts to rise. I cannot switch to another console to check the error (Ctrl+Alt+F2), and the only way to reboot is the Magic SysRq key + REISUB.

Relevant information about my system:

BIOS version

sudo dmidecode -s bios-version
E5CN63WW

RAM and swap data:

free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi       3,9Gi       7,0Gi       1,3Gi       4,5Gi        10Gi
Swap:         3,8Gi       1,8Gi       2,0Gi

Swappiness

sysctl vm.swappiness
vm.swappiness = 60

Reviewing the kernel log with journalctl -k -b -1 showed nothing relevant (to me), but I am attaching the messages with warnings or alerts below, in case I missed something.
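For anyone reproducing this, the same kind of output can be narrowed down with journalctl's priority filter and a plain grep. This is a minimal sketch, assuming a systemd journal that still holds the previous boot; the function name storage_msgs and its keyword list are illustrative and not part of the original post:

```shell
# Hypothetical helper: keep only kernel messages that mention storage
# (NVMe, ext4, XFS, Btrfs) or generic I/O errors. Reads log lines on stdin.
storage_msgs() {
    grep -Ei 'nvme|ext4-fs|xfs|btrfs|i/o error'
}

# Usage (previous boot, kernel messages of priority "warning" or worse):
#   journalctl -k -b -1 -p warning --no-pager | storage_msgs
```

The filter is deliberately loose; it only reduces noise and does not replace reading the full log.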

First log

aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 125: no longer affine to CPU1
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 140: no longer affine to CPU4
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 124: no longer affine to CPU6
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 128: no longer affine to CPU6
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: IRQ 138: no longer affine to CPU7
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI: button: The lid device is not compliant to SW_LID.
aug 11 20:49:22 josejacomeb-Lenovo-ideapad-700-15ISK kernel: iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
aug 11 20:49:23 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
aug 11 20:49:29 josejacomeb-Lenovo-ideapad-700-15ISK kernel: kauditd_printk_skb: 43 callbacks suppressed
aug 11 20:49:55 josejacomeb-Lenovo-ideapad-700-15ISK kernel: xfs filesystem being remounted at /run/systemd/unit-root/var/cache/private/fwupdmgr supports timestamps until 2038 (0x7fffffff)

Second log

aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: [Firmware Bug]: TPM Final Events table missing or invalid
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel:  #5 #6 #7
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0._PPC], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0._PCT], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0._PSS], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.LPSS], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.TPSS], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.PSDF], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0._PSD], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.HPSD], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI BIOS Error (bug): Failure creating named object [\_PR.CPU0.SPSD], AE_ALREADY_EXISTS (20190816/dswload2-326)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20190816/psobject-220)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform MSFT0101:00: failed to claim resource 1: [mem 0xfed40000-0xfed40fff]
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: acpi MSFT0101:00: platform device creation failed: -16
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: usb: port power management may be unreliable
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: EISA: Cannot allocate resource for mainboard
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 1
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 2
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 3
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 4
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 5
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 6
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 7
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: platform eisa.0: Cannot allocate resource for EISA slot 8
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: acpi PNP0C14:02: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:01)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: r8169 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: nvme nvme0: missing or invalid SUBNQN field.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: xfs filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: asus_wmi: ASUS Management GUID not found
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: uvcvideo 1-5:1.0: Entity type for entity Realtek Extended Controls Unit was not initialized!
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: uvcvideo 1-5:1.0: Entity type for entity Extension 4 was not initialized!
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: uvcvideo 1-5:1.0: Entity type for entity Processing 2 was not initialized!
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: uvcvideo 1-5:1.0: Entity type for entity Camera 1 was not initialized!
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: nvidia: loading out-of-tree module taints kernel.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: nvidia: module license 'NVIDIA' taints kernel.
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Disabling lock debugging due to kernel taint
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: thermal thermal_zone3: failed to read out thermal zone (-61)
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  440.100  Fri May 29 08:45:51 UTC 2020
aug 11 21:01:58 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
aug 11 21:02:01 josejacomeb-Lenovo-ideapad-700-15ISK kernel: iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
aug 11 21:02:01 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20190816/nsarguments-59)

Third log

aug 11 21:44:10 josejacomeb-Lenovo-ideapad-700-15ISK kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  440.100  Fri May 29 08:45:51 UTC 2020
aug 11 21:44:10 josejacomeb-Lenovo-ideapad-700-15ISK kernel: thermal thermal_zone3: failed to read out thermal zone (-61)
aug 11 21:44:10 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
aug 11 21:44:13 josejacomeb-Lenovo-ideapad-700-15ISK kernel: iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
aug 11 21:44:13 josejacomeb-Lenovo-ideapad-700-15ISK kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20190816/nsarguments-59)
aug 11 22:21:25 josejacomeb-Lenovo-ideapad-700-15ISK kernel: iwlwifi 0000:02:00.0: FW already configured (0) - re-configuring
aug 11 22:21:26 josejacomeb-Lenovo-ideapad-700-15ISK kernel: Bluetooth: hci0: unexpected event for opcode 0xfc2f
aug 11 22:22:31 josejacomeb-Lenovo-ideapad-700-15ISK kernel: kauditd_printk_skb: 37 callbacks suppressed

Update

I reinstalled Kubuntu 20.04.1 with an ext4 partition. It appears to be an SSD error; new information is below:

  • nvme0n1p5: / partition
  • nvme0n1p4: /home partition

It happens randomly while I am using my PC, and the computer freezes completely.

[ 3378.408344] systemd-journald[423]: Failed to write entry (22 items, 780 bytes), ignoring: Read-only
[ 3378.408611] systemd-journald[423]: Failed to write entry (22 items, 769 bytes), ignoring: Read-only

Another log from the freeze error.

[ 827.214225] EXT4-fs error (device nvme0n1p5): __ext4_find_entry:1531: inode #3407921: comm gmain: reading directory lblock 0
[ 827.214749] EXT4-fs error (device nvme0n1p5): __ext4_find_entry:1531: inode #3407921: comm gmain: reading directory lblock 0
[ 827.214764] EXT4-fs error (device nvme0n1p5): __ext4_find_entry:1531: inode #3407921: comm gmain: reading directory lblock 0

Sometimes, when I shut down my laptop, this error occurs

[ 16918.166564] systemd-shutdown[1]: Remounting '/' timed out, issuing SIGKILL to PID 11240.
[ 16982.141788] nvme nvme0: Device not ready; aborting reset
[ 16982.143784] nvme : Removing after probe failure status: -19

Update 2

Using the Kubuntu live ISO, I ran fsck; no problems were found.

root@kubuntu:/home/kubuntu# fsck /dev/nvme0n1p3 
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
/dev/nvme0n1p3: clean, 257827/6111232 files, 8741020/24413952 blocks
root@kubuntu:/home/kubuntu# echo $?
0
root@kubuntu:/home/kubuntu# fsck /dev/nvme0n1p5
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
/dev/nvme0n1p5: clean, 754959/6447104 files, 10749435/25785856 blocks
root@kubuntu:/home/kubuntu# echo $?
0

Problem when rebooting

nvme nvme0: Device not ready; aborting reset
nvme nvme0: Abort status: 0x371
nvme nvme0: Abort status: 0x371
nvme nvme0: Abort status: 0x371
Remounting '/' timed out, issuing SIGKILL to PID 7544.

The SMART analysis is as follows:

sudo smartctl -i /dev/nvme0
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-42-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Kingston SA2000M8500G
Serial Number:                      50026B7683BC98CE
Firmware Version:                   S5Z42105
PCI Vendor/Subsystem ID:            0x2646
IEEE OUI Identifier:                0x0026b7
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          500.107.862.016 [500 GB]
Namespace 1 Utilization:            142.133.460.992 [142 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            0026b7 683bc98ce5
Local Time is:                      Wed Aug 26 23:49:45 2020 CEST


sudo smartctl -a /dev/nvme0
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-42-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Kingston SA2000M8500G
Serial Number:                      50026B7683BC98CE
Firmware Version:                   S5Z42105
PCI Vendor/Subsystem ID:            0x2646
IEEE OUI Identifier:                0x0026b7
Controller ID:                      1
Number of Namespaces:               1
Namespace 1 Size/Capacity:          500.107.862.016 [500 GB]
Namespace 1 Utilization:            142.114.676.736 [142 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            0026b7 683bc98ce5
Local Time is:                      Wed Aug 26 23:51:50 2020 CEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         32 Pages
Warning  Comp. Temp. Threshold:     75 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0
 1 +     4.60W       -        -    1  1  1  1        0       0
 2 +     3.80W       -        -    2  2  2  2        0       0
 3 -   0.0450W       -        -    3  3  3  3     2000    2000
 4 -   0.0040W       -        -    4  4  4  4    15000   15000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        30 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    3.966.522 [2,03 TB]
Data Units Written:                 6.036.943 [3,09 TB]
Host Read Commands:                 38.899.250
Host Write Commands:                46.064.389
Controller Busy Time:               601
Power Cycles:                       390
Power On Hours:                     241
Unsafe Shutdowns:                   160
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Thermal Temp. 1 Transition Count:   7
Thermal Temp. 1 Total Time:         24

Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged
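
The "-" rows in the Supported Power States table above (states 3 and 4) are non-operational states, each with an entry and an exit latency in microseconds. A small sketch that lists them from smartctl -a output with the two latencies added together; the function name nonop_states is hypothetical and the parsing assumes the column layout shown above:

```shell
# Hypothetical helper: read `smartctl -a` output on stdin and print each
# non-operational power state (Op column is "-") followed by the sum of
# its entry and exit latencies (the last two columns), in microseconds.
nonop_states() {
    awk '$1 ~ /^[0-9]+$/ && $2 == "-" { print $1, $(NF-1) + $NF }'
}

# Usage:
#   sudo smartctl -a /dev/nvme0 | nonop_states
```

For the table above this gives 4000 µs for state 3 and 30000 µs for state 4. These are the low-power states the drive can drop into when idle.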

SSD Firmware

Thanks for reading. What did I do wrong? Any comment is really appreciated!

Regards

José Jácome

The problem was caused by an SSD feature: Autonomous Power State Transitions (APST) were causing the freezes. To mitigate it until a fix is released, add nvme_core.default_ps_max_latency_us=0 to the GRUB_CMDLINE_LINUX_DEFAULT options in /etc/default/grub, then run sudo update-grub and reboot. For example:

GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0"
GRUB_CMDLINE_LINUX=""
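
The same edit can be scripted. The following is only a sketch, assuming GNU sed and a GRUB_CMDLINE_LINUX_DEFAULT line whose value is wrapped in double quotes; the helper name add_kernel_param is hypothetical and not part of the original answer:

```shell
# Hypothetical helper: append one kernel parameter to
# GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults file.
#   $1 = path to the file (normally /etc/default/grub)
#   $2 = parameter to add
add_kernel_param() {
    file=$1
    param=$2
    # Leave the file alone if the parameter is already on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file"
}

# Usage, from a root shell (keep a backup of the file first):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param /etc/default/grub nvme_core.default_ps_max_latency_us=0
#   update-grub
# After rebooting, the value in effect can be read back with:
#   cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
```

Calling the helper twice leaves the line unchanged the second time, so it is safe to re-run.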
José Jácome