SlideShare a Scribd company logo
Intro to Kernel Debugging: Just make the crashing stop!
Welcome. Here’s what’s in store if you stick around.
We’ll Introduce:
● Gathering debug information
● Kernel development processes
● Oops analysis
● Code inspection
● Git tricks for finding fixes
● Engaging the kernel community
● How to dive deeper into debugging
Dave Chiluk, Linux Platform Engineer at Indeed
We’ll also cover a case study of a real-life
XFS filesystem corruption bug
Dave Chiluk
Intro to Kernel Debugging:
Just Make the Crashing Stop!
Linux Platform Engineer, Indeed
Intro to Kernel Debugging:
Just Make the Crashing Stop!
We’ll Introduce:
● Gathering debug information
● Kernel development processes
● Oops analysis
● Code inspection
● Git tricks for finding fixes
● Engaging the kernel community
● How to dive deeper into debugging
We’ll walk through a real-life case study.
Intro to Kernel Debugging - Just make the crashing stop!
Intro to Kernel Debugging - Just make the crashing stop!
Dave Chiluk
● Indeed.com: Linux Platform Engineer
○ Fix issues in the open-source code that Indeed uses
● Canonical: Ubuntu Sustaining Engineering
○ Supported Customers with Ubuntu Kernel problems
○ Ubuntu core developer
Scale Up
● Servers are like pets
● You name them, and when they get sick, you nurse them
back to health
Scale Out
● Servers are like cattle
● You number them, and when they get sick, you shoot them
Bill Baker, Distinguished Engineer, Microsoft
- Scaling SQL Server 2012 by Glenn Berry
Pets vs. Cattle
Pets vs. Cattle… and Wolves
Scale Up
● Servers are like pets
● You name them, and when they get sick, you nurse them
back to health
Scale Out
● Servers are like cattle
● You number them, and when they get sick, you shoot them
● If wolves start eating too many cattle, shoot them too.
Case Study: An xfs crash
● SYSENG-2163: Rekick dc1-srv3
Description: "We need to rekick dc1-srv3 due to filesystem corruption.
Currently this host is in downtime and removed from the mesos cluster."
● SYSENG-2336: Rekick dc2-srv11
● SYSENG-2342: Rekick dc2-srv13
● SYSENG-2398: replace dc1-srv3; /var is corrupt for the n'th time
● SYSENG-2624: Rekick dc3-srv7: /var corrupted
● SYSENG-2723: Rekick dc1-srv6
● SYSENG-2770: /var corrupted on dc1-srv16
● SYSENG-2802: Fix the corrupt disk issue on dc1-srv10 (kernel bug)
● SYSENG-2849: dc4-srv5/6 var corruption
● SYSENG-2850: Monitor on corrupt filesystems
● SYSENG-3056: dc2-srv28 /var corruption by xfs bug
Step 1: Gather Information
Step 1: Gather Information
XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of
file fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x35d/0x7a0
[xfs]
CPU: 18 PID: 9896 Comm: mesos-slave Not tainted
4.10.10-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro PIO-618U-TR4T+-ST031/X10DRU-i+, BIOS 2.0
12/17/2015
Call Trace:
dump_stack+0x63/0x87
xfs_error_report+0x3b/0x40 [xfs]
? xfs_free_ag_extent+0x35d/0x7a0 [xfs]
xfs_btree_insert+0x1b0/0x1c0 [xfs]
xfs_free_ag_extent+0x35d/0x7a0 [xfs]
xfs_free_extent+0xbb/0x150 [xfs]
xfs_trans_free_extent+0x4f/0x110 [xfs]
? xfs_trans_add_item+0x5d/0x90 [xfs]
xfs_extent_free_finish_item+0x26/0x40 [xfs]
xfs_defer_finish+0x149/0x410 [xfs]
xfs_remove+0x281/0x330 [xfs]
xfs_vn_unlink+0x55/0xa0 [xfs]
vfs_rmdir+0xb6/0x130
do_rmdir+0x1b3/0x1d0
SyS_rmdir+0x16/0x20
do_syscall_64+0x67/0x180
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f85d8d92397
RSP: 002b:00007f85cef9b758 EFLAGS: 00000246 ORIG_RAX: 0000000000000054
RAX: ffffffffffffffda RBX: 00007f858c00b4c0 RCX: 00007f85d8d92397
RDX: 00007f858c09ad70 RSI: 0000000000000000 RDI: 00007f858c09ad70
RBP: 00007f85cef9bc30 R08: 0000000000000001 R09: 0000000000000002
R10: 0000006f74656c67 R11: 0000000000000246 R12: 00007f85cef9c640
R13: 00007f85cef9bc50 R14: 00007f85cef9bcc0 R15: 00007f85cef9bc40
XFS (dm-4): xfs_do_force_shutdown(0x8) called from line 236 of file
fs/xfs/libxfs/xfs_defer.c. Return address = 0xffffffffa028f087
XFS (dm-4): Corruption of in-memory data detected. Shutting down
filesystem
XFS (dm-4): Please umount the filesystem and rectify the problem(s)
● Kernel Version
● Logs
○ First Oops
○ /var/log
○ Console output
○ rsyslog if necessary
● crashdump
● sosreport
● sar
Filesystem Errors
Dump the filesystem - dd,
physical removal
xfs-metadump
xfs-restore
Network Errors
tcpdump
wireshark
Step 1: Gather Information
Device Driver Errors
Firmware level?
Module arguments?
modinfo output
For Subsystem Issues
Many others ...
Step 2: Get the Sources
Find the exact sources
used to build your kernel.
Get the Sources: Kernel Development
4.17 4.18-rc1 4.18-rc2 ... v4.18
Mainline Kernel Development
● All active kernel development eventually gets merged here
● git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
● Currently Maintained by Greg Kroah-Hartman (previously Linus Torvalds)
● 14432 Patches from v4.17 (June 3, 2018) to v4.18 (Aug 12, 2018) - 10 WEEKS!
New features, bug fixes
Linux Mainline Tree
Get the Sources: Kernel Development
Linux Mainline Tree
4.17 4.18-rc1 4.18-rc2 ... 4.18
Subsystem Development trees
● Maintained by “Lieutenant” Subsystem Maintainers
● XFS https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
● Networking git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
xfs-linux
net-next
Get the Sources
Linux Mainline Tree
Stable Kernels
● Release kernels + bug fixes
● Maintained by Greg Kroah-Hartman
● git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
● One repository many branches.
● Short-term stable and long-term stable
4.2
Linux Stable 4.2 (STS)
Bug fixes
4.3
Linux Stable 4.3 (STS)
4.5
Linux Stable 4.5 (STS)
Linux Stable 4.4 (LTS to Feb 2022)
4.4
Mainline or Stable Kernels
● git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
make old config
make; make modules_install; make install
● update-initramfs
● update-grub
Prebuilt mainline kernels are available for many distributions
● https://wiki.ubuntu.com/Kernel/MainlineBuilds
● http://elrepo.org - Centos linux-stable kernels.
Step 2: Get the Sources
Indeed uses and contributes to elrepo kernels
Get the Sources
Linux Stable
Centos 7
3.10.0 4.13 4.14 4.15 4.16 4.17 4.18
Fedora 27
Ubuntu 17.10
Ubuntu 18.04
Fedora 28
Ubuntu 18.10
Centos 7
Distribution Kernels
● Typically branched from Linux-stable kernels, but not necessarily LTS kernels
● Follow linux-stable process + feature work
● Own maintainers
Step 2: Get the Sources
Centos
● $ git clone https://git.centos.org/summary/rpms!kernel.git
$ git clone https://git.centos.org/git/centos-git-common.git
$ cd kernel && git checkout c7
$ centos-git-common/get_sources.sh
$ rpm-build -ba
● Provides an RPM-centric source tree, a source tarball, and a
bunch of individual patches that are applied
● This is not a “real” git repository
Step 2: Get the Sources
Ubuntu / Debian
● apt-get source linux-image-$(uname-r)
● http://kernel.ubuntu.com/git/?q=ubuntu%2F
● git clone git://kernel.ubuntu.com/ubuntu/ubuntu-*
● rebuild
Step 2: Get the Sources
Debuginfo is unstripped versions of vmlinux
● This is needed if you want to run crash against a crashdump or
do register analysis against the stack trace in your oops.
Centos/ RHEL
● yum --enablerepo=base-debuginfo install -y
kernel-debuginfo-$(uname -r)
Ubuntu
● https://wiki.ubuntu.com/Debug%20Symbol%20Packages
Debug Information
Step 2: Get the Sources
Kernel Structure
documentation/
process/ - how to interact with the community
admin-guide/ - the manual
admin-guide/bug-hunting.rst
mm/ - Memory Management
net/ - Network
fs/ - Filesystems
arch/ - Architecture Specific
drivers/ - Device Drivers
firmware/ - Binary Blobs
scripts/
get_maintainer.pl
checkpatch.pl
Start here!
Step 3: Oops Analysis
Step 3: Oops Analysis - The Toolkit
BUG: unable to handle kernel NULL pointer dereference at
(null)
IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
PGD 1a3aa8067
PUD 1a3b3d067
PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc
videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev
x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl
btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm
snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel
arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw
ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel
snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169
rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video
soundcore mei i2c_smbus wmi i2c_core mfd_core
CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34
Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206
08/21/2012
task: ffff8801a639ba00 task.stack: ffffc900014cc000
RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>]
__mutex_lock_slowpath+0x98/0x120
RSP: 0018:ffffc900014cfce0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8801a54315b0 RCX: 00000000c0000100
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8801a54315b4
RBP: ffffc900014cfd30 R08: 0000000000000000 R09: 0000000000000002
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801a54315b4
R13: ffff8801a639ba00 R14: 00000000ffffffff R15: ffff8801a54315b8
FS: 00007faa254fb700(0000) GS:ffff8801aef80000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001a3b1b000 CR4: 00000000001406e0
Stack:
ffff8801a54315b8 0000000000000000 ffffffff814733ae ffffc900014cfd28
ffffffff8146a28c ffff8801a54315b0 0000000000000000 ffff8801a54315b0
ffff8801a66f3820 0000000000000000 ffffc900014cfd48 ffffffff816c73e7
Call Trace:
[<ffffffff814733ae>] ? acpi_ut_release_mutex+0x5d/0x61
[<ffffffff8146a28c>] ? acpi_ns_get_node+0x49/0x52
[<ffffffff816c73e7>] mutex_lock+0x17/0x30
[<ffffffffa00a3bb4>] asus_rfkill_hotplug+0x24/0x1a0 [asus_wmi]
[<ffffffffa00a4421>] asus_wmi_rfkill_exit+0x61/0x150 [asus_wmi]
[<ffffffffa00a49f1>] asus_wmi_remove+0x61/0xb0 [asus_wmi]
[<ffffffff814a5128>] platform_drv_remove+0x28/0x40
[<ffffffff814a2901>] __device_release_driver+0xa1/0x160
[<ffffffff814a29e3>] device_release_driver+0x23/0x30
[<ffffffff814a1ffd>] bus_remove_device+0xfd/0x170
[<ffffffff8149e5a9>] device_del+0x139/0x270
[<ffffffff814a5028>] platform_device_del+0x28/0x90
[<ffffffff814a50a2>] platform_device_unregister+0x12/0x30
[<ffffffffa00a4209>] asus_wmi_unregister_driver+0x19/0x30 [asus_wmi]
[<ffffffffa00da0ea>] asus_nb_wmi_exit+0x10/0xf26 [asus_nb_wmi]
[<ffffffff8110c692>] SyS_delete_module+0x192/0x270
[<ffffffff810022b2>] ? exit_to_usermode_loop+0x92/0xa0
[<ffffffff816ca560>] entry_SYSCALL_64_fastpath+0x13/0x94
Code: e8 5e 30 00 00 8b 03 83 f8 01 0f 84 93 00 00 00 48 8b 43 10 4c
8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08
<48> 89 20 4c 89 6c 24 10 eb 1d 4c 89 e7 49 c7 45 08 02 00 00 00
RIP [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
RSP <ffffc900014cfce0>
CR2: 0000000000000000
---[ end trace 8d484233fa7cb512 ]---
note: modprobe[3275] exited with preempt_count 2
Step 3: Oops Analysis - The Toolkit
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
PGD 1a3aa8067
PUD 1a3b3d067
PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel
bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel
arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211
ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill
mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core
CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34
Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012
task: ffff8801a639ba00 task.stack: ffffc900014cc000
RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
Step 3: Oops Analysis - The Toolkit
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
PGD 1a3aa8067
PUD 1a3b3d067
PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel
bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel
arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211
ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill
mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core
CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34
Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012
task: ffff8801a639ba00 task.stack: ffffc900014cc000
RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
Step 3: Oops Analysis - The Toolkit
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
PGD 1a3aa8067
PUD 1a3b3d067
PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel
bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel
arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211
ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill
mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core
CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34
Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012
task: ffff8801a639ba00 task.stack: ffffc900014cc000
RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
Step 3: Oops Analysis - The Toolkit
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
PGD 1a3aa8067
PUD 1a3b3d067
PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel
bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel
arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211
ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill
mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core
CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34
Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012
task: ffff8801a639ba00 task.stack: ffffc900014cc000
RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
Step 3: Oops Analysis - The Toolkit
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
PGD 1a3aa8067
PUD 1a3b3d067
PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel
bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel
arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211
ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill
mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core
CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34
Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012
task: ffff8801a639ba00 task.stack: ffffc900014cc000
RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
Step 3: Oops Analysis - The Toolkit
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
PGD 1a3aa8067
PUD 1a3b3d067
PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel
bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel
arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211
ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill
mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core
CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34
Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012
task: ffff8801a639ba00 task.stack: ffffc900014cc000
RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
Step 3: Oops Analysis - The Toolkit
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
PGD 1a3aa8067
PUD 1a3b3d067
PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel
bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel
arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211
ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill
mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core
CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34
Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012
task: ffff8801a639ba00 task.stack: ffffc900014cc000
RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
Step 3: Oops Analysis - The Toolkit
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
PGD 1a3aa8067
PUD 1a3b3d067
PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel
bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel
arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211
ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill
mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core
CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34
Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012
task: ffff8801a639ba00 task.stack: ffffc900014cc000
RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
Step 3: Oops Analysis - The Toolkit
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
PGD 1a3aa8067
PUD 1a3b3d067
PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel
bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel
arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211
ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill
mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core
CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34
Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012
task: ffff8801a639ba00 task.stack: ffffc900014cc000
RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
Step 3: Oops Analysis - The Toolkit
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
PGD 1a3aa8067
PUD 1a3b3d067
PMD 0
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2
videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel
bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel
arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211
ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill
mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core
CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34
Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012
task: ffff8801a639ba00 task.stack: ffffc900014cc000
RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
Step 3: Oops Analysis - The Toolkit
RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
RSP: 0018:ffffc900014cfce0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8801a54315b0 RCX: 00000000c0000100
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8801a54315b4
RBP: ffffc900014cfd30 R08: 0000000000000000 R09: 0000000000000002
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801a54315b4
R13: ffff8801a639ba00 R14: 00000000ffffffff R15: ffff8801a54315b8
FS: 00007faa254fb700(0000) GS:ffff8801aef80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000001a3b1b000 CR4: 00000000001406e0
$ objdump -d -S -l ./vmlinux-4.15.0-22-generic > /tmp/objdump.out
ffffffff81992130 <__mutex_lock_slowpath>:
__mutex_lock_slowpath():
/build/linux-lZKWha/linux-4.15.0/kernel/locking/mutex.c:1130
ffffffff81992130: e8 3b fb 06 00 callq ffffffff81a01c70 <__fentry__>
ffffffff81992135: 55 push %rbp
/build/linux-lZKWha/linux-4.15.0/kernel/locking/mutex.c:1131
ffffffff81992136: be 02 00 00 00 mov $0x2,%esi
/build/linux-lZKWha/linux-4.15.0/kernel/locking/mutex.c:1130
ffffffff8199213b: 48 89 e5 mov %rsp,%rbp
Step 3: Oops Analysis - The Toolkit
Register to argument mapping defined in:
[linux]/arch/x86/entry/calling.h
x86 function call convention, 64-bit:
Step 3: Oops Analysis - The Toolkit
Stack:
ffff8801a54315b8 0000000000000000 ffffffff814733ae ffffc900014cfd28
ffffffff8146a28c ffff8801a54315b0 0000000000000000 ffff8801a54315b0
ffff8801a66f3820 0000000000000000 ffffc900014cfd48 ffffffff816c73e7
Call Trace:
[<ffffffff814733ae>] ? acpi_ut_release_mutex+0x5d/0x61
[<ffffffff8146a28c>] ? acpi_ns_get_node+0x49/0x52
[<ffffffff816c73e7>] mutex_lock+0x17/0x30
[<ffffffffa00a3bb4>] asus_rfkill_hotplug+0x24/0x1a0 [asus_wmi]
[<ffffffffa00a4421>] asus_wmi_rfkill_exit+0x61/0x150 [asus_wmi]
[<ffffffffa00a49f1>] asus_wmi_remove+0x61/0xb0 [asus_wmi]
[<ffffffff814a5128>] platform_drv_remove+0x28/0x40
[<ffffffff814a2901>] __device_release_driver+0xa1/0x160
[<ffffffff814a29e3>] device_release_driver+0x23/0x30
[<ffffffff814a1ffd>] bus_remove_device+0xfd/0x170
[<ffffffff8149e5a9>] device_del+0x139/0x270
[<ffffffff814a5028>] platform_device_del+0x28/0x90
[<ffffffff814a50a2>] platform_device_unregister+0x12/0x30
[<ffffffffa00a4209>] asus_wmi_unregister_driver+0x19/0x30 [asus_wmi]
[<ffffffffa00da0ea>] asus_nb_wmi_exit+0x10/0xf26 [asus_nb_wmi]
[<ffffffff8110c692>] SyS_delete_module+0x192/0x270
[<ffffffff810022b2>] ? exit_to_usermode_loop+0x92/0xa0
[<ffffffff816ca560>] entry_SYSCALL_64_fastpath+0x13/0x94
Step 3: Oops Analysis - The Toolkit
Stack:
ffff8801a54315b8 0000000000000000 ffffffff814733ae ffffc900014cfd28
ffffffff8146a28c ffff8801a54315b0 0000000000000000 ffff8801a54315b0
ffff8801a66f3820 0000000000000000 ffffc900014cfd48 ffffffff816c73e7
Call Trace:
[<ffffffff814733ae>] ? acpi_ut_release_mutex+0x5d/0x61
[<ffffffff8146a28c>] ? acpi_ns_get_node+0x49/0x52
[<ffffffff816c73e7>] mutex_lock+0x17/0x30
[<ffffffffa00a3bb4>] asus_rfkill_hotplug+0x24/0x1a0 [asus_wmi]
[<ffffffffa00a4421>] asus_wmi_rfkill_exit+0x61/0x150 [asus_wmi]
[<ffffffffa00a49f1>] asus_wmi_remove+0x61/0xb0 [asus_wmi]
[<ffffffff814a5128>] platform_drv_remove+0x28/0x40
[<ffffffff814a2901>] __device_release_driver+0xa1/0x160
[<ffffffff814a29e3>] device_release_driver+0x23/0x30
[<ffffffff814a1ffd>] bus_remove_device+0xfd/0x170
[<ffffffff8149e5a9>] device_del+0x139/0x270
[<ffffffff814a5028>] platform_device_del+0x28/0x90
[<ffffffff814a50a2>] platform_device_unregister+0x12/0x30
[<ffffffffa00a4209>] asus_wmi_unregister_driver+0x19/0x30 [asus_wmi]
[<ffffffffa00da0ea>] asus_nb_wmi_exit+0x10/0xf26 [asus_nb_wmi]
[<ffffffff8110c692>] SyS_delete_module+0x192/0x270
[<ffffffff810022b2>] ? exit_to_usermode_loop+0x92/0xa0
[<ffffffff816ca560>] entry_SYSCALL_64_fastpath+0x13/0x94
Step 3: Oops Analysis - The Toolkit
Stack:
ffff8801a54315b8 0000000000000000 ffffffff814733ae ffffc900014cfd28
ffffffff8146a28c ffff8801a54315b0 0000000000000000 ffff8801a54315b0
ffff8801a66f3820 0000000000000000 ffffc900014cfd48 ffffffff816c73e7
Call Trace:
[<ffffffff814733ae>] ? acpi_ut_release_mutex+0x5d/0x61
[<ffffffff8146a28c>] ? acpi_ns_get_node+0x49/0x52
[<ffffffff816c73e7>] mutex_lock+0x17/0x30
[<ffffffffa00a3bb4>] asus_rfkill_hotplug+0x24/0x1a0 [asus_wmi]
[<ffffffffa00a4421>] asus_wmi_rfkill_exit+0x61/0x150 [asus_wmi]
[<ffffffffa00a49f1>] asus_wmi_remove+0x61/0xb0 [asus_wmi]
[<ffffffff814a5128>] platform_drv_remove+0x28/0x40
[<ffffffff814a2901>] __device_release_driver+0xa1/0x160
[<ffffffff814a29e3>] device_release_driver+0x23/0x30
[<ffffffff814a1ffd>] bus_remove_device+0xfd/0x170
[<ffffffff8149e5a9>] device_del+0x139/0x270
[<ffffffff814a5028>] platform_device_del+0x28/0x90
[<ffffffff814a50a2>] platform_device_unregister+0x12/0x30
[<ffffffffa00a4209>] asus_wmi_unregister_driver+0x19/0x30 [asus_wmi]
[<ffffffffa00da0ea>] asus_nb_wmi_exit+0x10/0xf26 [asus_nb_wmi]
[<ffffffff8110c692>] SyS_delete_module+0x192/0x270
[<ffffffff810022b2>] ? exit_to_usermode_loop+0x92/0xa0
[<ffffffff816ca560>] entry_SYSCALL_64_fastpath+0x13/0x94 Bottom / oldest
Top / most recent
Step 3: Oops Analysis - The Toolkit
Stack:
ffff8801a54315b8 0000000000000000 ffffffff814733ae ffffc900014cfd28
ffffffff8146a28c ffff8801a54315b0 0000000000000000 ffff8801a54315b0
ffff8801a66f3820 0000000000000000 ffffc900014cfd48 ffffffff816c73e7
Call Trace:
[<ffffffff814733ae>] ? acpi_ut_release_mutex+0x5d/0x61
[<ffffffff8146a28c>] ? acpi_ns_get_node+0x49/0x52
[<ffffffff816c73e7>] mutex_lock+0x17/0x30
[<ffffffffa00a3bb4>] asus_rfkill_hotplug+0x24/0x1a0 [asus_wmi]
[<ffffffffa00a4421>] asus_wmi_rfkill_exit+0x61/0x150 [asus_wmi]
[<ffffffffa00a49f1>] asus_wmi_remove+0x61/0xb0 [asus_wmi]
[<ffffffff814a5128>] platform_drv_remove+0x28/0x40
[<ffffffff814a2901>] __device_release_driver+0xa1/0x160
[<ffffffff814a29e3>] device_release_driver+0x23/0x30
[<ffffffff814a1ffd>] bus_remove_device+0xfd/0x170
[<ffffffff8149e5a9>] device_del+0x139/0x270
[<ffffffff814a5028>] platform_device_del+0x28/0x90
[<ffffffff814a50a2>] platform_device_unregister+0x12/0x30
[<ffffffffa00a4209>] asus_wmi_unregister_driver+0x19/0x30 [asus_wmi]
[<ffffffffa00da0ea>] asus_nb_wmi_exit+0x10/0xf26 [asus_nb_wmi]
[<ffffffff8110c692>] SyS_delete_module+0x192/0x270
[<ffffffff810022b2>] ? exit_to_usermode_loop+0x92/0xa0
[<ffffffff816ca560>] entry_SYSCALL_64_fastpath+0x13/0x94
Step 3: Oops Analysis - The Toolkit
Code: A hex-dump of the section of machine code that was being run at the time the Oops occurred.
Code: e8 5e 30 00 00 8b 03 83 f8 01 0f 84 93 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff ff ff
4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 eb 1d 4c 89 e7 49 c7 45 08 02 00 00 00
$ echo "Code: e8 5e 30 00 00 8b 03 83 f8 01 0f 84 93 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff
ff ff 4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 eb 1d 4c 89 e7 49 c7 45 08 02 00 00 00
" | ~/src/linux/scripts/decodecode
Code: e8 5e 30 00 00 8b 03 83 f8 01 0f 84 93 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff ff ff
4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 eb 1d 4c 89 e7 49 c7 45 08 02 00 00 00
All code
========
0: e8 5e 30 00 00 callq 0x3063
5: 8b 03 mov (%rbx),%eax
7: 83 f8 01 cmp $0x1,%eax
a: 0f 84 93 00 00 00 je 0xa3
10: 48 8b 43 10 mov 0x10(%rbx),%rax
14: 4c 8d 7b 08 lea 0x8(%rbx),%r15
18: 48 89 63 10 mov %rsp,0x10(%rbx)
1c: 41 be ff ff ff ff mov $0xffffffff,%r14d
22: 4c 89 3c 24 mov %r15,(%rsp)
26: 48 89 44 24 08 mov %rax,0x8(%rsp)
2b:* 48 89 20 mov %rsp,(%rax) <-- trapping instruction
Case Study: Log Output
XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file fs/xfs/libxfs/xfs_btree.c. Caller
xfs_free_ag_extent+0x35d/0x7a0 [xfs]
CPU: 18 PID: 9896 Comm: mesos-slave Not tainted 4.10.10-1.el7.elrepo.x86_64 #1
Hardware name: Supermicro PIO-618U-TR4T+-ST031/X10DRU-i+, BIOS 2.0 12/17/2015
Call Trace:
dump_stack+0x63/0x87
xfs_error_report+0x3b/0x40 [xfs]
? xfs_free_ag_extent+0x35d/0x7a0 [xfs]
xfs_btree_insert+0x1b0/0x1c0 [xfs]
xfs_free_ag_extent+0x35d/0x7a0 [xfs]
xfs_free_extent+0xbb/0x150 [xfs]
xfs_trans_free_extent+0x4f/0x110 [xfs]
? xfs_trans_add_item+0x5d/0x90 [xfs]
xfs_extent_free_finish_item+0x26/0x40 [xfs]
xfs_defer_finish+0x149/0x410 [xfs]
xfs_remove+0x281/0x330 [xfs]
xfs_vn_unlink+0x55/0xa0 [xfs]
vfs_rmdir+0xb6/0x130
do_rmdir+0x1b3/0x1d0
SyS_rmdir+0x16/0x20
do_syscall_64+0x67/0x180
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f85d8d92397
RSP: 002b:00007f85cef9b758 EFLAGS: 00000246 ORIG_RAX: 0000000000000054
RAX: ffffffffffffffda RBX: 00007f858c00b4c0 RCX: 00007f85d8d92397
RDX: 00007f858c09ad70 RSI: 0000000000000000 RDI: 00007f858c09ad70
RBP: 00007f85cef9bc30 R08: 0000000000000001 R09: 0000000000000002
R10: 0000006f74656c67 R11: 0000000000000246 R12: 00007f85cef9c640
R13: 00007f85cef9bc50 R14: 00007f85cef9bcc0 R15: 00007f85cef9bc40
XFS (dm-4): xfs_do_force_shutdown(0x8) called from line 236 of file fs/xfs/libxfs/xfs_defer.c. Return address =
0xffffffffa028f087
XFS (dm-4): Corruption of in-memory data detected. Shutting down filesystem
XFS (dm-4): Please umount the filesystem and rectify the problem(s)
Provides exact Kernel Version + file + line number !
XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file
fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x35d/0x7a0 [xfs]
CPU: 18 PID: 9896 Comm: mesos-slave Not tainted 4.10.10-1.el7.elrepo.x86_64 #1
Call Trace:
dump_stack+0x63/0x87
xfs_error_report+0x3b/0x40 [xfs]
? xfs_free_ag_extent+0x35d/0x7a0 [xfs]
xfs_btree_insert+0x1b0/0x1c0 [xfs]
xfs_free_ag_extent+0x35d/0x7a0 [xfs]
xfs_free_extent+0xbb/0x150 [xfs]
xfs_trans_free_extent+0x4f/0x110 [xfs]
? xfs_trans_add_item+0x5d/0x90 [xfs]
xfs_extent_free_finish_item+0x26/0x40 [xfs]
xfs_defer_finish+0x149/0x410 [xfs]
xfs_remove+0x281/0x330 [xfs]
xfs_vn_unlink+0x55/0xa0 [xfs]
vfs_rmdir+0xb6/0x130
do_rmdir+0x1b3/0x1d0
SyS_rmdir+0x16/0x20
do_syscall_64+0x67/0x180
entry_SYSCALL64_slow_path+0x25/0x25
Case Study: The Oops
XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file
fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x35d/0x7a0 [xfs]
CPU: 18 PID: 9896 Comm: mesos-slave Not tainted 4.10.10-1.el7.elrepo.x86_64 #1
3462 xfs_btree_insert(
...
3493 /*
3494 * Insert nrec/nptr into this level of the tree.
3495 * Note if we fail, nptr will be null.
3496 */
3497 error = xfs_btree_insrec(pcur, level, &nptr, &rec, key,
3498 &ncur, &i);
3499 if (error) {
3500 if (pcur != cur)
3501 xfs_btree_del_cursor(pcur, XFS_BTREE_ERROR);
3502 goto error0;
3503 }
3504
3505 XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, i == 1, error0);
Case Study: Code Inspection
Case Study: Code Inspection
3241 xfs_btree_insrec(
3242 struct xfs_btree_cur *cur, /* btree cursor */
...
3248 int *stat) /* success/failure */
3249 {
...
3271 /*
3272 * If we have an external root pointer, and we've made it to the
3273 * root level, allocate a new root block and we're done.
3274 */
3275 if (!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) &&
3276 (level >= cur->bc_nlevels)) {
3277 error = xfs_btree_new_root(cur, stat);
xfs_btree_insrec
|-> xfs_btree_new_root
|-> cur->bc_ops->alloc_block(cur, &rptr, &lptr, stat);
|-> xfs_allocbt_alloc_block
|-> xfs_alloc_get_freelist
|-> agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
if (bno == NULLAGBLOCK) {
XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
*stat = 0;
return 0;
}
Case Study: Code Inspection
Step 4: Check for Fixes
Step 4: Check for Fixes
Google
● Stack trace
● Mailing list archives
● Related terms given your understanding
Step 4: Check for Fixes
Check the git commit logs for fixes
$ git log -r v4.10.. -i --grep "crash" fs/xfs
$ git log -r v4.10.. -i --grep "agfl" fs/xfs
$ git log -r v4.10.. fs/xfs/libxfs/xfs_btree.c
Use git bisect to identify fixes already exist on mainline
$ git bisect start
$ git bisect bad <fixed version>
$ git bisect good <bad version>
Step 5: Gather Even More Info
Step 5: Gather Even More Info
● Enable crashdump
● Instrument the kernel code (kprintf) / build a custom kernel
● systemtap, ftrace, kprobes, jprobes
● eBPF
● perf
● Analyze filesystem metadata offline
Step 6: Engage the Community
Copyright © 2016, Eklektix, Inc.
Mailing lists:
● lkml
● Subsystem specific lists!
● patchwork.kernel.org - Mailing list hub
IRC
● Freenode
○ #xfs
○ ##kernel
○ #ubuntu-kernel
● OFTC
○ #kernelnewbies
Step 6: Engage the Community
Step 6: Engage the Community
E-mail sent to linux-xfs mailing list:
● Include full Oops output.
● Include any analysis you may have been able to do.
"My best guess given code analysis is that we are unable to allocate a
new node in the allocation group free-list btree."
Response:
"Without xfs_repair output, … we have no idea whether this was
caused by corruption or some other problem... If I had a dollar for
every time I've seen this sort of error report, I'd have retired years
ago.”
- Dave Chinner
Step 6: Engage the Community
● Gathered the requested information
● Responded via IRC freenode.net #xfs
"This is the problematic issue: commit 96f859d52bcb ("libxfs:
pack the agfl header structure so XFS_AGFL_SIZE is correct")...
[I] need to resurrect the old patches [I] had that automatically
detected this condition and fixed it."
- Dave Chinner
Step 6.5: The Temporary Fix
Step 6.5: The Temporary Fix
● Explicitly removed problematic patch
● Submitted patch removal to the elrepo kernel
○ http://elrepo.org/bugs/view.php?id=829
○ http://elrepo.org/bugs/view.php?id=833
● Considered running xfs-repair on every /var volume in our cluster
● Proceeded to attempt to create a "fixup" patchset
Intro to Kernel Debugging - Just make the crashing stop!
Step 7: The Real Fix
Step 7: The Real Fix
4 patch rewrites on the linux-xfs mailing list
3 months later a27ba2607 is accepted into the xfs development tree
Step 7: The Real Fix
Mainline
● A month later Linus merges patches from xfs-devel into 4.17rc1
during the 4.17 merge window.
Linux Stable
● Linux stable backport and submission to linux-stable mailing list.
● Greg KH accepts the patches and adds them to the stable-queue
git tree for review.
● Eventually merged to stable branches such as 4.4.
Distribution Kernels
● Most distros follow Linux Stable guidelines.
● Check your distro just to make sure.
Don't forget to upgrade your cluster!
Questions?
http://bit.ly/kerneldebugging
Ad

More Related Content

What's hot (20)

Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...
Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...
Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...
Docker, Inc.
 
Kernel Recipes 2019 - BPF at Facebook
Kernel Recipes 2019 - BPF at FacebookKernel Recipes 2019 - BPF at Facebook
Kernel Recipes 2019 - BPF at Facebook
Anne Nicolas
 
Enhancing OpenShift Security for Business Critical Deployments
Enhancing OpenShift Security for Business Critical DeploymentsEnhancing OpenShift Security for Business Critical Deployments
Enhancing OpenShift Security for Business Critical Deployments
DevOps.com
 
Continuous Security in DevOps
Continuous Security in DevOpsContinuous Security in DevOps
Continuous Security in DevOps
Maciej Lasyk
 
Docker on openstack by OpenSource Consulting
Docker on openstack by OpenSource ConsultingDocker on openstack by OpenSource Consulting
Docker on openstack by OpenSource Consulting
Open Source Consulting
 
Docker Security: Are Your Containers Tightly Secured to the Ship?
Docker Security: Are Your Containers Tightly Secured to the Ship?Docker Security: Are Your Containers Tightly Secured to the Ship?
Docker Security: Are Your Containers Tightly Secured to the Ship?
Michael Boelen
 
Postgre sql linuxcontainers by Jignesh Shah
Postgre sql linuxcontainers by Jignesh ShahPostgre sql linuxcontainers by Jignesh Shah
Postgre sql linuxcontainers by Jignesh Shah
PivotalOpenSourceHub
 
[NYC Meetup] Docker at Nuxeo
[NYC Meetup] Docker at Nuxeo[NYC Meetup] Docker at Nuxeo
[NYC Meetup] Docker at Nuxeo
Nuxeo
 
Jurijs Velikanovs - RAC Attack 101 - How to install 12c RAC on your laptop
Jurijs Velikanovs -  RAC Attack 101 - How to install 12c RAC on your laptop  Jurijs Velikanovs -  RAC Attack 101 - How to install 12c RAC on your laptop
Jurijs Velikanovs - RAC Attack 101 - How to install 12c RAC on your laptop
Andrejs Vorobjovs
 
Docker 1.11 Presentation
Docker 1.11 PresentationDocker 1.11 Presentation
Docker 1.11 Presentation
Sreenivas Makam
 
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingKernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Anne Nicolas
 
Coredns nodecache - A highly-available Node-cache DNS server
Coredns nodecache - A highly-available Node-cache DNS serverCoredns nodecache - A highly-available Node-cache DNS server
Coredns nodecache - A highly-available Node-cache DNS server
Yann Hamon
 
Docker - container and lightweight virtualization
Docker - container and lightweight virtualization Docker - container and lightweight virtualization
Docker - container and lightweight virtualization
Sim Janghoon
 
RHEL/Fedora + Docker (and SELinux)
RHEL/Fedora + Docker (and SELinux)RHEL/Fedora + Docker (and SELinux)
RHEL/Fedora + Docker (and SELinux)
Maciej Lasyk
 
tow nodes Oracle 12c RAC on virtualbox
tow nodes Oracle 12c RAC on virtualboxtow nodes Oracle 12c RAC on virtualbox
tow nodes Oracle 12c RAC on virtualbox
justinit
 
On-Demand Image Resizing Extended - External Meet-up
On-Demand Image Resizing Extended - External Meet-upOn-Demand Image Resizing Extended - External Meet-up
On-Demand Image Resizing Extended - External Meet-up
Jonathan Lee
 
Docker Security Paradigm
Docker Security ParadigmDocker Security Paradigm
Docker Security Paradigm
Anis LARGUEM
 
Make Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance ToolsMake Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance Tools
Kernel TLV
 
OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBIT
OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBITOpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBIT
OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBIT
OpenNebula Project
 
Docker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting TechniquesDocker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting Techniques
Sreenivas Makam
 
Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...
Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...
Plug-ins: Building, Shipping, Storing, and Running - Nandhini Santhanam and T...
Docker, Inc.
 
Kernel Recipes 2019 - BPF at Facebook
Kernel Recipes 2019 - BPF at FacebookKernel Recipes 2019 - BPF at Facebook
Kernel Recipes 2019 - BPF at Facebook
Anne Nicolas
 
Enhancing OpenShift Security for Business Critical Deployments
Enhancing OpenShift Security for Business Critical DeploymentsEnhancing OpenShift Security for Business Critical Deployments
Enhancing OpenShift Security for Business Critical Deployments
DevOps.com
 
Continuous Security in DevOps
Continuous Security in DevOpsContinuous Security in DevOps
Continuous Security in DevOps
Maciej Lasyk
 
Docker on openstack by OpenSource Consulting
Docker on openstack by OpenSource ConsultingDocker on openstack by OpenSource Consulting
Docker on openstack by OpenSource Consulting
Open Source Consulting
 
Docker Security: Are Your Containers Tightly Secured to the Ship?
Docker Security: Are Your Containers Tightly Secured to the Ship?Docker Security: Are Your Containers Tightly Secured to the Ship?
Docker Security: Are Your Containers Tightly Secured to the Ship?
Michael Boelen
 
Postgre sql linuxcontainers by Jignesh Shah
Postgre sql linuxcontainers by Jignesh ShahPostgre sql linuxcontainers by Jignesh Shah
Postgre sql linuxcontainers by Jignesh Shah
PivotalOpenSourceHub
 
[NYC Meetup] Docker at Nuxeo
[NYC Meetup] Docker at Nuxeo[NYC Meetup] Docker at Nuxeo
[NYC Meetup] Docker at Nuxeo
Nuxeo
 
Jurijs Velikanovs - RAC Attack 101 - How to install 12c RAC on your laptop
Jurijs Velikanovs -  RAC Attack 101 - How to install 12c RAC on your laptop  Jurijs Velikanovs -  RAC Attack 101 - How to install 12c RAC on your laptop
Jurijs Velikanovs - RAC Attack 101 - How to install 12c RAC on your laptop
Andrejs Vorobjovs
 
Docker 1.11 Presentation
Docker 1.11 PresentationDocker 1.11 Presentation
Docker 1.11 Presentation
Sreenivas Makam
 
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s goingKernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Kernel Recipes 2016 - Kernel documentation: what we have and where it’s going
Anne Nicolas
 
Coredns nodecache - A highly-available Node-cache DNS server
Coredns nodecache - A highly-available Node-cache DNS serverCoredns nodecache - A highly-available Node-cache DNS server
Coredns nodecache - A highly-available Node-cache DNS server
Yann Hamon
 
Docker - container and lightweight virtualization
Docker - container and lightweight virtualization Docker - container and lightweight virtualization
Docker - container and lightweight virtualization
Sim Janghoon
 
RHEL/Fedora + Docker (and SELinux)
RHEL/Fedora + Docker (and SELinux)RHEL/Fedora + Docker (and SELinux)
RHEL/Fedora + Docker (and SELinux)
Maciej Lasyk
 
tow nodes Oracle 12c RAC on virtualbox
tow nodes Oracle 12c RAC on virtualboxtow nodes Oracle 12c RAC on virtualbox
tow nodes Oracle 12c RAC on virtualbox
justinit
 
On-Demand Image Resizing Extended - External Meet-up
On-Demand Image Resizing Extended - External Meet-upOn-Demand Image Resizing Extended - External Meet-up
On-Demand Image Resizing Extended - External Meet-up
Jonathan Lee
 
Docker Security Paradigm
Docker Security ParadigmDocker Security Paradigm
Docker Security Paradigm
Anis LARGUEM
 
Make Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance ToolsMake Your Containers Faster: Linux Container Performance Tools
Make Your Containers Faster: Linux Container Performance Tools
Kernel TLV
 
OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBIT
OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBITOpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBIT
OpenNebulaConf 2016 - The DRBD SDS for OpenNebula by Philipp Reisner, LINBIT
OpenNebula Project
 
Docker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting TechniquesDocker Networking - Common Issues and Troubleshooting Techniques
Docker Networking - Common Issues and Troubleshooting Techniques
Sreenivas Makam
 

Similar to Intro to Kernel Debugging - Just make the crashing stop! (20)

Kernel Recipes 2015 - Hardened kernels for everyone
Kernel Recipes 2015 - Hardened kernels for everyoneKernel Recipes 2015 - Hardened kernels for everyone
Kernel Recipes 2015 - Hardened kernels for everyone
Anne Nicolas
 
Containers with systemd-nspawn
Containers with systemd-nspawnContainers with systemd-nspawn
Containers with systemd-nspawn
Gábor Nyers
 
NRPE - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core 4 and others.
NRPE - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core 4 and others.NRPE - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core 4 and others.
NRPE - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core 4 and others.
Marc Trimble
 
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios CoreNrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nagios
 
Systemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to loveSystemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to love
Alison Chaiken
 
Tuning systemd for embedded
Tuning systemd for embeddedTuning systemd for embedded
Tuning systemd for embedded
Alison Chaiken
 
Android for Embedded Linux Developers
Android for Embedded Linux DevelopersAndroid for Embedded Linux Developers
Android for Embedded Linux Developers
Opersys inc.
 
Kernel Recipes 2015 - Kernel dump analysis
Kernel Recipes 2015 - Kernel dump analysisKernel Recipes 2015 - Kernel dump analysis
Kernel Recipes 2015 - Kernel dump analysis
Anne Nicolas
 
Linuxcon Barcelon 2012: LXC Best Practices
Linuxcon Barcelon 2012: LXC Best PracticesLinuxcon Barcelon 2012: LXC Best Practices
Linuxcon Barcelon 2012: LXC Best Practices
christophm
 
Basic Linux kernel
Basic Linux kernelBasic Linux kernel
Basic Linux kernel
Morteza Nourelahi Alamdari
 
Deployment of WebObjects applications on FreeBSD
Deployment of WebObjects applications on FreeBSDDeployment of WebObjects applications on FreeBSD
Deployment of WebObjects applications on FreeBSD
WO Community
 
Linux Crash Dump Capture and Analysis
Linux Crash Dump Capture and AnalysisLinux Crash Dump Capture and Analysis
Linux Crash Dump Capture and Analysis
Paul V. Novarese
 
Rolling upgrade OpenStack
Rolling upgrade OpenStackRolling upgrade OpenStack
Rolling upgrade OpenStack
Vietnam Open Infrastructure User Group
 
Select, manage, and backport the long term stable kernels
Select, manage, and backport the long term stable kernelsSelect, manage, and backport the long term stable kernels
Select, manage, and backport the long term stable kernels
SZ Lin
 
Linux Kernel - Let's Contribute!
Linux Kernel - Let's Contribute!Linux Kernel - Let's Contribute!
Linux Kernel - Let's Contribute!
Levente Kurusa
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205
Linaro
 
Linux Capabilities - eng - v2.1.5, compact
Linux Capabilities - eng - v2.1.5, compactLinux Capabilities - eng - v2.1.5, compact
Linux Capabilities - eng - v2.1.5, compact
Alessandro Selli
 
SystemV vs systemd
SystemV vs systemdSystemV vs systemd
SystemV vs systemd
All Things Open
 
Crash dump analysis - experience sharing
Crash dump analysis - experience sharingCrash dump analysis - experience sharing
Crash dump analysis - experience sharing
James Hsieh
 
kdump: usage and_internals
kdump: usage and_internalskdump: usage and_internals
kdump: usage and_internals
LinuxCon ContainerCon CloudOpen China
 
Kernel Recipes 2015 - Hardened kernels for everyone
Kernel Recipes 2015 - Hardened kernels for everyoneKernel Recipes 2015 - Hardened kernels for everyone
Kernel Recipes 2015 - Hardened kernels for everyone
Anne Nicolas
 
Containers with systemd-nspawn
Containers with systemd-nspawnContainers with systemd-nspawn
Containers with systemd-nspawn
Gábor Nyers
 
NRPE - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core 4 and others.
NRPE - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core 4 and others.NRPE - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core 4 and others.
NRPE - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core 4 and others.
Marc Trimble
 
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios CoreNrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nrpe - Nagios Remote Plugin Executor. NRPE plugin for Nagios Core
Nagios
 
Systemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to loveSystemd: the modern Linux init system you will learn to love
Systemd: the modern Linux init system you will learn to love
Alison Chaiken
 
Tuning systemd for embedded
Tuning systemd for embeddedTuning systemd for embedded
Tuning systemd for embedded
Alison Chaiken
 
Android for Embedded Linux Developers
Android for Embedded Linux DevelopersAndroid for Embedded Linux Developers
Android for Embedded Linux Developers
Opersys inc.
 
Kernel Recipes 2015 - Kernel dump analysis
Kernel Recipes 2015 - Kernel dump analysisKernel Recipes 2015 - Kernel dump analysis
Kernel Recipes 2015 - Kernel dump analysis
Anne Nicolas
 
Linuxcon Barcelon 2012: LXC Best Practices
Linuxcon Barcelon 2012: LXC Best PracticesLinuxcon Barcelon 2012: LXC Best Practices
Linuxcon Barcelon 2012: LXC Best Practices
christophm
 
Deployment of WebObjects applications on FreeBSD
Deployment of WebObjects applications on FreeBSDDeployment of WebObjects applications on FreeBSD
Deployment of WebObjects applications on FreeBSD
WO Community
 
Linux Crash Dump Capture and Analysis
Linux Crash Dump Capture and AnalysisLinux Crash Dump Capture and Analysis
Linux Crash Dump Capture and Analysis
Paul V. Novarese
 
Select, manage, and backport the long term stable kernels
Select, manage, and backport the long term stable kernelsSelect, manage, and backport the long term stable kernels
Select, manage, and backport the long term stable kernels
SZ Lin
 
Linux Kernel - Let's Contribute!
Linux Kernel - Let's Contribute!Linux Kernel - Let's Contribute!
Linux Kernel - Let's Contribute!
Levente Kurusa
 
LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205LMG Lightning Talks - SFO17-205
LMG Lightning Talks - SFO17-205
Linaro
 
Linux Capabilities - eng - v2.1.5, compact
Linux Capabilities - eng - v2.1.5, compactLinux Capabilities - eng - v2.1.5, compact
Linux Capabilities - eng - v2.1.5, compact
Alessandro Selli
 
Crash dump analysis - experience sharing
Crash dump analysis - experience sharingCrash dump analysis - experience sharing
Crash dump analysis - experience sharing
James Hsieh
 
Ad

More from All Things Open (20)

Let's Create a GitHub Copilot Extension! - Nick Taylor, Pomerium
Let's Create a GitHub Copilot Extension! - Nick Taylor, PomeriumLet's Create a GitHub Copilot Extension! - Nick Taylor, Pomerium
Let's Create a GitHub Copilot Extension! - Nick Taylor, Pomerium
All Things Open
 
Leveraging Pre-Trained Transformer Models for Protein Function Prediction - T...
Leveraging Pre-Trained Transformer Models for Protein Function Prediction - T...Leveraging Pre-Trained Transformer Models for Protein Function Prediction - T...
Leveraging Pre-Trained Transformer Models for Protein Function Prediction - T...
All Things Open
 
Gen AI: AI Agents - Making LLMs work together in an organized way - Brent Las...
Gen AI: AI Agents - Making LLMs work together in an organized way - Brent Las...Gen AI: AI Agents - Making LLMs work together in an organized way - Brent Las...
Gen AI: AI Agents - Making LLMs work together in an organized way - Brent Las...
All Things Open
 
You Don't Need an AI Strategy, But You Do Need to Be Strategic About AI - Jes...
You Don't Need an AI Strategy, But You Do Need to Be Strategic About AI - Jes...You Don't Need an AI Strategy, But You Do Need to Be Strategic About AI - Jes...
You Don't Need an AI Strategy, But You Do Need to Be Strategic About AI - Jes...
All Things Open
 
DON’T PANIC: AI IS COMING – The Hitchhiker’s Guide to AI - Mark Hinkle, Perip...
DON’T PANIC: AI IS COMING – The Hitchhiker’s Guide to AI - Mark Hinkle, Perip...DON’T PANIC: AI IS COMING – The Hitchhiker’s Guide to AI - Mark Hinkle, Perip...
DON’T PANIC: AI IS COMING – The Hitchhiker’s Guide to AI - Mark Hinkle, Perip...
All Things Open
 
Fine-Tuning Large Language Models with Declarative ML Orchestration - Shivay ...
Fine-Tuning Large Language Models with Declarative ML Orchestration - Shivay ...Fine-Tuning Large Language Models with Declarative ML Orchestration - Shivay ...
Fine-Tuning Large Language Models with Declarative ML Orchestration - Shivay ...
All Things Open
 
Leveraging Knowledge Graphs for RAG: A Smarter Approach to Contextual AI Appl...
Leveraging Knowledge Graphs for RAG: A Smarter Approach to Contextual AI Appl...Leveraging Knowledge Graphs for RAG: A Smarter Approach to Contextual AI Appl...
Leveraging Knowledge Graphs for RAG: A Smarter Approach to Contextual AI Appl...
All Things Open
 
Artificial Intelligence Needs Community Intelligence - Sriram Raghavan, IBM R...
Artificial Intelligence Needs Community Intelligence - Sriram Raghavan, IBM R...Artificial Intelligence Needs Community Intelligence - Sriram Raghavan, IBM R...
Artificial Intelligence Needs Community Intelligence - Sriram Raghavan, IBM R...
All Things Open
 
Don't just talk to AI, do more with AI: how to improve productivity with AI a...
Don't just talk to AI, do more with AI: how to improve productivity with AI a...Don't just talk to AI, do more with AI: how to improve productivity with AI a...
Don't just talk to AI, do more with AI: how to improve productivity with AI a...
All Things Open
 
Open-Source GenAI vs. Enterprise GenAI: Navigating the Future of AI Innovatio...
Open-Source GenAI vs. Enterprise GenAI: Navigating the Future of AI Innovatio...Open-Source GenAI vs. Enterprise GenAI: Navigating the Future of AI Innovatio...
Open-Source GenAI vs. Enterprise GenAI: Navigating the Future of AI Innovatio...
All Things Open
 
The Death of the Browser - Rachel-Lee Nabors, AgentQL
The Death of the Browser - Rachel-Lee Nabors, AgentQLThe Death of the Browser - Rachel-Lee Nabors, AgentQL
The Death of the Browser - Rachel-Lee Nabors, AgentQL
All Things Open
 
Making Operating System updates fast, easy, and safe
Making Operating System updates fast, easy, and safeMaking Operating System updates fast, easy, and safe
Making Operating System updates fast, easy, and safe
All Things Open
 
Reshaping the landscape of belonging to transform community
Reshaping the landscape of belonging to transform communityReshaping the landscape of belonging to transform community
Reshaping the landscape of belonging to transform community
All Things Open
 
The Unseen, Underappreciated Security Work Your Maintainers May (or may not) ...
The Unseen, Underappreciated Security Work Your Maintainers May (or may not) ...The Unseen, Underappreciated Security Work Your Maintainers May (or may not) ...
The Unseen, Underappreciated Security Work Your Maintainers May (or may not) ...
All Things Open
 
Integrating Diversity, Equity, and Inclusion into Product Design
Integrating Diversity, Equity, and Inclusion into Product DesignIntegrating Diversity, Equity, and Inclusion into Product Design
Integrating Diversity, Equity, and Inclusion into Product Design
All Things Open
 
The Open Source Ecosystem for eBPF in Kubernetes
The Open Source Ecosystem for eBPF in KubernetesThe Open Source Ecosystem for eBPF in Kubernetes
The Open Source Ecosystem for eBPF in Kubernetes
All Things Open
 
Open Source Privacy-Preserving Metrics - Sarah Gran & Brandon Pitman
Open Source Privacy-Preserving Metrics - Sarah Gran & Brandon PitmanOpen Source Privacy-Preserving Metrics - Sarah Gran & Brandon Pitman
Open Source Privacy-Preserving Metrics - Sarah Gran & Brandon Pitman
All Things Open
 
Open-Source Low-Code - Craig St. Jean, Xebia
Open-Source Low-Code - Craig St. Jean, XebiaOpen-Source Low-Code - Craig St. Jean, Xebia
Open-Source Low-Code - Craig St. Jean, Xebia
All Things Open
 
How I Learned to Stop Worrying about my Infrastructure and Love [Open]Tofu
How I Learned to Stop Worrying about my Infrastructure and Love [Open]TofuHow I Learned to Stop Worrying about my Infrastructure and Love [Open]Tofu
How I Learned to Stop Worrying about my Infrastructure and Love [Open]Tofu
All Things Open
 
The Developers' Framework for Content Creation
The Developers' Framework for Content CreationThe Developers' Framework for Content Creation
The Developers' Framework for Content Creation
All Things Open
 
Let's Create a GitHub Copilot Extension! - Nick Taylor, Pomerium
Let's Create a GitHub Copilot Extension! - Nick Taylor, PomeriumLet's Create a GitHub Copilot Extension! - Nick Taylor, Pomerium
Let's Create a GitHub Copilot Extension! - Nick Taylor, Pomerium
All Things Open
 
Leveraging Pre-Trained Transformer Models for Protein Function Prediction - T...
Leveraging Pre-Trained Transformer Models for Protein Function Prediction - T...Leveraging Pre-Trained Transformer Models for Protein Function Prediction - T...
Leveraging Pre-Trained Transformer Models for Protein Function Prediction - T...
All Things Open
 
Gen AI: AI Agents - Making LLMs work together in an organized way - Brent Las...
Gen AI: AI Agents - Making LLMs work together in an organized way - Brent Las...Gen AI: AI Agents - Making LLMs work together in an organized way - Brent Las...
Gen AI: AI Agents - Making LLMs work together in an organized way - Brent Las...
All Things Open
 
You Don't Need an AI Strategy, But You Do Need to Be Strategic About AI - Jes...
You Don't Need an AI Strategy, But You Do Need to Be Strategic About AI - Jes...You Don't Need an AI Strategy, But You Do Need to Be Strategic About AI - Jes...
You Don't Need an AI Strategy, But You Do Need to Be Strategic About AI - Jes...
All Things Open
 
DON’T PANIC: AI IS COMING – The Hitchhiker’s Guide to AI - Mark Hinkle, Perip...
DON’T PANIC: AI IS COMING – The Hitchhiker’s Guide to AI - Mark Hinkle, Perip...DON’T PANIC: AI IS COMING – The Hitchhiker’s Guide to AI - Mark Hinkle, Perip...
DON’T PANIC: AI IS COMING – The Hitchhiker’s Guide to AI - Mark Hinkle, Perip...
All Things Open
 
Fine-Tuning Large Language Models with Declarative ML Orchestration - Shivay ...
Fine-Tuning Large Language Models with Declarative ML Orchestration - Shivay ...Fine-Tuning Large Language Models with Declarative ML Orchestration - Shivay ...
Fine-Tuning Large Language Models with Declarative ML Orchestration - Shivay ...
All Things Open
 
Leveraging Knowledge Graphs for RAG: A Smarter Approach to Contextual AI Appl...
Leveraging Knowledge Graphs for RAG: A Smarter Approach to Contextual AI Appl...Leveraging Knowledge Graphs for RAG: A Smarter Approach to Contextual AI Appl...
Leveraging Knowledge Graphs for RAG: A Smarter Approach to Contextual AI Appl...
All Things Open
 
Artificial Intelligence Needs Community Intelligence - Sriram Raghavan, IBM R...
Artificial Intelligence Needs Community Intelligence - Sriram Raghavan, IBM R...Artificial Intelligence Needs Community Intelligence - Sriram Raghavan, IBM R...
Artificial Intelligence Needs Community Intelligence - Sriram Raghavan, IBM R...
All Things Open
 
Don't just talk to AI, do more with AI: how to improve productivity with AI a...
Don't just talk to AI, do more with AI: how to improve productivity with AI a...Don't just talk to AI, do more with AI: how to improve productivity with AI a...
Don't just talk to AI, do more with AI: how to improve productivity with AI a...
All Things Open
 
Open-Source GenAI vs. Enterprise GenAI: Navigating the Future of AI Innovatio...
Open-Source GenAI vs. Enterprise GenAI: Navigating the Future of AI Innovatio...Open-Source GenAI vs. Enterprise GenAI: Navigating the Future of AI Innovatio...
Open-Source GenAI vs. Enterprise GenAI: Navigating the Future of AI Innovatio...
All Things Open
 
The Death of the Browser - Rachel-Lee Nabors, AgentQL
The Death of the Browser - Rachel-Lee Nabors, AgentQLThe Death of the Browser - Rachel-Lee Nabors, AgentQL
The Death of the Browser - Rachel-Lee Nabors, AgentQL
All Things Open
 
Making Operating System updates fast, easy, and safe
Making Operating System updates fast, easy, and safeMaking Operating System updates fast, easy, and safe
Making Operating System updates fast, easy, and safe
All Things Open
 
Reshaping the landscape of belonging to transform community
Reshaping the landscape of belonging to transform communityReshaping the landscape of belonging to transform community
Reshaping the landscape of belonging to transform community
All Things Open
 
The Unseen, Underappreciated Security Work Your Maintainers May (or may not) ...
The Unseen, Underappreciated Security Work Your Maintainers May (or may not) ...The Unseen, Underappreciated Security Work Your Maintainers May (or may not) ...
The Unseen, Underappreciated Security Work Your Maintainers May (or may not) ...
All Things Open
 
Integrating Diversity, Equity, and Inclusion into Product Design
Integrating Diversity, Equity, and Inclusion into Product DesignIntegrating Diversity, Equity, and Inclusion into Product Design
Integrating Diversity, Equity, and Inclusion into Product Design
All Things Open
 
The Open Source Ecosystem for eBPF in Kubernetes
The Open Source Ecosystem for eBPF in KubernetesThe Open Source Ecosystem for eBPF in Kubernetes
The Open Source Ecosystem for eBPF in Kubernetes
All Things Open
 
Open Source Privacy-Preserving Metrics - Sarah Gran & Brandon Pitman
Open Source Privacy-Preserving Metrics - Sarah Gran & Brandon PitmanOpen Source Privacy-Preserving Metrics - Sarah Gran & Brandon Pitman
Open Source Privacy-Preserving Metrics - Sarah Gran & Brandon Pitman
All Things Open
 
Open-Source Low-Code - Craig St. Jean, Xebia
Open-Source Low-Code - Craig St. Jean, XebiaOpen-Source Low-Code - Craig St. Jean, Xebia
Open-Source Low-Code - Craig St. Jean, Xebia
All Things Open
 
How I Learned to Stop Worrying about my Infrastructure and Love [Open]Tofu
How I Learned to Stop Worrying about my Infrastructure and Love [Open]TofuHow I Learned to Stop Worrying about my Infrastructure and Love [Open]Tofu
How I Learned to Stop Worrying about my Infrastructure and Love [Open]Tofu
All Things Open
 
The Developers' Framework for Content Creation
The Developers' Framework for Content CreationThe Developers' Framework for Content Creation
The Developers' Framework for Content Creation
All Things Open
 
Ad

Recently uploaded (20)

Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded DevelopersLinux Support for SMARC: How Toradex Empowers Embedded Developers
Linux Support for SMARC: How Toradex Empowers Embedded Developers
Toradex
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 

Intro to Kernel Debugging - Just make the crashing stop!

  • 1. Intro to Kernel Debugging: Just make the crashing stop! Welcome. Here’s what’s in store if you stick around. We’ll Introduce: ● Gathering debug information ● Kernel development processes ● Oops analysis ● Code inspection ● Git tricks for finding fixes ● Engaging the kernel community ● How to dive deeper into debugging Dave Chiluk, Linux Platform Engineer at Indeed We’ll also cover a case study of a real-life XFS filesystem corruption bug
  • 2. Dave Chiluk Intro to Kernel Debugging: Just Make the Crashing Stop! Linux Platform Engineer, Indeed
  • 3. Intro to Kernel Debugging: Just Make the Crashing Stop! We’ll Introduce: ● Gathering debug information ● Kernel development processes ● Oops analysis ● Code inspection ● Git tricks for finding fixes ● Engaging the kernel community ● How to dive deeper into debugging We’ll walk through a real-life case study.
  • 6. Dave Chiluk ● Indeed.com: Linux Platform Engineer ○ Fix issues in the open-source code that Indeed uses ● Canonical: Ubuntu Sustaining Engineering ○ Supported Customers with Ubuntu Kernel problems ○ Ubuntu core developer
  • 7. Scale Up ● Servers are like pets ● You name them, and when they get sick, you nurse them back to health Scale Out ● Servers are like cattle ● You number them, and when they get sick, you shoot them Bill Baker, Distinguished Engineer, Microsoft - Scaling SQL Server 2012 by Glenn Berry Pets vs. Cattle
  • 8. Pets vs. Cattle… and Wolves Scale Up ● Servers are like pets ● You name them, and when they get sick, you nurse them back to health Scale Out ● Servers are like cattle ● You number them, and when they get sick, you shoot them ● If wolves start eating too many cattle, shoot them too.
  • 9. Case Study: An xfs crash ● SYSENG-2163: Rekick dc1-srv3 Description: "We need to rekick dc1-srv3 due to filesystem corruption. Currently this host is in downtime and removed from the mesos cluster." ● SYSENG-2336: Rekick dc2-srv11 ● SYSENG-2342: Rekick dc2-srv13 ● SYSENG-2398: replace dc1-srv3; /var is corrupt for the n'th time ● SYSENG-2624: Rekick dc3-srv7: /var corrupted ● SYSENG-2723: Rekick dc1-srv6 ● SYSENG-2770: /var corrupted on dc1-srv16 ● SYSENG-2802: Fix the corrupt disk issue on dc1-srv10 (kernel bug) ● SYSENG-2849: dc4-srv5/6 var corruption ● SYSENG-2850: Monitor on corrupt filesystems ● SYSENG-3056: dc2-srv28 /var corruption by xfs bug
  • 10. Step 1: Gather Information
  • 11. Step 1: Gather Information XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x35d/0x7a0 [xfs] CPU: 18 PID: 9896 Comm: mesos-slave Not tainted 4.10.10-1.el7.elrepo.x86_64 #1 Hardware name: Supermicro PIO-618U-TR4T+-ST031/X10DRU-i+, BIOS 2.0 12/17/2015 Call Trace: dump_stack+0x63/0x87 xfs_error_report+0x3b/0x40 [xfs] ? xfs_free_ag_extent+0x35d/0x7a0 [xfs] xfs_btree_insert+0x1b0/0x1c0 [xfs] xfs_free_ag_extent+0x35d/0x7a0 [xfs] xfs_free_extent+0xbb/0x150 [xfs] xfs_trans_free_extent+0x4f/0x110 [xfs] ? xfs_trans_add_item+0x5d/0x90 [xfs] xfs_extent_free_finish_item+0x26/0x40 [xfs] xfs_defer_finish+0x149/0x410 [xfs] xfs_remove+0x281/0x330 [xfs] xfs_vn_unlink+0x55/0xa0 [xfs] vfs_rmdir+0xb6/0x130 do_rmdir+0x1b3/0x1d0 SyS_rmdir+0x16/0x20 do_syscall_64+0x67/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f85d8d92397 RSP: 002b:00007f85cef9b758 EFLAGS: 00000246 ORIG_RAX: 0000000000000054 RAX: ffffffffffffffda RBX: 00007f858c00b4c0 RCX: 00007f85d8d92397 RDX: 00007f858c09ad70 RSI: 0000000000000000 RDI: 00007f858c09ad70 RBP: 00007f85cef9bc30 R08: 0000000000000001 R09: 0000000000000002 R10: 0000006f74656c67 R11: 0000000000000246 R12: 00007f85cef9c640 R13: 00007f85cef9bc50 R14: 00007f85cef9bcc0 R15: 00007f85cef9bc40 XFS (dm-4): xfs_do_force_shutdown(0x8) called from line 236 of file fs/xfs/libxfs/xfs_defer.c. Return address = 0xffffffffa028f087 XFS (dm-4): Corruption of in-memory data detected. Shutting down filesystem XFS (dm-4): Please umount the filesystem and rectify the problem(s) ● Kernel Version ● Logs ○ First Oops ○ /var/log ○ Console output ○ rsyslog if necessary ● crashdump ● sosreport ● sar
  • 12. Filesystem Errors Dump the filesystem - dd, physical removal xfs-metadump xfs-restore Network Errors tcpdump wireshark Step 1: Gather Information Device Driver Errors Firmware level? Module arguments? modinfo output For Subsystem Issues Many others ...
  • 13. Step 2: Get the Sources Find the exact sources used to build your kernel.
  • 14. Get the Sources: Kernel Development 4.17 4.18-rc1 4.18-rc2 ... v4.18 Mainline Kernel Development ● All active kernel development eventually gets merged here ● git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ● Currently Maintained by Greg Kroah-Hartman (previously Linus Torvalds) ● 14432 Patches from v4.17 (June 3, 2018) to v4.18 (Aug 12, 2018) - 10 WEEKS! New features, bug fixes Linux Mainline Tree
  • 15. Get the Sources: Kernel Development Linux Mainline Tree 4.17 4.18-rc1 4.18-rc2 ... 4.18 Subsystem Development trees ● Maintained by “Lieutenant” Subsystem Maintainers ● XFS https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git ● Networking git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git xfs-linux net-next
  • 16. Get the Sources Linux Mainline Tree Stable Kernels ● Release kernels + bug fixes ● Maintained by Greg Kroah-Hartman ● git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git ● One repository many branches. ● Short-term stable and long-term stable 4.2 Linux Stable 4.2 (STS) Bug fixes 4.3 Linux Stable 4.3 (STS) 4.5 Linux Stable 4.5 (STS) Linux Stable 4.4 (LTS to Feb 2022) 4.4
  • 17. Mainline or Stable Kernels ● git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git make old config make; make modules_install; make install ● update-initramfs ● update-grub Prebuilt mainline kernels are available for many distributions ● https://wiki.ubuntu.com/Kernel/MainlineBuilds ● http://elrepo.org - Centos linux-stable kernels. Step 2: Get the Sources Indeed uses and contributes to elrepo kernels
  • 18. Get the Sources Linux Stable Centos 7 3.10.0 4.13 4.14 4.15 4.16 4.17 4.18 Fedora 27 Ubuntu 17.10 Ubuntu 18.04 Fedora 28 Ubuntu 18.10 Centos 7 Distribution Kernels ● Typically branched from Linux-stable kernels, but not necessarily LTS kernels ● Follow linux-stable process + feature work ● Own maintainers
  • 19. Step 2: Get the Sources Centos ● $ git clone https://git.centos.org/summary/rpms!kernel.git $ git clone https://git.centos.org/git/centos-git-common.git $ cd kernel && git checkout c7 $ centos-git-common/get_sources.sh $ rpm-build -ba ● Provides an RPM-centric source tree, a source tarball, and a bunch of individual patches that are applied ● This is not a “real” git repository
  • 20. Step 2: Get the Sources Ubuntu / Debian ● apt-get source linux-image-$(uname-r) ● http://kernel.ubuntu.com/git/?q=ubuntu%2F ● git clone git://kernel.ubuntu.com/ubuntu/ubuntu-* ● rebuild
  • 21. Step 2: Get the Sources Debuginfo is unstripped versions of vmlinux ● This is needed if you want to run crash against a crashdump or do register analysis against the stack trace in your oops. Centos/ RHEL ● yum --enablerepo=base-debuginfo install -y kernel-debuginfo-$(uname -r) Ubuntu ● https://wiki.ubuntu.com/Debug%20Symbol%20Packages Debug Information
  • 22. Step 2: Get the Sources Kernel Structure documentation/ process/ - how to interact with the community admin-guide/ - the manual admin-guide/bug-hunting.rst mm/ - Memory Management net/ - Network fs/ - Filesystems arch/ - Architecture Specific drivers/ - Device Drivers firmware/ - Binary Blobs scripts/ get_maintainer.pl checkpatch.pl Start here!
  • 23. Step 3: Oops Analysis
  • 24. Step 3: Oops Analysis - The Toolkit BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 PGD 1a3aa8067 PUD 1a3b3d067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34 Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012 task: ffff8801a639ba00 task.stack: ffffc900014cc000 RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 RSP: 0018:ffffc900014cfce0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8801a54315b0 RCX: 00000000c0000100 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8801a54315b4 RBP: ffffc900014cfd30 R08: 0000000000000000 R09: 0000000000000002 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801a54315b4 R13: ffff8801a639ba00 R14: 00000000ffffffff R15: ffff8801a54315b8 FS: 00007faa254fb700(0000) GS:ffff8801aef80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001a3b1b000 CR4: 00000000001406e0 Stack: ffff8801a54315b8 0000000000000000 ffffffff814733ae ffffc900014cfd28 ffffffff8146a28c ffff8801a54315b0 0000000000000000 ffff8801a54315b0 ffff8801a66f3820 0000000000000000 ffffc900014cfd48 ffffffff816c73e7 Call Trace: [<ffffffff814733ae>] ? acpi_ut_release_mutex+0x5d/0x61 [<ffffffff8146a28c>] ? acpi_ns_get_node+0x49/0x52 [<ffffffff816c73e7>] mutex_lock+0x17/0x30 [<ffffffffa00a3bb4>] asus_rfkill_hotplug+0x24/0x1a0 [asus_wmi] [<ffffffffa00a4421>] asus_wmi_rfkill_exit+0x61/0x150 [asus_wmi] [<ffffffffa00a49f1>] asus_wmi_remove+0x61/0xb0 [asus_wmi] [<ffffffff814a5128>] platform_drv_remove+0x28/0x40 [<ffffffff814a2901>] __device_release_driver+0xa1/0x160 [<ffffffff814a29e3>] device_release_driver+0x23/0x30 [<ffffffff814a1ffd>] bus_remove_device+0xfd/0x170 [<ffffffff8149e5a9>] device_del+0x139/0x270 [<ffffffff814a5028>] platform_device_del+0x28/0x90 [<ffffffff814a50a2>] platform_device_unregister+0x12/0x30 [<ffffffffa00a4209>] asus_wmi_unregister_driver+0x19/0x30 [asus_wmi] [<ffffffffa00da0ea>] asus_nb_wmi_exit+0x10/0xf26 [asus_nb_wmi] [<ffffffff8110c692>] SyS_delete_module+0x192/0x270 [<ffffffff810022b2>] ? exit_to_usermode_loop+0x92/0xa0 [<ffffffff816ca560>] entry_SYSCALL_64_fastpath+0x13/0x94 Code: e8 5e 30 00 00 8b 03 83 f8 01 0f 84 93 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 eb 1d 4c 89 e7 49 c7 45 08 02 00 00 00 RIP [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 RSP <ffffc900014cfce0> CR2: 0000000000000000 ---[ end trace 8d484233fa7cb512 ]--- note: modprobe[3275] exited with preempt_count 2
  • 25. Step 3: Oops Analysis - The Toolkit BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 PGD 1a3aa8067 PUD 1a3b3d067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34 Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012 task: ffff8801a639ba00 task.stack: ffffc900014cc000 RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
  • 26. Step 3: Oops Analysis - The Toolkit BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 PGD 1a3aa8067 PUD 1a3b3d067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34 Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012 task: ffff8801a639ba00 task.stack: ffffc900014cc000 RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
  • 27. Step 3: Oops Analysis - The Toolkit BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 PGD 1a3aa8067 PUD 1a3b3d067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34 Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012 task: ffff8801a639ba00 task.stack: ffffc900014cc000 RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
  • 28. Step 3: Oops Analysis - The Toolkit BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 PGD 1a3aa8067 PUD 1a3b3d067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34 Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012 task: ffff8801a639ba00 task.stack: ffffc900014cc000 RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
  • 29. Step 3: Oops Analysis - The Toolkit BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 PGD 1a3aa8067 PUD 1a3b3d067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34 Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012 task: ffff8801a639ba00 task.stack: ffffc900014cc000 RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
  • 30. Step 3: Oops Analysis - The Toolkit BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 PGD 1a3aa8067 PUD 1a3b3d067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34 Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012 task: ffff8801a639ba00 task.stack: ffffc900014cc000 RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
  • 31. Step 3: Oops Analysis - The Toolkit BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 PGD 1a3aa8067 PUD 1a3b3d067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34 Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012 task: ffff8801a639ba00 task.stack: ffffc900014cc000 RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
  • 32. Step 3: Oops Analysis - The Toolkit BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 PGD 1a3aa8067 PUD 1a3b3d067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34 Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012 task: ffff8801a639ba00 task.stack: ffffc900014cc000 RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
  • 33. Step 3: Oops Analysis - The Toolkit BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 PGD 1a3aa8067 PUD 1a3b3d067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34 Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012 task: ffff8801a639ba00 task.stack: ffffc900014cc000 RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
  • 34. Step 3: Oops Analysis - The Toolkit BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 PGD 1a3aa8067 PUD 1a3b3d067 PMD 0 Oops: 0002 [#1] PREEMPT SMP Modules linked in: bnep ccm binfmt_misc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core hid_a4tech videodev x86_pkg_temp_thermal intel_powerclamp coretemp ath3k btusb btrtl btintel bluetooth kvm_intel snd_hda_codec_hdmi kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass crc32c_intel arc4 i915 snd_hda_intel snd_hda_codec ath9k ath9k_common ath9k_hw ath i2c_algo_bit snd_hwdep mac80211 ghash_clmulni_intel snd_hda_core snd_pcm snd_timer cfg80211 ehci_pci xhci_pci drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm xhci_hcd ehci_hcd asus_nb_wmi(-) asus_wmi sparse_keymap r8169 rfkill mxm_wmi serio_raw snd mii mei_me lpc_ich i2c_i801 video soundcore mei i2c_smbus wmi i2c_core mfd_core CPU: 3 PID: 3275 Comm: modprobe Not tainted 4.9.34-gentoo #34 Hardware name: ASUSTeK COMPUTER INC. K56CM/K56CM, BIOS K56CM.206 08/21/2012 task: ffff8801a639ba00 task.stack: ffffc900014cc000 RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120
  • 35. Step 3: Oops Analysis - The Toolkit RIP: 0010:[<ffffffff816c7348>] [<ffffffff816c7348>] __mutex_lock_slowpath+0x98/0x120 RSP: 0018:ffffc900014cfce0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8801a54315b0 RCX: 00000000c0000100 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8801a54315b4 RBP: ffffc900014cfd30 R08: 0000000000000000 R09: 0000000000000002 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801a54315b4 R13: ffff8801a639ba00 R14: 00000000ffffffff R15: ffff8801a54315b8 FS: 00007faa254fb700(0000) GS:ffff8801aef80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000001a3b1b000 CR4: 00000000001406e0 $ objdump -d -S -l ./vmlinux-4.15.0-22-generic > /tmp/objdump.out ffffffff81992130 <__mutex_lock_slowpath>: __mutex_lock_slowpath(): /build/linux-lZKWha/linux-4.15.0/kernel/locking/mutex.c:1130 ffffffff81992130: e8 3b fb 06 00 callq ffffffff81a01c70 <__fentry__> ffffffff81992135: 55 push %rbp /build/linux-lZKWha/linux-4.15.0/kernel/locking/mutex.c:1131 ffffffff81992136: be 02 00 00 00 mov $0x2,%esi /build/linux-lZKWha/linux-4.15.0/kernel/locking/mutex.c:1130 ffffffff8199213b: 48 89 e5 mov %rsp,%rbp
  • 36. Step 3: Oops Analysis - The Toolkit Register to argument mapping defined in: [linux]/arch/x86/entry/calling.h x86 function call convention, 64-bit:
  • 37. Step 3: Oops Analysis - The Toolkit Stack: ffff8801a54315b8 0000000000000000 ffffffff814733ae ffffc900014cfd28 ffffffff8146a28c ffff8801a54315b0 0000000000000000 ffff8801a54315b0 ffff8801a66f3820 0000000000000000 ffffc900014cfd48 ffffffff816c73e7 Call Trace: [<ffffffff814733ae>] ? acpi_ut_release_mutex+0x5d/0x61 [<ffffffff8146a28c>] ? acpi_ns_get_node+0x49/0x52 [<ffffffff816c73e7>] mutex_lock+0x17/0x30 [<ffffffffa00a3bb4>] asus_rfkill_hotplug+0x24/0x1a0 [asus_wmi] [<ffffffffa00a4421>] asus_wmi_rfkill_exit+0x61/0x150 [asus_wmi] [<ffffffffa00a49f1>] asus_wmi_remove+0x61/0xb0 [asus_wmi] [<ffffffff814a5128>] platform_drv_remove+0x28/0x40 [<ffffffff814a2901>] __device_release_driver+0xa1/0x160 [<ffffffff814a29e3>] device_release_driver+0x23/0x30 [<ffffffff814a1ffd>] bus_remove_device+0xfd/0x170 [<ffffffff8149e5a9>] device_del+0x139/0x270 [<ffffffff814a5028>] platform_device_del+0x28/0x90 [<ffffffff814a50a2>] platform_device_unregister+0x12/0x30 [<ffffffffa00a4209>] asus_wmi_unregister_driver+0x19/0x30 [asus_wmi] [<ffffffffa00da0ea>] asus_nb_wmi_exit+0x10/0xf26 [asus_nb_wmi] [<ffffffff8110c692>] SyS_delete_module+0x192/0x270 [<ffffffff810022b2>] ? exit_to_usermode_loop+0x92/0xa0 [<ffffffff816ca560>] entry_SYSCALL_64_fastpath+0x13/0x94
  • 38. Step 3: Oops Analysis - The Toolkit Stack: ffff8801a54315b8 0000000000000000 ffffffff814733ae ffffc900014cfd28 ffffffff8146a28c ffff8801a54315b0 0000000000000000 ffff8801a54315b0 ffff8801a66f3820 0000000000000000 ffffc900014cfd48 ffffffff816c73e7 Call Trace: [<ffffffff814733ae>] ? acpi_ut_release_mutex+0x5d/0x61 [<ffffffff8146a28c>] ? acpi_ns_get_node+0x49/0x52 [<ffffffff816c73e7>] mutex_lock+0x17/0x30 [<ffffffffa00a3bb4>] asus_rfkill_hotplug+0x24/0x1a0 [asus_wmi] [<ffffffffa00a4421>] asus_wmi_rfkill_exit+0x61/0x150 [asus_wmi] [<ffffffffa00a49f1>] asus_wmi_remove+0x61/0xb0 [asus_wmi] [<ffffffff814a5128>] platform_drv_remove+0x28/0x40 [<ffffffff814a2901>] __device_release_driver+0xa1/0x160 [<ffffffff814a29e3>] device_release_driver+0x23/0x30 [<ffffffff814a1ffd>] bus_remove_device+0xfd/0x170 [<ffffffff8149e5a9>] device_del+0x139/0x270 [<ffffffff814a5028>] platform_device_del+0x28/0x90 [<ffffffff814a50a2>] platform_device_unregister+0x12/0x30 [<ffffffffa00a4209>] asus_wmi_unregister_driver+0x19/0x30 [asus_wmi] [<ffffffffa00da0ea>] asus_nb_wmi_exit+0x10/0xf26 [asus_nb_wmi] [<ffffffff8110c692>] SyS_delete_module+0x192/0x270 [<ffffffff810022b2>] ? exit_to_usermode_loop+0x92/0xa0 [<ffffffff816ca560>] entry_SYSCALL_64_fastpath+0x13/0x94
  • 39. Step 3: Oops Analysis - The Toolkit Stack: ffff8801a54315b8 0000000000000000 ffffffff814733ae ffffc900014cfd28 ffffffff8146a28c ffff8801a54315b0 0000000000000000 ffff8801a54315b0 ffff8801a66f3820 0000000000000000 ffffc900014cfd48 ffffffff816c73e7 Call Trace: [<ffffffff814733ae>] ? acpi_ut_release_mutex+0x5d/0x61 [<ffffffff8146a28c>] ? acpi_ns_get_node+0x49/0x52 [<ffffffff816c73e7>] mutex_lock+0x17/0x30 [<ffffffffa00a3bb4>] asus_rfkill_hotplug+0x24/0x1a0 [asus_wmi] [<ffffffffa00a4421>] asus_wmi_rfkill_exit+0x61/0x150 [asus_wmi] [<ffffffffa00a49f1>] asus_wmi_remove+0x61/0xb0 [asus_wmi] [<ffffffff814a5128>] platform_drv_remove+0x28/0x40 [<ffffffff814a2901>] __device_release_driver+0xa1/0x160 [<ffffffff814a29e3>] device_release_driver+0x23/0x30 [<ffffffff814a1ffd>] bus_remove_device+0xfd/0x170 [<ffffffff8149e5a9>] device_del+0x139/0x270 [<ffffffff814a5028>] platform_device_del+0x28/0x90 [<ffffffff814a50a2>] platform_device_unregister+0x12/0x30 [<ffffffffa00a4209>] asus_wmi_unregister_driver+0x19/0x30 [asus_wmi] [<ffffffffa00da0ea>] asus_nb_wmi_exit+0x10/0xf26 [asus_nb_wmi] [<ffffffff8110c692>] SyS_delete_module+0x192/0x270 [<ffffffff810022b2>] ? exit_to_usermode_loop+0x92/0xa0 [<ffffffff816ca560>] entry_SYSCALL_64_fastpath+0x13/0x94 Bottom / oldest Top / most recent
  • 40. Step 3: Oops Analysis - The Toolkit Stack: ffff8801a54315b8 0000000000000000 ffffffff814733ae ffffc900014cfd28 ffffffff8146a28c ffff8801a54315b0 0000000000000000 ffff8801a54315b0 ffff8801a66f3820 0000000000000000 ffffc900014cfd48 ffffffff816c73e7 Call Trace: [<ffffffff814733ae>] ? acpi_ut_release_mutex+0x5d/0x61 [<ffffffff8146a28c>] ? acpi_ns_get_node+0x49/0x52 [<ffffffff816c73e7>] mutex_lock+0x17/0x30 [<ffffffffa00a3bb4>] asus_rfkill_hotplug+0x24/0x1a0 [asus_wmi] [<ffffffffa00a4421>] asus_wmi_rfkill_exit+0x61/0x150 [asus_wmi] [<ffffffffa00a49f1>] asus_wmi_remove+0x61/0xb0 [asus_wmi] [<ffffffff814a5128>] platform_drv_remove+0x28/0x40 [<ffffffff814a2901>] __device_release_driver+0xa1/0x160 [<ffffffff814a29e3>] device_release_driver+0x23/0x30 [<ffffffff814a1ffd>] bus_remove_device+0xfd/0x170 [<ffffffff8149e5a9>] device_del+0x139/0x270 [<ffffffff814a5028>] platform_device_del+0x28/0x90 [<ffffffff814a50a2>] platform_device_unregister+0x12/0x30 [<ffffffffa00a4209>] asus_wmi_unregister_driver+0x19/0x30 [asus_wmi] [<ffffffffa00da0ea>] asus_nb_wmi_exit+0x10/0xf26 [asus_nb_wmi] [<ffffffff8110c692>] SyS_delete_module+0x192/0x270 [<ffffffff810022b2>] ? exit_to_usermode_loop+0x92/0xa0 [<ffffffff816ca560>] entry_SYSCALL_64_fastpath+0x13/0x94
  • 41. Step 3: Oops Analysis - The Toolkit Code: A hex-dump of the section of machine code that was being run at the time the Oops occurred. Code: e8 5e 30 00 00 8b 03 83 f8 01 0f 84 93 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 eb 1d 4c 89 e7 49 c7 45 08 02 00 00 00 $ echo "Code: e8 5e 30 00 00 8b 03 83 f8 01 0f 84 93 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 eb 1d 4c 89 e7 49 c7 45 08 02 00 00 00 " | ~/src/linux/scripts/decodecode Code: e8 5e 30 00 00 8b 03 83 f8 01 0f 84 93 00 00 00 48 8b 43 10 4c 8d 7b 08 48 89 63 10 41 be ff ff ff ff 4c 89 3c 24 48 89 44 24 08 <48> 89 20 4c 89 6c 24 10 eb 1d 4c 89 e7 49 c7 45 08 02 00 00 00 All code ======== 0: e8 5e 30 00 00 callq 0x3063 5: 8b 03 mov (%rbx),%eax 7: 83 f8 01 cmp $0x1,%eax a: 0f 84 93 00 00 00 je 0xa3 10: 48 8b 43 10 mov 0x10(%rbx),%rax 14: 4c 8d 7b 08 lea 0x8(%rbx),%r15 18: 48 89 63 10 mov %rsp,0x10(%rbx) 1c: 41 be ff ff ff ff mov $0xffffffff,%r14d 22: 4c 89 3c 24 mov %r15,(%rsp) 26: 48 89 44 24 08 mov %rax,0x8(%rsp) 2b:* 48 89 20 mov %rsp,(%rax) <-- trapping instruction
  • 42. Case Study: Log Output XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x35d/0x7a0 [xfs] CPU: 18 PID: 9896 Comm: mesos-slave Not tainted 4.10.10-1.el7.elrepo.x86_64 #1 Hardware name: Supermicro PIO-618U-TR4T+-ST031/X10DRU-i+, BIOS 2.0 12/17/2015 Call Trace: dump_stack+0x63/0x87 xfs_error_report+0x3b/0x40 [xfs] ? xfs_free_ag_extent+0x35d/0x7a0 [xfs] xfs_btree_insert+0x1b0/0x1c0 [xfs] xfs_free_ag_extent+0x35d/0x7a0 [xfs] xfs_free_extent+0xbb/0x150 [xfs] xfs_trans_free_extent+0x4f/0x110 [xfs] ? xfs_trans_add_item+0x5d/0x90 [xfs] xfs_extent_free_finish_item+0x26/0x40 [xfs] xfs_defer_finish+0x149/0x410 [xfs] xfs_remove+0x281/0x330 [xfs] xfs_vn_unlink+0x55/0xa0 [xfs] vfs_rmdir+0xb6/0x130 do_rmdir+0x1b3/0x1d0 SyS_rmdir+0x16/0x20 do_syscall_64+0x67/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f85d8d92397 RSP: 002b:00007f85cef9b758 EFLAGS: 00000246 ORIG_RAX: 0000000000000054 RAX: ffffffffffffffda RBX: 00007f858c00b4c0 RCX: 00007f85d8d92397 RDX: 00007f858c09ad70 RSI: 0000000000000000 RDI: 00007f858c09ad70 RBP: 00007f85cef9bc30 R08: 0000000000000001 R09: 0000000000000002 R10: 0000006f74656c67 R11: 0000000000000246 R12: 00007f85cef9c640 R13: 00007f85cef9bc50 R14: 00007f85cef9bcc0 R15: 00007f85cef9bc40 XFS (dm-4): xfs_do_force_shutdown(0x8) called from line 236 of file fs/xfs/libxfs/xfs_defer.c. Return address = 0xffffffffa028f087 XFS (dm-4): Corruption of in-memory data detected. Shutting down filesystem XFS (dm-4): Please umount the filesystem and rectify the problem(s)
  • 43. Provides exact Kernel Version + file + line number ! XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x35d/0x7a0 [xfs] CPU: 18 PID: 9896 Comm: mesos-slave Not tainted 4.10.10-1.el7.elrepo.x86_64 #1 Call Trace: dump_stack+0x63/0x87 xfs_error_report+0x3b/0x40 [xfs] ? xfs_free_ag_extent+0x35d/0x7a0 [xfs] xfs_btree_insert+0x1b0/0x1c0 [xfs] xfs_free_ag_extent+0x35d/0x7a0 [xfs] xfs_free_extent+0xbb/0x150 [xfs] xfs_trans_free_extent+0x4f/0x110 [xfs] ? xfs_trans_add_item+0x5d/0x90 [xfs] xfs_extent_free_finish_item+0x26/0x40 [xfs] xfs_defer_finish+0x149/0x410 [xfs] xfs_remove+0x281/0x330 [xfs] xfs_vn_unlink+0x55/0xa0 [xfs] vfs_rmdir+0xb6/0x130 do_rmdir+0x1b3/0x1d0 SyS_rmdir+0x16/0x20 do_syscall_64+0x67/0x180 entry_SYSCALL64_slow_path+0x25/0x25 Case Study: The Oops
  • 44. XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3505 of file fs/xfs/libxfs/xfs_btree.c. Caller xfs_free_ag_extent+0x35d/0x7a0 [xfs] CPU: 18 PID: 9896 Comm: mesos-slave Not tainted 4.10.10-1.el7.elrepo.x86_64 #1 3462 xfs_btree_insert( ... 3493 /* 3494 * Insert nrec/nptr into this level of the tree. 3495 * Note if we fail, nptr will be null. 3496 */ 3497 error = xfs_btree_insrec(pcur, level, &nptr, &rec, key, 3498 &ncur, &i); 3499 if (error) { 3500 if (pcur != cur) 3501 xfs_btree_del_cursor(pcur, XFS_BTREE_ERROR); 3502 goto error0; 3503 } 3504 3505 XFS_WANT_CORRUPTED_GOTO(cur->bc_mp, i == 1, error0); Case Study: Code Inspection
  • 45. Case Study: Code Inspection 3241 xfs_btree_insrec( 3242 struct xfs_btree_cur *cur, /* btree cursor */ ... 3248 int *stat) /* success/failure */ 3249 { ... 3271 /* 3272 * If we have an external root pointer, and we've made it to the 3273 * root level, allocate a new root block and we're done. 3274 */ 3275 if (!(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) && 3276 (level >= cur->bc_nlevels)) { 3277 error = xfs_btree_new_root(cur, stat);
  • 46. xfs_btree_insrec |-> xfs_btree_new_root |-> cur->bc_ops->alloc_block(cur, &rptr, &lptr, stat); |-> xfs_allocbt_alloc_block |-> xfs_alloc_get_freelist |-> agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp); if (bno == NULLAGBLOCK) { XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT); *stat = 0; return 0; } Case Study: Code Inspection
  • 47. Step 4: Check for Fixes
  • 48. Step 4: Check for Fixes Google ● Stack trace ● Mailing list archives ● Related terms given your understanding
  • 49. Step 4: Check for Fixes Check the git commit logs for fixes $ git log -r v4.10.. -i --grep "crash" fs/xfs $ git log -r v4.10.. -i --grep "agfl" fs/xfs $ git log -r v4.10.. fs/xfs/libxfs/xfs_btree.c Use git bisect to identify fixes already exist on mainline $ git bisect start $ git bisect bad <fixed version> $ git bisect good <bad version>
  • 50. Step 5: Gather Even More Info
  • 51. Step 5: Gather Even More Info ● Enable crashdump ● Instrument the kernel code (kprintf) / build a custom kernel ● systemtap, ftrace, kprobes, jprobes ● eBPF ● perf ● Analyze filesystem metadata offline
  • 52. Step 6: Engage the Community Copyright © 2016, Eklektix, Inc.
  • 53. Mailing lists: ● lkml ● Subsystem specific lists! ● patchwork.kernel.org - Mailing list hub IRC ● Freenode ○ #xfs ○ ##kernel ○ #ubuntu-kernel ● OFTC ○ #kernelnewbies Step 6: Engage the Community
  • 54. Step 6: Engage the Community E-mail sent to linux-xfs mailing list: ● Include full Oops output. ● Include any analysis you may have been able to do. "My best guess given code analysis is that we are unable to allocate a new node in the allocation group free-list btree." Response: "Without xfs_repair output, … we have no idea whether this was caused by corruption or some other problem... If I had a dollar for every time I've seen this sort of error report, I'd have retired years ago.” - Dave Chinner
  • 55. Step 6: Engage the Community ● Gathered the requested information ● Responded via IRC freenode.net #xfs "This is the problematic issue: commit 96f859d52bcb ("libxfs: pack the agfl header structure so XFS_AGFL_SIZE is correct")... [I] need to resurrect the old patches [I] had that automatically detected this condition and fixed it." - Dave Chinner
  • 56. Step 6.5: The Temporary Fix
  • 57. Step 6.5: The Temporary Fix ● Explicitly removed problematic patch ● Submitted patch removal to the elrepo kernel ○ http://elrepo.org/bugs/view.php?id=829 ○ http://elrepo.org/bugs/view.php?id=833 ● Considered running xfs-repair on every /var volume in our cluster ● Proceeded to attempt to create a "fixup" patchset
  • 59. Step 7: The Real Fix
  • 60. Step 7: The Real Fix 4 patch rewrites on the linux-xfs mailing list 3 months later a27ba2607 is accepted into the xfs development tree
  • 61. Step 7: The Real Fix Mainline ● A month later Linus merges patches from xfs-devel into 4.17rc1 during the 4.17 merge window. Linux Stable ● Linux stable backport and submission to linux-stable mailing list. ● Greg KH accepts the patches and adds them to the stable-queue git tree for review. ● Eventually merged to stable branches such as 4.4. Distribution Kernels ● Most distros follow Linux Stable guidelines. ● Check your distro just to make sure. Don't forget to upgrade your cluster!