Mirror of https://github.com/torvalds/linux.git (synced 2026-01-25 07:47:50 +00:00)

Introduce a generic infrastructure for tracking recoverable hardware
errors (HW errors that are visible to the OS but do not cause a panic)
and record them for vmcore consumption. This aids post-mortem crash
analysis tools by preserving a count and the timestamp of the last
occurrence of such errors. Correctable errors, on the other hand, are
typically handled transparently by the hardware, so the OS remains
unaware of them; they are less relevant for crash dumps and are
therefore NOT tracked by this infrastructure.

Add centralized logging for sources of recoverable hardware errors,
grouped by the subsystem that reported them.

hwerror_data is write-only at kernel runtime and is meant to be read
from a vmcore with tools such as crash or drgn. For example, this is how
it looks when opening the crash dump in drgn:

    >>> prog['hwerror_data']
    (struct hwerror_info[1]){
            {
                    .count = (int)844,
                    .timestamp = (time64_t)1752852018,
            },
            ...

This helps fleet operators quickly triage whether a crash may have been
influenced by recoverable hardware errors (which exercise uncommon code
paths in the kernel), especially when such errors occurred shortly
before a panic, as with the bug fixed by commit ee62ce7a1d ("page_pool:
Track DMA-mapped pages and unmap them when destroying the pool").

This is not intended to replace full hardware diagnostics, but it
provides a fast way to correlate hardware events with kernel panics.

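The triage described above amounts to checking whether any error source recorded its last recoverable error shortly before the panic. The following Python sketch illustrates that check under stated assumptions. The `HwErrorInfo` class, the `recent_error_sources()` helper and the one-hour window are all hypothetical. The entries are assumed to have already been extracted from `hwerror_data` (for example, in a drgn session). Timestamps are assumed to be seconds since the epoch, and the panic time is assumed to be obtained separately.

```python
from dataclasses import dataclass


@dataclass
class HwErrorInfo:
    """Hypothetical Python mirror of one struct hwerror_info entry."""
    count: int      # number of recoverable errors recorded for this source
    timestamp: int  # last occurrence, assumed to be seconds since the epoch


def recent_error_sources(entries, panic_time, window_s=3600):
    """Hypothetical triage helper: return the indexes of entries whose last
    recoverable error happened at most window_s seconds before panic_time."""
    recent = []
    for idx, info in enumerate(entries):
        if info.count == 0:
            continue  # source never reported an error; timestamp is meaningless
        age = panic_time - info.timestamp
        if 0 <= age <= window_s:
            recent.append(idx)
    return recent


if __name__ == "__main__":
    # First entry uses the values from the drgn output; the second is made up.
    entries = [HwErrorInfo(count=844, timestamp=1752852018),
               HwErrorInfo(count=0, timestamp=0)]
    # Assumed panic time: ten minutes after the last recorded error.
    print(recent_error_sources(entries, panic_time=1752852018 + 600))  # prints [0]
```

Entries with a zero count are skipped because a source that never reported an error presumably still holds a zero-initialized timestamp, which says nothing about the panic.
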
Rare machine check exceptions, such as those indicated by mce_flags.p5
or mce_flags.winchip, are not accounted for by this method, as they fall
outside the intended usage scope of this feature's user base.

[leitao@debian.org: add hw-recoverable-errors to toctree]
Link: https://lkml.kernel.org/r/20251127-vmcoreinfo_fix-v1-1-26f5b1c43da9@debian.org
Link: https://lkml.kernel.org/r/20251010-vmcore_hw_error-v5-1-636ede3efe44@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Suggested-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com> [APEI]
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Bob Moore <robert.moore@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
158 lines
2.4 KiB
ReStructuredText
.. SPDX-License-Identifier: GPL-2.0

==============================
Driver implementer's API guide
==============================

The kernel offers a wide variety of interfaces to support the development
of device drivers. This document is an only somewhat organized collection
of some of those interfaces; it will hopefully get better over time! The
available subsections can be seen below.

General information for driver authors
======================================

This section contains documentation that should, at some point or other, be
of interest to most developers working on device drivers.

.. toctree::
   :maxdepth: 1

   basics
   driver-model/index
   device_link
   infrastructure
   ioctl
   pm/index

Useful support libraries
========================

This section contains documentation that should, at some point or other, be
of interest to most developers working on device drivers.

.. toctree::
   :maxdepth: 1

   early-userspace/index
   connector
   device-io
   devfreq
   dma-buf
   component
   io-mapping
   io_ordering
   uio-howto
   vfio-mediated-device
   vfio
   vfio-pci-device-specific-driver-acceptance

Bus-level documentation
=======================

.. toctree::
   :maxdepth: 1

   auxiliary_bus
   cxl/index
   eisa
   firewire
   i3c/index
   isa
   men-chameleon-bus
   pci/index
   rapidio/index
   slimbus
   usb/index
   virtio/index
   vme
   w1
   xillybus

Subsystem-specific APIs
=======================

.. toctree::
   :maxdepth: 1

   80211/index
   acpi/index
   backlight/lp855x-driver.rst
   clk
   coco/index
   console
   crypto/index
   dmaengine/index
   dpll
   edac
   extcon
   firmware/index
   fpga/index
   frame-buffer
   aperture
   generic-counter
   gpio/index
   hsi
   hte/index
   hw-recoverable-errors
   i2c
   iio/index
   infiniband
   input
   interconnect
   ipmb
   ipmi
   libata
   mailbox
   md/index
   media/index
   mei/index
   memory-devices/index
   message-based
   misc_devices
   miscellaneous
   mmc/index
   mtd/index
   mtdnand
   nfc/index
   ntb
   nvdimm/index
   nvmem
   parport-lowlevel
   phy/index
   pin-control
   pldmfw/index
   pps
   ptp
   pwm
   pwrseq
   regulator
   reset
   rfkill
   s390-drivers
   scsi
   serial/index
   sm501
   soundwire/index
   spi
   surface_aggregator/index
   switchtec
   sync_file
   target
   tee
   thermal/index
   tty/index
   wbrf
   wmi
   xilinx/index
   zorro

.. only:: subproject and html

   Indices
   =======

   * :ref:`genindex`