mirror of
https://github.com/torvalds/linux.git
synced 2026-01-24 23:16:46 +00:00
Add CXL region debugfs attributes to inject and clear poison based on
an offset into the region. These new interfaces allow users to operate
on poison at the region level without needing to resolve Device
Physical Addresses (DPA) or target individual memdevs.

The implementation uses a new helper, region_offset_to_dpa_result(),
that applies decoder interleave logic, including XOR-based address
decoding when applicable. Note that XOR decodes rely on driver-internal
xormaps, which are not exposed to userspace. So this support is not
only a simplification of poison operations that could already be done
using the existing per-memdev operations; it also enables this
functionality for XOR-interleaved regions for the first time.

New debugfs attributes are added in /sys/kernel/debug/cxl/regionX/:
inject_poison and clear_poison. These are only exposed if all memdevs
participating in the region support both inject and clear commands,
ensuring consistent and reliable behavior across multi-device regions.

If tracing is enabled, these operations are logged as cxl_poison
events in /sys/kernel/tracing/trace.

The ABI documentation warns users of the significant risks that come
with using these capabilities.

A CXL Maturity Map update shows this user flow is now supported.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/f3fd8628ab57ea79704fb2d645902cd499c066af.1754290144.git.alison.schofield@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
158 lines
6.8 KiB
Plaintext
What:		/sys/kernel/debug/cxl/memX/inject_poison
Date:		April, 2023
KernelVersion:	v6.4
Contact:	linux-cxl@vger.kernel.org
Description:
		(WO) When a Device Physical Address (DPA) is written to this
		attribute, the memdev driver sends an inject poison command to
		the device for the specified address. The DPA must be 64-byte
		aligned, and the length of the injected poison is 64 bytes. If
		successful, the device returns poison when the address is
		accessed through the CXL.mem bus. Injecting poison adds the
		address to the device's Poison List and sets the error source
		to Injected. In addition, the device adds a poison creation
		event to its internal Informational Event log, updates the
		Event Status register, and, if configured, interrupts the host.
		Injecting poison into an address that already has poison
		present is not an error, and no error is returned. If the
		device returns 'Inject Poison Limit Reached', -EBUSY is
		returned to the user. The inject_poison attribute is only
		visible for devices supporting the capability.

		TEST-ONLY INTERFACE: This interface is intended for testing
		and validation purposes only. It is not a data repair mechanism
		and should never be used on production systems or live data.

		DATA LOSS RISK: For CXL persistent memory (PMEM) devices,
		poison injection can result in permanent data loss. Injected
		poison may render data permanently inaccessible even after
		clearing, as the clear operation writes zeros and does not
		recover original data.

		SYSTEM STABILITY RISK: For volatile memory, poison injection
		can cause kernel crashes, system instability, or unpredictable
		behavior if running code accesses the poisoned addresses or if
		they back critical kernel structures.

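A minimal Python sketch of driving memX/inject_poison from userspace follows. It rests on two assumptions. First, the attribute accepts a 0x-prefixed hexadecimal DPA. Second, the device's 'Inject Poison Limit Reached' status reaches the caller as an OSError with errno EBUSY on write or close. The helper names (is_valid_dpa, format_dpa, inject_poison) and the debugfs_root parameter are hypothetical. Any memdev name and DPA you pass are placeholders. Run it only on a test system.

```python
import errno
import os

POISON_GRANULE = 64  # bytes: required DPA alignment and poison length per the ABI text


def is_valid_dpa(dpa: int) -> bool:
    """Return True if dpa is non-negative and 64-byte aligned."""
    return dpa >= 0 and dpa % POISON_GRANULE == 0


def format_dpa(dpa: int) -> str:
    """Render a DPA as a 0x-prefixed hex string (assumed accepted format)."""
    return f"{dpa:#x}"


def inject_poison(memdev: str, dpa: int,
                  debugfs_root: str = "/sys/kernel/debug/cxl") -> str:
    """Write dpa to <debugfs_root>/<memdev>/inject_poison.

    Returns "ok" on success, or "busy" if the kernel reported EBUSY
    (the 'Inject Poison Limit Reached' case). Other errors propagate.
    """
    if not is_valid_dpa(dpa):
        raise ValueError(f"DPA {dpa:#x} is not {POISON_GRANULE}-byte aligned")
    path = os.path.join(debugfs_root, memdev, "inject_poison")
    try:
        # Buffered writes may only reach the kernel when the file is
        # closed, so the whole with-block sits inside the try.
        with open(path, "w") as f:
            f.write(format_dpa(dpa))
    except OSError as e:
        if e.errno == errno.EBUSY:
            return "busy"
        raise
    return "ok"
```

For example, inject_poison("mem0", 0x1000) returns "ok" or "busy". An unaligned DPA such as 0x1010 is rejected before anything is written. Injecting the same address twice is expected to succeed both times, because the ABI text says re-injecting is not an error.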
What:		/sys/kernel/debug/cxl/memX/clear_poison
Date:		April, 2023
KernelVersion:	v6.4
Contact:	linux-cxl@vger.kernel.org
Description:
		(WO) When a Device Physical Address (DPA) is written to this
		attribute, the memdev driver sends a clear poison command to
		the device for the specified address. Clearing poison removes
		the address from the device's Poison List and writes zeros to
		the 64 bytes starting at that address. Clearing poison from an
		address that does not have poison set is not an error. If the
		device cannot clear poison from the address, -ENXIO is returned.
		The clear_poison attribute is only visible for devices
		supporting the capability.

		TEST-ONLY INTERFACE: This interface is intended for testing
		and validation purposes only. It is not a data repair mechanism
		and should never be used on production systems or live data.

		CLEAR IS NOT DATA RECOVERY: This operation writes zeros to the
		specified address range and removes the address from the Poison
		List. It does NOT recover or restore original data that may have
		been present before poison injection. Any original data at the
		cleared address is permanently lost and replaced with zeros.

		CLEAR IS NOT A REPAIR MECHANISM: This interface is for testing
		purposes only and should not be used as a data repair tool.
		Clearing poison is fundamentally different from data recovery
		or error correction.

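The clear path can be sketched the same way. This sketch makes the zero-fill explicit: a successful clear owns exactly 64 bytes of DPA space, and their previous contents are gone. It assumes a 0x-prefixed hexadecimal DPA is accepted and that the ENXIO case surfaces as an OSError with errno ENXIO. zeroed_range and clear_poison are hypothetical helper names.

```python
import errno
import os

CLEAR_LEN = 64  # bytes overwritten with zeros by a successful clear, per the ABI text


def zeroed_range(dpa: int) -> range:
    """DPA byte range that a successful clear_poison overwrites with zeros."""
    return range(dpa, dpa + CLEAR_LEN)


def clear_poison(memdev: str, dpa: int,
                 debugfs_root: str = "/sys/kernel/debug/cxl") -> bool:
    """Write dpa to <debugfs_root>/<memdev>/clear_poison.

    Returns True if the clear succeeded (the 64 bytes are now zero and any
    original contents are lost), or False if the kernel reported ENXIO
    (the device could not clear poison at that address).
    """
    path = os.path.join(debugfs_root, memdev, "clear_poison")
    try:
        with open(path, "w") as f:
            f.write(f"{dpa:#x}")
    except OSError as e:
        if e.errno == errno.ENXIO:
            return False
        raise
    return True
```

A True result only means the Poison List entry is gone and the bytes in zeroed_range(dpa) read as zero. As the entry above stresses, it says nothing about recovering the data that was there.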
What:		/sys/kernel/debug/cxl/regionX/inject_poison
Date:		August, 2025
Contact:	linux-cxl@vger.kernel.org
Description:
		(WO) When a Host Physical Address (HPA) is written to this
		attribute, the region driver translates it to a Device
		Physical Address (DPA) and identifies the corresponding
		memdev. It then sends an inject poison command to that memdev
		at the translated DPA. Refer to the memdev ABI entry at:
		/sys/kernel/debug/cxl/memX/inject_poison for the detailed
		behavior. This attribute is only visible if all memdevs
		participating in the region support both inject and clear
		poison commands.

		TEST-ONLY INTERFACE: This interface is intended for testing
		and validation purposes only. It is not a data repair mechanism
		and should never be used on production systems or live data.

		DATA LOSS RISK: For CXL persistent memory (PMEM) devices,
		poison injection can result in permanent data loss. Injected
		poison may render data permanently inaccessible even after
		clearing, as the clear operation writes zeros and does not
		recover original data.

		SYSTEM STABILITY RISK: For volatile memory, poison injection
		can cause kernel crashes, system instability, or unpredictable
		behavior if running code accesses the poisoned addresses or if
		they back critical kernel structures.

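The following simplified Python model makes the translation step concrete. It covers modulo (non-XOR) interleave, in which consecutive granules of the region are dealt round-robin across the interleave targets. It is a hypothetical helper, not the kernel's region_offset_to_dpa_result(). It ignores XOR interleave, whose xormaps the commit message notes are not exposed to userspace. It also ignores multi-level interleave and the decoder's DPA base. The commit message describes the written value as an offset into the region. An HPA converts to that offset by subtracting the region's starting HPA.

```python
from typing import NamedTuple


class DpaTarget(NamedTuple):
    position: int    # index of the target memdev in the interleave set
    dpa_offset: int  # byte offset into that memdev's share of the region


def hpa_to_region_offset(hpa: int, region_start: int) -> int:
    """Offset of an HPA inside a region that begins at region_start."""
    if hpa < region_start:
        raise ValueError("HPA is below the start of the region")
    return hpa - region_start


def region_offset_to_target(offset: int, ways: int, granularity: int) -> DpaTarget:
    """Simplified modulo interleave: region offset -> (position, DPA offset)."""
    if offset < 0 or ways < 1 or granularity < 1:
        raise ValueError("invalid offset or interleave parameters")
    chunk, within = divmod(offset, granularity)  # which granule, and where inside it
    row, position = divmod(chunk, ways)          # granules rotate across targets
    return DpaTarget(position, row * granularity + within)
```

Take two ways and 256-byte granularity. Region offset 0x340 lies in the fourth granule (index 3), which is the second granule stored on target 1, plus 0x40 bytes. It therefore maps to position 1 at DPA offset 0x140. Because the poison unit is 64 bytes and the granule is larger, a 64-byte aligned region offset always falls within a single memdev in this model.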
What:		/sys/kernel/debug/cxl/regionX/clear_poison
Date:		August, 2025
Contact:	linux-cxl@vger.kernel.org
Description:
		(WO) When a Host Physical Address (HPA) is written to this
		attribute, the region driver translates it to a Device
		Physical Address (DPA) and identifies the corresponding
		memdev. It then sends a clear poison command to that memdev
		at the translated DPA. Refer to the memdev ABI entry at:
		/sys/kernel/debug/cxl/memX/clear_poison for the detailed
		behavior. This attribute is only visible if all memdevs
		participating in the region support both inject and clear
		poison commands.

		TEST-ONLY INTERFACE: This interface is intended for testing
		and validation purposes only. It is not a data repair mechanism
		and should never be used on production systems or live data.

		CLEAR IS NOT DATA RECOVERY: This operation writes zeros to the
		specified address range and removes the address from the Poison
		List. It does NOT recover or restore original data that may have
		been present before poison injection. Any original data at the
		cleared address is permanently lost and replaced with zeros.

		CLEAR IS NOT A REPAIR MECHANISM: This interface is for testing
		purposes only and should not be used as a data repair tool.
		Clearing poison is fundamentally different from data recovery
		or error correction.

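Both region attributes share a visibility rule, which can be modelled in a few lines of Python. Memdev and its supports_inject/supports_clear flags are hypothetical stand-ins for each memdev's capabilities, not a kernel structure. The probe only checks whether the debugfs file exists, which is one way a test script can learn whether the rule was satisfied for a given region. Region names passed to it are placeholders.

```python
import os
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Memdev:
    name: str
    supports_inject: bool  # hypothetical capability flags
    supports_clear: bool


def region_poison_attrs_expected(memdevs: Iterable[Memdev]) -> bool:
    """Model of the rule: every participating memdev needs both commands."""
    return all(m.supports_inject and m.supports_clear for m in memdevs)


def region_clear_poison_present(region: str,
                                debugfs_root: str = "/sys/kernel/debug/cxl") -> bool:
    """Check from userspace whether the clear_poison attribute was created."""
    return os.path.exists(os.path.join(debugfs_root, region, "clear_poison"))
```

A single memdev that can inject but not clear hides both attributes for the whole region. This prevents poison from being injected into a region where it could not later be cleared.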
What:		/sys/kernel/debug/cxl/einj_types
Date:		January, 2024
KernelVersion:	v6.9
Contact:	linux-cxl@vger.kernel.org
Description:
		(RO) Prints the CXL protocol error types made available by
		the platform in the format:

			0x<error number> <error type>

		The possible error types are (as of ACPI v6.5):

			0x1000	CXL.cache Protocol Correctable
			0x2000	CXL.cache Protocol Uncorrectable non-fatal
			0x4000	CXL.cache Protocol Uncorrectable fatal
			0x8000	CXL.mem Protocol Correctable
			0x10000	CXL.mem Protocol Uncorrectable non-fatal
			0x20000	CXL.mem Protocol Uncorrectable fatal

		The <error number> can be written to einj_inject to inject
		<error type> into a chosen dport.

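A short Python parser for einj_types output follows. It assumes one entry per line with the hexadecimal error number first, as the format above shows. The exact separator between the two fields is not specified here, so the parser splits on the first run of whitespace and accepts either a space or a tab. parse_einj_types and is_cxl_mem_error are hypothetical helper names. The CXL.mem values are taken from the table above.

```python
from __future__ import annotations


def parse_einj_types(text: str) -> dict[int, str]:
    """Map each error number in einj_types output to its error type string."""
    types: dict[int, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split once on whitespace: "0x<error number>" then "<error type>".
        number, name = line.split(None, 1)
        types[int(number, 16)] = name.strip()
    return types


def is_cxl_mem_error(error_number: int) -> bool:
    """True for the CXL.mem protocol error types listed in the ABI text."""
    return error_number in (0x8000, 0x10000, 0x20000)
```

int(number, 16) accepts both "0x1000" and zero-padded forms such as "0x00008000", so the parser does not depend on how wide the kernel prints the number.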
What:		/sys/kernel/debug/cxl/$dport_dev/einj_inject
Date:		January, 2024
KernelVersion:	v6.9
Contact:	linux-cxl@vger.kernel.org
Description:
		(WO) Writing an integer to this file injects the corresponding
		CXL protocol error into $dport_dev ($dport_dev will be a device
		name from /sys/bus/pci/devices). The integer-to-type mapping for
		injection can be found by reading from einj_types. If the dport
		was enumerated in RCH mode, a CXL 1.1 error is injected;
		otherwise, a CXL 2.0 error is injected.
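Here is a minimal Python sketch of writing to $dport_dev/einj_inject. It assumes the integer may be written in the same 0x-prefixed hexadecimal form that einj_types prints. It refuses any error number the platform did not advertise. einj_inject_path and inject_protocol_error are hypothetical helper names, and any dport name passed in (for example 0000:0c:00.0) is a placeholder PCI device name. The ABI text says the kernel itself picks a CXL 1.1 or CXL 2.0 error based on how the dport was enumerated, so the sketch makes no attempt to choose between them.

```python
from __future__ import annotations

import os


def einj_inject_path(dport_dev: str,
                     debugfs_root: str = "/sys/kernel/debug/cxl") -> str:
    """Path of the einj_inject file for a dport named as in /sys/bus/pci/devices."""
    return os.path.join(debugfs_root, dport_dev, "einj_inject")


def inject_protocol_error(dport_dev: str, error_number: int,
                          available: dict[int, str],
                          debugfs_root: str = "/sys/kernel/debug/cxl") -> None:
    """Inject error_number into dport_dev, refusing types the platform did not list.

    available is the platform's error-type table, e.g. parsed from einj_types.
    """
    if error_number not in available:
        raise ValueError(f"{error_number:#x} is not listed in einj_types")
    with open(einj_inject_path(dport_dev, debugfs_root), "w") as f:
        f.write(f"{error_number:#x}")
```

Checking against the advertised table first turns an attempt to inject an unsupported type into a clear Python error, rather than leaving it to whatever the kernel returns.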