std: `sleep_until` on Motor and VEX
This PR:
* Forwards the public `sleep_until` to the private `sleep_until` on Motor OS
* Adds a `sleep_until` implementation on VEX that yields until the deadline has passed
CC @lasiotus
CC @lewisfm @tropicaaal @Gavin-Niederman @max-niederman
add `simd_splat` intrinsic
Add `simd_splat` which lowers to the LLVM canonical splat sequence.
```llvm
insertelement <N x elem> poison, elem %x, i32 0
shufflevector <N x elem> v0, <N x elem> poison, <N x i32> zeroinitializer
```
Right now we try to fake it using one of
```rust
fn splat(x: u32) -> u32x8 {
u32x8::from_array([x; 8])
}
```
or (in `stdarch`)
```rust
fn splat(value: $elem_type) -> $name {
#[derive(Copy, Clone)]
#[repr(simd)]
struct JustOne([$elem_type; 1]);
let one = JustOne([value]);
// SAFETY: 0 is always in-bounds because we're shuffling
// a simd type with exactly one element.
unsafe { simd_shuffle!(one, one, [0; $len]) }
}
```
Both of these can confuse the LLVM optimizer, producing sub-par code. Some examples:
- https://github.com/rust-lang/rust/issues/60637
- https://github.com/rust-lang/rust/issues/137407
- https://github.com/rust-lang/rust/issues/122623
- https://github.com/rust-lang/rust/issues/97804
---
As far as I can tell there is no way to provide a fallback implementation for this intrinsic, because there is no `const` way of evaluating the number of elements (there might be issues beyond that, too). So, I added implementations for all 4 backends.
Both GCC and const-eval appear to have some issues with simd vectors containing pointers. I have a workaround for GCC, but haven't yet been able to make const-eval work. See the comments below.
Currently this just adds the intrinsic, it does not actually use it anywhere yet.
Fix(lib/win/net): Remove hostname support under Win7
Fixesrust-lang/rust#150896. `GetHostNameW` is not available under Windows 7, leading to dynamic linking failures upon program executions. For now, as it is still unstable, this therefore appropriately cfg-gates the feature in order to mark the Win7 as unsupported with regards to this particular feature. Porting the functionality for Windows 7 would require changing the underlying system call and so more work for the immediate need.
@rustbot label C-bug O-windows-7 T-libs A-io
Three targets, covering A32 and T32 instructions, and soft-float and
hard-float ABIs. Hard-float not available in Thumb mode. Atomics
in Thumb mode require __sync* functions from compiler-builtins.
Fix compilation of std/src/sys/pal/uefi/tests.rs
Dropped the `align` test since the `POOL_ALIGNMENT` and `align_size` items it uses do not exist.
The other changes are straightforward fixes for places where the test code drifted from the current API, since the tests are not yet built in CI for the UEFI target.
CC @Ayush1325
Don't use default build-script fingerprinting in `test`
This changes the `test` build script so that it does not use the default fingerprinting mechanism in cargo which causes a full scan of the package every time it runs. This build script does not depend on any of the files in the package.
This is the recommended approach for writing build scripts.
Fix is_ascii performance regression on AVX-512 CPUs when compiling with -C target-cpu=native
## Summary
This PR fixes a severe performance regression in `slice::is_ascii` on AVX-512 CPUs when compiling with `-C target-cpu=native`.
On affected systems, the current implementation achieves only ~3 GB/s for large inputs, compared to ~60–70 GB/s previously (≈20–24× regression). This PR restores the original performance characteristics.
This change is intended as a **temporary workaround** for upstream LLVM poor codegen. Once the underlying LLVM issue is fixed and Rust is able to consume that fix, this workaround should be reverted.
## Problem
When `is_ascii` is compiled with AVX-512 enabled, LLVM's auto-vectorization generates ~31 `kshiftrd` instructions to extract mask bits one-by-one, instead of using the efficient `pmovmskb`
instruction. This causes a **~22x performance regression**.
Because `is_ascii` is marked `#[inline]`, it gets inlined and recompiled with the user's target settings, affecting anyone using `-C target-cpu=native` on AVX-512 CPUs.
## Root cause (upstream)
The underlying issue appears to be an LLVM vectorizer/backend bug affecting certain AVX-512 patterns.
An upstream issue has been filed by @folkertdev to track the root cause: llvm/llvm-project#176906
Until this is resolved in LLVM and picked up by rustc, this PR avoids triggering the problematic codegen pattern.
## Solution
Replace the counting loop with explicit SSE2 intrinsics (`_mm_movemask_epi8`) that force `pmovmskb` codegen regardless of CPU features.
## Godbolt Links (Rust 1.92)
| Pattern | Target | Link | Result |
|---------|--------|------|--------|
| Counting loop (old) | Default SSE2 | https://godbolt.org/z/sE86xz4fY | `pmovmskb` |
| Counting loop (old) | AVX-512 (znver4) | https://godbolt.org/z/b3jvMhGd3 | 31x `kshiftrd` (broken) |
| SSE2 intrinsics (fix) | Default SSE2 | https://godbolt.org/z/hMeGfeaPv | `pmovmskb` |
| SSE2 intrinsics (fix) | AVX-512 (znver4) | https://godbolt.org/z/Tdvdqjohn | `vpmovmskb` (fixed) |
## Benchmark Results
**CPU:** AMD Ryzen 5 7500F (Zen 4 with AVX-512)
### Default Target (SSE2) — Mixed
| Size | Before | After | Change |
|------|--------|-------|--------|
| 4 B | 1.8 GB/s | 2.0 GB/s | **+11%** |
| 8 B | 3.2 GB/s | 5.8 GB/s | **+81%** |
| 16 B | 5.3 GB/s | 8.5 GB/s | **+60%** |
| 32 B | 17.7 GB/s | 15.8 GB/s | -11% |
| 64 B | 28.6 GB/s | 25.1 GB/s | -12% |
| 256 B | 51.5 GB/s | 48.6 GB/s | ~same |
| 1 KB | 64.9 GB/s | 60.7 GB/s | ~same |
| 4 KB+ | ~68-70 GB/s | ~68-72 GB/s | ~same |
### Native Target (AVX-512) — Up to 24x Faster
| Size | Before | After | Speedup |
|------|--------|-------|---------|
| 4 B | 1.2 GB/s | 2.0 GB/s | **1.7x** |
| 8 B | 1.6 GB/s | 5.0 GB/s | **3.3x** |
| 16 B | ~7 GB/s | ~7 GB/s | ~same |
| 32 B | 2.9 GB/s | 14.2 GB/s | **4.9x** |
| 64 B | 2.9 GB/s | 23.2 GB/s | **8x** |
| 256 B | 2.9 GB/s | 47.2 GB/s | **16x** |
| 1 KB | 2.8 GB/s | 60.0 GB/s | **21x** |
| 4 KB+ | 2.9 GB/s | ~68-70 GB/s | **23-24x** |
### Summary
- **SSE2 (default):** Small inputs (4-16 B) 11-81% faster; 32-64 B ~11% slower; large inputs unchanged
- **AVX-512 (native):** 21-24x faster for inputs ≥1 KB, peak ~70 GB/s (was ~3 GB/s)
Note: this is the pure ascii path, but the story is similar for the others.
See linked bench project.
## Test Plan
- [x] Assembly test (`slice-is-ascii-avx512.rs`) verifies no `kshiftrd` with AVX-512
- [x] Existing codegen test updated to `loongarch64`-only (auto-vectorization still used there)
- [x] Fuzz testing confirms old/new implementations produce identical results (~53M iterations)
- [x] Benchmarks confirm performance improvement
- [x] Tidy checks pass
## Reproduction / Test Projects
Standalone validation tools: https://github.com/bonega/is-ascii-fix-validation
- `bench/` - Criterion benchmarks for SSE2 vs AVX-512 comparison
- `fuzz/` - Compares old/new implementations with libfuzzer
## Related Issues
- issue opened by @folkertdev llvm/llvm-project#176906
- Regression introduced in https://github.com/rust-lang/rust/pull/130733
Dropped the `align` test since the `POOL_ALIGNMENT` and `align_size`
items it uses do not exist.
The other changes are straightforward fixes for places where the test
code drifted from the current API, since the tests are not yet built in
CI for the UEFI target.
This changes the `test` build script so that it does not use the default
fingerprinting mechanism in cargo which causes a full scan of the
package every time it runs. This build script does not depend on any of
the files in the package.
This is the recommended approach for writing build scripts.
Clean up or resolve cfg-related instances of `FIXME(f16_f128)`
* Replace target-specific config that has a `FIXME` with `cfg(target_has_reliable_f*)`
* Take care of trivial intrinsic-related FIXMEs
* Split `FIXME(f16_f128)` into `FIXME(f16)`, `FIXME(f128)`, or `FIXME(f16,f128)` to more clearly identify what they block
The individual commit messages have more details.
This FIXME was introduced in 6e2d934a88 ("Add more `f16` and `f128`
library functions and constants") but I can't actually find my
reasoning. As far as I can tell, LLVM has been lowering `fabs.f128` as
bitwise operations for a while across all targets so there shouldn't be
a problem here.
Thus, apply what is suggested in the FIXME.
There are a number of instances of `FIXME(f16_f128)` related to target
configuration; either these could use `target_has_reliable_f128`, or the
FIXME is describing such a cfg and is thus redundant (since any
`cfg(target_has_reliable_f*)` needs to be removed before stabilization
anyway).
Switch to using `target_has_reliable_*` where applicable and remove the
redundant FIXMEs.
std: use `clock_nanosleep` for `sleep` where available
`nanosleep` is specified to use `CLOCK_REALTIME` but the documentation (especially the example) for `sleep` imply that it measures time using `Instant`, which uses `CLOCK_MONOTONIC`. Thus, this PR makes `sleep` use a relative `clock_nanosleep` with `CLOCK_MONOTONIC` where available. This doesn't make a difference for Linux (which uses `CLOCK_MONOTONIC` for `nanosleep` anyway) but is relevant for e.g. FreeBSD.
This also restores nanosecond-sleep precision for WASI, since https://github.com/rust-lang/rust/issues/150290 was caused by `nanosleep` internally using `clock_nanosleep` with `CLOCK_REALTIME` which is unsupported on WASIp2.
CC @alexcrichton for the WASI fix
Remove the `#[target_feature(enable = "sse2")]` attribute and make the
function safe to call. The SSE2 requirement is already enforced by the
`#[cfg(target_feature = "sse2")]` predicate.
Individual unsafe blocks are used for intrinsic calls with appropriate
SAFETY comments.
Also adds FIXME reference to llvm#176906 for tracking when this
workaround can be removed.
fix `f16` doctest FIXMEs
tracking issue: https://github.com/rust-lang/rust/issues/116909
Remove a bunch of fixmes, and run docs tests on all targets that (should) support them.
r? tgross35
fix fallback impl for select_unpredictable intrinsic
`intrinsics::select_unpredictable` does not drop the value that is not selected, but the fallback impl did not consider this behavior. This creates an observable difference between Miri and actual execution, and possibly does not play well with the `core::hint` version which has extra logic to drop the value that was not selected.
```rust
#![feature(core_intrinsics)]
fn main() {
core::intrinsics::select_unpredictable(true, LoudDrop(0), LoudDrop(1));
// cargo run: "0"; cargo miri run: "1 0"
}
struct LoudDrop(u8);
impl Drop for LoudDrop {
fn drop(&mut self) {
print!("{} ", self.0);
}
}
```
This change let me remove the `T: [const] Destruct` bound as well, since the destructor is no longer relevant.
Support slices in type reflection
Tracking issue: rust-lang/rust#146922
This PR adds support for inspecting slices `[T]` through type reflection. It does so by adding the new `Slice` struct + variant, which is very similar to `Array` but without the `len` field:
```rust
pub struct Slice {
pub element_ty: TypeId,
}
```
Here is an example reflecting a slice:
```rust
match const { Type::of::<[u8]>() }.kind {
TypeKind::Slice(slice) => assert_eq!(slice.element_ty, TypeId::of::<u8>()),
_ => unreachable!(),
}
```
In addition to this, I also re-arranged the type info unit tests.
r? @oli-obk
Rollup of 5 pull requests
Successful merges:
- rust-lang/rust#151010 (std: use `ByteStr`'s `Display` for `OsStr`)
- rust-lang/rust#151156 (Add GCC and the GCC codegen backend to build-manifest and rustup)
- rust-lang/rust#151219 (Fixed ICE when using function pointer as const generic parameter)
- rust-lang/rust#151343 (Port some crate level attrs to the attribute parser)
- rust-lang/rust#151463 (add x86_64-unknown-linux-gnuasan to CI)
r? @ghost
std: use `ByteStr`'s `Display` for `OsStr`
Besides reducing duplication, this also results in formatting parameters like padding, align and fill being respected.
Move bootstrap configuration to library workspace
This creates a new "dist" profile in the standard library which contains configuration for the distributed std artifacts previously contained in bootstrap, in order for a future build-std implementation to use. bootstrap.toml settings continue to override these defaults, as would any RUSTFLAGS provided. I've left some cargo features driven by bootstrap for a future patch.
Unfortunately, profiles aren't expressive enough to express per-target overrides, so [this risc-v example](c8f22ca269/src/bootstrap/src/core/build_steps/compile.rs (L692)) was not able to be moved across. This could go in its own profile which Cargo would have to know to use, and then the panic-abort rustflags overrides would need duplicating again. Doesn't seem like a sustainable solution as a couple similar overrides would explode the number of lines here.
We could use a cargo config in the library workspace for this, but this then would have to be respected by Cargo's build-std implementation and I'm not yet sure about the tradeoffs there.
This patch also introduces a build probe to deal with the test crate's stability which is obviously not ideal, I'm open to other solutions here or can back that change out for now if anyone prefers.
cc @Mark-Simulacrum https://github.com/rust-lang/rfcs/pull/3874
make `simd_insert_dyn` and `simd_extract_dyn` const
For use in `stdarch`. We currently use an equivalent of the fallback body here, but on some targets the intrinsic generate better code.
r? RalfJung
Replace `#[rustc_do_not_implement_via_object]` with `#[rustc_dyn_incompatible_trait]`
Background: `#[rustc_do_not_implement_via_object]` on a trait currently still allows `dyn Trait` to exist (if the trait is otherwise dyn-compatible), it just means that `dyn Trait` does not automatically implement `Trait` via the normal object candidate. For some traits, this means that `dyn Trait` does not implement `Trait` at all (e.g. `Unsize` and `Tuple`). For some traits, this means that `dyn Trait` implements `Trait`, but with different associated types (e.g. `Pointee`, `DiscriminantKind`). Both of these cases can can cause issues with codegen , as seen in https://github.com/rust-lang/rust/issues/148089 (and https://github.com/rust-lang/rust/issues/148089#issuecomment-3447803823 ), because codegen assumes that if `dyn Trait` does not implement `Trait` (including if `dyn Trait<Assoc = T>` does not implement `Trait` with `Assoc == T`), then `dyn Trait` cannot be constructed, so vtable accesses on `dyn Trait` are unreachable, but this is not the case if `dyn Trait` has multiple supertraits: one which is `#[rustc_do_not_implement_via_object]`, and one which we are doing the vtable access to call a method from.
This PR replaces `#[rustc_do_not_implement_via_object]` with `#[rustc_dyn_incompatible_trait]`, which makes the marked trait dyn-incompatible, making `dyn Trait` not well-formed, instead of it being well-formed but not implementing `Trait`. This resolvesrust-lang/rust#148089 by making it not compile.
May fixrust-lang/rust#148615
The traits that are currently marked `#[rustc_do_not_implement_via_object]` are: `Sized`, `MetaSized`, `PointeeSized`, `TransmuteFrom`, `Unsize`, `BikeshedGuaranteedNoDrop`, `DiscriminantKind`, `Destruct`, `Tuple`, `FnPtr`, `Pointee`. Of these:
* `Sized` and `FnPtr` are already not dyn-compatible (`FnPtr: Copy`, which implies `Sized`)
* `MetaSized`
* Removed `#[rustc_do_not_implement_via_object]`. Still dyn-compatible after this change. (Has a special-case in the trait solvers to ignore the object candidate for `dyn MetaSized`, since it `dyn MetaSized: MetSized` comes from the sized candidate that all `dyn Trait` get.)
* `PointeeSized`
* Removed `#[rustc_do_not_implement_via_object]`. It doesn't seem to have been doing anything anyway ([playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2024&gist=a395626c8bef791b87a2d371777b7841)), since `PointeeSized` is removed before trait solving(?).
* `Pointee`, `DiscriminantKind`, `Unsize`, and `Tuple` being dyn-compatible without having `dyn Trait: Trait` (with same assoc tys) can be observed to cause codegen issues (https://github.com/rust-lang/rust/issues/148089) so should be made dyn-incompatible
* `Destruct`, `TransmuteFrom`, and `BikeshedGuaranteedNoDrop` I'm not sure if would be useful as object types, but they can be relaxed to being dyn-compatible later if it is determined they should be.
-----
<details> <summary> resolved </summary>
Questions before merge:
1. `dyn MetaSized: MetaSized` having both `SizedCandidate` and `ObjectCandidate`
1. I'm not sure if the checks in compiler/rustc_trait_selection/src/traits/project.rs and compiler/rustc_next_trait_solver/src/solve/assembly/mod.rs were "load-bearing" for `MetaSized` (which is the only trait that was previously `#[rustc_do_not_implement_via_object]` that is still dyn-compatible after this change). Is it fine to just remove them? Removing them (as I did in the second commit) doesn't change any UI test results.
3. IIUC, `dyn MetaSized` could get its `MetaSized` implementation in two ways: the object candidate (the normal `dyn Trait: Trait`) that was supressed by `#[rustc_do_not_implement_via_object]`, and the `SizedCandidate` (that all `dyn Trait` get for `dyn Trait: MetaSized`). Given that `MetaSized` has no associated types or methods, is it fine that these both exist now? Or is it better to only have the `SizedCandidate` and leave these checks in (i.e. drop the second commit, and remove the FIXMEs)?
4. Resolved: the trait solvers special-case `dyn MetaSized` to ignore the object candidate in preference to the sizedness candidate (technically the check is for any `is_sizedness_trait`, but only `MetaSized` gets this far (`Sized` is inherently dyn-incompatible, and `dyn PointeeSized` is ill-formed for other reasons)
4. Diagnostics improvements?
1. The diagnostics are kinda bad. If you have a `trait Foo: Pointee {}`, you now get a note that reads like *Foo* "opted out of dyn-compatbility", when really `Pointee` did that.
2. Resolved: can be improved later
<details> <summary>diagnostic example</summary>
```rs
#![feature(ptr_metadata)]
trait C: std::ptr::Pointee {}
fn main() {
let y: &dyn C;
}
```
```rs
error[E0038]: the trait `C` is not dyn compatible
--> c.rs:6:17
|
6 | let y: &dyn C;
| ^ `C` is not dyn compatible
|
note: for a trait to be dyn compatible it needs to allow building a vtable
for more information, visit <https://doc.rust-lang.org/reference/items/traits.html#dyn-compatibility>
--> /home/zachary/opt_mount/zachary/Programming/rust-compiler-2/library/core/src/ptr/metadata.rs:57:1
|
57 | #[rustc_dyn_incompatible_trait]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ...because it opted out of dyn-compatbility
|
::: c.rs:3:7
|
3 | trait C: std::ptr::Pointee {}
| - this trait is not dyn compatible...
error: aborting due to 1 previous error
For more information about this error, try `rustc --explain E0038`.
```
</details> </details>
Still investigating "3. `compiler/rustc_hir/src/attrs/encode_cross_crate.rs`: Should `DynIncompatibleTrait` attribute be encoded cross crate?"