Inclusive Range Performance

Inclusive ranges may hurt performance compared to the equivalent exclusive range.

Published Oct 08, 2023

I got baited by this r/rust post on Zig being faster than Rust. That comparison is not the focus of this tidbit; this comment in the same post is:

Try to change this:
let sqrt = (n as f32).sqrt() as u32;
!(3..=sqrt)
...to:
let sqrt = (n as f32).sqrt() as u32 + 1;
!(3..sqrt) ...
...just for fun and see what happens. The ..= range can be significantly slower than the .. range because in some cases it compiles to two separate comparisons instead of one. I don't know if it will make a difference in your case but I'm curious.

u/ObligatoryOption

After a short excursion, I learned something new!

Inclusive ranges can be slower than their equivalent exclusive ranges.

The inclusive range version requires additional checks because the upper bound may be the largest value of the integer type. Without those checks, the iteration variable would overflow past the bound and cause an infinite loop; see also this comment by CAD1997. The checks also leave the compiler fewer optimization opportunities.
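To make the extra check concrete, here is a rough hand-written model of the two loop shapes. It is a simplified sketch for illustration, not the standard library's actual iterator code; the function names `count_exclusive` and `count_inclusive` are made up for this example, and `u8` is used only so the maximum value is easy to reach.

```rust
/// Model of an exclusive range `start..end`: one comparison per step.
pub fn count_exclusive(start: u8, end: u8) -> u32 {
    let mut n = 0;
    let mut i = start;
    while i < end {
        n += 1;
        i += 1; // cannot overflow, because i < end <= u8::MAX
    }
    n
}

/// Model of an inclusive range `start..=end`: it needs an extra "exhausted"
/// flag, because `i` cannot step past `end` when `end == u8::MAX`.
pub fn count_inclusive(start: u8, end: u8) -> u32 {
    let mut n = 0;
    let mut i = start;
    let mut exhausted = start > end; // an empty range is exhausted from the start
    while !exhausted {
        n += 1;
        if i < end {
            i += 1;
        } else {
            // The last element was just visited: stop without incrementing.
            exhausted = true;
        }
    }
    n
}
```

In this model the exclusive loop does one comparison per iteration, while the inclusive loop checks both the flag and the bound, which mirrors the two comparisons per iteration in the inclusive assembly.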

Rust

pub fn exclusive(upper_limit: u64) -> u64 {
    let mut sum = 0;
    for _ in 1..(upper_limit + 1) {
        sum += std::hint::black_box(1);
    }
    sum
}

pub fn inclusive(upper_limit: u64) -> u64 {
    let mut sum = 0;
    for _ in 1..=upper_limit {
        sum += std::hint::black_box(1);
    }
    sum
}

Assembly

example::exclusive:
        lea     rax, [rdi - 1]
        cmp     rax, -3
        ja      .LBB0_1
        xor     eax, eax
        lea     rcx, [rsp - 8]
.LBB0_4:
        mov     qword ptr [rsp - 8], 1
        add     rax, qword ptr [rsp - 8]
        dec     rdi
        jne     .LBB0_4
        ret
.LBB0_1:
        xor     eax, eax
        ret

example::inclusive:
        test    rdi, rdi
        je      .LBB1_1
        mov     ecx, 1
        xor     eax, eax
        lea     rdx, [rsp - 8]
.LBB1_4:
        mov     rsi, rcx
        cmp     rcx, rdi
        adc     rcx, 0
        mov     qword ptr [rsp - 8], 1
        add     rax, qword ptr [rsp - 8]
        cmp     rsi, rdi
        jae     .LBB1_2
        cmp     rcx, rdi
        jbe     .LBB1_4
.LBB1_2:
        ret
.LBB1_1:
        xor     eax, eax
        ret

Benchmark

The benchmark was done using Criterion:

fn benchmark(c: &mut criterion::Criterion) {
    let mut group = c.benchmark_group("Iteration");
    for limit in &[256, 512, 1024, 2048, 4096, 8192] {
        group.bench_with_input(
            criterion::BenchmarkId::new("Exclusive", limit),
            limit,
            |b, i| b.iter(|| exclusive(*i)),
        );
        group.bench_with_input(
            criterion::BenchmarkId::new("Inclusive", limit),
            limit,
            |b, i| b.iter(|| inclusive(*i)),
        );
    }
    group.finish();
}

criterion::criterion_group!(benches, benchmark);
criterion::criterion_main!(benches);

And the output:

As you can see, the exclusive range version runs about twice as fast as the inclusive range version.
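One caveat when rewriting `1..=upper_limit` as `1..(upper_limit + 1)`: the addition overflows when `upper_limit` is `u64::MAX`. That is a panic in debug builds and, with default release settings, a wrap to zero that yields an empty loop. A possible way to keep the exclusive loop on the common path is sketched below. The helper `sum_to` is hypothetical and not part of the benchmark above, and whether it actually helps in a given program should be measured rather than assumed.

```rust
/// Hypothetical helper: sums 1..=upper_limit, using an exclusive range
/// whenever `upper_limit + 1` is representable.
pub fn sum_to(upper_limit: u64) -> u64 {
    let mut sum = 0u64;
    match upper_limit.checked_add(1) {
        // Common case: the exclusive bound fits in a u64.
        Some(end) => {
            for i in 1..end {
                sum = sum.wrapping_add(i);
            }
        }
        // upper_limit == u64::MAX: fall back to the inclusive range.
        None => {
            for i in 1..=upper_limit {
                sum = sum.wrapping_add(i);
            }
        }
    }
    sum
}
```

For example, `sum_to(4)` walks `1..5` and returns 10; only the `u64::MAX` input takes the inclusive fallback.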
