On Ad-Hoc Polymorphism and Type Safety

Summary

In the post, the author states this:

I'll focus on the Rust example, and then explain why I think the premise of the article is flawed.

Ignoring the poor unidiomatic use in the "ad-hoc polymorphic" example, the author shows two versions of allow_list_count, one which assumes the type of the allow_list field, the other relying on its implementation of the IntoIterator trait.

The author then introduces a "refactor" to the original data type:

Their claim is then that the "ad-hoc polymorphic" version will still compile and produce an unexpected result, while the "monomorphic" version will fail to compile.

Counterexample

The reason why I believe the argument does not hold is because a different refactor (with a similar rationale as provided by the author) will result in the "monomorphic" version being incorrect too!

Note that this situation is plausible and not unlike the one given by the author the type of the allow_list field is changed to match an according change in semantics. Now, the "monomorphic" version compiles, but since it also includes clients with enabled = false the count is incorrect! Fundamentally, a change in semantics may break any code relying on it, regardless of how that code is implemented.

  1. The example uses a boolean field here for simplicity, but "best practice" dictates that we use an enum with two variants (eg. Access::Allow and Access::Disallow).

  1. For the "ad-hoc polymorphic" code:

    fn (: Settings) ->  {
        .allow_list.into_iter().count()
    }

    a. The semantic assumption is that settings.allow_list is a collection of allowed clients.
    b. The code assumption is that settings.allow_list is an iterable (ie. can be turned into an iterator).

    The change of type from Vec<ClientIdentifier> to Option<Vec<ClientIdentifier>> breaks the first assumption but not the second. Since the second assumption is not violated, the code compiles.

  2. For the "monomorphic" code:

    fn (: Settings) ->  {
        ::(.allow_list)
    }

    a. The semantic assumption is that settings.allow_list is a collection of allowed clients.
    b. The code assumption is that settings.allow_list is a vector (specifically, std::vec::Vec).

    The change of type from Vec<ClientIdentifier> to Option<Vec<ClientIdentifier>> breaks both the first and second assumption. Since the second assumption is violated, the code fails to compile.

    However, the change of type from Vec<ClientIdentifier> to Vec<Client> breaks the first assumption but not the second. Since the second assumption is not violated, the code compiles. It is hopefully clear that we are in the same situation as 1. where the code is now incorrect despite compiling.

codeintel::block_b983937aa0f5dd81
struct Settings {
    allow_list: Vec<{unknown}>,
}
codeintel::block_b983937aa0f5dd81::Settings
allow_list: Vec<{unknown}, Global>
alloc::vec
pub struct Vec<T, A = Global>
where
    A: Allocator,
{
    buf: RawVec<T, A>,
    len: usize,
}

A contiguous growable array type, written as Vec<T>, short for ‘vector’.

Examples

let mut vec = Vec::new();
vec.push(1);
vec.push(2);

assert_eq!(vec.len(), 2);
assert_eq!(vec[0], 1);

assert_eq!(vec.pop(), Some(2));
assert_eq!(vec.len(), 1);

vec[0] = 7;
assert_eq!(vec[0], 7);

vec.extend([1, 2, 3]);

for x in &vec {
    println!("{x}");
}
assert_eq!(vec, [7, 1, 2, 3]);

The vec macro is provided for convenient initialization:

let mut vec1 = vec![1, 2, 3];
vec1.push(4);
let vec2 = Vec::from([1, 2, 3, 4]);
assert_eq!(vec1, vec2);

It can also initialize each element of a Vec<T> with a given value. This may be more efficient than performing allocation and initialization in separate steps, especially when initializing a vector of zeros:

let vec = vec![0; 5];
assert_eq!(vec, [0, 0, 0, 0, 0]);

// The following is equivalent, but potentially slower:
let mut vec = Vec::with_capacity(5);
vec.resize(5, 0);
assert_eq!(vec, [0, 0, 0, 0, 0]);

For more information, see Capacity and Reallocation.

Use a Vec<T> as an efficient stack:

let mut stack = Vec::new();

stack.push(1);
stack.push(2);
stack.push(3);

while let Some(top) = stack.pop() {
    // Prints 3, 2, 1
    println!("{top}");
}

Indexing

The Vec type allows access to values by index, because it implements the Index trait. An example will be more explicit:

let v = vec![0, 2, 4, 6];
println!("{}", v[1]); // it will display '2'

However be careful: if you try to access an index which isn’t in the Vec, your software will panic! You cannot do this:

let v = vec![0, 2, 4, 6];
println!("{}", v[6]); // it will panic!

Use [get] and [get_mut] if you want to check whether the index is in the Vec.

Slicing

A Vec can be mutable. On the other hand, slices are read-only objects. To get a slice, use &. Example:

fn read_slice(slice: &[usize]) {
    // ...
}

let v = vec![0, 1];
read_slice(&v);

// ... and that's all!
// you can also do it like this:
let u: &[usize] = &v;
// or like this:
let u: &[_] = &v;

In Rust, it’s more common to pass slices as arguments rather than vectors when you just want to provide read access. The same goes for [String] and [&str].

Capacity and reallocation

The capacity of a vector is the amount of space allocated for any future elements that will be added onto the vector. This is not to be confused with the length of a vector, which specifies the number of actual elements within the vector. If a vector’s length exceeds its capacity, its capacity will automatically be increased, but its elements will have to be reallocated.

For example, a vector with capacity 10 and length 0 would be an empty vector with space for 10 more elements. Pushing 10 or fewer elements onto the vector will not change its capacity or cause reallocation to occur. However, if the vector’s length is increased to 11, it will have to reallocate, which can be slow. For this reason, it is recommended to use Vec::with_capacity whenever possible to specify how big the vector is expected to get.

Guarantees

Due to its incredibly fundamental nature, Vec makes a lot of guarantees about its design. This ensures that it’s as low-overhead as possible in the general case, and can be correctly manipulated in primitive ways by unsafe code. Note that these guarantees refer to an unqualified Vec<T>. If additional type parameters are added (e.g., to support custom allocators), overriding their defaults may change the behavior.

Most fundamentally, Vec is and always will be a (pointer, capacity, length) triplet. No more, no less. The order of these fields is completely unspecified, and you should use the appropriate methods to modify these. The pointer will never be null, so this type is null-pointer-optimized.

However, the pointer might not actually point to allocated memory. In particular, if you construct a Vec with capacity 0 via Vec::new, vec![], Vec::with_capacity(0), or by calling [shrink_to_fit] on an empty Vec, it will not allocate memory. Similarly, if you store zero-sized types inside a Vec, it will not allocate space for them. Note that in this case the Vec might not report a [capacity] of 0. Vec will allocate if and only if size_of::<T>() * capacity > 0. In general, Vec’s allocation details are very subtle — if you intend to allocate memory using a Vec and use it for something else (either to pass to unsafe code, or to build your own memory-backed collection), be sure to deallocate this memory by using from_raw_parts to recover the Vec and then dropping it.

If a Vec has allocated memory, then the memory it points to is on the heap (as defined by the allocator Rust is configured to use by default), and its pointer points to [len] initialized, contiguous elements in order (what you would see if you coerced it to a slice), followed by [capacity] - [len] logically uninitialized, contiguous elements.

A vector containing the elements 'a' and 'b' with capacity 4 can be visualized as below. The top part is the Vec struct, it contains a pointer to the head of the allocation in the heap, length and capacity. The bottom part is the allocation on the heap, a contiguous memory block.

            ptr      len  capacity
       +--------+--------+--------+
       | 0x0123 |      2 |      4 |
       +--------+--------+--------+
            |
            v
Heap   +--------+--------+--------+--------+
       |    'a' |    'b' | uninit | uninit |
       +--------+--------+--------+--------+
  • uninit represents memory that is not initialized, see [MaybeUninit].
  • Note: the ABI is not stable and Vec makes no guarantees about its memorylayout (including the order of fields).

Vec will never perform a “small optimization” where elements are actually stored on the stack for two reasons:

  • It would make it more difficult for unsafe code to correctly manipulate a Vec. The contents of a Vec wouldn’t have a stable address if it were only moved, and it would be more difficult to determine if a Vec had actually allocated memory.

  • It would penalize the general case, incurring an additional branch on every access.

Vec will never automatically shrink itself, even if completely empty. This ensures no unnecessary allocations or deallocations occur. Emptying a Vec and then filling it back up to the same [len] should incur no calls to the allocator. If you wish to free up unused memory, use [shrink_to_fit] or [shrink_to].

[push] and [insert] will never (re)allocate if the reported capacity is sufficient. [push] and [insert] will (re)allocate if [len] == [capacity]. That is, the reported capacity is completely accurate, and can be relied on. It can even be used to manually free the memory allocated by a Vec if desired. Bulk insertion methods may reallocate, even when not necessary.

Vec does not guarantee any particular growth strategy when reallocating when full, nor when [reserve] is called. The current strategy is basic and it may prove desirable to use a non-constant growth factor. Whatever strategy is used will of course guarantee O(1) amortized [push].

It is guaranteed, in order to respect the intentions of the programmer, that all of vec![e_1, e_2, ..., e_n], vec![x; n], and [Vec::with_capacity(n)] produce a Vec that requests an allocation of the exact size needed for precisely n elements from the allocator, and no other size (such as, for example: a size rounded up to the nearest power of 2). The allocator will return an allocation that is at least as large as requested, but it may be larger.

It is guaranteed that the [Vec::capacity] method returns a value that is at least the requested capacity and not more than the allocated capacity.

The method Vec::shrink_to_fit will attempt to discard excess capacity an allocator has given to a Vec. If [len] == [capacity], then a Vec<T> can be converted to and from a Box<[T]> without reallocating or moving the elements. Vec exploits this fact as much as reasonable when implementing common conversions such as [into_boxed_slice].

Vec will not specifically overwrite any data that is removed from it, but also won’t specifically preserve it. Its uninitialized memory is scratch space that it may use however it wants. It will generally just do whatever is most efficient or otherwise easy to implement. Do not rely on removed data to be erased for security purposes. Even if you drop a Vec, its buffer may simply be reused by another allocation. Even if you zero a Vec’s memory first, that might not actually happen because the optimizer does not consider this a side-effect that must be preserved. There is one case which we will not break, however: using unsafe code to write to the excess capacity, and then increasing the length to match, is always valid.

Currently, Vec does not guarantee the order in which elements are dropped. The order has changed in the past and may change again.

codeintel::block_c1f6ae03a7868e31
fn allow_list_count(settings: Settings) -> usize
settings: {unknown}
usize

The pointer-sized unsigned integer type.

The size of this primitive is how many bytes it takes to reference any location in memory. For example, on a 32 bit target, this is 4 bytes and on a 64 bit target, this is 8 bytes.

alloc::vec::Vec
impl<T, A> Vec<T, A>
pub const fn len(&self) -> usize
where
    // Bounds from impl:
    A: Allocator,

Returns the number of elements in the vector, also referred to as its ‘length’.

Examples

let a = vec![1, 2, 3];
assert_eq!(a.len(), 3);
codeintel::block_5f55c94274abc387
struct Settings {
    allow_list: Option<Vec<{unknown}>>,
}
codeintel::block_5f55c94274abc387::Settings
allow_list: Option<Vec<{unknown}, Global>>
core::option
pub enum Option<T> {
    None,
    Some( /* … */ ),
}

The Option type. See the module level documentation for more.

std::macros
macro_rules! println

Prints to the standard output, with a newline.

On all platforms, the newline is the LINE FEED character (\n/U+000A) alone (no additional CARRIAGE RETURN (\r/U+000D)).

This macro uses the same syntax as format, but writes to the standard output instead. See [std::fmt] for more information.

The println! macro will lock the standard output on each call. If you call println! within a hot loop, this behavior may be the bottleneck of the loop. To avoid this, lock stdout with io::stdout().lock():

use std::io::{stdout, Write};

let mut lock = stdout().lock();
writeln!(lock, "hello world").unwrap();

Use println! only for the primary output of your program. Use [eprintln!] instead to print error and progress messages.

See the formatting documentation in std::fmt for details of the macro argument syntax.

Panics

Panics if writing to [io::stdout] fails.

Writing to non-blocking stdout can cause an error, which will lead this macro to panic.

Examples

println!(); // prints just a newline
println!("hello there!");
println!("format {} arguments", "some");
let local_variable = "some";
println!("format {local_variable} arguments");
core::option::Option
Some(T)

Some value of type T.

alloc::macros
macro_rules! vec

Creates a [Vec] containing the arguments.

vec! allows Vecs to be defined with the same syntax as array expressions. There are two forms of this macro:

  • Create a [Vec] containing a given list of elements:
let v = vec![1, 2, 3];
assert_eq!(v[0], 1);
assert_eq!(v[1], 2);
assert_eq!(v[2], 3);
  • Create a [Vec] from a given element and size:
let v = vec![1; 3];
assert_eq!(v, [1, 1, 1]);

Note that unlike array expressions this syntax supports all elements which implement Clone and the number of elements doesn’t have to be a constant.

This will use clone to duplicate an expression, so one should be careful using this with types having a nonstandard Clone implementation. For example, vec![Rc::new(1); 5] will create a vector of five references to the same boxed integer value, not five references pointing to independently boxed integers.

Also, note that vec![expr; 0] is allowed, and produces an empty vector. This will still evaluate expr, however, and immediately drop the resulting value, so be mindful of side effects.

core::option::Option
impl<T> IntoIterator for Option<T>
fn into_iter(self) -> IntoIter<T>

Returns a consuming iterator over the possibly contained value.

Examples

let x = Some("string");
let v: Vec<&str> = x.into_iter().collect();
assert_eq!(v, ["string"]);

let x = None;
let v: Vec<&str> = x.into_iter().collect();
assert!(v.is_empty());
core::iter::traits::iterator::Iterator
pub trait Iterator
pub fn count(self) -> usize
where
    Self: Sized,

Consumes the iterator, counting the number of iterations and returning it.

This method will call [next] repeatedly until None is encountered, returning the number of times it saw Some. Note that [next] has to be called at least once even if the iterator does not have any elements.

Overflow Behavior

The method does no guarding against overflows, so counting elements of an iterator with more than usize::MAX elements either produces the wrong result or panics. If overflow checks are enabled, a panic is guaranteed.

Panics

This function might panic if the iterator has more than usize::MAX elements.

Examples

let a = [1, 2, 3];
assert_eq!(a.iter().count(), 3);

let a = [1, 2, 3, 4, 5];
assert_eq!(a.iter().count(), 5);
core::option::Option
None

No value.

alloc::string
pub struct String {
    vec: Vec<u8>,
}

A UTF-8–encoded, growable string.

String is the most common string type. It has ownership over the contents of the string, stored in a heap-allocated buffer (see Representation). It is closely related to its borrowed counterpart, the primitive [str].

Examples

You can create a String from a literal string with [String::from]:

let hello = String::from("Hello, world!");

You can append a char to a String with the [push] method, and append a [&str] with the [push_str] method:

let mut hello = String::from("Hello, ");

hello.push('w');
hello.push_str("orld!");

If you have a vector of UTF-8 bytes, you can create a String from it with the [from_utf8] method:

// some bytes, in a vector
let sparkle_heart = vec![240, 159, 146, 150];

// We know these bytes are valid, so we'll use `unwrap()`.
let sparkle_heart = String::from_utf8(sparkle_heart).unwrap();

assert_eq!("💖", sparkle_heart);

UTF-8

Strings are always valid UTF-8. If you need a non-UTF-8 string, consider OsString. It is similar, but without the UTF-8 constraint. Because UTF-8 is a variable width encoding, Strings are typically smaller than an array of the same chars:

// `s` is ASCII which represents each `char` as one byte
let s = "hello";
assert_eq!(s.len(), 5);

// A `char` array with the same contents would be longer because
// every `char` is four bytes
let s = ['h', 'e', 'l', 'l', 'o'];
let size: usize = s.into_iter().map(|c| size_of_val(&c)).sum();
assert_eq!(size, 20);

// However, for non-ASCII strings, the difference will be smaller
// and sometimes they are the same
let s = "💖💖💖💖💖";
assert_eq!(s.len(), 20);

let s = ['💖', '💖', '💖', '💖', '💖'];
let size: usize = s.into_iter().map(|c| size_of_val(&c)).sum();
assert_eq!(size, 20);

This raises interesting questions as to how s[i] should work. What should i be here? Several options include byte indices and char indices but, because of UTF-8 encoding, only byte indices would provide constant time indexing. Getting the ith char, for example, is available using [chars]:

let s = "hello";
let third_character = s.chars().nth(2);
assert_eq!(third_character, Some('l'));

let s = "💖💖💖💖💖";
let third_character = s.chars().nth(2);
assert_eq!(third_character, Some('💖'));

Next, what should s[i] return? Because indexing returns a reference to underlying data it could be &u8, &[u8], or something similar. Since we’re only providing one index, &u8 makes the most sense but that might not be what the user expects and can be explicitly achieved with [as_bytes()]:

// The first byte is 104 - the byte value of `'h'`
let s = "hello";
assert_eq!(s.as_bytes()[0], 104);
// or
assert_eq!(s.as_bytes()[0], b'h');

// The first byte is 240 which isn't obviously useful
let s = "💖💖💖💖💖";
assert_eq!(s.as_bytes()[0], 240);

Due to these ambiguities/restrictions, indexing with a usize is simply forbidden:

let s = "hello";

// The following will not compile!
println!("The first letter of s is {}", s[0]);

It is more clear, however, how &s[i..j] should work (that is, indexing with a range). It should accept byte indices (to be constant-time) and return a &str which is UTF-8 encoded. This is also called “string slicing”. Note this will panic if the byte indices provided are not character boundaries - see [is_char_boundary] for more details. See the implementations for [SliceIndex<str>] for more details on string slicing. For a non-panicking version of string slicing, see [get].

The [bytes] and [chars] methods return iterators over the bytes and codepoints of the string, respectively. To iterate over codepoints along with byte indices, use [char_indices].

Deref

String implements [Deref]<Target = [str]>, and so inherits all of [str]’s methods. In addition, this means that you can pass a String to a function which takes a [&str] by using an ampersand (&):

fn takes_str(s: &str) { }

let s = String::from("Hello");

takes_str(&s);

This will create a [&str] from the String and pass it in. This conversion is very inexpensive, and so generally, functions will accept [&str]s as arguments unless they need a String for some specific reason.

In certain cases Rust doesn’t have enough information to make this conversion, known as [Deref] coercion. In the following example a string slice &'a str implements the trait TraitExample, and the function example_func takes anything that implements the trait. In this case Rust would need to make two implicit conversions, which Rust doesn’t have the means to do. For that reason, the following example will not compile.

trait TraitExample {}

impl<'a> TraitExample for &'a str {}

fn example_func<A: TraitExample>(example_arg: A) {}

let example_string = String::from("example_string");
example_func(&example_string);

There are two options that would work instead. The first would be to change the line example_func(&example_string); to example_func(example_string.as_str());, using the method [as_str()] to explicitly extract the string slice containing the string. The second way changes example_func(&example_string); to example_func(&*example_string);. In this case we are dereferencing a String to a [str], then referencing the [str] back to [&str]. The second way is more idiomatic, however both work to do the conversion explicitly rather than relying on the implicit conversion.

Representation

A String is made up of three components: a pointer to some bytes, a length, and a capacity. The pointer points to the internal buffer which String uses to store its data. The length is the number of bytes currently stored in the buffer, and the capacity is the size of the buffer in bytes. As such, the length will always be less than or equal to the capacity.

This buffer is always stored on the heap.

You can look at these with the [as_ptr], [len], and [capacity] methods:

let story = String::from("Once upon a time...");

// Deconstruct the String into parts.
let (ptr, len, capacity) = story.into_raw_parts();

// story has nineteen bytes
assert_eq!(19, len);

// We can re-build a String out of ptr, len, and capacity. This is all
// unsafe because we are responsible for making sure the components are
// valid:
let s = unsafe { String::from_raw_parts(ptr, len, capacity) } ;

assert_eq!(String::from("Once upon a time..."), s);

If a String has enough capacity, adding elements to it will not re-allocate. For example, consider this program:

let mut s = String::new();

println!("{}", s.capacity());

for _ in 0..5 {
    s.push_str("hello");
    println!("{}", s.capacity());
}

This will output the following:

0
8
16
16
32
32

At first, we have no memory allocated at all, but as we append to the string, it increases its capacity appropriately. If we instead use the [with_capacity] method to allocate the correct capacity initially:

let mut s = String::with_capacity(25);

println!("{}", s.capacity());

for _ in 0..5 {
    s.push_str("hello");
    println!("{}", s.capacity());
}

We end up with a different output:

25
25
25
25
25
25

Here, there’s no need to allocate more memory inside the loop.

codeintel::block_9e6360fd4f440dda
struct Client {
    id: {unknown},
    enabled: bool,
}
codeintel::block_9e6360fd4f440dda::Client
id: {unknown}
codeintel::block_9e6360fd4f440dda::Client
enabled: bool
bool

The boolean type.

The bool represents a value, which could only be either true or false. If you cast a bool into an integer, true will be 1 and false will be 0.

Basic usage

bool implements various traits, such as [BitAnd], [BitOr], [Not], etc., which allow us to perform boolean operations using &, | and !.

if requires a bool value as its conditional. assert!, which is an important macro in testing, checks whether an expression is true and panics if it isn’t.

let bool_val = true & false | false;
assert!(!bool_val);

Examples

A trivial example of the usage of bool:

let praise_the_borrow_checker = true;

// using the `if` conditional
if praise_the_borrow_checker {
    println!("oh, yeah!");
} else {
    println!("what?!!");
}

// ... or, a match pattern
match praise_the_borrow_checker {
    true => println!("keep praising!"),
    false => println!("you should praise!"),
}

Also, since bool implements the Copy trait, we don’t have to worry about the move semantics (just like the integer and float primitives).

Now an example of bool cast to integer type:

assert_eq!(true as i32, 1);
assert_eq!(false as i32, 0);
codeintel::block_9e6360fd4f440dda
struct Settings {
    allow_list: Vec<Client>,
}
codeintel::block_9e6360fd4f440dda::Settings
allow_list: Vec<Client, Global>
codeintel::block_68483f2d1c8d6945
fn allow_list_count(settings: Settings) -> usize
codeintel::block_404e9a53979e8e69
fn allow_list_count(settings: Settings) -> usize