alloc::string
pub struct String {
vec: Vec<u8>,
}
A UTF-8–encoded, growable string.
String is the most common string type. It has ownership over the contents of the string, stored in a heap-allocated buffer (see Representation). It is closely related to its borrowed counterpart, the primitive [str].
You can create a String from a literal string with [String::from]:
let hello = String::from("Hello, world!");
You can append a char to a String with the [push] method, and append a [&str] with the [push_str] method:
let mut hello = String::from("Hello, ");
hello.push('w');
hello.push_str("orld!");
If you have a vector of UTF-8 bytes, you can create a String from it with the [from_utf8] method:
let sparkle_heart = vec![240, 159, 146, 150];
let sparkle_heart = String::from_utf8(sparkle_heart).unwrap();
assert_eq!("💖", sparkle_heart);
Strings are always valid UTF-8. If you need a non-UTF-8 string, consider OsString. It is similar, but without the UTF-8 constraint. Because UTF-8 is a variable width encoding, Strings are typically smaller than an array of the same chars:
let s = "hello";
assert_eq!(s.len(), 5);
let s = ['h', 'e', 'l', 'l', 'o'];
let size: usize = s.into_iter().map(|c| size_of_val(&c)).sum();
assert_eq!(size, 20);
let s = "💖💖💖💖💖";
assert_eq!(s.len(), 20);
let s = ['💖', '💖', '💖', '💖', '💖'];
let size: usize = s.into_iter().map(|c| size_of_val(&c)).sum();
assert_eq!(size, 20);
This raises interesting questions as to how s[i] should work. What should i be here? Several options include byte indices and char indices but, because of UTF-8 encoding, only byte indices would provide constant time indexing. Getting the ith char, for example, is available using [chars]:
let s = "hello";
let third_character = s.chars().nth(2);
assert_eq!(third_character, Some('l'));
let s = "💖💖💖💖💖";
let third_character = s.chars().nth(2);
assert_eq!(third_character, Some('💖'));
Next, what should s[i] return? Because indexing returns a reference to underlying data it could be &u8, &[u8], or something similar. Since we’re only providing one index, &u8 makes the most sense but that might not be what the user expects and can be explicitly achieved with [as_bytes()]:
let s = "hello";
assert_eq!(s.as_bytes()[0], 104);
assert_eq!(s.as_bytes()[0], b'h');
let s = "💖💖💖💖💖";
assert_eq!(s.as_bytes()[0], 240);
Due to these ambiguities/restrictions, indexing with a usize is simply forbidden:
let s = "hello";
println!("The first letter of s is {}", s[0]);
It is more clear, however, how &s[i..j] should work (that is, indexing with a range). It should accept byte indices (to be constant-time) and return a &str which is UTF-8 encoded. This is also called “string slicing”. Note this will panic if the byte indices provided are not character boundaries - see [is_char_boundary] for more details. See the implementations for [SliceIndex<str>] for more details on string slicing. For a non-panicking version of string slicing, see [get].
The [bytes] and [chars] methods return iterators over the bytes and codepoints of the string, respectively. To iterate over codepoints along with byte indices, use [char_indices].
String implements [Deref]<Target = [str]>, and so inherits all of [str]’s methods. In addition, this means that you can pass a String to a function which takes a [&str] by using an ampersand (&):
fn takes_str(s: &str) { }
let s = String::from("Hello");
takes_str(&s);
This will create a [&str] from the String and pass it in. This conversion is very inexpensive, and so generally, functions will accept [&str]s as arguments unless they need a String for some specific reason.
In certain cases Rust doesn’t have enough information to make this conversion, known as [Deref] coercion. In the following example a string slice &'a str implements the trait TraitExample, and the function example_func takes anything that implements the trait. In this case Rust would need to make two implicit conversions, which Rust doesn’t have the means to do. For that reason, the following example will not compile.
trait TraitExample {}
impl<'a> TraitExample for &'a str {}
fn example_func<A: TraitExample>(example_arg: A) {}
let example_string = String::from("example_string");
example_func(&example_string);
There are two options that would work instead. The first would be to change the line example_func(&example_string); to example_func(example_string.as_str());, using the method [as_str()] to explicitly extract the string slice containing the string. The second way changes example_func(&example_string); to example_func(&*example_string);. In this case we are dereferencing a String to a [str], then referencing the [str] back to [&str]. The second way is more idiomatic, however both work to do the conversion explicitly rather than relying on the implicit conversion.
A String is made up of three components: a pointer to some bytes, a length, and a capacity. The pointer points to the internal buffer which String uses to store its data. The length is the number of bytes currently stored in the buffer, and the capacity is the size of the buffer in bytes. As such, the length will always be less than or equal to the capacity.
This buffer is always stored on the heap.
You can look at these with the [as_ptr], [len], and [capacity] methods:
let story = String::from("Once upon a time...");
let (ptr, len, capacity) = story.into_raw_parts();
assert_eq!(19, len);
let s = unsafe { String::from_raw_parts(ptr, len, capacity) } ;
assert_eq!(String::from("Once upon a time..."), s);
If a String has enough capacity, adding elements to it will not re-allocate. For example, consider this program:
let mut s = String::new();
println!("{}", s.capacity());
for _ in 0..5 {
s.push_str("hello");
println!("{}", s.capacity());
}
This will output the following:
0
8
16
16
32
32
At first, we have no memory allocated at all, but as we append to the string, it increases its capacity appropriately. If we instead use the [with_capacity] method to allocate the correct capacity initially:
let mut s = String::with_capacity(25);
println!("{}", s.capacity());
for _ in 0..5 {
s.push_str("hello");
println!("{}", s.capacity());
}
We end up with a different output:
25
25
25
25
25
25
Here, there’s no need to allocate more memory inside the loop.