Collection types.
The Subspace library provides implementations of common general purpose programming data structures, with rich APIs that allow for interactions with Iterators, and with APIs that provide safe defaults.
The collections offer similar functionality to the C++ standard containers library but differ in some key ways.
- Introduce compiler errors for common mistakes instead of runtime failures and Undefined Behaviour which leads to miscompiles. Subspace relies on the latest C++ standards to achieve this, and does not provide less-safe APIs compatible with old C++ standards.
- Providing safe defaults. All API methods will do what is asked of them, or fail to compile. And in some cases, will perform runtime checks and terminate in the case of a software bug, which is represented in the method documentation.
- No uninitialized memory through default initialization.
- Indexing operations with negative signed values no longer compile.
Containers that index on
size_t
(as in the standard library), instead ofusize
, will silently accept memory safety bugs with negative indices. - Providing explicit unsafe backdoors. Occasionally runtime checks can't be elided by the compiler and they are in hot code that has visible performance impact. Explicit unsafe backdoors allow individual callsites to opt out of runtime checks as needed, with this choice being fully visible in the syntax of the code. This allows them to be properly scrutinized in code review or checked for with tooling.
- Providing fallible APIs for element access that hook into the rich,
composable APIs of
Option
in order to clearly and easily write error handling instead of Undefined Behaviour or crashes. - No accidental copies. Subspace collections (that are not view types) do
not satisfy the
Copy
concept, and instead must be explicitly cloned viasus::clone(x)
to make a copy. This allows them to be passed by value without introducing a copy at a caller that was expecting it to be received by reference. - Catch iterator invalidation. By default Subspace containers are built with runtime protection against iterator invalidation. Iterators produced by collections are tracked and if the collection is mutated while an iterator still exists, the collection will panic and terminate the program.
Subspace's collections can be grouped into four major categories:
- Sequences:
Vec
,Array
(TODO: VecDeque, LinkedList, Hive) - Maps: (TODO: HashMap, BTreeMap, FlatMap)
- Sets: (TODO: HashSet, BTreeSet, FlatSet)
- Misc: (TODO: BinaryHeap)
When Should You Use Which Collection
These are fairly high-level and quick break-downs of when each collection should be considered. Detailed discussions of strengths and weaknesses of individual collections can be found on their own documentation pages.
Use a Vec when:
- You want to collect items up to be processed or sent elsewhere later, and don't care about any properties of the actual values being stored.
- You want a sequence of elements in a particular order, and will only be appending to (or near) the end.
- You want a stack.
- You want a resizable array.
- You want a heap-allocated array.
Use an Array when:
- You want a fixed-size array of items that are all constructed up front and share a single lifetime.
- You want to store a sequence of compile-time constants.
- You want the sequence to live on the stack.
TODO: More collections here as they exist!
TODO: Performance info/comparisons when there's more types.
Slices vs spans
Slice
and SliceMut
are how the Subspace library exposes views of
contiguous sequences of elements with O(1) random access. They provide const
and mutable access to the underlying objects, respectively.
Slice<T>
is similar to
std::span<const T>
and SliceMut<T>
is similar
to std::span<T>
.
All contiguous sequence collections in this library can
implicitly convert to a Slice
(always) or
to a SliceMut
(if the collection is
mutable).
Slices and owning contiguous collections like
Vec
share most of the same API surface,
including methods such as
sort
,
chunks
,
iter
, and
concat
.
Capacity Management
Many collections provide several constructors and methods that refer to "capacity". These collections are generally built on top of an array. Optimally, this array would be exactly the right size to fit only the elements stored in the collection, but for the collection to do this would be very inefficient. If the backing array was exactly the right size at all times, then every time an element is inserted, the collection would have to grow the array to fit it. Due to the way memory is allocated and managed on most computers, this would almost surely require allocating an entirely new array and copying every single element from the old one into the new one. Hopefully you can see that this wouldn't be very efficient to do on every operation.
Most collections therefore use an amortized allocation strategy. They generally let themselves have a fair amount of unoccupied space so that they only have to grow on occasion. When they do grow, they allocate a substantially larger array to move the elements into so that it will take a while for another grow to be required. While this strategy is great in general, it would be even better if the collection never had to resize its backing array. Unfortunately, the collection itself doesn't have enough information to do this itself. Therefore, it is up to us programmers to give it hints.
Any with_capacity()
constructor will instruct the collection to allocate
enough space for the specified number of elements. Ideally this will be for
exactly that many elements, but some implementation details may prevent
this. See collection-specific documentation for details. In general, use
with_capacity()
when you know exactly how many elements will be inserted,
or at least have a reasonable upper-bound on that number.
When anticipating a large influx of elements, the reserve family of methods can be used to hint to the collection how much room it should make for the coming items. As with with_capacity, the precise behavior of these methods will be specific to the collection of interest.
For optimal performance, collections will generally avoid shrinking
themselves. If you believe that a collection will not soon contain any more
elements, or just really need the memory, the shrink_to_fit()
method
prompts the collection to shrink the backing array to the minimum size
capable of holding its elements.
Finally, if ever you're interested in what the actual capacity of the
collection is, most collections provide a capacity()
method to query this
information on demand. This can be useful for debugging purposes, or for use
with the reserve()
methods.
Iterators
Iterators are a powerful and robust mechanism used throughout the Subspace C++ library. Iterators provide a sequence of values in a generic, safe, efficient and convenient way. The contents of an iterator are usually lazily evaluated, so that only the values that are actually needed are ever actually produced, and no allocation need be done to temporarily store them. Iterators are primarily consumed using a for loop, although many functions also take iterators where a collection or sequence of values is desired.
All of the collections in Subspace provide several iterators for performing
bulk manipulation of their contents. The three primary iterators almost
every collection should provide are iter()
, iter_mut()
, and
into_iter()
. Some of these may not be provided on collections where it
would not be reasonable to provide them.
iter()
provides an iterator of immutable references to all the contents of
a collection in the most “natural” order. For sequence collections like Vec,
this means the items will be yielded in increasing order of index starting
at 0. For ordered collections like BTreeMap, this means that the items will
be yielded in sorted order. For unordered collections like HashMap, the
items will be yielded in whatever order the internal representation made
most convenient. This is great for reading through all the contents of the
collection.
const auto vec = sus::Vec<i32>(1, 2, 3, 4);
for (const auto& x: vec.iter()) {
fmt::println("vec contained {}", x);
}
iter_mut()
provides an iterator of mutable references in the same order as
iter. This is great for mutating all the contents of the collection.
auto vec = sus::Vec<i32>(1, 2, 3, 4);
for (auto& x: vec.iter_mut()) {
x += 1;
}
into_iter()
transforms the actual collection into an iterator over its
contents by-value. This is great when the collection itself is no longer
needed, and the values are needed elsewhere. Using extend()
with
into_iter()
is the main way that contents of one collection are moved into
another. extend()
automatically calls into_iter()
, and takes any T
that satisfies IntoIterator
. Calling collect()
on an iterator itself is
also a great way to convert one collection into another. Both of these
methods should internally use the capacity management tools discussed in the
previous section to do this as efficiently as possible.
auto vec1 = sus::Vec<i32>(1, 2, 3, 4);
auto vec2 = sus::Vec<i32>(10, 20, 30, 40);
vec1.extend(sus::move(vec2));
#include "sus/collections/vec_deque.h"
auto vec = sus::Vec<i32>(1, 2, 3, 4);
auto deque = sus::move(vec).into_iter().collect<sus::VecDeque<i32>>();
Iterators also provide a series of adapter methods for performing common threads to sequences. Among the adapters are functional favorites like map, fold, skip and take. Of particular interest to collections is the rev adapter, which reverses any iterator that supports this operation. Most collections provide reversible iterators as the way to iterate over them in reverse order.
auto vec = sus::Vec<i32>(1, 2, 3, 4);
for (const auto& x: vec.iter().rev()) {
fmt::println("vec contained {}", x);
}
Several other collection methods also return iterators to yield a sequence of results but avoid allocating an entire collection to store the result in. This provides maximum flexibility as collect or extend can be called to “pipe” the sequence into any collection if desired. Otherwise, the sequence can be looped over with a for loop. The iterator can also be discarded after partial use, preventing the computation of the unused items.
Ranges
The collections in the Subspace C++ library can be used with standard ranges
by calling the range()
adaptor on any
Iterator
. It will return an object that satisfies
std::ranges::input_range
and
std::ranges::viewable_range
,
such as with vec.iter().range()
.
Iterators over value types or mutable references when converted by
range()
will also satisfy
std::ranges::output_range
,
such as with vec.iter_mut().range()
.
Conversely, an Iterator
can be constructed for
a standard range such as std::vector
through
sus::iter::from_range
.
Familiarity with Rust APIs
These collections were inspired by porting Rust
std::collections
and traits/concepts into C++. Additional changes are minimized to aid with
familiarity and working across languages, but some are necessary for use in
C++ or interop with the standard library. Subspace also provides additional
containers specific to C++ when needed, such as Array
(and TODO: FlatMap).
Classes
-
A collection of objects of type
T
, with a fixed sizeN
. -
An iterator that consumes an
Array
and returns the items from it. -
An iterator over a slice in (non-overlapping) chunks (
chunk_size
elements at a time), starting at the beginning of the slice. -
An iterator over a slice in (non-overlapping) chunks (
chunk_size
elements at a time), starting at the beginning of the slice. -
An iterator over a mutable slice in (non-overlapping) chunks (
chunk_size
elements at a time), starting at the beginning of the slice. -
An iterator over mutable a slice in (non-overlapping) chunks (
chunk_size
elements at a time), starting at the beginning of the slice. -
A draining iterator for
Vec<T>
. -
An iterator over a slice in (non-overlapping) chunks (
chunk_size
elements at a time), starting at the end of the slice. -
An iterator over a slice in (non-overlapping) chunks (
chunk_size
elements at a time), starting at the end of the slice. -
An iterator over a mutable slice in (non-overlapping) chunks (
chunk_size
elements at a time), starting at the end of the slice. -
An iterator over a mutable slice in (non-overlapping) chunks (
chunk_size
elements at a time), starting at the end of the slice. -
An iterator over subslices separated by elements that match a predicate function, starting from the end of the slice.
-
An iterator over the subslices of the vector which are separated by elements that match pred, starting from the end of the slice.
-
An iterator over subslices separated by elements that match a predicate function, limited to a given number of splits, starting from the end of the slice.
-
An iterator over subslices separated by elements that match a predicate function, limited to a given number of splits, starting from the end of the slice.
-
A dynamically-sized const view into a contiguous sequence of objects of type
const T
. -
An iterator over a contiguous array of objects with const access to them.
-
An iterator over a contiguous array of objects with mutable access to them.
-
A dynamically-sized mutable view into a contiguous sequence of objects of type
T
. -
An iterator over subslices separated by elements that match a predicate function.
-
An iterator over subslices separated by elements that match a predicate function. Unlike
Split
, it contains the matched part as a terminator of the subslice. -
An iterator over subslices separated by elements that match a predicate function. Unlike
Split
, it contains the matched part as a terminator of the subslice. -
An iterator over subslices separated by elements that match a predicate function.
-
An iterator over subslices separated by elements that match a predicate function, limited to a given number of splits.
-
An iterator over mutable subslices separated by elements that match a predicate function, limited to a given number of splits.
-
A resizeable contiguous buffer of type
T
. -
An iterator that consumes a
Vec
and returns the items from it. -
An iterator over overlapping subslices of length
size
. -
An iterator over overlapping subslices of length
size
.
Operators
-
Compares two Arrays.
-
Compares an Array and a Slice.
-
Compares an Array and a SliceMut.
Concepts
-
Types that support being flattened and concatenated together into a collection.
-
Types that support being flattened and concatenated together into a collection, with a separator between each item. This is similar to
Concat
but with a separator.
Function Aliases
-
Implicit for-ranged loop iteration for all collections via the
iter
method. -
Implicit for-ranged loop iteration for all collections via the
iter
method.