Easier Iteration with Generator

In Python, we can make a container type iterable by implement __iter__ method.

For example, if we want to provide all value in a range [lo, hi)

1
2
3
4
5
6
7
class TwoTillSix:

def __iter__(self):
for val in range(2, 6):
yield val

list(TwoTillSix()) # [2, 3, 4, 5]

With the help of this generator function, we don’t need a separate iterator type. Just keep the state during iteration directly in some local variable.

In this post, let’s see how we can do this in C++, with the help of std::generator<T>.

A shorter version of this post can be found in Gist and Compiler Explorer.

Coroutine and Concurrent Programming

What happens in the example above is that Python interpreter binds the execution state of that function to a generator object. Every time we want a value from that generator, the execution resumes until the function yields the next value (or returns).

This is doable because Python provide a builtin support for coroutine, an interruptible/resumable function, so the control flow can interleave between callee and caller functions. And that’s the primitive building block of concurrent programming.

1
2
3
4
5
6
7
values = TwoTillSix()

for val in values:
print(val)
if val == 3:
break
# Print 2 and 3. The body of `TwoTillSix.__iter__` is only executed twice.

As we can see here, the generator makes it possible that some values are evaluated lazily, and we can even stop the evaluation any time when we lost the interest. All of this is possible because the support of coroutine.

C++ Coroutine and std::generator

Fortunately, C++ has language level support for coroutine since C++20. When compiler see a function using new keywords co_await / co_yield and co_return, it will generate corresponding code to provide the ability of partial execution, and the storage of state for that function when it’s interrupted.

Also, starting from C++23, std::generator is included in standard library. As the coroutine (promise-)type that capture the common usage of… yes, generator!! (Actually I’m surprised that there is no any high-level coroutine type in standard library when coroutine is accepted as a language feature when C++20 is finalized.)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
std::generator<int> fibonacci() {
int a = 1, b = 0;
while (true) {
co_yield a;
b = std::exchange(a, a + b);
}
}

int main() {
for (int num : fibonacci()) {
if (num > 10)
break;
std::println("{}", num); // prints 1 1 2 3 5 8
}
return 0;
}

When fibonacci is called here, an interruptible function is created, and wrapped in the std::generator return value. Every time we take next value out in the for-loop, the function resumes and yield next value (and is interrupted again).

So now we have the ability to interleave control flow, just like the example in Python earlier!! But… that’s a free function, usually what we encounter is a container class. A class must has begin/end method, and corresponding increment/dereference operator on the returned iterator type, in order to be iterable. (It’s modeled as named requirements Container in standard.) If a class need another method call to be iterated on, it just looks… not like C++.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
struct Factorial { // a virtual container for factorial series
std::generator<int> generate() {
int product = 1;
int val = 1;
while (true) {
co_yield product;
product *= val++;
}
}
};

int main() {
Factorial fac {};
for (int num : fac.generate()) { // well.. can we iterate on fac directly?
if (num > 10)
break;
std::println("{}", num);
}
}

So… can we keep the implementation simple (using std::generator coroutine type), while keeping the container being iterated in usual way?

CRTP: Curiously Recurring Template Pattern

Now our goal become clear:

How can we make a class be used as before, if the class only provides a method that return a std::generator?”

That’s the place where we adopt CRTP to solve problem.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
/// @brief HasGenerator identifies the customization point of CRTP base AutoContainer.
/// @tparam C the child class type.
/// @tparam T the value type.
template <typename C, typename T>
concept HasGenerator = requires(C c) {
{ c.generate() } -> std::same_as<std::generator<T>>;
};

/// @brief A CRTP base type that allow easy iteration implementation.
/// @tparam T the value type of element being iterated. Not child class type.
/// @details Child class must provide std::generator<T> generate() member function.
template <std::semiregular T>
struct AutoContainer {

class Iter {
std::generator<T> gen_;
std::ranges::iterator_t<std::generator<T>> iter_;

public:
Iter(std::generator<T>&& gen)
: gen_ { std::move(gen) } // ensure generator being alive
, iter_ { gen_.begin() } { }

bool operator==(const std::default_sentinel_t&) const { return iter_ == gen_.end(); }
Iter& operator++() { return ++iter_, *this; }
T operator*() const { return *iter_; }
};

// deducing-this so that we can do CRTP.
// Also use concept to give better error message when doing mistake.
Iter begin(this HasGenerator<T> auto&& self) {
return Iter(self.generate());
}

// deducing-this not required, just be symmetric with begin.
std::default_sentinel_t end(this auto&& self) {
return std::default_sentinel;
};
};

AutoContainer can inherited by any class that want the all the implementation for iteration. The base class here provides an iterator type to capture the generator, and forwards the increment/dereference operation to it. With this iterator type, the remain effort is on the begin method, just construct the iterator from self.generate(). Here self is typed by deducing-this so generate() can be provided later by child class. And that’s the place where the base class connects to it’s child.

Also at the begin method, we add concept constrain HasGenerator. This way we can avoid error waterfall when using templated library. And the interesting thing here is: “The thing to be templated so that we achieve CRTP, is the templated begin method, not the AutoContainer class itself. If there are two child class both inherit from AuthContainer<int>, only one AuthContainer type is instantiated, but two different begin are instantiated.

(See also: Curiously Recurring Template Pattern - cppreference.com)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
int main() {

/// @brief Factorial sequence generator.
struct Factorial : public AutoContainer<int> {

// Implement `generate` so this class can be iterated like a container.
std::generator<int> generate() const {
// The mutable state is the local variable of this coroutine,
// so the method receiver can be const.
int product = 1;
int val = 1;
while (true) {
co_yield product;
product *= val++;
}
}
};

const Factorial fac {};
for (int value : fac) { // noticed that we don't call generate() here
if (value > 10)
break;
std::println("{}", value); // prints 1, 1, 2, 6
}
}

Now we can use the AutoContainer by inherit it in any child class. In the example above, Factorial inherit from AutoContainer<int> (since it generates integers). All we need to do is implement the generate method. Being a coroutine, the logic is clean.

Now everything works and looks nice!! :D

To make this class recognized as a std::ranges::input_range, that is, to make this class works with <ranges> library, the base AutoContainer actually need more stuffs, like value_type/difference_type on it’s iterator, support of post-increment operator, and so on… But that’s probably too much in a tutorial, so let’s stop here.

More Standard Library Support, Maybe?

We describe the minimal concept about coroutine, one important application of coroutine as Generator (std::generator), and how to use it in a beautiful way. But there is another important application of coroutine: Promise (with Event Loop).

JS developers already get used to the Promise type (or maybe the async/await keyword). Python also support this kind of usage since Python 3.5. But unfortunately, there is still no such thing in C++ standard library, even if the coroutine is standardized in C++20.

The good news is that there is already a proposal P2300 and a reference implementation NVIDIA/stdexec. (They don’t use the term Promise, probably because promise_type already have other meaning in the context of C++ coroutine)

Hope we can see more usage about these in the future.

The True Placeholder Symbol in C++

In many programming language, the common way to indicate that the symbol is not important, is to use _ for the symbol.

It was just a convention in C++, but it will become a language feature start from C++26.

Well… what’s the difference?


We use _ when there is some declaration but we do not care the name / have no good name for the variable.

For example, a common trick to preserve the life time of RAII lock is

1
2
3
4
5
void doJob() {
static std::mutex mutex;
std::lock_guard _(mutex); // give it a name so it won't unlock immediately
// some jobs ...
}

Or in structure-binding statement.

1
2
3
4
5
6
7
8
template <std::regular T>
std::tuple<T, bool> someJob() {
return { {}, true };
}

void foo() {
auto [_, done] = someJob<int>();
}

The problem is… in C++, this style is just a convention, _ is still a regular variable. So if we want to ignore two value with different type, it does not work since the type mismatch.

1
2
3
4
5
void foo() {
auto [_1, done1] = someJob<int>();
auto [_2, done2] = someJob<std::string>();
// we need to separate _2 from _1
}

That’s frustrating, especially for people with experience of pattern-matching expression in other languages.

So in C++26 (proposed by P2169), now we can new way to interpret the semantic of _.

The rule is simple.

If there is only one declaration of _ in some scope, everything is same as before.

A we can reference it later if we wan’t, although it’s probably a bad smell to use _ in this case.

If there are more declarations of _, they all refer to different objects respectively.

In this case, they can only be assigned to. Try to use them is a compiling error.

And we can finally write something that looks more natural.

1
2
3
4
void foo() {
auto [_, done1] = someJob<int>();
auto [_, done2] = someJob<std::string>();
}

Golang has this feature from the beginning, is called blank identifier. For Python, although being a dynamic-type language, there is no problem to do use _ for different type value. _ is defined as a wildcard when pattern-matching is introduced to Python (PEP 634).

It’s happy to see this came to C++ now. :D

There is No Unit Type in C++

There is NO unit type in C++. (Not in core language spec, at least.)

A Story

Assuming we are define a abstract interface for storage to get name / set name for some user.

1
2
3
4
5
6
7
class Storage {
public:
virtual ~Storage() = default;

virtual std::string GetName(int id) const = 0;
virtual void SetName(int id, std::string name) const = 0;
};

Simple and straightforward.

But it’s intended to be a storage accessed through network, so any operation on it is inherently going to fail at some time. Also, We are 2024 now, loving FP, preferring expression over statement, monadic operation being so cool. Thus we decide to wrap all return type in std::optional to indicate these actions may fail.

1
2
3
4
5
6
7
class Storage {
public:
virtual ~Storage() = default;

virtual std::optional<std::string> GetName(int id) const = 0;
virtual std::optional<void> SetName(int id, std::string name) const = 0;
};

Looks good! But now it fails to be compiled.

1
...\include\optional(100,26): error C2182: '_Value': this use of 'void' is not valid

Well. template stuff.

What Happened?

The problem is that void is an incomplete type in C/C++, and always to be treat specially when we are trying to use them.

By incomplete in C/C++, we mean a type that the size of which is not (yet) known.

For example, if we forward declare a struct type, and later define it’s member. The struct type is incomplete before the definition.

1
2
3
4
5
6
7
8
9
struct Item;

Item item; // <- invalid usage, since that the size of Item is unknown yet.

struct Item {
int price;
};

Item item; // <- valid usage here.

And void is a type that is impossible to be complete by specification.

But we can have a function that return void? Well, we return nothing

1
2
3
void foo() { }

void foo() { return; } // Or explicit return, both equivalent.

BTW, C before C23 prefer putting a void in parameter list to indicate that a function takes nothing, e.g. int bar(void), but is’s kinda broken design here.

Since that we can not evaluate bar(foo()). There is no such thing that is a void and exists.

1
2
3
4
5
6
7
void foo() { }

int bar(void) { return 0; }

int main() {
return bar(foo()); // <- invalid expression here.
}

So back to our problem here std::optional<void>

Conceptually, std::optional<T> is just a some T with additional information of value-existence.

1
2
3
4
5
template <typename T>
struct MyOptional {
T value;
bool hasValue;
};

Because that there is impossible to have a member void value, std::optional<void> is not going to be a valid type at first place.

(Well, we can make a specialization for void, but that’s another story.)

So, How can We Fix?

The problem here is that there is no a valid value for void in C/C++. At some level, program can be though of a bunch of expressions. An running a program is just the evaluation of these expressions. (Also the side effects, for real products / services)

The atom of expression is value. If there is a concept that’s not possible to be express as a value, we are kicking ourselves.

Take Python for example, if we have a function that return nothing, then the function actually returns None when it exits.

1
2
3
4
def foo():
pass

assert(foo() is None) # check pass here
1
2
3
4
5
6
7
def foo(arg: None) -> None:
pass

def bar(arg: None) -> None:
pass

bar(foo(None)) # well.. if we really want to chain them together

So nothing itself is a thing, in Python, we call it None. Every expression now is evaluated to some value that exists, the the type system build up that is complete at any case.

The concept of nothing itself here is call unit type in type theory. It’s a type that has one and only one value. In Python, the value is None, in JavaScript it’s null, in Golang … maybe struct{}{} is a good choice, although not standardized by the language.

Unit Type in C++

Now is the time for C++. As we already see, void is not a good choice for unit type because we can not have a value for it. Are there other choices here?

Just define a empty struct and use it probably not a good choice, since that now our custom unit type is not compatible with unit type from other code in the product code base.

How about nullptr, that’s the only one value for std::nullptr_t. (So the type is std::optional<std::nullptr_t>). It’s a feasible choice, but looks weird since that pointer implies indirect access semantic, but it’s not the case when using with std::optional<T> here.

How about using std::nullopt_t? It’s also a unit type but it’s now more confusing. What’s does it mean by std::optional<std::nullopt_t>? A optional with empty option? There is a static assert in std::optional<T> template that forbid this usage directly, probably because it’s too confusing.

Maybe std::tuple<>? A tuple with zero element, so it have only one value, the empty tuple. That seems to be a good choice because the canonical unit type in Haskell is () the empty tuple. So it looks natural for people came from Haskell. But personally I don’t like this either since that now the type has nested angle bracket as std::optional<std::tuple<>>.

There is a type called std::monostate, arrived at the same time as std::optional in C++17. This candidate do not have additional implication by it’s type or it’s name. It’s monostate! Just a little wordy.

std::monostate is originally designed to solve the problem for a std::variant<...> to be default initialized with any value. But it’s name and it’s characteristic are all fit our requirement here. Thus a good choice for wrapping a function that may fail but return nothing.

Now the interface looks like

1
2
3
4
5
6
7
class Storage {
public:
virtual ~Storage() = default;

virtual std::optional<std::string> GetName(int id) const = 0;
virtual std::optional<std::monostate> SetName(int id, std::string name) const = 0;
};

Hmm… std::optional<std::monostate>, which takes 29 characters. C++ is not easy. Just like we use std::shared_ptr<T> all over the places.

Maybe the C++ Standards Committee should specialize std::optional<void>, just like std::expected<void> in C++23.


Wish someday void can be a REAL unit type in C/C++. :D