The problem is… in C++, this style is just a convention, _ is still a regular variable.
So if we want to ignore two value with different type, it does not work since the type mismatch.
1 2 3 4 5
voidfoo(){ auto [_1, done1] = someJob<int>(); auto [_2, done2] = someJob<std::string>(); // we need to separate _2 from _1 }
That’s frustrating, especially for people with experience of pattern-matching expression in
other languages.
So in C++26 (proposed by P2169), now we can new way to interpret the semantic of _.
The rule is simple.
If there is only one declaration of _ in some scope, everything is same as before.
A we can reference it later if we wan’t, although it’s probably a bad smell to use _ in this case.
If there are more declarations of _, they all refer to different objects respectively.
In this case, they can only be assigned to. Try to use them is a compiling error.
And we can finally write something that looks more natural.
1 2 3 4
voidfoo(){ auto [_, done1] = someJob<int>(); auto [_, done2] = someJob<std::string>(); }
Golang has this feature from the beginning, is called blank identifier.
For Python, although being a dynamic-type language, there is no problem to do use _ for different
type value. _ is defined as a wildcard when pattern-matching is introduced to Python (PEP 634).
But it’s intended to be a storage accessed through network, so any operation on it is
inherently going to fail at some time.
Also, We are 2024 now, loving FP, preferring expression over statement, monadic operation
being so cool.
Thus we decide to wrap all return type in std::optional to indicate these actions may fail.
...\include\optional(100,26): error C2182: '_Value': this use of 'void' is not valid
Well. template stuff.
What Happened?
The problem is that void is an incomplete type in C/C++, and always to be treat specially
when we are trying to use them.
By incomplete in C/C++, we mean a type that the size of which is not (yet) known.
For example, if we forward declare a struct type, and later define it’s member. The struct type
is incomplete before the definition.
1 2 3 4 5 6 7 8 9
structItem;
Item item; // <- invalid usage, since that the size of Item is unknown yet.
structItem { int price; };
Item item; // <- valid usage here.
And void is a type that is impossible to be complete by specification.
But we can have a function that return void?
Well, we return nothing
1 2 3
voidfoo(){ }
voidfoo(){ return; } // Or explicit return, both equivalent.
BTW, C before C23 prefer putting a void in parameter list to indicate that a function takes
nothing, e.g. int bar(void), but is’s kinda broken design here.
Since that we can not evaluate bar(foo()). There is no such thing that is a void and exists.
Conceptually, std::optional<T> is just a some T with additional information of value-existence.
1 2 3 4 5
template <typename T> structMyOptional { T value; bool hasValue; };
Because that there is impossible to have a member void value,
std::optional<void> is not going to be a valid type at first place.
(Well, we can make a specialization for void, but that’s another story.)
So, How can We Fix?
The problem here is that there is no a valid value for void in C/C++.
At some level, program can be though of a bunch of expressions. An running a program
is just the evaluation of these expressions. (Also the side effects, for real products / services)
The atom of expression is value. If there is a concept that’s not possible to be express
as a value, we are kicking ourselves.
Take Python for example, if we have a function that return nothing, then the function actually
returnsNone when it exits.
1 2 3 4
deffoo(): pass
assert(foo() isNone) # check pass here
1 2 3 4 5 6 7
deffoo(arg: None) -> None: pass
defbar(arg: None) -> None: pass
bar(foo(None)) # well.. if we really want to chain them together
So nothing itself is a thing, in Python, we call it None.
Every expression now is evaluated to some value that exists, the the type system build up that
is complete at any case.
The concept of nothing itself here is call unit type in type theory.
It’s a type that has one and only one value.
In Python, the value is None, in JavaScript it’s null, in Golang … maybe struct{}{} is a
good choice, although not standardized by the language.
Unit Type in C++
Now is the time for C++. As we already see, void is not a good choice for unit type because we
can not have a value for it. Are there other choices here?
Just define a empty struct and use it probably not a good choice, since that
now our custom unit type is not compatible with unit type from other code in the product code base.
How about nullptr, that’s the only one value for std::nullptr_t.
(So the type is std::optional<std::nullptr_t>).
It’s a feasible choice, but looks weird since that pointer implies indirect access semantic,
but it’s not the case when using with std::optional<T> here.
How about using std::nullopt_t? It’s also a unit type but it’s now more confusing.
What’s does it mean by std::optional<std::nullopt_t>? A optional with empty option?
There is a static assert in std::optional<T> template that forbid this usage directly,
probably because it’s too confusing.
Maybe std::tuple<>? A tuple with zero element, so it have only one value, the empty tuple.
That seems to be a good choice because the canonical unit type in Haskell is () the empty tuple.
So it looks natural for people came from Haskell.
But personally I don’t like this either since that now the type has nested angle bracket
as std::optional<std::tuple<>>.
There is a type called std::monostate, arrived at the same time as std::optional in C++17.
This candidate do not have additional implication by it’s type or it’s name.
It’s monostate! Just a little wordy.
std::monostate is originally designed to solve the problem for a std::variant<...> to be
default initialized with any value. But it’s name and it’s characteristic are all fit our
requirement here. Thus a good choice for wrapping a function that may fail but
return nothing.
For a long long long time, Golang have no standard way to represent a iterable sequence.
C++ has range adaptor and iterator (although not strictly typed, only by concept), Python has
iterable/iterator by __iter__/__next__, JavaScript has standardized for-of and
Symbol.iterator since ES6.
Now it’s time for Golang. Starting from Golang 1.23 Aug., we have iterator functions.
funcmain() { for idx := range Iota() { if idx == 3 { break } fmt.Println(idx) // print 0 1 2 } }
According to Go 1.23 Release Notes
Now the range keyword accept three kinds of functions, for which takes another yield function
that yield zero/one/two values.
The loop control variable and the body of the for-loop is translated into the yield function by
language definition. So we can still write imperative-style loop structure even though we are
actually doing some functional-style function composition here.
Why Do We Need This?
Standardize the iterable/iterator interface is a important pre-condition for lazy evaluation.
For example, how should we do when we need to iterates through all non-negative integer,
and doing some map/filter/reduce on them? It waste space to allocate a list for all these
integers (if possible).
Someone may say “we already have channel types”.
Well, but that requires a separate coroutine instance. We probably don’t want such heavy
cost every time we are doing some iterate operations.
Also a separate coroutine means additional synchronization and lifecycle control.
For example, how can we terminate the Count coroutine when we need early break in loop?
funcmain() { for idx := range Count(0) { if idx == 10 { break } fmt.Println("Loop: ", idx) } }
We need some mechanism like context object or another channel right?
That’s a burden for such easy task here.
On the other hand, iterator functions are just ordinary function that accept another function
to yield/output the iterated values, so it’s much lightweight than a separate coroutine.
We want fast program, right? :D
The Stop Design
For languages like Python and JavaScript, the iterator function (or generator in Python terms)
is paused and the control is transfer back to the function that iterates the values.
When break/return happens and no more value are required, the iterator function just
got collected by the runtime since that there are no more references to the function object.
But how do we early break the iteration process, if the control is transfer
into the iterator function? Let’s look at the function signature again.
(Take one value iterator function for example).
1
func(yield func(idx int)bool)
The yield function returns a bool to indicate that whether the loop body does reach the
end, or encounter a break statement. So in normal case, we continue to next possible value
after yield return, but if we got false from yield, our iterator function
can return immediately.
Ecosystem around Iterator
The beauty of iterator only appears if the ecosystem, or we say, the common operations around
iterator are already implemented in standard library. That means:
Conversion from and to standard container types, like slicemap and chan
Operations and compositions of iterators, e.g. map/filter/reduce/chain/take …
In Python, there are generator expressions, which evolves implicit map/filter.
reduce is a function at global scope, also there are many useful functions in itertools package,
e.g. pairwise, batched, chain. Most builtin container types takes iterable as first argument
in it’s constructor.
In Golang, the first part is mostly done along the release of Golang 1.23.
For example, to convert slice from and to iterator, we can use slices.Collect and slices.Values.
For second part, there is a plan to add x/exp/xiter package under golang.org namespace.
There should be at least Concat, Map, Filter, Reduce, Zip … once it’s released.
But unfortunately it’s not compete yet.
Any/All short-circuit lazy evaluation of a predicate on a sequence of values
Forward/Backward absorb input and iterate in reversed order.
For example, if we want to define a iterator function for a binary tree in recursive manner,
we can use Empty and Pack together with Chain to implement this easily.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
type Node struct { val int left *Node right *Node }
funcTraverse(node *Node) iter.Seq[int] { // Empty is useful as base case during recursive generator chaining. if node == nil { return Empty[int]() } // Pack is useful to promote a single value into a iterable for chaining. return Chain( Traverse(node.left), Pack(node.val), Traverse(node.right), ) }
assert5 == 2 | Adder(3) # Is 5 equals to 2 + 3 ? Yes!!
This works because the | operator of integer 2 check the type of Adder(3) and found that
is not something it recognized, so it returns NotImplemented and our reverse magic method goes.
In C++, the | operator is overloaded(?) on range adaptors to accept ranges.
So maybe we can make something similar, having some object implements __ror__ that accept
an iterable and return another value (probably a iterator).
Pipe-able Higher Order Function
So back to our motivation, Python already have something like filtermapreduce,
and also the powerful generator expression to filter and/or map without explicit function call.
1
values = filter(lambda v: v % 2 == 0, range(10))
1
values = (v for v inrange(10) if v % 2 == 0)
But it’s just hard to chain multiple operations together while preserving readability.
So let’s make a filter object that support | operator
It just take some time for we to write the class representation for filter, map, reduce,
take, any … and any higher function you may think useful.
Wait, it looks so tedious. Python should be a powerful language, isn’t it?
Piper and Decorators
The function capturing and __ror__ implementation can be so annoying for all high order function.
If we can make sure __ror__ only take left operand, and return the return value of the captured
function, than we can extract a common Piper class. We just need another function to produce a
function that already capture the required logic.
Now it looks a little nicer … but we still need to implement all wrapper functions for all
kinds of operations?
Again, the only difference between these wrapped functions is the logic inside apply function,
so we can extract this part again, with a decorator!! :D
The on decorator accept some function func, and return a function that first take the
tail arguments of func and return a function that accept head argument of func through
pipe operator.
So now we can express our thoughts in our codebase using pipeline style code,
just with one helper class and one helper decorator! :D
1 2 3 4 5 6 7 8
values = range(10) result = ( values | filter(lambda val: val % 2 == 0) | map(str) | on(lambda chunks: "".join(chunks))() # create pipe-able object on the fly ) assert result == "02468"
or
1 2
for val inrange(10) | filter(lambda val: val % 2 == 0): print(val)
defon(func: Callable[Concatenate[_T, _P], _R]) -> Callable[_P, Piper[_T, _R]]: """ "on" decorates a func into pipe-style function. The result function first takes the arguments, excluding first, and returns an object that takes the first argument through "|" operator. """
Thanks to the symbol-lazy-loading ability in Unix environment,
we can do many interesting thing on functions from some shared library
when executing some executables.
All we need to do are
Implement a shared library that contains the functions we want to hijack.
Run the executable with our magic library inserted.
Make a Shared Library
If we want to replace some function with stub / fake implementation.
we can just implement a function with the same name and the same signature.
For example, if we want to fixed the clock during unit test …
1 2 3 4 5 6 7 8 9 10 11
// in hijack.c
#include<time.h>
// a "time" function which always return the timestamp of the epoch. time_ttime(time_t* arg) { if (arg) *arg = 0;
return0; }
If we want do observation about some function call, but still delegate the
call to the original function, we can implement a function that load
corresponding function at runtime and pass the function call.
For example, if we want to monitor the call sequence of file open action.
classContainer: definvoke(t: type[T]) -> T: # search and build a instance of type t
c = Container() instance: SomeType = c.invoke(SomeType) # ok, we can infer instance is SomeType
這邊 type[T] (or typing.Type[T], the old way) 用來表示我們正在用 t 來傳遞傳遞某個 type T,
而非 type 為 T 的某個 instance。
From typing document:
A variable annotated with C may accept a value of type C. In contrast, a variable annotated with
type[C] (or typing.Type[C]) may accept values that are classes themselves
The Problem
OK,我們有 type[T] 可以用。 DI library 開發者可以用型別為 type[T] 的 t 來做 indexing,
library 使用者可以享受到 static type checker 帶來的 type safety。
於是我們拿這個 library 來用在真實情境.. 沒想到一下子就碰上問題了。
當我們定義 interface type,並透過 DI container 對該 interface 拿取對應的 implementation instance 時。
因為 interface 通常是個 abstract class (or protocol),mypy type checker 會報錯
(mypy: type-abstract)。
Variables and parameters annotated with Type[Proto] accept only concrete (non-protocol) subtypes of Proto.
The main reason for this is to allow instantiation of parameters with such type. For example:
1 2
deffun(cls: Type[Proto]) -> int: return cls().meth() # OK
由於 PEP 544 自從 2017 年就已經完成,mypy 預設執行這個檢查也行之有年,
現在再來改這個行為或許已經來不及了。
於是為了解決這個問題,2020 有人在開了新 issue 9773 · python/mypy
想要定義新的 annotation TypeForm[T]/TypeExpr[T] 來達成要表達任意 type 的 type 的需求。
到目前 (2024-06),對應的 PEP 747 draft 也已經被提出了。
若一切順利,以後我們就會用 TypeExpr[T] 來表達這類 generic function
1 2 3 4 5 6 7 8
classContainer: definvoke(t: TypeExpr[T]) -> T: # search and build a instance of type t
classSomeType(Protocol): ...
c = Container() instance = c.invoke(SomeType) # ok, we find a object for type SomeType for you! operator = c.invoke(Callable[[int], bool]) # you need a (int -> bool)? no problem!
至於目前嘛.. library user 在使用到這類 library 的檔案加入下面這行即可。
我想要修改的範圍和造成的影響應該都還可以接受。