The True Placeholder Symbol in C++

In many programming languages, the common way to indicate that a symbol is not important is to name it _.

In C++ this was only a convention, but it becomes a language feature starting from C++26.

Well… what’s the difference?


We use _ when a declaration is required but we do not care about the name, or have no good name for the variable.

For example, a common trick to preserve the lifetime of an RAII lock is

void doJob() {
    static std::mutex mutex;
    std::lock_guard _(mutex); // give it a name so it won't unlock immediately
    // some jobs ...
}

Or in a structured binding declaration.

template <std::regular T>
std::tuple<T, bool> someJob() {
    return { {}, true };
}

void foo() {
    auto [_, done] = someJob<int>();
}

The problem is that in C++ this style is just a convention: _ is still a regular variable. So if we want to ignore two values in the same scope, it does not work: a second declaration of _ is a redefinition, and when the types differ we cannot reuse the first _ by assignment either.

void foo() {
    auto [_1, done1] = someJob<int>();
    auto [_2, done2] = someJob<std::string>();
    // we need to separate _2 from _1
}

That’s frustrating, especially for people who are used to pattern matching in other languages.

So C++26 (via proposal P2169) gives us a new way to interpret the semantics of _.

The rule is simple.

If there is only one declaration of _ in a scope, everything is the same as before.

We can still reference it later if we want, although using _ that way is probably a code smell.

If there is more than one declaration of _ in a scope, each of them refers to a different object.

In this case, they can only be declared and initialized. Trying to use _ afterwards is a compile error.

And we can finally write something that looks more natural.

void foo() {
    auto [_, done1] = someJob<int>();
    auto [_, done2] = someJob<std::string>();
}

Golang has had this feature from the beginning; it is called the blank identifier. In Python, being a dynamically typed language, there is no problem reusing _ for values of different types. _ was also defined as a wildcard when pattern matching was introduced to Python (PEP 634).

It’s nice to see this coming to C++ now. :D

There is No Unit Type in C++

There is NO unit type in C++. (Not in core language spec, at least.)

A Story

Assume we are defining an abstract storage interface to get / set the name of a user.

class Storage {
public:
    virtual ~Storage() = default;

    virtual std::string GetName(int id) const = 0;
    virtual void SetName(int id, std::string name) const = 0;
};

Simple and straightforward.

But it is meant to be a storage accessed through the network, so any operation on it is inherently going to fail at some point. Also, it is 2024 now: we love FP, prefer expressions over statements, and think monadic operations are cool. Thus we decide to wrap every return type in std::optional to indicate that these actions may fail.

class Storage {
public:
    virtual ~Storage() = default;

    virtual std::optional<std::string> GetName(int id) const = 0;
    virtual std::optional<void> SetName(int id, std::string name) const = 0;
};

Looks good! But now it fails to compile.

...\include\optional(100,26): error C2182: '_Value': this use of 'void' is not valid

Well, template stuff.

What Happened?

The problem is that void is an incomplete type in C/C++, and it always needs special treatment when we try to use it.

By incomplete, C/C++ means a type whose size is not (yet) known.

For example, if we forward declare a struct type and define its members later, the struct type is incomplete before the definition.

struct Item;

Item item; // <- invalid usage, since the size of Item is not known yet.

struct Item {
    int price;
};

Item item; // <- valid usage here.

And void is, by specification, a type that can never be completed.

But we can have a function that returns void? Well, it returns nothing.

void foo() { }

void foo() { return; } // Or explicit return, both equivalent.

BTW, C before C23 prefers putting void in the parameter list to indicate that a function takes nothing, e.g. int bar(void), but it’s a kind of broken design.

We still cannot evaluate bar(foo()), because there is no such thing as an existing value of type void.

void foo() { }

int bar(void) { return 0; }

int main() {
    return bar(foo()); // <- invalid expression here.
}

So back to our problem with std::optional<void>.

Conceptually, std::optional<T> is just some T with additional information about whether the value exists.

template <typename T>
struct MyOptional {
    T value;
    bool hasValue;
};

Because it is impossible to have a member void value, std::optional<void> is not going to be a valid type in the first place.

(Well, we can make a specialization for void, but that’s another story.)

So, How Can We Fix It?

The problem here is that there is no valid value of type void in C/C++. At some level, a program can be thought of as a bunch of expressions, and running a program is just the evaluation of these expressions. (Plus the side effects, for real products / services.)

The atom of an expression is a value. If there is a concept that cannot be expressed as a value, we are shooting ourselves in the foot.

Take Python for example: if we have a function that returns nothing, the function actually returns None when it exits.

def foo():
    pass

assert foo() is None  # check pass here

def foo(arg: None) -> None:
    pass

def bar(arg: None) -> None:
    pass

bar(foo(None))  # well.. if we really want to chain them together

So nothing itself is a thing; in Python, we call it None. Every expression now evaluates to some value that exists, and the type system built on top of that is complete in every case.

This concept of nothing is called the unit type in type theory. It’s a type that has one and only one value. In Python, the value is None; in JavaScript it’s null; in Golang … maybe struct{}{} is a good choice, although it is not standardized by the language.
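To make "one and only one value" concrete, here is a small Python sketch. The set_name function and the names dictionary are hypothetical and exist only for this illustration.

```python
NoneType = type(None)

# Constructing the type gives back the very same object: there is only one value.
assert NoneType() is None


def set_name(names: dict[int, str], user_id: int, name: str) -> None:
    """Hypothetical setter that has nothing meaningful to return."""
    names[user_id] = name


names: dict[int, str] = {}
result = set_name(names, 1, "alice")

# The call is still an expression with a value, so it can be stored and passed on.
assert result is None
assert names == {1: "alice"}
```

Because the "nothing" result is a real value, generic wrappers never need a special case for functions that return nothing, which is exactly what void fails to offer.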

Unit Type in C++

Now it is time for C++. As we have already seen, void is not a good choice for a unit type because we cannot have a value of it. Are there other choices?

Just defining an empty struct and using it is probably not a good choice, since our custom unit type would not be compatible with the unit types from other code in the product code base.

How about nullptr? It is the only value of std::nullptr_t (so the type would be std::optional<std::nullptr_t>). It’s a feasible choice, but it looks weird: a pointer implies indirect access semantics, which is not the case when it is used with std::optional<T> here.

How about std::nullopt_t? It’s also a unit type, but it is even more confusing. What does std::optional<std::nullopt_t> mean? An optional with an empty option? There is a static assert in the std::optional<T> template that forbids this usage directly, probably because it’s too confusing.

Maybe std::tuple<>? A tuple with zero elements has only one value, the empty tuple. That seems to be a good choice because the canonical unit type in Haskell is (), the empty tuple, so it looks natural to people coming from Haskell. But personally I don’t like this either, since the type now has nested angle brackets, as in std::optional<std::tuple<>>.

There is a type called std::monostate, which arrived together with std::optional in C++17. This candidate carries no additional implication in its type or its name. It’s monostate! Just a little wordy.

std::monostate was originally designed so that a std::variant<...> can be default-constructed even when its other alternatives cannot. But its name and its characteristics both fit our requirement here, so it is a good choice for wrapping a function that may fail but returns nothing.

Now the interface looks like

class Storage {
public:
    virtual ~Storage() = default;

    virtual std::optional<std::string> GetName(int id) const = 0;
    virtual std::optional<std::monostate> SetName(int id, std::string name) const = 0;
};

Hmm… std::optional<std::monostate> takes 29 characters. C++ is not easy. Just like how we use std::shared_ptr<T> all over the place.

Maybe the C++ Standards Committee should specialize std::optional<void>, just like std::expected<void, E> in C++23.


I wish that someday void could be a REAL unit type in C/C++. :D

Golang 1.23 Iterator Functions

For a long, long, long time, Golang had no standard way to represent an iterable sequence.

C++ has range adaptors and iterators (although not strictly typed, only by concept), Python has iterable/iterator via __iter__/__next__, and JavaScript has standardized for-of and Symbol.iterator since ES6.

Now it’s time for Golang. Starting from Golang 1.23 (released in August 2024), we have iterator functions.

How It Works

Sample code explains it faster.

package main

import "fmt"

func Iota() func(yield func(idx int) bool) {
	return func(yield func(idx int) bool) {
		idx := 0
		for {
			if !yield(idx) {
				return
			}
			idx += 1
		}
	}
}

func main() {
	for idx := range Iota() {
		if idx == 3 {
			break
		}
		fmt.Println(idx) // print 0 1 2
	}
}

According to the Go 1.23 Release Notes, the range keyword now accepts three kinds of functions, each of which takes a yield function that receives zero/one/two values.

func(func() bool)
func(func(K) bool)
func(func(K, V) bool)

The loop control variables and the body of the for-loop are translated into the yield function by the language definition. So we can still write an imperative-style loop even though we are actually doing functional-style function composition.
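The translation can be imitated by hand. The following Python sketch is only an analogy of what the Go compiler does, not its actual output: iota, loop_body and printed are made-up names, and the callback plays the role of the synthesized yield function.

```python
from typing import Callable


def iota(yield_: Callable[[int], bool]) -> None:
    """Push-style iterator: hands 0, 1, 2, ... to yield_ until it returns False."""
    idx = 0
    while True:
        if not yield_(idx):
            return
        idx += 1


printed: list[int] = []


def loop_body(idx: int) -> bool:
    """Plays the role of the for-loop body turned into a yield function."""
    if idx == 3:
        return False  # corresponds to "break"
    printed.append(idx)
    return True  # corresponds to reaching the end of the loop body


iota(loop_body)
print(printed)
```

The iterator keeps control of the loop the whole time; the consumer only answers "keep going" or "stop" through the boolean it returns.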

Why Do We Need This?

Standardizing the iterable/iterator interface is an important precondition for lazy evaluation. For example, what should we do when we need to iterate through all non-negative integers and apply some map/filter/reduce to them? Allocating a list for all these integers wastes space (if it is possible at all).

Someone may say “we already have channel types”. Well, but that requires a separate goroutine. We probably don’t want such a heavy cost every time we iterate over something.

A separate goroutine also means additional synchronization and lifecycle control. For example, how can we terminate the goroutine started by Count when we need to break out of the loop early?

package main

import "fmt"

func Count(start int) chan int {
	output := make(chan int)
	go func() {
		idx := start
		for {
			output <- idx
			idx += 1
		}
	}()
	return output
}

func main() {
	for idx := range Count(0) {
		if idx == 10 {
			break
		}
		fmt.Println("Loop: ", idx)
	}
}

We need some mechanism like a context object or another channel, right? That’s a burden for such an easy task.

On the other hand, iterator functions are just ordinary functions that accept another function to yield/output the iterated values, so they are much more lightweight than a separate goroutine. We want fast programs, right? :D

The Stop Design

For languages like Python and JavaScript, the iterator function (or generator in Python terms) is paused and control is transferred back to the function that iterates over the values. When break/return happens and no more values are required, the iterator function just gets collected by the runtime, since there are no more references to the function object.

But how do we break out of the iteration early if control has been transferred into the iterator function? Let’s look at the function signature again (taking the one-value iterator function as an example).

func(yield func(idx int) bool)

The yield function returns a bool to indicate whether the loop body reached its end (true) or hit a break statement (false). So in the normal case, we continue to the next value after yield returns, but if we get false from yield, our iterator function should return immediately.

Ecosystem around Iterator

The beauty of iterators only shows when the ecosystem, that is, the common operations around iterators, is already implemented in the standard library. That means:

  • Conversion from and to standard container types, like slice, map and chan
  • Operations and compositions of iterators, e.g. map/filter/reduce/chain/take

In Python, there are generator expressions, which involve implicit map/filter. reduce lives in the functools module, and there are many useful functions in the itertools package, e.g. pairwise, batched, chain. Most builtin container types take an iterable as the first argument of their constructor.
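A short sketch of those Python building blocks working together, using only the standard library. The variable names are arbitrary.

```python
from functools import reduce
from itertools import chain, count, islice

# Lazy map + filter over all non-negative integers; nothing is computed yet.
evens_squared = (n * n for n in count() if n % 2 == 0)

# islice pulls just the first four values out of the infinite stream.
first_four = list(islice(evens_squared, 4))

# reduce folds the values into a single result.
total = reduce(lambda acc, n: acc + n, first_four, 0)

# chain concatenates any iterables; list() accepts an iterable directly.
merged = list(chain([1, 2], (3, 4), range(5, 7)))

print(first_four, total, merged)
```

None of this allocates a list of "all integers": values are produced one at a time, only when a consumer asks for them.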

In Golang, the first part was mostly done with the release of Golang 1.23. For example, to convert between a slice and an iterator, we can use slices.Collect and slices.Values.

For the second part, there is a plan to add an x/exp/xiter package under the golang.org namespace. It should contain at least Concat, Map, Filter, Reduce, Zip … once it’s released. But unfortunately it’s not complete yet.

See: iter: new package for iterators · Issue #61897 · golang/go

I also created a toy package, github.com/wdhongtw/mice/flow, to provide some important building blocks around iterators:

  • Empty/Pack return an iterator of zero/one value
  • Any/All short-circuit, lazy evaluation of a predicate on a sequence of values
  • Forward/Backward absorb the input and iterate in reversed order.

For example, if we want to define an iterator function for a binary tree in a recursive manner, we can use Empty and Pack together with Chain to implement it easily.

type Node struct {
	val   int
	left  *Node
	right *Node
}

func Traverse(node *Node) iter.Seq[int] {
	// Empty is useful as base case during recursive generator chaining.
	if node == nil {
		return Empty[int]()
	}
	// Pack is useful to promote a single value into a iterable for chaining.
	return Chain(
		Traverse(node.left),
		Pack(node.val),
		Traverse(node.right),
	)
}

Looks cool, doesn’t it? :D

Pipeline Style Map-Reduce in Python

Since C++20, C++ provides a new style of data processing, with lazy evaluation achieved by chaining one iterator to another.

#include <ranges>
#include <vector>

int main() {
    std::vector<int> input = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    auto output = input | std::views::filter([](const int n) {return n % 3 == 0;})
                        | std::views::transform([](const int n) {return n * n;});
    // now output is [0, 9, 36, 81], conceptually
}

Can we use this style in Python? Yes! :D

Evaluation of Operators

In Python, every expression that involves a binary operator, e.g. a + b, is evaluated by the following rules

  • If the (forward) special method, e.g. __add__, exists on the left operand
    • It’s invoked on the left operand, e.g. a.__add__(b)
    • If the invocation returns some meaningful value other than NotImplemented, done!
  • If the (forward) special method does not exist, or the invocation returns NotImplemented, then
  • If the reverse special method, e.g. __radd__, exists on the right operand
    • It’s invoked on the right operand, e.g. b.__radd__(a)
    • If the invocation returns some meaningful value other than NotImplemented, done!
  • Otherwise, TypeError is raised

So it seems possible here… Let’s make a quick experiment

class Adder:
    def __init__(self, rhs: int) -> None:
        self._rhs = rhs

    def __ror__(self, lhs: int) -> int:
        return lhs + self._rhs


assert 5 == 2 | Adder(3)  # Is 5 equals to 2 + 3 ? Yes!!

This works because the | operator of the integer 2 checks the type of Adder(3), finds that it is not something it recognizes, and returns NotImplemented, so our reverse magic method gets its turn.
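The two steps can be observed separately by calling the special methods by hand. This is just a sketch that repeats the Adder class from the experiment so it runs on its own.

```python
class Adder:
    def __init__(self, rhs: int) -> None:
        self._rhs = rhs

    def __ror__(self, lhs: int) -> int:
        return lhs + self._rhs


# Step 1: the forward method on int does not know Adder and gives up.
forward = (2).__or__(Adder(3))
assert forward is NotImplemented

# Step 2: Python then tries the reflected method on the right operand.
assert Adder(3).__ror__(2) == 5
assert 2 | Adder(3) == 5

# If neither side can handle the operation, TypeError is raised.
try:
    2 | object()
except TypeError as error:
    print("unsupported:", error)
```

NotImplemented is a plain sentinel value returned by the method; it is the interpreter, not int, that decides to try __ror__ next.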

In C++, the | operator is overloaded(?) on range adaptors to accept ranges. So maybe we can make something similar: an object that implements __ror__, accepting an iterable and returning another value (probably an iterator).

Pipe-able Higher Order Function

So back to our motivation. Python already has things like filter, map and reduce, and also the powerful generator expression to filter and/or map without an explicit function call.

values = filter(lambda v: v % 2 == 0, range(10))

values = (v for v in range(10) if v % 2 == 0)

But it’s just hard to chain multiple operations together while preserving readability.

So let’s make a filter object that supports the | operator

from collections.abc import Callable, Iterable, Iterator


class Filter:
    def __init__(self, predicate: Callable[[int], bool]) -> None:
        self._predicate = predicate

    def __ror__(self, values: Iterable[int]) -> Iterator[int]:
        for value in values:
            if self._predicate(value):
                yield value


selected = range(10) | Filter(lambda val: val % 2 == 0)
assert [0, 2, 4, 6, 8] == list(selected)

How about map?

class Mapper:
    def __init__(self, transform: Callable[[int], int]) -> None:
        self._transform = transform

    def __ror__(self, values: Iterable[int]) -> Iterator[int]:
        for value in values:
            yield self._transform(value)


processed = range(3) | Mapper(lambda val: val * 2)
assert [0, 2, 4] == list(processed)

Works well, we are great again!!

It just takes some time for us to write the class representation for filter, map, reduce, take, any … and any other higher-order function you may find useful.

Wait, that looks so tedious. Python is supposed to be a powerful language, isn’t it?

Piper and Decorators

Capturing the function and implementing __ror__ for every higher-order function can be annoying. If we make sure that __ror__ only takes the left operand and returns the return value of the captured function, then we can extract a common Piper class. We just need another function to produce a function that already captures the required logic.

class Piper(Generic[_T, _U]):
    def __init__(self, func: Callable[[_T], _U]) -> None:
        self._func = func

    def __ror__(self, lhs: _T) -> _U:
        return self._func(lhs)


def filter_wrapped(predicate: Callable[[_T], bool]):
    def apply(items: Iterable[_T]) -> Iterator[_T]:
        for item in items:
            if predicate(item):
                yield item

    return Piper(apply)


selected = range(10) | filter_wrapped(lambda val: val % 2 == 0)
assert [0, 2, 4, 6, 8] == list(selected)

Now it looks a little nicer … but we still need to implement all wrapper functions for all kinds of operations?

Again, the only difference between these wrapped functions is the logic inside apply function, so we can extract this part again, with a decorator!! :D

def on(func: Callable[Concatenate[_T, _P], _R]) -> Callable[_P, Piper[_T, _R]]:
    def wrapped(*args: _P.args, **kwargs: _P.kwargs) -> Piper[_T, _R]:
        def apply(head: _T) -> _R:
            return func(head, *args, **kwargs)

        return Piper(apply)

    return wrapped


@on
def filter(items: Iterable[_T], predicate: Callable[[_T], bool]) -> Iterator[_T]:
    for item in items:
        if predicate(item):
            yield item


selected = range(10) | filter(lambda val: val % 2 == 0)
assert [0, 2, 4, 6, 8] == list(selected)

The on decorator accepts some function func and returns a function that first takes the tail arguments of func, and then returns an object that accepts the head argument of func through the pipe operator.

So now we can express our thoughts in our codebase using pipeline style code, just with one helper class and one helper decorator! :D

values = range(10)
result = (
    values
    | filter(lambda val: val % 2 == 0)
    | map(str)  # assumes map has been wrapped with @on, just like filter above
    | on(lambda chunks: "".join(chunks))()  # create pipe-able object on the fly
)
assert result == "02468"

or

for val in range(10) | filter(lambda val: val % 2 == 0):
    print(val)

Appendix

Complete type-safe code here

"""
pipe is a module that make it easy to write higher-order pipeline function
"""

from collections.abc import Callable, Iterable, Iterator
from typing import Generic, TypeVar, ParamSpec, Concatenate

_R = TypeVar("_R")
_T = TypeVar("_T")
_P = ParamSpec("_P")


class Piper(Generic[_T, _R]):
    """
    Piper[T, R] is a function that accept T and return R

    call the piper with "value_t | piper_t_r"
    """

    def __init__(self, func: Callable[[_T], _R]) -> None:
        self._func = func

    def __ror__(self, items: _T) -> _R:
        return self._func(items)


def on(func: Callable[Concatenate[_T, _P], _R]) -> Callable[_P, Piper[_T, _R]]:
    """
    "on" decorates a func into pipe-style function.

    The result function first takes the arguments, excluding first,
    and returns an object that takes the first argument through "|" operator.
    """

    def wrapped(*args: _P.args, **kwargs: _P.kwargs) -> Piper[_T, _R]:
        def apply(head: _T) -> _R:
            return func(head, *args, **kwargs)

        return Piper(apply)

    return wrapped

Introduction of Function Hijacking in C

Thanks to the lazy symbol loading (dynamic linking) in Unix environments, we can do many interesting things to functions from shared libraries when running an executable.

All we need to do is

  • Implement a shared library that contains the functions we want to hijack.
  • Run the executable with our magic library inserted.

Make a Shared Library

If we want to replace some function with a stub / fake implementation, we can just implement a function with the same name and the same signature.

For example, if we want to fix the clock during unit tests …

// in hijack.c

#include <time.h>

// a "time" function which always returns the timestamp of the epoch.
time_t time(time_t* arg) {
    if (arg)
        *arg = 0;

    return 0;
}

If we want to observe some function call but still delegate the call to the original function, we can implement a function that looks up the original function at runtime and forwards the call.

For example, if we want to monitor the sequence of file open calls.

// in hijack.c

#define _GNU_SOURCE // glibc exposes RTLD_NEXT only when this is defined before the includes
#include <dlfcn.h>
#include <stdio.h>
#include <time.h>

// write "open" action to standard error before opening the file
FILE* fopen(const char* restrict path, const char* restrict mode) {
    static int used_count = 0;
    used_count += 1;
    fprintf(stderr, "open file [%d]: \"%s\"\n", used_count, path);

    typedef FILE* (*wrapped_type)(const char* restrict path, const char* restrict mode);
    // no dlopen, just search the function in magic handle RTLD_NEXT
    wrapped_type wrapped = dlsym(RTLD_NEXT, "fopen");
    return wrapped(path, mode);
}

After finishing our implementation, compile it as a shared library, called hijack.so here.

cc -fPIC -shared -o hijack.so hijack.c

Hijack during Actual Execution

We can use the LD_PRELOAD environment variable to insert our special shared library into any executable at run time.

LD_PRELOAD="path-to-shared-lib" executable

For example, suppose we want to use the implementations from the last section in our executable, called app here.

// app.c

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main() {
    FILE* handle = fopen("output.txt", "a");
    assert(handle);
    fclose(handle);

    time_t current = time(NULL);
    printf("now: %s", asctime(gmtime(&current)));

    return EXIT_SUCCESS;
}

(Compile and) run the executable

cc -o app app.c
LD_PRELOAD=./hijack.so ./app

Output

open file [1]: "output.txt"
now: Thu Jan  1 00:00:00 1970

The open-file action is traced, and the time is fixed to the epoch.

If we need to override functions with more than one shared library, just use : to separate them in LD_PRELOAD.
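When the hijacked program is launched from a test script rather than a shell, the same variable can be assembled in code. Below is a minimal Python sketch under that assumption. preload_env and run_hijacked are hypothetical helpers, other.so is a made-up library name, and any LD_PRELOAD already present in the environment is replaced rather than extended.

```python
import os
import subprocess
from typing import Optional


def preload_env(*libraries: str, base: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment with LD_PRELOAD set to the given libraries."""
    env = dict(os.environ if base is None else base)
    env["LD_PRELOAD"] = ":".join(libraries)  # ":" separates multiple libraries
    return env


def run_hijacked(command: list[str], *libraries: str) -> subprocess.CompletedProcess:
    """Run command with the libraries preloaded (only meaningful where LD_PRELOAD is honored)."""
    return subprocess.run(command, env=preload_env(*libraries), check=True)


# Example (assumes ./app and both shared libraries exist):
# run_hijacked(["./app"], "./hijack.so", "./other.so")
```

Building the environment per call keeps the hijack scoped to the child process, so the test runner itself is not affected.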

Conclusion

It’s a powerful feature: it allows us to observe calls, to replace functions with mock/fake implementations, or sometimes even to apply an emergency patch.

What makes all of this possible is the dynamic linking mechanism and the one-to-one mapping of symbols from sources to libraries/binaries.

Development in C can be inconvenient, but it’s still an interesting experience to see this kind of usage. :D

Python's type Is Not a Type?

Intro

Recently, for various reasons, I wanted to write my own DI library for Python. It is done: excluding tests it is basically under 50 lines, and it is released as luckydep · PyPI. But I ran into some problems while writing it.

Unlike Golang, in Python the DI container interface that retrieves a specific instance (invoke/bind/get and so on, called invoke below) needs the type of the wanted instance to be passed explicitly.

func Invoke[T any](c *Container) T {
	var instance T // can declare a T even if T is an interface
	// ... search and build the instance for T ...
	return instance
}

// although we need to specify the type parameter, the type parameter
// is not passed during runtime
var instance = Invoke[SomeType](c)

class Container:
    def invoke(self, t):
        ...  # search and build an instance of type t

c = Container()
instance = c.invoke(SomeType)  # need to pass type as a parameter

The fundamental difference is that in a statically typed language like Golang, a generic function is really instantiated for each type it is used with, so the byte code/machine code of those functions naturally knows which type it is handling. In a language like Python, generic functions exist only for the static type checker; at runtime there is still just one function, so the type has to be passed to the invoke interface.

Since Python 3.5 we have type hints, so we can annotate functions/methods to help IDEs and type checkers infer the correct types.

class Container:
    def invoke(self, t: type[T]) -> T:
        ...  # search and build an instance of type t

c = Container()
instance: SomeType = c.invoke(SomeType)  # ok, we can infer instance is SomeType

Here type[T] (or typing.Type[T], the old way) says that t carries some type T itself, rather than an instance whose type is T.

From typing document:

A variable annotated with C may accept a value of type C. In contrast, a variable annotated with type[C] (or typing.Type[C]) may accept values that are classes themselves

The Problem

OK, we have type[T] to work with. The DI library developer can use t, whose type is type[T], for indexing, and library users enjoy the type safety a static type checker brings.

So we took the library into a real-world scenario.. and hit a problem right away. When we define an interface type and ask the DI container for the implementation instance of that interface, the mypy type checker reports an error (mypy: type-abstract), because an interface is usually an abstract class (or protocol).

class SomeInterface(Protocol):
    def hello(self): ...

class SomeImplementation:
    def hello(self): ...

c.invoke(SomeInterface)  # trigger mypy error [type-abstract]

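No way… isn't this the most important reason we need DI? We define an interface and provide the implementation separately, to isolate the responsibilities of different classes. Yet when a user wants to use the library, they get stuck on type checking…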
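To make the situation reproducible, here is a minimal sketch of such a container. It is not the luckydep API: Container, provide, Greeter and EnglishGreeter are hypothetical names for illustration. The code runs fine at runtime; the calls that pass the protocol class are the ones mypy flags with [type-abstract].

```python
from typing import Any, Callable, Protocol, TypeVar

T = TypeVar("T")


class Container:
    """Toy DI container that indexes factories by the requested type."""

    def __init__(self) -> None:
        self._factories: dict[Any, Callable[[], Any]] = {}

    def provide(self, t: type[T], factory: Callable[[], T]) -> None:
        self._factories[t] = factory

    def invoke(self, t: type[T]) -> T:
        return self._factories[t]()


class Greeter(Protocol):
    def hello(self) -> str: ...


class EnglishGreeter:
    def hello(self) -> str:
        return "hello"


c = Container()
c.provide(Greeter, EnglishGreeter)  # a type checker may reject the protocol here
greeter = c.invoke(Greeter)         # and here
print(greeter.hello())
```

Note that the container never calls Greeter() itself, which is the only operation the concrete-class restriction is meant to protect.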

The History

Reading the documentation, I first thought this was a design problem in mypy.

Mypy always allows instantiating (calling) type objects typed as Type[t]

But after going through mypy issue #4717 · python/mypy, I found that this is a rule already written in PEP 544.

Variables and parameters annotated with Type[Proto] accept only concrete (non-protocol) subtypes of Proto. The main reason for this is to allow instantiation of parameters with such type. For example:

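def fun(cls: Type[Proto]) -> int:
    return cls().meth() # OK

mypy allows constructing an interface type whose constructor signature is unknown, and so a parameter annotated as Type[Proto] can only be given a concrete type… Hmm?

Digging further, it turns out this check exists in the first place because Guido himself opened #1843 · python/mypy in 2016, arguing that this usage should be allowed.

So mypy added the check, and PEP 544 in 2017 then defined the rule explicitly.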

The Controversy

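This t: type[T] design is quite controversial. Judging from #4717 · python/mypy, many people think that restricting the argument to concrete classes, just to allow constructing t(), greatly limits the situations where type[T] can be used.

Others think the check makes no sense at all, because nobody can guarantee what the constructors of the concrete classes under a protocol type actually take. Even when static type checking passes, it would not be surprising for t() to blow up at runtime. Besides, hardly anyone defines an __init__ method on a protocol type, so it is unclear how t() could be checked in the first place.

Looking at the experience from other languages… In the Golang ecosystem constructors are plain functions, so an interface type naturally does not include a constructor. C++ developers have probably never heard of an abstract constructor either; only destructors get declared virtual. Back in Python itself, the two major tools, mypy and pyright, both allow the signature of __init__ to change along the inheritance chain. (see: python/typing · Discussion #1305)

As for the documentation of typing.Type, it is written vaguely; I think readers with some experience are even more likely to misread it.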

type[C] … may accept values that are classes themselves …

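Even if we give up on protocols and only use concrete classes to define interfaces, the rule of allowing only concrete classes causes another problem: how does a user pass a function type?

c.register(Callable[[int, int], int], lambda a, b: a + b) # ????

What happened to functions as first-class citizens? Why does it stop working once we need to pass the type?

While going through the issues, I found that the repo of another DI framework has run into the same problem, #143 · python-injector/injector, and suddenly felt less alone.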

The Future

Since PEP 544 was completed back in 2017, and mypy has run this check by default for years, it is probably too late to change this behavior now.

So to solve the problem, someone opened a new issue in 2020, 9773 · python/mypy, proposing a new annotation TypeForm[T]/TypeExpr[T] to express "the type of any type". As of now (2024-06), a draft of the corresponding PEP 747 has also been submitted.

If all goes well, we will use TypeExpr[T] to express this kind of generic function in the future.

class Container:
    def invoke(self, t: TypeExpr[T]) -> T:
        ...  # search and build an instance of type t

class SomeType(Protocol): ...

c = Container()
instance = c.invoke(SomeType)  # ok, we find an object for type SomeType for you!
operator = c.invoke(Callable[[int], bool])  # you need a (int -> bool)? no problem!

As for now.. library users can add the following line to the files that use this kind of library. I think both the scope of the change and its impact are still acceptable.

# mypy: disable-error-code="type-abstract"

Looking forward to the day the Python typing system is complete.


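Setting Up a SOCKS Proxy with SSH

Because of the pandemic we recently started WFH again. The company provides some VPN solutions for reaching the internal network, but some service back ends hosted on public cloud restrict the source IP, so they cannot be accessed directly from home.

This is where the SOCKS proxy server built into SSH comes in handy!

When establishing a connection, SSH can also open a SOCKS (version 4 and 5) server on the local machine. Any application can then send arbitrary TCP connections through this SOCKS server; they are forwarded to the SSH server, which connects to the target site. Since everyone surely has a machine in the office they can SSH into(?), admin back ends restricted to company IPs become reachable. :D

Usage is simple: just add a few arguments when connecting with SSH.

ssh "target-machine" -D "localhost:1080" -N
  • -D localhost:1080: the local address and port for the SOCKS server; the RFC suggests port 1080
  • -N: add this if you do not need a shell and only want the SOCKS proxy

Note: SSH supports SOCKS5 (which can proxy IPv6) but not authentication. Because the SOCKS server can be bound to localhost only, as configured above, this is not much of a problem.

Next, configure the proxy at the OS level or the application level to use it! For the problem I started with, I usually open a separate Firefox configured to use the proxy for the company's admin back ends. That way other network traffic still goes out directly without passing through the proxy. :D

To start the proxy quickly, set up a profile in Windows Terminal that runs the SSH command above.

PuTTY, the most widely used SSH client on Windows, also supports SOCKS proxying. See: How To Set up a SOCKS Proxy Using Putty & SSH - Security Musings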


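Golang 1.18 Generics Finally Arrive

Today Golang 1.18 was finally released! We can finally do generic programming in Golang!

Golang is a language that has risen quickly over the past 10 years, but in many respects it is still quite makeshift, and it took continuous discussion and contribution from the community to reach a reasonably complete state. For example..

  • context package in Golang 1.7: solves the problem of cancelling long jobs
  • errors package in Golang 1.13: covers the error-nesting needs common in other languages and provides a uniform way to test errors
  • Generic support in Golang 1.18: gives developers the chance to implement all kinds of type-agnostic algorithms

Until now, when the Golang standard library faced a type-agnostic abstraction problem, the solution was roughly one of two kinds:

  1. Declare all input / output parameters as interface{}, and push type checking to run-time together
  2. Implement the same class / function once for every commonly used data type

The most typical example of the former is probably the sync library; the latter.. everyone has probably seen that painful sort package..

But that is all in the past. Starting today, everyone can write the generic code / library / framework they want. :D

Usage

The basic syntax is simple: add [T TypeName] after the name of the function / struct you want to make generic. TypeName is expressed with the existing interface{...} syntax, so you can describe which interface the types supported by the generic function / struct must satisfy.

Take Python-style sort by key as an example. We can define a generic sort function and require each element of the incoming list to provide a Key method, used as the basis for ordering.

Sample code: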

package main

type Keyable interface {
	Key() int
}

func Sort[T Keyable](items []T) []T {
	if len(items) <= 1 {
		return items
	}

	pivot := items[0]
	less, greater := []T{}, []T{}
	for _, item := range items[1:] {
		if item.Key() < pivot.Key() {
			less = append(less, item)
		} else {
			greater = append(greater, item)
		}
	}

	// sort both partitions recursively before joining them around the pivot
	return append(append(Sort(less), pivot), Sort(greater)...)
}

type Person struct {
	Name string
	Age  int
}

func (n Person) Key() int {
	return n.Age
}

func main() {
	persons := []Person{{Name: "alice", Age: 33}, {Name: "bob", Age: 27}}

	persons = Sort(persons)
}

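In this example, Sort requires the T in the incoming []T to implement the Keyable interface (provide a Key method). To sort a bunch of Person values, we define a Key method on Person that returns the person's age. After that, we can sort []Person by age.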
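For readers coming from Python, a rough counterpart of the same constraint can be sketched with a Protocol-bound TypeVar. This is only a comparison: sort_by_key and the lowercase key method are names invented here, and the bound is enforced by a static type checker, not at runtime.

```python
from dataclasses import dataclass
from typing import Protocol, TypeVar


class Keyable(Protocol):
    def key(self) -> int: ...


T = TypeVar("T", bound=Keyable)


def sort_by_key(items: list[T]) -> list[T]:
    """Quicksort on key(); mirrors the recursive partitioning of Sort."""
    if len(items) <= 1:
        return items
    pivot, rest = items[0], items[1:]
    less = [item for item in rest if item.key() < pivot.key()]
    greater = [item for item in rest if item.key() >= pivot.key()]
    return sort_by_key(less) + [pivot] + sort_by_key(greater)


@dataclass
class Person:
    name: str
    age: int

    def key(self) -> int:
        return self.age


persons = sort_by_key([Person("alice", 33), Person("bob", 27), Person("carol", 30)])
names = [person.name for person in persons]
print(names)
```

In both languages the element type only has to provide the method; Person never declares that it implements Keyable.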

I hope to make good use of this belated feature in the future.. XD


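How to Avoid Typos in Commit Messages?

A typo in a document is fine; it can be fixed at any time. A typo in a commit message, however, lives on forever.

For an RD, there is a minimal solution.. put the line below into .vimrc.

autocmd FileType gitcommit setlocal spell

(Automatically enables spell check for buffers that hold a Git commit message)

Once this is set, words vim does not recognize are highlighted in a conspicuous color, a reminder to go back and check for another typo.

Of course, maintaining the dictionary file is also an important part of a comfortable spell-check experience. But that is another topic..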


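Google Updates Go's Community Code of Conduct

Yesterday a new post on the Go blog announced an update to the Code of Conduct.

I have always felt that every community's CoC reads about the same: respect each other, be open and transparent, be constructive, and so on. Because they are all alike, I normally do not read them closely; they are just the morals any decent person should have. So an announcement about updating the Code of Conduct made me a little curious.

Reading on, the update to the Go community's CoC simply adds one item:

Be responsible. What you say and do matters. Take responsibility …

I did not expect that even this needs to be written into a CoC…. Maybe the Go core team is really worn out from reading issues? XD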

See: Code of Conduct Updates - go.dev