Bitisle

Posted 2024-10-23 · notes

There is NO unit type in C++. (Not in core language spec, at least.)

A Story

Suppose we are defining an abstract interface for a storage that gets and sets the name of some user.

class Storage {
public:
    virtual ~Storage() = default;

    virtual std::string GetName(int id) const = 0;
    virtual void SetName(int id, std::string name) const = 0;
};

Simple and straightforward.

But it’s intended to be a storage accessed over the network, so any operation on it will inevitably fail at some point. Also, it is 2024 now: we love FP, we prefer expressions over statements, and monadic operations are so cool. Thus we decide to wrap every return type in std::optional to indicate that these actions may fail.

class Storage {
public:
    virtual ~Storage() = default;

    virtual std::optional<std::string> GetName(int id) const = 0;
    virtual std::optional<void> SetName(int id, std::string name) const = 0;
};

Looks good! But now it fails to compile.

...\include\optional(100,26): error C2182: '_Value': this use of 'void' is not valid

Well. template stuff.

What Happened?

The problem is that void is an incomplete type in C/C++, and it always has to be treated specially when we try to use it.

By incomplete, C/C++ means a type whose size is not (yet) known.

For example, if we forward declare a struct type and define its members later, the struct type is incomplete until that definition.

struct Item;

Item item; // <- invalid usage, since the size of Item is not known yet.

struct Item {
    int price;
};

Item item; // <- valid usage here.

And void is, by specification, a type that can never be completed.

But we can have a function that returns void? Well, in that case we return nothing.

void foo() { }

void foo() { return; } // Or explicit return, both equivalent.

BTW, C before C23 prefers putting void in the parameter list to indicate that a function takes nothing, e.g. int bar(void), but it’s kind of a broken design, since we cannot evaluate bar(foo()). There is no such thing as a value that is a void.

void foo() { }

int bar(void) { return 0; }

int main() {
    return bar(foo()); // <- invalid expression here.
}

So, back to our problem: std::optional<void>.

Conceptually, std::optional<T> is just a T plus an extra flag recording whether the value exists.

template <typename T>
struct MyOptional {
    T value;
    bool hasValue;
};

Because it is impossible to have a member void value, std::optional<void> is not a valid type in the first place.

(Well, we can make a specialization for void, but that’s another story.)

So, How Can We Fix It?

The problem here is that there is no valid value of type void in C/C++. At some level, a program can be thought of as a bunch of expressions, and running a program is just the evaluation of these expressions. (Plus the side effects, for real products / services.)

The atom of an expression is a value. If there is a concept that cannot be expressed as a value, we are setting ourselves up for trouble.

Take Python for example: if we have a function that returns nothing, the function actually returns None when it exits.

def foo():
    pass

assert foo() is None  # the check passes here

def foo(arg: None) -> None:
    pass

def bar(arg: None) -> None:
    pass

bar(foo(None)) # well.. if we really want to chain them together

So nothing is itself a thing; in Python, we call it None. Every expression now evaluates to some value that exists, and the type system built on top of that is complete in every case.

This concept of nothing is called the unit type in type theory. It’s a type that has one and only one value. In Python, the value is None; in JavaScript it’s null; in Golang … maybe struct{}{} is a good choice, although it is not standardized by the language.
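The "one and only one value" property can be checked directly in Python. This is a small sketch, not a formal argument; the names NoneType, first, second, and returns_nothing are made up for illustration.

```python
# The type of None; the local name NoneType is just for illustration.
NoneType = type(None)

# Calling the type does not create a fresh instance:
# it hands back the single existing value, None.
first = NoneType()
second = NoneType()


def returns_nothing() -> None:
    pass


# Every way of obtaining a value of this type yields the same object.
same_object = first is None and second is None and returns_nothing() is None
```

Because the value always exists, it can be stored, compared, and passed to another function like any other value, which is exactly what void cannot do in C++.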

Unit Type in C++

Now it's time for C++. As we have already seen, void is not a good choice for a unit type because we cannot have a value of it. Are there other choices here?

Just defining an empty struct and using it is probably not a good choice, since our custom unit type would not be compatible with the unit types from other code in the product code base.

How about nullptr? It is the only value of std::nullptr_t (so the type would be std::optional<std::nullptr_t>). It’s a feasible choice, but it looks weird, since a pointer implies indirect-access semantics, which is not the case when used with std::optional<T> here.

How about using std::nullopt_t? It’s also a unit type, but it’s even more confusing. What does std::optional<std::nullopt_t> mean? An optional with an empty option? There is a static assert in the std::optional<T> template that forbids this usage directly, probably because it’s too confusing.

Maybe std::tuple<>? A tuple with zero elements has only one value, the empty tuple. That seems to be a good choice, because the canonical unit type in Haskell is (), the empty tuple, so it looks natural to people coming from Haskell. But personally I don’t like this either, since the type now has nested angle brackets, as in std::optional<std::tuple<>>.

There is a type called std::monostate, which arrived at the same time as std::optional in C++17. This candidate carries no additional implication in its type or its name. It’s monostate! Just a little wordy.

std::monostate was originally designed to let a std::variant<...> be default-constructed even when its first alternative is not default-constructible. But its name and its characteristics both fit our requirement here, which makes it a good choice for wrapping a function that may fail but returns nothing.

Now the interface looks like

class Storage {
public:
    virtual ~Storage() = default;

    virtual std::optional<std::string> GetName(int id) const = 0;
    virtual std::optional<std::monostate> SetName(int id, std::string name) const = 0;
};

Hmm… std::optional<std::monostate>, which takes 29 characters. C++ is not easy. It's just like how we use std::shared_ptr<T> all over the place.

Maybe the C++ Standards Committee should specialize std::optional<void>, just like std::expected<void, E> in C++23.

I wish void could someday be a REAL unit type in C/C++. :D

Posted 2024-06-05 · notes

Python's type Is Not a Type?

Intro

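Recently, for various reasons, I wanted to write my own DI library for Python. It's done: excluding tests it is basically under 50 lines, and it has been released as luckydep · PyPI. But I ran into some problems while writing it.

Unlike in Golang, in Python the DI container's interface for fetching a specific instance (invoke/bind/get, etc.; hereafter invoke) must be explicitly passed the type of the instance you want.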

func Invoke[T any](c *Container) T {
    var instance T // can declare a T even if T is an interface
    // (searching the container and building the instance is omitted here)
    return instance
}

// although we need to specify the type parameter, the type parameter
// is not passed as an argument at runtime
var instance = Invoke[SomeType](c)

class Container:
    def invoke(self, t):  # search for and build an instance of type t
        ...

c = Container()
instance = c.invoke(SomeType)  # need to pass the type as a parameter

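The fundamental difference is that in a statically typed language like Golang, a generic function really is turned into a corresponding function for each type, and the byte code/machine code of those functions naturally knows which type it is handling. In a language like Python, generic functions are built on the static type checker; at runtime there is still only one function, so naturally the type has to be passed to the invoke interface.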

Since Python 3.5 we have had type hints, so we can annotate functions/methods to help the IDE/type checker infer the correct types.

class Container:
    def invoke(self, t: type[T]) -> T:  # search for and build an instance of type t
        ...

c = Container()
instance: SomeType = c.invoke(SomeType)  # ok, we can infer that instance is SomeType

Here type[T] (or typing.Type[T], the old way) expresses that we are using t to pass some type T itself, rather than an instance whose type is T.

From the typing documentation:

A variable annotated with C may accept a value of type C. In contrast, a variable annotated with type[C] (or typing.Type[C]) may accept values that are classes themselves
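To make the idea concrete, here is a minimal runnable sketch of a container indexed by type[T]. It is only an illustration under my own assumptions: the provide method, the _providers dictionary, and the Greeter class are hypothetical names and not the luckydep API.

```python
from typing import Any, Callable, TypeVar

T = TypeVar("T")


class Container:
    """Hypothetical minimal DI container (not the luckydep API)."""

    def __init__(self) -> None:
        # Maps a class object to a zero-argument factory for its instances.
        self._providers: dict[Any, Callable[[], Any]] = {}

    def provide(self, t: type[T], factory: Callable[[], T]) -> None:
        # The class object itself is used as the dictionary key.
        self._providers[t] = factory

    def invoke(self, t: type[T]) -> T:
        # A type checker infers the return type from the class passed in.
        return self._providers[t]()


class Greeter:
    def hello(self) -> str:
        return "hello"


c = Container()
c.provide(Greeter, Greeter)
greeter = c.invoke(Greeter)  # inferred as Greeter by a type checker
```

At runtime t is just an ordinary object used as a dictionary key; the type[T] annotation only matters to the static type checker.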

The Problem

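OK, we have type[T] to work with. The DI library developer can use t, of type type[T], for indexing, and the library user gets the type safety that a static type checker brings.

So we took this library into a real-world scenario.. and ran into a problem right away. When we define an interface type and ask the DI container for the implementation instance corresponding to that interface, the mypy type checker reports an error (mypy: type-abstract), because an interface is usually an abstract class (or protocol).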

from typing import Protocol

class SomeInterface(Protocol):
    def hello(self): ...

class SomeImplementation:
    def hello(self): ...

c.invoke(SomeInterface)  # triggers mypy error [type-abstract]

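No way… isn't this the most important reason we need DI in the first place? We define an interface and provide the implementation separately, in order to isolate the responsibilities of different classes. Yet when a user wants to use the library, they get stuck on type checking…

The History

Reading the documentation, my first thought was that this was a design problem in mypy.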

Mypy always allows instantiating (calling) type objects typed as Type[t]

But after going through mypy issue #4717 · python/mypy, I found that this is a rule already written into PEP 544.

Variables and parameters annotated with Type[Proto] accept only concrete (non-protocol) subtypes of Proto. The main reason for this is to allow instantiation of parameters with such type. For example:
def fun(cls: Type[Proto]) -> int:
    return cls().meth() # OK

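mypy allows constructing an interface type without knowing what its constructor looks like, so a parameter annotated with Type[Proto] can only be passed a concrete type… huh?

Digging further, it turns out that this check exists in the first place because Guido himself opened #1843 · python/mypy in 2016, arguing that this usage should be allowed.

So mypy added the check, and PEP 544 in 2017 then explicitly defined this usage rule.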

The Controversy

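This design of t: type[T] has stirred up a lot of controversy. Judging from #4717 · python/mypy, quite a few people think that restricting the argument to concrete classes, just so that constructing t() is allowed, greatly limits where type[T] can be used.

Others think the check is simply unreasonable, because nobody can guarantee what the constructor of a concrete class under a protocol type actually takes. Even if static type checking passes, it would be no surprise for t() to blow up at runtime. Besides, I have never seen anyone define an __init__ method on a protocol type, so it is unclear how this t() was supposed to be checked in the first place.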
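A small sketch of that runtime failure. The Greeter, EnglishGreeter, and build names are hypothetical. As far as I understand the rule, a checker following PEP 544 should accept the call to build, since EnglishGreeter is a concrete class that structurally matches the protocol, but I have not verified the exact output of any particular checker.

```python
from typing import Protocol


class Greeter(Protocol):
    def hello(self) -> str: ...


class EnglishGreeter:
    # The protocol says nothing about __init__, so a concrete class
    # is free to require constructor arguments.
    def __init__(self, name: str) -> None:
        self.name = name

    def hello(self) -> str:
        return f"hello, {self.name}"


def build(cls: type[Greeter]) -> Greeter:
    # Restricting cls to concrete classes is meant to make this call valid.
    return cls()


try:
    build(EnglishGreeter)
    failed = False
except TypeError:
    # EnglishGreeter() is missing the required "name" argument.
    failed = True
```

So restricting type[Proto] to concrete classes does not by itself make cls() safe to call.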

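Looking at development experience in other languages… In the Golang ecosystem a constructor is a plain function, so defining an interface type naturally does not include a constructor. C++ programmers have probably never heard of an abstract constructor either; only the destructor gets the virtual keyword. Back in Python itself, the two major tools mypy and pyright both allow the signature of __init__ to be changed along the inheritance chain. (see: python/typing · Discussion #1305)

As for the documentation of typing.Type, it is written so vaguely that I think even fairly experienced readers are more likely to be misled by it.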

type[C] … may accept values that are classes themselves …

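Even if we give up on protocols and restrict ourselves to defining interfaces with concrete classes only, the rule that only concrete classes are allowed causes another problem: how should the user pass a function type?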

c.register(Callable[[int, int], int], lambda a, b: a + b) # ????

What happened to functions as first-class citizens? Why does it stop working once we want to pass the type around?
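It is worth noting that the gap is only on the static-typing side. At runtime, Callable[[int, int], int] is an ordinary hashable object, so a container can use it as a dictionary key. The sketch below uses a hypothetical Container with register and invoke methods (not the API of luckydep or injector), and annotates the key as Any as a stand-in, since type[T] cannot describe it.

```python
from typing import Any, Callable


class Container:
    """Hypothetical container keyed by arbitrary type expressions."""

    def __init__(self) -> None:
        self._instances: dict[Any, Any] = {}

    # t is annotated as Any here (an assumption for this sketch),
    # because type[T] only covers class objects, not Callable[...].
    def register(self, t: Any, instance: Any) -> None:
        self._instances[t] = instance

    def invoke(self, t: Any) -> Any:
        return self._instances[t]


c = Container()
c.register(Callable[[int, int], int], lambda a, b: a + b)

# An equal Callable[...] expression finds the registered function again.
add = c.invoke(Callable[[int, int], int])
result = add(1, 2)
```

The price of Any is that the return type of invoke is no longer inferred, which is the type safety we wanted to keep.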

While reading through the issues, I found that the repo of another DI framework ran into the same problem, #143 · python-injector/injector, and suddenly I felt less alone.

The Future

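Since PEP 544 was finished back in 2017, and mypy has been running this check by default for years, it is probably too late to change this behavior now.

So, to solve this problem, in 2020 someone opened a new issue, #9773 · python/mypy, proposing a new annotation TypeForm[T]/TypeExpr[T] to express the type of an arbitrary type. As of now (2024-06), a corresponding PEP 747 draft has also been proposed.

If all goes well, in the future we will use TypeExpr[T] to express this kind of generic function: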

class Container:
    def invoke(self, t: TypeExpr[T]) -> T:  # search for and build an instance of type t
        ...

class SomeType(Protocol): ...

c = Container()
instance = c.invoke(SomeType)  # ok, we found an object of type SomeType for you!
operator = c.invoke(Callable[[int], bool])  # you need an (int -> bool)? no problem!

As for now.. library users can add the line below to files that use this kind of library. I think the scope of the change and its impact should be acceptable.

# mypy: disable-error-code="type-abstract"

Here's hoping the day comes when the Python typing system is complete.

Timeline

2016-07: #1843 · python/mypy Guido raises the need to instantiate
2017-05: PEP 544 standardized and published
2018-05: #4717 · python/mypy first discussion against type[T] design
2020-04: #143 · python-injector/injector
2020-10: #9773 · python/mypy proposes the idea of TypeForm[T]
2024-06: PEP 747 draft created