Bitisle

Posted 2025-06-28tips

shpool is a tool like tmux and screen to keep persistent shell sessions across hangups.

It’s quite new (first released in 2023) and extremely simple to use. Usually I only need one persistent shell on a remote machine, and this tool, unlike tmux, is designed to support only one shell per session. That suits me very well.

Currently the official repo does not provide a Bash auto-completion setup, so I implemented one myself. It makes my life easier. 🤣

shpool <tab>         # complete sub-commands like attach / list / help ...
shpool attach <tab>  # complete available sessions

Here is the auto-completion setup script for Bash, for anyone who has the same need.

# shellcheck shell=bash

_shpool_completions() {

  [[ ${#COMP_WORDS[@]} -eq 1 ]] && return 0
  [[ ${COMP_WORDS[0]} != "shpool" ]] && return 0

  if [[ ${#COMP_WORDS[@]} -eq 2 ]]; then
    local commands=(attach detach kill list version help)
    local partial="${COMP_WORDS[$COMP_CWORD]}"
    mapfile -t COMPREPLY < <(compgen -W "${commands[*]}" -- "${partial}")
    return 0
  fi

  [[ ${#COMP_WORDS[@]} -ne 3 ]] && return 0

  # Now, try completing session names for commands that take them
  local partial="${COMP_WORDS[$COMP_CWORD]}"
  local command="${COMP_WORDS[$COMP_CWORD-1]}"

  case "${command}" in
    attach|detach|kill)
      local -a sessions
      mapfile -t sessions < <(shpool list | tail -n +2 | awk '{print $1}')
      mapfile -t COMPREPLY < <(compgen -W "${sessions[*]}" -- "${partial}")
      return 0
      ;;
    *)
      # No need for other commands, they don't take arguments
      COMPREPLY=()
      return 0
      ;;
  esac
}

complete -F _shpool_completions shpool

I hope this tool gains more public attention!

GitHub Link: https://github.com/shell-pool/shpool

Posted 2025-02-26notes

Easier Iteration with Generator

In Python, we can make a container type iterable by implementing the __iter__ method.

For example, if we want to provide all values in a range [lo, hi) …

class TwoTillSix:

    def __iter__(self):
        for val in range(2, 6):
            yield val

list(TwoTillSix())  # [2, 3, 4, 5]

With the help of a generator function like this, we don’t need a separate iterator type. We just keep the state of the iteration directly in some local variable.
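The class above hard-codes its bounds. As a minimal sketch, the same idea works for an arbitrary half-open range [lo, hi); the class name `HalfOpenRange` and its attributes are made up for illustration.

```python
class HalfOpenRange:
    """Iterable over the integers in [lo, hi). Hypothetical name, for illustration."""

    def __init__(self, lo: int, hi: int) -> None:
        self.lo = lo
        self.hi = hi

    def __iter__(self):
        # The iteration state lives in the local variable `val`,
        # so no separate iterator class is needed.
        val = self.lo
        while val < self.hi:
            yield val
            val += 1


values = HalfOpenRange(2, 6)
first_pass = list(values)
second_pass = list(values)  # each call to __iter__ starts a fresh generator
```

Because `__iter__` creates a new generator on every call, the container can be iterated more than once, unlike a bare generator object.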

In this post, let’s see how we can do this in C++, with the help of std::generator<T>.

A shorter version of this post can be found in Gist and Compiler Explorer.

Coroutine and Concurrent Programming

What happens in the example above is that the Python interpreter binds the execution state of that function to a generator object. Every time we want a value from that generator, the execution resumes until the function yields the next value (or returns).

This is doable because Python provides built-in support for coroutines, that is, interruptible and resumable functions, so the control flow can interleave between the callee and the caller. That is a primitive building block of concurrent programming.

values = TwoTillSix()

for val in values:
    print(val)
    if val == 3:
        break
# Print 2 and 3. The body of `TwoTillSix.__iter__` is only executed twice.

As we can see here, the generator makes it possible to evaluate values lazily, and we can even stop the evaluation at any time once we lose interest. All of this is possible because of the coroutine support.
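To make the laziness visible, here is a small sketch that drives a generator by hand with `next()`. The `squares` function and the `computed` list are illustrative names, not part of the earlier example.

```python
computed = []  # records which values the generator actually produced


def squares():
    n = 0
    while True:  # an infinite sequence, safe only because evaluation is lazy
        computed.append(n)
        yield n * n
        n += 1


gen = squares()
before = list(computed)  # calling squares() only creates the generator, nothing ran yet
first = next(gen)        # runs until the first yield
second = next(gen)       # resumes after the first yield, runs until the second
gen.close()              # we lost interest; the generator is finalized here
```

Only two values were ever computed, even though the sequence itself is unbounded.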

C++ Coroutine and `std::generator`

Fortunately, C++ has had language-level support for coroutines since C++20. When the compiler sees a function using the new keywords co_await, co_yield or co_return, it generates the corresponding code to allow partial execution, and to store the state of that function while it is suspended.

Also, starting from C++23, std::generator is included in the standard library, as the coroutine return type that captures the common usage of… yes, generators!! (Actually I’m surprised that there was no high-level coroutine type at all in the standard library when coroutines were accepted as a language feature in C++20.)

#include <generator>
#include <print>
#include <utility>

std::generator<int> fibonacci() {
    int a = 1, b = 0;
    while (true) {
        co_yield a;
        b = std::exchange(a, a + b);
    }
}

int main() {
    for (int num : fibonacci()) {
        if (num > 10)
            break;
        std::println("{}", num); // prints 1 1 2 3 5 8
    }
    return 0;
}

When fibonacci is called here, an interruptible function is created and wrapped in the std::generator return value. Every time we take the next value out in the for-loop, the function resumes and yields the next value (and is then suspended again).

So now we have the ability to interleave control flow, just like the Python example earlier!! But… that’s a free function, and usually what we encounter is a container class. A class must have begin/end methods, and the corresponding increment/dereference operators on the returned iterator type, in order to be iterable. (This is modeled as the named requirement Container in the standard.) If a class needs another method call to be iterated on, it just looks… not like C++.

struct Factorial { // a virtual container for factorial series
    std::generator<int> generate() {
        int product = 1;
        int val = 1;
        while (true) {
            co_yield product;
            product *= val++;
        }
    }
};

int main() {
    Factorial fac {};
    for (int num : fac.generate()) { // well.. can we iterate on fac directly?
        if (num > 10)
            break;
        std::println("{}", num);
    }
}

So… can we keep the implementation simple (using the std::generator coroutine type), while still iterating over the container in the usual way?

CRTP: Curiously Recurring Template Pattern

Now our goal becomes clear:

“How can we make a class usable as before, if the class only provides a method that returns a std::generator?”

That’s where we adopt CRTP to solve the problem.

/// @brief HasGenerator identifies the customization point of CRTP base AutoContainer.
/// @tparam C the child class type.
/// @tparam T the value type.
template <typename C, typename T>
concept HasGenerator = requires(C c) {
    { c.generate() } -> std::same_as<std::generator<T>>;
};

/// @brief A CRTP base type that allow easy iteration implementation.
/// @tparam T the value type of element being iterated. Not child class type.
/// @details Child class must provide std::generator<T> generate() member function.
template <std::semiregular T>
struct AutoContainer {

    class Iter {
        std::generator<T> gen_;
        std::ranges::iterator_t<std::generator<T>> iter_;

    public:
        Iter(std::generator<T>&& gen)
            : gen_ { std::move(gen) } // ensure generator being alive
            , iter_ { gen_.begin() } { }

        bool operator==(const std::default_sentinel_t&) const { return iter_ == gen_.end(); }
        Iter& operator++() { return ++iter_, *this; }
        T operator*() const { return *iter_; }
    };

    // deducing-this so that we can do CRTP.
    // Also use a concept to give a better error message on misuse.
    Iter begin(this HasGenerator<T> auto&& self) {
        return Iter(self.generate());
    }

    // deducing-this is not required here; it is just symmetric with begin.
    std::default_sentinel_t end(this auto&& self) {
        return std::default_sentinel;
    }
};

AutoContainer can be inherited by any class that wants all of the iteration machinery. The base class provides an iterator type that captures the generator and forwards the increment/dereference operations to it. With this iterator type, the remaining effort is in the begin method, which just constructs the iterator from self.generate(). Here self is typed by deducing-this, so generate() can be provided later by the child class. That’s where the base class connects to its child.

Also, on the begin method we add the concept constraint HasGenerator. This way we avoid the usual waterfall of errors from templated libraries. The interesting thing here is: the thing being templated to achieve CRTP is the begin method, not the AutoContainer class itself. If two child classes both inherit from AutoContainer<int>, only one AutoContainer type is instantiated, but two different begin functions are instantiated.

int main() {

    /// @brief Factorial sequence generator.
    struct Factorial : public AutoContainer<int> {

        // Implement `generate` so this class can be iterated like a container.
        std::generator<int> generate() const {
            // The mutable state is the local variable of this coroutine,
            // so the method receiver can be const.
            int product = 1;
            int val = 1;
            while (true) {
                co_yield product;
                product *= val++;
            }
        }
    };

    const Factorial fac {};
    for (int value : fac) { // notice that we don't call generate() here
        if (value > 10)
            break;
        std::println("{}", value); // prints 1, 1, 2, 6
    }
}

Now we can use AutoContainer by inheriting from it in any child class. In the example above, Factorial inherits from AutoContainer<int> (since it generates integers). All we need to do is implement the generate method. Being a coroutine, its logic is clean.

Now everything works and looks nice!! :D

To make this class recognized as a std::ranges::input_range, that is, to make it work with the <ranges> library, the base AutoContainer actually needs more stuff, like value_type/difference_type on its iterator, support for the post-increment operator, and so on… But that’s probably too much for a tutorial, so let’s stop here.

More Standard Library Support, Maybe?

We have described the minimal concept of coroutines, one important application of them as generators (std::generator), and how to use it in a beautiful way. But there is another important application of coroutines: Promise (with an event loop).

JS developers are already used to the Promise type (or maybe the async/await keywords). Python has also supported this kind of usage since Python 3.5. But unfortunately, there is still no such thing in the C++ standard library, even though coroutines were standardized in C++20.
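For reference, this is roughly what the promise-with-event-loop style looks like on the Python side. It is a minimal sketch: `fetch_name` is a made-up stand-in for a network call, and `asyncio.run` requires Python 3.7 or later.

```python
import asyncio


async def fetch_name(user_id: int) -> str:
    # Stand-in for a network call; asyncio.sleep hands control back to the event loop.
    await asyncio.sleep(0)
    return f"user-{user_id}"


async def main() -> list[str]:
    # Both coroutines run on the same event loop and are awaited together.
    return await asyncio.gather(fetch_name(1), fetch_name(2))


names = asyncio.run(main())
```

`asyncio.gather` returns the results in the order the awaitables were passed in, regardless of which one finishes first.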

The good news is that there is already a proposal, P2300, and a reference implementation, NVIDIA/stdexec. (They don’t use the term Promise, probably because promise_type already has another meaning in the context of C++ coroutines.)

Hope we can see more usage of these in the future.

Posted 2025-02-04notes

The Backbone of Frontend Framework

It’s almost impossible to write a frontend without any UI framework nowadays.

All these frameworks provide some kind of mechanism so that, whenever we change a value in JavaScript, some element in the UI changes automatically. For example, a new row is added to a table when we push an item into a list.

But how does this work under the surface? That’s the Proxy Object.

Internal Method and Proxy

We need to talk about internal methods in JS first.

An internal method is a special method/function that decides how an object behaves in the JS runtime. For example, the [[Get]] internal method determines the result when we access some property prop of an object obj, e.g. obj.prop. And the [[Call]] internal method determines what happens when we call the object obj as a function, e.g. obj().

The internal methods of an existing object can be intercepted and customized with a Proxy instance.

For example, here is how to intercept the property-read action of some object.

var person = {
    name: "Alice"
};

var agent = new Proxy(
    person,
    {
        get: function(target, property, receiver) {
            console.log(`access "${property}" as ${target[property]}`);
            return target[property];
        },
    },
);

var _ = agent.name; // prints: access "name" as Alice

In the snippet above, agent is the proxy object instance. The second argument to the Proxy constructor is the handler, and the get function in that handler is a trap for the [[Get]] internal method.

The `set` Trap for Binding

Just like we can intercept the property read action for some object, we can also intercept the property write action. And that’s what UI frameworks do underneath.

Assume we have a paragraph in the DOM tree, identified by the id greeting.

<p id="greeting">Hi</p>

We can ensure that the text is updated automatically whenever we change the JavaScript object.

var element = document.querySelector('#greeting');
var handler = {
    set: function(target, property, value, receiver) {
        if (property !== 'text')
            return false;
        target[property] = value;
        element.textContent = value; // react to the change
        return true;
    }
};

var greeting = new Proxy({text: element.textContent}, handler);
console.log(greeting.text); // print 'Hi'
greeting.text = 'Hello';
// the text in the html page will be changed accordingly.

Once this proxy object is created, we make the view (HTML) reactive to the model (JavaScript object), and the binding is established.

In the examples above, we may notice that all the intercepted targets are literally objects. That means we can not do a similar thing to primitive types in JS, like number and string. And that does have an impact on the design of UI frameworks. For example, the Vue.js framework has a ref() wrapper for that.

However, Proxy is still a powerful feature in JS, and probably the most important feature for any senior engineer who needs to work with JS. So just get used to it. :D

Appendix

Full HTML file for the view-model binding example.

<!DOCTYPE html> 
<html>
<head>
    <title>Proxy Demo</title>
</head>
<body>
    <p id="greeting">Hi</p>
    <script>
        document.addEventListener('DOMContentLoaded', function() {
            var element = document.querySelector('#greeting');
            var handler = {
                set: function(target, property, value, receiver) {
                    if (property !== 'text')
                        return false;
                    target[property] = value;
                    element.textContent = value; // react to the change
                    return true;
                }
            };

            var greeting = new Proxy({text: element.textContent}, handler);
            console.log(greeting.text); // print 'Hi'
            greeting.text = 'Hello';
            // the text in the html page will change accordingly.
        });
    </script>
</body>
</html>

Posted 2024-11-26news

The True Placeholder Symbol in C++

In many programming languages, the common way to indicate that a symbol is not important is to use _ for the symbol.

It was just a convention in C++, but it will become a language feature starting from C++26.

Well… what’s the difference?

We use _ when some declaration is required but we do not care about the name, or have no good name for the variable.

For example, a common trick to extend the lifetime of an RAII lock is

void doJob() {
    static std::mutex mutex;
    std::lock_guard _(mutex); // give it a name so it won't unlock immediately
    // some jobs ...
}

Or in a structured binding declaration.

template <std::regular T>
std::tuple<T, bool> someJob() {
    return { {}, true };
}

void foo() {
    auto [_, done] = someJob<int>();
}

The problem is… in C++, this style is just a convention, and _ is still a regular variable. So if we want to ignore two values in the same scope, it does not work: the second _ is a redeclaration of the first, and here the types do not even match.

void foo() {
    auto [_1, done1] = someJob<int>();
    auto [_2, done2] = someJob<std::string>();
    // we need to separate _2 from _1
}

That’s frustrating, especially for people with experience of pattern-matching expression in other languages.

So in C++26 (as proposed by P2169), we now have a new way to interpret the semantics of _.

The rule is simple.

If there is only one declaration of _ in some scope, everything is the same as before.

And we can reference it later if we want, although it’s probably a bad smell to use _ in this case.

If there are more declarations of _, they all refer to different objects respectively.

In this case, they can only be declared and initialized. Trying to use them by name is a compile error.

And we can finally write something that looks more natural.

void foo() {
    auto [_, done1] = someJob<int>();
    auto [_, done2] = someJob<std::string>();
}

Golang has had this feature from the beginning, where it is called the blank identifier. For Python, being a dynamically typed language, there is no problem in using _ for values of different types. In addition, _ was defined as a wildcard when pattern matching was introduced to Python (PEP 634).

It’s nice to see this coming to C++ now. :D

Posted 2024-10-23notes

There is No Unit Type in C++

There is NO unit type in C++. (Not in core language spec, at least.)

A Story

Assume we are defining an abstract interface for a storage, to get / set the name of some user.

class Storage {
public:
    virtual ~Storage() = default;

    virtual std::string GetName(int id) const = 0;
    virtual void SetName(int id, std::string name) const = 0;
};

Simple and straightforward.

But it’s intended to be a storage accessed through the network, so any operation on it is inherently going to fail at some point. Also, it’s 2024 now: we love FP, prefer expressions over statements, and monadic operations are so cool. Thus we decide to wrap all return types in std::optional to indicate that these actions may fail.

class Storage {
public:
    virtual ~Storage() = default;

    virtual std::optional<std::string> GetName(int id) const = 0;
    virtual std::optional<void> SetName(int id, std::string name) const = 0;
};

Looks good! But now it fails to compile.

...\include\optional(100,26): error C2182: '_Value': this use of 'void' is not valid

Well, template stuff.

What Happened?

The problem is that void is an incomplete type in C/C++, and it always has to be treated specially when we try to use it.

By incomplete in C/C++, we mean a type whose size is not (yet) known.

For example, if we forward declare a struct type and define its members later, the struct type is incomplete before the definition.

struct Item;

Item item; // <- invalid usage, since the size of Item is not known yet.

struct Item {
    int price;
};

Item item; // <- valid usage here.

And void is a type that can never be completed, by specification.

But we can have a function that returns void? Well, we return nothing.

void foo() { }

void foo() { return; } // Or explicit return, both equivalent.

BTW, C before C23 prefers putting a void in the parameter list to indicate that a function takes nothing, e.g. int bar(void), but it’s kind of a broken design.

We still can not evaluate bar(foo()), because there is no such thing that is a void and exists.

void foo() { }

int bar(void) { return 0; }

int main() {
    return bar(foo()); // <- invalid expression here.
}

So, back to our problem: std::optional<void>.

Conceptually, std::optional<T> is just some T with additional information about whether the value exists.

template <typename T>
struct MyOptional {
    T value;
    bool hasValue;
};

Because it is impossible to have a member void value, std::optional<void> is not going to be a valid type in the first place.

(Well, we can make a specialization for void, but that’s another story.)

So, How can We Fix?

The problem here is that there is no valid value for void in C/C++. At some level, a program can be thought of as a bunch of expressions, and running a program is just the evaluation of these expressions. (Plus the side effects, for real products / services.)

The atom of an expression is a value. If there is a concept that can not be expressed as a value, we are making trouble for ourselves.

Take Python for example: if we have a function that returns nothing, the function actually returns None when it exits.

def foo():
    pass

assert(foo() is None) # check pass here

def foo(arg: None) -> None:
    pass

def bar(arg: None) -> None:
    pass

bar(foo(None)) # well.. if we really want to chain them together

So nothing itself is a thing; in Python, we call it None. Every expression now evaluates to some value that exists, and the type system built on top of that is complete in every case.

The concept of nothing itself is called the unit type in type theory. It’s a type that has one and only one value. In Python the value is None, in JavaScript it’s null, in Golang … maybe struct{}{} is a good choice, although it is not standardized by the language.

Unit Type in C++

Now it’s time for C++. As we have already seen, void is not a good choice for a unit type because we can not have a value of it. Are there other choices?

Just defining an empty struct and using it is probably not a good choice, since our custom unit type would not be compatible with unit types from other code in the product code base.

How about nullptr? That’s the only value of std::nullptr_t (so the type would be std::optional<std::nullptr_t>). It’s a feasible choice, but it looks weird, since a pointer implies indirect access semantics, which is not the case when it is used with std::optional<T> here.

How about std::nullopt_t? It’s also a unit type, but it’s even more confusing. What does std::optional<std::nullopt_t> mean? An optional with an empty option? There is a static assert in the std::optional<T> template that forbids this usage directly, probably because it’s too confusing.

Maybe std::tuple<>? A tuple with zero elements has only one value, the empty tuple. That seems to be a good choice, because the canonical unit type in Haskell is (), the empty tuple. So it looks natural for people coming from Haskell. But personally I don’t like this either, since the type now has nested angle brackets, as in std::optional<std::tuple<>>.

There is a type called std::monostate, which arrived at the same time as std::optional in C++17. This candidate has no additional implication in its type or its name. It’s monostate! Just a little wordy.

std::monostate was originally designed to let a std::variant<...> be default-initialized even when its other alternatives can not be. But its name and its characteristics both fit our requirement here. Thus it is a good choice for wrapping a function that may fail but returns nothing.

Now the interface looks like

class Storage {
public:
    virtual ~Storage() = default;

    virtual std::optional<std::string> GetName(int id) const = 0;
    virtual std::optional<std::monostate> SetName(int id, std::string name) const = 0;
};

Hmm… std::optional<std::monostate>, which takes 29 characters. C++ is not easy. It’s just like how we use std::shared_ptr<T> all over the place.

Maybe the C++ Standards Committee should specialize std::optional<void>, just like std::expected<void, E> in C++23.

Wish someday void can be a REAL unit type in C/C++. :D

Posted 2024-08-22notes

Golang 1.23 Iterator Functions

For a long, long, long time, Golang had no standard way to represent an iterable sequence.

C++ has range adaptors and iterators (although not strictly typed, only by concept), Python has iterable/iterator via __iter__/__next__, and JavaScript has standardized for-of and Symbol.iterator since ES6.

Now it’s time for Golang. Starting from Go 1.23 (released in August 2024), we have iterator functions.

How It Works

Sample code explains faster.

func Iota() func(yield func(idx int) bool) {
	return func(yield func(idx int) bool) {
		idx := 0
		for {
			if !yield(idx) {
				return
			}
			idx += 1
		}
	}
}

func main() {
	for idx := range Iota() {
		if idx == 3 {
			break
		}
		fmt.Println(idx) // print 0 1 2
	}
}

According to the Go 1.23 Release Notes, the range keyword now accepts three kinds of functions, each of which takes a yield function that receives zero, one or two values.

func(func() bool)
func(func(K) bool)
func(func(K, V) bool)

The loop control variables and the body of the for-loop are translated into the yield function by the language definition. So we can still write an imperative-style loop even though we are actually doing functional-style function composition.

Why Do We Need This?

Standardizing the iterable/iterator interface is an important precondition for lazy evaluation. For example, what should we do when we need to iterate through all non-negative integers and do some map/filter/reduce on them? It wastes space to allocate a list for all these integers (if that is possible at all).

Someone may say “we already have channel types”. Well, but that requires a separate goroutine. We probably don’t want such a heavy cost every time we iterate over something.

Also, a separate goroutine means additional synchronization and lifecycle control. For example, how can we terminate the Count goroutine when we need to break out of the loop early?

func Count(start int) chan int {
	output := make(chan int)
	go func() {
		idx := start
		for {
			output <- idx
			idx += 1
		}
	}()
	return output
}

func main() {
	for idx := range Count(0) {
		if idx == 10 {
			break
		}
		fmt.Println("Loop: ", idx)
	}
}

We need some mechanism like a context object or another channel, right? That’s a burden for such an easy task.

On the other hand, iterator functions are just ordinary functions that accept another function to yield/output the iterated values, so they are much more lightweight than a separate goroutine. We want fast programs, right? :D

The Stop Design

For languages like Python and JavaScript, the iterator function (or generator, in Python terms) is paused and control is transferred back to the function that iterates over the values. When a break/return happens and no more values are required, the iterator function just gets collected by the runtime, since there are no more references to the function object.

But how do we break out of the iteration early, if control has been transferred into the iterator function? Let’s look at the function signature again. (Take the one-value iterator function for example.)

func(yield func(idx int) bool)

The yield function returns a bool to indicate whether the loop body reached its end (true) or encountered a break statement (false). So in the normal case we continue to the next value after yield returns, but if we get false from yield, our iterator function should return immediately.
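The stop protocol can be mimicked by hand in Python to make it concrete. This is only an analogy, not what the Go compiler actually emits; `iota`, `body` and `printed` are illustrative names.

```python
from typing import Callable


def iota(yield_: Callable[[int], bool]) -> None:
    """Push-style iterator: feeds 0, 1, 2, ... to `yield_` until it returns False."""
    idx = 0
    while True:
        if not yield_(idx):
            return  # the consumer asked to stop, so we return immediately
        idx += 1


printed: list[int] = []


def body(idx: int) -> bool:
    """Plays the role of the for-loop body."""
    if idx == 3:
        return False  # corresponds to `break`
    printed.append(idx)
    return True  # corresponds to reaching the end of the loop body


iota(body)
```

The iterator never runs on its own; it only advances while the callback keeps answering true, which is why no extra goroutine or cancellation channel is needed.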

Ecosystem around Iterator

The beauty of iterator only appears if the ecosystem, or we say, the common operations around iterator are already implemented in standard library. That means:

Conversion from and to standard container types, like slice, map and chan
Operations and compositions of iterators, e.g. map/filter/reduce/chain/take …

In Python, there are generator expressions, which involve implicit map/filter. reduce is available from functools, and there are many useful functions in the itertools package, e.g. pairwise, batched, chain. Most built-in container types take an iterable as the first argument of their constructor.

In Golang, the first part is mostly done along with the release of Golang 1.23. For example, to convert a slice from and to an iterator, we can use slices.Collect and slices.Values.

For the second part, there is a plan to add an x/exp/xiter package under the golang.org namespace. There should be at least Concat, Map, Filter, Reduce, Zip … once it’s released. But unfortunately it’s not complete yet.

See: iter: new package for iterators · Issue #61897 · golang/go

Also, I created a toy package github.com/wdhongtw/mice/flow to provide some important building blocks around iterators:

Empty/Pack return an iterator for zero/one value
Any/All short-circuit lazy evaluation of a predicate on a sequence of values
Forward/Backward absorb input and iterate in reversed order.

For example, if we want to define an iterator function for a binary tree in a recursive manner, we can use Empty and Pack together with Chain to implement this easily.

type Node struct {
    val   int
    left  *Node
    right *Node
}

func Traverse(node *Node) iter.Seq[int] {
    // Empty is useful as base case during recursive generator chaining.
    if node == nil {
        return Empty[int]()
    }
    // Pack is useful to promote a single value into a iterable for chaining.
    return Chain(
        Traverse(node.left),
        Pack(node.val),
        Traverse(node.right),
    )
}

Looks cool, doesn’t it? :D

Posted 2024-07-13tips

Pipeline Style Map-Reduce in Python

Since C++20, C++ provides a new style of data processing, with the ability of lazy evaluation, by chaining one iterator to another.

#include <ranges>
#include <vector>

int main() {
    std::vector<int> input = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    auto output = input | std::views::filter([](const int n) {return n % 3 == 0;})
                        | std::views::transform([](const int n) {return n * n;});
    // now output is [0, 9, 36, 81], conceptually
}

Can we use this style in Python? Yes! :D

Evaluation of Operators

In Python, every expression that involves a binary operator, e.g. a + b, is evaluated by the following rules:

If the (forward) special method, e.g. __add__, exists on the left operand
- It’s invoked on the left operand, e.g. a.__add__(b)
- If the invocation returns some meaningful value other than NotImplemented, done!
If the (forward) special method does not exist, or the invocation returns NotImplemented, then
If the reverse special method, e.g. __radd__, exists on the right operand
- It’s invoked on the right operand, e.g. b.__radd__(a)
- If the invocation returns some meaningful value other than NotImplemented, done!
Otherwise, TypeError is raised
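The rules above can be observed directly with two toy classes. `Left`, `Right` and the `calls` log are made up for this sketch.

```python
calls = []  # records which special methods were tried, in order


class Left:
    def __add__(self, other):
        calls.append("Left.__add__")
        return NotImplemented  # "I don't know how to handle this operand"


class Right:
    def __radd__(self, other):
        calls.append("Right.__radd__")
        return "handled by Right"


# The forward method is tried first, then the reverse method on the right operand.
result = Left() + Right()

# If neither side handles the operation, TypeError is raised.
try:
    Left() + Left()
    raised = False
except TypeError:
    raised = True
```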

So it seems possible here… Let’s make a quick experiment

class Adder:
    def __init__(self, rhs: int) -> None:
        self._rhs = rhs

    def __ror__(self, lhs: int) -> int:
        return lhs + self._rhs


assert 5 == 2 | Adder(3)  # Is 5 equals to 2 + 3 ? Yes!!

This works because the | operator of the integer 2 checks the type of Adder(3) and finds that it is not something it recognizes, so it returns NotImplemented and our reverse magic method takes over.

In C++, the | operator is overloaded(?) on range adaptors to accept ranges. So maybe we can make something similar, with some object implementing __ror__ that accepts an iterable and returns another value (probably an iterator).

Pipe-able Higher Order Function

So back to our motivation: Python already has something like filter, map and reduce, and also the powerful generator expression to filter and/or map without an explicit function call.

values = filter(lambda v: v % 2 == 0, range(10))

values = (v for v in range(10) if v % 2 == 0)

But it’s just hard to chain multiple operations together while preserving readability.

So let’s make a filter object that support | operator

class Filter:
    def __init__(self, predicate: Callable[[int], bool]) -> None:
        self._predicate = predicate

    def __ror__(self, values: Iterable[int]) -> Iterator[int]:
        for value in values:
            if self._predicate(value):
                yield value


selected = range(10) | Filter(lambda val: val % 2 == 0)
assert [0, 2, 4, 6, 8] == list(selected)

How about map?

class Mapper:
    def __init__(self, transform: Callable[[int], int]) -> None:
        self._transform = transform

    def __ror__(self, values: Iterable[int]) -> Iterator[int]:
        for value in values:
            yield self._transform(value)


processed = range(3) | Mapper(lambda val: val * 2)
assert [0, 2, 4] == list(processed)

Works well, we are great again!!

It just takes some time for us to write the class representation for filter, map, reduce, take, any … and any higher-order function you may find useful.

Wait, that looks so tedious. Python is supposed to be a powerful language, isn’t it?

Piper and Decorators

Capturing the function and implementing __ror__ for every higher-order function can be so annoying. If we make sure __ror__ only takes the left operand and returns the return value of the captured function, then we can extract a common Piper class. We just need another function to produce a function that already captures the required logic.

class Piper(Generic[_T, _U]):
    def __init__(self, func: Callable[[_T], _U]) -> None:
        self._func = func

    def __ror__(self, lhs: _T) -> _U:
        return self._func(lhs)


def filter_wrapped(predicate: Callable[[_T], bool]):
    def apply(items: Iterable[_T]) -> Iterator[_T]:
        for item in items:
            if predicate(item):
                yield item

    return Piper(apply)


selected = range(10) | filter_wrapped(lambda val: val % 2 == 0)
assert [0, 2, 4, 6, 8] == list(selected)

Now it looks a little nicer … but do we still need to implement a wrapper function for every kind of operation?

Again, the only difference between these wrapper functions is the logic inside the apply function, so we can extract that part too, with a decorator!! :D

def on(func: Callable[Concatenate[_T, _P], _R]) -> Callable[_P, Piper[_T, _R]]:
    def wrapped(*args: _P.args, **kwargs: _P.kwargs) -> Piper[_T, _R]:
        def apply(head: _T) -> _R:
            return func(head, *args, **kwargs)

        return Piper(apply)

    return wrapped


@on
def filter(items: Iterable[_T], predicate: Callable[[_T], bool]) -> Iterator[_T]:
    for item in items:
        if predicate(item):
            yield item


selected = range(10) | filter(lambda val: val % 2 == 0)
assert [0, 2, 4, 6, 8] == list(selected)

The on decorator accepts some function func and returns a function that first takes the tail arguments of func, then returns an object that accepts the head argument of func through the pipe operator.

So now we can express our thoughts in our codebase using pipeline style code, just with one helper class and one helper decorator! :D

values = range(10)
result = (
    values
    | filter(lambda val: val % 2 == 0)
    | map(str)
    | on(lambda chunks: "".join(chunks))() # create pipe-able object on the fly
)
assert result == "02468"

for val in range(10) | filter(lambda val: val % 2 == 0):
    print(val)

Appendix

Complete type-safe code here

"""
pipe is a module that makes it easy to write higher-order pipeline functions
"""

from collections.abc import Callable, Iterable, Iterator
from typing import Generic, TypeVar, ParamSpec, Concatenate

_R = TypeVar("_R")
_T = TypeVar("_T")
_P = ParamSpec("_P")


class Piper(Generic[_T, _R]):
    """
    Piper[T, R] is a function that accepts T and returns R

    call the piper with "value_t | piper_t_r"
    """

    def __init__(self, func: Callable[[_T], _R]) -> None:
        self._func = func

    def __ror__(self, items: _T) -> _R:
        return self._func(items)


def on(func: Callable[Concatenate[_T, _P], _R]) -> Callable[_P, Piper[_T, _R]]:
    """
    "on" decorates a func into pipe-style function.

    The result function first takes the arguments, excluding first,
    and returns an object that takes the first argument through "|" operator.
    """

    def wrapped(*args: _P.args, **kwargs: _P.kwargs) -> Piper[_T, _R]:
        def apply(head: _T) -> _R:
            return func(head, *args, **kwargs)

        return Piper(apply)

    return wrapped

Posted 2024-07-08tips

Introduction of Function Hijacking in C

Thanks to the way symbols from shared libraries are resolved lazily at runtime in Unix environments, we can do many interesting things to the functions an executable takes from a shared library.

All we need to do is:

Implement a shared library that contains the functions we want to hijack.
Run the executable with our magic library inserted.

Make a Shared Library

If we want to replace some function with a stub / fake implementation, we can just implement a function with the same name and the same signature.

For example, if we want to fix the clock during unit tests …

// in hijack.c

#include <time.h>

// a "time" function which always returns the timestamp of the epoch.
time_t time(time_t* arg) {
    if (arg)
        *arg = 0;

    return 0;
}

If we want to observe some function call but still delegate the call to the original function, we can implement a function that looks up the original at runtime and forwards the call to it.

For example, if we want to monitor the sequence of file-open calls:

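// in hijack.c

// glibc exposes RTLD_NEXT only when _GNU_SOURCE is defined,
// and the define must come before every #include in the file
#define _GNU_SOURCE

#include <dlfcn.h>
#include <stdio.h>
#include <time.h>

// write "open" action to standard error before opening the file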
FILE* fopen(const char* restrict path, const char* restrict mode) {
    static int used_count = 0;
    used_count += 1;
    fprintf(stderr, "open file [%d]: \"%s\"\n", used_count, path);

    typedef FILE* (*wrapped_type)(const char* restrict path, const char* restrict mode);
    // no dlopen, just search the function in magic handle RTLD_NEXT
    wrapped_type wrapped = dlsym(RTLD_NEXT, "fopen");
    return wrapped(path, mode);
}

After finishing the implementation, compile it into a shared library, called hijack.so here.

cc -fPIC -shared -o hijack.so hijack.c

Hijack during Actual Execution

We can use the LD_PRELOAD environment variable to insert our special shared library into any executable at run time.

LD_PRELOAD="path-to-shared-lib" executable

For example, suppose we want to use the implementations from the last section in our executable, called app here.

// app.c

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main() {
    FILE* handle = fopen("output.txt", "a");
    assert(handle);
    fclose(handle);

    time_t current = time(NULL);
    printf("now: %s", asctime(gmtime(&current)));

    return EXIT_SUCCESS;
}

(Compile and) run the executable

cc -o app app.c
LD_PRELOAD=./hijack.so ./app

Output

open file [1]: "output.txt"
now: Thu Jan  1 00:00:00 1970

The open-file action is traced, and the time is fixed to the epoch.

If we need to override functions with more than one shared library, just separate them with : in LD_PRELOAD.

Conclusion

It’s a powerful feature: it allows us to observe calls, to swap in mock/fake implementations, or sometimes even to apply an emergency patch.

What makes all of this possible is the dynamic linking mechanism, together with the one-to-one mapping of symbols from sources to libraries/binaries.

Although development in C is somewhat inconvenient, it is still an interesting experience to see this kind of usage. :D

Posted 2024-06-05notes

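Python's type Isn't a Type?

Intro

Recently, for various reasons, I wanted to write my own DI library for Python. I did finish it: excluding tests it is under 50 lines, and it is released as luckydep · PyPI. Along the way, though, I ran into some problems.

Unlike in Golang, in Python the interface a DI container exposes for fetching a specific instance (invoke/bind/get and so on, called invoke below) has to be passed the type of the wanted instance explicitly.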

func Invoke[T any](c *Container) T {
    var instance T // can declare a T even if T is an interface
    // ... look up the value registered for T in c ...
    return instance
}

// although we need to specify the type parameter, the type parameter
// is not passed during runtime
var instance = Invoke[SomeType](c)

class Container:
    def invoke(self, t): ...  # search and build an instance of type t

c = Container()
instance = c.invoke(SomeType) # need to pass type as a parameter

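The fundamental difference is that in a statically typed language like Golang, a generic function really is instantiated for the types it is used with, and the byte code/machine code of those functions naturally knows which type it is handling. In a language like Python, generic functions exist only for the static type checker; at runtime there is still just one function, so the type has to be passed to the invoke interface.

Since Python 3.5 (PEP 484) we have type hints, so we can annotate functions/methods to help IDEs and type checkers infer the correct types.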

class Container:
    def invoke(self, t: type[T]) -> T: ...  # search and build an instance of type t

c = Container()
instance: SomeType = c.invoke(SomeType) # ok, we can infer instance is SomeType

Here type[T] (or typing.Type[T], the old way) says that t carries some type T itself, not an instance whose type is T.

From typing document:

A variable annotated with C may accept a value of type C. In contrast, a variable annotated with type[C] (or typing.Type[C]) may accept values that are classes themselves

The Problem

OK, we have type[T] to work with. The DI library developer can index on t, whose type is type[T], and library users get the type safety that a static type checker brings.
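As a rough illustration of indexing on the type object, here is a minimal sketch of such a container. The Container, register, and Clock names are assumptions made for this example and are not the API of luckydep or any other real library.

```python
from collections.abc import Callable
from typing import Any, TypeVar

T = TypeVar("T")


class Container:
    """Hypothetical minimal DI container, for illustration only."""

    def __init__(self) -> None:
        self._factories: dict[type[Any], Callable[[], Any]] = {}

    def register(self, t: type[T], factory: Callable[[], T]) -> None:
        # the type object itself is the lookup key
        self._factories[t] = factory

    def invoke(self, t: type[T]) -> T:
        # a type checker infers the return type from the argument
        return self._factories[t]()


class Clock:
    def now(self) -> int:
        return 0


c = Container()
c.register(Clock, Clock)
instance = c.invoke(Clock)  # inferred as Clock
assert isinstance(instance, Clock)
```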

So we took the library into a real-world scenario... and hit a problem right away. We define an interface type and ask the DI container for the implementation instance of that interface. Because an interface is usually an abstract class (or a protocol), the mypy type checker reports an error (mypy: type-abstract).

class SomeInterface(Protocol):
    def hello(self): ...

class SomeImplementation:
    def hello(self): ...

c.invoke(SomeInterface) # trigger mypy error [type-abstract]

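No way... isn't this the most important reason we need DI? We define an interface and provide the implementation separately to isolate the responsibilities of different classes. Yet when users try to use the library, they get stuck on type checking...

The History

Reading the documentation, I first assumed this was a design problem in mypy.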

Mypy always allows instantiating (calling) type objects typed as Type[t]

But after reading mypy issue #4717 · python/mypy, I found that this is a rule already written into PEP 544.

Variables and parameters annotated with Type[Proto] accept only concrete (non-protocol) subtypes of Proto. The main reason for this is to allow instantiation of parameters with such type. For example:

def fun(cls: Type[Proto]) -> int:
    return cls().meth() # OK

mypy allows constructing an interface type whose constructor nobody knows the shape of, and therefore a parameter annotated with Type[Proto] may only be passed a concrete type... hmm?

Digging further, it turns out the check exists in the first place because of #1843 · python/mypy, opened in 2016 by Guido himself, who argued that this usage should be allowed.

So mypy added the check, and PEP 544 in 2017 then spelled out the rule explicitly.

The Controversy

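This t: type[T] design is controversial. Judging from #4717 · python/mypy, quite a few people think that restricting the argument to concrete classes, just to allow constructing t(), severely limits where type[T] can be used.

Others think the check makes no sense at all, because nobody can guarantee what the constructor of a concrete class under a protocol type takes. Even when the static type check passes, it is no surprise for t() to blow up at runtime. Besides, I have never seen anyone define an __init__ method on a protocol type, so it is unclear how t() could be checked in the first place.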
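A minimal sketch of that runtime failure, using hypothetical Greeter, NamedGreeter, and build names. As far as I understand, a checker should accept build(NamedGreeter), since protocol compatibility does not look at __init__, yet the call fails as soon as it runs:

```python
from typing import Protocol


class Greeter(Protocol):
    def hello(self) -> str: ...


class NamedGreeter:
    # the constructor needs an argument the protocol knows nothing about
    def __init__(self, name: str) -> None:
        self._name = name

    def hello(self) -> str:
        return f"hello, {self._name}"


def build(cls: type[Greeter]) -> Greeter:
    # instantiation is what the quoted PEP 544 rule is meant to permit
    return cls()


try:
    build(NamedGreeter)
except TypeError as error:
    failure = str(error)
else:
    failure = ""

# the call raised because the required `name` argument was missing
assert "name" in failure
```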

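Looking at the experience of other languages... in the Golang ecosystem a constructor is a plain function, so an interface type naturally never includes one. People who write C++ have probably never heard of an abstract constructor either; only the destructor ever gets marked virtual. Back in Python itself, the two major tools, mypy and pyright, both allow the signature of __init__ to change along the inheritance chain. (see: python/typing · Discussion #1305)

As for the documentation of typing.Type, it is vague enough that I suspect experienced readers are even more likely to misread it.

type[C] … may accept values that are classes themselves …

Even if we give up on protocols and define interfaces only with concrete classes, the concrete-class-only rule causes another problem: how should a user pass a function type?

c.register(Callable[[int, int], int], lambda a, b: a + b) # ????

What happened to functions as first-class citizens? Why does it stop working as soon as we need to pass the type?

While reading through the issues, I found that the repo of another DI framework ran into the same problem, #143 · python-injector/injector, and suddenly I felt less alone.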
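It may help to see that the limitation is purely a static-checking matter. In the sketch below, a plain dict stands in for a container (the registry name and the example classes are hypothetical), and both a protocol and a Callable type work fine as runtime keys:

```python
from typing import Any, Callable, Protocol


class SomeInterface(Protocol):
    def hello(self) -> str: ...


class SomeImplementation:
    def hello(self) -> str:
        return "hello"


# hypothetical registry keyed by whatever the user passes as "the type"
registry: dict[Any, Any] = {}
registry[SomeInterface] = SomeImplementation()
registry[Callable[[int, int], int]] = lambda a, b: a + b

assert registry[SomeInterface].hello() == "hello"
assert registry[Callable[[int, int], int]](1, 2) == 3
```

Annotating the key as Any keeps the checker quiet, but it also gives up the inferred return type, which is exactly the trade-off the type[T] annotation was supposed to avoid.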

The Future

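Since PEP 544 has been final since 2017 and mypy has run this check by default for years, it is probably too late to change the behavior now.

To address this, in 2020 someone opened a new issue, #9773 · python/mypy, proposing a new annotation TypeForm[T]/TypeExpr[T] to express "the type of any type". As of now (2024-06), the corresponding PEP 747 draft has also been submitted.

If all goes well, we will use TypeExpr[T] to write this kind of generic function in the future: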

class Container:
    def invoke(self, t: TypeExpr[T]) -> T: ...  # search and build an instance of type t

class SomeType(Protocol): ...

c = Container()
instance = c.invoke(SomeType) # ok, we find an object for type SomeType for you!
operator = c.invoke(Callable[[int], bool]) # you need a (int -> bool)? no problem!

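As for now... library users can add the following line to the files that use this kind of library. I think the scope of the change and its impact should both be acceptable.

# mypy: disable-error-code="type-abstract"

Here's hoping for the day the Python typing system is complete.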

Timeline

2016-07: #1843 · python/mypy Guido raises the need to instantiate
2017-05: PEP 544 standardized and published
2018-05: #4717 · python/mypy first discussion against type[T] design
2020-04: #143 · python-injector/injector
2020-10: #9773 · python/mypy propose idea of TypeForm[T]
2024-06: PEP 747 draft created

Posted 2022-04-20tips

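Setting Up a SOCKS Proxy with SSH

Because of the pandemic we are WFH again. The company provides some VPN solutions for reaching the internal network, but the admin consoles of some services hosted on public cloud restrict source IPs and cannot be reached directly from home.

This is where SSH's built-in SOCKS proxy server comes in handy!

When establishing a connection, SSH can also open a SOCKS (version 4 and 5) server on the local machine. Any application can then send arbitrary TCP connections through this SOCKS server; they are forwarded to the SSH server, which connects to the target site. Since everyone surely has a machine at the office they can SSH into (?), admin consoles restricted to company IPs become reachable. :D

Usage is simple: just add a few options when connecting with SSH.

ssh "target-machine" -D "localhost:1080" -N

-D localhost:1080: the local address and port for the SOCKS server; 1080 is the port the RFC suggests
-N: do not run a remote command; add this when you only need the SOCKS proxy and no shell

Note: SSH supports SOCKS5 (which can proxy IPv6) but not authentication. Since the SOCKS server can be bound to localhost only, as configured above, this is not much of a problem.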
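To check that the tunnel is up and really offers unauthenticated SOCKS5, one can send the protocol's opening greeting by hand. A minimal Python sketch: the function names are hypothetical, the byte values follow the SOCKS5 greeting defined in RFC 1928, and the default host and port assume the ssh command above.

```python
import socket

SOCKS5_VERSION = 0x05
NO_AUTHENTICATION = 0x00


def build_greeting() -> bytes:
    # version 5, one method offered, method 0x00 (no authentication)
    return bytes([SOCKS5_VERSION, 0x01, NO_AUTHENTICATION])


def accepts_no_auth(reply: bytes) -> bool:
    # the server answers with two bytes: version and the method it selected
    return reply == bytes([SOCKS5_VERSION, NO_AUTHENTICATION])


def probe(host: str = "localhost", port: int = 1080, timeout: float = 3.0) -> bool:
    """Return True if a SOCKS5 server on host:port accepts unauthenticated clients."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_greeting())
        # simplification: a robust client would loop until both bytes arrive
        return accepts_no_auth(conn.recv(2))
```

Calling probe() while the ssh command is running should return True; with nothing listening on the port, create_connection raises an OSError instead.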

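Next we can configure the proxy at the OS level or at the application level to use it! For the problem I started with, I usually open a separate Firefox instance configured to use the proxy for the company's various admin consoles. That way all other traffic still goes out directly without passing through the proxy. :D

To start the proxy quickly, set up a Windows Terminal profile that runs the SSH command above.

PuTTY, the most widely used SSH client on Windows, also supports SOCKS proxying; see: How To Set up a SOCKS Proxy Using Putty & SSH - Security Musings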
