Pipeline Style Map-Reduce in Python
Since C++20, C++ provides a new style of data processing with ranges: views can be chained onto one another with the `|` operator, and the resulting pipeline is evaluated lazily.
Can we use this style in Python? Yes! :D
Evaluation of Operators
In Python, every expression involving a binary operator, e.g. `a + b`, is evaluated by the following rules:
- If the (forward) special method, e.g. `__add__`, exists on the left operand:
  - It's invoked on the left operand, e.g. `a.__add__(b)`.
  - If the invocation returns some meaningful value other than `NotImplemented`, done!
- If the forward special method does not exist, or the invocation returns `NotImplemented`, then:
  - If the reverse special method, e.g. `__radd__`, exists on the right operand:
    - It's invoked on the right operand, e.g. `b.__radd__(a)`.
    - If the invocation returns some meaningful value other than `NotImplemented`, done!
  - Otherwise (the reverse method is missing or also returns `NotImplemented`), `TypeError` is raised.

Two refinements: the reverse method is only tried when the operands have different types, and if the right operand's type is a subclass of the left operand's type that overrides the reverse method, that reverse method is tried first. Neither matters for what follows.
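A small sketch of these rules in action. The classes `Left` and `Right` are hypothetical and only exist to trigger each branch: a forward method that declines, a reverse method that picks up the work, and the `TypeError` when nobody handles the operation.

```python
class Left:
    """Its forward method always declines to handle the operation."""

    def __add__(self, other):
        return NotImplemented  # "I don't know how to add that"


class Right:
    """Only knows how to be added from the right."""

    def __radd__(self, other):
        return f"Right.__radd__ got a {type(other).__name__}"


# Left.__add__ returns NotImplemented, so Right.__radd__ is tried next.
result = Left() + Right()

# int.__add__ does not recognize Right and returns NotImplemented: same fallback.
from_int = 1 + Right()

# Both operands are Left: the forward method declines and there is no
# reverse method to fall back on, so Python raises TypeError.
try:
    Left() + Left()
    raised = False
except TypeError:
    raised = True
```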
So it seems possible... Let's run a quick experiment:
```python
class Adder:
```
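A minimal version of that experiment, assuming `Adder` just stores a number and adds it to whatever is piped into it through `__ror__`:

```python
class Adder:
    """Adds a fixed number to whatever is piped into it."""

    def __init__(self, value: int) -> None:
        self.value = value

    def __ror__(self, other: int) -> int:
        # Reached when the left operand's __or__ (here int.__or__)
        # returns NotImplemented for an Adder.
        return other + self.value


result = 2 | Adder(3)  # int.__or__ gives up, then Adder(3).__ror__(2) runs
```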
This works because the `|` operator of the integer `2` (that is, `int.__or__`) checks the type of `Adder(3)`, finds that it is not something it recognizes, and returns `NotImplemented`, so Python falls back to our reverse magic method, `Adder.__ror__`.
In C++, the `|` operator is overloaded for range adaptors so that a range can be piped into them.
So maybe we can do something similar: an object that implements `__ror__`, accepts an iterable, and returns another value (probably an iterator).
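A minimal sketch of that idea, with a hypothetical `Squares` stage. Its `__ror__` returns a generator, so nothing is computed until the result is iterated; that laziness is what makes it safe to pipe in an infinite iterator such as `itertools.count()`.

```python
import itertools
from typing import Iterable, Iterator


class Squares:
    """A pipe stage: `iterable | Squares()` yields the square of each item."""

    def __ror__(self, items: Iterable[int]) -> Iterator[int]:
        # A generator expression: nothing is computed until it is iterated.
        return (v * v for v in items)


# itertools.count() never ends, but only five squares are ever computed.
lazy = itertools.count() | Squares()
first_five = list(itertools.islice(lazy, 5))
```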
Pipe-able Higher-Order Functions
So back to our motivation: Python already has `filter`, `map`, and `reduce` (in `functools` since Python 3), and also the powerful generator expression, which filters and/or maps without an explicit function call.
```python
values = filter(lambda v: v % 2 == 0, range(10))
```

```python
values = (v for v in range(10) if v % 2 == 0)
```
But it's hard to chain multiple operations together while preserving readability: every extra step wraps the previous one in another call, so the code reads inside-out.
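As an illustration (the computation itself is just an example), here is a three-step job with only the built-ins: keep the even numbers below 10, square them, and sum them. The first step is the innermost call and the last step the outermost.

```python
from functools import reduce

# Keep the even numbers below 10, square them, then sum them.
# The first step (filter) is the innermost call and the last (reduce) the outermost.
total = reduce(
    lambda acc, v: acc + v,
    map(lambda v: v * v, filter(lambda v: v % 2 == 0, range(10))),
    0,
)
```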
So let's make a filter object that supports the `|` operator:
```python
class Filter:
```
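A minimal sketch of such a `Filter`, assuming it captures the predicate in `__init__` and hands the piped-in iterable to the built-in `filter` in `__ror__`:

```python
from typing import Callable, Generic, Iterable, Iterator, TypeVar

_T = TypeVar("_T")


class Filter(Generic[_T]):
    """`iterable | Filter(pred)` lazily keeps the items for which pred is true."""

    def __init__(self, pred: Callable[[_T], bool]) -> None:
        self.pred = pred  # capture the predicate now...

    def __ror__(self, items: Iterable[_T]) -> Iterator[_T]:
        return filter(self.pred, items)  # ...and apply it when the iterable arrives


evens = list(range(10) | Filter(lambda v: v % 2 == 0))
```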
How about map?
```python
class Mapper:
```
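`Mapper` can follow the same pattern; a minimal sketch, assuming it delegates to the built-in `map`:

```python
from typing import Callable, Generic, Iterable, Iterator, TypeVar

_T = TypeVar("_T")
_U = TypeVar("_U")


class Mapper(Generic[_T, _U]):
    """`iterable | Mapper(func)` lazily applies func to every item."""

    def __init__(self, func: Callable[[_T], _U]) -> None:
        self.func = func

    def __ror__(self, items: Iterable[_T]) -> Iterator[_U]:
        return map(self.func, items)


labels = list(range(3) | Mapper(lambda v: v + 1) | Mapper(str))
```

Because `|` is left-associative and each `__ror__` returns an iterator, `range(3) | Mapper(...)` is itself an iterator that can be piped into the next stage.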
Works well, we are great again!!
It just takes some time for us to write a class for `filter`, `map`, `reduce`, `take`, `any`... and any other higher-order function you may find useful.
Wait, this looks so tedious. Python is supposed to be a powerful language, isn't it?
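For instance, `take` and `reduce` written in the same hand-written style might look like this (the class names `Take` and `Reducer` are hypothetical). Each one repeats the same capture-then-`__ror__` boilerplate; `Reducer` also shows that a stage can end the pipeline by returning a plain value instead of an iterator.

```python
import functools
import itertools
from typing import Any, Callable, Iterable, Iterator


class Take:
    """`iterable | Take(n)` lazily keeps only the first n items."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __ror__(self, items: Iterable[Any]) -> Iterator[Any]:
        return itertools.islice(items, self.n)


class Reducer:
    """`iterable | Reducer(func, initial)` folds all items into one value."""

    def __init__(self, func: Callable[[Any, Any], Any], initial: Any) -> None:
        self.func = func
        self.initial = initial

    def __ror__(self, items: Iterable[Any]) -> Any:
        # Consumes the whole iterable, so this stage terminates the pipeline.
        return functools.reduce(self.func, items, self.initial)


total = range(100) | Take(5) | Reducer(lambda acc, v: acc + v, 0)
```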
Piper and Decorators
Writing the argument capturing and the `__ror__` implementation for every higher-order function gets annoying.
If we make sure `__ror__` only takes the left operand and returns the return value of the captured function, then we can extract a common `Piper` class. We just need another function that produces a `Piper` with the required logic already captured.
```python
class Piper(Generic[_T, _U]):
```
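A minimal sketch of such a `Piper`, assuming it simply stores a one-argument function and calls it from `__ror__`. The wrapper functions `keep_if` and `transform` are hypothetical; each one only describes its logic in an inner `apply` function and hands it to `Piper`.

```python
from typing import Callable, Generic, Iterable, Iterator, TypeVar

_T = TypeVar("_T")
_U = TypeVar("_U")


class Piper(Generic[_T, _U]):
    """Wraps a one-argument function so that `value | piper` calls it on `value`."""

    def __init__(self, func: Callable[[_T], _U]) -> None:
        self._func = func

    def __ror__(self, other: _T) -> _U:
        # Reached once the left operand's __or__ is missing or returns NotImplemented.
        return self._func(other)


# Hypothetical wrappers: only the body of `apply` differs between them.
def keep_if(pred: Callable[[_T], bool]) -> Piper[Iterable[_T], Iterator[_T]]:
    def apply(items: Iterable[_T]) -> Iterator[_T]:
        return filter(pred, items)
    return Piper(apply)


def transform(func: Callable[[_T], _U]) -> Piper[Iterable[_T], Iterator[_U]]:
    def apply(items: Iterable[_T]) -> Iterator[_U]:
        return map(func, items)
    return Piper(apply)


tens = list(range(5) | keep_if(lambda v: v % 2 == 0) | transform(lambda v: v * 10))
```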
Now it looks a little nicer... but do we still need to write a wrapper function for every kind of operation?
Again, the only difference between these wrapper functions is the logic inside the `apply` function, so we can extract that part too, with a decorator!! :D
```python
def on(func: Callable[Concatenate[_T, _P], _R]) -> Callable[_P, Piper[_T, _R]]:
```
The `on` decorator accepts some function `func` and returns a function that first takes the tail arguments of `func` (everything after its first parameter) and returns a `Piper` that receives the head argument of `func` (its first parameter) through the pipe operator.
So now we can express our thoughts in pipeline-style code with just one helper class and one helper decorator! :D
```python
values = range(10)
```
or
```python
for val in range(10) | filter(lambda val: val % 2 == 0):  # pipe-able filter, not the built-in
    ...
```
Appendix
Complete type-safe code here