I would like to nominate Arun Thirupathi (GitHub ID: arunthirupathi) as a Presto committer. Arun has
contributed extensively to the Presto ORC reader and writer, and he is now one of the few people in the Presto community with a deep understanding of columnar file formats.
Arun has also worked on Presto core and contributed to multiple Presto modules. He improves the code through
continual refactoring, additional tests, and better documentation. He has reviewed most changes touching the columnar file formats, as well as
changes from many different contributors, and he provides high-quality feedback.
In addition, Arun is not new to open source. He was an active member of the Voldemort project, a distributed key/value
store that was once popular.
In detail, Arun has:
48 commits
10K lines added and 4K lines removed
30+ PR reviews and 140+ review comments
The major contributions include:
Rewrote the ORC dictionary writer to improve performance by 3X.
Improved the performance of queries that use map functions such as MAP_AGG and ELEMENT_AT by introducing lazy hash tables in Presto.
Improved the IO performance of the Presto ORC reader and writer by introducing new layouts and configurable tail sizes.
Optimized dictionary writer performance by using chunked memory and optimized data structures.
Fixed multiple bugs in presto-orc, including handling of stripes with 2 billion rows, IO errors being masked, and Hive filter pushdown.
Improved columnar statistics memory efficiency.
Simplified the presto-orc code through continual refactoring.
Maintained and upgraded Presto dependencies such as orc-protobuf, fastutil, and hive-apache.