[Proposal] Adding presto-query-predictor as a new top-level project in Presto on GitHub


zluo@...
 


Hi,

We'd like to add presto-query-predictor as a new top-level project in Presto.

The Presto query predictor uses machine learning to provide a quick estimate of a Presto query's resource usage (CPU time and peak memory bytes). It does so by training ML models on historical Presto logs. At Twitter, the project has helped with load balancing, traffic management, and similar tasks.

We have currently open-sourced the project in a separate branch of the twitter-forks presto repo:
https://github.com/twitter-forks/presto/tree/query-predictor/presto-query-predictor
The documentation is served at
https://chunxutang.github.io/presto-query-predictor-docs/
The codebase is written in Python.

Why create a new repo for the project?

Since open-sourcing the project, we have received interest, questions, and feature requests from multiple Presto developers and users. Keeping the project in a branch of the Twitter fork of Presto causes problems with code sharing, the Python build process, and feature support.

For example, we don't have a dedicated GitHub issue tracker for the project, which makes it inconvenient for us to answer questions or respond to feature requests. It's also cumbersome to set up a unified build process for the Python module.

By creating a new repo under the Presto umbrella, we would get:
- A unified place to answer questions and feature requests.
- A primary repo/branch for releases and Python package maintenance.
- An easily discoverable codebase for viewing and sharing.
- More collaboration with the open-source community on introducing ML techniques to the Presto ecosystem.

Please reply if you have any questions or concerns about this project.

Thanks,
Zhenxiao
