[Proposal] Adding presto-query-predictor as a new top-level project in Presto on GitHub
We'd like to add presto-query-predictor as a new top-level project in Presto.
The Presto query predictor uses machine learning to provide a quick estimate of the resource usage (CPU time and peak memory bytes) of a Presto query. It does this by training ML models on historical Presto logs. At Twitter, the project has helped with load balancing, traffic management, etc.
Currently, we have open-sourced the project in a separate branch in the twitter-fork presto repo.
The documentation is served at
The codebase is written in Python.
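To make the idea of "training ML models from historical Presto logs" concrete, here is a minimal, self-contained sketch. It is not the project's actual model or API: the log rows, the 30-second threshold, the "light"/"heavy" labels, and the function names are all hypothetical, and a toy multinomial Naive Bayes classifier over query tokens stands in for whatever models the project really trains.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical historical log records: (query text, CPU time in seconds).
# Real Presto logs carry many more fields; these rows are made up.
HISTORY = [
    ("SELECT id FROM users WHERE id = 42", 0.4),
    ("SELECT name FROM users LIMIT 10", 0.6),
    ("SELECT count(*) FROM events GROUP BY day", 45.0),
    ("SELECT a.id FROM events a JOIN clicks b ON a.id = b.id GROUP BY a.id", 120.0),
]


def tokenize(sql):
    """Lowercase the query and split it into alphabetic word tokens."""
    return re.findall(r"[a-z_]+", sql.lower())


def bucket(cpu_seconds, threshold=30.0):
    """Map a raw CPU time to a coarse label (the threshold is an assumption)."""
    return "heavy" if cpu_seconds >= threshold else "light"


def train(history):
    """Count tokens per label; these counts are the Naive Bayes model."""
    token_counts = defaultdict(Counter)
    label_counts = Counter()
    for sql, cpu in history:
        label = bucket(cpu)
        label_counts[label] += 1
        token_counts[label].update(tokenize(sql))
    vocab = {tok for counts in token_counts.values() for tok in counts}
    return token_counts, label_counts, vocab


def predict(model, sql):
    """Return the label with the highest log-probability for a new query."""
    token_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        # Laplace (add-one) smoothing over the training vocabulary.
        denom = sum(token_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in tokenize(sql):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][tok] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(HISTORY)
```

Calling `predict(model, some_sql)` then returns a coarse CPU-time bucket for a query before it runs, which is the kind of estimate a load balancer could act on. A peak-memory predictor could be sketched the same way with a different label function.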
Why create a new repo for the project?
Since open-sourcing the project, we have received interest, questions, and feature requests from multiple Presto developers and users. Keeping the project in a branch of the Twitter-forked Presto repo causes problems with code sharing, the Python build process, and feature support.
For example, we don't have a dedicated GitHub issue tracker for the project, which makes it inconvenient for us to answer questions or handle feature requests. It's also cumbersome to create a unified build process for the Python module.
By creating a new repo under the Presto umbrella, we would get:
A unified platform to answer questions and feature requests.
A primary repo/branch for releases and Python package maintenance.
An easily discovered codebase for viewing and sharing.
More collaboration with the open-source community on introducing ML techniques to the Presto ecosystem.
Please reply if you have any questions or concerns about this proposal.