[Proposal] Adding presto-query-predictor as a new top-level project in Presto on GitHub
zluo@...
Hi,

We'd like to add presto-query-predictor as a new top-level project in Presto. The Presto query predictor applies machine learning techniques to quickly estimate the resource usage (CPU time and peak memory bytes) of a Presto query. It does this by training ML models on historical Presto logs. At Twitter, the project has helped with load balancing, traffic management, and similar tasks.

We have currently open-sourced the project in a separate branch of the twitter-forks presto repo:
https://github.com/twitter-forks/presto/tree/query-predictor/presto-query-predictor

The documentation is served at https://chunxutang.github.io/presto-query-predictor-docs/

The codebase is written in Python.

Why create a new repo for the project?

Since open-sourcing the project, we have received interest, questions, and feature requests from multiple Presto developers and users. Keeping the project in a branch of the twitter-forks presto repo causes problems with code sharing, the Python build process, and feature support. For example, we don’t have a dedicated GitHub issue tracker for the project, which makes it inconvenient for us to answer questions and handle feature requests. It’s also cumbersome to set up a unified build process for the Python module.

By creating a new repo under the Presto umbrella, we would get:
- A unified platform for answering questions and feature requests.
- A primary repo/branch for releases and Python package maintenance.
- An easily discoverable codebase for viewing and sharing.
- More collaboration with the open-source community on introducing ML techniques to the Presto ecosystem.

Please reply if you have any questions or concerns about this project.

Thanks,
Zhenxiao