joy
Joy is a project that aims to make it easier to create high-performance data processing logic.
- Joy leverages LLVM to generate code dynamically.
- Joy doesn't define its own intermediate IR, because:
  - SQL can serve as the IR
  - SQL has a fairly small number of operations, so we can create optimized operators for all of them
- Joy provides an API, hopefully a simple one, to create and optimize the needed operators without requiring LLVM knowledge.
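The following minimal Rust sketch illustrates why SQL itself can act as the IR. It assumes a hypothetical, deliberately tiny operator set (SqlOp) over a single i64 column. Each SQL operation dispatches straight to its own specialized kernel, with no separate intermediate language in between. None of these names come from Joy's actual API.

```rust
// Hypothetical sketch, not Joy's API: SQL operations used directly as the IR.

/// Illustrative operator set: each variant is one SQL operation.
#[derive(Debug, Clone)]
pub enum SqlOp {
    /// WHERE col > threshold
    FilterGt(i64),
    /// SELECT col * factor
    ProjectMul(i64),
    /// SELECT SUM(col)
    Sum,
}

/// Specialized kernel for FilterGt: keeps values greater than `threshold`.
fn filter_gt(input: &[i64], threshold: i64) -> Vec<i64> {
    input.iter().copied().filter(|&v| v > threshold).collect()
}

/// Specialized kernel for ProjectMul: multiplies every value by `factor`.
fn project_mul(input: &[i64], factor: i64) -> Vec<i64> {
    input.iter().map(|&v| v * factor).collect()
}

/// Specialized kernel for Sum: returns a one-element column holding the total.
fn sum(input: &[i64]) -> Vec<i64> {
    vec![input.iter().sum::<i64>()]
}

/// Runs a pipeline of SQL operators over one i64 column,
/// dispatching each operator to its own kernel.
pub fn run_pipeline(ops: &[SqlOp], column: &[i64]) -> Vec<i64> {
    let mut current = column.to_vec();
    for op in ops {
        current = match op {
            SqlOp::FilterGt(t) => filter_gt(&current, *t),
            SqlOp::ProjectMul(f) => project_mul(&current, *f),
            SqlOp::Sum => sum(&current),
        };
    }
    current
}
```

For example, run_pipeline(&[SqlOp::FilterGt(2), SqlOp::ProjectMul(10), SqlOp::Sum], &[1, 2, 3, 4]) keeps 3 and 4, scales them to 30 and 40, and returns [70].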
Introduction
Bring happiness to data processing. 😃
API overview:
- table API
- group-by API
- codegen API
Architecture
Usage
Requirements:
- provide high-performance atom operators, which can be combined into a task
- SqlJit compiler
  - fusion capability to perform task-level optimization, as in Weld
  - optimizations:
    - dynamic vector size: the compiler should take the CPU capabilities into account and decide on, for example, the vector size, to leverage SIMD while also reducing CPU cache misses
    - type compaction (requires statistics), for example long -> int -> short, phone -> long
    - ensure cache-line alignment
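The following Rust sketch shows the type-compaction and alignment requirements. It assumes min/max statistics are already available for a column. The names choose_type, compact_to_i16 and CacheAligned are hypothetical. The 64-byte cache-line size is an assumption that is typical for x86-64; the code does not detect it from the CPU.

```rust
// Hypothetical sketch, not Joy's API: statistics-driven type compaction
// and cache-line alignment.
use std::convert::TryFrom;
use std::mem::align_of;

/// Narrowest integer type that can hold a column, chosen from statistics.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompactType {
    I8,
    I16,
    I32,
    I64,
}

/// Picks the narrowest signed integer type covering [min, max].
/// `min` and `max` are assumed to come from column statistics.
pub fn choose_type(min: i64, max: i64) -> CompactType {
    let fits = |lo: i64, hi: i64| min >= lo && max <= hi;
    if fits(i8::MIN as i64, i8::MAX as i64) {
        CompactType::I8
    } else if fits(i16::MIN as i64, i16::MAX as i64) {
        CompactType::I16
    } else if fits(i32::MIN as i64, i32::MAX as i64) {
        CompactType::I32
    } else {
        CompactType::I64
    }
}

/// Compacts an i64 column to i16. Returns None if any value does not fit,
/// so the caller can keep the original i64 representation.
pub fn compact_to_i16(column: &[i64]) -> Option<Vec<i16>> {
    column.iter().map(|&v| i16::try_from(v).ok()).collect()
}

/// Wrapper that forces its contents to start on a 64-byte boundary
/// (64 bytes is an assumed cache-line size).
#[repr(C, align(64))]
pub struct CacheAligned<T>(pub T);

/// Alignment of the wrapper, for illustration.
pub fn cache_aligned_alignment() -> usize {
    align_of::<CacheAligned<[i32; 4]>>()
}
```

For example, choose_type(-200, 30000) returns CompactType::I16, so that column could be stored in a quarter of the space an i64 column needs.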
Use TPC-H Q1 as an example to build the group-by aggregator functionality.
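The following Rust sketch shows the group-by aggregator for TPC-H Q1. It is written by hand rather than generated, to show the work that generated code would have to do. It assumes lineitem has already been decoded into row structs, with ship dates stored as day numbers. A BTreeMap keeps the groups ordered by (l_returnflag, l_linestatus), as Q1's ORDER BY requires; a real engine would more likely use a hash table plus a separate sort. The averages are derived from the sums and the count.

```rust
// Hypothetical sketch, not Joy's implementation: TPC-H Q1 aggregation.
use std::collections::BTreeMap;

/// The lineitem columns that Q1 reads. Dates are day numbers (an assumption).
pub struct LineItem {
    pub returnflag: u8,
    pub linestatus: u8,
    pub quantity: f64,
    pub extendedprice: f64,
    pub discount: f64,
    pub tax: f64,
    pub shipdate: i32,
}

/// Running aggregates for one (l_returnflag, l_linestatus) group.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Q1Agg {
    pub sum_qty: f64,
    pub sum_base_price: f64,
    pub sum_disc_price: f64,
    pub sum_charge: f64,
    pub sum_disc: f64,
    pub count: u64,
}

impl Q1Agg {
    pub fn avg_qty(&self) -> f64 { self.sum_qty / self.count as f64 }
    pub fn avg_price(&self) -> f64 { self.sum_base_price / self.count as f64 }
    pub fn avg_disc(&self) -> f64 { self.sum_disc / self.count as f64 }
}

/// Filters rows on l_shipdate <= cutoff_day and aggregates per group.
/// The BTreeMap (instead of a hash table) keeps output sorted by group key.
pub fn tpch_q1(rows: &[LineItem], cutoff_day: i32) -> BTreeMap<(u8, u8), Q1Agg> {
    let mut groups: BTreeMap<(u8, u8), Q1Agg> = BTreeMap::new();
    for r in rows.iter().filter(|r| r.shipdate <= cutoff_day) {
        let agg = groups.entry((r.returnflag, r.linestatus)).or_default();
        // l_extendedprice * (1 - l_discount)
        let disc_price = r.extendedprice * (1.0 - r.discount);
        agg.sum_qty += r.quantity;
        agg.sum_base_price += r.extendedprice;
        agg.sum_disc_price += disc_price;
        // l_extendedprice * (1 - l_discount) * (1 + l_tax)
        agg.sum_charge += disc_price * (1.0 + r.tax);
        agg.sum_disc += r.discount;
        agg.count += 1;
    }
    groups
}
```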
The purpose of this project is to create a SQL processing engine using Miri.
Rust Miri (https://github.com/rust-lang/miri) is an interpreter for Rust's mid-level IR (MIR), so it can be used to interpret and run Rust code.
The idea is to compile SQL into Rust closures and run them using Miri.
Weld: maintains its own language. Joy: uses closures instead.
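The following is a minimal sketch of compiling SQL into Rust closures. It assumes a hypothetical four-node expression tree (Expr). The tree is walked once at compile time. After that, each row is evaluated by calling nested closures, with no further matching on the tree. The sketch runs the closures natively; running them under Miri, as proposed above, is a separate step and is not shown here.

```rust
// Hypothetical sketch, not Joy's API: compiling an expression tree to closures.

/// Tiny illustrative SQL expression tree over i64 columns.
pub enum Expr {
    /// Column reference by index.
    Col(usize),
    /// Integer literal.
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
    /// Comparison; yields 1 for true and 0 for false.
    Gt(Box<Expr>, Box<Expr>),
}

/// A compiled expression: evaluates against one row of column values.
pub type Compiled = Box<dyn Fn(&[i64]) -> i64>;

/// Compiles the tree once into nested closures.
pub fn compile(expr: &Expr) -> Compiled {
    match expr {
        Expr::Col(i) => {
            let i = *i;
            Box::new(move |row: &[i64]| row[i])
        }
        Expr::Lit(v) => {
            let v = *v;
            Box::new(move |_: &[i64]| v)
        }
        Expr::Add(a, b) => {
            let (fa, fb) = (compile(a), compile(b));
            Box::new(move |row: &[i64]| fa(row) + fb(row))
        }
        Expr::Gt(a, b) => {
            let (fa, fb) = (compile(a), compile(b));
            Box::new(move |row: &[i64]| (fa(row) > fb(row)) as i64)
        }
    }
}
```

For instance, compiling col0 + 5 > 10 and calling the result on the row [7, 3] returns 1. Calling it on [2, 3] returns 0.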
- create a new SQL JIT compiler leveraging llvm-sys
- adopt the optimizer from Weld?
Weld:
- uses closure syntax with its own parser, which means a lot of code to maintain
- requires more time to parse and visit the AST, which could hurt JIT performance
- we can potentially borrow its optimizer passes and implement more
Data Types
The type system should be as transparent as possible; ideally, we should be able to
use native data types such as i32, i64, f32 and u8 directly.
Column::create()
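The README only names Column::create(). The following sketch guesses at one possible shape. It assumes a generic Column<T> that stores native values in a Vec<T>, so columns of i32, i64, f32 or u8 need no wrapper type. The signature of create and the helper methods are hypothetical.

```rust
// Hypothetical sketch, not Joy's API: a column of native Rust values.

/// Column that stores native values (i32, i64, f32, u8, ...) directly.
#[derive(Debug, Clone, PartialEq)]
pub struct Column<T> {
    name: String,
    data: Vec<T>,
}

impl<T: Copy> Column<T> {
    /// Creates a column from a name and a vector of native values.
    /// This signature is an assumption; the README only names Column::create().
    pub fn create(name: &str, data: Vec<T>) -> Self {
        Column { name: name.to_string(), data }
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn len(&self) -> usize {
        self.data.len()
    }

    /// Contiguous slice of native values that tight loops can use directly.
    pub fn values(&self) -> &[T] {
        &self.data
    }
}
```

For example, Column::create("qty", vec![1i32, 2, 3]) yields a Column<i32>. Its values() slice can be passed straight into tight loops, which the compiler may be able to auto-vectorize.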
Code Gen
- direct LLVM codegen without a parser, to reduce codegen latency
- direct optimization of the generated code, to reduce the time needed for optimization passes
- expose a high-level data processing codegen API for the community to create optimized data processing logic (whereas Weld exposes an IR)
MCJIT?? --> ORC JIT (On Request Compilation)
Code Gen Simplified
The Joy project aims to:
- provide a codegen framework that requires NO knowledge of LLVM
- offer zero-overhead codegen: the framework should not add any overhead to the generated code
- provide a framework with a uniop (single-input) and a binop (two-input) abstraction. Can we provide a trait for each operator type? How does the trait plug into the codegen? What is the benefit of using codegen for join?
- support vectorized input out of the box
- allow pluggable logic such as group-by and join
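The following Rust sketch suggests one possible answer to the trait question. It defines hypothetical UniOp and BinOp traits, plus vectorized drivers (map_uni, map_bin) that are generic over the operator. Because the drivers are generic rather than taking trait objects, the compiler monomorphizes one loop per operator, so there is no per-element dynamic dispatch. This is the usual way Rust achieves zero-overhead abstraction. The code is ordinary Rust rather than LLVM codegen, so it only illustrates the API shape.

```rust
// Hypothetical sketch, not Joy's API: operator traits with vectorized drivers.

/// Single-input operator (uniop).
pub trait UniOp<T> {
    fn apply(&self, x: T) -> T;
}

/// Two-input operator (binop).
pub trait BinOp<T> {
    fn apply(&self, a: T, b: T) -> T;
}

pub struct NegateOp;
impl UniOp<i64> for NegateOp {
    #[inline]
    fn apply(&self, x: i64) -> i64 {
        -x
    }
}

pub struct AddOp;
impl BinOp<i64> for AddOp {
    #[inline]
    fn apply(&self, a: i64, b: i64) -> i64 {
        a + b
    }
}

/// Vectorized driver for any uniop. Generic over O, so each operator
/// gets its own monomorphized loop without dynamic dispatch.
pub fn map_uni<T: Copy, O: UniOp<T>>(op: &O, input: &[T], out: &mut Vec<T>) {
    out.clear();
    out.extend(input.iter().map(|&x| op.apply(x)));
}

/// Vectorized driver for any binop over two equal-length columns.
pub fn map_bin<T: Copy, O: BinOp<T>>(op: &O, a: &[T], b: &[T], out: &mut Vec<T>) {
    assert_eq!(a.len(), b.len(), "binop inputs must have equal length");
    out.clear();
    out.extend(a.iter().zip(b).map(|(&x, &y)| op.apply(x, y)));
}
```

Reusing the out buffer across calls avoids reallocating for every batch. A group-by or join could plug in the same way, as another trait with its own generic driver.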
Debug using Visual Studio Code
- Install the Rust and LLDB extensions for VS Code.
- Configure debugging in launch.json: type attach under configurations in launch.json, and the LLDB attach configuration is suggested automatically.
- In the debug panel, click Launch and select the process to debug.
- You can add breakpoints in VS Code and debug the code.
gen()
- The gen function provides boilerplate code that loops over each row.
- It allows generating code that runs while looping over each row.
- It allows composing generated code that processes each row.

The context of the generated code:
1. has access to all of the columns
2. knows which columns are needed
3. accesses all columns via column index
4. knows which column to store the output in
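The following sketch shows the gen() row loop and its context. It is an interpreted stand-in built from Rust closures rather than LLVM codegen, and every name in it is hypothetical. The function is called gen_rows because gen is a reserved keyword in Rust 2024. Each RowStep lists the column indices it reads and the column index it writes. All steps run inside a single pass over the rows, so later steps can consume earlier outputs.

```rust
// Hypothetical sketch, not Joy's gen(): a composable per-row loop.

/// One piece of per-row logic: reads the columns listed in `inputs`
/// (by index), computes a value and writes it to column `output`.
pub struct RowStep {
    pub inputs: Vec<usize>,
    pub output: usize,
    pub body: Box<dyn Fn(&[i64]) -> i64>,
}

/// Boilerplate row loop: visits every row once and runs all steps in order.
/// All columns must have the same length, and output columns must already
/// exist (for example, zero-filled); otherwise indexing panics.
pub fn gen_rows(columns: &mut [Vec<i64>], steps: &[RowStep]) {
    let rows = columns.first().map_or(0, |c| c.len());
    let mut args: Vec<i64> = Vec::new();
    for row in 0..rows {
        for step in steps {
            // Gather only the columns this step needs, by column index.
            args.clear();
            args.extend(step.inputs.iter().map(|&c| columns[c][row]));
            // Store the result in the step's output column.
            columns[step.output][row] = (step.body)(args.as_slice());
        }
    }
}
```

Consider input columns qty = [1, 2, 3] and price = [10, 20, 30], plus two zero-filled output columns. A first step that writes qty * price into column 2 and a second step that writes column 2 plus 1 into column 3 produce [10, 40, 90] and [11, 41, 91].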
C++ Building
- Build with LLVM options (-S -O3 -emit-llvm -fno-discard-value-names):
./build.sh release
- Build without LLVM options:
./build.sh debug