Supported Engines & Query Dialects
Supported Engines
Name | Technology | Query Dialect | Engine Repository | Image | Notes |
---|---|---|---|---|---|
spark | Apache Spark | Spark Streaming SQL | GitHub | Docker Hub | Also used in the SQL shell. Currently the only engine that supports GIS data. |
flink | Apache Flink | Flink Streaming SQL | GitHub | Docker Hub | Has the best support for stream-to-table joins. |
datafusion | DataFusion | DataFusion Batch SQL | - | - | Experimental engine used in the tail command and as an alternative to Spark in the SQL shell. It is batch-oriented, so it is unlikely to be used for transformations. |
Schema Support
Feature | kamu | Spark | Flink | DataFusion |
---|---|---|---|---|
Basic types | ✔️ | ✔️ | ✔️ | ✔️ |
Decimal type | ✔️ | ✔️ | ✔️** | ❌*** |
Nested types | ✔️* | ✔️ | ❌ | ❌ |
GIS types | ✔️* | ✔️ | ❌ | ❌ |
*
There is currently no way to express nested and GIS data types when declaring root dataset schemas, but you can still use them through pre-processing queries.
**
Apache Flink has known issues with the Decimal type and currently relies on our patches that have not yet been upstreamed, so stability is not guaranteed (see FLINK-17804).
***
Arrow lacks compute support for the Decimal type (see ARROW-RS-272).
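When choosing an engine, the Schema Support table above can be read as a lookup: intersect the sets of engines that support each type a dataset needs. The sketch below is a hypothetical Python encoding of the Spark, Flink, and DataFusion columns. The kamu column is omitted because it concerns root dataset schemas, and `SCHEMA_SUPPORT` and `engines_for` are illustrative names, not part of kamu.

```python
# Hypothetical encoding of the Schema Support table; not part of kamu's API.
SUPPORTED = "supported"
WITH_CAVEATS = "supported with caveats"  # e.g. Flink's patched Decimal support
UNSUPPORTED = "unsupported"

SCHEMA_SUPPORT = {
    "Basic types": {"Spark": SUPPORTED, "Flink": SUPPORTED, "DataFusion": SUPPORTED},
    "Decimal type": {"Spark": SUPPORTED, "Flink": WITH_CAVEATS, "DataFusion": UNSUPPORTED},
    "Nested types": {"Spark": SUPPORTED, "Flink": UNSUPPORTED, "DataFusion": UNSUPPORTED},
    "GIS types": {"Spark": SUPPORTED, "Flink": UNSUPPORTED, "DataFusion": UNSUPPORTED},
}


def engines_for(features, allow_caveats=False):
    """Return the engines that support every requested schema feature."""
    accepted = {SUPPORTED, WITH_CAVEATS} if allow_caveats else {SUPPORTED}
    engines = {"Spark", "Flink", "DataFusion"}
    for feature in features:
        # Keep only engines whose status for this feature is acceptable
        engines &= {e for e, status in SCHEMA_SUPPORT[feature].items() if status in accepted}
    return sorted(engines)


print(engines_for(["Decimal type"]))                      # ['Spark']
print(engines_for(["Decimal type"], allow_caveats=True))  # ['Flink', 'Spark']
```

Treating Flink's Decimal support as a separate "with caveats" status keeps the footnoted stability warning visible instead of collapsing it into a plain yes.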
Operation Types
Feature | Spark | Flink |
---|---|---|
Filter | ✔️ | ✔️ |
Map | ✔️ | ✔️ |
Aggregations | ❌* | ✔️ |
Stream-to-Stream Joins | ❌* | ✔️ |
Projection / Temporal Table Joins | ❌* | ✔️ |
GIS extensions | ✔️ | ❌ |
*
The Spark engine is capable of stream processing, but for now we have to use it in batch processing mode, so only row-level operations like map and filter are currently usable, as these do not require correct stream processing and watermarking semantics.
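The reasoning in the footnote above can be seen in a toy example. The sketch below (plain Python, with hypothetical data and function names) feeds the same records to a row-level filter/map and to a per-key sum, once as a whole stream and once as two independently processed batches. The row-level results match, but the aggregates do not: a correct streaming aggregation has to carry state across batches and, in real engines, rely on watermarks to decide when a result is final.

```python
from collections import defaultdict

# Hypothetical stream of (key, value) records delivered in two batches
batches = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("a", 5)],
]
whole = [r for batch in batches for r in batch]


def filter_map(records):
    """Row-level operation: keep values > 1 and double them."""
    return [(k, v * 2) for k, v in records if v > 1]


def sum_by_key(records):
    """Aggregation: total value per key."""
    totals = defaultdict(int)
    for k, v in records:
        totals[k] += v
    return dict(totals)


# Row-level ops: batch-by-batch processing yields the same rows as the whole stream
per_batch_rows = [r for batch in batches for r in filter_map(batch)]
assert per_batch_rows == filter_map(whole)

# Aggregations: independent per-batch results are only partial totals
per_batch_sums = [sum_by_key(batch) for batch in batches]
print(per_batch_sums)     # [{'a': 4, 'b': 2}, {'b': 4, 'a': 5}]
print(sum_by_key(whole))  # {'a': 9, 'b': 6}
```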