Supported Engines

| Name | Technology | Query Dialect | Engine Repository | Image | Notes |
|---|---|---|---|---|---|
| spark | Apache Spark | Spark Streaming SQL | GitHub | Docker Hub | Also used in SQL shell. Currently the only engine that supports GIS data. |
| flink | Apache Flink | Flink Streaming SQL | GitHub | Docker Hub | Has best support for stream-to-table joins |
| datafusion | DataFusion | DataFusion Batch SQL | - | - | Experimental engine used in tail command and as an alternative to Spark in SQL shell. It’s batch-oriented so unlikely to be used for transformations. |

Schema Support

| Feature | kamu | Spark | Flink | DataFusion |
|---|---|---|---|---|
| Basic types | ✔️ | ✔️ | ✔️ | ✔️ |
| Decimal type | ✔️ | ✔️ | ✔️** | ❌*** |
| Nested types | ✔️* | ✔️ | | |
| GIS types | ✔️* | ✔️ | | |

* There is currently no way to express nested and GIS data types when declaring root dataset schemas, but you can still use them through pre-processing queries.

** Apache Flink has known issues with the Decimal type and currently relies on our patches that have not yet been upstreamed, so stability is not guaranteed (see FLINK-17804).

*** Arrow currently lacks compute support for the Decimal type (see ARROW-RS-272).

Operation Types

| Feature | Spark | Flink |
|---|---|---|
| Filter | ✔️ | ✔️ |
| Map | ✔️ | ✔️ |
| Aggregations | ❌* | ✔️ |
| Stream-to-Stream Joins | ❌* | ✔️ |
| Projection / Temporal Table Joins | ❌* | ✔️ |
| GIS extensions | ✔️ | |

* The Spark engine is capable of stream processing, but for now we have to run it in batch processing mode. As a result, only row-level operations such as map and filter are currently usable, since they do not require correct stream processing and watermarking semantics.
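
When planning a transformation, it can help to check the Operation Types table programmatically. The following is a minimal sketch only: the dictionary simply transcribes the table above, and the names `SUPPORTED_OPERATIONS` and `engines_supporting` are hypothetical and not part of kamu. It assumes that an operation without a check mark for an engine should be treated as unsupported.

```python
# Hypothetical helper, not part of kamu: transcribes the Operation Types table.
# Assumption: an operation with no check mark for an engine counts as unsupported.
SUPPORTED_OPERATIONS = {
    "spark": {"filter", "map", "gis_extensions"},
    "flink": {
        "filter",
        "map",
        "aggregations",
        "stream_to_stream_joins",
        "temporal_table_joins",
    },
}


def engines_supporting(required):
    """Return the sorted names of engines that support every required operation."""
    required = set(required)
    return sorted(
        name for name, ops in SUPPORTED_OPERATIONS.items() if required <= ops
    )


print(engines_supporting({"filter", "map"}))                # ['flink', 'spark']
print(engines_supporting({"aggregations"}))                 # ['flink']
print(engines_supporting({"gis_extensions", "aggregations"}))  # []
```

The last call returns an empty list because, per the table, no single engine currently covers both GIS extensions and aggregations.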