Kamu Node is a Kubernetes-native set of applications designed to be easily installable on-premise and in private cloud.
ODF datasets are always safely stored in the storage of your choice, separate from the rest of the system. Multiple storage systems are supported.
Root and derivative datasets can be stored separately, so that only root datasets were (geo-)replicated to safeguard them from catastrophic loss of data. The property that derivative datasets can be fully reconstructed from metadata can enable significant cost savings.
This set of storage systems is needed for operating the node and web platform.
- Metadata - used for efficient access to metadata chains of datasets without making too many queries to dataset storage
- Flows & Tasks - stores the configuration, execution state, and history of various operations performed by the node (ingesting data, derivative transforms, data queries)
- Monitoring - stores traces and metrics for monitoring the health of the deployment
Web platform data:
- Auth - stores user and organization accounts, permissions, and linked cryptographic keys
- Governance - stores issues, discussions, comments, attachments and other information associated with datasets
Data Processing Layer
To perform data-intensive operations on datasets the node maintains a pool of engines of needed type and version (see also Supported Engines). Acting as a Kubernetes controller the Engine Provisioner decides when to start and stop individual engine pods based on demand, configuration, and available capacity in the cluster.
The current demand for data processing is managed by the Task Scheduler that prioritizes individual tasks and hands them over to executors.
These services are the “brain” of the node. They serve API requests as well as decide when to perform various processing and maintenance tasks on datasets and pipelines.
Data-intensive operations like push ingest and SQL queries are handled by the Data Gateway component that exposes data under a wide variety of protocols for reading and writing.
All other kinds of requests are handled by the pool of API Server components that provide access to all functionality of the node via GraphQL and REST APIs.
Services communicate with one-another via Event Bus component.
Kamu Web Platform is a browser application that is served to the browser as a set of static files. The Web Server pods are stateless and linearly scalable. Since application files change infrequently, they can be fronted by a CDN to optimize loading times and decrease the traffic to the deployment.
Web platform application running in a browser communicates with API Server via GraphQL protocol.
Optionally, other front-end applications like Jupyter Hub and Apache Superset can be deployed to provide data science and business intelligence capabilities on top of Kamu Node. We often include them in our deployment examples to show how to connect them to Kamu.