- Backwards-compatible
- Forwards-compatible
Summary
This RFC proposes a new Open Data Fabric manifest format and a set of resource types aiming to re-center ODF around the declarative approach for defining the desired state of the system, achieve better functional decomposition of the system, and provide a better framework for extensibility.
Table of Contents
- Motivation
- General Direction
- Terminology
- Proposal
- Compatibility
- Prior art
- Rationale and alternatives
- Unresolved questions
- Future possibilities
- Appendix A: Example Manifests
Motivation
- The Open Data Fabric spec started and revolved around the single DatasetSnapshot manifest that defines the desired state of dataset metadata. We have used MetadataEvent types in the snapshot type to define quite complex features like polling data ingestion and push sources. With experience, we realized that some of these features don’t belong in dataset metadata: not only do they not contribute to dataset provenance or validity, but they may also reveal sensitive information about the publisher’s infrastructure that must remain private. This RFC proposes to move some of this information out of the dataset metadata chain so that it can be kept private, and at the same time to make the core ODF format spec leaner and easier to adopt.
- Reference implementations like Kamu have been adding many new features that are strongly related to reusable data pipelines but clearly don’t belong in the dataset metadata:
  - Flow schedules and configuration
  - Dataset variables and secrets
  - ReBAC permissions
General Direction
ODF is quickly becoming a “Kubernetes for Data”: a high-level integration framework for different data formats, compute engines, privacy technologies, and data access APIs. We are embracing the Infrastructure-as-Code philosophy:
- Manifest files will be the primary way to instantiate and update all resources in data pipelines
- Configurations can be managed in git
- Configuration will describe the “desired state” of the system
- A reconciliation mechanism will be responsible for matching the actual state with the desired state
Terminology
We will stick to the following terminology:
- Resource - a managed back-end entity that has an identity and consumes or allocates some compute or storage resources
- Manifest - a declarative configuration (e.g. a YAML file) that describes the desired resource state.
- Controller - a process that keeps track of certain manifest types and reconciles them with the state of objects under its control. The purpose of controllers is to constantly move the current resource state towards the desired state described by manifests.
For example:
- A Dataset YAML file is a manifest; applying it will create a corresponding Dataset resource
- VariableSet is a high-level manifest; applying it will create a set of Variable resources
- Ingress is a manifest; applying it will create a high-level Ingress resource. The Ingress controller will then generate and apply lower-level manifests for an ApiEndpoint, a Buffer (e.g. a Kafka topic), and a Source, which will in turn result in provisioning of the corresponding resources.
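To make the layering concrete, here is a minimal Python sketch of a high-level controller expanding an Ingress manifest into lower-level manifests. The spec fields (port, topic), the Type:name reference strings, the schema base URL, and the derived resource names are all hypothetical placeholders, not actual ODF schemas.

```python
# Hypothetical sketch: a high-level controller expanding an Ingress manifest
# into lower-level ApiEndpoint, Buffer, and Source manifests.
# Field names and the schema base URL are illustrative assumptions only.
BASE = "https://example.org/schemas"  # hypothetical schema base URL


def manifest(kind: str, name: str, spec: dict) -> dict:
    """Build a minimal manifest dict with $schema, headers, and spec."""
    return {
        "$schema": f"{BASE}/ingress/v1alpha1/{kind}",
        "headers": {"name": name},
        "spec": spec,
    }


def expand_ingress(ingress: dict) -> list:
    """Derive the lower-level manifests the Ingress controller would apply."""
    name = ingress["headers"]["name"]
    spec = ingress["spec"]
    return [
        manifest("ApiEndpoint", f"{name}-endpoint", {"port": spec["port"]}),
        manifest("Buffer", f"{name}-buffer", {"topic": spec["topic"]}),
        # The Source links to the Buffer by a (hypothetical) Type:name reference
        manifest("Source", f"{name}-source", {"buffer": f"Buffer:{name}-buffer"}),
    ]


ingress = manifest("Ingress", "sensors", {"port": 8080, "topic": "sensor-readings"})
for m in expand_ingress(ingress):
    print(m["$schema"].rsplit("/", 1)[-1], m["headers"]["name"])
```

In this sketch the expansion is a pure function of the Ingress spec, so it could be re-run on every reconciliation attempt and its output applied idempotently.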
Proposal
The proposal will often reference Kubernetes as one of the best examples of the declarative IaC approach. We will draw many parallels with its design and build on its learnings.
Resource Manifests
Similarly to Kubernetes, ODF manifests are structured as follows:
- $schema identifies the type of the resource using a resolvable URL that points to the JSON Schema file
  - This replaces separate apiVersion/kind fields (as in K8s) with a single self-describing identifier
- headers contains identity and ownership information
  - headers is used instead of metadata because the latter is already an overloaded term in ODF and is generic to the point of being meaningless
- spec defines the desired state of the resource
- status contains information about the current state and the reconciliation process
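A minimal Python sketch of these four top-level sections, assuming manifests have already been parsed from YAML into plain dicts. The Manifest class and from_user_dict method are hypothetical; the rejection of status in user input mirrors the rule (see Status) that users never author that section.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Manifest:
    """Hypothetical in-memory form of an ODF resource manifest."""
    schema: str                                          # the $schema URL (resource type)
    headers: Dict[str, Any]                              # identity and ownership info
    spec: Dict[str, Any] = field(default_factory=dict)   # desired state
    status: Optional[Dict[str, Any]] = None              # written by controllers only

    @classmethod
    def from_user_dict(cls, doc: Dict[str, Any]) -> "Manifest":
        """Parse a user-authored manifest, rejecting unknown or reserved keys."""
        allowed = {"$schema", "headers", "spec"}
        unknown = set(doc) - allowed
        if unknown:
            # 'status' is rejected here too: users never author it
            raise ValueError(f"unexpected top-level keys: {sorted(unknown)}")
        if "$schema" not in doc or "headers" not in doc:
            raise ValueError("manifest requires $schema and headers")
        return cls(schema=doc["$schema"], headers=doc["headers"], spec=doc.get("spec", {}))


m = Manifest.from_user_dict({
    "$schema": "https://example.org/schemas/config/v1alpha1/SecretSet",
    "headers": {"name": "postgres"},
    "spec": {},
})
print(m.headers["name"])  # postgres
```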
Type Identification
We use the $schema URL to identify resource types.
The $schema URL is formatted as {base-url}/{context}/{version}/{Name} and carries:
- Controlling organization domain (e.g. opendatafabric.org)
- Bounded context (e.g. config)
- Version (e.g. v1alpha1)
- Resource name (e.g. SecretSet)
Editors and tools that support JSON Schema can recognize the $schema field and automatically fetch the associated JSON schema to provide validation and auto-completion.
Resource schemas will be registered within an ODF node and, similarly to Kubernetes CRDs, assigned a short type name (e.g. SecretSet) that can be used instead of the full schema URL. This concept is very similar to JSON-LD expansion.
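The sketch below parses a $schema URL into the components listed above and expands registered short type names back into full URLs. It assumes the base URL is just the domain root (https://opendatafabric.org), which the RFC does not fix; the SchemaId and TypeRegistry names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict
from urllib.parse import urlparse


@dataclass(frozen=True)
class SchemaId:
    domain: str    # controlling organization domain
    base_url: str  # everything before {context}
    context: str   # bounded context, e.g. "config"
    version: str   # e.g. "v1alpha1"
    name: str      # resource name, e.g. "SecretSet"


def parse_schema_url(url: str) -> SchemaId:
    """Split {base-url}/{context}/{version}/{Name} into its parts."""
    parsed = urlparse(url)
    segments = parsed.path.strip("/").split("/")
    if len(segments) < 3:
        raise ValueError(f"not a resource schema URL: {url}")
    context, version, name = segments[-3:]
    base_path = "/".join(segments[:-3])
    base_url = f"{parsed.scheme}://{parsed.netloc}" + (f"/{base_path}" if base_path else "")
    return SchemaId(parsed.hostname or "", base_url, context, version, name)


class TypeRegistry:
    """Hypothetical node-local registry of short type names (cf. K8s CRDs)."""

    def __init__(self) -> None:
        self._by_short_name: Dict[str, str] = {}

    def register(self, url: str) -> None:
        self._by_short_name[parse_schema_url(url).name] = url

    def expand(self, type_ref: str) -> str:
        """Return the full $schema URL for a short name; pass full URLs through."""
        if "://" in type_ref:
            return type_ref
        return self._by_short_name[type_ref]


registry = TypeRegistry()
registry.register("https://opendatafabric.org/config/v1alpha1/SecretSet")
print(parse_schema_url(registry.expand("SecretSet")))
```

In this sketch a later registration with the same short name silently wins; a real node would presumably need to reject or disambiguate such collisions across contexts.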
Versioning
The version is part of the $schema URL, in the form of v1 or v1alpha1.
- The version should NOT be thought of only as the manifest schema version. It captures both how a resource is defined and the semantics of how it behaves, i.e. the version may be incremented if resource behavior changes significantly even when the schema stays the same.
- Versions apply at the level of an entire bounded context, not an individual resource, so if one context contains multiple related resources, a version bump applies to all of them.
Multi-tenancy
Resources can explicitly define which account they belong to.
Unlike Kubernetes, ODF does not rely on namespace-based isolation. ODF is based on a ReBAC account-centric model that allows complex ownership and access control hierarchies, e.g. teams, organizations, and flexible permissions for accounts outside of organizations.
Identity
A manifest file will usually only define the name.
The tuple (Account, ResourceType, ResourceName) uniquely identifies a resource within an ODF node (see also References). There may be multiple resources of different types under one account with the same name.
Upon resource creation, the ODF node will additionally assign the resource a unique id (UUID v4).
id is a stable identifier of a resource within one node.
Including the id in a manifest ensures that the manifest applies to exactly the intended resource, but sacrifices the portability of the manifest across ODF nodes.
When both id and name are included, identity matching is performed by id, while the name is synchronized with the manifest, which allows renaming resources.
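The identity rules above can be sketched as a lookup that matches by id when one is given and otherwise by the (account, type, name) tuple. The Store class and its apply method are hypothetical; a node would persist this state rather than keep it in dictionaries.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (account, resource type, resource name)


@dataclass
class Resource:
    id: str
    account: str
    type: str
    name: str


class Store:
    """Hypothetical in-memory resource index for a single ODF node."""

    def __init__(self) -> None:
        self.by_id: Dict[str, Resource] = {}
        self.by_key: Dict[Key, str] = {}

    def apply(self, account: str, type_: str, name: str, id_: Optional[str] = None) -> Resource:
        if id_ is not None:
            # Match by id; account and type come from the stored resource,
            # and the name is synchronized with the manifest (rename)
            res = self.by_id.get(id_)
            if res is None:
                # ids are node-local, so a manifest from another node won't match
                raise KeyError(f"no resource with id {id_} on this node")
            new_key = (res.account, res.type, name)
            owner = self.by_key.get(new_key)
            if owner is not None and owner != res.id:
                raise ValueError(f"{res.type} {name!r} already exists in {res.account}")
            del self.by_key[(res.account, res.type, res.name)]
            res.name = name
            self.by_key[new_key] = res.id
            return res
        # Match by (account, type, name)
        existing = self.by_key.get((account, type_, name))
        if existing is not None:
            return self.by_id[existing]
        # New resource: the node assigns a random UUID v4
        res = Resource(str(uuid.uuid4()), account, type_, name)
        self.by_id[res.id] = res
        self.by_key[(account, type_, name)] = res.id
        return res


store = Store()
secret = store.apply("alice", "Secret", "postgres")
dataset = store.apply("alice", "Dataset", "postgres")  # same name, different type
renamed = store.apply("alice", "Secret", "pg-main", id_=secret.id)
```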
IDs vs. DIDs
Because resources describe the desired state of the system, resource instances are separate from the objects that are created to fulfill that state. For example, a Dataset resource is not the same thing as an ODF dataset stored in some S3 bucket: a Dataset resource can be used to create or configure an ODF dataset, but they are separate objects.
This is why resources like Dataset or Account have both:
- an id (UUID v4) that identifies the resource within a node, and
- a did (W3C DID) that identifies an ODF dataset or a person on a global network
For example, if one node hosts a dataset with did:odf:xxyy and another node replicates this dataset, they will have two Dataset resources with the same did but different ids.
Since DIDs are unique, reference types support identifying target resources by DIDs of the corresponding objects, purely for convenience.
Labels & Annotations
Resources can specify custom labels and annotations. Both are maps of string keys to arbitrary JSON values, but only labels get indexed and can be used for querying. Some labels are contributed automatically: for example, a root Dataset resource will get the datasetKind: Root label without you needing to specify it manually, because it’s very common to filter datasets by datasetKind.
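A rough sketch of why only labels are queryable: the hypothetical LabelIndex below keeps labels in an inverted index keyed by (key, value), while annotations are stored next to the resource but never indexed. Since label values may be any JSON value, the sketch serializes them with json.dumps to obtain hashable index keys.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Set


class LabelIndex:
    """Hypothetical inverted index over resource labels."""

    def __init__(self) -> None:
        self._index: Dict[tuple, Set[str]] = defaultdict(set)
        self.annotations: Dict[str, Dict[str, Any]] = {}

    def put(self, resource_id: str, labels: Dict[str, Any], annotations: Dict[str, Any]) -> None:
        # Note: this sketch does not remove stale labels on update
        for key, value in labels.items():
            # JSON-serialize so any JSON value (lists, objects) becomes hashable
            self._index[(key, json.dumps(value, sort_keys=True))].add(resource_id)
        self.annotations[resource_id] = annotations  # stored, not indexed

    def query(self, **labels: Any) -> Set[str]:
        """Return ids of resources matching all given label key/value pairs."""
        sets = [self._index.get((k, json.dumps(v, sort_keys=True)), set()) for k, v in labels.items()]
        return set.intersection(*sets) if sets else set()


idx = LabelIndex()
idx.put("ds-1", {"datasetKind": "Root", "team": "geo"}, {"note": "raw feed"})
idx.put("ds-2", {"datasetKind": "Derivative", "team": "geo"}, {})
print(idx.query(datasetKind="Root"))  # {'ds-1'}
```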
References
Resource manifests can link to other resources using references, forming a DAG. Resources can be referenced by:
- ID
- Type and name (optionally including the owner account name)
volume reference above is equivalent to:
Typed References
Many contexts may choose to provide extended versions of resource references that:
- Restrict target resources to a specific type (e.g. DatasetRef, VolumeRef)
- Provide additional features (e.g. the ability to reference a sub-path of a secret via Secret:postgres#password)
- Allow referencing by DIDs (e.g. account:did:pkh:eip155:1:0xaa..)
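As a sketch of how a typed reference might accept the short forms shown above, the hypothetical parser below handles Type:name#sub-path and type:did:... strings. The grammar is inferred from those two examples only and is an assumption, as is the case-insensitive type comparison (needed so that account:did:... matches an Account reference).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypedRef:
    type: str
    name: Optional[str] = None
    did: Optional[str] = None
    sub_path: Optional[str] = None


def parse_ref(text: str, expected_type: Optional[str] = None) -> TypedRef:
    """Parse '<Type>:<name>[#sub-path]' or '<type>:did:...' (assumed grammar)."""
    type_, sep, rest = text.partition(":")
    if not sep or not rest:
        raise ValueError(f"expected '<Type>:<name-or-did>', got {text!r}")
    # A DatasetRef-style field restricts targets to one type; compared
    # case-insensitively here (an assumption)
    if expected_type is not None and type_.lower() != expected_type.lower():
        raise ValueError(f"expected a {expected_type} reference, got {type_}")
    if rest.startswith("did:"):
        return TypedRef(type_, did=rest)
    name, _, sub_path = rest.partition("#")
    return TypedRef(type_, name=name, sub_path=sub_path or None)


print(parse_ref("Secret:postgres#password", expected_type="Secret"))
print(parse_ref("account:did:pkh:eip155:1:0xaa..", expected_type="Account"))
```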
Reference resolution
When setting up complex resource graphs like ingestion pipelines, it’s very convenient to reference all components by name, especially when target resources have not been created yet and have not been assigned an id.
When operating complex graphs long-term, however, as resources get renamed and deleted, referencing by name may not only become a nuisance but also cause security risks if old names get reused.
This is why ODF nodes will resolve all references to stable IDs upon first apply, while account and name information will remain for annotational purposes. At the same time:
- It should be possible to create e.g. a Source that references a non-existing Secret and have the Secret created later
- It should be possible to delete a resource referenced by others
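Resolution on first apply can be sketched as follows: a name-based reference is pinned to the target's id while account and name are kept as annotations, and a reference to a not-yet-existing target stays pending (id is None) so it can be resolved on a later apply. The dict shapes and the None-as-pending convention are hypothetical.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (account, type, name)


def resolve_ref(ref: Dict[str, Optional[str]], ids_by_key: Dict[Key, str],
                default_account: str) -> Dict[str, Optional[str]]:
    """Pin a reference to a stable id; account and name stay as annotations."""
    if ref.get("id") is not None:
        # Already pinned: later renames or name reuse cannot redirect it
        return dict(ref)
    account = ref.get("account") or default_account
    target_id = ids_by_key.get((account, ref["type"], ref["name"]))
    return {
        "type": ref["type"],
        "account": account,
        "name": ref["name"],
        "id": target_id,  # None = pending, target not created yet
    }


SECRET_ID = "c27331ce-ce88-4ff9-8c5a-4ce8107cc03f"
ref = resolve_ref({"type": "Secret", "name": "postgres"},
                  {("alice", "Secret", "postgres"): SECRET_ID}, "alice")

# The old name is later reused by a different secret; the pinned ref is unaffected
reused = {("alice", "Secret", "postgres"): "id-of-a-new-secret"}
assert resolve_ref(ref, reused, "alice")["id"] == SECRET_ID

# A Source may reference a Secret that does not exist yet
pending = resolve_ref({"type": "Secret", "name": "kafka"}, {}, "alice")
assert pending["id"] is None
```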
Selectors
Multiple resources can be referenced at once with selectors, which match resources in bulk by shared properties like:
- Name patterns
- Label filters
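A minimal selector sketch, assuming shell-style glob patterns for names (Python's fnmatch) and exact-match label filters. The RFC does not define the pattern syntax, so both choices, and the Selector class itself, are assumptions.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class Selector:
    type: str
    name_pattern: Optional[str] = None                    # e.g. "sensor-*"
    labels: Dict[str, Any] = field(default_factory=dict)  # all must match

    def matches(self, res: Dict[str, Any]) -> bool:
        if res["type"] != self.type:
            return False
        if self.name_pattern and not fnmatchcase(res["name"], self.name_pattern):
            return False
        res_labels = res.get("labels", {})
        return all(res_labels.get(k) == v for k, v in self.labels.items())


def select(selector: Selector, resources: Iterable[Dict[str, Any]]) -> List[str]:
    """Return names of all resources matched by the selector."""
    return [r["name"] for r in resources if selector.matches(r)]


resources = [
    {"type": "Dataset", "name": "sensor-a", "labels": {"datasetKind": "Root"}},
    {"type": "Dataset", "name": "sensor-b", "labels": {"datasetKind": "Derivative"}},
    {"type": "Secret", "name": "sensor-key"},
]
print(select(Selector("Dataset", "sensor-*", {"datasetKind": "Root"}), resources))  # ['sensor-a']
```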
Ownership
When an object is created by another, higher-level object, the creator can record this association in the new object’s headers as ownerReference. This creation provenance trail can be used for automatic cascading deletion and garbage collection.
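Cascading deletion over ownerReference links can be sketched as a breadth-first traversal from the deleted resource to everything it transitively created. The owners mapping is a hypothetical stand-in for the ownerReference values stored in each resource's headers, and the sketch assumes at most one owner per resource.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional


def cascade_delete(root: str, owners: Dict[str, Optional[str]]) -> List[str]:
    """Return root plus all resources it transitively owns, dependents first."""
    children: Dict[str, List[str]] = defaultdict(list)
    for res, owner in owners.items():
        if owner is not None:
            children[owner].append(res)
    order: List[str] = []
    queue = deque([root])
    while queue:
        cur = queue.popleft()
        order.append(cur)
        queue.extend(children[cur])
    # In BFS order every owner precedes what it owns, so reversing it
    # deletes owned resources before their owners
    return list(reversed(order))


owners = {
    "sensors": None,
    "sensors-endpoint": "sensors",
    "sensors-buffer": "sensors",
    "sensors-source": "sensors",
    "unrelated": None,
}
print(cascade_delete("sensors", owners))
```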
Generations
Resource reconciliation is an eventually-consistent process: while a controller is working to reconcile one version of a resource, the user may change the desired state. To reflect this lag, a sequential generation number is incremented every time the resource headers or spec are updated.
The observedGeneration field in the status section shows which generation the controller has last had a chance to process.
Note that generation does not increment on status changes, as it is intended to signify changes to the desired state.
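The generation bookkeeping can be sketched as follows: updates to headers or spec bump generation, status writes do not, and comparing generation with status.observedGeneration tells whether the controller has caught up. The TrackedResource class is hypothetical, and not bumping generation on a no-op apply is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class TrackedResource:
    headers: Dict[str, Any]
    spec: Dict[str, Any]
    generation: int = 1
    status: Dict[str, Any] = field(default_factory=dict)

    def update_desired(self, headers: Dict[str, Any], spec: Dict[str, Any]) -> None:
        """User-facing apply: bump generation when the desired state changes.

        Assumption: a no-op apply (identical headers and spec) keeps the generation.
        """
        if headers != self.headers or spec != self.spec:
            self.headers, self.spec = headers, spec
            self.generation += 1

    def set_status(self, **fields: Any) -> None:
        """Controller-facing write: never touches generation."""
        self.status.update(fields)

    @property
    def up_to_date(self) -> bool:
        return self.status.get("observedGeneration") == self.generation


r = TrackedResource({"name": "src"}, {"url": "a"})
r.set_status(observedGeneration=1)
r.update_desired({"name": "src"}, {"url": "b"})  # generation -> 2
print(r.generation, r.up_to_date)  # 2 False
```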
Status
The status section of the resource manifest never appears in user-defined manifests. It is maintained by the ODF node and is writable only by resource controllers. It provides detailed information about the reconciliation status of the resource.
The main controller of a resource populates the phase and associated top-level fields during reconciliation attempts, while the conditions field provides a generic mechanism to attach additional information like error codes and messages. The conditions can be contributed by multiple controllers.
Example of a Source resource status where the reconciliation attempt for generation: 2 failed because it links to a non-existing secret:
phase field state machine:
The conditions are keyed by schema IDs to disambiguate, avoid name collisions, and provide schema checking.
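As a rough illustration of the status shape described in this section, the sketch below has the main controller write phase and observedGeneration, while conditions are keyed by schema URL so that several controllers can contribute without colliding. The phase value, the condition schema URLs, the second (quota) controller, and the reason/message fields are hypothetical placeholders, not the RFC's actual status schema.

```python
from typing import Any, Dict

# Hypothetical condition types, each identified by its own schema URL
MISSING_REF = "https://example.org/schemas/core/v1alpha1/MissingReference"
QUOTA = "https://example.org/schemas/quota/v1alpha1/QuotaCheck"


def record_condition(status: Dict[str, Any], schema_url: str, **body: Any) -> None:
    """Let any controller attach a condition without clobbering others."""
    status.setdefault("conditions", {})[schema_url] = body


# Written by the main controller of the Source resource
status: Dict[str, Any] = {"phase": "Failed", "observedGeneration": 2}
record_condition(status, MISSING_REF, reason="SecretNotFound",
                 message="Secret:postgres does not exist")
# Contributed by a second, unrelated controller
record_condition(status, QUOTA, reason="WithinQuota")
print(sorted(url.rsplit("/", 1)[-1] for url in status["conditions"]))
```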
Resource Application
When applying manifests, the following steps take place:
- Load each manifest in the apply batch and validate it against its schema
- Look up whether target resources already exist (by id or name)
- Assign ids to resources that do not yet exist
- Resolve all resource references to ids of existing resources or those about to be created
- Reorder resources in depth-first dependency order
- Pass every resource into the pre-apply step of its corresponding main controller to:
  - Perform additional validation
  - Contribute additional labels (e.g. datasetKind)
  - Re-write parts of the spec (e.g. to encrypt raw secrets into jwe before the spec is stored)
- Resource generation is incremented
- Resource specs are saved into the event store
- Reconciliation process is initiated asynchronously
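The ordering-related steps above (id assignment, reference resolution, and depth-first dependency ordering) can be sketched as below. The plan_apply function is hypothetical: it treats names as unique within the batch (ignoring account and type), leaves unknown targets pending as None, and skips validation, pre-apply hooks, generations, and persistence.

```python
import uuid
from typing import Dict, List, Optional


def plan_apply(batch: List[dict], existing: Dict[str, str]) -> List[dict]:
    """Assign ids, resolve name refs to ids, and order dependencies first.

    batch:    [{"name": str, "refs": [referenced names]}, ...]
    existing: name -> id of resources the node already knows about
    """
    ids = dict(existing)
    for m in batch:
        # Assign ids to resources that do not yet exist
        ids.setdefault(m["name"], str(uuid.uuid4()))
    by_name = {m["name"]: m for m in batch}
    ordered: List[dict] = []
    state: Dict[str, str] = {}  # name -> "visiting" | "done"

    def visit(name: str) -> None:
        if name not in by_name or state.get(name) == "done":
            return  # outside the batch, or already placed
        if state.get(name) == "visiting":
            raise ValueError(f"reference cycle through {name!r}")
        state[name] = "visiting"
        for dep in by_name[name]["refs"]:
            visit(dep)  # depth-first: dependencies are emitted first
        state[name] = "done"
        m = by_name[name]
        # Unknown targets resolve to None (pending), as allowed for references
        resolved: List[Optional[str]] = [ids.get(r) for r in m["refs"]]
        ordered.append({**m, "id": ids[name], "refs": resolved})

    for m in batch:
        visit(m["name"])
    return ordered


batch = [
    {"name": "source", "refs": ["secret", "buffer"]},
    {"name": "buffer", "refs": []},
    {"name": "secret", "refs": []},
]
plan = plan_apply(batch, existing={})
print([m["name"] for m in plan])  # ['secret', 'buffer', 'source']
```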
APIs
Current state of ODF APIs
The current Node API evolved as several groups of functionality:
- REST API, composed of semi-overlapping groups like:
  - Simple transfer protocol
  - Smart transfer protocol
  - Data query and commitments
- Standalone APIs:
  - GraphQL
  - FlightSQL
Goals:
- REST API to become a superset of the GraphQL API
- Minimize the burden of maintaining two APIs (reuse object schemas as much as possible)
- Establish a good versioning strategy
REST API strategy
We will introduce another core REST API protocol group: the Resource protocol. The Resource protocol will define:
- How to list, create, update, delete, and get the state of various resources in an ODF system
- The top-level resource addressing scheme, which also serves as a nesting point for other protocols like simple/smart transfer and data querying
- The account acted upon by the auth subject will be reflected in the account query argument:
  - no argument - same as ?account=
  - ?account= - use the account from the auth token
  - ?account=opendatafabric.org - specifies account by name
  - ?account=did:key:... - specifies account by DID
- Listing:
  - /<context>/<version>/<type>
  - /dataset/v1/dataset
  - /auth/v1/relation
- By object ref:
  - /<context>/<version>/<type>/<id-did-name>
  - /config/v1/secret/c27331ce-ce88-4ff9-8c5a-4ce8107cc03f
  - /dataset/v1/dataset/did:odf:123..321
  - /dataset/v1/dataset/my-dataset (searches current account only)
- Other HTTP-based protocols:
  - /<protocol>/...
  - /graphql
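To make the addressing scheme concrete, here is a hypothetical router sketch that splits a Resource protocol path into context, version, and type, classifies the optional last segment as a UUID, a DID, or a name, and resolves the account query argument following the rules listed above. The route function and its tuple return layout are assumptions.

```python
import uuid
from urllib.parse import parse_qs, urlsplit


def classify_ref(segment: str) -> str:
    """Classify an <id-did-name> path segment."""
    if segment.startswith("did:"):
        return "did"
    try:
        uuid.UUID(segment)
        return "id"
    except ValueError:
        return "name"  # names are searched in the current account only


def route(url: str, token_account: str) -> tuple:
    """Parse /<context>/<version>/<type>[/<id-did-name>] plus ?account=."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) not in (3, 4):
        raise ValueError(f"not a resource path: {parts.path}")
    context, version, type_ = segments[:3]
    ref = (classify_ref(segments[3]), segments[3]) if len(segments) == 4 else None
    # A missing or empty ?account= both mean: use the account from the auth token
    account = parse_qs(parts.query).get("account", [""])[0] or token_account
    return context, version, type_, ref, account


print(route("/dataset/v1/dataset?account=opendatafabric.org", "alice"))
print(route("/config/v1/secret/c27331ce-ce88-4ff9-8c5a-4ce8107cc03f", "alice"))
```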
GraphQL strategy
Initially, GQL will cover only generic resource operations (apply, rename, delete, etc.) while representing resources as plain JSON scalars. This approach avoids having a dynamic GQL schema for types that are registered in the ODF node at runtime. In the future, we will consider the benefits of representing ODF resource manifests as fully-featured types and of supporting reference graph navigation. For example, a manifest like this one:
Compatibility
The proposed changes can be introduced in implementations in parallel with the existing mechanisms, gradually adopting Dataset and Source resources as replacements for DatasetSnapshot and in-metadata polling and push sources.
Prior art
Kubernetes Design Notes
- Generated Object API Reference
- OpenAPI Spec
- apimachinery
- Kubernetes API concepts
- kube.rs
- https://github.com/Arnavion/k8s-openapi
- Kubernetes API Groups (api-based versioning instead of resource-based)
Kubernetes API Overview
Kubernetes provides a unified API that is very extensible and offers a uniform way to work with all object resources. The basic Kubernetes API scheme is:
- /apis/<group>/<version>/<kind>/<name>
  - /apis/apps/v1/deployments
  - /apis/apps/v1/deployments/my-deployment
- /apis/<group>/<version>/namespaces/<namespace>/<kind>/<name>
  - /apis/apps/v1/namespaces/my-namespace/deployments/my-deployment
- Pros:
  - Proven, mature model
  - API-based versioning
  - API groups allowing different subsystems to evolve independently
- Cons:
  - Namespace is the only unit of multi-tenancy
    - In ODF we aim for a more flexible multi-tenancy model based on accounts
  - Objects are addressed by names; it’s not possible to use uid
    - The names in k8s are immutable, but in ODF they can not only be changed within a node but also differ across nodes
Rationale and alternatives
ODF objects as Kubernetes CRDs
If we were to piggy-back on Kubernetes:
- Pros:
  - Mature system
  - Reusing tools and integrations
- Cons:
  - We would inherit many legacy design choices
  - Namespaces and RBAC don’t provide real multi-tenancy
    - We would like to think multi-tenant / multi-region / multi-cloud
    - See https://www.kcp.io/
  - Performance concerns in case of millions of objects
  - Excessive coupling that would be hard to undo
If we implement our own resource model instead:
- Pros:
  - More flexibility
  - Can natively support ReBAC and our vision of multi-tenancy
  - Can freely evolve manifest formats to make them easier to author
- Cons:
  - A lot more work
  - Templating would require re-inventing something like helm + helmfile
Terraform modules for ODF API
- Pros:
  - Can wrap existing APIs in TF modules without modifications
  - Existing tool for complex dependency management and templating
- Cons:
  - TF feels like a hack to add declarativeness onto services that don’t support it
  - Very few data engineers have TF experience
  - High potential for desync between TF state and actual pipeline state
  - TF state needs to be stored somewhere, requiring more infrastructure on the user’s side
  - Harder to validate and to report useful errors
Unresolved questions
- DatasetID vs ResourceID
  - should we have both?
  - two types of selectors then?
Future possibilities
JSON-LD
We are considering using JSON-LD for manifests in the future. A JSON-LD manifest could look like this:
Extensibility of union types
Some union types like TaskSpec and Ingress currently provide a fixed set of variants. This is very convenient for manifest auto-completion, but it restricts implementations from experimenting with other types of tasks and sources.
In the future, we should extend those types with an Ext variant that allows specifying fully generic values.
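One way to model such an open union, sketched in Python: known variants parse into typed classes, while anything else must be wrapped in an Ext variant that carries its own $schema URL and an opaque value. The HttpPoll variant, the single-key union encoding, and the Ext field layout are all hypothetical; the RFC does not define them.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class HttpPoll:  # hypothetical built-in variant
    url: str


@dataclass
class Ext:       # escape hatch for implementation-specific variants
    schema: str  # $schema URL identifying the extension type
    value: Any   # fully generic; validated only by whoever understands schema


TaskSpec = Union[HttpPoll, Ext]


def parse_task_spec(doc: Dict[str, Any]) -> TaskSpec:
    """Expect exactly one variant key, e.g. {"HttpPoll": {...}} or {"Ext": {...}}."""
    if len(doc) != 1:
        raise ValueError("a union value must have exactly one variant key")
    kind, body = next(iter(doc.items()))
    if kind == "HttpPoll":
        return HttpPoll(url=body["url"])
    if kind == "Ext":
        return Ext(schema=body["$schema"], value=body.get("value"))
    raise ValueError(f"unknown variant {kind!r}; wrap it in Ext instead")


spec = parse_task_spec({"Ext": {
    "$schema": "https://example.org/schemas/acme/v1/SparkJob",
    "value": {"jar": "job.jar"},
}})
print(spec)
```

Because an Ext value carries its own schema URL, auto-completion for the built-in variants is preserved while unknown variants stay explicit rather than silently accepted.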
Appendix A: Example Manifests
Example resource manifests are provided with this RFC in the /examples directory.
An entity-relationship diagram is also available to illustrate the interplay of the core prototype types.
Note that many resource types are still at an early prototype stage of maturity and will be refined during the reference implementation.