Documentation Index
Fetch the complete documentation index at: https://docs.kamu.dev/llms.txt
Use this file to discover all available pages before exploring further.
kamu
Usage: kamu [OPTIONS] <COMMAND>
Subcommands:
add — Add a new dataset or modify an existing one
completions — Generate tab-completion scripts for your shell
config — Get or set configuration options
delete [rm] — Delete a dataset
export — Exports a dataset
ingest — Adds data to the root dataset according to its push source configuration
init — Initialize an empty workspace in the current directory
inspect — Group of commands for exploring dataset metadata
list [ls] — List all datasets in the workspace
log — Shows dataset metadata history
login — Authenticates with a remote ODF server interactively
logout — Logs out from a remote Kamu server
new — Creates a new dataset manifest from a template
notebook — Starts the notebook server for exploring the data in the workspace
pull — Pull new data into the datasets
push — Push local data into a repository
rename [mv] — Rename a dataset
reset — Revert the dataset back to the specified state
repo — Manage set of tracked repositories
search — Searches for datasets in the registered repositories
sql — Executes an SQL query or drops you into an SQL shell
system — Command group for system-level functionality
tail — Displays a sample of most recent records in a dataset
ui — Opens web interface
verify — Verifies the validity of a dataset
version — Outputs build information
Options:
-v — Sets the level of verbosity (repeat for more)
--no-color — Disable color output in the terminal
-q, --quiet — Suppress all non-essential output
-y, --yes — Do not ask for confirmation and assume the ‘yes’ answer
--trace — Record and visualize the command execution as perfetto.dev trace
--show-error-stack-trace — Show stack trace in case of a command execution error
--metrics — Dump all metrics at the end of command execution
kamu add
Add a new dataset or modify an existing one
Usage: kamu add [OPTIONS] [MANIFEST]...
Arguments:
<MANIFEST> — Dataset manifest reference(s) (path, or URL)
Options:
-r, --recursive — Recursively search for all manifests in the specified directory
--replace — Delete and re-add datasets that already exist
--stdin — Read manifests from standard input
--name <N> — Overrides the name in a loaded manifest
--visibility <VIS> — Visibility of the added dataset. Possible values: private, public
See also: the kamu pull command.
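For orientation, a manifest passed to kamu add is a YAML file describing a DatasetSnapshot. The sketch below is illustrative only: the dataset name, URL, and column names are placeholders, and the exact shape of the fetch/read/merge steps should be checked against the Open Data Fabric schema reference:

```yaml
kind: DatasetSnapshot
version: 1
content:
  name: org.example.data # placeholder dataset name
  kind: Root
  metadata:
    # A polling source: where to fetch from, how to read, how to merge
    - kind: SetPollingSource
      fetch:
        kind: Url
        url: https://example.org/data.csv # placeholder URL
      read:
        kind: Csv
        header: true
      merge:
        kind: Ledger
        primaryKey:
          - id # placeholder column
```

Such a file could then be registered with kamu add path/to/manifest.yaml.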
kamu completions
Generate tab-completion scripts for your shell
Usage: kamu completions <SHELL>
Arguments:
<SHELL> — Possible values: bash, elvish, fish, powershell, zsh
Bash:
Append the following to your ~/.bashrc:
source <(kamu completions bash)
You will need to reload your shell session (or execute the same command in your current one) for changes to take effect.
Zsh:
Append the following to your ~/.zshrc:
autoload -U +X bashcompinit && bashcompinit
source <(kamu completions bash)
Please contribute a guide for your favorite shell!
kamu config
Get or set configuration options
Usage: kamu config <COMMAND>
Subcommands:
list [ls] — Display current configuration combined from all config files
get — Get current configuration value
set — Set or unset configuration value
kamu is managed very similarly to git. Starting with your current workspace and going up the directory tree you can have multiple .kamuconfig YAML files which are all merged together to get the resulting config.
Most commonly you will have a workspace-scoped config inside the .kamu directory and the user-scoped config residing in your home directory.
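As a rough sketch of such a file (the envelope fields here are an assumption and should be verified against your kamu version; only the engine.runtime key is taken from the examples below), a workspace-scoped .kamuconfig might contain:

```yaml
kind: CLIConfig
version: 1
content:
  engine:
    runtime: podman # as written by `kamu config set engine.runtime podman`
```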
Examples:
List current configuration as combined view of config files:
kamu config list
Get current configuration value:
kamu config get engine.runtime
Set configuration value in workspace scope:
kamu config set engine.runtime podman
Set configuration value in user scope:
kamu config set --user engine.runtime podman
Unset or revert to default value:
kamu config set --user engine.runtime
kamu config list
Display current configuration combined from all config files
Usage: kamu config list [OPTIONS]
Options:
--scope <SC> — Which configs to use. Default value: combined. Possible values:
  user: Includes only config in user home directory
  workspace: Includes only current workspace config
  combined: Includes configs in workspace, parent directories, and user home dir
--user — Show only user scope configuration
--with-defaults — Show configuration with all default values applied
-o, --output-format <FMT> — Serialization format of the returned object. Default value: yaml. Possible values: yaml, json
kamu config get
Get current configuration value
Usage: kamu config get [OPTIONS] [CFGKEY]
Arguments:
<CFGKEY> — Path to the config option
Options:
--scope <SC> — Which configs to use. Default value: combined. Possible values:
  user: Includes only config in user home directory
  workspace: Includes only current workspace config
  combined: Includes configs in workspace, parent directories, and user home dir
--user — Operate on the user scope configuration file
--with-defaults — Get default value if config option is not explicitly set
-o, --output-format <FMT> — Serialization format of the returned object. Default value: yaml. Possible values: yaml, json
kamu config set
Set or unset configuration value
Usage: kamu config set [OPTIONS] <CFGKEY> [VALUE]
Arguments:
<CFGKEY> — Path to the config option
<VALUE> — New value to set
Options:
--scope <SC> — Which configs to consider. Default value: combined. Possible values:
  user: Includes only config in user home directory
  workspace: Includes only current workspace config
  combined: Includes configs in workspace, parent directories, and user home dir
--user — Operate on the user scope configuration file
-i, --input-format <FMT> — Serialization format of the provided object. Default value: yaml. Possible values: yaml, json
kamu delete
Delete a dataset
Usage: kamu delete [OPTIONS] [DATASET]...
Arguments:
<DATASET> — Local dataset reference(s)
Options:
-a, --all — Delete all datasets in the workspace
-r, --recursive — Also delete all transitive dependencies of specified datasets
kamu export
Exports a dataset
Usage: kamu export [OPTIONS] --output-format <OUTPUT_FORMAT> <DATASET>
Arguments:
<DATASET> — Local dataset reference
Options:
--output-path <OUTPUT_PATH> — Export destination. Default is <current workdir>/<dataset name>
--output-format <OUTPUT_FORMAT> — Output format
--records-per-file <RECORDS_PER_FILE> — Number of records per file, if stored into a directory. It’s a soft limit: for the sake of export performance the actual number of records may be slightly different
Whether the destination is treated as a file or a directory is inferred from the path:
export/dataset.csv is a file path
export/dataset.csv/ is a directory path
export/dataset/ is a directory path
export/dataset is a directory path
kamu ingest
Adds data to the root dataset according to its push source configuration
Usage: kamu ingest [OPTIONS] <DATASET> [FILE]...
Arguments:
<DATASET> — Local dataset reference
<FILE> — Data file(s) to ingest
Options:
--source-name <SRC> — Name of the push source to use for ingestion
--event-time <T> — Event time to be used if data does not contain one
--stdin — Read data from the standard input
-r, --recursive — Recursively propagate the updates into all downstream datasets
--input-format <FMT> — Overrides the media type of the data expected by the push source. Possible values: csv, json, ndjson, geojson, ndgeojson, parquet, esrishapefile
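To illustrate, ingesting local files or piping records through standard input might look like the following sketch (the dataset and file names are placeholders, and the dataset must already define a matching push source):

```
kamu ingest org.example.data data.csv
echo '{"id": 1, "value": 42}' | kamu ingest org.example.data --stdin --input-format ndjson
```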
kamu init
Initialize an empty workspace in the current directory
Usage: kamu init [OPTIONS]
Options:
--exists-ok — Don’t return an error if workspace already exists
--pull-images — Only pull container images and exit
The .kamu directory contains dataset metadata, data, and all supporting files (configs, known repositories, etc.).
kamu inspect
Group of commands for exploring dataset metadata
Usage: kamu inspect <COMMAND>
Subcommands:
lineage — Shows the dependency tree of a dataset
query — Shows the transformations used by a derivative dataset
schema — Shows the dataset schema
kamu inspect lineage
Shows the dependency tree of a dataset
Usage: kamu inspect lineage [OPTIONS] [DATASET]...
Arguments:
<DATASET> — Local dataset reference(s)
Options:
-o, --output-format <FMT> — Format of the output. Possible values: shell, dot, csv, html
-b, --browse — Produce HTML and open it in a browser
kamu inspect query
Shows the transformations used by a derivative dataset
Usage: kamu inspect query <DATASET>
Arguments:
<DATASET> — Local dataset reference
See also: the kamu verify command.
kamu inspect schema
Shows the dataset schema
Usage: kamu inspect schema [OPTIONS] <DATASET>
Arguments:
<DATASET> — Local dataset reference
Options:
-o, --output-format <FMT> — Format of the output. Default value: odf-yaml. Possible values: arrow-json, ddl, odf-json, odf-yaml, parquet, parquet-json
kamu list
List all datasets in the workspace
Usage: kamu list [OPTIONS]
Options:
-o, --output-format <FMT> — Format to display the results in. Possible values:
  csv: Comma-separated values
  json: Array of Structures format
  ndjson: One JSON object per line - easily splittable format
  json-soa: Structure of arrays - more compact and efficient format for encoding entire dataframe
  json-aoa: Array of arrays - compact and efficient and preserves column order
  table: A pretty human-readable table
  parquet: Parquet columnar storage. Only available when exporting to file(s)
-w, --wide — Show more details (repeat for more)
kamu log
Shows dataset metadata history
Usage: kamu log [OPTIONS] <DATASET>
Arguments:
<DATASET> — Local dataset reference
Options:
-o, --output-format <FMT> — Format of the output. Possible values: shell, yaml
-f, --filter <FLT> — Types of events to include
--limit <LIMIT> — Maximum number of blocks to display. Default value: 500
Metadata history includes events such as:
- Data ingestion / transformation
- Change of query
- Change of schema
- Change of source URL or other ingestion steps in a root dataset
kamu login
Authenticates with a remote ODF server interactively
Usage: kamu login [OPTIONS] [SERVER] [COMMAND]
Subcommands:
oauth — Performs non-interactive login to a remote Kamu server via OAuth provider token
password — Performs non-interactive login to a remote Kamu server via login and password
Arguments:
<SERVER> — ODF server URL (defaults to kamu.dev)
Options:
--user — Store access token in the user home folder rather than in the workspace
--check — Check whether existing authorization is still valid without triggering a login flow
--access-token <ACCESS_TOKEN> — Provide an existing access token
--repo-name <REPO_NAME> — Repository name to use when storing this server in the repositories list
--skip-add-repo — Don’t automatically add a remote repository for this host
kamu login oauth
Performs non-interactive login to a remote Kamu server via OAuth provider token
Usage: kamu login oauth <PROVIDER> <ACCESS_TOKEN> [SERVER]
Arguments:
<PROVIDER> — Name of the OAuth provider, i.e. ‘github’
<ACCESS_TOKEN> — OAuth provider access token
<SERVER> — ODF backend server URL (defaults to kamu.dev)
kamu login password
Performs non-interactive login to a remote Kamu server via login and password
Usage: kamu login password <LOGIN> <PASSWORD> [SERVER]
Arguments:
<LOGIN> — Specify user name
<PASSWORD> — Specify password
<SERVER> — ODF backend server URL (defaults to kamu.dev)
kamu logout
Logs out from a remote Kamu server
Usage: kamu logout [OPTIONS] [SERVER]
Arguments:
<SERVER> — ODF server URL (defaults to kamu.dev)
Options:
--user — Drop access token stored in the user home folder rather than in the workspace
-a, --all — Log out of all servers
kamu new
Creates a new dataset manifest from a template
Usage: kamu new [OPTIONS] <NAME>
Arguments:
<NAME> — Name of the new dataset
Options:
--root — Create a root dataset
--derivative — Create a derivative dataset
Examples:
Create an org.example.data.yaml manifest file from template in the current directory:
kamu new org.example.data --root
kamu notebook
Starts the notebook server for exploring the data in the workspace
Usage: kamu notebook [OPTIONS]
Options:
--address <ADDRESS> — Expose HTTP server on specific network interface
--http-port <HTTP_PORT> — Expose HTTP server on specific port
--engine <ENG> — Engine type to use for the notebook. Possible values: datafusion, spark
-e, --env <VAR> — Propagate or set an environment variable in the notebook (e.g. -e VAR or -e VAR=foo)
kamu pull
Pull new data into the datasets
Usage: kamu pull [OPTIONS] [DATASET]...
Arguments:
<DATASET> — Local or remote dataset reference(s)
Options:
-a, --all — Pull all datasets in the workspace
-r, --recursive — Also pull all transitive dependencies of specified datasets
--fetch-uncacheable — Pull latest data from uncacheable data sources
--as <NAME> — Local name of a dataset to use when syncing from a repository
--no-alias — Don’t automatically add a remote push alias for this destination
--set-watermark <TIME> — Injects a manual watermark into the dataset to signify that no data is expected to arrive with event time that precedes it
-f, --force — Overwrite local version with remote, even if revisions have diverged
--reset-derivatives-on-diverged-input — Run hard compaction of derivative dataset if transformation failed due to root dataset compaction
--visibility <VIS> — Visibility of the pulled dataset(s). Possible values: private, public
Depending on the target, pull can:
- Run polling ingest to pull data into a root dataset from an external source
- Run transformations on a derivative dataset to process previously unseen data
- Pull dataset from a remote repository into your workspace
- Update watermark on a dataset
Examples:
Pull a dataset from a URL (see kamu repo add -h for supported sources):
kamu pull ipfs://bafy…a0dx/data
kamu pull s3://my-bucket.example.org/odf/org.example.data
kamu pull s3+https://example.org:5000/data --as org.example.data
Advance the watermark of a dataset:
kamu pull --set-watermark 2020-01-01 org.example.data
kamu push
Push local data into a repository
Usage: kamu push [OPTIONS] [DATASET]...
Arguments:
<DATASET> — Local or remote dataset reference(s)
Options:
-a, --all — Push all datasets in the workspace
-r, --recursive — Also push all transitive dependencies of specified datasets
--no-alias — Don’t automatically add a remote push alias for this destination
--to <REM> — Remote alias or a URL to push to
-f, --force — Overwrite remote version with local, even if revisions have diverged
--visibility <VIS> — Visibility of the initially pushed dataset(s). Possible values: private, public
Examples:
Push a dataset to a destination URL (see kamu repo add -h for supported protocols):
kamu push org.example.data --to s3://my-bucket.example.org/odf/org.example.data
Sync dataset to a named repository (see kamu repo command group):
kamu push org.example.data --to kamu-hub/org.example.data
Sync dataset that already has a push alias:
kamu push org.example.data
Sync datasets matching pattern that already have push aliases:
kamu push org.example.%
Add dataset to local IPFS node and update IPNS entry to the new CID:
kamu push org.example.data --to ipns://k5..zy
kamu rename
Rename a dataset
Usage: kamu rename <DATASET> <NAME>
Arguments:
<DATASET> — Dataset reference
<NAME> — The new name to give it
kamu reset
Revert the dataset back to the specified state
Usage: kamu reset <DATASET> <HASH>
Arguments:
<DATASET> — Dataset reference
<HASH> — Hash of the block to reset to
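A typical flow, sketched here with placeholder names, is to locate the target block with kamu log and then reset to it (<HASH> stands for an actual block hash taken from the log output):

```
kamu log org.example.data
kamu reset org.example.data <HASH>
```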
kamu repo
Manage set of tracked repositories
Usage: kamu repo <COMMAND>
Subcommands:
add — Adds a repository
delete [rm] — Deletes a reference to repository
list [ls] — Lists known repositories
alias — Manage set of remote aliases associated with datasets
kamu repo add
Adds a repository
Usage: kamu repo add <NAME> <URL>
Arguments:
<NAME> — Local alias of the repository
<URL> — URL of the repository
kamu repo delete
Deletes a reference to repository
Usage: kamu repo delete [OPTIONS] [REPOSITORY]...
Arguments:
<REPOSITORY> — Repository name(s)
Options:
-a, --all — Delete all known repositories
kamu repo list
Lists known repositories
Usage: kamu repo list [OPTIONS]
Options:
-o, --output-format <FMT> — Format to display the results in. Possible values:
  csv: Comma-separated values
  json: Array of Structures format
  ndjson: One JSON object per line - easily splittable format
  json-soa: Structure of arrays - more compact and efficient format for encoding entire dataframe
  json-aoa: Array of arrays - compact and efficient and preserves column order
  table: A pretty human-readable table
  parquet: Parquet columnar storage. Only available when exporting to file(s)
kamu repo alias
Manage set of remote aliases associated with datasets
Usage: kamu repo alias <COMMAND>
Subcommands:
add — Adds a remote alias to a dataset
delete [rm] — Deletes a remote alias associated with a dataset
list [ls] — Lists remote aliases
kamu repo alias add
Adds a remote alias to a dataset
Usage: kamu repo alias add [OPTIONS] <DATASET> <ALIAS>
Arguments:
<DATASET> — Local dataset reference
<ALIAS> — Remote dataset name
Options:
--push — Add a push alias
--pull — Add a pull alias
kamu repo alias delete
Deletes a remote alias associated with a dataset
Usage: kamu repo alias delete [OPTIONS] [DATASET] [ALIAS]
Arguments:
<DATASET> — Local dataset reference
<ALIAS> — Remote dataset name
Options:
-a, --all — Delete all aliases
--push — Delete a push alias
--pull — Delete a pull alias
kamu repo alias list
Lists remote aliases
Usage: kamu repo alias list [OPTIONS] [DATASET]
Arguments:
<DATASET> — Local dataset reference
Options:
-o, --output-format <FMT> — Format to display the results in. Possible values:
  csv: Comma-separated values
  json: Array of Structures format
  ndjson: One JSON object per line - easily splittable format
  json-soa: Structure of arrays - more compact and efficient format for encoding entire dataframe
  json-aoa: Array of arrays - compact and efficient and preserves column order
  table: A pretty human-readable table
  parquet: Parquet columnar storage. Only available when exporting to file(s)
kamu search
Searches for datasets in the registered repositories
Usage: kamu search [OPTIONS] [QUERY]
Arguments:
<QUERY> — Search terms
Options:
-l, --local — Search local datasets instead of searching in remote repositories
-n, --max-results <MAX_RESULTS> — Maximum results to fetch. Default value: 10
-o, --output-format <FMT> — Format to display the results in. Possible values:
  csv: Comma-separated values
  json: Array of Structures format
  ndjson: One JSON object per line - easily splittable format
  json-soa: Structure of arrays - more compact and efficient format for encoding entire dataframe
  json-aoa: Array of arrays - compact and efficient and preserves column order
  table: A pretty human-readable table
  parquet: Parquet columnar storage. Only available when exporting to file(s)
--repo <REPO> — Repository name(s) to search in
kamu sql
Executes an SQL query or drops you into an SQL shell
Usage: kamu sql [OPTIONS] [COMMAND]
Subcommands:
server— Runs an SQL engine in a server mode
Options:
-o, --output-format <FMT> — Format to display the results in. Possible values:
  csv: Comma-separated values
  json: Array of Structures format
  ndjson: One JSON object per line - easily splittable format
  json-soa: Structure of arrays - more compact and efficient format for encoding entire dataframe
  json-aoa: Array of arrays - compact and efficient and preserves column order
  table: A pretty human-readable table
  parquet: Parquet columnar storage. Only available when exporting to file(s)
--engine <ENG> — Engine type to use for this SQL session. Possible values: datafusion, spark
--url <URL> — URL of a running JDBC server (e.g. jdbc:hive2://example.com:10000)
-c, --command <CMD> — SQL command to run
--script <FILE> — SQL script file to execute
--output-path <OUTPUT_PATH> — When set, result will be stored to a given path instead of being printed to stdout
--records-per-file <RECORDS_PER_FILE> — Number of records per file, if stored into a directory. It’s a soft limit: for the sake of export performance the actual number of records may be slightly different
Whether the output path is treated as a file or a directory is inferred from the path:
export/dataset.csv is a file path
export/dataset.csv/ is a directory path
export/dataset/ is a directory path
export/dataset is a directory path
Examples:
Run a single query and print the result:
kamu sql -c 'SELECT * FROM org.example.data LIMIT 10' -o csv
Run SQL server to use with external data processing tools:
kamu sql server --address 0.0.0.0 --port 8080
Connect to a remote SQL server:
kamu sql --url jdbc:hive2://example.com:10000
Note: Currently when connecting to a remote SQL kamu server you will need to manually instruct it to load datasets from the data files. This can be done using the following command:
CREATE TEMP VIEW `my.dataset` AS (SELECT * FROM parquet.`kamu_data/my.dataset`);
kamu sql server
Runs an SQL engine in a server mode
Usage: kamu sql server [OPTIONS]
Options:
--address <ADDRESS> — Expose server on specific network interface
--port <PORT> — Expose server on specific port
--engine <ENG> — Engine type to use for this server. Possible values: datafusion, spark
--livy — Run Livy server instead of JDBC
kamu system
Command group for system-level functionality
Usage: kamu system <COMMAND>
Subcommands:
api-server — Run HTTP + GraphQL server
compact — Compact a dataset
debug-token — Validate a Kamu token
depgraph — Outputs the dependency graph of datasets
decode — Decode a manifest file
diagnose — Run basic system diagnose check
generate-token — Generate a platform token from a known secret for debugging
gc — Runs garbage collection to clean up cached and unreachable objects in the workspace
info — Summary of the system information
ipfs — IPFS helpers
upgrade-workspace — Upgrade the layout of a local workspace to the latest version
kamu system api-server
Run HTTP + GraphQL server
Usage: kamu system api-server [OPTIONS] [COMMAND]
Subcommands:
gql-query — Executes the GraphQL query and prints out the result
gql-schema — Prints the GraphQL schema
Options:
--address <ADDRESS> — Bind to a specific network interface
--http-port <HTTP_PORT> — Expose HTTP+GraphQL server on specific port
--get-token — Output a JWT token you can use to authorize API queries
--external-address <EXTERNAL_ADDRESS> — Allows changing the base URL used in the API. Can be handy when launching inside a container
kamu system api-server gql-query
Executes the GraphQL query and prints out the result
Usage: kamu system api-server gql-query [OPTIONS] <QUERY>
Arguments:
<QUERY> — GQL query
Options:
--full — Display the full result including extensions
kamu system api-server gql-schema
Prints the GraphQL schema
Usage: kamu system api-server gql-schema
kamu system compact
Compact a dataset
Usage: kamu system compact [OPTIONS] [DATASET]...
Arguments:
<DATASET> — Local dataset references
Options:
--max-slice-size <SIZE> — Maximum size of a single data slice file in bytes. Default value: 300000000
--max-slice-records <RECORDS> — Maximum amount of records in a single data slice file. Default value: 10000
--hard — Perform ‘hard’ compaction that rewrites the history of a dataset
--keep-metadata-only — Perform compaction without saving data blocks
--verify — Perform verification of the dataset before running a compaction
kamu system debug-token
Validate a Kamu token
Usage: kamu system debug-token <TOKEN>
Arguments:
<TOKEN> — Access token
kamu system depgraph
Outputs the dependency graph of datasets
Usage: kamu system depgraph
kamu system decode
Decode a manifest file
Usage: kamu system decode [OPTIONS] [MANIFEST]
Arguments:
<MANIFEST> — Manifest reference (path, or URL)
Options:
--stdin — Read manifests from standard input
kamu system diagnose
Run basic system diagnose check
Usage: kamu system diagnose
kamu system generate-token
Generate a platform token from a known secret for debugging
Usage: kamu system generate-token [OPTIONS]
Options:
--subject <SUBJECT> — Account ID to generate token for
--login <LOGIN> — Account name to derive ID from (for predefined accounts only)
--expiration-time-sec <EXPIRATION_TIME_SEC> — Token expiration time in seconds. Default value: 3600
kamu system gc
Runs garbage collection to clean up cached and unreachable objects in the workspace
Usage: kamu system gc
kamu system info
Summary of the system information
Usage: kamu system info [OPTIONS]
Options:
-o, --output-format <FMT> — Format of the output. Possible values: shell, json, yaml
kamu system ipfs
IPFS helpers
Usage: kamu system ipfs <COMMAND>
Subcommands:
add— Adds the specified dataset to IPFS and returns the CID
kamu system ipfs add
Adds the specified dataset to IPFS and returns the CID
Usage: kamu system ipfs add <DATASET>
Arguments:
<DATASET> — Dataset reference
kamu system upgrade-workspace
Upgrade the layout of a local workspace to the latest version
Usage: kamu system upgrade-workspace
kamu tail
Displays a sample of most recent records in a dataset
Usage: kamu tail [OPTIONS] <DATASET>
Arguments:
<DATASET> — Local dataset reference
Options:
-o, --output-format <FMT> — Format to display the results in. Possible values:
  csv: Comma-separated values
  json: Array of Structures format
  ndjson: One JSON object per line - easily splittable format
  json-soa: Structure of arrays - more compact and efficient format for encoding entire dataframe
  json-aoa: Array of arrays - compact and efficient and preserves column order
  table: A pretty human-readable table
  parquet: Parquet columnar storage. Only available when exporting to file(s)
-n, --num-records <NUM> — Number of records to display. Default value: 10
-s, --skip-records <SKP> — Number of initial records to skip before applying the limit. Default value: 0
kamu ui
Opens web interface
Usage: kamu ui [OPTIONS]
Options:
--address <ADDRESS> — Expose HTTP server on specific network interface
--http-port <HTTP_PORT> — Which port to run HTTP server on
--get-token — Output a JWT token you can use to authorize API queries
kamu verify
Verifies the validity of a dataset
Usage: kamu verify [OPTIONS] [DATASET]...
Arguments:
<DATASET> — Local dataset reference(s)
Options:
-r, --recursive — Verify the entire transformation chain starting with root datasets
--integrity — Check only the hashes of metadata and data without replaying transformations
Verification covers:
- Trustworthiness of the source data that went into it
- Soundness of the derivative transformation chain that shaped it
- Guaranteeing that derivative data was in fact produced by declared transformations
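For illustration, verifying a derivative dataset together with the chain that produced it, versus a quick hash-only integrity check, might look like this (the dataset name is a placeholder):

```
kamu verify --recursive org.example.data
kamu verify --integrity org.example.data
```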
kamu version
Outputs build information
Usage: kamu version [OPTIONS]
Options:
-o, --output-format <FMT> — Format of the output. Possible values: shell, json, yaml