Supported Repository Types
Data in kamu is shared via repositories. There are multiple types of repositories that differ by the kinds of services they provide. The most basic repositories allow you to upload (“push”) and download (“pull”) data, while more advanced ones also support searching datasets, querying data in place, or subscribing to updates.
| Type | Description | Capabilities | URL Examples |
|---|---|---|---|
| Local FS | A basic repository that uses a folder on the local file system. Mainly used for examples and testing. | pull, push | `file:///home/me/example/repository`, `file:///c:/Users/me/example/repository` |
| HTTP(s) | A basic repository that provides read-only access to data. | pull | `https://example.org/dataset` |
| S3 | A basic repository that stores data in an Amazon S3 bucket. Can be used with any S3-compatible storage API. | pull, push | `s3://bucket.my-company.example`, `s3+http://my-minio-server:9000/bucket`, `s3+https://my-minio-server:9000/bucket` |
| IPFS | Uses an IPFS HTTP Gateway for reading. Push is only possible via IPNS (see details). | pull, push | `ipfs://bafy...vqhe`, `ipns://k51q...v7mn` |
| ODF | ODF-native repository types that support very fast data transfer and querying data in place. | pull, push, query, search | `odf+http://odf.acme.com`, `odf+https://kamu.dev` |
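As a quick way to read the table, the following sketch maps a repository URL to its type and capabilities based only on the URL scheme. The function name `repo_type` is hypothetical and is not part of kamu; it simply restates the rows above, with `ipfs://` treated as read-only because pushing goes through IPNS.

```shell
# Hypothetical helper (not a kamu command): classify a repository URL
# by its scheme, following the table of supported repository types.
repo_type() {
    case "$1" in
        file://*)                        echo "Local FS: pull, push" ;;
        http://*|https://*)              echo "HTTP(s): pull" ;;
        s3://*|s3+http://*|s3+https://*) echo "S3: pull, push" ;;
        ipfs://*)                        echo "IPFS: pull" ;;
        ipns://*)                        echo "IPFS (IPNS): pull, push" ;;
        odf+http://*|odf+https://*)      echo "ODF: pull, push, query, search" ;;
        *)                               echo "unknown"; return 1 ;;
    esac
}

# Example: an S3-compatible MinIO endpoint is treated as an S3 repository.
repo_type "s3+https://my-minio-server:9000/bucket"
```

Matching on the full scheme prefix keeps `s3+http://` and `odf+https://` from being mistaken for plain HTTP(s) URLs.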
Push / Pull Aliases
Starting out, you can always use explicit URLs when syncing data from and to repositories. If some remote source contains a dataset you’re interested in, you can download it using the pull command. Similarly, you can upload a local dataset to a remote location using the push command.
kamu will automatically create “pull” and “push” aliases for this dataset.
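The flow described above can be sketched as follows. The S3 URL and the dataset name `my.dataset` are placeholders invented for this example, and the local dataset is assumed to take its name from the last URL segment. The `--to` option and the `kamu repo alias list` subcommand are given as commonly documented for the kamu CLI; confirm them with `kamu push --help` and `kamu repo alias --help` for your version. The `kamu_do` wrapper is a hypothetical dry-run helper, not part of kamu.

```shell
# Dry-run helper (hypothetical): prints each kamu command instead of
# executing it. Set RUN=1 to run the commands in a real kamu workspace.
kamu_do() {
    if [ "${RUN:-0}" = "1" ]; then kamu "$@"; else echo "kamu $*"; fi
}

# Placeholder remote location and dataset name (assumptions).
REMOTE_URL="s3://bucket.my-company.example/my.dataset"
DATASET="my.dataset"

# Download the dataset from an explicit URL.
kamu_do pull "$REMOTE_URL"

# Upload local changes to the same location.
kamu_do push "$DATASET" --to "$REMOTE_URL"

# Show the pull and push aliases recorded for the dataset.
kamu_do repo alias list "$DATASET"
```

With `RUN` unset, the script only prints the three commands, which makes it easy to review them before running them against a real workspace.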
The pull and push commands analyze the state of the dataset in the workspace and at the repository, and only transfer data and metadata that wasn’t previously seen (a minimal update).
These commands are also “safe”. They will detect all types of concurrent changes and history divergence and prevent you from overwriting someone else’s changes.
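To make the idea of a minimal and safe update more concrete, the sketch below models a dataset's history as an ordered list of block hashes and decides what a push would need to upload. This is a simplified illustration and not kamu's actual sync implementation: the function name `plan_push`, the hash values, and the flat list representation are all assumptions made for the example.

```shell
# Hypothetical model: each history is a space-separated list of block
# hashes, oldest first. Prints the action a push would need to take.
plan_push() {
    local_chain="$1"
    remote_chain="$2"
    # Load the remote history into the positional parameters.
    set -- $remote_chain
    missing=""
    for block in $local_chain; do
        if [ "$#" -gt 0 ]; then
            # Remote has a block at this position: it must be identical.
            if [ "$block" != "$1" ]; then
                echo "diverged"
                return 1
            fi
            shift
        else
            # Remote history has ended: this block was not seen there yet.
            missing="$missing $block"
        fi
    done
    if [ "$#" -gt 0 ]; then
        # Remote has blocks the local copy lacks: pushing would drop them.
        echo "remote-ahead"
        return 1
    fi
    if [ -z "$missing" ]; then echo "up-to-date"; else echo "upload:$missing"; fi
}

# Local copy has one block the remote has not seen yet.
plan_push "b1 b2 b3" "b1 b2"
```

Only the trailing blocks missing from the remote are selected for upload, while any mismatch in the shared part of the history, or extra blocks on the remote side, stops the operation instead of overwriting someone else's changes.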
When you pull a root dataset from a remote source, running kamu pull on it will attempt to sync it from the repository, and will NOT execute a polling or other ingest action. In kamu list, datasets with pull aliases will show up as Remote(...).
Repositories
If you store multiple datasets side by side in some location, you can add that location as a repository. Repositories are configured per workspace using the kamu repo command group.
To add a new repo, use the kamu repo add command. The new repo will then be visible in kamu repo list.
To pull a dataset from this repo or push a dataset to it, you can now use remote references of the form <repo-name>/<dataset-name>.
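The repository workflow above can be sketched as follows. The repo name `acme`, the bucket URL, and the dataset name `my.dataset` are placeholders invented for this example, and the argument order of `kamu repo add` (name, then URL) should be checked against `kamu repo add --help`. The `kamu_do` wrapper is a hypothetical dry-run helper, not part of kamu.

```shell
# Dry-run helper (hypothetical): prints each kamu command instead of
# executing it. Set RUN=1 to run the commands in a real kamu workspace.
kamu_do() {
    if [ "${RUN:-0}" = "1" ]; then kamu "$@"; else echo "kamu $*"; fi
}

# Placeholder repository and dataset names (assumptions).
REPO_NAME="acme"
REPO_URL="s3://bucket.my-company.example"
DATASET="my.dataset"

# Register the location that holds several datasets side by side.
kamu_do repo add "$REPO_NAME" "$REPO_URL"

# Confirm that the repository is now configured in this workspace.
kamu_do repo list

# Sync a dataset using a <repo-name>/<dataset-name> reference.
kamu_do pull "$REPO_NAME/$DATASET"
kamu_do push "$DATASET" --to "$REPO_NAME/$DATASET"
```

Using a named repository keeps the storage URL in one place, so moving the data to a different bucket only requires changing the repository definition.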