Installation
General Information
kamu is a single-binary utility that comes bundled with most of its dependencies. It relies on container-based virtualization (e.g. docker or podman) to run heavyweight frameworks such as Spark, Flink, and Jupyter in isolated environments, so you don’t need to install thousands of libraries and bloat your beloved laptop with their dependencies.
The tool comes with very good shell completions, so make sure to configure them!
Please report any issues during the installation process here.
Supported Platforms
Linux
Linux is our primary target environment. We don’t have packages for various Linux flavors yet, but since the tool is just a simple binary it’s very easy to get started:
- Install docker using your distro’s package manager (alternatively we highly recommend trying podman)
- Make sure you can launch containers without sudo by following the official documentation
- Install kamu via installer script by running:
  curl -s "https://get.kamu.dev" | sh
- Verify your setup by running:
  kamu system diagnose
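Before running the installer it can help to confirm that a container runtime is reachable without sudo. The following is a minimal sketch, not part of kamu: find_runtime is a hypothetical helper name, and it assumes that a successful "ps" call is a good enough sign that the current user can talk to the runtime.

```shell
#!/bin/sh
# Hypothetical helper (not part of kamu): print the first container runtime
# that the current user can use without sudo, or fail if none is usable.
find_runtime() {
    for rt in docker podman; do
        # `command -v` succeeds only if the name resolves to something runnable.
        if command -v "$rt" >/dev/null 2>&1; then
            # `ps` fails for docker when the user cannot reach the daemon socket.
            if "$rt" ps >/dev/null 2>&1; then
                echo "$rt"
                return 0
            fi
        fi
    done
    return 1
}
```

Calling find_runtime prints docker or podman when one of them works, and returns a non-zero status otherwise, in which case the sudo-less setup step above likely still needs attention.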
MacOS X
We fully support Intel and M-series Macs. To install kamu please follow these steps:
- Install Docker for Mac
  - Consider allocating more CPUs and memory to the Docker VM in the settings
  - If you want to run kamu outside of your user home directory, you may need to add additional mounts to the Docker VM. For example, if your workspace is in /opt/myworkspace you’ll need to mount it under the same name into the VM in Docker settings.
- Install kamu via installer script by running:
  curl -s "https://get.kamu.dev" | sh
- Verify your setup by running:
  kamu system diagnose
Windows (using WSL2)
- Install WSL2 following these steps. Make sure you can at least open a Linux shell before proceeding.
- Install docker
  - We recommend installing Docker Desktop for Windows and enabling the WSL2 backend
  - Ensure that from your Linux distribution you can launch containers without sudo and that the following works:
    docker run -it hello-world
- Inside the WSL2 distribution install kamu via installer script by running:
  curl -s "https://get.kamu.dev" | sh
- Follow the installer instructions to update PATH and install completions:
  # Add to the end of your ~/.bashrc
  PATH="$PATH:/home/$USER/.local/bin"
  source <(kamu completions bash)
- After restarting the shell confirm that this works:
  # Executable can be found
  kamu
  # Should auto-complete to `kamu version`
  kamu vers<press TAB>
- Verify your setup by running:
  kamu system diagnose
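Because ~/.bashrc can be sourced more than once, the PATH line from the steps above may add the same directory repeatedly. A small guard avoids that. This is only a sketch: path_append is a hypothetical helper name, and it assumes that $HOME/.local/bin is the same directory as /home/$USER/.local/bin on your system.

```shell
# Hypothetical helper: append a directory to PATH only if it is not listed yet.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;     # not present, append at the end
    esac
}

# Assumption: the installer placed the binary in ~/.local/bin
path_append "$HOME/.local/bin"
export PATH
```

Wrapping PATH in colons before matching ensures that only whole entries are compared, so a directory such as /home/me/.local/bin2 does not count as a match.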
Windows (using Docker Desktop)
- Install and run Docker Desktop.
  - It’s a good idea to give Docker’s VM more CPU and RAM. You can do so in VirtualBox.
  - Make sure that you can run docker ps successfully.
- We recommend using Windows Terminal, which supports Unicode symbols and full colors
- Download the latest kamu binary for Windows
- Add it to your PATH environment variable
Docker Toolbox runs Docker in a Virtual Machine. This means that to mount a file from your host file system into a Docker container, the file first needs to be mounted into the VM, so make sure all paths that kamu will need are mapped in the VirtualBox VM settings.
For example, suppose your workspace directory is C:\Users\me\kamu. When kamu runs and detects that Docker runs in a VM, it will convert this path to /c/Users/me/kamu. So in your VM settings you may need to add a mapping from C:\Users\me to /c/Users/me.
Upgrading
On most platforms a new version of kamu can be installed by simply re-running the installer script:
curl -s "https://get.kamu.dev" | sh
Installing shell completions
To be able to auto-complete the kamu commands please install completion scripts for the shell of your choosing. You can find detailed instructions by running kamu completions --help.
If you use bash add the following to your ~/.bashrc file:
source <(kamu completions bash)
If you use zsh add the following to your ~/.zshrc file:
autoload -U +X bashcompinit && bashcompinit
source <(kamu completions bash)
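If you script your machine setup, you may want to add these lines without creating duplicates on repeated runs. The sketch below shows one way to do that. append_once is a hypothetical helper name, not something kamu provides.

```shell
# Hypothetical helper: append a line to a file unless an identical line exists.
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example usage (left commented out so the sketch does not edit your files):
# append_once "$HOME/.bashrc" 'source <(kamu completions bash)'
```

Matching the whole line as a fixed string keeps characters such as ( and < from being interpreted as a regular expression.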
A Note on Docker Security
We take your security very seriously. Unfortunately the execution model of docker, which involves running the daemon process under root, violates the Unix user permission model. Combined with the step of making the docker command sudo-less, this means that any process you run under your user can potentially access the entire file system with root privileges. Until docker changes its runtime model, sudo-less access to Docker will remain a security threat.
On our side we are taking the following measures to gain your trust:
- kamu and all of its components are open-source and available for review
- All of our docker images are based on reputable source images and are available for review
- When kamu starts docker containers it limits the scope of volumes it’s mounting to a minimum. You can review the volume mounts by running kamu with the -v flag or using docker ps.
To avoid all these issues please consider using podman. This container runtime operates in daemon-less and root-less mode, so it’s fully compliant with the standard Unix permission model.
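To review what running containers have mounted, you can also ask the runtime directly while a kamu command is executing. The sketch below is an illustration only: show_mounts is a hypothetical helper name, and it assumes that your runtime's inspect command exposes the Mounts list with Source and Destination fields, as Docker's does.

```shell
# Hypothetical helper: print each running container's name followed by its
# host:container mount pairs. The first argument is the runtime binary to use.
show_mounts() {
    rt=${1:-docker}
    # `ps -q` prints only the IDs of running containers
    for id in $("$rt" ps -q); do
        "$rt" inspect \
            --format '{{.Name}}{{range .Mounts}} {{.Source}}:{{.Destination}}{{end}}' \
            "$id"
    done
}
```

Run show_mounts in a second terminal while kamu is working. Pass podman as the argument if that is the runtime you use.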
Using Podman instead of Docker
podman is an alternative container runtime that fixes the security shortcomings of docker. We highly recommend giving it a try, as we are planning to make it the default runtime in the near future.
In order to instruct kamu to use podman run:
kamu config set --user engine.runtime podman
On some systems you need to separately install the podman-dnsname package to allow containers to communicate with one another via hostnames. To check whether you have it run:
podman network create test
podman network ls
# NETWORK ID NAME VERSION PLUGINS
# 9f86d081884c test 0.4.0 bridge,portmap,firewall,tuning,dnsname
# ^^^ plugin installed
podman network prune
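The same check can be scripted instead of read by eye. This is a minimal sketch under a stated assumption: the output of podman network ls has the column layout shown in the sample above, with the network name in the second column and the plugin list on the same row. Other Podman versions may print different columns, in which case this check does not apply. has_dnsname is a hypothetical helper name.

```shell
# Hypothetical helper: read `podman network ls` output on stdin and succeed
# only if the row for the given network name mentions the dnsname plugin.
has_dnsname() {
    awk -v net="$1" '
        $2 == net && /dnsname/ { found = 1 }
        END { exit !found }
    '
}

# Example usage (requires podman, so it is left commented out):
# podman network ls | has_dnsname test && echo "dnsname plugin installed"
```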
Development Images
It is sometimes convenient to get kamu-cli in a Docker/Podman image. For this we have a few options:
- ghcr.io/kamu-data/kamu-base:latest - comes with just podman and kamu-cli pre-installed
- ghcr.io/kamu-data/kamu-base:latest-with-data - comes with a sample data pipeline that you can use to test different features with
- ghcr.io/kamu-data/kamu-base:latest-with-data-mt - comes with a sample multi-tenant dataset repository
For example, try running:
docker run -it --rm ghcr.io/kamu-data/kamu-base:latest-with-data kamu list
These images have podman installed inside the container, so when kamu runs an engine, podman will try to pull an image and start a container from within another container. For such a container-in-container setup to work you may need to pass the --privileged flag when running this image.
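If you use the image often, a small wrapper saves retyping the flags. The following is a sketch, not an official tool: kamu_in_container, RUNTIME, and IMAGE are hypothetical names chosen for illustration, and whether --privileged is actually required depends on your host setup.

```shell
# Illustrative variables (not kamu settings): which runtime and image to use.
RUNTIME=${RUNTIME:-docker}
IMAGE=${IMAGE:-ghcr.io/kamu-data/kamu-base:latest-with-data}

# Hypothetical wrapper: run a kamu command inside the sample image.
kamu_in_container() {
    # --privileged allows podman inside the container to start nested containers
    "$RUNTIME" run -it --rm --privileged "$IMAGE" kamu "$@"
}

# Example usage:
# kamu_in_container list
```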