Apache Superset is a modern open-source data exploration and visualization platform. It can connect to kamu over the native Arrow Flight SQL protocol using a Python client.

In this setup:

  • kamu runs a server that speaks Flight SQL, a high-performance protocol for data transfer
  • An additional flightsql-dbapi Python package is installed in the Superset environment
  • Superset uses the generic database API provided by the SQLAlchemy framework
  • The flightsql-dbapi package provides a custom SQLAlchemy engine implementation that translates all of Superset’s queries into the Flight SQL protocol
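
The same layering can be tried from plain Python, outside Superset. The sketch below is an illustration under assumptions, not Superset's actual code path: `dialect_and_driver` and `run_query` are hypothetical helper names, and the `import flightsql.sqlalchemy` line follows the usage shown in the flightsql-dbapi README (it may be redundant if the package registers its dialect on install). It assumes `sqlalchemy` and `flightsql-dbapi` are installed and a kamu Flight SQL server is reachable.

```python
from typing import Tuple


def dialect_and_driver(url: str) -> Tuple[str, str]:
    """Split the scheme of a SQLAlchemy URL into (dialect, driver).

    SQLAlchemy URLs have the form dialect+driver://..., which is how
    SQLAlchemy decides which engine implementation handles the connection.
    """
    scheme = url.split("://", 1)[0]
    dialect, _, driver = scheme.partition("+")
    return dialect, driver


def run_query(url: str, sql: str):
    """Hypothetical helper: run one query through SQLAlchemy over Flight SQL."""
    # Imports are local so this module still loads where the packages are absent.
    import flightsql.sqlalchemy  # noqa: F401  (assumed to make the dialect available)
    from sqlalchemy import create_engine, text

    engine = create_engine(url)
    with engine.connect() as conn:
        return conn.execute(text(sql)).fetchall()
```

For example, `dialect_and_driver("datafusion+flightsql://localhost:50050")` yields the dialect `datafusion` and the driver `flightsql`, which is the part supplied by the flightsql-dbapi package. Calling `run_query` with such a URL and a statement like `select 1` would only work against a running server.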

To connect Superset to kamu, follow these steps:

  1. Make sure you can run Superset locally using docker-compose (see this official guide)
    • The rest of this guide assumes that you are launching Superset in “non-dev” mode using:
      docker-compose -f docker-compose-non-dev.yml up
      
  2. Install the flightsql-dbapi Python package into the Superset container:
    • Stop and clean up the environment:
      docker-compose -f docker-compose-non-dev.yml down
      
    • Create the <superset repo>/docker/requirements-local.txt file (as per this guide) with the following contents:
      # At the time of this writing, Superset used an Arrow version with a bug that is critical for us
      pyarrow==13.0.0
      flightsql-dbapi==0.2.1
      
  3. (Optional) Specify your Mapbox API token in <superset repo>/docker/.env-non-dev
  4. Run the kamu Flight SQL server in the desired workspace:
    kamu sql server --address 0.0.0.0 --port 50050
    
  5. Start Superset via docker-compose again
  6. Create a new database connection in Superset
    • Select "Other" as the database kind
    • Specify the following URL:
      datafusion+flightsql://anonymous:anonymous@<hostname or IP>:50050?insecure=True
      
    • Omit insecure=True when the node is set up with TLS
    • To authenticate via an access token, use:
      datafusion+flightsql://<hostname or IP>:50050?token=<TOKEN>
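
The URL variants from step 6 can be assembled programmatically, for instance when provisioning Superset connections from a script. This is a minimal sketch using only the standard library: `kamu_url` is a hypothetical helper name, the default port mirrors the `--port 50050` used above, and combining `token` with `insecure=True` in one URL is an assumption that the steps above do not show explicitly.

```python
from typing import Optional
from urllib.parse import urlencode


def kamu_url(host: str, port: int = 50050,
             token: Optional[str] = None, insecure: bool = True) -> str:
    """Build a SQLAlchemy URL for a kamu Flight SQL server (hypothetical helper)."""
    params = {}
    if token is not None:
        # Token authentication: no user/password in the authority part.
        params["token"] = token
        authority = f"{host}:{port}"
    else:
        # Anonymous access, as in the URL from step 6.
        authority = f"anonymous:anonymous@{host}:{port}"
    if insecure:
        # Only needed when the node is NOT set up with TLS.
        params["insecure"] = "True"
    query = f"?{urlencode(params)}" if params else ""
    return f"datafusion+flightsql://{authority}{query}"
```

Calling `kamu_url("localhost")` reproduces the anonymous URL from step 6, while `kamu_url("node.example.com", token="<TOKEN>", insecure=False)` corresponds to the token form for a TLS-enabled node (the host name here is a placeholder, and `urlencode` will percent-encode special characters in the token).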