Apache Superset is a modern, enterprise-ready business intelligence web application that makes it easy to visualize large datasets and build complex dashboards.
At Reflective Data, we use Apache Superset to monitor all data going through our platform with minimum latency. It lets us easily combine data from different databases, and every analyst can build their own dashboards.
In this article, we give a quick overview of what Superset is, what it’s good for and how to install it.
Superset provides:
- A rich set of data visualizations
- An easy-to-use interface for exploring and visualizing data
- Create and share dashboards
- Enterprise-ready authentication with integration with major authentication providers (database, OpenID, LDAP, OAuth & REMOTE_USER through Flask AppBuilder)
- An extensible, high-granularity security/permission model allowing intricate rules on who can access individual features and the dataset
- A simple semantic layer, allowing users to control how data sources are displayed in the UI by defining which fields should show up in which drop-down and which aggregation and function metrics are made available to the user
- Integration with most SQL-speaking RDBMS through SQLAlchemy
- Deep integration with Druid.io
Installing Apache Superset
This tutorial describes the simplest and fastest way to get Apache Superset up and running in development. We use Ubuntu 16.04 as the platform. For other platforms, custom integrations and production installations, please refer to the official documentation.
Step 1 – Install Dependencies
Apache Superset has some OS-level dependencies. The following command installs them:
sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev
If you have python3.5 installed alongside python2.7, as is the default on Ubuntu 16.04 LTS, also run this command:
sudo apt-get install build-essential libssl-dev libffi-dev python3.5-dev python-pip libsasl2-dev libldap2-dev
Step 2 – Python’s setup tools and pip
Get the latest version of setuptools and pip:
pip install --upgrade setuptools pip
Step 3 – Install and initialize Apache Superset
Follow these few simple steps to install Superset:
# Install superset
pip install superset

# Create an admin user (you will be prompted to set username, first and last name before setting a password)
fabmanager create-admin --app superset

# Initialize the database
superset db upgrade

# Load some data to play with
superset load_examples

# Create default roles and permissions
superset init

# To start a development web server on port 8088, use -p to bind to another port
superset runserver -d
If everything went well, you should be able to go to http://localhost:8088 in your browser and log in using the credentials you entered while creating the admin account. Some sample data will be waiting for you, which you can use to play around with different visualizations and dashboards.
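If you would rather confirm from a terminal that the development server is answering, a small polling loop is one way to do it. This is only a sketch: it assumes curl is installed, and the function name wait_for_http and its retry defaults are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
    url=$1
    attempts=${2:-10}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: silent, -f: fail on HTTP errors (status 400 and above),
        # -o /dev/null: discard the response body
        if curl -sf -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        # Pause only when another attempt is still to come
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

With the server from Step 3 running, calling wait_for_http http://localhost:8088 should return success once the login page is being served, so it can be chained with && in a script.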
Step 4 – Connect your database
While playing around with sample data is fun, connecting your own data source gives Apache Superset a whole new meaning.
As Apache Superset doesn’t ship with database connectors, you will need to install the appropriate driver first. Which one depends on the type of database you are going to connect to. For MySQL, you would run:
pip install mysqlclient
Superset uses SQLAlchemy to connect to databases.
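Before adding a connection in the UI, it can help to check that the driver is importable by the Python interpreter Superset runs under. The sketch below assumes a python3 executable by default (override it through the PYTHON variable); the helper name has_python_module is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given Python module can be imported.
# Set PYTHON to the interpreter that Superset was installed into.
has_python_module() {
    "${PYTHON:-python3}" -c "import $1" >/dev/null 2>&1
}

# The mysqlclient package is imported under the module name MySQLdb
if has_python_module MySQLdb; then
    echo "MySQL driver found"
else
    echo "MySQL driver missing: run pip install mysqlclient"
fi
```

The same check should work for other drivers as long as you pass the module name the driver is imported under, which is not always the name of the pip package.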
After logging in to Apache Superset, click on “Sources” and choose “Databases”. There you can add a new connection. All you have to provide is a name and a SQLAlchemy URI. The URI follows SQLAlchemy’s general form: dialect+driver://username:password@host:port/database
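As an illustration, a MySQL URI could be assembled as in the sketch below. Every value here (user, password, host, port and database name) is a placeholder invented for the example, not something taken from a real setup.

```shell
#!/bin/sh
# Placeholder connection details: replace every value with your own.
DB_USER="superset_ro"
DB_PASSWORD="change-me"
DB_HOST="localhost"
DB_PORT="3306"
DB_NAME="analytics"

# General SQLAlchemy form: dialect+driver://username:password@host:port/database
# The plain "mysql" dialect is used here, on the assumption that mysqlclient is installed.
SQLALCHEMY_URI="mysql://${DB_USER}:${DB_PASSWORD}@${DB_HOST}:${DB_PORT}/${DB_NAME}"

echo "$SQLALCHEMY_URI"
```

The printed string is what you would paste into the SQLAlchemy URI field. If the password contains special characters such as @ or /, they need to be URL-encoded first.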
After clicking on “Test connection” you should see the list of tables in your database.
Step 5 – Creating your first report
After successfully connecting your first data source, navigate to “Tables”. There you can add new tables, based on the tables you have in your database. After adding a table, click on its name and a data explorer with the data from this table will show up.
Try playing with different metrics, dimensions, time frames and visualizations. I bet you will be surprised by how easy yet flexible the tool is.
After creating a visualization you like, you can add it to one of your dashboards, or create a new dashboard for it.
Getting data into Apache Superset
While Apache Superset can connect to numerous data sources, one of the most common ones is a data warehouse that combines data from various sources (CMS, CRM, analytics, social etc.).
To get data into a data warehouse, companies usually go for a data pipeline that can either stream or batch load data from different tools and other sources into a data warehouse.
One of the most common data warehouses on the market is Google BigQuery, which connects easily with Apache Superset. To get data into BigQuery, we recommend using a Marketing Data Pipeline from Reflective Data.
Apache Superset is a powerful business intelligence tool that has flexible data visualization options and is ready for enterprise usage.
As you saw in this article, getting it up and running in your development environment is fast and easy. If you have any interest in business intelligence and data visualization, I strongly recommend giving Superset a try.
If you have already used Apache Superset or have any questions, feel free to share in the comments below.