Apache Superset is a modern, enterprise-ready business intelligence web application that makes it easy to visualise large datasets and build complex dashboards.
At Reflective Data, we use Apache Superset to monitor all data going through our platform with minimal latency. It lets us easily combine data from different databases, and every analyst can build their own dashboards.
In this article, we give a quick overview of what Superset is, what it’s good for and how to get started by installing it.
Key Features
- A rich set of data visualizations
- An easy-to-use interface for exploring and visualizing data
- Create and share dashboards
- Enterprise-ready authentication that integrates with major authentication providers (database, OpenID, LDAP, OAuth & REMOTE_USER through Flask AppBuilder)
- An extensible, high-granularity security/permission model allowing intricate rules on who can access individual features and datasets
- A simple semantic layer, allowing users to control how data sources are displayed in the UI by defining which fields should show up in which drop-down and which aggregation and function metrics are made available to the user
- Integration with most SQL-speaking RDBMS through SQLAlchemy
- Deep integration with Druid.io
(Source: https://superset.incubator.apache.org)
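Authentication is configured in Python rather than in the UI. The fragment below is a minimal sketch of a superset_config.py that switches login to LDAP through Flask AppBuilder, assuming an LDAP server is available. The server address is a hypothetical placeholder. A real deployment will usually need further options (bind user, search base, TLS), which are described in the Flask AppBuilder security documentation.

```python
# superset_config.py: minimal sketch, assuming LDAP authentication.
# Superset loads this module when it can import it from the PYTHONPATH.
from flask_appbuilder.security.manager import AUTH_LDAP

# Use LDAP instead of the default user table in Superset's metadata database.
AUTH_TYPE = AUTH_LDAP
# Hypothetical server address: replace it with your own LDAP host.
AUTH_LDAP_SERVER = "ldap://ldap.example.com"
# Create a Superset account automatically on a user's first successful login...
AUTH_USER_REGISTRATION = True
# ...and give it the built-in Gamma role, which only sees data sources it is granted.
AUTH_USER_REGISTRATION_ROLE = "Gamma"
```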
Screenshots
(Source: https://superset.incubator.apache.org)
Installing Apache Superset
This tutorial describes the simplest and fastest way to get Apache Superset up and running in a development environment. We use Ubuntu 16.04 as the platform. For other platforms, custom integrations and production installations, please refer to the official documentation.
Step 1 – Install Dependencies
Apache Superset has a few OS-level dependencies. Install them with:
sudo apt-get install build-essential libssl-dev libffi-dev python-dev python-pip libsasl2-dev libldap2-dev
If you have Python 3.5 installed alongside Python 2.7, as is the default on Ubuntu 16.04 LTS, also run this command:
sudo apt-get install build-essential libssl-dev libffi-dev python3.5-dev python-pip libsasl2-dev libldap2-dev
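Several of these packages exist so that pip can compile C extensions for Superset's dependencies. The sketch below is a quick sanity check that looks for Python.h, the header the python-dev / python3.5-dev packages provide. It searches the include directory of whichever interpreter runs it, so run it with the same python you will use for pip. The function name is illustrative, and the script only checks the Python headers, not the other libraries in the list.

```python
import os
import sysconfig


def python_headers_present():
    """Return True if Python.h exists for the interpreter running this script.

    The python-dev / python3.5-dev packages install this header; pip needs it
    whenever a dependency has to be compiled from source.
    """
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    if python_headers_present():
        print("Python headers found")
    else:
        print("Python.h not found: install the matching python-dev package")
```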
Step 2 – Python’s setuptools and pip
Get the latest versions of the pip and setuptools libraries:
pip install --upgrade setuptools pip
Step 3 – Install and initialize Apache Superset
Follow these simple steps to install Superset:
# Install superset
pip install superset
# Create an admin user (you will be prompted to set username, first and last name before setting a password)
fabmanager create-admin --app superset
# Initialize the database
superset db upgrade
# Load some data to play with
superset load_examples
# Create default roles and permissions
superset init
# To start a development web server on port 8088, use -p to bind to another port
superset runserver -d
If everything went well, you should be able to open http://localhost:8088 in your browser and log in with the credentials you entered when creating the admin account. Sample data will be waiting for you, so you can play around with different visualizations and dashboards.
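If you start the server from a script (for example in a VM or container), it can help to wait until it answers before opening the browser. The following is a minimal stdlib sketch, assuming the default address http://localhost:8088/. The function name and the timeout values are illustrative choices, not part of Superset.

```python
import http.client
import time
import urllib.request


def wait_for_server(url="http://localhost:8088/", timeout=60.0, interval=1.0):
    """Poll url until it answers with HTTP 200; give up after timeout seconds.

    Returns True once the server responds, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # urlopen follows redirects, e.g. from / to the login page.
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError, HTTPError and connection errors are OSError subclasses:
            # the server is not ready yet, so try again after a short pause.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("Superset is up" if wait_for_server() else "Superset did not start in time")
```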
Step 4 – Connect your database
While playing around with sample data is fun, connecting your own data source gives Apache Superset a whole new meaning.
Because Apache Superset doesn’t ship with database connectors, you will need to install the one for your database first. Which one depends on the type of database you are going to connect to. For MySQL, for example, run pip install mysqlclient.
Superset uses SQLAlchemy to connect to databases.
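Before filling in the connection form, you can check that the driver is importable from the Python environment Superset runs in. The mapping below is an assumption covering a few common cases: mysqlclient installs the MySQLdb module, psycopg2 installs psycopg2, and SQLite support ships with Python. The helper name is hypothetical, so extend the mapping for your own database.

```python
import importlib.util

# Assumed mapping from dialect to the Python module its driver package installs.
DRIVER_MODULES = {
    "mysql": "MySQLdb",        # installed by: pip install mysqlclient
    "postgresql": "psycopg2",  # installed by: pip install psycopg2
    "sqlite": "sqlite3",       # part of the Python standard library
}


def driver_installed(dialect):
    """Return True if the driver module for dialect can be imported."""
    module = DRIVER_MODULES.get(dialect)
    if module is None:
        raise ValueError("no driver known for dialect {!r}".format(dialect))
    return importlib.util.find_spec(module) is not None


if __name__ == "__main__":
    for name in sorted(DRIVER_MODULES):
        print(name, "OK" if driver_installed(name) else "missing")
```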
After logging in to Apache Superset, click on “Sources” and choose “Databases”. There you can add a new connection. All you have to provide is a name and a SQLAlchemy URI. The URI will look something like this:
mysql://root:XXXXXXXXXX@104.198.32.xxx:3306/rd_demo_db
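If your password contains characters such as @, : or /, a URI like the one above becomes ambiguous. SQLAlchemy's documentation recommends percent-encoding such characters with urllib.parse.quote_plus. The helper below is a minimal sketch of that approach, and its name is hypothetical. It assumes the username contains only plain characters, so only the password is encoded.

```python
from urllib.parse import quote_plus


def build_sqlalchemy_uri(dialect, user, password, host, port, database):
    """Assemble dialect://user:password@host:port/database.

    The password is percent-encoded so that characters such as @, : or /
    cannot be mistaken for URI delimiters; SQLAlchemy decodes it again.
    """
    return "{}://{}:{}@{}:{}/{}".format(
        dialect, user, quote_plus(password), host, port, database
    )


if __name__ == "__main__":
    print(build_sqlalchemy_uri("mysql", "root", "p@ss:word/1",
                               "db.example.com", 3306, "rd_demo_db"))
```

Running it prints mysql://root:p%40ss%3Aword%2F1@db.example.com:3306/rd_demo_db, which can be pasted into the SQLAlchemy URI field.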
After clicking on “Test connection”, you should see the list of tables in your database.
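Conceptually, this check connects to the database and asks it for its table names. The stdlib-only sketch below does the same for a SQLite file (Superset also accepts sqlite:/// URIs in development setups), which can help when debugging a connection outside Superset. It is an illustration with a hypothetical function name, not Superset's own code.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the sorted table names of an existing SQLite database file.

    The file is opened read-only, so a mistyped path raises an error
    instead of silently creating a new, empty database.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    import sys
    print(list_tables(sys.argv[1]))
```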
Step 5 – Creating your first report
After successfully connecting your first data source, navigate to “Tables”. There you can add new tables based on the tables in your database. After adding a table, click its name to open a data explorer with the data from this table.
Try playing with different metrics, dimensions, time-frames and visualizations. I bet you will be surprised by how easy yet flexible the tool is.
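Roughly speaking, each chart becomes an aggregate SQL query against your table. The metric turns into an aggregate expression, dimensions go into GROUP BY, and the time grain truncates the timestamp column. The sketch below reproduces that shape by hand on an in-memory SQLite table. The table, columns and data are made up, and the SQL that Superset actually generates varies by database and version.

```python
import sqlite3

# Hypothetical sample data: one row per event with a timestamp and a channel.
ROWS = [
    ("2019-05-01 09:15:00", "organic"),
    ("2019-05-01 17:40:00", "paid"),
    ("2019-05-01 20:05:00", "organic"),
    ("2019-05-02 08:05:00", "organic"),
]

# Metric COUNT(*), dimension "channel", time grain "day".
QUERY = """
    SELECT date(ts) AS day, channel, COUNT(*) AS events
    FROM events
    GROUP BY day, channel
    ORDER BY day, channel
"""


def daily_counts(rows):
    """Load rows into an in-memory table and run the aggregate query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (ts TEXT, channel TEXT)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in daily_counts(ROWS):
        print(row)
```

With the sample data, this yields one row per day and channel: 2 organic and 1 paid event on 2019-05-01, then 1 organic event on 2019-05-02.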
After creating a visualization you like, you can add it to one of your dashboards or create a new dashboard for it.
Getting data into Apache Superset
While Apache Superset can connect to numerous data sources, one of the most common ones is a data warehouse that combines data from various sources (CMS, CRM, analytics, social etc.).
To get data into a data warehouse, companies usually go for a data pipeline that can either stream or batch load data from different tools and other sources into a data warehouse.
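In its simplest form, a batch pipeline reads records from a source in chunks and writes each chunk to the warehouse in one transaction. The sketch below uses an in-memory SQLite database as a stand-in for the warehouse, with a hypothetical facts table and helper names. A real pipeline would use the warehouse's own client or bulk-load API instead.

```python
import sqlite3
from itertools import islice


def batches(records, size):
    """Yield lists of at most size records from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def batch_load(conn, records, size=500):
    """Insert (source, metric, value) tuples in batches; return the row count.

    Each batch runs in its own transaction: it is committed on success and
    rolled back if an insert fails, so a failed run leaves whole batches only.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts (source TEXT, metric TEXT, value REAL)"
    )
    loaded = 0
    for chunk in batches(records, size):
        with conn:  # sqlite3 connection as context manager: commit or rollback
            conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", chunk)
        loaded += len(chunk)
    return loaded


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    rows = [("crm", "deals_won", 3.0), ("analytics", "sessions", 1520.0)]
    print(batch_load(warehouse, rows), "rows loaded")
```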
One of the most common data warehouses on the market is Google BigQuery, which connects easily with Apache Superset. To get data into BigQuery, we recommend using a Marketing Data Pipeline from Reflective Data.
Conclusion
Apache Superset is a powerful business intelligence tool that has flexible data visualization options and is ready for enterprise usage.
As you saw in this article, getting Superset up and running in your development environment is fast and easy. If you have any interest in business intelligence and data visualization, I strongly recommend giving Superset a try.
If you have already used Apache Superset or have any questions, feel free to share in the comments below.
Hello Silver,
I have created a Python Django application and I want to link a dashboard that I made in Superset to it.
How could I do that? I did a lot of research on the internet but didn’t find an appropriate answer.
It would be very helpful if you could help me integrate this project.
Hi Bhavik,
I would need more details about what you’re trying to build here.
What exactly do you mean by linking the dashboard with your Django application? Do you want to embed your Superset dashboard within your application, view data about your application in Superset, or something else?
Silver
Hi Silver,
First of all, thanks for your response.
I want to embed my Superset dashboard within my Django application.
I have created a dashboard button in my Django application, and when I click that button I should see the dashboard I made in Superset.
Hi Bhavik,
We haven’t done this ourselves but I recommend you check out this feature currently in beta.
https://github.com/apache/superset/issues/17187
Hi Silver,
I have a requirement in one of my projects to display all kinds of reports in Superset (related to Agile). The data we rely on resides in JIRA. These are the objectives to achieve:
1. Export data from JIRA to the backend database of Superset.
2. Automate the data loading process and make it real-time, i.e. as soon as a new task is created or updated, it should be loaded into the backend immediately.
3. Customizable and flexible dashboard accommodating new reports and questions/demands from key stakeholders.
Please let me know your valuable suggestions and comments on these objectives. That would really help in getting this project started.
I appreciate your help.
Thanks,
Adeel Faraz
Hi Silver,
Can we integrate Superset into an application?
Regards,
Ashish
Hi Ashish,
Since it’s open-source software, sure. But this will require a good amount of custom development unless you use something really basic like HTML iframes. I don’t think there’s a good guide for doing this, though.
Silver
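The iframe approach can be sketched as a helper that returns the HTML tag for a dashboard, which a Django view could then pass into a template. The helper name is hypothetical. The URL pattern /superset/dashboard/<id>/ and the standalone=true parameter (which hides the navigation bar) match older Superset releases and are assumptions for newer ones. Viewers also need access to the dashboard in Superset, for example by being logged in.

```python
from html import escape


def dashboard_iframe(base_url, dashboard_id, width="100%", height=800):
    """Return an <iframe> tag that embeds a Superset dashboard.

    Assumes the /superset/dashboard/<id>/ URL pattern and the standalone=true
    parameter of older Superset releases; check both against your version.
    """
    url = "{}/superset/dashboard/{}/?standalone=true".format(
        base_url.rstrip("/"), dashboard_id
    )
    # Escape every value placed inside an HTML attribute.
    return '<iframe src="{}" width="{}" height="{}" frameborder="0"></iframe>'.format(
        escape(url), escape(str(width)), escape(str(height))
    )
```

For example, dashboard_iframe("http://localhost:8088/", 5) returns an iframe whose src is http://localhost:8088/superset/dashboard/5/?standalone=true.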
Hi Silver,
Is there an open-source enterprise version of Superset available?
We are able to see the Superset developer version.
How and where can we identify the Enterprise version?
Hi Team,
Can you please provide the basic steps to install Superset on Red Hat Linux, and what are the prerequisites?
Hi Silver,
Is there a way to check how much traffic I’m getting on my superset dashboard and how much time is each user spending on various sub-tabs within the dashboard?
Hi Ram,
Theoretically, you should be able to install something like Google Analytics on Superset, but we haven’t done this in practice.
Hi Silver,
When I execute this command
# Load some data to play with
superset load_examples
I get an error that,
load_examples is not a known command.
Can you help me?
Hello James!
Just checked and works for me. Please check/ask here https://github.com/apache/incubator-superset/issues