What is an Analytics Data Pipeline

An analytics data pipeline is a system that streams and/or batch loads data from different data sources into one or more databases or data warehouses.

Most analytics data pipelines are built on cloud infrastructure (Google Cloud Platform, Amazon Web Services, Microsoft Azure, etc.) and combine several of the provider's services.
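The difference between streaming and batch loading from the definition above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the destination; the `events` table, the `(source, name)` event shape and the function names are hypothetical and chosen only for illustration.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical minimal event shape: (source, name)
Event = Tuple[str, str]


def connect() -> sqlite3.Connection:
    """Create an in-memory database standing in for the destination."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (source TEXT, name TEXT)")
    return conn


def stream_load(conn: sqlite3.Connection, events: Iterable[Event]) -> None:
    """Streaming: write and commit each event as soon as it arrives."""
    for event in events:
        conn.execute("INSERT INTO events VALUES (?, ?)", event)
        conn.commit()


def batch_load(conn: sqlite3.Connection, events: Iterable[Event],
               batch_size: int = 2) -> None:
    """Batch: buffer events and write them in groups of batch_size."""
    batch: List[Event] = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO events VALUES (?, ?)", batch)
            conn.commit()
            batch = []
    if batch:
        # Flush whatever is left over after the last full batch
        conn.executemany("INSERT INTO events VALUES (?, ?)", batch)
        conn.commit()


conn = connect()
stream_load(conn, [("website", "click")])
batch_load(conn, [("server", "order"), ("server", "payment"), ("ads", "impression")])
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(row_count)
```

In a real pipeline the streaming path would typically read from a message queue and the batch path from files or an API export, but the trade-off is the same: lower latency per event versus fewer, larger writes.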

Data sources

Events are the basis of most data pipelines. They trigger other actions and take up the largest share of storage in your data lake or warehouse.

A wide variety of events may act as a source for your data pipeline (e.g. events from Google Analytics). Here are the six main categories of sources that generate an ongoing flow of events.

Website users
- Click stream
- Google Analytics parallel tracking
- Form analytics
Server events
- Orders
- Payments
Mobile Apps
- Real-time
- Batch
Ads
- Google Ads
- Facebook Ads
- Twitter Ads
- Others
Feedback tools
- Surveys
- On-site polls
3rd parties
- Testing tools
- Personalization tools
- Email providers
- Call tracking
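To make the categories above concrete, here is one possible way to represent an event before it enters the pipeline. This is only a sketch: the `Event` class, its field names and the category labels in `SOURCE_CATEGORIES` are assumptions made for illustration, not a standard schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

# Hypothetical labels for the six source categories listed above
SOURCE_CATEGORIES = {
    "website", "server", "mobile_app", "ads", "feedback", "third_party",
}


@dataclass
class Event:
    source: str                      # one of SOURCE_CATEGORIES
    name: str                        # e.g. "click", "order", "survey"
    payload: Dict[str, Any] = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Reject events from a source the pipeline does not know about
        if self.source not in SOURCE_CATEGORIES:
            raise ValueError(f"unknown source category: {self.source}")

    def to_json(self) -> str:
        """Serialize the event so it can be sent to a queue or a file."""
        return json.dumps(asdict(self), sort_keys=True)


order = Event(source="server", name="order",
              payload={"order_id": "A-1", "total": 42.5})
decoded = json.loads(order.to_json())
print(decoded["source"], decoded["name"])
```

Keeping the source-specific details in a free-form `payload` lets all six categories share one envelope, at the cost of pushing validation of those details further downstream.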

Data destinations

Depending on the size of the organization, the amount of data generated and the planned use cases, the choice of data destinations can be quite different.

Relational database

A relational database is a solid and, in many cases, the most effective storage option for an analytics data pipeline. Its relational nature, though, can become limiting when working with large amounts of unstructured or semi-structured data. This is where data warehouses and data lakes come into play.

The most common relational databases are MySQL and PostgreSQL.
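The limitation with semi-structured data can be seen in a small sketch. It uses SQLite from the Python standard library in place of MySQL or PostgreSQL; the table names, columns and sample records are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Structured data: every order has the same columns, so SQL works directly
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, total REAL)")
conn.execute("INSERT INTO orders VALUES (?, ?)", ("A-1", 42.5))
order_total = conn.execute("SELECT SUM(total) FROM orders").fetchone()[0]

# Semi-structured data: the shape differs per event, so a fixed set of
# columns does not fit and the payload is stored as JSON text instead
conn.execute("CREATE TABLE raw_events (name TEXT, payload TEXT)")
events = [
    ("click", {"page": "/pricing", "button": "signup"}),
    ("survey", {"score": 9, "comment": "Great"}),
]
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [(name, json.dumps(payload)) for name, payload in events],
)

# Individual fields now have to be parsed out in application code
rows = conn.execute("SELECT payload FROM raw_events WHERE name = 'survey'")
scores = [json.loads(payload)["score"] for (payload,) in rows]
print(order_total, scores)
```

Orders aggregate with a single SQL query, while the survey score has to be extracted after the fact. As the number of event shapes grows, this kind of workaround is what tends to push teams toward a data warehouse or data lake.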

Data warehouse or data lake

Data warehouses and data lakes are robust data storage solutions that are specifically designed to ingest data from analytics data pipelines. They are often cloud-hosted and can hold massive amounts of data.

The most common data warehouses are BigQuery, Amazon Redshift and Snowflake.

Last modified: April 27, 2020
