An analytics data pipeline is a system that streams and/or batch-loads data from different data sources into one or more databases or data warehouses.
Most analytics data pipelines are built on cloud infrastructure (Google Cloud Platform, Amazon Web Services, Microsoft Azure, etc.) and leverage multiple services.
Events are the basis of most data pipelines: they trigger other actions and account for the largest share of storage in your data lake or warehouse.
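To make the idea concrete, here is a minimal sketch of what an analytics event record might look like before it enters a pipeline. The field names (`event_id`, `event_name`, `timestamp`, `properties`) are illustrative assumptions, not a standard schema.

```python
import json
import time
import uuid

def make_event(name, properties=None):
    """Build a minimal analytics event record (illustrative schema)."""
    return {
        "event_id": str(uuid.uuid4()),   # unique id, useful for deduplication
        "event_name": name,              # e.g. "page_view", "purchase"
        "timestamp": time.time(),        # when the event occurred
        "properties": properties or {},  # arbitrary event payload
    }

# Events are typically serialized (e.g. as JSON) before being
# streamed or batch-loaded toward the destination store.
event = make_event("page_view", {"url": "/pricing", "referrer": "google"})
payload = json.dumps(event)
```

Each source category below emits records of roughly this shape, differing mainly in the event names and properties they carry.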
There is a wide variety of events that may act as a source for your data pipeline (e.g. Google Analytics). Here are the six main categories of sources that generate an ongoing flow of events.
- Website users
  - Click stream
  - Google Analytics parallel tracking
  - Form analytics
- Server events
- Mobile Apps
- Advertising platforms
  - Google Ads
  - Facebook Ads
  - Twitter Ads
- Feedback tools
  - On-site polls
- 3rd parties
  - Testing tools
  - Personalization tools
  - Email providers
  - Call tracking
Depending on the size of the organization, the amount of data generated, and the planned use cases, the choice of data destinations can differ considerably.
A relational database is, in many cases, the most effective storage option for an analytics data pipeline. Its relational nature, though, can become limiting when working with large amounts of unstructured or semi-structured data. This is where data warehouses and data lakes come into play.
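As a minimal sketch of the relational option, the example below loads events into a SQL table, with the semi-structured `properties` payload squeezed into a JSON text column. The table layout is an assumption for illustration; `sqlite3` stands in for any relational database (Postgres, MySQL, etc.).

```python
import json
import sqlite3

# Relational destination: a fixed schema with one column per known field.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_id   TEXT PRIMARY KEY,
        event_name TEXT NOT NULL,
        ts         REAL NOT NULL,
        properties TEXT            -- semi-structured payload stored as JSON text
    )
""")

def load_event(conn, event):
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        (event["event_id"], event["event_name"], event["timestamp"],
         json.dumps(event["properties"])),
    )

load_event(conn, {"event_id": "e1", "event_name": "page_view",
                  "timestamp": 1588000000.0,
                  "properties": {"url": "/pricing"}})
(count,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
```

Note the trade-off this illustrates: anything that does not fit the fixed columns ends up as opaque JSON text, which the database cannot index or query natively. That limitation is what the next section's warehouse and lake options address.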
Data warehouse or data lake
Data warehouses and data lakes are robust data storage solutions that are specifically designed to ingest data from analytics data pipelines. They are often cloud-hosted and can hold massive amounts of data.
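A common data-lake ingestion pattern is to batch-load raw events as newline-delimited JSON files in date-partitioned directories (`dt=YYYY-MM-DD/`), which warehouse engines can then read as external tables. The sketch below shows that layout with the local filesystem standing in for cloud object storage; the paths and partition key are illustrative assumptions.

```python
import datetime
import json
import pathlib
import tempfile

def write_batch(root, events):
    """Batch-load raw events into a date-partitioned data-lake layout:
    root/dt=YYYY-MM-DD/events.ndjson (one JSON object per line)."""
    root = pathlib.Path(root)
    for event in events:
        day = datetime.datetime.fromtimestamp(
            event["timestamp"], tz=datetime.timezone.utc
        ).date().isoformat()
        part = root / f"dt={day}"
        part.mkdir(parents=True, exist_ok=True)
        with open(part / "events.ndjson", "a") as f:
            f.write(json.dumps(event) + "\n")
    return sorted(p.name for p in root.iterdir())

root = tempfile.mkdtemp()
partitions = write_batch(root, [
    {"timestamp": 1588000000.0, "event_name": "page_view"},
    {"timestamp": 1588086400.0, "event_name": "purchase"},
])
```

Because each event keeps its full, schema-free JSON payload, this layout avoids the fixed-column limitation of the relational approach, at the cost of pushing schema interpretation down to query time.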
Other tools and platforms
- Analytics Data Pipeline as a Service
- Six Key Components of an Analytics Data Pipeline
- A Simple and Scalable Analytics Pipeline
- Google Analytics Parallel Tracking
Last modified: April 27, 2020
Want to see more articles like What is Analytics Data Pipeline? Check out all definitions in the Analytics Dictionary.