The number of marketing tools an average business uses has grown rapidly. Besides one or two analytics platforms, there are usually a few ads platforms, a CRM, a CMS, several social media platforms, an email system and probably a few more tools.
All of those tools are supposed to make our work as marketers, business owners or data analysts easier and more effective. In reality, though, you will end up with a bunch of silos – systems that don’t really communicate well with each other and almost never agree on any of the important metrics.
Data silos create confusion and disagreement between teams, leading to a situation where, at the end of the day, no-one knows which tool or numbers to trust.
You need a marketing data warehouse.
As time goes on and your business grows, the number of marketing tools and platforms involved is likely to grow as well. Without one central location for all data, this means more and more data silos and less efficiency.
A marketing data warehouse is the only real solution to break these silos. To put it very simply, you need a system that sends all of your marketing data, from all tools, into one central location.
Below are some of the reasons that should make it clear that your business needs a marketing data warehouse.
– Single source of truth: the main benefit you receive from having a marketing data warehouse is that everyone from your business will look at the same metrics, calculated the same way and coming from one location. This alone will save you from a lot of confusion and possible conflicts between teams and functions.
– A complete overview of the user journey: no single tool knows everything about your users, so looking at any one of them doesn’t show you the complete user journey. Combining data from all tools and sources, both online and offline, gives you a much more detailed picture of the entire user journey. It also gives you much more accurate long-term metrics like users’ lifetime value.
– Access to raw data: most marketing tools’ user interfaces show you aggregated data and calculated metrics. Besides limiting your ability to do your own calculations, this is also the main reason why data from different tools never match – each tool measures and calculates metrics like users, sessions, conversion rate and revenue in a different way. When working with raw data, you define your own aggregation and calculation rules, and these rules should be agreed on at the company level.
– More accurate attribution: tools like Google Analytics are great when you’re just getting started with marketing channel attribution. What most tools lack, though, is access to all touch-points (online and offline) and the ability to use custom attribution models (Markov, FBA, ML-based etc.).
– Manual data joining doesn’t scale: exporting data from various sources and joining them in Excel/Sheets can give you interesting insights you wouldn’t have seen in any single tool but let’s be honest, this system isn’t very scalable. Working with a data warehouse, you can pre-join commonly combined data into views or pull everything together in a dashboard.
– Visualization and BI tools: unfortunately, not all marketing tools have a connector for every major data visualization and BI tool. Working with CSV files and Sheets is neither automatic nor scalable. Data warehouses like BigQuery have a native integration with nearly all decent BI and visualization tools.
– Machine learning: with tools like BigQuery ML, AutoML and others, machine learning has become available to the masses. The only part that’s missing, though, is access to the high-quality raw hit-level data that is required for training your models. With all your marketing data in your data warehouse, you can build models for anything from product recommendations to smarter remarketing campaigns.
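To make the attribution point above more concrete, here is a minimal sketch of two simple attribution rules applied to journeys that mix online and offline touch-points. The journeys, channel names and conversion values are made up for illustration, and linear and last-touch attribution are stand-ins chosen for brevity, not the Markov or ML-based models mentioned in the list.

```python
from collections import defaultdict


def linear_attribution(journeys):
    """Split each conversion's value equally across all touch-points in the journey."""
    credit = defaultdict(float)
    for touchpoints, value in journeys:
        if not touchpoints:
            continue  # nothing to credit
        share = value / len(touchpoints)
        for channel in touchpoints:
            credit[channel] += share
    return dict(credit)


def last_touch_attribution(journeys):
    """Give the full conversion value to the last touch-point in the journey."""
    credit = defaultdict(float)
    for touchpoints, value in journeys:
        if touchpoints:
            credit[touchpoints[-1]] += value
    return dict(credit)


# Hypothetical journeys: (ordered touch-points, conversion value).
# "store_visit" stands for an offline touch-point that a single web tool would not see.
journeys = [
    (["paid_search", "email", "store_visit"], 90.0),
    (["email"], 30.0),
    (["paid_social", "paid_search"], 40.0),
]

linear = linear_attribution(journeys)
last = last_touch_attribution(journeys)
```

The two rules distribute the same total revenue very differently across channels, which is why the rule itself should be something the whole company agrees on.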
***
I could go on for hours writing about the benefits of having a proper data warehouse. What about you? Let us know in the comments below why your business needs (or already has) a data warehouse.
How to set up a data warehouse?
Having a data warehouse is kind of a no-brainer these days. That doesn’t mean, though, that setting one up is as simple as 1-2-3.
Yes, there are some tools that are more or less plug-and-play solutions. But just as every business is different, so are its requirements for a data warehouse. There is no real one-size-fits-all solution here.
Mapping data sources
Before thinking of anything else, map all the data sources you have. This can include but isn’t limited to: Google Analytics, CRM, CMS, offline data, Ads platforms, email tools, SMS and push notification platforms etc.
Next, try to group all data sources in a document and sort them in a logical order. For example, (1) analytics platforms, (2) ads platforms etc.
After writing down and sorting your data sources, think about each one. Take a look at their documentation and data structure. In your document, note what kind of data you’d need in your data warehouse and in what format. Start by thinking of how you’re using this data today.
Furthermore, take note of how fresh each dataset has to be. Do you need it in real time or hourly, or is a daily update fine?
As an extra, I recommend you check if your data source has a public API or if they allow automated data export. If that’s not the case, it’s highly likely you can’t easily export data from the tool and I’d look for an alternative for that tool. Every decent marketing tool should have a public API for exporting data.
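The mapping document described above can also be kept as structured data, which makes it easy to sort and query later. The sketch below is one possible shape, not a prescribed format: the `DataSource` fields, the category order and all entries are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    category: str         # e.g. "analytics", "ads", "crm", "messaging"
    freshness: str        # "real-time", "hourly" or "daily"
    has_export_api: bool  # public API or automated export available?


# Hypothetical inventory; replace with your own tools and requirements.
sources = [
    DataSource("Push notification tool", "messaging", "real-time", False),
    DataSource("CRM", "crm", "daily", True),
    DataSource("Web analytics tool", "analytics", "hourly", True),
    DataSource("Search ads platform", "ads", "daily", True),
]

# The logical order used for sorting: (1) analytics, (2) ads, and so on.
CATEGORY_ORDER = ["analytics", "ads", "crm", "messaging"]


def sort_sources(items):
    """Sort by category order first, then by name within a category."""
    return sorted(items, key=lambda s: (CATEGORY_ORDER.index(s.category), s.name))


def needs_alternative(items):
    """Sources without an export API are candidates for replacement."""
    return [s.name for s in items if not s.has_export_api]


def needs_streaming(items):
    """Sources that must be loaded in real time rather than in batches."""
    return [s.name for s in items if s.freshness == "real-time"]


ordered_names = [s.name for s in sort_sources(sources)]
```

Keeping freshness and API availability next to each source means the later decisions (batch load vs stream, keep or replace a tool) can be read straight off the inventory.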
Mapping use cases
Based on the list of data sources you have available, now it’s time to think about all the use cases for a data warehouse that includes all of them.
Again, I recommend you start a document and structure it logically based on some category. For example, (1) attribution, (2) user behavior analysis, (3) alerts etc.
Of course, you don’t have to think about everything beforehand, but doing so helps you later when deciding on the right schema, load frequency (batch load vs stream), pre-calculated views and metrics etc.
Start by thinking of all the ways you are currently using the data available within each marketing tool. Next, think of ways you’ve joined the data in the past using tools like Excel or Google Sheets. Then, I recommend you get creative. Think of things you were never able to get from your data but would really like to. For example, analyzing the user journey across all touchpoints, online and offline.
Build or buy?
Building a data warehouse has never been easier. Every major cloud platform has its own offering and several solutions have been built on top of the major cloud platforms.
To be honest, setting up a data warehouse is the easy part. What makes it difficult, though, is getting data into the data warehouse – automatically, reliably, fast and without duplicates. This part is known as a data pipeline and this is the system that feeds your data warehouse and, ultimately, your business with trustworthy data.
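The "without duplicates" requirement is worth a small illustration. One common way to get there is to make loads idempotent: every row carries a unique key, and loading the same row twice overwrites it instead of adding a copy. The sketch below uses a plain dictionary as a stand-in for a warehouse table, and `event_id` is an assumed key name; a real pipeline would rely on the warehouse's own merge or upsert mechanism.

```python
def load_batch(table, rows, key="event_id"):
    """Upsert rows into `table` (a dict keyed by `key`) and return how many were new.

    Re-delivered rows overwrite the existing entry, so retries never create duplicates.
    """
    inserted = 0
    for row in rows:
        if row[key] not in table:
            inserted += 1
        table[row[key]] = row
    return inserted


# Hypothetical example: the second load repeats the first batch and adds one new row,
# as can happen when an export job is retried after a failure.
table = {}
batch = [
    {"event_id": "a1", "revenue": 10},
    {"event_id": "a2", "revenue": 5},
]

first_load = load_batch(table, batch)
retry_load = load_batch(table, batch + [{"event_id": "a3", "revenue": 7}])
total_revenue = sum(row["revenue"] for row in table.values())
```

Because the retry only adds the one genuinely new row, the revenue total stays correct no matter how often a batch is re-sent.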
Now, where do you get a solid data pipeline? You have three options.
1. Build: if your company has enough resources (money, time, developers, data scientists, data engineers) then building everything in-house might be a good idea. There are some useful building blocks available in most cloud platforms (e.g. Cloud Dataflow, Pub/Sub and Cloud Functions in GCP). What I would recommend, though, is to make sure you have at least one person on the team who has led a data pipeline build before, or at least someone you could consult with. Otherwise, there are things that will probably go wrong in your first attempt.
2. Ready-made solutions: depending on the level of customization you require, some of the ready-made data pipeline solutions might fit most of your needs. Some tools to check out are Stitch, Fivetran, Funnel and Segment. All of them do a great job at what they’re built for – offering a fixed or semi-flexible data pipeline to feed your data warehouse with data. Unfortunately, though, each of them has its shortcomings. For example, none of them is able to send raw hit-level Google Analytics data into your data warehouse and process it into sessions based on your rules. Oh, and setting such a tool up can still be a considerable amount of work.
3. Data Pipeline as a Service: to gain full control of your marketing data, you need a system that’s tailored to your business’ needs. A way to get there is to hire a company that has all the necessary infrastructure (connectors, integrations, batch loaders, monitoring systems etc.) and is willing to take a personal approach with every client. This means getting to know your business, interviewing your team and building a solid data pipeline using the building blocks they already have, and coding everything they don’t.
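Option 2 above mentions processing raw hit-level data into sessions based on your own rules. As a rough sketch of what such a rule can look like, the function below starts a new session whenever a user has been inactive for longer than a timeout. The 30-minute default, the `(user_id, timestamp)` hit format and the sample hits are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def sessionize(hits, timeout=timedelta(minutes=30)):
    """Group (user_id, timestamp) hits into sessions.

    A new session starts when the gap since the user's previous hit exceeds `timeout`.
    """
    sessions = []
    last_seen = {}  # user_id -> (timestamp of previous hit, index of current session)
    for user_id, ts in sorted(hits):
        previous = last_seen.get(user_id)
        if previous is None or ts - previous[0] > timeout:
            sessions.append({"user_id": user_id, "start": ts, "end": ts, "hits": 1})
            index = len(sessions) - 1
        else:
            index = previous[1]
            sessions[index]["end"] = ts
            sessions[index]["hits"] += 1
        last_seen[user_id] = (ts, index)
    return sessions


# Hypothetical raw hits for two users.
hits = [
    ("u1", datetime(2024, 1, 1, 10, 0)),
    ("u1", datetime(2024, 1, 1, 10, 10)),
    ("u2", datetime(2024, 1, 1, 10, 5)),
    ("u1", datetime(2024, 1, 1, 11, 0)),
]

default_sessions = sessionize(hits)
long_timeout_sessions = sessionize(hits, timeout=timedelta(minutes=60))
```

Changing only the timeout changes the session count for the same raw hits, which is exactly the kind of rule that differs between tools and that you control once you own the raw data.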
***
Start by mapping your data sources and use cases as described above. Then see if any of the ready-made solutions could fit your needs. If you need a more customized/advanced solution, think about your internal resources. If you need help with deciding which way to go or want to learn more about data pipeline as a service, talk to experts at RD.
Costs
No matter what path you choose, your costs will mostly depend on the number of different data sources and the amount of data you’re going to collect.
1. Build: the process of planning, building and testing a data pipeline is likely to take 1-3 months. Pricing for the cloud platform depends on usage but expect to pay at least $250/month. Most cloud platforms offer calculators to help you estimate the cost (e.g. Google Cloud Platform). You are probably not going to build all connectors yourself, so expect to pay $500 – $1,000 for tools like Stitch and SuperMetrics.
2. Ready-made solutions: some solutions start as low as $120/mo but the pricing grows rapidly into thousands based on your usage. Another thing to keep in mind is that some providers have free connectors and premium connectors, meaning you might have to pay extra for some data sources.
3. Data Pipeline as a Service: when hiring a company to build you a custom data pipeline, there will almost always be an initial setup cost involved. This depends heavily on the number of connectors and the customization required. A very light setup usually starts around $500 and a decent pipeline usually costs a few thousand dollars (one-time fee). In most cases, the monthly cost depends on usage only. In fact, we’ve seen this to be the most cost-effective solution of the three in the long run. This is because there is no premium to pay for certain connectors and data streams are better optimized than in most in-house solutions.
How to move forward
As mentioned before, start by mapping your data sources and use cases. Then, based on your needs and the resources available, try to pick one of the paths listed in the previous section. If you’re not sure whether a solution is right for you, I’d recommend contacting some service providers or independent data warehousing experts. You can start by posting your questions or ideas in the comments below this article.
Learn more about the Data Pipeline as a Service offered by Reflective Data and schedule a call with one of the experts today.
Couldn’t agree more. Before we invested big in our marketing data warehouse, everything was in silos and it was a real pain in the ass to be honest. BigQuery is by far the best and most cost effective option out there. Highly recommended!