Collecting close to or above 1 million events per day? Then you’ve probably noticed that the free version of GA4 comes with several limitations around sampling and raw data export.
In this article, we explain the approaches you should take to get the most out of your data without losing valuable insights. Oh, and without having to upgrade to GA360.
Common problems with GA4 for high-traffic websites and apps
1. Quota Limits and Sampling
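- Issue: GA4 has quotas for API calls, data collection, and BigQuery exports. Standard (free) properties, for example, are limited to 1 million events per day in the daily BigQuery export. High-traffic sites can exceed these limits, leading to incomplete or delayed data.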
- Symptoms:
- Delayed or partial data in reports.
- Data sampling in the interface or when querying large datasets.
- Only partial data is exported to BigQuery each day.
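One rough way to see whether a property is at risk is to compare daily event counts against the daily BigQuery export limit. The sketch below assumes the 1 million events per day limit for standard (free) properties; the function name and the sample counts are made up for illustration, so treat it as a starting point rather than a definitive check.

```python
from datetime import date

# Daily BigQuery export limit for standard (free) GA4 properties.
# This value is an assumption; verify it against the current GA4 documentation.
DAILY_EXPORT_LIMIT = 1_000_000


def days_over_export_limit(daily_counts, limit=DAILY_EXPORT_LIMIT):
    """Return {day: events_over_limit} for days whose count exceeds the limit."""
    return {
        day: count - limit
        for day, count in sorted(daily_counts.items())
        if count > limit
    }


# Illustrative numbers only, not real traffic data.
sample_counts = {
    date(2024, 1, 1): 850_000,
    date(2024, 1, 2): 1_200_000,
    date(2024, 1, 3): 1_000_000,
}

over = days_over_export_limit(sample_counts)
for day, excess in over.items():
    print(f"{day}: {excess:,} events over the daily export limit")
```

In practice, the daily counts would come from your own reporting or logs rather than a hard-coded dictionary.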
2. Latency in Data Processing
- Issue: High traffic volumes can lead to delays in GA4 processing, resulting in longer wait times for data to appear in reports.
- Symptoms:
- Real-time reports lagging or missing data.
- Delays in standard report updates.
3. Data Thresholding
- Issue: GA4 applies data thresholding to reports that include user identifiers or demographic data, hiding rows where user counts are too low. High-traffic sites still run into this in granular breakdowns.
- Symptoms:
- Reports showing incomplete or aggregated data.
- Warning messages about thresholding.
4. Event Limitations
- Issue: GA4 has a limit of 500 distinct event names per property. High-traffic sites often generate a large variety of events, which can exceed this limit.
- Symptoms:
- Events not being logged or missing in reports.
5. Data Retention Settings
- Issue: GA4 keeps detailed user-level data for 2 months by default, extendable to 14 months on standard properties, which can be insufficient for long-term analysis.
- Symptoms:
- Historical data is no longer accessible in GA4 after the retention period.
6. Overwhelming Volume of Custom Dimensions/Parameters
- Issue: GA4 standard properties allow up to 50 event-scoped and 25 user-scoped custom dimensions. High-traffic apps often push these limits, causing issues with tracking extra custom data.
- Symptoms:
- Missing or dropped dimensions in reports.
7. Cross-Platform and Cross-Domain Tracking
- Issue: High-traffic businesses with complex setups (e.g., mobile app + website) may face difficulties implementing seamless cross-platform tracking.
- Symptoms:
- Duplicate or fragmented user sessions across platforms.
8. Debugging and Testing Challenges
- Issue: High traffic can make it difficult to test changes without impacting production data.
- Symptoms:
- Errors in tracking configurations affecting a large user base.
9. Increased Cost for BigQuery Analysis
- Issue: Exporting large volumes of data to BigQuery can result in significant costs, especially when running frequent or complex queries.
- Symptoms:
- Unexpectedly high cloud bills for BigQuery.
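To get a feel for how query costs add up, here is a back-of-the-envelope estimate for BigQuery on-demand pricing, which bills by bytes scanned. The function names, the query size, the schedule, and the price of 6.25 USD per TiB are all assumptions for illustration; check the current pricing for your region before relying on any figure.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def estimate_query_cost(bytes_scanned, price_per_tib_usd):
    """Estimate the on-demand cost of one query from the bytes it scans."""
    return bytes_scanned / TIB * price_per_tib_usd


def estimate_monthly_cost(bytes_per_query, runs_per_day, price_per_tib_usd, days=30):
    """Estimate the monthly cost of a query that runs on a fixed schedule."""
    return estimate_query_cost(bytes_per_query, price_per_tib_usd) * runs_per_day * days


# Hypothetical inputs: a dashboard query scanning 0.5 TiB, refreshed 24 times
# a day, at an assumed on-demand price of 6.25 USD per TiB.
monthly = estimate_monthly_cost(
    bytes_per_query=0.5 * TIB,
    runs_per_day=24,
    price_per_tib_usd=6.25,
)
print(f"Estimated monthly cost: ${monthly:,.2f}")
```

Even a single frequently refreshed query can dominate the bill, which is why partitioning tables and selecting only the needed columns tends to matter more than raw storage volume.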
10. User Privacy and Compliance Challenges
- Issue: High-traffic businesses are more likely to face scrutiny regarding GDPR, CCPA, and other privacy laws.
- Symptoms:
- Compliance risks due to insufficient data anonymization or consent management.
If any of these issues sound familiar, keep reading: there are solutions to all of them.
How to tackle common issues with GA4 for high-traffic websites and apps
As companies start hitting the limits of the free version of GA4, they often consider upgrading to GA360. True, it eases some of the issues by offering a higher daily BigQuery export limit, more custom event parameters, more custom dimensions and so on, but eventually you may end up hitting those limits, too. Besides, upgrading to 360 doesn’t solve some of the issues around GDPR, and data sampling can still be a problem. Not to mention the cost…
The solution we are discussing in this article is known as Parallel Tracking.
In short, here’s how Parallel Tracking for GA4 works.
1. Tracking Code Adjustment
A minor update to the Google Analytics tracking code is necessary to enable streaming all hits to Reflective Data’s endpoint. This approach, called Parallel Tracking, is compatible with all types of GA4 implementations, including GTM, server-side setups, gtag.js, and third-party applications.
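The article does not show the actual tracking code change, so the following is only a rough illustration of the idea for a server-side setup: the same event payload is prepared for two destinations. It assumes the GA4 Measurement Protocol as the first destination; `PARALLEL_URL`, the function name, and all IDs are hypothetical placeholders and not Reflective Data's real endpoint or API.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

GA4_MP_URL = "https://www.google-analytics.com/mp/collect"
# Hypothetical placeholder, not a real Reflective Data endpoint.
PARALLEL_URL = "https://collect.example.com/ga4"


def build_parallel_requests(client_id, event_name, params,
                            measurement_id, api_secret):
    """Build two POST requests that carry the same event payload."""
    body = json.dumps({
        "client_id": client_id,
        "events": [{"name": event_name, "params": params}],
    }).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    query = urlencode({"measurement_id": measurement_id, "api_secret": api_secret})
    ga4 = Request(f"{GA4_MP_URL}?{query}", data=body, headers=headers, method="POST")
    parallel = Request(PARALLEL_URL, data=body, headers=headers, method="POST")
    return [ga4, parallel]


requests_to_send = build_parallel_requests(
    client_id="123.456",
    event_name="purchase",
    params={"value": 19.99, "currency": "EUR"},
    measurement_id="G-XXXXXXX",
    api_secret="placeholder-secret",
)
for req in requests_to_send:
    # Sending is left out on purpose; urllib.request.urlopen(req) would post it.
    print(req.get_method(), req.full_url)
```

The key point is that both destinations receive an identical payload, so the parallel copy can be processed independently of GA4's own limits.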
2. Event Processing
The Data Processing Engine captures and processes all events in the same manner as Google Analytics. Designed for nearly unlimited scalability, it avoids the data processing restrictions found in GA4. All operations are hosted on Google Cloud, with the flexibility to choose your preferred region.
3. Data Storage in Your Warehouse
By default, data is stored in Google BigQuery, though we also support other destinations on AWS and Azure, as well as Snowflake. All data is fully processed and ready for reporting within seconds.
Reflective Data does not store any events or other data on its servers at any time.
How Parallel Tracking solves common limitations in GA4
1. Quota Limits and Sampling
- Issue: GA4 has quotas for API calls, data collection, and BigQuery exports.
- Solution:
- Parallel Tracking for GA4 places no limits on BigQuery exports. We know of sites that export well over 10M events per day without losing any.
2. Latency in Data Processing
- Issue: High traffic volumes can lead to delays in GA4 processing, resulting in longer wait times for data to appear in reports.
- Solution:
- With Parallel Tracking, you can have GA4 events processed and stored in your data warehouse within seconds.
3. Data Thresholding
- Issue: GA4 applies data thresholding to reports that include user identifiers or demographic data, hiding rows where user counts are too low. High-traffic sites still run into this in granular breakdowns.
- Solution:
- Parallel Tracking for GA4 allows you to work with 100% of the data. No limits on data cardinality or event counts.
4. Event Limitations
- Issue: GA4 has a limit of 500 distinct event names per property. High-traffic sites often generate a large variety of events, which can exceed this limit.
- Solution:
- With Parallel Tracking, there are no limits to distinct event names. Create, trigger and send as many unique events as necessary for the use case.
5. Data Retention Settings
- Issue: GA4 keeps detailed user-level data for 2 months by default, extendable to 14 months on standard properties, which can be insufficient for long-term analysis.
- Solution:
- Using Parallel Tracking to send GA4 data into a data warehouse of your choice (BigQuery, S3, Snowflake, Redshift etc.) gives you full control and ownership of your data, including how long you keep it.
6. Overwhelming Volume of Custom Dimensions/Parameters
- Issue: GA4 standard properties allow up to 50 event-scoped and 25 user-scoped custom dimensions. High-traffic apps often push these limits, causing issues with tracking extra custom data.
- Solution:
- Parallel Tracking doesn’t set any limits on the number of custom dimensions, metrics or event parameters that you are allowed to collect.
7. Cross-Platform and Cross-Domain Tracking
- Issue: High-traffic businesses with complex setups (e.g., mobile app + website) may face difficulties implementing seamless cross-platform tracking.
- Solution:
- Parallel Tracking for GA4 gives you full flexibility to distinguish and join data from various sources, including web domains and subdomains, mobile apps, web apps, server applications and more.
8. Debugging and Testing Challenges
- Issue: High traffic can make it difficult to test changes without impacting production data.
- Solution:
- Parallel Tracking comes with a robust environment for staging and testing all changes before pushing them into production. Never lose valuable data because of a broken tracking system again.
9. Increased Cost for BigQuery Analysis
- Issue: Exporting large volumes of data to BigQuery can result in significant costs, especially when running frequent or complex queries.
- Solution:
- Parallel Tracking gives you full control over your data pipeline, so you can adjust the settings to collect only what you need and avoid overpaying your data warehouse vendor.
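As an illustration of "collect only what you need", the sketch below trims events against an allowlist before they would be loaded into a warehouse. The allowlist, the event shape, and the function name are hypothetical and are not taken from Parallel Tracking's actual settings; the point is only that dropping unused events and parameters reduces the bytes you store and later scan.

```python
# Hypothetical allowlist: event name -> parameters worth storing.
ALLOWED = {
    "page_view": {"page_location", "page_title"},
    "purchase": {"transaction_id", "value", "currency"},
}


def trim_event(event, allowed=ALLOWED):
    """Return a trimmed copy of the event, or None if the event is not needed."""
    keep = allowed.get(event["name"])
    if keep is None:
        return None
    params = {k: v for k, v in event.get("params", {}).items() if k in keep}
    return {"name": event["name"], "params": params}


# Illustrative events only.
events = [
    {"name": "page_view", "params": {"page_location": "/pricing", "debug_info": "x" * 50}},
    {"name": "scroll", "params": {"percent_scrolled": 90}},
    {"name": "purchase", "params": {"transaction_id": "T-1", "value": 19.99, "currency": "EUR"}},
]

trimmed = [t for t in (trim_event(e) for e in events) if t is not None]
print(trimmed)
```

Here the `scroll` event is dropped entirely and the bulky `debug_info` parameter is removed from `page_view`, while the `purchase` event passes through unchanged.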
10. User Privacy and Compliance Challenges
- Issue: High-traffic businesses are more likely to face scrutiny regarding GDPR, CCPA, and other privacy laws.
- Solution:
- With Parallel Tracking you can choose in which region your data is processed and stored. For GDPR compliance, for example, several sites keep all their data within the EU (this includes collection, processing and long-term storage).
As you can see, Parallel Tracking is the best companion to your GA4 implementation on a high-traffic website. Not only is it many times more affordable than its alternatives (including GA360), but it’s also the only platform that provides a solution to all of the common issues.
If you want to learn more about GA4 Parallel Tracking, please schedule a free consultation session with one of Reflective Data’s account managers or data engineers (depending on your needs).