Sampling in Google Analytics – Points to Remember
WHAT IS GOOGLE ANALYTICS?
Google Analytics is a freemium web analytics service offered by Google that tracks and reports website traffic. For most companies, the website is the hub of all their digital traffic, which makes it the best place to evaluate the effectiveness of every campaign.
However, a weak analytics strategy will shackle the very basis of generating traffic and growing your business. Despite providing a plethora of tools, Google Analytics sometimes fails to process Herculean amounts of data. Sites with heavy traffic have reported receiving reports based on sampled data instead of the complete report they required. In these cases, Google Analytics is forced to work from a small pool of the data so that it can return a report instantly. Occasionally the sample is not statistically sound, or is a caricature of the original data, so the report ends up full of errors and is not accurate.
WHAT IS SAMPLING?
Analyzing an ocean of data without the help of technology is tedious and impractical. To solve this, analysts resort to a secret weapon called “Sampling.” Sampling is the practice of analyzing a small subset of a large data set and using it to draw conclusions about the whole.
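To make the idea concrete, here is a minimal Python sketch of estimating a total from a sample: count conversions in a simple random sample of sessions, then scale the count by the inverse of the sampling rate. The session data and the `estimate_total` helper are hypothetical, and simple random sampling is an assumption for illustration; this is not Google Analytics' own sampling algorithm.

```python
import random

def estimate_total(population, predicate, sample_rate, seed=0):
    """Estimate how many items satisfy `predicate` by examining only a
    simple random sample and scaling the count up by 1 / sample_rate."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = max(1, round(len(population) * sample_rate))
    sample = rng.sample(population, n)  # sampling without replacement
    hits = sum(1 for item in sample if predicate(item))
    return hits * len(population) / n

# Hypothetical data: 100,000 sessions, every 33rd one converted.
sessions = [i % 33 == 0 for i in range(100_000)]

actual = sum(sessions)
estimate = estimate_total(sessions, bool, sample_rate=0.10)
print(f"actual conversions: {actual}, estimate from a 10% sample: {estimate:.0f}")
```

With a sampling rate of 1.0 the estimate equals the true count; smaller rates trade accuracy for speed.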
HOW DOES SAMPLING WORK IN GOOGLE ANALYTICS?
Default reports are not sampled. However, if you create a custom report, filter a report in any way, or add a segment, you may be looking at sampled data. In the standard version of GA, such queries are sampled when the date range contains more than 500,000 sessions. For the premium version, thresholds vary according to how queries are configured.
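The rule above can be expressed as a small check. This is a sketch under stated simplifications: `sampling_check` is a hypothetical helper, the 500,000-session threshold is the standard-version figure from the text, and it assumes that a sampled report uses roughly `threshold` sessions, so the share of sessions used is about threshold / sessions.

```python
# Threshold quoted in the text for the standard (free) version of GA.
STANDARD_SESSION_THRESHOLD = 500_000

def sampling_check(sessions_in_range, is_ad_hoc, threshold=STANDARD_SESSION_THRESHOLD):
    """Return (will_be_sampled, approximate_share_of_sessions_used).

    Simplifying assumptions: only ad-hoc queries (custom reports, filters,
    segments) are sampled, and a sampled report is based on roughly
    `threshold` sessions.
    """
    if not is_ad_hoc or sessions_in_range <= threshold:
        return False, 1.0
    return True, threshold / sessions_in_range

print(sampling_check(2_000_000, is_ad_hoc=False))  # default report: (False, 1.0)
print(sampling_check(2_000_000, is_ad_hoc=True))   # report with a segment: (True, 0.25)
```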
This is not always a problem. If you are working in a single profile (view) that accounts for around 99% of the property's total traffic and, after applying a segment, the report is based on an 80-90% sample, the results are likely to be reliable. The problem arises when the sample represents only a small fraction of the data. A 10% sample, for example, can show large fluctuations, and conclusions drawn from such a small fraction would be misleading. Although Google Analytics uses an intelligent algorithm meant to keep sampling from distorting your data, in some cases a report is based on a mere 5% of your data, with the rest left unused.
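Why a small sample fluctuates more can be shown with the standard error of a proportion. The sketch below is illustrative only: it treats the GA sample as a simple random sample of sessions (a simplification), the report figures are hypothetical, and `proportion_standard_error` is a helper written for this example.

```python
import math

def proportion_standard_error(p, population, sample_rate):
    """Standard error of a proportion p estimated from a simple random sample
    of `population` items drawn without replacement (finite population correction)."""
    if population <= 1:
        return 0.0
    n = max(1, round(population * sample_rate))
    fpc = (population - n) / (population - 1)  # shrinks to 0 as the sample covers everything
    return math.sqrt(p * (1 - p) / n * fpc)

# Hypothetical report: 2,000,000 sessions with a 3% conversion rate.
SESSIONS = 2_000_000
CONVERSION_RATE = 0.03

for rate in (0.9, 0.1, 0.05):
    se = proportion_standard_error(CONVERSION_RATE, SESSIONS, rate)
    # Under a normal approximation, about 95% of samples fall within +/- 1.96 SE.
    print(f"{rate:.0%} sample: {CONVERSION_RATE:.2%} +/- {1.96 * se:.3%}")
```

The margin of error grows as the sampling rate shrinks, which is why a 5% or 10% sample deserves more caution than an 80-90% one.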
HOW CAN YOU FIX IT?
The following are a few solutions to help you get clean data and crisp insights again:
1. Filter and reduce the date range:
When you examine a report that has reached or slightly exceeded the sampling threshold, the interface indicates that the report is sampled. Rather than viewing a whole month of data at once, it is safer to look at smaller chunks individually, say, each of the four weeks in that month. Each chunk covers fewer sessions, which keeps the query under the sampling threshold. Admittedly, this is a monotonous task, but once it is done you can combine these reports into a single report for the month using tools outside GA.
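Combining the weekly exports outside GA can be as simple as summing each metric per dimension value. A minimal Python sketch, assuming each weekly report has been exported as a mapping from traffic source to sessions (the data and the `combine_reports` helper are hypothetical):

```python
from collections import Counter

def combine_reports(reports):
    """Merge per-period reports of an additive metric (e.g. sessions per
    traffic source) by summing the values for each dimension value."""
    combined = Counter()
    for report in reports:
        combined.update(report)  # Counter.update adds counts rather than replacing them
    return dict(combined)

# Hypothetical weekly exports: sessions by source for four weeks of one month.
weekly = [
    {"google": 12_000, "direct": 5_000},
    {"google": 11_500, "direct": 4_800, "newsletter": 900},
    {"google": 12_300, "direct": 5_100},
    {"google": 13_000, "direct": 5_300, "newsletter": 1_100},
]
print(combine_reports(weekly))
```

Only additive metrics such as sessions or pageviews can be summed this way. Users are deduplicated across the whole date range, so weekly user counts cannot simply be added, and rates such as bounce rate should be recomputed from summed numerators and denominators.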
2. Use Standard Reports:
Google Analytics offers some great pre-configured reports out of the box. The ability to customize and build your own reports from scratch is what allows marketers to gain truly tailored insights, but when it comes to sampling, the standard reports have an edge over custom reports because they are not sampled. You can verify this: a standard report does not display the sampling message. Working from the standard reports is also a time saver and a good way to get ideas for reports you might not otherwise think to create.
3. Query Partitioning:
Another effective way to avoid the effects of sampling is to break up the timeframe into smaller timeframes. Whether data is sampled depends on the type of query, and the sampling rate can vary from query to query. Each GA view has a set of pre-aggregated data that is used to display unsampled reports quickly. If a query can be answered entirely from this existing unsampled, pre-aggregated data, GA does not sample the data; otherwise it samples the data once the query exceeds the sampling threshold. For example, a full year of data can be divided into 12 separate months (12 different queries), or a month of data can be broken into four separate queries, each covering roughly a week.
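Splitting a timeframe into sub-ranges is plain date arithmetic. The Python sketch below shows two hypothetical helpers, `month_partitions` and `fixed_partitions`; neither is part of any Google Analytics API, and each returned inclusive `(start, end)` pair would be sent as a separate query.

```python
from datetime import date, timedelta

def month_partitions(start, end):
    """Split the inclusive range [start, end] into calendar-month pieces."""
    parts = []
    current = start
    while current <= end:
        # First day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        parts.append((current, min(end, next_month - timedelta(days=1))))
        current = next_month
    return parts

def fixed_partitions(start, end, days=7):
    """Split the inclusive range [start, end] into chunks of `days` days
    (the last chunk may be shorter)."""
    parts = []
    current = start
    while current <= end:
        chunk_end = min(end, current + timedelta(days=days - 1))
        parts.append((current, chunk_end))
        current = chunk_end + timedelta(days=1)
    return parts

print(len(month_partitions(date(2023, 1, 1), date(2023, 12, 31))))  # 12
print(fixed_partitions(date(2023, 3, 1), date(2023, 3, 31)))
```

A 31-day month split into 7-day chunks yields five partitions rather than four, because the remaining three days become their own short chunk.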
To further ease the process, tools such as ShufflePoint and Analytics Canvas do this for you automatically and free of cost. The tools partition the query, loop programmatically through the desired timeframe, and aggregate the results back into one report once done. It appears as if the tool is making one query, but it is actually running several partitioned queries. How finely the queries are partitioned depends on how granularly you configure the tool, so it may take some experimentation to find a balance between speed and precision. Both tools can extract data from Google Analytics and create reports from it.
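One way to balance pace and precision is to split a date range only as far as necessary: halve it recursively until each piece stays under the sampling threshold. This is an assumption for illustration, not a description of how ShufflePoint or Analytics Canvas actually work. The `count_sessions` callback is hypothetical; in practice, daily session counts could come from an unsampled standard report.

```python
from datetime import date, timedelta

def adaptive_partitions(start, end, count_sessions, threshold=500_000):
    """Recursively halve the inclusive range [start, end] until every piece
    has at most `threshold` sessions. A single day is never split further,
    so a single very busy day may still exceed the threshold."""
    if start == end or count_sessions(start, end) <= threshold:
        return [(start, end)]
    mid = start + timedelta(days=(end - start).days // 2)
    return (adaptive_partitions(start, mid, count_sessions, threshold)
            + adaptive_partitions(mid + timedelta(days=1), end, count_sessions, threshold))

# Hypothetical traffic: a flat 40,000 sessions per day.
def count_sessions(start, end):
    return ((end - start).days + 1) * 40_000

parts = adaptive_partitions(date(2023, 6, 1), date(2023, 6, 30), count_sessions)
for s, e in parts:
    print(s, e, count_sessions(s, e))
```

The results of each partition would then be summed (for additive metrics) to rebuild the full report. A busy month produces more, smaller partitions, while a quiet one may need only a single query.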
4. Query Explorer:
Google Analytics Query Explorer lets you build API queries to retrieve data from a specific account in Google Analytics. What exactly can it do? For starters, it lets you download the underlying report data that the Google Analytics UI doesn't represent well. With this data, you can create your own custom reports and dashboards. Best of all, you can refresh the data whenever required.
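Query Explorer builds requests for the Core Reporting API (v3) used by Universal Analytics views. The sketch below assembles such a request URL with the API's standard parameters (`ids`, `start-date`, `end-date`, `metrics`, `dimensions`, `samplingLevel`) and reads the `containsSampledData`, `sampleSize` and `sampleSpace` fields of a JSON response. The helpers and the view ID are hypothetical, and a real request also needs an OAuth 2.0 access token, which is omitted here.

```python
from urllib.parse import urlencode

# Endpoint of the Core Reporting API v3 (Universal Analytics views).
CORE_REPORTING_V3 = "https://www.googleapis.com/analytics/v3/data/ga"

def build_query_url(view_id, start_date, end_date, metrics, dimensions=(),
                    sampling_level="HIGHER_PRECISION"):
    """Build a Core Reporting API v3 request URL (authentication not included)."""
    params = {
        "ids": f"ga:{view_id}",
        "start-date": start_date,
        "end-date": end_date,
        "metrics": ",".join(metrics),
        "samplingLevel": sampling_level,  # DEFAULT, FASTER or HIGHER_PRECISION
    }
    if dimensions:
        params["dimensions"] = ",".join(dimensions)
    return f"{CORE_REPORTING_V3}?{urlencode(params)}"

def sampled_share(response):
    """Fraction of sessions a parsed JSON response was based on (1.0 if unsampled)."""
    if not response.get("containsSampledData"):
        return 1.0
    return int(response["sampleSize"]) / int(response["sampleSpace"])

# Hypothetical view ID.
url = build_query_url("12345678", "2023-01-01", "2023-01-31",
                      ["ga:sessions", "ga:pageviews"], ["ga:date"])
print(url)
```

`HIGHER_PRECISION` asks for a larger sample but does not guarantee unsampled data, so checking `containsSampledData` on each response is still worthwhile.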
5. Downloading and analyzing unsampled reports:
This option applies only if you are a Google Analytics Premium user. Google Analytics introduced a feature called Custom Tables, which lets you create a table with the metrics and dimensions of your choice and download it as an unsampled report. A report that would otherwise be sampled can be recreated as a Custom Table, which looks like the out-of-the-box reports. Custom Tables can also be a time-saver and a starting point for creating new data sets. You can create up to 100 Custom Tables, which means you won't have to worry about sampled data in the reports you use most often.
6. BigQuery:
BigQuery is another feature available only to premium users. Google Analytics integrates with Google BigQuery, which makes it possible to move huge datasets and query them with SQL. BigQuery can process data spanning billions of rows. The integration also gives you access to unsampled, hit-level data, which opens the door to robust and sophisticated analysis.
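As an illustration of SQL-based querying over the export, the sketch below builds a query against the daily `ga_sessions_YYYYMMDD` export tables, summing the `totals.visits` and `totals.pageviews` fields per `trafficSource.source`. The helper `sessions_by_source_sql` is hypothetical, and the example points at Google's public sample dataset `bigquery-public-data.google_analytics_sample` (Google Merchandise Store data); for your own property you would substitute your project and dataset. The block only builds the SQL string, so it runs without credentials.

```python
from datetime import date

def sessions_by_source_sql(table_wildcard, start, end):
    """Build a standard-SQL query that sums sessions and pageviews per traffic
    source over the daily ga_sessions_YYYYMMDD tables between start and end."""
    # date objects support strftime-style format specs inside f-strings.
    return f"""
SELECT
  trafficSource.source AS source,
  SUM(totals.visits) AS sessions,
  SUM(totals.pageviews) AS pageviews
FROM `{table_wildcard}`
WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'
GROUP BY source
ORDER BY sessions DESC
""".strip()

sql = sessions_by_source_sql(
    "bigquery-public-data.google_analytics_sample.ga_sessions_*",
    date(2017, 1, 1),
    date(2017, 1, 31),
)
print(sql)
```

Assuming the `google-cloud-bigquery` package is installed and credentials are configured, the string could be run with `bigquery.Client().query(sql).result()`.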
These are some of the ways to escape the pitfalls of sampled data. Technology is driven by the need to do more with less, and new tools like these make our lives easier.