How Sampling in GA takes your business decisions in the wrong direction
Sampling as a subject is not that agonizing; in fact, it is a life jacket when you are dealing with an ocean of data. But it can make you break a sweat when it is applied in Google Analytics. So, what is sampling, and how is it used in Google Analytics?
In data analysis, sampling is the practice of analyzing a small subset of the data in order to draw meaningful insights about the larger data set. Sampling is used when examining the complete data set is impractical or infeasible.
Let’s take an example to understand this concept. Suppose you want to count the people living in a city that has 100 localities, with the population distributed uniformly across them. Counting the people in all 100 localities would be a tedious job. Instead, we can resort to sampling: count the people living in a single locality and multiply that number by 100.
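The locality example can be sketched in a few lines of Python. The cities, their head counts and the function name `estimate_total` are hypothetical, made up for illustration; the sketch simply scales a sampled count up by the inverse of the sampling fraction.

```python
import random

def estimate_total(locality_counts, sample_size, seed=None):
    """Estimate a city's population from a random sample of its localities.

    The sampled head count is scaled up by (number of localities / sample size),
    which for a sample of 1 out of 100 is the "multiply by 100" step.
    """
    rng = random.Random(seed)
    # Choose `sample_size` distinct localities at random.
    sampled = rng.sample(locality_counts, sample_size)
    scale = len(locality_counts) / sample_size
    return sum(sampled) * scale

# Hypothetical city: 100 localities with 5,000 residents each (uniform).
uniform_city = [5_000] * 100
print(estimate_total(uniform_city, 1, seed=42))  # 500000.0

# Hypothetical city with the same 500,000 residents, spread unevenly:
# 10 dense localities of 41,000 and 90 sparse ones of 1,000.
uneven_city = [41_000] * 10 + [1_000] * 90
for seed in range(3):
    # Each one-locality estimate is either 4,100,000 or 100,000, never 500,000.
    print(estimate_total(uneven_city, 1, seed=seed))
```

The uneven city shows why the uniform-distribution assumption matters: a sample that does not represent the whole still produces a confident-looking total, just a wrong one.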
Sampling as a whole shouldn’t be feared, but in Google Analytics it is much harder to deal with. That is why you should learn about the intricacies of sampling in Google Analytics and the ways to avoid it.
When does Sampling occur in Google Analytics?
Google Analytics doesn’t resort to sampling unless it is necessary. In some cases, a specific report cannot be generated without sampling. So, let’s look at these cases to tackle the agonizing effects of sampling.
Sampling in different reports:
- Default Reports:
Default reports are usually not sampled. These are the reports listed in the left pane under Audience, Acquisition, Behavior and Conversions, and Analytics stores a complete, unfiltered set of data for each property and each account. However, GA frequently adds new reports and changes how metrics are calculated. If the date range of a report includes dates before that report was introduced, or before a metric’s calculation was changed, Analytics issues an ad-hoc query instead, and sampling can be triggered.
Sampling can also be triggered if you use the UTM override feature.
- Ad-Hoc Reports:
Whenever you modify a default report in some way, such as applying a segment, changing a filter or adding a dimension, or whenever you create a custom report with a combination of dimensions and metrics that is not available in a default report, Analytics generates an ad-hoc query.
Not all ad-hoc queries are sampled; only those that exceed the session threshold of your property type.
Below is a screenshot of a sampled report, where you can see the message “The report is based on 9.52% of sessions”. If you see this message at the top right corner of your report page, you are looking at a sampled report.
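To make the 9.52% figure concrete, here is a rough sketch of how a cap on sessions turns into a sampling rate, and how a number seen in the sample is scaled back up. It assumes a cap of 500,000 sessions (the limit commonly cited for standard Universal Analytics properties) and simple proportional scaling; it is not GA’s internal algorithm, and the function names are hypothetical.

```python
def expected_sample_fraction(sessions_in_range, threshold=500_000):
    """Rough share of sessions an ad-hoc query would use, assuming the
    sample is capped at `threshold` sessions (an assumed limit)."""
    return min(1.0, threshold / sessions_in_range)

def extrapolate(sampled_value, sample_fraction):
    """Scale a value observed in the sample up to the full data set."""
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    return sampled_value / sample_fraction

# Hypothetical date range with 5.25M sessions against the assumed 500K cap.
print(f"{expected_sample_fraction(5_250_000):.2%}")  # 9.52%

# Hypothetical: 476 conversions observed in a 9.52% sample.
print(round(extrapolate(476, 0.0952)))  # 5000
```

At 9.52%, each sampled session stands in for roughly 1 / 0.0952 ≈ 10.5 real sessions, so any quirk in the sample is magnified about tenfold in the report.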
- Multi-Channel Reports and Attribution Reports:
These closely resemble default reports: no sampling is applied unless you modify the report, for example by changing the lookback window, changing the included conversions, or adding a segment or a secondary dimension.
However, these reports can return at most 1M conversions.
- Flow-Visualization Reports:
For a selected date range, flow-visualization reports can include at most around 100K sessions.
These are the most common cases in which Google Analytics resorts to sampling.
How does it hurt you?
Sampling can easily hurt your analysis, and a flawed analysis can push you toward decisions that will eventually torment you. So, let’s see how sampling can burn a hole in your analysis.
- When Google Analytics works with only a small subset of your data, you cannot fully trust the metrics it reports, because that subset may not capture all the patterns in your complete traffic data.
- If you see the sampling message, metrics such as “sessions”, “page views”, “users”, “bounce rate”, “conversion rate” and even “revenue” can be off by anywhere from 10% to 80%, which makes them a disastrous basis for drawing inferences.
“A sampled report showed the revenue to be $1.2 million, whereas the unsampled report put it at nearly $950k”
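The gap in that example is easy to quantify. This small sketch uses the two revenue figures quoted above; `relative_error` is just an illustrative helper name.

```python
def relative_error(reported, actual):
    """Signed error of a reported figure relative to the true figure."""
    return (reported - actual) / actual

sampled_revenue = 1_200_000  # revenue shown by the sampled report
actual_revenue = 950_000     # revenue shown by the unsampled report

print(f"overstated by ${sampled_revenue - actual_revenue:,}")  # overstated by $250,000
print(f"relative error: {relative_error(sampled_revenue, actual_revenue):.1%}")  # relative error: 26.3%
```

A budget decision made on the sampled figure would assume roughly a quarter more revenue than the business actually earned.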
Any decisions based on such reports can lead you into grave mistakes and may push your business toward oblivion.
Let’s consider an example to see how menacing sampling can be for your business decisions and your marketing.
Consider a campaign run by person “X” that gets around 500K sessions a day, and suppose he wants to find out how well it performs every hour.
To get this detail, he logs into Google Analytics, and when he applies all the segments and dimensions required for an hourly report, he sees that the report is based on a sample of the data. Such a report is unreliable because not all sessions are included, so he cannot tell which hour is the peak, when the most conversions take place, or which campaign performs miserably.
Sampling shows you a smaller, and often distorted, picture, and inferences drawn from such data will surely make you stumble in the long run.
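X’s situation can be mimicked with a small simulation. Everything in it is hypothetical: the traffic shape, the conversion rates, and the sampling model, which keeps each session independently with probability 9.52% rather than reproducing GA’s actual sampling algorithm. It compares hourly conversions from the full data with a sampled estimate, and adds the standard relative standard error of a count estimated from such a sample, sqrt((1 - p) / (p * k)) for a true count k and sampling fraction p.

```python
import math
import random

HOURS = range(24)

def make_campaign(total_sessions=500_000, seed=1):
    """Hypothetical day of campaign traffic as (hour, converted) pairs.

    Assumed shape: hours 18-22 get three times the traffic of other hours
    and convert at 1.2% instead of 1.0%. These numbers are illustrative.
    """
    rng = random.Random(seed)
    weights = [3 if 18 <= h <= 22 else 1 for h in HOURS]
    conv_rate = [0.012 if 18 <= h <= 22 else 0.010 for h in HOURS]
    hours = rng.choices(HOURS, weights=weights, k=total_sessions)
    return [(h, rng.random() < conv_rate[h]) for h in hours]

def hourly_conversions(sessions, sample_fraction=1.0, seed=0):
    """Conversions per hour, estimated from a random sample of sessions.

    Each session is kept independently with probability `sample_fraction`;
    kept conversions are scaled up by 1 / sample_fraction. The default of
    1.0 keeps every session, i.e. an unsampled report.
    """
    rng = random.Random(seed)
    totals = [0.0] * 24
    for hour, converted in sessions:
        if rng.random() < sample_fraction and converted:
            totals[hour] += 1 / sample_fraction
    return totals

def relative_standard_error(true_count, sample_fraction):
    """Expected relative error of a sampled count under the same model:
    the estimate S / p with S ~ Binomial(true_count, p) has standard
    deviation sqrt(true_count * (1 - p) / p)."""
    p = sample_fraction
    return math.sqrt((1 - p) / (p * true_count))

sessions = make_campaign()
full = hourly_conversions(sessions)
sampled = hourly_conversions(sessions, sample_fraction=0.0952, seed=2)

for h in HOURS:
    print(f"{h:02d}:00  full={full[h]:5.0f}  sampled~{sampled[h]:5.0f}")
print("peak hour, full data:", max(HOURS, key=lambda h: full[h]))
print("peak hour, sampled:  ", max(HOURS, key=lambda h: sampled[h]))

# Expected error for a busy slice vs. a thin one at a 9.52% sample.
for count in (500, 50):
    print(count, "conversions ->", f"±{relative_standard_error(count, 0.0952):.0%}")
# 500 conversions -> ±14%
# 50 conversions -> ±44%
```

Because the error grows as a slice gets thinner, hourly breakdowns, segments and secondary dimensions are exactly where sampled numbers are least reliable, and when several hours are close, the “peak hour” in the sampled data can simply be the wrong one.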
What are the measures to counter the monster ‘sampling’?
There are several ways to deal with sampling, listed below:
- When a report is sampled, always choose greater precision
The larger the sample, the more accurate the results; the smaller the sample, the less reliable the traffic estimates it produces.
However, there is an upper limit on the sample size.
- Avoid using Secondary dimensions or Advanced Segments
While working with reports, avoid advanced segments and secondary dimensions.
- Try to run your reports for shorter time frames
- Avoid filtered views
- Avoid custom reports
- Use Query Partitioning
Try to break your queries up so that none of them triggers sampling. Several tools are available for this.
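The date-range splitting behind “shorter time frames” and query partitioning can be sketched as follows. The function takes hypothetical per-day session counts and groups consecutive days so that each group stays under an assumed 500,000-session threshold; each group would then be requested as a separate report. `partition_date_range` is a made-up name, and no real GA API is called here.

```python
from datetime import date, timedelta

def partition_date_range(daily_sessions, threshold):
    """Group consecutive days into (start, end) ranges whose total sessions
    stay at or below `threshold`.

    daily_sessions: dict mapping datetime.date -> session count.
    A single day above the threshold gets a range of its own; splitting by
    day cannot bring it under the limit, so that query would still be sampled.
    """
    ranges = []
    start, running = None, 0
    for day in sorted(daily_sessions):
        n = daily_sessions[day]
        # Close the current range if adding this day would exceed the limit.
        if start is not None and running + n > threshold:
            ranges.append((start, prev))
            start, running = None, 0
        if start is None:
            start = day
        running += n
        prev = day
    if start is not None:
        ranges.append((start, prev))
    return ranges

# Hypothetical traffic: 14 consecutive days with 150,000 sessions each.
start_day = date(2020, 1, 1)
daily = {start_day + timedelta(days=i): 150_000 for i in range(14)}

for first, last in partition_date_range(daily, threshold=500_000):
    print(first, "to", last)
# 2020-01-01 to 2020-01-03
# 2020-01-04 to 2020-01-06
# 2020-01-07 to 2020-01-09
# 2020-01-10 to 2020-01-12
# 2020-01-13 to 2020-01-14
```

Only additive metrics such as sessions, conversions and revenue can be summed across the partial reports. Ratios like bounce rate or conversion rate have to be recomputed from the summed numerators and denominators, and users cannot simply be added up, since the same user may appear in more than one date range.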
We saw how sampling can be a curse for a business and can cause havoc when drawing inferences. So, it is always better to stay away from sampling. However, in unavoidable cases, the hacks mentioned above can be a life saver.
Still worried about sampling? Allow us to look into your data, generate reports, and help you make your critical decisions.