I get a healthy amount of referral traffic, but for a long time I did not know what that meant. Social traffic clearly comes from social media pages, Organic Search traffic comes from search engines like Google, and Direct traffic is usually a combination of unreported organic search, direct typing of your URL, and bookmarks.
Then there is Referral traffic, which is defined as follows:
Google’s method of reporting visits that came to your site from sources outside of its search engine. When someone clicks on a hyperlink to go to a new page on a different website, Analytics tracks the click as a referral visit to the second site.
When I reviewed my Google Analytics Acquisition data (which provides valuable information for your blog/business), I found that a large share of my traffic came from referral channels. Once I dug deeper, I found that these sources included ‘floating share buttons’, ‘get free social traffic’, and other obscure sources.
So what does this mean exactly? Well, this is all referrer spam.
What is Referrer Spam?
Referrer spam is fake traffic from spam bots that mimics a referral link. In this case a spammer makes repeated requests to your website, using a fake referrer URL that points to a site they want to advertise. The aim of spammers is to promote their site, to index the contents of your website, and to build links to their site so that they can improve their search engine rankings.
Additionally, as spam bots crawl through your site, they eat up bandwidth and slow your site down. Furthermore, some spam bots also identify (and sometimes attack) vulnerabilities on your site.
Now, there are 2 main types of referrer spam: ghost referrer spam and non-ghost referrer spam (crawlers).
The Ugly Side of Referrer Spam
Someone somewhere directs spam to Google Analytics accounts, leaving laymen like me feeling like traffic royalty. This referrer spam distorts your Google Analytics data, making your research and analysis corrupt and redundant. This can be especially crippling for small businesses who are not receiving a lot of traffic to begin with.
At the moment Google has left us in the dark, meaning that we have to stop this referral spam ourselves. There are many ways of doing this, such as editing your .htaccess file, but I don’t really understand this method, and I’m wary of making coding changes.
Thankfully, you can block referrer spam right from your Google Analytics (GA) account. Granted, the drop in traffic will be a blow to your ego, but it should motivate you to work harder on boosting your traffic. Below I will define each type of referrer spam, and then show you how to filter it effectively.
STEP 1: FILTERING GHOST REFERRER SPAM
What is ghost referral spam?
Ghost spam comes from spam bots that never interact with your site. These spammers send data directly to the GA servers, so that it appears as referral traffic in your reports.
Therefore, you are getting referral traffic like ‘floating share buttons’ from spammers who have never accessed your site.
We won’t get into the technicalities of how these spammers achieve this, but in summary they randomly generate GA tracking IDs (UA-XXXXXX-1), and when one of them matches yours, the fake referral traffic lands in your reports.
Because this type of spam does not pass through your website, you cannot block it through coding and plug-ins. The only way to stop it from messing with your data is to use the filter that I will provide below.
How to identify ghost spam
Ghost spam tends to be sporadic, so when you check your GA data you will find that there will be spikes of ghost spam, followed by several days of no ghost spam. Spotting ghost spam is not difficult, as the data provided is clearly false.
To identify ghost spam take the following actions:
1. Go to Reporting> Acquisition> All Traffic> Channels> Referral
2. Once you are here click Secondary Dimensions> Behavior> Hostnames
3. Your screen will split into Source and Hostnames, and you can easily see which sources and hostnames look out of place (a list of some of the most common ghost spam sources is provided below).
Additionally, you will find that most of the ghost spam sources give you a ‘not set’ value when it comes to the corresponding hostnames.
P.S. Most ghost spam has a bounce rate of 0% or 100%.
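If you export the Source and Hostname columns from that report, the same check can be sketched in a few lines of Python. This is only an illustration of the idea: the function name, the sample rows, and the set of valid hostnames are made up for the example, and you would swap in your own.

```python
def looks_like_ghost_spam(hostname, valid_hostnames):
    """Return True when a hit's hostname is missing or is not one of our own."""
    cleaned = (hostname or "").strip().lower()
    # GA shows a missing hostname as "(not set)".
    if cleaned in ("", "(not set)", "not set"):
        return True
    return cleaned not in valid_hostnames


# Hypothetical rows copied from the Source / Hostname report.
rows = [
    ("google.com", "www.yourdomain.com"),
    ("social-buttons.com", "(not set)"),
    ("Get-Free-Traffic-Now.com", "amazon.com"),
]

# Assumed list of hostnames that really carry your tracking code.
valid = {"www.yourdomain.com", "yourdomain.com", "blog.yourdomain.com"}

suspects = [source for source, hostname in rows
            if looks_like_ghost_spam(hostname, valid)]
print(suspects)
```

Anything that ends up in the suspects list is worth a closer look before you build your filter.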
How to identify valid hostnames
Before you create a filter for ghost spam, you need to identify your valid hostnames. You can do this easily by taking the following actions:
1. Go to Reporting> Audience> Technology> Network
2. Select Hostname as the Primary Dimension
3. Copy down all the valid hostnames that you see. The valid hostnames are where your GA tracking code appears, so all your valid hostnames should have a direct connection to your site. A hostname like amazon.com is not valid, as you have not put your tracking code on this site.
These are some valid hostnames that will appear:
www.yourdomain.com, yourdomain.com, blog.yourdomain.com, support.yourdomain.com, shoppingcart.com, translate.googleusercontent.com (this is used by visitors coming from other countries who need translation services).
Creating a custom filter for ghost spam
The following filter will take care of all your ghost spam, regardless of whether it appears in your referral, direct, or organic traffic channels. This filter is called the Valid Hostnames Filter, and it is the brainchild of Carlos Escalera at Ohow.
As I mentioned above, ghost spam uses invalid hostnames. Therefore, this filter works to include only valid hostnames in your GA report. To create the filter take the following actions:
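1. Create an expression that includes all the valid hostnames that you have gathered, separated by a pipe (|). Using the example hostnames above, the expression will appear like this: yourdomain\.com|shoppingcart\.com|translate\.googleusercontent\.com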
2. Go to Admin> Filters> New Filters> Create New Filter
3. Enter ‘Valid Hostnames’ as the Filter Name
4. Filter Type> Custom> Include
Filter Field> Hostname
5. Copy the valid hostnames expression that you created in point 1, and then paste it in the box marked Filter Pattern.
6. Click on verify this filter before you save it. You should see a table showing your data, before and after you apply the filters.
7. If the table looks good, then you can click on save.
P.S. If you add your GA tracking I.D. to video or ecommerce services like PayPal and YouTube, make sure that you add the relevant hostname to the filter above. This way that traffic can be recognized.
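If you have more than a handful of hostnames, typing the expression by hand gets error-prone. Here is a minimal Python sketch that builds it for you, assuming the filter pattern is a regular expression in which hostnames are separated by a pipe and dots are escaped. The function name and the hostname list are placeholders for the example.

```python
import re


def build_hostname_pattern(hostnames):
    """Join hostnames into one regex alternation, escaping the dots."""
    return "|".join(re.escape(name) for name in hostnames)


# Placeholder hostnames; replace them with the ones you copied down.
valid_hostnames = [
    "yourdomain.com",
    "shoppingcart.com",
    "translate.googleusercontent.com",
]

pattern = build_hostname_pattern(valid_hostnames)
print(pattern)

# Quick sanity check before pasting the pattern into GA.
print(bool(re.search(pattern, "blog.yourdomain.com")))  # subdomain is covered
print(bool(re.search(pattern, "amazon.com")))           # foreign host is not
```

Because the pattern matches anywhere in the hostname, listing yourdomain.com once should also cover www, blog, and support subdomains. As far as I know, GA filter patterns are limited to 255 characters, so keep the list short.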
|Ghost Spam Worst Offenders|
|vitaly rules google|Get-Free-Traffic-Now.com|social-buttons.com|
|o-o-6-o-o.com / referral|hulfingtonpost.com|forum20.smailik.org|
STEP 2: FILTERING NON-GHOST REFERRER SPAM
What is non-ghost referrer spam?
Non-ghost referrer spam is also known as crawler referrer spam. While good web crawlers like Googlebot crawl your site so that they can index your content for the search engines, crawler referrer spam browses your site with different intentions, e.g. getting your web property I.D. and sending traffic to their own site.
While ghost spam does not visit your site, crawlers visit your pages and can do damage. Even worse is the fact that crawlers use valid domains, e.g. apple.com, so that they don’t look out of place in your GA reports. But fear not, I’ll show you how to identify them in a few seconds.
How to identify non-ghost spam
Unlike ghost spam, crawlers use valid hostnames to send fake referral traffic to your site. Therefore you won’t be able to identify them using the method I gave above. Instead, here is how to spot these smart bots:
1. Go to Acquisition> All Traffic> Channels> Referrals
2. Select Secondary Dimensions> Behavior> Hostnames
3. Use the list provided below to identify which crawler spam is using a valid hostname
4. List the domains of all the crawler spam in a word/excel document
Creating a custom filter for non-ghost spam
Unlike ghost spam, crawler referrer spam visits your site. This means that you can block them through coding and plug-ins. However, below I will show you how to filter the spam out from your GA. Like the filter above, the Crawler Spam Filter comes courtesy of Carlos Escalera.
1. Go to Admin> Filters> New Filter
2. Enter ‘Crawler Spam’ as the Filter name
3. Filter Type> Custom> Exclude
Filter Field> Campaign Source
4. Create a regular expression, which includes the domains of the crawler spam that you have already identified. The format is the same, and it should look something like this:
5. Insert this regular expression in the Filter Pattern box
6. Verify the filter, then click save.
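The crawler expression can be put together the same way as the hostname one. The sketch below assumes you have typed your spam domains into a list. The two domains shown are invented placeholders, not real offenders, and the function name is my own.

```python
import re


def build_crawler_pattern(spam_domains):
    """Build one exclude pattern from a list of spam source domains."""
    # Lowercase, trim, and de-duplicate so the pattern stays short.
    unique = sorted({d.strip().lower() for d in spam_domains if d.strip()})
    return "|".join(re.escape(d) for d in unique)


# Hypothetical domains; use the ones from your own word/excel list.
spam_domains = ["SpamCrawler.example", "badbot.example", "badbot.example "]

crawler_pattern = build_crawler_pattern(spam_domains)
print(crawler_pattern)
```

Duplicates and stray spaces are dropped, which helps when you are copying domains out of a spreadsheet month after month.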
|Crawler Spam Worst Offenders
STEP 3: BLOCKING BOTS AND SPIDERS
Known bots and spiders, such as search engine crawlers, are not harmful, as they help with your search engine rankings. Instead of blocking them, you can exclude them from your reports by following the easy step below:
Go to Admin> View Settings> Check box marked ‘Exclude all hits from known bots and spiders’> Save
EXTRA TIPS FOR FILTERING REFERRER SPAM
1. Filters permanently alter the data in the view they are applied to, so before you create any filters create a new GA view. This way you can keep one set of GA data that is completely unfiltered. You can do this by simply going to Admin> View Settings> Copy View.
2. Once you identify ghost spam in your Google Analytics report, do not try to visit these sites. While some spam bots are harmless, many of them are looking to intentionally install malware on your computer.
3. Spam bots target weak and vulnerable sites more often than they do protected sites. You should therefore look at investing in quality hosting, to reduce the frequency of these attacks.
4. Check your GA report for new referrer spam on a monthly basis, and then update your filters accordingly. Depending on the size of your site, you can choose to ignore the spam bots sending negligible traffic, and block the ones that are sending large amounts of fake traffic to your site.
5. Use a firewall for your site; this acts as the first line of defense against bad spam bots.
6. In my previous post I talked about using custom alerts to monitor unusual shifts in traffic patterns. The purpose of custom alerts is to let you know when your site is experiencing a huge spike or drop in traffic. These alerts can also bring your attention to spikes of traffic caused by an influx of hits by bad spam bots. In this way you can take immediate steps to block these bots from damaging your site.
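To make the idea behind tip 6 concrete, here is a rough Python sketch of a spike check on daily session counts. It is not how GA custom alerts work internally. The 7-day window, the 2x threshold, and the sample numbers are all assumptions chosen for the example.

```python
from statistics import mean


def find_spikes(daily_sessions, window=7, factor=2.0):
    """Return the indexes of days whose sessions exceed `factor` times
    the average of the `window` days before them."""
    spikes = []
    for i in range(window, len(daily_sessions)):
        baseline = mean(daily_sessions[i - window:i])
        if baseline > 0 and daily_sessions[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Made-up daily sessions with one suspicious jump on the eighth day.
sessions = [100, 110, 95, 105, 98, 102, 107, 400, 104]
print(find_spikes(sessions))  # [7]
```

A day that gets flagged is not automatically spam, but it tells you which date to open in your referral report first.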
BLOCKING REFERRER SPAM
The 3 steps I have provided above will help you filter referrer spam from your Google Analytics. However, these methods do not stop the spam from crawling your site to begin with, and they do not block spam from hitting your web server. To do this you will need to block the crawler referral spam completely.
There are several methods of blocking crawler referral spam before it reaches your Google Analytics report, and these include adding code to your .htaccess file, deflecting spam traffic, changing your tracking I.D, adding a blacklist of referrers, and using WordPress plug-ins.
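To give a feel for the blacklist approach, here is a small Python sketch of the check such a blacklist performs on each incoming request. It only illustrates the logic, it is not a drop-in replacement for .htaccess rules or a plug-in. The function name and the blacklisted domain are hypothetical.

```python
from urllib.parse import urlparse


def is_blocked_referrer(referer_header, blacklist):
    """Return True when the Referer header points at a blacklisted
    domain or one of its subdomains."""
    if not referer_header:
        return False
    host = (urlparse(referer_header).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain)
               for domain in blacklist)


# Hypothetical blacklist; fill it with the crawler domains you identified.
blacklist = {"spamcrawler.example"}

print(is_blocked_referrer("http://www.spamcrawler.example/page", blacklist))
print(is_blocked_referrer("https://www.google.com/search?q=blog", blacklist))
```

A server or plug-in would run a check like this before serving the page and refuse the request when it returns True.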
Each of these methods has its benefits and drawbacks, and if you would like to know more about these methods, you can check out the helpful resources below:
The Valid Hostname Filter will include valid hostnames and exclude spammer hostnames, and the Crawler Spam Filter will exclude all traffic coming from spam bots that are crawling through your site. When you use these 2 filters with the bots and spiders exclusion setting, you will efficiently filter out referral spam from your Google Analytics.
No more struggling with inflated traffic amounts and corrupted data. By applying the simple filters above, you can ensure that you only receive a clear and true depiction of your traffic and audience behavior (bounce rate, duration, sessions).
Before you leave, drop a comment below and tell me if the filters are working for you. Also share this with your friends, so that they know how to keep their GA safe from those pain in the ass spammers. And don’t forget to subscribe to my spam-free newsletter for all things Business Broken Down.
It was good talking to you, and I hope to see you next week.