“Banned” By Google? Find Out How to Entice Googlebot to Recrawl Your Site
Posted on 18th August, 2007
In my previous post I wrote about a problem I had where many of my sites were suddenly removed from Google search result pages.
It was ‘unsettling’ to say the least, because I could easily have lost hundreds of dollars per day from AdSense and affiliate programs that depend on Google organic traffic during that debacle.
I found it strange that Googlebot repeatedly reported “robots.txt unreachable” or “Network unreachable” errors in my Google Webmaster Console when I was absolutely sure that my server uptime had been one hundred percent during the period.
If you are not familiar with robots.txt, it’s a file that tells search engine crawlers which pages of your site they should not crawl.
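To see how a crawler reads such a file, here is a minimal Python sketch using the standard library’s urllib.robotparser. The robots.txt rules and the example.com URLs are made up for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The /private/ directory is disallowed for every crawler.
    print(can_fetch(ROBOTS_TXT, "Googlebot", "http://www.example.com/private/page.html"))  # False
    # Everything else is allowed.
    print(can_fetch(ROBOTS_TXT, "Googlebot", "http://www.example.com/index.html"))  # True
```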
A few questions came to mind when it happened to me.
Did Google finally penalise me for selling text link ads?
I know Google often regards sites that engage in selling and buying text links as ignoring the users’ best interest. This is clearly stated in their webmaster guidelines, which warn that sites participating in such schemes risk having their rankings dropped in Google search result pages.
I’ve written before about Google’s opinion on paid links and the addition of a paid link reporting form at Google Webmaster Central. Had I finally become a victim of their position on this matter?
That said, I only accept quality, relevant links on my sites and blogs. Besides, several of the affected sites didn’t participate in selling text links. So I concluded that it had nothing to do with selling text link ads.
Did they consider my Partner page an excessive link exchange practice?
Google had recently updated their Webmaster guidelines, adding that excessive reciprocal links or excessive link exchanging (“Link to me and I’ll link to you.”) can negatively impact your site’s ranking in search results.
However, I doubt what I am doing is excessive, at least not by what I think they consider excessive at the moment.
While the Partner page at Sabahan.com was created exclusively for cross linking, I only accept useful sites or blogs related to technology, marketing or blogging. A site that doesn’t offer any value to my users isn’t good enough for the search engine and will be deleted.
Occasionally, though, some low quality, unrelated blogs might slip through, but they would be deleted in the manual review that I do every now and then. The other affected sites were never involved in any link exchange practice. So I crossed this possible reason off my list.
Were my sites struck by algorithm changes?
Perhaps what I had been doing to optimise those sites for the search engines is now regarded as spamming.
This might be possible if it affected one or two sites, but not when 15 or more sites, including several blogs, forums, static HTML sites and e-commerce sites across several different niches, were simultaneously affected. It didn’t make sense.
Perhaps it was Google that was having technical difficulties with their servers.
That’s possible, but if that were the case, I am sure there would have been many other site owners affected during that period.
A quick search at Google to find similar occurrences in the past 3 months didn’t return any result that supports this notion. Those that I’ve discovered seemed to be isolated incidents.
So what was really happening here?
Then it struck me that the answer was right in front of me. I guess like some people, I tend to overlook the simple details in favour of a more complicated explanation.
When something like this happens to you, Google Webmaster Central will be your best friend, seriously. Googlebot had been trying to tell me that either my robots.txt file or my network was unreachable, and that was exactly what was causing the problem… duh!
The trick was to figure out how the robots.txt file or my server had become unreachable when I knew for sure that my server had 100% uptime during that debacle. So obviously, something was preventing Googlebot from accessing my robots.txt file or server. And that something must have been blocking Googlebot’s IP address.
After some searching I discovered the following error message in my server log file.
[Sat Jul 14 01:39:32 2007] [error] [client 18.104.22.168] mod_security: Access denied with code 406. Pattern match "=(http|www|ftp)\\\\:/(.+)\\\\.(c|dat|kek|gif|jpe?g|jpeg|png|sh|txt|bmp|dat|txt|js|html?|tmp|asp)\\\\x20?\\\\?" at REQUEST_URI [hostname "www.portable-cd-mp3-player.com"] [uri "/frame/index.php?url=http://reviews.cnet.com/SanDisk_Sansa_m240_1GB_silver/4505-6490_7-31563923.html?subj=fdba&part=rss&tag=MP_Portable+Audio+Devices"]
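The rule in that log entry appears to target requests that pass a remote URL as a query parameter, and the page Googlebot requested happens to look exactly like that. The Python sketch below reproduces the match. It assumes the repeated backslashes in the log are escaping added when the message was written or copied, so the pattern is rewritten here with single backslashes; Python’s regex engine is also not the one mod_security uses, so treat this as an approximation.

```python
import re

# Assumption: single backslashes are the intended escapes in the original rule.
PATTERN = re.compile(
    r"=(http|www|ftp)\:/(.+)\."
    r"(c|dat|kek|gif|jpe?g|jpeg|png|sh|txt|bmp|dat|txt|js|html?|tmp|asp)"
    r"\x20?\?"
)

# The request URI from the log entry above.
LOGGED_URI = (
    "/frame/index.php?url=http://reviews.cnet.com/"
    "SanDisk_Sansa_m240_1GB_silver/4505-6490_7-31563923.html"
    "?subj=fdba&part=rss&tag=MP_Portable+Audio+Devices"
)


def is_blocked(request_uri: str) -> bool:
    """Return True if the URI contains the pattern the rule looks for."""
    return PATTERN.search(request_uri) is not None


if __name__ == "__main__":
    # "=http:/" ... ".html" followed by "?" is what triggers the rule.
    print(is_blocked(LOGGED_URI))
    # A local path in the same parameter does not trigger it.
    print(is_blocked("/frame/index.php?url=/local/page.html"))
```

The match comes from the “=http:/” prefix combined with “.html?” later in the URL, so any legitimate visitor following that kind of link would have been denied too.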
So it looked like mod_security had blocked Googlebot. I have CSF (ConfigServer Firewall) running, and a further check revealed that it had blocked Googlebot’s IP address.
Fixing the problem was a matter of removing Googlebot’s IP address from the csf.deny file and adding it to the /etc/csf/csf.allow file. Of course, this can also be done easily via the CSF graphical user interface.
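If you want to check quickly whether a particular address is caught by the deny list, something like the following Python sketch could help. It assumes the simple entry format of one IP address or CIDR range per line, optionally followed by a “#” comment, and it skips any line it does not understand. The function name and the sample addresses (taken from the ranges reserved for documentation) are hypothetical.

```python
import ipaddress


def is_denied(ip: str, deny_text: str) -> bool:
    """Return True if ip matches a plain IP or CIDR entry in deny_text."""
    target = ipaddress.ip_address(ip)
    for raw_line in deny_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = raw_line.split("#", 1)[0].strip()
        if not entry:
            continue
        try:
            network = ipaddress.ip_network(entry, strict=False)
        except ValueError:
            # Not a plain IP/CIDR entry; this sketch ignores it.
            continue
        if target in network:
            return True
    return False


# Hypothetical deny list contents, for illustration only.
SAMPLE_DENY = """\
# blocked addresses
192.0.2.10 # example comment
203.0.113.0/24
"""

if __name__ == "__main__":
    print(is_denied("192.0.2.10", SAMPLE_DENY))
    print(is_denied("198.51.100.7", SAMPLE_DENY))
```

To run it against the real file you would read the contents of your csf.deny and pass them in as deny_text.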
Once that was done, I resubmitted my sitemap.xml file via Google Webmaster Central and it didn’t take long before Googlebot started to recrawl my sites.
In some situations, the problem would appear to go away by itself and Googlebot would start to crawl your site again. This could happen if Googlebot comes from a different IP address which is not blocked by your server.
Having a sitemap file for your blog allows Google to index it faster. Check out my other article to learn more. If you have a static HTML site, or any site other than a blog, you can use the tool at XML-Sitemaps.com to generate a sitemap.xml file for your sites easily. It’ll include up to 500 pages from your site in the sitemap file for free.
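If you would rather generate the file yourself, a sitemap is just a small XML document listing your URLs. Here is a minimal Python sketch that builds one using the sitemaps.org namespace; the function name and the example.com URLs are placeholders, and it only writes the required loc element for each page.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap.xml document (as a string) for the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_element = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the text for us.
        ET.SubElement(url_element, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder URLs, for illustration only.
    pages = ["http://www.example.com/", "http://www.example.com/about.html"]
    with open("sitemap.xml", "w", encoding="utf-8") as handle:
        handle.write(build_sitemap(pages))
```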
To prevent a similar problem from recurring in the future, I’ve added the following lines to my mod_security (modsec.user.conf) file to prevent Googlebot from being blocked.
# GoogleBot by user-agent…
SecFilterSelective HTTP_USER_AGENT "Google" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Googlebot" nolog,allow
SecFilterSelective HTTP_USER_AGENT "GoogleBot" nolog,allow
SecFilterSelective HTTP_USER_AGENT "googlebot" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Googlebot-Image" nolog,allow
SecFilterSelective HTTP_USER_AGENT "AdsBot-Google" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Googlebot-Image/1.0" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Googlebot/2.1" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Googlebot/Test" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Mediapartners-Google/2.1" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Mediapartners-Google*" nolog,allow
SecFilterSelective HTTP_USER_AGENT "msnbot" nolog,allow
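One thing to keep in mind is that a user-agent string is easy to fake, so rules like these let through anything that merely claims to be Googlebot. Google describes a reverse DNS check for confirming that a visitor really is their crawler: look up the hostname for the IP, check that it ends in googlebot.com or google.com, then resolve that hostname and confirm it maps back to the same IP. A rough Python sketch of that check follows. The function names are my own, the lookups only handle IPv4, and the resolvers are passed in as parameters so the logic can be tried without touching the network.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the primary hostname for an IP (reverse DNS)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname):
    """Return the list of IPv4 addresses for a hostname (forward DNS)."""
    return socket.gethostbyname_ex(hostname)[2]


def is_real_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Reverse-then-forward DNS check; returns False on any lookup failure."""
    try:
        hostname = reverse(ip).lower().rstrip(".")
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the address we started with.
        return ip in forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return False
```

Calling is_real_googlebot with just an IP address from your access log performs the real DNS lookups.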
Of course if you don’t manage your own server, this is probably something that you don’t have to worry about, although you might want to refer your server admin to this article if something similar happens to you.
If your blog or site is suddenly removed from Google’s index for no apparent reason, just head over to Google Webmaster Central. It’ll offer a hint as to the cause of the problem.