
“Banned” By Google? Find Out How to Entice Googlebot to Recrawl Your Site

August 18, 2007

In my previous post I wrote about a problem I had where many of my sites were suddenly removed from Google search result pages.

It was ‘unsettling’ to say the least, because during that debacle I could easily have lost hundreds of dollars per day from AdSense and affiliate programs that depend on Google organic traffic.

I found it strange to see Googlebot repeatedly reporting “robots.txt unreachable” or “Network unreachable” errors in my Google Webmaster Console when I was absolutely sure that my server uptime had been nothing short of one hundred percent during that period.

 

[Screenshot: robots.txt unreachable errors reported in Google Webmaster Console]

If you are not familiar with robots.txt, it’s a file that tells search engine crawlers which pages of your site they may or may not fetch.
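For illustration, here’s what a minimal robots.txt might look like (the Disallow path is a made-up example, not one of mine):

User-agent: *
Disallow: /private/

The first line says the rule applies to every crawler, and the second tells them to stay out of the /private/ directory; an empty Disallow line would allow everything.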

A few questions came to mind when it happened to me.

 

Did Google finally penalise me for selling text link ads?

I know Google regards sites that engage in buying and selling text links as ignoring users’ best interests. Their webmaster guidelines clearly state that sites participating in such schemes risk having their rankings dropped in Google search result pages.

I’ve written before about Google’s opinion on paid links and the addition of a paid-link reporting form at Google Webmaster Central. Had I finally become a victim of their position on this matter?

That said, I only accept quality, relevant links on my sites and blogs. Besides, several of the affected sites didn’t participate in selling text links at all, so I concluded that it had nothing to do with selling text link ads.

 

Did they consider my Partner page an excessive link exchange practice?

Google had recently updated their webmaster guidelines, adding that excessive reciprocal links or excessive link exchanging (“Link to me and I’ll link to you.”) can negatively impact your site’s ranking in search results.

However, I doubt what I am doing is excessive – at least not by what I think they currently consider excessive.

While the Partner page at Sabahan.com was created exclusively for cross linking, I only accept useful sites or blogs related to technology, marketing or blogging. A site that doesn’t offer any value to my users isn’t good enough for the search engines either, and will be deleted.

Occasionally, some low quality, unrelated blogs might slip through, but they would be deleted in the manual review I did every now and then. Besides, other affected sites were never involved in any link exchange practice. So I crossed this off my list of possible reasons.

 

Were my sites struck by algorithm changes?

Perhaps what I had been doing to optimise those sites for the search engines was now regarded as spamming.

This might have been possible if it affected one or two sites, but not when 15 or more sites – including several blogs, forums, static HTML sites and e-commerce sites across several different niches – were simultaneously affected. It didn’t make sense.

 

Perhaps Google was having technical difficulties with their servers.

That’s possible, but if that were the case, I am sure there would have been many other site owners affected during that period.

A quick search at Google for similar occurrences in the past 3 months didn’t return any results supporting this notion. Those that I did discover seemed to be isolated incidents.

 

So what was really happening here?

Then it struck me that the answer was right in front of me. I guess, like some people, I tend to overlook the simple details in favour of a more complicated explanation.

When something like this happens to you, Google Webmaster Central will be your best friend – seriously. Googlebot had been trying to tell me that my robots.txt file or my network was unreachable, and that’s exactly what was causing the problem… duh!

The trick was to figure out how the robots.txt file or my server became unreachable when I knew for sure my server had 100% uptime during that debacle. Obviously, something was preventing Googlebot from reaching my robots.txt file or my server. And that something must have been blocking Googlebot’s IP address.

After some searching I discovered the following error message from my server log file.

[Sat Jul 14 01:39:32 2007] [error] [client 66.249.66.228] mod_security: Access denied with code 406. Pattern match "=(http|www|ftp)\\\\:/(.+)\\\\.(c|dat|kek|gif|jpe?g|jpeg|png|sh|txt|bmp|dat|txt|js|html?|tmp|asp)\\\\x20?\\\\?" at REQUEST_URI [hostname "www.portable-cd-mp3-player.com"] [uri "/frame/index.php?url=http://reviews.cnet.com/SanDisk_Sansa_m240_1GB_silver/4505-6490_7-31563923.html?subj=fdba&part=rss&tag=MP_Portable+Audio+Devices"]
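If you want to check your own logs for the same thing, a quick grep works. This is only a sketch – the error log path is an assumption and will vary between servers:

grep "mod_security: Access denied" /usr/local/apache/logs/error_log | grep "66.249."

Googlebot typically crawls from IP addresses in the 66.249.x.x range, so the second grep narrows the matches to hits that involve Google’s crawler.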

So it looked like mod_security had flagged the request. I have CSF (ConfigServer Firewall) running, and a further check revealed that CSF had indeed blocked Googlebot’s IP address.
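Incidentally, before unblocking or whitelisting an IP address that claims to be Googlebot, it’s worth confirming that it really belongs to Google, because user-agent strings are trivially faked. A reverse DNS lookup, followed by a forward lookup of the resulting name, does the trick:

host 66.249.66.228
(should return a hostname ending in googlebot.com, e.g. crawl-66-249-66-228.googlebot.com)
host crawl-66-249-66-228.googlebot.com
(should resolve back to 66.249.66.228)

If both lookups agree and the name is under googlebot.com, you can be reasonably sure it’s the real Googlebot.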

Fixing the problem was a matter of removing Googlebot’s IP address from the csf.deny file and adding it to the /etc/csf/csf.allow file. Of course, this can be done easily via the CSF graphical user interface.
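For those who prefer the shell, the same fix looks roughly like this (assuming a standard CSF install; check csf -h on your own server before running these):

csf -g 66.249.66.228             # check whether the IP is currently listed in csf.deny
csf -dr 66.249.66.228            # remove and unblock the IP from /etc/csf/csf.deny
csf -a 66.249.66.228 Googlebot   # add the IP to /etc/csf/csf.allow with a comment
csf -r                           # restart the firewall so the new rules take effect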

Once that was done, I resubmitted my sitemap.xml file via Google Webmaster Central, and it didn’t take long before Googlebot started to recrawl my sites.

[Screenshot: resubmitting sitemap.xml in Google Webmaster Central]

[Screenshot: Googlebot crawl activity resuming]

 

In some situations, the problem may appear to go away by itself and Googlebot will start to crawl your site again. This can happen if Googlebot comes from a different IP address which is not blocked by your server.

Having a sitemap file for your blog allows Google to index it faster. Check out my other article to learn more. If you have a static HTML site, or any site other than a blog, you can use the tool at XML-Sitemaps.com to generate a sitemap.xml file easily. It’ll include up to 500 pages from your site in the sitemap file for free.
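If you’ve never seen one, a bare-bones sitemap.xml is just an XML list of your URLs – the address and dates below are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2007-08-18</lastmod>
    <changefreq>daily</changefreq>
  </url>
</urlset>

Each page gets its own <url> entry; only <loc> is required, while <lastmod> and <changefreq> are optional hints to the crawler.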

To stop a similar problem from recurring in the future, I’ve added the following lines to my mod_security configuration file (modsec.user.conf) to prevent Googlebot from being blocked.

# Whitelist Google crawlers by user-agent so mod_security rules never block them
SecFilterSelective HTTP_USER_AGENT "Google" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Googlebot" nolog,allow
SecFilterSelective HTTP_USER_AGENT "GoogleBot" nolog,allow
SecFilterSelective HTTP_USER_AGENT "googlebot" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Googlebot-Image" nolog,allow
# More specific Google user-agents, plus MSN's crawler for good measure
SecFilterSelective HTTP_USER_AGENT "AdsBot-Google" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Googlebot-Image/1.0" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Googlebot/2.1" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Googlebot/Test" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Mediapartners-Google/2.1" nolog,allow
SecFilterSelective HTTP_USER_AGENT "Mediapartners-Google*" nolog,allow
SecFilterSelective HTTP_USER_AGENT "msnbot" nolog,allow
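
One caveat: the SecFilterSelective directives above use the older ModSecurity 1.x syntax. If your server runs ModSecurity 2.x, my understanding is that the rough equivalent is a SecRule like the one below – treat it as a sketch and test it on your own setup:

# ModSecurity 2.x: allow and don't log any request whose User-Agent contains "Googlebot"
# (the rule id is arbitrary; pick one that doesn't clash with your existing rules)
SecRule REQUEST_HEADERS:User-Agent "Googlebot" "phase:1,id:1000001,nolog,allow"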

 

Of course, if you don’t manage your own server, this is probably something you don’t have to worry about – although you might want to point your server admin to this article if something similar happens to you.

If your blog or site is suddenly removed from Google’s index for no apparent reason, just head over to Google Webmaster Central. It’ll offer a hint as to the cause of the problem.



Comments


21 Comments

Comment by lilian
2007-08-18 14:22:55

Thanks for the clear explanations. Another new thing I learnt today.

 
Comment by Yien Bin
2007-08-18 14:50:47

Hi Gaman, been following your rss feeds. Very handy info you have about seo here. Thank you.

 
Comment by ben
2007-08-18 15:55:18

I learnt something new today :) thanks gaman… so now ur ranking should work fine, am i right?

Comment by Gaman
2007-08-18 15:57:43

It should be.

 
 
Comment by ben
2007-08-18 16:04:04

good then gaman :) i think i become your die hard fan already..lols.

 
Comment by pinolobu
2007-08-18 21:17:55

I assume before that there were no problems with googlebot.

Questions:

1. Did google change the googlebot ip address?
2. Does your firewall by default allow access to your websites from any IP address?
3. If so, what happened that made the IP address suddenly blocked by your firewall?

Comment by Gaman
2007-08-19 13:23:45

Googlebot’s behaviour matched certain patterns which raised the mod_security red flag. As for the Googlebot IP address, you can’t be too sure unless you can confirm it with your server log.

 
 
Comment by Mr. Rajawang
2007-08-19 16:41:30

i also got a 40% drop in my adsense earnings.. maybe this was the reason! Thanks gaman!

Comment by Gaman
2007-08-19 16:58:27

Was that as a result of a drop in traffic?

Comment by Mr. Rajawang
2007-08-20 18:01:43

nope, my traffic is still the same. I was wondering why it dropped suddenly around 10 days ago..

Comment by Gaman
2007-08-21 01:10:44

That’s probably an AdSense issue

 
 
 
 
Comment by chrisblogging.com
2007-08-20 23:15:13

Glad to see that you are at least getting to the bottom of this…

 
2007-08-21 21:56:32

one of the things you didn’t mention: in Webmaster Tools you can tell Googlebot to go back and revisit your site manually as well…

 
Comment by Andrew Shim
2007-08-21 23:41:21

excellent tip. will check my logs for similar error.

 
Comment by Wino
2010-06-21 18:38:30

I realise this is an old article, but has anyone tried the above mod_security changes?

I’ve implemented them, however I am still having problems with Googlebot getting blocked by LFD!

 

