Is Your Site Being Penalized By the Search Engines For Duplicate Content?

June 14, 2006 by  

Recently, I noticed that one of the most heavily trafficked blogs in Malaysia suffered a massive loss of traffic within a single week. Later, I found out that another blog with exactly the same content, post by post, might have something to do with the traffic loss suffered by the first blog.

I am not sure who is copying whom, but as a result the first site was penalized by Google and some of its pages do not show up in Google search results.

It has also been reported that the original blog uses hidden text to trick the search engines. If this is true, it might also have contributed to the penalty.

A post over at SEO by the SEA looks at the conditions that may cause a search engine not to list pages.

Some duplication of content may mean that pages are filtered at the time of serving of results by search engines, and there is no guarantee as to which version of a page will show in results and which versions won’t. Duplication of content may also mean that some sites and some pages aren’t indexed by search engines at all, or that a search engine crawling program will stop indexing all of the pages of a site because it finds too many copies of the same pages under different URLs.

According to the post, search engines may see duplicate content when you have any of the following:

  1. Product descriptions from manufacturers, publishers, and producers reproduced by a number of different distributors in large ecommerce sites
  2. Alternative print pages
  3. Pages that reproduce syndicated RSS feeds through a server side script
  4. Canonicalization issues, where a search engine may see the same page as different pages with different URLs
  5. Pages that serve session IDs to search engines, so that they try to crawl and index the same page under different URLs
  6. Pages that serve multiple data variables through URLs, so that they crawl and index the same page under different URLs
  7. Pages that share too many common elements, or where those are very similar from one page to another, including title, meta descriptions, headings, navigation, and text that is shared globally.
  8. Copyright infringement
  9. Use of the same or very similar pages on different subdomains or different country top level domains (TLDs)
  10. Article syndication
  11. Mirrored sites
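
Items 4 to 6 in the list all come down to one page being reachable under several URLs. As a rough sketch of what collapsing those variants could look like on your own site, the Python snippet below normalizes the host name, drops session ID parameters, and sorts the remaining query parameters. The function name canonical_url and the parameter names in SESSION_PARAMS are assumptions made for illustration only; they are not taken from the original post or from any search engine.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; replace with whatever your site emits.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def canonical_url(url: str) -> str:
    """Collapse URL variants of the same page into one canonical form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat "www.example.com" and "example.com" as the same host.
    if host.startswith("www."):
        host = host[4:]
    # Drop session IDs and sort what is left, so that "?a=1&b=2"
    # and "?b=2&a=1" end up as the same string.
    params = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    )
    path = parts.path or "/"
    # The fragment is discarded because it never reaches the server.
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(params), ""))


if __name__ == "__main__":
    print(canonical_url("http://WWW.Example.com/page?sid=abc&b=2&a=1"))
    print(canonical_url("http://example.com/page?a=1&b=2&PHPSESSID=xyz"))
```

Both calls print the same URL, which is the point: if every internal link and redirect on a site uses one canonical form, a crawler has fewer chances to index the same page twice.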

The post goes into much more detail, explaining each condition and how search engines see it as duplicate content.
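
Search engines do not publish how they decide that two pages are duplicates, so the following is only an illustration of one well-known textbook approach: split each text into overlapping word sequences ("shingles") and measure how many the two texts share (Jaccard similarity). The function names shingles and similarity are made up for this sketch, and nothing here is claimed to be what Google actually does.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Too short for a full shingle: use the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    union = set_a | set_b
    if not union:
        # Two empty texts: nothing to compare, so report no similarity.
        return 0.0
    return len(set_a & set_b) / len(union)


if __name__ == "__main__":
    original = "Is your site being penalized for duplicate content"
    copy = "Is your site being penalized for duplicate content today"
    print(similarity(original, copy))
```

A score close to 1.0 means the two texts share almost all of their word sequences, which is what a post-by-post copy of a blog would produce; unrelated pages score close to 0.0.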

If you are having difficulty getting the pages of your site to show up in search engines, or are just curious about how duplicate content can affect you, check out the full story here:
Duplicate Content Issues and Search Engines



Comment by Katana
2006-06-14 09:20:06

If you can post the link of the copycat site, that would be great. 😛

Comment by ShaolinTiger
2006-06-14 11:33:34

He’s not down from dupe content; I think he’s down because he was using some blackhat subdomain linking schemes and some other dodgy stuff. That’s how he got so much traffic in the first place.

Comment by Gaman
2006-06-14 11:45:24

Katana: Check this out

ShaolinTiger: That could be one of the reasons, but I haven’t come across any such subdomain. Can you post a link to one of the subdomains he is using?

I believe Google may have started catching up on duplicate content as well. If you see how extensive the copycat site is, you’ll know what I mean.

In addition, he uses hidden text, and automatically extracts the search keywords from search engine query strings and puts those keywords on the pages.

Comment by CypherHackz
2006-06-14 11:48:11

But I’ve read on a site that unless we put up the source link, it will not be marked as duplicate.

Comment by Gaman
2006-06-14 11:50:28

CypherHackz: I used to believe that too, but the search engines are apparently getting more clever these days.

Comment by Gaman
2006-06-18 21:39:16

BTW ShaolinTiger, I doubt Kahsoon is using the subdomain trick. As far as I can see, most of his traffic comes from Google and goes to the domain, as I have seen from his Sitemeter stats all this while.

Check this out to see what I mean:

As you can see, no subdomain appears in the search results, and that was always the case before he started losing traffic.
