DevisedMedia.com

Affordable Web Development Resources Since 2003

 



 


Creating a Robots.txt file


Some people believe that they should create different pages for different search engines, each page optimized for one keyword and for one search engine. Now, while I don't recommend that people create different pages for different search engines, if you do decide to create such pages, there is one issue that you need to be aware of.

These pages, although optimized for different search engines, often turn out to be pretty similar to each other. The search engines can now detect when a site has created such similar-looking pages, and they penalize or even ban such sites. To prevent your site from being penalized for spamming, you need to stop each search engine's spider from indexing pages that are not meant for it, i.e. you need to prevent AltaVista from indexing pages meant for Google and vice versa. The best way to do that is to use a robots.txt file.

You should create a robots.txt file using a text editor like Windows Notepad. Don't use your word processor to create such a file.

Here is the basic syntax of the robots.txt file:

User-Agent: [Spider Name]
Disallow: [File Name]

For instance, to tell AltaVista's spider, Scooter, not to spider the file named myfile1.html residing in the root directory of the server, you would write

User-Agent: Scooter
Disallow: /myfile1.html

To tell Google's spider, called Googlebot, not to spider the files myfile2.html and myfile3.html, you would write

User-Agent: Googlebot
Disallow: /myfile2.html
Disallow: /myfile3.html

You can, of course, put multiple User-Agent statements in the same robots.txt file. Hence, to tell AltaVista not to spider the file named myfile1.html, and to tell Google not to spider the files myfile2.html and myfile3.html, you would write

User-Agent: Scooter
Disallow: /myfile1.html

User-Agent: Googlebot
Disallow: /myfile2.html
Disallow: /myfile3.html
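Before uploading a file like this, you can check how a parser reads it. The following is a minimal sketch using Python's standard urllib.robotparser module. It shows how that one parser interprets the rules; real spiders may behave differently in edge cases. The helper name build_parser is my own.

```python
from urllib.robotparser import RobotFileParser

# The two-spider example from the article, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-Agent: Scooter
Disallow: /myfile1.html

User-Agent: Googlebot
Disallow: /myfile2.html
Disallow: /myfile3.html
"""


def build_parser(text):
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask, for each spider, whether each file may be fetched.
for agent in ("Scooter", "Googlebot"):
    for path in ("/myfile1.html", "/myfile2.html", "/myfile3.html"):
        verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
        print(f"{agent:10} {path:15} {verdict}")
```

With these rules, Scooter is blocked only from myfile1.html, and Googlebot is blocked only from myfile2.html and myfile3.html.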

If you want to prevent all robots from spidering the file named myfile4.html, you can use the * wildcard character in the User-Agent line, i.e. you would write

User-Agent: *
Disallow: /myfile4.html

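However, the original robots.txt standard does not allow wildcards in the Disallow line: the value is matched as a plain prefix of the URL path. Some search engines, Google among them, later added their own pattern-matching extensions, so check each engine's documentation before relying on them.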
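To see how the * User-Agent line and a plain Disallow value behave, here is a small sketch using Python's standard urllib.robotparser module. It reflects how that parser treats the rules, which is not necessarily how every spider does. The /private path and the file names under it are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Parse robots.txt text held in memory."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# A "*" User-Agent record applies to any spider without its own record.
all_robots = parse_rules("User-Agent: *\nDisallow: /myfile4.html\n")
print(all_robots.can_fetch("Scooter", "/myfile4.html"))    # False
print(all_robots.can_fetch("Googlebot", "/myfile4.html"))  # False
print(all_robots.can_fetch("Googlebot", "/myfile1.html"))  # True

# A Disallow value is compared against the start of the URL path,
# so one line can cover everything under a (hypothetical) directory.
prefix_rules = parse_rules("User-Agent: *\nDisallow: /private\n")
print(prefix_rules.can_fetch("Scooter", "/private/notes.html"))  # False
print(prefix_rules.can_fetch("Scooter", "/public.html"))         # True
```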

Once you have created the robots.txt file, you should upload it to the root directory of your domain. Uploading it to any sub-directory won't work - the robots.txt file needs to be in the root directory.
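To make the "root directory" rule concrete, this short sketch works out where a spider would look for robots.txt, given the address of any page on the site. The function name robots_txt_url and the example.com addresses are my own, used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt address for the site that serves page_url.

    Only the scheme and host are kept; the path is always /robots.txt,
    no matter how deep in the site the page itself lives.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/articles/seo/page.html"))
# http://www.example.com/robots.txt
```

A file uploaded to /articles/robots.txt would never be requested, because the sub-directory is dropped when the address is built.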

I won't discuss the syntax and structure of the robots.txt file any further - you can get the complete specifications from here.

Now we come to how the robots.txt file can be used to prevent your site from being penalized for spamming in case you are creating different pages for different search engines. What you need to do is to prevent each search engine from spidering pages which are not meant for it.

For simplicity, let's assume that you are targeting only two keywords: "tourism in Australia" and "travel to Australia". Also, let's assume that you are targeting only three of the major search engines: AltaVista, HotBot and Google.

Now, suppose you name the files using the following convention: each file name consists of the words of the keyword for which the page is optimized, joined by hyphens, followed by a hyphen and the first two letters of the name of the search engine for which the page is optimized.

Hence, the files for AltaVista are

tourism-in-australia-al.html
travel-to-australia-al.html

The files for HotBot are

tourism-in-australia-ho.html
travel-to-australia-ho.html

The files for Google are

tourism-in-australia-go.html
travel-to-australia-go.html
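The naming convention is mechanical enough to write down as a few lines of code. This is only a sketch of the convention described above. The function name page_filename is my own, and lower-casing the keyword is an assumption based on the file names listed.

```python
def page_filename(keyword, engine):
    """Build a file name from a keyword and a search engine name.

    Words of the keyword are joined by hyphens, then the first two
    letters of the engine name are appended, all in lower case.
    """
    slug = "-".join(keyword.lower().split())
    return f"{slug}-{engine[:2].lower()}.html"


for engine in ("AltaVista", "HotBot", "Google"):
    for keyword in ("tourism in Australia", "travel to Australia"):
        print(page_filename(keyword, engine))
```

This prints the six file names listed above, in the same order.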

As I noted earlier, AltaVista's spider is called Scooter and Google's spider is called Googlebot.

A list of spiders for the major search engines can be found here.

Now, we know that HotBot uses Inktomi, and from this list we find that Inktomi's spider is called Slurp.

Using this knowledge, here's what the robots.txt file should contain:

User-Agent: Scooter
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Slurp
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Googlebot
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html

When you put the above lines in the robots.txt file, you instruct each search engine not to spider the files meant for the other search engines.
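With more keywords or more search engines, typing these lines by hand becomes error-prone. The sketch below generates the same file from the keyword list and the spider names given in this article. The function names and the SPIDERS table layout are my own, and the sketch assumes the file naming convention described earlier.

```python
KEYWORDS = ["tourism in Australia", "travel to Australia"]

# Spider name -> search engine whose pages that spider SHOULD see.
SPIDERS = {"Scooter": "AltaVista", "Slurp": "HotBot", "Googlebot": "Google"}


def page_filename(keyword, engine):
    """Hyphenated keyword plus the first two letters of the engine name."""
    slug = "-".join(keyword.lower().split())
    return f"{slug}-{engine[:2].lower()}.html"


def build_robots_txt(keywords, spiders):
    """Give each spider a record that disallows the other engines' pages."""
    records = []
    for spider, own_engine in spiders.items():
        lines = [f"User-Agent: {spider}"]
        for engine in spiders.values():
            if engine == own_engine:
                continue  # a spider may read the pages written for it
            for keyword in keywords:
                lines.append(f"Disallow: /{page_filename(keyword, engine)}")
        records.append("\n".join(lines))
    # Records are separated by a blank line, as in the hand-written file.
    return "\n\n".join(records) + "\n"


robots_txt = build_robots_txt(KEYWORDS, SPIDERS)
print(robots_txt)
```

Adding a keyword to KEYWORDS or a spider to SPIDERS regenerates every record consistently, which is the main reason to generate the file rather than type it.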

When you have finished creating the robots.txt file, double-check that it contains no errors. A small error can have disastrous consequences: a search engine may spider files which are not meant for it, in which case it can penalize your site for spamming, or it may not spider any files at all, in which case you won't get top rankings in that search engine.

A useful tool to check the syntax of your robots.txt file can be found here. While it will help you correct syntactical errors in the robots.txt file, it won't help you correct any logical errors, for which you will still need to go through the robots.txt file thoroughly, as mentioned above.
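Part of the logical check can be automated as well. The following sketch uses Python's standard urllib.robotparser module to confirm that each spider can reach its own pages and none of the others. It tests how that one parser reads the file, which is a reasonable but not guaranteed stand-in for the real spiders. The function name find_logical_errors and the lookup tables are my own.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: Scooter
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Slurp
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-go.html
Disallow: /travel-to-australia-go.html

User-Agent: Googlebot
Disallow: /tourism-in-australia-al.html
Disallow: /travel-to-australia-al.html
Disallow: /tourism-in-australia-ho.html
Disallow: /travel-to-australia-ho.html
"""

# Spider name -> file name suffix of the pages written for its engine.
OWN_SUFFIX = {"Scooter": "al", "Slurp": "ho", "Googlebot": "go"}
KEYWORD_SLUGS = ["tourism-in-australia", "travel-to-australia"]


def find_logical_errors(robots_txt):
    """List every (spider, file) pair whose access differs from the plan."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    errors = []
    for spider, own in OWN_SUFFIX.items():
        for suffix in OWN_SUFFIX.values():
            for slug in KEYWORD_SLUGS:
                path = f"/{slug}-{suffix}.html"
                allowed = parser.can_fetch(spider, path)
                should_allow = suffix == own
                if allowed != should_allow:
                    state = "allowed" if allowed else "blocked"
                    errors.append(f"{spider} is {state} for {path}")
    return errors


errors = find_logical_errors(ROBOTS_TXT)
print(errors if errors else "No logical errors found")
```

If a spider name is misspelled in the file, that spider's record never applies, and the function reports every page that the spider can now reach by mistake.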




Article by Sumantra Roy. Sumantra is one of the most respected and recognized search engine positioning specialists on the Internet. For more articles on search engine placement, subscribe to his 1st Search Ranking Newsletter by sending a blank email to 1stSearchRanking.999.99@optinpro.com or by going to www.1stSearchRanking.net
