What is a Robots.txt File?



Search engines look at millions of web pages to come up with search results. They do this with what we call search engine spiders. This makes sense: spiders crawling around on the Web. But another word for them is robots, because they are simply unmanned programs gathering data automatically. I can't help but picture them as the characters in the new animated movie Robots.

In the beginning, these robots spidered every page and every file attached to the Web. This caused problems for both the search engines and the people using them. Pages that really aren't worth looking at, such as header files meant to be included in all pages on a site, were being spidered and showing up in search results. Have you ever searched on Google and gotten a partial page as a result?

The solution was for Google and other search engines to begin looking for a robots.txt file in the root folder of each site (http://www.mydomain.com/robots.txt) to determine what should and shouldn't be searched. This is called the Robots Exclusion Standard. This simple text file, created with Notepad or another simple text editor, lets you tell the robots not to spider certain folders in your site. Well-behaved robots honor it, although compliance is voluntary. The result is happier visitors who come to your site from search engines and get only the full pages that you want them to see, not the partial, test or script pages you don't want them to see.
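To illustrate where a robot looks, here is a minimal Python sketch that derives the robots.txt address from any page address on a site. The function name robots_txt_url and the sample page address are made up for this illustration; only the root-folder location comes from the text above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, drop the page path and query string:
    # robots.txt is looked up in the root folder of the site.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # The page path and query string here are invented sample values.
    print(robots_txt_url("http://www.mydomain.com/shop/item.html?id=7"))
```

Whatever page a robot starts from, it ends up asking for the same single file at the top of the site.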

Let's look at some examples to get started:

This allows all spiders to spider all pages on your site. The * is a wildcard that means all spiders.

User-agent: *
Disallow:

This is the opposite of the above example. This one tells all spiders to NOT spider your whole site. You might want this if you have a test site, for example, that is not live yet.

User-agent: *
Disallow: /

This example tells all robots to stay out of the cgi-bin and images folders.

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/

This example tells only the WebFerret robot not to spider the page ferret.htm. It's only an example. I have nothing against WebFerret. The user agent name for Google is Googlebot.

User-agent: WebFerret
Disallow: /ferret.htm
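If you want to check what rules like these actually permit, one option is Python's standard urllib.robotparser module. The sketch below feeds it the four examples above and asks whether a given robot may fetch a given page. The helper name is_allowed, the constant names and the sample page addresses are assumptions made for this illustration, and the last rule is written with a leading slash, as the standard expects paths to be.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url.

    Hypothetical helper built on the standard library parser.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The four examples from the article, one rule per line.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_FOLDERS = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /images/\n"
BLOCK_ONE_PAGE = "User-agent: WebFerret\nDisallow: /ferret.htm\n"

if __name__ == "__main__":
    site = "http://www.mydomain.com"  # sample domain used in the article
    checks = [
        ("allow all", ALLOW_ALL, "Googlebot", "/index.htm"),
        ("block all", BLOCK_ALL, "Googlebot", "/index.htm"),
        ("block folders", BLOCK_FOLDERS, "Googlebot", "/images/logo.gif"),
        ("block folders", BLOCK_FOLDERS, "Googlebot", "/index.htm"),
        ("block one page", BLOCK_ONE_PAGE, "WebFerret", "/ferret.htm"),
        ("block one page", BLOCK_ONE_PAGE, "Googlebot", "/ferret.htm"),
    ]
    for label, rules, agent, path in checks:
        print(label, agent, path, is_allowed(rules, agent, site + path))
```

This only reports how one particular parser reads the file. Individual search engine robots may interpret edge cases differently.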

It is important that the file is a simple text file, so do not use Microsoft Word to create it. And be careful how you type it: it must look exactly like the above examples, with each rule on its own line, caps only for the first letter, just the right spacing, etc. A poorly done robots.txt file could harm your site more than help it. For a cool online robots.txt file validator, go to http://www.searchengineworld.com/cgi-bin/robotcheck.cgi.

 

As an e-commerce consultant for over three years, and a Web designer for over ten, Chuck Lasker has been helping individuals and organizations utilize the Internet in almost every arena. Chuck's e-newsletter and blog, The MerchantHowTo.com Report, is free and popular among e-store owners.

How to Get Indexed Quickly in Google

Getting indexed in Google does not work the way you may think. I touched on this subject briefly in my last post when I talked about Google PageRank. Now I want to go a little deeper into it and help you get your sites indexed. Let me start off by giving you a short quiz. Let's see how much you know and whether you are already ahead of others.

What's the best and fastest way to get your site indexed by Google?

a. Go to the Google Submit page at www.google.com and submit your site.

b. After you finish building your site, leave it alone so the Googlebot will pick it up automatically on its own as it prowls around the web.

c. Call or email Google and let them know your site is ready to be indexed.

d. Link your website to another high PR site.

What's the answer? Which do you think will get your site indexed? Is it a? b? How about c or d? They all sound like possibilities. Well, let's walk through the ones that won't work to get to the answer that will.

c. Call or email Google and let them know your site is ready to be indexed.

This is a good one. I do not know Google's number to call them. If you do, let me know. On second thought, you keep it. I do not need it. As you've figured out from my sarcasm, c is not the answer. No webmaster has ever called or emailed Google and gotten their site indexed. I do not believe they have such a number or email address. You're better off using their submit page. Oops, did I give away the answer? Read on...

"OK, OK, OK. C was not my real answer anyway. I know what it is. It's a. I've done a." All right, let's explore this answer.

a. Go to the Google Submit page and submit your site.

Not! "Wait a minute! You're telling me that the Google submit page does not work? Hold on, I've used it. It's there, it exists!" Yes, I know it exists. It's located here: Submit Page. However, the question was how to get your site listed quickly. Yes, you can submit your site using the submit page, but it won't get indexed quickly. The submit page is not the way unless you have months or years to wait. If you're like me (impatient), then the Google submit page is not the way to get your site indexed quickly. Let's put it this way: I have several sites and am working on building more. I have not submitted a single site to the Google submit page, except for one. That was my first one, when I did not know any better. The point is that all of my sites are indexed in Google, and many of them were indexed within 2-3 days! You read that right, 2-3 days. Don't believe me? Try a yourself. Go to the submit page above and submit your site to Google. Post to me when your site gets indexed.

Well, what about b? Let's explore b.

b. After you finish building your site, leave it alone so the Googlebot will pick it up automatically.

This is more of a myth than the rainbow with the pot of gold at the end. Not going to happen. How is the Googlebot to know your site exists in the first place, so that it can get to it? Wait, you thought the Googlebot was an imaginary robot I just made up to throw you off? Otherwise b would have been your answer, right? Well, I did, and it worked. But even if I had not, b would still have been the wrong answer. In all seriousness, the Googlebot is a robot, or spider, that surfs the web looking for websites to include in the Google index. It roams the web all day, every day. Building the site is just the beginning. Once it's built, it has to be promoted. I am not sure why anyone would build a site and then just leave it alone.

We've established that neither c nor a is the answer, and b is not it either. That leaves d. That's right, d is it. But why d, you ask? Here's why. I mentioned it a little already.

The Googlebot. The Googlebot frequents websites that have a decent PR. To Google, these are important websites, so it checks back often. It finds other websites through them: it follows the links on each website to other websites, and so on. If your site is linked from one of these sites, your site will be spotted by the Googlebot. Keep in mind that this does not mean your site will get indexed. There is a high chance it will, but your site still needs to be set up with good content and follow basic SEO practices. Also note that the Googlebot may not pick up all of your pages at once. It may only crawl so deep and then move on. It'll be back, so just continue to focus on the content of your site.

How will you know if the robot has visited your page? Check your stats. You do have those, right? Please say you do. How else will you know how your site is performing? If you have detailed stats, they will show which robots have visited your page.

The robot has visited your page, but you're still not indexed? Give it a little time. The Googlebot fetches the page, and it takes a little while for the page to actually show up in the index. If it does not appear, then the Googlebot may have determined that your site was not useful enough to be included in the index. Again, I cannot stress it enough: focus on your content. If you still have your link from the decent PR site, the bot will be back to visit. Each time it visits, it crawls deeper and deeper into your site. You may get indexed on the 4th or 5th visit.

If you use this tactic effectively, your site can be indexed in Google within 2-3 days, versus 6 months to a year, or never, with the submit page! Feel free to share your experiences with others. I am a constant student and love to learn from you.
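If your host gives you raw access logs rather than a stats page, a short script can do the same check. The Python sketch below assumes the logs are in the common "combined" format, where the user agent is the last quoted field on each line. The function name count_robot_visits and the two sample log lines are invented for this illustration and are not taken from any real server.

```python
import re
from collections import Counter

# Matches the final quoted field of a combined-format log line,
# which is assumed here to hold the visitor's user agent string.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_robot_visits(log_lines, robot_names=("Googlebot",)):
    """Count log lines whose user agent mentions each robot name."""
    visits = Counter()
    for line in log_lines:
        match = USER_AGENT_FIELD.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match.group(1).lower()
        for name in robot_names:
            if name.lower() in agent:
                visits[name] += 1
    return visits


# Made-up sample lines: one robot visit and one ordinary browser visit.
sample_log = [
    '66.249.66.1 - - [10/Oct/2005:13:55:36 -0700] "GET /index.htm HTTP/1.1" '
    '200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
    '203.0.113.9 - - [10/Oct/2005:14:01:12 -0700] "GET /about.htm HTTP/1.1" '
    '200 1180 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

if __name__ == "__main__":
    print(count_robot_visits(sample_log))
```

Keep in mind that any visitor can claim to be the Googlebot in its user agent string, so a count like this is only a rough indication.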

As always, I hope this info helps. You can visit my blog at http://onlaet.blogspot.com for additional information regarding SEO.

Good day!

 

John Cal
 
 
 

SEO expert. Owns and operates several sites that hold top 5 positions within the top search engines.
