Back in 1997 I did some research in an attempt to reverse-engineer the algorithms used by search engines. In that year, the big ones included AltaVista, Webcrawler, Lycos, Infoseek, and a few others.
I was largely able to declare my research a success. In fact, it was so accurate that in one case I was able to write a program that produced exactly the same search results as one of the search engines. This article explains how I did it, and how the method is still beneficial today.
Step 1: Determine Rankable Traits
The first thing to do is make a list of what you want to measure. I came up with about 15 different possible ways to rank a web page. They included things like:
- keywords in title
- keyword density
- keyword frequency
- keyword in header
- keyword in ALT tags
- keyword emphasis (bold, strong, italics)
- keyword in body
- keyword in URL
- keyword in domain or sub-domain
- criteria by location (density in title, header, body, or tail), etc.
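As a rough illustration, a few of the traits listed above (keyword frequency and density in the title, headers, emphasis tags, ALT text, and the page as a whole) could be measured with a short Python script. This is only a sketch and not the program used in 1997; the names TraitParser and measure are made up for this example, and the word-splitting rule is an assumption.

```python
import re
from collections import Counter
from html.parser import HTMLParser

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
EMPHASIS_TAGS = {"b", "strong", "i", "em"}


class TraitParser(HTMLParser):
    """Hypothetical helper: sorts page text into the regions being measured."""

    def __init__(self):
        super().__init__()
        self.open_tags = Counter()  # how many of each tag are currently open
        self.regions = {"title": [], "header": [], "emphasis": [],
                        "alt": [], "all_text": []}

    def handle_starttag(self, tag, attrs):
        self.open_tags[tag] += 1
        for name, value in attrs:
            if name == "alt" and value:
                self.regions["alt"].append(value)

    def handle_endtag(self, tag):
        if self.open_tags[tag] > 0:
            self.open_tags[tag] -= 1

    def handle_data(self, data):
        self.regions["all_text"].append(data)
        if self.open_tags["title"]:
            self.regions["title"].append(data)
        if any(self.open_tags[t] for t in HEADER_TAGS):
            self.regions["header"].append(data)
        if any(self.open_tags[t] for t in EMPHASIS_TAGS):
            self.regions["emphasis"].append(data)


def measure(html, keyword):
    """Return keyword frequency and density for each region of one page."""
    parser = TraitParser()
    parser.feed(html)
    parser.close()
    keyword = keyword.lower()
    report = {}
    for region, chunks in parser.regions.items():
        # Assumption: a "word" is any run of letters or digits.
        tokens = re.findall(r"[a-z0-9]+", " ".join(chunks).lower())
        hits = tokens.count(keyword)
        density = hits / len(tokens) if tokens else 0.0
        report[region] = {"frequency": hits, "density": density}
    return report
```

Calling measure(page_html, "oofness") returns one frequency and one density figure per region, which is enough to compare two test pages on a single trait.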
Step 2: Invent a New Keyword
The second step is to determine which keyword to test with. The key is to choose a word that does not exist in any language on Earth. Otherwise, you will not be able to isolate your variables for this study.
I used to work at a company called Interactive Imaginations, and our sites were Riddler.com and the Commonwealth Network (CWN). At the time, Riddler was the largest entertainment web site, and CWN was one of the top-trafficked sites on the net (in the top 3). I turned to my co-worker Carol and mentioned I needed a fake word. She gave me oofness. I did a quick search and it was not found on any search engine.
Note that a unique word can also be used to see who has copied content from your web sites onto their own. All of my test pages have been gone for many years now, yet a Google search for the word still shows some sites that copied my pages.
Step 3: Create Test Pages
The next thing to do was to create test pages. I took my home page for my now-defunct Amiga search engine Amicrawler.com and made about 75 copies of it. I then numbered each file 1.html, 2.html... 75.html.
For each measurement criterion, I made at least 3 HTML files. For example, to measure keyword density in title, I modified the HTML titles of the first 3 files to look like this:
1.html:
The html files of course contained the rest of my home page. I then logged in my notebook that files 1 - 3 were keyword density in title files.
I repeated this type of HTML editing for about 75 or so files, until I had every criterion covered. The files were then uploaded to my web server and placed in the same directory so that search engines could find them.
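The copying and logging part of this step is easy to automate. The sketch below is a hypothetical Python helper (make_test_pages and notebook.csv are names invented here): it makes numbered copies of a template page, three per criterion by default, and writes down which file belongs to which criterion. The actual edits to each copy would still be made by hand.

```python
import csv
import shutil
from pathlib import Path


def make_test_pages(template, out_dir, criteria, copies_per_criterion=3):
    """Copy the template to 1.html, 2.html, ... and log file -> criterion."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    notebook = []
    number = 1
    for criterion in criteria:
        for _ in range(copies_per_criterion):
            name = f"{number}.html"
            shutil.copyfile(template, out_dir / name)
            notebook.append((name, criterion))
            number += 1
    # The "notebook": one row per test file, kept next to the copies.
    with open(out_dir / "notebook.csv", "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "criterion"])
        writer.writerows(notebook)
    return notebook
```

With 25 criteria and 3 copies each, this produces the 75 numbered files described above. Remember to keep the notebook file off the public web server if you do not want it indexed as well.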
Step 4: Wait for Search Engines to Index Test Pages
Over the next few days, some of the pages started appearing in search engines. However, a site like AltaVista might only show 2 or 3 pages. Infoseek / Ultraseek was doing real-time indexing at the time, so I got to test everything right away. In other cases, I had to wait a few weeks or months for the pages to get indexed.
Simply typing the keyword oofness would bring up all pages indexed that had that keyword, in the order ranked by the search engine. Since only my pages contained that word, I would not have competing pages to confuse me.
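Once a search engine returns the test pages in ranked order, the notebook from Step 3 tells you which criterion each result belongs to. One way to read the results, sketched here in Python under the assumption that the notebook is a simple file-to-criterion mapping, is to average the result positions for each criterion. The function name is hypothetical.

```python
from collections import defaultdict


def average_rank_by_criterion(ranked_files, notebook):
    """ranked_files: file names in the order the search engine listed them.
    notebook: dict mapping file name -> criterion being tested.
    Returns (criterion, average position) pairs, best-ranked first."""
    positions = defaultdict(list)
    for position, name in enumerate(ranked_files, start=1):
        criterion = notebook.get(name)
        if criterion is not None:  # ignore pages that are not in the notebook
            positions[criterion].append(position)
    averages = {c: sum(p) / len(p) for c, p in positions.items()}
    return sorted(averages.items(), key=lambda item: item[1])
```

A criterion whose files cluster at the top of the list is one the engine rewards; one whose files are scattered probably does not matter much to that engine.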
Step 5: Study Results
To my surprise, most search engines had very poor ranking methodology. Webcrawler used a very simple word density scoring system. In fact, I was able to write a program that gave exactly the same search results as Webcrawler. That's right: just give it a list of 10 URLs, and it would rank them in exactly the same order as Webcrawler. Using this program I could make any of my pages rank #1 if I wanted to. The problem, of course, was that Webcrawler did not generate any traffic even if I was listed number 1, so I did not bother with it.
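To give a feel for how simple a word density ranker can be, here is a minimal Python sketch. It is not the original program, and the scoring formula (keyword occurrences divided by total words) is an assumption, since the exact formula Webcrawler used is not given here. The function names are hypothetical, and the page text is passed in directly rather than fetched from URLs.

```python
import re


def density(text, keyword):
    """Share of the words in text that are the keyword (0.0 for empty text)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(keyword.lower()) / len(tokens)


def rank_by_density(pages, keyword):
    """pages: dict mapping URL -> page text. Returns URLs, highest density first."""
    return sorted(pages, key=lambda url: density(pages[url], keyword), reverse=True)
```

Feeding the same pages to a sketch like this and to the real search engine, then adjusting the formula until the two orderings agree, is the heart of the method.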
AltaVista ranked highest the pages with the most keywords in the HTML title. It ranked a few pages way at the bottom, but I don't recall which criteria performed worst. The rest of the pages ranked somewhere in the middle. All in all, AltaVista only cared about keywords in the title. Everything else didn't seem to matter.
A few years later, I repeated this test with AltaVista and found it was giving high preference to domain names. So I added a wildcard to my DNS and web server, and put keywords in the sub-domain. Voila! All of my pages had #1 ranking for any keyword I chose. This of course led to one problem: competing web sites don't like losing their top positions and will do anything to protect their rankings when it costs them traffic.
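With a wildcard in place, every sub-domain reaches the same web server, so the server has to work out the keyword from the host name of each request. A minimal Python sketch of that one step follows. The function name and the example.com domain in the usage note are placeholders, and this is only one plausible way to do it, not a record of the original setup.

```python
def keyword_from_host(host, base_domain):
    """Return the sub-domain part of host, or None if there is none.

    host is the value of the HTTP Host header, possibly with a port.
    """
    host = host.split(":")[0].lower().rstrip(".")  # drop port and trailing dot
    suffix = "." + base_domain.lower()
    if not host.endswith(suffix):
        return None  # bare domain, or a different domain entirely
    label = host[: -len(suffix)]
    return label or None
```

For instance, a request for oofness.example.com with a base domain of example.com yields the keyword oofness, while a request for example.com itself yields nothing.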
Other Methods of Testing Search Engines
I am going to quickly list some other things that can be done to test search engine algorithms. But these are all lengthy topics to discuss, so I will only touch on them.
I tested some search engines by uploading large copies of the dictionary, and redirecting any traffic to a safe page. I also tested them by indexing massive quantities of documents (in the millions) under hundreds of domain names. I found in general that there are very few magic keywords found in most documents. The fact remains that a few search terms like sex, Britney Spears, etc. brought in traffic, but most did not. Hence, most pages never saw any human traffic.
Drawbacks
Unfortunately, there were some drawbacks to getting listed #1 for a lot of keywords. First, I found that it ticked off a lot of people who had competing web sites. They would usually start by copying my winning methodology (like placing keywords in the sub-domain), then repeat the process themselves and flood the search engines with 100 times more pages than the 1 page I had made. That made it worthless to compete for prime keywords.
Second, certain data cannot be measured. You can use tools like Alexa to determine traffic, or Google's site:domain.com to find out how many listings a domain has, but unless you have a lot of this data to measure, you won't get any usable readings. What good is it to try to beat a major web site for a major keyword if they already have millions of visitors per day, you don't, and that traffic is part of the search engine ranking?
Third, bandwidth and resources can become a problem. I have had web sites where 75% of my traffic was search engine spiders. They slammed my sites every second of every day for months. I would literally get 30,000 hits from the Google spider every day, in addition to other spiders. And contrary to what THEY believe, they aren't as friendly as they claim.
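If you want to check how much of your own traffic comes from spiders, the web server log is the place to look. The Python sketch below assumes the log is in the common "combined" format, where the user agent is the last quoted field on each line, and it uses a short, admittedly crude list of substrings to spot spiders. Both the marker list and the function name are assumptions made for illustration.

```python
import re

# The user agent is the last quoted field in a combined-format log line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

# Hypothetical markers; real spiders do not always identify themselves.
SPIDER_MARKERS = ("bot", "spider", "crawler", "slurp")


def spider_share(log_lines):
    """Return the fraction of requests whose user agent looks like a spider."""
    total = 0
    spiders = 0
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        total += 1
        agent = match.group(1).lower()
        if any(marker in agent for marker in SPIDER_MARKERS):
            spiders += 1
    return spiders / total if total else 0.0
```

Pass it the lines of an access log, for example an open file object, and multiply the result by 100 to get a percentage.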
Another drawback is that if you are doing this for a corporate web site, it might not look so good.
For example, you might recall a few weeks ago when Google was caught using shadow pages, and of course claimed they were only test pages. Right. Does Google have no dev servers? No staging servers? Are they smart enough to make shadow pages hidden from normal users, but not smart enough to hide dev or test pages from normal users? Have they not figured out how a URL or IP filter works? Those pages must have served a purpose, and they didn't want most people to know about it. Maybe they were just weather balloon pages.
I recall discovering some pages that a hot online & print tech magazine (the one that wired us into the digital world) had placed on search engines. They had placed numerous blank landing pages using font colors matching the background, which contained large quantities of keywords for their largest competitor. Perhaps they wanted to pay digital homage to CNET? Again, this was probably back in 1998. In fact, they were running articles at the time about how it is wrong to try and trick search engines, yet they were doing it themselves.
Conclusion
While this methodology is good for learning a few things about search engines, on the whole I would not recommend making this the basis for your web site promotion. The quantity of pages to compete against, the quality of your visitors, the shoot-first mentality of search engines, and many other factors will prove that there are better ways to do web site promotion.
This methodology can be used for reverse engineering other products. For example, when I worked at Agency.com doing stats, we used a product made by a major micro software company (you might be using one of their fine operating system products right now) to analyze web server logs. The problem was that it took more than 24 hours to analyze 1 day's worth of logs, so it was never up to date. A little bit of magic and a little bit of Perl was able to generate the same reports in 45 minutes. I got there simply by feeding the same logs into both systems until the results came out the same and every condition was accounted for.
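The "feed both systems the same input and compare" loop can be sketched in a few lines. The original work was done in Perl; the Python below is only an illustration, and it assumes each report has already been reduced to a dictionary of named figures. The function name is hypothetical.

```python
def report_differences(reference, candidate):
    """Compare two reports (dicts of metric name -> value).

    Returns {metric: (reference value, candidate value)} for every metric
    that is missing from one side or differs between the two.
    """
    differences = {}
    for key in sorted(set(reference) | set(candidate)):
        expected = reference.get(key)
        actual = candidate.get(key)
        if expected != actual:
            differences[key] = (expected, actual)
    return differences
```

Each entry that comes back points at a condition the new program does not yet handle the way the original product does. When the result is empty for every log you can throw at it, the two systems agree.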
Copyright 2005 . All Rights Reserved. is a book price comparison shopping engine, allowing you to locate the cheapest prices on millions of books and ebooks.
So it seems that content is still king, and that an RSS article feed does a nice job of streaming in fresh, keyword-loaded content that gets your site indexed fast AND brings AdSense revenue. That's great, but if you think that articles replace traditional web copy, you're dead wrong.
Look at any top corporation's website structure to see that the traditional website model still takes first prize. Even if you've unearthed the absolute best articles on the web from leading experts, an article feed is still no way to greet a customer who's just landed on your homepage.
Think about this. How did you feel the last time you clicked and landed and sat there blinking at an article that you weren't planning on reading? If the article managed to suck you in, were you tempted, after perusing it, to explore other parts of that website? Or did you just go, 'Ahh, good one,' and continue on your merry ADD-like web-surfing way?
My guess is, you clicked away. So if you're looking to launch a new site or considering making improvements on your existing site, how about putting that RSS feed back where it belongs - on the article page.
Remember your visitor. Now that he's there, how can you keep him interested in sticking around? The guests who arrive on your homepage are no different than the guests who come to your home. You work hard to create a home that reflects who you are - your unique style, your hobbies, habits, culture and beliefs. When someone comes into YOUR house, you don't push someone else's stuff on them. Why would you do that on your website?
Give your website guests the opportunity to get to know your company and what you stand for. That means including lively, inspiring and sales-driven copy that explains your product and service offerings. That means a company philosophy and a personal bio that tells the story of who you are and builds a case for your trustworthiness and expertise. That means copy that addresses your customers' emotional mindset and sucks them into wanting and even needing what you sell. And let's not forget the hallmarks of a real business - the privacy policy, user agreement, payment policies and customer-satisfaction guarantee as each pertains to your profession.
Nobody should ever be confused by what they see on your website. Nobody should ever be lost on your website. Give your website a test run - click from a few different outside links and see where you end up. You MUST have a logo--on every page. Your company name must be clearly labeled on every page, even the deep content pages. Your contact information should be easily accessible and at the very least include an email address or comment submission form. Give your customer every opportunity to reach out and get in touch with you.
Finally - your website should be strategically linked from page to page, keyword to relevant keyword, with the ultimate purchase end in sight. Sure, it's fun to create websites for the heck of it, but if you're interested in having a real business that reflects true professionalism, then YES you need legitimate, one-of-a-kind web content that tells the story of you and what you do. And above all, it should expertly lead the customer to buy!
Hire an experienced designer who knows about Search Engine Optimization. Hire a copywriter who can create sizzling content on all the pages that will remain stationary and even those that will require regular updating. Take care in creating a website that's a marketer's dream come true. Invest in the future of your business.
Copyright 2005 Dina Giolitto. All rights reserved.
Dina Giolitto is a copywriting consultant and ghostwriter with 10 years of experience writing corporate print materials and web content. Trust her with your next e-book, article series or web project, and make a lasting impression on your audience of information-hungry prospects. Visit for more information.