Thursday, July 13, 2006
Is it just me, or are these charts a bit goofy? Does Yahoo really still have 23% of the search market? Is Google at less than half the search market?
I don't believe it. Any webmaster will tell you that Google accounts for almost ALL search engine traffic, and Yahoo is nowhere near 23%. Just read the blogs here, here, here and here, and countless others.
So I decided to poke around Alexa's stats and build my own search engine market share graph. By my calcs, Google has almost 85% of the market and Yahoo has 9%, which is about what I expected.
I'm not sure why the premier (read: expensive) services don't match what everybody sees in their logs...
What do you see in your logs?
Some notes about my calculations:
I am counting reach, not pageviews. I didn't count any international domains such as yahoo.co.jp or google.de. For Google I only counted google.com, not subdomains like mail.google.com and images.google.com. For Yahoo I only counted search.yahoo.com. For MSN I only counted search.msn.com. For Ask I only counted ask.com and search.ask.com. For AOL I only counted search.aol.com. For MyWay I only counted mysearch.myway.com and search.myway.com.
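The notes above don't spell out how the per-domain reach numbers were combined. Here is a minimal Python sketch under one assumption: each engine's reach is the sum of the reach of its listed domains, and market share is that engine's reach divided by the total across all engines. The `ENGINE_DOMAINS` mapping, the `market_share` helper and the reach values in the usage lines are hypothetical. They are not Alexa data.

```python
# Hypothetical mapping of each engine to the domains counted in the notes above.
ENGINE_DOMAINS = {
    "Google": ["google.com"],
    "Yahoo": ["search.yahoo.com"],
    "MSN": ["search.msn.com"],
    "Ask": ["ask.com", "search.ask.com"],
    "AOL": ["search.aol.com"],
    "MyWay": ["mysearch.myway.com", "search.myway.com"],
}


def market_share(reach: dict[str, float]) -> dict[str, float]:
    """Return each engine's share (in percent) of the summed reach.

    Assumption: an engine's reach is the plain sum of its domains' reach;
    domains missing from `reach` count as zero.
    """
    engine_reach = {
        engine: sum(reach.get(domain, 0.0) for domain in domains)
        for engine, domains in ENGINE_DOMAINS.items()
    }
    total = sum(engine_reach.values())
    if total == 0:
        return {engine: 0.0 for engine in engine_reach}
    return {engine: 100.0 * r / total for engine, r in engine_reach.items()}


# Made-up reach figures, for illustration only.
sample_reach = {"google.com": 80, "search.yahoo.com": 10, "ask.com": 5, "search.ask.com": 5}
for engine, pct in market_share(sample_reach).items():
    print(f"{engine}: {pct:.1f}%")
```

Summing reach across two domains, as for Ask and MyWay, can double-count people who visit both. Treat the result as an approximation.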
Wednesday, July 05, 2006
I'm on a jag here. We are still digging around the Web Search Platform and pulling up some stats. It can be addictive.
I managed to pull this graph together without spending any money. I simply used the "Create a Collection" feature of the platform to find out what was in the crawl.
The question was this: What response codes does our crawler get when it tries to crawl the Web?
When the crawler attempts to crawl a document, it is like knocking on a door... is anybody home? Did you move? Did I knock on the wrong door? Did the house disappear?
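To make the knocking analogy concrete, here is a small Python sketch. It turns a status code into its standard reason phrase, using the standard library's `http.HTTPStatus`, and adds a one-line meaning based on the code's class. The `describe_status` helper and the wording of the meanings are illustrative. They are not part of the Alexa crawler.

```python
from http import HTTPStatus

# Illustrative one-line meanings per status class (first digit of the code).
CLASS_MEANING = {
    2: "success, somebody is home",
    3: "redirection, the document moved",
    4: "client error, wrong door",
    5: "server error, trouble at the house",
}


def describe_status(code: int) -> str:
    """Return a string like '404 Not Found: client error, wrong door'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # code not defined in http.HTTPStatus
    meaning = CLASS_MEANING.get(code // 100, "other")
    return f"{code} {phrase}: {meaning}"


for code in (200, 301, 404, 503):
    print(describe_status(code))
```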
My methodology was dead simple. I just used the Create a Collection form on the Web Search Platform to construct queries asking how many documents in the Alexa Crawl had each HTTP response code between April and May. Within a few seconds I had the answer.
In the pie chart above, green represents response code 200 - OK, meaning that it was a successful transaction.
Items in red are in the 5XX class and represent server errors. These could be DNS problems, servers that are down, or other problems.
Less than 0.5 percent of all docs were server errors.
Another 6.5 percent were the items in blue, the 4XX class, including everybody's favorite, 404 Not Found. These are known as client errors.
Last but not least, the 3XX codes in yellow are redirects, indicating that the document has moved. About 8% of all docs returned one of these codes.
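The pie chart comes down to grouping per-code document counts by their first digit and dividing each group by the total. Here is a minimal Python sketch of that arithmetic. It uses a hypothetical `class_shares` helper and made-up counts, not the Alexa crawl numbers.

```python
from collections import Counter


def class_shares(counts: dict[int, int]) -> dict[str, float]:
    """Collapse per-status-code document counts into percent by class (2XX, 3XX, ...)."""
    by_class = Counter()
    for code, n in counts.items():
        by_class[f"{code // 100}XX"] += n
    total = sum(by_class.values())
    if total == 0:
        return {}
    return {cls: 100.0 * n / total for cls, n in sorted(by_class.items())}


# Hypothetical document counts per response code, NOT real crawl data.
example = {200: 700, 301: 100, 302: 50, 404: 100, 403: 30, 500: 15, 503: 5}
for cls, pct in class_shares(example).items():
    print(f"{cls}: {pct:.1f}%")
```

Any 1XX codes in the input would show up as their own "1XX" slice. The shares always add up to 100% of whatever codes were counted.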
We have a few more ideas about what to do next, like looking at the distribution of domains across the Web, and we are looking for more. Suggestions are welcome.