Thursday, July 27, 2006

Put Thumbnail Images on your Website


Alexa launched the Alexa Site Thumbnail web service today. This service allows webmasters to easily incorporate thumbnail images of web site home pages into their sites.

Developers and webmasters can sign up and get started using the Alexa Site Thumbnail web service today at the Amazon Web Services web site http://aws.amazon.com/ast.

Thursday, July 13, 2006

Search Engine Market Share

I occasionally run into charts of search engine market share from Nielsen, like this one, or from other expensive services like comScore.

Is it just me, or are these charts a bit goofy? Does Yahoo really still have 23% of the search market? Is Google at less than half the search market?

I don't believe it. Any webmaster will tell you that Google represents almost ALL of the search engine traffic. Yahoo is nowhere near 23%. Just read the blogs: here, here, here, and here, and countless others.

So I decided to poke around Alexa's stats and generate my own search engine market share graph. According to my calcs, Google is at almost 85% market share. Yahoo is at 9%. This seems to mirror what I expected.

I'm not sure why the premier (read: expensive) services don't match what everybody sees in their logs...

What do you see in your logs?

Some notes about my calculations:
I am counting reach, not pageviews. I didn't count any international domains like yahoo.co.jp or google.de. For Google I only counted google.com and none of the subdomains like mail.google.com and images.google.com. For Yahoo I only counted search.yahoo.com. For MSN I only counted search.msn.com. For Ask I only counted ask.com and search.ask.com. For AOL I only counted search.aol.com. For MyWay I only counted mysearch.myway.com and search.myway.com.
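The calculation described in these notes can be sketched in a few lines of Python. This is a minimal sketch, assuming that each engine's reach is simply the sum of the reach of the hostnames listed above, and that an engine's share is that sum divided by the total over all six engines. The post doesn't spell out exactly how the numbers were combined, and summing reach across hostnames will double-count users who visit more than one of them. The name `market_share` and the reach figures in the example are hypothetical, not Alexa data.

```python
# Minimal sketch: search-engine "market share" from per-hostname reach.
# ASSUMPTION: an engine's reach is the sum of the reach of its hostnames.
# This can double-count users who visit several of them.

# Hostnames counted per engine, per the notes above; international
# domains (google.de, yahoo.co.jp, ...) and other subdomains are ignored.
ENGINE_HOSTS = {
    "Google": ["google.com"],
    "Yahoo": ["search.yahoo.com"],
    "MSN": ["search.msn.com"],
    "Ask": ["ask.com", "search.ask.com"],
    "AOL": ["search.aol.com"],
    "MyWay": ["mysearch.myway.com", "search.myway.com"],
}


def market_share(reach_by_host: dict[str, float]) -> dict[str, float]:
    """Return each engine's percentage of the combined reach of all engines."""
    engine_reach = {
        engine: sum(reach_by_host.get(host, 0.0) for host in hosts)
        for engine, hosts in ENGINE_HOSTS.items()
    }
    total = sum(engine_reach.values())
    if total == 0:
        raise ValueError("no reach data for any counted hostname")
    return {engine: 100.0 * r / total for engine, r in engine_reach.items()}


if __name__ == "__main__":
    # HYPOTHETICAL reach figures (percent of users), not Alexa data.
    sample = {
        "google.com": 6.0,
        "google.de": 5.0,  # international domain: not counted
        "search.yahoo.com": 2.0,
        "search.msn.com": 1.0,
        "ask.com": 0.5,
        "search.ask.com": 0.5,
    }
    for engine, share in market_share(sample).items():
        print(f"{engine}: {share:.1f}%")
```

Because only the listed hostnames are looked up, the google.de entry in the sample has no effect on the result, mirroring the exclusion of international domains.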


Thursday, July 06, 2006

Distribution of Domain Types

Alexa's Applications Engineer, Derrick Pallas, decided to do some creative analysis of domain types, or TLDs (top-level domains), by traffic rank and produced this nice-looking graph. It takes a few minutes to grok, but the effort pays off.

The area at the bottom of the graph in blue represents ".com" domains. According to the graph, the top 4 sites on the web are all .com domains. By rank 65,000, .com domains represent about 60% of sites. The remainder is made up of other popular TLDs, in order: .net, .org, .de, .ru, etc.

Some of the most interesting and somewhat surprising data points occur in the rank 5 through 50 range, where both China and Japan are well represented. Further down the ranks, however, both China and Japan fall off and represent a relatively small portion of the top 65K sites.
Conversely, Russia is underrepresented among the top sites but is fairly well represented at the farther reaches of the graph.

What this graph doesn't take into consideration is the reach of the largest sites. As I mentioned in a previous post, the largest sites receive vastly more traffic than the rest. The top 3 sites have an average reach of 23%, meaning that 23% of all users on the Internet are likely to visit them. Compare that to site #10 with a reach of 5%, site #100 with a reach of 0.1%, site #1,000 with a reach of 0.06%, or site #10,000 with a reach of 0.02%. The net effect of this (no pun intended) is that if we were to redraw the graph weighting each TLD by reach, the TLDs on the left of the graph would dominate the right side as well. The graph would become mostly blue, with some orange for China, some pink for Japan, and not much else.
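To see why weighting by reach changes the picture, here is a minimal Python sketch that computes TLD shares two ways: by counting sites, as the graph does, and by summing each site's reach. The helper names and the five-site list are hypothetical illustrations, not Alexa data, and the `tld` helper naively takes the last label of the hostname.

```python
# Minimal sketch: TLD shares by site count vs. weighted by reach.
# The site list below is HYPOTHETICAL, not Alexa data.
from collections import Counter


def tld(hostname: str) -> str:
    """Naively take the last label as the TLD: 'example.co.jp' -> '.jp'."""
    return "." + hostname.rsplit(".", 1)[-1].lower()


def tld_shares(
    sites: list[tuple[str, float]],
) -> tuple[dict[str, float], dict[str, float]]:
    """Return (percent of sites, percent of total reach) for each TLD."""
    counts: Counter = Counter()
    reach: Counter = Counter()
    for hostname, site_reach in sites:
        t = tld(hostname)
        counts[t] += 1
        reach[t] += site_reach
    total_reach = sum(reach.values())
    by_count = {t: 100.0 * c / len(sites) for t, c in counts.items()}
    by_reach = {t: 100.0 * r / total_reach for t, r in reach.items()}
    return by_count, by_reach


if __name__ == "__main__":
    # (hostname, reach in percent of users), in rank order; HYPOTHETICAL.
    sites = [
        ("a.com", 20.0),
        ("b.com", 20.0),
        ("c.jp", 5.0),
        ("d.ru", 2.5),
        ("e.ru", 2.5),
    ]
    by_count, by_reach = tld_shares(sites)
    for t in sorted(by_count):
        print(f"{t}: {by_count[t]:.0f}% of sites, {by_reach[t]:.0f}% of reach")
```

With this toy input, .ru holds 40% of the sites but only 10% of the reach, while .com goes from 40% by count to 80% by reach.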

Not to shift into sales/marketing mode, but since I'm sure some of you will ask: all of the source data is available as web feeds on Amazon.com. The Top Sites Service can be used to get the list of top sites globally or by country. Then there is the AWIS (Alexa Web Information Service) feed, which includes the URL Info operation that can be used to get all of the Alexa traffic info for any URL.


Wednesday, July 05, 2006

HTTP Response Codes - Is Anybody Home?

Update: Some of the numbers got scrambled on my way to the spreadsheet. Updated the graph and the numbers below.

I'm on a jag here. We are still digging around the Web Search Platform and pulling up some stats. It can be addictive.

I managed to pull this graph together without spending any money. I simply used the "Create a Collection" feature of the platform to find out what was in the crawl.

The question was this: What response codes does our crawler get when it tries to crawl the Web?

When the crawler attempts to crawl a document, it is like knocking on a door... is anybody home? Did you move? Did I knock on the wrong door? Did the house disappear?

My methodology was dead simple. I just used the Create a Collection form on the Web Search Platform to construct queries asking how many documents existed in the Alexa Crawl with various HTTP response codes during the April to May time period. Within a few seconds I had the answer.

In the pie chart above, green represents response code 200 - OK, meaning that the transaction was successful. 84% (corrected from 80%) of all items Alexa attempted to crawl in April and May returned that code.

Items in red are in the 5XX class and represent server errors. These could be DNS problems, servers that are down, or other problems. Less than 0.5 percent (corrected from 6.5 percent) of all docs were server errors.

Another 6.5 percent were the items in blue, 4XX class, including everybody's favorite, 404 error - page not found. These are referred to as client errors.

Last, but not least, there are the 3XX redirection codes in yellow, indicating that the document has been moved. About 8% of all docs returned one of these codes.
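The numbers above came from the Create a Collection form, but the same grouping is easy to reproduce on any list of status codes, such as your own server or crawler logs. Here is a minimal Python sketch, assuming you already have the codes as integers; the function name and the sample codes are hypothetical. It buckets each code by its first digit, which is how HTTP defines the 2XX/3XX/4XX/5XX classes.

```python
# Minimal sketch: distribution of HTTP status codes by class.
# The sample codes below are HYPOTHETICAL, not Alexa crawl data.
from collections import Counter

CLASS_NAMES = {
    2: "2XX success",
    3: "3XX redirection",
    4: "4XX client error",
    5: "5XX server error",
}


def status_class_distribution(codes: list[int]) -> dict[str, float]:
    """Bucket status codes by first digit; return percent of codes per class."""
    if not codes:
        raise ValueError("no status codes given")
    classes = Counter(code // 100 for code in codes)
    return {
        CLASS_NAMES.get(cls, f"{cls}XX other"): 100.0 * n / len(codes)
        for cls, n in sorted(classes.items())
    }


if __name__ == "__main__":
    sample = [200, 200, 200, 301, 404, 503, 200, 302]
    for name, pct in status_class_distribution(sample).items():
        print(f"{name}: {pct:.1f}%")
```

Codes outside the four classes in the chart (for example 1XX informational responses) are reported under a generic "other" label rather than dropped.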

We have a few more ideas about what to do next, like the distribution of domains across the Web, and we are looking for more. Suggestions are welcome.
