Thursday, December 21, 2006

Alexa Web Search APIs

Based on some recent news, I thought now would be a good time to reach out and remind developers that Alexa is fully committed to providing deep and meaningful programmatic access to Alexa's web search engine. If you need to get your digital hands on raw documents, you should give Alexa a try.

From the Alexa Web Search product detail page:

Alexa Web Search gives programmatic access to Alexa's web search engine. Developers can pass in search queries and get back up to 20 results at a time. This service allows developers or researchers to create innovative new search applications.

Taking advantage of advanced search features allows developers to restrict the search results to a particular category of sites, such as computers or sports, or search for phrases, documents in a specific language, or for specific file formats. Searches can be limited to document titles, anchor text, just the text of the document or just the document URL. Adult-content filtering is also available. In total, there are over 50 different search fields that allow developers to refine their searches.

See how this service works in practice: use the search on the Alexa web site at www.alexa.com, which is powered by the Alexa Web Search web service.

Advanced users can further customize the Alexa web search engine by adding custom search fields or by building completely custom search engines. To find out more about how to customize Alexa Web Search, please visit the Alexa Web Search Platform home page at websearch.alexa.com, take the tour, and read the technical documentation.
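Since the service hands back at most 20 results per request, collecting more than that means paging. Here is a rough sketch of that loop in Python. Note the assumptions: `fetch_page(query, start, count)` is a hypothetical callable you would supply yourself, not part of any Alexa client, and the real request parameters are in the technical documentation. A stand-in fetcher is included so the sketch runs offline.

```python
from typing import Callable, Iterator, List

MAX_PAGE_SIZE = 20  # the product page says up to 20 results come back at a time


def iter_results(query: str,
                 fetch_page: Callable[[str, int, int], List[str]],
                 limit: int) -> Iterator[str]:
    """Yield up to `limit` results by requesting pages of at most 20."""
    start = 0
    while start < limit:
        count = min(MAX_PAGE_SIZE, limit - start)
        page = fetch_page(query, start, count)  # hypothetical fetcher (assumption)
        if not page:
            break  # nothing more to fetch
        yield from page
        start += len(page)


def fake_fetch_page(query: str, start: int, count: int) -> List[str]:
    """Stand-in for a real request: pretends the index holds 45 matches."""
    corpus = [f"{query}-result-{i}" for i in range(45)]
    return corpus[start:start + count]


results = list(iter_results("winamp skin", fake_fetch_page, limit=50))
print(len(results))
```

Passing the fetcher in as an argument keeps the paging logic separate from however the request is actually made.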



Thursday, December 14, 2006

Image Search Released Today

About a month ago we decided to take the Web Search Platform on an exercise run. We put one engineer to the task and asked him to build an image search engine. Today we are releasing it on alexa.com. It is still in beta while we work out some of the kinks and continue to build the index. At this point we’ve indexed 10% of the images in the Alexa crawl. We expect to have all of them indexed in the next few months.

Alexa is committed to enabling anybody with programming skills to build a full-featured search engine without spending millions on crawlers, storage and indexing technology. So far we’ve released quite a few search engines of our own on the Web Search Platform, some with complete source code and tutorials, including:

  • Alexa Web Search – this is Alexa’s primary search index featured on alexa.com
  • Alexa Image Search – just released today

As well as the following niche/vertical search engines, all of which come with complete source code and build instructions:

  • Camera Image Search – this lets you search for photos based on the camera and settings. Thinking of buying a Sony Cybershot? See what pictures from that camera look like.
  • Robots.txt Search – this search lets you see which Web sites block which crawlers. For example, want to see all sites that block Yahoo Slurp?
  • Zip File Search – this search lets you look for files buried in Zip files. For example, looking for an old winamp skin?
  • HCard Search – this search lets you look for HCard contact info on the web. Looking for somebody named Mack?

If you are thinking about building a search engine, the Web Search Platform is definitely worth a look. Sign up for the beta and get started.


Tuesday, December 12, 2006

More Traffic Graphs

I've been looking for more sitemeter traffic graphs. They aren't as easy to find as I'd like. It would be great if somebody would put together a custom search engine to help me find them (hint, hint). After much searching, I found two more. First up, defamer.com.

Defamer is a celebrity gossip rag (their words, not mine), currently ranked 5,716 by Alexa. According to sitemeter they are getting somewhere around 250,000 visits per day. Shown here is their 30 day visits history (in green) overlaid with their Alexa Reach graph (the blue line).

This graph provides further evidence that if you are a popular site, you can expect to see correlation between Reach and Visitors.

I should probably mention, just to be clear, that Reach and Visitors aren't the same. Reach describes the percent of all Internet users who visit a site, while Visitors describes, well, visitors. The net effect is that a reach measurement will tend to constrain the peaks and valleys on the graph. But for the purposes of this experiment, they are close enough.

Next up, michellemalkin.com.

Michelle Malkin's site is a political blog, currently ranked 7,898 by Alexa. According to sitemeter she gets about half the visitors of defamer.com. Shown here is Michelle Malkin's 1 year visitor graph (in green) overlaid with her Alexa Reach history graph (the blue line).

Over the course of the year the reach graph is doing a pretty good job of following the trend of her site.

Because Reach measures the percent of all Internet users rather than total visitors, over time it will expose some variation driven by geographic shifts in Internet usage. You can see some of that in this example. Her site traffic was flat March through November, but Alexa shows it trending downward. This is because US users are a shrinking share of all Internet users, and hence the percent of Internet users visiting her site goes down with it.
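To make that effect concrete, here is a small Python sketch. The numbers are made up for illustration (they are not Alexa or sitemeter data): a site's daily visitors hold steady while the worldwide Internet population grows, so its Reach percentage drifts down even though its traffic is flat.

```python
def reach_percent(site_visitors: float, total_internet_users: float) -> float:
    """Reach as described above: the percent of all Internet users who visit a site."""
    return 100.0 * site_visitors / total_internet_users


# Hypothetical figures, for illustration only.
flat_daily_visitors = 125_000
internet_users_by_month = [500_000_000, 550_000_000, 600_000_000]

reach_by_month = [
    reach_percent(flat_daily_visitors, users)
    for users in internet_users_by_month
]

for users, reach in zip(internet_users_by_month, reach_by_month):
    print(f"{users:>12,} users -> reach {reach:.4f}%")
```

The visitor count never changes in this sketch. Only the denominator does, which is enough to pull the Reach line down.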

As I mentioned earlier, defamer gets about twice as many daily visitors as Michelle Malkin. Do the Alexa Reach History graphs confirm this?

Looks just about right. Defamer looks to have an average of somewhere around 0.045% Reach, while Michelle Malkin has 0.022%.
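As a quick sanity check on those two eyeballed averages, the ratio can be worked out in a couple of lines of Python. The Reach values are the rough readings quoted above, not exact measurements.

```python
# Approximate average Reach, in percent, as read off the graphs.
defamer_reach = 0.045
malkin_reach = 0.022

ratio = defamer_reach / malkin_reach
print(f"Defamer has roughly {ratio:.1f}x the Reach of michellemalkin.com")
```

That comes out at about two to one, in line with the sitemeter visitor numbers.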

While these two hand-picked examples happen to look great, I could probably find others that don't look great, especially for sites that have less traffic than these. In other words, the Alexa graphs, while helpful and informative, are not always perfect. One look at the graphs above and you can see that they are only approximations. But, if used properly, Alexa can be a useful indicator of trends and, if you use some discretion, relative popularity.
