Tuesday, August 01, 2006

Distribution of Image Types

A few weeks ago, we released a graph showing how the distribution of Top Level Domains changes by site popularity. That got me thinking about other kinds of distribution graphs we could do based on data publicly available from Alexa. This week, we’ve got a graph charting the distribution of different image MIME types by image size. The data was collected using the search engine built into the Alexa Web Search Platform by asking it to count documents of a certain size and MIME type.

[Figure: Distribution of Image Types by Image Size]

As you can see, for images less than 2k, choosy webmasters choose GIF, the red-purple stripe. Between 2k and 8M, JPEGs (in blue) are dominant. Beyond 8M, the TIFF format (dark purple) starts to get its cut of the action. Other types like PNGs (yellow) and BMPs (cyan) exist in our crawl (there were even a few SHX files) but are never a majority of the files out there for any size category. (Because this comes from raw crawl data, the results automatically take the popularity of the containing site and the link structure of the web into account, since Alexa is more likely to crawl what we consider more important sites more often.)
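
For readers who want to build a similar chart from their own counts, here is a minimal sketch of the normalization step: turning raw document counts per size bucket and MIME type into each type's share of its bucket. The numbers below are made-up placeholders, not Alexa's data, and the bucket labels and function names (`shares_by_bucket`, `dominant_type`) are hypothetical. How the counts are fetched from the search engine is not shown.

```python
from collections import defaultdict

# Placeholder counts for illustration only (NOT real crawl data):
# (size bucket, MIME type) -> number of documents counted.
counts = {
    ("<2k", "image/gif"): 700,
    ("<2k", "image/jpeg"): 200,
    ("<2k", "image/png"): 100,
    ("2k-8M", "image/gif"): 150,
    ("2k-8M", "image/jpeg"): 800,
    ("2k-8M", "image/png"): 50,
}


def shares_by_bucket(counts):
    """Return {bucket: {mime: fraction of that bucket's documents}}."""
    # First pass: total number of documents in each size bucket.
    totals = defaultdict(int)
    for (bucket, _mime), n in counts.items():
        totals[bucket] += n
    # Second pass: divide each count by its bucket total.
    shares = defaultdict(dict)
    for (bucket, mime), n in counts.items():
        shares[bucket][mime] = n / totals[bucket]
    return dict(shares)


def dominant_type(shares):
    """Return {bucket: MIME type with the largest share in that bucket}."""
    return {bucket: max(by_mime, key=by_mime.get)
            for bucket, by_mime in shares.items()}


shares = shares_by_bucket(counts)
print(dominant_type(shares))
```

Each bucket's shares sum to 1, which is what lets stripes for very different bucket sizes sit side by side on one chart.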

There are two ways to look at this data. One is that the formats with high distribution on the light side are used more often for smaller images, and formats on the heavy side are used more often for larger images. The other is that formats on the light side are intrinsically smaller, due to better compression or a better encoding. This is certainly the case with PNGs and BMPs, both of which are lossless: BMPs appear closer to the right of the graph because they are generally used for images with larger dimensions and are not compressed as well as PNGs, if at all. The major difference between JPEGs and TIFFs, both of which are used for larger graphics, is that JPEGs are lossy and TIFFs are not, so TIFFs are normally larger than JPEGs of similar dimensions. In general, the two most popular formats we see are JPEG for large images and GIF for small graphics, with PNG squeezing in across the spectrum.

And all of this data was collected for free through the Collection Manager in the AWSP Portal! You can find out more in our Developer's Corner.
