How Search Engines Work: Everything You Need to Know to Understand Crawlers

7 minute read

Back in 1996, two Stanford PhD students theorized a new kind of search engine. Instead of ranking results based on how many times a keyword appeared on a webpage, Larry Page and Sergey Brin figured it would be better to rank results based on the relationships between pages. They called their idea “BackRub” because it ranked search results based on backlinks.

Search has come a long way since then. Page and Brin’s search engine, Google, now receives 5.5 billion searches a day, or about 63,000 searches per second. For every one of those queries, the search engine trawls through more than 130 trillion individual pages internet-wide and selects results in less than a second.

Behind those results lies a lot of groundwork. While Google and other search engines are notoriously secretive about the mechanisms behind search results, marketers benefit from knowing how search engines work. Understanding how search engines find, organize, and select results means you can better optimize your web pages to rank.

How Search Engines Work: The Basics

A “search engine” is a set of interlinked mechanisms that work together to identify pieces of web content (images, videos, website pages, and so on) based on the words you type into a search bar. Site owners use search engine optimization (SEO) to improve the chances that content on their site will show up in search results.

Search engines use three basic mechanisms:

  • Web crawlers: Bots that continually browse the web for new pages. Crawlers collect the information needed to index a page correctly and use hyperlinks to hop to other pages and index them too.
  • Search index: A record of the web pages crawlers have discovered, organized in a way that allows association between keyword terms and page content. Search engines also have ways of grading the quality of content in their indexes.
  • Search algorithms: Calculations that grade the quality of web pages, figure out how relevant that page is to a search term, and determine how the results are ranked based on quality and popularity.

Search engines try to deliver the most useful results for each user to keep large numbers of users coming back time and again. This makes business sense, as most search engines make money through advertising. Google made an impressive $116 billion from advertising in 2018, for example.

How Search Engines Crawl, Index, and Rank Content

Search engines look simple from the outside. You type in a keyword, and you get a list of relevant pages. But that deceptively easy interchange requires a lot of computational heavy lifting backstage.

The hard work starts way before you make a search. Search engines work round-the-clock, gathering information from the world’s websites and organizing that information so it’s easy to find. It’s a three-step process: crawling web pages, indexing them, and then ranking them with search algorithms.

Crawling

Search engines rely on crawlers — automated scripts — to scour the web for information. Crawlers start out with a list of websites. Algorithms — sets of computational rules — automatically decide which of these sites to crawl. The algorithms also dictate how many pages to crawl and how frequently.

Crawlers visit each site on the list systematically, following links found in attributes like href and src to jump to internal or external pages. Over time, the crawlers build an ever-expanding map of interlinked pages.
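
To make that loop concrete, here’s a minimal crawler sketch in Python. It is not how any real search engine’s crawler is built; the seed URL, page limit, and regex-based link extraction are simplifying assumptions, but it shows the basic pattern of fetching a page, pulling out its links, and queuing them to visit next.

```python
# Minimal illustrative crawler: fetch a page, extract href links, queue them.
# Not any search engine's real implementation; seed URL and limits are examples.
from collections import deque
from urllib.parse import urljoin
from urllib.request import urlopen
import re

def crawl(seed_url, max_pages=10):
    queue = deque([seed_url])      # frontier of URLs still to visit
    seen = {seed_url}              # avoid crawling the same page twice
    link_map = {}                  # page -> list of links found on it

    while queue and len(link_map) < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
        except Exception:
            continue  # skip pages that fail to fetch

        # Rough href extraction; real crawlers parse HTML properly.
        links = [urljoin(url, href) for href in re.findall(r'href="([^"#]+)"', html)]
        link_map[url] = links

        for link in links:
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)  # follow the link later, breadth-first

    return link_map

# Example with a hypothetical seed: crawl("https://example.com", max_pages=5)
```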

Takeaway for Marketers

Make sure your site is easily accessible to crawlers. If bots can’t crawl it, they can’t index it, and that means your site won’t appear in search results. You can help guarantee crawler accessibility by implementing the following:

  • Logical site hierarchy: Define a logical site architecture that flows from domain to category to subcategory. This lets crawlers move through your site more quickly, allowing the site to stay within its crawl budget.
  • Links: Use internal links on every page. Crawlers need links to move between pages. Pages without any links are un-crawlable and therefore un-indexable.
  • XML sitemap: Make a list of all your website’s pages, including blog posts. This list acts as an instruction manual for crawlers, telling them which pages to crawl. There are plugins and tools, like Yoast and Google XML Sitemaps, that will generate and update your sitemap when you publish new content. (A minimal example sitemap follows this list.)
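
For reference, a bare-bones sitemap following the sitemaps.org format looks something like the snippet below. The URLs and dates are placeholders; a plugin will generate and maintain the real file for you.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2019-06-01</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/blog/how-search-engines-work</loc>
    <lastmod>2019-06-15</lastmod>
  </url>
</urlset>
```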

If you’re not sure whether your site is accessible to crawlers, check out our Site Audit tool. The tool catches accessibility issues and gives advice on how to fix them. It also sends a fresh technical SEO report for your site every two weeks, so you can stay on top of your site’s visibility to crawlers.


Alexa’s Site Audit tool identifies pages without links, allowing you to improve crawler accessibility.

Indexing

After finding a page, a bot fetches (or renders) it much as your browser does. That means the bot should “see” what you see, including images, videos, and other types of dynamic page content.

The bot organizes this content into categories, including images, CSS and HTML, text and keywords, etc. This process allows the crawler to “understand” what’s on the page, a necessary precursor to deciding for which keyword searches the page is relevant.

Search engines then store this information in an index, a giant database with a catalog entry for every word seen on every webpage indexed. Google’s index, built with its Caffeine indexing system, takes up around 100 million gigabytes and fills “server farms,” thousands of computers around the globe that never get turned off.
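
One way to picture an index like this is as an inverted index: a map from each word to the pages that contain it. The toy Python sketch below uses made-up pages and ignores everything a real index also stores (word positions, link data, quality signals), but it shows why lookups are fast.

```python
# Toy inverted index: maps each word to the set of pages containing it.
# The sample pages are invented; real search indexes store far more per entry.
from collections import defaultdict

pages = {
    "example.com/sushi-guide": "the best sushi restaurants and japanese food",
    "example.com/pizza-101":   "how to make frozen cheese pizza at home",
}

index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)

# Answering a query becomes a fast set intersection instead of a page-by-page scan.
query = ["sushi", "restaurants"]
matches = set.intersection(*(index[w] for w in query))
print(matches)  # {'example.com/sushi-guide'}
```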

Takeaway for Marketers

Make sure crawlers “see” your site the way you want them to, and control which parts of the site you allow them to index.

  • URL Inspection Tool: If you want to know what crawlers see when they land on your site, use the URL Inspection Tool in Google Search Console. You can also use the tool to find out why crawlers aren’t indexing a page or to request that Google crawl it.
  • Robots.txt: You won’t want crawlers to show every page of your site in SERPs; author pages or pagination pages, for example, can be excluded from indexes. Use a robots.txt file to control access by telling bots which pages they can crawl.

Blocking crawlers from certain workaday areas of your site won’t affect your search rankings. Rather, it helps crawlers focus their crawl budget on your most important pages.
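
As an illustration, a simple robots.txt file served from the root of your domain might look like this. The paths are placeholders; which areas you block depends entirely on your own site.

```
# Example robots.txt (placeholder paths), served at https://www.example.com/robots.txt
User-agent: *
Disallow: /wp-admin/     # keep bots out of back-office pages
Disallow: /search/       # internal search result pages add no value in SERPs

Sitemap: https://www.example.com/sitemap.xml
```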

Ranking

In the final step, search engines sort through indexed information and return the right results for each query. They do that with search algorithms, rules that analyze what a searcher is looking for and which results best answer the query.

Algorithms use numerous factors to define the quality of the pages in their index. Google leverages a whole series of algorithms to rank relevant results. Many of the ranking factors used in these algorithms analyze the general popularity of a piece of content and even the qualitative experience users have when they land on the page. These factors include:

  • Backlink quality
  • Mobile-friendliness
  • “Freshness,” or how recently content was updated
  • Engagement
  • Page speed

To make sure the algorithms are doing their job properly, Google uses human Search Quality Raters to test and refine them. This is one of the few times when humans, not programs, are involved in how search engines work.
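
To picture how several factors can collapse into a single ordering, here’s a deliberately simplified Python sketch. The factor names, weights, and per-page scores are invented for illustration; real ranking algorithms use far more signals and are not public.

```python
# Illustrative only: combine several quality signals into one ranking score.
# The weights and per-page scores are invented; real algorithms are not public.
WEIGHTS = {
    "relevance": 0.35,
    "backlink_quality": 0.25,
    "freshness": 0.15,
    "page_speed": 0.15,
    "engagement": 0.10,
}

def rank(pages):
    """Sort candidate pages by a weighted sum of their factor scores (0-1 each)."""
    def score(factors):
        return sum(WEIGHTS[name] * factors.get(name, 0.0) for name in WEIGHTS)
    return sorted(pages.items(), key=lambda item: score(item[1]), reverse=True)

candidates = {
    "example.com/fresh-fast-page": {"relevance": 0.9, "backlink_quality": 0.6,
                                    "freshness": 0.9, "page_speed": 0.8, "engagement": 0.7},
    "example.com/stale-slow-page": {"relevance": 0.9, "backlink_quality": 0.7,
                                    "freshness": 0.2, "page_speed": 0.4, "engagement": 0.5},
}

for url, factors in rank(candidates):
    print(url)  # the fresher, faster page ranks first despite similar relevance
```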

Takeaway for Marketers

Search engines want to show the most relevant, usable results. This keeps searchers happy and ad revenue rolling in. That’s why most search engines’ ranking factors are actually the same factors human searchers judge content by, such as page speed, freshness, and links to other helpful content.

When designing and refreshing websites, optimize page speed, readability, and keyword density to send positive ranking signals to search engines. Working to improve engagement metrics like time-on-page and bounce rate can also help boost rankings.


Find out more about how to rank on Google.

What Happens When a Search Is Performed?

Now we know about the three-step process search engines use to return relevant results. Crawling, indexing, and ranking allow search engines to find and organize information. But how does that help them answer your search query?

Let’s walk through how search engines answer queries step-by-step, from the moment you type a term in the search bar.

Step 1: Search Engines Parse Intent

To return relevant results, search engines have to “understand” the search intent behind a term. They use sophisticated language models to do that, breaking down your query into chunks of keywords and parsing meaning.

For example, Google’s synonym system allows the search engine to recognize when groups of words mean the same thing. So when you type in “dark colored dresses,” search engines will return results for black dresses as well as dark tones. The engine understands that dark is often synonymous with black.


Search results for “dark colored dress” pull up synonymous results as well.

Search engines also use keywords to understand broad “categories” of search intent. If you searched for “buy dark colored dress,” for example, the term “buy” would signal to search engines that they should pull up product pages to match a shopping searcher’s intent.
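
A rough way to picture synonym handling is query expansion: each term in the query is rewritten as a set of acceptable substitutes before pages are matched. The tiny hand-written synonym table below is invented; real systems learn these relationships from large language models rather than lists.

```python
# Toy query expansion: each query word becomes a set of acceptable synonyms.
# The synonym table is invented for illustration only.
SYNONYMS = {
    "dark": {"dark", "black"},
    "colored": {"colored", "coloured"},
    "dress": {"dress", "dresses"},
}

def expand(query):
    """Turn 'dark colored dress' into a list of synonym sets, one per term."""
    return [SYNONYMS.get(word, {word}) for word in query.lower().split()]

def matches(page_text, expanded_query):
    """A page matches if, for every term, at least one synonym appears in it."""
    words = set(page_text.lower().split())
    return all(term_set & words for term_set in expanded_query)

query = expand("dark colored dress")
print(matches("black colored evening dress under $50", query))  # True: black ~ dark
print(matches("bright yellow summer dresses", query))           # False: no dark/black
```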

Find out how to optimize for semantic search.

Search engines also use “freshness” algorithms to understand searcher intent. These algorithms identify trending keywords and return newer pages. You’ll see this for terms such as “election results,” which return radically different SERPs during election season than at other times.

Step 2: Search Engines Match Pages to Query Intent

Once the search engine understands what kind of result you want to see, it needs to find matching pages. A series of factors help the search engine decide which pages are best, including:

  • Title/content relevance
  • Types of content
  • Content quality
  • Site quality and freshness
  • Page popularity
  • Language of query

So if you search “best places to eat sushi,” search engines will match list pages with “sushi” or synonyms (e.g., “Japanese food”) in the title and body content. They’ll order these results based on popularity, freshness, and quality factors.

Depending on the search intent, search engines may also show enriched results such as the knowledge graph or image carousel.

Step 3: Search Engines Apply ‘Localized’ Factors

A number of individual factors come into play when search engines decide which results you see. You may see different results for “best frozen cheese pizza” than a friend who lives in another state, thanks to a combination of personalized factors (sketched briefly after the list below):

  • Location: Some searches, like “restaurants near me,” are obviously location-dependent. But Google will rank results for local factors even in non-location-specific searches. A search for “football” will likely show pages about the Steelers to someone in Pittsburgh and pages about the 49ers to someone in San Francisco.
  • Search settings: Search settings are also an important indicator of which results you’re likely to find useful, such as if you set a preferred language or opted into SafeSearch (a tool that helps filter out explicit results).
  • Search history: A user’s search history also influences the results they see. For example, search for the term “hemingway,” and you’ll see results for both the writer and the editing app. Click on a few results about the writer, and run a search for “hemingway” again. This time, you’ll see a greater number of results about the writer than the app.
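
To see how these individual signals can reshuffle otherwise identical results, here’s a deliberately simplified sketch. The pages, locations, and boost sizes are invented; real personalization is far more nuanced.

```python
# Toy personalization step: nudge scores for pages that match the searcher's
# location or past clicks. Page names, locations, and boost sizes are invented.
def personalize(ranked_pages, user_location, clicked_topics):
    adjusted = []
    for url, score, page_location, topic in ranked_pages:
        if page_location == user_location:
            score += 0.10          # small boost for locally relevant pages
        if topic in clicked_topics:
            score += 0.05          # small boost for topics the user clicks often
        adjusted.append((url, score))
    return sorted(adjusted, key=lambda item: item[1], reverse=True)

base_results = [
    ("example.com/steelers-news", 0.70, "pittsburgh", "football"),
    ("example.com/49ers-news",    0.70, "san-francisco", "football"),
]
print(personalize(base_results, "pittsburgh", {"football"}))
# The Steelers page edges ahead for a searcher in Pittsburgh.
```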

Takeaway for Marketers

Search results are highly specific and dynamic. It’s impossible to predict when and how your site will appear to each individual searcher. The best approach is to send strong relevance signals to search engines through keyword research, technical SEO, and content strategy. That way, you’ll show up in SERPs that are truly relevant to your content.

Use This Knowledge To Boost Results

Once you know how search engines work, it’s easier to create websites that are crawlable and indexable. Sending the right signals to search engines improves the odds that your pages appear in results pages relevant to your business. Serving searchers, and search engines, the content they want is a step along the path to a successful online business.

Sign up for a free trial of Alexa’s Advanced Plan to get the Site Audit tools you need to make sure your content is in good condition for crawlers. Plus, you get access to comprehensive reports that identify technical and on-page optimization opportunities you may have missed.

Google and the Google logo are registered trademarks of Google LLC, used with permission.
