
How Google Works

Google begins by crawling the web.

Because there is no centralised registry of every web resource in the world, Google must explore the entire web on a regular basis. To do this, it uses automated software known as a web crawler; Google’s crawler is called Googlebot.

Googlebot roams the Internet on a regular basis, looking for new or recently updated webpages. This is referred to as crawling. It is usually done in one of two ways.

First, Googlebot returns to the pages it discovered during previous crawls. It then follows either all of the links found there or the XML sitemap, if one has been submitted. All newly discovered pages are then added to the list of pages to be crawled later.

Second, Googlebot crawls the pages that site owners submit through Google Search Console. The crawler receives another batch of webpages to add to its crawling queue.
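
At its core, discovery is a queue of URLs to visit: start from pages already known (or submitted in a sitemap), follow their links, and add every new URL to the queue. The Python sketch below is a deliberately tiny illustration of that idea using a hypothetical in-memory link graph; it is not how Googlebot is actually implemented.

```python
from collections import deque

# Toy link graph standing in for the web: page URL -> links found on that page (hypothetical data).
link_graph = {
    "https://example.com/": ["https://example.com/blog", "https://example.com/about"],
    "https://example.com/blog": ["https://example.com/blog/post-1"],
    "https://example.com/about": [],
    "https://example.com/blog/post-1": ["https://example.com/"],
}

# Seeds come from pages known from earlier crawls plus URLs submitted via a sitemap.
previously_crawled = ["https://example.com/"]
sitemap_urls = ["https://example.com/blog/post-1"]

def build_crawl_queue(seeds):
    """Breadth-first discovery: follow links and queue every newly found URL exactly once."""
    queue = deque(seeds)
    discovered = set(seeds)
    crawl_order = []
    while queue:
        url = queue.popleft()
        crawl_order.append(url)
        for link in link_graph.get(url, []):
            if link not in discovered:
                discovered.add(link)
                queue.append(link)
    return crawl_order

print(build_crawl_queue(previously_crawled + sitemap_urls))
```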

Normally, Googlebot will crawl all new pages that it discovers. However, a page will not be crawled if any of the following conditions are met:

  • Crawling is prohibited by the site’s robots.txt file (a quick way to check this is sketched after this list)
  • The page cannot be accessed by an anonymous user, for example, pages behind a login
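
To see how a robots.txt rule works in practice, here is a minimal Python sketch using the standard library’s urllib.robotparser. The robots.txt content and URLs are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that blocks Googlebot from the /private/ section.
robots_txt = """
User-agent: Googlebot
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/blog/"))      # True - crawlable
print(parser.can_fetch("Googlebot", "https://example.com/private/x"))  # False - blocked by robots.txt
```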

If a page is a duplicate of another, Googlebot will visit it less frequently in order to improve crawling efficiency.

In addition to discovering new web pages, the crawling stage includes rendering (visualising) each newly discovered page. Googlebot loads the page’s HTML, third-party code, JavaScript, and CSS using a recent version of the Chromium browser.
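
Googlebot’s renderer is not publicly available, but the same idea can be reproduced with any headless Chromium automation tool. The sketch below uses the open-source Playwright library (my choice, purely for illustration) to load a page, execute its JavaScript, and read back the rendered HTML.

```python
# pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright

def render(url: str) -> str:
    """Load a page in headless Chromium, run its JavaScript, and return the rendered HTML."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # wait for scripts and network requests to settle
        html = page.content()                     # the DOM after JavaScript has executed
        browser.close()
        return html

print(len(render("https://example.com/")))  # hypothetical URL
```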

The pages are then added to Google’s index.

When Googlebot discovers a new page, it attempts to determine what the page is about. Indexing is the term for this procedure. It includes a thorough examination of all page elements, including text content, meta tags and attributes, images, and videos, among others.

In general, all newly discovered and crawled pages are then indexed. The only exception is if the noindex directive appears in the page’s robots meta tag or in an X-Robots-Tag HTTP header. In that case, Googlebot will not index the page.
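
As a rough illustration, the sketch below checks both places a noindex directive can live: the X-Robots-Tag response header and the robots meta tag. It uses the third-party requests library and a simplified regular expression, so treat it as an approximation rather than a complete parser.

```python
import re
import requests  # third-party HTTP library (pip install requests)

def is_indexable(url: str) -> bool:
    """Rough check for the two places a noindex directive can appear:
    the X-Robots-Tag response header and the robots meta tag in the HTML."""
    resp = requests.get(url, timeout=10)
    if "noindex" in resp.headers.get("X-Robots-Tag", "").lower():
        return False
    # Simplified: assumes the name attribute comes before content; a real parser would do better.
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
        resp.text,
        flags=re.IGNORECASE,
    )
    if meta and "noindex" in meta.group(1).lower():
        return False
    return True

print(is_indexable("https://example.com/"))  # hypothetical URL
```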

When indexing is complete, the crawler catalogues the page in the Google index – Google Search’s database. At present, the Google index contains hundreds of billions of webpages.

This new page will be served to searchers once it has been indexed.

When Google receives a query, it returns search results.

When a user types a query into the Search box, Google consults its index to find and serve the most relevant results. The procedure is known as “serving,” and it consists of eight steps.

1. Establishing context and narrowing the index

By the time you submit your search request, Google has already factored in a few things that help it narrow down the index and filter out irrelevant results.

Here’s what Google looks for before you hit the Enter key:

  • Google uses your location to deliver content that is relevant to where you are. As a result, when you search for a vegan cafe nearby, you’ll see a Local pack (a map with three local businesses listed) even if you don’t specify a location.
  • Google examines the query’s language. If you search in German, all search results will be in German, regardless of your location or the preferred language set in Search settings.
  • Google looks at the type of device you’re using. If you’re using a phone, Google will prioritise mobile-friendly pages. Furthermore, this determines which SERP features you will see. For example, featured snippets and ads are more frequently returned on desktop, whereas some other features are exclusive to mobile search.
  • Google honours your search preferences. When you enable SafeSearch filtering, Google will not show you explicit search results. Similarly, if you select Show personal results, you will receive personalised answers and recommendations based on the data in your Google account.

2. Determining the query’s meaning and intent

After you’ve submitted your search query to Google, it must determine the true meaning of your query. Users do not always know how to spell something correctly or phrase a query in the way that webmasters do.

The first thing Google does is recognise new words and correct spelling errors. Google employs natural language understanding models to decipher unknown words, typos, and conceptual errors. This is accomplished primarily by examining the entire query rather than focusing on a single word.
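
Google’s spelling systems rely on large language models, but the basic idea of mapping a misspelling to a known word can be illustrated with simple string similarity. The tiny vocabulary below is a hypothetical stand-in, used only to show the concept.

```python
import difflib

# Tiny vocabulary standing in for a language model's knowledge of real words (hypothetical).
vocabulary = ["restaurant", "reservation", "recipe", "review"]

query_term = "restaraunt"  # a common misspelling
suggestion = difflib.get_close_matches(query_term, vocabulary, n=1, cutoff=0.8)
print(suggestion)  # ['restaurant']
```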

The query’s meaning and intent are then determined by Google. Previously, Google would match words in queries to words on pages without understanding the meaning of the words. Everything changed in 2013 with the introduction of the Hummingbird algorithm. That’s when Google entered a new era of semantic search, focusing on understanding the meaning of the query rather than individual keywords. This update foreshadowed the Artificial Intelligence systems that became the most significant breakthrough in natural language processing.

3. Determining whether the query requires fresh content

Once Google understands the meaning and intent of your search query, it checks to see if you’re looking for something that necessitates the most recent and up-to-date information (news, politics, events, etc.).

Google applies the Query Deserves Freshness (QDF) mathematical model to your query to determine whether you are looking for current information. The model treats a topic as hot if news sites and blogs are actively discussing it, or if the number of searches on that topic suddenly increases. When Google determines that you want the most up-to-date information on such a topic, it rewards fresh content with higher rankings.

For example, if you search for “Prince Harry and Meghan,” you’re probably expecting to find some recent information about them. As a result, Google displays Top Stories with the latest news about the couple at the top of the SERP.
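
The actual QDF model is not public, so the following is only a toy illustration of the spike-detection idea: compare a topic’s recent mention volume with its longer-term baseline, and treat a large jump as a signal that fresh results deserve a boost. All numbers are made up.

```python
# Hypothetical daily mention counts for a topic (e.g. news articles and blog posts per day).
# A topic is treated as "hot" when recent activity spikes well above its longer-term baseline.
def freshness_score(daily_mentions, recent_days=3):
    """Toy QDF-style signal: ratio of recent activity to the earlier average."""
    recent = daily_mentions[-recent_days:]
    baseline = daily_mentions[:-recent_days] or [1]
    return (sum(recent) / len(recent)) / max(sum(baseline) / len(baseline), 1)

quiet_topic = [20, 22, 19, 21, 20, 23, 21]        # steady interest: evergreen results are fine
breaking_topic = [20, 22, 19, 21, 180, 240, 310]  # sudden spike: reward fresh content

print(round(freshness_score(quiet_topic), 2))     # ~1.0
print(round(freshness_score(breaking_topic), 2))  # ~11.9
```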

4. Determining whether the query is about Your Money or Your Life

Along with the QDF check, Google examines your query to see if it is one for which returning untrustworthy content would be unacceptable. Such queries and pages are referred to as Your Money or Your Life (YMYL). Typically, these cover health, safety, financial, and similar topics.

With the Medic update, Google can now identify Your Money or Your Life queries and match them to the appropriate content. If Google determines that the query calls for YMYL content, it assesses the expertise, authority, and trustworthiness (E-A-T) of the relevant pages, their creators, and the websites in general. Pages with a higher E-A-T score will ultimately be ranked higher.

For example, if you search for “stock exchange,” the first SERP will primarily include highly trusted pages such as Nasdaq, the London Stock Exchange, the New York Stock Exchange, and so on.

5. Creating a visual representation of the SERP

The SERP may look different depending on the type of query you enter. For example, in addition to ten blue links, it may display a slew of advertisements, Knowledge Graph results, a map, and so on.

As a result, before Google returns its final SERP, it determines what type of search results will be most appropriate. As practice has shown, the SERP structure is heavily influenced by search intent.

There is also a discernible difference in how Google selects which SERP features to display for mobile and desktop searches.

For example, mobile SERPs have the following distinguishing features: Broaden and refine your search (predictive features), the Knowledge Panel with the View in 3D feature, Short Videos, and Web Stories.

Meanwhile, some features, such as ads and featured snippets, appear more frequently on desktops.

The logic behind this distinction lies in how we use these two types of devices. We have more time to study text content when we are at a desktop. By contrast, when we use our phones, we expect to find information as quickly as possible. As a result, Google “equips” the mobile SERP with additional predictive and visual features.

6. Selecting the most pertinent pages for each type of search result

Once Google has grasped the concepts in the query and on the candidate pages, it looks at how well the information on each page corresponds to the search query. Google analyses text, images, and videos, as well as meta elements such as the title, meta description, and alt attributes, to determine the relevance of content.

Pages that are more relevant, i.e. those that best meet the needs of the user, will be ranked higher. However, keep in mind that, while content relevance is important, it is not the only ranking factor. It is the combination of many factors that can ensure high rankings on the SERP.
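
Modern relevance scoring is far more sophisticated than keyword matching, but a classic TF-IDF and cosine-similarity comparison gives a feel for how the text on a page can be scored against a query. The page texts below are hypothetical and the scoring is a toy, not Google’s actual method.

```python
import math
from collections import Counter

def tfidf_vectors(texts):
    """Build toy TF-IDF vectors for a small collection of texts."""
    tokenised = [text.lower().split() for text in texts]
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    n = len(texts)
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({t: (tf[t] / len(tokens)) * math.log(n / doc_freq[t]) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

pages = [
    "vegan cafe menu with oat milk coffee and pastries",   # hypothetical page copy
    "car repair garage brake pads tyres and servicing",
]
query = "best vegan coffee near me"

vectors = tfidf_vectors(pages + [query])
query_vector = vectors[-1]
for page, vector in zip(pages, vectors[:-1]):
    print(round(cosine(query_vector, vector), 3), "-", page)  # the vegan cafe page scores higher
```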

7. Balancing the relevance and importance of pages

Google prioritises pages with the most trustworthy and high-quality content. At this stage, it tries to strike the right balance of information relevance and authority.

The first thing Google does for this purpose is evaluate the quality of the page’s content. To do so, it identifies the signals that demonstrate expertise, authority, and trustworthiness on a specific topic. This procedure entails the following steps:

  • PageRank estimation. Google looks to see if other well-known websites link to or refer to the content of the given page. The number of such links also matters: the more backlinks the page receives from high-quality sites, the more likely it is to rank at the top. (A minimal sketch of the PageRank idea appears after this list.)
  • The anti-spam algorithm detects any spam or other deceptive or manipulative behaviour. Of course, anything that violates Google Guidelines will not be ranked highly.
  • Checking to see if a website is secure. HTTPS is regarded as the gold standard by Google because it provides encryption, data integrity, and authentication. The page is rewarded if it provides a secure user experience.
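
PageRank itself is well documented: a page’s score depends on the scores of the pages linking to it. Below is a minimal power-iteration sketch over a made-up three-site link graph; real link graphs, and the production signals layered on top of them, are vastly larger and more nuanced.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration PageRank over a tiny link graph (illustration only)."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank evenly across all pages
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

# Hypothetical link graph: site-a is linked to by both of the other sites.
links = {
    "site-a": ["site-b"],
    "site-b": ["site-a"],
    "site-c": ["site-a", "site-b"],
}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```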

And, because Google prioritises user experience, it also looks to see if the page is simple to navigate and use – the page’s usability. The procedure is also quite complicated, and it entails the following steps:

  • Examining the page for distracting interstitials. If there are popups that prevent users from viewing the main content, the page will not rank well.
  • Checking to see if the site is optimised for all device types. Web content should be equally easy to consume on mobile, tablet, and desktop devices.
  • Considering the site’s Core Web Vitals. Loading time, interactivity, and visual stability all influence how engaged your visitors are and how kindly Google treats your content (the “good” thresholds are sketched in the code below).

Pages that provide both quality and usability, obviously, rank higher in search results.
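
Unlike most ranking signals, the Core Web Vitals thresholds are published: Largest Contentful Paint within 2.5 seconds, First Input Delay within 100 ms, and Cumulative Layout Shift within 0.1 count as “good”. The helper below simply encodes those thresholds; the measurements fed into it are hypothetical.

```python
# Google's published "good" thresholds for the Core Web Vitals (as of the 2021 rollout).
GOOD_THRESHOLDS = {
    "lcp_seconds": 2.5,   # Largest Contentful Paint
    "fid_ms": 100,        # First Input Delay
    "cls": 0.1,           # Cumulative Layout Shift
}

def passes_core_web_vitals(lcp_seconds: float, fid_ms: float, cls: float) -> bool:
    """True when every metric is within the 'good' range."""
    return (lcp_seconds <= GOOD_THRESHOLDS["lcp_seconds"]
            and fid_ms <= GOOD_THRESHOLDS["fid_ms"]
            and cls <= GOOD_THRESHOLDS["cls"])

# Hypothetical field measurements for one page
print(passes_core_web_vitals(lcp_seconds=2.1, fid_ms=80, cls=0.05))   # True
print(passes_core_web_vitals(lcp_seconds=4.2, fid_ms=80, cls=0.05))   # False - slow LCP
```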

8. Displaying the results to users

When your query has been thoroughly examined from all angles and the AI algorithms have completed their work, Google returns the most relevant search results. The entire process takes only a fraction of a second.

Fun fact #1: The time you’ve spent reading this guide up to this point has been enough for Google to process 38 million queries.

Fun fact #2: You might think you’ve cracked the Google algorithm. But it’s too soon to pop the cork – the algorithm could change tomorrow.

Google’s algorithm is constantly being improved.

Google does not manually change specific search results to improve search. Rather, it constantly changes and adapts its algorithms. In 2020 alone, Google made approximately 4,500 improvements to Search. That works out to about 12 changes per day – we can say Google is a hard worker.

Below, I attempted to break down Google’s efforts in this regard.

1. Combating webspam

Fighting spam is a pain in the neck for Google. The company reported that in 2020 alone it discovered approximately 40 billion spammy pages every day.

Spam, according to Google, is anything that deceives users and violates Google’s Quality Guidelines. It includes the following:

  • Automatically generated content
  • Sneaky redirects
  • Link schemes
  • Thin content
  • Paid links
  • Cloaking
  • Hidden text and links
  • Doorway pages
  • Scraped content
  • Pure affiliate sites
  • Irrelevant keywords
  • Automated queries
  • User-generated spam

In reality, spam removal is a multistep process that includes both Google AI algorithms and manual review by the spam removal team.

A large portion of spammy webpages is filtered out between the crawling and indexing stages. The rest that slips through is caught by filters later on, during the ranking and serving stages.

Despite the perfection of today’s anti-spam algorithms, some webpages continue to appear in SERPs. This is when Google’s spam removal team enters the picture. They investigate spam reports submitted by searchers and take manual action against sites that violate Google’s policies. As a result, spammy websites are lowered in ranking or even removed from search results.

Don’t be alarmed if Google takes manual action against you. First, you’ll receive a notification in Search Console. Then, it’s critical to address any issues that may have contributed to it. Once everything is in order, your site is likely to regain its rankings.

2. Testing the algorithm

Without tests and experiments, it is impossible to perfect search. Each new idea that Google comes up with is rigorously tested before it is released.

So, to improve search quality, Google collaborates with Search Quality Raters, a group of independent reviewers from around the world. The raters evaluate how effective the search is and whether the results provided satisfy the user’s search intent. Furthermore, they assess the quality of search results based on the content’s Expertise, Authoritativeness, and Trustworthiness. They do all of this while strictly adhering to the Quality Rater Guidelines.

Aside from search quality tests, Google also conducts side-by-side experiments with the assistance of Quality Raters. Google shows Raters two sets of search results: one with and one without the proposed change. The Raters are then asked which results they prefer and why.

The ratings provided by Quality Raters have no direct impact on a page’s rankings. Instead, this data is aggregated to assist Google in determining how well their search algorithms perform.

Furthermore, Google conducts live traffic experiments to observe how real people interact with a feature under development. It enables the feature for a small group of users before comparing the results to those of a control group. If the outcome is not satisfactory, the feature is not approved for further integration.

To round out the picture, let’s look at the most recent Google updates.

3. The most recent developments

Google updates can be divided into two categories.

The first category contains minor updates. They usually go unnoticed by searchers and cause minor ranking fluctuations for SEOs. Google usually does not provide any information about such changes.

The second category includes Google’s major (core) algorithm updates, which are of particular interest because they can drastically alter the game for both users and SEOs. I’ve compiled a list of some of the most notable updates over the last seven years.

Serving of high-quality content:

  • Medic update (August 2018). This algorithm was developed to improve the detection of expertise, authority, and trustworthiness in web content. This is done in order to push the YMYL pages with the highest E-A-T scores to the top of search results.
  • Passage ranking update (February 2021). Google can use it to assess the relevance of a specific passage rather than the entire page, and rank it individually. Even needle-in-a-haystack information buried deep within a long page can now surface in search results.
  • Search spam updates (2021). The updates targeted content that violated Google’s webmaster guidelines and were intended to more effectively combat spam in web and image results.
  • Link spam update (July 2021). Because of it, Google can detect and eliminate link spam in a broader range of languages. The effectiveness of deceptive link building techniques was significantly reduced as a result.
  • Product reviews update (2021, 2022). Google can use it to identify and effectively reward high-quality product reviews with higher rankings. Google now provides even more useful and valuable information to users.

Understanding natural language and search intent:

  • The RankBrain algorithm (October 2015). This is the first machine-learning algorithm capable of processing previously unseen search queries and intelligently matching them to relevant pages.
  • BERT (October 2019). The implementation of this NLP algorithm altered how Google understands words in queries. Because of it, Google can understand even the smallest nuances in context and thus effectively match queries to the appropriate results.
  • MUM, or Multitask Unified Model (June 2021). This new algorithm outperforms BERT by orders of magnitude. MUM is capable of comprehending complex questions and information of all types (photo, video) in multiple languages. Thanks to MUM, Google will learn to answer users’ questions like a real expert. Because the update is new, it will take some time for us to realise its full potential.

Providing an exceptional user experience:

  • Updates for mobile devices (2015, 2016). They were designed to increase the visibility of mobile-friendly pages in mobile search results. Users can now easily find relevant results that are readable without the need for zooming or horizontal scrolling. The updates also made mobile friendliness a ranking signal for mobile search.
  • Accelerated Mobile Pages (AMP) framework (2016). This open-source project was created to help mobile pages load much faster, but it has since been extended to desktop sites, emails, ads, and so on. It loads page content even before the user visits the page.
  • Mobile-first indexing (2019). This is the next stage of the mobile-friendly updates – Google now not only rewards mobile-friendly pages with high rankings, but also crawls, indexes, and ranks the mobile version of a site first.
  • Page experience updates (2021, 2022). Core Web Vitals (Largest Contentful Paint, First Input Delay, and Cumulative Layout Shift) were added by Google as page experience signals for both mobile and desktop searches. Thus, in order to rank pages, Google now considers whether they load quickly, are mobile-friendly, use HTTPS, have no intrusive ads, and do not shift content as they load.

Conclusion

No matter how hard the global SEO community tries, the Google Search algorithm will always be shrouded in mystery. The reason for this is that Google wants to prevent third-party manipulation of search results, so it only reveals a fraction of how it really works.

I hope my article lifted the veil of secrecy and helped you understand the fundamentals of Google and its algorithm.
