
In May 2024, leaked Google API documentation sent SEO pros scrambling to decipher the notoriously secretive inner workings of the Google algorithm.
The leak detailed more than 14,000 attributes (potential ranking signals) across 2,500 pages of internal documentation, spanning everything from the influence of click data to link value and content quality, sparking intrigue and excitement among those looking to optimise websites to boost search engine rankings.
How much should we trust the leak?
Google spokesperson Davis Thompson confirmed in an email to The Verge that the leak was legitimate, but urged marketers to proceed carefully, stating:
“We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information…”
We still don’t know how Google prioritises the factors mentioned in the documents, how much weight is given to each one, when the documentation was created, or its context in the current algorithm.
While we don’t know the exact context of the documents included in the data leak, they still provide a fascinating (and unprecedented) insight into the algorithm of the largest search engine, which generally keeps its ranking signals a closely guarded secret. Much of the documentation confirms long-suspected ranking signals, while some points are at odds with official Google guidelines.
Key findings of the SEO algorithm leak
We’ve picked out some of the most significant takeaways from the leak, and outlined their potential impact here.
Many of the points below are best practice in any digital marketing strategy regardless of the documentation leak, but it is interesting to have many of these confirmed, especially when previous guidelines said otherwise, or remained unconfirmed.
This is by no means an exhaustive list, but it provides a summary of the key points, as chosen by our SEO team.
Click data matters
Despite Google previously denying that click data is a ranking factor, the documents confirm the use of ‘NavBoost’, a system that draws on click logs of user behaviour as ranking signals.
In a Reddit Ask Me Anything, Gary Illyes, an analyst on the Google Search Team, was asked to confirm whether UX signals such as dwell time and bounce rate were used as ranking signals. He responded ‘those are generally made up crap’, apparently quashing UX signal theories from other prominent SEO experts and digital marketers.
As click data is used to measure dwell time (the amount of time a user spends on a page before clicking back to the search results), dissatisfaction with a page is likely measured through clicks. A high bounce rate and low dwell time are strong indicators of mismatched search intent, a poor user experience, or both.
For example, if a user clicks through to a site and has to scroll through reams of content or irrelevant information to find what they’re looking for, they’re likely to leave within seconds and head back to another site to find what they need. The same goes for being bombarded by pop-ups, or content failing to load quickly enough.
Both a higher dwell time and a higher ‘time on page’ (how long a user spends on your site) are measures of success in SEO and wider digital marketing, as they suggest the content has engaged the user, who is then more likely to convert.
In light of the leak, it is interesting to note that elements such as videos, images and on-page tools can have a positive knock-on effect on ranking performance, reinforcing the importance of optimising for user experience.
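The leaked documents don’t spell out how NavBoost actually aggregates this behaviour, so the following is purely an illustrative sketch, with made-up class and field names, of how dwell time and ‘pogo-sticking’ could in principle be derived from a click log:

```python
# Illustrative only: the leaked documents do not describe how NavBoost
# aggregates click behaviour. All class and field names here are made up.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Click:
    url: str                      # the search result the user clicked
    clicked_at: float             # seconds since the search began
    returned_at: Optional[float]  # when they came back to the SERP (None = never)

def dwell_times(clicks, short_click_threshold: float = 30.0):
    """Estimate dwell time per click and flag likely 'pogo-sticking'."""
    summary = []
    for click in clicks:
        if click.returned_at is None:
            dwell = None    # the user never bounced back: a satisfying 'last click'
            short = False
        else:
            dwell = click.returned_at - click.clicked_at
            short = dwell < short_click_threshold
        summary.append({"url": click.url, "dwell_seconds": dwell, "short_click": short})
    return summary

# The second result is abandoned after 8 seconds (a likely intent mismatch),
# while the third satisfies the query and the user never returns to the SERP.
log = [
    Click("https://example.com/a", 2.0, 40.0),
    Click("https://example.com/b", 45.0, 53.0),
    Click("https://example.com/c", 60.0, None),
]
print(dwell_times(log))
```

The threshold here is arbitrary; the point is simply that a quick return to the results page is a measurable, machine-readable signal of dissatisfaction.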
Google uses Chrome data
The leak also appears to confirm that Google uses Chrome data to monitor how users interact with a site, something else Google previously denied, most recently via Google Search Advocate John Mueller in a Google Hangout in January 2022.
It is suggested that the ranking signal listed as ‘chrome_trans_clicks’ in the leaked documents not only draws its click data from Chrome but also uses it to identify a site’s most popular pages and generate the sitelinks SERP feature.
The fact that Google uses its own Chrome data to measure user satisfaction on the onward journey beyond the search results page is a significant step up from the level of data we previously knew it to use.
Content is scored for originality
The ‘OriginalContentScore’ attribute suggests that content is scored for its originality, which makes sense following the Helpful Content Update rollout in 2022. In a bid to reward people-first content and combat low-quality AI content, Google recently reinforced the importance of quality content by folding the Helpful Content System into its March 2024 core update.
Google’s search update blog stated: ‘you’ll now see 45% less low-quality, unoriginal content in search results versus the 40% improvement we expected across this work.’
‘Original’ content presents unique and comprehensive information, research or analysis that matches the user’s search intent. Google wants content to appeal to humans rather than search engines, and is cracking down on spammy, low-quality content, including content that simply rehashes what already ranks for a query without presenting new ideas or interpretations. If content is engaging, well-written and helpful, Google has succeeded in presenting the best-quality result for the query, and searchers are happy.
Relevance is key
Other fascinating nuggets of clarity from the leak include apparent confirmation that Google uses ‘TitleMatchScore’ to measure how well a webpage’s title matches a search query. The leak also shows that a page title can be longer than the commonly recommended 50-70 characters, as Google will still use the words you’ve included to assess how well the title and content match the user’s query.
Alongside quality and relevance, the leak also confirms that content is ranked based on ‘freshness’ and that Google emphasises content based on update frequency - regular content updates are crucial to success for some content types.
The message is clear: keeping a sharp focus on creating useful, relevant content with a focused page title will keep users on your page.
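The leak names ‘TitleMatchScore’ but not how it is calculated. As a purely hypothetical illustration, a naive term-overlap check shows why a longer title can still satisfy a query - the relevant words still count even when they sit beyond the 50-70 character mark:

```python
# Illustrative only: the leak names 'TitleMatchScore' but not its formula.
# This naive overlap score is a stand-in to show the general principle.

def naive_title_match(title: str, query: str) -> float:
    """Fraction of query terms that appear in the page title (0.0 to 1.0)."""
    title_terms = set(title.lower().split())
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & title_terms) / len(query_terms)

# A 72-character title, beyond the usual guideline, still fully matches the query.
title = "Google algorithm leak 2024 explained for SEO and content marketing teams"
print(naive_title_match(title, "google algorithm leak 2024"))  # 1.0
print(naive_title_match(title, "local seo pricing"))           # ~0.33
```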
Google ranks websites, not just webpages
The phrase ‘Google ranks webpages, not websites’ has been widely discussed (and accepted by some SEOs) over the years, with Google’s John Mueller ‘liking’ a tweet in apparent confirmation of the statement.
This suggested that Google crawls and indexes pages in their own right at page level, rather than including the overall strength and relevance of the whole website in its ranking signals.
While the quality of the content on a page is crucial to its rankings, other data from the leak suggests that maintaining consistently high-quality content across your whole site is also critical, including content depth and user engagement, as suggested by ‘ChardScores’.
It is suspected that chard scores are applied at site level and are used to predict overall page and site quality based on existing content.
Although this goes against information we’ve been led to believe in the past, it makes sense - ‘entities’ are already a key part of the Google algorithm. For example, if Google considers a site to be relevant to a specific ‘entity’ or subject area, it can be more difficult to rank it for a less-related entity.
Conversely, if a website shows relevance to a particular entity, with strong website architecture and topic clusters related to sub-topics of its main subject, it can be easier to rank for new but related terms.
With this in mind, pages with low-quality or ‘thin’ content and no backlinks are worth pruning to keep standards high across your site.
Subdomains vs subfolders
The leak clarified that a subdomain is treated as a separate site, which has long been considered true by SEO pros.
However, John Mueller has said in the past that subdomains and subfolders are treated equally, which is clearly at odds with both common belief and the algorithm leak. The practical point to remember is that if your blog or shop sits on a subdomain, you would have to build its authority separately from the primary domain, an interesting detail to have clarified.
Domain Authority
A link from a referring domain with a strong link profile has always been recognised as an important ranking signal. Until the leak, Google’s stance on domain authority was clear: while it acknowledged the metric as a third-party tool, “Google doesn’t use ‘domain authority’”, as John Mueller tweeted back in 2016 - a position repeated by several members of Google’s team throughout the years.
However, the leaked documents mention a ranking signal called ‘SiteAuthority’, which many have likened to ‘domain authority’, with ‘SiteAuthority’ encompassing many factors including content quality, click data and a site’s link profile.
The documents appear to clarify the importance of domain authority, and how it influences backlink weighting, along with other factors like traffic and relevance.
Links do count
Google has spoken less about backlinks over the past couple of years, instead focusing on promoting its quality content guidelines to combat the rise of generic AI content and content designed to manipulate rankings.
This led many in SEO to speculate on whether backlinks were still as much of a significant ranking factor, and whether their weighting had changed.
The leak confirms that high-quality links are still an essential part of ranking success, and that PageRank (a metric that measures the quality and quantity of backlinks to a page) is still an important ranking signal. What is most interesting, however, is apparent confirmation of how backlinks are weighted, and how Google views links.
Backlinks from domains with a higher domain rating and consistent traffic carry more weight, as does topical relevance on both sides of the link, rewarding backlinks that are earned naturally by people sharing content they find useful or that come from authoritative sites with topical relevance.
Pages that regularly earn new or ‘fresh’ links are also favoured, suggesting that Google gives more weight to websites that provide new, engaging content that drives traffic than to a high-authority site that rarely earns links or organic traffic.
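PageRank itself is publicly documented, and a minimal power-iteration sketch on a toy link graph illustrates the principle the leak reinforces: a page’s score depends on both how many pages link to it and how authoritative those linking pages are. This is the textbook algorithm, not Google’s current implementation:

```python
# Classic PageRank on a toy internal link graph - a sketch of the published
# algorithm, not Google's current system, which layers many more signals on top.

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the 'teleport' baseline
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, outlinks in links.items():
            if not outlinks:
                # A page with no outgoing links shares its rank across every page
                for target in pages:
                    new_rank[target] += damping * rank[page] / len(pages)
            else:
                # Otherwise its rank is split equally among the pages it links to
                for target in outlinks:
                    new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

# 'home' earns links from every other page and ends up with the highest score,
# while the page nobody links to is left with little more than the baseline.
toy_site = {
    "home": ["blog", "services"],
    "blog": ["home"],
    "services": ["home"],
    "orphan": ["home"],
}
print(pagerank(toy_site))
```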
Google penalises spammy links
The leak also confirmed that Google measures link velocity to identify spam links and nullify them, and that it penalises low-quality ‘toxic’ links.
This quells the debate over whether low-quality links are simply ‘ignored’ by Google, or whether it does actually take action on spam.
These findings suggest that Google has the capacity to differentiate between a targeted link attack on a website and random low-quality links that were picked up naturally.
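The documents confirm that link velocity is measured but not how, so the following is a deliberately crude, hypothetical illustration of the general idea - a sudden burst of new referring domains stands out against a site’s usual baseline:

```python
# Illustrative only: the leak confirms link velocity is measured, but not how.
# A crude spike check over weekly counts of newly acquired referring domains.

def flag_link_spikes(weekly_new_domains, multiplier=3.0):
    """Return the indexes of weeks where new links exceed `multiplier` times
    the average of all preceding weeks (a naive stand-in for spam detection)."""
    flagged = []
    for week, count in enumerate(weekly_new_domains):
        if week == 0:
            continue
        baseline = sum(weekly_new_domains[:week]) / week
        if baseline > 0 and count > multiplier * baseline:
            flagged.append(week)
    return flagged

# Weeks 0-5 look organic; week 6 jumps from single digits to 120 new domains.
history = [4, 6, 5, 7, 6, 5, 120]
print(flag_link_spikes(history))  # [6]
```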
Has the Google algorithm leak changed our approach to SEO?
At Ascensor, we’ve always believed that a website with quality content that matches user search intent on an easily navigable and well-structured site is the key to success.
There are a lot of things we don’t know for certain when it comes to how Google uses ranking signals, but uncertainty is a familiar concept for SEO professionals, and a healthy dose of scepticism is always sensible when approaching Google guidelines.
The best SEO pros are constantly testing and evolving their practices based on the latest Search trends, the impact of their strategies and the type of account they’re working with.
What we do know is that prioritising a great user experience through structure, navigation and quality content is key to appealing to both search engines and your customers.
Want to find out how SEO can make a difference to your business? Speak to our experts.