Recently leaked internal documentation from Google’s Content Warehouse API has provided significant insights into the company’s search algorithms. The documents describe what data Google stores about content, links, and user interactions, but they do not specify how those signals are scored or weighted.
The leaked documents detail 2,596 modules with 14,014 attributes related to various Google services such as YouTube, Assistant, and web documents. These modules exist within a monolithic repository, meaning all code is stored in one location and accessible by any machine on the network.
Hridoy Rehman (@hridoyreh) wrote on May 28, 2024: “Google Search algorithm leaked today. It outlines 2,596 modules with 14,014 ranking features related to various Google services. Here's what (13 things) we found: pic.twitter.com/ueEWM2Z6ea”
One significant revelation from the leaked documents is the existence of a feature called “siteAuthority.” This feature indicates that Google measures the overall authority of a website, which contradicts Google’s previous public statements denying such a practice. Site-wide authority refers to the overall trustworthiness and credibility of an entire website, rather than just individual pages. This metric can significantly impact how a website’s pages rank in search results.
Additionally, the documents reveal that systems like NavBoost use click data to influence rankings. This means that Google considers how often users click on a particular search result when determining its ranking. Despite Google’s public denials, user interaction data therefore appears to play a role in ranking decisions, potentially favoring pages that receive more clicks.
The documents also mention a “hostAge” attribute used to sandbox new sites, which contradicts Google’s previous statements denying the existence of a sandbox. In SEO terminology, a sandbox is a theoretical filter that prevents new websites from ranking well in search results until they have established a certain level of trust and authority. The “hostAge” attribute suggests that Google does apply a probationary period to new sites, limiting their visibility in search results initially.
Moreover, the documentation shows that Chrome data is used in ranking algorithms, despite Google’s previous denials. This means that data collected from users’ browsing behavior in the Chrome browser can influence how websites are ranked in search results. The browsing data in question includes metrics such as time spent on a page and bounce rate, further integrating user experience data into the ranking process.
Google has already lost considerable public trust over the past few years. If these allegations are proven true, the company will face an even greater challenge in regaining the public’s confidence.