Mobile First Indexing is People First Indexing

Mobile First Indexing: The Human Network

In a world where data is the new gold, the way tech giants collect and utilize information often remains behind a veil. Hold onto your hats, folks—Cindy Krum has some eye-opening insights that might just make you see the web in a whole new light.

In her nearly 45-minute presentation on how mobile first indexing works, which you can watch below, Cindy details what we’ve suspected for quite a while now.

Google isn’t just a search engine or an ad platform; it’s a colossal network powered by—you guessed it—you. By leveraging the distributed network of Chrome browsers, Google models and trains their systems, from Search to Ads. Relying on their promises to firewall data between products? That’s like betting on a three-legged horse in the Kentucky Derby.

Google Caught in a Web of Contradictions

Recent developments have pulled back the curtain on Google’s operations. The Department of Justice lawsuit and the API leaks, which I have talked about previously, show that Google’s claims of not using click data are more smoke and mirrors than truth. Google long denied using clicks and engagement in its algorithm, but the evidence shows otherwise.

To understand what exactly is going on, we first need to look at Chrome, by far the most popular web browser, and some of its behavior.

Chrome's Massive Data Collection

If you’re using Chrome, you’re part of the show. The browser collects enormous amounts of interaction data. Don’t believe it? Type `chrome://histograms` into your address bar and see the avalanche of data for yourself. This data can include everything from which forms you fill out to which buttons you press on a page.

BFCache in Chrome: Performance Boost or Data Pipeline?

Another place to look for hints about what is going on under the hood of Chrome is the BFCache (back/forward cache).

BFCache was introduced to improve performance: it stores fully rendered versions of web pages so they can be reloaded almost instantly when you hit the back or forward button. However, those rendered pages could just as easily be delivered back to Google from your device. Chrome even has an entire API that gives *.google.com access to your device, tracking CPU, GPU, memory usage, and more. But that’s just to speed up Google Meet, right?

Probably not.

Initially, webmasters could prevent BFCache use through page flags, but Chrome now ignores those commands. Website developers might have good reason to tell the browser not to cache a page, such as keeping functionality from breaking. But if Chrome is delivering rendered pages back to Google, Google has a reason to ignore that flag. Sneaky, isn’t it?
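Historically, the most common of those page flags was the `Cache-Control: no-store` response header, which made a page ineligible for Chrome’s BFCache. The minimal sketch below only checks whether a header value carries that directive; the helper name `hasNoStore` is hypothetical, and it says nothing about how current versions of Chrome actually treat the header. In the browser itself, a page can tell it was restored from the BFCache by checking `event.persisted` inside a `pageshow` event listener.

```typescript
// Hypothetical helper: does a Cache-Control header value contain the
// `no-store` directive that historically opted a page out of the BFCache?
function hasNoStore(cacheControl: string | null): boolean {
  if (!cacheControl) return false; // no header at all means no opt-out
  return cacheControl
    .split(",") // directives are comma-separated
    .map((directive) => directive.trim().toLowerCase()) // directive names are case-insensitive
    .some((directive) => directive === "no-store");
}

console.log(hasNoStore("private, no-store")); // true
console.log(hasNoStore("max-age=3600")); // false
```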

Incognito Mode's Limitations & Social Media

Even in incognito mode, Google can associate your private or disposable accounts with your behavior on Chrome. There was an entire lawsuit over this, which sought $5 billion in damages before it was settled.

Now think about how close Google has become to Reddit.

Think about how personal you get on your social media accounts and how much information you share, even on throwaway accounts. I’d personally argue that Google wanted G+ to be a platform not just so they could share in the social media wave, but so they would have yet another data source to mine.

So, when G+ failed, they started looking elsewhere, and finally Reddit started playing ball with Google. Combine that with the massive amount of your personal information leaking out to Google even when you are supposed to be “anonymous” and you have massive privacy concerns.

Unraveling Mobile First Indexing

When Google rolled out Mobile First Indexing, the messaging was as clear as mud. The change was supposedly aimed at improving JavaScript rendering, yet tests conducted by Tom Anthony and Malte Ubl yielded wildly different results: Tom reported 2% JavaScript rendering, while Malte claimed 100%.

The Discrepancy and Malte Ubl's Role

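So, what’s the skinny? If Google were rendering JavaScript on 100% of pages, it would strain both their own systems and the websites being crawled, because every render means running a full browser, executing every script, and fetching every resource the page requests.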

Anyone with some understanding of the ins and outs of building websites will recognize this limitation.

Malte Ubl, an ex-Google employee who helped build the Mobile First Indexing systems, might hold the key. Cindy hypothesizes that Mobile First Indexing is actually People First Indexing. In other words, Google is using your resources and computers running Chrome to render pages and send back JavaScript-rendered pages to Google.

The key lies in this discrepancy between the test results of Tom and Malte. Cindy argues, compellingly, that the second stage of mobile first indexing that happens as “resources become available” is actually real people using their computers unknowingly to send back fully rendered versions of websites to Google.

The discrepancy between these two results is the difference between Google’s own systems and the distributed network of Chrome web browsers.

We, or more specifically, our computers running Chrome, are those resources.

This distributed computing model isn’t new—it’s the bee’s knees in projects like Bitcoin mining and Folding@Home.

Heck, Google has a patent that details the exact way that they could implement a distributed machine learning network. But all of this only works because of Chrome’s significant market share.

This quite likely turns individual users into cogs in Google’s grand machine.

For starters, ever wondered why Chrome loves to eat up your system’s RAM?

Other Clues

If that’s not enough, there’s more to unpack.

Google has been found indexing private files, WhatsApp groups, and more. How did they stumble upon these restricted items without Chrome acting like a fly on the wall?

Ever searched for something or made a purchase, only to see related ads days or weeks later on different devices? That’s no fluke. You’re being lumped into groups of people “interested” in those things, and the ads and behavior follow you like a bad penny.

And that’s not all: there is evidence in Google’s own terms of service. Buried in the fine print, Google’s terms reveal that all their services are interconnected. They’re using and combining all this information, making every user a piece of a larger puzzle.

Changes After Mobile First Indexing

Core Web Vitals began focusing heavily on mobile experiences, incorporating real user data. Where did this data come from? You guessed it: Chrome, through the Chrome User Experience Report (CrUX). It seems plausible that Google uses this data for more than just enhancing “experiences.” User behavior offers a treasure trove of insights beyond load times.

That wasn’t the only thing that changed, either. Take a gander at the wording around cloaking. It has shifted from a strict line in the sand to almost saying it’s OK in some cases.

This implies that detecting cloaking has become more challenging due to their new crawling methods. Additionally, robots instructions had to be set per page rather than site-wide, which only makes sense if crawling is happening in a more human-like way.

Core Web Vitals and Real User Data

If you know how real users are using websites, what’s keeping Google from using that as a ranking signal?
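To make “real user data” concrete: Google assesses Core Web Vitals field data at the 75th percentile of real page loads, against published thresholds (LCP is good at 2.5 s or less and poor above 4 s; INP is good at 200 ms or less and poor above 500 ms; CLS is good at 0.1 or less and poor above 0.25). The sketch below rates a set of samples that way. The nearest-rank percentile method and the `percentile` and `rate` helpers are assumptions for illustration, not Google’s exact implementation.

```typescript
type Metric = "LCP" | "INP" | "CLS";
type Rating = "good" | "needs-improvement" | "poor";

// [good upper bound, needs-improvement upper bound]; LCP and INP in ms, CLS unitless.
const THRESHOLDS: Record<Metric, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
};

// Nearest-rank percentile (an assumption; CrUX may compute it differently).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical helper: rate a metric from field samples at the 75th percentile.
function rate(metric: Metric, samples: number[]): { p75: number; rating: Rating } {
  const p75 = percentile(samples, 75);
  const [good, poor] = THRESHOLDS[metric];
  const rating: Rating =
    p75 <= good ? "good" : p75 <= poor ? "needs-improvement" : "poor";
  return { p75, rating };
}

console.log(rate("LCP", [1800, 2100, 2400, 3100, 5200, 2000, 2600, 1900]));
```

For example, those eight LCP samples give a 75th percentile of 2600 ms, which lands in the needs-improvement band even though most individual loads were under 2.5 s.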

The Shift to New Search Console and Analytics

Google replaced functioning products with new ones. Why fix what isn’t broken? Perhaps the old analytics data contained information they’d rather keep under wraps, like decreasing organic clicks or other questionable practices. The switch to Google Analytics 4 conveniently wipes the slate clean and makes it difficult to use historical data to catch them out.

We know that this is not beyond the realm of possibility. Google specifically got in trouble for deleting internal documents as part of the DOJ trial.

And Cookies

If you remember, Google initially planned to kill third-party cookies in the name of privacy.

However, this move, coupled with Google’s open bidding, would have funneled more advertising spending into Google’s coffers. They backtracked, possibly due to the DOJ investigation, to keep their cards close to their chest.

Those of us watching this move saw it for what it was: a blatant attempt to extend their data monopoly. Third-party cookies let independent websites and advertisers track their own data and user behavior. Without them, Google would have a complete stranglehold on important user behavior information.
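For a cookie to be sent in a cross-site (third-party) context, such as an independent ad network’s script or iframe embedded on someone else’s page, Chrome requires it to be marked `SameSite=None` together with `Secure`. The sketch below builds such a `Set-Cookie` header value; the helper name, cookie name, and value are hypothetical, and whether the browser actually stores and sends the cookie still depends on the user’s third-party cookie settings.

```typescript
// Hypothetical helper: build a Set-Cookie header value for a cookie that an
// independent ad server wants the browser to send back in third-party contexts.
function thirdPartySetCookie(name: string, value: string, maxAgeSeconds: number): string {
  return [
    `${name}=${encodeURIComponent(value)}`, // encode the value so it is header-safe
    `Max-Age=${maxAgeSeconds}`, // lifetime in seconds
    "Path=/",
    "SameSite=None", // allow the cookie on cross-site requests
    "Secure", // Chrome rejects SameSite=None cookies without Secure
  ].join("; ");
}

console.log(thirdPartySetCookie("uid", "abc123", 86400));
// uid=abc123; Max-Age=86400; Path=/; SameSite=None; Secure
```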

Stopping third-party cookies was never about privacy; that is something users can easily control themselves. It was about keeping Google in power.

Yet Skeptics Remain

Pedro Dias raised questions about these conclusions, but his arguments often circle back to trusting Google. Given the concerns and the complexities unveiled, that’s a tough pill to swallow.

We know, or at least have just about every reason to believe, that everything Google says should be taken with a massive grain of salt.

Google isn’t the “Don’t be Evil” company anymore. We need to accept that.

So What Should Small Business Owners Do?

For the moment, people are not going to leave the Google ecosystem in large enough numbers to affect Google’s ability to collect and use this information. If you are concerned about this data collection, you should take steps to move away from Google, and from Chrome specifically.

But there are some things that we can take advantage of knowing that this is how at least part of the algorithm works.

Emphasize Real User Interaction

Understand that getting real people to interact with and visit your site is more crucial than ever. Google is keen on signals indicating genuine user engagement with your website.

Drive Traffic Through Multiple Channels

Push people to visit your website through various channels—social media, forums, newsletters, you name it. Diversify your traffic sources to avoid putting all your eggs in one basket.

Create Engaging Content That Keeps Visitors Hooked

Give people a reason to stay, read, and interact with your website. Engaging content is king, and in this dance, you want to be the cat’s pajamas.

Wrapping Up

Maybe you are not convinced. But I urge you to think about this. Google is one of the largest collectors of data on the planet. They have the largest market share in search, web browsers, and mobile operating systems.

They make the vast majority of their money from advertising, and they have every incentive to use all possible data streams to enhance their core products, search and ads, to keep that dominant position.

With the rise of AI and large language models, there is also a massive need for training data.

If we are honest, Google’s position in the AI space is not the best. But they can do something no one else can to catch up with the other players: tap into the vast amounts of user behavior data they collect and use it to train their own models. No other AI company could do that.

Published: October 13, 2024
Tags: Google API Leak, Mobile First Indexing