
Web Bots Are Taking Your Data—And Now Recall Wants to Watch Too

AI crawlers are increasingly harming Wikimedia and open source sites by overwhelming their infrastructure with excessive data requests. Since January 2024, Wikimedia has seen a fifty percent increase in bandwidth used for downloading multimedia content, primarily driven by automated bots rather than human users. These bots account for sixty-five percent of resource-consuming traffic, while overall page views from bots are around thirty-five percent. This surge in scraping activity not only strains resources but also poses a threat to the sustainability of open source projects. As developers express frustration over these invasive practices, experts warn that if left unchecked, these AI crawlers could jeopardize the open web and access to knowledge, limiting opportunities for academic researchers and journalists.

Microsoft is gradually rolling out a preview of its new feature called Recall, which captures screenshots of user activity on Copilot Plus PCs. The preview is currently available to Windows Insiders. Originally intended to launch last June alongside Copilot Plus PCs, the feature was delayed after experts raised security concerns. Following several postponements, including a release that had been planned for October, Microsoft says it aims to provide a secure user experience before a wider release. A previous preview was made available in November to specific Copilot Plus PC users. According to a recent blog post, users must opt in to save snapshots with Recall and can pause the feature at any time.

Why do we care?

There is an important conversation to be had with customers about data ownership. The genie is likely out of the bottle for a lot of already published information, and most small companies will not be able to fight back against their data being scraped. There are two angles. First, how do your customers react to being scraped? Second, how do they react to using scraped data?

Recall matters here because it is essentially the individual version of that same question. Recall is opt-in, yet organizations should still decide what their policy on opting in will be. This data conversation is an important one to have with customers.