News, Trends, and Insights for IT & Managed Services Providers
Wikipedia Feeds the AI Beast, But Wants to Do So on Its Own Terms

Wikipedia has announced a partnership with Kaggle, a Google-owned data science community platform, to create a machine-readable dataset of its content specifically designed for training artificial intelligence models. This initiative comes in response to a significant increase in non-human traffic due to bots scraping the site for AI training, with bandwidth consumption rising by 50% since January 2024. The new dataset will initially focus on English and French, providing stripped-down versions of Wikipedia articles that exclude references and markdown code. As the Wikimedia Foundation seeks to manage costs associated with this surge in traffic, it emphasizes the importance of protecting contributors’ rights by adhering to Creative Commons licensing terms. The dataset is expected to enhance accessibility for AI developers while addressing the ongoing challenges of content scraping from the platform.
A concerning new trend has emerged where users are employing OpenAI’s latest models, o3 and o4-mini, to conduct reverse location searches from photographs. These AI models have advanced image-analyzing capabilities, allowing them to identify cities, landmarks, and even specific venues based on visual clues. This trend has gained traction on social media platforms, with users sharing examples of the models successfully identifying locations from various types of images. For instance, one user demonstrated how the model accurately identified a location from a seemingly random photo taken in a library. However, experts warn that this capability poses significant privacy risks, as malicious actors could misuse this technology to uncover personal information. OpenAI has yet to address these potential dangers in its safety reports for the new models.

Why do we care?

Wikipedia’s move is a calculated pivot: if AI models are going to ingest your data anyway, better to shape how it happens. It tackles two core issues: runaway scraping costs and contributor rights under Creative Commons licensing. Expect more structured open data offerings, and be aware that this is something you can offer too. In the webinar I hosted today, Srinivas Krishnaswamy offered ready-to-use schema templates for your website. And those helping clients with AI integration will need to track these new official pipelines, which will often be more cost-effective and compliant than unstructured scraping.
I included the OpenAI insights to provide perspective on unanticipated safety risks.
