
Big AI, Big Trouble? Low-Cost Models Challenge Industry Leaders

A bit more model disruption.

Researchers at Stanford and the University of Washington have developed a new artificial intelligence reasoning model called S1, which can be trained for under fifty dollars in cloud computing credits. The model performs comparably to high-end models like OpenAI's o1 and DeepSeek's R1, particularly on math and coding tests. The team fine-tuned S1 by distilling it from Google's reasoning model, Gemini 2.0 Flash Thinking Experimental, using a dataset of just one thousand curated questions. Training took less than thirty minutes on sixteen Nvidia H100 GPUs, roughly twenty dollars' worth of cloud computing resources. The result raises concerns about the commoditization of AI: small teams can now replicate the behavior of multimillion-dollar models at a fraction of the cost. The findings also suggest that while distillation is a cost-effective way to recreate existing AI capabilities, it is unlikely to produce significantly superior models.
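For the curious, the core idea behind distillation is simple: a small "student" model is trained to match a larger "teacher" model's outputs rather than raw labels. The sketch below illustrates the classic soft-target version of that loss in plain Python; the logits are invented for illustration, and S1 itself was fine-tuned on the teacher's generated reasoning traces rather than on logits, so treat this as the general technique, not the paper's exact recipe.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature, then normalize into probabilities.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the softened teacher and student distributions:
    # the quantity a student minimizes to mimic the teacher's behavior.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 1.0, 0.1]  # hypothetical teacher outputs for one question
student = [1.5, 1.2, 0.3]  # hypothetical student outputs
loss = distillation_loss(teacher, student)  # positive; zero only when they match
```

A perfectly trained student drives this loss to zero, which is also why distillation tends to cap out at the teacher's capability rather than exceed it.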

Hugging Face has developed an open-source AI research agent named “Open Deep Research” in just 24 hours, following the launch of OpenAI’s similar Deep Research feature. The new tool aims to replicate OpenAI’s performance, achieving fifty-five percent accuracy on the GAIA (General AI Assistants) benchmark, compared to OpenAI’s sixty-seven percent. The project emphasizes the importance of an “agent” framework that enables AI models to perform complex, multi-step tasks, such as gathering information from various sources. Hugging Face’s Aymeric Roucher noted that while the current version relies on a closed-weights model for efficiency, it can easily be adapted to open-source models, showcasing the flexibility of the approach. The swift development process was aided by contributions from the open-source community.
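The "agent framework" point is worth making concrete: an agent is essentially a loop that chains tool calls, feeding each step's observation into the next. Here is a minimal sketch of that loop; the tool names and plan format are invented for illustration, not Hugging Face's actual API (their framework has the model write and execute code to decide each step).

```python
# Hypothetical tools; a real agent would wrap web search, a browser, a code runner.
TOOLS = {
    "lookup_population": lambda city: {"Paris": 2_102_650}.get(city, 0),
    "double": lambda n: n * 2,
}

def run_agent(plan, tools=TOOLS):
    """Execute a multi-step plan: each step names a tool, and the previous
    step's observation is chained in as the next step's argument."""
    observation = None
    for tool_name, arg in plan:
        arg = observation if arg is None else arg  # pass results between steps
        observation = tools[tool_name](arg)
    return observation

# A two-step task: fetch a figure from one source, then transform it.
result = run_agent([("lookup_population", "Paris"), ("double", None)])
```

The intelligence in a real agent comes from the model choosing the plan dynamically at each step; the plumbing, though, looks much like this loop.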

A recent study by the enterprise AI startup Vectara reveals that DeepSeek’s R1 model is prone to generating inaccurate information, or “hallucinations,” at a significantly higher rate than other reasoning models. Models from OpenAI and Google showed the fewest hallucinations, with Alibaba’s Qwen performing best among the remaining models tested. Interestingly, DeepSeek’s earlier V3 model hallucinated at less than a third of R1’s rate. Vectara’s head of developer relations, Ofer Mendelevitch, emphasized the importance of balancing various capabilities during model training to mitigate these issues. The findings highlight ongoing concerns regarding DeepSeek’s training data and moderation capabilities, raising questions about its potential impact on U.S.-based AI companies.

Major U.S. cloud providers, including Amazon Web Services, Microsoft, and Google, have begun offering access to DeepSeek without charging users based on text generation, unlike other models. Instead, businesses pay only for the computing resources they use. This pricing can work out significantly cheaper: DeepSeek can cost a third to a quarter of what alternatives charge via an application programming interface. For instance, using Meta’s Llama model through Amazon Web Services can cost three dollars per one million tokens, whereas DeepSeek offers a more efficient processing alternative. Notably, more than one thousand of Databricks’ twelve thousand customers have adopted models based on DeepSeek R1.
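The two billing models trade off differently depending on volume. The arithmetic below sketches the comparison; the three-dollars-per-million-tokens figure comes from the article, while the throughput and hourly GPU rate are hypothetical placeholders.

```python
def per_token_cost(tokens, usd_per_million):
    # Per-token billing: cost scales with the amount of text generated.
    return tokens / 1_000_000 * usd_per_million

def compute_time_cost(tokens, tokens_per_second, usd_per_hour):
    # Compute billing: cost scales with instance time, not output volume.
    hours = tokens / tokens_per_second / 3600
    return hours * usd_per_hour

tok = 10_000_000  # ten million tokens of generation

# $3 per million tokens is the article's Llama-on-AWS figure.
api = per_token_cost(tok, 3.00)            # $30.00
# Hypothetical: 1,000 tokens/sec on an $8/hour instance.
gpu = compute_time_cost(tok, 1_000, 8.00)  # ~2.78 hours of compute, ~$22.22
```

Under compute-based billing, throughput is everything: double the tokens per second and the effective per-token price halves, which is why an efficient model changes the economics.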

Why do we care?

The headline is that the notion that cutting-edge AI requires vast resources is being challenged. That raises existential questions for companies investing billions in proprietary models. The models will not be the value in the chain.

AI is becoming cheaper and more accessible, but quality and trust remain key differentiators. The industry isn’t just competing on raw intelligence anymore; it’s competing on efficiency, reliability, and cost-effectiveness. And most aren’t clearing the bar here.

This is all good news for application development and for services providers.