I joined Inflection AI a few months ago. Originally founded by Mustafa Suleyman, Karen Simonyan, and Reid Hoffman, Inflection AI had already built some of the most advanced AI technologies in the world. Their work resulted in the launch of Pi, a chatbot designed to be the best in the world at friendliness, empathy, and natural conversation. Tens of millions of users had engaged with Pi and found immense value in it. However, the challenge ahead was to take this extraordinary technical foundation, build a new team, and extend the work—this time with a focus on enterprise AI solutions.
I had spent decades working on text generation: my thesis work in the 1990s was on generating instructional texts; during my time at Pitt/CMU I built patient education systems that conversed with their users; and I was one of the four people who built the first prototype of Google Translate. But the advent of Transformers had changed the AI landscape while I was looking elsewhere, working on edtech in K-12 classrooms. So I approached this new challenge at Inflection AI with a blend of some excitement and a lot of trepidation.
Early Adjustments and Unexpected Learnings
One of our first decisions came almost immediately. We needed to trim our GPU usage, our single largest line item, so we decided to curtail some of the consumer traffic to Pi: we could save some money, and our new focus was going to be the enterprise use case anyway. We didn’t want to cut it completely, since Pi remained a significant proof point for enterprise discussions; few companies had successfully deployed AI at this scale. So we decided to cut back third-party integrations, eliminating channels like WhatsApp, Telegram, and Signal, and keep only the web and mobile versions. This would allow us to retain millions of users for model testing while cutting expenses.
But the golden rule is that one should expect the unexpected. Though we were able to reduce the number of users, our costs did not decrease as hoped. A substantial portion of Pi’s most loyal users simply migrated to our native apps – in the process discovering and embracing the speech interface. And as people tend to talk more than they type, our usage—and therefore our costs—remained higher than anticipated. The lesson: cost reduction would not come from scaling back access but from building more efficient models.
Reducing costs meant optimizing performance, and that meant addressing our compute infrastructure. Nvidia GPUs were in such high demand that bidding wars were breaking out, and major cloud providers were rationing supply. The GPU-focused hyperscalers demanded long-term leases with upfront payments. Job candidates were asking us how many GPUs we had access to before they would accept an offer.
If we were struggling to secure GPUs, wouldn’t our enterprise customers face the same issue? The answer was clear: we needed flexibility. Intel, with the largest global footprint in enterprise hardware, became a natural choice. If we could get our software running efficiently on Intel hardware, we could provide AI solutions even to businesses with limited access to Nvidia GPUs.
It sounded straightforward, which should have warned us that it wasn’t. Several weeks, hundreds of kernel rewrites, and countless late nights later, our engineers, working alongside Intel researchers, had our inference layer optimized for Intel Gaudi servers. The effort was worth it: it got us deep into our own codebase, attracted interest from enterprises seeking multi-architecture deployments, and proved our commitment to adaptability.
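To give a flavor of what the port involved at the API surface, here is a minimal sketch of running a PyTorch inference step on a Gaudi device through Intel’s Habana bridge. The model name, prompt, and generation settings are illustrative placeholders, not our production stack.

```python
# Minimal sketch: PyTorch inference on an Intel Gaudi (HPU) device.
# Assumes the Habana PyTorch bridge (habana_frameworks) is installed;
# the model and prompt below are placeholders for illustration only.
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; not one of our models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("hpu").eval()

inputs = tokenizer("Hello, enterprise AI!", return_tensors="pt").to("hpu")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=32)
    htcore.mark_step()  # flush the lazy-mode graph so the HPU executes it

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The device placement is the easy part; the real work was in the hundreds of kernels underneath it.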
Expanding Capabilities Through Strategic Acquisitions
Multi-architecture support and state-of-the-art EQ capabilities alone were not going to be enough. What we couldn’t build, we could buy. We expanded our capabilities through two key acquisitions:
BoostKPI: A company focused on handling large, structured data sets. Businesses continuously accumulate vast amounts of data, but extracting actionable insights remains a challenge. BoostKPI helps unlock the full potential of this data.
Jelled.AI: A company focused on building AI-driven personas and institutional memory. This aligns closely with our mission to create AI that understands and adapts to specific enterprise contexts.
Beyond IP, both companies brought exceptional founding teams, strengthening our talent pool. Now, with our team growing, we resumed our training pipelines, experimented with smaller, enterprise-friendly models, and began collecting and curating fine-tuning data. Fine-tuning has recently drawn renewed attention as new models have shown surprisingly strong gains from it, and it is a critical step in transforming a raw, factual model into one that is engaging, supportive, and aligned with enterprise needs; a minimal sketch of the idea follows.
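To make the fine-tuning step concrete, here is a minimal supervised fine-tuning loop in PyTorch with Hugging Face Transformers. The base model, the two in-line examples, and the hyperparameters are toy assumptions; they are not our data or our recipe.

```python
# Minimal sketch of supervised fine-tuning with Hugging Face Transformers.
# The base model, in-line examples, and hyperparameters are illustrative
# placeholders; they are not Inflection's actual data or training recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy "engaging, supportive" completions standing in for curated data.
examples = [
    "User: I'm overwhelmed by this report.\nAssistant: Let's break it into small steps together.",
    "User: Our KPIs dipped last quarter.\nAssistant: That happens; let's look at what changed.",
]
batch = tokenizer(examples, return_tensors="pt", padding=True)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # ignore padding in the loss

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):  # a few passes over the toy batch
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"epoch {epoch}: loss={loss.item():.3f}")
```

At enterprise scale the loop looks the same; the hard part is the curation of what goes into the batch.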
Contributions to Open Source and Model Development
As our expertise and confidence grew, we started making meaningful contributions to the broader AI community. Our work on TorchTune, an open-source fine-tuning library for PyTorch, has been noticed and acknowledged. We have published some of our work on arXiv, and as we stabilize our enterprise deployment frameworks, we intend to open-source aspects of our code to support wider adoption.
Looking Ahead: The Future of Enterprise AI
Since last March, building on the original team’s state-of-the-art model-building work, we’ve extended the product focus to enterprise use cases, assembled a new team, and delivered both models and applications to our initial customers. Milestones we have celebrated include:
Enterprise deployment of model ensembles, both in the cloud and on-prem
Model encryption for security during storage and deployment (see the sketch after this list)
Enterprise-specific safety models and fine-tuning
Inference pipelines optimized for Intel hardware in addition to Nvidia, enabling broader adoption
Millions of users retained as real-world test points for refining our ‘reasoning about EQ’ models
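As an illustration of the storage-side half of the encryption milestone, here is a minimal sketch using the cryptography package’s Fernet recipe to encrypt serialized weights at rest. The file name and the key handling are generic assumptions, not a description of our actual scheme.

```python
# Minimal sketch: encrypting model weights at rest with symmetric encryption.
# Uses the `cryptography` package's Fernet recipe; the file name and the
# key handling shown here are illustrative, not Inflection's actual scheme.
import io

import torch
from cryptography.fernet import Fernet

# In practice the key would come from a KMS or HSM, never sit next to the data.
key = Fernet.generate_key()
fernet = Fernet(key)

# Serialize a (toy) model's weights to bytes, then encrypt before writing.
model = torch.nn.Linear(4, 2)
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
with open("model.bin.enc", "wb") as f:
    f.write(fernet.encrypt(buffer.getvalue()))

# At deployment time: decrypt, then load the state dict back into the model.
with open("model.bin.enc", "rb") as f:
    state_dict = torch.load(io.BytesIO(fernet.decrypt(f.read())))
model.load_state_dict(state_dict)
```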
We are now in the final stages of fine-tuning and testing a new family of models that push the boundaries of data provenance and cost efficiency. Built on a state-of-the-art Mixture-of-Experts (MoE) framework, these models significantly improve inference speed and cost-effectiveness. Our largest model achieves top-tier HellaSwag benchmark scores. Alongside this, we are testing smaller, optimized versions ranging from AI models that run entirely on handheld devices with no connectivity to those that operate across large cloud clusters. We will be publishing more details on the models and their capabilities in the coming weeks.
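For readers unfamiliar with the architecture, the sketch below shows the top-k routing idea at the heart of a Mixture-of-Experts layer: each token activates only a few experts, so parameter count can grow without a matching rise in per-token compute. The dimensions and the simple softmax gate are textbook placeholders and say nothing about our models’ internals.

```python
# Minimal sketch of a Mixture-of-Experts layer with top-k routing.
# Only the experts chosen by the gate run for each token, which is why
# MoE models can grow in parameters without a matching rise in per-token
# inference cost. All dimensions here are toy placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # the router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.gate(x)                       # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # keep only top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(10, 64)
print(TopKMoE()(tokens).shape)  # torch.Size([10, 64])
```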
Inflection AI is committed to delivering the best enterprise AI solutions—combining scale, efficiency, and adaptability. If you’re interested in the future of AI-driven enterprise transformation, come talk to us.