Palo Alto, CA, June 22, 2023 – At Inflection, our mission is to create a personal AI for everyone. In May 2023, we released Pi (Pi.ai) – your personal AI, designed to be empathetic, useful, and safe (Pi press release).
We believe that pre-training is as important as fine-tuning when it comes to creating high-quality, safe, and useful AI experiences. That’s why we set out to develop our own state-of-the-art LLMs. As a vertically integrated AI studio, we do everything in-house for AI training and inference: from data ingestion and model design to high-performance infrastructure.
To offer our users superb quality and speed, we needed to develop a model that is both scalable in production and more capable than widely deployed LLMs such as GPT-3.5 and LLaMA. We are excited to share that we have now achieved this goal.
Inflection-1 was trained using thousands of NVIDIA H100 GPUs on a very large dataset. Our team has been able to take advantage of our end-to-end pipeline to develop a number of proprietary technical advances that have enabled these results. This technical memo summarizes our evaluations and compares our performance against other LLMs.
The memo shows that Inflection-1 is the best model in its compute class, outperforming GPT-3.5, LLaMA, Chinchilla, and PaLM-540B on a wide range of benchmarks commonly used for comparing LLMs. We will also be releasing a technical memo detailing one of our models in the same compute class as PaLM-2 and GPT-4.
This is an achievement we are proud of, having started Inflection just over a year ago. We expect dramatic improvements in the coming months as we continue to scale and innovate to deliver on our mission to build the most capable and safe AI products, accessible to millions of users.
Summary Evaluation Results
We evaluated Inflection-1 on a wide range of benchmarks against models in the same compute class, defined as models trained using at most the FLOPs of PaLM-540B. A summary of the six most popular benchmarks follows. Further details are available in our technical memo.
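As a rough illustration of what "compute class" means here, a common back-of-the-envelope estimate for a dense transformer's training compute is ~6 · N · D FLOPs, where N is the parameter count and D is the number of training tokens. The sketch below uses this well-known approximation with PaLM-540B's publicly reported figures; it is illustrative only and says nothing about Inflection-1's own undisclosed parameter or token counts.

```python
# Illustrative sketch: estimating training compute with the common
# ~6 * N * D approximation (N = parameters, D = training tokens).
# The PaLM figures are from the public PaLM paper, not Inflection disclosures.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6.0 * n_params * n_tokens

# PaLM-540B: ~540B parameters trained on ~780B tokens.
palm_540b = training_flops(540e9, 780e9)
print(f"PaLM-540B: ~{palm_540b:.2e} FLOPs")  # ~2.53e+24
```

Any model trained with at most this budget (~2.5 × 10^24 FLOPs, under these assumptions) would fall in the same compute class.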
Inflection-1 sets a new standard on MMLU
Massive Multitask Language Understanding (MMLU) is a commonly used benchmark that tests a very wide range of academic knowledge. The benchmark includes exams from 57 different categories, ranging from high school and college to professional-level difficulty. See question examples.
On this benchmark, our model is the best-performing foundation model in its class, outperforming Meta’s LLaMA, OpenAI’s GPT-3.5, and Google’s PaLM-540B.
Inflection-1 achieves 72.7% on average across all 57 tasks, with greater than 90% accuracy on 5 tasks and greater than 85% accuracy on 15 tasks. For comparison, a human expert scores an average of 89.8%, while an average human rater scores 34.5% overall.
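For readers unfamiliar with how such a figure is aggregated, MMLU results are typically reported as an unweighted (macro) average of per-task accuracy over the 57 tasks. The sketch below is a minimal illustration of that arithmetic; the task names and scores are invented for the example and are not Inflection-1's actual per-task results.

```python
# Hypothetical sketch of MMLU-style scoring: compute accuracy per task,
# then take an unweighted (macro) average over all tasks.
# Task names and scores below are invented for illustration.

def task_accuracy(predictions, gold_answers):
    """Fraction of questions answered correctly within one task."""
    return sum(p == g for p, g in zip(predictions, gold_answers)) / len(gold_answers)

def mmlu_macro_average(per_task_accuracy: dict) -> float:
    """Unweighted mean of per-task accuracies (each task counts equally)."""
    return sum(per_task_accuracy.values()) / len(per_task_accuracy)

scores = {
    "high_school_biology": 0.81,
    "college_mathematics": 0.44,
    "professional_law": 0.56,
}
print(f"macro average: {mmlu_macro_average(scores):.3f}")  # 0.603
```

Because the average is unweighted, a small task counts as much as a large one, which is why per-task breakdowns (such as the 5 tasks above 90%) are worth reporting alongside the headline number.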
We compare Inflection-1 to a broad set of models on the MMLU benchmark. Our model outperforms all models in our compute class, including both GPT-3.5 and LLaMA.
Inflection-1 is significantly better at Trivia Questions
On TriviaQA and Natural Questions, two benchmarks that measure the closed-book question-answering capabilities of a language model, Inflection-1 outperforms LLaMA, Chinchilla, and PaLM-540B, improving upon LLaMA’s TriviaQA performance by 2.1%. On Natural Questions, our model outperforms PaLM-540B by 8.6% and LLaMA by 6%. In fact, our model is competitive with Google’s latest flagship model, PaLM 2-L.
On trivia style question answering, our model outperforms LLaMA by a considerable margin and is competitive with Google's recent flagship model, PaLM 2-L. For TriviaQA, we show two different evaluation splits allowing us to compare to LLaMA, Chinchilla, and PaLM.
It is worth noting that the results we present in this memo are those of the Inflection-1 foundation model, which has not undergone any fine-tuning or alignment. Pi is powered by Inflection-1, further transformed through a proprietary adaptation process to become a useful and safe personal AI. We plan to release further details of our safety methodology in future technical memos.
For further details, see our technical memo. If you are interested in training world-class foundation models and integrating them into a new generation of products, come join us on our mission! Apply here.