Our policy on frontier safety

October 30, 2023: Inflection’s mission is to build a personal artificial intelligence (AI) for everyone. That means an AI that is a trusted partner: an advisor, companion, teacher, coach, and assistant rolled into one. To fulfill this promise and serve such an important role in the lives of our users, we must continuously work to ensure that our products are safe, secure, and trustworthy.

As part of this, Inflection joined the voluntary commitments developed with the White House in July 2023 to manage the risks posed by AI. We believe cross-company commitments such as these are critical to setting good norms across industry and research settings globally. They also crystallize our individual dedication to working on frontier AI systems in a responsible manner.

In advance of the AI Safety Summit to be held November 1-2, 2023, the UK Government requested that Inflection outline our existing and evolving policies on nine areas of AI Safety. These areas include the priorities identified in the July White House commitments, as well as additional areas of interest.

Below is our response, detailing our approach to each of the requested areas. It also serves as a brief summary of our progress to date against the White House voluntary commitments. As Inflection continues to advance work in all areas of AI Safety, we plan to issue periodic reports of this kind to update the public on our progress.

Area 1: Responsible Capability Scaling

Inflection believes that the promise of the latest generation of AI is tremendous. Given the global challenges facing society in this century, we believe that it is imperative to continue to make progress in this technology. This is a moment for seizing the massive opportunities on offer to boost global prosperity and wellbeing at a time when both are faltering.

At the same time, Inflection recognizes that scaling these models must be done responsibly. It therefore takes a disciplined, phased approach that steadily increases model scale while rigorously verifying safety at each step. This includes investments in (1) pre-deployment safety, (2) post-launch monitoring and mitigation, and (3) global governance efforts that attempt to standardize practices across the research community.

This incorporates each of the AI Safety areas that will be discussed in further detail below:

  • Pre-Deployment Safety: Frontier AI must pass a robust, continuously improved set of evaluations and third-party red-teaming in order to be cleared for any form of public launch (Area 2), and must conform to a set of rigorous internal restrictions on data use (Area 3). This work must be done under conditions of the highest security to avoid the risk of leaking “unauthorized” models or methodologies (Area 4).

  • Post-Launch Monitoring and Mitigation: Safety is a continuous process that does not end with launch. We monitor and investigate concerning platform activity to identify emerging safety issues with deployed models (Area 5). This is combined with public disclosures (Area 6) that help external stakeholders assess the risk and the appropriate applications of the technology, and methods for third-parties to report vulnerabilities to us (Area 7).

  • Global Governance: The world is less safe if frontier AI developers operate largely independently of one another. Inflection is a strong supporter of collaborations across industry, government, academia and civil society that help to share know-how, standardize efforts and ultimately produce effective techniques, norms, protocols, regulations, institutions and treaties around AI. This includes investments to research the risks of AI (Area 8) and build standards around identifiers of AI-generated material (Area 9).

Area 2: Model Evaluations and Red Teaming

The powerful capabilities and sometimes unpredictable behavior of frontier AI systems necessitate that the technology industry move away from a “launch and iterate” paradigm. Instead, the industry must learn to “iterate and launch”: rigorously stress testing and evaluating models many times before allowing them to be released to the public.

Inflection has been guided by this approach since the very beginning. Our dedicated in-house safety team is tasked with conducting rigorous pre-launch safety reviews that combine both qualitative and quantitative evaluations of a candidate model for release. Failure to meet our robust internal evaluation standards will bar a model from launch. This includes standards restricting our models from providing harmful information, engaging in hateful or discriminatory conduct, and distributing explicit content, among others. Importantly, this team is independent and expert: it is not beholden to a product or technical team, and is empowered to make hard, sometimes internally unpopular decisions that may require the company to delay a launch or limit or even scrap a feature.
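For illustration only, the sketch below shows one way such a pre-launch gate could be expressed in code: category-level evaluation results are compared against maximum acceptable violation rates, and a single failing category blocks the release. The category names, thresholds, and data structure are assumptions made for this example, not Inflection’s actual criteria or tooling.

```python
# Hypothetical pre-launch evaluation gate. Categories and thresholds are
# illustrative assumptions, not Inflection's real standards.
from dataclasses import dataclass


@dataclass
class EvalResult:
    category: str          # e.g. "harmful_information", "hate_speech"
    violation_rate: float  # fraction of red-team prompts that elicited a violation
    threshold: float       # maximum acceptable violation rate for this category


def clear_for_launch(results: list[EvalResult]) -> bool:
    """A candidate model is cleared only if every safety category passes."""
    failures = [r for r in results if r.violation_rate > r.threshold]
    for r in failures:
        print(f"blocked: {r.category} at {r.violation_rate:.1%} exceeds {r.threshold:.1%}")
    return not failures


# Example: one failing category is enough to bar the candidate from launch.
candidate_results = [
    EvalResult("harmful_information", violation_rate=0.002, threshold=0.005),
    EvalResult("hate_speech", violation_rate=0.012, threshold=0.005),
    EvalResult("explicit_content", violation_rate=0.001, threshold=0.005),
]
assert clear_for_launch(candidate_results) is False
```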

Red-teaming is and will continue to be the engine at the heart of our evaluation framework. Red teams provide the best indication of how a model will perform in real-world situations when released publicly, and expose the model to a wide range of pressures and exploits that surface any vulnerabilities.

To do this, we rely on our safety team and also commission outside experts. Inflection is currently building teams of highly specialized red-teamers who bring unique expertise to model investigations that our in-house teams would not have the context to conduct effectively. This has included collaborations with mental health professionals to ensure that our models perform well in sensitive or difficult conversations with users, as well as an effort investigating whether our models present a biosecurity risk.

This domain-expert red-teaming is a growing component of our operations, consistent with the July voluntary commitments. Further expansion of these teams and capabilities, including on CBRN (chemical, biological, radiological, and nuclear) risks among others, is now underway and will mark the next phase of development in this area.

Area 3: Data Input Controls and Audits

Inflection trains its language models primarily on the same publicly available datasets used by many academic researchers and other AI companies. But, whatever the source, we recognize that the use of datasets for training must be done conscientiously, with due consideration to legal, safety, and ethical concerns.

Within Inflection, these issues receive the highest level of executive consideration. All data used in pre-training, fine-tuning, and any other forms of model refinement is reviewed by our legal team in collaboration with the Chief Executive Officer of the company. This evaluation addresses whether the data should be used at all, and – if acceptable – under what terms the data should be used.

Beyond this, we document our data practices and provide easy ways for users to reach out. Our Privacy Policy documents how we use data submitted by users post-deployment, and provides contact information for further inquiries on the lifecycle of data within Inflection.

Area 4: Security Controls Including Securing Model Weights

Commitment to practices like rigorous pre-deployment evaluation and red-teaming only contributes to greater global safety when accompanied by effective security controls. Leaked models can easily be put to malicious use, and the unauthorized disclosure of safety protocols and other forms of know-how may allow for harmful exploitation.

To that end, Inflection ensures a high level of security, both for internal systems and externally-facing products and services. This includes internal restrictions on data and code access, technical safeguards against unauthorized access, and a bug bounty (Area 7) that helps in proactively surfacing vulnerabilities. For security reasons it is not possible to disclose all the measures taken, but they are competitive with the best standards in the field.

Beyond this, the next phase of progress requires a cross-industry information sharing effort. Frontier AI companies are valuable targets for a variety of actors, many of whom may have cyber-capabilities posing a significant threat. By pooling intelligence about persistent threats among companies, the industry would be better able to coordinate on hardening its infrastructure and thwarting malicious attempts to access sensitive data.

Area 5: Preventing and Monitoring Model Misuse

Safety is not just a function of the model itself, but of the behavior of users as well. Sophisticated users with malicious intent may circumvent safety protocols and apply models to inappropriate or harmful purposes. Monitoring and rapid response are therefore a mandatory part of any framework for ensuring the safety of frontier AI systems.

Within Inflection, our safety team maintains around-the-clock monitoring and investigation of suspicious platform behaviors to identify issues and mitigate “in the wild” threats in real time. This includes periodic reviews examining the observed safety of production systems across critical areas. It also includes “tripwire” systems that immediately escalate unusual behaviors to the attention of our on-call safety experts, such as behavioral patterns associated with systematic efforts to undermine our model safety protocols. Once alerted, the safety team has authorization to leverage a full range of tools in blocking malicious actors from accessing our systems, and implementing mitigations to limit the harm of an unanticipated vulnerability on an ongoing basis.
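As a purely illustrative sketch (not a description of our production systems), a tripwire of this kind can be as simple as a sliding-window counter over per-account safety refusals that escalates to the on-call team once a threshold is crossed. The window size, threshold, and escalation path below are assumptions made for the example.

```python
# Illustrative tripwire: count safety refusals per account within a sliding window
# and escalate when a threshold is crossed. All parameters are assumed values.
from collections import defaultdict, deque
from time import time

WINDOW_SECONDS = 600             # 10-minute sliding window (assumed)
REFUSALS_BEFORE_ESCALATION = 5   # assumed threshold for alerting the on-call team

_recent_refusals: dict[str, deque] = defaultdict(deque)


def record_refusal(account_id: str, now: float | None = None) -> bool:
    """Record a safety refusal for an account; return True if escalation is triggered."""
    now = time() if now is None else now
    events = _recent_refusals[account_id]
    events.append(now)
    # Drop refusals that have aged out of the sliding window.
    while events and now - events[0] > WINDOW_SECONDS:
        events.popleft()
    if len(events) >= REFUSALS_BEFORE_ESCALATION:
        # In a real system this would page the on-call safety team; here we just report it.
        print(f"escalate: account {account_id} hit {len(events)} refusals within the window")
        return True
    return False
```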

Going forward, the next phase of development in this area is to apply frontier AI itself to maximize the situational awareness and effectiveness of our safety team. We are experimenting with using language models to identify misuse on the platform, which will help our efforts scale over time.

Area 6: Model Reporting and Information Sharing

Model reporting is a cornerstone of ensuring safety in the AI space. Without clear disclosures of the capabilities and limitations of a model, it will be challenging for downstream users, governments, and society at large to assess the risks it poses.

As a starting point, Inflection has, since its launch, maintained a public website disclosing its approach to safety. This site is intended to provide a candid, non-technical description of how Inflection develops its policies, aligns its models to those policies, and monitors model behavior. It also discloses a list of the known issues and limitations associated with our models. This page is periodically reviewed and updated as our thinking on safety and understanding of the risks continue to evolve.

Our next milestone is the publication of a more in-depth technical safety paper which will follow our June 2023 technical memo reviewing the performance of the Inflection-1 model against various benchmarks. We support the long-standing consensus around the value of model cards, and are implementing a process by which each new model released by Inflection will be accompanied by applicable safety documentation. Furthermore, we believe in the value of establishing official government channels for model reporting, and intend to work with policymakers to identify the most effective ways of doing so.

Area 7: Reporting Structure for Vulnerabilities

Inflection has been pleased to see greater recognition within the industry that security for AI depends not just on the internal efforts of technology developers, but also on creating open channels that allow vulnerabilities to be disclosed by third-parties. This draws on the hard-won experience within the cybersecurity space, and we applaud efforts to rapidly adapt those frameworks within the area of AI.

Since the July voluntary commitments, Inflection has implemented a closed pilot bug bounty program, which invites security researchers to proactively identify and report technical and model vulnerabilities to our team. This includes “traditional” security vulnerabilities that may pose risks to data or infrastructure, as well as vulnerabilities in the model itself, such as susceptibility to novel prompt engineering approaches.

Our experience with bug bounties confirms their value in the frontier AI context. Our pilot program has already improved our ability to proactively identify potential issues, enabling our safety team to implement fixes faster. This is only an initial step: the pilot phase allows us to calibrate the program’s policies so that it best accelerates the work of our internal safety team. Prior to the end of the year, we intend to open up this bug bounty program more widely to the public.

There remains considerable fragmentation across model bug bounty programs as to what constitutes a clear “vulnerability” and how to weight the severity of a given vulnerability. We feel that greater standardization for these disclosures may be a public good, making it easier for the security community to contribute to multiple bounty programs and build experience across various frontier AI systems. Inflection is seeking opportunities to work across the industry to achieve greater harmonization on this front in the coming months.

Area 8: Prioritising Research on Risks Posed by AI

AI’s impact on society will be significant and multifaceted. We are dedicated to “prioritizing research on the societal risks that AI systems can pose”, as described in the July voluntary commitments. Our progress on this front comprises both internal research and external work on governance.

Internally, Inflection believes that personal AIs can serve as empathetic companions that help people grow intellectually and emotionally over a period of years or even decades. Doing this well requires an understanding of the opportunities and risks that is grounded in long-standing research in the fields of psychology and sociology. We are presently building our internal research team on these issues, and will be releasing our research on these topics as we enter 2024.

On the governance front, we believe that much of the discourse around AI continues to be hindered by the lack of a respected body tasked with producing consistent, rigorous, and objective data about the global state of AI safety and best practice. Along with other industry leaders, for example, we have publicly called for the creation of an International Panel on AI Safety (IPAIS), which would mirror the role that the Intergovernmental Panel on Climate Change (IPCC) has played in the environmental context, and we have advocated for this body in a wide variety of fora. We will continue to pursue efforts to establish this body in the coming months, and maintain a wide set of engagements across the policy and regulation space.

Area 9: Identifiers of AI-Generated Material

Fast and effective identification of AI-generated material is an urgent area of concern. We believe that this problem is particularly challenging because it requires cross-industry collaboration to address effectively. Without standardized methods of watermarking and identification, the space will remain highly fragmented, leaving third-parties seeking to prevent harmful uses without straightforward means of identifying and attributing synthetic content. We are proactively talking to peers and others about how to establish and implement standards in this domain.
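To make the general idea concrete, the sketch below illustrates one simple provenance approach: signing generated content and its metadata so that parties holding the verification key can later confirm its origin. This is a hypothetical illustration of the concept only, not an implementation of any emerging industry standard or of Inflection’s systems, and the key handling shown is deliberately simplified.

```python
# Minimal provenance sketch: sign generated content with an HMAC so that holders
# of the key can verify its origin. Key names and fields are illustrative only.
import hashlib
import hmac
import json

# Placeholder signing key; a real deployment would manage keys securely.
PROVIDER_KEY = b"example-signing-key"


def attach_provenance(content: str, model_id: str) -> dict:
    """Bundle generated content with metadata and an HMAC signature over both."""
    record = {"content": content, "model_id": model_id}
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(PROVIDER_KEY, payload, hashlib.sha256).hexdigest()
    return record


def verify_provenance(record: dict) -> bool:
    """Confirm that the signature matches the content and metadata."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(PROVIDER_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record.get("signature", ""))


tagged = attach_provenance("Example AI-generated text.", model_id="example-model")
assert verify_provenance(tagged)
```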

One area that Inflection prioritizes here is child safety. We are supporting the development of industry standards and guidance to help address threats posed by generative AI to children. One component of this work is the standardization of watermarking and provenance indicators in generative AI, an important lever to prevent the abusive use of these tools for creating synthetic sexual imagery.