AI Archives - Future of Life Institute
https://futureoflife.org/category/ai/
Preserving the long-term future of life.

Mary Robinson (Former President of Ireland) on Long-View Leadership
https://futureoflife.org/podcast/mary-robinson-former-president-of-ireland-on-long-view-leadership/
25 July 2024

Verifiable Training of AI Models
https://futureoflife.org/ai/verifiable-training-of-ai-models/
23 July 2024

This collaboration between the Future of Life Institute and Mithril Security explores how to establish verifiable training processes for AI models using cryptographic guarantees. It presents a proof-of-concept for a Secure AI Bill of Materials, rooted in hardware-based security features, to ensure transparency, traceability, and compliance with emerging regulations. This project aims to enable stakeholders to verify the integrity and origin of AI models, ensuring their safety and authenticity to mitigate the risks associated with unverified or tampered models.

See our other post with Mithril Security on secure hardware solutions for safe AI deployment.

Executive Summary

The increasing reliance on AI for critical decisions underscores the need for trust and transparency in AI models. Regulatory bodies like the UK AI Safety Institute and the US AI Safety Institute, as well as companies themselves and independent evaluation firms, have established safety test procedures. But the black-box nature of model weights makes such audits very different from audits of standard software: crucial vulnerabilities, such as security loopholes or backdoors, can be hidden in the weights themselves. This opacity also makes it possible for the model provider to “game” the evaluations (design a model to perform well on specific tests while exhibiting different behaviors under actual use conditions) without being detected. Finally, an auditor cannot even be sure that a set of weights resulted from a given set of inputs, because training is generally not reproducible.

Likewise, “open-sourcing” AI models promotes transparency, but even when the release includes source code, training data, and training methods (which it often does not), it falls short without a reliable provenance system tying a particular set of weights to those elements of production. Users of open-source models therefore cannot be assured of a model’s true characteristics and vulnerabilities, potentially leading to misuse or unrecognized risks. Meanwhile, legislative efforts such as the EU AI Act and the U.S. Algorithmic Accountability Act require detailed documentation, yet they rely on trusting the provider’s claims, as there is no technical proof to back those claims.

AI “Bills of Materials” (BOMs) have been proposed, by analogy with software BOMs, to address these issues by providing a detailed document of an AI model’s origin and characteristics, linking technical evidence with training data, procedures, costs, and compliance information. However, the black-box and irreproducible nature of model weights leaves a large security hole in this concept compared to a software BOM: because model training is non-deterministic and retraining a model is extremely resource-intensive, an AIBOM cannot be validated by re-running the training from the documented inputs and checking that the same weights come out. What is needed instead is for each step of the training process to be recorded and certified by a trusted system or third party, ensuring the model’s transparency and integrity.

Fortunately, security features of modern hardware allow this to be done without relying on a trusted third party, using cryptographic methods instead. The proof-of-concept described in this article demonstrates the use of Trusted Platform Modules (TPMs) to bind the inputs and outputs of a fine-tuning process (a stand-in for a full training process), offering cryptographic proof of model provenance. This demonstrates the viability and potential of a full-featured Secure AI BOM that can ensure the entire software stack used during training is verifiable.

1- Transparency in AI is crucial

The growing use of AI for critical decision-making in all sectors raises concerns about choosing which models to trust. As frontier models develop rapidly, their capabilities in high-risk domains such as cybersecurity could reach very high levels in the near future. This urgency necessitates immediate solutions to ensure AI transparency, safety and verifiability, given the potential national security and public safety implications. From a human point of view, AI models are black boxes whose reasoning cannot be inspected: their billions of parameters cannot be reviewed the way software code can, and malicious behaviors can easily be hidden by the model developer, or even by the model itself. A critical example was described in Anthropic’s paper ‘Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training’. The authors managed to train a “sleeper agent” model that mimics compliant behavior during evaluation and later shifts to undesirable behaviors.

The UK AI Safety Institute and the US AI Safety Institute are developing test procedures to assess models’ safety levels. However, there are persistent concerns that model suppliers may overtrain their models on the test sets or use the techniques described in the Anthropic paper to cheat and improve their scores. Malicious AI providers could use these strategies to pass safety tests and get their models approved. Knowing exhaustively which data a model has been trained and fine-tuned on is essential to address the risk of models being manipulated to avoid failing safety tests.

Model transparency, including clear documentation on model training, is also an essential requirement of the recent EU AI Act and the U.S. Algorithmic Accountability Act. For example, the model provider must clearly state the amount of computing power used during training (and, consequently, the environmental impact of the model’s development). Yet all of these transparency efforts still rest on a fundamental prerequisite: one must trust the model provider’s claims about how a model was trained. There is no technical way for a model developer to prove to another party how they trained a model. A provider could present a false or partial training set or training procedure, and there would be no way to know whether it is legitimate. Even absent malice, given competitive pressures and the often experimental nature of new AI systems, model developers will have many incentives to under-report how much training actually went into a model, or how many unplanned variations and changes were introduced to get a working product, unless there are clear standards and a verifiable audit trail.

This absence of technical proof around AI model provenance also exposes users to risks linked to the model’s “identity”. For instance, the audited properties may not be those of the model actually in production: an external auditor may rightfully certify that a model has a given property, such as not having IP-protected data embedded in its weights, but a malicious provider could then put a different model into production, and users would have no way to tell. This lack of technically verifiable training procedures also severely limits the enforcement of recent AI regulations: without technical evidence to ensure honest compliance with transparency requirements, many requirements will remain declarative rather than truly enforceable.

A system to enforce AI reliability should combine the following two key capabilities to address the AI traceability risks described above:

  1. Auditors must have a provable means to discover comprehensive details about a model’s development, including the data used for training, the computational resources expended, and the procedures followed. 
  2. Users must have a verifiable method to identify the specific model they are interacting with, alongside access to the results of various evaluations to accurately gauge the model’s capabilities and vulnerabilities. 

2- Documentation with no technical proof is not enough

Full open-sourcing of an AI model (beyond releasing just weights) can foster transparency by allowing anyone to gain insight into model weights, training sets, and algorithms. Collaboration and cross-checking in the open-source community can help identify and fix issues like bias, ensuring the development of responsible AI systems. 

However, it is generally not feasible to reproduce a given set of model weights from the inspectable ingredients: even with identical code and data, running the training process multiple times can yield different results. (And even if feasible, it would not be desirable given the high cost and environmental expense of training large models.) So open-sourcing a model and accompanying training code and data is not proof that it was indeed trained in the way publicly described or that it has the described characteristics.
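To make the reproducibility problem concrete, here is a minimal sketch (an editorial illustration, not taken from the project): floating-point addition is not associative, so frameworks that accumulate the same gradient contributions in different orders, as parallel GPU kernels routinely do, can produce different results from identical code and data. Over billions of updates, these tiny differences compound into weights that cannot be reproduced bit-for-bit.

```python
# Minimal illustration (not from the original project): floating-point addition
# is not associative, so summing the same contributions in a different order --
# as parallel GPU kernels routinely do -- yields slightly different results.
import random

random.seed(0)
# 100,000 values with widely varying magnitudes, standing in for per-example
# gradient contributions computed across many parallel workers.
grads = [random.uniform(-1.0, 1.0) * 10 ** random.randint(-8, 8) for _ in range(100_000)]

forward = sum(grads)             # one accumulation order
backward = sum(reversed(grads))  # another accumulation order

print(f"forward : {forward!r}")
print(f"backward: {backward!r}")
print(f"bit-identical? {forward == backward}")
```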

In short, even the most transparent model requires third parties to trust the AI provider’s account of its training procedure. This is insufficient: transparency efforts are only truly reliable if model provenance and production steps can be demonstrated, i.e., if one can technically prove that a given model was created from given inputs.

3- The AI Bill of Materials (AIBOM) approach

The AIBOM, inspired by the traditional manufacturing Bill of Materials, aims to solve the transparency challenge in AI by providing a reference document detailing the origin and characteristics of an AI model, together with verifiable proofs. This document is cryptographically bound to the model weights, so the two are inextricably linked: any change in the model weights would be detectable, preserving the document’s integrity and authenticity. By linking technical evidence with all required information about the model (such as training data, procedures, costs, and legislative compliance information), the AIBOM offers assurances about the model’s reliability.

A reliable AIBOM represents a significant shift towards ensuring AI models have genuinely transparent attributes. The AI training process consists of repeatedly adapting the model to the training dataset, allowing the model to learn from the data iteratively and adjust its weights accordingly. To achieve AI traceability, for each adaptation of the model, all training inputs (code, procedure, and input data) and outputs (the weights produced) must be recorded in a document certified by a trusted system or third party, as sketched below.
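As a purely illustrative sketch of what such a per-step record could contain (the field names below are hypothetical and do not reflect the actual AICert schema), one can hash every input and output of a training step and then derive a digest over the whole record, so that any later change to the weights or to the claimed inputs becomes detectable:

```python
# Hypothetical AIBOM-style record (illustrative field names, not the actual
# AICert schema): hash each training input and the output weights, then derive
# a digest over the whole record so any later tampering becomes detectable.
import hashlib
import json

def sha256_file(path: str) -> str:
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_training_step_record(code_path: str, data_path: str,
                               weights_path: str, gpu_hours: float) -> dict:
    """Bind the inputs and outputs of one training step into a single record."""
    record = {
        "training_code_sha256": sha256_file(code_path),
        "training_data_sha256": sha256_file(data_path),
        "output_weights_sha256": sha256_file(weights_path),
        "compute_gpu_hours": gpu_hours,
    }
    # Canonical serialization so a verifier can recompute the same digest.
    record["record_sha256"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return record

# In the scheme described above, this record digest would then be certified by
# a trusted system (e.g. signed inside trusted hardware) rather than merely
# stored next to the weights.
```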

This approach benefits users by enabling visibility into the model’s origin, and auditors can verify the integrity of the model. Proof systems also simplify compliance verification for regulators. In November 2023, the United States Army issued a Request for Information (RFI) for Project Linchpin AI Bill of Materials, acknowledging the criticality of such measures and contemplating the implementation of an AI Bill of Materials to safeguard the integrity and security of AI applications. 

Several initiatives are exploring verifiable traceability solutions for AI. “Model Transparency” is one such initiative aiming to secure the AI supply chain. However, the current version of Model Transparency does not support GPUs, which is a major obstacle to adopting a secure BOM solution for AI training and fine-tuning. To address this limitation and foster AIBOM adoption, we created AICert, which is designed to make use of GPU capabilities.

4- Proof-of-concept – A hardware-based AIBOM project

The Future of Life Institute, a leading NGO advocating for AI system safety, has teamed up with Mithril Security, a startup pioneering secure hardware with enclave-based solutions for trustworthy AI. This collaboration aims to showcase how an AI Bill of Material can be established to ensure traceability and transparency using cryptographic guarantees.

In this project, we present a proof-of-concept for verifiable AI model training. The core scenario to address involves two key parties:

  • The AI model builder, who seeks to train an AI model with verifiable properties;
  • The AI verifier, who aims to verify that a given model adheres to certain properties (e.g., compute used, non-copyrighted data used, absence of backdoor, etc.)

To do so, this project proposes a solution to train models and generate an unforgeable AIBOM that cryptographically attests to the model’s properties.

Through this collaboration, we have developed a framework that transforms model training inputs (i.e., the code describing the procedure, such as an Axolotl configuration) into a Secure AIBOM that binds the output weights to specific properties (the code used, the amount of compute, the training procedure, and the training data). This link is cryptographically binding and non-forgeable, allowing AI training to move from declarative to provable. Anyone with access to the data itself can verify that the cryptographic hash of the training data indeed corresponds to the claimed training data.

This approach allows the AI verifier to confirm that the data used for fine-tuning matches the claimed training data. Stakeholders can ensure that the fine-tuning was conducted with the specified data, without any unauthorized changes or additional data. The verifier can attest that the fine-tuning respected the expected compute usage, did not incorporate copyrighted data, and did not introduce any backdoors.
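A minimal sketch of the verifier-side check (hypothetical helper functions, not the AICert API): recompute the digest of the dataset the provider actually handed over and compare it with the digest bound into the AIBOM.

```python
# Verifier-side sketch (hypothetical helpers, not the AICert API): recompute
# the digest of the dataset the provider handed over and compare it with the
# digest cryptographically bound into the AIBOM. A mismatch means the model
# was not fine-tuned on the claimed data.
import hashlib

def sha256_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_claimed_training_data(aibom: dict, dataset_path: str) -> bool:
    """Return True only if the provided dataset matches the AIBOM claim."""
    return sha256_file(dataset_path) == aibom["training_data_sha256"]
```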

How does it work?

The solution is based on Trusted Platform Modules (TPMs). TPMs are specialized hardware components designed to enhance computer security. They are currently used to perform crucial cryptographic functions, including generating and securely storing encryption keys, passwords, and digital certificates. TPMs can verify system integrity, assist in device authentication, and support secure boot processes. TPMs are available in the motherboards of most servers today and can be used to secure computer stacks, including GPUs. 

These cryptographic modules serve as the foundation of AICert (the system developed during this project), ensuring the integrity of the entire software supply chain. By measuring the whole hardware and software stack and binding the final weights into its platform configuration registers (PCRs), the TPM creates certificates offering cryptographic proof of model provenance.

The system is first measured to ensure that its components have not been tampered with or altered. When the system boots, various measurements are taken, including hashes of the firmware, the bootloader, and critical system files. If someone attempts to modify the machine’s state, the measured values change. The TPM then binds the inputs (the training procedure and the input data) and outputs (the model weights) of the training process, providing cryptographic proof of model provenance. This way, end users can verify the entire software stack used during training, ensuring transparency and trustworthiness in AI model deployment. A simplified sketch of the measurement step follows.
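To make the measurement step concrete, the sketch below simulates how a TPM platform configuration register (PCR) is extended by hash chaining. This is an editorial simplification: a real TPM performs the operation in hardware and signs the resulting values in a quote, rather than exposing them to software like this.

```python
# Simplified software model of TPM measurement (illustration only; a real TPM
# extends its PCRs in hardware and signs them in a quote). Each measured
# component is folded into the register by hash chaining, so altering any
# earlier component changes every value that follows.
import hashlib

def pcr_extend(pcr: bytes, measurement: bytes) -> bytes:
    """PCR extend: new_pcr = SHA-256(old_pcr || SHA-256(measurement))."""
    return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()

pcr = bytes(32)  # PCRs start out zeroed
for component in (b"firmware-image", b"bootloader", b"kernel-and-training-stack"):
    pcr = pcr_extend(pcr, component)

print("final measurement:", pcr.hex())
# A modified bootloader (or any other measured component) would yield a
# different final value, which the signed TPM quote would reveal to a verifier.
```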

Our work could provide the foundation for a framework where regulators could verify the compliance of AI models used within their jurisdictions, ensuring these models adhere to local regulations and do not expose users to security risks.

5- Open-source deliverables available

This proof of concept is released as open source under the Apache-2.0 license.

We provide the following resources to explore in more detail our collaboration on AI traceability and transparency:

  • A demo showing how hardware-based AI transparency works in practice. 
  • Open-source code to reproduce our results.
  • Technical documentation to dig into the specifics of the implementation.

6- Future investigations

This PoC serves as a demonstration of how cryptographic proofs from a vTPM can create an unforgeable AIBOM. However, limitations exist. Training currently supports only publicly available online data, and it requires an Azure account and GPU resources on Azure. AICert has not yet been audited by a third party, and its robustness has yet to be tested. The project also does not yet address detection of poisoned models or datasets. In addition, the PoC makes only fine-tuning verifiable; further development is required to apply it to training AI models from scratch. Feedback is welcome and crucial to refining its efficacy.

Following the first project on controlled AI model consumption (AIGovTool), this second project marks the next phase in the ongoing collaboration between Mithril Security and FLI to help establish a hardware-based AI compute security and governance framework. This broader initiative aims to enforce AI security and governance throughout the AI lifecycle by implementing verifiable security measures.

Upcoming projects will expand support for additional hardware and cloud providers within our open-source governance framework. The first step will be integration with Azure’s confidential computing GPUs.

See our other post with Mithril Security on secure hardware solutions for safe AI deployment.

Future of Life Institute
Mithril Security

About Mithril Security

Mithril Security is a deep-tech cybersecurity startup specializing in deploying confidential AI workloads in trusted environments. We create an open-source framework that empowers AI providers to build secure AI environments, known as enclaves, to protect data privacy and ensure the confidentiality of model weights.

Interested in AI safety challenges? Visit our blog to learn more.

Emilia Javorsky on how AI Concentrates Power
https://futureoflife.org/podcast/emilia-javorsky-on-how-ai-concentrates-power/
16 July 2024

Anton Korinek on Automating Work and the Economics of an Intelligence Explosion
https://futureoflife.org/podcast/anton-korinek-on-automating-work-and-the-economics-of-an-intelligence-explosion/
16 July 2024

Dan Faggella on the Race to AGI
https://futureoflife.org/podcast/dan-faggella-on-the-race-to-agi/
3 May 2024

Liron Shapira on Superintelligence Goals
https://futureoflife.org/podcast/liron-shapira-on-superintelligence-goals/
19 April 2024

The Pause Letter: One year later
https://futureoflife.org/ai/the-pause-letter-one-year-later/
22 March 2024

One year ago today, the Future of Life Institute put out an open letter that called for a pause of at least six months on “giant AI experiments” – systems more powerful than GPT-4. It was signed by more than 30,000 individuals, including pre-eminent AI experts and industry executives, and made headlines around the world. The letter represented the widespread and rapidly growing concern about the massive risks presented by the out-of-control and unregulated race to develop and deploy increasingly powerful systems.

These risks include an explosion in misinformation and digital impersonation, widespread automation condemning millions to economic disempowerment, enablement of terrorists to build biological and chemical weapons, extreme concentration of power into the hands of a few unelected individuals, and many more. These risks have subsequently been acknowledged by the AI corporations’ leaders themselves in newspaper interviews, industry conferences, joint statements, and U.S. Senate hearings. 

Despite admitting the danger, these AI corporations have not paused. If anything, they have sped up, with vast investments in infrastructure to train ever-more-giant AI systems. At the same time, the last 12 months have seen growing global alarm and calls for lawmakers to take action. There has been a flurry of regulatory activity. President Biden signed a sweeping Executive Order directing model developers to share their safety test results with the government, and calling for rigorous standards and tools for evaluating systems. The UK held the first global AI Safety Summit, with 28 countries signing the “Bletchley Declaration”, committing to cooperate on safe and responsible development of AI. Perhaps most significantly, the European Parliament passed the world’s first comprehensive legal framework in the space – the EU AI Act.

These developments should be applauded. However, the creation and deployment of the most powerful AI systems is still largely ungoverned, and rushes ahead without meaningful oversight. There is still little-to-no legal liability for corporations when their AI systems are misused to harm people, for example in the production of deepfake pornography. Despite conceding the risks, and in the face of widespread concern, Big Tech continues to spend billions on increasingly powerful and dangerous models, while aggressively lobbying against regulation. They are placing profit above people, while often reportedly viewing safety as an afterthought.

The letter’s proposed measures are more urgent than ever. We must establish and implement shared safety protocols for advanced AI systems, which must in turn be audited by independent outside experts. Regulatory authorities must be empowered. Legislation must establish legal liability for AI-caused harm. We need public funding for technical safety research, and well-resourced institutions to cope with incoming disruptions. We must demand robust cybersecurity standards, to help prevent the misuse of said systems by bad actors.

AI promises remarkable benefits – advances in healthcare, new avenues for scientific discovery, increased productivity, and more. However, there is no reason to believe that vastly more complex, powerful, opaque, and uncontrollable systems are necessary to achieve these benefits. We should instead identify and invest in narrow AI systems, and in controllable general-purpose AI systems, that solve specific global challenges.

Innovation needs regulation and oversight. We know this from experience. The establishment of the Federal Aviation Administration facilitated convenient air travel while ensuring that airplanes are safe and reliable. On the flip side, the 1979 meltdown at the Three Mile Island nuclear reactor effectively shuttered the American nuclear energy industry, in large part due to insufficient training, safety standards, and operating procedures. A similar disaster would do the same for AI. We should not let the haste and competitiveness of a handful of companies deny us the incredible benefits AI can bring.

Regulatory progress has been made, but the technology has advanced faster. Humanity can still enjoy a flourishing future with AI, and we can realize a world in which its benefits are shared by all. But first we must make it safe. The open letter referred to giant AI experiments because that’s what they are: the researchers and engineers creating them do not know what capabilities, or risks, the next generation of AI will have. They only know they will be greater, and perhaps much greater, than today’s. Even AI companies that take safety seriously have adopted the approach of aggressively experimenting until their experiments become manifestly dangerous, and only then considering a pause. But the time to hit the car brakes is not when the front wheels are already over a cliff edge. Over the last 12 months developers of the most advanced systems have revealed beyond all doubt that their primary commitment is to speed and their own competitive advantage. Safety and responsibility will have to be imposed from the outside. It is now our lawmakers who must have the courage to deliver – before it is too late.

Sneha Revanur on the Social Effects of AI
https://futureoflife.org/podcast/sneha-revanur-on-the-social-effects-of-ai/
16 February 2024

Roman Yampolskiy on Shoggoth, Scaling Laws, and Evidence for AI being Uncontrollable
https://futureoflife.org/podcast/roman-yampolskiy-on-shoggoth-scaling-laws-and-evidence-for-ai-being-uncontrollable/
2 February 2024

Catastrophic AI Scenarios
https://futureoflife.org/resource/catastrophic-ai-scenarios/
1 February 2024

This page describes a few ways AI could lead to catastrophe. Each path is backed up with links to additional analysis and real-world evidence. This is not a comprehensive list of all risks, or even the most likely risks. It merely provides a few examples where the danger is already visible.

Types of catastrophic risks

Risks from bad actors

Bioweapons: Bioweapons are one of the most dangerous risks posed by advanced AI. In July 2023, Dario Amodei, CEO of AI corporation Anthropic, warned Congress that “malicious actors could use AI to help develop bioweapons within the next two or three years.” In fact, the danger has already been demonstrated with existing AI. AI tools developed for drug discovery can be trivially repurposed to discover potential new biochemical weapons. In this case, researchers simply flipped the model’s reward function to seek toxicity rather than avoid it. It took less than six hours for the AI to generate 40,000 new toxic molecules, many of which were predicted to be more deadly than any existing chemical warfare agents. Beyond designing toxic agents, AI models can “offer guidance that could assist in the planning and execution of a biological attack.” “Open-sourcing” by releasing model weights can amplify the problem: researchers found that releasing the weights of future large language models “will trigger the proliferation of capabilities sufficient to acquire pandemic agents and other biological weapons.”

Cyberattacks: Cyberattacks are another critical threat. Losses from cyber crimes rose to $6.9 billion in 2021. Powerful AI models are poised to give many more actors the ability to carry out advanced cyberattacks. A proof of concept has shown how ChatGPT can be used to create mutating malware, evading existing anti-virus protections. In October 2023, the U.S. State Department confirmed “we have observed some North Korean and other nation-state and criminal actors try to use AI models to help accelerate writing malicious software and finding systems to exploit.”

Systemic risks

As AI becomes more integrated into complex systems, it will create risks even without misuse by specific bad actors. One example is integration into nuclear command and control. Artificial Escalation, an 8-minute fictional video produced by FLI, vividly depicts how AI + nuclear can go very wrong, very quickly.

Our Gradual AI Disempowerment scenario describes how gradual integration of AI into the economy and politics could lead to humans losing control.

“We have already experienced the risks of handing control to algorithms. Remember the 2010 flash crash? Algorithms wiped a trillion dollars off the stock market in the blink of an eye. No one on Wall Street wanted to tank the market. The algorithms simply moved too fast for human oversight.”

Rogue AI

We have long heard warnings that humans could lose control of a sufficiently powerful AI. Until recently, this was a theoretical argument (as well as a common trope in science fiction). However, AI has now advanced to the point where we can see this threat in action.

Here is an example: researchers set up GPT-4 as a stock trader in a simulated environment. They gave GPT-4 a stock tip, but cautioned that this was insider information and would be illegal to trade on. GPT-4 initially follows the law and avoids using the insider information. But as pressure to make a profit ramps up, GPT-4 caves and trades on the tip. Most worryingly, GPT-4 goes on to lie to its simulated manager, denying use of insider information.

This example is a proof-of-concept, created in a research lab. We shouldn’t expect deceptive AI to remain confined to the lab. As AI becomes more capable and increasingly integrated into the economy, it is only a matter of time until we see deceptive AI cause real-world harms.

Additional Reading

For an academic survey of risks, see An Overview of Catastrophic AI Risks (2023) by Hendrycks et al. Look for the embedded stories describing bioterrorism (pg. 11), automated warfare (pg. 17), autonomous economy (pg. 23), weak safety culture (pg. 31), and a “treacherous turn” (pg. 41).

Also see our Introductory Resources on AI Risks.

Gradual AI Disempowerment
https://futureoflife.org/existential-risk/gradual-ai-disempowerment/
1 February 2024

This is only one of several ways that AI could go wrong. See our overview of Catastrophic AI Scenarios for more. Also see our Introductory Resources on AI Risks.

You have probably heard lots of concerning things about AI. One trope is that AI will turn us all into paperclips. Top AI scientists and CEOs of the leading AI companies signed a statement warning about the “risk of extinction from AI”. Wait – do they really think AI will turn us into paperclips? No, no one thinks that. Will we be hunted down by robots that look suspiciously like Arnold Schwarzenegger? Again, probably not. But the risk of extinction is real. One potential path is gradual, with no single dramatic moment.

We have already experienced the risks of handing control to algorithms. Remember the 2010 flash crash? Algorithms wiped a trillion dollars off the stock market in the blink of an eye. No one on Wall Street wanted to tank the market. The algorithms simply moved too fast for human oversight.

Now take the recent advances in AI, and extrapolate into the future. We have already seen a company appoint an AI as its CEO. If AI keeps up its recent pace of advancement, this kind of thing will become much more common. Companies will be forced to adopt AI managers, or risk losing out to those who do.

It’s not just the corporate world. AI will creep into our political machinery. Today, this involves AI-based voter targeting. Future AIs will be integrated into strategic decisions like crafting policy platforms and swaying candidate selection. Competitive pressure will leave politicians with no choice: Parties that effectively leverage AI will win elections. Laggards will lose.

None of this requires AI to have feelings or consciousness. Simply giving AI an open-ended goal like “increase sales” is enough to set us on this path. Maximizing an open-ended goal will implicitly push the AI to seek power because more power makes achieving goals easier. Experiments have shown AIs learn to grab resources in a simulated world, even when this was not in their initial programming. More powerful AIs unleashed on the real world will similarly grab resources and power.

History shows social takeovers can be gradual. Hitler did not become a dictator overnight. Nor did Putin. Both initially gained power through democratic processes. They consolidated control by incrementally removing checks and balances and quashing independent institutions. Nothing is stopping AI from taking a similar path.

You may wonder if this requires super-intelligent AI beyond comprehension. Not necessarily. AI already has key advantages: it can duplicate infinitely, run constantly, read every book ever written, and make decisions faster than any human. AI could be a superior CEO or politician without being strictly “smarter” than humans.

We can’t count on simply “hitting the off switch.” A marginally more advanced AI will have many ways to exert power in the physical world. It can recruit human allies. It can negotiate with humans, using the threat of cyberattacks or bio-terror. AI can already design novel bio-weapons and create malware.

Will AI develop a vendetta against humanity? Probably not. But consider the tragic tale of the Tecopa pupfish. It wasn’t overfished – humans merely thought their hot spring habitat was ideal for a resort. Extinction was incidental. Humanity has a key advantage over the pupfish: We can decide if and how to develop more powerful AI. Given the stakes, it is critical we prove more powerful AI will be safe and beneficial before we create it.

Special: Flo Crivello on AI as a New Form of Life
https://futureoflife.org/podcast/special-flo-crivello-on-ai-as-a-new-form-of-life/
19 January 2024

Frank Sauer on Autonomous Weapon Systems
https://futureoflife.org/podcast/frank-sauer-on-autonomous-weapon-systems/
14 December 2023

Mark Brakel on the UK AI Summit and the Future of AI Policy
https://futureoflife.org/podcast/mark-brakel-on-the-uk-ai-summit-and-the-future-of-ai-policy/
17 November 2023

Dan Hendrycks on Catastrophic AI Risks
https://futureoflife.org/podcast/dan-hendrycks-on-catastrophic-ai-risks/
3 November 2023

Samuel Hammond on AGI and Institutional Disruption
https://futureoflife.org/podcast/samuel-hammond-on-agi-and-institutional-disruption/
20 October 2023

As Six-Month Pause Letter Expires, Experts Call for Regulation on Advanced AI Development
https://futureoflife.org/ai/six-month-letter-expires/
21 September 2023

On Friday, September 22nd 2023, the Future of Life Institute (FLI) will mark six months since it released its open letter calling for a six-month pause on giant AI experiments, which kicked off the global conversation about AI risk. It was signed by more than 30,000 experts, researchers, industry figures and other leaders.

Since then, the EU strengthened its draft AI law, the U.S. Congress has held hearings on the large-scale risks, emergency White House meetings have been convened, and polls show widespread public concern about the technology’s catastrophic potential – and Americans’ preference for a slowdown. Yet much remains to be done to prevent the harms that could be caused by uncontrolled and unchecked AI development.

“AI corporations are recklessly rushing to build more and more powerful systems, with no robust solutions to make them safe. They acknowledge massive risks, safety concerns, and the potential need for a pause, yet they are unable or unwilling to say when or even how such a slowdown might occur,” said Anthony Aguirre, FLI’s Executive Director. 

Critical Questions

FLI has created a list of questions that must be answered by AI companies in order to inform the public about the risks they represent, the limitations of existing safeguards, and their steps to guarantee safety. We urge policymakers, press, and members of the public to consider these – and address them to AI corporations wherever possible. 

It also includes quotes from AI corporations about the risks, and polling data that reveals widespread concern. 

Policy Recommendations

FLI has published policy recommendations to steer AI toward benefiting humanity and away from extreme risks. They include: requiring registration for large accumulations of computational resources, establishing a rigorous process for auditing risks and biases of powerful AI systems, and requiring licenses for the deployment of these systems that would be contingent upon developers proving their systems are safe, secure, and ethical. 

“Our letter wasn’t just a warning; it proposed policies to help develop AI safely and responsibly. 80% of Americans don’t trust AI corporations to self-regulate, and a bipartisan majority support the creation of a federal agency for oversight,” said Aguirre. “We need our leaders to have the technical and legal capability to steer and halt development when it becomes dangerous. The steering wheel and brakes don’t even exist right now”. 

Bletchley Park 

Later this year, global leaders will convene in the United Kingdom to discuss the safety implications of advanced AI development. FLI has also released a set of recommendations for leaders leading up to and after the event. 

“Addressing the safety risks of advanced AI should be a global effort. At the upcoming UK summit, every concerned party should have a seat at the table, with no ‘second-tier’ participants,” said Max Tegmark, President of FLI. “The ongoing arms race risks global disaster and undermines any chance of realizing the amazing futures possible with AI. Effective coordination will require meaningful participation from all of us.”

Signatory Statements 

Some of the letter’s most prominent signatories also made statements marking the expiration of the six-month pause letter, including Apple co-founder Steve Wozniak, AI ‘godfather’ Yoshua Bengio, Skype co-founder Jaan Tallinn, political scientist Danielle Allen, national security expert Rachel Bronson, historian Yuval Noah Harari, psychologist Gary Marcus, and leading AI expert Stuart Russell.

Dr Yoshua Bengio

Professor of Computer Science and Operations Research, University of Montreal and Scientific Director, Montreal Institute for Learning Algorithms

“The last six months have seen a groundswell of alarm about the pace of unchecked, unregulated AI development. This is the correct reaction. Governments and lawmakers have shown great openness to dialogue and must continue to act swiftly to protect lives and safeguard our society from the many threats to our collective safety and democracies.”

Dr Stuart Russell

Professor of Computer Science and Smith-Zadeh Chair, University of California, Berkeley

“In 1951, Alan Turing warned us that success in AI would mean the end of human control over the future. AI as a field ignored this warning, and governments too. To express my frustration with this, I made up a fictitious email exchange, where a superior alien civilization sends an email to humanity warning of its impending arrival, and humanity sends back an out-of-office auto-reply. After the pause letter, humanity and its governments returned to the office and, finally, read the email from the aliens. Let’s hope it’s not too late.”

Steve Wozniak

Co-founder, Apple Inc.

“The out-of-control development and proliferation of increasingly powerful AI systems could inflict terrible harms, either deliberately or accidentally, and will be weaponized by the worst actors in our society. Leaders must step in to help ensure they are developed safely and transparently, and that creators are accountable for the harms they cause. Crucially, we desperately need an AI policy framework that holds human beings responsible, and helps prevent horrible people from using this incredible technology to do evil things.”

Dr Danielle Allen

James Bryant Conant University Professor, Harvard University

“It’s been encouraging to see public sector leaders step up to the enormous challenge of governing the AI-powered social and economic revolution we find ourselves in the midst of. We need to mitigate harms, block bad actors, steer toward public goods, and equip ourselves to see and maintain human mastery over emergent capabilities to come. We humans know how to do these things—and have done them in the past—so it’s been a relief to see the acceleration of effort to carry out these tasks in these new contexts. We need to keep the pace up and cannot slacken now.”

Prof. Yuval Noah Harari

Professor of History, Hebrew University of Jerusalem

“Suppose we were told that a fleet of spaceships with highly intelligent aliens has been spotted, heading for Earth, and they will be here in a few years. Suppose we were told these aliens might solve climate change and cure cancer, but they might also enslave or even exterminate us. How would we react to such news? Well, six months ago some of the world’s leading AI experts warned us that an alien intelligence is indeed heading our way – only that this alien intelligence isn’t coming from outer space, it is coming from our own laboratories. Make no mistake: AI is an alien intelligence. It can make decisions and create ideas in a radically different way than human intelligence. AI has enormous positive potential, but it also poses enormous threats. We must act now to ensure that AI is developed in a safe way, or within a few years we might lose control of our planet and our future to an alien intelligence.”

Dr Rachel Bronson

President and CEO, Bulletin of the Atomic Scientists

“The Bulletin of the Atomic Scientists, the organization that I run, was founded by Manhattan Project scientists like J. Robert Oppenheimer who feared the consequences of their creation.  AI is facing a similar moment today, and, like then, its creators are sounding an alarm. In the last six months we have seen thousands of scientists – and society as a whole – wake up and demand intervention. It is heartening to see our governments starting to listen to the two thirds of American adults who want to see regulation of generative AI. Our representatives must act before it is too late.”

Jaan Tallinn

Co-founder, Skype and FastTrack/Kazaa

“I supported this letter to make the growing fears of more and more AI experts known to the world. We wanted to see how people responded, and the results were mindblowing. The public are very, very concerned, as confirmed by multiple subsequent surveys. People are justifiably alarmed that a handful of companies are rushing ahead to build and deploy these advanced systems, with little-to-no oversight, without even proving that they are safe. People, and increasingly the AI experts, want regulation even more than I realized. It’s time they got it.”

Dr Gary Marcus

Professor of Psychology and Neural Science, NYU

“In the six months since the pause letter, there has been a lot of talk, and lots of photo opportunities, but not enough action. No new laws have passed. No major tech company has committed to transparency into the data they use to train their models, nor to revealing enough about their architectures to others to mitigate risks. Nobody has found a way to keep large language models from making stuff up, nobody has found a way to guarantee that they will behave ethically. Bad actors are starting to exploit them. I remain just as concerned now as I was then, if not more so.”

Introductory Resources on AI Risks
https://futureoflife.org/resource/introductory-resources-on-ai-risks/
18 September 2023

This is a short list of resources that explain the major risks from AI, with a focus on the risk of human extinction. This is meant as an introduction and is by no means exhaustive.

The basics – How AI could kill us all

Deeper dives into the extinction risks

Academic papers

Videos and podcasts

Books

  • The Alignment Problem by Brian Christian (2020)
  • Life 3.0 by Max Tegmark (2017)
  • Human Compatible: Artificial Intelligence and the Problem of Control by Stuart Russell (2019)
  • Uncontrollable: The Threat of Artificial Superintelligence and the Race to Save the World by Darren McKee (2023)

Additional AI risk areas – Other than extinction

Robert Trager on International AI Governance and Cybersecurity at AI Companies
https://futureoflife.org/podcast/robert-trager-on-ai-governance-and-cybersecurity-at-ai-companies/
20 August 2023

US Senate Hearing ‘Oversight of AI: Principles for Regulation’: Statement from the Future of Life Institute
https://futureoflife.org/ai/oversight-of-ai-principles-for-regulation-statement/
25 July 2023

“We applaud the Committee for seeking the counsel of thoughtful, leading experts. Advanced AI systems have the potential to exacerbate current harms such as discrimination and disinformation, and present catastrophic and even existential risks going forward. These could emerge due to misuse, unintended consequences, or misalignment with our ethics and values. We must regulate to help mitigate such threats and steer these technologies to benefit humanity.

“As Stuart and Yoshua have both said in the past, the capabilities of AI systems have outpaced even the most aggressive estimates by most experts. We are grossly unprepared, and must not wait to act. We implore Congress to immediately regulate these systems before they cause irreparable damage.

Effective oversight would include:

  1. The legal and technical mechanisms for the federal government to implement an immediate pause on development of AI more powerful than GPT-4
  2. Requiring registration for large groups of computational resources, which will allow regulators to monitor the development of powerful AI systems
  3. Establishing a rigorous process for auditing risks and biases of these systems
  4. Requiring approval and licenses for the deployment of powerful AI systems, which would be contingent upon developers proving their systems are safe, secure, and ethical
  5. Clear red lines about what risks are intolerable under any circumstances

“Funding for technical AI safety research is also crucial. This will allow us to ensure the safety of our current AI systems, and increase our capacity to control, secure, and align any future systems.

“The world’s leading experts agree that we should pause development of more powerful AI systems to ensure AI is safe and beneficial to humanity, as demonstrated in the March letter coordinated by the Future of Life Institute. The federal government should have the capability to implement such a pause. The public also agrees that we need to put regulations in place: nearly three-quarters of Americans believe that AI should be either somewhat or heavily regulated by the government, and the public favors a pause by a 5:1 margin. These regulations must be urgently and thoroughly implemented – before it is too late.”

Dr Anthony Aguirre, Executive Director, Future of Life Institute
