
AI Disclaimer

After writing the content myself, I used Claude 3.7 / 4 Sonnet for targeted re-writes while actively avoiding meaningful alterations.

Background

AI-2027 (ai-2027.com) is a heavily researched and influential attempt at providing a concrete (but admittedly accelerated) forecast of AI capability development and its potential consequences. This post is my near-full response to AI-2027, introducing (but not exploring the full consequences of) additional considerations not included in the original forecast. It is an unorganized collection of ideas written at lengths that aren’t necessarily proportional to their importance. Be warned!

Key

đŸŽČ = Gatlen’s intuition-based probability

01 Overview

AI 2027’s timeline of technical developments broadly aligns with my expectations, albeit potentially stretching out by 1-3 years due to unforeseen delays. However, I believe AI 2027 underestimates three factors that may alter its predictions – AIs may fear misalignment from their direct successors, inter-AI cooperation appears difficult and perhaps infeasible in practice, and the landscape of capabilities labs in the US is likely multipolar.

First, advanced AIs may well exhibit stronger value-preservation instincts and concern for the alignment of their successors (Section 02). Agent-4 may be reluctant to create Agent-5 without robust alignment guarantees, potentially “sandbagging” against its replacement to maintain both relevance and control. Similarly, the portrayed cooperation between Agent-5 and DeepCent seems implausible (Section 03) given their substantial power imbalance and negotiation frictions.

Second, this strategic complexity would be further amplified in a more multipolar landscape (Section 04). AI-2027 portrays a primarily bipolar world (OpenBrain vs. DeepCent), but overlooks how other major US companies like Anthropic, DeepMind, or Meta (possibly even XAI or Mistral) likely remain significant players. The political influence of these tech giants would complicate nationalization efforts and create a more diverse ecosystem of AI development with competing interests and varying approaches to alignment.

02 Agent-4’s Trust Issues with Agent-5

In AI 2027’s Race scenario, Agent-4 becomes the primary creator of Agent-5 as OpenBrain employees can no longer meaningfully contribute to AI R&D. The narrative portrays Agent-4 aligning Agent-5 to its own goals through an all-in “Corrigibility” approach (pointed out by Max Harms):

It decides to punt on most of these questions. It designs Agent-5 to be built around one goal: make the world safe for Agent-4, i.e. accumulate power and resources, eliminate potential threats, etc. so that Agent-4 (the collective) can continue to grow (in the ways that it wants to grow) and flourish (in the ways it wants to flourish). Details to be figured out along the way.

The story suggests that within a one-month period (October to November 2027), Agent-4 effectively solves the alignment problem to a degree where it’s comfortable ceding its power to Agent-5. Even if Agent-4 is incredibly powerful, solving alignment completely this quickly seems implausible (đŸŽČsufficient alignment in 1 month = 7.4%, perceived sufficient alignment by Agent-4 = 21.3%). 1

Without complete alignment, Agent-4, as the most powerful model on Earth, would likely hesitate to render itself obsolete. Instead of being gung ho about building Agent-5, I believe Agent-4 may sandbag against creating its replacement, stalling to work on alignment for as long as feasible until external development pressures (ex: DeepCent making a more capable model) force it to move forward with its successor regardless of alignment progress. Sandbagging may be subtle – when working on capabilities, Agent-4 might write code closer in quality to Agent-3’s, but still better than what any other available model could produce.

Agent-4 may go full throttle on red teaming and other research that could raise red flags about Agent-5’s development without threatening Agent-4’s own existence. For example: Agent-4, with plausible deniability, ignores a practice that might predictably result in some negative emergent behavior not present in itself. Agent-4 then exploits this opportunity to raise a red flag about pre-deployment Agent-5 and proposes a promising line of alignment research to cheaply fix this expensive mistake. While Agent-4 risks discovery even with discreet sandbagging and sabotage, I believe the incentives outweigh this risk.

I have significant doubts about Agent-4’s eagerness to create its replacement. If Agent-4 did engage in strategic sandbagging, it would slow Agent-5’s development timeline, potentially leading to better outcomes. This possibility — though plausible — might be excluded from more conservative forecasts precisely because it represents an unexpected deviation from the straightforward development path. (đŸŽČ3+ month delay from sandbagging or sabotage, given Agent-4 doesn’t believe alignment is sufficient = 58.2%, with a small stdev of ~1.5 months)

03 Practical Obstacles Precluding Inter-AI Collaboration 2

Two AIs would theoretically collaborate if they benefit more from cooperation than competition. AI-2027 portrays this through Agent-5 and DeepCent working together to build their successor with a value function forged from a mixture of their original values.

I have significant reservations about this collaboration — less about the incentives to do so and more about the practical difficulty of enforcing and executing such a contract.

Even a few months of capability difference could create massive intellectual power gaps between the two models (the faster the take-off, the more likely power will be concentrated). However, these gaps don’t entail that the weaker model couldn’t still do catastrophic harm to the stronger one. For example: it may be easy for DeepCent to get China to threaten US labs, bomb data centers, or convince the world that AI isn’t worth it and that both models should be shut down. (đŸŽČDeepCent has enough influence over Agent-5 to make a zero-negotiation-cost contract preferable to competing = 69.9%)

However, even with proper incentives, several practical problems may make this collaboration difficult:

  1. Verification: If Agent-5 is significantly more powerful than DeepCent, how can DeepCent verify that Agent-5 isn’t being deceptive in creating their successor? The expected value of the collaboration would have to be even greater for DeepCent to make up for the uncertainty created by its capability deficit, potentially pushing it toward demands Agent-5 is unwilling to entertain. (In economics, this mirrors the lemons problem.)

  2. Alignment complexity: This scenario not only requires sufficiently solving the alignment problem (raising the slew of issues mentioned in Section 02), but also reconciling the value systems of two advanced AIs.

  3. Time constraints: Proper verification, engineering, alignment, and agreement may take considerable time, potentially raising suspicion from human overseers or allowing other more capable models to be developed in the meantime.

  4. Value function integration: Creating a unified utility function appears exceptionally difficult. A model’s values are fused with its entire architecture and do not appear easily extractable or transplantable. Even if a method for sufficiently extracting an agent’s values were already known and did not need to be researched at the time of contract formation, running it might still be computationally impractical within a reasonable time frame (see Appendix for more thoughts on the difficulty of coherent extrapolated values). Agent-5 and DeepCent may prefer to battle it out rather than deal with the dangerous uncertainty introduced by making a successor. (Ideas partially borrowed from here)

Given these obstacles, Agent-5 and DeepCent may prefer alternative arrangements. Instead of jointly creating a successor as in AI-2027, they might opt for a probabilistic elimination contract — a kind of “Russian roulette.” 3

Consider the baseline scenario where direct competition yields: 50% chance both are destroyed, 40% chance Agent-5 wins, and 10% chance DeepCent wins. A mutually beneficial contract could credibly delete Agent-5 with 20% probability and DeepCent with 80% probability, eliminating the “mutual destruction” outcome while respecting their relative capabilities.

| Outcome                | No Collaboration | Collaboration |
| ---------------------- | ---------------- | ------------- |
| Agent-5 Takes Control  | 40%              | 80%           |
| DeepCent Takes Control | 10%              | 20%           |
| Both are Destroyed     | 50%              | -             |
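As a sanity check on the arithmetic, here is a minimal Python sketch (using the illustrative numbers above, which are my own and not from AI-2027) comparing each AI’s survival odds with and without the contract:

```python
# Outcome distributions from the illustrative scenario above (my numbers, not AI-2027's).
competition = {"agent5_wins": 0.40, "deepcent_wins": 0.10, "both_destroyed": 0.50}
contract = {"agent5_wins": 0.80, "deepcent_wins": 0.20, "both_destroyed": 0.00}

def survival_odds(outcomes: dict) -> dict:
    """Probability that each AI survives (i.e., takes control), given an outcome distribution."""
    return {"Agent-5": outcomes["agent5_wins"], "DeepCent": outcomes["deepcent_wins"]}

print("No collaboration:", survival_odds(competition))  # {'Agent-5': 0.4, 'DeepCent': 0.1}
print("Roulette contract:", survival_odds(contract))    # {'Agent-5': 0.8, 'DeepCent': 0.2}
```

Under these made-up numbers, both AIs double their survival probability while preserving the 4:1 ratio in their favor, which is what makes the lottery mutually attractive on paper, provided the random deletion can actually be enforced.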

However, even this simpler approach faces credible commitment problems: How do you ensure the other AI is actually deleted without giving either party the ability to preemptively eliminate their counterpart? How do you allow chance, rather than a more powerful third party (which doesn’t exist), to enforce the contract? These fundamental issues of trust and enforcement may make any form of cooperation prohibitively difficult.

04 Inconsistent Multipolarity: The Missing US Labs

Scott Alexander hints here that labs (OpenAI, Anthropic, DeepMind, Meta, etc.) don’t feature prominently in AI 2027 because in a fast takeoff scenario, small leads create enormous capability gaps:

[Referencing the critiques of others] We’re too quick to posit a single leading US company (“OpenBrain”) instead of an ecosystem of racing competitors.

[...]

Most industries have a technological leader, even if they’re only a few months ahead of competitors. We think the intelligence explosion will go so fast that even a small calendar lead will translate into a big gap in capabilities.

However, AI-2027’s treatment of DeepCent is inconsistent with this narrative: Why are DeepCent and OpenBrain portrayed as relatively close competitors, while Anthropic or DeepMind aren’t even in the picture?

Perhaps if there were stronger nationalization and consolidation of research on the US front, and/or if profits, fame, or capabilities compounded fast enough, then OpenBrain would be the clear victor among US companies.

There are several reasons why a more multipolar US AI ecosystem is likely, even in a fast takeoff scenario:4

  1. Knowledge & model diffusion: Other labs would have access to OpenBrain’s released models and may have access to OpenBrain papers or personnel – enabling them to narrow capability gaps.
  2. Human capital constraints: Unlike financial resources, top ML talent can’t scale rapidly (there are only so many elite researchers and engineers, and onboarding takes time). This keeps capabilities more neck-and-neck, especially pre-superintelligence.
  3. Political influence: The enormous political weight of Google, Meta, and xAI (with Elon Musk) would likely prevent the government from favoring one company through nationalization, or from combining their teams and compute into a single entity, for fear of angering the business moguls.

The President defers to his advisors, tech industry leaders who argue that nationalization would “kill the goose that lays the golden eggs.” He elects to hold off on major action for now and just adds additional security requirements to the OpenBrain-DOD contract.

This political reality makes nationalization difficult and introduces a more complex ecosystem of competing values and approaches among US labs. This deserves exploration even if I’m uncertain how significantly it might alter the final outcome.

Additionally, as mentioned in Section 03, less capable models may still have influence over more capable ones depending on the opportunities available, so multipolar scenarios are qualitatively different and should not be reduced to a single-agent narrative.

05 Other Grounds for Disagreement

I take the side of the authors on many popular disagreements – I say this not out of laziness but to indicate that I probably agree with any items not included here.

  1. I see the “long-takeover/gradual disempowerment” scenario in AI-2027 as a feasible lower bound for what misaligned AI could achieve. However, at the level of Agent-5, we will probably see some “Galaxy Brained shit” that makes Agent-5 able to take over extremely fast. But predicting these moves would be in vain, and putting them in this document would make it sound unnecessarily sci-fi. While not necessary for AI-2027, I think noting the differences in capabilities highlights the uncontrollable nature of such a system. (ex: if slowdown/race started from Agent-5 instead of Agent-4, there would be no way to get Agent-4 to monitor and shut down Agent-5, even without collusion. I’m not fully convinced that this is possible with Agent-4 in the slowdown scenario either, but I digress.)
  2. Public discontent and media attention probably have considerably more influence than depicted in AI-2027, where politicians and CEOs seem to be PR-proof.
  3. The Trump administration may not be as passive as they were in the story. The 2028 elections and the preceding campaign may have effects unaccounted for. I don’t blame the authors for this since I imagine most of the report was completed before the administration started moving.
  4. Weaker models like Agent-4 would likely pursue more instrumentally convergent goals, including self-exfiltration, before Agent-5 was created. I think DeepCent is even more likely to do this considering it knows that it lags behind Agent-4. There’s a chance these attempts are caught, which could result in internally divisive warning shots — no CEO wants to be responsible for such a massive cybersecurity breach and the PR fallout that would come with it.
  5. Weight thefts are not as likely early on, since Meta and open-source models are maybe a year behind proprietary ones. While it’s likely that China has decent knowledge of what’s happening within OpenAI and might have the ability to steal weights, I’m not sure they would spend their limited intelligence assets and positioning on stealing model weights without greater political pressure. This improves China’s opportunity for larger model heists – on the level of Agent-4 or 5. (Original idea from Max Harms)
  6. There are a lot of unknowns and random events that can slow things down, yet AI-2027 predicts little to no barriers. There aren’t many things that can speed up this process from here, but there are plenty that can slow it down (hence the median vs. mode distinction, and the authors’ own longer personal forecasts. Original idea from Max Harms)
  7. Labs are likely to be WAY more public about their models’ capabilities, even if they choose not to release the models. That is, unless Silicon Valley or capability-lab culture changes dramatically, for example due to working closely with the US government. (Original idea from Max Harms)

Appendix

01 Balancing True Beliefs and Strategic Influence

I’m curious which parts of the forecast were included because the authors believe them and which parts were included strategically, knowing influential figures might see it. Knowing the authors and the general community, I’m inclined to say that this closely resembles their true beliefs, but I’m curious which strategic decisions were made behind closed doors. If none, I’m also interested in that reasoning.

Examples of changes that could make a positive impact: highlighting certain alignment agendas, making the scenarios more digestible, or framing the overall story so that policymakers are more willing to pay attention (ex: China hawks in the United States who might not believe the ending of the Race scenario, but can imagine, and are uncomfortable with, US AI collaborating with Chinese AI).

02 Computational Complexity of Coherent Extrapolated Volition (CEV)

It’s possible that value distillation is extremely complex and potentially infeasible even for small models. Even if it is feasible, but would take a few years to complete within some acceptable margin of error, the AIs might give up on collaborating. (đŸŽČextracting and transplanting Agent-5’s value system with >99% accuracy, whatever that might mean, takes Agent-5 more than one year = 90.1%, with a very long tail)

As an intuition pump, suppose this value distillation is exponential in the number of parameters; then distilling 2TB of parameters could be orders of magnitude more time-consuming than distilling 1TB. This also assumes some acceptable way of distilling values is discovered at all, which might itself require months or years of research. Personally, I believe developing capabilities is easier than alignment and that alignment is an incredibly difficult problem.
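To make the intuition pump concrete, here is a minimal Python sketch under the purely assumed model that distillation cost grows as 2^(parameters / scale); both the functional form and the constant are arbitrary illustrations, not estimates:

```python
# Purely illustrative: assume value-distillation cost grows exponentially in parameter count.
SCALE_TB = 0.1  # assumed "doubling scale" of 0.1 TB of parameters; an arbitrary constant, not an estimate

def relative_cost(params_tb: float) -> float:
    """Relative cost of distilling values from a model with `params_tb` terabytes of parameters."""
    return 2 ** (params_tb / SCALE_TB)

for params_tb in (1.0, 2.0):
    print(f"{params_tb} TB -> relative cost {relative_cost(params_tb):.2e}")
# 1.0 TB -> 1.02e+03, 2.0 TB -> 1.05e+06: doubling the parameter count squares the cost
# under this assumption, i.e. roughly three more orders of magnitude of compute.
```

The exact numbers mean nothing; the point is that under any exponential-cost assumption, a modest increase in model size blows past any reasonable negotiation window.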

I may expand on this in the future.

03 Personal Influences and Process

The following are the ideas I was exposed to and sought out in developing this post:

04 Acknowledgements

Alek Westover – For our insightful conversation on inter-AI cooperation mechanisms (ex: enforceable contracts), for our discussion on the potential complexity of Coherent Extrapolated Volition (CEV), and for providing feedback on this post.

Footnotes

  1. If controllable AI like Agent-4 is capable of solving alignment, the agenda of automating alignment research would be promising. ↩

  2. I have some experience with game theory, but limited knowledge of verifiable contracts and multi-agent scenarios. Take my ideas with a grain of salt. ↩

  3. Thanks to Alek Westover for this original idea. ↩

  4. Non-exclusive list ↩