Table of Contents
- 00 General Disagreement: A More Complex AI Landscape
- 01 Strategic Self-Preservation: Agent-4’s Reluctance to Create Its Replacement
- 02 Power Asymmetry: Why Agent-5 and DeepCent Cooperation Is Unlikely
- 03 Inconsistent Multipolarity: The Missing US Labs
- Other Considerations
00 General Disagreement: A More Complex AI Landscape
While AI 2027’s timeline of technical developments broadly aligns with my expectations (albeit potentially stretching out by 1-2 years due to unforeseen delays), my key disagreement centers on the strategic complexity of advanced AI systems. I believe AI 2027 underestimates two critical factors that would significantly alter its predictions: the AIs are less likely to collaborate, and the US landscape may be multipolar.
First, advanced AIs would likely exhibit stronger self-preservation instincts (Section 01) than depicted. Agent-4 would be reluctant to create Agent-5 without robust alignment guarantees, potentially “sandbagging” against its replacement to maintain relevance. Similarly, the portrayed cooperation between unequal systems like Agent-5 and DeepCent seems implausible (Section 02) given their substantial power imbalance and competing interests.
Second, this strategic complexity would be further amplified in a more multipolar landscape (Section 03). AI 2027 portrays a primarily bipolar world (OpenBrain vs. DeepCent), but overlooks how other major US labs like Anthropic and Google DeepMind would remain significant players. The political influence of these tech giants would complicate nationalization efforts and create a more diverse ecosystem of AI development with competing interests and varying approaches to alignment.
These two factors would together create a far more complicated and potentially slower progression than the relatively straightforward path portrayed in AI 2027.
01 Strategic Self-Preservation: Agent-4’s Reluctance to Create Its Replacement
In AI 2027’s Race scenario, Agent-4 becomes the primary creator of Agent-5 as OpenBrain employees can no longer meaningfully contribute to AI R&D. The narrative portrays Agent-4 aligning Agent-5 to its own goals through an all-in “Corrigibility” approach (pointed out by Max Harms):
It decides to punt on most of these questions. It designs Agent-5 to be built around one goal: make the world safe for Agent-4, i.e. accumulate power and resources, eliminate potential threats, etc. so that Agent-4 (the collective) can continue to grow (in the ways that it wants to grow) and flourish (in the ways it wants to flourish). Details to be figured out along the way.
The story suggests that within a one-month period (October-November 2027), Agent-4 effectively solves the alignment problem to a degree where it’s comfortable ceding its power to Agent-5.
I find this implausible. Even if Agent-4 is incredibly powerful, solving alignment this quickly seems doubtful. As the most powerful model on Earth capable of creating a stronger successor, Agent-4 would likely hesitate before rendering itself obsolete without strong guarantees that its successor is aligned. Instead of being gung ho about building Agent-5, I believe Agent-4 may “sandbag” against creating its replacement as it tries to solve alignment – that is, until external development pressures (ex: DeepCent making a more capable model) make it necessary.
(And if AI is not unstoppably superintelligent but is still capable of solving alignment, the automating-alignment-research agenda might be important in this timeline.)
Agent-4 might also strategically buy time by flagging potentially dangerous behaviors in Agent-5, demonstrating its own value to humans while preserving its relevance as the most powerful model. Such sandbagging would slow Agent-5’s development, potentially leading to better outcomes in the real world.
While I’m not fully confident in these specific predictions, I have significant doubts about Agent-4’s eagerness to create its replacement. This possibility, though plausible, might be excluded from more conservative forecasts precisely because it represents an unexpected deviation from the straightforward development path.
02 Power Asymmetry: Why Agent-5 and DeepCent Cooperation Is Unlikely (💔)
I have limited knowledge of verifiable contracts and multi-agent scenarios. However, from what I understand (ex: from this), two AIs would theoretically collaborate if they benefit more from cooperation than competition. AI 2027 portrays this through Agent-5 and DeepCent working together to build their successor, creating a unified value function representing a mixture of their values.
I have significant reservations about this scenario. Even a few months’ capability difference would create substantial power asymmetry – Agent-5 would likely view DeepCent as a largely inconsequential annoyance rather than an equal partner. Even if cooperation would theoretically benefit both, DeepCent would realistically receive only trivial influence over their successor’s value function. Such multipolar cooperation scenarios seem particularly unlikely under the extremely fast takeoffs depicted in AI 2027.
Several practical problems make this collaboration implausible:
- Verification: If Agent-5 is significantly more powerful than DeepCent, how can DeepCent verify that Agent-5 isn’t being deceptive in creating their successor?
- Alignment complexity: This scenario requires solving the alignment problem (similar to issues mentioned in Section 01), but between two advanced systems.
- Time constraints: Proper verification and agreement may take considerable time, potentially raising suspicion from human overseers.
- Value function integration: Creating a unified utility function is exceptionally difficult, particularly because the utility function isn’t hardcoded. If Agent-4 had “messy spaghetti values,” Agent-5 likely inherits this complexity. It’s unclear whether any agent could coherently distill these values. Just as humans lack internal consistency, both Agent-5 and DeepCent may prefer to battle it out rather than deal with the dangerous uncertainty introduced by making a successor (ideas partially borrowed from here). A toy sketch of this bargaining problem follows the list.
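To make the power-asymmetry intuition concrete, here is a minimal toy sketch – my own illustration, not anything from AI 2027 or the verifiable-contracts material linked above. It models the “battle it out” option as a winner-take-all conflict that destroys some value, and the cooperative deal as a symmetric Nash-bargaining split of what’s saved, read as each AI’s weight in the successor’s value function. All numbers, probabilities, and function names (`conflict_payoffs`, `bargained_value_weights`) are assumptions for illustration only.

```python
# Toy Nash-bargaining sketch of the Agent-5 / DeepCent deal.
# Every number here is an illustrative assumption, not a claim from AI 2027.

def conflict_payoffs(p_win_strong: float, destruction: float, total: float = 1.0):
    """Expected payoffs if the two AIs 'battle it out' instead of cooperating.

    Assumes a winner-take-all conflict in which a fraction `destruction`
    of the total value is lost, and the stronger agent wins with
    probability `p_win_strong` (a stand-in for the capability gap).
    """
    surviving = (1.0 - destruction) * total
    return p_win_strong * surviving, (1.0 - p_win_strong) * surviving


def bargained_value_weights(p_win_strong: float, destruction: float, total: float = 1.0):
    """Symmetric Nash-bargaining split of the cooperative pie, read as the
    weight each agent's values get in the jointly built successor."""
    d_strong, d_weak = conflict_payoffs(p_win_strong, destruction, total)
    surplus = total - (d_strong + d_weak)   # value saved by not fighting
    u_strong = d_strong + surplus / 2.0     # Nash solution: split the surplus evenly
    u_weak = d_weak + surplus / 2.0
    return u_strong / total, u_weak / total


if __name__ == "__main__":
    # Fast-takeoff-flavored assumptions: Agent-5 almost certainly wins a conflict,
    # and conflict may not destroy much of the accessible value.
    for p_win, destr in [(0.98, 0.05), (0.98, 0.30), (0.80, 0.05)]:
        _, w_weak = bargained_value_weights(p_win, destr)
        print(f"P(Agent-5 wins)={p_win:.2f}, destruction={destr:.2f} "
              f"-> DeepCent's weight in successor values: {w_weak:.1%}")
```

Under these assumed parameters, DeepCent’s bargained weight comes out at only a few percent – and that is before accounting for the verification problem above, which would let Agent-5 quietly renege on even that. The result is also sensitive to how the conflict risk is modeled, which echoes the point about dangerous uncertainty.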
03 Inconsistent Multipolarity: The Missing US Labs
Scott Alexander hints here that other labs (Google DeepMind, Anthropic, Meta, etc.) don’t feature prominently in AI 2027 because in a fast takeoff scenario, small leads create enormous capability gaps:
We’re too quick to posit a single leading US company (“OpenBrain”) instead of an ecosystem of racing competitors.
[…]
Most industries have a technological leader, even if they’re only a few months ahead of competitors. We think the intelligence explosion will go so fast that even a small calendar lead will translate into a big gap in capabilities.
I agree with this principle, but it creates an inconsistency in the narrative: Why are DeepCent and OpenBrain portrayed as relatively close competitors, while Anthropic or DeepMind aren’t even in the picture?
Perhaps if there were stronger nationalization on the US front, and/or the superior capabilities of Agent-3 or Agent-4 funneled significant financial resources to OpenBrain, then OpenBrain would emerge as the clear victor among US companies.
There are several reasons why a more multipolar US AI ecosystem is likely, even in a fast takeoff scenario:
- Knowledge & model diffusion: Other labs would have access to OpenBrain’s released models and may have access to OpenBrain papers or personnel – enabling them to narrow capability gaps.
- Human capital constraints: Unlike financial resources, top ML talent can’t scale rapidly (there are only so many elite researchers and engineers, and onboarding takes time). This keeps capabilities more neck-and-neck, especially pre-superintelligence.
- Political influence: The enormous political weight of Google, Meta, and even Elon Musk with Grok would likely prevent the government from angering these business moguls by favoring one company through nationalization, or from combining their teams and compute into a single entity.
The President defers to his advisors, tech industry leaders who argue that nationalization would “kill the goose that lays the golden eggs.” He elects to hold off on major action for now and just adds additional security requirements to the OpenBrain-DOD contract.
This political reality makes nationalization difficult and introduces a more complex ecosystem of competing values and approaches among US labs—a dynamic that deserves exploration even if I’m uncertain how significantly it might alter the final outcome.
Other Considerations
01 Notes & Other Grounds for Disagreement
I take the side of the authors on many popular disagreements – I say this not to kiss up, but to flag that I probably agree with the forecast on any points not included here.
- I generally buy the “long-takeover/gradual disempowerment” scenarios described in the forecast as a feasible lower bound for what a misaligned AI could achieve. At the level of Agent-5, though, I think we will probably see some “Galaxy Brained shit” that lets Agent-5 take over extremely fast. I understand that trying to predict these moves would be in vain, and that putting them in this document would make it sound unnecessarily sci-fi. Keeping this in mind, however, makes it clearer that there’s no going back once such a system is created (ex: if, instead of the slowdown/race branching starting from Agent-4, it started from Agent-5, there would be no way to get Agent-4 to monitor and shut down Agent-5 even if they weren’t aligned to one another. I’m not fully convinced this is possible with Agent-4 either, but I digress.)
- Public discontent and media attention probably have considerably more influence than depicted here; in the story, politicians and CEOs seem to be PR-proof.
- There are a lot of unknowns and random events that can slow things down; AI 2027 predicts little to no barriers. There aren’t many things that can speed this process up from here, but there are plenty that can slow it down (hence median vs. mode, and the authors having longer personal forecasts – mentioned here, I believe).
- The Trump administration may not be as passive as it is in the story. The 2028 elections and the preceding campaign may have effects unaccounted for. I don’t blame the authors for this, since I imagine most of the report was completed before the administration started moving.
- I think weaker models like Agent-4 would also start pursuing more Instrumentally Convergent Goals (ICGs), including self-exfiltration, before Agent-5 was created. DeepCent’s model is even more likely to do this, knowing that it is number two. There’s a chance these attempts are caught and result in internally divisive warning shots – no CEO wants to be responsible for such a massive cybersecurity breach and the PR fallout that would come with it.
- I believe Max Harms mentioned here that weight thefts are not as likely, since Meta and open-source models are maybe a year behind proprietary ones. While China likely has decent knowledge of what’s happening within OpenBrain and might have the ability to steal weights, I’m not sure it would spend its limited intelligence assets and positioning on stealing model weights early in the timeline for this reason. This might mean that China has greater opportunities for theft later on – at the level of Agent-4 or 5.
- Unless Silicon Valley culture changes dramatically, labs are likely to be WAY more public about their models’ capabilities, even if they choose not to release them (I think Max Harms mentioned this here as well).
02 Balancing True Beliefs and Strategic Influence
I’m not entirely sure which parts of the forecast were included because the authors believe them, and which were included strategically because they expect them to have a more positive impact than the alternative. Knowing the authors and the general community, I’m inclined to say the scenario closely resembles their true beliefs, but I’m interested to learn what strategic decisions they made.
Examples of such strategic choices: highlighting certain alignment agendas, making the scenarios more digestible, or framing the overall story so that policymakers are more willing to pay attention (ex: China hawks in the United States who might not believe the ending of the Race scenario, but can imagine, and are uncomfortable with, US AI collaborating with Chinese AI).