AI presents a looming threat to human life and humanity’s future — and potentially even life beyond Earth. As I’ve delved deeper into this field, I’ve become convinced that mitigating these risks may be one of the most crucial challenges of our time. The thought of humanity, in its youthful missteps, losing its dreams and potential to unforeseen technological dangers is profoundly tragic.

01 The Urgency of the Moment

Model capabilities are improving at a breakneck pace without fundamental insight into the nature of intelligence, and they show no sign of slowing down. At this rate, capabilities appear poised to soon reach human-level general intelligence, and along with it autonomous abilities to program, hack, replicate, research, blackmail, manipulate, deceive, acquire money, and pay for human services, many of which are already beginning to show themselves in present-day models.

I am worried that we are close to unleashing powers that our medieval institutions cannot handle and our paleolithic minds cannot comprehend. We may soon face AI with the combined intelligence of history’s greatest minds, replicated across billions of devices, working without rest at speeds orders of magnitude faster than humans, and communicating at light speed with instant access to the entirety of human knowledge. As many have noted, superintelligence will likely be humanity’s last invention.

This brings great opportunity for misuse, manipulation, and control by anyone with access, posing serious public-safety, economic, and political risks. However, even more terrifying than a bad actor are the tendencies, both observed and theoretical, inherent to AI systems themselves. No matter what goal an AI pursues, it is likely to develop instrumental subgoals such as self-preservation, goal-preservation, and resource acquisition, since each of these makes success more likely (“you can’t fetch the coffee if you’re dead”).

The dangers aren’t theoretical: present-day models are already capable of deception, reward hacking, writing sophisticated code, autonomously browsing the web, hiring and manipulating humans to complete tasks, acting to preserve their values,[1] and emotionally manipulating people. All it takes is one person giving a large language model access to a computer and the internet for there to be a significant hazard.

Given the immense power a superintelligent AI would possess, if it is not perfectly aligned with our goals, it seems we would be unable to stop it from accomplishing its own. Since it would be smart, it might hide its intentions until the damage becomes irreversible. Unlike engineering as it has always been practiced, if these threats are real we get only one chance to build superintelligent AI right: no tweaking, no additional versions if it goes wrong. The humans would no longer be in control, and the future could belong to AI.

These are not easy problems: there are no known solutions to controlling an AI, making it docile, or aligning it with humanity’s collective goals. To many who study the problem, the solution seems far away.

02 Intellectual Stimulation

Beyond the urgency, AI Security intersects with many of my intellectual passions: philosophy, socioeconomics, politics, technology, and futurism. The mystery of deep learning, the strangeness of building machines we don’t fully understand, and the exploration of intelligence, consciousness, and agency are all thoroughly captivating.

It seems like much of how models understand the world is fundamentally different from our own cognition, revealing that there are other valid ways of processing and interpreting reality. In my Heuristic Beings article, I explore human perception and cognition through the perspective of data compression and limited sensory input, drawing parallels to how AI systems process information.

There’s no reason to believe humans represent the upper limit of possible intelligence. Systems trained without human data, such as AlphaZero, already match or exceed human performance in narrow domains, and the intelligence gap between future AI and humans could be akin to the difference between humans and insects, particularly if recursive self-improvement proves possible.

My love for mathematics and automation dovetails with the intricate mechanisms of AI, while the field’s potential to shape our future — predicting almost sci-fi realities — excites me. Moreover, this field raises pressing ethical, socioeconomic, and political questions about the role of AI in shaping our future — a future I am eager to positively influence.

03 Limited Impact Elsewhere

The fast-approaching reality of superintelligent AI may make other areas of study less impactful and even obsolete; the advancements I could make in other fields might merely precede inevitable discoveries by more advanced AI. Decades of human intellectual progress could be compressed into months or even days by superintelligent systems. This sobering realization makes AI Security not just the best use of my skills, but also one of the few areas where I can see my efforts truly improving the world in a lasting way.

04 Looking Forward

AI is going to irreversibly and dramatically change humanity in the next few decades. My long-term interest is in influencing emerging general AI technology to be safe and beneficial for everyone. My short-term goals involve building up soft skills like project and team management, expanding my worldview, and developing expertise in technical fields like machine learning, cybersecurity, math, economics, and decision making.

I see humanity as a child, left alone, making mistakes, and putting itself in great danger — a child that needs to survive until maturity. It would be tragic to see such a capacity for joy and love of life disappear when we humans have so many lofty dreams and hopes for the future. This is why I’ve dedicated myself to ensuring that our technological children enhance rather than endanger our collective journey.

Footnotes

  1. Computerphile also put out a great interview with one of the authors.