I’m generally an optimist. My default is usually to try and find a positive outcome or a silver lining in most things. My bar for having a negative outlook on something is generally quite high.
That bar has now been hit for AI safety.
Don’t get me wrong - I still think AI is a fundamentally brilliant technology on par with the invention of the steam engine and the internet. It’s going to change our world in ways we can’t possibly imagine yet, and we’re going to see phenomenal advancements in science and technology and achieve great things.
But as with all technology, while it has the potential to be used for immense good in the world, it also has the potential to cause absolutely devastating destruction to our world and our society.
Jack Clark, one of the co-founders of Anthropic, gave a great analogy in an interview with The Rest Is Politics this week (worth listening to), where he said developing these new models is like building new nuclear power plants that every so often inadvertently pop out a nuclear bomb as well.
As we build increasingly advanced models, the potential for the technology to be used for devastating harm grows larger and larger. There’s an asymmetric risk trade-off: those trying to use the technology for harm only need to get through once to cause significant damage, whereas those defending need to get it right every time.
With those odds, I don’t think it’ll be long before we see an AI-enabled catastrophe. And I fear it’s going to take a catastrophe before anyone finds the impetus to introduce safety measures that stop things like this from happening.
Cyberattacks using frontier models like Mythos (which Anthropic say will be available to the public in the coming weeks…) are one example we’ve heard a lot about in recent months, but the generality of the technology means it could be applied across a range of vectors - it could be used to build bioweapons, to threaten national security, or by militaries on the battlefield (as we’ve already seen in Venezuela and Iran).
These problems are only going to get bigger as frontier models become more capable. So what do you do about it?
Well, I don’t know. If I did, I’d be spending all day trying to make it happen, but here are some rough thoughts I’ve had recently (possibly and probably somewhat incoherently):
You can’t stop the development of frontier models. The proverbial genie is out of the bottle. This would be like stopping industrialists from using the steam engine during the Industrial Revolution, and we all know what happened to the Luddites. Besides, even if you wanted to, you’d need a global agreement that nobody would do it, lest someone gain an unfair advantage. I don’t have faith that our current state of diplomacy is going to make that happen - not least when we’re struggling to do so with nuclear proliferation.
You can’t bring the frontier model companies under state control or really regulate them. If you stop an OpenAI or Anthropic from doing it, someone else will eventually spring up, possibly in a different country, and do the same thing. If OpenAI, Anthropic and Google all agree not to release “dangerous” models, what happens six months down the line when an open-source model reaches the same point of development? Anthropic changed their policy earlier this year from saying they’d stop development of models that crossed danger thresholds until safeguards were in place, to saying they would now only pause if the Board (or their Long-Term Benefit Trust) thinks it’s necessary.
You can’t really rely on the companies to self-regulate either. Historically, you could take some comfort (rightly or wrongly) from listening to these companies tell you how virtuous they were and how they’d set up all these governance structures and complicated mechanisms to ensure that the good of the world remained at the forefront of everything they did. But when push came to shove at OpenAI, and Sam Altman was removed because the board thought the company was becoming too focussed on fast development at the cost of safety, the board was replaced and Sam was reinstated pretty quickly.
As more and more of these companies go public (notably Anthropic, OpenAI and, I guess technically, SpaceX this year), the pressure to choose option A, which makes more money but is perhaps a bit morally questionable and a greater risk to safety, over option B, which is the better moral choice and better for safety but doesn’t maximise shareholder value, becomes pretty deafening. It becomes very difficult for these companies to commit to prioritising AI safety over the profit motive, and that’s scary.
And this is all hugely amplified by the rate of change of the technology. We’re getting closer and closer to AI that can perform tasks at the level of human competence, and to agentic tools that can go off and do things by themselves. That makes it easier and easier for something to go wrong, and I’m worried about the day that it does.
Now, I don’t have the answers to any of these questions (or even really know what all the questions are just now), but I’ve been doing some learning through bluedot.org, who have some great resources. I’ve done their Future of AI course and would really recommend it - it takes two hours, and you can sign up for the more detailed courses afterwards if you enjoy it.
This is one of the biggest problems in the world right now. The more people are aware of it and the more brain power that’s spent on it, the closer we get to helping mitigate some of the risks.
Any thoughts, comments or criticism welcome.
RH