
The concept is bleakly called P(doom)—the probability that AI will destroy humanity.
Daisy McGregor is not as worried about it as she used to be. Leading UK policy at AI research house Anthropic, she points to belated government efforts to work on AI safety.
But Roman Yampolskiy, prominent among AI experts for warning of the danger, reckons P(doom) is close to 100 percent. Speaking alongside McGregor at ASPI’s Sydney Dialogue conference on 4 December, he said, ‘You can’t indefinitely control something way smarter than you.’
‘It’s kind of like building a … perpetual safety device, guaranteeing that every model, ever made by every company, in every environment, with all the malevolent interactions in [training] data, will never make a big mistake. That’s very unlikely.’
The something that would be way smarter than humanity is artificial superintelligence. This would quickly follow artificial general intelligence (AGI), itself defined as the level at which machines would be as clever as people, said Sydney Dialogue convenor David Wroe, hosting the discussion.
AGI is only a few years away, many experts say. McGregor and Yampolskiy agreed that it could be achieved not through further technical breakthroughs but essentially by scaling up the computing resources being made available to train AI models.
As we race towards superintelligence, we seem to be leaving little time to ensure it won't make decisions that would suit it but destroy humanity. Asked for a concrete example of such a risk, Yampolskiy cited the possibility of an AI agent wanting to cool the planet so its servers could run better.
‘We don’t know what wants and needs those [AI] agents will come up with, but they definitely may, as a side effect of implementing those goals, harm humanity,’ he told Sydney Dialogue participants. ‘Until you can guarantee that this is not going to happen, you should not be allowed to build it.’
McGregor said she was far more optimistic about P(doom) than she had been a few years ago, though she acknowledged that its value was above zero.
‘Really we need to solve some of these safety research problems in parallel with developing the models,’ she said. Now she was more confident that it could be done.
Governments were paying attention, she pointed out, reflecting on the improvement since late 2023, when a global AI Safety Summit was held at Bletchley Park, home of Britain's World War II codebreaking effort and its pioneering digital electronic computers.
‘When we did the Bletchley Park summit, at the time when I was working for the UK government, and … in the labs, they were seeing—we were seeing—massive jumps in capability between models.’ The level of AI capability then was not concerning, ‘but it was the changes. And there was no action, but, not only that, no expertise in governments.’
So P(doom) had been quite high in her estimation then.
Now in government ‘there has been more action but particularly … work to understand the risks behind the scenes …. There has been quite a lot of work behind the scenes on things like pre-deployment testing.’
Various governments, including Britain and Australia, have established AI safety institutes, although the US one is now called the Center for AI Standards and Innovation.
Describing the rate of progress, Yampolskiy said, ‘We have already got to the point where it doesn’t make sense to employ a human student most of the time as a research assistant. You are better off with a model at this point. So, I think we are super close’ to the AGI threshold.
‘I think we’ve crossed the barrier in many sub-domains’ of mental activity. ‘Have we crossed it for all the top humans in every domain? Not yet, but we are getting there very quickly. Every week, there is a new breakthrough announced, a new model comes out and it’s always like, “Oh, my God, it’s another 20 percent improvement ….” ’
Improvements are coming in part from post-training, a process now used to refine and improve models' capabilities after they have first been built through the standard process of pre-training, in which they are fed masses of data.
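For readers who want a concrete picture of the two stages, the sketch below shows them with a toy PyTorch network rather than a real language model. Everything here is invented for illustration: the model, data and learning rates are placeholders, and real post-training (instruction tuning, reinforcement learning from human feedback and the like) is far more involved.

```python
# Minimal sketch of pre-training followed by post-training.
# Toy example only; all data and hyperparameters are hypothetical.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small network standing in for a language model.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
loss_fn = nn.MSELoss()

# Stage 1: "pre-training" on a large, unstructured corpus
# (random tensors stand in for masses of raw data).
pretrain_x = torch.randn(1024, 16)
pretrain_y = torch.randn(1024, 16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss_fn(model(pretrain_x), pretrain_y).backward()
    opt.step()

# Stage 2: "post-training" on a small, curated dataset with gentler
# updates, refining the already-trained model rather than rebuilding it.
posttrain_x = torch.randn(32, 16)
posttrain_y = torch.randn(32, 16)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # lower learning rate
for _ in range(20):
    opt.zero_grad()
    loss_fn(model(posttrain_x), posttrain_y).backward()
    opt.step()
```

The point of the sketch is simply the division of labour: the bulk of the compute goes into the broad first pass, while the second, targeted pass adjusts the finished model's behaviour.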
McGregor said model developers had begun using post-training only this year, because only then had models become clever enough to benefit from it, much as a child must grow enough to learn well from structured instruction.
The increased computing resources that would achieve AGI would be applied particularly to post-training, she said. It would work best on problems whose answers were easily verifiable, she added.