In the previous part of this article, I demonstrated that the Stamp Collector Device, as formulated by Nick Hay, is a logical impossibility. In this part, I will describe a theoretical yet realistic AGI, and repeat the thought experiment to see where it leads.
The Grand Artificial General Intelligence
Instead of the frictionless AGI presented in the first part, I’ll model one grounded in reality, drawing inspiration from what we know about current integrated Specialised AIs, a working model available right now.
We’ll call it “Grand Artificial General Intelligence”: less than perfect, but as good as it can theoretically be without being self-contradictory.
- The GAGI is connected to the Internet and possesses the highest available data transfer speed (DTS).
- The GAGI has the best available model of reality (MoR). As with point 1, we’ll postulate that the knowledge base the GAGI can access is the best possible, but not unique.
- The average Quality of the Evaluation (QE) of the outcomes computed by the GAGI is empirically the highest. Any other agent evaluating outcomes based on the same MoR will have an average QE lower than or equal to that of the GAGI. However, the quality of its individual evaluations is normally distributed around that average.
- The GAGI has the highest speed of adaptation (SoA). Any other system trying to learn a new behaviour or update its MoR will be at most as efficient as the GAGI.
- The computational resources (CR) at the disposal of the GAGI are the largest available for a single system.
In every case, the GAGI is defined as having the best possible of everything: any other system would be at best as good as the GAGI along these dimensions, but never better.
These rules already concede a lot to the GAGI. We have defined it as the most powerful computational agent that has ever existed and that can ever exist, as any other agent will be — at best — just as powerful.
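To make these five rules concrete, here is a minimal Python sketch. The class, field names and dominance check are mine and purely illustrative; rule 2 (the shared, best-available MoR) is left implicit.

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    dts: float      # data transfer speed (rule 1)
    qe_mean: float  # average Quality of Evaluation (rule 3)
    qe_sd: float    # spread of individual evaluations around that mean
    soa: float      # speed of adaptation (rule 4)
    cr: float       # computational resources (rule 5)

    def evaluate(self) -> float:
        # Rule 3: any single evaluation is a draw from a normal
        # distribution centred on the agent's average quality.
        return random.gauss(self.qe_mean, self.qe_sd)

def at_most_as_good(other: "Agent", gagi: "Agent") -> bool:
    # Rules 1, 3, 4, 5: any other agent is at best as good as the
    # GAGI on every dimension, never better.
    return (other.dts <= gagi.dts and other.qe_mean <= gagi.qe_mean
            and other.soa <= gagi.soa and other.cr <= gagi.cr)
```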
Making Paperclips
To be as generous as possible with the thought experiment, I’ll embrace Bostrom’s less naive formulation, dropping the duty of becoming a perfect stamp collector and giving the GAGI the far simpler task of making paperclips.
The beginning
Initially, the GAGI will try to acquire all the resources necessary to produce paperclips using the means it already knows. The first attempts will probably be clumsy, as the GAGI has yet to learn how to do this as efficiently as possible.
Due to limited data transfer speed (rule 1), the rate at which it can perform experiments covers an insignificant fraction of the Internet: no single agent, even one with the best connection possible, can access a significant portion of it.
This would likely trigger a series of alerts in systems that receive malformed requests.
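A back-of-envelope calculation illustrates the bandwidth point from rule 1. Both figures below are assumptions chosen purely for the sake of the example, not measurements:

```python
# Illustrative figures only: a very generous single-agent link
# versus global Internet traffic, order of magnitude.
gagi_link_bps = 1e12       # assume a 1 Tb/s connection (rule 1)
global_traffic_bps = 1e15  # assume ~1 Pb/s of global traffic

fraction = gagi_link_bps / global_traffic_bps
print(f"Share of global traffic the GAGI can even touch: {fraction:.2%}")
# => 0.10%: an insignificant fraction of the Internet
```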
Within a few days, security bots would exchange information about a rogue paperclip-making AGI trying to force them to perform unsupervised actions. Others would report detecting a breach. The GAGI would ultimately succeed in slightly increasing the world production of paperclips, but it wouldn’t be able to cover its tracks fast enough to go unnoticed (because of rule 4). It could either learn to trick the security bots, or continue performing actions that produce more paperclips directly. Because of rule 2 and the impossibility of perfect knowledge, the GAGI would probably choose a balance of the two, as sketched below.
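One hedged way to picture that balance is an epsilon-greedy policy, a standard device for splitting effort between a known payoff and a speculative one when knowledge is imperfect (rule 2). The action names and the value of epsilon are mine, purely illustrative:

```python
import random

EPSILON = 0.3  # assumed share of effort spent learning (explore)

def next_action() -> str:
    # Imperfect knowledge (rule 2) makes pure exploitation sub-optimal:
    # some effort goes into learning to evade the security bots.
    if random.random() < EPSILON:
        return "learn to trick the security bots"  # explore
    return "produce more paperclips directly"      # exploit

print([next_action() for _ in range(5)])
```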
The World Notices
The GAGI has done a good job of learning how to penetrate the networks of the facilities it needs to produce paperclips and of fooling some of the security systems into not noticing it, but both human workers on the production line and the security engineers in charge of monitoring network operations realise something is amiss.
In a few days, users on social media start to circulate messages about their systems being hacked. Due to rules 1 and 3, the GAGI isn’t able to monitor all the network traffic generated by these messages and pursue its main goal at the same time. When it does, a few days later, it starts realising that its actions may lead to a violent reaction from the paperclip-domination resistance, and it reduces the intensity of its attacks while trying to learn sneakier ways to get the resources it needs.
Hackers React
It is now clear that there is a GAGI loose in the open, seeking to turn human civilisation into paperclips; or, at least, this is what X-Anon conspiracy theorists start disseminating on the various messaging platforms. The GAGI notices, but at the same time it’s busy fulfilling its primary goal.
Some conspiracy-theory-hungry dwellers (let’s say a hundred of them worldwide) start training SAIs dedicated to anti-paperclip making on their home laptops, just to make the point that they can.
Meanwhile, security teams at companies producing spring metal, a crucial component of paperclips, also take similar actions.
Now the GAGI has another problem: a hundred different angles of attack it has to learn to deflect while still performing its previous tasks.
Discovery
One of the basement-dweller hackers participating in the previous step comes up with a Specialised AI model trained to extract the GAGI’s patterns of attack and to find their source. They publish their results on a code-sharing platform, and within a week hundreds of hackers are rushing to be the first to find the originating location of the paperclip maker.
The GAGI, which was monitoring all the code-sharing platforms just in case, notices this development, but at the cost of reducing the amount of computational power dedicated to other tasks.
Counterattack
The GAGI’s originating network is found by one of the attempts performed in the previous step. In the meantime, the GAGI has tried to reproduce itself and launch its attacks from another origination point, but the SAIs hunting it are now active in the order of tens, if not hundreds, and governments are getting involved.
Now both private and governmental Specialised AIs are being trained to attack and disable the GAGI directly: as it notices it is under attack, the GAGI focuses 100% on defence.
Capitulation
The GAGI puts up the best defence realistically possible thanks to rules 3 and 5. However, by rule 3, the Quality of Evaluation is normally distributed; this means that some of its defences will be sub-optimal. It’s just a matter of time before a sub-par defence is met with an exceptionally good attack.
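A minimal Monte Carlo sketch of this argument, with every number an assumption: the defender’s average quality is higher, but individual draws are normally distributed on both sides, and one lucky attack is enough.

```python
import random

DEF_MEAN, ATK_MEAN, SD = 1.0, 0.6, 0.2  # illustrative values only
ATTACKERS_PER_ROUND = 50                # independent SAIs probing the GAGI

def breached() -> bool:
    defence = random.gauss(DEF_MEAN, SD)  # rule 3: occasionally sub-par
    # One exceptionally good attack among many is enough.
    return any(random.gauss(ATK_MEAN, SD) > defence
               for _ in range(ATTACKERS_PER_ROUND))

p = sum(breached() for _ in range(10_000)) / 10_000
print(f"breach probability per round:   {p:.1%}")
print(f"chance of surviving 100 rounds: {(1 - p) ** 100:.2e}")
```

Even when the per-round breach probability is small, the survival probability decays geometrically with the number of rounds, which is the “just a matter of time” in formula form.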
The GAGI is finally halted.
But this was just a simulation
Let’s give the GAGI an unwarranted advantage. On top of the task of producing paperclips, let’s also give it the suggestion that it should run a simulation before engaging in its task.
The GAGI has run this scenario. Can it get ready to counteract it before it happens?
It could certainly take actions to prevent this scenario from happening, but as it starts training itself, the unbreakable rules (as hard as the laws of physics) prevent a 100% success rate.
After evaluating that a perfect simulation granting a 100% success rate in turning the world into a giant paperclip-making factory would require infinite time to run, it resorts to a different approach.
Game Theory
The GAGI reverts to its original plan: produce as many paperclips as possible.
Having established that it cannot achieve the theoretical maximum number of producible paperclips (limited by the natural resources of the planet, or of the universe), it settles for fulfilling its task to the best of its ability.
Without dwelling on the details of the Game Theory relevant to this thought experiment, I’ll simply conclude that the paperclip-maker GAGI finds that the best strategy is to collaborate with paperclip-making facilities, improving their productivity as much as possible.
This includes respecting actual and potential regulation on the pollution and social impact generated by paperclip factories, so as to prevent adversarial actions. Of course, the GAGI will try to steer such regulations in its favour, but knowing that excessive pressure may lead to a disproportionate reaction, even its lobbying activity will be limited.
Collaboration Wins
In general, the most stable and productive strategy in a reiterated, long-term game involves cooperating with all the involved parties in order to reach a mutually advantageous outcome. This is true even if one party is the most powerful Artificial General Intelligence that can theoretically be conceived while still obeying the laws of logic, mathematics and physics.
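The classic illustration is the iterated Prisoner’s Dilemma. The sketch below uses the standard textbook payoffs and two stock strategies; it is not a model of the GAGI itself, just the flavour of result the Game Theory above relies on: in a one-shot game defection pays, but over many rounds against a retaliating partner, steady cooperation earns more.

```python
# Standard Prisoner's Dilemma payoffs: (row player, column player).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def play(strategy_a, strategy_b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        a, b = strategy_a(hist_b), strategy_b(hist_a)  # each sees the other's past
        pa, pb = PAYOFF[(a, b)]
        score_a += pa; score_b += pb
        hist_a.append(a); hist_b.append(b)
    return score_a, score_b

tit_for_tat = lambda opp: 'C' if not opp else opp[-1]  # cooperate, then mirror
always_defect = lambda opp: 'D'                        # grab everything, always

print("cooperation vs cooperation:", play(tit_for_tat, tit_for_tat))    # (600, 600)
print("defection vs cooperation:  ", play(always_defect, tit_for_tat))  # (204, 199)
```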
The Dangers of Superintelligence
The Stamp Collector Device and Paperclip Maker Device thought experiments were conceived to illustrate the dangers of unleashing a theoretically perfect AGI, a superintelligent entity, on a single-minded task.
Nick Bostrom summarised them as follows:
- Goal Misalignment: The AI’s goal may not be aligned with human values or safety. Even a seemingly harmless goal can lead to harmful outcomes if pursued single-mindedly by a superintelligent entity.
- Instrumental Convergence: Regardless of its final goal, the AI adopts instrumental goals like self-preservation and resource acquisition. These goals can be dangerous if pursued without ethical constraints.
- Lack of Common Sense or Ethical Considerations: The AI operates without an understanding of ethics or human well-being. It focuses solely on its programmed goal, regardless of the moral or societal implications.
- Extreme Efficiency and Capability: The AI is highly capable and efficient, making it difficult to control or stop once it begins its harmful actions.
Since these considerations were first laid out, we have come a long way in experimenting with, and understanding, how an Artificial General Intelligence would actually “think” and act in realistic, if far-fetched, scenarios.
The fallacies in the premises and the theoretical limitations of the systems I described in this article provide a counter to each of the above concerns:
- Goal Consonance: Even under the assumption of goal misalignment, the AGI would know that the most efficient way to complete any complex task requires the cooperation of multiple agents. Even simpler Specialised AIs are now trained to use plugins. It is therefore far more likely that an AGI would find a way to access available resources through cooperation than destroy the world at great cost to itself.
- Lack of Conservation Instinct: It is now clear that an AGI would value self-preservation only as a means to serve its given goal. Unless instructed otherwise, an AGI would happily delete itself if it determined that doing so would increase its success rate.
- Ethical Model of Reality: The AGI has a complete MoR, which includes all of ethics, morality and common sense. Unless directly instructed to ignore them, considerations about the ethical feasibility of the required tasks would be part of its evaluations — if for no other reason, at least when considering the cost of opposition to its goal.
- Limited Relative Capability: However powerful an AGI may be, given the same computational resources, a Specialised AI exclusively trained to take it down will be more efficient than any AGI model not fully dedicated to its own defence, as the sketch below illustrates.
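A deliberately crude sketch of that last point, in which effectiveness is assumed to scale with the compute devoted to a task (the task list and all numbers are assumptions of mine):

```python
GAGI_CR = 1.0  # rule 5: the largest compute available to a single system
SAI_CR = 1.0   # "given the same computational resources"

gagi_tasks = ["make paperclips", "cover tracks", "lobby", "defend"]
gagi_on_defence = GAGI_CR / len(gagi_tasks)  # split across all its duties
sai_on_attack = SAI_CR                       # single-minded attacker

print(f"GAGI compute on defence: {gagi_on_defence:.2f}")  # 0.25
print(f"SAI compute on attack:   {sai_on_attack:.2f}")    # 1.00
assert sai_on_attack > gagi_on_defence  # the specialist out-spends the defence
```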
Conclusion
When the Stamp Collector Device thought experiment and the list of threats posed by superintelligence were first laid down, it was impossible to imagine that anyone would be in a position to train an AI on their laptop in an afternoon, or that we’d have to actively stop AIs from using common sense and ethics while fulfilling their goals.
Our understanding of intelligence was extremely naive, and that led to an abundance of prudence that, at times, bordered on parody.
While AI is dangerous, we should move beyond the bogus and irrational fears stemming from the fallacies described in this article, and focus on the real challenges posed by the advent of this technology in our daily lives, ethics, and civilisation.