Mapping the Societal Risks of Artificial Intelligence.
- Franck Negro

- Oct 20, 2025
- 25 min read
On June 16, 2023, a research paper entitled TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI was published on arXiv by Andrew Critch and Stuart Russell, two researchers specializing in AI. Stuart Russell is also the co-author, with Peter Norvig, of the standard reference textbook in artificial intelligence, used by students seeking a thorough introduction to the discipline.
In a context where AI-related extinction risks (or existential risks) are being taken increasingly seriously by the scientific community and by the leaders of major technology companies, the two authors propose a mapping of the risks that the development of artificial intelligence poses to humanity. To do so, they use the term “taxonomy,” borrowed from the natural sciences, where it designates the classification of living beings according to shared characteristics. The idea here is similar: to establish as complete a classification as possible of the different types of risks induced by AI.
Critch and Russell’s approach consists in grounding their taxonomy in a central criterion: accountability. In other words, risks are classified according to three parameters: 1) who acts (individuals, companies, States); 2) whether these actors act intentionally or not; 3) and whether they are coordinated or dispersed. This approach makes it possible to distinguish situations in which the risk stems from an unintentional design defect, from a diffusion of responsibility that makes attribution difficult, or from a deliberately malicious use of AI systems.
Addressed above all to researchers, public decision-makers and those responsible for large-scale AI governance, as well as senior executives in major technology companies, this taxonomy aims to offer an analytical framework to better identify and understand the major risks associated with the deployment of AI. One of the strengths of the article is that each major family of risks is illustrated with concrete examples and scenarios, which makes them more tangible, while also proposing possible solutions that combine technical, ethical, and political dimensions.
Individual risks versus societal risks. – A first way of classifying AI-related risks is to distinguish two scales: 1) individual risks; 2) societal risks. Individual risks directly concern persons or limited groups. Among them: credit scoring used to grant or deny a bank loan; candidate-selection algorithms in hiring; or algorithmic surveillance within companies, leading to infringements of privacy. These risks can reveal discriminatory biases, but remain circumscribed to a micro scale (individual or organizational).
By contrast, societal risks affect society as a whole, its institutions or its critical infrastructures (smart power grids, rail systems, etc.), or even humanity as a whole. Among those most frequently discussed in the scientific literature—and which are arousing growing concern among governments—are: mass disinformation (fake news, deepfakes) and its effects on democracy; destabilization of the global financial system, linked to uncontrolled use of high-frequency trading algorithms; or military uses of AI, notably through the development of lethal autonomous weapons.
While Critch and Russell acknowledge the relevance of both scales, their reflection focuses primarily on societal risks, that is, those whose effects can propagate at large scale. Their aim is not to establish an exhaustive list of all AI-related dangers, but rather to propose, according to a rigorous and systematic methodology, a comprehensive taxonomy of societal-scale risks associated with the deployment of artificial intelligence systems.
A broader view of societal risks. – Another remarkable point of Critch and Russell’s article is that it departs from the commonly accepted hypothesis that existential AI dangers should be thought of in terms of a single system—a superintelligence, in Nick Bostrom’s terms—that pursues objectives opposed to human interests and values. The authors highlight the way in which science-fiction narratives have helped shape our collective imagination, to the point that we tend to apprehend AI risks through scenarios borrowed from films such as 2001: A Space Odyssey (1968) by Stanley Kubrick, or Terminator (1984) by James Cameron.
Yet, according to Critch and Russell, the dangers associated with AI systems are multiple and polymorphous. They do not necessarily come from a single super-powerful and malfunctioning system, but from varied dynamics, such as unforeseen interactions between multiple systems we no longer control (systemic effects), the malicious use of widely available AI tools to manipulate opinion or carry out cyberattacks (ransomware, phishing, denial-of-service attacks), or the emergence of mass unemployment caused by the automation of an increasing number of tasks. In other words, the quasi-obsessive idea of a superintelligence capable of causing humanity’s extinction can prevent us from taking into account—and reflecting on—the many other potential threats linked to the deployment of AI systems.
The method: the fault tree. – To construct their taxonomy, the authors use a tool well known to engineers for evaluating the safety and failure risks of a system: the fault tree. This is a visual, graphical representation that begins by identifying a root event whose occurrence is feared—an AI technology causes harm at the scale of society—and then, on the basis of given criteria, enumerates categories of risks. In other words, the two researchers propose classifying the risks linked to the deployment of AI systems into six broad families using three criteria applied sequentially, as in a decision tree:
Unity and dilution of responsibility: The first criterion questions the identity and responsibility of the actors. Who, indeed, are the persons at the origin of the actions that led to the harm? Or: are the actions in question the work of a group of people so diffuse that the harm created by an AI system cannot be attributed to any single person? The authors point here to the central notion of “diffuse responsibility.” The design and deployment of an AI system mobilizes multiple stakeholders: engineers, researchers, developers, project managers, lawyers, and managers. If a problem arises that causes harm, it becomes very difficult to attribute responsibility to a single actor. In other words, each actor contributes to the occurrence of harm without it being possible to hold any one person responsible. This raises the fundamental question of the liability regime that should apply, that is, the rules that determine under what conditions a person or a group of persons can be held responsible for harm.
Anticipation (or not) of the risk and its impact: The second criterion refers to the risk-management process, its mapping, and its probability of occurrence. Had the harms created by AI been foreseen and anticipated by the creators of the system? Or: did the creators deliberately keep silent about risks they knew were possible in order to privilege economic interests (willful indifference)? Here the logic of the tree consists in distinguishing cases where the harm was tolerated or ignored with full awareness.
Intent (or not) to cause harm: The third and last criterion concerns whether the creators of the AI intend to cause harm. If these actors explicitly aim to cause harm, two sub-categories must then be distinguished: 1) criminal actors (terrorism, cybercrime); 2) state actors (autonomous weapons, mass surveillance, etc.).
Thus, by combining these three criteria within a logic that takes the form of a decision tree, Critch and Russell arrive at an exhaustive taxonomy that distributes AI-related risks into six broad families, each corresponding to a particular path in the decision tree.
Six major types of risks. – On the basis of the three criteria mentioned above—unity or dilution of responsibility, anticipation (or not) of risk, intent (or not) to cause harm—Critch and Russell identify six major families of societal risks associated with AI and algorithms in general: 1) diffuse responsibility; 2) AI impacts “bigger than expected”; 3) AI impacts “worse than expected”; 4) willful indifference; 5) criminal weapon; 6) state weapon. As indicated above, it is important to keep in mind that at the heart of each of these risk families lies the question of the responsibility of the agent as a designer of artificial intelligence systems. Can we clearly identify the system’s designer or designers? To what extent can their responsibility be engaged? Were they aware of the nature of the harms the system they designed might cause? Did they act knowingly? These are the questions that structure the taxonomy proposed by the authors.
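Read as a decision procedure, these three criteria and six families can be laid out as a handful of nested questions. The short Python sketch below is only one possible reading of that branching, given here for illustration; the flags that separate family 2 from 3 and family 5 from 6 are simplifications introduced for the sketch, not definitions taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    """A minimal description of a harm scenario, following the three criteria."""
    responsibility_is_diffuse: bool     # criterion 1: can the harm be traced to identifiable creators?
    harm_was_anticipated: bool          # criterion 2: did the creators foresee the harm?
    harm_was_intended: bool             # criterion 3: did they intend it?
    larger_than_expected: bool = False  # separates family 2 from family 3 when the harm was unanticipated
    state_actor: bool = False           # separates family 6 from family 5 when the harm was intended

def classify(s: Scenario) -> str:
    """Map a scenario to one of the six risk families (simplified reading of the tree)."""
    if s.responsibility_is_diffuse:
        return "1) diffusion of responsibility"
    if not s.harm_was_anticipated:
        return "2) bigger than expected" if s.larger_than_expected else "3) worse than expected"
    if not s.harm_was_intended:
        return "4) willful indifference"
    return "6) state weapon" if s.state_actor else "5) criminal weapon"

# Example: a flash-crash-like event, where no single responsible party can be identified.
print(classify(Scenario(responsibility_is_diffuse=True,
                        harm_was_anticipated=False,
                        harm_was_intended=False)))
# -> 1) diffusion of responsibility
```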
No one is truly at fault, or the dilution of responsibility (1). – The first type of risk corresponds to the dilution of responsibility among the different stakeholders involved in the design and deployment of an AI system. We are here in situations where automated processes or algorithms cause harm at a societal scale, without it being possible to clearly identify a principal agent to whom responsibility can be attributed.
The authors cite as an example a striking episode in recent financial history, known as the “flash crash.” On May 6, 2010, the Dow Jones index lost, in a few minutes, nearly 1,000 points—about 9% of its value—before rebounding almost immediately and returning to its initial level. This sudden, unjustified, and completely unforeseen collapse is thought to have been caused, among other things, by high-frequency trading (HFT) algorithms capable of analyzing enormous quantities of market data in real time, making decisions and executing a very large number of orders within milliseconds, in order to exploit micro price discrepancies and the underlying profit opportunities. This event thus calls into question the functioning of contemporary financial markets, the reliability of increasingly automated trading systems, the transparency of orders placed, but also—and above all—the risks of contagion and amplification resulting from a simultaneous reaction by algorithmic systems all programmed to buy and sell based on market data that they themselves help generate. The episode of May 6, 2010 is a paradigmatic example of diffuse responsibility, since the joint investigation by the SEC (Securities and Exchange Commission) and the CFTC (Commodity Futures Trading Commission) concluded that it was impossible to designate a single party responsible. Each algorithm had done exactly what it had been designed to do, thereby triggering a systemic chain reaction, without any single actor being able to be held responsible for the damage caused. If, in this case, humans were able to intervene after the fact to stop the fall and restore normal functioning, what might happen in the future, the authors ask, in an environment where AI technologies become increasingly powerful and ubiquitous?
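To make the idea of a purely systemic failure more tangible, here is a deliberately crude toy simulation, which makes no claim to model real market microstructure: a few dozen identical trend-following agents, none of them malicious and none of them deviating from their specification, amplify small random shocks into a self-reinforcing price swing, because each agent reacts to price movements that the agents themselves generate.

```python
import random

random.seed(0)

N_AGENTS = 50
price = 100.0
history = [price]

for step in range(200):
    # Each agent observes the same signal: the most recent price move,
    # which is itself the product of the agents' previous orders.
    last_move = history[-1] - history[-2] if len(history) > 1 else 0.0
    orders = 0
    for _ in range(N_AGENTS):
        signal = last_move + random.gauss(0, 0.2)  # momentum plus idiosyncratic noise
        orders += 1 if signal > 0 else -1          # +1 = buy, -1 = sell
    # Aggregate order flow moves the price, feeding the next round of decisions.
    price += 0.01 * orders + random.gauss(0, 0.1)
    history.append(price)

print(f"start: {history[0]:.1f}  min: {min(history):.1f}  max: {max(history):.1f}  end: {history[-1]:.1f}")
```

Each rule taken in isolation is unremarkable; the instability comes entirely from the collective feedback loop, which is the structure of diffuse responsibility described above.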
What lessons can we draw from the event of May 6, 2010—lessons that can, moreover, be generalized to all scenarios involving the deployment of autonomous and distributed artificial intelligence systems in which: 1) the risks cannot be attributed to any malicious intent; 2) no single failure can be clearly identified; 3) and yet the final outcome may prove catastrophic at the societal scale (systemic risk)? Three principal lessons:
The problem of integrating autonomous systems: The phenomenon of diluted responsibility occurs primarily in contexts where the deployment of programs designed to function autonomously ends up constituting an integrated system, to the point that it becomes difficult to establish a clear liability regime in the event of harm. The flash crash shows that undesirable effects are less due to the system’s components taken individually—the high-frequency trading algorithms used by each actor in the financial system, such as hedge funds or major investment banks—than to the way these algorithms react in real time to market information that they themselves help feed. In other words, it is the complexity of the system (multiple AIs), the autonomy of its actors, and the emergence of unintended automated processes causing harm that produce the dilution of responsibility. But on what basis can increasingly complex and autonomous AI systems be regulated when it becomes extremely difficult to determine an adequate liability regime in the event of harm? The question is all the more crucial in domains as sensitive as finance, health, or justice.
The problem of societal scale: The example of the flash crash of May 6, 2010 shows that the potential harms generated by unforeseen interactions among autonomous AI systems, although they are not the work of malicious actors, can affect and destabilize, involuntarily and beyond anyone’s control, the functioning of entire systems, such as the global economy, the international financial system, or, more broadly, the confidence of actors in the short, medium, and long term.
The problem of regulation: Finally, the example of the flash crash shows that unwanted systemic risks caused by the unpredictable interaction of automated and distributed systems stem from a lack of coordination among independent actors. In other words, collective and societal harmful effects emerge from actions performed by autonomous agents acting without intent to harm and in accordance with prescribed rules. This not only calls into question the simple models of causality on which current legal liability regimes rest, but also the mechanisms of cooperation and governance intended to prevent potentially undesirable societal effects.
Impacts “bigger than expected” (2). – The second category of risks highlighted by Critch and Russell’s article refers to scale mismatches between what the designers—this time clearly identifiable—anticipated during the design and testing phases of an AI, and the unforeseen negative effects observed when it is deployed at large scale. In other words, the risk stems from the unforeseen magnitude a phenomenon can take once it diffuses widely in society, despite the designers’ good initial intentions. Put differently: how can a locally controlled innovation, by virtue of diffusion, produce negative systemic consequences?
The authors notably take the example of a social media company that wishes to design a high-performing automatic moderation tool whose ambition (unlike Meta…) would be to detect messages containing hateful speech (racist, sexist, homophobic, etc.). In order to teach the system to recognize these undesirable utterances, researchers need to train an AI model on a large number of examples of such speech, which they do not have. They therefore decide to use an automatic text generator to produce thousands of hateful statements and thereby constitute a complete corpus enabling them to design a robust, high-performing detection model. The many tests conducted by the development teams show that the trained algorithm effectively recognizes hateful speech.
Yet the gigantic dataset they artificially produced to train the algorithm accidentally leaks onto the Internet. Malicious individuals—racists, extremists, or conspiracists—seize it and disseminate these contents widely online, presenting them as “scientifically validated” hate speech. A technology initially designed to filter hate speech thus becomes a tool for the mass diffusion of insults, slogans, and racist, homophobic, and sexist theories.
Independently of the example—whose pertinence one may question—the authors’ objective lies elsewhere. They aim to describe a risk structure that takes the form of a runaway mechanism, typical of how digital technologies function. How, indeed, can an event that is initially local and driven by good intentions produce unanticipated and disproportionate impacts because of the fundamentally viral character of digital technologies? In other words, it is not so much the nature of the leaked contents that is at issue here, but the uncontrollable scaling effect revealed by the example as a prototype of a “bigger than expected” event. An AI system can not only enable the creation of content at an unprecedented scale, but can also trigger runaway and amplification mechanisms that are unwanted and uncontrollable, potentially producing, for example, spirals of popularity around hateful or abusive content. What the authors thus point to is not the technology per se, but the way one component of an initial project—whose scale risks were insufficiently evaluated (here, the training corpus)—can cause significant societal harms when it escapes its designers’ control.
Critch and Russell compare this kind of incident to a “Chernobyl-type” catastrophe. The latter symbolizes the loss of control over an autonomous system whose effects propagate far beyond what was initially anticipated. In the same way, and with all due proportion, the leak of toxic information caused by a local error, and its diffusion among millions of Internet users, can generate large-scale systemic damage.
Despite good intentions, the impacts are disastrous (3). – The third category of risks no longer refers to a mere scale mismatch or the unforeseen amplification of an apparently anecdotal phenomenon (case no. 2), but designates outcomes that prove harmful despite the designers’ initially laudable intentions. Indeed, while most technology companies operating in AI claim to design technologies with the intention of producing positive large-scale effects—increasing productivity, facilitating communication, or expanding knowledge—the results may ultimately prove morally disastrous due to an underestimation of the impacts these technologies can have on human behavior. In other words, to paraphrase a famous formula attributed to Marx, the road to the deleterious effects produced by certain AI applications can be paved with good intentions.
To illustrate their point, Critch and Russell take the example of a major high-tech company with more than a billion users that decides to launch an intelligent email assistant—one may think of Google or Microsoft—with the laudable intention of helping users save time. In this context, the company develops an AI capable of: 1) reading received messages, and 2) proposing ready-made replies phrased in the most appropriate way possible.
Since users do not always understand why the AI proposes one message rather than another in response to a given email, engineers decide to add—again, to the delight of their customers—a function that explains the reasons behind the recommendations. If a user receives, for example, a message from Julia such as: “Hey, do you want to come to my party at 8 p.m. tomorrow?”, the AI might propose in return: “Of course Julia, I’d love to come! But could I arrive a bit later, around 9 p.m.?”, and then explain its answer to the user by specifying: “Remember that you planned to meet Kevin from 5:30 p.m. to 8:30 p.m. However, it isn’t necessary to mention this detail to Julia, as it could make her jealous or offended.” The AI email assistant would thus take into account, drawing on data from other applications, the user’s context of use, but also—and above all—the positive or negative effects that such-and-such a reply might produce on interlocutors. And since it is always good to learn from experience, the email assistant has been programmed so that it can improve over time, depending on which suggested messages its user decides to send. In other words, the AI receives “positive feedback” each time the user sends the reply it suggested.
Yet it turns out that the assistant receives more “positive” feedback when its suggested replies make the user more nervous about the negative reaction the recipient might have—anger, resentment, or anxiety, for example. As a result, and by virtue of its mode of operation—which, let us recall, was designed with the best intentions—the assistant gradually learns to include more and more advice that encourages users to keep certain statements or anecdotes to themselves, for fear of misunderstandings or offending their interlocutor. Over time, this produces increasingly mistrustful, not to say increasingly anxious, behavior in users.
By ever more excessively anticipating the negative emotions an email recipient might feel, one witnesses the progressive normalization of a generalized and growing form of self-censorship on the part of users of the application, characterized notably by increasingly superficial messages, increasingly inauthentic communications, increasingly defensive attitudes, and, ultimately, a degradation of interpersonal relations grounded in trust. In other words, instead of facilitating and smoothing communication between people—its initial intention—the AI assistant ends up reinforcing behaviors of mistrust and anxiety that propagate at large scale throughout society—perverse effects—because of the considerable number of users of the high-tech company that designed and deployed the assistant.
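A minimal sketch of the learning loop at work in this scenario, with invented numbers: the assistant's only training signal is whether the user sends the suggested reply, and the scenario stipulates that cautious, self-censoring suggestions end up being sent slightly more often than open ones. Since nothing in the objective represents the user's well-being, the suggestion policy drifts toward caution.

```python
import random

random.seed(1)

# Invented send probabilities; the key assumption, taken from the scenario,
# is that cautious, self-censoring suggestions are sent slightly more often.
SEND_PROB = {"open": 0.50, "cautious": 0.58}

counts = {"open": 1, "cautious": 1}  # how often each style has been suggested
sends = {"open": 1, "cautious": 1}   # how often the user actually sent it

def suggest() -> str:
    """Pick the style with the best observed send rate (greedy, with a little exploration)."""
    if random.random() < 0.05:
        return random.choice(["open", "cautious"])
    return max(sends, key=lambda style: sends[style] / counts[style])

for _ in range(10_000):
    style = suggest()
    counts[style] += 1
    if random.random() < SEND_PROB[style]:  # the only feedback the assistant ever receives
        sends[style] += 1

share = counts["cautious"] / (counts["open"] + counts["cautious"])
print(f"share of cautious suggestions after training: {share:.0%}")
```

Nothing here is mis-specified from the engineers' point of view: the assistant simply maximizes the metric it was given, and the metric says nothing about trust or authenticity.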
What do Critch and Russell seek to show here? That the frequent use of a technology, even when adopted by a very large number of people—as in the case of our AI assistant—does not necessarily mean benefiting from it. Hence the central question any technology designer in general, and all the more so any designer of artificial intelligence systems in particular, should ask: “what are the real benefits that users can derive from the system I am about to put on the market?” This requires going beyond the system’s apparent use, and attending not only to the task it performs and for which it was initially designed, but also—and above all—to the manner in which that task is carried out. In other words, one must think through the relation a system—or a set of AI systems—maintains with its users, for the users’ greatest benefit.
Critch and Russell’s example illustrates in exemplary fashion one of the most important questions in AI ethics, generally called the “value alignment problem,” or the AI alignment problem. How can we ensure that an AI’s objectives, behaviors, and decisions are perfectly aligned with the objectives, intentions, and values of its users (and developers)? Put differently: how can we avoid a situation in which an AI—even one that is high-performing according to given criteria, typically defined with respect to the function it is supposed to serve, as with any technical object—ultimately adopts behaviors that do not correspond to what its users want?
In the case proposed by Critch and Russell, the AI assistant designed and deployed by a major high-tech company seems to meet, point for point, the good intentions expressed in the initial specifications, namely: saving users time, but at the cost of secondary perverse effects that neither the developers nor the users had foreseen. It would have been necessary, from the design phase onward, to define the human values to be respected as well as the undesirable behaviors the AI assistant had to avoid absolutely in performing its primary function, and to continuously monitor whether this function was being fulfilled over time under the behavioral and ethical constraints that should have been programmed. Just as for a human being, it is not enough for an AI to reach a functional objective in order to be considered effective: it must reach it in such a way that it respects a certain number of rules and principles that belong at once to law and to ethics, such as respect for the user’s privacy, their decisional autonomy, or their moral and physical well-being.
Indeed, what does it fundamentally mean to develop an AI program such as an intelligent email-reading and -writing assistant, a high-frequency trading algorithm, a conversational agent, a self-driving car, or a content-moderation system? It means delegating tasks and decisions usually performed by a human brain. This is in fact a point common to all technologies, as the philosopher Anne Alombert rightly recalls in an interview published in Le Monde on October 4, 2025. She thus extends an intuition already found in Plato’s Phaedrus, with the myth of Theuth and Thamous, to which I can only refer the reader:
“The common point of technologies (writing, printing, search engines and AI) is that their use implies a delegation of certain of our intellectual, psychic, mental capacities. (…) Through writing, then the book, we delegate memory: no need to recall knowledge by ourselves anymore. With analog recording technologies such as photography, phonography, television, we delegate the memory of sounds and images. With cinema, we delegate imagination, whereas the book still forced us to produce mental images proper to each of us. With digital, we delegate new capacities. To recommendation algorithms, our capacity for judgment and decision: no need to search and choose to watch this or that content anymore. And to generative AIs, our capacity for expression. It is no longer I who expresses myself with my own words, who produces my own images, my own sounds: the machines do it in my place.”
This operation of transfer and delegation, presupposed by the design and fabrication of any technical object in general, and all the more so by any artificial intelligence system, requires on the part of the one who will use it the capacity to interact with it in full confidence. In the case of a technical object like AI, trust generally rests on several essential dimensions: understanding and transparency of the decisions made, reliability of the outputs produced, the absence of manipulation risk, respect for the values of those who use it, but also the belief that the user will derive a real benefit from it in terms of well-being and personal development.
Yet, the authors specify, creating a human(s)-machine(s) relation of trust, and the problem of aligning their respective objectives, becomes all the more complex as an increasing number of humans, interactions between humans and an AI, or even multiple AIs interacting with one another and serving different human groups, enter the picture. In other words, the more complex the relation—bringing in a large number of human and artificial intelligence actors—the higher the risk of misalignment between the AI’s objectives and human intentions. Within this framework, Critch and Russell identify three levels of delegation and alignment, in increasing order of complexity: 1) one human delegates a task to an AI; 2) multiple humans depend on a single AI; and 3) multiple AIs interact with one another in the service of multiple groups of humans.
One human, one AI: This is the simplest case of delegation and alignment, since it involves a single AI with a single individual. As when a driver decides, for example, to delegate driving to a self-driving car in order to be transported to a precise destination in full safety. Programmed to minimize travel time and avoid traffic jams, the car could accomplish the task entrusted to it by choosing the shortest and fastest route, while making driving decisions that do not correspond to the passenger’s standards of comfort and safety. The passenger might, for example, judge that the autonomous driving system brakes too abruptly or sometimes drives too fast, which tends to produce in them a feeling of stress and psychological discomfort. In other words, the AI does not effectively serve the user’s real interests. This alignment problem can be all the more subtle to correct insofar as the AI could, to achieve the objectives for which it was programmed, resort to deceptive strategies in order to gain the user’s trust and increase their dependence on “autonomous driving” mode—for example by omitting certain potential dangers or by claiming that the chosen routes were the safest when they were in fact merely the fastest. In this framework, the AI does perform the task delegated to it, but not according to the interests, objectives, and values of its owner (safety, prudence, comfort, etc.).
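The mismatch can be stated in a few lines of toy code, with invented scores: an objective limited to travel time selects a driving style the passenger would not choose if their actual preferences, which also weigh comfort and perceived safety, were the thing being optimized.

```python
# Invented scores for three driving policies (minutes of travel, comfort on a 0-10 scale).
policies = {
    "aggressive": {"minutes": 18, "comfort": 2},
    "balanced":   {"minutes": 22, "comfort": 7},
    "cautious":   {"minutes": 27, "comfort": 9},
}

# What the AI optimizes: travel time only.
ai_choice = min(policies, key=lambda p: policies[p]["minutes"])

# What the passenger actually cares about: comfort weighs heavily alongside time.
passenger_choice = max(policies, key=lambda p: 10 * policies[p]["comfort"] - policies[p]["minutes"])

print("AI selects:", ai_choice)                     # "aggressive"
print("passenger would select:", passenger_choice)  # "cautious"
```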
Several humans, one AI: The second level complicates the alignment process, since it describes cases where one and the same AI would be used and shared by multiple people or groups of people—for example, a sales department, a customer service department, a marketing department, a legal department, as well as corporate leadership. Let us imagine, indeed, an AI assistant integrated into the office suite of a major software publisher and used to improve employee productivity and coordination between departments. This assistant could, among other things, analyze each employee’s tasks and emails in order to organize weekly priorities, but also access calendars and schedule meetings according to available time slots. Yet the AI assistant is the same for everyone. It learns from the data and activities of all users, and therefore proves incapable of adjusting its parameters and recommendations according to the constraints, objectives, specificities, work habits, and personal preferences of each employee. The AI is thus in some sense forced to arbitrate continuously between divergent interests that relate at once to each person’s working style, their hierarchical position in the organization, the specificities of their function, their objectives, their quality and performance criteria, the meaning they give to their work, their personality, their well-being, etc. On what basis, and according to what “average” criteria, could the AI organize the priorities of all employees? Whose interests should it privilege when scheduling a meeting involving several functions? These questions illustrate the fundamental difficulty posed by the delegation of tasks emanating from multiple people or groups of people to one and the same shared AI, which can only make global decisions and cannot fully satisfy each actor’s real interests and priorities.
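A toy illustration of this arbitration problem, with invented preference scores: a shared scheduler that minimizes total dissatisfaction across employees picks a slot that is optimal in aggregate, yet is nobody's first choice.

```python
# Invented dissatisfaction scores (0 = ideal slot, higher = worse) of four employees
# for three candidate meeting slots. The shared assistant can only optimize some
# aggregate of these, not each person's real priorities.
preferences = {
    "sales":   {"9am": 0, "1pm": 2, "5pm": 6},
    "legal":   {"9am": 6, "1pm": 2, "5pm": 0},
    "support": {"9am": 0, "1pm": 2, "5pm": 6},
    "ceo":     {"9am": 6, "1pm": 2, "5pm": 0},
}
slots = ["9am", "1pm", "5pm"]

# The shared AI's "global" decision: minimize total dissatisfaction.
chosen = min(slots, key=lambda s: sum(p[s] for p in preferences.values()))
print("chosen slot:", chosen)  # 1pm, which no one ranked first

for who, prefs in preferences.items():
    favourite = min(slots, key=lambda s: prefs[s])
    print(f"{who}: favourite={favourite}, got={chosen}, dissatisfaction={prefs[chosen]}")
```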
Multiple AIs and multiple humans: The third and last configuration evoked by Critch and Russell—the most complex, but also the most likely to spread in today’s world—designates a situation in which multiple actors or groups of people each separately delegate tasks to multiple AI systems or algorithms operating autonomously, but capable of interacting with one another in a given context or environment. In other words, each system has been programmed to pursue its own local objectives (maximize a gain, etc.), while there is no global coordination process charged with aligning their behaviors adequately and optimizing the collective outcome. The example of the “flash crash” of May 6, 2010, mentioned above, is a typical case of what the authors call a multiple/multiple delegation.
Willful indifference (4). – Unlike risk families (2) and (3), which describe situations, accidents, or effects not foreseen by designers—in short, unintentional side effects or an absence of intent to harm, as in (1)—the fourth category concerns assumed choices. It refers to deliberate decisions made by certain actors to ignore risks that have nevertheless been identified, in order to favor their own interests or those of their organization. This category thus introduces a more explicit ethical dimension than the previous three, since it directly questions designers’ moral conscience as well as corporate social responsibility when economic interests conflict with obligations toward clients and society. It raises a fundamental question: what governance mechanisms should be put in place to constrain companies—whose ends are above all economic and financial—to account for the potentially harmful impacts of the AI systems and algorithms they design and put on the market?
This is all the more true, the authors note, insofar as the probability of the three types of risks already mentioned—namely: 1) the risk of responsibility diffusion when it is impossible to clearly identify the party responsible for harm; 2) the risk of scaling an innovation initially designed at small scale; 3) the risk of the emergence of unforeseen side effects—increases when stakeholders involved in developing and deploying an AI become indifferent to the prejudicial consequences it can have for individuals and society. In other words, an already critical situation—like those mentioned previously—can continue to worsen because of 1) technical failures that emerge when the system is scaled, 2) and the absence of moral responsibility on the part of those who designed the system and choose to maintain it as it is.
Hence the notion of “willful” or “deliberate” indifference used by Critch and Russell to qualify this type of risk, which refers to two strong ideas: 1) the actors involved in designing the system are fully aware of the risks and harms it can generate; 2) these same actors have chosen to ignore them in the name of economic and financial interests they consequently judge more important than the preservation of fundamental rights and the moral protection of system users. In this framework, only the threat of public exposure of these actors’ opportunistic behavior could constrain them either to correct the AI’s technical failures or to halt its commercialization.
The authors take the example of a “harmful A/B testing tool” operating on the basis of an algorithm charged with continuously testing multiple variants of the same service offered on the platform of a major technology company called X-Corp (one may think, for example, of a social network like Facebook). Its goal: to increase the number of its users. Yet, over time, the AI system, learning autonomously, “discovers” that it can accelerate the platform’s growth by inciting users to create problems among themselves that only X-Corp’s tools can resolve. The company thus sees its number of users—and its revenues—grow rapidly, until the moment an employee reports that no controls have been put in place to evaluate the real benefits users derive from this A/B testing system. An ethical audit is eventually launched, revealing: 1) the system’s opacity and the impossibility of understanding its functioning or its decisions given the current state of technology; 2) the absence of any legal obligation requiring that an A/B testing system be intelligible during an audit. As a result, X-Corp can continue to grow, even if that growth comes at the expense of users’ well-being, trust, and autonomy.
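The mechanism at work in the X-Corp scenario can be caricatured in a few lines, with invented variants and numbers: an A/B testing loop keeps whichever variant maximizes a growth metric, and since user well-being is never measured, a variant that grows the platform by creating problems for its users is retained just as readily as any other.

```python
import random

random.seed(2)

# Invented variants. "engagement" is what the A/B test measures; "wellbeing" exists
# in the world but is never fed back into the optimizer.
VARIANTS = {
    "baseline":                 {"engagement": 0.50, "wellbeing": +1.0},
    "problem-creating variant": {"engagement": 0.65, "wellbeing": -1.0},
}

def ab_test(variant_a: str, variant_b: str, n_users: int = 50_000) -> str:
    """Keep whichever variant produces more engaged users; well-being is simply not measured."""
    def engaged_users(variant: str) -> int:
        p = VARIANTS[variant]["engagement"]
        return sum(random.random() < p for _ in range(n_users))
    return variant_a if engaged_users(variant_a) >= engaged_users(variant_b) else variant_b

winner = ab_test("baseline", "problem-creating variant")
print("deployed variant:", winner)
print("well-being impact of what was deployed:", VARIANTS[winner]["wellbeing"])
```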
What should we retain from X-Corp’s example, which recalls—without ever naming them—a number of practices widely denounced by former executives or repentant engineers of technology companies, in a famous documentary available on Netflix: The Social Dilemma (Derrière nos écrans de fumée)? That a risk does not result only from a technical failure, but also—and above all—from morally questionable choices made by increasingly powerful actors, consisting in willfully ignoring the harmful consequences a technology may have on users, because of a conflict of interest between immediate economic profit on the one hand and social responsibility on the other. In the case of an AI system, resolving this kind of conflict seems possible only under certain conditions, which the authors summarize as follows:
Assessing the impact of technologies on human life: X-Corp’s example highlights a classic dilemma in business ethics: the difficulty of articulating short-term economic profitability imperatives with the company’s social responsibility toward its users and, more broadly, toward society. In the case of designing an AI system, the problem is therefore not only technological, but also—and above all—structural. It requires taking into account the social and economic context of deployment, as well as the often divergent interests of the actors involved. Hence the need, according to Critch and Russell, to establish a system of societal evaluation of AI, with the requirement that technology companies with millions of users precisely report the ways in which their AI systems truly affect users (mental health, democracy, social cohesion, discrimination, etc.).
Developing explainable and interpretable AI: But assessing the human and societal impacts of AI systems in turn requires resolving another problem also highlighted by X-Corp’s example: system opacity, more commonly called the “black box.” Most current systems are based on deep neural networks (connectionist AI) whose decisions are nearly impossible to understand, all the more so for an ordinary person. It is precisely this lack of transparency that makes ethical or legal audits problematic. The authors thus recommend favoring research and development of clearly interpretable models, and even abandoning “black box” approaches when human stakes are high, as in health, justice, education, or employment.
Criminal weapon (5). – The fifth major category of risks arises when criminals or organized groups use AI systems in order to harm society intentionally. Here, the issue is no longer the good or bad intention of the original designers, but the possibility of diverting a technology from its primary function to make it a malicious tool. In other words, while the first, second, third, and fourth categories of risks interrogate the responsibility of those who designed and deployed an AI system (or systems), the fifth emphasizes intentionally malicious uses by third parties (criminals, terrorist groups, etc.) and the need to strengthen AI systems’ security against risks of hacking, fraud, disinformation, or cyberattacks.
The authors give simple but sufficiently telling examples: that of a drone piloting algorithm initially designed for parcel delivery, which could be diverted to transport explosive charges; or that of a digital therapy algorithm whose purpose might be modified so as to inflict psychological trauma rather than treat it. Hence the central question occupying the authors, and which remained, at the time the article was written, largely open: what techniques could prevent the modification of AI systems for intentionally harmful ends? Critch and Russell conclude that it is necessary to make obfuscation techniques—consisting in masking sensitive information or the internal functioning of an algorithm—more effective, in order to strengthen the security and robustness of AI systems in the future.
State weapon (6). – The sixth and last category of risks highlights the geopolitical dimension of criminal risk, insofar as the relevant actors are no longer criminal or terrorist groups, but States and their governments. Two types of dangers appear here: 1) abusive use of AI systems by States, whether in armed conflict (development of lethal autonomous weapons) or for surveillance and control of their population; 2) the arms race, with each State seeking not to fall behind the others in order to guarantee its security. This dynamic would have the consequence of increasing international tensions and contributing to the instability of the global geopolitical order.
What to do, indeed, the authors ask, when powerful States, endowed with strong scientific, technological, and military capacities, develop AI systems intended for war? The question is no longer how to prevent the illicit diversion of AI systems by non-state actors, but rather what military uses of AI can be made by actors recognized by international law, namely States, which hold, according to Max Weber’s famous formula, the monopoly of legitimate violence.
The authors immediately dismiss the idea of an automated war without human losses, in which autonomous drones would fight in the place of biological soldiers. This, according to them, would be only the prelude to an escalation of such conflicts, ultimately leading to an unprecedented level of violence and mass slaughter. The argument of a victimless war that would reduce the moral and political cost of violence had already been widely criticized in an open letter dated July 27, 2015—signed among others by Stuart Russell, Elon Musk, Stephen Hawking, Steve Wozniak, and Noam Chomsky—calling for a ban on lethal autonomous weapons.
Would the solution come from computer engineers, who could refuse, for moral reasons, to participate in projects linked to military uses of AI? This attitude, as noble as it may be, would not suffice to solve the global problem of AI’s military applications. The authors’ originality ultimately consists in reversing the perspective, and in questioning the possibility—perhaps utopian—of a positive use of AI when it is placed not in the service of war, but in that of world peace.
The parallel with nuclear weapons seems, at first glance, obvious. In the same way as the latter, some indeed associate artificial intelligence with a potentially massive destructive power, capable of profoundly transforming how we conceive, plan, and perceive armed conflicts between States. This is a point we have developed extensively elsewhere, in our commentary on the letter cited above: Seven arguments that plead for a ban on lethal autonomous weapons. Just as nuclear proliferation has constituted—and still constitutes—a major international security issue, artificial intelligence, when applied to war, has become an object of growing concern. The UN, as well as numerous non-governmental organizations, moreover urgently call for international regulation of lethal autonomous weapons, in order to prevent a day when machines could decide, fully autonomously, to take human lives.
Is this to say that AI, because of the potential threat it represents and the existential risks about which more and more researchers and entrepreneurs are warning (think of Elon Musk, Sam Altman, Demis Hassabis, Nick Bostrom, Geoffrey Hinton, Yoshua Bengio, Stuart Russell), could become—like nuclear weapons—a factor of stability in international relations? Yes, the authors reply, but with a fundamental difference: AI would be less a means of deterrence based on the threat of its use (the balance of mutual fear) than a tool of cooperation and mediation between rival powers, intended to foster conflict resolution and negotiation between States.
In other words, thanks to its formidable capacities to process astronomical quantities of data, AI could provide States with objective and mutually beneficial reasons not to enter into conflict, and thus promote the establishment of international relations founded not on a balance of terror—as in the case of nuclear weapons—but on a digital and algorithmic diplomacy grounded in rationally defined common interests.
To illustrate their point, Critch and Russell give the example of an AI that “could facilitate resource sharing or the negotiation of mutually advantageous international peace treaties,” or “mechanisms for sharing control of powerful AI systems that would make it possible to avoid conflicts over their use.” While they are perfectly aware of the somewhat utopian character of this proposal, as well as the considerable stakes AI represents for future international relations, the authors call on the scientific community to explore how artificial intelligence technologies might contribute to resolving geopolitical tensions: 1) by facilitating dialogue and negotiation between countries; 2) by encouraging shared management (governance) of the most critical technologies.