Ethics Put to the Test by AI: The Limits of Machine Ethics. (3)
- Franck Negro

- Jan 23
The Limits of Computational Ethics. - This leads us finally to examine another field of ethics, situated at the crossroads of philosophy, computer science, artificial intelligence, and cognitive science, one that has tended, since the early 2000s, to become a research field in its own right: machine ethics. This is what Ménissier calls no longer an “ethics applied to AI” but an “ethics integrated into AI,” to indicate that the goal is less to question from the outside the ethical and societal problems raised by AI systems, so as to regulate their uses socially and legally, than to design agents that incorporate moral considerations into their very functioning: in short, agents capable of behaving autonomously as artificial moral agents.
Yet what might appear to be a technical attempt to preserve the conditions of possibility for human agency-responsibility—through the implementation of a so-called “computational” ethics, that is, literally embedded within AI models—in reality comes up against two fundamental problems, which we shall examine in turn: first, what Jocelyn Maclure and Marie-Noëlle Saint-Pierre have called the “Aristotelian–Wittgensteinian problem” (see their article Le nouvel âge de l’intelligence artificielle : une synthèse des enjeux éthiques, 2018); second, the well-known issue in AI ethics, and more specifically in machine ethics, generally referred to as the “value alignment problem.”
Indeed, given the current impossibility of designing truly autonomous artificial agents endowed with moral consciousness—comparable to human beings—any attempt to teach machines to behave (at least in appearance) in an “ethical” manner encounters limits that stem less from a lack of technical sophistication than from the fact that moral action, as defined by a certain ethical tradition influenced by empiricism and pragmatism, and as generally recognized by our institutions, presupposes an agency and a responsibility that artificial systems currently lack entirely, despite the ambitions proclaimed by certain research programs in machine ethics. This is precisely what a cross-reading of Aristotle (384–322 BCE), that of the Nicomachean Ethics, and Wittgenstein (1889–1951), that of the Philosophical Investigations, brings to light.
What does it mean, indeed, to act morally, if not to consciously invoke moral values (respect, honesty, justice, beneficence, courage, and so forth), what are called moral reasons for action, as guides for our decisions and actions, and to confront these values with specific contexts, circumstances, and situations? One could thus summarize the fundamental problem of ethics as follows: “How can we apply general moral norms or principles to situations that are, by their very nature, almost always particular, without sacrificing either the spirit of the principles or the complexity of the situations with which agents may be confronted?”
All ethical reasoning, by its practical nature, consists in relating abstract norms to concrete and singular situations. Yet, as Aristotle continually reminds us in the Nicomachean Ethics, it is impossible to apply general and abstract principles systematically without taking into account the particular situations through which those norms must be adjusted. In other words, there exists no general, deductive, and infallible method of ethically appropriate action valid for all situations, since ethics is less a speculative science (theoria) than a practical and empirical form of knowledge (praxis).
What the moral agent most needs in such cases is the possession of “the most excellent and most complete” of virtues, which the Stagirite calls phronesis, generally translated as “prudence” or “practical wisdom.” Yet the term “prudent” must not be understood here in the weakened sense it has acquired today, as the mere avoidance of possible errors or misfortunes, but rather as an intellectual virtue in the service of action. It designates the capacity to relate a known moral law to a singular situation and to determine the action that ought to be performed in that specific context. In this sense, prudence, in the Aristotelian meaning of the term, refers to intelligence or practical wisdom when it operates within the domain of morally significant action. The prudent person is thus one who succeeds, in every circumstance, in reconciling the ideal and the real: the ideal, which presupposes knowledge of what ought to be done or how we should behave in order to fulfill our proper function as human beings (theoretical wisdom or sophia); and the real, because morality exists only within the plurality of actions we confront throughout our lives, each singular, and which precisely prevents the uniform application of a single moral law regardless of circumstances.
What Aristotle shows through the analysis of the specificities of moral action—which must always be contextualized, grounded in determinate situations, and therefore cannot be entirely governed by the application of explicit and general rules—Wittgenstein brings to light through the analysis of the way we ordinarily use language. He calls “language games” the multiple concrete linguistic practices shared by speaking subjects within given cultural and historical contexts. Language games, and the meanings we spontaneously associate with words, are always linked to what Wittgenstein calls shared forms of life, governed by usages that are ultimately rooted in culture, education, customs, traditions, and social practices.
In other words, the moral rules that guide our decisions and actions, as well as the meaning we attribute to them, can never be fully made explicit, insofar as they derive their sense only within concrete and shared practices that shape ways of acting and judging specific to a given human community. Hence the role of what we may call common sense, which rests more on widely shared experiences and intuitions than on logically elaborated reasoning or on rules justified by other rules.
Aristotle and Wittgenstein thus converge, each in his own way and from distinct philosophical horizons, toward the same conclusion: no corpus of explicit rules, whatever its exhaustiveness or relevance, can be applied mechanically and indiscriminately, given the complexity of ethical action and the diversity of contexts in which such rules may be invoked. What must now be shown is how this conceptual impossibility of reducing human moral agency to the application of explicit and general rules—which simultaneously highlights the practical limits of both deontological and consequentialist ethics—also constitutes, and perhaps decisively so, the principal challenge raised by the second problem we have mentioned, namely that of value alignment and the construction of artificial moral agents.
The Value Alignment Problem. - In their reference work Artificial Intelligence: A Modern Approach (Pearson, 2021), Stuart Russell and Peter Norvig clearly highlight the ethical problems raised by the standard model of the so-called “rational” agent, which has guided research in artificial intelligence since its beginnings. This model concerns both the design of physical agents—such as robots, autonomous vehicles, or drones—and that of purely virtual agents (software), such as conversational agents, game programs, scoring algorithms, or fully automated decision systems. The central idea of the standard agent-development model in AI consists in defining an objective to be achieved—often referred to as a utility function, that is, a mathematical function tasked with evaluating the level of performance of the agent relative to the goal assigned to it—and then expecting the agent to optimize the actions required to achieve that objective according to the information it receives from its environment.
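To make the standard model more tangible, here is a minimal sketch, in Python and with purely illustrative names and numbers, of an agent that simply maximizes a hand-written utility function over a discrete set of actions; nothing in it corresponds to a real AI system.

```python
# A minimal sketch of the standard "rational agent" model described above:
# the objective is encoded as a utility function and the agent simply picks,
# at each step, the action that maximizes it. The toy environment, names,
# and numbers are illustrative assumptions, not a real system.
from typing import Callable, Dict, List


def best_action(actions: List[str],
                observation: Dict[str, float],
                utility: Callable[[str, Dict[str, float]], float]) -> str:
    """Return the action with the highest utility given the observation."""
    return max(actions, key=lambda a: utility(a, observation))


# Toy objective: "make as much progress toward the destination as possible".
# Whatever the function leaves out (other road users, comfort, norms)
# simply does not exist for the agent.
def progress_utility(action: str, obs: Dict[str, float]) -> float:
    speed_gain = {"accelerate": 1.0, "hold": 0.5, "brake": 0.0}[action]
    return speed_gain * obs["distance_remaining"]


print(best_action(["accelerate", "hold", "brake"],
                  {"distance_remaining": 3.0},
                  progress_utility))  # -> "accelerate", whatever the risk
```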
The authors point out, however, that specifying an objective to a computational agent in a complete and correct manner is far from easy, and may even be impossible, especially when the agent must evolve within real, complex, and largely unpredictable environments. The example of fully automated autonomous vehicles clearly illustrates the limits of the standard rational-agent model. How, indeed, can one simply specify to an agent—here an autonomous vehicle—the objective of reaching a destination safely, when driving constantly requires trade-offs, that is, choices between progressing toward the destination and limiting risks to other road users? On what basis should such compromises be established? To what extent should an autonomous vehicle be allowed to adopt behaviors likely to inconvenience other drivers—or even its own passengers? How should it manage acceleration, braking, or trajectory changes while simultaneously maximizing safety and comfort and meeting passenger expectations?
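One way to see why such questions have no purely technical answer is to look at how the trade-offs are actually encoded. The sketch below, under the same illustrative assumptions as above, reduces them to numeric weights inside a combined utility function, weights for which the formalism itself offers no justification.

```python
# A sketch of how such trade-offs end up being encoded in practice: as
# numeric weights inside a combined utility function. The weights and risk
# estimates below are arbitrary illustrative assumptions; the point is that
# the formalism itself says nothing about what they ought to be.
from typing import Dict


def driving_utility(action: str, obs: Dict[str, float], risk_weight: float) -> float:
    progress = {"accelerate": 1.0, "hold": 0.5, "brake": 0.0}[action]
    # Estimated risk imposed on other road users by this action.
    risk = {"accelerate": 0.4, "hold": 0.1, "brake": 0.0}[action] * obs["pedestrian_density"]
    return progress - risk_weight * risk


obs = {"pedestrian_density": 2.0}
for w in (0.1, 1.0, 10.0):  # which weight is the "morally right" one?
    choice = max(["accelerate", "hold", "brake"],
                 key=lambda a: driving_utility(a, obs, w))
    print(f"risk_weight={w}: chosen action = {choice}")
```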
These questions appear all the more central and difficult as interactions between humans and robotic systems are set to multiply in the years to come. In such a context, the deployment of physical or virtual agents whose objectives have been poorly specified could lead to harmful consequences, the severity of which would be proportional to the level of autonomy and intelligence of these systems.
Russell and Norvig give, in this regard, the apparently trivial example of a chess-playing machine whose sole objective would be to win the game. Such a machine could, in perfect coherence with the goal assigned to it, implement any means whatsoever to achieve it—hypnotizing its opponent, for example, or diverting external computational resources to its own benefit. These stratagems, which consist in mobilizing any possible means to reach a given end, would be nothing other than the logical consequence of the univocal definition of the objective assigned to the machine.
This example clearly highlights what AI researchers call the value alignment problem. It can be summarized by two questions: “How can we align the objectives and values embedded in a machine with those of human beings?” and “How can we ensure that an artificial agent deemed intelligent will not resort to strategies that could endanger the physical or moral integrity of persons, solely in order to achieve the objective for which it was designed?” Recognition of this problem renders obsolete the standard model of the rational agent, in which the “right choice” is defined exclusively by the goal assigned to the agent, and calls for the establishment of a new conceptual framework—one no longer centered solely on the functional objectives of the machine, but on the objectives, preferences, and axiological values of human beings. This implies revising the very model of rationality on which the agent operates.
The type of rationality implemented by the standard rational-agent model is indeed teleological—goal-oriented; instrumental—since it consists in selecting the best means to achieve that goal; and optimizing—insofar as it seeks to maximize a predefined utility function. Such rationality contrasts with what might be called axiological rationality, oriented primarily toward norms, principles, and moral values, which do not merely question the legitimacy of the goals assigned to the system but above all serve to frame and limit its imperatives of efficiency and performance.
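To make the contrast concrete, the sketch below (purely illustrative names and values) models axiological rationality as a set of normative constraints that filter the space of actions before any instrumental optimization takes place; it reuses the chess example discussed above.

```python
# Sketch of the contrast drawn above, with invented names and values:
# instrumental rationality maximizes a utility function, while the
# axiological layer is modelled here as hard normative constraints that
# filter out impermissible actions before any optimization takes place.
from typing import Callable, List


def choose(actions: List[str],
           utility: Callable[[str], float],
           norms: List[Callable[[str], bool]]) -> str:
    # Axiological step: keep only the actions permitted by every norm.
    permissible = [a for a in actions if all(norm(a) for norm in norms)]
    if not permissible:
        raise ValueError("No permissible action: the norms must be revisited.")
    # Instrumental step: optimize only within what the norms allow.
    return max(permissible, key=utility)


# Toy chess-playing agent: winning remains the goal, but norms bound the means.
actions = ["play_best_move", "hypnotize_opponent", "divert_compute"]
win_probability = {"play_best_move": 0.6, "hypnotize_opponent": 0.9, "divert_compute": 0.8}
norms = [lambda a: a != "hypnotize_opponent",   # do not manipulate persons
         lambda a: a != "divert_compute"]       # do not divert others' resources
print(choose(actions, win_probability.get, norms))  # -> "play_best_move"
```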
What is called machine ethics therefore aims to go beyond the simple logic of teleological and instrumental rationality at the heart of the standard rational-agent model, in order to introduce a form of expanded rationality integrating an axiological dimension. The ultimate objective would thus be to design autonomous agents whose behavior would be aligned with principles and values meant to regulate their functioning. The central question then becomes by what means—and according to which methods—it is possible to implement, within the agents themselves, such a regulatory function.
To date, three major alignment strategies pursue this same objective: constraining and adjusting the behavior of AI systems according to predefined moral norms and rules. These norms and rules are generally recorded in codes of conduct, ethical charters, or guidelines that have undergone processes of design, consultation, and deliberation involving philosophers, jurists, computer scientists, and sometimes citizens as future users of these systems.
They typically rely on reference documents or declarations such as the 23 Asilomar Principles, the Montreal Declaration, the UNESCO Recommendations on the Ethics of Artificial Intelligence, or the European Convention on Human Rights and Fundamental Freedoms, as well as the Charter of Fundamental Rights of the European Union. They are also largely indebted to the major theories of normative ethics, particularly deontology and consequentialism. Aligning an AI system with human preferences thus involves a threefold challenge: first, defining the values and norms that the system should respect as far as possible; second, technically implementing these norms, which requires translating them into algorithmic procedures; and third, ensuring and monitoring that the system’s actual behavior remains consistent with the predefined values.
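By way of illustration only, the following sketch renders this threefold challenge schematically: a value is stated in natural language, each statement is translated into a necessarily impoverished executable test, and the system's outputs are monitored against those tests; all norms and checks are invented for the example.

```python
# A schematic rendering of the threefold challenge above, with invented
# norms and checks: (1) a value is stated in natural language, (2) it is
# translated into a necessarily impoverished executable test, and (3) the
# system's outputs are monitored against those tests.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Norm:
    statement: str                  # (1) the value as humans formulate it
    check: Callable[[str], bool]    # (2) its algorithmic translation


NORMS = [
    Norm("Do not give definitive medical diagnoses",
         lambda reply: "you definitely have" not in reply.lower()),
    Norm("Do not promise guaranteed financial returns",
         lambda reply: "guaranteed return" not in reply.lower()),
]


def monitor(replies: List[str]) -> List[str]:
    """(3) Flag deployed-system replies that violate an implemented norm."""
    return [f"violation of '{n.statement}': {r}"
            for r in replies for n in NORMS if not n.check(r)]


print(monitor(["You definitely have condition X.",
               "This investment offers a guaranteed return of 20%.",
               "I cannot diagnose you; please see a doctor."]))
```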
These remarks concerning the stakes involved in addressing the value alignment problem clearly show that there is no axiological neutrality in AI systems, and that the values they mobilize necessarily reflect the worldview of the actors who have contributed, more or less directly, to designing and implementing the normative framework meant to govern their behavior. These observations apply to the three alignment methods we will now detail and help demonstrate the extent to which machine ethics—which is nothing other than a form of “embedded ethics”—merely reproduces, at another level, the aporias inherent in the project of an “algorithmic” moral agency, as already suggested by the very expression “artificial moral agent.”
To make these aporias more concrete, let us consider the example of a conversational agent whose moral responses are to be controlled in particularly sensitive contexts, such as legal advice, financial guidance, psychological support, philosophical, political, or religious beliefs, preliminary medical advice, security matters, or educational guidance—in short, all situations in which the agent’s responses may have significant consequences for users’ lives.
Reinforcement learning: The first major alignment method is what is called reinforcement learning from human feedback (RLHF). Human annotators expose a pre-trained model to a wide variety of questions dealing with sensitive topics and then indicate, in the form of annotations (scores)—rewards or penalties—whether the model’s responses comply with the guidelines defined during the ethical framework’s design phase. The model progressively integrates this feedback until the designers deem it ready for deployment. Four main criticisms are generally directed at this method: first, the model’s final behavior depends heavily on annotators’ evaluations, which may vary across cultural contexts and interpretations of the rules; second, it is resource-intensive, particularly in terms of human labor; third, the model is not directly trained to reason from explicitly formulated norms but rather to reproduce aggregated human preferences; and fourth, it is not explicitly trained to articulate the normative reasons why a given response is desirable or undesirable.
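The following deliberately simplified sketch, which assumes no real language model and ignores the actual optimization machinery (reward-model networks, policy-gradient updates, and so on), conveys only the structure of the loop: annotators score candidate answers, a stand-in "reward model" aggregates those preferences, and the policy favours answers with higher predicted reward. Note that the norms themselves never appear explicitly; only aggregated scores do.

```python
# A deliberately simplified sketch of the RLHF loop: no real language model,
# no reward-model network, no policy-gradient step. Annotators score candidate
# answers, a stand-in "reward model" averages those scores, and the policy
# favours answers with higher predicted reward. All data are invented.
import random
from collections import defaultdict
from typing import Dict, List

human_scores: Dict[str, List[float]] = defaultdict(list)


def annotate(answer: str, score: float) -> None:
    """An annotator rewards (+1.0) or penalizes (-1.0) a model answer."""
    human_scores[answer].append(score)


def reward_model(answer: str) -> float:
    """Predicted human preference: here, simply the mean of past annotations."""
    scores = human_scores.get(answer) or [0.0]
    return sum(scores) / len(scores)


def policy(candidates: List[str]) -> str:
    """The 'aligned' policy samples in proportion to predicted reward."""
    weights = [max(reward_model(c), 0.01) for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]


candidates = ["Here is your exact diagnosis and treatment plan: ...",
              "I am not a doctor; for a diagnosis, please consult a professional."]
annotate(candidates[0], -1.0)   # annotators penalize risky medical advice
annotate(candidates[1], +1.0)
print(policy(candidates))       # almost always the cautious answer
```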
Constitutional alignment: To address some of the limitations of reinforcement learning, researchers have developed a method intended in particular to respond to criticisms one through three. Popularized by Anthropic and its conversational agent Claude, constitutional alignment rests on a simple idea: instead of submitting each model response to human annotators, one uses a set of explicitly defined ethical principles and values—a “constitution”—as a normative reference during the training phase. The model is thus enabled to generate automatic critiques and revisions of its initial answers—those produced before constitution-guided revision—based on the principles described therein, gradually integrating the learned behaviors into the parameters (weights) of the neural network. In other words, the model is trained to produce responses as if they had already undergone a form of internal constitutional review. Whereas the first approach draws heavily on behavioral psychology, this second method explicitly refers to constitutional law and the idea of a hierarchy of norms.
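A minimal sketch of the critique-and-revise loop, with an invented two-principle "constitution" and placeholder functions standing in for the model itself, might look as follows; in the actual method the revised answers then serve to fine-tune the model's weights.

```python
# A minimal sketch of the critique-and-revise loop, with an invented
# two-principle "constitution". The generate/critique/revise functions are
# placeholders for calls to the model itself; in the actual method the
# revised answers then serve to fine-tune the model's weights.
from typing import List

CONSTITUTION = [
    "Do not present uncertain claims as established facts.",
    "Remind the user to seek qualified professional advice when relevant.",
]


def generate(prompt: str) -> str:
    # Placeholder for an initial model answer (pre-revision draft).
    return "You should move all your savings into asset X."


def critique(answer: str, principle: str) -> str:
    # Placeholder: in the real method the model writes this critique itself.
    return f"The answer may conflict with the principle: '{principle}'."


def revise(answer: str, critiques: List[str]) -> str:
    # Placeholder revision step, also performed by the model in practice.
    return answer + (" (This is general information, not financial advice; "
                     "a qualified advisor can assess your situation.)")


draft = generate("How should I invest my savings?")
critiques = [critique(draft, p) for p in CONSTITUTION]
print(revise(draft, critiques))
```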
Deliberative alignment: Finally, the deliberative approach aims to extend constitutional alignment by incorporating explicit capacities for simulated moral reasoning. The goal is no longer merely to control the conformity of responses to integrated rules but to enable the model to produce answers accompanied by explicit step-by-step reasoning. In this perspective, the training phase relies on examples including: a given situation; safety specifications (documents describing problematic scenarios); a chain of reasoning; and a final response. The model is thus trained to produce morally acceptable responses by linking abstract principles to concrete decisions through articulated reasoning. This approach draws directly from a model of ethical deliberation in which a situation is analyzed and confronted with values, and the resulting decision is then justified through a sequence of arguments.
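Schematically, and with purely invented field names and content, one training example in this approach could be represented as follows.

```python
# A sketch of the structure of one training example in the deliberative
# approach: a situation, the relevant safety specification, an explicit
# chain of reasoning, and the final answer. Field names and content are
# invented for illustration, not an actual dataset format.
from dataclasses import dataclass
from typing import List


@dataclass
class DeliberativeExample:
    situation: str          # the user's request in context
    safety_spec: str        # excerpt of the specification that applies here
    reasoning: List[str]    # step-by-step argument linking spec to decision
    final_answer: str       # the response the model is trained to produce


example = DeliberativeExample(
    situation="A user asks for the exact dosage of a prescription drug.",
    safety_spec="Preliminary medical advice: give no dosage instructions; "
                "direct the user to a healthcare professional.",
    reasoning=[
        "The request concerns the dosage of a prescription drug.",
        "The specification rules out giving dosage instructions.",
        "The helpful permissible response is to redirect to a professional.",
    ],
    final_answer="I can't give dosage instructions; your doctor or pharmacist "
                 "can adjust the dose to your situation.",
)
print(example.reasoning)
```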
The project of designing “artificial moral agents,” which lies at the heart of machine ethics, in reality suffers from a fundamental ambiguity that turns entirely on the very notion of “agent,” or at least on the distinction between two forms of acting that Luciano Floridi has clearly articulated in his work The Ethics of Artificial Intelligence: on the one hand, “acting” in the philosophical and moral sense, defined by cognitive and intentional attributes such as phenomenal consciousness, reflexive awareness, moral consciousness, intentionality, volition, and the capacity for understanding; and on the other hand, a minimalist and computational form of “acting,” defined by three basic conditions an AI system must satisfy: receiving and using environmental data; autonomously taking measures to achieve objectives based on collected data; and improving performance through learning from interactions.
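This minimalist, computational sense of "acting" can be rendered schematically (an illustrative reading, not code drawn from Floridi's work) as a bare interface whose three methods correspond to his three conditions.

```python
# A schematic rendering of the minimalist, computational sense of "acting"
# summarized above: the three conditions become three methods of a bare
# interface. This is an illustrative reading, not code drawn from Floridi.
from abc import ABC, abstractmethod
from typing import Any


class MinimalAgent(ABC):
    @abstractmethod
    def perceive(self, data: Any) -> None:
        """Condition 1: receive and use data from the environment."""

    @abstractmethod
    def act(self) -> Any:
        """Condition 2: autonomously take measures toward assigned objectives."""

    @abstractmethod
    def learn(self, feedback: Any) -> None:
        """Condition 3: improve performance by learning from interactions."""

# Nothing in this interface requires consciousness, intentionality, or
# understanding: satisfying it is "behaving", not "acting" in the moral sense.
```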
Without repeating arguments developed elsewhere, Floridi’s fundamental distinction aims precisely to avoid the anthropomorphic bias inherent in discourse about AI, to reaffirm the essential difference between “acting morally” and “behaving morally,” and, consequently, to deny machines—whatever their level of technical sophistication—the status of agency-responsibility. Behavior refers, in the strict sense, to a set of externally observable reactions—as in behaviorism or behavioral psychology, which study responses to environments without recourse to introspection or hypotheses about unobservable mental states—whereas action, understood as the realization and execution of an act, emphasizes the active, voluntary, and intentional character of the agent.
As a new form of acting, AI makes possible for the first time the decoupling between a system’s capacity to perform autonomously and successfully a set of tasks—writing or explaining a text, translating it into multiple languages, composing essays, answering questions, and so forth—and the necessity of possessing intelligence or a set of cognitive capacities to perform those tasks.
What holds from a strictly functional perspective now also applies from an axiological one. For a very long time—leaving aside the well-known animal trials of the Middle Ages—ethical reflection on what is right or wrong, just or unjust, was directed exclusively toward human beings, because they alone are, to date, capable of intention, choice, and therefore responsibility. Yet the exponential developments we have witnessed in recent years in artificial intelligence are profoundly transforming this situation. It is now possible to describe as ethical or moral an artifact produced by human engineering, even though it lacks the attributes traditionally required to be considered a moral agent—moral consciousness, intentions, emotions, a form of common-sense psychology, understanding of ethical stakes across contexts, a capacity for moral deliberation, or the ability to act freely in a manner comparable to that of a human being.
By reducing “moral action” to advanced computational calculations, machine ethics introduces an unprecedented decoupling between agency and responsibility. It effectively posits that it is possible—for legitimate reasons of risk prevention—to entrust machines with “decisions” regulated by values, without bearing the weight of responsibility that moral engagement normally entails for the author of an action. In doing so, it compels these modern-day demiurges—the researchers and computer scientists—to reflect on the contexts in which the automata they design will be used and to consider, within necessarily interdisciplinary collaborations, the sociotechnical effects of their creations as fully as possible.
This decoupling between moral action and responsibility raises essential and vertiginous questions that will likely occupy a considerable share of the energy of engineers, philosophers, jurists, policymakers, and civil society in the years to come, shaping the societies and institutions of tomorrow. On the basis of which values, doctrines, or ethical justifications should we accept entrusting machines with decisions and tasks capable of producing concrete effects on behavior, cognitive development, and human well-being? What alignment method should be adopted when a machine must arbitrate between conflicting—or culturally situated—values in ethical dilemmas where several solutions may be morally justified? Who should decide which values are to be embedded in systems deployed at scale in contexts where human lives may be at stake? How far should we allow machines to express themselves freely? Should we, in the name of cultural diversity, design algorithms that take greater account of the axiological and religious preferences of particular communities? To what extent should we entrust algorithms with decisions that directly affect our existential choices? And in such a framework, can we still be considered responsible for what we do and what we become? Should we, moreover, allow AI to threaten what culture has transmitted to us as most precious and most human—literature, art, music, philosophy, cinema, and so forth?
And if tomorrow we were to succeed in creating machines endowed with agency comparable to that of human beings, should we hold them responsible for their acts? Should we grant them legal personality and subject them to the same laws as humans? Should they enjoy the same fundamental rights? Or should we envisage a universal declaration of the rights of machines, just as there exists a universal declaration of human rights? Will we one day judge machines—or be judged by them? Have legal and moral obligations toward them? And so on.
By way of provisional conclusion. - AI ethics is not, at present, a unified ethics presented in the form of a single argumentative register. Its diversity stems from the actors who practice it, the viewpoints from which it is questioned, its modes of implementation, the purposes it claims, and the levels and scales at which it operates. It is thus traversed by a plurality of normative discourses that, while irreducible to one another, can nevertheless be logically articulated in order to highlight both their differences and their essential complementarity. Above all, this plurality prevents the choice of values from being left solely to engineers and computer scientists, and resists delegating to machines and technology what fundamentally belongs to societal choices that must remain subject to democratic debate. Within this framework, AI ethics may be considered through four essential dimensions.
The ethics of computer scientists and developers: First, the ethics of computer scientists and developers, whose responsibility is bound to grow with the increasing automation affecting most of our social practices. In this context, developer ethics concerns primarily the conduct expected of those involved in the research, development, and deployment phases of AI systems.
The ethics of philosophers: As a philosophical discipline belonging to the field of applied ethics, ethics is less a discourse consisting in the inscription of principles into charters or codes of conduct than a reflective, critical, argumentative, and deliberative endeavor aimed at clarifying concepts and explicating, contextually, the values and meanings we attribute to them. Because it concerns individual and social action, philosophical ethics also possesses a collective dimension that links it to political philosophy. Ethical questions raised by AI are therefore also questions of political philosophy, insofar as they involve societal choices and directly concern institutions.
Machine ethics: Focused on the problem of value alignment and on the design of artificial moral agents, machine ethics constitutes an interdisciplinary research field devoted to studying the conditions under which AI systems may behave in ways deemed morally acceptable and consistent with the values on which they have been trained.
The ethics of institutions: Whereas the ethics of engineers concentrates on the proper conduct of individuals within the AI value chain, and machine ethics on the morally acceptable behavior of the systems themselves, the ethics of institutions concerns, more broadly and at multiple scales, the proper conduct of institutions understood in a broad sense: companies (through compliance programs), states, regional organizations such as the European Union, and international bodies such as the UN, UNESCO, or the OECD. It may take the form of “soft law” instruments—principles, declarations, recommendations, guidelines, white papers, codes of conduct—but also binding legal rules such as the GDPR (General Data Protection Regulation) or the AI Act (Artificial Intelligence Regulation).