The United Nations opened three days of talks in Geneva on June 15 about artificial intelligence in the military domain and its implications for international peace and security.
The subject is broader than autonomous weapons. Military AI can support intelligence analysis, target identification, command decisions, logistics, cyber operations, and the operation of weapon systems. Those applications do not carry identical risks, but they share a difficult question: what happens when software influences a decision whose consequences cannot be undone?
The meeting at the Palais des Nations runs from June 15 through June 17. Its status should be described accurately: these are informal multi-stakeholder exchanges, not treaty negotiations, and the meeting will not itself produce binding international rules.
Its value lies elsewhere. The discussion is moving beyond general promises to use AI responsibly and toward the operational requirements that make responsibility real: testing, traceability, human judgment, civilian protection, accountability, and limits on particular uses.
Military AI is already an operational issue
The category "military AI" can sound like a future scenario, but AI-enabled systems are already being integrated into military planning and operations. Human Rights Watch's background briefing for the Geneva meeting points to uses in target recognition, intelligence fusion, weapon guidance, air and missile defense, cyber operations, and systems that accelerate decisions about the use of force.
Speed is often presented as the advantage. A system can process more imagery, signals, or reports than a human team can examine manually. It can surface patterns, rank possible targets, or recommend a response while an operation is unfolding.
That same speed can compress the time available to verify a target, test an assumption, assess possible civilian harm, or challenge the machine's recommendation. Faster analysis is useful only when the people responsible for the decision still have enough information and time to exercise judgment.
Testing a model is not the same as testing a weapon
Traditional military testing tries to establish how a system behaves under defined conditions. Machine-learning systems complicate that process. Their behavior depends on training data, operating data, deployment context, model updates, and interactions that cannot always be reproduced in a laboratory.
Human Rights Watch argues that adoption is moving faster than testing, evaluation, verification, and validation. Its briefing warns that simulations and formal methods may reveal only part of an AI system's behavior, while opacity makes it harder to explain why a system produced a particular output.
The International Committee of the Red Cross similarly emphasizes the limitations of military AI: unreliable or unrepresentative data, brittleness in unfamiliar conditions, adversarial manipulation, and a tendency for people to place too much confidence in automated recommendations. In a high-stakes setting, a plausible answer is not enough. Operators need to understand where a system is reliable, where it is uncertain, and when it should not be used.
This turns testing into a continuous responsibility rather than a one-time gate. A system that performed acceptably before deployment may behave differently after a model update, a data-source change, or a shift to another environment.
Human control has to be more than a button
Many military AI principles say a human will remain "in the loop." That phrase can conceal more than it clarifies.
A person may technically approve a recommendation while having only seconds to respond, little visibility into the supporting evidence, and strong organizational pressure to accept the machine's output. Under those conditions, the human can become a confirmation step rather than an independent decision-maker.
Meaningful control requires enough time, context, training, and authority to question the system. It also requires a practical way to stop or override an automated process without creating another danger. If an operator cannot see the source of a recommendation, inspect its confidence and limitations, or escalate uncertainty, the presence of an approval button does not solve the accountability problem.
This is why the Geneva conversation matters. It creates space to ask which decisions must remain human, what evidence must accompany an AI recommendation, how operators can contest it, and whether some uses should be prohibited regardless of claimed accuracy.
Accountability follows the decision chain
AI can distribute responsibility across model developers, data suppliers, defense contractors, commanders, and frontline operators. When a system contributes to an unlawful or harmful outcome, that technical supply chain can make it difficult to reconstruct what happened and who had the power to prevent it.
Traceability therefore has to cover more than a model's final output. Investigators may need to know which model version was active, what data entered the system, what alternatives it produced, what warnings were shown, who reviewed the recommendation, and whether a human changed or accepted it.
The ICRC's position starts from existing international humanitarian law: people remain responsible for the legal judgments involved in planning and carrying out attacks. AI does not transfer that responsibility to a model or its vendor. The harder policy question is how military systems and procedures must be designed so that human responsibility remains exercisable in practice.
Informal does not mean inconsequential
The June meeting is not expected to settle the debate. States differ on military advantage, acceptable risk, transparency, procurement, and whether new international law is needed. Civil-society groups are also pressing for stronger restrictions than many governments currently support.
Human Rights Watch is using the meeting to call for a moratorium on AI as the basis for targeting decisions until stronger safeguards are in place, along with greater transparency, meaningful human control, and red lines around military uses. Those are advocacy recommendations, not agreed UN policy. The distinction matters.
What the meeting can do is narrow the distance between abstract principles and concrete obligations. Future negotiations will be more useful if they are organized around identifiable applications and measurable safeguards rather than the word "AI" as one undifferentiated category.
That practical framing also connects with the broader national-security infrastructure questions we examined in our article on the U.S. national-security AI directive. Procurement, evaluation, vendor dependence, secure deployment, and human accountability are not separate from governance. They are where governance becomes operational.
What this means for technology teams
Most product teams will never build a targeting system. The engineering lessons still apply wherever AI influences a high-consequence decision.
First, constrain authority. An AI system should be allowed to perform only the actions its testing, monitoring, and failure controls can support. A recommendation engine should not quietly become an autonomous decision-maker because the surrounding workflow makes approval automatic.
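One way to picture bounded authority is an explicit allowlist that is checked before any action runs. The sketch below is illustrative only: `Action`, `AuthorityPolicy`, `AuthorityError`, and `authorize` are hypothetical names, and it assumes a team has already decided which actions its testing and failure controls can support.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    RECOMMEND = "recommend"  # surface a suggestion to a human
    EXECUTE = "execute"      # carry out the action itself


class AuthorityError(Exception):
    """Raised when a requested action exceeds the granted authority."""


@dataclass(frozen=True)
class AuthorityPolicy:
    # Hypothetical: actions that testing and failure controls are assumed to cover.
    allowed: frozenset
    # Hypothetical: actions that also need a named human approver.
    needs_approval: frozenset = frozenset()


def authorize(policy: AuthorityPolicy, action: Action,
              approver: Optional[str] = None) -> None:
    """Return None if the action is permitted; raise AuthorityError otherwise."""
    if action not in policy.allowed:
        raise AuthorityError(f"{action.value} is outside granted authority")
    # An empty or missing approver never counts as approval, so the workflow
    # cannot turn approval into an automatic default.
    if action in policy.needs_approval and not approver:
        raise AuthorityError(f"{action.value} requires a named human approver")
```

With a policy that allows only `Action.RECOMMEND`, a call to `authorize(policy, Action.EXECUTE)` raises `AuthorityError`. The point of the design is that expanding authority requires changing the policy deliberately instead of letting the surrounding workflow drift.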
Second, make inputs and assumptions traceable. Teams should be able to reconstruct which data, model version, configuration, and policy produced an important output. This is closely related to the jurisdiction-aware controls discussed in our article on the emerging state AI rulebook: disclosure and accountability have to be designed into the product rather than added after a failure.
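A minimal sketch of such a trace record follows, using only the Python standard library. The field names, the `DecisionTrace` class, and the helper functions are assumptions made for illustration; a real system would choose its own schema and storage.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def digest(value) -> str:
    """Stable SHA-256 digest of any JSON-serializable value."""
    # Sorting keys makes the digest independent of dictionary ordering.
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DecisionTrace:
    # Hypothetical schema: one record per important output.
    model_version: str
    config_digest: str
    policy_id: str
    input_digest: str
    output: str
    reviewer: str
    reviewer_action: str  # e.g. "accepted", "changed", "rejected"


def record_trace(model_version, config, policy_id, inputs, output,
                 reviewer, reviewer_action) -> DecisionTrace:
    """Build a trace that stores digests of the config and inputs."""
    return DecisionTrace(
        model_version=model_version,
        config_digest=digest(config),
        policy_id=policy_id,
        input_digest=digest(inputs),
        output=output,
        reviewer=reviewer,
        reviewer_action=reviewer_action,
    )


def to_log_line(trace: DecisionTrace) -> str:
    """Serialize the trace as one JSON line for an append-only log."""
    return json.dumps(asdict(trace), sort_keys=True)
```

Storing digests instead of raw inputs is one possible choice here; it lets a reviewer later confirm that archived data matches what the system saw, but it assumes the raw inputs are retained somewhere else.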
Third, design human review as a real capability. Reviewers need time, relevant evidence, uncertainty signals, and authority to reject the recommendation. They also need an escalation path when the system encounters a case outside its tested conditions.
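The review requirements above can be sketched as a small gate in front of the reviewer. Everything named here (`ReviewPacket`, `route`, `finalize`, and the routing labels) is hypothetical, and the sketch assumes the system can report whether a case falls inside its tested conditions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReviewPacket:
    recommendation: str
    evidence: List[str] = field(default_factory=list)
    confidence: Optional[float] = None       # uncertainty signal, if any
    within_tested_conditions: bool = False   # defaults to the cautious value


def route(packet: ReviewPacket) -> str:
    """Decide where a recommendation goes before anyone can approve it."""
    if not packet.within_tested_conditions:
        return "escalate"    # outside tested conditions: use the escalation path
    if not packet.evidence or packet.confidence is None:
        return "incomplete"  # reviewer would lack evidence or uncertainty signals
    return "review"


def finalize(packet: ReviewPacket, decision: Optional[str]) -> str:
    """Require an explicit reviewer decision; there is no default acceptance."""
    if route(packet) != "review":
        raise ValueError("packet is not eligible for normal review")
    if decision not in ("accept", "reject"):
        raise ValueError("reviewer must explicitly accept or reject")
    return packet.recommendation if decision == "accept" else "rejected"
```

The gate cannot give a reviewer more time or authority, which remain organizational questions. It only ensures that a missing decision or missing evidence never resolves silently in the system's favor.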
Finally, build an override that works under pressure. A safeguard that is slow, obscure, or available only to an administrator is unlikely to help when a consequential automated process is already moving.
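As a rough illustration, an override can be modeled as a stop switch that the automated process checks before every step. The sketch uses `threading.Event` from the Python standard library; `StopSwitch` and `run_steps` are hypothetical names, and it assumes the process can be broken into steps that are safe to halt between.

```python
import threading
from typing import Callable, List, Optional


class StopSwitch:
    """A stop signal any authorized operator can trigger, not only an admin."""

    def __init__(self) -> None:
        self._event = threading.Event()
        self.stopped_by: Optional[str] = None

    def stop(self, operator: str) -> None:
        # Record who stopped the process before raising the signal.
        self.stopped_by = operator
        self._event.set()

    def is_set(self) -> bool:
        return self._event.is_set()


def run_steps(steps: List[Callable[[], str]], switch: StopSwitch) -> List[str]:
    """Run steps in order, checking the switch before each one."""
    completed = []
    for step in steps:
        if switch.is_set():
            break  # halt between steps; nothing further runs
        completed.append(step())
    return completed
```

Checking the switch before each step keeps the stop latency bounded by the length of one step. Whether halting between steps is itself safe depends on the process, which is the "without creating another danger" problem a real design would still have to address.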
The conversation is becoming more specific
The Geneva exchanges will not create a military AI treaty in three days. They may still mark a useful shift in the international conversation.
"Responsible AI" is too broad to protect anyone on its own. The meaningful questions are narrower: What was the system permitted to do? How was it tested? Which evidence did the human receive? Could the recommendation be challenged? Who could stop the process? Who remains accountable afterward?
Military AI makes those questions unusually urgent because mistakes can cost lives and destabilize conflicts. But the underlying standard belongs wherever software is given influence over decisions that matter: authority must be bounded, evidence must be inspectable, humans must be able to intervene, and responsibility must remain clear.