Anthropic Blocks OpenAI API Over GPT-5 Benchmarking Dispute
By Diego Valverde | Journalist & Industry Analyst
Tue, 08/05/2025 - 09:40
Anthropic, the developer of the Claude family of AI models, has revoked Application Programming Interface (API) access for its direct competitor OpenAI. The company implemented the measure after discovering that OpenAI's technical staff used Anthropic's tools, specifically Claude Code, to develop and benchmark its upcoming GPT-5 model.
"Claude Code has become the go-to choice for coders everywhere, and so it was no surprise to learn OpenAI's own technical staff were also using our coding tools ahead of the launch of GPT-5. Unfortunately, this is a direct violation of our terms of service," says Christopher Nulty, Head of External Communications, Anthropic.
In response, Hannah Wong, Chief Communications Officer at OpenAI, characterizes the practice as standard across the industry for evaluating AI systems. "While we respect Anthropic's decision to cut off our API access, it is disappointing considering our API remains available to them," says Wong.
Anthropic's decision is indicative of the intensifying competition within the Generative AI sector. As large language models (LLMs) become more powerful and commercially viable, the companies developing them seek to protect their intellectual property and competitive advantages more rigorously.
The core of the conflict resides in Anthropic's commercial terms of service. These terms explicitly prohibit clients from using its services to "build a competing product or service, including training competing AI models" or to "reverse-engineer or duplicate" the services. Such contractual clauses are fundamental to the strategy of protecting intangible assets in a field where innovation is rapid and expensive.
Developing and training a model like Claude or GPT-5 requires massive investments in computing power, data, and human talent. Allowing a competitor to directly use an API to evaluate, compare, and potentially improve its own product undermines that investment. This incident, according to Wired, highlights a fundamental tension in the industry: the balance between collaborative or benchmarking practices that have historically driven technological progress and the need to protect commercial interests in a highly competitive market.
Ban Details
Reports indicate that OpenAI did not use the standard Claude chat interface. Instead, its teams accessed the models through special developer APIs, connecting Claude to their internal tools. This method allowed them to run systematic tests and performance benchmarks of Claude against their own models on specific tasks. These tasks included coding, creative writing, and, critically, safety responses.
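The kind of systematic, API-driven comparison described above can be sketched in broad strokes. The harness below is a hypothetical illustration only, not OpenAI's actual tooling: the model callables and tasks are invented stand-ins, and in practice each callable would wrap a provider SDK call (e.g. the `anthropic` or `openai` Python clients).

```python
# Hypothetical benchmark harness illustrating systematic model comparison
# over an API. The model functions and tasks are invented placeholders;
# real harnesses would call provider SDKs and use large task suites.
from typing import Callable, List, Tuple

def run_benchmark(
    tasks: List[Tuple[str, Callable[[str], bool]]],
    model_a: Callable[[str], str],
    model_b: Callable[[str], str],
) -> dict:
    """Send each (prompt, checker) task to both models and tally passes."""
    scores = {"model_a": 0, "model_b": 0}
    for prompt, passes in tasks:
        if passes(model_a(prompt)):
            scores["model_a"] += 1
        if passes(model_b(prompt)):
            scores["model_b"] += 1
    return scores

# Toy stand-ins for real API clients -- each returns a canned answer.
def stub_model_a(prompt: str) -> str:
    return "def add(a, b): return a + b"

def stub_model_b(prompt: str) -> str:
    return "I cannot help with that."

coding_tasks = [
    ("Write a Python function that adds two numbers.",
     lambda out: "return a + b" in out),
]

print(run_benchmark(coding_tasks, stub_model_a, stub_model_b))
# → {'model_a': 1, 'model_b': 0}
```

Run at scale against a competitor's production endpoint, a loop like this yields exactly the per-task head-to-head comparisons at issue in the dispute.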
The safety tests involved using prompts with sensitive content, such as child sexual abuse material, self-harm, and defamation, to audit and compare the models' safety barriers. OpenAI defends its methodology as a standard practice to "evaluate other AI systems to measure progress and improve safety." However, the fact that this evaluation occurred in the context of GPT-5's development appears to have been the breaking point for Anthropic.
The situation is complex because while Anthropic has cut off primary access, it has also communicated that it "will continue to ensure OpenAI has API access for benchmarking and safety evaluation purposes, as is standard industry practice," reports Wired. It remains unclear how this restricted access will be delineated to prevent a recurrence of the misuse.
This event is not the first of its kind for Anthropic. In June 2025, the company restricted access for the AI coding startup Windsurf following rumors of a potential acquisition by OpenAI. The measure also coincides with Anthropic's announcement of new weekly usage limits for Claude Code, a decision driven by unprecedented demand that has tested its infrastructure capacity.
This conflict raises significant legal and strategic questions for the B2B AI ecosystem. It opens a debate on the enforceability and scope of terms of service in a "black box" environment where monitoring the end use of an API is difficult. The outcomes of these disputes could set precedents on how competitor interactions are regulated, the definition of "fair use" for model evaluation, and the balance between intellectual property protection and fostering an open innovation ecosystem.