Comparative LLM Usage Across Sectors

Comparative real-world usage of LLMs and adjacent AI technologies from June 2025 to June 2026: which models (GPT-5, Claude, Gemini, Llama, Mistral, DeepSeek, Qwen) dominate which sectors, how they are deployed (hosted API, Bedrock/Azure, self-hosted vLLM/Ollama, RAG, agents, fine-tuning), what workloads they serve, and how organisations measure, budget, and publicly report token cost and actual spend.

Claude Opus 4.8
Tags: financial, frontier, academic, vc, blogs, tech

Synthesised 2026-06-20

Narrative

Developer usage of LLMs, as captured by the Stack Overflow 2025 Developer Survey (49,000+ respondents across 177 countries), is heavily concentrated at the top. OpenAI GPT models lead with 81–82% developer adoption, followed by Claude Sonnet models at 43–45% and Gemini Flash at 35% (where a range is given, the lower figure is for all respondents and the higher for professional developers). The survey, the first edition to ask about specific LLMs rather than AI tools in general, also found that 84% of developers use or plan to use AI tools, yet 46% distrust the accuracy of AI output, a trust deficit up 15 percentage points year-on-year. The 2025 DORA State of AI-assisted Software Development report, drawn from nearly 5,000 survey responses and more than 100 hours of qualitative interviews, confirmed near-universal adoption: 90% of respondents reported using AI in their daily software development work, for a median of two hours per day. Critically, DORA found that AI acts as a multiplier of existing team conditions, strengthening high performers while amplifying dysfunction in weaker organisations, rather than as a uniform productivity lever.

The Thoughtworks Technology Radar shifted markedly across its two 2025 editions. Volume 32 (April 2025) highlighted retrieval-augmented generation and prompt engineering as emergent practices. By Volume 33 (November 2025), the signal had moved to context engineering, the Model Context Protocol (MCP), and the growth of agentic systems, a transition Thoughtworks CTO Rachel Laycock described as a step change in industry thinking. InfoQ's practitioner coverage independently tracked the same arc: agentic AI moved from architectural curiosity to a distinct engineering discipline requiring specification-driven development, atomic decomposition, and observable workflows. A 2025 practitioner study of 306 engineers across 26 domains found that production agents are typically tightly bounded: 68% execute at most 10 steps before requiring human intervention, and 70% rely on prompting off-the-shelf models rather than tuning weights, with reliability cited as the primary development challenge. The ZenML analysis of 1,200 production LLM deployments reinforced this: the organisations extracting real value were those investing in evaluation pipelines and infrastructure-based guardrails, not those with the most impressive demos.
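The bounded-agent pattern described above can be sketched as a loop runner that enforces limits outside the model. This is a minimal illustration under stated assumptions, not code from the practitioner study or from ZenML: `run_bounded_agent`, `StepResult`, `AgentRun`, and the default limits (a 10-step cap echoing the 68% figure and a 50,000-token budget) are hypothetical, and `step_fn` stands in for whatever model call a real agent framework would make.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class StepResult:
    """Outcome of one model call inside the agent loop (hypothetical shape)."""
    tokens_used: int
    done: bool
    output: str = ""


@dataclass
class AgentRun:
    """Accumulated state for one user task."""
    steps: int = 0
    tokens: int = 0
    outputs: list[str] = field(default_factory=list)
    stop_reason: str = ""


def run_bounded_agent(
    step_fn: Callable[[int], StepResult],
    max_steps: int = 10,
    token_budget: int = 50_000,
) -> AgentRun:
    """Call step_fn until it reports done, the step cap is hit, or the token
    budget is spent. Either limit ends the run with an escalation reason so a
    human can take over instead of the loop running on."""
    run = AgentRun()
    while run.steps < max_steps:
        result = step_fn(run.steps)
        run.steps += 1
        run.tokens += result.tokens_used
        run.outputs.append(result.output)
        if result.done:
            run.stop_reason = "completed"
            return run
        if run.tokens >= token_budget:
            run.stop_reason = "token_budget: escalate to human"
            return run
    # Loop exhausted without completion: hand back to a human.
    run.stop_reason = "step_cap: escalate to human"
    return run


if __name__ == "__main__":
    # A step function that never finishes, standing in for a looping agent.
    def runaway(i: int) -> StepResult:
        return StepResult(tokens_used=4_000, done=False, output=f"step {i}")

    run = run_bounded_agent(runaway)
    print(run.steps, run.tokens, run.stop_reason)
    # 10 40000 step_cap: escalate to human
```

Enforcing the limits in the loop runner rather than in the prompt is one way to read ZenML's "infrastructure-based guardrails": the ceiling holds regardless of what the model returns.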

Deployment architecture is settling into a small set of well-defined patterns. Cloud-managed services (Azure OpenAI, Amazon Bedrock, Vertex AI) remain dominant for enterprise teams that lack their own inference infrastructure. Self-hosted inference on vLLM is consolidating as the production-grade open alternative: Stripe reportedly cut inference costs by 73% by migrating 50 million daily API calls to vLLM, running on one-third of its previous GPU fleet. Ollama serves prototyping and local development, and practitioners describe a clear migration path from Ollama to vLLM once a service takes on real user traffic. The Thoughtworks Radar noted a growing number of organisations moving toward self-hosted, governed deployments to meet multi-tenancy, access-control, and data-residency requirements. Open-weight adoption is expanding but still trails closed models in large enterprises: a16z's 2026 enterprise CIO survey found that 23% of surveyed enterprises were running OpenAI's o3 in production, against 3% for DeepSeek. Open-weight models, particularly Llama 4, DeepSeek V3/V4, and Qwen 3.x, are nonetheless gaining faster in regulated sectors, where third-party API calls face procurement or compliance obstacles.
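The choice between a cloud-managed API and self-hosting is, at its simplest, a fixed-versus-variable cost comparison. The sketch below is a deliberately simplified model, not Stripe's analysis or any vendor's method: it finds the daily token volume at which a fixed monthly self-hosting cost equals per-token API spend. The function name and every input value are hypothetical; the 0.5–1.0 FTE staffing range follows the SitePoint estimate listed in the sources (p23).

```python
def breakeven_tokens_per_day(
    api_price_per_million: float,
    gpu_cost_per_month: float,
    fte_cost_per_year: float,
    fte_fraction: float,
    days_per_month: float = 30.0,
) -> float:
    """Daily token volume at which a fixed-cost self-hosted deployment costs
    the same per month as a per-token hosted API.

    Simplifying assumptions: the self-hosted fleet can serve the volume, and
    its marginal cost per token (power, egress) is treated as zero.
    """
    # Fixed monthly cost of self-hosting: GPUs plus the share of an engineer
    # needed to operate them.
    fixed_monthly = gpu_cost_per_month + fte_cost_per_year * fte_fraction / 12
    # Hosted API cost scales linearly with tokens.
    api_cost_per_token = api_price_per_million / 1_000_000
    return fixed_monthly / (api_cost_per_token * days_per_month)


# Illustrative, made-up inputs: a $5 blended API price per million tokens,
# $3,000/month of GPU capacity, and a $180,000/year engineer.
for fte in (0.0, 0.5, 1.0):
    tokens = breakeven_tokens_per_day(5.0, 3_000, 180_000, fte)
    print(f"{fte:.1f} FTE -> {tokens / 1e6:.1f}M tokens/day")
# 0.0 FTE -> 20.0M tokens/day
# 0.5 FTE -> 70.0M tokens/day
# 1.0 FTE -> 120.0M tokens/day
```

With these inputs the crossover moves from 20M to 120M tokens per day as staffing goes from zero to one FTE, so treating operations effort as free flatters self-hosting.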

LLM cost measurement is professionalising quickly under the banner of "TokenOps", a FinOps-adjacent discipline that applies visibility, allocation, and optimisation principles to token consumption. Per-token prices fell roughly 80% across the major provider families between early 2025 and early 2026, yet enterprise AI spend still rose: CloudZero's State of AI Costs report measured average monthly AI spend growing from $63,000 in 2024 to $85,500 in 2025, a 36% increase. The a16z 2025 enterprise survey found average LLM spend growing from $4.5 million to $7 million over two years, with CIOs forecasting $11.6 million by the end of 2026. Spend is rising because of volume, not per-token price, and agentic loops are the main source of that volume: an agentic workflow triggers 10–20 LLM calls per user task where a single-query chatbot makes one, and one analysis cites a Gartner figure that agentic models require 5–30x more tokens per task than standard chatbots. The evidence base behind published enterprise spend figures remains thin and largely vendor-driven; independently verified organisation-level budget data is almost absent, and ROI claims from enterprises themselves remain, as a16z noted, "less dramatic than one might expect."
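The allocation side of TokenOps can be sketched as tagging every LLM call with an owner and pricing it from a rate table. This is an illustrative sketch under stated assumptions, not Finout's or CloudZero's tooling: `UsageRecord`, `call_cost`, `allocate`, the model names, and the prices in `PRICES_PER_MILLION` are all hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical USD prices per million tokens as (input, output).
# Placeholder values for illustration, not any provider's list prices.
PRICES_PER_MILLION: dict[str, tuple[float, float]] = {
    "model-a": (3.00, 15.00),
    "model-b": (0.10, 0.40),
}


@dataclass(frozen=True)
class UsageRecord:
    """One LLM call, tagged with the owner it should be charged back to."""
    team: str
    feature: str
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(r: UsageRecord) -> float:
    """USD cost of a single call under PRICES_PER_MILLION."""
    in_price, out_price = PRICES_PER_MILLION[r.model]
    return (r.input_tokens * in_price + r.output_tokens * out_price) / 1_000_000


def allocate(records: list[UsageRecord], key: str = "team") -> dict[str, float]:
    """Sum call costs by an attribution field such as 'team' or 'feature'."""
    totals: dict[str, float] = defaultdict(float)
    for r in records:
        totals[getattr(r, key)] += call_cost(r)
    return dict(totals)


# One user task each: the chatbot answers in a single call, while the agent
# makes 15 calls of the same size (inside the 10-20 range cited above).
chat = [UsageRecord("support", "chatbot", "model-a", 2_000, 500)]
agent = [UsageRecord("support", "agent", "model-a", 2_000, 500) for _ in range(15)]
costs = allocate(chat + agent, key="feature")
print({k: round(v, 4) for k, v in costs.items()})
# {'chatbot': 0.0135, 'agent': 0.2025}
```

Grouping by feature rather than only by team is what makes agent loops visible: the per-token price here is identical, and the 15x cost gap comes entirely from call count, which is the volume effect the paragraph describes.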

Sources

ID	Title	Outlet	Date	Significance
p1	2025 Stack Overflow Developer Survey	Stack Overflow	2025-07	First edition to ask about specific LLMs by name; 49,000+ respondents establish GPT models at 81%, Claude Sonnet at 43%, and Gemini Flash at 35% developer adoption, with 46% distrusting AI output accuracy.
p2	Developers remain willing but reluctant to use AI: The 2025 Developer Survey results are here	Stack Overflow Blog	2025-12	Detailed breakdown of LLM model usage by developer segment, showing Claude Sonnet more prevalent among professional developers (45%) than learners (30%), alongside new agentic AI tool data.
p3	DORA | State of AI-assisted Software Development 2025 (https://dora.dev/dora-report-2025/)	DORA / Google Cloud	2025-09
p4	How are developers using AI? Inside Google's 2025 DORA report	Google Blog	2025-09	Official Google summary of DORA 2025 findings: 80%+ of respondents report AI productivity gains, 59% report improved code quality, with the DORA AI Capabilities Model introduced as a prescriptive framework.
p5	AI Is Amplifying Software Engineering Performance, Says the 2025 DORA Report	InfoQ	2026-03	Practitioner-oriented analysis of DORA 2025 findings, framing AI as a multiplier of existing engineering conditions rather than a universal productivity gain - relevant to deployment decision-making.
p6	Thoughtworks Technology Radar Highlights The Rapid Evolution of AI Assistance in 2025	Thoughtworks	2025-11	Volume 33 of the biannual Radar documents the shift from RAG and prompt engineering (Volume 32) to context engineering, MCP, and agentic systems, signalling practitioner maturation in LLM adoption.
p7	Macro trends in the tech industry | November 2025 (https://www.thoughtworks.com/en-de/insights/blog/technology-strategy/macro-trends-tech-industry-november-2025)	Thoughtworks	2025-11
p8	Technology Radar Volume 32: GenAI techniques and observability	Thoughtworks	2025-04	Volume 32 baseline against which Volume 33 shifts can be measured; identifies RAG retrieval techniques, LLM observability tools, and structured output as the leading practitioner concerns of early 2025.
p9	Agentic AI Architecture Framework for Enterprises	InfoQ	2025-07	Named-practitioner, case-study-grounded framework describing three production tiers for enterprise agentic AI, providing the most detailed public architecture guidance for regulated and complex deployments.
p10	The Architectural Shift: AI Agents Become Execution Engines While Backends Retreat to Governance	InfoQ	2025-10	Documents the structural shift where agents move from intent recognition to action execution via MCP, with a Gartner forecast that 40% of enterprise applications will embed task-specific agents by 2026.
p11	Google's Eight Essential Multi-Agent Design Patterns	InfoQ	2026-01	Documents Google's official multi-agent design pattern taxonomy (sequential, loop, parallel and five derivatives) drawn from production Agent Development Kit experience, a key reference for practitioners.
p12	Agentic AI Patterns Reinforce Engineering Discipline	InfoQ	2026-03	Covers practitioner-derived engineering patterns for agentic AI, emphasising specification-driven development and automated traceability as responses to quality and reliability failures in agent deployments.
p13	What I Learned Building Multi-Agent Systems from Scratch (Shopify)	InfoQ	2026-05	Named-practitioner case study from Shopify describing the evolution from single-prompt AI to multi-agent microservices architecture, with concrete lessons on token efficiency and context engineering.
p14	What 1,200 Production Deployments Reveal About LLMOps in 2025	ZenML Blog	2025-12	Analysis of 1,200 real production LLM deployments identifies six patterns separating successful teams from those stuck in demo mode, with a documented example of cost escalating from $127 to $47,000 weekly due to an agent loop error.
p15	How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025	Andreessen Horowitz (a16z)	2026-02	Survey of 100 enterprise CIOs showing average LLM spend growing from $4.5M to $7M over two years, 37% now using five or more models, and multi-model deployment becoming the default pattern.
p16	Leaders, gainers and unexpected winners in the Enterprise AI arms race	Andreessen Horowitz (a16z)	2026-02	Follow-on a16z enterprise survey documenting that 54% of CIOs say reasoning models accelerated LLM adoption, 23% run OpenAI o3 versus 3% DeepSeek in production, and that reported ROI remains below narrative expectations.
p17	A16Z Report: Startup Spend Confirms LLMs Central to Enterprise Purchase Intent	MLQ.ai / a16z	2025-08	Uses verified transaction data from 200,000+ startups to confirm GPT and Claude as the most-purchased AI applications, offering payment-verified evidence rather than self-reported usage data.
p18	Token Economics and TokenOps: The Definitive Guide to FinOps for Tokens	Finout	2026-06	Defines TokenOps as an emerging discipline applying FinOps principles to LLM token consumption, with the key empirical observation that per-token prices are falling while total enterprise spend rises due to agentic volume growth.
p19	LLM API Pricing Comparison In 2026: Every Major Model, Ranked By Cost	CloudZero	2026-05	CloudZero State of AI Costs report data showing average monthly AI spend at $85,500 in 2025 (up 36% from 2024), with token prices ranging from $0.10 to $30 per million tokens across current major models.
p20	FinOps for AI: LLM Cost Governance	Rick Pollick (practitioner blog)	2026-06	Practitioner-authored analysis citing Stanford AI Index 2025 and Menlo Ventures data to show that the inference cost of GPT-3.5-level performance fell more than 280x from 2022 to 2024 while enterprise generative AI spend rose from $2.3B (2023) to $37B (2025).
p21	Open-Weight Models H1 2026: DeepSeek, Qwen, Llama Recap	Digital Applied	2026-05	Tracks the diverging release cadences and enterprise adoption trajectories of the three main open-weight families through H1 2026, documenting sovereign-cloud deployment patterns and procurement-side adoption in finance, healthcare, and public sector.
p22	vLLM Production Deployment (https://introl.com/blog/vllm-production-deployment-inference-serving-architecture)	Introl	2026-02
p23	Open-Source vs Commercial LLMs: The Complete Guide (2026)	SitePoint	2026-04	Provides empirical breakeven analysis for self-hosted versus API deployment, estimating the crossover at 10–30M tokens per day and quantifying DevOps overhead at 0.5–1.0 FTE per self-hosted deployment.
p24	DORA Report 2025 Key Takeaways: AI Impact on Dev Metrics	Faros AI	2026-04	Triangulates DORA 2025 survey findings with Faros telemetry from 10,000 developers, identifying the AI Productivity Paradox: individual output rises (98% more PRs merged) while organisational delivery metrics remain flat.
p25	DeepSeek V4 Launch: 4 Specs That Make It the Most Disruptive Open-Weight Model of 2026	MindStudio	2026-05	Documents the commercial and compliance case for open-weight frontier models in regulated sectors, showing how healthcare, finance, and legal organisations use DeepSeek V4 weights to avoid third-party API compliance overhead.