Research sweep · deep · 2025 – 2026

Comparative LLM Usage Across Sectors

Comparative real-world usage of LLMs and adjacent AI technologies from June 2025 to June 2026: which models (GPT-5, Claude, Gemini, Llama, Mistral, DeepSeek, Qwen) dominate which sectors, how they are deployed (hosted API, Bedrock/Azure, self-hosted vLLM/Ollama, RAG, agents, fine-tuning), what workloads they serve, and how organisations measure, budget, and publicly report token cost and actual spend.

  • Claude Opus 4.8
  • financial
  • frontier
  • academic
  • vc
  • blogs
  • tech

Synthesised 2026-06-20

Narrative

Developer usage of LLMs, as captured by the Stack Overflow 2025 Developer Survey (49,000+ respondents across 177 countries), is heavily concentrated at the top. OpenAI GPT models lead with 81–82% developer adoption, followed by Claude Sonnet models at 43–45% (the higher figure applying to professional developers) and Gemini Flash at 35%. The survey, the first edition to ask about specific LLMs rather than AI tools in general, also found that 84% of developers use or plan to use AI tools, yet 46% distrust the accuracy of the output, a share that grew 15 percentage points year-on-year. The 2025 DORA State of AI-assisted Software Development report, drawn from nearly 5,000 survey responses and 100+ hours of qualitative interviews, confirmed near-universal adoption: 90% of respondents reported using AI in daily software development work, spending a median of two hours per day on it. Critically, DORA found that AI acts as a multiplier of existing team conditions, strengthening high performers while amplifying dysfunction in weaker organisations, rather than as a uniform productivity lever.

The Thoughtworks Technology Radar shifted markedly across its two 2025 editions. Volume 32 (April 2025) highlighted retrieval-augmented generation and prompt engineering as emergent practices. By Volume 33 (November 2025), the signal had moved to context engineering, the Model Context Protocol, and the growth of agentic systems, a transition Thoughtworks CTO Rachel Laycock described as a step change in industry thinking. InfoQ's practitioner coverage tracked the same arc: agentic AI moved from architectural curiosity to a distinct engineering discipline requiring specification-driven development, atomic decomposition, and observable workflows. A 2025 practitioner study surveying 306 engineers across 26 domains found that production agents are typically constrained: 68% execute at most 10 steps before requiring human intervention, and 70% rely on prompting off-the-shelf models rather than weight tuning, with reliability cited as the primary development challenge. The ZenML analysis of 1,200 production LLM deployments reinforced this: the organisations extracting real value were those investing in evaluation pipelines and infrastructure-based guardrails, not those with the most impressive demos.
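The constrained-agent pattern described above (a bounded number of steps, then a hand-off to a person) can be sketched as follows. This is a minimal illustration, not a pattern taken from any of the cited reports: `run_bounded_agent`, `StepResult`, `NeedsHumanReview`, the 10-step cap, and the 50,000-token budget are all hypothetical names and values, and `fake_step` stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    output: str       # text produced by this step
    tokens_used: int  # tokens consumed by this step
    done: bool        # True when the agent considers the task finished


class NeedsHumanReview(Exception):
    """Raised when a guardrail trips and a person has to take over."""


def run_bounded_agent(
    step: Callable[[str], StepResult],
    task: str,
    max_steps: int = 10,       # hypothetical cap, echoing the "at most 10 steps" finding
    max_tokens: int = 50_000,  # hypothetical per-task token budget
) -> str:
    """Call `step` repeatedly until it reports done, or stop at a guardrail."""
    state = task
    total_tokens = 0
    for i in range(1, max_steps + 1):
        result = step(state)
        total_tokens += result.tokens_used
        # Infrastructure-level guardrail: enforced outside the prompt.
        if total_tokens > max_tokens:
            raise NeedsHumanReview(f"token budget exceeded at step {i}")
        if result.done:
            return result.output
        state = result.output
    raise NeedsHumanReview(f"no result after {max_steps} steps")


def fake_step(state: str) -> StepResult:
    """Stand-in for a model call: appends a dot, finishes after three dots."""
    new_state = state + "."
    return StepResult(new_state, tokens_used=100, done=new_state.endswith("..."))


if __name__ == "__main__":
    print(run_bounded_agent(fake_step, "task"))
```

Keeping both limits in the calling code rather than in the prompt means a looping agent is stopped regardless of what the model outputs, which is the kind of runaway-cost failure the ZenML analysis documents.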

Deployment architecture is splitting into a well-defined set of patterns. Cloud-managed services (Azure OpenAI, Bedrock, Vertex AI) remain dominant for enterprise teams lacking inference infrastructure. Self-hosted inference via vLLM is consolidating as the production-grade open alternative: Stripe reportedly achieved a 73% inference cost reduction by migrating 50 million daily API calls to vLLM on one-third of its prior GPU fleet. Ollama serves prototyping and local development, with practitioners citing a clear migration path from Ollama to vLLM when real users appear. The Thoughtworks Radar noted a growing set of organisations moving toward self-hosted, governed deployments to address multi-tenancy, access control, and data residency requirements. Open-weight adoption is expanding but remains below closed-model share in large enterprises: a16z's 2026 enterprise CIO survey found that 23% of large enterprises were running OpenAI's o3 in production compared to 3% for DeepSeek. Open-weight models, particularly Llama 4, DeepSeek V3/V4, and Qwen 3.x, are nevertheless gaining faster in regulated sectors where third-party API calls face procurement or compliance obstacles.
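The choice between a hosted API and self-hosted inference can be framed as a daily breakeven calculation. The sketch below is illustrative only: every input (GPU count, hourly GPU price, operations staffing, salary, blended API price) is a hypothetical assumption rather than data from the reports cited here, and the model ignores capacity limits, utilisation, and latency.

```python
def api_cost_per_day(tokens_per_day: float, usd_per_million_tokens: float) -> float:
    """Hosted API: cost scales linearly with token volume."""
    return tokens_per_day / 1_000_000 * usd_per_million_tokens


def self_hosted_cost_per_day(
    gpu_count: int,
    usd_per_gpu_hour: float,
    ops_fte: float,
    usd_per_fte_year: float,
) -> float:
    """Self-hosted: treated here as a fixed daily cost (GPUs plus staff)."""
    gpu_cost = gpu_count * usd_per_gpu_hour * 24
    staff_cost = ops_fte * usd_per_fte_year / 365
    return gpu_cost + staff_cost


def breakeven_tokens_per_day(fixed_daily_usd: float, usd_per_million_tokens: float) -> float:
    """Token volume at which the API bill equals the fixed self-hosted cost."""
    return fixed_daily_usd / usd_per_million_tokens * 1_000_000


if __name__ == "__main__":
    # All inputs below are hypothetical placeholders.
    fixed = self_hosted_cost_per_day(
        gpu_count=2, usd_per_gpu_hour=2.50, ops_fte=0.5, usd_per_fte_year=146_000
    )
    crossover = breakeven_tokens_per_day(fixed, usd_per_million_tokens=16.0)
    print(f"fixed self-hosted cost: ${fixed:,.2f}/day")
    print(f"breakeven volume: {crossover / 1e6:.1f}M tokens/day")
```

Below the breakeven volume the hosted API is cheaper under these assumptions; above it, the fixed cost of the GPU fleet and its operators is spread over enough tokens to win. Staffing is included because operational overhead, not hardware alone, often dominates the fixed side.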

LLM cost measurement is undergoing rapid professionalisation under the banner of "TokenOps", a FinOps-adjacent discipline that applies visibility, allocation, and optimisation principles to token consumption. Per-token prices fell approximately 80% across the major provider families between early 2025 and early 2026, yet enterprise AI spend still rose: CloudZero's State of AI Costs report measured average monthly AI spend growing from $63,000 in 2024 to $85,500 in 2025, a 36% increase. The a16z 2025 enterprise survey found average LLM spend growing from $4.5 million to $7 million over two years, with CIOs forecasting $11.6 million by end of 2026. The primary driver of spend growth is not per-token price but volume expansion from agentic loops: an agentic workflow triggers 10–20 LLM calls per user task where a chatbot issues a single query, and one analysis cites a Gartner figure that agentic models require 5–30x more tokens per task than standard chatbots. The evidence base for published enterprise spend figures remains thin and largely vendor-driven; independently verified budget data at organisation level is almost absent, and ROI claims from enterprises themselves remain, as a16z noted, "less dramatic than one might expect."
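The arithmetic behind "prices fall, spend rises" can be made explicit. The sketch below takes the figures quoted above (an approximately 80% price fall, and monthly spend moving from $63,000 to $85,500) and asks what volume growth would reconcile them. It is a rough illustration under loud assumptions: it treats all AI spend as token spend and ignores that the price and spend figures cover slightly different periods. The per-call token count and the per-million-token prices in the second part are hypothetical.

```python
def spend_change(price_change: float, volume_multiplier: float) -> float:
    """Fractional change in spend for a fractional price change and a volume multiplier."""
    return (1 + price_change) * volume_multiplier - 1


def implied_volume_multiplier(old_spend: float, new_spend: float, price_change: float) -> float:
    """Volume growth needed to explain a spend change at a given price change."""
    return (new_spend / old_spend) / (1 + price_change)


def cost_per_task(calls: int, tokens_per_call: int, usd_per_million_tokens: float) -> float:
    """Token cost of one user task, assuming every call uses the same number of tokens."""
    return calls * tokens_per_call / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Figures quoted in the text: -80% price, $63,000 -> $85,500 monthly spend.
    growth = implied_volume_multiplier(63_000, 85_500, price_change=-0.80)
    print(f"implied token volume growth: {growth:.1f}x")

    # Hypothetical workload: 2,000 tokens per call, $3.00 -> $0.60 per million tokens.
    chatbot_before = cost_per_task(calls=1, tokens_per_call=2_000, usd_per_million_tokens=3.00)
    agent_after = cost_per_task(calls=15, tokens_per_call=2_000, usd_per_million_tokens=0.60)
    print(f"agent task after price cut vs chatbot query before: {agent_after / chatbot_before:.1f}x")
```

Under these assumptions, token volume would need to grow roughly 6.8x to turn an 80% price cut into a 36% spend increase, and a 15-call agentic task at the reduced price still costs about three times what a single chatbot query cost before the cut.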


Sources

| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| p1 | 2025 Stack Overflow Developer Survey | Stack Overflow | 2025-07 | First edition to ask about specific LLMs by name; 49,000+ respondents establish GPT models at 81%, Claude Sonnet at 43%, and Gemini Flash at 35% developer adoption, with 46% distrusting AI output accuracy. |
| p2 | Developers remain willing but reluctant to use AI: The 2025 Developer Survey results are here | Stack Overflow Blog | 2025-12 | Detailed breakdown of LLM model usage by developer segment, showing Claude Sonnet more prevalent among professional developers (45%) than learners (30%), alongside new agentic AI tool data. |
| p3 | [DORA State of AI-assisted Software Development 2025](https://dora.dev/dora-report-2025/) | DORA / Google Cloud | 2025-09 | |
| p4 | How are developers using AI? Inside Google's 2025 DORA report | Google Blog | 2025-09 | Official Google summary of DORA 2025 findings: 80%+ of respondents report AI productivity gains, 59% report improved code quality, with the DORA AI Capabilities Model introduced as a prescriptive framework. |
| p5 | AI Is Amplifying Software Engineering Performance, Says the 2025 DORA Report | InfoQ | 2026-03 | Practitioner-oriented analysis of DORA 2025 findings, framing AI as a multiplier of existing engineering conditions rather than a universal productivity gain - relevant to deployment decision-making. |
| p6 | Thoughtworks Technology Radar Highlights The Rapid Evolution of AI Assistance in 2025 | Thoughtworks | 2025-11 | Volume 33 of the biannual Radar documents the shift from RAG and prompt engineering (Volume 32) to context engineering, MCP, and agentic systems, signalling practitioner maturation in LLM adoption. |
| p7 | [Macro trends in the tech industry November 2025 Thoughtworks](https://www.thoughtworks.com/en-de/insights/blog/technology-strategy/macro-trends-tech-industry-november-2025) | Thoughtworks | | |
| p8 | Technology Radar Volume 32: GenAI techniques and observability | Thoughtworks | 2025-04 | Volume 32 baseline against which Volume 33 shifts can be measured; identifies RAG retrieval techniques, LLM observability tools, and structured output as the leading practitioner concerns of early 2025. |
| p9 | Agentic AI Architecture Framework for Enterprises | InfoQ | 2025-07 | Named-practitioner, case-study-grounded framework describing three production tiers for enterprise agentic AI, providing the most detailed public architecture guidance for regulated and complex deployments. |
| p10 | The Architectural Shift: AI Agents Become Execution Engines While Backends Retreat to Governance | InfoQ | 2025-10 | Documents the structural shift where agents move from intent recognition to action execution via MCP, with Gartner data that 40% of enterprise applications will embed task-specific agents by 2026. |
| p11 | Google's Eight Essential Multi-Agent Design Patterns | InfoQ | 2026-01 | Documents Google's official multi-agent design pattern taxonomy (sequential, loop, parallel and five derivatives) drawn from production Agent Development Kit experience, a key reference for practitioners. |
| p12 | Agentic AI Patterns Reinforce Engineering Discipline | InfoQ | 2026-03 | Covers practitioner-derived engineering patterns for agentic AI, emphasising specification-driven development and automated traceability as responses to quality and reliability failures in agent deployments. |
| p13 | What I Learned Building Multi-Agent Systems from Scratch (Shopify) | InfoQ | 2026-05 | Named-practitioner case study from Shopify describing the evolution from single-prompt AI to multi-agent microservices architecture, with concrete lessons on token efficiency and context engineering. |
| p14 | What 1,200 Production Deployments Reveal About LLMOps in 2025 | ZenML Blog | 2025-12 | Analysis of 1,200 real production LLM deployments identifies six patterns separating successful teams from those stuck in demo mode, with a documented example of cost escalating from $127 to $47,000 weekly due to an agent loop error. |
| p15 | How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025 | Andreessen Horowitz (a16z) | 2026-02 | Survey of 100 enterprise CIOs showing average LLM spend growing from $4.5M to $7M over two years, 37% now using five or more models, and multi-model deployment becoming the default pattern. |
| p16 | Leaders, gainers and unexpected winners in the Enterprise AI arms race | Andreessen Horowitz (a16z) | 2026-02 | Follow-on a16z enterprise survey documenting that 54% of CIOs say reasoning models accelerated LLM adoption, 23% run OpenAI o3 versus 3% DeepSeek in production, and that reported ROI remains below narrative expectations. |
| p17 | A16Z Report: Startup Spend Confirms LLMs Central to Enterprise Purchase Intent | MLQ.ai / a16z | 2025-08 | Uses verified transaction data from 200,000+ startups to confirm GPT and Claude as the most-purchased AI applications, offering payment-verified evidence rather than self-reported usage data. |
| p18 | Token Economics and TokenOps: The Definitive Guide to FinOps for Tokens | Finout | 2026-06 | Defines TokenOps as an emerging discipline applying FinOps principles to LLM token consumption, with the key empirical observation that per-token prices are falling while total enterprise spend rises due to agentic volume growth. |
| p19 | LLM API Pricing Comparison In 2026: Every Major Model, Ranked By Cost | CloudZero | 2026-05 | CloudZero State of AI Costs report data showing average monthly AI spend at $85,500 in 2025 (up 36% from 2024), with token price ranges from $0.10 to $30 per million tokens across current frontier models. |
| p20 | FinOps for AI: LLM Cost Governance | Rick Pollick (practitioner blog) | 2026-06 | Practitioner-authored analysis citing Stanford AI Index 2025 and Menlo Ventures data to show inference costs fell 280x from 2022 to 2024 while enterprise spend rose from $2.3B (2023) to $37B (2025). |
| p21 | Open-Weight Models H1 2026: DeepSeek, Qwen, Llama Recap | Digital Applied | 2026-05 | Tracks the diverging release cadences and enterprise adoption trajectories of the three main open-weight families through H1 2026, documenting sovereign-cloud deployment patterns and procurement-side adoption in finance, healthcare, and public sector. |
| p22 | [vLLM Production Deployment Introl Blog](https://introl.com/blog/vllm-production-deployment-inference-serving-architecture) | Introl | 2026-02 | |
| p23 | Open-Source vs Commercial LLMs: The Complete Guide (2026) | SitePoint | 2026-04 | Provides empirical breakeven analysis for self-hosted versus API deployment, estimating the crossover at 10–30M tokens per day and quantifying DevOps overhead at 0.5–1.0 FTE per self-hosted deployment. |
| p24 | DORA Report 2025 Key Takeaways: AI Impact on Dev Metrics | Faros AI | 2026-04 | Triangulates DORA 2025 survey findings with Faros telemetry from 10,000 developers, identifying the AI Productivity Paradox: individual output rises (98% more PRs merged) while organisational delivery metrics remain flat. |
| p25 | DeepSeek V4 Launch: 4 Specs That Make It the Most Disruptive Open-Weight Model of 2026 | MindStudio | 2026-05 | Documents the commercial and compliance case for open-weight frontier models in regulated sectors, showing how healthcare, finance, and legal organisations use DeepSeek V4 weights to avoid third-party API compliance overhead. |
