Research · Tech Industry & Practitioner
Back to sweepResearch sweep · deep · 2025 – 2026
Comparative LLM Usage Across Sectors
Comparative real-world usage of LLMs and adjacent AI technologies from June 2025 to June 2026: which models (GPT-5, Claude, Gemini, Llama, Mistral, DeepSeek, Qwen) dominate which sectors, how they are deployed (hosted API, Bedrock/Azure, self-hosted vLLM/Ollama, RAG, agents, fine-tuning), what workloads they serve, and how organisations measure, budget, and publicly report token cost and actual spend.
- Claude Opus 4.8
- financial
- frontier
- academic
- vc
- blogs
- tech
Synthesised 2026-06-20
Narrative
Developer usage of LLMs, as captured by the Stack Overflow 2025 Developer Survey (49,000+ respondents across 177 countries), is heavily concentrated at the top. OpenAI GPT models lead with 81–82% developer adoption, followed by Claude Sonnet models at 43–45% and Gemini Flash at 35%. The survey, the first edition to ask about specific LLMs rather than AI tools in general, also found that 84% of developers use or plan to use AI tools, yet 46% distrust the accuracy of output - a trust deficit that grew 15 percentage points year-on-year. The 2025 DORA State of AI-assisted Software Development report, drawn from nearly 5,000 survey responses and 100+ hours of qualitative interviews, confirmed near-universal adoption: 90% of respondents reported using AI in daily software development work, spending a median of two hours per day on it. Critically, DORA found that AI functions as a multiplier of existing team conditions - strengthening high performers while amplifying dysfunction in weaker organisations - not as a uniform productivity lever.
The Thoughtworks Technology Radar shifted markedly across its two 2025 editions. Volume 32 (April 2025) highlighted retrieval-augmented generation and prompt engineering as emergent practices. By Volume 33 (November 2025), the signal had moved to context engineering, the Model Context Protocol, and growth of agentic systems - a transition Thoughtworks CTO Rachel Laycock described as a step change in industry thinking. Separately, InfoQ's practitioner coverage tracked the same arc: agentic AI moved from architectural curiosity to a distinct engineering discipline requiring specification-driven development, atomic decomposition, and observable workflows. A 2025 practitioner study surveying 306 engineers across 26 domains found that production agents are typically constrained: 68% execute at most 10 steps before requiring human intervention, and 70% rely on prompting off-the-shelf models rather than weight tuning, with reliability cited as the primary development challenge. The ZenML analysis of 1,200 production LLM deployments reinforced this: the organisations extracting real value were those investing in evaluation pipelines and infrastructure-based guardrails, not those with the most impressive demos.
Deployment architecture is fragmenting into a well-defined set of patterns. Cloud-managed services (Azure OpenAI, Bedrock, Vertex AI) remain dominant for enterprise teams lacking inference infrastructure. Self-hosted inference via vLLM is consolidating as the production-grade open alternative: Stripe reportedly achieved a 73% inference cost reduction by migrating 50 million daily API calls to vLLM on one-third of a prior GPU fleet. Ollama serves prototyping and local development, with practitioners citing a clear migration path from Ollama to vLLM when real users appear. The Thoughtworks Radar noted a growing set of organisations moving toward self-hosted, governed deployments to address multi-tenancy, access control, and data residency requirements. Open-weight adoption is expanding but remains below closed-model share in large enterprises: a16z's 2026 enterprise CIO survey found that 23% of large enterprises were running OpenAI's o3 in production compared to 3% for DeepSeek, though open-weight models - particularly Llama 4, DeepSeek V3/V4, and Qwen 3.x - are gaining faster in regulated sectors where third-party API calls face procurement or compliance obstacles.
LLM cost measurement is undergoing rapid professionalisation under the banner of "TokenOps", a FinOps-adjacent discipline that applies visibility, allocation, and optimisation principles to token consumption. Per-token prices fell approximately 80% across the major provider families between early 2025 and early 2026, yet enterprise AI spend still rose: CloudZero's State of AI Costs report measured average monthly AI spend growing from $63,000 in 2024 to $85,500 in 2025, a 36% increase. The a16z 2025 enterprise survey found average LLM spend growing from $4.5 million to $7 million over two years, with CIOs forecasting $11.6 million by end of 2026. The primary driver of spend growth is not per-token price but volume expansion driven by agentic loops: agentic workflows trigger 10–20 LLM calls per user task versus single-query chatbots, with one analysis citing a Gartner figure that agentic models require 5–30x more tokens per task than standard chatbots. The evidence base for published enterprise spend figures remains thin and largely vendor-driven; independently verified budget data at organisation level is almost absent, and ROI claims from enterprises themselves remain, as a16z noted, "less dramatic than one might expect."
Sources
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| p1 | 2025 Stack Overflow Developer Survey | Stack Overflow | 2025-07 | First edition to ask about specific LLMs by name; 49,000+ respondents establish GPT models at 81%, Claude Sonnet at 43%, and Gemini Flash at 35% developer adoption, with 46% distrusting AI output accuracy. |
| p2 | Developers remain willing but reluctant to use AI: The 2025 Developer Survey results are here | Stack Overflow Blog | 2025-12 | Detailed breakdown of LLM model usage by developer segment, showing Claude Sonnet more prevalent among professional developers (45%) than learners (30%), alongside new agentic AI tool data. |
| p3 | [DORA | State of AI-assisted Software Development 2025](https://dora.dev/dora-report-2025/) | DORA / Google Cloud | 2025-09 |
| p4 | How are developers using AI? Inside Google's 2025 DORA report | Google Blog | 2025-09 | Official Google summary of DORA 2025 findings: 80%+ of respondents report AI productivity gains, 59% report improved code quality, with the DORA AI Capabilities Model introduced as a prescriptive framework. |
| p5 | AI Is Amplifying Software Engineering Performance, Says the 2025 DORA Report | InfoQ | 2026-03 | Practitioner-oriented analysis of DORA 2025 findings, framing AI as a multiplier of existing engineering conditions rather than a universal productivity gain - relevant to deployment decision-making. |
| p6 | Thoughtworks Technology Radar Highlights The Rapid Evolution of AI Assistance in 2025 | Thoughtworks | 2025-11 | Volume 33 of the biannual Radar documents the shift from RAG and prompt engineering (Volume 32) to context engineering, MCP, and agentic systems, signalling practitioner maturation in LLM adoption. |
| p7 | [Macro trends in the tech industry | November 2025 | Thoughtworks](https://www.thoughtworks.com/en-de/insights/blog/technology-strategy/macro-trends-tech-industry-november-2025) | Thoughtworks |
| p8 | Technology Radar Volume 32: GenAI techniques and observability | Thoughtworks | 2025-04 | Volume 32 baseline against which Volume 33 shifts can be measured; identifies RAG retrieval techniques, LLM observability tools, and structured output as the leading practitioner concerns of early 2025. |
| p9 | Agentic AI Architecture Framework for Enterprises | InfoQ | 2025-07 | Named-practitioner, case-study-grounded framework describing three production tiers for enterprise agentic AI, providing the most detailed public architecture guidance for regulated and complex deployments. |
| p10 | The Architectural Shift: AI Agents Become Execution Engines While Backends Retreat to Governance | InfoQ | 2025-10 | Documents the structural shift where agents move from intent recognition to action execution via MCP, with Gartner data that 40% of enterprise applications will embed task-specific agents by 2026. |
| p11 | Google's Eight Essential Multi-Agent Design Patterns | InfoQ | 2026-01 | Documents Google's official multi-agent design pattern taxonomy (sequential, loop, parallel and five derivatives) drawn from production Agent Development Kit experience, a key reference for practitioners. |
| p12 | Agentic AI Patterns Reinforce Engineering Discipline | InfoQ | 2026-03 | Covers practitioner-derived engineering patterns for agentic AI, emphasising specification-driven development and automated traceability as responses to quality and reliability failures in agent deployments. |
| p13 | What I Learned Building Multi-Agent Systems from Scratch (Shopify) | InfoQ | 2026-05 | Named-practitioner case study from Shopify describing the evolution from single-prompt AI to multi-agent microservices architecture, with concrete lessons on token efficiency and context engineering. |
| p14 | What 1,200 Production Deployments Reveal About LLMOps in 2025 | ZenML Blog | 2025-12 | Analysis of 1,200 real production LLM deployments identifies six patterns separating successful teams from those stuck in demo mode, with a documented example of cost escalating from $127 to $47,000 weekly due to an agent loop error. |
| p15 | How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025 | Andreessen Horowitz (a16z) | 2026-02 | Survey of 100 enterprise CIOs showing average LLM spend growing from $4.5M to $7M over two years, 37% now using five or more models, and multi-model deployment becoming the default pattern. |
| p16 | Leaders, gainers and unexpected winners in the Enterprise AI arms race | Andreessen Horowitz (a16z) | 2026-02 | Follow-on a16z enterprise survey documenting that 54% of CIOs say reasoning models accelerated LLM adoption, 23% run OpenAI o3 versus 3% DeepSeek in production, and that reported ROI remains below narrative expectations. |
| p17 | A16Z Report: Startup Spend Confirms LLMs Central to Enterprise Purchase Intent | MLQ.ai / a16z | 2025-08 | Uses verified transaction data from 200,000+ startups to confirm GPT and Claude as the most-purchased AI applications, offering payment-verified evidence rather than self-reported usage data. |
| p18 | Token Economics and TokenOps: The Definitive Guide to FinOps for Tokens | Finout | 2026-06 | Defines TokenOps as an emerging discipline applying FinOps principles to LLM token consumption, with the key empirical observation that per-token prices are falling while total enterprise spend rises due to agentic volume growth. |
| p19 | LLM API Pricing Comparison In 2026: Every Major Model, Ranked By Cost | CloudZero | 2026-05 | CloudZero State of AI Costs report data showing average monthly AI spend at $85,500 in 2025 (up 36% from 2024), with token price ranges from $0.10 to $30 per million tokens across current frontier models. |
| p20 | FinOps for AI: LLM Cost Governance | Rick Pollick (practitioner blog) | 2026-06 | Practitioner-authored analysis citing Stanford AI Index 2025 and Menlo Ventures data to show inference costs fell 280x from 2022 to 2024 while enterprise spend rose from $2.3B (2023) to $37B (2025). |
| p21 | Open-Weight Models H1 2026: DeepSeek, Qwen, Llama Recap | Digital Applied | 2026-05 | Tracks the diverging release cadences and enterprise adoption trajectories of the three main open-weight families through H1 2026, documenting sovereign-cloud deployment patterns and procurement-side adoption in finance, healthcare, and public sector. |
| p22 | [vLLM Production Deployment | Introl Blog](https://introl.com/blog/vllm-production-deployment-inference-serving-architecture) | Introl | 2026-02 |
| p23 | Open-Source vs Commercial LLMs: The Complete Guide (2026) | SitePoint | 2026-04 | Provides empirical breakeven analysis for self-hosted versus API deployment, estimating the crossover at 10–30M tokens per day and quantifying DevOps overhead at 0.5–1.0 FTE per self-hosted deployment. |
| p24 | DORA Report 2025 Key Takeaways: AI Impact on Dev Metrics | Faros AI | 2026-04 | Triangulates DORA 2025 survey findings with Faros telemetry from 10,000 developers, identifying the AI Productivity Paradox: individual output rises (98% more PRs merged) while organisational delivery metrics remain flat. |
| p25 | DeepSeek V4 Launch: 4 Specs That Make It the Most Disruptive Open-Weight Model of 2026 | MindStudio | 2026-05 | Documents the commercial and compliance case for open-weight frontier models in regulated sectors, showing how healthcare, finance, and legal organisations use DeepSeek V4 weights to avoid third-party API compliance overhead. |