Research · Blogs & Independent Thinkers
Back to sweepResearch sweep · deep · 2025 – 2026
Comparative LLM Usage Across Sectors
Comparative real-world usage of LLMs and adjacent AI technologies from June 2025 to June 2026: which models (GPT-5, Claude, Gemini, Llama, Mistral, DeepSeek, Qwen) dominate which sectors, how they are deployed (hosted API, Bedrock/Azure, self-hosted vLLM/Ollama, RAG, agents, fine-tuning), what workloads they serve, and how organisations measure, budget, and publicly report token cost and actual spend.
- Claude Opus 4.8
- financial
- frontier
- academic
- vc
- blogs
- tech
Synthesised 2026-06-20
Narrative
The clearest quantitative window into enterprise LLM adoption comes from Menlo Ventures' paired 2025 reports. The mid-year report recorded enterprise LLM API spend doubling from $3.5 billion to $8.4 billion in a single six-month stretch, driven by inference overtaking training as the dominant cost category. By year-end, Anthropic had captured 40% of the enterprise LLM API market, up from 12% in 2023, with OpenAI falling to 27% and Google reaching 21%. Code generation was the catalyst: Claude held 42% developer share in that workload, double OpenAI's figure. These are survey-derived numbers from roughly 500 US enterprise decision-makers, and Menlo is an investor in Anthropic, which warrants scepticism about precision, but the directional claim of a rapid share shift is corroborated by the OpenRouter empirical dataset.
OpenRouter's 100 trillion-token study, published in partnership with a16z in December 2025, offers the most granular observed-behaviour dataset available to independent analysts. The platform, serving over five million developers across 300-plus models, recorded proprietary models handling the majority of tokens throughout the study period, but open-weight models grew to roughly one-third of usage by late 2025. Within the open-weight segment, DeepSeek alone accumulated 14.37 trillion tokens, followed by Qwen at 5.59 trillion and Meta Llama at 3.96 trillion. The market fragmented sharply after what the study labels the "Summer Inflection": DeepSeek's near-monopoly of over 50% of open-source share in early 2025 collapsed as Qwen 3, Kimi K2, and the GPT-OSS variants entered the field. Tool use and agentic workloads showed a parallel shift, with reasoning-model tokens climbing from a negligible slice in Q1 2025 to over 50% of total usage by mid-year.
Self-hosted inference emerged as a mainstream business strategy rather than a researcher pastime. Practitioners broadly distinguish two deployment paths: Ollama for prototyping and single-developer use, and vLLM for production multi-user workloads, with the latter's PagedAttention mechanism cited as delivering substantially higher throughput at concurrent load. The driver in regulated sectors - healthcare, legal, finance, government - is compliance. HIPAA, GDPR, and SOC 2 create hard boundaries around data egress that push organisations toward on-premise or private-cloud deployment of open-weight models, primarily Llama and Mistral variants. By contrast, Menlo's enterprise survey found open-source adoption actually declining from 19% to 11% year-on-year among large enterprises, suggesting that the self-hosting trend is more pronounced in mid-market and developer-led organisations than in top-tier procurement-driven accounts.
The production-deployment gap remains the defining practical problem. The Metadata Weekly Substack cited an MIT figure that 95% of enterprise AI pilots never reached production in 2025, while ZenML's LLMOps Database, which crossed 1,200 catalogued case studies by December 2025, found that successful systems consistently combined LLMs with traditional deterministic rules and classical ML rather than delegating entirely to foundation models. On cost, practitioners writing on Medium and specialist FinOps blogs in 2025 and 2026 document a structural measurement problem: LLM bills arrive as undifferentiated totals that hide per-feature and per-user attribution. The practitioner consensus, drawn from multiple independent sources, treats LLM cost as an architectural discipline imposed at design time rather than a billing line read at month-end. Token prices dropped approximately 80% between early 2025 and early 2026, yet total enterprise spend tripled, illustrating classic Jevons-paradox dynamics that independent commentators noted explicitly.
Sources
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| b1 | 2025 Mid-Year LLM Market Update: Foundation Model Landscape + Economics | Menlo Ventures | 2025-07 | Primary quantitative source on enterprise LLM API market share shift, recording Anthropic overtaking OpenAI and enterprise spend doubling to $8.4 billion in six months. |
| b2 | 2025: The State of Generative AI in the Enterprise | Menlo Ventures | 2025-12 | Year-end enterprise survey of ~500 decision-makers documenting Anthropic at 40% LLM API share, open-source decline to 11%, and $37 billion total generative AI spend in 2025. |
| b3 | State of AI: An Empirical 100 Trillion Token Study with OpenRouter | OpenRouter / a16z | 2025-12 | Largest observed-behaviour dataset on LLM usage patterns, covering 100 trillion tokens and documenting open-weight model growth, reasoning model surge, and tool-use concentration. |
| b4 | State of AI: An Empirical 100 Trillion Token Study with OpenRouter (arXiv preprint) | arXiv | 2026-01 | Peer-accessible version of the OpenRouter/a16z study, with detailed methodology including DeepSeek's 14.37 trillion tokens, Qwen at 5.59 trillion, and Llama at 3.96 trillion. |
| b5 | OpenRouter's 100 Trillion Token Study: The Real State of AI Usage in 2025 | Adam Holter (personal blog) | 2025-12 | Independent analysis of the OpenRouter dataset, synthesising the dual-market structure thesis and the market fragmentation after the Summer Inflection. |
| b6 | The State of AI in Q4 2025 | Substack (Pat McGuinness) | 2025-12 | Independent Substack synthesis of Q4 2025 AI adoption data, citing Ramp card-data showing paid AI adoption at 43.8% of US businesses and Google reporting a 50x yearly increase in monthly tokens. |
| b7 | I think Anthropic and OpenAI have found product-market fit | Simon Willison's Weblog | 2026-05 | Simon Willison's practitioner observation that Anthropic's Enterprise plan shifted to API-usage billing by late 2025, with companies reporting surprising LLM bill sizes, signalling genuine production-scale adoption. |
| b8 | The last six months in LLMs in five minutes | Simon Willison's Weblog | 2026-05 | Practitioner summary of the November 2025 inflection point in LLM capability, covering the shift to RLVR-trained coding models across OpenAI and Anthropic. |
| b9 | LLM predictions for 2026, shared with Oxide and Friends | Simon Willison's Weblog | 2026-01 | First-principles prediction piece from a leading practitioner blogger, explicitly invoking Jevons paradox as the mechanism explaining why falling token prices do not reduce total spend. |
| b10 | Agentic Engineering Patterns | Substack (Simon Willison) | 2026-02 | Willison's Substack post covering the November 2025 inflection point and the emergence of agentic engineering as a distinct discipline from earlier LLM prompt-engineering workflows. |
| b11 | What is agentic engineering? | Simon Willison's Weblog | 2026-03 | Practitioner definition of agentic engineering, providing the architectural framing most cited in 2025-2026 discussions of production agent deployment across GPT-5, Gemini, and Claude. |
| b12 | [Deep | LLM 2026: From the Illusion of Model Development Stagnation to Large-Scale Real-World Agent Deployment](https://fundaai.substack.com/p/deepllm-2026-from-the-illusion-of) | Substack (FundaAI) | 2026-01 |
| b13 | The 2026 AI Reality Check: It's the Foundations, Not the Models | Substack (Metadata Weekly) | 2025-12 | Substack analysis citing MIT data that 95% of enterprise AI pilots failed to reach production in 2025, arguing that data and governance foundations, not model selection, determine deployment success. |
| b14 | Why Do LLM Applications Fail in Production? | Substack (The Gen Academy) | 2026-05 | Detailed technical Substack post documenting that agentic token consumption runs at roughly 4x chat usage and multi-agent at 15x or more, explaining why production economics differ sharply from demo economics. |
| b15 | What 1,200 Production Deployments Reveal About LLMOps in 2025 | ZenML Blog | 2025-12 | Practitioner analysis of 1,200 catalogued LLMOps case studies, finding that successful production systems combine LLMs with deterministic rules rather than relying on foundation models alone. |
| b16 | The Agent Deployment Gap: Why Your LLM Loop Isn't Production-Ready | ZenML Blog | 2025-07 | Practitioner post identifying the structural gap between agent prototyping and production deployment, with patterns drawn from real deployments as of mid-2025. |
| b17 | The AI Agents Stack (2026 Edition) | O'Reilly Radar | 2026-06 | Maps the six-layer infrastructure required for production agents, documenting LangGraph's emergence as the graph-orchestration standard with confirmed deployments at Uber, JPMorgan, LinkedIn, and Klarna. |
| b18 | The Rise of the Agent Runtime | Work-Bench | 2026-02 | Documents agentic infrastructure cost shock with a case study showing costs jumping 10x from prototyping to staging, illustrating budget risk from unoptimised RAG and agent orchestration. |
| b19 | LLM Token Costs Benchmarked: What Engineering and FinOps Leaders Actually Need to Know | Cloudchipr | 2026-05 | Documents an approximately 80% drop in LLM API prices between early 2025 and early 2026 and argues for per-workload cost tracking over per-token pricing as the operative FinOps metric. |
| b20 | FinOps for AI LLM Cost Governance | Rick Pollick (personal blog) | 2026-06 | Synthesises Stanford AI Index data on inference cost decline alongside Menlo spend figures and FinOps Foundation survey showing 98% of practitioners now managing AI spend, framing the Jevons-paradox dynamic explicitly. |
| b21 | LLM FinOps: Per-Feature Cost Attribution and Token Budgets | Zop.dev | 2026-05 | Practitioner post documenting the per-feature attribution problem with a concrete example of a $48,000 monthly Anthropic bill that no one could break down by feature or customer. |
| b22 | 10 ML FinOps Habits to Right-Size Models, Right-Price Tokens | Medium (Nexumo) | 2025-12 | Medium practitioner post framing LLM budget leakage as the norm and arguing that model routing, token caps, and per-feature tagging are the core habits of mature ML FinOps. |
| b23 | Open-Weight AI Models Are Catching Up: What It Means for Enterprise Automation | MindStudio | 2026-05 | Practitioner analysis comparing open-weight and closed models across production task categories, finding parity on coding, classification, and extraction but a persistent closed-model edge on complex multi-step reasoning. |
| b24 | vLLM vs Ollama vs LocalAI: Best tools for self-hosting LLMs in 2025 | eMasterLabs | 2026-03 | Practitioner comparison articulating the compliance-driven case for self-hosted LLMs in healthcare, legal, finance, and government under HIPAA, GDPR, and SOC 2 constraints. |
| b25 | Self-Hosted LLM Guide: Costs, Architecture and Breakeven Point | Alpacked | 2026-05 | Documents the canonical Ollama-to-vLLM migration path and the total cost of ownership components most teams undercount when evaluating self-hosted versus API deployment. |