Academic & arXiv research sweep · deep · 1995 – 2026
Compounding Waves — How Each Tech Era Built the Substrate, and the Skills, for the Next
Traces the compounding economic logic of three successive technology waves from January 1995 to May 2026: internet disintermediation of distribution, software-defined platforms and cloud infrastructure, and the current wave of AI and agentic systems. It examines the technical, economic, and human-skills dependencies that make each wave a precondition for the next, the new categories of work each wave created, and whether the relationship is better understood as cumulative compounding or as externalised costs harvested by later layers.
- financial
- academic
- blogs
- vc
Synthesised 2026-05-11
Narrative
The foundational dependency chain across all three technology waves is most visible in the scaling-law literature. Kaplan et al. (2020, arXiv) established that language model performance scales as a power law with compute, data, and parameters across seven orders of magnitude, while Hoffmann et al.'s Chinchilla laws (2022) demonstrated that data must grow proportionally with model size for compute-optimal training. These results made the open web — assembled by wave-one internet infrastructure — the load-bearing input for wave-three AI, not merely a convenience. METR's empirical benchmarking programme then provides the clearest view of how rapidly that AI wave is closing the gap with human software labour: the HCAST and RE-Bench task suites show AI task-completion time horizons have doubled every seven months since 2019, and as of early 2025 Claude 3.7 Sonnet achieves 50% success on tasks that take human experts roughly 50 minutes.
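The two quantitative claims above reduce to simple arithmetic. A minimal sketch, assuming the commonly cited ~20-tokens-per-parameter approximation of the Chinchilla result and taking the 50-minute baseline and 7-month doubling time from the narrative above (an illustrative extrapolation, not METR's fitted curve):

```python
def chinchilla_tokens(n_params: float) -> float:
    """Compute-optimal training tokens under the ~20 tokens/parameter
    rule of thumb often quoted from Hoffmann et al. (2022)."""
    return 20.0 * n_params

def horizon_minutes(months_elapsed: float,
                    baseline_minutes: float = 50.0,
                    doubling_months: float = 7.0) -> float:
    """Extrapolate the 50%-success task-time horizon, assuming the
    doubling time METR observed since 2019 simply continues."""
    return baseline_minutes * 2.0 ** (months_elapsed / doubling_months)

# A 70B-parameter model wants ~1.4 trillion training tokens,
# which is why data supply became a binding constraint.
print(f"{chinchilla_tokens(70e9):.2e}")  # → 1.40e+12

# A ~50-minute horizon doubles to ~100 minutes after 7 months;
# reaching a full working day (~480 min) takes roughly 23 months.
print(round(horizon_minutes(7)))         # → 100
```

The point of the second function is how short the fuse is: under a constant doubling time, the gap between "under an hour" and "a full working day" of autonomous software work is about two years.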
The data-access transition from open externality to contested resource is now the best-documented constraint in the field. Longpre et al.'s NeurIPS 2024 paper 'Consent in Crisis' audited 14,000 web domains and found robots.txt and Terms of Service restrictions on AI crawling rose more than 500% between mid-2023 and April 2024, with 28%+ of the most critical C4 sources now fully restricted. A subsequent arXiv paper on crawler restrictions (2025) extends this finding to political-bias consequences, showing moderate news sites withdrawing first and leaving hyperpartisan material over-represented in training sets. Together, these papers frame wave three as the last generation able to consume the open web at minimal cost: the Wikimedia Foundation reported in 2025 that 65% of its most expensive traffic now comes from AI crawlers, raising the question of who funds the next data substrate.
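The restriction mechanism these audits count is mechanically simple. A sketch using Python's standard `urllib.robotparser` of how a compliant crawler evaluates the kind of policy Longpre et al. measure (the robots.txt content below is a hypothetical example of an AI-crawler block, not any real site's file; GPTBot is OpenAI's crawler user-agent):

```python
from urllib import robotparser

# Hypothetical robots.txt of the kind the 'Consent in Crisis' audit
# counts: an AI training crawler is fully disallowed while ordinary
# user agents remain welcome.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/article"))       # → False
print(parser.can_fetch("Mozilla/5.0", "https://example.com/article"))  # → True
```

The measurement caveat from the IMC 2025 study (a21) applies here: `can_fetch` only tells a compliant crawler what it may do; nothing in the protocol enforces compliance, which is why ToS restrictions and active blocking are being layered on top.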
The labour-economics literature is in productive disagreement about net job effects. Acemoglu and Restrepo's task-displacement framework, synthesised in Oxford Open Economics (2024), shows automation systematically reduces labour share unless offset by new task creation. Applied to LLMs, the recent 'Crashing Waves vs Rising Tides' preprint (arXiv, 2026) finds that AI capability gains improve performance broadly across task durations rather than arriving in abrupt occupation-level bursts, while the 'Augmenting or Automating Labor' paper (arXiv, 2025) shows displacement effects outpacing productivity gains for low-skilled workers from 2015 to 2022. The Harvard Business School displacement-complementarity study finds a 24% per-firm, per-quarter decline in demand for automatable skills post-ChatGPT against a 15% increase in augmentation-exposed roles — but this positive complementarity figure is concentrated in mid-to-high-skilled occupations.
On the infrastructure concentration question, van der Vlist et al. in Big Data & Society (2024) and the Internet Policy Review cloud-platform study both document how hyperscaler cloud credits, API lock-in, and strategic investment create a circular capital structure. The 'Hype, Sustainability' arXiv paper (2024) quantifies this: three major cloud providers contributed two-thirds of the $27 billion raised by AI startups in 2023, and up to 80–90% of early-stage AI startup capital flows back to cloud providers. The 'externalised harvest' framing thus has strong empirical support — value created by wave-one open-web content and wave-two cloud infrastructure accrues disproportionately to current-wave AI labs and their hyperscaler backers, while the original creators receive neither compensation nor attribution.
Sources
| ID | Title | Outlet | Date | Significance |
|---|---|---|---|---|
| a1 | Scaling Laws for Neural Language Models | arXiv (OpenAI) | 2020-01 | Established power-law relationships between model performance and compute, data, and parameters, providing the empirical foundation for the hyperscale training regime that defines the AI wave's infrastructure dependency. |
| a2 | Training Compute-Optimal Large Language Models (Chinchilla) | arXiv (DeepMind) | 2022-03 | Revised compute-optimal training ratios, demonstrating that frontier models require data to scale proportionally with parameters, making data supply a binding constraint co-equal with compute. |
| a3 | Consent in Crisis: The Rapid Decline of the AI Data Commons | arXiv / NeurIPS 2024 Datasets and Benchmarks Track | 2024-07 | First large-scale longitudinal audit of 14,000 web domains showing that in a single year (2023–2024) robots.txt and Terms of Service restrictions rose 500%+, directly measuring the closure of the open-web externality that wave-three AI consumed for free. |
| a4 | RE-Bench: Evaluating Frontier AI R&D Capabilities of Language Model Agents Against Human Experts | arXiv / METR | 2024-11 | METR's benchmark comparing AI agents with 61 human experts on ML research engineering tasks, providing the primary empirical evidence on how far agentic systems have advanced toward automating the software-engineering labour that underpins the cloud and AI waves. |
| a5 | HCAST: Human-Calibrated Autonomy Software Tasks | arXiv / METR | 2025-03 | METR's 189-task benchmark for measuring autonomous AI capabilities across ML, cybersecurity, software engineering, and general reasoning, used as the primary instrument for tracking the 7-month doubling time of AI task-completion horizons. |
| a6 | Measuring AI Ability to Complete Long Tasks | arXiv / METR | 2025-03 | Introduces the 50%-task-completion time horizon metric, showing frontier AI models doubled their effective task length every seven months since 2019 and extrapolating this to agent-level software autonomy within a decade. |
| a7 | Crashing Waves vs. Rising Tides: Preliminary Findings on AI and Labor Markets | arXiv | 2026-04 | Empirically distinguishes whether AI capability gains arrive in abrupt bursts for specific tasks ('crashing waves') or as broad parallel shifts across task duration ('rising tides'), with direct implications for which occupational categories face displacement and when. |
| a8 | Augmenting or Automating Labor? The Effect of AI on Employment and Wages | arXiv | 2025-03 | Distinguishes automation AI from augmentation AI using US labour-market data (2015–2022), finding displacement effects outweigh productivity gains for low-skilled occupations and that automation exposure negatively affects new-work creation. |
| a9 | Complement or Substitute? How AI Increases the Demand for Human Skills | arXiv | 2024-12 | Uses 65,000+ job-posting websites (2018–2023) to show AI produces both substitution and complementarity effects on skill demand, with spillover effects reaching workers not directly interfacing with AI systems. |
| a10 | Artificial Intelligence, Automation and Work | NBER / SSRN (MIT Working Paper) | 2018-01 | Acemoglu and Restrepo's foundational task-based framework showing automation displaces labour from tasks machines can perform, establishing the theoretical scaffold for all subsequent empirical AI labour-market research. |
| a11 | A Task-Based Approach to Inequality | Oxford Open Economics | 2024 | Acemoglu and Restrepo's synthesis of task-displacement theory applied to AI, arguing that automation reduces labour share and may depress wages unless counterbalanced by creation of new labour-intensive tasks — the key 'this time is different' test. |
| a12 | Job Transformation, Specialization, and the Labor Market Effects of AI | Working paper (NBER-affiliated) | 2024 | Formal model projecting LLM-induced automation onto heterogeneous workers, finding wages drop up to 35% in the most exposed occupations while rising roughly 4% at moderate exposure, with AI raising returns to social and non-routine manual skills. |
| a13 | Automation and Augmentation: Artificial Intelligence, Robots, and Work | Annual Review of Sociology | 2024 | Comprehensive literature review confirming displacement effects from task automation persist while noting that automation efficacy does not increase monotonically, and that policy intervention is required to prevent widening inequality. |
| a14 | AI and the Future of Work: A Literature Review | arXiv | 2024-08 | Synthesises the labour-economics consensus, noting Acemoglu's estimate of only 0.71% TFP gain from AI over ten years contra Goldman Sachs' 7% GDP uplift, illustrating the wide empirical disagreement on net job creation vs displacement. |
| a15 | Platform Competition in Two-Sided Markets | Journal of the European Economic Association | 2003 | Rochet and Tirole's foundational two-sided-market model explaining how internet platforms court buyers and sellers simultaneously, providing the theoretical basis for understanding how wave-one disintermediation created structural preconditions for wave-two platform economics. |
| a16 | Platform Power in AI: The Evolution of Cloud Infrastructures in the Political Economy of Artificial Intelligence | Internet Policy Review | 2024 | Empirical analysis of AWS, Azure, and Google Cloud trajectories from 2017 to 2021, tracing how hyperscalers operationalise infrastructural power through the cloud-to-AI dependency chain. |
| a17 | Big AI: Cloud Infrastructure Dependence and the Industrialisation of Artificial Intelligence | Big Data & Society | 2024 | Documents how Amazon, Microsoft, and Google use cloud credits, APIs, and technical support to enrol AI startups into their infrastructure stacks, making hyperscaler lock-in the primary mechanism of value capture from the AI wave. |
| a18 | Hype, Sustainability, and the Price of the Bigger-is-Better Paradigm in AI | arXiv | 2024-09 | Documents the circular capital structure where three hyperscalers contributed two-thirds of the $27 billion raised by AI startups in 2023, and applies Jevons' Paradox to explain why efficiency gains from scaling increase rather than reduce overall resource consumption. |
| a19 | Scaling Laws of Synthetic Data for Language Models | arXiv | 2025-03 | Examines whether synthetic data can substitute for organic web-scraped corpora as natural language supply approaches saturation, testing whether scaling laws hold when training tokens are model-generated rather than human-produced. |
| a20 | Web Crawler Restrictions, AI Training Datasets and Political Biases | arXiv | 2025-10 | Shows that increased robots.txt restrictions by moderate news sources push AI training corpora toward hyperpartisan content, linking data-access restrictions to downstream bias in the political composition of training sets. |
| a21 | Somesite I Used to Crawl: Awareness, Agency and Efficacy in Protecting Content Creators from AI Crawlers | ACM Internet Measurement Conference 2025 | 2025 | Active and passive measurement study of AI crawler behaviour across popular sites, finding that 50–70% of website traffic is now automated and that crawlers do not reliably respect robots.txt directives. |
| a22 | Generative AI and the Future of the Digital Commons | arXiv | 2025-08 | Frames the foreclosure of open web data through an Ostrom commons lens, documenting that 65% of Wikimedia's most expensive traffic now originates from AI crawlers and analysing governance frameworks for distinguishing search, archival, and training crawlers. |
| a23 | A Critical Analysis of the Largest Source for Generative AI Training Data: Common Crawl | ACM FAccT 2024 | 2024-06 | Provides a structural critique of Common Crawl as AI training infrastructure, examining governance, funding ties to AI labs, and the copyright and accountability gaps embedded in its data pipeline. |
| a24 | Displacement or Complementarity? The Labor Market Effects of Generative AI | Harvard Business School Working Paper | 2024 | Finds a 24% decrease in generative AI-exposed skills per firm per quarter in highly automatable jobs post-ChatGPT, against a 15% increase in augmentation-exposed jobs, providing the most granular job-posting evidence on the dual displacement-complementarity dynamic. |
| a25 | Artificial Intelligence, Domain AI Readiness, and Firm Productivity | arXiv | 2025-08 | Examines why many firms fail to realise AI productivity returns despite heavy investment, finding that domain AI readiness — quality of external academic and data infrastructure — is a stronger predictor than internal technical capability alone. |