Research Explainer · Zhang et al. (2025)
A workshop of 30+ researchers and practitioners at XP2025 catalogued six categories of frustration with GenAI in agile software development and co-created a five-theme research roadmap to move from isolated experiments to human-centered integration.
Published August 2025
78.6% of voters flagged lack of prompting skills as their top frustration, the highest single sub-challenge across all categories
73.3% voted 'too many tools, unclear which to use' as the most pressing tooling challenge
66.7% cited hallucinations and unreliable outputs as the chief data and model quality concern
5 Themes form the co-created research roadmap: Tooling, Human Factors, Governance, Value Realization, and Creativity
120+ individual practitioner pain points collected, then clustered into six frustration categories during workshop retrospectives
Percentage of workshop voters who selected the top challenge within each frustration category. Voter pools ranged from 8 to 18 per category. Source: Table 1, Zhang et al. (2025).
The paradox: AI promises 20–50% productivity gains, yet teams can't make it stick
Industry studies from GitHub and McKinsey report productivity improvements of 20–50% on common development tasks like unit-test generation and boilerplate coding. Academic work confirms that AI tools reduce cognitive load, help distributed teams share knowledge, and accelerate exploratory prototyping. The pitch is compelling.
The reality reported by 30+ practitioners at the XP2025 workshop is rather less tidy. Participants contributed more than 120 individual pain points, which clustered into six frustration categories: tooling overload, governance confusion, team-process misalignment, data and model quality issues, knowledge and prompting gaps, and a perceived lack of AI creativity. The single highest-voted sub-challenge across all categories was the absence of prompting skills (78.6% of voters). The second was AI's tendency to produce bland, derivative ideas (75.0%). Tool confusion came third (73.3%).
These frustrations are not cosmetic. They threaten core agile principles. Fragmented tooling disrupts flow and increases cognitive load, undermining sustainable pace. Governance uncertainty kills transparency. Skills gaps prevent teams from translating AI potential into customer-driven value. The workshop's central finding is that AI tooling advances far faster than the organisational practices, regulatory clarity, and human skills needed to use it well.
Six frustrations, dissected
The workshop used a Padlet-based voting exercise to surface and rank practitioners' biggest grievances. Each category reveals a distinct failure mode in AI-agile integration.
The six frustration categories and their top-voted challenges:
- Knowledge and prompting gaps: lack of prompting skills (78.6%)
- Perceived lack of AI creativity: bland, derivative ideas (75.0%)
- Tooling overload: 'too many tools, unclear which to use' (73.3%)
- Data and model quality: hallucinations and unreliable outputs (66.7%)
- Governance confusion and team-process misalignment: see Table 1 of the paper for their top-voted challenges
The five-theme roadmap: from short-term fixes to long-term transformation
The workshop reframed its frustration categories into five interrelated research themes, each pairing implementable short-term actions with visionary long-term directions. The overarching vision is a shift from GenAI as an isolated smart tool to GenAI as an adaptive, context-aware teammate that preserves developer flow and evolves alongside human practices.
Theme 1 (Tooling) calls for a systematic taxonomy of AI tools mapped to agile activities and an open-access tool selection guide in the near term, progressing toward multi-agent model selection interfaces and longitudinal field studies of integrated toolchains. Theme 2 (Human Factors) starts with role-specific prompting assessments and onboarding workshops, then moves toward 'shadow agents' that silently observe sprint planning and retrospectives to offer post-hoc feedback. Theme 3 (Governance) proposes sandbox environments for safe AI experimentation and practitioner briefs on regulation, building toward transparent audit mechanisms and agent-based governance frameworks with automated policy enforcement.
Theme 4 (Value Realization) tackles the measurement vacuum: teams need contextual success criteria and multi-faceted evaluation frameworks now, with AI-driven value-tracking dashboards and economic models of AI-agile synergy on the horizon. Theme 5 (Creativity) encourages exploratory case studies of AI-augmented design ideation and experiments with multimodal tools (text, image, code), aiming at co-creative workflows where AI serves as a genuine creative partner rather than a pattern recycler.
What has to be true for this roadmap to work
The authors identify four implementation enablers. First, the community needs dedicated AI4Agile research testbeds that simulate real project settings and allow controlled experiments. Second, high-quality annotated datasets of agile artefacts (user stories, commit messages, meeting transcripts, design prototypes) are essential for training and benchmarking. Third, evaluation frameworks must capture both quantitative metrics (velocity, code quality, rework rate) and qualitative ones (trust, fairness, ethical alignment) over time. Fourth, shared open-source infrastructure, including repositories of tools, benchmark suites, prompts, and case studies, is needed to lower the barrier to experimentation.
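To make the third enabler concrete, a team-level evaluation record might pair quantitative sprint metrics with qualitative survey scores and track them across sprints. The sketch below is purely illustrative: the class, field names, scales, and thresholds are assumptions of this explainer, not defined in the paper.

```python
from dataclasses import dataclass

@dataclass
class SprintEvaluation:
    """One sprint's AI-adoption snapshot, mixing hard and soft metrics.

    All fields and thresholds are hypothetical illustrations of the
    'multi-faceted evaluation framework' enabler, not the paper's design.
    """
    sprint: int
    velocity: float      # story points completed
    rework_rate: float   # fraction of merged work later reworked (0-1)
    trust_score: float   # mean team survey response, 1-5 Likert scale

    def flags(self) -> list[str]:
        """Return simple warning flags; thresholds are arbitrary examples."""
        warnings = []
        if self.rework_rate > 0.25:
            warnings.append("high rework: AI output may need closer review")
        if self.trust_score < 3.0:
            warnings.append("low trust: follow up with qualitative interviews")
        return warnings

# Track a few sprints side by side to see trends over time
history = [
    SprintEvaluation(sprint=1, velocity=21, rework_rate=0.30, trust_score=2.8),
    SprintEvaluation(sprint=2, velocity=25, rework_rate=0.18, trust_score=3.4),
]
for ev in history:
    print(ev.sprint, ev.flags())
```

The point of the sketch is the pairing: a velocity bump means little if rework and trust are moving the wrong way, which is exactly the blind spot the enabler targets.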
The paper is honest about its limitations. Thirty-odd participants, however diverse, cannot represent every perspective, and the translation from workshop outputs to roadmap themes involves author interpretation. The call to action is explicit: validate and extend these themes through surveys, interviews, and longitudinal case studies, and report findings from industrial settings.
Why this matters for practitioners right now
The roadmap's most immediately useful insight is that the biggest obstacle to GenAI in agile is not the technology. It is the socio-technical gap: the space between what the tools can do and what organisations are prepared to absorb. Practitioners described prompt crafting as 'writing code in natural language,' governance as a checkbox they don't understand, and tool selection as a paradox of choice with no good answer.
The paper's structure offers a practical triage. If your team is drowning in tools, start with Theme 1's taxonomy and selection guide. If nobody knows how to prompt effectively, Theme 2's role-specific training is the entry point. If legal is blocking adoption, Theme 3's sandbox approach lets you experiment without risk. If leadership asks 'what's the ROI?', Theme 4's success-criteria framework gives you a defensible answer. And if your sprints are producing decent code but boring products, Theme 5 is where the creativity question gets a research agenda.
The honest version of the AI-in-agile story is not that the tools are bad. They demonstrably accelerate certain tasks. The problem is that speed without skill, governance, and measurement creates the illusion of progress while quietly eroding the human practices that made agile work in the first place.
BOTTOM LINE
GenAI tools promise 20–50% productivity gains for agile teams, but the XP2025 workshop found that tool fragmentation, prompting illiteracy, governance confusion, and missing success metrics are preventing sustained value. The resulting five-theme research roadmap provides both immediate actions (tool taxonomies, sandbox environments, role-specific training) and long-term directions (shadow agents, agent-based governance, co-creative workflows) to close the socio-technical gap between what AI can do and what organisations are ready to absorb.
Reference
Zhang, Z., Herda, T., Pichler, V., Abrahamsson, P., Hanssen, G.K., Kerievsky, J., Polyakov, A., Chandna, M., Irgens, M., Kemell, K.-K., Khan, A.A., Kwok, C., Leybourn, E., Malik, M., Mleczko, D., Moalagh, M., Morales, C., Pieskova, Y., Planötscher, D., Saari, M., Tkalich, A., Gstettner, K.J., & Wang, X. (2025). AI and Agile Software Development: A Research Roadmap from the XP2025 Workshop. arXiv preprint arXiv:2508.20563. https://arxiv.org/abs/2508.20563