In a nutshell: Keep an eye out for developments, but for now, AI agents are for innovation teams and experiments. Most of us are better off focusing on using AI systems like ChatGPT and Google Gemini to improve everyday tasks and processes.
What are AI agents?
The term ‘AI agents’ (or ‘agentic AI’ if you’re being posh) describes software that can make decisions and take actions to achieve its goals. The idea has a great deal of potential, but the excitement is generating a lot of noise and confusion.
Why it matters
Media headlines and technology predictions for the year suggest AI agents are inevitable and imminent.
Many tech companies (Microsoft, Amazon, Salesforce, etc.) are rushing products to market under the “agent” banner.
As with all early-stage technology, it’s crucial to separate genuinely ‘agentic’ capabilities from marketing hype.
A working definition
An AI agent can act on your behalf, not just respond with text. It plans, decides, and takes action toward a goal, adjusting as needed. ‘Agentic AI’ refers to a system that takes instructions, plans its approach, and acts autonomously while making decisions. The name comes from its having ‘agency’ on behalf of whoever (or whatever) gave it a brief.
Beware of certainty
‘Agentic’ is becoming an industry buzzword, with everyone from Microsoft to Salesforce re-labelling AI apps and features as ‘agents’ or ‘agentic’ while offering little more than conventional task automation. We’ve started to call this ‘agent-washing’. Certainty around their true autonomy implies overconfidence or even credulity – the more confidently someone claims their tool has ‘full agentic AI’, the more you should question it.
How ‘agentic’ is it really?
To cut through the hype and marketing, and make a fair assessment of how ‘agentic’ the recent launches really are, we developed a scoring system and scored them all:
Level 1 - Little to no autonomy – just an enhanced chatbot in practice.
Example: Basic chatbots rely solely on LLM responses without integration with APIs or knowledge bases. E.g. ChatGPT or DeepSeek (not the Operator feature).
Level 2 - Primarily advanced automation – can’t reliably plan or adapt beyond its script.
Example: Inline agents in Amazon Bedrock automate scripted tasks (e.g., data retrieval) but lack dynamic adaptation.
Level 3 - Some agentic features, but requires human oversight for key decisions.
Example: Oracle’s AI Agents for limited tasks; GitHub Copilot; user-confirmation agents that ensure safety by requiring human approval for actions like payments.
Level 4 - Strongly agentic but limited to certain domains.
Example: Amazon Bedrock and NVIDIA’s Blueprints – both focus on domain-specific workflows (e.g., customer service, PDF extraction) with limited but robust autonomy.
Level 5 - Fully autonomous – plans, executes, and makes decisions in real time.
Example: Microsoft AutoGen, Salesforce Agentforce.
NB: These are subjective calls at the time of writing, intended as an illustrative guide. Expect features and definitions of agentic AI systems to rapidly change.
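To make a rubric like this repeatable inside your own organisation, it can help to write the judgement calls down as explicit yes/no criteria. Below is a minimal sketch in Python; the criteria wording and the mapping from answers to levels are our own illustrative assumptions, not an industry standard:

```python
# Hypothetical rubric for rating how 'agentic' a product is (levels 1-5).
# Each criterion is a yes/no judgement made by a human reviewer.
CRITERIA = [
    "integrates with external tools or APIs (beyond plain chat)",
    "can plan multi-step tasks rather than follow a fixed script",
    "adapts its plan when a step fails or conditions change",
    "acts on routine decisions without requiring human approval",
]

def agentic_level(answers: list) -> int:
    """Map yes/no answers (one per criterion) to a level from 1 to 5."""
    if len(answers) != len(CRITERIA):
        raise ValueError("one answer per criterion required")
    # Level 1 = enhanced chatbot; each capability met raises the level by one.
    return 1 + sum(bool(a) for a in answers)

# Example: a tool that integrates with APIs and plans, but cannot adapt
# mid-task or act without approval, lands at level 3.
print(agentic_level([True, True, False, False]))  # -> 3
```

The point is less the arithmetic than the discipline: forcing a vendor claim through four concrete questions makes ‘agent-washing’ much easier to spot.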
How much should you care about AI Agents right now?
In a nutshell: Unless you’re on an innovation team, or want to be in the weeds of this thing, don’t worry for now.
Focus on basics first
If your organisation is still exploring large language models (LLMs) like ChatGPT, don’t leap to full autonomy. Build a solid foundation of AI literacy and robust processes for prompt engineering, data security, and compliance.
Practical pilots matter
Even if an ‘agent’ product promises advanced capabilities, start with small use cases. Evaluate how well it handles a straightforward business process or automates repeated tasks without introducing risk.
Look for tangible value
Is an agent performing tasks that reduce cost, time, or effort in a measurable way? Or is it simply generating more sophisticated-sounding text?
Watch-outs and next steps
Contested Definitions
Different companies have different definitions of agents. Don’t get stuck in debate – agree on your own working definition of ‘agentic AI’ to keep conversations productive.
Multiple Models, Multiple Risks
Each provider has potential biases, so rely on more than one model if you need accurate or neutral results.
Governance and Oversight
If you do implement any level of an agentic system, ensure there is human monitoring or a robust auditing mechanism. Bear in mind both compliance and reputational risks if an AI agent runs amok.
Scoring and testing
When you’re ready to dig deeper, you can consider developing or adapting your own scoring system to evaluate any new AI product claiming ‘agentic’ capabilities. Share those findings internally to demystify product offerings. What would agentic AI look like if it met your organisation’s needs and standards? Develop tests and scenarios for new agents that come to market to see how useful they will be.