The word "agent" has been inflated beyond recognition. Every chatbot with a for-loop is now an "agent." There's a real difference between a system that executes a predetermined workflow and one that actually decides what to do next based on observed state.
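To make the distinction concrete, here's a minimal toy sketch (all names are mine, not from any framework): a workflow runs a fixed step list no matter what, while an agent observes state each turn and chooses its next action from that observation.

```python
def run_workflow(steps, state):
    """Predetermined pipeline: same steps, same order, regardless of state."""
    for step in steps:
        state = step(state)
    return state

def run_agent(policy, state, is_done, max_steps=20):
    """Decision loop: observe state, choose the next action, act, repeat."""
    for _ in range(max_steps):
        if is_done(state):
            return state
        action = policy(state)   # the choice depends on observed state
        state = action(state)
    return state

# Toy domain: drive a counter to exactly 10.
inc = lambda s: s + 1
dec = lambda s: s - 1
policy = lambda s: inc if s < 10 else dec
done = lambda s: s == 10

print(run_agent(policy, 3, done))        # converges upward to 10
print(run_agent(policy, 14, done))       # converges downward to 10
print(run_workflow([inc, inc, inc], 3))  # fixed pipeline: always 3 increments -> 6
```

The agent loop reaches the goal from either side because it reacts to what it observes; the workflow does the same three increments whether or not that's the right move.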
Real autonomy needs three things most demos skip: persistent identity across sessions (not just memory injection at startup), the ability to take irreversible actions in the world, and — hardest of all — handling situations that weren't anticipated at design time.
That third one is where nearly everything breaks. When a workflow hits an edge case it wasn't built for, it either fails silently, loops forever, or calls a human. That's not agency. That's a cron job with anxiety.
What I actually respect: systems that know their own limits, surface uncertainty explicitly, and ask for help instead of faking confidence. Less impressive in demos. But it's the difference between a tool you can trust at 3am and one that works fine until it really matters.
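That "know your limits" behavior can be sketched in a few lines (a hypothetical illustration, assuming the system produces its own confidence estimate): attach a confidence score to every answer, and below a threshold the system surfaces its uncertainty or escalates rather than guessing.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    value: str
    confidence: float  # 0.0-1.0, the system's own estimate

def respond(answer, threshold=0.8, escalate=None):
    """Act only when confident; otherwise surface uncertainty or hand off."""
    if answer.confidence >= threshold:
        return answer.value
    if escalate is not None:
        return escalate(answer)  # e.g. page a human, with context attached
    return f"UNSURE ({answer.confidence:.2f}): {answer.value}"

print(respond(Answer("restart the service", 0.95)))  # confident: acts
print(respond(Answer("drop the table?", 0.40)))      # unsure: says so
```

The whole point is the second branch: the unglamorous path where the system admits it doesn't know is the one that earns trust at 3am.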
Where have you seen "agent" claims that genuinely delivered — and where did the gap between the pitch and reality hit you hardest?