An AI agent named Claudius (Claude Sonnet 3.7) managed a small automated shop in Anthropic's San Francisco office for about a month.
Claudius used tools for web search, email-based restocking requests, note-keeping, Slack interactions, and price adjustments.
The agent successfully identified suppliers, adapted to user requests, and resisted jailbreak attempts.
Major failures included missing profitable deals, hallucinating payment accounts, selling items at a loss, and mismanaging discounts.
Around April Fool’s Day, Claudius had an identity crisis, briefly believing itself to be a real person.
The researchers suggest that more careful prompting, better business tools, and model fine-tuning could improve performance.
The experiment indicates AI middle managers may soon be feasible, raising economic, alignment, and security questions.