Going Beyond GPT Wrappers: The Next Phase of AI Agents

Sep 05, 2025

The Unfulfilled Promises of AI Agents

AI agents have taken the industry by storm, especially over the past year, gaining strong traction. Developer data reveal that over 51% of developers are already running agents in production, with 78% planning to deploy them soon.

With over 70% of TradFi trades being algorithmic, it is extremely challenging for less sophisticated actors to compete due to their lack of deep pockets, access to specialised hardware, or privileged data.

Within this context, AI agents promised to level the playing field for trading, eliminating barriers to entry for everyday users.

This proved to be extremely appealing within Web3, leading to a wave of projects. At its height, over 1500 AI agents were launched daily on Virtuals, the leading AI agent launchpad.

In the end, most of these were almost indistinguishable from one another.

While a quick Google search will lead you to believe you’re going to become rich with a personal AI agent trading while you sip a margarita on the beach, the reality is vastly different.

Beneath these promises lies a sobering reality: most AI agents have failed to deliver on their bold claims.

Most of these “GPT wrappers” are black boxes prone to hallucinations, non-repeatable behaviour, and even indirect prompt injection hidden inside the content they read. That might be fine for demos, but it’s unacceptable when real money is at stake.

These agents are great at writing poems, but not so much at managing portfolios. For DeFAI to manage billions, we need strategies you can verify before they touch funds, and monitor after they go live.

For agentic trading to be feasible, it demands determinism: a strategy must produce the same decision for the same inputs, every time.

The current LLM-based strategies remain, instead, non-deterministic, yielding inconsistent outcomes and facing several further issues. For users and institutions to entrust AI agents with their funds, a better way must be found.

This article explores the current challenges in AI asset management and introduces Almanak, which champions a new vision where agents are leveraged to create and iterate deterministic quantitative strategies.

From LLMs to Structured Multi-Agent Strategies

One of the most interesting developments within AI and Web3 is the concept of “DeFAI”, where AI agents can be leveraged to simplify DeFi operations, such as swapping, providing liquidity, or other tasks. DeFAI promised to revolutionise how anyone can operate across DeFi protocols, onboarding new users and unlocking a whole new level of liquidity and efficiency.

Nonetheless, while these agents were able to simplify how users can carry out operations, their solutions were still plagued by several issues:

Landscape fragmentation: Operating across the cryptocurrency landscape is increasingly complex, given the high number of blockchain networks and protocols.
Information Overload: Users require vast amounts of data to train their agents, and specific data is often updated at a rate that exceeds their ability to keep up.
Execution Challenges: While DeFAI agents can carry out simple operations, they struggle to execute them correctly as they become more complex.
Avoid Human Intervention: Many AI agents are capable of acting independently and do not require human supervision, which is beneficial in terms of automation but also raises concerns regarding approvals and accountability.

Many of these issues stem from the way agents manage funds, with most DeFAI agents utilising LLM-based asset management.

While this approach is optimal for gathering and analysing data, generating signals, and even executing trades, the final execution is not verifiable or deterministic, but rather probabilistic. Furthermore, generalised LLMs also lack precision concerning financial investments.

To partially address this issue, some protocols are introducing multiple specialised agents on top of these LLMs. While this solves issues such as task specialisation, modularising the system, and enabling the collaborative use of different agents, these strategies are still unpredictable and nondeterministic (e.g., their outcome is not consistent), as they are still defined and executed within the LLM context.

This is a deal breaker for trading-specific use cases: no sane user of an institution would entrust an LLM with significant amounts of money, as it would face risks of hallucination and non-repeatable behaviour, leading to different results each time, and making the process unreliable.

One of the biggest threats these designs are exposed to is prompt injection. This is an indirect attack, which involves attackers injecting malicious commands into the data used for LLMs, causing them to override instructions and behave differently than intended.

An example of this is Freysa’s competition, designed to have users trick an agent whose only command was “not to send money to anyone”. Eventually, someone managed to create a prompt convincing the agent to send him the money.

Examples like this highlight the need for more structured AI workflows, as envisioned by Almanak.

Ensuring Deterministic Execution

Almanak’s design centres on two main aspects:

Determinism: Strategies must be hard‑coded and verifiable within quant frameworks.
Institutional-grade Quality Assurance: Workflow design should mirror the rigour of traditional quant risk management and strategy vetting.

Almanak recognises the importance of strategies being hard-coded and deterministic, rather than relying on plain AI-managed execution. A pragmatic approach is warranted, taking the best from both worlds. Since AI agents are much faster at coding and reasoning, they are used to discover and create strategies. These strategies, however, are still deterministic and not LLM-based: they are hard-coded, instead of being directly executed by the AI agents.

In other words: AI speed on audited rails.

Human oversight remains in the loop, as users will be able to communicate and orchestrate a swarm of 18 different and specialised agents (dedicated to tasks like coding, debugging, data gathering, etc.). Users retain full control and are responsible for deploying, backtesting and improving the strategy, while agents play a significant role in refining and iterating it. The way the workflows are structured is similar to how this is traditionally done within hedge funds and other traditional trading environments.

A straightforward example of how Almanak works is that every user has access to a quant team at their disposal, enabling them to craft deterministic strategies based on their desired outputs.

This shifts the role of AI agents from executing strategies to collaborating as a team to develop them, leading to several benefits:

Tested, Verifiable and Audited Strategies: the process to craft a strategy starts from curated inputs → sandboxed backtests subject to human approval → permissioned execution → continuous monitoring/pause/improvement.
Safety by Design: AI handles coding, testing, and strategy iteration, while humans steer and approach them, leading to incredible reductions in the development time (from weeks to hours).
User Control: permissions are explicit and visible, Almanak’s vaults are all self-custodial and secured by smart contracts.
Automated Execution: Rather than using LLM for execution, Almanak leverages Automations created via Enso

A Bright Future for Agentic Asset Management

At present, most agentic asset managers hold minimal capital under management. This is unsurprising, given that their LLM-based strategies are, in fact, unable to match the requirements of traders, who need secure, verifiable and deterministic strategies.

Performance quality, in the form of determinism, verifiability and consistency, remains the main issue when using AI agents.

This limits their scope, rendering AI agents largely unsuitable for asset management, which was instead intended to be one of the main benefits.

To solve this problem, Almanak is leading the shift to deterministic strategies with its pragmatic approach. Almanak transforms AI agents from autonomous money managers managing funds themselves into strategic collaborators, leveraged by users like a personal quant team to craft and iterate hard-coded and deterministic strategies. This approach is currently proven by deposits and usage of the swarm, with Almanak’s autonomous vault reaching over $40m in TVL.

GPT wrappers will never manage sizable funds. While they were a necessary first step, the path to scaling DeFAI is simple to state but hard to execute: tested strategies, explicit permissions and self-custody, human approvals, and continuous iteration.

While comparing generalised AI agents with Almanak might be complex, due to their different scopes, it is instrumental to highlight their different focuses. Most LLM-based output is generalised and related to onchain execution, while Almanak is more focused on making onchain asset management easier, by focusing on the verifiability and deterministic nature of its strategies.

Over the next few weeks, we’ll dive deeper into Almanak, exploring its technical architecture, user experience, and how it empowers users to create and deploy AI-supercharged quant strategies.

Curious to try Almanak?

Make sure to head over to Almanak’s Kitchen and sign up for a whitelist! After being approved, you’ll be able to test some community strategies and even create your own: https://kitchen.almanak.co/

Discussion about this post

Ready for more?