AI Agents and Finance

I met with Matteo Fabro, an AI innovator on a mission to disrupt financial research and analysis.

With degrees in International Economics and Finance from Università Bocconi as well as Music from the Conservatory of Music C. Pollini, Matteo brings a unique blend of technical expertise and creative thinking. He has experience developing deep learning applications, task automation, and autonomous AI agents that multiply productivity. In this conversation, Matteo shares his insights on how AI is transforming the world of finance. We explore the current state-of-the-art in AI agent technology and how it’s being applied to automate complex research and analysis tasks. Matteo explains his approach to designing AI agents, key challenges in deploying them in the financial domain, and the immense potential they hold to streamline workflows. We also discuss the future outlook for autonomous AI and its implications for the finance industry.

The following is an edited transcript of our conversation.

David 
What is your current role? How did you get to do what you’re doing?

Matteo 
My education started with music because I attended the conservatory at a very young age, graduating early in piano while also attending high school. When I had to choose what to study at university I was conflicted because I wanted to do so many things. I would have liked to study physics, my passion, or even neuroscience and cognitive studies. In the end, partly due to peer pressure, I chose finance to be able to study in Milan in a relatively international environment while remaining in Italy, so I attended Bocconi.

Even before studying finance I was interested in machine learning and its various models, well before they became fashionable. For example, I started using the first Cleverbot when I was ten years old. It was a very simple neural network, maybe not even a real neural network but a pre-programmed language model, and I was fascinated by the fact that I could play with it practically endlessly. This recalls the famous Eliza effect, where people interacted with a very rudimentary bot but became convinced that it was a real cognitive entity.

David 
It’s a super cool effect that we’re seeing intensify with newer applications. People allow themselves to engage in dialogue with machines in ways that exceed expectations, and they project their emotions onto this emerging human-machine relationship.

Matteo 
Absolutely. Then I studied the basics of machine learning, without venturing too far, because it is a field that requires an investment of resources I didn’t want to make in depth. I remember that during my first internship, while working on variable interpretation for B2B purposes, I was studying different network architectures. I went from simple recurrent networks to LSTMs to transformers, which were little known at the time but are now very popular. The timing was right: just then OpenAI released GPT-3, the first language model to show real signs of intelligence.

David 
It is worth explaining, for those who follow us and are not specialists, that within the great field of information technology, which is somewhat inappropriately labeled computer science because I see it more as an engineering field than a scientific one, we have several subsets.

Matteo 
A somewhat transdisciplinary field.

David 
Yes, even if we want to be more precise, the difference is that a scientific field discovers principles and laws of nature, formalizes them in theories and then falsifies, or at least partially verifies, them with experiments. An engineering field instead applies the results of the first through combinations and formulations that may be far from trivial and may require comparable effort to identify and achieve, but that do not necessarily identify and incorporate new principles. In computing the two partially overlap, but it is relatively rare for a new type of computer to start from first principles. The von Neumann architecture has dominated, and still dominates, the way we use computers since the 1940s. For 30 years there have been quantum computers, but they have not yet reached a distribution widespread enough to have an impact on what computing can do. Timidly, or even very ambitiously, but still theoretically and yet to be proven, there are new approaches that try to make computers much more energy-efficient, a bit like the brain, which does with just 20 watts what our AI systems can’t match even with 100 megawatts. This illustrates that new systems rarely go back to first principles to improve computing.

Within this field we have artificial intelligence, a particular subset that contains many different areas such as classification systems, recommendation systems, predictive maintenance and many other things. Within that are machine learning systems, within which we find deep learning. And within that, one of the approaches is that of transformers, which with GPT-2, GPT-3, GPT-4 and, since yesterday, GPT-4o have achieved truly brilliant capabilities, attracting many talents and curious people like you. I think it was important to take this little journey for those who follow us. Getting more specific: what are the skills these networks have developed?

Matteo 
The most significant was precisely with GPT-3. Its developers discovered that this system, trained simply to predict the next word in a sequence, was able to pick up certain capabilities without having to learn them from scratch, simply by being given the final instruction or a few examples in its context. It was able to perform new skills it had not been explicitly trained for, such as translation.

David 
Can I ask you if you think a slightly different wording is correct? Training on an extremely large set of information means that, although the system is not explicitly oriented towards having certain skills, there are latent ones waiting to be discovered by users or programmers, who keep being surprised because the system can do something nobody simply expected. You only find out because someone thinks to ask it: “But do you know, are you capable of doing X?” And the system responds…

Matteo 
“Of course, I can.”

David 
“But why didn’t you tell us before?” “No one asked me.”

Matteo 
In the beginning the way this was done was by giving it examples. For example, you told it “This is French to English, this is Spanish to English, this is Italian to English” and then you asked it “Russian to English”, and it managed to complete the sentence. Translation was the flagship task of the original Transformer paper; the original GPT-3 paper, instead, is “Language Models are Few-Shot Learners”, where these “few shots”, as they are called because you give the model a few examples, are practically the training. This was very interesting from a machine learning point of view, because it means there is machine learning without having to invest thousands, tens of thousands, millions of dollars, as companies are doing now, to train a system for a specific task. Very simply, you give it three examples and it performs as if you had trained it for that. This was already very important. The only problem with this approach was that it was very unstable, because there were often errors of different types, which are now called hallucinations, even though in reality they are simply errors in the model.
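
To make the idea concrete, here is a minimal sketch of few-shot prompting: the “training” is nothing more than a handful of worked examples placed in the prompt itself. It assumes the OpenAI Python SDK; the model name and the examples are illustrative, and any capable chat model would do.

```python
# Minimal sketch of few-shot prompting: the "training" is just a
# handful of worked examples placed directly in the prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

few_shot_prompt = """Translate to English.

French: Bonjour, comment allez-vous ?
English: Hello, how are you?

Spanish: ¿Dónde está la estación?
English: Where is the station?

Italian: Grazie mille per il tuo aiuto.
English: Thank you very much for your help.

Russian: Доброе утро!
English:"""

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; any capable chat model works
    messages=[{"role": "user", "content": few_shot_prompt}],
    temperature=0,   # keep the completion as deterministic as possible
)
print(response.choices[0].message.content)  # e.g. "Good morning!"
```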

David 
The term “hallucination” certainly captures the attention of those who hear it, but it is too anthropomorphic a term. It implies that the system has a perception of reality and that, when it hallucinates, it visualizes and then communicates something that is not part of this reality. In my opinion this is a bit misleading, because we are still wondering how robust and resilient the so-called “world model” is, the scheme of reality that these systems develop. I try to replace this term, which I consider excessively anthropocentric, with something else, attributing the systems’ compulsion to give answers even when they are not certain, or even to invent non-existent quotes and references, to an insufficient introspective capacity and an inability to realize that it is better to stop. A bit like an overly anxious student at an exam who, instead of saying “I don’t know, I should think about it a bit more, I don’t know if I have the time”, goes on elaborating absurdities. In fact, maybe it’s worth mentioning that there are techniques that improve reasoning when you tell the system “Do chain-of-thought reasoning, think about it more, verify what you said before telling the user”. These explicit reminders produce objectively better results.
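
As a rough illustration of the “think, then verify” reminders described above, here is a hedged sketch of chain-of-thought-style prompting. The wording of the system prompt is illustrative, not a canonical recipe, and it again assumes the OpenAI Python SDK.

```python
# Sketch of explicit "think out loud, then verify" prompting:
# an instruction to reason step by step, check the draft answer,
# and prefer "I don't know" over invention.
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "Reason step by step before answering. "
    "After drafting an answer, verify each claim against the question. "
    "If you are not confident, say 'I don't know' instead of guessing. "
    "Never invent quotes or references."
)

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(ask("Which is larger, 9.11 or 9.9? Explain briefly."))
```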

Matteo 
Once GPT-3 had had its moment, ChatGPT came out and completely blew up the scene. ChatGPT is simply GPT-3 trained not only to predict the next word in the sequence as well as possible, but also to make the interlocutor like the answer. The interlocutor can say “I like it” or “I don’t like it”, and these two signals are how it has been instructed. This makes it work very well for being what it is.

David 
This user feedback is technically called “reinforcement learning from human feedback” (RLHF), and it is also part of the phase that precedes the release of these models to the public, especially when aligning them with desirable behaviors or discouraging undesirable ones. It’s a very delicate activity in terms of finding the right balance. I don’t know if we’ll go into detail about how you can do this, or how often you get it wrong in actually hitting the right spot. So you’re telling me that when the user presses thumbs up or thumbs down in the ChatGPT interface, that information is incorporated by OpenAI to improve the system?

Matteo 
Yes, sure.

David 
In your opinion, does this happen not in real time but in progressive phases of improvement, or only when a major new version is released?

Matteo 
No, it is a continuous process: the model is slightly modified whenever the reinforcement process runs. It’s effectively real-time reinforcement. There are now various techniques to do this asynchronously as well, so that a process which could take up to ten years with human operators on a very complicated model can perhaps be done in a few days. This methodology is being studied by various researchers. However, I don’t know much about it, it’s not what I do, so I’ll stop here on the applied methodologies.
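
As a toy illustration of the idea behind thumbs-up/down feedback, the sketch below trains a tiny “reward model” to predict user satisfaction; in RLHF proper, such scores become the reward signal used to fine-tune the language model itself. This is a cartoon of the concept built with PyTorch on fake data, not OpenAI’s actual pipeline.

```python
# Toy reward model: learn to score (prompt, response) pairs from
# thumbs-up (1) / thumbs-down (0) feedback. Embeddings stand in for
# a real text encoder; the data here is random, purely for shape.
import torch
import torch.nn as nn

EMB = 64  # pretend each interaction is already embedded in 64 dims

reward_model = nn.Sequential(
    nn.Linear(EMB, 32), nn.ReLU(), nn.Linear(32, 1)
)
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

# Fake feedback log: one embedding and one thumbs label per interaction.
embeddings = torch.randn(256, EMB)
thumbs = torch.randint(0, 2, (256, 1)).float()

for _ in range(100):
    opt.zero_grad()
    scores = reward_model(embeddings)   # predicted "user satisfaction"
    loss = loss_fn(scores, thumbs)
    loss.backward()
    opt.step()

# In RLHF proper, these learned scores become the reward that a policy
# optimizer (e.g. PPO) uses to fine-tune the language model.
```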

David 
An important difference between GPT-2 and GPT-3 is that GPT-2 was open source, while GPT-3 was not and still is not, and neither is GPT-4. This is in contrast to other models such as Llama, Mistral, etc., which, although only beginning to approach the performance of GPT-4, are available in open formats. This is very interesting because it enables a whole series of applications that cannot be built on OpenAI’s models. What would you still highlight regarding the evolution of these systems and how they have become useful in the space of a few years, now forming the basis of a new generation of applications?

Matteo 
Between GPT-2 and GPT-3 there is an ocean of difference, because GPT-2 is a model with essentially no practical application: hallucinations are everywhere, so there is no real use for it. GPT-3, on the other hand, is a model with actual practical use, especially once taught via human reinforcement. That difference makes one of them make sense and the other not. A very similar argument distinguishes GPT-4 from GPT-3. GPT-4 is the latest generation of this type of model and is again far superior to GPT-3 for various reasons. In particular, hallucinations, which are simply a measure of the amount of errors in the output, have been greatly reduced in GPT-4. GPT-4 demonstrates abilities that one would not expect at all from a next-word prediction system.

David 
GPT-4 scores in the top percentiles on tests of all types, from legal exams to mathematics, biology, the most disparate things. It demonstrates truly useful generality, not just generality for its own sake. One thing to highlight is that GPT-4 and GPT-4o, the system announced yesterday, have pushed multimodality forward. The “o” actually stands for “omni”. We don’t know what OpenAI will add next, but already today the same unified model is capable of receiving and returning different types of data: not only text but also images, audio and, we already know, even video, although that is not yet available to everyone. This multimodality is certainly an interesting feature and goes in the direction we need to go: that of agents. How would you define a system that is an agent, as opposed to a system that interacts but is not an AI agent?
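
For readers who want to see what the unified multimodal model looks like from the developer’s side, here is a minimal sketch of a single request mixing text and an image, using the OpenAI Python SDK. The image URL is a placeholder; audio and video follow the same unified-model idea but may require different endpoints.

```python
# Sketch of multimodal input: one chat request mixing text and an
# image URL, handled by the same unified model.
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What trend does this chart show?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},  # placeholder
        ],
    }],
)
print(resp.choices[0].message.content)
```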

Matteo 
The definitions here are still very nebulous, because this is something that has existed for a very short time, so it is not possible to give precise definitions. But let’s say that the LLM (Large Language Model) itself is only built to produce text output. The interesting thing is that this output can be used in various ways. One of them is to use the produced text as the input to a certain type of action. For example, as OpenAI is trying to do with searches on Google: the person says “I want to do this search”, the model expresses in words the action it wants to take, the computer intercepts those words, and they actually become an action in the real world. Obviously a Google search is not a very concrete action, but the same applies to things like sending an email. The LLM says “I want to email David saying…”

David 
While you were talking, I asked ChatGPT what an AI agent is and it gave me an answer that was too long: it mentioned autonomy, sensory ability, goal-oriented decision making, adaptability, interaction. Too much stuff. I asked it to reduce that to two short sentences and, lo and behold, the first term it used, “autonomous”, is absolutely right, because you pass…

Matteo 
…from having to carry out the action yourself, copying and pasting into the email, to having the email sent independently by the agent in a completely autonomous way. Then there are various definitions of agent. We have the simple autonomous agent, where for example you don’t let it decide the action, so you can have an agent that only sends emails. That’s practically not even an agent, it’s simply an instance of an LLM connected to email. Then there is the actual autonomous agent, where the agent not only decides the input to the action but also decides which action to take: first it wants to go to Google, then it wants to send the email depending on the information it found, and so on. Then there are the actual cognitive architectures, sometimes called ideal agents, where you have not only the ability to perform actions but also internal reasoning within the system. As you were saying before, it helps if you make these models “think out loud”, because they don’t think within themselves.
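
A bare-bones sketch of the autonomous-agent loop described here follows: the model chooses both the action and its input, the program executes the action and feeds the result back. The tools (web_search, send_email) are hypothetical stubs, and the JSON protocol is one common convention, not a standard.

```python
# Minimal autonomous-agent loop: the model picks an action and its
# input as JSON; the program dispatches the action and returns the
# observation, until the model says it is finished.
import json
from openai import OpenAI

client = OpenAI()

def web_search(query: str) -> str:   # stub: plug in a real search API
    return f"(top results for: {query})"

def send_email(text: str) -> str:    # stub: plug in a real mail client
    return f"(email sent: {text[:40]}...)"

TOOLS = {"web_search": web_search, "send_email": send_email}

SYSTEM = (
    "You are an agent. Reply ONLY with JSON: "
    '{"action": "web_search" | "send_email" | "finish", "input": "..."}'
)

def run_agent(goal: str, max_steps: int = 5) -> str:
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": goal}]
    for _ in range(max_steps):
        reply = client.chat.completions.create(
            model="gpt-4o", messages=messages
        ).choices[0].message.content
        step = json.loads(reply)            # the model's chosen action
        if step["action"] == "finish":
            return step["input"]
        observation = TOOLS[step["action"]](step["input"])
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": observation})
    return "stopped: step limit reached"

print(run_agent("Find the latest GPT-4o news and email a summary to David."))
```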

David 
Let’s say it doesn’t come naturally to it; you have to tell it explicitly.

Matteo 
There are two types of thought according to the theory: type 1 and type 2. Type 1 thought is the spontaneous kind, like mine now while I am speaking: I don’t deliberate over what I’m about to say, otherwise I wouldn’t have the cognitive capacity to think while I speak. But if I want to write something of a certain caliber, I sit there and think about what I want to write; I think to myself, I practically talk to myself while I write. That is type 2, where you reason explicitly in words. This is what you can make the language model do, as you were saying before, to increase performance on the task you want it to carry out, which can for example be acting as an autonomous agent.

David 
The book I’m showing on screen is by Daniel Kahneman, who passed away recently. He wrote this popular text, Thinking, Fast and Slow, on his theory of two systems of thought, system 1 and system 2, fast and slow. It has been a very successful approach that has also inspired several implementations in artificial intelligence. Returning to agents, in this sense it is clear that an insufficiently powerful AI cannot become an agent, because it would not accomplish anything. There must be a threshold beyond which it makes sense to think that an AI can become an autonomous agent, capable of structuring its own actions and generating a result useful for achieving the objective in the instructions it has received. In your example: sending an email to the right person, in the right tone, about the right things, at the right time, and so on. In your opinion, are we already at the point where AI agents can be built on current platforms? I’m showing the so-called leaderboard, the ranking of the most powerful models; I see that the one announced yesterday is not yet present. So, are these systems powerful enough to allow the creation of agents?

Matteo 
Of course. Again, the definitions are very opaque, so it’s difficult to say what is a real agent and what isn’t. However, if you start from the assumption that the agent is autonomous, has a certain type of internal reasoning, and so on, then certainly, for certain applications, yes. Even for applications that are not simple at all but quite complex, it is possible to create agents via the language model. Obviously this is a very, very new field, because it has only been possible for about a year and a half, since the GPT-3 line gained reliable zero-shot abilities, that is, you give the model the instruction without having to give it various examples. As we were saying, the model becomes more reliable and you can actually structure a real agent through instructions.

David 
We said that traditional training uses a huge amount of data, and the cleaner and more structured it is, the better, because it can represent a wide range of examples for different types of characteristics or information about the world. Then there is this emergent ability to learn from a few examples (few-shot learning) instead of many. Now you mentioned zero-shot learning, which is the ability to perform a new action or answer a new type of question without having seen any example at all. This is the general-intelligence goal everyone is working towards.

Matteo 
Exactly this: having a zero-shot system that works practically perfectly for every type of task. I’d say we’re very close.

David 
We took a short route and arrived at agents. Before we go further and give more examples of how AI agents can be applied, how one might try to make use of them today and what to expect tomorrow, I would like to ask you about your startup, The Asyst, a London-based company active in the field of finance. I don’t know if it’s still in the stealth phase, where its activities aren’t yet fully public, but maybe you can tell us something about what it does.

Matteo 
Very simply, we act as an agent for financial research: for funds, newspapers… today I spoke with the Wall Street Journal. Banks, in short, all the various stakeholders in finance, one way or another, have to do research, and it is a process that takes a long time. These professionals’ time obviously costs a lot for various reasons, and the agent manages to do a very similar job for a cost that is almost zero in comparison. For example, if a piece of research takes one of these analysts or journalists, say, 20 hours, and they get paid 50 euros an hour, that’s 50 times 20, or 1,000 euros for the company. The agent will instead do it for around 50 euros, so it’s a 95% saving from the company’s point of view. We started with an agent with memory, which was supposed to be an assistant for the world of finance. Then, little by little, we specialized in research specifically for institutions.

David 
Where are you at? Is it already available in beta, or still in alpha? Is it released? At what stage is development?

Matteo 
We did the alpha with various professionals in different banks, some funds, etc. Now, however, we are more in the beta phase, where we are starting to market the product to funds, newspapers, etc.

David 
So the type of use is as if it were a search engine, except that instead of returning a series of links to pages where you then have to manually collate the information, the agent takes on the whole objective and brings back a packaged report of a certain length, whether two pages or two hundred, whatever it may be.

Matteo 
What you are describing is closer to what companies of a slightly different type do. For example, there’s one called Perplexity that is very strong at this. OpenAI is now also working on such a project, most likely with Apple, and they will release it together. But what we do is a little different. Ours is not a search engine pushed to the nth degree; ours is actually a real digital analyst, in the sense that it does everything an analyst would do.

David 
And the difference lies in the fact that the search engine, even the advanced one Perplexity offers today and OpenAI will offer, carries out this collage of information and then stops there, while the analyst, having seen the available information, draws conclusions and possibly makes recommendations, right?

Matteo 
Yes, and not only that. The augmented search engine, if we want to call it that, cannot be compared to an analyst, because they are two completely different things. One is as if you did a Google search, jotted down three important points from the sources and gave the answer. That can save a person 5 to 10 minutes, which is still significant. But what we do instead is fundamental, quantitative, real research. That is a process that can take an analyst two, three, five, ten days, a few weeks. It’s a completely different order of magnitude. And clearly the reason we do this for financial institutions is also that there is a cost associated with it. Every action of the agent, i.e. every action of the language-model infrastructure, has a certain associated cost: the more the model works, the more it costs you in the end. Financial institutions in this sense…

David 
Have you already decided how you will structure your prices? Will it be offered on a subscription basis? Is it installed by the user? Is there a set of report packages it can generate? Does each report cost a certain amount? Does each page of the report cost a certain amount? How do you plan to…?

Matteo 
It’s very similar to how you would price a person. As you say, you pay a person by the hour. Instead of paying per hour, which is not a correct measure of the input, you pay per token consumed, essentially as credit towards the language model. It’s also much more measurable than an employee’s impact. This is useful because, if I am an employee, I can then work at a higher cognitive level than I otherwise would, working on more…
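
As a back-of-the-envelope illustration of per-token pricing, the sketch below compares an agent’s token bill with the hourly analyst from the earlier example. The prices and token counts are placeholders, not any provider’s actual rates.

```python
# Back-of-the-envelope per-token cost model. Prices are illustrative
# placeholders; plug in current rates for the model you actually use.
PRICE_PER_1K_INPUT = 0.005    # EUR per 1,000 prompt tokens (illustrative)
PRICE_PER_1K_OUTPUT = 0.015   # EUR per 1,000 completion tokens (illustrative)

def job_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# A long research run: say 5M tokens read, 500k tokens written.
print(f"Agent cost: {job_cost(5_000_000, 500_000):.2f} EUR")
# vs. the human analyst from earlier: 20 hours * 50 EUR/h = 1000 EUR
```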

David 
Here we go in a very important direction. In the meantime, thanks for describing The Asyst’s activities. Let’s touch on the possible consequences of artificial intelligence assistants, where there are two schools of thought. One says “Thank goodness, I can fire everyone”. The other says “No, I have important, passionate, talented people who, as you just mentioned, can now use these digital assistants to do the most value-added, most creative, most unprecedented things, because the routine functions we already know work are instead delegated to…”

Matteo 
It depends a lot on the application, because for certain applications there is certainly a risk of replacement. For example, with the application OpenAI released yesterday, where you talk to the agent by voice, a lot of telemarketers, people who sell over the phone, people who talk on the phone in general, will most likely be replaced by this software, because it no longer makes much sense to have a person do that.

David
It doesn’t make sense anymore. But I’m a little puzzled about this. Not because you’re wrong, in my opinion that is what will happen, but because I simply don’t answer the phone. How do telemarketers catch people? Because someone answers. I say this half-jokingly: the fact that there are so many of them means they must be doing something right. I don’t answer, so I don’t talk to them. Maybe they will find another use, I don’t know. Instead, in the world of finance, since that’s what we do…

Matteo
In the world of finance, the risk of substitution is very low because institutions are not interested in reducing costs. It is not their priority, as they manage billions or trillions of dollars. Personnel costs are small in comparison. What they are actually interested in is having a competitive advantage over others, and we offer them exactly this. The first to have technology of this type at their disposal will be much more efficient and will therefore have an important advantage.

David
I have found myself analyzing results better by checking and comparing them. I often have two or more engines do the same thing and then ask each one which of the various results is the best, without saying where they come from. Interestingly, most of the time they agree with each other. I wonder if your financial clients will do the same: get a detailed report based on your system’s output and then spend time and human resources to verify it, deepen it, complement it or compare it with others.

Matteo
This is precisely the idea for many. Obviously there are various uses. The journalist I spoke to today said he is interested in getting a general perspective and then verifying it himself. For others it’s more about finding opportunities you wouldn’t be able to spot doing the work manually. With a human analyst you can do one piece of research; with this system you can do over a hundred and rework them all together. The opportunities you can identify are far superior.
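
David’s cross-checking routine can be sketched in a few lines: ask two engines the same question, then have each one judge the anonymized answers. For simplicity both “engines” here are OpenAI models via the same SDK; in practice one would mix providers.

```python
# Blind cross-checking: two models answer the same question, then each
# judges the answers labeled only A/B, with no model names attached.
from openai import OpenAI

client = OpenAI()
MODELS = ["gpt-4o", "gpt-3.5-turbo"]  # illustrative pair of engines

def ask(model: str, prompt: str) -> str:
    return client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content

question = "Summarize the main risks of rising interest rates for banks."
answers = {m: ask(m, question) for m in MODELS}

ballot = (f"Question: {question}\n\n"
          f"Answer A:\n{answers[MODELS[0]]}\n\n"
          f"Answer B:\n{answers[MODELS[1]]}\n\n"
          "Which answer is better, A or B? Reply with one letter and why.")

for judge in MODELS:
    print(judge, "votes:", ask(judge, ballot)[:200])
```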

David
One of the most interesting application fields of these AI systems is to apply them as assistants to developers, so-called co-pilots, to increase productivity by avoiding errors such as vulnerabilities in the code. Do you think it could also be an application for agents, making them increasingly autonomous in creating code?

Matteo
Yes of course, in the end the whole computational world is code. What we are doing now is also code. This paradigm starts with agents developing code autonomously, but then we get to the point where the code no longer exists and only natural language instruction exists.

David
It will be interesting and delicate when the software agent is given access to its source code, which is already happening to a certain degree.

Matteo
We need to differentiate between entities. There is the agent as an entity that can operate on its own code, but the really interesting thing, if we talk about the technological singularity, is when the AI reworks the code of the intelligence itself. This would mean that the agent modifies not only the source code of the agent, but also the foundation model that underlies everything, such as GPT-4, Claude or Llama.

David
Allowing systems to interface with the world, to be autonomous in performing actions, and to have access to their own source code to improve it traces steps in a clear direction, one that represents the dream of some and the nightmare of others in this very rapid moment of development.

Matteo
Exactly, this is one of the approaches. Then there is also the approach of using several collaborative agents in a complex system, where you have multiple agents working together. Our software uses a primitive form of this, and it will become a very important field in the years to come. We don’t know yet what direction it will take, it’s still early days, but it seems like a very interesting area that a lot of people are working on.
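
A bare-bones version of this collaborative pattern might look like the sketch below: a “researcher” agent drafts and a “critic” agent pushes back, alternating turns. Frameworks such as Microsoft’s AutoGen, mentioned next, wrap this pattern with routing, memory and tool use; the role prompts here are purely illustrative.

```python
# Two collaborating agents implemented by hand: a researcher drafts,
# a critic pushes back, and both see the shared transcript each turn.
from openai import OpenAI

client = OpenAI()

def speak(role_prompt: str, transcript: list[str]) -> str:
    history = "\n".join(transcript) or "(start of conversation)"
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": role_prompt},
                  {"role": "user", "content": history}],
    ).choices[0].message.content

RESEARCHER = "Draft or revise a 3-bullet thesis on the topic, using the critique so far."
CRITIC = "Point out the weakest claim in the latest draft, in two sentences."

transcript = ["Topic: impact of AI agents on equity research."]
for _ in range(3):  # three draft/critique rounds
    transcript.append("Researcher: " + speak(RESEARCHER, transcript))
    transcript.append("Critic: " + speak(CRITIC, transcript))

print("\n\n".join(transcript))
```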

David
I’m showing an example of an agent swarm based on this library. What do you think?

Matteo
The idea is interesting, but the problem is that no one is working on it anymore; it has practically been abandoned. There are others that are actively developed. One of the most notable is Microsoft’s AutoGen. Then there are some very interesting smaller ones. It’s all still very experimental, there’s nothing production-ready, but little by little, with more powerful foundation models and reduced costs, we will certainly have something very interesting on this front.

David
You mentioned that your system represents about a 20x cost reduction compared to a human operator. Periodically OpenAI and others drastically lower costs; yesterday a 50% price drop for developers was announced. Things are going in a direction where we will have a superabundance of intelligence to manage. Agents coordinated with one another, possibly structured hierarchically, will ensure that this redundancy is distilled to the point where they are sure they can carry out an action with the desired effects, or let the human operator know that he can ignore unproductive paths and focus on an appropriate decision or confirmation.

Matteo
You raised an interesting point about the hierarchy of these multi-agent systems, which are still being researched, so it’s not yet known what direction they will take. We once talked about how a decentralized decision-making system would also be interesting, almost a republic of agents. That too could be very interesting to explore.

David
Matteo, I know you have an upcoming client meeting for The Asyst and you need to prepare. In the meantime, thank you very much for this conversation. See you in a couple of weeks to talk about this again with the Milan chapter of Singularity University. Thanks for being with us.