AI for Financial Modelling: Not Yet Perfect
Editor's note: This article is Part 1 of a three-part series on using AI with financial modelling. Part 2 will look at using AI in the scope, plan, and design stages before building a model; and Part 3 will look at the test and implement stages after the model has been built.
The AI Revolution: Hype vs. Reality in Finance
The world refuses to stand still. Artificial Intelligence (AI) is all around us, and we are being told we must embrace it or else become a casualty of history, documented and detailed presumably by ChatGPT, DeepSeek, or one of their siblings.
For those of us working in accounting and finance, we are constantly being asked by management how we might become more productive, more efficient, more effective. Not a day goes by without someone emailing me asking how to incorporate AI into the world of spreadsheets and financial modelling. A common request is how to automate the construction of financial models (so popular a question that I am writing a book on the subject).
Well, spoiler alert: AI is not quite there — yet.
Understanding AI's Current Capabilities in Financial Modelling
Common AI tools such as ChatGPT, developed by OpenAI, and Copilot, which uses ChatGPT technology, are essentially large language models (LLMs) trained to understand and generate human-like text. The model attempts to answer the prompt and then adjusts its algorithm based upon the accuracy of that answer. This is a purely computational, probabilistic response based upon a set of algorithms: The model has no understanding of the actual meanings behind the words. LLMs thrive on structure, and most financial models are not structured in the same way languages are. They are more complex, which causes LLMs to produce incomplete and/or incorrect results for now.
AI in Action: A Spreadsheet Test Case
To demonstrate, consider the following very “plain” example:
This spreadsheet example was stripped bare of any special formatting like frozen panes or stylised headings in an attempt to keep things as simple as possible. It is basically a spreadsheet containing date headings and inputs for sales, costs of goods sold (COGS), operating expenditure, and working capital timing.
The Pitfalls: AI Hallucinations and Inaccuracies
Despite these inputs being easily identifiable to a human, we found that even the very latest AI (at the time of writing) produces erroneous and incomplete results. Here is what ChatGPT cited when asked about line items (Copilot’s results were similar):
It’s not quite right, is it? I am not sure I built a historical model starting with results beginning a few months before the end of World War I — and then only for that first year, in contradiction to “… values corresponding to different time periods”.
It is well documented that AI can “hallucinate”, ie, detail facts that do not hold up to scrutiny. For example, a recent vanity search on AI led me to believe I am about to be “awarded a knighthood from Queen”. Perhaps this critique counts as my Bohemian Rhapsody.
The above is not a hallucination: It is AI failing to check its own statements. Having a start date set in 1918 with only one period, even though several periods are recognised, clearly shows that AI does not sense-check. In fact, when challenged, AI often doubles down and maintains its stance. For example, after several prompts, ChatGPT maintained the following were the only assumptions listed on the worksheet:
The AI neural network is confused by simple, linear inputs. So what happens if we start asking for modelling calculations and outputs? Conscious that much detail is missing from this high-level overview, I asked both ChatGPT and Copilot to create an income statement (P&L) from the above inputs (which also included other inputs not shown in the Excel screenshot):
The aim was to see how AI coped with dynamic depreciation calculations as the economic life of the assets was deliberately set to be less than the number of periods for the case study. This was to check whether AI might over-depreciate assets.
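The sense-check being tested here is easy to state in code. A minimal sketch (with illustrative figures, not the article's actual case study inputs): straight-line depreciation should stop once an asset is fully written down, even when the model runs for more periods than the asset's economic life.

```python
def straight_line_depreciation(cost, economic_life, num_periods):
    """Depreciation charge per period, capped so the asset is never
    written down below zero (the sense-check an AI can miss when the
    economic life is shorter than the modelled horizon)."""
    annual_charge = cost / economic_life
    schedule = []
    remaining = cost
    for _ in range(num_periods):
        charge = min(annual_charge, remaining)
        schedule.append(charge)
        remaining -= charge
    return schedule

# An asset costing 300 with a 3-period life in a 5-period model:
# the charge should be [100, 100, 100, 0, 0], totalling exactly 300.
print(straight_line_depreciation(300, 3, 5))
```

A model that simply copies `cost / economic_life` across every column over-depreciates in periods 4 and 5, which is precisely the error this test was designed to expose.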
Back in 2023, several examples on the internet showed how poorly AI ventured on these sorts of tasks, struggling to compute even the most basic arithmetic. However, AI is learning quickly, and calculation engines have become freely available. For example, ChatGPT produced the following:
Of course, formatting of numbers, widening of columns, etc. could all have been performed, but arguably this isn't what AI does. Interestingly, ChatGPT produced the same results I did.
However, Copilot wasn’t quite as accurate. It calculated the figures correctly until Depreciation, which then created knock-on effects in the EBIT and Net Income totals together with the dependent Taxes calculation.
Compared to 18 months ago, when several articles were published demonstrating the shortfalls in AI computations, this issue may seem less material. It is true that AI has improved immensely in the meantime, but with increased scale or different inputs, these errors may still multiply into something significant that cannot be ignored. The discrepancies in AI models also indicate a deeper issue.
Challenges in Verifying AI-Generated Financial Models
How can you rely on an AI-driven financial modelling calculation or output? You will need to review even the simplest calculations. Even though only one calculation appeared to be wrong, all results need to be checked, as there is little logic to where an error might occur. This process has to be undertaken manually, which suggests that, in the future, model auditing may become more prevalent in the finance arena.
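Since there is little logic to where an error might occur, one pragmatic check is to recompute every line independently from the same inputs and compare the two sets of figures. A minimal sketch (hypothetical figures and line items, not the article's case study), mimicking the Copilot situation where one wrong depreciation figure cascades into EBIT, Taxes, and Net Income:

```python
def recompute_pnl(sales, cogs, opex, depreciation, tax_rate):
    """Independently rebuild a simple P&L from first principles."""
    gross_profit = sales - cogs
    ebitda = gross_profit - opex
    ebit = ebitda - depreciation
    tax = max(ebit, 0) * tax_rate  # no tax credit assumed on losses
    net_income = ebit - tax
    return {"Gross profit": gross_profit, "EBITDA": ebitda,
            "EBIT": ebit, "Tax": tax, "Net income": net_income}

def diff_against_ai(ai_figures, recomputed, tolerance=0.01):
    """List every line item where the AI's figure departs from the rebuild."""
    return [name for name, value in recomputed.items()
            if abs(ai_figures.get(name, float("nan")) - value) > tolerance]

# Hypothetical AI output in which a depreciation error knocks on
# from EBIT downwards:
ai = {"Gross profit": 600.0, "EBITDA": 400.0, "EBIT": 350.0,
      "Tax": 105.0, "Net income": 245.0}
ours = recompute_pnl(sales=1000.0, cogs=400.0, opex=200.0,
                     depreciation=100.0, tax_rate=0.3)
print(diff_against_ai(ai, ours))  # the lines that need manual review
```

The catch, of course, is that building the independent rebuild is itself most of the work of building the model, which is exactly the problem described below.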
Checking models will not be that straightforward either. Asking AI to produce outputs rather than write specific formulae tends to generate outputs with hard-coded results, eg:
In order to automate a model and then rely on it, it seems you will first have to rebuild it in order to check it. That seems to defeat the entire purpose of the exercise.
Given that one of the primary reasons we model in Excel is to undertake what-if? analysis, generating hard-coded results is not helpful. Asking AI to produce the formulae instead often led to incorrect calculations, with results that differed from those originally displayed. This leads to end users trusting neither the results nor the model.
Prompt Engineering vs. First-Time Accuracy
AI needs to make a good first impression. The problem is LLMs are designed to iterate to the correct answer with a tighter and tighter regimen of questioning (what is known as “prompt engineering”):
Iterating towards solutions will require a change of managerial mindset in organisations where the focus has been on total quality management and "getting it right the first time, every time".
ChatGPT and Copilot both suffered from recurring issues that required intervention from an experienced modeller to identify and rectify. These included illogical steps (one period being treated as many), inconsistencies in calculations, and, interestingly, forcing unnecessary consistencies in calculations, too.
To clarify the final point, consider the following. Some calculations are required in only one period (eg, terminal value in a discounted cash flow or the final repayment of a loan). Copilot in particular does not presently cope well with such "one-off" calculations and prefers to include them in every period. Of course, these issues can be rectified — but would it not have been simpler and faster to have just built the model manually?
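To illustrate the "one-off" point, a minimal sketch (with hypothetical inputs) of a discounted cash flow in which the terminal value is added only in the final period, rather than smeared across every period as some AI tools prefer:

```python
def dcf_present_value(flows, terminal_value, discount_rate):
    """Present value of periodic cash flows, with the terminal value
    added only in the final period -- the 'one-off' calculation that
    AI tools tend to force into every column."""
    pv = 0.0
    last = len(flows) - 1
    for t, cf in enumerate(flows):
        if t == last:          # one-off: final period only
            cf += terminal_value
        pv += cf / (1 + discount_rate) ** (t + 1)
    return pv

# Two periods of 100, a terminal value of 1,000, discounted at 10%:
print(round(dcf_present_value([100.0, 100.0], 1000.0, 0.1), 2))
```

Repeating the terminal value in every period, as a forced "consistency", would overstate the valuation many times over, which is why such one-off logic needs an experienced modeller's eye.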
Limitations of Current AI Tools
Furthermore, you should note that AI threads have limits to the amount of pre-data (past prompts and answers to those prompts) that may be considered at any given point. ChatGPT will ask you to create a new thread if a conversation becomes too overloaded, which may result in lost progress. Copilot also has a limited number of responses:
The Human Element Remains Crucial (For Now)
LLMs are not experts in tax, accounting, or valuations. They are merely trained to understand and generate human-like text based upon source data provided. Skynet has not yet been built. Whilst it may be best not to antagonise our would-be future AI overlords, accountants will still be required to build, modify, and extend their financial planning and analytical models. For the time being.
A Word to the Wise: Navigating AI in Financial Modelling
This article may seem a little negative, but it is only the first in a three-part series, intended to address the most common questions surrounding automating the development of a financial model or spreadsheet. AI can help, but you need to understand the current limitations, the risks, and the checking/auditing procedures you should implement in such instances.
Hopefully, this will help you explain to colleagues what is — and what is not — possible presently. However, AI is ever-improving and in the near future may produce much more positive results.
This is not to say AI should not be considered when working with financial models. In the second and third parts of this series, I will explain how AI can be a useful preparation tool for building models and interpreting/analysing those models that have already been built.
— Liam Bastick, FCMA, CGMA, FCA, is director of SumProduct, a global consultancy specialising in Excel training. He is also an Excel MVP (as appointed by Microsoft) and author of Introduction to Financial Modelling and Continuing Financial Modelling. Send ideas for future Excel-related articles to him at liam.bastick@sumproduct.com. To comment on this article or to suggest an idea for another article, contact Oliver Rowe at Oliver.Rowe@aicpa-cima.com.
Further Learning and Resources
LEARNING RESOURCE
AI-Powered Excel: Leveraging AI and ChatGPT for Supercharged Productivity
This webcast will have you streamlining your Excel work, research, and documentation, saving you time and effort in your day-to-day tasks.
WEBCAST
MEMBER RESOURCES
Articles
- “Using the AI in Power BI to Do Root Cause Analyses”, FM magazine, 27 March 2025
- “Excel Modelling: How to Implement 3 Types of Checks”, FM magazine, 25 March 2025