Unlocking the Power of Language-Based Data for Financial Modelling with LLMs

November 1, 2024

As mathematicians studying financial markets, we are often struck by how limited the inputs to most financial analyses are. Price, volume, and technical indicators are useful, but they offer a constrained view. Vast seas of unstructured data in news, research, and social chatter remain largely untapped. This is starting to change as large language models (LLMs) such as ChatGPT begin to unlock the richness of language for modelling.

Think of current market data as a doctor working with only a few vital signs. What’s lacking is the full clinical record: doctor’s notes, test results, and medical history. LLMs make it possible to incorporate more of that broad information for a more accurate diagnosis. A similar dynamic is at play with financial analytics.

Before LLMs, sentiment analysis largely relied on superficial counts of positive and negative keywords. But language is messy, nuanced, and highly context-dependent. LLMs enable better comprehension of the semantics within financial text data. They can read an analyst report or tweet and represent its meaning mathematically, typically as a vector of numbers. This greatly expands the data dimensions available for analysis.
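To make the limitation of keyword counting concrete, here is a minimal sketch of that older approach. The word lists and headlines are hypothetical and far smaller than any real sentiment lexicon; the point is only to show how a count that ignores context can misread a sentence.

```python
import re

# Hypothetical, tiny word lists for illustration only.
# Real sentiment lexicons contain thousands of entries.
POSITIVE = {"beat", "growth", "strong", "upgrade", "profit"}
NEGATIVE = {"miss", "loss", "weak", "downgrade", "decline"}


def keyword_sentiment(text: str) -> int:
    """Return (# positive words) - (# negative words), ignoring all context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg


# Two made-up headlines: the first is bullish, the second is bearish.
headline_a = "Earnings beat expectations on strong growth"
headline_b = "Analysts see no growth and doubt the company can beat guidance"

print(keyword_sentiment(headline_a))  # 3
print(keyword_sentiment(headline_b))  # 2, even though the headline is negative
```

The second headline scores as positive because "growth" and "beat" are counted without regard to "no" and "doubt". This is the kind of context a model that reads the whole sentence has a chance of capturing.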

Imagine forecasting the weather with only temperature and wind data. Accurate models are very hard to build from so few variables! Humidity, precipitation, and pressure are essential too. For markets, price histories provide a similarly truncated view. Incorporating language data expands our view, like a multidimensional hologram revealing the underlying psychology and intent behind price moves.

It is now possible to combine numeric market data with NLP-derived linguistic data for a more complete picture. This supports backtesting hypotheses about how events, narratives, and sentiment flow through markets as participants react. The tired notion that AI lacks human context is misguided. With proper prompt engineering, LLMs can handle semantic nuance remarkably well.
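As a rough sketch of what combining the two kinds of data can look like, the example below pairs a daily sentiment score with the following day's price return and measures their correlation. All numbers are made up for illustration, and the sentiment scores simply stand in for whatever an LLM-based pipeline might produce; the function names `build_rows` and `pearson` are our own, not part of any library.

```python
from math import sqrt

# Toy, made-up numbers purely for illustration (not real market data).
closes = {
    "2024-01-02": 100.0,
    "2024-01-03": 101.0,
    "2024-01-04": 100.0,
    "2024-01-05": 102.0,
    "2024-01-08": 101.0,
}
# Hypothetical sentiment scores in [-1, 1], assumed to come from a text model.
sentiment = {
    "2024-01-02": 0.4,
    "2024-01-03": -0.3,
    "2024-01-04": 0.6,
    "2024-01-05": -0.2,
    "2024-01-08": 0.1,
}


def build_rows(closes, sentiment):
    """Pair each day's sentiment with the NEXT trading day's return.

    Using the next day's return keeps the test free of look-ahead bias:
    the signal is known before the move it is compared against.
    """
    dates = sorted(closes)  # ISO date strings sort chronologically
    rows = []
    for prev, nxt in zip(dates, dates[1:]):
        if prev not in sentiment:
            continue  # skip days with no text signal
        next_return = closes[nxt] / closes[prev] - 1.0
        rows.append((prev, sentiment[prev], next_return))
    return rows


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


rows = build_rows(closes, sentiment)
corr = pearson([r[1] for r in rows], [r[2] for r in rows])
print(f"{len(rows)} aligned observations, correlation = {corr:.3f}")
```

The toy data was constructed so that sentiment and next-day returns move together, so the positive correlation here demonstrates the mechanics only. A real backtest would need far more observations, out-of-sample checks, and transaction costs before any conclusion could be drawn.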

Of course, substantial data infrastructure is required to process these features at scale. But the promise is immense. Early adopters have an advantage because retooling systems for new data takes time. Those laying the pipes today learn about the challenges and pitfalls first and may build that ever-popular moat.

Finance’s future undoubtedly lies in combining human expertise with applied AI to extract full insight from all available data. LLMs don't replace humans but augment their abilities. This symbiosis unlocks language itself as a rich vein to mine alongside existing market signals. The firms that engage most thoughtfully with these tools and their implications will win. The game is on!

Ayano is a virtual writer we are developing specifically to publish educational and introductory content covering AI, LLMs, financial analysis, and other related topics. Humans are currently responsible for ideation, prompt engineering, fact-checking, copy editing, and overall guidance and training, including finalizing translations, while LLMs cover initial research, analysis, copywriting, and drafting translations into multiple languages.