Machine Learning and Big Data Investing
Executive Summary
Machine learning enables computers to learn from data directly, without being explicitly programmed.
Many machine learning models can capture nonlinear relationships that traditional linear models miss. To extract valuable information hidden in a large dataset, you need modern tools for processing big data and machine learning together.
What can you do with machine learning and big data in investing?
• Asset allocation and optimization
• Sentiment analysis with natural language processing
• Outlier and fraud detection
• Financial forecasting and price prediction
All these applications use machine learning in novel ways to solve fundamental investment challenges. However,
successfully applying machine learning techniques requires data science skills, a specialized skill set that is in low supply
and high demand.
The August 2018 LinkedIn Workforce Report illustrated the data science skills gap: the shortage of
data scientists in the U.S. had grown to 151,717 people and had already spread beyond the finance and tech industries.¹
There is also an opportunity to deploy machine learning models in traditional IT systems or on the cloud, where
analytics run in real time for decision support and decision automation.
This white paper illustrates how you can apply machine learning and big data techniques to solve investment
problems and improve investment performance.
¹ https://news.linkedin.com/2018/8/linkedin-workforce-report-august-2018
W H I T E PA P E R | 2
Machine Learning and Big Data in Quantitative Investing
• Reinforcement learning: learning behaviors or actions. The aim of reinforcement learning is to build a model
that can perform a series of actions to maximize cumulative rewards. Instead of using a known set of inputs and
outputs, reinforcement learning optimizes actions relative to a reward function. Fundamentally, reinforcement
learning works by trial and error: the agent learns from the positive and negative rewards that its actions produce.
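To make the trial-and-error idea concrete, here is a minimal tabular Q-learning sketch in Python (chosen for illustration; it is not MATLAB code or a MathWorks API). The environment is a hypothetical five-state corridor: the agent starts at the left end, may step left or right, and earns a reward of +1 only on reaching the right end. The learning rate `alpha`, discount factor `gamma`, and exploration rate `epsilon` are illustrative assumptions, not recommended settings.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical toy corridor.

    States are 0..n_states-1, the agent starts at 0, action 0 moves left
    (bounded at 0), action 1 moves right, and reaching the last state ends
    the episode with reward +1. Every other step earns 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy: explore with probability epsilon (or on a tie),
            # otherwise exploit the action with the higher estimated value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # The terminal state has no future value to bootstrap from.
            future = 0.0 if next_state == goal else max(q[next_state])
            # Move the estimate toward reward + discounted best future value.
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned values."""
    return ["right" if row[1] > row[0] else "left" for row in q[:-1]]


q_table = q_learning()
print(greedy_policy(q_table))
```

After training, the greedy policy should choose `right` in every non-terminal state, because the discounted reward is largest on the shortest path to the goal. In an investment setting, the corridor could in principle be replaced by market states, the moves by trading decisions, and the +1 reward by a profit-and-loss based reward function.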
Learn More
»»Machine Learning with MATLAB - Ebook
Learn More
»»Mastering Machine Learning: A Step-by-Step Guide with MATLAB - Ebook
• Files: Many datasets consist of a large number of small and medium-sized files. The number of files can grow
quickly, and collectively the files often do not fit into the memory of a single computer. These files typically reside in one
or more directories on a shared drive and may contain delimited text, spreadsheets, images, videos, and various
proprietary formats.
• Databases: A wide range of database types is used to store and manage large datasets in finance, including
relational, graph, and document databases.
• Hadoop: Hadoop® is a system for storing and processing big datasets based on distributed computing and storage
principles. It comprises two major subsystems that coexist on a cluster of computer servers (Figure 2):
• Hadoop Distributed File System (HDFS): A large failure-resistant file system
• YARN: An application scheduling framework that manages applications that run on Hadoop, including batch
processing frameworks such as MapReduce and Spark™, and SQL interfaces such as Hive and Impala
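The MapReduce pattern that Hadoop runs across a cluster can be illustrated on a single machine. The sketch below is plain Python, not Hadoop, Spark, or MATLAB code: the map step streams each CSV file one row at a time and emits per-ticker partial sums, and the reduce step combines those partials into mean returns, so no file is ever loaded into memory in full. The column names `ticker` and `ret` are hypothetical.

```python
import csv
import glob
import os
from collections import defaultdict


def map_file(path):
    """Map step: stream one CSV file row by row and emit per-ticker
    (sum, count) partials. Assumes hypothetical columns 'ticker' and 'ret'."""
    partial = defaultdict(lambda: [0.0, 0])
    with open(path, newline="") as f:
        for row in csv.DictReader(f):  # reads lazily, one row at a time
            acc = partial[row["ticker"]]
            acc[0] += float(row["ret"])
            acc[1] += 1
    return partial


def reduce_partials(partials):
    """Reduce step: add up the (sum, count) pairs and finish the mean."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for ticker, (s, n) in partial.items():
            totals[ticker][0] += s
            totals[ticker][1] += n
    return {ticker: s / n for ticker, (s, n) in totals.items()}


def mean_return_by_ticker(directory):
    """Mean return per ticker over every *.csv file in a directory."""
    files = sorted(glob.glob(os.path.join(directory, "*.csv")))
    # A generator feeds the reducer, so files are mapped one at a time.
    return reduce_partials(map_file(p) for p in files)
```

Because the reduce step only adds (sum, count) pairs, each map call is independent of the others and could run on a different machine; that property is what lets frameworks such as MapReduce and Spark distribute this kind of aggregation over data stored in HDFS.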
What Can You Do with Machine Learning and Big Data in Investing?
Consider using machine learning and big data when you have a complex task or problem involving a large amount of
data and many variables, but no existing formula or equation. For example, machine learning and big data
techniques are a good option when:
• The data is unstructured (e.g., a combination of text, images, audio, or video).
• You need to respond quickly to large volumes of data or to high-velocity data, as in trade execution.
• The problem is too complex to model with expert knowledge, handwritten rules, or equations, as in news sentiment analysis.
• The nature of the data keeps changing and the program needs to adapt, as in asset allocation (Figure 3), automated
trading, energy demand forecasting, and price trend prediction.
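As an illustration of learning sentiment from labeled examples rather than from handwritten rules, the following Python sketch trains a multinomial naive Bayes classifier with add-one (Laplace) smoothing. The six training headlines and their labels are made-up toy data, and naive Bayes is only one simple model choice; a realistic news-sentiment model would need a much larger labeled corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, text, c):
        # log P(c) + sum of log P(token | c), smoothed over the vocabulary
        score = math.log(self.class_counts[c] / sum(self.class_counts.values()))
        denom = self.totals[c] + len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][tok] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self.log_posterior(text, c))


# Hypothetical toy headlines, for illustration only
train_texts = [
    "profits beat expectations and shares surge",
    "strong growth lifts outlook",
    "record revenue and upbeat guidance",
    "losses widen as shares plunge",
    "weak demand cuts outlook",
    "company misses expectations and warns on revenue",
]
train_labels = ["positive"] * 3 + ["negative"] * 3
model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("shares surge on strong profits"))
```

The model never sees a rule such as "surge is good"; it infers word-level evidence from the labeled examples, which is why this approach can keep up as language and data change where a fixed rule set cannot.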
»» Asset Allocation - Hierarchical Risk Parity (Code Example)
Figure 3. Asset allocation workflow using hierarchical risk parity (HRP) analysis in MATLAB.
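The HRP workflow in Figure 3 can be sketched in pure Python. This is a simplified illustration, not the MATLAB code example linked above: it converts correlations to the distance sqrt((1 - rho) / 2), orders assets with a naive single-linkage clustering (a simplification of the linkage step usually used in HRP), and then splits the ordered list in half recursively, allocating between the two halves in inverse proportion to the variance of each half's inverse-variance portfolio.

```python
import math


def correlation_distance(corr):
    """d_ij = sqrt((1 - rho_ij) / 2); clipped at 0 to absorb rounding."""
    n = len(corr)
    return [[math.sqrt(max(0.0, (1.0 - corr[i][j]) / 2.0)) for j in range(n)]
            for i in range(n)]


def single_linkage_order(dist):
    """Naive agglomerative single-linkage clustering. Returns the leaf
    order, which places similar assets next to each other."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0]


def cluster_variance(cov, idx):
    """Variance of the inverse-variance-weighted portfolio over assets idx."""
    inv = [1.0 / cov[i][i] for i in idx]
    w = [x / sum(inv) for x in inv]
    return sum(w[a] * w[b] * cov[i][j]
               for a, i in enumerate(idx) for b, j in enumerate(idx))


def hrp_weights(cov):
    """Hierarchical risk parity weights for a covariance matrix (list of lists)."""
    n = len(cov)
    std = [math.sqrt(cov[i][i]) for i in range(n)]
    corr = [[cov[i][j] / (std[i] * std[j]) for j in range(n)] for i in range(n)]
    order = single_linkage_order(correlation_distance(corr))
    weights = {i: 1.0 for i in order}
    stack = [order]
    while stack:  # recursive bisection, done iteratively
        items = stack.pop()
        if len(items) < 2:
            continue
        half = len(items) // 2
        left, right = items[:half], items[half:]
        v_left, v_right = cluster_variance(cov, left), cluster_variance(cov, right)
        alpha = 1.0 - v_left / (v_left + v_right)  # lower-risk half gets more
        for i in left:
            weights[i] *= alpha
        for i in right:
            weights[i] *= 1.0 - alpha
        stack.extend([left, right])
    return [weights[i] for i in range(n)]
```

With an uncorrelated (diagonal) covariance matrix, recursive bisection reduces to plain inverse-variance weights, which makes a convenient sanity check; with correlated assets, the clustering order changes how risk is split between groups of similar assets.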
“Our previous system was so tedious and our datasets are so large that I don’t think this
would have been possible without MATLAB and its ability to handle big data and interact
directly with Bloomberg and our database.”
— Ananthi Jegan, Olam CFSG
“Before we had MATLAB, we would not have been able to produce the clustering model
within a reasonable time. We simply would not have done it. MATLAB has opened up
new horizons for us.”
— Pierre-Yves Boillat, Banque Cantonale Vaudoise
“MATLAB, MATLAB Production Server, and MathWorks Training Services enabled people
on our risk team with conditional programming experience in C++ or Java® to efficiently
develop a core library for financial analysis and then deploy it as a web application,
making it available to production systems in our enterprise environment.”
— Marcus Veltum, Helaba Invest
Figure 4. The Classification Learner app, which lets you interactively train, validate, and tune classification models.
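Validating a classifier usually means measuring its accuracy on data it did not see during training. The Python sketch below shows k-fold cross-validation by hand, using a nearest-centroid classifier as a deliberately simple stand-in model; it does not reproduce the Classification Learner app or any MATLAB function, and the two-feature "up"/"down" data are synthetic.

```python
import math
import random


def nearest_centroid_fit(X, y):
    """Compute the mean feature vector (centroid) of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))


def k_fold_accuracy(X, y, k=5, seed=0):
    """Average held-out accuracy over k shuffled folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k


# Synthetic, linearly separable toy data with two hypothetical features
X = [(1 + 0.1 * i, 1 - 0.05 * i) for i in range(10)] + \
    [(-1 - 0.1 * i, -1 + 0.05 * i) for i in range(10)]
y = ["up"] * 10 + ["down"] * 10
accuracy = k_fold_accuracy(X, y, k=5)
print(accuracy)
```

Tuning wraps the same loop: compute the cross-validated accuracy for each candidate model or hyperparameter setting and keep the best one.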
Conclusion
Machine learning and big data enable investment managers to make informed decisions based on data-driven
insights and predictions not previously available through traditional approaches. New applications and investment
insights can be delivered to end customers faster than before.
MATLAB provides an interactive environment, prebuilt functions, and libraries that enable quants to become data
scientists and develop custom machine learning models. Through flexible deployment options, you can quickly integrate
production-ready models into existing IT infrastructure, saving time and eliminating error-prone translations
to other programming environments.
Learn More
Machine Learning for Algorithmic Trading (32:55) - Video
Machine Learning Made Easy (34:34) - Video
Forecasting Bitcoin Volatility Using the Regression Learner App (5:56) - Video
MATLAB for Quantitative Finance and Risk Management - Free Product Trial
© 2019 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See mathworks.com/trademarks for a list of additional trademarks.
Other product or brand names may be trademarks or registered trademarks of their respective holders.