Big Data and How BI Got Its Groove Back
Greg McDowell, [email protected], (415) 835-3934
Patrick Walravens, [email protected], (415) 835-8943
Peter Lowry, [email protected], (415) 869-4418
FOR DISCLOSURE AND FOOTNOTE INFORMATION, REFER TO THE JMP FACTS AND DISCLOSURES SECTION
TABLE OF CONTENTS
Executive Summary
Part I: An Introduction to Big Data
  A Big Data Primer
  Big Data Market Opportunity
  The Data Management Landscape
  The Resurgence of Business Intelligence
  Big Data Stock Performance
Part II: Initiation Summaries
Part III: Privately Held Companies in Big Data Space (hard copy only)
JMP Securities Software Team
JMP Facts and Disclosures
EXECUTIVE SUMMARY
We believe the proliferation of data is one of the most disruptive forces in technology today. Although it accounts for only a small portion of industry revenues, we believe "Big Data" is poised for rapid growth. The purpose of this report is to help investors better understand Big Data and the market opportunity ahead.

Part I: An Introduction to Big Data and the Resurgence of Business Intelligence

In Part I of this report, we define and size the market opportunities created by Big Data. We define Big Data as data sets of extreme volume and extreme variety. In 2011, we estimate that Big Data is a $9.1 billion market opportunity, representing only 2% of the $407 billion spent on software, storage, and servers, which we refer to collectively as enterprise IT spending. Ten years ago, spending on Big Data was minimal because data sets were much smaller, data had less variety, and the velocity of data flowing into organizations was much slower. Over the next ten years, we expect Big Data-related computing to increase to $86.4 billion, representing 11% of all enterprise IT spending and a 10-year CAGR of 25%. The key growth driver of Big Data is the proliferation of data, which has forced enterprises to adopt new tools and processes to collect data (both structured and unstructured) and to store, manage, manipulate, analyze, aggregate, combine, and integrate it.

In Part I we also discuss the resurgence of the Business Intelligence ("BI") market. We believe the business intelligence landscape is about to go through a major sea change that will radically transform the way the industry thinks about analytics. In our view, the two primary drivers of this sea change are Big Data and the consumerization of enterprise BI, driven by trends such as mobile BI. With respect to Big Data, it has become very easy to collect data but difficult to make sense of that data using traditional BI tools. In other words, as the useful life of information has decreased, so has the utility of traditional BI tools, which have historically been very backward-looking.

Part II: Key Publicly Traded Companies in the Big Data Space

In Part II of this report, we initiate coverage of the infrastructure software group with a relatively constructive viewpoint. In the current volatile environment for stocks, we believe long-term investors should focus on the positive implications of emerging secular trends such as Big Data that could create significant profit opportunities over the next few years. We recommend software companies with solid but flexible operating strategies that, in our opinion, will be primary beneficiaries of the Big Data trend. We are initiating coverage on six infrastructure software companies as follows:

MicroStrategy Inc. (MSTR) with a Market Outperform rating and a $140 price target.
Progress Software Corp. (PRGS) with a Market Perform rating.
Qlik Technologies ("QlikTech") (QLIK) with a Market Outperform rating and a $35 price target.
Quest Software (QSFT) with a Market Perform rating.
Teradata Corporation (TDC) with a Market Outperform rating and a $63 price target.
TIBCO Software Inc. (TIBX) with a Market Outperform rating and a $33 price target.

We also discuss the Big Data strategies of eight other publicly traded companies.
Part III: Privately Held Companies in the Big Data Space (Available in Hard Copy Only)

In Part III of this report, we provide profiles of 100 leading private software companies positioned to benefit from the Big Data trend. Many of these companies approach the Big Data market from different angles, including the NoSQL movement, in-memory databases, columnar databases, Hadoop-related technologies, data grid/data cache solutions, solutions related to open source R, data visualization, predictive analytics, and real-time dashboards. Our favorite private companies include Cloudera, Splunk, Tableau Software, and Talend.
IDC "Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis." Forrester "Big data: techniques and technologies that make handling data at extreme scale economical." 451 Group "Big data is a term applied to data sets that are large, complex or dynamic (or a combination thereof) and for which there is a requirement to capture, manage and process the data set in its entirety, such that it is not possible to process the data using traditional software tools and analytic techniques within tolerable time frames." McKinsey Global Institute "Big data" refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. This definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data (i.e., we don't define big data in terms of being larger than a certain number of terabytes (thousands of gigabytes). We assume that, as technology advances over time, the size of datasets that qualify as big data will also increase." Gartner "When business leaders or data management professionals talk about big data, they often emphasize volume, with minimal consideration of velocity, variety and complexity the other aspects of quantification: Velocity involves streams of data, structured record creation, and availability for access and delivery. Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. Variety includes tabular data (databases), hierarchical data, documents, e-mail, metering data, video, image, audio, stock ticker data, financial transactions and more. Complexity means that different standards, domain rules and even storage formats can exist with each asset type. An information management system for media cannot have only one video solution."
Source: IDC, Forrester, 451 Group, McKinsey and Gartner
We think the third-party analyst firms have done a commendable job in their attempts to define Big Data. We note, however, that both vendors and industry analysts have latched onto the concept of the three V's: Volume, Velocity, and Variety. Some firms are also adding V's such as Variability and Value.
Many of these firms have also provided useful illustrations of Big Data, as shown in Figure 2 below. FIGURE 2: Forrester's Four V's of Extreme Scale
Source: http://blogs.forrester.com/brian_hopkins/11-08-29-big_data_brewer_and_a_couple_of_webinars
Gartner takes a similar approach with the "V's" but also adds Complexity, as shown in Figure 3 below. FIGURE 3: Gartner's Big Data Graph
Source: Gartner
Just how popular is the term Big Data becoming? A quick look at Google Trends Search Volume Index reveals the popularity of the term, as shown in Figure 4 below: FIGURE 4: Google Trends of Term "Big Data"
We compared the term "Big Data" to "Cloud Computing" and, interestingly, the trajectory of "Big Data" today closely resembles that of "Cloud Computing" in 2008. Investors who bought publicly-traded companies leveraged to the cloud-computing trend in 2008 have done well, as evidenced by the price performance of stocks such as RightNow Technologies (RNOW, MP, $37 PT, Walravens), salesforce.com (CRM, MO, $170 PT), and VMware (VMW, MO, $123 PT). A look at the job trend graph on Indeed.com illustrates a similar trend, as shown in Figure 5 below: FIGURE 5: Job Trends from Indeed.com for Term "Big Data"
Source: Indeed.com
One reason the concept of Big Data even exists is that the world's technological installed capacity to store information increased by a factor of 113 over the 20-year period from 1986 to 2007, as shown in Figure 7 below. In an excellent article published in Science magazine, Martin Hilbert and Priscilla Lopez estimated that the total amount of information grew from 2.6 optimally-compressed exabytes in 1986 to 295 optimally-compressed exabytes in 2007. The authors note that "piling up the imagined 404 billion CD-ROM from 2007 would create a stack from the earth to the moon and a quarter of this distance beyond (with 1.2 mm thickness per CD)." In a short span of 20 years, we have moved from an almost 100% analog world (books, newsprint, x-rays, etc.) in 1986 to a primarily digital world in 2007. FIGURE 7: World's Technological Installed Capacity to Store Information
Source: Martin Hilbert and Priscilla Lopez, "The World's Technological Capacity to Store, Communicate, and Compute Information," Science, Vol. 332, No. 6025, pp. 60-65 (published online 10 February 2011; print 1 April 2011), DOI: 10.1126/science.1200970
We like Figure 8 below because it shows, in MIPS (millions of instructions per second), the world's technological installed capacity to compute information on general-purpose computers. As shown, we have gone from a world in 1986 where 41% of installed capacity resided in pocket calculators to 2007, when pocket calculators accounted for less than 1%. FIGURE 8: World's Technological Installed Capacity to Compute Information on General-purpose Computers, in MIPS
Source: Martin Hilbert and Priscilla Lopez, "The World's Technological Capacity to Store, Communicate, and Compute Information," Science, Vol. 332, No. 6025, pp. 60-65 (published online 10 February 2011; print 1 April 2011), DOI: 10.1126/science.1200970
Just how fast is this "digital universe" expected to grow? According to IDC, as shown in Figure 9 below, in 2009 the digital universe held nearly 800,000 petabytes (a petabyte is a million gigabytes). In 2011, the amount of information created and replicated will surpass 1.8 zettabytes (1.8 trillion gigabytes), growth by a factor of nine in just five years. By 2020, IDC expects the total to reach 35 zettabytes, a 44-fold increase over 2009 that implies a CAGR of roughly 40%, as the sketch below confirms.
Source: IDC
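These growth factors are easy to sanity-check. The short sketch below is our own arithmetic, using only the figures cited above, and verifies both the 113x storage expansion and IDC's roughly 40% CAGR.

    # Sanity-checking the growth figures cited above (our arithmetic, not IDC's model).
    storage_1986, storage_2007 = 2.6, 295.0   # optimally-compressed exabytes (Hilbert & Lopez)
    print(storage_2007 / storage_1986)        # ~113x, the storage growth factor cited earlier

    digital_2009, digital_2020 = 0.8, 35.0    # zettabytes (IDC); 800,000 PB = 0.8 ZB
    years = 2020 - 2009
    cagr = (digital_2020 / digital_2009) ** (1 / years) - 1
    print(round(cagr, 3))                     # ~0.41, in line with IDC's ~40% CAGR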
The explosion of data is causing new firms and technologies to emerge. Our favorite private company example is Splunk, which describes itself as the engine for machine data. Its software collects, indexes, and harnesses any machine data generated by an organization's IT systems and infrastructure, whether physical, virtual, or in the cloud. According to Splunk, machine data is unstructured, massive in scale, and contains a categorical record of all transactions, systems, applications, user activities, security threats, and fraudulent activity. Splunk supports a variety of use cases, including application management, security and compliance, infrastructure and IT operations management, and business and web analytics. Almost half of the Fortune 100 and over 2,900 licensed customers in 70 countries use Splunk. Interestingly, beginning with Version 4, Splunk uses MapReduce to retrieve and analyze massive datasets.
Based on our estimates and IDC estimates, we project that the total enterprise IT market will grow around 5% annually over the next 10 years, reaching $676 billion by 2021. FIGURE 11: Total Enterprise IT Spending: 2011-2021
Because Big Data is becoming a larger share of enterprise IT spending, it is growing much faster than the overall enterprise IT market. As shown in Figure 12 below, we expect Big Data to grow from $9.1 billion in 2011 to $86.4 billion in 2021, a compound annual growth rate of 25%. FIGURE 12: Big Data Estimates: 2011-2021
We arrive at these estimates by making certain assumptions about the different components of the Big Data market. In the next section, we break down those components.
We believe the Big Data market is comprised primarily of three sub-segments: Business Analytics, Storage, and Servers. In this section we define the total size of these markets and discuss Big Data's penetration of each. Figure 13 below highlights the total size of these markets, based on IDC and JMP estimates. As shown, the total market is around $131.4 billion in 2011, growing to $238.4 billion in 2021. FIGURE 13: Total Market Size of Business Analytics, Storage, and Servers
(in $ billions; year-over-year growth in parentheses)

Year   Business Analytics   Storage       Servers       Total Market Size
2011   32.0                 45.2          54.1          131.4
2012   35.3 (+10%)          47.5 (+5%)    54.6 (+1%)    137.4 (+5%)
2013   39.0 (+10%)          49.9 (+5%)    55.0 (+1%)    143.9 (+5%)
2014   43.0 (+10%)          50.8 (+2%)    55.0 (0%)     148.8 (+3%)
2015   47.2 (+10%)          53.3 (+5%)    55.7 (+1%)    156.3 (+5%)
2016   52.2 (+10%)          56.0 (+5%)    58.5 (+5%)    166.7 (+7%)
2017   57.8 (+11%)          58.8 (+5%)    61.4 (+5%)    178.1 (+7%)
2018   64.4 (+11%)          61.7 (+5%)    64.5 (+5%)    190.7 (+7%)
2019   72.2 (+12%)          64.8 (+5%)    67.7 (+5%)    204.7 (+7%)
2020   81.3 (+13%)          68.1 (+5%)    71.1 (+5%)    220.5 (+8%)
2021   92.2 (+13%)          71.5 (+5%)    74.7 (+5%)    238.4 (+8%)
The segment that requires the most explanation, in our opinion, is the Business Analytics market. IDC defines the Business Analytics market as the "combination of the data warehouse (DW) platform software with performance management and analytic applications and business intelligence (BI) and analytic tools." Figure 14 below provides a taxonomy of the Business Analytics market. As shown, there are three overall categories: BI and analytic tools, data warehousing platform software, and analytic applications. IDC expects these three markets to grow at 2010-2015 CAGRs of 9.2%, 9.8%, and 7.9%, respectively, with the total Business Analytics market representing a CAGR of 8.9%. The Business Analytics market is expected to grow from $30.7 billion in 2011 to $43.1 billion in 2015. FIGURE 14: IDC's Business Analytics Taxonomy, 2011
Source: IDC
The key question we had to ask ourselves in trying to size the Big Data market was "What could Big Data's penetration be within each of the three main sub-segments of the market: Business Analytics, Storage, and Servers?" In other words, what percentage of the total market is comprised of projects that can fall under the Big Data definition? As a baseline, we have assumed that around 7% of the size of the Business Analytics, Storage and Servers market in 2011 meets the definition of Big Data. We assume that by 2021, 36% of the Business Analytics, Storage, and Servers market will meet the definition of Big Data. This leads to the breakdown of the $9.1 billion estimate in 2011 and the $86.4 billion estimate in 2021, as shown in Figure 15 below.
FIGURE 15: Big Data Spending Within Business Analytics, Storage, and Servers (in billions)
(year-over-year growth in parentheses)

Year   Business Analytics   Storage        Servers        Big Data Total
2011   $2.4                 $3.1           $3.6           $9.1
2012   $3.2 (+33%)          $3.9 (+28%)    $4.5 (+24%)    $11.6 (+28%)
2013   $4.3 (+33%)          $5.1 (+28%)    $5.6 (+24%)    $14.9 (+28%)
2014   $5.6 (+32%)          $6.3 (+24%)    $6.8 (+22%)    $18.7 (+26%)
2015   $7.4 (+31%)          $8.0 (+28%)    $8.4 (+24%)    $23.9 (+27%)
2016   $9.7 (+32%)          $10.3 (+28%)   $10.8 (+28%)   $30.8 (+29%)
2017   $12.8 (+32%)         $13.2 (+28%)   $13.8 (+28%)   $39.8 (+29%)
2018   $16.8 (+32%)         $15.2 (+16%)   $16.0 (+16%)   $48.0 (+21%)
2019   $22.1 (+32%)         $17.6 (+16%)   $18.5 (+16%)   $58.2 (+21%)
2020   $29.1 (+32%)         $20.3 (+16%)   $21.3 (+16%)   $70.7 (+22%)
2021   $38.3 (+32%)         $23.5 (+16%)   $24.6 (+16%)   $86.4 (+22%)
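The sizing logic above reduces to a few lines of arithmetic. The sketch below is ours, using the totals from Figure 13 and the stated penetration assumptions; small gaps versus Figure 15 reflect segment-level rounding.

    # Reproducing the headline Big Data estimates from the stated assumptions.
    total_2011, total_2021 = 131.4, 238.4   # $B: Business Analytics + Storage + Servers (Figure 13)
    pen_2011, pen_2021 = 0.07, 0.36         # assumed Big Data penetration rates
    print(total_2011 * pen_2011)            # ~9.2, consistent with the ~$9.1B 2011 estimate
    print(total_2021 * pen_2021)            # ~85.8, consistent with the ~$86.4B 2021 estimate
    print((86.4 / 9.1) ** (1 / 10) - 1)     # ~0.25, the stated 10-year CAGR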
Source: http://blogs.the451group.com/information_management/2011/04/15/nosql-newsql-and-beyond/
One technology in the above graph that we would like to highlight is Hadoop. Hadoop and Big Data are often mentioned in the same breath. While we contend that Big Data is much more than just Hadoop, it is useful to understand what Hadoop is in order to better appreciate the Big Data movement.
The Apache Hadoop website describes Hadoop as follows: "The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures." Like the term "Big Data," Hadoop is increasingly popular, as shown in Figure 17 below. FIGURE 17: Google Trends of Term "Hadoop"
A look at the job trend graph on Indeed.com illustrates a similar trend as shown in Figure 18 below: FIGURE 18: Job Trends from Indeed.com for Term "Hadoop"
FIGURE 19: Hadoop Subprojects and Related Projects at Apache

The project includes these subprojects:
- Hadoop Common: The common utilities that support the other Hadoop subprojects.
- Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.
- Hadoop MapReduce: A software framework for distributed processing of large data sets on compute clusters.

Other Hadoop-related projects at Apache include:
- Avro: A data serialization system.
- Cassandra: A scalable multi-master database with no single points of failure.
- Chukwa: A data collection system for managing large distributed systems.
- HBase: A scalable, distributed database that supports structured data storage for large tables.
- Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
- Mahout: A scalable machine learning and data mining library.
- Pig: A high-level data-flow language and execution framework for parallel computation.
- ZooKeeper: A high-performance coordination service for distributed applications.
Source: http://hadoop.apache.org/
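To make the MapReduce model concrete, here is a minimal word-count sketch in Python. This is our illustration, not Apache code: Hadoop Streaming runs a mapper over input splits, sorts the emitted pairs by key, and feeds them to a reducer; the local sort below simulates that shuffle step.

    import sys
    from itertools import groupby

    def mapper(lines):
        # map step: emit a (word, 1) pair for every word in the input
        for line in lines:
            for word in line.split():
                yield word, 1

    def reducer(pairs):
        # reduce step: pairs arrive grouped by key; sum the counts per word
        for word, group in groupby(pairs, key=lambda kv: kv[0]):
            yield word, sum(count for _, count in group)

    if __name__ == "__main__":
        shuffled = sorted(mapper(sys.stdin))  # simulates Hadoop's shuffle/sort phase
        for word, total in reducer(shuffled):
            print(word, total)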
A number of private companies are producing commercial offerings around the Hadoop community. Private companies such as Appistry, Cloudera, DataStax, Hortonworks, and MapR Technologies are all worth watching and are highlighted in the private company section of this report. In the next section, we provide real-world case studies of how enterprises are using Hadoop and Hadoop-related technologies.
Wal-Mart (WMT, Not Covered)
Hadoop is part of Wal-Mart's strategy to analyze large amounts of data to better compete against online retailers, including Amazon.com. With the increasing role that social networking sites such as Facebook and Twitter play in online shopping, Wal-Mart is also looking to glean insights into what consumers want. Wal-Mart uses Hadoop in its keyword campaigns to drive traffic from search engines to Walmart.com. The software collects information about millions of keywords and then comes up with optimal bids for each word. It also allows Wal-Mart to create language models so the site can return more relevant product results when a user searches for a specific product or an item based on that user's Tweets or Facebook posts.

Tennessee Valley Authority (TVA)
The Tennessee Valley Authority ("TVA") is a federally-owned corporation in the United States that provides flood control, electricity generation, and economic development in the Tennessee Valley. The TVA was selected to collect data from phasor measurement unit ("PMU") devices on behalf of the North American Electric Reliability Corporation ("NERC") to help ensure the reliability of the bulk power system in North America. PMU data includes voltage, current, frequency, and location data, and is considered part of the measurement data for the generation and transmission portion of the so-called smart grid. The TVA uses smart-grid field devices to collect data on its power-transmission lines and facilities across the country. These sensors send in data 30 times per second, and the volume of incoming PMU data was growing very quickly as more and more PMU devices came online. The TVA was faced with the problem of how to reliably store this data and make it available for use. Hadoop was selected because it solved the TVA's storage issues and provided a robust computing platform to analyze the data. It also allowed the TVA to employ commodity hardware and open source software at a fraction of the price of proprietary systems, achieving a much more manageable expenditure curve as its repository grows.

Rapleaf
Rapleaf helps businesses create more personalized experiences for their customers by providing useful information about each customer, such as age, gender, location, and interests, via its Personalization API. Businesses leverage this insight to better understand their customers in order to personalize deals and offers, show them more relevant content, and give them a better experience online and off. Rapleaf has a vast amount of consumer data, including over a billion email addresses and terabytes of data. Hadoop has allowed Rapleaf to manage and work with this data at scale much more easily than its previous RDBMS systems. Rapleaf has implemented a batch-oriented process that ingests and normalizes raw data from numerous sources, analyzes it, and then packages the data into easily-served objects.

Crossbow
Crossbow is an open-source, Hadoop-enabled software pipeline for quickly, accurately, and cheaply analyzing human genomes in the cloud. While human genomes are about 99.9% identical, discovering differences between genomes is the key to understanding many diseases, including how to treat them. While sequencing has undoubtedly become an important and ubiquitous tool, the rapid improvements in sequencing technology have created a firehose problem: how to store and analyze the huge volume of DNA sequence data being generated in a short period of time.
Presently, the process of scanning and mapping generates about 100GB of compressed data (read sequences and associated quality scores) for one human genome. Crossbow combines one of the fastest sequence alignment algorithms, Bowtie, with a very accurate genotyping algorithm, SoapSNP, within Hadoop to distribute and accelerate the computation. The pipeline can accurately analyze an entire genome in one day on a 10-node local cluster, or in about three hours for less than $100 using a 40-node, 320-core cluster rented from Amazon's (AMZN, NC) EC2 utility computing service. The Crossbow team's evaluation against a gold standard of known differences within an individual shows Crossbow is better than 99% accurate at identifying differences between human genomes. Crossbow enables this computational analysis without requiring researchers to own or maintain their own computing infrastructure.

Bank of America (BAC, Market Perform, Covered by David Trone)
With Hadoop, Bank of America has been able to analyze billions of records to gain a better understanding of the impact of new and existing financial products. The bank can now examine things like the credit and operational risk of products across different lines of business, including home loans, insurance, and online banking.
Disney (DIS, Not Covered)
Disney was faced with the challenge of what to do with the increasing amount of data collected from business operations and customer transactions, along with unstructured data created by social media and its various web properties (e.g., ESPN and ABC). Disney's Technology Shared Service Group uses Hadoop as a cost-effective way to analyze and correlate information from all of its different businesses, including theme-park attendance, reservations at resort hotels, purchases from Disney stores, and viewership of Disney's cable TV programming.

General Electric (GE, Not Covered)
GE is running several use cases on its Hadoop cluster, which has given it deeper analytic capabilities and insights into its business. The marketing and communications teams can assess how the public perceives the company through sentiment analysis. GE uses Hadoop to mine text such as updates on Facebook and Twitter, along with news reports and other information on the Internet, to understand, with 80% accuracy, how consumers feel about GE and its various divisions. GE has also built a recommendation engine for its intranet, allowing it to display targeted press releases to each user based on job function, user profile, and prior visits to the site. Finally, Hadoop enables GE to work with several types of remote monitoring and diagnostic data from its energy and wind business.
NoSQL MOVEMENT
Besides Hadoop, one of the most interesting areas of the data management landscape is the NoSQL movement. The NoSQL (sometimes read as "not only SQL") movement refers to database management systems that tend to be non-relational. The movement consists of four primary categories: key-value stores, BigTable clones, document databases, and graph databases. The figure below highlights the four NoSQL categories along "data size" and "data complexity" dimensions.

The first NoSQL category is key-value stores, based on Amazon's Dynamo paper published in 2007. The data model of a key-value store is a collection of key-value pairs. Examples include Dynomite, Voldemort, Membrain, and Berkeley DB, among others. The second category is BigTable clones, based on Google's BigTable paper published in 2006. The data model is a big table with column families. Examples include HBase, Hypertable, and Cassandra. The third category is document databases. People often think of Lotus Notes when they think of document databases. Examples include CouchDB, MongoDB, and RavenDB. The fourth category is graph databases. A graph database "uses graph structures with nodes, edges, and properties to represent and store information." Examples include AllegroGraph, Sones, Neo4J, InfiniteGraph, and GraphDB.
Source: http://www.slideshare.net/emileifrem/nosql-overview-neo4j-intro-and-production-example-qcon-london2010?src=related_normal&rel=8600029
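The differences between these data models are easy to see in miniature. The sketch below is purely illustrative, using plain Python structures as stand-ins; real systems (e.g., Redis, Cassandra, MongoDB, Neo4j) each expose their own APIs, and the record shown is invented.

    # The same user record expressed in the four NoSQL data models, as plain Python stand-ins.

    # 1) Key-value store: an opaque value addressed by a single key
    kv_store = {"user:42": '{"name": "Ada", "city": "London"}'}

    # 2) BigTable-style column families: row key -> column family -> columns
    column_store = {"user:42": {"profile": {"name": "Ada", "city": "London"},
                                "activity": {"last_login": "2011-11-15"}}}

    # 3) Document database: a self-describing, nestable document
    document = {"_id": 42, "name": "Ada", "city": "London",
                "orders": [{"sku": "bk-101", "qty": 2}]}

    # 4) Graph database: nodes, edges, and properties
    nodes = {42: {"name": "Ada"}, 43: {"name": "Grace"}}
    edges = [(42, "FRIEND_OF", 43, {"since": 2009})]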
There is controversy around the use of NoSQL solutions. Proponents cite their flexibility, scalability, low cost, and appropriateness for specific use cases. Critics often cite the immaturity of NoSQL solutions, the lack of commercial support, and the inability of NoSQL databases to work with traditional BI tools. Stepping back, we believe new data management technologies will continue to emerge to handle the Big Data trend. While traditional RDBMSs will likely continue to play a significant role well into the future, the days of one database standard or technology within an organization are quickly coming to an end, in our opinion.
What are the hottest areas of business intelligence and analytics? TDWI Research analyst Philip Russom recently put together a thoughtful piece on Big Data analytics. TDWI surveyed 360 companies across a broad set of industries and found that advanced analytics, advanced data visualization, and predictive analytics had the highest commitment levels and the most potential for growth, as shown in Figure 22 below. Gartner's research supports TDWI's survey: Gartner notes that lighter-footprint, visualization-based analytics and data discovery products are the fastest growing areas in the business intelligence space, growing at 3x the overall BI market. Gartner expects the data discovery market alone to grow from $591 million in 2011 to $1.6 billion in 2015.
FIGURE 22: Options for Big Data Analytics Plotted by Potential Growth and Commitment
We believe the business intelligence landscape is about to go through a major sea change that will radically transform the way the industry thinks about analytics. We have identified three primary reasons for the sea change:
1) Big Data and the Explosion of Data
2) The Consumerization of Enterprise BI
3) Industry Consolidation
We discuss each below:

1) Big Data and the Explosion of Data. Earlier in this report we covered some of the reasons behind the explosion in data, including the precipitous drop in memory prices over the last 10 years and the sheer number of "devices" now collecting information for enterprises. The problem is simply that it has become very easy to collect data but difficult to make sense of that data using traditional BI tools. In other words, as the useful life of information has decreased, so has the utility of traditional BI tools, which have historically been very backward-looking.

2) The Consumerization of Enterprise BI. We see users within enterprises increasingly demanding easier-to-use and more intuitive business intelligence solutions. This is driven by the consumerization of all enterprise IT. The consumerization of IT has been covered extensively by other sources, but in our view it simply means that the direction of influence has reversed: circa 1995, enterprise behavior influenced an individual's behavior at home ("I have email at work and I now want email at home"), whereas circa 2011, behavior in the home influences enterprise IT ("What do you mean I can't get my corporate email on my iPhone?"). We believe the consumerization of BI is being driven by individuals having access to amazing analytics on their internet and mobile devices; these individuals increasingly insist on having the same access to analytics in their day-to-day jobs.

3) Industry Consolidation. The final reason a major sea change is occurring in the business intelligence space is the change in the vendor landscape. As investors will recall, there has been massive consolidation as the heavyweights in the tech industry invested heavily in the space. The most prominent examples include IBM's acquisitions of Cognos and SPSS, SAP's acquisition of Business Objects, and Oracle's acquisition of Hyperion.
In the next section, we discuss five areas within business intelligence that we believe investors should understand:
1) Business Analytics Versus Business Intelligence
2) Agile BI Versus Traditional BI
3) Business Intelligence in the Cloud
4) The R Open Source Programming Language
5) Data Visualization
Business Intelligence
- Identify business trends
- Understand the timeframe of change in the business trend
- Understand the different elements involved in the business trend
- Quantify the change in the business trend
Key aspects include: (a) reporting (KPIs, metrics); (b) automated monitoring/alerting; (c) dashboards; (d) scorecards; (e) OLAP (cubes, slice and dice, drill-down); (f) ad hoc query generation; (g) deployment of the solution to multiple devices.

Business Analytics
- Understand and act on business trends
- Forecast the possibility of the trend occurring again
- Understand the implications of the business trend
- Understand other possible explanations and scenarios associated with the change in the business trend
Key aspects include: (a) statistical/quantitative analysis; (b) data mining; (c) predictive modeling; (d) multi-variate testing.
We make the distinction between Business Intelligence and Business Analytics to highlight that the BI industry so far has done a good job with the "Business Intelligence" side of the chart but in many ways we are still in the early innings of the "Business Analytics" side of the chart.
R OVERVIEW
R is an open-source programming language and software environment for statistical computing and graphics. R provides a wide variety of statistical and graphical techniques, including linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, and clustering, among others. R is easily extensible through functions and extensions, and the R community is noted for its active contribution of packages. According to Rexer's Annual Data Miner Survey in 2010, R has become the most widely used data mining tool, used by 43% of data miners.

R is an implementation of the S programming language, which was developed by Bell Laboratories in 1976 to provide a more interactive alternative to the statistical analysis tools then available. R is a free, open-source dialect of S (a commercial version, S-PLUS, is also available) and is part of the GNU project. R was created by Ross Ihaka and Robert Gentleman of the University of Auckland, New Zealand. It was initially conceived because both men wanted technology better suited for their statistics students, who needed to analyze data and produce graphical models of the information and found existing software difficult to use.

What is R and Why is it Used? Recently, the business intelligence sector began taking notice of the many benefits of the R programming language, which is particularly well adapted to predictive analytics. It can be used to identify patterns or trends in massive data sets, making it ideal for researching retail, financial, and medical trends. Predictive analytics is an area of statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns, as well as identifying risks and opportunities. Models capture relationships among many factors to allow the assessment of the risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions. As data mining and predictive analytics continue to accelerate, R provides the tools to support these activities across industries including actuarial science, financial services, insurance, telecommunications, retail, travel, healthcare, and pharmaceuticals. It supports drug companies doing pharmaceutical development, insurance companies performing underwriting and risk assessment, and students performing academic research.

R has become one of the most popular programming languages used by statisticians, scientists, and data analysts, both commercially and within academia. R's popularity seems to be a result of its usability, extensibility, and open-source roots. R is an integrated programming environment for data manipulation, calculation, and graphical display of data sets. It helps people perform a wide variety of computing tasks by giving them access to various commands and pre-supplied packages. It also allows users to script their own functions (or modify existing ones) to do custom tasks. This provides much of the flexibility of languages such as C, but with the advantage of building upon R's robust numerical routines, data management functions, and graphing tools. Its ease of use has made it especially appealing to people without deep computer programming skills and is making it the de facto standard. According to an article from the New York Times, "It allows statisticians to do very intricate and complicated analyses without knowing the blood and guts of computing systems."
Another strength of R is static graphics and the ease with which well-designed, publication-quality graphs and mathematical symbols can be produced. Speed is also one of the biggest draws for the R programming language, which can process up to 12 gigabytes of data in seconds. Because R has stronger object-oriented programming facilities than most statistical computing languages, it can be more easily customized and extended through user-submitted packages for specific functions or specific areas of study. Advanced users can write C code to manipulate R objects directly or link code written in C, C++, or Fortran to R at run-time. Over 1,500 packages exist today. Some examples include:
- BiodiversityR: offers a graphical interface aimed at simplifying the calculation of environmental trends
- Emu: analyzes speech patterns
Finally, because R is open-source, users can freely modify existing functions and packages or create entirely new ones, in contrast with commercial software packages that use proprietary functions to perform the analysis.
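For readers who want a concrete picture of the modeling workflow described above, here is a minimal sketch. We use Python with NumPy to keep this report's examples in one language; in R itself, the equivalent fit is a one-line lm(y ~ x) call. The data is invented for illustration.

    import numpy as np

    # Fit y = slope*x + intercept by least squares, the simplest predictive model.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # e.g., quarters since product launch
    y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])   # e.g., revenue in $M (illustrative data)
    slope, intercept = np.polyfit(x, y, deg=1)

    forecast = slope * 6.0 + intercept          # predict the next period
    print(round(slope, 2), round(intercept, 2), round(forecast, 1))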
Who Uses R? R is used by both corporations and universities, with an estimated two million users, particularly scientists, programmers, and academics who routinely do research. While software from SAS Institute has been the preferred tool, R is gaining popularity, especially within academia. Its ability to perform high-end analytics, combined with its open-source, free-distribution model, seems to be key to this shift. Corporate users include Google (Not Covered), Pfizer (NC), Merck (NC), Bank of America (BAC, MP, Trone), Shell (NC), and the InterContinental Group (NC). Google uses R to help it understand trends in ad pricing and illuminate trends in the search data it collects. Pfizer has created customized packages that allow its scientists to manipulate their own data during non-clinical drug studies immediately rather than sending that information off to a statistician. A number of financial institutions have used R to create packages to perform derivatives analysis. Wal-Mart (NC) is also a high-profile user of R, using it to interpret the needs and spending habits of customers.

R Disadvantages: While there appear to be many advantages to R, there are also some disadvantages. In the eyes of some, SAS Institute is better suited to handle "big data." R is limited by RAM because data sets must fit in memory. R also appears to lack documentation and has limited commercial support.

Commercial Releases: In October 2011, Oracle (ORCL, MO, $36 PT, Walravens) announced the Big Data Appliance, which integrates R, Apache Hadoop, Oracle Enterprise Linux, and a NoSQL database with Exadata hardware. This is an engineered system optimized for acquiring, organizing, and loading unstructured data into Oracle Database 11g. In 2007, Revolution Analytics was founded to provide commercial support for Revolution R, its distribution of R, which also includes components developed by the company, such as a web services framework and the ability to read and write data in the SAS file format.
DATA VISUALIZATION
The final area of business intelligence is the data visualization, or business discovery, market. This is the most exciting component of the business intelligence market, in our opinion, and includes vendors such as QlikTech, Tableau Software, and TIBCO Software's Spotfire. Perhaps the best way to understand the data visualization market is simply to look at the types of graphs and charts these tools can produce. Some of our favorite examples of what enterprises now expect are shown in Figures 24 and 25 below.
Source: http://globaleconomicanalysis.blogspot.com/search?updated-max=2011-09-02T10%3A27%3A00-05%3A00&max-results=3
Figure 25 below highlights an interactive visualization of Average Draft Position (ADP) data from the CBS Sports Fantasy Football league, powered by Tableau Software. FIGURE 25: Another Data Visualization Example
Source: http://fantasynews.cbssports.com/fantasyfootball
As shown in Figure 27 below, the best performing stock YTD in our Big Data index is TIBCO Software (TIBX), up 44%, followed by MicroStrategy (MSTR), up 38%, compared to a 0.4% decline in the NASDAQ. The worst performing stock has been Progress Software (PRGS), down 26% YTD. FIGURE 27: YTD Individual Stock Performance of Big Data Index Companies
Source: FactSet
While Big Data stocks have outperformed the NASDAQ YTD, they have also pulled back harder than the NASDAQ since the market turned south on July 22nd. Figure 28 below shows the median performance of the "Big Data Index" compared to the NASDAQ from July 22nd to November 10th. As shown, the NASDAQ is down 8% while the "Big Data Index" is down 12%. FIGURE 28: Big Data Stock Index Versus NASDAQ Since July 22nd Pullback
Source: FactSet
The Big Data stocks that have pulled back the most from their 52-week highs include Pegasystems, Progress Software, and MicroStrategy, as shown in Figure 29 below. We believe the recent pullback may represent a compelling opportunity for investors to build or add to positions in selected Big Data stocks. FIGURE 29: Percentage Change from 52-Week High
Source: FactSet
Source: FactSet
The fastest growing company in the Big Data Index using consensus estimates is Qlik Technologies, followed by NetApp, Pegasystems, and Informatica. FIGURE 31: Fastest Growing Companies in the "Big Data Index" (CY12 Estimates)
Source: FactSet
Interestingly, on a PEG basis (2012 P/E divided by 2012-2013 EPS growth), the Big Data stocks trade at 1.0x, the same multiple as our NASDAQ technology index. FIGURE 32: PEG Ratio of Big Data Index Versus HW/SW Technology Index
Source: FactSet
One metric we like to focus on is free cash flow. On a trailing-twelve-month (TTM) basis, our Big Data index trades at 21x EV/TTM FCF versus 18x for our NASDAQ technology index. FIGURE 33: EV/TTM FCF
Source: FactSet
While this multiple seems steep, when we compare it to the expected 2011 revenue growth, the stocks appear inexpensive compared to the NASDAQ technology index. As shown in Figure 34 below, our Big Data Index trades at an EV/TTM FCF divided by CY11 expected growth rate of 1.5x compared to 2.1x for the NASDAQ technology index. FIGURE 34: EV/TTM FCF Divided by CY11 Expected Growth Rate
Source: FactSet
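The growth adjustment in Figures 32-34 is simple arithmetic; the sketch below reproduces it with the index-level figures cited above (our back-of-the-envelope, not FactSet output).

    # Growth-adjusted valuation math behind Figures 32-34.
    big_data_ev_fcf, nasdaq_ev_fcf = 21.0, 18.0    # EV / TTM free cash flow
    big_data_adj, nasdaq_adj = 1.5, 2.1            # EV/TTM FCF divided by CY11 expected growth

    # Backing out the implied CY11 expected growth rates (in percentage points):
    print(big_data_ev_fcf / big_data_adj)          # ~14% for the Big Data index
    print(nasdaq_ev_fcf / nasdaq_adj)              # ~8.6% for the NASDAQ technology index

    # The PEG ratio in Figure 32 is built the same way: 2012 P/E over 2012-2013 EPS growth.
    def peg(pe, eps_growth_pct):
        return pe / eps_growth_pct                 # e.g., a 20x P/E on 20% growth gives 1.0x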
INITIATIONS SUMMARY
We are initiating coverage on six companies in the infrastructure software universe:

MicroStrategy (MSTR)
We are initiating coverage on MicroStrategy with a Market Outperform rating and a $140 price target. MicroStrategy is the largest publicly-traded independent BI vendor. We like MicroStrategy because of its powerful value proposition of an end-to-end BI architecture and analytics platform, its large market presence with some of the leading companies in the world, its well-built developer ecosystem, and its four consecutive quarters of double-digit license growth, which we expect to continue. While MicroStrategy has invested heavily in 2011 (with operating margins expected to be down 800 basis points) to better compete with emerging players like QlikTech and Tableau Software, we believe the investments will start to bear fruit toward the end of this year, leading to significant operating margin expansion next year and revenue and EPS coming in above consensus estimates. We look for 2011, 2012, and 2013 EPS of $1.81, $3.67, and $5.14 versus consensus of $1.79, $3.63, and $4.59, respectively. Our $140 price target implies a very reasonable 2013 EV/revenue multiple of 1.9x, a discount to the peer group, and a 2013 P/E of 27x, slightly above MicroStrategy's average five-year forward P/E multiple and roughly in line with its TTM revenue growth rate of 28%.

Progress Software (PRGS)
We are initiating coverage on Progress Software with a Market Perform rating. Progress Software provides enterprise software products that enable organizations to be operationally responsive to a changing business environment. We like the steps Progress is taking to transition the business to the fast-growing category of Enterprise Business Solutions; however, we remain on the sidelines until we see a permanent CEO in place, more consistent sales execution, and possibly a divestiture of non-strategic assets. Progress Software trades at a 2013 P/E multiple of 10x versus the peer group median of 11x. We look for 2011 non-GAAP EPS of $1.43, versus consensus of $1.45; 2012 non-GAAP EPS of $1.66, versus consensus of $1.63; and 2013 non-GAAP EPS of $1.76, versus consensus of $1.72.

Qlik Technologies (QLIK)
We are initiating coverage on Qlik Technologies ("QlikTech") with a Market Outperform rating and a $35 price target. QlikTech is the fastest growing company in our Big Data/business intelligence coverage universe and one of the fastest growing publicly-traded software companies, with expected 2011 revenue growth of 41%. We like QlikTech because we believe it has a wide-open market opportunity and a strong value proposition, and, based on our survey of 16 customers, we believe the company will be able to exceed growth expectations. We look for 2011, 2012, and 2013 non-GAAP EPS of $0.30, $0.47, and $0.68 (versus consensus of $0.29, $0.44, and $0.63) on revenue growth of 42%, 28%, and 25%, respectively. Our $35 price target implies an EV/2013 revenue multiple of 5.6x, a modest premium to the high-growth software peer group.

Quest Software (QSFT)
We are initiating coverage on Quest Software with a Market Perform rating. Quest Software is a provider of enterprise systems management software products that has grown primarily via acquisition. We like Quest's ability to generate cash (with a TTM FCF yield of 10%) and its deep product portfolio. However, Quest's recent performance has been inconsistent, with EPS misses in three and revenue misses in two of the last six quarters.
While we believe the downside risk in this name is limited by its valuation, we remain on the sidelines until we see more consistent execution. Quest Software trades at a 2013 P/E of 10x, versus the comp group at 11x. We look for 2011 non-GAAP EPS of $1.33, in line with consensus; 2012 non-GAAP EPS of $1.64, versus consensus of $1.65; and 2013 non-GAAP EPS of $1.90, versus consensus of $1.93.
Teradata (TDC)
We are initiating coverage on Teradata with a Market Outperform rating and a $63 price target. We like Teradata because it is the leading data warehousing vendor, we believe it stands to benefit from the Big Data trend more than any other software vendor, the competitive environment for Teradata is more benign than conventional wisdom believes, and we believe the company is well positioned to beat consensus expectations for 2012 and 2013. We look for 2011, 2012, and 2013 non-GAAP EPS of $2.29, $2.70, and $3.15, versus consensus of $2.25, $2.58, and $2.95. Our $63 price target represents an FY13 P/E multiple of 20x, in line with Teradata's 10-year average.

TIBCO Software (TIBX)
We are initiating coverage on TIBCO Software with a Market Outperform rating and a $33 price target. We like TIBCO because we believe it is a well-managed company, growing the top line 21% and committed to growing the bottom line 15-20% per year; it is a cash flow machine, with a 10-year FCF CAGR of 24%; it is tapping a large market opportunity that we believe is getting even bigger as a result of the Big Data trend; it is well diversified across verticals and product areas; it has strong partnerships; and we believe it represents a good acquisition target. We look for 2011 non-GAAP EPS of $0.94 (consensus $0.94), 2012 non-GAAP EPS of $1.12 (consensus $1.11), and 2013 non-GAAP EPS of $1.32 (consensus $1.27) on revenue growth of 21%, 13%, and 11%, respectively, well above its comp group. Our $33 price target implies a 2013 P/E of 25x, in line with TIBCO's expected 2011 license growth rate and a premium to the peer group median of 15x.
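As a quick check, the forward multiples implied by these price targets follow directly from the EPS estimates above (our arithmetic).

    # Implied forward P/E = price target / JMP 2013E non-GAAP EPS, for the three PT-bearing ratings.
    targets = {"MSTR": (140.0, 5.14), "TDC": (63.0, 3.15), "TIBX": (33.0, 1.32)}
    for ticker, (price_target, eps_2013) in targets.items():
        print(ticker, round(price_target / eps_2013, 1))  # ~27x, 20x, and 25x, matching the text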
Peter Lowry, Associate, [email protected], 415-869-4418. Peter Lowry joined JMP Securities in June 2011 and serves as an Associate in Equity Research covering software. Prior to joining JMP, Peter had 15 years' experience as an investment banker, private banker, and CPA at top-tier firms including PWC, Schroder, Lehman Brothers, UBS, Bank of America, Deutsche Bank, and Ion Partners. Peter worked with both corporate and private clients on finance issues across public accounting, corporate finance, capital markets, and private wealth management. Peter has an MBA from Columbia University, an MS in Public Accounting from the University of Hartford, and a BA from Hamilton College.
We would also like to acknowledge the following gentlemen for their help with the Big Data project: Praveen Chandran Rishi Sharma Alec Short Vincent Song Naga Surendran Vijay Tennety Julian Terkaly
Publicly Traded Companies Covered by JMP and Mentioned in This Report (as of November 15, 2011):
Company Actuate Corporation Adobe Systems, Inc. Bank of America Corp. Cisco Systems, Inc. Citrix Systems, Inc. CommVault Systems, Inc. Cornerstone OnDemand, Inc. Demand Media, Inc. DemandTec, Inc. EMC Corporation Hewlett-Packard Company Informatica Corporation JDA Software Group Inc. MicroStrategy, Inc. Oracle Corporation Progress Software Corporation Qlik Technologies Inc. Rackspace Hosting, Inc. RealPage, Inc. Responsys, Inc. RightNow Technologies, Inc. SAP AG Symantec Corporation TIBCO Software Inc. Teradata Corporation Ultimate Software Group VMware, Inc. salesforce.com Disclosures (1) (1) (1) (1) (1) (1,3) (1,3,5) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1) (1,3) (1,3) (1,3) (1) (1) (1) (1) (1) (1) (1)
JMP Securities Research Ratings and Investment Banking Services: (as of October 3, 2011)
Rating (Regulatory Equivalent / Regulatory Rating)   # Co's Under Coverage   % of Total   # Co's Receiving IB Services in Past 12 Months   % of Co's With This Rating
Buy      207    66%    58    28%
Hold     105    33%     7     7%
Sell       3     1%     0     0%
Total    315   100%    65    21%
JMP Disclaimer:
JMP Securities LLC ("the Firm") compensates research analysts, like other Firm employees, based on the Firm's profitability, which includes revenues from the Firm's institutional sales, trading, and investment banking departments as well as on the quality of the services and activities performed that are intended to benefit the Firm's institutional clients. These data have been prepared by JMP Securities LLC for informational purposes only and are based on information available to the public from sources that we believe to be reliable, but we do not guarantee their accuracy or completeness. Any opinions and projections expressed herein reflect our judgment at this date and are subject to change without notice. These data are neither intended nor should be considered as an offer to sell or a solicitation or a basis for any contract for the purchase of any security or other financial product. JMP Securities LLC, its affiliates, JMP Group LLC, Harvest Capital Strategies LLC, and their respective partners, directors, officers, and associates may have a long or short position in, may act as a market maker for, or may purchase or sell a position in the securities mentioned herein. JMP Securities LLC or its affiliates may be performing, have performed, or seek to perform investment banking, advisory, or other services and may have acted as manager or co-manager for a public offering of securities for any company mentioned herein. The reader should assume that JMP Securities LLC will solicit business from the company covered in this report. Copyright 2011. All rights reserved by JMP Securities LLC. JMP Securities LLC is a member of FINRA, NYSE Arca, NASDAQ, and SIPC.
Financial Services
Capital Markets: David Trone; Steven Fu, CFA; Chris Ross, CFA
Consumer & Specialty Finance, Commercial Banks: John Hecht (415) 835-3912; Kyle M. Joseph (415) 835-3940
Financial Processing & Outsourcing: David M. Scharf (415) 835-8942; Kevane A. Wong (415) 835-8976
Insurance: Matthew J. Carletti (312) 768-1784; Christine Worley (312) 768-1786
Market Structure: David M. Scharf (415) 835-8942; Kevane A. Wong (415) 835-8976
Residential & Commercial Real Estate Finance: Steven C. DeLaney (404) 848-7773; Trevor Cranston, CFA (415) 869-4431

Real Estate
Hotels & Resorts: William C. Marks (415) 835-8944
Housing & Housing Supply Chain: Michael G. Smith (415) 835-8965
Land Development: Michael G. Smith (415) 835-8965
Real Estate & Property Services: William C. Marks (415) 835-8944
Real Estate Technology: Michael G. Smith (415) 835-8965
REITs: Healthcare: Peter L. Martin, CFA (415) 835-8904; Aaron Hecht (415) 835-3963
REITs: Office & Industrial: Mitch Germain (212) 906-3546

Technology
Clean Technology: Alex Gauna
Communications Equipment: Erik Suppiger
Semiconductors: Alex Gauna
Software: Patrick Walravens (415) 835-8943; Greg McDowell (415) 835-3934; Peter Lowry (415) 869-4418

Healthcare
Biotechnology: Charles C. Duncan, PhD (212) 906-3510; Roy Buchanan, PhD (212) 906-3514; Jason N. Butler, PhD (212) 906-3505; Gena H. Wang, PhD (212) 906-3528; Liisa A. Bayko (312) 768-1785; Heather Behanna, PhD (312) 768-1795
Healthcare Facilities & Services: Peter L. Martin, CFA (415) 835-8904; Aaron Hecht (415) 835-3963
Healthcare Services: Constantine Davides, CFA (617) 235-8502; Tim McDonough (617) 235-8504
Medical Devices: J. T. Haresco, III, PhD (415) 869-4477
For Additional Information:
Mark Lehmann, President, JMP Securities, (415) 835-3908
Erin Seidemann, Vice President, Publishing, (415) 835-3970