Pentaho Performance and Scalability Overview
This guide provides an overview of some of the performance and scalability capabilities of the Pentaho Business Analytics Platform.
Contents
Pentaho Scalability and High-Performance Architecture
Pentaho Business Analytics Server
Deployment on 64-bit Operating Systems
Clustering Multiple Server Loads
Optimizing the Configuration of the Reporting and Analysis Engines
Pentaho Reporting
Pentaho Analysis
In-Memory Caching Capabilities
Aggregate Table Support
Partitioning Support for High Cardinality Dimensionality
Pentaho Data Integration
Multi-threaded Architecture
Transformation Processing Engine
Clustering and Partitioning
Executing in Hadoop (Pentaho MapReduce)
Native Support for Big Data Sources including Hadoop, NoSQL and High-Performance Analytical Databases
Customer Examples and Use Cases
Optimizing the Configuration of the Reporting and Analysis Engines
Pentaho Reporting
The Pentaho Reporting engine retrieves, formats and processes information from a data source to generate user-readable output. One way to increase the performance and scalability of Pentaho Reporting solutions is to take advantage of result set caching. When rendered, a parameterized report must account for every dataset required for every parameter, and every time a parameter field changes, every dataset is recalculated. This can negatively impact performance. Caching the result sets of parameterized reports improves performance, particularly for larger datasets.
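The idea behind result set caching can be illustrated with a minimal sketch. It is not Pentaho's implementation: the `ResultSetCache` class, the `fake_backend` function and the query text are all hypothetical, and the sketch only assumes that a result set can be identified by its query text plus its parameter values.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple

Row = Tuple[Any, ...]
CacheKey = Tuple[str, Tuple[Tuple[str, Hashable], ...]]


class ResultSetCache:
    """Caches result sets keyed by query text plus parameter values (hypothetical)."""

    def __init__(self, run_query: Callable[[str, Dict[str, Hashable]], List[Row]]):
        self._run_query = run_query
        self._cache: Dict[CacheKey, List[Row]] = {}
        self.backend_calls = 0

    def fetch(self, query: str, params: Dict[str, Hashable]) -> List[Row]:
        # Sort the parameters so that the key does not depend on dict ordering.
        key = (query, tuple(sorted(params.items())))
        if key not in self._cache:
            self.backend_calls += 1
            self._cache[key] = self._run_query(query, params)
        return self._cache[key]


def fake_backend(query: str, params: Dict[str, Hashable]) -> List[Row]:
    # Stand-in for a slow database call; the data is made up.
    return [(params["region"], 100)]


QUERY = "SELECT region, total FROM sales WHERE region = :region"
cache = ResultSetCache(fake_backend)
cache.fetch(QUERY, {"region": "EMEA"})
cache.fetch(QUERY, {"region": "EMEA"})  # same parameters: served from the cache
cache.fetch(QUERY, {"region": "APAC"})  # new parameter value: hits the backend
print(cache.backend_calls)
```

Only a change to a parameter value that has not been seen before reaches the data source; repeated renders with the same values reuse the stored rows.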
Pentaho Analysis
The Pentaho Analysis engine (Mondrian) creates an analysis schema, and forms data sets from that schema by using an MDX query. Maximizing performance and scalability always begins with the proper design and tuning of source data. Once the database has been optimized, there are some additional areas within the Pentaho Analysis engine that can be tuned.
In-Memory Caching Capabilities
Pentaho's in-memory caching capability enables ad hoc analysis of millions of rows of data in seconds. Pentaho's pluggable, in-memory architecture is integrated with popular open source caching platforms such as Infinispan and Memcached and is used by many of the world's most popular social, ecommerce and multi-media websites. In addition, Pentaho allows in-memory aggregation of data, where granular data can be rolled up to higher-level summaries entirely in memory, reducing the need to send new queries to the database. This results in even faster performance for more complex analytic queries.
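As a rough illustration of in-memory roll-up, the sketch below sums cached granular cells into a higher-level summary without issuing a new query. The cell layout, the data and the `roll_up` helper are assumptions made for the example, not Pentaho's internal structures.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple

# Cached granular cells: (year, quarter, region) -> sales. Hypothetical data.
cached_cells: Dict[Tuple[int, str, str], float] = {
    (2012, "Q1", "East"): 120.0,
    (2012, "Q2", "East"): 80.0,
    (2012, "Q1", "West"): 200.0,
    (2012, "Q2", "West"): 50.0,
}


def roll_up(cells: Dict[tuple, float], keep: Sequence[int]) -> Dict[tuple, float]:
    """Sum an additive measure over every key position not listed in `keep`."""
    totals: Dict[tuple, float] = defaultdict(float)
    for key, value in cells.items():
        totals[tuple(key[i] for i in keep)] += value
    return dict(totals)


# Collapse the quarter level: keep year (position 0) and region (position 2).
sales_by_year_region = roll_up(cached_cells, keep=(0, 2))
print(sales_by_year_region)
```

This shortcut is only valid for additive measures such as sums and counts; a measure like distinct count cannot be derived from already-aggregated cells in this way.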
"We have operational metrics for six different businesses running in each of our senior care facilities that need to be retrieved and accessed every day by our corporate management, the individual facilities managers, as well as the line of business managers in a matter of seconds. Now, with the high performance in-memory analysis capabilities in the latest release of Pentaho Business Analytics, we can be more aggressive in rollouts, adding more metrics to dashboards, giving dashboards and data analysis capabilities to more users, and see greater usage rates and more adoption of business analytics solutions."
Brandon Jackson, Dir. of Analytics and Finance, StoneGate Senior Living LLC
Aggregate Table Support
When working with large data sets, properly creating and using aggregate tables greatly improves performance. An aggregate table coexists with the base fact table and contains pre-aggregated measures built from the fact table. Once an aggregate table is registered in the schema, Pentaho Analysis can choose to use it rather than the fact table, resulting in faster query performance.
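The relationship between a fact table and an aggregate table can be shown with a small, self-contained sketch using SQLite from the Python standard library. The table and column names (`sales_fact`, `agg_sales_by_month`, `amount_sum`, `fact_count`) are invented for the example, and the sketch does not show how an aggregate table is registered in an analysis schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_fact (year INTEGER, month INTEGER, product TEXT, amount REAL);
    INSERT INTO sales_fact VALUES
        (2012, 1, 'A', 10.0), (2012, 1, 'B', 5.0),
        (2012, 2, 'A', 7.0),  (2013, 1, 'A', 3.0);

    -- Aggregate table: the product level is collapsed, year and month are kept.
    CREATE TABLE agg_sales_by_month AS
        SELECT year, month, SUM(amount) AS amount_sum, COUNT(*) AS fact_count
        FROM sales_fact
        GROUP BY year, month;
    """
)

# The same yearly totals, answered from the fact table and from the aggregate.
from_fact = conn.execute(
    "SELECT year, SUM(amount) FROM sales_fact GROUP BY year ORDER BY year"
).fetchall()
from_agg = conn.execute(
    "SELECT year, SUM(amount_sum) FROM agg_sales_by_month GROUP BY year ORDER BY year"
).fetchall()
agg_rows = conn.execute("SELECT COUNT(*) FROM agg_sales_by_month").fetchone()[0]
print(from_fact, from_agg, agg_rows)
```

Both queries return the same totals, but the second one reads fewer rows. On a real warehouse the difference in row counts, and therefore in query time, is typically far larger than in this toy data set.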
Partitioning Support for High Cardinality Dimensionality
Large enterprise data warehouse deployments often contain attributes comprising tens or hundreds of thousands of unique members. For these use cases, the Pentaho Analysis engine can be configured to properly address a (partitioned) high-cardinality dimension. This streamlines SQL generation for partitioned tables so that only the relevant partitions are queried, which can greatly increase query performance.
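The benefit of querying only the relevant partitions can be sketched as follows. The partition names, the region-based partitioning key and the `total_for_regions` helper are hypothetical and stand in for what a database and SQL generator would do; they are not Pentaho configuration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: a customer table partitioned by region.
partitions: Dict[str, List[Tuple[str, str, float]]] = {
    "customer_emea": [("c1", "EMEA", 10.0), ("c2", "EMEA", 4.0)],
    "customer_apac": [("c3", "APAC", 7.0)],
    "customer_amer": [("c4", "AMER", 9.0), ("c5", "AMER", 1.0)],
}
partition_for_region: Dict[str, str] = {
    "EMEA": "customer_emea",
    "APAC": "customer_apac",
    "AMER": "customer_amer",
}


def total_for_regions(regions: Set[str]) -> Tuple[float, List[str]]:
    """Scan only the partitions that can hold rows for the requested regions."""
    scanned = sorted(partition_for_region[region] for region in regions)
    total = sum(amount for name in scanned for (_, _, amount) in partitions[name])
    return total, scanned


total, scanned = total_for_regions({"EMEA"})
print(total, scanned)
```

A query restricted to one region touches one partition out of three here; the other partitions are never read.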
Multi-threaded Architecture
The streaming engine architecture of Pentaho Data Integration (PDI) makes it possible to work with extremely large data volumes and provides enterprise-class performance and scalability with a broad range of deployment options, including dedicated, clustered, and/or cloud-based ETL servers. The architecture allows both vertical and horizontal scaling: the engine executes tasks in parallel across multiple CPUs on a single machine, as well as across multiple servers via clustering and partitioning.
Example of a Data Integration Flow with Multiple Threads for a Single Step (Row Denormalizer)
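A step that runs as several threads, fed by a row buffer, can be sketched in a few lines. This is a simplified model and not PDI code: the `run_step` helper, the queue sizes and the doubling function are assumptions made for illustration.

```python
import queue
import threading
from typing import Callable, List

SENTINEL = object()  # marks the end of the row stream


def run_step(func: Callable, inbox: queue.Queue, outbox: queue.Queue,
             copies: int) -> List[threading.Thread]:
    """Start `copies` worker threads that apply `func` to rows read from `inbox`."""

    def worker() -> None:
        while True:
            row = inbox.get()
            if row is SENTINEL:
                break
            outbox.put(func(row))

    threads = [threading.Thread(target=worker) for _ in range(copies)]
    for thread in threads:
        thread.start()
    return threads


rows_in: queue.Queue = queue.Queue(maxsize=100)  # bounded buffer between steps
rows_out: queue.Queue = queue.Queue()

copies = 3
workers = run_step(lambda row: row * 2, rows_in, rows_out, copies)

for row in range(10):
    rows_in.put(row)
for _ in range(copies):  # one stop marker per worker thread
    rows_in.put(SENTINEL)
for thread in workers:
    thread.join()

# Rows may arrive in any order, so sort before comparing.
result = sorted(rows_out.get() for _ in range(10))
print(result)
```

Rows stream through the bounded buffer instead of being collected in full before the step runs, and raising `copies` adds threads to a single step. In standard CPython the threads mainly illustrate the structure, since the interpreter lock limits CPU parallelism for pure-Python work.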
Native Support for Big Data Sources including Hadoop, NoSQL and High-Performance Analytic Databases
Pentaho supports native access, bulk-loading and querying of a large number of databases, including NoSQL data sources such as:
Cassandra
HBase
MongoDB
HPCC Systems
ElasticSearch
Customer Examples and Use Cases
Use Case
Store Operations Dashboard Customer Value Analysis
# of Users (total)
<500
# of Users (concurrent)
200 <25 With less than 10 seconds response time Website Activity Analysis <5 With less than 10 seconds response time
5+ TB HP Neoview 1200
Social Networking
High-tech Manufacturing
Stream Global 10 Operational Providers of Sales, Dashboards Customer Service and Technical support for the Fortune 1000
Sheetz
1 TB in <25 Vectorwise 10+ TB in a 20-node Hadoop cluster loading 200,000 rows per second 20 billion chat logs per month 240 million user profiles 500 GB to 1 TB in > 100,000 an 8-node Greenplum cluster 200 GB in Oracle N/A Cloudera Hadoop Loading 10 million records per hour 650,000 XML documents per week (2 to 4 MB each) 100+ million devices dimension Data from 28 200+ switches around the world 12 source systems e.g. Oracle HRMS, SAP, Salesforce.com 20 million records per hour 2+ TB in Teradata 80
3,000
N/A
30
To learn more about Pentaho's Business Analytics software and services, contact your Pentaho Sales Representative online at pentaho.com or call +1.866.660.7555.