1) The presentation discusses Druid, an open source analytics engine that can perform aggregations on memory mapped data in sub-second time.
2) It describes how Druid fits into their software stack at the API layer and how they extend its capabilities through a SQL interface, addressing limitations such as restricted querying and missing features like distinct counts.
3) Examples of SQL queries against Druid are shown to demonstrate its capabilities like group by, filtering, joins, and handling of timeseries data.
Gregorry Letribot - Druid at Criteo - NoSQL matters 2015 - NoSQLmatters
How do you monitor performance for one of your clients on a specific user segmentation when dealing with billions of events a day? With over 2 billion ads served and 230Tb of data processed a day, we at Criteo have a comprehensive need for an interactive analytics stack. And by interactive, we mean a querying system with dynamic filtering to drill down over multiple dimensions, answering within sub-second latency. This session will take you on our journey with Druid, "an open-source data store designed for real-time exploratory analytics on large data sets". We will explore Druid's architecture and notable concepts, how relevant they are for some use cases, and how it really performs.
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018 - Charles Allen
Charles Allen covers data processing, analytics, and insights systems at Snap. Strength points for Druid use cases are called out as are differences in some of the processing systems used.
This is the slide collection from the second talk from:
https://www.meetup.com/druidio-la/events/254080924/
Python and MongoDB as a Market Data Platform by James Blackburn - PyData
This document discusses using Python and MongoDB as a scalable platform for storing time series market data. It outlines some of the challenges of storing different types and sizes of financial data from various sources. The goals are to access 10 years of 1-minute data in under 1 second, store all data types in a single location, and have a system that is fast, complete, scalable, and supports agile development. MongoDB is chosen as it matches well with Python and allows for fast, low latency access. The system implemented uses MongoDB to store data in a way that supports versioning, arbitrary data types, and efficient storage and retrieval of pandas DataFrames. Performance tests show significant speed improvements over SQL and other tick databases for accessing intraday data.
You Sun Jeong is a senior software engineer at SK Telecom who has worked on several big data projects including a Hadoop data warehouse, real-time network analytics, and a big data discovery solution. The presentation introduces Druid, an open source distributed OLAP data store designed to enable fast analytics on large datasets. Key features of Druid include real-time ingestion, in-memory columnar storage, and support for common OLAP operations like roll-ups, drill-downs, and slicing and dicing. A live demo then shows real-time ingestion and cohort analysis on a dataset of mobile app user events.
This document provides an overview of real-time indexing in Druid. It describes the key components of Druid's real-time indexing architecture including Tranquility, the indexing service, firehose, plumber and real-time tasks. Tranquility is used to ingest event streams from Kafka in real-time and submit indexing tasks to Druid. The tasks read data from the firehose, incrementally build indexes, and push completed segments to deep storage via the plumber. The document explains how these components work together to continuously ingest and index streaming data.
The document discusses using MongoDB as a tick store for financial data. It provides an overview of MongoDB and its benefits for handling tick data, including its flexible data model, rich querying capabilities, native aggregation framework, ability to do pre-aggregation for continuous data snapshots, language drivers and Hadoop connector. It also presents a case study of AHL, a quantitative hedge fund, using MongoDB and Python as their market data platform to easily onboard large volumes of financial data in different formats and provide low-latency access for backtesting and research applications.
The integration between the Spring Framework and MongoDB tends to be somewhat unknown. This presentation shows the different projects that compose the Spring ecosystem (Spring Data, Spring Boot, Spring IO, etc.) and how to bridge from pure Java projects to massive enterprise systems that require these components to interact.
Webinar: Choosing the Right Shard Key for High Performance and Scale - MongoDB
Read these webinar slides to learn how selecting the right shard key can future proof your application.
The shard key that you select can impact the performance, capability, and functionality of your database.
Real-time Analytics with Apache Flink and Druid - Jan Graßegger
This document discusses using Apache Flink and Druid for real-time analytics. It describes Druid as an online analytical processing (OLAP) system that is column-oriented, distributed, and uses built-in data sharding based on time windows. It also introduces Tranquility, which helps ingest real-time data into Druid from systems like Kafka, Spark, and Flink. The document proposes a processing architecture using Kafka, Flink, Druid and Tranquility, with HDFS for replays, to enable real-time reporting with capabilities for replays from HDFS and Kafka.
How to leverage MongoDB for Big Data Analysis and Operations with MongoDB's A... - Gianfranco Palumbo
Presented on 28th June 2013 in Dublin's Jaspersoft Big Data event
http://www.jaspersoft.com/event/big-data-analysis-made-easy
Are you in the process of evaluating or migrating to MongoDB? We will cover key aspects of migrating to MongoDB from a RDBMS, including Schema design, Indexing strategies, Data migration approaches as your implementation reaches various SDLC stages, Achieving operational agility through MongoDB Management Services (MMS).
- MongoDB is well-suited for systems of engagement that have demanding real-time requirements, diverse and mixed data sets, massive concurrency, global deployment, and no downtime tolerance.
- It performs well for workloads with mixed reads, writes, and updates and scales horizontally on demand. However, it is less suited for analytical workloads, data warehousing, business intelligence, or transaction processing workloads.
- MongoDB shines for use cases involving single views of data, mobile and geospatial applications, real-time analytics, catalogs, personalization, content management, and log aggregation. It is less optimal for workloads requiring joins, full collection scans, high-latency writes, or five nines uptime.
MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility provides business units with a quick start and flexibility to maintain development velocity, despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations to ensure that agility and scalability stay aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly-scaled customer applications of MongoDB.
When it comes time to select database software for your project, there are a bewildering number of choices. How do you know if your project is a good fit for a relational database, or whether one of the many NoSQL options is a better choice?
In this webinar you will learn when to use MongoDB and how to evaluate if MongoDB is a fit for your project. You will see how MongoDB's flexible document model is solving business problems in ways that were not previously possible, and how MongoDB's built-in features allow running at scale.
Topics covered include:
Performance and Scalability
MongoDB's Data Model
Popular MongoDB Use Cases
Customer Stories
What does Netflix, NTT and Rubicon Project have in common? Apache Druid. - Rommel Garcia
Abstract:
Netflix is a media services provider and uses Apache Druid to measure the customer experience of watching videos, anywhere in the world. They have a significant investment in Apache Druid that helps them improve their services across the board.
NTT is the fourth largest telecommunications company in the world. They use Druid for global traffic visibility for technical, economical and security use cases.
Rubicon Project is one of the world's largest digital advertising exchanges. To achieve accurate data calculations, great analytical performance, and fast intelligence extraction, they use Druid as the foundation of their real-time analytics platform.
In this presentation, we will discuss what their challenges were and why they moved to Apache Druid to meet their needs.
Presenter: Rommel Garcia
Bio:
Rommel Garcia is currently the Director of Solutions Engineering at Imply, a company founded by the same people that created Druid. He is the author of the book Virtualizing Hadoop: How to Install, Deploy, and Optimize Hadoop in Virtualized Infrastructure. In the last 10 years, he has worked on data management both on-prem and in the cloud, distributed systems, big data analytics, and hardware-accelerated analytics using GPUs.
The MongoDB Spark Connector integrates MongoDB and Apache Spark, providing users with the ability to process data in MongoDB with the massive parallelism of Spark. The connector gives users access to Spark's streaming capabilities, machine learning libraries, and interactive processing through the Spark shell, Dataframes and Datasets. We'll take a tour of the connector with a focus on practical use of the connector, and run a demo using both Spark and MongoDB for data processing.
MongoDB and Hadoop: Driving Business Insights - MongoDB
This document discusses using MongoDB and Hadoop together to drive business insights. It provides an overview of the evolving data landscape, with Hadoop used for large datasets and analytics and MongoDB used for operational workloads. Example use cases shown are combining MongoDB for real-time applications with Hadoop for analysis in domains like commerce, insurance, and fraud detection. The MongoDB Connector for Hadoop is described, allowing MongoDB to act as a data source and sink for tools like MapReduce, Pig, Hive, and Spark. A demo is shown of a movie recommendation application that uses Spark running on Hadoop to generate recommendations from a MongoDB dataset and store the results back in MongoDB.
Learn how you can enjoy the developer productivity, low TCO, and unlimited scale of MongoDB as a tick database for capturing, analyzing, and taking advantage of opportunities in tick data. This presentation will illustrate how MongoDB can easily and quickly store variable data formats, like top and depth of book, multiple asset classes, and even news and social networking feeds. It will explore aggregating and analyzing tick data in real-time for automated trading or in batch for research and analysis, and how auto-sharding enables MongoDB to scale with commodity hardware to satisfy unlimited storage and performance requirements.
MongoDB for Time Series Data Part 1: Setting the Stage for Sensor Management - MongoDB
The document discusses schema design considerations for time series traffic monitoring data stored in MongoDB. It evaluates alternatives like storing a document per event, per minute, or per hour. Storing aggregated data improves write performance by changing inserts to updates, improves analytic queries by reducing document reads, and reduces memory requirements by decreasing index size. The best schema depends on the read and write workload of the specific application.
Buzz Moschetti presents on using MongoDB and Hadoop together for success with big data projects. He outlines a real-time directed content system that uses MongoDB for operational data and recommendations, Hadoop for batch analytics, and integrates the two with real-time updates. The system dynamically updates user profiles and recommendations based on user clicks and periodic re-analysis of all data in Hadoop. It provides both real-time and long-term analytics capabilities through this integrated architecture.
Back to Basics 2017: Introduction to Sharding - MongoDB
Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations by providing the capability for horizontal scaling.
CyberAgent is a leading Internet company in Japan focused on smartphone social communities and a game platform known as Ameba, which has 40M users. In this presentation, we will introduce how we use HBase for storing social graph data and as a basis for ad systems, user monitoring, log analysis, and recommendation systems.
Webinar: “ditch Oracle NOW”: Best Practices for Migrating to MongoDB - MongoDB
This webinar will guide you through the best practices for migrating off of a relational database. Whether you are migrating an existing application, or considering using MongoDB in place of your traditional relational database for a new project, this webinar will get you to production faster, with less effort, cost and risk.
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du... - NoSQLmatters
Data analysis is an exploratory process that requires a variety of tools and a flexible data store. Data analysis projects are easy to start but quickly become difficult to manage and error-prone when depending on file-based data storage. Relational databases are poorly equipped to accommodate the dynamic demands of complex analysis. This talk describes best practices for using MongoDB for analytics projects. Examples will be drawn from a large scale text mining project (approximately 25 million documents) that applies machine learning (neural networks and support vector machines) and statistical analysis. Tools discussed include R, Spark, the Python scientific stack, and custom pre-processing scripts, but the focus is on using these with the document database.
This document discusses using Redis and Elasticsearch together for time series data. Redis Streams can be used to store time-stamped data in Redis, and then a Logstash pipeline can be used to extract the data from Redis and index it into Elasticsearch. The RediSearch module for Redis allows full-text search of Redis data. Dashboards in Kibana can then visualize and analyze the time series data stored in Elasticsearch.
Eagle6 is a product that uses system artifacts to create a replica model that represents a near real-time view of system architecture. Eagle6 was built to collect system data (log files, application source code, etc.) and to link system behaviors in such a way that the user is able to quickly identify risks associated with unknown or unwanted behavioral events that may result in unknown impacts to seemingly unrelated downstream systems. This session is designed to present the capabilities of the Eagle6 modeling product and how we are using MongoDB to support near-real-time analysis of large disparate datasets.
MongoDB is a document-oriented, high performance, highly available, and horizontally scalable operational database. It addresses challenges with traditional RDBMS like handling high volumes of data, semi-structured and unstructured data types, and the need for agile development. MongoDB can be used for financial services use cases like high volume data feeds, risk analytics, product catalogs, trade capture, reporting, reference data management, portfolio management, quantitative analysis, and automated trading. It provides features like flexible schemas, indexing, aggregation, scaling out through sharding, and integration with Hadoop.
Mongo db and hadoop driving business insights - final - MongoDB
MongoDB and Hadoop can work together to solve big data problems facing today's enterprises. We will take an in-depth look at how the two technologies complement and enrich each other with complex analyses and greater intelligence. We will take a deep dive into the MongoDB Connector for Hadoop and how it can be applied to enable new business insights with MapReduce, Pig, and Hive, and demo a Spark application to drive product recommendations.
Sometimes, some things work better than other things. MongoDB is great for quick, low-latency access to data; Treasure Data is great as an infinitely scalable historical data store. A lambda architecture is also explained.
How to get the best of both: MongoDB is great for low-latency, quick access to recent data; Treasure Data is great as an ever-growing store of historical data. In the latter case, one need not worry about scaling.
The document summarizes a meeting of the Accra MongoDB User Group held on November 10th, 2012. It provides information about MongoDB and 10gen, the company that develops MongoDB. It discusses 10gen's founders, management, offices, investors, and customer portfolio. It also summarizes why users should join the MongoDB User Group and covers topics from the meeting including MongoDB operations, what's new in version 2.2, aggregation framework, TTL collections, fragmentation, data center awareness, and developing an application to find nearby restaurants serving fufu using MongoDB.
Webinar: How Banks Use MongoDB as a Tick Database - MongoDB
Learn why MongoDB is spreading like wildfire across capital markets (and really every industry) and then focus in particular on how financial firms are enjoying the developer productivity, low TCO, and unlimited scale of MongoDB as a tick database for capturing, analyzing, and taking advantage of opportunities in tick data.
This document discusses building a social analytics tool using MongoDB from a developer's perspective. It covers using MongoDB for its schema-less data and ability to handle fast read-write operations. Key topics include using aggregation queries to gain insights from data by chaining queries together and filtering/manipulating results at each stage. JavaScript capabilities in MongoDB allow applying business logic directly to data. Examples demonstrate removing garbage data and stopwords. Indexes, current progress, and tips/tricks learned around cloning collections and removing vs dropping are also covered, with a demo planned.
- Data is a precious resource that can last longer than the systems themselves (Tim Berners-Lee)
- Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. It provides reliability, scalability and flexibility.
- Hadoop consists of HDFS for storage and MapReduce for processing. The main nodes include NameNode, DataNodes, JobTracker and TaskTrackers. Tools like Hive, Pig, HBase extend its capabilities for SQL-like queries, data flows and NoSQL access.
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production - Codemotion
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Threat detection, data warehouse optimization, marketing efficiency, and biometric databases are some of the examples covered in this presentation.
The document discusses MongoDB and Hadoop. It provides an overview of how MongoDB and Hadoop can be used together, including use cases in commerce, insurance and fraud detection. It describes the MongoDB Connector for Hadoop, which allows reading and writing to MongoDB from Hadoop tools like MapReduce, Pig and Hive. The document concludes with a demo of a movie recommendation platform that uses both MongoDB and Spark on Hadoop to power a movie browsing web application and generate recommendations.
The document discusses MongoDB and Hadoop. It provides an overview of how MongoDB and Hadoop can be used together, including use cases in commerce, insurance and fraud detection. It describes the MongoDB Connector for Hadoop, which allows reading and writing to MongoDB from Hadoop tools like MapReduce, Pig and Hive. A demo is shown of a movie recommendation application that uses both MongoDB and Spark on Hadoop to power a web application.
MongoDB and Hadoop: Driving Business Insights - MongoDB
MongoDB and Hadoop can work together to solve big data problems facing today's enterprises. We will take an in-depth look at how the two technologies complement and enrich each other with complex analyses and greater intelligence. We will take a deep dive into the MongoDB Connector for Hadoop and how it can be applied to enable new business insights with MapReduce, Pig, and Hive, and demo a Spark application to drive product recommendations.
The document discusses Big Data on Azure and provides an overview of HDInsight, Microsoft's Apache Hadoop-based data platform on Azure. It describes HDInsight cluster types for Hadoop, HBase, Storm and Spark and how clusters can be automatically provisioned on Azure. Example applications and demos of Storm, HBase, Hive and Spark are also presented. The document highlights key aspects of using HDInsight including storage integration and tools for interactive analysis.
MongoDB and NoSQL use cases address the trends of growing and increasingly complex data, cloud computing, and fast application development. MongoDB provides horizontal scaling, the ability to store complex data without pain, compatibility with object-oriented languages and frequent releases, high single-server performance, and cloud friendliness. However, it offers no complex transactions. Suitable use cases include high data volumes, complex data models, real-time analytics, agile development, and cloud deployment. Examples of users are given for content management, operational intelligence, metadata management, high-volume data feeds, marketing personalization, and dictionary services.
MongoDB .local Houston 2019: Building an IoT Streaming Analytics Platform to ... - MongoDB
Corva's analytics platform enables real-time engineering and machine learning predictions and powers faster and safer drilling. The platform utilizes AWS serverless Lambda & extensible, data-driven API with MongoDB to handle 100,000+ requests per minute of streaming sensor data.
Dev Jumpstart: Build Your First App with MongoDB - MongoDB
New to MongoDB? This talk will introduce the philosophy and features of MongoDB. We’ll discuss the benefits of the document-based data model that MongoDB offers by walking through how one can build a simple app. We’ll cover inserting, updating, and querying the database of books. This session will jumpstart your knowledge of MongoDB development, providing you with context for the rest of the day's content.
This document provides an overview and agenda for a presentation on Azure DocumentDB. It begins with an introduction to DocumentDB, then covers getting started by setting it up in Azure, how to work with it using C#, cost and usage details, use cases and limitations. Key points are that DocumentDB is a fully-managed NoSQL document database with horizontal scalability. It provides a familiar programming model and common database functions like indexing, consistency options, and stored procedures.
Hadoop is an open-source framework for distributed storage and processing of large datasets across clusters of commodity hardware. It addresses limitations in traditional RDBMS for big data by allowing scaling to large clusters of commodity servers, high fault tolerance, and distributed processing. The core components of Hadoop are HDFS for distributed storage and MapReduce for distributed processing. Hadoop has an ecosystem of additional tools like Pig, Hive, HBase and more. Major companies use Hadoop to process and gain insights from massive amounts of structured and unstructured data.
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas - MongoDB
This presentation discusses migrating data from other data stores to MongoDB Atlas. It begins by explaining why MongoDB and Atlas are good choices for data management. Several preparation steps are covered, including sizing the target Atlas cluster, increasing the source oplog, and testing connectivity. Live migration, mongomirror, and dump/restore options are presented for migrating between replicasets or sharded clusters. Post-migration steps like monitoring and backups are also discussed. Finally, migrating from other data stores like AWS DocumentDB, Azure CosmosDB, DynamoDB, and relational databases are briefly covered.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! - MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel... - MongoDB
MongoDB Kubernetes operator and MongoDB Open Service Broker are ready for production operations. Learn about how MongoDB can be used with the most popular container orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications. A demo will show you how easy it is to enable MongoDB clusters as an External Service using the Open Service Broker API for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB - MongoDB
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T... - MongoDB
Humana, like many companies, is tackling the challenge of creating real-time insights from data that is diverse and rapidly changing. This is our journey of how we used MongoDB to combined traditional batch approaches with streaming technologies to provide continues alerting capabilities from real-time data streams.
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data - MongoDB
Time series data is increasingly at the heart of modern applications - think IoT, stock trading, clickstreams, social media, and more. With the move from batch to real time systems, the efficient capture and analysis of time series data can enable organizations to better detect and respond to events ahead of their competitors or to improve operational efficiency to reduce cost and risk. Working with time series data is often different from regular application data, and there are best practices you should observe.
This talk covers:
Common components of an IoT solution
The challenges involved with managing time-series data in IoT applications
Different schema designs, and how these affect memory and disk utilization – two critical factors in application performance.
How to query, analyze and present IoT time-series data using MongoDB Compass and MongoDB Charts
At the end of the session, you will have a better understanding of key best practices in managing IoT time-series data with MongoDB.
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys] - MongoDB
Our clients have unique use cases and data patterns that mandate the choice of a particular strategy. To implement these strategies, it is mandatory that we unlearn a lot of relational concepts while designing and rapidly developing efficient applications on NoSQL. In this session, we will talk about some of our client use cases, the strategies we have adopted, and the features of MongoDB that assisted in implementing these strategies.
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2 - MongoDB
Encryption is not a new concept to MongoDB. Encryption may occur in-transit (with TLS) and at-rest (with the encrypted storage engine). But MongoDB 4.2 introduces support for Client Side Encryption, ensuring the most sensitive data is encrypted before ever leaving the client application. Even full access to your MongoDB servers is not enough to decrypt this data. And better yet, Client Side Encryption can be enabled at the "flick of a switch".
This session covers using Client Side Encryption in your applications. This includes the necessary setup, how to encrypt data without sacrificing queryability, and what trade-offs to expect.
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ... - MongoDB
MongoDB Kubernetes operator is ready for prime-time. Learn about how MongoDB can be used with most popular orchestration platform, Kubernetes, and bring self-service, persistent storage to your containerized applications.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! - MongoDB
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset - MongoDB
When you need to model data, is your first instinct to start breaking it down into rows and columns? Mine used to be too. When you want to develop apps in a modern, agile way, NoSQL databases can be the best option. Come to this talk to learn how to take advantage of all that NoSQL databases have to offer and discover the benefits of changing your mindset from the legacy, tabular way of modeling data. We’ll compare and contrast the terms and concepts in SQL databases and MongoDB, explain the benefits of using MongoDB compared to SQL databases, and walk through data modeling basics so you feel confident as you begin using MongoDB.
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart - MongoDB
Join this talk and test session with a MongoDB Developer Advocate where you'll go over the setup, configuration, and deployment of an Atlas environment. Create a service that you can take back in a production-ready state and prepare to unleash your inner genius.
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin... - MongoDB
The document discusses guidelines for ordering fields in compound indexes to optimize query performance. It recommends the E-S-R approach: placing equality fields first, followed by sort fields, and range fields last. This allows indexes to leverage equality matches, provide non-blocking sorts, and minimize scanning. Examples show how indexes ordered by these guidelines can support queries more efficiently by narrowing the search bounds.
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++ - MongoDB
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo... - MongoDB
The document describes a methodology for data modeling with MongoDB. It begins by recognizing the differences between document and tabular databases, then outlines a three step methodology: 1) describe the workload by listing queries, 2) identify and model relationships between entities, and 3) apply relevant patterns when modeling for MongoDB. The document uses examples around modeling a coffee shop franchise to illustrate modeling approaches and techniques.
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive - MongoDB
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang - MongoDB
Virtual assistants are becoming the new norm when it comes to daily life, with Amazon’s Alexa being the leader in the space. As a developer, not only do you need to make web and mobile compliant applications, but you need to be able to support virtual assistants like Alexa. However, the process isn’t quite the same between the platforms.
How do you handle requests? Where do you store your data and work with it to create meaningful responses with little delay? How much of your code needs to change between platforms?
In this session we’ll see how to design and develop applications known as Skills for Amazon Alexa powered devices using the Go programming language and MongoDB.
MongoDB .local Paris 2020: Realm: the secret ingredient for better app... - MongoDB
...to Core Data, appreciated by hundreds of thousands of developers. Learn what makes Realm special and how it can be used to build better applications faster.
MongoDB .local Paris 2020: Upply @MongoDB: When Machine Learning... - MongoDB
It has never been easier to order online and receive delivery in under 48 hours, very often free of charge. This ease of use hides a complex market worth more than $8,000 billion.
Data is well known in the supply chain world (routes, information about goods, customs, etc.), but the value of this operational data remains largely untapped. By combining business expertise and data science, Upply is redefining the fundamentals of the supply chain, enabling every player to overcome market volatility and inefficiency.
Webinar: Managing Real Time Risk Analytics with MongoDB
1. Webinar: Managing Real Time Risk Analytics with MongoDB
will begin at 14:00 GMT / 7:00 AM PDT / 2:00 PM UTC
Audio should start immediately when you log into the event via
Audio Broadcast. You will need a VoIP headset and reliable
internet connection for Audio Broadcast. If you are having issues
connecting, please dial 1-877-668-4493; Access code: 666 722
454.
There is a Q&A following the webinar. You can enter questions in
the chat box to the Host and Presenter.
A recording of the webinar will be available 24 hours after the
event is complete.
For any other issues please email [email protected].
2. Easy to Start, Easy to Develop, Easy to Scale
Managing Real Time Risk Analytics with
MongoDB
10gen, Inc.
November 2012
3. @dmroberts
[email protected] Solution Architect
Based in London
http://www.10gen.com/
4. Last Time
•Document-Oriented
•Dynamic schema
•Agile
•Flexible
•High Performance
•Highly Available
•Horizontal Scale Out

•High Volume Data Feeds
•Tick Data capture
•Risk Analytics
•Product Catalogs & Trade Capture
•P&L Reporting
•Portfolio Management
•Reference Data Management
•Quantitative Analysis
•Automated Trading
5. Key Features for reporting / analytics
•Dynamic Query Language
•Aggregation Framework
•Dynamic & Flexible Schemas
•Atomic Updates to documents
•Upserts
•Horizontal Scale Out
•Map Reduce
•Hadoop Integration
7. Risk Analytics & Reporting
Use Case:
•Collect and aggregate risk data
•Calculate risk / exposures
•Potentially real time
Why MongoDB?
•Collect data from a single or multiple sources
•Different formats
•Documents used to create ‘pre-aggregated’ reports
•Real Time
•Aggregation Framework for reporting
•e.g. exposure for a counter party
•Internal MR or Hadoop connector
•Batch process risk data
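To make the "Aggregation Framework for reporting" bullet concrete, here is a minimal shell sketch of summing exposure per counterparty; the trades collection and its field names (counterparty, mtm, asOfDate) are assumptions for illustration, not taken from the slides:

// Hypothetical schema: { counterparty: "ACME", mtm: 15230.50, asOfDate: ISODate(...), ... }
var exposureByCounterparty = db.trades.aggregate([
  { $match: { asOfDate: ISODate("2012-11-06") } },   // restrict to a single business date
  { $group: {
      _id: "$counterparty",                          // one result document per counterparty
      totalExposure: { $sum: "$mtm" },               // sum mark-to-market exposure
      tradeCount: { $sum: 1 }
  } },
  { $sort: { totalExposure: -1 } }                   // largest exposures first
]);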
8. Portfolio / Position reporting
Use Case:
•Store positions or portfolio information
•Query to find current positions/portfolios
•Query by client or trader
Why MongoDB?
•Customer/client may have many different products
•Aggregation Framework to calculate values and views
•Work on extremely large data sets
•Current and historic data
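Along the same lines, valuing one client's portfolio with the Aggregation Framework could look like the sketch below; the positions collection and clientId field are assumed for illustration, while quantity and price mirror the find() example on a later slide:

db.positions.aggregate([
  { $match: { clientId: "C042" } },                             // positions for one client or trader
  { $project: {
      security: 1,
      marketValue: { $multiply: ["$quantity", "$price"] }      // value each position
  } },
  { $group: { _id: "$security", totalValue: { $sum: "$marketValue" } } }  // roll up per security
]);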
9. Reporting / Analytics requirements
•How quickly do you need answers?
•How often do you need updates?
•Requirements will drive which methods to utilise.
•Generally, the higher the latency tolerance, the greater the insight.
•Choices
•Batch calculations - large complex data volumes
•Pre-Aggregated - specific and very fast
•Real-time calculations - As needed reports and calculations
10. Batch Processing
•MongoDB internal Map Reduce
•Hadoop Map Reduce with MongoDB connector
•Insight after batch run
•For instance every hour or day
•Output to documents/collection
•Fast read once data produced
•Results not up to last millisecond
•Can generate insight from huge datasets
•Rolled up stats
•Source collections -> reporting collection
(Diagram: roll-up collections flowing from raw to hourly, daily, and monthly)
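As a minimal sketch of such a batch roll-up using MongoDB's internal map-reduce, assume a raw ticks collection with ts, security and quantity fields (illustrative names, not from the slides) being rolled up into an hourly reporting collection:

var mapFn = function () {
  var hour = new Date(this.ts.getTime());
  hour.setMinutes(0, 0, 0);                                      // truncate the timestamp to the hour
  emit({ security: this.security, hour: hour },
       { volume: this.quantity, count: 1 });
};

var reduceFn = function (key, values) {
  var out = { volume: 0, count: 0 };
  values.forEach(function (v) { out.volume += v.volume; out.count += v.count; });
  return out;
};

db.raw.mapReduce(mapFn, reduceFn, {
  query: { ts: { $gte: ISODate("2012-11-06"), $lt: ISODate("2012-11-07") } },  // one batch window
  out: { merge: "hourly" }                                                     // merge results into the roll-up collection
});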
11. Sharded MongoDB + Hadoop
(Diagram: five MongoDB shards holding distributed chunks of data, each feeding co-located Hadoop nodes for parallel processing.)
12. Use Query Language
•Query across documents using MongoDB JSON query language
•Infer results in the application code.
•Dynamic - but what happens when we have 1 billion documents.
•Indexing strategy key
•var data = db.pl.find({ positionId: 1234 })[0]
{
"_id" : ObjectId("50990a10fd421cb025407cb1"),
"positionId" : 1234,
"security" : "ORCL",
"quantity" : 1000,
"price" : 30.23,
"currency" : "USD"
}
data.price * data.quantity = 30230.00
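On the "Indexing strategy key" bullet, a sketch of where that strategy might start for this collection; the compound index reflects an assumed access pattern, not one stated in the slides:

db.pl.ensureIndex({ positionId: 1 })               // turns the find() above into an index lookup rather than a collection scan
db.pl.ensureIndex({ positionId: 1, currency: 1 })  // hypothetical compound index if queries also filter by currency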
13. Leverage schema design
•Group useful data together into documents
•Utilise upsert and $inc functionality of MongoDB
•Pre-aggregate reports
•$inc incrementing of counters is lightweight.
•Fast pre-calculated data
•Low latency retrieval
•http://docs.mongodb.org/manual/use-cases/pre-aggregated-reports/
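To make the upsert/$inc pattern concrete, here is a minimal sketch of maintaining a pre-aggregated daily P&L document per trader, in the spirit of the pre-aggregated reports use case linked above; the collection and field names are assumptions for illustration:

db.daily_pl.update(
  { _id: { trader: "jsmith", day: "2012-11-06" } },   // one document per trader per day
  { $inc: {
      trades: 1,                                      // count of fills
      grossPnl: 1250.75,                              // increment with this fill's P&L
      "bySecurity.ORCL": 1250.75                      // running total per security
  } },
  { upsert: true }                                    // first write of the day creates the document
);

// Reading the report back is a single, cheap document fetch:
db.daily_pl.findOne({ _id: { trader: "jsmith", day: "2012-11-06" } });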