Current Topics in MapReduce and Virtualization. Presented by Chris Bunch at UCSB, CS595D - Seminar on Large-Scale Data Management, February 2, 2010. http://cs.ucsb.edu/~cgb
To Recap: The “Comparison Paper” by DeWitt, Stonebraker, et al. [1] claims: data movement is fast for Hadoop MR but slow for Vertica and DBMS-X; queries are fast on Vertica and DBMS-X and slow on Hadoop MR. Conclusion: Hadoop MR bad, Vertica good.
Specifically: The comparison paper claims Hadoop MR is slow because: Hadoop MR must always read the entire file; MR cannot enforce a schema on the input data (parsing it becomes a bottleneck); fault tolerance requires data shuffling between Map and Reduce.
Update: In the Jan. 2010 CACM, DeWitt and Stonebraker [2] update their point of view: Hadoop MR and relational DBs complement each other. Use Hadoop MR for “complex” or “quick-and-dirty” analyses; use relational DBs for everything else.
Another Update: Dean and Ghemawat also respond in the Jan. 2010 CACM [3]: The problems are with Hadoop MR, not MR itself. MR does not need to read all the input data; it can use BigTable / HBase to get a subset of the input data for processing.
Continuing: MR input / output doesn’t need to be simple text files (use BigTable / HBase). MR input / output data can have schemas and can be stored as Protocol Buffers. Parsing a string: 1731 ns / record. Parsing a Protocol Buffer: 20 ns / record.
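A minimal sketch of the difference between parsing delimited text and decoding a schema-aware binary record. It uses Python's struct module as a stand-in for a binary encoding; it is not Protocol Buffers, and the record layout and function names are assumptions made for illustration only.

```python
import struct

# Hypothetical binary layout: two signed 32-bit ints, then a 16-bit URL length, then the URL.
HEADER = "<iiH"
HEADER_SIZE = struct.calcsize(HEADER)

def encode_binary(pagerank, pageurl, avgduration):
    url = pageurl.encode("utf-8")
    return struct.pack(HEADER, pagerank, avgduration, len(url)) + url

def parse_binary(buf):
    # Field positions and types are known up front; no delimiter scan, no int() on text.
    pagerank, avgduration, n = struct.unpack_from(HEADER, buf, 0)
    pageurl = buf[HEADER_SIZE:HEADER_SIZE + n].decode("utf-8")
    return pagerank, pageurl, avgduration

def parse_text(line):
    # Every record must be split on the delimiter and each number converted from text.
    pagerank, pageurl, avgduration = line.split("|")
    return int(pagerank), pageurl, int(avgduration)
```

Both functions return the same tuple for the same logical record; the text version does the string scanning and number conversion that the slide identifies as the bottleneck.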
Fundamentally: Bad representation of data: 137|http://www.somehost.com/index.html|602 Good representation of data: message Rankings { required string pageurl = 1; required int32 pagerank = 2; required int32 avgduration = 3; }
Conclusion: DeWitt and Stonebraker’s arguments are valid against Hadoop MR but not against MR itself. Dean’s rebuttal clearly shows that Google MR overcomes DeWitt’s objections to it. Hadoop MR has no native support for Protocol Buffer serialization [4] (a hybrid approach is possible).
Part 2: Virtualization. A software layer for isolated execution of one or more virtual guest systems on real hardware (multicores). Improves hardware utilization and portability, among other benefits. Multiplexes hardware resources between guests.
Virtualization: Emulates the ISA (captures privileged instructions) and devices, and manages state. Without OS modification: full virtualization. With OS modification: paravirtualization. Hardware support for virtualization exists in modern AMD / Intel processors.
Migrating VMs: Why? Load balancing; online maintenance; proactive fault tolerance; power management.
Live Migration of Virtual Machines [5]: Authored by Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, and Andrew Warfield (Cambridge and University of Copenhagen). Published in NSDI 2005.
In a Nutshell: Perform continuous migration while the system is running to ensure that when migration is needed, it can be done quickly. Recorded service downtimes as low as 60 ms using Xen.
Motivation: Process-level migration is hard. The small interface between OS and VMM makes VM migration much easier. The goal is to minimize application downtime and total migration time, and to ensure that migration does not impact active services.
Memory Migration Techniques: Push phase: the source sends memory pages to the destination VM. Stop-and-copy phase: the source stops, sends pages, and starts the destination. Pull phase: the destination retrieves memory pages from the source VM as needed. This hybrid technique uses the first two.
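The combination of a push phase followed by stop-and-copy can be sketched as a small simulation. This is a toy model under stated assumptions, not the paper's implementation: the function name, the round limit, the stop threshold, and the dirtying model passed in as dirtied_during are all hypothetical.

```python
def precopy_rounds(total_pages, dirtied_during, max_rounds=30, stop_threshold=50):
    """Simulate iterative pre-copy.

    dirtied_during(n) is a caller-supplied model: how many pages the running
    VM dirties while n pages are being pushed to the destination.
    Returns (pages pushed in each round, pages left for stop-and-copy).
    """
    to_send = total_pages           # round 1 pushes all of memory
    pushed = []
    for _ in range(max_rounds):
        if to_send <= stop_threshold:
            break                   # few enough pages remain: stop the VM and copy them
        pushed.append(to_send)
        # Pages dirtied while this round was in flight must be sent again.
        to_send = min(dirtied_during(to_send), total_pages)
    return pushed, to_send
```

With a VM that dirties a quarter as many pages as are pushed per round, precopy_rounds(1000, lambda n: n // 4) converges after three rounds and leaves 15 pages for the stop-and-copy phase. If the VM dirties pages as fast as they are sent, the loop only ends at max_rounds, which is the hard-to-migrate case.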
Migrating Local Resources: To migrate network traffic, simply send an ARP reply with the new destination. This does not always work; the destination VM can also be created with the same MAC address. The local disk storage problem is not addressed; for now, use NFS.
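For illustration, the 28-byte payload of an unsolicited ARP reply can be built with the standard library as below. This is only a sketch: the MAC and IP values used with it are made up, the choice of a broadcast target hardware address is an assumption (conventions vary), and actually sending the frame requires a raw socket and privileges, which is not shown.

```python
import socket
import struct

def build_arp_reply(mac, ip):
    """Build the ARP payload announcing that `ip` is now reachable at `mac`."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                    # hardware type: Ethernet
        0x0800,               # protocol type: IPv4
        6, 4,                 # hardware and protocol address lengths
        2,                    # opcode 2: reply
        mac_bytes, ip_bytes,  # sender: the migrated VM at its new location
        broadcast, ip_bytes,  # target: broadcast MAC (assumption), same IP
    )
```

Hosts and switches that accept such a reply update their ARP caches so that traffic for the IP address flows to the new host.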
The Algorithm
Writable Working Sets: Modified pages need to be re-copied. The set of such pages is dubbed the “Writable Working Set” (WWS). Measure it by reading the dirty bitmap every 50 ms. Small WWS ⇔ easy to migrate; large WWS ⇔ hard to migrate.
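A rough sketch of how a WWS size could be derived from periodic dirty-bitmap samples. Each sample is modeled as the set of page numbers dirtied in one sampling interval; the function name and the windowing scheme are assumptions for illustration, not the paper's tooling.

```python
def wws_sizes(samples, window):
    """Return the number of distinct dirtied pages in each block of `window` samples.

    samples: list of sets, one per sampling interval (e.g. every 50 ms),
             each holding the page numbers found dirty in that interval.
    """
    sizes = []
    for start in range(0, len(samples), window):
        dirtied = set().union(*samples[start:start + window])
        sizes.append(len(dirtied))
    return sizes
```

A workload that keeps rewriting the same few pages yields small values here even if it writes constantly, which is why it is the number of distinct dirtied pages, not the write rate alone, that determines how hard the VM is to migrate.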
WWS for SPEC CINT2000
Implementation Issues: Managed migration: A daemon in a separate VM copies pages from source to destination. Requires modification to Xen so that the daemon can read shadow page tables. Can stop the source for the final copy easily.
Implementation Issues: Self migration: The source copies pages to the destination. No modification to source OS needed. Stopping the source for the final copy is hard: first stop everything except the migrator program, then copy the final dirty pages.
Rate Limiting: If the migration process uses too much bandwidth, it can hamper other processes. Relies on the administrator specifying a min and max bandwidth to use. Seems like it could be determined programmatically.
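One possible policy for choosing the bandwidth of the next copy round within the administrator's bounds is sketched below. The function name and the default headroom value are assumptions for illustration; the slide only states that a minimum and maximum are supplied.

```python
def next_round_rate(dirty_rate_mbit, min_mbit, max_mbit, headroom_mbit=50):
    """Pick a transfer rate slightly above the observed page-dirtying rate,
    clamped to the administrator-supplied [min_mbit, max_mbit] range."""
    wanted = dirty_rate_mbit + headroom_mbit
    return max(min_mbit, min(max_mbit, wanted))
```

Sending just faster than the VM dirties pages lets each round shrink the remaining set while leaving the rest of the link to active services; once the wanted rate exceeds the maximum, further rounds cannot converge and stop-and-copy is the remaining option.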
Optimizations: Don’t copy pages that are frequently dirtied. Slow down write-heavy services (but don’t do this to essential processes). Free all unused cache pages when migration starts (this can incur a greater cost if they are needed later).
Evaluation: Hardware: 2 Dell PE-2650 servers, each with dual Xeon 2 GHz CPUs (one disabled), 2 GB memory, Gigabit Ethernet. Software: XenLinux 2.4.27. Disk attached via NAS.
SPECweb99
Quake 3 Server
Memory Muncher
Future Work: Intelligently choose the placement and movement of VMs in a cluster. Expand this technique to work for VMs not on the same subnet. Add support for migrating hard drives (mirrored disks are suggested for now).
Conclusions: This new technique allows us to migrate VMs with low downtime. Works well on applications with a small WWS. Optimizations may help other cases but could impact application performance. Future work looks promising.
Live Migration of Virtual Machine Based on Full System Trace and Replay [6]: Authored by Haikun Liu, Hai Jin, Xiaofei Liao, Liting Hu, and Chen Yu (Huazhong University). Published in HPDC 2009.
In a Nutshell: Previous methods migrate VMs but incur too much downtime and use too much network bandwidth. This work records up to 72.4% reduction in app downtime, up to 31.5% reduction in migration time, and up to 95.9% reduction in data needed to synchronize VM state.
Motivation: Pre-copy methods fail in three ways: they can’t handle memory-intensive operations; slowing down write-heavy processes is infeasible in real-world applications; the algorithm doesn’t recover the CPU’s cache, resulting in cache and TLB misses and possible performance degradation.
Goals: Minimize application downtime. Minimize total migration time. Minimize total data transferred. All are similar to goals from previous work.
Basic Idea: Synchronize the state of the two machines. The second machine will then follow the same states as the first unless a non-deterministic event occurs. Remedy this by keeping a log of non-deterministic events (time, external input) and replaying them.
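The idea of logging non-deterministic events and replaying them can be shown with a toy state machine. This is a sketch only: the state update rule and the use of random numbers as a stand-in for time and external input are assumptions, and a real system logs events at the VM level.

```python
import random

def run(state, steps, log, replay=False):
    """Advance a toy machine from `state` for `steps` steps.

    Recording (replay=False): each non-deterministic input is drawn and appended to `log`.
    Replaying (replay=True): inputs are read back from `log` instead of being drawn.
    """
    for step in range(steps):
        if replay:
            value = log[step]
        else:
            value = random.randint(0, 255)   # stands in for time or external input
            log.append(value)
        state = (state * 31 + value) % 1_000_003   # deterministic given the input
    return state

# Source runs from a checkpoint and records; target starts from the same
# checkpoint and replays the log to arrive at the same state.
checkpoint = 42
log = []
source_state = run(checkpoint, 100, log)
target_state = run(checkpoint, 100, log, replay=True)
```

Only the log crosses the network after the initial checkpoint, which is where the reduction in synchronization data comes from when the log is small relative to the memory being modified.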
Getting Around Limitations: The checkpoint / replay scheme succeeds: it can handle memory-intensive operations; it doesn’t slow down write-heavy processes; it does recover the CPU’s cache, avoiding cache and TLB misses and possible performance degradation.
Specifically
Implementation Details: Logging and sending logs are done by the source. Replay is performed by the target. Also entails monitoring R_log and initializing the migration.
Implementation Details: Checkpointing: Pause the source VM, change all pages to read-only, and unpause the VM. Start copying pages; if writes come, redirect them to a Copy-On-Write (COW) buffer. When done, merge the pages and the COW buffer.
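The checkpointing steps above can be sketched with an in-memory model. The class and method names are hypothetical, and pages are modeled as dictionary entries rather than real memory with page protection.

```python
class Checkpointer:
    """Toy model of checkpointing with a copy-on-write (COW) buffer."""

    def __init__(self, pages):
        self.pages = dict(pages)   # page number -> contents
        self.cow = {}              # writes that arrive during the checkpoint
        self.checkpointing = False

    def begin(self):
        # Corresponds to marking all pages read-only and resuming the VM.
        self.checkpointing = True

    def write(self, page_no, data):
        if self.checkpointing:
            self.cow[page_no] = data      # redirect the write; the page stays untouched
        else:
            self.pages[page_no] = data

    def read(self, page_no):
        # The running VM must see its own latest writes.
        if page_no in self.cow:
            return self.cow[page_no]
        return self.pages[page_no]

    def copy_out(self):
        # The image reflects memory exactly as it was when begin() was called.
        return dict(self.pages)

    def end(self):
        # Merge the COW buffer back into the pages and leave checkpoint mode.
        self.pages.update(self.cow)
        self.cow.clear()
        self.checkpointing = False
```

The VM keeps running during the copy, yet the checkpoint image stays consistent because no page changes until the merge.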
Implementation Details: File system access: must be SAN. Reading / writing is forbidden on the target VM and is redirected to the log file (external input). Network redirection: same as before, uses ARP broadcasting.
Experimental Setup: Hardware: 2 AMD Athlon 3500+ CPUs, 1 GB DDR RAM (VM only uses half), Gigabit Ethernet. Software: UMLinux w/ RHEL AS3. Disk attached via NAS.
Application Downtime
Total Migration Time
Data Transferred
Lessons: Looking at kernel-build: it has low non-determinism, so R_log is small. Total migration time is long because R_replay ≈ R_log. Recall we want apps with the original condition, R_replay >> R_log, for best migration time.
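Why the gap between the two rates matters can be seen from a simplified model, assumed here for illustration: if the source produces log at R_log and the target consumes it at R_replay, an existing backlog shrinks at R_replay - R_log. The function name and units are hypothetical.

```python
def catch_up_time(backlog, r_replay, r_log):
    """Time for the target to drain `backlog` of unreplayed log.

    backlog is in the same log units that the two rates consume or produce per second.
    Returns infinity when replay is no faster than logging.
    """
    if r_replay <= r_log:
        return float("inf")
    return backlog / (r_replay - r_log)
```

With R_replay much larger than R_log the denominator is close to R_replay and the target catches up quickly; as the rates approach each other the time grows without bound, which matches the long migration time observed for kernel-build.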
Synchronization Data
Summary: Pros: Excels when R_replay >> R_log. Incurs less application downtime than previous work. Total migration time is lower than previously. Migrates with less traffic than previously.
Summary: Cons: Works only on single-processor systems. Works only when ARP redirect works. Performs poorly when R_replay ≈ R_log. Still does not address regular hard drives. Large size makes migration infeasible.
References
[1] Pavlo et al., A Comparison of Approaches to Large-Scale Data Analysis, SIGMOD 2009
[2] Stonebraker et al., MapReduce and Parallel DBMSs: Friends or Foes?, CACM, Jan. 2010
[3] Dean et al., MapReduce: A Flexible Data Processing Tool, CACM, Jan. 2010
[4] Add serialization support for Protocol Buffers, http://issues.apache.org/jira/browse/MAPREDUCE-377
[5] Clark et al., Live Migration of Virtual Machines, NSDI 2005
[6] Liu et al., Live Migration of Virtual Machine Based on Full System Trace and Replay, HPDC 2009
Ad

More Related Content

What's hot (19)

3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta
Databricks
 
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
DataStax Academy
 
Cloud
CloudCloud
Cloud
Damilola Mosaku
 
Glynn Bird – Cloudant – Building applications for success.- NoSQL matters Bar...
Glynn Bird – Cloudant – Building applications for success.- NoSQL matters Bar...Glynn Bird – Cloudant – Building applications for success.- NoSQL matters Bar...
Glynn Bird – Cloudant – Building applications for success.- NoSQL matters Bar...
NoSQLmatters
 
ADDO 2021: Why and how to include database changes in the deployment pipeline
ADDO 2021: Why and how to include database changes in the deployment pipelineADDO 2021: Why and how to include database changes in the deployment pipeline
ADDO 2021: Why and how to include database changes in the deployment pipeline
Eduardo Piairo
 
Lessons from Large-Scale Cloud Software at Databricks
Lessons from Large-Scale Cloud Software at DatabricksLessons from Large-Scale Cloud Software at Databricks
Lessons from Large-Scale Cloud Software at Databricks
Matei Zaharia
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
James Serra
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
DataWorks Summit/Hadoop Summit
 
Move a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudMove a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloud
Ike Ellis
 
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Spark Summit
 
Couchbase and Apache Spark
Couchbase and Apache SparkCouchbase and Apache Spark
Couchbase and Apache Spark
Matt Ingenthron
 
Seamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with ConnectSeamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with Connect
Precisely
 
Spark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with SparkSpark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with Spark
Matt Ingenthron
 
Review Oracle OpenWorld 2015 - Overview, Main themes, Announcements and Future
Review Oracle OpenWorld 2015 - Overview, Main themes, Announcements and FutureReview Oracle OpenWorld 2015 - Overview, Main themes, Announcements and Future
Review Oracle OpenWorld 2015 - Overview, Main themes, Announcements and Future
Lucas Jellema
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
Databricks
 
Introduction to couchbase
Introduction to couchbaseIntroduction to couchbase
Introduction to couchbase
Dipti Borkar
 
It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...
It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...
It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...
Lucas Jellema
 
Simplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta LakeSimplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta Lake
Databricks
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache Beam
DataWorks Summit
 
3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta3D: DBT using Databricks and Delta
3D: DBT using Databricks and Delta
Databricks
 
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
DataStax Academy
 
Glynn Bird – Cloudant – Building applications for success.- NoSQL matters Bar...
Glynn Bird – Cloudant – Building applications for success.- NoSQL matters Bar...Glynn Bird – Cloudant – Building applications for success.- NoSQL matters Bar...
Glynn Bird – Cloudant – Building applications for success.- NoSQL matters Bar...
NoSQLmatters
 
ADDO 2021: Why and how to include database changes in the deployment pipeline
ADDO 2021: Why and how to include database changes in the deployment pipelineADDO 2021: Why and how to include database changes in the deployment pipeline
ADDO 2021: Why and how to include database changes in the deployment pipeline
Eduardo Piairo
 
Lessons from Large-Scale Cloud Software at Databricks
Lessons from Large-Scale Cloud Software at DatabricksLessons from Large-Scale Cloud Software at Databricks
Lessons from Large-Scale Cloud Software at Databricks
Matei Zaharia
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
James Serra
 
Preventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive IndustryPreventative Maintenance of Robots in Automotive Industry
Preventative Maintenance of Robots in Automotive Industry
DataWorks Summit/Hadoop Summit
 
Move a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloudMove a successful onpremise oltp application to the cloud
Move a successful onpremise oltp application to the cloud
Ike Ellis
 
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Spark Summit
 
Couchbase and Apache Spark
Couchbase and Apache SparkCouchbase and Apache Spark
Couchbase and Apache Spark
Matt Ingenthron
 
Seamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with ConnectSeamless, Real-Time Data Integration with Connect
Seamless, Real-Time Data Integration with Connect
Precisely
 
Spark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with SparkSpark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with Spark
Matt Ingenthron
 
Review Oracle OpenWorld 2015 - Overview, Main themes, Announcements and Future
Review Oracle OpenWorld 2015 - Overview, Main themes, Announcements and FutureReview Oracle OpenWorld 2015 - Overview, Main themes, Announcements and Future
Review Oracle OpenWorld 2015 - Overview, Main themes, Announcements and Future
Lucas Jellema
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
Databricks
 
Introduction to couchbase
Introduction to couchbaseIntroduction to couchbase
Introduction to couchbase
Dipti Borkar
 
It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...
It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...
It's a wrap - closing keynote for nlOUG Tech Experience 2017 (16th June, The ...
Lucas Jellema
 
Simplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta LakeSimplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta Lake
Databricks
 
Realizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache BeamRealizing the promise of portable data processing with Apache Beam
Realizing the promise of portable data processing with Apache Beam
DataWorks Summit
 

Viewers also liked (20)

Graphically understand and interactively explore your Data Lineage
Graphically understand and interactively explore your Data LineageGraphically understand and interactively explore your Data Lineage
Graphically understand and interactively explore your Data Lineage
Mohammad Ahmed
 
Design Patterns
Design PatternsDesign Patterns
Design Patterns
Evandro Venancio
 
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial FraudGraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
Neo4j
 
GraphDay Stockholm - Graphs in Action
GraphDay Stockholm - Graphs in ActionGraphDay Stockholm - Graphs in Action
GraphDay Stockholm - Graphs in Action
Neo4j
 
GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...
GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...
GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...
Neo4j
 
Webinar: Intro to Cypher
Webinar: Intro to CypherWebinar: Intro to Cypher
Webinar: Intro to Cypher
Neo4j
 
GraphDay Stockholm - Telia Zone
GraphDay Stockholm - Telia Zone GraphDay Stockholm - Telia Zone
GraphDay Stockholm - Telia Zone
Neo4j
 
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j
 
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph Databases
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph DatabasesGraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph Databases
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph Databases
Neo4j
 
Identity and Access Management
Identity and Access ManagementIdentity and Access Management
Identity and Access Management
Neo4j
 
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph TechnologyThe Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
Neo4j
 
GraphTalks Rome - Selecting the right Technology
GraphTalks Rome - Selecting the right TechnologyGraphTalks Rome - Selecting the right Technology
GraphTalks Rome - Selecting the right Technology
Neo4j
 
Intro to Neo4j presentation
Intro to Neo4j presentationIntro to Neo4j presentation
Intro to Neo4j presentation
jexp
 
GraphTalks Rome - Introducing Neo4j
GraphTalks Rome - Introducing Neo4jGraphTalks Rome - Introducing Neo4j
GraphTalks Rome - Introducing Neo4j
Neo4j
 
GraphTalks Rome - Identity and Access Management
GraphTalks Rome - Identity and Access ManagementGraphTalks Rome - Identity and Access Management
GraphTalks Rome - Identity and Access Management
Neo4j
 
GraphTalks Rome - The Italian Business Graph
GraphTalks Rome - The Italian Business GraphGraphTalks Rome - The Italian Business Graph
GraphTalks Rome - The Italian Business Graph
Neo4j
 
Knowledge Architecture: Graphing Your Knowledge
Knowledge Architecture: Graphing Your KnowledgeKnowledge Architecture: Graphing Your Knowledge
Knowledge Architecture: Graphing Your Knowledge
Neo4j
 
Journey of The Connected Enterprise - Knowledge Graphs - Smart Data
Journey of The Connected Enterprise - Knowledge Graphs - Smart DataJourney of The Connected Enterprise - Knowledge Graphs - Smart Data
Journey of The Connected Enterprise - Knowledge Graphs - Smart Data
Benjamin Nussbaum
 
How to Design Retail Recommendation Engines with Neo4j
How to Design Retail Recommendation Engines with Neo4jHow to Design Retail Recommendation Engines with Neo4j
How to Design Retail Recommendation Engines with Neo4j
Neo4j
 
Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...
Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...
Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...
Neo4j
 
Graphically understand and interactively explore your Data Lineage
Graphically understand and interactively explore your Data LineageGraphically understand and interactively explore your Data Lineage
Graphically understand and interactively explore your Data Lineage
Mohammad Ahmed
 
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial FraudGraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
Neo4j
 
GraphDay Stockholm - Graphs in Action
GraphDay Stockholm - Graphs in ActionGraphDay Stockholm - Graphs in Action
GraphDay Stockholm - Graphs in Action
Neo4j
 
GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...
GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...
GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...
Neo4j
 
Webinar: Intro to Cypher
Webinar: Intro to CypherWebinar: Intro to Cypher
Webinar: Intro to Cypher
Neo4j
 
GraphDay Stockholm - Telia Zone
GraphDay Stockholm - Telia Zone GraphDay Stockholm - Telia Zone
GraphDay Stockholm - Telia Zone
Neo4j
 
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in GraphdatenbankenNeo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j GraphTalks - Einführung in Graphdatenbanken
Neo4j
 
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph Databases
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph DatabasesGraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph Databases
GraphDay Stockholm - Graphs in the Real World: Top Use Cases for Graph Databases
Neo4j
 
Identity and Access Management
Identity and Access ManagementIdentity and Access Management
Identity and Access Management
Neo4j
 
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph TechnologyThe Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
The Five Graphs of Government: How Federal Agencies can Utilize Graph Technology
Neo4j
 
GraphTalks Rome - Selecting the right Technology
GraphTalks Rome - Selecting the right TechnologyGraphTalks Rome - Selecting the right Technology
GraphTalks Rome - Selecting the right Technology
Neo4j
 
Intro to Neo4j presentation
Intro to Neo4j presentationIntro to Neo4j presentation
Intro to Neo4j presentation
jexp
 
GraphTalks Rome - Introducing Neo4j
GraphTalks Rome - Introducing Neo4jGraphTalks Rome - Introducing Neo4j
GraphTalks Rome - Introducing Neo4j
Neo4j
 
GraphTalks Rome - Identity and Access Management
GraphTalks Rome - Identity and Access ManagementGraphTalks Rome - Identity and Access Management
GraphTalks Rome - Identity and Access Management
Neo4j
 
GraphTalks Rome - The Italian Business Graph
GraphTalks Rome - The Italian Business GraphGraphTalks Rome - The Italian Business Graph
GraphTalks Rome - The Italian Business Graph
Neo4j
 
Knowledge Architecture: Graphing Your Knowledge
Knowledge Architecture: Graphing Your KnowledgeKnowledge Architecture: Graphing Your Knowledge
Knowledge Architecture: Graphing Your Knowledge
Neo4j
 
Journey of The Connected Enterprise - Knowledge Graphs - Smart Data
Journey of The Connected Enterprise - Knowledge Graphs - Smart DataJourney of The Connected Enterprise - Knowledge Graphs - Smart Data
Journey of The Connected Enterprise - Knowledge Graphs - Smart Data
Benjamin Nussbaum
 
How to Design Retail Recommendation Engines with Neo4j
How to Design Retail Recommendation Engines with Neo4jHow to Design Retail Recommendation Engines with Neo4j
How to Design Retail Recommendation Engines with Neo4j
Neo4j
 
Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...
Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...
Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...
Neo4j
 
Ad

Similar to Presentation on Large Scale Data Management (20)

HIGH AVAILABILITY AND LOAD BALANCING FOR POSTGRESQL DATABASES: DESIGNING AND ...
HIGH AVAILABILITY AND LOAD BALANCING FOR POSTGRESQL DATABASES: DESIGNING AND ...HIGH AVAILABILITY AND LOAD BALANCING FOR POSTGRESQL DATABASES: DESIGNING AND ...
HIGH AVAILABILITY AND LOAD BALANCING FOR POSTGRESQL DATABASES: DESIGNING AND ...
IJDMS
 
Hadoop
HadoopHadoop
Hadoop
Ramakrishna Reddy Bijjam
 
Clusters (Distributed computing)
Clusters (Distributed computing)Clusters (Distributed computing)
Clusters (Distributed computing)
Sri Prasanna
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
sreehari orienit
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
Hadoop User Group
 
Solutions for Exercises: Distributed Systems 5th Edition by Coulouris & Dolli...
Solutions for Exercises: Distributed Systems 5th Edition by Coulouris & Dolli...Solutions for Exercises: Distributed Systems 5th Edition by Coulouris & Dolli...
Solutions for Exercises: Distributed Systems 5th Edition by Coulouris & Dolli...
industriale82
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniques
ijsrd.com
 
How to scale your web app
How to scale your web appHow to scale your web app
How to scale your web app
Georgio_1999
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
programmermag
 
Mapreduce is for Hadoop Ecosystem in Data Science
Mapreduce is for Hadoop Ecosystem in Data ScienceMapreduce is for Hadoop Ecosystem in Data Science
Mapreduce is for Hadoop Ecosystem in Data Science
DakshGoti2
 
Virtual Machine Migration Techniques in Cloud Environment: A Survey
Virtual Machine Migration Techniques in Cloud Environment: A SurveyVirtual Machine Migration Techniques in Cloud Environment: A Survey
Virtual Machine Migration Techniques in Cloud Environment: A Survey
ijsrd.com
 
Challenges in Cloud Computing – VM Migration
Challenges in Cloud Computing – VM MigrationChallenges in Cloud Computing – VM Migration
Challenges in Cloud Computing – VM Migration
Sarmad Makhdoom
 
Everything comes in 3's
Everything comes in 3'sEverything comes in 3's
Everything comes in 3's
delagoya
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale Systems
Directi Group
 
Porting Spring PetClinic to GigaSpaces
Porting Spring PetClinic to GigaSpacesPorting Spring PetClinic to GigaSpaces
Porting Spring PetClinic to GigaSpaces
Uri Cohen
 
Hadoop installation by santosh nage
Hadoop installation by santosh nageHadoop installation by santosh nage
Hadoop installation by santosh nage
Santosh Nage
 
Presentation on Large Scale Data Management

  • 1. Current Topics in MapReduce and Virtualization Presented by Chris Bunch at UCSB CS595D - Seminar on Large-Scale Data Management February 2, 2010 http://cs.ucsb.edu/~cgb
  • 2. To Recap: The “Comparison Paper” by DeWitt, Stonebraker, et al. [1] claims: Data movement is fast for Hadoop MR but slow for Vertica and DBMS-X Queries are fast on Vertica and DBMS-X and slow on Hadoop MR Conclusion: Hadoop MR bad, Vertica good
  • 3. Specifically Comparison paper claims Hadoop MR is slow because: H MR must always read the entire file MR cannot enforce a schema in the input data (parsing it becomes a bottleneck) Fault tolerance requires data shuffling between Map and Reduce
  • 4. Update In Jan. 2010’s CACM, DeWitt and Stonebraker [2] update their point of view: Hadoop MR and relational DBs complement each other Use Hadoop MR for “complex” or “quick-and-dirty” analyses. Use relational DBs for everything else.
  • 5. Another Update Dean and Ghemawat also respond in Jan. 2010’s CACM [3]: Problems are with H MR, not MR itself MR does not need to read all the input data Can use BigTable / HBase to get a subset of the input data for processing
  • 6. Continuing MR input / output doesn’t need to be simple text files (use BigTable / HBase) MR input / output data can have schemas Can be stored as Protocol Buffers Parsing a string: 1731 ns / record Parsing a Protocol Buffer: 20 ns / record
  • 7. Fundamentally: Bad Representation of Data: 137|http://www.somehost.com/index.html|602 Good Representation of Data: message Rankings { required string pageurl = 1; required int32 pagerank = 2; required int32 avgduration = 3; }
  • 8. Conclusion DeWitt and Stonebraker’s arguments are valid against Hadoop MR but not against MR itself Dean’s rebuttal clearly shows that Google MR overcomes DeWitt’s objections to it No native support for PB Serialization in Hadoop MR [4] (hybrid approach possible)
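The parsing point from slides 6 and 7 can be sketched in a few lines. This is only an illustration: it uses Python's standard `struct` module as a stand-in for Protocol Buffers (it is not the Protocol Buffers wire format), the field order of the text record is assumed to be pagerank|pageurl|avgduration, and the function names and binary layout are hypothetical.

```python
import struct

# Text form from slide 7; field order is an assumption: pagerank|pageurl|avgduration
TEXT_RECORD = "137|http://www.somehost.com/index.html|602"

# Hypothetical fixed header: two little-endian int32 values and a uint16 URL length.
HEADER = struct.Struct("<iiH")


def parse_text(line):
    """Split a delimited record and convert the numeric fields from strings."""
    pagerank, pageurl, avgduration = line.split("|")
    return {
        "pageurl": pageurl,
        "pagerank": int(pagerank),
        "avgduration": int(avgduration),
    }


def encode_binary(rec):
    """Pack a record as the fixed header followed by the UTF-8 URL bytes."""
    url = rec["pageurl"].encode("utf-8")
    return HEADER.pack(rec["pagerank"], rec["avgduration"], len(url)) + url


def parse_binary(buf):
    """Read the typed header directly; no string-to-integer conversion is needed."""
    pagerank, avgduration, url_len = HEADER.unpack_from(buf, 0)
    url = buf[HEADER.size:HEADER.size + url_len].decode("utf-8")
    return {"pageurl": url, "pagerank": pagerank, "avgduration": avgduration}


record = parse_text(TEXT_RECORD)
roundtrip = parse_binary(encode_binary(record))
```

With a schema, the reader knows each field's type and position up front, so the numeric fields come out as integers without scanning for delimiters. The sketch makes no claim about the per-record timings quoted on slide 6.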
  • 9. Part 2: Virtualization Software layer for isolated execution of 1+ virtual guest system on real hardware (multicores) Improves hardware utilization, improves portability, other benefits Multiplexes hardware resources between guests
  • 10. Virtualization Emulates ISA (captures privileged instructions) and devices, manages state Without OS modification: full virtualization With OS modification: paravirtualization Hardware support for virtualization (modern AMD / Intel processors)
  • 11. Migrating VMs: Why? Load balancing Online maintenance Proactive fault tolerance Power management
  • 12. Live Migration of Virtual Machines [5] Authored by Christopher Clark, Keir Fraser, Steven Hand, Jacob Gorm Hansen, Eric Jul, Christian Limpach, Ian Pratt, and Andrew Warfield (Cambridge and University of Copenhagen) Published in NSDI 2005
  • 13. In a Nutshell Perform continuous migration while the system is running to ensure that when migration is needed, it can be done quickly. Recorded service downtimes as low as 60ms using Xen
  • 14. Motivation Process-level migration is hard Small interface between OS and VMM makes VM migration much easier Goal is to minimize application downtime, total migration time, and ensure that migration does not impact active services
  • 15. Memory Migration Techniques Push phase: Source sends memory pages to destination VM Stop-and-copy phase: Source stops, sends pages, starts destination Pull phase: Destination retrieves memory pages from source VM as needed This hybrid technique uses the first two
  • 16. Migrating Local Resources To migrate network traffic, simply send an ARP reply with the new destination Does not always work Can also create destination VM with same MAC address Local disk storage problem not addressed For now, use NFS
  • 18. Writable Working Sets Modified pages need to be re-copied over Dubbed the “Writable Working Set” Measure this by reading the dirty bitmap every 50 ms Small WWS ⇔ easy to migrate Large WWS ⇔ hard to migrate
  • 19. WWS for SPEC CINT2000
  • 20. Implementation Issues Managed migration: Daemon in a separate VM copies pages from source to destination Requires modification to Xen so that daemon can read shadow page tables Can stop source for final copy easily
  • 21. Implementation Issues Self migration: Source copies pages to destination No modification to source OS needed Stopping source for final copy is hard First stop everything except migrator program, then copy final dirty pages
  • 22. Rate Limiting If the migration process uses too much bandwidth, it can hamper other processes Relies on administrator specifying a min and max bandwidth to use Seems like it could be determined programmatically
  • 23. Optimizations Don’t copy pages that are frequently dirtied Slow down write-heavy services Don’t do this to essential processes Free all unused cache pages when migration starts Can incur a greater cost if needed later
  • 24. Evaluation Hardware: 2 Dell PE-2650 servers Dual Xeon 2GHz CPUs (one disabled) 2GB memory, Gigabit Ethernet Software: XenLinux 2.4.27 Disk attached via NAS
  • 28. Future Work Intelligently choose the placement and movement of VMs in a cluster Expand this technique to work for VMs not on the same subnet Add support for migrating hard drives Suggest using mirrored disks for now
  • 29. Conclusions This new technique allows us to migrate VMs with low downtime Works well on applications w/small WWS Optimizations may help other cases but could impact application performance Future work looks promising
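The interaction between the push phase, the stop-and-copy phase, and the writable working set (slides 15 and 18) can be sketched with a toy simulation. This is not Xen's algorithm: the dirtying model (writes proportional to the number of pages sent in a round and confined to the WWS), the parameter names, and the thresholds are all assumptions chosen for illustration.

```python
import random


def precopy_migrate(num_pages, wws_size, writes_per_page_sent,
                    stop_threshold=50, max_rounds=30, seed=0):
    """Simulate iterative push rounds followed by a final stop-and-copy.

    Returns (push_rounds, total_pages_sent, pages_sent_during_downtime).
    All parameters are hypothetical knobs for this toy model.
    """
    rng = random.Random(seed)
    hot_pages = list(range(wws_size))   # the writable working set
    dirty = set(range(num_pages))       # the first round must send every page
    total_sent = 0
    rounds = 0

    # Push phase: resend dirty pages while the guest keeps running.
    while len(dirty) > stop_threshold and rounds < max_rounds:
        sent = len(dirty)
        total_sent += sent
        rounds += 1
        # While this round was in flight, the guest dirtied pages in its WWS.
        writes = min(wws_size, int(sent * writes_per_page_sent))
        dirty = set(rng.sample(hot_pages, writes))

    # Stop-and-copy phase: the guest is paused, so this is the downtime.
    downtime_pages = len(dirty)
    total_sent += downtime_pages
    return rounds, total_sent, downtime_pages


small_wws = precopy_migrate(num_pages=10_000, wws_size=20, writes_per_page_sent=0.5)
large_wws = precopy_migrate(num_pages=10_000, wws_size=5_000, writes_per_page_sent=0.9)
```

In this model the small-WWS guest converges after a single push round, while the large-WWS guest exhausts `max_rounds` and still has more dirty pages left for the paused copy, which mirrors the "small WWS is easy, large WWS is hard" observation.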
  • 30. Live Migration of Virtual Machine Based on Full System Trace and Replay [6] Authored by Haikun Liu, Hai Jin, Xiaofei Liao, Liting Hu, and Chen Yu (Huazhong University) Published in HPDC 2009
  • 31. In a Nutshell Previous methods migrate VMs but incur too much downtime and use too much network bandwidth. The authors report up to 72.4% reduction in app downtime, up to 31.5% reduction in migration time, and up to 95.9% reduction in data needed to synchronize VM state
  • 32. Motivation Pre-copy methods fail in three ways: Can’t do memory intensive operations Slowing down write-heavy processes is infeasible in real-world applications The algorithm doesn’t recover the CPU’s cache, resulting in cache and TLB misses and possible performance degradation
  • 33. Goals Minimize application downtime Minimize total migration time Minimize total data transferred All are similar to goals from previous work
  • 34. Basic Idea Synchronize the state of the two machines Second machine then will follow same state as the first unless a non-deterministic event occurs Remedy this by keeping a log of non-deterministic events (time, external input) and replaying them
  • 35. Getting Around Limitations Checkpoint / replay scheme succeeds: Can do memory intensive operations Doesn’t slow down write-heavy processes Does recover the CPU’s cache, avoiding cache and TLB misses and avoiding possible performance degradation
  • 37. Implementation Details Logging and sending logs done by source Replay performed by target Also entails monitoring R_log and initializing the migration
  • 38. Implementation Details Checkpointing Pause source VM, change all pages to read-only, unpause VM Start copying pages and if writes come, redirect them to a Copy-On-Write buffer (COW) When done, merge pages and COW
  • 39. Implementation Details File system access - must be SAN Reading / writing forbidden on target VM, redirected to log file (external input) Network redirection Same as before, uses ARP broadcasting
  • 40. Experimental Setup Hardware: 2 AMD Athlon 3500+ CPUs 1 GB DDR RAM (VM only uses half) Gigabit Ethernet Software: UMLinux w/ RHEL AS3 Disk attached via NAS
  • 44. Lessons Looking at kernel-build: Has low non-determinism, so R_log is small Total migration time is long because R_replay ≈ R_log Recall we want apps with the original condition, R_replay >> R_log, for best migration time
  • 46. Summary: Pros Excels when R_replay >> R_log Incurs less application downtime than previous work Total migration time less than previously Migrates with less traffic than previously
  • 47. Summary: Cons Works only on single processor systems Works only when ARP redirect works Performs poorly when R_replay ≈ R_log Still does not address regular hard drives Large size makes migration infeasible
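The basic idea from slide 34 (checkpoint the state, log only the non-deterministic events, replay them on the target) can be sketched as follows. This is a toy model under stated assumptions: the `Guest` class, its arithmetic update rule, and the use of random bytes as "external input" are hypothetical and do not reflect how the paper's system records or replays a real VM.

```python
import random


class Guest:
    """Toy 'VM' whose state advances deterministically given each external input."""

    def __init__(self, state=0):
        self.state = state

    def step(self, external_input):
        # Deterministic transition: the same state and input always give the same result.
        self.state = (self.state * 31 + external_input) % 1_000_003


def run_source(guest, steps, rng):
    """Run the source guest, logging every non-deterministic event it consumes."""
    log = []
    for _ in range(steps):
        event = rng.randrange(256)   # stands in for time or external input
        log.append(event)
        guest.step(event)
    return log


def replay_on_target(checkpoint_state, log):
    """Start from the checkpoint and replay the logged events in order."""
    target = Guest(checkpoint_state)
    for event in log:
        target.step(event)
    return target


source = Guest(state=7)
checkpoint = source.state                       # state captured at checkpoint time
log = run_source(source, steps=100, rng=random.Random(1))
target = replay_on_target(checkpoint, log)
```

Only the checkpoint and the event log cross the network; the target recomputes everything else, which is why the approach depends on the target replaying faster than the source generates log entries.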
  • 48. References [1] Pavlo et al., A Comparison of Approaches to Large-Scale Data Analysis, SIGMOD 2009 [2] Stonebraker et al., MapReduce and Parallel DBMSs: Friends or Foes?, CACM, Jan. 2010 [3] Dean et al., MapReduce: A Flexible Data Processing Tool, CACM, Jan. 2010 [4] Add serialization support for Protocol Buffers, http://issues.apache.org/jira/browse/MAPREDUCE-377 [5] Clark et al., Live Migration of Virtual Machines, NSDI 2005 [6] Liu et al., Live Migration of Virtual Machine Based on Full System Trace and Replay, HPDC 2009