An introductory discussion of cloud computing and its capacity planning implications is followed by a step-by-step guide to running a Hadoop job in EMR, and finally a discussion of how to write your own Hadoop queries.
This document provides an overview of a workshop on cloud native, capacity, performance and cost optimization tools and techniques. It begins by introducing the difference between a presentation and a workshop. It then covers introducing attendees, presenting on various cloud native topics such as migration paths and operations tools, and benchmarking Cassandra performance at scale across AWS regions. The goal is to explore cloud native techniques while discussing specific problems attendees face.
This document discusses serverless computing and function as a service (FaaS). It provides an overview of what serverless and FaaS are, how they work, common use cases, limitations, and the likely future impact. Key points include that FaaS provides fully managed compute with no servers to provision, that you pay only for actual usage, that common triggers are HTTP requests and events, and that use cases span web backends, mobile apps, file processing, bots, and more. The document also outlines limitations like statelessness and cold starts, as well as emerging practices around business models, architectures, operations, and tooling.
Yow Conference Dec 2013 Netflix Workshop Slides with Notes - Adrian Cockcroft
This document provides an overview and agenda for a workshop on patterns for continuous delivery, high availability, DevOps and cloud native development using NetflixOSS open source tools and frameworks. The presenter introduces himself and his background. The content covers Netflix's architecture evolution from monolithic to microservices, how Netflix scales on AWS, and principles and outcomes that enable cloud native development. The workshop then dives into specific NetflixOSS projects like Eureka, Cassandra, Zuul and Hystrix that help with service discovery, data storage, routing and availability. Tools for deployment, configuration, cost analysis and developer productivity are also discussed.
This document provides an overview of Amazon EC2 and autoscaling. It discusses EC2 basics like instance lifecycle, types, and using Amazon Machine Images. It also covers bootstrapping EC2 instances using metadata and user data. Monitoring EC2 with CloudWatch and different types of autoscaling like vertical, horizontal, and using Auto Scaling groups are explained. Autoscaling helps ensure applications have the correct resources to handle varying load and reduces manual scaling efforts.
Evolution of Netflix's cloud security strategy. Includes cloud-based key management and hybrid security controls that span traditional datacenter and public cloud.
Serverless in production, an experience report (JeffConf) - Yan Cui
This document provides an experience report on getting serverless applications ready for production. It discusses several important considerations for production readiness including testing, monitoring and alerting, configuration management, security, and continuous integration/delivery pipelines. The document also shares lessons learned from rebuilding several services using a serverless approach at Skype and the cost savings and velocity gains achieved.
Oracle Exalogic Elastic Cloud - Revolutionizing Data Center Consolidation - Rex Wang
A significant challenge to enterprises attempting to consolidate sprawling datacenters is finding a platform that supports a broad range of workload types with enterprise reliability and performance. Oracle Exalogic Elastic Cloud is a complete system built with hardware and software engineered together for extreme performance, reliability, and scalability for the widest possible range of enterprise mission-critical workloads. Join this webcast to learn all about Exalogic—what it is and how it will revolutionize your data center through the highest performance and dramatic application consolidation.
Pets vs. Cattle: The Elastic Cloud Story - Randy Bias
My recent presentation to the Chicago DevOps Meetup that explains how we're moving from a servers as Pets world to a servers as Cattle world. Understanding this change is critical to success in cloud, DevOps, and delivering new value to the enterprise.
Big Data Hadoop Local and Public Cloud (Amazon EMR) - IMC Institute
The document outlines a hands-on workshop on running Hadoop on Amazon Elastic MapReduce (EMR). It provides instructions on setting up an AWS account and necessary services like S3 and EC2. It then guides attendees on writing a word count MapReduce program, packaging it into a JAR file, and running it on EMR. The document shares examples and code for performing common analytics tasks like aggregation, group-by operations, and frequency distributions using MapReduce on Hadoop.
The TCO Calculator - Estimate the True Cost of Hadoop - MapR Technologies
http://bit.ly/1wsAuRS - There are many hidden costs for Apache Hadoop that have different effects across different Hadoop distributions. With the new MapR TCO calculator organisations have a simple and reliable tool that is based on facts to compare costs.
The document summarizes the key changes between the old MapReduce API and the new MapReduce API in Hadoop. Some of the main changes include (a brief illustrative sketch contrasting the two follows the list below):
- Renaming all "mapred" packages to "mapreduce"
- Methods can now throw InterruptedException in addition to IOException
- Using Configuration instead of JobConf
- Changes to Mapper, Reducer, and RecordReader interfaces and classes
- Submitting jobs uses the Job class instead of JobConf and JobClient classes
- InputFormats and OutputFormats see changes like new methods and removed classes
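To make the contrast concrete, here is a minimal driver-level sketch (illustrative only, not taken from the summarized document) of the old org.apache.hadoop.mapred style next to the new org.apache.hadoop.mapreduce style:

// Old API: JobConf + JobClient (org.apache.hadoop.mapred)
JobConf oldConf = new JobConf(WordCount.class);
oldConf.setMapperClass(WordCountMapper.class);   // implements Mapper, map() throws IOException only
oldConf.setReducerClass(WordCountReducer.class);
JobClient.runJob(oldConf);

// New API: Configuration + Job (org.apache.hadoop.mapreduce)
Configuration conf = new Configuration();
Job job = new Job(conf, "word count");
job.setJarByClass(WordCount.class);
job.setMapperClass(TokenizerMapper.class);       // extends Mapper, may also throw InterruptedException
job.setReducerClass(IntSumReducer.class);        // extends Reducer
job.waitForCompletion(true);                     // replaces JobClient.runJob()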
How do you calculate the cost of a Hadoop infrastructure on Amazon AWS, given some data volume estimates and a rough use case?
The presentation attempts to compare the different options available on AWS.
2015 North Bridge Future of Cloud Computing Study, with Wikibon | Broadest exploration of cloud trends, cloud migration and the evolution of the cloud computing sector. Survey participation was the largest to date and included responses from 38 countries. 50 collaborators supported the 5th Annual Future of Cloud Computing study, which reveals that cloud has become an accepted and integral technology. Furthermore, the study shows that despite deployment gaps among clouds, we should expect a future powered by hybrid cloud technologies. The question of whether companies are using the cloud has morphed into how deeply cloud adoption is integrated within the business. From the bottom to the top, all products and services will in some way be powered by the cloud, promising goods and services that have the potential to be better tomorrow than today. IT departments have reclaimed the reins on driving company technology strategy and cloud adoption as roles, skills and processes have shifted. Importantly, we're also seeing the emergence of the cloud as the only way businesses can truly get more out of their data, including analyzing and acting on it in real time. On the investment front, 2015 could tip the scale from private to public capital for SaaS companies.
This document provides an overview of Amazon Web Services (AWS) including its history, services, pricing model, global infrastructure, and how customers can get started with AWS. It describes how AWS began as Amazon's internal infrastructure and has grown to serve over 1 million customers globally across industries like startups, enterprises, and government agencies. The document outlines AWS's broad range of cloud computing services across categories like compute, storage, databases, analytics, mobile, and more. It emphasizes AWS's focus on innovation with new services and features, lower prices through economies of scale, and its utility-based on-demand pricing model. Finally, it suggests steps for getting started like using the free tier, training, and certification programs.
Curious about the cloud? We've got answers. Join HOSTING for an overview of cloud hosting and computing basics. From the history of the cloud to the projected future, we'll investigate the foundation of this $2.1 billion industry.
This document summarizes a presentation about cloud computing and its uses for GIS. Cloud computing provides scalable computing resources and applications as an on-demand service over the internet. The document defines different types of cloud services including Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). It provides examples of how Esri and other organizations are using the cloud, including deploying ArcGIS Server on Amazon Web Services and hosting web applications on ArcGIS.com. The benefits and risks of cloud computing for GIS are also discussed.
This document discusses how to reduce spending on AWS through various techniques:
1. Paying for cloud resources only when they are used through the pay-as-you-go model avoids upfront costs and allows turning off unused capacity.
2. Using reserved instances when capacity needs are predictable provides significant discounts compared to on-demand pricing.
3. Architecting applications in a "cost aware" manner, such as leveraging caching, auto-scaling, managed services, and right-sizing instances can optimize costs.
4. Taking advantage of AWS's economies of scale through consolidated billing and free services helps lower overall spend. Planning workload usage of spot instances can achieve up to 85% savings.
Cloud Computing Primer: Using cloud computing tools in your museum - Robert J. Stein
A presentation by Robert Stein, Charlie Moad and Ari Davidow on cloud computing for the Museum Computer Network Conference in Portland, OR November, 2009
The document discusses cloud computing and provides examples of how museums can utilize cloud services. It describes common cloud applications and utilities, discusses pros and cons of the cloud, and provides specific examples of how the International Museum of Art used Amazon Web Services (AWS) to save costs on data storage and transition their website and video servers to the cloud.
Amazon S3 provides inexpensive cloud storage while EC2 offers virtual computing resources. S3 allows storage of unlimited data for $0.15 per GB per month with data retrieval priced at $0.10-$0.13 per GB depending on amount. EC2's virtual machines range in power and price from $0.10 per hour for a small instance to $0.80 per hour for an extra large one. Both services offer flexibility to scale up or down on demand with no long term commitments.
The Future is Now: Leveraging the Cloud with Ruby - Robert Dempsey
My presentation from the Ruby Hoedown on cloud computing and how Ruby developers can take advantage of cloud services to build scalable web applications.
The document provides instructions for launching an M-Pin Core service instance on Amazon EC2. It describes choosing an Amazon Machine Image, instance type, storage options, and configuring security groups. The steps also cover accessing the M-Pin Core trial demo and configuring the instance host and port. Once launched, the M-Pin Core service can be accessed in a browser to create identities and pins for strong authentication testing.
The document provides instructions for launching the M-Pin Core service on Amazon Elastic Compute Cloud (EC2). It describes:
1) How to create an EC2 instance, including choosing an Amazon Machine Image, instance type, storage, security groups, and other configuration details.
2) How to launch the M-Pin Core instance and access the 30-day free trial. This involves configuring the instance, host, and port and viewing the M-Pin Core service in a browser.
3) How to create an identity and pin using the M-Pin Core demo, and log in to test the strong authentication capabilities.
How to run your Hadoop Cluster in 10 minutes - Vladimir Simek
- Two companies faced challenges processing big data on-premises, including high fixed costs, slow deployment, lack of scalability, and outages impacting production.
- Amazon Elastic MapReduce (EMR) provides a managed Hadoop service that allows companies to launch clusters within minutes in the AWS cloud at lower costs by using elastic and scalable infrastructure.
- AOL moved their 2PB on-premises Hadoop cluster to EMR, reducing costs by 4x while gaining automatic scaling and high availability across availability zones. EMR addressed their challenges and allowed faster restatement of historical data.
EC2 Masterclass from the AWS User Group Scotland Meetup - Ian Massingham
The document provides an overview of Amazon Elastic Compute Cloud (EC2) including what EC2 is, how it works, instance types, pricing models, and how to launch instances. Specifically:
- EC2 provides resizable compute capacity in the cloud and allows users to run and manage application servers and workloads.
- Users have complete control over their instances and can choose from different instance types optimized for compute, memory, storage or GPU.
- EC2 offers several pricing models including on-demand, reserved, and spot instances to provide flexibility and cost savings based on usage levels and predictability.
The document summarizes the benefits and costs of harnessing cloud computing services like Amazon EC2 and S3. Key services highlighted include Amazon EC2 virtual servers starting at $73/month, S3 scalable and reliable storage starting at $0.15/GB/month, and bandwidth priced under $10 per Mbps. Recommendations are provided for setting up redundant application and database servers using services like Puppet for auto-configuration, DNS Made Easy for failover, and CloudFront for content delivery. Cost estimates are given for basic server setups ranging from $73-292 per month.
The document discusses exploring cloud computing to reduce costs and improve performance for computing resources. It proposes a hybrid scheduler that would allocate additional machines from Amazon EC2's cloud services when local CPU usage exceeds a threshold, and deallocate them when usage decreases. This could help pay only for resources that are used, improve peak performance, and reduce costs compared to maintaining idle hardware. Key questions to answer include determining accurate cost models and the software to use on virtual machines.
Elastic Compute Cloud (EC2) on AWS Presentation - Knoldus Inc.
In this session, we will delve into Amazon Elastic Compute Cloud (Amazon EC2). EC2 is a cloud computing service provided by Amazon Web Services (AWS) that enables users to easily and flexibly deploy and manage virtual servers, known as instances, in the cloud. EC2 offers a wide range of instance types to cater to diverse computing needs, from small-scale web applications to high-performance computing clusters. Users can select the operating system, configure the instance specifications, and scale their compute capacity up or down as needed, paying only for the compute resources they consume. This service empowers businesses and developers to efficiently run their applications, host websites, and perform various computing tasks in a scalable and cost-effective manner without the hassle of managing physical hardware.
This document provides an introduction and overview of Amazon EC2. It discusses what EC2 is, its core components and features such as Elastic Block Storage, Auto Scaling, and Elastic Load Balancing. It covers EC2 pricing models including On-Demand, Reserved, and Spot Instances. It also provides examples of how to cost-effectively run ASP.NET applications on EC2 and discusses tools for managing EC2 resources.
This document summarizes Steve Loughran's research into deploying applications across distributed cloud resources like Amazon EC2 and S3. It discusses moving from single server installations to server farms and cloud computing. Key benefits include scaling easily without large capital costs, but challenges include lack of persistent storage, dynamic IP addresses, and single points of failure. The document provides examples of using EC2 and S3 programmatically through the SmartFrog framework.
Presentation material from an AWS re:Invent re:Cap event. This is in-depth content that expands on the material presented by 이종남 (professional consultant); the author is 이원일 (senior consultant).
Summary: Learn about the optimization approaches and architecture design methods you should adopt to maximize the benefits of AWS cloud infrastructure. Pick up best practices from AWS performance optimization experts and find out which services to use, and how, to secure optimal performance as your infrastructure scales.
Definition of cloud computing and a clear separation of meaning between it and SaaS. Debunks the "Isn't it just like..." myths. Then gives an overview of existing vendors and the key problems.
Flowcon (added to for CMG) Keynote talk on how Speed Wins and how Netflix is ... - Adrian Cockcroft
Flowcon keynote was a few days before CMG, a few tweaks and some extra content added at the start and end. Opening Keynote talk for both conferences on how Speed Wins and how Netflix is doing Continuous Delivery
Bottleneck analysis - Devopsdays Silicon Valley 2013 - Adrian Cockcroft
The document analyzes bottle delivery response time data over various intervals. Summary statistics show the response times have a mean of 3.086 seconds and a standard deviation of 1.94 seconds. A CHP analysis reveals the system is well-behaved with low lock contention.
Netflix Global Applications - NoSQL Search Roadshow - Adrian Cockcroft
This document summarizes Netflix's approach to building cloud native applications. It discusses how Netflix uses microservices, replicates components across availability zones, and implements automated testing like Chaos Monkey to make applications resilient to failures. It also describes how Netflix uses Apache Cassandra and other open source tools to build highly available storage and handle large volumes of data in the cloud.
A collection of information taken from previous presentations that was used as drill down for supporting discussion of specific topics during the tutorial.
Same basic flow as the keynote, but with a lot more detail, and we had a lot more interactive discussion rather than a presentation format. See part 2 for some more specific detail and links to other presentations.
1) Cloud native applications are built to take advantage of cloud computing resources like dynamically provisioned micro-services and distributed ephemeral components.
2) Netflix has transitioned to being a cloud native application built on an open source platform using AWS for scalable infrastructure, but also uses other providers for services not fully supported by AWS like content delivery and DNS.
3) What has changed is developers are freed from being the bottleneck through decentralization and automation of operations, allowing for greater agility, innovation, and business competitiveness in the cloud native model.
Adrian Cockcroft discusses the challenges of building reliable cloud services in an imperfect environment. He describes Netflix's approach of using microservices, continuous delivery, and automation to create stability. Cockcroft also introduces NetflixOSS, an open source platform that provides libraries and tools to help other companies adopt this "cloud native" architecture. The talk outlines opportunities to improve portability and foster an ecosystem around NetflixOSS.
The document discusses Netflix's use of open source technologies in its cloud architecture. It summarizes how Netflix leverages open source software to build cloud native applications that are highly scalable and available on AWS. Key aspects include building stateless microservices, using Cassandra for data storage in a quorum across multiple availability zones, and tools like Edda for configuration management and monitoring. The document advocates for open sourcing Netflix's best practices to help drive innovation.
Introduction to the Netflix Open Source Software project, explains why Netflix is doing this, how all the parts fit together and what is planned to come next. Presented at the inaugural NetflixOSS Meetup February 6th 2013 at Netflix headquarters in Los Gatos.
AWS Re:Invent - High Availability Architecture at Netflix - Adrian Cockcroft
Slides from my talk at AWS Re:Invent November 2012. Describes the architecture, how to make highly available application code and data stores, a taxonomy of failure modes, and actual failures and effects. Ends with a summary of @NetflixOSS projects so others can easily leverage this architecture.
Architecture talk aimed at a well informed developer audience (i.e. QConSF Real Use Cases for NoSQL track), focused mainly on availability. Skips the Netflix cloud migration stuff that is in other talks.
SV Forum Platform Architecture SIG - Netflix Open Source Platform - Adrian Cockcroft
Architecture overview of Netflix Cloud Architecture with a focus on the Open Source components that Netflix has put and is planning to release on http://netflix.github.com
Summary of past Cassandra benchmarks performed by Netflix and description of how Netflix uses Cassandra interspersed with a live demo automated using Jenkins and Jmeter that created two 12 node Cassandra clusters from scratch on AWS, one with regular disks and one with SSDs. Both clusters were scaled up to 24 nodes each during the demo.
Latest version of the Netflix Cloud Architecture story was given at Gluecon May 23rd 2012. Gluecon rocks, and lots of Van Halen references were added for the occasion. The tradeoff between a developer-driven, high-functionality, AWS-based PaaS and an operations-driven, low-cost, portable PaaS is discussed. The three sections cover the developer view, the operator view and the builder view.
This document discusses Netflix's cloud architecture and use of AWS. Netflix built its own scalable Java-oriented PaaS to run on AWS due to limited PaaS options when they started. They moved most applications to SaaS and the cloud for improved business agility and faster scaling. Netflix chose AWS for its scale, features, automation and global availability despite AWS also being a competitor in some areas. Their cloud architecture focuses on speed, scalability, and meeting goals around latency and capacity.
Cloud Architecture Tutorial - Why and What (1 of 3) - Adrian Cockcroft
Introduction to the Netflix Cloud Architecture Tutorial - discusses the why and what of cloud including the thinking behind Netflix choice of AWS, and the product features that Netflix runs in the cloud.
This document provides an overview of a presentation on cloud architecture and anti-architecture patterns. The presentation discusses moving a company's primary data store from a centralized SQL database to a distributed Cassandra database in the cloud. An initial prototype backup solution was overengineered, becoming complex and taking too long to implement fully. This highlighted the importance of defining anti-architecture constraints upfront to guide development in a simpler direction. The presentation concludes with a discussion of differences between the company's existing datacenter architecture and goals for a cloud architecture, focusing on replacing centralized components with distributed and decoupled alternatives.
Cloud Architecture Tutorial - Running in the Cloud (3 of 3) - Adrian Cockcroft
Part 3 of the talk covers how to transition to cloud, how to bootstrap developers, how to run cloud services including Cassandra, capacity planning and workload analysis, and organizational structure
Slides from QConSF Nov 19th, 2011 focusing this time on describing the globally distributed and scaled industrial strength Java Platform as a Service that Netflix has built and run on top of AWS and Cassandra. Parts of that platform are being released as open source - Curator, Priam and Astyanax.
3. What is Capacity Planning?
We care about CPU, Memory, Network and Disk resources, and Application response times
We need to know how much of each resource we are using now, and will use in the future
We need to know how much headroom we have to handle higher loads
We want to understand how headroom varies, and how it relates to application response times and throughput
4. Capacity Planning Norms
Capacity is expensive
Capacity takes time to buy and provision
Capacity only increases, can't be shrunk easily
Capacity comes in big chunks, paid up front
Planning errors can cause big problems
Systems are clearly defined assets
Systems can be instrumented in detail
5. Capacity Planning in Clouds
Capacity is expensive
Capacity takes time to buy and provision
Capacity only increases, can't be shrunk easily
Capacity comes in big chunks, paid up front
Planning errors can cause big problems
Systems are clearly defined assets
Systems can be instrumented in detail
6. Capacity is expensive
http://aws.amazon.com/s3/ & http://aws.amazon.com/ec2/
Storage (Amazon S3)
$0.150 per GB – first 50 TB / month of storage used
$0.120 per GB – storage used / month over 500 TB
Data Transfer (Amazon S3)
$0.100 per GB – all data transfer in
$0.170 per GB – first 10 TB / month data transfer out
$0.100 per GB – data transfer out / month over 150 TB
Requests (Amazon S3 storage access is via HTTP)
$0.01 per 1,000 PUT, COPY, POST, or LIST requests
$0.01 per 10,000 GET and all other requests
$0 per DELETE
CPU (Amazon EC2)
Small (Default) $0.085 per hour, Extra Large $0.68 per hour
Network (Amazon EC2)
Inbound/Outbound around $0.10 per GB
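To make these 2009-era rates concrete, here is a small illustrative calculation; the workload numbers are assumptions for the example, not figures from the slides:

// Rough monthly cost at the rates quoted above (illustrative workload).
public class CloudCostSketch {
    public static void main(String[] args) {
        double storageGB = 1000;           // assumed: 1 TB kept in S3
        double transferOutGB = 100;        // assumed: 100 GB downloaded per month
        double instanceHours = 24 * 30;    // assumed: one Small instance running all month
        double storage  = storageGB * 0.150;      // $0.150 per GB-month (first tier)
        double transfer = transferOutGB * 0.170;  // $0.170 per GB out (first tier)
        double compute  = instanceHours * 0.085;  // Small instance at $0.085 per hour
        System.out.printf("S3 storage $%.2f + transfer $%.2f + EC2 $%.2f = $%.2f per month%n",
                storage, transfer, compute, storage + transfer + compute);
        // Roughly $150 + $17 + $61, or about $228 for the month.
    }
}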
7. Capacity comes in big chunks, paid up front
Capacity takes time to buy and provision
No minimum price, monthly billing
"Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or days. You can commission one, hundreds or even thousands of server instances simultaneously"
Capacity only increases, can't be shrunk easily
Pay for what is actually used
Planning errors can cause big problems
Size only for what you need now
8. Systems are clearly defined assets
You are running in a "stateless" multi-tenanted virtual image that can die or be taken away and replaced at any time
You don't know exactly where it is
You can choose to locate "USA" or "Europe"
You can specify zones that will not share components to avoid common mode failures
9. Systems can be instrumented in detail
Need to use stateless monitoring tools
Monitored nodes come and go by the hour
Need to write role-name to hostname
e.g. wwwprod002, not the EC2 default
all monitoring by role-name
Ganglia – automatic configuration
Multicast replicated monitoring state
No need to pre-define metrics and nodes
10. Acquisition requires management buy-in
Anyone with a credit card and $10 is in business
Data governance issues…
Remember 1980's when PC's first turned up?
Departmental budgets funded PC's
No standards, management or backups
Central IT departments could lose control
Decentralized use of clouds will be driven by teams seeking competitive advantage and business agility – ultimately unstoppable…
12. The Umbrella Strategy
Monitor network traffic to cloud vendor API's
Catch unauthorized clouds as they form
Measure, trend and predict cloud activity
Pick two cloud standards, setup corporate accounts
Better sharing of lessons as they are learned
Create path of least resistance for users
Avoid vendor lock-in
Aggregate traffic to get bulk discounts
Pressure the vendors to develop common standards, but don't wait for them to arrive. The best APIs will be cloned eventually
Sponsor a pathfinder project for each vendor
Navigate a route through the clouds
Don't get caught unawares in the rain
13. Predicting the Weather
Billing is variable, monthly in arrears
Try to predict how much you will be billed…
Not an issue to start with, but can't be ignored
Amazon has a cost estimator tool that may help
Central analysis and collection of cloud metrics
Cloud vendor usage metrics via API
Your system and application metrics via Ganglia
Intra-cloud bandwidth is free, analyze in the cloud!
Based on standard analysis framework "Hadoop"
Validation of vendor metrics and billing
Charge-back for individual user applications
14. Use it to learn it…
Focus on how you can use the cloud yourself to do large scale log processing
You can upload huge datasets to a cloud and crunch them with a large cluster of computers using Amazon Elastic Map Reduce (EMR)
Do it all from your web browser for a handful of dollars charged to a credit card.
Here's how
15. Cloud Crunch
The Recipe
You will need:
A computer connected to the Internet
The Firefox browser
A Firefox specific browser extension
A credit card and less than $1 to spend
Big log files as ingredients
Some very processor intensive queries
16. Recipe
First we will warm up our computer by setting up Firefox and connecting to the cloud.
Then we will upload our ingredients to be crunched, at about 20 minutes per gigabyte
You should pick between one and twenty processors to crunch with, they are charged by the hour and the cloud takes about 10 minutes to warm up.
The query itself starts by mapping the ingredients so that the mixture separates, then the excess is boiled off to make a nice informative reduction.
17. Costs
Firefox and Extension – free
Upload ingredients – 10 cents/Gigabyte
Small Processors – 11.5 cents/hour each
Download results – 17 cents/Gigabyte
Storage – 15 cents/Gigabyte/month
Service updates – 1 cent/1000 calls
Service requests – 1 cent/10,000 calls
Actual cost to run two example programs as described in this presentation was 26 cents
18. Faster Results at Same Cost!
You may have trouble finding enough data and a complex enough query to keep the processors busy for an hour. In that case you can use fewer processors.
Conversely if you want quicker results you can use more and/or larger processors.
Up to 20 systems with 8 CPU cores = 160 cores is immediately available. Oversize on request.
19. Step by Step
Walkthrough to get you started
Run Amazon Elastic MapReduce examples
Get up and running before this presentation is over…
20. Step 1 – Get Firefox
http://www.mozilla.com/en-US/firefox/firefox.html
21. Step 2 – Get S3Fox Extension
Next select the Add-ons option from the Tools menu, select "Get Add-ons" and search for S3Fox.
22. Step 3 – Learn About AWS
Bring up http://aws.amazon.com/ to read about the services. Amazon S3 is short for Amazon Simple Storage Service, which is part of the Amazon Web Services product. We will be using Amazon S3 to store data, and Amazon Elastic Map Reduce (EMR) to process it.
Underneath EMR there is an Amazon Elastic Compute Cloud (EC2) cluster, which is created automatically for you each time you use EMR.
23. What is S3, EC2, EMR?
Amazon Simple Storage Service lets you put data into the cloud that is addressed using a URL. Access to it can be private or public.
Amazon Elastic Compute Cloud lets you pick the size and number of computers in your cloud.
Amazon Elastic Map Reduce automatically builds a Hadoop cluster on top of EC2, feeds it data from S3, saves the results in S3, then removes the cluster and frees up the EC2 systems.
24. Step 4 – Sign Up For AWS
Go to the top-right of the page and sign up at http://aws.amazon.com/
You can login using the same account you use to buy books!
25. Step 5 – Sign Up For Amazon S3
Follow the link to http://aws.amazon.com/s3/
27. Step 6 – Sign Up For Amazon EMR
Go to http://aws.amazon.com/elasticmapreduce/
28. EC2 Signup
The EMR signup combines the other needed services such as EC2 in the sign up process and the rates for all are displayed.
The EMR costs are in addition to the EC2 costs, so 1.5 cents/hour for EMR is added to the 10 cents/hour for EC2 making 11.5 cents/hour for each small instance running Linux with Hadoop.
29. EMR And EC2 Rates
(old data – it is cheaper now with even bigger nodes)
30. Step 7 - How Big are EC2 Instances?
See http://aws.amazon.com/ec2/instance-types/
The compute power is specified in a standard unit called an EC2 Compute Unit (ECU).
Amazon states "One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor."
31. Standard Instances
Small Instance (Default) 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform.
Large Instance 7.5 GB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of instance storage, 64-bit platform.
Extra Large Instance 15 GB of memory, 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform.
32. Compute Intensive Instances
High-CPU Medium Instance 1.7 GB of memory, 5 EC2 Compute Units (2 virtual cores with 2.5 EC2 Compute Units each), 350 GB of instance storage, 32-bit platform.
High-CPU Extra Large Instance 7 GB of memory, 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform.
33. Step 8 – Getting Started
http://docs.amazonwebservices.com/ElasticMapReduce/2009-03-31/GettingStartedGuide/
There is a Getting Started guide and Developer Documentation including sample applications
We will be working through two of those applications.
There is also a very helpful FAQ page that is worth reading through.
35. Step 10 – How To MapReduce?
Code directly in Java
Submit "streaming" command line scripts
EMR bundles Perl, Python, Ruby and R
Code MR sequences in Java using Cascading
Process log files using Cascading Multitool
Write dataflow scripts with Pig
Write SQL queries using Hive
36. Step 11 – Setup Access Keys
Getting Started Guide
http://docs.amazonwebservices.com/ElasticMapReduce/2009-03-31/GettingStartedGuide/gsFirstSteps.html
The URL to visit to get your key is:
http://aws-portal.amazon.com/gp/aws/developer/account/index.html?action=access-key
37. Enter Access Keys in S3Fox
Open S3 Firefox Organizer and select the Manage Accounts button at the top left
Enter your access keys into the popup.
40. Step 13 – Run Your First Job
See the Getting Started Guide at:
http://docs.amazonwebservices.com/ElasticMapReduce/2009-03-31/GettingStartedGuide/gsConsoleRunJob.html
Login to the EMR console at https://console.aws.amazon.com/elasticmapreduce/home
Create a new job flow called "Word crunch", and select the sample application word count.
42. Pick the Instance Count
One small instance (i.e. computer) will do…
Cost will be 11.5 cents for 1 hour (minimum)
43. Start the Job Flow
The Job Flow takes a few minutes to get started, then completes in about 5 minutes run time
44. Step 14 – View The Results
In S3Fox click on the refresh icon, then double-click on the crunchie folder
Keep clicking until the output file is visible
45. Save To Your PC
S3Fox – click on left arrow
Save to PC
Open in TextEdit
See that the word "a" occurred 14716 times
Boring… so try a more interesting demo!
Click Here
46. Step 15 – Crunch Some Log Files
Create a new output bucket with a new name
Start a new job flow using the CloudFront Demo
This uses the Cascading Multitool
50. Agenda
1. Hadoop
- Background
- Ecosystem
- HDFS & Map/Reduce
- Example
2. Log & Data Analysis @ Netflix
- Problem / Data Volume
- Current & Future Projects
3. Hadoop on Amazon EC2
- Storage options
- Deployment options
51. Hadoop
Apache open-source software for reliable, scalable, distributed computing.
Originally a sub-project of the Lucene search engine.
Used to analyze and transform large data sets.
Supports partial failure, Recoverability, Consistency & Scalability
52. Hadoop Sub-Projects
Core
HDFS: A distributed file system that provides high throughput access to data.
Map/Reduce: A framework for processing large data sets.
HBase: A distributed database that supports structured data storage for large tables
Hive: An infrastructure for ad hoc querying (SQL-like)
Pig: A data-flow language and execution framework
Avro: A data serialization system that provides dynamic integration with scripting languages. Similar to Thrift & Protocol Buffers
Cascading: Executing data processing workflows
Chukwa, Mahout, ZooKeeper, and many more
53. Hadoop Distributed File System (HDFS)
Features
Cannot be mounted as a "file system"
Access via command line or Java API
Prefers large files (multiple terabytes) to many small files
Files are write once, read many (append coming soon)
Users, Groups and Permissions
Name and Space Quotas
Blocksize and Replication factor are configurable per file
Commands: hadoop dfs -ls, -du, -cp, -mv, -rm, -rmr
Uploading files: hadoop dfs -put foo mydata/foo
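Alongside the command line, the same operations are available from the Java API mentioned above. A minimal sketch, with illustrative paths, of uploading a file and listing a directory:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // picks up the cluster settings on the classpath
        FileSystem fs = FileSystem.get(conf);      // handle to the configured file system (HDFS)
        // Equivalent to: hadoop dfs -put foo mydata/foo
        fs.copyFromLocalFile(new Path("foo"), new Path("mydata/foo"));
        // Equivalent to: hadoop dfs -ls mydata
        for (FileStatus status : fs.listStatus(new Path("mydata"))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
    }
}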
55. hadoop dfs -tail [-f] mydata/foo
Map/Reduce
Map/Reduce is a programming model for efficient distributed computing
Data processing of large datasets
Massively parallel (hundreds or thousands of CPUs)
Easy to use
Programmers don't worry about socket(), etc.
It works like a Unix pipeline:
cat * | grep | sort | uniq -c | cat > output
Input | Map | Shuffle & Sort | Reduce | Output
Efficiency from streaming through data, reducing seeks
A good fit for a lot of applications
Log processing
Index building
Data mining and machine learning
58. Input & Output Formats
The application also chooses input and output formats, which define how the persistent data is read and written. These are interfaces and can be defined by the application.
InputFormat
Splits the input to determine the input to each map task.
Defines a RecordReader that reads key, value pairs that are passed to the map task
OutputFormat
Given the key, value pairs and a filename, writes the reduce task output to persistent store.
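For the JobConf-based API used in the word count example later in this deck, wiring up the stock text formats might look like the following sketch (the paths are placeholders, not from the original slides):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class FormatSetup {
    public static void configure(JobConf conf) {
        conf.setInputFormat(TextInputFormat.class);    // splits the files; its RecordReader yields offset -> line pairs
        conf.setOutputFormat(TextOutputFormat.class);  // writes the reduce output as "key<TAB>value" lines
        FileInputFormat.setInputPaths(conf, new Path("mydata"));   // placeholder input directory
        FileOutputFormat.setOutputPath(conf, new Path("results")); // placeholder output directory
    }
}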
59. Map/Reduce Processes
Launching Application
User application code
Submits a specific kind of Map/Reduce job
JobTracker
Handles all jobs
Makes all scheduling decisions
TaskTracker
Manager for all tasks on a given node
Task
Runs an individual map or reduce fragment for a given job
Forks from the TaskTracker
60. Word Count Example
Mapper
Input: value: lines of text of input
Output: key: word, value: 1
Reducer
Input: key: word, value: set of counts
Output: key: word, value: sum
Launching program
Defines the job
Submits job to cluster
61. Map Phase
public static class WordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line); // split on whitespace so each word is a token
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one); // emit (word, 1) for every word seen
    }
  }
}
62. Reduce Phase
public static class WordCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}
63. Running the Job
public class WordCount {
  public static void main(String[] args) throws IOException {
    JobConf conf = new JobConf(WordCount.class);
    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(WordCountMapper.class);
    conf.setReducerClass(WordCountReducer.class);
    conf.setInputPath(new Path(args[0]));
    conf.setOutputPath(new Path(args[1]));
    JobClient.runJob(conf);
  }
}
64. Running the Example
Input:
Welcome to Netflix
This is a great place to work
Output:
Netflix 1
This 1
Welcome 1
a 1
great 1
is 1
place 1
to 2
work 1
66. Analysis using access log
Top URLs, Users, IP, Size
Time based analysis: Session Duration, Visits per session
Identify attacks (DOS, Invalid Plug-in, etc)
Study Visit patterns for Resource Planning
Preload relevant data
Impact of WWW call on middle tier services
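As a minimal sketch of how one of these analyses maps onto the word count pattern shown earlier (the access log layout and field positions are assumptions for illustration), a mapper that emits the requested URL from each log line lets the same summing reducer produce a top-URLs count:

public static class TopUrlMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text url = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // Assumes a combined-log-style line where the quoted request is "METHOD URL PROTOCOL";
    // adjust the parsing to the actual log format.
    String[] parts = value.toString().split("\"");
    if (parts.length > 1) {
      String[] request = parts[1].split(" ");
      if (request.length > 1) {
        url.set(request[1]);      // the requested URL
        output.collect(url, one); // reuse the word count reducer to sum hits per URL
      }
    }
  }
}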
67. Why run Hadoop in the cloud?
"Infinite" resources
Hadoop scales linearly
Elasticity
Run a large cluster for a short time
Grow or shrink a cluster on demand
68. Common sense and safety
Amazon security is good
But you are a few clicks away from making a data set public by mistake
Common sense precautions
Before you try it yourself…
Get permission to move data to the cloud
Scrub data to remove sensitive information
System performance monitoring logs are a good choice for analysis
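As one illustration of the scrubbing step (the field layout and the masking policy here are assumptions, not from the original slides), a small utility could blank the last octet of client IP addresses before log files are uploaded:

import java.util.regex.Pattern;

public class LogScrubber {
    // Matches IPv4 addresses and keeps only the first three octets.
    private static final Pattern IPV4 =
        Pattern.compile("\\b(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\.\\d{1,3}\\b");

    public static String scrub(String logLine) {
        return IPV4.matcher(logLine).replaceAll("$1.0"); // e.g. 192.168.10.57 -> 192.168.10.0
    }

    public static void main(String[] args) {
        System.out.println(scrub("192.168.10.57 - - [10/Oct/2009] \"GET /movies HTTP/1.1\" 200"));
    }
}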