This document discusses resource management and security in cloud computing. It covers topics such as inter-cloud resource management, resource provisioning methods, global exchange of cloud resources, and security challenges in cloud computing. Specifically, it discusses demand-driven, event-driven and popularity-driven methods for resource provisioning in clouds. It also summarizes proposed architectures for global exchange of cloud resources across geographic locations. Finally, it outlines some key security concerns for cloud computing like data breaches and the shared responsibility model between cloud providers and customers for security.
DHCP is a client-server protocol that assigns network parameters like IP addresses to devices from a server's address pool. A DHCP client broadcasts a request and the DHCP server responds with an offered address via acknowledgement packets. DNS translates human-friendly hostnames to IP addresses by querying a DNS server's address records, allowing users to access resources by name instead of numeric address. Together, DHCP and DNS simplify network configuration and access.
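To make the DNS half of this concrete, here is a minimal Python sketch (not part of the original summary) that resolves a hostname to its IP addresses using the standard library; the hostname is only a placeholder.

```python
# Minimal illustration of the DNS lookup described above: translating a
# human-friendly hostname into IP addresses via the system's resolver.
# "example.com" is just a placeholder hostname.
import socket

hostname = "example.com"

# Simple IPv4 lookup
ipv4 = socket.gethostbyname(hostname)
print(f"{hostname} -> {ipv4}")

# getaddrinfo returns both IPv4 and IPv6 results when available
for family, _, _, _, sockaddr in socket.getaddrinfo(hostname, None):
    print(family.name, sockaddr[0])
```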
This document provides an overview of Amazon Route 53 DNS services including:
- IPv4 and IPv6 address spaces and how Route 53 resolves domain names to IP addresses using A records.
- Common DNS record types like NS, SOA, CNAME and how they work.
- Route 53 routing policies for controlling traffic like simple, weighted, latency, failover and geolocation routing.
- How alias records can simplify configuration by automatically reflecting changes to referenced resources.
- An example of setting up Route 53 with domains, record sets, Elastic Load Balancers, and instances across regions.
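The record types and routing policies above map directly onto the Route 53 API. The following is a hedged boto3 sketch of creating weighted A records; the hosted zone ID, domain name, and IP addresses are hypothetical placeholders.

```python
# Hypothetical sketch: creating two weighted A records in Route 53 with boto3.
# The hosted zone ID, record name, and IP addresses below are placeholders.
import boto3

route53 = boto3.client("route53")

def upsert_weighted_a_record(zone_id, name, ip, identifier, weight):
    """UPSERT one weighted A record; traffic is split in proportion to weight."""
    route53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": name,
                    "Type": "A",
                    "SetIdentifier": identifier,   # distinguishes records sharing a name
                    "Weight": weight,              # relative share of traffic
                    "TTL": 60,
                    "ResourceRecords": [{"Value": ip}],
                },
            }]
        },
    )

# Example: send roughly 70% of traffic to one region and 30% to another
upsert_weighted_a_record("Z0000000000000", "app.example.com", "203.0.113.10", "us-east-1", 70)
upsert_weighted_a_record("Z0000000000000", "app.example.com", "203.0.113.20", "eu-west-1", 30)
```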
The speaker covered the features and benefits of the AWS Lambda service and explained how to improve system performance by using AWS services.
This presentation by Mykhailo Brodskyi (Senior Software Engineer, Consultant, GlobalLogic, Kharkiv), was delivered at GlobalLogic Kharkiv Java Conference 2018 on June 10, 2018.
Thin client computing involves using low-cost thin client terminals instead of traditional PCs. Thin clients have minimal local hardware and rely on a centralized server for processing and data storage. This reduces costs associated with hardware, maintenance, power consumption and IT staff. The main components are thin client terminals, thin client software, network connectivity and a server. Common protocols used include RDP, ICA and X11. Thin clients provide advantages like lower TCO, simplified management and security, but may impact performance for multimedia or graphics-heavy applications.
Security in Clouds: Cloud security challenges – Software as a Service Security. Common Standards: The Open Cloud Consortium – The Distributed Management Task Force – Standards for Application Developers – Standards for Messaging – Standards for Security. End-user access to cloud computing – Mobile Internet devices and the cloud. Hadoop – MapReduce – VirtualBox – Google App Engine – Programming Environment for Google App Engine.
This document discusses encryption options when using AWS, focusing on the AWS Key Management Service (KMS). KMS allows users to simplify the creation, control, rotation and use of encryption keys in AWS services like S3, EBS, RDS, Redshift and others. It addresses key storage, access and usage considerations. KMS uses symmetric AES-256 encryption for data keys and allows granular IAM control over who can create, enable/disable, use and audit keys. The presentation demonstrates how to create and use customer master keys in KMS and integrate encryption with S3 and EBS volumes.
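As a rough illustration of the envelope-encryption pattern KMS enables, the sketch below uses boto3 to generate and later recover a data key; the key alias is a hypothetical placeholder and error handling is omitted.

```python
# Hedged sketch of envelope encryption with AWS KMS via boto3: KMS issues a
# data key under a customer master key (CMK); the plaintext key encrypts data
# locally, and only the encrypted copy of the key is stored alongside the data.
import boto3

kms = boto3.client("kms")

# Ask KMS for a fresh 256-bit data key protected by the CMK (alias is a placeholder)
resp = kms.generate_data_key(KeyId="alias/my-app-key", KeySpec="AES_256")
plaintext_key = resp["Plaintext"]        # use for local AES-256 encryption, then discard
encrypted_key = resp["CiphertextBlob"]   # store this next to the ciphertext

# Later, recover the plaintext data key by asking KMS to decrypt the stored blob
restored = kms.decrypt(CiphertextBlob=encrypted_key)["Plaintext"]
assert restored == plaintext_key
```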
Cloud load balancing distributes workloads and network traffic across computing resources in a cloud environment to improve performance and availability. It routes incoming traffic to multiple servers or other resources while balancing the load. Load balancing in the cloud is typically software-based and offers benefits like scalability, reliability, reduced costs, and flexibility compared to traditional hardware-based load balancing. Common cloud providers like AWS, Google Cloud, and Microsoft Azure offer multiple load balancing options that vary based on needs and network layers.
Cloud computing is introduced with emphasis on the underlying technology, explaining that more than virtualization is involved. Topics covered include: Cloud Technologies, Web Applications, Clustering, Terminal Services, Application Servers, Virtualization, Hypervisors, Service Models, Deployment Models, and Cloud Security.
This document provides an overview and introduction to big data concepts presented by Paresh Motiwala for the New England SQL Server group. The presentation covers sources of big data, privacy concerns, storing data in Hadoop, processing data with MapReduce, and using tools like R, Python, and Power BI for analysis and visualization. It defines key big data concepts like the 5 V's and discusses challenges like volume, variety and velocity of data. It also summarizes strategies for companies to develop their data and cloud capabilities.
The life cycle of a virtual machine (VM) provisioning process (Hitesh Mohapatra)
The life cycle of a virtual machine (VM) provisioning process includes the following stages:
Creation: The VM is created
Configuration: The VM is configured in a development environment
Allocation: Virtual resources are allocated
Exploitation and monitoring: The VM is used and its status is monitored
Elimination: The VM is eliminated
Amazon Simple Notification Service (SNS) allows you to send push notifications to mobile or other distributed services, and scales as needs grow. It supports sending messages individually or broadcasting to multiple destinations. Amazon Simple Queue Service (SQS) is a fully managed message queuing service that can transmit any volume of data reliably and scalably. SQS uses three core APIs and stores messages redundantly across servers, providing high durability. It supports standard queues for high throughput and FIFO queues for strict ordering.
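For orientation, a hedged boto3 sketch of the basic SNS publish and SQS send/receive/delete flow follows; the topic ARN and queue URL are hypothetical placeholders.

```python
# Illustrative sketch (not from the original summary): publishing to an SNS
# topic and sending/receiving a message with SQS using boto3.
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

# Broadcast a notification to all subscribers of a topic
sns.publish(
    TopicArn="arn:aws:sns:us-east-1:123456789012:order-events",
    Message="Order 42 has shipped",
)

# Queue a message for asynchronous processing
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/order-queue"
sqs.send_message(QueueUrl=queue_url, MessageBody="process order 42")

# A worker polls the queue, handles the message, then deletes it
msgs = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1, WaitTimeSeconds=10)
for m in msgs.get("Messages", []):
    print("received:", m["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=m["ReceiptHandle"])
```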
Server Consolidation in Cloud Computing Environment (Hitesh Mohapatra)
Server consolidation in cloud computing refers to the practice of reducing the number of physical servers by combining workloads onto fewer, more powerful virtual machines or cloud instances. This approach improves resource utilization, reduces operational costs, and enhances scalability while maintaining performance and reliability in cloud environments.
In this session you will learn:
HIVE Overview
Working of Hive
Hive Tables
Hive - Data Types
Complex Types
Hive Database
HiveQL - Select-Joins
Different Types of Join
Partitions
Buckets
Strict Mode in Hive
Like and Rlike in Hive
Hive UDF
For more information, visit: https://www.mindsmapped.com/courses/big-data-hadoop/hadoop-developer-training-a-step-by-step-tutorial/
Introduction to AWS VPC, Guidelines, and Best Practices (Gary Silverman)
I crafted this presentation for the AWS Chicago Meetup. This deck covers the rationale, building blocks, guidelines, and several best practices for Amazon Web Services Virtual Private Cloud. I classify it as somewhere between a 101 and 201 level presentation.
If you like the presentation, I would appreciate you clicking the Like button.
Cloud-based IT resources need to be set up, configured, maintained, and monitored. The systems covered in this chapter are mechanisms that encompass and enable these types of management tasks.
Resource replication in cloud computing is the process of making multiple copies of the same resource. It's done to improve the availability and performance of IT resources.
The document discusses the Domain Name System (DNS), which translates domain names to IP addresses and vice versa. It describes the hierarchical structure of DNS with zones, resource records, and name servers. Primary and secondary name servers maintain authoritative data for zones, while caching name servers store previously looked up data to improve performance. The domain name resolution process involves queries to authoritative and caching name servers to map names to addresses.
Real-time monitoring of AWS components and the applications you run on AWS
You can use CloudWatch to collect and track metrics, which are the variables you want to measure for your resources and applications
CloudWatch alarms send notifications or automatically make changes to the resources you are monitoring based on rules that you define
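A minimal boto3 sketch of that metric-and-alarm flow is shown below; the namespace, metric name, threshold, and SNS topic ARN are illustrative assumptions, not values from the original document.

```python
# Hedged sketch: publishing a custom metric and defining an alarm on it with boto3.
import boto3

cloudwatch = boto3.client("cloudwatch")

# Track a custom application metric
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{"MetricName": "QueueDepth", "Value": 42, "Unit": "Count"}],
)

# Alarm when the metric stays above a threshold, notifying an SNS topic
cloudwatch.put_metric_alarm(
    AlarmName="MyApp-QueueDepth-High",
    Namespace="MyApp",
    MetricName="QueueDepth",
    Statistic="Average",
    Period=300,                 # evaluate 5-minute windows
    EvaluationPeriods=2,        # two consecutive breaches before alarming
    Threshold=100,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
```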
Man-in-the-Middle (MitM) attacks occur when an unknown third party intercepts communications between two parties and manipulates or observes the traffic. MitM attacks are used by cyber criminals to steal personal information and login credentials through various techniques like IP spoofing, DNS spoofing, HTTPS spoofing, SSL hijacking, email hijacking, Wi-Fi eavesdropping, and theft of browsing cookies. A Man-in-the-Browser attack involves using malware installed on a victim's browser via phishing to record information sent between targeted websites and the user, putting online banking customers at risk. Protecting against MitM attacks requires using HTTPS, avoiding public Wi-Fi without a VPN, and using internet security applications.
System models for distributed and cloud computing (purplesea)
This document discusses different types of distributed computing systems including clusters, peer-to-peer networks, grids, and clouds. It describes key characteristics of each type such as configuration, control structure, scale, and usage. The document also covers performance metrics, scalability analysis using Amdahl's Law, system efficiency considerations, and techniques for achieving fault tolerance and high system availability in distributed environments.
In a fast-growing storage management world, it is now important to think about options that can store our data safely and at lower cost. Small-scale businesses that cannot afford their own storage infrastructure can easily take advantage of such services.
Cloud analytics is a type of cloud service model where data analysis and related services are performed on a public or private cloud. Cloud analytics can refer to any data analytics or business intelligence process that is carried out in collaboration with a cloud service provider.
Cloud analytics is also known as Software as a Service (SaaS)-based business intelligence (BI).
The document summarizes Apache Hadoop, an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It describes the key components of Hadoop including the Hadoop Distributed File System (HDFS) which stores data reliably across commodity hardware, and the MapReduce programming model which allows distributed processing of large datasets in parallel. The document provides an overview of HDFS architecture, data flow, fault tolerance, and other aspects to enable reliable storage and access of very large files across clusters.
DynamoDB is a key-value database that achieves high availability and scalability through several techniques:
1. It uses consistent hashing to partition and replicate data across multiple storage nodes, allowing incremental scalability.
2. It employs vector clocks to maintain consistency among replicas during writes, decoupling version size from update rates.
3. For handling temporary failures, it uses sloppy quorum and hinted handoff to provide high availability and durability guarantees when some replicas are unavailable.
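To illustrate the first technique, here is a toy consistent-hashing ring in Python; it is a simplified sketch (no virtual nodes) rather than Dynamo's actual implementation.

```python
# Toy sketch of consistent hashing: nodes and keys are hashed onto a ring,
# and each key is owned by the first node clockwise from its position, with
# the next distinct nodes holding replicas.
import bisect
import hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.ring = sorted((ring_hash(n), n) for n in nodes)

    def preference_list(self, key: str):
        """Return the nodes responsible for `key` (coordinator first)."""
        positions = [h for h, _ in self.ring]
        start = bisect.bisect(positions, ring_hash(key)) % len(self.ring)
        owners = []
        i = start
        while len(owners) < min(self.replicas, len(self.ring)):
            node = self.ring[i][1]
            if node not in owners:
                owners.append(node)   # walk clockwise, collecting distinct nodes
            i = (i + 1) % len(self.ring)
        return owners

ring = ConsistentHashRing(["node-A", "node-B", "node-C", "node-D"])
print(ring.preference_list("user#1001"))  # e.g. ['node-C', 'node-D', 'node-A']
```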
Technological Geeks Video 13 :-
Video Link :- https://youtu.be/mfLxxD4vjV0
FB page Link :- https://www.facebook.com/bitwsandeep/
Contents :-
Hive Architecture
Hive Components
Limitations of Hive
Hive data model
Difference with traditional RDBMS
Type system in Hive
Hadoop DFS consists of HDFS for storage and MapReduce for processing. HDFS provides massive storage, fault tolerance through data replication, and high throughput access to data. It uses a master-slave architecture with a NameNode managing the file system namespace and DataNodes storing file data blocks. The NameNode ensures data reliability through policies that replicate blocks across racks and nodes. HDFS provides scalability, flexibility and low-cost storage of large datasets.
This document summarizes key aspects of the Hadoop Distributed File System (HDFS). HDFS is designed for storing very large files across commodity hardware. It uses a master/slave architecture with a single NameNode that manages file system metadata and multiple DataNodes that store application data. HDFS allows for streaming access to this distributed data and can provide higher throughput than a single high-end server by parallelizing reads across nodes.
HDFS is a distributed file system designed to run on commodity hardware. It stores very large files reliably across machines by splitting files into blocks and replicating those blocks. The NameNode manages the file system namespace and maps blocks to DataNodes, which store the blocks. HDFS supports large files, streaming data access patterns, and runs reliably on clusters of commodity hardware.
Hadoop is an open source framework that allows distributed processing of large datasets across clusters of computers. It uses a master-slave architecture where a single node acts as the master (NameNode) to manage file system metadata and job scheduling, while other nodes (DataNodes) store data blocks and perform parallel processing tasks. Hadoop distributes data storage across clusters using HDFS for fault tolerance and high throughput. It distributes computation using MapReduce for parallel processing of large datasets. Common users of Hadoop include large internet companies for applications like log analysis, machine learning and reporting.
DFS allows administrators to consolidate file shares across multiple servers so users can access files from a single location. It provides benefits like centralized resources management, high accessibility regardless of physical location, fault tolerance through replication, and optimized workload distribution. HDFS is the distributed file system of Hadoop. It has a master/slave architecture with a single NameNode managing the file system namespace and DataNodes storing and retrieving blocks. The NameNode maintains metadata and regulates client access while DataNodes store and serve blocks upon instruction from the NameNode, providing streaming data access at large scale for batch processing workloads.
The Hadoop Distributed File System (HDFS) has a master/slave architecture with a single NameNode that manages the file system namespace and regulates client access, and multiple DataNodes that store and retrieve blocks of data files. The NameNode maintains metadata and a map of blocks to files, while DataNodes store blocks and report their locations. Blocks are replicated across DataNodes for fault tolerance following a configurable replication factor. The system uses rack awareness and preferential selection of local replicas to optimize performance and bandwidth utilization.
This document describes the key components and features of HDFS (Hadoop Distributed File System). It explains that HDFS is suitable for distributed storage and processing of large datasets across commodity hardware. It stores data as blocks across DataNodes and uses a Namenode to manage file system metadata and regulate client access. The goals of HDFS are fault tolerance, support for huge datasets, and performing computation near data.
In this session you will learn:
History of Hadoop
Hadoop Ecosystem
Hadoop Animal Planet
What is Hadoop?
Distinctions of Hadoop
Hadoop Components
The Hadoop Distributed Filesystem
Design of HDFS
When Not to use Hadoop?
HDFS Concepts
Anatomy of a File Read
Anatomy of a File Write
Replication & Rack awareness
Mapreduce Components
Typical Mapreduce Job
To know more, click here: https://www.mindsmapped.com/courses/big-data-hadoop/big-data-and-hadoop-training-for-beginners/
Big data interview questions and answers (Kalyan Hadoop)
This document provides an overview of the Hadoop Distributed File System (HDFS), including its goals, design, daemons, and processes for reading and writing files. HDFS is designed for storing very large files across commodity servers, and provides high throughput and reliability through replication. The key components are the NameNode, which manages metadata, and DataNodes, which store data blocks. The Secondary NameNode assists the NameNode in checkpointing filesystem state periodically.
HDFS is a distributed file system designed for storing very large data files across commodity servers or clusters. It works on a master-slave architecture with one namenode (master) and multiple datanodes (slaves). The namenode manages the file system metadata and regulates client access, while datanodes store and retrieve block data from their local file systems. Files are divided into large blocks which are replicated across datanodes for fault tolerance. The namenode monitors datanodes and replicates blocks if their replication drops below a threshold.
HDFS is Hadoop's implementation of a distributed file system designed to store large amounts of data across clusters of machines. It is based on Google's GFS and addresses limitations of other distributed file systems like NFS. HDFS uses a master/slave architecture with a NameNode master storing metadata and DataNodes storing data blocks. Data is replicated across multiple DataNodes for reliability. The file system is optimized for large, sequential reads and writes of entire files rather than random access or updates.
* The file size is 1664MB
* HDFS block size is usually 128MB by default in Hadoop 2.0
* To calculate number of blocks required: File size / Block size
* 1664MB / 128MB = 13 blocks
* 8 blocks have been uploaded successfully
* So remaining blocks = Total blocks - Uploaded blocks = 13 - 8 = 5
If another client tries to access/read the data while the upload is still in progress, it will only be able to access the data from the 8 blocks that have been uploaded so far. The remaining 5 blocks of data will not be available or visible to other clients until the full upload is completed. HDFS follows write-once semantics, so partially written data is not exposed to other clients until the write completes.
The document summarizes the Hadoop Distributed File System (HDFS), which is designed to reliably store and stream very large datasets at high bandwidth. It describes the key components of HDFS, including the NameNode which manages the file system metadata and mapping of blocks to DataNodes, and DataNodes which store block replicas. HDFS allows scaling storage and computation across thousands of servers by distributing data storage and processing tasks.
Best Hadoop institutes: Kelly Technologies is the best Hadoop training institute in Bangalore, providing Hadoop courses with real-time faculty in Bangalore.
The document provides an introduction to Hadoop and HDFS (Hadoop Distributed File System). It discusses key concepts such as:
- HDFS stores large datasets across commodity hardware in a fault-tolerant manner and provides scalable storage and access.
- HDFS has a master/slave architecture with a NameNode that manages metadata and DataNodes that store data blocks.
- Data is replicated across DataNodes for reliability, with one replica on a local rack and two on remote racks by default.
- Hadoop allows processing of large datasets in parallel across clusters and is well-suited for massive amounts of structured and unstructured data.
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ... (Simplilearn)
This video on Hadoop interview questions part-1 will take you through the general Hadoop questions and questions on HDFS, MapReduce and YARN, which are very likely to be asked in any Hadoop interview. It covers all the topics on the major components of Hadoop. This Hadoop tutorial will give you an idea about the different scenario-based questions you could face and some multiple-choice questions as well. Now, let us dive into this Hadoop interview questions video and gear up for your next Hadoop interview.
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course have been designed to impart an in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, Yarn, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive, and Sqoop and Schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distributed datasets (RDD) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, creating, transforming, and querying Data frames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
The document provides an introduction to the key concepts of Big Data including Hadoop, HDFS, and MapReduce. It defines big data as large volumes of data that are difficult to process using traditional methods. Hadoop is introduced as an open-source framework for distributed storage and processing of large datasets across clusters of computers. HDFS is described as Hadoop's distributed file system that stores data across clusters and replicates files for reliability. MapReduce is a programming model where data is processed in parallel across clusters using mapping and reducing functions.
Hadoop professional software development course in Mumbai (Unmesh Baile)
The document provides an overview of the Hadoop Distributed File System (HDFS). It describes HDFS's key features like fault tolerance, high throughput, and support for large datasets. The core architecture uses a master/slave model with a Namenode managing metadata and Datanodes storing file data blocks. HDFS employs replication for reliability and data is placed across racks for high availability. The document also discusses HDFS protocols, APIs, and how it ensures data integrity and handles failures.
This document provides an overview of database management systems. It discusses the objectives of studying databases including data models, SQL, transaction processing, and storage techniques. It also describes different data models like relational, ER, and object-oriented models. Key components of a database system architecture are explained including users, applications, query processor, and storage manager. Advantages and disadvantages of database systems compared to file processing systems are also summarized.
Cloud Computing - Cloud Technologies and Advancements
1. UNIT V
CLOUD TECHNOLOGIES AND ADVANCEMENTS
• Hadoop
• Apache Hadoop is an open source software framework used to develop data
processing applications which are executed in a distributed computing
environment.
• Applications built using HADOOP are run on large data sets distributed across
clusters of commodity computers.
• Commodity computers are mainly useful for achieving greater computational
power at low cost.
• Similar to data residing in the local file system of a personal computer, in Hadoop, data resides in a distributed file system, which is called the Hadoop Distributed File System.
2. • Apache Hadoop consists of two sub-projects –
• Hadoop MapReduce:
• MapReduce is a computational model and software
framework for writing applications which are run on Hadoop.
• These MapReduce programs are capable of processing
enormous data in parallel on large clusters of computation
nodes.
3. • HDFS (Hadoop Distributed File System):
• HDFS takes care of the storage part of Hadoop applications.
• MapReduce applications consume data from HDFS.
• HDFS creates multiple replicas of data blocks and distributes
them on compute nodes in a cluster.
• This distribution enables reliable and extremely rapid
computations.
5. • NameNode and DataNodes
• HDFS has a master/slave architecture.
• A HDFS cluster consists of a single NameNode, a master server that
manages the file system namespace and regulates access to files by
clients.
• In addition, there are a number of DataNodes, usually one per node in
the cluster, which manage storage attached to the nodes that they run
on.
• HDFS exposes a file system namespace and allows user data to be
stored in files.
7. • Internally, a file is split into one or more blocks and these blocks are
stored in a set of DataNodes.
• The NameNode executes file system namespace operations like
opening, closing, and renaming files and directories.
• It also determines the mapping of blocks to DataNodes.
• The DataNodes are responsible for serving read and write requests
from the file system’s clients.
• The DataNodes also perform block creation, deletion, and replication
upon instruction from the NameNode.
8. Functions of NameNode:
• It is the master daemon that maintains and manages the DataNodes (slave nodes)
• It records the metadata of all the files stored in the cluster, e.g. The location of
blocks stored, the size of the files, permissions, hierarchy, etc. There are two files
associated with the metadata:
• FsImage: It contains the complete state of the file system namespace since the start of the
NameNode.
• EditLogs: It contains all the recent modifications made to the file system with respect to the
most recent FsImage.
• It records each change that takes place to the file system metadata. For example, if
a file is deleted in HDFS, the NameNode will immediately record this in the EditLog.
9. • It regularly receives a Heartbeat and a block report from all the
DataNodes in the cluster to ensure that the DataNodes are live.
• It keeps a record of all the blocks in HDFS and in which nodes these
blocks are located.
• The NameNode is also responsible for maintaining the replication factor of all the blocks.
• In case of DataNode failure, the NameNode chooses new DataNodes for new replicas, balances disk usage, and manages the communication traffic to the DataNodes.
10. • The NameNode and DataNode are pieces of software designed to run on commodity
machines.
• These machines typically run a GNU/Linux operating system (OS). HDFS is built using
the Java language;
• Any machine that supports Java can run the NameNode or the DataNode software.
• A typical deployment has a dedicated machine that runs only the NameNode
software.
• Each of the other machines in the cluster runs one instance of the DataNode
software.
• The architecture does not preclude running multiple DataNodes on the same
machine but in a real deployment that is rarely the case.
11. DataNodes
• DataNodes are the slave nodes in HDFS. Unlike the NameNode, a DataNode is commodity hardware, that is, an inexpensive system which is not of high quality or high availability. The DataNode is a block server that stores the data in the local file system (ext3 or ext4).
• Functions of DataNode:
• These are slave daemons or process which runs on each slave machine.
• The actual data is stored on DataNodes.
• The DataNodes perform the low-level read and write requests from the file
system’s clients.
• They send heartbeats to the NameNode periodically to report the overall health of HDFS; by default, this frequency is set to 3 seconds.
12. • The existence of a single NameNode in a cluster greatly simplifies the
architecture of the system.
• The NameNode is the arbitrator and repository for all HDFS metadata.
• The system is designed in such a way that user data never flows
through the NameNode.
13. • Secondary NameNode:
• Apart from these two daemons, there is a third daemon or a process
called Secondary NameNode. The Secondary NameNode works
concurrently with the primary NameNode as a helper daemon. Do not confuse the Secondary NameNode with a backup NameNode, because it is not one.
14. • Functions of Secondary NameNode:
• The Secondary NameNode constantly reads the file system state and metadata from the RAM of the NameNode and writes it to the hard disk or the file system.
• It is responsible for combining the EditLogs with FsImage from the
NameNode.
• It downloads the EditLogs from the NameNode at regular intervals
and applies to FsImage. The new FsImage is copied back to the
NameNode, which is used whenever the NameNode is started the
next time
15. • Blocks:
• Now, as we know, the data in HDFS is scattered across the DataNodes as blocks. Let's have a look at what a block is and how it is formed.
• Blocks are nothing but the smallest contiguous location on your hard drive where data is stored. In general, in any file system, you store the data as a collection of blocks. Similarly, HDFS stores each file as blocks which are scattered throughout the Apache Hadoop cluster. The default size of each block is 128 MB in Apache Hadoop 2.x (64 MB in Apache Hadoop 1.x), which you can configure as per your requirement.
16. • It is not necessary that in HDFS each file is stored in an exact multiple of the configured block size (128 MB, 256 MB, etc.). Let's take an example where I have a file "example.txt" of size 514 MB, as shown in the figure above. Suppose that we are using the default block size, which is 128 MB. Then how many blocks will be created? 5, right. The first four blocks will be 128 MB each, but the last block will be only 2 MB in size.
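A short Python sketch of this block arithmetic, using the slide's 514 MB example and the 128 MB default block size:

```python
# Small sketch reproducing the block arithmetic above: how a file is split
# into HDFS blocks for a given block size (values are from the slide's example).
def hdfs_blocks(file_size_mb: int, block_size_mb: int = 128):
    """Return the list of block sizes (in MB) a file would occupy in HDFS."""
    full_blocks = file_size_mb // block_size_mb
    remainder = file_size_mb % block_size_mb
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)   # the last block only holds the leftover data
    return sizes

blocks = hdfs_blocks(514)   # the 514 MB "example.txt" from the slide
print(len(blocks), blocks)  # -> 5 [128, 128, 128, 128, 2]
```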
17. • The File System Namespace
• HDFS supports a traditional hierarchical file organization.
• A user or an application can create directories and store files inside
these directories.
• The file system namespace hierarchy is similar to most other existing
file systems;
• one can create and remove files, move a file from one directory to
another, or rename a file.
• HDFS supports user access permissions.
18. • While HDFS follows the naming conventions of a file system, some paths and names (e.g. /.reserved and .snapshot) are reserved.
• The NameNode maintains the file system namespace.
• Any change to the file system namespace or its properties is recorded
by the NameNode.
• An application can specify the number of replicas of a file that should
be maintained by HDFS.
• The number of copies of a file is called the replication factor of that
file. This information is stored by the NameNode.
19. • Data Replication
• HDFS is designed to reliably store very large files across machines in a
large cluster.
• It stores each file as a sequence of blocks. The blocks of a file are
replicated for fault tolerance.
• The block size and replication factor are configurable per file.
• All blocks in a file except the last block are the same size
• HDFS provides a reliable way to store huge data in a distributed
environment as data blocks. The blocks are also replicated to provide fault
tolerance. The default replication factor is 3 which is again configurable.
21. • Files in HDFS are write-once (except for appends and truncates) and
have strictly one writer at any time.
• The NameNode makes all decisions regarding replication of blocks.
• It periodically receives a Heartbeat and a Blockreport from each of
the DataNodes in the cluster.
• Receipt of a Heartbeat implies that the DataNode is functioning
properly.
• A Blockreport contains a list of all blocks on a DataNode
23. • Replication
• The placement of replicas is critical to HDFS reliability and
performance.
• Optimizing replica placement distinguishes HDFS from most other
distributed file systems.
• This is a feature that needs lots of tuning and experience.
• The purpose of a rack-aware replica placement policy is to improve
data reliability, availability, and network bandwidth utilization.
24. • Now, the following protocol will be followed whenever the data is
written into HDFS:
• At first, the HDFS client will reach out to the NameNode for a Write
Request against the two blocks, say, Block A & Block B.
• The NameNode will then grant the client the write permission and
will provide the IP addresses of the DataNodes where the file blocks
will be copied eventually.
• The selection of IP addresses of DataNodes is purely randomized
based on availability, replication factor and rack awareness that we
have discussed earlier.
25. • Let’s say the replication factor is set to default i.e. 3. Therefore, for
each block the NameNode will be providing the client a list of (3) IP
addresses of DataNodes. The list will be unique for each block.
• Suppose, the NameNode provided following lists of IP addresses to
the client:
• For Block A, list A = {IP of DataNode 1, IP of DataNode 4, IP of DataNode 6}
• For Block B, set B = {IP of DataNode 3, IP of DataNode 7, IP of DataNode 9}
• Each block will be copied in three different DataNodes to maintain the
replication factor consistent throughout the cluster.
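The following Python sketch (illustrative only, not HDFS code) mimics how the NameNode might hand back a list of three distinct DataNodes per block; real HDFS also applies rack awareness, which is omitted here for brevity.

```python
# Illustrative sketch of the idea on this slide: for each block the NameNode
# returns a list of `replication_factor` distinct DataNodes for the client to
# write to. Node names and the random choice are placeholders for the real
# availability- and rack-aware selection.
import random

def choose_replica_targets(blocks, datanodes, replication_factor=3):
    """Map each block to a pseudo-random list of distinct DataNodes."""
    plan = {}
    for block in blocks:
        plan[block] = random.sample(datanodes, replication_factor)
    return plan

datanodes = [f"DataNode {i}" for i in range(1, 10)]
plan = choose_replica_targets(["Block A", "Block B"], datanodes)
for block, targets in plan.items():
    print(block, "->", targets)   # e.g. Block A -> ['DataNode 1', 'DataNode 4', 'DataNode 6']
```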
26. • Replica Selection
• To minimize global bandwidth consumption and read latency, HDFS
tries to satisfy a read request from a replica that is closest to the
reader.
• If HDFS cluster spans multiple data centers, then a replica that is
resident in the local data center is preferred over any remote replica.
27. • Files in HDFS are write-once.
• The NameNode makes all decisions regarding replication of blocks.
• It periodically receives a Heartbeat and a Blockreport from each of
the DataNodes in the cluster.
• Receipt of a Heartbeat implies that the DataNode is functioning
properly.
• A Blockreport contains a list of all blocks on a DataNode
28. MapReduce
• Hadoop MapReduce (Hadoop Map/Reduce) is a software framework
for distributed processing of large data sets on computing clusters.
• It is a sub-project of the Apache Hadoop project.
• Apache Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models.
• MapReduce is the core component for data processing in Hadoop
framework.
29. • MapReduce helps to split the input data set into a number of parts and run a program on all the data parts in parallel at once.
• The term MapReduce refers to two separate and distinct tasks.
• The first is the map operation, which takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
• The reduce operation combines those data tuples based on the key
and accordingly modifies the value of the key.
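A minimal word-count sketch of these two operations, simulated in plain Python rather than run as an actual Hadoop job (a real job would ship this logic as mapper and reducer tasks across the cluster):

```python
# Word-count sketch of the two MapReduce phases described above.
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: break each record into (key, value) tuples, here (word, 1)."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: group the tuples by key and combine their values."""
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

data = ["Hello World", "Hello Hadoop", "Hello World"]
print(dict(reduce_phase(map_phase(data))))
# -> {'hadoop': 1, 'hello': 3, 'world': 2}
```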
31. • Map Task
The Map task run in the following phases:-
a. RecordReader
• The recordreader transforms the input split into records.
• It provides the data to the mapper function in key-value pairs.
• Usually, the key is the positional information and the value is the data that comprises the record.
33. Types of Hadoop RecordReader in
MapReduce
• The RecordReader instance is defined by the InputFormat.
• By default, it uses TextInputFormat for converting data into a key-value pair. TextInputFormat provides 2 types of RecordReaders:
i. LineRecordReader
ii. SequenceFileRecordReader
34. • b. Map
• In this phase, the mapper, which is a user-defined function, processes the
key-value pairs supplied by the RecordReader.
• It produces zero or more intermediate key-value pairs.
• The key is usually the data on which the reducer function performs the
grouping operation, and the value is the data that gets aggregated to
produce the final result in the reducer function.
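• A minimal mapper sketch based on the classic word-count example (class and
field names are illustrative): for every input record handed over by the
RecordReader it emits one (word, 1) intermediate pair.
```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// For each input record (key = byte offset, value = line text), emit (word, 1)
// for every token; these intermediate pairs are later grouped by the reducer.
public class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}
```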
35. • c. Combiner
• The combiner is actually a localized reducer which groups the data in the
map phase. It is optional.
• The combiner takes the intermediate data from the mapper and
aggregates it.
• It does so within the scope of a single mapper.
• In many situations, this decreases the amount of data needed to move
over the network. For example, moving (Hello World, 1) three times
consumes more network bandwidth than moving
(Hello World, 3).
37. Reduce Task
• The various phases in reduce task are as follows:
i. Shuffle and Sort
• The reducer starts with the shuffle and sort step.
• This step sorts the individual data pieces into one large data list.
• The purpose of this sort is to group equivalent keys together.
38. • ii. Reduce
• The reducer performs the reduce function once per key grouping.
• The framework passes the function the key and an iterator object
containing all the values pertaining to that key.
• We can write the reducer to filter, aggregate, and combine data in a
number of different ways.
• Once the reduce function finishes, it emits zero or more key-value pairs
to the output format.
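• A matching sum reducer sketch for the word-count mapper above (names are
illustrative): the framework calls reduce() once per key grouping with an
iterator over all of that key’s values.
```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Called once per key grouping: the framework passes the key and an Iterable
// over all values for that key; the reducer emits zero or more output pairs.
public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result); // e.g. (Hello World, 3)
    }
}
```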
39. • iii. Output Format
• This is the final step.
• It takes the key-value pairs from the reducer and writes them to the
output file through the RecordWriter.
• By default, it separates the key and value by a tab and each record by
a newline character.
• Final data gets written to HDFS.
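• A hedged driver sketch that ties the phases together: it wires up the mapper and
reducer shown above, optionally registers the reducer as a combiner (so three
(Hello World, 1) pairs leave a mapper as a single (Hello World, 3) pair), and uses
the default TextInputFormat/TextOutputFormat pair; input and output paths come
from the command line.
```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // optional local pre-aggregation
        job.setReducerClass(IntSumReducer.class);

        // TextInputFormat uses a LineRecordReader: key = byte offset, value = line text.
        job.setInputFormatClass(TextInputFormat.class);
        // TextOutputFormat writes "key<TAB>value" records, one per line, into HDFS.
        job.setOutputFormatClass(TextOutputFormat.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```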
40. Virtual Box
• VirtualBox is open-source software for virtualizing the x86 computing
architecture.
• It acts as a hypervisor, creating a VM (Virtual Machine) in which the
user can run another OS (operating system).
• The operating system in which VirtualBox runs is called the "host" OS.
• The operating system running in the VM is called the "guest" OS.
VirtualBox supports Windows, Linux, or macOS as its host OS.
41. • Why Is VirtualBox Useful?
• One:
• VirtualBox allows you to run more than one operating system at a
time.
• This way, you can run software written for one operating system on
another (for example, Windows software on Linux or a Mac) without
having to reboot to use it (as would be needed if you used
partitioning and dual-booting).
42. • Two:
• By using a VirtualBox feature called “snapshots”, you can save a
particular state of a virtual machine and revert to that state if
necessary.
• This way, you can freely experiment with a computing environment.
• If something goes wrong (e.g. after installing misbehaving software or
infecting the guest with a virus), you can easily switch back to a
previous snapshot and avoid the need for frequent backups and
restores.
43. • Three:
• Software vendors can use virtual machines to ship entire software
configurations. For example, installing a complete mail server solution
on a real machine can be a tedious task (think of rocket science!).
• With VirtualBox, such a complex setup (then often called an
“appliance”) can be packed into a virtual machine. Installing and
running a mail server becomes as easy as importing such an appliance
into VirtualBox.
• Along the same lines, VirtualBox’s “clone” feature makes it just as easy to
duplicate an existing virtual machine.
44. • Four:
• On an enterprise level, virtualization can significantly reduce
hardware and electricity costs.
• Most of the time, computers today only use a fraction of their
potential power and run with low average system loads.
• A lot of hardware resources, as well as electricity, are thereby wasted.
• So, instead of running many such physical computers that are only
partially used, one can pack many virtual machines onto a few
powerful hosts and balance the loads between them.
45. VirtualBox Terminology
• When dealing with virtualization, it helps to acquaint oneself with a bit of
crucial terminology, especially the following terms:
• Host Operating System (Host OS):
• The operating system of the physical computer on which VirtualBox
was installed. There are versions of VirtualBox for Windows, macOS,
Linux and Solaris hosts.
• Guest Operating System (Guest OS):
• The operating system that is running inside the virtual machine.
46. • Virtual Machine (VM):
• We’ve used this term often already. It is the special environment that
VirtualBox creates for your guest operating system while it is running.
In other words, you run your guest operating system “in” a VM.
Normally, a VM will be shown as a window on your computer’s
desktop, but depending on which of the various frontends of
VirtualBox you use, it can be displayed in full-screen mode or
remotely on another computer.
47. Google App Engine
• Google App Engine is a Platform as a Service and cloud computing
platform for developing and hosting web applications in Google-
managed data centers.
• App Engine is a fully managed, serverless platform for developing and
hosting web applications at scale.
• You can choose from several popular languages, libraries, and
frameworks to develop your apps, then let App Engine take care of
provisioning servers and scaling your app instances based on demand
• Originally, App Engine required apps to be written in Java or Python, to
store data in Google Bigtable, and to use the Google query language;
today it supports a much wider range of languages and storage options.
48. • Google App Engine manages more of the underlying infrastructure for
you than lower-level scalable hosting services such as Amazon Elastic
Compute Cloud (EC2).
• App Engine also eliminates some system administration and
development tasks, making it easier to write scalable
applications.
• Google App Engine is free up to a certain amount of resource
usage.
• Users exceeding the per-day or per-minute usage rates for
CPU resources, storage, number of API calls or requests and
concurrent requests can pay for more of these resources.
49. • Modern web applications
• Quickly reach customers and end users by deploying web
apps on App Engine.
• With zero-config deployments and zero server management,
App Engine allows you to focus on writing code.
• Plus, App Engine automatically scales to support sudden
traffic spikes without provisioning, patching, or monitoring.
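• As an illustration of the kind of code App Engine hosts, here is a minimal,
hypothetical Java servlet of the sort the App Engine standard environment can run;
the class name and URL pattern are invented, and the deployment descriptors and
gcloud commands are omitted.
```java
import java.io.IOException;

import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// A minimal request handler; App Engine provisions and scales the instances
// that serve it, so no server-management code appears anywhere in the app.
@WebServlet(name = "HelloAppEngine", urlPatterns = {"/hello"})
public class HelloAppEngine extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        response.setContentType("text/plain");
        response.getWriter().println("Hello from App Engine");
    }
}
```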
50. • Features
• Popular languages
• Build your application in Node.js, Java, Ruby, C#, Go, Python, or PHP—
or bring your own language runtime.
• Open and flexible
• Custom runtimes allow you to bring any library and framework to App
Engine.
51. • Fully managed
• A fully managed environment lets you focus on code while App Engine
manages infrastructure concerns.
• Powerful application diagnostics
• Use Cloud Monitoring and Cloud Logging to monitor the health and
performance of your app and Cloud Debugger and Error Reporting to
diagnose and fix bugs quickly.
• Application versioning
• Easily host different versions of your app, and easily create development,
test, staging, and production environments.
52. • Application security
• Help safeguard your application by defining access rules with App
Engine firewall and leverage managed SSL/TLS certificates by default
on your custom domain at no additional cost.
53. • Advantages of Google App Engine
• There are many advantages to Google App Engine that help take your
app ideas to the next level. These include:
• Infrastructure for Security
• Google’s Internet infrastructure is probably among the most secure in
the world. There has rarely been any unauthorized access to date, as
application data and code are stored on highly secure servers.
54. • Quick to Start
• With no product or hardware to purchase and maintain, you can
prototype and deploy the app to your users without taking much
time.
• Easy to Use
• Google App Engine (GAE) incorporates the tools that you need to
develop, test, launch, and update the applications.
55. • Scalability
• Regardless of the amount of data your app stores or the number of users
it serves, App Engine can meet your needs by scaling up or down as
required.
56. • Performance and Reliability
• Google is among the leading global brands, so expectations for performance and
reliability are high. Over the past 15 years, the company has set new benchmarks
with the performance of its services and products, and App Engine provides the
same reliability and performance as any other Google product.
• Cost Savings
• You don’t have to hire engineers to manage your servers or to do that yourself. You
can invest the money saved into other parts of your business.
• Platform Independence
• You can move all your data to another environment without any difficulty as there are
not many dependencies on the app engine platform.
57. • Open Stack
• OpenStack is a free, open-standard cloud computing platform, mostly
deployed as infrastructure-as-a-service in both public and private
clouds where virtual servers and other resources are made available
to users.
• OpenStack is a set of software tools for building and managing cloud
computing platforms for public and private clouds.
• OpenStack is managed by the OpenStack Foundation, a non-profit
that oversees both development and community-building around the
project.
58. • Introduction to OpenStack
• OpenStack lets users deploy virtual machines and other instances that
handle different tasks for managing a cloud environment on the fly.
• It makes horizontal scaling easy, which means that tasks that benefit
from running concurrently can easily serve more or fewer users on
the fly by just spinning up more instances.
• For example, a mobile application that needs to communicate with a
remote server might be able to divide the work of communicating
with each user across many different instances, all communicating
with one another but scaling quickly and easily as the application
gains more users.
59. • And most importantly, OpenStack is open source software, which
means that anyone who chooses to can access the source code, make
any changes or modifications they need, and freely share these
changes.
• It also means that OpenStack has the benefit of thousands of
developers all over the world working in tandem to develop the
strongest, most robust, and most secure product that they can.
60. • How is OpenStack used in a cloud environment?
• The cloud is all about providing computing for end users in a remote
environment, where the actual software runs as a service on reliable and
scalable servers rather than on each end-user's computer.
• Cloud computing can refer to a lot of different things, but typically the
industry talks about running different items "as a service"—software,
platforms, and infrastructure.
• OpenStack is considered Infrastructure as a Service (IaaS).
• Providing infrastructure means that OpenStack makes it easy for users to
quickly add new instances, upon which other cloud components can run.
• Typically, the infrastructure then runs a "platform" upon which a
developer can create software applications that are delivered to the end
users.
61. • What are the components of OpenStack?
• Because of its open nature, anyone can add additional components to
OpenStack to help it to meet their needs.
• But the OpenStack community has collaboratively identified nine key
components that are a part of the "core" of OpenStack, which are
distributed as a part of any OpenStack system and officially
maintained by the OpenStack community.
• Nova is the primary computing engine behind OpenStack. It is used
for deploying and managing large numbers of virtual machines and
other instances to handle computing tasks.
62. • Swift is a storage system for objects and files.
• The OpenStack Object Store project, known as Swift, offers cloud
storage software so that you can store and retrieve lots of data with a
simple API.
• It's built for scale and optimized for durability, availability, and
concurrency across the entire data set.
• Swift is ideal for storing unstructured data that can grow without
bound.
63. • Cinder is a block storage component, which is more analogous to the
traditional notion of a computer being able to access specific
locations on a disk drive. This more traditional way of accessing files
might be important in scenarios in which data access speed is the
most important consideration.
• Neutron provides the networking capability for OpenStack. It helps to
ensure that each of the components of an OpenStack deployment can
communicate with one another quickly and efficiently.
64. • Horizon is the dashboard behind OpenStack.
• It is the only graphical interface to OpenStack, so for users wanting to
give OpenStack a try, this may be the first component they actually
“see.”
• Developers can access all of the components of OpenStack
individually through an application programming interface (API), but
the dashboard gives system administrators a look at what is going on
in the cloud and lets them manage it as needed.
65. • Keystone provides identity services for OpenStack. It is essentially a
central list of all of the users of the OpenStack cloud, mapped against
all of the services provided by the cloud, which they have permission
to use. It provides multiple means of access, meaning developers can
easily map their existing user access methods against Keystone (a minimal
authentication sketch follows this slide).
• Glance provides image services to OpenStack. In this case, "images"
refers to images (or virtual copies) of hard disks.
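• As a rough sketch of how a client talks to Keystone, the snippet below posts a
password-authentication request to the Identity API v3 token endpoint using the
standard Java HTTP client; the endpoint, user, project and password are
placeholders, and a real deployment would normally go through an SDK or the
openstack CLI instead.
```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class KeystoneAuthExample {
    public static void main(String[] args) throws Exception {
        // Keystone Identity API v3 password-authentication request body.
        // User, project, domain and password values are placeholders.
        String body = """
            {
              "auth": {
                "identity": {
                  "methods": ["password"],
                  "password": {
                    "user": {
                      "name": "demo",
                      "domain": { "name": "Default" },
                      "password": "secret"
                    }
                  }
                },
                "scope": {
                  "project": {
                    "name": "demo",
                    "domain": { "name": "Default" }
                  }
                }
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://controller:5000/v3/auth/tokens")) // placeholder endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // On success Keystone returns the token in the X-Subject-Token header;
        // that token is then sent as X-Auth-Token to services such as Nova or Swift.
        System.out.println("Status: " + response.statusCode());
        System.out.println("Token: " + response.headers()
                .firstValue("X-Subject-Token").orElse("<none>"));
    }
}
```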
66. • Prerequisite for minimum production deployment
• There are some basic requirements you’ll have to meet to deploy
OpenStack. Here are the prerequisites, drawn from the OpenStack
manual.
• Hardware: The OpenStack controller node needs 12 GB of RAM and
30 GB of disk space to run the OpenStack services. Two 2 TB SATA
(Serial Advanced Technology Attachment) disks are necessary to store
the volumes used by instances. Communication with compute nodes
requires a 1 Gbps network interface card (NIC).
67. • Operating system (OS):
• OpenStack supports the following operating systems: Debian, Fedora,
Red Hat Enterprise Linux (RHEL), openSUSE, SUSE Linux Enterprise
Server (SLES) and Ubuntu.
68. • Federation in the Cloud
• Cloud federation is the practice of interconnecting
the cloud computing environments of two or more service providers
for the purpose of load balancing traffic and accommodating spikes in
demand. Cloud federation requires one provider to wholesale or rent
computing resources to another cloud provider.
• “Cloud federation manages consistency and access controls when two
or more independent geographically distinct Clouds share either
authentication, files, computing resources, command and control or
access to storage resources.”
69. • Cloud federation introduces additional issues that have to be
addressed in order to provide a secure environment in which to move
applications and services among a collection of federated providers.
• Baseline security needs to be guaranteed across all cloud vendors that
are part of the federation.
70. • An interesting aspect is the management of digital identity across
diverse organizations, security domains, and application platforms.
• In particular, the term federated identity management refers to
standards-based approaches for handling authentication, single sign-
on (SSO), role-based access control, and session management in a
federated environment.
71. • No matter the specific protocol and framework, two main approaches can be
considered:
• Centralized federation model
• This is the approach taken by several identity federation standards. It
distinguishes two operational roles in an SSO transaction: the identity
provider and the service provider.
• Claim-based model
• This approach addresses the problem of user authentication from a different
perspective and requires users to provide claims answering who they are and
what they can do in order to access content or complete a transaction.
72. • The first model is currently used today; the second constitutes a
future vision for identity management in the cloud.
• Digital identity management constitutes a fundamental aspect of
security management in a cloud federation.
• To transparently perform operations across different administrative
domains, it is essential to have a robust framework for authentication
and authorization, and federated identity management addresses this
issue.
• Federated identity management allows us to tie together the
computing stacks of different vendors and present them as a single
environment to users from a security point of view.
73. OpenNebula:
• OpenNebula is a cloud computing platform for managing
heterogeneous distributed data center infrastructures.
• The OpenNebula platform manages a data center's virtual
infrastructure, to build private, public and hybrid implementations of
Infrastructure as a Service.
76. • Much research work has been developed around OpenNebula.
• For example, the University of Chicago has come up with an advance
reservation system called Haizea Lease Manager.
• IBM Haifa has developed a policy-driven probabilistic admission
control and dynamic placement optimization engine for site-level
management policies, called the RESERVOIR Policy Engine.
• Nephele is an SLA-driven automatic service management tool
developed by Telefonica, while the Virtual Cluster Tool for atomic
cluster management with versioning and multiple transport protocols
comes from the CRS4 Distributed Computing Group.
77. Development
• OpenNebula follows a rapid release cycle to improve user satisfaction
by rapidly delivering features and innovations based on user
requirements and feedback.
• In other words, giving customers what they want more quickly, in
smaller increments, while additionally increasing technical quality.
• Major upgrades generally occur every 3-5 years and each upgrade
generally has 3-5 updates.
78. • Cloud Federations and Server Coalitions
• In large-scale systems, coalition formation supports more
effective use of resources, as well as convenient means to access
these resources.
• It is therefore not surprising that coalition formation for
computational grids has been investigated in the past.
• The interest in grid computing is fading away, while cloud
computing is widely accepted today and its adoption by more
and more institutions and individuals seems to be guaranteed at
least for the foreseeable future.
79. • Two classes of applications of cloud coalitions are reported in the
literature:
• 1.Coalitions among CSPs for the formation of cloud federations. A cloud
federation is an infrastructure allowing a group of CSPs to share
resources; the goal is to balance the load and improve system reliability.
• 2.Coalitions among the servers of a data center. The goal is to assemble
a pool of resources larger than the ones available from a single server.
• In recent years the number of CSPs has increased significantly. The
question of whether they should cooperate to share their resources led to
the idea of cloud federations: groups of CSPs who have agreed on a set of
common standards and are able to share their resources.
80. • Cloud coalition formation raises a number of technical, as well as
nontechnical problems.
• Cloud federations require a set of standards.
• The cloud computing landscape is still evolving, and an early
standardization may slow down and negatively affect the adoption of
new ideas and technologies.
• At the same time, CSPs want to maintain their competitive
advantages by closely guarding the details of their internal algorithms
and protocols.
81. • Levels of Federation
• Creating a cloud federation involves research and development at
different levels: conceptual, logical and operational, and
infrastructural.
83. • The figure provides a comprehensive view of the challenges faced in
designing and implementing an organizational structure that
coordinates cloud services belonging to different administrative
domains and makes them operate within the context of a single
unified service middleware.
• Each cloud federation level presents different challenges and
operates at a different layer of the IT stack.
• It then requires the use of different approaches and
technologies.
84. • CONCEPTUAL LEVEL
• The conceptual level addresses the challenges in presenting a cloud
federation as a favourable solution.
• At this level it is important to clearly identify the advantages for both
service providers and service consumers in joining a federation, and to
describe the new opportunities that a federated environment creates.
85. • Elements of concern at this level are:
• Motivations for cloud providers to join a federation.
• Motivations for service consumers to leverage a federation.
• Advantages for providers in leasing their services to other providers.
• Responsibilities of providers once they have joined the federation.
• Trust agreements between providers.
• Transparency toward consumers.
• Among these aspects, the most relevant are the motivations of both
service providers and consumers in joining a federation.
86. • LOGICAL & OPERATIONAL LEVEL
• The logical and operational level of a federated cloud identifies and
addresses the challenges in creating a framework that enables the
aggregation of providers that belong to different administrative domains
• At this level, policies and rules for interoperation are defined.
• Moreover, this is the layer at which decisions are made as to how and when
to lease a service to—or to leverage a service from— another provider.
• The logical component defines a context in which agreements among
providers are settled and services are conveyed, whereas the operational
component characterizes and shapes the dynamic behaviour of the
federation as a result of individual providers’ choices.
87. It is important at this level to address the following challenges:
• How should a federation be represented?
• How should we model and represent a cloud service, a cloud
provider, or an agreement?
• How should we define the rules and policies that allow providers
to join a federation?
• What are the mechanisms in place for settling agreements
among providers?
• What are providers’ responsibilities with respect to one another?
88. • When should providers and consumers take advantage of the
federation?
• Which kinds of services are more likely to be leased or bought?
• How should we price resources that are leased, and which
fraction of resources should we lease?
89. • INFRASTRUCTURE LEVEL
• The infrastructural level addresses the technical challenges involved
in enabling heterogeneous cloud computing systems to interoperate
seamlessly.
• It deals with the technology barriers that keep separate cloud
computing systems belonging to different administrative domains.
• By having standardized protocols and interfaces, these barriers can be
overcome.
90. At this level it is important to address the following issues:
• What kind of standards should be used?
• How should interfaces and protocols be designed for interoperation?
• Which technologies should be used for interoperation?
• How can we realize a software system, design platform components, and
services enabling interoperability?
Interoperation and composition among different cloud computing vendors is
possible only by means of open standards and interfaces. Moreover,
interfaces and protocols change considerably at each layer of the Cloud
Computing Reference Model.
91. Future of Federation
• The federated cloud model is a force for real democratization in the
cloud market.
• It’s how businesses will be able to use local cloud providers to
connect with customers, partners and employees anywhere in the
world.
• It’s how end users will finally get to realize the promise of the cloud.
• And, it’s how data center operators and other service providers will
finally be able to compete with, and beat, today’s so-called global
cloud providers.
92. • Some see the future of cloud computing as one big public cloud.
• Others believe that enterprises will ultimately build a single large
cloud to host all their corporate services.
• This is, of course, because the benefit of cloud computing depends on
large – very large – scale infrastructure, which provides administrators
and consumers with ease of deployment, self-service, elasticity,
resource pooling and economies of scale.
• However, as cloud continues to evolve – so do the services being
offered.
93. • Cloud Services & Hybrid Clouds
• Services are now able to reach a wider range of consumers, partners,
competitors and public audiences.
• It is also clear that storage, compute power, streaming, analytics and
other advanced services are best served when they run in an
environment tailored to that service.
94. • One method of addressing the need of these service environments is
through the advent of hybrid clouds.
• Hybrid clouds, by definition, are composed of multiple distinct cloud
infrastructures connected in a manner that enables services and data
access across the combined infrastructure.
• The intent is to leverage the additional benefits that hybrid cloud
offers without disrupting the traditional cloud benefits.
• While hybrid cloud benefits come through the ability to distribute the
work stream, the goal is to retain the ability to manage peaks in
demand, to quickly make services available, and to capitalize on new
business opportunities.
95. • The Solution: Federation
• Federation creates a hybrid cloud environment with an increased focus
on maintaining the integrity of corporate policies and data.
• Think of federation as a pool of clouds connected through a channel of
gateways;
• gateways which can be used to optimize a cloud for a service or set of
specific services.
• Such gateways can be used to segment service audiences or to limit
access to specific data sets.
• In essence, federation gives enterprises the ability to serve their
audiences with economies of scale without exposing critical applications
or vital data through weak policies or vulnerabilities.
96. • Many would raise the question: if Federation creates multiples of
clouds, doesn’t that mean cloud benefits are diminished?
• I believe the answer is no, due to the fact that a fundamental change
has transformed enterprises through the original adoption of cloud
computing, namely the creation of a flexible environment able to
adapt rapidly to changing needs based on policy and automation.
• Cloud end users are often tied to a unique cloud provider because of
the different APIs, image formats, and access methods exposed by
different providers, which make it very difficult for an average user to
move applications from one cloud to another, leading to a vendor
lock-in problem.
97. • Many SMEs have their own on-premise private cloud infrastructures
to support their internal computing needs and workloads. These
infrastructures are often over-sized to satisfy peak demand periods
and avoid performance slow-downs. The hybrid cloud (or cloud
bursting) model is a solution that reduces the on-premise
infrastructure size, so that it can be dimensioned for an average load
and complemented with external resources from a public cloud
provider to satisfy peak demands.
98. • Many big companies (e.g. banks, hosting companies, etc.) and also
many large institutions maintain several distributed data-centers or
server-farms, for example to serve multiple geographically distributed
offices, to implement high availability (HA), or to guarantee server
proximity to the end user.
• Resources and networks in these distributed data-centers are usually
configured as non-cooperative separate elements.
99. • Many educational and research centers often deploy their own
computing infrastructures, which usually do not cooperate with other
institutions, except in some specific situations (e.g. in joint projects or
initiatives).
• Many times, even different departments within the same institution
maintain their own non-cooperative infrastructures. This study group
will evaluate the main challenges of enabling the provision of federated
cloud infrastructures, with special emphasis on inter-cloud networking
and security issues:
• Security and Privacy
• Interoperability and Portability
• Performance and Networking Cost
100. • The first key action aims at “Cutting through the Jungle of Standards”
to help the adoption of cloud computing by encouraging compliance
of cloud services with respect to standards and thus providing
evidence of compliance to legal and audit obligations.
• These standards aim to avoid customer lock-in by promoting
interoperability, data portability and reversibility.
101. • The second key action “Safe and Fair Contract Terms and Conditions”
aims to protect the cloud consumer from insufficiently specific and
balanced contracts with cloud providers that do not “provide for
liability for data integrity, confidentiality or service continuity”.
• The cloud consumer is often presented with “take-it-or-leave-it”
standard contracts that might be cost-saving for the provider but are
often undesirable for the user.
102. • Interface: Various cloud service providers have different APIs, pricing
models and cloud infrastructure.
• An open cloud computing interface needs to be established to provide
a common application programming interface for multiple cloud
environments.
• The simplest solution is to use a software component that allows the
federated system to connect with a given cloud environment (see the
adapter sketch below).
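• A purely hypothetical sketch of such an adapter component (every name below is
invented for illustration): the federation layer programs against a common
interface, and one implementation per provider hides that provider’s specific API,
image formats and pricing model.
```java
// Hypothetical adapter abstraction: all names here are invented for illustration.
// The federation layer calls this interface; each cloud provider gets its own
// implementation that translates the common calls into provider-specific requests.
import java.util.List;

interface CloudProviderAdapter {
    String provisionInstance(String imageId, String flavor);  // returns an instance id
    void terminateInstance(String instanceId);
    List<String> listInstances();
    double quotePricePerHour(String flavor);                  // normalizes provider pricing
}

// Example provider-specific adapter; the bodies are stubs standing in for calls
// to that provider's compute API.
class ExampleOpenStackAdapter implements CloudProviderAdapter {
    @Override public String provisionInstance(String imageId, String flavor) {
        // ...call the provider's compute service here...
        return "example-instance-id";
    }
    @Override public void terminateInstance(String instanceId) { /* ... */ }
    @Override public List<String> listInstances() { return List.of(); }
    @Override public double quotePricePerHour(String flavor) { return 0.05; }
}
```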
103. • Trusted Servers
• In order to make it easier to find people on other servers we introduced the
concept of “trusted servers” as one of our last steps.
• This allows administrators to define other servers they trust.
• If two servers trust each other they will sync their user lists.
• This way the share dialogue can auto-complete not only local users but also
users on other trusted servers.
• The administrator can decide to define the lists of trusted servers manually or
allow the server to auto add every other server to which at least one
federated share was successfully created.
• This way it is possible to let your cloud server learn about more and more
other servers over time, connect with them and increase the network of
trusted servers.
104. • Open Challenges: where we’re taking Federated Cloud Sharing
• Of course there are still many areas to improve.
• For example, the way you can discover users on different servers to
share with them; we are working on a global, shared address book
solution for this.
• Another point is that at the moment this is limited to sharing files.
• A logical next step would be to extend this to many other areas like
address books, calendars and to real-time text, voice and video
communication and we are, of course, planning for that.