
Emerging Technologies - Lecture Notes - CH 1 & 2

Uploaded by

lemma4a
Copyright
© All Rights Reserved

INTRODUCTION TO EMERGING TECHNOLOGIES
NETSANET GETNET (MSC CE, BSC EE)
LECTURER, ADDIS ABABA SCIENCE AND TECHNOLOGY UNIVERSITY
DEPARTMENT OF COMPUTER ENGINEERING
I. INTRODUCTION
ACTIVITY 1.1
• What is an Emerging Technology?
• Emerging technology
• A new technology
• May also refer to the continuing development of an existing technology
• Can have slightly different meanings when used in different areas, such as:
• Media, business, science, or education.
• The term commonly refers to technologies that are currently developing, or that are
expected to be available within the next five to ten years, and is usually reserved for
technologies that are creating or are expected to create significant social or economic
effects.
• Technological evolution is a theory of radical transformation of society through
technological development.
TECHNOLOGY & EVOLUTION
• What is TECHNOLOGY?
• The application of scientific knowledge for practical purposes (e.g. industry)
• Machinery or equipment developed from the application of scientific
knowledge.
• Branch of knowledge dealing with engineering or applied sciences.
• Tools and machines that may be used to solve real-world problems.
• Science of Mechanical and Industrial Arts.
• What is EVOLUTION?
• The process of gradual development.
• Example: Cellular Communication Evolution
TECHNOLOGY & EVOLUTION …
• From the Latin word evolvere: e + volvere
• e means “out”, volvere means “to roll”
• So, evolution literally means “rolling out”: something that changes its form incrementally.
• Something that changes adaptively from what it was.
• Examples: Technological Evolution, Industrial Evolution, Scientific Evolution, Natural Evolution
ACTIVITY 1.2
• List Top 5 emerging technologies in the current time
• Artificial Intelligence
• Robotics
• IoT
• 3D Printing
• Biometrics
• Blockchain – a chain of blocks. Block = digital information, chain = public database
• So blockchain stands for digital information stored in a public database.
• Blocks on the “blockchain” are made up of digital pieces of information with three parts:
• Information about the date, time, and amount of your current transaction (e.g. a purchase on an e-commerce site such as Amazon),
• Information about who is participating in the transaction, and
• Information that distinguishes the block from other blocks (a unique code called a hash is used to tell blocks apart).
• Hashes are fixed-length codes produced by cryptographic hash functions (e.g. SHA-256).
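To make the role of the hash concrete, here is a minimal Python sketch of hash-linked blocks. It assumes SHA-256 as the hash function and assumes that each block also stores the hash of the previous block, which is what links the blocks into a chain. The field names and sample transactions are illustrative only, not those of any real blockchain.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Block:
    timestamp: str        # date and time of the transaction
    participants: list    # who is taking part in the transaction
    amount: float         # amount of the transaction
    prev_hash: str        # hash of the previous block (assumed link field)
    block_hash: str = field(init=False)  # this block's own unique code

    def __post_init__(self):
        self.block_hash = self.compute_hash()

    def compute_hash(self) -> str:
        # Serialize the block contents in a fixed order, then hash them.
        payload = json.dumps(
            [self.timestamp, self.participants, self.amount, self.prev_hash]
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_valid(chain: list) -> bool:
    """Each block's stored hash must match its contents, and each block
    must point at the hash of the block before it."""
    for i, block in enumerate(chain):
        if block.block_hash != block.compute_hash():
            return False
        if i > 0 and block.prev_hash != chain[i - 1].block_hash:
            return False
    return True


genesis = Block("2024-01-01T10:00", ["Alice", "Shop"], 25.0, prev_hash="0" * 64)
second = Block("2024-01-01T11:30", ["Bob", "Shop"], 40.0, prev_hash=genesis.block_hash)
chain = [genesis, second]

print(is_valid(chain))   # True
genesis.amount = 1000.0  # tamper with an earlier block
print(is_valid(chain))   # False: the stored hash no longer matches
```

Changing any field of an earlier block changes its hash, so the tampering is detected without trusting whoever stored the data.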
ACTIVITY 1.2 …
• Nanotechnology – manipulation of matter on an atomic, molecular, or supramolecular scale, with the goal of precisely manipulating atoms and molecules to fabricate macroscale
products.
• Cloud Computing
• Big Data
INDUSTRIAL REVOLUTION
• A period of major industrialization and innovation that took place during the late
1700s and early 1800s.
• At its core, an Industrial Revolution occurs when a society shifts from using tools
to make products to using new sources of energy, such as coal, to power machines
in factories.
• The revolution started in England, with a series of innovations to make labor
more efficient and productive.
• The Industrial Revolution was a time when the manufacturing of goods moved
from small shops and homes to large factories.
• This shift brought about changes in culture as people moved from rural areas to
big cities in order to work in the industries.
INDUSTRIAL REVOLUTION…
• The American Industrial Revolution, commonly referred to as the Second Industrial Revolution:
• Started between 1820 and 1870.
• Changed the way items were manufactured and had a wide reach.
• Industries such as textile manufacturing, mining, glass making, and agriculture all
underwent changes.
• For example, prior to the Industrial Revolution, textiles were primarily made of wool and were
handspun.
• First industrial revolution: mechanization through water and steam power
• Second industrial revolution: mass production and assembly lines using electricity
• Third industrial revolution: adoption of computers and automation
• Fourth industrial revolution: will take what was started in the third and enhance it with smart
and autonomous systems fueled by availability of data and machine learning.
INDUSTRIAL REVOLUTION…
• Generally, the following industrial revolutions fundamentally
changed and transformed the world around us into modern society:
• The steam engine,
• The age of science and mass production,
• The rise of digital technology,
• Smart and autonomous systems.
ACTIVITY 1.3
• What are the most important inventions of industrial
revolutions?
• Transportation: The Steam Engine, The Railroad, The Diesel Engine, The
Airplane.
• Communication: The Telegraph (long-distance transmission of textual
messages), The Transatlantic Cable, The Phonograph (Thomas Edison), The
Telephone.
• Industry: The Cotton Gin, The Sewing Machine, Electric Lights.
HISTORICAL BACKGROUND
• The Industrial Revolution began in Great Britain in the late 1700s before spreading
to the rest of Europe (Belgium, France, and Germany were the countries that
followed England).
• A final cause of the Industrial Revolution was the Agricultural Revolution.
• The Industrial Revolution began due in part to an increase in food production, which
was the key outcome of the Agricultural Revolution.
• Four types of industries:
• Primary Industry: getting raw materials (from mining, farming, fishing, hunting …)
• Secondary Industry: involves manufacturing. E.g. making cars, steel, …
• Tertiary Industry: provide services (teaching, nursing, …)
• Quaternary Industry: involves research and development (e.g. IT)
INDUSTRIAL REVOLUTIONS
• 1st Industrial Revolution (IR 1.0)
• Transition to new manufacturing processes.
• Began in the 1760s
• Transition from manual to machines
• Use of steam engines (steam power), machines and tools, and rise of factory system.
• Steam engines use the force produced by steam pressure to push a piston back and forth inside a cylinder.
• This back-and-forth force is converted into rotary motion by a connecting rod and a flywheel.
INDUSTRIAL REVOLUTIONS (2)
• 2nd Industrial Revolution (IR 2.0)
• Known as the Technological Revolution
• Began in the 1870s
• Advances included methods for manufacturing interchangeable parts, the
widespread adoption of pre-existing technological systems (telegraph and railroad networks),
and the introduction of new technological systems (such as electrical power and telephones).
INDUSTRIAL REVOLUTIONS (3)
• 3rd Industrial Revolution (IR 3.0)
• Transition from mechanical and analog electronic systems to digital electronics systems
• Began in the 1950s
• Nicknamed “Digital Revolution”
• Mass production and widespread use of digital logic circuits and the derived technologies
such as:
• Computers
• Hand phones, and
• The Internet
• Transformed traditional production and business techniques enabling people to
communicate without limitations of distance.
• Some IR 3.0 practices are still in use and growing (digital computers, digital records).
INDUSTRIAL REVOLUTIONS (4)
• 4th Industrial Revolution (IR 4.0)
• Advancement of technologies like:
• Robotics,
• IoT,
• Additive manufacturing, and
• Autonomous vehicles

• These technologies are also called “cyber-physical systems”


• A cyber-physical system is a mechanism that is controlled or monitored by
computer-based algorithms, tightly integrated with the Internet and its users.
INDUSTRIAL REVOLUTIONS (4) …
• Example: machines operated by instructions from computers (Computer Numerical
Control, CNC)
• AI is another breakthrough
• Implemented almost everywhere in our daily digital experiences
• Used as a key element in autonomous vehicles and robots.
ACTIVITY 1.4 & 1.5
Activity 1.4
• What makes IR 4.0 different from the previous IRs (IR 1.0 – IR 3.0)?

Activity 1.5
• Discuss the Agricultural Revolution, the Information Revolution, and the level of
industrial revolution in Ethiopia, and compare them with the UK, USA, and China.
ROLE OF DATA FOR EMERGING TECHNOLOGIES
• What is data?
• It is regarded as the new oil and is the most important strategic asset today.
• Drives and determines the future of science and technology, the economy,
and everything in the world today and tomorrow.
• It also presents challenges that, in turn, bring incredible innovation and
economic opportunities.
• Involves core disciplines such as computing, informatics and statistics as
well as the broad-based fields of business, social science, and health and
medical sciences.
DATA VS INFORMATION VS BIG DATA
• These terms are often used interchangeably, but they are actually different.
• Data refers to raw, unprocessed, unorganized facts that are simple and of little use on their own.
• Example: a certain set of numbers, characters, images
• When data is processed, organized, structured or presented in a given context so as to make
it useful, it is called information.
• Big data refers to extremely large data sets that may be analyzed computationally to reveal
patterns, trends, and associations.
• It is a term used to describe large volumes of data – both structured and unstructured.
• Data that contains greater variety, arriving in increasing volumes and with high velocity (the
three Vs).
• Data so voluminous that traditional data processing tools just can’t manage them.
• Huge volume of data that emanates from various sources.
DATA VS INFORMATION VS BIG DATA …
• Big data sources/applications
• Education industry
• Healthcare industry
• Government sector
• Media and Entertainment industry
• Weather patterns
• Transportation industry
• Banking sector
ACTIVITY 1.6

• Discuss Data, Information and Big Data.


• List some programmable devices and explain their properties.
ENABLING DEVICES & NETWORK (PROGRAMMABLE DEVICES)
• Four basic kinds of devices in the world of digital electronic systems (programmable devices):
• Memory, Microprocessors, Logic and Network devices.
• Memory devices: store some information (e.g. spreadsheet, database)
• Microprocessors: execute instructions (programs) to process data
• E.g. word processing, video game
• Logic devices: provide specific functions like device-to-device interfacing,
data communication, signal processing, data display, timing and control
operations.
• Network: a collection of computers, servers, mainframes, network devices, and
peripherals connected together to allow sharing of data or other resources.
• The Internet is the biggest network we have.
DEVICES…
• Programmable devices include chips such as:
• Field-programmable gate arrays (FPGAs),
• Complex programmable logic devices (CPLDs), and
• Programmable logic devices (PLDs).
• There are also analog equivalents of these devices, called field-programmable
analog arrays (FPAAs).

Programmable devices
DEVICES …
• A computer is the most common programmable device: it can be
programmed to follow a set of instructions and produce some results.
• Computers may vary depending on their purposes.
• Small computers (in many electronic devices we use, such as calculators, phones,
…) perform only one or a small number of operations, but they are still
programmed to follow a certain set of instructions to achieve that.
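The idea that a programmable device simply follows a set of instructions can be illustrated with a toy interpreter in Python. The three-instruction set (LOAD, ADD, PRINT) and the single accumulator register are hypothetical, chosen only for illustration; real microprocessors have far richer instruction sets and execute machine code, not Python tuples.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs on a hypothetical
    one-register machine and return the values produced by PRINT."""
    accumulator = 0
    output = []
    for opcode, operand in program:   # fetch the next instruction
        if opcode == "LOAD":          # decode it and execute it
            accumulator = operand
        elif opcode == "ADD":
            accumulator += operand
        elif opcode == "PRINT":
            output.append(accumulator)
        else:
            raise ValueError(f"unknown instruction: {opcode}")
    return output


# A tiny "calculator" program: compute 2 + 3 and show the result.
program = [("LOAD", 2), ("ADD", 3), ("PRINT", None)]
print(run(program))  # [5]
```

Changing the list of instructions changes what the device does without changing the interpreter itself, which is what makes the device programmable.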
SOME PROGRAMMABLE DEVICES
• Achronix Speedster SPD60
• Actel’s
• Altera Stratix IV GT and Arria II GX
• Atmel’s AT91CAP7L
• Cypress Semiconductor’s programmable system-on-chip (PSoC) family
• Lattice Semiconductor’s ECP3
• Lime Microsystems’ LMS6002
• Silicon Blue Technologies
• Xilinx Virtex 6 and Spartan 6
• Xmos Semiconductor L series
ACTIVITY 1.7
• Under the subtopic of enabling devices and networks, we have seen a list of
programmable devices. Now briefly discuss the features of some of these
devices.
DEVICES …
• Network devices/Service-enabling devices (SEDs)
• Channel Service Unit (CSU) and Data Service Unit (DSU) (CSU/DSU)
• Connects DTE (data terminal equipment, such as a router) to the telecommunication network: the CSU connects
to the telecommunication network, and the DSU interfaces with the DTE.
• Modems
• Routers
• Switches
• Conferencing equipment
• Network appliances (SID - system ID, NID – Network ID)
• Hosting equipment and servers
HMI – HUMAN-MACHINE INTERACTION
• Refers to communication and interaction between a human and a machine via a user
interface
• Natural interfaces are becoming common: devices are capable of understanding some
human gestures.
• HCI (human-computer interaction) is the study of how people interact with computers and to what extent
computers can interact with human beings successfully.
• Three components in HCI:
• The user
• The computer
• The interaction (how they interact with each other).
• Users interact with computers using input/output devices
• Displays – show graphical user interfaces that the user can use to send commands to
the computer or receive results from it
• Input devices (keyboard, mouse) allow input to the computer
HMI – HUMAN-MACHINE INTERACTION …
• How important is human-computer interaction?
• HCI improves the interaction between users and computers by making
computers more user-friendly and receptive to the user's needs.
• Disciplines Contributing to Human-Computer Interaction (HCI)
• Cognitive psychology: Limitations, information processing, performance prediction,
cooperative working, and capabilities.
• Computer science: Including graphics, technology, prototyping tools, user interface
management systems.
• Linguistics.
• Engineering and design.
• Artificial intelligence.
• Human factors.
FUTURE TRENDS IN EMERGING TECHNOLOGIES
• 5G Networks
• Artificial Intelligence (AI)
• Autonomous Devices
• Blockchain
• Augmented Analytics
• Digital Twins (a digital replica of a living or non-living physical entity)
• Enhanced Edge Computing (distributed computing which brings computation and
data closer to the location where it is needed, saving bandwidth and improving
response time)
• Immersive Experiences in Smart Spaces (environments where humans and
technology can openly communicate with each other in a physical/digital setting).
ACTIVITY 1.9: ASSIGNMENT I – A (REPORT)
• Briefly discuss how these emerging technologies could shape the future
of you and your business:
• Chatbots (group 5)
• Virtual, Augmented & Mixed Reality (groups 2, 6)
• Blockchain. The blockchain frenzy is real (groups 3, 8)
• Ephemeral Apps (groups 4, 7)
• Artificial Intelligence (group 1)
IN CLASS DISCUSSION/DEBATE
• Divide your class into small groups of 3-5 students.

• Describe what innovation or invention you choose.


• Why was your chosen innovation or invention the most important?
• What was its impact on society?

• Steam Engine
• Railroad
• Interchangeable Parts
• Steamboat
• Spinning Jenny
• High-quality iron

II. DATA SCIENCE
OVERVIEW OF DATA SCIENCE
• Activity 2.1 - Define:
• Data science?
• Data and Information
• Big data?
• What is the role of data in emerging technologies?
• Data Science is a multi-disciplinary field that uses scientific methods, processes,
algorithms, and systems to extract knowledge and insights from structured, semi-
structured and unstructured data.
• Much more than just analyzing data.
• Offers a range of roles and requires a range of skills (mathematical, programming, analytical, …)
OVERVIEW OF DATA SCIENCE …
• Example:
• Consider data involved in buying a box of cereal from the store or supermarket:
• Your data here is the planned purchase written somewhere
• When you get to the store, you use that piece of data to remind yourself about what
you need to buy and pick it up and put it in your cart.
• At checkout, the cashier scans the barcode on your box and the cash register logs the
price.
• Back in the warehouse, a computer informs the stock manager that it is time to order
this item from the distributor, because your purchase took the last box in the store.
• You may have a coupon for your purchase and the cashier scans that too, giving you a
predetermined discount.
OVERVIEW OF DATA SCIENCE …
• Example:
• At the end of the week, a report of all the scanned manufacturer coupons gets uploaded
to the cereal company so they can issue a reimbursement to the grocery store for all of
the coupon discounts they have handed out to customers.
• Finally, at the end of the month, a store manager looks at a colorful collection of pie
charts showing all the different kinds of cereal that were sold and, on the basis of strong
sales of cereals, decides to offer more varieties of these on the store’s limited shelf
space next month.
• So, the small piece of information in your notebook ended up in many different places
• Notably on the desk of a manager as an aid to decision making.
• The data went through many transformations.
OVERVIEW OF DATA SCIENCE …
• Example …
• In addition to the computers where the data might have stopped by or stayed on for
the long term, lots of other pieces of hardware—such as the barcode scanner—were
involved in collecting, manipulating, transmitting, and storing the data.
• In addition, many different pieces of software were used to organize, aggregate,
visualize, and present the data.
• Finally, many different human systems were involved in working with the data.
• People decided which systems to buy and install, who should get access to what kinds
of data, and what would happen to the data after its immediate purpose was fulfilled.
• Data science has emerged as one of the most promising and in-demand career paths.
• Professionals use advanced techniques for analyzing large volumes of data.
• They are also skilled in communicating results to their non-technical counterparts.
OVERVIEW OF DATA SCIENCE …
• Skills important for data science:
• Statistics
• Linear algebra
• Programming knowledge with focus on data warehousing, data mining, and data modeling
OVERVIEW OF DATA SCIENCE …
• Activity 2.2
• Describe in some detail the main disciplines that contribute to data science.
• Write a short report on the role of data scientists.
DATA VS INFORMATION
• Data: a representation of facts, concepts, or instructions in a formalized manner, which
should be suitable for communication, interpretation, or processing, by human or
electronic machines.
• It can be described as unprocessed facts and figures.
• It is represented as groups of non-random symbols in the form of text, images, voice, or video,
representing quantities, actions, and objects.
• Information is the processed/interpreted data on which decisions and actions are based.
• It is data that has been processed into a form that is meaningful to the recipient and is of
real or perceived value in the current or the prospective action or decision of recipient.
• It is interpreted data; created from organized, structured, and processed data in a
particular context.
DATA PROCESSING CYCLE
• Data processing: the re-structuring or re-ordering of data by people or machines to
increase its usefulness and add value for a particular purpose.
• Consists of the following basic steps: input, processing, and output, in that order.

• Input − input data is prepared in some convenient form for processing.


• The form will depend on the processing machine. For example, when electronic computers are used,
the input data can be recorded on any one of the several types of input medium, such as magnetic
disks, tapes, and so on.
DATA PROCESSING CYCLE
• Processing - input data is changed to produce data in a more useful form.
• For example, pay-checks can be calculated from the time cards, or a summary of sales for the month
can be calculated from the sales orders.

• Output − the result of the preceding processing step is collected.


• The particular form of the output data depends on the use of the data. For example, output data may be
pay-checks for employees.
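The input, processing, and output steps can be sketched in Python using the pay-check example. The time-card records, employee names, hourly rates, and function names are hypothetical, and overtime, taxes, and deductions are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class TimeCard:          # input: one recorded time card
    employee: str
    hours: float
    hourly_rate: float


def process(cards):
    """Processing: turn raw time cards into total gross pay per employee."""
    pay = {}
    for card in cards:
        pay[card.employee] = pay.get(card.employee, 0.0) + card.hours * card.hourly_rate
    return pay


def output(pay):
    """Output: present the processed data as pay-check lines."""
    return [f"Pay to {name}: {amount:.2f}" for name, amount in sorted(pay.items())]


# Input step: data prepared in a convenient form (here, a Python list).
cards = [
    TimeCard("Abebe", 40, 12.5),
    TimeCard("Sara", 35, 14.0),
    TimeCard("Abebe", 5, 12.5),   # a second card for the same employee
]

for line in output(process(cards)):
    print(line)
# Pay to Abebe: 562.50
# Pay to Sara: 490.00
```

Each stage only depends on the previous stage's result, so the same processing function works whether the input came from paper forms typed in by hand or from an electronic time clock.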

• Activity 2.3
• Discuss the main differences between data and information with examples.
• Can we process data manually using a pencil and paper? Discuss the differences with
data processing using the computer.

DATA TYPES AND THEIR REPRESENTATION
• Data types can be described from diverse perspectives.
1. Computer science and programming perspective:
• A data type is an attribute of data that tells the compiler or interpreter how the
programmer intends to use the data.
• Almost all programming languages explicitly include the notion of data type, though
different languages may use different terminology.
• Common data types include:
• Integers: store whole numbers.
• Booleans: store one of two values: true or false
• Characters: store a single character (numeric, alphabetic, symbol, …)
• Floating-point numbers: store real numbers (approximately)
• Alphanumeric strings: store a combination of characters and numbers.
DATA TYPES AND THEIR REPRESENTATION …
• A data type:
• Constrains the values that an expression (such as a variable or a function) might take.
• Defines the operations that can be performed on the data, the meaning of the data, and the way values
of that data type can be stored/represented.
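The common data types listed above map onto Python's built-in types. The sketch below shows how a value's type constrains the operations that make sense for it. Python is dynamically typed, so here the check happens at run time rather than in a compiler; the variable names and values are made up for illustration.

```python
count = 42                # integer
passed = True             # Boolean: True or False
grade = "A"               # character (Python uses a string of length 1)
price = 19.99             # floating-point number
student_id = "ID2024A"    # alphanumeric string (hypothetical value)

# The data type defines which operations are allowed and what they mean.
print(count + 8)             # prints 50: integer addition
print(grade * 3)             # prints AAA: repetition for strings
print(type(price).__name__)  # prints float

try:
    count + student_id       # int + str is not a defined operation
except TypeError as err:
    print("TypeError:", err)
```

The same `+` symbol means addition for integers but is rejected between an integer and a string, which is exactly the "defines the operations" role of a data type.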

2. Data types from Data Analytics perspective


• From a data analytics point of view, there are three common types of data
structures:
• Structured, Semi-structured, and Unstructured data types.
• The following describes the three types of data and metadata.
DATA TYPES AND THEIR REPRESENTATION …

Data types from a data analytics perspective


• Structured Data: is data that adheres to a pre-defined data model and is therefore straightforward
to analyze.
• Structured data conforms to a tabular format with a relationship between the different rows and
columns.
• Common examples of structured data are Excel files or SQL databases.
• Each of these has structured rows and columns that can be sorted.
• Structured data is considered the most ‘traditional’ form of data storage, since the earliest versions
of database management systems (DBMS) were able to store, process and access structured data.
DATA TYPES AND THEIR REPRESENTATION …
• Unstructured Data: is information that either does not have a predefined data model or is not organized
in a pre-defined manner.
• Unstructured information is typically text-heavy but may contain data such as dates, numbers, and
facts as well.
• This results in irregularities and ambiguities that make it difficult to understand using traditional
programs as compared to data stored in structured databases.
• Common examples of unstructured data include audio and video files or NoSQL databases.
• Semi-structured Data: is a form of structured data that does not conform to the formal structure of
data models associated with relational databases or other forms of data tables.
• However, it contains tags or other markers to separate semantic elements and enforce hierarchies of records and
fields within the data.
• Therefore, it is also known as a self-describing structure.
• Examples of semi-structured data include JSON and XML.
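The difference between the three forms can be seen by reading similar information from each one in Python, using only the standard library. The sample records, names, and the e-mail address are made up for illustration.

```python
import csv
import io
import json
import re

# Structured: rows and columns that follow a fixed schema (like a table or CSV file).
structured = io.StringIO("name,age\nHana,21\nYonas,23\n")
rows = list(csv.DictReader(structured))
print(rows[0]["age"])        # 21 (read as a string from the CSV text)

# Semi-structured: self-describing keys, but records need not share one schema.
semi = json.loads(
    '[{"name": "Hana", "age": 21}, {"name": "Yonas", "email": "y@example.com"}]'
)
print([record.get("age") for record in semi])   # [21, None]

# Unstructured: free text; any structure must be extracted by the program.
text = "Hana, who is 21, met Yonas yesterday."
ages = [int(n) for n in re.findall(r"\d+", text)]
print(ages)                  # [21]
```

The CSV rows can be read by column name directly, the JSON records carry their own keys even though the second one has no `age`, and the free text needs an ad hoc pattern (here a regular expression for digits) that would break on differently worded sentences.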
DATA TYPES AND THEIR REPRESENTATION …
• Metadata – Data about Data: A last category of data type is metadata.
• From a technical point of view, this is not a separate data structure, but it is one of the
most important elements for Big Data analysis and big data solutions.
• Metadata is data about data. It provides additional information about a specific set of
data.
• Example: In a set of photographs, metadata could describe when and where the photos
were taken.
• The metadata then provides fields for dates and locations which, by themselves, can be
considered structured data.
• For this reason, metadata is frequently used by Big Data solutions for initial
analysis.
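The photograph example can be sketched as follows: the images themselves are unstructured, but their metadata forms structured fields that can be filtered before any image is opened. The file names, dates, and locations below are hypothetical.

```python
from datetime import date

# Metadata describing a set of photos (data about the data).
photos = [
    {"file": "img_001.jpg", "taken": date(2023, 5, 1), "location": "Addis Ababa"},
    {"file": "img_002.jpg", "taken": date(2023, 7, 14), "location": "Bahir Dar"},
    {"file": "img_003.jpg", "taken": date(2023, 7, 20), "location": "Addis Ababa"},
]

# Initial analysis using only the metadata: which photos were taken
# in Addis Ababa after June 2023?
selected = [
    p["file"]
    for p in photos
    if p["location"] == "Addis Ababa" and p["taken"] > date(2023, 6, 30)
]
print(selected)  # ['img_003.jpg']
```

Only the selected files would then need to be loaded for heavier processing, which is why metadata is a cheap starting point for analysis.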
DATA TYPES AND THEIR REPRESENTATION …
• Activity 2.4
➢ Discuss data types from programming and analytics perspectives.
➢ Compare metadata with structured, unstructured and semi-structured data.
➢ Give at least one example each of structured, unstructured, and semi-structured data.
DATA TYPES AND THEIR REPRESENTATION …
• Data value Chain:
• The Data Value Chain is introduced to describe the information flow within a big data
system as a series of steps needed to generate value and useful insights from data.
• The Big Data Value Chain identifies the following key high-level activities:
DATA TYPES AND THEIR REPRESENTATION …
• Data Acquisition: is the process of gathering, filtering, and cleaning data before it is put
in a data warehouse or any other storage solution on which data analysis can be carried
out.
• Data acquisition is one of the major big data challenges in terms of infrastructure
requirements.
• The infrastructure required to support the acquisition of big data must:
• deliver low, predictable latency in both capturing data and in executing queries;
• be able to handle very high transaction volumes, often in a distributed environment; and
• support flexible and dynamic data structures.
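A small, in-memory sketch of the gathering, filtering, and cleaning steps of data acquisition is shown below. The sensor readings and the validity rule (temperatures between -50 and 60 °C) are assumptions made for illustration; real acquisition pipelines run such steps at much larger scale and with low latency before the data reaches storage.

```python
raw_readings = [              # gathered from (hypothetical) sensors
    {"sensor": "s1", "temp": "23.5"},
    {"sensor": "s2", "temp": ""},          # missing value
    {"sensor": "s1", "temp": "23.5"},      # duplicate
    {"sensor": "s3", "temp": "999"},       # implausible value
    {"sensor": " S4 ", "temp": "19.0"},    # inconsistent formatting
]


def acquire(records, low=-50.0, high=60.0):
    """Filter and clean raw records before they are stored."""
    cleaned, seen = [], set()
    for rec in records:
        if not rec["temp"]:                     # filter: drop missing values
            continue
        sensor = rec["sensor"].strip().lower()  # clean: normalize identifiers
        temp = float(rec["temp"])               # clean: convert text to a number
        if not low <= temp <= high:             # filter: drop out-of-range values
            continue
        key = (sensor, temp)
        if key in seen:                         # filter: drop exact duplicates
            continue
        seen.add(key)
        cleaned.append({"sensor": sensor, "temp": temp})
    return cleaned


print(acquire(raw_readings))
# [{'sensor': 's1', 'temp': 23.5}, {'sensor': 's4', 'temp': 19.0}]
```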
DATA TYPES AND THEIR REPRESENTATION …
• Data Analysis: is concerned with making the raw data acquired amenable to use in
decision-making as well as domain-specific usage.
• Data analysis involves exploring, transforming, and modelling data with the goal of
highlighting relevant data, synthesizing and extracting useful hidden information with
high potential from a business point of view.
• Related areas include data mining, business intelligence, and machine learning.
• Data Curation: is the active management of data over its life cycle to ensure it meets the
necessary data quality requirements for its effective usage.
• Data curation processes can be categorized into different activities such as content
creation, selection, classification, transformation, validation, and preservation.
DATA TYPES AND THEIR REPRESENTATION …
• Data curation is performed by expert curators who are responsible for improving the
accessibility and quality of data.
• Data curators (also known as scientific curators, or data annotators) hold the
responsibility of ensuring that data are trustworthy, discoverable, accessible, reusable,
and fit their purpose.
• A key trend in the curation of big data is the use of community and crowdsourcing
approaches.
• Data Storage: is the persistence and management of data in a scalable way that satisfies
the needs of applications that require fast access to the data.
• Relational Database Management Systems (RDBMS) have been the main, and almost
unique, solution to the storage paradigm for nearly 40 years.
DATA TYPES AND THEIR REPRESENTATION …
• However, the ACID (Atomicity, Consistency, Isolation, and Durability) properties that
guarantee database transactions come at a cost: they lack flexibility with regard to schema
changes, and they limit performance and fault tolerance as data volumes and complexity grow,
making RDBMSs unsuitable for big data scenarios.
• NoSQL technologies have been designed with the scalability goal in mind and present a
wide range of solutions based on alternative data models.
• Data Usage: covers the data-driven business activities that need access to data, its
analysis, and the tools needed to integrate the data analysis within the business activity.
• Data usage in business decision-making can enhance competitiveness through reduction
of costs, increased added value, or any other parameter that can be measured against
existing performance criteria.
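The atomicity guarantee in ACID, mentioned under Data Storage, can be seen with Python's built-in sqlite3 module: a transfer between two accounts either happens completely or is undone completely. The accounts table, the CHECK constraint, and the amounts are hypothetical, and the sketch only illustrates the guarantee itself, not the scalability trade-off discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:  # CHECK failed: balance would become negative
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "A", "B", 30.0), balances(conn))   # True {'A': 70.0, 'B': 80.0}
print(transfer(conn, "A", "B", 500.0), balances(conn))  # False {'A': 70.0, 'B': 80.0}
```

In the failed transfer, the credit to B had already been applied when the debit from A violated the constraint; the rollback undoes it, so B is not left with money that never left A.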
ACTIVITY 2.5
➢ Which information-flow step in the data value chain do you think is the most labor-intensive? Why?
• Data Acquisition? Analysis? Curation? Storage? Usage?
• Of course, it is Curation!
➢ What are the different data types and their value chain?
BIG DATA: DEFINITION
• Big data is a blanket term for the non-traditional strategies and technologies needed to
gather, organize, and process large datasets and to extract insights from them.
• While the problem of working with data that exceeds the computing power or storage of
a single computer is not new, the pervasiveness, scale, and value of this type of
computing has greatly expanded in recent years.
• What Is Big Data?
• Big data is the term for a collection of data sets so large and complex that it becomes
difficult to process using on-hand database management tools or traditional data
processing applications.
BIG DATA: DEFINITION …
• Generally speaking, big data is:
• Large datasets
• The category of computing strategies and technologies that are used to handle large datasets.
• In this context, “large dataset” means a dataset too large to reasonably process or store
with traditional tools or on a single computer.
BIG DATA CHARACTERISTICS – THE 4VS
• Big data differs from traditional data in the following ways:
• Volume: large amounts of data (up to zettabytes); massive datasets orders of magnitude larger
than traditional datasets.
• Velocity: Data is live streaming or in motion. The speed that data moves through the
system. Data is frequently flowing into the system from multiple sources and is often
processed in real-time.
• Variety: data comes in many different forms, quality and from diverse sources. (Social
media, server logs, sensors, …)
• Veracity: can we trust the data? How accurate is it? etc.
BIG DATA THE 4VS: INFOGRAPHIC (IBM)
BIG DATA SOLUTIONS: CLUSTERED COMPUTING
• Individual computers are often inadequate for handling big data at most stages.
• Clustered computing is used to better address the high storage and computational needs
of big data.
• Clustered computing is a form of computing in which a group of computers (often called
nodes) are connected through a LAN (local area network) so that they behave like a
single machine.
• The set of computers is called a cluster.
• The resources from these computers are pooled so that the cluster appears as one computer
more powerful than any of the individual machines.
BIG DATA SOLUTIONS: CLUSTERED COMPUTING …
• Big data clustering software combines the resources of many smaller machines, seeking
to provide a number of benefits:
• Resource Pooling: Combining the available storage space, CPU and memory is
extremely important.
• Processing large datasets requires large amounts of all three of these resources.
• High Availability: Clusters provide varying levels of fault tolerance and availability
guarantees to prevent hardware or software failures from affecting access to data and
processing.
• Increasingly important for real-time analytics of big data.
• Easy Scalability: Clusters make it easy to scale horizontally by adding more
machines to the group. The system can react to changes in resource requirements
without expanding the physical resources on a machine.
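Resource pooling and horizontal scaling can be illustrated with a toy model in Python. The node sizes are made up, and real cluster managers such as YARN also handle scheduling, membership, and failures; this sketch only shows the arithmetic of pooling and the effect of adding a machine.

```python
from dataclasses import dataclass


@dataclass
class Node:
    cpus: int
    memory_gb: int
    storage_tb: int


def pooled(cluster):
    """Combine the resources of all nodes so the cluster looks like one machine."""
    return Node(
        cpus=sum(n.cpus for n in cluster),
        memory_gb=sum(n.memory_gb for n in cluster),
        storage_tb=sum(n.storage_tb for n in cluster),
    )


cluster = [Node(8, 32, 4) for _ in range(3)]   # three ordinary (hypothetical) machines
print(pooled(cluster))   # Node(cpus=24, memory_gb=96, storage_tb=12)

# Horizontal scaling: add another machine instead of upgrading an existing one.
cluster.append(Node(8, 32, 4))
print(pooled(cluster))   # Node(cpus=32, memory_gb=128, storage_tb=16)
```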
BIG DATA SOLUTIONS: CLUSTERED COMPUTING …
• Using clusters requires a solution for managing cluster membership, coordinating
resource sharing, and scheduling actual work on individual nodes.
• Cluster membership and resource allocation can be handled by software such as Hadoop’s
YARN (which stands for Yet Another Resource Negotiator).
• The assembled computing cluster often acts as a foundation that other software
interfaces with to process the data.
• The machines involved in the computing cluster are also typically involved with the
management of a distributed storage system, which we will talk about when we discuss
data persistence.
BIG DATA: ACTIVITY 2.6
List and discuss the characteristics of big data.
Describe the big data life cycle.
Which step do you think is the most useful, and why?
List and describe each technology or tool used in the big data life cycle.
Discuss the three methods of computing over a large dataset.
BIG DATA SOLUTIONS: HADOOP
• Hadoop is an open-source framework intended to make interaction with big data easier.
• It is a framework that allows for the distributed processing of large datasets across
clusters of computers using simple programming models.
• The four key characteristics of Hadoop are:
• Economical: Its systems are highly economical as ordinary computers can be used for
data processing.
• Reliable: It is reliable as it stores copies of the data on different machines and is
resistant to hardware failure.
• Scalable: It is easily scalable, both horizontally and vertically.
• Flexible: It is flexible and you can store as much structured and unstructured data as you
need.
BIG DATA SOLUTIONS: HADOOP ECOSYSTEM
• Hadoop Ecosystem is a platform or a suite which provides various services to solve the
big data problems.
• Hadoop has an ecosystem that has evolved from its four core components: data
management, access, processing, and storage.
• It is continuously growing to meet the needs of Big Data.
• It comprises the following components and many others:
• HDFS: Hadoop Distributed File System
• YARN: Yet Another Resource Negotiator
• MapReduce: Programming based Data Processing
• Spark: In-Memory data processing
BIG DATA SOLUTIONS: HADOOP ECOSYSTEM …
• PIG, HIVE: Query-based processing of data services
• HBase: NoSQL Database
• Mahout, Spark MLLib: Machine Learning algorithm libraries
• Solr, Lucene: Searching and Indexing
• Zookeeper: Managing cluster
• Oozie: Job Scheduling
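The MapReduce programming model listed above can be illustrated with the classic word-count example. This is a single-process Python simulation of the map, shuffle, and reduce phases, not Hadoop's own Java API; on a real cluster, the map and reduce calls would run in parallel on different nodes over data stored in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


lines = ["big data needs big tools", "Hadoop processes big data"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts["big"], counts["data"])  # 3 2
```

Because each line is mapped independently and each word is reduced independently, both phases can be split across many machines, which is what makes the model suit clustered computing.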
ACTIVITY 2.7: ASSIGNMENT I – B (REPORT +
PRESENTATION)
• Discuss the purpose of each Hadoop Ecosystem components.
• Group 1,3,5,7:
BIG DATA LIFE CYCLE WITH HADOOP
1. Ingesting data into the system
• The first stage of Big Data processing is to Ingest data into the system.
• The data is ingested or transferred to Hadoop from various sources such as relational
databases, systems, or local files.
• Sqoop transfers data from RDBMS to HDFS, whereas Flume transfers event data.
2. Processing the data in storage.
• The second stage is Processing.
• In this stage, the data is stored and processed.
• The data is stored in the distributed file system, HDFS, and in the distributed NoSQL
database, HBase.
• Spark and MapReduce perform data processing.
BIG DATA LIFE CYCLE WITH HADOOP …
3. Computing and analyzing data
• The third stage is to Analyze Data
• Here, the data is analyzed by processing frameworks such as Pig, Hive, and Impala.
• Pig converts the data using map and reduce operations and then analyzes it.
• Hive is also based on the map and reduce programming and is most suitable for
structured data.
4. Visualizing the results
• The fourth stage is access, which is performed by tools such as Sqoop, Hive, Hue
and Cloudera Search.
• In this stage, the analyzed data can be accessed by users.
