Chapter 2

This document provides an introduction to emerging technologies and data science. It discusses key concepts related to data science including data vs. information, data types and representation, data processing cycles, and data value chains. It also covers basic concepts of big data and the Hadoop ecosystem. The intended learning outcomes are for students to understand these foundational data science topics.


Addis Ababa Science and Technology University
College of Mechanical and Electrical Engineering

Introduction to Emerging Technologies
Data science

26-May-20 By Dr. Dereje E. and Yonas T., AASTU


Contents
□ Learning outcomes
□ An overview of data science
□ Data Vs information
□ Data processing cycle
□ Data types and their representation
□ Data value chain
□ Basic concepts of big data
□ Hadoop ecosystem
□ Review questions



Learning outcomes
After successfully completing this chapter, students will be able to:
□Differentiate data and information
□Explain data processing life cycle
□Differentiate different data types from diverse perspectives
□Explain the data value chain
□Explain the basics of big data
□Analyze Hadoop ecosystem components and their use in big data



An Overview of Data Science
□ Data science is a multi-disciplinary field that uses scientific methods, processes,
algorithms, and systems to extract knowledge and insights from structured,
semi-structured and unstructured data.



Data Vs Information
Data:
□ Representation of facts, concepts, or instructions in a formalized manner, which
should be suitable for communication, interpretation, or processing, by human or
electronic machines.
□ Described as unprocessed facts and figures.
□ Represented with the help of characters such as alphabets (A-Z, a-z), digits (0-9) or
special characters (+, -, /, *, <,>, =, etc.).



…Data Vs Information
Information:
□ Processed data on which decisions and actions are based.
□ Data that has been processed into a form that is meaningful to the recipient and is of
real or perceived value in the current or the prospective action or decision of
recipient.
□ Interpreted data; created from organized, structured, and processed data in a
particular context.
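The distinction can be illustrated with a short Python sketch (the sales figures are invented for illustration): the individual records are data; the summary derived from them, placed in a meaningful context, is information.

```python
# Raw, unprocessed facts: individual sales records (data)
sales_records = [
    ("2020-05-01", 120.0),
    ("2020-05-02", 80.5),
    ("2020-05-03", 200.0),
]

# Processing: organize and summarize the data in a particular context
total = sum(amount for _, amount in sales_records)
average = total / len(sales_records)

# Information: a processed, interpretable result a decision can be based on
monthly_summary = {"total_sales": total, "average_sale": round(average, 2)}
print(monthly_summary)
```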



…Data Vs Information

Source: internet



Data Processing Cycle
□ Data processing is the re-structuring or re-ordering of data by people or machines to
increase its usefulness and add value for a particular purpose. It has three steps:
Input:
□ Data is prepared in a form convenient for processing. The form will depend on
the processing machine.
□ For example, when electronic computers are used for data processing, the input
data can be recorded on hard disk, CD, flash disk and so on.

Source: Introduction to emerging technology module page 23



Data Processing Cycle
Processing:
□ The input data is changed to produce data in a more useful form.
□ For example, interest can be calculated on deposit to a bank, or a summary of
sales for the month can be calculated from the sales orders.
Output:
□ The result of the processing step is collected. The particular form of the output
data depends on the use of the data.
□ For example, output data may be payroll for employees.
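The three steps can be sketched with the bank-deposit example mentioned above (the principal and interest rate are illustrative assumptions):

```python
# Input: data prepared in a form convenient for processing
deposit = 1000.0      # principal on deposit (illustrative figure)
annual_rate = 0.05    # 5% annual interest rate (illustrative figure)

# Processing: the input data is changed into a more useful form
interest = deposit * annual_rate
balance = deposit + interest

# Output: the result is collected in the form its use requires
print(f"Interest earned: {interest:.2f}, new balance: {balance:.2f}")
```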



Data types and their representation
1. Data types from Computer programming perspective: defines the operations
that can be done on the data, the meaning of the data, and the way values of that
type can be stored.
e.g., int, bool, char, float, double, string
2. Data types from Data Analytics perspective: there are three common data types
or structures: Structured, Semi-structured, and Unstructured.

Source: Introduction to emerging technology module page 25
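The programming-perspective idea can be sketched in Python (note Python has no separate char or double type: a character is a one-element string, and float is double-precision):

```python
# Each value carries a type that defines how it is stored and what
# operations are meaningful on it.
values = [42, True, "c", 3.14, "hello"]
for v in values:
    print(v, type(v).__name__)

# The type determines the meaning of an operation:
print(3 + 4)      # int addition
print("3" + "4")  # string concatenation, a different operation entirely
```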



Structured Data
□ It conforms to a tabular format with a relationship between the different rows and
columns.
□ Examples of structured data are Excel files or SQL databases. Each of these has
structured rows and columns that can be sorted.

Source: internet
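A minimal sketch of structured data using Python's built-in sqlite3 module (the table and the names in it are invented for illustration): fixed, typed columns make every row sortable and queryable.

```python
import sqlite3

# A structured dataset: a table with fixed columns and typed rows
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE students (name TEXT, score INTEGER)")
con.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Abel", 85), ("Sara", 92), ("Yonas", 78)],
)

# The tabular structure lets us sort and filter on any column
rows = con.execute(
    "SELECT name, score FROM students ORDER BY score DESC"
).fetchall()
print(rows)
con.close()
```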



Semi-structured data
□ It is a form of structured data that does not conform with the formal structure of data
models associated with relational databases or other forms of data tables, but
nonetheless, contains tags or other markers to separate semantic elements and
enforce hierarchies of records and fields within the data.
□ Examples of semi-structured data include JSON and XML.

Source: internet
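A small sketch using Python's json module (the record is invented for illustration): the keys act as the tags that mark semantic elements, and nesting carries the hierarchy of records and fields, with no fixed table schema.

```python
import json

# Semi-structured: no fixed columns, but keys ("tags") mark semantic
# elements and nesting enforces a hierarchy of records and fields
record = """
{
  "student": {
    "name": "Abel",
    "courses": ["Emerging Technologies", "Calculus"],
    "year": 2
  }
}
"""
data = json.loads(record)
print(data["student"]["name"])          # reached by key, not by column
print(len(data["student"]["courses"]))  # a nested, variable-length field
```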



Unstructured Data
□ It is information that either does not have a predefined data model or is not
organized in a pre-defined manner. Unstructured information is typically text-heavy,
but it may also contain data such as dates, numbers, and facts. This results in
irregularities and ambiguities that make it difficult to understand using traditional
programs, as compared to data stored in structured databases.
□ Examples of unstructured data include audio files, video files, or NoSQL databases.

Source: internet
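A small sketch of why unstructured text is harder to query (the note below is invented): the embedded dates and numbers must be extracted by pattern matching rather than read from a named column.

```python
import re

# Free text with no predefined model; facts are embedded in prose
note = "Meeting moved to 26-May-20. Budget is 5000 birr, approved by Sara."

# Extraction requires pattern matching, not a column lookup
dates = re.findall(r"\d{2}-[A-Za-z]{3}-\d{2}", note)
numbers = re.findall(r"\b\d+\b", note)
print(dates, numbers)
```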
Metadata – Data about Data
□ It is not a separate data structure, but it is one of the most important elements for
Big Data analysis and big data solutions.
□ Metadata is data about data. It provides additional information about a specific set
of data.
□ Example, In a set of photographs, metadata could describe when and where
the photos were taken.
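A minimal sketch of the photo example (all field names and values are illustrative): the metadata answers "when and where" without inspecting the image content itself.

```python
# A photo (the data) alongside its metadata (data about the data)
photo = {
    "pixels": b"...",  # the image content itself (placeholder)
    "metadata": {
        "taken_at": "2020-05-26T10:30:00",
        "location": "Addis Ababa",
        "camera": "Canon EOS",
    },
}

# Metadata describes the data without opening the image itself
meta = photo["metadata"]
print(f"Taken {meta['taken_at']} in {meta['location']}")
```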



Data value Chain
□ The Data Value Chain is introduced to describe the information flow within a big
data system as a series of steps needed to generate value and useful insights from
data.

Source: Introduction to emerging technology module page 26



…Data value Chain
Data Acquisition:
□ The process of gathering, filtering, and cleaning data before it is put in a data
warehouse or any other storage solution on which data analysis can be carried out.
□ It is one of the major big data challenges in terms of infrastructure requirements,
because the infrastructure must deliver low, predictable latency in both capturing data
and in executing queries; be able to handle very high transaction volumes, often in a
distributed environment; and support flexible and dynamic data structures.



…Data value Chain
Data Analysis:
□ Concerned with making the raw data acquired amenable to use in decision-making
as well as domain-specific usage.
□ Involves exploring, transforming, and modeling data with the goal of highlighting
relevant data, synthesizing and extracting useful hidden information with high
potential from a business point of view.



…Data value Chain
Data Curation:
□ The active management of data over its life cycle to ensure it meets the necessary
data quality requirements for its effective usage.
□ Its processes can be categorized into different activities such as content creation,
selection, classification, transformation, validation, and preservation.
□ Data curation is performed by expert curators who are responsible for improving the
accessibility and quality of data.
□ Data curators hold the responsibility of ensuring that data are
trustworthy, discoverable, accessible, reusable and fit their purpose.



…Data value Chain
Data Storage:
□ The persistence and management of data in a scalable way that satisfies
the needs of applications that require fast access to the data.
□ Relational Database Management Systems (RDBMS) have been the
main, and almost the only, solution to the storage problem. However,
the ACID (Atomicity, Consistency, Isolation, and Durability)
properties that guarantee database transactions lack flexibility with
regard to schema changes.
□ NoSQL technologies have been designed with the scalability goal in
mind and present a wide range of solutions based on alternative data
models.
…Data value Chain
Data Usage:
□ It covers the data-driven business activities that need access
to data, its analysis, and the tools needed to integrate the
data analysis within the business activity.
□ Data usage in business decision-making can
enhance competitiveness through the reduction of costs,
increased added value, or any other parameter that can be
measured against existing performance criteria.



Basic concepts of big data
□ Big data is the term for a collection of data sets so large and complex that it
becomes difficult to process using on-hand database management tools or traditional
data processing applications.
□ In this context, a “large dataset” means a dataset too large to reasonably process or
store with traditional tooling or on a single computer.



Basic concepts of big data
Big data is characterized by the 3Vs (Volume, Velocity, and Variety) and more:

Source: Introduction to emerging technology module page 29



Clustered Computing
□ Individual computers are often inadequate for handling big data at most stages.
□ To address the high storage and computational needs of big data, computer clusters
are needed.
□ Big data clustering software combines the resources of many smaller
machines, seeking to provide a number of benefits:
□ Resource Pooling: combines available storage space, CPU, …
□ High Availability: fault tolerance and availability
□ Easy Scalability: resource requirements can grow without expanding the
physical resources of any single machine
□ A good example of clustering software is Hadoop’s YARN.



Hadoop and its Ecosystem
□ Hadoop is an open-source framework intended to make interaction with big data
easier. It is a framework that allows for the distributed processing of large datasets
across clusters of computers using simple programming models.
□ Gives massive data storage, enormous computational power, and the ability
to handle virtually limitless concurrent jobs or tasks.
□ The four key characteristics of Hadoop are:
□ Economical: ordinary computers can be used for data processing
□ Reliable: stores copies of data on different machines (resistant to HW failure)
□ Scalable: expands horizontally or vertically by adding a few extra nodes
□ Flexible: stores as much structured and unstructured data as you need



…Hadoop and its Ecosystem
Hadoop Ecosystem has evolved from its four core components:
1. Data management,
2. Data access,
3. Data processing, and
4. Data storage.
It is continuously growing to meet the needs of Big Data.



Source: Introduction to emerging technology module page 31
Big Data Life Cycle with Hadoop
Has 4 stages:
1. Ingesting: transferring data into Hadoop from various sources such as relational
databases, other systems, or local files. Sqoop transfers data from RDBMS to HDFS.
2. Processing: the data is stored and processed. The data is stored in the distributed
file system, HDFS, and in the NoSQL distributed database, HBase. Spark and
MapReduce perform the data processing.
3. Computing and analyzing: the data is analyzed using processing frameworks such
as Pig, Hive, and Impala. Pig converts the data using map and reduce operations and
then analyzes it.
4. Visualizing: accessing the results, performed by tools such as Hue and Cloudera
Search.
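The map-and-reduce model underlying MapReduce and Pig (stages 2 and 3 above) can be sketched in plain Python with a word count, the classic example (the input lines are invented): a map step emits key-value pairs, and a reduce step aggregates the values per key.

```python
from collections import defaultdict

lines = ["big data needs big tools", "hadoop processes big data"]

# Map: emit a (word, 1) pair for every word in every line
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle/Reduce: group the pairs by key and sum the counts per word
counts = defaultdict(int)
for word, n in mapped:
    counts[word] += n

print(dict(counts))
```

In a real cluster, the map and reduce steps run in parallel on many machines and the framework handles the grouping (shuffle) between them; the logic per step is this simple.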



Review Questions
□ Briefly explain the difference between data and information.
□ Discuss data and its types from the computer programming and data
analytics perspectives.
□ Briefly explain each step of the data value chain.
□ List and discuss the characteristics of big data.
□ What is the Hadoop ecosystem? What is it used for?



END!!
