
ITE01 – IT ELECTIVE 1 (Foundations of Data Science) Module 1

Key concepts in big data, data science and data analytics.


What is Big Data?
Big data has become a major component of the tech world today thanks to the actionable insights and results businesses can glean from it. However, creating such large datasets also requires understanding them and having the proper tools on hand to parse through them and uncover the right information.
To better comprehend big data, the fields of data science and analytics have gone from being largely confined to academia to becoming integral elements of Business Intelligence (BI) and big data analytics tools.

What is Data Science?


Data Science is the area of study that involves extracting insights from vast amounts of data using various scientific methods, algorithms, and processes. It helps you discover hidden patterns in raw data. The term Data Science emerged with the evolution of mathematical statistics, data analysis, and big data (see Figure 1).
Furthermore, Data Science is an interdisciplinary field that allows you to extract knowledge from structured or unstructured data. It enables you to translate a business problem into a research project and then translate the results back into a practical solution (see Figure 2).

Figure 1: Data Science Components

Statistics:
Statistics is the most critical component of data science. It is the science of collecting and analyzing numerical data in large quantities to obtain useful insights.

Prepared by: MR. ARNALDY D. FORTIN, MBA, MCS Page 1 of 10


ITE01 – IT ELECTIVE 1 (Foundations of Data Science) Module 1
Visualization:
Visualization techniques help you present huge amounts of data as easy-to-understand, digestible visuals.

Machine Learning:
Machine Learning explores the building and study of algorithms that learn from data to make predictions about unseen or future data.

Deep Learning:
Deep learning is a newer area of machine learning research in which the algorithm learns the analysis model to follow directly from the data, using multi-layered neural networks.

Figure 2: Evolution of Data Sciences



Figure 3: Data Science Process

Data Science Process


1. Discovery:
The discovery step involves acquiring data from all identified internal and external sources that can help you answer the business question.
The data can be:
 Logs from webservers
 Data gathered from social media
 Census datasets
 Data streamed from online sources using APIs (Application Programming Interface)
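Data streamed from APIs usually arrives as JSON. As a minimal sketch (the payload and field names here are hypothetical), Python's standard json module can turn such a response into workable records:

```python
import json

# Hypothetical JSON payload, as might be returned by a web API
payload = '[{"user": "ana", "likes": 12}, {"user": "ben", "likes": 7}]'

records = json.loads(payload)              # parse JSON text into Python objects
total_likes = sum(r["likes"] for r in records)
print(total_likes)                         # 19
```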

2. Data Preparation:
Data can have many inconsistencies, such as missing values, blank columns, and incorrect data formats, all of which need to be cleaned. You need to process, explore, and condition the data before modeling. The cleaner your data, the better your predictions.
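A minimal sketch of such conditioning, on a hypothetical record set containing a missing value, inconsistent formatting, and a duplicate:

```python
# Made-up raw records with the usual problems: missing value, messy
# formatting, and a duplicate entry
raw = [
    {"name": "Ana",  "age": "34"},
    {"name": "Ben",  "age": ""},      # missing value
    {"name": "ana ", "age": "34"},    # duplicate with inconsistent format
]

cleaned, seen = [], set()
for rec in raw:
    name = rec["name"].strip().title()  # normalize the formatting
    if not rec["age"]:                  # drop rows with missing values
        continue
    if name in seen:                    # drop duplicates
        continue
    seen.add(name)
    cleaned.append({"name": name, "age": int(rec["age"])})

print(cleaned)   # [{'name': 'Ana', 'age': 34}]
```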

3. Model Planning:
In this stage, you determine the methods and techniques for drawing relationships between the input variables. Model planning is performed using various statistical formulas and visualization tools. SQL Analysis Services, R, and SAS/ACCESS are some of the tools used for this purpose.

4. Model Building:
In this step, the actual model building starts. The data scientist splits the dataset into training and testing sets. Techniques such as association, classification, and clustering are applied to the training dataset. Once prepared, the model is tested against the testing dataset.
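The split-train-test idea can be sketched with a toy nearest-centroid classifier (the dataset is invented; real projects would typically use a library such as scikit-learn):

```python
# Toy 1-D dataset: small values are class 0, large values are class 1
train = [(1, 0), (2, 0), (3, 0), (10, 1), (11, 1), (12, 1)]
test  = [(4, 0), (13, 1)]          # held-out "testing" dataset

# "Model building": one centroid (mean) per class from the training set
c0 = sum(x for x, y in train if y == 0) / 3
c1 = sum(x for x, y in train if y == 1) / 3

# Evaluation: classify each held-out point by its nearest centroid
predict = lambda x: 0 if abs(x - c0) <= abs(x - c1) else 1
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(accuracy)   # 1.0
```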

5. Operationalize:
In this stage, you deliver the final baselined model along with reports, code, and technical documents. After thorough testing, the model is deployed into a real-time production environment.

6. Communicate Results:
In this stage, the key findings are communicated to all stakeholders. This helps you decide whether the project is a success or a failure based on the model's outputs.

What is Data Analytics?


Data analytics focuses on processing and performing statistical analysis on existing datasets. Analysts concentrate on creating methods to capture, process, and organize data to uncover actionable insights for current problems, and on establishing the best way to present this data. Put simply, the field of data analytics is directed toward solving problems for questions we know we don't know the answers to. More importantly, it is based on producing results that can lead to immediate improvements.
Data analytics also encompasses several branches of broader statistics and analysis that help combine diverse sources of data and locate connections while simplifying the results.

How Data Scientists Are Different From Data Analysts:


1. A data scientist uses their skills to predict the future from past patterns, while the work of a data analyst is to find meaningful information in the provided data.
2. A data scientist analyzes the data and raises questions, while a data analyst finds answers to the various issues raised by businesspeople. In short, a data scientist is more about the "what if", while a data analyst is involved in day-to-day analysis.
3. The work of a data scientist is not only to address business problems but also to provide accurate predictions about the business. Data analysts, however, only address business issues; the rest lies in the hands of the administration.
4. To extract information from data, a data scientist typically uses machine learning, while a data analyst relies on tools such as R and SAS (R programming and Statistical Analysis Software).
5. Data scientists combine different sources and establish links between them; they primarily use diverse sources and explore and examine them. Data analysts, however, usually investigate and examine data from a single reference.
6. Data scientists are evaluated on the accuracy of their predictions, whereas data analysts' work is to answer the questions provided to them by management.

7. Data scientists formulate questions whose answers will prove beneficial to the business. Data analysts, on the other hand, only solve a given set of questions and hand the results to the authorities.

Data Science Jobs & Roles


The most prominent data science job titles are:
 Data Scientist
 Data Engineer
 Data Analyst
 Statistician
 Data Architect
 Data Admin
 Business Analyst
 Data/Analytics Manager

Let's learn what each role entails in detail:


1. Data Scientist:
Role: A Data Scientist is a professional who manages enormous amounts of data to produce compelling business insights using various tools, techniques, methodologies, algorithms, etc.
Languages: R, SAS, Python, SQL, Hive, MATLAB, Pig, Spark

2. Data Engineer:
Role: A data engineer works with large amounts of data, developing, constructing, testing, and maintaining architectures such as large-scale processing systems and databases.
Languages: SQL, Hive, R, SAS, MATLAB, Python, Java, Ruby, C++, and Perl

3. Data Analyst:
Role: A data analyst is responsible for mining vast amounts of data, looking for relationships, patterns, and trends. He or she then delivers compelling reporting and visualizations for analyzing the data to support the most viable business decisions.
Languages: R, Python, HTML, JS, C, C++, SQL

4. Statistician:
Role: A statistician collects, analyzes, and interprets qualitative and quantitative data using statistical theories and methods.
Languages: SQL, R, MATLAB, Tableau, Python, Perl, Spark, and Hive

5. Data Administrator:
Role: A data administrator ensures that the database is accessible to all relevant users, performs correctly, and is kept safe from hacking.
Languages: Ruby on Rails, SQL, Java, C#, and Python

6. Business Analyst:
Role: This professional works to improve business processes, acting as an intermediary between the business executive team and the IT department.
Languages: SQL, Tableau, Power BI, and Python

Tools and Techniques of Data Science


Big data is a term used in data science that refers to the huge amount of data collected for research and analysis. It goes through various processes: it is first collected, stored, filtered, classified, validated, analyzed, and then processed for final visualization (Ngiam and Khor, 2019).
The tools and techniques of data science are two different things. Techniques are sets of procedures followed to perform a task, whereas a tool is a piece of equipment used to apply a technique to perform that task.
Data scientists apply operational methods, called techniques, to the data through various software programs, known as tools. This combination is used to acquire data, refine it for its intended purpose, manipulate and label it, and then examine the results for the best possible outcomes.
These methods, used by data scientists and engineers, cover all operations from collecting data to storing and manipulating it, performing statistical analysis on it, visualizing it with bars and charts, and preparing predictive models for insights.
These processes are carried out with the help of several tools and techniques drawn from the underlying disciplines of mathematics, statistics, and computer science.
The lifecycle of a data science project is composed of various stages. Data passes through each stage and is transformed into the information required by the respective field. Here we will look at the most efficient, quick, and productive tools and techniques used by data scientists to accomplish their tasks at each stage.

 Techniques
Which mathematical and statistical techniques do you need to learn for data science? A number of them are used for data collection, modification, storage, analysis, insight, and representation. Data analysts and scientists mostly work with the following statistical analysis techniques:
 Probability and Statistics
 Distribution
 Regression analysis
 Descriptive statistics
 Inferential statistics
 Non-Parametric statistics
 Hypothesis testing
 Linear Regression
 Logistic Regression
 Neural Networks
 K-Means clustering
 Decision Trees


The list doesn't end here, but if you have studied statistics and mathematics, you will have an idea of how the theories and techniques of sampling and correlation work, particularly when you work as a data scientist and need to draw conclusions, research patterns, and target insights (Sivarajah, Kamal, Irani, and Weerakkody, 2017).
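As one concrete instance from the list above, simple linear regression can be computed directly from the least-squares formulas (the data points here are synthetic, generated from y = 2x + 1):

```python
# Ordinary least squares for y = slope*x + intercept
xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]       # synthetic points on the line y = 2x + 1

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# slope = covariance(x, y) / variance(x); intercept follows from the means
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
      / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x
print(slope, intercept)   # 2.0 1.0
```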

 Tools
Let us start exploring the tools used to work on data in its different processes. As mentioned earlier, data goes through many processes in which it is collected, stored, worked on, and analyzed.
For your easy understanding, the tools defined here are categorized according to their processes.
The first process is data collection. Although data can be collected through various methods, including online surveys, interviews, and forms, the information gathered has to be transformed into a readable form before a data analyst can work on it. The following tools can be used for data collection.

1. Data Collection Tools


Text Analysis is about parsing texts in order to extract machine-readable facts from them.
The purpose of Text Analysis is to create structured data out of free text content. The
process can be thought of as slicing and dicing heaps of unstructured, heterogeneous
documents into easy-to-manage and interpret data pieces.
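A tiny illustration of this slicing and dicing, using only the standard library to turn free text into structured word counts (the sample sentence is invented):

```python
import re
from collections import Counter

text = "Data science turns raw data into insight; raw data alone is not insight."

# Tokenize: lowercase words only, punctuation stripped
tokens = re.findall(r"[a-z']+", text.lower())
freq = Counter(tokens)            # structured data out of free text

print(freq.most_common(2))        # [('data', 3), ('raw', 2)]
```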

 Semantria
Semantria is a cloud-based tool that extracts data and information by analyzing the text and the sentiment in it. It is a high-end tool based on NLP (natural language processing) that can detect sentiment toward specific elements from the language used (sounds like magic? No, it is science!).

 Trackur
It is yet another tool that collects data, especially on social media platforms, by tracking feedback on brands and products. It also performs sentiment analysis. It is a monitoring tool that can be of great value to marketing companies.

Today, many other apps use similar text/semantic analysis and content management, e.g., OpenText and Opinion Crawl.

2. Data Storage Tools


These tools are used to store huge amounts of data, typically spread across shared computers, and to interact with it. They provide a platform for uniting servers so that the data can be accessed easily.

 Apache Hadoop
It is a software framework for dealing with huge data volumes and their computation. It provides a layered structure that distributes the storage of data among clusters of computers so that big data can be processed easily.
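The core idea of distributing storage across a cluster can be sketched by hash-partitioning record keys over nodes (the node count and keys are hypothetical; Hadoop's actual placement logic is far more involved):

```python
import hashlib

NODES = 3   # hypothetical cluster of three storage nodes

def node_for(key: str) -> int:
    """Deterministically map a record key to one node in 0..NODES-1."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NODES

records = ["user:1", "user:2", "user:3", "user:4", "user:5", "user:6"]
placement = {r: node_for(r) for r in records}
# Every record lands on exactly one node; the same key always maps
# to the same node, which is what makes lookups possible later.
```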


 Apache Cassandra
This tool is a free, open-source platform. It uses CQL (Cassandra Query Language), an SQL-like language, to communicate with the database. It provides swift availability of data stored across various servers.

 MongoDB
It is a document-oriented database that is free to use. It is available on multiple platforms, including Windows, Solaris, and Linux. It is easy to learn and reliable.

Similar data storage platforms are CouchDB, Apache Ignite, and Oracle NoSQL Database.

3. Data Extraction Tools


Data extraction tools are also known as web scraping tools. They automatically extract information and data from websites. The following tools can be used for data extraction.

 OctoParse
It is a web scraping tool available in both free and paid versions. It outputs data as structured spreadsheets, which are readable and easy to use for further operations. It can extract phone numbers, IP addresses, and email IDs, along with other data, from websites.

 Content Grabber
It is also a web scraping tool but comes with advanced capabilities such as debugging and error handling. It can extract data from almost any website and provide structured output in user-preferred formats.

Similar tools are Mozenda, Pentaho, and import.io.
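A minimal scraping sketch with only the standard library: parse an HTML fragment (hard-coded here; a real scraper would fetch it over HTTP) and pull out the email addresses, much as the tools above do:

```python
import re
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this over HTTP
html = ('<p>Contact <a href="mailto:sales@example.com">sales</a> '
        'or support@example.com for details.</p>')

class TextAndAttrs(HTMLParser):
    """Collect both visible text and attribute values (mailto: links etc.)."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        self.chunks.append(data)
    def handle_starttag(self, tag, attrs):
        self.chunks.extend(v for _, v in attrs if v)

parser = TextAndAttrs()
parser.feed(html)

emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", " ".join(parser.chunks))
print(emails)   # ['sales@example.com', 'support@example.com']
```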

4. Data Cleaning / Refining Tools


Integrated with databases, data cleaning tools save time by searching, sorting, and filtering the data to be used by data analysts. The refined data becomes relevant and easy to use (Blei and Smyth, 2017).

 DataCleaner
DataCleaner works with the Hadoop ecosystem and is a very powerful data indexing tool. It improves the quality of data by removing duplicates and merging them into one record. It can also find missing patterns and specific data groups.

 OpenRefine
This refining tool deals with messy data, cleaning it before transforming it into another form. It provides fast and easy access to the data.

Similar data cleaning tools are MapReduce, RapidMiner, and Talend.
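The duplicate-merging behavior described above (consolidating duplicates into one record) can be sketched on made-up rows: rows sharing an id are merged, filling each field from the first row that has it:

```python
# Made-up rows with a duplicate id and missing fields
rows = [
    {"id": 1, "email": "ana@example.com", "phone": None},
    {"id": 1, "email": None,              "phone": "555-0100"},
    {"id": 2, "email": "ben@example.com", "phone": None},
]

merged = {}
for row in rows:
    rec = merged.setdefault(row["id"], {})
    for field, value in row.items():
        if value is not None and rec.get(field) is None:
            rec[field] = value   # fill from the first row that has the field

print(merged[1])   # {'id': 1, 'email': 'ana@example.com', 'phone': '555-0100'}
```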

5. Data Analysis Tools


Data analysis tools not only analyze data but also perform certain operations on it. These tools inspect the data and apply data modeling to draw out useful, conclusive information that helps in decision making for a given problem or query.

 R
R is a widely used programming language for statistical computing and graphics. It supports platforms such as Windows, macOS, and Linux, and is widely used by data analysts, statisticians, and researchers.

 Apache Spark
Apache Spark is a powerful analytical engine that provides real-time analysis and data processing, supporting both micro-batches and streaming. It is productive because it provides highly interactive workflows.

 Python
Python is a powerful, high-level programming language that has been around for quite a while. Originally used for application development, it has since gained a rich ecosystem of tools for data science. It can produce output files in CSV format for use as spreadsheets.

Similar data analysis tools are Apache Storm, SAS, Flink, and Hive.
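A dependency-free sketch of this analyze-then-export flow, using hypothetical sales figures: compute descriptive statistics and write them out in the CSV form mentioned above:

```python
import csv
import io
import statistics

# Hypothetical daily sales figures to analyze
sales = [120, 135, 128, 150, 142, 138]

summary = {
    "mean":   statistics.mean(sales),
    "median": statistics.median(sales),
    "stdev":  round(statistics.stdev(sales), 2),
}

# Export the summary as CSV, a spreadsheet-friendly format
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(summary.keys())
writer.writerow(summary.values())
print(buf.getvalue())
```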

6. Data Visualization Tools


Data visualization tools are used to present data in a graphical representation for clear insight.
Many visualization tools are a combination of previous functions we discussed and can also
support data extraction and analysis along with visualization.

 Python
Python, as mentioned above, is a powerful, general-purpose programming language that also provides data visualization. It offers a wealth of graphical libraries to support the graphical representation of a wide variety of data.
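Libraries such as matplotlib produce the real graphics; as a dependency-free sketch of the underlying idea, values can simply be mapped to bar lengths (the counts here are invented):

```python
# Map each value to a horizontal bar; real charts use graphical libraries
counts = {"R": 4, "Python": 9, "SQL": 6}

chart_lines = []
for label, value in counts.items():
    chart_lines.append(f"{label:>6} | {'#' * value}")
print("\n".join(chart_lines))
#      R | ####
# Python | #########
#    SQL | ######
```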

 Tableau
Having a very large consumer market, Tableau has been referred to by Forbes as the grandmaster of visualization software. It is commercial software that can be integrated with databases, is easy to use, and furnishes interactive data visualizations in the form of bars, charts, and maps.

 Orange
Orange also happens to be an open-source data visualization tool supporting data
extraction, data analysis, and machine learning. It does not require programming but
rather has an interactive and user-friendly graphical user interface that displays the data
in the form of bar charts, networks, heat maps, scatter plots, and trees.

 Google Fusion Tables
It was a web service powered by Google (discontinued in 2019) that could easily be used by non-programmers for collecting data. You could upload your data as CSV files and save them. It looked much like an Excel spreadsheet and allowed editing, with real-time changes reflected in the visualizations. It displayed data in the form of pie charts, bars, timelines, line plots, and scatter plots, and allowed you to link data tables to your websites. You could also create a map based on your data, which could be further modified with coloring and shared.

Similar popular data visualization apps and tools are Datawrapper, Qlik, and Gephi, all of which accept CSV files as data input.
