Data Science Big Data Notes

Data Science is an interdisciplinary field focused on extracting insights from data, while Big Data involves analyzing large datasets to uncover patterns. Key components of Data Science include data collection, cleaning, analysis, and visualization, utilizing tools like Python and SQL. Big Data is defined by its 5 V's and employs technologies such as Hadoop and Apache Spark, with applications across various sectors including business, healthcare, and cybersecurity.

Uploaded by

MarieFernandes

Notes: Data Science and Big Data

1. Introduction

Data Science is an interdisciplinary field that uses scientific methods, algorithms, and
systems to extract knowledge and insights from structured and unstructured data.
Big Data refers to extremely large datasets that may be analyzed computationally to reveal
patterns, trends, and associations, especially relating to human behavior and interactions.

2. Components of Data Science

- Data Collection and Storage
- Data Cleaning and Preprocessing
- Exploratory Data Analysis (EDA)
- Statistical Analysis and Machine Learning
- Data Visualization
- Deployment and Communication of Results
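
The components above can be sketched as a small end-to-end pipeline. This is a minimal illustration using only the Python standard library, not a production workflow: the sample data, column names, and function names (`collect`, `clean`, `explore`, `fit_line`) are all hypothetical, and a least-squares line stands in for the "Statistical Analysis and Machine Learning" step.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a real source (file, API, database).
# The third data row has a missing exam score on purpose.
RAW_CSV = "\n".join([
    "hours_studied,exam_score",
    "1,52",
    "2,58",
    "3,",
    "4,71",
    "5,75",
])


def collect(raw_text):
    """Data collection: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(rows):
    """Data cleaning: drop rows with missing values, convert the rest to floats."""
    cleaned = []
    for row in rows:
        if all(value not in (None, "") for value in row.values()):
            cleaned.append({key: float(value) for key, value in row.items()})
    return cleaned


def explore(rows, column):
    """Exploratory data analysis: basic summary statistics for one column."""
    values = [row[column] for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(rows, x_col, y_col):
    """Statistical analysis: ordinary least-squares fit of y = slope * x + intercept."""
    xs = [row[x_col] for row in rows]
    ys = [row[y_col] for row in rows]
    x_mean = statistics.mean(xs)
    y_mean = statistics.mean(ys)
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_mean - slope * x_mean


# Communication of results: report the summary and the fitted relationship.
data = clean(collect(RAW_CSV))
summary = explore(data, "exam_score")
slope, intercept = fit_line(data, "hours_studied", "exam_score")
print(f"{summary['count']} usable rows, mean score {summary['mean']:.1f}")
print(f"Each extra hour is associated with about {slope:.1f} more points")
```

In practice, libraries such as Pandas and Scikit-learn would typically replace the hand-written cleaning and fitting steps, and a chart would usually accompany the printed summary.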

3. Tools and Technologies in Data Science

- Programming Languages: Python, R
- Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn
- Tools: Jupyter Notebook, Tableau, Power BI
- Databases: Relational (queried with SQL), NoSQL (e.g., MongoDB)
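
As a small illustration of how Python and SQL are often used together, the sketch below runs an aggregate SQL query from Python. It assumes SQLite (via the standard-library `sqlite3` module) as a stand-in for whichever relational database is actually in use; the `sales` table, its columns, and the function name are hypothetical.

```python
import sqlite3


def average_amount_by_region(sales):
    """Load (region, amount) pairs into an in-memory table and average them per region."""
    conn = sqlite3.connect(":memory:")  # temporary database, discarded on close
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        # Parameterized inserts avoid building SQL strings by hand.
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", sales)
        cursor = conn.execute(
            "SELECT region, AVG(amount) FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample records.
SAMPLE_SALES = [("north", 100.0), ("south", 80.0), ("north", 140.0)]
print(average_amount_by_region(SAMPLE_SALES))
```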

4. Introduction to Big Data

Big Data is characterized by the 5 V’s:

- Volume: Large amount of data
- Velocity: Speed of data generation and processing
- Variety: Different types of data (structured, unstructured)
- Veracity: Trustworthiness of data (accuracy, noise, and uncertainty)
- Value: Insights derived from data

5. Big Data Technologies

- Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, HBase
- Apache Spark: Fast in-memory big data processing
- NoSQL Databases: MongoDB, Cassandra
- Data Lakes and Cloud Storage (AWS, Azure)
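
To make the MapReduce idea from the list above concrete, here is a word-count sketch in plain Python. It is not the Hadoop or Spark API and runs in a single process; it only imitates the map, shuffle, and reduce phases that those frameworks distribute across many machines. All function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


print(word_count(["big data", "Big insights"]))
```

In a real cluster, each document (or block of a file in HDFS) would be mapped on a different node, and the shuffle step would move data over the network so that all values for one key reach the same reducer.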

6. Applications

- Business Intelligence and Analytics
- Healthcare and Genomics
- Social Media and Web Analytics
- Fraud Detection and Cybersecurity
- E-commerce and Recommendation Systems

7. Challenges

- Data Privacy and Security
- Data Integration and Cleaning
- Real-time Processing
- Lack of Skilled Professionals
