Selected topic D4444oc
COURSE CODE: CoSc4036
GROUP ASSIGNMENT
NAME ID
HANA FISA
YESHIMEBIT BERIE
SUBMITTED TO: YESHAMBEL A.
The significance of Big Data lies not only in its volume but also in its variety, velocity,
veracity, and value. Volume refers to the sheer amount of data, while variety highlights the
different formats such as text, audio, video, and structured data. Velocity represents the speed
at which data is generated and processed, whereas veracity concerns the reliability and
quality of data. The ultimate goal is to extract value from data, enabling data-driven decision-
making and innovation.
Big Data technologies encompass a wide range of tools and frameworks designed to manage,
process, and analyze vast amounts of data. Key technologies include:
Apache Spark: A fast, in-memory data processing engine that supports batch
processing, stream processing, and machine learning. Spark's ability to perform
computations in memory significantly improves performance compared to Hadoop
MapReduce, which writes intermediate results to disk. It also provides APIs in multiple
languages, including Python, Java, and Scala, making it versatile for different
applications; a minimal PySpark sketch appears after this list.
NoSQL Databases: Non-relational databases such as MongoDB, Cassandra, and Redis are
designed to handle semi-structured and unstructured data. These databases provide
horizontal scalability, flexible schemas, and low-latency data access. MongoDB is widely
used for document-based storage, while Cassandra excels in high availability and fault
tolerance; a short MongoDB example also appears after this list.
Cloud Platforms: Cloud providers such as AWS, Microsoft Azure, and Google Cloud
offer scalable storage, computing power, and managed services like data warehousing,
machine learning, and big data analytics. These platforms provide services such as
Amazon Redshift, Google BigQuery, and Azure Data Lake that enable businesses to
perform large-scale data analytics.
Data Lakes: Central repositories that allow organizations to store vast amounts of raw
data in its native format. Data lakes offer flexible storage, making them suitable for
handling structured, semi-structured, and unstructured data. They are often built on
cloud storage solutions and support advanced analytics using machine learning
algorithms.
Edge Computing: A decentralized computing model that processes data closer to the
data source. Edge computing enhances real-time data processing, reduces latency, and
minimizes bandwidth usage. It is widely used in IoT applications where immediate data
processing is necessary for decision-making.
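To make Spark's in-memory processing model concrete, here is a minimal PySpark sketch. It assumes PySpark is installed locally and reads a hypothetical file named events.csv with country and amount columns; the file name, column names, and application name are illustrative assumptions, not part of any real dataset.

# Minimal PySpark sketch: read a CSV, filter, and aggregate in memory.
# The file "events.csv" and its "country"/"amount" columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("BigDataDemo").getOrCreate()

# Batch processing: load structured data into a DataFrame.
events = spark.read.csv("events.csv", header=True, inferSchema=True)

# Transformations are lazy; Spark keeps intermediate data in memory across
# stages, which is where its speed advantage over disk-based MapReduce comes from.
totals = (events
          .filter(F.col("amount") > 0)
          .groupBy("country")
          .agg(F.sum("amount").alias("total_amount")))

totals.show()   # Action: triggers the actual computation.
spark.stop()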
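Similarly, the following sketch shows document-based storage in MongoDB through the pymongo driver. It assumes a MongoDB server is running on localhost:27017; the database, collection, and document contents are invented for illustration.

# Minimal MongoDB sketch with pymongo: store and query JSON-like documents.
# The "shop" database and "orders" collection are illustrative only.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

# Documents need no fixed schema, which suits semi-structured data.
orders.insert_one({"customer": "Hana", "items": ["laptop"], "total": 950})
orders.insert_one({"customer": "Yeshi", "items": ["phone", "case"], "total": 420})

# Query and index much like a regular database.
orders.create_index("customer")
for doc in orders.find({"total": {"$gt": 500}}):
    print(doc["customer"], doc["total"])

client.close()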
Big Data technologies continue to evolve, integrating artificial intelligence, machine learning,
and blockchain to improve performance, security, and insight extraction. These technologies
play a crucial role in building scalable and efficient Big Data architectures.
Efficient data storage and management are critical components of Big Data systems. Various
storage solutions cater to different data types and business needs:
On-Premises Storage: Traditional Storage Area Networks (SAN) and Network Attached
Storage (NAS) systems provide high-speed data access and security but require
significant upfront investment and maintenance. These systems offer complete control
over data but can become expensive and difficult to scale. They are often used in
industries where data security and regulatory compliance are paramount, such as
finance and healthcare.
Cloud Storage: Cloud platforms such as Amazon S3, Google Cloud Storage, and
Microsoft Azure Blob Storage offer scalable, cost-efficient, and reliable data storage.
Cloud storage provides on-demand capacity, pay-as-you-go pricing, and built-in disaster
recovery features, making it suitable for businesses of all sizes; a short Amazon S3
example appears after this list.
Distributed Storage: Distributed file systems like Hadoop Distributed File System
(HDFS) and Ceph distribute data across multiple nodes, ensuring fault tolerance and
high availability. This approach enhances data reliability and performance while
enabling seamless scalability. Distributed storage is widely adopted in environments
with large datasets that require high-speed processing.
Object Storage: Object storage systems store data as objects with metadata and unique
identifiers. Examples include Amazon S3 and OpenStack Swift. This method is
particularly suited for unstructured data such as images, videos, and backups. Object
storage provides high scalability, redundancy, and data encryption for secure storage.
In addition to choosing a storage solution, organizations need sound data management practices:
Data Governance: Establishing policies and procedures to ensure data quality, security,
and compliance with regulations such as GDPR and HIPAA. Data governance
frameworks help organizations maintain data integrity, privacy, and accessibility.
Data Lifecycle Management: Managing data from its creation, storage, and archiving to
its eventual disposal. Automated lifecycle policies help reduce storage costs and ensure
regulatory compliance; a lifecycle-policy sketch appears after this list.
Backup and Disaster Recovery: Implementing regular backups and disaster recovery
plans to safeguard data against loss and corruption. Cloud storage solutions often offer
automated backup services with multiple redundancy options.
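As a concrete illustration of cloud object storage, the sketch below uploads, downloads, and lists objects in Amazon S3 using the boto3 library. It assumes AWS credentials are already configured on the machine; the bucket name example-bigdata-bucket and the file names are hypothetical.

# Minimal cloud object-storage sketch with boto3 (Amazon S3).
# Bucket and file names are illustrative assumptions.
import boto3

s3 = boto3.client("s3")

# Upload a local file as an object; the key acts as its unique identifier.
s3.upload_file("sales_2024.csv", "example-bigdata-bucket", "raw/sales_2024.csv")

# Download it back, or list objects under a prefix.
s3.download_file("example-bigdata-bucket", "raw/sales_2024.csv", "sales_copy.csv")
response = s3.list_objects_v2(Bucket="example-bigdata-bucket", Prefix="raw/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])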
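The next sketch illustrates an automated data lifecycle policy on the same hypothetical S3 bucket: objects under a prefix are moved to archival storage after 90 days and deleted after one year. The prefix and day thresholds are arbitrary examples, not recommendations.

# Sketch of an automated data-lifecycle policy on Amazon S3 with boto3.
# Bucket name, prefix, and thresholds are illustrative assumptions.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-bigdata-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            # Move objects to cheaper archival storage after 90 days...
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            # ...and delete them after one year.
            "Expiration": {"Days": 365},
        }]
    },
)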
By adopting these strategies, organizations can enhance data security, improve operational
efficiency, and ensure regulatory compliance while managing vast amounts of data.
Data processing converts raw data into valuable insights. Key stages include:
Data Collection: Acquiring data from various sources like IoT devices, social media,
business applications, and public datasets.
Data Cleaning: Removing duplicates, handling missing values, correcting errors, and
standardizing formats to improve data quality; see the pandas cleaning sketch after this list.
Data Storage: Storing processed data in relational databases, NoSQL databases, or data
lakes.
Batch Processing: Analyzing large datasets at scheduled intervals using tools like
Apache Hadoop.
Parallel Processing: Splitting large tasks into smaller sub-tasks executed simultaneously
across multiple processors; see the multiprocessing sketch after this list.
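The following pandas sketch illustrates typical data cleaning steps on a small file. The file customers.csv and its columns (age, email, signup_date) are assumptions made for the example.

# Minimal data-cleaning sketch with pandas: duplicates, missing values,
# error correction, and format standardization. File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("customers.csv")

df = df.drop_duplicates()                            # remove duplicate rows
df["age"] = df["age"].fillna(df["age"].median())     # handle missing values
df["email"] = df["email"].str.strip().str.lower()    # standardize formats
df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
df = df[(df["age"] > 0) & (df["age"] < 120)]         # drop clearly erroneous ages

df.to_csv("customers_clean.csv", index=False)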
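The next sketch illustrates parallel processing with Python's multiprocessing module: one large computation is split into chunks that are processed simultaneously on several CPU cores, and the partial results are then combined.

# Minimal parallel-processing sketch: sub-tasks run simultaneously in a process pool.
from multiprocessing import Pool

def process_chunk(chunk):
    # Stand-in for real work, e.g. parsing or aggregating one data partition.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    # Split the range 0..999,999 into ten chunks and square-sum each in parallel.
    chunks = [range(i, i + 100_000) for i in range(0, 1_000_000, 100_000)]
    with Pool(processes=4) as pool:
        partial_results = pool.map(process_chunk, chunks)  # runs in parallel
    print(sum(partial_results))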
Data analysis and visualization simplify the interpretation of data insights. Types of analysis
include:
Descriptive Analysis: Summarizing historical data to understand what has already happened.
Diagnostic Analysis: Investigating the reasons behind past events using statistical
methods.
Predictive Analysis: Using historical patterns and machine learning models to forecast
future outcomes.
Prescriptive Analysis: Recommending actions to take based on predicted outcomes.
Visualization tools such as Tableau, Power BI, and Python libraries like Matplotlib, Seaborn,
and Plotly help present data in an intuitive and interactive manner.
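As a small illustration, the following sketch draws a bar chart with Seaborn and Matplotlib; the in-memory dataset is invented for the example.

# Minimal visualization sketch with Matplotlib and Seaborn.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented sample data for illustration only.
sales = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "revenue": [120, 150, 90, 180],
})

sns.barplot(data=sales, x="month", y="revenue")    # Seaborn handles styling
plt.title("Monthly revenue")
plt.ylabel("Revenue (thousands)")
plt.tight_layout()
plt.savefig("monthly_revenue.png")                 # or plt.show() interactively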
Big Data Analytics involves advanced techniques to extract insights from large datasets. Key
components include:
Data Storage and Management: Organizing data efficiently using distributed storage
systems and data lakes.
Data Processing: Transforming raw data into analyzable formats using ETL (Extract,
Transform, Load) pipelines; see the ETL sketch after this list.
Data Analysis: Applying statistical, machine learning, and deep learning algorithms to
extract insights.
Data Visualization: Presenting data through interactive dashboards, charts, and
reports.
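The following sketch outlines a minimal ETL pipeline with pandas and SQLite: data is extracted from a hypothetical raw_orders.csv file, transformed into daily totals, and loaded into an analytics table. The file, column, and table names are assumptions made for illustration.

# Minimal ETL (Extract, Transform, Load) sketch using pandas and SQLite.
import sqlite3
import pandas as pd

# Extract: pull raw data from a source system (here, a CSV export).
raw = pd.read_csv("raw_orders.csv")

# Transform: clean and reshape into an analyzable format.
raw["order_date"] = pd.to_datetime(raw["order_date"], errors="coerce")
clean = raw.dropna(subset=["order_date"]).copy()
clean["order_day"] = clean["order_date"].dt.strftime("%Y-%m-%d")
daily = (clean.groupby("order_day", as_index=False)["amount"]
              .sum()
              .rename(columns={"amount": "daily_total"}))

# Load: write the result into an analytics database table.
with sqlite3.connect("analytics.db") as conn:
    daily.to_sql("daily_sales", conn, if_exists="replace", index=False)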
Types of analytics include descriptive, diagnostic, predictive, and prescriptive. Emerging trends
include AI integration, edge computing, blockchain-based security, and augmented analytics.
Summary
Big Data is revolutionizing industries by enabling data-driven decision-making, operational
efficiency, and innovation. The combination of cloud computing, AI, and distributed processing
empowers organizations to derive meaningful insights from vast datasets. This document
explored Big Data technologies, storage methods, data processing techniques, analysis methods,
and visualization tools. By leveraging these tools and best practices, organizations can improve
performance, reduce costs, and gain a competitive advantage in the digital age.