This document provides an overview of key concepts related to data and big data. It defines data, digital data, and the different types of digital data, including unstructured, semi-structured, and structured data. Big data is introduced as the collection of large and complex data sets that are difficult to process using traditional tools. The importance of big data is discussed along with common sources and characteristics of data. Popular tools and technologies for storing, analyzing, and visualizing big data are also outlined.
2. Unit I – UNDERSTANDING BIG DATA
• Introduction to big data – convergence of key trends – unstructured data – industry examples of big data – web analytics – big data applications – big data technologies – introduction to Hadoop – open source technologies – cloud and big data – mobile business intelligence – crowdsourcing analytics – inter- and trans-firewall analytics
3. What is Big Data Analytics?
• Big data analytics refers to the systematic processing and analysis of large amounts of data and complex data sets, known as big data, to extract valuable insights.
• Big data analytics uncovers trends, patterns and correlations in large amounts of raw data to help analysts make data-informed decisions.
• This process allows organizations to leverage the exponentially growing data generated from diverse sources, including internet-of-things (IoT) sensors, social media, financial transactions and smart devices, to derive actionable intelligence through advanced analytic techniques.
4. Differences between big data and traditional data
• The main difference between big data analytics and traditional data analytics is the type of data handled and the tools used to analyze it. Traditional analytics deals with structured data, typically stored in relational databases.
• This type of database helps ensure that data is well-organized and easy for a computer to understand.
• Traditional data analytics relies on statistical methods and tools like structured query language (SQL) for querying databases, as sketched below.
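To make that contrast concrete, here is a minimal sketch of the traditional SQL-over-structured-data style, using Python's built-in sqlite3 module; the table, columns and rows are invented for illustration.

```python
# A minimal sketch of "traditional" analytics: structured rows in a
# relational database, queried with SQL. Table and data are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.5), ("north", 95.0)],
)

# A typical structured query: aggregate a clearly defined column.
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
):
    print(region, total)
```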
5. Differences between big data and traditional data
• Big data analytics involves massive amounts of data in various formats, including structured, semi-structured and unstructured data.
• The complexity of this data requires more sophisticated analysis techniques.
• Big data analytics employs advanced techniques like machine learning and data mining to extract information from complex data sets.
• It often requires distributed processing systems like Hadoop to manage the sheer volume of data (see the MapReduce-style sketch below).
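The slides name Hadoop but show no code, so the following is a single-process Python sketch of the MapReduce programming model that such systems run in distributed form: map emits (key, value) pairs, a shuffle groups them by key, and reduce aggregates each group. The input documents are made up.

```python
# Word count in the MapReduce style, on one machine for illustration.
from collections import defaultdict

documents = ["big data needs big tools", "data tools scale"]

# Map: emit (word, 1) for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: aggregate each group.
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)  # {'big': 2, 'data': 2, 'needs': 1, ...}
```

In a real Hadoop job the map and reduce functions have the same shape, but the framework distributes them across many machines and handles the shuffle over the network.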
6. Data analysis methods
• Predictive modeling by incorporating artificial intelligence (AI) and statistical algorithms
• Statistical analysis for in-depth data exploration and to uncover hidden patterns
• What-if analysis to simulate different scenarios and explore potential outcomes
• Processing diverse data sets, including structured, semi-structured and unstructured data from various sources.
7. Data analysis methods
• Descriptive analytics
• The "what happened" stage of data analysis. Here, the focus is on summarizing and describing past data to understand its basic characteristics.
• Diagnostic analytics
• The "why it happened" stage. By delving deep into the data, diagnostic analysis identifies the root causes of the patterns and trends observed in descriptive analytics.
• Predictive analytics
• The "what will happen" stage. It uses historical data, statistical modeling and machine learning to forecast trends.
• Prescriptive analytics
• The "what to do" stage, which goes beyond prediction to provide recommendations for optimizing future actions based on insights derived from all previous stages (a toy descriptive-versus-predictive example follows).
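As an illustration of the first and third stages, this sketch summarizes invented past data (descriptive) and then makes a deliberately naive forecast; the moving-average "prediction" is only a stand-in for the statistical modeling and machine learning named above.

```python
# Descriptive step: summarize what happened, using invented data.
import statistics

daily_orders = [130, 142, 128, 151, 160, 149, 155]
print("mean:", statistics.mean(daily_orders))
print("median:", statistics.median(daily_orders))
print("stdev:", round(statistics.stdev(daily_orders), 2))

# Toy predictive step: forecast tomorrow as the mean of the last three
# days, a placeholder for real forecasting models.
forecast = statistics.mean(daily_orders[-3:])
print("naive forecast:", round(forecast, 1))
```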
9. The five V's of big data analytics
• Volume
• The sheer volume of data generated today, from social media feeds, IoT devices, transaction records and more, presents a significant challenge.
• Traditional data storage and processing solutions are often inadequate to handle this scale efficiently.
• Big data technologies and cloud-based storage solutions enable organizations to store and manage these vast data sets cost-effectively, protecting valuable data from being discarded due to storage limitations.
10. The five V's of big data analytics
• Velocity
• Data is being produced at unprecedented speeds, from real-time social media updates to high-frequency stock trading records.
• The velocity at which data flows into organizations requires robust processing capabilities to capture, process and deliver accurate analysis in near real-time.
• Stream processing frameworks and in-memory data processing are designed to handle these rapid data streams and balance supply with demand (a minimal sketch follows).
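A minimal Python sketch of the stream-processing idea: events are consumed one at a time and summarized over a small sliding window instead of being stored in full first. The event source and values are simulated stand-ins for a real feed.

```python
# Sliding-window average over a simulated high-velocity stream.
from collections import deque

def event_stream():
    # Stand-in for a real feed (social posts, trades, sensor readings).
    for price in [10.0, 10.2, 9.9, 10.5, 10.4, 10.8]:
        yield price

window = deque(maxlen=3)  # only the last 3 events stay in memory
for price in event_stream():
    window.append(price)
    rolling_avg = sum(window) / len(window)
    print(f"price={price:.1f} rolling_avg={rolling_avg:.2f}")
```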
11. The five V's of big data analytics
• Variety
• Today's data comes in many formats, from structured numeric data in traditional databases to unstructured text, video and images from diverse sources like social media and video surveillance.
• This variety demands flexible data management systems to handle and integrate disparate data types for comprehensive analysis.
• NoSQL databases, data lakes and schema-on-read technologies provide the necessary flexibility to accommodate the diverse nature of big data (see the schema-on-read sketch below).
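A small sketch of schema-on-read, one of the technologies named above: records of varying shape are kept raw, as in a data lake, and a schema is applied only when the data is read. The records and field names are invented.

```python
# Raw records of inconsistent shape; the "schema" is applied at read time.
import json

raw_records = [
    '{"user": "a", "clicks": 3}',
    '{"user": "b", "clicks": 7, "country": "KE"}',  # extra field
    '{"user": "c"}',                                # missing field
]

for line in raw_records:
    rec = json.loads(line)
    user = rec.get("user", "unknown")  # defaults stand in for a schema
    clicks = rec.get("clicks", 0)
    print(user, clicks)
```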
12. The five V's of big data analytics
• Veracity
• Data reliability and accuracy are critical, as decisions based on inaccurate or incomplete data can lead to negative outcomes.
• Veracity refers to the data's trustworthiness, encompassing data quality, noise and anomaly detection issues.
• Techniques and tools for data cleaning, validation and verification are integral to ensuring the integrity of big data, enabling organizations to make better decisions based on reliable information.
13. The five V's of big data analytics
• Value
• Big data analytics aims to extract actionable insights that offer tangible value.
• This involves turning vast data sets into meaningful information that can inform strategic decisions, uncover new opportunities and drive innovation.
• Advanced analytics, machine learning and AI are key to unlocking the value contained within big data, transforming raw data into strategic assets.
15. Process of Big Data Analytics
• Process data: After being collected, data must be systematically organized, extracted, transformed and then loaded into a storage system to ensure accurate analytical outcomes.
• Processing involves converting raw data into a format that is usable for analysis, which might involve aggregating data from different sources, converting data types or organizing data into structured formats. Given the exponential growth of available data, this stage can be challenging.
• Processing strategies may vary between batch processing, which handles large data volumes over extended periods, and stream processing, which deals with smaller real-time data batches (a toy ETL pass is sketched below).
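A toy extract-transform-load (ETL) pass of the kind this step describes, using only the Python standard library; the source rows, table name and columns are illustrative, not from the slides.

```python
import sqlite3

raw_rows = [("2024-01-01", "42.5"), ("2024-01-02", "38.0")]      # extract

clean_rows = [(day, float(amount)) for day, amount in raw_rows]  # transform

conn = sqlite3.connect(":memory:")                               # load
conn.execute("CREATE TABLE daily_totals (day TEXT, amount REAL)")
conn.executemany("INSERT INTO daily_totals VALUES (?, ?)", clean_rows)
print(conn.execute("SELECT * FROM daily_totals").fetchall())
```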
16. Process of Big Data Analytics
• Clean data: Regardless of size, data must be cleaned to ensure quality and relevance.
• Cleaning data involves formatting it correctly, removing duplicates and eliminating irrelevant entries. Clean data prevents the corruption of output and safeguards reliability and accuracy (see the sketch below).
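A minimal cleaning sketch covering the steps just named: fixing formatting, removing duplicates and dropping irrelevant entries. The records are invented.

```python
records = [
    {"name": " Alice ", "email": "a@x.com"},
    {"name": "alice", "email": "a@x.com"},  # duplicate once normalized
    {"name": "", "email": None},            # irrelevant entry
]

seen, clean = set(), []
for rec in records:
    name = rec["name"].strip().lower()      # format correctly
    if not name or not rec["email"]:        # eliminate irrelevant rows
        continue
    key = (name, rec["email"])
    if key in seen:                         # remove duplicates
        continue
    seen.add(key)
    clean.append({"name": name, "email": rec["email"]})

print(clean)  # [{'name': 'alice', 'email': 'a@x.com'}]
```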
17. Process of Big Data Analytics
• Analyze data: Advanced analytics, such as data mining, predictive analytics, machine learning and deep learning, are employed to sift through the processed and cleaned data.
• These methods allow users to discover patterns, relationships and trends within the data, providing a solid foundation for informed decision-making (a toy example follows).
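As a stand-in for the heavier techniques listed above, this sketch fits a least-squares trend line to invented data and reads a pattern off it; a real pipeline would reach for machine learning or data mining libraries instead.

```python
# Fit y = intercept + slope * x by ordinary least squares, by hand.
xs = [1, 2, 3, 4, 5]            # e.g. week number (invented)
ys = [100, 112, 119, 133, 141]  # e.g. weekly sign-ups (invented)

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

print(f"trend: ~{slope:.1f} extra sign-ups per week")
print(f"week 6 estimate: {intercept + slope * 6:.0f}")
```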
18. Types of big data
Structured data
• Structured data refers to highly organized information that is easily searchable and typically stored in relational databases or spreadsheets. It adheres to a rigid schema, meaning each data element is clearly defined and accessible in a fixed field within a record or file. Examples of structured data include:
• Customer names and addresses in a customer relationship management (CRM) system
• Transactional data in financial records, such as sales figures and account balances
• Employee data in human resources databases, including job titles and salaries
• Structured data's main advantage is its simplicity for entry, search and analysis, often using straightforward database queries like SQL. However, the rapidly expanding universe of big data means that structured data represents a relatively small portion of the total data available to organizations (a small illustration follows).
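A small illustration of the rigid schema that makes structured data easy to enter and search; the CRM-style fields and rows are invented.

```python
# Every record has the same clearly defined fields, so searching is a
# simple field comparison.
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    city: str
    balance: float

customers = [
    Customer("Alice", "Nairobi", 250.0),
    Customer("Bob", "Mombasa", 90.0),
]

print([c.name for c in customers if c.balance > 100])  # ['Alice']
```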
20. Unstructured data
• Unstructured data lacks a pre-defined data model, making it more difficult to collect, process and analyze. It comprises the majority of data generated today, and includes formats such as:
• Textual content from documents, emails and social media posts
• Multimedia content, including images, audio files and videos
• Data from IoT devices, which can include a mix of sensor data, log files and time-series data
• The primary challenge with unstructured data is its complexity and lack of uniformity, requiring more sophisticated methods for indexing, searching and analyzing. NLP, machine learning and advanced analytics platforms are often employed to extract meaningful insights from unstructured data (a crude sketch follows).
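To show why unstructured text needs more work than a database query, here is a crude sketch that tokenizes and counts words in an invented social media post; production systems would use the NLP and machine learning platforms mentioned above.

```python
# There is no schema to query, so even "what is this post about?"
# requires tokenizing and counting the raw text.
from collections import Counter
import re

post = "Loving the new phone! Battery life is great, camera is great too."

words = re.findall(r"[a-z']+", post.lower())  # crude tokenization
stopwords = {"the", "is", "too", "a", "an"}
counts = Counter(w for w in words if w not in stopwords)

print(counts.most_common(3))  # [('great', 2), ('loving', 1), ('new', 1)]
```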
22. Semi-structured data
• Semi-structured data occupies the middle ground between structured and unstructured data. While it does not reside in a relational database, it contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. Examples include:
• JSON (JavaScript Object Notation) and XML (eXtensible Markup Language) files, which are commonly used for web data interchange
• Email, where the data has a standardized format (e.g., headers, subject, body) but the content within each section is unstructured
• NoSQL databases, which can store and manage semi-structured data more efficiently than traditional relational databases
• Semi-structured data is more flexible than structured data but easier to analyze than unstructured data, providing a balance that is particularly useful in web applications and data integration tasks (a short example follows).
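A short example of how the tags in a semi-structured JSON record separate semantic elements and enforce hierarchy, so the record can be navigated without a relational schema; the record itself is invented.

```python
import json

order = json.loads("""
{
  "order_id": 1001,
  "customer": {"name": "Alice", "email": "a@x.com"},
  "items": [
    {"sku": "A1", "qty": 2},
    {"sku": "B7", "qty": 1}
  ]
}
""")

# Keys act as markers that enforce hierarchy without a fixed schema.
print(order["customer"]["name"])                    # Alice
print(sum(item["qty"] for item in order["items"]))  # 3
```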