Introduction
– Session 2 –
Table of Contents
What is big data
Introduction to cloud
• Understand the fundamental concepts behind cloud-based big data handling
What is big data ?
Big data refers to extremely large and complex sets of data that cannot be easily processed or analyzed using
traditional data processing tools and methods. It often involves structured, unstructured, and semi-structured data
from various sources and can be used to uncover insights, patterns, and trends that can inform decision-making
and improve business operations.
1. Improved decision-making: By analyzing large and complex datasets, organizations can gain insights and
make informed decisions that can improve business operations, customer satisfaction, and overall performance.
2. Increased efficiency and productivity: Big data technologies can automate and streamline processes,
reducing manual labor and improving efficiency.
3. Competitive advantage: Organizations that can effectively analyze and utilize big data can gain a competitive
advantage by identifying new opportunities, improving customer experiences, and optimizing operations.
4. Better customer insights: Big data analytics can provide deeper insights into customer behavior and
preferences, allowing organizations to tailor their products and services to meet customer needs.
5. Innovation and discovery: Big data can help drive innovation and discovery by identifying new trends and
patterns that were previously unknown.
6. Cost savings: Big data technologies can help reduce costs by improving efficiency, reducing waste, and
optimizing operations.
7. Improved risk management: Big data analytics can help identify potential risks and threats, allowing
organizations to take proactive measures to mitigate them.
Why is Big Data important ?
Here are a few more reasons why big data is important:
8. Personalization: Big data analytics can help personalize experiences for customers, such as recommending
products or services based on their preferences and behavior.
9. Predictive analytics: Big data can be used to develop predictive models that can forecast future trends and
outcomes, allowing organizations to make proactive decisions and strategies.
10. Improved supply chain management: Big data can be used to optimize supply chain operations, such as
predicting demand, identifying potential bottlenecks, and improving inventory management.
11. Better fraud detection: Big data analytics can help identify potential fraudulent activity, such as credit card
fraud, by analyzing patterns and anomalies in data.
12. Improved healthcare outcomes: Big data can be used to analyze patient data and develop personalized
treatment plans, as well as identify potential health risks before they become serious.
13. Environmental sustainability: Big data can be used to monitor and analyze environmental data, such as air
and water quality, to identify potential issues and improve sustainability efforts.
Overall, big data has the potential to impact almost every aspect of modern life, from business operations to
healthcare to the environment. As such, it is becoming increasingly important for organizations to effectively collect,
store, process, and analyze data in order to stay competitive and improve outcomes.
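Item 11 above mentions spotting fraud by analyzing anomalies in data. As a rough illustration only, the sketch below flags unusual transaction amounts with a modified z-score based on the median absolute deviation. This is a simple stand-in technique chosen for the example, not the method any particular fraud system uses; the function name `flag_anomalies`, the sample amounts, and the 3.5 threshold (a common rule of thumb) are all assumptions.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return amounts whose modified z-score exceeds the threshold.

    Hypothetical helper for illustration; real fraud detection uses
    far richer features than a single amount column.
    """
    med = median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        return []  # no spread to compare against, so nothing is flagged
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [a for a in amounts if 0.6745 * abs(a - med) / mad > threshold]


if __name__ == "__main__":
    # Made-up card transactions; one is far larger than the rest.
    transactions = [12.5, 9.99, 14.2, 11.0, 13.75, 950.0, 10.4]
    print(flag_anomalies(transactions))
```

The median-based score is used here instead of a mean-based one because a single extreme value inflates the mean and standard deviation, which can hide the very outlier being searched for.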
How is big data handled ?
1. Data sources: Big data can come from a variety of sources, including social media, sensors, mobile devices, and
other sources.
2. Data ingestion: Data must be ingested into the big data system for processing. This can be done through tools
such as Apache Kafka, Flume, or NiFi.
3. Data storage: Big data must be stored in a way that allows for easy access and processing. This can include
traditional databases as well as specialized big data storage solutions such as Hadoop Distributed File System
(HDFS), Apache Cassandra, or Amazon S3.
4. Data processing: Big data processing involves the use of tools and technologies such as Apache Hadoop, Apache
Spark, and MapReduce to analyze and extract insights from large and complex datasets.
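The data processing stage above names MapReduce. To give a feel for the idea, here is a minimal single-machine sketch of the map, shuffle, and reduce steps applied to a word count. It is plain Python, not the Hadoop or Spark API, and the function names and sample records are assumptions made for illustration. In a real cluster each phase would run in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into one total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    """Run the three phases in sequence over an iterable of text records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Made-up input records standing in for lines of a large dataset.
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

Because each record is mapped independently and each key is reduced independently, the work can be split across machines, which is what makes the pattern suitable for datasets too large for one node.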
AWS: Amazon S3, Amazon EMR, Amazon SageMaker, Amazon QuickSight
Tools: Apache Hadoop, Apache Spark, or MapReduce; Tableau, Power BI, or Matplotlib
Cloud-based big data handling and analytics
Amazon Redshift is a fast, scalable, and fully managed data warehouse that makes it easy to analyze large
amounts of data.
Ashutosh Vyas
Solutions Lead – Data Science & Quantum Computing
Harman DTS India Pvt. Ltd.
“He has 8+ years of experience in the data science domain. He has worked on multiple projects involving pattern
recognition, time series forecasting, regression modelling, NLP, classification, and optimization in the life sciences,
finance, FMCG, and media domains. He completed his M.Tech. in 2015 from IIIT-B. He has expertise in Bayesian
and frequentist methods of machine learning and has been working on quantum ML and quantum optimization for
the past 4 years. He works with an ethos of developing customer-centric and robust solutions.”