Bigdata-cloud computing A K Mishra

The document discusses big data analytics, emphasizing its significance in agricultural sciences and the challenges faced in handling large datasets. It outlines the four layers of big data, the role of cloud computing in data analysis, and various tools and services available for bioinformatics. Additionally, it highlights the advantages and disadvantages of cloud computing, particularly in processing large datasets and the emergence of applications like CloVR for automated sequence analysis.


Big data analytics

Dr. A. K. Mishra
Principal Scientist
ICAR-Indian Agricultural Research Institute, New Delhi
WHAT IS BIG DATA?

• Every day we create 2.5 quintillion bytes (2.5 exabytes; 1 EB = 10^18 bytes) of data; 90% of the data in the world today has been created in the last two years alone.
• The data comes in various forms such as documents, emails, images, graphs, videos, personal information, transaction data and much more, obtained from various new technologies.
• Big data analytics is a modern technique for collecting, organizing, maintaining and analyzing large datasets to discover new patterns and to uncover information hidden inside the data, using remote servers and the internet.
• Real-time applications of big data analytics are now emerging.
Traditional Data Vs Big Data
IN AGRICULTURAL SCIENCES

Agri-engineers are joining the big-data club.

With the advent of high-throughput technologies, engineers are starting to grapple with massive data sets, encountering challenges with
• handling
• processing
• sharing information

Challenges in handling agricultural data
• large volume
• high throughput
• relating and linking
• complexity
• heterogeneity
The four layers of Big Data
• Data sources layer – This is where the data arrives at your organization. It includes everything from your sales records, customer database and feedback to social media channels.
• Data storage layer – This is where the big data is stored. Sophisticated but accessible systems and tools have been developed for data storage, such as the Apache Hadoop distributed file system (HDFS) or the Google File System (GFS).
• Data processing/analysis layer – To find out something useful, we need to process and analyze the data. A common method is to use a MapReduce tool.
• Data output layer – This is how the insights gleaned through the analysis are passed on to the people who can take action to benefit from them. The output can take the form of reports, charts, figures and key recommendations.
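The four layers above can be sketched end-to-end as a toy pipeline. This is a minimal illustrative sketch in Python; all function names and records are assumptions for the example, not part of any real system:

```python
# Minimal sketch of the four big-data layers as a pipeline.
# All names and data here are illustrative assumptions.

def source_layer():
    # Data sources layer: raw records arriving at the organization.
    return [
        {"channel": "sales", "amount": 120},
        {"channel": "social", "amount": 0},
        {"channel": "sales", "amount": 80},
    ]

def storage_layer(records):
    # Storage layer: in a real system this would be HDFS or GFS;
    # here we simply keep the records in memory.
    return list(records)

def processing_layer(store):
    # Processing/analysis layer: aggregate amounts per channel
    # (at scale a MapReduce job would do this step).
    totals = {}
    for rec in store:
        totals[rec["channel"]] = totals.get(rec["channel"], 0) + rec["amount"]
    return totals

def output_layer(totals):
    # Output layer: turn the insight into a human-readable report.
    return [f"{channel}: {amount}" for channel, amount in sorted(totals.items())]

report = output_layer(processing_layer(storage_layer(source_layer())))
print(report)  # → ['sales: 200', 'social: 0']
```

Each function stands in for one layer, so swapping a layer (e.g. replacing the in-memory store with a distributed file system) leaves the rest of the pipeline unchanged.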
Analytics Models (ordered by increasing difficulty and value)
• Descriptive Analytics – What happened?
• Diagnostic Analytics – Why did it happen?
• Predictive Analytics – What will happen?
• Prescriptive Analytics – How can we make it happen?
How big data analytics works

Once the data is ready, it can be analyzed with the software commonly used for advanced analytics processes. That includes tools for:
• data mining, which sifts through data sets in search of patterns and relationships;
• predictive analytics, which builds models to forecast data behavior and other future developments;
• machine learning, which taps algorithms to analyze large data sets; and
• deep learning, a more advanced offshoot of machine learning.
Challenges in Biological Data Analysis

• Multiple comparisons issue – a higher number of false positives than true positives
• High-dimensional biological data – difficult to discriminate between two classes of data
• Small "n", large "p" problem – the number of samples (n) << the number of parameters to predict (p) in biological data
• Computational limitations – limits of hardware, RAM, number of processors, etc.
• Noisy high-throughput data – sources of error and difficulty in controlling all experimental parameters
• Integration of multiple heterogeneous biological data – various forms of data, with the pros and cons of redundancy
Solution to Big Data analysis – Cloud Computing
• It is a method for storing and accessing data and programs over the Internet instead of your computer's hard drive.
• A type of computing that relies on sharing computing resources rather than having local servers or personal devices.
• The word cloud (also phrased as "the cloud") is used as a metaphor for "the Internet", hence "internet-based computing".
• Different services such as servers, storage and applications are delivered to an organization's computers through the Internet.
• Cloud computing is comparable to grid computing: all computers in a network are harnessed to solve problems too intensive for any stand-alone machine.
• It provides a platform or service through the internet and needs minimal hardware and software installed locally.
• The network may be a LAN or WAN, on which applications or infrastructure are deployed remotely and users can access them to meet their needs.

Cloud Computing Architecture
Evolution in Cloud Computing
• Grid Computing – solving large problems with parallel computing
• Utility Computing (1990) – offering computing resources as a metered service
• Software as a Service – network-based subscriptions to applications
• Cloud Computing (2008) – anytime, anywhere access to IT resources delivered dynamically as a service
ASHOKA: Advanced Supercomputing Hub for Omics Knowledge in Agriculture
India's first supercomputing facility for agriculture, at ICAR-IASRI, New Delhi

http://topsupercomputers-india.iisc.ernet.in/jsps/june2022/index.html
Open Source Software & Tools
• Fifty-two open source software packages and tools are configured on this HPC environment to carry out various biological data analyses.
• These software packages and tools were identified based on an online survey conducted among researchers from National Agricultural Research and Education System (NARES) institutions.

Categories of tools implemented in HPC.

Models of Cloud
Service models define the type of service the cloud offers. Cloud-based services in bioinformatics can be classified into Data as a Service (DaaS), Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).

Cloud computing comes in three basic flavors: software as a service (SaaS), which you consume; platform as a service (PaaS), which you build on; and infrastructure as a service (IaaS), to which you migrate.

Characteristics of a Cloud Environment
• Dynamic – one of the keys to cloud computing is on-demand provisioning
• Massively scalable – the service must react immediately to your needs
• Multi-tenant – cloud computing, by its nature, delivers shared services
• Rapid elasticity – you can go from 5 servers to 50, or from 50 servers to 5
• Self-service – as a user, you can use the service as you require
• Per-usage pricing model – you should only ever pay for the amount of service you consume
• IP-based architecture – cloud architectures are built on Internet Protocol networks
Features of Cloud Computing – 10 Major Characteristics of Cloud Computing
The "old" vs the "new" genome informatics ecosystem based on cloud computing

Users have two options: continue downloading data to local clusters as before, or use cloud computing – move the cluster to the data – and continue to work with the data via web pages in their accustomed way.
Cloud resources in Bioinformatics
Software as a Service (SaaS)
• Bioinformatics requires a large variety of software tools for different types of data analysis.
• A Software as a Service (SaaS) cloud delivers software services online and facilitates remote access to available bioinformatics software tools through the Internet.
• As a consequence, SaaS eliminates the need for local installation and eases software maintenance and updates, providing up-to-date cloud-based services for bioinformatics data analysis over the Web.
• Efforts have been made to develop cloud-scale tools, including:
  – sequence analysis (mapping, assembly and alignment)
  – gene expression analysis
  – homology detection (orthologs and paralogs)
  – peak callers for ChIP-seq data
  – genome annotation (structural and functional)
  – identification of epistatic interactions of Single Nucleotide Polymorphisms (SNPs)
  – various other cloud-based applications for NGS (Next-Generation Sequencing) data analysis
Cloud Computing Service Providers
• Cloud computing platforms have been emerging in the commercial sector, including the Amazon Elastic Compute Cloud (EC2), Rackspace Cloud and Flexiant, and in the public sector to support research, such as Magellan and DIAG.
• Cloud computing is an increasingly valuable tool for processing large datasets, and it is already used by the US federal government, pharmaceutical and Internet companies, as well as scientific labs and bioinformatics services.
Amazon Cloud Services

• Amazon offers a variety of bioinformatics-oriented virtual machine images:
  – images prepopulated by Galaxy
  – Bioconductor – a programming environment built on the R statistics package
  – GBrowse – genome browser
  – BioPerl
  – JCVI Cloud BioLinux – a collection of bioinformatics tools including the Celera Assembler
• Amazon also provides several large genomic datasets in its cloud:
  – a complete copy of GenBank (200 Gb)
  – 30x-coverage sequencing reads of a trio of individuals from the 1000 Genomes Project (700 Gb)
  – genome databases from Ensembl, including the annotated human genome and those of 50 other species
Progress in Cloud computing for Bioinformatics
• Machine-accessible interfaces for cloud computing in the life sciences have been developed based on HTTP-based Web service technologies, e.g.:
  – Simple Object Access Protocol (SOAP)
  – REpresentational State Transfer (REST) services
  – BioMoby
• They formalize how computers exchange messages, such as assignments, input data, computation results and the output of database searches.
• To address their drawbacks, the open-standard Extensible Messaging and Presence Protocol (XMPP) was devised, which is capable of asynchronous communication (i.e. results are sent back to the user automatically).
SOAP – Simple Object Access Protocol
• HTTP and XML provide an at-hand solution that allows programs running under different operating systems in a network to communicate with each other; this combination is called SOAP.

• SOAP specifies exactly how to encode an HTTP header and an XML file so that a program on one computer can call a program on another computer and pass along information.

• SOAP also specifies how the called program can return a response.

• Despite its frequent pairing with HTTP, SOAP supports other transport protocols as well.
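To make the message format concrete, here is a sketch of a SOAP 1.1 envelope built with Python's standard XML library. The envelope namespace is the real SOAP 1.1 one, but the remote procedure (`GetSequence`) and its parameter are hypothetical, invented only to show the structure:

```python
# Sketch of a SOAP message: an XML envelope whose body names the
# remote procedure and carries its arguments. "GetSequence" and
# "accession" are invented for illustration.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace
ET.register_namespace("soap", SOAP_NS)

envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
call = ET.SubElement(body, "GetSequence")          # hypothetical remote call
ET.SubElement(call, "accession").text = "NM_000059"  # hypothetical argument

message = ET.tostring(envelope, encoding="unicode")
print(message)
```

In a real deployment this XML would travel as the payload of an HTTP POST, and the server would return a similarly structured envelope containing the response.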
REST – REpresentational State Transfer
• REST is a style of software architecture.
• In a REST-based application/architecture, state and functionality are divided into distributed resources.
• Every resource is uniquely addressable using a uniform and minimal set of commands (typically the HTTP commands GET, POST, PUT or DELETE over the Internet).
• The protocol is client/server, stateless, layered, and supports caching.
• This is essentially the architecture of the Internet, which explains the popularity and ease of use of REST.
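The uniform interface over uniquely addressable resources can be sketched in a few lines. This toy dispatcher is an assumption-laden illustration (no real HTTP, and the `/genes/...` paths are invented), showing only how the same small verb set applies to every resource:

```python
# Sketch of REST's uniform interface: a handful of verbs applied
# uniformly to uniquely addressed resources. The store is an
# in-memory dict; paths and payloads are illustrative.
store = {}

def handle(method, path, body=None):
    if method == "PUT":          # create or replace the resource
        store[path] = body
        return 200, body
    if method == "GET":          # retrieve the resource
        return (200, store[path]) if path in store else (404, None)
    if method == "DELETE":       # remove the resource
        return 200, store.pop(path, None)
    return 405, None             # method not allowed

handle("PUT", "/genes/BRCA2", {"symbol": "BRCA2"})
print(handle("GET", "/genes/BRCA2"))     # → (200, {'symbol': 'BRCA2'})
handle("DELETE", "/genes/BRCA2")
print(handle("GET", "/genes/BRCA2"))     # → (404, None)
```

Because the interface is uniform and the server keeps no per-client session state, any client that knows the resource's address can interact with it; this statelessness is what makes REST services easy to cache and scale.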
BioMOBY: an open source biological web services proposal
• BioMOBY is an open source research project which defines an architecture for the discovery and distribution of biological data through web services.
• Data and services are decentralised, but the availability of these resources, and the instructions for interacting with them, are registered in a central location called MOBY Central.
• BioMOBY adds to the web services paradigm, as exemplified by Universal Description, Discovery and Integration (UDDI), by having an object-driven registry query system with object and service ontologies.
• This allows users to navigate extensive and diverse data sets, where each possible next step is presented based on the data object currently in hand.
• Moreover, a path from the current data object to a desired final data object can be discovered automatically using the registry.
• Native BioMOBY objects are lightweight XML, and make up both the query and the response of a SOAP transaction.
The databases: then
• Traditional RDBMSs (relational database management systems) have been the de facto standard for database management throughout the age of the internet. The architecture behind an RDBMS is such that data is organized in a highly structured manner, following the relational model. It is not a scalable solution to meet the needs of 'big' data.
• NoSQL (commonly read as "Not Only SQL") represents a completely different family of databases that allows for high-performance, agile processing of information at massive scale. It has been the solution for handling some of the biggest data warehouses on the planet – for the likes of Google, Amazon and the CIA.
And now… What is Hadoop?
• Hadoop is not a type of database, but rather a software ecosystem that allows for massively parallel computing. It is an enabler of certain types of NoSQL distributed databases (such as HBase), which can allow data to be spread across thousands of servers with little reduction in performance.
• Market forecasts have predicted that the global Hadoop market will grow to about US$700 billion by 2022.
Contd.
• A staple of the Hadoop ecosystem is MapReduce, a computational model that takes intensive data processes and spreads the computation across a potentially endless number of servers (generally referred to as a Hadoop cluster).
• It has been a game-changer in supporting the enormous processing needs of big data: a large data procedure that might take 20 hours of processing time on a centralized relational database system may take only 3 minutes when distributed across a large Hadoop cluster of commodity servers, all processing in parallel.
MapReduce
• MapReduce is a software framework introduced by Google to support distributed computing on large data sets on computer clusters.
• Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
• MapReduce runs on a large cluster of commodity machines and is highly scalable. Implementations exist in multiple programming languages, such as Java, C# and C++.
Steps of MapReduce
• Map: A function called "Map" allows different points of the distributed cluster to distribute their work. The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. Each worker node processes its smaller problem and passes the answer back to its master node.
• Reduce: A function called "Reduce" is designed to reduce the final form of the clusters' results into one output. The master node takes the answers to all the sub-problems and combines them to produce the output – the answer to the problem it was originally trying to solve.
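The steps above can be sketched with the classic word-count example, written here in plain single-process Python purely to show the map → group-by-key → reduce flow (a real Hadoop job would run each phase on many machines):

```python
# MapReduce word count, sketched in plain Python to illustrate the
# map -> shuffle/group -> reduce steps. Input chunks stand in for
# the input splits the master would hand to worker nodes.
from collections import defaultdict

def map_phase(chunk):
    # "Map": each worker emits intermediate (key, value) pairs.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # Group all intermediate values by key before reducing.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # "Reduce": merge all values associated with the same key.
    return {key: sum(values) for key, values in grouped.items()}

chunks = ["big data big", "data big"]  # toy input splits
intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(intermediate))
print(counts)  # → {'big': 3, 'data': 2}
```

Because each map call touches only its own chunk and each reduce call touches only one key, both phases parallelize freely across worker nodes, which is exactly what makes the model scale.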
MapReduce Programming Model – ideal for data-intensive parallel applications
• Fault tolerance
• Moving computation to the data
• Scalability
DECENTRALIZED MAPREDUCE ARCHITECTURE ON CLOUD SERVICES

Cloud Queues are used for scheduling, Tables store meta-data and monitoring data, and Blobs hold input/output/intermediate data.
Applications of MapReduce
• MapReduce is generally used in distributed grep, distributed sort, web link-graph reversal, web access log statistics, document clustering, machine learning and statistical machine translation.
• MapReduce algorithms using the cloud-ready framework Hadoop are available for bioinformatics:
  – sequence alignment
  – short read mapping
  – SNP identification
  – RNA expression analysis
The pros and cons of Cloud Computing
• It poses problems for developers and users of cloud software:
  – it requires large data transfers over precious low bandwidth
  – it raises new privacy and security issues
  – it is an inefficient solution for some types of bioinformatics problems
• However:
  – it is an increasingly valuable tool for processing large datasets
  – it is already used by the US federal government, pharmaceutical and Internet companies, as well as scientific labs and bioinformatics services
TECHNOLOGIES – CloVR
• Cloud Virtual Resource (CloVR) is a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources.
• It is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole-genome and metagenome sequence analysis. A virtual machine is a piece of software running on the host computer that emulates the properties of a computer.
• The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows.
• In addition, it supports the use of remote cloud computing resources to improve performance for large-scale sequence processing.
ISSUES IN BIOINFORMATICS BIG DATA

Big Data generation and acquisition give rise to profound challenges for the storage, transfer and security of the information.

1. Storage: companies need big data storage space to hold their data without limits. Computational time also needs to decrease as the data grows, for faster processing and efficient results.

2. Transfer: another challenge is moving data from one location to another, mainly done either with external hard disks or by mail. Transferring and accessing this data is time consuming, cuts into processing time, and may reduce work efficiency. Big data has to be processed and computed simultaneously so that faster outputs can be shared and used from any location the user wants.

3. Security: the security and privacy of the data are also a concern. In bioinformatics, data must be kept secure whether it sits in a storage database or is being transferred via external hard disks or email. The data has to be free from threats, and data integrity has to be maintained.
Conclusion
• Cloud computing is the next big thing in Big Data analytics.
• With its application-sharing and cost-effective properties, it is useful for all and should be made accessible.
• It is an attractive technology at this critical juncture of genomic data storage and analysis.
• Cloud computing is, in essence, a blind man's stick for bioinformatics research: a promising technology that provides storage and access to data.
• The scalability of the cloud reduces traffic, and cloud cryptography is a way to ensure security.
• To harness the cloud in the most beneficial way, one needs to rely on it completely and use it in an uninterrupted way. To achieve this, commands must first be optimized in a proper channel in order to avoid termination and recreation of cloud instances.
• Cloud computing came as a ray of hope for researchers and database organizers. This approach condenses the resources, data and tools into the cloud; users can access the data virtually and work in the cloud itself without downloading and maintaining a local copy on a personal system.
• Researchers don't need to buy expensive resources to carry on their daily research.
Thank You!
