Bigdata-cloud computing A K Mishra
Dr. A. K. Mishra
Principal Scientist
ICAR-Indian Agricultural Research Institute, New Delhi
WHAT IS BIG DATA?
[Figure: types of analytics ordered by difficulty. Descriptive Analytics: "What happened?"; Diagnostic Analytics: "... it happen?"]
How big data analytics works
http://topsupercomputers-india.iisc.ernet.in/jsps/june2022/index.html
Open Source Software & Tools
• Fifty-two open source software packages and tools are configured on this HPC environment to carry out various biological data analyses.
• These software packages and tools were identified through an online survey conducted among researchers from National Agricultural Research and Education System (NARES) institutions.
Have two options: continue downloading data to local clusters as before, or use cloud computing (move the cluster to the data).
Continue to work with the data via web pages in their accustomed way.
Cloud resources in Bioinformatics
Software as a Service (SaaS)
Bioinformatics requires a large variety of software tools for different types of
data analysis.
Software as a Service (SaaS) cloud delivers software services online and
facilitates remote access to available bioinformatics software tools through the
Internet.
As a consequence, SaaS eliminates the need for local installation and eases
software maintenance and updates, providing up-to-date cloud-based services for
bioinformatics data analysis over the Web.
Efforts have been made to develop cloud-scale tools, including:
• Sequence analysis (mapping, assembly and alignment)
• Gene expression analysis
• Homology detection (orthologs and paralogs)
• Peak callers for ChIP-seq data
• Genome annotation (structural and functional)
• Identification of epistatic interactions of Single Nucleotide Polymorphisms (SNPs)
• Various other cloud-based applications for NGS (Next-Generation Sequencing) data analysis
Cloud Computing Service Providers
Cloud computing platforms have emerged in the commercial sector,
including Amazon Elastic Compute Cloud (EC2), Rackspace Cloud and
Flexiant, and in the public sector to support research, such as
Magellan and DIAG.
Cloud computing is an increasingly valuable tool for processing large
datasets, and it is already used by the US federal government,
pharmaceutical and Internet companies, as well as scientific labs and
bioinformatics services.
Amazon Cloud Services
• SOAP also specifies how the called program can return a response.
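To make the request and response idea concrete, here is a minimal sketch of a SOAP 1.1 message round trip built with Python's standard library. The service namespace `http://example.org/sequence-service` and the operation name `GetLength` are hypothetical and do not correspond to any real Amazon or bioinformatics service; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (standard).
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, used only for illustration.
SVC_NS = "http://example.org/sequence-service"


def build_request(operation, params):
    """Build a SOAP request envelope that calls `operation` with `params`."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text):
    """Return the children of the first element inside Body as a dict."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    response = body[0]  # the first element in Body carries the result
    # Strip the "{namespace}" prefix that ElementTree puts on each tag.
    return {child.tag.split("}")[-1]: child.text for child in response}


if __name__ == "__main__":
    print(build_request("GetLength", {"sequence": "ACGT"}))
```

The caller wraps the operation and its parameters in the `Body` of an `Envelope`; the called program answers with another envelope whose `Body` holds the result, which `parse_response` reads back.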
[Figure: MapReduce characteristics: Parallel Applications, Fault Tolerance, Moving Computation to Data, Scalable]
DECENTRALIZED MAPREDUCE ARCHITECTURE ON CLOUD SERVICES
• However, it is an increasingly valuable tool for processing large datasets, and it is already used by the US federal government, pharmaceutical and Internet companies, as well as scientific labs and bioinformatics services.
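The MapReduce pattern can be illustrated with a small single-machine sketch. It counts k-mers in sequencing reads using separate map, shuffle and reduce steps. The function names and the toy reads are assumptions made for illustration; a real deployment would use a framework such as Hadoop to distribute these steps across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_kmers(reads, k):
    # Each read is mapped independently of the others, so on a cluster
    # the map step could run on the node that already stores the read.
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT"]  # toy data, not real sequences
    print(count_kmers(reads, 3))
```

Because the map step touches one read at a time and the reduce step touches one key at a time, both can be spread over many machines, which is what makes the pattern scalable and lets the computation move to the data.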
TECHNOLOGIES: CloVR
Cloud Virtual Resource (CloVR) is a desktop application for
push-button automated sequence analysis that can utilize cloud
computing resources.
It is implemented as a single portable Virtual Machine (VM) that
provides several automated analysis pipelines for microbial
genomics, including 16S, whole genome and metagenome sequence
analysis. A virtual machine is a piece of software running on the
host computer that emulates the properties of a computer.
The CloVR VM runs on a personal computer, utilizes local
computer resources and requires minimal installation, addressing
key challenges in deploying bioinformatics workflows.
In addition, it supports use of remote cloud computing resources to
improve performance for large-scale sequence processing.
ISSUES IN BIOINFORMATICS BIG DATA
Big Data generation and acquisition raise profound challenges for the storage, transfer
and security of information.
1. Storage: companies need storage space that can hold their data without practical
limits. Computation time must also be kept down as the data grows, so that processing
stays fast and results remain useful.
2. Transfer: moving data from one location to another is a further challenge; it is mainly
done either on external hard disks or by mail. Transferring and accessing the data
becomes time-consuming, which slows processing and may reduce work efficiency. Big data
has to be processed and computed in parallel so that results arrive faster and can be
shared and used from any location the user chooses.
3. Security and privacy: in every case, the most important issue in handling
bioinformatics big data is the security of the data, whether it sits in a storage
database or is being transferred via external hard disks or email. Data must be
protected from threats, and its integrity and security must be maintained.
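The transfer and integrity concerns above can be made concrete with a short sketch. It estimates how long a dataset takes to move over a network link and computes a SHA-256 checksum that sender and receiver can compare after a transfer. The dataset size and link speed are illustrative assumptions, not measurements, and the function names are hypothetical.

```python
import hashlib


def transfer_hours(size_gb, link_mbps):
    """Estimate the hours needed to move `size_gb` gigabytes over a `link_mbps` link."""
    bits = size_gb * 1e9 * 8            # decimal gigabytes to bits
    seconds = bits / (link_mbps * 1e6)  # megabits per second to bits per second
    return seconds / 3600


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumed example: a 1000 GB dataset over a 100 Mbit/s link.
    print(round(transfer_hours(1000, 100), 1), "hours")
```

Under these assumptions, 1000 GB over a fully used 100 Mbit/s link takes roughly 22 hours, which is why shipping disks or moving the computation to the data can be attractive. If the checksum computed at the destination matches the one computed at the source, the file arrived unchanged.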
Conclusion
• Cloud computing is the next big thing in Big data analytics.
• With its application-sharing and cost-effective properties, it is
useful for all and should be made accessible.
• It is an attractive technology at this critical juncture of current
genomic data storage and analysis.
• Cloud computing is an essential aid, a blind man's stick, for
bioinformatics research. It is a promising technology that
provides storage and access to data.
• The scalability of the cloud helps reduce traffic, and cloud
cryptography is a way to ensure security.
• To harness the cloud in the best and most beneficial way, one needs to
rely on it completely and use it without interruption. To achieve this, one
first needs to optimize the commands in a proper channel in order to avoid
termination and re-creation of cloud instances.
• Cloud computing came as a ray of hope for researchers and database
organizers. It brings resources, data and tools together in the cloud, so
users can access the data virtually and work with it in the cloud itself
without downloading and maintaining a local copy on a personal system.
• Researchers do not need to buy expensive resources to carry out their daily research.
Thank You!