
Big Data

This document discusses applying cloud computing to analyze large amounts of biomedical data, known as big data. Cloud computing can help solve problems in biomedicine by processing data from many angles rather than a single approach. It allows for more effective fusion of data and interventions. When applying cloud computing to biomedical big data, resources must be acquired, data stored and searched efficiently, and reasonable processing performed to achieve goals like disease treatment and social stability improvements. Examples of applying cloud computing include tools for genomic sequence analysis and annotation in areas like next-generation sequencing and mass spectrometry.


Application of Cloud Computing in Biomedicine Big Data Analysis: Cloud Computing in Big Data


Tianyi Yang
Texas Center for Integrative Environmental Medicine (TCIEM)
Austin, TX, USA

Yang Zhao
School of Public Health, Nanjing Medical University
Nanjing, Jiangsu, China

[email protected]

Abstract— Medical technology has become a focus of social development. With the rapid growth of biomedical data, much medical work faces assorted problems, including huge data volumes and intensive computation. Big data processing in biomedicine has therefore received considerable attention, and it is now a major analysis method for a range of biological and biomedical experiments, e.g., second-generation sequencing and mass spectrometry analysis. The overall trend of development is positive. As a representative of contemporary science and technology, cloud computing plays a critical role through the capabilities it provides. This paper discusses the application of cloud computing in biomedical data processing and puts forward some reasonable suggestions.

Keywords— cloud computing; biomedicine; big data; processing

I. INTRODUCTION

Cloud computing can solve problems through many methods in the application process; it is not confined to a single means but approaches a problem from a number of angles. For the problems that arise in biomedical big data, cloud computing solutions are therefore more fundamental, and they reduce the need for blunt operational measures. It is worth noting that some interventions require cloud computing to be made more compatible with biomedical big data processing, so that the two can be fused more effectively and create higher value in specific work.

Biomedical research has become increasingly important in recent years. It is related not only to people's livelihood but also has a strong influence on the treatment of diseases, bacterial control, environmental improvement, social stability, and so on. Hence, as biomedical big data grows rapidly, it is essential to choose the most effective and reasonable technology for intervention. Otherwise, it is very difficult to ensure that early work makes adequate progress, and later work may deviate from the expected goal. Analysis shows that biomedical data are characterized by large volume and difficulty of processing, while the strength of cloud computing lies precisely in handling data in large quantity; the two are strongly complementary. In practice, it is impossible to obtain sound results directly at the start of applying cloud computing to biomedical data. An effective cloud computing scheme must first be designed, the changes in the biomedical data then observed, and finally reasonable processing completed to achieve the goal.

II. METHODS

A. To Acquire Cloud Computing Resources

Applying cloud computing to biomedical big data must rest on solid groundwork. The operator should obtain cloud computing resources effectively, so that the processing of biomedical big data proceeds along the right path. At this stage, general biological cloud users lack experience in resource allocation and professional knowledge of cloud resource control. The biological cloud therefore provides an interface for its many users, which makes cloud computing on biomedical big data more practical. For example, Galaxy CloudMan provides the corresponding configuration for biological cloud users and controls a cloud computing environment based on EC2. Users can work through the Web interface provided by CloudMan and complete the configuration of a cluster within a few minutes. In addition, to ensure efficient processing of biomedical data, the method provides automated procedures customized to the cloud resources, so that it can better meet specific user needs. The cloud computing scheme is shown in Figure 1.

B. Storage and Data Search

When cloud computing is applied to the processing of large biomedical data, its storage and data search functions must be strengthened so that users' working needs are met in specific operations. Many cloud computing software designs and systems now offer cloud resource leasing, with payment handled by the cloud platform. In most cases, once the lease succeeds, users can upload their own documents, information, and data into the cloud storage space. The advantage of cloud storage is that users can obtain resources whenever and wherever needed. Download speed is relatively fast, and files can also be downloaded through online browsing. For search, users need only enter part of the content to quickly locate the target through fuzzy search. The convenience of the service is thus greatly improved.
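
The fuzzy search described under "Storage and Data Search" can be illustrated with a minimal Python sketch. It is not the search mechanism of any particular cloud platform: the function name `fuzzy_search`, the file names, and the `cutoff` value are all hypothetical, and the matching relies only on the standard-library `difflib` module. Exact substring hits are returned first, followed by approximate matches that tolerate small typing errors.

```python
import difflib


def fuzzy_search(query, names, limit=3, cutoff=0.4):
    """Return up to `limit` stored names that best match a partial query.

    `cutoff` (0..1) is an assumed tuning value: higher means stricter matching.
    """
    lowered = {name.lower(): name for name in names}
    needle = query.lower()
    # Substring hits come first: the user typed part of the name exactly.
    hits = [name for key, name in lowered.items() if needle in key]
    # Then approximate matches, ordered from most to least similar.
    for key in difflib.get_close_matches(needle, list(lowered), n=limit, cutoff=cutoff):
        if lowered[key] not in hits:
            hits.append(lowered[key])
    return hits[:limit]


# Hypothetical object names in a cloud storage space.
stored = ["patient_genome_run12.bam", "mass_spec_batch3.raw", "ngs_reads_sample7.fastq"]
print(fuzzy_search("genome", stored))               # partial name
print(fuzzy_search("mass_spec_batch.raw", stored))  # name with a missing character
```

A production system would more likely delegate this to an indexed search service; the sketch only shows why entering part of a name is enough to locate the target.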

This paper has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the authors.
This includes one video of the presentation in WMV format.
This material is 2 MB in size.
[Figure 1: Basic models for biological cloud solutions. Diagram elements: get cloud resources; configuration; biological cloud platform; parallel execution; data storage; system image; run; storage, search, sharing.]

Table 1: Common Examples and Evaluation of Scientific Computing

| Instance Type | Instance | Specification (Internal Memory GiB / Computing Units ECUs) | Linux/UNIX Application Cost ($/h) |
|---|---|---|---|
| Standard on-demand instance | Medium-sized | 3.75 GiB / 2 ECUs | 0.130 |
| Standard on-demand instance | Large | 7.5 GiB / 4 ECUs | 0.260 |
| Standard on-demand instance | Super Large | 15 GiB / 8 ECUs | 0.520 |
| Second-generation standard on-demand instance | Super Large | 15 GiB / 13 ECUs | 0.980 |
| Second-generation standard on-demand instance | Double Super Large | 30 GiB / 26 ECUs | 1.960 |
| Memory-enhanced on-demand instance | Super Large | 17.1 GiB / 6.5 ECUs | 0.570 |
| Memory-enhanced on-demand instance | Double Super Large | 34.2 GiB / 13 ECUs | 1.140 |
| Memory-enhanced on-demand instance | Four Times Super Large | 68.4 GiB / 26 ECUs | 2.280 |
| CPU-enhanced on-demand instance | Medium-sized | 1.7 GiB / 5 ECUs | 0.285 |
| CPU-enhanced on-demand instance | Super Large | 7 GiB / 20 ECUs | 1.140 |
| Cluster computing example | Four Times Super Large | 23 GiB / 33.5 ECUs | 1.610 |
| Cluster computing example | Eight Times Super Large | 60.5 GiB / 88 ECUs | 2.970 |

C. Run and Share System Image

Users processing large biomedical data can run a customized biological cloud system image, such as Cloud BioLinux. Cloud BioLinux provides users with preconfigured command-line and graphical software. By the end of December 2013, it offered at least 135 bioinformatics software packages, and the number of data processing tools is constantly increasing. Users can also access the complete user documentation of each tool directly through the graphical interface of the virtual machine. Another approach (as in SIMPLEX, CloVR, etc.) is to package the related data processing workflow in the system image.

Table 2: Biomedical Applications in Cloud Computing

NGS Sequence Analysis

| Category | Name | Function |
|---|---|---|
| Sequence Alignment Matching | CloudBLAST | Pairwise alignment |
| Sequence Alignment Matching | Cloud-Coffee | Multiple sequence alignment |
| Sequence Alignment Matching | CloudBurst | Single-ended sequence matching |
| Sequence Alignment Matching | Cloud-MAQ | Sequence matching |
| Sequence Alignment Matching | CloudAligner | Sequence matching |
| Functional Process | Crossbow | SNP identification |
| Functional Process | VAT | Functional annotation of individual genome variation |
| Functional Process | Cloud RSD | Comparative genomics; identification of homologous sequences |
| Functional Process | SIMPLEX | Integrated analysis flow for exon data |
| Functional Process | PathSeq | Identification of pathogenic microorganisms in sequencing data from unknown samples |
| Functional Process | CloudLCA | Classification of metagenomic data based on the LCA algorithm |
| Functional Process | Myrna | Computation of differential expression from RNA-seq data |
| Functional Process | Eoulsan | Custom data processing flow for RNA-seq |
| Functional Process | FX | Establishing gene expression profiles and identifying genomic variation |
| Functional Process | PeakRanger | Peak identification in ChIP-seq data |
| Comprehensive Analysis Environment | Galaxy | Integrates many NGS analysis processes |
| Comprehensive Analysis Environment | CloVR | Integrated automated genome analysis process |
| Comprehensive Analysis Environment | Cloud BioLinux | Integrated bioinformatics toolkit |
| Comprehensive Analysis Environment | CloudMan | Automated deployment and management of EC2 clusters |

Other Tools

| Category | Name | Function |
|---|---|---|
| Expression Profile Analysis | YunBe | Gene enrichment analysis |
| Expression Profile Analysis | BioVLAB-MMIA | Integrated analysis of microRNA and mRNA expression data |
| Proteome | Cloud CPFP | Proteomic data analysis flow |
Table 2 (continued): Other Tools

| Category | Name | Function |
|---|---|---|
| Database Engine | SeqWare Query Engine | Cloud genome data retrieval engine |
| Interface and Library | Cloudgene | Graphical software management and use platform |
| Interface and Library | Hadoop-BAM | Java library for processing file formats such as BAM and SAM |

III. ALGORITHM DESIGN

Algorithm design can be accomplished through a collaborative filtering algorithm, which mainly includes the following steps: collecting users' preference data, finding similar users and items, and computing recommendations. First, the data collection module uses Sqoop, which works in a distributed manner, to collect large volumes of data from hospital information systems and convert them into simple triples: <UserID, ItemID, Preference>. After that, several similarity measures are used to calculate the similarity between users.

First, Euclidean distance expresses the real distance between two points in multidimensional space. The calculation formula is as follows:

\[ d(x, y) = \sqrt{\sum_{t} (x_t - y_t)^2} \]

Second, using the Euclidean distance, the similarity is expressed as:

\[ \mathrm{sim}(x, y) = \frac{1}{1 + d(x, y)} = \frac{1}{1 + \sqrt{\sum_{t} (x_t - y_t)^2}} \]

Third, the Pearson correlation coefficient is the ratio of the covariance between two triples, cov(x, y), to the product of their standard deviations, σxσy. The specific formula is as follows:

\[ p(x, y) = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\sum_i x_i y_i - \frac{\sum_i x_i \sum_i y_i}{N}}{\sqrt{\left(\sum_i x_i^2 - \frac{(\sum_i x_i)^2}{N}\right)\left(\sum_i y_i^2 - \frac{(\sum_i y_i)^2}{N}\right)}} \]

Finally, the cosine similarity treats two triples as vectors in the vector space and takes the cosine of the angle between them, whose size can be used to measure the difference between the two. The specific formula is as follows:

\[ \cos(\theta) = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}} \]

In the above formulas, x and y represent the triples, and N represents the total number of triples.

With the above methods, the similarity value can be obtained. In our system, we use two measures to acquire neighbor users or items: threshold-based neighbors, whose similarity exceeds a similarity threshold, and a fixed number of nearest neighbors. Finally, recommendation is achieved mainly through user-based collaborative filtering and item-based collaborative filtering.

IV. CONCLUSION

This paper discusses the application of cloud computing in biomedical data processing. Many data processing tasks can now be completed with the help of cloud computing, and no serious problems have been reported. On the whole, cloud computing is therefore a scientifically sound approach that meets the standards of biomedical data processing. In the future, researchers should study cloud computing more intensively, so as to provide more schemes for biomedical data processing.
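
The similarity measures of Section III (Euclidean distance, the similarity derived from it, the Pearson correlation coefficient, and cosine similarity) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: each of x and y is assumed to be a list of preference values over the same N items, and all function names are hypothetical.

```python
import math


def euclidean_distance(x, y):
    """d(x, y) = sqrt(sum_t (x_t - y_t)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def euclidean_similarity(x, y):
    """sim(x, y) = 1 / (1 + d(x, y)); equals 1 when the vectors are identical."""
    return 1.0 / (1.0 + euclidean_distance(x, y))


def pearson(x, y):
    """Pearson correlation, computed from raw sums as in the formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    numerator = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    denominator = math.sqrt(
        (sum(a * a for a in x) - sum_x ** 2 / n)
        * (sum(b * b for b in y) - sum_y ** 2 / n)
    )
    # Assumption: a constant vector (zero variance) is treated as uncorrelated.
    return numerator / denominator if denominator else 0.0


def cosine(x, y):
    """cos(theta) = sum_i x_i y_i / (sqrt(sum_i x_i^2) * sqrt(sum_i y_i^2))."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


# Hypothetical preference vectors of two users over the same three items.
x = [5.0, 3.0, 4.0]
y = [4.0, 2.0, 3.0]
print(euclidean_distance(x, y), euclidean_similarity(x, y))
print(pearson(x, y), cosine(x, y))
```

Here y is x shifted down by one point, so the Pearson coefficient is 1 even though the Euclidean similarity is well below 1. This difference is one reason a system may compute several measures rather than a single one.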

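
The two neighbor-selection strategies named in Section III (a similarity threshold and a fixed number of neighbors), together with a user-based recommendation step, can be sketched as follows. The sketch makes several assumptions that the paper does not state: similarity is the Euclidean-based measure restricted to items both users have rated, the predicted preference is a similarity-weighted average over neighbors, and every identifier and data value is hypothetical.

```python
import math
from collections import defaultdict


def build_profiles(triples):
    """Group <UserID, ItemID, Preference> triples into one dict per user."""
    profiles = defaultdict(dict)
    for user_id, item_id, preference in triples:
        profiles[user_id][item_id] = float(preference)
    return dict(profiles)


def similarity(a, b):
    """Euclidean-based similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    distance = math.sqrt(sum((a[i] - b[i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)


def scored_candidates(user, profiles):
    """All other users paired with their similarity, most similar first."""
    pairs = [(other, similarity(profiles[user], prefs))
             for other, prefs in profiles.items() if other != user]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def threshold_neighbors(user, profiles, threshold):
    """Neighbors whose similarity reaches the threshold."""
    return [(o, s) for o, s in scored_candidates(user, profiles) if s >= threshold]


def nearest_n_neighbors(user, profiles, n):
    """The n most similar users, regardless of how similar they are."""
    return scored_candidates(user, profiles)[:n]


def recommend(user, profiles, neighbors):
    """Similarity-weighted average preference for items the user has not rated."""
    totals, weights = defaultdict(float), defaultdict(float)
    for other, sim in neighbors:
        for item, pref in profiles[other].items():
            if item not in profiles[user]:
                totals[item] += sim * pref
                weights[item] += sim
    return {item: totals[item] / weights[item] for item in totals if weights[item] > 0}


# Hypothetical triples, e.g. as exported from a hospital information system.
triples = [
    ("u1", "item_a", 5), ("u1", "item_b", 3),
    ("u2", "item_a", 5), ("u2", "item_b", 3), ("u2", "item_c", 4),
    ("u3", "item_a", 1), ("u3", "item_b", 5), ("u3", "item_c", 2),
]
profiles = build_profiles(triples)
print(recommend("u1", profiles, threshold_neighbors("u1", profiles, 0.5)))
print(recommend("u1", profiles, nearest_n_neighbors("u1", profiles, 2)))
```

The threshold strategy keeps only sufficiently similar users and may return no neighbors at all, whereas the fixed-number strategy always returns neighbors but may include dissimilar ones. The two calls above give different predictions for `item_c` for that reason.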