Solutions to Homework Problems in Chapter 6
Hwang, Fox and Dongarra: Distributed and Cloud Computing,
Morgan Kaufmann Publishers, copyrighted 2012
Note: The solutions to the Chapter 6 problems were prepared with the assistance of graduate students from
Indiana University under the supervision of Dr. Judy Qiu.
Problem 6.1:
Get the source code from: http://dl.dropbox.com/u/12951553/bookanswers/answer6.1.zip
(a). We implemented a demo system that is quite simple in its functionality: there is a search
box used to find contacts, and once a contact has been found, we list recent emails and
attachments associated with the contact. To do this, the application offers three URLs that are
called by the JavaScript running in the browser to obtain the data: search.json,
messages.json and files.json.
The system responds to a request for the message history of a given contact by calling
/messages.json, which accepts an email address as a GET parameter.
Note that this functionality requires an authentication step not shown here. The code behind
that call is as follows:
class MessagesHandler(webapp.RequestHandler):
    def get(self):
        current_user = users.get_current_user()
        current_email = current_user.email()
        emailAddr = self.request.get('email')
        contextIO = ContextIO(api_key=settings.CONTEXTIO_OAUTH_KEY,
                              api_secret=settings.CONTEXTIO_OAUTH_SECRET,
                              api_url=settings.CONTEXTIO_API_URL)
        response = contextIO.contactmessages(emailAddr, account=current_email)
        self.response.out.write(simplejson.dumps(response.get_data()))
The code simply uses Context.IO's contactmessages.json API call and returns all the messages,
including the subject, other recipients, thread ID, and even attachments, in JSON format.
The complete code for this demo application has been made available by the Context.IO team
on their GitHub account (https://github.com/contextio/AppEngineDemo).
This answer is based on the Google App Engine Blog Post at
http://googleappengine.blogspot.com/2011/05/accessing-gmail-accounts-from-app.html.
(b). The Google App Engine dashboard provides measurements of useful aspects of the
deployed application, for example execution logs, version control, quota details, a datastore
viewer, and administration tools. It also provides detailed resource usage information (the
dashboard screenshot is not reproduced here). Critical measurements can easily be retrieved
from this powerful dashboard.
(c). Automatic scaling is built into App Engine, and it is not visible to users.
http://code.google.com/appengine/whyappengine.html#scale
Problem 6.2:
Get the source code: http://dl.dropbox.com/u/12951553/bookanswers/answer6.2.zip
Here we design a very simple data storage system using the Blobstore service to
illustrate how Google App Engine handles data. The Blobstore API allows your application to
serve data objects, called blobs, that are much larger than the size allowed for objects in the
Datastore service. Blobs are useful for serving large files, such as video or image files, and for
allowing users to upload large data files. Blobs are created by uploading a file through an HTTP
request.
Typically, your applications will do this by presenting a form with a file upload field to the
user. When the form is submitted, the Blobstore creates a blob from the file's contents and
returns an opaque reference to the blob, called a blob key, which you can later use to serve the
blob. The application can serve the complete blob value in response to a user request, or it can
read the value directly using a streaming file-like interface. This system includes the following
functions: user login, data listing, data upload/download. Gzip compression is used when
possible to decrease the cost.
User login: This function is implemented using the User Service provided in GAE. If the user
is already signed in to your application, get_current_user() returns the User object for the user.
Otherwise, it returns None. If the user has signed in, display a personalized message, using the
nickname associated with the user's account. If the user has not signed in, tell webapp to
redirect the user's browser to the Google account sign-in screen. The redirect includes the URL
to this page (self.request.uri) so the Google account sign-in mechanism will send the user back
here after the user has signed in or registered for a new account.
user = users.get_current_user()
if user:
    self.response.headers['Content-Encoding'] = 'gzip'
    self.response.headers['Content-Type'] = 'text/plain'
    self.response.out.write('Hello, ' + user.nickname())
    self.response.out.write('<a href=' + users.create_logout_url("/") + '>sign out</a><br/>')
else:
    self.redirect(users.create_login_url(self.request.uri))
The content is gzip compressed when sent back from the server. Also, a log out link is provided.
Data listing: To list the data uploaded by a specific user, GQL is used to guarantee that users
can only see and access data that belongs to them.
class Blob(db.Model):
    """Models a data entry with a user, content, name, size, and date."""
    user = db.UserProperty()
    name = db.StringProperty(multiline=True)
    content = blobstore.BlobReferenceProperty(blobstore.BlobKey)
    date = db.DateTimeProperty(auto_now_add=True)
    size = db.IntegerProperty()
This defines a data blob class with five properties: user, whose value is a User object; name,
whose value is a String; content, whose value is a BlobKey pointing to this blob; date, whose
value is a datetime.datetime; and size, whose value is an Integer. GQL, a SQL-like query
language, provides access to the App Engine datastore query engine's features using a familiar
syntax. The query happens here:
blobs = db.GqlQuery("SELECT * "
"FROM Blob "
"WHERE user = :1", user)
This can return all blobs uploaded by this user.
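The following is a minimal sketch (not part of the provided answer code) of how such a listing
query might be rendered in the same webapp handler style used above; the URL layout and the
HTML are assumptions for illustration only.

class ListHandler(webapp.RequestHandler):
    def get(self):
        user = users.get_current_user()
        if not user:
            self.redirect(users.create_login_url(self.request.uri))
            return
        blobs = db.GqlQuery("SELECT * FROM Blob WHERE user = :1", user)
        self.response.out.write('<html><body><ul>')
        for b in blobs:
            # Link each entry to a download handler (assumed mounted at /serve/) via its blob key.
            self.response.out.write('<li><a href="/serve/%s">%s</a> (%d bytes)</li>'
                                    % (b.content.key(), b.name, b.size))
        self.response.out.write('</ul></body></html>')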
Data upload: To create and upload a blob, follow this procedure:
Call blobstore.create_upload_url() to create an upload URL for the form that the user will fill
out, passing the application path to load when the POST of the form is completed:
upload_url = blobstore.create_upload_url('/upload')
There is an asynchronous version, create_upload_url_async(). It allows your application
code to continue running while Blobstore generates the upload URL.
The form must include a file upload field, and the form's enctype must be set to
multipart/form-data. When the user submits the form, the POST is handled by the Blobstore API, which
creates the blob. The API also creates an info record for the blob, stores the record in the
datastore, and passes the rewritten request to your application on a given path as a blob key:
self.response.out.write('<html><body>')
self.response.out.write('<form action="%s" method="POST" enctype="multipart/form-data">' %
upload_url)
self.response.out.write("""Upload File: <input type="file" name="file"><br> <input type="submit"
name="submit" value="Submit"> </form></body></html>""")
• In this handler, you can store the blob key with the rest of your application's data model.
The blob key itself remains accessible from the blob info entity in the datastore. Note that
after the user submits the form and your handler is called, the blob has already been
saved and the blob info added to the datastore. If your application doesn't want to keep
the blob, you should delete the blob immediately to prevent it from becoming orphaned:
class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        try:
            upload_files = self.get_uploads('file')  # 'file' is the file upload field in the form
            blob_info = upload_files[0]
            myblob = Blob()
            myblob.name = blob_info.filename
            myblob.size = blob_info.size
            myblob.user = users.get_current_user()
            myblob.content = blob_info.key()
            myblob.put()
            self.redirect('/')
        except:
            self.redirect('/')
• The webapp framework provides
the blobstore_handlers.BlobstoreUploadHandler upload handler class to help you parse
the form data. For more information, see the reference for BlobstoreUploadHandler.
• When the Blobstore rewrites the user's request, the MIME parts of the uploaded files
have their bodies emptied, and the blob key is added as a MIME part header. All other
form fields and parts are preserved and passed to the upload handler. If you don't specify
a content type, the Blobstore will try to infer it from the file extension. If no content type
can be determined, the newly created blob is assigned content type application/octet-
stream.
Data download: To serve blobs, you must include a blob download handler as a path in your
application. The application serves a blob by setting a header on the outgoing response. The
following sample uses the webapp framework. When using webapp, the handler should pass
the blob key for the desired blob to self.send_blob(). In this example, the blob key is passed to
the download handler as part of the URL. The download handler can get the blob key by any
means you choose, such as through another method or user action.
class ServeHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        self.send_blob(blob_info)
The webapp framework provides the download handler class
blobstore_handlers.BlobstoreDownloadHandler to help you serve blobs. For more information, see the
reference for BlobstoreDownloadHandler. Blobs can be served from any application URL. To
serve a blob in your application, you put a special header in the response containing the blob
key. App Engine replaces the body of the response with the content of the blob.
Problem 6.3:
Source code: http://dl.dropbox.com/u/12951553/bookanswers/answer6.3.zip
For this question, we provide a Java SimpleDB application with all critical functions: domain
creation, data insertion, data modification, data deletion, and domain deletion. These
functions demonstrate how to make basic requests to Amazon SimpleDB using the AWS SDK
for Java. The reader can easily scale this application up to meet the requirements of the
question.
Prerequisites: You must have a valid Amazon Web Services developer account, and be signed
up to use Amazon SimpleDB. For more information on Amazon SimpleDB, please refer to
http://aws.amazon.com/simpledb
http://aws.amazon.com/security-credentials
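For readers who prefer Python, the same basic SimpleDB operations can be sketched with the boto
library. This is only an illustrative sketch under stated assumptions (the provided solution itself is
in Java); the domain name, item name, and attributes below are placeholders.

import boto

conn = boto.connect_sdb('YOUR_ACCESS_KEY', 'YOUR_SECRET_KEY')

# Domain creation
domain = conn.create_domain('BookInventory')

# Data insertion
item = domain.new_item('book-001')
item['Title'] = 'Distributed and Cloud Computing'
item['Year'] = '2012'
item.save()

# Data modification
item['Year'] = '2011'
item.save()

# Query, then data deletion
for result in domain.select("select * from BookInventory where Year = '2011'"):
    print(result.name, dict(result))
domain.delete_item(item)

# Domain deletion
conn.delete_domain('BookInventory')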
Problem 6.4:
Now, design and request an EC2 configuration on the AWS platform for parallel
multiplication of two very large matrices with an order exceeding 50,000.
Source code : http://156.56.93.128/PBMS/doc/answer6.4.zip
The parallel matrix multiplication is implemented using Hadoop 0.20.205, and experiments are
performed on the Amazon EC2 platform with sample matrices of order between 20,000 and
50,000. The steps to implement parallel matrix multiplication using Hadoop are as follows:
1) Split matrix A and matrix B into two grids of n*n blocks. There will be 2*n*n Map tasks
and n*n Reduce tasks.
2) Each Map task holds either A[p][q] or B[p][q] and sends it to the n Reduce tasks
r[p][i] (1 <= i <= n) or r[j][q] (1 <= j <= n), respectively.
3) Each Reduce task r[p][q] receives 2*n sub-matrices, namely A[p][k] and B[k][q] for
1 <= k <= n, from the Map tasks; it then multiplies each pair A[p][k] * B[k][q] and
sums the products.
The advantages of this algorithm are: 1) splitting the large matrices into small sub-matrices lets
the working set of sub-matrices fit in the memory of a small EC2 instance; 2) many small
tasks increase the application's parallelism. The disadvantage is the parallel overhead, in
terms of scheduling, communication, and sorting, caused by the large number of tasks. A minimal
sketch of this block decomposition is given below.
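The following self-contained Python sketch mimics the map and reduce roles of the block
decomposition described in steps 1-3 above. It is not the authors' Hadoop Java code; the grid
size n, the block size, and the use of NumPy arrays are assumptions for illustration.

import numpy as np
from collections import defaultdict

def map_phase(A_blocks, B_blocks, n):
    """Each 'map task' holds one block and routes it to the reduce keys that need it."""
    routed = defaultdict(list)
    for (p, q), block in A_blocks.items():      # A[p][q] is needed by every C[p][j]
        for j in range(n):
            routed[(p, j)].append(("A", q, block))
    for (p, q), block in B_blocks.items():      # B[p][q] is needed by every C[i][q]
        for i in range(n):
            routed[(i, q)].append(("B", p, block))
    return routed

def reduce_phase(key, values, block_size):
    """Reduce task r[p][q]: pair A[p][k] with B[k][q] and sum the block products."""
    a = {k: blk for tag, k, blk in values if tag == "A"}
    b = {k: blk for tag, k, blk in values if tag == "B"}
    c = np.zeros((block_size, block_size))
    for k in a:
        c += a[k] @ b[k]
    return key, c

if __name__ == "__main__":
    n, bs = 2, 3                                 # tiny sizes so the sketch runs quickly
    A = np.random.rand(n * bs, n * bs)
    B = np.random.rand(n * bs, n * bs)
    split = lambda M: {(i, j): M[i*bs:(i+1)*bs, j*bs:(j+1)*bs] for i in range(n) for j in range(n)}
    routed = map_phase(split(A), split(B), n)
    C = np.zeros_like(A)
    for (p, q), vals in routed.items():
        _, c_block = reduce_phase((p, q), vals, bs)
        C[p*bs:(p+1)*bs, q*bs:(q+1)*bs] = c_block
    assert np.allclose(C, A @ B)                 # sanity check of the decomposition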
EC2 configuration
In the experiments, we use the EMR instance type M1.small: 1.7 GB memory and 1 core per node. We
created instance groups with 1, 2, 4, 8, and 16 nodes, respectively. Note that the Hadoop
jobtracker and namenode take one dedicated node in the 2-, 4-, 8-, and 16-node cases.
Steps:
a. ./elastic-mapreduce --create --instance-count 16 --alive (apply resource)
b. ./elastic-mapreduce --jobflow j-22ZM5UUKIK69O --ssh (ssh to master node)
c. ./s3cmd get s3://wc-jar/matrix-multiply-hadoop.jar (download program jar file)
d. ./s3cmd get s3://wc-input/matrix-50k-5k ./50k-5k (download input data)
e. hadoop dfs -put 50k-5k/* 50k-5k (upload data to HDFS)
f. hadoop jar matrix-multiply-hadoop.jar 50k-5k output 50000 5000 10 (run program)
Analysis
Figures 1-4 show that our parallel matrix multiply implementation scales well in EC2,
especially for large matrices. For example, the relative speed-ups for processing the 20k, 30k,
40k, and 50k data are 4.43, 7.75, 9.67, and 11.58, respectively, when using 16 nodes. The larger
the matrices, the better the parallel efficiency of the application. (The performance using two
nodes is only a little faster than the one-node case because the jobtracker and tasktracker were
run on separate nodes.)
Other issues in the experiments:
Storage utilization: the data sizes are 16 GB, 36 GB, 64 GB, and 100 GB for the 20k, 30k, 40k,
and 50k data sets, respectively, or 216 GB in total. The total costs for the experiments are: input
data transfer in, $0.10 * 216 GB = $21.60; EC2 instances, M1.small, 290 hours * $0.08/hour = $23.20.
System metrics such as resource utilization can be monitored using CloudWatch in the AWS Management Console.
For fault tolerance, see the answer to Problem 4.10.
Experimental results: Figures 1-4 plot the parallel matrix multiply performance for the 20K, 30K,
40K, and 50K cases (figures not reproduced here).
Problem 6.5:
We implemented the parallel matrix multiply application using EMR and S3 on the AWS
platform. The basic algorithm and configuration are the same as in Problem 6.4. The only
difference is that here Hadoop retrieves the input data from S3 rather than from HDFS as in
Problem 6.4.
Analysis
Figures 1-4 show that the parallel matrix multiply scales well in the EMR/S3 environment,
especially for large matrices. The relative speed-ups for processing the 20k, 30k, 40k, and 50k
data are 7.24, 12.3, 16.4, and 19.39, respectively, when using 16 nodes. The super-linear speedup
results were mainly caused by serious network contention when using a single node to retrieve
input data from S3. Compared with the HDFS results in Problem 6.4, the results for the 20k, 30k,
40k, and 50k data sets using S3 on 16 nodes are 1.3, 1.65, 1.67, and 1.66 times slower in job
turnaround time, respectively. The results using fewer nodes are even slower; for example, the
result for the 50k data using S3 on 2 nodes is 2.19 times slower than the HDFS case. These
results indicate the large overhead incurred when Hadoop retrieves input data from S3. Figure 5
shows that the average speed of transferring data from S3 to an EC2 instance is 8.54 MB/sec. For
the detailed algorithm, configuration, and analysis of other issues such as speedup and
cost-efficiency, see the answer to Problem 6.4.
Performance results: Figures 1-4 plot the parallel matrix multiply performance for the 20K, 30K,
40K, and 50K cases, and Figure 5 shows the S3 data transfer speed (figures not reproduced here).
Problem 6.6:
Outline of Eli Lilly cloud usage
Eli Lilly uses cloud computing in the research arm of the company. In silico analyses are a
large part of the research process in the pharmaceutical industry, and Eli Lilly is no exception.
Cloud computing gives Lilly burst capacity when its internal compute environment is fully
utilized. Additionally, Eli Lilly relies on cloud computing for analyses of public datasets, where
there is little to no concern about intellectual property or security. By running these analyses
outside of its primary data centers, the company can free up internal resources for
high-performance computing and high-throughput computing workflows that either may not fit
well in the cloud or involve analyses that are considered more proprietary or regulated.
As of 2009, Eli Lilly was mainly using the Amazon Web Services cloud, but it has plans to use
many more cloud vendors in the future, requiring an orchestration layer between Eli Lilly
and the various cloud services. According to Eli Lilly, a new server in AWS can be up and
running in three minutes, compared to the seven and a half weeks it takes to deploy a server
internally. A 64-node AWS Linux cluster can be online in five minutes, compared with the three
months it takes to set up such a cluster internally.
One of the main drivers for Lilly to use the cloud is to move development efforts through
the drug pipeline more quickly. If analyses can be done in a fraction of the time because of the
scale of the cloud, then thousands of dollars spent on utility computing to speed up the pipeline
can generate millions of dollars of revenue in a quicker timeframe.
Sources:
http://www.informationweek.com/news/hardware/data_centers/228200755
http://www.informationweek.com/news/healthcare/clinical-systems/227400374
http://www.informationweek.com/cloud-computing/blog/archives/2009/01/whats_next_in_t.html
Problem 6.7:
The source code of this application can be obtained from the following link:
http://dl.dropbox.com/u/27392330/forCloudBook/AzureTableDemo-gaoxm.zip
Using the Azure SDK for Microsoft Visual Studio, we developed a simple web application, shown in
the figure below. This application is extended from the Azure Table demo made by
Nancy Strickland (http://www.itmentors.com/code/2011/03/AzureUpdates/Tables.zip).
It can be used to demonstrate the use of Windows Azure Table and to run some
simple performance tests of Windows Azure Table. A Web role is created for this application,
which accesses the Windows Azure Table service from the Web server side. When the "Add
Customer" button is clicked, a new entity is created and inserted into an Azure table.
When the "Query Customer" button is clicked, the table is queried with the customer code and
the customer's name is shown after "Name". When proper values are set in the
"number of rows", "batch size", and "start rowkey" boxes, users can click the different "test"
buttons to run different performance tests against Windows Azure Table.
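As a rough illustration of the same add and query operations, the following sketch uses the Python
azure-data-tables SDK rather than the .NET SDK used in the demo; the connection string, table
name, and property names are assumptions, not part of the provided code.

from azure.data.tables import TableServiceClient

conn_str = "<your Azure Storage connection string>"   # assumed to be configured
service = TableServiceClient.from_connection_string(conn_str)
customers = service.create_table_if_not_exists("Customers")

# "Add Customer": insert one entity keyed by (PartitionKey, RowKey).
customers.create_entity({
    "PartitionKey": "customers",
    "RowKey": "C001",            # the customer code typed into the form
    "Name": "Alice Example",
})

# "Query Customer": look the entity up again by its keys and show the name.
entity = customers.get_entity(partition_key="customers", row_key="C001")
print(entity["Name"])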
Besides the local version, we also deployed the application on a virtual machine in the
Azure cloud. Some lessons we learned from writing and deploying this application are:
1. The concepts of and separation between "Web role", "VM role", and "Worker role" are not
straightforward to understand during development, and it takes some time to learn how to develop
Azure applications.
2. Users cannot remotely log in to VMs by default; it requires some special configuration. Besides,
the security restrictions on the VMs make them hard to operate. For example, almost all
websites are marked as "untrusted" by IE in the VMs, which makes it very hard to even
download something using the browser.
3. The SDK for Microsoft Visual Studio is powerful. The integration of the debugging and
deployment stages in Visual Studio is very convenient and easy to use. However, the
deployment process takes a long time, and it is hard to diagnose what is wrong if the
deployment fails.
4. Overall, we think the Amazon EC2 models and Amazon Web Services are easier to
understand and closer to developers' current experience.
Figure 4: A simple Windows Azure Web application using Azure Table. Figure 5: Read and write
speed for Windows Azure Table. (Figures not reproduced here.)
Problem 6.8:
In the MapReduce programming model there is a special case that implements only the
map phase, also known as the "map-only" pattern. This approach lets an existing
application or binary achieve high throughput by running many instances in parallel; in other
words, it helps a standalone program utilize large-scale computing capability. The goal of
this exercise is to write a Hadoop "map-only" program around the bioinformatics application BLAST
(NCBI BLAST+: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.23/) under a Linux/Unix
environment.
Source code: http://dl.dropbox.com/u/12951553/bookanswers/feiteng_blast.zip
For detailed usage of the source code, please refer to
http://salsahpc.indiana.edu/tutorial/hadoopblast.html.
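To make the "map-only" idea concrete, here is a minimal Hadoop Streaming mapper sketch in
Python (it is not the Java program in the zip above). The BLAST+ binary path, database path,
output directory, and input record format are assumptions; each input line is assumed to name one
FASTA query file already present on the worker node.

#!/usr/bin/env python
import subprocess
import sys
import os

BLAST_BIN = "/opt/blast/bin/blastx"       # assumed install path of NCBI BLAST+
DB = "/opt/blast/db/nr"                   # assumed formatted database
OUT_DIR = "/tmp/blast_out"

def main():
    os.makedirs(OUT_DIR, exist_ok=True)
    for line in sys.stdin:                # Hadoop Streaming feeds input records on stdin
        query = line.strip()
        if not query:
            continue
        out_file = os.path.join(OUT_DIR, os.path.basename(query) + ".out")
        # Run one standalone BLAST job per input record; no reduce phase is needed.
        subprocess.check_call([BLAST_BIN, "-query", query, "-db", DB, "-out", out_file])
        print("%s\t%s" % (query, out_file))   # emit <query, result-path> for bookkeeping

if __name__ == "__main__":
    main()

Such a mapper would be launched through the Hadoop Streaming jar with the number of reduce tasks
set to zero (e.g., -D mapred.reduce.tasks=0 on Hadoop 0.20), which is what makes the job "map-only".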
Problem 6.9:
This problem is research-oriented. Visit the posted Manjrasoft Aneka software web site
for details and example solutions.
Problem 6.10:
Repeat the applications in Problems 6.1 to 6.7 using the academic/open-source packages
described in Section 6.6, namely Eucalyptus, Nimbus, OpenStack, OpenNebula, and Sector/Sphere.
This software is all available on FutureGrid (http://www.futureGrid.org) with a number of tutorials.
FutureGrid Tutorials - https://portal.futuregrid.org/tutorials
Using Eucalyptus on FutureGrid - https://portal.futuregrid.org/tutorials/eucalyptus
Using Nimbus on FutureGrid - https://portal.futuregrid.org/tutorials/nimbus
Using OpenStack on FutureGrid - https://portal.futuregrid.org/tutorials/openstack
The answer to question 6.15 also provides an overview of using Hadoop on FutureGrid cloud
environments.
Problem 6.11:
Test run the large-scale matrix multiplication program on two or three cloud platforms (GAE,
AWS, and Azure). You can also choose another data-intensive application such as a large-scale
search or business processing application involving the masses from the general public.
Implement the application on at least two or all three cloud platforms, separately. The major
objective is to minimize the execution time of the application. The minor objective is to minimize
the user service costs.
(a) Run the service on the Google GAE platform
(b) Run the service on the Amazon AWS platform
(c) Run the service on the Windows Azure platform
(d) Compare your compute and storage costs, design experiences, and experimental
results on all three cloud platforms. Report their relative performance and the QoS results
measured.
Implementations:
The implementations of the large-scale matrix multiplication program on AWS and Azure using
Hadoop and MPI are given in this chapter. The solution using Hadoop on the Amazon AWS
platform was discussed in Problems 6.4 and 6.6. Here we discuss the solution using MPI on the
Azure HPC Scheduler. A parallel matrix multiply algorithm, Fox's algorithm, was implemented
using MS.MPI. We then created the host service and deployed the Windows HPC cluster on
Azure using the Azure HPC Scheduler SDK tools. After that, we log on to the HPC cluster head
node and submit the large-scale matrix multiplication job there. A sketch of the algorithm
follows the source code link below.
Source code : http://156.56.93.128/PBMS/doc/answer6.14.zip
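The following compact Python sketch of Fox's algorithm uses mpi4py and NumPy. It is only an
illustration of the algorithm named above, not the MS.MPI/Azure HPC code in the zip; the local
block size and the assumption of a perfect-square process count are illustrative choices
(run, for example, with mpiexec -n 4).

from mpi4py import MPI
import numpy as np

def fox_multiply(A_block, B_block, cart, q):
    """Return this process's block of C = A*B on a q x q periodic process grid."""
    row_comm = cart.Sub([False, True])          # processes in my grid row
    col_comm = cart.Sub([True, False])          # processes in my grid column
    my_row, my_col = cart.Get_coords(cart.Get_rank())
    C_block = np.zeros_like(A_block)
    B_work = B_block.copy()
    up, down = (my_row - 1) % q, (my_row + 1) % q
    for stage in range(q):
        root = (my_row + stage) % q             # column holding the A block to broadcast
        A_bcast = A_block.copy() if my_col == root else np.empty_like(A_block)
        row_comm.Bcast(A_bcast, root=root)      # broadcast A[my_row][root] along the row
        C_block += A_bcast @ B_work
        col_comm.Sendrecv_replace(B_work, dest=up, source=down)   # roll B blocks upward
    return C_block

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    q = int(round(comm.Get_size() ** 0.5))      # assume a perfect-square process count
    cart = comm.Create_cart([q, q], periods=[True, True])
    block = 500                                  # assumed local block order
    A_local = np.random.rand(block, block)
    B_local = np.random.rand(block, block)
    C_local = fox_multiply(A_local, B_local, cart, q)
    if comm.Get_rank() == 0:
        print("local C block computed, shape:", C_local.shape)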
Steps:
1) Setup Azure HPC SDK environment:
http://msdn.microsoft.com/en-us/library/windowsazure/hh545593.aspx
2) Configure and deploy HPC Cluster on Azure.
http://msdn.microsoft.com/en-us/library/hh560239(v=vs.85).aspx
3) Log on to the head node of the HPC cluster and copy the executable binary onto the head node.
4) Set up the execution environment and configure the firewall exception:
clusrun /nodegroup:computenode xcopy /E /Y \\HEADNODE1\approot\*.* F:\approot\
clusrun /nodegroup:computenode hpcfwutil register FoxMatrix.exe F:\approot\FoxMatrix.exe
http://msdn.microsoft.com/en-us/library/hh560242(v=vs.85).aspx.
5) Submit the MPI job to the HPC scheduler:
job submit /nodegroup:computenodes /numnodes:16 mpiexec -n 16 -wdir F:\approot
F:\approot\FoxMatrix.exe 16000
Comparison:
Comparing Azure with Amazon AWS, both platforms provide a graphical interface for users to
deploy a Hadoop or HPC cluster, respectively. Developers can submit HPC jobs and Hadoop
jobs to the dynamically deployed cluster either on the head node or from a client PC through the
job submission API. With regard to performance, the applications running on both Azure and EC2
show performance fluctuation. Figures 1 and 2 show that the maximum performance fluctuations of
Hadoop using S3, Hadoop using HDFS, MPIAzure, and MPICluster are 8.1%, 1.9%, 5.3%, and
1.2%, respectively. Network bandwidth fluctuation is the main reason for the performance
fluctuation of the Hadoop-on-S3 implementation. The performance fluctuation of the MPIAzure
implementation is due to the aggregated delay of MPI communication primitives caused by
system noise in the guest OS in the cloud environment.
Figure 1: Performance fluctuation of Hadoop using HDFS and S3 for different problem sizes.
Figure 2: Performance fluctuation of MPIAzure and MPIHPC for different problem sizes.
(Figures not reproduced here.)
Performance analysis:
The performance analysis of parallel matrix multiplication on Amazon EC2 was discussed in
Problem 6.4; this section analyzes the performance of the MPIAzure implementation. Figure 3
shows that the speedup of the MPICluster implementation is 8.6%, 37.1%, and 19.3% higher than
that of the MPIAzure implementation when using 4, 9, and 16 nodes, respectively. Again, the
performance degradation of the MPIAzure implementation is caused by the poor network
performance in the cloud environment. Figure 4 shows the performance of the Fox algorithm for
the three implementations using 16 compute nodes. As expected, MPIAzure is slower than
MPICluster, but faster than DryadCluster. Figures 5 and 6 show the parallel overhead versus
1/sqrt(n), where n refers to the number of matrix elements per node.
In Figure 6, the parallel overhead for the 5x5, 4x4, and 3x3 node cases is linear in 1/sqrt(n),
which indicates that the Fox MS.MPI implementation scales well on our HPC cluster with its
InfiniBand network. In Figure 5, the parallel overhead for the 3x3 and 4x4 node cases does not
converge to the x-axis for large matrix sizes. The reason is the serious network contention that
occurs in the cloud environment when running with large matrices.
Figure 3: Speedup versus number of nodes for MPIAzure and MPICluster. Figure 4: Job time of
different runtimes on Azure and the HPC cluster for different problem sizes. Figure 5: Parallel
overhead vs. 1/sqrt(n) for Fox/MPIAzure/MKL on 3x3 and 4x4 nodes. Figure 6: Parallel overhead
vs. 1/sqrt(n) for Fox/MPICluster/MKL on 3x3 and 4x4 nodes. (Figures not reproduced here.)
Problem 6.12:
Comparison of the Google MapReduce, Apache Hadoop MapReduce, and Microsoft Dryad
programming environments:
• Coding language and programming model used: C++ with MapReduce (Google); Java with
MapReduce (Hadoop); .NET languages with DryadLINQ (Dryad).
• Mechanisms for data handling: GFS (Google File System) for Google; HDFS (Hadoop
Distributed File System) for Hadoop; shared directories and local disks for Dryad.
• Failure handling methods: all three re-execute failed tasks and duplicate the execution of
slow tasks.
• High-level language for data analysis: Sawzall (Google); Pig Latin and Hive (Hadoop);
DryadLINQ (Dryad).
• OS and cluster environment: Linux clusters (Google); Linux clusters and Amazon Elastic
MapReduce on EC2 (Hadoop); Windows HPCS clusters (Dryad).
• Intermediate data transfer method: by file transfer or HTTP links (Google and Hadoop);
files, TCP pipes, and shared-memory FIFOs (Dryad).
Problem 6.13:
The following program illustrates a sample application for image filtering using Aneka’s
MapReduce Programming Model. Note that the actual image filtering is dependent on the
problem domain and you may use any algorithm you see fit.
class Program
{
/// Reference to the configuration object.
static Configuration configuration = null;
/// Location of the configuration file.
static string configurationFileLocation = "conf.xml";
/// Processes the arguments given to the application and according
/// to the parameters read runs the application or shows the help.
/// <param name="args">program arguments</param>
static void Main(string[] args)
{
try
{
//Process the arguments
Program.ProcessArgs(args);
Program.SetupWorkspace();
//configure MapReduceApplication
MapReduceApplication<ImageFilterMapper, ImageFilterReducer>
application = new MapReduceApplication<ImageFilterMapper,
ImageFilterReducer>("ImageFilter", configuration);
//invoke and wait for result
application.InvokeAndWait(new EventHandler<Aneka.Entity.
ApplicationEventArgs>
(OnApplicationFinished));
}
catch (Exception ex)
{
Console.WriteLine(" Message: {0}", ex.Message);
Console.WriteLine("Application terminated unexpectedly.");
}
}
/// Hooks the ApplicationFinished events and processes the results
/// if the application has been successful.
/// <param name="sender">event source</param>
/// <param name="e">event information</param>
static void OnApplicationFinished(object sender,
Aneka.Entity.ApplicationEventArgs e)
{
if (e.Exception != null)
{
Console.WriteLine(e.Exception.Message);
}
Console.WriteLine("Press enter to finish!");
Console.ReadLine();
}
/// Processes the arguments given to the application and according
/// to the parameters read runs the application or shows the help.
/// <param name="args">program arguments</param>
static void ProcessArgs(string[] args)
{
for (int i = 0; i < args.Length; i++)
{
switch (args[i])
{
case "-c":
i++;
configurationFileLocation = args[i];
break;
default:
break;
}
}
}
/// Initializes the workspace
static void SetupWorkspace()
{
Configuration conf = Configuration. GetConfiguration(
Program.configurationFileLocation);
Program.configuration = conf;
}
}
/// Class ImageFilterMapper. Mapper implementation for the ImageFilter
/// application. The Map method reads the source images and performs the
/// required filtering. The output of the Map function is the filtered image.
public class ImageFilterMapper : Mapper<string, BytesWriteable>
{
/// The Map function receives as input the name of the image and its
/// contents. The filtering is then performed on the contents before
/// writing the results back to the storage.
/// <param name="input">A key-value pair representing the name of the
/// file and its contents.</param>
protected override void Map(IMapInput<string, BytesWriteable> input)
{
byte[] image = input.Value.GetBytes();
// Put your image filtering algorithm here
// ...
// ...
Emit(input.Key, image);
}
}
/// Class ImageFilterReducer. Reducer implementation for the ImageFilter
/// application. The Reducer is an identity function which does no processing.
public class ImageFilterReducer : Reducer<string, BytesWriteable>
{
/// The Reduce function is an identity function which does no further
/// processing on the contents.
protected override void Reduce(IReduceInputEnumerator<BytesWriteable> input)
{
// This is an identity function. No additional processing is required.
}
}
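The Map method above leaves the actual filter as a placeholder. Purely as an illustration of one
possible filtering algorithm (separate from the C# Aneka program, so this is only a sketch), a
simple 3x3 box blur over a grayscale image could look like this in Python/NumPy:

import numpy as np

def box_blur(image):
    """Return a 3x3 box-blurred copy of a 2-D grayscale image array."""
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + image.shape[0],
                          1 + dx : 1 + dx + image.shape[1]]
    return (out / 9.0).astype(image.dtype)

if __name__ == "__main__":
    img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
    print(box_blur(img))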
Once you have written and compiled your code, run your application by varying first the
input size and then the number of nodes (for example, 2, 4, 8, 16, ...). Plot a single graph of
execution time (y-axis) versus input size (x-axis) for the different sets of nodes used, so that
your final graph shows the difference in execution time for each set of nodes. Next, plot a
graph of speed-up (y-axis) versus input size (x-axis) for the different sets of nodes used.
Problem 6.14:
Developing a platform service such as Hadoop on various cloud infrastructures can be an
arduous task. Below we break this task down into three categories: building the VM, instantiating
VMs, and setting up Hadoop.
Building a VM:
a) Eucalyptus: The ideal way to build a Hadoop VM on Eucalyptus 2.0 is to start with a
pre-prepared base image and package it into your own EMI. You can find a starter image at
http://open.eucalyptus.com/wiki/starter-emis. Once a starter image is selected, it is
unzipped and mounted as a filesystem, and the Hadoop installation packages can be
unzipped into a desired installation path (recommended /opt). After the image is properly
prepared, it is bundled, uploaded, and registered using the euca-bundle, euca-upload, and
euca-register commands described at
http://open.eucalyptus.com/wiki/EucalyptusImageManagement_v2.0.
b) Nimbus: Select the precompiled Hadoop cluster available at the Nimbus Marketplace
http://scienceclouds.org/marketplace/ and add it to the given Nimbus cloud being used, if
not already available.
c) OpenStack: Similar to Eucalyptus, select a base image, either from the Eucalyptus
precompiled images or from the Ubuntu UEC images at http://uec-images.ubuntu.com/releases/.
Once a starter image is selected, it is unzipped and mounted as a filesystem, and the
Hadoop installation packages can be unzipped into a desired installation path
(recommended /opt). After the image is properly prepared, it is bundled,
uploaded, and registered using the euca-bundle, euca-upload and euca-register
commands described at
http://open.eucalyptus.com/wiki/EucalyptusImageManagement_v2.0.
Instantiate VMs:
a) Eucalyptus: Using the euca2ools commands and assuming the user has the
appropriate credentials and keypairs created, call euca-run-instances with the
predefined EMI number retrieved from the previous step. Alternatively, use the boto2
library in Python to create your own startup script (see the sketch after this list).
b) Nimbus: Assuming the necessary credentials are in place, start the Hadoop image by
using the bin/cloud-client.sh --run command and specifying the image name in the
--name attribute.
c) OpenStack: Using the euca2ools commands and assuming the user has the appropriate
credentials and keypairs created, call euca-run-instances with the predefined EMI
number retrieved from the previous step. Alternatively, use the boto2 library in Python to
create your own startup script.
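The following is a minimal boto (boto2) sketch of the "instantiate VMs" step for Eucalyptus or
OpenStack, as mentioned in item (a) above; the endpoint, port, path, EMI id, keypair, and
instance count are placeholders you would replace with your own values.

from boto.ec2.regioninfo import RegionInfo
from boto.ec2.connection import EC2Connection

region = RegionInfo(name="eucalyptus", endpoint="your.cloud.controller.example.org")
conn = EC2Connection(aws_access_key_id="YOUR_ACCESS_KEY",
                     aws_secret_access_key="YOUR_SECRET_KEY",
                     is_secure=False,
                     region=region,
                     port=8773,
                     path="/services/Eucalyptus")   # "/services/Cloud" on some installations

# Launch several instances of the Hadoop image built in the previous step.
reservation = conn.run_instances(image_id="emi-xxxxxxxx",
                                 min_count=4,
                                 max_count=4,
                                 key_name="mykey",
                                 instance_type="m1.small")
for instance in reservation.instances:
    print(instance.id, instance.state)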
Setup Hadoop:
Once a number of VMs have been instantiated and are in the "running" state, select one as the
master Hadoop node and designate the others as slave nodes. For each node, set the proper
configuration in /etc/hosts, make changes to Hadoop’s configuration files as described at
https://portal.futuregrid.org/salsahadoop-futuregrid-cloud-eucalyptus#Configuration . Once
ready, you can start Hadoop on each VM with the bin/start-all.sh command and test using Lynx
and connecting to the master node’s MapReduce and HDFS services (lynx 10.0.2.131:9001
and lynx 10.0.2.131:9003).
Run WordCount:
Once the Hadoop HDFS and MapReduce services are running properly, run the WordCount
program described at
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Example%3A+WordCoun
t+v1.0.
Problem 6.15:
Examine the tutorials at http://www.salsahpc.org and
http://www.iterativemapreduce.org/samples.html. Compare Hadoop and Twister on the cases
specified by the instructor from the examples given there, and discuss their relative strengths
and weaknesses. We select the KMeansClustering application to compare Hadoop and Twister.
KMeans Clustering
Twister strengths
a) Data caching: Twister supports in-memory caching of loop-invariant input data
(the KMeans input data points) across iterations, eliminating the overhead of retrieving and
parsing the data in each iteration. Hadoop does not support caching of input data and
has to read and parse the data from disk (or from another node in the case of a non-data-local
map task) in each iteration, adding a significant overhead to the computation.
b) Iterative extensions: The Twister programming model contains a combiner step (after the
reduce step) to merge the reduce outputs (the new centroids) and supports data
broadcasting at the beginning of an iteration. Hadoop does not support data
broadcasting or providing broadcast data (the KMeans centroids) as an input to the map
tasks. Users have to use an auxiliary mechanism (e.g., the distributed cache) to
broadcast and receive the centroid data. Users also have to manually merge the new
centroids in the driver program.
c) Intermediate data communication: Twister performs streaming intermediate data
transfers directly to the reducers using messaging or TCP. Hadoop first writes the
intermediate data to disk before transferring it, adding a significant performance
overhead, as KMeansClustering performs a significant amount of intermediate data
transfer.
Hadoop strengths
d) Fault tolerance: Hadoop supports fine-grained, task-level fault tolerance, where it
re-executes failed tasks to recover the computation. Hadoop also supports duplicate
execution of slow tasks to avoid the tail of slow tasks. Twister supports fault tolerance
only at the iteration level: if a task fails, the whole iteration needs to be re-executed.
e) Load balancing: Hadoop performs global-queue-based dynamic scheduling, resulting in
natural load balancing of the computations. Hadoop also supports multiple waves
of map tasks per iteration, resulting in better load balancing and offsetting some of the
intermediate data communication costs (overlapping communication with computation).
Twister only supports static scheduling and does not support multiple waves of map tasks.
f) Monitoring: Hadoop provides a web-based monitoring UI, where the user can monitor
the progress of the computations. Twister only provides command-line monitoring
output.
Items (a) and (b) apply only to iterative MapReduce applications such as KMeansClustering and
PageRank. The others apply to classic MapReduce and pleasingly parallel applications as well.
A minimal sketch of one KMeans MapReduce iteration is given below to make these points concrete.
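The following tiny, framework-free Python sketch shows one KMeans MapReduce iteration, to make
concrete what gets broadcast (the centroids) and what can be cached (the input points) in the
comparison above; it is not Twister or Hadoop code, and the data and cluster count are made up.

import numpy as np
from collections import defaultdict

def kmeans_map(points, centroids):
    """Map: assign each cached input point to its nearest broadcast centroid."""
    for p in points:
        nearest = int(np.argmin([np.linalg.norm(p - c) for c in centroids]))
        yield nearest, p

def kmeans_reduce(pairs):
    """Reduce/combine: average the points of each cluster to get the new centroids."""
    groups = defaultdict(list)
    for cid, p in pairs:
        groups[cid].append(p)
    return {cid: np.mean(ps, axis=0) for cid, ps in groups.items()}

if __name__ == "__main__":
    pts = [np.random.rand(2) for _ in range(100)]      # loop-invariant input data
    centroids = [np.random.rand(2) for _ in range(3)]  # broadcast at each iteration
    for _ in range(10):                                # the iterative driver loop
        new = kmeans_reduce(kmeans_map(pts, centroids))
        centroids = [new.get(i, centroids[i]) for i in range(3)]
    print(centroids)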
Problem 6.16:
The given program, WebVisCounter, is written in Hadoop. Readers are encouraged
to trace through the program or test run it on a cloud platform they have access to. Analyze the
programming tasks performed by this Hadoop program and learn from its use of the Hadoop library.
Refer to the tutorials at http://hadoop.apache.org/common/docs/r1.0.1/ on how to set up and run
Hadoop. Refer to the answer for question 6.15 for instructions on running Hadoop on cloud
environments.
Problem 6.17:
Twister K-means extends the MapReduce programming model iteratively. Many data
analysis techniques require iterative computations. For example, K-means clustering is an
application where multiple iterations of MapReduce computation are necessary for the overall
result. Twister is an enhanced MapReduce runtime that supports iterative MapReduce
computations efficiently. In this assignment you will learn the iterative MapReduce programming
model and how to implement the K-means algorithm with Twister.
Please learn how to use Twister from the Twister web page; here is a helpful link:
http://salsahpc.indiana.edu/ICPAD/twister_kmeans_user_guide.htm
Problem 6.18:
PageRank is a well-known link analysis algorithm. It assigns a numerical value
to each element of a hyperlinked set of web pages, reflecting the probability that a
random surfer will access that page. Implementing PageRank with MapReduce is somewhat
difficult, in both efficiency and programmability, due to the random access model over a
large-scale web graph. DryadLINQ provides SQL-like query APIs that help programmers implement
PageRank without much effort. Besides, the Dryad infrastructure helps scale the application
out in an easy way. This assignment will help you learn how to implement a simple PageRank
application with DryadLINQ.
We provide scratch code of DryadLINQ PageRank that can be compiled successfully. You
will learn the PageRank algorithm and how to implement PageRank with the DryadLINQ API.
PageRank algorithm
PageRank is a well-known link analysis algorithm. It assigns a numerical value to each element
of a hyperlinked set of web pages, reflecting the probability that a random surfer will
access that page. In mathematical terms, the PageRank process can be understood as a
Markov chain, which requires recursive calculation to converge. The PageRank formula is given below:

PR(A) = (1 - d)/N + d * (PR(T1)/C(T1) + PR(T2)/C(T2) + ... + PR(Tn)/C(Tn))

This equation calculates the PageRank value for any page A, where T1, ..., Tn are the pages that
link to A and C(Ti) is the number of outbound links of page Ti. The updated rank value of page A
is the sum of each adjacent page's own rank value divided by the number of outbound links of
that page, weighted by the damping factor d, which represents the probability that a person will
continue browsing by following the links in the current web page. The damping factor is
subtracted from 1 and the result is divided by the number of web pages (N) in the collection;
this term (1-d)/N is then added to the updated rank value of page A. The damping factor is set
to 0.85 in this assignment.
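As a tiny, language-agnostic sketch of the update formula above (not DryadLINQ code), the
following pure-Python function iterates PageRank over a made-up example graph:

def pagerank(links, d=0.85, iterations=10):
    """links: dict mapping each page to the list of pages it links to."""
    n = len(links)
    rank = {p: 1.0 / n for p in links}
    for _ in range(iterations):
        new_rank = {p: (1.0 - d) / n for p in links}
        for p, outs in links.items():
            share = d * rank[p] / len(outs)      # this page's contribution per out-link
            for q in outs:
                new_rank[q] += share
        rank = new_rank
    return rank

print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))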
DryadLINQ Implementation
DryadLINQ is a compiler that translates LINQ programs into distributed computations.
LINQ is an extension to .NET, launched with Visual Studio 2008, which provides declarative
programming for data manipulation. With DryadLINQ, the programmer does not need to have
much knowledge about parallel or distributed computation; thus any LINQ programmer turns
instantly into a cluster-computing programmer.
The PageRank algorithm requires multiple iterations during the overall computation. One
iteration of the PageRank computation consists of two job steps: 1) join the rank values table and
the linkage table to generate the partial rank values; 2) aggregate the partial rank values for each
unique web page. A driver program keeps looping over the join job and the aggregate job until a
stop condition is reached, e.g., the number of rounds has exceeded a threshold, or the total
difference of all rank values between two iterations is less than a predefined threshold.
In DryadLINQ PageRank we use “IQueryable<Page> pages” to store the linkage table,
and the “IQueryable<Vertex> rankValues” to store the rank values table. The linkage table is
built from the adjacency matrix of web graph. All the adjacency matrix input files are defined in
the partition table “cwpartition.pt”. The rank values are updated by using a Join of the current
“rankValues” with the “pages” object. The output of the Join is a list of <dest, value> pairs that
contain the partial rank values. We can aggregate those partial results by using a “GroupBy” on
the first element of the <dest, value> tuple. Then the partial rank values of each webpage are
accumulated, forming the new rank values for the next iteration.
Sample Code
Here is sample code of DryadLINQ PageRank. We use the formula shown above to
calculate the new rank values in each iteration.

public void RunPageRank()
{
    // string ptPath = @"file://MADRID-HEADNODE\DryadData\Hui\PageRank\cwpartition.pt";
    PartitionedTable<LineRecord> table = PartitionedTable.Get<LineRecord>(ptPath);
    IQueryable<Page> pages = table.Select(lr => buildPage(lr.line));

    Vertex[] ver = new Vertex[numPages];
    double initialRank = 1.0 / numPages;
    for (int i = 0; i < numPages; i++)
    {
        ver[i].source = i + 1;
        ver[i].value = initialRank;
    }
    IQueryable<Vertex> rankValues = ver.ToPartitionedTable("rankValues.pt");
    IQueryable<Vertex> newRankValues = null;

    for (int i = 0; i < 10; i++)
    {
        newRankValues = pages.Join(rankValues,
                page => page.source,
                vertex => vertex.source,
                (page, vertex) => page.links.Select(dest =>
                    new Vertex(dest, vertex.value / page.numLinks)))
            .SelectMany(list => list)
            .GroupBy(vertex => vertex.source)
            .Select(group => new Vertex(group.Key,
                group.Select(vertex => vertex.value).Sum() / numPages * 0.85 + 0.15 / numPages));
        rankValues = newRankValues;
        Console.WriteLine(" pagerank iteration no:" + i);
    }
    SaveResults(rankValues);
}
Problem 6.19:
The following program illustrates the use of Aneka’s Thread Programming Model for
matrix multiplication. The program takes as inputs two square matrices. Each AnekaThread
instance is a row-column multiplier, that is, a row from the first matrix is multiplied with the
corresponding row from the second matrix to produce the resulting cell for the final matrix. Each
of these row-column computations is performed independently on a Worker node. The results of
the computations are then put together by the client application.
/// Class <i><b>MatrixMultiplier</b></i>. Multiplies two square matrices, where
/// each element in the resulting matrix, C, is computed by multiplying the
/// corresponding row and column vectors of matrices A and B. Each computation is
/// carried out by a distinct instance of AnekaThread; multiplying two square
/// matrices of dimension n thus requires n*n AnekaThread instances.
public class MatrixMultiplier
{
/// The application configuration
private Configuration configuration;
/// Creates an instance of MatrixMultiplier
/// <param name="schedulerUri">The uri to the Aneka scheduler</param>
public MatrixMultiplier(Uri schedulerUri)
{
configuration = new Configuration();
configuration.SchedulerUri = schedulerUri;
}
Exploring the Variety of Random
Documents with Different Content
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the Project
Gutenberg™ License.
1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if you
provide access to or distribute copies of a Project Gutenberg™ work
in a format other than “Plain Vanilla ASCII” or other format used in
the official version posted on the official Project Gutenberg™ website
(www.gutenberg.org), you must, at no additional cost, fee or
expense to the user, provide a copy, a means of exporting a copy, or
a means of obtaining a copy upon request, of the work in its original
“Plain Vanilla ASCII” or other form. Any alternate format must
include the full Project Gutenberg™ License as specified in
paragraph 1.E.1.
1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™ works
unless you comply with paragraph 1.E.8 or 1.E.9.
1.E.8. You may charge a reasonable fee for copies of or providing
access to or distributing Project Gutenberg™ electronic works
provided that:
• You pay a royalty fee of 20% of the gross profits you derive
from the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”
• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of receipt
that s/he does not agree to the terms of the full Project
Gutenberg™ License. You must require such a user to return or
destroy all copies of the works possessed in a physical medium
and discontinue all use of and all access to other copies of
Project Gutenberg™ works.
• You provide, in accordance with paragraph 1.F.3, a full refund of
any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.
• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.
1.E.9. If you wish to charge a fee or distribute a Project Gutenberg™
electronic work or group of works on different terms than are set
forth in this agreement, you must obtain permission in writing from
the Project Gutenberg Literary Archive Foundation, the manager of
the Project Gutenberg™ trademark. Contact the Foundation as set
forth in Section 3 below.
1.F.
1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on, transcribe
and proofread works not protected by U.S. copyright law in creating
the Project Gutenberg™ collection. Despite these efforts, Project
Gutenberg™ electronic works, and the medium on which they may
be stored, may contain “Defects,” such as, but not limited to,
incomplete, inaccurate or corrupt data, transcription errors, a
copyright or other intellectual property infringement, a defective or
damaged disk or other medium, a computer virus, or computer
codes that damage or cannot be read by your equipment.
1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except for
the “Right of Replacement or Refund” described in paragraph 1.F.3,
the Project Gutenberg Literary Archive Foundation, the owner of the
Project Gutenberg™ trademark, and any other party distributing a
Project Gutenberg™ electronic work under this agreement, disclaim
all liability to you for damages, costs and expenses, including legal
fees. YOU AGREE THAT YOU HAVE NO REMEDIES FOR
NEGLIGENCE, STRICT LIABILITY, BREACH OF WARRANTY OR
BREACH OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH
1.F.3. YOU AGREE THAT THE FOUNDATION, THE TRADEMARK
OWNER, AND ANY DISTRIBUTOR UNDER THIS AGREEMENT WILL
NOT BE LIABLE TO YOU FOR ACTUAL, DIRECT, INDIRECT,
CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES EVEN IF
YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.
1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you
discover a defect in this electronic work within 90 days of receiving
it, you can receive a refund of the money (if any) you paid for it by
sending a written explanation to the person you received the work
from. If you received the work on a physical medium, you must
return the medium with your written explanation. The person or
entity that provided you with the defective work may elect to provide
a replacement copy in lieu of a refund. If you received the work
electronically, the person or entity providing it to you may choose to
give you a second opportunity to receive the work electronically in
lieu of a refund. If the second copy is also defective, you may
demand a refund in writing without further opportunities to fix the
problem.
1.F.4. Except for the limited right of replacement or refund set forth
in paragraph 1.F.3, this work is provided to you ‘AS-IS’, WITH NO
OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.
1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of damages.
If any disclaimer or limitation set forth in this agreement violates the
attachments associated with the contact. To do this, the application offers three URLs that are called by the JavaScript running in the browser to obtain the data: search.json, messages.json and files.json. The system responds to a request for the message history of a given contact by calling /messages.json, which accepts an email address as a GET parameter. Note that this functionality requires an authentication step not shown here. The code behind that call is as follows:

    class MessagesHandler(webapp.RequestHandler):
        def get(self):
            current_user = users.get_current_user()
            current_email = current_user.email()
            emailAddr = self.request.get('email')
            contextIO = ContextIO(api_key=settings.CONTEXTIO_OAUTH_KEY,
                                  api_secret=settings.CONTEXTIO_OAUTH_SECRET,
                                  api_url=settings.CONTEXTIO_API_URL)
            response = contextIO.contactmessages(emailAddr, account=current_email)
            self.response.out.write(simplejson.dumps(response.get_data()))

The code simply uses the contactmessages.json API call of Context.IO and returns all the messages, including the subject, other recipients, thread ID, and even attachments, in JSON format. The complete code for this demo application has been made available by the Context.IO team on our GitHub account (https://github.com/contextio/AppEngineDemo). This answer is based on the Google App Engine blog post at http://googleappengine.blogspot.com/2011/05/accessing-gmail-accounts-from-app.html.

(b). The dashboard of Google App Engine provides measurements of useful aspects of the deployed application, for example execution logs, version control, quota details, the datastore viewer, and administration tools. It also provides detailed resource usage information.
Critical measurements can easily be retrieved from this powerful dashboard.

(c). Automatic scaling is built into App Engine, and it is not visible to users. See http://code.google.com/appengine/whyappengine.html#scale.
Problem 6.2:
Get the source code: http://dl.dropbox.com/u/12951553/bookanswers/answer6.2.zip

Here we design a very simple data storage system using the Blobstore service to illustrate how Google App Engine handles data. The Blobstore API allows your application to serve data objects, called blobs, that are much larger than the size allowed for objects in the Datastore service. Blobs are useful for serving large files, such as video or image files, and for allowing users to upload large data files. Blobs are created by uploading a file through an HTTP request. Typically, your applications will do this by presenting a form with a file upload field to the user. When the form is submitted, the Blobstore creates a blob from the file's contents and returns an opaque reference to the blob, called a blob key, which you can later use to serve the blob. The application can serve the complete blob value in response to a user request, or it can read the value directly using a streaming, file-like interface.

This system includes the following functions: user login, data listing, and data upload/download. Gzip compression is used when possible to decrease the cost.

User login: This function is implemented using the User Service provided in GAE. If the user is already signed in to your application, get_current_user() returns the User object for the user; otherwise, it returns None. If the user has signed in, display a personalized message using the nickname associated with the user's account. If the user has not signed in, tell webapp to redirect the user's browser to the Google account sign-in screen. The redirect includes the URL of this page (self.request.uri) so the Google account sign-in mechanism will send the user back here after the user has signed in or registered for a new account.

    user = users.get_current_user()
    if user:
        self.response.headers['Content-Encoding'] = 'gzip'
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('Hello, ' + user.nickname())
        self.response.out.write('<a href=' + users.create_logout_url("/") + '>sign out</a><br/>')
    else:
        self.redirect(users.create_login_url(self.request.uri))

The content is gzip-compressed when sent back from the server. A log-out link is also provided.

Data listing: To list the data uploaded by a specific user, GQL is used to guarantee that users can only see and access data that belongs to them.

    class Blob(db.Model):
        """Models a data entry with a user, content, name, size, and date."""
        user = db.UserProperty()
        name = db.StringProperty(multiline=True)
        content = blobstore.BlobReferenceProperty(blobstore.BlobKey)
        date = db.DateTimeProperty(auto_now_add=True)
        size = db.IntegerProperty()
This defines a data blob class with five properties: user, whose value is a User object; name, whose value is a String; content, whose value is a BlobKey pointing to this blob; date, whose value is a datetime.datetime; and size, whose value is an Integer.

GQL, a SQL-like query language, provides access to the App Engine datastore query engine's features using a familiar syntax. The query happens here:

    blobs = db.GqlQuery("SELECT * "
                        "FROM Blob "
                        "WHERE user = :1", user)

This returns all blobs uploaded by this user.

Data upload: To create and upload a blob, follow this procedure. Call blobstore.create_upload_url() to create an upload URL for the form that the user will fill out, passing the application path to load when the POST of the form is completed:

    upload_url = blobstore.create_upload_url('/upload')

There is an asynchronous version, create_upload_url_async(). It allows your application code to continue running while Blobstore generates the upload URL.

The form must include a file upload field, and the form's enctype must be set to multipart/form-data. When the user submits the form, the POST is handled by the Blobstore API, which creates the blob. The API creates an info record for the blob, stores the record in the datastore, and passes the rewritten request to your application on a given path as a blob key:

    self.response.out.write('<html><body>')
    self.response.out.write('<form action="%s" method="POST" enctype="multipart/form-data">' % upload_url)
    self.response.out.write("""Upload File: <input type="file" name="file"><br>
        <input type="submit" name="submit" value="Submit">
        </form></body></html>""")

In this handler, you can store the blob key with the rest of your application's data model. The blob key itself remains accessible from the blob info entity in the datastore. Note that after the user submits the form and your handler is called, the blob has already been saved and the blob info added to the datastore. If your application doesn't want to keep the blob, you should delete the blob immediately to prevent it from becoming orphaned:

    class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
        def post(self):
            try:
                upload_files = self.get_uploads('file')  # 'file' is the file upload field in the form
                blob_info = upload_files[0]
                myblob = Blob()
                myblob.name = blob_info.filename
                myblob.size = blob_info.size
                myblob.user = users.get_current_user()
                myblob.content = blob_info.key()
                myblob.put()
                self.redirect('/')
            except:
                self.redirect('/')
The webapp framework provides the blobstore_handlers.BlobstoreUploadHandler upload handler class to help you parse the form data. For more information, see the reference for BlobstoreUploadHandler.

When the Blobstore rewrites the user's request, the MIME parts of the uploaded files have their bodies emptied, and the blob key is added as a MIME part header. All other form fields and parts are preserved and passed to the upload handler. If you don't specify a content type, the Blobstore will try to infer it from the file extension. If no content type can be determined, the newly created blob is assigned the content type application/octet-stream.

Data download: To serve blobs, you must include a blob download handler as a path in your application. The application serves a blob by setting a header on the outgoing response. The following sample uses the webapp framework. When using webapp, the handler should pass the blob key for the desired blob to self.send_blob(). In this example, the blob key is passed to the download handler as part of the URL. The download handler can get the blob key by any means you choose, such as through another method or user action.

    class ServeHandler(blobstore_handlers.BlobstoreDownloadHandler):
        def get(self, resource):
            resource = str(urllib.unquote(resource))
            blob_info = blobstore.BlobInfo.get(resource)
            self.send_blob(blob_info)

The webapp framework provides the download handler class blobstore_handlers.BlobstoreDownloadHandler to help you parse the form data. For more information, see the reference for BlobstoreDownloadHandler. Blobs can be served from any application URL. To serve a blob in your application, you put a special header in the response containing the blob key. App Engine replaces the body of the response with the content of the blob.

Problem 6.3:
Source code: http://dl.dropbox.com/u/12951553/bookanswers/answer6.3.zip

For this question, we provide a Java SimpleDB application with all the critical functions: domain creation, data insertion, data editing, data deletion, and domain deletion. These functions demonstrate how to make basic requests to Amazon SimpleDB using the AWS SDK for Java. The reader can easily scale this application up to meet the requirements of the question.

Prerequisites: You must have a valid Amazon Web Services developer account, and be signed up to use Amazon SimpleDB. For more information on Amazon SimpleDB, please refer to http://aws.amazon.com/simpledb and http://aws.amazon.com/security-credentials.
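As a rough illustration of the kind of requests the answer describes, the following is a minimal, hypothetical sketch of those operations using the AWS SDK for Java (v1). The domain name, item name, attribute names, and the way credentials are supplied are illustrative assumptions, not the code in the linked archive.

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.simpledb.AmazonSimpleDB;
    import com.amazonaws.services.simpledb.AmazonSimpleDBClient;
    import com.amazonaws.services.simpledb.model.*;

    public class SimpleDBSketch {
        public static void main(String[] args) {
            // Illustrative credentials; in practice load them from a properties file.
            AmazonSimpleDB sdb = new AmazonSimpleDBClient(
                    new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

            // Domain creation
            sdb.createDomain(new CreateDomainRequest("MyStore"));

            // Data insertion: put one item with two attributes
            sdb.putAttributes(new PutAttributesRequest()
                    .withDomainName("MyStore")
                    .withItemName("item001")
                    .withAttributes(
                            new ReplaceableAttribute("Category", "Book", true),
                            new ReplaceableAttribute("Price", "25", true)));

            // Data editing: replace the value of an existing attribute
            sdb.putAttributes(new PutAttributesRequest()
                    .withDomainName("MyStore")
                    .withItemName("item001")
                    .withAttributes(new ReplaceableAttribute("Price", "20", true)));

            // Query the domain with a SELECT expression
            SelectResult result = sdb.select(
                    new SelectRequest("select * from `MyStore` where Category = 'Book'"));
            for (Item item : result.getItems()) {
                System.out.println("Found item: " + item.getName());
            }

            // Data deletion and domain deletion
            sdb.deleteAttributes(new DeleteAttributesRequest("MyStore", "item001"));
            sdb.deleteDomain(new DeleteDomainRequest("MyStore"));
        }
    }

Scaling the application up, as the problem asks, mostly means generating many items and batching the puts; the request/response pattern stays the same.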
Problem 6.4:
Now, design and request an EC2 configuration on the AWS platform for parallel multiplication of two very large matrices with an order exceeding 50,000.

Source code: http://156.56.93.128/PBMS/doc/answer6.4.zip

The parallel matrix multiplication is implemented using Hadoop 0.20.205, and experiments are performed on the Amazon EC2 platform with sample matrices between orders of 20,000 and 50,000. The steps to implement parallel matrix multiplication using Hadoop are as follows:
1) Split matrix A and matrix B into two n*n grids of blocked sub-matrices. There will be 2*n*n Map tasks and n*n Reduce tasks.
2) Each Map task holds either A[p][q] or B[p][q] and sends it to the n Reduce tasks r[p][1<i<n] or r[1<j<n][q], respectively (a sketch of this forwarding step is given at the end of this answer).
3) Each Reduce task r[p][q] receives 2*n sub-matrices, which include A[p][1<i<n] and B[q][1<j<n], from the Map tasks; the Reduce task then multiplies A[p][1<i<n] by B[q][1<j<n] and sums the products.

The advantages of this algorithm are: 1) splitting the large matrices into small sub-matrices so that the working set of sub-matrices fits in the memory of a small EC2 instance; 2) many small tasks increase the application's parallelism. The disadvantages include the parallel overhead in terms of scheduling, communication, and sorting caused by the many tasks.

EC2 configuration: In the experiments, we use instance type EMR M1.small (1.7 GB memory, 1 core per node). We created instance groups with 1, 2, 4, 8, and 16 nodes, respectively. Note that the Hadoop jobtracker and namenode take one node for dedicated usage in the 2-, 4-, 8-, and 16-node cases.

Steps:
a. ./elastic-mapreduce --create --instance-count 16 --alive (apply for resources)
b. ./elastic-mapreduce --jobflow j-22ZM5UUKIK69O --ssh (ssh to the master node)
c. ./s3cmd get s3://wc-jar/matrix-multiply-hadoop.jar (download the program jar file)
d. ./s3cmd get s3://wc-input/matrix-50k-5k ./50k-5k (download the input data)
e. hadoop dfs -put 50k-5k/* 50k-5k (upload the data to HDFS)
f. hadoop jar matrix-multiply-hadoop.jar 50k-5k output 50000 5000 10 (run the program)

Analysis: Figures 1-4 show that our parallel matrix multiply implementation scales well on EC2, especially for large matrices. For example, the relative speed-ups for processing the 20k, 30k, 40k, and 50k data are 4.43, 7.75, 9.67, and 11.58, respectively, when using 16 nodes. The larger the matrices are, the better the parallel efficiency the application achieves. (The reason why performance using two nodes is only a little faster than the one-node case is that the jobtracker and tasktracker were run on separate nodes.)

Other issues in the experiments: Storage utilization: the data sizes are 16 GB + 36 GB + 64 GB + 100 GB for the 20k, 30k, 40k, and 50k data sets, respectively, or 216 GB of data in total. The total costs for the experiments are: input data transfer in, $0.1 * 216 GB = $21.6; EC2 instances, M1.small, 290 hours * $0.08/hour = $23.2. System metrics, such as resource utilization, can be monitored using "CloudWatch" in the AWS Management Console. For fault tolerance, see the answer to Problem 4.10.
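The following is a minimal, hypothetical sketch of the Map-side forwarding described in step 2 above, written against the Hadoop MapReduce API. The record layout ("A p q <block>"), the configuration key grid.n, and the class name are illustrative assumptions rather than the actual code in the linked archive.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Forwards each sub-matrix block to every reducer that needs it:
    // A[p][q] goes to reducers r[p][*], and B[p][q] goes to reducers r[*][q].
    public class BlockForwardMapper extends Mapper<LongWritable, Text, Text, Text> {
        private int n;  // dimension of the reducer grid

        @Override
        protected void setup(Context context) {
            n = context.getConfiguration().getInt("grid.n", 1);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed record layout: "<matrix> <p> <q> <serialized block>"
            String[] parts = value.toString().split(" ", 4);
            String matrix = parts[0];
            int p = Integer.parseInt(parts[1]);
            int q = Integer.parseInt(parts[2]);
            String block = parts[3];

            if (matrix.equals("A")) {
                for (int j = 0; j < n; j++) {   // every reducer in row p
                    context.write(new Text(p + "," + j), new Text("A," + q + "," + block));
                }
            } else {
                for (int i = 0; i < n; i++) {   // every reducer in column q
                    context.write(new Text(i + "," + q), new Text("B," + p + "," + block));
                }
            }
        }
    }

Each reducer keyed by "p,q" can then pair the A block and the B block that share the same inner index, multiply them, and accumulate the partial products into its output block.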
Experiment results:

Figure 1: Parallel matrix multiply for 20K. Figure 2: Parallel matrix multiply for 30K.
Figure 3: Parallel matrix multiply for 40K. Figure 4: Parallel matrix multiply for 50K.

Problem 6.5:
We implemented the parallel matrix multiply application using EMR and S3 on the AWS platform. The basic algorithm and configuration are the same as in Problem 6.4. The only difference is that here Hadoop retrieves the input data from S3 rather than from HDFS as in Problem 6.4.

Analysis: Figures 1-4 show that the parallel matrix multiply scales well in the EMR/S3 environment, especially for large matrices. The relative speed-ups for processing the 20k, 30k, 40k, and 50k data are 7.24, 12.3, 16.4, and 19.39, respectively, when using 16 nodes. The super-linear speedup results were mainly caused by serious network contention when using a single node to retrieve input data from S3. Compared to the results using HDFS in Problem 6.4, the results for the 20k, 30k, 40k, and 50k data sets using S3 on 16 nodes are 1.3, 1.65, 1.67, and 1.66 times slower in job turnaround time, respectively. The results using fewer nodes are even slower; for example, the result for the 50k data using S3 on 2 nodes is 2.19 times slower than the HDFS case. These results indicate the large overhead incurred when Hadoop retrieves input data from S3. Figure 5 shows that the average speed of transferring data from S3 to an EC2 instance is 8.54 MB/sec. For the detailed algorithm, configuration, and analysis of other issues such as speedup and cost-efficiency, see the answers to Problem 6.4.

Performance results:
Figure 1: Parallel matrix multiply for 20K. Figure 2: Parallel matrix multiply for 30K.
Figure 3: Parallel matrix multiply for 40K. Figure 4: Parallel matrix multiply for 50K.
Figure 5: S3 data transfer speed.

Problem 6.6:
Outline of Eli Lilly cloud usage. Eli Lilly uses cloud computing in the research arm of the company. In silico analyses are a large part of the research process for the pharmaceutical industry, and Eli Lilly is no exception. Cloud computing gives Lilly bursting capability when its internal compute environment is fully utilized. Additionally, Eli Lilly relies on cloud computing for analyses on public datasets, where there is little to no concern about intellectual property or security. By running these analyses outside of its primary data centers, the company can free up internal resources for high-performance computing and high-throughput computing workflows that either may not fit well in the cloud or involve analyses that are considered more proprietary or regulated.
As of 2009, Eli Lilly was mainly using the Amazon Web Services cloud, but it had plans to use many more cloud vendors in the future, requiring an orchestration layer between Eli Lilly and the various cloud services. According to Eli Lilly, a new server in AWS can be up and running in three minutes, compared to the seven and a half weeks it takes to deploy a server internally. A 64-node AWS Linux cluster can be online in five minutes, compared with the three months it takes to set up such a cluster internally. One of the main drivers for Lilly to use the cloud is to move development efforts through the drug pipeline more quickly. If analyses can be done in a fraction of the time because of the scale of the cloud, then thousands of dollars spent on utility computing to speed up the pipeline can generate millions of dollars of revenue in a shorter timeframe.

Sources:
http://www.informationweek.com/news/hardware/data_centers/228200755
http://www.informationweek.com/news/healthcare/clinical-systems/227400374
http://www.informationweek.com/cloud-computing/blog/archives/2009/01/whats_next_in_t.html

Problem 6.7:
The source code of this application can be obtained from the following link:
http://dl.dropbox.com/u/27392330/forCloudBook/AzureTableDemo-gaoxm.zip

Using the Azure SDK for Microsoft Visual Studio, we developed a simple web application as shown in the figure below. This application is extended from the Azure Table demo made by Nancy Strickland (http://www.itmentors.com/code/2011/03/AzureUpdates/Tables.zip). It can be used to demonstrate the use of Windows Azure Table and to run some simple performance tests of Windows Azure Table. A Web role is created for this application, which accesses the Windows Azure Table service from the Web server side. When the "Add Customer" button is clicked, a new entity is created and inserted into an Azure table. When the "Query Customer" button is clicked, the table is queried with the customer code and the customer's name is shown after "Name". When proper values are set in the "number of rows", "batch size", and "start rowkey" boxes, users can click the different "test" buttons to run different performance tests of Windows Azure Table. Besides the local version, we also deployed the application on a virtual machine in the Azure cloud.

Some experiences we gained from writing and deploying this application are:
1. The concepts and separation of "Web role", "VM role" and "Worker role" during development are not straightforward to understand, and it takes some time to learn how to develop Azure applications.
2. Users cannot remotely log in to VMs by default; it takes some special configuration. Besides, the security restrictions on VMs make it hard to operate the VMs. For example, almost all websites are marked as "untrusted" by IE in the VMs, which makes it very hard to even download something using the browser.
3. The SDK for Microsoft Visual Studio is powerful. The integration of the debugging and deployment stages in Visual Studio is very convenient and easy to use. However, the deployment process takes a long time, and it is hard to diagnose what is wrong if the deployment fails.
4. Overall, we think the Amazon EC2 models and Amazon Web Services are easier to understand and closer to developers' current experience.

Figure 4: A simple Windows Azure Web application using Azure Table.
Figure 5: Read and write speed for Windows Azure Table.

Problem 6.8:
In the MapReduce programming model, there is a special case that implements only the map phase, also known as a "map-only" problem. This approach lets an existing application or binary achieve high throughput by running many instances in parallel; in other words, it helps a standalone program exploit large-scale computing capability. The goal of this exercise is to write a Hadoop "map-only" program that wraps the bioinformatics application BLAST (NCBI BLAST+: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.23/) under a Linux/Unix environment.

Source code: http://dl.dropbox.com/u/12951553/bookanswers/feiteng_blast.zip
For detailed usage of the source code, please refer to http://salsahpc.indiana.edu/tutorial/hadoopblast.html.
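To illustrate the "map-only" pattern, here is a minimal, hypothetical Hadoop sketch that runs an external BLAST binary from each map task and disables the reduce phase. The binary path, its command-line flags, and the input record format (one query file name per line) are illustrative assumptions, not the code from the archive above.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MapOnlyBlast {

        public static class BlastMapper extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Each input line names one query file available on the local node.
                String queryFile = value.toString().trim();
                Process p = Runtime.getRuntime().exec(new String[] {
                        "/opt/blast/bin/blastx", "-query", queryFile,
                        "-db", "nr", "-out", queryFile + ".out" });
                int exitCode = p.waitFor();
                // Record which query was processed and whether the binary succeeded.
                context.write(new Text(queryFile), new Text("exit=" + exitCode));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "map-only-blast");
            job.setJarByClass(MapOnlyBlast.class);
            job.setMapperClass(BlastMapper.class);
            job.setNumReduceTasks(0);            // map-only: no shuffle, no reduce
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Setting the number of reduce tasks to zero is what makes this a map-only job: each mapper's output is written directly to HDFS without any sort or shuffle.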
Problem 6.9:
This problem is research-oriented. Visit the posted Manjrasoft Aneka software web site for details and example solutions.

Problem 6.10:
Repeat the applications in Problems 6.1 to 6.7 using the academic/open-source packages described in Section 6.6, namely Eucalyptus, Nimbus, OpenStack, OpenNebula, and Sector/Sphere. This software is all available on FutureGrid (http://www.futuregrid.org) with a number of tutorials:
FutureGrid Tutorials - https://portal.futuregrid.org/tutorials
Using Eucalyptus on FutureGrid - https://portal.futuregrid.org/tutorials/eucalyptus
Using Nimbus on FutureGrid - https://portal.futuregrid.org/tutorials/nimbus
Using OpenStack on FutureGrid - https://portal.futuregrid.org/tutorials/openstack
The answer to question 6.15 also provides an overview of using Hadoop on FutureGrid cloud environments.

Problem 6.11:
Test run the large-scale matrix multiplication program on two or three cloud platforms (GAE, AWS, and Azure). You can also choose another data-intensive application, such as a large-scale search or a business processing application involving the masses from the general public. Implement the application on at least two or all three cloud platforms, separately. The major objective is to minimize the execution time of the application; the minor objective is to minimize the user service costs.
(a) Run the service on the Google GAE platform.
(b) Run the service on the Amazon AWS platform.
(c) Run the service on the Windows Azure platform.
(d) Compare your compute and storage costs, design experiences, and experimental results on all three cloud platforms. Report their relative performance and the QoS results measured.

Implementation: The implementations of the large-scale matrix multiplication program on AWS and Azure using Hadoop and MPI are given in this chapter. The solution using Hadoop on the Amazon AWS platform was discussed in Problems 6.4 and 6.5. Here we discuss the solution using MPI on the Azure HPC scheduler. A parallel matrix multiply algorithm, the Fox algorithm, was implemented using MS-MPI. We then created the host service and deployed a Windows HPC cluster on Azure using the Azure HPC Scheduler SDK tools. After that, we log on to the HPC cluster head node and submit the large-scale matrix multiplication job there.

Source code: http://156.56.93.128/PBMS/doc/answer6.14.zip

Steps:
1) Set up the Azure HPC SDK environment:
http://msdn.microsoft.com/en-us/library/windowsazure/hh545593.aspx
2) Configure and deploy the HPC cluster on Azure:
http://msdn.microsoft.com/en-us/library/hh560239(v=vs.85).aspx
3) Log on to the head node of the HPC cluster and copy the executable binary onto the head node.
4) Set up the execution environment and configure the firewall exception:
clusrun /nodegroup:computenode xcopy /E /Y \\HEADNODE1\approot\*.* F:\approot
clusrun /nodegroup:computenode hpcfwutil register FoxMatrix.exe F:\approot\FoxMatrix.exe
See http://msdn.microsoft.com/en-us/library/hh560242(v=vs.85).aspx.
5) Submit the MPI job to the HPC scheduler:
job submit /nodegroup:computenodes /numnodes:16 mpiexec -n 16 -wdir F:\approot F:\approot\FoxMatrix.exe 16000

Comparison: Both platforms (Amazon AWS and Azure) provide a graphical interface for users to deploy a Hadoop or HPC cluster, respectively. Developers can submit HPC jobs and Hadoop jobs to the dynamically deployed cluster either on the head node or from a client PC through the job submission API. With regard to performance, applications run on both Azure and EC2 show performance fluctuation. Figures 1 and 2 show that the maximum performance fluctuations of Hadoop using S3, Hadoop using HDFS, MPIAzure, and MPICluster are 8.1%, 1.9%, 5.3%, and 1.2%, respectively. Network bandwidth fluctuation is the main cause of the performance fluctuation of the Hadoop S3 implementation. The performance fluctuation of the MPIAzure implementation is due to the aggregated delay of MPI communication primitives caused by system noise in the guest OS in the cloud environment.

Figure 1: Performance fluctuation of Hadoop using HDFS and S3 for different problem sizes.
Figure 2: Performance fluctuation of MPIAzure and MPIHPC for different problem sizes.

Performance analysis: The performance analysis of parallel matrix multiplication on Amazon EC2 was discussed in Problem 6.4. This section analyzes only the performance of the MPIAzure implementation. Figure 3 shows that the speedup of the MPICluster implementation is 8.6%, 37.1%, and 19.3% higher than that of the MPIAzure implementation when using 4, 9, and 16 nodes, respectively. Again, the performance degradation of the MPIAzure implementation is due to the poor network performance in the cloud environment. Figure 4 shows the performance of the Fox algorithm for the three implementations using 16 compute nodes. As expected, MPIAzure is slower than MPICluster but faster than DryadCluster. Figures 5 and 6 show the parallel overhead versus 1/sqrt(n), where n refers to the number of matrix elements per node.
In Figure 6, the parallel overhead for the 5x5, 4x4, and 3x3 node cases is linear in 1/sqrt(n), which indicates that the Fox MS-MPI implementation scales well on our HPC cluster with its InfiniBand network. In Figure 5, the parallel overhead for the 3x3 and 4x4 node cases does not converge to the x-axis for large matrix sizes. The reason is the serious network contention that occurs in the cloud environment when running with large matrices.

Figure 3: Speedup versus number of nodes using MPIAzure and MPICluster.
Figure 4: Job time of the different runtimes on Azure and the HPC cluster for different problem sizes.
Figure 5: Parallel overhead vs. 1/sqrt(n) for Fox/MPIAzure/MKL on 3x3 and 4x4 nodes.
Figure 6: Parallel overhead vs. 1/sqrt(n) for Fox/MPICluster/MKL on 3x3 and 4x4 nodes.

Problem 6.12:
Comparison of the Google MapReduce, Apache Hadoop MapReduce, and Microsoft Dryad programming environments:

Programming environment: Google MapReduce | Apache Hadoop MapReduce | Microsoft Dryad
Coding language and programming model used: C++ (MapReduce) | Java (MapReduce) | C# with DryadLINQ (DAG dataflow)
Mechanisms for data handling: GFS (Google File System) | HDFS (Hadoop Distributed File System) | Shared directories and local disks
Failure handling methods: Re-execution of failed tasks and duplicate execution of slow tasks | Re-execution of failed tasks; duplicate execution of slow tasks | Re-execution of failed tasks; duplicate execution of slow tasks
High-level language for data analysis: Sawzall | Pig Latin, Hive | DryadLINQ
OS and cluster environment: Linux clusters | Linux clusters, Amazon Elastic MapReduce on EC2 | Windows HPCS cluster
Intermediate data transfer method: File transfer or HTTP links | File transfer or HTTP links | Files, TCP pipes, shared-memory FIFOs
Problem 6.13:
The following program illustrates a sample application for image filtering using Aneka's MapReduce programming model. Note that the actual image filtering depends on the problem domain, and you may use any algorithm you see fit.

    class Program
    {
        /// Reference to the configuration object.
        static Configuration configuration = null;
        /// Location of the configuration file.
        static string configurationFileLocation = "conf.xml";

        /// Processes the arguments given to the application and, according to the
        /// parameters read, runs the application or shows the help.
        /// <param name="args">program arguments</param>
        static void Main(string[] args)
        {
            try
            {
                // Process the arguments
                Program.ProcessArgs(args);
                Program.SetupWorkspace();
                // Configure the MapReduceApplication
                MapReduceApplication<ImageFilterMapper, ImageFilterReducer> application =
                    new MapReduceApplication<ImageFilterMapper, ImageFilterReducer>("ImageFilter", configuration);
                // Invoke and wait for the result
                application.InvokeAndWait(
                    new EventHandler<Aneka.Entity.ApplicationEventArgs>(OnApplicationFinished));
            }
            catch (Exception ex)
            {
                Console.WriteLine(" Message: {0}", ex.Message);
                Console.WriteLine("Application terminated unexpectedly.");
            }
        }

        /// Hooks the ApplicationFinished event and processes the results if the
        /// application has been successful.
        /// <param name="sender">event source</param>
        /// <param name="e">event information</param>
        static void OnApplicationFinished(object sender, Aneka.Entity.ApplicationEventArgs e)
        {
            if (e.Exception != null)
            {
                Console.WriteLine(e.Exception.Message);
            }
            Console.WriteLine("Press enter to finish!");
            Console.ReadLine();
        }

        /// Processes the arguments given to the application and, according to the
        /// parameters read, runs the application or shows the help.
        /// <param name="args">program arguments</param>
        static void ProcessArgs(string[] args)
        {
            for (int i = 0; i < args.Length; i++)
            {
                switch (args[i])
                {
                    case "-c":
                        i++;
                        configurationFileLocation = args[i];
                        break;
                    default:
                        break;
                }
            }
        }

        /// Initializes the workspace.
        static void SetupWorkspace()
        {
            Configuration conf = Configuration.GetConfiguration(Program.configurationFileLocation);
            Program.configuration = conf;
        }
    }

    /// Class ImageFilterMapper. Mapper implementation for the ImageFilter application.
    /// The Map method reads the source images and performs the required filtering.
    /// The output of the Map function is the filtered image.
    public class ImageFilterMapper : Mapper<string, BytesWriteable>
    {
        /// The Map function receives as input the name of the image and its contents.
        /// The filtering is then performed on the contents before writing the results
        /// back to the storage.
        /// <param name="input">A key-value pair representing the name of the
        /// file and its contents.</param>
        protected override void Map(IMapInput<string, BytesWriteable> input)
        {
            byte[] image = input.Value.GetBytes();
            // Put your image filtering algorithm here
            // ...
            // ...
            Emit(input.Key, image);
        }
    }

    /// Class ImageFilterReducer. Reducer implementation for the ImageFilter application.
    /// The Reducer is an identity function which does no processing.
    public class ImageFilterReducer : Reducer<string, BytesWriteable>
    {
        /// The Reduce function is an identity function which does no further processing
        /// on the contents.
        protected override void Reduce(IReduceInputEnumerator<BytesWriteable> input)
        {
            // This is an identity function. No additional processing is required.
        }
    }
Once you have written and compiled your code, run your application by varying first the input size and then the number of nodes (for example: 2, 4, 8, 16, ...). Plot a single graph of execution time (y-axis) versus input size (x-axis) for the different sets of nodes used, so that your final graph shows the difference in execution time for each set of nodes. Next, plot a graph of speed-up (y-axis) versus input size (x-axis) for the different sets of nodes used.

Problem 6.14:
Developing a platform service such as Hadoop on various cloud infrastructures can be an arduous task. Below we break this task down into three categories: building the VM, instantiating VMs, and setting up Hadoop.

Building a VM:
a) Eucalyptus: The ideal way to build a Hadoop VM on Eucalyptus 2.0 is to start with a pre-prepared base image and package it into your own EMI. You can find a starter image at http://open.eucalyptus.com/wiki/starter-emis. Once a starter image is selected, it is unzipped and mounted as a filesystem, and the Hadoop installation packages can be unzipped in a desired installation path (recommended: /opt). After the image is properly prepared, it is bundled, uploaded, and registered using the euca-bundle, euca-upload and euca-register commands described at http://open.eucalyptus.com/wiki/EucalyptusImageManagement_v2.0.
b) Nimbus: Select the precompiled Hadoop cluster image available at the Nimbus Marketplace (http://scienceclouds.org/marketplace/) and add it to the Nimbus cloud being used, if it is not already available.
c) OpenStack: Similar to Eucalyptus, select a base image, either from the Eucalyptus precompiled images or from the Ubuntu UEC (http://uec-images.ubuntu.com/releases/). Once a starter image is selected, it is unzipped and mounted as a filesystem, and the Hadoop installation packages can be unzipped in a desired installation path (recommended: /opt). After the image is properly prepared, it is bundled, uploaded, and registered using the euca-bundle, euca-upload and euca-register commands described at http://open.eucalyptus.com/wiki/EucalyptusImageManagement_v2.0.
Instantiate VMs:
a) Eucalyptus: Using the euca2ools commands, and assuming the user has the appropriate credentials and keypairs created, call euca-run-instances with the EMI number obtained in the previous step. Alternatively, use the boto2 library in Python to create your own startup script.
b) Nimbus: Assuming the necessary credentials are in place, start the Hadoop image by using the bin/cloud-client.sh --run command and specifying the image name in the --name attribute.
c) OpenStack: Using the euca2ools commands, and assuming the user has the appropriate credentials and keypairs created, call euca-run-instances with the EMI number obtained in the previous step. Alternatively, use the boto2 library in Python to create your own startup script.

Setup Hadoop:
Once a number of VMs have been instantiated and are in the "running" state, select one as the master Hadoop node and designate the others as slave nodes. For each node, set the proper configuration in /etc/hosts and make the changes to Hadoop's configuration files described at https://portal.futuregrid.org/salsahadoop-futuregrid-cloud-eucalyptus#Configuration. Once ready, you can start Hadoop on each VM with the bin/start-all.sh command and test it using Lynx by connecting to the master node's MapReduce and HDFS services (lynx 10.0.2.131:9001 and lynx 10.0.2.131:9003).

Run WordCount:
Once the Hadoop HDFS and MapReduce services are running properly, run the WordCount program described at http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Example%3A+WordCount+v1.0.
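For reference, a minimal WordCount along the lines of the tutorial referenced above can look like the sketch below; the class names and the use of the newer org.apache.hadoop.mapreduce API are illustrative choices, not a copy of the tutorial code.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);   // emit (word, 1) for every token
                }
            }
        }

        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();             // add up the counts for this word
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Running it against a small text file in HDFS is a quick way to confirm that the cluster built above is functioning end to end.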
Problem 6.15:
Examine the tutorials at http://www.salsahpc.org and http://www.iterativemapreduce.org/samples.html. Compare Hadoop and Twister on cases specified by the instructor from the examples given there, and discuss their relative strengths and weaknesses. We select the KMeansClustering application to compare Hadoop and Twister.

KMeans Clustering

Twister strengths:
a) Data caching: Twister supports in-memory caching of loop-invariant input data (the KMeans input data points) across iterations, eliminating the overhead of retrieving and parsing the data in each iteration. Hadoop does not support caching of input data and has to read and parse the data from disk (or from another node in the case of a non-data-local map task) in each iteration, adding a significant overhead to the computation.
b) Iterative extensions: The Twister programming model contains a combiner step (after the reduce step) to merge the reduce outputs (the new centroids) and supports data broadcasting at the beginning of an iteration. Hadoop does not support data broadcasting or providing broadcast data (the KMeans centroids) as an input to the map tasks. Users have to use an auxiliary mechanism (e.g., the distributed cache) to broadcast and receive the centroid data. Also, the users have to manually merge the new centroids in the driver program.
c) Intermediate data communication: Twister performs streaming intermediate data transfers directly to the reducers using messaging or TCP. Hadoop first writes the intermediate data to disk before transferring it, adding a significant performance overhead because KMeansClustering performs a significant amount of intermediate data transfer.

Hadoop strengths:
d) Fault tolerance: Hadoop supports fine-grained task-level fault tolerance, where it re-executes failed tasks to recover the computation. Hadoop also supports duplicate execution of slow tasks to avoid the tail of slow tasks. Twister supports fault tolerance only at the iteration level: if a task fails, the whole iteration needs to be re-executed.
e) Load balancing: Hadoop performs global-queue-based dynamic scheduling, resulting in natural load balancing of the computation. Hadoop also supports having multiple waves of map tasks per iteration, resulting in better load balancing and offsetting some of the intermediate data communication costs (overlapping communication with computation). Twister only supports static scheduling and does not support multiple waves of map tasks.
f) Monitoring: Hadoop provides a web-based monitoring UI, where the user can monitor the progress of the computations. Twister only provides command-line monitoring output.

Points (a) and (b) apply only to iterative MapReduce applications like KMeansClustering and PageRank; the others apply to classic MapReduce and pleasingly parallel applications as well. A sketch of the Hadoop side of point (b) is given below.
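To make point (b) concrete, here is a minimal, hypothetical Hadoop mapper for one KMeans iteration. The centroid file format, the configuration key centroids.path, and the comma-separated point encoding are illustrative assumptions and are not taken from the Twister or Hadoop samples referenced above.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // One KMeans iteration: assign each point to its nearest centroid.
    // The centroids are re-read from a side file at the start of every iteration,
    // which is exactly the per-iteration overhead the comparison above refers to.
    public class KMeansAssignMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
        private final List<double[]> centroids = new ArrayList<double[]>();

        @Override
        protected void setup(Context context) throws IOException {
            Path path = new Path(context.getConfiguration().get("centroids.path"));
            FileSystem fs = FileSystem.get(context.getConfiguration());
            BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(path)));
            String line;
            while ((line = reader.readLine()) != null) {
                centroids.add(parse(line));          // one comma-separated centroid per line
            }
            reader.close();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            double[] point = parse(value.toString());
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int i = 0; i < centroids.size(); i++) {
                double d = squaredDistance(point, centroids.get(i));
                if (d < bestDist) { bestDist = d; best = i; }
            }
            // Emit (nearest centroid id, point); the reducer averages the points per centroid.
            context.write(new IntWritable(best), value);
        }

        private static double[] parse(String line) {
            String[] tokens = line.split(",");
            double[] v = new double[tokens.length];
            for (int i = 0; i < tokens.length; i++) v[i] = Double.parseDouble(tokens[i].trim());
            return v;
        }

        private static double squaredDistance(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) { double diff = a[i] - b[i]; sum += diff * diff; }
            return sum;
        }
    }

A driver program then collects the reducer output, writes the new centroid file, and resubmits the job for the next iteration; Twister's broadcast support and combiner step remove exactly this manual loop.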
Problem 6.18:
DryadLINQ PageRank
PageRank is a well-known link analysis algorithm. It assigns a numerical value to each element of a hyperlinked set of web pages, reflecting the probability that a random surfer will visit that page. Implementing PageRank with MapReduce is difficult, in terms of both efficiency and programmability, because of the random access pattern over a large-scale web graph. DryadLINQ provides a SQL-like query API that helps programmers implement PageRank without much effort, and the Dryad infrastructure makes it easy to scale the application out. This assignment will help you learn how to implement a simple PageRank application with DryadLINQ. We provide scratch code for DryadLINQ PageRank that compiles successfully. You will learn the PageRank algorithm and how to implement PageRank with the DryadLINQ API.

PageRank algorithm
Mathematically, PageRank can be understood as a Markov chain whose stationary distribution is computed by iterating until convergence. The update formula for any page A is

    PR(A) = (1 - d)/N + d * ( PR(T1)/C(T1) + PR(T2)/C(T2) + ... + PR(Tn)/C(Tn) )

where T1, ..., Tn are the pages linking to A, C(Ti) is the number of outbound links of page Ti, N is the number of web pages in the collection, and d is the damping factor. The updated rank value of page A is the sum of each adjacent page's own rank value divided by the number of outbound links of that page, scaled by d. The damping factor d is the probability that a person continues browsing by following the links in the current web page; the damping factor is subtracted from 1, the result is divided by N, and this term (1 - d)/N is added to the updated rank value of page A. The damping factor is set to 0.85 in this assignment.

DryadLINQ Implementation
DryadLINQ is a compiler that translates LINQ programs into distributed computations. LINQ is an extension to .NET, launched with Visual Studio 2008, which provides declarative programming for data manipulation. With DryadLINQ, the programmer does not need much knowledge about parallel or distributed computation; any LINQ programmer turns instantly into a cluster computing programmer.

The PageRank algorithm requires multiple iterations during the overall computation. One iteration of PageRank consists of two job steps: 1) join the rank-values table with the linkage table to generate the partial rank values; 2) aggregate the partial rank values for each unique web page. A driver program keeps looping over the join job and the aggregate job until a stop condition is reached, e.g., the number of rounds exceeds a threshold, or the total difference of all rank values between two iterations is less than a predefined threshold.

In DryadLINQ PageRank we use "IQueryable<Page> pages" to store the linkage table and "IQueryable<Vertex> rankValues" to store the rank-values table. The linkage table is built from the adjacency matrix of the web graph; all the adjacency-matrix input files are defined in the partition table "cwpartition.pt". The rank values are updated by using a Join of the current "rankValues" with the "pages" object. The output of the Join is a list of <dest, value> pairs that contain the partial rank values. We aggregate those partial results by using a GroupBy on the first element of the <dest, value> tuple. Then the partial rank values of each web page are accumulated, forming the new rank values for the next iteration.
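Before turning to the DryadLINQ listing, a plain sequential sketch of one iteration may help. It is written in Java with our own data structures, purely as a reference implementation of the two steps and of the formula above; it is not part of the assignment's scratch code.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One sequential PageRank iteration over an adjacency list, mirroring the two
// DryadLINQ job steps: (1) "join" each page's current rank with its out-links to
// produce partial rank values, and (2) "group by" destination page and aggregate.
public class PageRankIteration {

  public static Map<Integer, Double> iterate(Map<Integer, List<Integer>> outLinks,
                                             Map<Integer, Double> rank,
                                             double d, int numPages) {
    // Step 1: every page sends rank/outDegree to each page it links to.
    Map<Integer, Double> partial = new HashMap<Integer, Double>();
    for (Map.Entry<Integer, List<Integer>> entry : outLinks.entrySet()) {
      double share = rank.get(entry.getKey()) / entry.getValue().size();
      for (int dest : entry.getValue()) {
        Double current = partial.get(dest);
        partial.put(dest, (current == null ? 0.0 : current) + share);
      }
    }
    // Step 2: aggregate and apply damping: PR(A) = (1 - d)/N + d * sum of partials.
    Map<Integer, Double> next = new HashMap<Integer, Double>();
    for (Integer page : rank.keySet()) {
      Double sum = partial.get(page);
      next.put(page, (1.0 - d) / numPages + d * (sum == null ? 0.0 : sum));
    }
    return next;
  }
}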
Sample Code
Here is sample code for DryadLINQ PageRank. We use the formula given above to calculate the new rank values in each iteration.

public void RunPageRank()
{
    string ptPath = @"file://\\MADRID-HEADNODE\DryadData\Hui\PageRank\cwpartition.pt";
    PartitionedTable<LineRecord> table = PartitionedTable.Get<LineRecord>(ptPath);
    IQueryable<Page> pages = table.Select(lr => buildPage(lr.line));

    // Initialize every page with an equal rank of 1/N.
    Vertex[] ver = new Vertex[numPages];
    double initialRank = 1.0 / numPages;
    for (int i = 0; i < numPages; i++)
    {
        ver[i].source = i + 1;
        ver[i].value = initialRank;
    }
    IQueryable<Vertex> rankValues = ver.ToPartitionedTable("rankValues.pt");
    IQueryable<Vertex> newRankValues = null;

    for (int i = 0; i < 10; i++)
    {
        // Join the linkage table with the current rank values to generate the
        // partial rank values, then group by destination page and aggregate.
        newRankValues = pages.Join(rankValues,
                page => page.source,
                vertex => vertex.source,
                (page, vertex) => page.links.Select(
                    dest => new Vertex(dest, vertex.value / page.numLinks))).
            SelectMany(list => list).
            GroupBy(vertex => vertex.source).
            Select(group => new Vertex(group.Key,
                // PR(A) = d * sum(partial ranks) + (1 - d)/N, with d = 0.85
                group.Select(vertex => vertex.value).Sum() * 0.85 + 0.15 / numPages));

        rankValues = newRankValues;
        Console.WriteLine(" pagerank iteration no:" + i);
    }
    SaveResults(newRankValues);  // argument assumed; the original listing breaks off here
}

Problem 6.19:
The following program illustrates the use of Aneka's Thread Programming Model for matrix multiplication. The program takes as input two square matrices. Each AnekaThread instance is a row-column multiplier; that is, a row from the first matrix is multiplied with the corresponding column from the second matrix to produce one cell of the final matrix. Each of these row-column computations is performed independently on a Worker node, and the results are then put together by the client application.

/// Class <i><b>MatrixMultiplier</b></i>. Multiplies two square matrices, where each
/// element in the resulting matrix, C, is computed by multiplying the corresponding
/// row and column vectors of matrices A and B. Each such computation is carried out
/// by a distinct instance of AnekaThread; multiplying two square matrices of
/// dimension n thus requires n*n AnekaThread instances.
public class MatrixMultiplier
{
    /// The application configuration
    private Configuration configuration;

    /// Creates an instance of MatrixMultiplier
    /// <param name="schedulerUri">The uri to the Aneka scheduler</param>
    public MatrixMultiplier(Uri schedulerUri)
    {
        configuration = new Configuration();
        configuration.SchedulerUri = schedulerUri;
    }
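The Aneka listing above is only the beginning of the program; the code that creates and dispatches the AnekaThread instances is not reproduced here. The unit of work each thread performs is a single row-by-column dot product, sketched below in plain Java with our own names (it does not use the Aneka API).

// Unit of work performed for each cell of C in Problem 6.19: the dot product of
// one row of A with one column of B. The distributed program wraps an equivalent
// method in an AnekaThread and runs it on a worker node; this plain-Java version
// only illustrates the decomposition.
public class RowColumnTask {

  public static double cell(double[] rowOfA, double[][] b, int col) {
    double sum = 0.0;
    for (int k = 0; k < rowOfA.length; k++) {
      sum += rowOfA[k] * b[k][col];
    }
    return sum;
  }

  // Local composition of the result: the n*n calls below correspond to the
  // n*n AnekaThread instances created by the distributed version.
  public static double[][] multiply(double[][] a, double[][] b) {
    int n = a.length;
    double[][] c = new double[n][n];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        c[i][j] = cell(a[i], b, j);
      }
    }
    return c;
  }
}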