A cloud gaming system based on user level virtualization and its resource scheduling.
This document describes a distributed compilation system called DistCom that utilizes idle computing resources across a network to speed up software build processes. It presents a distributed server/client model using object files as the basic unit. It also discusses CPU scheduling techniques for remote PC resources and cross-compiling for heterogeneous architectures. An evaluation shows DistCom can reduce a mobile platform build time by 65% compared to a local build, achieving performance similar to an 8-core PC using 10 distributed machines.
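The dispatcher logic behind such a system can be illustrated with a minimal least-loaded scheduler. This is a hypothetical sketch, not DistCom's actual implementation: the `Worker` class and `schedule` function are illustrative names, and real systems would also track in-flight compile times and network cost.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    name: str
    cores: int
    assigned: list = field(default_factory=list)

    def load(self) -> float:
        # Queue length normalized by core count, as a proxy for expected wait.
        return len(self.assigned) / self.cores

def schedule(units, workers):
    """Assign each compilation unit (e.g. one object file) to the
    currently least-loaded worker, DistCom-dispatcher style."""
    for unit in units:
        target = min(workers, key=Worker.load)
        target.assigned.append(unit)
    return {w.name: w.assigned for w in workers}
```

With two workers, `schedule([f"u{i}.o" for i in range(12)], [Worker("local", 4), Worker("pc1", 8)])` ends up assigning units roughly in proportion to core counts, which is why adding idle remote PCs shortens the build.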
This project deals with the warehouse-scale computers (WSCs) that power the Internet services we use today. It covers the hardware blocks used in a Google WSC and the architecture of hardware accelerators such as the graphics processing unit (GPU) and the tensor processing unit (TPU), which help warehouse-scale machines run heavy workloads and support application-specific machine learning and deep learning tasks. The project also examines the energy efficiency of the processors used in a Google WSC to achieve high performance, and the performance-enhancement mechanisms the WSC employs.
Enabling Cognitive Workloads on the Cloud: GPUs with Mesos, Docker and Marath... (Indrajit Poddar)
This document provides an overview of enabling cognitive workloads on the cloud using GPUs with Mesos, Docker, and Marathon on IBM's POWER systems. It discusses requirements for GPUs in the cloud like exposing GPUs to containers and supporting multiple GPUs per node. It also summarizes Mesos and Kubernetes support for GPUs, and demonstrates running a deep learning workload on OpenPOWER hardware to identify dog breeds using Docker containers and GPUs.
A gossip protocol for dynamic resource management in large cloud environments (JPINFOTECH JAYAPRAKASH)
The document proposes a gossip protocol for dynamic resource management in large cloud environments. It aims to (1) ensure fair resource allocation among sites/applications, (2) dynamically adapt the allocation to load changes, and (3) scale with the number of physical machines and sites/applications. Existing gossip protocols have drawbacks like assuming static input, requiring restarts and global synchronization when input changes. The proposed protocol continuously executes on dynamic local input without global synchronization. It formally defines the resource allocation problem and provides an optimal solution without memory constraints and a heuristic solution that considers memory constraints and adaptation costs.
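The core gossip primitive such protocols build on can be sketched in a few lines: nodes repeatedly pair up and average their state, converging toward the global mean with no global synchronization. This is a minimal illustration of the pairwise-averaging idea only, not the paper's full protocol (which handles dynamic input and memory constraints).

```python
import random

def gossip_round(loads, rng):
    """One gossip round: each node picks a random peer and the pair
    averages their loads. Repeated rounds drive every node toward the
    global mean without any central coordinator."""
    nodes = list(loads)
    rng.shuffle(nodes)
    for i in nodes:
        j = rng.choice([n for n in loads if n != i])
        avg = (loads[i] + loads[j]) / 2
        loads[i] = loads[j] = avg
    return loads

# Four machines with unbalanced load; total load is preserved each round.
loads = {"m1": 90.0, "m2": 10.0, "m3": 50.0, "m4": 30.0}
rng = random.Random(7)
for _ in range(20):
    gossip_round(loads, rng)
```

After a handful of rounds every node's estimate sits close to the true mean (45.0 here), which is the property the paper's protocol maintains continuously as the input changes.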
A Survey of Performance Comparison between Virtual Machines and Containers (prashant desai)
Since the onset of cloud computing and its inroads into infrastructure as a service, virtualization has become critically important in the field of abstraction and resource management. However, the additional layers of abstraction provided by virtualization come at a trade-off between performance and cost in a cloud environment where everything is on a pay-per-use basis. Containers, which are perceived to be the future of virtualization, were developed to address these issues. This paper scrutinizes the performance of a conventional virtual machine and contrasts it with containers. We cover a critical assessment of each parameter and its behavior when it is subjected to various stress tests. We discuss the implementations and their performance metrics to help draw conclusions on which is ideal for a given need. After assessing the results and discussing the limitations, we conclude with prospects for future research.
Architecture exploration of recent GPUs to analyze the efficiency of hardware... (journalBEEI)
This document analyzes the efficiency of hardware resources in recent GPU architectures like Pascal compared to older architectures like Fermi. It simulates 9 benchmarks on a Fermi and Pascal-based GPU configuration using a cycle-accurate simulator. The results show that Pascal improves performance by 273% on average over Fermi. It also analyzes the impact of computing resources versus memory resources, varying the number of warp schedulers, and measuring barrier synchronization overhead. The goal is to understand how hardware upgrades in newer architectures translate to performance gains and guide future GPU development.
The document provides instructions for installing Xen Cloud Platform host software on a physical server. It describes selecting installation options such as keyboard layout, driver installation, clean vs upgrade install. It also covers configuring storage, networking and other setup steps. The goal is to install a Xen hypervisor and management tools to create a platform for hosting virtual machines.
Gamebryo LightSpeed provides improved runtime performance, a modular game framework, entity modeling tools, Lua scripting and debugging, and rapid iteration capabilities. New features include deferred lighting for improved rendering, a decoration system for terrain customization, terrain streaming for unlimited map sizes, and an enhanced water editor. It offers an integrated development environment with tools in Visual Studio, 3DS Max, and the Toolbench plugin suite.
Resumption of virtual machines after adaptive deduplication of virtual machin... (IJECEIAES)
In cloud computing, load balancing and energy utilization are critical problems addressed by virtual machine (VM) migration. Live migration is the movement of running VMs from an overloaded or underloaded physical machine to a more suitable one. During this process, transferring large disk image files takes more time, hence longer migration and down time. In the proposed adaptive deduplication, the image file undergoes both fixed- and variable-length deduplication, depending on its size. The contribution of this paper is the resumption of VMs from reunited deduplicated disk image files. Performance is measured by the percentage reduction in VM image size after deduplication, the time taken to migrate the deduplicated file, and the time taken for each VM to resume after migration. The results show an 83% reduction in overall image size and an 89.76% reduction in migration time. For a deduplication ratio of 92%, the overall time is 3.52 minutes, a 7% reduction in resumption time compared with the original QCOW2 files. For VMDK files, resumption time is reduced by up to 17% (7.63 minutes) compared with the original files.
Presentation I gave at the SORT Conference in 2011, generalized from some work I had done using GPUs to accelerate image processing at FamilySearch.
This document discusses using OpenCL to accelerate numerical modeling of gravitational wave sources on hardware accelerators like GPUs and the Cell BE. It summarizes the EMRI Teukolsky Code, which models gravitational waves generated by a compact object orbiting a supermassive black hole by solving the Teukolsky equation. The authors parallelized this code using OpenCL to run on GPUs and the Cell BE, achieving performance comparable to using each vendor's native SDK while only writing code once for both architectures.
Fast Scalable Easy Machine Learning with OpenPOWER, GPUs and Docker (Indrajit Poddar)
Transparently accelerated Deep Learning workloads on OpenPOWER systems and GPUs using easy to use open source frameworks such as Caffe, Torch, Tensorflow, Theano.
SIMULATION AND PERFORMANCE ANALYSIS OF A LARGE SCALED INTERNET APPLICATION ... (ankit_saluja)
This document describes simulations of a large-scale Internet application (Facebook) on cloud computing environments using a tool called CloudAnalyst. Several scenarios are simulated with different configurations for data center location, service broker algorithm, and VM load balancing algorithm. Key results include overall response time, data center processing time, and cost. The best performance was found with multiple data centers located in each region using proximity-based routing and throttled load balancing, with an average response time of 205ms and total cost of $1,128.94.
This document discusses integrating Data Protection Manager (DPM) 2007 with a SAN (storage area network) to allow for quick file and application recovery using hardware-based snapshot and cloning technology. It provides details on using SAN clone technology for initial replication in DPM and recovery using SAN hardware snapshots. Performance testing showed cloning two 400GB Exchange storage groups took around 4 hours with a transfer rate of 56.8 MB/Sec.
Shader Model 5.0 introduces several new features for vertex, hull, domain, geometry, and pixel shaders, including uniform indexing of resources, SV_Coverage system value, and double precision support. Compute shaders also gain features like raw and structured buffer views, atomic operations, and thread local storage. Compute shaders are well-suited for general purpose GPU tasks like post-processing and can perform Gaussian blur more efficiently than pixel shaders by reducing memory bandwidth usage through thread local storage.
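The arithmetic saving behind the compute-shader blur can be illustrated outside HLSL with a separable Gaussian filter: two 1D passes touch O(2r) samples per pixel instead of O(r²) for a full 2D kernel, analogous to how a compute shader stages a tile in thread-local (groupshared) memory to avoid redundant texture fetches. This NumPy sketch only models the separable-filter part of that optimization.

```python
import numpy as np

def gaussian_kernel_1d(radius, sigma):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x * x) / (2 * sigma * sigma))
    return k / k.sum()  # normalize so a constant image stays constant

def separable_blur(img, radius=3, sigma=1.5):
    """Gaussian blur as two 1D passes (horizontal, then vertical),
    the factorization compute shaders typically exploit."""
    k = gaussian_kernel_1d(radius, sigma)
    out = img.astype(float)
    # Horizontal pass: pad columns, then weighted sum of shifted slices.
    padded = np.pad(out, ((0, 0), (radius, radius)), mode="edge")
    out = sum(w * padded[:, i:i + out.shape[1]] for i, w in enumerate(k))
    # Vertical pass: same, along rows.
    padded = np.pad(out, ((radius, radius), (0, 0)), mode="edge")
    out = sum(w * padded[i:i + out.shape[0], :] for i, w in enumerate(k))
    return out
```

A quick sanity check: blurring a constant image returns the same constant, since each pass's weights sum to 1.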
Power through your high school courseload with a responsive Chromebook (Principled Technologies)
Two Chromebooks with Intel Core i3-1125G4 and Intel Pentium Silver N6000 processors required less time to complete tasks in educational apps than two Chromebooks with MediaTek MT8183 and Qualcomm Snapdragon 7c processors.
LIQUID-A Scalable Deduplication File System For Virtual Machine Images (fabna benz)
LIQUID-A Scalable Deduplication File System For Virtual Machine Images.
INTRODUCTION: Cloud computing means storing and accessing data and programs over the Internet instead of your computer's hard drive.
A virtual machine is software that creates a virtualized environment between the computer platform and the end user, in which the end user can operate software.
Data deduplication is a data compression technique that eliminates duplicate copies of repeating data.
A redundant data block is replaced with a reference instead of being stored multiple times, improving storage utilization.
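The block-replacement idea can be sketched with fixed-size chunking and content-addressed storage. This is a minimal illustration, not LIQUID's actual design (which also covers P2P transfer and caching); the function names are hypothetical.

```python
import hashlib

def deduplicate(data, block_size=4096):
    """Split data into fixed-size blocks and store each unique block
    once, keyed by its SHA-256 fingerprint. The image itself becomes a
    'recipe': an ordered list of fingerprints."""
    store, recipe = {}, []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        fp = hashlib.sha256(block).hexdigest()
        store.setdefault(fp, block)  # redundant blocks stored only once
        recipe.append(fp)
    return store, recipe

def reassemble(store, recipe):
    """Rebuild the original byte stream from the recipe."""
    return b"".join(store[fp] for fp in recipe)
```

For a 16 KiB image made of three identical "A" blocks and one "B" block, the store holds only two unique blocks while the recipe still reconstructs the image exactly.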
ADVANTAGES OF LIQUID:
*Fast virtual machine deployment with peer to peer data transfer.
*Low storage consumption by means of deduplication.
*Instant cloning for virtual machine images.
*On-demand fetching over the network, with caching on local disks.
*No specific limit on LIQUID file sizes.
CONCLUSION:
LIQUID is a deduplication file system with good I/O performance.
This is achieved by caching frequently accessed data blocks in memory, avoiding additional disk operations.
Deduplication of VM images proved to be effective.
Proper resource allocation is critical to achieving top application performance in a virtualized environment. Resource contention degrades performance and underutilization can lead to costly server sprawl.
We found that adding VMTurbo to a VMware vSphere 5.5 cluster and following its reallocation recommendations gave our application performance a big boost. After reducing vCPU count, increasing memory allocation to active databases, and moving VMs to more responsive storage as VMTurbo directed, online transactions increased by 23.7 percent while latency dropped significantly. Avoid the pitfalls of poorly allocated VM resources and give your virtualized application every advantage by gaining control of your environment at every level.
This document describes using in-place computing on PostgreSQL to perform statistical analysis directly on data stored in a PostgreSQL database. Key points include:
- An F-test is used to compare the variances of accelerometer data from different phone models (Nexus 4 and S3 Mini) and activities (walking and biking).
- Performing the F-test directly in PostgreSQL via SQL queries is faster than exporting the data to an R script, as it avoids the overhead of data transfer.
- PG-Strom, an extension for PostgreSQL, is used to generate CUDA code on-the-fly to parallelize the variance calculations on a GPU, further speeding up the F-test.
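The statistic being pushed into the database is simple: the F statistic for comparing two variances is just their ratio. A minimal Python sketch with made-up accelerometer readings (the sample values below are illustrative, not the paper's data); in SQL the same quantity is a ratio of `var_samp` aggregates, which is what PG-Strom can offload to the GPU.

```python
from statistics import variance

def f_statistic(sample_a, sample_b):
    """F statistic for a variance-comparison test: the ratio of the two
    sample variances. Values far from 1 suggest the activities differ."""
    return variance(sample_a) / variance(sample_b)

# Hypothetical accelerometer magnitudes: biking shakes the phone more.
walking = [0.20, 0.10, 0.30, 0.20, 0.25, 0.15]
biking  = [0.90, 0.10, 1.20, 0.30, 1.50, 0.20]
```

Here `f_statistic(biking, walking)` comes out well above 1, consistent with biking producing more variable readings; a full test would compare it against the F distribution's critical value for the two sample sizes.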
Classification of Virtualization Environment for Cloud Computing (Souvik Pal)
Cloud computing is a relatively new field gaining popularity day by day for wide-ranging applications among Internet users. Virtualization plays a significant role in managing and coordinating access from the resource pool to the multiple virtual machines on which heterogeneous applications run. The various virtualization methodologies matter because they help cope with complex workloads, frequent application patching and updating, and multiple software architectures. Although a great deal of research has been conducted on virtualization, the issues involved have mostly been presented in isolation from each other. We therefore present a comprehensive survey of the different aspects of virtualization, together with our classification of virtualization methodologies and brief explanations of each, based on their working principles and underlying features.
Cloud Gaming Architectures: From Social to Mobile to MMO (AWS Germany)
October 21st 2015, Cloud Gaming Architectures: From Social to Mobile to MMO, Mark Bate
The AWS Pop-up Loft in Berlin is open for a limited time only. From 15.10. to 13.11.2015 you have the unique opportunity to be part of something special. Become a Loft member now for free and get exclusive access to the attractive Loft offers. http://aws.amazon.com/de/start-ups/loft/de-loft/
Ensuring High-performance of Mission-critical Java Applications in Multi-tena... (Zhenyun Zhuang)
The document discusses problems with ensuring high performance of mission-critical Java applications in multi-tenant cloud environments. It identifies issues caused by resource sharing between applications on the same platform, such as memory pressure triggering page swapping and direct reclaiming, which can severely degrade Java application performance through increased garbage collection pauses and reduced throughput. The authors investigate two scenarios in a production environment and determine that transparent huge pages, memory pressure from other applications, and interactions between the JVM and Linux memory management are key factors impacting Java application performance in multi-tenant cloud setups.
This white paper describes the design of a 50,000-seat virtual desktop deployment using VMware View and vSphere virtualization technologies. Key partners in the project include NetApp, VMware, Cisco, Fujitsu, and Wyse Technology. The design uses a "pool of desktops" approach with standardized server and storage configurations that can be replicated modularly to scale the deployment from 5,000 to 50,000 seats. Detailed specifications are provided for the servers, storage, networking, and other infrastructure components used in the deployment.
This document discusses quality of service (QoS) parameters that are important for online multiplayer gaming. It outlines key QoS metrics like throughput, transit delay, delay jitter, and error rate. These metrics affect both game developers and players. For developers, QoS impacts issues like latency, appropriate transport protocols, and compression techniques. For players, perceived latency is especially important as it can impact whether targets are hit in first-person shooter or racing games. The growing demand for fast and reliable online gaming means QoS will remain an important consideration.
Introducing GeForce NOW, a new game streaming service that is like Netflix for games. Learn about the benefits, technology and roadmap that will transform how video games are played.
- Virtual GPU (vGPU) technology from Nvidia allows multiple virtual desktop users to share a single physical GPU, increasing user density compared to previous solutions.
- Nvidia's GRID vGPU uses the Tesla M6, M10, and M60 GPUs with various profile options (Q, B, A) that allocate different amounts of GPU memory.
- Setting up vGPU requires installing Nvidia drivers in the VM and configuring the hypervisor and VM; monitoring tools are available to check GPU usage.
- Use cases for vGPU include 3D applications, high-resolution displays, video acceleration, and GPU pass-through for applications like CAD and content creation.
An introduction to what multiplayer games are, what makes them different from normal games, how to approach building them and specifically how to begin building them with the Unity game engine.
Talk given at the GameIS & Dragonplay mobile multiplayer hackathon, 30/7/2015
Cloud Gaming Onward: Research Opportunities and Outlook (Academia Sinica)
Cloud gaming has become increasingly popular in academia and industry, as evidenced by the large number of related research papers and startup companies. Some public cloud gaming services have attracted hundreds of thousands of subscribers, demonstrating the initial success of cloud gaming services. Pushing cloud gaming services forward, however, faces various challenges, which open up many research opportunities. In this paper, we share our views on future cloud gaming research and point out several research problems spanning a wide spectrum of directions, including distributed systems, video codecs, virtualization, human-computer interaction, quality of experience, resource allocation, and dynamic adaptation. Solving these research problems will allow service providers to offer high-quality cloud gaming services while remaining profitable, which in turn results in an even more successful cloud gaming ecosystem. In addition, we believe there will be many more novel ideas to capitalize on the abundant and elastic cloud resources for a better gaming experience, and we will see these ideas and their associated challenges in the years to come.
Gamelets - Multiplayer Mobile Games with Distributed Micro-Clouds [Full Text] (Anand Bhojan)
This document proposes a system called Gamelets that uses distributed micro-clouds to improve cloud gaming for mobile devices. Gamelets are minimal hardware devices like WiFi access points that are placed close to mobile clients. They run parts of games to reduce latency and bandwidth usage compared to rendering entirely in centralized cloud servers. Key challenges include distributing game data across Gamelets, distributed rendering to share loads, and security issues from the distributed nature. The document outlines a prototype implementation that divides games into zones distributed across Gamelets and uses multiple cameras for distributed rendering loads. Gamelets aim to enable more types of cloud games on mobile by addressing latency and scalability issues of traditional cloud approaches.
With the arrival of cloud technology, game accessibility and ubiquity have a bright future; games can be hosted on a centralized server and accessed through the Internet by a thin client on a wide variety of devices with modest capabilities: cloud gaming. However, current cloud gaming systems have very strong requirements in terms of network resources, thus reducing the accessibility and ubiquity of cloud games, because devices with little bandwidth, and people located in areas with limited and unstable network connectivity, cannot take advantage of these cloud services.
In this paper we present an adaptation technique inspired by the level of detail (LoD) approach in 3D graphics. It delivers multiple-platform accessibility and network adaptability, while improving the user's quality of experience (QoE) by reducing the impact of poor and unstable network parameters (delay, packet loss, jitter) on game interactivity. We validate our approach using a prototype game in a controlled environment and characterize the user QoE in a pilot experiment. The results show that the proposed framework provides a significant QoE enhancement.
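The adaptation decision such an LoD-inspired technique makes can be sketched as a feasibility check over measured network conditions. The level names and thresholds below are purely illustrative, not taken from the paper.

```python
def pick_detail_level(bandwidth_kbps, rtt_ms, levels):
    """Pick the richest detail level whose bandwidth and latency
    requirements the measured network satisfies; fall back to the
    lightest level when nothing fits."""
    feasible = [lvl for lvl in levels
                if bandwidth_kbps >= lvl["min_kbps"] and rtt_ms <= lvl["max_rtt_ms"]]
    if feasible:
        return max(feasible, key=lambda l: l["min_kbps"])["name"]
    return min(levels, key=lambda l: l["min_kbps"])["name"]

# Hypothetical levels, from full server-side streaming down to a mode
# where the thin client renders simplified geometry itself.
LEVELS = [
    {"name": "full-stream",      "min_kbps": 5000, "max_rtt_ms": 80},
    {"name": "reduced-geometry", "min_kbps": 1500, "max_rtt_ms": 150},
    {"name": "client-rendered",  "min_kbps": 300,  "max_rtt_ms": 400},
]
```

A client on a fast, low-latency link gets the full stream; a constrained link degrades gracefully to lighter levels instead of becoming unplayable, which is the QoE argument the paper makes.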
The document discusses the evolution of GPU architecture and capabilities over time. It describes how GPUs have become massively parallel processors with programmable capabilities beyond just graphics. The document outlines the core components of a GPU including the graphics pipeline and programming model. It also discusses how GPUs are well suited for parallel, data-intensive applications and how their capabilities have expanded into general purpose computing through technologies like CUDA.
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS (cseij)
This document summarizes a survey on GPU systems and their performance on different applications. It discusses how GPUs can be used for general-purpose computing due to their highly parallel processing capabilities. Several computationally intensive applications that achieve speedups when implemented on GPUs are described, including video decoding, matrix multiplication, parallel AES encryption, and password recovery for MS Office documents. The GPU architecture and Nvidia's CUDA programming model are also summarized. While GPUs provide significant performance benefits, some limitations for non-graphics applications are noted. The conclusion is that GPUs are a good alternative for computationally intensive tasks, reducing CPU load and improving performance compared to CPU-only implementations.
Running Dicom Visualization On The Cell (Ps3) Rsna Poster Presentationbroekemaa
The document summarizes a project to use the computational capabilities of the PlayStation 3 (PS3) to accelerate medical image visualization. Researchers developed a framework to distribute DICOM image processing algorithms across client workstations and a PS3. Initial benchmarks showed the PS3 could process batches of 1000 DICOM images in under 10 minutes using just its main processor core. Future work involves optimizing algorithms for the PS3's specialized cores and comparing its performance to other hardware.
The International Journal of Engineering and Science (The IJES)theijes
The document summarizes research on heterogeneous computing using CPU-GPU integration. It proposes a unified graphics computing architecture (UGCA) that utilizes both CPU and GPU resources efficiently. The UGCA design translates PTX code to LLVM for execution on CPU and GPU. It also introduces a workload distribution module that splits tasks between CPU and GPU kernels based on granularity. Performance comparisons show CUDA providing better speedups than OpenCL due to its coarse-grained warp-level parallelism. The architecture aims to improve resource utilization for heterogeneous multi-core processors.
GPU computing uses a GPU as a co-processor to accelerate CPUs for general computing tasks by offloading computationally intensive parts of code. WebGL allows 3D configuration in real-time, while WebCL enables web applications to leverage parallel processing on GPUs and multi-core CPUs directly from a browser. Nokia and Samsung have implemented open source WebCL prototypes for Firefox and WebKit, allowing GPU acceleration of tasks like image processing.
This document discusses GPU computing and provides comparisons between CPU and GPU architectures and performance. It begins by introducing hybrid clusters that use accelerators like GPUs and FPGAs to provide high-performance computation. GPUs are discussed as being highly parallel and suitable for general-purpose computations. The document then summarizes GPU architecture and programming models like CUDA and OpenCL that are used to program GPUs. It provides an example GPU hardware architecture and explains how programming models map applications to GPU resources. Benchmark results are mentioned as showing GPUs can provide significantly faster computation times than CPUs for parallel problems.
This document discusses GPU computing and provides comparisons between CPU and GPU architectures and performance. It begins by introducing hybrid clusters that use accelerators like GPUs and FPGAs to provide high-performance computation. GPUs are discussed as being highly parallel and suitable for general-purpose computations. The document then summarizes GPU architecture and programming models like CUDA and OpenCL that are used to program GPUs. It provides an example GPU hardware architecture and explains how programming models map applications to GPU resources. Benchmark results are mentioned as showing GPUs can provide significantly faster computation times than CPUs for parallel problems.
The document discusses visualization systems and proposes concepts for their future development. It summarizes:
1) The "Visual Realityware" visualization software development environment, which uses an abstraction layer to allow developers to freely select mainstream graphics technologies and expand applications across multiple platforms with minimal bugs.
2) An application called "Virtual Anatomia" developed using Visual Realityware to visualize 3D biological data in real-time.
3) The concept of "Visionize" which is defined as a risk management methodology using visual communication to allow sharing of goals and visions in order to identify and prevent risks before issues arise.
Supporting bioinformatics applications with hybrid multi-cloud servicesAhmed Abdullah
ElasticHPC Supports the creation and management of cloud computing resources over multiple public cloud Providers Including Amazon, Azure, Google and Clouds supporting OpenStack.
Towards Fog-Assisted Virtual Reality MMOG with Ultra-Low LatencyIJCNCJournal
In this paper, we propose a method to realize a virtual reality MMOG (Massively Multiplayer Online Video Game) with ultra-low latency. The basic idea of the proposed method is to introduce a layer consisting of several fog nodes between clients and cloud server to offload a part of the rendering task which is conducted by the cloud server in conventional cloud games. We examine three techniques to reduce the latency in such a fog-assisted cloud game: 1) To maintain the consistency of the virtual game space, collision detection of virtual objects is conducted by the cloud server in a centralized manner; 2) To reflect subtle changes of the line of sight to the 3D game view, each client is assigned to a fog node and the head motion of the player acquired through HMD (Head-Mounted Display) is directly sent to the corresponding fog node; and 3) To offload a part of the rendering task, we separate the rendering of the background view from that of the foreground view, and migrate the former to other nodes including the cloud server. The performance of the proposed method is evaluated by experiments with an AWS-based prototype system. It is confirmed that the proposed techniques achieve the latency of 32.3 ms, which is 66 % faster than the conventional systems.
Exploring the Pros & Cons of GPU Cloud Servers for AI and ML.pdfGPU SERVER
In the modern era of AI and ML, the requirement of powerful computing assets is now more meaningful as compared to previous years. Cutting-edge GPU servers, proficient at managing challenging tasks, that are necessary for training AI models, etc. Standard CPUs usually fail to fulfill the demands when it comes to performing parallel processing. This is the case where GPU cloud servers play a significant role, offering robust solutions for artificial intelligence and ML-based tasks. Let’s check out the pros and cons of utilizing GPU cloud hosting, with a complete focus on how NVIDIA GPU cloud services can boost your tasks.
Image Processing Application on Graphics processorsCSCJournals
In this work, we introduce real time image processing techniques using modern programmable Graphic Processing Units GPU. GPU are SIMD (Single Instruction, Multiple Data) device that is inherently data-parallel. By utilizing NVIDIA new GPU programming framework, “Compute Unified Device Architecture” CUDA as a computational resource, we realize significant acceleration in image processing algorithm computations. We show that a range of computer vision algorithms map readily to CUDA with significant performance gains. Specifically, we demonstrate the efficiency of our approach by a parallelization and optimization of image processing, Morphology applications and image integral.
An exposition of performance comparison of graphic processing unit virtualiza...Asif Farooq
As the demand for computing power is increasing the number of new and improved methodologies in computer architectures are expanding. With the introduction of accelerated heterogeneous computing model, compute times for complex algorithms and tasks are reduced significantly as a result of high degree data parallelism. GPU based heterogeneous computing can not only benefit Cloud infrastructures but also large-scale distributed computing models to work more cost-effective by improving resource efficiencies and decreasing energy consumptions. Thus to implement such paradigm on cloud and largescale infrastructure would require effective GPU virtualization techniques. In this survey, an overview of GPGPU virtualization techniques using CUDA programming model is reviewed with a detailed performance comparison.
This document provides an overview and comparison of different GPU virtualization techniques using the CUDA programming model. It first reviews several techniques for GPU virtualization, including GViM, vCUDA, gVirtuS, rCUDA, DS-CUDA, LoGV, and Grid CUDA. It then compares these techniques based on factors like the CUDA version compatibility, hypervisor used, and whether they support remote GPU acceleration. Finally, the document provides a performance comparison based on overhead percentages and execution times reported in various studies, with rCUDA having the lowest overhead and fastest execution time on average.
VDI performance and price comparison: AMD-based Open Compute 3.0 server vs. H...Principled Technologies
Any organization using virtual desktop infrastructure can benefit by investing in servers that deliver high performance at a reasonable price. In our test, the AMD-based Open Compute 3.0 server hosted a few more virtual desktop sessions that the HP ProLiant DL360p Gen8 server did, while costing less than half as much.
Toradex's latest blog post written by Leonardo Graboski Veiga, FAE, Toradex Brasil, shows you how to provision an Ubuntu Server 16.04 LTS virtual machine in Microsoft Azure, and use Yocto/OpenEmbedded to generate an embedded Linux image. Read on here: https://www.toradex.com/blog/cloud-aided-yocto-build-speedup
An efficient tree based self-organizing protocol for internet of thingsredpel dot com
An efficient tree based self-organizing protocol for internet of things.
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Validation of pervasive cloud task migration with colored petri netredpel dot com
The document describes a study that used Colored Petri Nets (CPN) to model and simulate task migration in pervasive cloud computing environments. The study made the following contributions:
1) It expanded the semantics of CPN to include context information, creating a new CPN model called CCPN.
2) Using CCPN, it constructed two task migration models - one that considered context and one that did not - to simulate task migration in a pervasive cloud based on the OSGi framework.
3) It simulated the two models in CPN Tools and evaluated them based on metrics like task migration accessibility, integrity of the migration process, and system reliability and stability after migration. It also
Web Service QoS Prediction Based on Adaptive Dynamic Programming Using Fuzzy ...redpel dot com
The document proposes a novel approach for predicting quality of service (QoS) metrics for cloud services. The approach combines fuzzy neural networks and adaptive dynamic programming (ADP) for improved prediction accuracy. Specifically, it uses an adaptive-network-based fuzzy inference system (ANFIS) to extract fuzzy rules from QoS data and employ ADP for online parameter learning of the fuzzy rules. Experimental results on a large QoS dataset demonstrate the prediction accuracy of this approach. The approach also provides a convergence proof to guarantee stability of the neural network weights during training.
Towards a virtual domain based authentication on mapreduceredpel dot com
This document proposes a novel authentication solution for MapReduce (MR) models deployed in public clouds. It begins by describing the MR model and job execution workflow. It then discusses security issues with deploying MR in open environments like clouds. Next, it specifies requirements for an MR authentication service, including entity identification, credential revocation, and authentication of clients, MR components, and data. It analyzes existing MR authentication methods and finds they do not fully address the needs of cloud-based MR deployments. The paper then proposes a new "layered authentication solution" with a "virtual domain based authentication framework" to better satisfy the requirements.
Privacy preserving and delegated access control for cloud applicationsredpel dot com
Privacy preserving and delegated access control for cloud applications
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Performance evaluation and estimation model using regression method for hadoo...redpel dot com
Performance evaluation and estimation model using regression method for hadoop word count.
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Frequency and similarity aware partitioning for cloud storage based on space ...redpel dot com
Frequency and similarity aware partitioning for cloud storage based on space time utility maximization model.
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Multiagent multiobjective interaction game system for service provisoning veh...redpel dot com
Multiagent multiobjective interaction game system for service provisoning vehicular cloud
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Efficient multicast delivery for data redundancy minimization over wireless d...redpel dot com
Efficient multicast delivery for data redundancy minimization over wireless data centers
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Cloud assisted io t-based scada systems security- a review of the state of th...redpel dot com
Cloud assisted io t-based scada systems security- a review of the state of the art and future challenges.
for more ieee paper / full abstract / implementation , just visit www.redpel.com
I-Sieve: An inline High Performance Deduplication System Used in cloud storageredpel dot com
I-Sieve: An inline High Performance Deduplication System Used in cloud storage
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Architecture harmonization between cloud radio access network and fog networkredpel dot com
Architecture harmonization between cloud radio access network and fog network
for more ieee paper / full abstract / implementation , just visit www.redpel.com
A tutorial on secure outsourcing of large scalecomputation for big dataredpel dot com
A tutorial on secure outsourcing of large scalecomputation for big data
for more ieee paper / full abstract / implementation , just visit www.redpel.com
A parallel patient treatment time prediction algorithm and its applications i...redpel dot com
A parallel patient treatment time prediction algorithm and its applications in hospital.
for more ieee paper / full abstract / implementation , just visit www.redpel.com
Understanding P–N Junction Semiconductors: A Beginner’s GuideGS Virdi
Dive into the fundamentals of P–N junctions, the heart of every diode and semiconductor device. In this concise presentation, Dr. G.S. Virdi (Former Chief Scientist, CSIR-CEERI Pilani) covers:
What Is a P–N Junction? Learn how P-type and N-type materials join to create a diode.
Depletion Region & Biasing: See how forward and reverse bias shape the voltage–current behavior.
V–I Characteristics: Understand the curve that defines diode operation.
Real-World Uses: Discover common applications in rectifiers, signal clipping, and more.
Ideal for electronics students, hobbyists, and engineers seeking a clear, practical introduction to P–N junction semiconductors.
Geography Sem II Unit 1C Correlation of Geography with other school subjectsProfDrShaikhImran
The correlation of school subjects refers to the interconnectedness and mutual reinforcement between different academic disciplines. This concept highlights how knowledge and skills in one subject can support, enhance, or overlap with learning in another. Recognizing these correlations helps in creating a more holistic and meaningful educational experience.
A measles outbreak originating in West Texas has been linked to confirmed cases in New Mexico, with additional cases reported in Oklahoma and Kansas. The current case count is 795 from Texas, New Mexico, Oklahoma, and Kansas. 95 individuals have required hospitalization, and 3 deaths, 2 children in Texas and one adult in New Mexico. These fatalities mark the first measles-related deaths in the United States since 2015 and the first pediatric measles death since 2003.
The YSPH Virtual Medical Operations Center Briefs (VMOC) were created as a service-learning project by faculty and graduate students at the Yale School of Public Health in response to the 2010 Haiti Earthquake. Each year, the VMOC Briefs are produced by students enrolled in Environmental Health Science Course 581 - Public Health Emergencies: Disaster Planning and Response. These briefs compile diverse information sources – including status reports, maps, news articles, and web content– into a single, easily digestible document that can be widely shared and used interactively. Key features of this report include:
- Comprehensive Overview: Provides situation updates, maps, relevant news, and web resources.
- Accessibility: Designed for easy reading, wide distribution, and interactive use.
- Collaboration: The “unlocked" format enables other responders to share, copy, and adapt seamlessly. The students learn by doing, quickly discovering how and where to find critical information and presenting it in an easily understood manner.
CBSE - Grade 8 - Science - Chemistry - Metals and Non Metals - WorksheetSritoma Majumder
Introduction
All the materials around us are made up of elements. These elements can be broadly divided into two major groups:
Metals
Non-Metals
Each group has its own unique physical and chemical properties. Let's understand them one by one.
Physical Properties
1. Appearance
Metals: Shiny (lustrous). Example: gold, silver, copper.
Non-metals: Dull appearance (except iodine, which is shiny).
2. Hardness
Metals: Generally hard. Example: iron.
Non-metals: Usually soft (except diamond, a form of carbon, which is very hard).
3. State
Metals: Mostly solids at room temperature (except mercury, which is a liquid).
Non-metals: Can be solids, liquids, or gases. Example: oxygen (gas), bromine (liquid), sulphur (solid).
4. Malleability
Metals: Can be hammered into thin sheets (malleable).
Non-metals: Not malleable. They break when hammered (brittle).
5. Ductility
Metals: Can be drawn into wires (ductile).
Non-metals: Not ductile.
6. Conductivity
Metals: Good conductors of heat and electricity.
Non-metals: Poor conductors (except graphite, which is a good conductor).
7. Sonorous Nature
Metals: Produce a ringing sound when struck.
Non-metals: Do not produce sound.
Chemical Properties
1. Reaction with Oxygen
Metals react with oxygen to form metal oxides.
These metal oxides are usually basic.
Non-metals react with oxygen to form non-metallic oxides.
These oxides are usually acidic.
2. Reaction with Water
Metals:
Some react vigorously (e.g., sodium).
Some react slowly (e.g., iron).
Some do not react at all (e.g., gold, silver).
Non-metals: Generally do not react with water.
3. Reaction with Acids
Metals react with acids to produce salt and hydrogen gas.
Non-metals: Do not react with acids.
4. Reaction with Bases
Some non-metals react with bases to form salts, but this is rare.
Metals generally do not react with bases directly (except amphoteric metals like aluminum and zinc).
Displacement Reaction
More reactive metals can displace less reactive metals from their salt solutions.
Uses of Metals
Iron: Making machines, tools, and buildings.
Aluminum: Used in aircraft, utensils.
Copper: Electrical wires.
Gold and Silver: Jewelry.
Zinc: Coating iron to prevent rusting (galvanization).
Uses of Non-Metals
Oxygen: Breathing.
Nitrogen: Fertilizers.
Chlorine: Water purification.
Carbon: Fuel (coal), steel-making (coke).
Iodine: Medicines.
Alloys
An alloy is a mixture of metals or a metal with a non-metal.
Alloys have improved properties like strength, resistance to rusting.
A measles outbreak originating in West Texas has been linked to confirmed cases in New Mexico, with additional cases reported in Oklahoma and Kansas. The current case count is 817 from Texas, New Mexico, Oklahoma, and Kansas. 97 individuals have required hospitalization, and 3 deaths, 2 children in Texas and one adult in New Mexico. These fatalities mark the first measles-related deaths in the United States since 2015 and the first pediatric measles death since 2003.
The YSPH Virtual Medical Operations Center Briefs (VMOC) were created as a service-learning project by faculty and graduate students at the Yale School of Public Health in response to the 2010 Haiti Earthquake. Each year, the VMOC Briefs are produced by students enrolled in Environmental Health Science Course 581 - Public Health Emergencies: Disaster Planning and Response. These briefs compile diverse information sources – including status reports, maps, news articles, and web content– into a single, easily digestible document that can be widely shared and used interactively. Key features of this report include:
- Comprehensive Overview: Provides situation updates, maps, relevant news, and web resources.
- Accessibility: Designed for easy reading, wide distribution, and interactive use.
- Collaboration: The “unlocked" format enables other responders to share, copy, and adapt seamlessly. The students learn by doing, quickly discovering how and where to find critical information and presenting it in an easily understood manner.
CURRENT CASE COUNT: 817 (As of 05/3/2025)
• Texas: 688 (+20)(62% of these cases are in Gaines County).
• New Mexico: 67 (+1 )(92.4% of the cases are from Eddy County)
• Oklahoma: 16 (+1)
• Kansas: 46 (32% of the cases are from Gray County)
HOSPITALIZATIONS: 97 (+2)
• Texas: 89 (+2) - This is 13.02% of all TX cases.
• New Mexico: 7 - This is 10.6% of all NM cases.
• Kansas: 1 - This is 2.7% of all KS cases.
DEATHS: 3
• Texas: 2 – This is 0.31% of all cases
• New Mexico: 1 – This is 1.54% of all cases
US NATIONAL CASE COUNT: 967 (Confirmed and suspected):
INTERNATIONAL SPREAD (As of 4/2/2025)
• Mexico – 865 (+58)
‒Chihuahua, Mexico: 844 (+58) cases, 3 hospitalizations, 1 fatality
• Canada: 1531 (+270) (This reflects Ontario's Outbreak, which began 11/24)
‒Ontario, Canada – 1243 (+223) cases, 84 hospitalizations.
• Europe: 6,814
pulse ppt.pptx Types of pulse , characteristics of pulse , Alteration of pulsesushreesangita003
what is pulse ?
Purpose
physiology and Regulation of pulse
Characteristics of pulse
factors affecting pulse
Sites of pulse
Alteration of pulse
for BSC Nursing 1st semester
for Gnm Nursing 1st year
Students .
vitalsign
World war-1(Causes & impacts at a glance) PPT by Simanchala Sarab(BABed,sem-4...larencebapu132
This is short and accurate description of World war-1 (1914-18)
It can give you the perfect factual conceptual clarity on the great war
Regards Simanchala Sarab
Student of BABed(ITEP, Secondary stage)in History at Guru Nanak Dev University Amritsar Punjab 🙏🙏
Exploring Substances:
Acidic, Basic, and
Neutral
Welcome to the fascinating world of acids and bases! Join siblings Ashwin and
Keerthi as they explore the colorful world of substances at their school's
National Science Day fair. Their adventure begins with a mysterious white paper
that reveals hidden messages when sprayed with a special liquid.
In this presentation, we'll discover how different substances can be classified as
acidic, basic, or neutral. We'll explore natural indicators like litmus, red rose
extract, and turmeric that help us identify these substances through color
changes. We'll also learn about neutralization reactions and their applications in
our daily lives.
by sandeep swamy
How to manage Multiple Warehouses for multiple floors in odoo point of saleCeline George
The need for multiple warehouses and effective inventory management is crucial for companies aiming to optimize their operations, enhance customer satisfaction, and maintain a competitive edge.
How to Customize Your Financial Reports & Tax Reports With Odoo 17 AccountingCeline George
The Accounting module in Odoo 17 is a complete tool designed to manage all financial aspects of a business. Odoo offers a comprehensive set of tools for generating financial and tax reports, which are crucial for managing a company's finances and ensuring compliance with tax regulations.
How to Subscribe Newsletter From Odoo 18 WebsiteCeline George
Newsletter is a powerful tool that effectively manage the email marketing . It allows us to send professional looking HTML formatted emails. Under the Mailing Lists in Email Marketing we can find all the Newsletter.
The ever evoilving world of science /7th class science curiosity /samyans aca...Sandeep Swamy
The Ever-Evolving World of
Science
Welcome to Grade 7 Science4not just a textbook with facts, but an invitation to
question, experiment, and explore the beautiful world we live in. From tiny cells
inside a leaf to the movement of celestial bodies, from household materials to
underground water flows, this journey will challenge your thinking and expand
your knowledge.
Notice something special about this book? The page numbers follow the playful
flight of a butterfly and a soaring paper plane! Just as these objects take flight,
learning soars when curiosity leads the way. Simple observations, like paper
planes, have inspired scientific explorations throughout history.
The ever evoilving world of science /7th class science curiosity /samyans aca...Sandeep Swamy
A Cloud Gaming System Based on User-Level Virtualization and Its Resource Scheduling
Youhui Zhang, Member, IEEE, Peng Qu, Jiang Cihang, and Weimin Zheng, Member, IEEE
Abstract—Many believe the future of gaming lies in the cloud, namely Cloud Gaming, which renders an interactive gaming application in the cloud and streams the scenes as a video sequence to the player over the Internet. This paper proposes GCloud, a GPU/CPU hybrid cluster for cloud gaming based on user-level virtualization technology. Specifically, we present a performance model to analyze server capacity and games' resource consumption, which categorizes games into two types: CPU-critical and memory-io-critical. Consequently, several scheduling strategies have been proposed to improve resource utilization and compared with others. Simulation tests show that both the First-Fit-like and the Best-Fit-like strategies outperform the others; in particular, they are near-optimal in the batch processing mode. Other test results indicate that GCloud is efficient: an off-the-shelf PC can support five high-end video games running at the same time. In addition, the average per-frame processing delay is 8–19 ms under different image resolutions, which outperforms other similar solutions.

Index Terms—Cloud computing, cloud gaming, resource scheduling, user-level virtualization
1 INTRODUCTION

Cloud gaming provides game-on-demand services over the Internet. This model has several advantages [1]: it allows easy access to games without owning a game console or high-end graphics processing units (GPUs), and game distribution and maintenance become much easier.

For cloud gaming, the response latency is the most essential factor in the quality of gamers' experience "on the cloud". The number of games that can run on one machine simultaneously is another important issue, which makes this mode economical and hence really practical. Thus, to optimize cloud gaming experiences, CPU/GPU hybrid systems are usually employed, because CPU-only solutions are not efficient for graphics rendering.
One of the industrial pioneers of cloud gaming, Onlive,1 emphasized the former: it allocated one GPU per instance for high-end video games. To improve utilization, some other service providers use virtual machine (VM) technology to share the GPU among games running on top of VMs. For example, GaiKai2 and G-cluster3 stream games from cloud servers located around the world to Internet-connected devices. Since the end of 2013, Amazon EC2 has also provided a service for streaming games based on VMs.4
More technical details can be acquired from non-commercial projects. GamePipe [2] is a VM-based cloud cluster of CPU/GPU servers. Its characteristic lies in that not only cloud resources but also the local resources of clients can be employed to improve the gaming quality. Another system, GamingAnywhere [3], has used user-level virtualization technology; compared with some solutions, its processing delay is lower.

Besides, task scheduling is regarded as another key issue for improving the utilization of resources, which has been verified in the high-performance GPU-computing field [4], [5], [6], [7]. However, to the best of our knowledge, scheduling research for cloud gaming has not received much attention yet. One example based on VMs is VGRIS [8] (including its successor VGASA [9]): it is a GPU-resource management framework in the host OS that schedules the virtualized resources of guest OSes.
This paper proposes the design of a GPU/CPU hybrid system for cloud gaming and its prototype, GCloud. GCloud uses user-level virtualization technology to implement a sandbox for different types of games, which can isolate multiple game instances from each other on a game server, transparently capture each game's video/audio outputs for streaming, and handle the remote client device's inputs.
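The transparent capture mentioned above can be illustrated with a minimal sketch (not GCloud's actual implementation, and all names are hypothetical): the sandbox interposes on the game's frame-presentation call, so every frame is copied into a streaming pipeline before the original graphics API proceeds.

```python
# Hypothetical illustration of user-level virtualization by API interception:
# the sandbox wraps the game's frame-presentation call so each frame is
# transparently captured for streaming before being displayed as usual.

def make_sandboxed_present(real_present, stream_queue):
    """Wrap the graphics API's present() so the sandbox sees each frame."""
    def present(frame):
        stream_queue.append(frame)   # capture for video encoding/streaming
        return real_present(frame)   # then let the original call proceed
    return present
```

In a real system the interposition would happen at the native API layer (e.g., the Direct3D present call), but the wrapping pattern is the same.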
Moreover, a performance model has been presented; with it we have analyzed the resource consumption of games and the performance bottleneck(s) of a server, through extensive experiments using a variety of hardware performance counters. Accordingly, several task-scheduling strategies have been designed to improve server utilization and have been evaluated respectively.
Different from related research, we focus on the guideline for task assignment: on the reception of a game-launch request, we should judge whether a server is suitable to undertake the new instance, under the condition of satisfying the performance requirements.
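As a rough sketch of this admission decision, assume each server exposes its remaining CPU and memory-I/O headroom and each game has an estimated consumption (the data structures below are made up for illustration; the paper's actual model is derived from hardware performance counters):

```python
# Hypothetical sketch of First-Fit-like vs. Best-Fit-like task assignment.
# A server can host a new game instance only if both its CPU headroom and
# its memory-I/O headroom cover the game's estimated consumption.

def fits(server, game):
    """True if the server can absorb the game's estimated load."""
    return (server["cpu_free"] >= game["cpu"] and
            server["memio_free"] >= game["memio"])

def first_fit(servers, game):
    """Assign to the first server with enough headroom."""
    for s in servers:
        if fits(s, game):
            return s
    return None  # no server can satisfy the performance requirements

def best_fit(servers, game):
    """Assign to the feasible server with the least leftover capacity."""
    candidates = [s for s in servers if fits(s, game)]
    if not candidates:
        return None
    return min(candidates,
               key=lambda s: (s["cpu_free"] - game["cpu"]) +
                             (s["memio_free"] - game["memio"]))
```

Best-Fit-like packing fills nearly-full servers first and so keeps larger contiguous headroom elsewhere, which is one intuition for why such strategies pack instances densely in batch mode.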
In addition, from the aspect of user-level virtualization (there are some existing user-level solutions, like GamingAnywhere [3]), GCloud has its own characteristics:
1. http://www.onlive.com/
2. https://www.gaikai.com/
3. http://www.g-cluster.com/eng/
4. https://aws.amazon.com/game-hosting/
The authors are with the Department of Computer Science and Technology, Tsinghua University, Beijing, China. E-mail: {zyh02, zwm-dcs}@tsinghua.edu.cn, [email protected], [email protected].
Manuscript received 13 Nov. 2014; revised 11 May 2015; accepted 11 May 2015. Date of publication 14 May 2015; date of current version 13 Apr. 2016.
Recommended for acceptance by Y. Wang.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TPDS.2015.2433916
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 27, NO. 5, MAY 2016 1239
1045-9219 © 2015 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
First, it implements a virtual input layer for each concurrently-running instance, rather than a system-wide one, which can support more than one Direct3D game at the same time. Second, it designs a virtual storage layer to transparently store each client's configurations across all servers, which has not been addressed by related projects.
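A minimal sketch of what such a virtual storage layer might do, assuming it intercepts the game's file paths and redirects accesses under the game's configuration directory to a per-client store (all directory names here are made up):

```python
import os.path

# Hypothetical sketch of a virtual storage layer: file accesses that a game
# makes under its configuration directory are transparently redirected to a
# per-client path, so each player's settings follow them across servers.

def redirect_path(path, game_config_dir, client_store):
    """Rewrite paths under the game's config dir into the client's store."""
    norm = os.path.normpath(path)
    base = os.path.normpath(game_config_dir)
    if norm == base or norm.startswith(base + os.sep):
        return os.path.join(client_store, os.path.relpath(norm, base))
    return path  # non-config accesses pass through untouched
```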
In summary, the following contributions have been
accomplished:
1) Enabling technologies based on light-weight virtualization are introduced, especially those behind GCloud's characteristics. (Section 3)
2) To balance gaming responsiveness and costs, we adopt a "just good enough" principle to fix the FPS (frames per second) of games at an acceptable level. Under this principle, a performance model is constructed to analyze the resource consumption of games, which categorizes games into two types: CPU-critical and memory-io-critical; thus several scheduling mechanisms have been presented to improve utilization and compared. In addition, different from previous work focused on the GPU resource, our work has found that the host CPU or the memory bus is the system bottleneck when several games are running simultaneously. (Section 4)
3) Such a cloud-gaming cluster has been constructed, which supports the mainstream game types. Test results show that GCloud is highly efficient: an off-the-shelf PC can support up to five concurrently-running video games (each game's image resolution is 1024 × 768 and the frame rate is 30 FPS). The average per-frame processing delay is 8–19 ms under different image resolutions, which can satisfy the stringent delay requirements of highly interactive games. Tests have also verified the effects of our performance model. (Section 5)
The remainder of this paper is organized as follows.
Section 2 presents the background knowledge of cloud gam-
ing as well as related work. Sections 3 and 4 are the main
part: the former introduces the user-level virtualization
framework and enabling technologies; the performance
model and its analysis method are given in the latter, as well
as the scheduling strategies. Section 5 presents the prototype
cluster and evaluates its performance. Section 6 concludes.
2 RELATED WORK
2.1 Cloud Gaming
Cloud gaming is a type of online gaming that allows direct and on-demand streaming of game scenes to networked devices, in which the actual game runs on the server end (the main steps are described in Fig. 1). Moreover, to ensure interactivity, all of these serial operations must happen on the order of milliseconds, which critically challenges the system design.
The sum of these latencies is defined as the interaction delay. Existing research [10] has shown that different types of games impose different requirements.
One type of cloud-gaming solution is VM-based. For solutions based on VMs, Step 1 is completed in the guest OS while the other server-end steps are accomplished by the host. Barboza et al. [11] present such a solution, which provides cloud gaming services and uses three levels of managers for the cloud, hosts and clients. Some existing work, like GaiKai, G-cluster, Amazon EC2 for streaming games and GamePipe [2], also belongs to this category.
In contrast to VM-based solutions, the user-level solution
inserts the virtualization layer between applications and the
run-time environment. This mode simplifies the processing
stack; thus it can reduce the extra overhead. GamingAny-
where [3] is such a user-level implementation, which sup-
ports Direct3D/SDL games on Windows and SDL games on
Linux.
Some solutions have enhanced the thin-client protocol to support interactive gaming applications. Depending on the concrete implementation, they can be classified into these two types. For example, Winter et al. [12] have enhanced the
thin-client server driver to integrate a real-time desktop
streamer to stream the graphical output of applications after
GPU processing, which can be regarded as a light-weight
virtualization-based solution. In contrast, Muse [13] uses
VMs to isolate and share GPU resources on the cloud-end,
which has enhanced the remote frame buffer (RFB) protocol
to compress the frame-buffer contents of server-side VMs.
However, this research has focused on optimizing the interaction delay, i.e., the performance of a single game on the cloud, rather than the interference between concurrently-running instances. Moreover, none of these systems presents a specific scheduling strategy.
2.2 Resource Scheduling
For high performance computing (HPC), GPU virtualization has been widely researched [14], [15], [16] for general-purpose computing. From the scheduling viewpoint, there are also several studies, including Phull et al. [4], Ravi et al. [5], Elliott and Anderson [6], L. Chen et al. [7] and Bautin et al. [17].
Fig. 1. The whole workflow of cloud-gaming.
However, none of these studies has considered cloud-gaming characteristics, including the critical demands on processing latency, highly-coupled sequential operations and so on.
The work on the scheduling for cloud gaming is limited:
VGRIS [8] and its successor VGASA [9] are resource man-
agement frameworks for VM-based GPU resources, which
have implemented several scheduling algorithms for differ-
ent objectives. However, they are focused on scheduling
rendering tasks on a GPU, without considering other tasks
like image-capture / -encoding, etc. iCloudAccess [18] has
proposed an online control algorithm to perform gaming-
request dispatching and VM-server provisioning to reduce
latencies of the cloud gaming platform. A recent work is
[19], which has studied the optimized placement of cloud-
gaming-enabled VMs. The proposed heuristic algorithms
are efficient and nearly optimal. Our work can be regarded as complementary to these studies, because they focus on VM-granularity dispatching / provisioning while we pay attention to issues inside an OS.
One related work on GPU-scheduling (but not cloud-gam-
ing-specific) is TimeGraph [20]: it is a real-time GPU scheduler
that has modified the device-driver for protecting important
GPU workloads from performance interference. Similarly, it
has not considered the cloud gaming characteristics.
Another category of related research [21], [22] concerns streaming-media applications. For example, Cherkasova and Staley [21] developed a workload-aware performance model for video-on-demand (VOD) applications, which is helpful for measuring the capacity of a streaming server as well as the resource requirements. We referred to their design principles when constructing our performance model.
2.3 Others
To improve processing efficiency and adaptation, Wang and Dey [23] propose a rendering adaptation technique to adapt the game rendering parameters to satisfy Cloud Mobile Gaming's constraints. Klionsky [24] has presented an architecture which amortizes the cost of rendering across users. However, these two technologies are not transparent to games.
In addition, Jurgelionis et al. [25] explored the impact of
networking on gaming; Ojala and Tyrvainen [26] developed
a business model behind a cloud gaming company.
As a summary, compared with the above-mentioned work, GCloud has the following features:
1) It is based on user-level virtualization. Compared with existing user-level solutions, GCloud proposes more thorough solutions for virtual input / storage.
2) From the aspect of performance modeling and scheduling, more real jobs (including image capture, encoding, etc.) have been considered (compared with VGRIS / VGASA [8], [9]). In addition, we use hardware-assisted video encoding to mitigate the interference between games and to improve performance.
3) Last but not least, our work focuses on issues inside a node, while [18], [19] work at the VM granularity.
4) Furthermore, quite a few studies have been carried out to measure the performance of cloud gaming systems, like [27], [28], [29] and [30]. We also referred to them to complete our measurements.
3 SYSTEM ARCHITECTURE AND ENABLING
TECHNOLOGIES
3.1 The Framework
The system (in Fig. 2) is built with a cluster of CPU / GPU-
hybrid computing servers; a dedicated storage server is
used as the shared storage. Each computing server can host
the execution of several games simultaneously. One of these
servers is employed as the manager-node, which collects
real-time running information of all servers and completes
management tasks, including the task-assignment, user
authentication, etc.
It is necessary to note that the framework in Fig. 2 is for small / medium system scales. For a large-scale system with many users, a hierarchical architecture is needed to avoid an information-exchange bottleneck. In fact, because the quality of gamers' experience highly depends on the response latency and the latter is sensitive to the physical distance between clients and servers, the architecture may be geographically distributed, which is out of the scope of this paper. It also means that in one site the scale will not be very large.5
Initially, gaming-agents on available computing servers
register to the manager, indicating that they are ready and
Fig. 2. System architecture.
5. According to OnLive, the theoretical upper bound of the distance
between a user and a cloud gaming server is approximately 1,000 miles.
In China, some gaming systems provide services for just one city or sev-
eral cities.
ZHANG ET AL.: A CLOUD GAMING SYSTEM BASED ON USER-LEVEL VIRTUALIZATION AND ITS RESOURCE SCHEDULING 1241
which games they can execute. When a client wants to play
some game, the manager will search for candidates among
the registered information. After such a server has been cho-
sen, a start-up command will be sent to the corresponding
agent to boot up the game within a light-weight virtualiza-
tion environment. Then, its address will be sent to the client.
Future communication will be done directly between the
two ends.
During the run time, each agent collects local runtime
information and sends it to the manager periodically; the
latter can get the latest status of resource-consumptions.
The storage server plays an important role in providing the personalized game configuration for each user. For instance, suppose User A had played Game B on Server C. Now A wants to play the game again while the manager finds that Server C's resources have been depleted. Then the task has to be assigned to another server, D. Consequently, it is necessary to restore A's configurations of B on D, including the game's progress and other customized information. The storage server is used as the shared storage for all computing nodes.
3.2 The User-Level Virtualization Environment
For each game, API interception is employed to implement a lightweight virtualization environment. API interception means intercepting calls from the application to the underlying running system; typical applications include software streaming [31], [32], etc. Here it is used to catch the corresponding resource-access APIs of the game. In addition, our main target platform is MS Windows, as Windows dominates the PC video-game world.
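The interception idea can be illustrated loosely with the following sketch. The real system hooks native Win32/Direct3D entry points (e.g., via injected code); Python attribute patching is used here purely as an analogy for the wrap-forward-observe pattern, and all names are illustrative, not the paper's code.

```python
# Analogy for API interception: wrap a Present-like call so that every
# invocation is forwarded to the real API, then extra work (here, a
# frame-capture callback) runs transparently to the "game".

class Game:
    """Stand-in for a game that calls a Present-like API every frame."""
    def present(self):
        return "frame-on-screen"

def install_hook(game, on_present):
    original = game.present          # keep a reference to the real API
    def hooked():
        result = original()          # forward to the real call
        on_present()                 # extra work, e.g. capture the frame
        return result
    game.present = hooked            # the game now calls the hook instead

captured = []
g = Game()
install_hook(g, lambda: captured.append("grabbed"))
g.present()                          # the game loop is unchanged
```

The game's own loop is untouched; the hook is what gives the virtualization layer its chance to capture images, audio and input without modifying the game.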
3.2.1 Image Capture
Usually, gaming applications employ the mainstream 3D
computer-graphics-rendering libraries, like Direct3D or
OpenGL, to complete the hardware (GPU) acceleration;
GCloud supports both of them.
In the case of Direct3D, the typical workflow of a game is
usually an endless loop: First, some CPU computation pre-
pares the data for the GPU, e.g., calculating objects in the
upcoming frame. Then, the data is uploaded to the GPU
buffer and the GPU performs the computation, e.g., render-
ing, using its buffer contents and fills the front buffer. To
fetch contents of the image into the system memory for the
consequent processing, we intercept the Direct3D’s Present
API.
For OpenGL, we have intercepted the Present-like API in
OpenGL, glutSwapBuffers, to capture images.
For other games based on a common GUI window, we just set a timer for the application's main window and intercept the corresponding message handler to capture the image of the target window periodically.
3.2.2 Audio Capture
Capturing of audio data is a platform-dependent task.
Because our main target platform is MS Windows, we inter-
cept Windows Audio Session APIs to capture the sound.
Core Audio serves as the foundation of quite a few higher-
level APIs; thus this method can bring about the best
adaptability.
3.2.3 Virtual Input Layer
Flash-based or OpenGL-based applications usually use the window's default message loop to handle inputs.
Thus, the solution is straightforward: We inject a dedicated
input-thread into the intercepted game-process. On recep-
tion of any control command from the client, this thread
will convert it into a local input message and send it to the
target window.
For Direct3D-based games, the situation is more complicated. Existing work [3] replays input events using the SendInput API on Windows. However, SendInput inserts events into a system-wide queue, rather than the queue of a specific process, so it is difficult to support more than one instance in a non-VM solution. To overcome this problem, we intercepted quite a few DirectInput APIs to simulate input queues for any virtualized application; thus the user's input can be pushed into these queues and made accessible to applications.
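The idea behind the simulated DirectInput queues can be sketched as follows: each virtualized instance gets a private event queue, so input is delivered per instance rather than through one system-wide queue. All names here are hypothetical illustrations, not the paper's implementation.

```python
# Per-instance input queues: the fix for SendInput's system-wide queue.
from collections import deque

input_queues = {}   # instance id -> private event queue

def register_instance(instance_id):
    input_queues[instance_id] = deque()

def push_client_event(instance_id, event):
    # Deliver the client's converted control command only to the
    # target instance's queue, not to a system-wide queue.
    input_queues[instance_id].append(event)

def poll_events(instance_id):
    # What the intercepted DirectInput read would drain each frame.
    q = input_queues[instance_id]
    events = list(q)
    q.clear()
    return events

register_instance("game-a")
register_instance("game-b")
push_client_event("game-a", ("KEY_DOWN", "W"))
push_client_event("game-b", ("MOUSE_MOVE", (10, 4)))
```

Because each instance drains only its own queue, two games on the same host no longer contend for one event stream.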
3.2.4 Virtual Storage Layer
From the storage aspect, a program can be divided into three parts [31]: Parts 1 and 2 include all resources provided by the OS and those created/modified by the installation process; Part 3 is the data created/modified/deleted during run time, which contains each user's game configurations. For the immutable parts, it is relatively easy to distribute them to servers through some system-clone method. The focus is how to migrate Part 3 resources across servers to provide personalized game configurations for users.
We construct a virtual storage layer by intercepting the file-system and registry-access APIs of all games. During the run time, resources modified by the game instance are moved into Part 3. When the case described in Section 3.1 occurs, the virtual storage layer of Game B on the current server can redirect resource accesses to the shared storage to fetch the latest configurations of User A, which were stored by the last run on Server C.
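A minimal sketch of the redirection rule, under stated assumptions: accesses to per-user mutable state (Part 3) are rewritten to a per-user, per-game directory on the shared storage server, while the immutable Parts 1 and 2 stay local. The mount point and path layout are assumptions for illustration.

```python
# Virtual-storage redirection: intercepted file-system calls pass their
# path through this function before opening the real file.
import posixpath

SHARED_ROOT = "/mnt/shared"      # assumed mount point of the storage server

def redirect(path, user, game, part3_prefixes):
    """Return the real path the intercepted file-system API should open."""
    for prefix in part3_prefixes:
        if path.startswith(prefix):
            rel = path[len(prefix):].lstrip("/")
            # Per-user, per-game directory on the shared storage server,
            # so the latest configurations follow the user across servers.
            return posixpath.join(SHARED_ROOT, user, game, rel)
    return path                   # immutable Parts 1-2: untouched

p = redirect("/games/B/saves/slot1.dat", "userA", "gameB", ["/games/B/saves"])
```

With this rule, User A's save written on Server C lands on the shared storage, and the same redirect on Server D resolves to the same file.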
4 PERFORMANCE MODEL AND TASK SCHEDULING
As mentioned in Section 1, the response latency and the
number of games that one machine can execute simulta-
neously are both essential to a cloud gaming system. To a
large extent, they are in conflict, and existing systems (like [3], [11], [12]) usually focus on the first issue. However, that is not always economical. For example, if the FPS of a given game is too high, it will consume more resources. Moreover, lossy compression will counteract the high video quality to a certain extent.
Some scheduling work, like VGRIS / VGASA [8], [9], has presented multi-task scheduling strategies. There are several essential differences between our work and VGRIS / VGASA: First, they focus on how to schedule existing games on a server, including allocating enough GPU resources for a game, etc. In contrast, GCloud focuses on the assignment of a new task. Second, they focus on the GPU resource and no other operations (like image capture, encoding, etc.) are considered, while our tests (presented in Section 4.4) show the host CPU or the memory bus is the bottleneck. Third, VGRIS and VGASA are VM-specific.
In this paper, we adopt a "just good enough" strategy, which means that we keep the game quality at some acceptable level and then try to satisfy the interactivity requirements of as many games as possible. Hence, there are two main issues:
Issue 1: For a given server and its running game instances,
how to make sure the game quality is acceptable?
Issue 2: On an incoming request, which server is suitable to
launch the new game instance?
For Issue 1, we first give a brief pipeline model for cloud gaming, which can be used to judge whether the game quality is acceptable or not. Second, a method to fix the FPS is presented to provide the "just good enough" quality; a hardware-assisted video-encoding technique is also used to further mitigate the interference between games. For Issue 2, several resource metrics are given. Then we carry out tests to measure the server capacity and to categorize games into different types. Accordingly, we design a server capacity model and corresponding task-assignment strategies. These strategies have been compared with others.
4.1 Game Quality
A cloud gaming system's interaction delay contains three parts [27]: (1) Network delay, the time required for a round of data exchange between the server and client; (2) Play-out delay, the time required for the client to handle the received frames for playback; (3) Processing delay, the time required for the server to process a player's command, and to encode and send the corresponding frame back.
This paper is mainly about the server side and the network is assumed to provide sufficient bandwidth; thus we focus on the processing delay, which should be confined within a limited range. The work [25] on measuring the latency of cloud gaming has disclosed that, for some existing service providers (like OnLive), the processing delay is about 100–200 ms. Thus, we use 100 ms as our scheduling target, denoted MAX_PD. Another key metric is the FPS; the required FPS is denoted FIXED_FPS. In this work, FIXED_FPS is set to 30 by default.
As presented by Fig. 1, the gaming workflow can be
regarded as a pipeline including four steps: operations of
gaming logic, graphic rendering (including the image cap-
ture), encoding (including the color-space conversion) and
transmission. In addition, our tests show that, given sufficient bandwidth, the transmission delay is much less than that of the other steps. Thus, the fourth step can be skipped and we focus on the remaining three.
Furthermore, the first two steps are completed by the intercepted process, whose internals we cannot observe separately; thus we combine them together and denote the sum of their latencies by Tpresent. The average processing time of the encoding step is denoted by Tencoding (the pipeline is presented in Fig. 3). Hence, if the following conditions (referred to as the Responsiveness Conditions) are satisfied, the requirements on the FPS and processing delay will undoubtedly be met. To be more precise, satisfaction of the first two conditions implies the last one in the default case.
Tpresent ≤ 1 / FIXED_FPS and (1)
Tencoding ≤ 1 / FIXED_FPS and (2)
Tencoding + Tpresent ≤ MAX_PD (3)
4.2 Fixed FPS
To provide the “just good enough” gaming quality, the FPS
value should be fixed to some acceptable level (Issue 1).
Because the interface of GPU drivers is not open, our solu-
tion is in the user-space, too.
Taking a Direct3D game as an example, we intercept the Present API to insert a Sleep call for adjusting the loop latency: The rendering complexity is mostly affected by the complexity of gaming scenes, and the latter changes gradually. Thus, it is reasonable to predict Tpresent based on its own historical information. In the implementation, the average time (denoted Tavg_present) of the past 100 loops is used as the prediction for the upcoming one (a similar method is adopted by [8], [9]) and the sleep time (Tsleep) is calculated as:

Tsleep = 1 / FIXED_FPS − Tavg_present
The true problem lies in how to judge whether a busy
server is suitable to undertake a new game instance or not.
Thus, we should solve Issue 2 anyway.
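The fixed-FPS mechanism above can be sketched as follows, assuming a 100-loop moving average and a sleep time clamped at zero (so a scene that already exceeds the frame budget is never delayed further); class and variable names are illustrative.

```python
# Fixed-FPS limiter: predict Tpresent from the last 100 loops and sleep
# for the remainder of the frame budget, Tsleep = 1/FIXED_FPS - Tavg_present.
from collections import deque

FIXED_FPS = 30
FRAME_BUDGET = 1.0 / FIXED_FPS          # seconds per frame

class FpsLimiter:
    def __init__(self, window=100):
        self.history = deque(maxlen=window)   # last `window` loop latencies

    def sleep_time(self, t_present):
        self.history.append(t_present)
        t_avg = sum(self.history) / len(self.history)
        return max(0.0, FRAME_BUDGET - t_avg)  # never sleep a negative time

limiter = FpsLimiter()
# A light scene: rendering takes ~10 ms, so we sleep the rest of the budget.
s = limiter.sleep_time(0.010)
```

In the real system this sleep is inserted inside the intercepted Present call, so the game itself never sees the pacing logic.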
4.3 Hardware-Assisted Video Encoding
The fixed FPS can mitigate the interference between games because it allocates just enough resources for rendering. Further, we use the hardware-assisted video-encoding capability of commodity CPUs for less interference.
The hardware technology of Intel CPUs, Quick Sync, has been employed. It provides a full-hardware pipeline to compress raw images in the RGB or YUV format into H.264 video. Quick Sync has now become one of the mainstream hardware encoding technologies.6
On the test server, a Quick-Sync-enabled CPU can simultaneously support up to twenty 30-FPS encoding tasks (at a 1024 × 768 image resolution); the latency for one frame is as low as 4.9 ms.
Fig. 3. Gaming pipeline.
6. Quick Sync was introduced with the Sandy Bridge CPU micro-architecture. It is part of the integrated graphics processor on the same die as the CPU. Thus, to enable it to work with a discrete graphics card (used for gaming), some special configuration should be set up, as described by http://mirillis.com/en/products/tutorials/action-tutorial-intel-quick-sync-setup_for_desktops.html. For AMD, its Accelerated Processing Unit (APU) has a similar function.
Moreover, the CPU utilization of one such task is almost negligible, less than 0.2 percent. (Details are presented in Appendix A, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPDS.2015.2433916.) This result means it causes little interference to other tasks. Thus, we use it as the reference implementation in all following tests, as well as in the system prototype.
4.4 Resource-Metrics
We focus on five types of system resources: the CPU, GPU, system RAM, video RAM and the system bandwidth. The first two are denoted by utilization ratios; the next two are represented by memory consumption; and the last refers to the miss number of the LLC (Last Level Cache). Correspondingly, the server capacity and the average resource requirements of a game (under the condition satisfying the Responsiveness Conditions) can be denoted by a tuple of five items, ⟨U_CPU, U_GPU, M_HOST, M_GPU, B⟩.
Based on the above metrics, we should judge whether the remaining resource capacities of a server can meet the demands of a new game or not. The key lies in how to measure the capacity of a server, as well as the game requirements. We present the following method to accomplish these tasks, namely, to solve Issue 2.
4.4.1 Test Methods
Commercial GPUs usually implement driver / hardware counters to provide runtime performance information. For example, NVIDIA's PerfKit APIs7 can collect resource-consumption information of each GPU in real time. Hence, we can get results accumulated since the previous time the GPU was sampled, including the percentage of time the GPU is idle/busy, the consumption of graphics memory, etc.
For commodity CPUs, a similar method is used. For instance, Intel already provides the capability to monitor performance events inside processors. Through its performance counter monitor (PCM), many performance-related events per CPU core, including the number of LLC misses, instructions per CPU cycle, etc., can be obtained periodically.
The sample periods for the CPU and GPU are both set to 3 s.
In addition, we embed monitoring code into the intercepted gaming APIs to record the processing delay of each frame, which will be used to judge whether the Responsiveness Conditions have been met or not.
Moreover, it is necessary to note that the integrated graphics processor (which contains the Quick Sync encoding engine) shares the LLC with the CPU cores and there is no on-chip graphics memory.8 Thus the hardware encoding process needs to access the system memory (if the required data misses in the LLC), which means the corresponding miss number is still suitable for indicating the memory throughput with hardware encoding.
In addition, we select four representative games, includ-
ing three Direct3D video games (Direct3D is the most popu-
lar development library for PC video games) and one
OpenGL game. They are:
1) Need for Speed-Most Wanted (abbreviated to NFS).
It is a classic racing video game.
2) Modern Combat 2-Black Pegasus (abbreviated to
Combat), a first-person shooter video game.
3) Elder Scrolls: Skyrim-Dragonborn (abbreviated to
Scrolls), an action role-playing video game.
4) Angry Birds Classic (abbreviated to Birds), the well-known mobile-phone game's PC version.
Several volunteers were invited to play games on the cloud gaming system and encouraged to play quite a few game scenes; the duration was more than 15 minutes for each game. After several loops, runtime information was collected for further analysis.
4.4.2 Test Cases
A Windows 7 (64-bit) PC is used as the server, which is equipped with an NVIDIA GTX780 GPU adapter (3 GB video memory), a common Core i7 CPU (four cores, 3.4 GHz) and 8 GB RAM. By default, games are streamed at a resolution of 1024 × 768 and the game picture quality is set to medium in all cases; the FPS is fixed at 30. Video encoding is completed by Quick Sync.
Single instance (resource-requirement tests). Each game was played in our virtualization environment alone and resource consumption was recorded in real time. As expected, the Responsiveness Conditions can be met for each game on this powerful machine; the corresponding resource requirements are presented in Table 1. Considering resource consolidation, the average value of each item of the tuple has been used.
Multiple instances running simultaneously. Quite a few game groups were executed and sampled simultaneously. For example, we played 2–6 NFS instances at the same time. Based on the runtime information, we can see that this server can support up to five acceptable instances simultaneously (we consider a game's running quality acceptable if its average FPS value is not less than 90 percent of FIXED_FPS). While six instances are running, the FPS value is less than 27, which is regarded as unacceptable.
Furthermore, we should identify the bottleneck that is pivotal for task assignment. Considering the following facts (in Fig. 4a), NFS is memory-io-critical:
When no more than five games are running simultaneously, the average FPS is stable (about 30.3) and the value of million-misses-per-second increases almost linearly. When six instances are running, the FPS is about 24.7 and the throughput remains nearly unchanged (from 37.6 to 37.9). At the same time, both U_GPU and U_CPU are far from exhausted, at 47 and 71 percent respectively. This phenomenon indicates that memory accesses have impeded tasks from utilizing the CPU/GPU resources efficiently. Moreover, memory consumption is not the bottleneck; thus no swap operations will happen (for clarity, the information on memory consumption is skipped in these figures).
For Combat and Scrolls (in Figs. 4b and 4c), the same conclusion holds: under the condition satisfying
7. http://www.nvidia.com/object/nvperfkit_home.html
8. http://www.hardwaresecrets.com/printpage/Inside-the-Intel-Sandy-Bridge-Microarchitecture/1161
the performance requirements, there can be at most three concurrent instances of Scrolls. For Combat, the maximum number of instances is five. At the same time, both U_GPU and U_CPU are limited, too. On the other hand, Birds (in Fig. 4d) is CPU-critical because it can exhaust the CPU (97 percent with 10 instances running and an average FPS of 27.1), while the value of million-misses-per-second increases almost linearly.
4.4.3 Modeling
Based on the previous results, we have normalized the resource requirements and the server capacity; the principle is critical-resource-first: (1) For a memory-io-critical game of which the game server can host Ni instances, the fifth item (Bandwidth) of its tuple is set to MAX_SYSTEM_THROUGHPUT9 / Ni, regardless of the absolute value. (2) For any CPU-critical game of which the game server can host Nj instances, the value of its U_CPU is set to 1 / Nj. (3) The other tuple items are kept unchanged.
For example, the tuple of NFS is ⟨9.15 percent, 2.01 percent, 526, 220, MAX_SYSTEM_THROUGHPUT / 5⟩, and the Birds tuple is ⟨100 percent / 10, 1.1 percent, 181, 142, 6.54⟩. Tuples of these four games are listed in Table 2.
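The critical-resource-first normalization can be sketched as below. The concrete MAX_SYSTEM_THROUGHPUT value is an assumption for illustration (the saturation throughput of roughly 37.6 million misses per second observed in the NFS tests); the game tuples are the measured values from Table 1.

```python
# Critical-resource-first normalization: rewrite only the critical item of
# a game's tuple from the maximum instance count N the server sustained.
MAX_SYSTEM_THROUGHPUT = 37.6   # assumed million LLC misses/s the system sustains

def normalize(tup, kind, max_instances):
    u_cpu, u_gpu, m_host, m_gpu, b = tup
    if kind == "memory-io-critical":
        # The bandwidth item becomes an equal share of the sustainable total.
        b = MAX_SYSTEM_THROUGHPUT / max_instances
    elif kind == "cpu-critical":
        # The CPU item becomes an equal share of 100 percent.
        u_cpu = 100.0 / max_instances
    return (u_cpu, u_gpu, m_host, m_gpu, b)

nfs = normalize((9.15, 2.01, 526, 220, 8.10), "memory-io-critical", 5)
birds = normalize((9.36, 1.1, 181, 142, 6.54), "cpu-critical", 10)
```

The rewrite deliberately discards the absolute measured value of the critical item, because what matters for admission is the share of the bottleneck resource one instance occupies.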
Then, for a set of M games (each denoted as Gamei, 0 ≤ i < M), if the sum of each kind of resource consumption is less than the corresponding system capacity, we consider that these games can run simultaneously and smoothly. Formally, we use the following notations:
⟨U_CPUgame_i, U_GPUgame_i, M_HOSTgame_i, M_GPUgame_i, Bgame_i⟩: the tuple of resource requirements of Gamei;
⟨100%, 100%, SERVER_RAM_CAPACITY, SERVER_VIDEO_RAM_CAPACITY, MAX_SYSTEM_THROUGHPUT⟩server: the capacity of a given server.
If the following conditions are met, this server can host all games of the set running simultaneously.
Σ0≤i<M U_CPUgame_i ≤ 100%
Σ0≤i<M U_GPUgame_i ≤ 100%
Σ0≤i<M M_HOSTgame_i ≤ SERVER_RAM_CAPACITY
Σ0≤i<M M_GPUgame_i ≤ SERVER_VIDEO_RAM_CAPACITY
Σ0≤i<M Bgame_i ≤ MAX_SYSTEM_THROUGHPUT
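The admission test above is a per-resource sum against capacity. A minimal sketch, using the paper's ⟨U_CPU, U_GPU, M_HOST, M_GPU, B⟩ tuple order; the capacity numbers and the 37.6 throughput value are illustrative assumptions. The example reproduces the paper's case of one Scrolls, one Combat and two NFS fitting, with an extra NFS overflowing on B:

```python
# Admission check: a server can host a set of games iff, for every one of
# the five resources, the summed requirements stay within capacity.
def can_host(game_tuples, capacity):
    totals = [sum(t[k] for t in game_tuples) for k in range(5)]
    return all(total <= cap for total, cap in zip(totals, capacity))

CAPACITY = (100.0, 100.0, 8192, 3072, 37.6)   # assumed server capacity
scrolls = (14.55, 7.02, 795, 560, 37.6 / 3)   # normalized B: max 3 instances
combat  = (8.47, 3.27, 800, 296, 37.6 / 5)    # normalized B: max 5 instances
nfs     = (9.15, 2.01, 526, 220, 37.6 / 5)

ok  = can_host([scrolls, combat, nfs, nfs], CAPACITY)       # fits
bad = can_host([scrolls, combat, nfs, nfs, nfs], CAPACITY)  # B overflows
```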
Fig. 4. FPS and resource-consumptions of games.
TABLE 1
Resource-Requirements of Each Game

Game      U_CPU (%)   U_GPU (%)   M_HOST (MB)   M_GPU (MB)   B (million misses/s)
NFS        9.15        2.01        526           220           8.10
Scrolls   14.55        7.02        795           560          13.52
Combat     8.47        3.27        800           296           7.97
Birds      9.36        1.1         181           142           6.54
9. MAX_SYSTEM_THROUGHPUT refers to the maximal number of LLC misses per second that the system can sustain. It can be evaluated by a specially-designed program that accesses the memory space randomly and intensively.
For example, one Scrolls, one Combat and two NFS instances can run at the same time; if an extra NFS joins, this condition will not be met and the bottleneck is B. Quite a few tests with real games are given in Section 5.1 to verify this design.
4.5 The Scheduling Strategy
In summary, the following procedure for task assignment is used, which contains two stages.
Ready stage: when a game is brought online, it is tested to obtain its resource requirements. Then, for any game (denoted as Game_i), a tuple ⟨U_CPU, U_GPU, M_HOST, M_GPU, B⟩game_i can be given to represent its requirements.
In addition, for any Server_j, its capacity is denoted as ⟨U_CPU, U_GPU, M_HOST, M_GPU, B⟩server_j. The corresponding test process has been described in the previous paragraphs and each element is labeled with the corresponding maximum capacity.
Runtime stage: During the run time, the current resource consumption of each server (denoted as ⟨U_CPU, U_GPU, M_HOST, M_GPU, B⟩server_j_cur; in our prototype, the average value over the latest minute is used) is sampled periodically.
Moreover, the main goal of our scheduling strategy is to minimize the number of servers used, which can be regarded as a bin-packing problem. Several theoretical studies [33], [34] have claimed that the First-fit and Best-fit algorithms behave well for this problem, especially for the online version with requests inserted in random order [34]. Thus, we have designed two heuristic task-scheduling algorithms based on the well-known First-fit and Best-fit strategies, namely first-fit-like (FFL) and best-fit-like (BFL). The principle is straightforward; thus we only give their outlines here:
In FFL, for a given request for game_i, all servers are checked in order; if one server (for example, server_j) can host the new game, which means that each kind of resource consumption over all games on server_j (including game_i) does not exceed the capacity, the algorithm ends successfully.
In BFL, the procedure is similar. The difference lies in that, if there is more than one suitable server, the one that would leave the least amount of the critical resource is chosen as the best.
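The two outlines can be sketched as follows, reusing the five-item tuple convention. The `critical` index (which resource defines "least remaining critical resource") and the sample numbers are assumptions for illustration, not the paper's implementation.

```python
# FFL and BFL task assignment over servers described by their current
# load tuples; returns the index of the chosen server, or None.
def fits(server_load, game, capacity):
    return all(server_load[k] + game[k] <= capacity[k] for k in range(5))

def ffl(servers, game, capacity):
    # First-fit-like: take the first server that can host the game.
    for j, load in enumerate(servers):
        if fits(load, game, capacity):
            return j
    return None   # no server can host it; a new one must be started

def bfl(servers, game, capacity, critical):
    # Best-fit-like: among feasible servers, pick the one that would be
    # left with the least amount of the critical resource.
    best, best_left = None, None
    for j, load in enumerate(servers):
        if fits(load, game, capacity):
            left = capacity[critical] - (load[critical] + game[critical])
            if best_left is None or left < best_left:
                best, best_left = j, left
    return best

CAP = (100.0, 100.0, 8192, 3072, 37.6)
servers = [(90.0, 5.0, 3000, 500, 30.0), (20.0, 5.0, 3000, 500, 10.0)]
game = (9.15, 2.01, 526, 220, 7.52)
```

When `ffl` returns None, the manager starts a new server, which is how both heuristics drive the number of used servers.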
4.5.1 Tests with Artificial Traces
We have simulated our algorithms in two situations:
1) Several requests of the four games come simulta-
neously and must be dispatched instantly, namely,
in the batch processing mode.
2) Requests come one by one. The request-sequence fol-
lows a Poisson process with a mean time interval of
5 seconds; the duration of each game also follows a
Poisson process and the mean time is 40 minutes.
In both situations, we assume that there are enough servers and each has an initial resource usage of ⟨10, 5, 3096, 512, 0⟩ (gathered from our real servers); thus, we can start a new server whenever needed. Moreover, from the aspect of resource usage, we mainly focus on the number of servers used by each algorithm.
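A request trace for Situation 2 can be generated by drawing exponentially distributed inter-arrival times, which is the usual way to simulate a Poisson arrival process (a sketch; the seed and function names are our assumptions):

```python
import random

def generate_trace(n_requests, mean_interval_s=5.0, mean_duration_s=40 * 60, seed=1):
    """Return (arrival_time, duration) pairs in seconds.

    Inter-arrival times of a Poisson process are exponentially distributed;
    durations are drawn the same way, matching the description above.
    """
    rng = random.Random(seed)
    t, trace = 0.0, []
    for _ in range(n_requests):
        t += rng.expovariate(1.0 / mean_interval_s)   # next arrival
        trace.append((t, rng.expovariate(1.0 / mean_duration_s)))
    return trace
```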
For the first situation, we have compared our algorithms
with three others:
Size-based task assignment (STA) [35]: This algorithm is
widely used in distributed systems, in which all tasks
within a given size range of resource requirements are
assigned to a particular server. Specific to our case, two
types of servers (for CPU-critical and for memory-IO-critical
respectively) are designated.
Packing algorithm (PA): It is a greedy algorithm. Each server is assigned as many games as possible until all games have been dispatched.
Dominant resource fairness (DRF) [36]: A fair sharing
model that generalizes max-min fairness to multiple
resource types. In our implementation, the collection of all
currently-used servers (called small servers) is regarded as a
big server. Whether the big server can satisfy an incoming request depends on whether such a small server exists; if not, a new small server is added to enlarge the big one. The scheduling strategy inside the big one is First-fit, and all gaming requests are considered to be issued by different users.
We also estimate the ideal server number for reference. For each kind of resource (denoted by s), the minimum number of servers is $\lceil \sum_{i=1}^{n} R_i^s / R^s \rceil$, where n is the total number of game requests, $R_i^s$ denotes the utilization of resource s by the i-th game, and $R^s$ is the corresponding resource capacity of a server. The maximum of these minimums over all resources is the ideal number.
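This lower bound can be computed directly (a sketch using per-resource demand dictionaries; all names are illustrative):

```python
import math

def ideal_server_number(requests, capacity):
    """Lower bound on the server count: for each resource s, take the ceiling
    of total demand divided by per-server capacity, then take the maximum of
    these per-resource minimums over all resources."""
    return max(
        math.ceil(sum(req[s] for req in requests) / capacity[s])
        for s in capacity
    )
```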
In the second situation, our algorithms have been compared with the STA algorithm only, because the others require information about the whole request sequence (which is unavailable in this case) and would degenerate into the FFL.
TABLE 2
Resource-Requirements of Games

Game      Tuple ⟨U_CPU, U_GPU, M_HOST, M_GPU, B⟩                Game type
NFS       ⟨9.15%, 2.01%, 526, 220, MAX_SYSTEM_BANDWIDTH / 5⟩    memory-io-critical
Scrolls   ⟨14.55%, 7.02%, 795, 560, MAX_SYSTEM_BANDWIDTH / 3⟩   memory-io-critical
Combat    ⟨8.47%, 3.27%, 800, 296, MAX_SYSTEM_BANDWIDTH / 5⟩    memory-io-critical
Birds     ⟨10%, 1.1%, 181, 142, 6.54⟩                           CPU-critical

Fig. 5. Server-numbers in Situation 1.

1246 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 27, NO. 5, MAY 2016

Simulation results of Situation 1 are given in Fig. 5. The y-axis stands for the number of needed servers (for clarity, values have been normalized) as several requests have
arrived simultaneously (the request number is illustrated
by the x-axis). We can see that, compared with the others, the heuristic algorithms are quite good. Even against the ideal number, our algorithms are really close to the optimum (the maximal value is 101.23 percent of it). Moreover, the two algorithms perform almost equally in all cases.
Fig. 6 shows the number of requested servers when requests arrive in sequence (Situation 2). We can see that our heuristic algorithms are more efficient than STA. The two algorithms also perform similarly in all cases: compared with the BFL algorithm, FFL consumes no more than 3.6 percent additional resources (57 vs. 55 servers). Finally, results show that FFL runs about 20 percent faster than BFL, while both are fast enough (in the batch processing mode, both can complete the task assignment within several milliseconds for 1,000 requests).
4.5.2 Tests with Real Game-Traces
To further evaluate the proposed task-scheduling strategies,
we conduct a trace-driven simulation of a large-scale cluster (a similar simulation method was used in [37]);
each server is the same as the one presented in Section 4.4.
The dataset we used is the World of Warcraft avatar history dataset provided by Lee et al. [38]. Although this dataset is based on the MMORPG "World of Warcraft", we
think it is useful in our case because cloud gaming and
MMORPG share many similarities, such as wide variations
in the gaming time, a huge bandwidth-demand and a large
number of concurrent users. Of course, necessary pre-proc-
essing is introduced to make the dataset more suitable,
namely, we have mapped the first four races in the dataset
(Blood Elf, Orc, Tauren and Troll) to the four kinds of games
in our system and the remaining one (the undead) is mapped
to one of these four games randomly.
In detail, we have used traces of three months that con-
sist of 396,631 game-requests (details are shown in Table 3).
Accordingly, a cluster of 200 servers has been simulated, in
which the master node collects the resource utilization of all
servers every one minute. Because previous tests have
shown that BFL and FFL policies perform similarly, we
have only tested the BFL scheduling policy here.
Fig. 7 shows the numbers of running game-instances, activated servers and used servers (once used, a server is regarded as a used server regardless of whether it is currently activated); there is an obvious linear relationship between the number of game-instances and the number of activated servers. What's more, the average number of activated servers is 64, which is significantly less than the maximum number of used servers (152). This means the scheduling efficiency is good; it also means server consolidation [37] can be used to further reduce the number of servers.
Fig. 8 shows the average resource utilization of the activated servers for each day. Although the utilization rates of the other resources are relatively low, the bandwidth utilization is high. This confirms that most games are memory-io-critical, which accords with our performance model.
We have completed another simulation, in which the number of servers is infinite, to illustrate the relationship between the total number of used servers and the update interval for resource utilization.
Fig. 9 shows the relationship: when the update interval is less than 20 minutes, the number of used servers varies slightly; for larger intervals, the number increases significantly. This means we could use a longer update interval with very limited impact on system efficiency. It also helps in managing a large-scale cloud gaming system, because message exchanges between the server agents and the manager are reduced noticeably.
Fig. 6. Server-numbers in Situation 2.
TABLE 3
Details of the Dataset

Parameter                                        Value
Simulated period                                 3 months
Server number                                    200
Total game requests                              396,631
Maximum game requests arriving simultaneously    227
Maximum game instances running simultaneously    757
Average lifetime of game instances               85 minutes
Average interval between game requests           3 minutes
Fig. 7. Running games and servers of each day.
ZHANG ET AL.: A CLOUD GAMING SYSTEM BASED ON USER-LEVEL VIRTUALIZATION AND ITS RESOURCE SCHEDULING 1247
4.6 Discussions
4.6.1 Different Game Configurations and/or
Heterogeneous Servers
The above work targets specific hardware and games, and we believe the method is practical: it is reasonable to assume that any game will be tested fully before going on-line; thus the resource requirements of each game can be measured on a given server whose hardware configuration will remain unchanged for a long period.
If heterogeneous servers are used, since we have found that the host CPU or the memory bus is the system bottleneck, new servers' capacities can also be derived by comparing the CPU performance and system bandwidth of the new servers with those of reference servers (these metrics may be labeled by the producer or can be tested), which avoids the exponentially-growing complexity of testing. Appendix B, available in the online supplemental material, gives an example showing that the capability of a new server for known games is predictable, and then summarizes the prediction method.
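A minimal sketch of this idea, assuming capacity scales linearly with the labeled CPU performance and system bandwidth (an illustrative simplification; the actual prediction method is the one summarized in Appendix B):

```python
def max_instances(demand, capacity):
    """Bottleneck analysis: the number of instances a server can host is
    limited by its scarcest resource relative to the per-game demand."""
    return min(int(capacity[r] // demand[r]) for r in demand)

def scaled_capacity(ref_capacity, cpu_ratio, bw_ratio):
    """Derive a new server's capacity from a reference server by scaling the
    CPU-bound and bandwidth-bound capacities by the labeled performance
    ratios (illustrative assumption)."""
    return {
        "cpu": ref_capacity["cpu"] * cpu_ratio,
        "bandwidth": ref_capacity["bandwidth"] * bw_ratio,
    }
```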
For different game configurations, the situation is more complicated. Even if only the resolution differs, tests show that there is no obvious relationship between the resolution and resource consumption, although the consumption of our framework itself (such as encoding and image capture) is proportional to the resolution.
Therefore, our solution is that such configurations can be evaluated on-line first during the real service period. For example, we can schedule the same game with the same configuration to some dedicated server(s) if a user has demanded it. With the accumulation of game runs, the metrics will become more accurate.
4.6.2 Time-Dependent Factors
We use average values to denote the resource requirements of a given game. In reality, requirements are time-dependent and may vary across gaming stages. However, we believe average values are sufficient owing to the following facts:
1) The degree of variation depends heavily on the time granularity. Our tests show that the variation becomes smaller as the time interval increases; when the time interval is 30 s (Appendix C, available in the online supplemental material), the variation of requirements is relatively small.
2) Considering the resource consolidation of multiple concurrently-running games, the use of average values is reasonable.
Moreover, it is necessary to note that some games take a very long time to finish; thus, in our experimental environment, it is difficult to explore plenty of scenes. However, such a game can be evaluated on-line first for data accumulation (as mentioned above).
5 IMPLEMENTATION AND EVALUATION
5.1 Implementation
We have implemented the cloud gaming system based on
the user-level virtualization technology. Eight PC servers
are connected by a Gigabit Ethernet; their configurations
are the same as those in Section 4.4. Detours [39] has been used to implement the required interception functions. In detail, we have implemented a DLL (called gamedll) that can be injected into any gaming process to wrap all interesting APIs and to spawn two threads, for input reception and data encoding/streaming respectively.
Now our virtualization layer can stream Direct3D games,
OpenGL games and flash games to Windows, iOS and
Android clients, and receive remote operations. The UDT
(UDP-based Data Transfer) protocol [40] is used to deliver
the video / audio / operation data between the server
and client.
We use the periodic video capture as the timing reference on the server side; any audio data between two consecutive video-capture timestamps is delivered with the current video data.
To be specific, the Windows Audio Session APIs provide interfaces to create and manage audio streams to and from audio devices. Our interception replicates such stream buffers. After the current image has been captured, the audio data between the buffer's current read and write positions (the read position is just the current playback position) is copied out immediately and sent with the current image. This method achieves video/audio synchronization and limits the timing discrepancy to roughly the reciprocal of the FPS value.
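The mechanism can be sketched as follows; the buffer and packet types are simplified stand-ins for the replicated Windows Audio Session stream buffers, not the real API:

```python
# Sketch of video-timed audio delivery: the periodic frame capture is the
# timing reference, and whatever audio accumulated since the previous
# capture ships together with that frame.

class AudioBuffer:
    """Stand-in for the replicated audio stream buffer."""

    def __init__(self):
        self._pending = bytearray()

    def write(self, pcm: bytes):
        # Replicated stream data arriving from the intercepted audio API.
        self._pending += pcm

    def drain(self) -> bytes:
        # Everything accumulated since the last capture, then reset.
        out, self._pending = bytes(self._pending), bytearray()
        return out

def capture_and_pack(frame: bytes, audio: AudioBuffer) -> dict:
    """Pair a freshly captured frame with all audio accumulated since the
    previous frame, keeping the A/V discrepancy near one frame interval."""
    return {"video": frame, "audio": audio.drain()}
```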
Fig. 8. Resource-utilizations of activated servers.

Fig. 9. Used servers of different update-intervals.

As mentioned in Section 4.1, an exception is that games may decrease the FPS deliberately in some scenes, which causes larger timing discrepancies. To remedy this, a dedicated timer has been introduced to trigger audio transmission whenever the interval between successive frames exceeds a threshold.
Moreover, on the client side, to smooth the playback of received audio, one extra audio buffer is managed by the cloud-gaming client software. Any received audio is first stored in this buffer, appended to the existing data; once the whole buffer has been filled, its contents are copied to the playback device. Combined with the default buffer of the playback device, this constructs a double-buffering mechanism, which parallelizes reception and playback and thereby smooths the playback. As a consequence, any audio data is delayed for some time: in our system, the buffer length is set to hold 200 ms of audio data, which makes the playback smooth. Results are given in the next section.
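A minimal sketch of this client-side staging buffer (the 200 ms target is expressed here as a byte count; the types and names are illustrative assumptions):

```python
class PlaybackBuffer:
    """Client-side staging buffer: received audio is appended until the
    buffer holds a fixed amount (200 ms worth in the paper's system), then
    handed to the playback device in one piece, forming the second half of
    a double-buffering scheme."""

    def __init__(self, target_bytes: int):
        self.target = target_bytes
        self._staged = bytearray()
        self.played = []  # chunks handed to the playback device

    def on_receive(self, pcm: bytes):
        # Append to existing data; flush once the buffer is full.
        self._staged += pcm
        if len(self._staged) >= self.target:
            self.played.append(bytes(self._staged))
            self._staged = bytearray()
```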
5.2 Evaluation
The test environment and configurations are the same as
those in Section 4.4, as well as the testing method.
5.2.1 Overheads of the User-Level Virtualization
Technology Itself
We execute a game directly on a physical machine and record the game speed (in terms of the average FPS) and the average memory consumption. Then, the same game is run in the user-level virtualization environment (all related APIs are intercepted but no real work, such as image capture or encoding, is enabled) and in a virtual machine respectively; the same runtime information is recorded again.
The latest VMware Player 6 is employed, and both the host and guest OSes are Windows 7. The comparison is shown in Fig. 10 (for clarity, values have been normalized).
Considering GPU utilization, the user-level technology itself introduces almost no performance loss, while the VM-based solution's efficiency is a little lower, about 90 percent of the native. On the other hand, the memory consumption of the VM-based solution is 2.4 times that of the native run, because the memory occupied by the guest OS is considerable. For the user-level solution, this consumption is almost the same as the native, too.
5.2.2 Processing Performance of the Server
The processing procedure of a cloud-gaming instance can be divided into four parts: (1) image capture, which copies a rendered frame into system memory, (2) video encoding, (3) transferring, which sends each compressed frame into the network, and (4) the processing of the game logic and rendering. The last part depends mainly on the concrete game, while GCloud handles the others. Thus the first three are the objects of this test, and the sum of their delays is denoted as SD (Server Delay).
Moreover, we intend to measure the performance limit. Hence only one instance runs on a server and the "try the best" strategy is used: no Sleep call is inserted, so the games run as fast as possible. Existing work [3] has performed a similar test for GamingAnywhere and OnLive, so we can compare our results with theirs. Although the games tested in [3] are different, we believe the comparison is meaningful because the server delay is largely independent of the specific games.
Fig. 10. Comparison of resource consumption.

Fig. 11. Processing performance and the decomposition (three resolutions).

Fig. 11 reports the average SD of three video games under different resolutions. The corresponding FPS is in Fig. 12. The average value at 720P is given in Fig. 13, as well as the corresponding values of GamingAnywhere and OnLive (values have been normalized).
Results show that, compared with similar solutions, GCloud achieves smaller SDs (ranging from 8 ms to 19 ms), which are positively correlated with resolution. We attribute this mainly to the high encoding performance of Quick Sync. In contrast, the encoding delay of GamingAnywhere is about 14 to 16 ms per frame.
The transferring latency is smaller than the other parts by two orders of magnitude. Even in the following cases of multiple games, this still holds. Thus, the transferring latency can be ignored, as we proposed in Section 4.
5.2.3 Multiple Games
The "just good enough" strategy is used; a Sleep call fixes the FPS. First, an OpenGL game and three Direct3D games are played one by one and the processing delay (including the sleep time) is sampled periodically; the sample period is one frame. Second, quite a few game combinations, each including more than one game, are executed and sampled. Without loss of generality, the FPS values of some game combinations played simultaneously are presented in Table 4, along with their average absolute deviations (AADs). These combinations are:
Case 1: Two NFS instances;
Case 2: One NFS, one Combat and one Scrolls;
Case 3: Two NFS, one Combat and one Scrolls;
Case 4: One NFS, one Combat, one Scrolls and two Birds.
On the whole, the average FPS ranges from 30.5 to 31.5 when one game runs alone. The average absolute deviations are 0.10 (Birds), 0.11 (NFS), 0.15 (Combat) and 1.47 (Scrolls) respectively, which means the FPS value is fairly stable. Of course, there are quite a few delay fluctuations; these usually mean that the corresponding game scenes are changing rapidly, which is common for highly-interactive games, especially Scrolls.
As the number of concurrently-running games increases (which means more interference between games), the FPS values decrease correspondingly while the average absolute deviations increase: for Scrolls, with three games running at the same time (Case 2), the average FPS is 28.3 and the AAD is 2.13; with four instances (Case 3), the values are 27.8 and 2.98 respectively. For Combat, with three games running simultaneously, the average FPS is 29.2 and the AAD is 0.89; with four, the values are 28.8 and 1.59 respectively.
For the uncertainties of FPS values, we believe the main reasons lie in two aspects:
1) There is interference among the running instances, including resource contention, which makes resource consumption not totally linear in the number of instances (as illustrated in Fig. 4). For example, Scrolls consumes the most resources, thus its uncertainty is the biggest.
2) As mentioned in Section 4.6, the resource requirements of games are time-dependent and may vary across stages. This also causes some uncertainty.
Overall, this means the system can achieve a satisfactory gaming effect and the FPS can be kept relatively stable while multiple games run simultaneously.
5.2.4 Verification of the Performance Model
According to the results of the performance model and the scheduling strategy, we test several typical server loads for verification. Without loss of generality, the following cases are presented.
1) One Scrolls, one Combat and two NFS. As presented in Table 5 (first row), the FPS value of each game is more than 27 and the lowest is Scrolls's, about 27.1. All are no less than 90 percent of the FIXED_FPS (30), thus acceptable. Because the system-RAM bandwidth has been nearly exhausted (about 93 percent of the MAX_SYSTEM_BANDWIDTH), when another game joins (regardless of whether it is NFS or Birds), the FPS of Scrolls drops below the acceptable level.
Fig. 12. FPS of games.
Fig. 13. Comparison of the processing delay (1280 × 720; the lower the better).
TABLE 4
FPS Values and Average Absolute Deviations of
Different Numbers of the Running Games

Game / Case        1      2      3      4
NFS      FPS     30.2   30.3   30.2   30.2
         AAD     0.18   0.24   0.44   0.70
Combat   FPS      N/A   29.2   28.8   28.6
         AAD      N/A   0.89   1.59   1.89
Scrolls  FPS      N/A   28.3   27.8   27.3
         AAD      N/A   2.13   2.98   3.30
Birds    FPS      N/A    N/A    N/A   29.8
         AAD      N/A    N/A    N/A   0.56
2) One Scrolls, one Combat, one NFS and three Birds. In this case, the sum of each kind of resource consumption is less than the corresponding system capacity; the relative maximum is the sum of memory throughputs, about 95 percent of the MAX_SYSTEM_THROUGHPUT. In Table 5 (second row), the FPS value of each game is more than 27.
3) One NFS, two Combat and five Birds.
4) Three NFS and five Birds.
In Cases 3 and 4, the sum of memory throughputs is about 96 percent of the MAX_SYSTEM_THROUGHPUT. As the sum of each kind of resource consumption is less than the corresponding system capacity, the FPS value of each game is still more than 27.
5.2.5 Discrepancy between Video and Audio
We have designed a method to calculate this discrep-
ancy: on the server, some sequences of full-black images
are inserted into the video-stream to replace original
scenes; at the same time, mute data will replace the cor-
responding audio-data, too. On the client, a screen
recording software is running with the gaming client.
Thus, through the analysis of audio / video streams of
recorded data, we can get time-stamps of the beginnings
of inserted video / audio sequences respectively. Then,
the discrepancies can be calculated. Results show that
these values are in the range of 180 ms$410 ms (Table 6).
Besides the preset delays mentioned above, we think the reasons lie in the following:
1) The delay fluctuations of games. The corresponding FPS values will be less than 30, which increases the timing discrepancy because the accumulation of audio data is slowed.
2) The network's delay fluctuations. They increase the timing discrepancy, too. Our tests were carried out on a campus network; we believe that, over the Internet, this factor will cause larger delays.
3) The measurement error. The recording software records the screen periodically at 30 FPS, while the audio recording is continuous. Thus, the beginnings of some full-black image sequences may be missed, which decreases the measured gap.
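The timestamp extraction behind this measurement can be sketched as follows, assuming per-sample marker predicates for full-black frames and mute audio (the names and the minimum run length are our assumptions):

```python
def first_marker_start(samples, is_marker, min_len=3):
    """samples: list of (timestamp, value). Return the timestamp at which
    the first run of at least min_len marker values begins, or None; the
    run-length guard filters out stray single markers."""
    start, run = None, 0
    for t, v in samples:
        if is_marker(v):
            if run == 0:
                start = t
            run += 1
            if run >= min_len:
                return start
        else:
            run = 0
    return None

def av_discrepancy(video_samples, audio_samples, is_black, is_mute, min_len=3):
    """Gap between the start of the inserted black-video run and the start
    of the corresponding mute-audio run in the recorded streams."""
    v = first_marker_start(video_samples, is_black, min_len)
    a = first_marker_start(audio_samples, is_mute, min_len)
    return None if v is None or a is None else abs(v - a)
```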
6 CONCLUSIONS AND FUTURE WORK
This paper proposes GCloud, a GPU/CPU hybrid cluster for
cloud gaming based on the user-level virtualization
technology. We focus on the guideline of task scheduling:
To balance gaming responsiveness and costs, we fix the game's FPS to allocate just enough resources, which also mitigates the interference between games. Accordingly, a performance model has been analyzed to explore the server capacity and the games' resource demands; it can locate the performance bottleneck and guide task scheduling based on the games' critical resource demands. Comparisons show that both the First-Fit-like and Best-Fit-like scheduling strategies outperform the alternatives; moreover, they are near-optimal in the batch processing mode.
In the future, we plan to enhance performance models to
support heterogeneous servers.
ACKNOWLEDGMENTS
The work is supported by the High Tech. R&D Program of China under Grant No. 2013AA01A215.
REFERENCES
[1] R. Shea, L. Jiangchuan, E.C.-H. Ngai, and C. Yong, “Cloud gam-
ing: Architecture and performance,” IEEE Netw., vol. 27, no. 4,
pp. 16–21, Jul./Aug. 2013.
[2] Z. Zhao, K. Hwang, and J. Villeta, “GamePipe: A virtualized cloud
platform design and performance evaluation,” in Proc. ACM 3rd
Workshop Sci. Cloud Comput., 2012, pp. 1–8.
[3] C.-Y. Huang, C.-H. Hsu, Y.-C. Chang, and K.-T. Chen,
“GamingAnywhere: An open cloud gaming system,” in Proc.
ACM Multimedia Syst., Feb. 2013, pp. 36–47.
[4] R. Phull, C.-H. Li, K. Rao, S. Cadambi, and S. T. Chakradhar,
“Interference-driven resource management for GPU-based het-
erogeneous clusters,” in Proc. 21st ACM Int. Symp. High Perform.
Distrib. Comput., 2012, pp. 109–120.
[5] V. T. Ravi, M. Becchi, G. Agrawal, and S. T. Chakradhar,
“Supporting GPU sharing in cloud environments with a transpar-
ent runtime consolidation framework,” in Proc. 20th ACM Int.
Symp. High Perform. Distrib. Comput., 2011, pp. 217–228.
[6] G. A. Elliott and J. H. Anderson, “Globally scheduled real-time
multiprocessor systems with GPUs,” Real-Time Syst., vol. 48, no. 1.
pp. 34–74, 2012.
[7] L. Chen, O. Villa, S. Krishnamoorthy, and G. R. Gao, “Dynamic
load balancing on single- and multi-gpu systems,” in Proc. IEEE
Int. Symp. Parallel Distrib. Process., 2010, pp. 1–12.
[8] M. Yu, C. Zhang, Z. Qi, J. Yao, Y. Wang, and H. Guan, “GRIS:
Virtualized GPU resource isolation and scheduling in cloud
gaming,” in Proc. 22nd Int. Symp. High-Perform. Parallel Distrib.
Comput., 2012, pp. 203–214.
[9] C. Zhang, J. Yao, Z. Qi, M. Yu, and H. Guan, “vGASA: Adaptive
scheduling algorithm of virtualized GPU resource in cloud
gaming,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 11,
pp. 3036–3045, 2014.
[10] M. Claypool and K. Claypool, “Latency and player actions in
online games,” Commun. ACM, vol. 49, no. 11, pp. 40–45, 2006.
[11] D. C. Barboza, V. E. F. Rebello, E. W. G. Clua, and H. Lima, “A
simple architecture for digital games on demand using low per-
formance resources under a cloud computing paradigm,” in Proc.
Brazilian Symp., Games Digital Entertainment, 2010, pp. 33–39.
[12] D. De Winter, P. Simoens, and L. Deboosere, “A hybrid thin-client
protocol for multimedia streaming and interactive gaming
applications,” in Proc. Int. Workshop Netw. Oper. Syst. Support Digi-
tal Audio Video, 2006, p. 15.
TABLE 5
FPS of Concurrently-Running Games

TABLE 6
Discrepancy Values on the Client Side

Game      Minimum   Maximum   Average
NFS       205 ms    395 ms    287 ms
Scrolls   213 ms    410 ms    323 ms
Combat    196 ms    336 ms    278 ms
Birds     180 ms    275 ms    242 ms
[13] W. Yu, J. Li, C. Hu, and L. Zhong, “Muse: A multimedia streaming
enabled remote interactivity system for mobile devices,” in Proc.
10th Int. Conf. Mobile Ubiquitous Multimedia, 2011, pp. 216–225.
[14] L. Shi, H. Chen, and J. Sun, “vCUDA: GPU accelerated high per-
formance computing in virtual machines,” in Proc. IEEE Int. Symp.
Parallel Distrib. Process., 2009, pp. 1–11.
[15] J. Duato, A. J. Peña, F. Silla, R. Mayo, and E. S. Quintana-Ortí,
“rCUDA: Reducing the number of GPU-based accelerators in
high performance clusters,” in Proc. Int. Conf. High Perform. Com-
put. Simul., 2010, pp. 224–231.
[16] V. Gupta, A. Gavrilovska, K. Schwan, H. Kharche, N. Tolia, V.
Talwar, and P. Ranganathan, “GViM: Gpu-accelerated virtual
machines,” in Proc. ACM Workshop Syst.-Level Virtualization High
Perform. Comput., 2009, pp. 17–24.
[17] M. Bautin, A. Dwarakinath, and T.-c. Chiueh, “Graphic engine
resource management,” in Proc. 15th Multimedia Comput. Netw.,
2008, pp. 15–21.
[18] D. Wu, Z. Xue, and J. He, “iCloudAccess: Cost-effective streaming
of video games from the cloud with low latency,” IEEE Trans.
Circuits Syst. Video Technol., vol. 24, no. 8, pp. 1405–1416, Jan. 2014.
[19] H.-J. Hong, D.-Y. Chen, C.-Y. Huang, K.-T. Chen, and C.-H. Hsu,
“Placing virtual machines to optimize cloud gaming experience,”
IEEE Trans. Cloud Comput. , vol. 3, no. 1, pp. 42–53, Jan.–Mar. 2015.
[20] S. Kato, K. Lakshmanan, R. Rajkumar, and Y. Ishikawa,
“TimeGraph: GPU scheduling for real-time multi-tasking environ-
ments,” in Proc. USENIX Conf. USENIX Annu. Tech. Conf., 2011, p. 2.
[21] L. Cherkasova and L. Staley, “Building a performance model of
streaming media application in utility data center environment,” in
Proc. 3rd IEEE/ACM Int. Symp. Cluster Comput. Grid, 2003, pp. 52–59.
[22] V. Ishakian and A. Bestavros, “MORPHOSYS: Efficient colocation
of QoS-constrained workloads in the cloud,” in Proc. 12th IEEE/
ACM Int. Symp. Cluster, Cloud Grid Comput., 2012, pp. 90–97.
[23] S. Wang and S. Dey, “Rendering adaptation to address communi-
cation and computation constraints in cloud mobile gaming,” in
Proc. Global Telecommun. Conf., Dec. 6–10, 2010, pp. 1–6.
[24] D. Klionsky. A new architecture for cloud rendering and amor-
tized graphics. M.S. Thesis, School Comput. Sci., Carnegie Mellon
Univ., CMU-CS-11–122. [Online]. Available: http://reports-
archive.adm.cs.cmu.edu/anon/2011/abstracts/11–122.html.
[25] A. Jurgelionis, P. Fechteler, P. Eisert, F. Bellotti, and H. David,
“Platform for distributed 3D gaming,” Int. J. Comput. Games Tech-
nol. , vol. 2009, p. 1, 2009.
[26] A. Ojala and P. Tyrvainen, “Developing cloud business models:
A case study on cloud gaming,” IEEE Softw., vol. 28, no. 4,
pp. 42–47, Jul. 2011.
[27] S.-W. Chen, Y.-C. Chang, and P.-H. Tseng, C.-Y. Huang, and C.-L.
Lei, “Measuring the latency of cloud gaming systems,” in Proc.
19th ACM Int. Conf. Multimedia, 2011, pp. 1269–1272.
[28] S. Choy, B. Wong, G. Simon, and C. Rosenberg “The brewing
storm in cloud gaming: A measurement study on cloud to end-
user latency,” in Proc. 11th Annu. Workshop Netw. Syst. Support
Games, 2012, p. 2.
[29] Y.-T. Lee, K.-T. Chen, H.-I. Su, and C.-L. Lei, “Are all games equally
cloud-gaming-friendly? An electromyographic approach,” in Proc.
IEEE/ACM NetGames, 2012, pp. 109–120.
[30] K.-T. Chen, Y.-C. Chang, H.-J. Hsu, D.-Y. Chen, C.-Y. Huang, and
C.-H. Hsu, “On the quality of service of cloud gaming systems,”
IEEE Trans. Multimedia, vol. 16, no. 2, pp. 480–495, Feb. 2014.
[31] Y. Zhang, X. Wang, and L. Hong, “Portable desktop applications
based on P2P transportation and virtualization,” in Proc. 22nd
Large Installation Syst. Administration Conf., 2008, pp. 133–144.
[32] P. Guo, “CDE: Run any linux application on-demand without
installation,” in Proc. 25th USENIX Large Installation Syst. Adminis-
tration Conf., 2011, p. 2.
[33] B. Xia and T. Zhiyi, “Tighter bounds of the first fit algorithm for
the bin-packing problem,” Discrete Appl. Math., vol. 158, no. 15,
pp. 1668–1675, 2010.
[34] C. Kenyon, “Best-fit bin-packing with random order,” in Proc. 7th
Annu. ACM-SIAM Symp. Discrete Algorithm, 1996, vol. 96,
pp. 359–364.
[35] M. Harchol-Balter, M. E. Crovella, and C. Duarte Murta, “On
Choosing a task assignment policy for a distributed server sys-
tem,” J. Parallel Distrib. Comput., vol. 59, no. 2, pp. 204–228, 1999.
[36] A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker,
and I. Stoica, “Dominant resource fairness: Fair allocation of mul-
tiple resource types,” in Proc. 8th USENIX Symp. Netw. Syst. Des.
Implementation, 2011, pp. 323–336.
[37] Y.-T. Lee and K.-T. Chen, “Is server consolidation beneficial to
MMORPG? A case study of world of warcraft,” in Proc. IEEE 3rd
Int. Conf. Cloud Comput., 2013, pp. 435–442.
[38] Y.-T. Lee, K.-T. Chen, Y.-M. Cheng, and C.-L. Lei, “World of war-
craft avatar history dataset,” in Proc. 2nd Annu. ACM Multimedia
Syst., Feb. 2011, pp. 123–128.
[39] G. Hunt and D. Brubacher, “Detours: Binary interception of
Win32 functions,” in Proc. 3rd USENIX Windows NT Symp., Jul.
1999, p. 14.
[40] Y. Gu and R. L. Grossman, “UDT: UDP-based data transfer for
high-speed wide area networks,” Comput. Netw., vol. 51, no. 7,
pp. 109–120, May 2007.
Youhui Zhang received the BSc and PhD
degrees in computer science from Tsinghua Uni-
versity, China, in 1998 and 2002. He is currently
a professor in the Department of Computer Sci-
ence, Tsinghua University. His research interests
include computer architecture, cloud computing,
and high-performance computing. He is a mem-
ber of the IEEE and the IEEE Computer Society.
Peng Qu received the BSc degree in computer
science from Tsinghua University, China, in
2013. He is currently working toward the PhD
degree in the Department of Computer Science,
University of Tsinghua, China. His interests
include cloud computing and micro-architecture.
Cihang Jiang received the BSc degree in com-
puter science from Tsinghua University, China, in
2013. He is currently a master student in the
Department of Computer Science, University of
Tsinghua, China. His research interest is cloud
computing.
Weimin Zheng received the BSc and MSc
degrees in computer science from Tsinghua Uni-
versity, China, in 1970 and 1982, respectively.
He is currently a professor in the Department of
Computer Science, University of Tsinghua,
China. His research interests include high perfor-
mance computing, network storage and distrib-
uted computing. He is a member of the IEEE and
the IEEE Computer Society.
For more information on this or any other computing topic,
please visit our Digital Library at www.computer.org/publications/dlib.