Grid and Cloud Unit II
GRID SERVICES
PART – A
1. List the OGSA grid service interfaces?
2. Define OGSA.
The Open Grid Services Architecture (OGSA) is a standard provided by the Global Grid Forum to
address the requirements of grid computing in an open and standard way. OGSA allows a system
to perform a specific task or solve a challenging problem by using distributed resources over the
interconnection network. This standard defines a common framework that allows businesses to
build grid platforms across enterprises and business partners.
7. What is OGSI?
The Open Grid Services Interface (OGSI) defines mechanisms for creating, managing, and
exchanging information among Grid services.
8. What is OGSI specification? Mention its dimensions.
The OGSI specification defines a component model using web services as its core base technology,
with WSDL as the service description mechanism and XML as the message format. There are two
dimensions to the stateful nature of a web service:
i. A service maintains its state information.
ii. The interaction pattern between the client and service can be stateful.
i. Globus Toolkit – which is adopted as a grid technology solution for scientific and technical
computing
ii. Web services (WS) – a popular standard based framework for business and network
applications.
13. What are the access models for organizing a data grid?
Monadic model
Hierarchical model
Hybrid model
Federation model
14. What are the grid service features that OGSI specification defines?
Statefulness
Stateful interactions
The ability to create new instances
Service lifetime management
Notification of state changes and Grid service groups
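The features listed above can be illustrated with a minimal Python sketch. It is not the OGSI interface itself: the names Factory, GridServiceInstance, sweep and the handle format are hypothetical, and time is passed in as a plain number so that the example stays deterministic.

```python
import itertools


class GridServiceInstance:
    """A stateful service instance with a handle and a termination time."""

    def __init__(self, handle, lifetime_seconds, now):
        self.handle = handle
        self.state = {}                       # statefulness: per-instance data
        self.termination_time = now + lifetime_seconds
        self.listeners = []                   # callbacks notified on state change

    def subscribe(self, callback):
        self.listeners.append(callback)

    def set_state(self, key, value):
        self.state[key] = value
        for notify in self.listeners:         # notification of state changes
            notify(self.handle, key, value)

    def extend_lifetime(self, extra_seconds):
        self.termination_time += extra_seconds

    def is_alive(self, now):
        return now < self.termination_time


class Factory:
    """Creates new instances and destroys the ones whose lifetime expired."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}

    def create(self, lifetime_seconds, now):
        handle = f"instance-{next(self._ids)}"   # hypothetical handle format
        instance = GridServiceInstance(handle, lifetime_seconds, now)
        self.instances[handle] = instance
        return instance

    def sweep(self, now):
        expired = [h for h, i in self.instances.items() if not i.is_alive(now)]
        for handle in expired:
            del self.instances[handle]
        return expired
```

A client would call create, subscribe to the instance, and extend its lifetime for as long as it still needs the instance; sweep stands in for whatever mechanism reclaims expired instances.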
FUNCTIONALITY REQUIREMENTS
Discovery and brokering. Mechanisms are required for discovering and/or allocating services,
data, and resources with desired properties. For example, clients need to discover network
services before they are used, service brokers need to discover hardware and software availability,
and service brokers must identify codes and platforms suitable for the execution requested by the
client.
Metering and accounting. The metering function records the usage and duration, especially
metering the usage of licenses. The auditing function audits usage and application profiles on
machines, and the billing function bills the user based on metering.
Data sharing. Mechanisms are required for accessing and managing data archives, for caching
data and managing its consistency, and for indexing and discovering data and metadata.
Deployment. A data deployment mechanism is required for copying or deploying user data
over the grid in order to execute the user's job.
Virtual organizations (VOs). This mechanism deals with creating and managing virtual organizations
with group membership services. For the commercial data center use case, the grid creates a VO
in a data center that provides IT resources to the job upon the customer’s job request.
Monitoring. Monitoring tools are required to allow users to monitor their applications running on
the grid. Also, the resource or service owners need to monitor certain states so that the user of
those resources or services may manage the usage using the state information.
Policy. Different policies are required for recording and event handling; these policies are used for
self-controlling management, including failover and provisioning.
Fault tolerance. The grid system should provide a mechanism for detecting failover, load
redistribution, and other techniques used to achieve fault tolerance. Fault tolerance is particularly
important for long running queries that can return large amounts of data such as dynamic
scientific applications and commercial data center applications.
Disaster recovery. Disaster recovery mechanisms are required to check critical capability of
complex distributed grid infrastructures. For distributed systems, failure must be considered one
of the natural behaviors and disaster recovery mechanisms must be considered an essential
component of the design.
Self-healing capabilities of resources, services and systems are required. Significant manual
effort should not be required to monitor, diagnose, and repair faults.
Legacy application management. Legacy applications are those that cannot be changed, but they
are too valuable to give up or too complex to rewrite. Grid infrastructure has to be built around
them so that they can continue to be used.
Administration. Be able to “codify” and “automate” the normal practices used to administer the
environment. The goal is that systems should be able to self-organize and self-describe to manage
low-level configuration details based on higher-level configurations and management policies
specified by administrators.
Grouping/aggregation of services. The ability to instantiate (compose) services using some set of
existing services is a key requirement. There are two main types of composition techniques:
selection and aggregation. Selection involves choosing to use a particular service among many
services with the same operational interface. Aggregation involves orchestrating a functional flow
between services.
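The two composition techniques can be sketched in a few lines of Python. This is only an illustration under assumed names: select, aggregate, the latency_ms attribute and the site names are hypothetical, and each "service" is modelled as a plain callable.

```python
def select(candidates, score):
    """Selection: pick one service among many with the same interface."""
    return min(candidates, key=score)


def aggregate(*stages):
    """Aggregation: orchestrate a functional flow between services."""
    def flow(value):
        for stage in stages:
            value = stage(value)      # output of one service feeds the next
        return value
    return flow


# Two interchangeable services exposing the same operation (hypothetical data).
normalizers = [
    {"name": "site-a", "latency_ms": 40, "call": lambda s: s.upper()},
    {"name": "site-b", "latency_ms": 15, "call": lambda s: s.upper()},
]

chosen = select(normalizers, score=lambda svc: svc["latency_ms"])
pipeline = aggregate(str.strip, chosen["call"], lambda s: s + "!")
result = pipeline("  job done ")
```

Here selection picks site-b because it has the lowest latency, and aggregation chains three steps into one composed service.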
SECURITY REQUIREMENTS
Multiple security infrastructures. The applications running on the grid have to interoperate with
other applications or other grids. So there is a requirement of managing the multiple infrastructure
security. Distributed operation implies a need to interoperate with and manage multiple security
infrastructures.
Perimeter security solutions. Many applications on the grid have to run on one side of a firewall,
while users work from the other side. A standard security solution should therefore be deployed to
protect access while allowing cross-firewall interaction.
Certification. A trusted party certifies that a particular service has certain semantic behavior. For
example, a company could establish a policy of only using e-commerce services certified by
Yahoo.
Resource management is another multilevel requirement that deals with SLA negotiation,
provisioning, and scheduling for a variety of resources in the grid environment.
Transport management. For applications that require some form of real-time scheduling, it can
be important to be able to schedule or provision bandwidth dynamically for data transfers or in
support of the other data sharing applications. In many (if not all) commercial applications,
reliable transport management is essential to obtain the end-to-end QoS required by the
application.
Management and monitoring. Support for the management and monitoring of resource usage and
the detection of SLA or contract violations by all relevant parties. Also, conflict management is
necessary.
Scheduling of service tasks. Long recognized as an important capability for any information
processing system, scheduling becomes extremely important and difficult for distributed grid
systems.
Load balancing. In many applications, it is necessary to make sure deadlines are met
or resources are used uniformly. These are both forms of load balancing that must be made
possible by the underlying infrastructure.
Advanced reservation. This functionality may be required in order to execute the application on
reserved resources.
Notification and messaging. Notification and messaging are critical in most dynamic scientific
problems.
Workflow management. Many applications can be wrapped in scripts or processes that require
licenses and other resources from multiple sources. Applications coordinate using the file system
based on events.
Pricing. Mechanisms for determining how to render appropriate bills to users of a grid.
----------------------------------------------------------------------------------------------------------------
DATA-INTENSIVE GRID SERVICE MODELS
Data-intensive applications deal with massive amounts of data. For example, the data produced
annually by the Large Hadron Collider may exceed several petabytes (10^15 bytes). The grid system
must be specially designed to discover, transfer, and manipulate these massive data sets.
Transferring massive data sets is a time-consuming task.
To efficiently manage massive data with low-cost storage and high-speed data movement the
following methods are used by data intensive applications.
Data Replication and Unified Namespace
This data access method is also known as caching, which is often applied to enhance data
efficiency in a grid environment. The data blocks are replicated and stored in multiple regions of a
grid. The users can access these data with locality of references. Replication strategies determine
when and where to create a replica of the data. The factors to consider include data demand,
network conditions, and transfer cost.
Data replication is of two types: (1) static and (2) dynamic.
(1) Static: The location and the number of replicas are determined in advance and will not be
modified.
(2) Dynamic: The location and number of replicas can be adjusted at run time.
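A dynamic strategy can be sketched as follows. This is a minimal illustration, not a strategy taken from any particular data grid: the class DynamicReplicator, the threshold rule and the site names are assumptions, and only data demand is considered (network conditions and transfer cost are ignored).

```python
from collections import Counter


class DynamicReplicator:
    """Creates a replica at a site once demand from that site is high enough."""

    def __init__(self, home_site, threshold):
        self.threshold = threshold        # remote accesses needed before replicating
        self.replicas = {home_site}       # sites that currently hold a copy
        self.demand = Counter()           # remote access count per site

    def access(self, site):
        """Record an access from `site` and report how it was served."""
        if site in self.replicas:
            return "local"
        self.demand[site] += 1
        if self.demand[site] >= self.threshold:
            self.replicas.add(site)       # demand justifies a local replica
        return "remote"
```

A static strategy corresponds to fixing the replicas set in advance and never changing it.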
Multiple participants may want to share the same data collection. To retrieve any piece of data, it
is necessary to have a grid with a unique global namespace and to have unique file names. The
types of grid data access models are:
Monadic model: This is a centralized data repository model. All the data is saved in a
central data repository. When users want to access some data they have to submit requests
directly to the central repository.
Hierarchical model: The hierarchical model is suitable for building a large data grid
which has only one large data access directory. The data is transferred from the
source to a second-level center. Data in this regional center is then transferred to a third-level
center, where the data objects are accessed by users.
Federation model: This data access model is better suited for designing a data grid with
multiple sources of data supplies. Sometimes this model is also known as a mesh model.
The data sources are distributed in different locations and the data are owned and
controlled by their original owners. Authorized users may request data from
any data source.
Hybrid model: This data access model combines the best features of the
hierarchical and mesh models.
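The hierarchical model can be illustrated with a small Python sketch. It assumes a pull-through design in which each center asks its parent for missing data and keeps a copy; the class Center, the level names and the key run42 are hypothetical.

```python
class Center:
    """One level in a hierarchical data grid (source, regional, local, ...)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # next level toward the data source
        self.store = {}           # data held at this level

    def fetch(self, key):
        """Return (value, path), where path lists the centers the data passed through."""
        if key in self.store:
            return self.store[key], [self.name]
        if self.parent is None:
            raise KeyError(key)
        value, path = self.parent.fetch(key)
        self.store[key] = value   # keep a copy at this level for later requests
        return value, path + [self.name]


source = Center("source")
source.store["run42"] = "detector data"
regional = Center("regional", parent=source)
local = Center("local", parent=regional)
```

In the monadic model the chain would have a single level, so every request would go directly to the central repository.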
Parallel versus Striped Data Transfers
Compared with traditional FTP data transfer, parallel data transfer opens multiple data
streams for passing subdivided segments of a file simultaneously. Although the speed of
each stream is the same as in sequential streaming, the total time to move data in all
streams can be significantly reduced compared to FTP transfer.
In striped data transfer, a data object is partitioned into a number of sections, and each
section is placed in an individual site in a data grid. When a user requests this piece of data,
a data stream is created for each site, and all the sections of data objects are transferred
simultaneously. Striped data transfer can utilize the bandwidths of multiple sites more
efficiently to speed up data transfer.
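Parallel transfer can be sketched in Python as below. This is a toy model under stated assumptions: send_segment is a hypothetical stand-in for a real network stream, and ideal_time assumes every stream runs at the same rate with no overhead, which is the best case described above.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data: bytes, n: int):
    """Divide data into at most n contiguous segments."""
    size = max(1, -(-len(data) // n))          # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def send_segment(segment: bytes) -> bytes:
    """Stand-in for one data stream; a real stream would cross the network."""
    return bytes(segment)


def parallel_transfer(data: bytes, streams: int) -> bytes:
    """Send all segments concurrently and reassemble them in order."""
    segments = split(data, streams)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        received = list(pool.map(send_segment, segments))  # map preserves order
    return b"".join(received)


def ideal_time(size_mb: float, rate_mb_per_s: float, streams: int) -> float:
    """Best-case transfer time when each stream runs at the same rate."""
    return size_mb / (rate_mb_per_s * streams)
```

Under these assumptions a 1000 MB file at 10 MB/s per stream needs 100 s with one stream and 25 s with four. A striped transfer would follow the same pattern, except that each segment would be read from a different site.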
OGSA Services
1) Metering Service
The metering service is used to record or measure the resource utilization of shared
resources over the grid infrastructure.
A grid service may consume multiple resources and a resource may be shared by multiple
service instances.
The sharing of underlying resources is managed by middleware and operating systems.
A metering interface provides access to a standard description of such aggregated data
(metering service Data).
A key parameter is the time window over which measurements are aggregated.
In commercial Unix systems, measurements are aggregated at administrator-defined
intervals (chronological entry), usually daily, primarily for the purpose of accounting.
An OGSA metering service must be able to meter the resource consumption of server,
storage, and network resources.
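The role of the time window can be shown with a short Python sketch. The record layout (timestamp, resource, amount) and the function name aggregate_usage are assumptions made for illustration; they are not part of the OGSA metering interface.

```python
from collections import defaultdict


def aggregate_usage(records, window_seconds):
    """Sum usage per (resource, window start) for records of (timestamp, resource, amount)."""
    totals = defaultdict(float)
    for timestamp, resource, amount in records:
        # Align each record to the start of the window it falls into.
        window_start = (timestamp // window_seconds) * window_seconds
        totals[(resource, window_start)] += amount
    return dict(totals)


records = [
    (10, "cpu", 2.0),
    (50, "cpu", 3.0),
    (70, "cpu", 1.0),
    (20, "disk", 5.0),
]
per_minute = aggregate_usage(records, window_seconds=60)
```

Setting window_seconds to 86400 would give the daily aggregation mentioned above.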
----------------------------------------------------------------------------------------------------------------
2) Service Groups and Discovery Services
GSHs and GSRs together form a two-level naming scheme, with HandleResolver services
mapping from handles to references; however, GSHs are not intended to contain semantic
information and indeed may be viewed for most purposes as opaque. Thus, other entities (both
humans and applications) need other means for discovering services with particular properties,
whether relating to interface, function, availability, location, or policy.
Attribute naming schemes associate various metadata with services and support retrieval via
queries on attribute values. A registry implementing such a scheme allows service providers to
publish the existence and properties of the services that they provide, so that service consumers
can discover them. A ServiceGroup is a collection of entries, where each entry is a grid service
implementing the ServiceGroupEntry interface. The ServiceGroup interface also extends the
GridService interface.
Path naming or directory schemes (as used, for example, in file systems) represent an alternative
approach to attribute schemes for organizing services into a hierarchical name space that can be
navigated. The two approaches can be combined, as in LDAP.
-------------------------------------------------------------------------------------------------------------------
3) Rating Service
The rating service is used to rate grid service instances based on metered information. A
rating interface needs to address two types of behaviors. Once the metered information is
available, it has to be translated into financial terms. That is, for each unit of usage, a price has to
be associated with it. This step is accomplished by the rating interfaces, which provide operations
that take the metered information and a rating package as input and output the usage in terms of
chargeable amounts.
Furthermore, when a business service is developed, a rating service is used to aggregate the costs
of the components used to deliver the service, so that the service owner can determine the pricing,
terms, and conditions under which the service will be offered to subscribers.
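The translation from metered usage to chargeable amounts can be sketched in Python. The function names, resource names and prices below are invented for illustration; a rating package is modelled simply as a price per unit for each resource.

```python
from decimal import Decimal


def rate(metered, rating_package):
    """Translate metered usage into a chargeable amount per resource."""
    return {
        resource: units * rating_package[resource]
        for resource, units in metered.items()
    }


def total_cost(charges):
    """Aggregate component costs, e.g. to price a composed business service."""
    return sum(charges.values(), Decimal("0"))


metered = {"cpu_hours": 12, "storage_gb": 200}          # hypothetical usage
package = {                                             # hypothetical prices
    "cpu_hours": Decimal("0.50"),
    "storage_gb": Decimal("0.02"),
}
charges = rate(metered, package)
```

Decimal is used instead of float so that the monetary amounts do not pick up binary rounding errors.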
--------------------------------------------------------------------------------------------------------------------
4) Choreography, Orchestration and Workflow
Choreography describes required patterns of interaction among grid services and templates for
sequences of interactions.
Orchestration describes the ways in which business processes are constructed from Web services
and other business processes, and how these processes interact.
Workflow is a pattern of business process interaction, not necessarily corresponding to a fixed set
of business processes.
This service allows user applications to define a workflow of services, i.e. a sequence of execution,
by means of choreography or orchestration. The steps are:
Definition of a job flow
Assignment of resources to grid flow instances
Scheduling of grid flows
Execution of grid flows
Management and monitoring of grid flows
Failure handling for grid flows
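The definition, scheduling, execution and failure handling of a flow can be sketched with Python's standard graphlib module (Python 3.9 or later). The function run_flow, the task names and the rule "skip every step downstream of a failure" are assumptions made for this example; resource assignment is left out.

```python
from graphlib import TopologicalSorter


def run_flow(tasks, depends_on):
    """Run tasks in dependency order; skip steps whose prerequisites failed.

    tasks maps a step name to a callable taking the results so far.
    depends_on maps a step name to the steps that must finish first.
    """
    order = list(TopologicalSorter(depends_on).static_order())   # scheduling
    results, failed = {}, []
    for name in order:
        if any(dep in failed for dep in depends_on.get(name, ())):
            failed.append(name)           # failure handling: upstream step failed
            continue
        try:
            results[name] = tasks[name](results)                 # execution
        except Exception:
            failed.append(name)
    return results, failed


# Definition of a job flow (hypothetical steps).
tasks = {
    "stage": lambda r: "data",
    "compute": lambda r: r["stage"].upper(),
    "publish": lambda r: r["compute"] + "!",
}
depends_on = {"stage": [], "compute": ["stage"], "publish": ["compute"]}
results, failed = run_flow(tasks, depends_on)
```

The returned results and failed values give a simple basis for monitoring the flow after it has run.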
-----------------------------------------------------------------------------------------------------------------
5) Transaction services
This service is used to execute transactions in the grid infrastructure, for example for supply chain
management and financial services.
--------------------------------------------------------------------------------------------------------------
6) Accounting services
This service is used to manage user and user account information. It is also used to calculate the
monthly usage charges based on the subscription plan.
--------------------------------------------------------------------------------------------------------------
Event—Some occurrence within the state of the grid service or its environment that may be of
interest to third parties. This could be a state change or it could be environmental, such as a timer
event.
Message—An artifact of an event, containing information about an event that some entity wishes
to communicate to other entities.
Topic—A “logical” communications channel and matching mechanism to which a requestor may
subscribe to receive asynchronous messages and publishers may publish messages.
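The relationship between events, messages and topics can be sketched in Python. This is a simplified model: the class Broker and the topic name are hypothetical, and delivery is synchronous here, whereas the text above describes asynchronous messages.

```python
from collections import defaultdict


class Broker:
    """Routes messages from publishers to the subscribers of a topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to every subscriber; return how many received it."""
        receivers = list(self._subscribers[topic])
        for deliver in receivers:
            deliver(message)
        return len(receivers)


broker = Broker()
inbox = []
broker.subscribe("job/state", inbox.append)

# An event (a job changing state) produces a message on a topic.
delivered = broker.publish("job/state", {"job": 7, "state": "finished"})
```

A publisher does not need to know who the subscribers are; the topic is the only thing the two sides share.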
Four base data interfaces (WSDL portTypes) can be used to implement a variety of different data
service behaviors:
1. DataDescription defines OGSI service data elements representing key parameters of the data
virtualization encapsulated by the data service.
2. DataAccess provides operations to access and/or modify the contents of the data virtualization
encapsulated by the data service.
3. DataFactory provides an operation to create a new data service with a data virtualization
derived from the data virtualization of the parent (factory) data service.
4. DataManagement provides operations to monitor and manage the data service’s data
virtualization, including (depending on the implementation) the data sources (such as database
management systems) that underlie the data service.
--------------------------------------------------------------------------------------------------------------------
Data Caching. In order to improve performance of access to remote data items, caching services
will be employed; they may also be used to migrate data for computation or to replicate state for a
given service. Issues that arise include:
Consistency—Is the data in the cache the same as in the source? If not, what is the coherence
window? Different applications have very different requirements.
Cache invalidation protocols—How and when is cached data invalidated?
Write through or write back? When are writes to the cache committed back to the original data
source?
Security—How will access control to cached items be handled? Will access control enforcement
be delegated to the cache, or will access control be somehow
enforced by the original data source?
Integrity of cached data—Is the cached data kept in memory or on disk? How is it protected
from unauthorized access? Is it encrypted?
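The write-through and write-back choices, together with invalidation, can be sketched in Python. This is a toy model under stated assumptions: the class Cache is hypothetical, the original data source is a plain dict, and security and integrity are not modelled.

```python
class Cache:
    """A cache in front of a data source, in write-through or write-back mode."""

    def __init__(self, source, write_back=False):
        self.source = source          # dict standing in for the original data source
        self.write_back = write_back
        self.data = {}                # cached copies
        self.dirty = set()            # keys written but not yet committed

    def read(self, key):
        if key not in self.data:      # miss: fetch from the source
            self.data[key] = self.source[key]
        return self.data[key]

    def write(self, key, value):
        self.data[key] = value
        if self.write_back:
            self.dirty.add(key)       # commit later, on flush
        else:
            self.source[key] = value  # write-through: commit immediately

    def flush(self):
        """Commit all pending write-back entries to the source."""
        for key in self.dirty:
            self.source[key] = self.data[key]
        self.dirty.clear()

    def invalidate(self, key):
        """Drop a cached item so the next read goes back to the source."""
        self.data.pop(key, None)
        self.dirty.discard(key)
```

In write-back mode the interval between write and flush is the coherence window: during it, the cache and the source disagree.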
Discovery Services
Discovery services are concerned with mapping from user-specified criteria
to appropriate GSHs.
--------------------------------------------------------------------------------------------------------------------
The base manageable resource interface, which a resource or resource manager must
provide to be manageable
Canonical lifecycle states—the transitions between the states, and the operations necessary
for the transitions that complement OGSI lifetime service data
The ability to represent relationships among manageable resources including a canonical
set of relationship types
Life cycle metadata (XML attributes) common to all types of managed resources for
monitoring and control of service data and operations based on life cycle state
Canonical services factored out from across multiple resources or domain specific resource
managers, such as an operational port type (start/stop/pause/resume/quiesce).
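The idea of canonical lifecycle states driven by operational port type operations can be sketched as a small state machine in Python. The state names (stopped, running, paused) and the transition table are assumptions made for illustration and are not taken from the CMM specification; the quiesce operation is left out.

```python
# Hypothetical transition table: (current state, operation) -> next state.
TRANSITIONS = {
    ("stopped", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "resume"): "running",
    ("running", "stop"): "stopped",
    ("paused", "stop"): "stopped",
}


class ManagedResource:
    """A manageable resource whose operations depend on its lifecycle state."""

    def __init__(self, name):
        self.name = name
        self.state = "stopped"

    def apply(self, operation):
        key = (self.state, operation)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {operation} while {self.state}")
        self.state = TRANSITIONS[key]
        return self.state

    def allowed_operations(self):
        """Operations that are valid in the current lifecycle state."""
        return sorted(op for (state, op) in TRANSITIONS if state == self.state)
```

Keeping the transitions in one table makes it easy for a manager to ask which operations are valid before invoking them.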
Additional items that may come within the scope of the CMM specification are
New data types or metadata to convey semantic meaning of manageability information,
such as counter or gauge
Versioning information
Metadata to associate a metered usage (unit of measure) with manageability information
Classification of properties such as metric and configuration
Registries and locating fine-grained resources
Managed resource identifier
--------------------------------------------------------------------------------------------------------------------