Unit 4 - Final

software

Uploaded by

Prabha Garan
Copyright
© © All Rights Reserved

Software Architecture Styles

One interesting theory of problem-solving in the programming domain states that programmers solve such problems using programming plans (program fragments that correspond to stereotypical actions) and rules that describe programming conventions. For example, to compute the sum of a series of numbers, a programmer uses the running total loop plan. In this plan, some counter is initialized to zero and incremented with the next value of a series in the body of the loop. Experts tend to recall program fragments that correspond to plan structures before they recall other elements of the program. This nicely maps onto the idea that knowledge is stored in human memory in meaningful units or 'chunks'.

An expert programmer has at his disposal a much larger number of knowledge chunks than a novice programmer. This concerns both programming knowledge and knowledge about the application domain. Both during the search for a solution and during program comprehension, the programmer tries to link up with knowledge already present. As a corollary, part of our education as programmers or software engineers should consist of acquiring a set of useful knowledge chunks.

At the level of algorithms and abstract data types, such a body of knowledge has been accumulated over the years, and has been codified in textbooks and libraries of reusable components. As a result, abstractions such as Quicksort, embodied in procedures, and abstract data types such as Stack and Binary Tree have become part of our vocabulary and are routinely used in our daily work. The concepts embodied in these abstractions are useful during the design, implementation and maintenance of software for the following reasons:

1. They can be used in a variety of settings and can be given unique names. The names are used in communicating the concepts and serve as labels when retrieving and storing them in human memory. The label Quicksort rings the same bell for all people working in our field.
2. We have notations and mechanisms to support their use and reuse, such as procedure calls and the module concept.
3. We have organized related concepts into semantic networks that can be searched for an item that fits the problem at hand. For example, we know the time and space tradeoffs between Quicksort and Bubblesort, or between a standard binary search tree and an AVL tree, and we know the grounds on which to make a choice.

Design patterns are collections of a few modules, or classes, which are often used in combination, and which together provide a useful abstraction. A design pattern is a recurring solution to a standard problem. The prototypical example of a pattern is the MVC or Model-View-Controller pattern known from Smalltalk. We may view design patterns as micro-architectures.

Two further notions often used in this context are application framework and idiom. An application framework is a semi-finished system which needs to be instantiated to obtain a complete system. It describes the architecture of a family of similar systems. It is thus tied to a particular application domain. The best known examples are frameworks for building user interfaces.

An idiom is a low-level pattern, specific to some programming language. For example, the Counted Pointer idiom can be used to handle references to objects dynamically created in C++. It keeps a reference counter which is incremented or decremented when references to an object are added or removed. Memory occupied by an object is freed if no references to that object remain, i.e. when the counter becomes zero. Frameworks and idioms thus offer solutions that are more concrete and language specific than architectural styles or design patterns.

The work in the area of software architecture and design patterns has been strongly influenced by the ideas of the architect Christopher Alexander, as formulated in his books The Timeless Way of Building and A Pattern Language.
The term pattern derives from Alexander's work, and the format used to describe software architectural styles and design patterns is shaped after the format Alexander used to describe his patterns, like alcove, office connection, or public outdoor room.

In software engineering, we often draw a parallel with other engineering disciplines, in particular civil engineering. This comparison is made to highlight both similarities, such as the virtues of a phased approach, and differences, such as the observation that software is logical rather than physical, which hampers the control of progress. The comparison with the field of architecture is often made to illustrate the role of different views, as expressed in the different types of blueprint produced. Each of these blueprints emphasizes a particular aspect.

The classical field of architecture provides some further interesting insights for software architecture. These insights concern:

1. The notion of architectural style.
2. The relationship between style and engineering.
3. The relationship between style and materials.

An architecture is a formal arrangement of architectural elements. An architectural style abstracts from the specifics of an architecture. A decomposition of the library system might, for instance, result in an architecture consisting of one main program and four subroutines, sharing three data stores. If we abstract from these specifics, we obtain its architectural style, in which we concentrate on the types of its elements and their interconnections.

Viewed in this way, an architectural style describes a certain codification of elements and their arrangements. Conversely, an architectural style constrains both the elements and their interrelationships. For example, the Tudor style describes how a certain type of house looks and also prescribes how its design should look. In a similar vein we may characterize a software architectural style such as, say, the pipes-and-filters style.

Different engineering principles apply to different architectural styles. This often goes hand in hand with the types of material used.
Cottage-style houses and high-rise apartments differ in the materials used and the engineering principles applied. A software design based on abstract data types (= material) emphasizes separation of concerns by encapsulating secrets (= engineering principle). A design based on pipes and filters emphasizes bundling of functionality in independent processes.

When selecting a certain style with its corresponding engineering principles and materials, we are guided by the problem to be solved as well as the larger context in which the problem occurs. We cannot build a skyscraper from wooden posts. Environmental regulations may prohibit us from erecting high-rise buildings in rural areas. And the narrow frontages of many houses on the Amsterdam canals are partly due to the fact that local taxes were based on the number of street-facing windows. Similar problem-specific and context-specific elements guide us in the selection of a software architectural style.

These similarities between classical architecture and software architecture provide us with clues as to what constitutes a software architectural style and what its description should look like. In his book A Pattern Language, the architect Christopher Alexander presents 253 patterns, ranging in scale from how a city should look down to rules for the construction of a porch. Perhaps his most famous pattern is about the height of buildings:

"There is abundant evidence to show that high buildings make people crazy. High buildings have no genuine advantage, except in speculative gains to banks and land owners. They are not cheaper, they do not help create open space, they make life difficult for children, they are expensive to maintain, they wreck the open spaces near them, and they damage the light and air and view. But quite apart from this, empirical evidence shows that they can actually damage people's minds and feelings. In any urban area, no matter how dense, keep the majority of buildings four stories or less. It is possible that certain buildings should exceed this limit, but they should never be buildings for human habitation."

An Alexandrian pattern is not a cookbook, black box recipe for architects, any more than a dictionary is a toolkit for a novelist. Rather, a pattern is a flexible generic scheme providing a solution to a problem in a given context. In a narrative form, its application looks like this:

IF you find yourself in <context>, for example <examples>, with <problem>, THEN for some <reasons>, apply <pattern> to construct a solution leading to a <new context> and <other patterns>.

The above Four Story Limit pattern may for example be applied in a context where one has to design a suburb. The citation gives some of the reasons for applying this pattern. If it is followed, it will give rise to the application of other patterns, such as those for planning parking lots, the layout of roads, or the design of individual houses.

A number of well-known software architectural styles are often characterized in a framework that resembles a popular way of describing design patterns. Both the characterization and the framework are framed after Alexander's way of describing patterns. We will use this framework to describe a number of well-known and classic architectural styles. The framework has the following entries:

Problem: A description of the type of problem this style addresses. Certain characteristics of the requirements will guide the designer in his choice of a particular style. For example, if the problem consists of a series of independent transformations, a pipes-and-filters type of architecture suggests itself.

Context: A designer will be constrained in the use of a style by certain characteristics of the environment. Or, to put it the other way around, a style imposes certain requirements on the environment. For example, the pipes-and-filters style usually relies on operating system support for data transfer between filters.

Solution: A description of the solution chosen. The major elements of a software architecture are components and connectors. Components are the building blocks of a software architecture. They usually embody a computational element of some sort (like a procedure), but a component can also be a data store (such as a database). The connectors describe how components interact. Some typical types of component and connector are given in Tables 6 and 7.

Table 6 - Component Types

Type: Description

Computational: The component performs a computation of some sort. Usually, the input and output to the component are fairly simple. The component may have a local state, but this state disappears after the component has done its job. Example components of this type are mathematical functions and filters.

Memory: A memory component maintains a collection of persistent, structured data, to be shared by a number of other components. Examples are a database, a file system, or a symbol table.

Manager: A manager component contains a state and a number of associated operations. When invoked, these operations use or update the state, and this state is retained between successive invocations of the manager's operations. Abstract data types and servers are example components of the manager type.

Controller: A controller governs the time sequence of other events. A top-level control module and a scheduler are examples of a controller type.
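The four component types in Table 6 can be contrasted in a few lines of Python. This is only an illustrative sketch: the names hypotenuse, SymbolTable, Counter and run_in_order are hypothetical, and the in-memory dictionary merely stands in for the persistent store a real memory component would use.

```python
import math


def hypotenuse(a, b):
    """Computational component: its local state vanishes when it returns."""
    squared_sum = a * a + b * b  # local state, discarded after the call
    return math.sqrt(squared_sum)


class SymbolTable:
    """Memory component: structured data shared by other components.

    Assumption: a dict stands in for a persistent store.
    """

    def __init__(self):
        self._symbols = {}

    def define(self, name, kind):
        self._symbols[name] = kind

    def lookup(self, name):
        return self._symbols.get(name)


class Counter:
    """Manager component: state is retained between invocations."""

    def __init__(self):
        self._count = 0

    def increment(self):
        self._count += 1
        return self._count


def run_in_order(steps):
    """Controller component: governs the order in which steps execute."""
    return [step() for step in steps]
```

Calling hypotenuse twice with the same arguments always gives the same answer, whereas calling Counter.increment twice does not; that difference is what separates a computational component from a manager.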

Table 7 - Connector Types

Type: Description

Procedure Call: With this type of connector, there is a single thread of control between the caller and the called component. Control is transferred to the component being called, and this component remains in control until its work has ended. Only then is control transferred back to the calling component. The traditional procedure call and the remote procedure call are examples of this type of connector.

Data Flow: With a data flow connector, processes interact through a stream of data, as in pipes. The components themselves are independent. Once input data to a component is available, it may continue its work.

Implicit Invocation: With implicit invocation, a computation is invoked when a certain event occurs, rather than by explicit interaction, as in a procedure call. Components raising events do not know which component is going to react and invoked components do not know which component raised the event to which they are reacting.

Message Passing: Message passing occurs when we have independent processes that interact through explicit, discrete transfer of data, as in TCP/IP. Message passing can be synchronous, in which case the sending/receiving process is blocked until the message has been completely sent/received, or asynchronous, in which case the processes continue their work independently.

Shared Data: When using shared data connectors, components operate concurrently on the same data space, as in blackboard systems or multiuser databases. Usually, some blocking scheme prevents concurrent writes to the same data.

Instantiation: With instantiation, one component (the instantiator) provides space for the state required by another component (the instantiated), as in abstract data types.
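As an illustration of the message passing connector in Table 7, the sketch below lets two threads exchange discrete messages through a queue. It rests on several assumptions: threads stand in for independent processes, queue.Queue stands in for the transport, and the names producer, consumer and run_exchange are hypothetical. Sending on the unbounded queue does not wait for the receiver (asynchronous), while receiving blocks until a message arrives.

```python
import queue
import threading

STOP = None  # sentinel message; assumes None is never a real payload


def producer(mailbox, items):
    """Send each item as a discrete message, then signal completion."""
    for item in items:
        mailbox.put(item)  # returns immediately on an unbounded queue
    mailbox.put(STOP)


def consumer(mailbox, received):
    """Receive messages until the sentinel arrives."""
    while True:
        message = mailbox.get()  # blocks until a message is available
        if message is STOP:
            break
        received.append(message)


def run_exchange(items):
    """Run one producer and one consumer and return what was received."""
    mailbox = queue.Queue()
    received = []
    workers = [
        threading.Thread(target=producer, args=(mailbox, items)),
        threading.Thread(target=consumer, args=(mailbox, received)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return received
```

Neither function calls the other; the queue is the only thing they share, which is what distinguishes this connector from a procedure call.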

The order of execution of components is governed by the control structure. The control structure captures how control is transferred during execution. The choice of components and connectors is not independent. Usually, a style is characterized by a combination of certain types of component and connector, as well as a certain control structure. The system model captures the intuition behind such a combination.

Variants: Architectural styles give a rather general description. Variants are specializations which differ from the general style.

Examples: References to real examples of a style. Architectural styles do not stem from theoretical investigations, but result from identifying and characterizing best practice.

Copyright :IT ENGG PORTAL www.itportal.in

Tables 8 through 13 contain descriptions of six well-known architectural styles:

1. Main Program with Subroutines
2. Abstract Data Type
3. Implicit Invocation
4. Pipes and Filters
5. Repository
6. Layered

Main Program with Subroutines Architectural Style

In the main program with subroutines architectural style, the main tasks of the system are allocated to different components, which are called, in the appropriate order, from a control component. The decomposition is strongly geared towards an ordering of the various actions to be performed with respect to time. The top-level component controls this ordering.

Table 8 - Architectural Style: Main Program with Subroutines

Problem: The system can be described as a hierarchy of procedure definitions. This style is a natural outcome of a functional decomposition of a system. The top-level module acts as the main program. Its main task is to invoke the other modules in the correct order. As a consequence, there is usually a single thread of control.

Context: This style naturally fits in with programming languages that allow for nested definitions of procedures and modules.

Solution:
System Model: Procedures and modules are defined in a hierarchy. Higher-level modules call lower-level modules. The hierarchy may be strict, in which case modules at level n can only call modules at level n-1, or it may be weak, in which case modules at level n may call modules at level n-i, with i >= 1. Procedures are grouped into modules following such criteria as coupling and cohesion.
Components: Groups of procedures, which may have their own local data, and global data which may be viewed as residing in the main program.
Connectors: Procedure call and shared access to data.
Control Structure: There is a single, centralized thread of control; the main program pulls the strings.

Variants: This style is usually applied to systems running on one CPU. Abstractly, the model is preserved in systems running on multiple CPUs and using the Remote Procedure Call mechanism to invoke the processes.
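A minimal Python sketch of the style in Table 8 follows. The task (summing numbers in a text) and the names parse_numbers, compute_total, format_report and main are hypothetical, chosen only to show a top-level component that calls its subroutines in a fixed order on a single thread of control.

```python
def parse_numbers(text):
    """Subroutine 1: turn whitespace-separated text into integers."""
    return [int(token) for token in text.split()]


def compute_total(numbers):
    """Subroutine 2: a running total loop over the parsed values."""
    total = 0
    for number in numbers:
        total += number
    return total


def format_report(total):
    """Subroutine 3: render the result for output."""
    return f"total={total}"


def main(text):
    """Main program: invokes the subroutines in the correct order."""
    numbers = parse_numbers(text)
    total = compute_total(numbers)
    return format_report(total)
```

The ordering in time is visible only in main; none of the subroutines knows about the others, and control always returns to the caller before the next step starts.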

Abstract Data Type Architectural Style

Components in the main program with subroutines type of decomposition often use shared data storage. Decisions about data representations, then, are a mutual property of the components that use the data. We may also try to make those decisions locally rather than globally. In that case, the user does not get direct access to the data structures, but is offered an interface. The data can only be accessed through appropriate procedure or method calls. This is the essence of the Abstract Data Type Architectural Style.

Table 9 - Architectural Style: Abstract Data Type

Problem: A central issue is to identify and protect related bodies of information. The style is especially suited for cases where the data representation is likely to change during the lifetime of the system. When the design matches the structure of the data in the problem domain, the resulting components encapsulate problem-domain entities and their operations.

Context: Many design methods, most notably the object-oriented ones, provide heuristics to identify real-world objects. These objects are then encapsulated in components of the system. Object-oriented programming languages provide the class concept, which allows us to relate similar objects and reuse code through the inheritance mechanism.

Solution:
System Model: Each component maintains its own local data. Components hide a secret: their representation of data.
Components: The components of this style are managers, such as servers, objects and abstract data types.
Connectors: Operations are invoked through procedure calls (messages).
Control Structure: There is usually a single thread of control. Control is decentralized, however; a component may invoke any component whose services it requires.

Variants: Methods or languages that are not object-oriented only allow us to hide data representations in modules. Object-oriented methods or languages differ as regards their facilities for relating similar objects (single or multiple inheritance) and their binding of messages to operations (at compile time or run time).
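The idea of hiding a data representation behind an interface can be sketched with a small stack type in Python. The class name Stack and its method names are illustrative assumptions; the point is that callers never touch the underlying list, so it could be swapped for another representation without changing them.

```python
class Stack:
    """Abstract data type: clients see operations, not the representation."""

    def __init__(self):
        # The secret: a Python list. A linked structure would work as well,
        # and no client code would need to change.
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self):
        return not self._items
```

A client only writes stack.push(x) and stack.pop(); a change of representation stays local to the class, which is the property Table 9 highlights.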

Implicit Invocation Architectural Style

A major advantage of abstract data types over shared data is that changes in data representation and algorithms can be accomplished relatively easily. Changes in functionality, however, may be much harder to realize. This is because method invocations are explicit, hard-coded in the implementation. An alternative is to use the Implicit Invocation Style.

Table 10 - Architectural Style: Implicit Invocation

Problem: We have a loosely coupled collection of components, each of which carries out some task and may enable other operations. The major characteristic of this style is that it does not bind recipients of signals to their originators. It is especially useful for applications that need to be able to be reconfigured, by changing a service provider or by enabling and disabling operations.

Context: This style usually requires an event handler that registers components' interests and notifies others. Because of the intrinsically decentralized nature of systems designed this way, correctness arguments are difficult. For the same reason, building a mental model of such systems during program comprehension is difficult, too.

Solution:
System Model: Processes are independent and reactive. Processes are not invoked explicitly, but implicitly through the raising of an event.
Components: Components are processes that signal events without knowing which component is going to react to them. Conversely, processes react to events raised somewhere in the system.
Connectors: Components are connected through the automatic invocation of processes that have registered interest in certain events.
Control Structure: Control is decentralized. Individual components are not aware of the recipients of signals.

Variants: There are two major categories of systems exploiting implicit invocation. The first category comprises so-called tool-integration frameworks, as exemplified by many software development support environments. They consist of a number of 'toolies' running as separate processes. Events are handled by a separate dispatcher process which uses some underlying operating system support such as UNIX sockets. The second category consists of languages with specialized notations and support for implicit invocation, such as the 'when-updated' features of some object-oriented languages.
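The event handler mentioned in Table 10 can be sketched as a small in-process dispatcher. This is an assumption-laden illustration, not a description of any particular framework: the class EventDispatcher and its methods register, unregister and raise_event are hypothetical names, and handlers run synchronously in the caller's thread rather than as separate processes.

```python
from collections import defaultdict


class EventDispatcher:
    """Registers interest in events and invokes handlers implicitly."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, event_name, handler):
        """Record that handler wants to react to event_name."""
        self._handlers[event_name].append(handler)

    def unregister(self, event_name, handler):
        """Withdraw a previously registered interest."""
        self._handlers[event_name].remove(handler)

    def raise_event(self, event_name, payload):
        """Invoke every registered handler; the raiser never names them."""
        for handler in list(self._handlers[event_name]):
            handler(payload)
```

Functionality is changed by registering or unregistering handlers, not by editing the component that raises the event. For example, a logging handler can be added for a hypothetical "book_returned" event without touching the code that raises it.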

In implicit invocation, a component is not invoked explicitly. Instead, an event is generated. Other components in the system may express their interest in this event by associating a method with it; this method is automatically invoked each time the event is raised. Functional changes can be realized easily by changing the list of events that components are interested in.

Pipes and Filters Architectural Style

Some applications consist of a series of components in which component i produces output which is read and processed by component i+1, in the same order in which it is written by component i. In such cases, we need not explicitly create these intermediate data structures. Rather, we may use the pipes-and-filters method of operation that is well known from UNIX, and directly feed the output of one transformation into the next one. The components are called filters and the FIFO connectors are called pipes.

Table 11 - Architectural Style: Pipes and Filters

Problem: A series of independent sequential transformations on ordered data. Usually, the transformations are incremental. Often, the structure of the datastreams is very simple: a sequence of ASCII characters. If the data has a rich structure, this will imply quite some overhead for the parsing and unparsing of the data.

Context: This style requires that the system can be decomposed into a series of computations, filters, that incrementally transform one or more input streams. It usually relies on operating system operations to transfer the data from one process to another (pipes). Error handling is difficult to deal with uniformly in a collection of filters.

Solution:
System Model: The resulting systems are characterized by continuous data flow between components, where the components naturally transform datastreams.
Components: The components are filters that perform local processing; i.e. they read part of their input data, transform the data, and produce part of their output. They have little internal state.
Connectors: Datastreams (usually plain ASCII, as in UNIX).
Control Structure: Data flow between components. Each component usually has its own thread of control.

Variants: Pure filters have little internal state and process their input locally. In the degenerate case, they consume all of their input before producing any output. In that case, the result boils down to a batch-processing type of system.
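The incremental behaviour of filters described in Table 11 can be approximated with Python generators. This is a sketch under stated assumptions: the filter names drop_blank, to_upper and number_lines and the helper pipeline are hypothetical, and generators run in one thread, whereas UNIX filters each have their own thread of control.

```python
def drop_blank(lines):
    """Filter: pass on only non-blank lines."""
    for line in lines:
        if line.strip():
            yield line


def to_upper(lines):
    """Filter: transform each line as it arrives."""
    for line in lines:
        yield line.upper()


def number_lines(lines):
    """Filter: prefix each line with its position in the stream."""
    for index, line in enumerate(lines, start=1):
        yield f"{index}: {line}"


def pipeline(source, *filters):
    """Connect filters so each one reads the output of the previous one."""
    stream = source
    for stage in filters:
        stream = stage(stream)
    return stream
```

Each filter yields output as soon as it has read enough input, so no intermediate list is built between stages. Calling list(pipeline(["a", "", "b"], drop_blank, to_upper, number_lines)) pulls the items through all three stages one at a time.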

An important characteristic of this scheme is that any structure imposed on the data to be passed between adjacent filters has to be explicitly encoded in the datastream that connects these filters. This encoding scheme involves decisions which must be known to both filters. The data has to be unparsed by one filter while the next filter must parse its input in order to rebuild that structure.

The Achilles' heel of the pipes-and-filters scheme is error handling. If one filter detects an error, it is cumbersome to pass the resulting error message through intermediate filters all the way to the final output. Filters must also be able to resynchronize after an error has been detected, and filters further downstream must be able to tolerate incomplete input.

Repository Architectural Style

The repository style fits situations where the main issue is to manage a richly structured body of information. In the library example in the section on Requirements Engineering, the data concerns such things as the stock of available books and the collection of members of the library. The data is persistent and it is important that it always reflects the true state of affairs. A natural approach to this problem is to devise database schemas for the various types of data in the application and store the data in one or more databases. The functionality of the system is incorporated in a number of relatively independent computational elements. The result is a repository architectural style.

Table 12 - Architectural Style: Repository

Problem: The central issue is maintaining a richly structured body of information. The information must typically be manipulated in many different ways. The data is long-lived and its integrity is important.

Context: This style often requires considerable support, in the form of a run-time system augmented with a database. Data definitions may have to be processed to generate support to maintain the correct structure of the data.

Solution:
System Model: The major characteristic of this model is its centralized, richly structured body of information. The computational elements acting upon the repository are often independent.
Components: There is one memory component and many computational processes.
Connectors: Computational units interact with the memory component by direct access or procedure call.
Control Structure: The control structure varies. In traditional database systems, for example, control depends on the input to the database functions. In a modern compiler, control is fixed: processes are sequential and incremental. In blackboard systems, control depends on the state of the computation.

Variants: Traditional database systems are characterized by their transaction-oriented nature. The computational processes are independent and triggered by incoming requests. Modern compilers, and software development support environments, are systems that increment the information contained in the repository. Blackboard systems have their origin in AI. They have been used for complex applications such as speech recognition, in which different computational elements each solve part of the problem and update the information on the blackboard.
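The blackboard variant in Table 12, where control depends on the state of the computation, can be sketched as follows. Everything here is a toy assumption: a dictionary plays the role of the repository, the knowledge sources tokenize and count_tokens are hypothetical, and the loop simply keeps offering the blackboard to every source until none of them can contribute.

```python
def tokenize(board):
    """Knowledge source: acts only when raw text is present and tokens are not."""
    if "text" in board and "tokens" not in board:
        board["tokens"] = board["text"].split()
        return True
    return False


def count_tokens(board):
    """Knowledge source: acts only once tokens have appeared."""
    if "tokens" in board and "count" not in board:
        board["count"] = len(board["tokens"])
        return True
    return False


def run_blackboard(board, sources):
    """Let sources enrich the shared board until no one makes progress."""
    progress = True
    while progress:
        progress = False
        for source in sources:
            if source(board):  # each source inspects the state and decides
                progress = True
    return board
```

The order in which the sources are listed does not matter: count_tokens does nothing until tokenize has enriched the board, because invocation is driven by the board's state rather than by a fixed call sequence.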

Modern compilers are often structured in a similar way. Such a compiler maintains a central representation of the program to be translated. A rudimentary version of that representation results from the first, lexical, phase: a sequence of tokens rather than a sequence of character glyphs. Subsequent phases, such as syntax and semantic analysis, further enrich this structure into, for example, an abstract syntax tree. In the end, code is generated from this representation. Other tools, such as symbolic debuggers, pretty-printing programs or static analysis tools, may also employ the internal representation built by the compiler. The resulting architectural style again is that of a repository: one memory component and a number of computational elements that act on the repository. Unlike the database variant, the order of invocation of the elements matters in the case of a compiler. Also, different computational elements enrich the internal representation, rather than merely update it.

The repository architectural style can also be found in certain AI applications. In computationally complex applications, such as speech recognition, an internal representation is built and acted upon by different computational elements. For example, one computational element may filter noise, another one builds up phonemes, etc. The internal representation in this type of system is called a blackboard and the architecture is sometimes referred to as a blackboard architecture. A major difference with traditional database systems is that the invocation of computational elements in a blackboard architecture is triggered by the current state of the blackboard, rather than by external inputs. Elements from a blackboard architecture enrich and refine the state representation until a solution is found.

Layers Architectural Style

The final example of an architectural style is the layered architectural style. A prototypical instance is the ISO Open System Interconnection Model for network communication. It has seven layers:

1. Physical
2. Data Link
3. Network
4. Transport
5. Session
6. Presentation
7. Application

The bottom layer provides basic functionality. Higher layers use the functionality of lower layers. The different layers can be viewed as virtual machines whose 'instructions' become more powerful and abstract as we go from lower layers to higher layers.

Table 13 - Architectural Style: Layers

Problem: We can identify distinct classes of services that can be arranged hierarchically. The system can be depicted as a series of concentric circles, where services in one layer depend on services from inner layers. Quite often, such a system is split into three layers:
1. Basic Services
2. General Utilities
3. Application Specific Utilities

Context: Each class of service has to be assigned to a specific layer. It may occasionally be difficult to properly identify the function of a layer succinctly and, as a consequence, to assign a given function to the most appropriate layer. This holds the more if we restrict visibility to just one layer.

Solution:
System Model: The resulting system consists of a hierarchy of layers. Usually, visibility of inner layers is restricted.
Components: The components in each layer usually consist of collections of procedures.
Connectors: Components generally interact through procedure calls. Because of the limited visibility, the interaction is limited.
Control Structure: The system has a single thread of control.

Variants: A layer may be viewed as a virtual machine, offering a set of 'instructions' to the next layer. Viewed this way, the peripheral layers get more and more abstract. Layering may also result from a wish to separate functionality, e.g. into a user interface layer and an application logic layer. Variants of the layered scheme may differ as to the visibility of components to outer layers. In the most constrained case, visibility is limited to the next layer up.
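The three-layer split named in Table 13 can be sketched in Python with the most constrained visibility rule, in which each layer holds a reference only to the layer directly beneath it. The class and method names (BasicServices, GeneralUtilities, ApplicationServices, register_member and so on) are hypothetical, and the library-member example is only an illustration.

```python
class BasicServices:
    """Innermost layer: raw key/value storage."""

    def __init__(self):
        self._store = {}

    def write(self, key, value):
        self._store[key] = value

    def read(self, key):
        return self._store.get(key)


class GeneralUtilities:
    """Middle layer: record handling, built only on BasicServices."""

    def __init__(self, basic):
        self._basic = basic

    def save_record(self, record_id, fields):
        self._basic.write(f"record:{record_id}", dict(fields))

    def load_record(self, record_id):
        return self._basic.read(f"record:{record_id}")


class ApplicationServices:
    """Outer layer: sees GeneralUtilities only, never BasicServices."""

    def __init__(self, utilities):
        self._utilities = utilities

    def register_member(self, member_id, name):
        self._utilities.save_record(member_id, {"name": name})

    def member_name(self, member_id):
        record = self._utilities.load_record(member_id)
        return None if record is None else record["name"]
```

Each layer offers more abstract 'instructions' than the one below it: the application layer talks about members, the utilities layer about records, and the basic layer about keys and values.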

In a layered scheme, by definition, lower levels cannot use the functionality offered by higher levels. The other way around, the situation is more varied. We may choose to allow layer n to use the functionality of each layer m, with m < n. We may also choose to limit the visibility of the functionality offered by each layer and, for example, restrict layer n to use only the functionality offered by layer n-1. A design issue in each case is how to assign functionality to the different layers of the architecture, i.e. how to characterize the virtual machine each layer embodies. If visibility is not restricted, some of the elegance of the layered architecture gets lost. This situation resembles that of programming languages containing low-level bit manipulation operations alongside WHILE statements and procedure calls. If visibility is restricted, we may end up copying functionality to higher levels without increasing the level of abstraction.

Consider an example of a layered architectural style for use in telecommunications. In this example, layers do not necessarily correspond to different levels of abstraction. Rather, the functionality of the system has been separated. Two main guidelines drive the assignment of functionality to layers in this architecture:

1. Hardware-dependent functionality should be placed in lower layers than application-dependent functionality.
2. Generic functionality should be placed in lower layers than specific functionality.

The resulting architecture has four layers:

1. Operating System: This layer comprises the runtime system, database, memory management and so on.
2. Equipment Maintenance: This layer houses the control for peripheral devices and their interconnection structure. It deals with such things as data distribution and fault handling of peripheral hardware. The bottom two layers together constitute the distributed operating infrastructure upon which applications run.
3. Logical-Resource Management: Logical resources can be divided into two classes. The first class contains abstractions from hardware objects. The second class consists of software-related logical objects, such as those for call forwarding in telephony.
4. Service Management: This layer contains the application functionality.

A similar line of thought can be followed in other domains. For instance, it is hard to predict how future household equipment will be assembled into hardware boxes. Will the PC and the television share the same box? Will the television and the DVD player be combined, or will they remain separate boxes? No one seems to know. Since the half-life of many of these products is about six months, industry is forced to use a building-block approach, emphasizing reuse and the development of product families rather than products. A division of functionality into a hardware-related inner layer, a generic signal-processing layer, and a user-oriented service layer suggests itself. The above architecture for telecommunications applications can be understood along the same lines.

In practice, we will usually encounter a mixture of architectural styles. For example, many software environments can be characterized as a combination of the repository and layered architectural styles. The core of the system is a repository in which the various objects, ranging from program texts to work-breakdown structures, reside. Access to these objects, as well as basic mechanisms for the execution and communication of tools, is contained in a layer on top of this repository. The tools themselves are configured in one or more layers on top of these basic layers. Interaction between tools may follow yet another paradigm, such as implicit invocation.
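The two visibility choices described above (layer n may use every layer m < n, or only layer n-1) can be sketched as a small rule check. This is only an illustration: the function name `may_use` and the `strict` flag are assumptions introduced here, and the layer names are borrowed from the telecommunications example.

```python
# Layer names taken from the telecommunications example; index 0 is the lowest layer.
LAYERS = [
    "Operating System",
    "Equipment Maintenance",
    "Logical-Resource Management",
    "Service Management",
]


def may_use(user, used, strict):
    """Return True if layer `user` is allowed to call layer `used`.

    `may_use` and `strict` are hypothetical names used for illustration.
    """
    n, m = LAYERS.index(user), LAYERS.index(used)
    if strict:
        # Restricted visibility: layer n sees only layer n-1.
        return m == n - 1
    # Unrestricted visibility: layer n sees every layer m with m < n.
    return m < n
```

With `strict=False`, Service Management may call the Operating System layer directly; with `strict=True` it may not, so any operating-system facility it needs has to be re-exported by the layers in between, which is the copying of functionality mentioned above. In both modes a lower layer can never use a higher one.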

Introduction to 3-Tier Architecture

Introduction

The .NET Framework and Visual Studio present developers with many architectural choices: placing the data access code directly in the UI through datasets and data source controls, creating a data access layer that talks to the database, or building an n-tier architecture that consists of multiple layers and uses data-transfer objects to pass data back and forth. If you've ever wondered why you should use layers and what the benefits are, this article is for you. It delves into the use of layers and how they can benefit any application.

What is a Layer?

A layer is a reusable portion of code that performs a specific function. In the .NET environment, a layer is usually set up as a project that represents this specific function. The layer is in charge of working with other layers to achieve some specific goal. In an application where the presentation layer needs to extract information from a back-end database, the presentation layer would use a series of layers to retrieve the data, rather than having the database calls embedded directly within itself. Let's briefly look at the latter situation first.

Two-Tier Architecture

When the .NET 2.0 framework became available, it included some neat features that allowed the developer to connect the framework's GUI controls directly to the database. This approach is very handy when developing applications rapidly. However, it's not always favorable to embed all of the business logic and data access code directly in the web site, for several reasons:

- Putting all of the code (business logic and data access) in the web site can make the application harder to maintain and understand.
- Database queries in the presentation layer are often not reused, because of the typical data source control setup in the ASP.NET framework.
- Relying on the data source controls can make debugging more difficult, often due to vague error messages.

So, in looking for an alternative, we can separate the data access code and the business logic into separate layers, which we'll discuss next.

The Data Layer

The key component of most applications is the data. The data has to be served to the presentation layer somehow. The data layer is a separate component (often set up as a single project or a group of projects in a .NET solution) whose sole purpose is to retrieve data from the database and return it to the caller. Through this approach, data access can be logically reused: each portion of an application that needs the same query calls one data layer method, instead of embedding the query multiple times. This is generally more maintainable.
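To make the idea of a data layer concrete, here is a minimal sketch. The article itself targets .NET; this sketch uses Python with the standard `sqlite3` module only to keep it self-contained. The class `CustomerData`, the `customers` table, and the sample rows are assumptions made for illustration and do not come from the article.

```python
import sqlite3


class CustomerData:
    """Hypothetical data layer: the only place in the application that contains SQL."""

    def __init__(self, connection):
        self._conn = connection

    def get_by_city(self, city):
        # The query is written once and reused by every caller.
        rows = self._conn.execute(
            "SELECT id, name FROM customers WHERE city = ? ORDER BY id",
            (city,),
        )
        return rows.fetchall()


def make_demo_connection():
    """Build an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (name, city) VALUES (?, ?)",
        [("Asha", "Chennai"), ("Ben", "Pune"), ("Chitra", "Chennai")],
    )
    return conn
```

Any part of the application that needs customers by city calls `CustomerData(conn).get_by_city("Chennai")`; if the query changes, it changes in one place.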

But the question is: how is the data returned? Different frameworks employ different techniques; below is a summary:

- ADO.NET: Built into the .NET framework, ADO.NET contains a mechanism to query data out of the database and return it to the caller in a connected or disconnected fashion. This is the most common approach to working with data, because it's readily available. See more at: http://en.wikipedia.org/wiki/ADO.NET.
- Table Adapters/Strongly-Typed Datasets: Strongly-typed datasets and table adapters provide a similar means of querying the data through ADO.NET, but add strong-typing features, meaning custom objects are generated for you to work with.
- Enterprise Library: The Enterprise Library Data Access Application Block provides a flexible way to connect to databases of multiple types through an abstract approach, without having to know anything about the specific database. See more at: http://msdn2.microsoft.com/en-us/magazine/cc188705.aspx (read part one first).
- LINQ-to-SQL: LINQ to SQL is an ORM tool that uses a DataContext object as the central point for querying data from the database.
- Auto-Generated Code: Tools like CodeSmith Studio automatically generate the code for you based upon a database schema. You simply write a script that outputs the code you want to use, and the back end is generated in a short amount of time.

Most (if not all) of the options above support the CRUD (create, read, update, delete) operations that databases provide. There are plenty of resources online to help you get started.

Business Layer

Though a web site could talk to the data access layer directly, it usually goes through another layer called the business layer. The business layer is vital in that it validates the input conditions before calling a method from the data layer. This ensures the data input is correct before proceeding, and can often ensure that the outputs are correct as well. These validation checks are called business rules, meaning the rules that the business layer uses to make judgments about the data. However, business rules don't apply only to data validation; they also cover any calculation or other action that takes place in the business layer. Normally, it's best to put as much logic as possible in the business layer, which makes this logic reusable across applications. One of the best reasons for reusing logic is that applications that start off small usually grow in functionality. For instance, a company begins to develop a web site and, as it realizes its business needs, later decides to add a smart client application and a Windows service to supplement the web site. The business layer helps move logic to a central layer for maximum reusability.

Presentation Layer

The ASP.NET web site or Windows Forms application (the UI for the project) is called the presentation layer. The presentation layer is the most important layer simply because it's the one that everyone sees and uses. Even with a well-structured business and data layer, a poorly designed presentation layer gives users a poor view of the system. It's best to move as much business logic as possible out of the UI and into the business layer. This usually involves more code but, in my mind, the extra time (which ranges from minimal to moderate, depending on the size of the application) pays off in the end. However, a well-architected system leaves another question: how do you display its data in an ASP.NET or Windows application? This can be more of a problem in ASP.NET, as the controls are more limited in the types of input they can receive. If you use certain architectures, like passing datasets from the data layer to the presentation layer, this isn't as much of a challenge; the challenge can come with business objects that support drill-through business object references.

Why Is Separating Logic Useful?

You may wonder why it is important to move as much logic as possible out of the presentation layer and into the business layer. The biggest reason is reuse: logic placed in a business layer increases the reusability of an application. As applications grow, they often grow into other realms. An application may start out as a web application, but some of its functionality may later be moved to a smart client application. Portions of an application may be split between a web site and a web or Windows service that runs on a server. In addition, keeping logic in the business layer aids in developing a good design (code can sometimes get sloppier in the UI). However, there is a caveat: it takes a little longer to develop applications when most of the logic resides in the business layer, because this often involves creating several sets of objects (data layer and access code, plus business objects) rather than embedding the logic in the application. The extra time that this takes can be a turnoff for some managers and project leads, especially because it often requires more knowledge of object-oriented programming than most people are comfortable with.

A layered approach is often a better approach because it pays dividends down the road. Without layers, the following happens as more and more code is developed:

- Code is copied and pasted frequently, or code is reused in classes when it could easily be moved to a business layer.
- Code that is very similar is often copied and pasted with slight modification, making duplication harder to track down.
- The code is harder to maintain; even though applications with business objects are larger, they are usually structured better.
- Code is harder to unit test, if unit testing is available at all. Web applications and Windows Forms projects are hard to unit test.
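The points about business rules and unit testing can be illustrated with a minimal sketch. It is written in Python rather than .NET purely for brevity, and every name in it (`OrderService`, `InMemoryOrderData`, `ValidationError`, the quantity limit) is an assumption made for illustration, not something taken from the article.

```python
class ValidationError(Exception):
    """Raised when input breaks a business rule."""


class InMemoryOrderData:
    """Stand-in for a data layer, so the rule can be tested without a database."""

    def __init__(self):
        self.saved = []

    def insert_order(self, customer_id, quantity):
        self.saved.append((customer_id, quantity))


class OrderService:
    """Hypothetical business layer: validates input, then calls the data layer."""

    MAX_QUANTITY = 100  # assumed business rule, for illustration only

    def __init__(self, data_layer):
        self._data = data_layer

    def place_order(self, customer_id, quantity):
        # Business rule: reject the input before it ever reaches the data layer.
        if quantity <= 0 or quantity > self.MAX_QUANTITY:
            raise ValidationError(
                f"quantity must be between 1 and {self.MAX_QUANTITY}"
            )
        self._data.insert_order(customer_id, quantity)
```

Because the rule lives outside the UI, a web page, a smart client, and a service can all call `place_order`, and the rule can be exercised in a plain unit test by passing in the in-memory stand-in.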

A good architecture is often harder to implement, but it is easier to maintain because it often reduces the volume of code. This means that fewer hours are spent supporting the application.

Distributed Applications

Separating code into layers can aid in the development of distributed applications. Because the code is broken up into layers, a layer that facilitates the use of remoting or web services can be added to the project with a minimal amount of work.

Development Techniques

When developing a business object architecture, it's good to know about the many design patterns that are out there. There are many websites, blogs, and books on the subject of design patterns. One of the better-known books is Design Patterns, whose authors are often referred to as the Gang of Four. Another useful development technique is refactoring: improving the quality of your code by making small changes to its structure. This involves moving code into a method, or moving a method from one object to another, in a systematic, logical way. Martin Fowler has written a great book on this subject, called Refactoring: Improving the Design of Existing Code. There are plenty of books on the subject; this is the one that helped me understand refactoring the most. There are also tools on the market that can help you refactor faster. One of those tools is ReSharper by JetBrains, which looks for many code patterns and refactors them in a useful way. Other refactoring tools that I have heard about are Refactor Pro by DevExpress (free for VB.NET and ASP.NET), Visual Assist X by Whole Tomato Software, and Just Code by OmniCore.

Conclusion

This article reviewed the use of layers in an application and discussed the fundamentals of their use. It also discussed the purpose of each layer, why using layers is important, and some other techniques useful for developing applications.

---------------------------------------------------------------

Loose coupling

Service-Oriented Architecture

Web services promote an environment for systems that is loosely coupled and interoperable.
Many of the concepts for Web services come from a conceptual architecture called service-oriented architecture (SOA). SOA configures entities (services, registries, contracts, and proxies) to maximize loose coupling and reuse. Software architecture describes the system's components and the way they interact at a high level. These components are not necessarily entity beans or distributed objects. They are abstract modules of software deployed as a unit onto a server with other components. The interactions between components are called connectors. The configuration of components and connectors describes the way a system is structured and behaves, as shown in Figure 2.1. Rather than creating a formal definition for software architecture in this chapter, we will adopt this classic definition: The software architecture of a program or computing system is the structure or structures of the system, which comprise software components, the externally visible properties of those components, and the relationships among them.

SOA is a relatively new term, but the term service as it relates to a software service has been around since at least the early 1990s, when it was used in Tuxedo to describe services and service processes (Herzum 2002). Sun defined SOA more rigorously in the late 1990s to describe Jini, a lightweight environment for dynamically discovering and using services on a network. The technology is used mostly to allow network plug and play for devices. It allows devices such as printers to dynamically connect to the network, download drivers from it, and register their services as being available.

Loose Coupling

Coupling refers to the number of dependencies between modules. There are two types of coupling: loose and tight. Loosely coupled modules have a few well-known dependencies. Tightly coupled modules have many unknown dependencies. Every software architecture strives to achieve loose coupling between modules. Service-oriented architecture promotes loose coupling between service consumers and service providers, and the idea of a few well-known dependencies between them.

A system's degree of coupling directly affects its modifiability. The more tightly coupled a system is, the more a change in a service will require changes in service consumers. Coupling increases when service consumers require a large amount of information about the service provider to use the service. In other words, if a service consumer knows the location and detailed data format of a service provider, the consumer and provider are more tightly coupled. If the consumer does not need detailed knowledge of the service before invoking it, the consumer and provider are more loosely coupled.

SOA accomplishes loose coupling through the use of contracts and bindings. A consumer asks a third-party registry for information about the type of service it wishes to use. The registry returns all the services it has available that match the consumer's criteria. The consumer chooses which service to use, binds to it over a transport, and executes the method on it, based on the description of the service provided by the registry. The consumer does not depend directly on the service's implementation but only on the contract the service supports. Since a service may be both a consumer and a provider of some services, the dependency on only the contract enforces the notion of loose coupling in service-oriented architecture.

Although coupling between service consumers and service producers is loose, the implementation of a service can be tightly coupled with the implementation of other services. For instance, if a set of services shares a framework or a database, or otherwise has information about each other's implementation, they may be tightly coupled. In many instances, coupling cannot be avoided, and it sometimes contradicts the goal of code reusability.

----------------------------------------------------------
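The find, bind, and invoke sequence described above can be sketched in a few lines. This is an in-process toy under stated assumptions, not a real SOA stack: there is no network transport, and the names `QuoteService`, `FixedQuoteProvider`, `Registry`, and the service type `"quotes"` are all hypothetical.

```python
from typing import Callable, Dict, List, Protocol


class QuoteService(Protocol):
    """The contract: the only thing a consumer depends on."""

    def quote(self, symbol: str) -> float: ...


class FixedQuoteProvider:
    """One hypothetical provider; the consumer never refers to this class by name."""

    def quote(self, symbol: str) -> float:
        return {"ABC": 12.5}.get(symbol, 0.0)


class Registry:
    """Hypothetical third-party registry mapping a service type to provider factories."""

    def __init__(self) -> None:
        self._providers: Dict[str, List[Callable[[], QuoteService]]] = {}

    def publish(self, service_type: str, factory: Callable[[], QuoteService]) -> None:
        self._providers.setdefault(service_type, []).append(factory)

    def find(self, service_type: str) -> List[Callable[[], QuoteService]]:
        return list(self._providers.get(service_type, []))


def consumer(registry: Registry, symbol: str) -> float:
    factories = registry.find("quotes")  # ask the registry for matching services
    service = factories[0]()             # choose one and bind to it
    return service.quote(symbol)         # invoke through the contract only
```

Replacing `FixedQuoteProvider` with another class that satisfies the same contract requires no change to `consumer`, which is the modifiability benefit the text attributes to loose coupling.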


XML

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is defined in the XML 1.0 Specification produced by the W3C, and in several other related specifications, all of them gratis open standards. The design goals of XML emphasize simplicity, generality, and usability over the Internet. It is a textual data format with strong support via Unicode for the languages of the world. Although the design of XML focuses on documents, it is widely used for the representation of arbitrary data structures, for example in web services. Many application programming interfaces (APIs) have been developed that software developers use to process XML data, and several schema systems exist to aid in the definition of XML-based languages. As of 2009, hundreds of XML-based languages have been developed, including RSS, Atom, SOAP, and XHTML. XML-based formats have become the default for many office-productivity tools, including Microsoft Office (Office Open XML), OpenOffice.org (OpenDocument), and Apple's iWork. XML has also been employed as the base language for communication protocols, such as XMPP.

Key terminology

This section introduces the key constructs most often encountered in day-to-day use.

(Unicode) Character: By definition, an XML document is a string of characters. Almost every legal Unicode character may appear in an XML document.

Processor and Application: The processor analyzes the markup and passes structured information to an application. The specification places requirements on what an XML processor must do and not do, but the application is outside its scope. The processor (as the specification calls it) is often referred to colloquially as an XML parser.

Markup and Content: The characters which make up an XML document are divided into markup and content. Markup and content may be distinguished by the application of simple syntactic rules. All strings which constitute markup either begin with the character "<" and end with a ">", or begin with the character "&" and end with a ";". Strings of characters which are not markup are content.

Tag: A markup construct that begins with "<" and ends with ">". Tags come in three flavors: start-tags, for example <section>, end-tags, for example </section>, and empty-element tags, for example <line-break />.
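The "simple syntactic rules" that separate markup from content can be sketched with a regular expression. This is a deliberate simplification and not an XML parser: it assumes no comments, CDATA sections, or processing instructions containing ">", and it silently skips a stray "&" that has no closing ";". The name `split_markup` is introduced here for illustration.

```python
import re

# Markup is "<...>" or "&...;"; every other run of characters is content.
TOKEN = re.compile(r"<[^>]*>|&[^;<&]*;|[^<&]+")


def split_markup(text):
    """Return (kind, string) pairs, where kind is 'markup' or 'content'."""
    return [
        ("markup" if token[0] in "<&" else "content", token)
        for token in TOKEN.findall(text)
    ]
```

For the input `<p>a &amp; b</p>` the function yields the start-tag, the content `a `, the entity reference `&amp;`, the content ` b`, and the end-tag, in that order.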

Element: A logical document component which either begins with a start-tag and ends with a matching end-tag or consists only of an empty-element tag. The characters between the start- and end-tags, if any, are the element's content, and may contain markup, including other elements, which are called child elements. An example of an element is <Greeting>Hello, world.</Greeting>. Another is <line-break />.

Attribute: A markup construct consisting of a name/value pair that exists within a start-tag or empty-element tag. In the following example the element img has two attributes, src and alt: <img src="madonna.jpg" alt='Foligno Madonna, by Raphael' />. Another example would be <step number="3">Connect A to B.</step>, where the name of the attribute is "number" and the value is "3".

XML Declaration: XML documents may begin by declaring some information about themselves, as in the following example.

<?xml version="1.0" encoding="UTF-8" ?>

Characters and escaping

XML documents consist entirely of characters from the Unicode repertoire. Except for a small number of specifically excluded control characters, any character defined by Unicode may appear within the content of an XML document. The selection of characters that may appear within markup is somewhat more limited but still large. XML includes facilities for identifying the encoding of the Unicode characters that make up the document, and for expressing characters that, for one reason or another, cannot be used directly.

Valid characters

Unicode code points in the following ranges are valid in XML 1.0 documents:

- U+0009, U+000A, U+000D: these are the only C0 controls accepted in XML 1.0;
- U+0020–U+D7FF, U+E000–U+FFFD: this excludes some (not all) non-characters in the BMP (all surrogates, U+FFFE and U+FFFF are forbidden);
- U+10000–U+10FFFF: this includes all code points in supplementary planes, including non-characters.
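The XML 1.0 ranges listed above translate directly into a membership test. The sketch below is illustrative; the function names `is_xml10_char` and `first_invalid` are introduced here and are not part of any XML library.

```python
def is_xml10_char(ch):
    """True if the single character `ch` may appear in an XML 1.0 document."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # the only C0 controls allowed
        or 0x20 <= cp <= 0xD7FF        # up to, but excluding, the surrogates
        or 0xE000 <= cp <= 0xFFFD      # excludes U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def first_invalid(text):
    """Return the index of the first disallowed character, or -1 if there is none."""
    for index, ch in enumerate(text):
        if not is_xml10_char(ch):
            return index
    return -1
```

A check like `first_invalid(payload)` can be run before serializing text into an XML 1.0 document, so that a stray control character is caught before a parser rejects the whole document.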

XML 1.1 extends the set of allowed characters to include all the above, plus the remaining characters in the range U+0001–U+001F. At the same time, however, it restricts the use of C0 and C1 control characters other than U+0009, U+000A, U+000D, and U+0085 by requiring them to be written in escaped form (for example U+0001 must be written as &#x01; or its equivalent). In the case of C1 characters, this restriction is a backwards incompatibility; it was introduced to allow common encoding errors to be detected. The code point U+0000 is the only character that is not permitted in any XML 1.0 or 1.1 document.

Encoding detection

The Unicode character set can be encoded into bytes for storage or transmission in a variety of different ways, called "encodings". Unicode itself defines encodings that cover the entire repertoire; well-known ones include UTF-8 and UTF-16.[11] There are many other text encodings that pre-date Unicode, such as ASCII and ISO/IEC 8859; their character repertoires in almost every case are subsets of the Unicode character set. XML allows the use of any of the Unicode-defined encodings, and any other encodings whose characters also appear in Unicode. XML also provides a mechanism whereby an XML processor can reliably, without any prior knowledge, determine which encoding is being used.[12] Encodings other than UTF-8 and UTF-16 will not necessarily be recognized by every XML parser.

Escaping

XML provides escape facilities for including characters which are problematic to include directly. For example:

- The characters "<" and "&" are key syntax markers and may never appear in content outside of a CDATA section.[13]
- Some character encodings support only a subset of Unicode: for example, it is legal to encode an XML document in ASCII, but ASCII lacks code points for Unicode characters such as "é".
- It might not be possible to type the character on the author's machine.
- Some characters have glyphs that cannot be visually distinguished from other characters: examples are non-breaking space (&#xa0;) " " compared with space (&#x20;) " ", and Cyrillic Capital Letter A (&#x410;) "А" compared with Latin Capital Letter A (&#x41;) "A".

There are five predefined entities:

1. &lt; represents "<"
2. &gt; represents ">"
3. &amp; represents "&"
4. &apos; represents '
5. &quot; represents "

All permitted Unicode characters may be represented with a numeric character reference. Consider the Chinese character "中", whose numeric code in Unicode is hexadecimal 4E2D, or decimal 20,013. A user whose keyboard offers no method for entering this character could still insert it in an XML document encoded either as &#20013; or &#x4e2d;. Similarly, the string "I <3 Jörg" could be encoded for inclusion in an XML document as "I &lt;3 J&#xF6;rg". "&#0;" is not permitted, however, because the null character is one of the control characters excluded from XML, even when using a numeric character reference.[14] An alternative encoding mechanism such as Base64 is needed to represent such characters.

Comments

Comments may appear anywhere in a document outside other markup. Comments cannot appear before the XML declaration. Comments start with "<!--" and end with "-->". The string "--" (double-hyphen) is not allowed inside comments. The ampersand has no special significance within comments, so entity and character references are not recognized as such, and there is no way to represent characters outside the character set of the document encoding. An example of a valid comment: "<!-- no need to escape <code> & such in comments -->"

International use

XML supports the direct use of almost any Unicode character in element names, attributes, comments, character data, and processing instructions (other than the ones that have special symbolic meaning in XML itself, such as the less-than sign, "<"). Therefore, the following is a well-formed XML document, even though it includes both Chinese and Cyrillic characters:

<?xml version="1.0" encoding="UTF-8" ?> <俄语>данные</俄语>

Well-formedness and error-handling

The XML specification defines an XML document as a text that is well-formed, i.e. it satisfies a list of syntax rules provided in the specification. The list is fairly lengthy; some key points are:

- It contains only properly encoded legal Unicode characters.
- None of the special syntax characters such as "<" and "&" appear except when performing their markup-delineation roles.
- The begin, end, and empty-element tags that delimit the elements are correctly nested, with none missing and none overlapping.
- The element tags are case-sensitive; the beginning and end tags must match exactly.
- Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~, nor a space character, and cannot start with -, ., or a numeric digit.
- There is a single "root" element that contains all the other elements.
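Several of the rules above (escaping of "<" and "&", correct nesting, case-sensitive tag matching, a single root) can be observed with a standard-library parser. The sketch below uses Python's `xml.etree.ElementTree` and `xml.sax.saxutils.escape`; the helper name `is_well_formed` is introduced here for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text):
    """Return True if `text` parses as XML; the parser stops at the first error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


raw = "I <3 XML & more"
print(is_well_formed(f"<note>{raw}</note>"))          # unescaped "<" and "&" in content
print(is_well_formed(f"<note>{escape(raw)}</note>"))  # content escaped with &lt; and &amp;
print(is_well_formed("<a><b></a></b>"))               # overlapping tags
print(is_well_formed("<Note></note>"))                # start- and end-tag differ in case
print(is_well_formed("<a/><b/>"))                     # more than one root element
```

Only the second document parses. In every other case the parser raises `ParseError` at the first violation instead of attempting a repair.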

The definition of an XML document excludes texts that contain violations of well-formedness rules; they are simply not XML. An XML processor that encounters such a violation is required to report such errors and to cease normal processing. This policy, occasionally referred to as draconian, stands in notable contrast to the behavior of programs that process HTML, which are designed to produce a reasonable result even in the presence of severe markup errors.[15] XML's policy in this area has been criticized as a violation of Postel's law ("Be conservative in what you send; be liberal in what you accept").

Schemas and validation

In addition to being well-formed, an XML document may be valid. This means that it contains a reference to a Document Type Definition (DTD), and that its elements and attributes are declared in that DTD and follow the grammatical rules for them that the DTD specifies. XML processors are classified as validating or non-validating depending on whether or not they check XML documents for validity. A processor that discovers a validity error must be able to report it, but may continue normal processing. A DTD is an example of a schema or grammar. Since the initial publication of XML 1.0, there has been substantial work in the area of schema languages for XML. Such schema languages typically constrain the set of elements that may be used in a document, which attributes may be applied to them, the order in which they may appear, and the allowable parent/child relationships.

DTD: Document Type Definition

The oldest schema language for XML is the Document Type Definition (DTD), inherited from SGML. DTDs have the following benefits:

- DTD support is ubiquitous due to its inclusion in the XML 1.0 standard.
- DTDs are terse compared to element-based schema languages and consequently present more information in a single screen.
- DTDs allow the declaration of standard public entity sets for publishing characters.
- DTDs define a document type rather than the types used by a namespace, thus grouping all constraints for a document in a single collection.

DTDs have the following limitations:

- They have no explicit support for newer features of XML, most importantly namespaces.
- They lack expressiveness. XML DTDs are simpler than SGML DTDs and there are certain structures that cannot be expressed with regular grammars. DTDs only support rudimentary datatypes.
- They lack readability. DTD designers typically make heavy use of parameter entities (which behave essentially as textual macros), which make it easier to define complex grammars, but at the expense of clarity.
- They use a syntax based on regular expression syntax, inherited from SGML, to describe the schema. Typical XML APIs such as SAX do not attempt to offer applications a structured representation of the syntax, so it is less accessible to programmers than an element-based syntax may be.

Two peculiar features that distinguish DTDs from other schema types are the syntactic support for embedding a DTD within XML documents and for defining entities, which are arbitrary fragments of text and/or markup that the XML processor inserts in the DTD itself and in the XML document wherever they are referenced, like character escapes. DTD technology is still used in many applications because of its ubiquity.
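The two DTD features singled out above, embedding a DTD inside the document and defining entities, can be seen in a small sketch. The document, the entity name `company`, and the helper `memo_text` are made up for illustration. Note that Python's `xml.etree.ElementTree` is a non-validating processor: it expands the internal entity but does not check the document against the `<!ELEMENT ...>` declaration.

```python
import xml.etree.ElementTree as ET

# A document with an embedded (internal) DTD that declares one general entity.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA)>
  <!ENTITY company "Example Corp">
]>
<memo>Issued by &company;.</memo>"""


def memo_text():
    """Parse the document; the parser replaces &company; with its declared text."""
    return ET.fromstring(DOCUMENT).text


print(memo_text())
```

The entity behaves like a textual macro: changing the single `<!ENTITY ...>` declaration changes every place in the document where `&company;` is referenced.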

XML Schema (W3C)

A newer schema language, described by the W3C as the successor of DTDs, is XML Schema, often referred to by the initialism for XML Schema instances, XSD (XML Schema Definition). XSDs are far more powerful than DTDs in describing XML languages. They use a rich datatyping system and allow for more detailed constraints on an XML document's logical structure. XSDs also use an XML-based format, which makes it possible to use ordinary XML tools to help process them.

RELAX NG

RELAX NG was initially specified by OASIS and is now also an ISO/IEC International Standard (as part of DSDL). RELAX NG schemas may be written in either an XML-based syntax or a more compact non-XML syntax; the two syntaxes are isomorphic, and James Clark's Trang conversion tool can convert between them without loss of information. RELAX NG has a simpler definition and validation framework than XML Schema, making it easier to use and implement. It also has the ability to use datatype framework plug-ins; a RELAX NG schema author, for example, can require values in an XML document to conform to definitions in XML Schema Datatypes.

Schematron

Schematron is a language for making assertions about the presence or absence of patterns in an XML document. It typically uses XPath expressions.

ISO DSDL and other schema languages


The ISO DSDL (Document Schema Description Languages) standard brings together a comprehensive set of small schema languages, each targeted at specific problems. DSDL includes RELAX NG full and compact syntax, Schematron assertion language, and languages for defining datatypes, character repertoire constraints, renaming and entity expansion, and namespace-based routing of document fragments to different validators. DSDL schema languages do not have the vendor support of XML Schemas yet, and are to some extent a grassroots reaction of industrial publishers to the lack of utility of XML Schemas for publishing. Some schema languages not only describe the structure of a particular XML format but also offer limited facilities to influence processing of individual XML files that conform to this format. DTDs and XSDs both have this ability; they can for instance provide the infoset augmentation facility and attribute defaults. RELAX NG and Schematron intentionally do not provide these.

Related specifications A cluster of specifications closely related to XML have been developed, starting soon after the initial publication of XML 1.0. It is frequently the case that the term "XML" is used to refer to XML together with one or more of these other technologies which have come to be seen as part of the XML core. XML Namespaces enable the same document to contain XML elements and attributes taken from different vocabularies, without any naming collisions occurring. Although XML Namespaces are not part of the XML specification itself, virtually all XML software also supports XML Namespaces. XML Base defines the xml:base attribute, which may be used to set the base for resolution of relative URI references within the scope of a single XML element. The XML Information Set or XML infoset describes an abstract data model for XML documents in terms of information items. The infoset is commonly used in the specifications of XML languages, for convenience in describing constraints on the XML constructs those languages allow. xml:id Version 1.0 asserts that an attribute named xml:id functions as an "ID attribute" in the sense used in a DTD. XPath defines a syntax named XPath expressions which identifies one or more of the internal components (elements, attributes, and so on) included in an XML document. XPath is widely used in other core-XML specifications and in programming libraries for accessing XMLencoded data. XSLT is a language with an XML-based syntax that is used to transform XML documents into other XML documents, HTML, or other, unstructured formats such as plain text or RTF. XSLT is very tightly coupled with XPath, which it uses to address components of the input XML document, mainly elements and attributes.
Copyright :IT ENGG PORTAL www.itportal.in

XSL Formatting Objects, or XSL-FO, is a markup language for XML document formatting which is most often used to generate PDFs.
XQuery is an XML-oriented query language strongly rooted in XPath and XML Schema. It provides methods to access, manipulate and return XML, and is mainly conceived as a query language for XML databases.
XML Signature defines syntax and processing rules for creating digital signatures on XML content.
XML Encryption defines syntax and processing rules for encrypting XML content.

Some other specifications conceived as part of the "XML Core" have failed to find wide adoption, including XInclude, XLink, and XPointer.

Use on the Internet

XML has come into common use for the interchange of data over the Internet. RFC 3023 gives rules for the construction of Internet Media Types for use when sending XML. It also defines the types "application/xml" and "text/xml", which say only that the data are in XML, and nothing about their semantics. The use of "text/xml" has been criticized as a potential source of encoding problems and is now in the process of being deprecated.[3] RFC 3023 also recommends that XML-based languages be given media types beginning in "application/" and ending in "+xml"; for example "application/svg+xml" for SVG.

Programming interfaces

The design goals of XML include, "It shall be easy to write programs which process XML documents."[6] Despite this, the XML specification contains almost no information about how programmers might go about doing such processing. The XML Infoset specification provides a vocabulary to refer to the constructs within an XML document, but also does not provide any guidance on how to access this information. A variety of APIs for accessing XML have been developed and used, and some have been standardized.

Existing APIs for XML processing tend to fall into these categories:
Stream-oriented APIs accessible from a programming language, for example SAX and StAX.
Tree-traversal APIs accessible from a programming language, for example DOM.
XML data binding, which provides an automated translation between an XML document and programming-language objects.
Declarative transformation languages such as XSLT and XQuery.
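
To make the tree-traversal category concrete, here is a minimal sketch in Python using the standard-library xml.etree.ElementTree module. The sample document, its element names and the helper function names are hypothetical, chosen only for illustration. The whole document is parsed into an in-memory tree first, and components are then retrieved by navigating that tree or by a simple path expression (ElementTree supports only a limited subset of XPath).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
LIBRARY_XML = """<library>
  <book id="b1"><title>XML Basics</title></book>
  <book id="b2"><title>Schema Languages</title></book>
</library>"""


def book_titles(xml_text):
    """Parse the whole document into a tree, then select nodes by path."""
    root = ET.fromstring(xml_text)
    # "./book/title" is a path in ElementTree's limited XPath subset.
    return [title.text for title in root.findall("./book/title")]


def book_ids(xml_text):
    """Walk the direct children of the root and read an attribute from each."""
    root = ET.fromstring(xml_text)
    return [book.get("id") for book in root]


if __name__ == "__main__":
    print(book_titles(LIBRARY_XML))
    print(book_ids(LIBRARY_XML))
```

Because the tree is built before any query runs, memory use grows with document size; that is the trade-off against the stream-oriented category.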

Stream-oriented facilities require less memory and, for certain tasks which are based on a linear traversal of an XML document, are faster and simpler than other alternatives. Tree-traversal and data-binding APIs typically require the use of much more memory, but are often found more convenient for use by programmers; some include declarative retrieval of document components via the use of XPath expressions.

XSLT is designed for declarative description of XML document transformations, and has been widely implemented both in server-side packages and Web browsers. XQuery overlaps XSLT in its functionality, but is designed more for searching of large XML databases.

Simple API for XML (SAX)

SAX is a lexical, event-driven interface in which a document is read serially and its contents are reported as callbacks to various methods on a handler object of the user's design. SAX is fast and efficient to implement, but difficult to use for extracting information at random from the XML, since it tends to burden the application author with keeping track of what part of the document is being processed. It is better suited to situations in which certain types of information are always handled the same way, no matter where they occur in the document.

Pull parsing

Pull parsing[17] treats the document as a series of items which are read in sequence using the Iterator design pattern. This allows for writing of recursive-descent parsers in which the structure of the code performing the parsing mirrors the structure of the XML being parsed, and intermediate parsed results can be used and accessed as local variables within the methods performing the parsing, or passed down (as method parameters) into lower-level methods, or returned (as method return values) to higher-level methods. Examples of pull parsers include StAX in the Java programming language, XMLReader in PHP and System.Xml.XmlReader in the .NET Framework.

A pull parser creates an iterator that sequentially visits the various elements, attributes, and data in an XML document. Code which uses this iterator can test the current item (to tell, for example, whether it is a start or end element, or text), and inspect its attributes (local name, namespace, values of XML attributes, value of text, etc.), and can also move the iterator to the next item. The code can thus extract information from the document as it traverses it.

The recursive-descent approach tends to lend itself to keeping data as typed local variables in the code doing the parsing, while SAX, for instance, typically requires a parser to manually maintain intermediate data within a stack of elements which are parent elements of the element being parsed. Pull-parsing code can be more straightforward to understand and maintain than SAX parsing code.
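
The contrast between the two styles can be sketched in Python with the standard library. This is an illustration under assumptions, not a reference implementation: the sample document and all class and function names are hypothetical, xml.sax plays the role of the SAX interface, and xml.etree.ElementTree.iterparse stands in for a pull parser such as StAX (it yields events to a loop that the caller drives, although it also builds elements as it goes).

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
ORDERS_XML = (
    b"<orders>"
    b"<order><item>pen</item><item>ink</item></order>"
    b"<order><item>pad</item></order>"
    b"</orders>"
)


class ItemHandler(xml.sax.ContentHandler):
    """SAX style: the parser calls us, so we must remember where we are."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False   # state kept by hand between callbacks
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is accumulated.
        if self._in_item:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._in_item = False


def items_via_sax(data):
    handler = ItemHandler()
    xml.sax.parseString(data, handler)
    return handler.items


def items_via_pull(data):
    """Pull style: our loop asks for the next event and keeps results locally."""
    items = []
    for event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "item":
            items.append(elem.text)
    return items


if __name__ == "__main__":
    print(items_via_sax(ORDERS_XML))
    print(items_via_pull(ORDERS_XML))
```

In the SAX version the flag and buffer live on the handler object across callbacks; in the pull version the result list is an ordinary local variable of the function doing the parsing, which is the maintainability point made above.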

Document Object Model

The Document Object Model (DOM) is an interface-oriented application programming interface that allows for navigation of the entire document as if it were a tree of node objects representing the document's contents. A DOM document can be created by a parser, or can be generated manually by users (with limitations). Data types in DOM nodes are abstract; implementations provide their own programming language-specific bindings. DOM implementations tend to be memory intensive, as they generally require the entire document to be loaded into memory and constructed as a tree of objects before access is allowed.

Data binding

Another form of XML processing API is XML data binding, where XML data are made available as a hierarchy of custom, strongly typed classes, in contrast to the generic objects created by a Document Object Model parser. This approach simplifies code development, and in many cases allows problems to be identified at compile time rather than run-time. Example data binding systems include the Java Architecture for XML Binding (JAXB) and XML Serialization in .NET.

XML as data type

XML is beginning to appear as a first-class data type in other languages. The ECMAScript for XML (E4X) extension to the ECMAScript/JavaScript language explicitly defines two specific objects (XML and XMLList) for JavaScript, which support XML document nodes and XML node lists as distinct objects and use a dot-notation specifying parent-child relationships.[19] E4X is supported by the Mozilla 2.5+ browsers and Adobe Actionscript, but has not been adopted more universally. Similar notations are used in Microsoft's LINQ implementation for Microsoft .NET 3.5 and above, and in Scala (which uses the Java VM). The open-source xmlsh application, which provides a Linux-like shell with special features for XML manipulation, similarly treats XML as a data type, using the <[ ]> notation.[20] The Resource Description Framework defines a data type rdf:XMLLiteral to hold wrapped, canonical XML.[21]

History

XML is an application profile of SGML (ISO 8879).
The versatility of SGML for dynamic information display was understood by early digital media publishers in the late 1980s prior to the rise of the Internet.[23][24] By the mid-1990s some practitioners of SGML had gained experience with the then-new World Wide Web, and believed that SGML offered solutions to some of the problems the Web was likely to face as it grew. Dan Connolly added SGML to the list of W3C's activities when he joined the staff in 1995; work began in mid-1996 when Sun Microsystems engineer Jon Bosak developed a charter and recruited collaborators. Bosak was well connected in the small community of people who had experience both in SGML and the Web.

XML was compiled by a working group of eleven members,[26] supported by an (approximately) 150-member Interest Group. Technical debate took place on the Interest Group mailing list and issues were resolved by consensus or, when that failed, majority vote of the Working Group. A record of design decisions and their rationales was compiled by Michael Sperberg-McQueen on December 4, 1997.[27] James Clark served as Technical Lead of the Working Group, notably contributing the empty-element "<empty />" syntax and the name "XML". Other names that had been put forward for consideration included "MAGMA" (Minimal Architecture for Generalized Markup Applications), "SLIM" (Structured Language for Internet Markup) and "MGML" (Minimal Generalized Markup Language).

The co-editors of the specification were originally Tim Bray and Michael Sperberg-McQueen. Halfway through the project Bray accepted a consulting engagement with Netscape, provoking vociferous protests from Microsoft. Bray was temporarily asked to resign the editorship. This led to intense dispute in the Working Group, eventually resolved by the appointment of Microsoft's Jean Paoli as a third co-editor. The XML Working Group never met face-to-face; the design was accomplished using a combination of email and weekly teleconferences.

Sources

XML is a profile of the ISO standard SGML, and most of XML comes from SGML unchanged. From SGML comes the separation of logical and physical structures (elements and entities), the availability of grammar-based validation (DTDs), the separation of data and metadata (elements and attributes), mixed content, the separation of processing from representation (processing instructions), and the default angle-bracket syntax. The SGML Declaration was removed (XML has a fixed delimiter set and adopts Unicode as the document character set).

Other sources of technology for XML were the Text Encoding Initiative (TEI), which defined a profile of SGML for use as a "transfer syntax"; and HTML, in which elements were synchronous with their resource, document character sets were separate from resource encoding, the xml:lang attribute was invented, and (like HTTP) metadata accompanied the resource rather than being needed at the declaration of a link. The Extended Reference Concrete Syntax (ERCS) project of the SPREAD (Standardization Project Regarding East Asian Documents) project of the ISO-related China/Japan/Korea Document Processing expert group was the basis of XML 1.0's naming rules; SPREAD also introduced hexadecimal numeric character references and the concept of references to make available all Unicode characters. To support ERCS, XML and HTML better, the SGML standard IS 8879 was revised in 1996 and 1998 with WebSGML Adaptations. The XML header followed that of ISO HyTime.

Ideas that developed during discussion which were novel in XML included the algorithm for encoding detection and the encoding header, the processing instruction target, the xml:space attribute, and the new close delimiter for empty-element tags.
The notion of well-formedness as opposed to validity (which enables parsing without a schema) was first formalized in XML, although it had been implemented successfully in the Electronic Book Technology "Dynatext" software;[30] the software from the University of Waterloo New Oxford English Dictionary Project; the RISP LISP SGML text processor at Uniscope, Tokyo; the US Army Missile Command IADS hypertext system; Mentor Graphics Context; Interleaf and Xerox Publishing System.

Versions

There are two current versions of XML. The first (XML 1.0) was initially defined in 1998. It has undergone minor revisions since then, without being given a new version number, and is currently in its fifth edition, as published on November 26, 2008. It is widely implemented and still recommended for general use.

The second (XML 1.1) was initially published on February 4, 2004, the same day as XML 1.0 Third Edition,[31] and is currently in its second edition, as published on August 16, 2006. It contains features (some contentious) that are intended to make XML easier to use in certain cases.[32] The main changes are to enable the use of line-ending characters used on EBCDIC platforms, and the use of scripts and characters absent from Unicode 3.2. XML 1.1 is not very widely implemented and is recommended for use only by those who need its unique features.

Prior to its fifth edition release, XML 1.0 differed from XML 1.1 in having stricter requirements for characters available for use in element and attribute names and unique identifiers: in the first four editions of XML 1.0 the characters were exclusively enumerated using a specific version of the Unicode standard (Unicode 2.0 to Unicode 3.2). The fifth edition substitutes the mechanism of XML 1.1, which is more future-proof but reduces redundancy. The approach taken in the fifth edition of XML 1.0 and in all editions of XML 1.1 is that only certain characters are forbidden in names, and everything else is allowed, in order to accommodate the use of suitable name characters in future versions of Unicode. In the fifth edition, XML names may contain characters in the Balinese, Cham, or Phoenician scripts among many others which have been added to Unicode since Unicode 3.2.

Almost any Unicode code point can be used in the character data and attribute values of an XML 1.0 or 1.1 document, even if the character corresponding to the code point is not defined in the current version of Unicode.
In character data and attribute values, XML 1.1 allows the use of more control characters than XML 1.0, but, for "robustness", most of the control characters introduced in XML 1.1 must be expressed as numeric character references (and #x7F through #x9F, which had been allowed in XML 1.0, are in XML 1.1 even required to be expressed as numeric character references[34]). Among the supported control characters in XML 1.1 are two line break codes that must be treated as whitespace. Whitespace characters are the only control codes that can be written directly.

There has been discussion of an XML 2.0, although no organization has announced plans for work on such a project. XML-SW (SW for skunkworks), written by one of the original developers of XML,[35] contains some proposals for what an XML 2.0 might look like: elimination of DTDs from syntax, and integration of namespaces, XML Base and XML Information Set (infoset) into the base standard.

The World Wide Web Consortium also has an XML Binary Characterization Working Group doing preliminary research into use cases and properties for a binary encoding of the XML infoset. The working group is not chartered to produce any official standards. Since XML is by definition text-based, ITU-T and ISO are using the name Fast Infoset for their own binary infoset to avoid confusion.

Criticism

XML and its extensions have regularly been criticized for verbosity and complexity.[36] Mapping the basic tree model of XML to type systems of programming languages or databases can be difficult, especially when XML is used for exchanging highly structured data between applications, which was not its primary design goal. Other criticisms attempt to refute the claim that XML is a self-describing language[37] (though the XML specification itself makes no such claim). JSON and YAML are frequently proposed as alternatives; both focus on representing structured data, rather than narrative documents.

XML Structure

Elements
Document Types
Document Structure
Document Type Definition (DTD)
An Example
Element Attributes
Process Instructions
Extensible Linking Language (XLL)

1)XML Elements

An XML document must contain elements. Let's assume that I want to create a document with the elements LAND, FOREST, TREE, MEADOW and GRASS. Here is how I would use these elements:

    <LAND>
        <FOREST>
            <TREE>Oak</TREE>
            <TREE>Pine</TREE>
            <TREE>Maple</TREE>
        </FOREST>
        <MEADOW>
            <GRASS>Bluegrass</GRASS>
            <GRASS>Fescue</GRASS>
            <GRASS>Rye</GRASS>
        </MEADOW>
    </LAND>

Each element is delimited by a start tag and an end tag, both written in <> brackets. The end tag has the '/' character before the element name. As you can see, there is one element that contains all others, <LAND>. XML requires one element that contains all others. This single element, which in this case is "LAND", is called the root element. The FOREST element contains several TREE elements, and the MEADOW likewise contains several GRASS elements.

Each element that is contained in another also ends inside that same element, and therefore each element is properly nested. Elements that are included in another element are considered nested. The TREE elements in the above example are nested in the FOREST element. The FOREST element is the parent element of the TREE elements, and each TREE element is called a sub-element of the FOREST element. These relationships hold true as you move up and down the element hierarchy. The FOREST and MEADOW elements are sub-elements of the LAND root element.
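
The root, parent and sub-element relationships described above can be inspected with a short Python sketch. It uses the standard-library xml.etree.ElementTree module on the LAND document; the helper names outline and parent_tags are made up for this illustration.

```python
import xml.etree.ElementTree as ET

LAND_XML = """<LAND>
  <FOREST>
    <TREE>Oak</TREE>
    <TREE>Pine</TREE>
    <TREE>Maple</TREE>
  </FOREST>
  <MEADOW>
    <GRASS>Bluegrass</GRASS>
    <GRASS>Fescue</GRASS>
    <GRASS>Rye</GRASS>
  </MEADOW>
</LAND>"""


def outline(element, depth=0):
    """Return one indented line per element, so the nesting becomes visible."""
    text = (element.text or "").strip()
    label = element.tag + (": " + text if text else "")
    lines = ["  " * depth + label]
    for child in element:          # iterating an element yields its sub-elements
        lines.extend(outline(child, depth + 1))
    return lines


def parent_tags(root):
    """Map each sub-element name to the name of its parent element."""
    return {child.tag: parent.tag for parent in root.iter() for child in parent}


if __name__ == "__main__":
    root = ET.fromstring(LAND_XML)
    print("\n".join(outline(root)))
    print(parent_tags(root))
```

The first printed line is the root element LAND; every deeper indentation level corresponds to one more level of nesting.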

The example below is not well formed, because the third TREE element is closed after the FOREST element that contains it:

    <LAND>
        <FOREST>
            <TREE>Oak</TREE>
            <TREE>Pine</TREE>
            <TREE>Maple
        </FOREST>
        </TREE>
        <MEADOW>
            <GRASS>Bluegrass</GRASS>
            <GRASS>Fescue</GRASS>
            <GRASS>Rye</GRASS>
        </MEADOW>
    </LAND>

Element Tags

XML elements require both a beginning tag and an ending tag for all elements that have content. Elements with content may be written as:

    <TREE>Very large Oak tree</TREE>

Elements with no content may be expressed as:

    <NOTHING></NOTHING>

In shorthand it may be expressed as:

    <NOTHING/>

Elements with no content may be used to display graphics and other material in the document.

Element Name Requirements

The name begins with a letter or underscore.
The first character may be followed by any combination of letters, digits, hyphens, underscores and periods. Spaces are not allowed.
Names beginning with "XML", whether capitalized or not, are reserved.

Element Content

Elements may contain:
Nested elements - Other sub-elements.
Processing instructions
Characters - Normal text.
CDATA sections - Used to enter text containing characters that would otherwise be treated as markup, such as the less than or greater than sign. These signs are used to enclose tags and are special characters. An example CDATA section:

    <![CDATA[ The < and > characters are displayed normally here. ]]>

Entity references - An entity reference is preceded by a & sign and ended by a semicolon. It is used to refer to a previously defined entity, much like a variable may be used in a program.
Character references - References to characters that cannot be written directly in XML text, such as the < or > characters. These characters are represented as &#60; and &#62; respectively and will be presented on the browser as the less than or greater than character they represent.
Comments - Comments are written in the form <!-- comment text --> and may be placed anywhere except inside an element tag (markup).

----------------------------------

2)XML Document Formation

This page describes how to form an XML document and the two types of XML documents. Since XML does not require a Document Type Definition (DTD) there are two basic ways to create the document.

Types of XML Documents

Well Formed - The logical structure is not validated against a DTD. A well formed document follows a set of rules to qualify as "well formed".
Valid - Must have a DTD and obey all of its rules. These rules are more stringent than the rules for a well formed document. The document can be defined by a DTD which is either embedded inside the XML page or referenced as external to the page. You can write your own DTD or use a pre-existing DTD written by someone else.

Valid documents must:
Be well formed.
Include a DTD.
Follow the rules set by the DTD.

---------------------------------------------

The definition of "well formed" is:
There must be one and only one top level element.
All elements must have a starting and an ending tag with matching starting and ending names (or use the empty-element shorthand).
Element names are case sensitive.
Elements must be nested properly.
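
These rules can be tried out with a minimal Python sketch, assuming the standard-library xml.etree.ElementTree parser as the checker. The function name is_well_formed and the sample strings are made up for illustration. Note that this parser checks well-formedness only; it does not validate a document against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One sample per rule in the list above.
SAMPLES = {
    "properly nested": "<LAND><FOREST><TREE>Oak</TREE></FOREST></LAND>",
    "improperly nested": "<LAND><FOREST><TREE>Maple</FOREST></TREE></LAND>",
    "two top level elements": "<FOREST/><MEADOW/>",
    "case mismatch in tag names": "<Tree>Oak</TREE>",
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(label, "->", is_well_formed(text))
```

Only the first sample is accepted; each of the others breaks exactly one of the rules.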

3)XML Structure

This page provides a description of XML structure including the document parts and the prolog, and provides a simple XML example document.

Document Parts

Prolog
Document Element (root element)

The Prolog

The prolog, equivalent to the header in HTML, may include the following:
An XML declaration (optional) such as:

    <?xml version="1.0"?>

A DTD or reference to one (optional). An example reference to an external DTD file:

    <!DOCTYPE LANGLIST SYSTEM "langlist.dtd">

Processing instructions - An example processing instruction that causes style to be determined by a style sheet:

    <?xml-stylesheet type="text/css" href="xmlstyle.css"?>

An XML Document

Therefore a complete well formed XML document may look like:

    <?xml version="1.0"?>
    <LAND>
        <FOREST>
            <TREE>Oak</TREE>
            <TREE>Pine</TREE>
            <TREE>Maple</TREE>
        </FOREST>
        <MEADOW>
            <GRASS>Bluegrass</GRASS>
            <GRASS>Fescue</GRASS>
            <GRASS>Rye</GRASS>
        </MEADOW>
    </LAND>

The LAND element, above, is the root element.

The document below is not an XML document, since it does not satisfy the rules for a well formed document. There is more than one top level element, which disqualifies the document from being well formed.

    <?xml version="1.0"?>
    <FOREST>
        <TREE>Oak</TREE>
        <TREE>Pine</TREE>
        <TREE>Maple</TREE>
    </FOREST>
    <MEADOW>
        <GRASS>Bluegrass</GRASS>
        <GRASS>Fescue</GRASS>
        <GRASS>Rye</GRASS>
    </MEADOW>

Defining Display

If the XML document is not linked to a style sheet, it will be displayed with tags included. The elements and tags may be color coded to aid in viewing the document. The document is displayed without tags, according to the style sheet, if a link to one is specified. The following document includes a link to a cascading style sheet:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/css" href="xmlstyle.css"?>
    <DATABASE>
        <TITLE>List of Items Important to Markup Languages</TITLE>
        <TITLE2>Languages</TITLE2>
        <LANGUAGES>SGML</LANGUAGES>
        <LANGUAGES>XML</LANGUAGES>
        <LANGUAGES>HTML</LANGUAGES>
        <TITLE2>Other</TITLE2>
        <OTHER>DTD</OTHER>
        <OTHER>DSSL</OTHER>
        <OTHER>Style Sheets</OTHER>
    </DATABASE>

The line below, which is a part of the XML document above, is a processing instruction and is a part of the prolog.

    <?xml-stylesheet type="text/css" href="xmlstyle.css"?>

The style sheet, "xmlstyle.css", may look like:

    DATABASE { display: block }
    TITLE { display: block; font-family: arial; color: #008000; font-weight: 600; font-size: 22pt; text-align: center }
    TITLE2 { display: block; font-family: arial; color: #000080; font-weight: 400; font-size: 20pt }
    LANGUAGES { display: block; list-style-type: decimal; font-family: arial; color: #000000; font-weight: 400; font-size: 18pt }
    OTHER { display: block; list-style-type: square; font-family: arial; color: #0000ff; font-weight: 200; font-size: 14pt }

---------------------------------------------

4)XML DTD

In order to use DTDs with XML, the reader is encouraged to learn about the structure of DTDs. The document called "Document Type Definition (DTD)" on this website gives a more thorough explanation of how to read and construct a DTD than this document does. The DTDs in this document are very basic, however, and should be easy to understand even for those who are not familiar with DTDs.

The DTD, whether included as part of the XML file or external to the XML file, is used to define content. The DTD determines the elements allowed in the file and which elements can be contained in other elements. It also describes the number of times specific elements may be contained in other elements. The DTD document on this website was written for SGML DTDs, and the main difference lies in the fact that XML requires a beginning and ending tag for all elements. SGML does not have this requirement.

Element Declaration

In an SGML DTD, the declaration for the <HR> element is:

    <!ELEMENT HR - O EMPTY -- horizontal rule -->

If written for XML, the declaration would be:

    <!ELEMENT HR EMPTY>    <!-- horizontal rule -->

The difference between the two examples is that the XML <HR> declaration does not define whether the element requires a closing tag, as is done with the "- O" text in SGML. This is because all elements in XML are required to be closed, and therefore defining whether a closing tag is required is pointless. XML also does not allow the "-- ... --" comment form inside a declaration, so the comment is written as a separate <!-- ... --> comment.

-----------------------------------------------------

5)XML Example

This is an example of an XML document that uses an external DTD file and a cascading style sheet (CSS) file.

The XML File

The following file is called "parts.xml".

    <?xml version="1.0"?>
    <!DOCTYPE PARTS SYSTEM "parts.dtd">
    <?xml-stylesheet type="text/css" href="xmlpartsstyle.css"?>
    <PARTS>
        <TITLE>Computer Parts</TITLE>
        <PART>
            <ITEM>Motherboard</ITEM>
            <MANUFACTURER>ASUS</MANUFACTURER>
            <MODEL>P3B-F</MODEL>
            <COST> 123.00</COST>
        </PART>
        <PART>
            <ITEM>Video Card</ITEM>
            <MANUFACTURER>ATI</MANUFACTURER>
            <MODEL>All-in-Wonder Pro</MODEL>
            <COST> 160.00</COST>
        </PART>
        <PART>
            <ITEM>Sound Card</ITEM>
            <MANUFACTURER>Creative Labs</MANUFACTURER>
            <MODEL>Sound Blaster Live</MODEL>
            <COST> 80.00</COST>
        </PART>
        <PART>
            <ITEM>inch Monitor</ITEM>
            <MANUFACTURER>LG Electronics</MANUFACTURER>
            <MODEL> 995E</MODEL>
            <COST> 290.00</COST>
        </PART>
    </PARTS>

This file specifies the use of two external files. A DTD file called "parts.dtd". This is done on the second line: <!DOCTYPE PARTS SYSTEM "parts.dtd"> The name after !DOCTYPE which is "PARTS" must be the same name as the encapsulating element in the XML file which is <PARTS> in this case. A CSS file called "xmlpartsstyle.css". This is done on the third line using a processing instruction: <?xml-stylesheet type="text/css" href="xmlpartsstyle.css"?>
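
The rule that the name after !DOCTYPE must match the enclosing element can be checked programmatically. The following is a rough sketch in Python using the standard-library xml.dom.minidom module, which exposes the document type declaration and any processing instructions in the prolog. The function name describe_prolog and the shortened copy of parts.xml are assumptions made for this illustration. The parser only records the reference to "parts.dtd"; it does not validate the document against it.

```python
from xml.dom import minidom

# Shortened copy of parts.xml, kept small for the illustration.
PARTS_XML = """<?xml version="1.0"?>
<!DOCTYPE PARTS SYSTEM "parts.dtd">
<?xml-stylesheet type="text/css" href="xmlpartsstyle.css"?>
<PARTS>
  <TITLE>Computer Parts</TITLE>
</PARTS>"""


def describe_prolog(xml_text):
    """Collect the DOCTYPE name, the DTD reference, the root name and the PIs."""
    doc = minidom.parseString(xml_text)
    doctype = doc.doctype
    instructions = [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]
    return {
        "doctype_name": doctype.name if doctype else None,
        "system_id": doctype.systemId if doctype else None,
        "root": doc.documentElement.tagName,
        "instructions": instructions,
    }


if __name__ == "__main__":
    info = describe_prolog(PARTS_XML)
    print(info)
    print("DOCTYPE matches root:", info["doctype_name"] == info["root"])
```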

The DTD File

The following file is called "parts.dtd".

    <!ELEMENT PARTS (TITLE?, PART*)>
    <!ELEMENT TITLE (#PCDATA)>
    <!ELEMENT PART (ITEM, MANUFACTURER, MODEL, COST)+>
    <!ATTLIST PART type (computer|auto|airplane) #IMPLIED>
    <!ELEMENT ITEM (#PCDATA)>
    <!ELEMENT MANUFACTURER (#PCDATA)>
    <!ELEMENT MODEL (#PCDATA)>
    <!ELEMENT COST (#PCDATA)>

The root element is PARTS. The root element may contain no TITLE element or one TITLE element, followed by any number of PART elements. The PART element must contain the items ITEM, MANUFACTURER, MODEL, and COST in that order; because of the trailing "+", this group of four may appear one or more times. The PART element may contain an attribute called "type" which may have a value of "computer", "auto", or "airplane". The elements TITLE, ITEM, MANUFACTURER, MODEL, and COST all contain PCDATA, which is parsed character data.

The Style File

The following file is used to set the style of the elements in the XML file. It is called "xmlpartsstyle.css".

    PARTS { display: block }
    TITLE { display: block; font-family: arial; color: #008000; font-weight: 600; font-size: 22pt; margin-top: 12pt; text-align: center }
    PART { display: block }
    ITEM { display: block; font-family: arial; color: #000080; font-weight: 400; margin-left: 15pt; margin-top: 12pt; font-size: 18pt }
    MANUFACTURER { display: block; font-family: arial; color: #600060; font-weight: 400; margin-left: 45pt; margin-top: 5pt; font-size: 18pt }
    MODEL { display: block; font-family: arial; color: #006000; font-weight: 400; margin-left: 45pt; margin-top: 5pt; font-size: 18pt }
    COST { display: block; font-family: arial; color: #800000; font-weight: 400; margin-left: 45pt; margin-top: 5pt; font-size: 18pt }

How it Displays on your browser

If you load the document in your browser and it does not display in color, your browser probably does not support XML. In this case, you should update your browser to the most recent version.

-------------------------------------------------

6)XML Element Attributes

An XML attribute may be included in the start tag of an element. XML attributes are name/value pairs that are associated with the element. XML attributes may be used to control some characteristics of the element. In the markup document, attributes are used with the following form:

    <ELEMENTNAME Importance="minimal">

Rules for XML attribute names are the same as rules for XML element names.
The attribute name may only appear once in the element start tag.
A double or single quoted string may be used to delimit the attribute value.
The attribute value may not contain the < character.
The attribute value can only include the & character when using an entity or character reference.

The DTD determines which element attributes are required and what the defaults are. Read the DTD Reference manual on this website for more information about writing DTDs and specifying element attributes.

----------------------------------------------

7)XML Process Instructions

XML process instructions are normally included in the XML document prolog, commonly thought of as the header in HTML. Processing instructions may be placed anywhere in the document so long as they are outside the element tags.

    standalone="yes"

The rules for the names of process instructions are similar to the rules for element names; however, in the case of process instructions there are some reserved targets such as "xml-stylesheet". The process instruction syntax:

    <? target instruction ?>

The target is the name of the application the instruction is meant for. It can be a reserved value such as "xml-stylesheet" or the name of an external application such as a script program name.

---------------------------------------------------

8)Extensible Linking Language (XLL)

XLL stands for Extensible Linking Language. It is the name originally given to the XML linking work that was later published as XLink. It consists of the following parts:
XPointer - It is built on XPath and establishes a common system to specify node locations. It allows for locating data that is not a complete node.
XLink - Advanced linking. It links to multiple destinations, is bi-directional and allows for links to be displayed on other documents.
XInclude - Merges XML infosets into one infoset.

-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-X-
