Data Warehouse Architecture 032008
Thilini Ariyachandra is an assistant professor of MIS in the College of Business at the University of Cincinnati.
Hugh J. Watson holds a C. Herman and Mary Virginia Terry Chair of Business Administration in the Terry College of Business at the University of Georgia. He is a Fellow of TDWI and the senior editor of the Business Intelligence Journal.
SUCCESSFUL ARCHITECTURE
Data Collection
A Web-based survey, targeted at individuals involved in an organization's data warehouse implementation, was used to collect data. The survey included questions about the respondent, the respondent's company, the company's data warehouse, and the success of the data warehouse architecture. Four hundred fifty-four respondents provided usable information. The positions of the respondents were distributed relatively evenly among data warehouse managers, data warehouse staff members, IS managers, and independent consultants/system integrators. Respondents in the last group were asked to complete the survey with a particular client in mind.
Surveyed companies ranged from small (less than $10 million in revenue) to large (in excess of $10 billion). Most of the companies are located in the United States (60%) and represent a variety of industries, with the financial services industry (15%) providing the most responses. The predominant architecture was the hub-and-spoke (39%), followed by the bus architecture (26%), centralized (17%), independent data marts (12%), and federated (4%). The most common platform for hosting the data warehouses was Oracle (41%), followed by Microsoft (19%) and IBM (18%). The average (mean) gross revenue varied from $3.7 billion for independent data marts to $6 billion for the federated architecture.
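As a quick arithmetic check on the architecture shares reported above, the figures can be tabulated in a few lines of Python. This is only an illustrative sketch: the percentages are those stated in the text, while the variable names are our own. It shows that the five named architectures account for 98% of responses, leaving 2% that the summary does not attribute to any of them.

```python
# Architecture shares as reported in the survey summary (percent of respondents).
architecture_share = {
    "hub-and-spoke": 39,
    "bus": 26,
    "centralized": 17,
    "independent data marts": 12,
    "federated": 4,
}

total = sum(architecture_share.values())
unaccounted = 100 - total  # share not attributed to any of the five architectures

# Rank the architectures from most to least common.
ranked = sorted(architecture_share.items(), key=lambda kv: kv[1], reverse=True)
for name, pct in ranked:
    print(f"{name:<24}{pct:>3}%")
print(f"{'total reported':<24}{total:>3}%")
print(f"{'not attributed':<24}{unaccounted:>3}%")
```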
[Figure: Federated Architecture. Existing data warehouses, data marts, and legacy systems; logical/physical integration of common data elements; end user access and applications.]
Architecture             Information Quality   System Quality   Individual Impacts   Organizational Impacts
Independent Data Marts   4.42                  4.59             5.08                 4.66
[Independent data marts scored lowest on all measures, which] confirms the conventional wisdom that independent data marts are a poor architectural solution. Next lowest on all measures was the federated architecture. Firms sometimes have disparate decision-support platforms resulting from mergers and acquisitions, and they may choose a federated approach, at least in the short run. The findings suggest that the federated architecture is not an optimal long-term solution. What is interesting, however, is the similarity of the averages for the bus, hub-and-spoke, and centralized architectures. The differences are sufficiently small that no claims can be made for a particular architecture's superiority over the others, at least based on a simple comparison of these success measures.
We also collected data on the domain (e.g., varying from a sub-unit to company-wide) and the size (i.e., amount of data stored) of the warehouses. We found that the hub-and-spoke architecture is typically used with more enterprise-wide implementations and larger warehouses.
We also investigated the cost and time required to implement the different architectures. Overall, the hub-and-spoke architecture was the most expensive and time-consuming to implement. This is not surprising, however, because of the larger domain and size of these warehouses. The architecture also requires a considerable commitment to up-front planning, which takes time and money.
Conclusion
The bus, hub-and-spoke, and centralized architectures earned similar scores on the success metrics. This finding helps explain why these competing architectures have survived over time: they are equally successful for their intended purposes. In terms of information and system quality and individual and organizational impacts, no single architecture is dominant. The similar success of the bus, hub-and-spoke, and centralized architectures is perhaps not all that surprising. In some ways, the architectures have evolved over time and become more similar. For example, the hub-and-spoke architecture often includes dimensional data marts, which are at the heart of the bus architecture. Even the development methodologies (e.g., top down for the hub-and-spoke and centralized architectures, and life cycle or bottom up for the bus architecture) have evolved and become more similar. Each stresses the need to start small and deliver short-term wins but have a long-term plan. This evolution is appropriate and good for the industry, but it is also a likely reason that the scores on the success metrics are similar.
Information quality considers information accuracy, completeness, and consistency. System quality includes system flexibility, scalability, and integration. An architecture has individual impacts when users can quickly and easily access data; think about, ask questions, and explore issues in ways that were not previously possible; and improve their decision-making capabilities. It has organizational impacts when it meets the business requirements, facilitates the use of business intelligence, supports the accomplishment of strategic business objectives, enables improvements in business processes, leads to high, quantifiable ROI, and improves communications and cooperation across organizational units. Separate questions were used for the various aspects of information quality, system quality, individual impacts, and organizational impacts.
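To illustrate how separate questions can be rolled up into a single score per success measure, here is a minimal Python sketch. It assumes that each measure is scored as the simple mean of its item ratings, which the text does not confirm. The item names follow the aspects listed above, but the ratings and the 1-7 scale are hypothetical and are not survey data.

```python
from statistics import mean

# Hypothetical ratings for one respondent, for illustration only.
# The 1-7 agreement scale is an assumption, not taken from the survey.
responses = {
    "information_quality": {"accuracy": 5, "completeness": 4, "consistency": 6},
    "system_quality": {"flexibility": 4, "scalability": 5, "integration": 3},
}


def composite_scores(resp):
    """Average the item ratings within each success measure (assumed method)."""
    return {measure: mean(items.values()) for measure, items in resp.items()}


scores = composite_scores(responses)
for measure, score in scores.items():
    print(f"{measure}: {score:.2f}")
```

Averaging such per-respondent scores within each architecture group would then give one figure per architecture and measure, in the same form as the averages compared in the findings.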