The Roadmap Project
Campus leaders and executives felt there were many opportunities for the campus to realize large benefits by reusing information collected by operational processes to support better decision-making and to improve those processes.
The project consisted of two phases. The first phase was an interview-based study that established the business requirements for the EDW.
The second phase focused on developing an architecture to deliver the EDW specified in the
business requirements study. The architecture was developed by a cross‐functional team
including functional business analysts and technical personnel. It was led by the campus data
architect.
For the information in the data warehouse to be valuable, it needs to be delivered in a way that makes it useful to campus personnel in doing their jobs. This is the job of business intelligence applications. For most people, these applications are the data warehouse: they are the software systems that help users understand what has happened, identify problems and opportunities, and make and evaluate plans. The warehouse includes a variety of these tools because there is a wide variety of users with quite different needs and skills.
Everybody on campus uses information to do their jobs, but people differ greatly in how well they understand the available information and its interconnections, and in how comfortable they are using information to help them make decisions. The data warehouse applications form a toolbox with tools appropriate for the spectrum of campus personnel, from deans to departmental administrative assistants, from students to analysts, from researchers to executives.
The following diagram summarizes the EDW applications:
Figure 2: Business Intelligence Applications
The architecture provides three kinds of applications:
1. Reporting, or information delivery, including variants such as dashboards and alerts.
2. Query.
3. Modeling, Planning and Forecasting.
Reporting:
Systems such as BAIRS and Cal‐Profiles focus on reporting.
BAIRS reports will be configurable by users to deliver a specific subset of information. They will be delivered through a secure portal which will make it easy for users to further customize the reports, to share both report requests and results with others in their workgroups, to print reports, and to share them as electronic documents in other formats, such as Portable Document Format (PDF). Where appropriate, reports will use graphics, such as charts and data graphs, to make information more comprehensible.
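As an illustration only, the sketch below shows one way a configurable report request might be represented. The class, field names, and output formats are hypothetical assumptions for the sketch, not the actual BAIRS or portal interface.

# Hypothetical sketch of a configurable report request; the field names and
# output formats are illustrative assumptions, not the actual BAIRS interface.
from dataclasses import dataclass, field


@dataclass
class ReportRequest:
    subject_area: str                            # e.g. "Expenditures"
    filters: dict = field(default_factory=dict)  # user-chosen subset of the data
    output_format: str = "html"                  # "html", "pdf", ...
    include_charts: bool = True                  # render charts and data graphs


# A departmental administrator might save and share a request like this one:
monthly_spend = ReportRequest(
    subject_area="Expenditures",
    filters={"org_unit": "Chemistry", "fiscal_year": 2025},
    output_format="pdf",
)
print(monthly_spend)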
Query:
People who have analytical skills and jobs requiring analysis will need the ability to explore the information in the warehouse: for example, to examine how use of the summer program affects outcome measures and how that varies for students with different majors or different economic backgrounds. Using that power requires understanding the information in the warehouse and knowing how to select data, summarize it or drill down for further detail, and, in particular, how to combine information across subject areas.
Analytical users will have access to views that make it easy to use the dimensional structure of the data warehouse to "slice and dice" information. Additionally, the warehouse will provide a multi-dimensional analysis tool for delivering data as "cubes", multidimensional structures that lend themselves to easy slicing and dicing. (This analysis approach is often referred to as On-line Analytical Processing, or OLAP.) Both query tools will allow users to extract information for further analysis with desktop analysis tools such as spreadsheets, and both will deliver clear, easily understood, user-focused documentation of the information they present.
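The sketch below is a minimal illustration of the kind of slicing, dicing, and drill-down described above, using pandas as a stand-in for the warehouse query tools; the column names and figures are invented for the example.

# Minimal "slice and dice" sketch over a small enrollment fact set using pandas;
# the column names and data are illustrative, not the warehouse schema.
import pandas as pd

enrollments = pd.DataFrame({
    "term":       ["Fall", "Fall", "Spring", "Spring", "Fall"],
    "department": ["Chemistry", "History", "Chemistry", "History", "Chemistry"],
    "level":      ["Lower", "Upper", "Lower", "Upper", "Upper"],
    "headcount":  [120, 45, 110, 50, 60],
})

# Slice: restrict to one member of a dimension (Fall terms only).
fall = enrollments[enrollments["term"] == "Fall"]

# Dice / roll up: summarize headcount along two dimensions at once.
cube = enrollments.pivot_table(
    index="department", columns="term", values="headcount", aggfunc="sum"
)
print(cube)

# Drill down: add a finer-grained dimension (course level) to the summary.
detail = enrollments.groupby(["department", "term", "level"])["headcount"].sum()
print(detail)

# Export for further analysis in a desktop spreadsheet.
cube.to_csv("enrollment_cube.csv")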
Modeling, Planning and Forecasting:
Many analytical users want to use the historical data in the warehouse to build models of alternative scenarios, trying to predict, for example, the consequences of a proposed change across a variety of business areas, such as course enrollments, student aid demand, and fee revenue. The warehouse will provide several facilities enabling this kind of modeling:
• Good integration with external modeling tools, ranging from desktop spreadsheets to
sophisticated data mining tools.
• Multi‐dimensional tools for building large models directly in the warehouse database
operating with spreadsheet‐like functions but at a scale beyond the capability of
spreadsheets.
• Facilities for saving and sharing multidimensional models across a workgroup.
The most frequent example involves research projects. Because staff members move among research projects, project managers need tools for expressing a spending plan for a research project, especially a multi-year project, showing labor forecasts by individual. Once plans are recorded, they are migrated into the data warehouse, where they are available for reporting and for comparison with what actually happened.
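As a rough illustration of this kind of plan, the sketch below builds a multi-year labor forecast keyed by project, person, and month, the same kinds of dimensions that would let it be compared later with actual payroll facts. The project code, names, rates, and fiscal-year convention are all assumptions made for the example.

# Hypothetical sketch of a multi-year labor forecast for a research project,
# keyed by person and month so it can later be compared with actual payroll
# facts along the same dimensions; all names, rates, and dates are illustrative.
import pandas as pd

plan_rows = []
for person, monthly_cost, months in [
    ("Researcher A", 6000, 24),    # assumed two-year appointment
    ("Grad Student B", 2500, 12),  # assumed one-year appointment
]:
    for offset in range(months):
        plan_rows.append({
            "project": "GRANT-1234",  # placeholder project code
            "person": person,
            "month": pd.Timestamp("2025-07-01") + pd.DateOffset(months=offset),
            "planned_cost": monthly_cost,
        })

plan = pd.DataFrame(plan_rows)

# Roll the plan up by fiscal year (assumed here to end June 30) for a forecast.
plan["fiscal_year"] = plan["month"].dt.year.where(
    plan["month"].dt.month < 7, plan["month"].dt.year + 1
)
print(plan.groupby("fiscal_year")["planned_cost"].sum())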
Data Architecture
There are two important views of the EDW data architecture. The first is the functional view of
the information the EDW stores and delivers. This view concerns content and meaning. The
second is the view of warehouse data as a set of software components, mostly interrelated
database tables. This view concerns access by query and reporting tools. These two views are
described as the Contents architecture and the Implementation architecture, respectively.
Contents: the warehouse bus and the logical data architecture.
Basically, the enterprise data warehouse is a font of information about what happened. It is a
history of the operations of the campus drawn from the campus information systems, which
record important information in the process of helping to carry out operations. Additionally, the
warehouse delivers corresponding information about plans such as forecasts and budgets.
Being able to investigate and understand in detail what happened enables a cycle of finding
problems and opportunities, crafting plans for doing things better, then measuring to learn
whether those plans worked out. Tracking plans as well helps campus personnel compare what
actually happened to what was planned in order to identify surprises, allowing for faster
responses to the unexpected and also supporting improvements in the planning and forecasting
process. So the contents of the warehouse boil down to history and plans, linked to consistent information about the context of those events or planned events.
Thus, the contents of the data warehouse have two components:
1. Information about history and plans. These are referred to as facts, as they usually
consist of discrete facts or measurements.
2. Information about the context in which these events or measurements occur. This
context information is organized along consistent dimensions. Sample dimensions
include time, organization, and student information.
These context dimensions provide the mechanism which enables a shared, enterprise data
warehouse. Combining or comparing information from different subject areas usually involves
lining up different facts along the same dimensions. For example, comparing faculty workload, student teaching contact, and resources used involves combining facts about teaching assignments, course enrollments, and expenditures along the common dimensions of organization and time.
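A minimal sketch of that idea is shown below, using pandas and figures invented purely for illustration: two small fact sets are summarized along shared organization and time dimensions and then compared side by side.

# Minimal sketch of combining facts from two subject areas along shared
# dimensions (organization and term); the schema and figures are illustrative.
import pandas as pd

teaching = pd.DataFrame({
    "org": ["Chemistry", "History", "Chemistry"],
    "term": ["Fall", "Fall", "Spring"],
    "contact_hours": [420, 310, 395],
})
expenditures = pd.DataFrame({
    "org": ["Chemistry", "History", "Chemistry"],
    "term": ["Fall", "Fall", "Spring"],
    "spend": [510000, 280000, 490000],
})

# Because both fact tables use the same organization and time dimensions,
# they can be summarized and compared side by side.
combined = (
    teaching.groupby(["org", "term"])["contact_hours"].sum().to_frame()
    .join(expenditures.groupby(["org", "term"])["spend"].sum())
)
combined["spend_per_contact_hour"] = combined["spend"] / combined["contact_hours"]
print(combined)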
Most of the data mismatch or “dueling data” which makes it hard to agree on reports and
measurements comes from using different versions of the same dimensions to select and
summarize data. Having a common set of dimension definitions, on the other hand, makes
comparing data across areas feasible and understandable.
Because context dimensions usually are shared across subject areas, one of the most important
and challenging aspects of data design for the warehouse consists of identifying shared
dimensions and making sure that everyone is using a set of shared definitions. The set of
shared dimensions represents a powerful interpretive grid for making sense of diverse facts.
They also enable iterative development of the warehouse. New fact tables can be sourced from operational systems and linked to earlier components along existing shared dimensions. The data warehouse grows by adding data marts to it one at a time. A data mart is a fact table and its associated dimensions; a set of related fact tables makes up a data subject area.
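For illustration, the sketch below shows one simple conformance check that this kind of incremental growth implies: before a new fact table is added as a data mart, its keys are verified against an existing shared dimension. The table names, keys, and data are assumptions made for the example.

# Hypothetical sketch of checking that a new fact table conforms to an existing
# shared dimension before it is added as a data mart; names are illustrative.
import pandas as pd

org_dimension = pd.DataFrame({"org_key": [10, 20, 30],
                              "org_name": ["Chemistry", "History", "Physics"]})

new_facts = pd.DataFrame({"org_key": [10, 20, 99],   # 99 has no dimension row
                          "award_amount": [50000, 12000, 8000]})

# Any fact rows whose keys do not resolve against the shared dimension would
# break cross-subject-area reporting and must be fixed before loading.
unmatched = new_facts[~new_facts["org_key"].isin(org_dimension["org_key"])]
if not unmatched.empty:
    print("Rows that do not conform to the organization dimension:")
    print(unmatched)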
Considerable investigation and analysis went into identifying the basic facts which make up the
campus data warehouse and their relationship to context dimensions. Together, the basic facts
and dimensions represent the base content architecture of Berkeley’s data warehouse. That
content architecture is summarized in a table of facts and dimensions attached as an appendix
to this document.
Implementation: databases and the physical data architecture.
The implementation data architecture is summarized in Figure 2.
It contains these principal components:
1. The warehouse itself. This is a collection of tables and views consumed by end users, directly or through the application tools described above. The data in the warehouse has been processed for consistency and alignment with standard data descriptions and value sets. Facts have been aligned with the standard dimensions.
2. A staging environment. This is a set of databases and files used by the data maintenance process to prepare data for publication in the warehouse as it flows from the operational systems which collect it originally. (This maintenance process is often referred to as Extract-Transform-Load, or ETL; a minimal sketch of this flow appears just after this list.) This environment is not directly accessed by users of the warehouse.
3. A collection of metadata—descriptive information about the contents of the
warehouse, about the source systems, about the business meaning of information,
about access and privacy rules, and about the processes which create and transform
data before it appears in the warehouse. This metadata is described in greater detail
below. Though managed as a consistent set, the metadata itself is stored in repositories
distributed among the warehouse, the staging environment and the source systems.
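The sketch below illustrates, in heavily simplified form, the extract-transform-load flow described in item 2: rows are read from a source-system extract, source codes are mapped to standard dimension keys in staging, and the conformed rows are loaded into a fact table. The file names, codes, and mapping table are all hypothetical, and a CSV file stands in for the warehouse table.

# Minimal ETL sketch for the staging environment described above: extract rows
# from a source-system file, transform source codes to standard dimension keys,
# and load the conformed rows into a warehouse fact table. File names, codes,
# and the mapping table are illustrative assumptions.
import csv

# Mapping maintained in staging from source-system department codes to the
# warehouse's standard organization dimension keys.
ORG_KEY_BY_SOURCE_CODE = {"CHEM": 10, "HIST": 20, "PHYS": 30}


def extract(path):
    """Read raw rows from a source-system extract file."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def transform(rows):
    """Align each row with the standard organization dimension."""
    conformed = []
    for row in rows:
        org_key = ORG_KEY_BY_SOURCE_CODE.get(row["dept_code"])
        if org_key is None:
            continue  # park unmatched rows in staging for review
        conformed.append({"org_key": org_key, "amount": float(row["amount"])})
    return conformed


def load(rows, path):
    """Publish conformed rows to the warehouse fact table (a CSV stand-in here)."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["org_key", "amount"])
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    load(transform(extract("source_expenditures.csv")), "fact_expenditures.csv")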
(Though the architecture diagram suggests that the implementation is a single, centralized
database environment, in fact any of the components may be distributed across a variety of
database servers and even a variety of DBMSs, as discussed under “DBMS platform”, below.)