Size & Estimation of DW
Luca Santillo
Abstract
Data Warehouse Systems are a special context for the application of functional software
metrics. The use of a unique standard, such as Function Point, raises serious comparability
issues with traditional systems or other paradigms, in terms of both numerical size and
implementation effort estimation. Specific guidelines are therefore necessary in order to
identify the user view, the software boundaries, and the data and transactional
components of such systems. In particular, boundary identification may strongly
affect the measurement result for a data warehouse project; consequently, one can find
huge, unacceptable deviations in the estimation of effort, time and cost for the given
project.
This paper shows the substantial differences between “traditional” software and data
warehouse systems, the main guidelines that can be used when measuring the latter, and
specific considerations for differentiating the effort estimation by measured element
types.
The case studies depicted highlight the fundamental relevance of the concept of “layer”,
as explicitly introduced by the most recent evolutions in the functional metrics field
(COSMIC Full Function Point), in evaluating those functions which are seemingly
transparent to the final user, but which cannot be neglected when estimating the
implementation effort of the measured system.
INTRODUCTION
Software functional measurement methods aim to provide an objective, technology-
independent, user-significant measure of the size of software systems. The IFPUG
Function Point method is a set of practices intended to be applied to every domain or
application typology. Despite their generality, the IFPUG counting practices are not
always easy to apply in real or innovative environments. Apart from possible
enhancements to the expression of the practices, the key concept is that the
recognizability of the functional sizing elements of a software system depends on the
system user view, and this point of view can widely change from one domain to
another. It is therefore necessary to
assess the correct approach to the sizing of a given system typology (data warehouse, in
our case), by means of providing domain-specific counting guidelines. The proposed
approach should not be considered as a different sizing method, but rather as an
“instantiation” of the general method concepts in a specific environment or domain.
On the other hand, if we use a specific measurement approach for the given domain, we
have to face the fact that effort estimation (of development or enhancement activities)
from this measurement cannot be obtained from general models (unless we accept the
strong risk of large estimation errors). Therefore, an “instantiation” of a generic effort
model is to be used.
Metadata
Simply stated, metadata is data about data. Metadata keeps track of what is where in the
data warehouse.
Dimensions
A dimension is a structure that categorizes data in order to enable end users to answer
business questions. Commonly used dimensions are Customer, Product, and Time. The
data in a data warehouse system has two important components: dimensions and facts.
Typical dimensions are products, locations (stores), promotions, and time; typical facts
are sales (units sold or rented), profits, and similar measures. A typical dimensional
cube is shown in Fig. 1.
Specifically, dimension values are usually organized into hierarchies. Going up a level
in the hierarchy is called rolling up the data and going down a level in the hierarchy is
called drilling down the data. For example, within the time dimension, months roll up to
quarters, quarters roll up to years, and years roll up to all years, while within the
location dimension, stores roll up to cities, cities roll up to states, states roll up to
regions, regions roll up to countries, and countries roll up to all countries. Data analysis
typically starts at higher levels in the dimensional hierarchy and gradually drills down if
the situation warrants such analysis.
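As a purely illustrative sketch (the hierarchy mapping and all figures below are hypothetical, not taken from the paper), rolling up monthly facts to quarters can be expressed as follows:

    # Minimal sketch of rolling up the time hierarchy (month -> quarter).
    # All data values are invented for illustration purposes only.
    monthly_sales = {
        ("2000-01", "Store A"): 120,
        ("2000-02", "Store A"): 150,
        ("2000-03", "Store A"): 110,
        ("2000-04", "Store A"): 90,
    }

    # Hierarchy mapping: each month rolls up to its quarter.
    month_to_quarter = {
        "2000-01": "2000-Q1", "2000-02": "2000-Q1", "2000-03": "2000-Q1",
        "2000-04": "2000-Q2",
    }

    quarterly_sales = {}
    for (month, store), units in monthly_sales.items():
        key = (month_to_quarter[month], store)      # roll up the time dimension
        quarterly_sales[key] = quarterly_sales.get(key, 0) + units

    print(quarterly_sales)  # {('2000-Q1', 'Store A'): 380, ('2000-Q2', 'Store A'): 90}

Drilling down is the inverse navigation, from the quarterly view back to the monthly detail.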
Functional Size
The size of a (software) system as viewed from a logical, non-technical point of view. It
is more significant to the user than a physical or technical size, such as Lines of
Code. This size should be shared between the users and the developers of the given system.
A generic data warehouse system can be viewed as made of three segments: Data
Assembling (see ETL above), System Administration (see also Metadata above), and
Data Access (see OLAP above).
Type of Count
Determining the type of count (Development project, Enhancement Project, or
Application) is usually easy, and doesn’t require specific guidelines for data warehouse
systems. We simply note that adding or changing the functionality of a given system can
be considered as one or more (development and enhancement) projects, depending on
which system boundaries are identified first.
User View
Several figures contribute to the notion of data warehouse user:
• ETL procedures administrator,
• DB administrator,
• OLAP (or other access means) administrator,
• final user (who has access to the data warehouse information),
• any system providing or receiving data to or from the data warehouse system (for
example, operational systems which automatically send data to the ETL
procedures).
Application Boundary
When considering the separation between systems in the data warehouse domain, the
application boundaries should:
• be coherent with the organizational structure (e.g. each department has its own DM),
• reflect the project management autonomy of the EDW with respect to any DM,
• reflect the project management autonomy of each DM with respect to any other.
The following picture (Fig. 3) shows the proposed approach to the boundary question in
a data warehouse context. Note that the shown boundaries are orthogonal to the
segmentation by phase of the data warehouse system (ETL, Administration, Access).
Figure 3. Boundary scheme for EDW, dependent DM, and independent DM. (The figure shows the Administration, Metadata, ETL, and DB components within the boundaries of the EDW and of the dependent and independent DMs, fed by the operational data.)
Comments on boundaries
Note that, as stated also by the IFPUG Counting Practices Manual, some systems could
share some functionality, and each of them should count those functions. For example,
two or more (dependent or independent) DMs can make use of the same external source
files (EDW or operational) in order to load their own data. While these shared
functions are counted for each system that uses them, reuse considerations should not
be ignored when deriving the effort estimate for each system development or
enhancement project.
Data Functions
These are EIFs’ for the EDW or the independent DM which use them in the ETL
segment. While the separation into distinct logical files is performed from the point of
view of the operational system which provides and contains them as its own ILFs’, their
content, in terms of Data Element Types, and Record Element Types, should be counted
from the point of view of the target system. Note that simple physical duplicates on
different areas are usually not counted as different logical files.
A special case of the ETL procedure occurs when the operational system provides the
information to the EDW (or independent DM) by means of its own procedures; in this case,
no EIF is counted for the latter, since it is the source system that sends the required
information out through External Outputs, rather than the target system reading and
collecting the data.
Data warehouse internal data - Star schema data model
While counters are provided with sufficient guidelines and examples for entity-
relationship data models, we have to face the case of star schema data models, which
correspond to the multidimensional cube views.
Since the fact table is not significant to the data warehouse user without its dimensional
tables, and vice versa, we suggest the strong guideline that each “logical” star is an ILF
for the EDW or DM being counted. Each (fact and dimensional) table is a Record
Element Type for such a logical file. By analogy, each “logical” cube is an
ILF with N+1 RETs, where N is the number of its dimensions (the axes of the cube).
In the case of the so-called snow-flake schema, where the hierarchical dimensions are
exploded into their levels (e.g. month – quarter – year), the second order tables do not
represent further RETs, since the counted RET stands for the whole dimension (“time” in the
cited example).
The DETs of each hierarchy are only two: dimension level and dimension value (e.g. “time level”,
which can be “month”, “quarter”, or “year”, and “time value”, which can be “January”,
“February”, …, “I”, “II”, …, “1999”, “2000”, …, and so on).
Other attributes in the tables, apart from those which implement a hierarchy, are counted
as additional DETs for the logical file. A special case of data warehouse
attributes is that of pre-derived data, i.e. data which are first derived in the ETL phase,
then recorded in the file, and finally accessed by the final user, in order to provide
maximum performance. A logical analysis should be carried out in order to distinguish the
case in which the (final) user recognises these data as contained in the files, and therefore only
retrieved by inquiries, from the case in which the user is not aware of such physical
processing and considers the data as derived online by the required output process.
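As a minimal sketch of this counting rule (the star definition below is hypothetical; only the counting logic reflects the guideline), the RETs and DETs of a “logical” star can be derived as follows:

    # Sketch: RET/DET counting for a "logical star", following the guideline above.
    # The star definition is invented; the rule is: one RET per (fact or dimension)
    # table, two DETs per hierarchical dimension (level and value), one DET per
    # remaining attribute.
    star = {
        "fact_attributes": ["units_sold", "profit"],
        "dimensions": [
            {"name": "time",     "hierarchy": True,  "other_attributes": []},
            {"name": "location", "hierarchy": True,  "other_attributes": []},
            {"name": "product",  "hierarchy": False, "other_attributes": ["code", "description"]},
        ],
    }

    def count_star(star):
        rets = 1 + len(star["dimensions"])            # fact table + N dimension tables
        dets = len(star["fact_attributes"])           # facts are plain DETs
        for dim in star["dimensions"]:
            if dim["hierarchy"]:
                dets += 2                             # "level" and "value" for the whole hierarchy
            dets += len(dim["other_attributes"])      # non-hierarchy attributes count one each
        return rets, dets

    print(count_star(star))   # (4, 8): 4 RETs and 8 DETs for this hypothetical star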
Metadata
Business metadata are good candidates for being counted as logical files; examples are:
• Data dictionary (what is the meaning of an attribute)
• Data on historical aspects (when a value for an attribute was provided)
• Data on the data owner (who provided a value for an attribute)
Transactional Functions
ETL: we suggest the strong guideline that the overall procedure of reading external
source files, cleaning and transforming their contents, possibly reading metadata, and
loading the derived information into the target system is a single process from the data
warehouse user point of view; therefore we have only one EI for each identified target
ILF. The DETs of such an EI are all the attributes which enter the boundary of the
system being counted, plus any output attributes or data, such as error or confirmation
messages to the user.
Access: the main functions of the access segment are those which let the user consult
information from the data warehouse; such processes are counted as EOs or EQs,
depending on the presence of derived data. Therefore, we have at least one process
(usually an EO) for each identified “logical star” of the data warehouse DB. Note that
drilling down or rolling up the same star is equivalent to retrieving the same data, just
using different “levels” in the dimensional hierarchies – which are all DETs of the
same star – so different levels of the view are counted only once, as they are the same
logical output.
The drill down trigger itself is usually provided by common OLAP tools as a listbox on
every “drillable” attribute. Such a mechanism is counted as a low complexity EQ (for
each distinct attribute of each distinct star), while the productivity coefficient for such a
process will strongly reduce its impact.
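Putting these transactional guidelines together, a count can be tallied as in the following sketch, where the number of stars and of drillable attributes is hypothetical and standard IFPUG 4.1 complexity weights are assumed:

    # Sketch of the transactional tally: one EI per target ILF (ETL), one EO per
    # "logical star" (access), one low complexity EQ per drillable attribute.
    # IFPUG 4.1 weights are assumed (EI: 3/4/6, EO: 4/5/7, EQ: 3/4/6).
    WEIGHTS = {"EI": {"low": 3, "avg": 4, "high": 6},
               "EO": {"low": 4, "avg": 5, "high": 7},
               "EQ": {"low": 3, "avg": 4, "high": 6}}

    stars = 2                # identified "logical stars" (ILFs) of the DM
    drillable_per_star = 3   # attributes offered as drill-down listboxes

    functions = []
    functions += [("EI", "high")] * stars                       # ETL load per target ILF
    functions += [("EO", "avg")] * stars                        # consultation output per star
    functions += [("EQ", "low")] * stars * drillable_per_star   # listbox EQ per drillable attribute

    total_fp = sum(WEIGHTS[ftype][cplx] for ftype, cplx in functions)
    print(total_fp)   # 2*6 + 2*5 + 6*3 = 40 unadjusted FP for the transactional part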
All these factors lead us to consider an innovative, structured approach to the utilization
of the Function Point measure in the software effort estimation process, when applied to
data warehouse systems. Instead of putting the mere total number of FP for a project into
a benchmarking regression equation, we found by empirical and heuristic research
a number of steps which provide an “adjusted” number, which can be seen as “FP-equivalent”
for effort estimation purpose. Of course, we should keep the original counted FP as the
size of the system in terms of the user view, while this “FP-equivalent” is a more
realistic number to use in a software effort estimation model. The coefficients proposed
in the following are to be multiplied by the original FP number of the corresponding
counted function. Only cases different from unitary (neutral) adjustment are shown.
Consider each function class in the given count (e.g. all the ILF_ETL, then all the EIF_ETL, and
so on). For each distinct function class:
a) Assign a reuse coefficient of 0.50 to each function (except the 1st) of the set of
functions which share:
• 50% or more DETs, and 50% or more RETs or FTRs.
b) Assign a reuse coefficient of 0.75 to each function (except the 1st) of the remaining
set of functions which share:
• 50% or more DETs, but less than 50% RETs or FTRs;
• less than 50% DETs, but 50% or more RETs or FTRs.
c) Assign a reuse coefficient of 1.00 (neutral) to the remaining functions.
The “1st function” means the function in the given class with the highest functional
complexity, the highest number of DETs, and the highest number of RETs or FTRs. The percent
values of DETs, RETs, and FTRs are determined with respect to this “1st function”.
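A sketch of steps a) to c) follows; for simplicity, each function is described only by the number of DETs and RETs (or FTRs) it has in common with the “1st function” of its class, and the sample figures are invented:

    # Sketch of the development reuse adjustment (steps a-c above) for one class.
    # Each tuple is (name, shared DETs, shared RETs-or-FTRs); values are invented.
    functions = [("EI_load_sales", 40, 5), ("EI_load_returns", 30, 4), ("EI_load_budget", 10, 1)]

    def reuse_coefficients(functions):
        # The "1st function" is the one with the highest complexity (most DETs, then RETs/FTRs).
        first = max(functions, key=lambda f: (f[1], f[2]))
        coeffs = {first[0]: 1.00}
        for name, dets, rets in functions:
            if name == first[0]:
                continue
            share_det = dets >= 0.5 * first[1]   # 50% or more DETs of the 1st function
            share_ret = rets >= 0.5 * first[2]   # 50% or more RETs/FTRs of the 1st function
            if share_det and share_ret:
                coeffs[name] = 0.50              # rule a)
            elif share_det or share_ret:
                coeffs[name] = 0.75              # rule b)
            else:
                coeffs[name] = 1.00              # rule c)
        return coeffs

    print(reuse_coefficients(functions))
    # {'EI_load_sales': 1.0, 'EI_load_returns': 0.5, 'EI_load_budget': 1.0}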
In the special case of CRUD transaction sets in the Administration segment, i.e. Create,
Read, Update, and Delete of a generic file type, assign a uniform 0.5 adjustment to each
transaction in the single identified CRUD set.
Added Functions
Proceed as for Development.
Internally Changed Functions (i.e. added, changed, or deleted DETs, RETs, or FTRs)
                                DET%
Reuse_ENH            ≤ 33%    ≤ 67%    ≤ 100%    > 100%
RET%      ≤ 33%       0.25     0.50     0.75      1.00
or        ≤ 67%       0.50     0.75     1.00      1.25
FTR%      ≤ 100%      0.75     1.00     1.25      1.50
          > 100%      1.00     1.25     1.50      1.75
Table 5. Reuse coefficients for Internally Changed Functions.
where the percent values are obtained by comparing the number of DETs, RETs, and FTRs
which are added, modified, or deleted with their pre-enhancement quantities.
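The lookup in Table 5 can be sketched as follows, where the two arguments are the changed (added, modified, or deleted) elements expressed as fractions of the pre-enhancement quantities:

    # Sketch of the Table 5 lookup for internally changed functions.
    THRESHOLDS = (0.33, 0.67, 1.00)      # band limits of Table 5
    TABLE5 = [
        [0.25, 0.50, 0.75, 1.00],        # RET%/FTR% <= 33%
        [0.50, 0.75, 1.00, 1.25],        # RET%/FTR% <= 67%
        [0.75, 1.00, 1.25, 1.50],        # RET%/FTR% <= 100%
        [1.00, 1.25, 1.50, 1.75],        # RET%/FTR% >  100%
    ]

    def band(pct):
        # Returns the band index (0..3) in which the percentage falls.
        for i, limit in enumerate(THRESHOLDS):
            if pct <= limit:
                return i
        return 3

    def reuse_enh(det_pct, ret_or_ftr_pct):
        return TABLE5[band(ret_or_ftr_pct)][band(det_pct)]

    # Example: 40% of the DETs and 20% of the RETs changed -> coefficient 0.50.
    print(reuse_enh(0.40, 0.20))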
Mixed Cases
If a function is changed in both internal elements and type, assign the higher of the two
adjustment coefficients from the above. For transactions, note that changes in the user
interface, layout, or fixed labels, without changes in the processing logic, are not
considered.
Deleted Functions
Assign a reuse adjustment coefficient of 0.4.
Class           DEV_DM Coefficient    ENH_DM Coefficient
EO_ACC                 0.5                   0.5
EQ_ACC                 0.5                   0.5
EQ_LISTBOX             0.1                   0.1
Table 6. DW technology adjustments.
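Combining the reuse coefficients with the technology coefficients of Table 6, the “FP-equivalent” of a count can be sketched as below; the function list is invented for illustration:

    # Sketch: FP-equivalent = sum of each function's FP multiplied by its reuse
    # coefficient and, where applicable, by the DW technology coefficient (Table 6).
    TECH = {"EO_ACC": 0.5, "EQ_ACC": 0.5, "EQ_LISTBOX": 0.1}

    # (class, unadjusted FP, reuse coefficient) - hypothetical values
    functions = [
        ("EI_ETL",     6, 1.00),
        ("EI_ETL",     6, 0.50),
        ("EO_ACC",     5, 1.00),
        ("EQ_LISTBOX", 3, 1.00),
        ("EQ_LISTBOX", 3, 0.75),
    ]

    fp_counted    = sum(fp for _, fp, _ in functions)
    fp_equivalent = sum(fp * reuse * TECH.get(cls, 1.0) for cls, fp, reuse in functions)

    print(fp_counted)               # 23 FP: the user-view size, which is left untouched
    print(round(fp_equivalent, 3))  # 12.025 "FP-equivalent", used only for effort estimation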
Effort Estimation
After we obtain the “FP-equivalent” from the previous adjustments, we can put its value
into a benchmarking regression equation, such as the following, which has been obtained (by
filtering on several sample attributes) from the ISBSG Benchmark:
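For illustration only, such regression equations typically take a power-law form,

    Effort = A × (FP-equivalent)^B

where A and B are placeholder coefficients depending on the benchmark sample selected, not values taken from the ISBSG Benchmark.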
Note that this equation is just an example; more precise estimations can be obtained
only by creating a “local benchmark” for the given company, project team, or
department. However, one further step is still to be made: specific productivity
adjustment of the average effort estimate.
This last step is carried out by means of the well-known COCOMO II model; we recall
that only some factors of the original COCOMO II model are to be used, since, for
example, the REUSE factor is already explicitly considered in the previous steps, when
calculating the FP-equivalent.
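A sketch of this last adjustment, assuming a small, hypothetical selection of COCOMO II-style effort multipliers (the ratings below are invented, not taken from the model tables):

    # Sketch of the productivity adjustment: the average effort estimate from the
    # benchmark equation is multiplied by the product of the selected effort
    # multipliers. All values are hypothetical.
    from math import prod

    nominal_effort_hours = 3500          # average estimate from the benchmark equation

    effort_multipliers = {
        "analyst_capability":  0.85,     # experienced team -> effort decreases
        "platform_volatility": 1.15,     # evolving DW tool suite -> effort increases
        "schedule_constraint": 1.00,     # nominal
        # no reuse factor here: reuse is already embedded in the FP-equivalent
    }

    adjusted_effort = nominal_effort_hours * prod(effort_multipliers.values())
    print(round(adjusted_effort))        # 3421 hours (3500 * 0.85 * 1.15 * 1.00)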
Another issue to be faced is the creation of a specific benchmark for the data
warehouse domain and technology, since this typology is going to play a relevant role
in the future of public and private companies, which have to manage more and more
information in less and less time.
REFERENCES
• Baralis E., “Data Mining”, Politecnico di Torino, 1999
• COCOMO II Model Definition Manual, Release 1.4, University of Southern
California, 1997
• COSMIC Full Function Point Measurement Manual, Version 2.0, Serge
Oligny, 1999
• Huijgens H., “Estimating Cost of Software Maintenance: a Real Case
Example”, NESMA, 2000
• Dyché J., “e-Data: Turning Data into Information with Data Warehousing”,
Addison-Wesley, 2000
• Dyché J., “e-Data: Turning Data into Information with Data Warehousing”,
Addison-Wesley, 2000
• IFPUG Function Point Counting Practices Manual, Release 4.1, IFPUG,
1999
• ISBSG Benchmark, Release 6, ISBSG, 2000
• Torlone R., “Data Warehousing”, Dipartimento di Informatica e Automazione,
Università di Roma Tre, 1999