Σ KEYSUM
© 1997 Data Management & Warehousing
INTRODUCTION
Keysum is a new and interesting
technique (not a product) in the
generation of keys within a database. It
has particular application within Data
Warehouses where keys are often
made up of de-normalised
alphanumeric data.
THE PROBLEMS
Data that has been de-normalised often
has a primary key that is made up of a
single string, a series of concatenated
strings, or other data types that can be
converted to strings. The key is
traditionally costly in terms of storage
requirements and access speed when
used in an index. It is, however, vital to
the usability of the data.
The second issue is that in a data
warehousing environment data may be
loaded and assigned an arbitrary
unique number as a key. If the data
needs to be re-loaded at a later date,
possibly with additions, then it is
impossible to guarantee that the same
arbitrary key will be assigned to the
same row.
THE SOLUTION
The solution is simplicity itself. The
generated key of the row should be the
checksum of the string that makes up
the unique key. This will, depending
on the checksum algorithm chosen,
generate a large integer that will be
nearly unique within the scope of the
data. For example, the industry
standard CRC32 algorithm generates a
number in the range 0 to 4294967295
(2^32 - 1), whilst the Message Digest
algorithm MD5 generates a number
between 0 and about 3.4 * 10^38
(2^128 - 1).
In addition to this the result can
incorporate the length of the original
string which improves the uniqueness
of lower order algorithm results
considerably.
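
As a minimal sketch of the idea, assuming Python and its standard zlib and hashlib modules (neither is part of the original implementation), the key can be derived as below. The helper names, the UTF-8 encoding, the sample address string and the choice to keep the length as a second value beside the checksum are all assumptions made for illustration.

```python
import hashlib
import zlib


def crc32_key(text):
    """CRC32 of the UTF-8 bytes, as an unsigned integer in 0 .. 2**32 - 1."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def md5_key(text):
    """MD5 digest of the UTF-8 bytes, read as an integer in 0 .. 2**128 - 1."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


def key_with_length(text):
    """Pair the CRC32 value with the byte length of the original string."""
    return crc32_key(text), len(text.encode("utf-8"))


# A made-up natural key: de-normalised address fields joined with "|".
natural_key = "138|FINCHAMPSTEAD ROAD|WOKINGHAM|RG41 2NU"
print(crc32_key(natural_key))
print(md5_key(natural_key))
print(key_with_length(natural_key))
```

Two strings of different lengths that happen to share a CRC32 value still produce different (checksum, length) pairs, which is why carrying the length helps the 32-bit case.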
HOW DOES THIS HELP?
The table key is now an integer, the
optimal format on which to index. The
user now calls a function to convert the
required string into the checksum and
uses the index to look up the
appropriate row. On very large tables
this is considerably faster than
conventional string look-up.
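
The look-up described above might be sketched as follows. This is an illustration only: it assumes Python with an in-memory SQLite database rather than the database used by the author, and the table, column and function names are hypothetical. The extra comparison on the original string is an added safeguard against duplicate checksums, not something the text requires.

```python
import sqlite3
import zlib


def crc32_key(text):
    """Unsigned CRC32 of the UTF-8 bytes of the key string."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (keysum INTEGER, natural_key TEXT, town TEXT)")
# The index is built on the integer checksum, not on the string.
conn.execute("CREATE INDEX address_keysum ON address (keysum)")

sample_rows = [
    ("138|FINCHAMPSTEAD ROAD|RG41 2NU", "Wokingham"),
    ("1|HIGH STREET|AB1 2CD", "Exampletown"),
]
for natural_key, town in sample_rows:
    conn.execute(
        "INSERT INTO address VALUES (?, ?, ?)",
        (crc32_key(natural_key), natural_key, town),
    )


def lookup(natural_key):
    """Find a row through the integer index, then confirm the string matches."""
    cursor = conn.execute(
        "SELECT town FROM address WHERE keysum = ? AND natural_key = ?",
        (crc32_key(natural_key), natural_key),
    )
    row = cursor.fetchone()
    return row[0] if row else None


print(lookup("138|FINCHAMPSTEAD ROAD|RG41 2NU"))
```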
Furthermore the data can be validated:
if the current checksum differs from
the stored checksum then the data has
changed. The technique also works when
re-loading data, because the same
string always produces the same key, so
existing data can still reference the
old key. Note that when a field within
the key is altered the key must also be
re-generated.
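
Both properties, stable keys across re-loads and detection of changed data, can be illustrated with a short sketch. It assumes Python's zlib module; the function names and sample strings are hypothetical.

```python
import zlib


def crc32_key(text):
    """Unsigned CRC32 of the UTF-8 bytes of the key string."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def load(natural_keys):
    """Assign each row its checksum key; the result ignores load order."""
    return {crc32_key(key): key for key in natural_keys}


first_load = load(["ACME|LONDON", "BETA|LEEDS"])
# Re-load later, in a different order and with an additional row.
second_load = load(["GAMMA|YORK", "BETA|LEEDS", "ACME|LONDON"])
keys_preserved = all(second_load.get(k) == v for k, v in first_load.items())


def has_changed(stored_checksum, current_text):
    """True when the text no longer matches the checksum stored with it."""
    return crc32_key(current_text) != stored_checksum


stored = crc32_key("ACME|LONDON")
print(keys_preserved)
print(has_changed(stored, "ACME|LONDON"))
print(has_changed(stored, "ACME|LONDOM"))
```

An arbitrary sequence number would have given "ACME|LONDON" a different key in the second load; the checksum depends only on the string itself.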
If this technique is used in contexts
such as trend analysis within a Data
Warehouse it is also possible that the
occasional mis-match because of a
duplicate checksum will not be
statistically significant and therefore
the key can be considered unique.
WHAT ARE THE ISSUES?
No checksum is guaranteed to be
unique. It is therefore possible that two
different records can return the same
value. Including the length in the
checksum still gives no guarantee, but it
further reduces the risk. When
choosing a checksum algorithm it is
important to consider the number of
records for which the checksum will
provide a key. If you have a table with
500,000 rows (such as a table that
contains addresses) then with CRC32
any one new key has roughly a 1 in
8,600 chance of duplicating an existing
key, without considering the length of
the original string. Taken across all
pairs of rows in such a table, at least
one duplicate is almost certain (the
birthday effect).
MD5, on the other hand, has a remote
1 in 6.8 * 10^32 chance of generating a
duplicate checksum. This is because it
uses 128 bits, whereas CRC32 uses
only 32 bits.
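
The figures quoted for a 500,000 row table can be reproduced with a little arithmetic. The sketch below assumes Python and an evenly distributed checksum; the function names are hypothetical. The second function goes beyond the original text: it uses the standard birthday approximation to estimate the chance that any two rows in the table share a checksum, which is a much stricter test than one new key matching an existing one.

```python
import math


def single_key_odds(rows, bits):
    """Odds (N to 1) against one new key matching any of `rows` existing keys."""
    return 2 ** bits / rows


def any_duplicate_probability(rows, bits):
    """Birthday approximation: chance that at least two of `rows` keys collide."""
    pairs = rows * (rows - 1) / 2
    return -math.expm1(-pairs / 2 ** bits)


print(round(single_key_odds(500_000, 32)))       # CRC32, per new key
print(single_key_odds(500_000, 128))             # MD5, per new key
print(any_duplicate_probability(500_000, 32))    # CRC32, anywhere in the table
print(any_duplicate_probability(500_000, 128))   # MD5, anywhere in the table
```

For CRC32 the per-key odds come out at about 8,590 to 1, yet the chance of a duplicate somewhere in the table is very close to 1. For MD5 both measures remain negligible.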
When implementing the algorithm it is
important to note that checksums
normally return unsigned integers as
their result. Your database and routines
that access the checksum must all be
able to handle the size of the result and
ensure that they deal with the issue of
signed versus unsigned variables.
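
As an illustration of the signed versus unsigned issue, the following sketch (Python, with hypothetical function names) converts a 32-bit checksum between the unsigned form most algorithms return and the signed form that a 32-bit integer column or variable would hold.

```python
def to_signed32(value):
    """Map an unsigned 32-bit checksum onto the signed 32-bit range."""
    return value - 2 ** 32 if value >= 2 ** 31 else value


def to_unsigned32(value):
    """Recover the unsigned checksum from its signed 32-bit form."""
    return value & 0xFFFFFFFF


checksum = 0xCBF43926  # CRC32 of the ASCII string "123456789"
stored = to_signed32(checksum)
print(stored, to_unsigned32(stored) == checksum)
```

Every routine that reads or writes the key has to apply the same convention, otherwise look-ups for checksums of 2^31 and above will silently miss. A 128-bit MD5 value does not fit in a 64-bit integer at all, so it needs a wider numeric type or a split representation.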
IS THIS FEATURE AVAILABLE NOW?
There is no direct implementation of a
checksum within the SQL dialects of
the major vendors currently available;
however, it can be implemented via an
external procedure call.
The author has implemented this
technique within an Oracle7™
database. A daemon was created that
took as its input the string and returned
two values, the checksum and the
length. This was connected to the
database via a ‘Database Pipe’. When a
checksum was required a PL/SQL
stored procedure was called that placed
the string into the database pipe and
received the two values, the checksum
and the length, back.
The daemon was also implemented as
a shared library so that it could be
accessed from the command line and
from other utilities that could call a
shared ‘C’ library.
An optional parameter was included to
allow the use of different algorithms in
different contexts. For example where
only a small data set needs a checksum
key then CRC32 may be suitable,
whilst MD5 is used only for the largest
data sets.
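
The interface described above, a string in and a checksum and a length out with an optional choice of algorithm, might look like the sketch below. It is written in Python for illustration and is not the author's daemon or shared library; the function name, the parameter name and the UTF-8 encoding are assumptions.

```python
import hashlib
import zlib


def keysum(text, algorithm="crc32"):
    """Return (checksum, length) for `text`, using the requested algorithm."""
    data = text.encode("utf-8")  # the encoding is an assumption
    if algorithm == "crc32":
        checksum = zlib.crc32(data) & 0xFFFFFFFF  # unsigned 32-bit value
    elif algorithm == "md5":
        checksum = int.from_bytes(hashlib.md5(data).digest(), "big")  # 128-bit value
    else:
        raise ValueError("unsupported algorithm: " + algorithm)
    return checksum, len(data)


print(keysum("ACME|LONDON"))          # small data set: CRC32 by default
print(keysum("ACME|LONDON", "md5"))   # large data set: MD5 on request
```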
WHERE DO I GET A CHECKSUM
ALGORITHM?
The inevitable answer to this question
is ‘From the Internet’. Any site that
distributes the source for FreeBSD
includes an implementation of CRC32.
MD5 is also widely available.
THE FUTURE DIRECTION
The author hopes that in the future
database vendors such as Oracle will
add the checksum function to their
SQL dialects. Once available as an
in-built function the need to implement
checksums via external procedure calls
will disappear and performance will be
improved even more. It will also allow
some standardisation in the choice and
handling of the checksum algorithms.
Data Management & Warehousing is
the trading name of David M Walker, a
freelance Data Warehousing consultant.
Address: 138, Finchampstead Road,
Wokingham, Berkshire,
RG41 2NU, United Kingdom.
WWW: http://www.datamgmt.com
Telephone: +44 (0) 7050 028 911
Fax: +44 (0) 7050 028 912
Copyright © 1997 All rights reserved.
All Copyrights and Trademarks respected
MD5 Copyright © 1991-2, RSA Data Security, Inc.
Oracle7™ is a trademark of Oracle Corporation
WHAT IS THE MD5 MESSAGE-DIGEST ALGORITHM?
MD5 is a message-digest algorithm. The algorithm takes as input a message of
arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of
the input. It is conjectured that it is computationally infeasible to produce two
messages having the same message digest, or to produce any message having a given
pre-specified target message digest.
The MD5 algorithm is designed to be quite fast on 32-bit machines. In addition, the
MD5 algorithm does not require any large substitution tables; the algorithm can be
coded quite compactly.
Copyright (C) 1991-2, RSA Data Security, Inc. Created 1991. All rights reserved.
IOUG93 - Technical Architecture for the Data Warehouse - Presentation
David Walker
 
ETIS11 - Enterprise Metadata Management
ETIS11 -  Enterprise Metadata ManagementETIS11 -  Enterprise Metadata Management
ETIS11 - Enterprise Metadata Management
David Walker
 


Keysum - Using Checksum Keys

Σ KEYSUM
© 1997 Data Management & Warehousing

INTRODUCTION
Keysum is a new and interesting technique (not a product) for generating keys within a database. It has particular application within Data Warehouses, where keys are often made up of de-normalised alphanumeric data.

THE PROBLEMS
Data that has been de-normalised often has a primary key that is made up of a single string, a series of concatenated strings, or other data types that can be converted to strings. Such a key is traditionally costly in terms of storage requirements and access speed when used in an index. It is, however, vital to the usability of the data.

The second issue is that in a data warehousing environment data may be loaded and assigned an arbitrary unique number as a key. If the data needs to be re-loaded at a later date, possibly with additions, then it is impossible to guarantee that the same arbitrary key will be assigned to the same row.

THE SOLUTION
The solution is simplicity itself. The generated key of the row should be the checksum of the string that makes up the unique key. This will, depending on the checksum algorithm chosen, generate a large integer that will be nearly unique within the scope of the data. For example, the industry standard CRC32 algorithm will generate a number in the range 0 to 4294967295 (2^32 - 1), whilst the Message Digest algorithm MD5 will generate a number between 0 and approximately 3.4 × 10^38.

In addition, the result can incorporate the length of the original string, which considerably improves the uniqueness of the results of lower order algorithms.

HOW DOES THIS HELP?
The table key is now an integer, the optimal format on which to index. The user now calls a function to convert the required string into the checksum and uses the index to look up the appropriate row. On very large tables this is considerably faster than conventional string look-up.
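As a rough illustration of the key generation described above, the following Python sketch builds an integer key from the CRC32 of the concatenated key fields and folds in the length of the string. The function name `keysum`, the field separator, the UTF-8 encoding and the way the length is packed above the checksum are all assumptions made for the example; the text does not prescribe any of them.

```python
import zlib

# Assumed delimiter between key fields, so that ("ab", "c") and ("a", "bc")
# do not concatenate to the same string.
SEPARATOR = "\x1f"


def keysum(*parts: object) -> int:
    """Return an integer key built from the CRC32 and length of the key string."""
    text = SEPARATOR.join(str(part) for part in parts)
    data = text.encode("utf-8")  # assumed encoding
    crc = zlib.crc32(data) & 0xFFFFFFFF  # unsigned 32-bit checksum
    # One possible layout: byte length in the bits above the 32-bit checksum.
    return (len(data) << 32) | crc


# The same input always produces the same key, so a re-load assigns
# the same key to the same row.
key = keysum("138 Finchampstead Road", "Wokingham", "RG41 2NU")
assert key == keysum("138 Finchampstead Road", "Wokingham", "RG41 2NU")
```

Because the key depends only on the content of the key fields, it can be recomputed at load time or at query time without consulting a sequence or a lookup table.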
Furthermore, the data can be validated: if the current checksum differs from the stored checksum then the data has changed. This also works when re-loading data, as any existing data will still be able to reference the old key. It should also be noted that when a field within the key is altered, the key also needs to be re-generated.

If this technique is used in contexts such as trend analysis within a Data Warehouse, it is also possible that the occasional mis-match caused by a duplicate checksum will not be statistically significant, and therefore the key can be considered unique.

WHAT ARE THE ISSUES?
No checksum is guaranteed to be unique. It is therefore possible that two different records can return the same value. Including the length in the checksum still does not guarantee uniqueness, but it further reduces the risk.

When choosing a checksum algorithm it is important to consider the number of records for which the checksum will provide a key. If you have a table with 500,000 rows (such as a table that contains addresses) then with CRC32 any one key has roughly a 1 in 8,500 chance of duplicating another, without considering the length of the original string.
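The CRC32 figure above appears to correspond to dividing the size of the checksum space by the number of rows (2^32 / 500,000 ≈ 8,590); that reading is an assumption, not something the text states. The sketch below, in Python with hypothetical function names, computes that per-key figure and, for comparison, the standard birthday approximation for the chance that at least one pair of rows anywhere in the table shares a checksum. Both calculations assume that checksums are uniformly distributed.

```python
import math


def single_key_odds(rows: int, bits: int) -> float:
    """Return N such that one key has about a 1-in-N chance of matching an existing row."""
    return 2 ** bits / rows


def prob_any_duplicate(rows: int, bits: int) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / (2 * 2^bits))."""
    space = 2 ** bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-rows * (rows - 1) / (2 * space))


crc32_odds = single_key_odds(500_000, 32)        # roughly 8,590
crc32_any = prob_any_duplicate(500_000, 32)      # very close to 1
md5_any = prob_any_duplicate(500_000, 128)       # vanishingly small
```

For 500,000 rows and a 32-bit checksum the per-key figure is about 1 in 8,590, while the estimate for at least one duplicate somewhere in the table is close to 1. A CRC32-only key on a table of this size should therefore be expected to contain some duplicates, which is why the choice of algorithm and the use of the length matter.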
MD5, on the other hand, has only a remote chance, roughly 1 in 6.8 × 10^32, of generating a duplicate checksum. This is because it uses 128 bits, whereas CRC32 uses only 32 bits.

When implementing the algorithm it is important to note that checksums normally return unsigned integers as their result. Your database and the routines that access the checksum must all be able to handle the size of the result and must deal with the issue of signed versus unsigned variables.

IS THIS FEATURE AVAILABLE NOW?
There is no direct implementation of a checksum within the SQL dialects of the major vendors currently available; however, it can be implemented via an external procedure call.

The author has implemented this technique within an Oracle7™ database. A daemon was created that took the string as its input and returned two values, the checksum and the length. This was connected to the database via a ‘Database Pipe’. When a checksum was required, a PL/SQL stored procedure was called that placed the string into the database pipe and received the two values, the checksum and the length, back.

The daemon was also implemented as a shared library so that it could be accessed from the command line and from other utilities that could call a shared ‘C’ library. An optional parameter was included to allow the use of different algorithms in different contexts. For example, where only a small data set needs a checksum key, CRC32 may be suitable, whilst MD5 is used only for the largest data sets.

WHERE DO I GET A CHECKSUM ALGORITHM?
The inevitable answer to this question is ‘From the Internet’. Any site that distributes the source for FreeBSD includes an implementation of CRC32. MD5 is also widely available.

THE FUTURE DIRECTION
The author hopes that in the future database vendors such as Oracle will add a checksum function to their SQL dialects.
Once available as an in-built function, the need to implement checksums via external procedure calls will disappear and performance will be improved even more. It will also allow some standardisation in the choice and handling of the checksum algorithms.

Data Management & Warehousing is the trading name of David M Walker, a freelance Data Warehousing consultant.
Address: 138, Finchampstead Road, Wokingham, Berkshire, RG41 2NU, United Kingdom.
WWW: http://www.datamgmt.com
Telephone: +44 (0) 7050 028 911
Fax: +44 (0) 7050 028 912
Copyright © 1997 All rights reserved. All Copyrights and Trademarks respected.
MD5 Copyright © 1991-2, RSA Data Security, Inc.
Oracle7™ is a trademark of Oracle Corporation.

WHAT IS THE MD5 MESSAGE-DIGEST ALGORITHM?
MD5 is a message-digest algorithm. The algorithm takes as input a message of arbitrary length and produces as output a 128-bit "fingerprint" or "message digest" of the input. It is conjectured that it is computationally infeasible to produce two messages having the same message digest, or to produce any message having a given pre-specified target message digest. The MD5 algorithm is designed to be quite fast on 32-bit machines. In addition, the MD5 algorithm does not require any large substitution tables; the algorithm can be coded quite compactly.
Copyright (C) 1991-2, RSA Data Security, Inc. Created 1991. All rights reserved.
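To illustrate the 128-bit digest described above alongside the earlier implementation notes (a routine returning both the checksum and the length, an optional algorithm parameter, and the signed versus unsigned caveat), here is a minimal Python sketch. It is not the author's Oracle7 daemon; the names `checksum_and_length` and `to_signed` are hypothetical, and the UTF-8 encoding is an assumption.

```python
import hashlib
import zlib


def checksum_and_length(text: str, algorithm: str = "crc32") -> tuple[int, int]:
    """Return (unsigned checksum, byte length) for the given key string."""
    data = text.encode("utf-8")  # assumed encoding
    if algorithm == "crc32":
        value = zlib.crc32(data) & 0xFFFFFFFF  # 32-bit unsigned
    elif algorithm == "md5":
        # Interpret the 16-byte digest as one 128-bit unsigned integer.
        value = int.from_bytes(hashlib.md5(data).digest(), "big")
    else:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return value, len(data)


def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned value as a two's-complement signed integer
    of the same width, for storage in a signed column type."""
    if value >= 1 << (bits - 1):
        return value - (1 << bits)
    return value


crc, length = checksum_and_length("123456789")
signed_crc = to_signed(crc, 32)  # negative whenever the top bit is set
```

Converting to a signed value is only one way of handling the mismatch; a column type wide enough to hold the unsigned result avoids the conversion altogether.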