Practical Data Mining, 1st Edition
Monte F. Hancock, Jr.
Achieves a unique and delicate balance between depth, breadth, and clarity.
—Stefan Joe-Yen, Cognitive Research Engineer, Northrop Grumman Corporation
& Adjunct Professor, Department of Computer Science, Webster University
Used as a primer for the recent graduate or as a refresher for the grizzled veteran,
Practical Data Mining is a must-have book for anyone in the field of data
mining and analytics.
Used by corporations, industry, and government to inform and fuel everything from
focused advertising to homeland security, data mining can be a very useful tool
across a wide range of applications. Unfortunately, most books on the subject are
designed for the computer scientist and statistical illuminati and leave the reader
largely adrift in technical waters.
Revealing the lessons known to the seasoned expert, yet rarely written down for
the uninitiated, Practical Data Mining explains the ins-and-outs of the detection,
characterization, and exploitation of actionable patterns in data. This working field
manual outlines the what, when, why, and how of data mining and offers an easy-
to-follow, six-step spiral process.
Helping you avoid common mistakes, the book describes specific genres of data
mining practice. Most chapters contain one or more case studies with detailed
project descriptions, methods used, challenges encountered, and results obtained.
The book includes working checklists for each phase of the data mining process.
Your passport to successful technical and planning discussions with management,
senior scientists, and customers, these checklists lay out the right questions to ask
and the right points to make from an insider’s point of view.
http://www.celestech.com/PracticalDataMining
K13109
ISBN: 978-1-4398-6836-2
www.crcpress.com
www.auerbach-publications.com
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize to
copyright holders if permission to publish in this form has not been obtained. If any copyright material has
not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmit-
ted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented,
including photocopying, microfilming, and recording, or in any information storage or retrieval system,
without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and
registration for a variety of users. For organizations that have been granted a photocopy license by the CCC,
a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used
only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
This book is dedicated to my beloved wife, Sandy, and to my dear little sister, Dr.
Angela Lobreto. You make life a joy.
Also, to my professional mentors George Milligan, Dr. Craig Price, and Tell Gates,
three of the finest men I have ever known, or ever hope to know: May God bless you
richly, gentlemen; He has blessed me richly through you.
Contents
Dedication
Preface
Acknowledgments
References
Glossary
Index
Preface
Data mining is much more than just trying stuff and hoping something good happens!
Rather, data mining is the detection, characterization, and exploitation of actionable
patterns in data.
This book is a wide-ranging treatment of the practical aspects of data mining in
the real world. It presents in a systematic way the analytic principles acquired by the
author during his 30+ years as a practicing engineer, data miner, information scientist,
and Adjunct Professor of Computer Science.
This book is not intended to be read and then put on the shelf. Rather, it is a working
field manual, designed to serve as an on-the-job guidebook. It has been written specifi-
cally for IT consultants, professional data analysts, and sophisticated data owners who
want to establish data mining projects but are not themselves data mining experts.
Most chapters contain one or more case studies. These are synopses of data mining
projects led by the author, and include project descriptions, the data mining methods
used, challenges encountered, and the results obtained. When possible, numerical
details are provided, grounding the presentation in specifics.
Also included are checklists that guide the reader through the practical considera-
tions associated with each phase of the data mining process. These are working check-
lists: material the reader will want to carry into meetings with customers, planning
discussions with management, technical planning meetings with senior scientists,
etc. The checklists lay out the questions to ask and the points to make, and explain the
whats and whys: the lessons learned that are known to all seasoned experts but rarely
written down.
While the treatment here is systematic, it is not formal: the reader will not encoun-
ter eclectic theorems, tables of equations, or detailed descriptions of algorithms. The
“bit-level” mechanics of data mining techniques are well covered in online
literature, and freeware is available for many of them. A brief list of vendors and sup-
ported applications is provided below. The goal of this book is to help the non-expert
address practical questions like:
The content of the book is divided into two parts: Chapters 1–8 and Chapters 9–11.
The first eight chapters constitute the bulk of the book, and serve to ground the
reader in the practice of data mining in the modern enterprise. These chapters focus
on the what, when, why, and how of data mining practice. Technical complexities are
introduced only when they are essential to the treatment. This part of the book should
be read by everyone; later chapters assume that the reader is familiar with the concepts
and terms presented in these chapters.
Chapter 1 (What is Data Mining and What Can it Do?) is a data mining manifesto:
it describes the mindset that characterizes the successful data mining practitioner. It
delves into some philosophical issues underlying the practice (e.g., Why is it essential
that the data miner understand the difference between data and information?).
Chapter 2 (The Data Mining Process) provides a summary treatment of data min-
ing as a six-step spiral process.
Chapters 3–8 are devoted to each of the steps of the data mining process. Check-
lists, case studies, tables, and figures abound.
The last three chapters, 9–11, are devoted to specific categories of data mining
practice, referred to here as genres. The data mining genres addressed are Chapter
9: Detecting and Characterizing Known Patterns (Supervised Learning), Chapter 10:
Detecting, Characterizing, and Exploiting Hidden Patterns (Forensic Analysis), and
Chapter 11: Knowledge: Its Acquisition, Representation, and Use.
It is hoped the reader will benefit from this rendition of the author’s extensive
experience in data mining/modeling, pattern processing, and automated decision
support. He started this journey in 1979, and learned most of this material the hard
way. By repeating his successes and avoiding his mistakes, you make his struggle
worthwhile!
As originally conceived, computers were just that: machines for performing computa-
tion. Volumes of data might be input, but the answer tended to consist of just a few
numbers. Early computers had nothing that we would call online storage.
Reliable, inexpensive mass storage devices did not exist. Data was not stored in the
computer at all: it was input, transformed, and output. Computing was done to obtain
answers, not to manage data.
Data was saved outside of the computer, on paper tape and cards, and read back in
when needed. The use of online mass storage was not widespread, because it was expen-
sive, slow, and unstable.
With the invention of stable, cost-effective mass storage devices, everything changed.
Over time, the computer began to be viewed less as a machine for crunching numbers,
and more as a device for storing them. Initially, the operating system’s file management
system was used to hold data in flat files: un-indexed lists or tables of data. As the
need to search, sort, and process data grew, it became necessary to provide applications
for organizing data into various types of business-specific hierarchies. These early
databases organized data into tiered structures, allowing for rapid searching of records
in the hierarchy.
Data was stored on high-density media such as magnetic tape and magnetic drum.
Platter disc technology came into more general use, but was still slow and
had low capacity.
Reliable, cost-effective online mass storage became widely available. Data was organized
into domain specific vertical structures, typically for a single part of an organization.
This allowed the development of stovepipe systems for focused applications. The use of
Online Transaction Processing (OLTP) systems became widespread, supporting inven-
tory, purchasing, sales, planning, etc. The focus of computing began to shift from raw
computation to data processing: the ingestion, transformation, storage, and retrieval
of bulk data.
However, there was an obvious shortcoming. The databases of functional orga-
nizations within an enterprise were developed to suit the needs of particular business
units. They were not interoperable, making the preparation of an enterprise-wide data
view very difficult. The difficulty of horizontal integration caused many to question
whether the development of enterprise-wide databases was feasible.
As the utility of automatic data storage became clear, organizations within businesses
began to construct their own hierarchical databases. Soon, the repositories of corporate
information on all aspects of a business grew to be large.
Increased processing power, widespread availability of reliable communication net-
works, and development of database technology allowed the horizontal integration of
multiple vertical data stores into an enterprise-wide database. For the first time, a global
view of an entire organization’s data repository was accessible through a single portal.
This brings us to the present. Mass storage and raw compute power have reached the
point today where virtually every data item generated by an enterprise can be saved.
And often, enterprise databases have become extremely large, architecturally complex,
and volatile. Ultra-sophisticated data modeling tools have become available at the pre-
cise moment that competition for market share in many industries begins to peak. An
appropriate environment for application of these tools to a cleansed, stable, offline
repository was needed and data warehouses were born. And, as data warehouses have
grown large, the need to create architecturally compatible functional subsets, or data
marts, has been recognized.
The immediate future is moving everything toward cloud computing. This will
include the elimination of many local storage disks as data is pushed to a vast array of
external servers accessible over the internet. Data mining in the cloud will continue
to grow in importance as network connectivity and data accessibility become virtu-
ally infinite.
Some feeling for the current interest in data mining can be gained by reviewing the
following list of data mining companies, groups, publications, and products.
The data mining tools in the following list are used for general types of data:
• Netica—BBN software that is easy to use, and implements BBN learning from
data. It has a nice user interface.
http://www.norsys.com
• Hugin—Implements reasoning with continuous variables and has a nice user
interface.
http://www.hugin.dk
About the Author
Monte F. Hancock, Jr., BA, MS, is Chief Scientist for Celestech, Inc., which has
offices in Falls Church, Virginia, and Phoenix, Arizona. He was also a Technical
Fellow at Northrop Grumman; Chief Cognitive Research Scientist for CSI, Inc.; and
a software architect and engineer at Harris Corporation and HRB Singer, Inc.
He has over 30 years of industry experience in software engineering and data mining
technology development.
He is also Adjunct Full Professor of Computer Science for the Webster University
Space Coast Region, where he serves as Program Mentor for the Master of Science
Degree in Computer Science. Monte has served for 26 years on the adjunct faculty in
the Mathematics and Computer Science Department of the Hamilton Holt School of
Rollins College, Winter Park, Florida, and served 3 semesters as adjunct Instructor in
Computer Science at Pennsylvania State University.
Monte teaches secondary Mathematics, AP Physics, Chemistry, Logic, Western
Philosophy, and Church History at New Covenant School, and New Testament Greek
at Heritage Christian Academy, both in Melbourne, Florida. He was a mathematics
curriculum developer for the Department of Continuing Education of the University
of Florida in Gainesville, and serves on the Industry Advisory Panels in Computer
Science for both the Florida Institute of Technology, and Brevard Community
College in Melbourne, Florida. Monte has twice served on panels for the National
Science Foundation.
Monte has served on many program committees for international data mining conferences
and was a Session Chair for KDD. He has presented 15 conference papers, edited
several book chapters, and co-authored the book Data Mining Explained with Rhonda
Delmater, Digital Press, 2001.
Monte is cited in (among others):
It is always a pleasure to recognize those who have provided selfless support in the
completion of a significant work.
Special thanks are due to Rhonda Delmater, with whom I co-authored my first book,
Data Mining Explained (Digital Press, 2001), and who proposed the development of
this book. Were it not for exigent circumstances, this would have been a joint work.
Special thanks are also due to Theron Shreve (acquisition editor), Marje Pollack
(compositor), and Rob Wotherspoon (copy editor) of Derryfield Publishing Services,
LLC. What a pleasure to work with professionals who know the business and under-
stand people!
Special thanks are due to Dan Strohschein, who worked on technical references,
and Katherine Hancock, who verified the vendor list.
Finally, to those who have made significant contributions to my knowledge
through the years: John Day, Chad Sessions, Stefan Joe-Yen, Rusty Topping, Justin
Mortimer, Leslie Kain, Ben Hancock, Olivia Hancock, Marsha Foix, Vinnie, Avery,
Toby, Tristan, and Maggie.
Chapter 1
What Is Data Mining
and What Can It Do?
Purpose
The purpose of this chapter is to provide the reader with grounding in the fundamental
philosophical principles of data mining as a technical practice. The reader is then intro-
duced to the wide array of practical applications that rely on data mining technology.
The issue of computational complexity is addressed in brief.
Goals
After you have read this chapter, you will be able to define data mining from both
philosophical and operational perspectives, and enumerate the analytic functions data
mining performs. You will know the different types of data that arise in practice. You
will understand the basics of computational complexity theory. Most importantly, you
will understand the difference between data and information.
1.1 Introduction
Our study of data mining begins with two semi-formal definitions:
Taking this view of what data mining is, we can formulate a functional definition
that tells us what individuals engaged in data mining do.
Definition 2. Data Mining is the application of the scientific method to data to obtain
useful information. The heart of the scientific approach to problem-solving is rational
hypothesis testing guided by empirical experimentation.
What we call science today was referred to as natural philosophy in the 15th
century. The Aristotelian approach to understanding the world was to catalog and
organize more-or-less passive acts of observation into taxonomies. This method began
to fall out of favor in the physical sciences in the 15th century, and was dead by the 17th
century. However, because of the greater difficulty of observing the processes underly-
ing biology and behavior, the life sciences continued to rely on this approach until well
into the 19th century. This is why the life sciences of the 1800s are replete with taxono-
mies, detailed naming conventions, and perceived lines of descent, which are more a
matter of organizing observations than principled experimentation and model revision.
Applying the scientific method today, we expect to engage in a sequence of planned
steps:
cations, but it also has philosophical implications. In particular, since there are by
definition no perfect techniques for intractable problems, different people will handle
them in different ways; no one can say definitively that one way is necessarily wrong
and another right. This makes data mining something of an art, and leaves room for
the operation of both practical experience and creative experimentation. It also implies
that the data mining philosophy to which you look when science falls short can mean
the difference between success and failure. Let’s talk a bit about developing such a data
mining philosophy.
As noted above, data mining can be thought of as the application of the scientific
method to data. We perform data collection (sampling), formulate hypotheses (e.g.,
visualization, cluster analysis, feature selection), conduct experiments (e.g., construct
and test classifiers), refine hypotheses (spiral methodology), and ultimately build theo-
ries (field applications). This is a process that can be reviewed and replicated. In the real
world, the resulting theory will either succeed or fail.
Many of the disciplines that apply to empirical scientific work also apply to the
practice of data mining: assumptions must be made explicit; the design of principled
experiments capable of falsifying our hypotheses is essential; the integrity of the evi-
dence, process, and results must be meticulously maintained and documented; out-
comes must be repeatable; and so on. Unless these disciplines are maintained, nothing
of certain value can result. Of particular importance is the ability to reproduce results.
In the data mining world, these disciplines involve careful configuration management
of the system environment, data, applications, and documentation. There are no effec-
tive substitutes for these.
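
As a minimal sketch of what such configuration management might look like at the smallest scale, the Python fragment below records a fingerprint of the data, the random seed, and the interpreter version alongside a result, so that a run can later be repeated and checked. The function names (`fingerprint`, `run_experiment`), the sample data, and the toy "experiment" (the mean of a seeded random sample) are illustrative assumptions, not part of any particular tool or of the author's process.

```python
import hashlib
import json
import random
import sys


def fingerprint(records):
    """Return a SHA-256 digest of the records in a canonical JSON form."""
    canonical = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def run_experiment(records, seed, sample_size):
    """Toy experiment: mean of a seeded random sample, plus a provenance log."""
    rng = random.Random(seed)  # private generator: the seed alone determines the sample
    sample = rng.sample(records, sample_size)
    result = sum(sample) / len(sample)
    return {
        "data_sha256": fingerprint(records),  # which data was used
        "seed": seed,                         # which random choices were made
        "sample_size": sample_size,
        "python": sys.version.split()[0],     # which environment produced the result
        "result": result,
    }


# Hypothetical data set, used only for illustration.
data = [3.1, 4.7, 2.2, 5.9, 4.0, 3.8, 6.1, 2.9]
first = run_experiment(data, seed=42, sample_size=4)
second = run_experiment(data, seed=42, sample_size=4)
print(json.dumps(first, indent=2))
print("repeatable:", first == second)
```

Two runs with the same data and seed produce identical logs, while any change to the data changes the recorded digest; that is the property a reviewer needs in order to replicate a reported result.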
One of the most difficult mental disciplines to maintain during data mining work
is reservation of judgment. In any field involving hypothesis and experimentation, pre-
liminary results can be both surprising and exhilarating. Finding the smoking gun in a
forensic study, for example, is hitting pay-dirt of the highest quality, and it is hard not
to get a little excited if you smell gunpowder.
However, this excitement cannot be allowed to short-circuit the analytic pro-
cess. More than once I have seen exuberant young analysts charging down the hall
to announce an amazing discovery after only a few hours’ work with a data set; but
I don’t recall any of those instant discoveries holding up under careful review. I can
think of three times when I have myself jumped the gun in this way. On one occa-
sion, eagerness to provide a rapid response led me to prematurely turn over results to
a major customer, who then provided them (without review) to their major customer.
Unfortunately, there was an unnoticed but significant flaw in the analysis that invali-
dated most of the reported results. That is a trail of culpability you don’t want leading
back to your office door.
• Can these patterns be presented to users in a way that will facilitate their assess-
ment, understanding, and exploitation?
• Can a machine learn these patterns and their relevant interpretations?
Data mining helps the user interact productively with the data
• Planning helps the user achieve and maintain situational awareness of vast,
dynamic, ambiguous/incomplete, disparate, multi-source data.
• Knowledge leverages users’ domain knowledge by creating functionality based
upon an understanding of data creation, collection, and exploitation.
• Expressiveness produces outputs of adjustable complexity delivered in terms
meaningful to the user.
• Pedigree builds integrated metrics into every function, because every recommen-
dation has to have supporting evidence and an assessment of certainty.
• Change uses future-proof architectures and adaptive algorithms that anticipate
many users addressing many missions.
Data mining enables the user to get their head around the problem space
Decision Support is all about . . .
at a grocery store, the pattern of capillaries in your retina, election results, etc. In fact:
A datum (singular) is any symbolic representation of any attribute of any given thing.
More than one datum constitutes data (plural).
data, because (in theory), any temperature within a reasonable range could actually
occur. Time is usually assumed to be continuous in this sense, as is distance; therefore
sizes, distances, and durations are all continuous data.
On the other hand, when the possible data values can be placed in a list, they are
discrete: hair color, gender, quantum states (depending upon whom you ask), headcount
for a business, the positive whole numbers (an infinite set), etc., are all discrete.
A very important difference between discrete and continuous data for data mining
applications is the matter of error. Continuous data can presumably have any amount
of error, from very small to very large, and all values in between. Discrete data are either
completely right or completely wrong.
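
The distinction can be illustrated with a short Python sketch. It follows the text's view that a continuous value can be wrong by any amount, while a discrete value either matches or does not. The function names and the sample values (temperatures and hair colors) are hypothetical, chosen only for illustration.

```python
def continuous_error(true_value, measured_value):
    """Size of the error for a continuous measurement: any non-negative number."""
    return abs(true_value - measured_value)


def discrete_error(true_value, recorded_value):
    """Error for a discrete value: 0 if it matches, 1 if it does not."""
    return 0 if true_value == recorded_value else 1


# A true temperature of 72.0 degrees, measured with varying accuracy.
temperature_errors = [continuous_error(72.0, m) for m in (72.0, 71.5, 75.0, 40.0)]

# A true hair color of "brown", recorded correctly once and incorrectly twice.
hair_color_errors = [discrete_error("brown", r) for r in ("brown", "black", "blond")]

print(temperature_errors)  # a graded range of error sizes
print(hair_color_errors)   # only "right" (0) or "wrong" (1)
```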
The standard mining analogy is helpful here. Data mining is similar in some ways
to mining for precious metals:
• Silver mining. Prospectors survey a region and select an area they think might
have ore, the rough product that is refined to obtain metal. They apply tools
to estimate the ore content of their samples and if it is high enough, the ore is
refined to obtain purified silver.
• Data mining. Data miners survey a problem space and select sources they think
might contain salient patterns, the rough product that is refined to obtain infor-
mation. They apply tools to assess the information content of their sample and if
it is high enough, the data are processed to infer latent information.
However, there is a very important way in which data mining is not like silver
mining. Chunks of silver ore actually contain particular silver atoms. When a chunk
of ore is moved, its silver goes with it. Extending this part of the silver mining analogy
to data mining will get us into trouble. The silver mining analogy fails because of the
fundamental difference between data and information.
The simplest scenario demonstrating this difference involves their different relation
to context. When I remove letters from a word, they retain their identity as letters, as do
the letters left behind. But the information conveyed by the letters removed and by the
letters left behind has very likely been altered, destroyed, or even negated.
Another example is found in the dependence on how the information is encoded. I
convey exactly the same message when I say “How are you?” that I convey when I say
“Wie gehts?,” yet the data are completely different. Computer scientists use the terms
syntax and semantics to distinguish between representation and meaning, respectively.
It is extremely dangerous for the data miner to fall into the habit of regarding partic-
ular pieces of information as being attached to particular pieces of data in the same way
that metal atoms are bound to ore. Consider a more sophisticated, but subtle example:
A Morse code operator sends a message consisting of alternating, evenly spaced dots
and dashes (Figure 1.3):
This is clearly a pattern, but other than manifesting its own existence, this pattern
conveys no information. Information Theory tells us that such a pattern is devoid of
information by pointing out that after we’ve listened to this pattern for a while, we can
perfectly predict which symbol will arrive next. Such a pattern, by virtue of its complete
predictability, is not informative: a message that tells me what I already know tells me
nothing. This important notion can be quantified by the Shannon entropy (see glossary).
However, if the transmitted tones are varied or modulated, the situation is quite
different (Figure 1.4):
This example makes it quite clear that information does not reside within the dots
and dashes themselves; rather, it arises from an interpretation of their inter-relation-
ships. In Morse code, this is their order and duration relative to each other. Notice that
by removing the first dash from O = - - -, the last two dashes now mean M = - -, even
though the dashes have not changed. This context sensitivity is a wonderful thing, but
it causes data mining disaster if ignored.
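
The predictability argument made above for the alternating pattern can be checked numerically. The Python sketch below estimates, from symbol counts, the entropy of the next symbol given the previous one. This first-order estimate is an assumption made for simplicity (it ignores the gaps between Morse characters and any longer-range structure), and the function name and sample strings are illustrative only.

```python
import math
from collections import Counter


def conditional_entropy(symbols):
    """Entropy in bits of the next symbol given the previous one, estimated from counts."""
    pairs = Counter(zip(symbols, symbols[1:]))   # (previous, next) occurrences
    contexts = Counter(symbols[:-1])             # how often each symbol is a "previous"
    total = len(symbols) - 1
    h = 0.0
    for (prev, _next), n in pairs.items():
        p_pair = n / total
        p_next_given_prev = n / contexts[prev]
        h -= p_pair * math.log2(p_next_given_prev)
    return h


alternating = ".-" * 20           # evenly alternating dots and dashes
varied = "...---...-.--.-"        # a sequence whose next symbol is not fixed by the last

print(conditional_entropy(alternating))
print(conditional_entropy(varied))
```

For the alternating string every symbol is fully determined by its predecessor, so the estimate is zero bits per symbol; for the varied string it is strictly positive, because knowing the last symbol no longer settles the next one.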
A final illustration called the Parity Problem convincingly establishes the distinct
nature of data and information in a data mining context.
original pair of bits is, what will he say? And if I ask Bob what the parity of the original
pair of bits is, what will he say?
Neither one can say what the parity of the original pair is, because each one is lack-
ing a bit. If I handed Al a one, he could reason that if the bit I can’t see is also a one,
then the parity of the original pair is even. But if the bit I can’t see is a zero, then the
parity of the original pair is odd. Bob is in exactly the same boat.
Riddle one. Al is no more able to state the parity of the original bit pair than he was
before he was given his bit and the same is true for Bob. That is, each one has 50% of
the data, but neither one has received any information at all.
Suppose now that I have 100 lab assistants, and 100 randomly generated bits of
data. To assistant 1, I give all the bits except bit 1; to assistant 2, I give all the bits except
bit 2; and so on. Each assistant has received 99% of the data. Yet none of them is any
more able to state the parity of the original 100-bit data set than before they received
99 of the bits.
Riddle two. Even though each assistant has received 99% of the data, none of them
has received any information at all.
Riddle three. The information in the 100 data bits cannot be in the bits themselves.
For, which bit is it in? Not bit 1, since that bit was given to 99 assistants, and didn’t
provide them with any information. Not bit 2, for the same reason. In fact, it is clear
that the information cannot be in any of the bits themselves. So, where is it?
Riddle four. Suppose my 100 bits have odd parity (say, 45 ones and 55 zeros). I arrange
them on a piece of paper, so they spell the word “odd.” Have I added information? If
so, where is it? (Figure 1.6)
Riddle five. Where is the information in a multiply encrypted message, since it com-
pletely disappears when one bit is removed?
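
The situation of the 100 lab assistants can be reproduced in a few lines of Python. The sketch below generates 100 random bits, withholds one bit from each assistant in turn, and lists the parities that remain consistent with what that assistant holds. The seed and variable names are arbitrary illustrative choices.

```python
import random


def parity(bits):
    """Return 'even' or 'odd' according to the number of ones."""
    return "even" if sum(bits) % 2 == 0 else "odd"


rng = random.Random(0)                      # fixed seed so the run is repeatable
bits = [rng.randint(0, 1) for _ in range(100)]

consistent = []
for i in range(100):
    seen = bits[:i] + bits[i + 1:]          # the 99 bits held by assistant i + 1
    # Try both possible values of the one bit this assistant cannot see.
    outcomes = {parity(seen + [missing]) for missing in (0, 1)}
    consistent.append(outcomes)

print("true parity:", parity(bits))
print("every assistant still sees both parities as possible:",
      all(o == {"even", "odd"} for o in consistent))
```

Whatever the 99 visible bits are, the single missing bit flips the answer, so each assistant is left with both parities as possibilities, exactly as before receiving any bits.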
Riddle two. In the late 18th century, many examples of Egyptian hieroglyphics were
known, but no one could read them. Did they have meaning? Apparently not, since
there were no “rememberers.” In 1799, the French found the Rosetta Stone, and within
the next 20 or so years, this “lost” language was recovered, and with it, the “mean-
ing” of Egyptian hieroglyphics. So, was the meaning “in” the hieroglyphics, or was it
“brought to” the hieroglyphics by its translators?
Riddle three. If I write a computer program to generate random but intelligible stories
(which I have done, by the way), and it writes a story to a text file, does this story have
meaning before any person reads the file? Does it have meaning after a person reads the
file? If it was meaningless before but meaningful afterwards, where did the meaning
come from?
Riddle four. Two cops read a suicide note, but interpret it in completely different
ways. What does the note mean?
Riddle five. Suppose I take a large number of tiny pictures of Abraham Lincoln and
arrange them, such that they spell out the words “Born in 1809”; is additional mean-
ing present?
Riddle six. On his deathbed, Albert Einstein whispered his last words to the nurse
caring for him. Unfortunately, he spoke them in German, which she did not under-
stand. Did those words mean anything? Are they now meaningless?
Riddle seven. When I look at your family photo album, I don’t recognize anyone, or
understand any of the events depicted; they convey nothing to me but what they imme-
diately depict. You look at the album, and many memories of people, places, and events
are engendered; they convey much. So, where is the meaning? Is it in the pictures, or
is it in the viewer?
As we can see by considering the questions above, the meaning of a data set arises
during an act of interpretation by a cognitive agent. At least some of it resides outside
the data itself. This external content we normally regard as being in the domain ontol-
ogy; it is part of the document context, and not the document itself.
Complexity arises in many ways, precisely because there are many ways that latent
information can be obscured. For example, data can be complex because they are
unwieldy. This can mean many records and/or many fields within a record (dimen-
sions). Large data sets are difficult to manipulate, making their information content
more difficult and time consuming to tap.
Data can also be complex because their information content is spread in some
unknown way across multiple fields or records. Extracting information present in com-
plicated bindings is a combinatorial search problem. Data can also be complex because
the information they contain is not revealed by available tools. For example, visualiza-
tion is an excellent information discovery tool, but most visualization tools do not sup-
port high-dimensional rendering.
Data can be complex because the patterns that contain interesting information
occur rarely. Data can be complex because they just don’t contain very much informa-
tion at all. This is a particularly vexing problem because it is often difficult to deter-
mine whether the information is not visible, or just not present.
There is also the issue of whether latent information is actionable. If you are trying
to construct a classifier, you want to characterize patterns that discriminate between
classes. There might be plenty of information available, but little that helps with this
specific task.
Sometimes the format of the data is a problem. This is certainly the case when those
data that carry the needed information are collected/stored at a level of precision that
obscures it (e.g., representing continuous data in discrete form).
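The effect of precision on information content can be illustrated with a minimal sketch. The two lists of measurements below are hypothetical values chosen only for illustration, and `bin_indices` is a helper name introduced here, not something from the text. The sketch assumes the needed information is simply "which class does a value belong to."

```python
import math

# Hypothetical measurements for two classes (illustrative values only).
class_a = [0.12, 0.18, 0.22, 0.31, 0.38]
class_b = [0.52, 0.61, 0.67, 0.74, 0.88]

def bin_indices(values, bin_width):
    """Return the index of the bin each continuous value falls into."""
    return [math.floor(v / bin_width) for v in values]

# Fine bins (width 0.1): the two classes occupy separate bins.
fine_a = set(bin_indices(class_a, 0.1))
fine_b = set(bin_indices(class_b, 0.1))
classes_separable_fine = fine_a.isdisjoint(fine_b)

# Coarse bins (width 1.0): every value lands in bin 0, so the
# difference between the classes is no longer visible.
coarse_a = set(bin_indices(class_a, 1.0))
coarse_b = set(bin_indices(class_b, 1.0))
classes_separable_coarse = coarse_a.isdisjoint(coarse_b)

print(classes_separable_fine, classes_separable_coarse)
```

Under these assumptions the information is still "in" the original measurements, but the coarse discrete representation has discarded it.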
Finally, there is the issue of data quality. Data of lesser quality might contain infor-
mation, but at a low level of confidence. In this case, even information that is clearly
present might have to be discounted as unreliable.
Much research has been conducted to determine the Big O complexity of various
algorithms. It is generally held that algorithms having polynomial complexity, O(n^p), are
tractable, while more demanding Big O complexities are intractable. The details can’t
be addressed here, but we do note that many data mining problems (optimal feature
selection, optimal training of a classifier, etc.) have a computational complexity that is
beyond any polynomial level. In practice, this means that data miners must be content
with solutions that are good enough. These are referred to as satisficing solutions.
Problems that are very computationally complex in their general case may fall into
a class of problems referred to as NP-Hard. These problems, which have no known
efficient algorithmic solutions, are frequently encountered in data mining work. Often
problems in a domain are arranged in a hierarchy to help system architects make engi-
neering trades (Figure 1.8).
• The Knapsack Problem. Given cubes of various sizes and materials (and hence,
values), find the highest value combination that fits within a given box.
• The Traveling Salesman Problem. Given a map with N points marked, find the
shortest circuit (a route that ends where it starts) that visits each point exactly once.
• The Satisfiability Problem. Given a boolean expression, determine whether
there is an assignment of the variables that makes it true.
• The Classifier Problem. Given a neural network topology and a training set,
find the weights that give the best classification score.
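The trade-off between an optimal solution and a satisficing one can be sketched with a tiny instance of the Knapsack Problem. The item list, the capacity, and both function names are assumptions made for illustration. The greedy rule used here (take items in order of value per unit size) is one common heuristic, not a method prescribed by the text.

```python
from itertools import combinations

# Hypothetical items as (name, size, value); illustrative values only.
items = [("A", 6, 30), ("B", 5, 20), ("C", 5, 20), ("D", 1, 2)]
capacity = 10

def exhaustive_knapsack(items, capacity):
    """Check every subset of items: optimal, but 2**n subsets to examine."""
    best_value, best_combo = 0, ()
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            size = sum(s for _, s, _ in combo)
            value = sum(v for _, _, v in combo)
            if size <= capacity and value > best_value:
                best_value, best_combo = value, combo
    return best_value, best_combo

def greedy_knapsack(items, capacity):
    """Take items by descending value-per-size: fast, but not always optimal."""
    chosen, remaining = [], capacity
    for item in sorted(items, key=lambda it: it[2] / it[1], reverse=True):
        if item[1] <= remaining:
            chosen.append(item)
            remaining -= item[1]
    return sum(v for _, _, v in chosen), tuple(chosen)

optimal_value, _ = exhaustive_knapsack(items, capacity)
greedy_value, _ = greedy_knapsack(items, capacity)
print(optimal_value, greedy_value)
```

For this instance the exhaustive search finds a value of 40 (items B and C), while the greedy heuristic settles for 32 (items A and D). The exhaustive search doubles its work with each added item, which is why a good-enough answer is often the practical choice.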
1.8 Summary
The purpose of this chapter was to provide the reader with a grounding in the fun-
damental principles of data mining as a technical practice. Having read this chapter,
you are now able to define data mining from both a philosophical and operational
perspective, and enumerate the analytic functions data mining performs. You know
the different types of data that arise in practice. You have been introduced to the basics
Coming up
The next chapter presents a spiral methodology for managing the data mining process.
The key principles underlying this process are summarized in preparation for the
detailed treatments that follow later.
Chapter 2
The Data Mining Process
Purpose
The purpose of this chapter is to provide the reader with a deeper understanding of
the fundamental principles of data mining. It presents an overview of data mining as
a process of discovery and exploitation that is conducted in spirals, each consisting of
multiple steps. A Rapid Application Development (RAD) data mining methodology is
presented that accommodates disruptive discovery and changing requirements.
Goals
After you have read this chapter, you will be able to explain the more complex princi-
ples of data mining as a discipline. You will be familiar with the major components
of the data mining process, and will know how these are implemented in a spiral
methodology. Most importantly, you will understand the relative strengths and weak-
nesses of conventional and RAD development methodologies as they relate to data
mining projects.
2.1 Introduction
Successful data mining requires the cultivation of an appropriate mindset. There are
many ways that data mining efforts can go astray; even seemingly small oversights can
cause significant delays or even project failure. Just as pilots must maintain situational
awareness for safe performance, data miners must remember where they are in their
analysis, and where they are going. All of this demands a principled approach imple-
mented as a disciplined process.
• Discovery
  o Detect actionable patterns in data
  o Characterize actionable patterns in data
further to the right, and the wind has changed to N. The high
shores, behind which the whole country is bare, with the exception
of a few uschàrs, and seems to lie higher, approach again the river
on the left; and two villages shew themselves at some hundred
paces, on the gently-ascending downs; below them the old river-bed
appears on dry ground.
The Shilluks, armed with lances, and standing on the shore,
shout again their “Habàba!” but we sail now, and they do not offer
us anything, much as we should like to make use of their cows and
wood; and besides there are too many of them. Groups of tokuls
stand in a row. A quarter after twelve, continually E.S.E. Half-past
twelve, S.E. by E.; to the left, E. The wind has changed, and is
contrary; so we go E.S.E. The Shilluks also have sleeping-places,
open at the top, wherein warm ashes form their beds, with which
also they powder their hair, thereby making it look grey.
A quarter before one. From E. by N. A gohr on the right, and we
go, at one o’clock, E.S.E. Half-past one. The river takes a direction
before us to E., with some little inlets, so that we cannot see the
lower shore. The wind blows strongly against us from E. We have
but scanty fare, being without meat. I cannot deny kew to myself
now, for I really want it.
Half-past two. E. by S. A Haba on the right, before it a lake
connected with the river in front; the forest is upon a gentle
declivity, and covered with shrubs, thorns, and dwarf-trees, even to
the edge of the water. The shore also falls away gently to the river,
near which it only rises a little above the narrow green margin of
grass. We halt close to the right shore, owing to want of wood.
The shore ascends to about fifteen feet high, where the trees
begin, and is composed of nothing but mimosas, although the Nile
very certainly does not flow over it; for the river has full play far
away to the left.
If we call these lakes, marshes, and reed-morasses, a
longitudinal valley, enclosed as they are with the Nile between two
high shores, which, however, do not ascend to the due height, the
original shores perhaps lying still further by the irregular low line of
mountains, or rather hills, it is plain that the same is gradually filled
by alluvial deposits from the mountains of Bari, or from above, and
an accumulation of vegetables, or the momentary sprouting forth of
a corresponding kingdom of plants, must have soon followed the
more important vegetable matter. As the sluices of the so-called
valley pour into the great Nile, it must have falls on a level with the
Nile itself, and has, therefore, dug a bed, and made an even slope to
this side, after the stream had removed the first barriers or dikes of
the high shores, which are now secure from any inundation. A river-
bed, indeed, naturally becomes deeper when there is a proper fall
and a regular conduit. The lower Nile has elevated its bed, because
it has but few vents. Why could not the White River have a similar
retrograde connection of water, which is prevented from flowing off,
such as is the case, in the first place, near Khartùm? The Nile here
might have been previously in majestic fullness, and flowed rapidly
between the present old shores to Khartùm, until it created shallows
and islands, where reeds and water-plants of every species sprang
forth luxuriantly from the nearly stagnant water, and vehemently
opposed the natural course of the river, seized the alluvial deposits
from above in their polypi-arms, and rose to what we now see to be
meadows and marshes.
The Shilluks are tolerably acquainted with the good disposition of
the Turks: as soon as a vessel approaches a group of them, they get
up and go away; this even befell Selim Capitan, in spite of his
interpreter. When they see us coming, they drive the cows from the
water, even without letting them drink. We on our side are afraid,
and with justice, to land on the inhabited spots. I brought back two
guinea-fowls, the produce of my shooting excursion with my
servants; I had seen Suliman Kashef with one of a similar kind
above. They are not at all like those in Taka, and differ from
those of Europe only by the darker colour of their plumage. We shall
remain here to-night; thunder and rain have been satisfied with
merely threatening us,—and are happily over. I disembark once
more, and see fifty to sixty giraffes in the level shore towards the
horizon, but it was too late to get at them. The thermometer was at
nine o’clock in the morning 21°, but did not get up afterwards to
more than 28°, fortunately for us,—not so much on account of
shooting as because the heat might have been insupportable, for we
were between these high shores à talus, with an average angle of
25° to 30°, and the wind was entirely still.
10th March. We remain to-day here for the sake of shooting,
conformably to Suliman Kashef’s determination. His halberdiers set
off to-night to follow the course of the giraffes, and to find out their
abode in the gallas,—unfortunately without success, for they did not
like perhaps to trust themselves so far in the territory of their deadly
enemies.
I remarked a number of burnt bones of hippopotami in the low
forest lying close to the river. I should be inclined to believe that the
natives burn the carrion intentionally, in order not to be exposed to
the disgusting effluvium. A species of black wasps build hanging-
nests here, which however seem from their transparency to contain
very little honey. I could not ascertain this more exactly, because I
was obliged to be cautious in breaking off a branch with such nests
on it. We remark low mountains beyond the softly ascending desert,
and perhaps the dry water-courses which issue here from the steppe
flow to them, and there may be the real abode of the deer. In my
shooting excursion I looked carefully among the thorn-bushes, and
found that the plants are mostly the same; I had fancied quite
otherwise. A blue convolvulus—not, however, belonging to the water
—displayed a lighter colour than usual, and had also round and
glutinous leaves: I took seeds of some pretty creepers and gathered
the fruits of the shrubs, for I was already acquainted with the
leaves. Every thing now was withered, and I am curious to know
what will become of the various seeds I have collected when they
are sown in Europe.
Most of the birds had retreated before the shooting of the other
sportsmen commenced, but I stumbled upon several turtle-doves,
and instinctively grasped my gun, letting my botanical bundle fall on
the ground. I shot some, and got under a tree, where I saw them
fluttering around. The thorns stuck to me and pricked me all over,
and there I sat bent, like an ostrich caught in a thorn bush,
compared with which the bull-rush of Moses was a child. I could not
force through it with my coat on and gun in my hand; so I got loose
from the sharp barbs of the thorns with torn clothes, leaving behind
the tarbusch, takie, and half my cowl, without even scratching my
ears, though they were bleeding enough already. I fetched back my
tarbusch by means of my gun, and then examined my malicious
enemy a little closer, notwithstanding he was an old acquaintance. I
found withered apples on it, and gathered some, for the sake of the
seed; when green they are exceedingly similar to oranges or
Egyptian lemons. I have not found it confirmed that they are deadly
poison to camels.
11th March.—“Bauda mafish, am’d el Allàh!” (the latter properly
Hamdl el Allàh,) was the cry on all sides to Allàh, because the gnats
had taken their departure, and I hope that those which are still in
my cabin will soon follow their companions. Departure at a quarter
before ten to S.E. by E., then a little E. by S. Summer or pastoral
villages on the left: we perceive also herds, but not a morsel of them
is destined for us. On the right an old river-bed or narrow lake,
mostly marshy, and connected below with the river. A quarter after
ten, E.S.E., on a pretty good course, with the exception of some
shallow inlets. We sail, with a south-west wind, four miles. On the
left again open reed-huts or sleeping-places, and herds to which the
people are collecting,—on account of the Turks. All the Haba here is
deposited soil, which lies almost always higher than the other
ground. This evidently fading forest once enjoyed better times, when
the blessing of rain was afforded it, but the benefit of which it lost
directly by its higher situation.
What fables are told of the incredible luxuriance of the tropical
kingdom of plants! At all events it could only be said of aquatic
plants which are forced by water, evaporation, and sunshine, as if by
steam or chemical preparations; but then only in the rainy season
and a few weeks beyond. I saw, indeed, trees shooting forth at this
time in Taka, which boiling and cauldron-shaped valley may perhaps
contain a tropical growth, or something like it; and plants springing
up from the morass with incredible celerity and luxuriance, as if by
magic. But trees that have true manly vigour, and strive to shoot out
with sound strong muscles, whose pith is still clearly to be seen in
the bark, with not a bough injured,—not a branch hanging down
withered,—these are sought for in vain in the Tropics, so far as I
have seen. We can form a tolerable idea of the momentary life and
vigour there by comparing in Europe, acacias, planes, and poplars,
on suitable soil; it is the most cheerful awakening after a long
repose: but part of the limbs always continues in a sleep-like death,
whether it be under the bark of the stem, or a bough that the sun
scorches, or a runner become dry, which disfigures the whole tree. A
forest requires care, either by the fortuitous kindness of Nature
herself; or, when that is not sufficient, by the directing hand of man.
The omnipotence of the terrestrial womb of fruits is past,—that
which gave previously the magic of lovely green to the coming
species, without any visible seeds of themselves. Half-past eleven
o’clock, S.E.—It has just rained a little;—what anxiety and fear of
rain these half-naked coloured people shew; what care they display
in preparing immediately a tent to sit under! I have very often
remarked this; rain must therefore make a sensible impression on
their hot skin. Twelve o’clock, E.S.E. We see at the distance on the
left towards the horizon, solitary dhellèbs as usual on elevated
ground; and also isolated little groups of Shilluks. Narrow tracks of
water right and left, which not long ago were flowing cheerfully. The
river has also gradually laid aside its terraces in preceding times,
until it has limited itself to its present bed; and those parts of the
shore, lying higher are only just moistened, even when it is at its
highest water-mark. It would be interesting to follow these old river-
beds in the ascending line at the side, and to arrive at the dams of
the primitive stream, or at the higher circumvallation which
surrounded the lake here at one time. A quarter after one o’clock.—
On the right a gohr cul-de-sac, low bushes to S.E., called by the very
same name as the Haba; on the left solitary trees and straw huts of
the herdsmen. At two—on the right, another gohr cul-de-sac,—to E.
We sail E.N.E., and wind, for the first time since the morning, to the
left: a track of water in the shape of a terrace, just there, from half a
foot to a foot higher than our level. A beautiful line of dome-palms
before us, but still thicker a little to the left. Half-past three, N.E.—
Heaps of simsim-sheaves on the water at the left, and a row of ten
villages near the dome-palms. A broad gohr or river comes from W.
This may be the river of the Jengähs; but it seems to approach in
the background too much to the Nile; perhaps therefore it is that
gohr which is said to have its old river-bed on the high shores, below
the villages of the Shilluks. A quarter before three, E. We see on the
left seven more large and small villages, by or near that row of
dome-palms, which on this side is very thin; then a dome-forest to
the left at a quarter of an hour’s distance.
An unlimited water-course before us in E. by N., but no huts to
be seen on the left. Therefore, the nation of the Nuèhrs might have
been dislodged by the Shilluks from that quarter; for the former
extend, or are said to extend, up to the Sobàt and its shores. This
side, at all events, had been inhabited, as I plainly saw this morning
at our landing-place. The Haba, however, continues at a slight
distance from the river; on the left also the dome-forest is now
reduced to a strip of a wood. The shores are surprisingly low on
both sides; and therefore not any tokul-village is to be seen near
them. A gohr is on the right, which is scarcely separated from the
river, and in connection with it, like the other narrow ones. Three
o’clock. On the left three more villages in the dome-forest tract; and
on the right and left parallel gohrs, subordinate Niles, which are
now stagnant, and the fish in which are a prey to men and beasts.
Four more villages to the left, near the dome-wood retreating from
the river; on the right the forest thickens.
Half-past three. Towards S. We have a tolerably high and
apparently planted island at our left, and halt at the right near a hill
—probably a deserted domicile. But look there! that is really the far-
famed Sobàt, the water of which is flowing against us, and which is
so much feared by the crew, who are tired of the voyage. I soon
disembarked on the shore, sauntered up the hill, and was surprised
to find that I could see so far in the distance, and fed my eye and
mind with a diorama which extended from W. to N.E. The Nile is
conspicuous in the W., and meanders to N.E., where it is lost to the
sight. An isolated dhellèb-palm on the right shore indicates this last
boundary. The horizon behind this glittering length of the Nile is
adorned with a transparent forest of dome-palms, interspersed with
slender dhellèb palms, with their small heads. The basin of a lake
spreads from W. to N.W., at my feet, and the river Sobàt winding
downwards from S.E., and flowing in the depth at my right, unites
with the Nile near the lake: both its shores are bare, and only a few
melancholy straw tokuls stand on the extreme point of the right
shore. All the remaining part of the district extends far and wide in a
dead waste, with a little withered grass; and the horizon alone from
S.S.W. to S.W., displays afar some palms and other trees, through
which the blue sky glistens.
The lake lying in the angle between the left shore of the Sobàt
and the right of the White Stream is connected with the former by a
narrow opening, evidently prevented from closing by the hand of
man. The mouth, as is the case elsewhere, is merely stopped up by
reeds, to keep the fish of the lake in confinement. Our blacks
shewed on this occasion what they do to catch fish when the water
of these lakes is shallow, and does not reach up to a man’s middle.
They disturb it with their feet, put fishing or conical baskets into it,
and harpoon the large fish, who come to the top to breathe.
The Sobàt, swelling at high water far higher and stronger, has
raised unquestionably a dam against this lake, the former river-bed
of the White Stream, and pressed the Nile more towards N.W. into
its present bed. Notwithstanding such an advantage being at hand,
the natives have cut through the dam for the purpose of catching
fish. The Sobàt has shortly before its mouth a hundred and thirty
mètres in breadth and three fathoms in depth, whilst when we were
here before it was four fathoms; and according to Selim Capitan, a
few days earlier last year, five fathoms. We can tell but very little
generally of the depth of the Nile, because its bed is very uneven,
and the stream causes eternal fluctuations.
The name of Sobàt could only have been given to this river by
the Funghs, for the Arabs have never possessed it, and usually call it
Bach’r el Makàda (river of Habesch.) The Dinkas name the White
Stream Kedi, and this Kiti, which mostly denotes water in the
dialects on the White Stream up to Bari, where it is called Kirboli: Kir
also means water among the tribes down the river. Its name is Tilfi
and Tak with the Nuèhrs and Shilluks.
When I view the steep and high slope of the shores of the Sobàt,
and the proportionate thin layer of earth on the immovable strata of
clay or original soil, which here is twenty to twenty-five feet higher
than on the shore or in the bed of the Nile, I return to my former
conviction, that the immeasurable particles of stone and plants
stream by means of the breach, and flowing away of the lakes of the
Ethiopian highlands, to the lake of the basin-shaped valley of the
White Stream which flows off with the Nile, as the deepest point;
and that all the lower country under the mountain chains of Fàzogl
and Habesch, from the Atbara to the land of Bari must be under
water, if it be not a lake connected with the depressed regions of the
White Stream. If the lakes, therefore, of that lofty plain were torn by
a powerful catastrophe, and deserted their chasms or valleys, as the
water-basins of Switzerland did formerly—(even now there are lakes
or flat valleys, signs of a deluge, in which the waters might have
dashed from the summit of Atlas to the top of the Alps)—there is no
question that the lower lakes or valleys must have filled and
overflowed. The first rushing-down of the mass of waves, incredibly
violent as it must have been, the falling of mountains accompanying
it, and their washing-away, overpowered everything below them, as
if gods had descended from Olympus, and no longer recognized
those limits that would have remained eternal obstacles by an
inferior shock. The first deposit was a layer of clay on the side of the
Sobàt, whilst the White Stream suffered no such sediment when in
its primitive strength, and washed away everything that it could
seize, as is shewn by the far lower shores. The high shores of the
Sobàt and its environs fall away, especially towards the level parts of
the left side of the Nile, to which the accumulated slime could still
less arrive owing to the stream carrying it off, although several gohrs
and rivers from thence pour into it. These afford water certainly, but
no slime to increase the height of the shore, as we plainly see by the
Gazelle River, and also in the little Kiti of the Jengähs called Njin-
Njin. We must assume from the Dinka country and its greater
elevation, that the ground towards the Nile was heightened formerly
by its gohrs flowing from above, or perhaps constant rivers; whilst
Kordofàn, which lies over the left shore of the Nile, discharges no
rivers, and its oases have run down from the mountains themselves,
and formed islands in the sands which still remain, for the sunken
ground forms cisterns that nourish the succulent power of the
mountains by imbibing the moist element; or it may be, that springs
were bored by God’s own hand.
CHAPTER IX.
ROYAL CRANES. — SCRUPLES OF FEÏZULLA CAPITAN. — COMPOSITION OF THE
SHORES. — DESCRIPTION OF THE DHELLÈB-PALM AND ITS FRUIT. — FORM
OF EGYPTIAN PILLARS DERIVED FROM THIS TREE. — DIFFERENCE BETWEEN
EGYPTIAN AND GREEK ARCHITECTURE. — DESCRIPTION OF THE SUNT-TREE.
— DEATH OF AN ARABIAN SOLDIER. — VISIT OF A MEK OR CHIEF. —
DANGEROUS RENCONTRE WITH A LION ON SHORE. — PURSUIT OF THIS
BEAST BY THE AUTHOR AND SULIMAN KASHEF WITH HIS MEN. — FEAR OF
THE NATIVES AT THE TURKS. — PLUNDER OF THEIR TOKULS BY THE CREW. —
BREAD-CORN OF THE DINKAS. — ANTELOPE HUNT. — DIFFERENT SPECIES OF
THESE ANIMALS. — IMMENSE HERDS ON THE BANKS OF THE WHITE NILE. —
LIONS AGAIN. — BAD CONDITION OF THE VESSELS.
12th March.—We set out at half-past nine o’clock, and sail to S.E.
by E. Shrubs on the higher shore to the right. A quarter before ten,
from S.E. by E.; further to the left round a corner, to which a bend
corresponds on the opposite shore: this is often the case on the Nile.
To E.N.E., and immediately again with a short tract to N.E. The river
flows with all its force against the left shore, and therefore the latter
is higher, more perpendicular, and disrupt, than the right, which
soon, however, becomes similar. We go a short tract libàhn, and see
a few miserable small straw tokuls with thin doors, on the left, in the
little green underwood, which seems to be nourished by the
inundation, and is mostly young döbker.
The shores display again iron oxyde. A quarter before eleven:
from E. by N., to the right, E.S.E., where we sail. The shores on the
right and left are higher, according to the current, and the falling of
the river is accurately marked out on the shore by little gradations,
which are exceedingly regular, and one to two inches high. We crawl
on only slowly with the faint south wind, and make now one mile;
for the current being stagnant below towards the Nile, told me
directly that the floating companion of the mountain dissipates
quickly its water, differently from the slow, crawling Nile, which is
obliged to work through the plain of a lake-basin.
Eleven o’clock. The wind freshens, and we go S.E. and E.S.E. On
the left a solitary dhellèb-palm rises on the shore, with its beautiful
and really symmetrical head; its slender base without rings, and its
elegant foliage. From hence in the bend, further to the right, in S.,
where five dhellèb-palms break the uniformity of the high shore on
the left. A low ridge of a hill lies near them, on which a village must
have once stood. If I could but transplant the tallest dhellèb to
Louisa’s island, near Berlin, to make it the common property of all
the northern nations! It is hot, for the high shores keep the
refreshing breeze from the deep water, and only the sail enjoys a
cheerful gust of wind, with the assistance of which we go, at a
quarter before twelve, from S.W., where a regular forest before us
presents itself to the eye, to the left, in S.W. by S. We make two
miles; a quarter of a mile, perhaps, being derived from the current.
A quarter after twelve, from S.W., to the left, E. by N. We hardly
move from the place till it blows from N.E., and then we go better,
having four miles’ course. An old sailor runs on shore close by the
vessel, to find crocodiles’ eggs; tumbles into holes, falls in the grass,
and is using every exertion to find a convenient sand-path instead of
the clay. The crew call him to come off, but he wants to shew that
he is a nimble fellow—thus every one has his hobby-horse.
The river winds continually in a bend to the left: a wretched
stunted forest on the right, and miserable tokuls, without people,
here and there on this shore. One o’clock; from E. by N., where the
river winds again to the right, S.E. by S. We halt at a quarter before
two, at the right shore, yet not to let the men rest; that would be
against the Turkish custom, for they think there are no human
beings except themselves. At three o’clock we go with libàhn to S.E.,
and immediately to the left E. Half-past three, in a bend to the right,
S.S.E.; and four o’clock, on the left, in the bend, to E.S.E. Five
o’clock, from E.N.E., on the right to E., where we stop at the right
shore.
Last night I awoke several times, and the wild geese on the
neighbouring lake seemed to call to me in a friendly manner, and
scream “Here we are, for you have not had for a long time either
sheep, goats, or fowls.” I was on the wing therefore at day-break,
but saw only four royal cranes (grus royal, Arabic gornu, or chornu),
one of whom I shot, for they are very delicious when dressed in a
ragout. Feïzulla, although he has been seven years in England,
drinks drams and wine like a Turk, and scruples to dine with me,
because I had not cut the bird’s throat immediately after it was shot,
whilst it was yet alive, and made it debièg (koscher, as the Jews
say). These beautiful birds, with a tuft of golden hair and shining
feathers, appear in flocks on the White River: my Sale killed a brace
in a moment, and would have brought us more if he could have
followed them. The geese would only surrender at discretion to the
“longue carabine,” and I had only my short double-barrel.
I visited once more, on this occasion, the hill above-mentioned,
which I found quite adapted for the situation of a village. I had seen
already the remains of potters’ ware, and solitary flower-gardens, or
plots of ground trodden down, where once tokuls stood, but where
now neither grass nor shrubs could grow; and I came to the
conclusion that a considerable village must have stood there, which
could have belonged only to the Nuèhrs, and was probably
destroyed by the Shilluks. Thermometer, sunrise, 21°; half-past nine
o’clock, 28°; noon, 29°; no rise beyond that was perceptible
afterwards.
13th March.—Departure at seven o’clock, with libàhn to E.S.E. by
E.; then to the left, E.N.E., and we sail with a good north-east wind.
A quarter before eight: from E., in the bend to S.E.; on the left some
straw tokuls. The wind becomes strong, and we make six miles for
the present; the mountain stream seems to be here at its lowest
pitch, and has only a quarter of a mile rapidity. Eight o’clock; from
S.E. by S.; to the left, E. by S., where we are obliged to go libàhn. A
quarter after eight, to the left, but we halt before the corner of the
bend till noon, owing to the violent east wind. I made a little
excursion into the immeasurable plain, which was tree-less and
comfortless; and found two villages, better built than usual, to which
I was not able to approach, and likewise a long and dried-up marsh.
I could not, unfortunately, discover any guinea-fowls in the durra-
stubble.
At twelve o’clock, we proceed with libàhn to N.E., where our
Bach’r el Makàda winds again to the right. Half-past twelve. The
shores, with few exceptions, attain a height of fifteen to eighteen
feet: the upper surface of the soil consists of humus to two or three
feet deep (which may be deeper in the low ground, old gohrs, and
several tracts), and under it nothing is seen but clay or mud, having
a yellowish colour on the shore, from the iron oxyde, with which it is
strongly impregnated, and generally more so than on the White Nile,
where this is only the case in layers. A fertile country, but requiring
human hands, canals, and sakiën. We see from its shores, and in the
dried-up pools, which receive very little nourishment here from
vegetable matter, particularly on the upper land, that the Sobàt
brings down fruitful earth or slime.
From half-past twelve to two, in a bend to the left, S.E., where
we go again left in N.E. by N. On the same side there is a tolerably
well built little village on the shore. A quarter before three, still
further to the left, N. by E. Four o’clock, we wheel to the right in
E.N.E., where we get the view of a genuine low forest, and notice on
the left a village in the winding to S. by E. Half-past four, also
further; a hamlet on the right with straw tokuls, the first on this side.
We see here also reed-boats, as among the Nuèhrs and Shilluks on
the Nile. At five o’clock to S., where we at first halt at the right
shore, before the bend to the left. Two large villages lie from half to
three-quarters of an hour distant, and I see an immeasurable bare
plain cracked from drought,—a summer shallow lake without any
verdure. We go then to the left shore, the soil of which is less mixed
with sand than that of the right, and gives us some hope of shooting
and fishing. The huntsman Sale returned, however, disconsolate, for
he had seen nothing at all.
The left shore is still more precipitous and higher here than the
right one, because the stream forces itself into this bend. When we
disembark, we find that the land again rises to a gentle acclivity, and
we have the prospect of a large lake about three quarters of an hour
distant, which overflows perhaps deeper into the Sobàt. Many lakes
of this kind must be found in the country of the Dinkas, because
springs, as in the Taka country, are not sufficient for the watering of
the cattle of this merissa-loving, dancing and singing tribe; and
besides, the drawing of the water would cause too much trouble.
The Sobàt is stagnant here in the proper sense of the word, and
no log can determine anything else.
14th March. We navigate again on the right side, and go at half-
past seven o’clock with libàhn from S. by E., immediately S.E., where
the north-east wind remains contrary to us, notwithstanding the
narrow water-tract. Some small and still green reed-huts hang on
the shore, sheltered from the north wind: these are stations for
hunters of hippopotami and crocodiles, or for fishermen, who,
however, have gone away, and taken with them their working
implements, for they are frightened of us. The durra seems to thrive
famously on the half-sandy shore, and rises cheerfully above the
reeds; probably it is sown,—that is, a handful thrown here and there
on the vacant spots.
Eight o’clock—E. by N., and N.E. by E. The upper margin of the
right shore is planted throughout with durra, and some small fishing-
huts shew that men dwell there. Ten o’clock.—Hitherto always N.E.
with inconsiderable deviations, and then N. by E.; where we halt at
the corner of the right shore on account of the wind, for the river
goes still further to the left: level land above, some underwood, and
a village at a little distance. A quarter before one.—N. by W., and
about one, in a bend to the right. When the crew relieve one
another at the rope, they imitate to perfection the Uh-uh-i-ih of the
tribes on the upper part of the White Stream, and during the towing
itself they sing the song à-à-à-jòk-jòk, which would be difficult for a
white man to do. The force of the water is directed here against the
right shore, which is without any crust of vegetation, and seems to
ascend to the uppermost margin, as is proved by the gradations
being washed away, and the thin layer of humus, one foot to one
and a half high, decreases perpendicularly, whilst the lower part of
the soil displays unmixed clay. It certainly required a powerful
pressure of water to wash this primary deposit to such a depth; the
left shore, on the contrary, has a coating of slime and vegetation
down to the water.
Two o’clock.—E. by N.; twenty-one dhellèb-palms on the left,
with a pastoral hamlet of thirty new straw-tokuls. The crew are
beginning to shoot down the dhellèb-fruits, and I also disembark on
the shore, beyond which the ground, with the beautiful group of
trees, is still imperceptibly elevated. We are quite comfortable there,
but I gaze far and wide for a point to break the unbounded flat
waste that shews not a thorn or a bush; the river winds melancholy
between the naked shores. These palms stand in luxuriant growth,—
a proof that the soil is capable of other things, and may look for a
better future. The very pretty straw-huts present nothing worth
having to our rapacious eyes, and near them we remark the
sleeping-places, and a large, glimmering heap of dung, serving at
night for fire and a bed. The cow-dung is collected in little heaps in
the enclosure, surrounded with palings, where the beast is tied, and
is still quite fresh: notwithstanding this, it is very certain that we
expect in vain the return this evening of these beautifully spotted
cattle. Standing on an old trunk of a tree, I remarked a large village
on the right shore at a quarter of an hour up the river.
The dhellèb-tree has the same fibrous texture of bark, and of the
interior of the trunks, as the dates and dome-palms; but it is far
finer, thicker, and stronger. The outside of the bark shews rings from
below upwards, and the tree itself shoots forth slenderly from the
earth, and swells gradually towards the centre to a spheroid form,
when it decreases again to the top, and rises stately, separating the
head from the stem. The fruit is as large as a child’s head, and in
clusters, as in the palms before named, but on far stronger stalks,
from which it hangs down immediately close to the stem. It is
smooth outside, and of a golden colour, like its pulp; the latter is
fibrous, of a bitter-sweet taste, like chewing soft wood, and leaves
behind in the mouth an astringent taste, which may arise here from
the fruit not being fully ripe. There are from four to six kernels in
this gold apple of the size of a child’s hand, or of those of the dome-
palms: the stalk has a scaly covering, surrounding about a third part
of the fruit. The kernels, or the nuts, have themselves a solid pulp,
shining like dark glass, being exactly similar to that of the dome-
fruit: at first it is like milk, but on coming to maturity becomes of the
consistency of horn. The trunk of palms is surrounded with the same
kind of rings as the date-tree, the rind feeling smooth, like planed
wood; consequently it was impossible to climb these trees to gather
any fruit, owing principally to the swelling in the centre, and
therefore it was shot down. After several attempts, we drove large
nails in the stem, to hold the rope by, and then we ascended
gradually.
The bark falls off on the ground, as is the case with the other
palms, for the tree throws out foliage like grass from the interior: the
thick rootlets spread themselves in all directions through the ground,
like polypi, with a thousand veins of life.
There seems to me to be no doubt that the Egyptian pillars,
protruding in the middle, derived their origin from the dhellèb-palms,
which might have been transplanted in the Thebaïs; for it was
impossible that the Egyptians should not take notice of the unusual
shape of this tree—they who borrowed all their forms and
embellishments, even to those of their spoons and salve-boxes, from
the kingdom of nature.
Lifeless figures having no meaning are never represented by
them; flowers, foliage, leaves, sacred animals, or parts of them
properly introduced, are intermixed with hieroglyphics, like a
garland, without beginning or end. The Greeks quickly seized what
was beautiful in this, discarded what was heavy and confused, and
pleased themselves and succeeding ages by lighter and more
elegant forms. They placed the acanthus and horns, or volutes on
the capitals of their pillars, and the Germans planted a stone-forest
as the holiest of holy.
A large village of the Nuèhrs (judging from several potsherds)
stood on our hill: this nation dwells up the river from hence and in
the direction of the White Stream, where we had seen them last. I
had found also on the last landing-place fragments and the
foundations of a village, and heard from our blacks that the Shilluks,
several years ago, had a great war with the Nuèhrs, drove them
from these parts, and took possession of the lake abounding in fish,
which I have previously mentioned. We have not remarked any sunt
among the mimosas from the country of Bari up to the Sobàt, and
even on this river, but we see talle. The latter tree has a reddish
bark; the long white prickles grow by couples; the flowers are
whitish and without any particular scent; the bark, however, is used
for pastilles, and, when rubbed, sprinkled on the merissa. It affords
the best gum (gamme, semmag), which is white like that from sunt,
while that from the sejal (or sayal) is blackish. Thermometer
yesterday morning 22°, and did not rise beyond 27°, and this
morning 18°; noon 26° to 29°.
15th March. We leave our beautiful palms at half-past nine
o’clock, and go from E. by N., and notwithstanding the strong north-
east wind, slowly in the bend to the right. A quarter after ten, S.E.
by E., then a very short tract S.S.E.: some grass huts of fishermen,
and crocodile and hippopotami hunters at the lower declivity of the
shore on the left. Half-past ten, to the left S.E., and further to the
left, S.E. by E., where we halt at eleven o’clock, because an Arabian
soldier has just cried himself to death before our cabin! He wept at
having to die in a foreign land and not seeing his mother any more.
Nearly all these people lose their courage directly they are attacked
by any illness, the nature of which they cannot visibly perceive as
they can a wound, &c. He died with a piece of bread in his mouth,
because the Arabs believe, and with justice, that so long as you can
chew bread you will not die. It is shameful that we dare not take
even medicine from the fine black physician we have on board, and
much less can we expect assistance or salvation from him. Ten
minutes have flown; the deceased is carried to the upper part of the
shore, and yet the worthy disciple of Clot-Bey has never even looked
at him! We leave at half-past two the place where the soldier was
buried in dead silence, after having received five more cows, upon
whom the crew fell like wolves, and navigate to the left, E.S.E.; then
again slowly to the right. Three o’clock, to S.E. We sail about five
minutes, and stop again at the right shore, by the corner where it
turns to the left, and then again, “Jo hàmmet, Ja mohammed!” is
chaunted at the rope. In the winding below the left shore we saw a
water-hunting establishment of seven straw tokuls. A quarter before
three, from E.S.E. to E. by S. A quarter after four, E.S.E. Half-past
four, E. Some few trees on the right entirely or partly withered, and
soon afterwards a few green ones, of which those standing lower
shew that the water has poured into the shores, even to the margin.
Five o’clock, E. by N., then slowly right to E., where we halt at a
quarter of an hour later. The river makes a strong bend to the right,
and we hope to sail to-morrow.
This afternoon, when the cows were brought us, I procured a
ring, with much difficulty, for sug-sug, and though badly
manufactured, it is at least peculiar to the country. I saw several
such rings among them, but not one of them had a circular form,
and by this we may measure the standard of their skill. Those which
are better worked, are found among the Nuèhrs. The five cows
came from the Mek, who presented himself in person to Suliman
Kashef, with whom Selim Capitan also happened to be: he was
clothed in a ferda, which he had received from the Shilluks. He wore
a very thick copper ring on his hand, and was of opinion that dress
is the privilege of sheikhs. An old woman and a man preceded him;
the former attired like an ancient Queen of the Witches. We dressed
the Mek in a red caftan, put a gay-coloured red handkerchief round
his head, and hung glass beads on him. Another cow was brought to
us, but they wanted an enormous quantity of sug-sug for it, (these
trinkets are generally held in little value here, because the Gelabis
frequent these regions,) and still more for goats and sheep.
Thermometer, sunrise, 18°; noon and subsequently, 28° to 30°.
16th March.—Man is not appalled in the midst of danger itself,—if
it were so, he would be lost; but the frail human heart throbs
afterwards. Yesterday evening I left the vessel, in company with
Thibaut, to get at a swarm of finches, which birds are said to give a
delicious flavour to a pillau, of which we wanted to be joint
partakers. We were soon obliged to separate, in order to salute the
birds on both sides of their settlement. In my excursion, however, on
the shore, I came all of a sudden within a few steps of a lion,
without having the least distant idea that this fearful enemy could be
in the neighbourhood of all our vessels, and I had only my double-
barrel, which was loaded merely with small shot; whilst my
huntsman Sale, was pursuing a gazelle, at a long distance off.
Possibly our firing had awakened this supreme chief from his sleep,
for otherwise I must have seen him before, although my eye was
directed to a brace of birds at the left; because the underwood could
not have concealed an object of such size, as it only reached up to
the knee, and was merely interspersed here and there with a higher
bush. I was just taking aim slowly and almost irresolutely at the two
beautiful birds, who were looking at me with surprise and
confidence, contrary to the custom of the cunning finches, when the
lion stood before me on the right, as if he had sprung from the
earth. He was so close to me that he appeared to stand as high as
up to my breast, but yet I stood, my poor weak weapon in my hand,
holding it close to my side, with perfect presence of mind, so as to
keep my face free, and to wait for the attack; I was firm, and he
seemed also to be resolute.
At first we stared at each other mutually; he measured me from
top to toe, but disregarded the Turkish accoutrements and sun-burnt
countenance, for my red cap which he seemed not to despise. I, on
my side, recognized in him the dreaded king of beasts, although he
wore no mane, according to his usual custom, but I did not appeal
to his magnanimity. At last he turned his face from me, and went
away slowly with a dreadfully pliable movement of his hinder parts,
and his tail hanging down, but could not restrain himself from
turning round to look at me once more, while I was trusting to the
effect of one or two shots in the eyes or jaws, if it came to a contest
of life or death; and really I remained standing immovable, with too
much of the lion in me to tremble, and to bring certain destruction
on my head by untimely flight. However, away he went, looking
round several times, but not stopping, as if he feared pursuit, and I
turned my back to him equally slowly, without even calling out a
farewell; but I cast a searching look over my shoulders every now
and then, right and left, expecting that he might make a spring like
a cat, and I kept him in sight before me, when I was about to jump
down from the shore on to the sand where the vessels and crew
were. I confess openly that I now felt an evident throbbing of the
heart, and that my nose seemed to have turned white. Taken
unawares as I had been by the lion, the distance of five paces,
according to the measurement I made, was nearly too close for me:
on his side it was only necessary for him to have smelt me, which
probably I should not have allowed. I stood a moment on the margin
of the shore, in order that I might tranquilly summon Suliman Kashef
to the pursuit of the beast, without betraying any pallor of
countenance, and then I jumped down on the sand. When I swore
by the prophets to Suliman Kashef that my account was true, he was
ready immediately with his sharpshooters. At my advice we formed a
line of riflemen above, though I could not obtain a couple of bullets
for my gun; but the Turks soon crawled together again, except a tall
black slave of Suliman’s, who was at the right wing. When the latter
soon afterwards pointed and made signs that the lion was near at
hand, his master motioned with his hand and gun that he would
shoot him if he did not join us, for he held himself as lost, being left
quite alone. We set off at a slight trot, because the lion continued
his walk, until at last Suliman, as it began to get dark, ordered three
of his boldest warriors to go in advance. Three shots were fired, but
the men came back, and described the lion as a real monster. I was
actually glad that the magnanimous beast, according to all
probability, was not even wounded. They called me again an “Agù el
bennaht,” because I accompanied the expedition to see my lion a
second time, and they expressed themselves rejoiced that God had
preserved me, and wished me happiness, with pious phrases from
the Koràn.
To-day we sailed at half-past six o’clock from the place to S.E.
and S.E. by S.; at seven o’clock, E. by S., a village on the high shore
at the right.
We saw yesterday, from our landing-place, four villages, lying
together on the right and left shore, which the Dinkas have taken
into their possession. At half-past seven o’clock, after we had sailed
only slowly (two miles), owing to the wind being partly adverse, we
proceeded to E.S.E. and S.E. by E. The strong breeze caught the
sails, and we make seven miles clear of deduction: unfortunately,
the tract will not be long. A quarter before eight we stop before the
corner, where a winding to the left commences, in order to go
libàhn, because the vessels ahead do it. Some huntsmen’s huts, with
their inhabitants, stand on the right shore, and I procure, on this
occasion, a horn of the Tete species of antelope. We proceed,
sailing, to S.E. by E., and E.S.E., and halt a quarter after eight. Again
at S.E. by E., to go libàhn round the left. Unfortunately, the wind has
torn the sail, which I had feared for a long time would be the case;
for it was ripped up in several places, and the Tailor Capitan did not
trouble himself about it. “Allàh kerim!” A large village at some
distance above. At a quarter before one, we go libàhn to S.E. by E.;
then E.S.E. and E. by S. On the right shore a village with Dinka
tokuls and sleeping-places. It is not yet, however, decided whether
the Dinkas dwell there, although the style of architecture of the
tokuls, their grooved and arched roofs, without eaves, seem rather
to denote that they belong to this tribe than to that of the Nuèhrs.
The wind is very strong, and the crew are obliged to tow with all
their might; but the river winds now to the right, and we can,
perhaps, sail. A quarter before two. From E. by N., slowly in the
bend to the right: a village on the right shore, in the bend to the
left, exactly like that on the left side. Half-past two, E. by S. We
cannot see anything of the village here, owing to the high shore;
and the blacks, who stood shortly before in large numbers on the
shore, have fled because they saw the Turkish countenances of
Suliman Kashef’s halberdiers. The Turk is pleased at such fear, which
is associated with hatred and contempt on the part of the negroes. A
quarter before three; S. by W. The wind makes the men at the rope
run; but we are not able to sail, because the river winds immediately
to the left. We have a low sand-island at our right. Our men will let
nothing lie by the huntsmen’s huts: tortoise-shells (water-tortoises),
vessels,—such as gadda, burma, gara—everything is carried off; for
the blacks have imbibed the Turkish notion of “Abit,” and are now
askari (soldiers), who pretend to know nothing of their countrymen.
Three o’clock. To the left in S. and S.S.E.; then again to the right.
Half-past three. We sail a little S. by W. and S. by E.; a village on the
left. The Dinkas appear to mix everything called corn to make bread;
such as durra, lubiën of different species, gourd or melon stones,
&c., of which I have a specimen; and also lotus seeds, found here in
great quantities, and therefore denoting that there are several lakes
in the interior, and the small rice I have mentioned previously. A
large hippopotamus shewed himself on the flat left shore: he was
afraid of the vessels and the shouting of the crew, and trotted in a
semicircle, like an immense wild boar, in order to plunge into the
water with a greater roar. Four o’clock. To the left E.S.E. Five o’clock.
From E. further to the left.
The crawling along these cheerless shores, notwithstanding the
shouting, jokes, teasing, and stumbling on board the vessels from
side to side, and sometimes into the water, and the huzzaing when
that takes place—notwithstanding all the various kinds of occupation
and non-occupation which may amuse us for a short time—is
exceedingly wearisome; and it is well for me if I retain my senses to
sketch here and there an idea, which may be followed out or
rejected by those whose attainments are higher, and who have the
advantage of an enlightened circle, where opinions and views can be
expressed and discussed. Such a circle, however, cannot be found in
Bellet Sudan, or on board my vessel. We halt a little after six o’clock
in E.N.E., at the right shore. Thermometer, sunrise, 18°; noon, 27°
and 28°; sunset, 27°.
17th March. We had a great antelope-hunt yesterday evening.
Amongst others, there was an Ariel with twenty-five rings on its
horns, and a Tete, and three female Tilli. The latter, also a species of
antelopes, are of lighter colour than the Ariels, and almost white,
whilst the Tete has a dark-brown coat with white breast and belly.
The female Tilli are distinguished by having long tails, but the males
are said to be bare behind. I was not able to leave the vessel
sufficiently early to see a herd of more than a thousand antelopes
that were going to the watering-place. My huntsman, also, who had
struck into another road, saw some hundred together; all the others
agreed that there were these thousand which I have mentioned. But
they soon dexterously divided to the right and the left on the
immeasurable level of this land, where there was merely low grass,
wild bamie and a quantity of basil, which latter was also met with on
all sides in the countries further up; and Suliman Kashef only shot
four, and my Sale not a single one. I myself could only see some
antelopes on the horizon, because it was already getting dusk, and I
stopped with Sabatier close to the vessels, in case some beast
should be scattered from the herd, but in vain. On this occasion,
also, I saw two lions at a distance.
At night the wind blew in coldly at the door and windows, and
even this morning the north-east wind was cool. At half-past six we
proceed E.N.E., and in a bend further to the right E. and E. by S.,
where we make a stronger evolution to the right. Eight o’clock.
Libàhn from S.E. by S. to S. We glide over shallows apparently
consisting of rubble-stone; the wind becomes strong and tosses the
waves. A quarter before nine, S.E. by S. to S., then still more to the
left, where we are soon thrown by the wind on the left shore, and
stop in E.S.E. Thibaut is with me, and they are calling for him; his
ship is full of water, and all the crew are summoned there: it is
fortunate that we are near land. Selim Capitan neglected to have the
vessels caulked at Khartùm, or to order at least gotrahm (instead of
tar) to be applied to the parts which we had stopped up with some
oakum.
At five minutes’ distance above, a large village deserted by
people; we are magnanimous enough on our side to keep the crew
from plundering it. It is slightly elevated: the same is also the case
with the shore, so that shallow lakes are formed right and left, at
present dry, and having vents to the water, which apparently are
kept open by human hands for the sustentation of the soil,—on
which, however, nothing is seen. A number of snail-shells are lying
together on the surface just as I have seen in other places, and it
seems that snails are eaten. We remain here on account of the
accident to Thibaut’s vessel, but the shores, à talus, do not allow us
to bring it on the dry land. Thermometer 17° and 24°.
CHAPTER X.
VARIOUS SPECIES OF GRASSES. — FORMATION OF THE SHORES. —
WATERFOWLS. — AN ANTELOPE OF THE TETE SPECIES, NOW AT BERLIN. —
STRATA OF THE SHORE. — THE SOBÀT RIVER. THE MAIN ROAD FOR THE
NATIVES FROM THE HIGHLANDS TO THE PLAINS. — OBSERVATIONS ON THE
COURSE OF THE NILE AND SOBÀT. — A THOUSAND ANTELOPES SEEN MOVING
TOGETHER! — WILD BUFFALOES, LIONS, AND HYÆNAS. — AFRICA, THE
CRADLE OF THE NEGRO RACE. — THE SHUDDER-EL-FAS: DESCRIPTION OF
THIS SHRUB. — ARNAUD’S CHARLATANRY. — OUR AUTHOR FEARED BY THE
FRENCHMEN. — ARNAUD AND SABATIER’S JOURNALS: THE MARVELLOUS
STORIES OF THE FORMER. — THIBAUT’S JEALOUSY. — VISIT OF A SHEIKH OF
THE SHILLUKS. — FEAR OF THE TURKS AT THESE PEOPLE. — SULIMAN
KASHEF PURSUED BY A LION.