Unit 11
Structure
11.0 Objectives
11.1 Introduction
11.2 Derivative Indexing and Assignment Indexing
11.3 Pre-Coordinate Indexing System
11.3.1 Cutter’s Contribution
11.3.2 Kaiser’s Contribution
11.3.3 Chain Indexing
11.3.4 PRECIS (Preserved Context Index System)
11.3.5 POPSI (Postulate Based Permuted Subject Indexing)
11.4 Post-Coordinate Indexing
11.4.1 Pre-Coordinate Indexing versus Post-Coordinate Indexing
11.4.2 Term Entry System and Item Entry System
11.4.3 Uniterm Indexing
11.5 Keyword Indexing
11.5.1 Key Word in Context Indexing (KWIC)
11.5.2 Variations of Keyword Indexing
11.5.3 Double KWIC
11.5.4 Other Versions
11.6 Computerised Indexing
11.6.1 Meaning and Features
11.6.2 Manual Indexing versus Computerised Indexing
11.6.3 Advantages and Disadvantages of Computerised Indexing
11.6.4 Components of Computerised Indexing System
11.6.5 Categories of Computerised Indexing Systems
11.6.6 Comparison of Computerised Indexing Systems
11.6.7 Index File Organisation
11.6.8 Methods of Computerised Indexing
11.7 Indexing Internet Resources
11.7.1 Search Engine Indexing
11.7.2 Subject / Information Gateway
11.7.3 Semantic Web
11.7.4 Taxonomies
11.8 Summary
11.9 Answers to Self Check Exercises
11.10 Keywords
11.11 References and Further Reading
11.0 OBJECTIVES
After reading this Unit, you will be able to:
understand the basic principles of subject indexing techniques;
appreciate the differences between (a) derived and assigned indexing, (b) pre- and
post-coordinate indexing systems;
trace the major contributions in pre-coordinate indexing systems;
learn the different stages of intellectual operations and working of some important
pre-coordinate indexing systems such as, Chain Indexing, PRECIS and POPSI;
understand the objective conditions that led to the development of post-coordinate
indexing;
learn the different entry structures, such as item entry and term entry systems,
and the methods of post-coordinate indexing with particular reference to the Uniterm
indexing system and post-coordinate searching devices;
explain and apply the different varieties of keyword indexing;
appreciate the differences between the manual indexing and computerised indexing;
understand the concept of computerised indexing in terms of its features,
components, categories, and index file organisation;
learn different methods associated with the generation of index entries with the aid
of computer;
get acquainted with the indexing of internet resources with particular reference to
search engine indexing and other associated concepts; and
develop skills of using different indexing techniques for formulating different types
of subject headings.
11.1 INTRODUCTION
The subject approach to information has been an area of intense study and research in
the organisation of information, resulting in new theories and in the design of
new indexing techniques based on these theories. Indexing
technique originated from what is known as the ‘back-of-the-book index’. Its
objective is to show where exactly in the text of a document a particular concept (denoted
by a term) is mentioned, referred to, defined or discussed. The ‘back-of-the-book
index’ may take the form of either a specific index or a relative index. A specific index presents
the broad topics in the form of one-to-one entries, whereas a relative index
displays each concept in its different contexts. The best example of such an index is
the relative index of the Dewey Decimal Classification. A relative index, however, is usually
unique to the text to which it points and is quite difficult to maintain on a large scale.
Subsequently we have seen the development of the pre-coordinate indexing model.
Until about the early fifties of the last century, the pre-coordinate indexing models were
the only ones that had been developed. In the subsequent decades, the post-coordinate
indexing models were designed and developed. The physical apparatus for these index
files also changed from the conventional index cards to the different formats used by
post-coordinate indexing models. With the advent of computers, keyword index
models like KWIC, KWOC, and KWAC were introduced, and post-coordinate indexing
also became more amenable to computer manipulation. Most of the bibliographic
databases today have indexes based on post-coordinate indexing principles. The
following sections of this Unit present the major developments in subject indexing
techniques for organising the index file.
c) Use direct form in case of a noun connected to another noun with a preposition,
e.g. Patient with heart disease, Fertilisation of flowers, death penalty, etc.
d) Use direct form in case of phrase or sentence used as the name of a subject,
e.g. Medicine as profession.
Kaiser’s systematic indexing did not make any provision for entry under the ‘process’
term and, as a result, it failed to satisfy users who approach the index by the ‘process’ term. The
concept of ‘time’ was also left out by Kaiser. Nevertheless, Kaiser was
perhaps the first person to introduce the idea of categorisation in subject indexing.
11.3.3 Chain Indexing
Merge specific subject entries, subject references (i.e. ‘see also’ references) and ‘see’
references and arrange them in single alphabetical sequence.
The steps in chain indexing noted above are demonstrated below with an illustrative example:
0) Subject of the Document
Researches on Child Psychology in India
1) Class Number of the Subject of the Document
Class no.: 155.4072054 [according to DDC, 22nd Edition]
2) Representation of the Class Number in the form of a Chain
100 Philosophy, Parapsychology and Occultism
150 Psychology
155 Differential and developmental psychology
155. ———
155.4 Child psychology
155.40 ———
155.407 Education, Research related topics
155.4072 Research
155.40720 ———
155.407205 Asia
155.4072054 India
3) Determination of Links
100 Philosophy, Parapsychology and Occultism [USL]
150 Psychology [SL]
155 Differential and developmental psychology [USL]
155. [FL]
155.4 Child psychology [SL]
155.40 [FL]
155.407 Education, Research related topics [USL]
155.4072 Research [SL]
155.40720 [FL]
155.407205 Asia [USL]
155.4072054 India [SL]
4) Preparation of Specific Subject Heading
Research, Child psychology, India
5) Preparation of Subject Reference Headings
Research, Child psychology
Child psychology
Psychology
6) Preparation of Cross References
India, Research, Child psychology
7) Preparation of Index Entries
Research, Child psychology, India 155.4072054
Bibliographical description and abstracts of the document are to be furnished
under the specific subject heading.
Research, Child psychology 155.4072
See also
Research, Child psychology, India
Child Psychology 155.4
See also
Research, Child psychology, India
Psychology 150
See also
Research, Child psychology, India
India, Research, Child psychology
See
Research, Child psychology, India
8) Alphabetisation
Arrange the above entries according to single alphabetical order.
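The mechanical parts of the procedure above (picking out the sought links in step 3 and alphabetising the headings in step 8) can be sketched in Python. This is only an illustrative sketch: the names `CHAIN`, `HEADINGS`, `sought_links` and `alphabetise` are hypothetical, the chain is typed in by hand from the example with the class number 155.4072054 given in step 1, and filing is assumed to be a simple case-insensitive character sort.

```python
# Chain of the class number 155.4072054 as (class number, term, link type).
# SL = sought link, USL = unsought link, FL = false link (no term).
CHAIN = [
    ("100", "Philosophy, Parapsychology and Occultism", "USL"),
    ("150", "Psychology", "SL"),
    ("155", "Differential and developmental psychology", "USL"),
    ("155.", "", "FL"),
    ("155.4", "Child psychology", "SL"),
    ("155.40", "", "FL"),
    ("155.407", "Education, Research related topics", "USL"),
    ("155.4072", "Research", "SL"),
    ("155.40720", "", "FL"),
    ("155.407205", "Asia", "USL"),
    ("155.4072054", "India", "SL"),
]

# Headings prepared by the indexer in steps 4 to 6 of the example.
HEADINGS = [
    "Research, Child psychology, India",
    "Research, Child psychology",
    "Child Psychology",
    "Psychology",
    "India, Research, Child psychology",
]


def sought_links(chain):
    """Keep only the links under which a user is likely to search."""
    return [(number, term) for number, term, kind in chain if kind == "SL"]


def alphabetise(headings):
    """Step 8: arrange all headings in a single alphabetical sequence
    (assumption: case-insensitive, character-by-character filing)."""
    return sorted(headings, key=str.casefold)


if __name__ == "__main__":
    for number, term in sought_links(CHAIN):
        print(number, term)
    for heading in alphabetise(HEADINGS):
        print(heading)
```

The wording of the headings themselves is still an intellectual decision of the indexer; the sketch only filters the chain and files the results.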
Advantages
A major advantage of chain indexing is that it ensures the collocation of aspects of a
subject which have been scattered in the classification scheme (i.e. distributed relatives),
because the last link in the class number is always the first link in the chain of subject index
entries.
It offers general as well as specific information to all information seekers by deriving
subject headings from the chain of successive subdivisions that leads from the general
to the most specific level.
It is a more or less mechanical system and it is also economical.
Criticisms
f) The system must have sufficient references between semantically related terms.
Features of PRECIS
Primary Operators
Environment of core concepts   0   Location
Core concepts                  1   Key system: thing when action not present; thing
                                   towards which an action is directed, e.g. object of
                                   transitive action, performer of intransitive action
                               2   Action; Effect of action
                               3   Performer of transitive action (Agent, Instrument);
                                   Intake; Factor
Extra-core concepts            4   Viewpoint-as-form
                               5   Selected instance: study region, study example,
                                   sample population
                               6   Form of document; Target user
Secondary Operators
Co-ordinate concepts           f   ‘Bound’ co-ordinate concept
                               g   Standard co-ordinate concept
Dependent elements             p   Part; Property
                               q   Member of quasi-generic group
                               r   Assembly
Special classes of action      s   Role definer; Directional property
                               t   Author-attributed action
                               u   Two-way interaction
Schema of Codes
Primary codes
Theme interlinks               $x  1st concept in co-ordinate theme
                               $y  2nd/subsequent concept in co-ordinate theme
                               $z  Common concept
Term codes                     $a  Common noun
                               $c  Proper name (class-of-one)
                               $d  Place name
Secondary codes
Differences
Preceding differences (3 characters)
  1st and 2nd characters:
    $0  Non-lead, space generating
    $1  Non-lead, close-up
    $2  Lead, space generating
    $3  Lead, close-up
  3rd character: number in the range 1 to 9 indicating level of difference
Date as a difference
    $d
Parenthetical differences
    $n  Non-lead parenthetical difference
    $o  Lead parenthetical difference
Connectives
    $v  Downward reading connective
    $w  Upward reading connective
Typographic codes
    $e  Non-filing part in italic preceded by comma
    $f  Filing part in italic preceded by comma
    $g  Filing part in roman, no preceding punctuation
    $h  Filing part in italic preceded by full point
    $i  Filing part in italic, no preceding punctuation
Entry structure (two-line format):
Lead.  Qualifier
  Display
The Lead is occupied by the approach term, which is the filing word and is offered as the
user’s access point in the index.
The Qualifier position is occupied by the term(s) that set the Lead into its wider context.
Together, the Lead and the Qualifier correspond to the Heading.
Terms in the Heading are set down in narrower-to-wider context order. When the first
term of the input string appears in the Lead position, the Qualifier position is usually
kept blank.
The Display position is occupied by the additional qualifying terms of the PRECIS
string, which rely upon the Heading for their context. When the last term of the input
string appears in the Lead position, the Display position becomes empty.
Steps in PRECIS
Let us take the following title of a document for demonstrating the different steps of
subject indexing according to PRECIS:
0. Title: University libraries in West Bengal
1. Analysis: Involves analysis of the thought content of the document and formulating
the subject statement in natural language:
Measurement of the performance of university libraries in West Bengal
2. Preparation of Input String: Involves the identification of the status or role of
each component term denoting key concept in terms of the role operators of PRECIS
and assigning the appropriate operators to prepare an input string. The stages of the
preparation of input string are furnished below:
[For understanding the stages of the preparation of input string, students are required to
consult the schema of role operators and codes as furnished under the sub-section
11.3.4. of this Unit]
a) Identifying the concept signifying an action (if any).
In the present example, the action concept is denoted by the term ‘Measurement’
and this term should, therefore, be prefixed by the role operator (2).
b) Identifying the kind of action represented by this term, i.e. whether transitive action
or intransitive action. ‘Measurement’ is clearly a transitive action since it is capable
of taking an object. The object of transitive action is considered as the key system
and is coded by the operator (1). In the present example, it is the ‘Performance’
which is being measured, so the input string should now appear in the form:
(1) performance
c) Identifying the concept, if any related as property and/or whole-to-part. In this
example, ‘performance’ is the property of ‘university libraries’ and ‘libraries’ is the
part of the ‘universities’. As a result, we will get the following input string:
(1) universities
(p) libraries
(p) performance
(2) measurement
d) The remaining term ‘West Bengal’ signifies the environment (i.e. geographical
location) in which the whole thing takes place. Accordingly, the operator (0) is to
be prefixed to the concept and the resulting input string will be:
(0) ✓ West Bengal
(1) ✓ universities
(p) ✓ libraries
(p) ✓ performance
(2) ✓ measurement
Note: The primary operators have ordinal filing values and the terms in the above
input string are sequenced accordingly. Thus, component terms are organised into
the above input string according to the principle of context dependency. The
secondary operator (p), prefixed to ‘libraries’ and denoting a ‘part’ of the
‘universities’, is preceded by the primary operator (1) to which it is related. Similarly,
the secondary operator (p), prefixed to ‘performance’ and denoting a
‘property’ of the ‘libraries’, is preceded by the secondary operator (p) meant for
‘libraries’ to which it is related. The terms, except proper names (i.e. West Bengal),
are written with lower case initials. Index entries will be generated with upper case
initials by the computer. A tick mark (✓) is to be provided for the terms which should
appear as Lead (access points) in the index entries.
3) Generation of Index Entries: The first index entry will be generated by the
computer by pushing the first term of the input string into the Lead position and
keeping the remaining terms in the Display position. As soon as any term goes to
the Lead position, it is printed in bold type face.
West Bengal
  Universities. Libraries. Performance. Measurement
The second index entry is generated by pushing the second term of the input
string into the Lead position and moving the existing Lead term into the Qualifier
position:
Universities.  West Bengal
  Libraries. Performance. Measurement
Similarly, the other index entries are generated as:
Libraries.  Universities. West Bengal
  Performance. Measurement
Performance.  Libraries. Universities. West Bengal
  Measurement
Measurement.  Performance. Libraries. Universities. West Bengal
Note: It can now be seen in the above examples that Lead and Qualifier are
separated by a full stop and 2-letter space. The standard separator between two
terms in the entry is full stop and one space. The terms in the Display position are
written leaving 2-letter space from the left. For over-run of Display in the next line,
4-letter space and for over-run of heading in the next line 8-letter spaces are to be
left from the margin.
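The shunting of terms through the Lead position described in step 3 can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual PRECIS software: the names `standard_entries` and `render` are hypothetical, every term is assumed to be ticked as a Lead, terms are passed in already capitalised, and codes, differences and connectives are ignored.

```python
def standard_entries(terms):
    """Generate (heading, display) pairs in the standard format.

    For the term in the Lead, the earlier terms of the input string are
    read in reverse order into the Qualifier (narrower-to-wider context),
    and the later terms go into the Display in their original order.
    """
    entries = []
    for position, lead in enumerate(terms):
        qualifier = list(reversed(terms[:position]))
        display = terms[position + 1:]
        heading = lead
        if qualifier:
            # Lead and Qualifier: full stop and two spaces;
            # other terms: full stop and one space.
            heading += ".  " + ". ".join(qualifier)
        entries.append((heading, ". ".join(display)))
    return entries


def render(entries):
    """Print-style layout: Display indented two spaces under the Heading."""
    lines = []
    for heading, display in entries:
        lines.append(heading)
        if display:
            lines.append("  " + display)
    return "\n".join(lines)


INPUT_STRING = ["West Bengal", "Universities", "Libraries",
                "Performance", "Measurement"]

if __name__ == "__main__":
    print(render(standard_entries(INPUT_STRING)))
```

Run on the input string of the example, the sketch reproduces the five entries shown in step 3, with an empty Display for the last term.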
4) Generation of supporting reference entries: ‘see’ and ‘see also’ references
are generated from semantically related terms taken from a machine-held thesaurus.
5) Alphabetisation: All the entries, generated by the process, as stated above, are
arranged alphabetically by headings. Under the common heading, displays are
organised alphabetically.
Formats of PRECIS Index
Index entries in PRECIS are basically generated in three formats: standard format,
inverted format and predicate transformation.
a) Standard Format: Index entries in the standard format are generated when any
of the primary operators (0), (1), and (2) or its dependent elements appear in the
Lead. The process of generation of index entries in the standard format has already
been demonstrated under the Section 11.3.4.7 of this Unit.
b) Inverted Format: Index entries in the inverted format are generated whenever a
term coded by an operator in the range from (4) to (6) or its dependent elements
appear in the Lead. The rule for generating index entries in this
format is as follows: when any of the terms coded (4), (5) or (6), or any of their
dependent elements, appears in the Lead, the whole input string is read
from top to bottom and is written in the Display. However, if the term appearing in
the Lead is the last term of the input string, it is dropped from the Display.
Example: A report on the feminist viewpoint on marriage
Input String:
(2) marriage
(4) feminist viewpoint
(6) reports
Index Entries:
Marriage
– Feminist viewpoint – Reports
Feminist viewpoint
– Marriage – Feminist viewpoint – Reports
Reports
– Marriage – Feminist viewpoint
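The inverted-format rule can be sketched as follows. This is a simplified illustration under stated assumptions, not the PRECIS program itself: the function name `inverted_format_entries` is hypothetical, the input string is given as (operator, term) pairs with capitalised terms, dependent elements of (4) to (6) are not handled, and the separator is copied from the example above.

```python
SEPARATOR = " \u2013 "  # en dash between Display terms, as in the example


def inverted_format_entries(input_string):
    """Return (heading, display) pairs for a list of (operator, term) pairs.

    Leads coded (4), (5) or (6) use the inverted format: the whole string
    is read top to bottom into the Display, and the Lead is dropped from
    the Display only when it is the last term of the string.
    Other Leads are treated in the standard format.
    """
    terms = [term for _, term in input_string]
    last = len(terms) - 1
    entries = []
    for position, (operator, lead) in enumerate(input_string):
        if operator in {"4", "5", "6"}:
            heading = lead
            display = terms[:-1] if position == last else terms
        else:
            qualifier = list(reversed(terms[:position]))
            heading = lead + (".  " + ". ".join(qualifier) if qualifier else "")
            display = terms[position + 1:]
        entries.append((heading, SEPARATOR.join(display)))
    return entries


EXAMPLE = [("2", "Marriage"), ("4", "Feminist viewpoint"), ("6", "Reports")]

if __name__ == "__main__":
    for heading, display in inverted_format_entries(EXAMPLE):
        print(heading)
        print("  " + display)
```

For the example string the sketch yields the same three entries as listed above: the full string under ‘Feminist viewpoint’, and the string without its last term under ‘Reports’.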
c) Predicate Transformation: When an entry is generated under a term coded (3)
that immediately follows a term coded (2), (s) or (t), each of which
introduces an action of one kind or another, the predicate transformation takes
place. An input string of this kind is shown below:
Example: Planning of libraries by architects
Input String:
(1) libraries
(2) planning $v by $w of
(3) architects
In order to bring expressiveness in the resulting index entries, the connective codes $v
and $w (see ‘schema of codes’) are attached to the action concept and it results in a
compound phrase. The rule relating to the generation of index entries with this format is:
when the term coded (3) goes to the Lead, the computer checks the operator assigned
to the next preceding term. If that operator is (2) or (s) or (t), the term coded with any
of these operators and the term accompanied by the Code $w for upward reading
connective (if any) are printed in the Display position instead of Qualifier position.
Accordingly, the index entries for the aforesaid input string will be:
Libraries
  Planning by architects
Planning.  Libraries
  By architects
Architects
  Planning of libraries
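For the simple three-term pattern of this example, a key system (1), an action (2) carrying $v and $w connectives, and an agent (3), the effect of the connectives and of the predicate transformation can be sketched as below. The function names `cap` and `predicate_transformation` are hypothetical, and the sketch deliberately covers only this one pattern; it is not a general PRECIS entry generator.

```python
def cap(text):
    """Upper-case only the first letter, as in printed index entries."""
    return text[:1].upper() + text[1:]


def predicate_transformation(key_system, action, agent, downward, upward):
    """Entries for the string (1) key_system, (2) action $v downward $w upward,
    (3) agent, returned as (heading, display) pairs.

    - Lead = key system: the action is read downwards with $v.
    - Lead = action: the key system goes to the Qualifier; $v and the
      agent remain in the Display.
    - Lead = agent: predicate transformation; the action, the upward
      connective $w and the key system are printed in the Display
      instead of the Qualifier.
    """
    return [
        (cap(key_system), f"{cap(action)} {downward} {agent}"),
        (f"{cap(action)}.  {cap(key_system)}", f"{cap(downward)} {agent}"),
        (cap(agent), f"{cap(action)} {upward} {key_system}"),
    ]


ENTRIES = predicate_transformation("libraries", "planning", "architects",
                                   downward="by", upward="of")

if __name__ == "__main__":
    for heading, display in ENTRIES:
        print(heading)
        print("  " + display)
```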
11.3.5 POPSI
All pre-coordinate indexing models are entirely based on the method of facet analysis.
Ranganathan pointed out in a paper entitled ‘Subject heading and facet analysis’
(Journal of Documentation, 20 (3), 1964, p.109-119) that facet analysis does not
depend entirely on notational scheme of classification. The rules of chain procedure, he
said, can be so framed as to implement any kind of decision about the sought first
heading and the other successive headings in conformity with the principle of local
variation. Thereafter, continuous research along this new line of thinking was carried on at the
Documentation Research and Training Centre (DRTC), Bangalore, and a number of
papers on Postulate-based Permuted Subject Indexing (POPSI), based on
Ranganathan’s General Theory of Library Classification, came out. Dr. Ganesh
Bhattacharyya first explained the fundamentals of subject indexing languages with an
extensive theoretical background which ultimately led to the development of newer
version of POPSI forming the part of his General Theory of Subject Indexing Languages
(GT-SIL). Bhattacharyya developed POPSI through logical interpretation of the
deep structure of subject indexing languages (SIL). POPSI drew attention to the
helpfulness of adopting a suitable device for ensuring an optimally effective organising
classification through the alphabetisation of verbal subject-propositions. It prescribes
the use of apparatus words, such as prepositions, conjunctions and participles, as
and when necessary to communicate the exact meaning of subject-propositions. These
words are put in parentheses and are ignored in alphabetisation. Since the POPSI
index consists entirely of verbal entries, filing them in one alphabetical sequence in a unipartite index
is easy.
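The rule that apparatus words in parentheses are ignored in alphabetisation can be illustrated with a small Python sketch. The function name `filing_key` and the three sample headings are hypothetical and are not taken from any POPSI index; filing is assumed to be case-insensitive.

```python
import re


def filing_key(heading):
    """Build a filing key that skips apparatus words in parentheses.

    '(using)', '(in)' and similar parenthesised words are removed before
    comparison, so that they do not influence the alphabetical order.
    """
    without_apparatus = re.sub(r"\([^)]*\)", " ", heading)
    return " ".join(without_apparatus.split()).casefold()


# Hypothetical headings, for illustration only.
HEADINGS = [
    "Indexing (using) Computers",
    "Indexing (in) Libraries",
    "Indexing Films",
]

if __name__ == "__main__":
    for heading in sorted(HEADINGS, key=filing_key):
        print(heading)
```

With this key the headings file as Computers, Films, Libraries under ‘Indexing’; a plain character sort would instead file both parenthesised headings first, because ‘(’ sorts before letters.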
Major Working Concepts of POPSI
1) Deep Structure of Subject Indexing Languages (DS-SIL)
DS-SIL is the logical abstraction of the surface structures of outstanding SILs like
Cutter, Dewey, Kaiser and Ranganathan. According to the General Theory of SIL, the
structure of a specific SIL has been assumed to be a surface structure of the deep
structure of SIL. The DS-SIL has been presented diagrammatically as follows:
[Figure: Deep Structure of Subject Indexing Languages (DS-SIL)]
It appears from the above diagram that any specific subject may belong to any one of
the following elementary categories (D, E, A, P) and modifier:
2) Elementary Categories and Modifiers
a) Discipline (=D) refers to an elementary category that includes the
conventional field of study, or any aggregate of such fields, or artificially created
fields analogous to those mentioned above; e.g. Physics, Biotechnology,
Ocean science, Library and Information Science, etc.
b) Entity (=E) refers to an elementary category that includes manifestations
having perceptual correlates, or only conceptual existence, as contrasted
with their properties, and actions performed by them or on them; e.g. Energy,
Light, Plants, Animals, Place, Time, Environment, etc.
c) Action (=A) refers to an elementary category that includes manifestations
denoting the concept of ‘doing’. An action may manifest itself as Self Action
or External Action. For example, Function, Migration, etc. are Self Actions;
and Treatment, Selection, Organisation, Evaluation, etc. are External
Actions.
d) Property (=P) refers to an elementary category that includes manifestations
denoting the concept of ‘attribute’—qualitative or quantitative; e.g. Property,
Effect, Power, Capability, Efficiency, Utility, Form, etc.
e) Modifier (=M) refers to a qualifier used to modify any one of the elementary
categories D, E, A and P. It decreases the extension and increases the intension
of the qualified manifestation without disturbing its conceptual wholeness. A
modifier can modify any one of the elementary categories, as well as two or
more elementary categories. Modifiers are of two types:
Common Modifiers: They refer to Space (e.g. Libraries in India),
Time (e.g. Libraries in India 19th Century), Environment (e.g. Desert
Birds), and Form (e.g. Encyclopedia of Physics). Common modifiers
have the property of modifying a combination of two or more elementary
categories.
Special Modifiers: A special modifier is used to modify only one of
the elementary categories. It may be Discipline-based, Entity-based,
Property-based or Action-based. Special modifiers can be
grouped into two types:
i) those that require a phrase or auxiliary words to be inserted between
the terms, thus forming a complex phrase, e.g. Cataloguing using
computers; and
ii) those that do not require auxiliary words or a phrase to be inserted
between the terms, but automatically form an acceptable compound
term denoting Species/Type, e.g. ‘Chemical’ in ‘Chemical
Treatment’.
3) Organising Classification and Associative Classification
According to the General Theory of SIL, classification is a combination of both organising
classification and associative classification. In other words, an indexing system is a
combination of both organising classification and associative classification. The tasks
involved in creating an organising classification are the categorisation of concepts and
their organisation in hierarchies. In organising classification, compound subjects are based
on genus-species, whole-part, and other inter-facet relationships. Here, classification is
used to distinguish and rank each subject from all other subjects with reference to its
Coordinate—Superordinate—Subordinate—Collateral (COSSCO) relationships. The
result of organising classification is always a hierarchy. In associative classification, a
subject is distinguished from other subjects based on the reference of how it is associated
with other subjects without reference to its COSSCO relationships. The result of
associative classification is always a relative index.
4) Base and Core
In the context of constructing a compound subject heading, when the purpose is to bring
together all or a major portion of the information relating to a particular manifestation or
manifestations of a particular elementary category, the manifestation/category is the Base.
In case of a complex subject, any one of the subjects can be decided to be the Base
subject depending upon the purpose in hand. For example: for a document on ‘Eye
cancer’, ‘Eye’ is the Base subject in an Eye Hospital Library, and ‘Cancer’ is to be
considered as the Base subject for a Cancer Research Centre.
When the purpose is to bring together within a recognised Base, all or major portion of
information pertaining to one or more elementary categories, the category or categories
concerned is the Core of the concerned Base. Core lies within the Base, and which one
will be the Base or Core depends on the collection or purpose of the library. For
example: In DDC, ‘Medicine’ is the Base, and the ‘Human body’ and its ‘Organs’
constitute the Core of the Base.
Features of POPSI
From the operational point of view, the salient features of POPSI may be grouped
under three components: Analysis, Synthesis, and Permutation.
The work of ‘Analysis’ and ‘Synthesis’ is primarily based on the postulates associated
with the deep structure of SILs for generating organising classification. The task of
analysis and synthesis is largely guided by the following POPSI-table. The work of
‘Permutation’ is based on cyclic permutation of each term-of-approach, either
individually or in association with other terms for generating associative classification
effect in alphabetical arrangement.
Rules of Syntax
The basic rules of syntax associated with POPSI are:
a) Discipline is followed by Entity, both modified and unmodified.
b) Property follows immediately the manifestation in relation to which it is a Property.
c) Action follows immediately the manifestation in relation to which it is an Action.
d) A Property can have its own Property.
e) An Action can have its own Action.
f) A Species/Type follows immediately the manifestation in relation to which it is a
Species/Type.
g) A Part follows immediately the manifestation in relation to which it is a Part.
h) A Modifier follows immediately the manifestation in relation to which it is a Modifier.
The following POPSI Table, like Role Operators in PRECIS, is used in sequencing the
component terms for formulating a subject heading:
0   Form modifier
1   General treatment
2   Phase relation
    2.1  General
    2.2  Bias
    2.3  Comparison
    2.4  Similarity
    2.5  Difference
    2.6  Application
    2.7  Influence
Common modifiers
3   Time modifier
4   Environment modifier
5   Place modifier
6   Entity (E)
7   Discipline (D)
    .1  Action (A)
    .2  Property (P)
    Note: Notations .1 and .2 are preceded by the notation of the manifestation
    in relation to which it is (A) and (P).
    ,   Part
    .   Speciator/Type
    —   Special modifier
    Note: A Species/Type/Special modifier follows immediately the manifestation
    in relation to which it is a Species/Type.
8   Core (C)
9   Base (B)
Note: Features relating to Core (C) and Base (B) are analogous to
6 Entity / 7 Discipline / .1 Action / .2 Property.
Demonstration of the Procedure of POPSI-Basic
0) Title of the document:
Use of computers for indexing of educational films in university libraries in India
1) Content Analysis:
Library and Information Science = Discipline (D) [Implicit]
University libraries = Entity (E) [Explicit]
Educational films = Part of E [Explicit]
Indexing = A of Part of E [Explicit]
Use = Application phase relation (PR) [Explicit]
Computers = E-based Special modifier (Sm) [Explicit]
India = Common modifier (Cm) of place [Explicit]
2) Formalisation:
Library and Information Science (D), University libraries (E), Educational films
(Part of E), Indexing (A of Part of E), Use (Application PR), Computers (E-
based Sm), India (Cm)
3) Standardisation:
It is assumed that all the terms in the formalised expression of the subject as shown
above are standard terms.
4) Modulation:
Library and Information Science (D), Libraries. Academic libraries. University
libraries (E), Information sources. Films. Educational films (Part of E), Indexing
(A of Part of E), Use (Application PR), Computers (E-based Sm), Asia, India
(Cm)
5) Entry for Organising Classification:
Library and Information Science 6 Libraries. Academic libraries. University libraries,
Information sources. Films. Educational films 6.1 Indexing 2.6 (using) Computers
5 (in) Asia, India
6) Approach-term Selection:
The following terms are selected as the approach-terms:
Libraries
Academic libraries
University libraries
Information sources
Films
Educational films
Indexing
Computers
India
7) Preparation of Entries of Associative Classification:
LIBRARIES
Library and Information Science 6 Libraries. Academic libraries. University libraries,
Information sources. Films. Educational films 6.1 Indexing 2.6 (using) Computers
5 (in) Asia, India
The above organising classification will have to be repeated under each of the
following approach terms:
Academic libraries
University libraries
Information sources
Films
Educational films
Indexing
Computers
India
8) Alphabetisation:
All the entries are arranged according to the alphabetical order, word-by-word.
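Steps 7 and 8 are largely mechanical and can be sketched in Python. This is a hedged illustration, not Bhattacharyya's own procedure in code: the names `associative_entries` and `word_by_word_key` are hypothetical, each approach term is assumed to be printed in capitals as the lead (as shown for LIBRARIES above), and word-by-word filing is approximated by comparing lists of lower-cased words.

```python
ORGANISING_ENTRY = (
    "Library and Information Science 6 Libraries. Academic libraries. "
    "University libraries, Information sources. Films. Educational films "
    "6.1 Indexing 2.6 (using) Computers 5 (in) Asia, India"
)

APPROACH_TERMS = [
    "Libraries", "Academic libraries", "University libraries",
    "Information sources", "Films", "Educational films",
    "Indexing", "Computers", "India",
]


def word_by_word_key(lead):
    """Word-by-word filing: compare whole words, one word at a time."""
    return [word.casefold() for word in lead.split()]


def associative_entries(approach_terms, organising_entry):
    """Step 7: repeat the organising-classification entry under every
    approach term; step 8: arrange the entries word-by-word."""
    entries = [(term.upper(), organising_entry) for term in approach_terms]
    return sorted(entries, key=lambda entry: word_by_word_key(entry[0]))


if __name__ == "__main__":
    for lead, entry in associative_entries(APPROACH_TERMS, ORGANISING_ENTRY):
        print(lead)
        print("  " + entry)
```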
POPSI-Specific
The steps in POPSI with illustrative examples as demonstrated above fall within the
purview of POPSI-Basic. According to Bhattacharyya, there is no single absolute version
of organising or associative classification. POPSI tries to find out what is logically basic,
and amenable to systematic manipulation to meet specific requirement. The POPSI-
Basic is a product of the application of the GT-SIL and it is readily amenable to the
systematic manipulation to generate purpose-oriented specific versions known as
POPSI-Specific. POPSI-Specific is always a derivation from the POPSI-Basic
according to special decisions and rules to meet specific requirements at the local level.
It may be noted here that this approach is totally different from that of earlier contributors
of different SILs.
If the purpose is to bring together all or a major portion of information pertaining to a
specific topic in a discipline manifesting any of the Elementary categories, the above
version of POPSI-Basic can be systematically manipulated to generate the required
version of POPSI-Specific. This involves the decision about the ‘Base’ and ‘Core’.
For example, if our purpose is to bring together all information pertaining to ‘Educational
films’ in one place, ‘Educational films’ is to be considered as the Base.
‘University libraries’ and ‘Indexing’ are to be considered as the Modifier and Action to
the Base respectively. In view of this, we can prepare the following organising classification
entry:
Information sources. Films. Educational films — Libraries. Academic libraries. University
libraries 9.1 Indexing 2.6 (using) Computers 5 (in) Asia, India.
Self Check Exercise
Note: i) Write your answers in the space given below.
ii) Check your answers with the answers given at the end of this Unit.
5) Discuss the Principle of Context Dependency.
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
6) Discuss the entry structure and entry format as followed in PRECIS.
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
7) How do you categorise different operational stages of POPSI?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
8) What are the major steps in formulating index entries according to POPSI?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
11.4.2 Term Entry System and Item Entry System
In Term Entry System, we prepare entries for a document under each of the appropriate
subject headings, and file these entries alphabetically. Here, terms are posted on the
item (i.e. Term on Item System). In this type of post-coordinate indexing, the number of
entries for a document is dependent on the number of terms associated with the thought
content of the document. Searching of two files (Term Profile and Document Profile) is
required in this system. Uniterm and Peek-a-boo are examples of this system.
It is possible to take the opposite approach and make a single entry for each item, using
a physical form which permits access to the entry from all appropriate headings. A
system which works in this way is called an Item Entry System. Here, items are
posted on the term (i.e. Item on Term System). In this type of post-coordinate indexing,
a single entry is made for each item. The item entry system involves the searching of one file
(i.e. Term Profile) only. The Edge-notched Card is an example of an item entry system.
arrange all term cards according to the alphabetical order of the uniterms.
Term cards so prepared as stated above and arranged according to the alphabetical
order of the uniterms are called the Term Profile of the system.
Searching
The process of searching in Uniterm system involves the following operations:
Searcher identifies the different component terms/uniterms associated with the
content of her/his queries.
Searcher, after identifying the component terms/uniterms, pulls out the pertinent
term cards from the alphabetical deck of the Term Profile.
Term cards thus pulled out are matched to find the common accession number(s).
The number(s) common to all such Uniterm cards represent the sum total of the
component concepts of the specific subject.
With the help of the common accession number(s), relevant card(s) are pulled out
from the Document Profile where full bibliographical information of the required
document(s) is available.
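The search operations above amount to intersecting the sets of accession numbers posted on the chosen term cards and then looking the result up in the Document Profile. The following is only a minimal sketch of that idea; the accession numbers, titles and the function name `uniterm_search` are hypothetical and are not taken from any actual Uniterm installation.

```python
# Hypothetical Term Profile: each uniterm card lists the accession numbers posted on it.
term_profile = {
    "Classification": {101, 205, 310},
    "Films": {205, 288, 310},
    "Libraries": {101, 288, 310},
}

# Hypothetical Document Profile: accession number -> bibliographic description.
document_profile = {
    101: "Classification in libraries",
    205: "Classification of films",
    288: "Films in libraries",
    310: "Classification of films in libraries",
}


def uniterm_search(query_terms, term_profile, document_profile):
    """Return (accession number, description) pairs common to all term cards."""
    # Pull out the term card for each uniterm in the query.
    cards = [term_profile.get(term, set()) for term in query_terms]
    if not cards:
        return []
    # Match the cards to find the accession numbers common to all of them.
    common = set.intersection(*cards)
    # Use the common numbers to pull the entries from the Document Profile.
    return [(number, document_profile[number]) for number in sorted(common)]


results = uniterm_search(["Classification", "Films"], term_profile, document_profile)
for number, description in results:
    print(number, description)
```

Adding a third uniterm to the query narrows the result further, because every additional term card can only remove accession numbers from the common set.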
The main advantage of the Uniterm indexing system is its simplicity and the ease with
which persons without much knowledge of subject indexing can handle it. The criticisms
against the Uniterm system centre around: (a) search time: searching takes longer
because two files, the Term Profile and the Document Profile, must be searched; and (b)
false drops: the possibility of retrieving irrelevant documents due to false coordination of
uniterms. For example, searching with the uniterms ‘Teachers’, ‘Students’ and
‘Evaluation’ may retrieve documents on both the subjects, ‘Evaluation of students by
teachers’ and ‘Evaluation of teachers by students’, one of which might be irrelevant to a
particular user.
To overcome the problem of false drops, the following post-coordinate searching devices
have been used:
1) Use of Pre-coordinated Terms
It is the introduction of pre-coordination to some extent in post-coordinate system
in which two or more terms in a subject are bound in place of isolated single term/
uniterm to get rid of false coordination.
2) Links: Links are special symbols used to group all the related concepts in a
document separately, so that inappropriate combinations of terms are not retrieved.
Suppose we have a document (accession number: 243) dealing with two different
topics—Classification of non-book materials and indexing of films. In order
to avoid false coordination like Classification of films and indexing of non-
book materials, alphabetical symbols, which serve as interlocking device, are
attached to accession number to indicate different groups:
Classification 243A
Non-book Materials 243A
Indexing 243B
Films 243B
3) Roles: Roles are the indicator digits/symbols attached to the terms at the time of
indexing to indicate the role or status or use of the term in a particular context.
Here, the possible roles of different terms are identified beforehand and terms are
tagged with these role indicators at the time of indexing. For example, roles
developed by the Engineering Joint Council, known as EJC role operators may be
attached to ‘Television’ to distinguish its functions as the product and tool:
Role Document
2 [Product / Output] Manufacturing of television
3 [Agent / Tool] Use of television in education
4) Weighting: It is the device of allocating quantitative values to the index terms
according to their degree of relevance in the document. Different ways for indicating
weights have been suggested. A simple system uses numbers 1 to 3, where 3
indicates maximum weight (i.e. the index term is highly specific and covers an
entire major subject of the document), 2 and 1 indicate weights of index terms in
decreasing order of their values or relevance in the document.
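The effect of links, roles and weighting on a post-coordinate search can be illustrated with a small sketch. It reuses document 243 and the ‘Television’ roles from the examples above; the other accession numbers (501, 502, 601), the weights and all function names are assumptions made purely for illustration.

```python
# Postings for document 243 from the text, stored as (accession number, link letter).
term_profile = {
    "Classification": {(243, "A")},
    "Non-book Materials": {(243, "A")},
    "Indexing": {(243, "B")},
    "Films": {(243, "B")},
}


def search_without_links(terms):
    """Coordinate on accession numbers only, ignoring the link letters."""
    number_sets = [{number for number, _ in term_profile.get(t, set())} for t in terms]
    return set.intersection(*number_sets) if number_sets else set()


def search_with_links(terms):
    """Coordinate on (number, link) pairs, so terms must share a link group."""
    posting_sets = [term_profile.get(t, set()) for t in terms]
    common = set.intersection(*posting_sets) if posting_sets else set()
    return {number for number, _ in common}


# Roles: the same term is posted separately for each role it plays.
# Accession numbers 501 and 502 are hypothetical.
role_postings = {
    ("Television", 2): {501},  # role 2: product / output
    ("Television", 3): {502},  # role 3: agent / tool
}


def search_with_role(term, role):
    """Return only the documents in which the term plays the given role."""
    return role_postings.get((term, role), set())


# Weighting on the 1 to 3 scale: term -> {accession number: weight} (hypothetical values).
weighted_postings = {"Indexing": {243: 3, 601: 1}}


def search_with_weight(term, minimum_weight):
    """Return documents in which the term carries at least the given weight."""
    postings = weighted_postings.get(term, {})
    return {number for number, weight in postings.items() if weight >= minimum_weight}


print(search_without_links(["Classification", "Films"]))  # false drop: document 243
print(search_with_links(["Classification", "Films"]))     # empty: different link groups
print(search_with_links(["Indexing", "Films"]))           # document 243
```

Without links, ‘Classification’ and ‘Films’ coordinate falsely on document 243; with links, the two terms belong to different groups (A and B) and the false drop disappears.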
ANNUAL REPORT
EDITORIAL:*
1966=………………………………………………………………………... F 1
ANNUAL REVIEW
BOOK REVIEW:* OF INFORMATION SCIENCE AND TECHNOLOGY= B3-2.
INFORMATION SCIENCE AND TECHNOLOGY= BOOK REVIEW:*OF B3-2
SCIENCE AND TECHNOLOGY= BOOK REVIEW:* OF INFORMATION B3-2
TECHNOLOGY= BOOK REVIEW:* OF INFORMATION SCIENCE AND B3-2
11.6.3 Advantages and Disadvantages of Computerised Indexing
Advantages of computerised indexing are as follows:
It is as effective as human indexing;
It is cost effective compared to expensive human indexing;
Maintains consistency in indexing;
Indexing time is reduced;
Helps searchers find information quickly;
Can be applied to large volumes of texts where human indexing becomes impossible
(e.g. Indexing web pages);
Retrieval effectiveness can be achieved.
Disadvantages are:
Not flexible;
Not precise when looking at unique materials;
Not able to adapt to new terminology;
Not able to do the conceptual analysis of the content of the document;
A term that occurs several times in a document is not always a significant term.
However, the trend in the domain of IR systems is convergence. CD-ROM based
systems are integrated with OPACs, OPACs are linking online databases, document
delivery services, and other resource discovery services. MARC 21 bibliographic format
includes field 856 for encoding URLs of Internet resources. The Web is becoming the
platform for convergence of different IR systems e.g. Web-OPACs are linking open
databases (information mashup) and acting as the gateways for local and global
information resources to support users.
11.7.4 Taxonomies
Taxonomies can be considered another tool useful for organising web-based information.
For example, taxonomies provide an excellent means for organising subject-specific
information into an easily navigable format. The utility of taxonomies for displaying web
information can also be found when examining Yahoo’s site – which uses a taxonomy/
subject hierarchy to classify its indexed information.
Self Check Exercise
Note: i) Write your answers in the space given below.
ii) Check your answers with the answers given at the end of this Unit.
17) What do you mean by indexing Internet resources?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
18) What are the different types of search engines used in indexing Internet resources?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
19) Discuss the methods of search engine indexing.
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
20) What is Semantic Web?
.....................................................................................................................
.....................................................................................................................
.....................................................................................................................
11.8 SUMMARY
In this Unit we have dealt with different techniques of subject indexing. It begins with a
brief discussion on derivative indexing and assignment indexing and is followed by the
discussion on different types of pre- and post-coordinate indexing systems. We cannot
understand the pre-coordinate indexing properly without being aware of the contributions
of C.A. Cutter and J. Kaiser. For this, principles and processes of subject indexing
techniques as enunciated by Cutter and Kaiser are discussed. Major pre-coordinate
indexing systems like Chain Indexing, PRECIS, and POPSI are discussed with reference
to their principles, syntactical and semantic aspects, entry structure and system of
references. Objective conditions that led to the development of post-coordinate indexing
and its differences with pre-coordinate indexing are explained. Term entry system and
item entry system forming parts of the post-coordinate indexing system are also discussed
with an emphasis on the operational stages of Uniterm indexing system. Different varieties
of keyword indexing are explained. Computerised indexing techniques are explained in
terms of their meaning, features, differences with manual indexing, advantages and
disadvantages, components, categories, index file organisation and different methods
associated with the generation of index entries with the aid of computers. Indexing
internet resources with particular reference to search engine indexing and other associated
concepts are discussed briefly at the end of this Unit. All indexing techniques are
demonstrated with illustrative examples.
15) Design and development of the computerised indexing system mainly depends on
the methods used for organising records in the file. Some of the important logical
file organisations are: (1) Sequential File, (2) Inverted File, (3) Indexed Sequential
Files, (4) Chained Files, (5) Tree Structured Files, and (6) B-Tree.
16) Different methods of computerised indexing include: (1) Statistical Method, which
may consist of (1.1) Term Frequency method, (1.2) Relative Frequency Method,
and (1.3) Term weighting method; (2) Linguistic Method; and (3) Artificial Intelligence
(AI) based Indexing System, which may consist of (3.1) Natural Language
Processing (NLP) based Indexing System, and (3.2) Expert System based Indexing
System.
17) Indexing Internet resources or Web indexing means the following:
a) search engine indexing of the Web,
b) creation of metadata,
c) organisation of Web links by category, and
d) creation of a Website index that looks and functions like a back-of-book
index.
18) Different types of search engines used in indexing Internet resources are: (1) General
Search Engine, (2) Regional Search Engine, (3) Meta-search Engine, (4) Subject
Specific Search Engine, and (5) Directory-based Search Engines.
19) In order to find information from the Internet, a search engine employs special
software robots, called spiders (also called crawler, worm, wanderer, gatherer,
etc.), which traverse the web, following links between pages. The process of building
lists of the words found on Web sites is called Web crawling. Here, the spider begins with
a popular site, indexing the words on its pages and following every link found
within the site. In this way, the spidering system quickly begins to travel across the
Web and takes note of the (a) words within the page, and (b) location of the
words - title, subtitles, meta tags and other positions of the Web page. Meta tags
allow the owner of a page to specify key words and concepts under which the
page will be indexed. The meta tags also guide the search engine in choosing
which of the several possible meanings for these words is correct. After completion
of finding information on Web pages by the spiders, the search engine stores the
information in a way that makes it accessible to users. In the simplest case, a search
engine just stores the word and the URL where it was found. To make for more
useful results, most search engines store more than just the word and URL.
20) The “semantic web” is an approach to extend the web with semantic information
to avoid wrong hits. The conception of Semantic Web is characterised by developing
tools and technologies like languages, standards and protocols so that the Web
becomes meaningful. Technologies involved in the development of the Semantic
Web are the Uniform Resource Identifier (URI) for identifying documents uniquely
and globally, XML (eXtensible Markup Language) for structuring the data
semantically, RDF (Resource Description Framework) to base the structure of
the documents on a common model base, Ontologies (to define the objects/entities
and the interrelations between these objects/entities), etc.
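The storage step described in answer 19, where a search engine records each word together with the URL and the position in which it was found, can be sketched as a small inverted index. This is an illustrative assumption about the data layout only, not a description of how any particular search engine works; the URLs and page contents below are hypothetical.

```python
from collections import defaultdict

# Hypothetical pages as a spider might report them: URL -> {location: text}.
pages = {
    "http://example.org/a": {"title": "Subject indexing", "body": "Indexing of films"},
    "http://example.org/b": {"title": "Film libraries", "body": "Films and indexing"},
}


def build_index(pages):
    """Map each word to the set of (URL, location) pairs where it occurs."""
    index = defaultdict(set)
    for url, parts in pages.items():
        for location, text in parts.items():
            # Lower-case the text and split on whitespace; no stemming or stop-words.
            for word in text.lower().split():
                index[word].add((url, location))
    return index


index = build_index(pages)
for url, location in sorted(index["indexing"]):
    print(url, location)
```

Recording the location alongside the URL is what lets a search engine treat a word found in a title differently from the same word found in the body of a page.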
11.10 KEYWORDS
Action : An elementary category associated with POPSI
which refers to an idea denoting the concept of
‘doing’. An action may manifest itself as Self
Action or External Action.
Assigned Indexing : The process of indexing in which a human indexer
selects one or more subject headings or
descriptors from a list of controlled vocabulary
to represent the subject(s) of a work. Also
known as Assignment Indexing and Concept
Indexing.
Associative Classification : It refers to a classification in which a subject is
distinguished from all other subjects based on
the reference of how it is associated with other
subjects, without reference to its COSSCO
relationships. The result of associative
classification is always a relative index.
Back-of-the-book Index : An index which shows where exactly in the text
of a document a particular concept (denoted by
a term) is mentioned, referred to, defined or
discussed.
Base : It is a particular manifestation or manifestations
of a particular elementary category under which
all or major portion of related information are
brought together.
Boolean Operators : AND, OR, and NOT. Used to combine search
terms. AND finds only records that contain both
terms. OR finds records that contain either term.
NOT finds records that contain the first term but
not the second term.
Chain Indexing : The process of deriving subject index entries
based on the extracted vocabulary of a notational
scheme of classification. It retains all necessary
context but removes unnecessary context.
Classaurus : It is an elementary category-based (faceted)
systematic scheme of hierarchical classification
in verbal plane incorporating all the necessary
features of a conventional information retrieval
thesaurus. It is used as vocabulary control device
in POPSI.
Coextensive Subject Index : A subject index entry, in which a term, phrase,
or a set of terms define precisely the full thought
content of the document. Here extension and
intension of the ideas are equal to the thought
content of the document.
Common Modifier : It refers to the name of a place (Space), Time,
Environment, and Form.
Computerised Indexing : A method of indexing in which an algorithm is
applied by a computer to the title and/or text of
a work to identify and extract words and phrases
representing subjects, for use as headings under
which entries are made in the index.
Concrete : An elementary category suggested by Kaiser to
refer to things, place and abstract terms, not
signifying any action or process.
Content Designation : The act of making a bibliographic record machine
readable by encoding its various elements
according to a specified scheme.
Core : It is a particular manifestation or manifestations
of one or more elementary category under which
all or major portion of related information are
brought together within a recognised Base.
COSSCO Relationship : It is a relationship in which COordinate—
Superordinate—Subordinate—COllateral
(COSSCO) relationships of a subject are shown.
Deep Structure of Subject : DS-SIL refers to the logical abstraction of the
Indexing Languages (DS-SIL) surface structures of outstanding SILs like Cutter,
Dewey, Kaiser and Ranganathan.
Derived Indexing : The process of indexing in which terms to be
used to represent the content of the document
are derived directly from the document itself. Also
known as Derivative Indexing.
Discipline : An elementary category associated with POPSI
that includes the conventional fields of study, or
any aggregate of such fields, or artificially created
fields.
Entity : An elementary category associated with POPSI
which includes manifestations having perceptual
correlates, or only conceptual existence, as
contrasted with their properties, and actions
performed by them or on them.
False Drops : Retrieval of unwanted documents because of the
false coordination of terms at the time of
searching.
Input String : A set of terms arranged according to the role
operators which act as instructions to the
computer for generating index entries.
Item Entry System : A type of post-coordinate indexing system
which takes the opposite approach to the term
entry system and prepares a single entry for each
document (item), using a defined physical form,
which permits access to the entry from all
appropriate headings. Here, items are posted on
the term.
Keyword : A term that is chosen, either from the actual text
or from the queries of the searcher, that is
considered to be a ‘key’ to finding certain
information.
Keyword Indexing : The process of using significant words from a
title or an abstract or sometimes from the text of
the document as index entries.
KWIC Indexing : Key Word In Context, format for showing index
entries within the context in which they occur.
KWOC Indexing : Key Word Out of Context, the use of significant
word from titles for subject index entries, each
followed by the whole title from which the word
was taken.
Meta Search Engine : A program that allows a user to search across many
search engines at once.
Modifier : It refers to a qualifier used to modify any one of the
elementary categories D, E, A and P associated
with POPSI.
Nesting : Grouping terms within parentheses to specify the
order in which they will be combined. Terms in
the innermost parentheses will be combined and
searched first. Without parentheses, terms will
be combined in left-to-right order.
Ontology : A formal specification of a representational
vocabulary for a shared domain of discourse—
definitions of classes, relations, functions, and
other objects. Ontologies define data models in
terms of classes, subclasses and properties to
enhance the functioning of the Web.
Organising Classification : In organising classification compound subjects
are based on genus-species, whole-part, and
other inter-facet relationships. Organising
classification distinguishes and ranks each subject
from all other subjects with reference to its
COordinate—Superordinate—Subordinate—
COllateral (COSSCO) relationships.
Post-Coordinate Indexing : An indexing model in which terms associated with
the content of the document are kept separately
in the index file by the indexer and the searcher
coordinates the terms at the time of
searching or output stage. Also known as
‘coordinate indexing’.
PRECIS : PREserved Context Index System, a subject
indexing technique in which an open-ended
vocabulary can be organised according to a
scheme of role operators, usually for computer
manipulation.
Pre-Coordinate Indexing : An indexing model in which terms associated with
the content of the document are coordinated by
the indexer by following the syntactical rules of
a given indexing language at the time of indexing
or input stage for use in the retrieval of information
collection on compound and /or complex
concepts.
Process : An elementary category suggested by Kaiser
to refer to mode of treatment of the subject by
the author, an action or process described in the
document, and an adjective related to the
concrete as component of the subject.
Property : An elementary category associated with POPSI
which refers to the idea denoting ‘attribute’.
Role Operator : Role operators consist of a set of alpha-numeric
notations which specifies the grammatical role or
the function of the indexed term and regulates
the order of terms in the input string. Role
operators and their associated rules also serve
as the computer instruction for determining the
format, typography and punctuation associated
with each index entry.
Search Engine : A retrieval tool on the World Wide Web that, in
general, matches keywords input by a user to
words found at websites. The more sophisticated
search engines may allow other than keyword
searching.
Semantic Web : The “semantic web” is an approach to extend
the web with semantic information to avoid wrong
hits by developing tools and technologies like
languages, standards and protocols so that the
Web becomes meaningful.
Special Modifiers : A special modifier refers to a qualifier which is
used to qualify/modify only one of the elementary
categories associated with POPSI.
Stop-word List : The ‘stop-word’ list refers to a list of words,
which have no value for indexing / retrieval. These
may include insignificant words like articles (a,
an, the), prepositions, conjunctions, pronouns,
auxiliary verbs together with such general words
as ‘aspect’, ‘different’, ‘very’, etc. Each major
search system has defined its own ‘stop list’.
Subject Analysis : The process of identifying the different component
ideas associated with the thought content of the
document and establishing the interrelationships
among those component ideas.
Subject Gateways : Organized lists of web pages, divided into subject
areas by human indexers.
Subject Heading : A word or group of words representing the
subject of a document.
Subject Index : A tool that exhibits the analysed contents of the
collection of documents (either in the library or
database).
Subject Indexing : The process of representing the informational
content of the document by analysing its content
and translating the result of analysis into an
indexing language for creating a surrogate record
for it, especially subject access points, in an index.
Term Entry System : A type of post-coordinate indexing system in
which index entries for a document are made
under each of the component terms associated
with the thought content of the document. Here,
terms are posted on the item.
Web Indexing : Web indexing means providing access points for
online information materials, which are available
through the use of World Wide Web browsing
Software.
World Wide Web (WWW) : A network of many thousands of servers linked
together by a common protocol.