IRS Automatic Indexing UNIT-2

The document discusses different methods of automatic indexing, including statistical indexing, natural language indexing, and concept indexing. Statistical indexing uses the frequency of terms to calculate relevance and includes techniques like probability weighting, vector weighting, Bayesian models, and logistic regression. Vector weighting represents documents and queries as vectors of term weights. Bayesian models use probability theory to rank documents based on relevance. Logistic regression derives coefficients to predict relevance probabilities. Natural language indexing performs additional parsing while concept indexing correlates terms to concepts.

UNIT-2

AUTOMATIC INDEXING
• Classes of Automatic Indexing
• Statistical Indexing
• Natural Language
• Concept Indexing
• Hypertext Linkages
CLASSES OF AUTOMATIC INDEXING
 Automatic indexing is the process of analyzing an item to extract the information to be permanently kept in the index.
 This process is associated with the generation of the searchable data structures associated with an item.
 The left-hand side of the figure (identifying processing tokens, applying the stop list algorithm, characterizing tokens, applying stemming, and creating the searchable data structure) constitutes the indexing process.
Data Flow in Information Processing System (Overall fig. )

 All systems go through an initial stage of zoning and identification of the processing tokens used to create the index.
 Some systems automatically divide the document up into fixed-length passages or localities, which become the item unit that is indexed.
 Filters, such as stop lists and stemming algorithms, are used to reduce the number of processing tokens.
 An index is the data structure created to support the search strategy.
 Search strategies are classified as statistical, natural language, and concept.
 The statistical strategy covers the broadest range of indexing techniques and is the most common in commercial systems.
 The basis for the statistical approach is the frequency of occurrence of processing tokens (words/phrases) within documents and within the database.
 The words/phrases are the domain of searchable values.
 The statistics applied to the event data include probabilistic, Bayesian, vector space, and neural network models.
 The simplest statistical approach stores a single statistic, such as how often each word occurs in an item, which is used for generating relevance scores after a Boolean search.
 Probabilistic indexing stores the information used in calculating a probability that a particular item satisfies (i.e., is relevant to) a particular query.
 Bayesian and vector space approaches store the information used in generating a relative confidence level of an item's relevancy to a query.
 Neural networks are dynamic learning structures that come under concept indexing, where they determine concept classes.
 The natural language approach performs processing token identification similar to the statistical techniques, but an additional level of parsing of the item (e.g., present, past, or future action) enhances search precision.
 Concept indexing uses the words within an item to correlate to the concepts discussed in the item.
 When generating the concept classes automatically, there may not be a name applicable to the
concept but just a statistical significance.
 Finally, a special class of indexing can be defined by creation of hypertext linkages.
 These linkages provide virtual threads of concepts between items versus directly defining the
concept within an item.
 Each technique has its own strengths and weaknesses.

STATISTICAL INDEXING
• Statistical indexing uses the frequency of occurrence of events to calculate a number that indicates the potential relevance of an item.
• The documents are found by a normal Boolean search, and then statistical calculations are performed on the hit file to rank the output (e.g., the term-frequency algorithm).
1. Probability weighting
2. Vector Weighting
1. Simple Term Frequency algorithm
2. Inverse Document Frequency algorithm
3. Signal Weighting
4. Discrimination Value
5. Problems with the weighting schemes and vector model
3. Bayesian Model
1. Probabilistic weighting
• Probabilistic systems attempt to calculate a probability value that should be invariant to both the calculation method and the text corpora (large collections of written/spoken texts).
• The probabilistic approach is based on the direct application of the theory of probability to information retrieval systems.
• Advantage: it uses probability theory to develop the algorithms.
• This allows easy integration of the final results when searches are performed across multiple databases that use different search algorithms.
• The use of probability theory is a natural choice because it is the basis of evidential reasoning (drawing conclusions from evidence).
• This is summarized by the PRP (Probability Ranking Principle) and its plausible corollary.
• PRP hypothesis: if a reference retrieval system's response to each request is a ranking of the documents in order of decreasing probability of usefulness to the user, the overall effectiveness of the system to its users is the best obtainable on the basis of the data available.
• Plausible corollary: the techniques for estimating the probabilities of usefulness for output ranking in IR are standard probability theory and statistics.
• Probabilities are based on a binary condition: the item is either relevant or not.
• In an IRS, however, the relevance of an item is a continuous function from non-relevant to absolutely useful.
• Source of problems: problems in the application of probability theory come from the lack of accurate data and the simplified assumptions that are applied to the mathematical modeling.
• These cause the results of probabilistic approaches in ranking items to be less accurate than other approaches.
• An advantage of the probabilistic approach is that it can identify its weak assumptions and work to strengthen them.
• Example: logistic regression.
 The approach starts by defining a Model 0 system.
• In a retrieval system there exists a query qi and a document term di which has a set of attributes (V1, ..., Vn) derived from the query (e.g., counts of term frequency in the query), from the document (e.g., counts of term frequency in the document) and from the database (e.g., the total number of documents in the database divided by the number of documents indexed by the term).
• The logistic reference model uses a random sample of query-document-term triples for which a binary relevance judgment has been made.
o Log O is the logarithm of the odds (logodds) of relevance for term Tk, which is present in document Dj and query Qi.
o The logodds that the ith query is relevant to the jth document is the sum of the logodds for all of its terms:
• The inverse logistic transformation is applied to obtain the probability of relevance of a
document to a query:
o The coefficients of the equation for the logodds are derived for a particular database using a random sample of query-document-term-relevance quadruples and are used to predict the odds of relevance for other query-document pairs.
o Additional attributes of relative frequency in the query (QRF), relative frequency in the document (DRF) and relative frequency of the term in all the documents (RFAD) were included, producing the logodds formula:
o QRF = QAF / (total number of terms in the query), DRF = DAF / (total number of words in the document), and RFAD = (total number of occurrences of the term in the database) / (total number of all words in the database), where QAF and DAF are the absolute frequencies of the term in the query and in the document.
o Logs are used to reduce the impact of raw frequency information and to smooth out skewed distributions.
o A higher maximum likelihood is attained for the logged attributes.
o The coefficients and log(O(R)) were calculated, creating the final formula for ranking for query vector Q', which contains q terms:
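Because the actual coefficients come from fitting a regression on a judged sample, the Python sketch below only illustrates the general shape of the computation; the coefficient values and term attributes are assumed for illustration and are not the published ones.

import math

def term_logodds(qaf, qrf, daf, drf, rfad, coef):
    # One term's contribution to the logodds of relevance: a linear
    # combination of logged attributes (coefficients are hypothetical).
    c0, c1, c2, c3, c4, c5 = coef
    return (c0 + c1 * math.log(qaf) + c2 * math.log(qrf)
               + c3 * math.log(daf) + c4 * math.log(drf)
               + c5 * math.log(rfad))

def probability_of_relevance(term_attributes, coef):
    # Sum the logodds over all terms shared by query and document,
    # then apply the inverse logistic transformation.
    log_odds = sum(term_logodds(*attrs, coef) for attrs in term_attributes)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Each tuple holds (QAF, QRF, DAF, DRF, RFAD) for one shared term.
attrs = [(1, 1 / 5, 4, 4 / 300, 2000 / 5_000_000),
         (2, 2 / 5, 8, 8 / 300, 500 / 5_000_000)]
coefficients = (-3.0, 0.9, 0.3, 0.7, 0.4, -0.6)  # illustrative only
print(probability_of_relevance(attrs, coefficients))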

 The logistic inference method was applied to the test database along with the Cornell SMART
vector system, inverse document frequency and cosine relevance weighting formulas.

 The logistic inference method outperformed the vector method.
 Attempts have been made to combine different probabilistic techniques to get a more accurate value.
o The objective is to have the strong points of the different techniques compensate for each other's weaknesses.
o To date, this combination of probabilities using averages of log-odds has not produced better results and in many cases has produced worse results.
2. Vector Weighting
• The earliest system that investigated the statistical approach is the SMART system of Cornell University, which is based on the vector model.
- A vector is a one-dimensional set of values, where the order (position) of each value is fixed and represents a particular domain.
- Each position in the vector represents a processing token.
- There are two approaches to the domain of values in the vector: binary or weighted.
- Under the binary approach, the domain contains a value of 1 or 0.
- Under the weighted approach, the domain is a set of positive values, where the value of each processing token represents its relative importance in the item.
- A binary vector requires a decision process to determine if the degree to which a particular token represents the semantics of an item is sufficient to include it in the vector.
• For example, a five-page item may have only one sentence like "Standard taxation of the shipment of the oil to refineries is enforced."
• For the binary vector, the concepts of "Tax" and "Shipment" are below the threshold of importance (e.g., assume the threshold is 1.0) and they are not included in the vector.
Binary and Vector Representation of an Item
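As a small illustration of the two representations for the taxation example, here is a hedged Python sketch; the vocabulary, the weights, and the 1.0 threshold are assumed values.

# Hedged sketch of binary vs. weighted vectors for the example item.
# Vocabulary order is fixed; weights are illustrative term importances.
vocabulary = ["Tax", "Shipment", "Oil", "Refinery"]
weights    = {"Tax": 0.5, "Shipment": 0.5, "Oil": 2.0, "Refinery": 1.0}

THRESHOLD = 1.0   # assumed importance threshold for the binary vector

weighted_vector = [weights.get(term, 0.0) for term in vocabulary]
binary_vector   = [1 if w >= THRESHOLD else 0 for w in weighted_vector]

print(weighted_vector)   # [0.5, 0.5, 2.0, 1.0]
print(binary_vector)     # [0, 0, 1, 1]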

- A weighted vector acts the same as a binary vector but provides a range of values that accommodates variance in the relative importance of processing tokens in representing the semantics of the item.
• The use of weights also provides a basis for determining the rank of an item.
 The vector approach allows for mathematical and physical representation using a
vector space model.
• Each processing token can be considered another dimension in an item representation space.

• The figure below shows a 3-D vector representation, assuming there were only three processing tokens: Petroleum, Mexico, and Oil.

Fig: Vector Representation


2.1. Simple Term Frequency Algorithms
 In both the weighted and unweighted approaches, an automatic indexing process implements an algorithm to determine the weight to be assigned to a processing token.
 In a statistical system, the data that are potentially available for calculating a weight are the frequency of occurrence of the processing token in an existing item (i.e., term frequency - TF), the frequency of occurrence of the processing token in the existing database (i.e., total frequency - TOTF) and the number of unique items in the database that contain the processing token (i.e., item frequency - IF, frequently labeled in other systems as document frequency - DF).
 Simplest approach is to have the weight equal to the term frequency.
 If the word “computer” occurs 15 times within an item it has a weight of 15.
 The term frequency weighting formula is (1 + log(TF)) / (1 + log(average(TF))), and the pivoted normalization is (1 - slope) * pivot + slope * (number of unique terms),
• where slope was set at 0.2 and the pivot was set to the average number of unique terms occurring in the collection.
• Slope and pivot are constants for any document/query set.
• This leads to the final algorithm that weights each term by the above formula divided by the pivoted normalization:
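A minimal Python sketch of this pivoted term-frequency weighting, under the reconstruction above; the example frequencies and the pivot value are assumed.

import math

def pivoted_tf_weight(tf, avg_tf, n_unique_terms, pivot, slope=0.2):
    # Log-dampened term frequency divided by the pivoted normalization.
    dampened = (1 + math.log(tf)) / (1 + math.log(avg_tf))
    normalization = (1 - slope) * pivot + slope * n_unique_terms
    return dampened / normalization

# Example: "computer" occurs 15 times in an item whose average term
# frequency is 3, with 100 unique terms; the collection-wide pivot
# (average number of unique terms per item) is assumed to be 120.
print(pivoted_tf_weight(tf=15, avg_tf=3, n_unique_terms=100, pivot=120))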

2.2. Inverse Document Frequency (IDF)
• To enhance the weighting algorithm, the weight assigned to a term should be inversely proportional to the frequency with which the term occurs in the database.
• The term "computer" represents a concept used in an item, but it does not help a user find the specific information being sought, since a search on it returns the complete database.
• This leads to the general statement, enhancing the weighting algorithms, that the weight assigned to a term should be inversely proportional to its frequency of occurrence in the database.
• This algorithm is called inverse document frequency (IDF).
• The un-normalized weighting formula is:
WEIGHTij = TFij * [log2(n) - log2(IFij) + 1]
• where WEIGHTij is the vector weight that is assigned to term “j” in item “i,”
• Tfij (term frequency) is the frequency of term “j” in item “i”, “n” is the number of items in the
database and
• IFij (item frequency or document frequency) is the number of items in the database that have
term “j” in them.
• For example, assume that the term "oil" is found in 128 items, "Mexico" is found in 16 items and "refinery" is found in 1024 items.
• A new item arrives with all three terms in it: "oil" found 4 times, "Mexico" found 8 times, and "refinery" found 10 times, and there are 2048 items in the total database.
• The weight calculations using inverse document frequency are: oil = 4 * (11 - 7 + 1) = 20, Mexico = 8 * (11 - 4 + 1) = 64, refinery = 10 * (11 - 10 + 1) = 20,
• with the resultant inverse document frequency item vector = (20, 64, 20).
• The value of “n” and IF vary as items are added and deleted from the database.
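The worked example above can be checked with a short Python sketch of the un-normalized IDF formula:

import math

def idf_weight(tf, n_items, item_freq):
    # Un-normalized IDF weight: TF * (log2(n) - log2(IF) + 1).
    return tf * (math.log2(n_items) - math.log2(item_freq) + 1)

# Worked example from the text: 2048 items in the database.
n = 2048
terms = {            # term: (TF in the new item, items containing the term)
    "oil":      (4, 128),
    "Mexico":   (8, 16),
    "refinery": (10, 1024),
}
vector = [idf_weight(tf, n, if_) for tf, if_ in terms.values()]
print(vector)        # [20.0, 64.0, 20.0]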

2.3. Signal Weighting
• IDF adjusts the weight of a processing token for an item based upon the number of items that contain the term in the existing database.
• What it does not account for is the term frequency distribution of the processing token within the items that contain the term, which can affect the ability to rank items.
• For example, assume the terms “SAW” and “DRILL” are found in 5 items with the following
frequencies:

• In Information Theory, the information content value of an object is inversely proportional to
the probability of occurrence of the item.
• An instance of an event that occurs all the time has less information value than an instance of
a seldom occurring event.
• This is typically represented as INFORMATION = -Log2(p), where p is the probability of
occurrence of event “p.”
• The information value for an event that occurs 0.5% of the time is -log2(0.005) = 7.64, while for an event that occurs 50% of the time it is -log2(0.5) = 1.
• If there are many independently occurring events, then the average information value across the events is AVE_INFO = -SUMk (pk * log2(pk)).
• AVE_INFO takes its maximum value when the frequencies are uniformly distributed; its value decreases proportionally to increases in the variance of the frequencies. The probability pk can be defined as TFik / TOTFk, the frequency of the term in item "i" divided by its total frequency in the database.
• The formula for calculating the weighting factor called Signal (Dennis-67) is Signalk = log2(TOTFk) - AVE_INFOk,
• producing a final formula of Weightik = TFik * Signalk.
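A minimal sketch of the signal computation, using assumed per-item frequencies for "SAW" (uniform) and "DRILL" (skewed); the original table's values are not reproduced here, so both totals of 50 are illustrative.

import math

def signal_weight(item_freqs):
    # Signal factor for a term: log2(TOTF) minus the average
    # information of its frequency distribution across items.
    totf = sum(item_freqs)
    ave_info = -sum((tf / totf) * math.log2(tf / totf) for tf in item_freqs)
    return math.log2(totf) - ave_info

# Assumed distributions over 5 items (both terms total 50 occurrences):
saw   = [10, 10, 10, 10, 10]   # uniform  -> lower signal
drill = [2, 2, 18, 10, 18]     # skewed   -> higher signal
print(signal_weight(saw), signal_weight(drill))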

• The weighting factor for term “DRILL” that does not have a uniform distribution is larger
than that for term “SAW” and gives it a higher weight.
• This technique could be used by itself or in combination with inverse document frequency or
other algorithms.
• The overhead of the additional data needed in an index and the calculations required to get
the values have not been demonstrated to produce better results.
• It is a good example of use of Information Theory in developing information retrieval
algorithms.
2.4. Discrimination Value
• Another approach creates a weighting algorithm based on the discrimination value of a term.
• The more that all items appear the same, the harder it is to identify those that are needed.
• Salton and Yang proposed a weighting algorithm that takes into consideration the ability of a search term to discriminate among items.
• They proposed use of a discrimination value for each term "i": DISCRIMi = AVESIMi - AVESIM,
• where AVESIM is the average similarity between every item in the database and AVESIMi is the same calculation except that term "i" is removed from all items.

• The DISCRIMi value can be positive, close to zero, or negative.


• A positive value indicates that removal of term "i" has increased the similarity between items.
• In this case, leaving the term in the database assists in discriminating between items and is of value.
• A value close to zero implies that the term’s removal or inclusion does not change the
similarity between items.

• If the value is negative, the term’s effect on the database is to make the items appear more
similar since their average similarity decreased with its removal.
• Once the value of DISCRIMi is normalized as a positive number, it can be used in the standard weighting formula as Weightik = TFik * DISCRIMk.
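A hedged sketch of the discrimination-value calculation over a tiny made-up database, using cosine similarity as the item-similarity measure:

import itertools
import math

def cosine(u, v):
    # Cosine similarity between two term-weight vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def average_similarity(vectors):
    # Average pairwise similarity across all items.
    pairs = list(itertools.combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def discrimination_value(vectors, term_index):
    # DISCRIM_i = AVESIM_i (term removed from all items) - AVESIM.
    avesim = average_similarity(vectors)
    removed = [v[:term_index] + v[term_index + 1:] for v in vectors]
    return average_similarity(removed) - avesim

# Tiny illustrative database of 3 items over 3 terms.
items = [[2, 1, 0], [2, 0, 3], [2, 4, 1]]
# Term 0 occurs uniformly in every item, so it is a poor discriminator
# and its DISCRIM value comes out negative.
print(discrimination_value(items, term_index=0))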

Problems With Weighting Schemes


• Often weighting schemes use information that is based upon processing token distributions
across the database.
• Information databases tend to be dynamic with new items always being added and to a lesser
degree old items being changed or deleted.
• Thus these factors are changing dynamically.
• There are a number of approaches to compensate for the constant changing values.
a. Ignore the variances and calculate weights based upon current values, with the factors
changing over time. Periodically rebuild the complete search database.
b. Use a fixed value while monitoring changes in the factors. When the changes reach a certain
threshold, start using the new value and update all existing vectors with the new value.
c. Store the invariant variables (e.g., term frequency within an item) and at search time
calculate the latest weights for processing tokens in items needed for search terms.
• In the first approach, the database and all term weights are periodically recalculated based upon the most recent updates to the database.
• For large databases in the millions of items, the overhead of rebuilding the database can be
significant.
• In the second approach, there is a recognition that for the most frequently occurring items, the
aggregate values are large.
• As such, minor changes in the values have negligible effect on the final weight calculation.
• Thus, on a term basis, updates to the aggregate values are only made when enough changes have accumulated that not using the current value would have an effect on the final weights and the search/ranking process.
• This process also distributes the update process over time by only updating a subset of terms at
any instance in time.
• The third approach is the most accurate. The weighted values in the database only matter when
they are being used to determine items to return from a query or the rank order to return the
items.
• This has more overhead in that database vector term weights must be calculated dynamically for
every query term.
• SOLUTIONS:

• If the system is using an inverted file search structure, this overhead is very minor.
• The best environment would allow a user to run a query against multiple different time periods
and different databases that potentially use different weighting algorithms, and have the system
integrate the results into a single ranked Hit file.
Problems With the Vector Model
• A major problem comes in the vector model when there are multiple topics being discussed in a
particular item.
• For example, assume that an item has an in-depth discussion of “oil” in “Mexico” and also “coal”
in “Pennsylvania.” The vector model does not have a mechanism to associate each energy
source with its particular geographic area.
• There is no way to associate correlation factors between terms since each dimension in a vector
is independent of the other dimensions.
• Thus the item results in a high value in a search for “coal in Mexico.”
• Another major limitation of a vector space is in associating positional information with a
processing term.
• The concept of a vector space allows only one scalar value to be associated with each processing
term for each item.
Bayesian Model
• One way of overcoming the restrictions inherent in a vector model is to use a Bayesian approach
to maintaining information on processing tokens.
• The Bayesian model provides a conceptually simple yet complete model for information
systems.
• The Bayesian approach is based upon conditional probabilities (e.g., Probability of Event 1
given Event 2 occurred).
• This general concept can be applied to the search function as well as to creating the index to the
database.
• The objective of information systems is to return relevant items.
• Thus the general case, using the Bayesian formula, is P(REL/DOCi , Queryj) which is
interpreted as the probability of relevance (REL) to a search statement given a particular
document and query.
• In addition to search, Bayesian formulas can be used in determining the weights associated with
a particular processing token in an item.
• The objective of creating the index to an item is to represent the semantic information in the
item.
• A Bayesian network can be used to determine the final set of processing tokens (called topics)
and their weights.

• The figure gives a simple view of the process, where Ti represents the relevance of topic "i" in a particular item and Pj represents a statistic associated with the event of processing token "j" being present in the item.

• “m” topics would be stored as the final index to the item.


• The statistics associated with the processing token are typically frequency of occurrence.
• But they can also incorporate proximity factors that are useful in items that discuss multiple
topics.
• There is one major assumption made in this model, the Assumption of Binary Independence: the topics and the processing token statistics are independent of each other. The existence of one topic is not related to the existence of the other topics, and the existence of one processing token is not related to the existence of other processing tokens.
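Under this binary independence assumption, per-token evidence can simply be combined by summing log-likelihood ratios, as in this naive-Bayes-style sketch with made-up conditional probabilities:

import math

def relevance_score(token_probs_given_rel, token_probs_given_nonrel):
    # With binary independence, the log-odds of relevance is the sum
    # of per-token log-likelihood ratios.
    return sum(math.log(p_rel / p_non)
               for p_rel, p_non in zip(token_probs_given_rel,
                                       token_probs_given_nonrel))

# Made-up conditional probabilities for three processing tokens.
p_given_relevant     = [0.8, 0.6, 0.3]
p_given_not_relevant = [0.2, 0.5, 0.4]
print(relevance_score(p_given_relevant, p_given_not_relevant))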
• In most cases this assumption is not true. Some topics are related to other topics and some
processing tokens related to other processing tokens.
• For example, the topics of “Politics” and “Economics” are in some instances related to each
other and in many other instances totally unrelated.
• The same type of example would apply to processing tokens.
• There are two approaches to handling this problem.
• The first is to assume that there are dependencies, but that the errors introduced by assuming mutual independence do not noticeably affect the determination of relevance of an item nor its relative rank with respect to the other retrieved items.
• This is the most common approach used in system implementations.
• A second approach can extend the network to additional layers to handle interdependencies.
• Thus an additional layer of Independent Topics (ITs) can be placed above the Topic layer and a
layer of Independent Processing Tokens (IPs) can be placed above the processing token layer.

Extended Bayesian Network

• The new set of Independent Processing Tokens can then be used to define the attributes
associated with the set of topics selected to represent the semantics of an item.
• To compensate for dependencies between topics the final layer of Independent Topics is
created.
• The degree to which each layer is created depends upon the error that could be introduced by
allowing for dependencies between Topics or Processing Tokens.
• Although this approach is the most mathematically correct, it suffers from losing a level of
precision by reducing the number of concepts available to define the semantics of an item.

NATURAL LANGUAGE
• The goal of natural language processing is to use the semantic information in addition to the
statistical information to enhance the indexing of the item.
• This improves the precision of searches, reducing the number of false hits a user reviews.

• The semantic information is extracted as a result of processing the language rather than
treating each word as an independent entity.
• The simplest output of this process results in generation of phrases that become indexes to an
item.
• More complex analysis generates thematic representation of events rather than phrases.
• Statistical approaches use proximity as the basis behind determining the strength of word
relationships in generating phrases.
• For example, with a proximity constraint of adjacency, the phrases “venetian blind” and “blind
Venetian” may appear related and map to the same phrase.
• But syntactically and semantically those phrases are very different concepts.
• Word phrases generated by natural language processing algorithms enhance indexing
specification and provide another level of disambiguation.
• Natural language processing can also combine the concepts into higher level concepts
sometimes referred to as thematic representations.
1. Index Phrase Generation
• The goal of indexing is to represent the semantic concepts of an item in the information system
to support finding relevant information.
• Single words have conceptual context, but frequently they are too general to help the user find
the desired information.
• Term phrases allow additional specification and focusing of the concept to provide better
precision and reduce the user’s overhead of retrieving non-relevant items.
• Having the modifier “grass” or “magnetic” associated with the term “field” clearly
disambiguates between very different concepts.
• One of the earliest statistical approaches to determining term phrases uses a COHESION factor between terms (Salton-83): COHESIONk,h = SIZE-FACTOR * (PAIR-FREQk,h / (TOTFk * TOTFh)),
• where SIZE-FACTOR is a normalization factor based upon the size of the vocabulary and PAIR-FREQk,h is the total frequency of co-occurrence of the pair Termk, Termh in the item collection.
• Co-occurrence may be defined in terms of adjacency, word proximity, sentence proximity, etc.
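A minimal sketch of the cohesion calculation; the SIZE-FACTOR and the frequencies below are assumed for illustration.

def cohesion(pair_freq, totf_k, totf_h, size_factor=1.0):
    # COHESION_{k,h} = SIZE-FACTOR * pair_freq / (TOTF_k * TOTF_h);
    # size_factor is a vocabulary-size normalization (assumed 1.0 here).
    return size_factor * pair_freq / (totf_k * totf_h)

# Example: "venetian" and "blind" co-occur adjacently 40 times;
# their individual total frequencies are 120 and 300 (illustrative numbers).
print(cohesion(pair_freq=40, totf_k=120, totf_h=300))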
• This initial algorithm has been modified in the SMART system to be based on the following
guidelines
• any pair of adjacent non-stop words is a potential phrase
• any pair must exist in 25 or more items
• phrase weighting uses a modified version of the SMART system single term algorithm
• normalization is achieved by dividing by the length of the single-term sub-vector.
• Statistical approaches tend to focus on two term phrases.
• The advantage of natural language approaches is their ability to produce multiple-term phrases to denote a single concept.
• If a phrase such as “industrious intelligent students” was used often, a statistical approach would
create phrases such as “industrious intelligent” and “intelligent student.”
• A natural language approach would create phrases such as “industrious student,” “intelligent
student” and “industrious intelligent student.”
• The first step in a natural language determination of phrases is a lexical analysis of the input.
• In its simplest form this is a part of speech tagger that, for example, identifies noun phrases by
recognizing adjectives and nouns.
• Precise part of speech taggers exist that are accurate to the 99 per cent range.
• Additionally, proper noun identification tools exist that allow for accurate identification of
names, locations and organizations since these values should be indexed as phrases and not
undergo stemming.
• The Tagged Text Parser (TTP), based upon the Linguistic String Grammar (Sager-81), produces a regularized parse tree representation of each sentence, reflecting the predicate-argument structure (Strzalkowski-93).
• The tagged text parser contains over 400 grammar production rules.
• The TTP parse trees are header-modifier pairs where the header is the main concept and the
modifiers are the additional descriptors that form the concept and eliminate ambiguities.
• Example: the regularized parse tree structure generated for "The former Soviet President has been a local hero":
|assert
  perf[HAVE]
  verb[BE]
  subject
    np
      noun[President]
      t_pos[The]
      adj[former]
      adj[Soviet]
  object
    np
      noun[hero]
      t_pos[a]
      adj[local]
• To determine if a header-modifier pair warrants indexing, Strzalkowski calculates a value for Informational Contribution (IC) for each element in the pair.
• The basis behind the IC formula is a conditional probability between the terms. The formula for IC between two terms (x, y) is IC(x, [x, y]) = fx,y / (nx + D(x) - 1),
• where fx,y is the frequency of (x, y) in the database, nx is the number of pairs in which "x" occurs at the same position as in (x, y), and D(x) is the dispersion parameter, which is the number of distinct words with which x is paired. When IC = 1, x occurs only with y.
• The IC values are then used in a formula for weighting phrases, in which a selector is 1 for i < N and 0 otherwise and C1, C2 are normalizing factors.
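A small sketch of the IC calculation; the pair counts and dispersion are assumed for illustration. Note that when x is paired only with y (pair_freq equals n_x and D(x) = 1), the value evaluates to 1, matching the statement above.

def informational_contribution(pair_freq, n_x, dispersion_x):
    # IC(x, [x, y]) = f_{x,y} / (n_x + D(x) - 1): how strongly word x
    # contributes information when paired with y in a header-modifier pair.
    return pair_freq / (n_x + dispersion_x - 1)

# Example: "student" appears in 200 header-modifier pairs at this position,
# is paired with 45 distinct words, and co-occurs with "intelligent"
# 30 times (illustrative numbers).
print(informational_contribution(pair_freq=30, n_x=200, dispersion_x=45))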
2. Natural Language Processing
• Natural language processing not only produces more accurate term phrases, but can provide higher level semantic information identifying relationships between concepts.
• The system adds the functional processes Relationship Concept Detectors, Conceptual Graph Generators and Conceptual Graph Matchers, which generate higher level linguistic relationships including semantic and discourse level relationships.
• During the first phase of this approach, the processing tokens in the document are mapped to
Subject Codes.
• These codes equate to index term assignment and have some similarities to the concept-based
systems.
• The next phase is called the Text Structurer, which attempts to identify general discourse level
areas within an item.
• The next level of semantic processing is the assignment of terms to components, classifying the
intent of the terms in the text and identifying the topical statements.
• The next level of natural language processing identifies interrelationships between the concepts.
• The final step is to assign final weights to the established relationships.
• The weights are based upon a combination of statistical information and values assigned to the
actual words used in establishing the linkages.

CONCEPT INDEXING
• Natural language processing starts with a basis of the terms within an item and extends the
information kept on an item to phrases and higher level concepts such as the relationships
between concepts.
• Concept indexing takes the abstraction a level further.
• Its goal is to use concepts instead of terms as the basis for the index, producing a reduced-dimension vector space.
• Concept indexing can start with a number of unlabeled concept classes and let the information in the items define the concept classes that are created.
• A term such as "automobile" could be associated with concepts such as "vehicle," "transportation," "mechanical device," "fuel," and "environment."
• The term "automobile" is strongly related to "vehicle," less strongly to "transportation," and much less strongly to the other terms.
• Thus a term in an item needs to be represented by many concept codes with different weights
for a particular item.
• The basis behind the generation of the concept approach is a neural network model.
• Special rules must be applied to create a new concept class.
• The example demonstrates how the process would work for the term "automobile."
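A hedged sketch of the resulting data structure, with assumed concept classes and weights for "automobile"; in an actual system a neural-network-style process would derive these values.

# Each term maps to several concept classes with different weights;
# an item's concept vector is the weighted sum over its terms.
concept_weights = {
    "automobile": {"vehicle": 0.9, "transportation": 0.6,
                   "mechanical device": 0.3, "fuel": 0.2, "environment": 0.1},
    "refinery":   {"fuel": 0.8, "mechanical device": 0.4, "environment": 0.3},
}

def item_concept_vector(term_frequencies):
    # Combine per-term concept weights, scaled by term frequency.
    vector = {}
    for term, tf in term_frequencies.items():
        for concept, w in concept_weights.get(term, {}).items():
            vector[concept] = vector.get(concept, 0.0) + tf * w
    return vector

print(item_concept_vector({"automobile": 3, "refinery": 1}))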

HYPERTEXT LINKAGES
• A new class of information representation, hypertext, is evolving on the Internet.
• Hypertext linkages need to be generated manually, creating an additional information retrieval dimension.
• Traditionally the document was viewed as two-dimensional:
• the text of the item as one dimension and its references as the second dimension.
• Hypertext, with its linkages to additional electronic items, can be viewed as networking between items that extends their content; embedding a linkage allows the user to go immediately to the linked item.
• Issue: how to use this additional dimension to locate relevant information.

• On the Internet there are three classes of mechanisms to help find information.


1. Manually generated indexes, e.g., www.yahoo.com, where information sources on the home page are indexed manually into a hyperlink hierarchy.
- The user navigates through the hierarchy by expanding the hyperlinks.
- At some point the user starts to see the end items.
2. Automatically generated indexes: sites like lycos.com, altavista.com and google.com automatically go to other Internet sites and index the text that is returned.
3. Web crawlers: a web crawler (also known as a Web spider or Web robot) is a program or automated script which browses the World Wide Web in a methodical, automated manner.

• These are tools that allow a user to define items of interest; they automatically go to various sites on the net and search for the desired information, making them search tools rather than indexing tools. A minimal crawler sketch follows.
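A minimal, hedged crawler sketch in Python; the seed URL and page limit are arbitrary, and it assumes the third-party requests and beautifulsoup4 packages are available.

from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def crawl(start_url, max_pages=10):
    # Breadth-first crawl: fetch a page, extract its hyperlinks,
    # and queue them for visiting until the page limit is reached.
    seen, queue, pages = {start_url}, deque([start_url]), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            response = requests.get(url, timeout=5)
        except requests.RequestException:
            continue
        pages[url] = response.text
        soup = BeautifulSoup(response.text, "html.parser")
        for anchor in soup.find_all("a", href=True):
            link = urljoin(url, anchor["href"])
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

# Example (hypothetical seed URL):
# documents = crawl("https://example.com", max_pages=5)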
• What is needed is an indexing algorithm for items that looks at hypertext linkages as an extension of the concept where the link exists.
• Currently, a concept is defined by the proximity of information.
• Attempts have been made to automatically generate hyperlinks between items, but they suffer from the dynamically growing databases.
• A significant portion of the errors has come from parsing rather than algorithm problems.
