An Efficient Implementation of String Pattern Matching Machines
for a Finite Number of Keywords

Jun-ichi Aoe
Department of Information Science and System Engineering,
The University of Tokushima,
Minami-Josanjima-Cho,
Tokushima-City
770, Japan.

Abstract

This paper describes a method of implementing a static transition table of a string pattern
matching machine that locates all occurrences of a finite number of keywords in a text string. The
scheme combines the fast access of an array representation with the compactness of a list
structure. Each transition can be computed from the presented data structure in O(1) time, and the
storage is as small as that of the list structure. The construction and pattern matching programs
associated with the presented data structure are provided, and the efficiency is evaluated by
empirical results.

Key Words: A string pattern matching algorithm, information retrieval, data structure,
space saving, static set of keywords, text-scanning.

I. Introduction

A string pattern matching machine has been applied as a subprocess of many
information retrieval models: for example, a lexical analyzer, code optimization and code
generation of a compiler[2],[9],[11],[12],[13],[18],[23]; a library bibliographic search[1],[4];
text-editing; a spell checker[19]; filtering high frequency words in natural language
processing[7],[21]; and so on. Recently, Aho et al.[1] presented an efficient string pattern
matching algorithm to locate all occurrences of any of a finite number of keywords in a text
string, and Knuth et al.[16] presented a fast matching algorithm to find all occurrences of one
given keyword in a text string. This paper discusses an efficient technique for implementing a
machine based on that of Aho et al.[1].
The matching machine accounts for a considerable portion of the total running time
of these models, since it is a low level process that must look at the input one symbol at a time.
Therefore, a fast implementation of the transition table of the string pattern matching machine
should be selected. A matrix representation, indexed by states and input symbols, is the fastest,
but it can take up too much space. A list representation consisting of the transitions out of each
state can achieve a space saving over the matrix representation, but is slower. The data
structure for the finite state machine suggested by S. C. Johnson[14] and shown by Aho et al.[2]
is more complex to implement, but it combines the time efficiency of the matrix representation
with the space efficiency of the list representation. This data structure uses three
one-dimensional arrays to represent the transition table of a finite state machine. It has been
widely used through the compiler-compilers LEX[18] and YACC[14] in UNIX systems[2].


Although the presented data structure is related to the three-array structure, the following
improvements over the three-array structure are introduced by restricting the transition table
of the finite state machine to that of the pattern matching machine.
1) One array is removed from the three-array structure by defining interdependent relations among
the state numbers.
2) The states with more than one nonfail value are stored in the fast access structure obtained by 1),
and the states with only one nonfail value are encoded in a string memory.
The major functions of the construction and matching programs associated with the presented
data structure are given in C. The presented scheme is evaluated briefly by
theoretical observation, and the evaluation is supported by simulation results for various
sets of keywords.

II. A Pattern Matching Machine

A. Formal Definition

In this paper a string is simply a finite sequence of symbols. Let K be a finite set of strings
which we shall call keywords. A string pattern matching machine for K is a program which
takes as input the text string w and produces as output the locations in w at which keywords of
K appear as substrings. In order to simplify the discussion we consider the matching machine
which takes as input one specified string x, called a word, of the text string w and produces as
output whether x is in K or not. To avoid confusion between words like THE and THEN, let us
add a special endmarker symbol, #, to the ends of all keywords, so that no keyword can be
a prefix of another keyword. The set of these endmarked keywords is again denoted as K. We define
a string pattern matching machine as M = (S, I, g), where
(1) S is a finite set of states. Each state is represented by an integer greater than zero.
One state (usually 1) is designated as the initial state in S.
(2) I is a finite set of input symbols.
(3) g is a function from $S \times (I \cup \{\#\})$ to $S \cup \{\mathit{fail}\}$, called a goto function.
A state s with no transitions leaving it is called an accepting state. Namely, x# is in K if
and only if a sequence of transitions from 1 to an accepting state s spells out x#. The
transition labeled a from s to s' indicates that g(s, a) = s'. The absence of a transition
indicates fail. The transition graph of the goto function is called a goto graph. The following
(usual) conventions hold in this paper.
    $a, b, c, d \in I \cup \{\#\}$;
    $x, y, z \in (I \cup \{\#\})^{*}$.
Let $\varepsilon$ be the empty string. The goto function for K is extended to strings by the
conditions
    $g(s, \varepsilon) = s$,
    $g(s, ax) = g(g(s, a), x)$.
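
The extension of g to strings can be sketched in C as follows. This is only an illustration under stated assumptions: the names goto_tab, goto_string, load_example, MAX_STATES and MAX_SYMBOLS are hypothetical and do not come from the paper, and fail is encoded as a zero entry.

```c
#include <assert.h>

#define MAX_STATES  16    /* assumed capacity for this sketch */
#define MAX_SYMBOLS 256   /* assumed: one column per unsigned char value */
#define FAIL 0            /* fail is encoded as a zero entry */

/* goto_tab[s][a] holds g(s, a); every entry starts as FAIL. */
static int goto_tab[MAX_STATES][MAX_SYMBOLS];

/* Extended goto: g(s, empty) = s and g(s, ax) = g(g(s, a), x). */
int goto_string(int s, const char *x)
{
    while (*x != '\0' && s != FAIL) {
        s = goto_tab[s][(unsigned char)*x];
        ++x;
    }
    return s;
}

/* Hypothetical helper: install the path 1 -a-> 2 -t-> 3 -#-> 4 for "at#". */
void load_example(void)
{
    goto_tab[1]['a'] = 2;
    goto_tab[2]['t'] = 3;
    goto_tab[3]['#'] = 4;
}
```

After load_example(), goto_string(1, "at#") walks the whole path and yields the accepting state 4, while any string that leaves the path yields FAIL.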

B. Construction of the goto function

The goto function g can be produced by the following program based on an algorithm of Aho
et al.[1]. In this program, the function g is implemented by a two-dimensional array GOTO, where
the entry for state s and input symbol a is denoted as GOTO[s][a] and the fail value is
denoted as a zero entry. We assume that GOTO[s][a] = 0 if g(s, a) has not yet been defined. The
function ENTER(key_word) inserts into the goto graph a path that spells out "key_word".


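#include <stdio.h>
#include <string.h>

#define TRUE  1
#define FALSE 0
#define MAX_STATES  1024   /* assumed capacity of the goto graph */
#define MAX_SYMBOLS 128    /* number of input symbols (7-bit codes assumed) */

int GOTO[MAX_STATES][MAX_SYMBOLS];   /* a zero entry denotes fail */
int new_state;

int ENTER(char key_word[]);

int main(void)
{
    char key_word[256];

    new_state = 1;
    do {
        /* read one keyword per line; an empty line or end of input stops */
        if (fgets(key_word, sizeof key_word, stdin) == NULL)
            key_word[0] = '\0';
        key_word[strcspn(key_word, "\n")] = '\0';
    } while (ENTER(key_word) == TRUE);
    return 0;
}

int ENTER(char key_word[])
{
    int s = 1, j = 0;

    if (strlen(key_word) == 0) return FALSE;
    /* follow the part of the path that already exists */
    while (GOTO[s][(int)key_word[j]] != 0) {
        s = GOTO[s][(int)key_word[j]];
        ++j;
    }
    /* append new states for the remaining symbols */
    while (key_word[j] != '\0') {
        ++new_state;
        GOTO[s][(int)key_word[j]] = new_state;
        s = new_state;
        ++j;
    }
    return TRUE;
}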

The following program summarizes the behavior of a matching program by using the array
GOTO.

int GOTO_MATCH(char word[])
{
    int s, next_state, input_index;

    s = 1;
    input_index = -1;
    do {
        ++input_index;
        next_state = GOTO[s][(int)word[input_index]];
        if (next_state == 0) return FALSE;   /* no transition: fail */
        s = next_state;
    } while (word[input_index] != '#');      /* stop after the endmarker */
    return TRUE;
}
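
The role of the endmarker can be seen in a small self-contained sketch. The names trie, enter, match, last_state and the capacities below are assumptions made for this illustration, not the paper's code; keywords are assumed to be passed with the trailing '#'.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STATES  64    /* assumed capacity */
#define MAX_SYMBOLS 256   /* assumed: one column per unsigned char value */

static int trie[MAX_STATES][MAX_SYMBOLS];  /* 0 = fail */
static int last_state = 1;                 /* state 1 is the initial state */

/* Insert a keyword (already ending in '#') as a path from state 1. */
void enter(const char *key)
{
    int s = 1;
    size_t j;
    for (j = 0; key[j] != '\0'; ++j) {
        unsigned char a = (unsigned char)key[j];
        if (trie[s][a] == 0) {
            assert(last_state + 1 < MAX_STATES);
            trie[s][a] = ++last_state;
        }
        s = trie[s][a];
    }
}

/* Return 1 if word (ending in '#') spells a path to an accepting state. */
int match(const char *word)
{
    int s = 1;
    size_t j;
    for (j = 0; word[j] != '\0'; ++j) {
        s = trie[s][(unsigned char)word[j]];
        if (s == 0) return 0;          /* no transition: fail */
        if (word[j] == '#') return 1;  /* endmarker consumed */
    }
    return 0;  /* input ended before the endmarker */
}
```

With the#  and then# entered, the prefix th# is rejected because state "th" has no transition on '#', which is exactly what the endmarker is meant to guarantee.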


III. Efficient Implementation

A. A Compact Array Structure

A triple-array structure using three one-dimensional arrays BASE, CHECK and NEXT, as
shown in Fig. 1, was defined by S. C. Johnson[14] (the details are found in Aho et al.[2]) as a
static implementation scheme for the transition tables of YACC[2],[14] and LEX[2],[18]. The goto
value g(s, a) = s' is computed from the triple-array by the following steps.
First, the index t of the array CHECK is computed as t = BASE[s] + a.
If CHECK[t] is equal to the current state s, then g(s, a) becomes NEXT[t]; otherwise g(s, a) fails.
The excellent feature of the triple-array structure is that the indexes to the arrays NEXT
and CHECK are efficiently computed from the array BASE and the numerical value of the input
symbol. Thus, it can achieve fast retrieval from the transition table.
By restricting the use of the triple-array to the transitions of a string pattern matching
machine, the following alternative data structure, called a double-array, can be defined.
For s and s' in S, g(s, a) = s' if and only if BASE and CHECK for K satisfy
    s' = BASE[s] + a and CHECK[s'] = s.
In the double-array structure, the next state s' can be computed from
BASE[s] and the current input symbol a, so the array NEXT can be removed from the
triple-array.
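
The two lookups can be compared side by side in a short C sketch. The function names, the explicit size argument and the sample arrays are assumptions for illustration only; they are not taken from YACC, LEX or the paper's programs.

```c
#include <assert.h>

#define FAIL 0
#define SIZE 16   /* assumed array size for this sketch */

/* Triple-array: g(s, a) = NEXT[t] if CHECK[t] == s, where t = BASE[s] + a. */
int triple_goto(const int *base, const int *check, const int *next,
                int size, int s, int a)
{
    int t = base[s] + a;
    if (t <= 0 || t >= size || check[t] != s) return FAIL;
    return next[t];
}

/* Double-array: the slot index t is itself the next state, so NEXT is dropped. */
int double_goto(const int *base, const int *check, int size, int s, int a)
{
    int t = base[s] + a;
    if (t <= 0 || t >= size || check[t] != s) return FAIL;
    return t;
}

/* Sample data: state 1 has transitions on symbols 1 and 2 (BASE[1] = 1). */
static const int T_BASE[SIZE]  = {0, 1};
static const int T_CHECK[SIZE] = {0, 0, 1, 1};
static const int T_NEXT[SIZE]  = {0, 0, 5, 7};  /* arbitrary target states */
```

In the triple-array the targets 5 and 7 are free to be any state numbers, whereas in the double-array the same BASE and CHECK force the targets to be the slot indexes 2 and 3. This is the interdependence among state numbers that lets NEXT be removed.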
The goto graph has many states for a large set of keywords, so it is important to make the
double-array more compact. In the interest of space saving, we define the following terms on
the goto graph.

[Definition 1] The following states are defined on the goto graph.
1) For a key xay in K, the state s' such that g(1, xa) = s' is called a separate state if a is a
sufficient symbol for distinguishing the key xay from all others in K.
2) Each state (except the separate state) on a path from the initial state to a separate state is
called a multistate.
3) Each state (except the separate state) on a path from a separate state to an accepting state
is called a single-state.
Let $S_P$, $S_M$ and $S_I$ be the sets of separate states, multistates and single-states, respectively.

[Definition 2] A string x# such that g(s, x#) = s' for s in $S_P$ and s' in $S_I$ is called the
single-string for the separate state s, denoted by STR[s]. If and only if the separate state s has
no transition, STR[s] = $\varepsilon$.
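
One way to classify the states of a goto graph according to Definition 1 is to count how many keywords pass through each state: the first state on a key's path that only that key visits is its separate state. The sketch below follows this idea. It assumes distinct keywords that already end in '#', and the names trie, parent, passes, enter and classify are hypothetical.

```c
#include <assert.h>

#define MAX_STATES  64    /* assumed capacity */
#define MAX_SYMBOLS 256   /* assumed: one column per unsigned char value */

enum kind { MULTI, SEPARATE, SINGLE };

static int trie[MAX_STATES][MAX_SYMBOLS];  /* 0 = fail */
static int parent[MAX_STATES];             /* predecessor of each state */
static int passes[MAX_STATES];             /* keywords whose path visits the state */
static int last_state = 1;                 /* state 1 is the initial state */

/* Insert a keyword and count the visit on every state of its path. */
void enter(const char *key)
{
    int s = 1;
    ++passes[1];
    for (; *key != '\0'; ++key) {
        unsigned char a = (unsigned char)*key;
        if (trie[s][a] == 0) {
            assert(last_state + 1 < MAX_STATES);
            trie[s][a] = ++last_state;
            parent[last_state] = s;
        }
        s = trie[s][a];
        ++passes[s];
    }
}

/* Classify a state once all keywords have been entered. */
enum kind classify(int s)
{
    if (s == 1 || passes[s] > 1) return MULTI;
    /* exactly one keyword visits s; it is separate if its parent is shared */
    if (parent[s] == 1 || passes[parent[s]] > 1) return SEPARATE;
    return SINGLE;
}
```

For the keys at#, as# and awk#, the states reached by at, as and aw come out as separate states, so their single-strings are #, # and k#.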

Fig. 1 Triple-array structure for g(s, a) = s'.


We propose that the transitions from $S_M \times (I \cup \{\#\})$ to $(S_M \cup S_P)$ be stored in the double-array
and that those from $(S_P \cup S_I) \times (I \cup \{\#\})$ to $S_I$ be stored as single-strings in a string memory,
called TAIL. This is a well-known technique[17]; there is, however, a problem in how to
determine the following:
a) Whether a given state is a separate state or not.
b) The location from which to take a single-string in TAIL.
This problem can be easily solved by using additional arrays, but they can take up too much
space. The following modified double-array and TAIL enable us to solve the problem without
using extra space.

[Definition 3] We define the double-array and TAIL as being valid for K if the following
conditions are satisfied for the goto graph.
(1) For s in $S_M$ and s' in $(S_M \cup S_P)$, g(s, a) = s' if and only if BASE[s] + a = s' and CHECK[s'] = s.
(2) For s in $S_P$ such that STR[s] = $b_1 b_2 \cdots b_m$ ($0 \le m$), BASE[s] is negative and
    TAIL[p] = $b_1$, TAIL[p + 1] = $b_2$, ..., TAIL[p + m - 1] = $b_m$
for p = -BASE[s].

In this structure, as depicted in Fig. 2, BASE[s] serves two purposes: its sign indicates
whether s is a separate state, and its magnitude locates the single-string in TAIL. The goto
graph on $(S_M \cup S_P) \times (I \cup \{\#\})$ is called the reduced goto graph.

[Figure: for a separate state s' with STR[s'] = b1 b2 ... bm, the entry p = -BASE[s'] points to
the position in TAIL where b1 b2 ... bm is stored.]

Fig. 2 Double-array and TAIL structure.

(Example 1) Figure 3 shows the reduced goto graph and the single-strings for
K' = {adb#, apply#, apropos#, ar#, as#, at#, awk#, basename#, bc#, biff#, binmail#, cat#, cc#,
ccat#, cd#, checkeq#, checknr#, chfn#, chgrp#, chmod#, chown#, chsh#},
which is a subset of UNIX 4.2BSD commands. In Fig. 3 the multistates are 1, 2, 4, 11, 14, 17, 19,
23, 24, 25, 26 and the other states are the separate states.

[Figure: reduced goto graph for K'. The single-strings attached to the separate states are
STR[3] = b#, STR[5] = ly#, STR[6] = opos#, STR[7] = #, STR[8] = #, STR[9] = #, STR[10] = k#,
STR[12] = sename#, STR[13] = #, STR[15] = f#, STR[16] = mail#, STR[18] = t#, STR[20] = $\varepsilon$,
STR[21] = t#, STR[22] = #, STR[27] = q#, STR[28] = r#, STR[29] = n#, STR[30] = rp#, STR[31] = od#,
STR[32] = wn#, STR[33] = h#.]

Fig. 3 The reduced goto graph for K'.

B. Construction of the Reduced Data Structure

Since the presented data structure requires state numbers to be related, for s and s' in S such
that g(s, a) = s', the processing of state s must precede that of state s' to obtain the relations.
Thus, the function BUILT_BCT, which constructs the double-array and TAIL, processes the
multistates in the first step and the separate states in the next. We assume that the
double-array has enough available zero entries to accommodate the reduced GOTO array.

The function BUILT_BCT is given below, where it utilizes the following functions and variables.
• new[s]: The entry of the array new that denotes the new state number for state s.
• function IN_QUEUE(s): Store state s in the queue memory. The queue is initially empty.
• function OUT_QUEUE(): Remove one entry from the queue memory and return it. If the
queue is empty, then return FALSE.
• m: The number of input symbols.
• n: The number of states.
• SEPARATE[s]: The entry of an array SEPARATE that is TRUE if state s is a separate state,
and FALSE otherwise.

void BUILT_BCT(void)
{
    int s, j, next_state, tail_index;

    /* Routine for multistates */
    s = 1; new[s] = 1;
    do {
        BASE[new[s]] = 1;
overlap:
/*(a-1)*/ for (j = 1; j <= m; ++j)
            if (GOTO[s][j] != 0 && CHECK[BASE[new[s]] + j] != 0) {
                ++BASE[new[s]];          /* slot taken: try the next displacement */
                goto overlap;
            }
/*(a-2)*/ for (j = 1; j <= m; ++j)
            if (GOTO[s][j] != 0) {
                CHECK[BASE[new[s]] + j] = new[s];
                next_state = GOTO[s][j];
                new[next_state] = BASE[new[s]] + j;   /* renumber the state */
                if (SEPARATE[next_state] == FALSE)
                    IN_QUEUE(next_state);
            }
        s = OUT_QUEUE();
    } while (s != FALSE);

    /* Routine for separate states */
    tail_index = 1;
    strcpy(TAIL, "#");   /* TAIL[0] = '#' is regarded as a dummy entry */
/*(a-3)*/ for (s = 1; s <= n; ++s)
        if (SEPARATE[s] == TRUE) {
            BASE[new[s]] = -tail_index;
            strcat(TAIL, STR[s]);
            tail_index += strlen(STR[s]);
        }
}

The for-statement of line (a-1) selects as the row displacement BASE[new[s]] for state s the
smallest value such that no nonfail value GOTO[s][j] in row s of the array GOTO is mapped to the
same position as any nonfail value in a previous row. The for-statement of line (a-2) defines
the nonfail values GOTO[s][j] on the double-array, while renumbering the states, and stores each
next state that is not a separate state in the queue. The for-statement of line (a-3) stores STR[s]
for each separate state in TAIL while maintaining the condition on BASE[new[s]] of Definition 3-(2).
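
The displacement search of line (a-1) is a first-fit search, which can be isolated in a few lines of C. The sketch below is a simplified illustration and not the paper's routine: find_base, place and BC_MAX are hypothetical names, and each state is described directly by the list of symbol codes on which it has nonfail transitions.

```c
#include <assert.h>

#define BC_MAX 64   /* assumed capacity of BASE and CHECK */

static int CHECK[BC_MAX];   /* 0 marks a free slot */

/* Smallest base >= 1 such that CHECK[base + a] is free for every
   symbol code a in syms[0..k-1]. */
int find_base(const int *syms, int k)
{
    int base, i;
    for (base = 1; ; ++base) {
        for (i = 0; i < k; ++i) {
            assert(base + syms[i] < BC_MAX);
            if (CHECK[base + syms[i]] != 0) break;   /* collision */
        }
        if (i == k) return base;                      /* every slot is free */
    }
}

/* Claim the slots base + a for state s and return the chosen base. */
int place(int s, const int *syms, int k)
{
    int base = find_base(syms, k), i;
    for (i = 0; i < k; ++i)
        CHECK[base + syms[i]] = s;
    return base;
}
```

With the symbol codes a = 1, ..., z = 26, placing state 1 with symbols {a, b, c}, then state 2 with {d, p, r, s, t, w}, then state 3 with {a, c, i} yields the bases 1, 1 and 5, which agree with BASE[1], BASE[2] and BASE[3] in Fig. 4.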

(Example 2) Figure 4 shows the double-array and TAIL for K'. In this
example, the numerical values for a, b, c, ..., z, # are regarded as 1, 2, 3, ..., 26, 27,
respectively.

Index    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
BASE     1   1   5   6  -1  -8 -16 -15 -34   8 -18   2   9  12  23   0   7 -27  -3  -4  -5
CHECK    0   1   1   1   2   3   4   3  10   4   4  28  12   3   4   0   2  14   2   2   2

Index   22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42
BASE    48 -19  -6 -22 -29   0   9 -36 -38 -51   0   0   0 -34 -41   0 -44   0   0   0 -47
CHECK   13  17   2  17  14   0  15  15  15  13   0   0   0  10  15   0  15   0   0   0  15

Index    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
TAIL     b   #   #   #   #   k   #   s   e   n   a   m   e   #   #   t   #   #   l   y   #

Index   22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42
TAIL     o   p   o   s   #   f   #   m   a   i   l   #   t   #   n   #   r   p   #   o   d

Index   43  44  45  46  47  48  49  50  51  52
TAIL     #   w   n   #   h   #   q   #   r   #

Fig. 4 The double-array and TAIL for K'.

C. A Matching Program by the Reduced Data Structure

In order to keep a state number from exceeding the maximum index of the arrays BASE and
CHECK, BASE[0] holds that maximum, denoted by BC-SIZE and defined as the maximum index of
the nonzero entries of the array CHECK.
Suppose that the double-array and TAIL are placed in main memory as global variables. Then,
a matching program is summarized by the following function BCT_MATCH(word), which returns
TRUE if "word" is in K, and FALSE otherwise.


int BCT_MATCH(char word[])
{
    int s, next_state, input_index, tail_index;

    s = 1;
    input_index = -1;
    do {
        ++input_index;
        next_state = BASE[s] + word[input_index];
/*(b-1)*/ if (next_state > BASE[0] || CHECK[next_state] != s) return FALSE;
        s = next_state;
    } while (BASE[s] >= 0);
/*(b-2)*/ if (word[input_index] == '#') return TRUE;
    tail_index = -BASE[s];
    ++input_index;
    do {
/*(b-3)*/ if (TAIL[tail_index] != word[input_index]) return FALSE;
        ++tail_index;
        ++input_index;
    } while (word[input_index - 1] != '#');   /* stop after the endmarker */
    return TRUE;
}

In this program, line (b-1) returns FALSE when a mismatch is detected on the double-array.
The first do-while loop terminates if BASE[s] is negative, that is, state s is a separate state. If
the current input symbol is equal to '#' at line (b-2), then TRUE is immediately returned because
STR[s] = $\varepsilon$. Line (b-3) in the second do-while loop returns FALSE when a mismatch is
detected on TAIL, and the loop terminates when the matching on TAIL succeeds.

(Example 3) Consider the retrieval of keyword cat# by using the double-array and TAIL for
K' as shown in Fig. 4. Note that BC-SIZE is 42. BCT_MATCH("cat#") returns TRUE by the
following computations.
1) For input_index = 0.
    next_state = BASE[s] + word[input_index] = BASE[1] + word[0] = 1 + c = 4
    next_state = 4 < BC-SIZE = 42 and CHECK[next_state] = CHECK[4] = 1
    s = next_state = 4
    BASE[s] = BASE[4] = 6 > 0
2) For input_index = 1.
    next_state = BASE[s] + word[input_index] = BASE[4] + word[1] = 6 + a = 7
    next_state = 7 < BC-SIZE = 42 and CHECK[next_state] = CHECK[7] = 4
    s = next_state = 7
    BASE[s] = BASE[7] = -16 < 0
3) For input_index = 2.
    TAIL[tail_index] = TAIL[16] = t = word[input_index] = word[2]
4) For input_index = 3.
    TAIL[tail_index] = TAIL[17] = # = word[input_index] = word[3]
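
The computations of Example 3 can be replayed with a small C sketch that loads only the entries the example touches. It is an illustration under assumptions: code, load_cat and bct_match are hypothetical names, the symbol codes follow Example 2 (a = 1, ..., z = 26, # = 27), and BC-SIZE is kept in a constant instead of BASE[0].

```c
#include <assert.h>

#define BC_SIZE 42   /* maximum index of the nonzero CHECK entries (Example 3) */

static int  BASE[BC_SIZE + 1];
static int  CHECK[BC_SIZE + 1];
static char TAIL[64];

/* Symbol codes of Example 2; any other character maps to 0 (assumed). */
static int code(char ch)
{
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 1;
    return ch == '#' ? 27 : 0;
}

/* Load only the entries used when matching cat#. */
void load_cat(void)
{
    BASE[1] = 1;                    /* initial state */
    BASE[4] = 6;   CHECK[4] = 1;    /* state reached by c */
    BASE[7] = -16; CHECK[7] = 4;    /* separate state reached by ca */
    TAIL[16] = 't'; TAIL[17] = '#';
}

int bct_match(const char *word)
{
    int s = 1, i = 0, t, p;
    do {                                    /* walk the double-array */
        int a = code(word[i]);
        if (a == 0) return 0;
        t = BASE[s] + a;
        if (t > BC_SIZE || CHECK[t] != s) return 0;
        s = t;
        ++i;
    } while (BASE[s] >= 0);
    if (word[i - 1] == '#') return 1;       /* the single-string is empty */
    p = -BASE[s];
    for (;;) {                              /* compare the rest with TAIL */
        if (word[i] == '\0' || TAIL[p] != word[i]) return 0;
        if (word[i] == '#') return 1;
        ++p;
        ++i;
    }
}
```

After load_cat(), bct_match("cat#") visits states 4 and 7 and then compares t and # against TAIL[16] and TAIL[17], as in steps 1) to 4) above.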


IV. Evaluation

A. Theoretical Observation

• Size of the double-array
As discussed by Aho et al.[2] and Tarjan et al.[24], it is difficult to evaluate BC-SIZE
theoretically by using the external parameters n' (the number of multistates and separate states)
and m (the number of input symbols). However, because of the sparse relations between the states
and input symbols of the transition table, we can assume that BC-SIZE is proportional to n' + cm
for a constant c, where cm is equal to the number of redundant indexes on the arrays BASE and CHECK.
The value c is called a redundant coefficient and will be evaluated by empirical observations.

• Construction Time of the double-array
The worst-case time complexity of the function BUILT_BCT depends on the first
for-statement of line (a-1). The maximum value of BASE[new[s]] in the for-loop of function
BUILT_BCT becomes n' + cm, so the worst-case time for confirming
    (GOTO[s][j] != 0 && CHECK[BASE[new[s]] + j] != 0)
for each s is proportional to $m(n' + cm) = n'm + cm^2$. Since this is repeated for at most n'
states, the worst-case time complexity of constructing the double-array is $O(n'^2 m + c n' m^2)$.

B. Simulation Results

Table I presents the simulation results on a Sun-3/260 workstation for the following sets of
keywords.
KW1: The reserved words for Pascal[25].
KW2: Main city names in Japan.
KW3: The reserved words for COBOL[20].
KW4: Commands in UNIX System V.
KW5: Commands in UNIX 4.2BSD.
KW6: Main city names in the world.
KW7: English words in UNIX's spell dictionary.
In this simulation, the number m of input symbols is 128 and every entry in the
double-array requires two bytes; the double-array for KW7 is divided into two so that each
entry still fits in two bytes. In the storage results, TRANSI stands for the average storage of
one transition on $(S_M \cup S_P) \times (I \cup \{\#\})$ and BCT/SFK stands for the ratio of the size of the
double-array and TAIL to that of the source file of keywords. In the time results, Matching stands
for the average matching time for each keyword. The results show that, as the number of keywords
increases, the number of bytes representing one transition approaches
4.0 and the total size of BASE, CHECK and TAIL approaches that of the source file of
keywords. This depends on the following features.
1) The extremely small value of the redundant coefficient c.
2) The tree structure of the goto graph, which merges common prefixes into the
same transitions.
3) A compact string memory TAIL representing all transitions on single-states, which account for
60% to 76% of the total number of states.
The list forms[17],[19] representing a tree structure require from three to five bytes, so we can
say that the presented data structure is compact.
As may readily be seen in the results, the matching is very fast and the
construction can be performed at a practical speed.


Table I Simulation results.

                               KW1     KW2     KW3     KW4     KW5     KW6      KW7
Number and Length
  Number of Keywords            35      45     310     603     657   1,480   23,976
  Number of Multistates         17      24     301     296     394     981   16,518
  Number of Single-States      109     216     947   1,630   1,730   7,281   59,941
  Number of Total States       161     285   1,558   2,529   2,781   9,742  100,235
  Average Length of Keywords   5.1     6.4     7.5     6.7     6.9     9.5      8.2
Storage (kilo-bytes)
  BASE and CHECK              0.25    0.31    2.69    3.75    4.45    9.89      162
  TAIL                        0.11    0.22    0.95    1.63    1.73    7.28       60
  BASE, CHECK and TAIL        0.36    0.53    3.64    5.38    6.18   17.17      222
  TRANSI                      4.81    4.55    4.42    4.17    4.23    4.02     4.00
  Source File of Keywords     0.18    0.29    2.33    4.02    4.54   14.08      196
  BCT/SFK                     2.00    1.83    1.56    1.34    1.36    1.22     1.13
Redundant Coefficient
  c                           0.08    0.06    0.48    0.30    0.48    0.16     0.11
Time
  Matching (milli-second)     31.4    31.3    30.6    30.3    31.6    29.1     31.7
  Construction (second)        1.2     1.6    10.1    21.1    23.0    52.3      840
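
As a worked check, the derived rows for one column can be recomputed from the raw rows. The sketch below rests on assumptions that the paper does not state explicitly: one kilo-byte is taken as 1000 bytes, BASE and CHECK each have BC-SIZE entries of two bytes, n' is the number of multistates plus the number of keywords (one separate state per keyword), and TRANSI is read as the bytes of BASE and CHECK divided by n'. The function names are hypothetical.

```c
#include <assert.h>

#define ENTRY_BYTES 2   /* two bytes per double-array entry, as stated in the text */

/* BC-SIZE in entries, given the combined bytes of BASE and CHECK. */
double bc_size(double base_check_bytes)
{
    return base_check_bytes / (2.0 * ENTRY_BYTES);
}

/* Redundant coefficient c solved from BC-SIZE = n' + c*m. */
double redundant_coefficient(double base_check_bytes, int n_prime, int m)
{
    return (bc_size(base_check_bytes) - n_prime) / m;
}

/* Average bytes per state of the reduced goto graph (assumed reading of TRANSI). */
double bytes_per_transition(double base_check_bytes, int n_prime)
{
    return base_check_bytes / n_prime;
}
```

For KW4, n' = 296 + 603 = 899 and BASE and CHECK occupy 3.75 kilo-bytes, so BC-SIZE = 3750 / 4 = 937.5, c = (937.5 - 899) / 128, which rounds to 0.30, and the bytes per transition are 3750 / 899, which rounds to 4.17. Under these assumptions both values agree with the KW4 column.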

V. Conclusion

An efficient implementation of a string pattern matching machine has been described, obtained by
improving the three-array structure of the compiler-compilers LEX[18] and YACC[14] in UNIX
systems. It has been shown by empirical results for various sets of keywords that
the obtained data structure is very compact and that the presented matching program is very
fast.
The presented data structure has been used for a lexical analyzer[13]; for filtering high frequency
words in natural language processing[7]; and for a Roman-Hiragana translator[7] of a Japanese
word processor, which converts Roman characters into the corresponding Hiragana characters.
The presented method can be applied to a static key search in place of perfect
hashing[8],[10],[15],[17],[22] and to the reduction algorithm of sparse matrices[5],[24]. It would
be a very interesting study to make the double-array updatable and to extend the use of the
double-array to the transition tables of a finite state machine[2] associated with parsing
tables[14].
The developed program will be provided on request to any readers. Please feel free to
contact me.

References

[1] A. V. Aho and M. J. Corasick, "Efficient string matching: An aid to bibliographic search,"
Comm. ACM, vol. 18, pp. 333-340, June 1975.
[2] A. V. Aho, R. Sethi and J. D. Ullman, Compilers: Principles, Techniques, and Tools,
Addison-Wesley, Reading, Mass., Ch. 3, pp. 144-146, 1986.
[3] J. Aoe, Y. Yamamoto and R. Shimada, "An efficient method for storing and retrieving finite
state machines," (in Japanese), IECE Trans., vol. J65-D, pp. 1235-1242, Oct. 1982; (in English),
Electronica Japonica, vol. 13.
[4] J. Aoe, Y. Yamamoto and R. Shimada, "A method for improving string pattern matching
machines," IEEE Trans. Softw. Eng., vol. SE-10, pp. 116-120, Jan. 1984.
[5] J. Aoe, Y. Yamamoto and R. Shimada, "An efficient algorithm of reducing sparse matrices by
row displacements," (in Japanese), Trans. IPS Japan, vol. 26, pp. 211-218, Mar. 1985.
[6] J. Aoe, Y. Yamamoto and R. Shimada, "An efficient implementation of static string pattern
matching machines," Proc. of the First Int. Conf. on Supercomputing, pp. 491-498, Dec. 1985.
[7] J. Aoe and M. Fujikawa, "An efficient representation of hierarchical semantic primitives - An
aid to machine translation systems," Proc. of the Second Int. Conf. on Supercomputing,
pp. 361-370, May 1987.
[8] F. Berman, E. Bock, E. Dittert, M. J. O'Donnell and D. Plank, "Collections of functions for
perfect hashing," SIAM J. Comput., vol. 15, pp. 604-618, Feb. 1986.
[9] R. G. G. Cattell, "Automatic derivation of code generators from machine descriptions,"
ACM Trans. Prog. Lang. Syst., vol. 4, pp. 173-190, Jan. 1982.
[10] R. J. Cichelli, "Minimal perfect hash functions made simple," Comm. ACM, vol. 23, pp. 17-19,
Jan. 1980.
[11] J. W. Davidson and C. W. Fraser, "The design and application of a retargetable peephole
optimizer," ACM Trans. Prog. Lang. Syst., vol. 4, pp. 21-36, Jan. 1982.
[12] S. L. Graham, "Table-driven code generation," Computer, vol. 13, pp. 25-34, Aug. 1980.
[13] J. Harada, "Pascal compiler by table driven lexical and syntax analyzers," (in Japanese),
Tokushima Univ., Graduation Thesis, 1983.
[14] S. C. Johnson, "YACC - yet another compiler-compiler," CSTR 32, Bell Lab., N. J., pp. 1-34,
1975.
[15] G. Jaeschke, "Reciprocal hashing: A method for generating minimal perfect hashing
functions," Comm. ACM, vol. 24, pp. 829-833, Dec. 1981.
[16] D. E. Knuth, J. H. Morris and V. R. Pratt, "Fast pattern matching in strings," SIAM J.
Comput., vol. 6, pp. 323-350, June 1977.
[17] D. E. Knuth, The Art of Computer Programming, vol. I, Fundamental Algorithms, pp. 295-304,
Addison-Wesley, Reading, Mass.; ibid., vol. III, Sorting and Searching, pp. 481-505, 1973.
[18] M. E. Lesk, "Lex - a lexical analyzer generator," CSTR 39, Bell Lab., N. J., pp. 1-13, Oct. 1975.
[19] J. L. Peterson, Computer Programs for Spelling Correction, Lecture Notes in Comput.
Sci., Springer-Verlag, New York, 1980.
[20] T. Sato, "COBOL Technique," (in Japanese), Tokyo-Denki-Daigaku, 1970.
[21] B. A. Sheil, "Median split trees: A fast lookup technique for frequently occurring keys,"
Comm. ACM, vol. 21, pp. 947-959, Nov. 1978.
[22] R. Sprugnoli, "Perfect hashing functions: A single probe retrieving method for static sets,"
Comm. ACM, vol. 20, pp. 841-850, Nov. 1977.
[23] A. S. Tanenbaum, H. Staveren and J. W. Stevenson, "Using peephole optimization on
intermediate code," ACM Trans. Prog. Lang. Syst., vol. 4, pp. 21-36, Jan. 1982.
[24] R. E. Tarjan and A. C. Yao, "Storing a sparse table," Comm. ACM, vol. 22, pp. 606-611, Nov.
1979.
[25] N. Wirth, "The programming language Pascal," Acta Inf., vol. 1, pp. 35-63, Jan. 1971.
