LIPIcs.CPM.2016.23
Abstract
This paper presents a new approach for linear-time suffix sorting. It introduces a new sorting
principle that can be used to build the first non-recursive linear-time suffix array construction
algorithm, named GSACA. Although GSACA cannot keep up with the performance of state-of-the-art
suffix array construction algorithms, it introduces several new ideas for suffix array
construction, and can therefore be seen as an 'idea collection' for further suffix array
construction improvements.
1 Introduction
The suffix array is an elementary data structure used in string processing as well as in data
compression. Introduced by Manber and Myers in 1990 [11], the suffix array nowadays
finds application in dozens of different areas. Constructing a suffix array from a given
string unfortunately turns out to be a computationally hard task; despite the existence
of linear-time algorithms for suffix array construction, some super-linear algorithms still
achieve better results in practice.
As data grows bigger and bigger, 'optimal' suffix array construction algorithms (SACAs)
remain an area of great interest. According to a survey paper by Puglisi et
al. [19], an 'optimal' SACA fulfils three requirements: First, the algorithm should run in
asymptotically minimal worst-case time, optimally in linear time. Second, the algorithm
should also run fast in practice. Finally, the algorithm should consume as little extra space
in addition to the text and the suffix array as possible, optimally a constant amount.
Presently, no SACA is able to meet all of those requirements in an optimal way. Our
contribution towards this goal will be the presentation of a new design principle for suffix
array construction, resulting in the first non-recursive linear-time suffix array construction
algorithm. Although the new algorithm is not able to fulfil all requirements of optimal suffix
array construction, it presents a new approach for suffix array construction, and therefore
is interesting from a theoretical point of view.
Overview This paper will be organised as follows: Section 2 contains a short introduction
to suffix arrays and basic definitions. Section 3 presents the new sorting principle along
with an introductory example, before Section 4 lists the new algorithm with explanations
of technical details. Section 5 contains performance analyses of the new algorithm, before
Section 6 summarises the results and gives an outline for future work.
© Uwe Baier;
licensed under Creative Commons License CC-BY
27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016).
Editors: Roberto Grossi and Moshe Lewenstein; Article No. 23; pp. 23:1–23:12
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
23:2 Linear-time Suffix Sorting – A New Approach for Suffix Array Construction
Related Work The suffix array was first described in 1990 by Manber and Myers [11] as a
space-saving alternative to suffix trees [21].
Then, in 2003, four linear-time¹ SACAs were introduced contemporaneously by Kim et al. [8],
Kärkkäinen and Sanders [7], Ko and Aluru [10] and Hon et al. [6], before Joong Chae Na
introduced another linear-time SACA in 2005 [15]. Two algorithms stood out: the Skew
Algorithm by Kärkkäinen and Sanders [7] because of its elegance, and the algorithm
by Ko and Aluru [10] because of its good performance in practice.
Later on, in 2009, Nong et al. presented two new algorithms using the induced sorting
principle [17, 18] as an improvement on the algorithm by Ko and Aluru. One of those
algorithms, called SA-IS [17], was able to outperform most other existing SACAs [14]
while guaranteeing asymptotically linear runtime and almost optimal space requirements. In
the meantime, the performance of SA-IS has been improved further, while the required
workspace has been reduced to a linear term that depends only on the alphabet [16].
Consequently, variants of the SA-IS algorithm are the best linear-time SACAs known at the moment.
2 Preliminaries
Let Σ be a totally ordered set (alphabet) of elements (characters). A string S of length n
over alphabet Σ is a finite sequence of n characters originating from Σ. The empty string
with length 0 is denoted by ε.
Let i and j be two integers in range [1, n]. We denote by
S[i] the i-th character of S.
S[i..j] the substring of S starting at the i-th and ending at the j-th position.
We state S[i..j] = ε if i > j, and define S[i..j + 1) = S[i..j].
Si the suffix of S starting at the i-th position, i.e. Si = S[i..n].
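Furthermore, we call S a null-terminated string if $ ∈ Σ, $ < c for all c ∈ Σ \ {$}, and $
occurs exactly once in S, at the end of the string.
The suffix array SA of S is the permutation of [1..n] that lists the suffixes of S in ascending
lexicographic order, i.e. S_SA[1] <lex S_SA[2] <lex · · · <lex S_SA[n]. Additionally, next
lexicographically smaller suffixes are required: for an index i, we denote by î the smallest
index j > i with S_j <lex S_i, and call S_î the next lexicographically smaller suffix of S_i.
For the last suffix of a null-terminated string, î = n + 1.²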
¹ Super-linear-time SACAs are not of interest here; we refer to the survey paper of Puglisi et al.
[19] for more information about them.
² One can think of this as follows: if we define an imaginary empty last suffix S_{n+1} := ε, then
S_{n+1} is a proper prefix of S_n, so S_{n+1} is the next smaller suffix of S_n.
U. Baier 23:3
(j = SA[i]; ĵ is the start of the next lexicographically smaller suffix of S_j)
 i   j = SA[i]   ĵ    S_j              S[j..ĵ)
 1   14          15   $                $
 2   3           14   aindraining$     aindraining
 3   8           14   aining$          aining
 4   6           8    draining$        dr
 5   13          14   g$               g
 6   1           3    graindraining$   gr
 7   4           6    indraining$      in
 8   11          13   ing$             in
 9   9           11   ining$           in
10   5           6    ndraining$       n
11   12          13   ng$              n
12   10          11   ning$            n
13   2           3    raindraining$    r
14   7           8    raining$         r
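The columns of the table above can be reproduced with a few lines of Python. The following is a minimal sketch and not part of the algorithm presented in this paper: it builds SA by plain comparison sorting (far from linear time) and then obtains the next smaller suffix positions by a stack-based next-smaller-value scan over the inverse suffix array, using the fact that S_j <lex S_i holds exactly when ISA[j] < ISA[i]. The function names `suffix_array_naive` and `next_smaller_suffix` are chosen here for illustration only; positions are 1-based to match the text.

```python
def suffix_array_naive(s):
    """1-based suffix array by plain comparison sorting (illustration only)."""
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


def next_smaller_suffix(s):
    """Return nss with nss[i] = smallest j > i such that S_j < S_i (1-based).

    nss[i] = n + 1 if no such j exists; nss[0] is unused.
    """
    n = len(s)
    isa = [0] * (n + 2)              # isa[n + 1] = 0: imaginary empty suffix
    for rank, i in enumerate(suffix_array_naive(s), start=1):
        isa[i] = rank
    nss = [0] * (n + 1)
    stack = []                       # positions still waiting for a smaller suffix
    for j in range(1, n + 2):
        while stack and isa[stack[-1]] > isa[j]:
            nss[stack.pop()] = j
        stack.append(j)
    return nss


if __name__ == "__main__":
    s = "graindraining$"
    nss = next_smaller_suffix(s)
    for i, j in enumerate(suffix_array_naive(s), start=1):
        # columns: i, SA[i], next smaller suffix, suffix, context
        print(i, j, nss[j], s[j - 1:], s[j - 1:nss[j] - 1])
```

Each position is pushed and popped at most once, so the scan itself takes linear time once ISA is known.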
3 Algorithmic Idea
In this section, the algorithmic idea of the new algorithm is presented. The main
idea is to split the suffix array construction into two phases.
In the first phase, suffixes are divided into suffix groups as if each suffix S_i consisted only
of the string S[i..î): if S[i..î) = S[j..ĵ) holds for two suffixes S_i and S_j, then they belong
to the same group, otherwise to different groups. For any group G containing a suffix S_i, we
call the string S[i..î) the group context of G. In addition to the division of suffixes, the
groups themselves are also ordered by comparing their group contexts. When comparing suffix
groups by their contexts, the terms 'lower group' and 'higher group' are used rather than
the terms 'smaller' or 'larger', because groups are sets, and the latter two terms usually
refer to set sizes, not to lexicographic comparison.
Afterwards, in the second phase, this group structure is used to compute the suffix
array. By iterating over the suffix array in ascending lexicographic order and completing
the contexts of suffixes such that only groups with a single suffix remain, the desired order
of suffixes is obtained. A sketch of the principle can be found in Algorithm 1.
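To make the two phases tangible, the principle can be written down as a deliberately naive Python sketch. It is an illustration under stated assumptions, not the linear-time algorithm of this paper: next smaller suffixes are found by brute-force comparison, groups are kept as a plain list of lists, and the name `two_phase_principle` is hypothetical. The imaginary empty suffix S_{n+1} is used to place the first entry.

```python
def two_phase_principle(s):
    """Naive (roughly cubic-time) rendering of the two-phase sorting principle."""
    n = len(s)

    def suffix(i):                       # S_i with 1-based i
        return s[i - 1:]

    # nss[i]: start of the next lexicographically smaller suffix of S_i
    nss = {i: next((j for j in range(i + 1, n + 1) if suffix(j) < suffix(i)), n + 1)
           for i in range(1, n + 1)}

    # Phase 1: group suffixes by their context S[i..nss[i]) and order the groups.
    by_context = {}
    for i in range(1, n + 1):
        by_context.setdefault(s[i - 1:nss[i] - 1], []).append(i)
    groups = [by_context[c] for c in sorted(by_context)]   # lowest group first

    # Phase 2: walk over SA in ascending order and finalise suffixes.
    sa = [None] * (n + 1)                # index 0 unused

    def place_indices_pointing_to(target):
        for j in range(1, n + 1):
            if nss[j] != target:
                continue
            g = next(k for k, members in enumerate(groups) if j in members)
            lower = sum(len(members) for members in groups[:g])
            sa[lower + 1] = j            # all suffixes in lower groups are smaller
            groups[g].remove(j)
            groups.insert(g, [j])        # new group directly below the old one

    place_indices_pointing_to(n + 1)     # imaginary empty suffix S_{n+1}
    for i in range(1, n + 1):
        place_indices_pointing_to(sa[i])
    return sa[1:]


if __name__ == "__main__":
    print(two_phase_principle("graindraining$"))
```

The sketch assumes a null-terminated input string; the brute-force searches are exactly the parts that the remainder of the paper replaces by prev pointers and array-based group bookkeeping.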
First, let us clarify the correctness of the principle. Assume that
before the i-th iteration of the outer loop in Phase 2 (lines 4 to 8) all entries SA[1], ..., SA[i]
were computed correctly. Then, within the i-th iteration, each further computed SA-entry
is correct: Let j be any index with ĵ = SA[i]. Assume that an index k from the same
group exists such that S_k <lex S_j. Because group(k) = group(j), by the sorting in Phase 1,
S[j..ĵ) = S[k..k̂) holds, so S_k̂ <lex S_ĵ must hold. Because of the ascending iteration order of
the outer loop in Phase 2, k̂ must have been processed in one of the previous i − 1 iterations.
Within this iteration, the index k was processed in the inner loop of Phase 2, and thus has
been removed from its group in line 8, so group(k) ≠ group(j), a contradiction. For the same
reason, and because of the group order computed in Phase 1 (line 2), exactly those suffixes
S_k with group(k) < group(j) must be lexicographically smaller than S_j, so j is correctly
placed into the suffix array in line 7.
Now we know that all entries are placed correctly into SA, but it remains to show that the
suffix array is filled entirely. To this end, consider the point in time after the i-th iteration
of the outer loop in Phase 2, and let S_j be the lexicographically (i + 1)-th smallest suffix.
Because S_ĵ <lex S_j holds by the definition of next lexicographically smaller suffixes, the
index ĵ must have been processed by the outer loop of Phase 2 already, and thus the index
j must have been placed into the suffix array correctly, i.e. SA[i + 1] = j holds.

context   $      a        d     g         i            n             r
groups    {14}   {3, 8}   {6}   {1, 13}   {4, 9, 11}   {5, 10, 12}   {2, 7}

i          1   2   3   4   5   6   7   8   9  10  11  12  13  14
S[i]       g   r   a   i   n   d   r   a   i   n   i   n   g   $

Figure 1 Initial group division for the suffixes of S = graindraining$, where links from the
group with context i to the text are shown. Groups are ordered by their context from left to right.
The argument shows that the principle works correctly, but many issues remain open.
Instead of presenting a more detailed algorithm directly, an introductory example
is presented first, to bridge the gap between the sorting principle and the
final algorithm.
[Figure 2: processing of the group with context r for S = graindraining$; the prev pointer
arrows of the original figure are not recoverable.]

i          1   2   3   4   5   6   7   8   9  10  11  12  13  14
S[i]       g   r   a   i   n   d   r   a   i   n   i   n   g   $

Step 1: For each index of the processed group, compute prev pointers.
    groups {14}{3, 8}{6}{1, 13}{4, 9, 11}{5, 10, 12}{2, 7}
Steps 2 and 3: Rearrange the indices of the previously computed prev pointers in new groups.
    groups {14}{3, 8}{6}{1, 13}{4, 9, 11}{5, 10, 12}{2, 7}
Result: The contexts of the new groups (dr and gr) consist of the contexts of their old groups,
extended by the context of the currently processed group. Also, the lexicographic order
between the groups is preserved.
    groups {14}{3, 8}{6}{13}{1}{4, 9, 11}{5, 10, 12}{2, 7}
Such processing causes an effect quite similar to the prefix doubling technique: Each time
indices of a group are removed and collected in a new group (Step 3), the context
of the new group consists of the context of the old group, extended by the context of the
currently processed group; see Figure 2 for an example.
To clarify why context extensions take place, let i be an index and i_c be the first index
following i such that i is not reachable using the prev pointer chain starting at i_c, i.e.
i_c := min{ j ∈ [i + 1..n + 1] | i ∉ {j, prev(j), prev(prev(j)), ...} }.³ As one can show (see
[2]), during the processing of groups in Phase 1, group(i) = group(j) ⇔ S[i..i_c) = S[j..j_c)
holds for two indices i and j, so the string S[i..i_c) matches our notion of group contexts.
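The definition of i_c can be evaluated directly, which may help when checking small examples by hand. The sketch below is a literal, unoptimised transcription of the formula; the function name `context_end` and the representation of prev pointers as a Python dict (missing key meaning nil) are assumptions made for this illustration. The prev pointers in the usage example, prev(2) = 1 and prev(7) = 6, are the ones obtained by hand for S = graindraining$ after processing the highest group (context r).

```python
def context_end(i, prev, n):
    """Return i_c = min{ j in [i+1 .. n+1] : i is not on the prev chain of j }."""
    for j in range(i + 1, n + 2):
        k = j
        while k is not None and k != i:   # walk j, prev(j), prev(prev(j)), ...
            k = prev.get(k)
        if k is None:                     # chain ended without reaching i
            return j
    # The loop always returns: the chain starting at n + 1 consists of n + 1 only.


if __name__ == "__main__":
    s = "graindraining$"
    prev = {2: 1, 7: 6}                   # hand-derived state, see lead-in
    for i in (1, 6, 13):
        ic = context_end(i, prev, len(s))
        print(i, ic, s[i - 1:ic - 1])     # index, i_c, context S[i..i_c)
```

With no prev pointers at all, the function returns i + 1 for every index, which corresponds to the single-character contexts of the initial group division.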
Coming back to the context extensions mentioned above, we take a closer look
at the steps performed when processing a group. In Step 1, prev pointers are computed.
Let i be an index of the processed group, and let p := prev(i) be its prev pointer. By the
definition of a prev pointer (see Step 1), all indices j between p and i (p < j < i) are placed
in higher groups than p and i.⁴ Since groups are processed in decreasing order, for each such
index a prev pointer must have been computed already. As p belongs to a lower group than
all of those indices, p ≤ prev(j) must hold for all p < j < i. Consequently, p is reachable
from the prev pointer chains starting at all indices j with p < j < i, but as index i had no
prev pointer before the current step, p_c = i must hold. Now, after the computation of the
prev pointer, p is reachable from all indices up to i_c − 1, so the new context of p is S[p..i_c).
This shows that p's old context was extended by the context of the currently processed
group. Consequently, p must be placed into a new group, as performed in Steps 2 and 3.
Another property of the processing is a consistent group order: For any groups G1 and
G2, G1 is lower ordered than G2 if and only if the context of G1 is lexicographically smaller
than the context of G2. Whenever a new group is created, its context is extended by a
lexicographically larger context, so the new group must be placed higher than the old one.
Also, since the context of the old group is lexicographically smaller than that of the next
higher group G̃, the extended context of the new group is lexicographically smaller than that
of G̃, so the placement of the new groups in Step 3 preserves the lexicographic order.

³ After the initial step, i_c = i + 1 holds for all indices, because no prev pointers have been computed yet.
⁴ The special case that groups of indices between p and i are equal to group(i) will be handled later.

[Figure 3: processing of the group with context in for S = graindraining$; the prev pointer
arrows of the original figure are not recoverable.]

i          1   2   3   4   5   6   7   8   9  10  11  12  13  14
S[i]       g   r   a   i   n   d   r   a   i   n   i   n   g   $

Step 1: Compute prev pointers. Note that one computed prev pointer points to index 3, while
two pointers point to index 8.
    groups {14}{3, 8}{6}{13}{1}{4, 9, 11}{5, 10, 12}{2, 7}
Steps 2 and 3: Since the index 8 is followed by two contexts, it must be moved to a different
group than 3, although both belonged to the same group before.
    groups {14}{3, 8}{6}{13}{1}{4, 9, 11}{5, 10, 12}{2, 7}
Result: By placing 8 (new context ainin) in a higher group than 3 (new context ain), the
lexicographic order of groups is still preserved.
    groups {14}{3}{8}{6}{13}{1}{4, 9, 11}{5, 10, 12}{2, 7}

Figure 4 Groups and prev pointers from the string S = graindraining$ after Phase 1.
Now knowing that context extensions take place, one needs to be aware of one special
case to preserve a consistent group order: Consider two indices i and j of the same group
such that one prev pointer from an index of the currently processed group points to i, and
two prev pointers from the currently processed group point to j. Since context extensions
take place, i's context is extended once, while j's context is extended by two contexts
of the currently processed group. Since i and j belong to the same group, the new context
of i is a proper prefix of that of j and therefore lexicographically smaller. As a consequence,
after the extensions, i and j cannot belong to the same group, and must be handled separately,
as shown in Figure 3. Note that the example considers only two indices with different pointer
counts; in general, an arbitrary number of indices and pointers must be taken into account.
The result of Phase 1 for our running example can be found in Figure 4. In summary,
the greedy group processing from highest to lowest group, in conjunction with aspects of
implicit dynamic programming, leads to the desired group division after Phase 1. A formal
proof of correctness is omitted here, but can be found in [2]. Next, we take a look
at the implementation of the missing part: Phase 2.
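[Figure residue: partially filled suffix array during Phase 2, where − marks entries not yet computed]
SA[i]     14   3   8   6  13   1   −   −   −   −   −   −   2   7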
order. Within the i-th iteration, all indices j with ĵ = SA[i] are computed. Each such index
is removed from its current group, placed into a new group as immediate predecessor of its
old group, and stored in the suffix array, see Algorithm 1.
The main issue in implementing this method is to compute the indices j with ĵ = SA[i]. As
we will see, the prev pointers computed in Phase 1 are very useful for this computation:
starting at j := SA[i] − 1, we follow the prev pointer chain prev(j), prev(prev(j)), ... until
either no more prev pointer exists, or the index under consideration is already contained in
the suffix array. The set { j ∈ [1..n] | ĵ = SA[i] } then consists of exactly those indices
visited in the prev pointer chain of SA[i] − 1. Examples can be found in Figures 5 and 6;
the next task is to establish the correctness of this statement.
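The chain walk described above fits in a few lines. The following sketch only illustrates that walk; the name `indices_with_next_smaller`, the dict representation of prev pointers (missing key meaning nil) and the set `placed` of indices already stored in SA are assumptions of this example. The prev pointers in the usage part were worked out by hand for S = graindraining$ from the description of Phase 1, so they should be read as an assumption rather than as values taken from Figure 4.

```python
def indices_with_next_smaller(target, prev, placed):
    """Collect the indices j whose next smaller suffix starts at `target` (1-based)."""
    found = []
    j = target - 1
    while j is not None and j >= 1 and j not in placed:
        found.append(j)
        j = prev.get(j)                  # nil ends the chain
    return found


if __name__ == "__main__":
    # Hand-derived prev pointers after Phase 1 (assumption, see lead-in).
    prev = {2: 1, 4: 3, 5: 4, 6: 3, 7: 6, 8: 3, 9: 8,
            10: 9, 11: 8, 12: 11, 13: 8}
    sa = [14, 3, 8, 6, 13, 1, 4, 11, 9, 5, 12, 10, 2, 7]
    placed = set()
    for target in [15] + sa:             # 15 = n + 1, the imaginary empty suffix
        found = indices_with_next_smaller(target, prev, placed)
        placed.update(found)
        print(target, found)
```

In the actual Phase 2 the found indices are written to SA immediately, so the order of the targets emerges during the iteration; the usage example replays the known result only to show which indices each walk visits.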
The first index under consideration is j := SA[i] − 1: if j is not contained in the suffix
array already, then by the ascending iteration order of Phase 2, S_SA[i] <lex S_j must hold.
Since S_j is the preceding suffix of S_SA[i], S_SA[i] clearly must be the next lexicographically
smaller suffix of S_j. Now, given a suffix S_j with ĵ = SA[i], the next index k with k̂ = SA[i]
(if existing) can be found by following j's prev pointer, i.e. k = prev(j). If k is not contained
in the suffix array already, S_SA[i] <lex S_k must hold. Also, since group(k) < group(l) holds
for all k < l ≤ j by the definition of prev pointers, S_k <lex S_l holds for all k < l ≤ j because
of the group order of Phase 1. This means that k̂ ≥ ĵ. Combined with S_SA[i] <lex S_k,
S_SA[i] clearly must be the next lexicographically smaller suffix of S_k.
For any index k between prev(j) and j (prev(j) < k < j), group(k) ≥ group(j) must hold
by the definition of prev pointers. If group(k) > group(j), by the sorting in Phase 1, S_k >lex S_j
must hold. Because k < j, k̂ ≤ j ≠ SA[i] holds, so those indices can be skipped. In the
special case that group(k) = group(j), by Phase 1, S[k..k̂) = S[j..ĵ) holds. Since k < j
and the contexts are the same, k̂ < ĵ holds, so clearly k̂ ≠ SA[i] must be fulfilled and those
indices can be skipped, too.
If an index j is reached that is already contained in the suffix array, we know that it
must have been placed into the suffix array in an earlier step. This means that
S_ĵ <lex S_SA[i], so j can be skipped. For any further index k in the prev pointer chain of
j, an argument as above shows that S_k̂ <lex S_SA[i], so those indices can be
skipped, too. For the remaining indices between the elements of this prev pointer chain, the
argument above applies as well, so these indices can also be skipped.
We refer to [2] for a formal proof; it is omitted here for reasons of space. So far,
we have seen a running example along with some arguments for correctness. The missing
part is an algorithm along with its runtime analysis, which will be addressed in the next
section.
4 Algorithm
The new suffix array construction algorithm including all special cases discussed in the
previous section can be found in Algorithm 2.
Now, to verify that the algorithm can be implemented in asymptotically linear time, some
technical details of the algorithm are discussed. First, the required data structures
are explained. Six arrays of size n are used:
i          1   2   3   4   5   6   7   8   9  10  11  12  13  14
S[i]       g   r   a   i   n   d   r   a   i   n   i   n   g   $
GSIZE[i]   1   2   0   1   2   0   3   0   0   3   0   0   2   0
SA[i]     14   3   8   6   1  13   4   9  11   5  10  12   2   7
GLINK[i]   5  13   2   7  10   4  13   2   7  10   7  10   5   1
ISA[i]     5  13   2   7  10   4  14   3   8  11   9  12   6   1
Figure 7 Initial data structure setup after line 2 of Phase 1, applied to the string S =
graindraining$. Prev pointers are not listed since all entries are initially set to nil.
SA contains suffix starting positions, ordered according to the current group order.
ISA is the inverse permutation of SA, to be able to detect the position of a suffix in SA.
GSIZE contains the sizes of all groups. Group sizes are ordered according to the group
order, so GSIZE has the same order as SA. GSIZE contains the size of each group once
at the beginning of the group, followed by zeros until the beginning of the next group.
GLINK stores pointers from suffixes to their groups. All entries point to the beginning
of a group, at the same position where GSIZE contains the size of the group.
PREV is used to store prev pointers. All entries initially are set to nil.
PC is used to count prev pointers pointing from G to P. PC initially is set to zero.
The initial setup of those structures (lines 1 and 2 of Algorithm 2) can be performed in O(n)
time using a technique called bucket sort. Refer to Figure 7 for an example.
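The initial setup can be sketched in Python as follows. This is an illustration of the bucket-sort idea under some assumptions: the function name `initial_setup` is hypothetical, the arrays are 1-based with index 0 left unused so that they can be compared with Figure 7 directly, and the alphabet is sorted with `sorted`, which adds a term depending only on the alphabet size.

```python
def initial_setup(s):
    """Set up SA, ISA, GSIZE, GLINK, PREV and PC from first-character buckets."""
    n = len(s)
    counts = {}
    for c in s:
        counts[c] = counts.get(c, 0) + 1

    # First SA slot (1-based) of each character bucket, in alphabet order.
    start, slot = {}, 1
    for c in sorted(counts):
        start[c] = slot
        slot += counts[c]

    sa, isa, gsize, glink = ([0] * (n + 1) for _ in range(4))   # index 0 unused
    fill = dict(start)                   # next free slot per bucket
    for i in range(1, n + 1):
        c = s[i - 1]
        sa[fill[c]] = i                  # suffix i goes into its bucket
        isa[i] = fill[c]                 # inverse permutation of SA
        glink[i] = start[c]              # link to the beginning of the group
        fill[c] += 1
    for c in counts:
        gsize[start[c]] = counts[c]      # size stored once, at the group start

    prev = [None] * (n + 1)              # nil everywhere
    pc = [0] * (n + 1)                   # zero everywhere
    return sa, isa, gsize, glink, prev, pc


if __name__ == "__main__":
    for name, arr in zip(("SA", "ISA", "GSIZE", "GLINK"), initial_setup("graindraining$")):
        print(name, arr[1:])
```

Text positions are scanned from left to right, so indices within a bucket appear in increasing order, as in Figure 7.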
The first problem to be solved is the processing of groups in descending group order (line
3). If two variables gs and ge contain the bounds of the current group G in SA,
we get to the preceding group by setting ge ← gs − 1 and gs ← GLINK[SA[gs − 1]], and thus
need O(n) time to iterate over all groups.
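As a small illustration of this iteration, the following Python sketch walks over the group bounds of a fixed SA/GLINK pair from the highest to the lowest group. The generator name `groups_descending` is an assumption of this example, arrays are 1-based with index 0 unused, and the arrays in the usage part are the initial values from Figure 7. In the full algorithm the lower groups are rearranged while the iteration proceeds; the sketch ignores this and only shows the pointer arithmetic.

```python
def groups_descending(sa, glink):
    """Yield the bounds (gs, ge) of every group in SA, highest group first."""
    ge = len(sa) - 1                 # sa[0] is unused, so this equals n
    while ge >= 1:
        gs = glink[sa[ge]]           # every member links to its group start
        yield gs, ge
        ge = gs - 1                  # step to the end of the preceding group


if __name__ == "__main__":
    # Figure 7 arrays for S = graindraining$ (index 0 unused)
    SA = [0, 14, 3, 8, 6, 1, 13, 4, 9, 11, 5, 10, 12, 2, 7]
    GLINK = [0, 5, 13, 2, 7, 10, 4, 13, 2, 7, 10, 7, 10, 5, 1]
    for gs, ge in groups_descending(SA, GLINK):
        print(gs, ge, SA[gs:ge + 1])
```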
For the prev pointer computation in line 5, we observe the following: Each index j
between prev(i) and an index i belongs to a higher or equal group. If j belongs to a higher
group, its prev pointer is already computed, and each index between prev(j) and j belongs
to a higher group than that of i. So, to compute the prev pointer of an index i, we start at
index i − 1 and follow prev pointers until an index j belongs to the same or a lower group.⁵
If j belongs to a lower group, the prev pointer of i is found; otherwise, if j belongs to the
same group and itself has no prev pointer yet, we collect j in a list and repeat the same
procedure, thus setting the prev pointers of a whole list of indices. This technique is called
pointer jumping and is well known to require O(n) work in total, since each pointer is used
only once for pointer jumping, and overall n pointers are computed. The extra amount of
work for the list collection is O(|G|), and therefore sums up to O(n) in total for Phase 1,
since each group is processed only once.
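The pointer jumping step can be illustrated with a simplified Python sketch. It deviates from the description above in one respect, which is an assumption made to keep the example short: instead of collecting same-group indices in a list, the members of the processed group are handled in increasing text order, so that any same-group index met during the walk already has its prev pointer. Because of the sorting, the sketch does not claim the linear time bound. The names `compute_prev_for_group` and `group` (a mapping from index to the rank of its current group) are hypothetical; the paper compares GLINK entries instead.

```python
def compute_prev_for_group(members, group, prev):
    """Set prev[i] for all i in `members` by jumping over higher or equal groups."""
    for i in sorted(members):
        j = i - 1
        # Skip indices in a higher or the same group by following their pointers.
        while j is not None and j >= 1 and group[j] >= group[i]:
            j = prev.get(j)
        prev[i] = j if (j is not None and j >= 1) else None


if __name__ == "__main__":
    # State for S = graindraining$ before the group with context "in" is processed:
    # group ranks follow the order $ < a < dr < g < gr < in < n < r.
    group = {14: 0, 3: 1, 8: 1, 6: 2, 13: 3, 1: 4,
             4: 5, 9: 5, 11: 5, 5: 6, 10: 6, 12: 6, 2: 7, 7: 7}
    prev = {2: 1, 7: 6, 5: 4, 10: 9, 12: 11}   # from groups r and n
    compute_prev_for_group([4, 9, 11], group, prev)
    print(prev[4], prev[9], prev[11])
```

In the usage example one pointer ends at index 3 and two pointers end at index 8, which is the situation described for Figure 3.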
For the computation of the set P and the subsets P_1, ..., P_k⁶ (lines 6 to 7), we use the PC
array. After the prev pointer computation, for each i ∈ G, we increment PC[PREV[i]]. After this
loop, PC[p] contains the number of prev pointers pointing from G to p. Also note that the set P
can easily be computed during the loop, by adding the index prev(i) to P if PC[prev(i)]
was zero before the increment. Now, while the set P is not empty, do the following: In
the l-th iteration, for each p ∈ P, decrement PC[p]. If PC[p] is zero, remove p from P and
add it to the set P_l. This way, all sets P_1, ..., P_k are computed, and all entries of the array
PC are reset to zero, so it can be reused. The time required is O(|G|) per group G, because the
⁵ This can be done by comparing GLINK[j] with gs of the current group.
⁶ The set P and the subsets P_1, ..., P_k can be implemented as a list and a list of lists, respectively.
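The counting scheme for the sets P_1, ..., P_k can be sketched as follows. The sketch follows the description above with two assumptions of its own: the name `split_by_pointer_count` is hypothetical, and indices of G without a prev pointer (nil) are simply skipped, a case the text above does not spell out. P is kept as a list and the result as a list of lists, as suggested in footnote 6.

```python
def split_by_pointer_count(members, prev, pc):
    """Return [P_1, ..., P_k]: P_l holds the targets hit by exactly l prev pointers.

    `pc` must be all zero on entry and is all zero again on return.
    """
    P = []
    for i in members:
        p = prev.get(i)
        if p is None:                    # assumption: nil pointers are ignored
            continue
        if pc[p] == 0:                   # first pointer to p: p joins P
            P.append(p)
        pc[p] += 1

    subsets = []
    while P:
        finished, remaining = [], []
        for p in P:
            pc[p] -= 1
            (finished if pc[p] == 0 else remaining).append(p)
        subsets.append(finished)         # this is P_l in the l-th iteration
        P = remaining
    return subsets


if __name__ == "__main__":
    pc = [0] * 15
    prev = {4: 3, 9: 8, 11: 8}           # group with context "in" of the running example
    print(split_by_pointer_count([4, 9, 11], prev, pc))
```

Every increment is matched by exactly one decrement, so the total work is proportional to the number of prev pointers leaving G.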
Table 2 SACA performance results. Speed a) and cache misses b) are the arithmetic
mean of 10 runs per file for each text corpus.
5 Performance Analyses
All experiments were conducted on a 64 bit Ubuntu 14.04.3 LTS system equipped with two
ten-core Intel Xeon processors E5-2680v2 with 2.8 GHz and 128 GB of RAM.
The algorithm described in this paper was named GSACA because of its greedy and
grouping behaviour. It was compared against common linear-time and state-of-the-art
SACAs on text selections from different text corpora. The benchmark itself is available online
[1]; the results can be found in Table 2.
⁷ Note that the additional split of P_l from line 9 of Algorithm 2 is implicitly performed within this step.
The results clearly show that GSACA cannot keep up with current state-of-the-art
SACAs; construction speeds of divsufsort or SA-IS are about 3 to 4 times higher than those
of GSACA. The limited performance is mainly due to cache-unfriendly operations like pointer
jumping or suffix rearrangements, which cause high cache miss rates and slow construction.
6 Conclusion
We presented the first non-recursive linear-time suffix array construction algorithm.
Unfortunately, comparing its performance with that of other linear-time SACAs, GSACA must be
seen as a late child of the 2003 'epoch of suffix array construction' rather than a
state-of-the-art SACA. Nonetheless, the results are quite promising: the algorithm deals a lot
with previous smaller and next smaller values, which normally hints at an alternative stack-based
approach. This could result in better cache miss rates and speed, but remains an open
problem for the moment. Compared to the development histories of other SACAs, GSACA is
in its infancy, and therefore offers a lot of room for future improvements.
References
1 Uwe Baier. GSACA. https://github.com/waYne1337/gsaca. Last visited January 2016.
2 Uwe Baier. Linear-time Suffix Sorting - A new approach for suffix array construction.
Master's thesis, Ulm University, 2015.
3 Sebastian Deorowicz. Silesia Corpus. http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia.
Last visited January 2016.
4 Paolo Ferragina and Gonzalo Navarro. Pizza & Chili Corpus.
http://pizzachili.dcc.uchile.cl/texts.html. Last visited January 2016.
5 Paolo Ferragina and Gonzalo Navarro. Repetitive Corpus.
http://pizzachili.dcc.uchile.cl/repcorpus.html. Last visited January 2016.
6 Wing-Kai Hon, Kunihiko Sadakane, and Wing-Kin Sung. Breaking a Time-and-Space Barrier
in Constructing Full-Text Indices. In Proceedings of the 44th Annual IEEE Symposium
on Foundations of Computer Science, FOCS '03, pages 251–260, 2003.
7 Juha Kärkkäinen and Peter Sanders. Simple Linear Work Suffix Array Construction. In
Proceedings of the 30th International Conference on Automata, Languages and Programming,
ICALP '03, pages 943–955, 2003.
8 Dong Kyue Kim, Jeong Seop Sim, Heejin Park, and Kunsoo Park. Linear-time Construction
of Suffix Arrays. In Proceedings of the 14th Annual Conference on Combinatorial Pattern
Matching, CPM '03, pages 186–199, 2003.
9 Pang Ko. Ko–Aluru Algorithm. https://sites.google.com/site/yuta256/KA.tar.bz2.
Last visited January 2016.
10 Pang Ko and Srinivas Aluru. Space Efficient Linear Time Construction of Suffix Arrays.
In Proceedings of the 14th Annual Conference on Combinatorial Pattern Matching, CPM
'03, pages 200–210, 2003.
11 Udi Manber and Gene Myers. Suffix Arrays: A New Method for On-line String Searches.
In Proceedings of the 1st Annual ACM-SIAM Symposium on Discrete Algorithms, SODA
'90, pages 319–327, 1990.
12 Yuta Mori. libdivsufsort. https://github.com/y-256/libdivsufsort. Last visited January 2016.
13 Yuta Mori. sais-lite-2.4.1. https://sites.google.com/site/yuta256/sais. Last visited
January 2016.
14 Yuta Mori. Suffix Array Construction Benchmark.
https://github.com/y-256/libdivsufsort/blob/wiki/SACA_Benchmarks.md. Last visited January 2016.
15 Joong Chae Na. Linear-Time Construction of Compressed Suffix Arrays Using O(N Log
N)-bit Working Space for Large Alphabets. In Proceedings of the 16th Annual Conference
on Combinatorial Pattern Matching, CPM ’05, pages 57–67, 2005.
16 Ge Nong. Practical Linear-time O(1)-workspace Suffix Sorting for Constant Alphabets.
ACM Transactions on Information Systems, 31(3):15:1–15:15, 2013.
17 Ge Nong, Sen Zhang, and Wai Hong Chan. Linear Suffix Array Construction by Almost
Pure Induced-Sorting. In Proceedings of the 2009 Data Compression Conference, DCC ’09,
pages 193–202, 2009.
18 Ge Nong, Sen Zhang, and Wai Hong Chan. Linear Time Suffix Array Construction Using
D-Critical Substrings. In Proceedings of the 20th Annual Conference on Combinatorial
Pattern Matching, CPM ’09, pages 54–67, 2009.
19 Simon J Puglisi, William F Smyth, and Andrew H Turpin. A Taxonomy of Suffix Array
Construction Algorithms. ACM Computing Surveys, 39(2), 2007.
20 Peter Sanders. DC3 Algorithm. http://people.mpi-inf.mpg.de/~sanders/programs/suffix/.
Last visited January 2016.
21 Peter Weiner. Linear Pattern Matching Algorithms. In Proceedings of the 14th Annual
Symposium on Switching and Automata Theory, SWAT ’73, pages 1–11, 1973.