A case for (partially) TAgged GEometric history length branch prediction
André Seznec, Pierre Michaud

This work was partially supported by an Intel research grant and an Intel research equipment donation.

Abstract
It is now widely accepted that, in order to provide state-of-the-art accuracy, a conditional branch predictor must combine several predictions. Recent research has shown that an adder tree is a very effective approach for the prediction combination function.
In this paper, we present a more cost-effective solution for this prediction combination function for predictors relying on several predictor components indexed with different history lengths. Like the O-GEHL predictor, the TAGE predictor uses GEometric history lengths; like the PPM-like predictor, it uses (partially) tagged components. TAGE relies on (partial) hit-miss detection as the prediction computation function. TAGE provides state-of-the-art prediction accuracy on conditional branches. In particular, at equivalent storage budgets, the TAGE predictor significantly outperforms all the predictors that were presented at the Championship Branch Prediction in December 2004.
The accuracy of the prediction of indirect branch targets is a major issue for some applications. We show that the principles of the TAGE predictor can be directly applied to the prediction of indirect branches. The ITTAGE predictor (Indirect Target TAgged GEometric history length) significantly outperforms previous state-of-the-art indirect branch target predictors.
Both the TAGE and ITTAGE predictors feature tagged predictor components indexed with distinct history lengths forming a geometric series. They can be combined in a single cost-effective predictor sharing tables and predictor logic: the COTTAGE predictor (COnditional and indirect Target TAgged GEometric history length).
1 Introduction
State-of-the-art conditional branch predictors [24, 26, 9, 11] exploit several different history lengths to capture correlation from very remote branch outcomes as well as very recent branch history. Hybrid predictors [19] initially relied on a meta-predictor to select a prediction among a few different predictions. Several different approaches were later proposed to compute the final prediction from predictors featuring multiple components: majority vote [21], predictor fusion [18] and partial tagging [6] have been shown to be competitive with meta-prediction.
However, recent studies on conditional branch predictors, as illustrated by the 1st Championship Branch Prediction in December 2004 [7, 3, 17, 14, 25], seemed to indicate that the adder tree is the most competitive approach for final prediction computation. Using a (multiply-)adder tree as the prediction combination function was first proposed for neural-based branch predictors [31, 12]. These first proposals suffered from several
shortcomings, for instance high prediction latency (due to the adder tree) and predictor hardware logic complexity (growing linearly with the history length). Most of these shortcomings were progressively addressed. The latency issue was addressed through ahead pipelining [9]; prediction accuracy was improved through different steps: mixing local and global history [13], using redundant history and skewing [23]; hardware logic complexity was reduced through MAC representation [23] or hashing [30]. Finally, the O-GEHL predictor [25, 24] was shown to be able to exploit very long global history lengths, in the hundreds-of-bits range. Experiments showed that the O-GEHL predictor achieves state-of-the-art branch prediction accuracy for storage budgets in the 32 Kbit-1 Mbit range. It uses a medium number N of predictor tables (e.g., 4 to 12) and limited hardware logic for prediction computation (an N-entry 4-bit adder tree). To the best of our knowledge, the O-GEHL predictor is the most storage-effective reasonably implementable conditional branch predictor that has been presented so far.
The O-GEHL predictor relies on an adder tree as the final prediction computation
function, but its main characteristic is the use of a geometric series as the list of the
history lengths. This characteristic allows the O-GEHL predictor to exploit very long history
lengths as well as to capture correlations on recent branch outcomes. Our first contribution
in this paper is to show that partial tagging is a more cost-effective final branch prediction
selection function than an adder tree for predictors using a geometric series of history
lengths.
We present the TAGE conditional branch predictor. TAGE stands for TAgged GEometric history length. TAGE is derived from Michaud's tagged PPM-like predictor [20]. It relies on a default tagless predictor backed with a plurality of (partially) tagged predictor components indexed using different history lengths. These history lengths form a geometric series. The prediction is provided either by a tag match on a tagged predictor component or by the default predictor. In case of multiple hits, the prediction is provided by the tag-matching table with the longest history. Our main contributions on the conditional branch predictor are 1) the use of a geometric series of history lengths in PPM-like predictors, and 2) a new and efficient predictor update algorithm. On the CBP traces, the TAGE predictor outperforms the O-GEHL predictor at equal storage budgets and equivalent predictor complexity (number of tables, computation logic, etc.). Our study points out that the quality of a prediction scheme depends not only on the choice of the final prediction computation function, but also on the careful design of the predictor update policy. With the proposed update policy, partial tagging appears to be more efficient than an adder tree for final prediction computation.
An indirect branch target misprediction and a conditional branch misprediction result in equivalent penalties, since both mispredictions can only be resolved at branch execution time.
On some applications, for instance server applications, the number of indirect branches is
relatively high. On these applications, accurately predicting indirect branch targets is
becoming a major issue. The second contribution of this paper is to point out that the
structure of the TAGE predictor can be easily adapted to predict indirect branches. We
present ITTAGE, an indirect branch target predictor. ITTAGE stands for Indirect branch
Target TAgged GEometric history length predictor. ITTAGE relies on the same principles
as TAGE, i.e., a default predictor backed by a plurality of (partially) tagged predictor
components indexed using global history lengths that form a geometric series. ITTAGE is
shown to be able to exploit the correlation between a branch target and very long global history, in the hundreds-of-bits range. ITTAGE reaches an unprecedented level of accuracy for an indirect branch target predictor.
Both the TAGE and ITTAGE predictors are implemented using tagged predictor components indexed with distinct history lengths forming a geometric series. The COTTAGE (COnditional and indirect Target TAgged GEometric history length) predictor combines a TAGE predictor and an ITTAGE predictor. Indirect targets and conditional branch outcomes are stored in the same tables, and part of the predictor logic is shared. COTTAGE is therefore a cost-effective solution for predicting both conditional branches and indirect branch targets.
Paper outline
The remainder of the paper is organized as follows. Section 2 presents our experimental framework for simulating and evaluating the branch predictors. Section 3 introduces the TAGE predictor principles and evaluates its performance. In Section 4, we present a few last optimizations that can be implemented on the TAGE predictor and evaluate the TAGE predictor in the context of the CBP rules. Section 5 introduces the ITTAGE indirect target predictor principles. Section 6 points out that the TAGE and ITTAGE predictors can be combined in a single cost-effective predictor, the COTTAGE predictor. Finally, Section 7 reviews related work and summarizes this study.
2 Evaluation framework
2.1 Simulation traces and evaluation metric
To allow reproducibility, the simulations illustrating this paper were run using the publicly available traces provided for the 1st Championship Branch Prediction, held in December 2004 (http://www.jilp.org/cbp/). 20 traces selected from 4 different classes of workloads are used. The 4 workload classes are: server, multi-media, specint and specfp. Each of the branch traces is derived from an instruction trace consisting of 30 million instructions. These traces include system activity.

30-million-instruction traces are often considered short for branch prediction studies. However, 30 million instructions represent approximately the workload executed by a PC under Linux or Windows in 10 ms, i.e., one OS time slice. Moreover, system activity was shown to have an important impact on predictor accuracy [8]. Finally, some traces, particularly the server traces, exhibit very large numbers of static branches. Such traces are not represented in more conventional workloads such as the SPECint workloads.
The characteristics of the CBP traces are summarized in Table 1. It is noticeable that many traces feature only a few indirect branches, while other traces, particularly the server traces, feature many indirect branches.

The evaluation metric used in this paper is mispredictions per kiloinstruction (misp/KI). Due to space limitations, in many places we will use the average misprediction rate, computed as the ratio of the total number of mispredictions on the 20 benchmarks to the total number of instructions in the 20 traces.
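As a minimal illustration of this averaging (a C++ sketch with hypothetical variable names): the totals are summed before dividing, rather than averaging the per-trace rates.

```cpp
#include <cstdint>

// Average misp/KI over all traces: the ratio of total mispredictions to
// total instructions, not the mean of the per-trace misprediction rates.
double average_misp_per_ki(const uint64_t misses[], const uint64_t insts[], int ntraces) {
    uint64_t total_misses = 0, total_insts = 0;
    for (int i = 0; i < ntraces; i++) {
        total_misses += misses[i];
        total_insts  += insts[i];
    }
    return 1000.0 * (double)total_misses / (double)total_insts;
}
```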
Since this study was performed on traces, immediate update of the predictor is assumed.
On a real hardware processor, the effective update is performed later in the pipeline, at
misprediction resolution or at commit time. However, for branch predictors using very long global branch history, the differences in accuracy between a delayed-update predictor and an immediate-update predictor are known to be small [10, 26].

Table 1: Characteristics of the CBP traces

                 FP-1  FP-2  FP-3  FP-4  FP-5  INT-1  INT-2  INT-3  INT-4  INT-5
static branches   444   452   810   556   243    424   1585    989    681    441
dyn. (x10000)     221   179   155    90   242    419    287    377    207    376
static indirect    47    73   102    57    46     58     52    120     60     58
dyn. (x100)         2     3     4     3     1      2    308      5    447     77

                 MM-1  MM-2  MM-3  MM-4  MM-5  SER-1  SER-2  SER-3  SER-4  SER-5
static branches   460  2523  1091  2256  4536  10910  10560  16604  16890  13017
dyn. (x10000)     223   381   302   488   256    366    354    381    427    429
static indirect    47   456   137   176  1625   2040   1974   1947   2536   1821
dyn. (x100)       371   866  2018   144  1754   2841   2735   1386   2995   3130
Exact reproducibility assumes an exactly equal initial state. However, branch predictor behavior might be sensitive to the initialization state of the predictor. Resetting the counters before simulating each trace leads to underestimating cold-start effects.
In order to approach a realistic initialization point, the simulations presented in this paper assume that the simulations of the 20 traces are chained without resetting the predictor counters. Compared with resetting all the counters before simulating each trace, the discrepancy in prediction accuracy is relatively marginal for a TAGE predictor with a moderate storage budget (0.03 misp/KI for a 64 Kbits TAGE predictor), but was found to result in larger and significant discrepancies on other predictors.
However, in Section 4, we present the simulation results of the TAGE predictor strictly
respecting the 1st Championship Branch Prediction Rules.
For computing the indexes of global history predictors, most studies consider either hashing the conditional branch history with the branch address or hashing the path history with the branch address [22]. Both these solutions lead to considering distinct paths as equal. This phenomenon can be called path aliasing. The impact of path aliasing on predictor accuracy is particularly important when a short global history is used.

In order to limit this phenomenon, it was proposed in [24] to include non-conditional branches in the branch history ghist (by inserting a taken bit) and to also use a (limited) 16-bit path history phist consisting of 1 address bit per branch.
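The following C++ sketch illustrates this history construction; the structure and names are ours, and the ghist length is just an illustrative bound.

```cpp
#include <bitset>
#include <cstdint>

// Sketch of the history update described above: every branch, conditional
// or not, shifts one bit into the global history, which limits path
// aliasing; a 16-bit path history keeps one branch-address bit per branch.
struct HistoryState {
    std::bitset<256> ghist;   // global branch history, long enough for the geometric series
    uint16_t phist = 0;       // 16-bit path history, 1 address bit per branch

    void update(uint32_t pc, bool is_conditional, bool taken) {
        bool bit = is_conditional ? taken : true;  // non-conditional branches insert a taken bit
        ghist <<= 1;
        ghist[0] = bit;
        phist = (phist << 1) | (pc & 1);           // 1 address bit per branch
    }
};
```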
Branch history management The TAGE predictor relies on a very long global branch history (in the hundreds-of-bits range). This global branch history, as well as the path history, is speculatively updated and must therefore be restored on a misprediction. This can be implemented through circular buffers used to store the branch history (for instance, a 256-bit buffer for the global history) [15]. Restoring the branch history and the path history then consists of restoring the head pointer.
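A minimal sketch of this mechanism, assuming one stored bit per buffer entry (an actual implementation would pack the bits):

```cpp
#include <cstdint>

// Circular-buffer history management as described above (sizes and names
// are illustrative): the speculative history lives in a 256-entry circular
// buffer; a checkpoint is just the head pointer, and recovering from a
// misprediction restores that pointer.
struct CircularHistory {
    uint8_t buffer[256] = {0}; // one history bit per entry
    unsigned head = 0;         // insertion point

    void push(bool bit) {
        head = (head + 1) % 256;
        buffer[head] = bit;
    }
    unsigned checkpoint() const { return head; }             // taken at prediction time
    void restore(unsigned saved_head) { head = saved_head; } // on misprediction
};
```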
3 The TAGE conditional branch predictor

In this section, we first recall the general principles of geometric history lengths, initially introduced for the O-GEHL predictor [24]. Then we present the TAGE predictor and evaluate its performance.
Geometric history length prediction was introduced with the O-GEHL predictor [24]. The predictor features M distinct predictor tables Ti, 0 ≤ i < M, indexed with hash functions of the branch address and the global branch/path history.
Distinct history lengths are used for computing the index of each table. Table T0 is indexed using the branch address alone. The history lengths used for computing the indexing functions for tables Ti, 1 ≤ i < M, are of the form L(i) = α^(i-1) × L(1), i.e., the lengths L(i) form a geometric series. More precisely, as history lengths are integers, we use L(i) = (int)(α^(i-1) × L(1) + 0.5).
Using a geometric series of history lengths makes it possible to use very long history lengths for indexing some predictor tables, while still dedicating most of the storage space to predictor tables using short global history lengths. As an example, on an 8-component predictor, using α = 2 and L(1) = 2 leads to the following series: {0, 2, 4, 8, 16, 32, 64, 128}. As pointed out in [24], the exact formula of the series is not important, but its general form is.
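The series computation is straightforward; the sketch below reproduces the example series from the text.

```cpp
#include <cmath>
#include <cstdio>

// Geometric history length series L(i) = (int)(alpha^(i-1) * L(1) + 0.5)
// used to index tables T1..T(M-1); T0 uses the branch address alone.
int history_length(int i, double alpha, int L1) {
    return (int)(std::pow(alpha, i - 1) * L1 + 0.5);
}

int main() {
    // Example from the text: 8 components, alpha = 2, L(1) = 2
    printf("0");                                   // T0 uses no history
    for (int i = 1; i <= 7; i++)
        printf(", %d", history_length(i, 2.0, 2)); // prints 2, 4, 8, 16, 32, 64, 128
    printf("\n");
    return 0;
}
```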
The TAGE predictor is directly derived from Michaud's PPM-like tag-based branch predictor [20]. Figure 1 illustrates a TAGE predictor. The TAGE predictor features a base predictor T0, in charge of providing a basic prediction, and a set of (partially) tagged predictor components Ti. These tagged predictor components Ti, 1 ≤ i ≤ M, are indexed using different history lengths that form a geometric series, i.e., L(i) = (int)(α^(i-1) × L(1) + 0.5).
Throughout this paper, the base predictor is a simple PC-indexed bimodal table of 2-bit counters. An entry in a tagged component consists of a signed counter ctr whose sign provides the prediction, a (partial) tag, and an unsigned useful counter u. Throughout this paper, u is a 2-bit counter and ctr is a 3-bit counter.
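A sketch of the corresponding entry layouts (the tag width varies per table; storage packing is omitted):

```cpp
#include <cstdint>

// Tagged-component entry as described above: a 3-bit signed prediction
// counter whose sign gives the prediction, a partial tag, and a 2-bit
// "useful" counter u.
struct TaggedEntry {
    int8_t   ctr; // 3-bit signed counter, in [-4, 3]; prediction = (ctr >= 0)
    uint16_t tag; // partial tag (e.g., 8 to 12 bits depending on the table)
    uint8_t  u;   // 2-bit useful counter, in [0, 3]
};

// Base predictor entry: a PC-indexed 2-bit counter (bimodal table).
struct BimodalEntry {
    uint8_t ctr; // 2-bit counter, in [0, 3]; prediction = (ctr >= 2)
};
```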
Figure 1: A 5-component TAGE predictor synopsis: a base predictor is backed with several tagged predictor components indexed with increasing history lengths
A few definitions and notations In the remainder of the paper, we define the provider component as the predictor component that ultimately provides the prediction. We define the alternate prediction altpred as the prediction that would have occurred if there had been a miss on the provider component. That is, if there are tag hits on T2 and T4 and tag misses on T1 and T3, then T4 is the provider component and T2 provides altpred. If no component hits, altpred is the default prediction.
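The selection rule can be sketched as follows (the tag-hit computation itself, i.e., the index and tag hash functions, is assumed):

```cpp
// Sketch of the selection rule described above. tag_hit[i] is the tag-match
// result for tagged component Ti; index 0 stands for the base predictor.
struct Selection { int provider; int altpred; }; // component indices; 0 = base predictor

Selection select_components(const bool tag_hit[], int num_tagged) {
    Selection s = {0, 0};                    // default: base predictor provides both
    for (int i = num_tagged; i >= 1; i--) {  // scan from the longest history down
        if (!tag_hit[i]) continue;
        if (s.provider == 0) s.provider = i; // longest-history hit provides the prediction
        else { s.altpred = i; break; }       // next hitting component provides altpred
    }
    return s;
}
```

With tag hits on T2 and T4 only, this returns provider = 4 and altpred = 2, matching the example in the text.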
(C) Initializing the allocated entry: The allocated entry is initialized with the prediction counter set to weak correct. The useful counter u is initialized to 0 (i.e., strong not useful).
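A sketch of this initialization, assuming the 3-bit signed counter encoding in which 0 and -1 are the weak values:

```cpp
#include <cstdint>

// Sketch of step (C) above (field widths as in the text: 3-bit signed ctr,
// 2-bit u; the exact "weak correct" encoding is our assumption): a newly
// allocated entry predicts the actual branch outcome with the weakest
// counter value and starts as strong not useful.
struct Entry { int8_t ctr; uint16_t tag; uint8_t u; };

void initialize_allocated_entry(Entry &e, uint16_t partial_tag, bool taken) {
    e.ctr = taken ? 0 : -1; // weak correct: weakest value whose sign matches the outcome
    e.tag = partial_tag;    // tag computed from the PC and the component's history
    e.u   = 0;              // strong not useful
}
```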
Other numbers of components, from 4 to 15, can naturally be considered. The considered configurations are illustrated in Table 2.
Figure 2: Accuracy (mispredictions/KI) of 5- and 8-component TAGE predictors, 4- and 8-component O-GEHL predictors and the PPM-like predictor, as a function of prediction table size (32 Kbits to 1024 Kbits)
four resulted in no accuracy benefit, while on the TAGE predictor it results in a 0.07 to 0.12 misp/KI average reduction for equivalent storage budget predictors.
The importance of the initialization of the allocated entries was already pointed out in [20], where some hardware was dedicated to controlling this initialization. Compared with our improved method, however, the method used in [20] is clearly suboptimal.
Updating the predictor is not on the critical path; therefore, a complex update policy may be applied. While many studies have detailed update policies for branch predictors, the importance of a careful design of the update policy has rarely been pointed out.
In order to illustrate the importance of the update policy for the effective accuracy of a predictor, we cite three examples in addition to TAGE. The first example is the use of 2-bit counters in branch prediction [29]: the hysteresis bit averages the predictions over the occurrences of the same branch, but it also smooths the impact of aliasing. The second example is the use of partial update on multiple-component predictors [21]: such a partial update reduces the impact of aliasing on many predictors. The third example is the use of an update threshold in neural-based predictors [12]: without such an update threshold, neural-based predictors perform very poorly.
It was shown in [24] that the O-GEHL predictor is very robust to the choice of the parameters of the geometric series of history lengths.
The same property applies to the TAGE predictor, as illustrated in Figure 3 for 8-component TAGE predictors ranging from 32 Kbits to 1 Mbit. In this figure, the minimum history length L(1) is set to 5 and the maximum history length is varied from 50 to 500. For instance, for a 128 Kbits 8-component predictor, picking any maximum history length in the range 110-500 leads to approximately the same average predictor accuracy (minimum 2.37 misp/KI, maximum 2.41 misp/KI).
Figure 3: Accuracy (mispredictions/KI) of 8-component TAGE predictors from 32 Kbits to 1024 Kbits as a function of the maximum history length (50 to 500)
Figure 4: Accuracy (mispredictions/KI) of TAGE predictors as a function of prediction table size (32 Kbits to 1024 Kbits) for tag widths from 7 to 14 bits
First, let us point out that the accuracy difference between the 8-component O-GEHL predictor and the 8-component TAGE predictor remains approximately constant (around 0.20 misp/KI) when the predictor size varies from 64 Kbits to 1 Mbit.
Second, the O-GEHL predictor also features dynamic history length fitting [16], which allows it to adapt the used history lengths to the behavior of the application; this was shown to bring a significant accuracy benefit. Up to now, we have not found an elegant and efficient way to implement dynamic history length fitting on the TAGE predictor. Despite the absence of dynamic history length fitting, the TAGE predictor outperforms the O-GEHL predictor.
Therefore, to compare the effectiveness of the two prediction combination functions, we have run simulations of an 8-component 64 Kbits O-GEHL predictor where dynamic history length fitting is disabled (the GEHL predictor in [24]). The same geometric series of history lengths is considered for the 64 Kbits GEHL and TAGE predictors, i.e., L(1) = 5 and L(7) = 130. These parameters are close to the best possible parameters for both predictors. The results of these simulations are displayed on a per-benchmark basis in Table 3.
This table clearly illustrates that, despite the TAGE predictor using 16 bits per tagged-component entry against only 4 bits per entry for the O-GEHL predictor, partial tagging is more storage-effective than the adder tree. On every single benchmark, the TAGE predictor is more accurate than the O-GEHL predictor. It is also noticeable that the advantage of the TAGE predictor is large on the benchmarks with large code footprints, i.e., the server benchmarks. The same experiment was performed for a wide range of (L(1), L(7)) pairs, leading to very similar results.
The results displayed in this paper were obtained using 3-bit counters for prediction and 2 bits for the useful counters. On most benchmarks, using 2-bit prediction counters and a single bit for the useful counters on the tagged tables is sufficient. However, on three benchmarks, INT-3, MM-1 and MM-2, some accuracy difference was encountered. This translates into an average of 2.78 misp/KI for an 8-component 57 Kbits TAGE predictor against 2.61 misp/KI for the illustrated 64 Kbits TAGE predictor.
Ahead pipelining slightly degrades prediction accuracy on medium-size predictors, mostly due to aliasing on the base predictor. For instance, on a 64 Kbits 8-component predictor, 3-block ahead pipelining reaches 2.73 misp/KI against 2.61 misp/KI on our benchmark set. This small accuracy gap tends to vanish for larger predictors: for instance, 2.08 misp/KI against 2.05 misp/KI for 1 Mbit 8-component TAGE predictors.
The First Championship Branch Prediction fixed a storage budget of 64 Kbits + 256 bits.

Simple tuning can be performed on the TAGE predictor to make slightly better use of this storage budget. First, to enhance the behavior of the bimodal base predictor, one can share a hysteresis counter between several prediction entries, as proposed for the EV8 predictor [26]. Using one hysteresis bit for four prediction bits appeared to be a good tradeoff.
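A sketch of such a shared-hysteresis bimodal table follows; the update rule shown is one plausible variant of the EV8-style scheme [26], not necessarily the exact one used here.

```cpp
#include <cstdint>
#include <vector>

// Shared-hysteresis bimodal base predictor: 8K prediction bits and 2K
// hysteresis bits, with four prediction entries sharing each hysteresis bit.
struct SharedHysteresisBimodal {
    std::vector<uint8_t> pred = std::vector<uint8_t>(8192, 0); // 1 prediction bit per entry
    std::vector<uint8_t> hyst = std::vector<uint8_t>(2048, 1); // 1 hysteresis bit per 4 entries

    bool predict(uint32_t pc) const { return pred[pc % 8192]; }

    void update(uint32_t pc, bool taken) {
        uint32_t p = pc % 8192, h = p / 4;     // 4 prediction entries share a hysteresis bit
        if (pred[p] == taken) hyst[h] = 1;     // reinforce on a correct prediction
        else if (hyst[h])     hyst[h] = 0;     // a first misprediction weakens the entry
        else                  pred[p] = taken; // a second misprediction flips the prediction bit
    }
};
```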
Second, we remarked that using slightly different tag widths on the different tagged tables makes better use of the storage space. More precisely, the tag width should (slightly) increase with the history length: a false match is more harmful on the tables using the longest histories.
For a 5-component TAGE predictor, we use 8-bit tags for T1 and T2, and 9-bit tags for T3 and T4. The tagged tables feature 1K entries each and represent a total of 54 Kbits.
For an 8-component TAGE predictor, we use 9-bit tags for T1 and T2, 10-bit tags for T3 and T4, 11-bit tags for T5 and T6, and 12-bit tags for T7. The tagged tables feature 512 entries each and represent a total of 53.5 Kbits.
On both predictors, we use a base bimodal predictor consisting of 8K prediction bits and 2K hysteresis bits, i.e., the 5-component and the 8-component CBP-TAGE predictors feature 64 Kbits and 63.5 Kbits respectively. The respective history length series are (5, 15, 44, 130) and (5, 9, 15, 25, 44, 76, 130).
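These storage budgets can be checked from the per-entry costs (3-bit ctr + 2-bit u + tag, plus the 10 Kbits base predictor), as in the sketch below.

```cpp
#include <cstdio>

// Check of the storage budgets quoted above. Each tagged entry holds a
// 3-bit ctr, a 2-bit u, and its table's tag; the base bimodal predictor
// adds 8K prediction bits + 2K hysteresis bits = 10 Kbits.
int main() {
    // 5-component CBP-TAGE: 4 tagged tables of 1K entries, tags 8,8,9,9 bits
    int tags5[] = {8, 8, 9, 9};
    long bits5 = 0;
    for (int t : tags5) bits5 += 1024L * (3 + 2 + t);
    printf("5-comp tagged: %ld Kbits, total %ld Kbits\n",
           bits5 / 1024, (bits5 + 10 * 1024) / 1024);  // 54 and 64 Kbits

    // 8-component CBP-TAGE: 7 tagged tables of 512 entries, tags 9..12 bits
    int tags8[] = {9, 9, 10, 10, 11, 11, 12};
    double bits8 = 0;
    for (int t : tags8) bits8 += 512.0 * (3 + 2 + t);
    printf("8-comp tagged: %.1f Kbits, total %.1f Kbits\n",
           bits8 / 1024, (bits8 + 10 * 1024) / 1024);  // 53.5 and 63.5 Kbits
    return 0;
}
```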
The 8-component CBP-TAGE predictor achieves 2.553 misp/KI and the 5-component CBP-TAGE predictor achieves 2.678 misp/KI. This is to be compared with the 2.820 misp/KI achieved by both the 64 Kbits CBP O-GEHL predictor and the 64 Kbits Gao and Zhou predictor [7].
Benchmark-by-benchmark simulation results are displayed in Table 4.
Table 4: Accuracy of the 64Kbits TAGE predictor (misp/KI) using CBP contest rules
5 The ITTAGE indirect branch target predictor

The cascaded predictor [4] combines a PC-indexed first-level predictor backed with a second-level tagged table indexed with a hash function of the global history and the PC.
In this section, we introduce the ITTAGE indirect target predictor, which is based on the same principles as the TAGE predictor. The ITTAGE predictor outperforms the cascaded indirect branch predictor at equivalent storage budgets.
Remark On architectures featuring a BTB, the BTB can provide the function of the base indirect jump predictor. For the sake of simplicity of the presentation, we do not consider this case in this paper. However, adapting the ITTAGE predictor to an architecture featuring a BTB is straightforward.
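The surviving text does not detail the ITTAGE entry format; by analogy with TAGE, a tagged ITTAGE entry can be sketched as below, with the signed prediction counter replaced by a predicted target and a small confidence counter (field widths are our assumptions).

```cpp
#include <cstdint>

// Hypothetical ITTAGE tagged-component entry, sketched by analogy with
// the TAGE entry: the prediction is a full target address rather than
// a taken/not-taken bit; tag and useful counter play the same roles.
struct ITTageEntry {
    uint32_t target; // predicted indirect branch target
    uint16_t tag;    // partial tag (width per table, as in TAGE)
    uint8_t  conf;   // small confidence counter on the stored target
    uint8_t  u;      // useful counter, as in TAGE
};
```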
Figure 5: ITTAGE predictor structure: a base predictor is backed with several tagged predictor components indexed with increasing history lengths
Figure 6: Total numbers of indirect target mispredictions for the Chang et al. predictor, the cascaded predictor, and the 5- and 8-component ITTAGE predictors, as a function of predictor size (32 Kbits to 512 Kbits)
Figure 7: Total numbers of indirect target mispredictions for 5-component ITTAGE predictors from 38.5 Kbits to 616 Kbits, as a function of the maximum history length (50 to 500)
The storage budgets of the simulated predictors are therefore 77 × 2^n bits for the 5-component ITTAGE predictor and 72.25 × 2^n bits for the 8-component ITTAGE predictor. We also simulated the gshare-like indexed single-table indirect jump predictor from Chang et al. [1] and the 2-table cascaded predictor [4]. For this latter predictor, on the CBP benchmarks, the same number of entries is used on the two tables, a 1-bit confidence counter is used on both levels, and 9 tag bits are used on the second level, since this width is sufficient in practice. For each of the simulated predictors, we illustrate the best configuration (i.e., the best history length series).
Figure 6 shows that the ITTAGE predictor limits the number of mispredictions on indirect jumps. For instance, using a 144.5 Kbits 8-component ITTAGE predictor results in fewer than 39,000 mispredictions, i.e., a 2% misprediction rate, against 65,000 mispredictions, i.e., a 3.4% misprediction rate, when using a 154 Kbits cascaded predictor.
As expected from the experiments on the TAGE and O-GEHL predictors, sensitivity to
variations of the parameters of the geometric series is very low for the ITTAGE predictor.
Figure 7 illustrates this phenomenon on a 5-component ITTAGE predictor.
The ITTAGE predictor is effective at predicting indirect jump targets. While some classes of applications would benefit from the use of such a state-of-the-art indirect target predictor, many other applications do not feature significant numbers of indirect branches (see Table 1). Therefore, the cost-effectiveness of implementing an ITTAGE predictor might be questionable, as it features several extra tables compared with previous indirect jump predictor proposals: using a 72.25 Kbits 8-component predictor leads to 0.09 misp/KI on average over the 20 benchmarks, against 0.16 misp/KI using a 77 Kbits 2-component cascaded predictor and 0.39 misp/KI for a 132 Kbits single-table indirect jump predictor. While the extra cost of a second table in the indirect branch predictor is compensated by a significant overall misprediction reduction, using 3 or 6 extra tables (and the associated logic for index/tag computation and tag check) to implement the ITTAGE predictor does not seem cost-effective.
6 The COTTAGE predictor

On high-end superscalar processors, all the prediction structures (conditional branch predictors, indirect jump predictors, return stack, BTB) must be accessed speculatively. The similarity of the structures of the TAGE and ITTAGE predictors makes it possible to group the conditional branch predictor and the indirect jump predictor in a single physical structure, as illustrated in Figure 8. The COTTAGE predictor (for COnditional and Target TAgged GEometric history length) implements both a TAGE predictor and an ITTAGE predictor. The COTTAGE predictor components store both conditional and indirect jump predictions. Tag and index computation logic is also shared. On each cycle, the conditional branch prediction and the indirect jump prediction are read from the same row.
For cost-effectiveness, the predictor tables must implement more TAGE entries than ITTAGE entries. Therefore, a row in a predictor table stores a single indirect jump prediction and several conditional branch predictions. To simplify Figure 8, the multiplexors needed to select the correct TAGE entry at the exit of each COTTAGE component have been omitted.
The robustness of both the TAGE and ITTAGE predictors to the choice of the parameters of the geometric series of history lengths allows high accuracy on both predictors (see Figures 3 and 7). For instance, when associating an 8-component 72.25 Kbits ITTAGE predictor with a 256 Kbits TAGE predictor, if quasi-optimal parameters for the TAGE predictor are used, such as L(1) = 5 and L(7) = 195, the ITTAGE predictor encounters 58,614 mispredictions, against 54,103 mispredictions for its best parameters, i.e., L(1) = 5 and L(7) = 66.
The COTTAGE predictor appears as a cost-effective solution when one considers the cost of implementing both a conditional branch predictor and an efficient indirect jump predictor: a 5-component COTTAGE predictor provides state-of-the-art prediction accuracy on both conditional branches and indirect jumps while using a total of only 5 storage tables.

Figure 8: The COTTAGE predictor: combining an ITTAGE predictor and a TAGE predictor
7 Conclusion
It is now widely accepted that conditional branch predictors must exploit several different history lengths to capture correlation from very remote branch outcomes as well as very recent branch history [24, 26, 9]. Before this study, recent research on branch prediction [9, 24, 30] seemed to indicate that the most cost-effective way to combine these predictions was through (multiply-)accumulation of the predictions. Among these proposals, the O-GEHL predictor was shown to represent the state of the art at the 1st Championship Branch Prediction in December 2004 [24].
We have shown that, for predictors using a geometric series of history lengths (such as O-GEHL), using partial tags is more storage-effective for achieving high accuracy. For an equal number of predictor tables, the TAGE predictor outperforms the O-GEHL predictor with lower hardware complexity (no use of dynamic history length fitting, shorter global history). Moreover, while combining predictions through (multiply-)accumulation is clearly limited to conditional branch prediction, a very efficient indirect branch target predictor, the ITTAGE predictor, can be designed using the same principles as the TAGE predictor.
Finally, we have presented the COTTAGE predictor, a cost-effective solution for predicting both conditional and indirect branches. An M-component COTTAGE predictor embeds both an M-component TAGE predictor and an M-component ITTAGE predictor, but features only M predictor tables. Moreover, some of the logic (index and tag computation) is also shared between the TAGE and ITTAGE predictors.
The TAGE and ITTAGE predictors inherit most of the properties of the O-GEHL predictor. The hardware complexity of the computation logic is limited and scales only linearly with the number of components in the predictor. The design space of cost-effective COTTAGE predictors is large: for instance, high levels of accuracy are obtained for a broad spectrum of maximum history lengths, ranging from 100 to 500 for medium-size predictors. Moreover, like the other global branch history predictors [27, 9, 23, 30, 24], the COTTAGE predictor can be ahead pipelined to provide predictions in time for the instruction fetch engine, applying the general technique described in [24].
This study has demonstrated the superiority of partial tagging over an adder tree as the prediction combination function for predictors using multiple global-history-indexed components. However, if a predictor also includes components using other information sources (local history, loop counts, value prediction, etc.), another final prediction combination function has to be found.
Simulator distribution
The simulator of the CBP-TAGE predictor described in Section 4 is available on the authors' website at http://www.irisa.fr/caps/.
References
[1] P.-Y. Chang, E. Hao, and Y. N. Patt. Target prediction for indirect jumps. In ISCA
’97: Proceedings of the 24th annual international symposium on Computer architecture,
pages 274–283, New York, NY, USA, 1997. ACM Press.
[2] I.-C. Chen, J. Coffey, and T. Mudge. Analysis of branch prediction via data compres-
sion. In Proceedings of the 7th International Conference on Architectural Support for
Programming Languages and Operating Systems, Oct. 1996.
[4] K. Driesen and U. Holzle. The cascaded predictor: Economical and adaptive branch target prediction. In Proceedings of the 31st Annual International Symposium on Microarchitecture, Dec. 1998.
[6] A. N. Eden and T. Mudge. The YAGS branch predictor. In Proceedings of the 31st Annual International Symposium on Microarchitecture, Dec. 1998.
[7] H. Gao and H. Zhou. Adaptive information processing: An effective way to improve per-
ceptron predictors. Journal of Instruction Level Parallelism (http://www.jilp.org/vol7),
April 2005.
[9] D. Jiménez. Fast path-based neural branch prediction. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, Dec. 2003.
[10] D. Jiménez. Reconsidering complex branch predictors. In Proceedings of the 9th International Symposium on High Performance Computer Architecture, 2003.
[11] D. Jiménez. Piecewise linear branch prediction. In Proceedings of the 32nd Annual International Symposium on Computer Architecture, June 2005.
[12] D. Jiménez and C. Lin. Dynamic branch prediction with perceptrons. In Proceedings of the Seventh International Symposium on High Performance Computer Architecture, 2001.
[13] D. Jiménez and C. Lin. Neural methods for dynamic branch prediction. ACM Trans-
actions on Computer Systems, November 2002.
[15] S. Jourdan, T.-H. Hsing, J. Stark, and Y. N. Patt. The effects of mispredicted-path ex-
ecution on branch prediction structures. In Proceedings of the International Conference
on Parallel Architectures and Compilation Techniques, October 1996.
[16] T. Juan, S. Sanjeevan, and J. J. Navarro. A third level of adaptivity for branch
prediction. In Proceedings of the 25th Annual International Symposium on Computer
Architecture, June 30 1998.
[18] G. Loh and D. Henry. Predicting conditional branches with fusion-based hybrid predic-
tors. In Proceedings of the 11th Conference on Parallel Architectures and Compilation
Techniques, 2002.
[19] S. McFarling. Combining branch predictors. TN 36, DEC WRL, June 1993.

[20] P. Michaud. A PPM-like, tag-based branch predictor. Journal of Instruction Level Parallelism (http://www.jilp.org/vol7), April 2005.
[21] P. Michaud, A. Seznec, and R. Uhlig. Trading conflict and capacity aliasing in condi-
tional branch predictors. In Proceedings of the 24th Annual International Symposium
on Computer Architecture (ISCA-97), June 1997.
[23] A. Seznec. Revisiting the perceptron predictor. Technical Report PI-1620, IRISA, May 2004.
[24] A. Seznec. Analysis of the O-GEHL branch predictor. In Proceedings of the 32nd Annual International Symposium on Computer Architecture, June 2005.
[25] A. Seznec. Genesis of the O-GEHL branch predictor. Journal of Instruction Level Parallelism (http://www.jilp.org/vol7), April 2005.
[26] A. Seznec, S. Felix, V. Krishnan, and Y. Sazeidès. Design tradeoffs for the EV8 branch predictor. In Proceedings of the 29th Annual International Symposium on Computer Architecture, 2002.
[27] A. Seznec and A. Fraboulet. Effective ahead pipelining of the instruction address generator. In Proceedings of the 30th Annual International Symposium on Computer Architecture, June 2003.
[28] A. Seznec, S. Jourdan, P. Sainrat, and P. Michaud. Multiple-block ahead branch pre-
dictors. In Architectural Support for Programming Languages and Operating Systems
(ASPLOS-VII), pages 116–127, 1996.
[29] J. Smith. A study of branch prediction strategies. In Proceedings of the 8th Annual
International Symposium on Computer Architecture, 1981.
[30] D. Tarjan and K. Skadron. Revisiting the perceptron predictor again. Technical Report
CS-2004-28, University of Virginia, September 2004.
[31] L. N. Vintan and M. Iridon. Towards a high performance neural branch predictor. In
IJCNN’99. International Joint Conference on Neural Networks. Proceedings., 1999.