Speculative Execution
in High Performance
Computer Architectures
CHAPMAN & HALL/CRC
COMPUTER and INFORMATION SCIENCE SERIES
Series Editor: Sartaj Sahni
PUBLISHED TITLES
HANDBOOK OF SCHEDULING: ALGORITHMS, MODELS, AND PERFORMANCE ANALYSIS
Joseph Y-T. Leung
THE PRACTICAL HANDBOOK OF INTERNET COMPUTING
Munindar P. Singh
HANDBOOK OF DATA STRUCTURES AND APPLICATIONS
Dinesh P. Mehta and Sartaj Sahni
DISTRIBUTED SENSOR NETWORKS
S. Sitharama Iyengar and Richard R. Brooks
SPECULATIVE EXECUTION IN HIGH PERFORMANCE COMPUTER ARCHITECTURES
David Kaeli and Pen-Chung Yew
CHAPMAN & HALL/CRC COMPUTER and INFORMATION SCIENCE SERIES
Speculative Execution
in High Performance
Computer Architectures
Edited by
David Kaeli
Northeastern University
Boston, MA

Pen-Chung Yew
University of Minnesota
Minneapolis, MN
Published in 2005 by
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
Kaeli, David R.
Speculative execution in high performance computer architectures / David Kaeli and Pen-Chung Yew.
p. cm. -- (Chapman & Hall/CRC computer and information science series)
Includes bibliographical references and index.
ISBN 1-58488-447-9 (alk. paper)
1. Computer architecture. I. Yew, Pen-Chung, 1950- II. Title. III. Series.
QA76.9.A73K32 2005
004'.35--dc22 2005041310
David Kaeli received his B.S. in Electrical Engineering from Rutgers Uni-
versity, his M.S. in Computer Engineering from Syracuse University, and his
Ph.D. in Electrical Engineering from Rutgers University. He is currently an
Associate Professor on the faculty of the Department of Electrical and Com-
puter Engineering at Northeastern University. Prior to 1993, he spent 12
years at IBM, the last 7 at IBM T.J. Watson Research in Yorktown Heights,
N.Y. In 1996 he received the NSF CAREER Award. He currently directs
the Northeastern University Computer Architecture Research Laboratory
(NUCAR). Dr. Kaeli’s research interests include computer architecture and
organization, compiler optimization, VLSI design, trace-driven simulation and
workload characterization. He is an editor for the Journal of Instruction Level
Parallelism, the IEEE Computer Architecture Letters, and a past editor for
IEEE Transactions on Computers. He is a member of the IEEE and ACM.
URL: www.ece.neu.edu/info/architecture/nucar.html
Acknowledgments
Professors Kaeli and Yew would like to thank their students for their help and
patience in the preparation of the chapters of this text. They would also like
to thank their families for their support on this project.
Contents
1 Introduction 1
David R. Kaeli (Northeastern University) and Pen-Chung Yew (University
of Minnesota)
3 Branch Prediction 29
Philip G. Emma IBM T.J. Watson Research Laboratory
4 Trace Caches 87
Eric Rotenberg North Carolina State University
Index 421
List of Tables
12.1 Using control and data speculation to hide memory latency. . 302
12.2 Example 12.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
12.3 Enhanced Φ insertion allows data speculation. . . . . . . . . . 317
12.4 Enhanced renaming allows data speculation . . . . . . . . . . 318
12.5 An example of speculative load and check generation . . . . . 321
12.6 Different representations used to model check instructions and
recovery code . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
12.7 Two examples of multi-level speculation . . . . . . . . . . . . 323
12.8 Examples of check instructions and their recovery blocks in
multi-level speculation . . . . . . . . . . . . . . . . . . . . . . 324
12.9 Examples of check instructions and their recovery blocks in
multi-level speculation . . . . . . . . . . . . . . . . . . . . . . 325
12.10 Example of recovery code generation in speculative PRE . . . 328
12.11 The recovery code introduced in the speculative PRE interacts
with the instruction scheduling . . . . . . . . . . . . . . . . . 329
List of Figures
Advances in VLSI technology will soon enable us to place more than a billion
transistors on a single chip. Given this available chip real estate, new com-
puter architectures have been proposed to take advantage of this abundance
of transistors. The two main approaches that have been used successfully to
improve performance are:
• the ability to identify those events which can be predicted with high
accuracy, and
• the ability to filter out those events which are hard to predict.
This book brings together experts from both academia and industry who are
actively involved in research in the various aspects of speculative execution.
The material presented in this book is organized around four general themes:
1. instruction-level speculation,
2. data-level speculation,
3. compiler-level support and multithreading for speculative execution, and
4. novel speculative execution architectures.
Next we will discuss each of these topics as a brief introduction to this book.
is fetched and the time the instruction that depends on the value of the
load instruction can use it.) These schemes use techniques such as address
prediction and dependency prediction [15]. The emphasis in this chapter is on
out-of-order processor architectures. A taxonomy of address calculation and
address prediction is presented. Address prediction is a special case of value
prediction, and utilizes similar schemes [16]. Issues related to speculative
memory disambiguation in out-of-order processors, as well as empirical data
on the potential of such speculative disambiguation, are also presented.
Chapter 9 presents mechanisms that perform data value speculation. In
general, data dependence speculation tries to speculatively ignore data depen-
dences that are either too ambiguous or too rare to occur [17]. Data value
speculation tries to predict values of the true data dependencies that will
actually occur during program execution. The data dependent instructions
can thus proceed speculatively without waiting for the results of the instruc-
tions they depend on [18]. Various value predictors and the issues related to
their implementation such as their hardware complexity are presented. Is-
sues related to data dependence speculation are described in the context of
data value prediction. Four different approaches are discussed that can be
used to verify the validity of a value prediction, and to recover from a value
mis-prediction [19].
Chapter 10 focuses on an approach that overcomes data access latency and
data dependence by combining aggressive execution and value speculation.
It has been observed that many computations are performed repeatedly and
redundantly [20]. By dynamically removing such redundant computations,
the stalls due to memory latency and data dependences for those computa-
tions can be removed. The approach itself could be implemented as a non-
speculative mechanism (i.e., the block of computations could wait until all of
its live-in inputs are available before it determines whether or not the block
has been computed before, and thus redundant). However, using value predic-
tion for those live-in inputs can provide a significant performance advantage
and can allow us to take large speculative steps through the execution space.
Empirical data is presented in this chapter which demonstrates the potential
effectiveness of this scheme.
References

Chapter 2
Instruction Cache Prefetching

Glenn Reinman
UCLA Computer Science Department
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Scaling Trends . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Instruction Cache Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Instruction Cache Prefetching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Future Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.1 Introduction
Instruction delivery is a critical component of modern microprocessors.
There is a fundamental producer/consumer relationship that exists between
the instruction delivery mechanism and the execution core of a processor.
A processor can only execute instructions as fast as the instruction delivery
mechanism can supply them.
Instruction delivery is complicated by the presence of control instructions
and by the latency and achievable bandwidth of instruction memory. This
chapter will focus on the latter problem, exploring techniques to hide in-
struction memory latency with aggressive instruction cache prefetching.
FIGURE 2.1: Instruction delivery mechanism: High level view of the structures of the front-end.
to their size, thus the latency of the logic scales linearly with feature size
reductions.
The latency of the wordlines and bitlines, on the other hand, does not scale
as well due to parasitic capacitance effects that occur between the closely
packed wires that form these buses. As the technology is scaled to smaller
feature sizes, the thickness of the wires does not scale. As a result, the parasitic
capacitance formed between wires remains fixed in the new process technology
(assuming wire length and spacing are scaled similarly).
Since wire delay is proportional to its capacitance, signal propagation delay
over the scaled wire remains fixed even as its length and width are scaled.
This effect is what creates the interconnect scaling bottleneck. Since on-chip
memory tends to be very wire congested (wordlines and bitlines are run to
each memory cell), the wires in the array are narrowly spaced to minimize the
size of the array. As a result, these wires are subject to significant parasitic
capacitance effects. Agarwal et al. [AHKB00] conclude that architectures
which require larger components will scale more poorly than those with
smaller components. They further conclude that larger caches may need to
pay substantial delay penalties.
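
To make the argument concrete, consider a simplified first-order model (an
illustrative sketch, not taken from this chapter). Treat the coupling
capacitance between two adjacent wires of length L, thickness T, and spacing
S as a parallel-plate capacitance driven through a fixed driver resistance:
\[
  C_{\mathrm{couple}} \approx \varepsilon \,\frac{T\,L}{S},
  \qquad
  \tau \approx R_{\mathrm{drv}}\, C_{\mathrm{couple}}.
\]
Scaling the wire length and spacing by the same factor $k < 1$ while the
thickness stays fixed gives
\[
  C'_{\mathrm{couple}} \approx \varepsilon \,\frac{T\,(kL)}{kS}
  = C_{\mathrm{couple}},
\]
so the signal delay $\tau$ is unchanged even though the wire itself has
become shorter.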
loaded into a portion of memory, and the PC represents the memory address
where a given instruction is stored.
The instruction cache stores a subset of the instructions in memory, dy-
namically swapping lines of instructions in and out of the cache to match the
access patterns of the application. The size of the lines that are swapped in
and out of the cache depends on the line size of the cache. Larger line sizes
can exploit more spatial locality in instruction memory references, but con-
sume more memory bandwidth on cache misses. The line size, along with the
associativity and number of sets of the cache, influences the size, latency, and
energy consumption of the cache on each access.
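
To make these parameters concrete, the following sketch (in Python, with
illustrative geometry values that are not taken from the chapter) decomposes
an instruction address into its tag, set index, and line offset for a
parameterized cache:

# Sketch (illustrative geometry, not from the chapter): how line size,
# associativity, and number of sets carve up an instruction address.

def cache_geometry(size_bytes, line_size, assoc):
    """Return (num_sets, offset_bits, index_bits) for the given geometry."""
    num_sets = size_bytes // (line_size * assoc)
    offset_bits = line_size.bit_length() - 1      # log2 for power-of-two sizes
    index_bits = num_sets.bit_length() - 1
    return num_sets, offset_bits, index_bits

def decompose(addr, offset_bits, index_bits):
    """Split an instruction address into (tag, set index, line offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

if __name__ == "__main__":
    # Hypothetical 32 KB, 2-way cache with 64-byte lines.
    num_sets, off_b, idx_b = cache_geometry(32 * 1024, 64, 2)
    for pc in (992, 512, 544):        # line addresses from Figure 2.3
        print(pc, decompose(pc, off_b, idx_b))

Doubling the line size shifts one bit from the index into the offset, which
is exactly the trade-off described above: more spatial locality per fill, but
fewer sets and more memory bandwidth consumed per miss.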
The more instruction addresses that hit in the instruction cache, the more
memory latency that can be hidden. However, architects must balance this
against the latency and energy dissipation of the cache. The latency of the
instruction cache access impacts the branch misprediction pipeline loop of
the processor. Therefore, the processor must be designed to make the most
efficient use of the available cache space to maximize latency reduction.
Unlike data cache misses, where aggressive instruction scheduling and large
instruction windows can help hide memory access latency, instruction cache
misses are more difficult to tolerate. Instructions are typically renamed in
program order, which means that a single instruction cache miss can stall the
pipeline even when subsequent accesses hit in the cache.
Typically the line size is selected based on the target bandwidth required to
feed the execution core. An entire cache line is driven out, and a contiguous
sequence of instructions are selected from this line based on branch prediction
information and the starting PC.
Figure 2.2 demonstrates some alternatives for instruction cache design.
These alternatives differ in the latency and energy efficiency of the cache, and
in the complexity of the implementation.
FIGURE 2.2: Instruction cache design alternatives: direct mapped, serial access set associative, parallel access set associative, and way prediction.
A serial access cache does not overlap the tag and data component accesses
as in the direct mapped cache and therefore will have longer latency. But the
energy efficiency is close to that of a direct mapped cache since at most one
line will be driven out of the data component per access. In fact, it may even
provide an energy benefit over the direct mapped cache since there must be a
cache hit for a cache line to be driven out. This benefit would depend on how
often the cache misses and how much of an energy impact the associative tag
comparison has.
A parallel access cache approaches the latency benefits of the direct mapped
cache, as the latency of this cache is the maximum of the tag and data com-
ponent latency, plus some delay for the output driver. However this cache
can use dramatically more energy than the direct mapped cache. On a given
access to an n-way set associative cache, n different cache lines will be driven
to the data output part of the cache until the tag component can select one.
predictor time must be much less than the tag component, to parallelize the
tag and data component accesses as much as possible. If the way prediction
is correct, only a single cache line will be driven out of the data component,
providing the energy benefits of the serial access set associative cache with
the timing benefits of the parallel access set associative cache. However, if the
way prediction is wrong, the correct cache line will be driven in the following
cycle based on the tag access. In this case, the latency of the access is no
worse than the serial access set associative cache, but the energy dissipation
is slightly worse since two cache lines are driven on a way misprediction. The
way predictor must be designed to be fast enough to parallelize the access
time to the cache components, small enough to avoid contributing to energy
dissipation, but large enough to provide reasonable prediction accuracy.
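
The trade-offs among these four organizations can be summarized with a toy
model. The sketch below uses made-up latency and energy constants (they are
placeholders, not measurements from this chapter) to show how way-prediction
accuracy moves a design between the serial and parallel extremes:

# Toy latency/energy model for the instruction cache organizations of
# Figure 2.2. All constants are illustrative placeholders, not measurements.

TAG_LAT, DATA_LAT, DRIVE_LAT = 1.0, 1.2, 0.2    # arbitrary time units
TAG_E, LINE_E = 1.0, 4.0                        # arbitrary energy units
WAYS = 4

def direct_mapped():
    # Tag and data read in parallel; exactly one line is driven out.
    return max(TAG_LAT, DATA_LAT) + DRIVE_LAT, TAG_E + LINE_E

def serial_set_assoc():
    # Tag lookup completes before at most one line is read from the data array.
    return TAG_LAT + DATA_LAT + DRIVE_LAT, TAG_E + LINE_E

def parallel_set_assoc():
    # All ways are driven while the tags are compared: fast but energy hungry.
    return max(TAG_LAT, DATA_LAT) + DRIVE_LAT, TAG_E + WAYS * LINE_E

def way_predicted(accuracy):
    # Correct prediction: one line driven; misprediction: a second line is
    # driven in the following cycle after the tag access resolves.
    fast = max(TAG_LAT, DATA_LAT) + DRIVE_LAT
    slow = TAG_LAT + DATA_LAT + DRIVE_LAT
    latency = accuracy * fast + (1 - accuracy) * slow
    energy = TAG_E + LINE_E + (1 - accuracy) * LINE_E
    return latency, energy

for name, (lat, e) in [("direct mapped", direct_mapped()),
                       ("serial access", serial_set_assoc()),
                       ("parallel access", parallel_set_assoc()),
                       ("way predicted @ 90%", way_predicted(0.9))]:
    print(f"{name:>20}: latency={lat:.2f}  energy={e:.2f}")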
992, 512, 544, 576, 352, 384, 416, 768, 800, ...
FIGURE 2.3: Sample instruction address stream: This sample stream of
cache line addresses will be used in subsequent figures. The stream starts
with instruction memory address 992.
To limit the number of prefetches, Smith [Smi82] suggested that each in-
struction cache line be augmented with a single bit to indicate whether or not
the next consecutive line should be prefetched. On an instruction cache miss,
the bit for the line that missed would be set. On a cache access, if the bit
is set for the line that is read from the cache, the next sequential cache line
would be prefetched from memory, and the bit for the current line would be
cleared.
For the sample instruction stream shown in Figure 2.3, next line prefetching
would capture the misses at 544, 384, 416, and 800.
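
A minimal sketch of this single-bit scheme, replayed on the Figure 2.3
stream, follows. It assumes 32-byte lines, a cache warmed with lines 992 and
576 (as in the later figures), and that the bit is also set on lines brought
in by a prefetch so that a sequential run can chain prefetches; under those
assumptions it reproduces the captured misses listed above.

LINE = 32
stream = [992, 512, 544, 576, 352, 384, 416, 768, 800]

cache = {992, 576}          # lines already resident (matches the later figures)
prefetch_bit = {}           # line address -> bit
captured = []               # misses that the prefetcher turned into hits

for addr in stream:
    if addr not in cache:                # demand miss: fetch line, set its bit
        cache.add(addr)
        prefetch_bit[addr] = True
    elif prefetch_bit.get(addr, False):  # hit on a line brought in by prefetch
        captured.append(addr)
    if prefetch_bit.get(addr, False):    # bit set: prefetch next line, clear bit
        prefetch_bit[addr] = False
        nxt = addr + LINE
        if nxt not in cache:
            cache.add(nxt)
            prefetch_bit[nxt] = True

print("misses captured by prefetching:", captured)   # [544, 384, 416, 800]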
quential cache lines, starting with the line that missed, are prefetched into
a small, FIFO queue (the stream buffer). The prefetch requests are over-
lapped, so the latency of the first line that misses can hide the latency of
any subsequent misses in the stream. Subsequent work has considered ful-
ly associative stream buffers [FJ94], non-unit stride prefetches of sequential
cache lines [PK94], and filtering unnecessary prefetches that already exist in
the instruction cache [PK94]. This latter enhancement is critical in reduc-
ing the number of prefetches generated by stream buffers. Idle cache ports
or replicated tag arrays can be used to perform this filtering. A redundant
prefetch (one that already exists in the instruction cache) not only represents
a wasted opportunity to prefetch something useful, but also represents wast-
ed memory bandwidth that could have been used to satisfy a demand miss
from the instruction or data cache, or even a data cache prefetch. Multiple
stream buffers can be coordinated together and probed in parallel with the
instruction cache for cache line hits.
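
A compact sketch of a single sequential stream buffer with this kind of
redundancy filter is shown below; the FIFO depth, line size, and warm cache
contents are illustrative assumptions, chosen so that the example mirrors the
miss on address 512 discussed with Figure 2.4.

from collections import deque

LINE, DEPTH = 32, 4     # illustrative line size and stream buffer depth

def allocate_stream_buffer(miss_addr, resident_lines, depth=DEPTH):
    """On a cache miss, fill a FIFO stream buffer with sequential cache lines,
    starting with the line that missed, filtering out lines that already
    reside in the instruction cache."""
    buf = deque(maxlen=depth)
    addr = miss_addr
    while len(buf) < depth:
        if addr not in resident_lines:   # redundancy filter ([PK94]-style)
            buf.append(addr)
        addr += LINE
    return buf

# Replaying the miss on address 512: 576 is filtered because it is already
# resident, so the buffer goes on to prefetch 608, as described for Figure 2.4.
icache_lines = {992, 576}
print(list(allocate_stream_buffer(512, icache_lines)))   # [512, 544, 608, 640]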
With a conventional stream buffer approach, contiguous addresses will be
prefetched until the buffer fills. The buffer must then stall until one of the
cache lines in the buffer is brought into the cache (on a stream buffer hit)
or until the buffer itself is reallocated to another miss. Confidence counters
associated with each buffer can be used to guide when a buffer should be
reallocated. One policy might be to use one saturating two-bit counter with
each stream buffer. On a stream buffer hit, the counter for that buffer is
incremented. When all stream buffers miss, the counters for all buffers are
decremented. On a cache miss (and when all stream buffers miss), a stream
buffer is selected for replacement if the counter for that stream buffer is cur-
rently 00. If no stream buffers exist with a cleared confidence counter (set to
00), then a buffer is not allocated for that cache miss. The counters would not
overflow or underflow. This policy would allow stream buffers that are suc-
cessfully capturing the cache miss pattern of the incoming instruction address
to continue prefetching cache lines, and would deallocate buffers that are no
longer prefetching productive streams. Similar policies can also be used to
guide what stream buffer should be allowed access to a shared memory or L2
port for the next prefetch request.
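
The policy just described can be sketched as follows. The number of buffers,
the line size, and the prefetch depth are illustrative assumptions; the
saturating two-bit counters follow the increment/decrement rules in the text.

LINE, DEPTH = 32, 4          # illustrative line size and prefetch depth

class StreamBuffer:
    def __init__(self):
        self.lines = set()   # prefetched line addresses (fetch details elided)
        self.conf = 0        # saturating two-bit confidence counter (0..3)
        self.valid = False

def on_cache_miss(buffers, miss_line):
    """Probe all stream buffers on a cache miss and apply the confidence
    policy: a hit increments that buffer's counter; when every buffer misses,
    all counters are decremented and only a buffer whose counter is 00 (or an
    invalid buffer) may be reallocated to the new miss stream."""
    hit = next((b for b in buffers if b.valid and miss_line in b.lines), None)
    if hit is not None:
        hit.conf = min(3, hit.conf + 1)          # saturate, never overflow
        return hit
    for b in buffers:
        b.conf = max(0, b.conf - 1)              # saturate, never underflow
    victim = next((b for b in buffers if not b.valid or b.conf == 0), None)
    if victim is not None:                       # else: no buffer is allocated
        victim.valid, victim.conf = True, 0
        victim.lines = {miss_line + i * LINE for i in range(DEPTH)}
    return victim

# Usage sketch: a stream that keeps hitting one buffer keeps its confidence up.
buffers = [StreamBuffer() for _ in range(4)]
for line in (512, 544, 576, 9000):
    on_cache_miss(buffers, line)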
Figure 2.4 demonstrates the stream buffer in action on our instruction
stream example. The PC, instruction cache, result fetch queue, and four
stream buffer entries are shown. The V field in the stream buffer indicates
that the entry is valid and is tracking an in-flight cache line. The R field in
the stream buffer indicates that the entry is ready – that it has arrived from
the other levels of the memory hierarchy.
Cycle k sees a successful cache access for address 992. In cycle k+1, the
instruction cache misses on address 512, stalling fetch. A stream buffer is
allocated for the miss. Each cycle, the next contiguous cache line in memory
is brought into the cache. Assuming that the stream buffer uses the filtering
approach of [PK94], address 576 will not be brought into the stream buffer
since it already exists in the instruction cache. Eventually, the stream buffer
prefetches address 608, which is not referenced in the example in Figure 2.3.
In this example, the stream buffer would fill with addresses that are not
referenced in the near future, and assuming only a single stream buffer, this
buffer would likely be reallocated to a new miss stream. However, there may
be some benefit to these prefetches even if the buffer is reallocated, as they
may serve to warm up the L2 cache.

FIGURE 2.4: The stream buffer in action on the first part of the example
from Figure 2.3.

FIGURE 2.5: Out-of-order fetch architecture with a lockup-free instruction
cache.
chitecture continues to supply instructions after the cache miss to the result
queue. A placeholder in the queue tracks where instructions in in-flight cache
lines will be placed once they have been fetched from other levels of the mem-
ory hierarchy. However, this architecture still maintains in-order semantics
as instructions leave this queue to be renamed. If the next instruction in the
result queue to be renamed/allocated is still in-flight, renaming/allocation
will stall until the instructions return from the other levels of the memory
hierarchy.
Out-of-order fetch requires more sophistication in the result fetch queue
implementation to manage and update the placeholders from the MSHRs,
and to stall when a placeholder is at the head of the result queue (i.e., the
cache line for that placeholder is still in-flight).
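
The in-order hand-off through the result fetch queue can be modeled with a
small sketch (the interfaces and sizes below are illustrative assumptions,
not the chapter's design):

from collections import deque

class ResultFetchQueue:
    """Result fetch queue for out-of-order fetch: entries are either fetched
    cache lines or placeholders for lines still in flight (tracked by MSHRs);
    rename consumes entries strictly in order and stalls on a placeholder."""

    def __init__(self):
        self.q = deque()       # each entry: [line_address, ready_flag]
        self.inflight = {}     # line_address -> its placeholder entry

    def push_hit(self, line):              # cache hit: line is ready at once
        self.q.append([line, True])

    def push_miss(self, line):             # cache miss: allocate a placeholder
        entry = [line, False]
        self.q.append(entry)
        self.inflight[line] = entry

    def fill(self, line):                  # line returns from the L2/memory
        self.inflight.pop(line)[1] = True

    def pop_for_rename(self):              # in-order: None means rename stalls
        if self.q and self.q[0][1]:
            return self.q.popleft()[0]
        return None

# Usage, following the start of Figure 2.6: 992 hits, 512 and 544 miss, 576 hits.
rfq = ResultFetchQueue()
rfq.push_hit(992); rfq.push_miss(512); rfq.push_miss(544); rfq.push_hit(576)
print(rfq.pop_for_rename())   # 992
print(rfq.pop_for_rename())   # None -- 512 is still in flight, rename stalls
rfq.fill(512)
print(rfq.pop_for_rename())   # 512 -- delivered in order once it arrives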
Figure 2.6 illustrates our example address stream on the out-of-order fetch
architecture. The PC, instruction cache, result fetch queue, and four MSHRs
are shown. The V field in the MSHR indicates that the entry is valid and
is tracking an in-flight cache line. The R field in the MSHR indicates that
the entry is ready – that it has arrived from the other levels of the memory
hierarchy.
Cycle k sees a successful cache access for address 992, and that cache line
is placed in the result fetch queue. In cycle k+1, address 512 misses in the
instruction cache and is allocated to an MSHR. A placeholder for this line is
installed in the result fetch queue. If there are no other lines in the result fetch
queue, stages fed by this queue must stall until the line is ready. However,
instruction fetch does not stall. In the next cycle, address 544 also misses in
the cache and is also allocated an MSHR. A placeholder for this address is
also placed in the queue. In cycle k+3, address 576 hits in the cache, and
the corresponding cache line is placed in the queue. The stages that consume
entries from this queue (i.e., rename) must still stall since they maintain in-
order semantics and must wait for the arrival of address 512. Once the cache
line for address 512 arrives, it will be placed in the instruction cache and
the MSHR for that address will be deallocated. In this simple example, the
latency of the request for address 512 helped to hide the latency of 544, 320,
and 352. More address latencies could be hidden with more MSHR entries and
assuming sufficient result fetch queue entries. This approach is also heavily
reliant on the branch prediction architecture to provide an accurate stream of
instruction addresses.
The similarity between MSHRs and stream buffers should be noted, as both
structures track in-flight cache lines. The key difference is that MSHRs track
demand misses and only hold instruction addresses, while stream buffers hold
speculative prefetches and hold the actual instruction memory itself.
FIGURE 2.6: Out-of-order fetch in action on the first part of the example
from Figure 2.3.
FIGURE 2.7: The fetch directed prefetching architecture.
FIGURE 2.8: Fetch directed prefetching in action on the first part of the example from Figure 2.3 (cycle k: cache hit on 992, prefetch miss on 512 allocates a stream buffer entry; cycle k+1: cache miss on 512 stalls the cache, prefetch miss on 544 allocates a stream buffer entry).
Out-of-order fetch is limited by the number of entries in the result fetch queue, a structure which stores instruc-
tions. Fetch directed prefetching is limited by the number of entries in the
fetch target queue, a structure which stores fetch addresses. Occupancy in ei-
ther of these queues allows these mechanisms to look further ahead at the fetch
stream of the processor and tolerate more latency, but the fetch target queue
uses less space for the same number of entries. Fetch directed prefetching is
only limited by the bandwidth of the branch predictor. To scale the amount
of prefetching, the branch predictor need only supply larger fetch blocks. To
scale prefetching with accurate filtering, the tag component of the cache must
have more ports or must be replicated. In order to scale the number of cache
lines that can be allocated to MSHRs by out-of-order fetch in a single cycle,
the branch predictor bandwidth must be increased and the number of ports
on the instruction cache must increase.
One other difference is that prefetches can often start slightly before out-
of-order fetches. Since prefetching is a lookahead mechanism, a prefetch can
be initiated at the same time that a cache line is fetched from the instruction
cache. This is illustrated in the simple example of Figure 2.8 where address
512 is prefetched one cycle earlier than in out-of-order fetch.
A cache line cannot be kicked out if its CCT entry is not zero. If the CCT entry saturates, the tag
component will not verify any more requests for that cache line until the CCT
counter is decremented for that entry. In addition to providing a consistency
mechanism, the CCT also provides an intelligent replacement policy. The tag
component does not stall on an instruction cache miss, and therefore it can
run ahead of the data component. The CCT then reflects the near future
use pattern of the instruction cache, and can help guide cache replacement.
The larger the cache block queue, the further ahead the CCT can look at
the incoming fetch stream. This of course requires accurate branch prediction
information.
On a cache miss, the integrated prefetching mechanism allocates an entry
in the stream buffer. The stream buffer can bring in the requested cache
line while the tag component of the cache continues to check the incoming
fetch stream. Once the missed cache line is ready, it can either be installed
into the instruction cache (if the CCT can find a replacement line) or can
be kept in the stream buffer. This allows the stream buffer to be used as a
flexible repository of cache lines, providing extra associativity for cache sets
with heavy thrashing behavior.
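
A sketch of how a CCT can both protect lines with pending uses and guide
replacement is given below; the counter width and the set/interface structure
are illustrative assumptions rather than details taken from [RCA02].

CCT_MAX = 3    # illustrative saturating counter limit (two bits)

class DecoupledCacheSet:
    """One set of a decoupled instruction cache. The tag component increments
    a line's CCT entry for every verified pending use; the data component
    decrements it when the line is actually read; a line may be replaced only
    once its CCT entry has drained back to zero."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.cct = {line: 0 for line in lines}

    def tag_verify(self, line):
        if self.cct[line] >= CCT_MAX:
            return False                   # saturated: stop verifying requests
        self.cct[line] += 1
        return True

    def data_read(self, line):
        self.cct[line] = max(0, self.cct[line] - 1)

    def pick_victim(self):
        # Only lines with no pending near-future uses may be kicked out.
        candidates = [l for l in self.lines if self.cct[l] == 0]
        return candidates[0] if candidates else None

s = DecoupledCacheSet([512, 544])
s.tag_verify(512)           # tag component runs ahead: 512 will be needed soon
print(s.pick_victim())      # 544 -- 512 is protected by its nonzero CCT entry
s.data_read(512)
print(s.pick_victim())      # 512 -- its pending use has drained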
Figure 2.9 demonstrates the integrated prefetching architecture of [RCA02].
The stream buffer is also decoupled, and requires a CCT of its own to guide
replacement. The tag components of both the stream buffer and instruction
cache are accessed in parallel. There is a single shared cache block queue to
maintain in-order fetch. Each entry in the cache block queue is consumed by
the data component of both the instruction cache and stream buffer. However,
only one of the two data components will be read depending on the result of
the tag comparison.
FIGURE 2.9: The integrated prefetching architecture: instruction cache and stream buffer, each with decoupled tag and data arrays and its own CCT, sharing a cache block queue that feeds the execution core.
the correct path, and that the wrong path prefetch may itself evict useful
blocks.
One way to preserve entries in the stream buffers, but still allow new
prefetch requests, is to add another bit to each stream buffer entry that
indicates whether or not that entry is replaceable. If the bit is set, that
means that while the entry may be valid, it is from an incorrectly speculated
path. Therefore, it may be overwritten if there is demand for more prefetches
or misses. But assuming the entry is valid and ready, it should still be probed
on stream buffer accesses to see if there is a wrong-path prefetch hit.
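
A small sketch of this policy follows (the entry format is an illustrative
assumption): wrong-path entries remain probeable, but they are preferred
victims when space is needed for a new prefetch or miss.

class SBEntry:
    def __init__(self, line, ready=False, wrong_path=False):
        self.line = line
        self.valid = True
        self.ready = ready
        self.replaceable = wrong_path   # set when the prefetch came from a
                                        # squashed (wrong) speculative path

def probe(entries, line):
    """Wrong-path entries are still probed: a valid, ready match is a hit."""
    return any(e.valid and e.ready and e.line == line for e in entries)

def pick_victim(entries):
    """Prefer invalid entries, then replaceable (wrong-path) ones."""
    for e in entries:
        if not e.valid:
            return e
    for e in entries:
        if e.replaceable:
            return e
    return None                         # nothing replaceable: prefetch waits

entries = [SBEntry(512, ready=True, wrong_path=True), SBEntry(544, ready=True)]
print(probe(entries, 512))              # True  -- wrong-path prefetch hit
print(pick_victim(entries).line)        # 512   -- wrong-path entry evicted first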
References
[CLM97] I.K. Chen, C.C. Lee, and T.N. Mudge. Instruction prefetching
using branch prediction information. In International Conference
on Computer Design, pages 593–601, October 1997.