AMD-K5 Processor Technical Reference Manual (November 1996)
Advanced Micro Devices reserves the right to make changes in its products
without notice in order to improve design or performance characteristics.
Trademarks:
AMD, the AMD logo and combinations thereof are trademarks of Advanced Micro Devices, Inc.
Am386 and Am486 are registered trademarks, and AMD-K5 and K86 are trademarks of Advanced Micro Devices, Inc.
Microsoft and Windows are registered trademarks and Windows NT is a trademark of Microsoft.
Other product names used in this publication are for identification purposes only and may be trademarks of their
respective companies.
18524C/0—Nov1996 AMD-K5 Processor Technical Reference Manual
Contents
1 Overview
1.1 Features
4 Performance
4.1 Code Optimization
4.1.1 General Superscalar Techniques
4.1.2 Techniques Specific to the AMD-K5 Processor
4.2 Dispatch and Execution Timing
4.2.1 Notation
4.2.2 Integer Instructions
4.2.3 Integer Dot Product Example
4.2.4 Floating-Point Instructions
List of Figures
Figure 2-1. Internal Architecture, with Pipeline Stage
Figure 2-2. Pipeline Stage Functions
Figure 3-1. Control Register 4 (CR4)
Figure 3-2. 4-Kbyte Paging Mechanism
Figure 3-3. 4-Mbyte Paging Mechanism
Figure 3-4. Page-Directory Entry (PDE)
Figure 3-5. Page-Table Entry (PTE)
Figure 3-6. EFLAGS Register
Figure 3-7. Task State Segment (TSS)
Figure 3-8. Machine-Check Address Register (MCAR)
Figure 3-9. Machine-Check Type Register (MCTR)
Figure 5-1. Signal Groups
Figure 5-2. Single-Transfer Memory Read and Write
Figure 5-3. Single-Transfer Memory Write Delayed by EWBE Signal
Figure 5-4. I/O Read and Write
Figure 5-5. Single-Transfer Misaligned Memory and I/O Transfers
Figure 5-6. Burst Reads
Figure 5-7. Burst Read (NA Sampled)
Figure 5-8. Burst Writeback Due To Cache-Line Replacement
Figure 5-9. AHOLD-Initiated Inquire Miss
Figure 5-10. AHOLD-Initiated Inquire Hit to Shared or Exclusive Line
Figure 5-11. AHOLD-Initiated Inquire Hit to Modified Line
Figure 5-12. Basic BOFF Operation
Figure 5-13. BOFF-Initiated Inquire Hit to Modified Line
Figure 5-14. HOLD-Initiated Inquire Hit to Shared or Exclusive Line
Figure 5-15. HOLD-Initiated Inquire Hit to Modified Line
Figure 5-16. Basic Locked Operation
Figure 5-17. TLB Miss (4-Kbyte Page)
Figure 5-18. Locked Operation with BOFF Intervention
Figure 5-19A. Interrupt Acknowledge Operation Part 1
Figure 5-19B. Interrupt Acknowledge Operation Part 2
Figure 5-19C. Interrupt Acknowledge Operation Part 3
Figure 5-20. Basic Special Bus Cycle (Halt Cycle)
Figure 5-21. Shutdown Cycle
Figure 5-22. FLUSH-Acknowledge Cycle
Figure 5-23. Cache-Invalidation Cycle (INVD Instruction)
Figure 5-24A. Cache-Writeback and Invalidation Cycle (WBINVD Instruction) Part 1
List of Tables
Table 2-1. ALU Instruction Classes
Table 2-2. Cache States for Read and Write Accesses
Table 2-3. Cache States for Snoops, Invalidation, and Replacements
Table 2-4. Snoop Action
Table 3-1. Control Register 4 (CR4) Fields
Table 3-2. Page-Directory Entry (PDE) Fields
Table 3-3. Page-Table Entry (PTE) Fields
Table 3-4. Virtual-Interrupt Additions to EFLAGS Register
Table 3-5A. Instructions that Modify the IF or VIF Flags—Real Mode
Table 3-5B. Instructions that Modify the IF or VIF Flags—Protected Mode
Table 3-5C. Instructions that Modify the IF or VIF Flags—Virtual-8086 Mode
Table 3-5D. Instructions that Modify the IF or VIF Flags—Virtual-8086 Mode Interrupt Extensions (VME)
Table 3-5E. Instructions that Modify the IF or VIF Flags—Protected Mode Virtual Interrupt Extensions (PVI)
Table 3-6. Interrupt Behavior and Interrupt-Table Access
Table 3-7. Machine-Check Type Register (MCTR) Fields
Table 3-8. CPU Clock Frequencies, Bus Frequencies, and P-Rating Strings
Table 4-1. Integer Instructions
Table 4-2. Integer Dot Product Internal Operations Timing
Table 4-3. Floating-Point Instructions
Table 5-1. Summary of Signal Characteristics
Table 5-2. Conditions for Driving and Sampling Signals
Table 5-3. Summary of Interrupts and Exceptions
Table 5-4. Address-Generation Sequence During Bursts
Table 5-5. Relation Of BE7–BE0 To Other Signals
Table 5-6. Encodings For Special Bus Cycles
Table 5-7. Processor-to-Bus Clock Ratios
Table 5-8. Outputs Floated When BOFF is Asserted
Table 5-9. MESI-State Transitions for Reads
Table 5-10. Relation Between D63–D0, BE7–BE0, and DP7–DP0
Table 5-11. MESI-State Transitions for Inquire Cycles
Table 5-12. Outputs Floated When HLDA is Asserted
Table 5-13. Interrupt Acknowledge Operation Definition
Table 5-14. PWT, Writeback/Writethrough, and MESI
Table 5-15. Register State After RESET or INIT
Preface
This manual describes the technical features of the AMD-K5™ processor, and its
differences from the Pentium processor, at a level of detail suitable for a hardware
designer or system-software developer to implement system boards, core system
logic, and system software. Specifically, the manual describes the following aspects
of the processor:
■ Internal architecture
■ Software differences from the 486 and Pentium processors
■ Performance parameters
■ Bus signal functions
■ Bus cycle timing
■ Design issues for system-board designs
■ Test and debugging features
A full description of the x86 programming environment is beyond the scope of this
manual. Instead, the software sections describe differences from the 486 processor’s
programming environment. A list of commercial books that describe the x86 pro-
gramming environment and other subjects of potential interest appears at the end of
this preface.
Notation
The following notation is used in this manual:
b—Binary
d—Decimal
h—Hexadecimal
CS:EIP—A logical address, expressed as a segment selector (CS) and offset (EIP)
Terminology
The following definitions apply throughout this document:
■ Pin and Signal—A pin is a piece of metal on the processor’s package. A signal is
the information about logical states that a pin carries. Pins have pin numbers; sig-
nals have signal names. On processors that multiplex signals, pins can carry more
than one signal; the AMD-K5 processor, however, does not multiplex signals in
this manner.
■ Assert and Negate—A signal that is driven or sampled active is asserted. A signal
that is inactive is negated. In general, asserted means sampled asserted either by
the processor or target logic. Signals that are active in a Low-voltage state, such as
BRDY, are shown with an overbar. Signals that are active in a High-voltage state,
such as INTR, are shown without an overbar. Dual-state signals, such as R/S and
WB/WT, have two states of assertion and, therefore, the term asserted has no
meaning; such dual-state signals are driven High or Low.
■ Drive and Sample—A single-state signal is driven when it is asserted or negated by
a logic device; it is sampled when its driven state is detected by another device.
■ Cycle and Clock—This term commonly refers to at least four different things:
• Bus-clock period: The cycle time of the CLK signal.
• Processor-clock period: The cycle time of the processor’s internal clock, which
has a frequency relative to CLK that is determined by the state of the BF sig-
nal(s) during RESET. Whenever this cycle is meant, such as in the Chapter 4
description of pipeline timing and the instruction latency, the full name, pro-
cessor-clock cycle, is used.
• Bus cycle: A signal protocol on the processor’s bus, such as a single-transfer
read cycle or a special bus cycle.
• Sequence of bus cycles: One or more contiguous bus cycles. For example, the two
bus cycles that constitute an interrupt acknowledgment are called a bus opera-
tion, so that the constituent bus cycles can be distinguished from the entire op-
eration.
cache tags. The AMD-K5 and Pentium processors both support inquire cycles.
• Internal Snooping: These snoops are initiated by the processor (rather than
system logic) during certain types of cache accesses. Both the AMD-K5 and
Pentium microprocessors support this type of internal snooping for the purpose
of detecting self-modifying code. See Section 2.3.6 for details.
• Bus Watch: Some caching devices watch their address and data bus continu-
ously while they are held off the bus. They compare every address driven by
another bus master with their internal cache tags, and they may also be able to
update their cached lines during writebacks to memory by another bus master.
Neither the AMD-K5 nor Pentium microprocessors support bus watching.
■ Cold and Warm Reset—The terms cold or hard reset and warm or soft reset are
commonly used to mean three related but different things, and the terms are
therefore avoided. A cold or hard reset typically refers to the assertion of RESET
at power-up, but warm or soft reset can refer either to the assertion of RESET
after power-up or to the assertion of INIT.
■ System Logic—Any logic outside the processor, including a core-logic chipset,
another bus master, or separate controllers for L2 cache, memory, interrupts,
DMA, communications, video, bus bridging, bus arbitration, or any other system
function.
References
Abel, Peter. IBM PC Assembly Language and Programming. Englewood Cliffs: Prentice
Hall, 1995.
Abramovici, Miron; Melvin A. Breuer; and Arthur D. Friedman. Digital Systems Test-
ing and Testable Design. New York: IEEE Press, 1990.
Agarwal, Rakesh. 80x86 Architecture & Programming. Vols. I and II. Englewood Cliffs:
Prentice-Hall, 1991.
Anderson, Don, and Tom Shanley. Pentium Processor System Architecture. Reading:
Addison-Wesley, 1995.
Barkakati, Nabajyoti, and Randall Hyde. Microsoft Macro Assembler Bible. Carmel:
Sams, 1992.
Brey, Barry B. The Intel 32-Bit Microprocessors. Englewood Cliffs: Prentice Hall, 1995.
Brown, Ralf, and Jim Kyle. PC Interrupts, A Programmer’s Reference to BIOS, DOS, and
Third-Party Calls. Reading: Addison-Wesley, 1994. For an updated version on the
Internet, ftp to OAKOAKLANDEDU and get file PUBMSDOSINFOINTERZIP.
Brumm, Penn, and Don Brumm. 80386/80486 Assembly Language Programming. Wind-
crest: McGraw-Hill, 1993.
Crawford, John H., and Patrick P. Gelsinger. Programming the 80386. San Francisco:
Sybex, 1987.
Giles, William B. Assembly Language Programming for the Intel 80xxx Family. New
York: Macmillan, 1991.
Handy, Jim. The Cache Memory Book. San Diego: Academic Press, 1993.
Institute of Electrical and Electronics Engineers. IEEE Standard for Binary Floating-
Point Arithmetic. ANSI/IEEE Std 754-1985.
Morse, Stephen P.; Eric J. Isaacson; and Douglas J. Albert. The 80386/387 Architec-
ture. New York: John Wiley & Sons, 1987.
Norton, Peter; Peter Aitken; and Richard Wilton. PC Programmer’s Bible. Redmond:
Microsoft Press, 1993.
Patterson, David A., and John L. Hennessy. Computer Organization and Design: The
Hardware/Software Interface. San Francisco: Morgan Kaufmann Publishers, 1994.
Phoenix Technical Reference Series. System BIOS for IBM PCs, Compatibles, and EISA
Computers. Reading: Addison-Wesley, 1991.
Ro, Sen-Cuo, and Sheau-Chuen Her. i386/i486 Advanced Programming. New York:
Van Nostrand Reinhold, 1993.
Wakerly, John F. Digital Design Principles and Practices. Englewood Cliffs: Prentice-
Hall, 1994.
Wharton, John. The Complete x86. Sebastopol, CA: MicroDesign Resources, 1994.
1
Overview
1.1 Features
■ Pentium-Processor Standard
• Compatible with the Pentium (735\90, 815\100) processor 296-pin socket
• Compatible with existing Pentium (735\90, 815\100) processor support
infrastructure and system designs
• Compatible with Pentium, 486, and 386 processor software
• Compatible with x86 DOS, Microsoft® Windows® operating system, and the
large installed base of x86 software
• Compatible with IEEE 854 floating-point standard
• Selectable bus frequencies
• Support for multiprocessing
■ High-Performance Execution
• Six execution units (two ALUs, two load/store, one branch, one
floating-point)
• Up to four instructions issued per processor clock
• Out-of-order issue and completion
• Speculative execution along three predicted branches
• Register renaming
• Data forwarding
• Predecoder converts x86 instructions to single-cycle RISC operations (ROPs)
• Fast integer multiply (4-cycle, fully pipelined)
• Five-stage pipeline
• Single-cycle cache access
• Zero-delay branching, 3-clock misprediction penalty (often hidden)
• No mixed-operand-size penalty
• No prefix penalty
• Single-cycle misalignment penalty
• No instruction-pairing requirements for parallel issue
• No pipeline invalidation on segment loads
• Efficient support for 16- and 32-bit code, with mixed operand sizes
2
Internal Architecture
Figure 2-1 shows the major logic blocks that make up the inter-
nal architecture. The blocks are organized in the figure by
stages of the processor’s execution pipeline, which are listed
vertically on the right side of the figure. The blocks are
explained throughout the section that follows.
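Figure 2-1. Internal Architecture, with Pipeline Stage
(Block diagram not reproduced. Surviving labels: byte queue; load, store,
and execute paths; 4-port, 2-port, and 5-port interfaces.)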
Figure 2-1 shows the relation between the internal logic and
the stages of the execution pipeline. Figure 2-2 shows the func-
tions of the pipeline stages. The first five stages—Fetch,
Decode 1, Decode 2, Execute, and Result—affect throughput
performance. The sixth stage, Retire, may occur at a variable
number of clocks after the Result stage, but the Retire stage
does not affect throughput performance when the processor
operates in a non-serialized mode, which is typical of most pro-
cessing. Thus, the pipeline effectively has five stages. Because
the pipeline is moderately shallow, penalties associated with
mispredicting a branch (three clocks) or clearing the pipeline
(variable clocks) are relatively small compared with processors
that have deeper pipelines (more pipeline stages).
Figure 2-2. Pipeline Stage Functions
Fetch
  a) Calculate address
  b) Fetch instruction
     Predict branch
Decode 1
  a) Merge into byte queue
  b) Generate ROPs
Decode 2
  a) Merge register tags and immediates
  b) Access registers or ROB
Execute
  a) Dispatch ROPs to execution units
     Calculate operand linear address (Note 1)
  b) Execute
     Arbitrate for result bus
     Access operands in data cache (Note 1)
     Check protection and segment limit (Note 1)
Result
  Forward to execution units
  Write to ROB
  Correct branch prediction
  Drive write cycle on bus (Note 1)
Retire (Note 2)
  Write to real-state registers
  Forward from ROB
Notes:
1. Load/store instructions only.
2. The Retire stage may occur one or more clocks after completion, but it does not affect throughput.
2.2.1 Fetch
The processor can fetch up to 16 bytes per clock out of the
instruction cache. Fetching begins with the calculation of the
linear address for the next instruction along a predicted
branch of the x86 instruction stream. The address accesses the
instruction cache or, during a miss, the prefetch cache. Fetch-
ing can occur along a single execution stream with up to three
taken branches. Fetches that miss both the instruction cache
and prefetch cache are driven to the prefetcher.
2.2.2 Decode
The two-stage decode logic accepts predicted x86 instruction
bytes and their predecode bits from the fetch logic, shifts them
into a 16-byte FIFO buffer called the byte queue, merges regis-
ter tags and operands, and generates internal RISC operations
(ROPs). The decode logic also generates microcode entry
points for complex instructions, interrupts and exceptions, and
several other functions, and it manages the floating-point
stack.
2.2.3 Execute
The processor has the following execution units that work in
parallel with one another:
■ Two ALUs (integer, logic, and shift operations)
■ One floating-point unit
■ Two load/store units
■ One branch unit
Each execution unit has its own FIFO reservation station with
two or four entries. ROPs are dispatched to reservation sta-
tions in program order. One ROP can be dispatched to a single
reservation station in a given clock, thus up to four reservation
stations receive an ROP each clock. ROPs are issued from a res-
ervation station to its execution unit when all operands are
available from the register file, reorder buffer, or prior execu-
tion via forwarding (including from data cache loads), and
when the execution unit has completed its prior ROP. Issue
and dispatch occur in the same clock if the operands are avail-
able and the unit is free at dispatch time.
Integer/Shift Units
Two ALUs perform integer, logic, and shift operations. Both
ALUs have two-entry reservation stations. Table 2-1 shows the
types of ROPs executed by each ALU. Unlike the Pentium
processor, the AMD-K5 processor has few restrictions on the
pairing of integer instructions needed to use both integer units
in parallel.
Floating-Point Unit
The IEEE 854-compatible floating-point unit (FPU) can issue
pipelined ROPs from its 2-entry reservation station at the rate
of one per clock. One ROP can be issued to either the add or
multiply pipeline in each clock, even when the operations are
separated by an exchange ROP. The add and multiply pipelines
use a common pre-detect unit and rounder. The rounder
can return one result per clock.
Load/Store Units
Two load/store units read and write data-cache and memory
operands. A shared, 4-entry reservation station buffers incoming
ROPs, and a shared, 4-entry store buffer accepts outgoing
speculative-state operands destined for the data cache or memory.
The reservation station is dual-ported and the store buffer
is single-ported, so that the processor can perform two loads or
one load and one store per clock.
Branch Unit
The branch unit has a 2-entry reservation station and executes
correctly predicted branches with zero delay. The unit executes
calls, returns, conditional jumps, conditional byte-sets,
floating-point exchanges, and microbranches. Speculative
execution occurs whenever a conditional-branch instruction
executes. The branch unit is the only execution unit that decodes
condition codes and supports speculative flag input operands.
2.2.4 Result
The processor implements a 16-entry reorder buffer (ROB) for
speculative-state register renaming, and a 4-entry store buffer
for speculative-state buffering between the load/store units
and the data cache. An ROP is said to complete when the result
of its execution is written to the ROB or store buffer. Results
may be returned out of order. Results written to the ROB are
simultaneously forwarded (that is, fed back) to all execution
units.
An entry tag is allocated at the top of the ROB for each ROP
that is dispatched to a reservation station. Entries for up to
four ROPs can be allocated simultaneously. Among other
things, the ROB keeps track of the program counter associated
with each instruction, resolves ROP-level dependencies, stores
speculative results, provides the most recent copy of a register
to execution units, recovers from mispredicted branches with-
out altering real state, and provides substitute tags to internal
resources when required operands are still outstanding.
2.2.5 Retire
The processor implements a real-state (non-speculative) regis-
ter file that contains the x86-architecture registers and a real-
state 8-Kbyte data cache. While ROPs complete out of order
and their results are forwarded to other execution units and to
the ROB out of order, their results are always written at retire-
ment time to the real-state x86 registers in program order.
Likewise, as results are written from the load/store units to the
store buffer out of order, they are always written at retirement
time to the data cache and/or memory in program order.
Only one store from the store buffer can be among the set of up
to four instructions that retire simultaneously. If the set of
retirement candidates in any clock includes more than one
store, only those instructions up to (but not including) the sec-
ond store will retire. The remaining stores occur one at a time,
in their queued order, during subsequent retire cycles.
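The one-store retirement rule can be sketched as a small selection function. This is an illustrative model only, not the processor's retirement logic: the function name `select_retiring`, the string labels, and the list representation are assumptions introduced here.

```python
def select_retiring(candidates, width=4):
    """Return the instructions that retire together in one clock.

    candidates: retirement candidates in program order. Each item is the
        hypothetical label "store" for a store, or any other string.
    width: maximum number of instructions that retire simultaneously
        (four, per the text above).
    """
    retiring = []
    stores_seen = 0
    for kind in candidates[:width]:
        if kind == "store":
            stores_seen += 1
            if stores_seen == 2:
                # Retire only up to, but not including, the second store.
                break
        retiring.append(kind)
    return retiring
```

For example, `select_retiring(["alu", "store", "alu", "store"])` retires the first three entries and leaves the second store for a later retire cycle.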
The enabling and operating modes for the caches are software
controlled by the CD and NW bits of CR0. When disabled, both
caches are locked. They are accessed in all operating modes,
and the processor can still hit in a cache that has not been
invalidated, even if software has turned the caches off. These
mechanisms work the same on both the AMD-K5 and Pentium
processors.
buffer, which resides between the load/store units and the data
cache, moves to the real-state data cache or memory.
Linear tags are read for all accesses to the instruction and data
caches. All read misses, memory writes, and snooping—both
external inquire cycles and automatic internal snooping—go
through the physical tags. The MESI cache-coherency state is
recorded in the physical tags.
The linear tags for both caches are invalidated whenever pag-
ing is turned on or off, or when CR3 (the page-directory base
register) is loaded, except that during x86-architecture task
switches, the linear tags are only invalidated if the current and
new value for CR3 are different. When linear tags are invali-
dated, many or all of the cached lines may still be valid, but
accesses miss in the linear tags and go through the MMU to the
physical tags. If an access misses the linear tags but hits in the
physical tags, the processor restores the linear tag using the
linear address for the access. This is called a cache-tag recovery.
The revalidation of the linear tag does not add any additional
time to that of the physical-tag access itself.
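The lookup order described above (linear tags first, then the MMU and physical tags, with cache-tag recovery on a physical hit) can be sketched as a toy model. The class name, the `translate` callback standing in for the MMU, and the use of whole line numbers as tags are all assumptions made for illustration; linear-address aliasing and MESI state are deliberately ignored.

```python
class LinearTaggedCache:
    """Toy model of a cache with linear tags backed by physical tags."""

    def __init__(self, translate):
        self.translate = translate    # hypothetical MMU: linear line -> physical line
        self.linear_tags = {}         # linear line -> physical line
        self.physical_tags = set()    # physical lines currently cached

    def invalidate_linear_tags(self):
        # Models a CR3 load or turning paging on or off: the cached lines
        # stay valid, but every linear tag is lost.
        self.linear_tags.clear()

    def access(self, linear_line):
        if linear_line in self.linear_tags:
            return "linear-tag hit"
        physical_line = self.translate(linear_line)
        if physical_line in self.physical_tags:
            # Cache-tag recovery: restore the linear tag from this access.
            self.linear_tags[linear_line] = physical_line
            return "cache-tag recovery"
        # Miss in both tag sets: fill the line and record both tags.
        self.physical_tags.add(physical_line)
        self.linear_tags[linear_line] = physical_line
        return "miss, line fill"
```

In this model, an access that follows `invalidate_linear_tags()` to a line that is still cached reports a cache-tag recovery once, and later accesses to the same line hit in the linear tags again.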
The linear tags for both caches are invalidated during physical-
tag invalidation, or when the RESET or INIT input signal is
asserted. The linear and physical tags for both caches are inval-
idated when the FLUSH input signal is asserted or when the
INVD or WBINVD instruction is executed.
Table 2-2 shows all possible cache-line states before and after
program-generated accesses to individual cache lines. The
table includes the correspondence between MESI states and
writethrough or writeback states for lines in the data cache.
Table 2-3 shows all possible cache-line states before and after
cache snoop or invalidation operations performed with inquire
cycles. Together, these tables show all of the conditions for
writethroughs and writebacks to memory.
2.3.6 Snooping
The term snooping commonly refers to at least three different
actions, only two of which are supported by the AMD-K5 and
Pentium processors:
■ Inquire Cycles—These are bus cycles initiated by external
logic that cause the processor to look up an address in its
physical cache tags. Both the AMD-K5 and Pentium proces-
sors support inquire cycles.
■ Internal Snooping—This is initiated by the processor (rather
than external logic) during certain cache accesses. Internal
snooping detects self-modifying code. Both the AMD-K5 and
Pentium processors support internal snooping.
■ Bus Watching—Some caching devices watch their address
and data buses while they are held off the bus, comparing
addresses driven by another bus master with their internal
cache tags and optionally updating their cached lines on the
fly during writebacks by the other master. The AMD-K5 and
the Pentium processor do not support bus watching.
Inquire Cycles
In systems with multiple caching masters, external logic
maintains cache coherency by driving inquire cycles to the
processor. System logic initiates inquire cycles by asserting AHOLD,
BOFF, or HOLD to obtain control of the address bus, and then
driving EADS, INV, and an inquire address. Such bus cycles
cause the processor to compare the physical tags for both its
instruction and data caches with the inquire address. If the
compare hits a shared or exclusive line in the data cache or a
valid line in the instruction cache, the processor asserts HIT. If
the compare hits a modified line in the data cache, the
processor asserts HITM.
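The inquire-cycle outcome in the paragraph above can be written as a small decision function. This is a minimal sketch that models only the assertions named in that paragraph; the function name, the single-letter MESI labels, and the string return values are assumptions, and signal timing and any other signal behavior described in Chapter 5 are not modeled.

```python
def inquire_response(data_line_state, instruction_line_valid):
    """Return the signal named in the text for an inquire-cycle compare.

    data_line_state: MESI state of the matching data-cache line,
        one of "M", "E", "S", or "I" ("I" means no valid match).
    instruction_line_valid: True if the instruction cache holds a valid
        line for the inquire address.
    """
    if data_line_state not in ("M", "E", "S", "I"):
        raise ValueError("unknown MESI state: %r" % (data_line_state,))
    if data_line_state == "M":
        return "HITM"   # hit to a modified data-cache line
    if data_line_state in ("E", "S") or instruction_line_valid:
        return "HIT"    # hit to a shared/exclusive or valid instruction line
    return None         # inquire miss
```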
Internal Snooping
The processor automatically snoops its instruction cache during
read or write misses to its data cache, and it snoops its data
cache during read misses to its instruction cache. It does this to
detect the presence of self-modifying code. Table 2-4 summarizes
the actions taken during this internal snooping.
If an internal snoop hits its target, the processor does the fol-
lowing:
■ During Instruction-Cache Read Miss—The line in the data
cache, store buffer, or writeback buffer is written back (if
modified) and invalidated, and the instruction-cache read is
performed again. If the data-cache line was modified, a
copy of the writeback data is passed directly to the instruc-
tion cache, thus avoiding a line-fill bus cycle after the write-
back bus cycle.
■ During Data-Cache Read Miss—The line in the instruction
cache, prefetch cache, or line-fill buffer stays valid, and the
data-cache read is performed as a single, non-cacheable
read.
■ During Data-Cache Write Miss—The line in the instruction
cache, prefetch cache, or line-fill buffer is invalidated, the
reorder buffer invalidates all instructions in the pipeline
following the instruction that initiated the snoop, and the
data-cache write is performed.
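The three internal-snoop cases listed above can be summarized as a lookup. The sketch below only restates those cases in code form; the names `SnoopHitAction` and `internal_snoop_hit` and the miss-kind labels are hypothetical and do not correspond to any hardware interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnoopHitAction:
    write_back_target: bool           # snooped line is written back first
    invalidate_target: bool           # snooped line is invalidated
    flush_younger_instructions: bool  # ROB invalidates later instructions
    follow_up: str                    # how the original access completes


def internal_snoop_hit(miss_kind, target_modified=False):
    """Describe what happens when an internal snoop hits its target.

    miss_kind: "icache_read", "dcache_read", or "dcache_write".
    target_modified: for "icache_read", whether the data-cache line
        that was hit is in the modified state.
    """
    if miss_kind == "icache_read":
        return SnoopHitAction(target_modified, True, False,
                              "retry instruction-cache read")
    if miss_kind == "dcache_read":
        return SnoopHitAction(False, False, False,
                              "single non-cacheable read")
    if miss_kind == "dcache_write":
        return SnoopHitAction(False, True, True,
                              "perform data-cache write")
    raise ValueError("unknown miss kind: %r" % (miss_kind,))
```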
The AMD-K5 processor, like the 486 processor but unlike the
Pentium processor, requires a jump (near or far) after a self-
modifying write to clear the prefetch cache. However, both the
AMD-K5 and the Pentium processors require a serializing
instruction after self-modifying code whose physical address is
aliased to multiple linear addresses.
2.3.7 Buffers
Several buffers are associated with the instruction and data
caches, as described below.
Line-Fill Buffers
The processor has two 16-byte line-fill buffers in the bus
interface unit, one of which is used during instruction-cache line
fills and the other during data-cache line fills. Each buffer holds
half of the 32-byte burst cycle that the processor drives in
response to a cacheable fetch miss.
2.4.3 Segmentation
The instruction cache contains a copy of certain fields in the
current code-segment descriptor. The information is used dur-
ing prefetch for segment translation (logical-to-linear
addresses), thus providing linear-address tags for the instruc-
tion-cache entries. Likewise, the load/store units hold the cur-
rent data-segment descriptors, which are used to generate the
linear address and perform protection checks during data-
cache accesses. The processor can cache segment descriptors
in its data cache.
The TLBs are accessed during cache accesses that miss in the
linear tags. Each TLB is organized into tag directories
(linear-address references) and data arrays (physical-address
references). The TLB entries also contain bits used to check
privilege and access rights. Because the caches are linearly
addressed, however, cache accesses that hit in the linear tags
do not go through the TLB, which makes those accesses faster.
Copies of the privilege and access bits from the TLB entries
are loaded into the caches when the cache lines are filled. If a
privilege-level violation is detected during a cache access, the
TLB is accessed, and it alone can issue a page-related excep-
tion.
3
Software Environment and
Extensions
[Figure: CR4 bit layout. Bits 31–8: Reserved; bit 7: GPE; bit 6: MCE; bit 5: Reserved; bit 4: PSE; bit 3: DE; bit 2: TSD; bit 1: PVI; bit 0: VME.]
Software can read the MCAR and MCTR registers in the excep-
tion handling routine with the RDMSR instruction, as
described in Section 3.3.5 on page 3-33. The format of the regis-
ters is shown in Figure 3-8 on page 3-25 and Figure 3-9 on page
3-26.
[Figure: linear-address translation. 4-Kbyte pages: CR3, PDE, PTE, byte; linear-address fields 31–22, 21–12, 11–0. 4-Mbyte pages: CR3, PDE (page directory), byte within the 4-Mbyte page; linear-address fields 31–22, 21–0.]
Figure 3-1 and Table 3-1 show the fields in CR4. Figure 3-4 and
Table 3-2 show the fields in a page-directory entry.
[Figure: page-directory entry format. Bits 31–12: Physical Base Address; bits 11–9: AVL; bit 8: G; bit 7: PS; bit 6: 0; bit 5: A; bit 4: PCD; bit 3: PWT; bit 2: U/S; bit 1: W/R; bit 0: P.]
The INVLPG instruction clears both the V and G bits for the
referenced entry. To invalidate all entries, including global-
page entries, in both TLBs:
1. Clear the Global Page Extension (GPE) bit in CR4.
2. Load CR3 with the base address of another (or same) page
directory.
[Figure: page-table entry format. Bits 31–12: Physical Base Address; bits 11–9: AVL; bit 8: G; bit 7: 0; bit 6: D; bit 5: A; bit 4: PCD; bit 3: PWT; bit 2: U/S; bit 1: W/R; bit 0: P.]
Interrupt Redirection in Virtual-8086 Mode Without VME Extensions
8086 programs expect to have full access to the interrupt flag
(IF) in the EFLAGS register, which enables maskable external
interrupts via the INTR signal. When 8086 programs run in
Virtual-8086 mode on a 386 or 486 processor, they run as protected
tasks and access to the IF flag must be controlled by the
operating system on a task-by-task basis to prevent corruption
of system resources.
Hardware Interrupts and the VIF and VIP Extensions
When VME extensions are enabled, the IF-modifying instructions
that are normally trapped by the operating system are
allowed to execute, but they write and read the VIF bit rather
than the IF bit in EFLAGS. This leaves maskable interrupts
enabled for detection by the operating system. It also indicates
to the operating system whether the Virtual-8086 program is
able to or expecting to receive interrupts.
Thus, when VME extensions are enabled, the VIF and VIP bits
are set and cleared as follows:
■ VIF—This bit is controlled by the processor and used by the
operating system to determine whether an external
maskable interrupt should be passed on to the program or
held pending. VIF is set and cleared for instructions that
can modify IF, and it is cleared during software interrupts
through interrupt gates. The original IF value is preserved
in the EFLAGS image on the stack.
■ VIP—This bit is set and cleared by the operating system via
the EFLAGS image on the stack. It is set when an interrupt
occurs for a Virtual-8086 program whose VIF bit is cleared.
The bit is checked by the processor when the program sub-
sequently attempts to set VIF.
Figure 3-6 and Table 3-4 show the VIF and VIP bits in the
EFLAGS register. The VME extensions support conventional
emulation methods for passing interrupts to Virtual-8086 pro-
grams, but they make it possible for the operating system to
avoid time-consuming emulation of most instructions that
write or read the IF.
The VIF and IF flags only affect the way the operating system
deals with hardware interrupts (the INTR signal). Software
interrupts are handled like machine-generated exceptions and
cannot be masked by real or virtual copies of IF (see page 3-
21). The VIF and VIP flags only ease the software overhead
associated with managing interrupts so that virtual copies of
the IF flag do not have to be maintained by the operating sys-
tem. Instead, each task’s TSS holds its own copy of these flags
in its EFLAGS image.
[Figure 3-6: EFLAGS register, bits 31–0. Bits not listed below are reserved.]
Flag                        Symbol   Bit(s)
ID Flag                     ID       21
Virtual Interrupt Pending   VIP      20
Virtual Interrupt Flag      VIF      19
Alignment Check             AC       18
Virtual-8086 Mode           VM       17
Resume Flag                 RF       16
Nested Task                 NT       14
I/O Privilege Level         IOPL     13–12
Overflow Flag               OF       11
Direction Flag              DF       10
Interrupt Flag              IF       9
Trap Flag                   TF       8
Sign Flag                   SF       7
Zero Flag                   ZF       6
Auxiliary Flag              AF       4
Parity Flag                 PF       2
Carry Flag                  CF       0
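The bit positions listed above can be applied to a 32-bit EFLAGS image, such as the copy an operating system finds on the stack or in a TSS. The following Python sketch is illustrative only; the names EFLAGS_BITS and decode_eflags are hypothetical and are not part of any AMD interface.

```python
# Bit positions copied from the EFLAGS field list above.
EFLAGS_BITS = {
    "ID": 21, "VIP": 20, "VIF": 19, "AC": 18, "VM": 17, "RF": 16,
    "NT": 14, "OF": 11, "DF": 10, "IF": 9, "TF": 8, "SF": 7,
    "ZF": 6, "AF": 4, "PF": 2, "CF": 0,
}
IOPL_SHIFT = 12   # IOPL occupies bits 13-12
IOPL_MASK = 0x3


def decode_eflags(value):
    """Return a dict of flag name -> value for a 32-bit EFLAGS image."""
    flags = {name: (value >> bit) & 1 for name, bit in EFLAGS_BITS.items()}
    flags["IOPL"] = (value >> IOPL_SHIFT) & IOPL_MASK
    return flags
```

For example, decode_eflags(image)["VIF"] and decode_eflags(image)["VIP"] give the two virtual-interrupt bits that the VME and PVI extensions use.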
Table 3-5D. Instructions that Modify the IF or VIF Flags—Virtual-8086 Mode Interrupt
Extensions (VME) [1]
TYPE                           PE  VM  VME  PVI  IOPL  GP(0)   IF          VIF
CLI                            1   1   1    —    3     No      IF ← 0      No Change
CLI                            1   1   1    —    <3    No      No Change   VIF ← 0
STI                            1   1   1    —    3     No      IF ← 1      No Change
STI                            1   1   1    —    <3    No [3]  No Change   VIF ← 1
PUSHF                          1   1   1    —    3     No      Pushed      Not Pushed
PUSHF                          1   1   1    —    <3    No      Not Pushed  Pushed into IF
PUSHFD                         1   1   1    —    3     No      Pushed      Pushed
PUSHFD                         1   1   1    —    <3    Yes     —           —
POPF                           1   1   1    —    3     No      Popped      Not Popped
POPF                           1   1   1    —    <3    No      Not Popped  Popped from IF
POPFD                          1   1   1    —    3     No      Popped      Not Popped
POPFD                          1   1   1    —    <3    Yes     —           —
IRET from V86 Mode             1   1   1    —    3     No      Popped      Not Popped
IRET from V86 Mode             1   1   1    —    <3    No [3]  Not Popped  Popped from IF
IRETD from V86 Mode            1   1   1    —    3     No      Popped      Not Popped
IRETD from V86 Mode            1   1   1    —    <3    Yes     —           —
IRETD from Protected Mode [2]  1   1   1    —    —     No [3]  Popped      Popped
Notes:
1. All Virtual-8086 mode tasks run at CPL = 3.
2. All protected virtual interrupt handlers run at CPL = 0.
3. GP(0) if an attempt is made to set VIF when VIP = 1.
— Not applicable.
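The CLI and STI rows of Table 3-5D can be expressed as a small decision procedure. The following Python sketch models only those four rows (PE = VM = VME = 1); the function name, the dict representation of the flags, and the exception class are hypothetical stand-ins chosen for illustration.

```python
class GeneralProtectionFault(Exception):
    """Stands in for the GP(0) exception in this model."""


def execute_if_instruction(op, iopl, flags):
    """Model CLI/STI for a Virtual-8086 task with VME enabled.

    flags is a dict with keys 'IF', 'VIF', and 'VIP' (each 0 or 1).
    Returns a new dict; the input dict is not modified.
    """
    if op not in ("CLI", "STI"):
        raise ValueError("only CLI and STI are modeled")
    new_value = 1 if op == "STI" else 0
    result = dict(flags)
    if iopl == 3:
        # IOPL = 3: the instruction acts on the real IF; VIF is unchanged.
        result["IF"] = new_value
    else:
        # IOPL < 3: the instruction acts on VIF; IF is unchanged.
        # Note 3: setting VIF while VIP = 1 raises GP(0).
        if new_value == 1 and flags["VIP"] == 1:
            raise GeneralProtectionFault("GP(0): STI with VIP = 1")
        result["VIF"] = new_value
    return result
```

The GP(0) case is what lets the operating system regain control and deliver a pending interrupt when the program next enables interrupts.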
Table 3-5E. Instructions that Modify the IF or VIF Flags—Protected Mode Virtual
Interrupt Extensions (PVI) [1]
TYPE       PE  VM  VME  PVI  IOPL  GP(0)   IF          VIF
CLI        1   0   —    1    3     No      IF ← 0      No Change
CLI        1   0   —    1    <3    No      No Change   VIF ← 0
STI        1   0   —    1    3     No      IF ← 1      No Change
STI        1   0   —    1    <3    No [3]  No Change   VIF ← 1
PUSHF      1   0   —    1    3     No      Pushed      Not Pushed
PUSHF      1   0   —    1    <3    No      Pushed      Not Pushed
PUSHFD     1   0   —    1    3     No      Pushed      Pushed
PUSHFD     1   0   —    1    <3    No      Pushed      Pushed
POPF       1   0   —    1    3     No      Popped      Not Popped
POPF       1   0   —    1    <3    No      Not Popped  Not Popped
POPFD      1   0   —    1    3     No      Popped      Not Popped
POPFD      1   0   —    1    <3    No      Not Popped  Not Popped
IRETD [2]  1   0   —    1    —     No [3]  Popped      Popped
Notes:
1. All Protected mode virtual interrupt tasks run at CPL = 3.
2. All protected mode virtual interrupt handlers run at CPL = 0.
3. GP(0) if an attempt is made to set VIF when VIP = 1.
— Not applicable.
Figure 3-7 shows the format of the TSS, with the Interrupt
Redirection Bitmap near the top. The IRB contains 256 bits,
one for each possible software-interrupt vector. The most-sig-
nificant bit of the IRB is located immediately below the base of
the IOPB. This bit controls interrupt vector 255. The least-sig-
nificant bit of the IRB controls interrupt vector 0.
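The placement described above implies that the IRB occupies the 32 bytes immediately below the IOPB base. The following Python sketch locates the bit for a given vector under that reading, assuming little-endian bit numbering within the bitmap (vector 0 in bit 0 of the lowest byte). The function names and the byte-array view of the TSS are hypothetical.

```python
IRB_SIZE_BYTES = 32  # 256 bits, one per software-interrupt vector


def irb_bit_location(iopb_base, vector):
    """Return (byte_offset_in_tss, bit_in_byte) for an interrupt vector.

    iopb_base is the offset of the I/O Permission Bitmap within the TSS.
    """
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in the range 0-255")
    irb_base = iopb_base - IRB_SIZE_BYTES
    return irb_base + vector // 8, vector % 8


def irb_bit(tss, iopb_base, vector):
    """Read the IRB bit for a vector from a bytes-like TSS image."""
    offset, bit = irb_bit_location(iopb_base, vector)
    return (tss[offset] >> bit) & 1
```

With an IOPB base of 88h, vector 255 maps to bit 7 of the byte at offset 87h (immediately below the IOPB) and vector 0 maps to bit 0 of the byte at offset 68h.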
[Figure 3-7: TSS format, bits 31–0. Labeled regions: TSS Limit from TR; I/O Permission Bitmap (IOPB), up to 8 Kbyte; Operating System Data Structure.]
The only differences between the VME and PVI extensions are
that, in PVI, selective INTn interception using the Interrupt
Redirection Bitmap in the TSS does not apply, and only the STI
and CLI instructions are affected by the extension.
Table 3-5A through Table 3-5E and Table 3-6 show, among
other things, the behavior of hardware and software inter-
rupts, and instructions that affect interrupts, in Protected
mode with the PVI extensions enabled.
The MCAR can be read with the RDMSR instruction when the
ECX register contains the value 00h. Figure 3-8 shows the format
of the MCAR register.
If system software has set the MCE bit in CR4 before the bus-
cycle error, the processor also generates a machine-check
exception as described in Section 3.1.1 on page 3-4.
[Figure 3-8: MCAR format, bits 63–0.]
The MCTR can be read with the RDMSR instruction when the
ECX register contains the value 01h. Figure 3-9 and Table 3-7
show the format of the MCTR register. The processor
clears the CHK bit (bit 0) in MCTR when the register is
read with the RDMSR instruction.
If system software has set the MCE bit in CR4 before the bus-
cycle error, the processor also generates a machine-check
exception as described in Section 3.1.1 on page 3-4.
[Figure 3-9: MCTR format. Bits 63–5: Reserved; bit 4: LOCK; bit 3: M/IO; bit 2: D/C; bit 1: W/R; bit 0: CHK.]
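A machine-check handler that has read the MCTR with RDMSR can split the low bits of EAX into the five fields shown in the figure (CHK in bit 0 through LOCK in bit 4). The following Python sketch shows one way to do that; the names are hypothetical and the value passed in is assumed to be the EAX result of the read.

```python
# Field positions as shown in the MCTR figure: bit 0 = CHK ... bit 4 = LOCK.
MCTR_FIELDS = {"CHK": 0, "W/R": 1, "D/C": 2, "M/IO": 3, "LOCK": 4}


def decode_mctr(eax):
    """Return a dict of MCTR field name -> bit value from the low 32 bits."""
    return {name: (eax >> bit) & 1 for name, bit in MCTR_FIELDS.items()}
```

Because the processor clears CHK when the register is read, a handler would normally keep the decoded result rather than reading the register a second time.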
3.3.1 CPUID
Privilege: none
Registers Affected: EAX, EBX, ECX, EDX
Flags Affected: none
Exceptions Generated: Real, Virtual-8086 mode—none
Protected mode—none
The CPUID instruction identifies the type of processor and the features it supports.
A 0 or 1 value written to the EAX register specifies what information will be
returned by the instruction.
The processor implements the ID flag (bit 21) in the EFLAGS register. By writing and
reading this bit, software can verify that the processor will execute the CPUID
instruction.
For detailed instructions on processor and feature identification see the AMD Proces-
sor Recognition application note, order# 20734.
Table 3-8 outlines the AMD-K5 processor family codes and model codes with the
CPU clock frequencies (MHz), bus frequencies (MHz), and P-rating strings (“Pxxx”).
Table 3-8. CPU Clock Frequencies, Bus Frequencies, and P-Rating Strings
Family Code  Model Code  CPU Frequency (MHz)  CPU Bus Frequency (MHz)  P-Rating String (“Pxxx”) [1]
5            0           75                   50                       P75
5            0           90                   60                       P90
5            0           100                  66                       P100
5            1           90                   60                       P120
5            1           100                  66                       P133
5            1           120                  60                       P150
5            1           133                  66                       P166
Notes:
1. The CPUID instruction does not return a P-Rating string.
— This table does not constitute product announcements. Instead, the information in the table represents possible product offerings.
AMD will announce actual products based on availability and market demand.
3.3.2 CMPXCHG8B
If the memory value matches the value in EDX and EAX, the ZF flag is set to 1 and
the 8-byte value in ECX and EBX is written to the memory location, as follows:
■ ECX—Upper 32 bits of exchange value
■ EBX—Lower 32 bits of exchange value
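The compare-and-exchange behavior described above can be modeled on plain integers. In the following Python sketch the matching case follows the text; the non-matching case (memory value loaded into EDX and EAX, ZF cleared) is the standard x86 definition of the instruction and is an assumption here, since this section does not state it. The function name and return tuple are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def cmpxchg8b(memory_value, edx, eax, ecx, ebx):
    """Model CMPXCHG8B on a 64-bit memory operand.

    Returns (new_memory_value, edx, eax, zf).
    """
    compare_value = ((edx & MASK32) << 32) | (eax & MASK32)
    if memory_value == compare_value:
        # Match: ZF = 1 and ECX:EBX is written to memory.
        exchange_value = ((ecx & MASK32) << 32) | (ebx & MASK32)
        return exchange_value, edx, eax, 1
    # Assumed (standard x86): no match, so memory is unchanged,
    # ZF = 0, and EDX:EAX receives the memory value.
    return memory_value, (memory_value >> 32) & MASK32, memory_value & MASK32, 0
```

A typical use is an atomic 64-bit update loop: read the old value into EDX:EAX, compute the new value into ECX:EBX, and retry while ZF is 0.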
Privilege: CPL = 0
Registers Affected: CR4, 32-bit general-purpose register
Flags Affected: none
Exceptions Generated: Real mode—none
Virtual-8086 mode—GP(0)
Protected mode—GP(0) if CPL not = 0
3.3.4 RDTSC
The processor’s 64-bit time stamp counter (TSC) increments on each processor clock.
In Real or Protected mode, the counter can be read with the RDMSR instruction and
written with the WRMSR instruction when CPL = 0. However, in Protected mode the
RDTSC instruction can be used to read the counter at privilege levels higher than
CPL = 0.
The required privilege level for using the RDTSC instruction is determined by the
Time Stamp Disable (TSD) bit in CR4, as follows:
■ CPL = 0—Set the TSD bit in CR4 to 1
■ Any CPL—Clear the TSD bit in CR4 to 0
The RDTSC instruction reads the counter value into the EDX and EAX registers as
follows:
■ EDX—Upper 32 bits of TSC
■ EAX—Lower 32 bits of TSC
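Software that consumes RDTSC results usually recombines the two halves into one 64-bit count and subtracts two readings to obtain elapsed processor clocks. The following Python sketch shows that arithmetic; the function names are hypothetical, and the modulo-2^64 subtraction assumes at most one counter wrap between readings.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def tsc_value(edx, eax):
    """Combine the EDX (upper) and EAX (lower) halves into a 64-bit count."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


def elapsed_clocks(start, end):
    """Processor clocks between two readings, modulo 2**64."""
    return (end - start) & MASK64
```

Dividing the elapsed count by the processor clock frequency gives elapsed time; for example, 100,000 clocks at 100 MHz is 1 ms.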
The following example shows how the RDTSC instruction can be used. After this
code is executed, EAX and EDX contain the time required to execute the RDTSC
instruction.
Privilege: CPL = 0
Registers Affected: EAX, ECX, EDX
Flags Affected: none
Exceptions Generated: Real—GP(0) for unimplemented MSR address
Virtual-8086 mode—GP(0)
Protected mode—GP(0) if CPL not = 0
Protected mode—GP(0) for unimplemented MSR address
The RDMSR or WRMSR instructions can be used in Real or Protected mode to access
several 64-bit, model-specific registers (MSRs). These registers are addressed by the
value in ECX, as follows:
■ 00h: Machine-Check Address Register (MCAR). This may contain the physical
address of the last bus cycle for which the BUSCHK or PCHK signal was asserted.
For details, see Section 3.1.1 on page 3-4.
■ 01h: Machine-Check Type Register (MCTR). This contains the cycle definition of
the last bus cycle for which the BUSCHK or PCHK signal was asserted. For
details, see Section 3.1.1 on page 3-4. The processor clears the CHK bit (bit 0) in
MCTR when the register is read with the RDMSR instruction.
■ 10h: Time Stamp Counter (TSC). This contains a time value. The TSC can be ini-
tialized to any value with the WRMSR instruction, and it can be read with either
the RDMSR or RDTSC instruction. For details, see Section 3.2.3 on page 3-27.
■ 82h: Array Access Register (AAR). This contains an array pointer and test data
for testing the processor’s cache and TLB arrays. For details on the AAR, see Sec-
tion 7.4 on page 7-7.
■ 83h: Hardware Configuration Register (HWCR). This contains configuration bits
that control miscellaneous debugging functions. For details, see Section 7.1 on
page 7-3.
The above value in ECX identifies the register to be read or written. The EDX and
EAX registers contain the MSR values to be read or written, as follows:
■ EDX—Upper 32 bits of MSR. For the AAR, this contains the array pointer and (in
contrast to all other MSRs) its contents are not altered by a RDMSR instruction.
■ EAX—Lower 32 bits of MSR. For the AAR, this contains the data to be read/writ-
ten.
All MSRs are 64 bits wide. However, the upper 32 bits of the AAR are write-only and
are not returned on a read. EDX remains unaltered, making it more convenient to
maintain the array pointer.
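The access rules above (addressing by ECX, the CHK bit cleared on a read of MCTR, and EDX left unaltered on a read of the AAR) can be summarized in a small software model. The following Python sketch is illustrative only; the class, method names, and exception are hypothetical, and the model assumes the read returns the MCTR value as it was before CHK is cleared.

```python
MSR_MCAR, MSR_MCTR, MSR_TSC, MSR_AAR, MSR_HWCR = 0x00, 0x01, 0x10, 0x82, 0x83
MASK32 = 0xFFFFFFFF


class GeneralProtectionFault(Exception):
    """Stands in for GP(0) on an unimplemented MSR address."""


class MsrFile:
    def __init__(self):
        addresses = (MSR_MCAR, MSR_MCTR, MSR_TSC, MSR_AAR, MSR_HWCR)
        self.regs = {address: 0 for address in addresses}

    def wrmsr(self, ecx, edx, eax):
        """Write EDX:EAX to the MSR selected by ECX."""
        if ecx not in self.regs:
            raise GeneralProtectionFault(hex(ecx))
        self.regs[ecx] = ((edx & MASK32) << 32) | (eax & MASK32)

    def rdmsr(self, ecx, edx):
        """Read the MSR selected by ECX; returns the resulting (EDX, EAX)."""
        if ecx not in self.regs:
            raise GeneralProtectionFault(hex(ecx))
        value = self.regs[ecx]
        if ecx == MSR_MCTR:
            self.regs[ecx] = value & ~1      # CHK (bit 0) is cleared by the read
        if ecx == MSR_AAR:
            return edx, value & MASK32       # upper half is write-only; EDX unaltered
        return (value >> 32) & MASK32, value & MASK32
```

Passing the current EDX into rdmsr makes the AAR special case visible: for every other register the incoming EDX is overwritten.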
3.3.6 RSM
Privilege: CPL = 0
Registers Affected: CS, DS, ES, FS, GS, SS, EIP, EFLAGS, LDTR,
CR3, EAX, EBX, ECX, EDX, ESP, EBP, EDI, ESI
Flags Affected: none
Exceptions Generated: Real, Virtual-8086 mode—Invalid opcode if not in SMM
Protected mode—Invalid opcode if not in SMM
Protected mode—GP(0) if CPL not = 0
The RSM instruction should be the last instruction in any System Management Mode
(SMM) service routine. It restores the processor state that was saved when the SMI
interrupt was asserted. This instruction is only valid when the processor is in SMM. It
generates an invalid opcode exception at all other times.
The processor enters the Shutdown state if any of the following illegal conditions is
encountered during the execution of the RSM instruction: the SMM base value is not
aligned on a 32-Kbyte boundary, any reserved bit of CR4 is set to 1, the PG bit is
set while the PE bit is cleared in CR0, or the NW bit is set while the CD bit is cleared in
CR0.
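An SMM handler that edits the saved state can check these conditions before executing RSM. The following Python sketch tests the four conditions on candidate values; the function name is hypothetical, the CR0 bit positions are the standard x86 ones (PE = 0, NW = 29, CD = 30, PG = 31), and the set of defined CR4 bits (0–4, 6, and 7) is taken from the CR4 layout in this chapter.

```python
CR0_PE = 1 << 0
CR0_NW = 1 << 29
CR0_CD = 1 << 30
CR0_PG = 1 << 31
CR4_DEFINED_BITS = 0xDF          # VME, PVI, TSD, DE, PSE, MCE, GPE
SMM_BASE_ALIGNMENT = 32 * 1024   # 32-Kbyte boundary


def rsm_causes_shutdown(smm_base, cr0, cr4):
    """Return True if RSM would find one of the illegal conditions."""
    misaligned_base = smm_base % SMM_BASE_ALIGNMENT != 0
    reserved_cr4_bit = (cr4 & 0xFFFFFFFF & ~CR4_DEFINED_BITS) != 0
    paging_without_pe = bool(cr0 & CR0_PG) and not (cr0 & CR0_PE)
    nw_without_cd = bool(cr0 & CR0_NW) and not (cr0 & CR0_CD)
    return misaligned_base or reserved_cr4_bit or paging_without_pe or nw_without_cd
```

For example, an SMM base of 30000h with PE and PG both set and only VME set in CR4 passes all four checks.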
This opcode always generates an invalid opcode exception. The opcode will not be
used in future AMD K86™ processors.
4
Performance
4.2.1 Notation
Table 4-1 on page 4-8 contains the definitions for the integer
instructions. Table 4-3 on page 4-19 contains the definitions for
the floating-point instructions. The first column in these tables
indicates the instruction mnemonic and operand types. The fol-
lowing notations are used in the AMD-K5 microprocessor docu-
mentation:
■ reg—register
■ mem—memory location
■ imm—immediate value
■ int_16—16-bit integer
■ int_32—32-bit integer
■ int_64—64-bit integer
■ real_32—32-bit floating-point number
■ real_64—64-bit floating-point number
■ real_80—80-bit floating-point number
x_xx_xxxxxxxx_xxx_xxx
Fields, from left to right:
  x         1 = two-byte opcode (0F xx)
  xx        Addressing Mode:
              0x = register
              10 = memory without index
              1x = memory with or without index
              11 = memory with index
  xxxxxxxx  Opcode
  xxx       MODrm[5:3]
  xxx       MODrm[2:0]
The x/y value following the ROP type indicates the relative dis-
patch and execution cycle of the opcode, in the absence of any
conflicts. The format is:
x/y[/z]
where:
■ x = Dispatch Cycle—The relative cycle in which the ROP is
dispatched from decode to the reservation station.
■ y = Execution Cycle—The relative cycle in which the ROP is
issued from the reservation station to the execution unit.
■ z = Result Cycle—The relative cycle in which the result is
returned on the result bus. It is indicated only when the
latency is greater than one cycle. For stores, it reflects the
relative time that a store operand is available to be for-
warded from the store buffer to a dependent load opera-
tion.
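A tool that processes the timing tables needs to split each x/y[/z] entry into its cycle numbers. The following Python sketch parses the notation defined above; the function name and the returned dict keys are hypothetical, and a missing z is reported as None rather than given a default.

```python
def parse_timing(entry):
    """Parse an 'x/y' or 'x/y/z' timing entry into relative cycle numbers."""
    parts = entry.strip().split("/")
    if len(parts) not in (2, 3):
        raise ValueError("expected x/y or x/y/z, got %r" % entry)
    cycles = [int(part) for part in parts]
    return {
        "dispatch": cycles[0],                           # x: decode to reservation station
        "execute": cycles[1],                            # y: reservation station to execution unit
        "result": cycles[2] if len(cycles) == 3 else None,  # z: result bus, when listed
    }
```

For example, parse_timing("1/2/4") describes an ROP dispatched in relative cycle 1, issued in cycle 2, with its result returned in cycle 4.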
5
Bus Interface
[Figure: AMD-K5 processor signal groups]
Clock: CLK, BF (BF1–BF0)
Bus Arbitration: AHOLD, BOFF, BREQ, HLDA, HOLD
Address and Address Parity: A20M, A31–A3, AP, ADS, ADSC, APCHK, BE7–BE0
Cycle Definition and Control: D/C, EWBE, LOCK, M/IO, NA, SCYC, W/R
Cache Control: CACHE, KEN, PCD, PWT, WB/WT
Data and Data Parity: BRDY, BRDYC, D63–D0, DP7–DP0, PCHK, PEN
Inquire Cycles: EADS, HIT, HITM, INV
Floating-Point Errors: FERR, IGNNE
External Interrupts, Interrupt Acknowledge, and Reset: BUSCHK, FLUSH, INIT, INTR, NMI, PRDY, R/S, RESET, SMI, SMIACT, STPCLK
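Column headings: Signal, Type, and the following bus conditions: Memory Reads [14], Memory Writes [14], Cache Hits [39], I/O Cycles, Locked Cycles, Special Cycles, Interrupt Acknowledge, Inquire Cycles [3], AHOLD Active, BOFF Active, HLDA Active, Shutdown [33], Halt, Stop Grant, Stop Clock, SMIACT Active, RESET Active, INIT Active, PRDY Active. Bracketed numbers and the numbers in the table cells refer to the numbered notes.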
Bus Arbitration
AHOLD I 23 —
BOFF I —
BREQ O 38
HLDA O 39 35 —
HOLD I 35
Address and Address Parity
A20M I 10 10 10 10 10 10 10 10 10
A31–A3 [2] I/O 44 19 19 7 4 4 3 3 3
AP I/O 38 7 4 4 3 3 3
ADS O 38 37 3 3 3 3
ADSC O 38 37 3 3 3 3
APCHK O 7 3 3 3 3 3 3 3 3
BE7–BE0 38 37 16 3 3 3
Cycle Definition and Control
D/C O 38 37 16 3 3 3
EWBE I 37 26 26 3 3 3
LOCK O 38 1 — 16
M/IO O 38 37 16 3 3 3
NA [18] I 18 18 18 16 18
SCYC O 13 13 13 13 13
W/R O 38 37 16 3 3 3
Cache Control
CACHE O 38 37 25 25 25 25 16 3 3 3 21
KEN [42] I 16 21
PCD O 38 16 3 3 3 21
PWT O 38 16 3 3 3 15
WB/WT I 38 16 15
Data and Data Parity
BRDY I 38 37 16 3 3 3
BRDYC I 38 37 16 3 3 3
D63–D0 I/O 38 37 16 3 3 3
DP7–DP0 I/O 38 37 16 3 3 3
PCHK [42] O 16
PEN [42] I 16
Inquire Cycles
EADS [7] I 43 43 43 43 1 43 43
HIT O 1
HITM O 1
INV I 43 43 43 43 1 43 43
Floating-Point Errors
FERR O
IGNNE I
External Interrupts, Interrupt Acknowledgments, and Reset
BUSCHK [29] I 38 29 16 3 12 12
FLUSH [27] I 41 41 41 41 12
INIT [27] I 30 30 30 30 12 9 —
INTR [5, 28] I 40 40 40 40
NMI [27] I 12 9
PRDY O —
R/S [28] I 31
RESET I 30 30 30 30 — 17
SMI [27] I 12 22
SMIACT O — 32
STPCLK [28] I 34 34 34 34 24
Test and Debug
FRCMC I
IERR O 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
PRDY O See “External Interrupts, Interrupt Acknowledgments, and Reset”
R/S I See “External Interrupts, Interrupt Acknowledgments, and Reset”
TCK I
TDI I
TDO O
TMS I
TRST I
Bus and Processor Clock
BF I 11
BF1–BF0 I 11
CLK I 11
30. The first code fetch after register initialization during INIT or RESET does not occur if AHOLD, BOFF, or HLDA is asserted.
31. PRDY is asserted either when R/S goes Low or when the Test Access Port (TAP) instruction, USEHDT, is executed. In the latter case,
R/S is watched for a Low-to-High transition, which takes the processor out of the Hardware Debug Tool (HDT) mode.
32. The processor can go into the Hardware Debug Tool (HDT) mode from within SMM either when R/S goes Low or when the TAP
instruction, USEHDT, is executed (the instruction causes the processor to assert PRDY). In this case, SMIACT can be toggled with HDT
commands. SMIACT selects main or SMM memory.
33. Only NMI, INIT, RESET, and SMI get the processor out of the Shutdown state.
34. The processor cannot drive the Stop-Grant special bus cycle.
35. HOLD is sampled, but the only practical effect is to assert HLDA.
36. Writebacks or writethroughs cannot occur when HLDA is asserted.
37. During writebacks.
38. During writebacks or writethroughs.
39. Including writebacks and writethroughs (except for HLDA).
40. The processor cannot drive the interrupt acknowledge cycle, and therefore cannot obtain the interrupt vector.
41. If FLUSH is asserted while AHOLD, BOFF, or HLDA is asserted, the outcome of the flush depends on whether the flush causes write-
backs of modified lines. If no writebacks are needed, the processor invalidates all lines but does not perform the FLUSH-acknowledge
cycle until the processor gets control of the bus again. If a writeback is needed, the processor stops at that writeback without having
invalidated any lines, waits until control of the bus is returned to the processor, then completes the FLUSH operation.
42. Driven or sampled only during reads.
43. Sampled after AHOLD or HLDA is asserted, and while the processor completes an in-progress bus cycle.
44. Without ADS during cache accesses, with ADS during cache writethroughs and writebacks.
The processor writes (pushes) its current state onto the stack
prior to entering the service routine for exceptions and for
BUSCHK, SMI, NMI, and INTR interrupts. Because of these
writes, the state of EWBE affects the processor’s response to
such interrupts and exceptions. For example, if the processor
has initiated a write cycle prior to the next instruction retire-
ment boundary on which such an interrupt would otherwise be
recognized, the bus cycle completes but the processor does not
respond to the interrupt until it samples EWBE asserted so
that it can write to the stack. Also, if the processor has written
to the stack once and EWBE is not asserted thereafter, the pro-
cessor does not write again and its response to an interrupt is
halted. A negated EWBE also pauses the processor’s response
to FLUSH if the flush causes writebacks. However, during
interrupts that do not write to memory (R/S, FLUSH if there
are no writebacks, INIT, and STPCLK), the state of EWBE has
no effect on the processor’s recognition of or response to such
interrupts.
A20M should not be asserted during the first code fetch follow-
ing the RESET or INIT cycles because the masking of bit 20
leads to a fetch from an incorrect address. The BIOS and the
operating system alone are responsible for controlling the
state of A20M. After RESET or INIT, they do this by writing to
an external I/O port. (I/O ports 60h and 64h, or port 92h, or register-shadowed
versions of those ports are commonly used to
control the state of A20M.) The instruction pipeline is serial-
ized by virtue of writing to the I/O port, thus allowing time for
the A20M signal to assert before the next memory or cache
access. Advanced operating systems that do not run under
DOS, such as Windows NT™ and OS/2 operating systems, do
not use Real mode and never assert A20M.
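The effect of A20M on an address is a simple mask of bit 20, which shows why the first code fetch must not be affected. The following Python sketch models that masking; the function name is hypothetical, and the reset fetch address FFFFFFF0h used in the example is the standard x86 value, which this section does not state.

```python
ADDRESS_MASK = 0xFFFFFFFF
A20_CLEAR_MASK = ADDRESS_MASK & ~(1 << 20)


def physical_address(address, a20m_asserted):
    """Return the 32-bit address after optional masking of bit 20."""
    address &= ADDRESS_MASK
    if a20m_asserted:
        return address & A20_CLEAR_MASK
    return address
```

With A20M asserted, a fetch from FFFFFFF0h becomes a fetch from FFEFFFF0h, and an access to 00100000h wraps to 00000000h as it would on an 8086.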
System logic can derive memory and I/O port select signals, as
well as memory row and address signals, from A31–A3 and the
cycle definition signals. Although the processor does not inter-
pret the A4–A3 signals as part of an inquire cycle address, sys-
tem logic must drive them at valid logic levels (0 or 1) during
inquire cycles, and the processor drives both bits to 0 during
writebacks.
While system logic has obtained control of the address bus via
assertion of AHOLD, BOFF or HOLD, the A31–A5 signals
become inputs and define a 32-byte, cache-line, inquire cycle
address in conjunction with the following signals:
The processor floats ADS one clock after system logic asserts
BOFF and in the same clock that the processor asserts HLDA.
Details
The processor initiates bus cycles for the purpose of reading
and writing memory or I/O, and for writebacks of modified
cache lines. While the processor controls the bus, or while it is
writing back a modified cache line (whether in control of the
bus or not), ADS defines the beginning of the cycle. In the
clock that it asserts ADS, the processor also begins driving the
several signals that define and qualify the bus cycle, including
A31–A3 (or A31–A5 for writebacks), AP, the cycle definition
signals (D/C, M/IO and W/R), BE7–BE0, BREQ, A20M, CACHE,
LOCK, PCD, PWT and SCYC.
If ADS initiates a cache line fill and all four ways of the cache
that could accommodate the incoming line are filled with valid
The processor may again drive its own cycles with ADS as early
as one clock after system logic negates AHOLD. Before negat-
ing AHOLD, however, system logic may need to arbitrate
among potential contenders for the address bus so as to avoid
deadlock contention for the bus.
During burst reads (CACHE and KEN both asserted with the
first BRDY of a memory read), the processor drives BE7–BE0
with ADS to identify the bytes of the desired instruction or
operand. The processor drives BE7–BE0 with the desired bytes
at that time because it does not yet know whether the read will
be a single-transfer or a burst—this depends on how system
logic drives KEN with the first BRDY. If system logic negates
KEN, it must return as a single transfer only the bytes speci-
fied on BE7–BE0. If system logic asserts KEN, it must ignore
BE7–BE0 during all transfers of the burst and return all eight
bytes for the starting address on A31–A3. BE7–BE0 does not
change during the four transfers of the burst. (This behavior is
unlike the 486 processor, which drives BE3–BE0 separately for
each transfer of a burst.) System logic must determine the suc-
cessive quadword addresses for each transfer in a burst,
depending on the starting address, as shown in Section 5-4 on
page 5-21.
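System logic derives the remaining three quadword addresses of a burst from the starting address. The following Python sketch assumes the Pentium-style interleaved order, in which each transfer address is the starting quadword offset XORed with 00h, 08h, 10h, and 18h in turn; that ordering is an assumption here, and the manual's own burst-order listing is the authority. The function name is hypothetical.

```python
LINE_MASK = 0x1F       # offset within a 32-byte cache line
QUADWORD_MASK = 0x18   # selects one of four quadwords in the line


def burst_addresses(start):
    """Return the four quadword addresses of a burst, in transfer order.

    Assumes Pentium-style interleaved (XOR) ordering.
    """
    line_base = start & ~LINE_MASK
    first = start & QUADWORD_MASK
    return [line_base | (first ^ step) for step in (0x00, 0x08, 0x10, 0x18)]
```

Under this assumption a burst that starts at 1008h transfers 1008h, 1000h, 1018h, and 1010h.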
The processor floats the bus one clock after the assertion of
BOFF. All output and bidirectional signals used for memory or
I/O accesses are floated. Table 5-8 shows the signals floated.
The same set of signals is floated with HLDA.
All data transfers that are not performed as bursts are per-
formed as one or more single-transfer cycles. For write cycles,
EWBE must be asserted either with or after BRDY in order for
any further writes or certain other operations to be performed
(see the description of EWBE on page 5-62). If system logic
returns more BRDYs than the processor expects for a single-
transfer cycle or a burst cycle, the processor ignores them.
Details
Bus cycle errors such as parity can be reported to the processor
on BUSCHK if this reporting is not done on NMI. The BUSCHK
signal is not used in most PC systems, although higher-end sys-
tems may find uses for it in special situations.
The only write cycles for which the processor asserts
CACHE are 32-byte writebacks of modified data. Writebacks
can be caused by (a) externally initiated inquire cycles or
FLUSH operations, (b) processor-initiated internal snoops or
cache line replacements, or (c) program-initiated WBINVD
instructions. By contrast, the processor drives writethroughs
during write hits to shared cache lines and during write misses,
but writethroughs are driven as single transfers of 1 to 8 bytes.
CACHE is not asserted during writethroughs.
For data cache MESI state transitions during writes, see the
description of the WB/WT signal on page 5-133. For more
details on data-cache MESI state transitions and control, and
the correspondence between MESI states and writeback or
writethrough states, see Section 5.2.56 on page 5-133 and Sec-
tion 6.2 on page 6-8.
While the processor operates with the Test Access Port (TAP),
all TAP events are timed relative to TCK rather than to CLK.
D/C is driven with the other cycle definition outputs (M/IO and
W/R) and with the BE7–BE0 byte-enable outputs during mem-
ory cycles (including cache writethroughs and writebacks), I/O
cycles, locked cycles, special bus cycles, and interrupt
acknowledge operations in the normal operating modes (Real,
Protected, and Virtual-8086) and in SMM, or while PRDY is
asserted. While AHOLD is asserted, D/C is driven only to com-
plete a bus cycle that had been initiated before AHOLD was
asserted, or for inquire cycle writebacks. During the Shut-
down, Halt, and Stop Grant states, D/C is driven only for
inquire cycle writebacks. D/C is not driven during the Stop
Clock state, or while BOFF, HLDA, RESET, or INIT is asserted.
The processor floats D/C one clock after system logic asserts
BOFF and in the same clock that the processor asserts HLDA.
Details
The processor drives D/C according to whether the access is
initiated by the processor’s prefetch or branch logic (indicating
a code access) or its load/store logic (indicating a data access).
In the AMD-K5 processor, code accesses can be done specula-
tively, but data accesses are not. Only data (not code) can be
read from the I/O address space, because the cycle definition
for an I/O code read (D/C = 0, M/IO = 0, W/R = 0) defines an
interrupt acknowledge cycle.
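System logic decodes D/C together with M/IO and W/R to identify the cycle type. The following Python sketch tabulates the eight combinations; only the interrupt acknowledge encoding (all three signals 0) comes from the text above, and the remaining entries are the conventional Pentium-compatible encodings, included here as an assumption. The names are hypothetical.

```python
# Keys are (M/IO, D/C, W/R). Only (0, 0, 0) is stated in the text above;
# the other entries are assumed Pentium-compatible encodings.
CYCLE_TYPES = {
    (0, 0, 0): "interrupt acknowledge",
    (0, 0, 1): "special cycle",
    (0, 1, 0): "I/O read",
    (0, 1, 1): "I/O write",
    (1, 0, 0): "code read",
    (1, 0, 1): "reserved",
    (1, 1, 0): "memory read",
    (1, 1, 1): "memory write",
}


def cycle_type(m_io, d_c, w_r):
    """Return the bus-cycle type for the given cycle definition signals."""
    return CYCLE_TYPES[(m_io, d_c, w_r)]
```

The table makes the point of the paragraph concrete: the encoding that would mean "I/O code read" is the one used for interrupt acknowledge, so no code fetch from I/O space can be expressed.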
If BOFF and EADS are both asserted in the same clock that
AHOLD is negated, EADS is not recognized. If EADS is
asserted on the same clock that HOLD is negated, both the
AMD-K5 and the Pentium processors recognize this as a valid
inquire cycle and process it correctly. However, if EADS is
asserted on the clock following the negation of HOLD, the
AMD-K5 processor does not recognize this as a valid inquire
cycle.
Details
Inquire cycles cause the processor to compare a physical
address driven by system logic with the processor’s physical
address tags for its instruction and data caches. Inquire cycles
can occur in parallel with the processor’s own cache accesses,
which are done through a separate set of linear address tags.
its data cache with the addresses in the instruction cache, and
(c) automatic bus watching, in which a caching device con-
stantly compares addresses being driven by any other device
on the address bus with its own cached addresses. The AMD-K5
and Pentium processors only support the first two types of
snooping, not the third.
There are three methods by which system logic can obtain con-
trol of the address bus prior to running one or more inquire
cycles: AHOLD, BOFF or HOLD. While it has control of at least
the address bus, system logic can drive inquire cycles using
EADS, A31–A5, INV, and (optionally) AP.
The state of the numeric error (NE) bit in CR0 does not affect
the FERR signal.
Driven
The processor drives FERR every clock during memory cycles
(including cache writethroughs and writebacks), cache hits of
all types, I/O cycles, and locked cycles in the normal operating
modes (Real, Protected, and Virtual-8086) and in SMM. FERR
is not driven during the Shutdown, Halt, Stop Grant, or Stop
Clock states, or while RESET, INIT, or PRDY is asserted.
Details
The processor asserts FERR on the instruction boundary of the
next floating-point instruction or WAIT instruction that occurs
following the floating-point instruction that caused the
unmasked floating-point exception—that is, FERR is not
asserted at the time the exception occurs. The IGNNE signal
does not affect the assertion of FERR.
AHOLD, BOFF, and HOLD are all recognized and behave nor-
mally while FLUSH is asserted, and they will intervene in an
in-progress FLUSH operation. For example, if BOFF is
asserted while a FLUSH operation is writing modified lines
back to memory, an in-progress writeback will be aborted.
their signals are tied together so that they run the same pro-
gram.
System logic can use HITM to inhibit access to the bus by other
masters (via BOFF or HOLD) until the writeback associated
with the hit has completed. The time at which the writeback
occurs depends on which input signal was used to hold the pro-
cessor off the bus for the inquire cycle:
■ If AHOLD was used, the processor drives the writeback as
early as two clocks after asserting HITM, whether or not
AHOLD is still asserted at that time.
■ If BOFF or HOLD was used, the processor delays the write-
back until after BOFF or HLDA is negated. In the case of
BOFF, the writeback is driven before any aborted bus cycle
is restarted.
There are three methods by which system logic can obtain con-
trol of the address bus to drive an inquire cycle: AHOLD,
BOFF, or HOLD. AHOLD obtains control only of the address
bus and allows another master to drive only inquire cycles,
whereas BOFF and HOLD obtain control of the full bus
(address and data), allowing another master to drive not only
inquire cycles but also read and write cycles. Unlike BOFF,
AHOLD and HOLD both permit an in-progress bus cycle to
complete, but writebacks can occur while AHOLD is asserted,
whereas pending writebacks during the assertion of HOLD
occur after HOLD is negated, which is similar to BOFF.
Unlike RESET, INIT does not reinitialize the data and instruc-
tion caches, floating-point registers, model-specific registers,
or cache disable (CD) and not-writethrough (NW) bits in CR0.
A20M should not be asserted during the first code fetch follow-
ing the INIT cycle. The operating system alone is responsible
for controlling the state of A20M by writing to an external reg-
ister provided for this purpose. (See the description of A20M
on page 5-18.)
If system logic can leave the INTR signal asserted after the
INTR service routine is entered, the interrupt vector returned
by system logic during the interrupt acknowledge operation
must (in Protected mode) be for an interrupt gate, or for a task
gate that references a TSS with its IF cleared. If the returned
vector is not one of these two types, the processor will again
respond to INTR prior to executing the first instruction of the
service routine, causing an infinite loop.
For a comparison of the states that HITM, HIT, and INV can
assume, see Table 5-11 on page 5-71.
If BOFF is asserted after system logic has returned the first
eight bytes of a cache-line fill, together with BRDY and KEN,
the processor uses those eight bytes but does not cache them,
and the line fill is aborted. When BOFF is negated, the entire bus cycle is
restarted from the beginning and the system must again drive
KEN in the same state that was sampled before the backoff.
Thus, system logic cannot use BOFF to change the state of KEN
and therefore the cacheability status of a line.
On the 486 processor, KEN is sampled twice (on the first and
last transfer of a burst) and must be asserted at both times for
a burst read to be treated as a cache-line fill. On the AMD-K5
and Pentium processors, however, KEN is sampled only on the
first clock of a transfer, during BRDY or NA, whichever is first.
The processor floats LOCK one clock after system logic asserts
BOFF and in the same clock that the processor asserts HLDA.
Details The processor always locks the following types of memory
operations:
■ Interrupt Acknowledge Operations—These are a pair of read
cycles used to obtain an interrupt vector in response to the
assertion of INTR.
■ Descriptor-Table Accesses—These involve segment descrip-
tors in the global descriptor table (GDT), local descriptor
table (LDT) or interrupt descriptor table (IDT) and occur in
Protected mode. The processor performs them during a seg-
ment load to ensure that the Accessed (A) bit in code and
data descriptors is set to 1, or to test and set the Busy (B) bit
in TSS descriptors. The sequence is as follows: (1) the pro-
cessor drives an unlocked read of the descriptor to see if
the relevant bit is set to 1, (2) if the bit is cleared to 0, the
processor then drives a locked read-modify-write to set the
bit to 1. During updates to the Accessed and Busy bits, the
AMD-K5 processor drives a locked four-byte read and four-
The processor always negates LOCK for at least one idle clock
between sequential locked operations. For example, if a read-
modify-write is followed by another read-modify-write, there is
an unlocked idle clock (sometimes called a dead clock)
between the two sequences to allow system logic to reallocate
the bus to another bus master. During this idle clock, the pro-
cessor responds to all signals and pending interrupts.
Only data (not code) can be read from or written to the I/O
address space; the cycle definition for an I/O code read (D/C =
0, M/IO = 0, W/R = 0) defines an interrupt acknowledge cycle,
and the cycle definition for an I/O code write (D/C = 0, M/IO =
0, W/R = 1) defines a special bus cycle.
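The I/O-space cycle definitions above can be expressed as a small decoder. The following Python sketch is illustrative only: the function name and the returned labels are hypothetical, and it assumes that D/C = 1 denotes a data cycle and W/R = 1 denotes a write, consistent with the encodings quoted above.

```python
def classify_io_space_cycle(d_c: int, w_r: int) -> str:
    """Classify a bus cycle that has M/IO = 0 from its D/C and W/R levels."""
    if d_c == 1:
        # Data cycles are the only true accesses to the I/O address space.
        return "I/O write" if w_r == 1 else "I/O read"
    # D/C = 0 with M/IO = 0 never means an I/O code access.
    return "special bus cycle" if w_r == 1 else "interrupt acknowledge"
```

For example, `classify_io_space_cycle(0, 0)` yields the interrupt acknowledge label, because the I/O code read encoding is reused for that purpose.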
If INIT and NMI are both asserted during the Stop Grant state
(not necessarily simultaneously), the AMD-K5 processor recog-
nizes the INIT after leaving the Stop Grant state, then it recog-
nizes the NMI prior to fetching any instructions. Current
implementations of the Pentium processor do not recognize the
NMI in such cases, although future implementations may.
The processor floats PCD one clock after system logic asserts
BOFF and in the same clock that the processor asserts HLDA.
Details If PCD is negated during read misses, the page being accessed
may or may not be cacheable, depending on the state of other
signals. If PCD is asserted during any type of access, the page
is noncacheable. The PCD output affects the processor’s cach-
ing of data only during read misses. It has no effect on the pro-
cessor during read hits, write misses, or write hits, as shown in
Tables 5-17 and 5-18 on page 5-135.
The method of selecting the PCD bit is similar to that for the
PWT bit, described on page 5-105. The cache disable (CD) and
not-writethrough (NW) bits in CR0 are cleared to 0 for normal,
cacheable operation. If a location is already cached before the
operating system sets a PCD bit to 1, any access to that location
will hit in the cache regardless of the state of the PCD bit or
signal.
PCHK is driven for memory and I/O reads, locked reads, and
interrupt acknowledge operations in the normal operating
modes (Real, Protected, and Virtual-8086) and in SMM, or
while PRDY is asserted. PCHK is not driven during any type of
write cycles or special bus cycles; or during the Shutdown,
Halt, Stop Grant, or Stop Clock states; or while BOFF, HLDA,
RESET, or INIT is asserted. While AHOLD is asserted, PCHK
is driven only to complete a bus cycle already begun before the
assertion of AHOLD.
Details To determine data parity, each bit driven on DP7–DP0 is
considered together with the eight bits of the corresponding
byte on D63–D0. If the total number of 1 bits in a byte and its
parity bit is even, the byte is considered free of error (thus
the term even parity). If the number of 1 bits is odd, the byte
is considered to have an error.
During burst reads, the processor checks all eight bytes of
D63–D0 for errors against the even-parity bits sampled
on DP7–DP0. During single-transfer reads, only the enabled
bytes on D63–D0 and the enabled parity bits on DP7–DP0 (as
specified by BE7–BE0) are checked.
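The even-parity check described above can be sketched in Python as follows. This is a minimal illustration, not the processor's implementation: the function name is hypothetical, and it assumes that parity bit DPn covers byte lane n (DP0 with D7–D0, and so on) and that BE7–BE0 are active Low.

```python
def byte_parity_errors(data: int, dp: int, be_n: int = 0x00) -> list:
    """Return the byte lanes (0..7) whose data-plus-parity bit count is odd.

    data -- 64-bit value sampled on D63-D0
    dp   -- 8-bit value sampled on DP7-DP0 (assumed: bit n covers lane n)
    be_n -- BE7-BE0 as driven (a 0 bit enables the lane); the default
            0x00 checks all eight lanes, as in a burst read
    """
    errors = []
    for lane in range(8):
        if (be_n >> lane) & 1:
            continue  # lane not enabled, so it is not checked
        byte = (data >> (8 * lane)) & 0xFF
        ones = bin(byte).count("1") + ((dp >> lane) & 1)
        if ones % 2:  # an odd total violates even parity
            errors.append(lane)
    return errors
```

Passing a `be_n` value other than 0x00 models a single-transfer read, in which only the enabled lanes are checked.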
If PEN is asserted during the BRDY for a read cycle, and the
processor reports a data parity error on PCHK for that cycle,
the processor latches the physical address and cycle definition
of the failed bus cycle and (optionally) generates a machine
check exception. See the description of PEN on page 5-102 for
details.
PEN is sampled for memory and I/O reads, locked reads, and
interrupt acknowledge operations in the normal operating
modes (Real, Protected, and Virtual-8086) and in SMM, or
while PRDY is asserted. PEN is not sampled during any type of
write cycles or special bus cycles; or during the Shutdown,
Halt, Stop Grant, or Stop Clock states; or while BOFF, HLDA,
RESET, or INIT is asserted. While AHOLD is asserted, PEN is
sampled only to complete a bus cycle already begun before the
assertion of AHOLD.
Details If PEN is asserted when a data parity error is reported on
PCHK, the processor latches the physical address and cycle
definition of the failed bus cycle in its 64-bit machine check
address register (MCAR) and its 64-bit machine check type
register (MCTR). These registers can be read with the RDMSR
instruction. See Section 3.3.5 on page 3-33 for details on this
instruction.
Debug software can force the processor into SMM, but the pro-
cessor does not recognize SMI or any other interrupts while
PRDY is asserted. If system hardware or software wishes to
assert RESET, it must exit the HDT before asserting RESET.
The processor floats PWT one clock after system logic asserts
BOFF and in the same clock that the processor asserts HLDA.
Details As Table 5-14 shows, lines in the modified or exclusive MESI
state are said to be in the writeback state, which corresponds to
PWT = 0. Lines in the shared MESI state are said to be in the
writethrough state, which corresponds to PWT = 1.
System logic can use PWT output, along with its WB/WT input,
to determine how the processor will control internal caching.
Tables 5-17 and 5-18 on page 5-135 show how the state of PWT
and WB/WT determine the MESI state of a line in the data
cache after a cache-line fill or writeback. If WB/WT is Low or
PWT is High during a read miss or a write hit to a shared line,
The bits that determine the PWT output are stored in a proces-
sor control register or the TLB. Those bits include the paging
enable (PG) bit in CR0 and the page writethrough (PWT) bit in
one of three locations. The selection of bits depends on the pro-
cessor’s operating mode and the type of access, as follows:
■ In Real mode, and in Protected and Virtual-8086 modes
while paging is disabled (PG bit in CR0 cleared to 0):
PWT output = Low (writeback)
■ In Protected and Virtual-8086 modes while paging is
enabled (PG bit in CR0 set to 1):
For accesses to I/O space, page directory entries, and other
non-paged accesses:
PWT output = PWT bit in CR3
For accesses to 4-Kbyte page table entries or 4-Mbyte
pages:
PWT output = PWT bit in page directory entry
For accesses to 4-Kbyte pages:
PWT output = PWT bit in page table entry
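The selection rules above amount to a short decision procedure. The Python sketch below restates them; the function name, the `access` labels, and the parameter names are hypothetical and chosen only for illustration.

```python
def pwt_output(paging_enabled: bool, access: str,
               cr3_pwt: int = 0, pde_pwt: int = 0, pte_pwt: int = 0) -> int:
    """Return the level driven on PWT (0 = Low/writeback, 1 = High).

    access is one of:
      "non_paged"        -- I/O space, page directory entries, other non-paged
      "page_table_entry" -- access to a 4-Kbyte page table entry
      "4m_page"          -- access to a 4-Mbyte page
      "4k_page"          -- access to a 4-Kbyte page
    """
    if not paging_enabled:
        # Real mode, or Protected/Virtual-8086 mode with PG cleared to 0.
        return 0
    if access == "non_paged":
        return cr3_pwt
    if access in ("page_table_entry", "4m_page"):
        return pde_pwt
    if access == "4k_page":
        return pte_pwt
    raise ValueError("unknown access type: " + access)
```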
The method of selecting the PWT bit is similar to that for the
PCD bit as described on page 5-99. The cache disable (CD) and
not-writethrough (NW) bits in CR0 are cleared to 0 for normal,
cacheable operation.
If R/S is used to initiate the HDT, the debug logic must hold R/
S Low throughout the debug session. The processor negates
PRDY and begins fetching instructions for normal operation
one clock after a Low-to-High transition on R/S, or when the
TAP instruction register is cleared or the TAP is reset.
The processor floats SCYC one clock after system logic asserts
BOFF and in the same clock that the processor asserts HLDA.
Details For purposes of bus cycles, the term aligned means:
■ 2- and 4-byte transfers lie within 4-byte address boundaries
■ 8-byte transfers lie within 8-byte address boundaries
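The alignment definition above can be checked mechanically. The following Python sketch is a hedged illustration; the function name is hypothetical, and the treatment of 1-byte transfers (always aligned) is an assumption, since the definition lists only 2-, 4-, and 8-byte transfers.

```python
def is_aligned(address: int, size: int) -> bool:
    """Return True if a transfer of `size` bytes at `address` is aligned."""
    if size == 1:
        return True  # assumption: a single byte cannot cross a boundary
    if size in (2, 4):
        boundary = 4
    elif size == 8:
        boundary = 8
    else:
        raise ValueError("unsupported transfer size")
    # The transfer must start and end inside the same boundary-sized block.
    return (address % boundary) + size <= boundary
```

Under this definition a 2-byte transfer at offset 1 within a doubleword is aligned, whereas one at offset 3 is not.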
that an I/O device has not been accessed for several minutes.
The power management logic can then assert SMI, and the
SMM service routine can obtain relevant information from the
power management logic with which to make power-down deci-
sions under program control. These decisions can be communi-
cated back to the power management logic, which in turn can
power the I/O device down and assert STPCLK to the proces-
sor.
sor left off when it recognized SMI, unless the value is altered
by the SMM service routine).
The processor enters the Halt state from the normal operating
modes (Real, Protected or Virtual-8086) or SMM when it exe-
cutes the HLT instruction. The processor leaves the Halt state
and returns to its prior operating mode when RESET, SMI,
INIT, NMI, or INTR is asserted. If STPCLK is asserted within
The processor enters the Stop Clock state when system logic
turns off CLK while STPCLK is asserted. This is the minimum-
power state and it can only be entered from the Stop Grant
state after BRDY has been returned for the Stop Grant special
bus cycle. In the Stop Clock state, the processor’s phase-lock
loop and I/O buffers are disabled, except for the I/O buffers on
CLK and the Test Access Port (TAP) signals. System logic
should not change the state of any signals, and the processor
does not recognize any signal edges in the Stop Clock state.
When CLK is restarted, the processor returns to the Stop Grant
state, responds to inputs in the next clock, but cannot drive bus
cycles until its phase-lock loop is synchronized. The latter
takes several clocks (see the data sheet for this specification).
The CLK can be driven with a different frequency, and/or the
bus-to-processor clock ratio can be changed on the BF input(s)
upon restarting CLK.
the exclusive state, a subsequent write hit to the same line
transitions the line to the modified state. During write hits, the
states of PWT and WB/WT can only change a line from shared
to exclusive; they cannot change an exclusive line to a shared line.
5.3.2 Addressing
The address for a bus cycle is driven on A31–A3 and BE7–BE0.
A31–A3 carry the upper 29 bits of the address, identifying an
aligned 8-byte (quadword) region in memory. BE7–BE0 iden-
tify the accessed bytes in that quadword, in effect indicating
the three least-significant bits of the address and the size (in
bytes) of the desired transfer. For burst and inquire cycles,
A31–A5 are sufficient to identify the memory location of the
cache line. For burst reads, which are four-transfer cache-line
fills, system logic should watch A4–A3 and return the
addressed quadword first, before returning the remainder of
the cache line.
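The relationship between a byte address, A31–A3, and BE7–BE0 can be illustrated with a pair of helper functions. This Python sketch is illustrative only: the function names are hypothetical, BE7–BE0 are taken to be active Low (a 0 bit selects a byte), and the decoder assumes that the enabled bytes are contiguous.

```python
def encode_bus_address(byte_address: int, size: int):
    """Return (A31-A3 value, BE7-BE0 value) for a transfer in one quadword."""
    offset = byte_address & 0x7
    if not 1 <= size <= 8 - offset:
        raise ValueError("transfer does not fit in a single quadword")
    a31_a3 = (byte_address & 0xFFFFFFFF) >> 3      # upper 29 address bits
    enabled = ((1 << size) - 1) << offset          # 1 = byte is accessed
    be_n = ~enabled & 0xFF                         # active Low on the pins
    return a31_a3, be_n


def decode_bus_address(a31_a3: int, be_n: int):
    """Recover (byte address, size) from A31-A3 and BE7-BE0."""
    enabled = ~be_n & 0xFF
    if enabled == 0:
        raise ValueError("no byte lanes are enabled")
    offset = (enabled & -enabled).bit_length() - 1  # lowest enabled lane
    size = bin(enabled).count("1")
    return (a31_a3 << 3) | offset, size
```

For instance, a 2-byte access at byte address 1002h encodes as A31–A3 = 200h with BE7–BE0 = F3h, and decoding that pair returns the original address and size.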
5.3.3 Alignment
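For purposes of bus cycles, the term aligned means:
■ 2- and 4-byte transfers lie within 4-byte address boundaries
■ 8-byte transfers lie within 8-byte address boundaries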
For each signal in the timing diagrams, the High level repre-
sents 1, the Low level represents 0, and the middle level repre-
sents the floating (high-impedance) state. When both the High
and Low levels are shown, the meaning depends on the signal.
For a single signal, it means don’t care. For a bus, it means that
the processor or system logic is driving a value, but this value
may or may not be valid (for example, the value on the address
bus is valid only during the assertion of ADS, although
addresses are also driven on the bus at other times).
The value indicated for the address bus represents the value
driven on lines A31–A3. This value, multiplied by 8, is the byte
address of an 8-byte region in memory. The value for BE7–BE0
indicates which bytes in that region are to be transferred: the
bytes corresponding to the zeros on BE7–BE0 are transferred.
During the read cycle, the processor drives PCD, PWT, and
CACHE to indicate its caching and cache-coherency intent for
the access. System logic returns KEN and WB/WT to either con-
firm or change this intent. In this example, the processor
asserts PCD and negates CACHE, so the accesses are non-
cacheable, even though system logic asserts KEN during the
BRDYs to indicate its support for cacheability. The processor
(which drives CACHE) and system logic (which drives KEN)
must agree in order for an access to be cacheable. Likewise,
PWT and WB/WT must agree in order for a cacheable
line to be cached in the writeback state.
The processor can drive another cycle (in this example, a write
cycle) as early as two clocks after the assertion of BRDY. A
dead (or idle) clock is thus guaranteed between any two bus
cycles. As in the read cycle, neither the address nor the cycle-
definition signals are valid until the processor asserts ADS,
and the value driven on A31–A3 is valid only during the asser-
tion of ADS.
While Figure 5-2 shows BRDY returned in the next clock after
ADS, most DRAM-based systems add wait states (idle clocks)
between ADS and BRDY.
[Timing diagram. Signals: CLK, A31–A3, ADS, AP, BE7–BE0, BRDY, BREQ, CACHE, D/C, D63–D0, DP7–DP0, KEN, M/IO, PCD, PCHK, PEN, PWT, W/R, WB/WT. Annotations: Read, Write.]
Single-Transfer Memory Write Delayed by EWBE Signal
Figure 5-3 shows two consecutive memory writes. The first
write fills an external write buffer and the second write is
stalled for three clocks by the negation of EWBE.
For writes, system logic can store the address and data in a
write buffer, return BRDY, and perform the store to memory
later. If the number of outstanding writes exceeds the size of
the write buffer, system logic must negate EWBE to prevent
the processor from sending additional writes until EWBE is
asserted. The advantage of negating EWBE as opposed to not
asserting BRDY is that negating EWBE prevents only write
requests, but not asserting BRDY stalls the bus and prevents
all requests.
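The write-posting behavior described above can be modeled in a few lines. The Python sketch below is a hypothetical model of system logic, not a description of any particular chipset: the class name, method names, and buffer depth are assumptions, and EWBE is modeled as negated whenever the buffer is full.

```python
from collections import deque


class ExternalWriteBuffer:
    """Hypothetical model of a system-logic write buffer gated by EWBE."""

    def __init__(self, depth: int):
        self.depth = depth
        self.pending = deque()

    @property
    def ewbe_asserted(self) -> bool:
        # Modeled as asserted while the buffer can accept another write.
        return len(self.pending) < self.depth

    def post_write(self, address: int, data: int) -> bool:
        """Accept a write (BRDY returned) unless EWBE is negated."""
        if not self.ewbe_asserted:
            return False  # the processor must hold off further writes
        self.pending.append((address, data))
        return True

    def drain_one(self, memory: dict) -> None:
        """Complete the oldest posted write to memory."""
        if self.pending:
            address, data = self.pending.popleft()
            memory[address] = data
```

With a depth of 1, the first write fills the buffer and a second write is refused until `drain_one` completes the first, which mirrors the stall described for the second write.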
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, D/C, D63–D0, EWBE, M/IO, W/R. Annotations: Write, Effective BRDY, Write.]
I/O Read and Write Figure 5-4 shows an I/O read followed by an I/O write. The pro-
cessor accesses I/O when it executes an I/O instruction (any of
the INx or OUTx instructions). Accesses to memory-mapped
I/O ports appear on the bus as accesses to memory rather than
to the I/O address space.
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, D/C, D63–D0, M/IO, W/R. Annotations: Read, Write.]
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, D/C, D63–D0, M/IO, SCYC, W/R.]
Burst Read Figure 5-6 shows two consecutive burst reads. During burst
reads (CACHE and KEN both asserted with the first BRDY of a
memory read), the processor drives BE7–BE0 with ADS to
identify the bytes of the desired instruction or operand. The
processor drives BE7–BE0 with the desired bytes at that time
because it does not yet know whether the read will be a single-
transfer or a burst—this depends on how system logic drives
KEN with the first BRDY. If system logic negates KEN it must
return, as a single transfer, only the bytes specified on BE7–
BE0. If system logic asserts KEN, it must ignore BE7–BE0 dur-
ing all transfers of the burst and return all eight bytes for the
starting address on A31–A3. BE7–BE0 does not change during
the four transfers of the burst. (This behavior is unlike the 486
processor, which drives BE3–BE0 separately for each transfer
of a burst.) System logic must determine the successive quad-
word addresses for each transfer in a burst, depending on the
starting address, as shown in Table 5-21.
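As an illustration of how system logic might derive the four quadword addresses of a burst, the Python sketch below assumes the interleaved order commonly used on Pentium-compatible buses, in which the quadword index within the line is the starting index XORed with the transfer number. That ordering is an assumption here, and the function name is hypothetical; Table 5-21 remains the authoritative reference.

```python
def burst_quadword_addresses(start_address: int) -> list:
    """Return the four quadword byte addresses of a burst, in transfer order.

    Assumes an interleaved (XOR) burst order; verify against Table 5-21.
    """
    line_base = start_address & ~0x1F          # 32-byte cache-line base
    first = (start_address >> 3) & 0x3         # starting quadword (A4-A3)
    return [line_base | ((first ^ step) << 3) for step in range(4)]
```

Under this assumption, a burst that starts at 1008h returns quadwords at 1008h, 1000h, 1018h, and 1010h, so the addressed quadword arrives first.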
In the clock after ADS, the processor drives the first of four
sequential eight-byte (quadword) transfers on the data bus.
The processor holds the first transfer on the bus until system
logic returns BRDY, then it transfers the next quadword. In
this example, system logic returns BRDY with no wait states,
and the processor responds by driving the subsequent quad-
word in the next clock. Typical systems, however, add one or
more wait states between the transfers.
For both read cycles, the processor asserts CACHE with ADS
and system logic asserts KEN with the BRDY of the first trans-
fer. Thus, CACHE and KEN agree, and the access is cached.
This agreement between CACHE and KEN is required in order
for a burst read to occur. The processor only drives burst reads
if the access is cacheable. If either CACHE or KEN were
negated during the BRDY of the first transfer, the read would
terminate with the first quadword transfer, thus becoming a
single-transfer read.
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, BREQ, CACHE, D/C, D63–D0, KEN, M/IO, PWT, W/R, WB/WT. Annotations: Read, Read.]
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, KEN, M/IO, NA, PWT, W/R, WB/WT. Annotations: Read, Read.]
Burst Writeback Figure 5-8 shows a burst read followed by a writeback. Write-
backs are the only type of burst write that the processor per-
forms. They can be initiated by the processor or by system
logic in the following cases:
■ Processor-Initiated Writebacks:
• Replacement—If a cache-line fill is initiated when all
four ways of the cache that could accommodate the in-
coming line are filled with valid entries, the processor
uses a round-robin algorithm to select a line for replace-
ment. Before a replacement is made to a data cache line
in the modified state, the line is written back to memory.
• Internal Snoop—The processor snoops the data cache
whenever an instruction-cache line is read, and it snoops
the instruction cache whenever a data cache line is writ-
ten. This snooping is performed to determine whether
the same address is stored in both caches, a situation
that is taken to imply the occurrence of self-modifying
code. If a snoop hits a data cache line in the modified
state, the line is written back to memory before being in-
validated.
• WBINVD Instruction—When the processor executes a
WBINVD instruction, it writes back all modified lines in
the data cache and then invalidates all lines in both
caches. The action taken in response to the WBINVD in-
struction is essentially the same as the action taken in
response to the FLUSH input signal, except that the ac-
knowledge cycles differ. For details, see page 5-185.
■ System-Initiated Writebacks:
• Inquire Cycle Hits—If an inquire cycle hits a modified
line in the data cache, the processor writes back the line.
For details, see page 5-157.
• FLUSH—If system logic asserts the FLUSH input, the
entire contents of the data cache are written back to
memory before the entire contents of both caches are in-
validated. The action taken in response to the FLUSH
input signal is essentially the same as the action taken in
response to the WBINVD instruction, except that the ac-
knowledge cycles differ. For details, see page 5-183.
During the burst read (Step 2), the states of PWT and WB/WT
are the same as in Figure 5-6 and Figure 5-7.
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, EADS, KEN, M/IO, PWT, W/R, WB/WT. Annotations: Read, Write.]
AHOLD-Initiated Inquire Miss
Figure 5-9 shows a burst read, during which system logic
asserts AHOLD to acquire the address bus for an inquire cycle.
The processor floats the address bus one clock after AHOLD is
asserted, although the data bus continues to return data from
the in-progress burst read. (The processor supports only one in-
progress bus cycle. No pending bus cycles are buffered.) Two
clocks after asserting AHOLD, system logic initiates the
inquire cycle by asserting EADS, driving INV (negated in this
example), and driving the inquire address on A31–A5.
[Timing diagram. Signals: CLK, A31–A3, ADS, AHOLD, AP, APCHK, BE7–BE0, BRDY, BREQ, D/C, D63–D0, EADS, HIT, HITM, INV, M/IO, W/R. Annotations: Read, Inquire.]
AHOLD-Initiated Inquire Hit to Shared or Exclusive Line
Figure 5-10 shows an example similar to Figure 5-9, minus the
address parity error, but this inquire cycle hits either a shared
or exclusive line in the cache, as indicated by the assertion of
HIT and the negation of HITM two clocks after the assertion of
EADS. The processor invalidates the cache line because sys-
tem logic asserts INV with EADS. The processor may drive a
new bus cycle as early as one clock after system logic negates
AHOLD.
[Timing diagram. Signals: CLK, A31–A3, ADS, AHOLD, BE7–BE0, BRDY, D/C, D63–D0, EADS, HIT, HITM, INV, M/IO, W/R. Annotations: Read, Inquire.]
AHOLD-Initiated Inquire Hit to Modified Line
Figure 5-11 shows the same sequence as in Figure 5-10, but this
time the inquire cycle hits a modified line. As in Figure 5-10,
system logic asserts INV with EADS. Two clocks later, the
processor asserts both HIT and HITM. A few clocks later, the
processor drives a writeback for the cache line and then
invalidates its cached copy. The processor holds HITM
asserted until one clock after the last BRDY of the writeback.
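The inquire outcomes described for the miss, shared or exclusive hit, and modified hit cases can be summarized in a small function. The Python sketch below is illustrative; the function name and return layout are hypothetical. The cases with INV asserted follow the descriptions above, whereas the case with INV negated (the line is assumed to remain valid in the shared state) is an assumption based on conventional MESI behavior and should be checked against Table 5-11.

```python
def inquire(state: str, inv: bool):
    """Model an inquire cycle against a line in MESI state M, E, S, or I.

    Returns (hit, hitm, writeback, next_state).
    """
    if state not in ("M", "E", "S", "I"):
        raise ValueError("state must be one of M, E, S, I")
    if state == "I":
        return (False, False, False, "I")   # inquire miss
    modified = state == "M"
    # Assumption: with INV negated, the line stays valid as shared.
    next_state = "I" if inv else "S"
    # A modified hit asserts HITM and is written back before the change.
    return (True, modified, modified, next_state)
```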
[Timing diagram. Signals: CLK, A31–A3, ADS, AHOLD, BE7–BE0, BRDY, D/C, D63–D0, EADS, HIT, HITM, INV, M/IO, W/R.]
Bus Backoff (BOFF) BOFF provides the fastest response of the three bus-hold
inputs. Unlike AHOLD and HOLD, BOFF does not permit an
in-progress bus cycle to complete. It forces the processor off
the bus in the next clock, aborting any in-progress bus cycle
that the processor may have begun.
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BOFF, BRDY, D/C, D63–D0, M/IO, W/R.]
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BOFF, BRDY, CACHE, D/C, D63–D0, EADS, HIT, HITM, INV, KEN, M/IO, W/R. Annotations: Read (aborted), Inquire, Writeback, Restarted Read.]
HOLD-Initiated Inquire Hit to Shared or Exclusive Line
Figure 5-14 shows HOLD asserted in the same clock that the
processor begins a read cycle. The processor completes the
read (which is a burst read) and asserts HLDA two clocks after
the last BRDY of the in-progress cycle. It also floats all output
and bidirectional signals used for memory or I/O accesses at
the same time it asserts HLDA.
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, EADS, HIT, HITM, HLDA, HOLD, INV, KEN, M/IO, W/R. Annotations: Read, Inquire.]
HOLD-Initiated Inquire Hit to Modified Line
Figure 5-15 shows an example similar to the one in Figure 5-14,
except that the inquire cycle hits a modified line (both HIT and
HITM asserted two clocks after EADS). System logic negates
HOLD in the clock after EADS, and two clocks later (one clock
after HIT and HITM transition) the processor negates HLDA.
As early as one clock after negating HLDA, the processor
asserts ADS to drive the writeback, after which the processor
invalidates its copy of the line.
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, EADS, HIT, HITM, HLDA, HOLD, INV, KEN, M/IO, W/R.]
Basic Locked Operation
Figure 5-16 shows a pair of read-write bus cycles. The
processor asserts LOCK with the ADS of the first bus cycle in the
locked operation, and holds it asserted until the last expected
BRDY of the last bus cycle in the locked operation. Between
the locked operations, the processor negates LOCK for at least
one clock.
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, KEN, LOCK, M/IO, W/R.]
TLB Miss (4-Kbyte Page)
Figure 5-17 shows a TLB miss for a 4-Kbyte page. An overview
of the 4-Kbyte paging mechanism is illustrated in Figure 3-2 on
page 3-5. The paging mechanism for 4-Mbyte pages (Figure 3-3
on page 3-6) is similar but somewhat simpler. The processor
has separate TLBs for the two page sizes.
The general sequence, both for PDE and PTE, is as follows for
accesses to a 4-Kbyte page:
■ The processor drives an unlocked read of the PDE or PTE to
see if the relevant bit (A or D) is set.
■ If the bit is cleared (0), the processor then drives a locked
read-modify-write (four-byte read followed by four-byte
write) to set the bit.
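The two-step sequence above can be written out as the list of bus cycles the processor would drive for one table entry. The Python sketch below is illustrative only; the function name and the tuple layout (cycle type, address, locked flag) are hypothetical.

```python
def table_entry_cycles(entry_address: int, bit_already_set: bool) -> list:
    """Return the bus cycles driven to check and update a PDE or PTE bit.

    Each cycle is a tuple of (cycle type, address, locked flag).
    """
    # Step 1: an unlocked read tests the relevant bit (A or D).
    cycles = [("read", entry_address, False)]
    if not bit_already_set:
        # Step 2: a locked read-modify-write (four-byte read followed by
        # a four-byte write) sets the bit.
        cycles.append(("read", entry_address, True))
        cycles.append(("write", entry_address, True))
    return cycles
```

When the bit is already set, only the single unlocked read appears on the bus, so no locked cycles are needed.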
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, EADS, KEN, LOCK, M/IO, W/R.]
Locked Operation with BOFF Intervention
Unlike AHOLD and HOLD, BOFF does not permit an in-progress
bus cycle to complete. It forces the processor off the
bus in the next clock, aborting any in-progress bus cycle that
the processor may have begun. If BOFF is asserted during a
locked operation, only the cycle(s) aborted before their last
BRDY and the cycles not yet run are restarted after BOFF is
negated. Thus, system logic must keep track of all cycles in the
locked operation that have completed before the assertion of
BOFF and must continue the locked operation immediately
after BOFF is negated, except that if a writeback is pending
when BOFF is negated, the writeback takes precedence over
the restarting of the aborted cycles in the locked operation.
System logic should ensure that the processor results for inter-
rupted and uninterrupted locked cycles are consistent. That is,
system logic must guarantee that the memory accessed by the
processor is not modified during the time another bus master
controls the bus.
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BOFF, BRDY, CACHE, D/C, D63–D0, KEN, LOCK, M/IO, W/R.]
Interrupt Acknowledge Operation
Figure 5-19A shows system logic asserting INTR during a burst
read. The figure shows the resulting bus behavior, up to the
start of the interrupt handler. When the processor recognizes
an INTR interrupt at the next instruction-retirement bound-
ary, the processor performs the following actions:
■ Finish In-Progress Bus Cycle—In Figure 5-19A, a burst read is
in progress when system logic asserts INTR. The processor
supports only one such in-progress bus cycle.
■ Flush Instruction Pipeline—This is not visible on the bus.
■ Acknowledge Interrupt—The interrupt acknowledge opera-
tion consists of a locked pair of reads, as shown in Table
5-22. The first read is not functional (a protocol relic). The
second read returns the interrupt vector in D7–D0. (The
interrupt vector is an offset into an interrupt table.) System
logic must return a BRDY in response to both cycles. The
processor inserts at least one idle clock between the locked
reads.
■ System logic will typically not be able to determine the
instruction boundary on which the processor recognizes
INTR. Thus, as a practical matter, system logic should hold
INTR asserted until the beginning of the interrupt acknowl-
edge operation, or until there is some other evidence that
the interrupt service routine has been entered (for exam-
ple, the access to the interrupt-table address).
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, EADS, INTR, KEN, LOCK, M/IO, W/R. Annotations: INTR Asserted, Interrupt Acknowledge Cycles.]
[Timing diagrams, two further panels. Signals in each: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, EADS, INTR, KEN, LOCK, M/IO, W/R.]
Basic Special Bus Cycle
Figure 5-20 shows a basic special bus cycle, which is defined
during ADS by D/C = 0, M/IO = 0, and W/R = 1 and differentiated
by BE7–BE0 and A31–A3. In this example, BE7–BE0 = FBh and
A31–A3 = 0, so it is the special cycle the processor
generates after executing a HLT instruction. System logic must
respond with BRDY.
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, D/C, EWBE, M/IO, W/R.]
Shutdown Cycle Figure 5-21 shows a shutdown and the special cycle that fol-
lows. The processor enters shutdown when an interrupt or
exception occurs during the handling of a double fault (vector
8), which amounts to a triple fault. When the processor encoun-
ters such a triple fault, it stops its activity on the bus and gen-
erates the special bus cycle for shutdown (BE7–BE0 = FEh).
System logic must respond with BRDY.
System logic must assert NMI, INIT, RESET, or SMI to get the
processor out of the Shutdown state.
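The special bus cycles named in this chapter can be told apart by a simple lookup once D/C = 0, M/IO = 0, and W/R = 1 have been observed. The Python sketch below uses only the encodings that the text states (Halt, Shutdown, and the Stop-Grant cycle described later in the chapter) and passes the address value through exactly as the text prints it. The function name and labels are hypothetical, and other special cycles are deliberately left undecoded.

```python
def decode_special_cycle(be_n: int, address_value: int) -> str:
    """Identify a special bus cycle from BE7-BE0 and the address value.

    Only encodings quoted in the text are recognized here.
    """
    if be_n == 0xFE:
        return "shutdown"
    if be_n == 0xFB and address_value == 0x0:
        return "halt"
    if be_n == 0xFB and address_value == 0x10:
        return "stop grant"
    return "other"  # not covered by the encodings quoted in the text
```

In every case, system logic must still respond with BRDY.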
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, INTR, KEN, LOCK, M/IO, W/R. Annotations: Shutdown Occurs, Shutdown Special Cycle.]
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, FLUSH, KEN, LOCK, M/IO, W/R.]
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, KEN, LOCK, M/IO, W/R. Annotations: INVD Instruction Completes, Cache Invalidation Special Cycle.]
Cache-Writeback and Invalidation Cycle (WBINVD Instruction)
Figure 5-24A and Figure 5-24B show the cache-writeback and
invalidation special bus cycle, followed by the cache-invalidation
special bus cycle. The processor drives these two special
cycles after executing the WBINVD instruction.
[Timing diagrams, two panels. Signals in each: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, KEN, LOCK, M/IO, W/R. Annotations in the first panel: WBINVD Instruction Completes, Writeback.]
Branch-Trace Message Cycles
Figure 5-25 shows the two branch-trace message special bus
cycles that the processor generates for each taken branch
when branch tracing is enabled as described in Section 7.6 on
page 7-17. System logic can accumulate the address and data
bus values for debugging or profiling.
[Timing diagram. Signals: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, KEN, M/IO, W/R. Annotation: Branch-trace message special cycles.]
Transition from Normal Execution to SMM
Figure 5-26A and Figure 5-26B show the transition from one of
the processor’s normal operating modes (Real, Protected, or
Virtual-8086 mode) to System Management Mode (SMM). System
logic causes this transition by asserting SMI.
[Timing diagrams, two panels. Signals in each: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, FLUSH, KEN, LOCK, M/IO, SMI, SMIACT, W/R. Annotations in the first panel: SMI Asserted, SMIACT Asserted. Annotation in the second panel: Begin save of processor state.]
Stop-Grant and Stop-Clock States
Figure 5-27A and Figure 5-27B show the processor’s transition
from normal execution to the Stop-Grant state, then to the
Stop-Clock state, and finally back to normal execution. The
series of transitions begins when system logic asserts STPCLK.
Upon recognizing a STPCLK interrupt at the next instruction-
retirement boundary, the processor performs the following
actions, in the order shown:
1. Flush Pipeline—The processor invalidates all instructions
remaining in the pipeline. This is not visible on the bus.
2. Complete In-Progress Cycle—If the processor had begun a
bus cycle or locked operation when STPCLK was asserted,
the processor completes the bus cycle and waits until the
system asserts the last expected BRDY and also asserts
EWBE. If no bus cycle is in progress, system logic must
assert EWBE at the same time as, or at some time after, it
asserts STPCLK. In Figure 5-27A, a burst read is shown
completing after STPCLK is asserted.
3. Stop-Grant Cycle—After sampling EWBE asserted, the
processor drives a Stop-Grant special bus cycle. This cycle
is identified by D/C = 0, M/IO = 0, W/R = 1, BE7–BE0 = FBh
and A31–A3 = 10h. System logic must respond by asserting
BRDY. This is visible on the bus, near the middle of Figure
5-27A.
4. Stop Internal Clock—When system logic returns BRDY for
the Stop-Grant special bus cycle, the processor stops its
internal clock and floats D63–D0 and DP7–DP0. This is on
the bus between Figure 5-27A and Figure 5-27B immedi-
ately after the BRDY of the Stop-Grant special bus cycle.
5. (Optional) Stop Bus Clock—After returning BRDY in
response to the Stop-Grant special bus cycle, power-man-
agement logic can transition to the Stop-Clock state by stop-
ping CLK while STPCLK is held asserted.
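The encoding given in step 3 is what bus-monitoring logic or a simulation model would match to recognize the Stop-Grant special bus cycle. The following Python sketch illustrates that check under stated assumptions: the `BusCycle` type and `is_stop_grant` function are hypothetical names, and the address field simply carries the value 10h quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusCycle:
    """Sampled cycle-definition signals (hypothetical model, not an AMD API)."""
    d_c: int           # D/C level
    m_io: int          # M/IO level
    w_r: int           # W/R level
    byte_enables: int  # BE7-BE0 as an 8-bit value
    address: int       # address value as quoted in the text (10h)


def is_stop_grant(cycle: BusCycle) -> bool:
    """Return True if the cycle matches the Stop-Grant encoding from step 3."""
    return (
        cycle.d_c == 0
        and cycle.m_io == 0
        and cycle.w_r == 1
        and cycle.byte_enables == 0xFB
        and cycle.address == 0x10
    )
```

A model of system logic would respond to a matching cycle by returning BRDY, as step 3 requires.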
[Figure 5-27A and Figure 5-27B: timing diagrams of the Stop-Grant and Stop-Clock states. Signals shown: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, KEN, LOCK, M/IO, STPCLK, W/R. Annotations: Stop-Clock state, normal state.]
[Figure: timing diagram showing INIT asserted, followed by a code fetch from FFFF_FFF0h. Signals shown: CLK, A31–A3, ADS, BE7–BE0, BRDY, CACHE, D/C, D63–D0, …, INIT, KEN, M/IO, RESET, W/R.]
6 System Design
Throughout this chapter, the term clock can refer either to the
processor’s internal clock or to the bus clock (CLK). The
descriptions that follow therefore state explicitly which type
of clock is meant.
6.1 Memory
[Figure: memory map, listed from top to bottom.
FFFF_FFFFh (4 Gbyte) down to FFFF_C000h: boot ROM
Below the boot ROM: extended (expanded) memory; hardware alias
000F_FFFFh (1 Mbyte) down to 000F_C000h: aliased boot ROM; BIOS remap during boot
0009_FFFFh (640 Kbyte): top of low (conventional) memory
0003_FFFFh down to 0003_8000h: SMM memory
Upward from 0000_0000h: interrupt vectors, BIOS data, DOS kernel]
Figure 6-2 shows the default map of the SMM memory area. It
consists of a 64-Kbyte area, between 0003_0000h and
0003_FFFFh, of which the top 32 Kbytes (0003_8000h to
0003_FFFFh) must be populated with RAM. The SMM
service-routine entry point is located at 0003_8000h.
[Figure 6-2: default SMM memory map. A 32-Kbyte minimum RAM area holds the SMM service routine; the service-routine entry point is at 0003_8000h.]
6.2 Cache
6.2.1 L2 Cache
To improve system performance, an L2 cache can be added
between the processor and main memory. The L2 cache can be
implemented for 3-2-2-2 bursts using 15-ns asynchronous
SRAM on a 60-MHz or 66-MHz bus. Faster bursts can be imple-
mented with synchronous SRAM. 9-ns SSRAM can achieve 3-1-
1-1 bursts at 66 MHz and 10-ns SSRAM can achieve 2-1-1-1
bursts at 50 MHz.
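To compare the burst patterns quoted above, it helps to convert each one into clocks per burst and a peak transfer rate. The Python sketch below does this arithmetic under stated assumptions: each of the four transfers moves 8 bytes (one full width of the 64-bit D63–D0 data bus), idle clocks between bursts are ignored, and the function names are hypothetical.

```python
def burst_clocks(pattern: str) -> int:
    """Total bus clocks for a burst pattern such as '3-2-2-2'."""
    return sum(int(n) for n in pattern.split("-"))


def peak_burst_rate(pattern: str, bus_mhz: float, bytes_per_transfer: int = 8) -> float:
    """Peak transfer rate in millions of bytes per second.

    Assumes one transfer per pattern entry and no idle clocks between bursts.
    """
    transfers = len(pattern.split("-"))
    return transfers * bytes_per_transfer * bus_mhz / burst_clocks(pattern)


if __name__ == "__main__":
    for pattern, mhz in (("3-2-2-2", 66), ("3-1-1-1", 66), ("2-1-1-1", 50)):
        print(pattern, mhz, "MHz:", burst_clocks(pattern), "clocks,",
              round(peak_burst_rate(pattern, mhz), 1), "Mbyte/s peak")
```

These are upper bounds for back-to-back bursts only; real throughput depends on hit rates and on the cycles between bursts.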
BOFF Arbitration
BOFF obtains control of the full bus (address and data) in the
next clock, intervening in any in-progress bus cycle if
necessary. It provides the fastest response of the three
bus-hold inputs. The processor floats its outputs in the clock
after BOFF is asserted. Thus, the signal can be used not only
for inquire cycles but also to resolve deadlock between two
bus masters during inquire cycles.
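[Figure: BOFF arbitration example involving the AMD-K5 processor and another caching master, connected through the processor bus and system bus. Numbered events: 1 BOFF, 2 EADS, 3 HITM, 4 BOFF, 5 Writeback.]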
AHOLD Arbitration
AHOLD’s sole function is to support inquire cycles. When system
logic asserts AHOLD, it obtains control of the address bus
only, leaving the data bus available to the processor for the
completion of an in-progress bus cycle. If an inquire cycle hits
a modified line while AHOLD is asserted, the writeback can
occur while AHOLD is either asserted or negated.
[Figure: arbitration example with the AMD-K5 processor, a look-through L2 cache, system logic, main memory, and another bus master on the system bus. Labeled events include 1 Memory Access, 5 BOFF, and 9 Writeback.]
HOLD Arbitration
System logic can use the HOLD (request) and HLDA
(acknowledge) protocol to gain control of the address and data
buses. Like BOFF, HOLD/HLDA gains control of both the address
and data buses, but only after the processor completes any
in-progress bus cycle or sequence of cycles, such as a locked
cycle. Unlike BOFF, however, the HOLD/HLDA protocol cannot
resolve deadlock. In systems where deadlock can occur, BOFF
must be used, and there is no need to support HOLD/HLDA.
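The three bus-hold inputs differ in which buses they take, whether they cut into an in-progress cycle, and whether they can resolve deadlock. The Python sketch below collects only what the descriptions above state; the type and function names are hypothetical, and a value of `None` marks a property the text does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BusHoldInput:
    """Properties of one bus-hold mechanism, as described in the text."""
    name: str
    takes_data_bus: bool                # False: address bus only
    preempts_cycle: bool                # True: intervenes in an in-progress cycle
    resolves_deadlock: Optional[bool]   # None: not stated in the text


BUS_HOLD_INPUTS = [
    BusHoldInput("BOFF", takes_data_bus=True, preempts_cycle=True, resolves_deadlock=True),
    BusHoldInput("AHOLD", takes_data_bus=False, preempts_cycle=False, resolves_deadlock=None),
    BusHoldInput("HOLD", takes_data_bus=True, preempts_cycle=False, resolves_deadlock=False),
]


def deadlock_capable() -> List[str]:
    """Names of the inputs the text explicitly says can resolve deadlock."""
    return [m.name for m in BUS_HOLD_INPUTS if m.resolves_deadlock is True]
```

Under this summary, `deadlock_capable()` returns only BOFF, which matches the guidance that BOFF must be used wherever deadlock can occur.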
[Figure: look-through L2 cache and system logic on the system bus with main memory and another bus master. Labeled events: 1 WB/WT = 0, 2 WB/WT = 1.]
The processor enters SMM when system logic asserts the SMI
interrupt and the processor acknowledges it with SMIACT, at
which point the processor saves its state and jumps to the SMM
service routine. The processor returns from SMM when it exe-
cutes the RSM (resume) instruction from within the SMM ser-
vice routine. Upon return, the processor picks up where it left
off in its prior operating mode, except that special return
options are provided when the processor enters SMM from the
Halt state or from a trapped I/O instruction, as described in the
sections below.
Figure 6-2 on page 6-7 shows the default map of the SMM
memory area. It consists of a 64-Kbyte area, between 0003_0000h
and 0003_FFFFh, of which the top 32 Kbytes (0003_8000h to
0003_FFFFh) must be populated with RAM. The default
code-segment (CS) base address for the area, called the SMM
Base Address, is 0003_0000h. The top 512 bytes (0003_FFFFh
down to 0003_FE00h) contain a fill-down SMM state-save area.
The default entry point for the SMM service routine is at
0003_8000h.
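The default addresses above all follow from the SMM Base Address plus fixed offsets. The Python sketch below derives them; it assumes, as a labeled assumption not confirmed by this section, that the same offsets (entry point at base + 8000h, state-save area in the top 512 bytes of the 64-Kbyte area) still apply after the base is relocated. The function and key names are hypothetical.

```python
DEFAULT_SMM_BASE = 0x0003_0000   # default SMM Base Address from the text
ENTRY_POINT_OFFSET = 0x8000      # default entry point 0003_8000h minus the base
AREA_SIZE = 0x1_0000             # 64-Kbyte SMM memory area
STATE_SAVE_SIZE = 512            # fill-down state-save area at the top


def smm_layout(base: int = DEFAULT_SMM_BASE) -> dict:
    """Derive SMM addresses from a base (offsets assumed fixed relative to it)."""
    top = base + AREA_SIZE - 1
    return {
        "entry_point": base + ENTRY_POINT_OFFSET,
        "state_save_top": top,
        "state_save_bottom": top - STATE_SAVE_SIZE + 1,
    }


if __name__ == "__main__":
    for name, addr in smm_layout().items():
        print(f"{name}: {addr:08X}h")
```

With the default base, the derived values reproduce the addresses quoted in the text: 0003_8000h, 0003_FFFFh, and 0003_FE00h.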
Table 6-2 shows the offsets in the SMM state-save area relative
to the SMM base address. The SMM service routine can alter
any of the read/write values in the state-save area. The con-
tents of any reserved locations in the state-save area are not
necessarily the same between the AMD-K5 processor and the
Pentium or 486 processors.
begin
{
    if SMI Handler is to be Relocated then
    {
        set SMM Base Address (offset FEF8h) to new value
        resume
    }
    else
    {
        SMM execution to begin at relocation area.
        resume
    }
}
end
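The relocation step in the pseudocode amounts to one store into the state-save area before the handler resumes. The Python sketch below models that store on an in-memory image of the 64-Kbyte SMM area. It is an illustration only: `smram` and `relocate_smm_base` are hypothetical names, the slot is assumed to be a 32-bit little-endian value, and any alignment requirement on the new base is not modeled because this section does not state one.

```python
import struct

SMM_BASE_SLOT_OFFSET = 0xFEF8  # offset of the SMM Base Address slot, from the pseudocode


def relocate_smm_base(smram: bytearray, new_base: int) -> None:
    """Write a new SMM Base Address into the state-save image.

    smram models the 64-Kbyte SMM area, indexed by offset from the current
    base. The 32-bit little-endian slot width is an assumption.
    """
    struct.pack_into("<I", smram, SMM_BASE_SLOT_OFFSET, new_base)


if __name__ == "__main__":
    image = bytearray(0x10000)
    relocate_smm_base(image, 0x000A_0000)  # example value only
    print(image[SMM_BASE_SLOT_OFFSET:SMM_BASE_SLOT_OFFSET + 4].hex())
```

In real firmware the store would be followed by RSM, corresponding to the "resume" step in the pseudocode.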
Before return from SMM, the halt restart slot can be written
as:
■ Bits 15–1—Undefined
■ Bit 0—Point of return from SMM
1 = return to Halt state
0 = return to state specified by SMM state-save area
The fields of the halt restart slot are the same as in the Pen-
tium processor auto halt restart slot. During entry into and exit
from SMM, the processor writes or reads only bit 0 of the 16-bit
value although the entire 16 bits can be read or written by the
service routine. The Pentium-compatible pseudo-code for
implementing the halt restart slot in BIOS is as follows:
begin
{
    if return to Halt state then
    {
        if SMI# during Halt state then
            set halt restart slot to 00FFh
    }
}
end
If the return takes the processor back to the Halt state, the
HLT instruction is not refetched, but the Halt special bus cycle
is driven on the bus after the return.
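Because the processor reads only bit 0 of the halt restart slot on exit from SMM, a service routine can treat the slot as a one-bit flag inside a 16-bit value. The Python sketch below shows that bit handling; the function names are hypothetical, and the slot's offset in the state-save area is deliberately left out because it is not given in this section.

```python
HALT_RESTART_BIT = 0x0001  # bit 0: 1 = return to Halt state


def will_return_to_halt(slot: int) -> bool:
    """True if bit 0 of the 16-bit halt restart slot is set."""
    return bool(slot & HALT_RESTART_BIT)


def request_halt_return(slot: int, to_halt: bool) -> int:
    """Return the slot value with bit 0 set or cleared; bits 15-1 are kept."""
    if to_halt:
        return (slot | HALT_RESTART_BIT) & 0xFFFF
    return slot & ~HALT_RESTART_BIT & 0xFFFF
```

The pseudocode above writes 00FFh rather than setting bit 0 alone; both values have bit 0 set, so both select a return to the Halt state.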
The I/O trap dword is related to the I/O trap restart slot,
described below. Bit 1 of the I/O trap dword (the valid bit)
should be tested if the I/O trap restart slot is to be changed.
The fields of the I/O trap restart slot are configured as follows:
■ Bits 31–16—reserved
■ Bits 15–0—I/O instruction restart on return from SMM:
0000h = execute the next instruction after the trapped I/O
instruction
00FFh = re-execute the trapped I/O instruction
The processor initializes the I/O trap restart slot to 0000h upon
entry into SMM. If SMM was entered due to a trapped I/O
instruction, the processor indicates the validity of the I/O
instruction by setting or clearing bit 1 of the I/O trap dword at
offset FFA4h in the SMM state-save area, as described in
Section 6.3.6. The SMM service routine should test bit 1 of the I/O
trap dword to determine the validity of the I/O instruction
before writing the I/O trap restart slot. If the I/O instruction
was valid, the SMM service routine can safely rewrite the I/O
trap restart slot with the value 00FFh, which causes the proces-
sor to re-execute the trapped I/O instruction when the RSM
instruction is executed. If the I/O instruction was invalid, writ-
ing the I/O trap restart slot has undefined results. If sequential
SMI interrupts occur, the second entry into SMM will never
have bit 1 of the I/O trap dword set, and the second SMM ser-
vice routine should not rewrite the I/O trap restart slot.
begin
{
    if I/O instruction needs to be restarted then
    {
        if valid I/O instruction (test offset FFA4h) then
            set I/O restart slot (offset FF00h) to 00FFh
    }
}
end
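The pseudocode tests the valid bit before touching the restart slot, because writing the slot after an invalid I/O trap has undefined results. The Python sketch below models that guard on an in-memory image of the state-save area. The names `smram` and `request_io_restart` are hypothetical, and little-endian storage of the dword and of the 16-bit restart value is assumed.

```python
import struct

IO_TRAP_DWORD_OFFSET = 0xFFA4    # I/O trap dword, from the text
IO_RESTART_SLOT_OFFSET = 0xFF00  # I/O trap restart slot, from the pseudocode
IO_TRAP_VALID = 1 << 1           # bit 1 of the I/O trap dword (the valid bit)
REEXECUTE = 0x00FF               # re-execute the trapped I/O instruction


def request_io_restart(smram: bytearray) -> bool:
    """Request re-execution of the trapped I/O instruction if it was valid.

    Returns True if the restart slot was written, False if the valid bit
    was clear and the slot was left untouched.
    """
    (trap_dword,) = struct.unpack_from("<I", smram, IO_TRAP_DWORD_OFFSET)
    if not trap_dword & IO_TRAP_VALID:
        return False
    # Write only bits 15-0; bits 31-16 of the slot are reserved.
    struct.pack_into("<H", smram, IO_RESTART_SLOT_OFFSET, REEXECUTE)
    return True
```

A handler entered by a second, sequential SMI would see the valid bit clear and so would leave the slot alone, as the text requires.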
Table 5-2 on page 5-8 and Table 5-3 on page 5-16 summarize the
behavior of all interrupts in SMM.
Within the Halt state, the processor disables the majority of its
internal clock distribution and (if STPCLK is asserted) the
internal pullup resistor on STPCLK. However, its phase-lock
loop still runs, its key internal logic is still clocked, most of its
inputs and outputs retain their last state (except D63–D0 and
DP7–DP0 which are floated), and it still responds to input sig-
nals.
[Figure: clock-control state diagram. Labeled transitions: STPCLK asserted, STPCLK negated, CLK stopped, CLK started. Labeled state: Stop-Clock state.]
Within the Stop-Grant state (as in the Halt state), the majority
of the processor’s internal clock distribution and all internal
pullup resistors are disabled. However, its phase-lock loop still
runs, its key internal logic is still clocked, most of its inputs
and outputs retain their last state (except D63–D0 and DP7–
DP0 which are floated), and it still responds to input signals.
[Figure: power-up timing. Signals shown: VCC (at operating voltage), PWRGOOD, RESET, CLK.]
http://www.amd.com/
7 Test and Debug
The sections that follow provide details on each of the test and
debug features.
[Figure: Hardware Configuration Register (HWCR) layout. Bits 31–8: Reserved. Fields in bits 7–0, from high to low: DDC, DIC, DBP, DC, DSPC.]
of the TAP BIST, the result remains in the BIST result register
for shifting out through the TDO signal. The TRST signal must
be asserted or the TAP instruction must be changed in order to
exit TAP BIST and return to normal operation.
[Figure: Array Access Register (MSR 82h). Array pointer: bits 31–0, contents of EDX. Array data: bits 31–0, contents of EAX.]
Bits 7–0 of every array pointer encode the array ID, which
identifies the array to be accessed, as shown in Table 7-3. To
simplify multiple accesses to an array, the contents of EDX are
retained after the RDMSR instruction executes (EDX is
normally cleared after a RDMSR instruction).
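Since the array ID always occupies bits 7–0 of the pointer, test software can read or replace it with simple masking. The Python sketch below shows only that masking; the function names are hypothetical, and the remaining pointer fields are left untouched because their positions vary by array.

```python
ARRAY_ID_MASK = 0x0000_00FF  # bits 7-0 of the array pointer


def array_id(pointer: int) -> int:
    """Extract the array ID from a 32-bit array pointer."""
    return pointer & ARRAY_ID_MASK


def with_array_id(pointer: int, new_id: int) -> int:
    """Return the pointer with its array ID replaced; other bits are kept."""
    return (pointer & 0xFFFF_FF00) | (new_id & ARRAY_ID_MASK)
```

The example values used to exercise these helpers are arbitrary bit patterns, not documented pointer encodings.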
[Figures: array pointer and array data formats. In every pointer format the array ID occupies bits 7–0 and unused bits are 0. Formats shown, in order:
Array IDs E1h, ECh: pointer fields Way and Set; data formats with valid bits 27–0 and valid bits 22–0.
Array ID E0h: pointer fields Way, Set, and Dword; data format with valid bits 31–0 (data).
Array IDs E5h, EDh, E6h, E7h: pointer fields Way and Set; data formats with valid bits 19–0, 20–0, 18–0, and 18–0.
Array ID E4h: pointer fields Way, Set, and Opcode Bytes; data format with valid bits 25–0.
Array IDs E8h, E9h: pointer fields Way and Set; data formats with valid bits 21–0 and 19–0.
Array IDs EAh, EBh: pointer field Entry; data formats with valid bits 11–0 and 14–0.]
Branch tracing is enabled by writing bits 3–1 with 001b and set-
ting bit 5 to 1 in the Hardware Configuration Register
(HWCR), as described in Section 7.1 on page 7-3. When thus
enabled, the processor drives two branch-trace message spe-
cial bus cycles immediately after each taken branch instruc-
tion is executed. Both special bus cycles have a BE7–BE0
encoding of DFh (1101_1111b). The first special bus cycle iden-
tifies the branch source, the second identifies the branch tar-
get. The contents of the address and data bus during these
special bus cycles are shown in Table 7-4.
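The enabling sequence and the bus-side signature above can both be expressed as small bit operations. The Python sketch below computes the HWCR value that enables branch tracing and recognizes branch-trace message cycles by their byte enables. It is a sketch under stated assumptions: the function names are hypothetical, the HWCR is treated as a plain 32-bit integer, and accessing the register itself (for example with WRMSR) is outside the scope of the example.

```python
HWCR_BITS_3_1_MASK = 0b111 << 1     # bits 3-1 of HWCR
HWCR_BITS_3_1_TRACE = 0b001 << 1    # value 001b in bits 3-1
HWCR_BIT_5 = 1 << 5                 # bit 5, set to 1 for branch tracing
BRANCH_TRACE_BE = 0xDF              # BE7-BE0 encoding of both message cycles


def enable_branch_trace(hwcr: int) -> int:
    """Return the HWCR value with bits 3-1 = 001b and bit 5 = 1."""
    cleared = hwcr & ~HWCR_BITS_3_1_MASK & 0xFFFF_FFFF
    return cleared | HWCR_BITS_3_1_TRACE | HWCR_BIT_5


def is_branch_trace_message(byte_enables: int) -> bool:
    """True if BE7-BE0 match the branch-trace message encoding."""
    return byte_enables == BRANCH_TRACE_BE


def pair_trace_cycles(cycles: list) -> list:
    """Group captured message cycles into (source, target) pairs, in order."""
    return list(zip(cycles[0::2], cycles[1::2]))
```

System logic that accumulates these pairs obtains the source and target of each taken branch, which is the raw material for debugging or profiling.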
Appendix A
Compatibility With the
Pentium and 486 Processors
For updates to the Busy bit in the TSS descriptor, the AMD-K5
processor behaves in the manner described for updates to the
Accessed bit. The Pentium processor does not perform the
unlocked read to get the descriptor.
This is the same set of output pins that have selectable drive
strengths on the Pentium processor. However, the Pentium
processor supports three drive strengths on these pins while
the AMD-K5 processor supports two.
A.3.2 BOFF Asserted before Snoop to Linefill Buffer and after the
Cacheability of the Line is Established
A snoop to the linefill buffer occurs during a linefill when the
address of the snoop matches the address of the linefill. If
BOFF is asserted after the cacheability of the line is deter-
mined via the KEN pin being sampled active (with the asser-
tion of NA or BRDY, whichever comes first) and a snoop to the
linefill buffer occurs with either BOFF or AHOLD or both
asserted, the Pentium processor treats the snoop as a hit,
whereas the AMD-K5 processor may or may not treat it as a hit.
For DCACHE linefills, the AMD-K5 processor treats the snoop
Comments
In treating the snoop as a hit, the AMD-K5 and Pentium
processors assert the HIT pin and also cache the line as either
shared or invalid, depending on the state of the INV pin. The
cycle restarts after the deassertion of BOFF and AHOLD.
On the AMD-K5 processor, if the I/O Restart Flag in the SMM
save area is set when the RSM instruction executes, the debug
trap is cancelled and is redetected when the I/O instruction
is re-executed.
A.6 Exceptions
A.7 Debug