BGP Troubleshooting
Presentation_ID © 2005 Cisco Systems, Inc. All rights reserved. Cisco Confidential 1
Presentation Slides
• Available on
ftp://ftp-eng.cisco.com/pfs/seminars/NANOG36-BGP-Troubleshooting.pdf
And on the NANOG 36 meeting pages at
http://www.nanog.org/mtg-0602/pdf/smith.pdf
• Fundamentals of Troubleshooting
• Local Configuration Problems
• Internet Reachability Problems
Human error
Typos, using wrong commands, accidents, poorly planned
maintenance activities
Interoperability issues
Differences in interpretation of RFC1771 and its
developments
• Fundamentals
• Local Configuration Problems
• Internet Reachability Problems
• Peer Establishment
• Missing Routes
• Inconsistent Route Selection
• Loops and Convergence Issues
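[Figure: R1 (1.1.1.1) and R2 (2.2.2.2) in AS 1 peer over iBGP; R2 peers over eBGP with R3 (3.3.3.3) in AS 2. Both sessions are marked with a question mark.]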
R2#sh run | begin ^router bgp
router bgp 1
bgp log-neighbor-changes
neighbor 1.1.1.1 remote-as 1
neighbor 3.3.3.3 remote-as 2
"router bgp 1" sets the local AS
The router listens for TCP connections on port 179 from the
configured peering addresses only!
R2#
router bgp 1
neighbor 1.1.1.1 remote-as 1
neighbor 1.1.1.1 update-source Loopback0
neighbor 3.3.3.3 remote-as 2
neighbor 3.3.3.3 update-source Loopback0
• Common problem:
iBGP is run between loopback interfaces on the routers (for
stability), but the update-source configuration is missing from
the router ⇒ iBGP fails to establish
Remember that the source address is the IP address of the
outgoing interface unless otherwise specified
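The source-address rule above can be illustrated with a small Python model. This is only a sketch: R1's neighbor configuration and the address of R2's physical interface (10.0.12.2) are assumptions made for illustration, not values taken from the slides.

```python
def session_accepted(configured_neighbors, source_ip):
    """Model of the listener: accept a TCP/179 connection only if its
    source address is one of the configured neighbor addresses."""
    return source_ip in configured_neighbors


# Assumption: R1 is configured with "neighbor 2.2.2.2 remote-as 1".
r1_neighbors = {"2.2.2.2"}

# Hypothetical address of R2's physical interface facing R1.
r2_outgoing_interface = "10.0.12.2"
r2_loopback = "2.2.2.2"

# Without update-source, R2 sources the session from the outgoing interface.
print(session_accepted(r1_neighbors, r2_outgoing_interface))
# With "update-source Loopback0", R2 sources it from 2.2.2.2.
print(session_accepted(r1_neighbors, r2_loopback))
```

The model ignores everything except the address match, which is the part that goes wrong when update-source is missing.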
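[Figure: R1 (1.1.1.1) and R2 (2.2.2.2) in AS 1 peer over iBGP; the eBGP session between R2 and R3 (3.3.3.3) in AS 2 is still marked with a question mark.]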
• R1 is established now
• The eBGP session is still having trouble!
R2#ping 3.3.3.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 3.3.3.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/4/8 ms
R2#ping ip
Target IP address: 3.3.3.3
Extended commands [n]: y
Source address or interface: 2.2.2.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 3.3.3.3, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
R2#
router bgp 1
neighbor 3.3.3.3 remote-as 2
neighbor 3.3.3.3 ebgp-multihop 2
neighbor 3.3.3.3 update-source Loopback0
R2#
router bgp 1
neighbor 3.3.3.3 remote-as 2
neighbor 3.3.3.3 ebgp-multihop 2
neighbor 3.3.3.3 update-source Loopback0
neighbor 3.3.3.3 password 7 05080F1C221C
R3#
router bgp 2
neighbor 2.2.2.2 remote-as 1
neighbor 2.2.2.2 ebgp-multihop 2
neighbor 2.2.2.2 update-source Loopback0
• Check configuration on R3
Password is missing from the eBGP configuration
• Fix the R3 configuration
Peering should now come up!
But it does not
R2#
%TCP-6-BADAUTH: Invalid MD5 digest from 3.3.3.3:11024 to 2.2.2.2:179
%TCP-6-BADAUTH: Invalid MD5 digest from 3.3.3.3:11024 to 2.2.2.2:179
%TCP-6-BADAUTH: Invalid MD5 digest from 3.3.3.3:11024 to 2.2.2.2:179
• Common problems:
Missing password – needs to be on both ends
Cut and paste errors – don’t!
Typographical & transcription errors
Capitalisation, extra characters, white space…
• Common solutions:
Check for symptoms/messages in the logs
Re-enter passwords using keyboard, from scratch – don’t cut&paste
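When both ends are believed to hold the same password, a quick comparison of the cleartext strings can point at the kind of mistake listed above. The helper below is a hypothetical sketch in Python; it works on cleartext strings the operator already has, not on the "password 7" form stored in the configuration.

```python
def password_mismatch_hints(local, remote):
    """Return a list of likely reasons why two passwords differ.

    An empty list means the strings are identical."""
    if local == remote:
        return []
    hints = []
    if local.strip() == remote.strip():
        hints.append("leading or trailing white space")
    if local.lower() == remote.lower():
        hints.append("capitalisation")
    if not hints:
        hints.append("different characters")
    return hints


print(password_mismatch_hints("s3cret", "s3cret "))
print(password_mismatch_hints("Secret", "secret"))
```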
[Figure: R1 (AS 1) and R2 (AS 2) peer over eBGP across a Layer 2 network.]
R2#
%BGP-5-ADJCHANGE: neighbor 1.1.1.1 Down BGP Notification sent
%BGP-3-NOTIFICATION: sent to neighbor 1.1.1.1 4/0 (hold time expired) 0 bytes
R2#show ip bgp neighbor 1.1.1.1 | include Last reset
Last reset 00:01:02, due to BGP Notification sent, hold time expired
R1#ping ip
Target IP address: 2.2.2.2
Repeat count [5]:
Datagram size [100]: 1500
Timeout in seconds [2]:
Extended commands [n]:
Sweep range of sizes [n]:
Type escape sequence to abort.
Sending 5, 1500-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
• Diagnosis
Keepalives get lost because they get stuck in the router’s queue
behind BGP update packets.
BGP update packets are packed to the size of the MTU, while keepalives
and BGP OPEN packets are small, so the session establishes and then
times out ⇒ Path MTU problems
Use ping with different packet sizes to confirm the above: 100-byte
ping succeeds, 1500-byte ping fails = MTU problem somewhere
• Solution
Pass the problem to the L2 folks – but be helpful, try and pinpoint
using ping where the problem might be in the network
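The "ping with different sizes" step can be narrowed down with a binary search. The Python sketch below assumes a caller-supplied probe function (hypothetical) that returns True when a ping of the given size succeeds; fake_probe stands in for a real ping and assumes a path that passes packets of up to 1400 bytes.

```python
def largest_passing_size(probe, low=100, high=1500):
    """Binary-search the largest packet size for which probe(size) is True.

    Assumes probe(low) succeeds and that every size below a passing
    size also passes."""
    if probe(high):
        return high
    # Invariant: probe(low) is True, probe(high) is False.
    while high - low > 1:
        mid = (low + high) // 2
        if probe(mid):
            low = mid
        else:
            high = mid
    return low


def fake_probe(size, path_limit=1400):
    """Stand-in for a real ping: succeeds up to an assumed path limit."""
    return size <= path_limit


print(largest_passing_size(fake_probe))
```

Running the same search hop by hop along the path is one way to pinpoint where the limit is imposed.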
[Figure: R1 (AS 1) and R2 (AS 2) peer over eBGP across a Layer 2 network; the diagram contrasts small packets and large packets crossing it.]
• Peer Establishment
• Missing Routes
• Inconsistent Route Selection
• Loops and Convergence Issues
• Route Origination
• UPDATE Exchange
• Filtering
• iBGP mesh problems
• Network statement
R1# show run | include 200.200.0.0
network 200.200.0.0 mask 255.255.252.0
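A network statement is generally only acted on when the routing table holds a route with exactly that prefix and mask. The Python sketch below converts the statement shown above into prefix form and checks for an exact match; the routing table contents are hypothetical.

```python
import ipaddress


def network_statement_prefix(network, mask):
    """Convert 'network X mask Y' into an ip_network object."""
    return ipaddress.ip_network(f"{network}/{mask}")


def will_originate(network, mask, routing_table):
    """True if the routing table holds exactly the prefix in the statement."""
    wanted = network_statement_prefix(network, mask)
    return wanted in {ipaddress.ip_network(p) for p in routing_table}


print(network_statement_prefix("200.200.0.0", "255.255.252.0"))
# Hypothetical routing table with only a covering route and a more specific.
print(will_originate("200.200.0.0", "255.255.252.0",
                     ["200.200.0.0/16", "200.200.0.0/24"]))
```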
• Route Origination
• UPDATE Exchange
• Filtering
• iBGP mesh problems
• Common issues
Clashing router IDs
Clashing cluster IDs
• Two RR clusters
• R1 is a RR for R3
• R2 is a RR for R4
• R4 is advertising 7.0.0.0/8
• R2 has the route but R1 and R3 do not?
• Time to debug!!
access-list 100 permit ip host 7.0.0.0 host 255.0.0.0
R1# debug ip bgp update 100
• Tell R2 to resend his UPDATEs
R2# clear ip bgp 1.1.1.1 out
• R1 shows us something interesting
*Mar 1 21:50:12.410: BGP(0): 2.2.2.2 rcv UPDATE w/ attr:
nexthop 4.4.4.4, origin i, localpref 100, metric 0, originator
100.1.1.1, clusterlist 2.2.2.2, path , community , extended
community
*Mar 1 21:50:12.410: BGP(0): 2.2.2.2 rcv UPDATE about 7.0.0.0/8
-- DENIED due to: ORIGINATOR is us;
• Solution
Do NOT set the router ID by hand unless you have a very
good reason to do so and have a very good plan for
deployment
Router-ID is usually calculated automatically by router
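The "ORIGINATOR is us" denial comes from the loop-prevention checks a router applies to reflected routes. The Python sketch below models the two checks described in RFC 4456; the function name and return values are illustrative, and the cluster ID is assumed to default to the router ID.

```python
def accept_reflected_route(own_router_id, own_cluster_id,
                           originator_id, cluster_list):
    """Apply route-reflection loop checks to a received route.

    Returns (accepted, reason)."""
    if originator_id == own_router_id:
        return False, "ORIGINATOR is us"
    if own_cluster_id in cluster_list:
        return False, "own cluster ID in CLUSTER_LIST"
    return True, "accepted"


# Values from the debug output: originator 100.1.1.1, clusterlist 2.2.2.2.
# The denial implies the receiving router also uses router ID 100.1.1.1.
print(accept_reflected_route("100.1.1.1", "100.1.1.1",
                             "100.1.1.1", ["2.2.2.2"]))
```

The second check is the one that bites when cluster IDs clash: a route that has already passed through a reflector with the same cluster ID is dropped.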
• One RR cluster
• R1 and R2 are RRs
• R3 and R4 are RRCs
• R4 is advertising 7.0.0.0/8
R2 has it
R1 and R3 do not
• Time to debug!!
access-list 100 permit ip host 7.0.0.0 host 255.0.0.0
R1# debug ip bgp update 100
• Route Origination
• UPDATE Exchange
• Filtering
• iBGP mesh problems
• Type of filters
Prefix filters
AS_PATH filters
Community filters
Route-maps
R2#sh ip as-path 1
AS path access list 1
permit ^$
Nothing matches again! Let’s use the up arrow key to see where the
cursor stops
R1#show access-list 99
Standard IP access list 99
permit 10.0.0.0
R1#
4d00h: BGP(0): 2.2.2.2 rcvd UPDATE w/ attr: nexthop 2.2.2.2,
origin i, metric 0, path 12
4d00h: BGP(0): 2.2.2.2 rcvd 10.0.0.0/8 -- DENIED due to: route-map;
• Should be
Extended IP access list 100
permit ip host 10.0.0.0 host 255.0.0.0
• Use prefix-list instead, more difficult to make a mistake
ip prefix-list my_filter permit 10.0.0.0/8
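Both filters above are meant to match exactly one prefix, 10.0.0.0/8. The Python sketch below models that matching with the standard ipaddress module; the function names are hypothetical, and the prefix-list model covers only an entry without ge/le modifiers.

```python
import ipaddress


def prefix_list_permits(entry, route):
    """A prefix-list entry without ge/le matches only the identical prefix."""
    return ipaddress.ip_network(route) == ipaddress.ip_network(entry)


def acl_host_host_permits(acl_network, acl_mask, route):
    """Model 'permit ip host <network> host <mask>' used as a route filter:
    the route's network address and netmask must both match exactly."""
    net = ipaddress.ip_network(route)
    return (str(net.network_address) == acl_network
            and str(net.netmask) == acl_mask)


for route in ("10.0.0.0/8", "10.0.0.0/16", "10.1.0.0/16"):
    print(route,
          prefix_list_permits("10.0.0.0/8", route),
          acl_host_host_permits("10.0.0.0", "255.0.0.0", route))
```

The prefix-list form states the prefix length directly, which is why it leaves less room for a mask mistake.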
• Configuration verified on R2
• No filters blocking announcement on R2
• So what’s wrong?
• Watch route-maps
Route-map rules often catch out operators when they are
used for filtering
Absence of an appropriate match means the prefix will be
discarded
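The discard-on-no-match behaviour can be modelled in a few lines of Python. This is a sketch only: a route-map is reduced to an ordered list of (action, match function) clauses, which is an illustrative structure and not how a router stores it.

```python
def apply_route_map(clauses, prefix):
    """Evaluate clauses in order; the first matching clause decides.

    If nothing matches, the prefix is discarded (implicit deny)."""
    for action, matches in clauses:
        if matches(prefix):
            return action == "permit"
    return False


# One clause that permits only 10.0.0.0/8.
filter_only = [("permit", lambda p: p == "10.0.0.0/8")]
# Same clause followed by a catch-all permit.
with_catch_all = filter_only + [("permit", lambda p: True)]

print(apply_route_map(filter_only, "20.0.0.0/8"))
print(apply_route_map(with_catch_all, "20.0.0.0/8"))
```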
• Route Origination
• UPDATE Exchange
• Filtering
• iBGP mesh problems
[Figure: topology with R1 (1.1.1.1), R2 (2.2.2.2), R3 (3.3.3.3), R4 (4.4.4.4) and R5; iBGP sessions within AS 1 and eBGP sessions to neighbouring ASes, including AS 3.]
• Looking at R3
R3#show ip bgp
Status codes: * valid, > best, i - internal,
Network Next Hop Metric LocPrf Weight Path
*> 3.0.0.0 10.10.10.10 0 2 5 i
*> 4.0.0.0 10.10.10.10 0 2 5 i
*> 10.10.0.0/24 10.10.10.10 0 2 i
*> 10.20.0.0/16 10.10.10.10 0 2 i
• Looking at R4
R4#show ip bgp
Network Next Hop Metric LocPrf Weight Path
* i3.0.0.0 10.10.10.10 100 0 2 5 i
* i4.0.0.0 10.10.10.10 100 0 2 5 i
* i10.10.0.0/24 10.10.10.10 100 0 2 i
* i10.20.0.0/16 10.10.10.10 100 0 2 i
R4#show ip bgp
Network Next Hop Metric LocPrf Weight Path
*>i3.0.0.0 3.3.3.3 100 0 2 5 i
*>i4.0.0.0 3.3.3.3 100 0 2 5 i
*>i10.10.0.0/24 3.3.3.3 100 0 2 i
*>i10.20.0.0/16 3.3.3.3 100 0 2 i
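In the first R4 output the routes carry next hop 10.10.10.10 and none is marked best; in the second they carry next hop 3.3.3.3 and all are best. That pattern is consistent with a next-hop reachability problem. The Python sketch below shows the check involved; the contents of R4's IGP table are an assumption made for illustration.

```python
import ipaddress


def next_hop_reachable(next_hop, igp_prefixes):
    """True if some IGP prefix covers the BGP next hop."""
    addr = ipaddress.ip_address(next_hop)
    return any(addr in ipaddress.ip_network(p) for p in igp_prefixes)


# Assumption: R4's IGP carries the loopbacks of AS 1 routers but not the
# external link that 10.10.10.10 sits on.
r4_igp = ["1.1.1.1/32", "2.2.2.2/32", "3.3.3.3/32"]

print(next_hop_reachable("10.10.10.10", r4_igp))
print(next_hop_reachable("3.3.3.3", r4_igp))
```

A route whose next hop fails this check cannot be selected as best, which matches the missing ">" in the first output.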
• Customer is connected to R1
Can’t see AS2 ⇒ R3 is somehow not passing routing
information about AS2 to R1
Can’t see R5 ⇒ R5 is somehow not passing routing
information about sites connected to R5
But can see rest of the Internet ⇒ his prefix is being
announced to some places, so not an iBGP origination
problem
• Peer Establishment
• Missing Routes
• Inconsistent Route Selection
• Loops and Convergence Issues
[Figure: AS 10 originates 10.0.0.0/8. RouterA in AS 2 learns it via R3 (MED 30), R2 (MED 20) and R1 (MED 0), through AS 3 and AS 1.]
• RouterA will have three paths
• MEDs from AS 3 will not be compared with MEDs from AS 1
• RouterA will sometimes select the path from R1 as best
and may also select the path from R3 as best
NANOG 36 © 2006 Cisco Systems, Inc. All rights reserved. 116
Inconsistent Route Selection:
Example I
• Initial State
Path 1 beats Path 2 — Lower MED
Path 3 beats Path 1 — Lower Router-ID
• RouterA will have three paths
• RouterA will consistently select the path from R1 as best!
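The difference between order-dependent and consistent selection can be reproduced with a reduced model in Python. This is a sketch under stated assumptions: only three tie-breakers are modelled (MED within the same neighbour AS, external over internal, lower router ID), and the internal/external flags and router IDs are hypothetical values chosen to match the behaviour described. Grouping paths by neighbour AS before comparing is what the IOS command bgp deterministic-med is generally understood to do.

```python
from collections import namedtuple
import ipaddress

Path = namedtuple("Path", "name neighbor_as med external router_id")


def better(a, b):
    """Return the preferred of two paths using a reduced tie-break list."""
    # MED is only compared between paths from the same neighbour AS.
    if a.neighbor_as == b.neighbor_as and a.med != b.med:
        return a if a.med < b.med else b
    if a.external != b.external:
        return a if a.external else b
    rid_a = ipaddress.ip_address(a.router_id)
    rid_b = ipaddress.ip_address(b.router_id)
    return a if rid_a <= rid_b else b


def select_in_order(paths):
    """Pairwise comparison in stored order: result depends on the order."""
    best = paths[0]
    for p in paths[1:]:
        best = better(best, p)
    return best


def select_deterministic(paths):
    """Pick a winner per neighbour AS first, then compare the winners."""
    per_as = {}
    for p in paths:
        cur = per_as.get(p.neighbor_as)
        per_as[p.neighbor_as] = p if cur is None else better(cur, p)
    return select_in_order(sorted(per_as.values(), key=lambda p: p.name))


# Hypothetical attributes: only the MEDs and AS numbers come from the slide.
R1 = Path("R1", 1, 0, False, "1.1.1.1")
R2 = Path("R2", 3, 20, False, "2.2.2.2")
R3 = Path("R3", 3, 30, True, "3.3.3.3")

print(select_in_order([R2, R3, R1]).name)
print(select_in_order([R1, R2, R3]).name)
print(select_deterministic([R1, R2, R3]).name)
```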
• Summary:
RFC1771 wasn’t perfect when it came to path selection;
years of operational experience have shown this
Vendors and ISPs have worked to put in stability
enhancements, now reflected in RFC4271
But these can lead to interesting problems
And of course some defaults linger much longer than they
ought to — so never assume that an out of the box default
configuration will be perfect for your network
• Peer Establishment
• Missing Routes
• Inconsistent Route Selection
• Loops and Convergence Issues
[Figure: R1, R2 and R3 in AS 3, with external neighbours AS 4 and AS 12; 142.108.10.2 is the next-hop for routes learned via AS 12.]
• R3 prefers routes via AS 4 one minute
• BGP scanner runs then R3 prefers routes via AS 12
• The entire table oscillates every 60 seconds
• Watch for:
Table version number incrementing rapidly
Number of networks/paths or external/internal
routes changing
R3#sh debug
BGP events debugging is on
BGP updates debugging is on
IP routing debugging is on
R3#
BGP: scanning routing tables
BGP: nettable_walker 142.108.0.0/16 calling revise_route
RT: del 142.108.0.0 via 142.108.10.2, bgp metric [200/0]
BGP: revise route installing 142.108.0.0/16 -> 1.1.1.1
RT: add 142.108.0.0/16 via 1.1.1.1, bgp metric [200/0]
RT: del 156.1.0.0 via 142.108.10.2, bgp metric [200/0]
BGP: revise route installing 156.1.0.0/16 -> 1.1.1.1
RT: add 156.1.0.0/16 via 1.1.1.1, bgp metric [200/0]
R3#
BGP: scanning routing tables
BGP: ip nettable_walker 142.108.0.0/16 calling revise_route
RT: del 142.108.0.0 via 1.1.1.1, bgp metric [200/0]
BGP: revise route installing 142.108.0.0/16 -> 142.108.10.2
RT: add 142.108.0.0/16 via 142.108.10.2, bgp metric [200/0]
BGP: nettable_walker 156.1.0.0/16 calling revise_route
RT: del 156.1.0.0 via 1.1.1.1, bgp metric [200/0]
BGP: revise route installing 156.1.0.0/16 -> 142.108.10.2
RT: add 156.1.0.0/16 via 142.108.10.2, bgp metric [200/0]
• R3 naturally prefers routes from AS 12
• R3 does not have an IGP route to 142.108.10.2 which is the next-hop for
routes learned via AS 12
• R3 learns 142.108.0.0/16 via AS 4 so 142.108.10.2 becomes reachable
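The root of the oscillation is a BGP next hop that is only reachable through a BGP-learned route. A check for that condition can be sketched in Python as below; the routing table entries and the "ospf" label are hypothetical, and only 142.108.10.2 and 142.108.0.0/16 come from the slides.

```python
import ipaddress


def resolving_route(next_hop, rib):
    """Longest-prefix match for next_hop in rib, a list of (prefix, protocol).

    Returns (network, protocol) or None if nothing covers the address."""
    addr = ipaddress.ip_address(next_hop)
    matches = [(ipaddress.ip_network(p), proto) for p, proto in rib
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)


def next_hop_depends_on_bgp(next_hop, rib):
    """True if the next hop resolves through a BGP-learned route."""
    match = resolving_route(next_hop, rib)
    return match is not None and match[1] == "bgp"


# Hypothetical routing table on R3 while the AS 4 path is installed.
rib = [("142.108.0.0/16", "bgp"), ("1.1.1.1/32", "ospf")]
print(next_hop_depends_on_bgp("142.108.10.2", rib))

# With an IGP route covering the next hop, the dependency disappears.
fixed = rib + [("142.108.10.0/24", "ospf")]
print(next_hop_depends_on_bgp("142.108.10.2", fixed))
```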
• R3 now has IGP route to AS 12 next-hop or R2 is using next-hop-self
• R3 now prefers routes via AS 12 all the time
• No more oscillation!!
Troubleshooting Tips
• Fundamentals
• Local Configuration Problems
• Internet Reachability Problems
[Figure: AS 1 (R1) announces 192.168.1.0/24; AS 2 (R2) sits between AS 1 and AS 3 (R3).]
• Checklist:
AS1 announces, but does AS2 see it?
• Checklist:
Does AS2 send it to AS3?
AS 1 AS 3
203.51.206.0
R1 R3
The Internet
• Checklist:
AS1 announces, but do its upstreams see it?
• Hmmm….
• Looking Glass can see 202.173.144.0/21
This includes 202.173.147.0/24
So the problem must be with AS3, or AS3’s upstream
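The claim that the /21 includes the /24 is easy to verify with Python's standard ipaddress module. The helper name below is illustrative.

```python
import ipaddress


def covers(aggregate, specific):
    """True if 'specific' falls entirely inside 'aggregate'."""
    return ipaddress.ip_network(specific).subnet_of(
        ipaddress.ip_network(aggregate))


print(covers("202.173.144.0/21", "202.173.147.0/24"))
```

The same check helps when reading looking glass output: a missing more-specific may still be reachable through a covering aggregate.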
• Checklist:
Does AS3’s upstream send it to AS3?
[Figure: AS 1 (R1) connects to AS 2 (R2) and AS 3 (R3), both of which connect to the Internet.]
• Checklist:
What does “trouble” mean?
• Is outbound traffic loadsharing okay?
Can usually fix this with selectively rejecting prefixes, and using
local preference
Generally easy to fix, local problem, simple application of policy
• Is inbound traffic loadsharing okay?
Errummm, bigger problem if not
Need to do some troubleshooting if configuration with communities,
AS-PATH prepends, MEDs and selective leaking of subprefixes don’t
seem to help
• Checklist:
AS1 announces, but does AS2 see it?
• Checklist:
Does AS2 send it to its upstream?
• Checklist:
Repeat all of the above for AS3
• Checklist:
Assume AS1 has done everything in this tutorial so far
• Not panicking
• Logical approach
Check configuration first
Check locally first before blaming the peer
Troubleshoot layer 1, then layer 2, then layer 3, etc