17EC64 - Module4 - Network Layer Protocol and Unicast Routing
17EC 64
PROTOCOLS AND UNICAST ROUTING
MODULE 4
In this chapter, we show how the network layer is implemented in the TCP/IP protocol suite.
The protocols in the network layer have gone through a few versions; in this chapter, we
concentrate on the current version v4.
Communication at the network layer is host-to-host (computer-to-computer); a computer somewhere in
the world needs to communicate with another computer somewhere else in the world through the
Internet.
The packet transmitted by the sending computer may pass through several LANs or WANs before
reaching the destination computer. A global addressing scheme called logical addressing is required for
this communication. The term IP address refers to the logical address in the network layer of the TCP/IP
protocol suite.
Communication at the network layer in the Internet is connectionless. If reliability is important,
IPv4 must be paired with a reliable transport-layer protocol such as TCP.
The network layer in version 4 can be thought of as one main protocol and three auxiliary ones as shown
in Figure 4.1.
• The main protocol, Internet Protocol version 4 (IPv4), is responsible for packetizing,
forwarding, and delivery of a packet at the network layer.
• The Internet Control Message Protocol version 4 (ICMPv4) helps IPv4 to handle some errors
that may occur in the network-layer delivery.
• The Internet Group Management Protocol (IGMP) is used to help IPv4 in multicasting.
• The Address Resolution Protocol (ARP) is used to glue the network and data-link layers in
mapping network-layer addresses to link-layer addresses.
Figure 4.1: Position of IP and other network-layer protocols in TCP/IP protocol suite
Datagram Format
The Internet Protocol version 4 (IPv4) is the delivery mechanism used by the TCP/IP protocols. Packets
used by the IP are called datagrams. A datagram is a variable-length packet consisting of two parts: header
and payload (data). The header is 20 to 60 bytes in length and contains information essential to routing and
delivery. It is customary in TCP/IP to show the header in 4-byte sections.
1. Version Number: The 4-bit version number (VER) field defines the version of the IPv4 protocol,
which, obviously, has the value of 4.
2. Header Length: The 4-bit header length (HLEN) field defines the total length of the datagram
header in 4-byte words. The IPv4 datagram has a variable-length header. When a device receives a
datagram, it needs to know when the header stops and the data, which is encapsulated in the packet,
starts. The total length is divided by 4 and the value is inserted in the field. The receiver needs to
multiply the value of this field by 4 to find the total length.
3. Service Type: In the original design of the IP header shown in Fig 4.3, this field was referred to
as type of service (TOS), which defined how the datagram should be handled. In the late 1990s,
IETF redefined the field to provide differentiated services (DiffServ).
4. Total Length: This 16-bit field defines the total length (header plus data) of the IP datagram in
bytes. A 16-bit number can define a total length of up to 65,535 (when all bits are 1s). However,
the size of the datagram is normally much less than this. This field helps the receiving device to know
when the packet has completely arrived. To find the length of the data coming from the upper
layer, subtract the header length from the total length. The header length can be found by
multiplying the value in the HLEN field by 4.
Note: The total length field defines the total length of the datagram including the header.
5. Identification, Flags, and Fragmentation Offset: These three fields are related to the
fragmentation of the IP datagram when the size of the datagram is larger than the underlying
network can carry.
6. Time-to-live: Due to some malfunctioning of routing protocols (discussed later) a datagram may
be circulating in the Internet, visiting some networks over and over without reaching the
destination. This may create extra traffic in the Internet. The time-to-live (TTL) field is used to
control the maximum number of hops (routers) visited by the datagram. When a source host sends
the datagram, it stores a number in this field. This value is approximately two times the maximum
number of routers between any two hosts. Each router that processes the datagram decrements
this number by one. If this value, after being decremented, is zero, the router discards the datagram.
7. Protocol: In TCP/IP, the data section of a packet, called the payload, carries the whole packet
from another protocol. A datagram, for example, can carry a packet belonging to any transport-
layer protocol such as UDP or TCP. A datagram can also carry a packet from other protocols that
directly use the service of the IP, such as some routing protocols or some auxiliary protocols. The
Internet authority has given any protocol that uses the service of IP a unique 8-bit number which
is inserted in the protocol field. When the payload is encapsulated in a datagram at the source IP,
the corresponding protocol number is inserted in this field; when the datagram arrives at the
destination, the value of this field helps to define to which protocol the payload should be delivered.
In other words, this field provides multiplexing at the source and demultiplexing at the destination.
Figure 4.4: Multiplexing and demultiplexing using the value of the protocol field
8. Header checksum: IP is not a reliable protocol; it does not check whether the payload carried by
a datagram is corrupted during the transmission. IP puts the burden of error checking of the
payload on the protocol that owns the payload, such as UDP or TCP.
The datagram header, however, is added by IP, and its error-checking is the responsibility of IP.
Errors in the IP header can be a disaster. For example, if the destination IP address is corrupted,
the packet can be delivered to the wrong host. If the protocol field is corrupted, the payload may
be delivered to the wrong protocol. If the fields related to the fragmentation are corrupted, the
datagram cannot be reassembled correctly at the destination, and so on. For these reasons, IP adds
a header checksum field to check the header, but not the payload. We need to remember that, since
the values of some fields, such as TTL and the fields related to fragmentation and options, may change
from router to router, the checksum needs to be recalculated at each router. The checksum in the
Internet normally uses a 16-bit field, which is the complement of the sum of the other fields calculated
using 1s complement arithmetic.
9. Source and Destination Addresses: These 32-bit source and destination address fields define the
IP address of the source and destination respectively. The source host should know its IP address.
The destination IP address is either known by the protocol that uses the service of IP or is provided
by the DNS. Note that the value of these fields must remain unchanged during the time the IP
datagram travels from the source host to the destination host.
10. Options: A datagram header can have up to 40 bytes of options. Options can be used for network
testing and debugging. Although options are not a required part of the IP header, option processing
is required of the IP software. This means that all implementations must be able to handle options
if they are present in the header. The existence of options in a header creates some burden on the
datagram handling; some options can be changed by routers, which forces each router to recalculate
the header checksum. There are one-byte and multi-byte options.
11. Payload: Payload, or data, is the main reason for creating a datagram. Payload is the packet coming
from other protocols that use the service of IP. Comparing a datagram to a postal package, payload
is the content of the package; the header is only the information written on the package.
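The field layout described above can be sketched in Python. This is a minimal illustration rather than a full IP implementation: the function name parse_ipv4_header, the dictionary keys, and the sample header bytes are assumptions made for this example, and options and checksum validation are left out.

```python
import ipaddress
import struct

# A few well-known protocol numbers (ICMP, IGMP, TCP, UDP).
PROTOCOL_NAMES = {1: "ICMP", 2: "IGMP", 6: "TCP", 17: "UDP"}


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are not parsed)."""
    if len(raw) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (ver_hlen, service, total_length, identification, flags_offset,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    header_bytes = (ver_hlen & 0x0F) * 4  # HLEN counts 4-byte words
    return {
        "version": ver_hlen >> 4,
        "header_bytes": header_bytes,
        "service_type": service,
        "total_length": total_length,
        "data_bytes": total_length - header_bytes,
        "identification": identification,
        "more_fragments": (flags_offset >> 13) & 1,  # M bit
        "fragment_offset": flags_offset & 0x1FFF,    # counted in 8-byte units
        "ttl": ttl,
        "protocol": PROTOCOL_NAMES.get(protocol, str(protocol)),
        "header_checksum": checksum,
        "source": str(ipaddress.IPv4Address(src)),
        "destination": str(ipaddress.IPv4Address(dst)),
    }


# Hypothetical header: no options, 40-byte datagram, TTL 64, TCP payload.
# The checksum field is left as zero because it is not validated here.
sample = bytes.fromhex("45000028" "00010000" "40060000" "c0a80101" "c0a80102")
fields = parse_ipv4_header(sample)
print(fields)
```

Reading the version and HLEN from the same byte, and the flags and offset from the same 16-bit word, mirrors the 4-byte-section view of the header used in this section.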
Example 1:
An IPv4 packet has arrived with the first 8 bits as shown: 01000010. The receiver discards the packet. Why?
There is an error in this packet. The 4 leftmost bits (0100) show the version, which is correct. The next 4
bits (0010) show an invalid header length (2 × 4 = 8). The minimum number of bytes in the header must
be 20. The packet has been corrupted in transmission.
Example 2
In an IPv4 packet, the value of HLEN is 1000 in binary. How many bytes of options are being carried by this
packet?
The HLEN value is 8, which means the total number of bytes in the header is 8 × 4, or 32 bytes. The first
20 bytes are the base header, the next 12 bytes are the options.
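Example 3
In an IPv4 packet, the value of HLEN is 5 and the value of the total length field is 40. How many bytes of data are
being carried by this packet?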
The HLEN value is 5, which means the total number of bytes in the header is 5 × 4, or 20 bytes (no
options). The total length is 40 bytes, which means the packet is carrying 20 bytes of data (40 − 20).
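The arithmetic in Examples 1 to 3 can be checked with a few lines of Python. This is only a sketch; the helper name header_lengths and its return layout are assumptions introduced for illustration.

```python
from typing import Optional, Tuple


def header_lengths(hlen: int, total_length: Optional[int] = None) -> Tuple[int, int, Optional[int]]:
    """Return (header bytes, option bytes, data bytes) for a given HLEN value."""
    if not 5 <= hlen <= 15:
        # Fewer than 5 words would mean a header shorter than 20 bytes.
        raise ValueError(f"invalid HLEN {hlen}: header would be {hlen * 4} bytes")
    header_bytes = hlen * 4
    option_bytes = header_bytes - 20
    data_bytes = None if total_length is None else total_length - header_bytes
    return header_bytes, option_bytes, data_bytes


# Example 1: first byte 01000010 gives version 4 and HLEN 2, which is invalid.
first_byte = 0b01000010
version, hlen = first_byte >> 4, first_byte & 0x0F
try:
    header_lengths(hlen)
    example1 = "accepted"
except ValueError:
    example1 = "discarded"

# Example 2: HLEN 1000 (binary) means a 32-byte header with 12 bytes of options.
example2 = header_lengths(0b1000)

# Example 3: HLEN 5 and total length 40 leave 20 bytes of data.
example3 = header_lengths(5, 40)

print(example1, example2, example3)
```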
Example 4
An IPv4 packet has arrived with the first few hexadecimal digits as shown. 0x45000028000100000102 . . .
How many hops can this packet travel before being dropped? The data belong to what upper-layer
protocol?
To find the time-to-live field, we skip 8 bytes. The time-to-live field is the ninth byte, which is 01. This means
the packet can travel only one hop. The protocol field is the next byte (02), which means that the upper-
layer protocol is IGMP.
Example 5
An IPv4 packet has arrived with the header decimal digits as shown below. Calculate the checksum for this
header
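The header values for this example are not reproduced in these notes, so the sketch below applies the same procedure to a hypothetical 20-byte header (no options, TTL 64, TCP payload, addresses 192.168.1.1 and 192.168.1.2). The function name internet_checksum and all sample values are assumptions for illustration only.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit checksum: complement of the 1s complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        # Wrap any carry out of the top 16 bits back into the sum.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical header with the checksum field (bytes 10 and 11) set to zero.
header = bytearray.fromhex("450000280001000040060000c0a80101c0a80102")
checksum = internet_checksum(bytes(header))

# The sender stores the result in the checksum field.
header[10:12] = checksum.to_bytes(2, "big")

# A receiver repeating the calculation over the whole header should get 0.
verification = internet_checksum(bytes(header))
print(f"checksum = 0x{checksum:04X}, receiver check = {verification}")
```

Because a router changes the TTL, it would have to rerun this calculation before forwarding the datagram.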
Fragmentation
A datagram can travel through different networks. Each router decapsulates the IP datagram from the
frame it receives, processes it, and then encapsulates it in another frame. The format and size of the received
(or sent) frames depend on the protocol used by the physical network through which the frame has just
travelled (or is going to travel). For example, if a router connects a LAN to a WAN, it receives a frame in the
LAN format and sends a frame in the WAN format.
Example: A packet has arrived with an M bit value of 1. Is this the first fragment, the last fragment, or a middle fragment?
Do we know if the packet was fragmented?
Solution - If the M bit is 1, it means that there is at least one more fragment. This fragment can be the first
one or a middle one, but not the last one. We don’t know if it is the first one or a middle one; we need
more information (the value of the fragmentation offset).
Example: A packet has arrived with an M bit value of 1 and a fragmentation offset value of 0. Is this the first fragment, the
last fragment, or a middle fragment?
Solution - Because the M bit is 1, it is either the first fragment or a middle one. Because the offset value is 0,
it is the first fragment.
Example: A packet has arrived in which the offset value is 100. What is the number of the first byte? Do we know the number
of the last byte?
Solution - To find the number of the first byte, we multiply the offset value by 8. This means that the first
byte number is 800. We cannot determine the number of the last byte unless we know the length of the
data.
Example: A packet has arrived in which the offset value is 100, the value of HLEN is 5, and the value of the total length
field is 100. What are the numbers of the first byte and the last byte?
Solution - The first byte number is 100 × 8 = 800. The total length is 100 bytes, and the header length is 20
bytes (5 × 4), which means that there are 80 bytes of data in this datagram. If the first byte number is 800, the last
byte number must be 879.
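The reasoning in these fragmentation examples can be written as a short Python sketch. The function names are assumptions for illustration. The cases with an M bit of 0 are not worked in the examples above and follow from the same logic: no further fragment exists, so the packet is either the last fragment or not fragmented at all.

```python
from typing import Tuple


def fragment_position(m_bit: int, offset: int) -> str:
    """Classify a packet from its M (more fragments) bit and fragmentation offset."""
    if m_bit == 1:
        return "first fragment" if offset == 0 else "middle fragment"
    return "not fragmented" if offset == 0 else "last fragment"


def fragment_byte_range(offset: int, hlen: int, total_length: int) -> Tuple[int, int]:
    """Return the numbers of the first and last data bytes carried by a fragment."""
    first = offset * 8                      # the offset counts 8-byte units
    data_bytes = total_length - hlen * 4    # total length minus header length
    return first, first + data_bytes - 1


print(fragment_position(1, 0))
print(fragment_byte_range(100, 5, 100))
```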
Options
The header of the IPv4 datagram is made of two parts: a fixed part and a variable part. The fixed
part is 20 bytes long and was discussed in the previous section. The variable part comprises the options that
can be a maximum of 40 bytes (in multiples of 4 bytes) to preserve the boundary of the header. Options, as
the name implies, are not required for a datagram. They can be used for network testing and debugging.
Taxonomy of options in IPv4
I Single-Byte Options: There are two single-byte options.
a) No Operation: A no-operation option is a 1-byte option used as a filler between options.
b) End of Option: An end-of-option option is a 1-byte option used for padding at the end of the option field.
It can, however, only be used as the last option.
II Multiple-Byte Options: The remaining options occupy more than one byte.
a) Record Route: A record route option is used to record the Internet routers that handle the
datagram. It can list up to nine router addresses. It can be used for debugging and management
purposes.
b) Strict Source Route: A strict source route option is used by the source to predetermine a route for
the datagram as it travels through the Internet. Here, the sender can choose a route with a specific
type of service, such as minimum delay or maximum throughput. Alternatively, it may choose a
route that is safer or more reliable for the sender’s purpose. For example, a sender can choose a
route so that its datagram does not travel through a competitor’s network. If a datagram specifies a
strict source route, all the routers defined in the option must be visited by the datagram.
c) Loose Source Route: A loose source route option is similar to the strict source route, but it is less
rigid. Each router in the list must be visited, but the datagram can visit other routers as well.
d) Timestamp: A timestamp option is used to record the time of datagram processing by a router.
The time is expressed in milliseconds from midnight, Universal time or Greenwich mean time.
Knowing the time a datagram is processed can help users and managers track the behavior of the
routers in the Internet.
Security of IPv4 Datagrams
The IPv4 protocol, as well as the whole Internet, was started when the Internet users trusted each other. No
security was provided for the IPv4 protocol. Today, however, the situation is different; the Internet is not
secure anymore. There are three security issues that are particularly applicable to the IP protocol: packet
sniffing, packet modification, and IP spoofing.
Packet Sniffing: An intruder may intercept an IP packet and make a copy of it. Packet sniffing is a passive
attack, in which the attacker does not change the contents of the packet. This type of attack is very difficult
to detect because the sender and the receiver may never know that the packet has been copied. Although
packet sniffing cannot be stopped, encryption of the packet can make the attacker’s effort useless. The
attacker may still sniff the packet, but the content cannot be read.
Packet Modification: The second type of attack is to modify the packet. The attacker intercepts the packet,
changes its contents, and sends the new packet to the receiver. The receiver believes that the packet is
coming from the original sender. This type of attack can be detected using a data integrity mechanism. The
receiver, before opening and using the contents of the message, can use this mechanism to make sure that the
packet has not been changed during the transmission.
IP Spoofing: An attacker can masquerade as somebody else and create an IP packet that carries the source
address of another computer. An attacker can send an IP packet to a bank pretending that it is coming
from one of the customers.
IPSec: The IP packets today can be protected from the previously mentioned attacks using a protocol called
IPSec (IP Security).
ICMPv4
The IPv4 has no error-reporting or error-correcting mechanism. The IP protocol also lacks a mechanism for
host and management queries. A host sometimes needs to determine if a router or another host is alive.
And sometimes a network manager needs information from another host or router.
The Internet Control Message Protocol version 4 (ICMPv4) has been designed to compensate for the above
two deficiencies. It is a companion to the IP protocol. ICMP itself is a network-layer protocol. However,
its messages are not passed directly to the data-link layer as would be expected. Instead, the messages are
first encapsulated inside IP datagrams before going to the lower layer. When an IP
datagram encapsulates an ICMP message, the value of the protocol field in the IP datagram is set to 1 to
indicate that the IP payload is an ICMP message.
MESSAGES
ICMP messages are divided into two broad categories: error-reporting messages and query messages. The
error-reporting messages report problems that a router or a host (destination) may encounter when it
processes an IP packet. The query messages, which occur in pairs, help a host or a network manager get
specific information from a router or another host. For example, nodes can discover their neighbors.
An ICMP message has an 8-byte header and a variable-size data section. Although the general format of
the header is different for each message type, the first 4 bytes are common to all. As Figure 4. shows, the first
field, ICMP type, defines the type of the message. The code field specifies the reason for the particular
message type. The last common field is the checksum field (to be discussed later in the chapter). The rest
of the header is specific for each message type.
The data section in error messages carries information for finding the original packet that had the error. In
query messages, the data section carries extra information based on the type of query.
Note that all error messages contain a data section that includes the IP header of the original datagram plus
the first 8 bytes of data in that datagram. The original datagram header is added to give the original source,
which receives the error message, information about the datagram itself. The 8 bytes of data are included
because the first 8 bytes provide information about the port numbers (UDP and TCP) and sequence
number (TCP). This information is needed so the source can inform the protocols (TCP or UDP) about
the error.
The following are important points about ICMP error messages:
❑ No ICMP error message will be generated in response to a datagram carrying an ICMP error message.
❑ No ICMP error message will be generated for a fragmented datagram that is not the first fragment.
❑ No ICMP error message will be generated for a datagram having a multicast address.
❑ No ICMP error message will be generated for a datagram having a special address such as 127.0.0.0 or
0.0.0.0.
Destination Unreachable - The most widely used error message is the destination unreachable
(type 3). This message uses different codes (0 to 15) to define the type of error message and the
reason why a datagram has not reached its final destination.
Source Quench - Another error message is called the source quench (type 4) message, which
informs the sender that the network has encountered congestion and the datagram has been
dropped; the source needs to slow down sending more datagrams. In other words, ICMP adds a
kind of congestion control mechanism to the IP protocol by using this type of message.
Redirection Message - The redirection message (type 5) is used when the source uses a wrong
router to send out its message. The router redirects the message to the appropriate router, but
informs the source that it needs to change its default router in the future. The IP address of the
default router is sent in the message.
Parameter Problem - A parameter problem message (type 12) can be sent when either there is a
problem in the header of a datagram (code 0) or some options are missing or cannot be interpreted
(code 1).
Query Messages - Query messages are used to probe or test the liveliness of hosts or routers in
the Internet, find the one-way or the round-trip time for an IP datagram between two devices, or
even find out whether the clocks in two devices are synchronized. Query messages in ICMP can be
used independently without relation to an IP datagram. But a query message needs to be
encapsulated in a datagram, as a carrier. Naturally, query messages come in pairs: request and reply.
The echo request (type 8) and the echo reply (type 0) pair of messages are used by a host or a router
to test the liveliness of another host or router.
Deprecated Messages
Three pairs of messages are declared obsolete by IETF:
1. Information request and reply messages are not used today because their duties are done by the Address
Resolution Protocol (ARP).
2. Address mask request and reply messages are not used today because their duties are done by the
Dynamic Host Configuration Protocol (DHCP).
3. Router solicitation and advertisement messages are not used today because their duties are done by the
Dynamic Host Configuration Protocol (DHCP).
Debugging Tools
There are several tools that can be used in the Internet for debugging. We can determine the viability of a
host or router. We can trace the route of a packet. We introduce two tools that use ICMP for debugging: ping
and traceroute.
Ping - We can use the ping program to find if a host is alive and responding. We use ping here to see how
it uses ICMP packets. The source host sends ICMP echo-request messages; the destination, if alive, responds
with ICMP echo-reply messages. The ping program sets the identifier field in the echo-request and echo-
reply message and starts the sequence number from 0; this number is incremented by 1 each time a new
message is sent. Note that ping can calculate the round-trip time. It inserts the sending time in the data
section of the message. When the packet arrives, it subtracts the departure time from the arrival time to get
the round-trip time (RTT).
Example: The following shows how we send a ping message to the auniversity.edu site. We set the identifier field in the
echo-request and echo-reply messages and start the sequence number from 0; this number is incremented by one each time a
new message is sent. Note that ping can calculate the round-trip time. It inserts the sending time in the data section of the
message. When the packet arrives, it subtracts the departure time from the arrival time to get the round-trip time (rtt).
$ ping auniversity.edu
PING auniversity.edu (152.181.8.3) 56 (84) bytes of data.
64 bytes from auniversity.edu (152.181.8.3): icmp_seq=0 ttl=62 time=1.91 ms
64 bytes from auniversity.edu (152.181.8.3): icmp_seq=1 ttl=62 time=2.04 ms
64 bytes from auniversity.edu (152.181.8.3): icmp_seq=2 ttl=62 time=1.90 ms
64 bytes from auniversity.edu (152.181.8.3): icmp_seq=3 ttl=62 time=1.97 ms
64 bytes from auniversity.edu (152.181.8.3): icmp_seq=4 ttl=62 time=1.93 ms
64 bytes from auniversity.edu (152.181.8.3): icmp_seq=5 ttl=62 time=2.00 ms
--- auniversity.edu statistics ---
6 packets transmitted, 6 received, 0% packet loss
rtt min/avg/max = 1.90/1.95/2.04 ms
Traceroute or Tracert - The traceroute program in UNIX or tracert in Windows can be used to trace the
path of a packet from a source to the destination. It can find the IP addresses of all the routers that are
visited along the path. The program is usually set to check for a maximum of 30 hops (routers) to be
visited. The number of hops in the Internet is normally less than this.
Traceroute - The traceroute program is different from the ping program. The ping program gets help from
two query messages; the traceroute program gets help from two error-reporting messages: time-exceeded
and destination-unreachable. The traceroute is an application layer program, but only the client program is
needed, because, as we can see, the client program never reaches the application layer in the destination host.
In other words, there is no traceroute server program. The traceroute application program is encapsulated
in a UDP user datagram, but traceroute intentionally uses a port number that is not available at the
destination. If there are n routers in the path, the traceroute program sends (n + 1) messages. The first n
messages are discarded by the n routers, one by each router; the last message is discarded by the destination
host. The traceroute client program uses the (n + 1) ICMP error-reporting messages received to find the
path between the routers.
The first traceroute message is sent with time-to-live (TTL) value set to 1; the message is discarded at the
first router and a time-exceeded ICMP error message is sent, from which the traceroute program can find
the IP address of the first router (the source IP address of the error message) and the router name (in the
data section of the message). The second traceroute message is sent with TTL set to 2, which can find the
IP address and the name of the second router. Similarly, the third message can find the information about
router 3. The fourth message, however, reaches the destination host.
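The probing scheme can be illustrated with a small simulation in Python. No real packets are sent: the path is a hypothetical list of three router addresses taken from a documentation address range, and the function names send_probe and traceroute are assumptions for this sketch.

```python
from typing import List, Tuple


def send_probe(path: List[str], destination: str, ttl: int) -> Tuple[str, str]:
    """Simulate one probe; return (address of the reporter, ICMP error it sends)."""
    for router in path:
        ttl -= 1                      # each router decrements the TTL by one
        if ttl == 0:
            return router, "time-exceeded"
    # The probe reached the host, where the chosen UDP port is not available.
    return destination, "destination-unreachable"


def traceroute(path: List[str], destination: str, max_hops: int = 30) -> List[Tuple[int, str, str]]:
    """Send probes with TTL 1, 2, 3, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        address, error = send_probe(path, destination, ttl)
        hops.append((ttl, address, error))
        if error == "destination-unreachable":
            break
    return hops


# Hypothetical path with n = 3 routers, so n + 1 = 4 probes are needed.
routers = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
hops = traceroute(routers, "192.0.2.100")
for ttl, address, error in hops:
    print(ttl, address, error)
```

The first three probes are answered by the routers with time-exceeded messages, and the fourth is answered by the destination host with a destination-unreachable message.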
ICMP Checksum
In ICMP the checksum is calculated over the entire message (header and data).
Example : Figure 4.13 below shows an example of checksum calculation for a simple echo-request message. We randomly
chose the identifier to be 1 and the sequence number to be 9. The message is divided into 16-bit (2-byte) words. The words are
added and the sum is complemented. Now the sender can put this value in the checksum field.
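The calculation can be reproduced with the Python sketch below. The identifier (1) and sequence number (9) come from the example; the 4-byte data section "TEST" and the function names are assumptions, since the figure itself is not reproduced here.

```python
import struct

ICMP_ECHO_REQUEST = 8  # type value of an echo-request message


def internet_checksum(data: bytes) -> int:
    """Complement of the 1s complement sum of the message's 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Build an echo-request whose checksum covers both header and data."""
    # First pass: checksum field set to zero.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    # Second pass: same header with the computed checksum filled in.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


message = build_echo_request(identifier=1, sequence=9, payload=b"TEST")
checksum_field = struct.unpack("!H", message[2:4])[0]
print(f"checksum field = 0x{checksum_field:04X}")
```

Unlike the IPv4 header checksum, this one is computed over the data section as well, so changing the assumed payload changes the result.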
1. Changing the Address - One simple solution is to let the mobile host change its address as it
goes to the new network. The host can use DHCP to obtain a new address to associate it with the
new network. This approach has several drawbacks. First, the configuration files would need to be
changed. Second, each time the computer moves from one network to another, it must be
rebooted. Third, the DNS tables need to be revised so that every other host in the Internet is aware
of the change. Fourth, if the host roams from one network to another during a transmission, the
data exchange will be interrupted. This is because the ports and IP addresses of the client and the
server must remain constant for the duration of the connection.
2. Two Addresses - The approach that is more feasible is the use of two addresses. The host has its
original address, called the home address, and a temporary address, called the care-of
address. The home address is permanent; it associates the host with its home network, the network
that is the permanent home of the host. The care-of address is temporary. When a host moves from
one network to another, the care-of address changes; it is associated with the foreign network, the
network to which the host moves. When a mobile host visits a foreign network, it receives its care-of
address during the agent discovery and registration phase.
❑ Length. The 8-bit length field defines the total length of the extension message (not the length of the
ICMP advertisement message).
❑ Sequence number. The 16-bit sequence number field holds the message number. The recipient can
use the sequence number to determine if a message is lost.
❑ Lifetime. The lifetime field defines the number of seconds that the agent will accept requests. If the
value is a string of 1s, the lifetime is infinite.
❑ Code. The code field is an 8-bit flag in which each bit is set (1) or unset (0). The meanings of the bits
are shown in Table 4.2.
Table 4.2: Code bits
❑ Care-of Addresses. This field contains a list of addresses available for use as care-of addresses. The
mobile host can choose one of these addresses. The selection of this care-of address is announced in the
registration request. Note that this field is used only by a foreign agent.
Agent Solicitation
When a mobile host has moved to a new network and has not received agent advertisements, it can initiate
an agent solicitation. It can use the ICMP solicitation message to inform an agent that it needs assistance.
Mobile IP does not use a new packet type for agent solicitation; it uses the router solicitation packet of ICMP.
Registration - The second phase in mobile communication is registration. After a mobile host has moved
to a foreign network and discovered the foreign agent, it must register. There are four aspects of registration:
1. The mobile host must register itself with the foreign agent.
2. The mobile host must register itself with its home agent. This is normally done by the foreign agent on
behalf of the mobile host.
its care-of address and also to announce its home address and home agent address. The foreign agent, after
receiving and registering the request, relays the message to the home agent.
❑ Type. The 8-bit type field defines the type of message. For a request message the value of this field is 1.
❑ Flag. The 8-bit flag field defines forwarding information. The value of each bit can be set or unset. The
meaning of each bit is given in Table 4.3.
Table 4.3: Registration request flag field bits
❑ Lifetime. This field defines the number of seconds the registration is valid. If the field is a string of 0s,
the request message is asking for deregistration. If the field is a string of 1s, the lifetime is infinite.
❑ Home address. This field contains the permanent (first) address of the mobile host.
❑ Home agent address. This field contains the address of the home agent.
❑ Care-of address. This field is the temporary (second) address of the mobile host.
❑ Identification. This field contains a 64-bit number that is inserted into the request by the mobile host and
repeated in the reply message. It matches a request with a reply.
❑ Extensions. Variable length extensions are used for authentication. They allow a home agent to
authenticate the mobile agent.
Registration Reply - A registration reply is sent from the home agent to the foreign agent and then relayed
to the mobile host. The reply confirms or denies the registration request. Figure 4.18 shows the
format of the registration reply. The fields are similar to those of the registration request with the following
exceptions. The value of the type field is 3. The code field replaces the flag field and shows the result of the
registration request (acceptance or denial). The care-of address field is not needed.
consults a registry table to find the care-of address of the mobile host. (Otherwise, the packet would just
be sent back to the home network.) The packet is then sent to the care-of address. Path 3 of Figure 4.19
shows this step.
From Mobile Host to Remote Host - When a mobile host wants to send a packet to a remote host (for
example, a response to the packet it has received), it sends as it does normally. The mobile host prepares a
packet with its home address as the source, and the address of the remote host as the destination. Although
the packet comes from the foreign network, it has the home address of the mobile host. Path 4 of Figure
4.19 shows this step.
Transparency - In this data transfer process, the remote host is unaware of any movement by the mobile
host. The remote host sends packets using the home address of the mobile host as the destination address;
it receives packets that have the home address of the mobile host as the source address. The movement is
totally transparent. The rest of the Internet is not aware of the movement of the mobile host.
Inefficiency in Mobile IP
Communication involving mobile IP can be inefficient. The inefficiency can be severe or moderate. The
severe case is called double crossing or 2X. The moderate case is called triangle routing or dog-leg routing.
Double Crossing - Double crossing occurs when a remote host communicates with a mobile host that has
moved to the same network (or site) as the remote host (see Figure 4.20). When the mobile host sends a
packet to the remote host, there is no inefficiency; the communication is local. However, when the remote
host sends a packet to the mobile host, the packet crosses the Internet twice. Since a computer usually
communicates with other local computers (principle of locality), the inefficiency from double crossing is
significant.
Triangle Routing - Triangle routing, the less severe case, occurs when the remote host communicates with
a mobile host that is not attached to the same network (or site) as the remote host. When the mobile host
sends a packet to the remote host, there is no inefficiency.
UNICAST ROUTING
Unicast routing in the Internet, with a large number of routers and a huge number of hosts,
can be done only by using hierarchical routing: routing in several steps using different
routing algorithms.
In this section, we first discuss the general concept of unicast routing in an internet. After
the routing concepts and algorithms are understood, we show how we can apply them to
the Internet.
General Idea
In unicast routing, a packet is routed, hop by hop, from its source to its destination with the help of
forwarding tables. The source host needs no forwarding table because it delivers its packet to the
default router in its local network. The destination host needs no forwarding table either because it
receives the packet from its default router in its local network. This means that only the routers that glue
together the networks in the internet need forwarding tables. So, routing a packet from its source to its
destination means routing the packet from a source router to a destination router.
The source router chooses a route to the destination router in such a way that the total cost for the route is
the least cost among all possible routes. In Figure 4.1, the best route between A and E is A-B-E, with the
cost of 6. This means that each router needs to find the least-cost route between itself and all the other
routers to be able to route a packet using this criterion.
Least-Cost Trees
If there are N routers in an internet, there are (N − 1) least-cost paths from each router to any other
router. This means we need N × (N − 1) least-cost paths for the whole internet. If we have only 10
routers in an internet, we need 90 least-cost paths. A better way to see all of these paths is to combine them
in a least-cost tree. A least-cost tree is a tree with the source router as the root that spans the whole graph
(visits all other nodes) and in which the path between the root and any other node is the shortest. In this
way, we can have only one shortest-path tree for each node; we have N least-cost trees for the whole
internet.
Figure 4.23: Least-cost trees for nodes in the internet of Figure 4.22
The least-cost trees for a weighted graph can have several properties if they are created using
consistent criteria.
1. The least-cost route from X to Y in X’s tree is the inverse of the least-cost route from Y to X in
Y’s tree; the cost in both directions is the same. For example, in Figure 4.23, the route from A to
F in A’s tree is (A → B → E → F), but the route from F to A in F’s tree is (F → E → B → A),
which is the inverse of the first route. The cost is 8 in each case.
2. Instead of travelling from X to Z using X’s tree, we can travel from X to Y using X’s tree and
continue from Y to Z using Y’s tree. For example, in Figure 4.23, we can go from A to G in A’s
tree using the route (A → B → E → F → G). We can also go from A to E in A’s tree (A → B → E)
and then continue in E’s tree using the route (E → F → G). The combination of the two routes in
the second case is the same route as in the first case. The cost in the first case is 9; the cost in the
second case is also 9 (6 + 3).
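The two properties can be checked numerically. The following Python sketch is only an illustration: the link costs are assumptions chosen to agree with the costs quoted in the text (the figure itself is not reproduced here), and the name all_pairs_least_cost is hypothetical. It uses the Floyd-Warshall method, which is not one of the routing algorithms discussed in this module, simply because it yields every least cost at once.

```python
from itertools import product

INF = float("inf")

# Assumed link costs (undirected), consistent with the costs quoted in the text.
EDGES = {("A", "B"): 2, ("A", "D"): 3, ("B", "C"): 5, ("B", "E"): 4, ("C", "F"): 4,
         ("C", "G"): 3, ("D", "E"): 5, ("E", "F"): 2, ("F", "G"): 1}
NODES = sorted({n for edge in EDGES for n in edge})


def all_pairs_least_cost(nodes, edges):
    """Return {(i, j): least cost from i to j} using Floyd-Warshall."""
    dist = {(u, v): (0 if u == v else INF) for u, v in product(nodes, repeat=2)}
    for (u, v), cost in edges.items():
        dist[u, v] = dist[v, u] = cost
    # product() varies its first element slowest, so k is the outer loop.
    for k, i, j in product(nodes, repeat=3):
        if dist[i, k] + dist[k, j] < dist[i, j]:
            dist[i, j] = dist[i, k] + dist[k, j]
    return dist


dist = all_pairs_least_cost(NODES, EDGES)
# Property 1: the cost is the same in both directions.
print(dist["A", "F"], dist["F", "A"])
# Property 2: A to G costs the same as A to E (A's tree) plus E to G (E's tree).
print(dist["A", "G"], dist["A", "E"] + dist["E", "G"])
```

With N = 7 nodes the dictionary holds N × (N − 1) = 42 least costs between distinct routers, which is the count discussed under Least-Cost Trees.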
ROUTING ALGORITHMS
Several routing algorithms have been designed in the past. The differences between these methods are in
the way they interpret the least cost and the way they create the least-cost tree for each node. In this
section, we discuss the common algorithms; later we show how a routing protocol in the Internet
implements one of these algorithms.
Distance-Vector Routing
The distance-vector (DV) routing uses the goal we discussed in the introduction, to find the best route. In
distance-vector routing, the first thing each node creates is its own least-cost tree with the
rudimentary information it has about its immediate neighbors. The incomplete trees are exchanged
between immediate neighbors to make the trees more and more complete and to represent the whole
internet. In distance-vector routing, a router continuously tells all of its neighbors what it knows about the
whole internet (although the knowledge can be incomplete).
Bellman-Ford Equation
The heart of distance-vector routing is the famous Bellman-Ford equation. This equation is used to find
the least cost (shortest distance) between a source node, x, and a destination node, y, through some
intermediary nodes (a, b, c, . . .) when the costs between the source and the intermediary nodes and the least
costs between the intermediary nodes and the destination are given. The following shows the general
case in which Dij is the shortest distance and cij is the cost between nodes i and j.
Dxy = min{(cxa + Day), (cxb + Dby), (cxc + Dcy), …}
In distance-vector routing, normally we want to update an existing least cost with a least cost through an
intermediary node, such as z, if the latter is shorter. In this case, the equation becomes simpler, as shown
below:
Dxy = min{Dxy, (cxz + Dzy)}
We can say that the Bellman-Ford equation enables us to build a new least-cost path from previously
established least-cost paths. In Figure 4.24, we can think of (a→y), (b→y), and (c→y) as previously
established least-cost paths and (x→y) as the new least-cost path.
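The two forms of the Bellman-Ford equation can be written as two small Python functions. This is a minimal sketch; the function names and the numbers in the example are hypothetical and are not taken from any figure.

```python
INF = float("inf")


def bellman_ford_general(cost_to, least_cost_from):
    """General case: Dxy = min over intermediaries a of (cxa + Day).

    cost_to[a] is cxa; least_cost_from[a] is Day.
    """
    return min((c + least_cost_from[a] for a, c in cost_to.items()), default=INF)


def bellman_ford_update(d_xy, c_xz, d_zy):
    """Update case: Dxy = min(Dxy, cxz + Dzy)."""
    return min(d_xy, c_xz + d_zy)


# Hypothetical example: x reaches y through a, b, or c.
d_xy = bellman_ford_general({"a": 2, "b": 3, "c": 1}, {"a": 5, "b": 3, "c": 8})
print(d_xy)                              # least of 2+5, 3+3, 1+8
print(bellman_ford_update(INF, 2, 3))    # an unknown cost is replaced
print(bellman_ford_update(5, 4, 6))      # a worse candidate is ignored
```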
Distance Vectors
The concept of a distance vector is the rationale for the name distance-vector routing. A least-cost tree
is a combination of least-cost paths from the root of the tree to all destinations. These paths are
graphically glued together to form the tree. Distance-vector routing unglues these paths and creates a
distance vector, a one-dimensional array to represent the tree. Figure 4.25 shows the tree for node A in the
internet in Figure 4.1 and the corresponding distance vector. Note that the name of the distance vector
defines the root, the indexes define the destinations, and the value of each cell defines the least cost
from the root to the destination. A distance vector does not give the path to the destinations as the
least-cost tree does; it gives only the least costs to the destinations.
Each node in an internet, when it is booted, creates a very rudimentary distance vector with the
minimum information the node can obtain from its neighborhood. The node sends some greeting
messages out of its interfaces and discovers the identity of the immediate neighbors and the distance
between itself and each neighbor. It then makes a simple distance vector by inserting the discovered
distances in the corresponding cells and leaves the value of other cells as infinity.
Figure 4.26 shows all distance vectors for our internet. However, we need to mention that these vectors are
made asynchronously, when the corresponding node has been booted; the existence of all of them in a
figure does not mean synchronous creation of them.
These rudimentary vectors cannot help the internet to effectively forward a packet. For example, node A
thinks that it is not connected to node G because the corresponding cell shows the least cost of
infinity. To improve these vectors, the nodes in the internet need to help each other by exchanging
information. After each node has created its vector, it sends a copy of the vector to all its immediate
neighbors. After a node receives a distance vector from a neighbor, it updates its distance vector using the
Bellman-Ford equation (second case). However, we need to understand that we need to update, not only
one least cost, but N of them in which N is the number of the nodes in the internet. If we are using a
program, we can do this using a loop; if we are showing the concept on paper, we can show the whole
vector instead of the N separate equations. We show the whole vector instead of seven equations for
each update in Figure 4.27.
In the first event, node A has sent its vector to node B. Node B updates its vector using the cost cBA =
2. In the second event, node E has sent its vector to node B. Node B updates its vector using the cost cBE
= 4. After the first event, node B has one improvement in its vector: its least cost to node D has changed
from infinity to 5 (via node A). After the second event, node B has one more improvement in its vector;
its least cost to node F has changed from infinity to 6 (via node E). Exchanging vectors in this way
eventually stabilizes the system and allows all nodes to find the ultimate least
cost between themselves and any other node. We need to remember that after a node updates its vector, it
immediately sends the updated vector to all neighbors. Even if its neighbors have received the previous
vector, the updated one may help more.
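The two events at node B can be replayed with a few lines of Python. This is a sketch under assumptions: the initial vectors of A, B, and E are not listed in the text, so the link costs used here (A-B 2, A-D 3, B-C 5, B-E 4, D-E 5, E-F 2) are assumed values that agree with the results quoted above. The names make_vector and update are hypothetical.

```python
INF = float("inf")
NODES = "ABCDEFG"


def make_vector(known):
    """Build a rudimentary distance vector; unknown destinations get infinity."""
    return {n: known.get(n, INF) for n in NODES}


def update(own, link_cost, received):
    """Apply Dxy = min(Dxy, cxz + Dzy) to every cell of the vector."""
    return {y: min(own[y], link_cost + received[y]) for y in NODES}


vec_a = make_vector({"A": 0, "B": 2, "D": 3})
vec_b = make_vector({"B": 0, "A": 2, "C": 5, "E": 4})
vec_e = make_vector({"E": 0, "B": 4, "D": 5, "F": 2})

after_first = update(vec_b, 2, vec_a)          # B receives A's vector; the A-B link costs 2
after_second = update(after_first, 4, vec_e)   # B receives E's vector; the B-E link costs 4

print(after_first["D"], after_second["F"])
```

Node G is still at infinity in B's vector after these two events; it is reached only after further exchanges.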
Distance-Vector Routing Algorithm:
Now we can give a simplified pseudocode for the distance-vector routing algorithm, as shown in Figure
4.28. The algorithm is run by each node independently and asynchronously.
Lines 4 to 11 initialize the vector for the node. Lines 14 to 23 show how the vector can be updated after
receiving a vector from the immediate neighbor. The for loop in lines 17 to 20 allows all entries (cells) in
the vector to be updated after receiving a new vector. Note that the node sends its vector in line 12, after
being initialized, and in line 22, after it is updated.
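The whole exchange can also be simulated. The sketch below is not the pseudocode of the figure; it is a simplified Python simulation under assumptions: the link costs are assumed values consistent with the costs quoted in this section, a message queue stands in for the asynchronous arrival of vectors, and a node re-sends its vector only when an entry has changed (so that the simulation terminates instead of repeating forever).

```python
from collections import deque

INF = float("inf")

# Assumed link costs (undirected).
LINKS = {("A", "B"): 2, ("A", "D"): 3, ("B", "C"): 5, ("B", "E"): 4, ("C", "F"): 4,
         ("C", "G"): 3, ("D", "E"): 5, ("E", "F"): 2, ("F", "G"): 1}


def neighbors_of(links):
    """Return {node: {neighbor: link cost}}."""
    nbrs = {}
    for (u, v), cost in links.items():
        nbrs.setdefault(u, {})[v] = cost
        nbrs.setdefault(v, {})[u] = cost
    return nbrs


def distance_vector_routing(links):
    nbrs = neighbors_of(links)
    nodes = sorted(nbrs)
    # Initialization: 0 to itself, the link cost to each neighbor, infinity elsewhere.
    vectors = {x: {y: 0 if x == y else nbrs[x].get(y, INF) for y in nodes}
               for x in nodes}
    # Each node sends a copy of its initial vector to all immediate neighbors.
    queue = deque((x, n, dict(vectors[x])) for x in nodes for n in nbrs[x])
    while queue:
        sender, receiver, received = queue.popleft()
        own, changed = vectors[receiver], False
        for y in nodes:  # update every cell with the Bellman-Ford equation
            candidate = nbrs[receiver][sender] + received[y]
            if candidate < own[y]:
                own[y], changed = candidate, True
        if changed:  # send the updated vector to all neighbors
            queue.extend((receiver, n, dict(own)) for n in nbrs[receiver])
    return vectors


vectors = distance_vector_routing(LINKS)
print(vectors["A"])
```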
By flooding: Each node sends an LS packet (LSP) to all its immediate neighbors (out of each interface). To
build it, the node collects two pieces of information for each neighboring node:
1. The identity of the node.
2. The cost of the link.
When a node receives an LSP, it compares the LSP with the copy it may already have. The node checks the
sequence number in both LSPs to determine which one is older, discards the old one, and keeps the new one. Then
the node sends a copy of it out of each interface except the one from which the packet arrived. This
guarantees that flooding stops somewhere in the network (where a node has only one interface).
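The flooding rule can be sketched in Python as follows. This is an illustration under assumptions: the four-node ring topology, the dictionary layout of an LSP, and the function name flood are all hypothetical. In this sketch an arriving LSP whose sequence number is not newer than the stored copy is dropped and not forwarded, which is what stops the flooding on a ring.

```python
from collections import deque


def flood(topology, lsdb, lsp):
    """Flood one LSP; return how many copies were transmitted.

    topology: {node: [neighbors]}; lsdb: {node: {origin: stored LSP}}.
    """
    origin = lsp["origin"]
    lsdb[origin][origin] = lsp
    queue = deque((origin, nbr) for nbr in topology[origin])
    sent = 0
    while queue:
        sender, receiver = queue.popleft()
        sent += 1
        old = lsdb[receiver].get(origin)
        if old is not None and old["seq"] >= lsp["seq"]:
            continue  # the arriving copy is not newer: discard it, do not forward
        lsdb[receiver][origin] = lsp  # keep the new one
        # Forward out of every interface except the one the LSP arrived on.
        queue.extend((receiver, n) for n in topology[receiver] if n != sender)
    return sent


TOPOLOGY = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
lsdb = {node: {} for node in TOPOLOGY}
first = flood(TOPOLOGY, lsdb, {"origin": "A", "seq": 1, "links": {"B": 2, "D": 3}})
second = flood(TOPOLOGY, lsdb, {"origin": "A", "seq": 2, "links": {"B": 2}})
print(first, second)
```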
Each node creates the comprehensive LSDB. This LSDB is the same for each node and shows the whole
map of the internet. In other words, a node can make the whole map if it needs to, using this LSDB.
Figure 4.32: LSPs created and sent out by each node to build LSDB
Formation of Least-Cost Trees - To create a least-cost tree for itself, using the shared LSDB, each node
needs to run the famous Dijkstra Algorithm. This iterative algorithm uses the following steps:
1. The node chooses itself as the root of the tree, creating a tree with a single node, and sets the total cost
... formation of the least-cost tree for the graph in Figure 4.10 using Dijkstra’s algorithm.
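A compact Python sketch of Dijkstra's algorithm is given here for reference. It is a common priority-queue formulation and does not follow the step list above line by line; the link costs are assumed values consistent with the costs quoted earlier in this module, and the helper names are hypothetical.

```python
import heapq

# Assumed link costs (undirected); in practice they would come from the LSDB.
LINKS = {("A", "B"): 2, ("A", "D"): 3, ("B", "C"): 5, ("B", "E"): 4, ("C", "F"): 4,
         ("C", "G"): 3, ("D", "E"): 5, ("E", "F"): 2, ("F", "G"): 1}


def build_graph(links):
    graph = {}
    for (u, v), cost in links.items():
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph


def dijkstra(graph, root):
    """Return (cost, parent): least cost to each node and its parent in the tree."""
    cost, parent, in_tree = {root: 0}, {root: None}, set()
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in in_tree:
            continue  # stale heap entry
        in_tree.add(u)  # u joins the tree with its final least cost
        for v, c in graph[u].items():
            if v not in in_tree and d + c < cost.get(v, float("inf")):
                cost[v], parent[v] = d + c, u
                heapq.heappush(heap, (d + c, v))
    return cost, parent


def path_to(parent, node):
    """Follow parent links back to the root to recover the route."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


cost, parent = dijkstra(build_graph(LINKS), "A")
print(cost)
print(path_to(parent, "G"))
```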
To respond to these demands, a third routing algorithm, called path-vector (PV) routing has been
devised. Path-vector routing does not have the drawbacks of LS or DV routing as described above
because it is not based on least-cost routing. The best route is determined by the source using the policy
it imposes on the route. In other words, the source can control the path. Path-vector routing is
not actually used within an internet; it is mostly designed to route packets between ISPs.
Spanning Trees
In path-vector routing, the path from a source to all destinations is also determined by the best
spanning tree. The best spanning tree, however, is not the least-cost tree; it is the tree determined by
the source when it imposes its own policy. If there is more than one route to a destination, the source can
choose the route that meets its policy best. A source may apply several policies at the same time. Figure 4.13
shows a small internet with only five nodes. Each source has created its own spanning tree that meets its
policy. The policy imposed by all sources is to use the minimum number of nodes to reach a destination.
The spanning tree selected by A and E is such that the communication does not pass through D as a
middle node. Similarly, the spanning tree selected by B is such that the communication does not
pass through C as a middle node.
Path-vector routing also imposes one more condition on this equation: If Path (v, y) includes x, that
path is discarded to avoid a loop in the path. In other words, x does not want to visit itself when it
selects a path to y. Figure 4.15 shows the path vector of node C after two events. In the first event, node
C receives a copy of B’s vector, which improves its vector: now it knows how to reach node A. In the
second event, node C receives a copy of D’s vector, which does not change its vector. The vector for node
C after the first event is stabilized and serves as its forwarding table.
Lines 4 to 12 show the initialization for the node. Lines 17 to 24 show how the node updates its vector
after receiving a vector from the neighbor. The update process is repeated forever. We can see the
similarities between this algorithm and the DV algorithm.
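The update step of path-vector routing, including the loop-avoidance condition, can be sketched in Python. Several things here are assumptions: the update rule is taken to be "keep the better of the old path and x followed by the neighbor's path", the policy is the minimum number of nodes with the old path kept on ties, and the initial vectors of B, C, and D describe a hypothetical five-node topology.

```python
def best(old, new):
    """Assumed policy: prefer the path with fewer nodes; keep the old path on ties."""
    if old is None:
        return new
    return new if len(new) < len(old) else old


def update_path_vector(x, own, received):
    """Return x's vector after receiving a neighbor's vector.

    Both vectors map a destination to the list of nodes on the path.
    """
    updated = dict(own)
    for y, path in received.items():
        if x in path:
            continue  # the path already includes x: discard it to avoid a loop
        updated[y] = best(updated.get(y), [x] + path)
    return updated


vec_c = {"C": ["C"], "B": ["C", "B"], "D": ["C", "D"], "E": ["C", "E"]}
vec_b = {"B": ["B"], "A": ["B", "A"], "C": ["B", "C"], "D": ["B", "D"]}
vec_d = {"D": ["D"], "B": ["D", "B"], "C": ["D", "C"], "E": ["D", "E"]}

after_b = update_path_vector("C", vec_c, vec_b)    # event 1: C learns a path to A
after_d = update_path_vector("C", after_b, vec_d)  # event 2: nothing improves
print(after_b["A"], after_d == after_b)
```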
UNICAST ROUTING PROTOCOLS
A protocol is more than an algorithm. A protocol needs to define its domain of operation, the
messages exchanged, communication between routers, and interaction with protocols in other
domains.
We discuss three common protocols used in the Internet: RIP, OSPF, and BGP.
Internet Structure
Today, the Internet has changed from a tree-like structure, with a single backbone, to a multi-
backbone structure run by different private corporations. The Internet has a structure similar to what is
shown in Figure 4.16. At the top level are the
backbones. At a lower level, there are some provider networks that use the backbones for global
connectivity but provide services to Internet customers. Finally, there are some customer networks that
use the services provided by the provider networks. Any of these three entities (backbone, provider
network, or customer network) can be called an Internet Service Provider or ISP. They provide services, but
at different levels.
Hierarchical Routing
The Internet today is made of a huge number of networks and routers that connect them. Routing in the
Internet cannot be done using a single protocol for two reasons: a scalability problem and an
administrative issue. Scalability problem means that the size of the forwarding tables becomes huge,
searching for a destination in a forwarding table becomes time-consuming, and updating creates a huge
amount of traffic. The administrative issue is related to the Internet structure described in Figure 4.16.
As the figure shows, each ISP is run by an administrative authority. The administrator needs to have
control in its system. The organization must be able to use as many subnets and routers as it needs, may
desire that the routers be from a particular manufacturer, may wish to run a specific routing algorithm
to meet the needs of the organization, and may want to impose some policy on the traffic passing through
its ISP. Hierarchical routing means considering each ISP as an autonomous system (AS). Each AS can
run a routing protocol that meets its needs, but the global Internet runs a global protocol to glue all ASs
together.
The routing protocol run in each AS is referred to as intra-AS routing protocol, intradomain
routing protocol, or interior gateway protocol (IGP); the global routing protocol is referred
to as inter-AS routing protocol, interdomain routing protocol, or exterior gateway protocol
(EGP). There may be several intradomain routing protocols, and each AS is free to choose one, but it
should be clear that there should be only one interdomain protocol that handles routing between these
entities. Presently, the two common intradomain routing protocols are RIP and OSPF; the only
interdomain routing protocol is BGP. The situation may change when we move to IPv6.
Autonomous Systems
Each ISP is an autonomous system when it comes to managing networks and routers under its control.
There are small, medium-size, and large ASs, and each AS is given an autonomous number (ASN) by the
ICANN. Each ASN is a 16-bit unsigned integer that uniquely defines an AS. The autonomous
systems, however, are not categorized according to their size; they are categorized according to the way
they are connected to other ASs. We have stub ASs, multihomed ASs, and transient ASs. The type affects the
operation of the interdomain routing protocol in relation to that AS.
❑ Stub AS. A stub AS has only one connection to another AS. The data traffic can be either initiated or
terminated in a stub AS; the data cannot pass through it. A good example of a stub AS is the
customer network, which is either the source or the sink of data.
❑ Multihomed AS. A multihomed AS can have more than one connection to other ASs, but it does
not allow data traffic to pass through it. A good example of such an AS is some of the customer ASs that
may use the services of more than one provider network, but their policy does not allow data to be
passed through them.
❑ Transient AS. A transient AS is connected to more than one other AS and also allows the traffic to
pass through. The provider networks and the backbone are good examples of transient ASs.
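The three categories depend only on two facts about an AS: how many connections it has to other ASs and whether its policy lets traffic pass through. A small Python sketch (the function name classify_as is hypothetical) makes the rule explicit and also shows the largest value a 16-bit ASN can take.

```python
MAX_ASN = 2**16 - 1  # largest value of a 16-bit unsigned ASN


def classify_as(connections, allows_transit):
    """Classify an AS by its connections to other ASs and its transit policy."""
    if connections < 1:
        raise ValueError("an AS needs at least one connection to another AS")
    if connections == 1:
        return "stub"  # traffic can only start or end here
    return "transient" if allows_transit else "multihomed"


print(classify_as(1, False), classify_as(2, False), classify_as(3, True), MAX_ASN)
```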
Routing Information Protocol (RIP)
The Routing Information Protocol (RIP) is one of the most widely used intradomain routing protocols
based on the distance-vector routing algorithm. RIP was started as part of the Xerox Network System
(XNS), but it was the Berkeley Software Distribution (BSD) version of UNIX that helped make the use of
RIP widespread.
Hop Count
A router in this protocol basically implements the distance-vector routing algorithm described earlier.
However, the algorithm has been modified as described below.
1. Since a router in an AS needs to know how to forward a packet to different networks (subnets) in
an AS, RIP routers advertise the cost of reaching different networks instead of reaching other
nodes in a theoretical graph. In other words, the cost is defined between a router and the
network in which the destination host is located.
2. To make the implementation of the cost simpler (independent from performance factors of
the routers and links, such as delay, bandwidth, and so on), the cost is defined as the number of
hops, which means the number of networks (subnets) a packet needs to travel through from
the source router to the final destination host. Note that the network in which the source
host is connected is not counted in this calculation because the source host does not use a
forwarding table; the packet is delivered to the default router. Figure 4.17 shows the concept
of hop count advertised by three routers from a source host to a destination host. In RIP, the
maximum cost of a path can be 15, which means 16 is considered as infinity (no connection).
For this reason, RIP can be used only in autonomous systems in which the diameter
of the AS is not more than 15 hops.
❑ Instead of sending only distance vectors, a router needs to send the whole contents of its
forwarding table in a response message.
❑ The receiver adds one hop to each cost and changes the next router field to the address of the
sending router. We call each route in the modified forwarding table the received route and each route in the
old forwarding table the old route. The receiving router selects the old routes as the new ones except
in the following three cases:
1. If the received route does not exist in the old forwarding table, it should be added to the table.
2. If the cost of the received route is lower than the cost of the old one, the received route should be
selected as the new one.
3. If the next router is the same in both routes, the received route should be selected as the new one, even if its cost is higher. This is
the case where the route was actually advertised by the same router in the past, but now the situation
has been changed. For example, suppose a neighbor has previously advertised a route to a destination with
cost 3, but now there is no path between this neighbor and that destination. The neighbor advertises
this destination with cost value infinity (16 in RIP). The receiving router must not ignore this value even
though its old route has a lower cost to the same destination.
The new forwarding table needs to be sorted according to the destination route (mostly using the longest
prefix first).
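The three cases can be sketched in Python as follows. This is an illustration under assumptions: a forwarding table is modelled as a dictionary from destination to (cost, next router), the network and router names are hypothetical, a cost is capped at 16 after adding one hop, and the final sort is by destination name rather than by longest prefix.

```python
INFINITY = 16  # in RIP a cost of 16 means "no connection"


def rip_update(old_table, received_routes, sender):
    """Merge a received table into the old one; tables map destination -> (cost, next router)."""
    new_table = dict(old_table)
    for dest, (cost, _) in received_routes.items():
        cost = min(cost + 1, INFINITY)   # add one hop (capping at 16 is an assumption)
        received = (cost, sender)        # the next router becomes the sending router
        if dest not in new_table:                      # case 1: new destination
            new_table[dest] = received
        else:
            old_cost, old_next = new_table[dest]
            if cost < old_cost or old_next == sender:  # case 2 or case 3
                new_table[dest] = received
    return dict(sorted(new_table.items()))  # sorted by destination name here


old = {"N1": (1, "-"), "N2": (3, "R2"), "N3": (4, "R3")}
from_r2 = {"N2": (16, "-"), "N3": (2, "-"), "N4": (1, "-")}
new = rip_update(old, from_r2, "R2")
print(new)
```

In this example N2 becomes unreachable because R2, the router that advertised it in the past, now advertises infinity (case 3), N3 switches to the cheaper route through R2 (case 2), and N4 is added (case 1).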
Example 4.2 Figure 4.43 shows a more realistic example of the operation of RIP in an autonomous
system. First, the figure shows all forwarding tables after all routers have been booted. Then there are
changes in some tables when some update messages have been exchanged. Finally, there are the stabilized
forwarding tables when there is no more change.
Every time a new update for the route is received, the timer is reset. If there is a problem on an
internet and no update is received within the allotted 180 seconds, the route is considered expired
and the hop count of the route is set to 16, which means the destination is unreachable.
Every route has its own expiration timer.
3. The garbage collection timer is used to purge a route from the forwarding table. When the
information about a route becomes invalid, the router does not immediately purge that route
from its table. Instead, it continues to advertise the route with a metric value of 16. At the
same time, a garbage collection timer is set to 120 seconds for that route. When the count
reaches zero, the route is purged from the table. This timer allows neighbours to become
aware of the invalidity of a route prior to purging.
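The interplay of the expiration timer and the garbage collection timer for one route can be sketched in Python. The class and method names are hypothetical, and time is advanced manually through tick() instead of using a real clock; only the 180-second and 120-second values and the metric of 16 come from the text.

```python
EXPIRATION = 180   # seconds without an update before a route expires
GARBAGE = 120      # seconds an expired route is still advertised with metric 16
INFINITY = 16


class Route:
    def __init__(self, hops):
        self.hops = hops
        self.expires_in = EXPIRATION
        self.garbage_in = None  # not running until the route expires

    def refresh(self, hops):
        """A new update for the route was received: reset the expiration timer."""
        self.hops, self.expires_in, self.garbage_in = hops, EXPIRATION, None

    def tick(self, seconds):
        """Advance time; return False once the route should be purged."""
        if self.garbage_in is None:
            self.expires_in -= seconds
            if self.expires_in <= 0:
                self.hops = INFINITY  # expired: destination unreachable
                # Start garbage collection, carrying over any overshoot.
                self.garbage_in = GARBAGE + self.expires_in
        else:
            self.garbage_in -= seconds
        return self.garbage_in is None or self.garbage_in > 0


route = Route(hops=3)
alive_179 = route.tick(179)   # still valid
alive_180 = route.tick(1)     # expires, now advertised with metric 16
hops_after_expiry = route.hops
alive_299 = route.tick(119)   # still advertised
alive_300 = route.tick(1)     # garbage collection timer reaches zero
print(alive_179, alive_180, hops_after_expiry, alive_299, alive_300)
```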
Performance
Before ending this section, let us briefly discuss the performance of RIP:
❑ Update Messages - The update messages in RIP have a very simple format and are sent
only to neighbours; they are local. They do not normally create traffic because the routers try to
avoid sending them at the same time.
❑ Convergence of Forwarding Tables - RIP uses the distance-vector algorithm, which
can converge slowly if the domain is large, but, since RIP allows only 15 hops in a domain (16 is
considered as infinity), there is normally no problem in convergence. The only problems that may
slow down convergence are count-to-infinity and loops created in the domain; use of poison-
reverse and split-horizon strategies added to the RIP extension may alleviate the situation.
❑ Robustness - Distance-vector routing is based on the concept that each router sends
what it knows about the whole domain to its neighbours. This means that the calculation of the
forwarding table depends on information received from immediate neighbours, which in turn
receive their information from their own neighbours. If there is a failure or corruption in one
router, the problem will be propagated to all routers and the forwarding in each router will be
affected.
Open Shortest Path First
Open Shortest Path First (OSPF) is also an intradomain routing protocol like RIP, but it is based on
the link-state routing algorithm. OSPF is an open protocol, which means that the
specification is a public document.
In OSPF, like RIP, the cost of reaching a destination from the host is calculated from the source
router to the destination network. Each link (network) can be assigned a weight based on the
throughput, round-trip time, reliability, and so on. An administration can also decide to use the
hop count as the cost. An interesting point about the cost in OSPF is that different service types
(TOSs) can have different weights as the cost. Figure 4.44 shows the idea of the cost from a
router to the destination host network. We can compare the figure with Figure 4.40 for the
RIP.
passing the information collected by each area to all other areas. In this way, a router in an area can receive
all LSPs generated in other areas. For the purpose of communication, each area has an area identification.
The area identification of the backbone is zero. Figure 20.23 shows an autonomous system and its
areas.
An IP datagram that carries a message from OSPF sets the value of the protocol field to 89. This means that,
although OSPF is a routing protocol to help IP to route its datagrams inside an AS, the OSPF messages
are encapsulated inside IP datagrams. OSPF has gone through two versions: version 1 and version 2. Most
implementations use version 2.
OSPF Messages
OSPF is a very complex protocol; it uses five different types of messages. In Figure 20.25, the format of
the OSPF common header (which is used in all messages) and the link-state general header (which is used in
some messages) are given. Then, the outlines of five message types used in OSPF are given. The hello
message (type 1) is used by a router to introduce itself to the neighbours and announce all the neighbours
that it already knows. The database description message (type 2) is normally sent in response to
the hello message to allow a newly joined router to acquire the full LSDB. The link-state request
message (type 3) is sent by a router that needs information about a specific LS. The link-state update
message (type 4) is the main OSPF message used for building the LSDB. This message, in fact, has five
different versions (router link, network link, summary link to network, summary link to AS border
router, and external link). The link-state acknowledgment message (type 5) is used to create
reliability in OSPF; each router that receives a link-state update message needs to acknowledge it.
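The five message types and their type numbers can be collected in a small Python enumeration. This is only a sketch for reference: the identifier names and the helper needs_ack are hypothetical, while the type numbers and the IP protocol value 89 are the ones given in the text.

```python
from enum import IntEnum

OSPF_IP_PROTOCOL = 89  # value of the protocol field in the IP datagram header


class OspfMessageType(IntEnum):
    HELLO = 1
    DATABASE_DESCRIPTION = 2
    LINK_STATE_REQUEST = 3
    LINK_STATE_UPDATE = 4
    LINK_STATE_ACK = 5


def needs_ack(message_type):
    """Only a link-state update must be acknowledged by the receiver."""
    return message_type == OspfMessageType.LINK_STATE_UPDATE


for message_type in OspfMessageType:
    print(int(message_type), message_type.name, needs_ack(message_type))
```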
Authentication - As Figure 20.25 shows, the OSPF common header has the provision for
authentication of the message sender. This prevents a malicious entity from sending OSPF messages
to a router and causing the router to become part of the routing system to which it actually does not
belong.
OSPF Algorithm
OSPF implements the link-state routing algorithm we discussed in the previous section. However, some
changes and augmentations need to be added to the algorithm:
❑ After each router has created the shortest-path tree, the algorithm needs to use it to create the
corresponding forwarding table.
❑ The algorithm needs to be augmented to handle sending and receiving all five types of messages.
Performance
The performance of OSPF includes:
❑ Update Messages. The link-state messages in OSPF have a somewhat complex format. They also are
flooded to the whole area. If the area is large, these messages may create heavy traffic and use a lot of
bandwidth.
❑ Convergence of Forwarding Tables. When the flooding of LSPs is completed, each router can create
its own shortest-path tree and forwarding table; convergence is fairly quick. However, each router
needs to run Dijkstra’s algorithm, which may take some time.
❑ Robustness. The OSPF protocol is more robust than RIP because, after receiving the completed
LSDB, each router is independent and does not depend on other routers in the area. Corruption or
failure in one router does not affect other routers as seriously as in RIP.
Figure 4.26 shows an example of an internet with four autonomous systems. AS2, AS3, and AS4 are stub
autonomous systems; AS1 is a transient one. In our example, data exchange between AS2, AS3, and AS4
should pass through AS1. Each autonomous system in Figure 4.26 uses one of the two common
intradomain protocols, RIP or OSPF. Each router in each AS knows how to reach a network that is in its
own AS, but it does not know how to reach a network in another AS. To enable each router to route a
packet to any network in the internet, we first install a variation of BGP4, called external BGP (eBGP), on
each border router (the one at the edge of each AS which is connected to a router at another AS). We then
install the second variation of BGP, called internal BGP (iBGP), on all routers. This means that the border
routers will be running three routing protocols (intradomain, eBGP, and iBGP), but other routers are
running two protocols (intradomain and iBGP). In this section, we introduce the basics of BGP and its
relationship with intradomain routing protocols (RIP or OSPF).
Operation of External BGP (eBGP)
BGP is a kind of point-to-point protocol. When the software is installed on two routers, they try to
create a TCP connection using the well-known port 179. Here, a pair of client and server processes
continuously communicate with each other to exchange messages. The two routers that run the BGP
processes are called BGP peers or BGP speakers. The eBGP variation of BGP allows two physically
connected border routers in two different ASs to form pairs of eBGP speakers and exchange messages. The
routers that are eligible in our example in Figure 4.26 form three pairs: R1-R5, R2-R6, and R4-R9. The
connection between these pairs is established over three physical WANs (N5, N6, and N7).
However, there is a need for a logical TCP connection to be created over the physical connection to
make the exchange of information possible. Each logical connection in BGP parlance is referred to as a
session. This means that we need three sessions in our example, as shown in Figure 4.50.
The cost is taken from the intradomain forwarding tables (RIP or OSPF). Figure 20.30 shows the
interdomain forwarding tables. For simplicity, we assume that all ASs are using RIP as the intradomain
routing protocol. The shaded areas are the augmentation injected by the BGP protocol; the default
destinations are indicated as zero.
Path Attributes
In both intradomain routing protocols (RIP or OSPF), a destination is normally associated with two
pieces of information: next hop and cost. The first one shows the address of the next router to deliver the
packet; the second defines the cost to the final destination. Interdomain routing is more involved and
naturally needs more information about how to reach the final destination. In BGP these pieces are
called path attributes. BGP allows a destination to be associated with up to seven path attributes. Path
attributes are divided into two broad categories: well-known and optional. A well-known attribute
must be recognized by all routers; an optional attribute need not be. A well-known attribute can be
mandatory, which means that it must be present in any BGP update message, or discretionary, which means
it does not have to be. An optional attribute can be either transitive, which means it can pass to the next
AS, or intransitive, which means it cannot. All attributes are inserted after the corresponding
destination prefix in an update message (discussed later). The format for an attribute is shown in Figure
4.31.
has no value field, which means the value of the length field is zero.
❑ AGGREGATOR (type 7). This is an optional transitive attribute, which emphasizes that the
destination prefix is an aggregate. The attribute value gives the number of the last AS that did the
aggregation followed by the IP address of the router that did so.
Route Selection
In the case where multiple routes are received to a destination, BGP needs to select one among them. The
route selection process in BGP is not as easy as the one in an intradomain routing protocol, which is based on
the shortest-path tree. A route in BGP has some attributes attached to it and it may come from an eBGP
session or an iBGP session. Figure 4.32 shows the flow diagram as used by common implementations. The
router extracts the routes which meet the criteria in each step. If only one route is extracted, it is selected and
the process stops; otherwise, the process continues with the next step. Note that the first choice is related to
the LOCAL-PREF attribute, which reflects the policy imposed by the administration on the route.
Messages
BGP uses four types of messages for communication between the BGP speakers across the ASs and inside
an AS: open, update, keepalive, and notification (see Figure 4.33). All BGP packets share the same
common header.
❑ Open Message. To create a neighborhood relationship, a router running BGP opens a TCP
connection with a neighbor and sends an open message.
❑ Update Message. The update message is the heart of the BGP protocol. It is used by a router to
withdraw destinations that have been advertised previously, to announce a route to a new
destination, or both. Note that BGP can withdraw several destinations that were advertised before, but
it can only advertise one new destination (or multiple destinations with the same path attributes) in a
single update message.
❑ Keepalive Message. The BGP peers that are running exchange keepalive messages regularly (before
their hold time expires) to tell each other that they are alive.
❑ Notification. A notification message is sent by a router whenever an error condition is detected or a
router wants to close the session.
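The rule that a single update message can withdraw several destinations but can advertise only destinations that share the same path attributes can be sketched in Python. The message type numbers below are the ones defined in the BGP-4 specification and do not appear in the text above; the class UpdateMessage, the function build_updates, and the example destinations and attribute values are hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum

BGP_TCP_PORT = 179  # well-known port used by BGP sessions


class BgpMessageType(IntEnum):
    OPEN = 1
    UPDATE = 2
    NOTIFICATION = 3
    KEEPALIVE = 4


@dataclass
class UpdateMessage:
    withdrawn: list = field(default_factory=list)         # destinations no longer reachable
    path_attributes: dict = field(default_factory=dict)   # shared by all destinations below
    destinations: list = field(default_factory=list)


def build_updates(withdrawn, routes):
    """Group announced destinations so that each update carries one attribute set."""
    groups = {}
    for dest, attrs in routes.items():
        groups.setdefault(tuple(sorted(attrs.items())), []).append(dest)
    messages = [UpdateMessage(path_attributes=dict(key), destinations=dests)
                for key, dests in groups.items()]
    if withdrawn and not messages:
        messages.append(UpdateMessage())
    if messages:
        messages[0].withdrawn = list(withdrawn)  # all withdrawals fit in one message
    return messages


routes = {
    "N1": {"NEXT_HOP": "R1", "AS_PATH": ("AS1",)},
    "N2": {"NEXT_HOP": "R1", "AS_PATH": ("AS1",)},
    "N3": {"NEXT_HOP": "R1", "AS_PATH": ("AS1", "AS2")},
}
messages = build_updates(["N9"], routes)
for message in messages:
    print(message.withdrawn, message.destinations, message.path_attributes)
```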