
Operating Systems (234123)

Networking

Dan Tsafrir (2022-05-23 one hour; 2022-05-30)

OS (234123) - networking 1
Preface

NETWORK PROTOCOLS

Protocol definition
• Communicating parties are
– Host machines (computers) & processes
• A network communication protocol is a set of rules defining
– Format & order of messages sent & received
– Action taken upon message transmission & reception
• All network communication activity
– Is governed by protocols
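To make "format of messages" concrete, here is a minimal sketch of a made-up message layout: a 1-byte type, a 2-byte payload length in network byte order, then the payload. The layout and the names `msg_encode`, `msg_decode`, and `MSG_HDR_LEN` are assumptions for illustration only; they don't belong to any real protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSG_HDR_LEN 3  /* hypothetical header: 1B type + 2B length */

/* Serialize (type, payload) into out.
 * Returns the total number of bytes written, or 0 if out is too small. */
size_t msg_encode(uint8_t type, const void *payload, uint16_t len,
                  uint8_t *out, size_t cap)
{
    assert(out != NULL);
    if (cap < (size_t)MSG_HDR_LEN + len)
        return 0;
    out[0] = type;
    out[1] = (uint8_t)(len >> 8);     /* big-endian = network byte order */
    out[2] = (uint8_t)(len & 0xff);
    if (len > 0)
        memcpy(out + MSG_HDR_LEN, payload, len);
    return (size_t)MSG_HDR_LEN + len;
}

/* Parse one message from in[0..n).
 * Returns 0 on success, -1 if the header or payload is truncated. */
int msg_decode(const uint8_t *in, size_t n, uint8_t *type,
               const uint8_t **payload, uint16_t *len)
{
    if (n < MSG_HDR_LEN)
        return -1;
    *type = in[0];
    *len  = (uint16_t)((in[1] << 8) | in[2]);
    if (n - MSG_HDR_LEN < *len)
        return -1;
    *payload = in + MSG_HDR_LEN;
    return 0;
}
```

The "action taken upon reception" part of a protocol would then be whatever the receiver does with a successfully decoded message, e.g., dispatching on `type`.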

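Network protocol stack consists of layers
• Layer 5 (L5): e.g., HTTP, SSH
• Layer 4 (L4): e.g., TCP, UDP
• Layer 3 (L3) = IP
• Layer 2 (L2): e.g., Ethernet, FC
• Layer 1 (L1): hardware
• L5 is (typically) implemented at user level; the layers below it, (typically) in the kernel
• Each layer carries the message of the layer above it as its payload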

Higher-level view on

APPLICATION-LAYER (L5) & TRANSPORT-LAYER (L4) PROTOCOLS

Application-layer (L5) protocols
• When the protocol is determined by the app
– Many examples, here are a few…
• Standard (protocol determined by multiple organizations)
– HTTP, HTTPS (web browsing)
– SMTP, IMAP, POP (email)
– VoIP (voice)
– iCalendar (scheduling)
– NFS (distributed filesystem)
– SSH (secure shell)
– Bitcoin (cryptocurrency)
• Proprietary (single organization; still, can be open)
– Microsoft Exchange (mail & scheduling)
– Skype, Zoom (mostly video conferencing)
– WhatsApp, Telegram (mostly text messaging)
• And you can easily invent your own, as needed

Lossiness
• When considering protocols, we should be aware that
– Networks are inherently lossy
– What’s sent won’t necessarily reach its destination
• For example, because
– Bits specifying destination get flipped due to electrical problems
– Network elements (routers, switches, APs, endpoints) malfunction
– Network cables get severed

Lossiness
• But more frequently, loss occurs because
– Memory buffer space in network elements temporarily runs out
• For example, consider the incast problem
– Where many elements simultaneously send data to one element, beyond its processing / bandwidth capacity
• Over the net
– Loss happens all the time
[Figure: server 1 … server n connected via switch 2 and switch 1]

Reordering
• Also, when considering protocols, we should be aware that
– Networks may mess up order of messages
• For example, because
– Multiple paths between source & destination may exist
– Messages may get sent through different paths

[Figure: a server, a switch, and a router, with multiple paths between source & destination]

Transport-layer (L4) protocols
• Protocols in this layer
– Provide host-to-host communication service for processes
– (Typically) implemented by the OS
• Why would we want the user to implement it?
– Example: DPDK (= Data Plane Development Kit)
• Why would we want the NIC (hardware) to implement it?
– Example: RDMA (= remote DMA)
• There are many examples, but 2 are used significantly more
– TCP (usually) & UDP (occasionally)
• Implemented by all OSes
• Account for the vast majority of internet traffic

TCP (transmission control protocol)
• The protocol that cares…
– It cares about data loss & reordering
(acronym should’ve been “Transmission that Cares Protocol” 😉)
• Provides
– Stream of bytes abstraction, namely
– It ensures that all bytes arrive, in order, at the receiving app
• Said to be “connection-oriented”
– A communication ‘session’ between the two parties must be
negotiated/established before data transmission can begin
• How does TCP implement its nice properties?
– Here’s an oversimplified explanation,
to provide intuition,
in a nutshell…

TCP (transmission control protocol)
• Sender
– Split bytes into contiguous chunks (called “segments”)
• Each chunk specifies which bytes [from … to]
– Keep chunks around
• Until receiver acknowledges receiving them
– Resend un-ack-ed chunks
• That haven’t been ack-ed for some time, or
• That the receiver says it’s missing
– Slow down transmission rate
• When noticing sent chunks don’t reach receiver
• Gradually speed up otherwise
– Congestion control = changing rate in response to drops
• Tries to address buffering problem
– https://en.wikipedia.org/wiki/TCP_congestion_control
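One common way to realize "slow down on drops, gradually speed up otherwise" is additive-increase / multiplicative-decrease (AIMD). The sketch below is a deliberately simplified illustration of that idea, not the actual logic of any specific TCP implementation; the function name `aimd_step` and the choice of "segments per round trip" as the unit are assumptions.

```c
#include <assert.h>

/* One AIMD update per round trip; cwnd is the congestion window
 * in segments (hypothetical helper, simplified). */
unsigned aimd_step(unsigned cwnd, int loss_detected)
{
    assert(cwnd >= 1);
    if (loss_detected) {
        unsigned half = cwnd / 2;        /* multiplicative decrease */
        return half >= 1 ? half : 1;     /* never below one segment */
    }
    return cwnd + 1;                     /* additive increase */
}
```

Calling `aimd_step` repeatedly, with an occasional loss, makes the window climb linearly and then drop sharply, over and over.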

TCP (transmission control protocol)
• Receiver
– Send ack messages
• Upon received chunks
– Ask sender to resend
• Chunks that appear to be missing for some time
– “Advertise” free buffer space
• Sender is forbidden to send more than that
– Flow control = advertising free buffer space
• Receiver thereby controls sender transmission rate,
• So that sender won’t transmit too much too fast,
• Preventing overflow of the receiver’s own buffer

– Deliver bytes in order to receiving user-level app
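The receiver-side bookkeeping above can be sketched as a small reordering buffer: segments may arrive in any order, but the app only ever gets the in-order prefix. This is an intuition-level sketch under simplifying assumptions (a fixed `WIN`-byte buffer, no sequence-number wraparound); `struct rx`, `rx_segment`, and `rx_deliver` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WIN 64  /* hypothetical receive-buffer size, in bytes */

struct rx {
    uint32_t next;        /* stream offset of the next byte the app expects */
    uint8_t  data[WIN];   /* data[i] holds stream byte (next + i) */
    uint8_t  have[WIN];   /* have[i] != 0 iff that byte has arrived */
};

/* Buffer segment [from, from+len); bytes outside the window are dropped. */
void rx_segment(struct rx *r, uint32_t from, const void *seg, uint32_t len)
{
    const uint8_t *p = seg;

    assert(r != NULL && (seg != NULL || len == 0));
    for (uint32_t i = 0; i < len; i++) {
        uint32_t off = from + i;
        if (off < r->next || off - r->next >= WIN)
            continue;                     /* duplicate, or beyond the buffer */
        r->data[off - r->next] = p[i];
        r->have[off - r->next] = 1;
    }
}

/* Hand the in-order prefix to the app; returns the number of bytes delivered.
 * Afterwards, r->next is the cumulative ack the receiver would send back. */
size_t rx_deliver(struct rx *r, uint8_t *out, size_t cap)
{
    size_t n = 0;

    while (n < WIN && n < cap && r->have[n])
        n++;
    memcpy(out, r->data, n);
    memmove(r->data, r->data + n, WIN - n);   /* slide the window forward */
    memmove(r->have, r->have + n, WIN - n);
    memset(r->have + (WIN - n), 0, n);
    r->next += (uint32_t)n;
    return n;
}
```

In this sketch, the free space the receiver could "advertise" is whatever part of the `WIN` bytes isn't occupied by buffered, not-yet-delivered data.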

TCP (transmission control protocol)
[Figure: upload throughput [Mbps] and ping latency [ms] over time [sec], for the BBR, Cubic, and Reno congestion control algorithms, using the flent.org upload benchmark. Source: http://blog.cerowrt.org/post/bbrs_basic_beauty ; Sep 2016]
TCP “sawtooth” behavior with various congestion control algorithms; occurs because drops are used to sense congestion. (BBR proposed by Google in 2016.)

UDP (user datagram protocol)
• The protocol that doesn’t care
– Neither about data loss nor about message reordering
– Best effort service
• Provides
– “Datagram” abstraction (as opposed to “data-stream” or just “stream”)
• Datagram = chunk of bytes (still called “segment” here)
– Chunks might get lost or be delivered out-of-order
• But per-chunk bytes integrity is supported (with checksum)
– Apps decide if they’re okay with that
• Compared to TCP
– Simpler, lower latency, no congestion control (so can blast away)
• Said to be “connectionless”
– No negotiation to establish communication ‘session’
– Each chunk handled independently of others
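The per-chunk integrity check mentioned above is a 16-bit ones' complement checksum. The sketch below shows only the core summation (in the style of RFC 1071); real UDP also covers a pseudo-header, which is omitted here. The function name `inet_checksum` is an assumption, and the 32-bit accumulator is only safe for inputs up to roughly 64 KB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement checksum over buf[0..len). */
uint16_t inet_checksum(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t sum = 0;

    assert(buf != NULL || len == 0);
    while (len > 1) {                     /* add 16-bit big-endian words */
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1)                         /* odd trailing byte, zero-padded */
        sum += (uint32_t)(p[0] << 8);
    while (sum >> 16)                     /* fold carries into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A receiver can verify a chunk by running the same function over the data with the checksum appended: an intact chunk yields 0.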

Higher level view on

IP, THE NETWORK-LAYER (L3) PROTOCOL

Domain & host names
• Computers are associated with hierarchical human-readable
“domain” names, sometimes referred to as host names
– www.cs.technion.ac.il, csa.cs.technion.ac.il, csm.cs.technion.ac.il
• Each of these is a name of a single host machine
• So, they’re indeed “host names”, but…
– www.google.com, www.amazon.com, www.cnn.com, www.ynet.co.il
• Each of these is backed by multiple host machines, and…
– hagit.net.technion.ac.il, benny.net.technion.ac.il
• Each of these identifies a website in host net.technion.ac.il
• So “domain” is a more general term
– https://en.wikipedia.org/wiki/Domain_name
• In this lecture, we’ll typically use the term host name, assuming
– That it corresponds to a single host machine

IP addresses
(lecture ended here)

• Domain names are only for humans


– They’re not actually used to transmit data around
• Instead, each host is mapped to a 32-bit IP address
– Obtained via a domain name resolution protocol (called DNS; see later)
– Caveat: for us, “IP” usually means IPv4 = “Internet Protocol version #4”
• There’s also IPv6 (128-bit addresses), which we’ll mostly ignore
• Each IP address corresponds to a single host
– BTW, unlike a domain name, which may correspond to multiple hosts
• E.g., if its DNS is set to balance the load and resolve to multiple IPs
– The opposite is not true
• One host may be associated with multiple IPs (e.g., when it has
multiple networking devices, such as wireless and wired)
– But for simplicity, let’s ignore all that for the time being and
• Assume that there’s a 1-to-1 host-to-IP mapping

IP addresses
• Presented as four 8-bit decimal octets separated by dots
– For example, the IP of csa.cs.technion.ac.il is
Notation                 IP address
Decimal                  2219057153 (not terribly convenient)
Binary (8x4 = 32 bits)   10000100 | 01000100 | 00100000 | 00000001
                         (= 132 | 68 | 32 | 1)
Dot-decimal              132.68.32.1 (a bit more convenient)

• The protocol that implements IP addresses is… IP


– (Recall that IP = internet protocol)
– It’s the network-layer (L3) protocol
– On top of which TCP & UDP (L4) are implemented
• IP identifies host machines
• TCP/UDP identify individual communication channels of processes
on the host, as we’ll see next

How computers communicate – programmer’s perspective

SOCKETS & PORTS (BACK TO L4), CLIENT-SERVER

Sockets
• The parties that communicate across the network are
– Apps that run on hosts (e.g., browsers, http server)
• They communicate through
– Socket file descriptors, created via the socket() syscall,
instead of open()
– A sockfd constitutes a communication endpoint
• read()-ing and write()-ing through socket fds
– Translate to receiving & sending data (duplex)
• There are also more specific system calls (h/w: browse man)
– ssize_t send(int sockfd, const void *buf, size_t len, int flags)
– ssize_t recv(int sockfd, void *buf, size_t len, int flags)
• And their scatter-gather versions (h/w: browse man)
– ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags)
– ssize_t recvmsg(int sockfd, struct msghdr *msg, int flags)
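To see socket fds acting as duplex endpoints without any network setup, the sketch below uses `socketpair()`, which returns two already-connected socket fds on the same machine. Note the assumption: it uses local `AF_UNIX` sockets rather than TCP, purely to keep the example self-contained; `socket_roundtrip` is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* send() msg into one end of a connected socket pair and recv() it
 * from the other end. Returns the number of bytes received, -1 on error. */
ssize_t socket_roundtrip(const char *msg, char *out, size_t cap)
{
    int sv[2];
    ssize_t n = -1;

    assert(msg != NULL && out != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (send(sv[0], msg, strlen(msg), 0) == (ssize_t)strlen(msg))
        n = recv(sv[1], out, cap, 0);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Replacing `send()`/`recv()` with `write()`/`read()` would behave the same here, since no flags are used.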

Ports
• On the same host, there can be
– Multiple communicating processes, each utilizing multiple sockfds
– IP addresses aren’t a sufficient identifier for transmitted data chunks
• To disambiguate, each sockfd is associated with a
– Port, unsigned 16-bit integer that identifies the sockfd
– Every transmitted chunk is associated with IPaddress + port
• Ports can be either
– Ephemeral (temporary; short-lived) = dynamically allocated by the kernel, or
– Well-known = predetermined standard/known values
• Ports ≤ 1023 are “reserved” (for privileged processes)
• For example, http & https traffic flows via ports 80 & 443
– http://www.google.com <=> http://www.google.com:80
– https://www.google.com <=> https://www.google.com:443
• See more well-known ports in
– https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers
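Ephemeral allocation can be observed directly: binding a socket to port 0 asks the kernel to pick a port, and `getsockname()` reveals which one it chose. A minimal sketch, assuming IPv4; the helper name `kernel_chosen_port` is hypothetical, and the range the kernel picks from is OS-dependent.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a fresh TCP socket to port 0 so that the kernel chooses the port.
 * Returns the chosen port (host byte order), or 0 on failure. */
uint16_t kernel_chosen_port(void)
{
    struct sockaddr_in a;
    socklen_t alen = sizeof(a);
    uint16_t port = 0;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return 0;
    memset(&a, 0, sizeof(a));
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_ANY);
    a.sin_port = htons(0);                 /* 0 = let the kernel choose */
    if (bind(fd, (struct sockaddr *)&a, sizeof(a)) == 0 &&
        getsockname(fd, (struct sockaddr *)&a, &alen) == 0) {
        assert(alen == sizeof(a));
        port = ntohs(a.sin_port);
    }
    close(fd);
    return port;
}
```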

Example

• Chrome (browser, client), running on csa.cs.technion.ac.il (132.68.32.1)
– Socket fd endpoint = 132.68.32.1 : 22011
– (22011 is some ephemeral port allocated by the kernel)
• Apache (webserver), running on www.cs.technion.ac.il (132.68.32.15)
– Socket fd endpoint = 132.68.32.15 : 80
– (80 is a well-known port for http service requests)
• The two endpoints form a socket pair
– Chrome sends the connect request to Apache

Socket unique identifier: 5-tuple
• By, e.g., RFC 6146
https://datatracker.ietf.org/doc/html/rfc6146#section-2

“A 5-tuple

( source IP address, // 1
source port, // 2
destination IP address, // 3
destination port, // 4
transport protocol ) // 5

uniquely identifies a UDP/TCP [socket connection] session.”

• Hence, multiple fd-s at the source or destination machines may be associated with the same port number

Client-server model
Server process (e.g., Apache)
• Always on (both host & process)
• Passively waits for clients to request service, then reacts (“request-response” paradigm)
• Has well-known domain name (and/or IP address) & port
Client process (e.g., Chrome)
• Asynchronously / intermittently connects to server & sends request(s)
• May have dynamic IP
• May use ephemeral port
[Figure: timeline of messages exchanged between client and server]

Simplistic echo server to exemplify the above

TOP-DOWN EXAMPLE
Code available in
http://www.cs.technion.ac.il/~dan/course/os/sock.c

Our echo application-layer protocol
Step 1
– Simplistic echo client: Connect to the server using TCP
– Simplistic echo server: Accept connection from a new client
Step 2
– Client: Write (= send) a short byte-sequence message to server
– Server: Read (= receive) up to N characters from client
Step 3
– Client: Read (= receive) from server at most N characters
– Server: Write (= send = echo) these characters back to the client
Step 4
– Client: Print the received characters
– Server: Print client’s host name
Step 5
– Client: Exit
– Server: Go to 1

/*
* Recall that, for simplicity, we’re invoking syscalls
* through DO_SYS, which exit()s upon failure with the
* appropriate error message.
*
* There are advantages to reporting error via return
* values, but we opt for simplicity here.
*/
#define DO_SYS(syscall) do {      \
    if( (syscall) == -1 ) {       \
        perror( #syscall );       \
        exit(EXIT_FAILURE);       \
    }                             \
} while( 0 )

/*
* Implement the protocol’s server side. The host of this
* server and the given port must be known to the client
*/
void echo_server(uint16_t port)
{
    const int N=256;
    char buf[N];
    int k, clifd, srvfd = tcp_establish(port);

    for(;;) {
        DO_SYS( clifd = accept(srvfd, NULL, NULL) );
        DO_SYS(     k = read  (clifd, buf, N)     );
        DO_SYS(         write (clifd, buf, k)     );
        print_peer("client from:", clifd); // client’s host
        DO_SYS(         close (clifd)             );
    }
}

/*
* Implement the protocol’s client side. The given host
* and port must identify the server
*/
void echo_client(const char *host, uint16_t port)
{
    char buf[256], msg[] = "hello\n";
    int k, fd = tcp_connect(host, port);

    DO_SYS(     write(fd, msg, strlen(msg))   );
    DO_SYS( k = read (fd, buf, sizeof(buf))   );
    DO_SYS(     write(STDOUT_FILENO, buf, k)  );
    DO_SYS(     close(fd)                     );

    exit(EXIT_SUCCESS);
}

How were the TCP socket fds created?
• Different for
– Client and server
• But both need to
– Properly initialize a struct addrinfo via getaddrinfo() (a library function rather than a syscall)
– Invoke socket() using values from addrinfo
• Which returns the sockfd
• addrinfo encapsulates all the required info, including
– Protocol family (IPv4 or IPv6)
– Protocol (TCP)
– Socket type (stream)
– Whether this should be a server or client

Creating TCP sockfds: getaddrinfo
int getaddrinfo( const char* node /*host (in our case)*/,
                 const char* service /*port (in our case)*/,
                 const struct addrinfo* hint /*input*/,
                 struct addrinfo** res /*output, free with freeaddrinfo*/);

argument                             in server                 in client
directly         node                NULL (= this host)        server’s host name
                 service             server’s well-known port (in both)
input via hint   hint.ai_flags       AI_PASSIVE (= server)     0 (= client)
                 hint.ai_family      AF_UNSPEC (IP version unspecified) (in both)
                 hint.ai_protocol    IPPROTO_TCP (in both)
                 hint.ai_socktype    SOCK_STREAM (in both)
output via res   (*res)->ai_family   either IPv4 or IPv6
                 (*res)->ai_addr     struct sockaddr*, encapsulates IP+port
                 (*res)->ai_addrlen  length of *ai_addr
• Thus, programmers can remain unaware of which IP version is being used by
this particular host
– v4 or v6
• And they don’t need to worry about struct sockaddr allocation
– Has different size for different IP version
• Why do we need both ai_protocol and ai_socktype?
– Other protocols may implement a stream abstraction too
– Notably, non-default TCP versions specialized for data centers
• LAN (local area network), as opposed to WAN (wide area network)
• Where latencies are lower (as low as microseconds)

Creating TCP sockfds: getaddrinfo
struct addrinfo*
alloc_tcp_addr(const char *host, uint16_t port, int flags)
{
    int err; struct addrinfo hint, *a; char ps[16];

    snprintf(ps, sizeof(ps), "%hu", port); // why string?

    memset(&hint, 0, sizeof(hint));
    hint.ai_flags    = flags;
    hint.ai_family   = AF_UNSPEC;
    hint.ai_socktype = SOCK_STREAM;
    hint.ai_protocol = IPPROTO_TCP;

    if( (err = getaddrinfo(host, ps, &hint, &a)) != 0 ) {
        fprintf(stderr,"%s\n", gai_strerror(err));
        exit(EXIT_FAILURE);
    }

    return a; // should later be freed with freeaddrinfo()
}

Creating TCP sockfds: server
• A sequence of 4 syscalls

– srvfd = socket( protocol family /*IP version*/,
                  connection type /*stream in our example*/,
                  protocol /*TCP in our example*/ );

– bind( srvfd,
        /*to*/ well-known port associated with server );

– listen( /*on*/ srvfd /*for incoming requests directed at port, */,
          /*allowing*/ backlog /*of un-accept()ed pending requests, */
          /*at the most; this syscall actually transforms srvfd to a server fd*/
          /*able to accept() new connections (= create clifd-s)*/ );

– clifd = accept( srvfd ); /* new ephemeral fd for each client connect() request */

Creating+using TCP sockfd-s: both sides
server                      message                           client
srvfd = socket(…)                                             sockfd = socket(…)
bind( srvfd , port )
listen( srvfd )
loop:
  clifd = accept( srvfd )   <= request service (transport)    connect( sockfd , host+port )
  read( clifd )             <= request (application)          write( sockfd )
  write( clifd )            => response (application)         read( sockfd )

Creating TCP sockfds: server
/*
* Return server fd (on this host) that listen()s on port
*/
int tcp_establish(uint16_t port)
{
    int srvfd;
    struct addrinfo *a =
        alloc_tcp_addr(NULL/*host*/, port, AI_PASSIVE);

    DO_SYS( srvfd = socket( a->ai_family,
                            a->ai_socktype,
                            a->ai_protocol ) );
    DO_SYS( bind( srvfd,
                  a->ai_addr,
                  a->ai_addrlen ) );
    DO_SYS( listen( srvfd,
                    5/*backlog*/ ) );
    freeaddrinfo( a );
    return srvfd;
}

Creating TCP sockfds: client
/*
* Return client fd connect()ed to host+port
*/
int tcp_connect(const char* host, uint16_t port)
{
    int clifd;
    struct addrinfo *a = alloc_tcp_addr(host, port, 0);

    DO_SYS( clifd = socket( a->ai_family,
                            a->ai_socktype,
                            a->ai_protocol ) );
    DO_SYS( connect( clifd,
                     a->ai_addr,
                     a->ai_addrlen ) );

    freeaddrinfo( a );
    return clifd;
}

Getting information about other side
/*
 * Print hostname of peer associated with sockfd
 */
void print_peer(const char *msg_prefix, int sockfd)
{
    struct sockaddr_storage store;                  // big enough for any sock
    socklen_t alen = sizeof(store);                 // needed by getpeername
    char peer[HOST_NAME_MAX+1]={0};                 // name of peer
    int err, mlen = strlen(msg_prefix)+8;           // 'msg' needs to be bigger
    char msg[sizeof(peer)+mlen];                    // use this for printing (a VLA, so no initializer)
    struct sockaddr *a = (struct sockaddr*)&store;  // base class

    DO_SYS( getpeername(sockfd, a, &alen) );        // fills a+alen

    if((err=getnameinfo(a,alen,peer,sizeof(peer),NULL,0,0))) {
        fprintf(stderr,"%s\n", gai_strerror(err));
        exit(EXIT_FAILURE);
    }
    snprintf(msg, sizeof(msg), "%s %s\n", msg_prefix, peer);
    DO_SYS( write(STDOUT_FILENO, msg, strlen(msg)) );
}

Example aftermath
(lecture ended here)

• Support very long per-client TCP messages?


– Easily: server reads + writes (=echoes) in a loop
– Until client close()s its sockfd endpoint
• When this happens, read(clifd) returns 0
• Problem
– What if some clients finish quickly
(echo short message)
whereas others take a long time?
(echo very long message, or send message chunks slowly)
=> Convoy effect
• Solution
– Concurrency: serve multiple clifd-s simultaneously, in RR order
• But how will that work, technically?
– Need to use I/O multiplexing syscall: select() or poll() or epoll()
• Get a set of fd-s; return a subset of ready fd-s that won’t block
– See, e.g., https://devarea.com/linux-io-multiplexing-select-vs-poll-vs-epoll/
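The "get a set of fd-s, return the ready ones" idea can be sketched with `poll()`. The helper below waits on several fds and then performs one `read()` on the first ready fd, which therefore won't block. It is a simplified sketch, not a full server loop; `serve_first_ready` and `MAX_FDS` are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_FDS 16  /* hypothetical cap; keeps the sketch allocation-free */

/*
 * Wait up to timeout_ms for any of the n fds to become readable, then
 * read() once from the first ready one. Returns the index of that fd
 * and stores the byte count in *got; returns -1 on timeout or error.
 */
int serve_first_ready(const int *fds, int n, char *buf, size_t cap,
                      ssize_t *got, int timeout_ms)
{
    struct pollfd p[MAX_FDS];

    assert(n > 0 && n <= MAX_FDS);
    for (int i = 0; i < n; i++) {
        p[i].fd = fds[i];
        p[i].events = POLLIN;
        p[i].revents = 0;
    }
    if (poll(p, (nfds_t)n, timeout_ms) <= 0)
        return -1;
    for (int i = 0; i < n; i++) {
        if (p[i].revents & POLLIN) {
            *got = read(fds[i], buf, cap);   /* data is ready: won't block */
            return i;
        }
    }
    return -1;
}
```

An echo server built this way would call the helper in a loop over all its clifd-s, echoing whatever it read back to the corresponding client, so a slow client can't stall the others.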
Example aftermath
• Can we use UDP for our simple echo server (instead of TCP)?
– If we assume each message fits into one segment
• No need to handle reordering
– But loss is an issue, so
• Client should set up an alarm, and
• Retransmit when it expires, if server response hasn’t yet arrived
• What if messages don’t fit in one UDP segment?
– Need multiple segments (=messages), so
– Requests & responses should be numbered due to possible reordering
– Client must then handle out-of-order echo replies
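The retransmit-on-timeout logic for a single-segment UDP echo client could look roughly as follows. Note the swapped-in technique: instead of an alarm, this sketch waits with a `poll()` timeout, which achieves the same effect with less machinery. It assumes `fd` is an already-connected datagram socket; `request_with_retry` is a hypothetical name.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Send req and wait up to timeout_ms for a reply; retransmit on timeout.
 * Returns the reply length, or -1 if no reply arrived after max_tries. */
ssize_t request_with_retry(int fd, const void *req, size_t len,
                           void *reply, size_t cap,
                           int timeout_ms, int max_tries)
{
    assert(max_tries > 0);
    for (int attempt = 0; attempt < max_tries; attempt++) {
        struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };

        if (send(fd, req, len, 0) == -1)
            return -1;
        if (poll(&p, 1, timeout_ms) > 0 && (p.revents & POLLIN))
            return recv(fd, reply, cap, 0);
        /* timeout expired: loop around and retransmit */
    }
    return -1;
}
```

A retransmitted request may cause the server to echo twice, so a more careful client would also discard stale duplicate replies.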

PHYSICAL-LAYER (L1) & LINK-LAYER (L2) PROTOCOLS

LAN (local area network) connectivity
• LAN examples
– Computers in your home
– CS server room
– Computers where you work
– Computers in a data center (may include multiple LANs)
• Devices in the LAN are connected
– With wires, or wirelessly, or both

LAN connectivity
• Each LAN device is equipped with a NIC
– Network Interface Controller
• Which is connected to one or more switches
– Possibly via a Wi-Fi AP
[Figure: a WLAN NIC, an RJ45 (wired) NIC, a Wi-Fi access point (AP), and a switch]

LAN connectivity
• Question
– What’s the protocol LAN nodes use to communicate?
• Nodes = hosts, NICs, APs, switches, phones, printers, …
– What’s the protocol that flows through the wires?
– What’s the protocol that flows on top of Wi-Fi?
– What’s the native “language” that all these components “speak”?
• Both hardware components (nodes)
• And software components (OS device drivers that speak to nodes)
• Answer
– Most frequently, it’s Ethernet
– We’ll focus on it in this lecture

Ethernet (IEEE 802.3 standard)
• A combination of
– Hardware, firmware, and (OS) software
• Most dominant wired LAN technology
– Simple, cheap, fast
• 10 Gb/sec (Gbps) is probably most widely deployed in datacenters
• 40 Gbps & 100 Gbps commodity
• 200 Gb/sec available, 400 Gb/sec around the corner
• Ethernet is both a link-layer (L2) protocol
– Allows nodes (not processes) to communicate within the LAN
– By sending “frames” (what byte-chunks are called in the link-layer)
• And a physical-layer (L1) protocol
– Lowest protocol layer (EE realm)
– Defines how raw bits (rather than frames) are transmitted
– Defines the transmission media
• For Ethernet, there are several (twisted pairs, fiber, cable, …)

Ethernet (IEEE 802.3 standard)
• Network features
– Connectionless & unreliable (lossy)
– Ethernet frames can fail to reach their destination
• Connection to transport-layer protocols (TCP/UDP)?
– Within the LAN, hardware components speak Ethernet, not TCP/UDP
– They don’t typically understand TCP/UDP (oversimplified)
– It is the OS that implements TCP/UDP on top of Ethernet
– For Ethernet nodes, TCP/UDP traffic appears as regular data
– More on that later

MAC (media access control) address
• A unique 48-bit (= 6-byte) number
– Identifies each Ethernet component
– Burned in ROM of NIC / Switch / AP
– Used to switch Ethernet frames to their destination within the LAN
• Notation
– 6 pairs of hex digits, e.g.,
• C8:5B:76:EA:B8:A0
– Of which the first 3 pairs are allocated, exclusively, to the device manufacturer, e.g.,
• CC:46:D6 – Cisco
• 3C:5A:B4 – Google
• 00:9A:CD – HUAWEI
– So, it’s the job of manufacturers to ensure MAC address uniqueness
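The notation above is easy to parse. The sketch below converts the textual form into 6 bytes and extracts the 3-byte manufacturer prefix. The helper names `mac_parse` and `mac_oui` are assumptions, and the parser is lenient (e.g., it accepts single-digit pairs).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse "XX:XX:XX:XX:XX:XX" into 6 bytes.
 * Returns 0 on success, -1 if the text has too few or too many fields. */
int mac_parse(const char *text, uint8_t mac[6])
{
    unsigned b[6];
    char extra;

    assert(text != NULL);
    /* the trailing %c only matches if unexpected characters follow */
    if (sscanf(text, "%2x:%2x:%2x:%2x:%2x:%2x%c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
        return -1;
    for (int i = 0; i < 6; i++)
        mac[i] = (uint8_t)b[i];
    return 0;
}

/* The first 3 bytes are the prefix allocated to the manufacturer. */
uint32_t mac_oui(const uint8_t mac[6])
{
    return ((uint32_t)mac[0] << 16) | ((uint32_t)mac[1] << 8) | mac[2];
}
```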

Ethernet frame format (simplified)
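• Link-layer Ethernet frame: 64B ≤ 18B (header) + 46…1500B (data) ≤ 1518B
• Physical-layer Ethernet packet = preamble + SFD + frame + gap
Field                   Size            Belongs to
preamble                7 bytes         physical-layer packet only
SFD                     1 byte          physical-layer packet only
destination MAC addr.   6 bytes         frame (header)
source MAC addr.        6 bytes         frame (header)
size                    2 bytes         frame (header)
data                    46–1500 bytes   frame (payload)
CRC                     4 bytes         frame (counted in the 18B above)
gap                     12 bytes        physical-layer packet only
• Using frame’s header
– A switch knows where to send the frame, and
– Receiver can identify sender
• SFD = start frame delimiter
• CRC = cyclic redundancy check, an error-detecting code
– Error => frame is dropped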
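The size bounds of the link-layer frame follow from simple arithmetic, sketched below. The names are hypothetical; the one detail added beyond the slide is that payloads shorter than 46 bytes get padded, which is how Ethernet enforces the 64-byte minimum.

```c
#include <assert.h>
#include <stddef.h>

enum {
    ETH_OVERHEAD = 18,    /* 6B dst MAC + 6B src MAC + 2B size + 4B CRC */
    ETH_MIN_DATA = 46,
    ETH_MAX_DATA = 1500
};

/* Link-layer frame size (in bytes) needed to carry payload_len bytes,
 * or 0 if the payload doesn't fit in a single frame. */
size_t eth_frame_size(size_t payload_len)
{
    assert(ETH_OVERHEAD + ETH_MIN_DATA == 64);
    if (payload_len > ETH_MAX_DATA)
        return 0;
    if (payload_len < ETH_MIN_DATA)
        payload_len = ETH_MIN_DATA;      /* short payloads are padded */
    return ETH_OVERHEAD + payload_len;
}
```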

The role of IP, again

WHAT’S THE INTERNET, REALLY?

IP & Routers
• Problem
– LAN hardware components typically “speak” L2 (usually Ethernet)
• A “language” that works exclusively within the LAN
– How then, can node@LAN-A send messages to node@LAN-B?
• Technically, switches@LAN-A can’t talk to switches@LAN-B
• And in addition, note that…

IP & Routers
• Problem
– LAN-B may use a non-Ethernet protocol (e.g., InfiniBand)
– LAN-A may be continents apart from LAN-B
– Getting from LAN-A to LAN-B may require using multiple L1/L2-s, e.g.,
• Telephone, cable, fiber, and/or satellite communication channels

IP & Routers
• Solution
– Router = hardware component
• Physically stands @ edge of at least two networks, LAN1 & LAN2,
• And knows how to communicate with both
– So, the router can speak in the language of
1. L1/L2 of LAN1 & LAN2, and it also speaks
2. IP = the network-layer (L3) protocol that implements IP addresses
– On top of which transport-layer (L4) protocols are built

IP & Routers
• Solution
– Thus, the router can “peel off” L2-headers of LAN1
and re-encapsulate the corresponding data (= IP packets)
within L2-headers of LAN2
– By forwarding IP packets (one step at a time) between networks,
routers eventually route the packets from source to destination

IP & Routers
• Solution
– Thus, the “internet” is an “inter-network”… of networks!
• That’s the origin of the name
• Networks are vertices in the internet graph,
and routers are edges
– Made possible by the IP (L3) protocol
• Which provides a global address space for all hosts in the world
• And routing tables (instructions how to route)

Closing loose ends &

PUTTING IT ALL TOGETHER

The “TCP/IP” protocol stack – recap
• L1 = physical-layer
– How bits are transmitted (EE level)
• L2 = link-layer
– Ethernet (frequently)
– Communication between hosts across a LAN, with MACs
• L3 = network-layer
– IP, which provides global address space
– Communication between hosts across the WAN, with IPs
• L4 = transport-layer
– TCP & UDP (frequently); stream & datagram abstractions
– Communication between processes across the WAN
• L5 = application-layer
– Numerous protocols that utilize L4

Reminder: we started off with encapsulation

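[Figure: the layers L5 down to L1, each encapsulating the message of the layer above it; an L2 Ethernet frame carries 46B to 1500B of data (64B to 1518B with header)]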

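Encapsulation
• Source host
– application: message = M
– transport: segment = Ht M
– network: packet = Hn Ht M
– link: frame = Hl Hn Ht M
– physical
• Switch (on the way)
– link: Hl Hn Ht M
– physical
• Router (on the way)
– network: Hn Ht M
– link: Hl Hn Ht M
– physical
• Destination host
– physical
– link: Hl Hn Ht M
– network: Hn Ht M
– transport: Ht M
– application: M
• Ht, Hn, Hl = headers added by the transport, network, and link layers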

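Forwarding & routing
[Figure: two hosts communicate through two routers]
– HTTP message: exchanged between the HTTP layers of the two hosts
– TCP segment: exchanged between the TCP layers of the two hosts
– IP packet: forwarded hop by hop (host IP => router IP => router IP => host IP); routing determines the path
– Each hop uses its own link: Ethernet frame (host to router), SONET frame (router to router), Ethernet frame (router to host), sent between matching Ethernet / SONET interfaces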

Layering – IP as a narrow waist
• Many application protocols on top of UDP & TCP
• IP works over many types of networks
• This is the “Hourglass” architecture of the Internet
– If every network supports IP, applications may run over many different
networks (cellular, …)

Fragmentation & reassembly
• Link layer has MTU
– = Maximum transmission unit = largest network-layer packet that fits in a single link-layer frame
– Changes for different link types
– For Ethernet: data ≤ 1500B, which is smaller than…
• Maximal IP packet
– 64 KB
• So large IP packets are divided (fragmented)
– One packet becomes several packets
– Each fragment gets its own IP header (IP fragmentation by itself doesn’t replicate the higher-level headers)
• IP packet
– Header contains fragment offset
– Fragmented @ source
– Reassembled @ final destination
– NICs typically know how to fragment & reassemble (“HW acceleration”)
• TSO (= TCP segmentation offload) & LRO (= large receive offload)
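How many fragments a packet needs can be computed with a few lines of arithmetic. The sketch assumes a 20-byte IPv4 header without options, and relies on the fact that the fragment-offset field counts in 8-byte units, so every fragment but the last carries a multiple of 8 data bytes. `ip_fragment_count` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

#define IP_HDR_LEN 20  /* IPv4 header without options */

/* Number of fragments needed to carry payload_len bytes of IP payload
 * over a link with the given MTU. */
size_t ip_fragment_count(size_t payload_len, size_t mtu)
{
    size_t per_frag;

    assert(mtu >= IP_HDR_LEN + 8);
    /* data bytes per fragment: what remains after the IP header,
     * rounded down to a multiple of 8 */
    per_frag = ((mtu - IP_HDR_LEN) / 8) * 8;
    if (payload_len == 0)
        return 1;
    return (payload_len + per_frag - 1) / per_frag;
}
```

For example, a 4000-byte packet (20B header + 3980B payload) sent over Ethernet (MTU 1500) is split into 3 fragments carrying 1480, 1480, and 1020 payload bytes.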

Connection between IP & LANs
• Recall: IP presented as 4 x 8-bit decimal dot-separated octets
– For example, the IP of csa.cs.technion.ac.il is
Notation                 IP address
Decimal                  2219057153 (not terribly convenient)
Binary (8x4 = 32 bits)   10000100 | 01000100 | 00100000 | 00000001
                         (= 132 | 68 | 32 | 1)
Dot-decimal              132.68.32.1 (a bit more convenient)

• Actually, the IP address consists of two parts
– Subnet part: high order bits
– Host part: low order bits
• How is the division determined?
– IP address is always coupled with a corresponding subnet mask

Connection between IP & LANs
• For example, the subnet mask of
– csa.cs.technion.ac.il (132.68.32.1) is the 24 most significant bits
10000100 | 01000100 | 00100000 | 00000001
|--------- subnet part --------| host part
• CIDR format
– CIDR = classless inter-domain routing
– a.b.c.d/x, where x is the number of bits in the subnet portion
– For csa, this is 132.68.32.1/24
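The subnet/host split is plain bit masking, as the following sketch shows. Addresses are 32-bit integers in host byte order; the helper names `cidr_mask`, `subnet_part`, and `host_part` are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Subnet mask with the top `prefix` bits set, e.g., 24 -> 255.255.255.0. */
uint32_t cidr_mask(unsigned prefix)
{
    assert(prefix <= 32);
    /* shifting a 32-bit value by 32 is undefined in C, hence the special case */
    return prefix == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix);
}

uint32_t subnet_part(uint32_t ip, unsigned prefix)
{
    return ip & cidr_mask(prefix);
}

uint32_t host_part(uint32_t ip, unsigned prefix)
{
    return ip & ~cidr_mask(prefix);
}
```

For 132.68.32.1/24 (0x84442001), the subnet part is 132.68.32.0 and the host part is 1.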

Connection between IP & LANs
• Meaning of subnet
– All nodes in subnet can physically reach each other
• Without intervening router!
• Through switches only
– Namely, the IP subnet
• Defines the boundaries of the Ethernet LAN

ARP – address resolution protocol: IP => MAC
• IP addresses of other machines are known
– Or can be discovered using domain names and DNS
• X knows the IP of Y, with which it wants to connect
– How does X discover Y’s MAC?
• The network learns (plug & play) as follows
– X broadcasts ARP query packet
• Contains Y’s IP
• Destination MAC = FF:FF:FF:FF:FF:FF
– All nodes on LAN receive the query
– Y, which knows its MAC address, sends it back to X (unicast)
• If Y is outside the LAN
– X gets the MAC of the first-hop router,
which can forward it onwards based on its IP
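The choice of whose MAC to ask for can be sketched as a single comparison of subnet parts. This only illustrates the decision, not ARP itself; `arp_target` is a hypothetical name, and addresses are 32-bit integers in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* IP address whose MAC the sender must resolve via ARP:
 * dst itself if it is on the sender's subnet, else the first-hop router. */
uint32_t arp_target(uint32_t src, uint32_t dst, unsigned prefix,
                    uint32_t gateway)
{
    uint32_t mask;

    assert(prefix <= 32);
    mask = prefix == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix);
    return ((src ^ dst) & mask) == 0 ? dst : gateway;
}
```

With X = 132.68.32.1/24, a destination of 132.68.32.15 is resolved directly, whereas a destination outside the subnet leads X to resolve the MAC of its first-hop router instead.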

NAT – network address translation
• At home, for example, your router box
– (which is also a switch, an AP, and a modem)
– Typically, has only one IP, allocated to you by your ISP
• But you have many devices in your home LAN
– Which interact with WAN of the outside world (e.g., by browsing)
– How can this be?
• The IP standard defines nearly 18 million “private” IP addresses
– 10.0.0.0/8 (= 10.*.*.*)
– 172.16.0.0/12 (= 172.16.0.0 – 172.31.255.255)
– 192.168.0.0/16 (= 192.168.*.*)
• Your router box dynamically assigns such IPs
– To all your home LAN devices
• And it rewrites the IP + port of outgoing/incoming segments
– Accordingly
– (Other routers will refuse to forward private IPs)
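Checking whether an address is private, and rewriting the source of an outgoing segment, can be sketched as below. This is a simplified illustration: a real NAT box also keeps a table mapping each rewritten port back to the internal IP + port, so that replies can be translated in the opposite direction. The names `ipv4_is_private` and `nat_rewrite_source` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True iff ip (host byte order) lies in one of the three private ranges. */
bool ipv4_is_private(uint32_t ip)
{
    return (ip & 0xFF000000u) == 0x0A000000u ||   /* 10.0.0.0/8     */
           (ip & 0xFFF00000u) == 0xAC100000u ||   /* 172.16.0.0/12  */
           (ip & 0xFFFF0000u) == 0xC0A80000u;     /* 192.168.0.0/16 */
}

/* Rewrite the source IP + port of an outgoing segment, as a NAT box would. */
void nat_rewrite_source(uint32_t *ip, uint16_t *port,
                        uint32_t public_ip, uint16_t public_port)
{
    assert(ipv4_is_private(*ip));   /* only private sources need translation */
    *ip = public_ip;
    *port = public_port;
}
```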
DHCP – dynamic host config. protocol: get an IP
• Layer-5 (application) protocol
– Implemented by the DHCP server (= the app)
• In your home, typically, built into your router
• Goal of DHCP server
– Allows client host to dynamically get IP address when joining LAN
– Can also provide
• Subnet mask (indicating net vs. host portion of IP address)
• IP of client’s first-hop router (for IP destination outside LAN)
• Name + IP of DNS server
• Uses UDP
– Which provides the ability to broadcast to all hosts in LAN
– (Which in turn uses Ethernet’s ability to broadcast in LAN)

DHCP – dynamic host configuration protocol

▪ Connecting laptop needs IP address (+ IPs of first-hop router & DNS server)
  => Acts as a DHCP client
▪ Client issues DHCP request: “is there a DHCP server out there?”
▪ Encapsulated in UDP, encapsulated in IP, encapsulated in Ethernet
▪ Ethernet frame broadcast (FF:FF:FF:FF:FF:FF) on LAN
▪ Received at router running DHCP server
▪ Ethernet demuxed to IP, demuxed to UDP, demuxed to DHCP
[Figure: the client and the router (with DHCP server built into it) each run the stack DHCP / UDP / IP / Eth / Phy]

DHCP – dynamic host configuration protocol

▪ DHCP server responds: “Here’s the info you should use!”
  ▪ New IP for client
  ▪ IP of first-hop router for client
  ▪ Name+IP of DNS server for client
▪ The response is encapsulated at the DHCP server, the frame is forwarded to the client, and it is demuxed up to DHCP at the client
▪ Client now knows its IP address, name & IP address of DNS server, IP address of its first-hop router
[Figure: the client and the router (with DHCP server built into it) each run the stack DHCP / UDP / IP / Eth / Phy]
