IP I - Chapter 1 - Internet Technologies and Protocols (1)
1.1. Introduction
• Welcome to the exciting and rapidly evolving world of Internet and web
programming!
• As of April 2022, there are more than five billion Internet users worldwide—
that’s approximately 63.1% of the global population.
• In use today are more than a billion general-purpose computers, and billions
more embedded computers are used in cell phones, smartphones, tablet
computers, home appliances, automobiles and more—and many of these
devices are connected to the Internet.
• According to a study by Cisco Internet Business Solutions Group, there were
12.5 billion Internet-enabled devices in 2010, and the number was predicted to
reach 25 billion by 2015 and 50 billion by 2020.
1.2. Overview of the Internet and the World Wide Web
1.2.1. Evolution of the Internet and World Wide Web
• The Internet—a global network of computers—was made possible by the
convergence of computing and communications technologies.
• In the late 1960s, ARPA (the Advanced Research Projects Agency) rolled out
blueprints for networking the main computer systems of about a dozen ARPA-
funded universities and research institutions.
• They were to be connected with communications lines operating at a then-
stunning 56 Kbps (i.e., 56,000 bits per second)—this at a time when most
people (of the few who could) were connecting over telephone lines to
computers at a rate of 110 bits per second.
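To put the two line rates in perspective, the following is a rough sketch of the arithmetic. The 100-kilobyte payload is a hypothetical figure chosen only for illustration, and the calculation ignores protocol overhead and latency.

```python
# Line rates quoted in the text, in bits per second.
ARPANET_BPS = 56_000
TELEPHONE_BPS = 110

# Hypothetical payload for illustration: 100 kilobytes, 8 bits per byte.
payload_bits = 100 * 1000 * 8


def transfer_seconds(bits: int, rate_bps: int) -> float:
    """Ideal transfer time, ignoring protocol overhead and latency."""
    return bits / rate_bps


speedup = ARPANET_BPS / TELEPHONE_BPS
print(f"ARPANET lines were about {speedup:.0f} times faster")
print(f"at 110 bps: {transfer_seconds(payload_bits, TELEPHONE_BPS) / 60:.1f} minutes")
print(f"at 56 Kbps: {transfer_seconds(payload_bits, ARPANET_BPS):.1f} seconds")
```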
• A bit (short for “binary digit”) is the smallest data item in a computer; it can
assume the value 0 or 1.
• There was great excitement. Researchers at Harvard talked about
communicating with the powerful Univac computer at the University of Utah to
handle the intensive calculations related to their computer graphics research.
• Many other intriguing possibilities were raised. Academic research was about
to take a giant leap forward.
• ARPA proceeded to implement the ARPANET, which eventually evolved into
today’s Internet.
Packet Switching
• One of the primary goals for ARPANET was to allow multiple users to send
and receive information simultaneously over the same communications paths
(e.g., phone lines).
• The network operated with a technique called packet switching, in which
digital data was sent in small bundles called packets.
• The packets contained address, error-control and sequencing information.
• This packet-switching technique greatly reduced transmission costs, as
compared with the cost of dedicated communications lines.
• The network was designed to operate without centralized control.
• If a portion of the network failed, the remaining working portions would still
route packets from senders to receivers over alternative paths for reliability.
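The idea of packets carrying address, error-control and sequencing information can be sketched in a few lines of Python. This is a simplified illustration, not the ARPANET packet format: the `Packet` fields, the 8-byte packet size, the `host-b` address and the use of CRC-32 as the error check are all assumptions made for the example.

```python
import random
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dest: str       # address information
    seq: int        # sequencing information (position in the message)
    total: int      # number of packets in the whole message
    payload: bytes  # the small bundle of data itself
    checksum: int   # error-control information (CRC-32 of the payload)


def packetize(message: bytes, dest: str, size: int = 8) -> list[Packet]:
    """Split a message into packets of at most `size` payload bytes."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [
        Packet(dest, seq, len(chunks), chunk, zlib.crc32(chunk))
        for seq, chunk in enumerate(chunks)
    ]


def reassemble(packets: list[Packet]) -> bytes:
    """Verify every packet, restore the original order and join the payloads."""
    if not packets:
        return b""
    for packet in packets:
        if zlib.crc32(packet.payload) != packet.checksum:
            raise ValueError(f"packet {packet.seq} failed its error check")
    ordered = sorted(packets, key=lambda packet: packet.seq)
    if [packet.seq for packet in ordered] != list(range(ordered[0].total)):
        raise ValueError("a packet is missing or duplicated")
    return b"".join(packet.payload for packet in ordered)


message = b"Packets may travel over different paths."
packets = packetize(message, dest="host-b")
random.shuffle(packets)  # simulate packets arriving out of order
print(reassemble(packets) == message)
```

Because each packet carries its own sequence number, the receiver can rebuild the message no matter which path each packet took or in which order it arrived.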
TCP/IP
• The protocol (i.e., set of rules) for communicating over the ARPANET became
known as the Transmission Control Protocol (TCP).
• TCP ensured that messages were properly routed from sender to receiver and that
they arrived intact.
• One challenge was to get different networks to communicate.
• ARPA accomplished this with the development of the Internet Protocol (IP), truly
creating a network of networks, the current architecture of the Internet.
• The combined set of protocols is now commonly called TCP/IP.
• Each computer on the Internet has a unique IP address.
• The current IP standard, Internet Protocol version 4 (IPv4), has been in use since
the early 1980s. Its 32-bit addresses allow only about 4.3 billion unique values, and
the pool of unallocated IPv4 addresses is effectively exhausted.
• The next-generation Internet Protocol, IPv6, is being deployed alongside IPv4.
• It features enhanced security and a new 128-bit addressing scheme, hugely expanding
the number of IP addresses available so that we will not run out of IP addresses in the
foreseeable future.
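The difference in address space between the two protocol versions can be checked with Python's standard `ipaddress` module. This is a small sketch; the two addresses are taken from ranges reserved for documentation and are used here only as examples.

```python
import ipaddress

# Example addresses from ranges reserved for documentation.
v4 = ipaddress.ip_address("192.0.2.10")
v6 = ipaddress.ip_address("2001:db8::10")

print(v4.version, v4.max_prefixlen)  # protocol version and address width in bits
print(v6.version, v6.max_prefixlen)

# The number of possible addresses is 2 raised to the address width.
ipv4_total = 2 ** v4.max_prefixlen
ipv6_total = 2 ** v6.max_prefixlen
print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:.3e}")
```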
World Wide Web, HTML, HTTP
• The World Wide Web allows computer users to execute web-based
applications and to locate and view multimedia-based documents on almost any
subject over the Internet.
• In 1989, Tim Berners-Lee of CERN (the European Organization for Nuclear
Research) began to develop a technology for sharing information via
hyperlinked text documents.
• Berners-Lee called his invention the HyperText Markup Language (HTML).
• He also wrote communication protocols to form the backbone of his new
information system, which he called the World Wide Web.
• In particular, he wrote the Hypertext Transfer Protocol (HTTP)—a
communications protocol used to send information over the web.
• A URL (Uniform Resource Locator) specifies the address (i.e., location) of the
web page displayed in the browser window. Each web page on the Internet is
associated with a unique URL. URLs usually begin with http://.
HTTPS
• URLs of websites that handle private information, such as credit card numbers,
often begin with https://, the abbreviation for Hypertext Transfer Protocol
Secure (HTTPS).
• HTTPS is the standard for transferring encrypted data on the web.
• It combines HTTP with the Secure Sockets Layer (SSL) and the more recent
Transport Layer Security (TLS) cryptographic schemes for securing
communications and identification information over the web.
• Although there are many benefits to using HTTPS, there are a few drawbacks,
most notably some performance issues because encryption and decryption
consume significant computer processing resources.
Mosaic, Netscape, Emergence of Web 2.0
• Web use exploded with the availability in 1993 of the Mosaic browser, which featured
a user-friendly graphical interface.
• Marc Andreessen, whose team at the National Center for Supercomputing Applications
(NCSA) developed Mosaic, went on to found Netscape, the company that many people
credit with igniting the explosive Internet economy of the late 1990s.
• But the “dot com” economic bust brought hard times in the early 2000s. The
resurgence that began in 2004 or so has been named Web 2.0.
• Google is widely regarded as the signature company of Web 2.0.
• Some other companies with Web 2.0 characteristics are YouTube (video sharing),
Facebook (social networking), Twitter (microblogging), Groupon (social commerce),
Foursquare (mobile check-in), Salesforce (business software offered as online services
“in the cloud”), Craigslist (mostly free classified listings), Flickr (photo sharing),
Skype (Internet telephony and video calling and conferencing, now owned by
Microsoft) and Wikipedia (a free online encyclopedia).
1.2.2. Web Basics
• The web is a service that provides access to information stored on web servers,
the high-capacity, high-performance computers that power the web.
• The web consists of a collection of linked files known as webpages.
• Because the web supports text, graphics, audio, and video, a webpage can
display any of these multimedia elements in a browser.
• A website is a related collection of webpages created and maintained by a
person, company, educational institution, or other organization.
• Each website contains a home page, which is the main page and the first
document users see when they access the website.
• The home page typically provides information about the website’s purpose and
content, often by including a list of links to other webpages on the website.
Parts of a URL
• A URL contains information that directs a browser to the resource that the user
wishes to access.
• Web servers make such resources available to web clients.
• Popular web servers include Apache’s HTTP Server and Microsoft’s Internet
Information Services (IIS).
• Let’s examine the components of the URL
http://www.deitel.com/books/downloads.html
• The text http:// indicates that the HyperText Transfer Protocol (HTTP) should be
used to obtain the resource.
• Next in the URL is the server’s fully qualified hostname (for example,
www.deitel.com) — the name of the web-server computer on which the
resource resides.
• This computer is referred to as the host, because it houses and maintains
resources.
• The hostname www.deitel.com is translated into an IP address—a numerical
value that uniquely identifies the server on the Internet.
• An Internet Domain Name System (DNS) server maintains a database of
hostnames and their corresponding IP addresses and performs the translations
automatically.
• The remainder of the URL (/books/downloads.html) specifies the resource’s
location (/books) and name (downloads.html) on the web server.
• The location could represent an actual directory on the web server’s file system.
• For security reasons, however, the location is typically a virtual directory.
• The web server translates the virtual directory into a real location on the server,
thus hiding the resource’s true location.
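The parts of the example URL can be pulled apart with `urlsplit` from Python's standard `urllib.parse` module. This is a minimal sketch; splitting the path into a location and a name simply mirrors the description above and is not something the web server is required to do.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.deitel.com/books/downloads.html")

print(parts.scheme)    # protocol used to obtain the resource
print(parts.hostname)  # fully qualified hostname of the web server
print(parts.path)      # location and name of the resource on the server

# Split the path into the resource's location and its name.
location, _, name = parts.path.rpartition("/")
print(location)
print(name)
```

Translating the hostname into an IP address requires a DNS lookup (for instance with `socket.getaddrinfo` from the standard library), which is left out here because it needs network access.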
Client-Server Architecture: Making a Request and Receiving a
Response
When given a web page URL, a web browser uses HTTP to request the web page found at
that address. Figure 1.1 shows a web browser sending a request to a web server.
• The server first sends a line of text that indicates the HTTP version, followed
by a numeric code and a phrase describing the status of the transaction.
• For example,
HTTP/1.1 200 OK
indicates success, whereas
HTTP/1.1 404 Not Found
informs the client that the web server could not locate the requested resource.
• A complete list of numeric codes indicating the status of an HTTP transaction
can be found at www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.
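The three parts of the status line can be separated as follows. This is a minimal sketch: `parse_status_line` is a hypothetical helper written for this example, while `HTTPStatus` comes from Python's standard `http` module and supplies the standard phrase for each code.

```python
from http import HTTPStatus


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line into HTTP version, numeric code and phrase."""
    version, code, phrase = line.strip().split(" ", 2)
    return version, int(code), phrase


for line in ("HTTP/1.1 200 OK", "HTTP/1.1 404 Not Found"):
    version, code, phrase = parse_status_line(line)
    outcome = "success" if 200 <= code < 300 else "error"
    # HTTPStatus(code).phrase is the standard phrase for the numeric code.
    print(version, code, HTTPStatus(code).phrase, outcome)
```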
HTTP Headers
• Next, the server sends one or more HTTP headers, which provide additional information
about the data that will be sent.
• In this case, the server is sending an HTML5 text document, so one HTTP header for this
example would read:
Content-Type: text/html
• The information provided in this header specifies the Multipurpose Internet Mail
Extensions (MIME) type of the content that the server is transmitting to the browser.
• The MIME standard specifies data formats, which programs can use to interpret data
correctly.
• For example, the MIME type text/plain indicates that the sent information is text that can be
displayed directly.
• Similarly, the MIME type image/jpeg indicates that the content is a JPEG
image. When the browser receives this MIME type, it attempts to display the
image.
• The header or set of headers is followed by a blank line, which indicates to the
client browser that the server is finished sending HTTP headers.
• Finally, the server sends the contents of the requested document
(downloads.html).
• The client-side browser then renders (or displays) the document, which may
involve additional HTTP requests to obtain associated CSS and images.
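The layout of a response (status line, headers, blank line, document) can be sketched as a small parser. The response text below is made up for illustration, including the `Server: ExampleServer` header and the document body; `parse_response` and `describe` are hypothetical helpers, not part of any library.

```python
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Server: ExampleServer\r\n"  # hypothetical extra header
    "\r\n"                       # blank line: the server is done sending headers
    "<!DOCTYPE html><html><body>Downloads</body></html>"
)


def parse_response(raw: str) -> tuple[str, dict[str, str], str]:
    """Split a raw response into status line, headers and document body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names ignore case
    return status_line, headers, body


def describe(mime: str) -> str:
    """Say what a browser would typically do with content of this MIME type."""
    if mime == "text/html":
        return "render as an HTML document"
    if mime == "text/plain":
        return "display the text directly"
    if mime.startswith("image/"):
        return "display the image"
    return "offer the content as a download"  # assumed fallback for this sketch


status_line, headers, body = parse_response(RAW_RESPONSE)
print(status_line)
print(headers["content-type"], "->", describe(headers["content-type"]))
print(body)
```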
HTTP get and post Requests
• The two most common HTTP request types (also known as request methods)
are get and post (written GET and POST in the HTTP messages themselves).
• A get request typically gets (or retrieves) information from a server, such as an
HTML document, an image or search results based on a user-submitted search
term.
• A post request typically posts (or sends) data to a server. Common uses of post
requests are to send form data or documents to a server.
• An HTTP request often posts data to a server-side form handler that processes
the data.
• For example, when a user performs a search or participates in a web-based
survey, the web server receives the information specified in the HTML form as
part of the request.
• Get requests and post requests can both be used to send data to a web server,
but each request type sends the information differently.
• A get request appends data to the URL, e.g., www.google.com/search?q=deitel.
• In this case search is the name of Google’s server-side form handler, q is the
name of a variable in Google’s search form and deitel is the search term.
• The ? in the preceding URL separates the query string from the rest of the
URL in a request.
• A name/value pair is passed to the server with the name and the value separated
by an equals sign (=).
• If more than one name/value pair is submitted, each pair is separated by an
ampersand (&).
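A query string can be built and taken apart with Python's standard `urllib.parse` module. In this sketch the `q` pair comes from the example above, while the second pair (`hl=en`) is a hypothetical addition that only serves to show the ampersand separator.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Build a query string from name/value pairs ("hl" is a hypothetical second pair).
query = urlencode({"q": "deitel", "hl": "en"})
url = f"http://www.google.com/search?{query}"
print(url)

# Recover the name/value pairs from everything to the right of the "?".
parsed = parse_qs(urlsplit(url).query)
print(parsed)

# Characters with a special meaning in a query string are escaped automatically.
print(urlencode({"q": "internet & web"}))
```

Each name maps to a list of values in the parsed result because the same name may appear more than once in a query string.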
• The server uses data passed in a query string to retrieve an appropriate resource
from the server. The server then sends a response to the client.
• A get request may be initiated by submitting an HTML form whose method
attribute is set to "get", or by typing the URL (possibly containing a query
string) directly into the browser’s address bar. We discuss HTML forms in
Chapter 2.
• A post request sends form data as part of the HTTP message, not as part of the
URL.
• Browsers and web servers typically limit the length of a URL, and with it the
query string (i.e., everything to the right of the ?), so it’s often necessary to
send large amounts of information using the post method.
• The post method is also sometimes preferred because it keeps the submitted
data out of the URL shown to the user by embedding it in the HTTP message.
• If a form submits several hidden input values along with user-submitted data,
the post method might generate a URL like www.searchengine.com/search.
• The form data still reaches the server and is processed in a similar fashion to a
get request, but the user does not see the exact information sent.
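The difference between the two request types is easiest to see in the raw messages. The sketch below builds both by hand for the same form data. The field names and values are hypothetical (`token` stands in for a hidden input), and a real browser would send additional headers.

```python
from urllib.parse import urlencode

# Hypothetical form data; "token" stands in for a hidden input value.
form = {"q": "deitel", "token": "abc123"}
encoded = urlencode(form)

# get: the data travels in the URL, as a query string on the request line.
get_request = (
    f"GET /search?{encoded} HTTP/1.1\r\n"
    "Host: www.searchengine.com\r\n"
    "\r\n"
)

# post: the URL stays clean and the data travels in the message body.
post_request = (
    "POST /search HTTP/1.1\r\n"
    "Host: www.searchengine.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(encoded.encode('ascii'))}\r\n"
    "\r\n"
    f"{encoded}"
)

print(get_request)
print(post_request)
```

In both cases the same name/value pairs reach the server; only their position in the message differs. Without HTTPS, the body of a post request is still transmitted as readable text.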
Client-Side Caching
• Browsers often cache (save on disk) recently viewed web pages for quick reloading.
• If there are no changes between the version stored in the cache and the current version on
the web, this speeds up your browsing experience.
• An HTTP response can indicate the length of time for which the content remains “fresh.”
• If this amount of time has not elapsed, the browser loads the document from the cache
and avoids another request to the server.
• Otherwise, the browser asks the server for the document again.
• Similarly, there’s also the “not modified” HTTP response (status code 304), indicating
that the file content has not changed since it was last requested (which is information
that’s sent in the request).
• Browsers typically do not cache the server’s response to a post request, because the next
post might not return the same result. For example, in a survey, many users could visit
the same web page and answer a question. The survey results could then be displayed for
the user. Each new answer would change the survey results.
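The caching decisions described above can be sketched as follows. This is a simplified model, not how any particular browser is implemented: `CacheEntry`, `fake_server` and `load` are hypothetical names, the freshness lifetime plays the role of the `max-age` value a response can carry, and the `etag` field plays the role of the validator a browser sends back so that the server can answer “not modified.”

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    body: str
    etag: str         # validator sent by the server with the response
    stored_at: float  # when the response was cached, in seconds
    max_age: float    # how long the response stays fresh, in seconds


def fake_server(etag: str) -> tuple[int, str, str]:
    """Hypothetical server: answers a conditional request for one document."""
    current_body, current_etag = "<p>survey page</p>", '"v2"'
    if etag == current_etag:
        return 304, "", current_etag        # not modified: no body is sent
    return 200, current_body, current_etag  # modified: send the new version


def load(entry: CacheEntry, now: float) -> str:
    """Return a label describing how the document was obtained."""
    if now - entry.stored_at < entry.max_age:
        return "fresh: served from cache without a request"
    status, body, etag = fake_server(entry.etag)
    entry.stored_at = now
    if status == 304:
        return "stale: revalidated, cached copy reused"
    entry.body, entry.etag = body, etag
    return "stale: new version downloaded"


entry = CacheEntry("<p>old page</p>", '"v1"', stored_at=0.0, max_age=60.0)
print(load(entry, now=30.0))   # still within the freshness lifetime
print(load(entry, now=90.0))   # expired, and the document changed on the server
print(load(entry, now=200.0))  # expired again, but the document is unchanged
```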
1.3. Multitier Application Architecture
• Web-based applications are often multitier applications (sometimes referred to
as n-tier applications) that divide functionality into separate tiers (i.e., logical
groupings of functionality). Although tiers can be located on the same computer,
the tiers of web-based applications often reside on separate computers. Figure
1.3 presents the basic structure of a three-tier web-based application.