Week 9 Module
Non-volatile computer storage is based around mass storage drives. Every computer comes with a primary fixed
disk (HDD or SSD). This stores the operating system and applications software that has been installed to the
PC plus data files created by users.
The computer may also have a number of other storage devices, such as a secondary HDD or SSD, a
CD/DVD/BD optical drive or writer, USB removable drives, or a flash memory card reader.
In order for the OS to be able to read and write files on a drive, the drive must be partitioned and formatted with a file system.
Partitioning a hard disk is the act of dividing it into logically separate storage areas. This may be done to improve
the performance of the disk, to install multiple operating systems, or to provide a logical separation of different
data areas. You must create at least one partition on the hard disk before performing a format to create a file
system. Typically, this is done through Windows Setup when building a new PC or through Disk Management
when adding an extra hard disk.
On the primary fixed disk, one of the partitions must be made active. This active partition is also referred to as
the system partition. An active partition is used by the computer to boot. In Windows, the system partition is
usually hidden from File Explorer and is not allocated a drive letter.
Windows Drives
In Windows, each formatted partition can be allocated a drive letter, from A through Z. The boot partition
(containing the operating system files) is usually allocated the letter C. Each removable drive (CD/DVD/BD or
flash memory card for instance) can also be allocated a drive letter.
File Systems
Each partition can be formatted with a different file system. Under Windows, there is a choice between FAT and
NTFS.
■ FAT (File Allocation Table)—this was used for older versions of Windows and is preserved under Windows for
compatibility. Typically, the 32-bit version (FAT32) is used. This permits a maximum file size of 4 GB, and the
Windows format tools limit a FAT32 partition to 32 GB.
■ NTFS (New Technology File System)—as a 64-bit addressing scheme, NTFS allows much larger files and
partitions than FAT. (The 2 TB ceiling often quoted is a limit of the older MBR partitioning scheme, not of NTFS
itself.) NTFS also supports extended attributes, allowing for file-level security permissions, compression, and
encryption. These features make NTFS much more stable and secure than FAT. Windows must be installed to an
NTFS partition.
FAT32 is used for formatting most removable drives and disks as it provides the best compatibility between
different types of computers and devices.
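The 4 GB file size limit follows from FAT32 recording each file's size in a 32-bit field. The short sketch below works through that arithmetic; the function name fits_on_fat32 is made up for illustration and is not part of any Windows tool.

```python
# FAT32 stores a file's size in a 32-bit field, so the largest
# possible file is one byte short of 4 GiB.
FAT32_MAX_FILE_BYTES = 2**32 - 1
GIB = 2**30


def fits_on_fat32(size_bytes: int) -> bool:
    """Return True if a single file of this size can be stored on FAT32."""
    return 0 <= size_bytes <= FAT32_MAX_FILE_BYTES


print(fits_on_fat32(3 * GIB))  # a 3 GiB video file fits
print(fits_on_fat32(5 * GIB))  # a 5 GiB disk image does not
```

This is why a large video or disk image can fail to copy to a FAT32 flash drive even when the drive has plenty of free space.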
CDs and DVDs are often formatted using Universal Disk Format (UDF), though the older CD format ISO 9660
(or CDFS) offers the best compatibility with legacy drives. Recordable media can be written to once only;
rewritable media support deleting and adding files later, but to make the disc fully compatible with consumer
DVD players, the session must be closed.
Most Linux distributions use some version of the ext file system to format partitions on mass storage devices.
ext3 is a 64-bit file system with support for journaling, which means that the file system tracks changes, giving
better reliability and less chance of file corruption in the event of crashes or power outages. Support for journaling
is the main difference between ext3 and its predecessor (ext2). ext4 delivers significantly better performance
than ext3 and would usually represent the best choice for new systems.
Apple Mac workstations and laptops use the extended Hierarchical File System (HFS+), though the latest macOS
version is being updated to the Apple File System (APFS).
You can evaluate file systems by considering which features they do or do not support:
■ Compression—the file system can automatically reduce the amount of disk space taken up by a file. The file
system applies a non-lossy algorithm to the file to find ways to store the data in it more efficiently without
discarding any information. Note that file system compression only benefits files that are not already compressed.
A file type such as JPEG that already applies compression will not be significantly reduced in size.
■ Encryption—the file system can automatically encrypt data in a file when it is saved. This means that the file
can only be opened when there is access to the encryption key. If this is stored separately to the data and/or its
use is protected by a password, the data on the drive is protected even if the disk is stolen and installed in
another computer system.
■ Permissions—the file system maintains an Access Control List (ACL) for each file or folder object. The ACL
records which user accounts are allowed to read, write, or control the object.
■ Journaling—the file system tracks changes or intended changes in a log. This means that if there is a sudden
power cut and a particular write operation was interrupted, the journal may be used to recover the data or at
least restore the file system to good working order (consistent state).
■ Limitations—as noted in the table below, file systems have limits in terms of their maximum capacity and the
size of individual files.
■ Naming rules—very old file systems limited the size of a file name to eight characters plus a three-character
extension. Modern file systems support longer file names (usually up to 255 characters) and complete directory
paths, use of Unicode characters in the name, and support distinguishing the case of file name characters. File
systems also have a number of reserved characters which cannot be used in a file name.
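The point made under Compression, that already-compressed data gains little, can be checked with a small experiment. This is only a sketch: it uses Python's zlib module as a stand-in for whatever algorithm a file system applies, and pseudo-random bytes as a stand-in for an already-compressed file such as a JPEG.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller means better)."""
    return len(zlib.compress(data)) / len(data)


# Repetitive text: plenty of redundancy for a lossless algorithm to remove.
plain_text = b"File systems store data in clusters. " * 300

# Pseudo-random bytes: no patterns left to exploit, much like the
# contents of a file that has already been compressed.
rng = random.Random(0)
random_like = bytes(rng.getrandbits(8) for _ in range(len(plain_text)))

print(f"plain text : {compression_ratio(plain_text):.2f}")
print(f"random-like: {compression_ratio(random_like):.2f}")
```

The first ratio should be close to zero and the second close to (or slightly above) 1, because lossless compression has nothing left to remove from data with no remaining redundancy.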
Folders and Directories
The purpose of a drive is to store files. Folders are a means of organizing files on each drive to make them easier
to find. Folders can also create distinct areas in terms of security access controls. Operating system files can be
separated from user data files, and standard users can be prevented from modifying them. Also, each user can
have a protected storage area that other standard users cannot access, unless the folder is shared.
Folders are created in a hierarchy of subfolders. The first level of the hierarchy is called the root folder. This is
created when the drive is formatted. The root folder is identified by the drive letter, a colon, and a backslash. For example,
the root folder of the C: drive is C:\. The root folder can contain files and subfolders. The path to a subfolder is
also separated by backslashes. For example, in C:\WINDOWS\System32\, WINDOWS is a subfolder of the root
and System32 is a subfolder of WINDOWS.
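One way to see how a Windows path breaks down into a root and a chain of subfolders is with Python's pathlib module, which can parse Windows-style paths on any operating system. This is an illustration only, using the example path from the text.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r"C:\WINDOWS\System32")

print(path.drive)   # the drive letter and colon
print(path.anchor)  # the root folder of the drive
print(path.parts)   # the root followed by each folder in the path
print(path.parent)  # the folder that contains System32
```

PureWindowsPath only manipulates the text of the path; it does not check whether the folders exist on disk.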
A default folder structure is created on the boot partition when Windows is installed. A default installation creates
the following three system folders:
■ Windows—the "system root," containing drivers, logs, add-in applications, system and Registry files (notably
the System32 subfolder), and so on. System32 contains most of the applications and utilities used to manage
and configure Windows.
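■ Users—storage for users' profile settings and data (Documents, Temporary Internet Files, Cookies, recent file
shortcuts, desktop shortcuts, and so on).
■ Program Files—subfolders for installed application software. On 64-bit versions of Windows, a separate
Program Files (x86) folder holds 32-bit applications.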
Linux Directories
"Folder" is a Windows-specific term. In Linux, these containers are called directories. Also, Linux uses the
forward slash (/) to represent the root and as a directory delimiter. For example, in the directory path /home/andy,
home is a subdirectory of the root directory and andy is a subdirectory of home.
It is important to realize that everything available to the Linux OS is represented as a file in the file system,
including devices. This is referred to as the unified file system. For example, a single hard drive attached to a
SATA port would normally be represented in the file system by /dev/sda. A second storage device—perhaps one
attached to a USB port—would be represented as /dev/sdb. There is no concept of "drive letters" in Linux.
Everything is represented through the file system.
A file system configured on a partition on a particular storage device is attached to a particular directory (mount
point) within the unified file system using the mount command. For example:
mount /dev/sda1 /mnt/mydrive
...mounts partition 1 on the mass storage device sda to the directory /mnt/mydrive. Mountable file systems are
listed in the /etc/fstab file.
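Each non-comment line in /etc/fstab describes one file system as whitespace-separated fields: the device, the mount point, the file system type, the mount options, and two numeric fields. The sketch below reads the first four fields from a sample string. The sample entry and the names FstabEntry and parse_fstab are invented for illustration; a real file will differ from system to system.

```python
from typing import List, NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mount_point: str
    fs_type: str
    options: List[str]


def parse_fstab(text: str) -> List[FstabEntry]:
    """Parse fstab-style text, skipping blank lines and comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed line; a real tool would report it
        device, mount_point, fs_type, options = fields[:4]
        entries.append(FstabEntry(device, mount_point, fs_type, options.split(",")))
    return entries


# Hypothetical sample content, not taken from a real system.
SAMPLE = """\
# <device>   <mount point>   <type>  <options>  <dump> <pass>
/dev/sda1    /mnt/mydrive    ext4    defaults   0      2
"""

for entry in parse_fstab(SAMPLE):
    print(f"{entry.device} is mounted at {entry.mount_point} as {entry.fs_type}")
```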
File Explorer
In Windows, File Explorer (called Windows Explorer in previous versions and very widely just referred to as
"Explorer") provides a visual means of navigating the file system. In the main pane, you can double-click a folder
to open it. You can use the Navigation pane to expand and collapse objects or the Breadcrumb on the address
bar and the Back and Forward buttons on the toolbar to move between folders.
Navigation Pane
When browsing the computer using File Explorer in Windows 10, two top-level categories are shown in the
navigation pane. Quick access contains shortcuts to folders that are most useful. These can be modified by
dragging and dropping. By default, it contains shortcuts to your personal Desktop, Downloads, Documents, and
Pictures folders.
The second top-level category is the Desktop. Under the "Desktop" object, you can find the following categories:
■ OneDrive—if you sign into the computer with a Microsoft account, this shows the files and folders saved to
your cloud storage service on the Internet. As you can see from the screenshot, other cloud service providers
may add links here too.
■ User account—the folders belonging to your account profile. For example, in the screenshot above the user
account is listed as "James at CompTIA."
■ This PC—access to user-generated files in the user's profile plus the hard drives and removable storage drives
available to the PC.
■ Libraries—these can be used to create views of folders and files stored in different locations and on different
disks.
■ Network—contains computers, shared folders, and shared printers available over the network.
■ Control Panel—options for configuring legacy Windows features (most configuration is now performed via the
Settings app rather than Control Panel).
■ Recycle Bin—provides an option for recovering files and folders that have been recently deleted.
User Profiles and Libraries
Each user has his or her own profile folder, stored under the Users system folder. Files in each user's profile are
private (though a user with administrative privileges can still access them). Each profile folder contains subfolders
for different types of file (documents, music, pictures, video, and so on). The profile folder also contains hidden
subfolders used to store application settings and customizations, favorite links, shortcuts, temporary files, and
so on.
Windows also configures a Public profile to allow users of the PC to share files between them (a local share).
In Windows 10, libraries are used to provide easy access to different kinds of documents that may be stored in
different places. For example, you may store pictures in your pictures folder, on a flash drive, and on a network.
You can view all these pictures in one location by adding the locations to a library. Libraries work as a kind of
"virtual" folder.
By default, each profile contains libraries for Documents, Music, Pictures, and Videos. You can create new
libraries using the toolbar or by right-clicking in the Libraries folder. Right-clicking a library icon allows you to set
the locations (folders) it includes and optimize the library display settings for a particular type of file.
You can also set the default save location (the physical folder used when you save a file to a library).
You can use the shortcut or File menus to create a new folder within another object. Windows has various folder
naming rules that must be followed when modifying the folder structure:
■ No two subfolders within the same folder may have the same name. Subfolders of different folders may have
the same name though.
■ Folder names may not contain the following reserved characters: \ / : * ? " < > |
■ The full path to an object (including any file name and extension) may not usually exceed 260 characters.
A warning message is displayed if these rules are not followed and the user is prompted to enter a new folder
name.
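The three naming rules above can be expressed as a simple check. The sketch below is not how Windows implements them; the function name folder_name_problems is made up, and the case-insensitive comparison is an assumption based on Windows' default behavior.

```python
RESERVED_CHARS = set('\\/:*?"<>|')
MAX_PATH_CHARS = 260


def folder_name_problems(parent_path, new_name, existing_names):
    """Return a list of rule violations; an empty list means the name is acceptable."""
    problems = []

    # Rule: no reserved characters in the name.
    bad = sorted(set(new_name) & RESERVED_CHARS)
    if bad:
        problems.append("reserved characters: " + " ".join(bad))

    # Rule: no two subfolders of the same folder share a name.
    # Assumption: names are compared without regard to case.
    if new_name.lower() in {name.lower() for name in existing_names}:
        problems.append("a subfolder with this name already exists")

    # Rule: the full path should not exceed 260 characters.
    full_path = parent_path.rstrip("\\") + "\\" + new_name
    if len(full_path) > MAX_PATH_CHARS:
        problems.append("full path is longer than 260 characters")

    return problems


print(folder_name_problems("C:\\Users\\andy", "Reports", ["Photos"]))
print(folder_name_problems("C:\\Users\\andy", "Q1: draft?", ["Photos"]))
```

The first call returns an empty list; the second reports the colon and question mark as reserved characters.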
Files
Files are the containers for the data that is used and modified through the operating system and applications.
Files store either text or binary data; text data is human-readable, while binary data can only be interpreted by a
software application compatible with that file type.
Files follow a similar naming convention to folders, except that the last part of the file name represents an
extension, which describes what type of file it is and is used by Windows to associate the file with an application.
The extension is divided from the rest of the file name by a period. By convention, extensions are three
characters. By default, the extension is not shown to the user.
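The link between an extension and an application can be pictured as a lookup table keyed on the text after the last period. In the sketch below the table contents and function names are hypothetical; Windows keeps its real associations in system settings, not in a Python dictionary.

```python
from pathlib import PureWindowsPath

# Hypothetical association table, for illustration only.
ASSOCIATIONS = {"txt": "Notepad", "docx": "Microsoft Word"}


def extension_of(file_name: str) -> str:
    """Return the extension without the period, in lower case ('' if none)."""
    return PureWindowsPath(file_name).suffix.lstrip(".").lower()


def default_application(file_name: str) -> str:
    """Look up the application associated with a file's extension."""
    return ASSOCIATIONS.get(extension_of(file_name), "no association")


print(extension_of("Report.DOCX"))
print(default_application("notes.txt"))
print(default_application("README"))
```

Only the part after the final period counts as the extension, so a name such as archive.tar.gz is treated as a gz file.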
System and application files are created when you install programs. User files are created when you use the
Save or Save As function of a program.
As you can see, the File Explorer tools are available in an application's Save dialog to navigate between folders.
Most applications let you save the file in one of several file formats, accessed through the Save as type box.
Save As dialog
You can also create certain types of file in Explorer by right-clicking in a folder and selecting New, followed by
the type of file you want to create.
Files are usually opened by double-clicking them. You may want to open a file in a software product other than
the default however. When you right-click a file, the shortcut menu displays a list of suitable choices, or you can
choose Open With and browse for a different application. You can also use the Default Programs applet to
configure file associations. When creating and editing text files, you must be careful to use a plain text file format,
such as that used by Notepad (a Windows accessory). If you convert a plain text system file to a binary format,
it will become unusable.
You must also use the Save command to retain any changes you make while editing a document. If you want to
keep both the original document and the edited version, use the Save As command to create a new file with a
different name and/or stored in a different folder.
The File Explorer Options applet in Control Panel controls how Explorer works. The General tab contains options
for opening files by single-clicking and for opening folders in the same or new windows.
The View tab contains a long list of options affecting how folders and files are displayed in Explorer (such as
whether to show hidden files or file extensions). View settings (such as whether to show thumbnail icons or
details) are retained on a per-folder basis but can be reset using the buttons on the View tab.
Renaming Files and Folders
To rename a file or folder, select it, press F2, then type the new name. You can also right-click the file and select
Rename.
There are several ways to move or copy a file or folder:
■ Use the Edit > Cut/Copy/Paste commands from the main menu or shortcut menu or their keyboard shortcuts
(CTRL+X, CTRL+C, CTRL+V).
■ Drag and drop the object, holding down CTRL to copy or SHIFT to move (or CTRL+SHIFT to create a shortcut).
■ Right-click drag the object and select an option from the shortcut menu displayed when you release the mouse
button.
■ Use the Send To command from the main menu or shortcut menu to copy a file to a disk or send it by email.
If a folder contains a file with the same name as the file being pasted, a confirmation dialog is shown:
You can choose to overwrite the destination file, cancel the paste operation, or keep both files by renaming the
one you are moving or copying (in Windows 8, choose the Compare info for both files option to do this). If doing
this with several files, there is also a check box to choose the same option for all conflicts.
Deleting Files and the Recycle Bin
To delete a file using Explorer, select it then press DEL (or use the shortcut menu). Confirm the action using the
prompts.
If you accidentally delete a file or folder from a local hard disk, you can retrieve it from the Recycle Bin. A retrieved
file will be restored to the location from which it was deleted. The size of the Recycle Bin is limited by default to
10% of the drive's capacity. If large numbers of files are deleted, those files that have been in the Recycle Bin
the longest will be permanently deleted to make room for the newly deleted files.
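The "oldest out first" behavior described above can be modeled as a queue with a size limit. This is a simplified sketch, not the actual Windows implementation: the class name RecycleBin is invented, sizes are arbitrary units, and the handling of a file larger than the whole bin is an assumption.

```python
from collections import deque


class RecycleBin:
    def __init__(self, drive_capacity: int, percent: int = 10):
        self.limit = drive_capacity * percent // 100
        self.items = deque()  # oldest deletion sits on the left
        self.used = 0

    def delete(self, name: str, size: int) -> list:
        """Move a file into the bin; return names permanently removed to make room."""
        if size > self.limit:
            return [name]  # assumption: too large for the bin, so removed outright
        purged = []
        while self.used + size > self.limit:
            old_name, old_size = self.items.popleft()
            self.used -= old_size
            purged.append(old_name)
        self.items.append((name, size))
        self.used += size
        return purged


recycle_bin = RecycleBin(drive_capacity=1000)  # the bin may hold up to 100 units
print(recycle_bin.delete("a.txt", 40))
print(recycle_bin.delete("b.txt", 40))
print(recycle_bin.delete("c.txt", 40))  # a.txt is purged to make room
```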
To recover a file, open the Recycle Bin, right-click the icon(s) to recover, and select Restore.
If disk space is low, the Recycle Bin can be emptied (right-click the Recycle Bin icon and select Empty Recycle
Bin from the shortcut menu). This process will permanently remove deleted files.
Recycle Bin properties—note that on this PC there are two hard drives, each with its own recycle bin.
You can set the amount of space to use on a per-drive basis or set one Recycle Bin for all local drives. You can
also choose to suppress the use of delete confirmation dialogs.
To set these options, right-click the Recycle Bin and select Properties.
Selecting Multiple Files and Folders
You can also perform actions on multiple files and folders. To do so, you need to be able to select the icons you
want. There are various ways of doing this:
■ Click and drag the mouse cursor around a block of files or select the first icon then hold SHIFT and click the
last icon to select a block. You may want to sort the icons into a particular order first (see the "Searching for
Folders and Files" topic below).
■ Select the first icon then hold CTRL and select any other icons you want.
■ Use SHIFT with the ARROW keys to select a block of files using the keyboard.
■ Use CTRL with the ARROW keys to keep your existing selection, using the SPACEBAR to add icons to it.
A file's name is just one of its attributes. Other attributes include the date the file was created, accessed, or
modified, its size, its description, and the following markers, which can be enabled or disabled: Read-only (R),
Hidden (H), System (S), and Archive (A).
Files stored on an NTFS partition have extended attributes, including permissions, compression, and encryption.
You can set some attributes manually using the file or folder's properties dialog. To open the properties dialog
for a file or folder, right-click and select Properties. The properties for a folder will show the size of all the files in
that folder (plus any subfolders). The properties for a file (or selection of multiple files) will show the file size.
Folder and File Permissions
To view, create, modify, or delete a file in a folder, you need the correct permissions on that folder. Permissions
can also be applied to individual files. Administrators can obtain full permissions over any file, but standard users
can generally only view and modify files stored either in their profile or in the public profile. If a user attempts to
view or save a file with insufficient permissions to do so, Windows displays an Access Denied error message.
Custom permissions can be configured for a file or folder using the Security tab in its properties dialog.
To configure permissions, you first select the account to which the permissions apply. You can then set the
appropriate permission level. In simple terms, the permissions available are as follows:
■ Full control—allows the user to do anything with the object, including change its permissions and its owner.
■ Modify—allows the user to do most things with an object but not to change its permissions or owner.
■ Read/list/execute—allow the user to view the contents of a file or folder or start a program.
■ Write—allows the user to read a file and change it, or create a file within a folder, but not to delete it.
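The permission levels above can be thought of as bundles of individual rights recorded against each account in the ACL. The sketch below is a deliberately simplified model: real NTFS permissions are more granular and include inheritance and explicit deny entries. The names Right, LEVELS, and is_allowed are invented for illustration.

```python
from enum import Flag, auto


class Right(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()
    DELETE = auto()
    CHANGE_PERMISSIONS = auto()


# Each level is a bundle of rights, following the descriptions in the list above.
LEVELS = {
    "Read": Right.READ,
    "Write": Right.READ | Right.WRITE,
    "Modify": Right.READ | Right.WRITE | Right.DELETE,
    "Full control": Right.READ | Right.WRITE | Right.DELETE | Right.CHANGE_PERMISSIONS,
}


def is_allowed(acl: dict, user: str, needed: Right) -> bool:
    """Check whether the ACL grants the user every right that is needed."""
    granted = acl.get(user, Right.NONE)
    return (granted & needed) == needed


# Hypothetical ACL for one folder.
acl = {"andy": LEVELS["Write"], "admin": LEVELS["Full control"]}

print(is_allowed(acl, "andy", Right.WRITE))   # andy can change files
print(is_allowed(acl, "andy", Right.DELETE))  # but cannot delete them
print(is_allowed(acl, "guest", Right.READ))   # no entry, so access is denied
```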
Searching for Folders and Files
Windows Search enables you to locate files and information on your computer, within apps such as
email, or on the web. Search makes automatic use of file and folder properties (or metadata) and file contents.
In Windows 10, the simplest way to search is to press the START key and type a search phrase. Files, programs,
apps, messages, and web pages that match your search are displayed instantly:
In Windows 10, the search box is located next to the Start button. You can type your search text straight into the
box, or you can use voice commands to initiate a search by using Cortana, Windows 10’s digital
assistant. Type the required search string, and if necessary, click the Apps, Documents, or Web tab to filter
results accordingly.
To search for files, you can also use File Explorer. The Explorer search box is located in the top-right corner of
the window. Pressing F3 in Explorer activates the search box.
You can open, rename, delete, move, and copy files from the search results as normal. If a basic search does
not locate the file you want, you can add a filter to reduce the number of results:
In any folder, you can also use the view options to make finding a file or files easier. The view options set how
large icons are, and you can use Details view to show information about each file in columns. The column
headers allow you to sort files in ascending or descending order (or in other views you can right-click and select
Sort By).
The column headers or right-click menu also allow you to group and filter by the information in that field:
■ Group—show icons in groups with dividers between them (for example, all files with names beginning "A,"
then all files beginning with "B," and so on).
It is worth knowing some of the extensions used to identify common file types.
The following file formats are often used by word processing software:
■ txt—a text-only file with no "binary" file information linking the file to a particular software application. Any
application can open a text file, but this file type cannot store any information about formatting or layout.
■ rtf—Rich Text Format is an early "generic" file format for sharing documents between different word processing
applications. It is capable of storing basic formatting information, such as font and paragraph formatting, and
layout features, such as tables.
■ odf—the Open Document Format is an XML-based specification with better support for the features of modern
word processors than RTF. Word processing documents saved in this format usually have the odt extension.
■ doc/docx—this format is the one used by Microsoft Word. The docx XML-based format was introduced in Word
2007.
Spreadsheet Software
Microsoft's Excel spreadsheet software saves files with an xls or xlsx (Excel 2007 and up) file extension.
Presentation Software
Microsoft's PowerPoint presentation software saves files with a ppt or pptx (PowerPoint 2007 and up) file
extension.
Adobe's Portable Document Format (PDF) is a file format for distributing documents. It is now an open standard,
so different productivity applications can use it. For example, you could save a Microsoft Word document to PDF
format and then open it in the Adobe Reader PDF viewer application. Most web browsers have plug-in PDF
viewers. PDF was envisaged as a "final" format for the distribution of a published document. A PDF should look
the same on-screen as it does when printed. It is possible to edit PDFs (using special applications) or to export
a document from PDF to another format. In most cases though, it is important to keep a copy of the document
in its "native" format. For example, having published a PDF from a Word document file, you would also save the
latest changes to the Word file and keep it as the source file for any future changes.
DTP and graphic design applications (and most productivity software) can make use of images in digital file
formats. A number of different image file formats have been developed for use in different scenarios:
■ jpg/jpeg (Joint Photographic Experts Group)—this lossy compression format is the most widely used for photographic
pictures. The lossy compression method relies on dithering the image to some extent (changing the color value
of some pixels). The user can select a level of compression when saving the file, trading picture quality for
reduced file size.
■ gif (Graphics Interchange Format)—this is an old lossless compression format. It only supports up to 8 bits per
pixel, seriously limiting the available color palette. An 8-bit image can have up to 256 color values. Modern image
formats support up to 24 bits per pixel, allowing a palette of millions of color values.
■ tiff (Tagged Image File Format)—this is a popular format for exchanging images between editing applications.
It can use lossless or JPEG compression.
■ png (Portable Network Graphics)—this is a full-color (24-bit) lossless format designed to replace GIF. It also
supports transparency.
■ bmp—this is a Windows-only lossless format. It is not widely used due to its lack of compatibility with other
operating systems.
■ mpg—this is an early MPEG (Moving Picture Experts Group) standard for video files with lossy compression.
■ mp4—the MPEG-4 standard audio/video file format. The format acts as a container for audio and video media
streams (plus additional media, such as subtitles). A number of different encoding methods (or codecs) are
available. One of the most widely used is H.264.
■ flv—another container file format designed to deliver Flash Video. This is video created in the Adobe Flash
developer tool. It can be viewed through the free Flash Player browser plug-in. Flash was once ubiquitous on
the web but its use is declining since Apple refused to support it on the iPhone and iPad. The HTML5 web page
coding language provides a standards-based alternative to Flash.
■ wmv (Windows Media Video)—a video container file format developed by Microsoft. It is well supported by
media players and can also be used as the format for DVD and Blu-ray Discs.
■ avi—a legacy Windows-only video format. It is a limited format with not much ongoing support.
■ mp3—developed from MPEG, this remains one of the most popular formats for distributing music and is almost
universally supported by media players. The only drawback is that it is a lossy compression format, which means
that some of the audio information is discarded.
■ aac (Advanced Audio Coding)—developed from MPEG as a successor to mp3. This format is also widely
supported.
■ m4a—this is an audio-only file format deriving from the MPEG-4 standards track. It usually uses AAC
compression, though other methods are available (including lossless ones).
■ flac (Free Lossless Audio Codec)—as the name suggests, this format achieves file size compression without
discarding audio data. The only drawback is that it is not quite as widely supported by media players.
■ wav—this is an early Windows audio file format. It is not widely supported by media players but may be used
by audio editing applications.
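The color figures quoted for GIF and for 24-bit formats earlier in this list come from simple powers of two, and the same numbers explain why image compression matters. The sketch below works through the arithmetic; the function names and the 1920 x 1080 example size are chosen for illustration.

```python
def palette_size(bits_per_pixel: int) -> int:
    """Number of distinct color values a pixel can take."""
    return 2 ** bits_per_pixel


def raw_image_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of the pixel data before any compression is applied."""
    return width * height * bits_per_pixel // 8


print(palette_size(8))    # 8-bit image, as in GIF
print(palette_size(24))   # 24-bit image, as in PNG or JPEG
print(raw_image_bytes(1920, 1080, 24))  # uncompressed pixel data, in bytes
```

An 8-bit palette gives 256 values and a 24-bit palette gives 16,777,216, while a single uncompressed 1920 x 1080 image at 24 bits per pixel needs about 6.2 million bytes.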
Executable Files
An executable file is one that contains program code. Unlike a data file, program code can make changes to the
computer system. Most operating systems enforce permissions to restrict the right to run executable code to
administrator-level users.
■ msi—this is a Windows Installer file used to install and uninstall software applications under Windows.
Compression Formats
Often, to send or store a file it needs to be compressed in some way, to reduce the amount of space it takes up
on the storage media or the bandwidth required to send it over a network. There are a number of compression
utilities and formats.
■ zip—this format was developed for the PKZIP utility but is now supported "natively" by Windows, Mac OS X,
and Linux. "Natively" means that the OS can create and extract files from the archive without having to install a
third-party application.
■ tar—this was originally a UNIX format for writing to magnetic tape (tape archive) but is still used with gzip
compression (tgz or .tar.gz) as a compressed file format for UNIX, Linux, and macOS. A third-party utility is
required to create and decompress tar files in Windows.
■ 7z—this type of archive is created and opened using the open-source 7-Zip compression utility.
■ gz—this type of archive is created and opened by the gzip utility, freely available for UNIX and Linux computers.
A number of Windows third-party utilities can work with gzip-compressed files.
■ iso—this is a file in one of the formats used by optical media. The main formats are ISO 9660 (used by CDs)
and UDF (used by DVDs and Blu-Ray Discs). Many operating systems can mount an image file so that the
contents can be read through the file browser.
■ vhd/vmdk—these are disk image file formats used with Microsoft Hyper-V and VMware virtual machines
respectively. A disk image is a file containing the contents of a hard disk, including separate partitions and file
systems. Like an ISO, such a file can often be mounted within an OS so that the contents can be inspected via
the file browser.
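As a small illustration of the zip format described above, Python's standard zipfile module can create and read an archive entirely in memory. The file name and contents below are placeholders, and the helper names make_zip and read_zip are invented for this sketch.

```python
import io
import zipfile
from typing import Dict


def make_zip(files: Dict[str, bytes]) -> bytes:
    """Pack the given files into a compressed zip archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def read_zip(blob: bytes) -> Dict[str, bytes]:
    """Extract every file from an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


files = {"notes.txt": b"hello " * 1000}  # placeholder content
blob = make_zip(files)

print(len(files["notes.txt"]), "bytes before,", len(blob), "bytes in the archive")
print(read_zip(blob) == files)  # the original data is recovered exactly
```

Because zip compression is lossless, the extracted files are identical to the originals.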
A network is two or more computer systems linked together by some form of transmission medium that enables
them to share information. The network technology is what connects the computers, but the purpose of the
network is to provide services or resources to its users. These services may include access to shared files and
folders, printing, and database applications.
Network clients are computers and software that allow users to request resources shared by and hosted on
servers.
Networks of different sizes are classified in different ways. A network in a single location is often described as a
Local Area Network (LAN). This definition encompasses many different types and sizes of networks though. It
can include both residential networks with a couple of computers and enterprise networks with hundreds of
servers and thousands of workstations. Typically, most of the equipment and cabling used on a LAN is owned
and operated by the company or organization using the LAN.
Networks in different geographic locations but with shared links are called Wide Area Networks (WAN). A WAN
is more likely to make use of a service provider network. Companies that operate national telephone networks
are called telecommunications companies or telcos. Companies that specialize in providing Internet access are
called Internet Service Providers (ISP). Telcos operate as ISPs themselves but also make parts of their networks
available to smaller ISPs.
Network Media
A network is made by creating communications pathways between the devices on the network. Network
endpoints can be referred to as nodes or hosts. Communications pathways are implemented using an adapter
installed in the host to transmit and receive signals and network media between the interfaces to carry the
signals. There are two main types of local network connections:
■ Wired data connections use cabling and either electrical signals over copper wire or light signals over fiber
optic to connect nodes. Most local networks use a wired network standard called Ethernet to implement these
links.
■ Wireless (Wi-Fi) data connections use radio waves to transmit signals over the air. With Wi-Fi, a node usually
connects to an access point at a range of up to about 30m.
Wide area networks can also use copper or fiber optic cabling and various types of wireless networking, including
point-to-point radio, cellular radio, and satellite communications.
Network signals must be packaged in such a way that each host is able to understand them. Also, each host
must have a means of recognizing the location of other hosts on the network. These functions are provided by
a network protocol. A network protocol identifies each host on the network using a unique address. It also defines
a packet structure. A packet is a wrapper for each data unit transmitted over the network. A packet generally
consists of a header (indicating the protocol type, source address, destination address, error correction
information, and so on) and a payload (the data).
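The header-plus-payload idea can be sketched with Python's struct module. The layout below is hypothetical and much simpler than a real IP header: one byte for the protocol type, four bytes each for source and destination address, a CRC32 checksum (which detects errors but does not correct them), and a two-byte payload length.

```python
import struct
import zlib

# Hypothetical header layout: protocol, source, destination, checksum, length.
HEADER = struct.Struct("!B4s4sIH")


def build_packet(protocol: int, source: bytes, destination: bytes, payload: bytes) -> bytes:
    """Wrap the payload in a header and return the bytes to transmit."""
    header = HEADER.pack(protocol, source, destination, zlib.crc32(payload), len(payload))
    return header + payload


def parse_packet(packet: bytes):
    """Split a packet into header fields and payload, checking its integrity."""
    protocol, source, destination, checksum, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    if zlib.crc32(payload) != checksum:
        raise ValueError("payload failed its integrity check")
    return protocol, source, destination, payload


packet = build_packet(6, bytes([192, 168, 1, 10]), bytes([192, 168, 1, 20]), b"hello")
print(len(packet), "bytes on the wire")
print(parse_packet(packet))
```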
In an age when even your refrigerator is connected to the Internet, it’s important that you understand the basics
of networking, specifically, how the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols
works to provide the apps and services we increasingly rely on.
Packet Transmission
The original research underpinning TCP/IP was performed in the late 1960s and early 1970s by the Advanced
Research Projects Agency (ARPA), which is the research arm of the US Department of Defense (DoD). The
DoD wanted to build a network to connect a number of military sites. The prototype was a research network
called ARPANET, first operational in 1969. This connected four university sites using a system described as a
packet switching network.
Prior to this development, any two computers wanting to communicate had to open a direct channel, known as
a circuit. If this circuit was broken, the computers would stop communicating immediately. Packet switching
introduces the ability for one computer to forward information to another. To ensure information reaches the
correct destination, each packet is addressed with a source and destination address and then transferred using
any available pathway to the destination computer. A host capable of performing this forwarding function is called
a router.
A packet switching protocol is described as "robust" because it can automatically recover from communication
link failures. It re-routes data packets if transmission lines are damaged or if a router fails to respond. It can
utilize any available network path rather than a single, dedicated one.
The figure above shows an example of an internetworking system. A packet being sent from Network A to
Network D may be sent via Network C (the quickest route). If this route becomes unavailable, the packet is
routed using an alternate route (for example, A-F-E-D).
As well as the forwarding function and use of multiple paths, data is divided into small chunks or packets. Using
numerous, small packets means that if some are lost or damaged during transmission, it is easier to resend just
the small, lost packets than to re-transmit the entire message.
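The benefit of small packets can be sketched in Python. The example below is a simplified model, assuming each packet carries a sequence number (a detail not covered above) so that the receiver can tell which packets are missing and reassemble the rest in order.

```python
def split_into_packets(message, size):
    """Divide a message into numbered chunks of at most `size` bytes."""
    return {
        seq: message[start:start + size]
        for seq, start in enumerate(range(0, len(message), size))
    }


def missing_sequence_numbers(sent, received):
    """Work out which packets never arrived."""
    return sorted(set(sent) - set(received))


def reassemble(received):
    """Join the chunks back together in sequence order."""
    return b"".join(received[seq] for seq in sorted(received))


message = b"Packet switching splits data into small chunks."
sent = split_into_packets(message, 8)

# Simulate the loss of packet 2 during transmission.
received = {seq: chunk for seq, chunk in sent.items() if seq != 2}

lost = missing_sequence_numbers(sent, received)
print(lost)  # [2]

# Only the lost packet is resent, not the whole message.
for seq in lost:
    received[seq] = sent[seq]

print(reassemble(received) == message)  # True
```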
TCP/IP Protocol Suite Layers
The major benefit in utilizing TCP/IP is the wide support for the protocol. It is the primary protocol of the Internet
and the World Wide Web. It is also the primary protocol for many private internets, which are networks that
connect Local Area Networks (LANs) together.
As mentioned above, TCP/IP is a suite or set of network transport protocols. When considering network
technologies and protocols, it is helpful to conceive of them as working in layers. The TCP/IP model consists of
four layers, each with defined functions. At each layer are protocols within the TCP/IP suite, or its supporting
technologies, that make use of the protocols in the layer below and provide services to the protocols in the layer
above:
■ Link or Network Interface layer—responsible for putting frames onto the physical network. This layer does not
contain TCP/IP protocols as such. At this layer, different networking products and media can be used, such as
Ethernet or Wi-Fi. Communications on this layer take place only on a local network segment and not between
different networks. Data at the link layer is packaged in a unit called a frame.
■ Internet layer—encapsulates packets into Internet datagrams and deals with routing between different
networks. Three key protocols are used at this layer:
● Internet Protocol (IP)—the main protocol in the TCP/IP suite is responsible for logical addressing and
routing of packets between hosts and networks.
● Address Resolution Protocol (ARP)—used for hardware address resolution. Each host has a link or
network interface layer address, usually called the Media Access Control (MAC) address, to identify it on the
local physical network. To deliver packets, the logical IP address of the destination must be resolved to this
local MAC address using ARP.
● Internet Control Message Protocol (ICMP)—sends messages and reports on errors regarding packet
delivery.
■ Transport layer—these protocols provide communication sessions between computers. Each application
protocol is identified at the transport layer by a port number. There are two transport protocols:
● Transmission Control Protocol (TCP) provides connection-oriented delivery. This means that the delivery
is reliable and that packets are delivered in the correct sequence.
● User Datagram Protocol (UDP) provides connectionless delivery – there is no guarantee that packets
will arrive in the correct sequence. Any issues arising from the unreliable nature of UDP must be dealt with at
the application layer. The advantage of UDP is that there is less overhead involved in processing and transmitting
each packet and so it is faster than TCP.
■ Application layer—the top level of the architecture contains protocols that provide the communications formats
for exchanging data between hosts, such as transmitting an email message or requesting a web page.
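The way each layer uses the layer below can be sketched as nested wrappers. The following Python example is a simplified model, not a real packet format: the port numbers, IP addresses, and MAC addresses are placeholder values chosen for illustration.

```python
def encapsulate(data):
    """Wrap application data in a transport segment, an IP packet, and a frame."""
    # Transport layer: port numbers identify the application protocol.
    segment = {"src_port": 49152, "dst_port": 80, "data": data}
    # Internet layer: logical IP addresses identify the hosts.
    packet = {"src_ip": "192.0.2.10", "dst_ip": "198.51.100.20", "payload": segment}
    # Link layer: MAC addresses identify interfaces on the local segment.
    frame = {
        "src_mac": "00:60:8c:12:3a:bc",
        "dst_mac": "00:60:8c:aa:bb:cc",
        "payload": packet,
    }
    return frame


def decapsulate(frame):
    """Unwrap each layer in turn to recover the application data."""
    packet = frame["payload"]
    segment = packet["payload"]
    return segment["data"]


frame = encapsulate(b"request for a web page")
print(frame["payload"]["payload"]["dst_port"])  # 80
print(decapsulate(frame))                       # b'request for a web page'
```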
Internet Protocol
The Internet Protocol (IP) is the primary protocol responsible for the forwarding function we defined above. It
provides packet delivery for all higher-level protocols within the suite. It provides best-effort delivery, of an
unreliable and connectionless nature, between hosts on a local network or within an internetwork.
Delivery is not guaranteed, and a packet might be lost, delivered out of sequence, duplicated, or delayed.
IP Packet Structure
At the IP layer, any information received from the transport layer is wrapped in a datagram. The transport layer
datagram is the payload and IP adds a number of fields in a header to describe the payload and how to deliver
it:
Once the fields have been added, the IP datagrams are packaged into a suitable frame format and delivered
over the local network segment.
IP Addresses
As you can see from the fields in the datagram, an IP address is used to logically identify each device (host) on
a given network. An IP address is a 32-bit binary value. To make this value easier to enter in configuration
dialogs, it is expressed as four decimal numbers separated by periods: 172.30.15.12 for instance. Each number
represents a byte value, that is, an eight-character binary value, also called an octet, or a decimal value between
0 and 255. This is referred to as dotted decimal notation.
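Recall that you can convert between binary and decimal by setting out the place value of each binary digit
(128, 64, 32, 16, 8, 4, 2, 1). For example, 172 is 128 + 32 + 8 + 4, so only the bits with those place values are
set, giving 10101100 in binary.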
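The place-value method can be sketched in Python. The function names here are made up for the example; the sample address is the one used earlier in this section.

```python
PLACE_VALUES = [128, 64, 32, 16, 8, 4, 2, 1]


def octet_to_binary(value):
    """Convert a decimal value (0 to 255) to eight binary digits."""
    if not 0 <= value <= 255:
        raise ValueError("an octet must be between 0 and 255")
    bits = ""
    for place in PLACE_VALUES:
        if value >= place:
            bits += "1"      # this place value is part of the number
            value -= place
        else:
            bits += "0"
    return bits


def dotted_decimal_to_binary(address):
    """Convert each octet of a dotted decimal address to binary."""
    return ".".join(octet_to_binary(int(part)) for part in address.split("."))


print(octet_to_binary(172))
# 10101100
print(dotted_decimal_to_binary("172.30.15.12"))
# 10101100.00011110.00001111.00001100
```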
An IP address encodes two pieces of information:
■ The network number (network ID)—this number is common to all hosts on the same IP network.
■ The host number (host ID)—this unique number identifies a host on a particular network or logical subnetwork.
In order to distinguish the network ID and host ID portions within an address, each host must also be configured
with a network prefix length or subnet mask. This is combined with the IP address to determine the identity of
the network to which the host belongs.
The network prefix is also a 32-bit number. It contains a contiguous series of binary ones where the matching bit
of the IP address is a part of the network ID. The rest of the mask is zeroes and represents the host ID bits in
the IP address. For example, the prefix /8 would contain eight binary ones followed by 24 binary zeros. The prefix
could also be expressed as a subnet mask by converting it to dotted decimal (255.0.0.0).
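The relationship between a prefix length, a subnet mask, and the network ID can be checked with Python's standard ipaddress module. The helper names are invented for this sketch, and the /16 example is an assumption added alongside the /8 prefix from the text.

```python
import ipaddress


def prefix_to_mask(prefix_length):
    """Build a mask of `prefix_length` binary ones followed by zeros."""
    mask_bits = (0xFFFFFFFF << (32 - prefix_length)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(mask_bits))


def network_id(address, prefix_length):
    """Combine an address with its prefix to find the network it belongs to."""
    interface = ipaddress.IPv4Interface(f"{address}/{prefix_length}")
    return str(interface.network.network_address)


print(prefix_to_mask(8))                 # 255.0.0.0
print(prefix_to_mask(16))                # 255.255.0.0
print(network_id("172.30.15.12", 8))     # 172.0.0.0
print(network_id("172.30.15.12", 16))    # 172.30.0.0
```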
Packet Delivery and Forwarding
The Internet Protocol (IP) covers addressing and forwarding at a "logical" level between networks with distinct
IDs (network layer). Actual delivery of information takes place at the lower physical/data link layer. The IP
datagram is put into a frame. Frames can only be delivered over a local network segment.
MAC Addresses
Frames use a different addressing method than IP. At the data link layer, each host is identified by the address
of its network interface. This is called a hardware address or a Media Access Control (MAC) address. The MAC
address is assigned to the network adapter at the factory. It is a 48-bit value expressed in hex notation. It is often
displayed as six groups of two hexadecimal digits with colon or hyphen separators or no separators at all (for
example, 00:60:8c:12:3a:bc or 00608c123abc) or as three groups of four hex digits with period separators
(0060.8c12.3abc).
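Because the same MAC address can be written in several ways, software often converts it to one form before comparing values. The following Python sketch shows one possible approach; choosing colon-separated lowercase as the common form is an arbitrary decision made for this example.

```python
import re


def normalize_mac(text):
    """Return a MAC address as six colon-separated pairs of hex digits."""
    digits = re.sub(r"[:.\-]", "", text).lower()   # strip any separators
    if not re.fullmatch(r"[0-9a-f]{12}", digits):  # 12 hex digits = 48 bits
        raise ValueError(f"not a valid MAC address: {text!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


for written in ("00:60:8c:12:3a:bc", "00608c123abc", "0060.8c12.3abc"):
    print(normalize_mac(written))  # 00:60:8c:12:3a:bc each time
```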
If two systems are to communicate using IP, the host sending the packet must map the IP address of the
destination host to the hardware address of the destination host. The Address Resolution Protocol (ARP) is the
protocol that enables this process of local address discovery to take place. Hosts broadcast ARP messages onto
the local network to find out which host MAC address "owns" a particular IP address. If the destination host
responds, the frame can be delivered. Hosts also cache IP:MAC address mappings for several minutes to reduce
the number of ARP messages that have to be sent.
Routing
If the destination IP address is a local one (with the same network ID as the source), the host uses ARP
messaging to discover the local destination host. If the network IDs are different, the sending host uses ARP
messaging to discover a router on the local segment (its default gateway) and uses that to forward the packet.
The router forwards the packet to its destination (if known), possibly via intermediate routers.
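The sending host's decision can be sketched in Python using the standard ipaddress module. The function name and all of the addresses are assumptions for illustration. The function returns the IP address that the host would need to resolve with ARP: either the destination itself or the default gateway.

```python
import ipaddress


def next_hop(source, prefix_length, destination, default_gateway):
    """Decide which local IP address the frame should be delivered to."""
    local_network = ipaddress.IPv4Interface(f"{source}/{prefix_length}").network
    if ipaddress.IPv4Address(destination) in local_network:
        return destination       # same network ID: deliver directly
    return default_gateway       # different network ID: hand off to the router


# Hypothetical host 192.168.1.10/24 with a default gateway of 192.168.1.1.
print(next_hop("192.168.1.10", 24, "192.168.1.20", "192.168.1.1"))
# 192.168.1.20
print(next_hop("192.168.1.10", 24, "203.0.113.5", "192.168.1.1"))
# 192.168.1.1
```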
DNS and URLs
As we have seen, network addressing uses 48-bit MAC values at the data link layer and 32-bit IP addresses at
the network layer. Computers can process these numbers easily, but they are very difficult for people to
remember or type correctly.
People find it much easier to address things using simple names. Consequently, there are protocols to assign
names to hosts and networks and to convert these names into IP addresses. The name resolution protocol used
with the TCP/IP suite is called the Domain Name System (DNS).
The Domain Name System (DNS) is a hierarchical, client/server-based distributed database name management
system. The purpose of the DNS database is to resolve resource names to IP addresses. In the DNS, the clients
are called resolvers and the servers are called name servers. The DNS database is distributed because no one
DNS server holds all possible DNS records. This would be far too much information for a single server to store.
Instead, the hierarchical nature of the DNS namespace enables DNS servers to query one another for the
appropriate record.
The namespace is structured like an inverted tree, starting at the root, and working down. Below the root is a
set of Top Level Domains (TLD) that define broad classes of entities (.com versus .gov, for instance) or national
authorities (.uk versus .ca, for instance). Within the TLDs, entities such as companies, academic institutions,
non-profits, governments, or even individuals can all register individual domains. An organization may also create
sub-domains to represent different parts of a business. Domains and subdomains contain resource records.
These records contain the host name to IP address mapping information used to resolve queries.
Any computer holding records for a part of the namespace is said to be a name server. Name servers that
contain the requested resource records for a particular namespace are said to be authoritative. If they are not
authoritative for a namespace, they will have pointers to other name servers which might be authoritative.
Resolvers are software programs running on client computers. For example, name resolution is a critical part of
web browsing, so web browser software will implement a resolver.
Hostnames and Fully Qualified Domain Names
A hostname is just the name given to an IP host. Under DNS rules, a single label such as a hostname can contain up
to 63 alphanumeric characters (plus the hyphen), though most hostnames are much shorter. The hostname can be
combined with information about the domain in which the host is located to produce a Fully Qualified Domain
Name (FQDN). For example, if www is a host name, then the FQDN of the host www within the comptia.org
domain is www.comptia.org.
In the graphic below, a client needs to establish a session with the www.comptia.org web server.
1) The resolver (client) sends a recursive DNS query to its local DNS server asking for the IP address of
www.comptia.org. The local name server checks its DNS data corresponding to the requested domain name.
2) If the local name server has no matching record, it sends an iterative query for www.comptia.org to a root
name server.
3) The root name server has authority for the root domain and will reply with the IP address of a name server for
the .org top level domain.
4) The local name server sends an iterative query for www.comptia.org to the .org name server.
5) The .org name server doesn't have a resource record for www.comptia.org but it can provide the IP address of
the name server responsible for the comptia.org domain.
6) The local name server now queries the comptia.org name server for the IP address of www.comptia.org.
7) The comptia.org name server replies with the IP address corresponding to the FQDN www.comptia.org.
8) The local name server sends the IP address of www.comptia.org back to the original resolver.
Note how each query brings the local name server closer to the IP address of www.comptia.org.
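The chain of referrals can be modeled with a small Python simulation. This sketch does not contact any real DNS servers: the server names, the table of records, and the address 192.0.2.10 (taken from a range reserved for documentation) are all made up for the example.

```python
# Hypothetical name servers. Each maps the names it knows either to an
# answer ("A", IP address) or to a referral ("NS", next server to ask).
NAME_SERVERS = {
    "root": {"org": ("NS", "org-server")},
    "org-server": {"comptia.org": ("NS", "comptia-server")},
    "comptia-server": {"www.comptia.org": ("A", "192.0.2.10")},
}


def lookup(server, fqdn):
    """Return the record for the longest part of the name this server knows."""
    labels = fqdn.split(".")
    for start in range(len(labels)):
        suffix = ".".join(labels[start:])
        if suffix in NAME_SERVERS[server]:
            return NAME_SERVERS[server][suffix]
    raise LookupError(f"{server} has no record for {fqdn}")


def resolve(fqdn):
    """Follow referrals from the root until a server returns an address."""
    server = "root"
    path = [server]
    while True:
        kind, value = lookup(server, fqdn)
        if kind == "A":
            return value, path   # authoritative answer found
        server = value           # referral: ask the next name server
        path.append(server)


address, path = resolve("www.comptia.org")
print(path)     # ['root', 'org-server', 'comptia-server']
print(address)  # 192.0.2.10
```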
Uniform Resource Locators (URL)
When a web browser is used to request a resource from a web server, the request must have some means of
specifying the location of the web server and the resource on the web server that the client wants to retrieve.
This information is provided as a Uniform Resource Locator (URL).
The URL (or web address) contains the information necessary to identify and (in most cases) access an item.
1) Protocol—this describes the access method or service type being used. URLs can be used for protocols other
than HTTP/HTTPS. The protocol is followed by the characters ://
2) Host location—this could be an IP address, but as IP addresses are very hard for people to remember, it is
usually represented by a Fully Qualified Domain Name (FQDN). DNS allows the web browser to locate the IP
address of a web server based on its FQDN.
3) File path—specifies the directory and file name location of the resource, if required. Each directory is delimited
by a forward slash. The file path may or may not be case-sensitive, depending on how the server is configured.
If no file path is used, the server will return the default (home) page for the website.
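The three parts of a URL can be picked out with Python's standard urllib.parse module. The function name and the file path in the sample URL are hypothetical. Returning "/" when no path is given is a choice made in this sketch to represent the request for the default page.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into its protocol, host location, and file path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "path": parts.path or "/",   # no path means the default (home) page
    }


print(describe_url("https://www.comptia.org/training/index.html"))
# {'protocol': 'https', 'host': 'www.comptia.org', 'path': '/training/index.html'}
print(describe_url("http://www.comptia.org"))
# {'protocol': 'http', 'host': 'www.comptia.org', 'path': '/'}
```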
The protocols we have discussed so far all involve supporting communications with addressing formats and
forwarding mechanisms. At the application layer, there are protocols that support services, such as publishing,
e-commerce, or messaging. The TCP/IP suite encompasses a large number and wide range of application layer
protocols. Some of the principal protocols amongst these are discussed below.
Hypertext Transfer Protocol (HTTP) is the basis of the World Wide Web. HTTP enables clients (typically web
browsers) to request resources from an HTTP server. A client connects to the HTTP server using its TCP port
(the default is port 80) and submits a request for a resource using a Uniform Resource Locator (URL). The server
acknowledges the request and returns the data.
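The exchange is plain text, which can be illustrated without opening a network connection. The Python sketch below only builds the bytes a client might send and reads the first line of a sample reply. The function names and the sample response are assumptions for illustration, and a real client would send the request over a TCP connection to port 80.

```python
def build_get_request(host, path="/"):
    """Compose a minimal HTTP/1.1 request for a resource."""
    lines = [
        f"GET {path} HTTP/1.1",   # method, resource, protocol version
        f"Host: {host}",          # the server name taken from the URL
        "Connection: close",
        "",                       # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Extract the status code and reason from the first line of a reply."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return int(code), reason


print(build_get_request("www.comptia.org"))
# b'GET / HTTP/1.1\r\nHost: www.comptia.org\r\nConnection: close\r\n\r\n'

sample_response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
print(parse_status_line(sample_response))
# (200, 'OK')
```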
To run a website, an organization will typically lease a server or space on a server from an ISP. Larger
organizations with their own Internet Point of Presence may host websites themselves. Web servers are not only
used on the Internet, however. Private networks using web technologies are described as intranets (if they permit
only local access) or extranets (if they permit remote access).
HTTP is usually used to serve HTML web pages, which are plain text files with coded tags (HyperText Markup
Language) describing how the page should be formatted. A web browser can interpret the tags and display the
text and other resources associated with the page, such as picture or sound files. Another powerful feature of HTML is its
ability to provide hyperlinks to other related documents. HTTP also features forms mechanisms (GET and POST)
whereby a user can submit data from the client to the server.
The functionality of HTTP servers is often extended by support for scripting and programmable features (web
applications).
SSL/TLS
One of the critical problems for the provision of early e-commerce sites was the lack of security in HTTP. Under
HTTP, all data is sent unencrypted and there is no authentication of client or server. Secure Sockets Layer (SSL)
was developed by Netscape and released as version 3.0 in 1996 to address these problems. SSL proved very
popular with the industry but has since been deprecated. Transport Layer Security (TLS) was developed from SSL
and ratified as a standard by IETF. TLS is now the version in active development, with 1.3 (published in 2018) as
the latest version.
SSL/TLS is closely associated with the HTTP application protocol, a combination referred to as HTTPS (HTTP over
SSL or HTTP Secure), but it can also be used to secure other TCP/IP application protocols.
Essentially, a server is assigned a digital certificate by some trusted Certificate Authority. The certificate proves
the identity of the server, assuming that the client trusts the Certificate Authority. The server uses the digital
certificate and the SSL/TLS protocol to encrypt communications between it and the client. This means that the
communications cannot be read or changed by a third party.
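On the client side, the certificate check can be sketched with Python's standard ssl module. The function names are invented for this example, and the choice to refuse anything older than TLS 1.2 is an assumption rather than a requirement from the text. The second function needs Internet access and is defined but not called here.

```python
import socket
import ssl


def make_client_context():
    """Create settings that verify the server's certificate and host name."""
    context = ssl.create_default_context()            # trusts the system's CAs
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed policy choice
    return context


def fetch_certificate_subject(host, port=443):
    """Connect to a server and return the subject of its certificate."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw_socket:
        # The handshake fails if the certificate is not trusted or does
        # not match the requested host name.
        with context.wrap_socket(raw_socket, server_hostname=host) as tls_socket:
            return tls_socket.getpeercert().get("subject")


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```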
Email is a messaging system that can be used to transmit text messages and binary file attachments encoded
using Multipurpose Internet Mail Extensions (MIME). Email can involve the use of multiple protocols. The
following process illustrates how an email message is sent from a typical corporate mail gateway, using the
Microsoft Exchange mail server, to a recipient with dial-up Internet access:
1) The email client software on the sender's computer ([email protected]) sends the message to the
Exchange email server using Microsoft's MAPI (Message Application Programming Interface) protocol. The mail
server puts the message in a queue, waiting for the next Simple Mail Transfer Protocol (SMTP) session to be
started.
2) When the Exchange SMTP server starts to process the queue, it first contacts a DNS server to resolve the
recipient's address (for example, [email protected]) to an IP address for the othercompany.com
email server, listed as an MX (Mail Exchanger) record in DNS.
3) It then uses SMTP to deliver the message to this email server. The delivery usually requires several "hops,"
from the mail gateway to the sender's Internet Service Provider (ISP), then to the recipient's ISP. The hops taken
by a message as it is delivered over the Internet are recorded in the message header.
4) The message is held in the recipient's mailbox on that email server until the recipient's email client connects
and retrieves it using either the Post Office Protocol (POP3) or the Internet Message Access Protocol (IMAP).
When using POP3, the messages are usually deleted from the server when they are downloaded, though some
clients have the option to leave them on the server. IMAP supports permanent connections to a server and
connecting multiple clients to the same mailbox simultaneously. It also allows a client to manage the mailbox on
the server, to organize messages in folders and control when they are deleted for instance, and to create multiple
mailboxes.
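Whichever protocol retrieves it, the message itself is a MIME document. The Python sketch below uses the standard email package to build a message with a binary attachment and then inspects how the attachment is encoded. The addresses, subject, and file name are placeholders, not values from this module.

```python
from email.message import EmailMessage

message = EmailMessage()
message["From"] = "sender@example.com"      # placeholder addresses
message["To"] = "recipient@example.org"
message["Subject"] = "Monthly report"
message.set_content("The report is attached.")

# Attaching binary data turns the message into a multipart MIME message.
message.add_attachment(
    b"\x00\x01\x02\x03",
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

print(message.get_content_type())  # multipart/mixed

attachment = next(message.iter_attachments())
print(attachment.get_filename())                  # data.bin
print(attachment["Content-Transfer-Encoding"])    # base64
```

Base64 encoding represents the binary file using only text characters, so that it can travel through mail systems designed for text.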
Configuring Email
To configure an email account, you need the username, password, and default email address, plus incoming
and outgoing server addresses and protocol types from the ISP.
Configuring an email account—the incoming server is either POP3 or IMAP while the outgoing server is SMTP.
Internet email addresses follow another URL scheme (mailto). An Internet email address comprises two parts:
the user name (local part) and the domain name, separated by an @ symbol. The domain name may refer to a
company or an ISP. For example, [email protected] or [email protected].
Different mail systems have different requirements for allowed and disallowed characters in the local part. The
local part is supposed to be case-sensitive, but most mail systems do not treat it as such. An incorrectly
addressed email will be returned with a message notifying that it was undeliverable. Mail may also be rejected if
it is identified as spam or if there is some other problem with the user mailbox, such as the mailbox being full.
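Splitting an address into its two parts can be sketched in Python. The function name and sample address are hypothetical. The sketch leaves the local part unchanged, since it may be case-sensitive, and lowercases the domain, since domain names are not case-sensitive. It does not check which characters a particular mail system allows.

```python
def split_email_address(address):
    """Separate an email address into its local part and domain name."""
    # Split on the last @ symbol in the address.
    local_part, separator, domain = address.rpartition("@")
    if not separator or not local_part or not domain:
        raise ValueError(f"not a valid email address: {address!r}")
    return local_part, domain.lower()


print(split_email_address("User.Name@Example.ORG"))
# ('User.Name', 'example.org')
```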