LPR Week 4 - Digiruns
• The program that allocates the system resources and coordinates all the details of the computer's internals is called the operating system, or the kernel.
• Users communicate with the kernel through a program known as the shell. The shell is a command-line interpreter; it translates commands entered by the user into a language that the kernel understands.
• The Unix operating system comprises three parts: the kernel, the standard utility programs, and the system configuration files.
LINUX VS UNIX
| Aspect | UNIX | LINUX |
|---|---|---|
| Definition | A family of multitasking, multi-user computer operating systems derived from the original AT&T Unix | A family of free and open-source software operating systems built around the Linux kernel |
| Source code | Not available to the general public | Available to the general public |
| Usage | Servers, workstations, mainframes and high-end computers | Personal computers and desktops; also game development, embedded systems, etc. |
| Portability | Not portable | Portable; can be executed on various hardware |
| Variants | Solaris, HP-UX, BSD, AIX, etc. | Ubuntu, Fedora, Red Hat, CentOS, Debian, etc. |
| File systems | zfs, jfs, hfs, gpfs, xfs, vxfs | xfs, ramfs, nfs, vfat, cramfs, ext, ext2, ext3, ext4, ufs, autofs, devpts, ntfs |
| Hardware | Installation requires more sophisticated high-end hardware | Does not require specific hardware components |
| Cost | Expensive | Free |
What is Linux?
• Linux is a free and open-source operating system made up of several software components. Together they manage hardware resources and let users carry out tasks such as accessing the web or editing a text file in a text editor.
o A Linux distribution, often shortened to Linux distro, is an operating system compiled from
components developed by various open source projects and programmers.
o Each distribution includes the Linux kernel (the foundation of the operating system), the
GNU shell utilities (the terminal interface and commands), the X server (for a graphical
desktop), the desktop environment, a package management system, an installer and other services.
• Group: A user group can contain multiple users. All users belonging to a group have the same Linux group permissions on the file.
• Other: Any other user who has access to a file. This person has neither created the file nor belongs to a user group that could own the file. Practically, it means everybody else.
B. PERMISSION
The permission system in Linux defines what each user is allowed to do with a file. Every file and directory in your UNIX/Linux system has the following 3 permissions defined for all 3 owners:
• Read: This permission gives an individual the authority to open and read a file. Read permission on a directory gives the ability to list its contents.
• Write: This permission gives an individual the authority to modify the contents of a file. Write permission on a directory gives the authority to add, remove and rename files stored in the directory.
• Execute: This permission allows a file, such as a program or script, to be run. On a directory, it allows entering the directory (for example with cd).
• In the absolute (numeric) mode, file permissions are not represented as characters but as a three-digit octal number.
• In the symbolic mode, you can modify permissions of a specific owner. It makes use of
mathematical symbols to modify the Unix file permissions.
Operator Description
+ Adds a permission to a file or directory
- Removes the permission
User Denotations
u user/owner
g group
o other
a all
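To make the octal and symbolic notations above concrete, here is a small Python sketch that converts an ls-style permission string such as rwxr-xr-- into its octal value (r, w and x contribute 4, 2 and 1 to their owner's digit) and applies chmod-style clauses such as u+x or go-w. The helper names are hypothetical; only the + and - operators from the table are handled, and an owner letter is always required (real chmod treats an omitted owner as a, filtered by the umask).

```python
import re
import stat

# Value of r, w and x inside one owner's octal digit.
PERM_BITS = {"r": 4, "w": 2, "x": 1}
# Bit shift of each owner's digit: user is the leftmost octal digit.
SHIFTS = {"u": 6, "g": 3, "o": 0}


def symbolic_to_octal(perms: str) -> int:
    """Convert a 9-character string such as 'rwxr-xr--' to its octal mode."""
    if len(perms) != 9:
        raise ValueError("expected 9 characters, e.g. 'rwxr-xr--'")
    mode = 0
    for i, owner in enumerate("ugo"):
        for ch in perms[i * 3:(i + 1) * 3]:
            if ch != "-":
                mode |= PERM_BITS[ch] << SHIFTS[owner]
    return mode


def apply_symbolic(mode: int, spec: str) -> int:
    """Apply chmod-style clauses like 'u+x,go-w' (only + and - are handled)."""
    for clause in spec.split(","):
        m = re.fullmatch(r"([ugoa]+)([+-])([rwx]+)", clause)
        if m is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        who, op, what = m.groups()
        owners = "ugo" if "a" in who else who
        bits = 0
        for owner in owners:
            for ch in what:
                bits |= PERM_BITS[ch] << SHIFTS[owner]
        # '+' sets the selected bits, '-' clears them.
        mode = mode | bits if op == "+" else mode & ~bits
    return mode


if __name__ == "__main__":
    print(oct(symbolic_to_octal("rwxr-xr--")))      # 0o754
    print(oct(apply_symbolic(0o644, "u+x,go-r")))   # 0o700
    print(stat.filemode(stat.S_IFREG | 0o754))      # -rwxr-xr--
```

The resulting integer can be passed to os.chmod(path, mode) to apply it to a real file.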
III. CRONTAB
• Cron is a system process that automatically performs tasks according to a specific schedule.
• Crontab, short for "cron table", refers to that schedule and is also the name of the program used to edit it.
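A crontab line has five time fields (minute, hour, day of month, month, day of week) followed by the command; for example, 30 2 * * 1 /path/to/backup.sh would run a hypothetical backup script at 02:30 every Monday, and crontab -e opens your schedule for editing. To make the matching rule concrete, here is a simplified Python sketch of how such an expression is checked against a given minute. cron_matches is a hypothetical helper, not cron's own code: it handles *, single values, ranges, lists and */N steps, treats the two day fields as an AND (real cron uses OR when both are restricted), and does not accept 7 for Sunday or month/day names.

```python
from datetime import datetime

# Allowed ranges, in crontab field order: minute hour day-of-month month day-of-week.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def parse_field(field: str, lo: int, hi: int) -> set:
    """Expand one cron field ('*', '5', '1-5', '*/15', '1,15') into a set of ints."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            start, end = (int(x) for x in part.split("-"))
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if a 5-field cron expression fires at the given minute."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: min hour dom month dow")
    allowed = [parse_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES)]
    # Python's weekday() counts Monday as 0; cron counts Sunday as 0.
    cron_dow = (when.weekday() + 1) % 7
    actual = [when.minute, when.hour, when.day, when.month, cron_dow]
    return all(value in allowed_set for value, allowed_set in zip(actual, allowed))


if __name__ == "__main__":
    # 1 January 2024 was a Monday.
    print(cron_matches("30 2 * * 1", datetime(2024, 1, 1, 2, 30)))  # True
```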
The Linux file system is organized as a hierarchy of directories (folders), and each of these directories is used to store a specific type of data.
/bin directory: "bin" is short for binaries. It contains the binary (executable) files for applications.
/sbin directory: It contains all the system binaries that a system administrator will use.
/boot directory: It contains all of the files which an operating system needs to boot.
/dev directory: All the devices of the computer will be listed here.
/etc directory: This folder is where all of your system-wide configuration files, such as apt's sources.list, are stored.
/lib32 and /lib64 directories: These are where the libraries are stored.
/mnt and /media directories: "mnt" stands for mount. Devices that are mounted manually are shown here. The /media directory contains removable devices such as USB drives and CD-ROMs.
/opt directory: "opt" stands for optional. It is where proprietary programs like Google Chrome, TeamViewer, etc. are stored.
/root directory: "/" and "/root" are two different things. "/root" is the home directory of the root user (the user with root privileges). A user needs root permission to access the "/root" directory.
/srv directory: This directory is used when a server is running on the system; the data for those services is stored and accessed here.
/sys directory: This directory is used to interact with the kernel. The data available here is not
physically written to the disk. It is created every time the system boots up.
/tmp directory: It contains all the temporary files. All the applications which are running on the
system use this directory to store temporary data.
/usr directory: Most user applications and their supporting files reside here. The name is sometimes expanded as "Unix System Resources".
/var directory: var is a variable directory. It stores information related to the system log files, system
crash reports, etc.
/home directory: This is where the home directories of all users, containing their files and folders, are located.
Linux Disks & File Systems
• Linux, like UNIX, uses file systems to represent other objects that the system
needs to operate.
• The first part of this two-part implementation is the Linux virtual filesystem which provides a single set
of commands for the kernel, and developers, to access all types of filesystems.
• The filesystem-specific device drivers are the second part of the implementation. The device driver
interprets the standard set of filesystem commands to ones specific to the type of filesystem on the
partition or logical volume.
A. Disks
• A disk can be, and often is, divided into partitions. Each partition can then be assigned a
different purpose.
• Disk partitions are displayed in the first column of output from the df command.
An entry such as /dev/hda1 is the file representing partition 1 on the first hard drive (hda). Depending on
the hardware configuration, hard disks may also appear as sda, sdb, and so on, or with a higher
letter designation, e.g. hde or higher.
• Commands to check disks:
o du command: It is used to check the disk usage of files and directories on a system. By default
it lists each directory with its size (add -a to include every file), and sizes are given in kilobytes.
o df command: The df command stands for "disk-free," and shows available and used
disk space on the Linux system.
o fdisk -l command: It shows disk size along with disk partitioning information.
o parted -l command: parted is a command-line tool that helps manage hard disk partitions.
The parted -l command lists hard disk partitions.
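Two of these checks have standard-library counterparts in Python, which can be handy inside monitoring scripts. This is an approximation, not a reimplementation: shutil.disk_usage reports roughly the total/used/free figures that df shows for a mount point, while du_like (a hypothetical helper) sums the apparent sizes of regular files, whereas du counts allocated blocks, includes directory entries and counts hard-linked files only once.

```python
import os
import shutil


def df_like(path: str = "/") -> dict:
    """Total/used/free bytes for the file system that holds `path` (similar to df)."""
    usage = shutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}


def du_like(path: str) -> int:
    """Sum the apparent sizes of regular files under `path` (rough du -sb analogue)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks so the size of their targets is not counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    print(df_like("/"))
    print(du_like("."), "bytes in the current directory tree")
```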
B. File Systems
• A file system is a set of methods to access and organize storage.
- Ext2 is the first Linux file system that allows managing two terabytes of data.
- Ext4 is the fastest of the Ext file systems. It is a very compatible option for SSD
(solid-state drive) disks, and it is the default file system in many Linux distributions.
2. JFS File System
- JFS stands for Journaled File System
- It is an alternative to the Ext file system. It can also be used in place of Ext4,
where stability is needed with few resources. It is a handy file system
when CPU power is limited.
- The Linux kernel is the main component of a Linux operating system and is the core interface between
a computer’s hardware and its processes.
- The kernel has 4 jobs:
a. Memory management: Keep track of how much memory is used to store what, and
where.
b. Process management: Determine which processes can use the central processing unit
(CPU), when, and for how long.
d. System calls and security: Receive requests for service from the processes.
- It is very important to learn the booting process to understand the working of any operating system.
• The stages of the Linux boot process are:
1. The machine's BIOS or boot firmware loads and runs a boot loader.
2. The boot loader finds the kernel image on the disk, loads it into memory, and starts the system.
7. At some point, init starts a process that allows you to log in, usually at or near the end of the
boot sequence.
System Logging
Why Is It Important?
- In Linux, system logs are human-readable records of the core system activities performed by services,
daemons, and system applications, such as user logins and login failures, operating system booting,
and system failures.
- Syslog is specifically responsible for creating logs via the System Logger. Syslog comprises several
components, such as the Syslog Message Format, the Syslog Protocol, and the Syslog Daemon.
- The /var/log directory stores most of the logs on a Linux system. The /var directory mostly contains
variable files and directories, i.e. data that is bound to change often.
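The Syslog Message Format begins every message with a priority value in angle brackets, computed as facility * 8 + severity (RFC 3164 and RFC 5424). The sketch below encodes and decodes that value; the helper names are hypothetical, and the test message is the example from RFC 3164, where 34 means facility 4 (security/authorization) and severity 2 (critical).

```python
import re

# Severity keywords as used in syslog configuration files; index = numeric code.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def encode_pri(facility: int, severity: int) -> int:
    """PRI value that appears as <PRI> at the start of a syslog message."""
    return facility * 8 + severity


def decode_pri(message: str) -> tuple:
    """Split '<34>Oct 11 ...' into (facility, severity keyword, rest of message)."""
    m = re.match(r"<(\d{1,3})>(.*)", message, re.DOTALL)
    if m is None:
        raise ValueError("message does not start with a <PRI> field")
    pri = int(m.group(1))
    if pri > 191:  # 23 facilities * 8 + 7 is the largest valid value
        raise ValueError("PRI out of range")
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity], m.group(2)


if __name__ == "__main__":
    print(decode_pri("<34>Oct 11 22:14:15 mymachine su: 'su root' failed"))
```

To actually hand messages to the local syslog daemon, Python's standard syslog module (Unix only) or logging.handlers.SysLogHandler can be used; they build the priority value for you.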
System Time
Why Is It Important?
- In Linux, time is measured in seconds since the Unix epoch. The Unix epoch is January 1, 1970 UTC.
This means that the time on your Linux machine is always relative to this date.
- Checking the time on your Linux machine is important for a number of reasons. For example, if you
are running a cron job, you will want to make sure that the time is set correctly so that the job will
run at the correct time.
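Because Linux time is a count of seconds since the epoch, converting between timestamps and calendar dates is plain arithmetic. A minimal Python sketch (the helper names are hypothetical): one day after the epoch is 86,400 seconds, and time.time() returns the current value of that counter.

```python
import time
from datetime import datetime, timezone

# The Unix epoch as a timezone-aware datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def seconds_since_epoch(moment: datetime) -> int:
    """Whole seconds between the Unix epoch and a timezone-aware datetime."""
    return int((moment - EPOCH).total_seconds())


def from_epoch(seconds: float) -> datetime:
    """Convert a Unix timestamp back into a UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    now = time.time()  # current time as seconds since the epoch (float)
    print(int(now), from_epoch(now).isoformat())
```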
- There are a few different ways to check the time on your Linux machine:
a) Use the date command.
It displays the current date and time in a human-readable format. The output can be customized with format specifiers such as:
- %a: Displays the abbreviated weekday name (Sun to Sat).
- %A: Displays the full weekday name (Sunday to Saturday).
- %b: Displays the abbreviated month name (Jan to Dec).
- %h: Same as %b, the abbreviated month name (Jan to Dec).
- %B: Displays the full month name (January to December).
- %d: Displays the day of the month (01 to 31).
- %m: Displays the month of the year (01 to 12).
- %y: Displays the last two digits of the year (00 to 99).
- %Y: Displays the four-digit year.
- %D: Displays the date as mm/dd/yy.
- %H: Displays the hour (00 to 23).
- %M: Displays the minute (00 to 59).
- %S: Displays the seconds (00 to 60).
- %T: Displays the time in 24-hour format as HH:MM:SS.
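In the shell these specifiers follow a plus sign, for example date "+%D %T" or date +%Y-%m-%d. They come from C's strftime, so Python's datetime.strftime accepts the same codes; the sketch below formats a fixed moment so the results are predictable. It sticks to portable codes: %D, %T and %h work with glibc on Linux but are not guaranteed on every platform Python runs on.

```python
from datetime import datetime

# A fixed moment so the output is predictable: Monday 1 January 2024, 14:05:09.
moment = datetime(2024, 1, 1, 14, 5, 9)

examples = {
    "%a %A": moment.strftime("%a %A"),          # Mon Monday
    "%b %B": moment.strftime("%b %B"),          # Jan January
    "%d/%m/%y": moment.strftime("%d/%m/%y"),    # 01/01/24
    "%Y": moment.strftime("%Y"),                # 2024
    "%H:%M:%S": moment.strftime("%H:%M:%S"),    # 14:05:09
}

for spec, text in examples.items():
    print(f"{spec:10} -> {text}")
```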
Users
Why Is It Important?
- The users command in Linux shows the user names of users currently logged in to the current host.
It displays who is currently logged in according to FILE. If FILE is not specified, /var/run/utmp is
used; /var/log/wtmp is also commonly used as FILE.
User Environment
There are several commands available that allow us to list and set environment variables in Linux:
• env – This command allows us to run other programs in a specific environment without changing
the current one.
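• printenv – prints all environment variables, or only the ones you want to view.
• set – lists shell variables (including environment variables) or sets shell options.
• unset – removes variables from the shell environment.
• export – sets a variable and makes it part of the environment passed to child processes.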
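Python exposes the same ideas through os.environ and the env argument of subprocess.run. In this sketch (the helper run_with_env and the variable names are hypothetical), a child process receives an extra variable, much like env VAR=value command, while the parent's environment stays unchanged; the commented lines show printenv, export and unset equivalents.

```python
import os
import subprocess
import sys


def run_with_env(extra: dict, code: str) -> str:
    """Run a child Python process with extra variables, like `env VAR=value cmd`.

    The current process's environment is left untouched.
    """
    child_env = {**os.environ, **extra}
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(os.environ.get("HOME"))             # like: printenv HOME
    os.environ["LPR_DEMO_VAR"] = "visible"    # like: export LPR_DEMO_VAR=visible
    print(run_with_env({}, "import os; print(os.environ['LPR_DEMO_VAR'])"))  # child inherits it
    del os.environ["LPR_DEMO_VAR"]            # like: unset LPR_DEMO_VAR
```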
Linux Process
Why Is It Important?
Commands
Types of Processes
- A computer program that runs in the UNIX shell (the command-line interpreter) to perform file manipulation,
program execution and text printing is called a shell script, and writing such programs is called shell scripting.
- A shell script consists of individual commands that run in sequence to perform operations. This is useful
for repetitive tasks in system administration.
Why Is It Useful?
- We write shell scripts to avoid repetitive work and to automate tasks: system admins use them to perform
routine backups, to add new functionality to the shell, to perform system monitoring, and so on.
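Routine backups are the classic example of such automation. To keep all examples here in one language, the same idea is sketched in Python: archive a directory into a timestamped .tar.gz file. backup and the example paths are hypothetical; in practice a script like this would typically be scheduled with cron.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup(source: str, dest_dir: str) -> str:
    """Archive `source` into dest_dir/<name>-YYYYmmdd-HHMMSS.tar.gz and return its path."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base = Path(dest_dir) / f"{Path(source).name}-{stamp}"
    # make_archive appends the ".tar.gz" extension itself.
    return shutil.make_archive(str(base), "gztar", root_dir=source)


if __name__ == "__main__":
    # Hypothetical paths; adjust before use.
    print(backup("/home/alice/projects", "/var/backups"))
```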
DOCKER
What is Docker?
• Single host deployment - This means you can run everything on a single piece of
hardware
• High productivity - Docker Compose reduces the time it takes to perform tasks
• Security - All the containers are isolated from each other, reducing the threat landscape
Docker Architecture
Docker follows a client-server architecture with three main components: the Docker Client, the Docker Host,
and the Docker Registry.
1. Docker Client
• The Docker client uses commands and REST APIs to communicate with the Docker daemon (server).
• When a user runs a docker command on the Docker client terminal, the client sends that command
to the Docker daemon.
• The Docker daemon receives these commands from the Docker client in the form of commands and
REST API requests.
• The Docker client uses the command-line interface (CLI) to run commands such as:
o docker build
o docker pull
o docker run
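The client/daemon exchange described above can be observed directly: on Linux the daemon usually listens on the Unix socket /var/run/docker.sock and speaks the Docker Engine REST API over it. The sketch below is a minimal standard-library client; the socket path is an assumption (it differs when DOCKER_HOST points elsewhere), and UnixHTTPConnection and docker_get are hypothetical helper names, not part of any Docker SDK. GET /version and GET /containers/json are Engine API endpoints that roughly correspond to docker version and docker ps.

```python
import http.client
import json
import socket

# Usual location of the Docker daemon's socket on Linux (assumption).
DOCKER_SOCKET = "/var/run/docker.sock"


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" is only used for the Host header.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_get(path: str, socket_path: str = DOCKER_SOCKET):
    """Send a GET request to the Docker Engine API; return (status, decoded JSON)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()
```

On a machine where your user is allowed to access the socket, docker_get("/containers/json") should return status 200 and a list with one entry per running container.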
Docker search
o We can use this command to search for public images on Docker Hub.
o It returns the image name, description, star count, and whether the image is official
and automated.
Docker run
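To start a MySQL container in the background with its root password set through an environment variable:
docker run --env MYSQL_ROOT_PASSWORD=my-secret-pw --detach mysql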
Docker ps
We can list all the running containers with docker ps.
Docker stop
To stop a container, use the command with either the container ID or the container name. We
may stop a container if we want to change our docker run command.
Docker restart
To restart a stopped container.
Docker rename
To rename a container: docker rename container_before container_after
Docker exec
To run a new command in a running container
Docker logs
This command is helpful to debug our Docker containers. It will fetch logs from
a specified container.
Docker rm
In case we want to remove a container, we can use: docker rm container_name
Docker rmi
If we want to free some disk space, we can use the command with the image id to
remove an image.
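When Docker commands are driven from a script, building the argument list explicitly avoids quoting mistakes. The sketch below uses hypothetical helper names and only the real docker run flags --name, --env and --detach; it rebuilds the MySQL example above and can hand any docker command to subprocess.run. Actually executing the commands requires a working Docker installation.

```python
import shlex
import subprocess


def docker_run_argv(image, env=None, detach=False, name=None):
    """Build the argument list for `docker run` (hypothetical helper)."""
    argv = ["docker", "run"]
    if name:
        argv += ["--name", name]
    for key, value in (env or {}).items():
        argv += ["--env", f"{key}={value}"]
    if detach:
        argv.append("--detach")
    argv.append(image)
    return argv


def run(argv):
    """Execute a docker CLI command and return its stdout (requires Docker)."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    argv = docker_run_argv("mysql", env={"MYSQL_ROOT_PASSWORD": "my-secret-pw"}, detach=True)
    print(shlex.join(argv))
```

With Docker available, run(["docker", "ps"]) would list running containers and run(["docker", "stop", "some-mysql"]) would stop a container named some-mysql (a hypothetical name).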
Docker Objects
1. Images
➢ Docker images are the read-only binary templates used to create Docker
containers.
➢ A private container registry is used to share container images within an
enterprise, and a public container registry to share them with the whole world.
➢ Docker images also use metadata to describe the container's abilities.
2. Docker containers
➢ Containers are the structural units of Docker, used to hold the entire
package that is needed to run the application.
➢ In other words, the image is a template, and the container is a copy of
that template.
3. Docker Networking
❖Bridge - Bridge is the default network driver for containers. It is used when
multiple containers communicate on the same Docker host.
❖Host - It is used when we don't need network isolation between the
container and the host.
❖None - It disables all networking.
❖Overlay - Overlay enables Swarm services to communicate with each other. It
lets containers running on different Docker hosts communicate.
❖Macvlan - Macvlan is used when we want to assign MAC addresses to the
containers.
4. Docker Storage
Docker Compose
By using Compose, we can define the services in a YAML file, as well as spin them
up and tear them down with one single command.
Command:
• docker-compose up [OPTIONS]
• docker-compose up -d (starts the services in the background)
Docker for Data Engineering
The Problem in Brief
- A data science project often involves a whole team of data scientists, data engineers, software
architects, often working along with other software development teams to create a viable solution.
You can have a situation where different data scientists end up working with different versions of
the library only to realize after hours of debugging that there are small differences in their
environments. Docker lets you create a consistent environment for data scientists and data
engineers to deal with these kinds of situations.
Docker as a Solution
- Ease of model building - Dockerizing your data science project can help set things up faster
- Deployment - Deploying a data science solution is much easier with docker in place.
Use Case
For example, if you wanted to deploy Airflow, you could just have a VM, install Python, Postgres and Airflow. But,
what if you also needed a different version of Postgres for a different app? What if you also have a whole bunch of
other things that are running on that VM? What if some of them need Python 2 instead of Python 3?
While Airflow isn’t the heaviest piece of software, imagine something like Spark with Python. You’d need Python,
some version of the JVM, some version of Spark, as well as other dependencies. If you have a laptop, there’s a very
high likelihood you’ll run into some dependency conflicts.
Docker allows you to completely isolate processes in a way that’s reproducible. If an image works locally, it’ll work
elsewhere, like production. Docker, as a technology, isn’t technically required anywhere. However, there are loads
of benefits, specifically reproduction of an environment.