SPSA
System programming (or systems programming) is the activity of programming system software. The primary distinguishing characteristic of systems programming, compared to application programming, is that application programming aims to produce software which provides services to the user (e.g. a word processor), whereas systems programming aims to produce software which provides services to the computer hardware (e.g. a disk defragmenter). It also requires a greater degree of hardware awareness.

More specifically, in system programming:

The programmer makes assumptions about the hardware and other properties of the system the program runs on, and often exploits those properties (for example, by using an algorithm that is known to be efficient on specific hardware).
Usually a low-level programming language or language dialect is used, one that can operate in resource-constrained environments, is very efficient with little runtime overhead, has a small runtime library (or none at all), allows direct and "raw" control over memory access and control flow, and lets the programmer write parts of the program directly in assembly language.
Debugging can be difficult if it is not possible to run the program in a debugger due to resource constraints. Running the program in a simulated environment can reduce this problem.

Systems programming is sufficiently different from application programming that programmers tend to specialize in one or the other. In system programming, often limited programming facilities are available: automatic garbage collection is uncommon, and debugging is sometimes hard. The runtime library, if available at all, is usually far less powerful and does less error checking. Because of those limitations, monitoring and logging are often used; operating systems may have extremely elaborate logging subsystems. Implementing certain parts of an operating system or a networking stack requires systems programming, for example implementing paging (virtual memory) or a device driver.

Originally, systems programmers invariably wrote in assembly language. Experiments with hardware support in high-level languages in the late 1960s led to such languages as BLISS, BCPL, and extended ALGOL for Burroughs large systems, but C, helped by the growth of UNIX, became ubiquitous in the 1980s. More recently Embedded C++ has seen some use, for instance in the I/O Kit drivers of Mac OS X.

For historical reasons, some organizations use the term systems programmer to describe a job function which would be more accurately termed systems administrator. This is particularly true in organizations whose computer resources have historically been dominated by mainframes, although the term is even used for job functions which do not involve mainframes. This usage arose because administration of IBM mainframes often involved writing custom assembler code that integrated with the operating system; indeed, some IBM software products had substantial code contributions from customer programming staff. This type of programming is progressively less common, but the term systems programmer is still the de facto job title for staff directly administering IBM mainframes.
Assemblers:
Assemblers involve a set of concepts that enable code to be written in a symbolic language and then used to control the computer's operations. These concepts include the assignment of labels to represent locations in memory (e.g. letters such as X, Y, Z). Fixed names are given to operations such as STORE, LOAD and ADD, as well as to registers (e.g. R1, R2), and numbers written in decimal are converted to binary. Each line of assembly language translates to a single machine-code instruction. Assembly languages are machine-specific and require an understanding of the structure of the machine in order to use that machine to its potential; assembly language is also comparatively hard to learn. High-level languages were developed to ease this burden: FORTRAN (1957) was one of the earliest, and COBOL (1959) followed, each replacing sequences of assembly-code instructions with statements that are easier to write. Some other languages:

BASIC (Beginner's All-purpose Symbolic Instruction Code) - a simple language originally designed for beginners and students.
PASCAL (designed early 1970s) - originally designed for simplicity, in order to teach programming.
C++ (mid-1980s) - a commonly used object-oriented language.
VB (1991) - a visual programming system designed by Microsoft.

Programming languages involve declarations of variables, literals and constants. Variables in a programming language are labels pointing to a location, or they represent a type of data. Constants are labels for literals, which are bit patterns. There are four main types of imperative instruction within programming languages: assignments, expressions, control statements, and procedural units (also known as methods). A small illustration in C follows.
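The sketch below uses invented variable and function names; it simply shows one example of each instruction type in C:

#include <stdio.h>

/* A procedural unit (a "method"): a named, callable piece of code. */
int square(int n)
{
    return n * n;          /* an expression computing a value */
}

int main(void)
{
    int x = 3;             /* an assignment storing a value in a variable */
    if (x > 0) {           /* a control statement choosing what runs next */
        printf("%d\n", square(x));
    }
    return 0;
}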
An assembler is a program that takes basic computer instructions and converts them into a pattern of bits that the computer's processor can use to perform its basic operations. Some people call these instructions assembler language and others use the term assembly language. Here's how it works:
Most computers come with a specified set of very basic instructions that correspond to the basic machine operations that the computer can perform. For example, a "Load" instruction causes the processor to move a string of bits from a location in the processor's memory to a special holding place called a register. Assuming the processor has at least eight registers, each numbered, the following instruction would move the value (string of bits of a certain length) at memory location 3000 into the holding place called register 8:
L 8,3000
The programmer can write a program using a sequence of these assembler instructions. This sequence of assembler instructions, known as the source code or source program, is then specified to the assembler program when that program is started.
The assembler program takes each program statement in the source program and generates a corresponding bit stream or pattern (a series of 0's and 1's of a given length). The output of the assembler program is called the object code or object program relative to the input source program. The sequence of 0's and 1's that constitute the object program is sometimes called machine code. The object program can then be run (or executed) whenever desired.
In the earliest computers, programmers actually wrote programs in machine code, but assembler languages or instruction sets were soon developed to speed up programming. Today, assembler programming is used only where very efficient control over processor operations is needed. It requires knowledge of a particular computer's instruction set, however. Historically, most programs have been written in "higher-level" languages such as COBOL, FORTRAN, PL/I, and C. These languages are easier to learn and faster to write programs with than assembler language. The program that processes the source code written in these languages is called a compiler. Like the assembler, a compiler takes higher-level language statements and reduces them to machine code. A newer idea in program preparation and portability is the concept of a virtual machine. For example, using the Java programming language, language statements are compiled into a generic form of machine language known as bytecode that can be run by a virtual machine, a kind of theoretical machine that approximates most computer operations. The bytecode can then be sent to any computer platform that has previously downloaded or built in the Java virtual machine. The virtual machine is aware of the specific instruction lengths and other particularities of the platform and ensures that the Java bytecode can run.
Program loading: copy a program from secondary storage (which since about 1968 invariably means a disk) into main memory so it's ready to run. In some cases loading just involves copying the data from disk to memory; in others it involves allocating storage, setting protection bits, or arranging for virtual memory to map virtual addresses to disk pages.

Relocation: compilers and assemblers generally create each file of object code with the program addresses starting at zero, but few computers let you load your program at location zero. If a program is created from multiple subprograms, all the subprograms have to be loaded at non-overlapping addresses. Relocation is the process of assigning load addresses to the various parts of the program, adjusting the code and data in the program to reflect the assigned addresses. In many systems, relocation happens more than once. It's quite common for a linker to create a program from multiple subprograms, producing one linked output program that starts at zero, with the various subprograms relocated to locations within the big program. Then, when the program is loaded, the system picks the actual load address and the linked program is relocated as a whole to that address.

Symbol resolution: when a program is built from multiple subprograms, the references from one subprogram to another are made using symbols; a main program might use a square root routine called sqrt, and the math library defines sqrt. A linker resolves the symbol by noting the location assigned to sqrt in the library, and patching the caller's object code so that the call instruction refers to that location.
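As a concrete sketch of symbol resolution, consider a C program that imports sqrt (the file name and commands are illustrative; exact flags depend on the toolchain):

/* main.c: 'sqrt' is used here but defined in the math library */
#include <stdio.h>
#include <math.h>

int main(void)
{
    printf("%f\n", sqrt(2.0));   /* an imported symbol, resolved at link time */
    return 0;
}

Compiling with cc -c main.c produces main.o with sqrt recorded as an undefined symbol; linking with cc main.o -lm -o main lets the linker find sqrt in the math library and patch the call to refer to its location.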
Although there's considerable overlap between linking and loading, it's reasonable to define a program that does program loading as a loader, and one that does symbol resolution as a linker. Either can do relocation, and there have been all-in-one linking loaders that perform all three functions. The line between relocation and symbol resolution can be fuzzy: since linkers can already resolve references to symbols, one way to handle code relocation is to assign a symbol to the base address of each part of the program, and treat relocatable addresses as references to those base-address symbols. One important feature that linkers and loaders share is that they both patch object code; they are perhaps the only widely used programs, other than debuggers, that do so. This is a uniquely powerful feature, albeit one that is extremely machine-specific in its details, and it can lead to baffling bugs if done wrong.
Two-pass linking
Now we turn to the general structure of linkers. Linking, like compiling or assembling, is fundamentally a two-pass process. A linker takes as its input a set of input object files, libraries, and perhaps command files, and produces as its result an output object file, and perhaps ancillary information such as a load map or a file containing debugger symbols.

Each input file contains a set of segments, contiguous chunks of code or data to be placed in the output file. Each input file also contains at least one symbol table. Some symbols are exported: defined within the file for use in other files, generally the names of routines within the file that can be called from elsewhere. Other symbols are imported: used in the file but not defined, generally the names of routines called from, but not present in, the file.

When a linker runs, it first has to scan the input files to find the sizes of the segments and to collect the definitions and references of all of the symbols. It creates a segment table listing all of the segments defined in the input files, and a symbol table with all of the symbols imported or exported. Using the data from the first pass, the linker assigns numeric locations to symbols, determines the sizes and locations of the segments in the output address space, and figures out where everything goes in the output file.

The second pass uses the information collected in the first pass to control the actual linking process. It reads and relocates the object code, substituting numeric addresses for symbol references and adjusting memory addresses in code and data to reflect relocated segment addresses, and writes the relocated code to the output file. It then writes the output file, generally with header information, the relocated segments, and symbol table information. If the program uses dynamic linking, the symbol table contains the information the runtime linker will need to resolve dynamic symbols. In many cases, the linker itself will generate small amounts of code or data in the output file, such as "glue code" used to call routines in overlays or dynamically
linked libraries, or an array of pointers to initialization routines that need to be called at program startup time. Whether or not the program uses dynamic linking, the file may also contain a symbol table for relinking or debugging that isn't used by the program itself, but may be used by other programs that deal with the output file.

Some object formats are relinkable: the output file from one linker run can be used as the input to a subsequent linker run. This requires that the output file contain a symbol table like the one in an input file, as well as all of the other auxiliary information present in an input file. Nearly all object formats have provision for debugging symbols, so that when the program is run under the control of a debugger, the debugger can use those symbols to let the programmer control the program in terms of the line numbers and names used in the source program. Depending on the details of the object format, the debugging symbols may be intermixed in a single symbol table with the symbols needed by the linker, or there may be one table for the linker and a separate, somewhat redundant table for the debugger.

A few linkers appear to work in one pass. They do that by buffering some or all of the contents of the input file in memory or on disk during the linking process, then reading the buffered material later. Since this is an implementation trick that doesn't fundamentally affect the two-pass nature of linking, we don't address it further here.
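The two-pass structure can be illustrated with a toy linker in C. The "object file" format below is entirely invented for the sketch: each file is one code segment (an array of words) plus lists of defined symbols and references to patch; a real linker reads this information from object files on disk.

#include <stdio.h>
#include <string.h>

#define MAXSYM 16

typedef struct { const char *name; int offset; } Def;  /* symbol defined at offset */
typedef struct { const char *name; int offset; } Ref;  /* word at offset needs symbol's address */

typedef struct {
    int code[8]; int size;
    Def defs[MAXSYM]; int ndefs;
    Ref refs[MAXSYM]; int nrefs;
    int base;                       /* assigned in pass 1 */
} Obj;

static int lookup(Obj *objs, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        for (int d = 0; d < objs[i].ndefs; d++)
            if (strcmp(objs[i].defs[d].name, name) == 0)
                return objs[i].base + objs[i].defs[d].offset;
    return -1;                      /* undefined symbol */
}

int main(void)
{
    /* main "module" calls sqrt at word 2; the "library" defines sqrt at word 0 */
    Obj objs[2] = {
        { {10, 11, 0, 13}, 4, {{ "main", 0 }}, 1, {{ "sqrt", 2 }}, 1, 0 },
        { {20, 21, 22},    3, {{ "sqrt", 0 }}, 1, {{ 0, 0 }},      0, 0 },
    };
    int n = 2, lc = 0;

    /* Pass 1: assign each segment a base address, collecting sizes. */
    for (int i = 0; i < n; i++) { objs[i].base = lc; lc += objs[i].size; }

    /* Pass 2: resolve references and patch the code words. */
    for (int i = 0; i < n; i++)
        for (int r = 0; r < objs[i].nrefs; r++)
            objs[i].code[objs[i].refs[r].offset] =
                lookup(objs, n, objs[i].refs[r].name);

    for (int i = 0; i < n; i++)
        for (int w = 0; w < objs[i].size; w++)
            printf("%02d: %d\n", objs[i].base + w, objs[i].code[w]);
    return 0;
}

Note that pass 1 only collects sizes and definitions; pass 2 is the only place the code words are touched, mirroring the division of labor described above.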
A macro (from the Greek makros, meaning long or large) in computer science is a rule or pattern that specifies how a certain input sequence (often a sequence of characters) should be mapped to an output sequence (also often a sequence of characters) according to a defined procedure. The mapping process that instantiates a macro into a specific output sequence is known as macro expansion. The term originated with macro-assemblers, where the idea is to make available to the programmer a sequence of computing instructions as a single program statement, making the programming task less tedious and less error-prone.

Keyboard macros and mouse macros allow short sequences of keystrokes and mouse actions to be transformed into other, usually more time-consuming, sequences of keystrokes and mouse actions. In this way, frequently used or repetitive sequences of keystrokes and mouse movements can be automated. Separate programs for creating these macros are called macro recorders. During the 1980s, macro programs (originally SmartKey, then SuperKey, KeyWorks, and ProKey) were very popular, first as a means to automatically format screenplays, then for a variety of user input tasks. These programs were based on the TSR (terminate and stay resident) mode of operation and applied to all keyboard input, no matter in which context it occurred. They have to some extent fallen into obsolescence following the advent of mouse-driven user interfaces and the availability of keyboard and mouse macros in applications, such as word processors and spreadsheets, which make it possible to create application-sensitive keyboard macros.

Keyboard macros have more recently come to life as a method of exploiting the economy of massively multiplayer online role-playing games (MMORPGs). By tirelessly performing a boring, repetitive, but low-risk action, a player running a macro can earn a large amount of the game's currency. This effect is even larger when a macro-using player operates multiple accounts simultaneously, or operates the accounts for a large amount of time each day. As this money is generated without human intervention, it can dramatically upset the economy of the game by causing runaway inflation. For this reason, use of macros is a violation of the TOS or EULA of most MMORPGs, and their administrators fight a continual war to identify and punish macro users.
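The C preprocessor is a familiar example of macro expansion; assuming a C toolchain, each use of the macro below is replaced by its definition before compilation proper begins:

#include <stdio.h>

/* Every occurrence of SQUARE(x) is textually expanded into the
   parenthesized expression on the right before the compiler runs. */
#define SQUARE(x) ((x) * (x))

int main(void)
{
    printf("%d\n", SQUARE(5));   /* expands to ((5) * (5)) */
    return 0;
}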
Compilers:
A compiler is a computer program (or set of programs) that transforms source code written in
a computer language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for wanting to transform source code is to create an executable program. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower level language (e.g., assembly language or machine code). A program that translates from a low level language to a higher level one is a decompiler. A program that translates between high-level languages is usually called a language translator, source to source translator, or language converter. A language rewriter is usually a program that translates the form of expressions without a change of language.
A compiler is likely to perform many or all of the following operations: lexical analysis, preprocessing, parsing, semantic analysis, code generation, and code optimization. Program faults caused by incorrect compiler behavior can be very difficult to track down and work around and compiler implementors invest a lot of time ensuring the correctness of their software. The term compiler-compiler is sometimes used to refer to a parser generator, a tool often used to help create the lexer and parser.
Incremental Compiler

A computer-aided software development system includes programs to implement edit, compile, link and run sequences, all from memory, at very high speed. The compiler operates on an incremental basis, line by line: if only one line is changed in an editing session, then only that line need be recompiled, provided no other code is affected. Scanning is done incrementally, and the resulting token list is saved in memory to be used again wherever no changes are made. All of the linking tables are saved in memory, so there is no need to generate link tables for increments of code where no changes in links are needed. The parser is able to skip lines or blocks of lines of source code which haven't been changed; for this purpose, each line of source text in the editor has a change-tag indicating whether the line has been changed, and from this change-tag information a clean-lines table is built, holding for each line of source code an indication of how many clean lines follow it. All of the source code text modules, token lists, symbol tables, code tables and related data saved from one compile to another are maintained in virtual memory rather than in files, to enhance speed of operation. The object code created is also maintained in memory rather than in a file, and executed from this memory image, to reduce delays. A virtual memory management arrangement for the system ensures that all of the needed data modules and code are present in real memory by page swapping, but with a minimum of page faults, again to enhance operating speed.
Compiling a C program with GCC involves four stages:

preprocessing (to expand macros)
compilation (from source code to assembly language)
assembly (from assembly language to machine code)
linking (to create the final executable)
As an example, we will examine these compilation stages individually using the Hello World program hello.c:
#include <stdio.h>

int main (void)
{
  printf ("Hello, world!\n");
  return 0;
}
Note that it is not necessary to use any of the individual commands described in this section to compile a program. All the commands are executed automatically and transparently by GCC internally, and can be seen using the -v option described earlier. Although the Hello World program is very simple it uses external header files and libraries, and so exercises all the major steps of the compilation process.
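For reference, the individual stages can be run by hand with GCC (assuming the source is saved as hello.c):

gcc -E hello.c -o hello.i    (preprocess: expand #include and macros into hello.i)
gcc -S hello.i               (compile: produce the assembly file hello.s)
gcc -c hello.s               (assemble: produce the object file hello.o)
gcc hello.o -o hello         (link: produce the executable hello)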
The Compilation Process
Stages from Source to Executable

1. Compilation: source code ==> relocatable object code (binaries)
2. Linking: many relocatable binaries (modules plus libraries) ==> one relocatable binary (with all external references satisfied)
3. Loading: relocatable ==> absolute binary (with all code and data references bound to the addresses occupied in memory)
4. Execution: control is transferred to the first instruction of the program
At compile time (CT), absolute addresses of variables and statement labels are not known. In static languages (such as Fortran), absolute addresses are bound at load time (LT). In block-structured languages, bindings can change at run time (RT).
Phases of the Compilation Process

1. Lexical analysis (scanning): the source text is broken into tokens.
2. Syntactic analysis (parsing): tokens are combined to form syntactic structures, typically represented by a parse tree.
The parser may be replaced by a syntax-directed editor, which directly generates a parse tree as a product of editing.
3. Semantic analysis: intermediate code is generated for each syntactic structure.
Type checking is performed in this phase. Complicated features such as generic declarations and operator overloading (as in Ada and C++) are also processed.
4. Machine-independent optimization: intermediate code is optimized to improve efficiency.
5. Code generation: intermediate code is translated to relocatable object code for the target machine.
6. Machine-dependent optimization: the machine code is optimized.
On some systems (e.g., C under Unix), the compiler produces assembly code, which is then translated by an assembler.
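As a small worked example of the phases, consider the C statement a = b + c * 2;. The scanner breaks it into the tokens ID(a), =, ID(b), +, ID(c), *, NUM(2), and ;. The parser groups them, honoring operator precedence, into the tree a = (b + (c * 2)). Semantic analysis checks the types of a, b, and c and might emit intermediate code such as t1 = c * 2; t2 = b + t1; a = t2, which the later phases optimize and translate into relocatable object code.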
Text editor
A text editor is a type of program used for editing plain text files. Text editors are often provided with operating systems or software development packages, and can be used to change configuration files and programming language source code.
A plain text file is represented and edited by showing all the characters as they are present in the file. The only characters usable for 'mark-up' are the control characters of the character set used; in practice these are newline, tab and form feed. The most commonly used character set is ASCII, especially recently, as plain text files are used more for programming and configuration and less for documentation than in the past.

Documents created by a word processor generally contain file-format-specific "control characters" beyond what is defined in the character set. These enable functions like bold, italics, fonts, columns, tables, etc. These and other common page-formatting symbols were once associated only with desktop publishing but are now commonplace in the simplest word processor. Word processors can usually edit a plain text file and save in the plain text file format. However, one must take care to tell the program that this is what is wanted. This is especially important in cases such as source code, HTML, and configuration and control files. Otherwise the file will contain those "special characters" unique to the word processor's file format and will not be handled correctly by the utility the files were intended for.
History
Before text editors existed, computer text was punched into cards with keypunch machines. The text was carried as a physical box of these thin cardboard cards, and read into a card reader.

The first text editors were line editors oriented to typewriter-style terminals, and they did not provide a window or screen-oriented display. They usually had very short commands (to minimize typing) that reproduced the current line, among them a command to print a selected section of the file on the typewriter (or printer) when necessary. An "edit cursor", an imaginary insertion point, could be moved by special commands that operated on line numbers or on specific text strings (context). Later, the context strings were extended to regular expressions. To see the changes, the file needed to be printed on the printer. These "line-based text editors" were considered revolutionary improvements over keypunch machines. In cases where typewriter-based terminals were not available, line editors were adapted to keypunch equipment; the user then needed to punch the commands into a separate deck of cards and feed them into the computer in order to edit the file.

When computer terminals with video screens became available, screen-based text editors became common. One of the earliest "full-screen" editors was O26, written for the operator console of the CDC 6000 series machines in 1967. Another early full-screen editor is vi. Written in the 1970s, vi is still a standard editor for Unix and Linux operating systems. The productivity of editing with full-screen editors (compared to the line-based editors) motivated many of the early purchases of video terminals.
Search and replace

Text editors provide the ability to replace occurrences of a search string with a replacement string. Different methods are employed: global search and replace, conditional search and replace, and unconditional search and replace.
Cut, copy, and paste Main article: Cut, copy, and paste
Most text editors provide methods to duplicate and move text within the file, or between files.
Text formatting Main article: Text formatting
Text editors often provide basic formatting features like line wrap, auto-indentation, bullet list formatting, comment formatting, and so on.
Undo and redo Main article: Undo
As with word processors, text editors provide a way to undo and redo the last edit. Often, especially with older text editors, there is only one level of edit history remembered, and successively issuing the undo command will only "toggle" the last change. Modern or more complex editors usually provide a multiple-level history, such that issuing the undo command repeatedly reverts the document to successively older edits. A separate redo command cycles the edits "forward" toward the most recent changes. The number of changes remembered depends upon the editor and is often configurable by the user.
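A multiple-level history can be as simple as a stack of document states with a cursor. The sketch below, in C, stores whole-document snapshots rather than the deltas a real editor would keep, but models the behavior just described:

#include <stdio.h>
#include <string.h>

#define LEVELS 100

static char history[LEVELS][80];   /* stored document states */
static int top = 0;                /* index of the current state */
static int count = 1;              /* number of stored states */

void edit(const char *text)
{
    if (top < LEVELS - 1) {
        top++;
        strcpy(history[top], text);
        count = top + 1;           /* a new edit discards redoable states */
    }
}

void undo(void) { if (top > 0) top--; }
void redo(void) { if (top < count - 1) top++; }

int main(void)
{
    strcpy(history[0], "");
    edit("Hello");
    edit("Hello, world");
    undo();                        /* back to "Hello" */
    printf("%s\n", history[top]);
    redo();                        /* forward to "Hello, world" */
    printf("%s\n", history[top]);
    return 0;
}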
Importing Main article: Data transformation
Reading or merging the contents of another text file into the file currently being edited. Some text editors provide a way to insert the output of a command issued to the operating system's shell.
Filtering Main article: Filter (software)
Some advanced text editors allow you to send all or sections of the file being edited to another utility and read the result back into the file in place of the lines being "filtered". This, for example, is useful for sorting a series of lines alphabetically or numerically, doing mathematical computations, and so on.
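For example, in vi-style editors the ex command :1,5!sort replaces lines 1 through 5 of the buffer with the output of piping those lines through the sort utility.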
Syntax Highlighting Main article: Syntax highlighting
Another useful feature of many text editors is syntax highlighting, where the editor can recognise, or be instructed, that the text being written is in a particular language, such as HTML or C++, and can colour-code the source to break up the text and make tags, keywords, and other elements easy to identify.
Special features
Some editors include special features and extra functions, for instance,
Source code editors are text editors with additional functionality to facilitate the production of source code. These often feature user-programmable syntax highlighting and coding tools or keyboard macros, similar to an HTML editor (see below).

Folding editors. This subclass includes so-called "orthodox editors" that are derivatives of Xedit. The specialized version of folding is usually called outlining (see below).

IDEs (integrated development environments) are designed to manage and streamline larger programming projects. They are usually only used for programming, as they contain many features unnecessary for simple text editing.

World Wide Web programmers are offered a variety of text editors dedicated to the task of web development. These create the plain text files that deliver web pages. HTML editors include Dreamweaver, E (text editor), FrontPage, HotDog, HomeSite, Nvu, Tidy, GoLive, and BBEdit. Many offer the option of viewing a work in progress in a built-in web browser.

Mathematicians, physicists, and computer scientists often produce articles and books using TeX or LaTeX in plain text files. Such documents are often produced with a standard text editor, but some people use specialized TeX editors.

Outliners. Also called tree-based editors, because they combine a hierarchical outline tree with a text editor. Folding (see above) can generally be considered a generalized form of outlining.
Debug Monitors

A debug monitor is a powerful graphical- or console-mode tool that monitors all the activities handled by the WinDriver Kernel. You can use the debug monitor to see how each command sent to the kernel is executed. The WinDriver Kernel is part of a driver development toolkit that simplifies the creation of drivers. A driver is used so that the computer can communicate with the devices inside it or attached to it: if you hook up a printer to your computer, you first need to install its driver so that the computer can control the printer, and the same goes for audio, network, and video devices.

A debug monitor, simply put, is a tool that helps to find and reduce the number of bugs and defects in a computer program, or in any device within or attached to the computer, so that it behaves as it should. While a driver is being created and loaded, the debug monitor helps verify that it works properly, much as guards stand watch while money is transferred from an armored car into a bank to make sure the transfer goes smoothly.

If the debug monitor locates a bug or defect, it first tries to reproduce the problem, which allows the programmer to view each string of code within the range of the bug or defect and try to fix it. These are strings of technical information that most people using computers will never see, just as most people who use a clock never open it up to see how it works. The programmer deletes strings or adds new ones, and then uses the debug monitor to re-create the driver load and see whether the problem is fixed. This can be a tedious task, given all the processes that run in a computer, but the debug monitor makes it considerably easier.

Assemblers:
An assembler is a translator that translates source instructions (in symbolic language) into target instructions (in machine language) on a one-to-one basis: each source instruction is translated into exactly one target instruction. This definition has the advantage of clearly describing the translation process of an assembler. It is not a precise definition, however, because an assembler can do (and usually does) much more than just translation. It offers the programmer a lot of help in many aspects of writing the program. The many types of help offered by the assembler are grouped under the general term directives (or pseudo-instructions). Another good definition is: an assembler is a translator that translates a machine-oriented language into machine language.

A future symbol is a symbol that is used before the line on which it is defined. Obviously, future symbols are not an error and their use should not be prohibited: the programmer should be able to refer to source lines which either precede or follow the current line. Thus the future symbol problem has to be solved. It turns out to be a simple problem, and there are two solutions: a one-pass assembler and a two-pass assembler. They represent not just different solutions to the future symbol problem but two different approaches to assembler design and operation.

The one-pass assembler, as the name implies, solves the future symbol problem by reading the source file once. Its most important feature, however, is that it does not generate a relocatable object file but rather loads the object code (the machine language program) directly into memory. Similarly, the most important feature of the two-pass assembler is that it generates a relocatable object file that is later loaded into memory by a loader. It also solves the future symbol problem, by performing two passes over the source file. It should be noted at this point that a one-pass assembler can generate an object file; such a file, however, would be absolute rather than relocatable, and its use is limited. Absolute and relocatable object files are discussed later in this chapter. The figure summarizes the most important components and operations of an assembler.
A two-pass assembler does considerable work in the first pass just to determine the size of an instruction: it has to look at the mnemonic and, sometimes, at the operands and the modes, even though it does not assemble the instruction in the first pass. All the information about the mnemonic and the operands collected by the assembler in the first pass is extremely useful in the second pass, when instructions are assembled. This is why many assemblers save all the information collected during the first pass and transmit it to the second pass through an intermediate file. Each record in the intermediate file contains a copy of a source line plus all the information collected about that line in the first pass. At the end of the first pass the original source file is closed and is no longer used; the intermediate file is reopened and read by the second pass as its input file. A record in a typical intermediate file contains (a sketch of such a record appears below):

The record type. It can be an instruction, a directive, a comment, or an invalid line.
The LC (location counter) value for the line.
A pointer to a specific entry in the OpCode table or the directive table. The second pass uses this pointer to locate the information necessary to assemble or execute the line.
A copy of the source line. Notice that a label, if any, is not used by pass 2 but must be included in the intermediate file since it is needed in the final listing.

Fig. 12 is a flow chart summarizing the operations in the two passes. There can be two problems with labels in the first pass: multiply-defined labels and invalid labels. Before a label is inserted into the symbol table, the table has to be searched for that label. If the label is already in the table, it is doubly (or even multiply) defined. The assembler should treat this label as an error, and the best way of doing so is by inserting a special code in the type field of the symbol table entry. Thus a situation such as:

AB   ADD 5,X
     ..
AB   SUB 6,Y
     ..
     JMP AB

will generate the entry:

name  value  type
AB           MTDF

in the symbol table. Labels normally have a maximum size (typically 6 or 8 characters), must start with a letter, and may only consist of letters, digits, and a few other characters. Labels that do not conform to these rules are invalid labels and are normally considered a fatal error. However, some assemblers will truncate a long label to the maximum size and will issue just a warning, not an error, in such a case.

Exercise: What is the advantage of allowing characters other than letters and digits in a label?

The only problem with symbols in the second pass is bad symbols. These are either multiply-defined or undefined symbols. When a source line uses a symbol in the operand field, the assembler looks it up in the symbol table. If the symbol is found but has a type of MTDF, or if the symbol is not found in the symbol table (i.e., it has not been defined), the assembler responds as follows. It flags the instruction in the listing file. It assembles the instruction as far as possible and writes it on the object file. It flags the entire object file; the flag instructs the loader not to start execution of the program. The object file is still generated, and the loader will read and load it, but not start it. Loading such a file may be useful if the user wants to see a memory map.
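In C, one record of such an intermediate file might be sketched like this (field names, types, and sizes are invented for illustration):

/* One record passed from pass 1 to pass 2 of a two-pass assembler. */
typedef struct {
    int  rec_type;      /* instruction, directive, comment, or invalid line */
    int  lc;            /* location-counter value for this line */
    int  opcode_index;  /* index into the OpCode table or the directive table */
    char source[81];    /* copy of the source line, kept for the final listing */
} IntermediateRecord;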
(Figure: the operations of the two-pass assembler.)

The JMP AB instruction above is an example of a bad symbol in the operand field. This instruction cannot be fully assembled, and thus constitutes our first example of a fatal error detected and issued by the assembler. The last important point regarding a two-pass assembler is the box, in the flow chart above, that says "write object instruction onto the object file". The point is that when the two-pass assembler writes the machine instruction on the object file, it has access to the source instruction. This does not seem to be an important point but, in fact, it constitutes the main difference between the one-pass and the two-pass assembler. It is the reason why a one-pass assembler can only produce an absolute object file (which has only limited use), whereas a two-pass assembler can produce a relocatable object file, which is much more general.
Sometimes, though, the address field in the instruction is too small to store a pointer. In such a case the assembler must resort to other methods, one of which is discussed below. When the next reference to AB (the BNE at location 67) is encountered, the assembler repeats the steps above: it copies the LC (=67) into the value field of the symbol table entry for AB, and rewrites the old value (36) into the address field of the BNE, so that the instructions referencing AB form a chain. When the assembler reaches the JMP AB instruction, it repeats the three steps once more. The situation at those three points is summarized below.
Stage 1 (after the BEQ at 36):
  memory: 36 BEQ -
  symbol table: AB 36 U

Stage 2 (after the BNE at 67):
  memory: 36 BEQ - ; 67 BNE 36
  symbol table: AB 67 U

Stage 3 (after the JMP at 89):
  memory: 36 BEQ - ; 67 BNE 36 ; 89 JMP 67
  symbol table: AB 89 U
It is obvious that an indefinite number of instructions can refer to AB as a future symbol. The result will be a linked list linking all these instructions. When the definition of AB is finally found (the LC will be 126 at that point), the assembler searches the symbol table for AB and finds it. The type field is still U, which tells the assembler that AB has been used as a future symbol. The assembler then follows the linked list of instructions, using the pointers found in the instructions. It starts from the pointer found in the symbol table and, for each instruction in the list:

it saves the value of the pointer found in the address field of the instruction. The pointer is saved in a register or a memory location (temp in the table below) and is later used to find the next incomplete instruction;
it stores the value of AB (=126) in the address field of the instruction, thereby completing it.

The last step is to store the value 126 in the value field of AB in the symbol table, and to change the type to D. The individual steps taken by the assembler in our example are shown below:

Address   Step 1     Step 2     Step 3
36        BEQ -      BEQ -      BEQ 126
67        BNE 36     BNE 126    BNE 126
89        JMP 126    JMP 126    JMP 126
          temp=67    temp=36    temp=/

It follows that at the end of the single pass, the symbol table should contain only symbols with a type of D. At the end of the pass, the assembler scans the symbol table for undefined symbols. If it finds any symbols with a type of U, it issues an error message and will not start the program. Fig. 13 is a flow chart of a one-pass assembler. The one-pass assembler loads the machine instructions in memory and thus has no trouble going back and completing instructions. However, the listing generated by such an assembler is incomplete, since it cannot backspace the listing
file to complete lines previously printed. Therefore, when an incomplete instruction (one that uses a future symbol) is loaded in memory, it also goes into the listing file as incomplete. In the example above, the three lines using symbol AB will be printed with asterisks (*) or question marks (?) instead of the value of AB.

The key to the operation of a one-pass assembler is the fact that it loads the object code directly in memory and does not generate an object file. This makes it possible for the assembler to go back and complete instructions in memory at any time during assembly. The one-pass assembler can, in principle, generate an object file by simply writing the object program from memory to a file. Such an object file, however, would be absolute. Absolute and relocatable object files are discussed below.

One more point needs to be mentioned here: the case where the address field in the instruction is too small for a pointer. This is a common case, since machine instructions are designed to be short and normally do not contain a full address. Instead of a full address, a typical machine instruction contains two fields, mode and displacement (or offset), such that the mode tells the computer how to obtain the full address from the displacement (see appendix A). The displacement field is small (typically 8-12 bits) and has no room for a full address. To handle this situation, the one-pass assembler has an additional data structure: a collection of linked lists, one for each future symbol. Each linked list contains, in its nodes, pointers to the instructions that are waiting to be completed.
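The chaining-and-backpatching mechanism can be modeled in a few lines of C. The locations and values are the ones from the running example; the "object code" is just an array of address fields:

#include <stdio.h>

#define NONE -1

int addr_field[128];        /* address fields of the in-memory object code */

int main(void)
{
    int head = NONE;        /* list head kept in the symbol-table entry for AB */

    /* Three instructions at 36, 67, 89 reference AB before it is defined.
       Each new reference stores a pointer to the previous one. */
    int uses[] = {36, 67, 89};
    for (int i = 0; i < 3; i++) {
        addr_field[uses[i]] = head;
        head = uses[i];
    }

    /* AB is finally defined at location 126: walk the chain and patch. */
    int ab = 126;
    while (head != NONE) {
        int next = addr_field[head];  /* save the pointer (temp) before overwriting */
        addr_field[head] = ab;        /* complete the instruction */
        head = next;
    }

    for (int i = 0; i < 3; i++)
        printf("loc %d -> %d\n", uses[i], addr_field[uses[i]]);
    return 0;
}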
The list for symbol AB is built in three successive stages as the three references are encountered. When symbol AB is found, the assembler uses the information in the list to complete all incomplete instructions. It then returns the entire list to the pool of available memory.

High-level assemblers. As the name implies, these are assemblers for high-level assembler languages. Such languages are rare; there is no general agreement on how to define them, on what their main features should be, or on whether they are useful and should be developed at all. Existing high-level assemblers differ in many respects and, on looking at several of them, two possible definitions emerge.
Symbol Table:
The organization of the symbol table is the key to fast assembly. Even when working on a small program, the assembler may use the symbol table hundreds of times, and consequently an efficient implementation of the table can cut the assembly time significantly even for short programs.

The symbol table is a dynamic structure. It starts empty and should support two operations, insertion and search. In a two-pass assembler, insertions are done only in the first pass and searches only in the second. In a one-pass assembler, both insertions and searches occur in the single pass. The symbol table does not have to support deletions, and this fact affects the choice of data structure for implementing the table. A symbol table can be implemented in many different ways, but the following methods are almost always used, and will be discussed here:

A linear array.
A sorted array with binary search.
Buckets with linked lists.
A binary search tree.
A hash table.
A Linear Array
The symbols are stored in the first N consecutive entries of an array, and a new symbol is inserted into the table by storing it in the first available entry (entry N + 1) of the array. Typical Pascal code for such an array would be:

var symtab: record
      N: 0..lim;
      tabl: array[0..lim] of record
        name: string;
        valu: integer;
        typ:  char;
      end;
    end;

where lim is some suitable constant. The variable N is initially set to zero, and it always points to the last entry in the array. An insertion is done by:

Testing to make sure that N < lim (the symbol table is not full).
Incrementing N by 1.
Inserting the name, value, and type into the three fields, using N as an index.

The insertion takes fixed time, independent of the number of symbols in the table. To search, the array of names is scanned entry by entry. The number of steps involved varies from a minimum of 1 to a maximum of N. Every search for a nonexistent symbol involves N steps, so a program with many undefined symbols will be slow to assemble, because the average search time will be high. Assuming a program with only a few undefined symbols, the average search time is N/2. In a two-pass assembler, insertions are only done in the first pass, so at the end of that pass N is fixed and all searches in the second pass are performed in a table of fixed size. In a one-pass assembler, N grows during the pass, and thus each search takes an average of N/2 steps, but with differing values of N.

Advantages: fast insertion; simple operations. Disadvantages: slow search, especially for large values of N; fixed size. A sketch of the same structure in C follows.
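This C version is illustrative; the table size and the 8-character name limit follow the text above:

#include <string.h>

#define LIM 1000

struct { char name[9]; int valu; char type; } tabl[LIM];
int N = 0;   /* number of symbols currently in the table */

/* O(1) insertion at the first free entry; names assumed <= 8 chars. */
int insert(const char *name, int valu, char type)
{
    if (N >= LIM) return -1;          /* table full */
    strcpy(tabl[N].name, name);
    tabl[N].valu = valu;
    tabl[N].type = type;
    return N++;
}

/* Linear search: up to N comparisons; returns index, or -1 if absent. */
int search(const char *name)
{
    for (int i = 0; i < N; i++)
        if (strcmp(tabl[i].name, name) == 0)
            return i;
    return -1;
}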
A search in this bucket structure involves the same comparisons as in the insertion process above; the average search thus also takes 1 + N/52 steps. Such a symbol table has a variable size: more nodes can be allocated and added to the buckets, and the table can, in principle, use the entire available memory.

Advantages: fast operations; flexible table size. Disadvantages: although the number of steps is small, each step involves the use of a pointer and is therefore slower than a step in the previous methods (which use arrays). Also, some programmers tend to assign names that all start with an A; in such a case all the symbols will go into the first bucket, and the table will behave essentially as a linear array. Such an implementation is recommended only if the assembler is designed to assemble large programs, and the operating system makes it convenient to allocate storage for list nodes.

Exercise 2.1: What if symbol names can start with a character other than a letter? Can this data structure still be used? If yes, how?
Standard texts on data structures are a good source for binary search trees, and they also discuss the average times for insertion, search, and deletion (which, in the case of a symbol table, is unnecessary). The minimum number of steps for insertion or search is obviously 1. The maximum number of steps depends on the height of the tree. The tree in Fig. 21 has a height of 7, so the next insertion will require from 1 to 7 steps. The height of a binary tree with N nodes varies between log2 N (the height of a fully balanced tree) and N (the height of a skewed tree). It can be proved that an average binary tree is closer to a balanced tree than to a skewed tree, which implies that the average time for insertion or search in a binary search tree is on the order of log2 N.
Advantages: efficient operation (as measured by the average number of steps); flexible size. Disadvantages: each step is more complex than in an array-based symbol table. The recommendations for use are the same as for the previous method. A C sketch appears below.
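A minimal C version of such a tree (error handling omitted; names assumed <= 8 characters, as above):

#include <stdlib.h>
#include <string.h>

typedef struct Node {
    char name[9]; int valu;
    struct Node *left, *right;
} Node;

/* Average insertion and search both take on the order of log2 N comparisons. */
Node *insert(Node *root, const char *name, int valu)
{
    if (root == NULL) {
        Node *n = malloc(sizeof *n);
        strcpy(n->name, name);
        n->valu = valu;
        n->left = n->right = NULL;
        return n;
    }
    int cmp = strcmp(name, root->name);
    if (cmp < 0)      root->left  = insert(root->left,  name, valu);
    else if (cmp > 0) root->right = insert(root->right, name, valu);
    return root;                     /* cmp == 0: symbol already present */
}

Node *search(Node *root, const char *name)
{
    while (root != NULL) {
        int cmp = strcmp(name, root->name);
        if (cmp == 0) return root;
        root = (cmp < 0) ? root->left : root->right;
    }
    return NULL;
}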
Ideally, a hash table requires fixed time for insert and search, and can be an excellent choice for a large symbol table. There are, however, two problems associated with this method, namely collisions and overflow, that make hash tables less than ideal. Collisions occur when two entirely different symbol names hash to identical indexes. Names such as SYMB and ZWYG6 can be hashed into the same value, say 54. If SYMB is encountered first in the program, it will be inserted into entry 54 of the hash table. When ZWYG6 is found, it will be hashed, and the assembler should discover that entry 54 is already taken. The collision problem cannot be avoided just by designing a better hash function. The problem stems from the fact that the set of all possible symbols is very large, but any given program uses a small part of it. Typically, symbol names start with a letter and consist of letters and digits only. If such a name is limited to six characters, then there are 26 × 36^5 (about 1.572 billion) possible names. A typical program rarely contains more than, say, 500 names, so a hash table of size 512 (= 2^9) may be sufficient. When 1.572 billion names are mapped into 512 positions, more than 3 million names map into each position. Thus even the best hash function will generate the same index for many different names, and a good solution to the collision problem is the key to an efficient hash table. The simplest solution involves a linear search. All entries in the symbol table are originally marked as vacant. When the symbol SYMB is inserted into entry 54,
that entry is marked occupied. If symbol ZWYG6 should be inserted into entry 54 and that entry is occupied, the assembler tries entries 55, 56, and so on. This implies that, in the case of a collision, the hash table degrades to a linear table. Another solution involves trying entry 54 + P, where P and the table size are relatively prime. In either case, the assembler tries until a vacant entry is found or until the entire table has been searched and found to be all occupied.

Morris [16] presents a complete analysis of hash tables, where it is shown that the average number of steps to insert (or search for) a symbol is 1/(1 - p), where p is how full the table is: p = 0 corresponds to an empty table, p = 0.5 means a half-full table, and so on. The following table gives the average number of steps for a few values of p:

p      number of steps
0      1
.4     1.66
.5     2
.6     2.5
.7     3.33
.8     5
.9     10
.95    20
It is clear that when the hash table gets more than 50%-60% full, performance suffers, no matter how good the hashing function is. Thus a good hash table design makes sure that the table never gets more than about 60% occupied. At that point the table is considered overflowed. The problem of hash table overflow can be handled in a number of ways. Traditionally, a new, larger table is opened and the original table is moved to the new one by rehashing each element; the space taken by the original table is then released. Hopgood [17] is a good analysis of this method. A better solution, though, is to use open hashing.

Open hashing. An open hash table is a structure consisting of buckets, each of which is the start of a linked list of symbols. It is very similar to the buckets with linked lists discussed above. The principle of open hashing is to hash the name of the symbol and use the hash index to select a bucket. This is better than using the first character in the name, since a good hash function can evenly distribute the names over the buckets, even in cases where many symbols start with the same letter. Aho et al. [18] present an analysis of open hashing.
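The closed-hashing scheme described above (hash the name, then probe linearly for a vacant entry) can be sketched in C; the hash function here is just an illustrative choice, not a recommendation:

#include <string.h>

#define SIZE 512            /* 2^9 slots, as in the example above */

struct { char name[9]; int valu; int used; } table[SIZE];

unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % SIZE;
}

/* Insert with linear probing; names assumed <= 8 characters. */
int insert(const char *name, int valu)
{
    unsigned i = hash(name);
    for (int probes = 0; probes < SIZE; probes++) {
        if (!table[i].used) {                 /* vacant: take it */
            strcpy(table[i].name, name);
            table[i].valu = valu;
            table[i].used = 1;
            return i;
        }
        if (strcmp(table[i].name, name) == 0)
            return i;                         /* already present */
        i = (i + 1) % SIZE;                   /* collision: try the next entry */
    }
    return -1;                                /* table full (overflow) */
}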
LOADERS:
To better understand loaders, some material from the previous sections should be reviewed, in particular the principles of operation of one-pass and two-pass assemblers. Those topics cover three of the four main tasks of a loader, namely loading, relocation, and linking. The fourth task is memory allocation (finding room in memory for the program). A loader therefore does more than its name implies. A loader performing all four tasks is called a linking loader (however, some authors call it a relocating loader; perhaps the best name would be a general loader). A loader that does everything except loading is called a linker (in the UNIVAC literature it is called a collector, Burroughs calls it a binder, and IBM a linkage editor). An absolute loader is one that supports neither relocation nor linking. As a result, loaders come in many varieties, from very simple to very complex, and range in size from very small (a few tens of instructions for a bootstrap loader)
to large (thousands of instructions). A few good references for loaders are [1, 3, 46, 64, 82]. Neither assemblers nor loaders are user programs; they are part of the operating system (OS). However, the loader can be intimately tied to the rest of the operating system (because of its memory allocation task), while the assembler is more of a stand-alone program, having little to do with the rest of the OS. Most of this chapter is devoted to linking loaders, but it starts with two short sections describing assemble-go loaders and absolute loaders. It ends with a number of sections devoted to special features of loaders and to special types of loaders.

Before we start, here is a comment on the word relocate. Loaders do not relocate a program in the sense of moving it in memory from one area to another. The loader may reload the same program in different memory areas but, once loaded, the program normally is not relocated. There are some exceptions where a program is relocated, at run time, to another memory area but, in general, the term relocate is a misnomer.

Exercise: If it is a misnomer, why do we use it?
Absolute Loaders
An absolute loader is the next step in the hierarchy of loaders. It can load an absolute object file generated by a one-pass assembler. (Note that some linkage editors also generate an absolute object file.) This partly solves some of the problems mentioned above. Still, such a loader is limited in what it can do. An absolute object file consists of three parts:

The start address of the program. This is where the loader should start loading the program.
The object instructions.
The address of the first executable instruction. This is placed in the object file by the assembler in response to the END directive. It is either the address specified by the END or, in the absence of such an address, is identical to the first address of the program.

The loader reads the first item and loads the rest of the object file into successive memory locations. Its last step is to read item 3 (the address of the first executable instruction) from the object file, and to branch to that address in order to start execution of the program. Library routines are handled by an absolute loader in the same way as by an assemble-go system.

It turns out that even a one-pass assembler can, under certain conditions, generate code that will run when loaded into any memory area. Such code is called position independent, and is generated when certain addressing modes are used, or when the hardware uses base registers. Addressing modes are described in appendix A. Modes such as direct, immediate, relative, stack, and a few others generate code that is position independent. A program using only such modes can be loaded and executed starting at any address in memory, and no relocation is necessary. The use of base registers is not that common but, since they are one of the few ways of generating position-independent code, they are also described in appendix A.
Linking Loaders
These are full-feature, general loaders that support the four tasks mentioned earlier. Such a loader can load several object files, relocating each, and linking them into one executable program. The loader, of course, has access neither to the source file nor to the symbol table. This is why the individual object files must contain all the information needed by the loader.

A word on terminology: in IBM terminology a load module is an absolute object file (or something very similar to it), and an object module is a relocatable object file. Those terms are discussed in detail in point 7 below. The following is a summary of the main steps performed by such a loader:

1. It reads, from the standard input device, the names of all the object files to be loaded. Some may be library routines.
2. It locates all the object files, opens each, and reads the first record. This record (see figure 7-3b) is a loader directive containing the size of the program written in that file. The loader adds the individual sizes to compute the total size of the program. With the OS's help, the loader then locates an available memory area large enough to accommodate the program.
3. The next step is to read the next couple of items from the first object file. These are loader directives, each corresponding to a special symbol (EXTRN or ENTRY). This information is loaded in memory into a special symbol table (SST) to be used later for linking.
4. Step 3 is repeated for all remaining object files. After reading all the special-symbol information from all the object files, the loader scans the SST, merging items as described below. This process converts the SST into a global external symbol table (GEST). If no errors are discovered during this process, the GEST is ready and the loader uses it later to perform linking.
5. The loader then reads the rest of the first object file and loads it, relocating instructions when necessary. All loader directives found in the file are executed. Any item requiring special relocation is handled as soon as it is read off the file, using information in the GEST. Some of those items may require loading routines from libraries (see later in this chapter).
6. Step 5 is repeated for all remaining object files. They are read and loaded in the order in which their names were read in step 1.
7. The loader generates three outputs. The main output is the loaded program. It is loaded in memory as one executable module in which one cannot tell that instructions came from different object files. In a computer where virtual memory is used, the program is physically divided into pages (or logically divided into segments) which are loaded into different areas of memory. In such a case the program does not occupy a contiguous memory area. Nevertheless, it is considered a single module and is executed as one unit. Pages and segments are described in any operating systems or systems programming text.
The second (optional) output of the loader is a listing file with error messages, if any, and a memory map. The memory map contains, for each program, its name, start address, and size. The name is specified by the user in a special directive (IDENT or TITLE) or, in the absence of such a directive, it is the name of the object file. The third loader output is also optional and is a single object file for the entire program. This file includes all the individual programs after linking, so it does not include any linking information, but it does include relocation bits. It is called a load module. Such a file can later be loaded by a relocating loader without having to do any linking, which speeds up the loading. Note that a load module is the main output of a linkage editor (see below). The reason for this output is that, in a production environment where programs are loaded and executed frequently, but rarely need to be reassembled (or recompiled), fast loading becomes important. In such an environment it makes sense to use two types of loaders. The first is a linker or a linkage editor, which performs just linking and produces a load module. The second is a simple relocating loader that reads and loads a load module, performing just the three tasks of memory allocation, loading, and relocation. By eliminating linking, the relocating loader works fast. On the other hand, when programs are being developed and tested, they have to be reassembled or recompiled very often. In such a case it makes more sense to use a full-feature loader, which performs all four tasks. Using two loaders would be slower, since most runs would involve a new version of the program and would necessitate executing both loaders.

Linking can be done at a number of different stages. It turns out that late linking allows for more flexibility. The latest possible moment to do the linking is at run time. This is the dynamic linking feature discussed later in this chapter. Consider an instruction that requires linking, something like a CALL LB instruction, which calls a library routine LB. This instruction is loaded but is not always executed. (Recall that, each time a program is run, different instructions are executed.) Doing the linking at run time has the advantage that, if the CALL LB instruction is not executed, the library routine does not have to be loaded. Of course there is a tradeoff. Run-time linking requires some of the loader routines to reside in memory with the program.
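To make the linking step concrete, here is a minimal C sketch of a global external symbol table and its use during special relocation. All structure layouts and names (gest_entry, gest_resolve, and so on) are hypothetical illustrations, not the actual data structures of any particular loader:

    #include <stdio.h>
    #include <string.h>

    /* One entry in the global external symbol table (GEST),
       built by merging the special symbol tables (SSTs) read
       from the individual object files. */
    struct gest_entry {
        char name[9];   /* external symbol name                */
        int  defined;   /* nonzero if an ENTRY for it was seen */
        int  address;   /* absolute address after loading      */
    };

    static struct gest_entry gest[256];
    static int gest_size;

    /* Look up an EXTRN reference: return the loaded address,
       or -1 if no object file contained a matching ENTRY. */
    static int gest_resolve(const char *name)
    {
        for (int i = 0; i < gest_size; i++)
            if (gest[i].defined && strcmp(gest[i].name, name) == 0)
                return gest[i].address;
        return -1;
    }

    /* Special relocation of one loaded word: add the symbol's
       address into its address field as soon as the item is
       read off the object file (step 5 above). */
    static void relocate_external(int *word, const char *name)
    {
        int addr = gest_resolve(name);
        if (addr < 0)
            fprintf(stderr, "unresolved external: %s\n", name);
        else
            *word += addr;
    }

    int main(void)
    {
        gest[gest_size++] = (struct gest_entry){ "SQRT", 1, 0x1200 };
        int word = 0x0040;          /* a CALL with a zero address field */
        relocate_external(&word, "SQRT");
        printf("relocated word: 0x%x\n", word);
        return 0;
    }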
Overlays
Many modern computers use virtual memories that make it possible to run programs larger than the physical memory. Either one program or several programs can be executed even if the total size is greater than the entire memory available. When a computer does not use virtual memory, running a large program becomes a problem. One solution is overlays (or chaining), which will be discussed here since its implementation involves the loader. Overlays are based on the fact that many programs can be broken into logical parts such that only one part is needed in memory at any time. The program is
divided, by the programmer, into a main part (the overlay root), that resides in memory during the entire execution, and several overlays (links or segments) that can be called, one at a time, by the root, loaded and executed. All the links share the same memory area, whose size should be the maximum size of the links. A link may contain one program or several programs, linked in the usual way. At any given time, only the root and one link are active (but see the discussion of sublinks and tree structure below). Two features are needed to implement overlays:
1. A directive declaring the start of each overlay. Those directives are recognized by the assembler which, in turn, prepares a separate object file for each overlay.
2. A special CALL OVERLAY instruction to load an overlay (a link) at run time. Such an instruction calls a special loader routine, the overlay manager, resident in memory with the main program, which loads the specific overlay from the object file into the shared memory area.
The last executable instruction in the overlay must be a return. It should return to the calling program, which is typically the main part, but could also be another overlay. Such a return works either by popping the return address from the stack, or by generating a software interrupt that transfers control to the overlay manager in the OS. A typical directive declaring an overlay is OVERLAY n (or LINK n), where n is the overlay number. Each such directive directs the assembler to finish the previous assembly, write an object file for the current overlay, and start a new assembly for the next overlay. The END directive terminates the last link. The result is a number of object files, the first of which is a regular one, containing the main program. All the rest are special, each containing a loader directive declaring it to be an overlay and specifying the number of the overlay. The loader receives the names of all the object files; it loads the first one but, upon opening the other ones, finds that they are overlays. As a result, the other object files are not loaded but are left open, accessible to the loader. The loader uses the maximum size of those files as the size of the shared memory area and loads, following the main program, a routine that can locate and load an overlay. At run time, each CALL OVERLAY[n] (or CALL LINK) instruction invokes that routine, which loads the overlay, on top of the previous one, into the shared area. As far as relocating the different overlays, there are two possibilities. The first is to relocate each overlay while it is loaded. The other is to prepare a pre-relocated (absolute) version of each overlay and load the absolute versions. This requires more load-time work but speeds up loading the overlays at run time. Generally, an overlay is a large part of the program and is not loaded many times. In such a case, the first alternative, of relocating the overlay each time it is loaded, seems a better choice. In general, each overlay may be very large, and sub-overlays can be declared. The result is a program organized as a tree where each branch corresponds to an overlay, each smaller branch to a sub-overlay, etc. Figure 77 is an example of such a tree. The table below assumes certain sizes for the different links and a start address
of 0 for the root A. It then shows the start addresses of each link and the total size of the program when that link is loaded.
Figure 77. An Overlay Tree.
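As a rough sketch of the run-time half of this mechanism, the following C fragment shows what an overlay manager might do when CALL OVERLAY[n] is executed. The file handles, sizes, and names are hypothetical, and the final transfer of control is machine dependent, so it is only indicated by a comment:

    #include <stdio.h>
    #include <stdlib.h>

    #define NLINKS      8
    #define SHARED_SIZE 65536         /* maximum size over all links */

    static unsigned char shared[SHARED_SIZE]; /* area shared by all links */
    static FILE *overlay_file[NLINKS];        /* left open by the loader  */

    /* Invoked at run time by CALL OVERLAY[n]: load link n into the
       shared area, on top of whatever link was there before. */
    void call_overlay(int n)
    {
        FILE *f = overlay_file[n];
        rewind(f);
        size_t size = fread(shared, 1, SHARED_SIZE, f);
        if (size == 0) {
            fprintf(stderr, "cannot load overlay %d\n", n);
            exit(1);
        }
        /* If the links were not pre-relocated, relocate the code in
           shared[] here, then branch to its first instruction; the
           link's final instruction returns to the caller. That
           transfer of control is machine dependent and omitted. */
    }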
UNIX: OPERATING SYSTEM
Booting UNIX: Loading the Kernel: Most systems, particularly PCs, implement a two-stage loading process: the system BIOS loads a small boot program, and this small boot program in turn loads the kernel. On PCs, this small boot program exists within the first 512 bytes of the boot device. This 512-byte segment is called the Master Boot Record (MBR). The MBR is what loads the kernel from disk. A popular boot loader used by most Linux distributions to boot Linux is called LILO. LILO can also be used to boot other operating systems, such as MS-DOS, Windows 98 and Windows NT. LILO can be installed to either the MBR or to the boot record of the Linux root partition. Install it to the boot record instead of the MBR if you want to use another boot loader for another OS which does not know how to boot Linux itself. For example, say that you want to have both Windows NT and Linux on the same box and you want to dual boot between them. You can have NT's boot loader installed to the MBR and add the option to boot Linux to its boot menu. If the user elects to boot Linux, NTLOADER will then pass control to LILO, which will in turn load the Linux kernel. FreeBSD has something similar to LILO for loading its kernel. It consists of two parts: one which lives in the MBR (see man boot0cfg) and another which lives in the FreeBSD root partition (see man disklabel). UNIX and UNIX-like systems for non-PC hardware typically follow a straightforward (but usually proprietary and system-specific) scheme for booting their kernels. The kernel itself is a program that usually lives in the root partition of the UNIX filesystem. Most Linux distributions call it /vmlinuz, and it is often a symbolic link to the real kernel file which lives in /boot. Other UNIX and UNIX-like systems may call it /unix, /vmunix, or /kernel. After the kernel is brought in from disk into main memory, it begins execution, and one of the first things it does is initialize the system's hardware. All those cryptic messages you see fly by when the Linux kernel first starts up are messages from the compiled-in kernel drivers initializing and configuring your hardware. Other UNIX and UNIX-like systems do something similar. Sometimes the kernel needs help in configuring your hardware: information such as IRQ, DMA, and I/O base addresses may need to be specified to the kernel. With Linux these can be specified via its command line. The BootPrompt-HOWTO has more information about the Linux command line; it can be obtained from http://www.linuxdoc.org.
The first program the kernel attempts to execute after basic system initialization is complete is called init. The init process is the mother of all processes running on a UNIX system; if this process dies, so does the system. init's job after basic system initialization is complete is to take over the system start-up procedure and complete the system bootstrap process. The actual program which the Linux kernel executes as the init process can be specified via the init command line parameter. For example, to start bash instead of init, you can specify init=/bin/bash on the Linux command line. (See the BootPrompt-HOWTO for details.)

Startup Scripts, System V Style: All start-up scripts are typically kept in a directory named init.d which usually lives somewhere under /etc. Red Hat Linux places this directory under /etc/rc.d; HP-UX places it under /sbin. Each start-up script can usually accept at least two command line arguments: start and stop. start tells the script to start whatever it is that the script is responsible for; stop tells the script to stop it.
Each run-level gets its own directory, usually under /etc, but on some systems it can be found under /sbin. This directory follows the naming convention rcn.d, where n is the run-level; i.e. scripts for run-level 2 would be found under a directory named rc2.d. This directory contains scripts which are executed when that run-level is entered. While this directory can contain actual scripts, it usually consists of symbolic links to real scripts which live under the init.d directory. Scripts in the run-level directory are executed in alphanumeric order; if the script name begins with an S the script is passed the start command line parameter, and if it begins with a K it is passed the stop command line parameter. SysV init's configuration file is /etc/inittab. This file tells init what script it should run for each run-level. A common way to implement the SysV style start-up procedure is to have init execute some master control script, passing to it as an argument the run-level
number. This script then executes all of the scripts in that run-level's script directory. For example, for run-level 2, init may execute the script /etc/init.d/rc, passing it the argument 2. This script in turn would execute every script in run-level 2's script directory, /etc/rc2.d.

SINGLE USER MODE: Single-user mode is a special administrative mode that usually starts the system with a minimal configuration. For example, no system daemons are started and extra filesystems may not be mounted. Single-user mode is typically used to repair a broken system, such as fscking a sick filesystem which cannot be repaired by the automatic fscking procedure. Entering single-user mode varies from system to system, but it usually involves passing init a special flag before the system starts up. This can be done in Linux by specifying the parameter single at the LILO boot prompt. On SysV-ish systems, single-user mode can also be entered by telling init to enter run-level 1 or S. This can be done via the telinit command.

SYSTEM SHUTDOWN: UNIX systems have to be gracefully powered down; you cannot just shut the system off, as this can damage the system. The typical way to shut down a UNIX system is to use the shutdown command. shutdown allows the system administrator to broadcast a message to all currently logged-in users that the system is about to be shut down. The exact syntax of the shutdown command tends to vary from system to system; check shutdown's man page for details.
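As a rough illustration of the run-level mechanism described above, this C sketch does what the master control script does for one run-level directory: it runs each S script with start and each K script with stop, in alphanumeric order. The directory name is just an example:

    #include <stdio.h>
    #include <stdlib.h>
    #include <dirent.h>

    /* Run the scripts in one run-level directory, e.g. /etc/rc2.d:
       S* scripts get "start", K* scripts get "stop", executed in
       alphanumeric order, as described above. */
    static void run_level(const char *dir)
    {
        struct dirent **names;
        int n = scandir(dir, &names, NULL, alphasort);
        if (n < 0) { perror(dir); return; }
        for (int i = 0; i < n; i++) {
            const char *name = names[i]->d_name;
            const char *arg = (name[0] == 'S') ? "start" :
                              (name[0] == 'K') ? "stop"  : NULL;
            if (arg) {
                char cmd[512];
                snprintf(cmd, sizeof cmd, "%s/%s %s", dir, name, arg);
                system(cmd);   /* run the (usually symlinked) script */
            }
            free(names[i]);
        }
        free(names);
    }

    int main(void)
    {
        run_level("/etc/rc2.d");   /* example: enter run-level 2 */
        return 0;
    }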
Operating Systems and Basics
An OS is system software, which may be viewed as a collection of software consisting of procedures for operating the computer and providing an environment for execution of programs. It is an interface between the user and the computer.
Types of processing: 1. Serial processing 2. Batch processing 3. Multiprogramming.
Types of OSs: 1. Batch OS 2. Multiprogramming OS (multitasking/multiprocessing, multiuser OS, time-sharing OS, real-time OS) 3. Network OS 4. Distributed OS.
OS structure: 1. Layered structure 2. Kernel structure (create & delete processes; processor scheduling, memory management & I/O management; process synchronization; IPC help) 3. Virtual machine 4. Client-server model.
Process Management
Process states: new, ready to run, running, suspended, sleep, wait, terminated.
Types of scheduler: 1. Long-term/job scheduler 2. Medium-term scheduler 3. Short-term/CPU scheduler. Processes can be either CPU bound or I/O bound.
Scheduling performance criteria: CPU utilisation, throughput, turnaround time, waiting time, response time.
Scheduling algorithms (preemptive or non-preemptive): First-come-first-served, Shortest-job-first, Round Robin, Priority-based scheduling, Multi-level queue.
Processing the interrupt to switch the CPU to another process, which requires saving all the registers for the old process and then loading the registers for the new process, is known as context switching.
Scheduling Mechanisms
A multiprogramming operating system allows more than one process to be loaded into executable memory at a time and allows the loaded processes to share the CPU using time-multiplexing. Part of the reason for using multiprogramming is that the operating system itself is implemented as one or more processes, so there must be a way for the operating system and application processes to share the CPU. Another main reason is the need for processes to perform I/O operations in the normal course of computation. Since I/O operations ordinarily require orders of magnitude more time to complete than do CPU instructions, multiprogramming systems allocate the CPU to another process whenever a process invokes an I/O operation.
Context Switching
Typically there are several tasks to perform in a computer system. So if one task requires some I/O operation, you want to initiate the I/O operation and go on to the next task; you will come back to it later. This act of switching from one process to another is called a "context switch".
When you return to a process, you should resume where you left off. For all practical purposes, this process should never know there was a switch; it should look as if this was the only process in the system. To implement this, on a context switch, you have to:
1. save the context of the current process,
2. select the next process to run,
3. restore the context of this new process.
All this information is usually stored in a structure called the Process Control Block (PCB); all of the above has to be saved and restored. context_switch() is called even while the process is running; this is usually done via a timer interrupt.
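The kernel's context switch itself is machine dependent, but POSIX exposes a user-space analogue in <ucontext.h> that makes the save/select/restore cycle tangible. A minimal sketch (two contexts standing in for two processes; swapcontext() saves the current register context and restores the other):

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, task_ctx;
    static char task_stack[16384];

    static void task(void)
    {
        printf("task: running, now switching back\n");
        swapcontext(&task_ctx, &main_ctx);   /* save task, restore main */
    }

    int main(void)
    {
        getcontext(&task_ctx);               /* initialise the context  */
        task_ctx.uc_stack.ss_sp = task_stack;
        task_ctx.uc_stack.ss_size = sizeof task_stack;
        task_ctx.uc_link = &main_ctx;        /* resume main if task ends */
        makecontext(&task_ctx, task, 0);

        printf("main: switching to task\n");
        swapcontext(&main_ctx, &task_ctx);   /* save main, restore task */
        printf("main: back again\n");
        return 0;
    }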
Round Robin
Round Robin calls for the distribution of processing time equitably among all processes requesting the processor. Run each process for one time slice, then move it to the back of the queue. Each process gets an equal share of the CPU. Most systems use some variant of this.
Suppose we use a 1 ms time slice: then a compute-bound process gets interrupted 9 times unnecessarily before an I/O-bound process is runnable.
Problem: Round robin assumes that all processes are equally important; each receives an equal portion of the CPU. This sometimes produces bad results. Consider three processes that start at the same time, each requiring three time slices to finish. Using FIFO, how long does it take the average job to complete (what is the average response time)? How about using round robin?
* Under FIFO the jobs finish after 3, 6, and 9 slices, so the average is (3+6+9)/3 = 6 slices. Under round robin, process A finishes after 7 slices, B after 8, and C after 9, so the average is (7+8+9)/3 = 8 slices. Round Robin is fair, but uniformly inefficient. Solution: introduce priority-based scheduling.
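A small self-contained C simulation of this example (three jobs, three slices each, all arriving together) reproduces both averages:

    #include <stdio.h>

    int main(void)
    {
        int t = 0;

        /* FIFO: run each job to completion, in arrival order. */
        for (int i = 0; i < 3; i++) {
            t += 3;
            printf("FIFO: job %c done at slice %d\n", 'A' + i, t);
        }
        /* average completion: (3+6+9)/3 = 6 slices */

        /* Round robin: one slice per job, cycling until all done. */
        int left[3] = { 3, 3, 3 }, done = 0;
        t = 0;
        while (done < 3)
            for (int i = 0; i < 3; i++)
                if (left[i] > 0) {
                    t++;
                    if (--left[i] == 0) {
                        printf("RR:   job %c done at slice %d\n", 'A' + i, t);
                        done++;
                    }
                }
        /* average completion: (7+8+9)/3 = 8 slices */
        return 0;
    }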
Comments: In priority scheduling, processes are allocated the CPU on the basis of an externally assigned priority. The key to the performance of priority scheduling is in choosing priorities for the processes. Problem: priority scheduling may cause low-priority processes to starve. Solution (aging): this starvation can be compensated for if the priorities are internally computed. Suppose one parameter in the priority assignment function is the amount of time the process has been waiting. The longer a process waits, the higher its priority becomes. This strategy tends to eliminate the starvation problem.
Comments: SJF is proven optimal only when all jobs are available simultaneously.
Problem: SJF minimizes the average wait time because it services small processes before it services large ones. While it minimizes average wait time, it may penalize processes with high service-time requests. If the ready list is saturated, then processes with large service times tend to be left in the ready list while small processes receive service. In the extreme case, where the system has little idle time, processes with large service times will never be served. This total starvation of large processes may be a serious liability of this algorithm. Solution: Multi-Level Feedback Queues.
Multi-level feedback attacks both the efficiency and response time problems. Give a newly runnable process a high priority and a very short time slice. If the process uses up the time slice without blocking, then decrease its priority by 1 and double its next time slice. This is often implemented by having a separate queue for each priority. How are priorities raised? By 1 if the process doesn't use its whole time slice? What happens to a process that does a lot of computation when it starts, then waits for user input? Its priority needs to be boosted a lot, and quickly.
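A minimal sketch of that feedback rule in C (the level count, base slice, and field names are illustrative, not from any particular system):

    /* Multi-level feedback: each process starts at the highest
       priority with a short slice; using a full slice without
       blocking drops its priority and doubles its next slice. */
    struct task {
        int priority;     /* 0 = highest; one queue per level */
        int slice;        /* length of its next time slice    */
    };

    #define LEVELS     8
    #define BASE_SLICE 1  /* e.g. one clock tick */

    void new_task(struct task *t)
    {
        t->priority = 0;
        t->slice = BASE_SLICE;
    }

    /* Called when the task used its whole slice (CPU bound). */
    void used_full_slice(struct task *t)
    {
        if (t->priority < LEVELS - 1)
            t->priority++;
        t->slice *= 2;
    }

    /* Called when the task blocked early, e.g. for user input:
       boost it back up quickly so it stays responsive. */
    void blocked_early(struct task *t)
    {
        t->priority = 0;
        t->slice = BASE_SLICE;
    }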
Swapping
The early development of UNIX systems transferred entire processes between primary memory and secondary storage devices but did not transfer parts of a process independently, except for shared text. Such a memory management policy is called swapping. UNIX was first implemented on the PDP-11, where the total physical memory was limited to 256 Kbytes. The total memory resources were insufficient to justify or support complex memory management algorithms; thus, UNIX swapped entire process memory images. Allocation of both main memory and swap space is done first-fit. When the size of a process's memory image increases (due to either stack expansion or data expansion), a new piece of memory big enough for the whole image is allocated. The memory image is copied, the old memory is freed, and the appropriate tables are updated. (An attempt is made in some systems to find memory contiguous to the end of the current piece, to avoid some copying.) If no single piece of main memory is large enough, the process is swapped out such that it will be swapped back in with the new size. There is no need to swap out a sharable text segment, because it is read-only, and there is no need to read in a sharable text segment for a process when another instance is already in memory. That is one of the main reasons for keeping track of sharable text segments: less swap traffic. The other reason is the reduced amount of main memory required for multiple processes using the same text segment. Decisions regarding which processes to swap in or swap out are made by the scheduler process (also known as the swapper). The scheduler wakes up at least once every 4 seconds to check for processes to be swapped in or out. A process is more likely to be swapped out if it is idle, has been in main memory for a long time, or is large; if no obvious candidates are found, other processes are picked by age. A process is more likely to be swapped in if it has been swapped out a long time, or is small. There are checks to prevent thrashing, basically by not letting a process be swapped out if it has not been in memory for a certain amount of time. If jobs do not need to be swapped out, the process table is searched for a process deserving to be brought in (determined by how small the process is and how long it has been swapped out). Processes are swapped in until there is not enough memory available. Many UNIX systems still use the swapping scheme just described. All Berkeley UNIX systems, on the other hand, depend primarily on paging for memory-contention management, and depend only secondarily on swapping. A scheme similar in outline to the traditional one is used to determine which processes get swapped in or out, but the details differ and the influence of swapping is less.
Demand Paging
As there is much less physical memory than virtual memory, the operating system must be careful that it does not use the physical memory inefficiently. One way to save physical memory is to load only those virtual pages that are currently being used by the executing program. For example, a database program may be run to query a database. In this case not all of the database needs to be loaded into memory, just those data records that are being examined. Also, if the database query is a search query, then it does not make sense to load the code from the database program that deals with adding new records. This technique of only loading virtual pages into memory as they are accessed is known as demand paging. When a process attempts to access a virtual address that is not currently in memory, the CPU cannot find a page table entry for the virtual page referenced. For example, suppose there is no entry in process X's page table for virtual PFN 2: if process X attempts to read from an address within virtual PFN 2, the CPU cannot translate the address into a physical one. At this point the CPU cannot cope and needs the operating system to fix things up. It notifies the operating system that a page fault has occurred, and the operating system makes the process wait whilst it fixes things up. The CPU must bring the appropriate page into memory from the image on disk. Disk access takes a long time, relatively speaking, and so the process must wait quite a while until the page has been fetched. If there are other processes that could run, then the operating system will select one of them to run. The fetched page is written into a free physical page frame and an entry for the virtual PFN is added to the process's page table. The process is then restarted at the point where the memory fault occurred. This time the virtual memory access is made, the CPU can make the address translation, and so the process continues to run. This is known as demand paging and occurs when the system is busy but also when an image is first loaded into memory. This mechanism means that a process can execute an image that only partially resides in physical memory at any one time.
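The sequence above can be sketched as a toy page table with a valid bit: the first touch of a page faults and loads it; later touches translate directly. Everything here (sizes, helper behaviour) is illustrative, not a real kernel's fault handler:

    #include <stdio.h>

    #define NPAGES 8

    /* Toy page table entry: a valid bit plus a frame number. */
    struct pte { int valid, frame; };
    static struct pte page_table[NPAGES];
    static int next_free_frame;

    /* Access a virtual page, faulting it in on demand. */
    static void touch(int vpn)
    {
        if (!page_table[vpn].valid) {
            /* Page fault: here the process would wait while the
               page is read from the image on disk, and the OS
               would run another process in the meantime. */
            int pfn = next_free_frame++;
            page_table[vpn].valid = 1;
            page_table[vpn].frame = pfn;
            printf("page fault: page %d loaded into frame %d\n", vpn, pfn);
        } else {
            printf("page %d already in frame %d\n", vpn, page_table[vpn].frame);
        }
    }

    int main(void)
    {
        touch(2); touch(2); touch(5);   /* only touched pages are loaded */
        return 0;
    }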
Synchronization & IPC
The shared storage may be in main memory or it may be a shared file. Each process has a segment of code, the critical section, which accesses shared memory or files. Some way of making sure that if one process is executing in its critical section, the other processes will be excluded from doing the same thing, is known as mutual exclusion. Hardware support is available for mutual exclusion in the form of the test-and-set instruction, which is designed to allow only one process among several concurrent processes to enter its critical section.
Semaphore: a synchronization tool; a variable which accepts non-negative integer values and, except for initialization, may be accessed and manipulated only through two primitive functions, wait() & signal(). Disadvantages: 1. Semaphores are unstructured. 2. Semaphores do not support data abstraction. Alternatives to semaphores: 1. Critical region 2. Conditional critical region 3. Monitors 4. Message passing.
Reasons for deadlock: 1. Mutual exclusion 2. Hold & wait 3. No preemption 4. Circular wait.
Memory Management
In a single-process system, memory is protected through a hardware mechanism such as a dedicated register called the fence register. In multiprogramming, memory can be allocated either statically or dynamically. Partition information is stored in the partition description table. Two strategies used to allocate memory to a ready process are first fit & best fit.
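A sketch of wait() and signal() built on an atomic compare-and-swap (the modern relative of test-and-set), using C11 atomics. The trailing underscores are only to avoid clashing with the POSIX wait() and signal() calls:

    #include <stdatomic.h>

    /* A busy-wait (spinning) semaphore: a non-negative counter
       manipulated only through wait_() and signal_(). */
    typedef struct { atomic_int value; } semaphore;

    void sem_init_(semaphore *s, int v) { atomic_init(&s->value, v); }

    void wait_(semaphore *s)         /* decrement, never below zero */
    {
        for (;;) {
            int v = atomic_load(&s->value);
            if (v > 0 && atomic_compare_exchange_weak(&s->value, &v, v - 1))
                return;              /* entered: counter decremented */
            /* otherwise spin: another process holds the semaphore */
        }
    }

    void signal_(semaphore *s)       /* increment, releasing a waiter */
    {
        atomic_fetch_add(&s->value, 1);
    }

    /* Mutual exclusion: initialize the semaphore to 1, then bracket
       the critical section with wait_(&mutex); ... signal_(&mutex); */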
Loading a program into memory by a relocating loader or linker with static allocation is known as static relocation. In the dynamic method, virtual addresses are mapped at run time into physical addresses with the support of some hardware mechanism such as base & limit registers. Protection is served by using limit registers to restrict the program's access to memory locations; sharing is achieved by using a dedicated common partition. Static allocation does not support data structures like stacks & queues, and it limits the degree of multiprogramming. Compaction is the process of collecting free space into a single large memory chunk to fit the available process. It is often not done because it occupies a lot of CPU time; it is only supported on mainframes & supercomputers. Paging is a memory management technique that permits a program's memory to be noncontiguous in physical memory, thus allowing a program to be allocated physical memory whenever it is required. This is done through virtual addresses, which are later converted to physical addresses. Memory is divided into a number of fixed-size blocks called frames. The virtual address space or logical memory of a process is also broken into blocks of the same size called pages. When a program is to be run, its pages are loaded from the disk into any available frames. Mapping is done through the page map table (PMT), which contains the base address of each page in physical memory. Hardware support is given to paging by the page map table register (PMTR), which points to the beginning of the PMT. Lookaside memory, or content-addressable memory, is used to overcome the lookup cost of the PMT: address translation is done by this associative memory, which converts a virtual address to a physical one from the page & offset values by looking into the PMT. Segmentation is a memory management scheme with a more sophisticated form of address translation. It is done through the segment table, an important component in a segmented system. Segment accessing is supported by the segment table base register (STBR); protection is enforced by the segment table limit register (STLR). Virtual memory is a memory management technique which splits the process into small chunks called overlays. The first overlay will call the next overlay before quitting the CPU; the remaining overlays stay on the hard disk, and the swapping is done by the OS. Advantages: 1. Whatever the size of the program, memory can be allocated easily. 2. Since the swapping is done between main & secondary memory, CPU utilization & throughput increase. 3. It reduces external fragmentation & increases program execution speed. In demand paging, pages are loaded only on demand, not in advance; it is the same as paging with a swapping feature.
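A toy version of the translation the PMT performs, in C (the page size and table contents are arbitrary examples): the virtual address splits into a page number and an offset, and the PMT supplies the frame:

    #include <stdio.h>

    #define PAGE_SIZE 256   /* example page/frame size in bytes */
    #define NPAGES    4

    /* Page map table: frame number for each virtual page. */
    static const int pmt[NPAGES] = { 5, 9, 2, 7 };

    static int translate(int vaddr)
    {
        int page   = vaddr / PAGE_SIZE;   /* virtual page number */
        int offset = vaddr % PAGE_SIZE;   /* unchanged by paging */
        return pmt[page] * PAGE_SIZE + offset;
    }

    int main(void)
    {
        /* virtual 0x123 = page 1, offset 0x23 -> frame 9 -> 0x923 */
        printf("virtual 0x123 -> physical 0x%x\n", translate(0x123));
        return 0;
    }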
A page fault occurs due to a missing page in main memory; it means that the program is referring to the address of a page which has not been brought into memory.
File Management
Some systems support a single uniform set of file manipulation features for both file & I/O device management; this feature is known as device-independent I/O, or device independence. The printer is one such example. File organization may be: 1. Byte-sequenced, in which the OS does not impose any structure on the file organization. 2. Record-sequenced: a sequence of fixed-size records; arbitrary records can be read or written, but records cannot be inserted or deleted in the middle of the file. 3. ISAM: records are inserted in disk blocks indexed by keys; the organization looks like a tree of blocks.
Responsibilities of file management: 1. Mapping of logical file addresses to physical disk addresses 2. Management of disk space & allocation-deallocation 3. Keeping track of all files in the system 4. Support for protection & sharing of files.
Methods to access a file organized in hierarchical form: absolute pathname & relative pathname. File & directory searching is done using: 1. Linear list organization, which takes O(n) comparisons to locate a file 2. Hashing 3. Balanced binary tree, which takes O(log n) comparisons to locate a file; it always provides a sorted list of files, which increases efficiency.
The collection of tracks on all surfaces that are at the same distance is called a cylinder. Disk space management methods: 1. Linked list 2. Bit map.
Disk allocation methods: 1. Contiguous: supports both sequential & direct access. Allocation is done using first-fit & best-fit methods.
2. Linked list: Advantages: simple; no disk compaction. Disadvantages: it does not support direct access, since blocks are scattered over the disk; slow direct access of any disk block; space requirement for pointers; reliability. 3. Indexed: uses an index block to support direct access. Problems with large files can be solved by multiple-level indexing: indirect blocks, double indirect blocks. Advantages: no external fragmentation; efficient random access; indexing of free space can be done with a bitmap; can keep an index of bad blocks.
Disk scheduling: 1. First-come-first-served (FCFS) 2. Shortest Seek Time First 3. Scan scheduling, also called the elevator algorithm.
Setup and Status Commands
Command Purpose
logout end your UNIX session
passwd change password by prompting for old and new passwords
stty set terminal options

TABLE 1. Special Keys and Control Characters
Special Key Function/Description
DELETE Acts as a rubout or erase key. Pressing DELETE once will back up and erase one character, allowing you to correct and retype mistakes.
BACKSPACE This key is sometimes used as the rubout key instead of the DELETE key. Otherwise, it is mapped as a backspace key, which generates a ^H on the display.
CTRL-U ^U erases the entire command line. It is also called the line kill character.
CTRL-W ^W erases the last word on the command line.
CTRL-S ^S stops the flow of output on the display.
CTRL-Q ^Q resumes the flow of output stopped by CTRL-S.
CTRL-C ^C interrupts a command or process in progress and returns to the command line. This will usually work; if it doesn't, try typing several ^C's in a row. If it still doesn't work, try typing ^\, q (for quit), exit, ^D, or ^Z.
CTRL-Z ^Z suspends a command or process in progress.
CTRL-D ^D generates an end-of-file character. It can be used to terminate input to a program, or to end a session with a shell.
CTRL-\ ^\ quits a program and saves an image of the program in a file called core for later debugging.
A Selected Command List
date display or set the date
finger display information about users
ps display information about processes
env display or change current environment
set C shell command to set shell variables
alias C shell command to define command abbreviations
history C shell command to display recent commands
Editing Tools
Command Purpose
pico simple text editor
vi screen oriented (visual) display editor
diff show differences between the contents of files
grep search a file for a pattern
sort sort and collate lines of a file (only works on one file at a time)
wc count lines, words, and characters in a file
look look up specified words in the system dictionary
awk pattern scanning and processing language
gnuemacs advanced text editor
lprloc locations & names of printers, prices per page
pacinfo current billing info for this account
text file, whereas the file payroll.c indicates a C program called payroll. For more information on programming conventions, see the section, Additional Resources. Some UNIX files begin with a period, for example, .cshrc or .login. Files that begin with a period will not appear in a normal directory listing and are usually UNIX environment and application setup files. A large grouping of files and directories is referred to as a file system. File systems are related to the disk size and structure, and to the internal structure of UNIX. What you should remember is that users' files and directories are usually on a different file system than the system's files and directories. If the number of users is large, as on Owlnet, the user files and directories may be on more than one file system.
Creating Files
Many files are created using a text editor. A text editor is a program that allows you to enter and save text. You can also use a text editor to manipulate saved text through corrections, deletions, or insertions. The main text editors on Information Technology managed networks are vi, GNU Emacs, Pico, and aXe. (Note: vi is included with every UNIX system, but GNU Emacs is commonly installed separately by system managers. aXe is only available if you are using the X Window system.) You should learn how to use at least one of these tools. Information Technology has tutorial documents on each of these editors. Please see the section, Additional Resources, for information on the tutorials. You can create a file without a text editor by using the cat command (short for concatenate) and the > (redirect output) symbol. To create a file using the cat command, type: cat > new-filename where new-filename is the name you wish to give the file. The command cat generally reads in a file and displays it to standard output. When there is no filename directly following the command, cat treats standard input as a file. The > symbol will redirect the output from cat into the new filename you specify. cat will keep reading and writing each line you type until it encounters an end-of-file character. By typing CTRL-D on a line by itself, you generate an end-of-file character. It will stop when it sees this character. Try it, using this example as a guide: cat > practice When you reach the end of each line, press the RETURN key. You can only correct mistakes on the line you are currently typing. Use the DELETE key to move the cursor back to the mistake and then retype the rest of the line correctly. When you have completed the last line, press RETURN and type CTRL-D.
Displaying Files
Now that you have created a file, you can display it one of several ways. You could use the cat command. Just type cat followed by the name of the file that you want to see. cat practice Sometimes the files you want to view are very long. When using the cat command, the text will scroll by very quickly. You can control the flow of text by using CTRL-S and CTRL-Q. CTRL-S stops the flow of text and CTRL-Q restarts it. If you use CTRL-S, stopping the flow of text, and so on, you must remember to type CTRL-Q or the computer will not display any output, including anything that you type. more is a program that displays only one screen of information at a time; it waits for you to tell it to continue. Type more followed by a filename. more practice The computer will display one screen of text and then wait for you to press the space bar before it displays the next page of text, until you reach the end of the file. Pressing the ? character will show help for more. A utility of greater power called less is available on many systems; it allows reverse scrolling of files and other enhancements. It is invoked the same way as more.
Listing Files
The ls command will list the files in the current directory that do not begin with a period. Below is a list of options you can tack on to ls:
ls -a lists all the contents of the current directory, including files with initial periods, which are not usually listed.
ls -l lists the contents of the current directory in long format, including file permissions, size, and date information.
ls -s lists contents and file sizes in kilobytes of the current directory.
If you have many files, your directory list might be longer than one screen. You can use the programs more or less with the | (vertical bar or pipe) symbol to pipe the directory list generated as output by the ls command into the more program. more or less will display the output from ls one page at a time. ls | more
Copying Files
To make a copy of a file, use the cp (copy) command. cp filename newfilename where filename is the file you wish to copy and newfilename is the file you are creating. cp practice sample (make a copy of practice called sample) ls practice sample The example created a new file called sample that has the same contents as practice. If sample already exists, the cp command will overwrite the previous contents. New accounts are often set up so that cp will prompt for confirmation before it overwrites an existing file. If your account is not set up in this manner, use the -i option (cp -i) to get the confirmation prompt, like so: cp -i practice sample
Renaming Files
To rename one of your files, use the mv (move) command. mv oldfilename newfilename where oldfilename is the original filename and newfilename is the new filename. For instance, to rename sample as workfile type: mv sample workfile ls practice workfile This moves the contents of sample into the new file workfile. (Note: Moving a file into an existing file overwrites the data in the existing file.) New accounts are often set up so that mv will prompt for confirmation before doing this. If your account is not set up in this manner, use the -i option (mv -i) to get the confirmation prompt.
Deleting Files
To delete files, use the rm (remove) command. For instance, to delete workfile, type: rm workfile ls practice
directories, creating links is a better alternative to making a copy of the file for each directory (and then having to alter each one every time a change is made to the original). It is also more convenient than having to use the file's full pathname every time you need to access it. Another use for linking a file is to allow another user access to that particular file without also allowing entry into the directory that actually contains the file. The kind of link you will want to create is called a symbolic link. A symbolic link contains the pathname of the file you wish to create a link to. Symbolic links can tie into any file in the file structure; they are not limited to files within a file system. Symbolic links may also refer to directories as well as individual files. To create a symbolic link to a file within the same directory, type: ln -s originalFile linkName where originalFile is the file that you want to link to and linkName is the link to that file. To create a link in a directory other than that of the original file, type: ln -s originalFile differentDirectoryName/linkName If you create a link within the same directory as the original file, you cannot give it the same name as the original file. There is no restriction on a file's additional names outside of its own directory. Links do not change anything about a file, no matter what the link is named. If someone makes a link to one of your files, and you then delete that file, that link will no longer point to anything and may cause problems for the other user. NOTE: You should always use symbolic links when linking to files owned by others!
Printing Files
To print a file, use the lpr command: lpr filename or lpr [-Pprintername] filename (for laser printers only) To get a list of the printers available to your machine, type: lprloc lprloc lists all of the printers that your system knows about, by name, along with their type and location. To get some status information on the printers, use the command lpstat -p. Printer accounting information is available by running the command pacinfo. For more information, see the man pages on lpq, lpr, and lprm.
Directories
About UNIX Directories
UNIX directories are similar to regular files; they both have names and both contain information. Directories, however, contain other files and directories. Many of the same rules and commands that apply to files also apply to directories. All files and directories in the UNIX system are stored in a hierarchical tree structure. Envision it as an upside-down tree, as in the figure below.
FIGURE 2. UNIX Directory Structure
At the top of the tree is the root directory. Its directory name is simply / (a slash character). Below the root directory is a set of major subdirectories that usually include bin, dev, etc, lib, pub, tmp, and usr. For example, the /bin directory is a subdirectory, or child, of / (the root directory). The root directory, in this case, is also the parent directory of the bin directory. Each path leading down, away from the root, ends in a file or directory. Other paths can branch out from directories, but not from files.
Displaying Directories
When you initially log in, the UNIX system places you in your home directory. The pwd command will display the full pathname of the current directory you are in. pwd /home/userid By typing the ls -a command, you can see every file and directory in the current directory, regardless of whether it is your home directory. To display the contents of your home directory when it is not your current directory, enter the ls command followed by the full pathname of your home directory. ls /home/userid If you are using a shell other than the Bourne shell, instead of typing the full pathname for your directory, you can also use the tilde symbol with the ls command to display the contents of your home directory. ls ~
To help you distinguish between files and directories in a listing, the ls command has a -F option, which appends a distinguishing mark to the entry name showing the kind of data it contains: no mark for regular files; / for directories; @ for links; * for executable programs: ls -F ~
Changing Directories
To change your current directory to another directory in the directory tree, use the cd command. For example, to move from your home directory to your projects directory, type:
cd projects (relative pathname from home directory) or, cd ~/projects (full pathname using ~) or, cd /home/userid/projects (full pathname) Using pwd will show you your new current directory. pwd /home/userid/projects To get back to the parent directory of projects, you can use the special .. directory abbreviation. cd .. pwd /home/userid If you get lost, issuing the cd command without any arguments will place you in your home directory. It is equivalent to cd ~, but also works in the Bourne shell.
Renaming Directories
You can rename an existing directory with the mv command: mv oldDirectory newDirectory The new directory name must not exist before you use the command. The new directory need not be in the current directory. You can move a directory anywhere within a file system.
Removing Directories
To remove a directory, first be sure that you are in the parent of that directory. Then use the command rmdir along with the directory's name. You cannot remove a directory with rmdir unless all the files and subdirectories contained in it have been erased. This prevents you from accidentally erasing important subdirectories. You could erase all the files in a directory by first going to that directory (use cd) and then using rm to remove all the files in that directory. The quickest way to remove a directory and all of its files and subdirectories (and their contents) is to use the rm -r (for recursive) command along with the directory's name. For example, to empty and remove your projects directory, move to that directory's parent, then type: rm -r projects (remove the directory and its contents)
(owner), g for group, o for others, or some combination thereof (a (all) has the same effect as ugo), how they are to be changed (+ adds a permission, - removes a permission, and = sets the specified permissions, removing the other ones) and which permission to add or remove (r for read, w for write, and x for execute). For example, to remove all the permissions from myfile: chmod a-rwx myfile ls -l myfile ---------- 1 owner 588 Jul 15 14:41 myfile (Note: chmod a= myfile achieves the same effect.) To allow read and write permissions for all users: chmod ugo+rw myfile ls -l myfile -rw-rw-rw- 1 owner 588 Jul 15 14:42 myfile To remove write permission for your group and other users: chmod go-w myfile ls -l myfile -rw-r--r-- 1 owner 588 Jul 15 14:42 myfile Finally, to allow only read permission to all users: chmod a=r myfile ls -l myfile -r--r--r-- 1 owner 588 Jul 15 14:43 myfile Now the file is protected by allowing only read access; it cannot be written to or executed by anyone, including you. Protecting a file against writing by its owner is a safeguard against accidental overwriting, although not against accidental deletion. chmod will also accept a permission setting expressed as a 3-digit octal number. To determine this octal number, you first write a 1 if the permission is to be set and a 0 otherwise. This produces a binary number which can be converted into octal by grouping the digits in threes and replacing each group by the corresponding octal digit according to the table below.

TABLE 2. Symbolic to Octal Conversions
Symbolic Binary Octal
--- 000 0
--x 001 1
-w- 010 2
-wx 011 3
r-- 100 4
r-x 101 5
rw- 110 6
rwx 111 7
Thus, if the setting you want is rw-r--r--, determine the octal number with the following method:

symbolic  rw-  r--  r--
binary    110  100  100
octal      6    4    4

This shows that the octal equivalent of rw-r--r-- is 644. The following example illustrates that the permissions for myfile have been reset to the values with which we began. chmod 644 myfile ls -l myfile -rw-r--r-- 1 owner 588 Jul 15 14:44 myfile To change the permissions back to read only, you can execute chmod as follows: chmod 444 myfile ls -l myfile -r--r--r-- 1 owner 588 Jul 15 14:45 myfile As with files, directories may also have permissions assigned. When listing directories, you may use the -d option to keep from descending into the directories you list. Otherwise, the contents of the directories will be displayed as well as their names. Below is an example of permissions assigned to a directory: ls -lgd home drwxrwxr-x 1 owner caam223 588 Jul 15 9:45 home The directory and the files and directories under it may be read and executed by anyone, but written to only by the owner and users in the caam223 group. Assuming you are the owner of this directory, you may decide to change the permission to allow only yourself and the caam223 group to read and execute files in the home directory. You would set the permissions accordingly: chmod o-rx home ls -lgd home drwxrwx--- 1 owner caam223 588 Jul 15 9:46 home You may decide that only you should be able to alter the contents of the directory. You must remove the write permission for the group: chmod 750 home ls -lgd home drwxr-x--- 1 owner caam223 588 Jul 15 9:48 home An alternative to the previous command is chmod g-w.

When you create a file the system gives it a default set of permissions. These are controlled by the system administrator and will vary from installation to installation. If you would like to change the default which is in effect for you, choose your own with the umask command. Note that the permission specified by the umask setting will be applied to the file, unlike that specified in the chmod command, which normally only adds or deletes (few people use the = operator to chmod). First, issue the command without arguments to cause the current settings to be echoed as an octal number: umask 022 If you convert these digits to binary, you will obtain a bit pattern of 1s and 0s. A 1 indicates that the corresponding permission is to be turned off, a 0, that it is to be turned on. (Notice that the bit patterns for chmod and umask are reversed.) Hence, the mask output above is 000010010, which produces a permission setting of rw-r--r-- (i.e., write permission is turned off for group and other). Newly created files always have the execution bit turned off. Suppose you decide that the default setting you prefer is rwxr-x---. This corresponds to the masking bit pattern 000010111, so the required mask is 026: umask 026 Now, if you create a new file during this session, the permissions assigned to the file will be the ones allowed by the mask value.
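The grouping-in-threes conversion is easy to express in C. A small sketch (a hypothetical helper, not a system utility):

    #include <stdio.h>

    /* Convert a symbolic permission string such as "rw-r--r--"
       to its octal value, by the grouping-in-threes rule above:
       each non-dash is a 1 bit, each dash a 0 bit. */
    int to_octal(const char *sym)
    {
        int mode = 0;
        for (int i = 0; i < 9; i++)
            mode = (mode << 1) | (sym[i] != '-');
        return mode;
    }

    int main(void)
    {
        printf("%o\n", to_octal("rw-r--r--"));  /* prints 644 */
        printf("%o\n", to_octal("rwxr-x---"));  /* prints 750 */
        return 0;
    }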
Wildcard Characters
Using wildcard characters that allow you to copy, list, move, remove, etc. items with similar names is a great help in manipulating files and directories. 1. The symbol ? will match any single character in that position in the file name. 2. The symbol * will match zero or more characters in the name. 3. Characters enclosed in brackets [ and ] will match any one of the given characters in the given position in the name. A consecutive sequence of characters can be designated by [char-char]. Examples of each follow: 1. ?ab2 would match a name that starts with any single character and ends with ab2. ?ab? would match all names that begin and end with any character and have ab in between. 2. ab* would match all names that start with ab, including ab itself. a*b would match all names that start with a and end with b, including ab. 3. s[aqz] would match sa, sq, and sz. s[2-7] would match s2, s3, s4, s5, s6 and s7. These wildcard symbols help in dealing with groups of files, but you should remember that the instruction:
rm * would erase all files in your current directory (although by default, you would be prompted to okay each deletion). The wildcard * should be used carefully.
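The same wildcard rules are available to C programs through the POSIX fnmatch(3) routine, which the shell's matching resembles. A short demonstration using the examples above:

    #include <stdio.h>
    #include <fnmatch.h>

    int main(void)
    {
        const char *names[] = { "ab2", "cab2", "sa", "s5", "ab" };
        const char *pats[]  = { "?ab2", "ab*", "s[aqz]", "s[2-7]" };

        /* Print every (pattern, name) pair that matches. */
        for (int p = 0; p < 4; p++)
            for (int n = 0; n < 5; n++)
                if (fnmatch(pats[p], names[n], 0) == 0)
                    printf("%-8s matches %s\n", pats[p], names[n]);
        return 0;
    }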
Processes
Every command or program running under UNIX is called a process. A sequence of related processes is called a job. Your applications and even your shell itself are processes. The windowing system is also a process, or a collection of processes. The UNIX kernel manages the processes on the system, usually without distinguishing among them. UNIX is a multi-tasking system: it allows you to continue to work in the foreground while running one or more jobs in the background. It also runs the processes of many users simultaneously. You could even log off and come back later if the background jobs do not require interaction with you.
If you have forgotten the job number, type the command jobs to see a list of the jobs that are running in the background at the moment. Note: The rules imposed by system administrators about where and how to run background jobs vary from network to network and change over time. It is important to stay current with the background job policy of your network.
Remote Login
Sometimes, while you are logged into one workstation, you will find that you would like to be logged in to another workstation, file server, or other UNIX system. The command rlogin allows you to do so provided that you have an account on the other system. Type: rlogin newSystem You may then have to supply your password. You should also get the messages about logging in that are used on newSystem. If your userid is different on newSystem you will have to use the form: rlogin newSystem -l userid
& operator: executes a command as a background process.
banner: prints the specified string in large letters. Each argument may be up to 10 characters long.
break: used to break out of a loop. It does not exit from the program.
cal: produces a calendar of the current month as standard output. The month (1-12) and year (1-9999) must be specified in full numeric format: cal [[month] year]
calendar: displays the contents of the calendar file.
case operator: used to validate multiple conditions.
case $string in
pattern1) command list;;
pattern2) command list;;
pattern3) command list;;
esac
cat (for concatenate): used to display the contents of a file. Used without arguments it takes input from standard input; <Ctrl-d> is used to terminate input. cat [filename(s)] cat > [filename] Data can be appended to a file using >>. Some of the available options are: cat [-options] filename(s)
-s silent about files that cannot be accessed
-v enables display of non-printing characters (except tabs, new lines, form-feeds)
-t when used with -v, causes tabs to be printed as ^I
-e when used with -v, causes a $ to be printed at the end of each line
The -t and -e options are ignored if the -v option is not specified.
cd: used to change directories.
chgrp: changes the group that owns a file. chgrp [group id] [filename]
chmod: allows file permissions to be changed for each user. File permissions can be changed only by the owner(s). chmod [+/-][rwx] [ugo] [filename]
chown: used to change the owner of a file. The command takes a file(s) as source files and the login id of another user as the target. chown [user-id] [filename]
cmp: compares two files (text or binary) byte-by-byte and displays the first occurrence where the files differ. cmp [filename1] [filename2] (-l gives a long listing)
comm: compares two sorted files and displays the instances that are common. The display is separated into 3 columns: the first displays what occurs in the first file but not in the second; the second displays what occurs in the second file but not in the first; the third displays what is common to both files. comm filename1 filename2
continue statement: the rest of the commands in the loop are ignored. It moves out of the loop and moves on to the next cycle.
cp: the cp (copy) command is used to copy a file. cp [filename1] [filename2]
cpio (copy input/output): utility program used to take backups. cpio operates in three modes:
-o output
-i input
-p pass
creat(): this system call creates a new file or prepares to rewrite an existing file. The file pointer is set to the beginning of the file.
#include <sys/types.h>
#include <sys/stat.h>
int creat(path, mode)
char *path;
int mode;
cut: used to cut out parts of a file. It takes filenames as command line arguments or input from standard input. The command can cut columns as well as fields in a file. It does not, however, delete the selected parts of the file. cut [-cf] [column/field] filename, e.g. cut -d: -f1,2,3 filename, where -d specifies the delimiter (here :).
df: used to find the number of free blocks available for all the mounted file systems. # /etc/df [filesystem]
diff: compares text files. It gives an index of all the lines that differ in the two files along with the line numbers, and displays what needs to be changed. diff filename1 filename2
echo: echoes arguments on the command line. echo [arguments]
env: displays the permanent environment variables associated with a user's login id.
exit command: used to stop the execution of a shell script.
expr command: used for numeric computation. The operators + (add), - (subtract), * (multiply), / (divide), % (remainder) are allowed. Calculations are performed in order of normal numeric precedence.
find: searches through directories for files that match the specified criteria. It can take full pathnames and relative pathnames on the command line. To display the output on screen, the -print option must be specified.
for operator: may be used in looping constructs where there is repetitive execution of a section of the shell program. for var in val1 val2 val3 val4; do commands; done
fsck: used to check the file system and repair damaged files. The command takes a device name as an argument: # /etc/fsck /dev/file-system-to-be-checked
grave operator (`): used to store the standard output of a command in an environment variable.
grep: the grep (global regular expression and print) command can be used as a filter to search for strings in files. The pattern may be either a fixed character string or a regular expression. grep string filename(s)
HOME: the user's home directory.
if operator: allows conditional execution: if expression; then commands; fi
if-then-elif-fi: if expression; then commands; elif expression; then commands; fi
kill: used to stop background processes.
ln: used to link files. A duplicate of a file is created with another name.
LOGNAME - displays the user's login name.
ls - lists the files in the current directory. Some of the available options are: -l gives a long listing; -a displays all files (including hidden files).
lp - used to print data on the line printer. lp [options] filename(s)
mesg - controls messages received on a terminal: -n does not allow messages to be displayed on screen; -y allows messages to be displayed on screen.
mkdir - used to create directories.
more - used to display data one screenful at a time. more [filename]
mv - mv (move) moves a file from one directory to another or simply renames it. The command takes filenames and pathnames as source names and a filename or existing directory as the target. mv [source-file] [target-file]
news - allows a user to read news items published by the system administrator.
nl - displays the contents of a file with line numbers.
passwd - changes the password.
paste - joins lines from two files and displays the output. It can take a number of filenames as command line arguments. paste file1 file2
PATH - the directories that the system searches to find commands.
pg - used to display data one page (screenful) at a time. The command can take a number of filenames as arguments. pg [option] [filename1] [filename2]...
pipe operator (|) - takes the output of one command as the input of another command.
ps - gives information about all the active processes.
PS1 - the system prompt.
pwd (print working directory) - displays the current directory.
rm - the rm (remove) command is used to delete files from a directory. A number of files may be deleted simultaneously. A file once deleted cannot be retrieved. rm [filename1] [filename2]
shift command - using shift, the positional parameters are moved down: $2 is shifted to $1, $3 to $2, and so on, and the old $1 is discarded.
sleep - used to suspend the execution of a shell script for the specified time, usually given in seconds.
sort - a utility program that can be used to sort text files in numeric or alphabetical order. sort [filename]
split - used to split a large file into smaller files. split -n filename. split can take a second filename on the command line.
su - used to switch to the superuser or any other user.
sync - used to copy data in buffers to files.
system() - used to run a UNIX command from within a C program.
tail - used to view the end of a file. tail [filename]
tar - used to save and restore files to tapes or other removable media. tar [function[modifier]] [filename(s)]
tee - allows output that is being redirected to a file to be viewed on standard output as well.
test command - compares strings and numeric values. The test command has two forms: the test command itself,

if test ${variable} = value
then
    commands
else
    commands
fi

and a form using the special operators [ ]. These must be surrounded by spaces so that they are not interpreted by the shell as wildcard characters:

if [ -f ${variable} ]
then
    commands
elif [ -d ${variable} ]
then
    commands
else
    commands
fi

Many different tests are possible: for files, for comparing numbers and character strings, and for values of environment variables.
time - used to display the execution time of a program or a command. Time is reported in seconds. time filename
tr - used to translate characters. tr [-option] [string1 [string2]]
tty - displays the terminal pathname.
umask - used to specify default permissions while creating files.
uniq - used to display the unique lines in a sorted file: sort filename | uniq
until - executes the commands within a loop as long as the test condition is false.
wall - used to send a message to all users logged in: # /etc/wall message
wait - halts the execution of a script until all child processes, executed as background processes, are completed.
wc - used to count the number of lines, words and characters in a file. wc [options] [filename(s)]. The available options are -l (lines), -w (words) and -c (characters).
while operator - repeatedly performs an operation until the test condition proves false: while ...; do commands; done
who - displays information about all the users currently logged onto the system: the user name, terminal number, and the date and time that each user logged on. The syntax is who [options].
write - allows inter-user communication. A user can send messages by addressing the other user's terminal or login id. write user-name [terminal number]. A short example session using several of these commands follows.
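To make the entries above concrete, here is a brief example session; the file names are invented for illustration:

chmod u+x report.sh           # give the owner execute permission
cut -d: -f1,3 /etc/passwd     # print fields 1 and 3, colon-delimited
cmp notes.txt notes.bak       # show where the two files first differ
sort names.txt | uniq         # sort, then drop adjacent duplicate lines
wc -l names.txt               # count the lines in a file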
TTY Terminal
If you are using a TTY terminal (a TTY is line-at-a-time oriented as opposed to page oriented) and the screen is blank, you only need to press RETURN and a login prompt should appear on the screen.
Workstation
If the display features a box in the center of the screen with text similar to that in the figure below, then you are using a workstation that is configured to run a windowing system called the X Window system. These machines are called X terminals. (For more information on the X Window system, see the Information Technology document, UNIX 2, Introduction to the X Window System.) If the screen is entirely black, then a screen-saving program is running automatically to protect the monitor from damage. Moving the mouse or pressing the RETURN key should wake up the display. (If you see the words This screen has been locked... then someone else is using the workstation, but they are temporarily away from their seat. Look for an unoccupied machine.) Move the mouse until the cursor (a black X) is on top of the white box.
The system will then display the command prompt. The prompt signals that the system is ready for you to enter your next command. The command prompt consists of the name of the workstation followed by a percent sign (%), e.g. chub.owlnet.rice.edu%. Once you finish typing a command, you must always press RETURN to execute it.
Logging Out
Workstations and TTY Terminals
To end a work session, you must explicitly log out of a UNIX session. To do this, type logout at the command prompt. Once you have logged out, the system will either display the login prompt again or begin executing a screen saver program. You should never turn a workstation off. Turning off a terminal does not necessarily log you out. If you are having trouble logging out, see the section, Troubleshooting.
X Terminals
To log out of the X Window system from an X terminal, move the cursor into the console window (it is labeled console), type the command exit, and press RETURN. If you try to use the logout command in the console window, you will receive the message, Not in login shell.
UNIX Commands
The UNIX Shell
Once you are logged in, you are ready to start using UNIX. As mentioned earlier, you interact with the system through a command interpreter program called the shell. Most UNIX systems have two different shells, although you will only use one or the other almost all of the time. The shell you will find on Information Technology supported networks is the C shell. It is called the C shell because it has syntax and constructs similar to those in the C programming language. The C shell command prompt often includes the name of the computer that you are using and usually ends with a special character, most often the percent sign (%). Another common shell is the Bourne shell, named for its author. The default prompt for the Bourne shell is the dollar sign ($). (If the prompt is neither one of these, a quick way to check which shell you are using is to type the C shell command alias: if a list appears, you are using the C shell; if the message "Command not found" appears, you are using the Bourne shell.) Modified versions of these shells are also available. The TC shell (tcsh) is the C shell with file name completion and command line editing (default prompt: >). The GNU Bourne-Again shell (bash) is basically the Bourne shell with the same features added (default prompt: bash$). In addition to processing your command requests, UNIX shells have their own syntax and control constructs. You can use these shell commands to make your processing more efficient, or to automate repetitive tasks. You can even store a sequence of shell commands in a file, called a shell script, and run it just like an ordinary program. Writing shell scripts is a topic discussed in the class notes for the UNIX III Scripts Short Course.
data >> file

Another redirection is <, which tells the command to take its input from a file rather than from the keyboard. For example, if you have a program that requires data input from the keyboard, you may find that you have to type the same data a large number of times in the debugging stage of program development. If you put that data in a file and direct the command to read it from there, you will only have to type the data once, when you make the data file.

program < datafile

If you do this, you would see the same response from program as if you had typed the data in from the keyboard when requested. You can also combine both kinds of redirection, as in

program < datafile > outputfile

The data in the file datafile will then be used as input for program and all output will be stored in outputfile. If you want to accumulate output from different sources in a single file, the symbol >> directs output to be appended to the end of a file rather than replacing the previous contents (if any), which the single > redirection will do. A final I/O redirection is the pipe symbol, |. The | tells the computer to take the output created by the command to the left of it and use that as the input for the command on the right. For example, we could type:

date | program

This would use the output of the date command as input to another program. NOTE: Many, but not all, interactive programs accept input from a file.
Creating a script
There are a lot of different shells available for Linux, but usually the bash (Bourne Again SHell) is used for shell programming, as it is available for free and is easy to use. So all the scripts we write in this article use bash (but will most of the time also run with its older sister, the Bourne shell). For writing our shell programs we can use any kind of text editor, e.g. nedit, kedit, emacs, vi... as with other programming languages. The program must start with the following line (it must be the first line in the file):
#!/bin/sh
The #! characters tell the system that the first argument that follows on the line is the program to be used to execute this file. In this case /bin/sh is the shell we use. When you have written your script and saved it, you have to make it executable to be able to use it. To make a script executable type:

chmod +x filename

Then you can start your script by typing:

./filename
Comments
Comments in shell programming start with # and go until the end of the line. We strongly recommend that you use comments. If you have comments and you don't use a certain script for some time, you will still know immediately what it is doing and how it works.
Variables
As in other programming languages you can't live without variables. In shell programming all variables have the datatype string and you do not need to declare them. To assign a value to a variable you write:
varname=value
To get the value back you just put a dollar sign in front of the variable:
#!/bin/sh
# assign a value:
a="hello world"
# now print the content of "a":
echo "A is:"
echo $a
Type these lines into your text editor and save the file, e.g. as "first". Then make the script executable by typing chmod +x first in the shell, and start it by typing ./first. The script will just print:
A is:
hello world
Sometimes it is possible to confuse variable names with the rest of the text:
num=2
echo "this is the $numnd"
This will not print "this is the 2nd" but "this is the " because the shell searches for a variable called numnd which has no value. To tell the shell that we mean the variable num we have to use curly braces:
num=2
echo "this is the ${num}nd"
This prints what you want: this is the 2nd. There are a number of variables that are always automatically set; we will discuss them further down, when we use them for the first time. If you need to handle mathematical expressions, then you need to use programs such as expr (see the table below). Besides the normal shell variables, which are only valid within the shell program, there are also environment variables. A variable preceded by the keyword export is an environment variable. We will not talk about them here any further, since they are normally only used in login scripts.
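Although environment variables are not covered further here, a minimal sketch of the difference (the variable names are invented):

#!/bin/sh
greeting="hello"       # ordinary shell variable: visible only in this script
MYEDITOR=vi            # to make a variable visible to child processes...
export MYEDITOR        # ...export it as an environment variable
echo "$greeting, your editor is $MYEDITOR"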
read var - prompt the user for input and write it into a variable (var)
sort file.txt - sort the lines in file.txt
uniq - remove duplicate lines; used in combination with sort, since uniq removes only duplicated consecutive lines. Example: sort file.txt | uniq
expr - do math in the shell. Example (add 2 and 3): expr 2 "+" 3
find - search for files. Example (search by name): find . -name filename -print. This command has many different possibilities and options; unfortunately too many to explain them all in this article.
tee - write data to stdout (your screen) and to a file. Normally used like this: somecommand | tee outfile. It writes the output of somecommand to the screen and to the file outfile.
basename file - return just the file name of a given name and strip the directory path. Example: basename /bin/tux returns just tux
dirname file - return just the directory name of a given name and strip the actual file name. Example: dirname /bin/tux returns just /bin
head - print some lines from the beginning of a file
tail - print some lines from the end of a file
sed - sed is basically a find and replace program. It reads text from standard input (e.g. from a pipe) and writes the result to stdout (normally the screen). The search pattern is a regular expression (see references). This search pattern should not be confused with shell wildcard syntax. To replace the string linuxfocus with LinuxFocus in a text file, use: cat text.file | sed 's/linuxfocus/LinuxFocus/' > newtext.file. This replaces the first occurrence of the string linuxfocus in each line with LinuxFocus. If there are lines where linuxfocus appears several times and you want to replace all of them, use: cat text.file | sed 's/linuxfocus/LinuxFocus/g' > newtext.file
awk - most of the time awk is used to extract fields from a text line. The default field separator is space. To specify a different one, use the option -F:

cat file.txt | awk -F, '{print $1 "," $3 }'
Here we use the comma (,) as field separator and print the first and third ($1 $3) columns. If file.txt has lines like:
Adam Bor, 34, India
Kerry Miller, 22, USA
then the command prints "Adam Bor, India" and "Kerry Miller, USA". There is much more you can do with awk, but this is a very common use.
2) Concepts: Pipes, redirection and backtick

These are not really commands, but they are very important concepts.

pipes (|): a pipe sends the output (stdout) of one program to the input (stdin) of another program.
grep "hello" file.txt | wc -l
finds the lines with the string hello in file.txt and then counts those lines. The output of the grep command is used as input for the wc command. You can concatenate as many commands as you like in this way (within reasonable limits).

redirection: writes the output of a command to a file or appends data to a file. > writes output to a file and overwrites the old file in case it exists. >> appends data to a file (or creates a new one if it doesn't exist already, but it never overwrites anything).

Backtick: the output of a command can be used as command line arguments (not stdin as above; command line arguments are any strings that you specify behind the command, such as file names and options) for another command. You can also use it to assign the output of a command to a variable. The command
find . -mtime -1 -type f -print
finds all files that have been modified within the last 24 hours (-mtime -2 would be 48 hours). If you want to pack all these files into a tar archive (file.tar) the syntax for tar would be:
tar cvf file.tar infile1 infile2 ...
Instead of typing it all in you can combine the two commands (find and tar) using backticks. Tar will then pack all the files that find has printed:
#!/bin/sh
# The ticks are backticks (`) not normal quotes ('):
tar -zcvf lastmod.tar.gz `find . -mtime -1 -type f -print`
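As mentioned, backticks can also assign the output of a command to a variable; a small sketch:

today=`date`
echo "Today is: $today"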
3) Control structures The "if" statement tests if the condition is true (exit status is 0, success). If it is the "then" part gets executed:
if ....; then
    ....
elif ....; then
    ....
else
    ....
fi
Most of the time a very special command called test is used inside if-statements. It can be used to compare strings or test if a file exists, is readable etc... The "test" command is written as square brackets " [ ] ". Note that space is significant here: Make sure that you always have space around the brackets. Examples:
[ -f "somefile" ]    : Test if somefile is a file.
[ -x "/bin/ls" ]     : Test if /bin/ls exists and is executable.
[ -n "$var" ]        : Test if the variable $var contains something.
[ "$a" = "$b" ]      : Test if the variables "$a" and "$b" are equal.
Run the command "man test" and you get a long list of all kinds of test operators for comparisons and files. Using this in a shell script is straight forward:
#!/bin/sh if [ "$SHELL" = "/bin/bash" ]; then echo "your login shell is the bash (bourne again shell)" else echo "your login shell is not bash but $SHELL" fi
The variable $SHELL contains the name of the login shell, and this is what we are testing here by comparing it against the string "/bin/bash".

Shortcut operators

People familiar with C will welcome the following expression:
[ -f "/etc/shadow" ] && echo "This computer uses shadow passwors"
The && can be used as a short if-statement. The right side gets executed if the left is true. You can read this as AND. Thus the example is: "The file /etc/shadow exists AND the command echo is executed". The OR operator (||) is available as well. Here is an example:
#!/bin/sh
mailfolder=/var/spool/mail/james
[ -r "$mailfolder" ] || { echo "Can not read $mailfolder" ; exit 1; }
echo "$mailfolder has mail from:"
grep "^From " $mailfolder
The script first tests whether it can read a given mailfolder. If it can, it prints the "From" lines in the folder. If it cannot read the file $mailfolder, the OR operator takes effect. In plain English you read this code as "mailfolder readable or exit program". The problem here is that you must have exactly one command behind the OR, but we need two: print an error message, and exit the program. To handle them as one command we group them together in an anonymous function using curly braces. Functions in general are explained further down. You can do everything without the ANDs and ORs using just if-statements, but sometimes the shortcuts AND and OR are just more convenient. The case statement can be used to match (using shell wildcards such as * and ?) a given string against a number of possibilities.
case ... in
...) do something here;;
esac
Let's look at an example. The command file can test what kind of filetype a given file is:
file lf.gz

returns:

lf.gz: gzip compressed data, deflated, original filename, last modified: Mon Aug 27 23:09:18 2001, os: Unix
We use this now to write a script called smartzip that can uncompress bzip2, gzip and zip compressed files automatically:
#!/bin/sh
ftype=`file "$1"`
case "$ftype" in
"$1: Zip archive"*)
    unzip "$1" ;;
"$1: gzip compressed"*)
    gunzip "$1" ;;
"$1: bzip2 compressed"*)
    bunzip2 "$1" ;;
*) error "File $1 can not be uncompressed with smartzip";;
esac
Here you notice that we use a new special variable called $1. This variable contains the first argument given to a program. Say we run "smartzip articles.zip"; then $1 will contain the string articles.zip. The select statement is a bash-specific extension and is very good for interactive use. The user can select a choice from a list of different values:
select var in ... ; do
    break
done
.... now $var can be used ....
Here is an example:
#!/bin/sh echo "What is your favourite OS?" select var in "Linux" "Gnu Hurd" "Free BSD" "Other"; do break done echo "You have selected $var"
The while-loop will run while the expression that we test for is true. The keyword "break" can be used to leave the loop at any point in time. With the keyword "continue" the loop continues with
the next iteration and skips the rest of the loop body. The for-loop takes a list of strings (strings separated by space) and assigns them to a variable:
for var in ....; do
    ....
done
A more useful example script, called showrpm, prints a summary of the content of a number of RPM-packages:
#!/bin/sh
# list a content summary of a number of RPM packages
# USAGE: showrpm rpmfile1 rpmfile2 ...
# EXAMPLE: showrpm /cdrom/RedHat/RPMS/*.rpm
for rpmpackage in $*; do
    if [ -r "$rpmpackage" ];then
        echo "=============== $rpmpackage =============="
        rpm -qi -p $rpmpackage
    else
        echo "ERROR: cannot read file $rpmpackage"
    fi
done
Above you can see the next special variable, $*, which contains all the command line arguments. If you run showrpm openssh.rpm w3m.rpm webgrep.rpm then $* contains the 3 strings openssh.rpm, w3m.rpm and webgrep.rpm. The GNU bash knows until-loops as well, but generally while and for loops are sufficient.

Quoting

Before passing any arguments to a program the shell tries to expand wildcards and variables. To expand means that the wildcard (e.g. *) is replaced by the appropriate file names or that a variable is replaced by its value. To change this behaviour you can use quotes. Let's say we have a number of files in the current directory. Two of them are jpg-files, mail.jpg and tux.jpg.
#!/bin/sh
echo *.jpg
This will print "mail.jpg tux.jpg". Quotes (single and double) will prevent this wildcard expansion:
#!/bin/sh echo "*.jpg" echo '*.jpg'
This will print "*.jpg" twice. Single quotes are most strict. They prevent even variable expansion. Double quotes prevent wildcard expansion but allow variable expansion:
#!/bin/sh
echo $SHELL
echo "$SHELL"
echo '$SHELL'
Finally, there is the possibility of taking away the special meaning of any single character by preceding it with a backslash:
echo \*.jpg
echo \$SHELL
Here documents

A here document is a nice way to send several lines of text to a command. It is quite useful for writing a help text in a script without having to put echo in front of each line. A "here document" starts with << followed by some string that must also appear at the end of the here document. Here is an example script, called ren, that renames multiple files and uses a here document for its help text:
#!/bin/sh
# we have less than 3 arguments. Print the help text:
if [ $# -lt 3 ] ; then
cat <<HELP
ren -- renames a number of files using sed regular expressions
USAGE: ren 'regexp' 'replacement' files...
EXAMPLE: rename all *.HTM files to *.html:
  ren 'HTM$' 'html' *.HTM
HELP
exit 0
fi
OLD="$1"
NEW="$2"
# The shift command removes one argument from the list of
# command line arguments.
shift
shift
# $* contains now all the files:
for file in $*; do
    if [ -f "$file" ] ; then
        newfile=`echo "$file" | sed "s/${OLD}/${NEW}/g"`
        if [ -f "$newfile" ]; then
            echo "ERROR: $newfile exists already"
        else
            echo "renaming $file to $newfile ..."
            mv "$file" "$newfile"
        fi
    fi
done
This is the most complex script so far. Let's discuss it a little bit. The first if-statement tests if we have provided at least 3 command line parameters. (The special variable $# contains the number of arguments.) If not, the help text is sent to the command cat which in turn sends it to the screen. After printing the help text we exit the program. If there are 3 or more arguments we
assign the first argument to the variable OLD and the second to the variable NEW. Next we shift the command line parameters twice to get the third argument into the first position of $*. With $* we enter the for loop. Each of the arguments in $* is now assigned one by one to the variable $file. Here we first test that the file really exists and then construct the new file name by using find and replace with sed. The backticks are used to assign the result to the variable newfile. Now we have all we need: the old file name and the new one. This is then used with the command mv to rename the files.

Functions

As soon as you have a more complex program you will find that you use the same code in several places, and also find it helpful to give it some structure. A function looks like this:
functionname()
{
    # inside the body $1 is the first argument given to the function
    # $2 the second ...
    body
}
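For instance, a minimal sketch of defining and calling a function with one argument (the function name and message are invented):

#!/bin/sh
greet()
{
    # $1 is the first argument given to the function
    echo "Hello, $1!"
}
greet "world"    # prints: Hello, world!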
You need to "declare" functions at the beginning of the script before you use them. Here is a script called xtitlebar which you can use to change the name of a terminal window. If you have several of them open it is easier to find them. The script sends an escape sequence which is interpreted by the terminal and causes it to change the name in the titlebar. The script uses a function called help. As you can see the function is defined once and then used twice:
#!/bin/sh
# vim: set sw=4 ts=4 et:
help()
{
cat <<HELP
xtitlebar -- change the name of an xterm, gnome-terminal or kde konsole
USAGE: xtitlebar [-h] "string_for_titlebar"
OPTIONS: -h help text
EXAMPLE: xtitlebar "cvs"
HELP
exit 0
}
# in case of error or if -h is given we call the function help:
[ -z "$1" ] && help
[ "$1" = "-h" ] && help
# send the escape sequence to change the xterm titlebar:
echo -e "\033]0;$1\007"
#
It's a good habit to always have extensive help inside your scripts. This makes it possible for others (and you) to use and understand the script.

Command line arguments
We have seen that $* and $1, $2 ... $9 contain the arguments that the user specified on the command line (the strings written behind the program name). So far we have had only a very simple command line syntax (a couple of mandatory arguments and the option -h for help), but soon you will discover that you need some kind of parser for more complex programs where you define your own options. The convention is that all optional parameters are preceded by a minus sign and must come before any other arguments (such as file names). There are many possibilities to implement a parser. The following while loop combined with a case statement is a very good solution for a generic parser:
#!/bin/sh
help()
{
cat <<HELP
This is a generic command line parser demo.
USAGE EXAMPLE: cmdparser -l hello -f -- -somefile1 somefile2
HELP
exit 0
}

while [ -n "$1" ]; do
case $1 in
-h) help;shift 1;;     # function help is called
-f) opt_f=1;shift 1;;  # variable opt_f is set
-l) opt_l=$2;shift 2;; # -l takes an argument -> shift by 2
--) shift;break;;      # end of options
-*) echo "error: no such option $1. -h for help";exit 1;;
*) break;;
esac
done

echo "opt_f is $opt_f"
echo "opt_l is $opt_l"
echo "first arg is $1"
echo "2nd arg is $2"
It produces
opt_f is 1
opt_l is hello
first arg is -somefile1
2nd arg is somefile2
How does it work? Basically it loops through all the arguments and matches them against the case statement. If it finds a matching one, it sets a variable and shifts the command line by one. The unix convention is that options (things starting with a minus) must come first. You may indicate that this is the end of the options by writing two minus signs (--). You need this, for example, with grep to search for a string starting with a minus sign:
Search for -xx- in file f.txt: grep -- -xx- f.txt
Our option parser can handle the -- too as you can see in the listing above.
Examples
A general purpose skeleton

Now we have discussed almost all the components that you need to write a script. All good scripts should have help, and you can just as well include the generic option parser even if the script has only one option. Therefore it is a good idea to have a dummy script, called framework.sh, which you can use as a framework for other scripts. If you want to write a new script you just make a copy:
cp framework.sh myscript
and then insert the actual functionality into "myscript". Let's now look at two more examples.

A binary to decimal number converter

The script b2d converts a binary number (e.g. 1101) into its decimal equivalent. It is an example that shows that you can do simple mathematics with expr:
#!/bin/sh
# vim: set sw=4 ts=4 et:
help()
{
cat <<HELP
b2d -- convert binary to decimal
USAGE: b2d [-h] binarynum
OPTIONS: -h help text
EXAMPLE: b2d 111010
will return 58
HELP
exit 0
}
error()
{
    # print an error and exit
    echo "$1"
    exit 1
}
lastchar()
{
    # return the last character of a string in $rval
    if [ -z "$1" ]; then
        # empty string
        rval=""
        return
    fi
    # wc puts some space behind the output this is why we need sed:
    numofchar=`echo -n "$1" | wc -c | sed 's/ //g' `
    # now cut out the last char
    rval=`echo -n "$1" | cut -b $numofchar`
}
chop()
{
    # remove the last character in string and return it in $rval
    if [ -z "$1" ]; then
        # empty string
        rval=""
        return
    fi
    # wc puts some space behind the output this is why we need sed:
    numofchar=`echo -n "$1" | wc -c | sed 's/ //g' `
    if [ "$numofchar" = "1" ]; then
        # only one char in string
        rval=""
        return
    fi
    numofcharminus1=`expr $numofchar "-" 1`
    # now cut all but the last char:
    rval=`echo -n "$1" | cut -b 1-${numofcharminus1}`
}

while [ -n "$1" ]; do
case $1 in
-h) help;shift 1;; # function help is called
--) shift;break;;  # end of options
-*) error "error: no such option $1. -h for help";;
*) break;;
esac
done

# The main program
sum=0
weight=1
# one arg must be given:
[ -z "$1" ] && help
binnum="$1"
binnumorig="$1"
while [ -n "$binnum" ]; do
    lastchar "$binnum"
    if [ "$rval" = "1" ]; then
        sum=`expr "$weight" "+" "$sum"`
    fi
    # remove the last position in $binnum
    chop "$binnum"
    binnum="$rval"
    weight=`expr "$weight" "*" 2`
done
echo "binary $binnumorig is decimal $sum"
#
The algorithm used in this script takes the decimal weight (1,2,4,8,16,..) of each digit starting from the right most digit and adds it to the sum if the digit is a 1. Thus "10" is: 0*1+1*2=2 To get the digits from the string we use the function lastchar. This uses wc -c to count the number of characters in the string and then cut to cut out the last character. The chop function
has the same logic but removes the last character; that is, it cuts out everything from the beginning to the character before the last one.

A file rotation program

Perhaps you are one of those people who save all outgoing mail to a file. After a couple of months this file becomes rather big and makes access slow if you load it into your mail program. The following script, rotatefile, can help you. It renames the mailfolder (let's call it outmail) to outmail.1; if there was already an outmail.1 then it becomes outmail.2, etc...
#!/bin/sh
# vim: set sw=4 ts=4 et:
ver="0.1"
help()
{
cat <<HELP
rotatefile -- rotate the file name
USAGE: rotatefile [-h] filename
OPTIONS: -h help text
EXAMPLE: rotatefile out
This will e.g. rename out.2 to out.3, out.1 to out.2, out to out.1
and create an empty out-file
The max number is 10
version $ver
HELP
exit 0
}
error()
{
    echo "$1"
    exit 1
}
while [ -n "$1" ]; do
case $1 in
    -h) help;shift 1;;
    --) break;;
    -*) echo "error: no such option $1. -h for help";exit 1;;
    *) break;;
esac
done
# input check:
if [ -z "$1" ] ; then
    error "ERROR: you must specify a file, use -h for help"
fi
filen="$1"
# rename any .1 , .2 etc file:
for n in 9 8 7 6 5 4 3 2 1; do
    if [ -f "$filen.$n" ]; then
        p=`expr $n + 1`
        echo "mv $filen.$n $filen.$p"
        mv $filen.$n $filen.$p
    fi
done
# rename the original file:
if [ -f "$filen" ]; then
    echo "mv $filen $filen.1"
    mv $filen $filen.1
fi
echo touch $filen
touch $filen
How does the program work? After checking that the user provided a filename we go into a for loop counting from 9 to 1. File 9 is now renamed to 10, file 8 to 9 and so on. After the loop we rename the original file to 1 and create an empty file with the name of the original file.
Debugging
The simplest debugging aid is of course the command echo. You can use it to print specific variables around the place where you suspect the mistake. This is probably what most shell programmers use 80% of the time to track down a mistake. The advantage of a shell script is that it does not require any re-compilation, and inserting an "echo" statement is done very quickly. The shell has a real debug mode as well. If there is a mistake in your script "strangescript", then you can debug it like this:
sh -x strangescript
This will execute the script and show all the statements that get executed with the variables and wildcards already expanded. The shell also has a mode to check for syntax errors without actually executing the program. To use this run:
sh -n your_script
If this returns nothing then your program is free of syntax errors. We hope you will now start writing your own shell scripts. Have fun!
pwd
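For illustration, a small script of the kind described below (the comments and messages are invented) might read:

#!/bin/sh
# My first shell script.
# It says who you are and where you are.
echo "Hello `whoami`!\c"
echo " Your current directory is:\n"
pwd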
The first two lines beginning with a hash (#) are comments and are not interpreted by the shell. Use comments to document your shell script; you will be surprised how easy it is to forget what your own programs do! The backquotes (`) around the command whoami illustrate the use of command substitution. The \n is an option of the echo command that tells the shell to add an extra carriage return at the end of the line. The \c tells the shell to stay on the same line. See the man page for details of other options. The argument to the echo command is quoted to prevent the shell interpreting these commands as though they had been escaped with the \ (backslash) character. The shell also provides you with a programming environment with features similar to those of a high-level programming language.

* The UNIX operating system provides a flexible set of simple tools to perform a wide variety of system-management, text-processing, and general-purpose tasks. These simple tools can be used in very powerful ways by tying them together programmatically, using "shell scripts" or "shell programs". The UNIX "shell" itself is a user-interface program that accepts commands from the user and executes them. It can also accept the same commands written as a list in a file, along with various other statements that the shell can interpret to provide input, output, decision-making, looping, variable storage, option specification, and so on. This file is a shell program.

Shell programs are, like any other programming language, useful for some things but not for others. They are excellent for system-management tasks but not for general-purpose programming of any sophistication. Shell programs, though generally simple to write, are also tricky to debug and slow in operation.

There are three classic versions of the UNIX shell: the original "Bourne shell (sh)", the "C shell (csh)" that was derived from it, and the "Korn shell (ksh)" that is in predominant use. The Bourne shell is also in popular use as the freeware "Bourne-again shell", AKA "bash". This document focuses on the Bourne shell. The C shell is more powerful in some respects but has various limitations, while the Korn shell is clean and more powerful than the other two shells; it is a superset of the Bourne shell, so anything that runs on a Bourne shell runs on a Korn shell, though the reverse is not true. Since the Bourne shell's capabilities are probably more than most people require, there's no reason to elaborate much beyond them in an introductory document, and the rest of the discussion will assume use of the Bourne shell unless otherwise stated.
The shell executes such commands when they are typed in from the command prompt with their appropriate parameters, which are normally options and file names.
* The shell also allows files to be defined in terms of "wildcard characters" that define a range of files. The "*" wildcard character substitutes for any string of characters, so:
rm *.txt
-- deletes all files that end with ".txt". The "?" wildcard character substitutes for any single character, so:
rm book?.txt
-- deletes "book1.txt", "book2.txt", and so on. More than one wildcard character can be used at a time, for example:
rm *book?.txt
* Another shell capability is "input and output redirection". The shell, like other UNIX utilities, accepts input by default from what is called "standard input", and generates output by default to what is called "standard output". These are normally defined as the keyboard and display, respectively, or what is referred to as the "console" in UNIX terms. However, standard input or output can be "redirected" to a file or another program if needed. Consider the "sort" command. This command sorts a list of words into alphabetic order; typing in:
sort
PORKY
ELMER
FOGHORN
DAFFY
WILE
BUGS
<CTL-D>
-- spits back:

BUGS
DAFFY
ELMER
FOGHORN
PORKY
WILE
Note that the CTL-D key input terminates direct keyboard input. It is also possible to store the same words in a file and then "redirect" the contents of that file to standard input with the "<" operator:
sort < names.txt
This would list the sorted names to the display as before. They can be redirected to a file with the ">" operator:
sort < names.txt > output.txt
They can also be appended to an existing file using the ">>" operator:
sort < names.txt >> output.txt
In these cases, there's no visible output, since the command just executes and ends. However, if that's a problem, it can be fixed by connecting the "tee" command to the output through a "pipe", designated by "|". This allows the standard output of one command to be chained into the standard input of another command. In the case of "tee", it accepts text into its standard input and then dumps it both to a file and to standard output:
sort < names.txt | tee output.txt
So this both displays the names and puts them in the output file. Many commands can be chained together to "filter" information through several processing steps. This ability to combine the effects of commands is one of the beauties of shell programming. By the way, "sort" has some handy additional options:
sort -u      # Eliminate redundant lines in output.
sort -r      # Sort in reverse order.
sort -n      # Sort numbers.
sort -k 2    # Skip first field in sorting.
* If a command generates an error, it is displayed to what is called "standard error", instead of standard output, which defaults to the console. It will not be redirected by ">". However, the operator "2>" can be used to redirect the error message. For example:
ls xyzzy 2> /dev/null
-- will give an error message if the file "xyzzy" doesn't exist, but the error will be redirected to the file "/dev/null". This is actually a "special file" that exists under UNIX where everything sent to it is simply discarded.
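A related idiom sends both standard output and standard error to the same file ("myprog" here is a stand-in name):

myprog > output.txt 2>&1

The "2>&1" tells the shell to send standard error (stream 2) to the same place as standard output (stream 1).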
* The shell permits the execution of multiple commands sequentially on one line by chaining them with a ";":
rm *.txt ; ls
A time-consuming program can also be run in a "parallel" fashion by following it with a "&":
sort < bigfile.txt > output.txt &
* These commands and operations are essential elements for creating shell programs. They can be stored in a file and then executed by the shell. To tell the shell that the file contains commands, just mark it as "executable" with the "chmod" command. Each file under UNIX has a set of "permission" bits, listed by an "ls -l" -- the option providing file details -- as:
rwxrwxrwx
The "r" gives "read" permission, the "w" gives "write" permission, and the "x" gives "execute" permission. There are three sets of these permission bits, one for the user, one for other members of a local group of users on a system, and one for everyone who can access the system -- remember that UNIX was designed as a multiuser environment.
The "chmod" command can be used to set these permissions, with the permissions specified as an octal code. For example:
chmod 644 myfile.txt
This sets both read and write permission on the file for the user, but everybody else on the system only gets read permission. The same octal scheme can be used to set execute permission, though it's simpler just to use the chmod "+x" option:
chmod +x mypgm
This done, if the name "mypgm" is entered at the prompt, the shell reads the commands out of "mypgm" and executes them. The execute permission can be removed with the "-x" option.
For example, suppose we want to be able to inspect the contents of a set of archive files stored in the directory "/users/group/archives". We could create a file named "ckarc" and store the following command string in it:
ls /users/group/archives | pg
This is a very simple shell program. As noted, the shell has control constructs, supports storage variables, and has several options that can be set to allow much more sophisticated programs. The following sections describe these features in a quick outline fashion.
Incidentally, this scheme for creating executable files is for the UNIX environment. Under the Windows environment, the procedure is to end shell program file names in a distinctive extension -- ".sh" is a good choice, though any unique extension will do -- and then configure Windows to run all files with that extension with a UNIX-type shell, usually bash.
echo "This is a test!"

This sends the string "This is a test!" to standard output. It is recommended to write shell programs that generate some output to inform the user of what they are doing.
The shell allows variables to be defined to store values. It's simple: just declare a variable and assign a value to it:
shvar="This is a test!"
The string is enclosed in double-quotes to ensure that the variable swallows the entire string (more on this later), and there are no spaces around the "=". The value of the shell variable can be obtained by preceding it with a "$":
echo $shvar
This displays "This is a test!". If no value had been stored in that shell variable, the result would have simply been a blank line. Values stored in shell variables can be used as parameters to other programs as well:
ls $lastdir
The value stored in a shell variable can be erased by assigning the "null string" to the variable:
shvar=""
There are some subtleties in using shell variables. For example, suppose a shell program performed the assignment:
allfiles=*
Executing "echo $allfiles" would then echo a list of all the files in the directory. However, only the string "*" is stored in "allfiles"; the expansion of "*" only occurs when the "echo" command is executed.
Another subtlety is in modifying the values of shell variables. Suppose we have a file name in a shell variable named "myfile" and want to rename that file to the same name, but with "2" tacked on to the end. We might think to try:
mv $myfile $myfile2
-- but the problem is that the shell will think that "myfile2" is a different shell variable, and this won't work. Fortunately, there is a way around this; the change can be made as follows:
mv $myfile ${myfile}2
A UNIX installation will have some variables installed by default, most importantly $HOME, which gives the location of a particular user's home directory.
As a final comment on shell variables, if one shell program calls another and the two shell programs have the same variable names, the two sets of variables will be treated as entirely different variables. To call other shell programs from a shell program and have them use the same shell variables as the calling program requires use of the "export" command:
shvar="This is a test!" export shvar echo "Calling program two." shpgm2
echo "Done!"
-- and it would print out the matching lines. However, suppose we wanted to search for "Wile E. Coyote". If we did this as:
fgrep Wile E. Coyote source.txt
-- we'd get an error message that "fgrep" couldn't open "E.". The string has to be enclosed in double quotes (""):
fgrep "Wile E. Coyote" source.txt
If a string has a special character in it, such as "*" or "?", that must be interpreted as a "literal" and not a wildcard, the shell can get a little confused. To ensure that the wildcards are not interpreted, the wildcard can either be "escaped" with a backslash ("\*" or "\?") or the string can be enclosed in single quotes, which prevents the shell from interpreting any of the characters within the string. For example, if:
echo "$shvar"
-- is executed from a shell program, it would output the value of the shell variable "$shvar". In contrast, executing:
echo '$shvar'

-- would simply output the literal string "$shvar".
* Having considered "double-quoting" and "single-quoting", let's now consider "back-quoting". This is a little tricky to explain. As a useful tool, consider the "expr" command, which can be used to perform simple math from the command line:
expr 2 + 4
This displays the value "6". There must be spaces between the parameters; in addition, to perform a multiplication the "*" has to be "escaped" so the shell doesn't interpret it:
expr 3 \* 7
Now suppose the string "expr 12 / 3" has been stored in a shell variable named "shcmd"; then executing:
echo $shcmd
-- or:
echo "$shcmd"
-- would simply produce the text "expr 12 / 3". If single-quotes were used:
echo '$shcmd'
-- the result would be the string "$shcmd". However, if back-quotes, the reverse form of a single quote, were used:
echo `$shcmd`
-- the result would be the value "4", since the string inside "shcmd" is executed. This is an extremely powerful technique that can be very confusing to use in practice.
shift 3

-- shifts the arguments three times, so that the fourth argument ends up in "$1".
if [ "$1" = "hyena" ]
then
  echo "Sorry, no hyenas allowed."
  exit
elif [ "$1" = "jackal" ]
then
  echo "Sorry, no jackals allowed."
  exit
else
  echo "Welcome to Bongo Congo."
fi
echo "Do you have anything to declare?"
-- checks the command line to see if the first argument is "hyena" or "jackal" and bails out, using the "exit" command, if they are. Other arguments allow the rest of the file to be executed. Note how "$1" is enclosed in double quotes, so the test will not generate an error message if it yields a null result.
The test command can also check the properties of files:

[ -d tmp ]   # True if "tmp" is a directory.
[ -f tmp ]   # True if "tmp" is an ordinary file.
[ -r tmp ]   # True if "tmp" can be read.
[ -s tmp ]   # True if "tmp" is nonzero length.
[ -w tmp ]   # True if "tmp" can be written.
[ -x tmp ]   # True if "tmp" is executable.
-- there is a potential pitfall in that a user might enter, say, "-d" as a command-line parameter, which would cause an error when the program was run. Now there is only so much that can be done to save users from their own clumsiness, and "bullet-proofing" simple example programs tends to make them not so simple any more, but there is a simple if a bit cluttered fix for such a potential pitfall. It is left as an exercise for the reader.
There is also a "case" control construct that checks for equality with a list of items. It can be used with the example at the beginning of this section:
case "$1" in "gorilla") "hyena") *) esac
echo "Sorry, gorillas not allowed." exit;; echo "Hyenas not welcome." exit;; echo "Welcome to Bongo Congo.";;
* The fundamental loop construct in the shell is based on the "for" command. For example:
for nvar in 1 2 3 4 5
do
  echo $nvar
done
-- echoes the numbers 1 through 5. The names of all the files in the current directory could be displayed with:
for file in *
do
  echo $file
done
One nice little feature of the shell is that if the "in" parameters are not specified for the "for" command, it just cycles through the command-line arguments.
There is also a "continue" command that starts the next iteration of the loop immediately. There must be a command in the "then" or "else" clauses, or the result is an error message. If it's not convenient to actually do anything in the "then" clause, a ":" can be used as a "no-op" command:
then
  :
else
* There are two other looping constructs available as well, "while" and "until". For an example of "while":
n=10
while [ "$n" -ne 0 ]
do
  echo $n
  n=`expr $n - 1`
done
-- counts down from 10 to 1. The "until" loop has similar syntax but tests for a false condition:
n=10
until [ "$n" -eq 0 ]
do ...
It is strongly recommended to comment all shell programs. If they are just one-liners, a simple comment line at the top of the file will do. If they are complicated shell programs, they should have a title, revision number, revision date, and revision history along with descriptive comments. This will prevent confusion if multiple versions of the same shell program are found, or if the program is modified later. Shell programs can be obscure, even by the standards of programming languages, and it is useful to provide a few hints.
* Standard input can be read into a shell program using the "read" command. For example:
echo "What is your name?" read myname echo $myname
-- echoes the user's own name. The "read" command will read each item of standard input into a list of shell variables until it runs out of shell variables, and then it will read all the rest of standard input into the last shell variable. As a result, in the example above, the user's name is stored into "myname".
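To illustrate, a quick sketch of reading into more than one variable (the prompt text is invented):

echo "Enter your first and last name:"
read firstname lastname
echo "First: $firstname  Last: $lastname"

Entering "Wile E. Coyote" stores "Wile" in the first variable and "E. Coyote" in the second, since leftover words go to the last variable.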
* If a command is too long to fit on one line, the line continuation character "\" can be used to put it on more than one line:
echo "This is a test of \ the line continuation character."
* There is a somewhat cryptic command designated by "." that executes a file of commands within a shell program. For example:
. mycmds
-- will execute the commands stored in the file "mycmds". It's something like an "include" command in other languages.
* For debugging, the execution of a shell program can be traced using the "-x" option with the shell:
sh -x mypgm *
This traces out the steps "mypgm" takes during the course of its operation.
* One last comment on shell programs before proceeding: what happens with a shell program that just performs, say:
cd /users/coyote
-- to change to another directory? Well ... nothing happens. After the shell program runs and exits, the directory remains unchanged. The reason is that the shell creates a new shell, or "subshell", to run the shell program, and when the shell program is finished, the subshell vanishes, along with any changes made in that subshell's environment. It is easier, at least in this simple case, to define a command alias in the UNIX "login" shell instead of struggling with the problem in shell programs.
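For instance, in the C shell one might define (the alias name is invented):

alias gocoyote 'cd /users/coyote'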
* The "paste" utility joins files by columns. For example (the file names here are only illustrative):

paste names.txt numbers.txt > phonelist.txt

-- takes a file containing names and a file containing corresponding phone numbers and generates a file with each name and number "pasted" together on the same line.
* The "head" and "tail" utilities list the first 10 or last 10 lines in a file respectively. The number of lines to be listed can be specified if needed:
head -n 5 source.txt     # List first 5 lines.
tail -n 5 source.txt     # List last 5 lines.
tail -n +5 source.txt    # List all lines from line 5.
* The "tr" utility translates from one set of characters to another. For example, to translate uppercase characters to lowercase characters:
tr '[A-Z]' '[a-z]' < file1.txt > file2.txt
tr -d '*' < file1.txt > file2.txt

-- deletes all asterisks from the input stream. Note that "tr" only works on single characters.
* The "uniq" utility removes duplicate consecutive lines from a file. It has the syntax:
uniq source.txt output.txt
A "-c" option provides an additional count of the number of times a line was duplicated, while a "-d" option displays only the duplicated lines in a file.
* The "wc (word count)" utility tallies up the characters, words, and lines of text in a text file. It can be invoked with the following options:
wc -c    # Character count only.
wc -w    # Word count only.
wc -l    # Line count only.
* The "find" utility is extremely useful, if a little hard to figure out. Essentially, it traverses a directory subtree and performs whatever action specified on every match. For example:
find / -name findtest.txt -print
This searches from the root directory ("/") for "findtest.txt", as designated by the "-name" option, and then prints the full pathname of the file, as designated by the "-print" option. Incidentally, "find" must be told what to do on a match; it will not by default say or do anything, it will just keep right on searching.
There are a wide variety of selection criteria. Simply printing out the names of directories in a search can be done with:
find . -type d -print
Files can also be found based on their username, date of last modification, size, and so on.
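For example, sketches of a few other selection criteria (the user name and numbers are invented):

find . -user coyote -print     # files owned by user "coyote"
find . -mtime -7 -print        # files modified in the last 7 days
find . -size +100 -print       # files larger than 100 blocks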
grep Taz *.txt

-- finds every example of the string "Taz" in all files ending in ".txt", then displays the name of the file and the line of text containing the string.
But using the magic characters provides much more flexibility. For example:
grep ^Taz *.txt
-- finds the string "Taz" only if it is at the beginning of the line. Similarly:
grep Taz$ *.txt

-- finds the string "Taz" only if it occurs at the end of the line.
Now suppose we want to match both "Taz" and "taz". This can be done with:
[Tt]az
The square brackets ("[]") can be used to specify a range of characters. For example:
group_[abcdef]
-- matches the strings "group_a", "group_b", and so on up to "group_f". This range specification can be simplified to:
group_[a-f]
Similarly:
set[0123456789]

-- which can be simplified to:

set[0-9]
It is also possible to match to all characters except a specific range. For example:
unit_[^xyz]
Other magic characters provide a wildcard capability. The "." character can substitute for any single character, while the "*" substitutes for zero or more repetitions of the preceding regular expression. For example:
__*$
-- matches any line that is padded with spaces to the right margin (for clarity the space character is represented here by a "_"). If a magic character is to be matched as a real item of text, it has to be "escaped" with a "\":
test\.txt
The "-v" option inverts the sense of a search:

grep -v Taz *.txt

-- lists all lines that don't match the regular expression. Other options include:
grep -n    # List line numbers of matches.
grep -i    # Ignore case.
grep -l    # Only list file names for a match.
If there's no need to go through the bother of using regular expressions in a particular search, there is a variation on "grep" called "fgrep" (meaning "fixed grep" or "fast grep") that searches for matches on strings and runs faster; it was used in an earlier example. It uses the same options as described for "grep" above.
* The name "sed" stands for "stream editor" and it provides, in general, a search-and-replace capability. Its syntax for this task is as follows:
sed 's/<regular_expression>/<replacement_string>/[g]' source.txt
The optional "g" parameter specifies a "global" replacement. That is, if there are have multiple matches on the same line, "sed" will replace them all. Without the "g" option, it will only replace the first match on that line. For example, "sed" can be used to replace the string "flack" with "flak" as follows:
-- or perform substitutions and deletions from a list of such specifications stored in a file:
sed -f sedcmds.txt source.txt > output.txt
The "sed" utility has a wide variety of other options, but a full discussion of its capabilities is beyond the scope of this document.
* Finally, "awk" is a full-blown text processing language that looks something like a mad cross between "grep" and "C". In operation, "awk" takes each line of input and performs text processing on it. It recognizes the current line as "$0", with each word in the line recognized as "$1", "$2", "$3", and so on. This means that:
awk '{ print $0,$0 }' source.txt
-- prints each line with duplicate text. A regular expression can be specified to identify a pattern match. For example, "awk" could tally the lines with the word "Taz" on them with:
awk '/Taz/ { taz++ }; END { print taz }' source.txt
The END clause used in this example allows execution of "awk" statements after the line-scanning has been completed. There is also a BEGIN clause that allows execution of "awk" statements before line-scanning begins. "Awk" can be used to do very simple or very complicated things. Its syntax is much like that of "C", though it is much less finicky to deal with. Details of "awk" are discussed in a companion document.
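As a sketch of both clauses together (the messages are invented):

awk 'BEGIN { print "scanning..." }; /Taz/ { taz++ }; END { print taz " matches" }' source.txt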
Similarly, I like to timestamp my documents in a particular format, so I have a shell program named "td" ("timedate") that invokes "date" as follows:
date +"date: %A, %d %B %Y %H%M %Z"
Another simple example is a shell script to convert file names from uppercase to lowercase:
for file
do
  mv $file `echo $file | tr '[A-Z]' '[a-z]'`
done
In this example, "for" is used to sequence through the file arguments, and "tr" and back-quoting are used to establish the lower-case name for the file.
QUICK REFERENCE
* This final section provides a fast lookup reference for the materials in this document. * Useful commands:
cat                      # Lists a file or files sequentially.
cd                       # Change directories.
chmod +x                 # Set execute permissions.
chmod 666                # Set universal read-write permissions.
cp                       # Copy files.
expr 2 + 2               # Add 2 + 2.
fgrep                    # Search for string match.
grep                     # Search for string pattern matches.
grep -v                  # Search for no match.
grep -n                  # List line numbers of matches.
grep -i                  # Ignore case.
grep -l                  # Only list file names for a match.
head -5 source.txt       # List first 5 lines.
ls                       # Give a simple listing of files.
mkdir                    # Make a directory.
more                     # Displays a file a screenful at a time.
mv                       # Move or rename files.
paste f1 f2              # Paste files by columns.
pg                       # Variant on "more".
pwd                      # Print working directory.
rm                       # Remove files.
rm -r                    # Remove entire directory subtree.
rmdir                    # Remove a directory.
sed 's/txt/TXT/g'        # Scan and replace text.
sed 's/txt//'            # Scan and delete text.
sed '/txt/q'             # Scan and then quit.
sort                     # Sort input.
sort +1                  # Skip first field in sorting.
sort -n                  # Sort numbers.
sort -r                  # Sort in reverse order.
sort -u                  # Eliminate redundant lines in output.
tail -5 source.txt       # List last 5 lines.
tail +5 source.txt       # List all lines after line 5.
tr '[A-Z]' '[a-z]'       # Translate to lowercase.
tr '[a-z]' '[A-Z]'       # Translate to uppercase.
tr -d '_'                # Delete underscores.
uniq                     # Find unique lines.
wc                       # Word count (characters, words, lines).
wc -w                    # Word count only.
wc -l                    # Line count.
shvar="Test 1" echo $shvar export shvar mv $f ${f}2 $1, $2, $3, ... $0 $# $* shift 2 read v . mycmds
# # # # # # # # # # #
Initialize a shell variable. Display a shell variable. Allow subshells to use shell variable. Append "2" to file name in shell variable. Command-line arguments. Shell-program name. Number of arguments. Complete argument list. Shift argument variables by 2. Read input into variable "v". Execute commands in file.
* IF statement:
if [ "$1" = "red" ] then echo "Illegal code." exit elif [ "$1" = "blue" ] then echo "Illegal code." exit else echo "Access granted." fi [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ "$shvar" "$shvar" "$shvar" "$shvar" "$nval" "$nval" "$nval" "$nval" "$nval" "$nval" -d -f -r -s -w -x tmp tmp tmp tmp tmp tmp ] ] ] ] ] ] = "red" ] != "red" ] = "" ] != "" ] -eq -ge -gt -le -lt -ne 0 0 0 0 0 0 ] ] ] ] ] ] String comparison, true if match. String comparison, true if no match. True if null variable. True if not null variable. Integer Integer Integer Integer Integer Integer True True True True True True if if if if if if test; test; test; test; test; test; "tmp" "tmp" "tmp" "tmp" "tmp" "tmp" true true true true true true if if if if if if equal to 0. greater than greater than less than or less than to not equal to or equal to 0. 0. equal to 0. 0. 0.
is a directory. is an ordinary file. can be read. is nonzero length. can be written. is executable.
* CASE statement:
case "$1" in "red") "blue") *) esac
echo "Illegal code." exit;; echo "Illegal code." exit;; echo "Access granted.";;
* Loop statements:
for nvar in 1 2 3 4 5
do
    echo $nvar
done

for file    # Cycle through command-line arguments.
do
    echo $file
done
while [ "$n" != "Joe" ] do echo "What's your name?" read n echo $n done
# Or:
There are "break" and "continue" commands that exit or skip to the end of loops as the need arises.
This processes the three files given as arguments to the command printps.
$$    # Process ID of the current shell.
$!    # Process ID of the most recent background command.
$*    # All command-line arguments (see Notes below).
$@    # All command-line arguments, individually quoted (see Notes below).
Notes
$* and $@ when unquoted are identical and expand into the arguments. "$*" is a single word, comprising all the arguments to the shell, joined together with spaces. For example, '1 2' 3 becomes "1 2 3". "$@" is identical to the arguments received by the shell; the resulting list of words completely matches what was given to the shell. For example, '1 2' 3 becomes "1 2" "3".
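A small sketch of the difference, runnable in "sh" (the argument values are arbitrary):

set '1 2' 3
for arg in "$*"; do echo "$arg"; done    # One line of output: 1 2 3
for arg in "$@"; do echo "$arg"; done    # Two lines of output: 1 2, then 3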
A read command takes a line of input and assigns it to the named variable; an echo can first be used to prompt the user, and the value can then be displayed to standard output. If there is more than one word in the input, each word can be assigned to a different variable. Any words left over are assigned to the last named variable. For example:
echo "Please enter your surname\n"
echo "followed by your first name: \c"
read name1 name2
echo "Welcome to Glasgow $name2 $name1"
Conditional statements
Every Unix command returns a value on exit which the shell can interrogate. This value is held in the read-only shell variable $?. A value of 0 (zero) signifies success; anything other than 0 (zero) signifies failure.
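For example ("grep" is used here purely as an illustration):

grep joe /etc/passwd
echo $?    # Prints 0 if a matching line was found, non-zero otherwise.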
The if statement
The if statement uses the exit status of the given command and conditionally executes the statements following. The general syntax is:
if test
then
    commands
else
    commands
fi
then, else and fi are shell reserved words and as such are only recognised after a newline or ; (semicolon). Make sure that you end each if construct with a fi statement. if statements may be nested:
if ...
then
    ...
else
    if ...
    then
        ...
    fi
fi
The elif statement can be used as shorthand for an else if statement. For example:
if ...
then
    ...
elif ...
then
    ...
fi
The && operator

You can use the && operator to execute a command and, if it succeeds, execute the next command in the command list. For example:

cmd1 && cmd2

cmd1 is executed and its exit status examined. Only if cmd1 succeeds is cmd2 executed. This is a terse notation for:
if cmd1
then
    cmd2
fi
The || operator
You can use the || operator to execute a command and, if it fails, execute the next command in the command list. For example:
cmd1 || cmd2
cmd1 is executed and its exit status examined. If cmd1 fails, then cmd2 is executed. This is a terse notation for:
cmd1
if test $? -ne 0
then
    cmd2
fi
First, we test to see if the filename specified by the variable $FILE exists and is a regular file. If it does not, then we test to see if the variable $WARN is assigned the value yes, and, if it is, a message that the filename does not exist is displayed.
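A minimal sketch of the test just described (the exact message text is an assumption):

if [ ! -f "$FILE" ]
then
    if [ "$WARN" = "yes" ]
    then
        echo "$FILE does not exist"
    fi
fi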
The case statement

When all the commands are executed, control is passed to the first statement after the esac. Each list of commands must end with a double semicolon (;;). A command can be associated with more than one pattern. Patterns can be separated from each other by a | symbol. For example:
case word in
pattern1|pattern2)
    command
    ...
    ;;
esac
Patterns are checked for a match in the order in which they appear, and only the commands for the first pattern that matches are carried out. The * character can be used to specify a default pattern, since * is the shell wildcard character.
The for statement has the general form:

for var in list
do
    commands
done

commands is a sequence of one or more commands separated by a newline or ; (semicolon). The reserved words do and done must be preceded by a newline or ; (semicolon). Small loops can be written on a single line. For example:
for var in list; do commands; done
The while statement has the general form:

while command-list1
do
    command-list2
done
The commands in command-list1 are executed; and if the exit status of the last command in that list is 0 (zero), the commands in command-list2 are executed. The sequence is repeated as long as the exit status of command-list1 is 0 (zero). The until statement has the general form:
until command-list1
do
    command-list2
done
This is identical in function to the while command except that the loop is executed as long as the exit status of command-list1 is non-zero. The exit status of a while/until command is the exit status of the last command executed in command-list2. If no such command list is executed, a while/until has an exit status of 0 (zero).
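As a small illustration of the two forms ("expr" is used for arithmetic, as elsewhere in this document; the counter and bound are arbitrary):

n=0
while [ $n -lt 3 ]    # Runs while the test succeeds; prints 0 1 2.
do
    echo $n
    n=`expr $n + 1`
done

until [ $n -eq 0 ]    # Runs until the test succeeds; prints 2 1 0.
do
    n=`expr $n - 1`
    echo $n
done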
The break command causes execution to resume after the done of the loop that encloses it. You can give break a numeric argument n, which will cause execution to resume after the done n levels up. The continue command causes execution to resume at the while, until or for statement which begins the loop containing the continue command. You can also specify an argument n to continue, which will cause execution to continue at the nth enclosing loop up.
Script files
Shell script files do not require any extension, but you may see them with the .sh ending. There are two ways to call a script file:
./scriptfile.sh
sh scriptfile.sh
The first method requires that the script have the file permission of +x (execution) for the user. The second just requires more typing.
The first method also has a requirement that the first line in the file is:
#!/bin/sh
This line tells the system which interpreter to use to run the script. Some other common first lines include:
#!/usr/bin/perl
#!/usr/bin/python
#!/usr/bin/php
You need only choose the appropriate one for whichever language you are using. For shell scripting, however, you will always want to use the sh interpreter.
Conditional statements
This program checks whether a file, whose name you pass to the script on the command line, exists, and outputs a message if so.
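A minimal version of such a script, consistent with the breakdown given below (the message text is an assumption), might read:

#!/bin/sh
# Report whether the file named by the first argument exists.
file=$1
if [ -a "$file" ]
then
    echo "$file exists"
fi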
NOTE: the -a option tells bash to check if a file exists. Also, you can use elif and else if you want to check for other conditions; elif has the same parameter layout as if, and else has no parameters. Invoke this script with:

./scriptfile.sh <file to check for>
Step by step breakdown:
1) A variable called "file" is created and set to the first argument you pass into the script, the <file to check for>.
2) The if statement checks whether the file exists, using the -a flag. Notice the spacing; it is a requirement for conditional statements.
3) The script will print a message if the file exists; if it doesn't, nothing happens.
4) The if statement is ended by fi, a requirement.
Looping
This script will list the contents of a directory and prepend "Directory", "File", or "Symlink" before the proper listing. It employs a for loop which cycles through the ls output of the directory you pass into the script on the command line.
directorytols=$1
for filename in $( ls "$directorytols" )
do
    # ls prints bare names, so prefix the directory when testing each entry.
    if [ -d "$directorytols/$filename" ] ; then
        echo "Directory: $filename"
    elif [ -h "$directorytols/$filename" ] ; then
        echo "Symlink: $filename"
    else
        echo "File: $filename"
    fi
done
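If the script were saved as, say, listdir.sh (the name and the argument are illustrative only), it could be invoked as:

sh listdir.sh /tmp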
Ordinary files can contain text, data, or program information. An ordinary file cannot contain another file, or directory. An ordinary file can be thought of as a one-dimensional array of bytes.

Directories

In a previous section, we described directories as containers that can hold files, and other directories. A directory is actually implemented as a file that has one line for each item contained within the directory. Each line in a directory file contains only the name of the item, and a numerical reference to the location of the item. The reference is called an i-number, and is an index to a table known as the i-list. The i-list is a complete list of all the storage space available to the file system.

Special files

Special files represent input/output (i/o) devices, like a tty (terminal), a disk drive, or a printer. Because Unix treats such devices as files, a degree of compatibility can be achieved between device i/o, and ordinary file i/o, allowing for the more efficient use of software. Special files can be either character special files, that deal with streams of characters, or block special files, that operate on larger blocks of data. Typical block sizes are 512 bytes, 1024 bytes, and 2048 bytes.

Links

A link is a pointer to another file. Remember that a directory is nothing more than a list of the names and i-numbers of files. A directory entry can be a hard link, in which the i-number points directly to another file. A hard link to a file is indistinguishable from the file itself. When a hard link is made, the i-numbers of two different directory file entries point to the same inode. For that reason, hard links cannot span across file systems. A soft link (or symbolic link) provides an indirect pointer to a file. A soft link is implemented as a directory file entry containing a pathname. Soft links are distinguishable from files, and can span across file systems. Not all versions of Unix support soft links.
The I-List
When we speak of a Unix file system, we are actually referring to an area of physical memory represented by a single i-list. A Unix machine may be connected to several file systems, each with its own i-list. One of those i-lists points to a special storage area, known as the root file system. The root file system contains the files for the operating system itself, and must be available at all times. Other file systems are removable. Removable file systems can be attached, or mounted, to the root file system. Typically, an empty directory is created on the root file system as a mount point, and a removable file system is attached there. When you issue a cd command to access the files and directories of a mounted removable file system, your file operations will be controlled through the i-list of the removable file system. The purpose of the i-list is to provide the operating system with a map into the memory of some physical storage device. The map is continually being revised, as the files are created and removed, and as they shrink and grow in size. Thus, the mechanism of mapping must be very flexible to accommodate drastic changes in the number and size of files. The i-list is stored in a known location, on the same memory storage device that it maps.
Each entry in an i-list is called an i-node. An i-node is a complex structure that provides the necessary flexibility to track the changing file system. The i-nodes contain the information necessary to get information from the storage device, which typically communicates in fixed-size disk blocks. An i-node contains 10 direct pointers, which point to disk blocks on the storage device. In addition, each i-node also contains one indirect pointer, one double indirect pointer, and one triple indirect pointer. The indirect pointer points to a block of direct pointers. The double indirect pointer points to a block of indirect pointers, and the triple indirect pointer points to a block of double indirect pointers. By structuring the pointers in a geometric fashion, a single i-node can represent a very large file. It now makes a little more sense to view a Unix directory as a list of i-numbers, each i-number referencing a specific i-node on a specific i-list. The operating system traces its way through a file path by following the i-nodes until it reaches the direct pointers that contain the actual location of the file on the storage device.
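As a rough worked example, assume (hypothetically) 1024-byte disk blocks and 4-byte pointers, so that one block holds 256 pointers. A single i-node could then address:

10 direct pointers: 10 x 1 KB = 10 KB
one indirect pointer: 256 x 1 KB = 256 KB
one double indirect pointer: 256 x 256 x 1 KB = 64 MB
one triple indirect pointer: 256 x 256 x 256 x 1 KB = 16 GB

-- for a maximum file size of a little over 16 GB under these assumptions.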
The quota command will let you know if you're over your soft limit. Adding the -v option will provide statistics about your disk usage.
du             Summarizes disk usage in a specified directory hierarchy
ln             Creates a hard link (default), or a soft link (with -s option)
mount, umount  Attaches, or detaches, a file system (super user only)
mkfs           Constructs a new file system (super user only)
fsck           Evaluates the integrity of a file system (super user only)
You can use the ls command to list the contents of each directory in your path, and the man command to get help on unfamiliar utilities. A good systems administrator will ensure that manual pages are provided for the utilities installed on the system.
What is Linux?
Linux is a true 32-bit operating system that runs on a variety of different platforms, including Intel, Sparc, Alpha, and Power-PC (on some of these platforms, such as Alpha, Linux is actually 64-bit). There are other ports available as well, but I do not have any experience with them.

Linux was first developed back in the early 1990s, by a young Finnish then-university student named Linus Torvalds. Linus had a "state-of-the-art" 386 box at home and decided to write an alternative to the 286-based Minix system (a small unix-like implementation primarily used in operating systems classes), to take advantage of the extra instruction set available on the then-new chip, and began to write a small bare-bones kernel. Eventually he announced his little project in the USENET group comp.os.minix, asking for interested parties to take a look and perhaps contribute to the project. The results have been phenomenal!

The interesting thing about Linux is that it is completely free! Linus decided to adopt the GNU Copyleft license of the Free Software Foundation, which means that the code is protected by a copyright -- but protected in that it must always be available to others. Free means free -- you can get it for free, use it for free, and you are even free to sell it for a profit (this isn't as strange as it sounds; several organizations, including Red Hat, have packaged up the standard Linux kernel with a collection of GNU utilities and their own "flavour" of included applications, and sell them as distributions. Some common and popular distributions are Slackware, Red Hat, SuSe, and Debian). The great thing is, you have access to the source code, which means you can customize the operating system to your own needs, not those of the "target market" of most commercial vendors.

Linux can and should be considered a full-blown implementation of unix. However, it can not be called "Unix"; not because of incompatibilities or lack of functionality, but because the word "Unix" is a registered trademark owned by AT&T, and the use of the word is only allowable by license agreement. Linux is every bit as supported, as reliable, and as viable as any other operating system solution (well, in my opinion, quite a bit more so!). However, due to its origin, the philosophy behind it, and the lack of a multi-million dollar marketing campaign promoting it, there are a lot of myths about it. People have a lot to learn about this wonderful OS!
Most users are primarily interested in just running a set of basic applications for their professional needs. Often they cannot afford to keep track of new software releases and patches that get announced, and they can rarely install these themselves. In addition, these are non-trivial tasks and can only be done with superuser privileges. Users share resources like disk space, etc., so there has to be some allocation policy for the disk space. A system administrator needs to implement such a policy. System administration also helps in setting up users' working environments. On the other hand, the management is usually keen to ensure that the resources are used properly and efficiently. They seek to monitor the usage and keep an account of system usage. In fact, the system usage pattern is often analysed to help determine the efficacy of
usage. Clearly, management's main concerns include performance and utilisation of resources to ensure that operations of the organisation do not suffer. At this juncture it may be worth our while to list the major tasks which are performed by system administrators. We should note that most of the tasks require that the system administrator operates in superuser mode with root privileges.

Administration Tasks List

This is not an exhaustive list, yet it represents most of the tasks which system administrators perform:

1. System startup and shutdown: In Section 19.2, we shall see the basic steps required to start and to stop operations in a Unix operational environment.

2. Opening and closing user accounts: In Unix an administrator is both a user and a super-user. Usually, an administrator has to switch to the super-user mode with root privileges to open or close user accounts. In Section 19.3, we shall discuss some of the nuances involved in this activity.

3. Helping users to set up their working environment: Unix allows any user to customize his working environment. This is usually achieved by using .rc files. Many users need help with an initial set-up of their .rc files. Later, a user may modify his .rc files to suit his requirements. In Section 19.4, we shall see most of the useful .rc files and the interpretations for various settings in these files.

4. Maintaining user services: Users require services for printing, mail, Web access and chat. We shall deal with mail and chat in Section 19.4, where we discuss .rc files, and with print services in Section 19.5, where we discuss device management and services. These services include spooling of print jobs, provisioning of print quota, etc.

5. Allocating disk space and re-allocating quotas when the needs grow: Usually there would be a default allocation. However, in some cases it may be imperative to enhance the allocation. We shall deal with the device oriented services and management issues in Section 19.5.

6. Installing and maintaining software: This may require installing software patches from time to time. Most OSs are released with some bugs still present. Often with usage these bugs are identified and patches released. Also, one may have some software installed which satisfies a few of the specialized needs of the user
community; such software is usually kept under a local directory (e.g. /usr/local), where local is an indicator of the local (and therefore non-standard) nature of the software. We shall not discuss software installation, as much of it is learned from experienced system administrators by assisting them in the task.

7. Installing new devices and upgrading the configuration: As demand on a system grows, additional devices may need to be installed. The system administrator will have to edit configuration files to identify these devices. Some related issues shall be covered in Section 19.5 later in this chapter.

8. Provisioning the mail and internet services: Users connected to any host shall seek mail and internet Web access. In addition, almost every machine shall be a resource within a local area network, so as a resource the machine shall have an IP address too. In most cases it would be accessible from other machines as well. We shall show the use of .mailrc files in this context later in Section 19.4.

9. Ensuring security of the system: The internet makes the task of system administration both interesting and challenging. The administrators need to keep a check on spoofing and misuse. We have discussed security in some detail in the module on OS and Security.

10. Maintaining system logs and profiling the users: A system administrator is often required to determine the usage of resources. This is achieved by analysing system logs. The system logs also help to profile the users. In fact, user profiling helps in identifying security breaches, as was explained in the module entitled OS and Security.

11. System accounting: This is usually of interest to the management. Also, it helps system administrators to tune up an operating system to meet the user requirements. This also involves maintaining and analysing logs of the system operation.

12. Reconfiguring the kernel whenever required: Sometimes when new patches are installed or a new release of the OS is received, it is imperative to recompile the kernel. Linux users often need to do this as new releases and extensions become available.

Let us begin our discussions with the initiation of the operations and shutdown procedures.
Starting and Shutting Down

First we shall examine what exactly happens when the system is powered on. Later, we shall examine the shutdown procedure for Unix systems. Unix systems, on being powered on, usually require that a choice be made to operate either in single-user or in multi-user mode. Most systems operate in multi-user mode. However, system administrators use single-user mode when they have some serious reconfiguration or installation task to perform. The family of Unix systems emanating from System V usually operate with a run level. The single-user mode is identified with run level s; otherwise there are levels from 0 to 6. Run level 3 is the most common for the multi-user mode of operation. On being powered on, Unix usually initiates the following sequence of tasks:

1. Unix performs a sequence of self-tests to determine if there are any hardware problems.
2. The Unix kernel gets loaded from a root device.
3. The kernel runs and initializes itself.
4. The kernel starts the init process. All subsequent processes are spawned from the init process.
5. The init process checks out the file system using fsck.
6. The init process executes a system boot script.
7. The init process spawns a process to check all the terminals from which the system may be accessed. This is done by checking the terminals defined under /etc/ttytab or a corresponding file. For each terminal a getty process is launched. This reconciles communication characteristics like the baud rate and type for each terminal.
8. The getty process initiates a login process to enable a prospective login from a terminal.

During the startup we notice that fsck checks out the integrity of the file system. In case fsck throws up messages of some problems, the system administrator has to work around them to ensure that there is a working configuration made available to the users. It will suffice here to mention that one may monitor disk usage and reconcile the disk integrity. The starting up of systems is a routine activity. The most important thing to note is that on booting, or following a startup, all the temporary files under the tmp directory are cleaned
up. Also, zombies are cleaned up. System administrators resort to booting when there are a number of zombies and often a considerable disk space is blocked in the tmp directory. We next examine the shutdown. Most Unix systems require invoking the shutdown utility. The shutdown utility offers options to either halt immediately, or shutdown after a pre-assigned period. Usually system administrators choose to shutdown with a pre-assigned period. Such a shutdown results in sending a message to all the terminals that the system shall be going down after a certain interval of time, say 5 minutes. This cautions all the users and gives them enough time to close their files and terminate their active processes. Yet another shutdown option is to reboot, with obvious implications. The most commonly used shutdown command is as follows:

shutdown -h time [message]

Here time is the period and message is optional, but often it is intended to advise users to take precautions to terminate their activity gracefully. This mode also prepares to turn power off after a proper shutdown. There are other options like k, r, n, etc. The readers are encouraged to find details about these in the Unix man pages. For now, we shall move on to discuss user accounts management and run command files.

Managing User Accounts

When a new person joins an organisation he is usually given an account by the system administrator. This is the login account of the user. Nowadays almost all Unix systems support an admin tool which seeks the following information from the system administrator to open a new account:

1. Username: This serves as the login name for the user.
2. Password: Usually a system administrator gives a simple password. The users are advised to later select a password which they feel comfortable using. The user's password appears in the shadow files in encrypted form. Usually, the /etc/passwd file contains the information required by the login program to authenticate the login name and to initiate the appropriate shell, as shown in the description below:
bhatt:x:1007:1::/export/home/bhatt:/usr/local/bin/bash
damu:x:1001:10::/export/home/damu:/usr/local/bin/bash

Each line above contains information about one user. The first field is the name of the user; the next is a dummy indicator of the password, which is in another file, a shadow file. Password programs use a trap-door algorithm for encryption.
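For reference, the seven colon-separated fields in each entry are: login name, password placeholder, user-id, group-id, a comment (GECOS) field (empty in the entries above), home directory, and login shell.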
3. Home directory: Every new user has a home directory defined for him. This is the default login directory. Usually it is defined in the run command files.

4. Working set-up: The system administrators prepare .login and .profile files to help users obtain an initial set-up for login. The administrator may also prepare .cshrc, .xinitrc, .mailrc and .ircrc files. In Section 19.4 we shall see how these files may be helpful in customizing a user's working environment. A natural point of curiosity would be: what happens when users log out? Unix systems receive signals when users log out. Recall, in Section 19.2 we mentioned that a user logs in under a login process initiated by the getty process. The getty process identifies the terminal being used. So when a user logs out, the getty process which was running to communicate with that terminal is first killed. A new getty process is then launched to enable yet another prospective login from that terminal. The working set-up is completely determined by the startup files. These are basically .rc (run command) files. These files help to customize the user's working environment. For instance, a user's .cshrc file shall have a path variable which defines the access to various Unix built-in shell commands, utilities, libraries, etc. In fact, many other shell environment variables like HOME, SHELL, MAIL, TZ (the time zone) are set up automatically. In addition, the .rc files define the access to network services or some need-based access to certain licensed software or databases as well. To that extent the .rc files help to customize the user's working environment. We shall discuss the role of run command files later in Section 19.4.

5. Group-id: The user login name is the user-id. Under Unix the access privileges are determined by the group a user belongs to. So a user is assigned a group-id. It is possible to obtain the id information by using the id command as shown below:

[bhatt@iiitbsun OS]$ id
uid=1007(bhatt) gid=1(other)
[bhatt@iiitbsun OS]$

6. Disk quota: Usually a certain amount of disk space is allocated by default. In cases where the situation so warrants, a user may seek additional disk space. A user may interrogate the disk space available at any time by using the df command. Its usage is shown below:
df [options] [name] : to know the free disk space,

where name refers to a mounted file system, local or remote. We may specify a directory if we need the information about that directory. The following options may help with additional information:

-l : for local file systems
-t : reports total number of allocated blocks and i-nodes on the device
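For instance, free space on a local file system could be checked with (the path is illustrative only):

df -l /export/home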
The Unix command du reports the number of disk blocks occupied by a file. Its usage is shown below:

du [options] [name] ...

where name is a directory or a file; name by default refers to the current directory. The following options may help with additional information:

-a : produce an output line for each file
-s : report only the total usage for each name that is a directory, i.e. not for individual files
-r : produce messages for files that cannot be read or opened

7. Network services: Usually a user shall get a mail account. We will discuss the role of the .mailrc file in this context in Section 19.4. The user gets access to Web services too.

8. Default terminal settings: Usually vt100 is the default terminal setting. One can attempt alternate terminal settings using tset, stty, tput, tabs with the control sequences defined in terminfo and termcap, with details recorded in /etc/ttytype or /etc/tty files and in the shell variable TERM. Many of these details are discussed in Section 19.5.1, which specifically deals with terminal settings. The reader is encouraged to look up that section for details.

Once an account has been opened the user may do the following:
1. Change the password for access to one of his liking.
2. Customize many of the run command files to suit his needs.

Closing a user account: Here again the password file plays a role. Recall in Section 19.1 we saw that the /etc/passwd file has all the information about the users' home directory, password, shell, user and group-id, etc. When a user's account is to be deleted, all of this information needs to be erased. System administrators login as root and delete the user entry from the password file to delete the account.
The .rc Files

Usually system administration offers a set of start-up run command files to a new user. These are files that appear as .rc files. These may be .profile, .login, .cshrc, .bashrc, .xinitrc, .mailrc, .ircrc, etc. The choice depends upon the nature of the login shell. Typical allocations may be as follows:

Bourne or Korn shell: .profile
C-Shell: .login, .cshrc
BASH: .bashrc
TCSH: .tcshrc

BASH is referred to as the Bourne-again shell. TCSH is an advanced C-Shell with many shortcuts, like pressing a tab to complete a partial string to the extent it can be covered unambiguously. For us it is important to understand what it is that these files facilitate.

Role of .login and .profile files: The basic role of these files is to set up the environment for a user. These may include the following set-ups.

Set up the terminal characteristics: Usually, the set-up may include the terminal type, and character settings for the prompt, erase, etc.
Set up editors: It may set up a default editor or some specific editor like emacs.
Set up protection mode: This file may set up umask, which stands for the user mask. umask determines access rights to files.
Set up environment variables: This file may set up the path variable. The path variable defines the sequence in which directories are searched for locating the commands and utilities of the operating system.
Set up some customization variables: Usually, these help to limit things like selecting icons for mail or capping the core dump size at a maximum value. It may be used for setting up the limit on the scope of the command history, or some other preferences.

A typical .login file may have the following entries:

# A typical .login file
umask 022
setenv PATH /usr/ucb:/usr/bin:/usr/sbin:/usr/local/bin
setenv PRINTER labprinter
setenv EDITOR vi
biff y
set prompt=`hostname`"=>"

The meanings of the lines above should be obvious from the explanation we advanced earlier. Next we describe .cshrc files, and the readers should note the commonalities between these definitions of initialisation files.

The .cshrc file: The C-shell makes a few features available over the Bourne shell. For instance, it is common to define aliases in .cshrc files for very frequently used commands, like gh for ghostview and cl for clear. Below we give some typical entries for a .cshrc file, in addition to the many we saw in the .login file in this section:

if (! $?TERM) setenv TERM unknown
if ("$TERM" == "unknown" || "$TERM" == "network") then
    echo -n 'TERM? [vt100]: '; set ttype=($<)
    if ("$ttype" == "") set ttype="vt100"
    if ("$ttype" == "pc") then
        set ttype="vt100"
    endif
    setenv TERM $ttype
endif
alias cl clear
alias gh ghostview
set history = 50
set nobeep

Note that in the first few lines of the script, the system identifies the nature of the terminal and sets it to operate as vt100. It is highly recommended that the reader examine and walk through the initialization scripts which the system administration provides. Also, customization of these files entails that, as a user, we must look up these files and modify them to suit our needs. There are two more files of interest. One corresponds to regulating the mail and the other controls the screen display. These are respectively initialized through .mailrc and .xinitrc. We discussed the latter in the chapter on X Windows. We shall discuss the settings in the .mailrc file in the context of the mail system.
The mail system: .mailrc file: From the viewpoint of the user's host machine, the mail program truly acts as the main anchor for our internet-based communication. The Unix sendmail program, together with the uu class of programs, forms the very basis of mail under Unix. Essentially, the mail system has the following characteristics:

1. The mail system is a store and forward system.
2. Mail is picked up from the mail server periodically. The mail daemon picks up the mail, running as a background process.
3. Mail is sent by the sendmail program under Unix.
4. The uu class of programs, like uucp or Unix-to-Unix copy, have provided the basis for developing the mail tools. In fact, the file attachment facility is an example of it.

On a Unix system it is possible to invoke the mail program from an auto-login or .cshrc program. Every Unix user has a mailbox entry in the /usr/spool/mail directory. Each person's mail box is named after his own username. In Table 19.1 we briefly review some very useful mail commands and the wild cards used with these commands. We next give some very useful commands which help users to manage their mail efficiently. Various command options for mail:

d:r : delete all read messages.
d:usenet : delete all messages with usenet in the body.
p:r : print all read messages.
p:bhatt : print all from user "bhatt".
During the time a user is composing a mail, the mail system tools usually offer a facility to escape to a shell. This can be very useful when large files need to be edited alongside the mail being sent. These use ~ commands with the interpretations shown below:

~! : escape to shell
~d : include dead.letter
~h : edit header field

The mail system provides a command line interface to facilitate mail operations using some of the following commands. For instance, every user has a default mail box called mbox. If one wishes to give a different name to the mailbox, he may choose a new name for it. Other facilities allow a mail to be composed with, or without, a subject, or to see the progress of the mail as it gets processed. We show some of these options and their usage with the mail command below.

mail -s greetings [email protected]

-s : option used to send a mail with subject.
-v : the verbose option; it shows the mail's progress.
-f mailbox : option allows the user to name a new mail box.

mail -f newm : where newm may be the new mail box which a user may opt for in place of mbox (the default).

Next we describe some of the options that often appear inside .mailrc user files. Generally, with these options we may have aliases (nick-names) in place of the full mail address. One may also set or unset some flags as shown in the example below:
Various options for the .mailrc file: In Table 19.2, we offer a brief explanation of the options which may be set initially in .mailrc files. In addition, in using the mail system the following additional facilities could be utilized:

1. To subscribe to [email protected], the body of the message should contain "subscribe", the group to subscribe to, and the subscriber's e-mail address, as shown in the following example: subscribe allmusic [email protected].
2. To unsubscribe, use logout allmusic.

In addition to the above there are vacation programs which send mails automatically when the receiver is on vacation. Mails may also be encrypted. For instance, one may use pretty good privacy (PGP) for encrypting mails.

Facilitating chat with the .ircrc file: System administrators may prepare terminals and offer Inter Relay Chat or IRC facility as well. IRC enables real-time conversation with one or more persons who may be scattered anywhere globally. IRC is a multi-user system. To use Unix-based IRC versions, one may have to set the terminal emulation to vt100, either from the keyboard or from an auto-login file such as .login in /bin/sh or .cshrc in /bin/csh:

$ set TERM=vt100
$ stty erase "^h"
The most common way to use the IRC system is to make a telnet call to the IRC server. There are many IRC servers. Some servers require specification of a port number, as in irc.ibmpcug.co.uk 9999. When one first accesses the IRC server, many channels are presented. A channel may be taken as a discussion area, and one may choose a channel for an online chat (like switching a channel on TV). IRC requires setting up an .ircrc file. Below we give some sample entries for an .ircrc file. The .ircrc files may also set internal variables.

/COMMENT .....
/NICK <nn>
/JOIN <ch>

IRC commands begin with a "/" character. In Table 19.3, we give a few of the commands for IRC with their interpretations. IRC usually supports a range of channels. Listed below are a few of the channel types:

Limbo or Null
Public
Private
Secret
Moderated
Limited
Topic limited
Invite only
Message disabled

The above channel types are realized by using a mode command. The modes are set (with +) or unset (with -) as follows; the options have the interpretations shown in Table 19.4 (various options for channels):

/MODE <channel> +<channel options> <parameters>
/MODE <channel> -<channel options> <parameters>

Sourcing Files

As we have described above, the .rc files help to provide adequate support for a variety of services. Suppose we are logged into a system and seek a service that requires a change in one of the .rc files. We may edit the corresponding file. However, to effect the changed behavior we must source the file. Basically, we need to execute the source command with the file name as argument, as shown below where we source the .cshrc file:

source .cshrc

Device Management and Services

Technically the system administrator is responsible for every device, for all of its usage and operation. In particular, the administrator looks after its installation, upgrade, configuration, scheduling, and allocating quotas to service the user community. We shall, however, restrict ourselves to the following three services:

1. Terminal-based services, discussed in Section 19.5.1
2. Printer services, discussed in Section 19.5.2
3. Disk space and file services, discussed in Section 19.5.3

We shall begin with the terminal settings and related issues.

The Terminal Settings
In the context of terminal settings the following three things are important:

1. Unix recognizes terminals as special files.
2. Terminals operate on serial lines. Unix has a way to deal with files that are essentially using serial communication lines.
3. The terminals have a variety of settings available. This is so even while the protocols of communication for all of them are similar.

From the point of view of terminal services provisioning and system configuration, system administration must bear the above three factors in mind. Unix maintains all terminal related information in tty files in the /dev directory. These files are special files which adhere to the protocols of communication with serial lines. This includes those terminals that use modems for communication. Some systems may have a special file for the console, like /dev/console, which can be monitored for messages as explained in the chapter on X-Windows. Depending upon the terminal type, a serial line control protocol is used which can interrogate or activate appropriate pins on the hardware interface plug.
The following brief session shows how a terminal may be identified on a host:

login: bhatt
Password:
Last login: Tue Nov 5 00:25:21 from 203.197.175.174
[bhatt@iiitbsun bhatt]$ hostname
iiitbsun
[bhatt@iiitbsun bhatt]$ tty
/dev/pts/1
[bhatt@iiitbsun bhatt]$

termcap and terminfo files: The termcap and terminfo files, in the directory /etc or in /usr/share/lib/terminfo, provide the terminal database, information and programs for use in the Unix environment. The database includes programs that may have been compiled to elicit services from a specific terminal which may be installed. The program that controls the usage of a specific terminal is identified in the environment variable TERM, as shown in the example below:

[bhatt@localhost DFT02]$ echo $TERM
xterm
[bhatt@localhost DFT02]$
There are specific commands like tic, short for terminal information compilation. Also, there are programs that convert termcap to terminfo whenever required. For detailed discussions on terminal characteristics and how to exploit various features the reader may refer to [2]. We shall, however, elaborate on two specific commands here: the tset and stty commands.

1. tset command: The tset command is used to initialize a terminal. Usually, the command sets up initial settings for characters like erase, kill, etc. Below we show how under C-Shell one may use the tset command:

$ setenv TERM `tset - -Q -m ":?vt100"`

Sometimes one may prepare a temporary file and source it.

2. stty command: We briefly encountered the stty command in Section 19.2. Here we shall elaborate on the stty command in the context of the options and values which may be availed of by using it. In Table 19.5 we list a few of the options with their corresponding values. There are many other options; Table 19.5 is only a sample of those available. Try the command stty -a to see the options for your terminal. Below is shown the setting on my terminal:

[bhatt@localhost DFT02]$ stty -a
speed 38400 baud; rows 24; columns 80; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = M-^?; eol2 = M-^?;
start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W; lnext = ^V; flush = ^O;
min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
-iuclc ixany imaxbel
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke
[bhatt@localhost DFT02]$

Lastly, we discuss how to attach a new terminal. Basically we need to connect the terminal and then set up the entries in termcap and/or terminfo and the configuration files. Sometimes one may have to look at /etc/inittab or /etc/ttydefs as well. It helps to reboot the system on some occasions to ensure proper initialization following a set-up attempt.

Printer Services

Users obtain print services through a printer daemon. The system arranges to offer print services by spooling print jobs in a spooling directory. It also has a mechanism to service the print requests from the spooling directory. In addition, system administrators need to be familiar with commands which help in monitoring the printer usage. We shall begin with a description of the printcap file.

The printcap file: Unix systems have their print services offered using a spooling system. The spooling system recognizes print devices that are identified in the /etc/printcap file. The printcap file serves not only as a database, but also as a configuration file. Below we see the printcap file on my machine:

# /etc/printcap
#
# DO NOT EDIT! MANUAL CHANGES WILL BE LOST!
# This file is autogenerated by printconf-backend during lpd init.
#
# Hand edited changes can be put in /etc/printcap.local, and will be included.
iiitb:\
    :sh:\
    :ml=0:\
    :mx=0:\
    :sd=/var/spool/lpd/iiitb:\
    :lp=|/usr/share/printconf/jetdirectprint:\
    :lpd_bounce=true:\
    :if=/usr/share/printconf/mf_wrapper:

The printcap file is a read-only file, except that it can be edited by the superuser (root). The entries in printcap files can be explained using Table 19.6 (the printcap file: printer characteristics). With the file description and the table we can see that the spooling directory for our printer, with printer name iiitb, is at /var/spool/lpd/iiitb. Also note we have no limit on the file size which can be printed.
Printer spooling directory: As we explained earlier, print requests get spooled first. Subsequently, the printer daemon lpd honours the print requests. To achieve this, one may employ a two-layered design. Viewing it bottom-up: at the bottom layer, maintain a separate spooling directory for each of the printers, so that when we attach a new printer, we must create a new spooling directory for it; at the top level, have a spooling process which receives each print request and finally spools it for the printer(s). Note that the spool process is owned by the daemon group.

Printer monitoring commands: The printer commands help to monitor both the health of the services as well as the work in progress. In Table 19.7 we elaborate on the commands and their interpretations. To add a printer one may use the lpadmin tool. Some of the system administration practices are best learned by assisting experienced system administrators; they can rarely be taught through a textbook.
Disk space allocation and management

In this section we shall discuss how a system administrator manages the disk space. We would also like the reader to refer to Section 2.7.1, where we stated that at the time of formatting, partitions of the disk get defined. The partitions may be physical or logical. In the case of a physical partition we have the file system resident within one disk drive. In the case of a logical partition, the file system may extend over several drives. In either of these cases the following issues are at stake:

1. Disk file system: In Chapter 2 we indicated that system files are resident in the root file system. Similarly, the user information is maintained in the home file system created by the administrator. Usually, a physical disk drive may have one or more file systems resident on it. As an example, consider the mapping shown in Figure 19.1 (mapping file systems on physical drives; the names of the file systems are shown in bold letters). We notice that there are three physical drives with mappings of root and other file systems. Note that the disk drive with the root file system co-locates the var file system on the same drive. Also, the file system home extends over two drives. This is possible by appropriate assignment of the disk partitions to various file systems. Of course, system programmers follow some method in both partitioning and allocating the partitions. Recall that each file system maintains some data about each of the files within it. System administrators have to reallocate the file systems when new disks become available, or when some disk suffers damage to sectors or tracks which may no longer be available.
2. Mounting and unmounting: The file systems keep the files in a directory structure which is essentially a tree. So a new file system can be created by specifying the point of mount in the directory tree. A typical mount instruction has
the following format:

mount a-block-special-file point-of-mount

Corresponding to the mount instruction, there is also an instruction to unmount; in Unix it is umount, with the same format as mount. In Unix, every time we have a new disk added, it is mounted at a suitable point of mount in the directory tree, with the mount instruction used exactly as explained. Of course, the disk is assumed to be formatted.

3. Disk quota: Disk quota can be allocated by reconfiguring the file system table, usually located at /etc/fstab. To extend the allocation quota in a file system we first have to modify the corresponding entry in the /etc/fstab file. The system administration can set hard or soft limits on user quota. If a hard limit has been set, then the user simply cannot exceed the allocated space. However, if a soft limit is set, then the user is cautioned when he approaches the soft limit. Usually, it is expected that the user will resort to purging files no longer in use; else he may seek additional disk space. Some systems have quota set at the group level. It may also be possible to set quota for individual users. Both these situations require executing an edit quota instruction with the user name or group name as the argument. The format of the edquota instruction is shown below:

edquota user-name

4. Integrity of file systems: Due to the dynamics of temporary allocations and moving files around, the integrity of a file system may get compromised. The following are some of the ways the integrity is lost:

Lost files. This may happen because a user has opened the same file from multiple windows and edited it.
A block may be marked free but may be in use.
A block may be marked in use but may be free.
The link counts may not be correct.
The data in the file system table and the actual files may differ.
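For instance, a newly formatted partition might be attached at an empty directory and later detached as follows (the device name and mount point here are purely illustrative):

mount /dev/dsk/c0t3d0s5 /home/newdisk
umount /home/newdisk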
The integrity of the file system is checked using the fsck instruction. The argument to the command is the file system which we need to check, as shown below:

fsck file-system-to-be-checked

On rebooting the system these checks are mandatory and routinely performed. Consequently, the consistency of the file system is immediately restored on rebooting.

5. Access control: As explained earlier in this chapter, when an account is opened, a user is allocated a group. The group determines the access. It is also possible to offer an initial set-up that will allow access to special (licensed) software like the matlab suite of software.

6. Periodic back-up: Every good administrator follows a regular back-up procedure so that in case of a severe breakdown, at least a stable previous state can be achieved.

After-Word

In this module we have listed many tasks which system administrators are required to perform. However, as we remarked earlier, the best lessons in system administration are
learned under the tutelage of a very experienced system administrator. There is no substitute for "hands-on" learning.