UNIX From
Useful Commands
pwd
To see where we are, we can print the working directory, which returns the path of the
directory in which we currently reside:
$ pwd
cd
ls
With no arguments, ls lists the files and directories in the current working directory (cwd).
If we pass directories to the command as arguments, it will list their contents. Here are some
common flags.
Display our files in a column:
$ ls -1 # list vertically with one line per item
List in long form, showing file permissions, the owner of the file, the group the file
belongs to, the date the file was last modified, and the file size:
$ ls -l # long form
List in human-readable (bytes will be rounded to kilobytes, gigabytes, etc.) long form:
$ ls -hl # long form, human readable
List in human-readable long form sorted by the time the file was last modified:
$ ls -hlt # long form, human readable, sorted by time
List in long form all files in the directory, including dotfiles:
$ ls -al # list all, including dot- files and dirs.
Note that you can pass an arbitrary number of arguments and that bash uses the convention
that an asterisk matches anything. For example, to list only files with the .txt extension:
$ ls *.txt
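The expansion is done by the shell before ls ever runs, which a small sketch can make visible. The file names below are made up for illustration, and the commands run in a scratch directory:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch notes.txt data.txt image.png

# The shell replaces *.txt with the matching names, so ls receives two arguments.
txt_files=$(ls *.txt)
echo "$txt_files"

# echo shows the same expansion without involving ls at all.
expanded=$(echo *.txt)
echo "$expanded"
```

Only the two .txt files are listed; image.png never reaches ls.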
mkdir, rmdir
To make a directory (i.e., create a new folder), we use:
$ mkdir mynewfolder
To make nested directories (without complaining if a directory already exists), use the -p
flag:
$ mkdir -p a/b/c # make nested directories
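A minimal sketch of what the -p flag changes, assuming a scratch directory created with mktemp:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a, a/b, and a/b/c in one step.
mkdir -p a/b/c

# Repeating the command is not an error, because -p tolerates existing directories.
if mkdir -p a/b/c; then second_run="ok"; else second_run="error"; fi

# Without -p, mkdir fails when the directory already exists.
if mkdir a 2>/dev/null; then plain_run="ok"; else plain_run="error"; fi

echo "$second_run $plain_run"
```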
You can remove a directory using:
$ rmdir mynewfolder # don't use this
However, since the directory must be empty to use this command, it's not convenient.
Instead, use:
$ rm -r mynewfolder # use this
Plain rm without -r refuses to delete a directory, whether or not it is empty, so the -r flag
is required. Since rm has all the functionality you need, I know few people who actually use
rmdir. About the only occasion to use it is if you want to be careful you're not deleting a
directory with stuff in it.
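The difference can be sketched in a scratch directory. The names demo_empty and demo_full are hypothetical, chosen only for this illustration:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir demo_empty
mkdir -p demo_full/sub
touch demo_full/sub/file.txt

# rmdir succeeds only on an empty directory.
if rmdir demo_empty; then empty_status="removed"; else empty_status="refused"; fi

# rmdir refuses a non-empty directory and leaves it untouched.
if rmdir demo_full 2>/dev/null; then full_status="removed"; else full_status="refused"; fi

# rm -r removes the directory and everything under it.
rm -r demo_full
if [ -e demo_full ]; then after_rm="still there"; else after_rm="gone"; fi

echo "$empty_status $full_status $after_rm"
```

This is the safety net mentioned above: rmdir cannot destroy a directory that still has contents.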
echo
echo prints the string passed to it as an argument. For example:
$ echo joe
joe
Suppress the trailing newline with the -n flag:
$ echo -n joe
If you want to print a string with spaces, use quotes. You should also
be aware of how bash treats double vs single quotes. If you use double quotes, any variable
inside them will be expanded (the same as in Perl). If you use single quotes, everything is
taken literally and variables are not expanded. Here's an example:
$ var=5
$ joe=hello $var
-bash: 5: command not found
That didn't work because we forgot the quotes. Let's fix it:
$ joe="hello $var"
$ echo $joe
hello 5
$ joe='hello $var'
$ echo $joe
hello $var
There are two variations on cat, which occasionally come in handy. zcat allows you to
cat zipped files:
$ zcat file.txt.gz
You can also see a file in reverse order, bottom first, with tac (tac is cat spelled
backwards).
cp
The command to make a copy of a file is cp:
$ cp file1 file2
Use the recursive flag, -R, to copy directories plus files:
$ cp -R dir1 dir2 # copy directories
The directory and everything inside it are copied. Question: what would the following do?
$ cp -R dir1 ../../
Answer: it would make a copy of dir1 up two levels from our current working directory.
Tip: If you're copying a large directory structure with lots of files in it, use rsync instead
of cp. If the command fails midway through, rsync can start from where it left off but cp
can't.
mv
To rename a file or directory we use mv:
$ mv file1 file2
In a sense, this command also moves files, because we can rename a file into a different
path. For example:
$ mv file1 dir1/dir2/file2
would move file1 into dir1/dir2/ and change its name to file2, while:
$ mv file1 dir1/dir2/
would simply move file1 into dir1/dir2/ (or, if you like, rename ./file1 as ./dir1/dir2/file1).
Swap the names of two files, a and b:
$ mv a a.1
$ mv b a
$ mv a.1 b
Change the extension of a file, test.txt, from .txt to .html:
$ mv test.txt test.html
Shortcut:
$ mv test.{txt,html}
mv can be dangerous because, if you move a file into a directory where a file of the same
name exists, the latter will be overwritten. To prevent this, use the -n flag:
$ mv -n myfile mydir/ # move, unless "myfile" exists in "mydir"
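A small sketch of the overwrite risk and of what -n prevents, assuming a scratch directory and two files whose contents ("new" and "old") are made up so the result is easy to see:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir mydir
echo "new" > myfile
echo "old" > mydir/myfile

# With -n (no-clobber) the existing mydir/myfile is left alone
# and myfile stays where it was.
mv -n myfile mydir/
kept=$(cat mydir/myfile)

# Without -n the destination is silently overwritten.
mv myfile mydir/
overwritten=$(cat mydir/myfile)

echo "$kept $overwritten"
```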
rm
The command rm removes the files you pass to it as arguments:
$ touch file # create an empty file
$ rm file # removes a file
Use the recursive flag, -r, to remove a file or a directory:
$ rm -r dir # removes a file or directory
If there's a permission issue or the file doesn't exist, rm will throw an error. You can
override this with the force flag, -f:
$ rm -rf dir # force removal of a file or directory
# (i.e., ignore warnings)
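You may be aware that when you delete files, they can still be recovered with effort if your
computer hasn't overwritten their contents. To securely delete your files, meaning
overwrite them before deleting, some BSD versions of rm offer the -P flag. It is not portable:
GNU rm on Linux does not have it, and some newer BSD and macOS versions accept it but do
nothing:
$ rm -P file # overwrite your file then delete (where supported)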
man
man shows the usage manual, or help page, for a command. For example, to see the
manual for ls:
$ man ls
To see man's own manual page:
$ man man
head, tail
Based off of An Introduction to Unix - head and tail: head and tail print the first or last n
lines of a file, where n is 10 by default. For example:
$ head myfile.txt # print the first 10 lines of the file
$ head -1 myfile.txt # print the first line of the file
$ head -50 myfile.txt # print the first 50 lines of the file
$ tail myfile.txt # print the last 10 lines of the file
$ tail -1 myfile.txt # print the last line of the file
$ tail -50 myfile.txt # print the last 50 lines of the file
These are great alternatives to cat, because often you don't want to spew out a giant file.
You only want to peek at it to see the formatting, get a sense of how it looks, or home in on
some specific portion. If you combine head and tail together in the same command
chain, you can get a specific row of your file by row number. For example, print row 37:
$ cat -n file.txt | head -37 | tail -1 # print row 37
If you pass multiple files to head or tail, each file's output is preceded by its name:
$ head -2 hello.txt kitty.txt
==> hello.txt <==
hello
hello
==> kitty.txt <==
kitty
kitty
which is useful if you're previewing many files.
To preview all files in the current directory:
$ head *
See the last 10 lines of your bash history:
$ history | tail # show the last 10 lines of history
See the first 10 elements in the cwd:
$ ls | head
grep, egrep
grep is the terminal's analog of find from ordinary computing (not to be confused with unix
find). If you've ever used Safari, TextEdit, or Microsoft Word, you know you can find a word
with ⌘F (Command-f) on a Macintosh. Similarly, grep searches for text in a file and returns
the line(s) where it finds a match. For example, if you were searching for the word apple in
your file, you'd do:
$ grep apple myfile.txt # return lines of file with the text apple
grep has many nice flags, such as:
$ grep -n apple myfile.txt # include the line number
$ grep -i apple myfile.txt # case insensitive matching
$ grep --color apple myfile.txt # color the matching text
Also useful are what I call the ABCs of Grep—that's After, Before, Context. Here's what they
do:
$ grep -A1 apple myfile.txt # return lines with the match,
# as well as 1 after
$ grep -B2 apple myfile.txt # return lines with the match,
# as well as 2 before
$ grep -C3 apple myfile.txt # return lines with the match,
# as well as 3 before and after.
You can do an inverse grep with the -v flag. Find lines that don't contain apple:
$ grep -v apple myfile.txt # return lines that don't contain apple
Find any occurrence of apple in any file in a directory with the recursive flag:
$ grep -R apple mydirectory/ # search for apple in any file in mydirectory
grep works nicely in combination with history:
$ history | grep apple # find commands in history containing "apple"
Exit after, say, finding the first two instances of apple so you don't waste more time
searching:
$ cat myfile.txt | grep -m 2 apple
To see only the matching text, use -o:
$ grep -o "the" abc.txt
Display the number of matching lines:
$ grep -c "unix" abc.txt
Display the names of the files that match the pattern:
$ grep -l "unix" *
or
$ grep -l "unix" f1.txt f2.txt f3.txt
There are more powerful variants of grep, like egrep, which permits the use of extended
regular expressions, as in:
$ egrep "apple|orange" myfile.txt # return lines with apple OR orange
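On current systems egrep is a legacy name for grep -E, so the same search can be sketched with the -E flag. The sample file contents below are invented for illustration:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'apple pie\nbanana split\norange juice\n' > myfile.txt

# -E enables extended regular expressions, so | means OR.
matches=$(grep -E "apple|orange" myfile.txt)
echo "$matches"

# Adding -c counts the matching lines instead of printing them.
count=$(grep -Ec "apple|orange" myfile.txt)
echo "$count"
```

The banana line is the only one left out, so the count is 2.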
which
which shows you the path of a command in your PATH. For example, on my computer:
$ which less
/usr/bin/less
$ which cat
/bin/cat
$ which rm
/bin/rm
If there is more than one of the same command in your PATH, which will show you the one
which you're using (i.e., the first one in your PATH). Suppose your PATH is the following:
$ echo $PATH
/home/username/mydir:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin
And suppose myscript.py exists in 2 locations in your PATH:
$ ls /home/username/mydir/myscript.py
/home/username/mydir/myscript.py
$ ls /usr/local/bin/myscript.py
/usr/local/bin/myscript.py
Then:
$ which myscript.py
/home/username/mydir/myscript.py # this copy of the script has precedence
chmod
chmod adds or removes permissions from files or directories. In unix there are three
spheres of permissions:
u - user
g - group
o - other/world
Everyone with an account on the computer is a unique user (see whoami) and, although
you may not realize it, can be part of various groups, such as a particular lab within a
university or a team in a company. There are also three types of permission:
r - read
w - write
x - execute
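The letters combine in a chmod call as sphere, then + or - (or =), then permission type. A short sketch, assuming a scratch file named script.sh (a hypothetical name used only here):

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch script.sh

# Start from a known state: user read/write, nothing for group or other.
chmod u=rw,go= script.sh

# Add execute permission for the user only.
chmod u+x script.sh

# Add read permission for group and other.
chmod go+r script.sh

# Remove write permission from the user.
chmod u-w script.sh

# The first ten characters of ls -l show the result: -r-xr--r--
perms=$(ls -l script.sh | cut -c1-10)
echo "$perms"
```

Reading the result left to right after the leading dash: r-x for the user, r-- for the group, r-- for other.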
clear
clear clears your screen:
$ clear
wc
wc counts the number of words, lines, or characters in a file. Count lines:
$ cat myfile.txt
aaa
bbb
ccc
ddd
$ cat myfile.txt | wc -l
4
Count words:
$ echo -n joe | wc -w
1
Count characters:
$ echo -n joe | wc -c
3
sort
As you guessed, the command sort sorts files. It has a large man page, but we can learn
its basic features by example. Let's suppose we have a file, testsort.txt, such that:
$ cat testsort.txt
vfw 34 awfjo
a 4 2
f 10 10
beb 43 c
f 2 33
f 1 ?
Then:
$ sort testsort.txt
a 4 2
beb 43 c
f 1 ?
f 10 10
f 2 33
vfw 34 awfjo
What happened? The default behavior of sort is to dictionary sort the rows of a file
according to what's in the first column, then second column, and so on. Where the first
column has the same value—f in this example—the values of the second column determine
the order of the rows. Dictionary sort means that things are sorted as they would be in a
dictionary: 1,2,10 gets sorted as 1,10,2.
If you want to do a numerical sort, use the -n flag; if you want to sort in reverse order, use
the -r flag. You can also sort according to a specific column. The notation for this is:
sort -kn,m
where n and m are numbers which refer to the range column n to column m. In practice, it
may be easier to use a single column rather than a range so, for example:
sort -k2,2
means sort by the second column (technically from column 2 to column 2). To sort
numerically by the second column:
$ sort -k2,2n testsort.txt
f 1 ?
f 2 33
a 4 2
f 10 10
vfw 34 awfjo
beb 43 c
As is often the case in unix, we can combine flags as much as we like. Question: what does
this do?
$ sort -k1,1r -k2,2n testsort.txt
vfw 34 awfjo
f 1 ?
f 2 33
f 10 10
beb 43 c
a 4 2
Answer: the file has been sorted first by the first column, in reverse dictionary order, and
then—where the first column is the same—by the second column in numerical order. You
get the point!
Sort uniquely:
$ sort -u testsort.txt # sort uniquely
Sort using a designated tmp directory:
$ sort -T /my/tmp/dir testsort.txt # sort using a designated tmp directory
sort works particularly well with uniq, which collapses only adjacent duplicate lines and
therefore needs sorted input.
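A minimal sketch of the pairing, using a short made-up list of numbers as input:

```shell
# Illustrative input: a short list with repeated values.
numbers=$(printf '3\n1\n3\n2\n1\n3\n')

# Sorting first puts the duplicates next to each other so uniq can collapse them.
unique=$(echo "$numbers" | sort -n | uniq)
echo "$unique"

# uniq -c prefixes each distinct line with its count; awk tidies it into value:count.
counts=$(echo "$numbers" | sort -n | uniq -c | awk '{print $2 ":" $1}')
echo "$counts"
```

Without the sort step, uniq would leave the scattered duplicates in place.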
seq
seq prints a sequence of numbers. Display numbers 1 through 5:
$ seq 1 5
1
2
3
4
5
You can achieve the same result with:
$ echo {1..5} | tr " " "\n"
1
2
3
4
5
If you add a number in the middle of your seq range, this will be the "step":
$ seq 1 2 10
1
3
5
7
9
cut
cut cuts one or more columns from a file, and delimits on tab by default. Suppose a file,
sample.blast.txt, is:
TCONS_00007936|m.162 gi|27151736|ref|NP_006727.2| 100.00 324
TCONS_00007944|m.1236 gi|55749932|ref|NP_001918.3| 99.36 470
TCONS_00007947|m.1326 gi|157785645|ref|NP_005867.3| 91.12 833
TCONS_00007948|m.1358 gi|157785645|ref|NP_005867.3| 91.12 833
Then:
$ cat sample.blast.txt | cut -f2
gi|27151736|ref|NP_006727.2|
gi|55749932|ref|NP_001918.3|
gi|157785645|ref|NP_005867.3|
gi|157785645|ref|NP_005867.3|
although this is long-winded and in this case we can achieve the same result simply with:
paste
paste joins files together in a column-wise fashion. Another way to think about this is in
contrast to cat, which joins files vertically. For example:
$ cat file1.txt
a
b
c
$ cat file2.txt
1
2
3
$ paste file1.txt file2.txt
a 1
b 2
c 3
Paste with a delimiter:
$ paste -d";" file1.txt file2.txt
a;1
b;2
c;3
awk
awk and sed are command line utilities which are themselves programming languages
built for text processing. Both of these languages are almost antiques which have largely
been displaced by Perl and Python. Even so, writing a simple line of awk can be faster
and less hassle than hauling out Perl or Python. The key point about awk is that it works
line by line. A typical awk construction is:
cat file.txt | awk '{ some code }'
Awk executes its code once for every line. Let's say we have a file, test.txt, such that:
$ cat test.txt
1 c
3 c
2 t
1 c
In awk, the notation for the first field is $1, $2 is for second, and so on. The whole line is $0.
For example:
$ cat test.txt | awk '{print}' # print full line
1 c
3 c
2 t
1 c
$ cat test.txt | awk '{print $0}' # print full line
1 c
3 c
2 t
1 c
Awk has a bunch of built-in variables which are handy: NR is the row number; NF is the
total number of fields; and OFS is the output delimiter. There are many more, which you can
read about in the awk manual. Continuing with our very contrived examples, let's see how
these can help us:
$ cat test.txt | awk '{print $1"\t"$2}' # write tab explicitly
1 c
3 c
2 t
1 c
So the first command prints the file as it is. The second command prints the file with the row
number added in front. And the third prints the file with the row number in the first column
and the number of fields in the second—in our case always two.
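The three behaviors just described can be sketched as follows. These commands are a reconstruction, assuming test.txt is the four-line, tab-delimited file shown earlier; they are not necessarily the exact commands the description refers to:

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1\tc\n3\tc\n2\tt\n1\tc\n' > test.txt

# Print the file as it is, letting OFS supply the tab between fields.
as_is=$(awk 'BEGIN{OFS="\t"} {print $1, $2}' test.txt)

# Prefix each row with its row number, NR.
with_nr=$(awk 'BEGIN{OFS="\t"} {print NR, $1, $2}' test.txt)

# Row number in the first column, number of fields (NF) in the second.
nr_nf=$(awk 'BEGIN{OFS="\t"} {print NR, NF}' test.txt)

echo "$with_nr"
echo "$nr_nf"
```

Setting OFS once in a BEGIN block means the commas in print expand to tabs, which avoids writing "\t" between every pair of fields.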
For example, if you wanted to print the 3rd row of your file, you could use:
$ cat test.txt | awk '{if (NR==3) {print $0}}' # print the 3rd row of your file
2 t
We can tell awk to delimit on anything we like by using the -F flag. For instance, let's look at
the following situation:
$ echo "a b" | awk '{print $1}'
a
$ echo "a b" | awk -F"\t" '{print $1}'
a b
When we feed a space b into awk, $1 refers to the first field, a. However, if we explicitly tell
awk to delimit on tabs, then $1 refers to a b because it occurs before a tab. You can also
use shell variables inside your awk by importing them with the -v flag:
$ x=hello
$ cat test.txt | awk -v var="$x" '{ print var"\t"$0 }'
hello 1 c
hello 3 c
hello 2 t
hello 1 c
And you can write to multiple files from inside awk:
$ cat test.txt | awk '{if ($1==1) {print > "file1.txt"} else {print > "file2.txt"}}'
$ cat file1.txt
1 c
1 c
$ cat file2.txt
3 c
2 t
Question: how would you print the row numbers where the first field equals the second
field? Consider this input:
$ echo -e "a\ta\na\tc\na\tz\na\ta"
a a
a c
a z
a a
Here's the answer:
$ echo -e "a\ta\na\tc\na\tz\na\ta" | awk '$1==$2{print NR}'
1
4
How do you print the average of the first column in a text file?
$ cat file.txt | awk 'BEGIN{x=0}{x=x+$1;}END{print x/NR}'
NR is a special variable holding the current row number; in the END block it equals the
total number of rows.
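As a worked example, here is the same calculation on the first column of the test.txt data used earlier (1, 3, 2, 1). This is a sketch; the BEGIN block is dropped because awk treats an unset variable as 0:

```shell
# Sum the first column while reading, then divide by the row count at the end.
avg=$(printf '1 c\n3 c\n2 t\n1 c\n' | awk '{x += $1} END {print x/NR}')
echo "$avg"   # 7 / 4 = 1.75
```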
sed
$ cat test_header.txt
This is a header
1 asdf
2 asdf
2 asdf
1,3 is sed's notation for the range 1 to 3. We can't do much more without entering regular
expression territory. One sed construction is:
/pattern/d
where d stands for delete if the pattern is matched. So to remove lines beginning with #:
$ cat test_comment.txt
1 asdf
# This is a comment
2 asdf
# This is a comment
2 asdf
$ cat test_comment.txt | sed '/^#/d'
1 asdf
2 asdf
2 asdf
Another construction is:
s/A/B/
where s stands for substitute. So this means replace A with B. By default, this only works for
the first occurrence of A on each line, but if you put a g at the end, for global, all As are
replaced:
s/A/B/g
For example:
$ # replace 1st occurrence of kitty with X
$ echo "hello kitty. goodbye kitty" | sed 's/kitty/X/'
hello X. goodbye kitty
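For comparison, a short sketch running the same substitution with and without the g flag on the same input string:

```shell
# Without g, only the first match on the line is replaced.
first=$(echo "hello kitty. goodbye kitty" | sed 's/kitty/X/')

# With g, every match on the line is replaced.
all=$(echo "hello kitty. goodbye kitty" | sed 's/kitty/X/g')

echo "$first"
echo "$all"
```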
Sed has the ability to edit files in place with the -i flag, that is to say, modify the file without
going through the trouble of creating a new file and doing the re-naming dance. For
example, to add the line This is a header to the top of myfile.txt (GNU sed syntax):
$ sed -i '1i This is a header' myfile.txt