
MSC 1090 Lecture 1

This document provides an introduction to a computational biostatistics class taught using R. It discusses that the class will cover topics like the Linux command line, R, statistics, and visualization over 12 weeks. It notes most assignments will be done via the command line, which can be more efficient for large data analysis tasks compared to a graphical user interface. The document outlines expectations for the class website, assignments, and conduct. Today's class will provide an introduction to the file system and manipulating files from the Linux command line.


Introduction to Computational BioStatistics with R:

Linux command line I

Erik Spence

SciNet HPC Consortium

13 September 2022

Erik Spence (SciNet HPC Consortium) Linux Command Line I 13 September 2022 1 / 27
Today’s slides

To find today’s slides, go to the "Introduction to Computational BioStatistics with R" page,
under "Lectures", "Intro to Linux Shell I".

https://education.scinet.utoronto.ca

You can also access the class web site directly, here:

https://scinet.courses/1246

Who are we?

We are Erik Spence and Alexey Fedoseev.
We are Applications Analysts at SciNet (https://www.scinethpc.ca).
SciNet is a High-Performance-Computing (HPC) consortium, one of six in Canada, run by
the University of Toronto.
These consortia run massively parallel computers, with tens of thousands of cores, to
perform computations that couldn’t be done otherwise.
Our job at SciNet is to help users get their code to run on these machines.
We also educate users on how to write fast, efficient code.

About this class

Some notes about this class:

This class aims to be a graduate course on data analysis and research computing, using R.
We will meet for twelve weeks, two lectures per week, on Tuesdays and Thursdays,
starting September 13, from 9:00am to 10:30am.
The class is scheduled to last 1.5 hours. This allows for classes running long, and for
answering questions after class, taking up assignments, etc.
Class is held in GB244 on Tuesdays and BL205 on Thursdays.
This class can be taken for graduate credit by students in IMS, and other departments
(MSC1090H).

About this class, continued
Some notes about this class:

There will be 10, approximately-weekly, homework assignments, usually assigned on
Thursdays, due one week later at midnight, and worth 100% of your final mark.
Late assignments will be accepted until one week after the deadline (at 9:00am), with a
penalty of 0.5 points per day (out of 10).
The assignments are submitted through the class web site.
Office hours will be held on Tuesdays from 11:00am - 12:30pm, at the SciNet offices (661
University Ave., Suite 1140).
Please, please ask for help if you need it!
  - Post questions to the class forum.
  - Talk to us, or email us if you have questions: [email protected].

Ask questions!

SciNet certificates
In addition to official UofT class credit, SciNet also offers its own certificates.
We offer certificates in High Performance Computing, Scientific Computing and Data
Science.
Each certificate requires 36 SciNet credits; specific classes qualify for specific certificates.
This class qualifies for 28 credits toward the Data Science certificate, and 8 credits toward
the Scientific Computing certificate.
Visit the SciNet education website to see what other courses are available.

https://scinet.courses

Class expectations
Some details about the class:
Prerequisites: minimal-to-no programming experience is sufficient. The goal is to start
slowly so all will be on the same page.
Software you will need:
  - a terminal program (needed immediately).
      - On Windows, we recommend "git bash", which contains both the terminal and "git", which will be needed later.
      - If you’re using the Windows Subsystem for Linux (WSL), the terminal will be built in.
      - On a Mac you may use "Terminal".
  - a text editor (needed Thursday): Atom, Brackets, Sublime.
  - R, and various R libraries (needed by week 2).
  - git, for version control (needed by week 4?).
Grading scheme: the final grade will be based on the homework assignments (100%).
Attendance is not mandatory, though encouraged. If you don’t attend, listen to the
recordings! The slides do NOT constitute all of the class material!

Class website

https://scinet.courses
Log in with your SciNet account, or temporary account.
Click on "Introduction to Computational BioStatistics with R".
Let us know if you do not yet have an account.
Student Code of Conduct

Some details about doing the assignments:

You are welcome to discuss your assignments with each other.


You are not welcome to copy each other’s code.
You are not welcome to copy code you find on the internet, without giving credit.

http://tinyurl.com/UofTCodeOfConduct

Class topics
Our adventure in data analysis will cover the following:
Getting started with the Linux command line.
Getting started with R.
Vectors, arrays, data frames.
Version control, modular programming.
Statistics and machine learning.
Visualization.
Other topics.

This list is subject to change. If there’s a particular topic that you’d like covered, let us know.

Before we start

This class is intended to be fairly interactive. Eventually you will need R on your computers.
This week we will be using the Linux command line. Hopefully you’ve already got a terminal
program installed.

Windows users: we strongly recommend downloading "git bash" or using the Windows
Subsystem for Linux (WSL).

https://git-scm.com/downloads

As mentioned, R will be needed, but not just yet. Please install it at your convenience.

Today’s class

Today we will visit the following topics:


Motivation for using the command line.
The file system from the command line.
Manipulating files from the command line.

The point of today’s class is to give you a first taste of the Linux command line. Please stop
me if you have a question.

The Truth about interfaces

Why are we looking at the command line interface?


Nobody. Nobody. Nobody, uses a Graphical User Interface (GUI) for HPC
(High-Performance Computing). Nobody.
And this includes repetitive or large-scale data analysis, and the majority of programming
environments.
Who cares? Well, if you’re going to do repetitive data analysis it’s possible you might
need to use HPC to get it done.
Even if you don’t, knowing how to use this infrastructure will allow you to be significantly
more efficient, consistent and productive in the management and analysis of your data.

GUIs versus the command line
Graphical User Interfaces (GUIs) have many strengths.
Very good at using existing functionality, existing controls.
Programs tend to have lots of functionality built into them, but can only do what they’ve
been programmed to do.
Can’t save a series of commands to replicate functionality.
Easy to learn. Hard to use for big tasks.

The Command Line Interface (CLI) has a different approach.


A blank canvas; you get to program what you want to do.
Good at creating new things.
Commands that do already exist are very good at doing one thing.
Commands that you create can be saved and re-used.
Hard to learn. Easy to use for big tasks.
Why are we learning this?

I thought this was a data analysis class. Why are we learning this?
The goal of this class is to make you a more-productive researcher.
To that end we are going to teach you more than just statistics and how to program.
We’re going to teach you programming best practices.
It will be painful, because you will be learning new ways of doing things.
But we can promise you that you’ll be more productive if you adopt the practices that we
are going to teach you.

Running code from the command line, instead of through a GUI, is a necessary part of
improving your productivity.

”The” shell
Open a Terminal:
Windows: start up "git bash" (or "MobaXterm"). Or start your WSL and open a terminal.
Mac: Applications/Utilities/Terminal (drag this to the dock).
Linux: xterm, eterm, ...
The terminal launches a shell. The shell is what you are actually interacting with when
you type commands.
The shell provides access to files, the network, and other programs.
I You type in commands.
I The shell interprets them.
I Performs actions on its own, or launches other programs.
The most commonly used shell in Linux is ’bash’.
There are others; mostly the same but some syntax is different.

The command line prompt
Now that we’ve got a terminal open, what do we see? We see the command line prompt!
On "git bash", the prompt looks something like this:
ejspence@mycomp MINGW64 ~
Where ’ejspence’ is my username, and ’mycomp’ is the name of my computer. On a Mac my
prompt might look like this:
mycomp:~ ejspence$

On a Linux machine, my prompt might look like this:


[ejspence@mycomp ~]$
All of these are customizable, which we won’t be covering today. It doesn’t matter what it
looks like, so long as you’re comfortable with the prompt.

Basics: home sweet home
Where am I?
Whenever you are using a shell you are located in some directory. You "are somewhere".
Your location is described by a "path".
When you launch a shell, you start in your "home directory"; this is the top directory of
all of your stuff.
The home directory is /c/Users/username for "git bash", /Users/username on Macs,
/home/username on Unix/Linux systems, /home/g/group/username on SciNet.
If a path starts with a "/", it is a "full path"; otherwise it is a "relative path" (meaning
the path relative to where you are currently located).
The home directory is universally represented by the ~ symbol.
Directories are sometimes called folders because of how they are represented in GUIs. We
will call them directories.
On Unix systems directories are listings of files, including other directories.
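
The difference between full paths, relative paths and ~ can be seen directly in the shell. The following is a minimal sketch, not part of the original slides; the directory name "work" is only illustrative and does not need to exist, and the variable name home_path is hypothetical.

```shell
# A minimal sketch; "work" is an illustrative name, not a required directory.
cd ~                      # move to the home directory
home_path=$(pwd)          # full path of the home directory, e.g. /c/Users/username

echo "$home_path"         # a full path: it starts with "/"
echo ~                    # the shell expands ~ to the home directory
echo ~/work               # ~ can also start a longer path
```

The first two echo lines should normally print the same location, since ~ is shorthand for the home directory.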
A typical Linux directory tree
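The top directory is ’/’; under that are home and other directories, under home are the
user home directories, etc. You can always specify a file or directory by its full ’path’:
/home/ejspence/work/README.

[Figure: a typical directory tree. ’/’ contains ’home’ and ’etc’; ’home’ contains ’ejspence’ and ’brelier’; the next level shows ’Desktop’, ’Downloads’, ’firstMPI.c’ and ’work’; the lowest level shows ’code’ and ’README’.]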

Basics: the file system
I will be assuming I am on a "git bash" terminal, with a custom prompt. Your output will
likely differ somewhat if you are on a different system.

    [ejspence.mycomp] pwd
    /c/Users/ejspence
    [ejspence.mycomp] ls
    Desktop LauncherFolder MyDocuments
    [ejspence.mycomp] ls /c/Users
    ejspence Public

Our commands
    pwd         present working directory
    ls [dir]    list the directory contents

    arg         mandatory argument
    [arg]       optional argument

’pwd’ stands for ’present working directory’. It will print the directory you are currently in.
As mentioned on the last slide, you begin in your home directory.
’ls’ stands for ’list’. If no argument is given it lists the contents of the current directory,
otherwise it lists the contents of the argument. Some implementations of ls include colour.
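
To try ’pwd’ and ’ls’ without touching your real files, one option is to work in a throwaway directory. This is a hedged sketch rather than part of the lecture: it assumes the ’mktemp’ and ’touch’ commands are available (neither is covered in these slides), and the names scratch, listing and notes.txt are hypothetical.

```shell
# Assumption: mktemp -d creates a new, empty throwaway directory and prints its path.
scratch=$(mktemp -d)
cd "$scratch"
touch notes.txt            # touch (not covered here) creates an empty file to list

pwd                        # prints the directory we are currently in
ls                         # no argument: lists the current directory
ls "$scratch"              # with an argument: lists that directory instead
listing=$(ls)              # the same listing, captured in a variable
```

Both ls calls list the same directory here, because the argument happens to be the current directory.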
Creating directories
    [ejspence.mycomp] pwd
    /c/Users/ejspence
    [ejspence.mycomp] ls
    Desktop LauncherFolder MyDocuments
    [ejspence.mycomp] mkdir firstdir
    [ejspence.mycomp] ls -F
    Desktop LauncherFolder MyDocuments
    firstdir/
    [ejspence.mycomp] mkdir ~/2ndir
    [ejspence.mycomp] ls -F
    2ndir/ Desktop LauncherFolder MyDocuments
    firstdir/

Our commands
    pwd         present working directory
    ls [dir]    list the directory contents
    mkdir dir   create a directory

    arg         mandatory argument
    [arg]       optional argument

’mkdir’ stands for ’make directory’. It creates a new directory, putting it in the current
directory unless a different path is specified.
’ls -F’ lists the directory, as before, but labels directories with a ’/’.
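
The two ways of calling ’mkdir’ (relative path versus a different path) can be sketched as follows. This is an illustration under assumptions, not the lecturer’s own example: ’mktemp’ and ’touch’ are assumed to be available, and the names scratch, plainfile and listing are hypothetical.

```shell
# Assumption: mktemp -d creates a new, empty throwaway directory and prints its path.
scratch=$(mktemp -d)
cd "$scratch"

mkdir firstdir                 # relative path: created inside the current directory
mkdir "$scratch/2ndir"         # full path: created wherever the path points
touch plainfile                # an ordinary empty file, for contrast

listing=$(ls -F)               # directories are labelled with a trailing '/'
echo "$listing"
```

In the listing, the two directories carry a trailing ’/’ while the ordinary file does not.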

Moving between directories
    [ejspence.mycomp] ls
    2ndir Desktop LauncherFolder MyDocuments firstdir
    [ejspence.mycomp] mkdir firstdir/temp
    [ejspence.mycomp] cd firstdir
    [ejspence.mycomp] pwd
    /c/Users/ejspence/firstdir
    [ejspence.mycomp] ls
    temp
    [ejspence.mycomp] cd temp
    [ejspence.mycomp] pwd
    /c/Users/ejspence/firstdir/temp
    [ejspence.mycomp] cd ..
    [ejspence.mycomp] pwd
    /c/Users/ejspence/firstdir
    [ejspence.mycomp] cd ~
    [ejspence.mycomp] pwd
    /c/Users/ejspence

Our commands
    pwd         present working directory
    ls [dir]    list the directory contents
    mkdir dir   create a directory
    cd [dir]    change directory

    arg         mandatory argument
    [arg]       optional argument

’cd’ stands for ’change directory’. It moves you to the directory you specify.
With no argument it moves you to the home directory.

Tips for getting around
Some common commands for moving around your directories:
The directory above is represented by the ’..’ symbol; the current directory is represented
by the ’.’ symbol:
  - ’cd ..’ goes up a directory.
  - ’cd ../..’ goes up two directories.
  - ’cd ../otherdir’ goes up one directory and then down into ’otherdir’.
  - ’cd firstdir/seconddir/../..’ goes nowhere.
  - ’cd ./././.’ also goes nowhere.
You can use absolute paths: ’cd /c/Users/ejspence/firstdir/temp’.
~ is the symbol for your home directory, on whatever system you are using. ’cd ~/work’
goes to /c/Users/ejspence/work.
’cd’ without any arguments goes to your home directory (~), from no matter where you
are.
’cd -’ goes back to the directory you were in previously.
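
The navigation shortcuts above can be tried safely in a throwaway directory tree. This is a minimal sketch under assumptions: ’mktemp’ is assumed to be available, and the variable names (scratch, start, in_other, up_two, back) are hypothetical; the directory names follow the examples in the list.

```shell
# Assumption: mktemp -d creates a new, empty throwaway directory and prints its path.
scratch=$(mktemp -d)
mkdir "$scratch/firstdir"
mkdir "$scratch/firstdir/temp"
mkdir "$scratch/otherdir"

cd "$scratch/firstdir/temp"
start=$(pwd)               # remember where we started

cd ..                      # up one directory: now in firstdir
cd ../otherdir             # up one more, then down into otherdir
in_other=$(pwd)

cd "$start"                # an absolute path works from anywhere
cd ../..                   # up two directories: now in the scratch directory
up_two=$(pwd)

cd - > /dev/null           # back to the previous directory (cd - also prints its name)
back=$(pwd)
```

After the last command, back holds the same path as start, because ’cd -’ returns to wherever you were before the most recent ’cd’.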
Tips for using the command line
Some more helpful tips for using the command line:
Use the ’tab’ key; it will ’auto-complete’ the available options based on what you’ve
already typed:
  - start typing your command, and then hit ’tab’.
  - the shell will fill in the rest, if there is only one option.
  - if nothing happens, there is either no option or more than one option.
  - hit the tab key twice; this will list all available options.
  - continue typing to reduce the number of options, then hit tab again to fill in the rest.
Use ’Ctrl-a’ to go to the beginning of the command line, ’Ctrl-e’ to go to the end of the
line.
Use the up arrow. This scrolls through the shell’s ’history’.
Do not put spaces in your file names, nor any other special characters.
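
The advice about spaces is easier to appreciate with an example. The sketch below is an illustration, not taken from the slides: it assumes ’mktemp’ is available, and the names scratch, "my data" and my_data are hypothetical. The shell splits a command on spaces, so an unquoted name containing a space is treated as two separate arguments.

```shell
# Assumption: mktemp -d creates a new, empty throwaway directory and prints its path.
scratch=$(mktemp -d)
cd "$scratch"

mkdir my data            # unquoted space: the shell sees TWO arguments, "my" and "data"
mkdir "my data"          # quoted: ONE directory whose name contains a space
mkdir my_data            # the simpler habit: use an underscore instead

ls -F                    # shows four directories: data/, my/, "my data"/ and my_data/
```

Every later command that touches "my data" needs the quotes again, which is why avoiding spaces altogether is the safer habit.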

Man pages

Know a command but aren’t sure how to use the options? Use the man (manual) page!
Most programs have a man page describing their use and all available options.
These pages are good for finding out more about a command you already use, but are less
good for learning new commands.
Many programs have gazillions of options.
No human being who has ever lived has known all the options for ’ls’.
Over time you will find a few that you find useful for your favourite commands.
Unfortunately, the ’man’ command doesn’t work with "git bash". Try adding the "--help"
flag after a command to see the command-line options.
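
As a small illustration of the "--help" flag, the sketch below asks ’mkdir’ for its own summary. It assumes the GNU versions of the tools, as found in "git bash" and on most Linux systems; the macOS version of ’mkdir’ does not accept "--help". The variable name help_text is hypothetical.

```shell
# Assumes GNU mkdir (git bash, most Linux systems); macOS mkdir rejects --help.
mkdir --help                 # prints a usage summary and the list of options
help_text=$(mkdir --help)    # the same text, captured in a variable
```

The output is usually much shorter than a man page, which makes it a quick way to check the spelling of an option.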

Man pages: help!
Use the man (manual) page for a list of all flags for a command.

    [ejspence.mycomp] man ls
    NAME
           ls - list directory contents
    SYNOPSIS
           ls [OPTION]... [FILE]...
    DESCRIPTION
           List information about the FILEs (the current directory by default).
           Sort entries alphabetically if none of -cftuvSUX nor --sort.

           Mandatory arguments to long options are mandatory for short options too.

           -a, --all
                  do not ignore entries starting with .
           -A, --almost-all
                  do not list implied . and ..
           ...

Our commands
    pwd         present working directory
    ls [dir]    list the directory contents
    mkdir dir   create a directory
    cd [dir]    change directory
    man cmd     command’s man page

    arg         mandatory argument
    [arg]       optional argument

Not sure how to use the command? Not sure what options there are? Check the man page!
Type ’q’ to get out of the man page.
Our commands so far
There are a couple of things to observe about the commands we’ve seen so far:
The commands are designed to be fast and easy to use.
The commands do, essentially, only one specific thing.
The commands are pretty cryptic. Either you know them or you don’t.
Commands can take options. These are usually indicated with a ’-something’ flag
(such as ’ls -F’).

Our commands
    pwd         present working directory
    ls [dir]    list the directory contents
    mkdir dir   create a directory
    cd [dir]    change directory
    man cmd     command’s man page

    arg         mandatory argument
    [arg]       optional argument

As you may have hoped, the purpose of this class, and the next, is to teach you enough
commands that you will be able to survive the Unix command line.
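
To see how an option changes what a command does, the sketch below compares plain ’ls’ with the ’-A’ flag listed in the ’ls’ man page. It is an illustration under assumptions: ’mktemp’ is assumed to be available, and the names scratch, visible, .hidden, plain and all are hypothetical.

```shell
# Assumption: mktemp -d creates a new, empty throwaway directory and prints its path.
scratch=$(mktemp -d)
cd "$scratch"
mkdir visible
mkdir .hidden              # names starting with '.' are skipped by plain ls

plain=$(ls)                # lists only: visible
all=$(ls -A)               # -A (--almost-all) also lists .hidden, but not . and ..
ls -A -F                   # flags can be given together; -AF is equivalent
```

The command itself is unchanged; only the flag differs, which is the sense in which each command does one thing and options adjust how it does it.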
