
Technische Universität München

Using the Batch Farm


Prologue

• All information + scripts from this talk are also available at:
  A) transfer.ktas.ph.tum.de
  B) /home/www/papers/computing
Overview

• Infrastructure
• Parallel vs single job computing
• Basic commands
• How to…
  … arrange a job
  … send a job
  … monitor my stuff
• Please don’t…
Infrastructure

• 21 compute nodes → 570 cores
• ~2 GB RAM / core
• 20 GPU job slots
• Standard queue: 2.5 h / job
• Long queue: 12 h / job
• Local storage: ~100 GB per node
• 1/10 Gbit/s network connection per node

SLURM job scheduler: https://www.schedmd.com
Parallel vs single job

(Diagram: under “Parallel running”, Jobs 1–9 run on the farm simultaneously; under “Single running”, Jobs 1–3 run one after another.)

Parallel running:
• Independent jobs
• Parameter scans
• MC production
• Data analysis (run-wise)
• Creation of independent output files

Single running:
• Code development
• Compiling
• Creating plots / graphs
• Small nTuple analysis
• Merging of several files
Example: Parallel job
Analysis of Detector Summary Tape (DST) files

Problem
• 1000 files with 250 events/file

Solution
• Create code locally
• Analyse 1 file per job
• Create 1 output file per job (Plots, Ntuples…)
• Send 1000 jobs to farm
• Merge plots/ntuples afterwards
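
A minimal sketch of this recipe, assuming the 1000 input files are listed in a text file (filelist.txt and the analyse binary are hypothetical placeholders; the fully parameterised version appears under “Example: Send 10 jobs” below):

  #!/bin/sh
  # sketch: submit one job per DST file, driven by a file list
  i=0
  while read -r file; do
      i=$((i + 1))
      script=/var/tmp/dst_job_$i.sh
      echo "#!/bin/sh"                      >  $script
      echo "./analyse $file output_$i.root" >> $script
      sbatch --time=150 --mem-per-cpu=100 $script  # standard queue: 2.5 h = 150 min
      rm -f $script
  done < filelist.txt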
Example: Single job
Fitting a peak in a plot

Problem
• Fit peaks in 1 or 2 plots

Solution
• Create a macro / program to fit
• Do it locally and check the output

Don’t make life more complicated than it is!
Basic commands

Here you will get some information about the basic commands. Most of them provide more information; see “<command> --help”. A few typical invocations are sketched after this list.

• sview: SLURM overview. Job, partition and node information in a graphical overview. Just enter “sview” in a terminal.
• sshare: “Fair share” ranking (how fast do I get the slot for the next job?). Just enter “sshare --all” in a terminal.
• sbatch: Submit a job to the farm. Enter “sbatch --help” for info about the parameters (described later).
• scancel: Kill your jobs by id, or all of your jobs using “scancel -u [ADS]”.
• squeue: Gives information about the status of the running jobs and the queue. Just enter “squeue” in a terminal.
• sinfo: Gives information about the nodes, queues and users of the farm. Just enter “sinfo” in a terminal.
• Monitoring software
  – Graphical: a short graphical overview of the users’ currently running jobs on the farm: https://transfer.ktas.ph.tum.de/django/monitor/1/
  – Text based: a short text-based overview of the users’ currently running jobs on the farm: https://transfer.ktas.ph.tum.de/webpage/monitoring_batchfarm.html
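
A few typical invocations of these commands (a short sketch; the job id 12345 is a placeholder):

  squeue -u $USER    # list only my jobs
  sshare --all       # fair-share ranking of all accounts
  sinfo              # node / partition status
  scancel 12345      # kill one job by id
  scancel -u $USER   # kill all of my jobs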
How to… arrange a job

• Input:
  – File to analyse? (File list?)
  – Parameters?
• Output:
  – Different names / directories
• Compile before sending to the farm
• How much CPU time / RAM?
• Do I need temporary space?
• Do I need access to /scratch?
• Check everything before going to the farm (see the sketch below)
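
One way to estimate the CPU-time and RAM numbers before submitting is a timed local run (a sketch assuming GNU time is installed; test_output.txt is a placeholder name):

  # run once locally; "Elapsed (wall clock) time" and
  # "Maximum resident set size" give estimates for --time and --mem-per-cpu
  /usr/bin/time -v ./Example test_output.txt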
Example: Random Numbers

Problem
• Create file with 10 different lines and random
numbers
• Must be scalable to farm

Solution
• Input: the name of the output file has to be given
• Compile
• “Full program”
  – Example.cc
  – Makefile
→ This generates an executable program
Example: Random Numbers

(Slides showing the Example.cc source and the Makefile; the code is available at /home/www/papers/computing.)
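
The C++ listing itself is not reproduced here; as a rough stand-in, a shell script with the same behaviour might look like this (a sketch only, not the actual Example.cc; $RANDOM requires bash):

  #!/bin/bash
  # stand-in for the compiled Example program:
  # write 10 lines of random numbers to the file named as first argument
  out=$1
  : > "$out"                       # create / truncate the output file
  for i in `seq 1 10`; do
      echo "$i $RANDOM" >> "$out"  # one random number per line
  done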

• Run it locally to check if it works


How to… send a job

• Select your parameters:
  – CPU
  – RAM
  – Partition
• SLURM can only submit scripts
• Loop over all the jobs you want to submit
• Create a bash / python script
• Example:
  – Create a script with a submit loop (submit.sh)
  – Inside, create a temporary script with your job inside
  – Run your script
Example: Send 10 jobs

#!/bin/sh
cpu=5    # time limit in minutes for your job,
         # it will be killed after that time!
mem=100  # RAM limit in MB for your job,
         # it will be killed if it exceeds this
nJobs=10 # number of jobs to be performed

# the program is defined here
program=/home/www/papers/computing/programs/Example
name=Example

# the output parameters are defined here
output_path=/home/www/papers/computing/testoutput
output_name=Event
output_end=txt
# generate a random number to identify this submission’s files uniquely
randomID=$RANDOM

for i in `seq 1 $nJobs`; do

    tmp_scriptname=/var/tmp/sub_${randomID}_$i.sh

    # set your default environment
    echo "#!/bin/sh" > $tmp_scriptname
    echo ". ~/.bashrc" >> $tmp_scriptname

    # run your program, writing its output to the local disk
    echo "${program} /var/tmp/local_${randomID}_$i.txt" >> $tmp_scriptname

    # copy the completed output to your location
    echo "cp /var/tmp/local_${randomID}_$i.txt ${output_path}/${output_name}-$i.${output_end}" >> $tmp_scriptname

    # clean up your stuff
    echo "rm /var/tmp/local_${randomID}_$i.txt" >> $tmp_scriptname

    # submit your temporary script to the farm
    sbatch --mem-per-cpu=${mem} --time=${cpu} --job-name=${name}-$i ${tmp_scriptname}

    # delete your temporary script
    rm -f ${tmp_scriptname}

done
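
Once all jobs have finished, the per-job outputs can be combined, matching the “merge afterwards” step of the parallel-job recipe (a minimal sketch using the same variables as submit.sh; merged.txt is a placeholder name):

  # concatenate the 10 per-job text files into one
  cat ${output_path}/${output_name}-*.${output_end} > ${output_path}/merged.txt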
How to… monitor my stuff

• Check your jobs frequently (squeue…)
  – Do they disappear suddenly?
  – Do they finish too quickly?

• Check the log files in case of problems (see the sketch below)
  – What is written there?
  – Does the problem depend on one machine?

• Try to run a job locally
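
Unless redirected, sbatch writes each job’s stdout and stderr to a file named slurm-<jobid>.out in the submission directory, so a quick log check might look like this (12345 is a placeholder job id):

  tail slurm-12345.out             # see how far the job got
  grep -i error slurm-12345.out    # scan the log for error messages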


Error handling

• Have you checked the logfile?
• Are your scripts and code valid?
• Is your data available?
• Is the fileserver present, or under heavy usage?
• Do your jobs last unusually long?

Don’t call an admin without having checked all points!
Important Notes

Some important notes:

• Don’t use /tmp; use /var/tmp
• Don’t write directly to /scratch; copy at the end of the job
• Clean up after your job (see the sketch below)
• Try to stay under 50k jobs at one time
• Adjust your CPU and RAM requests reasonably
• Always check your work
• Be friendly to the others 
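
For the clean-up rule, one common shell pattern is an EXIT trap in the generated job script, so the local temporary file is removed whenever the script exits, even if the program fails partway (a sketch, not from the slides; it reuses the program and paths from the earlier example, and Event-1.txt is a placeholder output name):

  #!/bin/sh
  # hypothetical job script with automatic clean-up
  tmpfile=/var/tmp/local_$$.txt
  trap 'rm -f "$tmpfile"' EXIT   # runs on any normal script exit

  /home/www/papers/computing/programs/Example "$tmpfile"
  cp "$tmpfile" /home/www/papers/computing/testoutput/Event-1.txt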


Questions?
