Training delivered in 2009 for a compute cluster customer in Calcutta, India. I honestly have no idea what I was thinking. There is no possible audience who would have been pleased with this talk.
Docker and friends at Linux Days 2014 in Prague, by tomasbart
Docker allows deploying applications easily across various environments by packaging them, along with their dependencies, into standardized units called containers. It provides isolation and security while allowing higher density and lower overhead than virtual machines. CoreOS and Mesos both integrate with Docker to deploy containers on clusters of machines for scalability and high availability.
Running Docker in Development & Production (#ndcoslo 2015), by Ben Hall
The document discusses running Docker in development and production. It covers:
- Using Docker containers to run individual services like Elasticsearch or web applications
- Creating Dockerfiles to build custom images
- Linking containers together and using environment variables for service discovery
- Scaling with Docker Compose, load balancing with Nginx, and service discovery with Consul
- Clustering containers together using Docker Swarm for high availability
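The environment-variable service discovery in the list above can be sketched in a few lines of shell: with the legacy `docker run --link db:db`, Docker injects variables such as `DB_PORT_5432_TCP_ADDR` into the linked container. The variable names follow that convention, but the defaults below are invented for the example.

```shell
#!/bin/sh
# Read the address of a linked "db" container from the environment,
# falling back to local defaults when no link is present.
# (Names follow Docker's old --link convention; defaults are made up.)
DB_HOST="${DB_PORT_5432_TCP_ADDR:-localhost}"
DB_PORT="${DB_PORT_5432_TCP_PORT:-5432}"
echo "connecting to ${DB_HOST}:${DB_PORT}"
```

Run outside a container, with no link variables set, this falls back to `localhost:5432`.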
This document provides instructions for installing a single-node Hadoop cluster on Ubuntu. It outlines downloading and configuring Java, installing Hadoop, configuring SSH access to localhost, editing Hadoop configuration files, and formatting the HDFS filesystem via the namenode. Key steps include adding a dedicated Hadoop user, generating SSH keys, setting properties in core-site.xml, hdfs-site.xml and mapred-site.xml, and running 'hadoop namenode -format' to initialize the filesystem.
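The configuration-file edits mentioned above can be sketched as follows. The property values are typical single-node defaults, not taken from the document, and the privileged steps (creating the dedicated user, running the format) appear only as comments.

```shell
#!/bin/sh
# Sketch of the two core Hadoop config files for a single-node setup.
# Written to /tmp for illustration; in a real install these live under
# $HADOOP_HOME/conf.
mkdir -p /tmp/hadoop-conf

cat > /tmp/hadoop-conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

cat > /tmp/hadoop-conf/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

# After creating the hduser account and passwordless SSH keys, the
# filesystem is initialized once with: hadoop namenode -format
```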
This document discusses Docker and provides an introduction and overview. It introduces Docker concepts like Dockerfiles, commands, linking containers, volumes, port mapping and registries. It also discusses tools that can be used with Docker like Fig, Baseimage, Boot2Docker and Flynn. The document provides examples of Dockerfiles, commands and how to build, run, link and manage containers.
1. The document discusses Docker containers, Docker Machine, and Docker Compose as tools for building Python development environments and deploying backend services.
2. It provides examples of using Docker to run sample Python/Django applications with MySQL and PostgreSQL databases in containers, and load testing the applications.
3. The examples demonstrate performance testing Python REST APIs with different database backends and caching configurations using Docker containers.
I’ve been keeping a collection of Linux commands that are particularly useful; some are from websites I’ve visited, others from experience.
I hope you find these as useful as I have. I’ll periodically add to the list, so check back occasionally.
The document provides step-by-step instructions for installing a single-node Hadoop cluster on Ubuntu Linux using VMware. It details downloading and configuring required software like Java, SSH, and Hadoop. Configuration files are edited to set properties for core Hadoop functions and enable HDFS. Finally, sample data is copied to HDFS and a word count MapReduce job is run to test the installation.
Here are some sed commands to demonstrate its capabilities:
◦ sed 's/rain/snow/' easy_sed.txt; cat easy_sed.txt
◦ sed 's/plain/mountains/' easy_sed.txt; cat easy_sed.txt
◦ sed 's/Spain/France/' easy_sed.txt; cat easy_sed.txt
◦ sed 's/^The //' easy_sed.txt; cat easy_sed.txt
◦ sed '/Spain/d' easy_sed.txt; cat easy_sed.txt
This demonstrates sed's substitution and deletion capabilities using regular expressions to match patterns in the file.
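The commands above can be reproduced end to end. The contents of easy_sed.txt are not given in the document, so a plausible sample line is created here; note that without `-i`, sed writes the edited text to stdout and leaves the file untouched, which is why each command above is followed by `cat`.

```shell
#!/bin/sh
# Create a sample file and run a few of the substitutions above.
echo "The rain in Spain falls mainly on the plain" > easy_sed.txt

sed 's/rain/snow/' easy_sed.txt   # The snow in Spain falls mainly on the plain
sed 's/^The //' easy_sed.txt      # rain in Spain falls mainly on the plain
sed '/Spain/d' easy_sed.txt       # empty: the only line matches and is deleted

cat easy_sed.txt                  # the file itself is unchanged
```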
For anyone starting to explore the magical world of terminal commands, life can be hard. There are several guides online, but Raphael's "Linux Bash Shell Cheat Sheet for Beginners" is something beginners should keep within reach. We recommend it partly because it is very simple and clear, and partly because it was written by a sixteen-year-old Canadian. Personally, that pleases me, because it shows that even the very young approach Linux in the best way: "learn, then pass it on".
The document discusses Hadoop and HDFS. It provides an overview of HDFS architecture and how it is designed to be highly fault tolerant and provide high throughput access to large datasets. It also discusses setting up single node and multi-node Hadoop clusters on Ubuntu Linux, including configuration, formatting, starting and stopping the clusters, and running MapReduce jobs.
The document discusses containerization using Docker. It begins with an overview of Docker commands to run containers with increasing levels of isolation for hostname, process ID, and filesystem/mounting. It then demonstrates how to execute commands in a container using Linux namespaces to isolate processes and filesystems. The document aims to show how Docker containers can isolate and sandbox processes running on a machine.
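The namespace isolation described above can be observed directly from /proc, even without Docker installed; this sketch only inspects the namespaces of the current process.

```shell
#!/bin/sh
# Every process's namespace memberships appear as symlinks under
# /proc/<pid>/ns.  Two processes in the same namespace see the same
# inode; a process inside a container would show different ones.
uts_ns=$(readlink /proc/self/ns/uts)
mnt_ns=$(readlink /proc/self/ns/mnt)
echo "uts namespace: $uts_ns"
echo "mnt namespace: $mnt_ns"
# A new UTS namespace (as `docker run` or `unshare -u` would create)
# gets a different inode, so hostname changes inside it stay inside it.
```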
This document provides a toolbox of Unix/Linux/BSD commands for system administration, networking, security, and development tasks. It contains over 20 sections that each cover a topic like the system, processes, file system, network configuration, encryption, version control, programming, and more. The document aims to be a practical guide for IT workers and advanced users, with concise explanations of commands.
Using docker for data science - part 2, by Calvin Giles
A lightning talk for PyData London (http://www.meetup.com/PyData-London-Meetup/) on using docker and fig to manage your data science development environment.
Using python and docker for data science, by Calvin Giles
PyData London meetup group lightning talk slides on getting an ipython notebook with the scipy stack and custom packages running in a notebook server in 5 minutes.
This document provides an overview and introduction to Node.js, a platform for building scalable network applications in JavaScript using non-blocking I/O and an event-driven architecture. Node.js was created by Ryan Dahl in 2009 and uses Google's V8 JavaScript engine. It allows building web servers, networking tools and real-time applications easily and efficiently by handling concurrent connections without threads. Popular frameworks and modules built on Node.js are also mentioned, such as Express.js, Socket.IO and over 1600 modules in the npm registry.
This document discusses container security and analyzes potential vulnerabilities in Docker containers. It describes how containers may not fully isolate processes and how an attacker could escape a container to access the host machine via avenues like privileged containers, kernel exploits, or Docker socket access. It provides examples of container breakouts using these methods and emphasizes the importance of security features like seccomp, AppArmor, cgroups to restrict containers. The document encourages readers to apply security best practices like the Docker Bench tool to harden containers.
This document discusses Linux accounting and monitoring user activity. It begins with an overview of the yum and dnf package managers and how they can be used to install the psacct or acct packages for monitoring user activity. It then covers various commands provided by psacct/acct like ac, lastcomm, sa to view user login times, previously executed commands, and account activity summaries. The document also provides overviews of Kerberos for authentication, LDAP for user information storage, and lists some common system utility commands.
The document discusses Docker and CoreOS for deploying applications to production. It provides an overview of Docker, CoreOS, Etcd for service discovery, Fleet for cluster management, and using Systemd and Docker together. Examples are given for setting up databases, logging, and deploying Presence and Ambassador microservices. Issues addressed include killing Docker containers, disk space on Btrfs, and image sizes. Other orchestration tools like Kubernetes and Deis are also mentioned.
This document provides information on Linux communication tools including wall, talk, write, mesg, and systemctl. It discusses how wall broadcasts messages to logged in users. It explains how talk and ytalk allow interactive chats between users. It covers how write sends text without email. It also discusses how mesg blocks or allows messages from other users. Finally, it provides an in-depth overview of the systemctl command for managing systemd services and units.
The Unbearable Lightness: Extending the Bash shell, by Roberto Reale
This document summarizes challenges and limitations of the Bash shell and discusses various attempts to address them through frameworks, libraries, and paradigms like object orientation, functional programming, and inversion of control. It highlights issues like variable scoping, lack of exceptions, sorting, parsing, binary data handling, and debugging as well as efforts like bashlets, bashinator, bash manager, and oobash to improve modularity, reuse, and robustness of Bash scripts.
This document provides a summary of common Linux shell commands and shell scripting concepts. It begins with recapping common commands like ls, cat, grep etc. It then discusses what a shell script is, how to write basic scripts, and covers shell scripting fundamentals like variables, conditionals, loops, command line arguments and more. The document also provides examples of using sed, awk and regular expressions for text processing and manipulation.
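As a sketch of the fundamentals listed (variables, command-line arguments, conditionals, loops), the following minimal script is illustrative; it is not taken from the document.

```shell
#!/bin/sh
# Variables, arguments, a conditional, and a loop in one small script.
name="${1:-world}"          # first argument, with a default

if [ "$name" = "world" ]; then
    greeting="Hello"
else
    greeting="Welcome"
fi

for i in 1 2 3; do
    echo "$greeting, $name ($i)"
done
```

Invoked with no arguments it prints three "Hello, world" lines; passing a name switches the greeting.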
Node.js is a JavaScript runtime built on Chrome's V8 engine. It allows JavaScript to be run on the server-side. Node.js avoids blocking I/O operations by using non-blocking techniques and event loops. It provides APIs for common tasks like HTTP servers, filesystem access, and more. While still in development, Node.js has found success in building real-time applications and APIs due to its asynchronous and non-blocking architecture.
Open Source Backup Conference 2014: Workshop bareos introduction, by Philipp ... (NETWAYS)
It gives an introduction to the architecture of Bareos and how its components interact. The configuration of Bareos will be discussed and the main Bareos features will be shown. As a practical part of the workshop, the preconfigured standard backup scheme will be adapted to the attendees’ wishes.
Attendees are kindly asked to contribute configuration tasks that they want to have solved.
The latest releases of today’s popular Linux distributions include all the tools needed to do interesting things with Linux containers.
For the Makefile MicroVPS project, I set out to build a minimal virtual private server-like environment in a Linux container from scratch.
These are my requirements for the MicroVPS:
Minimal init sequence
Most of what happens in a rc.sysinit file is not needed (or wanted) in a container. However, to work like a virtual private server, the MicroVPS will need some kind of init system. The absolute minimum would be enough to start the network and at least one service.
Native network namespace
The MicroVPS will have a dedicated network namespace. It should be easy to configure.
Native package management
The package set installed in the container image will be managed using native tools like deb or rpm.
Automated build
An automated repeatable build process is a must.
Fast iteration cycle
The building and testing cycle must be fast enough not to drive me insane.
Easy management
It should be easy to distribute, monitor, and run a MicroVPS container.
In this tutorial, I will show how to use the tools included with Linux to build a virtual private server in a Linux container from scratch, using GNU Make to automate the build process.
The document discusses various Linux system monitoring utilities including SAR, SADC/SADF, MPSTAT, VMSTAT, and TOP. SAR provides CPU, memory, I/O, network, and other system activity reports. SADC collects system data which SADF can then format and output. MPSTAT reports processor-level statistics. VMSTAT provides virtual memory statistics. TOP displays active tasks and system resources usage.
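All of these utilities ultimately read kernel counters exported under /proc. The following is a rough sketch of where some of the numbers come from; the field names are from the standard Linux /proc text format.

```shell
#!/bin/sh
# Memory figures of the kind vmstat reports, read straight from
# /proc/meminfo (values are in kB).
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
echo "memory: ${free_kb} kB free of ${total_kb} kB"

# Load averages, as shown in top's header line, from /proc/loadavg.
read load1 load5 load15 _ < /proc/loadavg
echo "load averages: $load1 $load5 $load15"
```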
The document provides instructions for configuring Red Hat Enterprise Linux 5 on VMware before installing Oracle 11gR2. This includes installing additional packages, modifying configuration files, creating users and filesystem directories, and preparing the system. Key steps are installing VMware tools, configuring network interfaces, formatting shared storage, installing the Oracle ASM library driver, and modifying shell profiles for the Oracle software owners. The goal is to prepare a system with a primary node "tom" and failover node "jerry" that is ready for an Oracle Grid 11gR2 installation.
Why and How Powershell will rule the Command Line - Barcamp LA 4, by Ilya Haykinson
PowerShell is a command shell for Windows that passes structured objects, rather than plain text, between commands through pipes. It provides a fully-fledged programming language where commands manipulate objects and share a common naming convention. PowerShell holds that commands should do one thing well and interact through a consistent environment, addressing the fragile text parsing required between traditional command-line programs.
This document provides an overview of containerization and Docker. It covers prerequisites, traditional application deployment challenges, container components like namespaces and cgroups, major Docker concepts like images and containers, common Docker commands, building Dockerfiles, and Docker workflows and best practices. Hands-on exercises are included to build and run containers.
This document provides an overview of essential Linux commands and utilities for SQL Server DBAs. It covers topics such as Linux history, users and permissions, file editing and navigation commands like vi, process monitoring with ps and top, and system diagnostic utilities like sar, vmstat, and mpstat. The document aims to teach SQL Server DBAs basic Linux skills to manage their environment and troubleshoot issues.
This document summarizes a Docker workshop that covers:
1. Running Docker containers, including starting containers interactively or detached, checking statuses, port forwarding, linking containers, and mounting volumes.
2. Building Docker images, including committing existing containers or building from a Dockerfile, and using Docker build context.
3. The official Docker Hub for finding and using common Docker images like Redis, MySQL, and Jenkins. It also covers tagging and pushing images to private Docker registries.
This document provides an overview of Ansible, an open source tool for configuration management and application deployment. It discusses how Ansible aims to simplify infrastructure automation tasks through a model-driven approach without requiring developers to learn DevOps tools. Key points:
- Ansible uses YAML playbooks to declaratively define server configurations and deployments in an idempotent and scalable way.
- It provides ad-hoc command execution and gathers setup facts via SSH. Playbooks can target groups of servers to orchestrate complex multi-server tasks.
- Variables, templates, conditionals allow playbooks to customize configurations for different environments. Plugins support integration with cloud, monitoring, messaging tools.
- Ansible aims to reduce complexity compared with heavier agent-based configuration management tools.
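As an illustration of the declarative playbook style described above, here is a minimal example; the host group, package, and tasks are invented for the sketch, not taken from the document. Writing the YAML out via a heredoc keeps the example self-contained.

```shell
#!/bin/sh
# Generate a minimal illustrative playbook.  Both tasks are idempotent:
# re-running the play changes nothing once the desired state holds.
cat > site.yml <<'EOF'
---
- hosts: webservers
  become: yes
  tasks:
    - name: install nginx
      apt:
        name: nginx
        state: present
    - name: ensure nginx is running
      service:
        name: nginx
        state: started
EOF
# Applied with: ansible-playbook -i inventory site.yml
```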
Docker allows applications and their dependencies to be packaged into standardized units called containers that can run on any computing environment regardless of the underlying infrastructure. Containers leverage and share the host operating system's kernel to run as isolated processes, which improves performance and reduces overhead compared to virtual machines. Dockerfiles define the build instructions for container images, while Docker Compose allows defining and running multi-container applications with a single configuration file.
Docker is an amazing tool, but unless you work with it every day, you're probably left with a ton of questions. What's a container? What's an image? What's the difference between Docker, Machine, Compose, and Swarm? Why the heck should I care? Well, Docker makes it easier than ever to deploy and scale your applications and services. In addition, it lets you simulate your production environment on your local machine without heavy virtual machines. In this talk, we'll explore the basics of Docker, create a custom image for a web application, create a group of containers, and look at how you can put your apps into production on various cloud providers. At the end of the talk, you'll have the knowledge you need to put this to use with your own applications.
If you're not familiar with Docker yet, here is your chance to catch up: a quick overview of the open source Docker Engine and its associated services delivered through the Docker Hub. Jérôme will also discuss the new features of Docker 1.0 and briefly explain how you can run and maintain Docker on Azure. In addition, an Azure team member will demonstrate how to deploy Docker to Azure. The presentation will be followed by a Q&A session!
PuppetConf 2016: The Challenges with Container Configuration – David Lutterkort, Puppet
Here are the slides from David Lutterkort's PuppetConf 2016 presentation called The Challenges with Container Configuration. Watch the videos at https://www.youtube.com/playlist?list=PLV86BgbREluVjwwt-9UL8u2Uy8xnzpIqa
Introducing containers into your infrastructure brings new capabilities, but also new challenges, in particular around configuration. This talk will take a look under the hood at some of those operational challenges including:
* The difference between runtime and build-time configuration, and the importance of relating the two together.
* Configuration drift, immutable mental models and mutable container file systems.
* Who configures the orchestrators?
* Emergent vs. model driven configuration.
In the process we will identify some common problems and talk about potential solutions.
Talk from PuppetConf 2016
Troubleshooting Tips from a Docker Support Engineer, by Jeff Anderson
The document discusses various troubleshooting techniques for Docker including using tools like socat and curl to characterize networking and TLS issues, checking container processes and permissions, using volumes to store persistent data, and resolving issues with incorrect localhost references between containers. It also provides examples of troubleshooting issues with a Minecraft server, Ruby application, and Nginx proxy configuration.
Troubleshooting Tips from a Docker Support Engineer - Jeff Anderson, Docker, Inc.
Docker makes everything easier. But even with the easiest platforms, sometimes you run into problems. In this session, you'll learn first hand from someone whose job is helping customers fix these problems. Using Docker and Docker Data Center, you can keep your apps running smoothly with minimal downtime. In this session, you'll learn how to apply your troubleshooting skills in the Docker ecosystem, including: 1. Identification and characterization of the problem. 2. Command line tools to inspect networking and namespaces. 3. Applying these skills to your workloads on OSS Docker and on DDC.
Title: Introduction to Docker
Abstract:
During the year since its inception, Docker has changed our perception of OS-level virtualization, also called containers.
At this workshop we will introduce the concept of Linux containers in general and Docker specifically. We will guide the participants through a practical exercise that includes the use of various Docker commands and setting up a functional WordPress/MySQL system running in two containers that communicate with each other using Serf.
Topics:
Docker Installation (in case is missing)
Boot2Docker
Docker commands
- basic commands
- different types of containers
- Dockerfiles
Serf
Wordpress Exercise
- setting up Serf cluster
- deploying MySQL
- deploying Wordpress and connecting to MySQL
Prerequisites:
Working installation of Docker
On Mac - https://docs.docker.com/installation/mac/
On Windows - https://docs.docker.com/installation/windows/
Other Platforms - https://docs.docker.com/installation/#installation
Docker has created enormous buzz in the last few years. Docker is an open-source software containerization platform that provides the ability to package software into standardised units for software development. In this hands-on introductory session, I introduce the concept of containers, provide an overview of Docker, and take the participants through the steps for installing Docker. The main session involves using the Docker CLI (Command Line Interface): concepts such as images and managing containers, and getting useful work done, are illustrated step by step by running commands.
This document provides an overview of an introductory class on using Linux at the command line. It outlines the following:
- The class will start with a sign-in sheet and end with an evaluation. The instructor will cover as much material as possible in the allotted time, starting with the easiest concepts.
- The class is hands-on and lab-based, allowing students to ask questions. Commands for students to type will be in bold text. There will be a mid-class break.
- Topics to be covered include basic Linux commands, navigating and manipulating files and directories, permissions, and using tools like grep, awk and sed to filter and manipulate output.
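The grep/awk/sed filtering mentioned in the last bullet can be sketched on sample data; the file contents below are invented for the example.

```shell
#!/bin/sh
# Three filtering tools chained over the same sample file.
printf 'alice 42\nbob 7\ncarol 99\n' > users.txt

grep -v '^bob' users.txt            # drop bob's line
awk '$2 > 40 {print $1}' users.txt  # names with a score above 40
sed 's/alice/ALICE/' users.txt      # substitute in the output stream
```

Each command reads the file and writes its result to stdout, so they compose naturally into pipelines.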
This document summarizes Dockerizing a Django application. It describes the speakers' experiences moving from a non-Dockerized setup with many issues, like outdated images and long recovery times, to a Dockerized setup with improved scalability, documentation, and development workflows. Key aspects of the new setup include using Docker Compose to run multiple services, Docker Machine to provision environments, and Docker Swarm for production deployments across multiple instances.
Data and Computing Infrastructure for the Life SciencesChris Dwan
My slides from the 2025 Bio-IT World Expo.
I tried to lift above the churn to find constants that an architect or strategist could use to make well informed and durable technology choices.
This document summarizes a talk given by Chris Dwan at a DNA Nexus User Group Meeting. It discusses how platform requirements change as companies mature from their early startup phase focused on agility, to a growth phase where compliance and governance are important, to a mature phase where financial considerations are critical. It provides rules of thumb for choosing platforms in the different phases, noting that platforms that reduce time and work are attractive early on, but financial oversight becomes more important later.
The FY23 Somerville city budget proposes $307.77 million in expenditures, an 8.1% increase over the prior year. Revenues are projected to be $309.5 million, an 8% increase. Key investments include a historic 10% increase to the school budget, funding for mental health services and youth programs, environmental sustainability initiatives, and positions to support housing stability and equity across city departments. The budget also proposes restructuring city government with a new Chief Administrative Officer position.
Production Bioinformatics, emphasis on ProductionChris Dwan
Production bioinformatics at Sema4 can be thought of as data ops - a peer to the lab ops organization. We operate 24/7 to deliver correct and timely results on NGS and other data for thousands of samples per week. This deck introduces the Prod BI organization and systems architecture with a focus on what it takes to run bioinformatics in production rather than for R&D or pure research.
This document outlines proposals to reduce the Somerville Police Department budget and reallocate those funds. The SPD budget has grown faster than inflation while other programs like housing, arts, and sustainability have seen cuts. Specific proposals include reducing the police budget by 10-60% and allocating those funds instead to education, affordable housing, economic development, sustainability programs, and social services like crisis counseling. Cutting the police budget by 60% could double several other departmental budgets and leave millions available for alternative emergency response programs.
No Free Lunch: Metadata in the life sciencesChris Dwan
This presentation covers some challenges and makes suggestions to support the work of creating flexible, interoperable data systems for the life sciences.
The Urban Forestry Committee discussed upcoming public tree hearings and made recommendations. They recommended that the city work with developers to retain two mature ash trees for a hotel/residential development and use native species for replacements. For a blue spruce, they recommended asking Eversource to relocate power lines rather than removing the tree. For a Siberian elm, they suggested monitoring its health and considering treatment instead of removal.
Chris Dwan is a director of consulting and professional services at Bioteam, an independent consulting company that specializes in bridging science and information technology. He has a background in computer science and biology and has worked on projects for organizations like NASA, the CDC, and pharmaceutical companies. During the career day presentation, he discussed his work in bioinformatics, DNA sequencing, and providing consulting services to help solve problems at the intersection of biology and computer science.
Advocacy in the Enterprise (what works, what doesn't)Chris Dwan
This document summarizes strategies for advocacy and inclusion in the workplace. It discusses how leadership buy-in and removing barriers can promote inclusion, while quotas and public shaming can be counterproductive. Effective hiring practices include evaluating candidates based on job requirements rather than fit, and avoiding biases. Once hired, new employees benefit from sponsors, a supportive culture, and diversity training for managers. Overall, the document advocates for inclusive practices that promote equal opportunity and access.
The document summarizes challenges faced by early adopters of next generation DNA sequencing technology and potential solutions. It discusses issues such as high upfront costs of sequencers, data storage and management difficulties due to the large amount of data generated, networking and data transfer problems, and lack of laboratory information management systems. Potential solutions proposed include using virtualization and cloud computing through Amazon Web Services, developing a wiki-based laboratory information management system, simplifying storage architectures, and automated data capture and management.
The document discusses the history and development of high performance computing. It describes how early computers were mechanical devices, then became electronic and digital. It also summarizes the development of parallel and cluster computing technologies that allow multiple processors to work together on problems.
This document summarizes a Tree Preservation Ordinance for a city. It establishes definitions related to trees, creates roles like Tree Warden and Urban Forestry Committee, and outlines regulations for removing public shade trees, city trees, and private trees. Permits are required to remove trees, and replacements or payments to a Tree Fund are typically required for removing significant trees to maintain the city's tree canopy. The ordinance aims to enhance environmental and quality of life benefits of the urban forest.
A response from Newport Construction to the city of Somerville's demand that we be compensated for the improper destruction of our trees.
In which they respond: "No."
This document provides lighting design details for a pedestrian underpass, including:
- A luminaire schedule listing a single LED luminaire model to be used with 4,807 lumens output.
- A lighting plan showing the layout of 6 luminaires in the tunnel.
- Photometric calculations indicating the tunnel lighting will average 35.1 footcandles with a maximum of 42.9 fc and minimum of 13.4 fc.
Passenger car unit (PCU) of a vehicle type depends on vehicular characteristics, stream characteristics, roadway characteristics, environmental factors, climate conditions and control conditions. Keeping in view various factors affecting PCU, a model was developed taking a volume to capacity ratio and percentage share of particular vehicle type as independent parameters. A microscopic traffic simulation model VISSIM has been used in present study for generating traffic flow data which some time very difficult to obtain from field survey. A comparison study was carried out with the purpose of verifying when the adaptive neuro-fuzzy inference system (ANFIS), artificial neural network (ANN) and multiple linear regression (MLR) models are appropriate for prediction of PCUs of different vehicle types. From the results observed that ANFIS model estimates were closer to the corresponding simulated PCU values compared to MLR and ANN models. It is concluded that the ANFIS model showed greater potential in predicting PCUs from v/c ratio and proportional share for all type of vehicles whereas MLR and ANN models did not perform well.
Lidar for Autonomous Driving, LiDAR Mapping for Driverless Cars.pptxRishavKumar530754
LiDAR-Based System for Autonomous Cars
Autonomous Driving with LiDAR Tech
LiDAR Integration in Self-Driving Cars
Self-Driving Vehicles Using LiDAR
LiDAR Mapping for Driverless Cars
its all about Artificial Intelligence(Ai) and Machine Learning and not on advanced level you can study before the exam or can check for some information on Ai for project
Raish Khanji GTU 8th sem Internship Report.pdfRaishKhanji
This report details the practical experiences gained during an internship at Indo German Tool
Room, Ahmedabad. The internship provided hands-on training in various manufacturing technologies, encompassing both conventional and advanced techniques. Significant emphasis was placed on machining processes, including operation and fundamental
understanding of lathe and milling machines. Furthermore, the internship incorporated
modern welding technology, notably through the application of an Augmented Reality (AR)
simulator, offering a safe and effective environment for skill development. Exposure to
industrial automation was achieved through practical exercises in Programmable Logic Controllers (PLCs) using Siemens TIA software and direct operation of industrial robots
utilizing teach pendants. The principles and practical aspects of Computer Numerical Control
(CNC) technology were also explored. Complementing these manufacturing processes, the
internship included extensive application of SolidWorks software for design and modeling tasks. This comprehensive practical training has provided a foundational understanding of
key aspects of modern manufacturing and design, enhancing the technical proficiency and readiness for future engineering endeavors.
This paper proposes a shoulder inverse kinematics (IK) technique. Shoulder complex is comprised of the sternum, clavicle, ribs, scapula, humerus, and four joints.
Analysis of reinforced concrete deep beam is based on simplified approximate method due to the complexity of the exact analysis. The complexity is due to a number of parameters affecting its response. To evaluate some of this parameters, finite element study of the structural behavior of the reinforced self-compacting concrete deep beam was carried out using Abaqus finite element modeling tool. The model was validated against experimental data from the literature. The parametric effects of varied concrete compressive strength, vertical web reinforcement ratio and horizontal web reinforcement ratio on the beam were tested on eight (8) different specimens under four points loads. The results of the validation work showed good agreement with the experimental studies. The parametric study revealed that the concrete compressive strength most significantly influenced the specimens’ response with the average of 41.1% and 49 % increment in the diagonal cracking and ultimate load respectively due to doubling of concrete compressive strength. Although the increase in horizontal web reinforcement ratio from 0.31 % to 0.63 % lead to average of 6.24 % increment on the diagonal cracking load, it does not influence the ultimate strength and the load-deflection response of the beams. Similar variation in vertical web reinforcement ratio leads to an average of 2.4 % and 15 % increment in cracking and ultimate load respectively with no appreciable effect on the load-deflection response.
Value Stream Mapping Worskshops for Intelligent Continuous SecurityMarc Hornbeek
This presentation provides detailed guidance and tools for conducting Current State and Future State Value Stream Mapping workshops for Intelligent Continuous Security.
"Boiler Feed Pump (BFP): Working, Applications, Advantages, and Limitations E...Infopitaara
A Boiler Feed Pump (BFP) is a critical component in thermal power plants. It supplies high-pressure water (feedwater) to the boiler, ensuring continuous steam generation.
⚙️ How a Boiler Feed Pump Works
Water Collection:
Feedwater is collected from the deaerator or feedwater tank.
Pressurization:
The pump increases water pressure using multiple impellers/stages in centrifugal types.
Discharge to Boiler:
Pressurized water is then supplied to the boiler drum or economizer section, depending on design.
🌀 Types of Boiler Feed Pumps
Centrifugal Pumps (most common):
Multistage for higher pressure.
Used in large thermal power stations.
Positive Displacement Pumps (less common):
For smaller or specific applications.
Precise flow control but less efficient for large volumes.
🛠️ Key Operations and Controls
Recirculation Line: Protects the pump from overheating at low flow.
Throttle Valve: Regulates flow based on boiler demand.
Control System: Often automated via DCS/PLC for variable load conditions.
Sealing & Cooling Systems: Prevent leakage and maintain pump health.
⚠️ Common BFP Issues
Cavitation due to low NPSH (Net Positive Suction Head).
Seal or bearing failure.
Overheating from improper flow or recirculation.
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfMohamedAbdelkader115
Glad to be one of only 14 members inside Kuwait to hold this credential.
Please check the members inside kuwait from this link:
https://www.rics.org/networking/find-a-member.html?firstname=&lastname=&town=&country=Kuwait&member_grade=(AssocRICS)&expert_witness=&accrediation=&page=1
RICS Membership-(The Royal Institution of Chartered Surveyors).pdfMohamedAbdelkader115
2009 cluster user training
1. Cluster User Training
From Bash to parallel jobs under SGE in one terrifying hour
Christopher Dwan, Bioteam
First delivered at IICB, Kolkata, India
December 14, 2009
3. Unix command line essentials
“pwd”: Print current working directory (where am I?)
“ls”: File listing
“ls –l”: Detailed file listing, including permissions
“cd”: Change directory
“chmod”: Change file permissions
“echo”: Print something to the terminal
“more”: Show contents of a text file
“pico”: Text editor
“man”: Documentation on any Unix command
4. Basic Unix Exercises: Making directories
What directory am I currently in?
Remora:~ cdwan$ pwd
/Users/cdwan/
Create a directory named “test_1”
Remora:~ cdwan$ mkdir test_1
Remora:~ cdwan$ ls
test_1
Change into that directory, and verify that we are there.
Remora:~ cdwan$ cd test_1
remora:test_1 cdwan$ pwd
/Users/cdwan/test_1
5. More basic Unix
Return to your home directory:
“cd” with no arguments
Exit the session:
“exit”
6. File editing.
“The best script editor” is the subject of an ongoing religious war in
technical circles
You should use the tool that does not get in your way.
“vi”: lightweight, complex, powerful, difficult to use
“emacs”: heavyweight, complex, powerful, difficult to use
“pico”: Possibly the simplest editor to use
To edit a file: “pico filename”
7. Hello world in ‘bash’
“Bash” is a shell scripting language.
– It is the default scripting language that you have at the terminal.
– i.e., you are already using it.
– We will take this command:
remora:test_1 cdwan$ echo "hello world"
hello world
And create a wrapper script to do the same thing:
remora:test_1 cdwan$ pico hello.sh
remora:test_1 cdwan$ chmod +x hello.sh
remora:test_1 cdwan$ ./hello.sh
hello world
8. Running a set of bash commands
Using pico, create a file named “hello.sh” containing a single line:
echo "hello world"
Exit pico. Verify the contents of the file:
remora:ex_1 cdwan$ more hello.sh
echo "hello world"
Then invoke it using the ‘bash’ interpreter:
remora:ex_1 cdwan$ bash hello.sh
hello world
9. Hello world script
remora:test_1 cdwan$ pico hello.sh
#!/bin/bash
echo "hello world"
The “#!” line tells the system to automatically run it using bash
Ctrl-O: Save the file
Ctrl-X: Exit pico
10. File permissions
Files have properties:
– Read, Write, Execute
– Three different types of user: “owner”, “group”, “everyone”
To take a script you have written and turn it into an executable program,
run “chmod +x” on it.
remora:test_1 cdwan$ ls -l hello.sh
-rw-r--r-- 1 cdwan staff 32 Dec 14 21:56 hello.sh
remora:test_1 cdwan$ chmod +x hello.sh
remora:test_1 cdwan$ ls -l hello.sh
-rwxr-xr-x 1 cdwan staff 32 Dec 14 21:56 hello.sh
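Not on the slide, but the same permission bits can be set numerically; a quick sketch (the filename `demo.sh` is just an example):

```shell
#!/bin/bash
# Octal permission modes: r=4, w=2, x=1, one digit each for owner/group/other.
touch demo.sh
chmod 755 demo.sh   # rwxr-xr-x: same result as "chmod +x" on a rw-r--r-- file
ls -l demo.sh
```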
11. Execute the hello world script
remora:test_1 cdwan$ pico hello.sh
remora:test_1 cdwan$ chmod +x hello.sh
remora:test_1 cdwan$ ./hello.sh
hello world
13. Most useful SGE commands
• qsub / qdel
– Submit jobs & delete jobs
• qstat & qhost
– Status info for queues, hosts and jobs
• qacct
– Summary info and reports on completed job
• qrsh
– Get an interactive shell on a cluster node
– Quickly run a command on a remote host
• qmon
– Launch the X11 GUI interface
Please do not copy, put online or redistribute. [email protected]
14. Interactive Sessions
To generate an interactive session, scheduled on any node:
“qlogin”
applecluster:~ cluster$ qlogin
Your job 145 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 145 has been successfully scheduled.
Establishing /common/node/ssh_wrapper session to host node002.cluster.private ...
The authenticity of host '[node002.cluster.private]:50726 ([192.168.2.2]:50726)'
can't be established.
RSA key fingerprint is a7:02:43:23:b6:ee:07:a8:0f:2b:6c:25:8a:3c:93:2b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[node002.cluster.private]:50726,[192.168.2.2]:50726'
(RSA) to the list of known hosts.
Last login: Thu Dec 3 09:55:42 2009 from portal2net.cluster.private
node002:~ cluster$
[email protected] BIP 1/8 0.00 darwin-x86
145 0.55500 QLOGIN cluster r 12/15/2009 09:15:08
15. Requesting the whole node
qlogin -pe threaded 8
[email protected] BIP 8/8 0.00 darwin-x86
146 0.55500 QLOGIN cluster r 12/15/2009 09:34:58 8
We request a parallel environment called “threaded” (note: this PE does not
exist by default in SGE; we create it in iNquiry)
We request 8 slots within that environment
Now, no other jobs will be scheduled to your node while that login is in
place.
16. Most basic job example
qsub -b y /bin/hostname
You will see two new files in your home directory:
hostname.oYYY
hostname.eYYY
YYY is the job id provided by the queuing system.
These are the standard output and standard error files from running
/bin/hostname on one of the nodes.
Argument “-b y” indicates that this is a compiled binary. SGE will not
try to parse the input, but merely run it.
17. Creating the sleeper script
#!/bin/bash
echo "hello world"
sleep 60
hostname
Then run like this:
remora:test_1 cdwan$ cp hello.sh sleeper.sh
remora:test_1 cdwan$ pico sleeper.sh
remora:test_1 cdwan$ ./sleeper.sh
hello world
remora.local
18. Submitting the sleeper script
genesis2:example bioteam$ qsub -cwd -S /bin/bash sleeper.sh
Your job 217 ("sleeper.sh") has been submitted
genesis2:example bioteam$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
----------------------------------------------------------------------
217 0.55500 sleeper.sh bioteam r 12/14/2009 12:00:46 [email protected] 1
19. Adding arguments directly into the script
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
echo "hello world"
sleep 60
hostname
Any comment that starts with “#$” is interpreted as an argument to
qsub.
In case of conflict, the command line wins.
20. More SGE commands
• “qstat -f”: Show all queues, even the empty ones
• “qstat -u *”: Show jobs from all users
• “qstat -f -u *”: Both all queues and all users
• “qdel job_id”: Delete a particular job
• “qdel -u cdwan”: Delete all jobs run by user cdwan
21. Giving your job a name
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N sleeper
echo "Hello world"
sleep 60
hostname
genesis2:demo bioteam$ qsub sleeper.sh
Your job 222 ("sleeper") has been submitted
22. Resource Requirements
genesis2:demo bioteam$ qsub -l arch=solaris sleeper.sh
Your job 225 ("sleeper") has been submitted
genesis2:demo bioteam$ qstat
job-ID prior name user state
----------------------------------------------
225 0.00000 sleeper bioteam qw
We specify a resource requirement that cannot be met (there are no solaris
machines in the cluster)
qstat -j 225 tells the story:
(-l arch=solaris) cannot run at host "node008.cluster.private" because it offers only
hl:arch=darwin-ppc
23. Environment variables from SGE
SGE sets several variables in the script for you.
– JOB_ID numerical ID of the job
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N sleeper
echo "My job id is " $JOB_ID
sleep 60
genesis2:demo bioteam$ more sleeper.o221
My job id is 221
24. Job dependencies
genesis2:demo bioteam$ qsub -N "primary" sleeper.sh
genesis2:demo bioteam$ qsub -hold_jid primary -N "secondary" sleeper.sh
Your job 224 ("secondary") has been submitted
genesis2:demo bioteam$ qstat
job-ID prior name user state
-----------------------------------------------
223 0.55500 primary bioteam r
224 0.00000 secondary bioteam hqw
26. Task arrays
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N task_array
#$ -o task.out
#$ -e task.err
echo "Job ID is" $JOB_ID "Task ID is" $SGE_TASK_ID
27. Task arrays
genesis2:example_2 bioteam$ qsub -t 1-10 task.sh
Your job-array 228.1-10:1 ("task_array") has been
submitted
genesis2:example_2 bioteam$ more task.out
Job ID is 228 Task ID is 10
Job ID is 228 Task ID is 1
Job ID is 228 Task ID is 3
Job ID is 228 Task ID is 4
• …
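A common pattern (a sketch, not from the slides) is to use $SGE_TASK_ID to pick the Nth input from a list, so one submission processes many files. The file `inputs.txt`, with one path per line, is hypothetical:

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N task_array_demo
# Select line number $SGE_TASK_ID from a hypothetical list of input files,
# so task 1 processes line 1, task 2 line 2, and so on.
INPUT=$(sed -n "${SGE_TASK_ID}p" inputs.txt)
echo "Task $SGE_TASK_ID processing $INPUT"
```

Submitted with "qsub -t 1-10 task_array_demo.sh", each task works on a different line of the list.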
28. Perl instead of bash
My “hello world” script, using Perl instead of Bash:
remora:~ cdwan$ more hello_world.pl
#!/usr/bin/perl
#$ -S /usr/bin/perl
sleep(60);
print "Hello world\n";
Submitted to the queuing system:
qsub hello_world.pl
30. Parallel Jobs
• A parallel job runs simultaneously across multiple servers
– Biggest SGE job I’ve heard of: Single application running across 63,000
CPU cores: TACC “Ranger” Cluster in Texas
– Distinct from ‘batches’ of processes: many independent tasks to be
done in any order
APPLICATION
32. Batch Jobs
Copyright 2006, The BioTeam. Not for Redistribution. http://bioteam.net
Private Ethernet Network
“Public” Ethernet Network
•Independent applications running at the same time
•Many jobs (batch)
•Maximum efficiency, simple to write
33. Tightly Coupled / Parallel
One parallel application running over the entire cluster
Private Ethernet Network
“Public” Ethernet Network
Application
•One job, where response time is important.
•Overall efficiency is lower
•Scalability is hard
34. Amdahl’s Law
Maximum expected speedup for
parallelizing any task
– Serial portion (non parallelizable)
– Parallel portion (can be parallelized)
Additionally:
– Cost associated with using more
machines (startup, teardown)
– At least, scheduling. Possibly some
other factors like communication
– Communication scales with number
of processes
Important to note:
Re-stating the problem can radically
alter the serial / parallel ratio
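The law on the slide can be sketched numerically with awk; the parallel fractions and core counts below are illustrative, not from the deck:

```shell
#!/bin/bash
# Amdahl's law: speedup(n) = 1 / (serial + parallel/n), where serial + parallel = 1.
speedup() {
  awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}
speedup 0.95 8      # 95% parallel on 8 CPUs    -> 5.93
speedup 0.95 1024   # adding CPUs saturates near 1/(1-p) = 20 -> 19.64
```

Even with 95% of the work parallelizable, 1024 CPUs buy less than a 20x speedup, which is why re-stating the problem to shrink the serial fraction matters so much.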
35. Network Latency
• Latency:
– Time to initiate communication
• Throughput:
– Data rate once communication is established
• Gigabit Ethernet:
– Latencies: ~100µs
– Throughput: up to 80% of wire speed (800Mb/sec)
– $10 / network port
• Myrinet / Infiniband:
– Latencies: ~3µs
– Throughput: 80% of wire speed
– $800 / network port
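The trade-off can be sketched with a simple model, time = latency + size / throughput; the numbers below are illustrative stand-ins for the two fabrics, not benchmarks:

```shell
#!/bin/bash
# Time to send one message: latency (seconds) + bytes / throughput (bytes/sec).
xfer_time() { # usage: xfer_time latency_sec bytes bytes_per_sec
  awk -v l="$1" -v b="$2" -v r="$3" 'BEGIN { printf "%.6f\n", l + b / r }'
}
# A 1 KB message is dominated by latency, so the low-latency fabric wins big:
xfer_time 0.0001   1024 100000000   # GigE-ish:       0.000110 s
xfer_time 0.000003 1024 100000000   # Infiniband-ish: 0.000013 s
```

For small, frequent messages (typical of tightly coupled MPI jobs) latency is what you pay for; for bulk transfers, throughput dominates.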
37. Parallel Jobs
Many different software implementations are used to support
parallel tasks:
– MPICH
– LAM-MPI
– OpenMPI
– PVM
– LINDA
No magic involved
– Requires work
– Your application must support parallel methods
38. Submitting a standalone MPI job
Build the code with a particular version of MPI:
genesis2:examples bioteam$ pwd
/common/mpich/ch_p4/examples
genesis2:examples root# which mpicc
/common/mpich/ch_p4/bin/mpicc
genesis2:examples root# mpicc cpi.c
genesis2:examples root# mpicc -o cpi cpi.o
Run without any MPI framework:
genesis2:examples root# ./cpi
Process 0 on genesis2.mit.edu
pi is approximately 3.1416009869231254, Error is
0.0000083333333323
wall clock time = 0.000214
39. Submitting a standalone MPI job (no SGE)
MPI needs to know which hosts to use: We create a hosts file which simply lists
the machine
genesis2:~ bioteam$ more hosts_file
node001
node002
node003
node004
Then start the job using ‘mpirun’
genesis2:~ bioteam$ mpirun -machinefile hosts_file
-np 4 /common/mpich/ch_p4/examples/cpi
Process 0 on genesis2.mit.edu
Process 2 on node002.cluster.private
Process 3 on node003.cluster.private
Process 1 on node001.cluster.private
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.002106
40. Critical Notes
In order to have MPICH jobs work reliably, you need to compile
and run them with the same version of MPICH.
/common/mpich
/common/mpich2
/common/mpich2-64
All user account issues must be in order for this to work.
– Password free ssh in particular
If the application does not work from the command line, SGE will
not help.
41. Loose Integration with SGE
Loose Integration
– Grid Engine used for:
• Picking when the job runs
• Picking where the job runs
• Generating the custom machine file
– Grid Engine does not:
• Launch or control the parallel job itself
• Track resource consumption or child processes
Advantages of loose integration
– Easy to set up
– Can trivially support almost any parallel application technology
Disadvantages of loose integration
– Grid Engine can’t track resource consumption
– Grid Engine must “trust” the parallel app to honor the custom hostfile
– Grid Engine can not kill runaway jobs
42. Tight integration with SGE
Tight Integration
– Grid Engine handles all aspects of parallel job operation from start to
finish
– Includes spawning and controlling all parallel processes
Tight integration advantages:
– Grid Engine remains in control
– Resource usage accurately tracked
– Standard commands like “qdel” will work
• Child tasks will not be forgotten about or left untouched
Tight Integration disadvantages:
– Can be really hard to implement
– Makes job debugging and troubleshooting harder
– May be application specific
43. Running an mpich job with loose SGE
integration
Step one: Job must work without SGE.
– Until you can demonstrate a running job using a host file and manual
start up, there is no point to involving SGE
Step two: Create a wrapper script to allow SGE to define the list of
hosts and the number of tasks
Step three: Submit that wrapper script into a ‘parallel
environment’
– Parallel environment manages all the host list details for you.
44. MPICH Wrapper for CPI
A trivial MPICH wrapper for Grid Engine:
#!/bin/bash
## ---- EMBEDDED SGE ARGUMENTS ----
#$ -N MPI_Job
#$ -pe mpich 4
#$ -cwd
#$ -S /bin/bash
## ------------------------------------
MPIRUN=/common/mpich/ch_p4/bin/mpirun
PROGRAM=/common/mpich/ch_p4/examples/cpi
export RSHCOMMAND=/usr/bin/ssh
echo "I got $NSLOTS slots to run on!"
$MPIRUN -np $NSLOTS -machinefile $TMPDIR/machines $PROGRAM
45. Job Execution
Submit just like any other SGE job:
[genesis2:~] bioteam% qsub submit_cpi
Your job 234 ("MPI_Job") has been submitted
Output files generated:
[genesis2:~] bioteam% ls -l *234
-rw-r--r-- 1 bioteam admin 185 Dec 15 20:55 MPI_Job.e234
-rw-r--r-- 1 bioteam admin 120 Dec 15 20:55 MPI_Job.o234
-rw-r--r-- 1 bioteam admin 52 Dec 15 20:55 MPI_Job.pe234
-rw-r--r-- 1 bioteam admin 104 Dec 15 20:55 MPI_Job.po234
46. Output
[genesis2:~] bioteam% more MPI_Job.o234
I got 5 slots to run on!
pi is approximately 3.1416009869231245, Error is
0.0000083333333314
wall clock time = 0.001697
[genesis2:~] bioteam% more MPI_Job.e234
Process 0 on node006.cluster.private
Process 1 on node006.cluster.private
Process 4 on node004.cluster.private
Process 2 on node013.cluster.private
Process 3 on node013.cluster.private
49. Behind the scenes: MPICH
The “startmpi.sh” script is run before job launches and
creates custom machine file
The user job script gets data required by ‘mpirun’ from
environment variables:
$NODES, $TMPDIR/machines, etc.
The “stopmpi.sh” script is just a placeholder
Does not really do anything (no need yet)
50. Behind the scenes: LAM-MPI
• Just like MPICH
• But 2 additions:
– The “lamstart.sh” script launches LAMBOOT
– The “lamstop.sh” script executes LAMHALT at job termination
• In an example configuration, lamboot is started this way:
– lamboot -v -ssi boot rsh -ssi rsh_agent "ssh -x
-q" $TMPDIR/machines
51. Behind the scenes: LAM-MPI
• A trivial LAM-MPI wrapper for Grid Engine:
#!/bin/sh
MPIRUN="/common/lam/bin/mpirun"
## ---- EMBEDDED SGE ARGUMENTS ----
#$ -N MPI_Job
#$ -pe lammpi 3-5
#$ -cwd
## ------------------------------------
echo "I have $NSLOTS slots to run on!"
$MPIRUN C ./mpi-program
52. OpenMPI
In absence of specific requirements, a great choice
Works well over Gigabit Ethernet
Trivial to achieve tight SGE integration
Recent personal experience:
– Out of the box: ‘cpi.c’ on 1024 CPUs
– Out of the box: heavyweight genome analysis pipeline on 650 Nehalem
cores
53. Behind the scenes: OpenMPI
OpenMPI 1.2.x natively supports automatic tight SGE integration
– Build from source with “--enable-sge”
– mpirun -np $NSLOTS /path-to-my-parallel-app
OpenMPI PE config:
pe_name openmpi
slots 4
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
55. Application profiling
• Most basic: System Monitoring
– ‘top’
– Ganglia
• Apple ‘shark’ tools
• Deep understanding of code.
56. Tuning parallel jobs
• Round Robin:
– Jobs are distributed to as many nodes as possible
– Good for tasks where memory may be the bottleneck
• Fill up:
– Jobs are packed onto as few nodes as possible
– Good for jobs where interprocess communications may be the
bottleneck
• Single chassis
– “threaded” environment from earlier sessions
– For multi-threaded programs (BLAST)