Integrated Computational Materials Engineering (ICME)

Running a PBS script with LAMMPS

Abstract

This example shows how to run LAMMPS (or any other UNIX executable) on a UNIX cluster that uses the PBS batch scripting language (e.g., Raptor and Talon at CAVS). It also shows how to run LAMMPS on a cluster that does not use PBS scripting (e.g., Javelin and Bazooka at CAVS). Both cases use the LAMMPS input script from Tutorial 1. Note that these scripts are written for the HPC clusters at Mississippi State University; alterations may be required for other clusters and universities.

Author(s): Mark A. Tschopp

LAMMPS Input File

Download an input file

This input script was run using the Jan 2010 version of LAMMPS. Changes to some commands in later versions may require revision of the script. Copy the text below and paste it into a text file, 'calc_fcc.in'. Use the 'Paste Special' command with 'Unformatted Text'. Notice that the replicate command is used in the script to create a 20 x 20 x 20 simulation cell (32,000 atoms), which will be run on 16 processors. As in Tutorial 1, the Al99.eam.alloy potential file must be located in the working directory (or be specified by its full path). Notice that we obtain the same cohesive energy as the 4-atom cell in Tutorial 1.

# Find minimum energy fcc configuration
# Mark Tschopp, 2010

# ---------- Initialize Simulation --------------------- 
clear 
units metal 
dimension 3 
boundary p p p 
atom_style atomic 
atom_modify map array

# ---------- Create Atoms --------------------- 
lattice 	fcc 4
region	box block 0 1 0 1 0 1 units lattice
create_box	1 box
lattice	fcc 4 orient x 1 0 0 orient y 0 1 0 orient z 0 0 1  
create_atoms 1 box
replicate 20 20 20

# ---------- Define Interatomic Potential --------------------- 
pair_style eam/alloy 
pair_coeff * * Al99.eam.alloy Al
neighbor 2.0 bin 
neigh_modify delay 10 check yes 
 
# ---------- Define Settings --------------------- 
compute eng all pe/atom 
compute eatoms all reduce sum c_eng 

# ---------- Run Minimization --------------------- 
reset_timestep 0 
fix 1 all box/relax iso 0.0 vmax 0.001
thermo 10 
thermo_style custom step pe lx ly lz press pxx pyy pzz c_eatoms 
min_style cg 
minimize 1e-25 1e-25 5000 10000 

variable natoms equal "count(all)" 
variable teng equal "c_eatoms"
variable length equal "lx/20"  # box length spans 20 unit cells after replicate
variable ecoh equal "v_teng/v_natoms"

print "Total energy (eV) = ${teng};"
print "Number of atoms = ${natoms};"
print "Lattice constant (Angstoms) = ${length};"
print "Cohesive energy (eV) = ${ecoh};"

print "All done!"

PBS batch script

Here is an example batch script for Raptor. Copy the text below and paste it into a text file, 'pbs_Raptor_calc_fcc.txt'. Use the 'Paste Special' command with 'Unformatted Text'.

#!/bin/sh 
#PBS -N calc_fcc 
#PBS -q q16p192h@Raptor 
#PBS -l nodes=4:ppn=4 
#PBS -l walltime=192:00:00 
#PBS -m ea 
#PBS -r n 
#PBS -V 
cd $PBS_O_WORKDIR 
mpirun -np 16 lmp_exe < calc_fcc.in 
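
Note that lmp_exe above is a generic placeholder rather than a standard executable name; substitute the name or full path of the LAMMPS binary actually installed on your cluster. For example (the path below is hypothetical):

LMP=/usr/local/bin/lmp_openmpi    # hypothetical path; use your cluster's LAMMPS build
mpirun -np 16 $LMP < calc_fcc.in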

Here is an example batch script for Talon. Copy the text below and paste it into a text file, 'pbs_Talon_calc_fcc.txt'. Use the 'Paste Special' command with 'Unformatted Text'.

#!/bin/sh 
#PBS -N calc_fcc
#PBS -q q192p48h@Talon 
#PBS -l nodes=16:ppn=12 
#PBS -l walltime=48:00:00 
#PBS -m bea 
#PBS -r n 
#PBS -V 
cd $PBS_O_WORKDIR 
mpirun -np 192 lmp_exe < calc_fcc.in

Here is an example batch script for running the MATLAB script, "run_MATLAB_script_Raptor.m", on Raptor (note that MATLAB script names may not contain hyphens, so underscores are used throughout). Copy the text below and paste it into a text file, 'pbs_Raptor_MATLAB.txt'. Use the 'Paste Special' command with 'Unformatted Text'.

#!/bin/sh 
#PBS -N run_MATLAB 
#PBS -q q16p192h@Raptor 
#PBS -l nodes=4:ppn=4 
#PBS -l walltime=192:00:00 
#PBS -m ea 
#PBS -r n 
#PBS -V 
cd $PBS_O_WORKDIR 
matlab -nodesktop -nodisplay -nosplash -nojvm -r "run_MATLAB_script_Raptor;"

Running simulations using a batch script

Here are the steps that you need to do for running on Raptor:
  1. Open up a Secure Shell Client on your computer.
  2. Quick connect using hostname "raptor-login" with your user name.
  3. Congratulations! You are now logged on to the login node for Raptor. Do not run any simulations on this node - it is meant only for submitting PBS scripts, which the scheduler then assigns to the compute nodes.
  4. Type "swsetup pbs" to set up the paths to the PBS scheduler.
  5. Change to the directory that contains your input script and your PBS script, i.e., "cd work_directory", where work_directory is the directory containing your scripts.
  6. Type "qsub pbs_Raptor_calc_fcc.txt" and hit enter. Your job has been submitted and will run when the scheduler can fit it in. A typical session is sketched below.

The same steps can be used for Talon: log in to "talon-login" with your user name, change to the directory containing your scripts, type "qsub pbs_Talon_calc_fcc.txt", and hit enter.

A few things about the PBS scheduler:
  • There are numerous websites that explain the #PBS directives used in these files. Consult them if you have questions.
  • The "#PBS -q q192p48h@Talon" line tells the scheduler what queue you will be running on. In this queue, you can request a maximum of 192 processors for 48 hours. A quick trick for checking what queues may be available on a cluster is to type "qstat" and see what other people are using.
  • The "#PBS -l nodes=16:ppn=12" line is specific to the cluster that you are running on. For Talon, there are 12 processors per node. Therefore, 16 nodes are needed for 192 processors.
  • The "cd $PBS_O_WORKDIR" line changes the directory to whatever directory you submitted your pbs script from. When the PBS script is submitted, certain variables are stored with it, the $PBS_O_WORKDIR being one of them.
  • The PBS scheduler can handle many submitted scripts at once (if you have a lot of jobs to run) and will prioritize which job runs next.
  • The "qstat" command can be used to display the status of your job.
  • If you know approximately how long your job will take, you can request fewer hours in the "#PBS -l walltime=48:00:00" line (and fewer processors in the nodes line). Sometimes the scheduler can fit smaller simulations in between the larger simulations that are scheduled to run. For instance, consider a job that requests 192 processors for 48 hours when only 96 are free. If another 96 processors will not become available for 24 hours, the scheduler can fit in a job that requires only 96 processors for 24 hours while it waits for the larger job's processors. The whole goal of the PBS scheduling system is to keep processor usage as close to 100% as possible.
  • Adjust the number of processors through the "#PBS -l nodes=16:ppn=12" line, not the queue line. For example, if you only require 96 processors, change this line to "#PBS -l nodes=8:ppn=12". Remember to alter your mpirun line to match, e.g., "mpirun -np 96 lmp_exe < calc_fcc.in"! A way to keep these two lines consistent automatically is sketched below.
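
One way to avoid mismatches between the nodes/ppn request and the mpirun line is to count the processors that PBS actually allocated. On most PBS installations, the scheduler writes the list of assigned cores, one per line, to the file named by $PBS_NODEFILE, so the script can compute -np itself. A minimal sketch for a 96-processor Talon job:

#!/bin/sh 
#PBS -N calc_fcc
#PBS -q q192p48h@Talon 
#PBS -l nodes=8:ppn=12 
#PBS -l walltime=48:00:00 
#PBS -r n 
#PBS -V 
cd $PBS_O_WORKDIR 
# $PBS_NODEFILE lists one line per allocated core
NP=`wc -l < $PBS_NODEFILE`
mpirun -np $NP lmp_exe < calc_fcc.in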

Without PBS Scheduler

This is easy: just use the executable line from the PBS batch script directly.
  1. Open up a Secure Shell Client on your computer.
  2. Quick connect using hostname "javelin" with your user name.
  3. Congratulations! You are now logged on to Javelin. Unlike clusters with PBS, where jobs must be submitted from the login node through a scheduler, here you can run your job on multiple processors directly. There is still a scheduler that tries to balance the load, though.
  4. Change to the directory that contains your input script, i.e., "cd work_directory", where work_directory is the directory containing your script.
  5. Type "lmp_exe < calc_fcc.in" to run on 1 processor or "mpirun -np 12 lmp_exe < calc_fcc.in" to run on 12 processors.
IMPORTANT NOTES:
  1. There are far fewer processors on Javelin, so please do not request 128 processors on a cluster that has only 24, for example. The run will start, but the communication between processes will severely slow down the calculation.
  2. Understand that processors may switch between jobs on the fly to try to balance the load, so your job can run on top of someone else's. If all the processors are in use, you can still run your job, but it will be slowed down by the other jobs that are running. This is the advantage of running on a cluster with a PBS scheduler - you get sole possession of the processors while your job runs.
  3. Clusters without a scheduler are often useful for debugging and setting up simulations prior to running on the bigger clusters with PBS schedulers, because a relatively small simulation does not have to go through a scheduler before it executes.
  4. How many processors should I use? Type 'top' to see what is currently running. If nothing is running and your simulation is short, feel free to use more processors than you would if many processes were running. Type 'q' to quit the 'top' screen. A short sketch of this check is given below.
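
As a quick sketch of that check (nproc and uptime are standard on most Linux systems; on older systems, /proc/cpuinfo can be read instead):

nproc                              # number of processors on this machine
uptime                             # load averages over the last 1, 5, and 15 minutes
grep -c processor /proc/cpuinfo    # alternative processor count

If the 1-minute load average is already close to the processor count, the machine is busy, and a small -np is the polite choice.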