gaussian
Versions and Availability
Module Names for gaussian on qb2
Machine | Version | Module Name |
---|---|---|
qb2 | g09-d01(default) | gaussian/g09-d01(default) |
qb2 | g16-a03 | gaussian/g16-a03 |
qb2 | g16-b01 | gaussian/g16-b01 |
qb2 | g16-c01 | gaussian/g16-c01 |
Module FAQ
The information here is applicable to LSU HPC and LONI systems.
Shells
A user may choose between /bin/bash and /bin/tcsh. Details about each shell follow.
/bin/bash
System resource file: /etc/profile
When a user accesses the shell, the following user files are read if they exist (in order):
- ~/.bash_profile (anything sent to STDOUT or STDERR will cause things like rsync to break)
- ~/.bashrc (interactive login only)
- ~/.profile
When a user logs out of an interactive session, the file ~/.bash_logout is executed if it exists.
The default value of the environmental variable, PATH, is set automatically using Modules. See below for more information.
/bin/tcsh
The file ~/.cshrc is used to customize the user's environment if their login shell is /bin/tcsh.
Modules
Modules is a utility which helps users manage the complex business of setting up their shell environment in the face of potentially conflicting application versions and libraries.
Default Setup
When a user logs in, the system looks for a file named .modules in their home directory. This file contains module commands to set up the initial shell environment.
Viewing Available Modules
The command
$ module avail
displays a list of all the modules available. The list will look something like:
--- some stuff deleted ---
velvet/1.2.10/INTEL-14.0.2
vmatch/2.2.2

---------------- /usr/local/packages/Modules/modulefiles/admin -----------------
EasyBuild/1.11.1   GCC/4.9.0      INTEL-140-MPICH/3.1.1
EasyBuild/1.13.0   INTEL/14.0.2   INTEL-140-MVAPICH2/2.0
--- some stuff deleted ---
The module names take the form appname/version/compiler, providing the application name, the version, and information about how it was compiled (if needed).
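As a minimal sketch of that naming scheme, the bash function below splits a module name into its three parts. The function name `parse_module_name` is made up for this illustration and is not part of the Modules utility; the compiler field is reported as `none` when the name has only two parts.

```shell
#!/bin/bash
# Split a module name of the form appname/version/compiler into its parts.
# parse_module_name is a hypothetical helper, not a Modules command.
parse_module_name() {
    local app version compiler
    # IFS=/ applies only to this read; the third field may be empty.
    IFS=/ read -r app version compiler <<< "$1"
    echo "app=${app} version=${version} compiler=${compiler:-none}"
}

parse_module_name "velvet/1.2.10/INTEL-14.0.2"
parse_module_name "gaussian/g16-c01"
```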
Managing Modules
Besides avail, there are other basic module commands to use for manipulating the environment. These include:
add/load mod1 mod2 ... modn . . . Add modules
rm/unload mod1 mod2 ... modn  . . Remove modules
switch/swap mod . . . . . . . . . Switch or swap one module for another
list  . . . . . . . . . . . . . . List modules loaded in the environment
display/show mod  . . . . . . . . Show the environment changes a module makes
avail . . . . . . . . . . . . . . List available module names
whatis mod1 mod2 ... modn . . . . Describe listed modules
The -h option to module will list all available commands.
Did not find the version you want to use?
If a software package you would like to use for your research is not available on a cluster, you can request that it be installed. Software requests are evaluated by the HPC staff on a case-by-case basis. Before you send in a software request, please go through the information below.
Types of request
Depending on how many users need to use the software, software requests are divided into three types, each of which corresponds to the location where the software is installed:
- The user's home directory
- Software packages installed here will be accessible only to the user.
- It is suitable for software packages that will be used by a single user.
- Python, Perl and R modules should be installed here.
- /project
- Software packages installed in /project can be accessed by a group of users.
- It is suitable for software packages that
- need to be shared by users from the same research group, or
- are bigger than the quota on the home file system.
- This type of request must be sent by the PI of the research group, who may be asked to apply for a storage allocation.
- /usr/local/packages
- Software packages installed under /usr/local/packages can be accessed by all users.
- It is suitable for software packages that will be used by users from multiple research groups.
- This type of request must be sent by the PI of a research group.
How to request
Please send an email to sys-help@loni.org with the following information:
- Your user name
- The name of the cluster where you want to use the requested software
- The name, version and download link of the software
- Specific installation instructions if any (e.g. compiler flags, variants and flavor, etc.)
- Why the software is needed
- Where the software should be installed (locally, /project, or /usr/local/packages) and justification explaining how many users are expected.
Please note that, once the software is installed, testing and validation are the user's responsibility.
About the Software
Usage
Gaussian is run from the command line, and does not provide a graphical interface. Thus interactive and batch job usage is the same. The TCP Linda extension is required to run Gaussian in parallel using more than 1 node per job. Currently only LSU has such a license.
Please refer to the Common Problems FAQ below or the Gaussian User Manual for the memory requirements of your Gaussian job.
An input file is used to specify the desired calculations. It may be as simple as:
%chk=water.chk
# HF/6-31G(d)

water energy Title section

0 1
O  -0.464   0.177   0.0
H  -0.464   1.137   0.0
H   0.441  -0.143   0.0
Please refer to the program documentation for details.
Once an input file has been created, the next step is creating a PBS or SLURM job file.
QSub FAQ
Portable Batch System: qsub
qsub
All HPC@LSU clusters use the Portable Batch System (PBS) for production processing. Jobs are submitted to PBS using the qsub command. A PBS job file is basically a shell script which also contains directives for PBS.
Usage
$ qsub job_script
Where job_script is the name of the file containing the script.
PBS Directives
PBS directives take the form:
#PBS -X value
Where X is one of many single letter options, and value is the desired setting. All PBS directives must appear before any active shell statement.
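The ordering rule can be checked before submission. The sketch below is only an illustration: `check_pbs_order` is a hypothetical helper, not a PBS tool, and it treats every line that is neither blank nor a comment as an active shell statement, which is a simplification.

```shell
#!/bin/bash
# Report #PBS directives that appear after the first active shell statement.
# Returns 0 when every directive comes first, 1 otherwise.
# check_pbs_order is a hypothetical helper written for this example.
check_pbs_order() {
    awk '
        BEGIN { seen_cmd = 0; bad = 0 }
        /^#PBS/ {
            if (seen_cmd) { bad = 1; print "line " NR ": " $0 }
            next
        }
        /^[ \t]*(#|$)/ { next }   # shebang, comments, blank lines
        { seen_cmd = 1 }          # anything else counts as an active statement
        END { exit bad }
    ' "$1"
}
```

A possible use is `check_pbs_order job_script && qsub job_script`, so that a misplaced directive is caught before the job is queued.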
Example Job Script
#!/bin/bash
#
# Use "workq" as the job queue, and specify the allocation code.
#
#PBS -q workq
#PBS -A your_allocation_code
#
# Assuming you want to run 16 processes, and each node supports 4 processes,
# you need to ask for a total of 4 nodes. The number of processes per node
# will vary from machine to machine, so double-check that you have the right
# values before submitting the job.
#
#PBS -l nodes=4:ppn=4
#
# Set the maximum wall-clock time. In this case, 10 minutes.
#
#PBS -l walltime=00:10:00
#
# Specify the name of a file which will receive all standard output,
# and merge standard error with standard output.
#
#PBS -o /scratch/myName/parallel/output
#PBS -j oe
#
# Give the job a name so it can be easily tracked with qstat.
#
#PBS -N MyParJob
#
# That is it for PBS instructions. The rest of the file is a shell script.
#
# PLEASE ADOPT THE EXECUTION SCHEME USED HERE IN YOUR OWN PBS SCRIPTS:
#
# 1. Copy the necessary files from your home directory to your scratch directory.
# 2. Execute in your scratch directory.
# 3. Copy any necessary files back to your home directory.

# Let's mark the time things get started.
date

# Set some handy environment variables.
export HOME_DIR=/home/$USER/parallel
export WORK_DIR=/scratch/myName/parallel

# Set a variable that will be used to tell MPI how many processes will be run.
# This makes sure MPI gets the same information provided to PBS above.
export NPROCS=`wc -l $PBS_NODEFILE |gawk '//{print $1}'`

# Copy the files, jump to WORK_DIR, and execute! The program is named "hydro".
cp $HOME_DIR/hydro $WORK_DIR
cd $WORK_DIR
mpirun -machinefile $PBS_NODEFILE -np $NPROCS $WORK_DIR/hydro

# Mark the time processing ends.
date

# And we're out'a here!
exit 0
An example PBS batch job file follows:
#!/bin/tcsh
#PBS -A your_allocation     # specify the allocation. Change it to your allocation
#PBS -q checkpt             # the queue to be used.
#PBS -l nodes=1:ppn=4       # Number of nodes and processors
#PBS -l walltime=1:00:00    # requested Wall-clock time.
#PBS -o g09_output          # name of the standard out file to be "g09_output".
#PBS -j oe                  # standard error output merge to the standard output file.
#PBS -N g09test             # name of the job (that will appear on executing the qstat command).
#
# cd to the directory with Your input file
cd ~USER/g09test
#
# Change this line to reflect your input file and output file
g09 water.inp
An example SLURM batch job file follows:
#!/bin/bash
#SBATCH -A loni_loniadmin1      # specify the allocation. Change it to your allocation
#SBATCH -p checkpt              # the queue to be used.
#SBATCH -N 1
#SBATCH -n 48                   # Number of nodes and processors
#SBATCH -t 2:00:00              # requested Wall-clock time.
#SBATCH -o slurm-%x-%j.out-%N   # name of the standard out file to be "slurm-g16-100460.out-qbc185".
#SBATCH -J g16_job              # name of the job (that will appear on executing the squeue command).
#
# cd to the directory with Your input file
cd $SLURM_SUBMIT_DIR
#
# Change this line to reflect your input file and output file
g16 myinput.com
Multi-node Job Submission
Below is an example PBS job script for running gaussian/g16-b01 on QB2 (Ref: https://github.com/ResearchComputing/Documentation/wiki/Gaussian#parallel-jobs)
#!/bin/bash
#PBS -N g16job
#PBS -l nodes=2:ppn=20
#PBS -l walltime=01:00:00
#PBS -A your_allocation

module load gaussian/g16-b01

# change to the submission directory first, so that the node list is
# written to the same directory it is later read from
cd $PBS_O_WORKDIR

for n in $(cat $PBS_NODEFILE | uniq); do
    echo ${n}
done | paste -s -d, > nodes.$PBS_JOBID

# the next line prevents OpenMP parallelism from conflicting with Gaussian's internal parallelization
export OMP_NUM_THREADS=1

# increases the verbosity of Linda output messages
export GAUSS_LFLAGS="-v"

date
g16 -p=20 -w=$(cat nodes.$PBS_JOBID) myinput
date
Below is an example SLURM job script for running gaussian/g16-b01 on QB3 (Ref: https://curc.readthedocs.io/en/latest/software/gaussian.html)
#!/bin/bash
#SBATCH -p checkpt
#SBATCH -N 2
#SBATCH -n 48
#SBATCH -c 1
#SBATCH -t 2:00:00
#SBATCH -A loni_loniadmin1
#SBATCH -J g16
#SBATCH -o slurm-%x-%j.out-%N

module load gaussian/g16-b01

cd $SLURM_SUBMIT_DIR

for n in `scontrol show hostname | sort -u`; do
    echo ${n}
done | paste -s -d, > nodes.$SLURM_JOBID

# the next line prevents OpenMP parallelism from conflicting with Gaussian's internal parallelization
export OMP_NUM_THREADS=1

# increases the verbosity of Linda output messages
export GAUSS_LFLAGS="-v"

g16 -p=48 -w=$(cat nodes.$SLURM_JOBID) myinput.com

#End-of-file (EOF)
Contents of myinput (or myinput.com):
#P b3lyp/6-31g* test stable=(opt,qconly)

Gaussian Test Job 135: Fe=O perpendicular to ethene, in triplet state.

0 3
X
Fe X RXFe
C1 X RXC Fe 90.
C2 X RXC Fe 90. C1 180.
O X RXO C1 90. Fe 0.
H1 C1 RCH C2 CCH Fe Angle1
H2 C1 RCH C2 CCH Fe -Angle1
H3 C2 RCH C1 CCH Fe Angle2
H4 C2 RCH C1 CCH Fe -Angle2

RXFe 1.7118
RXC 0.7560
RXO 3.1306
RCH 1.1000
Angle1 110.54
Angle2 110.53
CCH 117.81
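The node-list step in the multi-node job scripts above can be tried outside a batch job. The sketch below assumes a node file with one host name per line, repeated once per core, as in $PBS_NODEFILE; `join_nodes` and the host names are made up for the demonstration. It uses `sort -u` rather than `uniq`, so the hosts come out sorted instead of in file order.

```shell
#!/bin/bash
# Turn a node file (one host name per line, repeated per core) into a
# comma-separated list of unique hosts, the form passed to g16 via -w.
# join_nodes is a hypothetical helper written for this example.
join_nodes() {
    sort -u "$1" | paste -s -d, -
}

# Demo with a made-up node file; inside a PBS job this would be $PBS_NODEFILE.
demo=$(mktemp)
printf 'qb001\nqb001\nqb002\nqb002\n' > "$demo"
join_nodes "$demo"
rm -f "$demo"
```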
Below are examples for using g09
TCP Linda is required to run gaussian jobs on more than one node
#!/bin/bash
#PBS -q checkpt
#PBS -l nodes=2:ppn=16
#PBS -l cput=00:20:00
#PBS -l walltime=00:20:00
#PBS -o output-file
#PBS -j oe
#PBS -V
#PBS -N jobtest

export WORK_DIR=$PBS_O_WORKDIR
cd $WORK_DIR
cat $PBS_NODEFILE | sort | uniq > /tmp/.nodes.$PBS_JOBID
export GAUSS_LFLAGS="-nodefile /tmp/.nodes.$PBS_JOBID"
g09 < input.inp > output.log
In the input file used above, input.inp, the Link 0 directives should be:
%UseSSH
%chk=/work/mmcken6/g09.chk
%mem=16mw
%nprocshared=16     Note: must match ppn in the job submission script
%NprocLinda=2       Note: must match nodes in the job submission script
(rest of input) ...
Note that the "%UseSSH" directive is necessary for Linda jobs, which may fail otherwise. Alternatively, you can add the "-opt Tsnet.Node.lindarsharg: ssh" flag to the g09 command, which has the same effect for Linda jobs.
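Because %nprocshared and %NprocLinda must agree with the ppn and nodes values requested from PBS, one option is to generate those lines from the same numbers. This is a sketch only: `link0_header` is a hypothetical helper, the checkpoint path is a placeholder, and %mem is left out because its value depends on the node.

```shell
#!/bin/bash
# Print Link 0 lines for a Linda job from the ppn and nodes values used in
# the job script. link0_header is a hypothetical helper for this example.
link0_header() {
    local ppn=$1 nodes=$2 chk=$3
    cat <<EOF
%UseSSH
%chk=${chk}
%nprocshared=${ppn}
%NprocLinda=${nodes}
EOF
}

# Matches "#PBS -l nodes=2:ppn=16"; g09.chk is a placeholder file name.
link0_header 16 2 g09.chk
```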
Resources
- The Gaussian 09 Manual
- The Gaussian 03 Manual is no longer available online.
Common Problems FAQ
Gaussian Common Problems
There are a few common Gaussian problems that can be easily resolved. These issues usually stem from disk or memory space limitations.
Memory Requirements
%Mem=N sets the amount of dynamic memory used to N 8-byte words (default); this value may also be followed by KB, MB, GB, KW, MW or GW (without intervening spaces) to specify units of kilo-, mega- or giga- bytes or words. The default memory size is 256 MB.
All LONI clusters and the LSU HPC Tezpur cluster have only 4GB RAM per node. For running jobs on these clusters, the value of N should not be greater than 3500MB or 450MW.
LSU HPC clusters such as Philip, Pandora and SuperMike II have 24/48/96, 128 and 32 GB RAM per node respectively. The maximum value of N should be 120GB or 15GW on Pandora, 28GB or 3500MW on SuperMike II and 20/40/90GB on Philip (depending on queue).
If you use a value of N greater than these values, your job will spill into virtual memory. This not only makes the job run slower but also causes excessive swapping, which can bring down the node. If your jobs repeatedly use more memory than is available on the node and/or bring down the compute node, your privileges to use the cluster will be suspended.
LSU HPC users have access to TCP Linda to run gaussian jobs on multiple nodes. Note that the %Mem=N sets the amount of dynamic memory per node and not total memory for the job, so the maximum value of N should be the same as described above.
You can estimate the amount of memory in 8-byte words that your job will require using the formula
$N = M + 2N_B^2$
where $N_B$ is the number of basis functions used in the calculation, and $M$ is a minimum value that is usually generously covered by the default memory size.
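The formula above is easy to evaluate in the shell. The sketch below is illustrative: the function names are made up, and M is passed in by the caller because this page does not give a specific value for it. The conversion to bytes uses the 8-byte word size stated earlier.

```shell
#!/bin/bash
# Estimate N = M + 2*NB^2 in 8-byte words.
# Arguments: M (words), NB (number of basis functions).
# Both function names are hypothetical helpers for this example.
estimate_mem_words() {
    local m=$1 nb=$2
    echo $(( m + 2 * nb * nb ))
}

# Convert a count of 8-byte words to bytes.
words_to_bytes() {
    echo $(( $1 * 8 ))
}

# M = 0 is a placeholder for the demonstration, not a Gaussian default.
words=$(estimate_mem_words 0 500)
echo "words: ${words}"
echo "bytes: $(words_to_bytes "${words}")"
```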
Please refer to the Gaussian manual regarding Link 0 Commands and Efficiency Considerations for more details.
Scaling
First one needs to understand the basic run-time needs of Gaussian calculations. The table below is the Formal Scaling Behavior of Gaussian, in which $N$ = the number of basis functions. Use this table to determine how much work will be required, compared to current selections, if $N$ is increased (e.g. if the behavior is $N^4$, doubling $N$ would result in 16 times more work).
Scaling Behavior | Method(s) |
---|---|
$N^4$ | HF |
$N^5$ | MP2 |
$N^6$ | MP3, CISD, CCSD, QCISD |
$N^7$ | MP4, CCSD(T), QCISD(T) |
$N^8$ | MP5, CISDT, CCSDT |
$N^9$ | MP6 |
$N^{10}$ | MP7, CISDTQ, CCSDTQ |
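To read the table as a multiplier, raise the growth factor of the basis set to the exponent of the method. A minimal bash sketch follows; `work_ratio` is a hypothetical helper and handles integer factors only, since it relies on shell integer arithmetic.

```shell
#!/bin/bash
# Formal increase in work when the number of basis functions grows by
# an integer factor, for a method that scales as N^exponent.
# work_ratio is a hypothetical helper written for this example.
work_ratio() {
    local factor=$1 exponent=$2
    echo $(( factor ** exponent ))
}

work_ratio 2 4   # HF: doubling the basis size
work_ratio 2 7   # CCSD(T): doubling the basis size
```

These are formal scaling figures taken from the exponents in the table, not measured run times.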
Large files and memory usage
Computational cost and demand increase quickly when trying to obtain accuracies better than the MP2 level. On the other hand, one can supply a large molecule at a lower level of theory and still come across the same disk/memory errors.
If one has a large model and needs a good electron correlation method, starting this calculation from an initial guess wave function will likely cause it to fail instantly. A typical route to such accuracies with a large model begins with a good initial guess of the wave function at a lower level of theory. In this method, one uses the orbital coefficients from the lower level of theory calculations, projects them onto a larger basis set, and uses that as an initial guess for the high level of theory. Every chemical model is different; take care at each step, and consider repeating the calculation with a different set of inputs to check that it converges properly.
For instance, to run a large model at the MP2/6-311G** level of theory:
- Optimize the wave function at HF/3-21G
- Re-optimize at MP2/6-31G*
- Re-optimize at MP2/6-311G**
When restarting the calculation, the following Guess, Geom and SCF options are important:
- Guess=Read
- Reads the initial guess from the checkpoint file. If the basis set specified is different from the basis set used in the job which generated the checkpoint file, then the wave function will be projected from one basis to the other. This is an efficient way to switch from one basis to another.
- Geom=AllCheckpoint
- Reads the molecular geometry, charge, multiplicity and title from the checkpoint file. This is often used to start a second calculation at a different level of theory.
- SCF=Restart
- Restarts the SCF from the checkpoint file.
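As an illustration of how these options fit together, the sketch below prints the input for one follow-up step in such a chain. It assumes the previous step already wrote the named checkpoint file; `restart_input` and `model.chk` are placeholders, and a real job may need further keywords in the route section.

```shell
#!/bin/bash
# Print a Gaussian input that reads the geometry and the initial guess
# from an existing checkpoint file and runs at a new level of theory.
# restart_input is a hypothetical helper; model.chk is a placeholder.
restart_input() {
    local chk=$1 level=$2
    cat <<EOF
%chk=${chk}
# ${level} Guess=Read Geom=AllCheckpoint

EOF
}

# Second step of the example chain: re-optimize at MP2/6-31G*.
restart_input model.chk 'MP2/6-31G*'
```

With Geom=AllCheckpoint the title, charge and multiplicity come from the checkpoint file, which is why the generated input contains only the Link 0 line and the route section.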
Break up restart files
Sometimes, when writing a large restart file, Gaussian will crash, complaining that shared memory is too small or that there is not enough memory. This is caused by reading/writing too much information at one time. One can split the read-write restart file (*.rwf) across several files with:
%rwf=/work/username/tmp1,2GB,/work/username/tmp2,2GB,/work/username/tmp3,2GB
If the last file is given without a size, the rest of the rwf is written to that file.
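For more than a few segments, the %rwf line can be assembled rather than typed. The sketch below uses one size for every segment; `rwf_line` is a hypothetical helper, and the paths are the placeholders from the example above.

```shell
#!/bin/bash
# Build a %rwf line that splits the read-write file across several
# locations, using the same size limit for each segment.
# rwf_line is a hypothetical helper written for this example.
rwf_line() {
    local size=$1; shift
    local parts=() dir
    for dir in "$@"; do
        parts+=("${dir}" "${size}")
    done
    local IFS=,               # join the array elements with commas
    echo "%rwf=${parts[*]}"
}

rwf_line 2GB /work/username/tmp1 /work/username/tmp2 /work/username/tmp3
```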
One problem, two solutions
If the same problem yields two different solutions (either the same calculation on two different machines, or the same calculation run at different times on the same machine), an incorrect restart file is the likely cause. Check the output of your calculations, namely the NOrb value (a different number of orbitals will likely produce a different energy result).
Refer to the Gaussian manual for more information on memory and disk space usage.
Warnings not to be ignored
Warning!!: The largest alpha MO coefficient is
This warning is usually associated with post-HF calculations (MP2 or CC). Although it is not an error and will not cause your job to crash, it is an important warning about the accuracy of your calculation. It occurs when there are near-linear dependencies in the basis set. For instance, diffuse functions on two close atoms are likely linearly dependent. When transforming to molecular orbitals, the atomic orbital integrals are multiplied by all the molecular orbital coefficients. The accuracy of the molecular orbitals will decrease since one or more of these coefficients are very large.
Last modified: September 17 2021 12:09:19.