Underutilized: E131 - Too Many Unused Nodes Message
This error occurs when only one of the nodes assigned to a job is doing work and all the others are idle. In the example below, 31 of the job's 32 nodes were found to be inactive. The load value is the number of user processes running on a node; ideally there is one user process per core. This job requested 32 nodes with 20 cores each, so the total load should have been about 640, spread evenly at roughly 20 per node. Instead, all the processes were started on a single node, smic252 (see line 9), which carried a load of 639.92 while the other listed nodes showed loads near 0. That oversubscription starved the node of resources: the job ran for almost 4 hours of wall-clock time but consumed no CPU time (see line 14). Because such a situation can destabilize the system, the job was terminated. Lines 13 to 32 show the information collected by the PBS job manager, and lines 33 to 42 show the user account and allocation information.
 1) E131 - Too many unused nodes
 2) Job 74236 has 31 unused nodes.
 3) Please correct this problem.
 4)
 5) Job deleted
 6)
 7) PBS job: 74236, nodes: 32
 8) Hostname  Days  Load    CPU  U#(User:Process:VirtualMemory:Memory:Hours)
 9) smic252   34    639.92  0    0
10) smic253   34    0.16    0    0
11) smic254   34    0.11    0    0
12) . . . 29 similar lines removed . . .
13) PBS_job=74236 user=flast allocation=hpc_alloc03
14) queue=checkpt total_load=641.57 cpu_hours=0.00 wall_hours=3.90
15) unused_nodes=31 total_nodes=32 ppn=20 avg_load=20.04
16) avg_cpu=0% avg_mem=0mb avg_vmem=0mb top_proc=none:0.0hr:0%
17) node_processes=0
18)
19) Node statistics::
20) Number of nodes: 32
21) Number of cores: 640
22) Total physical memory per node: 64364mb
23) Average memory usage per node: 0mb, 0%
24) Average memory usage per core: 0mb
25) Average virtual memory usage per node: 0mb
26) Average virtual memory usage per core: 0mb
27) Average CPU percent per node: 0%
28) Average CPU percent per core: 0%
29) Average load per node: 0.02
30) Reverified average load per node: 19.89
31) Effective maximum load on a node: 635.08
32)
33) Name: First Last
34) Mail: flast@somewhere.lsu.edu
35) Affil: First Last
36) Category:
37) Name: First Last
38) Mail: flast@somewhere.lsu.edu
39) Affil: First Last
40) Category: validation:current:02/22/2011
41) Allocations:
42) hpc_alloc03,flast,1578202.88,default
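The arithmetic behind the message can be reproduced with a short Python sketch. It is an illustration only and not the checker that actually issues E131: the function names and the idle threshold of 1.0 are assumptions made here, while the node count, cores per node, and per-node loads are taken from lines 7, 9 to 11, and 15 of the sample message.

```python
# Illustrative sketch only; this is NOT the tool that emits E131.
# IDLE_LOAD_THRESHOLD is an assumed cutoff; the real rule is not documented here.
IDLE_LOAD_THRESHOLD = 1.0


def expected_load(nodes, cores_per_node):
    """Ideal total load: one user process per core across all nodes."""
    return nodes * cores_per_node


def idle_nodes(loads, threshold=IDLE_LOAD_THRESHOLD):
    """Return hostnames whose load is below the (assumed) idle threshold."""
    return sorted(host for host, load in loads.items() if load < threshold)


# Values copied from the sample message (lines 7, 9 to 11, and 15).
NODES = 32
CORES_PER_NODE = 20
sample_loads = {"smic252": 639.92, "smic253": 0.16, "smic254": 0.11}

total_expected = expected_load(NODES, CORES_PER_NODE)   # ideal load for the whole job
per_node_expected = total_expected / NODES              # ideal load on each node

unused = idle_nodes(sample_loads)
overloaded = [host for host, load in sample_loads.items()
              if load > per_node_expected]

print("expected total load:", total_expected)
print("expected load per node:", per_node_expected)
print("idle nodes in sample:", unused)
print("oversubscribed nodes in sample:", overloaded)
```

Only three of the 32 nodes appear in the sample, so the sketch lists two idle hosts rather than the full 31 reported in the message. A healthy job would show every node near the per-node expected load and no node far above it.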
Users may direct questions to sys-help@loni.org.