LSU HPC Storage Policy
See also: ITS Faculty Data Storage.
1. Storage Systems
Available storage on HPC@LSU high performance computing machines is divided into four file systems (Table 1). More complete descriptions of each follow below.
File System | Description |
---|---|
/var/scratch | Local storage on each node where the existence of files is guaranteed only during job execution (i.e. files could be deleted as soon as the job finishes). |
/work | Shared storage provided to all users for job input, output, and related data files. Files are not backed up and may be subject to purge! |
/home | Persistent storage provided to each active user account. Files are backed up to tape. |
/project | Persistent storage provided by request for a limited time to hold large amounts of project-specific data. Files are not backed up. |
All file systems share a common characteristic: when they become too full, system performance degrades and every user feels the effects. Similarly, placing too many individual files in a single directory degrades performance. System management activities are aimed at keeping the different file systems below 80% of capacity. It is strongly recommended that users hold file counts below 1,000, and never exceed 10,000, per subdirectory. If atypical usage begins to impact performance, individual users may be contacted and asked to help resolve the issue. When performance or capacity issues become significant, system management may intervene by requiring users to offload files, stop jobs, or take other actions to ensure the stability and continued operation of the system.
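As a quick way to keep an eye on these limits, standard Linux utilities can report file-system capacity and per-directory file counts; the directory names below are placeholders for your own paths.

```bash
# Report how full the shared file systems are (aim to stay under 80%).
df -h /work /home

# Count the files directly inside one of your directories
# (replace /work/$USER/myrun with the directory you want to inspect);
# keep this well below 10,000 per subdirectory.
find /work/$USER/myrun -maxdepth 1 -type f | wc -l

# Show how much space each of your top-level /work directories uses.
du -sh /work/$USER/*
```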
Management makes every effort to avoid data loss, and will attempt to recover data in the event of a failure. However, the volume of data housed makes it impractical to provide system-wide data backup. Users are expected to safeguard their own data by making sure that all important code, scripts, documents, and data are transferred to another location in a timely manner. The use of inexpensive, high-capacity external hard drives attached to a local workstation is highly recommended for individual user backup.
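For example, one common way to copy data from the cluster to a local workstation or attached external drive is rsync over SSH; the host name and paths below are placeholders and should be replaced with your own login node, user name, and backup destination.

```bash
# Run this from your local workstation, not from the cluster.
# "username" and "hpc-login-node" are placeholders for your account and
# the cluster's login host; the destination is an example mount point
# for an external drive.
rsync -avz --progress username@hpc-login-node:/home/username/important_results/ \
      /media/external-drive/hpc-backup/important_results/
```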
2. File System Details
2.1. /var/scratch
/var/scratch space is provided on all compute nodes, and is local to each node (i.e. files stored in /var/scratch cannot be accessed by other nodes). The size of this file system will vary from system to system, and possibly across nodes within a system. This is the preferred place to put any intermediate files required while a job is executing. Once the job ends, the files it stores in /var/scratch are subject to deletion. Users should not have any expectation that files will exist after a job terminates, and are expected to move the data from /var/scratch to their /work or /home directory as part of the clean up process in their job script.
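A minimal job-script fragment along these lines is sketched below; scheduler directives are omitted because they differ between clusters, and the executable and file names are placeholders.

```bash
#!/bin/bash
# Sketch: work out of node-local /var/scratch, then copy results back.

# Create a job-specific scratch directory ($$ is the shell's PID,
# used here only as a simple unique suffix).
SCRATCH_DIR=/var/scratch/$USER/$$
mkdir -p "$SCRATCH_DIR"

# Stage the executable and input from /work to local scratch.
cp /work/$USER/myrun/my_app /work/$USER/myrun/input.dat "$SCRATCH_DIR"/

# Run in scratch so intermediate I/O stays on the local disk.
cd "$SCRATCH_DIR"
./my_app input.dat > output.dat

# Copy results back to /work before the job ends; /var/scratch
# contents may be deleted as soon as the job finishes.
cp output.dat /work/$USER/myrun/
rm -rf "$SCRATCH_DIR"
```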
2.2. /work
/work is the primary file storage area to utilize when running jobs. /work is a common file system that all system nodes have access to. It is the recommended location for input files, checkpoint files, other job output, as well as related data files. Files on /work are not backed up, making the safekeeping of data the responsibility of the user.
Users may consume as much space as needed to run jobs, but must be aware that since the /work file systems are fully shared, they are subject to purge if they become overfull. If capacity approaches 80%, an automatic purge process is started by management. This process targets the oldest and largest files, removing them in turn until the capacity drops to an acceptable level. The purge process will normally occur once per month, as necessary. In short, use the space required, but clean up afterwards. See the description of /project space below for longer-term storage options.
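To make that cleanup easier, something like the following can help identify old or large files under your /work directory before you archive or remove them; the 60-day threshold and directory names are only examples.

```bash
# List your files on /work not modified in the last 60 days, largest first,
# so you can decide what to archive elsewhere or delete.
find /work/$USER -type f -mtime +60 -printf '%s %p\n' | sort -rn | head -n 20

# After copying anything worth keeping to /home, /project, or an external
# location, remove what is no longer needed (check the path carefully!).
rm -rf /work/$USER/old_run_directory
```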
2.3. /home
All user home directories are located in the /home file system. /home is intended for the user to store source code, executables, and scripts. /home may be considered persistent storage: the data here should remain as long as the user has a valid account on the system. /home will always have a storage quota, which is a hard limit. While /home is not subject to management activities controlling capacity and performance, it should not be considered permanent storage, as system failure could result in the loss of information. Users should arrange for backing up their own data, even though /home is periodically backed up to tape.
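Exact quota values and reporting tools vary by system, but on most Linux systems you can check your /home usage with something like the commands below.

```bash
# Report your quota usage in human-readable units (output format varies
# by system and quota configuration).
quota -s

# Or simply total up the space used by your home directory.
du -sh "$HOME"
```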
2.4. /project
The /project file system may be available on some systems and provides storage space for a specific project. /project space is allocated by request. To qualify as the PI of a storage allocation, the user must satisfy the PI qualifications of a computational allocation and must have an active computational allocation when the request is submitted. Space is made available for 12 months at a time. Shortly before an allocation expires, the user will be notified of the upcoming expiration and may request an extension. Renewal requests should be submitted at least 1 month prior to expiration to allow time for decision and planning. Users should have no expectation that data will persist after expiration; it may be erased at any time beginning 1 month after a project's expiration. Users are therefore encouraged to employ their own safekeeping and protection solutions.
2.5. System Specific Information
System | File System | Storage (TB) | Quota (GB) | Purge File Limit (Million) |
---|---|---|---|---|
SuperMike-III | /work | 840 (LPFS)[2] | N/A | 4 |
SuperMike-III | /home | (NFS)[3] | 10 | N/A |
SuperMike-III | /project | 840 (LPFS) | By Request | |
SuperMIC | /work | 840 (LPFS)[2] | N/A | 4 |
SuperMIC | /home | (NFS)[3] | 10 | N/A |
SuperMIC | /project | 840 (LPFS)[4] | By Request | |
Deep Bayou | /work | 840 (LPFS)[5] | N/A | 4 |
Deep Bayou | /home | (NFS)[3] | 10 | N/A |
Deep Bayou | /project | 840 (LPFS)[5] | By Request | |
3. Job Use
On all systems, jobs must be run from the /work file systems, and not from the /home or /project file systems. The individual nodes assigned to a job will have access to their local /var/scratch space. Files should be copied from /home or /project space to /work before a job is executed, and back when a job terminates, to avoid excessive I/O during execution that degrades system performance.
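A typical staging pattern looks like the sketch below; the submit command is omitted because it depends on the cluster's scheduler, and the directory and file names are placeholders.

```bash
# 1. Before submitting: stage input from /home or /project into /work.
mkdir -p /work/$USER/run42
cp /project/my_group/run42/input/*.dat /work/$USER/run42/

# 2. Submit the job from the /work directory
#    (cd /work/$USER/run42, then use your cluster's submit command).

# 3. After the job finishes: copy results you want to keep off /work,
#    e.g. back to /project or /home, and clean up.
tar -czf /project/my_group/run42/results.tar.gz -C /work/$USER/run42 output/
rm -rf /work/$USER/run42
```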
4. /project Allocation Requests
Limitations: Space is allocated on the /project file systems by request for periods of up to 12 months at a time. Renewal requests are allowed but, in the interest of fairness, are subject to availability and competing requests. Please be advised that the data stored on the /project file systems is NOT backed up and it is the users' responsibility to ensure that any data of importance is backed up somewhere else.
How to Apply: A storage allocation may be requested by completing a web form. The provided information should fully justify the need for the storage and indicate how the data will be handled in the event that the allocation cannot be renewed. A user group can be set up for sharing access to the space if the Principal Investigator (PI) includes a list of HPC user names. The PI can provide this list either in the "Description of the need for this storage" section of the web form or by sending an email to sys-help@loni.org with the subject "[Creating|Adding] members to a storage allocation".
Allocation Class: Allocations are divided into 3 classes (small, medium, and large), each with a separate approval authority (Table 3). All storage allocation requests are limited by the available space, and justifications must be commensurate with the amount of space required. Each PI is allowed only one storage allocation per /project storage volume.
All classes need to describe how the allocation will be used and why the /work volume is insufficient. In addition, large storage allocation requests need to briefly describe a data plan that justifies the size requested, including storage calculations based on cluster model runs, data sizes, software package sizes, etc. An estimate of how long the storage will be needed is also requested. Many researchers include this information in their CPU allocation request, where they have described their research in detail, and often simply attach that CPU allocation request PDF to the storage allocation request or reference it. Please also note that, since all storage allocations are cluster specific, the PI must clearly indicate for which cluster the storage allocation is being requested.
Storage Allocation Policy:
- Each storage allocation can only have one Principal Investigator (PI), who is responsible for the administrative tasks of the allocation, such as renewal and membership management;
- The PI of a storage allocation is the steward of all data stored under it (see the LSU Security of Data policy);
- In the event that a member needs to be removed from a storage allocation, it is the PI's responsibility to notify the HPC staff and clean up the data owned by that person;
- The maximum large storage allocation request has been set at 20 TB per PI;
- Storage allocations are for ongoing HPC use and are not meant for archival storage. Any storage allocation request (including renewal) without an active CPU allocation will be rejected;
- When a storage allocation expires, the PI has a grace period of up to 4 weeks to copy the data off. The data will be removed when the grace period expires.
Class | Size | Approval Authority |
---|---|---|
Small | Up to 100GB | HPC@LSU Staff |
Medium | Between 100GB and 1TB | HPC@LSU Operations Manager |
Large | Over 1TB | HPC@LSU Director |
[1] Global Parallel File System
[2] Lustre Parallel File System
[3] Network File System
[4] Physically shares /work
[5] Shared with the SuperMIC cluster
Last revised: 10 Dec 2014