Policy: addition of compute nodes to the HPC center
General
This document describes the policy of the CIS Division, formulated according to guidelines of the Technion management, regarding the addition of new compute nodes (purchased by academic staff members) to Zeus, the High-Performance Computing (HPC) cluster at the Technion. The Technion encourages researchers to do so: it increases hardware utilization and makes life easier for the researchers, since the cluster is supported by the CIS Division.
Benefits of joining private servers to the cluster
Suppose that a researcher joins his/her servers, with a total of x cores, to Zeus. Let us call those servers the ‘private’ servers. In that case he/she enjoys the following three direct benefits:
- CPU-usage credit: while CPU/GPU computation hours are charged to researchers, a researcher who connects his/her servers to Zeus receives a monthly credit of x * 24 * 30 * 0.7 core hours.
- A private queue: the researcher gets access to a private queue for the private servers, with no limit on job running time (in contrast to the general queues, which are limited to 24 hours or, in some cases, a week). Note that this does not mean the researcher is the only user of those servers, as they also serve other queues. The researcher is given the ability to set scheduling priorities among the computational tasks of his/her research group.
- A higher limit on parallel usage: an increase in the maximum number of cores the researcher can use in parallel. While other users are limited to (currently) 600 cores, researchers in this group are limited to 600 + x cores in parallel.
The usage of the cluster, compared to using disconnected private nodes, has additional advantages:
- The hosting, operation and support is given by the computer center.
- Access to thousands of cores (note the above restriction, however, on the number of cores that can be run in parallel by a single user).
- Central software resources and licensing
- Physical hosting of the acquired nodes at the server farm, including:
- Location in the racks
- Dual power supply
- Dedicated air conditioning system
- Central UPS system
- Management services, system administration, installing software and upgrades
- Every research group gets a 2 TB storage allocation in the central system [1]
- Information security services
- Networking services
- Monitoring and control services (incl. usage statistics).
Note: Zeus does not host equipment that is 7 years old or older. Old equipment will be removed and returned to its owner.
Configurations recommended by CIS
The following table lists the recommended CPU / memory configuration (note that many manufacturers / suppliers offer similar configurations).
Configuration name | CPU | Total RAM |
A | AMD EPYC 9554 3.1GHz 64-core | 768GB Memory DDR5 4800MHz |
B | AMD EPYC 9554 3.1GHz 64-core | 1,536GB Memory DDR5 4800MHz |
C | Xeon-Platinum 8580 2.0GHz 60-core | 1,024GB Memory DDR5 5600MHz |
D | Xeon-Platinum 8580 2.0GHz 60-core | 1,536GB Memory DDR5 5600MHz |
E | AMD EPYC 9654 2.4GHz 96-core | 1,536GB Memory DDR5 4800MHz |
The above configurations (A – E) also include the following components:
OS Disk: NVMe SSD 480 GB M.2
Secondary Disk: NVMe SSD 3.2 TB U.3
NIC: Ethernet 10Gb 2-port SFP+ with 2 × SFP RJ45 transceivers, Ethernet 10/25Gb 4-port SFP28
IB/Ethernet NIC: InfiniBand HDR100/Ethernet 100Gb 1-port
PS: 2 × 1600W Platinum hot-plug low-halogen power supplies
Minimum settings for connecting existing equipment to the Zeus cluster
To connect an existing server to the Zeus cluster (and host it at the computer center), it must meet the following minimum requirements:
- Size (enclosure): it must fit into a 19 inch rack
- Power supply: it must include 2 independent power supplies
- Age: up to 3 years.
- Warranty: warranty and supplier service for a total of 5 years.
- Computing power: at least 40 physical cores and 320 GB of RAM.
- OS Disk: NVMe SSD 480 GB M.2
- NIC: Ethernet 10Gb 2-port SFP+
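The requirements above amount to a simple eligibility check, sketched below (the field names are hypothetical; the thresholds come from the list above):

```python
from dataclasses import dataclass

@dataclass
class ServerSpec:
    # Hypothetical field names; thresholds below follow the policy list.
    fits_19in_rack: bool   # size: fits into a 19-inch rack
    power_supplies: int    # number of independent power supplies
    age_years: float       # age of the server
    warranty_years: int    # total warranty and supplier service
    physical_cores: int
    ram_gb: int

def meets_zeus_minimum(s: ServerSpec) -> bool:
    """Check a server against the minimum hosting requirements
    (disk and NIC requirements omitted for brevity)."""
    return (s.fits_19in_rack
            and s.power_supplies >= 2
            and s.age_years <= 3
            and s.warranty_years >= 5
            and s.physical_cores >= 40
            and s.ram_gb >= 320)

# Example: a 2-year-old dual-PSU server with 64 cores and 768 GB RAM
spec = ServerSpec(True, 2, 2, 5, 64, 768)
print(meets_zeus_minimum(spec))  # True
```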
There is also an option for ‘remote hosting’, i.e., hosting the server in the faculty building. This is possible if the server is connected to a dedicated high-speed Ethernet switch in the faculty, which in turn is connected by a fiber-optic link to the computer center.
Software
Do not install software without proper licensing. If the software is not free and not already licensed to the Technion, the researcher will bear its cost.