NT’s Bursting HPC product aims to provide the maximum number of compute cycles, enabling the fastest turnaround of urgent jobs at the lowest possible green price.
no size limit on job/result storage;
no exit fees: grab your results once they are ready, just as you would from your own cluster, with a guaranteed minimum residency period.
In terms of hardware, on-site facilities, administration and user tools, the client needs: NONE!
For example, all the normal CLI, SSH and/or web-based user, job and file-handling facilities are provided.
However, since our geothermal sites are commonly located in remote rural areas far from major internet backbones, continual upload/download of small data chunks (interactive visualization and/or storage transfers) would adversely affect the overall usability of the facilities. We have therefore decided not to offer interactive sessions (i.e. remote access to visualization and similar analytical applications/tools) or permanent storage at this time.
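As an illustration, typical command-line usage might look like the following sketch. The hostname, username and paths here are hypothetical placeholders; actual values are assigned per client.

```shell
# Hypothetical hostname/user -- actual values are assigned per client.
# Log in to the client head node over SSH:
ssh alice@client.burst.nt.example

# Upload job input files into the user's $HOME area:
scp -r ./case_042/ alice@client.burst.nt.example:case_042/

# After the completion email arrives, retrieve the results:
scp -r alice@client.burst.nt.example:case_042/results/ ./
```

Bulk transfers like these are well suited to the remote sites; it is only the continual small-chunk traffic of interactive sessions that is excluded.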
Up to 60K cores using AMD EPYC Milan or later CPUs;
alternative architectures: subject to per-client discussion.
Bare metal is expected to be the preferred option;
Virtual Machines (VMs) and/or Containers can be provisioned, but the specifics will depend on the client.
No, but with one caveat:
We will continually review occupancy/usage with a view to expanding capacity pre-emptively as required. Our suppliers currently quote less than one month from order to commissioning of new hardware, so we can expand quickly.
The caveat: in the unlikely event that requested occupancy exceeds current availability, and if acceptable to the client, we will either seamlessly (and at no additional cost or risk to the client) offload some tasks temporarily to a pre-approved alternative supplier, or, if preferred, add a slight delay until servers become available.
Each client (organization) is assigned a dedicated area, within which each client's user(s) are assigned a $HOME folder. This is used as the location for upload, job submission and result retrieval.
The data-handling model mimics classic compute-server usage via a job scheduler (likely PBSPro). Jobs are submitted to the queue from a user (e.g. $HOME) location. Jobs are run and all result files are copied back to the source location for user retrieval. An email is sent notifying the user of job completion, and/or of the run state if a submission or runtime error is detected.
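A minimal sketch of this workflow, assuming a PBSPro scheduler; the job name, resource figures, solver binary and email address are illustrative only, not product defaults:

```shell
#!/bin/bash
# solve.pbs -- hypothetical PBSPro job script.  The #PBS directives are
# standard PBSPro options; the specific values are placeholders.
#PBS -N urgent_solve            # job name
#PBS -l select=4:ncpus=64       # e.g. 4 nodes x 64 EPYC cores
#PBS -l walltime=02:00:00       # wall-clock limit
#PBS -m ae                      # email on abort and on end
#PBS -M user@client.example     # notification address

cd "$PBS_O_WORKDIR"             # run from the submission ($HOME) location
./my_solver input.dat > output.log
```

The user would then submit with `qsub solve.pbs` from their $HOME area; on completion, result files appear back in the submission directory for retrieval, and the completion email is triggered by the `-m`/`-M` directives.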
Due to expected latency, we have decided not to offer interactive capabilities (i.e. there is NO remote access to visualization or similar analytical applications/tools).
NT’s Bursting HPC focus on the fastest turnaround of urgent jobs at the lowest possible green price means we are not configured to also act as a data repository providing long-term storage.
The short-term “job/result” storage available is:
user area ($HOME) for upload of job files;
same user area for retrieval of completed runs.
Access portal(s), head nodes and general pool nodes always run fully updated Enterprise Linux servers (RHEL, SLES, Ubuntu…);
Depending on specific client image configurations, cluster nodes may be using unpatched instances;
After these resources are released by the client, they are taken down, re-imaged to the latest LTS versions and verified before being reinstated into the general pool(s);
All portal, user (head node) and compute machines are behind enterprise level firewalls;
Dedicated hardware clusters are an option for further isolation at additional cost; details to be negotiated at the time of request.
VLANs using dynamically assigned switches are under consideration should clients request them, possibly requiring fee adjustments.
Clients purchase and supply their own licenses;
Rescale, Altair or other third party partners may provide further tools for license pulling and similar management tasks.
Very short term (seconds to hours):
on-site short-term energy storage (large UPS);
Short term (hours to days):
temporarily pull from the grid;
“green power” products only.
Yes, it will have two directions of connection to the next backbone location; i.e. if the fiber is broken in one direction, total throughput is reduced but traffic remains possible over the alternate branch.