Maintenance Downtime Logs for 2024

In February 2025, these downtime updates were moved from the CCR Help Desk portal to this location. This page contains 5 months of downtime change logs.

December 2024 Downtime

Date of downtime: FRIDAY, December 20, 2024
NOTE: The date of this downtime has been changed to accommodate the end of semester and multiple holidays at the end of December. Thank you for your understanding!

Approximate time of outage: 7am-5pm

Resources affected by downtime:

  • UB-HPC cluster (all partitions)
  • Faculty cluster (all partitions)
  • Portals: OnDemand, ColdFront (temporarily), IDM (temporarily)

What will be done:

  • Reboot of all cluster nodes
  • Updates of front-end login nodes (login1, login2, vortex-future)
  • Infrastructure Updates
  • Decommissioning of Skylake compute nodes (racks i05,i06,i07,i08) - these have reached 7 years in age and are being removed from service.
  • Updated versions of cluster related services and packages:
    • Apptainer v1.3.5 -> v1.3.6
    • NVIDIA driver v535.216.01 -> v535.216.03
    • Slurm v24.05.4 -> v24.05.5

Specific Effects to CCR users:
Removing the Skylake compute nodes will remove the MRI tag from Slurm We do not anticipate any of these updates will result in system problems. However, with NVIDIA driver updates on the compute nodes, it's possible this might have an effect on users' workflows. Please report any suspected problems to CCR Help with details so that we may attempt to replicate them.

November 2024 Downtime

Date of downtime: Tuesday, November 26, 2024

Approximate time of outage: 7am-5pm

Resources affected by downtime:

  • UB-HPC cluster (all partitions)
  • Faculty cluster (all partitions)
  • Portals: OnDemand, ColdFront (temporarily), IDM (temporarily)

What will be done:

  • Operating system upgrades on all cluster nodes - this includes an update to NVIDIA drivers and Apptainer
  • Reboot of all cluster nodes
  • Updates of front-end login nodes (login1, login2, vortex-future)
  • Infrastructure Updates
  • Update OnDemand to version 3.1.10

Specific Effects to CCR users:
We do not anticipate any of these updates will result in system problems. However, with operating system updates on the compute nodes, it's possible these might have an effect on users' workflows. Please report any suspected problems to CCR Help with details so that we may attempt to replicate them.

October 2024 Downtime

Date of downtime: Tuesday, October 29, 2024

Approximate time of outage: 7am-5pm

Resources affected by downtime:

  • UB-HPC cluster (all partitions)
  • Faculty cluster (all partitions)
  • Portals: OnDemand, ColdFront (temporarily), IDM (temporarily)

What will be done:

  • Reboot of all cluster nodes
  • Updates of front-end login nodes (login1, login2, vortex-future)
  • Infrastructure Updates
  • OnDemand update to version 3.1.9 - this includes a long requested feature of keeping the cluster app alive/active for longer than just a few minutes
  • Slurm upgrade to version 24.05.4
  • Compute node operating system updates - this includes an update to NVIDIA drivers
  • Identity management portal software update

Specific Effects to CCR users:
We do not anticipate any of these updates will result in system problems. However, with operating system updates on the compute nodes, it's possible these might have an effect on users' workflows. Please report any suspected problems to CCR Help with details so that we may attempt to replicate them.

September 2024 Downtime

Date of downtime: Tuesday, September 24, 2024

Approximate time of outage: 7am-5pm

Resources affected by downtime:

  • UB-HPC cluster (all partitions)
  • Faculty cluster (all partitions)
  • Portals: OnDemand, ColdFront (temporarily), IDM (temporarily)
  • Lake Effect Research Cloud: Offline for part of the day

What will be done:

  • Electrical work in machine room requiring power outage of all compute nodes and other services
  • Reboot of all cluster nodes
  • Updates of front-end login nodes (login1, login2, vortex-future) to Ubuntu 24
  • Update of OnDemand
  • Update of apptainer
  • Infrastructure services updated
  • Slurm upgrade
  • Removal of anaconda3 module from ccrsoft/2023.01 - Anaconda has not been supported by CCR for some time and due to changes in the licensing terms we are now legally obligated to remove it from our systems
  • Removal of /util/academic, /util/common, /util/industry, /util/software/legacy from CCR systems

Specific Effects to CCR users:

  • module load anaconda3 will no longer work
  • Since January 2023, we have moved forward with sunsetting the ccrsoft/legacy software environment (everything installed in /util/common, /util/academic, & /util/industry). This software environment was removed from our systems in June so users can no longer run module load ccrsoft/legacy and access any of the previous generation of software modules. However, there may be users who installed software in their own project or home directories that depended on software within the /util structure. Though it is unlikely this software is still working due to other considerable upgrades done on CCR's systems in the last year, there is always the possibility this final change will break some workflows. As we've said for the last 18 months, please ensure that your software works without anything in /util so that you do not experience any issues after this downtime.
  • If you are using one of the legacy containers provided by CCR in June as a temporary work around, these will also be going away as they depend on software installed in /util.

August 2024 Downtime

Date of downtime: Tuesday, August 27, 2024

Approximate time of outage: 7am-5pm

Resources affected by downtime:

  • UB-HPC cluster (all partitions)
  • Faculty cluster (all partitions)
  • Portals: OnDemand, ColdFront (temporarily), IDM (temporarily)

What will be done:

  • Reprovisioning and reboot of all cluster nodes
  • Updates of front-end login nodes (login1, login2, vortex-future)
  • Infrastructure Updates

Specific Effects to CCR users:
For those using MPI we recommend reading through CCR's Slurm job documentation on how best to request high speed networks.