Modern computer systems are highly reliable, and SCI IT maintains infrastructure designed to keep all hosts up, functional, and available 24×7. Keeping uptime high, however, also requires regular maintenance, including patching, updating, and rebooting machines, and this work affects the availability of both the machines and the services running on them. In addition, no infrastructure is 100% reliable, so unexpected downtime sometimes occurs despite best efforts.

This document outlines how SCI IT handles system downtime, both planned and unplanned, and how we communicate with SCI users about it. All times listed are local time in Salt Lake City, UT (Mountain Time).

Patching and Updating

A machine’s operating system and software packages are updated regularly to fix bugs, add features, and address security risks. Most of these updates do not require a reboot and can be applied without affecting anyone, but some do require a system restart. Reboots for such updates are the most common reason a machine becomes unavailable, although the downtime is generally limited to a few minutes.

Updates are applied differently depending on the type of machine:

Infrastructure Servers

These machines, which run SCI IT services such as mailing lists, monitoring, and web servers, receive software updates at 3 a.m. every Wednesday and are rebooted if necessary. This includes the shell.sci.utah.edu machines. SCI IT keeps interruptions to a minimum, but some services may be briefly unavailable while machines reboot.

A reminder will be posted in the #sci-it Slack channel the day before (Tuesday).

Desktops, Workstations, and Compute Clusters

As of July 2024, SCI IT has not yet decided on or implemented a standard procedure for updating and rebooting non-server machines. If you would like your desktop or workstation updated, please contact SCI IT.

Regular Maintenance

Occasionally, SCI IT needs to perform work beyond regular patching that may make systems temporarily unavailable. Fridays from 5–7 p.m. are reserved for this work, although the window goes unused most weeks. When work that may interrupt service is scheduled for this window, SCI IT will give advance notice by email and in the #sci-it Slack channel.

Emergency Maintenance or Major Downtimes

Major work affecting many SCI users is typically scheduled during University of Utah breaks, holidays, or low-usage periods in the summer, and will be announced in advance by email and in the #sci-it and #general Slack channels. Unexpected outages will be communicated through the same channels, from initial identification of the problem through progress updates to its resolution.