[Ekhi-users] CPD-2 cooling general failure affecting Nostromo and Ekhi clusters

Inigo Aldazabal Mensa inigo.aldazabalm at ehu.eus
Fri Jul 31 00:12:08 CEST 2020


Hi all,

This only concerns users of the Nostromo and (mostly) Ekhi computing
clusters.

Due to the very high temperatures today, Thursday the 30th, the cooling
system of CFM Data Center 2 (CPD-2) failed completely during the
evening. I had to power off all Ekhi and Nostromo nodes when the room
temperature rose above 50 °C, so all running jobs were cancelled.

The cooling system should not have behaved the way it did, and a reboot
and a couple of button presses brought it back to a normal working
state. Nevertheless, I had to power off the clusters in the meantime
because the room temperature was becoming critical.

Both clusters are up now, and the cooling system seems stable. I don't
expect it to fail again in the coming days, as today's weather
conditions were exceptional.

I am currently on holiday from July 25th until August 10th, but
fortunately I was around today and able to fix the problem. Should you
notice any strange behaviour, please open a ticket in the Computing
Service ticketing system (only accessible from the UPV/EHU network or
through VPN) so I can look into it once I'm back at work:

https://services.cfm.ehu.es/glpi 

Use your UPV/EHU LDAP credentials to log in.

Best,

Iñigo

-- 
Iñigo Aldazabal Mensa, Ph.D.
Computing Service Manager / Scientific Computing Specialist
Centro de Física de Materiales (CSIC-UPV/EHU)
Paseo Manuel de Lardizabal, 5
20018 San Sebastian - Guipuzcoa
SPAIN

phone: +34-943-01-8780
e-mail: inigo.aldazabal at csic.es, inigo.aldazabalm at ehu.eus
pgp key id: 0xDBCC8369


