##############################################################################
Random Simple Things that have Worked in the Past and are Likely to Work Again
##############################################################################

A Sunfire Goes Down
-------------------

As of 6/27/2016.

Note: one of the sunfires can't be booted with all the disks in. If this
sunfire fails, pop all the disks except for the SSD (which is a pain to get
back in) and the boot drives (drives 1 and 2), then boot the machine. After it
has booted, put the disks back in and follow the instructions below.

1. ssh onto magellan and run ``bmc sunfire0-bmc chassis power cycle``.

2. ssh onto the sunfire, probably through magellan. Run ``zfs mount -a`` if
   the zpools aren't already mounted, and then start ceph (either through
   SysV init or systemd, depending on the sunfire).

3. Monitor ``ceph health`` (on any ceph monitor: crimea, magellan, or gomes)
   to ensure that ceph comes back up properly.

The Website is Reachable, but everything 403's or 404's
-------------------------------------------------------

As of 7/6/2016.

Restart web.vm.

Mail server fails IMAP requests
-------------------------------

As of 7/21/2016.

Run ``sudo journalctl -u dovecot`` on crimea.acm.jhu.edu. If it says that a
connection to acmsys/Maildir timed out, then there's a problem with the AFS
mail dir servers on chicago.

First things first, check the ZFS status by running ``zpool status``. If that
says something is wrong, debug the zpool.

To restart the maildir server, run ``/etc/init.d/openafs-fileserver restart``.
If it takes longer than ~10 minutes, something else is wrong; try restarting
chicago.

Echidna's AFS servers died
--------------------------

As of 9/19/2016.

Reboot echidna.

You can't do ceph things with cinder (like create/delete volumes)
-----------------------------------------------------------------

As of 9/25/2016.

Restart cinder-volume on gomes.

Ceph won't start on a sunfire due to permission errors
------------------------------------------------------

As of 2/28/2017.

Run ``chown -R ceph:ceph /var/run/ceph``, then try again. See
http://tracker.ceph.com/issues/15553 for more info.

A ceph mon is down after a restart
----------------------------------

As of 3/11/2017.

Run ``systemctl restart ceph``. The issue is that, since our ceph config is
served out of AFS, we have an implicit dependency on AFS, but systemd doesn't
know about it (this should be fixed at some point). By the time you ssh into
the machine to manually restart ceph, openafs-client should be up, so simply
restarting ceph should just work.

OpenStack VMs won't be deleted, and they just hang
--------------------------------------------------

As of 3/11/2017.

Reboot gomes.

You Can't Delete OpenStack VMs (they're stuck in the deleting state)
--------------------------------------------------------------------

As of 4/12/2017.

ssh to the compute node that the instance was running on, and restart the
nova daemon on it.
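
A minimal sketch of the procedure above. The unit name is an assumption
(``nova-compute`` on Debian/Ubuntu nodes, ``openstack-nova-compute`` on
RHEL-family ones), and the lookup needs OpenStack admin credentials sourced::

    # Find which compute node the stuck instance was scheduled to.
    nova show <instance-uuid> | grep hypervisor_hostname

    # Then, on that compute node, restart the nova compute daemon
    # (unit name assumed; adjust to whatever the node actually runs).
    sudo systemctl restart nova-compute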