Changes between Initial Version and Version 1 of Internal/Repair

Timestamp: May 22, 2007, 4:12:51 PM
Author: Joseph F. Miklojcik III
Comment: (none)

Internal/Repair

+) Clear some space in the node repair area.  Obtain a Phillips-head
+screwdriver and a bin for garbage.  On a network-connected computer,
+open a web browser and ssh sessions to dhcp1.orbit-lab.org and
+repository2.orbit-lab.org (probably through gw.orbit-lab.org).
+) Make a page in the orbit-lab.org wiki with a name matching the
+template Internal/RepairYYYYMMDD (Internal/Repair20070520 for
+example).  Record the current time and the names of everyone helping
+with the repairs on this wiki page.
+) Determine the set of nodes you are going to repair.  These will be
+any nodes marked as red on orbit-lab.org/wiki/Status, or nodes that
+the CM cannot reliably power up.  Do not repair more than ten at a
+time.  Write the coordinates of these nodes down in the wiki page for
+the repair.  Note which of those node positions are supposed to have
+Atheros 802.11 hardware and which are supposed to have Intel.  It
+simplifies things if you can do all Atheros or all Intel nodes in a
+particular round of repairs.
+) Comment out the lines for these nodes in
+dhcp1:/etc/dhcp3/dhcpd.conf.  Restart dhcpd on dhcp1.
+) Remove each node to be repaired from its mounting in the grid,
+leaving the node id box attached.  As you remove nodes, take them and
+their node id boxes back to the node repair area.  One or two other
+people can work on nodes in the node repair area while one person
+moves nodes back and forth from the grid.  Note any exceptional
+hardware or incorrectly installed connections on the wiki page.
+) Once in the node repair area, remove the node id box and then the
+yellow node enclosure.  Verify that the node id boxes match the list
+of nodes to be repaired on the wiki page, and that the 802.11 hardware
+vendor matches what is expected.  Note exceptions on the wiki page.
+) Replace the power supply.  Take care to put old power supplies in
+the garbage bin.  If the 802.11 hardware vendor does not match what is
+expected, correct the hardware.  Reattach the enclosure, then the
+node id box.
+) Calibrate the node (not yet implemented, NYI).
+) Return the node to its position in the grid.  Verify the node id
+box against two adjacent nodes.
+) Once all nodes have been repaired and returned to the grid, verify
+that the nodes are not red on the orbit-lab.org/wiki/Status page.
+That is, verify that each CM reports back to the CMC correctly.
+) Turn the repaired nodes on.  Because they obtain pool addresses
+from DHCP, they will load an 'inventory' image (NYI).  Wait five
+minutes for the inventory image to finish loading.  Then, command the
+CMC to run the inventory command on each node.
+) Run the gendhcpconf script on repository2.  Compare its output
+with the entries you commented out in step 4.  Correct
+dhcp1:/etc/dhcp3/dhcpd.conf if needed.
+) During the following maintenance slot, verify that you can image
+all nodes that have been repaired since the last maintenance slot by
+running the CM stress experiment (NYI).
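
The wiki page naming and the dhcpd.conf edit in the steps above could be scripted. The following is only a minimal sketch under stated assumptions, not the lab's actual tooling: it assumes each node has an ISC-style `host NAME { ... }` declaration in dhcpd.conf with no nested braces, and the host names (`node1-1`, `node1-2`) and both function names are hypothetical.

```python
import re
from datetime import date


def repair_page_name(day: date) -> str:
    """Build a wiki page name following the Internal/RepairYYYYMMDD template."""
    return "Internal/Repair" + day.strftime("%Y%m%d")


def comment_out_hosts(conf_text: str, hosts: set) -> str:
    """Prefix every line of the matching 'host NAME { ... }' blocks with '#'.

    Assumption: a declaration opens with 'host NAME {' on one line and ends
    at the first following line containing '}' (no nested braces).
    """
    out = []
    in_block = False
    for line in conf_text.splitlines():
        match = re.match(r"\s*host\s+(\S+)\s*\{", line)
        if not in_block and match and match.group(1) in hosts:
            in_block = True
        if in_block:
            out.append("#" + line)
            if "}" in line:
                in_block = False  # the declaration ends on this line
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical config fragment; the real file layout may differ.
    sample = (
        "host node1-1 {\n"
        "  fixed-address 10.0.0.1;\n"
        "}\n"
        "host node1-2 {\n"
        "  fixed-address 10.0.0.2;\n"
        "}\n"
    )
    print(repair_page_name(date(2007, 5, 20)))
    print(comment_out_hosts(sample, {"node1-1"}), end="")
```

Working on a copy of the file and reviewing the result before restarting dhcpd keeps the commented-out entries easy to compare against the gendhcpconf output later in the procedure.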