1) Clear some space in the node repair area. Get a Phillips-head
screwdriver and a garbage bin. On a network-connected computer, open a
web browser and ssh sessions to dhcp1.orbit-lab.org and
repository2.orbit-lab.org (probably through gw.orbit-lab.org).

2) Create a page in the orbit-lab.org wiki whose name follows the
template Internal/RepairYYYYMMDD (for example, Internal/Repair20070520).
On this page, record the current time and the names of everyone
helping with the repairs.

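The page-name template can be captured in a one-line Python helper, which avoids mistakes such as an unpadded month or day. `repair_page_name` is a hypothetical name used only for illustration, not an existing ORBIT tool.

```python
from datetime import date


def repair_page_name(day: date) -> str:
    """Return the wiki page name for a repair session held on the given day."""
    # Template Internal/RepairYYYYMMDD, with zero-padded month and day.
    return "Internal/Repair" + day.strftime("%Y%m%d")


print(repair_page_name(date(2007, 5, 20)))  # Internal/Repair20070520
print(repair_page_name(date.today()))        # page name for today's session
```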
3) Determine the set of nodes you are going to repair: any node marked
red on orbit-lab.org/wiki/Status, and any node whose CM cannot
reliably power it up. Do not repair more than ten nodes at a time.
Write the coordinates of these nodes on the repair wiki page, and note
which of those positions are supposed to have Atheros 802.11 hardware
and which are supposed to have Intel. It simplifies things if you can
do all Atheros or all Intel nodes in a single round of repairs.

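The batch-selection rule (at most ten nodes, ideally all from one 802.11 vendor) can be sketched as follows. This is a minimal sketch assuming nodes are identified by `(x, y)` grid coordinates and that a mapping from position to expected vendor is available; `plan_batch` and the lowercase vendor strings are hypothetical. It simply picks the vendor with the most broken nodes; other policies are equally valid.

```python
from collections import defaultdict

MAX_BATCH = 10  # "Do not repair more than ten nodes at a time."


def plan_batch(candidates, expected_vendor):
    """Pick up to MAX_BATCH nodes for one repair round, all with the same vendor.

    candidates: (x, y) positions that are red on the Status page or whose CM
        cannot reliably power them up.
    expected_vendor: hypothetical dict mapping (x, y) -> "atheros" or "intel".
    Returns (vendor, sorted list of positions), or (None, []) if nothing to do.
    """
    by_vendor = defaultdict(list)
    for node in set(candidates):  # set() drops duplicate reports of one node
        by_vendor[expected_vendor[node]].append(node)
    if not by_vendor:
        return None, []
    # Vendor with the most broken nodes; ties go to the alphabetically first one.
    vendor = max(sorted(by_vendor), key=lambda v: len(by_vendor[v]))
    return vendor, sorted(by_vendor[vendor])[:MAX_BATCH]


vendors = {(1, 1): "atheros", (1, 2): "intel", (3, 4): "atheros"}
print(plan_batch([(1, 1), (3, 4), (1, 2)], vendors))
# ('atheros', [(1, 1), (3, 4)])
```

Any nodes left over simply wait for a later round.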
4) Comment out the lines for these nodes in dhcp1:/etc/dhcp3/dhcpd.conf,
then restart dhcpd on dhcp1.

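Commenting out the entries can be done by hand, but a small filter reduces the risk of missing a line of a multi-line entry. This is a sketch assuming each node has its own ISC-style `host NAME { ... }` declaration with balanced braces; the host names used here (`node1-1`, ...) and the MAC and IP values are illustrative assumptions, not the real ORBIT naming or data. Reading and writing the file, keeping a backup, and restarting dhcpd are left to the operator.

```python
import re

# Matches the opening line of a declaration such as "host node1-1 {".
HOST_RE = re.compile(r"^\s*host\s+([^\s{]+)\s*\{")


def comment_out_hosts(conf_text, hostnames):
    """Return conf_text with every line of the named host declarations prefixed by '#'."""
    out = []
    depth = 0  # > 0 while inside a host block that is being commented out
    for line in conf_text.splitlines(keepends=True):
        if depth == 0:
            match = HOST_RE.match(line)
            if not (match and match.group(1) in hostnames):
                out.append(line)  # not a node under repair: keep unchanged
                continue
        # Opening or inside a target block: track braces and comment the line out.
        depth += line.count("{") - line.count("}")
        out.append("#" + line)
    return "".join(out)


example = (
    "host node1-1 {\n"
    "  hardware ethernet 00:11:22:33:44:55;\n"
    "}\n"
    "host node1-2 { fixed-address 10.10.1.2; }\n"
)
print(comment_out_hosts(example, {"node1-1"}), end="")
```

An entry whose braces never close would cause everything after it to be commented out, so check the result before restarting dhcpd.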
5) Remove each node to be repaired from its mounting in the grid,
leaving the node id box attached, and take the node with its node id
box back to the node repair area. One person can move nodes back and
forth from the grid while one or two others work on nodes in the node
repair area. Note any exceptional hardware or incorrectly installed
connections on the wiki page.

6) Once in the node repair area, remove the node id box and then the
yellow node enclosure. Verify that the node id boxes match the list of
nodes to be repaired on the wiki page, and that the 802.11 hardware
vendor matches what is expected. Note any exceptions on the wiki page.

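The checks in this step amount to comparing two lists, and the exceptions they produce are exactly what goes on the wiki page. This sketch assumes both the planned list and what is found on the bench are recorded as hypothetical `(x, y) -> vendor` dictionaries; `repair_exceptions` is an illustrative name.

```python
def repair_exceptions(expected, observed):
    """List mismatches between the planned repair list and the nodes on the bench.

    expected: (x, y) -> vendor, from the wiki page for this repair round.
    observed: (x, y) -> vendor, read from each node id box and installed 802.11 card.
    """
    notes = []
    for node in sorted(expected.keys() - observed.keys()):
        notes.append(f"{node}: on repair list but not on the bench")
    for node in sorted(observed.keys() - expected.keys()):
        notes.append(f"{node}: on the bench but not on the repair list")
    for node in sorted(expected.keys() & observed.keys()):
        if expected[node] != observed[node]:
            notes.append(f"{node}: expected {expected[node]}, found {observed[node]}")
    return notes


planned = {(1, 1): "atheros", (3, 4): "atheros"}
bench = {(1, 1): "intel", (5, 5): "atheros"}
for note in repair_exceptions(planned, bench):
    print(note)
```

An empty result means the node id boxes and vendors all match the plan.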
7) Replace the power supply, taking care to put the old power supply
in the garbage bin. If the 802.11 hardware vendor did not match what
is expected, swap in hardware from the expected vendor. Put the
enclosure back on, then reattach the node id box.

8) Calibrate the node (NYI).

9) Put the node back in the grid. Verify its node id box against
those of the two adjacent nodes.

10) Once all nodes have been repaired and put back, verify that they
are no longer red on the orbit-lab.org/wiki/Status page, that is, that
each node's CM reports back to the CMC correctly.

11) Turn the repaired nodes on. Because their entries are commented
out of dhcpd.conf, they obtain pool addresses from DHCP and will
therefore load an 'inventory' image (NYI). Wait five minutes for the
inventory image to finish loading, then command the CMC to run the
inventory command on each node.

12) Run the gendhcpconf script on repository2. Compare its output with
the entries you commented out in step 4, and correct
dhcp1:/etc/dhcp3/dhcpd.conf if needed.

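The comparison can be made mechanical by parsing both sets of entries and reporting the hosts whose MAC or fixed address changed, which is expected when hardware was swapped. This is a sketch assuming gendhcpconf emits ISC-style `host NAME { hardware ethernet ...; fixed-address ...; }` declarations without nested braces; that output format, the function names, and the sample values are assumptions for illustration. Pass the commented-out entries from step 4 as `old_text` so that only the repaired nodes are compared.

```python
import re

HOST_BLOCK_RE = re.compile(r"\bhost\s+([^\s{]+)\s*\{([^}]*)\}")
OPTION_RE = re.compile(r"(hardware\s+ethernet|fixed-address)\s+([^;]+);")


def parse_hosts(conf_text):
    """Map host name -> {'hardware ethernet': ..., 'fixed-address': ...}.

    Leading '#' characters are stripped first, so commented-out entries parse too.
    """
    text = "\n".join(line.lstrip().lstrip("#") for line in conf_text.splitlines())
    hosts = {}
    for name, body in HOST_BLOCK_RE.findall(text):
        # Normalise "hardware   ethernet" to a single space in the key.
        hosts[name] = {" ".join(k.split()): v.strip() for k, v in OPTION_RE.findall(body)}
    return hosts


def diff_hosts(old_text, new_text):
    """Report each host in old_text whose entry differs in (or is missing from) new_text."""
    old, new = parse_hosts(old_text), parse_hosts(new_text)
    return [f"{name}: {old[name]} -> {new.get(name)}"
            for name in sorted(old) if old[name] != new.get(name)]


old_example = (
    "#host node1-1 {\n"
    "#  hardware ethernet 00:11:22:33:44:55;\n"
    "#  fixed-address 10.10.1.1;\n"
    "#}\n"
)
new_example = (
    "host node1-1 { hardware ethernet 00:11:22:33:44:66; fixed-address 10.10.1.1; }\n"
    "host node1-2 { hardware ethernet 00:11:22:33:44:77; fixed-address 10.10.1.2; }\n"
)
for line in diff_hosts(old_example, new_example):
    print(line)
```

Hosts that appear only in the regenerated output are ignored, since they were not part of this repair round.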
13) During the next maintenance slot, verify that you can image all
nodes repaired since the previous maintenance slot by running the CM
stress experiment (NYI).