= Changes in nodehandler to address message losses and other issues =

== Nodehandler tests ==

* '''Imaging 400 nodes'''

1) After starting the nodehandler (for both imaging and experimentation), start the communication layer process (ind1).
2) Four communication groups are created for imaging all nodes; each group is responsible for a prespecified set of nodes. (This assignment could be moved to a config file.)
3) The communication layer has to be started manually, but the nodehandler terminates it automatically at the end of the experiment.
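The page notes that the node-to-group assignment could be moved to a config file. Below is a minimal sketch of what that might look like. It assumes a JSON file mapping each group name to its node list. The `node<x>-<y>` names, the `group1`..`group4` labels and the contiguous 100-node split are all hypothetical, not the nodehandler's actual assignment:

```python
import json


def default_groups(nodes: list[str], n_groups: int = 4) -> dict[str, list[str]]:
    """Split nodes into n_groups contiguous groups of near-equal size (hypothetical policy)."""
    size, extra = divmod(len(nodes), n_groups)
    groups, start = {}, 0
    for i in range(n_groups):
        end = start + size + (1 if i < extra else 0)
        groups[f"group{i + 1}"] = nodes[start:end]
        start = end
    return groups


def load_groups(config_text: str) -> dict[str, list[str]]:
    """Parse a JSON group config, rejecting any node assigned to two groups."""
    groups = json.loads(config_text)
    owner = {}
    for name, members in groups.items():
        for node in members:
            if node in owner:
                raise ValueError(f"{node} is in both {owner[node]} and {name}")
            owner[node] = name
    return groups


def owner_of(node: str, groups: dict[str, list[str]]) -> str:
    """Return the communication group responsible for `node`."""
    for name, members in groups.items():
        if node in members:
            return name
    raise KeyError(node)


if __name__ == "__main__":
    # Hypothetical 20x20 grid naming: node1-1 .. node20-20.
    nodes = [f"node{x}-{y}" for x in range(1, 21) for y in range(1, 21)]
    config_text = json.dumps(default_groups(nodes), indent=2)  # would live in a file
    groups = load_groups(config_text)
    print({name: len(members) for name, members in groups.items()})
    print(owner_of("node20-20", groups))
```

Validating the config at load time means a node accidentally listed in two groups is caught before any imaging starts.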

* '''Main steps'''
1) 80 is the magic number for the group size.
2) Switch on nodes in groups of 80.
3) Retry powering on nodes that fail to boot, up to three times.
4) Give up on any node that still does not boot into PXE.

5) Then switch on the next group of 80, and so on.
...

6) Continue until the whenAll condition is met, then start the frisbee process.
7) Switch off nodes in the order in which they finish imaging.
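The main steps above can be sketched as a batching loop with per-group retries. This is a hedged simulation, not the nodehandler's code. `power_on` and `run_frisbee` are hypothetical callbacks standing in for the real power-control and frisbee calls. "Retry up to three times" is read as three extra attempts after the first one. "whenAll" is read as "every group has either booted or been given up on". Both readings are assumptions:

```python
import random
from typing import Callable, Iterator

GROUP_SIZE = 80   # the "magic number" group size from the notes
MAX_RETRIES = 3   # assumption: up to 3 retries after the first power-on

# Hypothetical callback types standing in for the real hardware actions.
PowerOn = Callable[[list[str]], set[str]]          # returns nodes that reached PXE
Frisbee = Callable[[list[str]], dict[str, float]]  # returns node -> finish time


def batches(nodes: list[str], size: int = GROUP_SIZE) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` nodes."""
    for i in range(0, len(nodes), size):
        yield nodes[i:i + size]


def boot_group(group: list[str], power_on: PowerOn,
               max_retries: int = MAX_RETRIES) -> tuple[list[str], list[str]]:
    """Switch on one group and retry only the nodes that did not boot into PXE.

    Returns (booted, given_up)."""
    pending, booted = list(group), []
    for _ in range(1 + max_retries):
        if not pending:
            break
        ok = power_on(pending)
        booted.extend(n for n in pending if n in ok)
        pending = [n for n in pending if n not in ok]
    return booted, pending  # whatever is still pending is given up on


def image_all(nodes: list[str], power_on: PowerOn,
              run_frisbee: Frisbee) -> tuple[list[str], list[str]]:
    """Boot all groups one after another, then image the booted nodes.

    Returns (shutdown_order, given_up)."""
    booted, given_up = [], []
    for group in batches(nodes):
        ok, bad = boot_group(group, power_on)
        booted.extend(ok)
        given_up.extend(bad)
    # Assumed "whenAll": every group has been handled, so start frisbee.
    finish = run_frisbee(booted)
    # Switch nodes off in the order in which they finish imaging.
    shutdown_order = sorted(booted, key=lambda n: finish[n])
    return shutdown_order, given_up


if __name__ == "__main__":
    rng = random.Random(1)
    nodes = [f"node{i}" for i in range(1, 401)]

    def fake_power_on(pending):
        # Simulated: each node reaches PXE with probability 0.7 per attempt.
        return {n for n in pending if rng.random() < 0.7}

    def fake_frisbee(ready):
        # Simulated: roughly constant imaging time with some jitter.
        return {n: 300 + rng.uniform(-10, 10) for n in ready}

    order, failed = image_all(nodes, fake_power_on, fake_frisbee)
    print(f"{len(order)} nodes imaged, gave up on {len(failed)}: {failed}")
```

Passing the hardware actions in as callbacks keeps the scheduling logic (group size, retry limit, shutdown order) testable without access to the testbed.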

Frisbee time is fairly constant; the main problem is the initial boot into the PXE image.