= Changes in nodehandler to address message losses and other issues =

== Nodehandler tests ==

* '''Imaging 400 nodes'''

1) After starting nodehandler (this applies to both imaging and experimentation), start the communication layer process (ind1).
2) Four communication groups are created for imaging all nodes. Each group is responsible for a prespecified set of nodes. (The assignment could be moved to a config file.)
3) The communication layer has to be started manually, but nodehandler terminates it automatically at the end of the experiment.

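The note about moving the group assignment to a config file could look roughly like the sketch below. Everything in it is an assumption for illustration: the JSON layout, the group names, the integer node indices, and the even contiguous split are hypothetical, since the notes only say that each of the four groups handles a prespecified set of nodes.

```python
import json
from typing import Dict, List

# Hypothetical config contents: group name -> inclusive [first, last] node index.
# The even, contiguous split is an assumption, not taken from nodehandler.
EXAMPLE_CONFIG = """
{
  "group1": [1, 100],
  "group2": [101, 200],
  "group3": [201, 300],
  "group4": [301, 400]
}
"""


def load_groups(text: str) -> Dict[str, List[int]]:
    """Expand each [first, last] range into an explicit list of node indices."""
    raw = json.loads(text)
    return {name: list(range(first, last + 1)) for name, (first, last) in raw.items()}


def group_of(node: int, groups: Dict[str, List[int]]) -> str:
    """Return the name of the group responsible for the given node."""
    for name, members in groups.items():
        if node in members:
            return name
    raise KeyError(f"node {node} is not assigned to any group")
```

With such a file, changing which group images which nodes would not require touching the code that creates the groups.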
* Main steps
1) The group size is fixed at 80 nodes (the "magic number").
2) Switch on nodes in groups of 80.
3) Retry up to three times.
4) Give up on nodes that still do not boot into PXE.
5) Then switch on the next group of 80, and so on.
6) Continue until whenAll is satisfied, then start the frisbee process.
7) Switch off nodes in the order of completion.

Frisbee time is fairly constant; the main problem is the initial boot into the PXE image.
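The main steps above can be sketched in a few lines of Python. This is only an illustration of the batching and retry logic, not nodehandler's actual code: `power_on` is a hypothetical callback that switches on a batch and returns the nodes that reached PXE, and "retry up to three times" is read here as one initial attempt plus at most three retries.

```python
from typing import Callable, Dict, List, Set, Tuple

GROUP_SIZE = 80   # group size from the notes
MAX_RETRIES = 3   # "retry up to three times"


def boot_in_groups(
    nodes: List[str],
    power_on: Callable[[List[str]], Set[str]],
    group_size: int = GROUP_SIZE,
    max_retries: int = MAX_RETRIES,
) -> Tuple[List[str], List[str]]:
    """Power on nodes one group at a time; return (booted, abandoned).

    power_on is a hypothetical callback: it switches on the given nodes
    and returns the subset that booted into PXE.
    """
    booted: List[str] = []
    abandoned: List[str] = []
    for start in range(0, len(nodes), group_size):
        pending = nodes[start:start + group_size]
        # Assumption: one initial attempt plus up to max_retries retries.
        for _attempt in range(1 + max_retries):
            if not pending:
                break
            up = power_on(pending)
            booted.extend(n for n in pending if n in up)
            pending = [n for n in pending if n not in up]
        # Nodes still pending after all retries are given up on.
        abandoned.extend(pending)
    return booted, abandoned


def shutdown_order(finish_times: Dict[str, float]) -> List[str]:
    """Order nodes for power-off by the time their imaging completed."""
    return sorted(finish_times, key=lambda node: finish_times[node])
```

Once `boot_in_groups` returns, the frisbee process would be started for the booted nodes, and `shutdown_order` gives the sequence in which they are switched off as they finish.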