
Version 13 (modified by thierry, 17 years ago)


How to install a disk image on the nodes of a Testbed (using imageNodes4)

If you have not done so yet, register for an account and make a reservation on the testbed you want to use.

Let's assume that you registered as user 'bob' and made a reservation for the 'grid' testbed. You can then access the 'grid' console with the command:

 ssh bob@console.grid.orbit-lab.org

Then, to image some nodes on the 'grid' testbed, use one of the following commands:

 imageNodes4 all baseline-7.11.ndz 
 # writes the disk image named 'baseline-7.11.ndz' to all the nodes of the 'grid' testbed

 imageNodes4 [1..10,1..5] baseline-7.11.ndz 
 # same as above, but only on the nodes whose coordinates are x in [1..10] and y in [1..5]
 
 imageNodes4 [[3,1],[5,6]] baseline-7.11.ndz 
 # same as above, but only on nodes [3,1] and [5,6]

 imageNodes4 [1,1] baseline-7.11.ndz 
 # same as above, but only on node [1,1]
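
The selector forms above can each be read as a set of (x, y) node coordinates. The following Python sketch is not part of imageNodes4 and does not reproduce its parser; it only illustrates the semantics described in the comments above. The function name expand_selector is hypothetical, and the 20x20 default used for 'all' is an assumption taken from the '[1..20,1..20]' range in the sample output on this page.

```python
import re
from itertools import product


def _parse_axis(text):
    """Parse '3' or '1..10' into an inclusive list of integers."""
    text = text.strip()
    m = re.fullmatch(r"(\d+)\.\.(\d+)", text)
    if m:
        lo, hi = int(m.group(1)), int(m.group(2))
        return list(range(lo, hi + 1))
    return [int(text)]


def expand_selector(selector, grid_size=(20, 20)):
    """Expand a node selector string into a list of (x, y) tuples.

    grid_size is an assumption and is only used for the 'all' selector.
    """
    selector = selector.replace(" ", "")
    if selector == "all":
        xs = range(1, grid_size[0] + 1)
        ys = range(1, grid_size[1] + 1)
        return list(product(xs, ys))
    # Explicit list of nodes, e.g. [[3,1],[5,6]]
    if selector.startswith("[["):
        pairs = re.findall(r"\[(\d+),(\d+)\]", selector)
        return [(int(x), int(y)) for x, y in pairs]
    # Single node or rectangular range, e.g. [1,1] or [1..10,1..5]
    m = re.fullmatch(r"\[([^,\[\]]+),([^,\[\]]+)\]", selector)
    if not m:
        raise ValueError(f"unrecognized selector: {selector!r}")
    return list(product(_parse_axis(m.group(1)), _parse_axis(m.group(2))))


if __name__ == "__main__":
    for s in ("[1,1]", "[[3,1],[5,6]]", "[1..10,1..5]"):
        print(s, "->", len(expand_selector(s)), "node(s)")
```

Under this reading, '[1..10,1..5]' covers 10 x 5 = 50 nodes, while '[[3,1],[5,6]]' covers exactly two.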

The output of the "imaging" process will look like the following:

 Imaging nodes: '[1..20,1..20]' with image 'baseline-7.11.ndz' on default domain (retrieved from hostname)
 INFO init: NodeHandler Version 4.2.0 (1272)
 INFO init: Experiment ID: grid_2007_07_03_03_13_50
 ...
 WARN -:topo:image: Ignoring missing node '5@14'
 ...
 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/387/387 - (still down: n_20_11,n_17_14,n_15_18)
 INFO stdlib: Waiting for nodes (Up/Down/Total): 34/353/387 - (still down: n_20_11,n_17_14,n_15_18)
 INFO stdlib: Waiting for nodes (Up/Down/Total): 61/326/387 - (still down: n_20_11,n_17_14,n_15_18)
 ...
 WARN stdlib: Giving up on node n_4_1
 INFO whenAll: *: 'status[@value='UP']' fires
 INFO exp: Progress(0/0/214): 0/0/0 min(n_20_11)/avg/max (216) - Timeout: 581 sec.
 ...
 INFO exp: Progress(208/14/214): 0/91/100 min(n_18_19)/avg/max (216) - Timeout: 6 sec.
 ...
 INFO Experiment: DONE!
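
If you save this output to a file, a short script can pull out the lines that matter most. The Python sketch below is a hypothetical helper, not a tool shipped with the testbed. It assumes the log lines have exactly the shape shown above and only extracts the fields that the output labels itself (the Up/Down/Total counters, the names of the nodes that were given up on or ignored, and the final DONE marker).

```python
import re

# Patterns copied from the sample output above; the real log format may vary.
WAITING_RE = re.compile(r"Waiting for nodes \(Up/Down/Total\): (\d+)/(\d+)/(\d+)")
GIVEUP_RE = re.compile(r"Giving up on node (\S+)")
MISSING_RE = re.compile(r"Ignoring missing node '([^']+)'")


def summarize_log(lines):
    """Collect the last Up/Down/Total counters, problem nodes, and DONE flag."""
    summary = {"last_waiting": None, "given_up": [], "missing": [], "done": False}
    for line in lines:
        m = WAITING_RE.search(line)
        if m:
            summary["last_waiting"] = tuple(int(g) for g in m.groups())
            continue
        m = GIVEUP_RE.search(line)
        if m:
            summary["given_up"].append(m.group(1))
            continue
        m = MISSING_RE.search(line)
        if m:
            summary["missing"].append(m.group(1))
            continue
        if "Experiment: DONE!" in line:
            summary["done"] = True
    return summary


SAMPLE_LOG = """\
 WARN -:topo:image: Ignoring missing node '5@14'
 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/387/387 - (still down: n_20_11,n_17_14,n_15_18)
 INFO stdlib: Waiting for nodes (Up/Down/Total): 61/326/387 - (still down: n_20_11,n_17_14,n_15_18)
 WARN stdlib: Giving up on node n_4_1
 INFO Experiment: DONE!
"""

if __name__ == "__main__":
    print(summarize_log(SAMPLE_LOG.splitlines()))
```

The sketch leaves the 'Progress(...)' lines alone, because this page does not document the meaning of their fields.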

At the end of the "imaging" process, you will have three topology files in your user directory:

  • system_topo_active_grid.rb - a topology with all the nodes that have successfully been imaged
  • system_topo_failed_grid.rb - a topology with all the nodes that have failed during the "imaging" process (possibly due to some disk read/write errors)
  • system_topo_timedout_grid.rb - a topology with all the nodes that timed out during the "imaging" process. These nodes correctly started writing the image to their disk, but did not finish before the default timeout of 800 seconds.
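
As a quick check after imaging, you can see which of these files are present. The Python sketch below is a hypothetical helper. It assumes that "your user directory" means your home directory on the console and that the file names follow the system_topo_<status>_<testbed>.rb pattern listed above. It does not read the contents of the files.

```python
from pathlib import Path

# Status keywords taken from the three file names listed above.
STATUSES = ("active", "failed", "timedout")


def topology_files(domain, directory=None):
    """Return the expected topology file path for each status.

    directory defaults to the home directory, which is an assumption
    about where imageNodes4 writes these files.
    """
    base = Path(directory) if directory is not None else Path.home()
    return {s: base / f"system_topo_{s}_{domain}.rb" for s in STATUSES}


def report(domain, directory=None):
    """Map each status to True if its topology file exists."""
    return {s: p.is_file() for s, p in topology_files(domain, directory).items()}


if __name__ == "__main__":
    for status, present in report("grid").items():
        print(f"{status:9s} {'found' if present else 'not found'}")
```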

You can then:

  • use the information in the system_topo_active_grid.rb file to check or select which nodes to use in your experiments
  • or directly use the topology defined in this file within your experiment scripts, as described in detail in this tutorial.
