wiki:Internal/ImageInstall

Version 27 (modified by ssugrim, 17 years ago)

So here's my first attempt at poking around the orbit infrastructure. The goal: clone a machine. My candidate is orbit-pc2, one of the user service machines. It's going to be migrated to a Dell.

Step 1

PXE network booting (on the source machine):

This is a concerted effort between the dhcp server and the tftp server. In the dhcpd.conf file, the "next-server" parameter is interpreted by the pxe booting client as "where to look" for the boot image (and associated directive files). The booting client then initiates a tftp session with the "next-server" and downloads a directive file that tells it what image to copy, uncompress, and boot.

Our current setup:
DHCPD runs on dhcp1.orbit-lab.org (10.0.0.1)
AFTPD runs on repository2.orbit-lab.org (10.0.50.40, this is "next-server") 
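
When checking which machine a client will be sent to, it can help to pull the next-server value straight out of the config. The following is a minimal sketch only: the sample group block, the "pxelinux.0" filename and the function name next_servers are assumptions made up for illustration, not a copy of our real dhcpd.conf.

```shell
# Print every next-server value found in a dhcpd.conf-style file.
# $1 = path to the config file (pass the real path on the DHCP server).
next_servers() {
  sed -n 's/^[[:space:]]*next-server[[:space:]]\{1,\}\([^;[:space:]]*\);.*/\1/p' "$1"
}

# Demo against a made-up fragment (NOT our real config).
sample=$(mktemp)
cat > "$sample" <<'EOF'
group {
  next-server 10.0.50.40;
  filename "pxelinux.0";
}
EOF
next_servers "$sample"
rm -f "$sample"
```

If several groups are defined, the function prints one address per next-server line, so you still have to work out which group your client lands in.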

Step 2

Getting the boot image:

On the tftp server a boot instructions file (BIF) is stored in /tftpboot/pxelinux.cfg/ . It contains information on what image to download, and what boot flags to apply to the kernel.

When the booting client wakes up and establishes a tftp session with "next-server", it looks in the /tftpboot/pxelinux.cfg directory for a BIF that has its IP ADDRESS as the name of the file (IN HEX, with no field delimiters).

E.G. if your IP was 
10.0.250.246
the file would have to be named 
0A00FAF6
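
Doing the hex conversion by hand is error prone, so here is a small sketch that does it with printf. The function name ip_to_hex is my own invention; nothing on the servers is called that.

```shell
# Convert a dotted-quad IPv4 address to the 8-digit uppercase hex file name.
ip_to_hex() {
  # Split the argument on dots into four octets.
  oldifs=$IFS; IFS=.
  set -- $1
  IFS=$oldifs
  # Print each octet as two uppercase hex digits, no delimiters.
  printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

ip_to_hex 10.0.250.246   # prints 0A00FAF6
```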

NOTE: Instead of copying the BIF, the hex-named file can be a symlink. The link should point to the BIF, NOT the image. In this particular case I linked to default.orbit-pxe.

ln -s default.orbit-pxe 0A00FAF6

Assuming that the image file and BIF are valid, the booting client will boot and eventually put you at a console. If a file with the appropriate name isn't there, the client will strip off the last hex digit and search for a file of that name (in our example the next would be 0A00FAF), and proceed until there are no digits left. This can be used to get a collection of clients in a range of addresses to boot from the same image, by naming the file after the leading hex digits they share.
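
To see which names a client would walk through, the search can be sketched as a loop that drops one trailing character of the hex name per step. This is only an illustration (bif_candidates is a made-up name). As far as I know, stock pxelinux also tries a MAC-based name before the hex IP and a file called "default" at the very end; neither is modelled here.

```shell
# List the file names tried for a given hex name, longest first,
# dropping one trailing hex digit per step.
bif_candidates() {
  name=$1
  while [ -n "$name" ]; do
    printf '%s\n' "$name"
    name=${name%?}   # strip the last character
  done
}

bif_candidates 0A00FAF6
```

Any name in that list that exists in the pxelinux.cfg directory will be picked up, so a file named 0A00FA would serve every client in 10.0.250.x.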


Some snags I've run into:

P: DHCP "next-server" points to the wrong machine
S: Edit the dhcpd.conf file on the dhcp server and adjust the next-server parameter for the appropriate group. (Make sure you're adjusting the next-server for the group to which you belong. You can watch the boot messages on the pxe-client to figure out what group dhcp is putting you in.)

P: TFTPD doesn't respond
S: Restart TFTPD on the next-server. In our current setup this is repository2 and the server name is AFTPD.

P: No BIF gets loaded by the client, and the machine moves to other boot media
S: Make sure the file is named appropriately. When the pxe client searches for the BIF, it lists (and displays on screen) the names it tries. If you don't see this list of attempts, the client hasn't found / communicated with the tftp server. If you see the list, a clever use of the Pause/Break key will help you find what it's looking for.


Once at the console, I mounted the resting place for the image.

Step 3

Finding a place to put your image:

mount -o nolock repository2:/export/orbit/image/tmp /mnt

We need the nolock option because of some NFS bug.
Repository2:yadayada is the NFS destination (where the image will be dumped)
/mnt is my mount point (could be anything really)

So now that I have some place to put the image file, I can create it with the imagezip utility (should live in /bin on the image).

Note: NFS is not particularly forthcoming with information. For instance if you type the name of the share point incorrectly, you get a permission denied error. Somewhat misleading.

You should be in ??? network, otherwise repository2 won't allow you to mount. You might also run into some routing issues, and thus not be able to get a packet to repository2 even though you can resolve the name.

Step 4

Getting the image:

imagezip /dev/hda - > /mnt/orbit-pc1...

imagezip is part of the frisbee suite. It is dumb and will just start dumping to stdout. To use it you need to redirect that to a file or some such place. (unix pipe + ssh?)
/dev/hda is my source (where I want the image to come from)
the source is followed by a -. I don't know the reason for sure, but it is most likely the usual Unix convention of "-" in place of an output file name meaning "write to stdout".
> is the unix redirect
/mnt/orbit-pc1… is my file name (note that I put it on the nfs mounted directory, since the image has no place / space to hold anything. And even if it did, how would you get at it?)
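
The "unix pipe + ssh" idea could look something like the sketch below. I haven't tried it; it assumes the boot image has an ssh client and that imagezip really does write the image to stdout when given "-". To stay harmless, the function only builds and prints the command line instead of running it. stream_image_cmd, user@imagehost and /tmp/disk.img are placeholders, not real names.

```shell
# Build (but do not run) a command that streams an image over ssh
# instead of writing it to an NFS mount.
# $1 = source disk, $2 = ssh destination, $3 = file to create on that host.
stream_image_cmd() {
  printf "imagezip %s - | ssh %s 'cat > %s'\n" "$1" "$2" "$3"
}

# Placeholders only; substitute a real account, host and path.
stream_image_cmd /dev/hda user@imagehost /tmp/disk.img
```

Once the printed line looks right, it can be pasted at the console. That would skip the NFS mount from Step 3 entirely.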

Assuming that all went well, you should have a file sitting somewhere on some box, that you can get at later.

Now the fun begins. We have an image; it would be useful to put it somewhere.

Since frisbee is a client / server type service, I'll need a place for the server to run. This place should have access to my freshly generated image. Mogwai (?!$#@) is our candidate. Once ssh'd to mogwai, I'll scp my file from repository2, and place it in /tmp. (an exercise left to the reader)

Step 5

Starting up a frisbee server/daemon (frisbeed):

frisbeed -p 5050 -m 10.0.250.221 /tmp/orbit-pc2...

frisbeed lives in /usr/sbin. Note the d for daemon.
-p 5050 is the port number for the service (I just picked one at random)
-m 10.0.250.221 is the address of the client. This could be some kind of multicast address, but to keep things simple I figured I'd just do one machine for now.
/tmp/orbit-pc2 is my image file, freshly copied.

Step 6

Pulling down the image to the "client"

Note: This setup requires that you perform steps 1 and 2 on the target machine. From here you should be at a console similar to the one you had when using the imagezip utility. Since all I'm doing is using the frisbee client to get the image from the frisbee server I made in step 5, I don't need step 3.

frisbee -p 5050 -m 10.0.101.10 -i 10.0.250.221 /dev/hda

frisbee should live in /bin somewhere on the winlab.img that we booted into.
-p 5050 is the randomly chosen port number
-m 10.0.101.10 is mogwai's ip (my service host)
-i 10.0.250.221 is the interface I'm trying to use to get my image from. No idea why I need this, but if I don't put it there, it complains about not being able to resolve the host name. ("gethostbyname: Unknown host")
/dev/hda is where I'm going to dump my image. (Hope you don't have anything there you need, it's going to be gone in a minute).

Note: Frisbee depends on multicast and thus can't "route". This means that both client and server must be in the same subnet.
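
A quick way to check the same-subnet requirement before starting anything is to mask both addresses and compare. This is a sketch under assumptions: the function names are mine, and the /16 prefix in the example is a guess rather than our documented netmask, so plug in the real one.

```shell
# Turn a dotted-quad IPv4 address into a 32-bit integer.
ip_to_int() {
  oldifs=$IFS; IFS=.
  set -- $1
  IFS=$oldifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if two addresses fall in the same network.
# $1, $2 = addresses, $3 = prefix length (e.g. 16 or 24).
same_subnet() {
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}

# Server and client addresses from the frisbee command; /16 is assumed.
if same_subnet 10.0.101.10 10.0.250.221 16; then
  echo "same subnet"
else
  echo "different subnets"
fi
```

With a /24 prefix the same two addresses would come out as different subnets, so the answer depends entirely on the netmask actually configured on the interfaces.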

Start the server (frisbeed) first, start the client second. Assuming that it worked, you should see a progress indicator.


Additional notes

  • you can specify a multicast address with the -m flag, for doing multiple nodes. E.G. 224.0.0.10
  • when starting frisbeed on repository2 make sure to tell it which interface to listen on (via the -i ipaddress parameter; repository2 has multiple interfaces, some real, some virtual), otherwise the client will get confused about an address mismatch.
  • if you were imaging a console (and thus using one of the console images), once the system has booted into the image you need to make 2 post-image changes and then reboot
    1. In /etc/udev/rules.d/ remove the z25_persistent-net.rules file. If it is not removed, it will screw up the numbering of the interfaces
    2. change the hostname.
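
The two post-image changes can be sketched as one small function. Assumptions, loudly: post_image_fixup is a made-up name, and writing the name into /etc/hostname is the Debian-style way of setting it, which may not match every console image. The demo runs against a scratch directory so nothing real gets touched.

```shell
# Apply the two post-image changes under a given root directory.
# $1 = root of the imaged filesystem ("/" on the live machine)
# $2 = new hostname (written to etc/hostname; Debian-style assumption)
post_image_fixup() {
  root=${1%/}
  # Glob so minor spelling variations of the rules file still match.
  rm -f "$root"/etc/udev/rules.d/z25_persisten*net.rules
  printf '%s\n' "$2" > "$root/etc/hostname"
}

# Dry run against a scratch tree instead of a real system.
scratch=$(mktemp -d)
mkdir -p "$scratch/etc/udev/rules.d"
touch "$scratch/etc/udev/rules.d/z25_persistent-net.rules"
post_image_fixup "$scratch" console-test
cat "$scratch/etc/hostname"
rm -rf "$scratch"
```

On the real machine this would be run as root with "/" as the first argument, followed by the reboot.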