Post by Michael Prokop
Post by T o n g
fails because grub2 installation to PBR is still broken.
http://img264.imageshack.us/img264/3293/screenshotgrub2pbrnok.png
Jepp, the fact that grub's PBR feature is still broken is really sad
(and personally I don't think this will be fixed ever by grub's
developers...).
Since I have now spent about half a day debugging this stuff, I think I
should share what I found out:
Basically grub-install first creates a core.img file (stored for example
in /mnt/hda1/boot/grub/core.img) and then calls
    grub-setup --directory /mnt/hda1/boot/grub --device-map \
        /mnt/hda1/boot/grub/device.map /dev/hda1 --force
grub-setup will find out that /dev/hda1 is not an MBR (i.e. there is too
little space for embedding core.img), complain about it loudly, but then
(as --force is given) go on to build a blocklist for that file (i.e. a
list of the sectors where the file is stored).
To do this, it has to read the core.img file with its built-in
filesystem drivers (the same ones used for loading modules and grub.cfg
later) and remember all the sectors read. So it first has to convert the
"Linux" path into a GRUB path (and this is where the problem lies).
First it stats the path to get st_dev, the number of the device the path
is on. Then it stats each device name listed in the device map to get
st_rdev (the device number that device node refers to). If the two are
the same, root_dev is set to the GRUB name of that device. In our
example, root_dev will now be (hd0,1), because that is what is stored in
device.map for /dev/hda1 (which is the drive /mnt/hda1 is mounted from).
(In a similar way, dest_dev is set to the GRUB device name where the
bootloader should be stored, which is (hd0,1) as well here, but might be
different.)
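
To illustrate the matching step, here is a rough sketch. It is Python
rather than GRUB's C and is not taken from the GRUB source; the function
name find_grub_device, the dictionary form of the device map and the
injectable stat parameter are assumptions I made up for the example.

```python
import os


def find_grub_device(path, device_map, stat=os.stat):
    """Return the GRUB name of the device that `path` lives on, or None.

    device_map maps GRUB names to OS device nodes, for example
    {"(hd0,1)": "/dev/hda1"} (a hypothetical layout for this sketch).
    `stat` is injectable only so the logic can be tried without real devices.
    """
    # st_dev: number of the device that contains the file.
    path_dev = stat(path).st_dev
    for grub_name, os_device in device_map.items():
        try:
            node = stat(os_device)
        except OSError:
            continue  # device node does not exist, skip it
        # st_rdev: number of the device that this node represents.
        if node.st_rdev == path_dev:
            return grub_name
    return None
```

On a real system you would call it as
find_grub_device("/mnt/hda1/boot/grub/core.img", {"(hd0,1)": "/dev/hda1"}).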
Now that we have root_dev set to (hd0,1) and core_path_dev set to
/mnt/hda1/boot/grub/core.img, we reach the line
    #define MAX_TRIES 5
(I will refer to that line again later). After that line, grub-setup
tries in a loop to grub_file_open the file (root_dev, core_path_dev).
grub_file_open basically does nothing more than concatenate the device
and the path, i.e. it passes (hd0,1)/mnt/hda1/boot/grub/core.img to the
filesystem driver (ext2 in my example). The filesystem driver answers
"file not found", and it is right about that, as there is no /mnt/hda1
on that device.
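
To make the mismatch concrete, here is a small Python sketch (again not
GRUB code; both function names are made up for illustration). It
compares the path that is effectively requested with the path the
filesystem driver could actually find, assuming the mount point is known.

```python
import posixpath


def naive_grub_path(root_dev, core_path):
    """Device name plus the full Linux path, as described above."""
    return root_dev + core_path


def expected_grub_path(root_dev, core_path, mount_point):
    """Device name plus the path relative to the mount point."""
    rel = posixpath.relpath(core_path, mount_point)
    return root_dev + "/" + rel


root_dev = "(hd0,1)"
core_path = "/mnt/hda1/boot/grub/core.img"

print(naive_grub_path(root_dev, core_path))
# (hd0,1)/mnt/hda1/boot/grub/core.img
print(expected_grub_path(root_dev, core_path, "/mnt/hda1"))
# (hd0,1)/boot/grub/core.img
```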
[This is where the Debian hack I talked about yesterday on IRC comes
in. If this call fails with "file not found", Debian will overwrite
core_path_dev with "/grub/core.img". If you have a dedicated boot
partition, this is exactly what you want, since (hdX,Y)/grub/core.img
will point to the right file in this case. But it is a dirty hack which
makes debugging a lot harder if that is not the case, which is the
reason I used a freshly compiled vanilla GRUB for debugging this.]
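
As a toy model of that fallback (Python, not the actual Debian patch;
the function name open_core_img and the set standing in for the
filesystem driver are assumptions of this sketch):

```python
def open_core_img(files_on_partition, core_path, debian_fallback=False):
    """Return the path that finally gets opened, or None if nothing is found.

    files_on_partition is a set of paths as seen from the root of the
    partition; it stands in for the GRUB filesystem driver here.
    """
    if core_path in files_on_partition:
        return core_path
    if debian_fallback:
        # Behaviour as described above: on "file not found",
        # retry with a fixed path.
        fallback = "/grub/core.img"
        if fallback in files_on_partition:
            return fallback
    return None


wrong_path = "/mnt/hda1/boot/grub/core.img"

# Dedicated boot partition: the fallback happens to hit the right file.
print(open_core_img({"/grub/core.img"}, wrong_path, debian_fallback=True))
# /grub/core.img

# core.img under boot/grub on the partition: the fallback does not help.
print(open_core_img({"/boot/grub/core.img"}, wrong_path, debian_fallback=True))
# None
```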
Therefore, to make this work, there are a few options:
(1) Make sure that the drive core.img is on is mounted on /.
In this case, the path will be correct. Obviously this is not a real
solution, since you cannot always ensure this.
(2) Add a symlink so that (in our example) /mnt/hda1/mnt/hda1 is a
(relative) symlink pointing back to the root of the partition, i.e. to
/mnt/hda1:
    mkdir /mnt/hda1/mnt && ln -s .. /mnt/hda1/mnt/hda1
In this case, the filesystem driver will find the core.img file and
creating the blocklist works. (I tested this and it worked for me.)
As that symlink is only needed while grub-setup is running, it can be
removed afterwards.
(3) With a Debian "patched" grub, you can alternatively create a /grub
symlink on that device which points to boot/grub.
(4) Patch $GRUB_SOURCE/util/i386/pc/grub-setup.c before or after the
#define line mentioned above so that core_path_dev is corrected. A
simple way would be to strip components from the start of the path
until grub_file_open finds the file. But this can fail, for example
when you pass a path like /tmp/blah/grub/core.img where /tmp/blah is a
symlink to /mnt/hda1/boot. The proper solution is a bit harder. I don't
know whether there is a syscall or library function that returns a path
relative to its mount point; if not, we would have to resolve all
symlinks, find the mount point (for example by stat'ing each parent
directory until st_dev changes) and strip it from the path. I am not a
good C coder (and I don't have papers on file at the FSF), so I won't
propose a patch for this. Feel free to send this upstream wherever you
like. It would at least solve *this* problem with installing to a PBR.
There are still other (unsolvable) problems with PBR installs, such as
the core image being on a RAID0 or RAID5 array, but at least the common
case (core.img is on an ext[234] partition) will work again :)
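
The mount point lookup described in (4) could look roughly like this.
It is a Python sketch of the idea only, not a patch for grub-setup; the
function name path_relative_to_mount is made up. It also assumes that a
change of st_dev marks the mount point, which does not hold for bind
mounts of the same filesystem.

```python
import os


def path_relative_to_mount(path):
    """Return `path` as seen from the root of the filesystem it lives on."""
    # Resolve all symlinks first, so that a path going through
    # /tmp/blah -> /mnt/hda1/boot ends up on the real device.
    real = os.path.realpath(path)
    dev = os.stat(real).st_dev

    # Start at the containing directory and walk up for as long as the
    # parent directory is still on the same device.
    mount_point = real if os.path.isdir(real) else os.path.dirname(real)
    while True:
        parent = os.path.dirname(mount_point)
        if parent == mount_point or os.stat(parent).st_dev != dev:
            break  # reached / or crossed onto another device
        mount_point = parent

    # Strip the mount point from the front of the path.
    rel = os.path.relpath(real, mount_point)
    return "/" if rel == "." else "/" + rel
```

With /dev/hda1 mounted on /mnt/hda1, calling
path_relative_to_mount("/mnt/hda1/boot/grub/core.img") should give
/boot/grub/core.img, which is the path the filesystem driver needs.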
For GRML I suggest changing the scripts as outlined in (2) (or (3)) above :)
Have fun,
Michael