(3 intermediate revisions by the same user not shown) | |||
Line 151: | Line 151: | ||
|} | |} | ||
(clearly I should turn bcj off by default on ppc64) | (clearly I should turn bcj off by default on ppc64) | ||
=== comments (jreiser) === | |||
I find the data useful. The execve table points to dracut as a slow-poke. Verified: dracut is using a derivative of buildinstall to fondle each file as slowly as possible, negating much of the speedup provided by lorax; estimated loss: a minute (5%). The filename counts show that TMPDIR is not being honored. Using a RAM-backed tmpfs such as /dev/shm saves two minutes (10%), but pungi refuses: https://bugzilla.redhat.com/show_bug.cgi?id=723251. Also, dracut forces TMPDIR=/var/tmp which is another slowdown. | |||
My 3GHZ box with 8GB RAM builds -G -B -C (x86_64 including bcj) in 14m13s; adding -I brings the total to 16m17s. | |||
=== speedups === | === speedups === | ||
Line 430: | Line 436: | ||
There was a bug in pungi that didn't add the PPC specific parameters to the mkisofs commandline. See bugzilla #729798 for the details | There was a bug in pungi that didn't add the PPC specific parameters to the mkisofs commandline. See bugzilla #729798 for the details | ||
== squashed 500MB rootfs saves 535KB (jreiser) == | |||
Squashing a 500MB ext4 rootfs saves 535KB and build time in contrast to a 2GB ext4. | Squashing a 500MB ext4 rootfs saves 535KB and build time in contrast to a 2GB ext4. |
Latest revision as of 19:59, 12 August 2011
Boot problem (jreiser)
[2011-08-06 0000 UTC] Upon boot of DVD, then dracut drops to shell with the message:
Warning: No root device "live:/dev/disk/by-label/Fedora" found.
The file is present as
/dev/disk/by-label/Fedora 16 x86_64 DVD
with those three spaces. Reboot with quoting (single, or double, or backslash) also fails. Altering the boot parameter to
root=ID=ata_SONY_DVD_RW_AW_G-170A
also fails [that name is in my /dev/disk/by-id].
[2011-08-06 1700 UTC] After modifying pungi so that the label of the DVD has dashes "Fedora-16-x86_64-DVD" and changing the boot command line to include the dashes "root=live:CDLABEL=Fedora-16-x86_64-DVD", then dracut still complains "No root device \"live:/dev/disk/by-label/Fedora-16-x86_64-DVD\" found". The device is there, symlinked to ../../sr0 ==> /dev/sr0, and that is the correct name of the hardware drive.
[2011-08-07 0400 UTC] In the emergency shell after dracut cannot find the root, I can find it by:
mkdir /mnt1 /mnt2 /mnt3 mount -o ro,loop /dev/sr0 /mnt1 mount -o ro,loop /mnt1/images/install.img /mnt2 mount -o ro,loop /mnt2/LiveOS/rootfs.img /mnt3
but I cannot find how to specify that in a "root=live..." that dracut understands. It's also a mystery why so many mounts are necessary.
notes/solutions (wwoods)
- The DVD isn't built correctly - pungi changes the label of the image without changing the boot config file. Pungi will need to be fixed to handle that. (boot.iso should work fine.)
- I believe dracut expects spaces to be escaped, so it would need to be:
root=live:CDLABEL=Fedora%x2016%x20x86_64%x20DVD
- As a workaround, you could just boot with
root=live:/dev/sr0
- I believe dracut expects spaces to be escaped, so it would need to be:
- The mounts are necessary because treebuilder images are live images, and all live images are ext4 images inside squashfs images inside an iso (CD/DVD) or vfat (USB stick) filesystem.
- Yes, that does seem needlessly complicated
- No, there isn't a simpler way to do it right now
boot problem on a PowerMac G5
I've tried to boot with 'root=live:/dev/hda', but dracut bailed out with 'Warning: No root device "live:/dev/hda" found'
I've also tried to boot with 'root=live:http://some-server/.../install.img', but dracut aborted with a similar error.
mount traps when systemd mounts /tmp (hamzy)
boot the 08/04/2011 iso image with the following options
linux systemd.log_target=kmsg systemd.log_level=debug rd.break
Seen during booting:
... [ 11.992445] systemd[1]: tmp.mount mount process exited, code=killed status=11 [ 12.029660] systemd[1]: Job loader.service/start failed with result 'dependency'. [ 12.029695] systemd[1]: Unit tmp.mount entered failed state. ...
bash-4.2# systemctl status tmp.mount [ 71.988204] systemd[1]: Accepted connection on private bus. [ 71.988860] systemd[1]: Got D-Bus request: org.freedesktop.systemd1.Manager.LoadUnit() on /org/freedesktop/systemd1 [ 71.989077] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Properties.GetAll() on /org/freedesktop/systemd1/unit/tmp_2emount tmp.mount - Runtime Directory Loaded: loaded (/lib/systemd/system/tmp.mount) Active: inactive (dead) Where: /tmp What: tmpfs CGroup: name=systemd:/system/tmp.mount [ 71.990451] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
FIX:
boot with the following options
linux systemd.log_target=kmsg systemd.log_level=debug rd.break
run the following command in the shell
sed -i -e 's/tmp\.mount //' /sysroot/lib/systemd/system/loader.service
notes (wwoods)
- Only seems to happen on PPC64 systems
- This workaround has been added to the treebuilder branch; should be fixed for images built on or after Aug. 9
Slow (jreiser)
Treebuilder is slower. The first culprit looks like dracut.
number | execve |
---|---|
7606 | /bin/egrep |
2578 | /bin/cp |
1633 | /lib64/ld-linux-x86-64.so.2 |
1434 | /usr/bin/ldd |
1413 | /bin/ln |
1303 | /usr/local/bin/ln |
1303 | /usr/bin/ln |
1087 | /sbin/modinfo |
The second culprit is not honoring $TMPDIR, and not putting yumroot-$PID and installroot-$PID inside it.
count | filename |
---|---|
1693845 | * |
253387 | $DESTDIR/yumroot/... |
79977 | $DESTDIR/installroot/... |
44486 | /etc/xattr.conf |
41713 | /proc/self/task/31818/attr/fscreate |
25919 | /proc/self/task/1070/attr/fscreate |
20220 | /usr |
19487 | /etc/localtime |
18485 | /lib64/libc.so.6 |
18294 | /usr/share |
18184 | /etc/ld.so.cache |
18074 | /etc/ld.so.preload |
17213 | /usr/bin/ldd |
14471 | /bin/egrep |
9861 | /usr/share/locale |
9090 | /var/log/dracut.log |
9013 | /dev/null |
7529 | /sbin/modprobe |
6110 | . |
commentary (wwoods)
- This is pretty useless without some hard numbers. On my years-old test systems, I can complete a run in the following times:
Build type | Arch | Time | Notes |
---|---|---|---|
treebuilder | x86_64 | 19:15 | install.img: 125M |
treebuilder (bcj=off) | x86_64 | 15:07 | install.img: 127M |
treebuilder | ppc64 | 40:10 | install.img: 134M |
treebuilder (bcj=off) | ppc64 | 37:43 | install.img: 134M |
(clearly I should turn bcj off by default on ppc64)
comments (jreiser)
I find the data useful. The execve table points to dracut as a slow-poke. Verified: dracut is using a derivative of buildinstall to fondle each file as slowly as possible, negating much of the speedup provided by lorax; estimated loss: a minute (5%). The filename counts show that TMPDIR is not being honored. Using a RAM-backed tmpfs such as /dev/shm saves two minutes (10%), but pungi refuses: https://bugzilla.redhat.com/show_bug.cgi?id=723251. Also, dracut forces TMPDIR=/var/tmp which is another slowdown.
My 3GHZ box with 8GB RAM builds -G -B -C (x86_64 including bcj) in 14m13s; adding -I brings the total to 16m17s.
speedups
bcj=off
: Add this to /etc/lorax/lorax.conf
[compression] bcj=off
The anaconda installer starts but then hangs (hamzy)
boot the 08/04/2011 iso image with the following options
linux systemd.log_target=kmsg systemd.log_level=debug rd.break
run the following command in the shell
sed -i -e 's/tmp\.mount //' /sysroot/lib/systemd/system/loader.service exit
You will see and then nothing else
Starting Anaconda version 16.14.
Change the file /lib/systemd/system/loader.service to start the loader like this
ExecStart=/usr/bin/strace -f -e write=3,4,5 /sbin/loader
This will run strace on the loader and produce the following output
[pid 606] <... read resumed> "", 4096) = 0 [pid 606] --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=910, si_status=0, si_utime=0, si_stime=1} (Child exited) --- [pid 606] read(11, "", 4096) = 0 [pid 606] waitpid(910, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 910 [pid 606] close(9) = 0 [pid 606] close(11) = 0 [pid 606] access("/mnt/install/source/.treeinfo", R_OK) = -1 ENOENT (No such file or directory) [pid 606] umount("/mnt/install/source", 0) = 0 [pid 606] time(NULL) = 1312818684 [pid 606] send(6, "<143>Aug 8 15:51:24 loader: in "..., 62, MSG_NOSIGNAL) = 62 [pid 606] write(4, "15:51:24,021 DEBUG loader: in do"..., 61) = 61 | 00000 31 35 3a 35 31 3a 32 34 2c 30 32 31 20 44 45 42 15:51:24 ,021 DEB | | 00010 55 47 20 6c 6f 61 64 65 72 3a 20 69 6e 20 64 6f UG loade r: in do | | 00020 4c 6f 61 64 65 72 4d 61 69 6e 2c 20 73 74 65 70 LoaderMa in, step | | 00030 20 3d 20 53 54 45 50 5f 4c 41 4e 47 0a = STEP_ LANG. | [pid 606] select(3, [2], NULL, NULL, {0, 0}) = 0 (Timeout) [pid 606] write(1, "\33[2;21H\33[m\17\33[30m\33[47m\342\224\214\342\224\200\342\224\200\342\224"..., 1926) = 1926 [pid 606] select(1, [0], [], [], NULL
The loader gets as far as outputting the following statement
15:06:02,581 DEBUG loader: in doLoaderMain, step = STEP_LANG.
However, I think that there is still unbuffered strace output that did not make it to the screen.
TODO:
Try to set up debugging with gdbserver
gdbserver :9999 /sbin/loader
$HOME? (jreiser)
During install from boot.iso DVD, then /mnt/sysimage has a "home/<<me>>" directory which contains a copy of my user $HOME at the time I ran pungi, even though I ran pungi within "su; cd" which set $CWD to /root.
SQUASHFS errors (hamzy)
boot the 08/04/2011 iso image with the following options
linux systemd.log_target=kmsg systemd.log_level=debug rd.break
When you are in the shell, you will see the following errors:
switch_root:/# grep -r 'Starting Anaconda' / 2>/dev/null [ 112.508993] SQUASHFS error: xz_dec_run error, data probably corrupt [ 112.509009] SQUASHFS error: squashfs_read_data failed to read block 0x6e32d5a [ 112.509021] SQUASHFS error: Unable to read data cache entry [6e32d5a] [ 112.509028] SQUASHFS error: Unable to read page, block 6e32d5a, size 8ec4 [ 112.509146] SQUASHFS error: Unable to read data cache entry [6e32d5a] [ 112.509152] SQUASHFS error: Unable to read page, block 6e32d5a, size 8ec4 [ 112.541370] SQUASHFS error: Unable to read data cache entry [6e32d5a] [ 112.541379] SQUASHFS error: Unable to read page, block 6e32d5a, size 8ec4 [ 112.541506] SQUASHFS error: Unable to read data cache entry [6e32d5a] [ 112.541512] SQUASHFS error: Unable to read page, block 6e32d5a, size 8ec4
or
bash-4.2# nslookup hamzy2.austin.ibm.com [ 3082.076565] SQUASHFS error: xz_dec_run error, data probably corrupt [ 3082.076587] SQUASHFS error: squashfs_read_data failed to read block 0x6e32d5a [ 3082.076597] SQUASHFS error: Unable to read data cache entry [6e32d5a] [ 3082.076604] SQUASHFS error: Unable to read page, block 6e32d5a, size 8ec4 Bus error
It seems that one of the blocks is corrupt?
missing files for networking (hamzy)
boot the 08/04/2011 iso image with the following options
linux systemd.log_target=kmsg systemd.log_level=debug rd.break
When trying to bring up the networking, I noticed the following missing files:
bash-4.2# ifup eth0 /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/sysconfig/network-scripts/ifup-ipv6: line 91: /sbin/sysctl: No such file or directory /etc/sysconfig/network-scripts/ifup-ipv6: line 115: [: -ne: unary operator expected /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
/bin/mountpoint and /sbin/sysctl are missing from the image.
networking does not work (hamzy)
boot the 08/04/2011 iso image with the following options
linux systemd.log_target=kmsg systemd.log_level=debug rd.break
I am in the shell, I set up the networking by the following commands:
bash-4.2# cat << __EOF__ > /etc/sysconfig/network-scripts/ifcfg-eth0 DEVICE="eth0" BOOTPROTO="static" DNS1="9.10.244.100" GATEWAY="9.5.250.1" HWADDR="2A:0F:4A:82:98:04" ONBOOT="yes" IPADDR="9.5.250.146" IPV6INIT="yes" NETMASK="255.255.255.0" NM_CONTROLLED="yes" TYPE="Ethernet" __EOF__ bash-4.2# cat << __EOF__ > /etc/sysconfig/network NETWORKING=yes HOSTNAME=bluebill __EOF__ bash-4.2# ifup eth0
I then saw the following errors:
bash-4.2# ifup eth0 /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/sysconfig/network-scripts/ifup-ipv6: line 91: /sbin/sysctl: No such file or directory /etc/sysconfig/network-scripts/ifup-ipv6: line 115: [: -ne: unary operator expected /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory /etc/init.d/functions: line 17: /bin/mountpoint: No such file or directory
However, I can bypass the mountpoint errors by setting the SYSTEMCTL_SKIP_REDIRECT environment variable
bash-4.2# SYSTEMCTL_SKIP_REDIRECT=yes ifup eth0 /etc/sysconfig/network-scripts/ifup-ipv6: line 91: /sbin/sysctl: No such file or directory /etc/sysconfig/network-scripts/ifup-ipv6: line 115: [: -ne: unary operator expected
Unfortunately, while networking looks like it is up, the machine is unresponsive to the network (not pingable, not sshable).
bash-4.2# ifconfig eth0 Link encap:Ethernet HWaddr 2A:0F:4A:82:98:04 inet addr:9.5.250.146 Bcast:9.5.250.255 Mask:255.255.255.0 inet6 addr: fe80::280f:4aff:fe82:9804/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:38 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:0 (0.0 b) TX bytes:2460 (2.4 KiB) Interrupt:20 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 b) TX bytes:0 (0.0 b) bash-4.2# nslookup hamzy2.austin.ibm.com [ 3082.076565] SQUASHFS error: xz_dec_run error, data probably corrupt [ 3082.076587] SQUASHFS error: squashfs_read_data failed to read block 0x6e32d5a [ 3082.076597] SQUASHFS error: Unable to read data cache entry [6e32d5a] [ 3082.076604] SQUASHFS error: Unable to read page, block 6e32d5a, size 8ec4 Bus error
possible recursive locking detected (hamzy)
boot the 08/04/2011 iso image with the following options
linux systemd.log_target=kmsg systemd.log_level=debug rd.break
Sometimes, you will see the following:
[ 7.860083] ============================================= [ 7.860110] [ INFO: possible recursive locking detected ] [ 7.860121] 3.0.0-1.fc16.ppc64 #1 [ 7.860130] --------------------------------------------- [ 7.860140] udevd/288 is trying to acquire lock: [ 7.860152] (&mm->mmap_sem){++++++}, at: [<c000000000044560>] .drop_cop+0x70/0x13c [ 7.860185] [ 7.860186] but task is already holding lock: [ 7.860199] (&mm->mmap_sem){++++++}, at: [<c00000000069c3c8>] .do_page_fault+0x244/0x72c [ 7.860227] [ 7.860228] other info that might help us debug this: [ 7.860241] Possible unsafe locking scenario: [ 7.860242] [ 7.860253] CPU0 [ 7.860261] ---- [ 7.860267] lock(&mm->mmap_sem); [ 7.860283] lock(&mm->mmap_sem); [ 7.860302] [ 7.860303] *** DEADLOCK *** [ 7.860304] [ 7.860323] May be due to missing lock nesting notation [ 7.860324] [ 7.860341] 1 lock held by udevd/288: [ 7.860351] #0: (&mm->mmap_sem){++++++}, at: [<c00000000069c3c8>] .do_page_fault+0x244/0x72c [ 7.860390] [ 7.860391] stack backtrace: [ 7.860401] Call Trace: [ 7.860412] [c00000027250aab0] [c000000000015764] .show_stack+0x94/0x144 (unreliable) [ 7.860435] [c00000027250ab70] [c0000000006a3098] .dump_stack+0x24/0x2c [ 7.860455] [c00000027250abf0] [c0000000000ee25c] .__lock_acquire+0x176c/0x17f4 [ 7.860475] [c00000027250ad40] [c0000000000ee858] .lock_acquire+0x10c/0x128 [ 7.860498] [c00000027250ae30] [c000000000698f34] .down_read+0x78/0x12c [ 7.860519] [c00000027250aed0] [c000000000044560] .drop_cop+0x70/0x13c [ 7.860539] [c00000027250af70] [c0000000000448b4] .destroy_context+0x2c/0x50 [ 7.860558] [c00000027250b000] [c0000000000a88d4] .__mmdrop+0x50/0x70 [ 7.860577] [c00000027250b090] [c000000000096c2c] .finish_task_switch+0x10c/0x15c [ 7.860600] [c00000027250b130] [c000000000696284] .schedule+0x9f8/0xaf0 [ 7.860620] [c00000027250b3b0] [c000000000696428] .io_schedule+0xac/0x118 [ 7.860646] [c00000027250b440] [c00000000016f550] .sleep_on_page+0x1c/0x2c [ 7.860668] [c00000027250b4c0] [c00000000069717c] .__wait_on_bit_lock+0x88/0x108 [ 7.860688] [c00000027250b580] [c00000000016f528] .__lock_page+0x6c/0x78 [ 7.860712] [c00000027250b650] [c00000000016f69c] .find_lock_page+0x7c/0xc4 [ 7.860735] [c00000027250b6f0] [c0000000001881b0] .shmem_getpage+0x9c/0xad8 [ 7.860755] [c00000027250b800] [c000000000188c44] .shmem_fault+0x58/0xb8 [ 7.860775] [c00000027250b8b0] [c000000000192e10] .__do_fault+0x78/0x650 [ 7.860795] [c00000027250b9c0] [c000000000196400] .handle_pte_fault+0x3cc/0x9b0 [ 7.860815] [c00000027250bac0] [c000000000197a4c] .handle_mm_fault+0x15c/0x16c [ 7.860840] [c00000027250bb80] [c00000000069c628] .do_page_fault+0x4a4/0x72c [ 7.860860] [c00000027250be30] [c00000000000676c] handle_page_fault+0x20/0x74
ISO image unbootable (hamzy)
booting the 08/09/2011 iso image ( http://ppc.koji.fedoraproject.org/scratch/karsten/iso/Fedora-20110809-ppc64-DVD.iso ) receives the following error:
Welcome to yaboot version 1.3.16 (Red Hat 1.3.16-3.fc16) Enter "help" to get some basic usage information boot: linux Please wait, loading kernel... linux:0,/vmlinux: Unable to open file, Invalid device
notes/solutions (Kick_)
Try again with the 08/10/2011 images, please.
There was a bug in pungi that didn't add the PPC specific parameters to the mkisofs commandline. See bugzilla #729798 for the details
squashed 500MB rootfs saves 535KB (jreiser)
Squashing a 500MB ext4 rootfs saves 535KB and build time in contrast to a 2GB ext4.
+++ b/src/pylorax/treebuilder.py - fssize = 2 * (1024*1024*1024) # 2GB sparse file compresses down to nothin' + fssize = 500 * (1024*1024) # typ:380MB used
-Filesystem size 121315.13 Kbytes (118.47 Mbytes) +Filesystem size 120779.49 Kbytes (117.95 Mbytes)