Rescue initramfs, anyone? With SSH, LUKS, LVM?

szatox · Post by **szatox** » Sun Mar 30, 2025 7:10 pm

I'm basically rebuilding a voyager, which means I need to be a bit creative with my installation media. Unfortunately, the way I used to manually build initramfs currently results in its contents exceeding 1GB, which is... way too much.
So, does anyone have a readily available initramfs (or a recipe for one) which would let me connect via SSH, repartition disk, encrypt it, create an LVM and format root?

Genkernel seems to be falling just a bit short; its included tools are _too_ minimal for my liking. They look kinda familiar, but I don't know how to use them.

pingtoo · Post by **pingtoo** » Sun Mar 30, 2025 7:28 pm

Not an initrd per se, but maybe you could consider Alpine Linux in diskless mode. This way you will have a complete Linux environment and can dynamically load any utilities for building Gentoo (or any other OS).

zen_desu · Post by **zen_desu** » Sun Mar 30, 2025 11:15 pm

szatox wrote:I'm basically rebuilding a voyager, which means I need to be a bit creative with my installation media. Unfortunately, the way I used to manually build initramfs currently results in its contents exceeding 1GB, which is... way too much.
So, does anyone have a readily available initramfs (or a recipe for one) which would let me connect via SSH, repartition disk, encrypt it, create an LVM and format root?

Genkernel seems to be falling just a bit short; its included tools are _too_ minimal for my liking. They look kinda familiar, but I don't know how to use them.

I made a basic dropbear module for ugrd, just to test. I can help you make a module specifically for this project if you're interested.

ugrd as a tool mostly helps find/copy dependencies and make a basic POSIX shell script to run things. The dropbear module starts dropbear as part of the init, and then forces it to run a small shell script which eventually kills the dropbear server so the main init keeps moving along.

again, warning, just test code: https://github.com/desultory/ugrd-dropb ... ropbear.py

I could possibly add a dev option to make it so the generated initramfs doesn't attempt to switch_root, possibly making it run a specified target within the initramfs. Making it use a squashfs is also another option.

szatox · Post by **szatox** » Mon Mar 31, 2025 1:37 pm

Honestly both options look like they should meet my requirements. Alpine is known for its focus on being small, and using a squashfs as root could really save me a good chunk of precious RAM.
Still, since genkernel is deprecated now, I might as well start thinking about replacing it on my systems.

Zen_desu, I see ugrd has a pretty extensive reference documentation (good), but is there any quickstart guide? You know, some equivalent of a "Hello world"?

I could possibly add a dev option to make it so the generated initramfs doesn't attempt to switch_root, possibly making it run a specified target within the initramfs. Making it use a squashfs is also another option.

Storing a part of initramfs in a squashfs does sound like a cool feature, but I don't think it is something that anyone (besides me right now) would care about. I'm just trying to abuse the hell out of the tools that already exist to avoid doing the work myself.
About ssh itself, IMO there should be 2 or 3 paths selected by a boot flag: activate server and wait for the admin to kill it before mounting root (interactive boot), or don't activate sshd and just proceed with boot (regular boot). The optional 3rd path would be: activate sshd in case of boot failure instead of dropping to local shell (remote rescue mode).
This looks like a handy thing to have on pretty much any server, avoiding going through annoyingly slow and clunky webconsoles.
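
A minimal sketch of the flag-driven selection described above, written in Python purely for illustration (a real initramfs would do this in shell, reading the string from /proc/cmdline). The `rd.ssh` flag name, its `wait`/`rescue` values, and the returned path names are all made up for this example; they are not existing ugrd or kernel options.

```python
def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into key/value pairs; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        # Split only on the first '=', so "root=UUID=abcd" keeps "UUID=abcd".
        key, _, value = token.partition("=")
        params[key] = value
    return params


def select_boot_path(cmdline: str, boot_failed: bool = False) -> str:
    """Pick one of the boot paths from a hypothetical rd.ssh= flag."""
    mode = parse_cmdline(cmdline).get("rd.ssh", "off")  # hypothetical flag name
    if mode == "wait":
        # Interactive boot: start sshd and wait for the admin before mounting root.
        return "interactive"
    if boot_failed:
        # Remote rescue only if it was requested, otherwise the usual local shell.
        return "remote-rescue" if mode == "rescue" else "local-shell"
    # Regular boot: no sshd, just carry on.
    return "regular"
```

With this, `select_boot_path("root=/dev/sda1")` takes the regular path, while adding `rd.ssh=wait` in the bootloader (permanently or as a one-time edit) switches to the interactive one. No timeouts or heuristics are involved; the admin decides in advance.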

zen_desu · Post by **zen_desu** » Mon Mar 31, 2025 3:24 pm

szatox wrote:Honestly both options look like they should meet my requirements. Alpine is known for its focus on being small, and using a squashfs as root could really save me a good chunk of precious RAM.
Still, since genkernel is deprecated now, I might as well start thinking about replacing it on my systems.

Zen_desu, I see ugrd has a pretty extensive reference documentation (good), but is there any quickstart guide? You know, some equivalent of a "Hello world"?

I could possibly add a dev option to make it so the generated initramfs doesn't attempt to switch_root, possibly making it run a specified target within the initramfs. Making it use a squashfs is also another option.
Storing a part of initramfs in a squashfs does sound like a cool feature, but I don't think it is something that anyone (besides me right now) would care about. I'm just trying to abuse the hell out of the tools that already exist to avoid doing the work myself.
About ssh itself, IMO there should be 2 or 3 paths selected by a boot flag: activate server and wait for the admin to kill it before mounting root (interactive boot), or don't activate sshd and just proceed with boot (regular boot). The optional 3rd path would be: activate sshd in case of boot failure instead of dropping to local shell (remote rescue mode).
This looks like a handy thing to have on pretty much any server, avoiding going through annoyingly slow and clunky webconsoles.

ugrd is kinda designed to "just work" for encrypted root installs, as in it will detect most device info and make an initramfs specifically designed to boot back into that state.

Most project docs are in the "docs" folder of the git repo, there is a bit of info for writing modules in there, but I'd suggest checking out that dropbear module as an example of a working fully external module, or a simpler builtin module like the ext4 one which just adds shell lines to run FSCK for you.

The design is a bit unique but most of the actual code is done using python, where "init functions" take the python function name to be used as the shell function name, and lines returned by the python function are added to that shell function. You can see where it goes through functions and adds them to the profile here: https://github.com/desultory/ugrd/blob/ ... or.py#L243
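
To illustrate the idea only (this is not ugrd's actual code, and both function names here are hypothetical), a Python function can return the body lines, and a small helper can wrap them in a shell function named after the Python function:

```python
def mount_root() -> list[str]:
    """Hypothetical init function: returns the body lines of a shell function."""
    return ['echo "Mounting root"', "mount /target_rootfs"]


def render_shell_function(func) -> str:
    """Wrap the lines returned by func in a shell function named after func."""
    body = "\n".join(f"    {line}" for line in func())
    # Doubled braces produce literal { and } in the f-string.
    return f"{func.__name__}() {{\n{body}\n}}"


if __name__ == "__main__":
    print(render_shell_function(mount_root))
```

Printing the result gives a `mount_root() { ... }` shell function containing the two returned lines, which is roughly what ends up in the generated profile.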

This is a bit strange but mostly helps for cases where you want more complex formatting, or more dynamic functions. This is probably the best example of the more complex usage: https://github.com/desultory/ugrd/blob/ ... up.py#L484

I've been considering adding a proper template system, possibly using jinja, or an option to just inject shell scripts into the profile, but that adds a bit of complexity as ugrd tracks internal function names, stuff like that would make it a bit harder for it to check things like function name and binary name collisions.

For the ssh options, how should the system know which path it should take? Maybe some type of timeout? I'm wondering how the admin will typically be expected to interact during the boot process, just key entry? I'm not sure it can be done as simply as just adding alternate boot args, because it could just get stuck waiting for keys unless there is a watchdog or similar. Maybe the rd_fail function could be overridden and key entry could have a timeout, rd_fail could be set to do an instant reboot instead of waiting for user input. I'm not sure of the best way to force using another option; it could be as simple as setting the next boot target in efi vars.

szatox · Post by **szatox** » Mon Mar 31, 2025 6:03 pm

I see, so there is no hello world. Damn, detailed documentation is great for doing advanced stuff, but at the same time it's a bit of a tarpit for me who doesn't yet know what to look for. Well, I guess I'll just throw it at a VM and see what happens.

I'm definitely not at the stage to make comments on the design or anything happening under the hood, just purely usage:
* Automatic detection is scary. I understand why it's convenient in most cases, I like the idea of things just working and I like doing things this way myself, but it tends to get in the way when used for something not originally intended. I want to build it on one machine and deploy to another. I know it's my fault I'm doing things in a weird way; I'm writing this point solely to make clear what I'm doing.
* Initramfs has no business setting efivars, and many systems using efi also keep using bootloaders anyway. Just leave setting boot flags to the user, it will be more predictable this way, and users will be able to set them permanently in config, or add at boot time as a one-time tweak as needed. I don't know what scenario would require any heuristics, timeouts, watchdogs or other guesswork on ssh.
* I am going to use ssh in initramfs to install the system on a machine which doesn't allow custom installations, and then for decrypting rootfs during boot. I have had situations when I needed to interrupt boot on much more typical systems to either fix something gone wrong with the init or just circumvent permissions. Either way, I know in advance whether I'm going to do something in initramfs or not, so I can just inform it via a boot option.
The only exception was boot failing for some reason. The system state is undefined at this point, so open the door for a meatbag intelligence. This typically means dropping to a local shell, but if initramfs has been built with ssh, remote shell might be a good alternative.

pingtoo · Post by **pingtoo** » Mon Mar 31, 2025 6:37 pm

May I suggest another point of view

Abandon the initrd thinking. "initrd" really is just timing. As in, when the kernel passes control to user space.

So booting into a system that runs entirely (including rootfs) in memory is ideal for the request to be "creative with my installation media". If your installation media could install supporting tools for the target system on demand, wouldn't that be nice? No predetermined fixed set, so it would not be limited to certain setups.

In my mind you can build a cpio image file using Alpine Linux and let the kernel run it when the initrd phase kicks in, or you can just let the system boot into diskless-mode Alpine Linux, then mount the real root and do switch_root if that is your preference; it works the same. Or, even fancier, use kexec to reboot with everything loaded. The possibilities are endless.

You would need to write an "init" script that examines the kernel command line (/proc/cmdline) so it can decide, based on keyword(s), whether it should start sshd or start finding rootfs, or go even fancier and do some automation on building a system (some sort of ghosting, if you will). If the "init" script adds some timed event handling, you can make it perform task(s) based on time. Tip: if you install "atd" (dynamically or statically) in your "init" script, timed events can be controlled relatively simply by using an "at now + 4 minutes do_this_command" line in your script.

zen_desu · Post by **zen_desu** » Mon Mar 31, 2025 7:07 pm

szatox wrote:I see, so there is no hello world. Damn, detailed documentation is great for doing advanced stuff, but at the same time it's a bit of a tarpit for me who doesn't yet know what to look for. Well, I guess I'll just throw it at a VM and see what happens.

I'm definitely not at the stage to make comments on the design or anything happening under the hood, just purely usage:
* Automatic detection is scary. I understand why it's convenient in most cases, I like the idea of things just working and I like doing things this way myself, but it tends to get in the way when used for something not originally intended. I want to build it on one machine and deploy to another. I know it's my fault I'm doing things in a weird way; I'm writing this point solely to make clear what I'm doing.
* Initramfs has no business setting efivars, and many systems using efi also keep using bootloaders anyway. Just leave setting boot flags to the user, it will be more predictable this way, and users will be able to set them permanently in config, or add at boot time as a one-time tweak as needed. I don't know what scenario would require any heuristics, timeouts, watchdogs or other guesswork on ssh.
* I am going to use ssh in initramfs to install the system on a machine which doesn't allow custom installations, and then for decrypting rootfs during boot. I have had situations when I needed to interrupt boot on much more typical systems to either fix something gone wrong with the init or just circumvent permissions. Either way, I know in advance whether I'm going to do something in initramfs or not, so I can just inform it via a boot option.
The only exception was boot failing for some reason. The system state is undefined at this point, so open the door for a meatbag intelligence. This typically means dropping to a local shell, but if initramfs has been built with ssh, remote shell might be a good alternative.

I may try to make a basic example module in the repo to help, maybe one that literally prints "hello world" during the initramfs process. https://github.com/desultory/ugrd/pull/246

The autodetection can pretty much entirely be disabled by setting "hostonly = false" in the config you're using. A key thing is that the validation system (often helpful) reads host info to validate against, so disabling hostonly also disables the majority of validation, but should still allow more basic checks to run. The autodetection in ugrd was very much designed to benefit the user without getting in the way.

I agree, the initramfs setting efivars idea was a bit of a hack, basically enabling the use of alternate cmdline args to reboot and do something else, so things don't get stuck in a boot loop. I'm trying to consider more unattended setups, which is where I think SSH would make sense. I'm not sure how to fit SSH into a more "general" boot setup as an option, but maybe it could be even simpler than this, simply using SSH if there is a timeout for local key entry.

https://github.com/desultory/ugrd/blob/ ... se.py#L185 this could maybe be changed to allow the user (or modules) to set a custom "recovery" target, so something other than rd_restart can be used.

pingtoo · Post by **pingtoo** » Mon Mar 31, 2025 8:27 pm

zen_desu,

May I suggest you add an "atd" module to ugrd. This way you can have options to start network/ssh after some critical point or when a condition is met.

And there is no need to handle clean up of "atd" if rootfs can be successfully determined, just simply pkill it before umount/move /sys, /proc.

I have been thinking about designing my initrd for some time now, and I want to use an FSM (Finite State Machine) model for my init script. I have identified some "events" as input for the FSM: I am thinking kernel command line parsing could be one, keyboard interaction another, and a timed event yet another. So maybe you could try designing ugrd (not necessarily as an FSM) around what can be a "trigger"/"event" in the init script flow, so it can work more dynamically.

zen_desu · Post by **zen_desu** » Mon Mar 31, 2025 8:35 pm

pingtoo wrote:zen_desu,

May I suggest you add an "atd" module to ugrd. This way you can have options to start network/ssh after some critical point or when a condition is met.

And there is no need to handle clean up of "atd" if rootfs can be successfully determined, just simply pkill it before umount/move /sys, /proc.

I have been thinking about designing my initrd for some time now, and I want to use an FSM (Finite State Machine) model for my init script. I have identified some "events" as input for the FSM: I am thinking kernel command line parsing could be one, keyboard interaction another, and a timed event yet another. So maybe you could try designing ugrd (not necessarily as an FSM) around what can be a "trigger"/"event" in the init script flow, so it can work more dynamically.

I'm not sure what you mean by "atd".

Maybe some sort of trigger/event system can be implemented in the shell script. Part of the design goal is that the flow of the actual shell script it creates should be pretty understandable, as in you can read it through once and have a very good idea of what should happen.

ugrd has some "internal" variables it keeps in /run/ugrd which help it have memory in case things restart, and help it not repeat things. These are cleared just before switch_root during a cleanup phase. I think this sort of thing could be implemented within that system.

An advantage to this design is that if a user gets a recovery shell, they can set the vars, exit, and have it act differently. The final consideration is keeping this as a "simple shell script", maybe it could have certain checkpoints where it could change the flow depending on the state of things, but I like that there is not really any "magic" tucked into the final image it makes, it's no more than a CPIO with some POSIX shell.

pingtoo · Post by **pingtoo** » Mon Mar 31, 2025 8:46 pm

zen_desu wrote:I'm not sure what you mean by "atd".

I am sorry I did not make it clear. It is a daemon process. In Gentoo the package is sys-process/at.
The package provides two commands: a user command "at" and a daemon command "atd". You start atd at the beginning of system startup, then use the "at" command to schedule execution at some point in time. More detail can be found in the Gentoo wiki and the man page.

pingtoo · Post by **pingtoo** » Mon Mar 31, 2025 8:52 pm

zen_desu wrote:...

Thank you for the reply. If you think we should discuss this further, we can start a new thread. This is outside of the OP's topic.

szatox · Post by **szatox** » Tue Apr 01, 2025 8:20 pm

Abandon the initrd thinking. "initrd" really is just timing. As in, when the kernel passes control to user space.

Trying to kill 2 birds with one stone here. Yes, using a full-blown liveCD that loads into RAM as an installation medium is a viable option. I've been using similar tricks in the past (topping it off with PXE and WoL, followed by autodiscovery via avahi to make things even more plug'n'play). I want to use initramfs, because I will need an intermediate stage during boot and I hope to keep using the same tool.

May I suggest you add an "atd" module to ugrd. This way you can have options to start network/ssh after some critical point or when a condition is met

atd is an ad-hoc counterpart to cron. It's good when you want a non-interactive, resource-hungry job to run when you're not around. SSH is the exact opposite of that.
BTW, adding atd implies this thing is going to run for a long time. It changes direction from being an initramfs to being a liveCD. Which is fine, but a project with a new goal could also get a new name, don't you think?

.

Back to the original topic: A tester walks into a pub and orders 0 beers...
I mean, I tried running ugrd on a VM with root on 9p. It didn't like having 0 block devices

fs/mounts.py fails at line 207, _get_mount_source_type
"No source type found in mount: {'options': {'ro'}, 'destination': PosixPath('/target_rootfs'), 'base_mount': False }

hostonly = false, uncommenting 'label = ...' and adding --no-autodetect-root didn't make any difference.
There is a bunch of other autodetect flags set to true (dm, lvm, raid...), do I have to disable every one of them manually for it to not query disks? I thought --no-autodetect-root would disable all more specific tests as well.
I don't mind retrying it on a different VM, but I thought you'd want to know this result. Or did I just set the bar too high with this one?

I'm not sure how to fit SSH into a more "general" boot setup as an option, but maybe it could be even simpler than this, simply using SSH if there is a timeout for local key entry.
(...)
this could maybe be changed to allow the user (or modules) to set a custom "recovery" target, so something other than rd_restart can be used.

Aha! That explains why you didn't like enabling ssh via boot flags. I did not even consider automatic reboot as a recovery option in case of a boot failure.

In my experience there is so little randomness at this stage I consider it to be effectively deterministic. It either boots every time or fails every time, so there is no need for timeouts or guesswork, only admin's preferences and mistakes (and occasional hardware failures). And since I'm the admin and it is my preference, I can tell in advance whether I want to do local key entry or network unlock.
And at this point I'm actually curious what was the story behind your choice.

zen_desu · Post by **zen_desu** » Tue Apr 01, 2025 8:37 pm

szatox wrote:
Abandon the initrd thinking. "initrd" really is just timing. As in, when the kernel passes control to user space.
Trying to kill 2 birds with one stone here. Yes, using a full-blown liveCD that loads into RAM as an installation medium is a viable option. I've been using similar tricks in the past (topping it off with PXE and WoL, followed by autodiscovery via avahi to make things even more plug'n'play). I want to use initramfs, because I will need an intermediate stage during boot and I hope to keep using the same tool.

May I suggest you add an "atd" module to ugrd. This way you can have options to start network/ssh after some critical point or when a condition is met
atd is an ad-hoc counterpart to cron. It's good when you want a non-interactive, resource-hungry job to run when you're not around. SSH is the exact opposite of that.
BTW, adding atd implies this thing is going to run for a long time. It changes direction from being an initramfs to being a liveCD. Which is fine, but a project with a new goal could also get a new name, don't you think?

.

Back to the original topic: A tester walks into a pub and orders 0 beers...
I mean, I tried running ugrd on a VM with root on 9p. It didn't like having 0 block devices
fs/mounts.py fails at line 207, _get_mount_source_type
"No source type found in mount: {'options': {'ro'}, 'destination': PosixPath('/target_rootfs'), 'base_mount': False }

hostonly = false, uncommenting 'label = ...' and adding --no-autodetect-root didn't make any difference.
There is a bunch of other autodetect flags set to true (dm, lvm, raid...), do I have to disable every one of them manually for it to not query disks? I thought --no-autodetect-root would disable all more specific tests as well.
I don't mind retrying it on a different VM, but I thought you'd want to know this result. Or did I just set the bar too high with this one?

I'm not sure how to fit SSH into a more "general" boot setup as an option, but maybe it could be even simpler than this, simply using SSH if there is a timeout for local key entry.
(...)
this could maybe be changed to allow the user (or modules) to set a custom "recovery" target, so something other than rd_restart can be used.
Aha! That explains why you didn't like enabling ssh via boot flags. I did not even consider automatic reboot as a recovery option in case of a boot failure.

In my experience there is so little randomness at this stage I consider it to be effectively deterministic. It either boots every time or fails every time, so there is no need for timeouts or guesswork, only admin's preferences and mistakes (and occasional hardware failures). And since I'm the admin and it is my preference, I can tell in advance whether I want to do local key entry or network unlock.
And at this point I'm actually curious what was the story behind your choice.

9p storage types have not been tested at all on ugrd, I can look into that, doesn't seem like it should be too much of a challenge. The main consideration is that it gets most storage info using the current mounts, and blkid, so if the 9p info is not there, I'll need to figure out a new info source.

When you disable hostonly mode, most detection is automatically disabled. If you're not using hostonly mode, you must manually define the mount info, it can be like:

[mounts.root]
uuid = "aaaaabbbbcccccdddd"

More info here: https://github.com/desultory/ugrd/blob/ ... rdfsmounts

One thing to pay special attention to is the toml section. label= must be set under a [mounts.<name>] block.

The mount named 'root' is the only real special mount, and it automatically sets the destination to "/target_rootfs" because generally, added mounts default to using their name if a destination is not set. Using /root doesn't really work as the root homedir is occasionally needed: https://github.com/desultory/ugrd/blob/ ... ml#L83-L86

I should probably add a special warning message for an unconfigured root mount when hostonly or other relevant detection methods are disabled.
Maybe it would make sense to drop the root definition requirement if hostonly is not used? I guess that kinda makes sense. ugrd, unless the cmdline module is masked, will try to use info passed in root= first, but uses the "validated" info as a fallback. A fallback in that form stops making sense when you're specifically building for another host, but makes sense if you can expect that host to always boot from particular storage.

On the note of livecd stuff, ugrd has a livecd module which is more or less really just a wrapper for the squashfs mounting stuff often done with livecds.

szatox · Post by **szatox** » Tue Apr 01, 2025 8:50 pm

9p storage types have not been tested at all on ugrd, I can look into that, doesn't seem like it should be too much of a challenge

Don't worry too much about it. The actual target will be a different machine anyway. I was just surprised that disabling validation against the local host didn't make it ignore this error.

Thanks for hints so far, I'll see what I can do and return with whatever result I'm about to get

Edit: Aaand I know what I missed.
I uncommented the label option, but not the mount root header. Fixing this part did help.
Getting on with making it actually do the things I want....

zen_desu · Post by **zen_desu** » Tue Apr 01, 2025 9:11 pm

szatox wrote:
9p storage types have not been tested at all on ugrd, I can look into that, doesn't seem like it should be too much of a challenge
Don't worry too much about it. The actual target will be a different machine anyway. I was just surprised that disabling validation against the local host didn't make it ignore this error.

Thanks for hints so far, I'll see what I can do and return with whatever result I'm about to get

Edit: Aaand I know what I missed.
I uncommented the label option, but not the mount root header. Fixing this part did help.
Getting on with making it actually do the things I want....

The error was a bit confusing, but it was failing because the root was only partially defined. ugrd checks all mounts have a valid source and destination, and the root mount comes half made, more or less. So because it did not autodetect, the later error was a bit cryptic. The question I have is whether or not it makes sense to ignore this "issue" when hostonly is disabled, or just to have a big error and say "danger danger you need to set root= in your bootloader!!!"

Good luck getting it moving forward. One thing of particular importance here is that the LVM module kinda blindly finds LVM devices; the mount portion then expects that worked, and attempts to do mounts from devices which should be available once LVM devices are initialized. It will just fail if for some reason that doesn't happen. This works very well if it's LVM under LUKS, as the LUKS device will likely be manually decrypted, so you can reasonably expect LVM devices to be there after; I'm not sure what issues may happen if plain LVM is used. Currently, ugrd will keep attempting mounts until it works, or the user provides input to make it "break" and restart. It may be worth having it re-attempt LVM scans when LVM is in use, and it can't find the mount. If it does fail, you should be able to press enter to make it restart and it _should_ work the second time around if the issue was based on slow device init.
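
A rough sketch of the "re-attempt LVM scans when the mount can't be found" idea, in Python for illustration only. This is not how ugrd does it today: the two callables stand in for whatever the generated shell would actually run, and the bounded attempt count is an assumption of this example (as described above, ugrd currently keeps retrying until the user interrupts).

```python
from typing import Callable


def mount_with_rescan(
    try_mount: Callable[[], bool],
    rescan_lvm: Callable[[], None],
    max_attempts: int = 5,
) -> int:
    """Retry try_mount, re-scanning LVM after each failed attempt.

    Returns the number of the attempt that succeeded, or raises
    RuntimeError once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        if try_mount():
            return attempt
        # The device may simply be slow to appear; scan again before retrying.
        rescan_lvm()
    raise RuntimeError(f"mount failed after {max_attempts} attempts")
```

A slow device that only shows up after a couple of rescans would then succeed on a later attempt instead of requiring someone to press enter.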

It would also be possible to have the LVM init portion actually check that a certain device is present after it does the init, but there is a bit of a concern how that info is provided, and if things ever change, that could make it fail to boot when there is no issue. IMO the worst bugs are when checks fail at the worst time and things don't move along even though things are fine.

pingtoo · Post by **pingtoo** » Tue Apr 01, 2025 10:09 pm

Sorry to interject but just want to bring some ideas and perhaps answer why "atd".

that the LVM module kinda blindly finds LVM devices; the mount portion then expects that worked, and attempts to do mounts from devices which should be available once LVM devices are initialized. It will just fail if for some reason that doesn't happen.

Maybe there could be some logic that works like the kernel's failure (panic) on a missing root device: somehow list everything found so far, and possibly give the user a chance to choose one from the list?

And some sort of time control would be nice for the above ideas in a headless environment. It could also be some form of email notification after a timeout, which is why having "atd" helps in some situations (trying to make the initrd smart).

szatox · Post by **szatox** » Wed Apr 02, 2025 12:58 am

The question I have is whether or not it makes sense to ignore this "issue" when hostonly is disabled, or just to have a big error and say "danger danger you need to set root= in your bootloader!!!"

We've been setting root= since the dawn of time, with initrd and without it. Nobody will get confused by keeping it that way.

It seems I can add arbitrary executables simply by specifying binaries = [ ] in the config, this will make adding extra tools trivial.
Things are starting to look pretty good actually. I think I'm just missing that ssh module, and a suitable kernel. I'm leaving those 2 parts for tomorrow though.

pingtoo, you'll need an MTA too.
On a more serious note: whoever happens to be holding "THE Phone"™ at that time will surely notify you, in no uncertain terms, that your unattended reboot lit the monitoring up like a Christmas tree

zen_desu · Post by **zen_desu** » Wed Apr 02, 2025 1:23 am

szatox wrote:
The question I have is whether or not it makes sense to ignore this "issue" when hostonly is disabled, or just to have a big error and say "danger danger you need to set root= in your bootloader!!!"
We've been setting root= since the dawn of time, with initrd and without it. Nobody will get confused by keeping it that way.

It seems I can add arbitrary executables simply by specifying binaries = [ ] in the config, this will make adding extra tools trivial.
Things are starting to look pretty good actually. I think I'm just missing that ssh module, and a suitable kernel. I'm leaving those 2 parts for tomorrow though.

pingtoo, you'll need an MTA too.
On a more serious note: whoever happens to be holding "THE Phone"™ at that time will surely notify you, in no uncertain terms, that your unattended reboot lit the monitoring up like a Christmas tree

Yes, it's not a big deal forcing the use of root=, but part of the goal of ugrd is for it to have no issue booting the system it was built for if validation is enabled, so while minor, that sort of thing would make it hard to guarantee a successful boot, as ugrd does not do anything but make the initramfs itself. Sometimes grub likes to pass strange root= parameters, which is much more of a problem for dracut because it doesn't behave this way. Basically, the name of a mapped device mapper device can really be anything, and grub likes to set root= to that, but it could be unlocked to another name and then if that path is used for the root target, things break even though the device was successfully decrypted; one of the many woes involved with using paths for a root option, but one that happens often because of how grub works.

That's what the `binaries` config option is for

it's used extensively in the internal modules, and it does a lot to ensure libraries are included and usable at runtime, but depending on the tool, more may need to be added. This is the case for things like plymouth, and it's very hard to know when this will be a problem unless you test and/or check software docs and do a bit to copy installed components. The sister to `binaries` is `dependencies` which copies a file from the root into the initramfs with the same path. `copies` is like dependencies but lets you adjust the target path.

The dropbear module should be workable, but I could help make a proper openssh module. That is something I'd like to be included as a core module/option. I've honestly just been unsure how it should be presented because this is not something I use. Your use case has made it clear that one good option could be using SSH as a sort of recovery option, in the event of a timeout, which helps a lot.

A side note: ugrd is designed to be very secure as a baseline, so I wonder about the best way to handle SSH host keys in an initramfs image. Copying the build host's keys into an initramfs stored on an unencrypted volume is a bad idea. Generating new keys at each boot is better, but then it's hard to know whether you're logging into a honeypot. Possibly ugrd could store keys used specifically for the initramfs, so they are not random, but that doesn't address someone being able to open the initramfs, extract those keys, and use them later.
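If a dedicated initramfs host key were used, the client side could at least pin its fingerprint. A minimal sketch of computing an OpenSSH-style SHA256 fingerprint from a public key line follows; the helper names and the dummy key blob are assumptions for illustration, and none of this is part of ugrd.

```python
import base64
import hashlib


def ssh_fingerprint(public_key_line: str) -> str:
    """Compute the OpenSSH-style SHA256 fingerprint of a public key line."""
    # A public key line looks like: "<type> <base64 blob> [comment]"
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest base64-encoded with '=' padding removed
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def matches_pinned(public_key_line: str, pinned: str) -> bool:
    """Check a presented key against a fingerprint recorded at build time."""
    return ssh_fingerprint(public_key_line) == pinned


# Dummy blob for demonstration only; this is not a real key
dummy_blob = base64.b64encode(b"not a real key blob").decode("ascii")
key_line = f"ssh-ed25519 {dummy_blob} initramfs-host-key"
pinned = ssh_fingerprint(key_line)
print(pinned, matches_pinned(key_line, pinned))
```

Pinning only tells you the key is the one you built in. It does nothing against someone who extracted that key from the unencrypted image.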

szatox · Post by **szatox** » Wed Apr 02, 2025 9:25 pm

I just got to it and I'm still building stuff...
Anyway, I cloned your dropbear repo into ugrd's directory (is there any path intended for local modules?) and after a few attempts renamed it to "net", created a host_key, and then got stuck on authorized_keys:

dropbear_authorized_keys: /etc/dropbear/authorized_keys

ERROR    | GeneratorHelpers._write() got an unexpected keyword argument 'append'
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/ugrd/main.py", line 169, in main
    generator.build()
  File "/usr/lib/python3.12/site-packages/ugrd/initramfs_generator.py", line 95, in build
    self.run_build()
  File "/usr/lib/python3.12/site-packages/ugrd/initramfs_generator.py", line 244, in run_build
    self.run_hook(task, force_exclude=True)
  File "/usr/lib/python3.12/site-packages/ugrd/initramfs_generator.py", line 160, in run_hook
    if function_output := self.run_func(function, *args, **kwargs):
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ugrd/initramfs_generator.py", line 111, in run_func
    if function_output := function(self):
                          ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ugrd/net/dropbear.py", line 53, in dropbear_finalize
    self._write("etc/passwd", "root:x:0:0:root:/root:/bin/sh\n", append=True)
TypeError: GeneratorHelpers._write() got an unexpected keyword argument 'append'

What should I do with it?

zen_desu wrote: Your use case has made it clear that one good option could be using SSH as a sort of recovery option, in the event of a timeout, which helps a lot.

zen_desu wrote: A side note is that ugrd is designed to be very secure as a baseline, I wonder the best way to wrap the issue of SSH host keys in an initramfs image. I mean copying the build host's keys into an initramfs which is stored on an unencrypted volume is a bad idea

Well, as long as I can set that timeout to 0 for immediate access...

Yeah, that security hole is a bit of an issue, but I don't see any way to fix it. Basically, if you have your servers on-premise, you surely won't be honeypotting yourself, and if it's off-premise there's nothing you can do except trust the hosting provider. And a hosting provider can also wiretap your virtual KVM, so unlocking disks with a "local" input isn't really any more secure.

zen_desu · Post by **zen_desu** » Wed Apr 02, 2025 9:33 pm

Ah, the module requires v2+, so it will only work with the latest 9999 (git) build. I've been meaning to do a proper v2 release, but I've been really busy, and a few features I _wanted_ to be part of that release may not make it and may land in 2.1 or 2.2 instead. v2 had a somewhat substantial overhaul which should make modules a bit easier to use. Before, there was no ordering system, so when hooks ran within each init level was determined purely by import order, which was a bit tricky to work around at times. A few older config options were deprecated as well, but other than that v2 is mostly a visual overhaul with more consistent coloring, log formatting, and better log messages with fewer tracebacks. There are enough small changes that I want to be very sure things don't break over a minor but malformed change.
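To illustrate the difference between import order and explicit ordering, here is a small sketch using the standard library's `graphlib`. The hook names and the "runs after" mapping are hypothetical, and ugrd's real ordering system may work quite differently.

```python
from graphlib import TopologicalSorter


def order_hooks(run_after: dict[str, set[str]]) -> list[str]:
    """Order hooks so that each one runs after the hooks it depends on.

    run_after maps a hook name to the set of hooks that must run before it.
    """
    return list(TopologicalSorter(run_after).static_order())


# Hypothetical hooks, deliberately declared in the "wrong" order
hooks = {
    "mount_root": {"unlock_luks"},
    "unlock_luks": {"start_network"},
    "start_network": set(),
}

print(order_hooks(hooks))
```

With declared dependencies, the result no longer depends on which module happened to be imported first, and a circular dependency raises `graphlib.CycleError` instead of silently running in the wrong order.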

One thing to note is that while I'm not rushing to tell anyone to use the 9999 build, it is the one I use to boot all of my personal systems, and _should_ have the fewest bugs/most refinement. I typically only backport fixes for more serious/breaking issues, which don't happen as often because of testing, but some use cases can't be easily/properly tested, especially particular hardware configurations.

Concerning security, that is kinda how I see it. I feel about equally comfortable using the server's management interface. On my server, I mostly enter keys over IPMI or RS232, neither being particularly secure, but if someone has local access, SSH in this form won't really be secure either.

szatox · Post by **szatox** » Thu Apr 03, 2025 1:16 pm

Ah, I see. I unmasked the live ebuild and its dependencies, and... kmod started failing even though I already had it disabled with no_kmod.
Ended up adding another one of those
@unset("no_kmod", "no_kmod is enabled, skipping.", log_level=30)
in front of
def _add_kmod_firmware(self, kmod: str) -> None: (line 263)

I don't really speak python, but it at least let the build go through to the end.
Alright, I think it's time to test the archive created in step 1.

zen_desu · Post by **zen_desu** » Thu Apr 03, 2025 4:39 pm

Thanks for finding that, no_kmod is an option which needs more test coverage. Based on the description, my guess is that it's looking for modinfo for kmods defined by builtin modules, so like looking for "btrfs" even though that may be builtin, and if there are _no_ kmods, that lookup will instantly fail. So that firmware finding function should probably check for that before doing a lookup.

You may not think you know python, but that decorator usage looks perfect to me. Those @unset bits more or less restrict functions based on set config. I'm looking into it more, but I think attempting to include firmware for "non-modules" is intentional. This is because if you build in a kernel module and it needs firmware, that firmware must be either in the kernel _or_ in an initramfs image. Maybe there could be a "no-builtin-firmware" type of toggle, as the current behavior can help in some cases. Currently there is a `kmod_pull_firmware` option (https://github.com/desultory/ugrd/blob/ ... d.toml#L16) which applies to all types, but a more specific option could be clearer, as could turning that error into a warning if `no_kmod` is set.
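For anyone curious how such a decorator can work, here is a minimal re-creation. It mirrors the call shape quoted above, but it is only a sketch under my own assumptions (a `self.config` dict and a plain `logging` logger), not ugrd's implementation.

```python
import logging
from functools import wraps

logger = logging.getLogger("sketch")


def unset(key: str, message: str = "", log_level: int = logging.INFO):
    """Skip the decorated method when config[key] is set to a truthy value."""
    def decorator(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            if self.config.get(key):
                logger.log(log_level, message or f"{key} is set, skipping.")
                return None
            return func(self, *args, **kwargs)
        return wrapper
    return decorator


class Generator:
    """Hypothetical stand-in for the object holding the build config."""

    def __init__(self, config: dict):
        self.config = config

    @unset("no_kmod", "no_kmod is enabled, skipping.", log_level=30)
    def add_kmod_firmware(self, kmod: str) -> str:
        return f"looked up firmware for {kmod}"


print(Generator({"no_kmod": True}).add_kmod_firmware("dm_crypt"))
print(Generator({}).add_kmod_firmware("dm_crypt"))
```

The `log_level=30` in the quoted line corresponds to `logging.WARNING`.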

This should fix it: https://github.com/desultory/ugrd/pull/248
It will log a fair number of warnings for kmods required by modules.

szatox · Post by **szatox** » Thu Apr 03, 2025 10:43 pm

I certainly don't know python, but the god of programming, Copy Pasta, answers like half of all prayers and lets you try again if you're not satisfied with the result.

The module that triggered that error for me was dm_crypt requested by cryptsetup. I don't think this one even uses any firmware, but yes, there was an error message regarding firmware too. I'd actually expect firmware to be together with modules that need it, so since I'm not packing any modules in initramfs, there should be no need for firmware either.
Anyway, pulling your version of the fix didn't break things again for me, so I guess it's good.

I can't bring my system on-line though. I need a static IP configuration, so I enabled ugrd.net.net and added ip_address = ip/mask and ip_gateway, which didn't really do anything. Actually, can I override it with kernel params?
At this moment I'm not sure whether it's a problem with ugrd or my brand-new completely untested kernel though, so I went ahead looking for shell access. Enabled console, which made the resulting initramfs much more talkative, but even though there is an automatic login prompt, I can't really use any commands. It only responds to Enter (sometimes).
I'm going to give debug a shot. Maybe this way I'll be able to have a good look at the thing while it's running.

Actually, I also have a bit of a problem with the config file structure. I'm sure there must be some logic behind it, but I don't really know whether adding new variables at the top of the file is good enough, or do I have to prepend them with [some.header] or not, or what is the scope of that header, and if the order of active lines is important and so on.
Basically, it's somewhat similar to ini, but not quite, and it feels inconsistent and unintuitive.
One thing I am fairly sure about is that all my changes go into /etc/ugrd/config.toml, and configs provided with the modules are not meant to be touched.
To be fair, with autodetect in place, most users probably won't even see it, so it's not that big of a deal.

BTW, looking at different things I noticed that in cmdline you're parsing boot_option=value pairs and turn them into variables. It's redundant; kernel already provides them as env to init.
Funny thing, a long time ago I saw a similar pattern in genkernel so I _obviously_ replicated it in my own initramfs, and then tried reusing it for something else, and then - running in a different context - it finally turned out that code didn't even work.

zen_desu · Post by **zen_desu** » Thu Apr 03, 2025 11:01 pm

szatox wrote:I certainly don't know python, but the god of programming, Copy Pasta, answers like half of all prayers and lets you try again if you're not satisfied with the result
The module that triggered that error for me was dm_crypt requested by cryptsetup. I don't think this one even uses any firmware, but yes, there was an error message regarding firmware too. I'd actually expect firmware to be together with modules that need it, so since I'm not packing any modules in initramfs, there should be no need for firmware either.
Anyway, pulling your version of the fix didn't break things again for me, so I guess it's good.

Great, that was a dumb bug on my part. As a general flow, attempts are made to see if modules require firmware in every case, who knows when things could change or what is required on unknown hardware. For the internal modules, I don't think anything _needs_ firmware so this unfortunately is just a way it could fail, if the checks fail (normally this would be avoided by having the now-existing fixes).

szatox wrote: I can't bring my system on-line though. I need a static IP configuration, so I enabled ugrd.net.net and added ip_address = ip/mask and ip_gateway, which didn't really do anything. Actually, can I override it with kernel params?
At this moment I'm not sure whether it's a problem with ugrd or my brand-new completely untested kernel though, so I went ahead looking for shell access. Enabled console, which made the resulting initramfs much more talkative, but even though there is an automatic login prompt, I can't really use any commands. It only responds to Enter (sometimes).
I'm going to give debug a shot. Maybe this way I'll be able to have a good look at the thing while it's running.

If you enable the debug module, this may help some, as it'll give you a shell you can use.

This portion (https://github.com/desultory/ugrd/blob/ ... py#L46-L63) is responsible for net setup; it should raise an exception if required config is missing. I think I only tested that module in a VM. Could it be that required interfaces are missing because of missing kmods? It does "net device" detection using MAC addresses, so that hopefully the right device gets used, since at this point it will still have the name the kernel gave it.

szatox wrote: Actually, I also have a bit of a problem with the config file structure. I'm sure there must be some logic behind it, but I don't really know whether adding new variables at the top of the file is good enough, or do I have to prepend them with [some.header] or not, or what is the scope of that header, and if the order of active lines is important and so on.
Basically, it's somewhat similar to ini, but not quite, and it feels inconsistent and unintuitive.
One thing I am fairly sure about is that all my changes go into /etc/ugrd/config.toml, and configs provided with the modules are not meant to be touched.
To be fair, with autodetect in place, most users probably won't even see it, so it's not that big of a deal.

BTW, looking at different things I noticed that in cmdline you're parsing boot_option=value pairs and turn them into variables. It's redundant; kernel already provides them as env to init.
Funny thing, a long time ago I saw a similar pattern in genkernel so I _obviously_ replicated it in my own initramfs, and then tried reusing it for something else, and then - running in a different context - it finally turned out that code didn't even work.

This is TOML "fun". More or less, the top portion of the file is for global variables. Once you define a section, anything defined under it is part of that section, usually acting like dictionary keys. This continues until the next defined section. It's like INI but stricter in most ways. The main gotcha is defining things under the wrong section.

Yes, for most uses, the user only needs to edit the "main" config to set things like key file/header file locations, but autodetection should be sufficient for most cases. The secret is that ugrd only really processes config once, starting with the "base" config, then processing the user config, starting with included modules. This means that everything is processed exactly the same, and you can see user config as extension of all of the module config. The goal is for all definition/config to be cohesive throughout the project, so making a module isn't really much different than making config. The main thing differentiating modules from user config is that modules often have associated python bits and can define new variables (defined in `custom_parameters`). The main point of defining them this way is to ensure types are sane, and so users can be warned if unknown config options are being set.
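As a simplified picture of that single processing pass, consider the sketch below. It is not ugrd's code: the layer contents are invented, and using Python types as the values of `custom_parameters` is an assumption made for the example.

```python
def process_layers(layers: list[dict]) -> tuple[dict, list[str]]:
    """Apply config layers in order; return (final config, unknown keys)."""
    known_types: dict[str, type] = {"binaries": list}
    state: dict = {}
    unknown: list[str] = []
    for layer in layers:
        # A layer may declare new variables and their types first
        known_types.update(layer.get("custom_parameters", {}))
        for key, value in layer.items():
            if key == "custom_parameters":
                continue
            expected = known_types.get(key)
            if expected is None:
                unknown.append(key)  # a real tool would warn the user here
                continue
            if not isinstance(value, expected):
                raise TypeError(f"{key} must be {expected.__name__}")
            if expected is list:
                state.setdefault(key, []).extend(value)  # lists accumulate
            else:
                state[key] = value  # scalars are overridden by later layers
    return state, unknown


base = {"binaries": ["sh"]}
net_module = {"custom_parameters": {"ip_address": str}, "binaries": ["ip"]}
user = {"ip_address": "192.0.2.10/24", "binaries": ["parted"], "ip_adress": "typo"}

print(process_layers([base, net_module, user]))
```

Because module config and user config go through the same function, a misspelled option is caught the same way whether it comes from a module or from the user's file.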

Thank you for pointing out that I don't need to parse stuff myself, I'll look into that. The current parser is also not perfect; it just works "well enough". Honestly, ugrd was originally designed with the mindset that "it's making an image only for the system which made it, so cmdline parameters are useless", but obviously that doesn't apply to everything, and having options is nice. Until more recent versions, the user had to explicitly enable cmdline parsing. This is also a bit of a security consideration: ugrd gives no recovery shell by default, so it's a decent way to prevent an unauthorized root shell. I've kinda backed off on that a bit, because if a user really cares they can restrict cmdline changes in other ways. The recovery arg has to be added explicitly, so I think it's fine: it won't give a shell by default, but enabling it doesn't require rebuilding.

A side note is a fair bit of the design was taken from existing systems, just reshaped, and often simplified/modernized. I recall seeing most other systems did something to parse this so kinda assumed it was necessary. Maybe this is necessary for older kernels? I'm not trying to break support for very old versions, but don't see a need to work hard to support < linux 6.x

It's great when feedback like this is provided so things can improve, so thanks for that
