I made a basic dropbear module for ugrd, just to test. I can help you make a module specifically for this project if you're interested.
szatox wrote: I'm basically rebuilding a voyager, which means I need to be a bit creative with my installation media. Unfortunately, the way I used to manually build initramfs currently results in its contents exceeding 1GB, which is... way too much.
So, does anyone have a readily available initramfs (or a recipe for one) which would let me connect via SSH, repartition disk, encrypt it, create an LVM and format root?
Genkernel seems to be falling just a bit short; its included tools are _too_ minimal to my liking. They look kinda familiar, but I don't know how to use them.
Storing a part of initramfs in a squashfs does sound like a cool feature, but I don't think it is something that anyone (besides me right now) would care about. I'm just trying to abuse the hell out of the tools that already exist to avoid doing the work myself.
I could possibly add a dev option to make it so the generated initramfs doesn't attempt to switch_root, possibly making it run a specified target within the initramfs. Making it use a squashfs is also another option.
szatox wrote: Honestly both options look like they should meet my requirements. Alpine is known for its focus on being small, and using a squashfs as root could really save me a good chunk of precious RAM.
Still, since genkernel is deprecated now, I might as well start thinking about replacing it on my systems.
Zen_desu, I see ugrd has a pretty extensive reference documentation (good), but is there any quickstart guide? You know, some equivalent of a "Hello world"?
About ssh itself, IMO there should be 2 or 3 paths selected by a boot flag: activate server and wait for the admin to kill it before mounting root (interactive boot), or don't activate sshd and just proceed with boot (regular boot). The optional 3rd path would be: activate sshd in case of boot failure instead of dropping to local shell (remote rescue mode).
This looks like a handy thing to have on pretty much any server, avoiding going through annoyingly slow and clunky webconsoles.
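The two or three boot paths described above could be selected from a single kernel command-line flag. The following is only a sketch of that idea: the flag name `initrd_ssh`, the `SshMode` enum, and both helper functions are hypothetical and are not part of ugrd or dropbear.

```python
from enum import Enum


class SshMode(Enum):
    OFF = "off"        # regular boot: never start sshd
    WAIT = "wait"      # interactive boot: start sshd and wait for the admin
    RESCUE = "rescue"  # remote rescue: start sshd only if boot fails


def parse_ssh_mode(cmdline: str, flag: str = "initrd_ssh") -> SshMode:
    """Return the mode requested on the kernel command line (hypothetical flag)."""
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == flag:
            try:
                return SshMode(value)
            except ValueError:
                return SshMode.OFF  # unknown value: fall back to a regular boot
    return SshMode.OFF


def should_start_sshd(mode: SshMode, boot_failed: bool) -> bool:
    """Decide whether sshd should be running at this point of the boot."""
    if mode is SshMode.WAIT:
        return True
    if mode is SshMode.RESCUE:
        return boot_failed
    return False
```

With this shape, the admin's preference is stated once in the bootloader config (or typed at boot time) and no timeout or heuristic is involved.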
I may try to make a basic example module in the repo to help, maybe one that literally prints "hello world" during the initramfs process. https://github.com/desultory/ugrd/pull/246
szatox wrote: I see, so there is no hello world. Damn, detailed documentation is great for doing advanced stuff, but at the same time it's a bit of a tarpit for me who doesn't yet know what to look for. Well, I guess I'll just throw it at a VM and see what happens.
I'm definitely not at the stage to make comments on the design or anything happening under the hood, just purely usage:
* Automatic detection is scary. I understand why it's convenient in most cases, I like the idea of things just working and I like doing things this way myself, but it tends to get in the way when used for something not originally intended. I want to build it on one machine and deploy to another. I know it's my fault I'm doing things in a weird way; I'm writing this point solely to make clear what I'm doing.
* Initramfs has no business setting efivars, and many systems using efi also keep using bootloaders anyway. Just leave setting boot flags to the user, it will be more predictable this way, and users will be able to set them permanently in config, or add at boot time as a one-time tweak as needed. I don't know what scenario would require any heuristics, timeouts, watchdogs or other guesswork on ssh.
* I am going to use ssh in initramfs to install the system on a machine which doesn't allow custom installations, and then for decrypting rootfs during boot. I have had situations when I needed to interrupt boot on much more typical systems to either fix something gone wrong with the init or just circumvent permissions. Either way, I know in advance whether I'm going to do something in initramfs or not, so I can just inform it via a boot option.
The only exception was boot failing for some reason. The system state is undefined at this point, so open the door for a meatbag intelligence. This typically means dropping to a local shell, but if initramfs has been built with ssh, remote shell might be a good alternative.
I'm not sure what you mean by "atd".
pingtoo wrote: zen_desu,
May I suggest you add an "atd" module to ugrd? This way you can have the option to start network/ssh after some critical point or once a condition is met.
And there is no need to handle cleanup of "atd" if rootfs can be successfully determined; simply pkill it before the umount/move of /sys and /proc.
I have been thinking about designing my initrd for some time now, and I want to use an FSM (Finite State Machine) model for my init script. I have identified some "events" as input for the FSM: I am thinking kernel command line parsing could be one, keyboard interaction is another, and a time event is yet another. So maybe you can try designing ugrd (not necessarily as an FSM) around what can be a "trigger"/"event" in the init script flow, so it can work more dynamically.
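As a rough illustration of the event-driven idea above, an init flow can be written as a transition table keyed by (state, event). Every state and event name below is made up for the example; none of them come from ugrd.

```python
# Hypothetical states and events for an init flow; not taken from ugrd.
TRANSITIONS = {
    ("start", "cmdline_parsed"): "find_root",
    ("find_root", "root_found"): "switch_root",
    ("find_root", "timeout"): "rescue",
    ("find_root", "key_pressed"): "rescue",
    ("rescue", "root_found"): "switch_root",
}


def step(state: str, event: str) -> str:
    """Return the next state; events with no transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events: list[str], state: str = "start") -> str:
    """Feed a sequence of events through the machine and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

Command-line parsing, a key press, and a timer expiry are then just three sources that emit events into the same table.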
I am sorry I did not make it clear. It is a daemon process. In Gentoo the package is sys-process/at
zen_desu wrote: I'm not sure what you mean by "atd".
Trying to kill 2 birds with one stone here. Yes, using a full-blown liveCD that loads into RAM as an installation medium is a viable option. I've been using similar tricks in the past (topping it off with PXE and WoL, followed by autodiscovery via avahi to make things even more plug'n'play). I want to use initramfs, because I will need an intermediate stage during boot and I hope to keep using the same tool.
Abandon the initrd thinking. "initrd" really is just timing. As in when kernel pass control to user space.
atd is an ad-hoc counterpart to cron. It's good when you want a non-interactive, resource-hungry job to run when you're not around. SSH is the exact opposite of that.
May I suggest you add an "atd" module in to ugrd. This way you can have options to start network/ssh after some critical point or a condition met
Aha! That explains why you didn't like enabling ssh via boot flags. I did not even consider automatic reboot as a recovery option in case of a boot failure.
I'm not sure how to fit SSH into a more "general" boot setup as an option, but maybe it could be even simpler than this, simply using SSH if there is a timeout for local key entry.
(...)
this could maybe be changed to allow the user (or modules) to set a custom "recovery" target, so something other than rd_restart can be used.
9p storage types have not been tested at all on ugrd, I can look into that, doesn't seem like it should be too much of a challenge. The main consideration is that it gets most storage info using the current mounts, and blkid, so if the 9p info is not there, I'll need to figure out a new info source.
szatox wrote: Trying to kill 2 birds with one stone here. Yes, using a full-blown liveCD that loads into RAM as an installation medium is a viable option. I've been using similar tricks in the past (topping it off with PXE and WoL, followed by autodiscovery via avahi to make things even more plug'n'play). I want to use initramfs, because I will need an intermediate stage during boot and I hope to keep using the same tool.
Abandon the initrd thinking. "initrd" really is just timing. As in when kernel pass control to user space.
atd is an ad-hoc counterpart to cron. It's good when you want a non-interactive, resource-hungry job to run when you're not around. SSH is the exact opposite of that.
May I suggest you add an "atd" module in to ugrd. This way you can have options to start network/ssh after some critical point or a condition met
BTW, adding atd implies this thing is going to run for a long time. It changes direction from being an initramfs to being a liveCD. Which is fine, but a project with a new goal could also get a new name, don't you think?
.
Back to the original topic: A tester walks into a pub and orders 0 beers...
I mean, I tried running ugrd on a VM with root on 9p. It didn't like having 0 block devices
fs/mounts.py fails at line 207, _get_mount_source_type
"No source type found in mount: {'options': {'ro'}, 'destination': PosixPath('/target_rootfs'), 'base_mount': False }
hostonly = false, uncommenting 'label = ...' and adding --no-autodetect-root didn't make any difference.
There is a bunch of other autodetect flags set to true (dm, lvm, raid...), do I have to disable every one of them manually for it to not query disks? I thought --no-autodetect-root would disable all more specific tests as well.
I don't mind retrying it on a different VM, but I thought you'd want to know this result. Or did I just set the bar too high with this one?
Aha! That explains why you didn't like enabling ssh via boot flags. I did not even consider automatic reboot as a recovery option in case of a boot failure.
I'm not sure how to fit SSH into a more "general" boot setup as an option, but maybe it could be even simpler than this, simply using SSH if there is a timeout for local key entry.
(...)
this could maybe be changed to allow the user (or modules) to set a custom "recovery" target, so something other than rd_restart can be used.
In my experience there is so little randomness at this stage I consider it to be effectively deterministic. It either boots every time or fails every time, so there is no need for timeouts or guesswork, only admin's preferences and mistakes (and occasional hardware failures). And since I'm the admin and it is my preference, I can tell in advance whether I want to do local key entry or network unlock.
And at this point I'm actually curious what was the story behind your choice.
Code: Select all
[mounts.root]
uuid = "aaaaabbbbcccccdddd"
Don't worry too much about it. The actual target will be a different machine anyway. I was just surprised that disabling validation against the local host didn't make it ignore this error.
9p storage types have not been tested at all on ugrd, I can look into that, doesn't seem like it should be too much of a challenge
The error was a bit confusing, but it was failing because the root was only partially defined. ugrd checks that all mounts have a valid source and destination, and the root mount comes half made, more or less. So because it did not autodetect, the later error was a bit cryptic. The question I have is whether or not it makes sense to ignore this "issue" when hostonly is disabled, or just to have a big error and say "danger danger you need to set root= in your bootloader!!!"
szatox wrote: Don't worry too much about it. The actual target will be a different machine anyway. I was just surprised that disabling validation against the local host didn't make it ignore this error.
9p storage types have not been tested at all on ugrd, I can look into that, doesn't seem like it should be too much of a challenge
Thanks for hints so far, I'll see what I can do and return with whatever result I'm about to get
Edit: Aaand I know what I missed.
I uncommented the label option, but not the mount root header. Fixing this part did help.
Getting on with making it actually do the things I want....
Maybe there could be some logic that works like the kernel's failure (panic) on a missing root device: somehow list everything found so far, and possibly give the user a chance to choose one from the list?
that the LVM module kinda blindly finds LVM devices, the mount portion then expects that worked, and attempts to do mounts from devices which should be available once LVM devices are initialized. It will just fail if for some reason that doesn't happen.
We've been setting root= since the dawn of time, with initrd and without it. Nobody will get confused by keeping it that way.
The question I have is whether or not it makes sense to ignore this "issue" when hostonly is disabled, or just to have a big error and say "danger danger you need to set root= in your bootloader!!!"
Yes, it's not a big deal forcing the use of root=, but part of the goal of ugrd is for it to have no issue booting the system it was built for if validation is enabled. So while minor, that sort of thing would make it hard to guarantee a successful boot, as ugrd does not do anything but make the initramfs itself. Sometimes grub likes to pass strange root= parameters, which is much more of a problem for dracut because it doesn't behave this way. Basically, the name of a mapped device mapper device can really be anything, and grub likes to set root= to that, but it could be unlocked to another name, and then if that path is used for the root target, things break even though the device was successfully decrypted. It's one of the many woes involved with using paths for a root option, but one that happens often because of how grub works.
szatox wrote: We've been setting root= since the dawn of time, with initrd and without it. Nobody will get confused by keeping it that way.
The question I have is whether or not it makes sense to ignore this "issue" when hostonly is disabled, or just to have a big error and say "danger danger you need to set root= in your bootloader!!!"
It seems I can add arbitrary executables simply by specifying binaries = [ ] in the config, this will make adding extra tools trivial.
Things are starting to look pretty good actually. I think I'm just missing that ssh module, and a suitable kernel. I'm leaving those 2 parts for tomorrow though.
pingtoo, you'll need an MTA too.
On a more serious note: whoever happens to be holding "THE Phone"™ at that time will surely notify you, in no uncertain terms, that your unattended reboot lit the monitoring up like a Christmas tree
Code: Select all
dropbear_authorized_keys: /etc/dropbear/authorized_keys
ERROR | GeneratorHelpers._write() got an unexpected keyword argument 'append'
Traceback (most recent call last):
File "/usr/lib/python3.12/site-packages/ugrd/main.py", line 169, in main
generator.build()
File "/usr/lib/python3.12/site-packages/ugrd/initramfs_generator.py", line 95, in build
self.run_build()
File "/usr/lib/python3.12/site-packages/ugrd/initramfs_generator.py", line 244, in run_build
self.run_hook(task, force_exclude=True)
File "/usr/lib/python3.12/site-packages/ugrd/initramfs_generator.py", line 160, in run_hook
if function_output := self.run_func(function, *args, **kwargs):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/site-packages/ugrd/initramfs_generator.py", line 111, in run_func
if function_output := function(self):
^^^^^^^^^^^^^^
File "/usr/lib/python3.12/site-packages/ugrd/net/dropbear.py", line 53, in dropbear_finalize
self._write("etc/passwd", "root:x:0:0:root:/root:/bin/sh\n", append=True)
TypeError: GeneratorHelpers._write() got an unexpected keyword argument 'append'
Well, as long as I can set that timeout to 0 for immediate access...
Your use case has made it clear that one good option could be using SSH as a sort of recovery option, in the event of a timeout, which helps a lot.
A side note is that ugrd is designed to be very secure as a baseline, I wonder the best way to wrap the issue of SSH host keys in an initramfs image. I mean copying the build host's keys into an initramfs which is stored on an unencrypted volume is a bad idea
Ah, the module requires v2+, so it will only work with the latest 9999 (git) build. I've been meaning to do a proper v2 release, but have been real busy, and a few features I _wanted_ to be part of that release may not make it, and may land in 2.1 or 2.2 or something. v2 had a somewhat substantial overhaul which should make modules a bit easier to use. Before, there was no ordering system, so when hooks ran in each init level was determined purely by import order, which was a bit tricky to work around at times. A few older config options were deprecated as well, but other than that v2 is mostly a visual overhaul with more consistent coloring, log formatting, and better log messages with fewer tracebacks. There are enough small changes that I want to be very sure things don't break over a minor but malformed change.
szatox wrote: I just got to it and I'm still building stuff...
Anyway, I cloned your dropbear repo into ugrd's directory (is there any path intended for local modules?) and after a few attempts renamed it to "net", created a host_key, and then got stuck on authorized_keys:
What should I do with it?
Code: Select all
dropbear_authorized_keys: /etc/dropbear/authorized_keys
ERROR | GeneratorHelpers._write() got an unexpected keyword argument 'append'
Traceback (most recent call last):
File "/usr/lib/python3.12/site-packages/ugrd/main.py", line 169, in main
generator.build()
File "/usr/lib/python3.12/site-packages/ugrd/initramfs_generator.py", line 95, in build
self.run_build()
File "/usr/lib/python3.12/site-packages/ugrd/initramfs_generator.py", line 244, in run_build
self.run_hook(task, force_exclude=True)
File "/usr/lib/python3.12/site-packages/ugrd/initramfs_generator.py", line 160, in run_hook
if function_output := self.run_func(function, *args, **kwargs):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/site-packages/ugrd/initramfs_generator.py", line 111, in run_func
if function_output := function(self):
^^^^^^^^^^^^^^
File "/usr/lib/python3.12/site-packages/ugrd/net/dropbear.py", line 53, in dropbear_finalize
self._write("etc/passwd", "root:x:0:0:root:/root:/bin/sh\n", append=True)
TypeError: GeneratorHelpers._write() got an unexpected keyword argument 'append'
Well, as long as I can set that timeout to 0 for immediate access...
Your use case has made it clear that one good option could be using SSH as a sort of recovery option, in the event of a timeout, which helps a lot.
A side note is that ugrd is designed to be very secure as a baseline, I wonder the best way to wrap the issue of SSH host keys in an initramfs image. I mean copying the build host's keys into an initramfs which is stored on an unencrypted volume is a bad idea
Yeah, that security hole is a bit of an issue, but I don't see any way to fix it. Basically, if you have your servers on-premise, you surely won't be honeypotting yourself, and if it's off-premise there's nothing you can do except trust the hosting provider. And a hosting provider can also wiretap your virtual KVM, so unlocking disks with a "local" input isn't really any more secure.
Thanks for finding that, no_kmod is an option which needs more test coverage. Based on the description, my guess is that it's looking for modinfo for kmods defined by builtin modules, so like looking for "btrfs" even though that may be builtin, and if there are _no_ kmods, that lookup will instantly fail. So that firmware finding function should probably check for that before doing a lookup.
szatox wrote: Ah, I see. I unmasked the live ebuild and its dependencies, and... kmod started failing even though I already had it disabled with no_kmod.
Ended up adding another one of those
@unset("no_kmod", "no_kmod is enabled, skipping.", log_level=30)
in front of
def _add_kmod_firmware(self, kmod: str) -> None: (line 263)
I don't really speak python, but it at least let the build go through to the end.
Alright, I think it's time to test the archive created in step 1.
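The guard added in the post above can be pictured as a decorator that returns early when a config key is set. This is a minimal sketch under stated assumptions: `skip_if_set`, `Builder`, and `add_kmod_firmware` are hypothetical names for illustration, and the real `unset` decorator used by ugrd may be implemented differently.

```python
import functools
import logging

logger = logging.getLogger("sketch")


def skip_if_set(key: str, message: str, log_level: int = logging.INFO):
    """Skip the wrapped method when self.config[key] is truthy (illustrative only)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            if self.config.get(key):
                logger.log(log_level, message)
                return None  # the wrapped function never runs
            return func(self, *args, **kwargs)
        return wrapper
    return decorator


class Builder:
    """Stand-in for a generator object that carries a config dict."""

    def __init__(self, config: dict):
        self.config = config
        self.firmware_lookups: list[str] = []

    @skip_if_set("no_kmod", "no_kmod is enabled, skipping.", log_level=30)
    def add_kmod_firmware(self, kmod: str) -> None:
        # A real implementation would query modinfo here; this only records the call.
        self.firmware_lookups.append(kmod)
```

With `no_kmod` set, the firmware lookup is skipped entirely, which is why the build could run to the end.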
Great, that was a dumb bug on my part. As a general flow, attempts are made to see if modules require firmware in every case, who knows when things could change or what is required on unknown hardware. For the internal modules, I don't think anything _needs_ firmware so this unfortunately is just a way it could fail, if the checks fail (normally this would be avoided by having the now-existing fixes).
szatox wrote: I certainly don't know python, but the god of programming, Copy Pasta, answers like half of all prayers and lets you try again if you're not satisfied with the result
The module that triggered that error for me was dm_crypt requested by cryptsetup. I don't think this one even uses any firmware, but yes, there was an error message regarding firmware too. I'd actually expect firmware to be together with modules that need it, so since I'm not packing any modules in initramfs, there should be no need for firmware either.
Anyway, pulling your version of the fix didn't break things again for me, so I guess it's good.
If you enable the debug module, this may help some, as it'll give you a shell you can use.
szatox wrote: I can't bring my system on-line though. I need a static IP configuration, so I enabled ugrd.net.net and added ip_address = ip/mask and ip_gateway, which didn't really do anything. Actually, can I override it with kernel params?
At this moment I'm not sure whether it's a problem with ugrd or my brand-new completely untested kernel though, so I went ahead looking for shell access. Enabled console, which made the resulting initramfs much more talkative, but even though there is an automatic login prompt, I can't really use any commands. It only responds to Enter (sometimes).
I'm going to give debug a shot. Maybe this way I'll be able to have a good look at the thing while it's running.
This is TOML "fun", but more or less, the top portion of the file is for global variables, once you define a section, anything defined under there is part of that, acting like dictionary keys usually. This continues until the next defined section. It's like INI but in most ways stricter. The main gotcha is defining things under the wrong section.
szatox wrote: Actually, I also have a bit of a problem with the config file structure. I'm sure there must be some logic behind it, but I don't really know whether adding new variables at the top of the file is good enough, or do I have to prepend them with [some.header] or not, or what is the scope of that header, and if the order of active lines is important and so on.
Basically, it's somewhat similar to ini, but not quite, and it feels inconsistent and unintuitive.
One thing I am fairly sure about is that all my changes go into /etc/ugrd/config.toml, and configs provided with the modules are not meant to be touched.
To be fair, with autodetect in place, most users probably won't even see it, so it's not that big of a deal.
BTW, looking at different things I noticed that in cmdline you're parsing boot_option=value pairs and turning them into variables. It's redundant; kernel already provides them as env to init.
Funny thing, a long time ago I saw a similar pattern in genkernel so I _obviously_ replicated it in my own initramfs, and then tried reusing it for something else, and then - running in a different context - it finally turned out that code didn't even work