Gentoo Forums
Tips for stability and trustworthiness

 
Gentoo Forums Forum Index :: Other Things Gentoo
Author Message
psycho
Apprentice


Joined: 22 Jun 2007
Posts: 282
Location: New Zealand

PostPosted: Sun Feb 21, 2021 8:06 am    Post subject: Tips for stability and trustworthiness

Given that this kind of thing never gained enough interest to happen, what do people do when they want to use Gentoo but need to count on a system's working reliably for months or years without a lot of maintenance? Are there any effective tricks for working with the existing tree to limit updates to security and bug fixes, or is it basically just a case of either not updating at all, or checking and potentially testing every single update (e.g. if a library's updated, going through all the software that uses that library to check that all relevant features still work etc.) every time? I've thought about using e.g.
Code:
glsa-check -t all
regularly to flag genuinely necessary updates and just leaving everything else untouched...but how long before that approach could make a necessary (GLSA flagged) package update into a nightmare due to the whole system's being so out of sync?
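For concreteness, that GLSA-only routine could be scripted roughly as below. This is a sketch: glsa-check ships with app-portage/gentoolkit, and the `affected` target and the exact flags should be confirmed against your version's man page.

```shell
#!/bin/sh
# Sync first: a stale tree also means stale GLSA metadata.
emerge --sync

# List the advisories this system is actually affected by.
glsa-check --test all

# Dry run: show which package updates fixing those advisories would pull in.
glsa-check --pretend affected

# Apply only the GLSA-flagged fixes, leaving the rest of the system untouched.
glsa-check --fix affected
```

The dry-run step is the useful part for the "how out of sync am I?" question: if the pretend output starts pulling in large dependency chains, the frozen system has drifted too far for GLSA-only updates to stay cheap.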

What are other people doing? I can't always see all the potential problems coming just from emerge -p and so I'm not comfortable doing something like a presentation before a large audience on software that's had a bunch of libraries upgraded to new versions since the last time I tested the presentation software or the audio or whatever. Are there any tips for keeping Gentoo (1) up-to-date in terms of security fixes, and yet (2) predictable in terms of being able to trust that whatever was working when you tried it last month will still be working at the crucial moment you need it this month?

At the moment I'm thinking of maybe trying to keep things fixed (not updated at all) for as long as possible (six months?) while just checking GLSA for serious issues, and then planning update weekends to bridge the months of falling behind (and after updating, thoroughly testing everything again for another stable "snapshot"). It feels clumsy though: do I just have to accept that rolling release = lots of manual attention, or have people figured out ways to keep stuff simultaneously secure (up-to-date in that respect) and trustworthy?
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 47613
Location: 56N 3W

PostPosted: Sun Feb 21, 2021 11:35 am    Post subject:

psycho,

Those with large Gentoo deployments do an offline build and test cycle, then deploy the tested binary.
They have to distribute the repo and binaries together at deploy time because things have moved on in the live tree.

I have about 10 broadly similar (hardware) systems. I build and test there, keeping all the distfiles and binaries, so I can roll back with an emerge -K when things break.
I've had to do that a few times. Once or twice, it's not been easy.
Once my main desktop is done and I'm happy with it, I do the others over the course of a few days. This mostly works.
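The keep-binaries-and-roll-back workflow described here can be set up along the following lines. This is a sketch of the general technique, not NeddySeagoon's exact configuration, and the package atom at the end is a made-up example.

```shell
# In /etc/portage/make.conf: save a binary package of everything Portage builds,
# so previous versions remain available in $PKGDIR for rollback.
FEATURES="buildpkg"

# Routine update on the build-and-test machine; binaries accumulate as you go.
emerge --ask --update --deep --newuse @world

# If an update breaks something, reinstall the last-good binary without compiling.
# -K (--usepkgonly) installs only from the saved binary packages.
emerge -K "=media-sound/audacity-2.4.2"   # hypothetical example atom
```

Pinning the exact `=category/package-version` atom is what makes the rollback deterministic; a bare package name would just reinstall the current (broken) version.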

I use Kernel Self Protection Project/Recommended Settings everywhere and don't run services I don't need. The rest of security starts with your threat assessment.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
cboldt
l33t


Joined: 24 Aug 2005
Posts: 914

PostPosted: Sun Feb 21, 2021 1:13 pm    Post subject:

I just run "stable" but do updates once a week.

Uptime 1000+ days. Obviously no kernel updates until the system goes down (typically power fail or hardware fail), but rock solid otherwise.
Juippisi
Developer


Joined: 30 Sep 2005
Posts: 525
Location: /home

PostPosted: Sun Feb 21, 2021 4:52 pm    Post subject:

cboldt wrote:

Uptime 1000+ days. Obviously no kernel updates until the system goes down (typically power fail or hardware fail), but rock solid otherwise.


There's kpatch/elivepatch for this.
figueroa
Veteran


Joined: 14 Aug 2005
Posts: 1204
Location: Edge of the Marsh USA

PostPosted: Mon Feb 22, 2021 3:47 am    Post subject:

I do updates first on a nearly identical local server before duplicating the update to a mission critical remote server -- no surprises.

Update often -- small bites are easier to digest than orgies.

Since packages with security issues or serious bugs often have dependencies that also need updating, just updating programs identified in GLSA is an impractical option.

If you believe you can live with an outdated, security-vulnerable system: I've done it, and I don't recommend it.

I keep two paths open to ssh into remote server, openssh and dropbear.

Protect from the outside with fail2ban with stringent settings.
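A minimal sshd jail along those lines might look like the fragment below. The values are illustrative, not a recommendation for "stringent"; fail2ban reads local overrides from jail.local.

```ini
# /etc/fail2ban/jail.local -- example values only
[sshd]
enabled  = true
maxretry = 3
findtime = 10m
bantime  = 1h
```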

Do I need to say (?): full backups, automatic backups, off-site backups, tested with full restore from time to time.
_________________
Andy Figueroa
andy@andyfigueroa.net Working with Unix since 1983.
Automate and Test Your Backups
figueroa
Veteran


Joined: 14 Aug 2005
Posts: 1204
Location: Edge of the Marsh USA

PostPosted: Mon Feb 22, 2021 3:52 am    Post subject:

cboldt wrote:
I just run "stable" but do updates once a week.

Uptime 1000+ days. Obviously no kernel updates until the system goes down (typically power fail or hardware fail), but rock solid otherwise.

I had a machine up for over 1000 days once. It made me really nervous. I now reboot every couple of months whether I need to or not. It's really important to know your mission critical computer can boot up. Better to plan that reboot than to wait for a power supply to blow up.
_________________
Andy Figueroa
andy@andyfigueroa.net Working with Unix since 1983.
Automate and Test Your Backups
cboldt
l33t


Joined: 24 Aug 2005
Posts: 914

PostPosted: Mon Feb 22, 2021 2:46 pm    Post subject:

figueroa wrote:
I had a machine up for over 1000 days once. It made me really nervous. I now reboot every couple of months whether I need to or not. It's really important to know your mission critical computer can boot up. Better to plan that reboot than to wait for a power supply to blow up.

My fill-in strategy is two machines. The LAN evolved to that point when the (single) server had a hardware failure. I pressed a screenless laptop into mail server duty while waiting for new hardware -- a power supply, if I recall. Once the original server was up, I just kept the mail serving on the screenless laptop. That was probably 15 years ago, and by now both machines are running on different hardware. But if/when either one dies, the other can fill in. I maintain a few critical services on both machines, but only turn them on on one: mail and DNS, basically.
pjp
Administrator


Joined: 16 Apr 2002
Posts: 18838

PostPosted: Mon Feb 22, 2021 11:20 pm    Post subject: Re: Tips for stability and trustworthiness

psycho wrote:
need to count on a system's working reliably for months or years without a lot of maintenance?
Years, wow. The low side of months seems easy enough: sync and fetch, and save the repo for each period after you've synced and fetched updates.

On the longer side of months, use a chroot to update in the manner described above, but actually do the updates inside the chroot. This should make updating the chroot somewhat easier. Optionally use binaries to update the live system, or repo snapshots and sources to update it on the actual host.
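One way to realize the chroot variant is sketched below, assuming a build root at /mnt/buildroot with FEATURES="buildpkg" set inside it; the paths and mount details are illustrative, not the poster's exact setup.

```shell
# Make the chroot usable, then run the real update inside it.
mount --rbind /dev /mnt/buildroot/dev
mount --rbind /sys /mnt/buildroot/sys
mount -t proc proc /mnt/buildroot/proc
chroot /mnt/buildroot emerge --update --deep --newuse @world

# Tested binaries accumulate in the chroot's PKGDIR; update the live host
# from those binaries only (-K), so it never builds against a moved-on tree.
PKGDIR=/mnt/buildroot/var/cache/binpkgs emerge --update --deep -K @world
```

The point of the final -K step is that the live system only ever receives packages that already built and ran inside the chroot.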

My personal preference, which I have not yet had a chance to implement, is to maintain multiple boot environments (at least temporarily), "current" and "upgrade". Using normal partitions, LVM, separate physical drives, etc., perform the upgrade in the "upgrade" location. When done, boot into "upgrade." In case of failure, reboot to the previous "current."

For the general idea, see ZFS Boot Environments.

I haven't decided how I'd manage that on a laptop. I also haven't identified how to track which partition would be "current" or "upgrade", which would be particularly important as more time elapses since the last upgrade or reboot. LVM snapshots seemed to be the most promising non-ZFS option, but I only did some basic testing a few years back.

psycho wrote:
Are there any effective tricks for working with the existing tree to limit updates to security and bug fixes, or is it basically just a case of either not updating at all, or checking and potentially testing every single update (e.g. if a library's updated, going through all the software that uses that library to check that all relevant features still work etc.) every time? I've thought about using e.g.
Code:
glsa-check -t all
regularly to flag genuinely necessary updates and just leaving everything else untouched...but how long before that approach could make a necessary (GLSA flagged) package update into a nightmare due to the whole system's being so out of sync?
In my experience, "stable" is too volatile to rely on being able to do that. That seems to be the nature of an actively developed rolling release. If you can do the glsa-check updates, that'd be better than nothing, but at some point it will probably stop working without the updates you're trying to avoid. Consider recent changes with X (suid change, X won't start), PAM (possible lockout during upgrade), Python 2.7 and 3.x changes, profile 17.1, and maybe some others. Obviously those exact issues aren't likely to recur, but something else might (or might not). Consider that in whatever plan you choose.

psycho wrote:
What are other people doing?
When I started using Gentoo on my laptop, I quickly decided that wasn't viable, so I started to build in a chroot. My plan is to use generic binaries and use them across my systems and VMs. I'm currently working on my second chroot for this method, what will become my storage server. Once that's working, I may choose to go back to native binaries in some cases, but that will depend on how well I'm able to create tools to manage the process.

Due to what I described as a "volatile, actively developed" rolling release, my conclusion is that Gentoo is truly the "meta distro" it has always been. I happened to be lucky for a very long time that it also worked for me as a distro. I'm now in the process of using Gentoo to create the "distro" I'll be using, although I either lack the experience of using Gentoo that way, or the tooling isn't as complete as it ought to be on the "manage your distro" side of the equation.
_________________
Magna Carta (1215) | Spectral evidence no longer permissible (c. 1792) | Cancel culture, deplatforming (c. 2016)
PlatinumTrinity
Tux's lil' helper


Joined: 10 Mar 2020
Posts: 78

PostPosted: Tue Feb 23, 2021 3:04 am    Post subject:

cboldt wrote:
I just run "stable" but do updates once a week.

Uptime 1000+ days. Obviously no kernel updates until the system goes down (typically power fail or hardware fail), but rock solid otherwise.


I used to keep my desktop machines up that long. I've even had some old NT systems that managed 200+ days of uptime. I don't do this anymore because I run full disk encryption. If I'm not going to be on the machine I shut it down. I've never been one to use suspend even on my laptops. The machine is either in-use or it's off.

My servers don't run Gentoo. I keep those on *BSD. I wish the Gentoo/BSD project was still active because I would use it. Generally, I don't have anything on servers that I want to hide so I don't mind keeping those up. As long as the power is on my servers should be online.
psycho
Apprentice


Joined: 22 Jun 2007
Posts: 282
Location: New Zealand

PostPosted: Tue Feb 23, 2021 9:28 am    Post subject:

Thanks for lots of interesting feedback and ideas. I do already back up my systems regularly and have at least two disks per box, so am able to do the parallel live-system / old-working-backup thing...but my working backups won't be a huge consolation if I'm delivering a presentation in front of a large audience and stuff breaks to the point of "oh, sorry folks, just hold on while I boot into a backup OS that actually works...": I'd be much happier just working every day with that trustworthy "backup" OS, keeping it secure with glsa-flagged updates, if there's some way to achieve that.

What exactly does
Code:
emerge -u @security
do, for example if I ran it on a box that hadn't been updated in any other way for a couple of months? I've never tried it, but it looks like a leave-everything-alone-except-for-GLSA-flagged packages update...? If it actually works, it might be enough for me to feel that I can maintain my (deployed) stable systems that way, while a single building machine is keeping in sync with live portage, ready for eventual deployment the next time I want to freeze things as a stable snapshot...?
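As I understand it, @security is the Portage package set generated from current GLSAs, so a dry run would show exactly how narrow (or not) such an update is; this is worth confirming against the portage(5) documentation of package sets before relying on it.

```shell
# Preview what the GLSA-derived set would touch, without changing anything.
emerge --pretend --update --verbose @security

# For comparison: how far the rest of the system has drifted since the freeze.
emerge --pretend --update --deep @world
```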
Hu
Moderator


Joined: 06 Mar 2007
Posts: 16693

PostPosted: Tue Feb 23, 2021 5:07 pm    Post subject:

As a side warning, you cannot assume that every security relevant update is flagged in a GLSA. The project tries its best, but if upstream does not call out a security problem fixed by an update, or if the GLSA team is behind, there may be security fixes available without a corresponding GLSA.
psycho
Apprentice


Joined: 22 Jun 2007
Posts: 282
Location: New Zealand

PostPosted: Tue Feb 23, 2021 8:29 pm    Post subject:

Thanks Hu, and that's fine...I'm not hosting anything with user data or doing anything that really needs to be up-to-the-minute informed re every obscure little possibility...so as long as the GLSA addresses high risk high impact stuff within a day or two of its being flagged upstream I'd be happy with the resulting system, if the @security target incorporated the GLSA stuff smoothly (i.e. if the practice of merging that into an otherwise frozen Gentoo weren't so unusual and untested that it wound up leading to just as much or even more breakage as plain old -u world). The point re upstream's potentially just staying quiet about the issues they've addressed in new releases is a more worrying one...where that's the case it's basically rolling release = more secure...although it's offset to some extent by the possibility that new versions are also introducing new features opening up new holes that the older versions didn't have...so again, I think what I might try is to maintain working tested Gentoo installs with just emerge -u @security (or manual glsa-check plus rsync of ebuilds or whatever it takes), while a "testing" (though of course "stable" in terms of keywords) system keeps in sync with live portage.

Or at least I can give it a shot on one box for a while and see what happens over time. I guess I could treat it as an experiment along these lines: if, when my -u world systems expose their first problem (the first program that no longer works or whatever...something inconvenient that has to be manually sorted out), the conservative -u @security system has avoided that issue and *not* had any issues unique to itself, I could treat that as evidence that the @security target is useful for this purpose and maybe look at trying it on everything, with just a building system tracking live portage (so that at some point I can test and freeze it ready for deployment as an upgrade to the @security boxes).
pjp
Administrator


Joined: 16 Apr 2002
Posts: 18838

PostPosted: Tue Feb 23, 2021 10:43 pm    Post subject:

psycho wrote:
my working backups won't be a huge consolation if I'm delivering a presentation in front of a large audience and stuff breaks to the point of "oh, sorry folks, just hold on while I boot into a backup OS that actually works...": I'd be much happier just working every day with that trustworthy "backup" OS, keeping it secure with glsa-flagged updates, if there's some way to achieve that.
Don't do upgrades right before you do a presentation. Test "critical" software after an upgrade. The nature of changing something that works is that it might stop working. I'm not aware of any solution that prevents potential problems. So whatever you choose, realize that what you're doing is managing the amount of down time to restore previous working state.
_________________
Magna Carta (1215) | Spectral evidence no longer permissible (c. 1792) | Cancel culture, deplatforming (c. 2016)