lyallp Veteran


Joined: 15 Jul 2004 Posts: 1616 Location: Adelaide/Australia
Posted: Mon Feb 28, 2022 1:38 am Post subject: Anacron - Daily and Weekly job conflicts |
I use anacron with /etc/cron.daily and /etc/cron.weekly directories populated with scripts that I execute.
My problem is, one of my weekly scripts is a long running disk test script (I have bad sectors on one of my disks and I want to keep an eye on the disk)
This conflicts with a daily job (emerge --sync), in which case the long-running script will generate errors as files go 'missing' in portage that were there when the weekly script started.
This could easily be a weekly backup, rather than a disk check, it's the concept that matters.
Can I set things up such that anacron runs all its daily/weekly/monthly scripts in series - or make it such that my long-running script will prevent shorter-running scripts from starting until it is finished?
Is there some 'lockfile' my jobs can create to prevent other jobs from running?
I guess I could replace the run-parts with my own script which scans for executable files in the indicated directory and runs them in series, but I felt that this was something someone else may have already done.
I would create files in /etc/cron.daily, for example, as '01-script.parallel' and '01-script.serial', execute all the parallel ones first, and when they are complete, iterate through the serial ones....
My script would also 'lock out' any other instances of itself, so that when anacron executes the daily, weekly and monthly runs all at the same time (even with the command-line delay, which is a hack), the daily, weekly and monthly invocations would wait for each other and execute in series...
Alternatively is there a different package that does what I am looking for?
Thoughts much appreciated. _________________ ...Lyall |
Marlo Veteran

Joined: 26 Jul 2003 Posts: 1591
lyallp Veteran


Joined: 15 Jul 2004 Posts: 1616 Location: Adelaide/Australia
Posted: Mon Feb 28, 2022 9:45 am Post subject: |
Random delay doesn't cut the mustard.
I have written a simplistic Run-Parts.sh which searches a directory for scripts with 'serial' in the filename and executes them one after the other, waiting until one finishes before starting the next.
It will run scripts with 'parallel' in the name, in parallel.
The remaining scripts are executed in parallel, which is basically the current functionality of run-parts, as I understand it.
The script only allows one instance of itself to run at a time, waiting until other instances finish before proceeding; thus, daily, weekly and monthly scripts won't interfere with each other.
I simply replaced the anacron invocations of run-parts with my Run-Parts.sh.
A lot of missing functionality compared to run-parts but good enough for my purposes.
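In outline, it amounts to something like this (not the actual script; the lock file path is just illustrative):
Code: | #!/bin/bash
# Sketch only: run '*serial*' scripts one at a time, everything else in parallel,
# with a global lock so daily/weekly/monthly invocations queue up behind each other.
dir="$1"                              # e.g. /etc/cron.daily
lock=/var/lock/my-run-parts.lock      # illustrative lock file path

exec 200>"${lock}"
flock 200                             # wait for any other invocation to finish

for f in "${dir}"/*
do
    [ -x "${f}" ] || continue
    case "${f}" in
        *serial*) "${f}" ;;           # serial scripts run to completion, one by one
        *)        "${f}" & ;;         # the rest run in the background, in parallel
    esac
done
wait                                  # let all the parallel scripts finish
|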
For me, this works well with a weekly script that I have that takes many hours but has non-functional conflicts with some of my daily scripts.
It completely removes the need for the random delay or delay before weekly, monthly steps.
For example, each month I run a disk check, which takes ages, whilst daily I do an emerge --sync; the appearance and disappearance of files during the disk check generates diagnostics. _________________ ...Lyall |
szatox Advocate

Joined: 27 Aug 2013 Posts: 3603
Posted: Mon Feb 28, 2022 9:52 am Post subject: |
Quote: | Is there some 'lockfile' my jobs can create to prevent other jobs from running? |
Flock.
You can use it to invoke a command after acquiring an exclusive (write) lock on a file, or you can open a file inside a script and then flock it.
Either way, the lock is inherited by all children and will be automatically removed when all processes spawned under flock terminate.
If the lock can't be acquired within a user-defined timeout, flock fails without starting the command provided.
Bonus: Parallel tasks can use non-exclusive (read) locks. They don't block each other, but do prevent write locks from serial tasks. |
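For example (the lock file path is only illustrative):
Code: | # exclusive lock: the whole daily run queues behind anything else holding the lock
flock /var/lock/cron-jobs.lock run-parts /etc/cron.daily

# shared lock: parallel-friendly tasks share the file but still keep an exclusive locker out
flock -s /var/lock/cron-jobs.lock /etc/cron.daily/50-cleanup

# give up after 10 minutes instead of waiting forever
flock -w 600 /var/lock/cron-jobs.lock run-parts /etc/cron.weekly
|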
lyallp Veteran


Joined: 15 Jul 2004 Posts: 1616 Location: Adelaide/Australia
Posted: Mon Feb 28, 2022 10:01 am Post subject: |
Cool, will look into it...
Edit: Well, I did look into it; flock doesn't quite fit my requirements, though it is still useful in the script. _________________ ...Lyall |
Hu Administrator

Joined: 06 Mar 2007 Posts: 23336
Posted: Mon Feb 28, 2022 7:20 pm Post subject: |
Why is your disk check causing files to go in and out of existence? A healthy disk should never lose files, and the disk check should not be modifying filesystems on the device. |
szatox Advocate

Joined: 27 Aug 2013 Posts: 3603
Posted: Mon Feb 28, 2022 8:49 pm Post subject: |
Hu, I suppose he unmounts the FS before fsck.
lyallp, what's wrong with flock? Answering this question could inform other possible answers. |
Hu Administrator

Joined: 06 Mar 2007 Posts: 23336
Posted: Mon Feb 28, 2022 10:03 pm Post subject: |
Perhaps so. I wouldn't use fsck in an automated manner on a drive known to have failing sectors. I would use a SMART test, if I kept the drive in service at all. SMART is nondisruptive and can be done while the filesystem is mounted. |
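For example, with smartmontools (the device name here is illustrative):
Code: | # start a long self-test in the background; the drive keeps serving normal I/O
smartctl -t long /dev/sdb

# later: check overall health and the self-test log
smartctl -H /dev/sdb
smartctl -l selftest /dev/sdb
|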
lyallp Veteran


Joined: 15 Jul 2004 Posts: 1616 Location: Adelaide/Australia
Posted: Mon Feb 28, 2022 10:27 pm Post subject: |
You asked for background ....
My long-running job is a disk check that does not use fsck.
The disk that is failing is some spinning iron, and the number of hard errors is not unbearable at this time.
It's a 6TB drive, split into half NTFS filesystem and half XFS.
The XFS part has data which is not crucial, such as portage and other dynamic data, plus I have backups.
What I do is find every file on the disk, cat/dd it to null and if there is an error, move it to a folder called BAD_SECTORS.
New errors do not happen often, but I wrote the script when I first found, via syslog, that I had bad sectors on the disk, and I had great difficulty in associating a kernel-level bad sector with a human-level file.
So, I had a bad sector, what file was affected? Bugger it, I will read all the contents of all files to find the affected files or, in some cases, directories.
I don't bother trying to recover the file or directory which has the error, I just put it to one side so it does not interfere again, with things like portage.
Sectors that are broken but are not contained within a file or directory structure are not found by this test. I did consider using 'dd' to fill the disk with nulls to try to find these, but my care factor wandered away.
I also emerge --sync every day, so the emerge --sync and the disk check conflict.
One of these days, I will replace the disk, it's not urgent.
So, back to anacron.
I use anacron to schedule daily/weekly/monthly jobs; one daily job is emerge --sync and a weekly job is the bad sector scan on the failing drive.
I was having reports of missing files that the bad sector scan could not find, as a result of emerge --sync deleting files...
The problem, for me, is flock seemed to assume that I wanted to run a command when the lock was achieved and release the lock when the command completed.
I wanted one command to acquire the lock and a separate command to release it, whilst I did stuff in between.
I perfectly understand putting 'flock' in each script that is run by anacron in the /etc/cron.daily, for example.
Make the decision parallel or serial in each script.
I wanted to have the lock at a higher level, at the actual execution of the whole directory of /etc/cron.daily, for example.
The running script would prevent other instances of itself from running (say monthly and weekly), forcing them to wait till daily has completed.
I guess I could put 'flock' at the /etc/anacrontab level when it runs the run-parts script, just make the run-parts a parameter of 'flock'.
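Roughly something like this, I suppose (the lock file name is just one I'd pick):
Code: | # /etc/anacrontab: period(days)  delay(minutes)  job-id  command
1         5   cron.daily    flock /var/lock/anacron-jobs.lock run-parts /etc/cron.daily
7        10   cron.weekly   flock /var/lock/anacron-jobs.lock run-parts /etc/cron.weekly
@monthly 15   cron.monthly  flock /var/lock/anacron-jobs.lock run-parts /etc/cron.monthly
|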
Then, I wanted individual scripts in the daily to also be run serially or in parallel; for example, I have a weekly backup which takes ages.
I do not want other jobs to run whilst this backup is happening, for example.
However, I do have other jobs that I am happy to run in parallel, clean up scripts, etc.
It just seemed easier to me to write my own run-parts script.
To that end, flock is useful when executing the individual scripts from this run-parts script, but it is not so useful for serialising daily, weekly and monthly - although, having thought about it, using flock at the /etc/anacrontab level achieves some of the result I am looking for.
I do now use flock to serialise steps within my run-parts script, and I may even use flock at the /etc/anacrontab level, but having filenames that indicate parallel or serial running seems more intuitive to me than having flock in each script, which requires you to examine each script to determine whether it runs in parallel or serially.
Anyway, that's some background on the issue, and thoughts on flock when it comes to parallelising/serialising anacron jobs. _________________ ...Lyall |
figueroa Advocate


Joined: 14 Aug 2005 Posts: 3013 Location: Edge of marsh USA
Posted: Tue Mar 01, 2022 5:37 am Post subject: |
Would you be kind enough to share your script that does " find every file on the disk, cat/dd it to null and if there is an error, move it to a folder called BAD_SECTORS"? _________________ Andy Figueroa
hp pavilion hpe h8-1260t/2AB5; spinning rust x3
i7-2600 @ 3.40GHz; 16 gb; Radeon HD 7570
amd64/23.0/split-usr/desktop (stable), OpenRC, -systemd -pulseaudio -uefi |
lyallp Veteran


Joined: 15 Jul 2004 Posts: 1616 Location: Adelaide/Australia
Posted: Tue Mar 01, 2022 7:21 am Post subject: |
Sure, it's nothing fancy....
The script finds the files, I move them manually after looking at what the file is and whether I need to do anything.
Code: | #!/bin/bash
#
## #######################################################################################
## Cat ALL files from each directory in the list below
## to /dev/null, to look for bad sectors.
## IGNORE directory BAD_SECTORS and all its contents.
## Also, this script will not report bad sectors that are in the actual directory structure
## as opposed to files that contain bad sectors.
## #######################################################################################
#
for dirToCheck in / /home /var /tmp /mnt/c_drive /mnt/e_drive /data
do
    if [ ! -d "${dirToCheck}" ]
    then
        echo "$0: Expect >${dirToCheck}< to be a directory that exists."
        exit 1
    fi
    cd "${dirToCheck}" || { echo "$0: Failed to cd to >${dirToCheck}<"; exit 2; }
    echo "Checking ${dirToCheck} for bad sectors."
    # send error output of find to /dev/null as it does traverse the BAD_SECTORS folder (if any)
    # during its search.
    # Ignore those errors.
    #
    find . -mount -type f 2> /dev/null |
        grep -v 'BAD_SECTORS' |
        while IFS= read -r f
        do
            cat "${f}" > /dev/null || echo "${f} has problems"
        done
done
exit 0
|
_________________ ...Lyall |
Hu Administrator

Joined: 06 Mar 2007 Posts: 23336
Posted: Tue Mar 01, 2022 4:40 pm Post subject: |
Rather than grep -v for BAD_SECTORS, you could use a find expression that prohibits examining that directory at all. Code: | find . -mount -name BAD_SECTORS -prune -o -type f -print0 | while read -d '' f; do | This lets you avoid discarding the error output of find, which might be desirable, since as-is you will not see it report broken directories outside the BAD_SECTORS area.
I note that your cat test can produce confusing output. If you have two files with the same paths under their respective mount points, you cannot tell from the output which one the script is reporting. |
szatox Advocate

Joined: 27 Aug 2013 Posts: 3603
Posted: Tue Mar 01, 2022 6:08 pm Post subject: |
Quote: | The problem, for me, is flock seemed to assume that I wanted to run a command when the lock was achieved and release the lock when the command completed.
I wanted one command to acquire the lock and a separate command to release it, whilst I did stuff in between. |
Well, the easy way (running the script under flock) does acquire the lock, then runs the script.
If you want to release the lock early, you have to do it the other way around: run the script directly, then have it open a file, flock it, execute whatever needs to be serialized, and finally close the file. Check the manual for the syntax (EXAMPLES section).
You can't make the lock linger after the script terminates. However, you can lock the file, read it, decide whether or not you want the script to continue, and update it to inform future invocations before releasing the lock.
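In outline, the in-script form looks something like this (the descriptor number and lock path are arbitrary):
Code: | #!/bin/bash
# open the lock file on a spare file descriptor, then lock it
exec 200>/var/lock/backup.lock
flock -w 7200 200 || exit 1     # wait up to two hours for the lock, then give up

# ... everything that must be serialized goes here ...

flock -u 200                    # release early; otherwise the lock drops when the script exits
|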
Quote: | I do not want other jobs to run whilst this backup is happening, for example.
However, I do have other jobs that I am happy to run in parallel, clean up scripts, etc. | Write lock a marker file before running backup, read lock the same file before running other tasks. Read locks are social and will be happy to share that file, write lock goes all in or all out.
Quote: | The running script would prevent other instances of itself from running (say monthly and weekly), forcing them to wait till daily has completed. | Well, you can do read locks on one file to allow parallel execution of several tasks when the backup (with its write lock) is not running, and a write lock on another file to prevent running multiple instances at the same time. But it makes me think you're overengineering it. Throw in another layer to ensure daily won't run at the same time as weekly, and you're definitely overengineering it.
What is the actual thing that bothers you there? Do those scripts conflict with each other? Do they consume too many resources? Maybe you should probe the system for its status instead?
Maybe batch is actually a better option? (E.g. running a task when the load average is below a predefined value.)
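Something along these lines, with an illustrative script path:
Code: | # batch queues the job and runs it once the load average drops low enough
echo /usr/local/sbin/bad-sector-scan.sh | batch
|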
Finally, you really should just replace that big failing disk. |
lyallp Veteran


Joined: 15 Jul 2004 Posts: 1616 Location: Adelaide/Australia
Posted: Tue Mar 01, 2022 10:10 pm Post subject: |
Script: I print the mount point I am scanning before the find, so that distinguishes files.
The error output of find whilst scanning is redirected (to /dev/null) in the script above; however, point taken about the find expression. I do have BAD_SECTORS_# folders, where # is 0-6 so far....
Yes, I should replace the disk but it only has about 12 bad sectors so far and the number has not changed in quite some time.
On locking: upon reflection, thanks for that feedback. I do use flock now. _________________ ...Lyall |