Gentoo Forums
Restarting services after updates
Rcomian
Apprentice


Joined: 10 Jan 2004
Posts: 174
Location: Uk, Northwest

Posted: Thu Feb 09, 2006 10:58 pm    Post subject: Restarting services after updates

You know the problem: after updating a bunch of packages you may need to restart some services so they pick up the changes or security fixes. But unless you were really paying attention, or the updated service was a key one, it can be hard to tell which services are affected.
This also applies if you're trying to have a server apply security updates while unattended: it needs to know when to restart a service.

There are lots of issues involved with this, but the script presented here is a first stab at providing at least a partial solution.
The idea is that it gets run after all updates and etc-updates are done (I prefer to use cfg-update because it has an automatic mode, but your tastes may vary).
There's a magical directory (/var/lib/init.d/started) that contains a symlink for each currently running service. This script keys on the simple fact that if a service's init script has been updated since the service was started, the modified time of the init script will be later than the modified time of its symlink. If this is the case, it's a fair bet that the service needs restarting.
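
To make that check concrete, here is a minimal sketch in present-day Python 3 (not the script posted below, which is Python 2). The function name needs_restart is made up for this illustration.

```python
import os

def needs_restart(link_path):
    """Return True if the init script is newer than its 'started' symlink."""
    # lstat() looks at the symlink itself: its mtime is when the service started.
    link_mtime = os.lstat(link_path).st_mtime
    # stat() follows the symlink to the init script it points at.
    script_mtime = os.stat(link_path).st_mtime
    return script_mtime > link_mtime
```

Calling needs_restart("/var/lib/init.d/started/sshd") would then answer the question for a single service, assuming that symlink exists on your system.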

There are several ways to control this script. First there are command-line options:
-p sets the script into PRETEND mode: no services will actually be restarted.
-v sets the script into VERBOSE mode: all running services will be listed along with a description of whether they were restarted, and if not, why not.
-d <dir> uses a directory other than /var/lib/init.d/started to look for the magic symlinks
-b <file> uses this file for the blacklist
-w <file> uses this file for the whitelist
-h prints the usage message
-l <file> writes verbose output to a log file.

You can set a blacklist or a whitelist (or both if you're perverse) of services this script can control. Each file lists one service per line. By default these should be placed at:
/etc/service_restart_blacklist
/etc/service_restart_whitelist
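
The script below compiles each line of these files as a regular expression anchored with ^ and $, so an entry such as net\..* covers every net.* service. A rough sketch of how the two lists combine, in Python 3; the names load_rules and may_restart are invented for this example:

```python
import re

def load_rules(lines):
    """Compile one anchored regular expression per non-empty line."""
    rules = []
    for line in lines:
        rule = line.strip()
        if rule:
            rules.append(re.compile("^" + rule + "$"))
    return rules

def may_restart(service, blacklist=None, whitelist=None):
    """The blacklist is checked first; a whitelist, if present, must also match."""
    if blacklist is not None and any(r.match(service) for r in blacklist):
        return False
    if whitelist is not None:
        return any(r.match(service) for r in whitelist)
    return True

blacklist = load_rules(["xdm", r"net\..*", ""])
print(may_restart("xdm", blacklist))    # False
print(may_restart("sshd", blacklist))   # True
```

Pass the open blacklist or whitelist file to load_rules to read the real rules; iterating over a file yields its lines.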

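When restarting, a service was originally simply restarted with the "/etc/init.d/<service> restart" command; since version 1.2 the script runs "pause" followed by "start" instead (see the changelog below).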
It won't restart a service if it finds an update for the init script waiting in the wings.
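
Portage leaves unapplied config updates next to the original file under names like ._cfg0000_<name>. The check for such a file can be sketched as follows, in Python 3; has_pending_update is a hypothetical helper, not a function from the script below:

```python
import os
import re

def has_pending_update(path):
    """True if a ._cfgNNNN_<name> file sits in the same directory as `path`."""
    directory, name = os.path.split(path)
    # Escape the name so dots in e.g. "net.eth0" are matched literally,
    # and anchor the end so "net" does not match "._cfg0000_net.eth0".
    pattern = re.compile(r"\._cfg\d{4}_" + re.escape(name) + "$")
    return any(pattern.match(entry) for entry in os.listdir(directory or "."))
```

For example, has_pending_update("/etc/init.d/apache2") looks for /etc/init.d/._cfg0000_apache2 and its numbered siblings.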

There are a lot more things we could key on to decide whether a service needs restarting; these may be added if the basic script proves its worth. It's something I've seen a lot of people ask for, and I'd kind of like it myself: I'm playing with trying to get a machine that performs basic maintenance unattended.

So, here it is, enjoy!

Update: As far as blacklists go, it's probably wise to blacklist the xdm service - you don't want to be logged out of your desktop session by an overzealous cron job!

Code:

#!/usr/bin/python

# Copyright 2006 Jim Tupper
# Modified by Chris Roe, February 2006
#
# Licenced under the GPL, any version
#
# Changelog:
# 2006-02-09   JT    Version 1.0
#   Original version
#
# 2006-02-23   CR    Version 1.1
#   - proper command line help (via a new -h option)
#   - support for writing to a verbose log file (via the -l option)
#
# 2006-02-25   JT    Version 1.2
#   - Added additional trigger for detecting updates to /etc/conf.d/
#   - Using "pause" & "start" rather than "restart" as a poor man's attempt
#     to handle the service dependency issues.
#   - Added default blacklist if no black or whitelist is specified at all.
#   - Modified function comments to use python's docstring syntax
#   - Reversed default logic of do_restart to make it easier to add extra triggers
#   - Added this changelog

import os, stat, re, getopt, sys, datetime

def print_usage (err = ""):
   """Outputs the usage message of this script.
   
   Parameters:
      err  - Optional.  If specified, this error will be printed before
             the usage message.
   """

   if not (err == "" or err.isspace()):
       error( "Error: " + err )
       error()
   
   error( "Usage: restart-services.py [-h] [-p] [-v] [-d <serviceDir>]" )
   error( "             [-b <blacklist>] [-w <whitelist>] [-l <logfile>]" )
   error()
   error( "Where:" )
   error( "    -h    Prints this help message." )
   error()
   error( "    -p    Runs the script in pretend mode.  Using this option" )
   error( "          only outputs what would happen.  It does not actually" )
   error( "          (re)start or stop any services." )
   error()
   error( "    -v    Runs the script in verbose mode.  This will output" )
   error( "          extra information regarding exactly what the script" )
   error( "          is doing." )
   error()
   error( "    -d <serviceDir>" )
   error( "          Directs the script to use the directory indicated by" )
   error( "          <serviceDir> as its \"magic\" directory.  This directory" )
   error( "          holds one symlink per started service; a service is" )
   error( "          restarted if its init script or config has been modified" )
   error( "          after the time on its symlink." )
   error()
   error( "          The init system manages these symlinks; the script only" )
   error( "          reads them." )
   error()
   error( "          The default location for this directory is /var/lib/init.d/started" )
   error()
   error( "    -b <blacklist>" )
   error( "          Indicates to the script which file contains a list of" )
   error( "          services that should not be restarted.  This file should" )
   error( "          list each service that is on the blacklist on a separate" )
   error( "          line.  The default location that is used if this option" )
   error( "          is not specified is /etc/service_restart_blacklist" )
   error()
   error( "    -w <whitelist>" )
   error( "          Indicates to the script which services are to be restarted." )
   error( "          Only these services will be restarted. By default, the" )
   error( "          script looks for /etc/service_restart_whitelist as the " )
   error( "          whitelist file." )
   error()
   error( "    -l <logfile>" )
   error( "          If specified, will write verbose output to the specified file." )
   error( "          This option is handy, if it is difficult to redirect the output" )
   error( "          of this script in your situation, or if you want the normal" )
   error( "          amount of output sent to stdout, but want a verbose log to fall" )
   error( "          back on in case of problems." )
   error()
   error( "Return Codes:" )
   error( "    0     Normal, successful completion." )
   error()
   error( "    1     Unable to open log file for writing." )
   error()
   error( "    2     An invalid option was passed to the script." )
   error()
   error( "    3     Unable to load a blacklist rule." )
   error()


def verbose(msg=""):
    """Outputs messages that are meant to be displayed in verbose mode.

    If the VERBOSE global is set to True, the message is output to standard out.
    If the LOG_FILE global is set to a File, the message is written to that file.
 
    Parameters:
      msg  - Optional. The string containing the message to be output.  If omitted,
             a blank line is output.
    """

    if VERBOSE:
        sys.stdout.write(msg + "\n")

    if (LOG_FILE != None):
        LOG_FILE.write(str(datetime.datetime.now()) + " - [DEBUG] - " + msg + "\n")



def info(msg=""):
    """Outputs messages that are meant to be displayed on standard out.

    If the LOG_FILE global is set to a File, the message is written to that
    file as well.
 
    Parameters:
        msg  - Optional. The string containing the message to be output.  If omitted,
               a blank line is output.
    """

    sys.stdout.write(msg + "\n")

    if not (LOG_FILE == None):
        LOG_FILE.write(str(datetime.datetime.now()) + " - [INFO ] - " + msg + "\n")


def error(msg=""):
    """Outputs messages that are meant to be displayed on standard err.

    If the LOG_FILE global is set to a File, the message is written to that
    file as well.

    Parameters:
        msg  - Optional. The string containing the message to be output.  If omitted,
               a blank line is output.
    """

    sys.stderr.write(msg + "\n")

    if not (LOG_FILE == None):
        LOG_FILE.write(str(datetime.datetime.now()) + " - [ERROR] - " + msg + "\n")


def warn(msg=""):
    """Outputs messages that are meant to be displayed on standard err as warnings to the user.

    If the LOG_FILE global is set to a File, the message is written to that
    file as well.

    Parameters:
        msg  - Optional. The string containing the message to be output.  If omitted,
               a blank line is output.
    """

    sys.stderr.write(msg + "\n")

    if not (LOG_FILE == None):
        LOG_FILE.write(str(datetime.datetime.now()) + " - [WARN ] - " + msg + "\n")


##########################################################################################
##  MAINLINE Script.
##########################################################################################
PRETEND=False
VERBOSE=False
SVCDIR="/var/lib/init.d/started"
SERVICE_BLACKLIST="/etc/service_restart_blacklist"
SERVICE_WHITELIST="/etc/service_restart_whitelist"
LOG_FILE=None
CONFDIR="/etc/conf.d"

try:
   opts, args = getopt.getopt(sys.argv[1:], "hpvd:b:w:l:")
except getopt.GetoptError as err:
   # print help information and exit; str(err) names the offending option
   print_usage(str(err))
   sys.exit(2)

for o, a in opts:
   if o=="-h":
      print_usage()
      sys.exit(0)
   if o=="-p":
      PRETEND=True
   if o=="-v":
      VERBOSE=True
   if o=="-d":
      SVCDIR=a
   if o=="-b":
      SERVICE_BLACKLIST=a
   if o=="-w":
      SERVICE_WHITELIST=a
   if o=="-l":
      try:
          info("Writing log to " + a)
          LOG_FILE = open(a, 'w+')
      except IOError:
          error("Unable to open '" + a + "' for writing!!")
          sys.exit (1)

# Load the blacklist
blacklist = None
if os.path.exists(SERVICE_BLACKLIST):
   blacklist = []
   f = open(SERVICE_BLACKLIST, 'r')
   try:
      for rule in f:
         try:
            # Strip the trailing newline (the last line may not have one)
            rule = rule.rstrip("\r\n")

            if len(rule) > 0:
               blacklist.append( re.compile("^"+rule+"$") )
               verbose("Loaded blacklist rule: " + rule)
         except re.error:
            # Can't continue
            # The user is expecting a service to be left alone but we can't tell which service(s)
            error("Failed to load blacklist rule: " + rule)
            sys.exit(3)
   finally:
      f.close()

# Load the whitelist
whitelist = None
if os.path.exists(SERVICE_WHITELIST):
   whitelist = []
   f = open(SERVICE_WHITELIST, 'r')
   try:
      for rule in f:
         try:
            # Strip the trailing newline (the last line may not have one)
            rule = rule.rstrip("\r\n")

            if len(rule) > 0:
               whitelist.append( re.compile("^"+rule+"$") )
               verbose("Loaded whitelist rule: " + rule)
         except re.error:
            # This is less critical than failing to load a blacklist. Still report the error.
            error("Failed to load whitelist rule: " + rule)
   finally:
      f.close()

# Load a default blacklist if none specified.
# Note, if you make your own blacklist you should consider if you need these items.
if whitelist == None and blacklist == None:
   blacklist = []

   # Don't allow xdm to restart, this will logout any logged in users
   blacklist.append( re.compile("^xdm$"        ) ) 

   # Don't allow networks to restart, this could disrupt downloads etc.
   blacklist.append( re.compile(r"^net\..*$"   ) )


# Go through the list of started services
for service in os.listdir(SVCDIR):
   fullpath = os.path.join(SVCDIR, service)

   # First check to see if the entry is actually a link at all.
   servicestat = os.lstat(fullpath)

   if not stat.S_ISLNK(servicestat[stat.ST_MODE]):
      # Just ignore this, it's a red herring
      continue

   # By default, assume we're not going to restart this service, and then find reasons to do so
   do_restart=False

   # This is the start of the description string for the service
   description = "Service: " + service

   # Get the details for the file this link points to
   targetstat = os.stat(fullpath)


   # See if the init.d script has been modified since the link to it was created
   if do_restart==False:
      if targetstat[stat.ST_MTIME] > servicestat[stat.ST_MTIME]:
         do_restart=True
         description=description+" - service script modified"
      else:
         description=description+" - service script not modified"

   # See if the service's config file has been updated since starting the service
   if do_restart==False:
      if service[:4] == "net.":
         # TODO: scan for possible network config files
         description=description+" - network configs not supported yet"
         pass
      else:
         confpath = os.path.join( CONFDIR, service )

         if os.path.exists( confpath ):
            confstat = os.stat(confpath)
         
            if confstat[stat.ST_MTIME] > servicestat[stat.ST_MTIME]:
               do_restart=True
               description=description+" - service config modified"
            else:
               description=description+" - service config not modified"
         else:
            description=description+" - no config found"

   # Todo - other reasons to restart should be added here


   # After this point the checks are to find reasons NOT to restart

   # Find the destination that the link points to
   targetpath = os.path.realpath(fullpath)

   # See if there are outstanding updates for the service script
   if do_restart:
      # We'll use this to look for files which are unapplied updates to the target file
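      # (escape the name and anchor the end so "foo" does not match "._cfg0000_foo.bar")
      filematcher = re.compile(r"\._cfg...._" + re.escape(service) + "$")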

      for item in os.listdir(os.path.dirname(targetpath)):
         if filematcher.match(item) != None:
            # We've found an un-applied update to this file, we don't want to restart this
            # service until that update has been applied.
            do_restart=False
            description=description+" - outstanding service update"
            break

   # See if there are outstanding updates for the service script config
   if do_restart and service[:4] != "net.":
      # We'll use this to look for files which are unapplied updates to the config file
      filematcher = re.compile(r"\._cfg...._" + re.escape(service) + "$")

      # The pending updates live in CONFDIR itself, not in its parent directory
      for item in os.listdir(CONFDIR):
         if filematcher.match(item) != None:
            # We've found an un-applied update to this file, we don't want to restart this
            # service until that update has been applied.
            do_restart=False
            description=description+" - outstanding config update"
            break


   # Apply the blacklist rules
   if do_restart and blacklist != None:
      for rule in blacklist:
         if rule.match(service):
            # This service has been blacklisted
            do_restart=False
            description=description+" - blacklisted"
            break

      if do_restart:
         description=description+" - not blacklisted"


   # Apply the whitelist rules
   if do_restart and whitelist != None:
      do_restart=False
      for rule in whitelist:
         if rule.match(service):
            # This service has been whitelisted
            do_restart=True
            description=description+" - whitelisted"
            break

      if not do_restart:
         description=description+" - not whitelisted"



   # Actually do the restart if needed
   if do_restart:
      # If we get here then the service should be restarted
      description=description+" - restarting"

      # Tell the user what's going on   
      verbose(description)

      if not PRETEND:
         # Actually do the restart
         servicescript = os.readlink(fullpath)

         # Pause the service. This stops the service without stopping services which depend on it
         errorcode = os.spawnl(os.P_WAIT, servicescript, servicescript, "pause")
         if errorcode != 0:
            error(service+" could not be stopped. Stopping resulted in error: " + str(errorcode))

         # Start the service again.
         errorcode = os.spawnl(os.P_WAIT, servicescript, servicescript, "start")
         if errorcode != 0:
            error(service+" could not be started. Starting resulted in error: " + str(errorcode))
      else:
         info(service+" would restart")
   else:
      # Tell the user what's going on     
      verbose(description)

if LOG_FILE != None:
    LOG_FILE.flush()
    LOG_FILE.close()



Comments and suggestions very welcome.


Last edited by Rcomian on Sat Feb 25, 2006 11:36 am; edited 2 times in total
Kooky
n00b


Joined: 10 Sep 2005
Posts: 23
Location: Mannheim

Posted: Fri Feb 10, 2006 7:06 pm

Hi,
first of all, "yes, that is what I'm searching for!" But why do you use a symlink in /var/lib/init.d rather than just checking "/etc/init.d/<service> status" to see if it is running?

Greets Kooky
Rcomian
Apprentice


Joined: 10 Jan 2004
Posts: 174
Location: Uk, Northwest

Posted: Fri Feb 10, 2006 8:52 pm

That's a good question. We need something to key on when deciding whether to restart a service, and one of the simplest things I've found is that if the init script for the service was modified after the service was started, then it needs restarting. If you iterate over the init.d directory the scripts will tell you if they're started or not, but they won't tell you when they were started, so I had to dig around to find that information. Also, different scripts have different capabilities, so I didn't want to rely on them too much. The only place I found the information I was after was kind of en passant: in the creation/modified date of the symlinks created to record which services are considered to be "running". These symlinks are created automatically when a service is started and they're deleted again when the service is stopped or zapped.

Now since we're already in this directory which tells us when the services were started, and it also, by definition says which services are running, there's not really much point in iterating over the init.d directory to speculatively find this information when it's given to us right there.
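
As an illustration of how much that one directory gives you, here is a small Python 3 sketch that lists every started service together with the time its symlink was created. The name started_services is invented for the example, and the default path is the one used in this thread.

```python
import datetime
import os
import stat

def started_services(svcdir="/var/lib/init.d/started"):
    """Yield (service name, start time) for every symlink found in svcdir."""
    for name in sorted(os.listdir(svcdir)):
        info = os.lstat(os.path.join(svcdir, name))   # the link, not its target
        if stat.S_ISLNK(info.st_mode):
            yield name, datetime.datetime.fromtimestamp(info.st_mtime)
```

Anything in the directory that is not a symlink is skipped, just as the script does.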

The location of the directory containing the symlinks is defined in the file /sbin/functions.sh. If you source this file and echo $svcdir you should see it. Eventually I'd like to use this environment variable to find the location in a more accurate way than having it hardcoded. If your svcdir isn't /var/lib/init.d then it may be /var/lib/supervise - so use that directory instead.
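
A possible way to do that lookup from Python 3 is to let a shell source the file and print the variable. This is only a sketch: find_svcdir and its default values are assumptions for the example, and it falls back to the hardcoded path whenever the file is missing or sets nothing.

```python
import os
import subprocess

def find_svcdir(functions_file="/sbin/functions.sh", default="/var/lib/init.d"):
    """Source functions_file in a shell and return $svcdir, or `default`."""
    if not os.path.isfile(functions_file):
        return default
    result = subprocess.run(
        # "$1" is the file to source; any output from sourcing it is discarded.
        ["sh", "-c", '. "$1" >/dev/null 2>&1; printf %s "$svcdir"', "sh", functions_file],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() or default
```

The symlinks themselves would then be in the "started" subdirectory of whatever this returns.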

There's a lot of room for improving the detection for when a script needs restarting. Ultimately we can use genlop to see if a service or any of its dependencies have been installed since the service started, but knowing when a service was started is really key, so I can't see moving away from iterating this directory until there's a better way to find that information for whatever script we're looking at.
psychomunky
Guru


Joined: 02 Nov 2004
Posts: 337
Location: Canada

Posted: Thu Feb 23, 2006 2:04 am

This be damn cool, but I have a couple of questions/suggestions:

    1. What happens if your update tool detects no change in /etc/init.d, but updates the /etc/conf.d file for that script? Do most of the config updaters (etc-update, cfg-update and dispatch-conf being the major ones) actually update the timestamp on the /etc/init.d scripts? I know this is a first stab, and it seems cool, but I am just curious what the plans are. I'd love to help with ideas/brainstorming/testing/etc... heck, I'd even write some code if you really wanted me to (I sold out being a developer to become a DBA, but I still like to do the scripting, etc, but don't know a lick of python).

    2. Any way to get this to write the verbose stuff to either syslog, or its own log file, without having to manually redirect it? I'd like to be able to run it and see the normal output, but have somewhere I can go if I experience a problem with a service that was not restarted.

    3. Is there a way to order the restarts based upon dependencies? From what I can tell, this script just restarts services as it finds them. So if I need to restart, say, apache and the script finds apache first, it will restart it. If the script then finds that I need to restart net.eth0 (bad example, but you'll see my point in a moment), it would restart apache along with it, since apache depends upon net. However, if you could guarantee that the net.eth0 service was restarted first, then apache will have been restarted already and your script won't restart it again. As it stands, a lot of services could get restarted multiple times. This is not that large a problem for most boxen; however, it could be quite inefficient, especially for boxen that run large services like databases, etc.
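
To illustrate point 3, here is one way the ordering could work, sketched in Python 3 (graphlib needs Python 3.9 or later). It assumes, as described above, that restarting a service also restarts everything depending on it. The function name plan_restarts and the dependency map passed to it are made up for the example; nothing here reads the real dependency data.

```python
from graphlib import TopologicalSorter

def plan_restarts(needs_restart, depends_on):
    """Return the services to restart explicitly, dependencies first.

    needs_restart: set of service names flagged for a restart.
    depends_on:    dict mapping a service to the services it depends on.
    """
    names = set(depends_on) | set(needs_restart)
    graph = {name: set(depends_on.get(name, ())) for name in names}
    plan, covered = [], set()
    for service in TopologicalSorter(graph).static_order():
        if any(dep in covered for dep in graph.get(service, ())):
            # A dependency is being restarted, which restarts this one too.
            covered.add(service)
        elif service in needs_restart:
            plan.append(service)
            covered.add(service)
    return plan

print(plan_restarts({"apache2", "net.eth0"}, {"apache2": {"net.eth0"}}))
# ['net.eth0']
```

With this ordering apache2 is never restarted on its own, because the net.eth0 restart already takes it along.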


Nonetheless, this is a much better attempt than my hokey shell script that attempts to restart all the services in a particular run-level. My script also actually suffers from the pitfalls of point 3 above as well. Combining this with GLCU (Gentoo Linux Cron Update), revdep-rebuild and cfg-update (in automatic mode), would be a good start to a self maintained Gentoo box....
Rcomian
Apprentice


Joined: 10 Jan 2004
Posts: 174
Location: Uk, Northwest

Posted: Thu Feb 23, 2006 7:55 pm

Thanks for your comments psychomunky, I'll look at your points in reverse order, if I may.

3.
I'd actually just been heavily bitten by your 3rd point not 5 minutes before reading your post.


I'll have to look into a reliable way of working out the dependencies. The dependencies are cached in a fairly readable format in /var/lib/init.d/deptree, so that might be a starting point. I'll have a think and when I get a good clean answer I'll have a go at implementing it.

I'm also looking at making a default blacklist if no black or white list is defined at all, as having things like xdm restart in the middle of the night without warning could be devastating and I don't think it's necessarily reasonable to expect everyone who uses this script to think of things like that :)


2.
Looking for more reasons to restart a service is definitely something I want to move on. I had considered the /etc/conf.d issue. I didn't do it originally because it wasn't necessarily a clean relationship - the /etc/conf.d/net file indicated that not all services would have config entries with the exact corresponding name. Looking at the scripts involved with dependencies etc, it appears that the net config file is an exception that's dealt with explicitly and that most other services do expect the config file to have the same name as the init script. So, that said, I may well implement this as it wouldn't be difficult.

There's always the additional issue of external configurations (which are normally in /etc) - we'd probably want to restart if they changed as well.

I'd also like to look at dependencies of the packages. If a shared library has a critical security fix (well, let's face it, in practice this would be any fix) I want to restart all services that depend on that library, no matter how far up the tree they appear. This should be fairly simple with judicious use of genlop and equery or their equivalent functionality, although it would make the script much, much slower to run.

One situation I'm very keen on is restarting apache when php is updated. This is awkward because this is a reverse dependency - php depends on apache, not the other way round. Although this would be easy to do as a one-off exception, I'd like to find some generic principle rather than coding in specifics, but so far all I can find is that we need to look for reverse dependencies in addition to looking for normal dependencies. If that's the case then fair enough, it's well defined and easy enough to put a command line switch in to turn this off.
Another possibility for this situation would be the ability to define explicitly linked packages (ie, when we check apache, also check updates to this list of packages). The problem I have with this is setting up sensible & expected default behaviour, and in the end it may well only be reproducing a subset of package dependencies anyway, without the benefit of having those dependencies updated by the people who really know what they are.

1.
As far as syslog logging goes, I'm all for it, although I've never even looked at how it would be done. Logging output to a standard output file would be very simple as well - just a new commandline switch and probably a wrapper to replace the "print" statements in the script. If you want to get your hands dirty with script - that would be an excellent place to start ;)
One thing to bear in mind tho, is that actual failures whilst restarting services wouldn't be logged by this mechanism. The script would only log the outputs from the print statements. Obviously the reason I didn't have a log file in the first cut was because it can be manually redirected :)
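
For what it's worth, Python's standard logging module can cover the console, a log file and syslog at once. A rough sketch in Python 3 follows; make_logger and its parameters are invented for the example, and the syslog address (commonly "/dev/log" on Linux) is an assumption you would need to check on your own box.

```python
import logging
import logging.handlers
import sys

def make_logger(logfile=None, syslog_address=None):
    """Normal output on stdout; everything, including debug, to file/syslog."""
    log = logging.getLogger("restart-services")
    log.setLevel(logging.DEBUG)
    for handler in list(log.handlers):      # start clean if called twice
        log.removeHandler(handler)
        handler.close()

    console = logging.StreamHandler(sys.stdout)
    console.setLevel(logging.INFO)          # hide the verbose chatter on screen
    log.addHandler(console)

    if logfile:
        to_file = logging.FileHandler(logfile)
        to_file.setFormatter(
            logging.Formatter("%(asctime)s - [%(levelname)s] - %(message)s"))
        log.addHandler(to_file)

    if syslog_address:
        log.addHandler(logging.handlers.SysLogHandler(address=syslog_address))
    return log
```

The script would then call log.debug() where it is verbose and log.info() for normal output. As noted above, this still only records the script's own messages, not whatever the init scripts print while restarting.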


Now as far as GLCU & a self-maintaining Gentoo goes, this is what I've been working towards (very slowly) over the last couple of months. I'd not even heard of GLCU before; I'm very happy to see that I'm not alone in thinking this is possible! (If not, perhaps, practical; it's just to cover for my own laziness.)

I'll have a look at GLCU and see if the approach it takes matches what I'm trying to do, it should all fit together quite nicely :)

I'm also kinda keen to have some form of auto retry when a build fails, for example turning down the CFLAGS to see if that helps (or having package specific CFLAGS), but this is another project.
psychomunky
Guru


Joined: 02 Nov 2004
Posts: 337
Location: Canada

Posted: Fri Feb 24, 2006 2:53 am

Rcomian,

All very excellent points you make. It sounds like I am on the same, or at least a very similar page as you.

It is funny that you had been bitten by that, as the reason I mentioned it was, I was also bitten about a half hour before with a cheesy bash script I wrote to restart services.

Anyways, regarding point 2, I hadn't considered the /etc/init.d/net.* vs /etc/conf.d/net issue. I wonder if it is a standard naming thing that if an init script contains a . in its name, its config file is named only up to the dot? By this logic, if you see that /etc/conf.d/service has changed, you could check whether there is an /etc/init.d/service script that can be restarted, and if no /etc/init.d/service script is found, then look for scripts matching the pattern /etc/init.d/service.* that may need to be restarted. The net scripts are the only ones I have seen with this sort of "odd" behaviour as well, but perhaps it was a forward-thinking thing on Gentoo's part?
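
If that guess about the naming holds (and it is only a guess), the lookup could be sketched like this in Python 3. Both function names are made up for the illustration:

```python
import os

def conf_name_for(service):
    """'net.eth0' -> 'net'; names without a dot come back unchanged."""
    return service.split(".", 1)[0]

def conf_path_for(service, confdir="/etc/conf.d"):
    """Prefer an exact match, then fall back to the name before the first dot."""
    for name in (service, conf_name_for(service)):
        path = os.path.join(confdir, name)
        if os.path.exists(path):
            return path
    return None
```

So conf_path_for("net.eth0") would find /etc/conf.d/net when there is no /etc/conf.d/net.eth0, and return None for a service with no config file at all.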

As for GLCU, you can find it at http://glcu.sf.net. The author was busy doing school stuff and hasn't updated for a while, but it almost looks like there is activity on it again. I have already tried your script in conjunction with GLCU by doing the following:

- placed your script in /usr/local/sbin (which is part of the path that GLCU can use) and called it restart-services.py
- set the "updatetc" option in the glcu config file to be:

Code:
revdep-rebuild && dispatch-conf && /usr/local/sbin/restart-services.py


I believe there are ways for GLCU to be fully automated, but I haven't personally tried them.

As for point 1, I have been an eager beaver and already added support for a separate log file. The modified script is this:
Code:

#!/usr/bin/python

# Copyright 2006 Jim Tupper
# Licenced under the GPL, any version
# Modified by Chris Roe, February 2006 to include the following:
#       - proper command line help (via a new -h option)
#       - support for writing to a verbose log file (via the -l option)

import os, stat, re, getopt, sys, datetime

##########################################################################################
##  Outputs the usage message of this script.
##
##  Parameters:
##      err  - Optional.  If specified, this error will be printed before
##             the usage message.
##
##########################################################################################
def print_usage (err = ""):
   if not (err == "" or err.isspace()):
       error( "Error: " + err )
       error()
   
   error( "Usage: restart-services.py [-p] [-v] [-d <serviceDir>]" )
   error( "             [-b <blacklist>] [-w <whitelist>] [-l logfile]" )
   error()
   error( "Where:" )
   error( "    -h    Prints this help message." )
   error()
   error( "    -p    Runs the script in pretend mode.  Using this option" )
   error( "          only outputs what would happen.  It does not actually" )
   error( "          (re)start or stop any services." )
   error()
   error( "    -v    Runs the script in verbose mode.  This will output" )
   error( "          extra information regarding exactly what the script" )
   error( "          is doing." )
   error()
   error( "    -d <serviceDir>" )
   error( "          Directs the script to use the directory indicated by" )
   error( "          <serviceDir> as its \"magic\" directory.  This directory" )
   error( "          holds one symlink per started service; a service is" )
   error( "          restarted if its init script has been modified" )
   error( "          after the time on its symlink." )
   error()
   error( "          The init system manages these symlinks; the script only" )
   error( "          reads them." )
   error()
   error( "          The default location for this directory is /var/lib/init.d/started" )
   error()
   error( "    -b <blacklist>" )
   error( "          Indicates to the script which file contains a list of" )
   error( "          services that should not be restarted.  This file should" )
   error( "          list each service that is on the blacklist on a separate" )
   error( "          line.  The default location that is used if this option" )
   error( "          is not specified is /etc/service_restart_blacklist" )
   error()
   error( "    -w <whitelist>" )
   error( "          Indicates to the script which services are to be restarted." )
   error( "          Only these services will be restarted. By default, the" )
   error( "          script looks for /etc/service_restart_whitelist as the " )
   error( "          whitelist file." )
   error()
   error( "    -l <logfile>" )
   error( "          If specified, will write verbose output to the specified file." )
   error( "          This option is handy, if it is difficult to redirect the output" )
   error( "          of this script in your situation, or if you want the normal" )
   error( "          amount of output sent to stdout, but want a verbose log to fall" )
   error( "          back on in case of problems." )
   error()
   error( "Return Codes:" )
   error( "    0     Normal, successful completion." )
   error()
   error( "    1     Unable to open log file for writing." )
   error()
   error( "    2     An invalid option was passed to the script." )
   error()

##########################################################################################
##  Outputs messages that are meant to be displayed in verbose mode.
##  If the VERBOSE global is set to True, the message is output to standard out.
##  If the LOG_FILE global is set to a File, the message is written to that file.
##
##  Parameters:
##      msg  - Optional. The string containing the message to be output.  If omitted,
##             a blank line is output.
##
##########################################################################################
def verbose(msg=""):
    if VERBOSE:
        sys.stdout.write(msg + "\n")

    if LOG_FILE is not None:
        LOG_FILE.write(str(datetime.datetime.now()) + " - [DEBUG] - " + msg + "\n")

##########################################################################################
##  Outputs messages that are meant to be displayed on standard out.
##  If the LOG_FILE global is set to a File, the message is written to that
##  file as well.
##
##  Parameters:
##      msg  - Optional. The string containing the message to be output.  If omitted,
##             a blank line is output.
##
##########################################################################################
def info(msg=""):
    sys.stdout.write(msg + "\n")

    if LOG_FILE is not None:
        LOG_FILE.write(str(datetime.datetime.now()) + " - [INFO ] - " + msg + "\n")

##########################################################################################
##  Outputs messages that are meant to be displayed on standard err.
##  If the LOG_FILE global is set to a File, the message is written to that
##  file as well.
##
##  Parameters:
##      msg  - Optional. The string containing the message to be output.  If omitted,
##             a blank line is output.
##
##########################################################################################
def error(msg=""):
    sys.stderr.write(msg + "\n")

    if LOG_FILE is not None:
        LOG_FILE.write(str(datetime.datetime.now()) + " - [ERROR] - " + msg + "\n")

##########################################################################################
##  Outputs messages that are meant to be displayed on standard err as warnings to the user.
##  If the LOG_FILE global is set to a File, the message is written to that
##  file as well.
##
##  Parameters:
##      msg  - Optional. The string containing the message to be output.  If omitted,
##             a blank line is output.
##
##########################################################################################
def warn(msg=""):
    sys.stderr.write(msg + "\n")

    if LOG_FILE is not None:
        LOG_FILE.write(str(datetime.datetime.now()) + " - [WARN ] - " + msg + "\n")

##########################################################################################
##  MAINLINE Script.
##########################################################################################
PRETEND=False
VERBOSE=False
SVCDIR="/var/lib/init.d/started"
SERVICE_BLACKLIST="/etc/service_restart_blacklist"
SERVICE_WHITELIST="/etc/service_restart_whitelist"
LOG_FILE=None

try:
   opts, args = getopt.getopt(sys.argv[1:], "hpvd:b:w:l:")
except getopt.GetoptError, (errno, strerror):
   # print help information and exit:
   print_usage("Unknown Option: " + strerror)
   sys.exit(2)

for o, a in opts:
   if o=="-h":
      print_usage()
      sys.exit(0)
   if o=="-p":
      PRETEND=True
   if o=="-v":
      VERBOSE=True
   if o=="-d":
      SVCDIR=a
   if o=="-b":
      SERVICE_BLACKLIST=a
   if o=="-w":
      SERVICE_WHITELIST=a
   if o=="-l":
      try:
          # Open the file first so the message below is only shown on success
          LOG_FILE= open(a, 'w+')
          info("Writing log to " + a)
      except IOError:
          error("Unable to open '" + a + "' for writing!!")
          sys.exit (1)

# Load the blacklist
blacklist = None
if os.path.exists(SERVICE_BLACKLIST):
   blacklist = []
   f = open(SERVICE_BLACKLIST, 'r')
   try:
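      for rule in f:
         # Strip the line ending rather than blindly dropping the last character,
         # so a final line without a trailing newline is not truncated.
         rule = rule.rstrip("\r\n")
         if not rule:
            continue
         try:
            blacklist.append( re.compile("^"+rule+"$") )
            verbose("Loaded blacklist rule: " + rule)
         except re.error:
            error("Failed to load blacklist rule: " + rule)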
   finally:
      f.close()

# Load the whitelist
whitelist = None
if os.path.exists(SERVICE_WHITELIST):
   whitelist = []
   f = open(SERVICE_WHITELIST, 'r')
   try:
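      for rule in f:
         # Strip the line ending rather than blindly dropping the last character,
         # so a final line without a trailing newline is not truncated.
         rule = rule.rstrip("\r\n")
         if not rule:
            continue
         try:
            whitelist.append( re.compile("^"+rule+"$") )
            verbose("Loaded whitelist rule: " + rule)
         except re.error:
            error("Failed to load whitelist rule: " + rule)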
   finally:
      f.close()

# Go through the list of started services
for service in os.listdir(SVCDIR):
   fullpath = os.path.join(SVCDIR, service)

   # First check to see if the entry is actually a link at all.
   servicestat = os.lstat(fullpath)

   if not stat.S_ISLNK(servicestat[stat.ST_MODE]):
      # Not a symlink, so not a started service: ignore it
      continue

   # By default, assume we're going to restart this service, and then find reasons not to
   do_restart=True

   # This is the start of the description string for the service
   description = "Service: " + service

   # Get the details for the file this link points to
   targetstat = os.stat(fullpath)

   # see if the init.d script is older than the link to it
   if targetstat[stat.ST_MTIME] > servicestat[stat.ST_MTIME]:
      description=description+" - service script modified"
   else:
      do_restart=False
      description=description+" - restart not required"

      # Todo - other reasons to restart should be added here


   # Find the destination that the link points to
   targetpath = os.path.realpath(fullpath)

   # See if there are outstanding updates for the service script
   if do_restart:
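      # We'll use this to look for files which are unapplied updates to the target file.
      # The service name is escaped and the pattern is anchored at the end so that an
      # update to one script is not mistaken for an update to a similarly named one.
      filematcher = re.compile(r"\._cfg...._" + re.escape(service) + "$")

      for item in os.listdir(os.path.dirname(targetpath)):
         if filematcher.match(item) is not None: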
            # We've found an un-applied update to this file, we don't want to restart this
            # service until that update has been applied.
            do_restart=False
            description=description+" - outstanding service update"
            break


   # Apply the blacklist rules
   if do_restart and blacklist != None:
      for rule in blacklist:
         if rule.match(service):
            # This service has been blacklisted
            do_restart=False
            description=description+" - blacklisted"
            break

      if do_restart:
         description=description+" - not blacklisted"


   # Apply the whitelist rules
   if do_restart and whitelist != None:
      do_restart=False
      for rule in whitelist:
         if rule.match(service):
            # This service has been whitelisted
            do_restart=True
            description=description+" - whitelisted"
            break

      if not do_restart:
         description=description+" - not whitelisted"



   # Actually do the restart if needed
   if do_restart:
      # If we get here then the service should be restarted
      description=description+" - restarting"

      # Tell the user what's going on   
      verbose(description)

      if not PRETEND:
         # Actually do the restart
         servicescript = os.readlink(fullpath)
         os.spawnl(os.P_WAIT, servicescript, servicescript, "restart")
      else:
         info(service+" would restart")
   else:
      # Tell the user what's going on     
      verbose(description)

if LOG_FILE != None:
    LOG_FILE.flush()
    LOG_FILE.close()



Or if you prefer, the contents of the patch file for your original version of the script:

Code:

diff -Naur v1.0/restart-services.py v1.1/restart-services.py
--- v1.0/restart-services.py   2006-02-23 21:53:45.000000000 -0700
+++ v1.1/restart-services.py   2006-02-23 21:54:34.000000000 -0700
@@ -2,23 +2,166 @@
 
 # Copyright 2006 Jim Tupper
 # Licenced under the GPL, any version
-
-import os, stat, re, getopt, sys
-
-try:
-   opts, args = getopt.getopt(sys.argv[1:], "pvd:b:w:")
-except getopt.GetoptError, (errno, strerror):
-   # print help information and exit:
-   print "Unknown option: " + strerror
-   sys.exit(2)
-
+# Modified by Chris Roe, February 2006 to include the following:
+#       - proper command line help (via a new -h option)
+#       - support for writing to a verbose log file (via the -l option)
+
+import os, stat, re, getopt, sys, datetime
+
+##########################################################################################
+##  Outputs the usage message of this script.
+##
+##  Parameters:
+##      err  - Optional.  If specified, this error will be printed before
+##             the usage message.
+##
+##########################################################################################
+def print_usage (err = ""):
+   if not (err == "" or err.isspace()):
+       error( "Error: " + err )
+       error()
+   
+   error( "Usage: restart-services.py [-p] [-v] [-d <serviceDir>]" )
+   error( "             [-b <blacklist>] [-w <whitelist>] [-l logfile]" )
+   error()
+   error( "Where:" )
+   error( "    -h    Prints this help message." )
+   error()
+   error( "    -p    Runs the script in pretend mode.  Using this option" )
+   error( "          only outputs what would happen.  It does not actually" )
+   error( "          (re)start or stop any services." )
+   error()
+   error( "    -v    Runs the script in verbose mode.  This will output" )
+   error( "          extra information regarding exactly what the script" )
+   error( "          is doing." )
+   error()
+   error( "    -d <serviceDir>" )
+   error( "          Directs the script to use the file indicated by " )
+   error( "          <serviceDir> to be its \"magic\" file.  This file" )
+   error( "          is used to determine if a service needs to be " )
+   error( "          restarted if the service's init script has been" )
+   error( "          modified after the time on this file." )
+   error()
+   error( "          The script wholly manages this file on its own, so it" )
+   error( "          should be a file that you do not care about. This" )
+   error( "          file is created if need be." )
+   error()
+   error( "          The default location for this file is /var/lib/init.d/started" )
+   error()
+   error( "    -b <blacklist>" )
+   error( "          Indicates to the script which file contains a list of" )
+   error( "          services that should not be restarted.  This file should" )
+   error( "          list each service that is on the blacklist on a separate" )
+   error( "          line.  The default location that is used if this option" )
+   error( "          is not specified is /etc/service_restart_blacklist" )
+   error()
+   error( "    -w <whitelist>" )
+   error( "          Indicates to the script which services are to be restarted." )
+   error( "          Only these services will be restarted. By default, the" )
+   error( "          script looks for /etc/service_restart_whitelist as the " )
+   error( "          whitelist file." )
+   error()
+   error( "    -l <logfile>" )
+   error( "          If specified, will write verbose output to the specified file." )
+   error( "          This option is handy, if it is difficult to redirect the output" )
+   error( "          of this script in your situation, or if you want the normal" )
+   error( "          amount of output sent to stdout, but want a verbose log to fall" )
+   error( "          back on in case of problems." )
+   error()
+   error( "Return Codes:" )
+   error( "    0     Normal, successful completion." )
+   error()
+   error( "    1     Unable to open log file for writing." )
+   error()
+   error( "    2     An invalid option was passed to the script." )
+   error()
+
+##########################################################################################
+##  Outputs messages that are meant to be displayed in verbose mode.
+##  If the VERBOSE global is set to True, the message is output to standard out.
+##  If the LOG_FILE global is set to a File, the message is written to that file.
+##
+##  Parameters:
+##      msg  - Optional. The string containing the message to be output.  If omitted,
+##             a blank line is output.
+##
+##########################################################################################
+def verbose(msg=""):
+    if VERBOSE:
+        sys.stdout.write(msg + "\n")
+
+    if (LOG_FILE != None):
+        LOG_FILE.write(str(datetime.datetime.now()) + " - [DEBUG] - " + msg + "\n")
+
+##########################################################################################
+##  Outputs messages that are meant to be displayed on standard out.
+##  If the LOG_FILE global is set to a File, the message is written to that
+##  file as well.
+##
+##  Parameters:
+##      msg  - Optional. The string containing the message to be output.  If omitted,
+##             a blank line is output.
+##
+##########################################################################################
+def info(msg=""):
+    sys.stdout.write(msg + "\n")
+
+    if not (LOG_FILE == None):
+        LOG_FILE.write(str(datetime.datetime.now()) + " - [INFO ] - " + msg + "\n")
+
+##########################################################################################
+##  Outputs messages that are meant to be displayed on standard err.
+##  If the LOG_FILE global is set to a File, the message is written to that
+##  file as well.
+##
+##  Parameters:
+##      msg  - Optional. The string containing the message to be output.  If omitted,
+##             a blank line is output.
+##
+##########################################################################################
+def error(msg=""):
+    sys.stderr.write(msg + "\n")
+
+    if not (LOG_FILE == None):
+        LOG_FILE.write(str(datetime.datetime.now()) + " - [ERROR] - " + msg + "\n")
+
+##########################################################################################
+##  Outputs messages that are meant to be displayed on standard err as warnings to the user
+##  If the LOG_FILE global is set to a File, the message is written to that
+##  file as well.
+##
+##  Parameters:
+##      msg  - Optional. The string containing the message to be output.  If omitted,
+##             a blank line is output.
+##
+##########################################################################################
+def warn(msg=""):
+    sys.stderr.write(msg + "\n")
+
+    if not (LOG_FILE == None):
+       LOG_FILE.write(str(datetime.datetime.now()) + " - [WARN ] - " + msg + "\n")
+
+##########################################################################################
+##  MAINLINE Script.
+##########################################################################################
 PRETEND=False
 VERBOSE=False
 SVCDIR="/var/lib/init.d/started"
 SERVICE_BLACKLIST="/etc/service_restart_blacklist"
 SERVICE_WHITELIST="/etc/service_restart_whitelist"
+LOG_FILE=None
+
+try:
+   opts, args = getopt.getopt(sys.argv[1:], "hpvd:b:w:l:")
+except getopt.GetoptError, (errno, strerror):
+   # print help information and exit:
+   print_usage("Unknown Option: " + strerror)
+   sys.exit(2)
 
 for o, a in opts:
+   if o=="-h":
+      print_usage()
+      sys.exit(0)
    if o=="-p":
       PRETEND=True
    if o=="-v":
@@ -29,6 +172,13 @@
       SERVICE_BLACKLIST=a
    if o=="-w":
       SERVICE_WHITELIST=a
+   if o=="-l":
+      try:
+          info("Writing log to " + a)
+          LOG_FILE= file(a, 'w+')
+      except:
+          error("Unable to open '" + a + "' for writing!!")
+          sys.exit (1)
 
 # Load the blacklist
 blacklist = None
@@ -39,10 +189,9 @@
       for rule in f:
          try:
             blacklist.append( re.compile("^"+rule[:-1]+"$") )
-            if VERBOSE:
-               print "Loaded blacklist rule: " + rule
+            verbose("Loaded blacklist rule: " + rule)
          except:
-            print "Failed to load blacklist rule: " + rule
+            error("Failed to load blacklist rule: " + rule)
    finally:
       f.close()
 
@@ -55,10 +204,9 @@
       for rule in f:
          try:
             whitelist.append( re.compile("^"+rule[:-1]+"$") )
-            if VERBOSE:
-               print "Loaded whitelist rule: " + rule
+            verbose("Loaded whitelist rule: " + rule)
          except:
-            print "Failed to load whitelist rule: " + rule
+            error("Failed to load whitelist rule: " + rule)
    finally:
       f.close()
 
@@ -143,16 +291,18 @@
       description=description+" - restarting"
 
       # Tell the user what's going on   
-      if VERBOSE:
-         print description
+      verbose(description)
 
       if not PRETEND:
          # Actually do the restart
          servicescript = os.readlink(fullpath)
          os.spawnl(os.P_WAIT, servicescript, servicescript, "restart")
       else:
-         print service+" would restart"
+         info(service+" would restart")
    else:
       # Tell the user what's going on     
-      if VERBOSE:
-         print description
\ No newline at end of file
+      verbose(description)
+
+if LOG_FILE != None:
+    LOG_FILE.flush()
+    LOG_FILE.close()



If you look closely, I took a few extra liberties in this code; feel free to remove them. They are:
- I added a proper help message upon receiving an invalid option, or upon invocation of the -h option.
- Anything that was printed in an except clause now goes to stderr instead of stdout, as per general unix "best practice". For now I have treated them all as errors, but some could probably be considered warnings instead.
- Anything else that was printed goes to standard out.
- Each log file entry records the level of the output and the date and time, in addition to the actual message from the code.
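
For anyone who would rather not hand-roll the four output helpers, roughly the same behaviour can be had from Python's standard logging module. This is only a sketch and not part of the posted script: the names build_logger and MaxLevelFilter are made up for illustration, and it is written for current Python rather than the version the script targets. Normal and verbose output goes to stdout, warnings and errors go to stderr, and the optional log file gets everything with a timestamp and level.

```python
import logging
import sys


class MaxLevelFilter(logging.Filter):
    """Let through only records below a given level (hypothetical helper)."""

    def __init__(self, max_level):
        super().__init__()
        self.max_level = max_level

    def filter(self, record):
        return record.levelno < self.max_level


def build_logger(verbose=False, log_path=None, name="restart-services"):
    """Build a logger that mimics verbose()/info()/warn()/error()."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers = []        # start clean if called more than once
    logger.propagate = False

    # stdout: DEBUG only in verbose mode, never WARNING or above
    out = logging.StreamHandler(sys.stdout)
    out.setLevel(logging.DEBUG if verbose else logging.INFO)
    out.addFilter(MaxLevelFilter(logging.WARNING))
    out.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(out)

    # stderr: warnings and errors
    err = logging.StreamHandler(sys.stderr)
    err.setLevel(logging.WARNING)
    err.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(err)

    # log file: everything, with date/time and level, whatever -v says
    if log_path is not None:
        fh = logging.FileHandler(log_path, mode="w")
        fh.setLevel(logging.DEBUG)
        fh.setFormatter(
            logging.Formatter("%(asctime)s - [%(levelname)-5s] - %(message)s"))
        logger.addHandler(fh)

    return logger
```

With this, verbose(description) would become log.debug(description) and error(...) would become log.error(...).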
Rcomian
Posted: Sat Feb 25, 2006 11:34 am

Fantastic patch, psychomunky - thanks for that.

I've integrated your changes (all good stuff) and made a version 1.2.
This script includes a first attempt to handle the dependency stuff. At first I expected to be reading the dependency cache in /var/lib/init.d, then I thought I'd at least be parsing the output of "/etc/init.d/<service> needsme" and doing some cool stuff with it.
Then I realised that if you call "pause" and then "start", you restart the service without restarting any dependants. I'm a little disappointed that the solution could be so simple, but you can't have everything.

Now I'm not sure if the dependants SHOULD be restarted in this sort of case or not. I suspect they might, but since this way is so simple I'm going to give it a try before doing anything more complex.
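
The pause-then-start idea can be sketched as below. This is an illustration, not the code in version 1.2: the function name restart_without_dependants and the injectable run parameter are made up, and the only thing it relies on is that an init script accepts "pause" and "start" as arguments. Giving up when the pause fails is a design guess on my part.

```python
import subprocess


def restart_without_dependants(script, run=subprocess.call):
    """Pause a service and start it again, leaving its dependants alone.

    `script` is the path to the init script, e.g. /etc/init.d/boa.
    `run` takes an argument list and returns an exit status; it can be
    swapped out so the sequence can be tried without touching real services.
    Returns True only if both steps exit with status 0.
    """
    for action in ("pause", "start"):
        status = run([script, action])
        if status != 0:
            # Do not try to start a service that could not be paused.
            return False
    return True
```

Passing a fake `run` that just records its arguments is an easy way to check the sequence in pretend mode.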

As a test to see the sort of effect that might be happening here I configured boa to listen exclusively on my external interface. I then paused the network and noticed that boa couldn't be reached, then started the network again, and boa came back again, all by itself.

Now by default I've stopped restarting the network interfaces (I just feel that's got too much potential for danger, or at least annoyance), but I'm taking the results above as an indication that pausing and starting the services may be enough.

Obviously, an option to restart dependants as well, with full tracking for this, should be included. But in the interests of "release early, release often", I'll put this out and test it as it is, just to understand what's going on myself and to see if there are any more pressing issues than that.

One issue I have noticed is that restarting services after an update often just doesn't work. I'm not talking about this script; it doesn't work even if you do it manually. I suspect the reason is that a service is started with one version of the init script, and then a different version of the init script tries to stop it. If there have been any changes in how the start/stop happens (and after all, why else would an init script change?) then it looks like the new script can't find the old service to stop, and fails. The service is left running and has to be manually killed, zapped and started again.

I suspect that the solution may be that if an unapplied update to an init script is found, the service should be stopped (BEFORE the update is applied), then the updates applied and the service started again. This means we get a wrapper around the config update scripts (etc-update, dispatch-conf or cfg-update), where we stop the affected services early, try to apply the updates, then make sure the services are started again after the update.
This is a fairly fundamental change to the script and I'm wondering if it would be better implemented as part of the config update scripts.
It could be done in this sort of way:
Code:
restart-services --stop-only
cfg-update -ua
restart-services
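
The first step of that sequence needs to know which running services have an unapplied init script update waiting. A minimal sketch of that lookup follows. The --stop-only option does not exist yet and the function name services_with_pending_updates is hypothetical. It assumes pending updates sit next to the init script as ._cfgNNNN_<service>, which is the same naming the script above already looks for.

```python
import os
import re

# Pending config updates are named ._cfg0000_<name>, ._cfg0001_<name>, ...
PENDING = re.compile(r"^\._cfg\d{4}_(?P<name>.+)$")


def services_with_pending_updates(initdir, running):
    """Return the running services that have an unapplied init script update.

    `initdir` is the directory holding the init scripts (normally /etc/init.d)
    and `running` is an iterable of the service names currently started.
    """
    pending = set()
    for entry in os.listdir(initdir):
        match = PENDING.match(entry)
        if match:
            pending.add(match.group("name"))
    # Only services that are actually running need to be stopped early.
    return sorted(pending.intersection(running))
```

The names it returns are the ones a stop-only pass would have to stop before cfg-update runs.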


I'm still thinking about the implications of this. It might turn out that we really need to stop the service just before portage installs the new files, and start the service again after the configs have been updated, but this sounds like it needs quite tight integration with portage.

Anyway, I hope this version of the script is at least slightly useful. Any ideas are very welcome.
psychomunky
Posted: Tue Feb 28, 2006 6:21 am

Glad you liked the patch....

As you could obviously tell, I haven't done a whack load of python scripting. I am assuming that moving the function headers I had into triple quotes is something similar to javadoc comments in Java (if, of course, you are familiar with Java). Is this correct??

Anyways, I see that you've discovered some more ways for me to try to melt my brain. All interesting points... But I have a couple questions before I'll attempt to help:

1. This business of pause/start: does it work for all services? I have looked through a lot of init.d scripts and, off the top of my head, I cannot recall one that actually defines a pause() method. Also, are there going to be some services that do a "true" pause, remaining in memory and simply continuing where they left off when a start is invoked? I definitely agree that this is worth trying first to see if it works....the simpler the better IMHO.

2. Restarting the network interfaces..definitely could be trouble there. Unless you are using 3rd party drivers, or really need a new feature in the new baselayout, I cannot think of a really good reason to need to restart the network. Although, I must say, I have restarted the network on a remote box via SSH and was not dropped at all....I think it is a situation similar to your boa experience. (Stopping on the other hand will completely hoop your session :) I agree with omitting them for now...perhaps the best long term approach, should this ever get into portage, would be to have a default blacklist that includes xdm and net.

3. This issue of new init scripts not being able to stop the old ones is also something I have found upon occasion. I have never had it with any of the baselayout scripts (net, localmount, etc), but I have experienced it a few times with things like apache. It doesn't happen often, but it does happen. I think that this might be a bug or a lack of a feature in gentoo's rc system. I know that the start-stop-daemon executable is used to start and stop most services, and that it and the rc system track what is running vs what is not via pid files. Perhaps the rc system should be enhanced to track the pid files along with the service names in order to be able to tell if there is a process still running for that particular service and if the stop() method defined in the init script fails, it should forcefully terminate the process if it exists. This behaviour could be controlled via the /etc/conf.d/rc file.

Your solution on this would also work. But both of our solutions involve modifying other utilities. If there was some way to determine the PID file that is being used by the rc system, then your script could do the manual kill and zap thing before restarting the service.
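
If the PID file for a service were known, checking whether the old process survived a failed stop would be straightforward. The sketch below assumes a PID file holding a single process ID. The function name pid_file_is_alive is hypothetical, and how to find the PID file for a given service is exactly the open question above. It uses signal 0, which tests for a process without disturbing it.

```python
import errno
import os


def pid_file_is_alive(pidfile):
    """Return True if the process recorded in `pidfile` still exists."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (IOError, OSError, ValueError):
        # Missing, unreadable or malformed PID file: treat as not running.
        return False

    if pid <= 0:
        # 0 and negative values address process groups, not one process.
        return False

    try:
        os.kill(pid, 0)     # signal 0 only checks that the process exists
    except OSError as e:
        # EPERM means the process exists but belongs to another user.
        return e.errno == errno.EPERM
    return True
```

A True result after a failed stop would be the cue for the kill and zap step, or at least for a warning.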

Although, perhaps we are trying to solve world hunger or bring about world peace here. Maybe the simplest and best solution is to just e-mail the sysadmin if there is a situation like this, or if a service errors and cannot be restarted. Realistically, we cannot expect things to go smoothly all of the time, but if the script can notify a sysadmin that something needs to be looked at, then at least we don't have to worry about the box until something goes wrong. This is the approach I have seen a few tools take, like GLCU and a lot of my DBA monitoring tools: they handle updates and maintenance on their own 80-90% of the time, and when they hit the other 10-20%, they detect it and let the admin know.
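
A notification along those lines could be put together with the standard library. This is only a sketch: the function names, the addresses passed in and the idea of handing the mail to an SMTP server on localhost are all assumptions, not something the script does today.

```python
import smtplib
from email.message import EmailMessage


def build_failure_report(failed, sender, recipient, hostname="localhost"):
    """Build a mail listing the services that could not be restarted.

    `failed` maps a service name to a short description of what went wrong.
    """
    msg = EmailMessage()
    msg["Subject"] = "%s: %d service(s) need attention" % (hostname, len(failed))
    msg["From"] = sender
    msg["To"] = recipient

    lines = ["The following services could not be restarted:", ""]
    for service in sorted(failed):
        lines.append("  %s: %s" % (service, failed[service]))
    msg.set_content("\n".join(lines))
    return msg


def send_report(msg, smtp_host="localhost"):
    """Hand the message to a mail server (assumed to be listening locally)."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```

Keeping the message building separate from the sending means pretend mode could print the report instead of mailing it.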

I will install the new version of the script tomorrow and take it for a test drive, but for now, I must get some sleep.
Rcomian
Posted: Fri Mar 03, 2006 1:35 pm

Pause/start should work for all services. According to the documentation:
Quote:
If you want to stop a service, but not the services that depend on it, you can use the pause argument:

this comes from http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=2&chap=4

As for whether that works for all services, my only answer can be: it should. I get the impression that the service scripts are more complex than they let on. The basic "stop" function defined in the script only ever stops the actual service; it's the runinitscripts shell script that actually finds deps and does interesting things with them. So I think "pause" as an argument actually gets translated into a call to "stop", although I'll have to confirm that.

Also, by the same token, if pause really is translated into a "stop" command, then there won't be any services which actually do an in memory pause.

As for pid files, if there was a generic way of finding these then it would be easy to get service watchdogs & restarting crashed services going properly as well.

I agree that detecting errors whilst stopping a service is going to be the most useful situation. I think the next patch will be to detect if the "pause" and "start" actually worked. Then I guess it's a matter of adding in the emailing facility.
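
Detecting whether each step worked comes down to looking at the exit status of the init script. The sketch below is one possible shape for that patch, with made-up names (run_step, collect_failures). It assumes the init script exits non-zero on failure, and it keeps the output of every failed command so it can be logged or mailed later.

```python
import subprocess


def run_step(argv):
    """Run one command and return (ok, combined stdout/stderr text)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    return proc.returncode == 0, output.decode("utf-8", "replace")


def collect_failures(commands, step=run_step):
    """Run each command in order; return {command line: output} for failures."""
    failures = {}
    for argv in commands:
        ok, output = step(argv)
        if not ok:
            failures[" ".join(argv)] = output.strip()
    return failures
```

For a service this would be called with the "pause" and "start" command lines; an empty result means both steps reported success.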
Rcomian
Posted: Sat Jun 17, 2006 9:27 pm

Ok, I've now got a new script using portage's new hook capabilities.
https://forums.gentoo.org/viewtopic-p-3388303.html#3388303