[Users] OpenVZ, CentOS SCL and failed cron jobs?

Jeffrey Walton noloader at gmail.com
Fri Dec 21 13:03:56 MSK 2018


On Fri, Dec 21, 2018 at 4:18 AM Konstantin Khorenko
<khorenko at virtuozzo.com> wrote:
>
> On 12/20/2018 07:39 PM, Jeffrey Walton wrote:
> >
> > I'm performing a post-mortem on our [failed] disaster recovery procedures.
> >
> > We have an OpenVZ-based CentOS 7 VM. We use it for an open source
> > project website and wiki. Our backup job in /etc/cron.daily has not
> > been executing (nor have other cron jobs, like yum-daily.cron). We
> > cannot find mention of the failures in dmesg or other logs in
> > /var/log.
> >
> > It looks like things broke sometime around December 2017 based on the
> > date of our last backup. (It is embarrassing, but like I said there
> > were no logged failures so I did not know to investigate). I don't
> > keep change control logs, but the best I can tell our last two major
> > configuration changes were:
> >
> > * Migrate OpenVZ 7.1 -> 7.2, June 2016
> > * Enable CentOS SCL, December 2017
> ...
> unfortunately I have not heard of issues related to OpenVZ + SCL,
> so it seems it falls to you to investigate.
>
> I'd start by checking whether the cron service is running at all,

Thanks Konstantin.

I tracked it down to a daily cron job. The backup ran for 7 seconds but
did not log its error:

    Dec  19 08:04:43 ftpit run-parts(/etc/cron.daily)[14217]: starting gdrive-backup
    Dec  19 08:04:50 ftpit run-parts(/etc/cron.daily)[17162]: finished gdrive-backup

Our site (https://www.cryptopp.com) is static, so a 7-second run seemed
reasonable to me for an incremental backup.

In reality this is what was happening (from the command line):

    # duplicity --allow-source-mismatch ... sftp://XXXX:YYYY@zonk.example.com:22480/backup
    ... Failed: No module named paramiko

Paramiko is present in the original (system) Python. However, I failed
to install Paramiko for the SCL version of Python, and a basic check of
duplicity from the command line did not reveal the problem:

    # duplicity --version
    duplicity 0.7.18.2

In the end it looks like an exercise in why airplanes crash...

  1. CentOS 7 ships with antique software
      - users have to do something special to get into a good state
      - users must enable SCL
  2. SCL is missing software
      - users have to do something special to get into a good state
      - components like Duplicity have to be built from source
  3. Linux paths are still broken
      - users have to do something special to get into a good state
      - 20 years or so and counting
  4. Cron misreports job results
      - swallows exceptions and errors
  5. User (me) configured machine incorrectly
      - SCL configuration was wrong
  6. User (me) monitored machine incorrectly
      - did not detect cron job failures
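
For items 4 and 6, one possible mitigation is to stop trusting the
"finished" line in the cron log and have the job report its own exit
status. The sketch below assumes the backup command exits non-zero when
it fails; the function name run_and_report is hypothetical, and where
the stderr output ends up (mail, syslog, nowhere) depends on how cron is
configured on the machine.

```python
import subprocess
import sys


def run_and_report(command, name):
    """Run `command`, report a non-zero exit status on stderr, return the status."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured output
        universal_newlines=True,
    )
    if result.returncode != 0:
        # Make the failure explicit instead of relying on cron's "finished" line.
        sys.stderr.write("%s FAILED with exit status %d\n" % (name, result.returncode))
        sys.stderr.write(result.stdout)
    return result.returncode
```

A cron.daily script could pass the duplicity command line as a list to
run_and_report and hand the returned status to sys.exit, so the failure
is also visible to whatever invoked the script.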

I'd like to strangle the idiot who thought it was a good idea to allow
Cron to swallow exceptions and allow things to silently fail. I bet
that genius is a CTO of a Fortune 500 company.

Jeff

