[Users] OpenVZ, CentOS SCL and failed cron jobs?

Jeffrey Walton noloader at gmail.com
Fri Dec 21 13:34:44 MSK 2018


On Fri, Dec 21, 2018 at 5:03 AM Jeffrey Walton <noloader at gmail.com> wrote:
>
> On Fri, Dec 21, 2018 at 4:18 AM Konstantin Khorenko
> <khorenko at virtuozzo.com> wrote:
> >
> > On 12/20/2018 07:39 PM, Jeffrey Walton wrote:
> > >
> > > I'm performing a post-mortem on our [failed] disaster recovery procedures.
> > >
> > > We have an OpenVZ-based CentOS 7 VM. We use it for an open source
> > > project website and wiki. Our backup job in /etc/cron.daily has not
> > > been executing (nor have other cron jobs, like yum-daily.cron). We
> > > cannot find mention of the failures in dmesg or other logs in
> > > /var/log.
> > >
> > > It looks like things broke sometime around December 2017 based on the
> > > date of our last backup. (It is embarrassing, but like I said there
> > > were no logged failures so I did not know to investigate). I don't
> > > keep change control logs, but the best I can tell our last two major
> > > configuration changes were:
> > >
> > > * Migrate OpenVZ 7.1 -> 7.2, June 2016
> > > * Enable CentOS SCL, December 2017
> > ...
> > Unfortunately, I have not heard of any issues related to OpenVZ + SCL,
> > so it seems you will have to investigate it yourself.
> >
> > I'd start by checking whether the cron service is running at all,
>
> Thanks Konstantin.
>
> I tracked it down to a daily cron job. Backup ran for 7 seconds but
> did not log its error:
>
>     Dec  19 08:04:43 ftpit run-parts(/etc/cron.daily)[14217]: starting gdrive-backup
>     Dec  19 08:04:50 ftpit run-parts(/etc/cron.daily)[17162]: finished gdrive-backup
>
> Our site is static so a 7 second backup seems reasonable to me for an
> incremental. (https://www.cryptopp.com)
>
> In reality this is what was happening (from the command line):
>
>     # duplicity --allow-source-mismatch ... sftp://XXXX:YYYY@zonk.example.com:22480/backup
>     ... Failed: No module named paramiko
>
> There is a Paramiko in the original Python. However, I failed to
> install Paramiko for the SCL version of Python. And exercising
> duplicity from the command line failed to reveal the problem:
>
>     # duplicity --version
>     duplicity 0.7.18.2
>
> In the end it looks like an exercise in why airplanes crash...
>
>   1. CentOS 7 ships with antique software
>       - users have to do something special to get into a good state
>       - users must enable SCL
>   2. SCL is missing software
>       - users have to do something special to get into a good state
>       - Components like Duplicity have to be built from sources
>   3. Linux paths are still broken
>       - users have to do something special to get into a good state
>       - 20 years or so and counting
>   4. Cron misreports job results
>       - swallows exceptions and errors
>   5. User (me) configured machine incorrectly
>       - SCL configuration was wrong
>   6. User (me) monitored machine incorrectly
>       - Did not detect cron job failures
>
> I'd like to strangle the idiot who thought it was a good idea to allow
> Cron to swallow exceptions and allow things to silently fail. I bet
> that genius is a CTO of a Fortune 500 company.
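
On point 4 above (run-parts logged "starting" and "finished" but no exit status), a wrapper along these lines is one way to make a failing job visible. This is only a rough sketch, not what we run: the name run_and_report is made up for illustration, and it assumes the job signals failure through a non-zero exit status.

```python
import subprocess
import sys


def run_and_report(cmd, log=sys.stderr):
    """Run cmd (a list of arguments) and return its exit status.

    On a non-zero exit, write the status and the captured output to log
    so the failure is not silently dropped.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured output
        universal_newlines=True,   # decode output as text
    )
    output, _ = proc.communicate()
    if proc.returncode != 0:
        log.write("FAILED (exit %d): %s\n" % (proc.returncode, " ".join(cmd)))
        log.write(output)
    return proc.returncode
```

A cron.daily script could call run_and_report() with the duplicity command line and pass the result to sys.exit(), so that the error text ends up wherever cron output is delivered on the machine.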

Re:

> There is a Paramiko in the original Python. However, I failed to
> install Paramiko for the SCL version of Python.

Looking at reports like
https://bugs.launchpad.net/ubuntu/+source/duplicity/+bug/959089 , this
is not a one-off problem for us. It's a chronic problem across distros
that has not been fixed.
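
Since "duplicity --version" succeeded even though the backend module was missing, a preflight import check would probably have caught this earlier. The following is a minimal sketch under my own assumptions: missing_modules is a hypothetical helper name, and the check is only meaningful when it is run with the same Python interpreter that the cron job ends up using (the SCL one in our case).

```python
import importlib


def missing_modules(names):
    """Return the module names that the running interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Covers "No module named ..." for this interpreter's path.
            missing.append(name)
    return missing
```

Calling missing_modules(["paramiko"]) at the top of the backup script and exiting non-zero when the list is non-empty would turn the silent failure into an explicit one.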

Packages and software need to be in a good state. They have to "just
work" out of the box. When are distros going to learn that RTFM does
not work? If it was going to work it would have happened in the last
50 years or so.

The engineers responsible for this mess meet the definition of insane.
They keep doing the same thing over and over again expecting a
different outcome. It is completely irrational behavior.

(end gripe)

Jeff

