[Users] OpenVZ, CentOS SCL and failed cron jobs?
Jeffrey Walton
noloader at gmail.com
Fri Dec 21 13:03:56 MSK 2018
On Fri, Dec 21, 2018 at 4:18 AM Konstantin Khorenko
<khorenko at virtuozzo.com> wrote:
>
> On 12/20/2018 07:39 PM, Jeffrey Walton wrote:
> >
> > I'm performing a post-mortem on our [failed] disaster recovery procedures.
> >
> > We have an OpenVZ-based CentOS 7 VM. We use it for an open source
> > project website and wiki. Our backup job in /etc/cron.daily has not
> > been executing (nor have other cron jobs, like yum-daily.cron). We
> > cannot find mention of the failures in dmesg or other logs in
> > /var/log.
> >
> > It looks like things broke sometime around December 2017 based on the
> > date of our last backup. (It is embarrassing, but like I said there
> > were no logged failures so I did not know to investigate). I don't
> > keep change control logs, but the best I can tell our last two major
> > configuration changes were:
> >
> > * Migrate OpenVZ 7.1 -> 7.2, June 2016
> > * Enable CentOS SCL, December 2017
> ...
> unfortunately i have not heard about issues related with OpenVZ + SCL,
> seems you are challenged to investigate it.
>
> i'd start with checking if cron service is run at all,
Thanks, Konstantin.
I tracked it down to a daily cron job. The backup ran for 7 seconds but
did not log its error:
Dec 19 08:04:43 ftpit run-parts(/etc/cron.daily)[14217]: starting gdrive-backup
Dec 19 08:04:50 ftpit run-parts(/etc/cron.daily)[17162]: finished gdrive-backup
Our site (https://www.cryptopp.com) is static, so a 7-second run seemed
reasonable to me for an incremental backup.
In reality this is what was happening (from the command line):
# duplicity --allow-source-mismatch ... sftp://XXXX:YYYY@zonk.example.com:22480/backup
... Failed: No module named paramiko
Paramiko is installed for the stock (system) Python, but I never
installed it for the SCL version of Python. A quick smoke test of
duplicity from the command line did not reveal the problem, presumably
because printing the version never loads the SSH backend:
# duplicity --version
duplicity 0.7.18.2
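A more direct check is to ask the interpreter itself whether it can import the module, from inside the same environment the cron job runs in. The following is only a sketch: the helper name check_module is made up for illustration, and which python binary the SCL build of duplicity actually uses is an assumption that has to be verified on the machine.

```shell
#!/bin/sh
# check_module INTERPRETER MODULE
# Returns 0 if INTERPRETER can import MODULE, non-zero otherwise.
# (Hypothetical helper; not part of duplicity or SCL.)
check_module() {
    "$1" -c "import $2" >/dev/null 2>&1
}

# Example: test whichever "python" is first on PATH in this environment.
# Under an enabled software collection this may differ from /usr/bin/python.
if check_module python paramiko; then
    echo "paramiko: importable"
else
    echo "paramiko: NOT importable by $(command -v python)"
fi
```

Running this once in a plain shell and once inside the SCL-enabled environment should show whether the two interpreters see different sets of modules.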
In the end it looks like an exercise in why airplanes crash...
1. CentOS 7 ships with antique software
   - users have to do something special to get into a good state
   - users must enable SCL
2. SCL is missing software
   - users have to do something special to get into a good state
   - components like Duplicity have to be built from sources
3. Linux paths are still broken
   - users have to do something special to get into a good state
   - 20 years or so and counting
4. Cron misreports job results
   - swallows exceptions and errors
5. User (me) configured machine incorrectly
   - SCL configuration was wrong
6. User (me) monitored machine incorrectly
   - did not detect cron job failures
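One way to address points 4 and 6 is to have the cron script record the exit status itself rather than rely on the starting/finished lines. A minimal sketch, assuming a POSIX shell and the standard logger utility; the function name run_logged and the syslog tag are invented for illustration, and the duplicity arguments stay elided as they are above.

```shell
#!/bin/sh
# run_logged TAG COMMAND [ARGS...]
# Runs COMMAND, captures its stdout and stderr, and on a non-zero exit
# status writes the status and the captured output to syslog under TAG.
# Returns COMMAND's exit status so the caller can act on it as well.
# (Hypothetical helper; not part of cron or run-parts.)
run_logged() {
    tag=$1
    shift
    output=$("$@" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        # logger reads the message from stdin when none is given.
        # A logging failure must not mask the original status.
        printf '%s\n' "FAILED (exit $status): $*" "$output" |
            logger -t "$tag" -p cron.err 2>/dev/null || true
    fi
    return "$status"
}

# Usage inside the cron.daily script (arguments elided as in the message):
#   run_logged gdrive-backup duplicity --allow-source-mismatch ...
```

With something like this in place, the "No module named paramiko" line would have landed in syslog next to the starting/finished entries, assuming duplicity exits non-zero on that failure.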
I'd like to strangle the idiot who thought it was a good idea to allow
Cron to swallow exceptions and allow things to silently fail. I bet
that genius is a CTO of a Fortune 500 company.
Jeff