[Users] [PATCH] change timeouts, startup behaviour ocf:heartbeat:ManageVE (OpenVZ VE cluster resource)

Tim Small tim at seoss.co.uk
Thu Mar 21 10:59:17 EDT 2013


On 13/03/13 16:18, Dejan Muhamedagic wrote:
> On Tue, Mar 12, 2013 at 12:58:44PM +0000, Tim Small wrote:
>   
>> The attached patch changes the behaviour of the OpenVZ virtual machine
>> cluster resource agent, so that:
>>
>> 1. The default resource stop timeout is greater than the hardcoded
>>     
> Just for the record: where is this hardcoded actually? Is it
> also documented?
>   

Defined here:

http://git.openvz.org/?p=vzctl;a=blob;f=include/env.h#l26

/** Shutdown timeout.
 */
#define MAX_SHTD_TM             120



Used by env_stop() here:

http://git.openvz.org/?p=vzctl;a=blob;f=src/lib/env.c#l821

        for (i = 0; i < MAX_SHTD_TM; i++) {
                sleep(1);
                if (!vps_is_run(h, veid)) {
                        ret = 0;
                        goto out;
                }
        }

kill_vps:
        logger(0, 0, "Killing container ...");



Perhaps something based on wall time would be more consistent.  I can
also think of cases where users might want the timeout a bit higher, or
a bit lower, but currently it's just fixed at 120s.


I can't find the timeout documented anywhere.


>> 2. The start operation now waits for resource startup to complete i.e.
>> for the VE to "boot up" (so that the cluster manager can detect VEs
>> which are hanging on startup, and also throttle simultaneous startups,
>> so as not-to overburden the node in question).  Since the start
>> operation now does a lot more, the default start operation timeout has
>> been increased.
>>     
> I'm not sure if we can introduce this just like that. It changes
> significantly the agent's behaviour.
>   

Yes.  I think it probably makes the agent's behaviour a bit more correct,
but that depends on what your definition of a VE resource having
"started" is, I suppose.  Currently the agent reports that the VE has
started as soon as it has begun the boot process, whereas with the
proposed change it would report started only once the VE has booted up
(which should imply "is operational").

Although my personal reason for the change was so that I had a
reasonable way to avoid booting tens of VEs on the host machine at the
same time, I can think of other benefits - such as making other
resources depend on the fully-booted VE, or detecting the case where a
faulty VE host node causes the VE to hang during start-up.


I suppose other options are:

1. Make start --wait the default, but make starting without waiting
selectable using an RA parameter.

2. Make start without waiting the default, but make --wait selectable
using an RA parameter.


I suppose the change could break configurations where the administrator
has hard-coded a short timeout and the new agent is introduced as part
of an upgrade, which would be a bad thing...


> BTW, how does vzctl know when the VE is started?
>   

The vzctl manual page says that 'vzctl start --wait' will "attempt to
wait till the default runlevel is reached" within the container.


> If the description above matches
> the code modifications, then there should be three instead of
> one patch.
>   

Fair enough - I was being lazy!


Tim.

-- 
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.  
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309
