[Users] vzctl chkpnt speed

Mon Jan 21 04:32:30 EST 2013

> On Fri, 2013-01-18 at 16:26 +0100, Roman Haefeli wrote:
>> Hi all
>>
>> Only recently I discovered that online migration seems to work for us
>> now. A CT on NFS or NFS mounted inside a CT is a non-issue now.
>>
>> We are running all our CTs on an NFS filesystem shared between
>> hostnodes. While checkpointing and restoring works flawlessly with that
>> setup, I noticed that "vzctl chkpnt CTID" writes slower to the NFS mount
>> compared to a dd-write, for instance.
>>
>> 'dd if=/dev/zero of=/mnt/nfs/deleteme bs=1M' writes with approx. 70MB/s.
>> 'iotop' shows that 'vzctl chkpnt CTID' writes with only 18MB/s to the
>> same dir.
>>
>> However, dd writes at a speed similar to the checkpoint when I use the
>> 'bs=8k' option. This makes me assume that quite some write performance
>> could be gained if checkpointing wrote bigger blocks at a time. I
>> haven't read the respective source code to confirm my
>> assumption that small block sizes are used, as my skills are far too
>> limited, but if that is really the case, wouldn't it make sense to use
>> bigger writes in order to improve checkpointing performance?
>>
>> What do you think?
>
> I found a way to speed up checkpointing so it uses the maximum possible
> write speed. On the hostnodes we mount the nfs share with the 'sync'
> mount option (in order to avoid mutual storage lags between the CTs).
> However, when using 'async' the write speed no longer depends on the
> block size and is always fast, which means checkpointing is also
> fast with 'async'. As we still want the CT's private areas to be mounted
> with 'sync', the solution was to use a separate mount for the /vz/dump
> directory with mount option 'async'. This way we can achieve maximum
> checkpointing speed (which is ~70MB/s on our machines).
>

Hello, Roman.
Comparing checkpointing to dd is not quite correct, because checkpointing does not write sequentially: we perform a lot of seek operations while writing the image.
And yes, NFS works much faster in async mode than in sync mode: async mode hides the cost of the seeks and allows many writes to be issued before waiting for an attribute update from the server. This can help you reduce checkpointing (CPT) time.
But you have to fsync the resulting checkpoint image on the source node to make sure it is consistent on the shared storage before resuming on another node.
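The fsync step could look roughly like the sketch below. It is only an illustration, not part of vzctl: the function name fsync_image and the example dump path are hypothetical, and it assumes the checkpoint image is a single regular file on the async-mounted dump directory. Run it on the source node after the checkpoint has finished and before restoring elsewhere.

```python
import os


def fsync_image(path):
    """Flush a finished checkpoint image to stable storage.

    Opens the existing file without truncating it, asks the kernel to
    write out any dirty data for it (on NFS this forces the pending
    writes to be committed on the server), and returns the file size.
    """
    fd = os.open(path, os.O_WRONLY)  # no O_TRUNC/O_CREAT: contents untouched
    try:
        os.fsync(fd)                 # blocks until the data is on stable storage
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Hypothetical path; adjust to wherever your dump file is written.
    image = "/vz/dump/Dump.101"
    if os.path.exists(image):
        print("synced", fsync_image(image), "bytes")
```

If fsync_image raises OSError, treat the image as not safely written and do not resume the CT on the other node.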

> Roman

-- 
Best regards,
Stanislav Kinsbursky