[Users] Live Migration Optimal execution

Nipun Arora nipunarora2512 at gmail.com
Thu Nov 27 16:14:33 PST 2014


Thanks, the speed has improved by an order of magnitude :)

By the way, is there any benchmark that you have looked into for testing how
good/practical live migration is for real-world systems?
Additionally, I'm trying to run a Java application (the DaCapo benchmark), but
I keep having trouble getting Java to run:

java -version

Error occurred during initialization of VM
Could not reserve enough space for object heap
Could not create the Java virtual machine.

I've put my vz conf file below (a rough diagnostic sketch follows it); can
anyone suggest what the problem could be?

Thanks
Nipun

# UBC parameters (in form of barrier:limit)
KMEMSIZE="14372700:14790164"
LOCKEDPAGES="2048:2048"
PRIVVMPAGES="65536:69632"
SHMPAGES="21504:21504"
NUMPROC="240:240"
PHYSPAGES="0:131072"
VMGUARPAGES="33792:unlimited"
OOMGUARPAGES="26112:unlimited"
NUMTCPSOCK="360:360"
NUMFLOCK="188:206"
NUMPTY="16:16"
NUMSIGINFO="256:256"
TCPSNDBUF="1720320:2703360"
TCPRCVBUF="1720320:2703360"
OTHERSOCKBUF="1126080:2097152"
DGRAMRCVBUF="262144:262144"
NUMOTHERSOCK="1200"
DCACHESIZE="3409920:3624960"
NUMFILE="9312:9312"
AVNUMPROC="180:180"
NUMIPTENT="128:128"

# Disk quota parameters (in form of softlimit:hardlimit)
DISKSPACE="3145728:3145728"
DISKINODES="131072:144179"
QUOTATIME="0"

# CPU fair scheduler parameter
CPUUNITS="1000"

NETFILTER="stateless"
VE_ROOT="/vz/root/101"
VE_PRIVATE="/vz/private/101"
OSTEMPLATE="centos-6-x86_64"
ORIGIN_SAMPLE="basic"
HOSTNAME="test"
IP_ADDRESS="192.168.1.101"
NAMESERVER="8.8.8.8 8.8.4.4"
CPULIMIT="25"
SWAPPAGES="0:262144"
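
In case it helps diagnose this, here is what I plan to check next. I am only
guessing that the UBC memory limits above (e.g. PRIVVMPAGES at 65536 pages,
about 256 MB) are too tight for the JVM, so the exact numbers below are
illustrative rather than recommendations:

# inside CT 101: look for non-zero failcnt values, e.g. on privvmpages
cat /proc/user_beancounters

# on the hardware node: raise the barrier:limit and retry
vzctl set 101 --privvmpages 262144:294912 --save

# or cap the JVM heap so it fits within the current limits
java -Xmx128m -version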


On Mon, Nov 24, 2014 at 12:16 PM, Kir Kolyshkin <kir at openvz.org> wrote:

>
> On 11/23/2014 07:13 PM, Nipun Arora wrote:
>
> Thanks, I will try your suggestions and get back to you.
> By the way, any idea what could be used to share the base image between both
> containers?
> Hardlink it in what way? Once both containers start, won't they have to
> write to different locations?
>
>
> ploop is composed of a set of stacked images, with all of them except the
> top one being read-only.
>
>
>  I understand that some file systems have a copy-on-write mechanism,
> where after a snapshot all future writes go to an additional linked disk.
> Does ploop operate in a similar way?
>
>
> Yes.
>
>
>  http://wiki.qemu.org/Features/Snapshots
>
>
> http://openvz.livejournal.com/44508.html
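
(As I understand the snapshot mechanism from the links above, it works roughly
like this; the paths below assume the default ploop layout under VE_PRIVATE,
and the delta file name is just a placeholder:)

# take a snapshot of CT 101: the current top delta is frozen read-only and a
# new top delta is created to receive all further writes
vzctl snapshot 101

# the stacked images sit next to DiskDescriptor.xml in the private area, e.g.
ls /vz/private/101/root.hdd/
# -> DiskDescriptor.xml  root.hdd (base delta, read-only)
#    root.hdd.{uuid} (top delta, writable)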
>
>
>
>  Cloning with a modified vzmigrate script helps.
>
>  - Nipun
>
> On Sun, Nov 23, 2014 at 5:29 PM, Kir Kolyshkin <kir at openvz.org> wrote:
>
>>
>> On 11/23/2014 04:59 AM, Nipun Arora wrote:
>>
>> Hi Kir,
>>
>>  Thanks for the response. I'll update and let you know the results.
>>
>>  1. A follow-up question... I found that a write I/O rate of 500 Kbps to
>> 1 Mbps increased the suspend time to several minutes (mostly the pcopy
>> stage).
>> This seems extremely high for a relatively low I/O workload, which is why I
>> was wondering if there is anything special I need to take care of.
>> (I ran fio (the flexible I/O tester) at a fixed throughput while doing live
>> migration.)
>>
>>
>>  Please retry with vzctl 4.8 and ploop 1.12.1 (make sure they are on both
>> sides).
>> There used to be a 5-second wait for the remote side to finish syncing the
>> copied ploop data. It helped the case with little I/O activity in the
>> container, but ruined the case you are talking about.
>>
>> Newer ploop and vzctl implement a feedback channel for ploop copy that
>> eliminates that wait time.
>>
>> http://git.openvz.org/?p=ploop;a=commit;h=20d754c91079165b
>> http://git.openvz.org/?p=vzctl;a=commit;h=374b759dec45255d4
>>
>> There are some other major improvements as well, such as async send for
>> ploop.
>>
>> http://git.openvz.org/?p=ploop;a=commit;h=a55e26e9606e0b
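
(In practice, I assume the upgrade means something like the following on each
CentOS 6 node (both source and destination), with the OpenVZ repository
enabled; exact versions depend on what the repo currently ships:)

# check what is currently installed
rpm -q vzctl ploop

# pull in vzctl 4.8 / ploop 1.12.1 (or newer) from the repo
yum update vzctl ploop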
>>
>>
>>  2. For my purposes, I have modified the live migration script to let me do
>> cloning, i.e. I start both containers instead of deleting the original. I
>> need to do this "cloning" from time to time for the same target container...
>>
>>         a. Say we clone container C1 to container C2 and let both execute
>> from time t0; this works with no apparent loss of service.
>>
>>         b. Now at time t1 I would like to clone C1 to C2 again, and would
>> like to optimize the rsync process, since most of the ploop data for C1 and
>> C2 should still be the same (i.e. less time to sync). Can anyone suggest the
>> best way to realize the second point?
>>
>>
>>  You can create a ploop snapshot and use a shared base image for both
>> containers (instead of copying the base delta, hardlink it). This is not
>> supported by the tools (for example, since the base delta is now shared you
>> can't merge down into it, but the tools are not aware of that), so you need
>> to figure it out yourself and be careful, but it should work.
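
(A rough sketch of how I read the hardlink idea; the file names below are
assumptions about the default ploop private-area layout, CT 102 is a
hypothetical clone ID, it only works if both private areas live on the same
filesystem, and, as noted above, the tools will not know the base is shared,
so this is illustrative only:)

# snapshot C1 so its base delta is frozen (read-only) and new writes go to a
# fresh top delta
vzctl snapshot 101

# instead of copying the large base delta into the clone's private area,
# hardlink it so both containers share the same on-disk blocks
ln /vz/private/101/root.hdd/root.hdd /vz/private/102/root.hdd/root.hdd

# the small top delta and DiskDescriptor.xml still have to be copied and
# adjusted for CT 102 by hand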
>>
>>
>>
>>
>>  Thanks
>> Nipun
>>
>> On Sun, Nov 23, 2014 at 12:56 AM, Kir Kolyshkin <kir at openvz.org> wrote:
>>
>>>
>>> On 11/22/2014 09:09 AM, Nipun Arora wrote:
>>>
>>> Hi All,
>>>
>>>  I was wondering if anyone can suggest the optimal way to do the
>>> following:
>>>
>>>  1. Can anyone clarify whether ploop is the best layout for minimizing
>>> suspend time during live migration?
>>>
>>>
>>>  Yes (thanks to ploop copy, which copies only the modified blocks).
>>>
>>>
>>>  2. I tried migrating a ploop device where I had increased --diskspace to
>>> 5G, and found that the suspend time taken by live migration increased to
>>> 57 seconds (mainly the undump and restore times increased)...
>>> whereas with a 2G diskspace the suspend time was 2-3 seconds... Is this
>>> expected?
>>>
>>>
>>>  No. Undump and restore times depend mostly on the amount of RAM used by a
>>> container.
>>>
>>> Having said that, live migration stages influence each other, although
>>> less so in the latest vzctl release (I won't go into details here, if
>>> you'll allow me; just make sure you test with vzctl 4.8).
>>>
>>>
>>>  3. I tried running a write-intensive workload, and found that beyond
>>> 100-150 Kbps
>>> the suspend time during live migration increased rapidly. Is this an
>>> expected trend?
>>>
>>>
>>>  Sure. With increased write speed, the amount of data that needs to be
>>> copied after the CT is suspended increases.
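
(Illustrative arithmetic only, with made-up numbers: if the container dirties
data at 1 MB/s and the final pre-suspend copy pass takes 60 seconds, roughly
60 MB of freshly dirtied blocks still have to be copied while the CT is
frozen; at 100 KB/s the same pass leaves only about 6 MB, so the frozen time
grows roughly with the write rate.)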
>>>
>>>
>>>  I am using vzctl 4.7 and ploop 1.11 on CentOS 6.5.
>>>
>>>
>>>  You need to update vzctl and ploop and rerun your tests; there should be
>>> some improvement (in particular with respect to issue #3).
>>>
>>>
>>>  Thanks
>>> Nipun
>>>
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at openvz.org
>>> https://lists.openvz.org/mailman/listinfo/users
>>>
>>>
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at openvz.org
>> https://lists.openvz.org/mailman/listinfo/users
>>
>>
>
>
> _______________________________________________
> Users mailing list
> Users at openvz.org
> https://lists.openvz.org/mailman/listinfo/users
>
>