[Users] Performance degradation on 042stab113.X

Vasily Averin vvs at virtuozzo.com
Tue Apr 5 01:28:52 PDT 2016


Dear Karl,

It is a good surprise for me, and I cannot explain why it happened.

Probably the following fix helped you:
042stab113.18:
* force charge swapin readahead pages if in ub0 (PSBM-44857)

Backups on the host triggered memory reclaim activity inside containers,
and we had a few reports of hard node lockups in such situations.

I did not expect this to be your case, because your node did not hang completely,
but it probably improves performance for your test case too.

The first wave of tests on the 042stab114.5 kernel finished successfully,
and so far we have not detected any new issues.
Testing is still in progress and many other tests have not been executed yet,
but it seems we broke nothing during our re-base,
and I hope this kernel will be good for production too.

We have finished the re-base to the rhel6u8 beta kernel too;
the first kernel started successfully and passed validation on my test node.
We've noticed only 2 issues:
- the first was a minor re-base-related issue, and one of our patches was corrected
- the second was a critical bug in the RHEL kernel;
I've escalated it to Red Hat and they have already fixed it.
All developers love early bug reports from beta testers :).

So I am preparing a 115.2 kernel with these bug fixes and am going to publish it today or tomorrow.

Thank you,
	Vasily Averin.

On 04.04.2016 22:11, Karl Johnson wrote:
> Hi Vasily,
> 
> I upgraded two nodes last week from 113.12 to 113.21 and it seems
> better. Backups last weekend took the same time as they did on <=108.8.
> I'll still keep an eye on this and also on the development of 115 in
> OpenVZ Jira.
> 
> Thanks!
> 
> Karl
> 
> On Thu, Mar 31, 2016 at 4:13 AM, Vasily Averin <vvs at virtuozzo.com> wrote:
> 
>     On 30.03.2016 18:38, Karl Johnson wrote:
>     > Hi Vasily,
>     >
>     > I do indeed use simfs / ext4 / cfq. Only a backup of each container's
>     > private area is done with vzdump, which is then transferred to a backup
>     > server with ncftpput. Compressing the data is OK, while transferring
>     > the dump over the local network peaks the load, so the issue is with (read)
>     > IO. I'm trying to find out why it was fine before and causes problems
>     > now. Those nodes are in heavy production, so it's hard to do testing
>     > (including downgrading the kernel).
> 
>     A few lists of blocked processes taken with the alt+sysrq+W "magic sysrq" key can be useful:
>     they let you see who is blocked and how the situation develops over time,
>     but they do not explain who causes this traffic jam.
> 
>     I'm sorry, but other ways of troubleshooting are much more destructive.
>     Moreover, even a kernel crash dump does not guarantee success in your case:
>     it shows the whole picture with all the details,
>     but it does not show how the situation develops over time.
> 
>     > Thanks for all the information on the future roadmap. I'm glad that the
>     > work has already begun on the RHEL 6.8 rebase. I read the beta technical
>     > notes last week and some of the upgrades seem great. Do you consider
>     > 042stab114.5 stable even if it's in the testing repo? I might try it
>     > tomorrow and see how it goes.
> 
>     In fact we do not know yet.
> 
>     The 114.x kernels include ~30 new patches from Red Hat and ~10 of our own,
>     and we had only a few minor rejects during the re-base.
>     At first glance this should not cause problems,
>     but the first 114.x kernel crashed on boot,
>     and 114.4 crashed after CT suspend-resume.
>     In both cases we needed to re-work our patches.
> 
>     The 042stab114.5 kernel works well on my test node right now,
>     but it is not ready for production yet and requires careful re-testing.
>     So if you have some specific workload, we would be very grateful
>     for any testing and bug reports.
>     This lets us learn about hidden bugs before the release.
> 
>     thank you,
>             Vasily Averin
> 
>     > Regards,
>     >
>     > Karl
>     >
>     > On Wed, Mar 30, 2016 at 5:48 AM, Vasily Averin <vvs at virtuozzo.com> wrote:
>     >
>     >     Dear Karl,
>     >
>     >     thank you for the explanation,
>     >     however some details are still not clear.
>     >
>     >     I believe you use simfs containers (otherwise you would not need to worry about PSBM-34244;
>     >     your use of 113.12 kernels also confirms it),
>     >     but it isn't clear how exactly you back up your nodes.
>     >     Do you dump the whole partition with containers, or just copy the containers' private areas somehow?
>     >     What filesystem do you have on the partition with the containers?
>     >     What is the backup storage in your case?
>     >
>     >     Anyway, it seems you do not freeze the filesystem with containers before backup.
>     >     This functionality was broken in RHEL6 kernels for quite a long time,
>     >     and Red Hat fixed it in the 2.6.32-504.x and 573.x kernels.
>     >
>     >     https://access.redhat.com/solutions/1506563
>     >
>     >     Probably these fixes affect your test case.
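>     >
>     >     For reference, the freeze/thaw sequence being discussed works roughly
>     >     as in the following minimal sketch (only an illustration; fsfreeze comes
>     >     from util-linux, and the /vz mount point and /dev/vg0/vz volume names
>     >     are assumptions -- adjust them to your layout):
>     >
>     >         # freeze_and_snapshot.py -- hypothetical illustration, not an official tool
>     >         import subprocess
>     >
>     >         MOUNTPOINT = "/vz"            # assumption: partition with containers
>     >         ORIGIN_LV  = "/dev/vg0/vz"    # assumption: LVM volume behind it
>     >
>     >         subprocess.check_call(["fsfreeze", "-f", MOUNTPOINT])   # flush and block writes
>     >         try:
>     >             # keep the freeze window short: just long enough to take a snapshot,
>     >             # then back up the snapshot after the filesystem is thawed
>     >             subprocess.check_call(["lvcreate", "-s", "-L", "10G",
>     >                                    "-n", "vzbackup", ORIGIN_LV])
>     >         finally:
>     >             subprocess.check_call(["fsfreeze", "-u", MOUNTPOINT])  # thaw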
>     >
>     >     I'm not sure, of course;
>     >     maybe it isn't, and some other fixes are responsible:
>     >     Red Hat added >7000 new patches to the 2.6.32-573.x kernels,
>     >     many of our patches were changed during the re-base,
>     >     and many new patches were added.
>     >     There were too many changes between the 108.x and 113.x kernels.
>     >
>     >     Our tests did not detect significant performance degradation,
>     >     but that means little; most likely we just did not measure your test case.
>     >
>     >     I do not expect the situation to change on the 113.21 kernel;
>     >     it seems we did not fix similar issues last time.
>     >
>     >     Yes, you're right, our 042stab114.x kernels will be based
>     >     on the last released RHEL6.7 kernel, 2.6.32-573.22.1.el6.
>     >     Its validation is in progress at present,
>     >     and I hope we'll publish it in the near future.
>     >
>     >     However, I have not found any related bug fixes in the new RHEL6 kernels,
>     >     and I doubt that it will help you.
>     >
>     >     We are also going to make a 115.x kernel based on the RHEL6 update 8 beta kernel 2.6.32-621.el6.
>     >     It has no chance of being released in the stable branch, but its testing helps us speed up
>     >     our rebase to the RHEL6.8 release kernel (we expect RHEL6u8 to be released at the end of May).
>     >
>     >     The work on the 115.x kernel is in progress, and I hope it will be done in the next few days.
>     >
>     >     So I would like to propose the following plan:
>     >     please check how the 113.21, 114.x and 115.x kernels work (maybe it works already);
>     >     if the issue is still present, please reproduce the problem once again, crash the affected host,
>     >     create a new bug in Jira and ping me again. I'll send you a private link for uploading the vmcore.
>     >     Investigating the kernel crash dump file will probably allow me to find the bottleneck in your case.
>     >
>     >     Thank you,
>     >             Vasily Averin
>     >
>     >     On 29.03.2016 21:03, Karl Johnson wrote:
>     >     > Hi Vasily,
>     >     >
>     >     > Every weekend I do backups of all CTs, which take a lot of IO. It
>     >     > didn't affect the load average much before 108, but as soon as I upgraded
>     >     > to 113, the load got very high and the nodes became sluggish during backups.
>     >     > It might be something else, but I was looking for feedback on whether someone
>     >     > else had the same issue. I will continue to troubleshoot it.
>     >     > Meanwhile, I will upgrade them from 113.12 to 113.21 and see how it
>     >     > goes, even if there's nothing related to this in the changelog.
>     >     >
>     >     > Thanks for the reply,
>     >     >
>     >     > Karl
>     >     >
>     >     > On Tue, Mar 29, 2016 at 5:21 AM, Vasily Averin <vvs at virtuozzo.com> wrote:
>     >     >
>     >     >     Dear Karl,
>     >     >
>     >     >     no, we do not know of any performance degradation between
>     >     >     the 042stab108.x and 042stab113.x kernels.
>     >     >     A high load average and CPU peaks are not a problem per se;
>     >     >     they can be caused by increased activity on your nodes.
>     >     >
>     >     >     Could you please explain in more detail
>     >     >     why you believe you have a problem on your nodes?
>     >     >
>     >     >     Thank you,
>     >     >             Vasily Averin
>     >     >
>     >     >     On 28.03.2016 20:28, Karl Johnson wrote:
>     >     >     > Hello,
>     >     >     >
>     >     >     > Did anyone notice performance degradation after upgrading vzkernel to
>     >     >     > 042stab113.X? I've been running 042stab108.5 on a few nodes for a while
>     >     >     > with no issues and upgraded to 042stab113.12 a few weeks ago to fix an
>     >     >     > important CVE and rebase to the latest rhel6 kernel.
>     >     >     >
>     >     >     > Since the upgrade from 108.5 to 113.12, I have noticed a much higher load
>     >     >     > average on those upgraded OpenVZ nodes, mostly when IO is heavily
>     >     >     > used. High CPU peaks are much more frequent. I would be curious to
>     >     >     > know if someone else has the same issue. I wouldn't downgrade because
>     >     >     > of the security fix PSBM-34244.
>     >     >     >
>     >     >     > Regards,
>     >     >     >
>     >     >     > Karl
>     >     >     >
>     >     >     >
>     >     >
>     >     >
>     >     >
>     >     >
>     >
>     >
>     >
>     >
> 
> 
> 
> 
> _______________________________________________
> Users mailing list
> Users at openvz.org
> https://lists.openvz.org/mailman/listinfo/users
> 

