[Users] Performance degradation on 042stab113.X

Karl Johnson karljohnson.it at gmail.com
Mon Apr 4 12:11:54 PDT 2016


Hi Vasily,

I upgraded two nodes last week from 113.12 to 113.21 and it seems
better: backups last weekend took the same time as they did on <=108.8. I'll
still keep an eye on this, and also on the development of 115 in the OpenVZ Jira.

Thanks!

Karl

On Thu, Mar 31, 2016 at 4:13 AM, Vasily Averin <vvs at virtuozzo.com> wrote:

> On 30.03.2016 18:38, Karl Johnson wrote:
> > Hi Vasily,
> >
> > I do indeed use simfs / ext4 / cfq. Only a backup of each container's
> > private area is done with vzdump, and it is then transferred to a backup
> > server with ncftpput. Compressing the data is OK, while transferring
> > the dump over the local network peaks the load, so the issue is with (read)
> > IO. I'm trying to find out why it was fine before and causes problems
> > now. Those nodes are in heavy production, so it's hard to do testing
> > (including downgrading the kernel).
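> >
> > For reference, the flow per container is roughly the following (the CTID,
> > credentials, paths and hostname below are placeholders, not my real setup):
> >
> >     # dump one container's private area, compressed, into a local spool
> >     vzdump --compress --dumpdir /var/backup 101
> >
> >     # push the resulting dump file to the backup server over FTP
> >     # (the actual file name vzdump generates will differ)
> >     ncftpput -u backupuser -p secret backup.example.com /dumps \
> >         /var/backup/vzdump-101.tgz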
>
> A few lists of blocked processes taken with the alt+sysrq+w "magic sysrq"
> key can be useful:
> they show who is blocked and how the situation develops over time,
> but they do not explain who causes this traffic jam.
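>
> A minimal sketch of collecting such lists without a console keyboard,
> assuming root on the affected node:
>
>     echo 1 > /proc/sys/kernel/sysrq   # enable all magic-sysrq functions
>     echo w > /proc/sysrq-trigger      # dump blocked (uninterruptible) tasks
>     dmesg | tail -n 200               # the stack traces land in the kernel log
>
> Repeating this a few times during a slowdown shows the dynamic.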
>
> I'm sorry, but the other ways of troubleshooting are much more destructive.
> Moreover, even a kernel crash dump does not guarantee success in your case:
> it shows the whole picture with all the details,
> but it does not help to understand how the situation develops over time.
>
> > Thanks for all the information on the future roadmap. I'm glad that the
> > work has already begun on the RHEL 6.8 rebase. I read the beta technical
> > notes last week and some upgrades seem great. Do you consider
> > 042stab114.5 stable even though it's in the testing repo? I might try it
> > tomorrow and see how it goes.
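> >
> > If so, something like this should pull just that kernel (a sketch assuming
> > the stock openvz.repo; the testing repo id may be named differently here):
> >
> >     # install the candidate kernel from the testing repo only
> >     yum --enablerepo=openvz-kernel-rhel6-testing \
> >         install vzkernel-2.6.32-042stab114.5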
>
> In fact, we do not know yet.
>
> The 114.x kernels include ~30 new patches from Red Hat and ~10 of our own,
> and we had only a few minor rejects during the rebase.
> At first glance that should not cause problems,
> but the first 114.x kernel crashed on boot,
> and 114.4 crashed after a CT suspend-resume.
> In both cases we had to rework our patches.
>
> The 042stab114.5 kernel works well on my test node right now,
> but it is not ready for production yet and requires careful re-testing.
> So if you have some specific workload, we would be very grateful
> for any testing and bug reports;
> that lets us learn about hidden bugs before the release.
>
> thank you,
>         Vasily Averin
>
> > Regards,
> >
> > Karl
> >
> > On Wed, Mar 30, 2016 at 5:48 AM, Vasily Averin <vvs at virtuozzo.com> wrote:
> >
> >     Dear Karl,
> >
> >     Thank you for the explanation;
> >     however, some details are still not clear.
> >
> >     I believe you use simfs containers (otherwise you would not need to
> >     worry about PSBM-34244; your use of 113.12 kernels also confirms it),
> >     but it isn't clear how exactly you back up your nodes.
> >     Do you dump the whole partition with containers, or just copy the
> >     containers' private areas somehow?
> >     What filesystem do you have on the partition with the containers?
> >     What is the backup storage in your case?
> >
> >     Anyway, it seems you do not freeze the filesystem with the containers
> >     before backup (a sketch of what that would look like is below).
> >     This functionality was broken in RHEL6 kernels for quite a long time,
> >     and Red Hat fixed it in the 2.6.32-504.x and 573.x kernels.
> >
> >     https://access.redhat.com/solutions/1506563
> >
> >     These fixes probably affect your testcase.
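> >
> >     A minimal sketch of the freeze, assuming the containers live under
> >     /vz and fsfreeze from util-linux is available:
> >
> >         fsfreeze --freeze /vz      # flush and block writes to the filesystem
> >         # ... take the dump / snapshot here ...
> >         fsfreeze --unfreeze /vz    # resume normal operation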
> >
> >     Of course I'm not sure;
> >     maybe it isn't, and some other fixes are to blame:
> >     Red Hat added >7000 new patches in the 2.6.32-573.x kernels,
> >     many of our patches were changed during the rebase,
> >     and many new patches were added.
> >     There were too many changes between the 108.x and 113.x kernels.
> >
> >     Our tests did not detect significant performance degradation,
> >     but that means little; most likely we just did not measure your testcase.
> >
> >     I do not expect the situation to change on the 113.21 kernel;
> >     it seems we have not fixed any similar issues lately.
> >
> >     Yes, you're right, our 042stab114.x kernels will be based
> >     on the last released RHEL6.7 kernel, 2.6.32-573.22.1.el6.
> >     Its validation is in progress at present,
> >     and I hope we'll publish it in the near future.
> >
> >     However, I did not find any related bugfixes in the new RHEL6 kernels,
> >     and I doubt they will help you.
> >
> >     Also, we're going to make a 115.x kernel based on the RHEL6 update 8
> >     beta kernel, 2.6.32-621.el6. It has no chance of being released in the
> >     stable branch, but testing it helps us speed up our rebase to the
> >     RHEL6.8 release kernel (we expect RHEL6u8 to be released at the end
> >     of May).
> >
> >     The work on the 115.x kernel is in progress, and I hope it will be
> >     done in the next few days.
> >
> >     So I would like to propose the following plan:
> >     please check how the 113.21, 114.x and 115.x kernels work
> >     (maybe it works already).
> >     If the issue is still present, please reproduce the problem once again,
> >     crash the affected host (see the sketch below), create a new bug in
> >     Jira and ping me again; I'll send you a private link for uploading the
> >     vmcore. Investigating the kernel crash dump file will probably allow
> >     me to find the bottleneck in your case.
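> >
> >     A sketch of the crash step, assuming kdump is configured on the node
> >     so that the panic produces a vmcore:
> >
> >         service kdump status             # verify the capture kernel is loaded
> >         echo 1 > /proc/sys/kernel/sysrq
> >         echo c > /proc/sysrq-trigger     # panic the host; kdump then saves
> >                                          # the vmcore under /var/crash/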
> >
> >     Thank you,
> >             Vasily Averin
> >
> >     On 29.03.2016 21:03, Karl Johnson wrote:
> >     > Hi Vasily,
> >     >
> >     > Every weekend I do backups of all CTs, which take a lot of IO. It
> >     > didn't affect the load average much before 108, but as soon as I
> >     > upgraded to 113, the load got very high and the nodes became sluggish
> >     > during backups. It might be something else, but I was looking for
> >     > feedback on whether someone else had the same issue. I will continue
> >     > to troubleshoot. Meanwhile, I will upgrade them from 113.12 to 113.21
> >     > and see how it goes, even if there's nothing related to this in the
> >     > changelog.
> >     >
> >     > Thanks for the reply,
> >     >
> >     > Karl
> >     >
> >     > On Tue, Mar 29, 2016 at 5:21 AM, Vasily Averin <vvs at virtuozzo.com> wrote:
> >     >
> >     >     Dear Karl,
> >     >
> >     >     no, we know of no performance degradation between the
> >     >     042stab108.x and 042stab113.x kernels.
> >     >     A high load average and CPU peaks are not problems per se;
> >     >     they can be caused by increased activity on your nodes.
> >     >
> >     >     Could you please explain in more detail
> >     >     why you believe you have a problem on your nodes?
> >     >
> >     >     Thank you,
> >     >             Vasily Averin
> >     >
> >     >     On 28.03.2016 20:28, Karl Johnson wrote:
> >     >     > Hello,
> >     >     >
> >     >     > Did anyone notice performance degradation after upgrading
> >     >     > vzkernel to 042stab113.X? I've been running 042stab108.5 on a
> >     >     > few nodes for a while with no issues, and I upgraded to
> >     >     > 042stab113.12 a few weeks ago to fix an important CVE and
> >     >     > rebase to the latest RHEL6 kernel.
> >     >     >
> >     >     > Since the upgrade from 108.5 to 113.12, I have noticed a much
> >     >     > higher load average on the upgraded OpenVZ nodes, mostly when
> >     >     > IO is heavily used. High CPU peaks are much more frequent. I
> >     >     > would be curious to know if someone else has the same issue. I
> >     >     > wouldn't downgrade because of security fix PSBM-34244.
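> >     >     >
> >     >     > For reference, this is roughly how I watch the nodes during
> >     >     > those peaks (standard tools, nothing specific to my setup):
> >     >     >
> >     >     >     iostat -x 5    # per-device %util and await during backups
> >     >     >     vmstat 5       # the "wa" column: CPU time waiting on IO
> >     >     >     top            # load average vs. actual CPU usage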
> >     >     >
> >     >     > Regards,
> >     >     >
> >     >     > Karl
> >     >     >
> >     >     >
> _______________________________________________
> Users mailing list
> Users at openvz.org
> https://lists.openvz.org/mailman/listinfo/users
>