[CRIU] HAProxy + CRIU
Fox, Kevin M
Kevin.Fox at pnnl.gov
Fri Apr 8 12:35:04 PDT 2016
A lot of clients don't reconnect/retry on failure beyond what raw TCP provides. They also cache DNS entries too long, so shuffling IPs around in DNS entries to adjust load balancers is a real pain, if it's possible at all.
So, that makes load balancers kind of important to always have running. And keepalived doesn't help too much other than giving you a quick but dirty recovery: connections still break.
But for scalability of some services, you may want to run a lot of load balancers.
So, say you have a pool of HTTP object storage servers (Swift, for example):
n[0-5], each with a 40-gig network attachment.
You could put a load balancer in front on separate machines, but you'd need a big cluster with a lot of extra bandwidth for it not to become a bottleneck.
A better option is to run haproxy on each of the nodes n[0-5] and have each instance prefer transferring to its local node, along the lines of the config below.
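A minimal sketch of that local preference, assuming hypothetical node names and a Swift port of 8080 (neither is from the posted script):

    # haproxy.cfg on n1 (hypothetical layout)
    backend swift
        balance roundrobin
        # prefer the local Swift server; remote nodes only take
        # traffic if the local one fails
        server n1 127.0.0.1:8080 check
        server n0 n0:8080 check backup
        server n2 n2:8080 check backup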
The problem then is that if you want to take a node out, say n1, it's painful. You need to get the DNS entry updated to remove the IP (which doesn't work if your organization won't let you do that quickly). Then you have to wait out the minimum DNS timeout on common OS's (Windows, I think, is still 30 minutes), then make sure all the traffic drains.
This solution is an attempt to eliminate the need to play with DNS. DNS would always carry the VIPs for the n[0-5] load balancers.
When, say, n1 is to be pulled out, its haproxy is live migrated to another of the machines, say n3, roughly as sketched below. The connections inside that haproxy still point back at n1, so they are not lost.
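Roughly, that migration step with the plain CRIU command line; the image dir, pid lookup, and rsync transfer are my assumptions, the superhaproxy script wraps the details:

    # on n1: checkpoint haproxy with its established TCP
    # connections left repairable
    criu dump -t $(pidof haproxy) -D /tmp/haproxy-img --tcp-established

    # ship the images to n3 (the script may transfer differently)
    rsync -a /tmp/haproxy-img/ n3:/tmp/haproxy-img/

    # on n3: restore detached; the repaired connections still
    # terminate at n1's address, which is assumed to move with
    # the container's VIP
    criu restore -D /tmp/haproxy-img --tcp-established -d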
You can then tweak the config of the migrated n1 haproxy to point at n3 and reload it (see the snippet below). All new connections to the n1 proxy then go to the n3 server. Once all connections between n3 and n1 are done, n1 can safely be upgraded, have hardware replaced, rebooted, etc., and brought back online.
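Continuing the hypothetical config above, the drain step amounts to repointing the preferred server before the reload:

    # the migrated n1 haproxy, now on n3: new connections go to
    # n3's local Swift server instead of back to n1
    backend swift
        balance roundrobin
        server n3 n3:8080 check
        server n0 n0:8080 check backup
        server n2 n2:8080 check backup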
Once maintenance of n1 is complete, the procedure can be reversed to bring back full-bandwidth access to the resources. You can cycle through all the nodes to upgrade every one of them.
So that covers live upgrading the host an haproxy is running on. haproxy's reload actually spawns a new process to take the newer connections, so if you enter the container, upgrade the package, and then reload, it safely upgrades that piece of the software too.
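That reload is haproxy's normal soft-stop mechanism; a sketch, assuming the usual config and pid file paths:

    # start a new haproxy for new connections; -sf tells the old
    # process(es) to finish their existing connections and exit
    haproxy -f /etc/haproxy/haproxy.cfg -sf $(cat /var/run/haproxy.pid)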
So the tool provides you a way to upgrade the whole software stack live.
This only works if haproxy doesn't lose connections, though, so it really relies on TCP connection repair in CRIU being solid. I'm hoping it is, which is one reason I'm posting here. Curious if there are any known gotchas.
I think this could be a disruptive technology in the network scalability world, and it's a real-world showcase of CRIU's power.
----------------------------
As for the diskless migration, I think I may implement it eventually, but the container is only 8 MB in size, and the process is similarly sized. With haproxy it's all about the network connections; there's not much memory usage. So it was a fair amount of extra steps without much benefit for the first pass.
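For reference, the diskless variant Pavel suggests below boils down to running a page server on the destination; a sketch following the criu.org page, with host and port as placeholders:

    # on the destination (n3): accept memory pages directly into
    # the images dir, skipping the source's disk
    criu page-server --images-dir /tmp/haproxy-img --port 9876

    # on the source (n1): dump, streaming pages to n3
    criu dump -t $(pidof haproxy) -D /tmp/haproxy-img \
        --tcp-established --page-server --address n3 --port 9876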
Thanks,
Kevin
________________________________________
From: Pavel Emelyanov [xemul at virtuozzo.com]
Sent: Thursday, April 07, 2016 5:42 AM
To: Fox, Kevin M; criu at openvz.org
Subject: Re: [CRIU] HAProxy + CRIU
On 04/06/2016 10:51 PM, Fox, Kevin M wrote:
> I've contributed a script that makes HAProxy easily migrate with CRIU.
Wow! :) Can you shed a little bit more light on this? What's the real-life
usage you plan for this?
> Script here:
> https://github.com/openstack/osops-tools-contrib/blob/master/multi/superhaproxy
> Commit here:
> https://github.com/openstack/osops-tools-contrib/commit/bd711d693c6ce07a203fec5978c7c4e2cc07dcd4
>
> I'm curious if the CRIU team knows of any issues that may show with the approach?
Well, at first glance it looks sane. One thing I can point
out from the very beginning is that it makes sense to send
pages directly to the restore side without putting them into a
local dir. See https://criu.org/Disk-less_migration
More comments/ideas will come once we understand the use-case better :)
-- Pavel