[CRIU] [p.haul] Tarfile stream exception when migrating LXC container

Pavel Emelyanov xemul at virtuozzo.com
Mon Aug 7 18:07:18 MSK 2017


On 07/24/2017 06:47 PM, Lucas Ouellette-Falkenstein wrote:
> Greetings,

Hi! Sorry for the long silence, I was on vacation for the last two weeks :)

> I've spent some time trying to get the LXC container transfer with the WebUI working based on older
> patches (https://lists.openvz.org/pipermail/criu/2016-June/029858.html).
> However, I have been having some trouble with images and config file transfer.
> The simple and obvious way, doing something with rsync/scp/etc works perfectly fine.

The thing is that p.haul's idea was to use only the given streams for data transfer, and
sending a tarball over them was the simplest way.
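
To make the idea concrete, here is a minimal sketch of a tar stream over an already-open socket. It is not the actual images.py/util.py code: send_tree and recv_tree are made-up names, and the file name in the usage note is only a placeholder. Note that the receiver only learns the archive is complete when the sender closes its end.

```python
import io
import socket
import tarfile


def send_tree(sock, files):
    """Write {name: bytes} as an uncompressed tar stream, then close the socket."""
    with sock.makefile("wb") as wfile:
        with tarfile.open(mode="w|", fileobj=wfile) as tf:
            for name, data in files.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tf.addfile(info, io.BytesIO(data))
    # Closing the write side is the only end-of-transfer marker in this scheme.
    sock.shutdown(socket.SHUT_WR)
    sock.close()


def recv_tree(sock):
    """Read a tar stream from the socket until EOF and return {name: bytes}."""
    received = {}
    with sock.makefile("rb") as rfile:
        with tarfile.open(mode="r|", fileobj=rfile) as tf:
            for member in tf:
                # In stream mode each member must be read before moving on.
                received[member.name] = tf.extractfile(member).read()
    return received
```

For a local try-out, socket.socketpair() gives two connected ends: run send_tree(a, {"pages-1.img": b"..."}) in a thread and call recv_tree(b) in the main thread.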

Actually, live migration with FS transfer is not nice by itself, but that's another story.

> phaul however, uses transfer over socket with a tar stream in images.py and util.py, and it produces
> an error but only for LXC containers, seems LXC support is broken.
> 
> It fails with the following trace:
> 
> 
> "Traceback (most recent call last):
>   File "/media/sf_Code/p.haul/phaul/images.py", line 46, in run
>     tf = tarfile.open(mode="r|", fileobj=tf_fileobj)
>   File "/usr/lib64/python2.7/tarfile.py", line 1705, in open
>     t = cls(name, filemode, stream, **kwargs)
>   File "/usr/lib64/python2.7/tarfile.py", line 1587, in __init__
>     self.firstmember = self.next()
>   File "/usr/lib64/python2.7/tarfile.py", line 2370, in next
>     raise ReadError(str(e))
> ReadError: bad checksum"
> 
> 
> At first I thought it was an issue with reading raw image files like pages-%d.img, but it seems that no matter
> what it will hang or throw an exception on an LXC container.
> I don't see any differences in LXC dumps vs non-container process dumps besides the size/number of images dumped,
> and I ruled that out by trying to transfer less/smaller images.
> 
> Chasing the execution with a debugger indicates that eventually it reads a 512-byte block of null bytes from the socket
> which it expects to contain a checksum.
> Of course "\0\0\0\0" is not in the list of valid checksums.
> This results in a HeaderError ("e" in the above snippet) which is thrown, caught, and rethrown as the ReadError.
> 
> Changing it from an uncompressed tar stream to a compressed tar stream (r|gz & w|gz and r|bz2 & w|bz2) does change
> the error a little.
> Instead phaul hangs on socket.recv in util.py. I can try other compression types but don't think it'll change.

The problem with the current code is that the tarfile is sent as a raw sequence of bytes and then gets closed.
So if you mix channels (e.g. pass the memory channel as the fs channel), the logic breaks.
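
As an illustration only (I'm not claiming this is what happens in the reported case), the sketch below shows how the tarfile reader reacts when the first 512 bytes it gets are not a tar header, for example because they belong to some other data. The block is hand-made so that its checksum field is empty while another byte is set; first_header_error and stray_block are made-up names.

```python
import io
import tarfile

BLOCK = 512  # size of one tar header block


def first_header_error(raw):
    """Read raw bytes as an uncompressed tar stream; return the error text or None."""
    try:
        with tarfile.open(mode="r|", fileobj=io.BytesIO(raw)) as tf:
            for _ in tf:
                pass
    except tarfile.ReadError as err:
        return str(err)
    return None


# One stray non-NUL byte, with the checksum field (bytes 148..155) left as NULs:
# the stored checksum parses as 0 and cannot match the computed one.
stray_block = b"a" + b"\0" * (BLOCK - 1)

print(first_header_error(stray_block + b"\0" * BLOCK))
```

With the Python 3 tarfile module this reports "bad checksum", the same message as in the traceback above.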

> I don't know why transfer was done with a tar stream, maybe because it was supposed to be a simple way to send 
> over socket instead of designing a protocol and error handling?

Exactly :) P.haul (the python p.haul) is rather a proof of concept that shows what things should look like.
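
For comparison, the smallest possible "protocol" would be a length prefix in front of every payload, so that the receiver knows where one transfer ends without the channel being closed. This is just a sketch of the general technique, not something p.haul does; send_frame, recv_exact and recv_frame are made-up names.

```python
import socket
import struct

HEADER = struct.Struct("!Q")  # 8-byte big-endian payload length


def send_frame(sock, payload):
    """Send one payload preceded by its length."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Receive exactly n bytes or raise if the peer closes early."""
    chunks = []
    while n > 0:
        chunk = sock.recv(min(n, 65536))
        if not chunk:
            raise EOFError("peer closed the connection mid-frame")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock):
    """Receive one length-prefixed payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

Several frames can then follow each other on the same socket, at the cost of knowing (or buffering) each payload's size up front, which a plain tar stream avoids.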

> Previous commits (e.g. 5fff02f, 9944e3f, 7c338ef) have message logs indicating the current file transfer method
> has been a problem in the past, and a rewrite might be worthwhile.
> 
> Adding a new, more tested library for image transfer might be best, though that's not for me to decide, not sure
> what platforms phaul supports and what is used in production.
> There was a comment in a commit log (9944e3f I believe) that CRIU *might* add a feature to send images using the
> socket it opens, that hasn't been added right? Does not seem so.
> Otherwise it might be sensible to add a phaul argument to choose the file transfer method (tar stream [default],
> rsync, etc)

Yes, the way p.haul transfers the FS is quite raw and needs heavy fixing. Actually, the whole python
version of p.haul causes more problems than it solves. The biggest issue with it is that it's written in
python :) and thus is hard to integrate with. To mitigate this we've started the go version of this code, which
now sits in the criu/ sources (criu-dev branch). All the new stuff (lazy migration, image-cache and -proxy) is about
to go there, and the python version will thus be turned into a CLI on top of the Go core. So since we're about
to start some real work on FS migration, I'd appreciate it if we could do it by extending the go p.haul
code.

> Any advice would be greatly appreciated.
> 
> Thankful for your time,
> Lucas Ouellette-Falkenstein


