[CRIU] [p.haul] Tarfile stream exception when migrating LXC container
Lucas Ouellette-Falkenstein
louellet at redhat.com
Mon Jul 24 18:47:27 MSK 2017
Greetings,
I've spent some time trying to get LXC container migration via the WebUI working, based on older patches (https://lists.openvz.org/pipermail/criu/2016-June/029858.html).
However, I have been having some trouble transferring the images and config files.
The simple and obvious approach, copying the files with rsync, scp, etc., works perfectly fine.
phaul, however, transfers them over a socket as a tar stream (in images.py and util.py), and that produces an error, but only for LXC containers; it seems LXC support is broken.
It fails with the following trace:
"Traceback (most recent call last):
  File "/media/sf_Code/p.haul/phaul/images.py", line 46, in run
    tf = tarfile.open(mode="r|", fileobj=tf_fileobj)
  File "/usr/lib64/python2.7/tarfile.py", line 1705, in open
    t = cls(name, filemode, stream, **kwargs)
  File "/usr/lib64/python2.7/tarfile.py", line 1587, in __init__
    self.firstmember = self.next()
  File "/usr/lib64/python2.7/tarfile.py", line 2370, in next
    raise ReadError(str(e))
ReadError: bad checksum"
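For reference, here is a minimal Python 3 sketch of the general technique phaul relies on: streaming a tar archive over a socket with tarfile's "w|" and "r|" modes. It is not phaul's code; the function names (send_files, recv_files, roundtrip), the in-memory {name: bytes} dictionary, and the use of socket.socketpair() are assumptions made so the example is self-contained. It shows two details stream mode depends on: the sender closes the TarFile, which writes the end-of-archive blocks, and then half-closes the socket so the reader also sees EOF.

```python
import io
import socket
import tarfile
import threading


def send_files(sock, files):
    """Write {name: bytes} to sock as an uncompressed "w|" tar stream."""
    with sock.makefile("wb") as wfile:
        # Closing the TarFile appends the end-of-archive blocks and flushes.
        with tarfile.open(mode="w|", fileobj=wfile) as tf:
            for name, data in files.items():
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tf.addfile(info, io.BytesIO(data))
    # Half-close: the peer sees EOF, but sock itself stays open.
    sock.shutdown(socket.SHUT_WR)


def recv_files(sock):
    """Read a "r|" tar stream from sock and return {name: bytes}."""
    received = {}
    with sock.makefile("rb") as rfile:
        with tarfile.open(mode="r|", fileobj=rfile) as tf:
            for member in tf:  # stream mode: members arrive strictly in order
                fobj = tf.extractfile(member)
                if fobj is not None:  # None for non-regular members such as directories
                    received[member.name] = fobj.read()
    return received


def roundtrip(files):
    """Push files through a local socket pair and return what arrived."""
    a, b = socket.socketpair()
    try:
        sender = threading.Thread(target=send_files, args=(a, files))
        sender.start()
        received = recv_files(b)
        sender.join()
        return received
    finally:
        a.close()
        b.close()
```

For example, roundtrip({"inventory.img": b"demo"}) should hand back the same dictionary; the names and contents are dummies, not real CRIU images.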
At first I thought it was an issue with reading raw image files like pages-%d.img, but it seems that whatever I send, it hangs or throws an exception for an LXC container.
I don't see any differences between LXC dumps and non-container process dumps besides the size and number of images, and I ruled that out by transferring fewer and smaller images.
Stepping through the execution with a debugger shows that it eventually reads a 512-byte block of null bytes from the socket, where it expects a header containing a checksum.
Of course "\0\0\0\0" is not in the list of valid checksums.
This results in a HeaderError ("e" in the above snippet) which is thrown, caught, and rethrown as the ReadError.
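A small Python 3 sketch can reproduce the check that fails here. The ustar header checksum is the unsigned sum of all 512 header bytes with the 8-byte chksum field (offsets 148-155) counted as ASCII spaces, so a NUL chksum field parses as 0, while the unsigned sum is always at least 256. Note that Python's tarfile treats a block that is entirely NUL as the end-of-archive marker rather than a checksum failure, so "bad checksum" on the first header suggests that block was not completely NUL. A reader that is misaligned with the stream is one way to get there; the sketch prepends 10 junk bytes to show this, but whether phaul's stream is actually misaligned is only a guess. The helper names are hypothetical, and the sketch ignores the signed-sum variant that tarfile also accepts.

```python
import io
import tarfile

BLOCKSIZE = 512  # every tar header occupies one 512-byte block


def header_checksum(block):
    """Unsigned ustar checksum: sum of the 512 header bytes, with the
    8-byte chksum field (offsets 148-155) counted as ASCII spaces."""
    if len(block) != BLOCKSIZE:
        raise ValueError("a tar header is exactly 512 bytes")
    return sum(block[:148]) + 8 * ord(" ") + sum(block[156:])


def stored_checksum(block):
    """Parse the octal chksum field; an all-NUL field parses as 0."""
    field = block[148:156].split(b"\0", 1)[0].strip()
    return int(field, 8) if field else 0


def header_is_valid(block):
    return stored_checksum(block) == header_checksum(block)


def make_stream(prefix=b""):
    """Build a one-member "w|" tar stream, optionally preceded by junk."""
    buf = io.BytesIO()
    with tarfile.open(mode="w|", fileobj=buf) as tf:
        info = tarfile.TarInfo("demo.img")  # dummy member
        info.size = 4
        tf.addfile(info, io.BytesIO(b"demo"))
    return prefix + buf.getvalue()


def opens_cleanly(data):
    """True if tarfile can parse the stream's headers without ReadError."""
    try:
        with tarfile.open(mode="r|", fileobj=io.BytesIO(data)) as tf:
            tf.getmembers()
        return True
    except tarfile.ReadError:
        return False
```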
Switching from an uncompressed tar stream to a compressed one (r|gz & w|gz, and r|bz2 & w|bz2) changes the behavior a little: instead of the exception, phaul hangs in socket.recv in util.py. I could try other compression types, but I don't think it would make a difference.
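One possible explanation for the hang, not verified against phaul's util.py, is that the reader is waiting for bytes that never arrive. tarfile's stream mode reads the underlying file object in fixed-size records (10240 bytes by default), and if that file object only returns once it has the full amount or sees EOF, as socket.makefile("rb") does, the last read of a compressed stream (which is generally not a multiple of the record size) completes only at EOF. An uncompressed stream is padded to whole records, which may be why it does not hang the same way. This Python 3 sketch (hypothetical helper names, standard library only) runs a "w|gz"/"r|gz" transfer over socket.socketpair() that completes when the sender half-closes the socket and stalls until a timeout when it does not.

```python
import io
import socket
import tarfile
import threading


def send_gz(sock, name, data, half_close=True):
    """Send one file as a "w|gz" tar stream over sock."""
    with sock.makefile("wb") as wfile:
        # Closing the TarFile writes the end-of-archive blocks and the
        # gzip trailer; without it the reader never sees a complete stream.
        with tarfile.open(mode="w|gz", fileobj=wfile) as tf:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    if half_close:
        # Signal EOF to the peer while keeping the socket itself open.
        sock.shutdown(socket.SHUT_WR)


def recv_gz(sock, timeout):
    """Receive a "r|gz" stream; raises socket.timeout if the data stalls."""
    sock.settimeout(timeout)
    with sock.makefile("rb") as rfile:
        with tarfile.open(mode="r|gz", fileobj=rfile) as tf:
            return {m.name: tf.extractfile(m).read() for m in tf if m.isfile()}


def gz_roundtrip(data, half_close, timeout=1.0):
    """Return {name: bytes} as received, or None if the reader timed out."""
    a, b = socket.socketpair()
    try:
        sender = threading.Thread(
            target=send_gz, args=(a, "demo.img", data, half_close))
        sender.start()
        try:
            received = recv_gz(b, timeout)
        except socket.timeout:
            received = None
        sender.join()
        return received
    finally:
        a.close()
        b.close()
```

With half_close=True the transfer should complete; with half_close=False the reader should block until the timeout, which looks like the same kind of hang in recv.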
I don't know why the transfer was implemented as a tar stream; perhaps it was meant as a simple way to send files over a socket without designing a protocol and error handling?
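If a rewrite went the "design a protocol" route, the core could stay small. This is a minimal Python 3 sketch of a hypothetical length-prefixed framing, not anything phaul or CRIU defines: each file is sent as a 4-byte big-endian name length, the UTF-8 name, an 8-byte size, and the raw bytes, and a zero name length ends the transfer. Because every read asks for an exact byte count, a transfer cut short by the peer closing raises ConnectionError instead of a misleading checksum error. The names send_framed, recv_framed, and recv_exact are made up for illustration.

```python
import socket
import struct
import threading

_NAME_LEN = struct.Struct("!I")  # 4-byte big-endian name length
_FILE_LEN = struct.Struct("!Q")  # 8-byte big-endian file size


def recv_exact(sock, n):
    """Read exactly n bytes or raise ConnectionError on early EOF."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_framed(sock, files):
    """Send {name: bytes}, then a zero-length name as the end marker."""
    for name, data in files.items():
        encoded = name.encode("utf-8")
        if not encoded:
            raise ValueError("empty names are reserved for the end marker")
        sock.sendall(_NAME_LEN.pack(len(encoded)) + encoded)
        sock.sendall(_FILE_LEN.pack(len(data)))
        sock.sendall(data)
    sock.sendall(_NAME_LEN.pack(0))


def recv_framed(sock):
    """Receive files until the end marker and return {name: bytes}."""
    files = {}
    while True:
        (name_len,) = _NAME_LEN.unpack(recv_exact(sock, _NAME_LEN.size))
        if name_len == 0:
            return files
        name = recv_exact(sock, name_len).decode("utf-8")
        (size,) = _FILE_LEN.unpack(recv_exact(sock, _FILE_LEN.size))
        files[name] = recv_exact(sock, size)
```

It keeps whole files in memory for brevity; a real implementation would stream image files from disk in chunks.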
Previous commits (e.g. 5fff02f, 9944e3f, 7c338ef) have commit messages indicating that the current file transfer method has caused problems before, so a rewrite might be worthwhile.
Adopting a new, better-tested library for image transfer might be best, though that's not my decision to make; I'm not sure which platforms phaul supports or what is used in production.
A commit message (9944e3f, I believe) mentioned that CRIU *might* add a feature to send images over the socket it opens. That hasn't been added, right? It doesn't seem so.
Otherwise, it might be sensible to add a phaul argument to choose the file transfer method (tar stream [default], rsync, etc.).
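A transfer-method switch could look roughly like this Python 3 sketch using argparse. The --transfer-method flag, its values, and rsync_command() are hypothetical, not existing phaul options. The function only builds an rsync argument list, using rsync's -a archive flag and host:path destination syntax, and does not run it.

```python
import argparse

# Hypothetical values for a hypothetical --transfer-method option.
TRANSFER_METHODS = ("tar", "rsync")


def build_parser():
    parser = argparse.ArgumentParser(description="transfer-method sketch")
    parser.add_argument(
        "--transfer-method",
        choices=TRANSFER_METHODS,
        default="tar",
        help="how dump images reach the destination (default: %(default)s)",
    )
    return parser


def rsync_command(images_dir, dest_host, dest_dir):
    """Build (but do not run) an rsync invocation for an images directory."""
    # -a (archive mode) preserves permissions and timestamps, among other
    # attributes; the trailing slash copies the directory's contents.
    source = images_dir.rstrip("/") + "/"
    return ["rsync", "-a", source, "%s:%s" % (dest_host, dest_dir)]
```

Dispatching on args.transfer_method would then choose between the existing tar-stream path and running the rsync command, for example with subprocess.run(cmd, check=True).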
Any advice would be greatly appreciated.
Thankful for your time,
Lucas Ouellette-Falkenstein