[CRIU] checkpoint/restore of a 32bit application on arm64
andi
andi.platschek at gmail.com
Mon Oct 15 14:55:34 MSK 2018
Hi all,
over the weekend I got to the point where I can actually dump and restore a 32bit application on my
64bit arm system (with a 32bit criu). However, the application that I am able to dump is very minimal:
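int main(int argc, char* argv[])
{
    unsigned int count = 0; /* initialized; unsigned so wrap-around is well defined */

    /* busy loop without any library call */
    while (1) {
        count++;
    }
}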
As soon as I add a library call inside the loop, the restored application crashes:
[ 256.988318] hello[1931]: unhandled level 3 translation fault (11) at 0x00000000, esr 0x82000007
[ 256.988321] pgd = ffff80083d2a2000
[ 256.988323] [00000000] *pgd=000000083d2a3003
[ 256.988325] , *pud=000000083d2a4003
[ 256.988326] , *pmd=000000083d2a5003
[ 256.988328] , *pte=0000000000000000
[ 256.988329]
[ 256.988330]
[ 256.988335] CPU: 0 PID: 1931 Comm: hello Not tainted 4.9.0-artech #108
[ 256.988337] Hardware name: rwzweiCAB4GS08W001p01_f_f (DT)
[ 256.988339] task: ffff80087017e000 task.stack: ffff80083ca74000
[ 256.988343] PC is at 0x0
[ 256.988345] LR is at 0xf7572d40
[ 256.988347] pc : [<0000000000000000>] lr : [<00000000f7572d40>] pstate: 00000010
[ 256.988348] sp : 00000000ffc39db0
[ 256.988350] x12: 00000000000f4240
[ 256.988352] x11: 00000000ffc39dd4 x10: 00000000f7616fac
[ 256.988356] x9 : 0000000000000000 x8 : 0000000000000000
[ 256.988360] x7 : 0000000000000000 x6 : 0000000000010348
[ 256.988363] x5 : 0000000000000000 x4 : 0000000000000000
[ 256.988367] x3 : 0000000000000000 x2 : 00000000000003e8
[ 256.988370] x1 : 0000000000000000 x0 : 0000000000000000
[ 256.988373]
*unless* the call is before the while(1) loop. This gives me an application that I can
use to compare the state before and after restore. So far I have identified a couple of problems:
-> it seems that argv is not restored (/proc/<PID>/cmdline is wrong) -- any pointers on where to look?
A quick search for "cmdline" and "argv" did not show me where the restore happens.
-> the auxv was messed up -- I fixed this one; comparing /proc/<PID>/auxv before/after restore now looks good.
-> VmFlags slightly changed (is this expected behavior?) -- most of them are correct, here is the
diff between before/after restore:
# diff -y hello_before hello_after | grep VmFlags | grep "|"
VmFlags: rd ex mr mw me dw sd | VmFlags: rd ex mr mw me ac sd
VmFlags: rd mr mw me dw ac sd | VmFlags: rd mr mw me ac sd
VmFlags: rd wr mr mw me dw ac sd | VmFlags: rd wr mr mw me ac sd
VmFlags: rd ex mr mw me sd | VmFlags: rd ex mr mw me ac sd
VmFlags: mr mw me sd | VmFlags: mr mw me ac sd
VmFlags: rd ex mr mw me dw sd | VmFlags: rd ex mr mw me ac sd
VmFlags: rd mr mw me dw ac sd | VmFlags: rd mr mw me ac sd
VmFlags: rd wr mr mw me dw ac sd | VmFlags: rd wr mr mw me ac sd
VmFlags: rd wr mr mw me gd ac | VmFlags: rd wr mr mw me gd ac sd
So it is mostly dw dropped and/or ac added, plus one additional sd (last line). My first guess
would be that this is a problem that should be fixed, but that it won't lead to the crash above?
-> the second problem in the mappings is that the Rss, Pss, etc. values are quite different:
# diff -y hello_before hello_after | grep -v VmFlags | grep "|"
Anonymous: 0 kB | Anonymous: 4 kB
Rss: 788 kB | Rss: 0 kB
Pss: 135 kB | Pss: 0 kB
Shared_Clean: 788 kB | Shared_Clean: 0 kB
Referenced: 788 kB | Referenced: 0 kB
Rss: 8 kB | Rss: 0 kB
Pss: 8 kB | Pss: 0 kB
Private_Dirty: 8 kB | Private_Dirty: 0 kB
Referenced: 8 kB | Referenced: 0 kB
Anonymous: 8 kB | Anonymous: 0 kB
Rss: 4 kB | Rss: 0 kB
Pss: 4 kB | Pss: 0 kB
Private_Dirty: 4 kB | Private_Dirty: 0 kB
Referenced: 4 kB | Referenced: 0 kB
Anonymous: 4 kB | Anonymous: 0 kB
Rss: 8 kB | Rss: 0 kB
Pss: 8 kB | Pss: 0 kB
Private_Dirty: 8 kB | Private_Dirty: 0 kB
Referenced: 8 kB | Referenced: 0 kB
Anonymous: 8 kB | Anonymous: 0 kB
Rss: 128 kB | Rss: 0 kB
Pss: 18 kB | Pss: 0 kB
Shared_Clean: 128 kB | Shared_Clean: 0 kB
Referenced: 128 kB | Referenced: 0 kB
Rss: 8 kB | Rss: 0 kB
Pss: 8 kB | Pss: 0 kB
Private_Dirty: 8 kB | Private_Dirty: 0 kB
Referenced: 8 kB | Referenced: 0 kB
Anonymous: 8 kB | Anonymous: 0 kB
Rss: 4 kB | Rss: 0 kB
Pss: 4 kB | Pss: 0 kB
Private_Dirty: 4 kB | Private_Dirty: 0 kB
Referenced: 4 kB | Referenced: 0 kB
Anonymous: 4 kB | Anonymous: 0 kB
Rss: 4 kB | Rss: 0 kB
Pss: 4 kB | Pss: 0 kB
Private_Dirty: 4 kB | Private_Dirty: 0 kB
Referenced: 4 kB | Referenced: 0 kB
Anonymous: 4 kB | Anonymous: 0 kB
Rss: 8 kB | Rss: 4 kB
Pss: 8 kB | Pss: 4 kB
Private_Dirty: 8 kB | Private_Dirty: 4 kB
Referenced: 8 kB | Referenced: 4 kB
Anonymous: 8 kB | Anonymous: 4 kB
I *think* this is the problem to fix first to get going
[ 294.336197] BUG: Bad rss-counter state mm:ffff800871f463c0 idx:1 val:-18
[ 294.336201] BUG: Bad rss-counter state mm:ffff800871f463c0 idx:3 val:-1
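For the VmFlags comparison further up, the before/after flag sets can also be diffed mechanically instead of by eye. The following is only a minimal sketch in C under my own assumptions: the helper names flags_has() and flags_missing() are made up for illustration (they are not part of CRIU), and the inputs are the flag lists exactly as they appear after "VmFlags:" in /proc/<PID>/smaps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `flag` occurs as a whole word in the space-separated list `flags`. */
int flags_has(const char *flags, const char *flag)
{
    size_t n = strlen(flag);
    const char *p = flags;

    while ((p = strstr(p, flag)) != NULL) {
        int starts = (p == flags) || (p[-1] == ' ');
        int ends = (p[n] == '\0') || (p[n] == ' ');

        if (starts && ends)
            return 1;
        p++;
    }
    return 0;
}

/*
 * Collect every flag that is present in `a` but absent from `b` into `out`
 * (space separated). Returns the number of flags written.
 * Hypothetical helper: call it once as (before, after) to get the dropped
 * flags and once as (after, before) to get the added ones.
 */
int flags_missing(const char *a, const char *b, char *out, size_t outsz)
{
    char copy[256];
    char *tok;
    int count = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';

    /* strtok() modifies its input, so work on a private copy */
    strncpy(copy, a, sizeof(copy) - 1);
    copy[sizeof(copy) - 1] = '\0';

    for (tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (flags_has(b, tok))
            continue;
        /* need room for an optional separator, the flag and the NUL */
        if (strlen(out) + strlen(tok) + 2 > outsz)
            break;
        if (count > 0)
            strcat(out, " ");
        strcat(out, tok);
        count++;
    }
    return count;
}
```

Calling flags_missing(before, after, ...) on one pair of lines from the diff gives the dropped flags, and swapping the two arguments gives the added ones.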
Any hints on any of these issues are much appreciated!
Many thanks!
andi
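P.S.: Since /proc/<PID>/auxv is binary, a small decoder makes the before/after comparison mentioned above easier to read. This is a rough sketch in C, not CRIU code: the names auxv32_entry, auxv32_decode() and auxv32_lookup() are hypothetical, and it assumes that the file of a 32bit task holds pairs of 32bit words (the Elf32_auxv_t layout) in host byte order, terminated by an AT_NULL entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AUXV32_AT_NULL 0 /* same value as AT_NULL in <elf.h> */

/* One auxv entry as a 32bit process sees it (assumed Elf32_auxv_t layout). */
struct auxv32_entry {
    uint32_t type;
    uint32_t value;
};

/*
 * Decode a raw auxv image (e.g. the bytes read from /proc/<PID>/auxv)
 * into `out`. Decoding stops at the AT_NULL terminator, at the end of the
 * buffer, or after `max` entries. Returns the number of entries stored,
 * not counting AT_NULL.
 */
size_t auxv32_decode(const unsigned char *buf, size_t len,
                     struct auxv32_entry *out, size_t max)
{
    size_t off = 0;
    size_t n = 0;

    assert(buf != NULL && out != NULL);

    while (off + sizeof(struct auxv32_entry) <= len && n < max) {
        struct auxv32_entry e;

        /* memcpy avoids alignment assumptions about `buf` */
        memcpy(&e, buf + off, sizeof(e));
        off += sizeof(e);

        if (e.type == AUXV32_AT_NULL)
            break;
        out[n++] = e;
    }
    return n;
}

/* Look up one entry by type. Returns 1 and stores the value if found, else 0. */
int auxv32_lookup(const struct auxv32_entry *v, size_t n,
                  uint32_t type, uint32_t *value)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (v[i].type == type) {
            *value = v[i].value;
            return 1;
        }
    }
    return 0;
}
```

To use it, read the whole file into a buffer with fread(), decode it once before the dump and once after the restore, and compare the two arrays entry by entry.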
On 10/9/18 6:15 PM, Dmitry Safonov wrote:
> On Tue, 9 Oct 2018 at 14:53, Cyrill Gorcunov <gorcunov at gmail.com> wrote:
>> On Tue, Oct 09, 2018 at 01:36:30PM +0200, andi wrote:
>>> Hi all,
>>>
>>> I was wondering if anyone has ever looked into the possibility of supporting CONFIG_COMPAT for arm?
>> Hi Andi. No one I know of is currently working on compat mode for arm.
>>
>>> P.S.: In case someone is wondering at what point the restore fails:
>>>
>>> (00.055449) pie: 376: Error (criu/pie/restorer.c:1482): sys_prctl(PR_SET_MM, PR_SET_MM_MAP) failed with -14
>>> (00.055471) pie: 376: Error (criu/pie/restorer.c:1700): Restorer fail 376
>>> (00.055515) Error (criu/cr-restore.c:2266): Restoring FAILED.
>> Seems like it is EFAULT, so some value could not be copied from userspace.
>> Hard to tell what exactly is happening since someone with good arm knowledge
>> is needed here.
> Just for the reference another thread about it a year ago:
> https://lists.openvz.org/pipermail/criu/2017-December/040025.html
>