[Devel] Re: multi-threaded app fails to restart
Oren Laadan
orenl at cs.columbia.edu
Tue Jul 20 16:12:40 PDT 2010
Hi John
In your program, it is a thread of the root task (of the hierarchy)
that is missed. Indeed, the previous patch was incomplete: it fixed
the non-root-threads case but broke the root-threads case.
That was silly... well, can you try this little patch:
Thanks for following up, it was very helpful!
Oren.
---
diff --git a/kernel/checkpoint/sys.c b/kernel/checkpoint/sys.c
index 171c867..3288af0 100644
--- a/kernel/checkpoint/sys.c
+++ b/kernel/checkpoint/sys.c
@@ -605,13 +605,13 @@ int walk_task_subtree(struct task_struct *root,
continue;
}
+ /* if not last thread - proceed with thread */
+ task = next_thread(task);
+ if (!thread_group_leader(task))
+ continue;
+
/* by definition, skip siblings of root */
while (task != root) {
- /* if not last thread - proceed with thread */
- task = next_thread(task);
- if (!thread_group_leader(task))
- break;
-
/* if has sibling - proceed with sibling */
if (!list_is_last(&task->sibling, &parent->children)) {
task = list_entry(task->sibling.next,
---
On Tue, 20 Jul 2010, John Paul Walters wrote:
> >
> > Hi John,
> >
> > I just pushed a few more fixes related to signals to ckpt-v22-dev.
> > Can you please see if they fix your problem?
> >
> > Also, can you please post the test program that you are using, so
> > we can try to replicate the problem?
> >
> > Note that it is usually ok for sys_restart() to return -512
> > (-ERESTARTSYS): it means that the process/thread was interrupted
> > in a system call when the checkpoint was taken, and it will now
> > retry that same syscall.
> >
> > You can use the -F (--freezer) switch of restart(1) to freeze the
> > restarted tasks/threads before they are allowed to run in userspace.
> > Using it, you can tell whether the other thread dies immediately
> > after restart, or is not restarted at all.
> >
> > Thanks,
> >
> > Oren.
> >
>
> Hi Oren,
>
> I grabbed the most recent v22-dev that includes the updates. I'm
> still experiencing the same issue. Testing with -F indicates that the
> second thread isn't being restarted. The code that I'm using is:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <pthread.h>
> #include <unistd.h>
>
> #define OUTFILE "/tmp/cr-self.out"
>
> /* Worker thread: appends an increasing counter to OUTFILE every 2 s. */
> void *
> func (void *arg)
> {
>     FILE *file;
>     int counter = 0;
>
>     (void) arg;  /* unused */
>
>     file = fopen(OUTFILE, "w+");
>     if (file == NULL)
>         return NULL;  /* stderr is closed, so fail silently */
>
>     while (1) {
>         sleep(2);
>         counter++;
>         fprintf(file, "Count %d\n", counter);
>         fflush(file);
>     }
>
>     return NULL;
> }
>
> int
> main (void)
> {
>     pthread_t thread;
>
>     /* Close the standard descriptors; only OUTFILE is used for output. */
>     close (0);
>     close (1);
>     close (2);
>     unlink (OUTFILE);
>
>     /* Second thread of the root task; main() blocks until it exits. */
>     if (pthread_create(&thread, NULL, func, NULL) != 0)
>         return EXIT_FAILURE;
>     pthread_join(thread, NULL);
>     return EXIT_SUCCESS;
> }
>
> Thanks for your help,
> JP
>
>
_______________________________________________
Containers mailing list
Containers at lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers