    NoobTracker
    @NoobTracker

    Try setenforce 0 for a quick check to see if disabling selinux helps

    That works!

    NoobTracker
    @NoobTracker
    I'm trying to snapshot a program running through WINE; it has a memory footprint of ~100M when executed on my Windows machine. CRIU gets stuck after a second. Taking the snapshot should finish within a few minutes, right?
    The CPU usage is nonexistent, I think it's just frozen ... odd
    NoobTracker
    @NoobTracker
    It only fails/freezes if the Windows application is running
    Should I show what CRIU logs before freezing?
    NoobTracker
    @NoobTracker
    CRIU can dump GUI applications, right?
    Pavel Tikhomirov
    @Snorch

    CRIU can dump GUI applications, right?

    In general, no; probably the only exception is VNC: https://criu.org/VNC

    Prajwal S N
    @snprajwal
    Hi all, a lot of work has gone into the go-criu library over the past few months, so we've opened an issue to plan a new release
    checkpoint-restore/go-criu#86
    Do drop a comment if there's anything that's been missed
    Younes Manton
    @ymanton

    I'm seeing the following build break on Fedora Rawhide:

    In file included from criu/pie/util.c:3:
    /usr/include/sys/mount.h:240:6: error: redeclaration of 'enum fsconfig_command'
      240 | enum fsconfig_command
          |      ^~~~~~~~~~~~~~~~
    In file included from /usr/include/sys/mount.h:32:
    criu/include/linux/mount.h:11:6: note: originally defined here
       11 | enum fsconfig_command {
          |      ^~~~~~~~~~~~~~~~

    I thought it was fixed by checkpoint-restore/criu#1943 but it looks like /usr/include/sys/mount.h has

         30 #ifdef __has_include
         31 # if __has_include ("linux/mount.h")
         32 #  include "linux/mount.h"
         33 # endif
         34 #endif

    which normally finds /usr/include/linux/mount.h but when building CRIU instead grabs criu/include/linux/mount.h. Was that the intention? @rst0git any idea?

    Pavel Tikhomirov
    @Snorch
    Hm, to me it looks like a bug in glibc... Maybe I'm mistaken, but if glibc uses #include "", it searches for mount.h in the criu directory, whereas commit 774058d729 ("linux: Fix sys/mount.h usage with kernel headers") originally expected it to find mount.h in the kernel source...
    Younes Manton
    @ymanton
    Yeah, I would think that a header intended to be installed in a central place should not use #include "".
    Pavel Tikhomirov
    @Snorch

    Reproduce:

    [snorch@turmoil test]$ cat include/linux/mount.h 
    enum fsconfig_command {
        FSCONFIG_SET_FLAG = 10,
    };
    [snorch@turmoil test]$ cat test.c 
    #include <stdio.h>
    /* Include new glibc sys/mount.h header */
    #include "/home/snorch/devel/general/glibc/sysdeps/unix/sysv/linux/sys/mount.h"
    
    int main () {
        printf("%d\n", FSCONFIG_SET_FLAG);
        return 0;
    }
    [snorch@turmoil test]$ gcc -o test -I include test.c
    In file included from test.c:2:
    /home/snorch/devel/general/glibc/sysdeps/unix/sysv/linux/sys/mount.h:240:6: error: redeclaration of ‘enum fsconfig_command’
      240 | enum fsconfig_command
          |      ^~~~~~~~~~~~~~~~
    In file included from /home/snorch/devel/general/glibc/sysdeps/unix/sysv/linux/sys/mount.h:32:
    include/linux/mount.h:1:6: note: originally defined here
        1 | enum fsconfig_command {
          |      ^~~~~~~~~~~~~~~~
    /home/snorch/devel/general/glibc/sysdeps/unix/sysv/linux/sys/mount.h:242:3: error: redeclaration of enumerator ‘FSCONFIG_SET_FLAG’
      242 |   FSCONFIG_SET_FLAG       = 0,    /* Set parameter, supplying no value */
          |   ^~~~~~~~~~~~~~~~~
    include/linux/mount.h:2:9: note: previous definition of ‘FSCONFIG_SET_FLAG’ with type ‘enum fsconfig_command’
        2 |         FSCONFIG_SET_FLAG = 10,
          |         ^~~~~~~~~~~~~~~~~

    Let's wait for second opinion on it.

    I probably should have used -iquote, but the error is the same either way
    Pavel Tikhomirov
    @Snorch
    @ymanton does checkpoint-restore/criu#1943 help?
    Radostin Stoyanov
    @rst0git
    We can see the same errors in our CI as well: https://github.com/checkpoint-restore/criu/runs/7963467477
    Younes Manton
    @ymanton
    @Snorch No, the above error is with those patches included.
    Younes Manton
    @ymanton
    Should criu/include/linux/mount.h be called criu/include/sys/mount.h instead since it's trying to provide the stuff from /usr/include/sys/mount.h not /usr/include/linux/mount.h?
    Radostin Stoyanov
    @rst0git
    @ymanton criu/include/linux/mount.h was introduced in commit: checkpoint-restore/criu@b5b1c4e
    This file was initially used to provide missing declarations from linux/mount.h, but the subfolder name (sys or linux) in the criu source tree does not make any difference.
    Radostin Stoyanov
    @rst0git
    @Snorch I am able to replicate the compilation errors from CI with fedora:rawhide container.
    Radostin Stoyanov
    @rst0git
    I've opened a pull request with a fix: checkpoint-restore/criu#1962
    Bui Quang Minh
    @minhbq-99
    Hi everyone, I'm trying to implement C/R support for cgroupv2 threaded controller which means threads in a process may belong to different controllers.
    As threads are cloned and restored later in the restorer, my idea is to create a service fd (cgroupd) working like usernsd that receives the cg_set number and the thread ID from the restored thread and then fixes up the thread's controller (by writing the thread ID to controller/cgroup.threads). However, AFAIK, the SCM_CREDENTIALS cmsg contains the process ID (thread group ID), not the thread ID. So how can we pass the thread ID across the namespace boundary?
    Pavel Tikhomirov
    @Snorch

    my idea is to create a service fd (cgroupd) working like usernsd

    Why not just use usernsd, e.g. see how userns_move works, but just give the tid in its arguments? (instead of using the one SCM_CREDENTIALS gives you)

    Bui Quang Minh
    @minhbq-99
    I think the reason for using SCM_CREDENTIALS is that it translates the pid of the caller (which may be in a pid namespace) into the outer pid namespace of the callee (usernsd). If we pass the tid directly from inside the pid namespace, it may not be the correct tid from the viewpoint of usernsd's outer pid namespace.
    Pavel Tikhomirov
    @Snorch
    Just send item->threads[i].real, as usernsd should be in the criu pidns. Update: this is probably unavailable on restore, but it should not be too hard to get it from proc.
    Bui Quang Minh
    @minhbq-99
    Thanks, I will look around and try to tackle that
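As an illustration of the "get it from proc" suggestion above, here is a minimal sketch of listing a process's thread IDs by reading /proc/<pid>/task. The helper name list_tids is hypothetical and is not part of CRIU; the IDs it returns are the ones visible in the pid namespace of the mounted procfs, which is an assumption to check against the namespace usernsd runs in.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a CRIU function): collect the thread IDs of
 * `pid` by listing /proc/<pid>/task. A pid <= 0 means "the calling process".
 * At most `max` IDs are stored in `tids`.
 * Returns the total number of threads found, or -1 if the directory
 * cannot be opened.
 */
int list_tids(pid_t pid, pid_t *tids, int max)
{
	char path[64];
	struct dirent *de;
	DIR *dir;
	int n = 0;

	if (pid <= 0)
		pid = getpid();

	snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);
	dir = opendir(path);
	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		char *end;
		long tid = strtol(de->d_name, &end, 10);

		/* Skip "." and ".." and anything that is not a plain number. */
		if (end == de->d_name || *end != '\0')
			continue;
		if (n < max)
			tids[n] = (pid_t)tid;
		n++;
	}

	closedir(dir);
	return n;
}
```

A caller would pass the pid of the restored task as seen from the side that reads procfs, then match the returned IDs against the threads it needs to move.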
    Younes Manton
    @ymanton
    Is it possible for a test program to know where the parasite blob lives? I'm trying to write a test that checks the parasite blob's stack, but I don't see any existing way to do that. Maybe the test has to dig around its proc, but I was hoping a nicer way existed
    SnaK
    @SallyKAN

    Hi everyone, I am trying to checkpoint a process, but I got this error message:

    [ff.checkpoint] (0.085s) criu> (00.035845) ----------------------------------------
    [ff.checkpoint] (0.085s) criu> (00.036034)
    [ff.checkpoint] (0.085s) criu> (00.036039) Dumping pages (type: 58 pid: 1000)
    [ff.checkpoint] (0.085s) criu> (00.036041) ----------------------------------------
    [ff.checkpoint] (0.085s) criu> (00.036772) Pagemap generated: 1792 pages (0 lazy) 0 holes
    [ff.checkpoint] (0.085s) criu> (00.039268) Error (criu/page-xfer.c:254): page-xfer: Unable to spice data: Broken pipe
    [ff.checkpoint] (0.085s) criu> (00.039285) Error (criu/bfd.c:132): bfd: Error flushing image: Broken pipe
    [ff.checkpoint] (0.085s) criu> (00.039361) ----------------------------------------
    [ff.checkpoint] (0.085s) criu> (00.039364) Error (criu/mem.c:644): Can't dump page with parasite
    [ff.checkpoint] (0.085s) criu> (00.039377) Error (criu/bfd.c:132): bfd: Error flushing image: Broken pipe
    [ff.checkpoint] (0.085s) criu> (00.041419) Error (criu/bfd.c:132): bfd: Error flushing image: Broken pipe
    [ff.checkpoint] (0.085s) criu> (00.041458) Unlock network
    [ff.checkpoint] (0.085s) criu> (00.041462) Unfreezing tasks into 1
    [ff.checkpoint] (0.085s) criu> (00.041547) Dismissing the image streamer
    [ff.checkpoint] (0.085s) criu> (00.041558) Error (criu/cr-dump.c:1792): Dumping FAILED.

    Also, I read the page explaining how CRIU checkpoints TCP connections (https://criu.org/TCP_connection), but it's more about how sockets are handled during the restore process.
    Can someone help me figure out why this error happens during the checkpoint process? Thanks in advance!

    I am also wondering whether CRIU can checkpoint a TCP socket with the Keep-Alive option.
    Adrian Reber
    @adrian:lisas.de
    @SallyKAN: the output looks unusual. How are you using CRIU?
    SnaK
    @SallyKAN
    Well, this is actually an open source tool called fastfreeze, which integrates the C/R functionality of CRIU; it prints the error log of the CRIU execution.
    It seems like the CRIU dump failed because of a TCP socket... when I removed the TCP connection, it worked fine...
    Adrian Reber
    @adrian:lisas.de
    @SallyKAN: maybe try it first without fast freeze. @nviennot is the author of fast freeze and is also reachable here (sometimes)
    SnaK
    @SallyKAN
    I don't quite understand here in the source code of CRIU, it seems like the splice method needs to write to the sockets of the process I am gonna checkpoint.
    /* local xfer */
    static int write_pages_loc(struct page_xfer *xfer, int p, unsigned long len)
    {
        ssize_t ret;
        ssize_t curr = 0;
    
        while (1) {
            ret = splice(p, NULL, img_raw_fd(xfer->pi), NULL, len - curr, SPLICE_F_MOVE);
            if (ret == -1) {
                pr_perror("Unable to spice data");
                return -1;
            }
            if (ret == 0) {
                pr_err("A pipe was closed unexpectedly\n");
                return -1;
            }
            curr += ret;
            if (curr == len)
                break;
        }
    
        return 0;
    }

    @SallyKAN: maybe try it first without fast freeze. @nviennot is the author of fast freeze and is also reachable here (sometimes)

    Thanks for your reply!

    Pavel Tikhomirov
    @Snorch

    I don't quite understand here in the source code of CRIU, it seems like the splice method needs to write to the sockets of the process I am gonna checkpoint.

    In the above code splice writes memory of your dumped processes to image files.

    SnaK
    @SallyKAN

    I don't quite understand here in the source code of CRIU, it seems like the splice method needs to write to the sockets of the process I am gonna checkpoint.

    In the above code splice writes memory of your dumped processes to image files.

    So does this mean that splice will read all the fds of my process first? I am trying to figure out under what circumstances it will throw the "page-xfer: Unable to spice data: Broken pipe" error.

    Pavel Tikhomirov
    @Snorch

    You can use https://github.com/Snorch/linux-helpers/blob/master/gftrace.sh like this

    perf probe -f '__x64_sys_splice%return $retval'
    bash ./gftrace.sh __x64_sys_splice

    Then reproduce the problem while the script is running.

    Depending on your kernel the exact traceable name of sys_splice may be different. Search for it in /sys/kernel/debug/tracing/available_filter_functions if needed.

    Then provide the output file (./trace); that would probably shed more light on what happens in your case.

    Normally EPIPE is returned if the other end of the pipe is closed, so we would never be able to send data to it or get data from it.
    Alternatively, there may be other error messages in the criu log and EPIPE may not be the actual problem, so please attach the full criu log; otherwise it is hard to help.
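To make the EPIPE remark concrete, here is a small standalone sketch that writes to a pipe after its read end has been closed. It uses plain write() rather than splice(), so it only illustrates the general "Broken pipe" condition and not the exact CRIU code path; the function name is made up for this example.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Write one byte into a pipe whose read end has already been closed and
 * return the errno reported by write() (0 if the write did not fail).
 * SIGPIPE is ignored for the duration of the call so that the process
 * receives the error code instead of being killed by the signal.
 */
int errno_after_write_to_closed_pipe(void)
{
	int fds[2];
	int err = 0;
	void (*old_handler)(int);

	if (pipe(fds) < 0)
		return errno;

	old_handler = signal(SIGPIPE, SIG_IGN);

	close(fds[0]);                 /* the reader goes away */
	if (write(fds[1], "x", 1) < 0) /* the writer now fails */
		err = errno;

	close(fds[1]);
	signal(SIGPIPE, old_handler);  /* restore the previous disposition */
	return err;
}
```

If the consumer on the other end of a pipe exits early, the producer sees this same error, which is why the earlier messages in the full log usually matter more than the EPIPE line itself.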
    Zeyad Yasser
    @ZeyadYasser

    Hello Everyone,
    How do I make CRIU dump opened files that are on tmpfs or dev mounts (e.g. /tmp, /dev/shm)? I know CRIU supports this, but I can't get it to work.

        16: Error (criu/files-reg.c:2259): Can't open file dev/shm/mono.16 on restore: No such file or directory
        16: Error (criu/files-reg.c:2185): Can't open file dev/shm/mono.16: No such file or directory
        16: Error (criu/mem.c:1359): `- Can't open vma
        15: Error (criu/cr-restore.c:1494): 16 exited, status=1

    I tried specifying --external for those mountpoints, but they are still not being dumped.
    Thanks!

    Adrian Reber
    @adrian:lisas.de
    If you say external it will definitely not work. Good question. Not entirely sure, but maybe CRIU only dumps a tmpfs if you are running in a mount namespace. Not sure. It works always for containers. @Snorch do you know when CRIU also includes the tmpfs contents in the checkpoint?
    Radostin Stoyanov
    @rst0git

    How do I make CRIU dump opened files that are on tmpfs or dev mounts (e.g. /tmp, /dev/shm)

    There are a few ways of doing this. It depends on your use case. For instance, you can use an action script, as shown in the following example.
    https://github.com/checkpoint-restore/criu/blob/criu-dev/scripts/tmp-files.sh

    Radostin Stoyanov
    @rst0git

    Can't open file dev/shm/mono.16

    btw, we recently added support in Podman to checkpoint/restore the content of dev/shm: containers/podman#12665

    Pavel Tikhomirov
    @Snorch
    @ZeyadYasser
    1) CRIU dumps mounts only when the mount namespace of the dumped process is dumped
    2) The mount namespace is dumped if the process is in a different mount namespace from CRIU (CRIU assumes the mount namespace belongs exclusively to the dumped process, as in a container)
    3) When a tmpfs mount is dumped, its content is always collected in a tar image, and on restore it is put back into a newly created tmpfs
    4) External mounts are a kind of black box for CRIU: CRIU does not dump them, and the user should provide all needed mounts with exactly the same content on restore via CRIU options.
    Zeyad Yasser
    @ZeyadYasser
    Thanks everyone, it makes much more sense now.
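Point 2 above depends on whether the target process shares a mount namespace with the process that runs CRIU. One way a user could check this beforehand is to compare the /proc/<pid>/ns/mnt links of the two processes; the sketch below does that with stat(). The helper name same_mnt_ns is hypothetical, and this is not necessarily how CRIU itself makes the decision.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from CRIU): returns 1 if processes `a`
 * and `b` are in the same mount namespace, 0 if they are not, and -1 if
 * either namespace link cannot be inspected.
 * Two /proc/<pid>/ns/mnt links refer to the same namespace when they
 * resolve to the same device ID and inode number.
 */
int same_mnt_ns(pid_t a, pid_t b)
{
	char path_a[64], path_b[64];
	struct stat st_a, st_b;

	snprintf(path_a, sizeof(path_a), "/proc/%d/ns/mnt", (int)a);
	snprintf(path_b, sizeof(path_b), "/proc/%d/ns/mnt", (int)b);

	if (stat(path_a, &st_a) < 0 || stat(path_b, &st_b) < 0)
		return -1;

	return st_a.st_dev == st_b.st_dev && st_a.st_ino == st_b.st_ino;
}
```

Calling it with the pid of the shell that will launch criu and the pid of the target tells you whether the tmpfs contents can be expected in the dump (result 0) or whether the mounts will be left alone (result 1). Inspecting another user's process this way typically needs root.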
    s09bQ5
    @s09bQ5
    Hi, am I right that CRIU is able to handle System V message queues but doesn't know how to handle POSIX message queues? For me it chokes on the file descriptors of POSIX message queues that exist on an invisible mount. Or do I have to mount the mqueue filesystem in a special way?
    Vaibhav Jakkula
    @VaibhavJak
    Hi all. I am a 2nd-year BTech (CSAI) student at IIIT Lucknow, interested in Docker and Linux. Please suggest some relevant projects and how to get started contributing here; it would be highly appreciated :)
    Bui Quang Minh
    @minhbq-99
    @VaibhavJak You can get some project ideas from the issues or from this page: https://criu.org/Google_Summer_of_Code_Ideas (some are already in development; you can check the pull request list)