    alidhamieh
    @alidhamieh
    Does CRIU re-send a TCP ACK to the client when restoring a previously established TCP connection?
    On restore I see: 15:56:00.417689 IP 10.88.0.43.8080 > 73.132.70.25.54894: Flags [.], ack 3809915341, win 229, options [nop,nop,TS val 2159752066 ecr 0], length 0
    Adrian Reber
    @adrian:lisas.de
    [m]
    @alidhamieh: I am pretty sure CRIU does not, but maybe the TCP stack does it
    alidhamieh
    @alidhamieh
    Got it.
    Can we eliminate this ACK on restore? Do you see this ACK on your setup?
    I want to eliminate it because it seems gCloud filters unsolicited egress ACKs.
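    (For context on where that bare ACK comes from: CRIU's --tcp-established support is built on the kernel's TCP_REPAIR socket option. The sketch below is illustrative only, not CRIU's actual code; the detail matching Adrian's answer is the final step, where on an established socket the TCP stack itself may send a window update, i.e. the bare ACK, when repair mode is switched off.)

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    #ifndef TCP_REPAIR
    #define TCP_REPAIR 19           /* from linux/tcp.h, for older libcs */
    #endif

    /* Restore an already-established connection without a handshake.
     * Names and error handling are simplified for illustration. */
    static int restore_established(const struct sockaddr_in *src,
                                   const struct sockaddr_in *dst)
    {
        int on = 1, off = 0;
        int sk = socket(AF_INET, SOCK_STREAM, 0);

        /* In repair mode, bind()/connect() set up socket state without
         * sending any packets on the wire. */
        setsockopt(sk, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on));
        bind(sk, (const struct sockaddr *)src, sizeof(*src));
        connect(sk, (const struct sockaddr *)dst, sizeof(*dst));

        /* ... sequence numbers, queues and TCP options are restored here ... */

        /* Leaving repair mode is where the kernel may emit the bare ACK
         * (a window update) seen in the tcpdump output above. */
        setsockopt(sk, IPPROTO_TCP, TCP_REPAIR, &off, sizeof(off));
        return sk;
    }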
    minhbq-99
    @minhbq-99
    Hi, I can see that hugetlb can be used with sysvipc shm, but I cannot find any way to determine whether a shm segment is backed by hugetlb or not.
    Does anyone have any ideas about this problem?
    Pavel Tikhomirov
    @Snorch

    Hi @minhbq-99, afaics if we shmat a sysvipc shm segment with hugetlb backing, it looks the same as a hugetlb mapping created by mmap, meaning that it also has a /proc/pid/map_files/ entry. But looking at your PR https://github.com/checkpoint-restore/criu/pull/1622/commits/3ec6dbfe29558c5067fdb7c04313f01743e694c7#diff-6f08d59ddde08ca75f7ccb0aac7f5ca6e011bd968b57d3de0ee7a1786f582763R238 I'm not sure that your dev comparison works even for mmap'ed mappings. If I do a simple test (https://gist.github.com/Snorch/ab5f86e5e8f3d7f9fecfd7eabdcadd7a), I get:

    [root@fedora helpers]# ./shm-huge 
    shm_ptr = 0x7f8868400000
    map = 0x7f88689af000
    map2m = 0x7f8868200000

    All three different mappings have the same device:

    [root@fedora snorch]# stat /proc/136984/map_files/{7f8868400000,7f88689af000,7f8868200000}* | grep Dev
    Device: 16h/22d    Inode: 1055674     Links: 1
    Device: 16h/22d    Inode: 1055681     Links: 1
    Device: 16h/22d    Inode: 1055673     Links: 1

    This is on a pretty new 5.13.12-200.fc34.x86_64 kernel.

    Maybe I'm missing something, but I don't see a way to tell which hugepage size (16k/2M/1G) a mapping uses.
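    (For reference, a test along the lines of the ./shm-huge gist above might look roughly like this; it is an illustrative sketch, not the gist's exact contents, and it assumes a 2 MB default hugepage size with hugepages preallocated.)

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <sys/mman.h>

    #ifndef SHM_HUGETLB
    #define SHM_HUGETLB 04000       /* from linux/shm.h, for older libcs */
    #endif

    #define HUGE_SZ (2UL << 20)     /* one 2 MB hugepage */

    int main(void)
    {
        /* sysvipc shm segment backed by hugetlb */
        int id = shmget(IPC_PRIVATE, HUGE_SZ, IPC_CREAT | SHM_HUGETLB | 0600);
        void *shm_ptr = shmat(id, NULL, 0);

        /* ordinary shared anonymous mapping (shows up as /dev/zero (deleted)) */
        void *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        /* hugetlb shared anonymous mapping (shows up as /anon_hugepage (deleted)) */
        void *map2m = mmap(NULL, HUGE_SZ, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

        printf("shm_ptr = %p\nmap = %p\nmap2m = %p\n", shm_ptr, map, map2m);
        printf("inspect /proc/%d/maps and /proc/%d/map_files\n",
               (int)getpid(), (int)getpid());
        pause();                    /* keep the mappings alive for inspection */
        return 0;
    }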
    Pavel Tikhomirov
    @Snorch
    Ah, I missed that we get the dev from /proc/pid/maps, not from stat; then the dev does look like it indicates a hugepage and everything is ok:
    [root@fedora helpers]# ./shm-huge 
    shm_ptr = 0x7f555e800000
    map = 0x7f555ed76000
    map2m = 0x7f555e600000
    
    [root@fedora helpers]# grep "7f555e800000\|7f555ed76000\|7f555e600000" /proc/158858/maps
    7f555e600000-7f555e800000 rw-s 00000000 00:0f 1088051                    /anon_hugepage (deleted)
    7f555e800000-7f555ea00000 rw-s 00000000 00:0f 65567                      /SYSV6129e7d0 (deleted)
    7f555ed76000-7f555ed77000 rw-s 00000000 00:01 73556                      /dev/zero (deleted)
    minhbq-99
    @minhbq-99
    Hi @Snorch, I use that dev number to detect hugetlb; with different page sizes (2MB, 1GB) we get different device numbers.
    For the mappings, the file path is used to differentiate between shm (/SYSV) and memfd (/memfd). I will update the pull request with my latest local branch.
    Pavel Tikhomirov
    @Snorch
    To conclude: sysvipc shm should be handled exactly the same as mmap'ed regions.
    minhbq-99
    @minhbq-99
    The problem is that the shm segment may not be mapped yet (no shmat so far), but I came up with an idea: when collecting the shm key, we can shmat it to check whether it is backed by hugetlb.
    Pavel Tikhomirov
    @Snorch
    yes
    One more thing here: you detect the device numbers for hugetlb and cache them in kdat; can't that be a problem if a new hugetlb dev appears?
    Pavel Tikhomirov
    @Snorch
    probably we need to refresh hugetlb device numbers each run...
    minhbq-99
    @minhbq-99
    Yes, that's my solution: every time we load the kerndat cache, we need to collect the hugetlb devs again.
    Pavel Tikhomirov
    @Snorch
    nice, thanks!
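    (A hedged sketch of the probe idea agreed on above; it is illustrative and not the PR's actual code, and the helper name is made up. It attaches the segment, reads the backing device from /proc/self/maps, and detaches, leaving the caller to compare that dev against the hugetlbfs devices re-collected on each run.)

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/shm.h>
    #include <sys/sysmacros.h>

    /* Return the device number of the mapping backing @shmid, or -1 on error. */
    static long shm_mapping_dev(int shmid)
    {
        void *addr = shmat(shmid, NULL, SHM_RDONLY);
        long dev = -1;
        char line[512];
        FILE *f;

        if (addr == (void *)-1)
            return -1;

        f = fopen("/proc/self/maps", "r");
        if (!f)
            goto out;

        while (fgets(line, sizeof(line), f)) {
            unsigned long start, end;
            unsigned int maj, min;

            /* maps line format: start-end perms offset dev inode path */
            if (sscanf(line, "%lx-%lx %*s %*s %x:%x", &start, &end, &maj, &min) != 4)
                continue;
            if (start == (unsigned long)addr) {
                dev = (long)makedev(maj, min);
                break;
            }
        }
        fclose(f);
    out:
        shmdt(addr);
        return dev;
    }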
    minhbq-99
    @minhbq-99

    Hi, my pull request fails on a CentOS 7 user_namespace test case. The problem is in restoring hugetlb shmem mappings: when restoring shmem mappings we try to use memfd, and if we cannot, we open the map_files link of that mapping. In the case of CentOS 7 we fall back to opening the map_files link, and we don't have the CAP_SYS_ADMIN capability:

    https://elixir.bootlin.com/linux/v3.10/source/fs/proc/base.c#L1889

    With some debugging, I found that the restored process has CAP_SYS_ADMIN, but its cred->user_ns is at a lower level than init_user_ns. But why can the checkpoint process open the map_files link? I see that the checkpoint process's cred->user_ns is the same as init_user_ns. So why is there a difference in cred->user_ns between the checkpoint and restore processes?

    minhbq-99
    @minhbq-99
    Hmm, I understand the problem now. When checkpointing, the criu process in the root userns checkpoints a process inside a userns, so the checkpoint code actually runs in the root userns. On the other hand, when restoring, the restore code is run by the process that is inside the userns.
    Andrei Vagin
    @avagin
    @minhbq-99 you can look at userns_call
    Doraemon
    @dongliangde
    If a process that has been restored by criu-ns is dumped (frozen) again, the following error occurs:
    Traceback (most recent call last):
      File "./criu-ns", line 231, in <module>
        res = wrap_dump()
      File "./criu-ns", line 200, in wrap_dump
        set_pidns(pid, pid_idx)
      File "./criu-ns", line 161, in set_pidns
        raise OSError(errno.ENOENT, 'Cannot find NSpid field in proc')
    FileNotFoundError: [Errno 2] Cannot find NSpid field in proc
    Adrian Reber
    @adrian:lisas.de
    [m]
    Which OS are you running it on? I think CentOS 7 does not have NSpid
    everybody else should have it
    Doraemon
    @dongliangde

    Which OS are you running it on? I think CentOS 7 does not have NSpid

    It runs in Docker; the base image is Ubuntu.

    Adrian Reber
    @adrian:lisas.de
    [m]
    which version of ubuntu?
    The command cat /proc/self/status | grep NSpid should work.
    Doraemon
    @dongliangde

    which version of ubuntu?

    Ubuntu 20.04.3

    When criu restores directly it causes PID conflicts, so I restore through criu-ns in a new PID namespace. The process is restored successfully, but freezing (dumping) it again causes this problem.

    Doraemon
    @dongliangde

    The command cat /proc/self/status | grep NSpid should work.

    NSpid is not found by that command.

    Adrian Reber
    @adrian:lisas.de
    [m]
    uname -a ?
    Doraemon
    @dongliangde

    uname -a ?

    Linux 8194e282c3c5 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

    Adrian Reber
    @adrian:lisas.de
    [m]
    That does not sound like Ubuntu.
    That is a CentOS 7 kernel, which simply does not have NSpid.
    Your kernel is too old, sorry.
    Doraemon
    @dongliangde

    That does not sound like Ubuntu.

    Got it, thanks

    Avijit Dash
    @Avijit009

    While dumping a container (using the runc diskless method), I get a stats-dump file on the source node. But on the destination side, after restoring that container, I do not get a stats-restore file. Isn't it possible to get stats-restore there?

    (I also tried Podman, where both stats-dump and stats-restore were created on the source and destination.)

    I had asked this question before and thought I could solve the issue, but I failed to do so.

    Radostin Stoyanov
    @rst0git
    @Avijit009 CRIU creates stats-restore in https://github.com/checkpoint-restore/criu/blob/014e4f3002a5b5f01f619252cd0b1b1f4632aa9b/criu/cr-restore.c#L2427
    If you follow the steps in https://github.com/checkpoint-restore/criu/issues/1652#issuecomment-968341985 c1/stats-restore should be created.
    What runc commands are you using? (Note that --pre-dump doesn't create a complete checkpoint)
    Avijit Dash
    @Avijit009

    @rst0git
    sudo runc checkpoint --pre-dump --image-path <dir> --work-path <dir> looper --page-server <dest_ip>:port --tcp-established

    sudo runc checkpoint --image-path <dir> --work-path <dir> looper --page-server <dest_ip>:port --tcp-established

    I am using these two commands to pre-dump and dump the container.

    In the case of https://github.com/checkpoint-restore/criu/issues/1652#issuecomment-968341985, both stats-dump and stats-restore were created.

    Radostin Stoyanov
    @rst0git

    Hi @Avijit009,

    Assuming that you are migrating a runc container between two VMs, you can try the following steps:

    1 [Both VMs] Make sure that you have the same rootfs and config.json:

    mkdir -p tmp/rootfs && cd tmp
    sudo docker export $(sudo docker create alpine:latest) --output="alpine.tar"
    sudo tar xf alpine.tar -C rootfs
    runc spec
    sed -i '/terminal/c\   \"terminal": false,' config.json
    sed -i '/"sh"/c\   \"sh", "-c", "i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done"' config.json

    2 [Src VM] Run container:

    sudo runc run -d looper &> /dev/null < /dev/null
    sudo runc ps looper

    3 [Dst VM] Start page server:

    mkdir c1
    sudo criu page-server --images-dir c1 --port 5000

    4 [Src VM] Run first pre-dump checkpoint:

    sudo runc checkpoint --pre-dump --image-path c1 --work-path c1 --page-server <dst IP address>:5000 looper
    # sudo crit show c1/stats-dump

    5 [Dst VM] Start page server (again):

    sudo criu page-server --images-dir c1 --port 5000 --auto-dedup

    6 [Src VM] Run second pre-dump checkpoint:

    sudo runc checkpoint --page-server <dst IP address>:5000 --pre-dump --image-path c2 --work-path c2 looper
    # sudo crit show c2/stats-dump

    7 [Dst VM] Start page server (again):

    sudo criu page-server --images-dir c1 --port 5000 --auto-dedup

    8 [Src VM] Run final checkpoint:

    sudo runc checkpoint --page-server <dst IP address>:5000 --image-path c3 --work-path c3 looper
    # sudo crit show c3/stats-dump
    # Send checkpoint files to destination VM:
    scp -r ./c3 <dst IP address>:

    9 [Dst VM] Restore container:

    # Combine all checkpoint files
    mv ~/c3/* c1
    # Restore container
    sudo runc restore -d --image-path c1 --work-path c1 looper
    # sudo crit show c1/stats-restore

    The following commands are useful for clean-up:

    # Stop container
    sudo runc kill looper KILL
    
    # Remove stopped container
    sudo runc delete looper
    Avijit Dash
    @Avijit009

    @rst0git Hi

    Thank you. It is working now.

    One more question: can you refer me to the engineering design of the CRIU dump and restore process, if there is any documentation?

    Jun Gan
    @geminijun
    Does anyone have any ideas about this issue? checkpoint-restore/criu#1655
    Kaustubh Welankar
    @kaustubh0x77
    Hello folks. I have a sample program which uses about 800 MB of RAM. When I dump it with the --leave-running flag, I notice that memory usage spikes to about 800 MB times the number of forked processes. I reasoned that this could be due to some sort of copy-on-write behaviour (https://criu.org/Copy-on-write_memory).
    If this is the case, I am curious which part of the code leads to the memory usage increase. Could someone point me to the code that does it?
    Adrian Reber
    @adrian:lisas.de
    [m]
    @kaustubh0x77: is the memory usage reduced again after dumping or does it stay increased?
    Pavel Tikhomirov
    @Snorch
    Knowing how memory usage was measured would also be helpful
    Kaustubh Welankar
    @kaustubh0x77
    @adrian:lisas.de Memory stays at its peak till the process gets killed. Subsequent dumps of the same process do not increase the memory usage.
    @Snorch I was tracking free memory using watch -n 0.1 free
    Adrian Reber
    @adrian:lisas.de
    [m]
    That sounds wrong; it also means you are seeing the memory usage of the file system cache. Just look at how much memory the processes actually use.
    Pavel Tikhomirov
    @Snorch
    You just see criu's memory usage together with the original process's memory usage.
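    (Following up on that: a quick way to see what a single process actually uses is its VmRSS in /proc/<pid>/status, rather than the global free output, which also reflects the page cache and criu itself. The tiny helper below is only an illustration.)

    #include <stdio.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        char path[64], line[256];
        FILE *f;

        /* Pass a pid as the first argument, or inspect this process itself. */
        snprintf(path, sizeof(path), "/proc/%s/status", argc > 1 ? argv[1] : "self");
        f = fopen(path, "r");
        if (!f)
            return 1;
        while (fgets(line, sizeof(line), f))
            if (strncmp(line, "VmRSS:", 6) == 0)
                fputs(line, stdout);        /* e.g. "VmRSS:   819200 kB" */
        fclose(f);
        return 0;
    }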