    alidhamieh
    @alidhamieh
    Can we eliminate this ACK during restore? Do you see this ACK on your setup?
    I want to eliminate it because it seems gCloud filters unsolicited egress ACKs.
    minhbq-99
    @minhbq-99
    Hi, I can see that hugetlb can be used with sysvipc shm, but I cannot find any way to determine whether a shm segment is backed by hugetlb or not.
    Does anyone have ideas for this problem?
    Pavel Tikhomirov
    @Snorch

    Hi @minhbq-99 , afaics if we shmat a sysvipc shm segment with hugetlb backing, it looks the same as a hugetlb mapping created by mmap, meaning that it also has a /proc/pid/map_files/ entry. But looking at your PR https://github.com/checkpoint-restore/criu/pull/1622/commits/3ec6dbfe29558c5067fdb7c04313f01743e694c7#diff-6f08d59ddde08ca75f7ccb0aac7f5ca6e011bd968b57d3de0ee7a1786f582763R238 I'm not sure that your dev comparison works even for mmaps. If I run a simple test https://gist.github.com/Snorch/ab5f86e5e8f3d7f9fecfd7eabdcadd7a

    [root@fedora helpers]# ./shm-huge 
    shm_ptr = 0x7f8868400000
    map = 0x7f88689af000
    map2m = 0x7f8868200000

    All three different mappings have the same device:

    [root@fedora snorch]# stat /proc/136984/map_files/{7f8868400000,7f88689af000,7f8868200000}* | grep Dev
    Device: 16h/22d    Inode: 1055674     Links: 1
    Device: 16h/22d    Inode: 1055681     Links: 1
    Device: 16h/22d    Inode: 1055673     Links: 1

    On pretty new 5.13.12-200.fc34.x86_64 kernel.

    Maybe I'm missing something, but I don't see a way to tell which hugepage size (16k/2M/1G) the mapping uses.
    Pavel Tikhomirov
    @Snorch
    Ah, I missed that we get the dev from /proc/pid/maps, not from stat. Then the dev does look like it indicates a hugepage, and everything is OK:
    [root@fedora helpers]# ./shm-huge 
    shm_ptr = 0x7f555e800000
    map = 0x7f555ed76000
    map2m = 0x7f555e600000
    
    [root@fedora helpers]# grep "7f555e800000\|7f555ed76000\|7f555e600000" /proc/158858/maps
    7f555e600000-7f555e800000 rw-s 00000000 00:0f 1088051                    /anon_hugepage (deleted)
    7f555e800000-7f555ea00000 rw-s 00000000 00:0f 65567                      /SYSV6129e7d0 (deleted)
    7f555ed76000-7f555ed77000 rw-s 00000000 00:01 73556                      /dev/zero (deleted)
    minhbq-99
    @minhbq-99
    Hi @Snorch , I use that dev number to detect hugetlb; with different page sizes (2MB, 1GB) we get different device numbers.
    For the mappings, the file path is used to differentiate between shm (/SYSV) and memfd (/memfd). I will update the pull request with my latest local branch.
    Pavel Tikhomirov
    @Snorch
    To conclude: sysvipc shm should be exactly the same as mmap'ed regions
    minhbq-99
    @minhbq-99
    The problem is that the shm segment may not be mapped yet (no shmat has been done), but I came up with an idea: when collecting the shm key, we can shmat it ourselves to check whether it is hugetlb
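    Roughly something like this (just a sketch, not the actual PR code; it assumes KernelPageSize in /proc/self/smaps reports the huge page size for hugetlb-backed mappings, so any value above the base page size means hugetlb):

    /* Attach a SysV shm segment and check whether it is hugetlb-backed by
     * reading KernelPageSize from /proc/self/smaps for the attached address. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/shm.h>

    static long shm_kernel_page_size_kb(int shmid)
    {
        void *addr = shmat(shmid, NULL, SHM_RDONLY);
        unsigned long start, end;
        char line[256];
        int in_vma = 0;
        long kps = -1;
        FILE *f;

        if (addr == (void *)-1)
            return -1;

        f = fopen("/proc/self/smaps", "r");
        if (!f) {
            shmdt(addr);
            return -1;
        }

        while (fgets(line, sizeof(line), f)) {
            /* VMA header lines look like "start-end perms ..." */
            if (sscanf(line, "%lx-%lx ", &start, &end) == 2) {
                in_vma = (unsigned long)addr >= start && (unsigned long)addr < end;
                continue;
            }
            if (in_vma && sscanf(line, "KernelPageSize: %ld kB", &kps) == 1)
                break;
        }

        fclose(f);
        shmdt(addr);
        return kps;
    }

    int main(int argc, char **argv)
    {
        long kps, base_kb = sysconf(_SC_PAGESIZE) / 1024;

        if (argc < 2) {
            fprintf(stderr, "usage: %s <shmid>\n", argv[0]);
            return 1;
        }

        kps = shm_kernel_page_size_kb(atoi(argv[1]));
        if (kps < 0)
            return 1;

        printf("KernelPageSize = %ld kB -> %s\n", kps,
               kps > base_kb ? "hugetlb-backed" : "regular pages");
        return 0;
    }
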
    Pavel Tikhomirov
    @Snorch
    yes
    One more thing here: you detect device numbers for hugetlb and cache this in kdat; can't it be a problem if a new hugetlb dev appears?
    Pavel Tikhomirov
    @Snorch
    probably we need to refresh hugetlb device numbers each run...
    minhbq-99
    @minhbq-99
    yes, that's my solution: every time we load the kerndat cache, we need to collect the hugetlb devs again
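    Something along these lines could do the refresh (a sketch of the idea only, not the kerndat code; it probes each huge page size with a throwaway anonymous MAP_HUGETLB mapping and reads the dev field back from /proc/self/maps; sizes that are not configured simply fail the mmap and are skipped):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    #ifndef MAP_HUGE_SHIFT
    #define MAP_HUGE_SHIFT 26
    #endif

    /* Find the dev (major:minor) of the mapping containing addr. */
    static int maps_dev_of(void *addr, unsigned *maj, unsigned *min)
    {
        unsigned long start, end, off, ino;
        char line[512], perms[8];
        int ret = -1;
        FILE *f = fopen("/proc/self/maps", "r");

        if (!f)
            return -1;

        while (fgets(line, sizeof(line), f)) {
            if (sscanf(line, "%lx-%lx %7s %lx %x:%x %lu",
                       &start, &end, perms, &off, maj, min, &ino) != 7)
                continue;
            if ((unsigned long)addr >= start && (unsigned long)addr < end) {
                ret = 0;
                break;
            }
        }
        fclose(f);
        return ret;
    }

    int main(void)
    {
        int shifts[] = { 21, 30 }; /* 2MB and 1GB page-size shifts */
        unsigned i;

        for (i = 0; i < sizeof(shifts) / sizeof(shifts[0]); i++) {
            size_t len = 1UL << shifts[i];
            unsigned maj, min;
            void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB |
                           (shifts[i] << MAP_HUGE_SHIFT), -1, 0);

            if (p == MAP_FAILED) {
                printf("shift %d: no huge pages of this size\n", shifts[i]);
                continue;
            }
            if (!maps_dev_of(p, &maj, &min))
                printf("shift %d: hugetlb dev %02x:%02x\n", shifts[i], maj, min);
            munmap(p, len);
        }
        return 0;
    }
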
    Pavel Tikhomirov
    @Snorch
    nice, thanks!
    minhbq-99
    @minhbq-99

    Hi, my pull request fails on a CentOS 7 user_namespace test case. The problem is in restoring hugetlb shmem mappings: when restoring shmem mappings, we try to use memfd, and if we cannot, we open the map_files link of that mapping. On CentOS 7 we fall into the map_files path, and we don't have the CAP_SYS_ADMIN capability

    https://elixir.bootlin.com/linux/v3.10/source/fs/proc/base.c#L1889

    With some debugging, I found that the restored process has CAP_SYS_ADMIN, but its cred->user_ns is at a lower level than init_user_ns. But why can the checkpoint process open the map_files link? I see that the checkpoint process's cred->user_ns is the same as init_user_ns. So why is there a difference in cred->user_ns between the checkpoint and restore processes?

    minhbq-99
    @minhbq-99
    Hmm, I understand the problem. When checkpointing, the criu process in the root userns tries to checkpoint a process inside a userns, so the checkpoint code actually runs in the root userns. On the other hand, when restoring, the restore code is run by a process that is inside the userns
    Andrei Vagin
    @avagin
    @minhbq-99 you can look at userns_call
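    For context, the mechanism behind this is roughly the following (an illustration only, not CRIU's actual userns_call() API, and the path opened below is just a placeholder): a helper that stays in the privileged user namespace opens the file on behalf of the less-privileged task and passes the open fd back over a unix socket with SCM_RIGHTS.

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Ship an open fd over a unix socket using SCM_RIGHTS. */
    static int send_fd(int sock, int fd)
    {
        char dummy = 'x', buf[CMSG_SPACE(sizeof(int))];
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                              .msg_control = buf, .msg_controllen = sizeof(buf) };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
    }

    /* Receive an fd sent with send_fd(). */
    static int recv_fd(int sock)
    {
        char dummy, buf[CMSG_SPACE(sizeof(int))];
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                              .msg_control = buf, .msg_controllen = sizeof(buf) };
        struct cmsghdr *cmsg;
        int fd = -1;

        if (recvmsg(sock, &msg, 0) < 0)
            return -1;

        cmsg = CMSG_FIRSTHDR(&msg);
        if (cmsg && cmsg->cmsg_type == SCM_RIGHTS)
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        return fd;
    }

    int main(void)
    {
        int sk[2];

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sk))
            return 1;

        if (fork() == 0) {
            /* "Privileged helper": in CRIU this side would stay in the
             * root userns and do the open that needs CAP_SYS_ADMIN. */
            int fd = open("/proc/self/maps", O_RDONLY); /* placeholder path */
            send_fd(sk[1], fd);
            return 0;
        }

        /* "Unprivileged" side: receives an already-open fd and uses it. */
        printf("received fd %d\n", recv_fd(sk[0]));
        return 0;
    }
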
    dongliangde
    @dongliangde
    If a process that has been restored by criu-ns is dumped (frozen) again, the following error occurs:
    Traceback (most recent call last):
      File "./criu-ns", line 231, in <module>
        res = wrap_dump()
      File "./criu-ns", line 200, in wrap_dump
        set_pidns(pid, pid_idx)
      File "./criu-ns", line 161, in set_pidns
        raise OSError(errno.ENOENT, 'Cannot find NSpid field in proc')
    FileNotFoundError: [Errno 2] Cannot find NSpid field in proc
    Adrian Reber
    @adrian:lisas.de
    [m]
    Which OS are you running it on? I think CentOS 7 does not have NSpid
    everybody else should have it
    dongliangde
    @dongliangde

    Which OS are you running it on? I think CentOS 7 does not have NSpid

    It runs in Docker; the base image is Ubuntu

    Adrian Reber
    @adrian:lisas.de
    [m]
    which version of ubuntu?
    Try cat /proc/self/status | grep NSpid, that command should work
    dongliangde
    @dongliangde

    which version of ubuntu?

    Ubuntu 20.04.3

    When criu restores directly it causes pid conflicts, so I restore through criu-ns in a new namespace. The process restores successfully, but freezing it again causes this problem

    dongliangde
    @dongliangde

    which version of ubuntu?

    NSpid is not found by that command

    Adrian Reber
    @adrian:lisas.de
    [m]
    uname -a ?
    dongliangde
    @dongliangde

    uname -a ?

    Linux 8194e282c3c5 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

    Adrian Reber
    @adrian:lisas.de
    [m]
    that does not sound like ubuntu
    that is centos 7 and simply does not have NSpid
    your kernel is too old, sorry
    dongliangde
    @dongliangde

    that does not sound like ubuntu

    Got it, thanks

    Avijit Dash
    @Avijit009

    While dumping a container (using the runc diskless method), I get a stats-dump file on the source node. But on the destination side, after restoring that container, I am not getting the stats-restore file. Isn't it possible to get that stats-restore file?

    (I also used podman, where both stats-dump and stats-restore were created on the source and destination.)

    I had asked this question before and thought I could solve the issue, but I failed to do that.

    Radostin Stoyanov
    @rst0git
    @Avijit009 CRIU creates stats-restore in https://github.com/checkpoint-restore/criu/blob/014e4f3002a5b5f01f619252cd0b1b1f4632aa9b/criu/cr-restore.c#L2427
    If you follow the steps in https://github.com/checkpoint-restore/criu/issues/1652#issuecomment-968341985 c1/stats-restore should be created.
    What runc commands are you using? (Note that --pre-dump doesn't create a complete checkpoint)
    Avijit Dash
    @Avijit009

    @rst0git
    sudo runc checkpoint --pre-dump --image-path <dir> --work-path <dir> looper --page-server <dest_ip>:port --tcp-established

    sudo runc checkpoint --image-path <dir> --work-path <dir> looper --page-server <dest_ip>:port --tcp-established

    I am using these two commands to pre-dump and dump the container.

    In the case of https://github.com/checkpoint-restore/criu/issues/1652#issuecomment-968341985, both stats-dump and stats-restore were created.

    Radostin Stoyanov
    @rst0git

    Hi @Avijit009,

    Assuming that you are migrating a runc container between two VMs, you can try the following steps:

    1 [Both VMs] Make sure that you have the same rootfs and config.json:

    mkdir -p tmp/rootfs && cd tmp
    sudo docker export $(sudo docker create alpine:latest) --output="alpine.tar"
    sudo tar xf alpine.tar -C rootfs
    runc spec
    sed -i '/terminal/c\   \"terminal": false,' config.json
    sed -i '/"sh"/c\   \"sh", "-c", "i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done"' config.json

    2 [Src VM] Run container:

    sudo runc run -d looper &> /dev/null < /dev/null
    sudo runc ps looper

    3 [Dst VM] Start page server:

    mkdir c1
    sudo criu page-server --images-dir c1 --port 5000

    4 [Src VM] Run first pre-dump checkpoint:

    sudo runc checkpoint --pre-dump --image-path c1 --work-path c1 --page-server <dst IP address>:5000 looper
    # sudo crit show c1/stats-dump

    5 [Dst VM] Start page server (again):

    sudo criu page-server --images-dir c1 --port 5000 --auto-dedup

    6 [Src VM] Run second pre-dump checkpoint:

    sudo runc checkpoint --page-server <dst IP address>:5000 --pre-dump --image-path c2 --work-path c2 looper
    # sudo crit show c2/stats-dump

    7 [Dst VM] Start page server (again):

    sudo criu page-server --images-dir c1 --port 5000 --auto-dedup

    8 [Src VM] Run final checkpoint:

    sudo runc checkpoint --page-server <dst IP address>:5000 --image-path c3 --work-path c3 looper
    # sudo crit show c3/stats-dump
    # Send checkpoint files to destination VM:
    scp -r ./c3 <dst IP address>:

    9 [Dst VM] Restore container:

    # Combine all checkpoint files
    mv ~/c3/* c1
    # Restore container
    sudo runc restore -d --image-path c1 --work-path c1 looper
    # sudo crit show c1/stats-restore

    The following commands are useful for clean-up:

    # Stop container
    sudo runc kill looper KILL
    
    # Remove stopped container
    sudo runc delete looper
    Avijit Dash
    @Avijit009

    @rst0git Hi

    Thank you. It is working now.

    One more question: can you refer me to the engineering design of the CRIU dump and restore process, if there is any?

    Jun Gan
    @geminijun
    Does anyone have any ideas about this issue? checkpoint-restore/criu#1655
    Kaustubh Welankar
    @kaustubh0x77
    Hello folks. I have a sample program which uses about 800 MB of RAM. When I dump it with the --leave-running flag, I notice the memory usage spikes to 800 MB * the number of forked processes. I reasoned that this could be due to some sort of copy-on-write behaviour (https://criu.org/Copy-on-write_memory).
    If this is the case, I was curious about what part of the code leads to the memory usage increase. Could someone point me to the code that does it?
    Adrian Reber
    @adrian:lisas.de
    [m]
    @kaustubh0x77: is the memory usage reduced again after dumping or does it stay increased?
    Pavel Tikhomirov
    @Snorch
    Knowing how memory usage was measured would also be helpful
    Kaustubh Welankar
    @kaustubh0x77
    @adrian:lisas.de Memory stays at its peak till the process gets killed. Subsequent dumps of the same process do not increase the memory usage.
    @Snorch I was tracking free memory using watch -n 0.1 free
    Adrian Reber
    @adrian:lisas.de
    [m]
    That sounds wrong; it also means you are seeing the memory usage of the file system cache. Just look at how much memory the processes actually use.
    Pavel Tikhomirov
    @Snorch
    You just see criu's memory usage together with the original memory usage.
    To exclude the file system cache from the equation, you should look not at the “free” but at the “available” memory change.
    And yes, when criu saves X processes with Y memory each, criu would use X*Y memory including file caches, though due to copy-on-write the original processes might use only Y.
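    For reference, a tiny sketch of that measurement (it just reads MemAvailable from /proc/meminfo, which is the value behind the “available” column of free):

    #include <stdio.h>

    /* Return MemAvailable in kB, or -1 on error. */
    static long mem_available_kb(void)
    {
        FILE *f = fopen("/proc/meminfo", "r");
        char line[256];
        long kb = -1;

        if (!f)
            return -1;
        while (fgets(line, sizeof(line), f))
            if (sscanf(line, "MemAvailable: %ld kB", &kb) == 1)
                break;
        fclose(f);
        return kb;
    }

    int main(void)
    {
        /* Sample before and after the dump to see the real change. */
        printf("MemAvailable: %ld kB\n", mem_available_kb());
        return 0;
    }
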
    Kaustubh Welankar
    @kaustubh0x77

    Ah,
    root@7a7c26439bc0:/repo# free
                      total        used        free      shared  buff/cache   available
    Mem:          462327192     7669320   142304568     8922604   312353304   443741192
    Swap:                 0           0           0

    I was looking at the used column to be precise

    I saw the cache usage going up, but that reduces after checkpointing is over