    Adrian Reber
    @adrian:lisas.de
    [m]
    which version of ubuntu?
    cat /proc/self/status | grep NSpid
    that command should work
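A minimal sketch of Adrian's check as a self-contained gate (the NSpid field only exists on kernels that support it; nothing here is CRIU-specific):

```shell
# Check whether this kernel exposes NSpid in /proc/<pid>/status;
# on CentOS 7's 3.10 kernel the field is absent.
if grep NSpid /proc/self/status; then
    echo "NSpid supported"
else
    echo "NSpid missing: kernel too old"
fi
```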
    dongliangde
    @dongliangde

    which version of ubuntu?

    Ubuntu 20.04.3

    which version of ubuntu?

    When criu restores, it can cause PID conflicts, so I restored in a new PID namespace through criu-ns. The process was restored successfully, but freezing (dumping) it again causes this problem

    dongliangde
    @dongliangde

    which version of ubuntu?

    NSpid is not found by that command

    Adrian Reber
    @adrian:lisas.de
    [m]
    uname -a ?
    dongliangde
    @dongliangde

    uname -a ?

    Linux 8194e282c3c5 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

    Adrian Reber
    @adrian:lisas.de
    [m]
    that does not sound like ubuntu
    that is centos 7 and simply does not have NSpid
    your kernel is too old, sorry
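Adrian's kernel-age point can be checked mechanically; a small sketch comparing versions with sort -V (4.1 is the version assumed here as the NSpid cutoff):

```shell
# NSpid needs a sufficiently new kernel; compare the running version
# against 4.1 using version sort. 3.10 (as in the uname above) fails.
kver=$(uname -r | cut -d- -f1)
if [ "$(printf '%s\n' "$kver" 4.1 | sort -V | head -n1)" = "4.1" ]; then
    echo "kernel $kver: new enough for NSpid"
else
    echo "kernel $kver: too old for NSpid"
fi
```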
    dongliangde
    @dongliangde

    that does not sound like ubuntu

    Got it, thanks

    Avijit Dash
    @Avijit009

    While dumping a container (using runc diskless method), I get a stats-dump file on the source node. But on the destination side, after restoring that container, I am not getting the stats-restore file. Isn't it possible to get that stats-restore thing?

    (I also tried podman, where both stats-dump and stats-restore were created on the source and destination.)

    I had asked this question before and thought I could solve the issue, but I failed to do so.

    Radostin Stoyanov
    @rst0git
    @Avijit009 CRIU creates stats-restore in https://github.com/checkpoint-restore/criu/blob/014e4f3002a5b5f01f619252cd0b1b1f4632aa9b/criu/cr-restore.c#L2427
    If you follow the steps in https://github.com/checkpoint-restore/criu/issues/1652#issuecomment-968341985 c1/stats-restore should be created.
    What runc commands are you using? (Note that --pre-dump doesn't create a complete checkpoint)
    Avijit Dash
    @Avijit009

    @rst0git
    sudo runc checkpoint --pre-dump --image-path <dir> --work-path <dir> looper --page-server <dest_ip>:port --tcp-established

    sudo runc checkpoint --image-path <dir> --work-path <dir> looper --page-server <dest_ip>:port --tcp-established

    I am using these two commands to pre-dump and dump the container.

    In the case of https://github.com/checkpoint-restore/criu/issues/1652#issuecomment-968341985, both stats-dump and stats-restore were created.

    Radostin Stoyanov
    @rst0git

    Hi @Avijit009,

    Assuming that you are migrating a runc container between two VMs, you can try the following steps:

    1 [Both VMs] Make sure that you have the same rootfs and config.json:

    mkdir -p tmp/rootfs && cd tmp
    sudo docker export $(sudo docker create alpine:latest) --output="alpine.tar"
    sudo tar xf alpine.tar -C rootfs
    runc spec
    sed -i '/terminal/c\   "terminal": false,' config.json
    sed -i '/"sh"/c\   "sh", "-c", "i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done"' config.json

    2 [Src VM] Run container:

    sudo runc run -d looper &> /dev/null < /dev/null
    sudo runc ps looper

    3 [Dst VM] Start page server:

    mkdir c1
    sudo criu page-server --images-dir c1 --port 5000

    4 [Src VM] Run first pre-dump checkpoint:

    sudo runc checkpoint --pre-dump --image-path c1 --work-path c1 --page-server <dst IP address>:5000 looper
    # sudo crit show c1/stats-dump

    5 [Dst VM] Start page server (again):

    sudo criu page-server --images-dir c1 --port 5000 --auto-dedup

    6 [Src VM] Run second pre-dump checkpoint:

    sudo runc checkpoint --page-server <dst IP address>:5000 --pre-dump --image-path c2 --work-path c2 looper
    # sudo crit show c2/stats-dump

    7 [Dst VM] Start page server (again):

    sudo criu page-server --images-dir c1 --port 5000 --auto-dedup

    8 [Src VM] Run final checkpoint:

    sudo runc checkpoint --page-server <dst IP address>:5000 --image-path c3 --work-path c3 looper
    # sudo crit show c3/stats-dump
    # Send checkpoint files to destination VM:
    scp -r ./c3 <dst IP address>:

    9 [Dst VM] Restore container:

    # Combine all checkpoint files
    mv ~/c3/* c1
    # Restore container
    sudo runc restore -d --image-path c1 --work-path c1 looper
    # sudo crit show c1/stats-restore

    The following commands are useful for clean-up:

    # Stop container
    sudo runc kill looper KILL
    
    # Remove stopped container
    sudo runc delete looper
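The sed edits in step 1 use the `c\` (change line) command, which replaces the whole matching line. A quick local check of the first edit against a stand-in config.json (a sample file, not the real runc spec):

```shell
# Demonstrate step 1's sed edit on a minimal stand-in config.json:
# `c\` replaces the entire line matching /terminal/.
printf '{\n   "terminal": true,\n   "process": {}\n}\n' > config.json
sed -i '/terminal/c\   "terminal": false,' config.json
cat config.json
```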
    Avijit Dash
    @Avijit009

    @rst0git Hi

    Thank you. It is working now.

    One more question: can you refer me to the engineering design of the CRIU dump and restore process, if there is any?

    Jun Gan
    @geminijun
    Does anyone have any ideas about this issue? checkpoint-restore/criu#1655
    Kaustubh Welankar
    @kaustubh0x77
    Hello folks. I have a sample program which uses about 800 MB of RAM. When I dump it with the --leave-running flag, I notice the memory usage spikes up to 800MB * number of forked processes. I reasoned that this could be due to something like copy-on-write happening (https://criu.org/Copy-on-write_memory).
    If this is the case, I was curious about what part of the code leads to memory usage increase. Could someone point me to some code that does it?
    Adrian Reber
    @adrian:lisas.de
    [m]
    @kaustubh0x77: is the memory usage reduced again after dumping or does it stay increased?
    Pavel Tikhomirov
    @Snorch
    Knowing how memory usage was measured would also be helpful
    Kaustubh Welankar
    @kaustubh0x77
    @adrian:lisas.de Memory stays at its peak till the process gets killed. Subsequent dumps of the same process do not increase the memory usage.
    @Snorch I was tracking free memory using watch -n 0.1 free
    Adrian Reber
    @adrian:lisas.de
    [m]
    That sounds wrong; it also means you are seeing the memory usage of the file system cache. Just look at how much memory the processes actually use.
    Pavel Tikhomirov
    @Snorch
    you just see criu memory usage together with original memory usage
    To exclude the file system cache from the equation you should look not at "free" but at the "available" memory change
    And yes, when criu saves X processes with Y memory each, criu would use X*Y memory including file caches. Though due to copy-on-write the original processes might use only Y
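Pavel's distinction can be read straight from /proc; a minimal sketch (no criu involved) of the two numbers worth watching:

```shell
# Page cache is reclaimable, so MemAvailable is the system-wide number
# to watch, not the "free" column; the per-process cost is VmRSS.
grep MemAvailable /proc/meminfo
grep VmRSS /proc/self/status
```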
    Kaustubh Welankar
    @kaustubh0x77

    Ah,
    root@7a7c26439bc0:/repo# free
                   total        used        free      shared  buff/cache   available
    Mem:       462327192     7669320   142304568     8922604   312353304   443741192
    Swap:              0           0           0

    I was looking at the used column to be precise

    I saw the cache usage going up, but that reduces after checkpointing is over
    The used increases and then stays high even after criu dump finishes
    Adrian Reber
    @adrian:lisas.de
    [m]
    @Avijit009: what do you exactly mean? Not sure what you are asking for.
    Kaustubh Welankar
    @kaustubh0x77

    @adrian:lisas.de, I ran my experiment again. The used size and buff/cache size both increase during dumping, and stay at that high value till we kill the process. I ran echo 1 > /proc/sys/vm/drop_caches. This reduced the buff/cache space, but not the used space

    I think this is something other than the write buffers

    Adrian Reber
    @adrian:lisas.de
    [m]
    But buffers and cache staying high is normal behavior. Those are only freed by the OS if necessary
    Kaustubh Welankar
    @kaustubh0x77
    Yup
    Adrian Reber
    @adrian:lisas.de
    [m]
    You have to check how much memory the actual process are using
    Kaustubh Welankar
    @kaustubh0x77
    How can I do that?
    Adrian Reber
    @adrian:lisas.de
    [m]
    Maybe run top and order by memory usage
    Kaustubh Welankar
    @kaustubh0x77
    Okay, that sounds good
    Adrian Reber
    @adrian:lisas.de
    [m]
    Look at /proc/PID for memory usages
    Kaustubh Welankar
    @kaustubh0x77
    The actual memory usage per process isn't going up. It stays fixed.
    But I think that's the virtual memory usage that's being tracked. The physical memory usage looks like it's increasing
    Kaustubh Welankar
    @kaustubh0x77

    Anyway, I have a minimal repro of the issue. A python script (I ran it with python3 script.py)

    from time import sleep
    import os

    # Allocate roughly 800 MB as a 10000x10000 list of one-character strings.
    a = 10000
    b = 10000
    array_ab = [['?' for i in range(a)] for j in range(b)]

    # Fork 5 children; each initially shares the parent's memory copy-on-write.
    for var in range(5):
        n = os.fork()
        if n > 0:
            print("Parent process, forked child:", n)
        else:
            break

    while True:
        print("Hello")
        sleep(5)
        print("World")

    The memory usage of this script is around 800 MB. After checkpointing with sudo criu dump -j -t <pid> --tcp-established --ghost-limit=9999999999 --leave-running --file-locks --images-dir /mnt/image0 and running echo 3 > /proc/sys/vm/drop_caches, the buff/cache goes down and the used still stays high.

    Adrian Reber
    @adrian:lisas.de
    [m]
    @kaustubh0x77: I just tried it and I see 6 processes, each using 800MB, after dumping I still see 6 processes each using 800MB
    Nothing changes about the processes' memory consumption
    The checkpoint is about 4.8 GB
    I did not use --tcp-established --ghost-limit=9999999999 --file-locks because those options seem unnecessary
    It seems like you are just interpreting the wrong numbers
    Avijit Dash
    @Avijit009
    How do I assign memory to a container? For example, I want to assign 500MB of memory to a container and then migrate it.
    Adrian Reber
    @adrian:lisas.de
    [m]
    @Avijit009: please be more specific. It is not clear what you mean.