    Radostin Stoyanov
    @rst0git

    Hi @Avijit009,

    Assuming that you are migrating a runc container between two VMs, you can try the following steps:

    1 [Both VMs] Make sure that you have the same rootfs and config.json:

    mkdir -p tmp/rootfs && cd tmp
    sudo docker export $(sudo docker create alpine:latest) --output="alpine.tar"
    sudo tar xf alpine.tar -C rootfs
    runc spec
    sed -i '/terminal/c\   \"terminal": false,' config.json
    sed -i '/"sh"/c\   \"sh", "-c", "i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done"' config.json
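    The two sed one-liners in step 1 edit config.json as text. As an alternative, here is a minimal sketch that makes the same change by parsing the JSON, so it does not depend on the line layout produced by runc spec. It assumes the default runc spec layout with a top-level "process" object; patch_config is a hypothetical helper name.

```python
import json
from pathlib import Path

# Same shell loop that the sed one-liner writes into config.json.
LOOPER = "i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done"


def patch_config(path: str) -> dict:
    """Set process.terminal to false and replace process.args in a runc config.json."""
    cfg_path = Path(path)
    cfg = json.loads(cfg_path.read_text())
    process = cfg["process"]
    process["terminal"] = False  # no TTY for a detached (-d) container
    process["args"] = ["sh", "-c", LOOPER]
    cfg_path.write_text(json.dumps(cfg, indent=4) + "\n")
    return cfg
```

    Calling patch_config("config.json") inside tmp/ after runc spec, on both VMs, should leave the two configs identical.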

    2 [Src VM] Run container:

    sudo runc run -d looper &> /dev/null < /dev/null
    sudo runc ps looper

    3 [Dst VM] Start page server:

    mkdir c1
    sudo criu page-server --images-dir c1 --port 5000

    4 [Src VM] Run first pre-dump checkpoint:

    sudo runc checkpoint --pre-dump --image-path c1 --work-path c1 --page-server <dst IP address>:5000 looper
    # sudo crit show c1/stats-dump

    5 [Dst VM] Start page server (again):

    sudo criu page-server --images-dir c1 --port 5000 --auto-dedup

    6 [Src VM] Run second pre-dump checkpoint:

    sudo runc checkpoint --page-server <dst IP address>:5000 --pre-dump --image-path c2 --work-path c2 looper
    # sudo crit show c2/stats-dump

    7 [Dst VM] Start page server (again):

    sudo criu page-server --images-dir c1 --port 5000 --auto-dedup

    8 [Src VM] Run final checkpoint:

    sudo runc checkpoint --page-server <dst IP address>:5000 --image-path c3 --work-path c3 looper
    # sudo crit show c3/stats-dump
    # Send checkpoint files to destination VM:
    scp -r ./c3 <dst IP address>:
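    The source-side commands above follow a fixed pattern: N pre-dumps into c1..cN, then a final checkpoint into c(N+1), all sent to the same page server. A minimal sketch that only builds the argument lists (it runs nothing); source_commands, its parameters and the placeholder address 192.0.2.10 are hypothetical. Each list could be passed to subprocess.run(cmd, check=True), provided the destination page server has been (re)started for that round.

```python
from __future__ import annotations

import shlex


def source_commands(dst_ip: str, pre_dumps: int = 2, port: int = 5000,
                    container: str = "looper") -> list[list[str]]:
    """Build the runc checkpoint argv lists for the source VM, in order.

    Rounds 1..pre_dumps are pre-dumps into c1, c2, ...; the last round is the
    final checkpoint into c<pre_dumps + 1>. Nothing is executed here.
    """
    server = f"{dst_ip}:{port}"
    commands = []
    for i in range(1, pre_dumps + 2):
        images = f"c{i}"
        cmd = ["sudo", "runc", "checkpoint"]
        if i <= pre_dumps:
            # Pre-dump: dump memory pages only; the container keeps running afterwards.
            cmd.append("--pre-dump")
        cmd += ["--page-server", server,
                "--image-path", images, "--work-path", images,
                container]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # 192.0.2.10 is a documentation placeholder for the destination VM address.
    for cmd in source_commands("192.0.2.10"):
        print(shlex.join(cmd))
```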

    9 [Dst VM] Restore container:

    # Combine all checkpoint files
    mv ~/c3/* c1
    # Restore container
    sudo runc restore -d --image-path c1 --work-path c1 looper
    # sudo crit show c1/stats-restore
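    The "combine all checkpoint files" step moves the final checkpoint's images next to the pages the page server already wrote. A minimal Python sketch of that move; merge_images is a hypothetical helper, and the caller supplies the actual paths (for example ~/c3 and c1).

```python
from __future__ import annotations

import shutil
from pathlib import Path


def merge_images(src: str, dst: str) -> list[str]:
    """Move every entry of the final checkpoint directory into the restore directory.

    Mirrors `mv src/* dst`: on POSIX systems a file already present in dst under
    the same name is replaced. Unlike the shell glob, hidden files are moved too.
    """
    dst_dir = Path(dst)
    moved = []
    for entry in sorted(Path(src).iterdir()):
        shutil.move(str(entry), str(dst_dir / entry.name))
        moved.append(entry.name)
    return moved
```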

    The following commands are useful for clean-up:

    # Stop container
    sudo runc kill looper KILL
    
    # Remove stopped container
    sudo runc delete looper
    Avijit Dash
    @Avijit009

    @rst0git Hi

    Thank you. It is working now.

    One more question: can you point me to the engineering design of the CRIU dump and restore process, if there is one?

    Jun Gan
    @geminijun
    Does anyone have any ideas about this issue? checkpoint-restore/criu#1655
    Kaustubh Welankar
    @kaustubh0x77
    Hello folks. I have a sample program that uses about 800 MB of RAM. When I dump it with the --leave-running flag, I notice that memory usage spikes to 800 MB times the number of forked processes. I reasoned that this could be due to some sort of copy-on-write effect (https://criu.org/Copy-on-write_memory).
    If this is the case, I am curious which part of the code leads to the memory usage increase. Could someone point me to it?
    Adrian Reber
    @adrian:lisas.de
    @kaustubh0x77: is the memory usage reduced again after dumping or does it stay increased?
    Pavel Tikhomirov
    @Snorch
    Knowing how memory usage was measured would also be helpful
    Kaustubh Welankar
    @kaustubh0x77
    @adrian:lisas.de Memory stays at its peak till the process gets killed. Subsequent dumps of the same process do not increase the memory usage.
    @Snorch I was tracking free memory using watch -n 0.1 free
    Adrian Reber
    @adrian:lisas.de
    That sounds wrong; that way you are also seeing the memory usage of the file system cache. Just look at how much memory the processes actually use.
    Pavel Tikhomirov
    @Snorch
    You are just seeing CRIU's memory usage together with the original memory usage.
    To exclude the file system cache from the equation, you should look not at the “free” but at the “available” memory change.
    And yes, when CRIU saves X processes with Y memory each, it would use X*Y memory, counting file caches. Though, due to copy-on-write, the original processes might use only Y.
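    To follow this suggestion without free(1), the same counters can be read from /proc/meminfo: free takes its "free" and "available" columns from MemFree and MemAvailable. A minimal sketch; parse_meminfo and free_and_available are hypothetical helper names.

```python
from __future__ import annotations

from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def free_and_available(path: str = "/proc/meminfo") -> tuple[int, int]:
    """Return (MemFree, MemAvailable) in kB, the sources of free's two columns."""
    info = parse_meminfo(Path(path).read_text())
    return info["MemFree"], info["MemAvailable"]
```

    Taking one reading before and one after criu dump and comparing the drop in MemAvailable rather than in MemFree keeps reclaimable page cache (for example, freshly written image files) out of the comparison, since MemFree also falls as the cache grows.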
    Kaustubh Welankar
    @kaustubh0x77

    Ah,
    root@7a7c26439bc0:/repo# free
                 total        used        free      shared  buff/cache   available
    Mem:     462327192     7669320   142304568     8922604   312353304   443741192
    Swap:            0           0           0

    To be precise, I was looking at the used column.

    I saw the cache usage go up, but it drops again after checkpointing is over.
    The used value increases and then stays high even after criu dump finishes.
    Adrian Reber
    @adrian:lisas.de
    @Avijit009: what exactly do you mean? Not sure what you are asking for.
    Kaustubh Welankar
    @kaustubh0x77

    @adrian:lisas.de, I ran my experiment again. The used and buff/cache sizes both increase during dumping and stay at those high values until we kill the process. I ran echo 1 > /proc/sys/vm/drop_caches. This reduced the buff/cache space, but not the used space.

    I think this is something other than the write buffers

    Adrian Reber
    @adrian:lisas.de
    But buffers and cache staying high is normal behavior. Those are only freed by the OS if necessary
    Kaustubh Welankar
    @kaustubh0x77
    Yup
    Adrian Reber
    @adrian:lisas.de
    You have to check how much memory the actual processes are using
    Kaustubh Welankar
    @kaustubh0x77
    How can I do that?
    Adrian Reber
    @adrian:lisas.de
    Maybe run top and order by memory usage
    Kaustubh Welankar
    @kaustubh0x77
    Okay, that sounds good
    Adrian Reber
    @adrian:lisas.de
    Look at /proc/PID for memory usage
    Kaustubh Welankar
    @kaustubh0x77
    The actual memory usage per process isn't going up. It stays fixed.
    But I think that's the virtual memory usage being tracked. The physical memory usage looks like it's increasing.
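    To separate virtual from physical usage per process, /proc/PID/status reports VmSize (virtual) and VmRSS (resident), and /proc/PID/smaps_rollup (Linux 4.14 and later) reports Pss. RSS counts copy-on-write pages that a forked child still shares with its parent in every process, so summing VmRSS across forked processes overstates physical use; Pss divides each shared page among the processes mapping it. A minimal sketch; parse_kb_fields and process_memory are hypothetical names, values stay in kB as the kernel reports them, and reading another user's process needs suitable privileges.

```python
from __future__ import annotations

from pathlib import Path


def parse_kb_fields(text: str, wanted: set[str]) -> dict[str, int]:
    """Pick "Key:  <n> kB" lines out of a /proc file; values stay in kB."""
    out = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            out[key] = int(rest.split()[0])
    return out


def process_memory(pid: int) -> dict[str, int]:
    """Virtual size, resident size and proportional set size of one process."""
    proc = Path("/proc") / str(pid)
    mem = parse_kb_fields((proc / "status").read_text(), {"VmSize", "VmRSS"})
    # Pss splits pages shared copy-on-write evenly between the sharing processes.
    mem.update(parse_kb_fields((proc / "smaps_rollup").read_text(), {"Pss"}))
    return mem
```

    Summing Pss over the parent and its children gives a figure that does not double-count shared pages, unlike summing VmRSS.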
    Kaustubh Welankar
    @kaustubh0x77

    Anyway, I have a minimal repro of the issue. A python script (I ran it with python3 script.py)

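    import os
    from time import sleep
    
    # a x b nested list; every element refers to the same '?' string (~800 MB)
    a = 10000
    b = 10000
    array_ab = [['?' for _ in range(a)] for _ in range(b)]
    
    # Fork 5 children. Each child leaves the loop, so 6 processes end up sharing
    # the list copy-on-write.
    for _ in range(5):
        n = os.fork()
        if n > 0:
            print("Parent process:", os.getpid(), "forked child:", n)
        else:
            break
    
    # Keep every process alive so it can be checkpointed.
    while True:
        print("Hello")
        sleep(5)
        print("World")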

    The memory usage of this script is around 800 MB. After checkpointing with sudo criu dump -j -t <pid> --tcp-established --ghost-limit=9999999999 --leave-running --file-locks --images-dir /mnt/image0 and running echo 3 > /proc/sys/vm/drop_caches, buff/cache goes down but used still stays high.

    Adrian Reber
    @adrian:lisas.de
    @kaustubh0x77: I just tried it and I see 6 processes, each using 800MB. After dumping I still see 6 processes, each using 800MB.
    Nothing changes about the processes memory consumption
    The checkpoint is about 4.8 GB
    I did not use --tcp-established --ghost-limit=9999999999 --file-locks because those options seem unnecessary
    It seems like you are just interpreting the wrong numbers
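    A rough size check for the repro script, assuming 64-bit CPython (8-byte pointers) and ignoring list headers and list over-allocation: every element of array_ab refers to the same '?' string object, so each process holds roughly one pointer per element. Assuming CRIU writes the pages of each of the 6 processes separately, this lines up with the checkpoint size reported above.

```latex
\begin{aligned}
\text{per process} &\approx a \cdot b \cdot 8\,\mathrm{B} = 10^{4} \cdot 10^{4} \cdot 8\,\mathrm{B} = 8 \times 10^{8}\,\mathrm{B} \approx 800\,\mathrm{MB} \\
\text{checkpoint} &\approx 6 \times 800\,\mathrm{MB} = 4.8\,\mathrm{GB}
\end{aligned}
```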
    Avijit Dash
    @Avijit009
    How do I assign memory to a container? For example, I want to assign 500MB of memory to a container and then migrate it.
    Adrian Reber
    @adrian:lisas.de
    @Avijit009: please be more specific. It is not clear what you mean.
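    If the question is about capping a runc container's memory, the OCI runtime configuration (config.json) has a linux.resources.memory.limit field, in bytes. A minimal sketch under that interpretation; set_memory_limit and MIB are hypothetical names, and the sketch says nothing about how the limit is handled during checkpoint/restore. The migration steps earlier in this thread expect the same config.json on both VMs, so the change would have to be made on both.

```python
import json
from pathlib import Path

MIB = 1024 * 1024


def set_memory_limit(path: str, limit_bytes: int) -> dict:
    """Set linux.resources.memory.limit (in bytes) in an OCI config.json."""
    cfg_path = Path(path)
    cfg = json.loads(cfg_path.read_text())
    # Create the nested objects if runc spec did not emit them.
    resources = cfg.setdefault("linux", {}).setdefault("resources", {})
    resources.setdefault("memory", {})["limit"] = limit_bytes
    cfg_path.write_text(json.dumps(cfg, indent=4) + "\n")
    return cfg
```

    For 500 MB this would be called as set_memory_limit("config.json", 500 * MIB) before runc run.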
    Pavel Tikhomirov
    @Snorch
    Happy New Year to everyone involved in making CRIU work! Thanks for another fruitful year of code/review/issue-report-and-fix! =)
    venky1254
    @venky1254
    @adrian:lisas.de I have a use case like the one below and want to check whether CRIU helps here.
    1) A simple echo TCP client/server program runs in a podman container and supports 100 simultaneous connections.
    2) Now I have an updated version of the TCP client/server program that supports 200 connections and has enhanced features.
    I want to migrate it to a different container, which should run the updated version of the TCP client/server program while keeping the old connections intact and accepting new connections on the same IP and port number. It's all within the same host; the host kernel and dependent libraries are unchanged.
    Adrian Reber
    @adrian:lisas.de
    @venky1254: That does not sound like a CRIU use case. You are trying to replace one process with another process and not checkpoint/restore a process.