    Kaustubh Welankar
    @kaustubh0x77

    Ah,
    root@7a7c26439bc0:/repo# free
                   total        used        free      shared  buff/cache   available
    Mem:       462327192     7669320   142304568     8922604   312353304   443741192
    Swap:              0           0           0

    I was looking at the used column to be precise

    I saw the cache usage going up, but that goes back down after checkpointing is over.
    The used value increases and then stays high even after criu dump finishes.
    Adrian Reber
    @adrian:lisas.de
    [m]
    @Avijit009: what exactly do you mean? Not sure what you are asking for.
    Kaustubh Welankar
    @kaustubh0x77

    @adrian:lisas.de, I ran my experiment again. The used size and buff/cache size both increase during dumping, and stay at that high value until we kill the process. I ran echo 1 > /proc/sys/vm/drop_caches. This reduced the buff/cache space, but not the used space.

    I think this is something other than the write buffers

    Adrian Reber
    @adrian:lisas.de
    [m]
    But buffers and cache staying high is normal behavior. Those are only freed by the OS if necessary.
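
    For reference, a minimal sketch (assuming a Linux /proc/meminfo) that prints MemFree next to MemAvailable; MemAvailable already accounts for the page cache the kernel can reclaim, so it is usually a better number to watch than used or free:

    # Minimal sketch: compare MemFree with MemAvailable from /proc/meminfo.
    # MemAvailable includes reclaimable page cache (most of buff/cache).
    def meminfo_kb():
        info = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, value = line.split(":", 1)
                info[key] = int(value.split()[0])  # values are in kB
        return info

    info = meminfo_kb()
    print("MemFree:      %d kB" % info["MemFree"])
    print("MemAvailable: %d kB" % info["MemAvailable"])
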
    Kaustubh Welankar
    @kaustubh0x77
    Yup
    Adrian Reber
    @adrian:lisas.de
    [m]
    You have to check how much memory the actual processes are using
    Kaustubh Welankar
    @kaustubh0x77
    How can I do that?
    Adrian Reber
    @adrian:lisas.de
    [m]
    Maybe run top and order by memory usage
    Kaustubh Welankar
    @kaustubh0x77
    Okay, that sounds good
    Adrian Reber
    @adrian:lisas.de
    [m]
    Look at /proc/PID for memory usage
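
    For example, a minimal sketch (assuming the PID is passed on the command line) that reads the resident set size, VmRSS, from /proc/<pid>/status; this is the physical memory actually mapped by that one process:

    # Minimal sketch: print the resident set size of a single process.
    import sys

    def vm_rss_kb(pid):
        with open("/proc/%d/status" % pid) as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # value is in kB
        return 0

    pid = int(sys.argv[1])
    print("PID %d VmRSS: %d kB" % (pid, vm_rss_kb(pid)))
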
    Kaustubh Welankar
    @kaustubh0x77
    The actual memory usage per process isn't going up. It stays fixed.
    But I think that's the virtual memory usage that's being tracked. The physical memory usage looks like it's increasing.
    Kaustubh Welankar
    @kaustubh0x77

    Anyway, I have a minimal repro of the issue: a Python script (I ran it with python3 script.py).

    from time import sleep
    import os

    # Allocate a large nested list so the process has a big memory footprint.
    a = 10000
    b = 10000
    array_ab = [['?' for i in range(a)] for j in range(b)]

    # Fork 5 children; each child breaks out of the loop, the parent continues.
    for var in range(5):
        n = os.fork()
        if n > 0:
            print("Parent process:", os.getpid())
        else:
            break

    while True:
        print("Hello")
        sleep(5)
        print("World")

    The memory usage of this script is around 800 MB. After checkpointing with sudo criu dump -j -t <pid> --tcp-established --ghost-limit=9999999999 --leave-running --file-locks --images-dir /mnt/image0 and then running echo 3 > /proc/sys/vm/drop_caches, the buff/cache goes down but the used still stays high.
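
    As a cross-check, a minimal sketch that sums VmRSS over every process whose command line mentions script.py (the repro above), so the per-process total can be compared with free's used column before and after the dump:

    # Minimal sketch: total resident set size of all script.py processes.
    import os

    total_kb = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/cmdline" % entry, "rb") as f:
                if b"script.py" not in f.read():
                    continue
            with open("/proc/%s/status" % entry) as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        total_kb += int(line.split()[1])  # kB
                        break
        except (IOError, OSError):
            continue  # process exited while we were scanning

    print("Total VmRSS for script.py processes: %d kB" % total_kb)
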

    Adrian Reber
    @adrian:lisas.de
    [m]
    @kaustubh0x77: I just tried it and I see 6 processes, each using 800MB, after dumping I still see 6 processes each using 800MB
    Nothing changes about the processes' memory consumption
    The checkpoint is about 4.8 GB
    I did not use --tcp-established --ghost-limit=9999999999 --file-locks because those options seem unnecessary
    It seems like you are just interpreting the wrong numbers
    Avijit Dash
    @Avijit009
    How do I assign memory to a container? Like, I want to assign 500 MB of memory to a container and then migrate it.
    Adrian Reber
    @adrian:lisas.de
    [m]
    @Avijit009: please be more specific. It is not clear what you mean.
    Pavel Tikhomirov
    @Snorch
    Happy New Year to everyone involved in making CRIU work! Thanks for another fruitful year of code/review/issue-report-and-fix! =)
    venky1254
    @venky1254
    @adrian:lisas.de I have a use case like the one below and want to check whether CRIU helps here:
    1) A simple echo TCP client/server program running in a Podman container, supporting 100 simultaneous connections.
    2) Now I have an updated version of the TCP client/server program that supports 200 connections and has enhanced features.
    I want to migrate it to a different container that runs the updated version of the TCP client/server program, keeping the old connections intact and accepting new connections on the same IP and port number. It's all within the same host; the host kernel and dependent libraries stay the same.
    Adrian Reber
    @adrian:lisas.de
    [m]
    @venky1254: That does not sound like a CRIU use case. You are trying to replace one process with another process and not checkpoint/restore a process.
    Younes Manton
    @ymanton
    I'm running into issues with this code in pstree.c: https://github.com/checkpoint-restore/criu/blob/04f8368eaee2b29bb92ff0ba4f5c43501408d15e/criu/pstree.c#L372-L412
    Basically I see Migrating process tree (SID 8->7) which results in 7 being added to some set of PIDs followed by Migrating process tree (GID 8->7) which fails with Error (criu/pstree.c:404): Current gid 7 intersects with pid (255) in images because it actually collides with the 7 that was just added. The code doesn't make much sense to me; if sid/gid is initially the same they shouldn't really be considered as colliding after migration should they?
    Rajneesh Bhardwaj
    @rajbhar
    So I ran into this classic issue where I was checkpointing a process that had been invoked with an extended command line with its output piped to a tee process, e.g. "./mytest 2>&1 | tee mytest.log". For me the checkpoint would always work, and on restore CRIU would always say OK, but the process would not resume. I think I ran into the classic issue explained here: https://criu.org/Inheriting_FDs_on_restore . I have two observations:
    1. The CRIU restore logs should probably give some hint, as they always said "tasks resumed successfully" while the process would immediately get killed, since I was only checkpointing the process with the pid of "mytest" and not "tee". I understand I should have run this inside a wrapper shell script, so that both "mytest" and "tee" would become children of the shell script and everything would work fine. I spent a lot of time debugging in kernel mode (amdgpu driver) which I could probably have avoided if I had been aware of this behavior, or been a little less dumb :)
    2. The documentation at https://criu.org/Inheriting_FDs_on_restore is not very clear to me about the usage of --inherit-fd. I think we could improve it, but my understanding of --inherit-fd is not yet good enough to contribute to the page.
    Rajneesh Bhardwaj
    @rajbhar
    If you agree that the --inherit-fd issue could have been hinted at or warned about in the logs, just like the hint for --tcp-established etc., I could work on a patch for it. Please let me know your thoughts. Thank you!
    Rajneesh Bhardwaj
    @rajbhar
    I was able to restore the task with sudo criu restore -vvv -o restore.log --shell-job --link-remap --inherit-fd fd[2]:pipe:[257258] && echo OK, where I had to look into the checkpoint logs for the pipe_id and fd.
    (00.023302) Dumping pipe 15 with id 0x6 pipe_id 0x3ecea
    (00.023313) 23339 fdinfo 2: pos: 0 flags: 1/0
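
    For reference, a minimal sketch (assuming the target PID and fd number are known while the process is still running) that prints what the fd points at; for a pipe the readlink result has the pipe:[inode] form used after fd[N]: in --inherit-fd:

    # Minimal sketch: show what fd <N> of process <PID> refers to,
    # e.g. "fd[2]:pipe:[257258]" for a pipe on stderr.
    import os
    import sys

    pid, fd = int(sys.argv[1]), int(sys.argv[2])
    target = os.readlink("/proc/%d/fd/%d" % (pid, fd))
    print("fd[%d]:%s" % (fd, target))
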
    Pavel Tikhomirov
    @Snorch

    I'm running into issues with this code in pstree.c: https://github.com/checkpoint-restore/criu/blob/04f8368eaee2b29bb92ff0ba4f5c43501408d15e/criu/pstree.c#L372-L412
    Basically I see Migrating process tree (SID 8->7) which results in 7 being added to some set of PIDs followed by Migrating process tree (GID 8->7) which fails with Error (criu/pstree.c:404): Current gid 7 intersects with pid (255) in images because it actually collides with the 7 that was just added. The code doesn't make much sense to me; if sid/gid is initially the same they shouldn't really be considered as colliding after migration should they?

    I believe that in some cases the phrase "shouldn't be considered as colliding" can be arguable. Imagine that when you dump the process it has both sid=10 and pgid=20 external (no process with pid=10 or pid=20 in the dumped subtree of processes), i.e. sid and pgid are different, and on restore you try to rewrite (10,20) with (30,30) because the process calling restore happens to have the same sid=pgid=30. Changing 10->30 and 20->30 can be considered wrong because we convert a separate process group into a session-initial process group; this may also affect signal behavior after restore. So I'm not sure; probably we should restrict restoring with --shell-job if the inherited sid/pgid topology does not match exactly.
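
    For illustration, a minimal sketch that prints the (sid, pgid) pair of the current process and of its parent, which is the kind of topology being discussed (an interactive shell is typically its own session and group leader, i.e. the sid=pgid case):

    # Minimal sketch: inspect the session id and process group id
    # of this process and of its parent.
    import os

    for label, pid in (("self", os.getpid()), ("parent", os.getppid())):
        print("%s: pid=%d sid=%d pgid=%d"
              % (label, pid, os.getsid(pid), os.getpgid(pid)))
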

    Pavel Tikhomirov
    @Snorch
    But for your exact case, where (8,8) changes to (7,7), it looks like we had a bug; I will send a fix PR soon.
    Younes Manton
    @ymanton
    @Snorch I agree that (x,y)->(z,z) is not possible to deal with, but for me it looks like it's frequently the (x,x)->(y,y) case. Thanks for the patch! I'll test your fix and reply in the PR.
    Pavel Tikhomirov
    @Snorch
    @ymanton You are welcome! And thanks for reporting it!
    Rajneesh Bhardwaj
    @rajbhar

    If you agree that --inherit-fd issue could have been hinted/warned by the logs, just like hint for --tcp-established etc, I could work a patch for it. Please let me know your thoughts. Thank you!

    @Snorch @adrian:lisas.de

    Luca Repetti
    @klaa97

    Hello!

    Do you have any experience in trying to checkpoint/restart MPI processes? In particular, I am stumbling across this error in the restore.log:

    56227: Error (criu/files-reg.c:1831): Can't open file dev/shm/vader_segment.box.54c60001.0 on restore: No such file or directory
    56227: Error (criu/files-reg.c:1767): Can't open file dev/shm/vader_segment.box.54c60001.0: No such file or directory
    56227: Error (criu/mem.c:1383): `- Can't open vma
    Error (criu/cr-restore.c:2397): Restoring FAILED.

    I have --tcp-established enabled, and to me it seems like the problem is in restoring the open connection. For more context, the process I am trying to restart is an MPI process; I dump it using --leave-running and then restore it, but it does not seem to work and I get the error above. Any suggestion on how to investigate/fix this problem?

    Adrian Reber
    @adrian:lisas.de
    [m]
    @klaa97: I have done this many years ago and it is difficult
    Open MPI uses shared memory for local inter-process communication
    to handle that correctly you need to run it in an IPC namespace
    but you can also switch to TCP-based localhost communication in Open MPI
    that is slightly slower, but checkpointable
    Luca Repetti
    @klaa97

    First of all thank you so much for the answer!

    I have read various issues on this topic (I have been investigating this for a few weeks), such as this one: https://github.com/checkpoint-restore/criu/issues/1247#issuecomment-717903614 . What is not clear to me: I am actually trying to restore the process on the same host, so even if shared memory was actually used, I don't get why MPI cannot use the same files.

    Anyway, I will try to solve this by:
    A: loading the MPI nodes in separate Docker environments (yes, I get the Docker overhead, but then CRIU checkpoints a Docker image rather than a bare MPI process with all the related complications, which should be much better)
    B: changing the local process communication - I will try setting the mca parameters to tcp and check what happens (see the sketch below)
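
    A minimal sketch of option B, assuming Open MPI 4.x (where the shared-memory transport is the vader BTL that created the /dev/shm/vader_segment.* files in the error above): restricting the btl MCA parameter to self,tcp makes local ranks talk over TCP instead of shared memory. The application name ./app and the rank count are placeholders.

    # Minimal sketch: run an Open MPI job with shared-memory BTLs disabled,
    # so local communication goes over TCP (easier to checkpoint with CRIU).
    import subprocess

    cmd = [
        "mpirun",
        "--mca", "btl", "self,tcp",  # only the self and tcp transports
        "-np", "4",
        "./app",                     # placeholder application
    ]
    subprocess.check_call(cmd)
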

    Adrian Reber
    @adrian:lisas.de
    [m]
    @klaa97: Restoring shared memory can be difficult. Not sure it will work with an IPC namespace. I always switched to TCP. That works.
    Regarding the container: not sure how you are using it. There are usually two approaches with containers and MPI:
    one container for each MPI rank, or one container for all ranks and the main process.