    Younes Manton
    @ymanton
    If you mean checkpointing containers from the host side, I'm not doing that exactly. I'm checkpointing and restoring a process in the container manually. I haven't checked if this problem exists when checkpointing containers.
    Adrian Reber
    @adrian:lisas.de
    Ah
    And can you do a chmod / or that doesn't work from the inside?
    Did you try to change the checkpoint image using crit so that it has the desired mode in the image?
    Younes Manton
    @ymanton
    chmod works, I'll try adding that somewhere in my sequence to see if it gets around the problem. Haven't tried crit, but I'll give it a go if need be.
    Adrian Reber
    @adrian:lisas.de
    From my point of view a criu option could make sense, but maybe I am also missing something. You could open a PR to see what the other maintainers are thinking about it.
    Younes Manton
    @ymanton
    Thanks, I'll do that.
    alidhamieh
    @alidhamieh
    @adrian:lisas.de Do you know of a simple multi-tier app example for testing live migration of a stack of a few containers?
    Adrian Reber
    @adrian:lisas.de
    @alidhamieh: not sure I correctly understand your question, but I would say I am not aware of something like that
    alidhamieh
    @alidhamieh
    Like an app that has multiple containers, similar to your podman-criu-test container for live migration, but for live migration of a whole stack.
    jumbohei
    @jumbohei
    Hi, I have a question about permissions and would much appreciate any input. Say user A writes a program, runs it as a regular user, and then wants to take a snapshot, and I don't want to give this person root access. Is that doable? If yes, can user A go into the image and modify the UID to 0 (root) in order to restore and run that application as root? Thanks!
    Liang Chun
    @featherchen
    Hi everyone, I am liangchun. As an open source lover, I am pleased to join GSoC 2022 as a contributor to CRIU, and I will be focusing on the topic "Support sparse ghosts" this summer. I am looking forward to working together with the community.
    Adrian Reber
    @adrian:lisas.de
    @featherchen: welcome
    manasmgkar
    @manasmgkar
    Welcome @featherchen
    Pavel Tikhomirov
    @Snorch
    @featherchen welcome =)
    alidhamieh
    @alidhamieh
    When running sudo podman container checkpoint x --export=/tmp/x.tar.gz --tcp-established, I had this error:
    (01.639740) mnt: 902: 72:/ @ ./sys
    (01.639742) mnt: 901: 71:/ @ ./dev
    (01.639746) mnt: Mount is not fully visible ./dev
    (01.639781) mnt: mount has children ./dev
    (01.648461) mnt: 900: 6e:/ @ ./proc
    (01.648478) mnt: 899: 6b:/ @ ./
    (01.648507) Dumping file-locks
    (01.648510) Error (criu/file-lock.c:111): Some file locks are hold by dumping tasks! You can try --file-locks to dump them.
    (01.648597) Unlock network
    (01.648635) Running network-unlock scripts
    (01.648638) RPC
    (01.653766) Unfreezing tasks into 1
    (01.653786) Unseizing 12464 into 1
    (01.654112) Error (criu/cr-dump.c:1781): Dumping FAILED.
    That is the base image: mongo:3.6.15-xenial
    Adrian Reber
    @adrian:lisas.de
    @alidhamieh: did you try to use --file-locks?
    It is an option to podman container checkpoint
    alidhamieh
    @alidhamieh
    sudo podman container checkpoint swac3-server_receiving-1 --export=/tmp/swac3-server_receiving-1.tar.gz --file-locks --tcp-established
    Error: unknown flag: --file-locks
    Adrian Reber
    @adrian:lisas.de
    ah, then you probably need a newer version of podman or try to drop file-locks into the criu configuration file
    alidhamieh
    @alidhamieh
    How do I drop file-locks into the config?
    Adrian Reber
    @adrian:lisas.de
    are you using runc or crun?
    alidhamieh
    @alidhamieh
    runc
    Adrian Reber
    @adrian:lisas.de
    echo "file-locks" >> /etc/criun/runc.conf
    alidhamieh
    @alidhamieh
    Sorry, how do I know if I am using runc or crun?
    seems runc
    it worked without the n in criun
    Adrian Reber
    @adrian:lisas.de
    yeah, that was a typo
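
The configuration-file workaround from this exchange can be made repeatable with a small shell helper. This is a minimal sketch, not part of CRIU or podman: `add_criu_option` is a hypothetical function name, and it assumes the file takes one option per line without leading dashes, as the `echo "file-locks"` command above suggests. It appends the option only if it is not already present, so running it twice does not duplicate the line.

```shell
# Hypothetical helper: append a CRIU option to a config file exactly once.
add_criu_option() {
    conf="$1"   # e.g. /etc/criu/runc.conf (path taken from the chat above)
    opt="$2"    # e.g. file-locks

    # Create the directory if it does not exist yet.
    mkdir -p "$(dirname "$conf")"
    # -x matches whole lines only, -F treats the option as a fixed string.
    if ! grep -qxF -- "$opt" "$conf" 2>/dev/null; then
        printf '%s\n' "$opt" >> "$conf"
    fi
}
```

From a root shell, `add_criu_option /etc/criu/runc.conf file-locks` would have the same effect as the corrected `echo` command.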
    Do Hoang
    @huyhoang8398
    Can I ask where the block of code is in which CRIU uses pages.img for restoration?
    Pavel Tikhomirov
    @Snorch

    Sure you can =)

    This is how you normally find where an image is used in the CRIU code:

    1) First look for an image name "pages-<id>.img":

    [# criu]$ grep -r "pages-" criu
    criu/image-desc.c:      FD_ENTRY_F(PAGES,       "pages-%u", O_NOBUF),
    criu/image-desc.c:      FD_ENTRY_F(PAGES_OLD,   "pages-%d", O_NOBUF),
    criu/image-desc.c:      FD_ENTRY_F(SHM_PAGES_OLD, "pages-shmem-%ld", O_NOBUF),

    2) See what FD_ENTRY_F is:

    [# criu]$ git grep -A1 "#define FD_ENTRY_F"
    criu/image-desc.c:#define FD_ENTRY_F(_name, _fmt, _f)     \
    criu/image-desc.c-      [CR_FD_##_name] = {

    3) Look for CR_FD_PAGES open:

    [# criu]$ git grep open.*CR_FD_PAGES
    criu/image.c:   return open_image_at(dfd, CR_FD_PAGES, flags, *id);
    criu/mem.c:     pages = open_image(CR_FD_PAGES, opts.auto_dedup ? O_RDWR : O_RSTR, rsti(t)->pages_img_id);

    The one in criu/mem.c is in prepare_vma_ios() and the one in criu/image.c is in open_pages_image_at().

    4) The Vim cctree plugin says prepare_vma_ios() is on the restore path:

      +-< prepare_vma_ios
        +-< prepare_vmas
        | +-< restore_one_alive_task
        | | +-< restore_one_task

    5) open_pages_image_at() is used on both dump and restore

    Pavel Tikhomirov
    @Snorch
    And if we add some magic from here https://github.com/Snorch/call_tree_builder we get a nice picture:
    pages-image.png
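
Steps 1 to 3 of the walkthrough above can be chained into one shell function. This is a rough sketch under stated assumptions: `find_image_opens` is a hypothetical name, the first argument is assumed to be a CRIU source checkout, and the second argument is used as a regular expression, so plain prefixes such as `pages-` work best. It relies only on the pattern shown in step 2, where an `FD_ENTRY_F(NAME, ...)` entry corresponds to `CR_FD_NAME`.

```shell
# Hypothetical helper: list the places that open the image type whose
# file name starts with the given prefix.
find_image_opens() {
    src_dir="$1"   # path to a CRIU source checkout (assumption)
    prefix="$2"    # image name prefix, e.g. "pages-"

    # Steps 1-2: find FD_ENTRY-style entries with a matching format string
    # and extract NAME, which maps to the CR_FD_NAME descriptor type.
    grep -rhE "FD_ENTRY(_F)?\([A-Z_]+,[[:space:]]*\"${prefix}" "$src_dir" |
        sed -E 's/.*FD_ENTRY(_F)?\(([A-Z_]+),.*/\2/' |
        sort -u |
        while read -r name; do
            # Step 3: look for lines that open that descriptor type.
            # A name with no matches is not treated as an error.
            grep -rnE "open.*CR_FD_${name}([^A-Z_]|\$)" "$src_dir" || true
        done
}
```

Called as `find_image_opens criu "pages-"` from the top of the source tree, it should print lines like the ones from the manual `git grep` in step 3. Finding the callers of the enclosing functions (steps 4 and 5) still needs a call-graph tool such as the ones mentioned above.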
    Do Hoang
    @huyhoang8398
    Holy. Thanks a lot
    Do Hoang
    @huyhoang8398
    Is it possible to make a fake pages.img from a kernel module? :| If so, can you guys recommend a way to do it?
    Radostin Stoyanov
    @rst0git
    What do you mean by "fake pages.img"?
    Do Hoang
    @huyhoang8398
    I have to tweak CRIU for my own purposes, so I need to dump pages, but using a kernel module.
    Radostin Stoyanov
    @rst0git
    I'm not sure I understand your question. CRIU uses the parasite code to dump memory pages: https://criu.org/Parasite_code
    Timo
    @TVH7

    Hi there,

    First of all, CRIU looks like a very cool project. Using the out-of-the-box "criu" CLI was super easy and appears to do what I want: checkpoint a TCP connection and restore it somewhere else. For now, I tried this by just restoring a Docker container using the guide on criu.org. However, my goal is to move this functionality into an application running in a Kubernetes pod, which can then be used to "transfer" a TCP connection from my Kubernetes pod to another Kubernetes pod whenever a pod gets rebalanced.

    I learned that libsoccr has been built to do exactly this, so I'm trying to build a little C application that simulates a TCP client/server connection and uses libsoccr to checkpoint/restore the TCP connection of the client. The last time I used C was years ago, so I am really struggling to get the libsoccr.a library linked to my "demo-app".

    I tried the following:

    1. Built the criu repo.
    2. Moved libsoccr.a from the soccr directory to a "lib" folder inside my project.
    3. Copied the include directory from criu to my project.
    4. Ran GCC to link the lib: gcc doSocket.c -lsoccr -o doSocket.o -I include -L lib

    However, it fails trying to find libnet_init.

    I am guessing that I am linking the project incorrectly (I probably also have to link libnet and other dependencies), but I have to admit that I'm not sure where to go from here, as my C skills are lacking.

    Can anyone point me in the right direction? Is there maybe a demo application on GitHub somewhere that demonstrates how to use libsoccr as a standalone library?

    Help would be really appreciated. I hope that I am not annoying you with my beginner questions.

    Do Hoang
    @huyhoang8398
    Is it possible to translate an application's virtual address and number of pages to a PFN or struct page in a kernel module?
    Shreyas Kharbanda
    @Alphacode18

    Hi all,

    I have been trying to set up CRIU v3.13 inside a privileged Docker container (a dockerized CRIU image is bundled as part of the Dockerfile on an Alpine Linux base) to checkpoint a certain process. The containers run inside a Kubernetes cluster on a fresh install of Ubuntu 20.04 LTS.

    I have done some sanity checks, namely running the criu check command from within the container. Apart from criu check, I have checked for various privilege accesses and all come back positive.

    The criu check command returns the following:

    Error (criu/util.c:610): exited, status=1
    Error (criu/util.c:610): exited, status=1
    Warn (criu/kerndat.c:839): Can't keep kdat cache on non-tempfs
    Looks good.

    I am currently trying to get the simple loop example (https://criu.org/Simple_loop) working. After following the instructions, I see that dumping the process fails with error code -1.

    pie: 52: Warn (criu/pie/parasite.c:648): /proc/self/cgroup was bigger than the page size
    pie: 52: __sent ack msg: 76 76 -1
    pie: 52: Close the control socket for writing
    (00.037284) Fetched ack: 76 76 -1
    pie: 52: Daemon waits for command
    (00.037294) Error (compel/src/lib/infect-rpc.c:72): Command 76 for daemon failed with -1
    (00.037305) Error (criu/parasite-syscall.c:447): Parasite failed to dump /proc/self/cgroup

    I haven't had much success in finding a solution to the problem in the GitHub issues, and I'd really appreciate it if you could point me in the right direction.

    Pavel Tikhomirov
    @Snorch
    That likely happens because you have too many nested cgroup directories for the dumpee process. If you show /proc/<dumpee_pid>/cgroup, that can be confirmed.
    One way of fixing or working around it is to use criu on the host =)
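
The suspected cause can be checked before dumping by comparing the size of the cgroup file with the page size. This is a minimal sketch: `fits_in_one_page` is a hypothetical helper, and it assumes, based only on the warning text above, that the limit is roughly one memory page.

```shell
# Hypothetical helper: succeed only if FILE is smaller than one memory page.
fits_in_one_page() {
    file="$1"   # e.g. /proc/<dumpee_pid>/cgroup
    page_size=$(getconf PAGESIZE)
    # Read through cat: files under /proc report size 0 to stat-based tools.
    # The arithmetic expansion strips any padding that wc adds.
    bytes=$(( $(cat "$file" | wc -c) ))
    echo "$file: $bytes bytes (page size: $page_size)"
    [ "$bytes" -lt "$page_size" ]
}
```

Running `fits_in_one_page /proc/<dumpee_pid>/cgroup` inside the container and again on the host would show whether the nested cgroup paths are what pushes the file over the limit.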
    Adrian Reber
    @adrian:lisas.de
    After almost two years my Kubernetes checkpoint support PR was finally merged today: kubernetes/kubernetes#104907. That took a lot longer than expected, but now it should be possible to checkpoint containers in Kubernetes with the help of CRIU 🎆
    Prajwal S N
    @snprajwal
    That's awesome!! Congratulations 🎊
    Zeyad Yasser
    @ZeyadYasser
    Awesome, Congrats 🎉
    Pavel Tikhomirov
    @Snorch
    Cool, Congrats!!!
    Andrei Vagin
    @avagin
    @adrian:lisas.de good job! Congrats!
    Adrian Reber
    @adrian:lisas.de
    @mihalicyn: have you seen this: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1973620 ? Does this mean that CRIU is again broken on Ubuntu kernels?