    Younes Manton
    @ymanton
    I wouldn't be surprised if portability across distros is not possible for many other reasons, so this check probably doesn't hurt anything unless you went out of your way to make sure everything else was compatible, but images not being portable across container engines is a bit surprising. Can this check be relaxed or controlled via an option perhaps?
    Adrian Reber
    @adrian:lisas.de
    [m]
    The bigger problem, to me, seems to be that docker and podman use different checkpoint export mechanisms, so importing a checkpoint into another engine is not easily done today.
    Portability across distributions sounds difficult anyway, as kernels can be very different.
    I have no opinion on whether this check could be relaxed. But since a checkpoint image carries metadata anyway, it sounds easier to track the desired mode in the metadata and change the mode of the root directory during restore.
    Younes Manton
    @ymanton
    If you mean checkpointing containers from the host side, I'm not doing that exactly. I'm checkpointing and restoring a process in the container manually. I haven't checked if this problem exists when checkpointing containers.
    Adrian Reber
    @adrian:lisas.de
    [m]
    Ah
    And can you do a chmod on /, or does that not work from the inside?
    Did you try changing the checkpoint image with crit so that the image has the desired mode?
    Younes Manton
    @ymanton
    chmod works, I'll try adding that somewhere in my sequence to see if it gets around the problem. Haven't tried crit, but I'll give it a go if need be.
    Adrian Reber
    @adrian:lisas.de
    [m]
    From my point of view a criu option could make sense, but maybe I am also missing something. You could open a PR to see what the other maintainers are thinking about it.
    Younes Manton
    @ymanton
    Thanks, I'll do that.
    alidhamieh
    @alidhamieh
    @adrian:lisas.de Do you know of a simple multi-tier app example for testing live migration of an app made up of a stack of a few containers?
    Adrian Reber
    @adrian:lisas.de
    [m]
    @alidhamieh: not sure I correctly understand your question, but I would say I am not aware of something like that
    alidhamieh
    @alidhamieh
    Like an app that has multiple containers, similar to your podman-criu-test container for live migration, but for live migration of a whole stack.
    jumbohei
    @jumbohei
    Hi, I have a question regarding permissions and would much appreciate any input. Let's say user A writes a program, runs it as a regular user, and then wants to take a snapshot, and I don't want to give this person root access. Is that doable? If yes, can user A go into the image and modify the UID to 0, i.e. root, to restore and run that application as root? Thanks!
    Liang Chun
    @featherchen
    Hi everyone, I am liangchun. As an open source lover, I am pleased to join GSoC 2022 as a contributor to CRIU, and I will be focusing on the topic "Support sparse ghosts" this summer. I am looking forward to working together with the community.
    Adrian Reber
    @adrian:lisas.de
    [m]
    @featherchen: welcome
    manasmgkar
    @manasmgkar
    Welcome @featherchen
    Pavel Tikhomirov
    @Snorch
    @featherchen welcome =)
    alidhamieh
    @alidhamieh
    When running sudo podman container checkpoint x --export=/tmp/x.tar.gz --tcp-established, I had this error:
(01.639740) mnt: 902: 72:/ @ ./sys
    (01.639742) mnt: 901: 71:/ @ ./dev
    (01.639746) mnt: Mount is not fully visible ./dev
    (01.639781) mnt: mount has children ./dev
    (01.648461) mnt: 900: 6e:/ @ ./proc
    (01.648478) mnt: 899: 6b:/ @ ./
    (01.648507) Dumping file-locks
    (01.648510) Error (criu/file-lock.c:111): Some file locks are hold by dumping tasks! You can try --file-locks to dump them.
    (01.648597) Unlock network
    (01.648635) Running network-unlock scripts
    (01.648638) RPC
    (01.653766) Unfreezing tasks into 1
    (01.653786) Unseizing 12464 into 1
    (01.654112) Error (criu/cr-dump.c:1781): Dumping FAILED.
    That is the base image: mongo:3.6.15-xenial
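When a dump fails like this, the useful lines are usually the ones tagged `Error`. A minimal Python sketch for pulling them out of a CRIU log follows. It assumes the line layout seen in the log above (an optional timestamp in parentheses, then `Error (file:line): message`); the helper name `find_errors` and the `SAMPLE_LOG` excerpt are purely illustrative and not part of CRIU.

```python
import re

# Matches lines such as:
#   (01.648510) Error (criu/file-lock.c:111): Some file locks ...
# The timestamp prefix is optional because some CRIU messages omit it.
ERROR_RE = re.compile(
    r"^(?:\((?P<ts>\d+\.\d+)\)\s+)?"
    r"Error \((?P<file>[^:()]+):(?P<line>\d+)\): (?P<msg>.*)$"
)


def find_errors(log_text):
    """Return (timestamp, source_file, line_number, message) per Error line."""
    errors = []
    for raw in log_text.splitlines():
        m = ERROR_RE.match(raw.strip())
        if m:
            errors.append((m["ts"], m["file"], int(m["line"]), m["msg"]))
    return errors


# Excerpt copied from the log pasted in the conversation.
SAMPLE_LOG = """\
(01.648507) Dumping file-locks
(01.648510) Error (criu/file-lock.c:111): Some file locks are hold by dumping tasks! You can try --file-locks to dump them.
(01.648597) Unlock network
(01.654112) Error (criu/cr-dump.c:1781): Dumping FAILED.
"""

if __name__ == "__main__":
    for ts, src, line, msg in find_errors(SAMPLE_LOG):
        print(f"{ts} {src}:{line} {msg}")
```

The first error reported is typically the actual cause; the final "Dumping FAILED." line only confirms that the dump was aborted.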
    Adrian Reber
    @adrian:lisas.de
    [m]
    @alidhamieh: did you try using --file-locks?
    It is an option to podman container checkpoint.
    alidhamieh
    @alidhamieh
    sudo podman container checkpoint swac3-server_receiving-1 --export=/tmp/swac3-server_receiving-1.tar.gz --file-locks --tcp-established
    Error: unknown flag: --file-locks
    Adrian Reber
    @adrian:lisas.de
    [m]
    ah, then you probably need a newer version of podman or try to drop file-locks into the criu configuration file
    alidhamieh
    @alidhamieh
    How do I put file-locks in the config?
    Adrian Reber
    @adrian:lisas.de
    [m]
    are you using runc or crun?
    alidhamieh
    @alidhamieh
    runc
    Adrian Reber
    @adrian:lisas.de
    [m]
    echo "file-locks" >> /etc/criun/runc.conf
    alidhamieh
    @alidhamieh
    Sorry, how do I know if I am using runc or crun?
    Seems to be runc.
    It worked without the n in criun.
    Adrian Reber
    @adrian:lisas.de
    [m]
    yeah, that was a typo
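The `echo ... >>` workaround appends the option every time it is run. As a hedged alternative, here is a minimal Python sketch that adds an option line only if it is not already present. The path `/etc/criu/runc.conf` is the one from the conversation (without the typo); the helper name `ensure_criu_option` is hypothetical, and the demo writes to a temporary file so it can be tried without root.

```python
import os
import tempfile
from pathlib import Path


def ensure_criu_option(conf_path, option):
    """Append `option` on its own line unless the file already contains it.

    Returns True if the file was changed, False if the option was present.
    The parent directory is assumed to exist already.
    """
    path = Path(conf_path)
    existing = path.read_text() if path.exists() else ""
    if option in (line.strip() for line in existing.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with path.open("a") as f:
        f.write(f"{prefix}{option}\n")
    return True


if __name__ == "__main__":
    # Demo on a throwaway file; on a real host the target discussed above
    # would be /etc/criu/runc.conf and writing it would need root.
    demo = os.path.join(tempfile.mkdtemp(), "runc.conf")
    print(ensure_criu_option(demo, "file-locks"))  # first call changes the file
    print(ensure_criu_option(demo, "file-locks"))  # second call is a no-op
```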
    Do Hoang
    @huyhoang8398
    Can I ask where the block of code is in which CRIU uses pages.img for restoration?
    Pavel Tikhomirov
    @Snorch

    Sure you can =)

    This is how you normally find where an image is used in the CRIU code:

    1) First look for an image name "pages-<id>.img":

    [# criu]$ grep -r "pages-" criu
    criu/image-desc.c:      FD_ENTRY_F(PAGES,       "pages-%u", O_NOBUF),
    criu/image-desc.c:      FD_ENTRY_F(PAGES_OLD,   "pages-%d", O_NOBUF),
    criu/image-desc.c:      FD_ENTRY_F(SHM_PAGES_OLD, "pages-shmem-%ld", O_NOBUF),

    2) See what FD_ENTRY_F is:

    [# criu]$ git grep -A1 "#define FD_ENTRY_F"
    criu/image-desc.c:#define FD_ENTRY_F(_name, _fmt, _f)     \
    criu/image-desc.c-      [CR_FD_##_name] = {

    3) Look for CR_FD_PAGES open:

    [# criu]$ git grep open.*CR_FD_PAGES
    criu/image.c:   return open_image_at(dfd, CR_FD_PAGES, flags, *id);
    criu/mem.c:     pages = open_image(CR_FD_PAGES, opts.auto_dedup ? O_RDWR : O_RSTR, rsti(t)->pages_img_id);

    One is in prepare_vma_ios() and another is in open_pages_image_at().

    4) Vim cctree plugin says prepare_vma_ios() is on restore:

      +-< prepare_vma_ios
        +-< prepare_vmas
        | +-< restore_one_alive_task
        | | +-< restore_one_task

    5) open_pages_image_at() is both dump and restore
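The search steps above boil down to repeated recursive greps over the source tree. For anyone who would rather script them, here is a rough Python equivalent of `grep -rn` limited to C sources. The function name `grep_tree` and the suffix filter are assumptions made for this sketch; it does not replace the call-tree analysis in step 4.

```python
import os
import re
import sys


def grep_tree(root, pattern, suffixes=(".c", ".h")):
    """Yield (path, line_number, line) for lines under `root` matching `pattern`."""
    regex = re.compile(pattern)
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on odd encodings.
            with open(path, errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if regex.search(line):
                        yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    # Usage: python grep_tree.py <source-dir> [regex]
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    pattern = sys.argv[2] if len(sys.argv) > 2 else r"open.*CR_FD_PAGES"
    for path, lineno, line in grep_tree(root, pattern):
        print(f"{path}:{lineno}: {line.strip()}")
```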

    Pavel Tikhomirov
    @Snorch
    And if we add some magic from https://github.com/Snorch/call_tree_builder we get a nice picture:
    pages-image.png
    Do Hoang
    @huyhoang8398
    Holy. Thanks a lot
    Do Hoang
    @huyhoang8398
    Is it possible to make a fake pages.img from a kernel module? :| If so, can you guys recommend a way to do it?
    Radostin Stoyanov
    @rst0git
    What do you mean by "fake pages.img"?
    Do Hoang
    @huyhoang8398
    I have to tweak criu for my own purposes, so I need to dump pages, but using a kernel module.
    Radostin Stoyanov
    @rst0git
    I'm not sure I understand your question. CRIU uses the parasite code to dump memory pages: https://criu.org/Parasite_code
    Timo
    @TVH7

    Hi there,

    First of all, CRIU looks like a very cool project. Using the out-of-the-box "criu" CLI was super easy and appears to do what I want: checkpoint a TCP connection and restore it somewhere else. For now, I tried this by just restoring a docker container using the guide on criu.org. However, my goal is to move this functionality into an application running in a Kubernetes pod, which can then be used to "transfer" a TCP connection from my Kubernetes pod to another Kubernetes pod whenever a pod gets rebalanced.

    I learned that libsoccr has been built to do exactly this, so I'm trying to build a little C application that simulates a TCP client/server connection and uses libsoccr to checkpoint/restore the TCP connection of the client. The last time I used C was years ago, so I am really struggling to get the libsoccr.a library linked to my "demo-app".

    I tried the following:

    1. Built the criu repo.
    2. Moved libsoccr.a from the soccr directory to a "lib" folder inside my project.
    3. Copied the include directory from criu to my project.
    4. Ran GCC to link the lib: gcc doSocket.c -lsoccr -o doSocket.o -I include -L lib

    However, it fails when trying to find libnet_init.

    I am guessing that I am linking the project incorrectly (as I probably also have to link libnet and other dependencies), but I am afraid I have to admit that I'm not really sure how to proceed from here, as my C skills are lacking.

    Can anyone point me in the right direction? Is there maybe a demo application on GitHub somewhere that demonstrates how to use libsoccr as a standalone library?

    Help would be really appreciated, I hope that I am not annoying you with my beginner-questions.

    Do Hoang
    @huyhoang8398
    Is it possible to translate an application's virtual address and number of pages to a PFN or struct page in a kernel module?
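This is not an answer for the kernel-module case, but the same translation can be observed from user space through /proc/<pid>/pagemap, which may help when checking results. The Python sketch below assumes the documented pagemap layout (one 64-bit entry per virtual page, bits 0-54 holding the PFN, bit 63 set when the page is present in RAM). The helper names are hypothetical, and on recent kernels the PFN field reads as zero unless the caller has CAP_SYS_ADMIN.

```python
import os
import struct

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
PFN_MASK = (1 << 55) - 1  # bits 0-54: page frame number (if present)
PRESENT = 1 << 63         # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Return the PFN for a present page, or None if the page is not in RAM."""
    if not entry & PRESENT:
        return None
    return entry & PFN_MASK


def virt_to_pfn(pid, vaddr):
    """Look up the pagemap entry covering virtual address `vaddr` of `pid`."""
    offset = (vaddr // PAGE_SIZE) * 8  # one 8-byte entry per virtual page
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(offset)
        raw = f.read(8)
    if len(raw) != 8:
        return None  # address not covered by the pagemap file
    (entry,) = struct.unpack("=Q", raw)  # native byte order, 64-bit unsigned
    return decode_pagemap_entry(entry)


if __name__ == "__main__":
    # Demo on a synthetic entry so it runs without special privileges.
    print(hex(decode_pagemap_entry(PRESENT | 0x1234)))
```

Covering a range of N pages is then a loop over `vaddr + i * PAGE_SIZE` for `i` in `range(N)`.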
    Shreyas Kharbanda
    @Alphacode18

    Hi all,

    I have been trying to set up CRIU v3.13 inside a privileged docker container (a dockerized CRIU image is bundled as part of the Dockerfile on an Alpine Linux base) to checkpoint a certain process. The containers run inside a Kubernetes cluster on a fresh install of Ubuntu 20.04 LTS.

    I have done some sanity checks, namely running the criu check command from within the container. Apart from criu check, I have checked for various privilege accesses and all come back positive.

    The criu check command returns the following:

    Error (criu/util.c:610): exited, status=1
    Error (criu/util.c:610): exited, status=1
    Warn (criu/kerndat.c:839): Can't keep kdat cache on non-tempfs
    Looks good.

    I am currently trying to get the simple loop example (https://criu.org/Simple_loop) working. After following the instructions, I see that dumping the process fails with error code -1.

    pie: 52: Warn (criu/pie/parasite.c:648): /proc/self/cgroup was bigger than the page size
    pie: 52: __sent ack msg: 76 76 -1
    pie: 52: Close the control socket for writing
    (00.037284) Fetched ack: 76 76 -1
    pie: 52: Daemon waits for command
    (00.037294) Error (compel/src/lib/infect-rpc.c:72): Command 76 for daemon failed with -1
    (00.037305) Error (criu/parasite-syscall.c:447): Parasite failed to dump /proc/self/cgroup

    I haven't had much success in finding a solution to the problem in the GitHub issues, and I'd really appreciate it if you could point me in the right direction.

    Pavel Tikhomirov
    @Snorch
    That likely happens because you have too many nested cgroup directories for the dumpee process. If you show /proc/<dumpee_pid>/cgroup, that can be confirmed.
    One way of fixing/working around it is to use criu on the host =)
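To check this before attempting a dump, the size of the process's cgroup file can be compared with the page size. The Python sketch below is only a rough diagnostic: the function name `summarize_cgroup` is hypothetical, and the exact comparison CRIU's parasite code performs is not reproduced here, so a size at or above the page size is flagged as a conservative assumption. The line format `hierarchy-ID:controller-list:cgroup-path` is the one documented in cgroups(7).

```python
import os


def summarize_cgroup(text, page_size):
    """Report size and nesting depth for the contents of /proc/<pid>/cgroup."""
    size = len(text.encode())
    depths = []
    for line in text.splitlines():
        # Each line is "hierarchy-ID:controller-list:cgroup-path".
        parts = line.split(":", 2)
        if len(parts) == 3:
            depths.append(len([p for p in parts[2].split("/") if p]))
    return {
        "bytes": size,
        # Assumption: treat "at or above one page" as too big.
        "too_big": size >= page_size,
        "max_depth": max(depths, default=0),
    }


if __name__ == "__main__":
    pid = "self"  # replace with the dumpee's PID
    path = f"/proc/{pid}/cgroup"
    if os.path.exists(path):
        with open(path) as f:
            print(summarize_cgroup(f.read(), os.sysconf("SC_PAGE_SIZE")))
```

A large `max_depth` together with `too_big` being true would be consistent with the explanation above.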
    Adrian Reber
    @adrian:lisas.de
    [m]
    After almost two years my Kubernetes checkpoint support PR was finally merged today: kubernetes/kubernetes#104907 That took a lot longer than expected, but now it should be possible to checkpoint containers in Kubernetes with the help of CRIU 🎆
    Prajwal S N
    @snprajwal
    That's awesome!! Congratulations 🎊