Just a small addition: the hugetlbfs filesystem has no FS_USERNS_MOUNT fs_flag (meaning it cannot be mounted from inside a user namespace), but shmemfs does have this flag.
Interesting information. I get an error with user namespaces but have no idea why, so I've already disabled the userns test.
I think I understand the problem. hugetlbfs is mounted by default, as far as I can see, but I need to set the number of huge pages in sysfs before I can allocate them. Currently, I add some code to zdtm.py to reserve some huge pages before running the zdtm tests. The kerndat function may get called before huge pages are available, so it cannot allocate one and get the device number. The reason for the inconsistent zdtm results is that kerndat gets cached, and my cached version on the host has the correct device number, so the zdtm test always succeeds there.
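For reference, a minimal sketch of reserving huge pages through sysfs (my own illustration, not the actual zdtm.py change; it assumes the default 2MB huge page size, hence the hugepages-2048kB directory, and needs root):

/* Reserve a few 2MB huge pages via sysfs so hugetlb allocations (and the
 * kerndat probe) can succeed. */
#include <stdio.h>

int main(void)
{
	const char *path =
		"/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages";
	FILE *f = fopen(path, "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* Ask for 64 huge pages; read the file back to see how many were
	 * actually reserved. */
	if (fprintf(f, "64\n") < 0) {
		perror("write");
		fclose(f);
		return 1;
	}
	fclose(f);
	return 0;
}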
Thank you @adrian:lisas.de @mihalicyn :)
Hi @minhbq-99, AFAICS, if we shmat a SysV IPC shm segment with hugetlb backing, it looks the same as a hugetlb mapping created by mmap, meaning that it also has a /proc/pid/map_files/ entry. But looking at your PR https://github.com/checkpoint-restore/criu/pull/1622/commits/3ec6dbfe29558c5067fdb7c04313f01743e694c7#diff-6f08d59ddde08ca75f7ccb0aac7f5ca6e011bd968b57d3de0ee7a1786f582763R238 I'm not sure that your dev comparison works even for mmaps. If I run a simple test, https://gist.github.com/Snorch/ab5f86e5e8f3d7f9fecfd7eabdcadd7a:
[root@fedora helpers]# ./shm-huge
shm_ptr = 0x7f8868400000
map = 0x7f88689af000
map2m = 0x7f8868200000
All three different mappings have the same device:
[root@fedora snorch]# stat /proc/136984/map_files/{7f8868400000,7f88689af000,7f8868200000}* | grep Dev
Device: 16h/22d Inode: 1055674 Links: 1
Device: 16h/22d Inode: 1055681 Links: 1
Device: 16h/22d Inode: 1055673 Links: 1
This is on a pretty new 5.13.12-200.fc34.x86_64 kernel.
[root@fedora helpers]# ./shm-huge
shm_ptr = 0x7f555e800000
map = 0x7f555ed76000
map2m = 0x7f555e600000
[root@fedora helpers]# grep "7f555e800000\|7f555ed76000\|7f555e600000" /proc/158858/maps
7f555e600000-7f555e800000 rw-s 00000000 00:0f 1088051 /anon_hugepage (deleted)
7f555e800000-7f555ea00000 rw-s 00000000 00:0f 65567 /SYSV6129e7d0 (deleted)
7f555ed76000-7f555ed77000 rw-s 00000000 00:01 73556 /dev/zero (deleted)
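A minimal sketch of that kind of test (my own reconstruction, not the exact shm-huge source from the gist): it creates a SysV shm segment with SHM_HUGETLB, an anonymous shared hugetlb mmap, and a shared /dev/zero mmap, then pauses so the /proc/<pid>/map_files and /proc/<pid>/maps entries can be inspected as above. It assumes 2MB huge pages are reserved.

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/mman.h>

#define HUGE_SZ (2UL << 20)

int main(void)
{
	int shmid, zfd;
	void *shm_ptr, *map, *map2m;

	/* SysV shm segment backed by huge pages */
	shmid = shmget(IPC_PRIVATE, HUGE_SZ, IPC_CREAT | SHM_HUGETLB | 0600);
	if (shmid < 0) {
		perror("shmget");
		return 1;
	}
	shm_ptr = shmat(shmid, NULL, 0);
	if (shm_ptr == (void *)-1) {
		perror("shmat");
		return 1;
	}

	/* Shared /dev/zero mapping */
	zfd = open("/dev/zero", O_RDWR);
	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, zfd, 0);

	/* Anonymous shared hugetlb mapping */
	map2m = mmap(NULL, HUGE_SZ, PROT_READ | PROT_WRITE,
		     MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (map == MAP_FAILED || map2m == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	printf("shm_ptr = %p\nmap = %p\nmap2m = %p\npid = %d\n",
	       shm_ptr, map, map2m, getpid());
	pause(); /* keep the mappings alive for inspection */
	return 0;
}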
Hi, my pull request fails on the CentOS 7 user_namespace test case. The problem is in restoring hugetlb shmem mappings: when restoring shmem mappings, we try to use memfd, and if we cannot, we open the map_files link of that mapping. In the case of CentOS 7, we fall into the map_files path and we don't have the CAP_SYS_ADMIN capability:
https://elixir.bootlin.com/linux/v3.10/source/fs/proc/base.c#L1889
With some debugging, I found that the restored process has CAP_SYS_ADMIN, but its cred->user_ns is at a lower level than init_user_ns. But why can the checkpoint process open the map_files link? I see that the checkpoint process's cred->user_ns is the same as init_user_ns. So why is there a difference in cred->user_ns between the checkpoint and restore processes?
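To illustrate the failing path, a small probe (my own illustration, not CRIU code): it creates a file-backed mapping and tries to open its /proc/self/map_files/ link. On a 3.10 kernel like the one linked above, that open is gated by CAP_SYS_ADMIN checked against the initial user namespace, so running this inside a nested user namespace should fail with EPERM.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 4096;
	/* A shared /dev/zero mapping, so the VMA has a file behind it and
	 * therefore a map_files entry. */
	int zfd = open("/dev/zero", O_RDWR);
	void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, zfd, 0);
	char path[64];
	int fd;

	if (addr == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* map_files links are named <start>-<end> in hex */
	snprintf(path, sizeof(path), "/proc/self/map_files/%lx-%lx",
		 (unsigned long)addr, (unsigned long)addr + len);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		printf("open(%s): %s\n", path, strerror(errno));
	else
		printf("open(%s): ok\n", path);
	return 0;
}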
Traceback (most recent call last):
  File "./criu-ns", line 231, in <module>
    res = wrap_dump()
  File "./criu-ns", line 200, in wrap_dump
    set_pidns(pid, pid_idx)
  File "./criu-ns", line 161, in set_pidns
    raise OSError(errno.ENOENT, 'Cannot find NSpid field in proc')
FileNotFoundError: [Errno 2] Cannot find NSpid field in proc
cat /proc/self/status | grep NSpid
That command should work.
Which version of Ubuntu?
Restoring with criu directly causes PID conflicts, so I restore through criu-ns in a new PID namespace. The process is restored successfully, but freezing (dumping) it again causes this problem.
uname -a ?
Linux 8194e282c3c5 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
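For what it's worth, criu-ns relies on the NSpid field in /proc/<pid>/status, and as far as I know that field only appeared around Linux 4.1, so a 3.10 kernel like this one doesn't have it. A tiny check (my own sketch, not code from criu-ns):

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/self/status", "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* Look for the "NSpid:" line that criu-ns parses */
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, "NSpid:", 6) == 0) {
			printf("found: %s", line);
			fclose(f);
			return 0;
		}
	}
	fclose(f);
	printf("NSpid field not present on this kernel\n");
	return 1;
}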
While dumping a container (using the runc diskless migration method), I get a stats-dump file on the source node. But on the destination side, after restoring that container, I am not getting the stats-restore file. Isn't it possible to get the stats-restore file?
(When I used podman, I got both stats-dump and stats-restore on the source and destination.)
I had asked this question before and thought I could solve the issue, but I failed to do so.
The stats-restore file is written in https://github.com/checkpoint-restore/criu/blob/014e4f3002a5b5f01f619252cd0b1b1f4632aa9b/criu/cr-restore.c#L2427, so c1/stats-restore should be created on restore. (--pre-dump doesn't create a complete checkpoint.)
@rst0git
sudo runc checkpoint --pre-dump --image-path <dir> --work-path <dir> looper --page-server <dest_ip>:port --tcp-established
sudo runc checkpoint --image-path <dir> --work-path <dir> looper --page-server <dest_ip>:port --tcp-established
I am using these two commands to pre-dump and dump the container.
In the case of https://github.com/checkpoint-restore/criu/issues/1652#issuecomment-968341985, both stats-dump and stats-restore were created.
Hi @Avijit009,
Assuming that you are migrating a runc container between two VMs, you can try the following steps:
1 [Both VMs] Make sure that you have the same rootfs and config.json:
mkdir -p tmp/rootfs && cd tmp
sudo docker export $(sudo docker create alpine:latest) --output="alpine.tar"
sudo tar xf alpine.tar -C rootfs
runc spec
sed -i '/terminal/c\ \"terminal": false,' config.json
sed -i '/"sh"/c\ \"sh", "-c", "i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done"' config.json
2 [Src VM] Run container:
sudo runc run -d looper &> /dev/null < /dev/null
sudo runc ps looper
3 [Dst VM] Start page server:
mkdir c1
sudo criu page-server --images-dir c1 --port 5000
4 [Src VM] Run first pre-dump checkpoint:
sudo runc checkpoint --pre-dump --image-path c1 --work-path c1 --page-server <dst IP address>:5000 looper
# sudo crit show c1/stats-dump
5 [Dst VM] Start page server (again):
sudo criu page-server --images-dir c1 --port 5000 --auto-dedup
6 [Src VM] Run second pre-dump checkpoint:
sudo runc checkpoint --page-server <dst IP address>:5000 --pre-dump --image-path c2 --work-path c2 looper
# sudo crit show c2/stats-dump
7 [Dst VM] Start page server (again):
sudo criu page-server --images-dir c1 --port 5000 --auto-dedup
8 [Src VM] Run final checkpoint:
sudo runc checkpoint --page-server <dst IP address>:5000 --image-path c3 --work-path c3 looper
# sudo crit show c3/stats-dump
# Send checkpoint files to destination VM:
scp -r ./c3 <dst IP address>:
9 [Dst VM] Restore container:
# Combine all checkpoint files
mv ~/c3/* c1
# Restore container
sudo runc restore -d --image-path c1 --work-path c1 looper
# sudo crit show c1/stats-restore
The following commands are useful for clean-up:
# Stop container
sudo runc kill looper KILL
# Remove stopped container
sudo runc delete looper