Reproduce:
[snorch@turmoil test]$ cat include/linux/mount.h
enum fsconfig_command {
	FSCONFIG_SET_FLAG = 10,
};
[snorch@turmoil test]$ cat test.c
#include <stdio.h>
/* Include new glibc sys/mount.h header */
#include "/home/snorch/devel/general/glibc/sysdeps/unix/sysv/linux/sys/mount.h"
int main () {
	printf("%d\n", FSCONFIG_SET_FLAG);
	return 0;
}
[snorch@turmoil test]$ gcc -o test -I include test.c
In file included from test.c:2:
/home/snorch/devel/general/glibc/sysdeps/unix/sysv/linux/sys/mount.h:240:6: error: redeclaration of ‘enum fsconfig_command’
240 | enum fsconfig_command
| ^~~~~~~~~~~~~~~~
In file included from /home/snorch/devel/general/glibc/sysdeps/unix/sysv/linux/sys/mount.h:32:
include/linux/mount.h:1:6: note: originally defined here
1 | enum fsconfig_command {
| ^~~~~~~~~~~~~~~~
/home/snorch/devel/general/glibc/sysdeps/unix/sysv/linux/sys/mount.h:242:3: error: redeclaration of enumerator ‘FSCONFIG_SET_FLAG’
242 | FSCONFIG_SET_FLAG = 0, /* Set parameter, supplying no value */
| ^~~~~~~~~~~~~~~~~
include/linux/mount.h:2:9: note: previous definition of ‘FSCONFIG_SET_FLAG’ with type ‘enum fsconfig_command’
2 | FSCONFIG_SET_FLAG = 10,
| ^~~~~~~~~~~~~~~~~
Let's wait for a second opinion on it.
criu/include/linux/mount.h was introduced in commit: checkpoint-restore/criu@b5b1c4e
linux/mount.h, but the subfolder name (sys or linux) in the criu source tree does not make any difference.
enum fsconfig_command was recently added to sys/mount.h:
Hi everyone, I am trying to checkpoint a process, but I got this error message:
[ff.checkpoint] (0.085s) criu> (00.035845) ----------------------------------------
[ff.checkpoint] (0.085s) criu> (00.036034)
[ff.checkpoint] (0.085s) criu> (00.036039) Dumping pages (type: 58 pid: 1000)
[ff.checkpoint] (0.085s) criu> (00.036041) ----------------------------------------
[ff.checkpoint] (0.085s) criu> (00.036772) Pagemap generated: 1792 pages (0 lazy) 0 holes
[ff.checkpoint] (0.085s) criu> (00.039268) Error (criu/page-xfer.c:254): page-xfer: Unable to spice data: Broken pipe
[ff.checkpoint] (0.085s) criu> (00.039285) Error (criu/bfd.c:132): bfd: Error flushing image: Broken pipe
[ff.checkpoint] (0.085s) criu> (00.039361) ----------------------------------------
[ff.checkpoint] (0.085s) criu> (00.039364) Error (criu/mem.c:644): Can't dump page with parasite
[ff.checkpoint] (0.085s) criu> (00.039377) Error (criu/bfd.c:132): bfd: Error flushing image: Broken pipe
[ff.checkpoint] (0.085s) criu> (00.041419) Error (criu/bfd.c:132): bfd: Error flushing image: Broken pipe
[ff.checkpoint] (0.085s) criu> (00.041458) Unlock network
[ff.checkpoint] (0.085s) criu> (00.041462) Unfreezing tasks into 1
[ff.checkpoint] (0.085s) criu> (00.041547) Dismissing the image streamer
[ff.checkpoint] (0.085s) criu> (00.041558) Error (criu/cr-dump.c:1792): Dumping FAILED.
Also, I read the page explaining how CRIU checkpoints TCP connections (https://criu.org/TCP_connection), but it is mostly about how sockets are handled during the restore process.
Can someone help me figure out why this error happens during the checkpoint process? Thanks in advance!
/* local xfer */
static int write_pages_loc(struct page_xfer *xfer, int p, unsigned long len)
{
	ssize_t ret;
	ssize_t curr = 0;

	while (1) {
		ret = splice(p, NULL, img_raw_fd(xfer->pi), NULL, len - curr, SPLICE_F_MOVE);
		if (ret == -1) {
			pr_perror("Unable to spice data");
			return -1;
		}
		if (ret == 0) {
			pr_err("A pipe was closed unexpectedly\n");
			return -1;
		}
		curr += ret;
		if (curr == len)
			break;
	}

	return 0;
}
@SallyKAN: maybe try it first without fast freeze. @nviennot is the author of fast freeze and is also reachable here (sometimes)
Thanks for your reply!
I don't quite understand this part of the CRIU source code. It seems like the splice method needs to write to the sockets of the process I am going to checkpoint.
In the above code, splice writes the memory of your dumped processes to image files.
So does this mean that splice will read all the fds of my process first? I am trying to figure out in what circumstances it will throw the page-xfer: Unable to spice data: Broken pipe error.
You can use https://github.com/Snorch/linux-helpers/blob/master/gftrace.sh like this
perf probe -f '__x64_sys_splice%return $retval'
bash ./gftrace.sh __x64_sys_splice
Then reproduce the problem while the script is running.
Depending on your kernel, the exact traceable name of sys_splice may be different. Search for it in /sys/kernel/debug/tracing/available_filter_functions if needed.
Then provide the output file (./trace); it would probably shed more light on what happens in your case.
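On the question of when splice can report a broken pipe: one plausible case (an assumption, not something confirmed in this thread) is that the image descriptor is a pipe or socket, for example one handed to an image streamer, and its reader has gone away. The sketch below is not CRIU code; `splice_into_closed_pipe` is a hypothetical helper that reproduces only that one condition with two plain pipes.

```c
#define _GNU_SOURCE /* needed for splice() and SPLICE_F_MOVE */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper, not CRIU code: splice a few bytes from one pipe
 * into a second pipe whose read end has already been closed.
 * Returns the errno reported by splice(), 0 if splice() unexpectedly
 * succeeds, or -1 if the setup itself fails.
 */
int splice_into_closed_pipe(void)
{
	int src[2], dst[2];
	int err = -1;
	ssize_t ret;

	/* Ignore SIGPIPE so the failure shows up as an errno value
	 * instead of terminating the process. */
	signal(SIGPIPE, SIG_IGN);

	if (pipe(src) < 0)
		return -1;
	if (pipe(dst) < 0) {
		close(src[0]);
		close(src[1]);
		return -1;
	}

	/* Data waiting in the source pipe (stands in for the page pipe). */
	if (write(src[1], "page", 4) != 4)
		goto out;

	/* The consumer disappears: nobody can read from dst anymore. */
	close(dst[0]);
	dst[0] = -1;

	ret = splice(src[0], NULL, dst[1], NULL, 4, SPLICE_F_MOVE);
	err = (ret < 0) ? errno : 0; /* save errno before any close() */
out:
	close(src[0]);
	close(src[1]);
	if (dst[0] >= 0)
		close(dst[0]);
	close(dst[1]);
	return err;
}
```

Calling the helper from a small `main` and passing its return value to `strerror` should report the same "Broken pipe" text as the log above. If that is what happens in the real dump, the thing to look for is why the receiving side of the image descriptor closed early, rather than anything about the fds of the dumped process.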
Hello Everyone,
How do I make CRIU dump opened files that are on tmpfs or dev mounts (e.g. /tmp, /dev/shm)? I know CRIU supports this, but I can't get it to work.
16: Error (criu/files-reg.c:2259): Can't open file dev/shm/mono.16 on restore: No such file or directory
16: Error (criu/files-reg.c:2185): Can't open file dev/shm/mono.16: No such file or directory
16: Error (criu/mem.c:1359): `- Can't open vma
15: Error (criu/cr-restore.c:1494): 16 exited, status=1
I tried specifying --external for those mountpoints, but they are still not being dumped.
Thanks!
How do I make CRIU dump opened files that are on tmpfs or dev mounts (e.g. /tmp, /dev/shm)
There are a few ways of doing this, depending on your use case. For instance, you can use an action script, as shown in the following example.
https://github.com/checkpoint-restore/criu/blob/criu-dev/scripts/tmp-files.sh
Can't open file dev/shm/mono.16
btw, we recently added support in Podman to checkpoint/restore the content of dev/shm: containers/podman#12665
Hi everyone, I am debugging a failed test in my cgroup-v2 PR. For global properties, we call access(path, F_OK); if errno == ENOENT, we simply skip that global property. However, in the failed test I observe that access(path, F_OK) fails with EACCES, but opening the file then fails with ENOENT, which is weird to me. I also see cases where access(path, F_OK) fails with EACCES but the file can still be opened and read.
My patch for getting some logs
diff --git a/criu/cgroup.c b/criu/cgroup.c
index 2cdb63609..f4f50fd38 100644
--- a/criu/cgroup.c
+++ b/criu/cgroup.c
@@ -382,14 +382,19 @@ static int dump_cg_props_array(const char *fpath, struct cgroup_dir *ncd, const
 	struct cgroup_prop *prop;
 	for (j = 0; cgp && j < cgp->nr_props; j++) {
+		int ret;
+
 		if (snprintf(buf, PATH_MAX, "%s/%s", fpath, cgp->props[j]) >= PATH_MAX) {
 			pr_err("snprintf output was truncated\n");
 			return -1;
 		}
-		if (access(buf, F_OK) < 0 && errno == ENOENT) {
+		ret = access(buf, F_OK);
+		if (ret < 0 && errno == ENOENT) {
 			pr_info("Couldn't open %s. This cgroup property may not exist on this kernel\n", buf);
 			continue;
+		} else if (ret < 0) {
+			pr_perror("cgroup: Path: %s", buf);
 		}
 		prop = create_cgroup_prop(cgp->props[j]);
Error log
(00.071032) Error (criu/cgroup.c:397): cg: cgroup: Path: /proc/self/fd/16/bar/cgroup.clone_children: Permission denied
(00.071045) cg: Dumping value 0 from /proc/self/fd/16/bar/cgroup.clone_children
(00.071049) Error (criu/cgroup.c:397): cg: cgroup: Path: /proc/self/fd/16/bar/notify_on_release: Permission denied
(00.071056) cg: Dumping value 0 from /proc/self/fd/16/bar/notify_on_release
(00.071060) Error (criu/cgroup.c:397): cg: cgroup: Path: /proc/self/fd/16/bar/cgroup.procs: Permission denied
(00.071066) cg: Dumping value from /proc/self/fd/16/bar/cgroup.procs
(00.071069) Error (criu/cgroup.c:397): cg: cgroup: Path: /proc/self/fd/16/bar/tasks: Permission denied
(00.071075) cg: Dumping value from /proc/self/fd/16/bar/tasks
(00.071079) Error (criu/cgroup.c:397): cg: cgroup: Path: /proc/self/fd/16/bar/cgroup.subtree_control: Permission denied
(00.071084) Error (criu/cgroup.c:292): cg: Failed opening /proc/self/fd/16/bar/cgroup.subtree_control: No such file or directory
(00.071085) Error (criu/cgroup.c:462): cg: dumping global properties failed
(00.071088) Error (criu/cgroup.c:732): cg: failed walking /proc/self/fd/16/ for empty cgroups: No such file or directory
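One possible explanation for access() and open() disagreeing, not confirmed in this thread, is that access() performs its checks with the real UID/GID while open() uses the effective ones, so the two can differ when those IDs differ. A minimal sketch of an existence probe that uses the effective IDs instead is shown below; `path_exists_effective` is a hypothetical name, not a CRIU function.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of CRIU: report whether a path exists,
 * checking with the effective IDs (AT_EACCESS) instead of the real IDs
 * that plain access() uses.
 * Returns 1 if the path exists, 0 if it does not (ENOENT),
 * and -1 with errno set for any other failure.
 */
int path_exists_effective(const char *path)
{
	if (faccessat(AT_FDCWD, path, F_OK, AT_EACCESS) == 0)
		return 1;
	if (errno == ENOENT)
		return 0;
	return -1;
}
```

If the real and effective IDs turn out to be the same in the failing test, this sketch would not change the outcome and the cause is elsewhere.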
(00.029005) 1: Error (criu/mount-v2.c:891): mnt-v2: Failed to copy sharing from -1:/var/lib/containers/storage/overlay-containers/e80ac5757f21caec6cb74bf628a39aa47fa86cf7c9c49a361711ed0abf711c7b/userdata/.containerenv to 11: Invalid argument
(00.029028) 1: Error (criu/mount-v2.c:958): mnt-v2: Failed to copy sharing from source /var/lib/containers/storage/overlay-containers/e80ac5757f21caec6cb74bf628a39aa47fa86cf7c9c49a361711ed0abf711c7b/userdata/.containerenv to 656
(00.029821) Error (criu/mount.c:3674): mnt: Can't remove the directory /tmp/.criu.mntns.kzWYEa: Device or resource busy
(00.029843) Error (criu/cr-restore.c:2536): Restoring FAILED.
@adrian:lisas.de collecting a gftrace (https://github.com/Snorch/linux-helpers/blob/master/gftrace.sh) at the time of the error would be helpful:
perf probe -f 'do_set_group%return $retval'
./gftrace.sh do_set_group