lxc-attach command notes
Since lxc 0.9.0, the lxc-attach command has been substantially rewritten, partly to support user namespaces (userns). It appears to work like this:
/* For the cgroup attaching logic to work in conjunction with pid and user namespaces,
 * we need to have the following hierarchy:
 *
 * lxc-attach [process executed externally]
 *     | socketpair(cgroup_ipc_sockets)
 *     | fork() -> child
 *     |                       | setns()
 *     |                       | fork() -> grandchild
 *     |                       |                   | initialize
 *     |                       |                   | signal parent
 *     |                       |<------------------|----+
 *     |                       | signal parent     |
 *     |<----------------------|-----+             |
 *     | add to cgroups        |                   |
 *     | signal child -------->|                   |
 *     |                       | signal child ---->|
 *     | waitpid()             | waitpid()         | exec()
 *     |                       |<------------------| exit()
 *     |<----------------------| exit()
 *     | exit()
 *
 * The rationale is the following: The first parent is needed because after
 * setns() (mount + user namespace) we can't access the cgroup filesystem
 * to add the pid to the corresponding cgroup. Therefore, we need to do that
 * in a process executed on the host, so that's why we need to fork and wait
 * for it to have done some initialization (cgroups may restrict certain
 * operations so we have to do that in the end) and use IPC for signaling.
 *
 * Then in the child process we do the setns(). However, a process is never
 * really attached to a pid namespace (never changes its pid, doesn't appear
 * in the pid namespace /proc), only child processes of that process are
 * truely inside the new pid namespace. That's why we need to fork() again
 * after setns() before performing final initializations, then signal our
 * parent, which signals the primary process, which does cgroup adding,
 * which then signals to the grandchild that it can exec().
 */
(from the comment inside the main function of src/lxc/lxc_attach.c)
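- The parent process is needed because the process that actually gets attached has to be added to the target cgroup, and the child cannot do this: after setns() into the container's mount and user namespaces, it can no longer see the host's cgroup filesystem.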
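- The child process calls setns(), but that never moves the calling process itself into the target pid namespace (its pid does not change); only children it creates afterwards end up there. So it has to fork() once more, and that grandchild is what finally exec()s the command.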