bypass4netns: Accelerating TCP/IP Communications in Rootless Containers (2402.00365v1)
Abstract: "Rootless containers" is an approach in which the entire container runtime and its containers run without root privileges. This protects the host environment from attackers who exploit container runtime vulnerabilities. However, when rootless containers communicate with external endpoints, network performance is lower than with rootful containers because of the overhead of rootless networking components. In this paper, we propose bypass4netns, which accelerates TCP/IP communications in rootless containers by bypassing the slow networking components. bypass4netns uses sockets allocated on the host. It switches sockets in containers to the host's sockets by intercepting syscalls and injecting file descriptors using Seccomp. Because it relies on Seccomp, our method can handle statically linked applications, which previous works could not. We also propose high-performance rootless multi-node communication. We confirmed that rootless containers with bypass4netns achieve more than 30x higher throughput than rootless containers without it. In addition, we evaluated performance with applications and observed large improvements for some of them.
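The socket-switching idea in the abstract can be illustrated with a small, single-process analogy. This is only a sketch under stated assumptions, not the bypass4netns implementation: it uses `os.dup2` inside one Python process to stand in for the cross-process file descriptor injection that the paper performs with Seccomp, and the helper names (`start_echo_server`, `switch_socket`, `demo`) are hypothetical. What it shows is the key property: the application keeps using the same file descriptor number, while the socket behind it is replaced by one created elsewhere.

```python
import os
import socket
import threading


def start_echo_server():
    """Start a one-shot TCP echo server on localhost; return its address and thread."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    addr = srv.getsockname()

    def serve():
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024))
        srv.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return addr, thread


def switch_socket(app_fd, replacement):
    """Make app_fd refer to the replacement socket (hypothetical helper).

    Assumption: os.dup2 in a single process is used here as an analogy for
    injecting a host-side file descriptor into the container process.
    """
    os.dup2(replacement.fileno(), app_fd)
    replacement.close()  # app_fd now holds its own reference to the socket


def demo():
    addr, thread = start_echo_server()

    # The socket the "containerized application" created; it is never connected.
    app_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    app_fd = app_sock.fileno()

    # A socket created and connected on the "host" side on the app's behalf.
    host_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    host_sock.connect(addr)

    switch_socket(app_fd, host_sock)

    # The application continues to use its original descriptor unchanged.
    app_sock.sendall(b"ping")
    reply = app_sock.recv(1024)
    same_fd = app_sock.fileno() == app_fd
    app_sock.close()
    thread.join(timeout=2)
    return reply, same_fd


if __name__ == "__main__":
    print(demo())
```

In the real system the interception happens at the syscall level, so the application needs no modification and no library preloading, which is consistent with the abstract's claim that statically linked applications are supported.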