
bypass4netns: Accelerating TCP/IP Communications in Rootless Containers (2402.00365v1)

Published 1 Feb 2024 in cs.NI and cs.OS

Abstract: "Rootless containers" is a concept to run the entire container runtimes and containers without the root privileges. It protects the host environment from attackers exploiting container runtime vulnerabilities. However, when rootless containers communicate with external endpoints, the network performance is low compared to rootful containers because of the overhead of rootless networking components. In this paper, we propose bypass4netns that accelerates TCP/IP communications in rootless containers by bypassing slow networking components. bypass4netns uses sockets allocated on the host. It switches sockets in containers to the host's sockets by intercepting syscalls and injecting the file descriptors using Seccomp. Our method with Seccomp can handle statically linked applications that previous works could not handle. Also, we propose high-performance rootless multi-node communication. We confirmed that rootless containers with bypass4netns achieve more than 30x faster throughput than rootless containers without it. In addition, we evaluated performance with applications and it showed large improvements on some applications.


Summary

  • The paper demonstrates that bypass4netns enhances TCP/IP throughput by over 30 times in rootless container environments.
  • It employs socket switching with host-allocated sockets and Seccomp notifications to bypass conventional network namespace inefficiencies.
  • The method supports multi-node communications without VXLAN, enabling secure, efficient distributed networking in containerized systems.

An Academic Overview of "bypass4netns: Accelerating TCP/IP Communications in Rootless Containers"

The paper "bypass4netns: Accelerating TCP/IP Communications in Rootless Containers" addresses the network performance bottleneck of rootless containers. The proposed method, bypass4netns, speeds up TCP/IP communication in rootless container environments by bypassing the slow networking components that those environments depend on.

The researchers examine the limitations of rootless containers, where the absence of root privileges forces the use of alternative networking components such as RootlessKit and slirp4netns. These components make rootless networking possible, but they add substantial overhead and leave throughput well below that of traditional rootful containers. To address this, bypass4netns switches sockets: it allocates sockets on the host, intercepts the container's socket system calls with Seccomp (secure computing mode), and injects the host sockets' file descriptors into the container process. Traffic then bypasses the intermediate network namespace and the additional layers that make conventional rootless networking slow.
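The core idea of handing a socket created by one party to another through its file descriptor can be illustrated with a minimal sketch. This is not the paper's mechanism: bypass4netns injects descriptors through Seccomp, whereas the sketch below passes a descriptor over a Unix domain socket with `socket.send_fds` and `socket.recv_fds` (Python 3.9+, Unix only). The function names `host_side_open`, `container_side_use`, and `demo` are hypothetical, and both "sides" run in one process for brevity.

```python
import socket
import threading


def echo_server(listener):
    """Stand-in for an external endpoint: reply with the upper-cased request."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())


def host_side_open(channel, address):
    """Hypothetical host role: create a connected socket and pass its descriptor."""
    host_sock = socket.create_connection(address)
    # The descriptor travels as ancillary data (SCM_RIGHTS) on the Unix socket.
    socket.send_fds(channel, [b"fd"], [host_sock.fileno()])
    host_sock.close()  # the queued message keeps its own reference to the socket


def container_side_use(channel, payload):
    """Hypothetical container role: adopt the received descriptor and use it."""
    _, fds, _, _ = socket.recv_fds(channel, 16, 1)
    with socket.socket(fileno=fds[0]) as sock:
        sock.sendall(payload)
        return sock.recv(1024)


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()

    host_end, container_end = socket.socketpair()  # AF_UNIX pair on Unix
    try:
        host_side_open(host_end, listener.getsockname())
        return container_side_use(container_end, b"ping")
    finally:
        server.join(timeout=5)
        host_end.close()
        container_end.close()
        listener.close()


if __name__ == "__main__":
    print(demo())
```

The receiving side never calls `connect()` itself. It only uses a socket that was opened elsewhere, which is the property that lets host-allocated sockets stand in for sockets created inside the container.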

The reported results are substantial. The authors measure throughput more than 30 times higher with bypass4netns than in rootless containers without it. They also evaluate real applications and report large improvements for some of them. The method replaces container-level sockets transparently, so applications keep using their sockets as before. In particular, it uses Seccomp user-space notifications to intercept and redirect network system calls without kernel modifications, a key advantage over other solutions.

Furthermore, bypass4netns is extended to support multi-node communication without traditional tools such as VXLAN, which brings high-performance networking to distributed deployments as well. The approach does not require root privileges, so it preserves the security benefits of rootless containers while reducing network overhead.

The authors describe user namespaces in detail and explain the role they play in rootless containerization. They discuss related work, including the limitations of existing solutions such as AF_GRAFT and Slim, and emphasize that bypass4netns overcomes these by operating entirely in user space and by remaining compatible with statically linked applications, which other methods cannot handle.

On the theoretical side, the paper suggests that analyzing socket behavior with tools such as eBPF (Extended Berkeley Packet Filter) could further optimize container network layers, allowing finer-grained control and possibly wider use outside containers. On the practical side, it supports more secure cloud environments that rely less on privileged operations, which benefits container orchestrators such as Kubernetes.

Future research may explore integration with container orchestration systems and develop load balancing strategies for them. Addressing the limitations noted by the authors, such as Seccomp's syscall drops, could also improve the reliability of bypass4netns and widen its deployment scope.

In conclusion, the paper advances both the understanding and the technology of container networking by offering a high-performance, compatible method for accelerating communications in rootless containers while preserving their security benefits.