ReadWrite

Deep argument inspection in seccomp filters

Recently I’ve been using seccomp filters to harden a Linux binary. In my use case, the goal was to reduce the kernel attack surface by limiting the system calls available to the running process.

Principle of Least Privilege

You see, even if you apply most of the Linux sandboxing and containerization features at hand, a security bug in the kernel can render many of them ineffective, depending on the kind of bug. This is where seccomp(2) shines: it gives the process a way to drop unnecessary system calls before it runs untrusted code or handles user input.

This greatly reduces the chance of a successful exploit, since every system call the exploit needs (directly or indirectly) would have to be allowed by the seccomp filter for a privilege escalation to succeed.

The Sweet Spot for Seccomping

As with most instances of the principle of least privilege, you want to drop privileges at the sweet spot where most of your privileged actions are already carried out. In the case of seccomp, this is ideally the spot where all your privileged system calls have already been executed.

Want to make a network connection? Sure, make the connect(2) call up front and hold on to those precious file descriptors, because once the filter is installed neither you nor any untrusted code will be able to call connect(2) again. There is a trade-off, though: it is not easy to refactor code for seccomp if the connection calls are spread across the lifetime of the process.
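As a rough illustration of the connect-first, filter-later ordering, here is a minimal sketch in C using the raw seccomp-bpf interface. The names deny_connect_filter and install_filter are mine, not part of any API. For brevity the filter is a deny-list that blocks only connect(2); a filter meant for real hardening would more likely be an allow-list. The sketch assumes x86-64 or AArch64 and reasonably recent kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* Assumption: only these two architectures are handled in this sketch. */
#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* constant for this architecture"
#endif

/* Deny-list sketch: connect(2) fails with EPERM, everything else is allowed.
 * Caveat: on x86-64 a deny-list keyed on the syscall number does not cover
 * the x32 variants of the call; an allow-list avoids that problem. */
struct sock_filter deny_connect_filter[] = {
    /* [0] Load the architecture field. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* [1] Expected architecture: skip the kill instruction. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
    /* [2] Unexpected architecture: kill the process. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
    /* [3] Load the syscall number. */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [4] connect(2) falls through to [5]; anything else jumps to [6]. */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_connect, 0, 1),
    /* [5] Make connect(2) fail with EPERM instead of reaching the kernel. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    /* [6] Allow every other system call. */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hypothetical helper: installs a filter for the calling thread.
 * Returns 0 on success, -1 on failure (errno is set by prctl). */
int install_filter(struct sock_filter *insns, unsigned short len)
{
    struct sock_fprog prog = { .len = len, .filter = insns };

    assert(insns != NULL && len > 0);
    /* Needed so an unprivileged process is allowed to install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

The intended order of operations is: open the socket and call connect(2) first, then call install_filter(deny_connect_filter, sizeof(deny_connect_filter) / sizeof(deny_connect_filter[0])), and only after that start handling untrusted input. The already connected file descriptor keeps working, while any later connect(2) fails with EPERM.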

Programs written with seccomp in mind make life much easier for whoever has to harden them.

In our case, say you can’t block connect(2), and the only way to keep the program working as intended is to let this system call through the filter. What else can we do here?

Filter by inspecting arguments

So seccomp-bpf lets you filter system calls based on their arguments. Good. And not so good at the same time: a filter sees only the raw argument values, so for a pointer argument it sees the address but not the data behind it. This means you cannot filter on the contents of the addr argument to connect(2), which makes the filter useless for restricting where the process connects.

I mean, take a look at the signature for the connect(2) syscall.

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The second argument (addr) is a pointer, and seccomp-bpf is of no help here since it cannot inspect the data the pointer refers to. I wish seccomp-bpf had this feature, since it would help us write better, more hardened filters. Let us call this feature “deep argument inspection in seccomp filters.”
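To make the limitation concrete, the sketch below mimics in user space what a filter is handed for a connect(2) call. struct seccomp_data is the real structure from the kernel headers; the helper filter_view_of_connect is hypothetical and only fills in the fields the way I understand the kernel to (the arch and instruction_pointer fields are left at zero in this simulation). The addresses are reserved documentation addresses.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <linux/seccomp.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/syscall.h>

/* Hypothetical helper: the view a seccomp filter would get of
 * connect(sockfd, addr, addrlen). Only raw argument values are recorded. */
struct seccomp_data filter_view_of_connect(int sockfd,
                                           const struct sockaddr *addr,
                                           socklen_t addrlen)
{
    struct seccomp_data view;

    memset(&view, 0, sizeof(view));
    view.nr = __NR_connect;
    view.args[0] = (uint64_t)sockfd;
    view.args[1] = (uint64_t)(uintptr_t)addr; /* the address, not the contents */
    view.args[2] = (uint64_t)addrlen;
    return view;
}

/* Returns 1 if two connect(2) calls to different destinations, issued from
 * the same buffer, look identical from the filter's point of view. */
int views_identical_for_two_destinations(void)
{
    struct sockaddr_in dest;
    struct seccomp_data first, second;
    int parsed;

    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(443);

    parsed = inet_pton(AF_INET, "192.0.2.1", &dest.sin_addr);
    assert(parsed == 1);
    first = filter_view_of_connect(3, (struct sockaddr *)&dest, sizeof(dest));

    /* Same buffer, different destination. */
    parsed = inet_pton(AF_INET, "198.51.100.7", &dest.sin_addr);
    assert(parsed == 1);
    (void)parsed; /* keeps NDEBUG builds warning-free */
    second = filter_view_of_connect(3, (struct sockaddr *)&dest, sizeof(dest));

    return memcmp(&first, &second, sizeof(first)) == 0;
}
```

Both views compare equal because args[1] holds the same buffer address in each case. A filter could at best compare that number against a constant, which says nothing about the destination stored there.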

Why don’t they inspect pointers?

I am fairly new to Linux’s security promises, and it wasn’t obvious to me why they would leave out such a valuable feature. It turns out there is a good reason for this.

You see, the seccomp filter only has control over the system call just before it is executed by the kernel. With deep argument inspection, the memory behind a pointer would be read at least twice. For connect(2), the addr argument would be read by the seccomp filter for inspection and later read again by the underlying system call when it is used.

This cries out loud for a TOCTOU (time-of-check-time-of-use) bug. A malicious process could trigger a race condition, for example by rewriting the buffer from a second thread, so that the memory holds innocuous data when the seccomp inspection happens and malicious data when the system call actually uses it.
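The double read can be modelled in plain user-space C. The sketch below is a deterministic simulation, not kernel code: the racing thread is replaced by a callback that runs exactly between the check and the use, and every function name and the one-address policy are hypothetical. The addresses are reserved documentation addresses.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Parses a dotted-quad literal; the sketch only passes valid literals. */
static struct in_addr ipv4(const char *text)
{
    struct in_addr out;
    int parsed = inet_pton(AF_INET, text, &out);

    assert(parsed == 1);
    (void)parsed; /* keeps NDEBUG builds warning-free */
    return out;
}

/* Hypothetical policy: only 192.0.2.1 may be connected to. */
int destination_allowed(const struct sockaddr_in *addr)
{
    return addr->sin_addr.s_addr == ipv4("192.0.2.1").s_addr;
}

/* Models a double fetch: the buffer is read once for the check and once
 * more for the use. `other_thread` stands in for a racing writer.
 * Returns 0 and stores the address actually used, or -1 if the check fails. */
int racy_check_then_use(struct sockaddr_in *user_mem,
                        void (*other_thread)(struct sockaddr_in *),
                        struct in_addr *used)
{
    if (!destination_allowed(user_mem)) /* time of check: first read */
        return -1;
    if (other_thread)
        other_thread(user_mem);         /* the race window */
    *used = user_mem->sin_addr;         /* time of use: second read */
    return 0;
}

/* The simulated attacker: overwrites the buffer after it was checked. */
void swap_in_forbidden_destination(struct sockaddr_in *user_mem)
{
    user_mem->sin_addr = ipv4("203.0.113.9");
}

/* Returns 1 if the check passed and yet the forbidden address was used. */
int race_reaches_forbidden_destination(void)
{
    struct sockaddr_in addr;
    struct in_addr used;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr = ipv4("192.0.2.1");

    if (racy_check_then_use(&addr, swap_in_forbidden_destination, &used) != 0)
        return 0;
    return used.s_addr == ipv4("203.0.113.9").s_addr;
}
```

In a real attack the write would come from another thread and would have to win a timing race, but the ordering is the same: the check sees one value and the use sees another.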

Avoiding TOCTOU is a Seccomp goal.

This was made clear by the following description from the kernel.org document on Seccomp-BPF:

“Additionally, BPF makes it impossible for users of seccomp to fall prey to time-of-check-time-of-use (TOCTOU) attacks that are common in system call interposition frameworks. BPF programs may not dereference pointers which constrains all filters to solely evaluating the system call arguments directly.”

If seccomp weren’t designed with this goal in mind, it would be almost futile for security purposes. However, deep argument inspection is a much sought-after feature for seccomp. I was searching the Linux mailing lists for some pointers around this and found this gem:

> doing deep argument inspection, but it is not an easy thing to get
> right. :)

Yes, please do not rush such a thing!!  It might even be a can of worms
not worth opening.

Conclusion

Indeed, it is like opening a can of worms, with developers eventually trusting a weak security feature for all the essential things. I’m curious whether there is a robust way to solve this problem without changing the system call signatures.

I found Landlock, which is built as a Linux Security Module (LSM) to tackle this kind of problem, and it sounds like an interesting approach.

Generic solutions to this problem seem rather hard to me. However, I would take an implementation that trades some system call performance for deep argument inspection. Perhaps one could implement a filter that copies the data to kernel space before the system call uses it? Am I asking for more bugs? I don’t know. For now, I’ll just read and think more about this problem.
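The copy-first idea can at least be sketched in user space. The model below is only an assumption about how such a scheme might be ordered, not a description of anything the kernel does for seccomp: the caller's buffer is read exactly once into a private snapshot, and both the check and the use work on that snapshot. All names and the one-address policy are hypothetical, and the addresses are reserved documentation addresses.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Parses a dotted-quad literal; the sketch only passes valid literals. */
static struct in_addr ipv4(const char *text)
{
    struct in_addr out;
    int parsed = inet_pton(AF_INET, text, &out);

    assert(parsed == 1);
    (void)parsed; /* keeps NDEBUG builds warning-free */
    return out;
}

/* Hypothetical policy: only 192.0.2.1 may be connected to. */
int destination_allowed(const struct sockaddr_in *addr)
{
    return addr->sin_addr.s_addr == ipv4("192.0.2.1").s_addr;
}

/* Copy once, then check and use the private copy. `late_writer` stands in
 * for a racing thread that modifies the caller's buffer after the copy.
 * Returns 0 and stores the address actually used, or -1 if the check fails. */
int snapshot_check_then_use(struct sockaddr_in *user_mem,
                            void (*late_writer)(struct sockaddr_in *),
                            struct in_addr *used)
{
    struct sockaddr_in snapshot;

    memcpy(&snapshot, user_mem, sizeof(snapshot)); /* the only read */
    if (!destination_allowed(&snapshot))
        return -1;
    if (late_writer)
        late_writer(user_mem); /* changes the original, not the snapshot */
    *used = snapshot.sin_addr;
    return 0;
}

/* Simulated attacker: overwrites the caller's buffer after the copy. */
void swap_in_forbidden_destination(struct sockaddr_in *user_mem)
{
    user_mem->sin_addr = ipv4("203.0.113.9");
}

/* Returns 1 if the address used is still the one that was checked. */
int snapshot_resists_late_write(void)
{
    struct sockaddr_in addr;
    struct in_addr used;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr = ipv4("192.0.2.1");

    if (snapshot_check_then_use(&addr, swap_in_forbidden_destination, &used) != 0)
        return 0;
    return used.s_addr == ipv4("192.0.2.1").s_addr;
}
```

The cost is an extra copy per inspected call, which is the performance trade-off mentioned above. The hard part, which this sketch does not address, is making the system call itself consume the same copy that was inspected.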