
Show HN: Porting OpenBSD Pledge() to Linux


July 13th, 2022 @ justine’s web page


OpenBSD is an operating system that’s famous for its focus on security.
Unfortunately, OpenBSD leader Theo states that there are only 7000
users of OpenBSD. So it’s a very small but elite group that wields
disproportionate influence, since we hear all the time about the
awesome security features these guys get to use, even though we usually
can’t use them ourselves.

Pledge is like the forbidden fruit we all covet when the boss says we
must use things like Linux. Why does it matter? It’s because pledge()
actually makes security comprehensible. Linux has never really had a
security layer that mere mortals can understand. For example, let’s say
you want to do something on Linux like control whether or not some
program you downloaded from the web is allowed to have telemetry. You’d
need to write stuff like this:

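#include <linux/filter.h>   // struct sock_filter, BPF_* macros
#include <linux/seccomp.h>  // struct seccomp_data, SECCOMP_RET_*
#include <stddef.h>         // offsetof()
#include <sys/syscall.h>    // __NR_socket

#define OFF(field) offsetof(struct seccomp_data, field)

// permits socket(AF_INET or AF_INET6, SOCK_STREAM or SOCK_DGRAM,
// 0 or IPPROTO_TCP or IPPROTO_UDP), ignoring SOCK_NONBLOCK/SOCK_CLOEXEC;
// expects the accumulator to hold the system call number on entry, and
// each jump offset is written as (target label) - (next label)
static const struct sock_filter kFilter[] = {
    /*L0*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 0, 14 - 1),
    /*L1*/ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, OFF(args[0])),
    /*L2*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 2, 4 - 3, 0),
    /*L3*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 10, 0, 13 - 4),
    /*L4*/ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, OFF(args[1])),
    /*L5*/ BPF_STMT(BPF_ALU | BPF_AND | BPF_K, ~0x80800),
    /*L6*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 1, 8 - 7, 0),
    /*L7*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 2, 0, 13 - 8),
    /*L8*/ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, OFF(args[2])),
    /*L9*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 12 - 10, 0),
    /*L10*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 6, 12 - 11, 0),
    /*L11*/ BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 17, 0, 13 - 12),
    /*L12*/ BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    /*L13*/ BPF_STMT(BPF_LD | BPF_W | BPF_ABS, OFF(nr)),
    /*L14*/ /* next filter */
};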

Oh my gosh. It’s like we traded one form of security privilege for
another. OpenBSD limits security to a small pond, but makes it easy.
Linux is a big tent, but makes it impossibly hard. SECCOMP BPF might as
well be the Traditional Chinese of programming languages, since only a
small number of people who’ve devoted the oodles of time it takes to
understand code like what you see above have actually been able to
benefit from it. But if you’ve got OpenBSD privilege, then doing the
same thing becomes easy:

pledge("stdio rpath", 0);

That’s really all OpenBSD users have to do to prevent things like leaks
of confidential information. So how do we get it that simple on Linux? I
believe the answer is to find someone with enough free time to figure
out how to use SECCOMP BPF to implement pledge. The latest volunteer is
me, so look upon my code ye mighty and despair.
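To give a sense of the machinery underneath, here is a minimal sketch, not taken from the polyfill, of how a much simpler SECCOMP BPF filter gets loaded on Linux. It makes socket() fail with EPERM, so the program can’t open new sockets, while allowing every other system call. It assumes an x86-64 or AArch64 kernel with seccomp filter support, and the helper names deny_socket_syscall() and socket_is_denied() are hypothetical. Even this one-rule deny-list needs extra care to reject system calls made through a foreign ABI.

```c
#include <errno.h>
#include <linux/audit.h>   /* AUDIT_ARCH_* */
#include <linux/filter.h>  /* struct sock_filter, BPF_* macros */
#include <linux/seccomp.h> /* struct seccomp_data, SECCOMP_* */
#include <stddef.h>        /* offsetof */
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>   /* __NR_socket */
#include <unistd.h>

#if defined(__x86_64__)
#define FILTER_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define FILTER_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for your architecture"
#endif

#define OFF(field) offsetof(struct seccomp_data, field)

/* Hypothetical helper: makes socket() fail with EPERM and allows every
   other system call. Returns 0 on success, -1 with errno set on failure. */
int deny_socket_syscall(void) {
  struct sock_filter filter[] = {
      /* kill the caller if the system call comes from another ABI */
      BPF_STMT(BPF_LD | BPF_W | BPF_ABS, OFF(arch)),
      BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_ARCH, 1, 0),
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
      BPF_STMT(BPF_LD | BPF_W | BPF_ABS, OFF(nr)),
#if defined(__x86_64__)
      /* x32 system calls share the arch value but set bit 30 */
      BPF_JUMP(BPF_JMP | BPF_JGE | BPF_K, 0x40000000, 0, 1),
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
#endif
      /* socket() gets EPERM; everything else is allowed */
      BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 0, 1),
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
      BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
  };
  struct sock_fprog prog = {
      .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
      .filter = filter,
  };
  /* required before an unprivileged process may install a filter */
  if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1) return -1;
  return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Usage check: returns 1 once socket() is being refused with EPERM. */
int socket_is_denied(void) {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd != -1) {
    close(fd);
    return 0;
  }
  return errno == EPERM;
}
```

Calling deny_socket_syscall() early in main() and then socket_is_denied() should report 1. Seccomp filters are inherited by child processes and preserved across execve(), which is what makes wrapping an existing command possible.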

There have been a few devs in the past who’ve tried this. I’m not going
to name names, because most of these projects were never completed. When
it comes to SECCOMP, the online tutorials only explain how to whitelist
the system calls themselves, so most people lose interest before
figuring out how to filter their arguments. The projects that got
further along also had oversights, like allowing the setuid/setgid/sticky
bits to be changed. So none of the current alternatives should be used.
I believe this effort gets us much closer to having pledge() than ever
before.


Command Line Utility

I originally wrote my pledge() polyfill for the redbean web server as a
sandboxing solution. However, it turns out pledge() is robust enough as
an abstraction that I thought it’d be useful to create a small command
line utility which launches processes under pledge(), so that anyone can
use it without having to configure it in C code.

pledge.com
44kb – x86-64 ELF executable (debug data, source code)
Written by Justine Alexandra Roberts Tunney (Twitter, GitHub, LinkedIn)
ab61efbc68afc94a5812bacd4c93d91f1da3b8fb267a2622724821cd9cace169

That binary will work on all Linux distros since RHEL6. Root privileges
are not required. You just use it to wrap your command invocations. It’s
so tiny and lightweight that it only adds a few microseconds of startup
latency to your program. It’s great for shell scripts and automated
tools. For example, if you want to run the list directory command, and
only permit that command to do basic stdio and filesystem path reading,
you’d say:

$ wget https://justine.lol/pledge/pledge.com
$ chmod +x pledge.com
$ ./pledge.com -p 'stdio rpath' ls
file listing output...

You can now be certain your ls command isn’t doing things like spying on
you, or uploading your bitcoin wallet to the cloud. However, let’s say
authorizing network access is what you want. One command that has a real,
legitimate need for that is curl, which can be configured as follows:

$ ./pledge.com -p 'stdio rpath inet thread' curl http://justine.lol/hello.txt
hello world

Here’s another example. Let’s say you have a public ssh server and you
want to let people read and take notes on your book collection, but you
don’t want anyone rewriting your books. In that case, you can repurpose
something like the nano command as a strictly read-only editor. Since
nano has a TUI, you’d need to grant it TTY privileges.

$ ./pledge.com -np 'stdio rpath tty' nano ~/books/bofh.txt

Troubleshooting

If your program crashes, then you can figure out why by tracing the
binary and seeing which system call is EPERM’ing. Since the ls
invocation earlier used the default set of promises (thereby making
-p 'stdio rpath' redundant), let’s see what happens if we reduce the
privileges to just stdio.

$ strace -ff ./pledge.com -p stdio ls
open("/etc/ld-musl-x86_64.path", O_RDONLY|O_CLOEXEC) = -1 EPERM (Operation not permitted)

Well, that didn’t take long. Now that you know what’s wrong, you can
consult the Promises section to see which promise you need. For
example, you’d learn that open(O_RDONLY) is provided by rpath, and that
in order to fork() you need -p proc.

Resource Limits

In addition to polyfilling pledge, your pledge command is also able to
apply some other very important safety hacks that aren’t obvious to the
uninitiated. For example, we’ve all run a program before that hammers
the system. Linux is very generous in how much memory programs can
allocate. An accidental loop in just one program, by default on Linux,
will absolutely take the whole machine out of commission for a few
minutes before the “OOM Killer” kicks in. In other cases, like a fork()
bomb, the default Linux environment provides no such protection, so it’s
essentially equivalent to a blue screen of death.


Your pledge command imposes some perfectly reasonable resource quotas on
programs by default to prevent that from happening. Unless you tune the
flags, a program is allowed to use only 4gb of virtual memory and, if
you’ve permitted it to fork off new processes, it won’t be able to run
more than twice as many processes at once as you have CPUs. That way
your sandbox won’t compromise the stability of your machine.
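Quotas like these can be expressed with the standard setrlimit() API. The following is a minimal sketch under assumptions, not pledge.com’s actual code: the helper names cap_rlimit() and apply_default_quotas() are hypothetical, sysconf(_SC_NPROCESSORS_ONLN) stands in for Cosmopolitan’s GetCpuCount(), and limits are only ever lowered, so no root privileges are needed.

```c
#include <sys/resource.h>
#include <unistd.h>

/* Lowers both soft and hard limits of `resource` to `value`, or to the
   current hard limit if that is already smaller. Returns 0 or -1. */
static int cap_rlimit(int resource, rlim_t value) {
  struct rlimit rl;
  if (getrlimit(resource, &rl) == -1) return -1;
  if (rl.rlim_max < value) value = rl.rlim_max; /* never try to raise it */
  rl.rlim_cur = value;
  rl.rlim_max = value;
  return setrlimit(resource, &rl);
}

/* Hypothetical helper applying the default quotas described above. */
int apply_default_quotas(void) {
  const rlim_t four_gb = (rlim_t)4 << 30;
  long cpus = sysconf(_SC_NPROCESSORS_ONLN);
  if (cpus < 1) cpus = 1;
  if (cap_rlimit(RLIMIT_AS, four_gb) == -1) return -1;    /* like -M */
  if (cap_rlimit(RLIMIT_FSIZE, four_gb) == -1) return -1; /* like -F */
  /* like -P; note RLIMIT_NPROC counts every process and thread owned by
     the same real user ID, and isn't enforced for privileged users */
  if (cap_rlimit(RLIMIT_NPROC, (rlim_t)cpus * 2) == -1) return -1;
  return 0;
}
```

A launcher would call apply_default_quotas() right before executing the sandboxed program, since resource limits are inherited across both fork() and execve().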

We also have a niceness feature. Have you ever had a program use so much
disk i/o that everything crawls to a halt? You run some program, and
then suddenly every small file takes seconds to load in Emacs? Your
pledge command can fix that. If you’ve got a compute-heavy, long-running
program, then pass the -n flag for a nice that’s actually nice. The
naive nice command doesn’t really do much, since it changes neither the
scheduler nor the i/o priority. This command changes both. Using the -n
flag will guarantee the sandboxed program stays out of the way, since
the kernel will only let it use spare capacity.
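Here’s a minimal sketch of those three steps using glibc on Linux. It’s an illustration rather than pledge.com’s source: apply_max_niceness() is a hypothetical name, and since glibc has no ioprio_set() wrapper, the I/O priority constants are copied from the kernel’s ioprio ABI and passed through syscall(). How strictly the idle I/O class is honored depends on the kernel’s I/O scheduler.

```c
#define _GNU_SOURCE /* for SCHED_IDLE */
#include <sched.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SCHED_IDLE
#define SCHED_IDLE 5 /* value from the kernel's linux/sched.h */
#endif

/* Kernel ioprio ABI values (linux/ioprio.h); glibc has no wrapper. */
#define IOPRIO_CLASS_SHIFT 13
#define IOPRIO_CLASS_IDLE 3
#define IOPRIO_WHO_PROCESS 1

/* Hypothetical helper: nice 19, idle CPU scheduling, idle i/o class.
   Returns 0 on success, or -1 at the first step that fails. */
int apply_max_niceness(void) {
  /* (1) weakest priority the normal time-sharing scheduler offers */
  if (setpriority(PRIO_PROCESS, 0, 19) == -1) return -1;
  /* (2) SCHED_IDLE: an even lower weight, meant for background jobs */
  struct sched_param param = {.sched_priority = 0};
  if (sched_setscheduler(0, SCHED_IDLE, &param) == -1) return -1;
  /* (3) idle i/o class: disk time only when nobody else is asking */
  if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
              IOPRIO_CLASS_IDLE << IOPRIO_CLASS_SHIFT) == -1)
    return -1;
  return 0;
}
```

All three settings are inherited across fork() and execve(), so applying them once in the launcher covers the whole program it runs. Unprivileged processes are allowed to make each of these changes, because they only ever lower priority.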

Pledge Command Flags

-n
Apply maximum niceness to program. This means (1) nice is set to 19,
(2) i/o priority is set to idle, and (3) scheduler is set to idle.
-N
Don’t normalize file descriptors. By default, pledge.com guarantees
(1) the stdio file descriptors exist, and (2) file descriptors that
the parent process or shell forgot to close will be closed. In the
latter case, we only poll up to fd=256 which is fast, but the number
may be lower depending on system limits.
-g GID
Call setgid() before executing program (not allowed if setuid binary)
-u UID
Call setuid() before executing program (not allowed if setuid binary)
-c PATH
Call chroot() before executing program (needs root privileges)
-C SECS
Set CPU limit in seconds [default: inherited]
-M BYTES
Set virtual memory limit in bytes [default: 4gb]
-P PROCS
Set process limit [default: GetCpuCount()*2]
-F BYTES
Set individual file size limit [default: 4gb]
-p PLEDGE
Defaults to -p 'stdio rpath'. It’s repeatable. May contain any of the
following, separated by spaces (see also the Promises section below,
which goes into much greater depth on what each category does):

  • stdio: allow stdio and benign system calls
  • rpath: read-only path ops
  • wpath: write path ops
  • cpath: create path ops
  • dpath: create special files
  • flock: file locks
  • tty: terminal ioctls
  • recvfd: allow SCM_RIGHTS
  • fattr: allow changing some struct stat bits
  • inet: allow IPv4 and IPv6
  • unix: allow local sockets
  • dns: allow dns
  • proc: allow fork, clone and friends
  • thread: allow clone
  • id: allow setuid and friends
  • exec: allow executing APE binaries
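The default file descriptor normalization (the behavior -N turns off) can be sketched with a single poll() call, because poll() reports POLLNVAL for any descriptor that isn’t open even when no events are requested. This is a hypothetical normalize_fds() written under assumptions, not pledge.com’s code: holes in fds 0 through 2 are filled with /dev/null, and the scan stops at fd 256 or at RLIMIT_NOFILE, whichever is lower, since poll() rejects an nfds larger than that limit.

```c
#include <fcntl.h>
#include <poll.h>
#include <sys/resource.h>
#include <unistd.h>

#define FD_SCAN_LIMIT 256

/* Hypothetical helper: (1) ensures fds 0, 1 and 2 exist and (2) closes
   every other descriptor below the scan limit. Returns 0 or -1. */
int normalize_fds(void) {
  struct pollfd pfds[FD_SCAN_LIMIT];
  struct rlimit rl;
  nfds_t n = FD_SCAN_LIMIT;
  /* poll() fails with EINVAL when nfds exceeds RLIMIT_NOFILE */
  if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur < n) n = rl.rlim_cur;
  for (nfds_t i = 0; i < n; ++i)
    pfds[i] = (struct pollfd){.fd = (int)i, .events = 0};
  /* one call, zero timeout: closed slots come back with POLLNVAL */
  if (poll(pfds, n, 0) == -1) return -1;
  for (nfds_t i = 0; i < n; ++i) {
    int is_open = !(pfds[i].revents & POLLNVAL);
    if (i < 3 && !is_open) {
      /* open() returns the lowest free fd, which is i at this point
         because every lower descriptor is open or was just filled */
      if (open("/dev/null", O_RDWR) == -1) return -1;
    } else if (i >= 3 && is_open) {
      close((int)i);
    }
  }
  return 0;
}
```

Checking one descriptor at a time with fcntl(F_GETFD) would also work, but that costs one system call per slot, whereas poll() answers for the whole range at once.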

Securing APE Binaries

Actually Portable Executables should be written to call pledge()
internally. But if you want to use the pledge.com command to secure an
APE binary that doesn’t, then you need to convert (or “assimilate”) it
into the ELF format beforehand. You can usually do this by saying:

$ file redbean.com
redbean.com: DOS/MBR boot sector
$ ./redbean.com --assimilate
$ file redbean.com
redbean.com: ELF 64-bit LSB executable

Please note that this won’t work if you’re using binfmt_misc with the
new APE Loader, since you then can’t run the APE shell script to
assimilate your binary. For that case, we provide a new assimilate.com
program, which can convert APE programs to ELF or Mach-O.

assimilate.com
Works on x86-64 Linux+Mac+Windows+FreeBSD+NetBSD+OpenBSD
92kb – PE+ELF+MachO+ZIP+SH executable (debug data, source code)
593a8119049e9e8a88d29f80af83bfdbb5fcdd8a4cbad934af05dd6a5145ce77


C API

Pledge works best when developing software using Cosmopolitan Libc. You
can get started relatively easily writing pledge() programs using the
cosmopolitan monorepo. The zero config solution is to just plop this
program file into the examples folder. Start by cloning the repo:

$ git clone https://github.com/jart/cosmopolitan
$ cd cosmopolitan
$ nano examples/mypledge.c

You can then copy and paste this code:

#include "libc/calls/calls.h"
#include "libc/stdio/stdio.h"

int main() {
  pledge("stdio", 0);
  printf("hello world\n");
}

You can then build and run your program as follows:

$ make -j8 o//examples/mypledge.com
$ o//examples/mypledge.com
hello world

One thing you may have noticed about the pledge.com command is that its
most restrictive mode (pledge.com -p "" cmd...) can’t actually be used:
your program will just crash. That’s because it’s intended for the C
API. What it means is that your process or thread won’t be able to call
any system call except exit. Such a program might sound impossible, but
you can actually communicate with it using shared memory. For example,
here’s how you’d do it with threads.

int enclave(void *arg, int tid) {
  if (pledge("", 0)) return 1;
  int *job = arg;            // get job
  job[0] = job[0] + job[1];  // do work
  return 0;                  // exit
}
int main() {
  struct spawn worker;
  int job[2] = {2, 2};            // create workload
  _spawn(enclave, job, &worker);  // create worker
  _join(&worker);                 // wait for exit
  assert(job[0] == 4);            // check result
}

The above example shows an enclaved worker doing some kind of
computational task, possibly executing untrusted code, and then storing
the result to some memory location that the parent thread can see when
the worker has finished executing. It works great and is fast.

One disadvantage of the above example is that the enclaved
worker has unfettered access to your stack memory and might make a mess
of things. That’s potentially creepy and not very enclaved. One way to
fix that is to use fork() instead of threads. In that case, you can
explicitly whitelist which memory is shared.

int ws;
// create small shared memory region
int *job = mmap(0, FRAMESIZE, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
job[0] = 2;  // create workload
job[1] = 2;
if (!fork()) {  // create enclaved worker
  if (pledge("", 0)) _Exit(1);
  job[0] = job[0] + job[1];  // do work
  _Exit(0);
}
wait(&ws);  // wait for worker
assert(WIFEXITED(ws));
assert(WEXITSTATUS(ws) == 0);
assert(job[0] == 4);  // check result
munmap(job, FRAMESIZE);

Most of the Cosmopolitan Libc unit tests have been set up to use
pledge() these days. Not necessarily because we’re concerned about them
being compromised, but because the pledge function has outstanding
documentation value in helping people understand our tests, since it
readily communicates what system functionality they need. For example,
our tests for the access() filesystem function say:

__attribute__((__constructor__)) static void init(void) {
  pledge("stdio rpath wpath cpath fattr", 0);
  errno = 0;
}

System Call Origin Verification

When you write your own Actually Portable Executables, you also get some
added security benefits compared to pledge.com. For example, another
famous OpenBSD system call is msyscall(), which causes the kernel to
validate the RIP register of anything that issues a system call. In
Cosmopolitan, calling pledge() will automatically polyfill that feature
too, only allowing functions which are annotated with the privileged
keyword to use SYSCALL. What that means is if someone manages to
compromise your server and inject executable code into your program’s
memory, then that code will effectively have pledge("", 0) privileges,
even if your app specified something much broader when it called
pledge(). The redbean web server’s unix.pledge() function is also able
to take advantage of this.



Caveats

File system access is a blind spot. OpenBSD solves this with another
famous system call called unveil(), which lets users control file system
paths too. Right now there’s no clear way to implement that for Linux.
However our pledge() polyfill does do a reasonable job in restricting
which file system operations are possible. But once you permit the file
system ops, the ops are allowed to happen on pretty much any file the
user has access to.

I personally don’t view this as a problem. What I love about pledge.com
is that it tells me whether the programs I download from random
strangers on the Internet are actually the good little command line
citizens they claim to be. For example, if I download a tool for
computing some math, or compressing a file, then it really shouldn’t
need any access except -p "stdio rpath", especially if I’m able to use
pipes. So I can use pledge.com to make sure the command keeps its
promise, and it lets me know if there are any surprising behaviors.
This is great security if you’re dealing with command line programs
that are written in a conscientious manner. If a program is only able to
read files and can’t talk to the Internet, then seriously, what could it
possibly do? It’s such a simple pareto-optimized niche that I can’t
believe no one’s made it easily addressable until now.

However, there’s always going to be that one program you want that’s
power hungry, possibly due to bloated frameworks and dependencies. In
that case, we may want access to some (but not all) of the file
system. pledge.com is able to address the need somewhat using chroot().
It’s worth noting, though, that chroot() has a weakness which kernel
developers have refused to fix for decades. Most of the docs on this
subject are unprofessional and crazy. For example, the chroot(2) man
page is probably the only category 2 man page I’ve ever seen that uses
shell script code to describe its functionality. As far as I can tell,
the only convincing weakness with chroot() is that the jail is only
locked from the inside. If you take away the freedom of a process by
putting it in a chroot jail, then another process that’s free can use
its freedom to bust its friend out of jail. For example, here’s how root
can leave a backdoor that lets the process escape:

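#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {  // must run as root, since chroot() needs privileges
  mkdir("/tmp/mydir", 0755);
  // privileged user opens a backdoor
  int backdoor = open("/tmp", O_RDONLY | O_DIRECTORY);
  // process enters chroot jail
  chdir("/tmp/mydir");
  if (chroot("/tmp/mydir")) {
    perror("chroot");
    return 1;
  }
  // process escapes jail: the fd still refers to /tmp outside the jail
  fchdir(backdoor);
  chdir("..");
  // list root directory
  struct dirent *e;
  DIR *d = opendir(".");
  while ((e = readdir(d))) {
    printf("%s\n", e->d_name);
  }
  closedir(d);
}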

The Linux devs could fix that if they wanted to. However, I personally
don’t see it as a total dealbreaker, since pledge.com helps avoid it by
closing rogue file descriptors at startup using poll(). What’s even more
surprising is that this weakness is also exploitable on OpenBSD, since
they too seem to have given up on securing the traditional chroot()
call. But at least OpenBSD provides an alternative that’s easy to use,
called unveil(). It’d be great to see that leadership from the Linux
kernel, but instead we just see blog posts from companies like RedHat
saying that having chroot() will make us more insecure than having no
security at all. It’s like banning locks because lockpick kits exist.
RedHat must be experts at mental gymnastics to publish such communiqués.
It’s also comical that Linux addresses the problem by restricting
chroot() to the root user account, since clearly something so
“insecure” will become more secure if you only do it from the most
privileged user. What an unfortunate state of affairs, since many of us
have needed to look elsewhere for answers, and the only folks offering
those right now are bloatware like Docker that locks in your filesystem
with a bunch of cryptically named tar files. And they say that Docker
isn’t a security layer either! Even though it’s based on things like
cgroups, which are even more elite and difficult to understand than
SECCOMP BPF. We can only guess why the kernel devs do it. Maybe they’re
afraid of issue workload burnout and figure people won’t complain about
security if no one understands it! That’s something we’re working to
change.

It should also be noted that there are some features OpenBSD bakes into
pledge() that we’re not able to polyfill with Linux SECCOMP BPF. One of
the things OpenBSD does is check file system paths, in order to loosen
up restrictions around things like accessing the time zone database.
This isn’t a problem if you’re a Cosmopolitan Libc user, because APE
binaries don’t read tzdata from the filesystem and instead embed time
zone data inside the ZIP structure of the binary. However, it could
potentially be problematic if you’re using pledge.com to launch binaries
that are provided by your distro. Ask your friendly distro maintainers
to improve their security solutions. If they can’t, then you can always
switch to Cosmopolitan Libc.


Another caveat is that, so far, I’ve only implemented the things
described in the OpenBSD pledge(2) manual page. We still need to
reconcile this properly with the primary materials which would be the
OpenBSD pledge() kernel source code. We also need more community
feedback to make sure there aren’t things we haven’t considered. For
example, Linux has a lot of sneaky capabilities in a shifting landscape
that aren’t always widely understood, which can potentially bite the
authors of security tools, even when they’ve done due diligence.

I’ve also only really tested this on console applications. If you want a
pledge() that’s likely to work with GUIs, then, knowing the way the
Linux desktop goes, you really should consider SerenityOS, since Andreas
added pledge() support a couple of years ago.


Pledge Documentation

Pledging causes most system calls to become unavailable. Your system
call policy is enforced by the kernel, which means it can propagate
across execve() if permitted. This system call is supported on
OpenBSD and Linux where it’s polyfilled using SECCOMP BPF. The way it
works on Linux is verboten system calls will raise EPERM whereas
OpenBSD just kills the process while logging a helpful message to
/var/log/messages explaining which promise category you needed.

By default exit and exit_group are always allowed. This is useful
for processes that perform pure computation and interface with the
parent via shared memory.

Once pledge is in effect, the chmod functions (if allowed) will not
permit the sticky/setuid/setgid bits to change. Linux will EPERM here
and OpenBSD should ignore those three bits rather than crashing.

User and group IDs can’t be changed once pledge is in effect. OpenBSD
should ignore chown without crashing; whereas Linux will just EPERM.

Memory functions won’t permit creating executable code after pledge.
Restrictions on origin of SYSCALL instructions will become enforced
on Linux (cf. msyscall) after pledge too, which means the process
gets killed if SYSCALL is used outside the .privileged section. One
exception is if the “exec” group is specified, in which case these
restrictions need to be loosened.

Using pledge is irreversible. On Linux it causes PR_SET_NO_NEW_PRIVS
to be set on your process; however, if “id” or “recvfd” are allowed
then they theoretically could permit the gaining of some new
privileges. You may call pledge() multiple times if “stdio” is
allowed. In that case, the process can only move towards a more
restrictive state.
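The PR_SET_NO_NEW_PRIVS bit is an ordinary prctl() flag, and its one-way nature is easy to observe. In this minimal sketch (lock_no_new_privs() is a hypothetical name), the bit is set, an attempt to clear it is rejected by the kernel with EINVAL, and PR_GET_NO_NEW_PRIVS confirms it is still on.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Hypothetical helper: sets no_new_privs and reports whether it stuck.
   Returns 1 if the bit is set and could not be cleared, otherwise 0. */
int lock_no_new_privs(void) {
  /* from now on, execve() won't honor setuid/setgid bits or file caps */
  if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1) return 0;
  /* the kernel only accepts 1 here, so the bit can't be turned off */
  if (prctl(PR_SET_NO_NEW_PRIVS, 0, 0, 0, 0) != -1 || errno != EINVAL)
    return 0;
  return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1;
}
```

The flag is also inherited by children and preserved across execve(), which is why it pairs naturally with a seccomp filter.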

pledge() can’t filter file system paths or internet addresses. For
example, if you enable a category like “inet” then your process will
be able to talk to any internet address. The same applies to
categories like “wpath” and “cpath”; if enabled, any path the
effective user id is permitted to change will be changeable.


Promises

Your promises string may include any of the following groups, delimited
by spaces.

stdio
allows close, dup, dup2, dup3, fchdir, fstat, fsync,
fdatasync, ftruncate, getdents, getegid, getrandom, geteuid,
getgid, getgroups, getitimer, getpgid, getpgrp, getpid, getppid,
getresgid, getresuid, getrlimit, getsid, wait4, gettimeofday,
getuid, lseek, madvise, brk, arch_prctl, uname, set_tid_address,
clock_getres, clock_gettime, clock_nanosleep, mmap (PROT_EXEC and
weird flags aren’t allowed), mprotect (PROT_EXEC isn’t allowed),
msync, munmap, nanosleep, pipe, pipe2, read, readv, pread, recv,
poll, recvfrom, preadv, write, writev, pwrite, pwritev, select,
send, sendto (only if addr is null), setitimer, shutdown, sigaction
(but SIGSYS is forbidden), sigaltstack, sigprocmask, sigreturn,
sigsuspend, umask, socketpair, ioctl(FIONREAD), ioctl(FIONBIO),
ioctl(FIOCLEX), ioctl(FIONCLEX), fcntl(F_GETFD), fcntl(F_SETFD),
fcntl(F_GETFL), fcntl(F_SETFL).

rpath
(read-only path ops) allows chdir, getcwd, open(O_RDONLY),
openat(O_RDONLY), stat, fstat, lstat, fstatat, access, faccessat,
readlink, readlinkat, statfs, fstatfs.

wpath
(write path ops) allows getcwd, open(O_WRONLY),
openat(O_WRONLY), stat, fstat, lstat, fstatat, access, faccessat,
readlink, readlinkat, chmod, fchmod, fchmodat.

cpath
(create path ops) allows open(O_CREAT), openat(O_CREAT),
rename, renameat, renameat2, link, linkat, symlink, symlinkat,
unlink, rmdir, unlinkat, mkdir, mkdirat.

dpath
(create special path ops) allows mknod, mknodat, mkfifo.

flock
allows flock, fcntl(F_GETLK), fcntl(F_SETLK),
fcntl(F_SETLKW).

tty
allows ioctl(TIOCGWINSZ), ioctl(TCGETS), ioctl(TCSETS),
ioctl(TCSETSW), ioctl(TCSETSF).

recvfd
allows recvmsg(SCM_RIGHTS).

fattr
allows chmod, fchmod, fchmodat, utime, utimes, futimens,
utimensat.

inet
allows socket(AF_INET), listen, bind, connect, accept,
accept4, getpeername, getsockname, setsockopt, getsockopt, sendto.

unix
allows socket(AF_UNIX), listen, bind, connect, accept,
accept4, getpeername, getsockname, setsockopt, getsockopt.

dns
allows socket(AF_INET), sendto, recvfrom, connect.

proc
allows fork, vfork, kill, getpriority, setpriority, prlimit,
setrlimit, setpgid, setsid.

thread
allows clone, futex, and permits PROT_EXEC in mprotect.

id
allows setuid, setreuid, setresuid, setgid, setregid,
setresgid, setgroups, prlimit, setrlimit, getpriority, setpriority,
setfsuid, setfsgid.

exec
allows execve, execveat, access, faccessat. On Linux this
also weakens some security to permit running APE binaries. However,
on OpenBSD they must be assimilated beforehand. On Linux, mmap()
will be loosened up to allow creating PROT_EXEC memory (for APE
loader) and system call origin verification won’t be activated.

execnative
allows execve, execveat. Can only be used to run
native executables; you won’t be able to run APE binaries. mmap()
and mprotect() are still prevented from creating executable memory.
System call origin verification can’t be enabled. If you always
assimilate your APE binaries, then this should be preferred.
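Because a promise string is just space-delimited words, turning it into something a filter generator can use is simple. Here is a minimal sketch with hypothetical bit assignments (one bit per category above, in the order listed; these are not Cosmopolitan’s internal values): unknown words are rejected, repeated words are harmless, and an empty string yields no promises at all, just like pledge("", 0).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical bit order: bit i stands for kPromiseNames[i]. */
static const char *const kPromiseNames[] = {
    "stdio", "rpath",  "wpath", "cpath", "dpath", "flock",
    "tty",   "recvfd", "fattr", "inet",  "unix",  "dns",
    "proc",  "thread", "id",    "exec",  "execnative",
};
#define PROMISE_COUNT (sizeof(kPromiseNames) / sizeof(kPromiseNames[0]))

/* Parses a space-delimited promise string into a bitmask.
   Returns -1 if the string contains an unknown word. */
int64_t parse_promises(const char *s) {
  int64_t mask = 0;
  for (;;) {
    s += strspn(s, " ");          /* skip separators */
    size_t len = strcspn(s, " "); /* length of the next word */
    if (len == 0) return mask;    /* reached the end of the string */
    size_t i = 0;
    while (i < PROMISE_COUNT && !(strlen(kPromiseNames[i]) == len &&
                                  memcmp(s, kPromiseNames[i], len) == 0))
      ++i;
    if (i == PROMISE_COUNT) return -1; /* unknown promise */
    mask |= (int64_t)1 << i;
    s += len;
  }
}
```

For example, parse_promises("stdio rpath") returns 3, and parse_promises("stdio bogus") returns -1. Comparing lengths before memcmp() keeps "exec" from matching the first four letters of "execnative".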


Funding


Funding for the development of pledge() on Linux was crowdsourced from
Justine Tunney’s GitHub sponsors and Patreon subscribers. Your support
is what makes projects like Cosmopolitan Libc possible. Thank you.
