
The Asterinas Book


Welcome to the documentation for Asterinas, an open-source project and community focused on developing cutting-edge Rust OS kernels.

Book Structure

This book is divided into six distinct parts:

Part 1: Asterinas NixOS

Asterinas NixOS is the first distribution built on top of the Asterinas kernel. It is based on NixOS, leveraging its powerful configuration model and rich package ecosystem, while swapping out the Linux kernel for Asterinas.

Part 2: Asterinas Kernel

Explore the modern OS kernel at the heart of Asterinas. Designed to realize the full potential of Rust, Asterinas Kernel implements the Linux ABI in a safe and efficient way. This means it can seamlessly replace Linux, offering enhanced safety and security.

Part 3: Asterinas OSTD

The Asterinas OSTD lays down a minimalistic, powerful, and solid foundation for OS development. It’s akin to Rust’s std crate but crafted for the demands of safe Rust OS development. The Asterinas Kernel is built on this very OSTD.

Part 4: Asterinas OSDK

The OSDK is a command-line tool that streamlines the workflow to create, build, test, and run Rust OS projects built upon Asterinas OSTD. Developed specifically for OS developers, it extends Rust’s Cargo tool to better suit their specific needs. OSDK is instrumental in the development of the Asterinas Kernel.

Part 5: Contributing to Asterinas

Asterinas is in its early stage and welcomes your contributions! This part provides guidance on how you can become an integral part of the Asterinas project.

Part 6: Requests for Comments (RFCs)

Significant decisions in Asterinas are made through a transparent RFC process. This part describes the RFC process and archives all approved RFCs.

Licensing

Asterinas’s source code and documentation primarily use the Mozilla Public License (MPL), Version 2.0. Select components are under more permissive licenses, detailed here.

Our choice of the weak-copyleft MPL license reflects a strategic balance:

  1. Commitment to open-source freedom: We believe that OS kernels are a communal asset that should benefit humanity. The MPL ensures that any alterations to MPL-covered files remain open source, aligning with our vision. Additionally, we do not require contributors to sign a Contributor License Agreement (CLA), preserving their rights and preventing the possibility of their contributions being made closed source.

  2. Accommodating proprietary modules: Recognizing the evolving landscape where large corporations also contribute significantly to open-source, we accommodate the business need for proprietary kernel modules. Unlike GPL, the MPL permits the linking of MPL-covered files with proprietary code.

In conclusion, we believe that MPL is the best choice to foster a vibrant, robust, and inclusive open-source community around Asterinas.

Asterinas NixOS

Asterinas NixOS is the first distribution for Asterinas. We choose NixOS as the base OS for its unparalleled customizability and rich package ecosystem. Asterinas NixOS is intended to look and feel like vanilla NixOS, except that it replaces Linux with Asterinas as its kernel. For more rationale about choosing NixOS, see the RFC.

Disclaimer: Asterinas is an independent, community-led project. Asterinas NixOS is not an official NixOS project and has no affiliation with the NixOS Foundation. No sponsorship or endorsement is implied.

Asterinas NixOS is not ready for production use. We provide Asterinas NixOS to make the Asterinas kernel more accessible, allowing early adopters and enthusiasts to try it out and provide feedback. In addition, Asterinas developers use this distro as a key development vehicle to facilitate enabling and testing more real-world applications on the Asterinas kernel.

Getting Started

End Users

The following instructions describe how to install Asterinas NixOS in a VM using the Asterinas NixOS installer ISO.

  1. Launch an x86-64 Ubuntu container:

    docker run -it --privileged --network=host ubuntu:latest bash
    
  2. Inside the container, install QEMU:

    apt update
    apt install -y qemu-system-x86 qemu-utils
    
  3. Download the latest Asterinas NixOS installer ISO from GitHub Releases.

  4. Create a file that will be used as the target disk to install Asterinas NixOS:

    # The capacity of the disk is 10GB; adjust it as you see fit
    dd if=/dev/zero of=aster_nixos_disk.img bs=1G count=10
    
  5. Start an x86-64 VM with two drives: one is the installer CD-ROM and the other is the target disk:

    export INSTALLER_ISO=/path/to/your/downloaded/installer.iso
    qemu-system-x86_64 \
    -cpu host -m 8G -enable-kvm \
    -drive file="$INSTALLER_ISO",media=cdrom -boot d \
    -drive if=virtio,format=raw,file=aster_nixos_disk.img \
    -chardev stdio,id=mux,mux=on,logfile=qemu.log \
    -device virtio-serial-pci -device virtconsole,chardev=mux \
    -serial chardev:mux -monitor chardev:mux \
    -nographic
    

    After the VM boots, you now have access to the installation environment.

  6. Edit the configuration.nix file in the home directory to customize the NixOS system to be installed:

    vim configuration.nix
    

    The complete syntax and guidance for the configuration.nix file can be found in the NixOS manual. If you are not familiar with NixOS, you can simply skip this step.

    Not all combinations of settings in configuration.nix are supported by Asterinas NixOS yet. The ones that have been tested are documented in the subsequent chapters.

  7. Start installation:

    install_aster_nixos.sh --config configuration.nix --disk /dev/vda --force-format-disk
    

    The installation process involves downloading packages and may take around 30 minutes to complete, depending on your network speed.

  8. After the installation is complete, you can shut down the VM:

    poweroff
    

    Now Asterinas NixOS is installed in aster_nixos_disk.img.

  9. Start a VM to boot the newly installed Asterinas NixOS:

    qemu-system-x86_64 \
    -cpu host -m 8G -enable-kvm \
    -bios /usr/share/qemu/OVMF.fd \
    -drive if=none,format=raw,id=x0,file=aster_nixos_disk.img \
    -device virtio-blk-pci,drive=x0,disable-legacy=on,disable-modern=off \
    -chardev stdio,id=mux,mux=on,logfile=qemu.log \
    -device virtio-serial-pci -device virtconsole,chardev=mux \
    -serial chardev:mux -monitor chardev:mux \
    -device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off \
    -netdev user,id=net0 \
    -device isa-debug-exit,iobase=0xf4,iosize=0x04 \
    -nographic -display vnc=127.0.0.1:21
    

    If a desktop environment is enabled in the configuration.nix file, you can view the graphical interface using a VNC client.

Kernel Developers

  1. Follow Steps 1 and 2 in the “Getting Started” section of the Asterinas Kernel to set up the development environment.

  2. Inside the Docker container, generate a disk image with Asterinas NixOS installed using this command:

    make nixos
    

    or this command:

    make iso && make run_iso
    

    The difference between the two methods is that the first installs NixOS to a disk image entirely inside the container, whereas the second emulates the manual ISO installation steps (see the previous section) by running a VM. Using either method results in a disk image with an Asterinas NixOS installation.

  3. Start a VM to run the installed Asterinas NixOS:

    make run_nixos
    

Popular Applications

The Nix package collection contains over 120,000 packages. Many of them may already work out of the box on Asterinas NixOS, but it is impractical to test every package and document its support status. Instead, we group popular applications into categories and document the ones we have verified to work.

Package Management

NixOS provides a set of tools for building, installing, and managing packages.

Verified Usage

nix-build

nix-build builds Nix derivations and produces outputs in the Nix store. It is the preferred way to build software reproducibly.

# Step 1: Create a clean workspace
rm -rf /tmp/nix-hello-c && mkdir -p /tmp/nix-hello-c && cd /tmp/nix-hello-c

# Step 2: Write a C program
cat > hello.c <<'EOF'
#include <stdio.h>

int main(void) {
    puts("Hello, World!");
    return 0;
}
EOF

# Step 3: Write a default.nix
cat > default.nix <<'EOF'
{ pkgs ? import <nixpkgs> {} }:

pkgs.stdenv.mkDerivation {
  pname = "hello-c";
  version = "0.1.0";
  src = ./.;

  dontConfigure = true;

  buildPhase = ''
    cc hello.c -o hello
  '';

  installPhase = ''
    mkdir -p $out/bin
    install -m755 hello $out/bin/hello
  '';
}
EOF

# Step 4: Build and run
nix-build
./result/bin/hello

nix-env

nix-env installs or removes individual packages in your user profile.

# Install the `hello` package
nix-env -iA nixos.hello

# Remove the `hello` package
nix-env -e hello

nix-shell

nix-shell creates a temporary development environment with the specified dependencies. This is useful for testing software without modifying your system environment.

# Enter a shell with the `hello` package available
nix-shell -p hello

nixos-rebuild

nixos-rebuild manages the entire system configuration declaratively. It applies changes defined in configuration.nix, and is the recommended approach for installing packages system-wide.

# Edit the system configuration file
vim /etc/nixos/configuration.nix

# Apply configuration changes and rebuild the system without rebooting
nixos-rebuild test

Desktop Environment

Xfce

Xfce is a lightweight desktop environment for UNIX-like operating systems.

Installation

Add the following lines to the configuration.nix file:

services.xserver.enable = true;
services.xserver.desktopManager.xfce.enable = true;

Verified Backends

  • Display server:
    • Xorg display server
  • Graphics drivers:
    • Standard UEFI VGA framebuffer

Verified Functionality

  • Changing desktop wallpapers and background settings
  • Adjusting font size, style, and system theme
  • Creating application shortcuts and desktop launchers
  • Managing panels and window behavior
  • Using the settings manager and file browser

Verified GUI Applications

Utilities:

  • galculator: Calculator
  • mousepad: The default Xfce text editor
  • mupdf: A lightweight PDF and XPS viewer

Games:

  • fairymax: Chess
  • five-or-more: GNOME alignment game
  • lbreakout2: Breakout/Arkanoid clone
  • gnome-chess: GNOME chess
  • gnome-mines: Minesweeper
  • gnome-sudoku: GNOME sudoku
  • tali: GNOME dice game
  • xboard: Chess
  • xgalaga: Galaga-style arcade game

Containerization

Podman

Podman is a modern, daemonless container engine that provides a Docker-compatible command-line interface, making it easy for users familiar with Docker to transition.

Installation

To install Podman, add the following line to configuration.nix:

virtualisation.podman.enable = true;

Verified Usage

podman run

podman run runs a command in a new container.

# Start a container, execute a command, and then exit
podman run --name=c1 docker.io/library/alpine ls /etc

# Start a container and attach to an interactive shell
podman run -it docker.io/library/alpine

podman image

podman image manages local images.

# List downloaded images
podman image ls

podman ps

podman ps lists containers.

# Show the status of all containers (including exited ones)
podman ps -a

podman rm

podman rm removes one or more containers.

# Remove a container named foo
podman rm foo

Asterinas Kernel

Overview

Asterinas is a secure, fast, and general-purpose OS kernel that provides a Linux-compatible ABI. It can serve as a seamless replacement for Linux while enhancing memory safety and developer friendliness.

  • Asterinas prioritizes memory safety by employing Rust as its sole programming language and limiting the use of unsafe Rust to a clearly defined and minimal Trusted Computing Base (TCB). This innovative approach, known as the framekernel architecture, establishes Asterinas as a more secure and dependable kernel option.

  • Asterinas surpasses Linux in terms of developer friendliness. It empowers kernel developers to (1) utilize the more productive Rust programming language, (2) leverage a purpose-built toolkit called OSDK to streamline their workflows, and (3) choose between releasing their kernel modules as open source or keeping them proprietary, thanks to the flexibility offered by MPL.

While the journey towards a production-grade OS kernel can be challenging, we are steadfastly progressing towards our goal.

Supported CPU Architectures

Asterinas targets modern, 64-bit platforms only.

A development platform is where you build and test Asterinas (i.e., the host machine running the Docker-based development environment).

Development Platform
x86-64

A deployment platform is a CPU architecture that Asterinas can run on as an OS kernel.

Deployment Platform | Tier
x86-64              | Tier 1
x86-64 (Intel TDX)  | Tier 2
RISC-V 64           | Tier 2
LoongArch 64        | Tier 3
  • Tier 1: Fully supported and tested. CI runs the full test suite on every PR.
  • Tier 2: Actively developed with basic functionality working. CI runs build checks and basic tests on a regular basis (per PR for RISC-V and nightly for Intel TDX), but the full test suite is not yet covered.
  • Tier 3: Early-stage or experimental. The kernel can boot and perform basic operations, but CI coverage is limited and may not include automated runtime tests for every pull request.

Getting Started

Get yourself an x86-64 Linux machine with Docker installed. Follow the three simple steps below to get Asterinas up and running.

  1. Download the latest source code.

    git clone https://github.com/asterinas/asterinas
    
  2. Run a Docker container as the development environment.

    docker run -it --privileged \
                --network=host \
                -v /dev:/dev \
                -v $(pwd)/asterinas:/root/asterinas \
                asterinas/asterinas:0.17.1-20260317
    
  3. Inside the container, go to the project folder to build and run Asterinas.

    make kernel
    make run_kernel
    

If everything goes well, Asterinas is now up and running inside a VM.

Advanced Build and Test Instructions

User-Mode Unit Tests

Asterinas consists of many crates, some of which do not require a VM environment and can be tested with the standard cargo test. They are listed in the root Makefile and can be tested together through the following Make command.

make test

To test an individual crate, enter the directory of the crate and invoke cargo test.

Kernel-Mode Unit Tests

Many crates in Asterinas do require a VM environment to be tested. The unit tests for these crates are empowered by OSDK.

make ktest

To test an individual crate in kernel mode, enter the directory of the crate and invoke cargo osdk test.

cd asterinas/ostd
cargo osdk test

Integration Test

General Test

The following command builds and runs the test binaries in the test/initramfs/src/apps directory on Asterinas.

make run_kernel AUTO_TEST=test

Syscall Test

The following command builds and runs the syscall test binaries on Asterinas.

make run_kernel AUTO_TEST=syscall

To run system call tests interactively, start an instance of Asterinas with the system call tests built and installed.

make run_kernel BUILD_SYSCALL_TEST=1

Then, in the interactive shell, run the following script to start the syscall tests.

/opt/syscall_test/run_syscall_test.sh

Debug

Using GDB to Debug

To debug Asterinas via QEMU’s GDB support, compile Asterinas with the debug profile, start an Asterinas instance, and run the GDB interactive shell in another terminal.

Start a GDB-enabled VM of Asterinas with OSDK and wait for debugging connection:

make gdb_server

The server will listen at the default address specified in the Makefile, i.e., local TCP port :1234. Change the address in the Makefile if needed, and check cargo osdk run -h for more details about the address.
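Before attaching a client, it can be handy to confirm that the debug server is actually accepting connections on the expected port. The helper below is a hypothetical convenience, not part of OSDK; it simply attempts a TCP connection:

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP server accepts connections on (host, port)."""
    try:
        # create_connection performs the full TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g., check the default OSDK GDB server address:
# is_port_open("127.0.0.1", 1234)
```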

Two options are provided to interact with the debug server.

  • A GDB client: start a GDB client in another terminal.

    make gdb_client
    
  • VS Code: the CodeLLDB extension is required. After starting a debug server with OSDK from the shell via make gdb_server, a temporary launch.json is generated under .vscode. Your previous launch configs will be restored after the server shuts down. Press F5 (Run and Debug) to start a debug session via VS Code. Click Continue (or press F5) at the first break to resume the paused server instance; it will then continue until reaching your first breakpoint.

Note that if you debug with KVM enabled, you must use hardware-assisted breakpoints. See “hbreak” in the GDB manual for details.

Intel TDX

Asterinas can serve as a secure guest OS for Intel TDX-protected virtual machines (VMs). This documentation describes how Asterinas can be run and tested easily on a TDX-enabled Intel server.

Intel TDX (Trust Domain Extensions) is a Trusted Execution Environment (TEE) technology that enhances VM security by creating isolated, hardware-enforced trust domains with encrypted memory, secure initialization, and attestation mechanisms. For more information about Intel TDX, jump to the last section.

Why choose Asterinas for Intel TDX

VM TEEs such as Intel TDX deserve a more secure guest OS option than Linux. Linux, with its inherent memory safety issues and large Trusted Computing Base (TCB), has long suffered from security vulnerabilities caused by memory safety bugs. Additionally, when Linux is used as the guest kernel inside a VM TEE, it must process untrusted inputs (over 1,500 instances in Linux, per Intel’s estimation) from the host (via hypercalls, MMIO, etc.). These untrusted inputs create new attack surfaces that can be exploited through memory safety vulnerabilities, known as Iago attacks.

Asterinas offers greater memory safety than Linux, particularly against Iago attacks. Thanks to its framekernel architecture, the memory safety of Asterinas relies solely on the Asterinas Framework, excluding the safe device drivers built on top of the Asterinas Framework that may handle untrusted inputs from the host. For more information, see our talk on OC3’24.

Prepare the Intel TDX Environment

Please make sure your server supports Intel TDX.

See this guide or other materials to enable Intel TDX in host OS.

To verify the TDX host status, you can type:

dmesg | grep "TDX module initialized"

The following result is an example:

[   20.507296] tdx: TDX module initialized.

The message “TDX module initialized” means the TDX module has been loaded successfully.

Build and run Asterinas

  1. Download the latest source code.

    git clone https://github.com/asterinas/asterinas
    
  2. Run a Docker container as the development environment.

    docker run -it --privileged --network=host --device=/dev/kvm -v $(pwd)/asterinas:/root/asterinas asterinas/asterinas:0.17.1-20260317
    
  3. Inside the container, go to the project folder to build and run Asterinas.

    make run_kernel INTEL_TDX=1
    

If everything goes well, Asterinas is now up and running inside a TD.

Using GDB to Debug

A Trust Domain (TD) is debuggable if its ATTRIBUTES.DEBUG bit is 1. In this mode, the host VMM can use Intel TDX module functions to read and modify TD VCPU state and TD private memory, which are not accessible when the TD is non-debuggable.

Start Asterinas in a GDB-enabled TD and wait for debugging connection:

make gdb_server INTEL_TDX=1

Behind the scenes, this command adds the debug=on configuration to the QEMU parameters to enable the TD debuggable mode.

The server will listen at the default address specified in Makefile, i.e., a local TCP port :1234.

Start a GDB client in another terminal:

make gdb_client INTEL_TDX=1

Note that you must use hardware-assisted breakpoints because KVM is enabled when debugging a TD.

About Intel TDX

Intel® Trust Domain Extensions (Intel® TDX) is Intel’s newest confidential computing technology. This hardware-based trusted execution environment (TEE) facilitates the deployment of trust domains (TD), which are hardware-isolated virtual machines (VM) designed to protect sensitive data and applications from unauthorized access.

A CPU-measured Intel TDX module enables Intel TDX. This software module runs in a new CPU Secure Arbitration Mode (SEAM) as a peer virtual machine manager (VMM), and supports TD entry and exit using the existing virtualization infrastructure. The module is hosted in a reserved memory space identified by the SEAM Range Register (SEAMRR).

Intel TDX uses hardware extensions for managing and encrypting memory and protects both the confidentiality and integrity of the TD CPU state from non-SEAM mode.

Intel TDX uses architectural elements such as SEAM, a shared bit in Guest Physical Address (GPA), secure Extended Page Table (EPT), physical-address-metadata table, Intel® Total Memory Encryption – Multi-Key (Intel® TME-MK), and remote attestation.

Intel TDX ensures data integrity, confidentiality, and authenticity, which empowers engineers and tech professionals to create and maintain secure systems, enhancing trust in virtualized environments.

For more information, please refer to Intel TDX website.

The Framekernel Architecture

Framekernel: What and Why

The security of a microkernel, the speed of a monolithic kernel.

Asterinas introduces a novel OS architecture called framekernel, which unleashes the full power of Rust to bring the best of both monolithic kernels and microkernels.

Within the framekernel architecture, the entire OS resides in the same address space (like a monolithic kernel) and is required to be written in Rust. However, there is a twist: the kernel is partitioned into two halves, the OS Framework (akin to a microkernel) and the OS Services. Only the OS Framework is allowed to use unsafe Rust, while the OS Services must be written exclusively in safe Rust.

             | Unsafe Rust | Responsibilities                                                               | Code Sizes
OS Framework | Allowed     | Encapsulate low-level unsafe code within high-level safe APIs                  | Small
OS Services  | Not allowed | Implement OS functionalities, e.g., system calls, file systems, device drivers | Large

As a result, the memory safety of the kernel can be reduced to that of the OS Framework, thus minimizing the Trusted Computing Base (TCB) associated with the kernel’s memory safety. On the other hand, the single address space allows different parts of the kernel to communicate in the most efficient means, e.g., function calls and shared memory. Thanks to the framekernel architecture, Asterinas can offer both exceptional performance and enhanced safety.

A comparison between different OS architectures

Requirements for the OS Framework

While the concept of framekernel is straightforward, the design and implementation of the required OS framework present challenges. It must concurrently fulfill four criteria.

The four requirements for the OS framework

  • Soundness. The safe APIs of the framework are considered sound if no undefined behavior can be triggered by any safe Rust code that a programmer may write using the APIs, as long as the code is accepted by the Rust toolchain. Soundness ensures that the OS framework, in conjunction with the Rust toolchain, bears the full responsibility for the kernel’s memory safety.

  • Expressiveness. The framework should empower developers to implement a substantial range of OS functionalities in safe Rust using the APIs. It is especially important that the framework enables writing device drivers in safe Rust, considering that device drivers comprise the bulk of the code in a fully-fledged OS kernel (like Linux).

  • Minimalism. As the TCB for memory safety, the framework should be kept as small as possible. No functionality should be implemented inside the framework if doing it outside is possible.

  • Efficiency. The safe APIs provided by the framework must introduce only minimal overhead. Ideally, these APIs should be realized as zero-cost abstractions.

Fortunately, our efforts to design and implement an OS framework meeting these standards have borne fruit in the form of the Asterinas OSTD. Using this framework as a foundation, we have developed the Asterinas Kernel; this framework also enables others to create their own framekernels, with different goals and tradeoffs.

Linux Compatibility

“We don’t break user space.”

— Linus Torvalds

Asterinas is dedicated to maintaining compatibility with the Linux ABI, ensuring that applications and administrative tools designed for Linux can seamlessly operate within Asterinas. While we prioritize compatibility, it is important to note that Asterinas does not, nor will it in the future, support the loading of Linux kernel modules.

System Calls

At the time of writing, Asterinas supports over 230 Linux system calls for the x86-64 architecture, which are summarized in the table below.

Number | Name | Supported | Flag Coverage
0 | read | ✅ | 💯
1 | write | ✅ | 💯
2 | open | ✅ | ⚠️
3 | close | ✅ | 💯
4 | stat | ✅ | 💯
5 | fstat | ✅ | 💯
6 | lstat | ✅ | 💯
7 | poll | ✅ | ⚠️
8 | lseek | ✅ | ⚠️
9 | mmap | ✅ | ⚠️
10 | mprotect | ✅ | ⚠️
11 | munmap | ✅ | 💯
12 | brk | ✅ | 💯
13 | rt_sigaction | ✅ | ⚠️
14 | rt_sigprocmask | ✅ | ⚠️
15 | rt_sigreturn | ✅ | 💯
16 | ioctl | ✅ | ⚠️
17 | pread64 | ✅ | 💯
18 | pwrite64 | ✅ | 💯
19 | readv | ✅ | 💯
20 | writev | ✅ | 💯
21 | access | ✅ | 💯
22 | pipe | ✅ | 💯
23 | select | ✅ | 💯
24 | sched_yield | ✅ | 💯
25 | mremap | ✅ | ⚠️
26 | msync | ✅ | ⚠️
27 | mincore | ❌ | N/A
28 | madvise | ✅ | ⚠️
29 | shmget | ❌ | N/A
30 | shmat | ❌ | N/A
31 | shmctl | ❌ | N/A
32 | dup | ✅ | 💯
33 | dup2 | ✅ | 💯
34 | pause | ✅ | 💯
35 | nanosleep | ✅ | 💯
36 | getitimer | ✅ | 💯
37 | alarm | ✅ | 💯
38 | setitimer | ✅ | 💯
39 | getpid | ✅ | 💯
40 | sendfile | ✅ | 💯
41 | socket | ✅ | ⚠️
42 | connect | ✅ | ⚠️
43 | accept | ✅ | ⚠️
44 | sendto | ✅ | ⚠️
45 | recvfrom | ✅ | ⚠️
46 | sendmsg | ✅ | ⚠️
47 | recvmsg | ✅ | ⚠️
48 | shutdown | ✅ | 💯
49 | bind | ✅ | ⚠️
50 | listen | ✅ | 💯
51 | getsockname | ✅ | 💯
52 | getpeername | ✅ | 💯
53 | socketpair | ✅ | ⚠️
54 | setsockopt | ✅ | ⚠️
55 | getsockopt | ✅ | ⚠️
56 | clone | ✅ | ⚠️
57 | fork | ✅ | 💯
58 | vfork | ✅ | 💯
59 | execve | ✅ | 💯
60 | exit | ✅ | 💯
61 | wait4 | ✅ | ⚠️
62 | kill | ✅ | 💯
63 | uname | ✅ | 💯
64 | semget | ✅ | ⚠️
65 | semop | ✅ | ⚠️
66 | semctl | ✅ | ⚠️
67 | shmdt | ❌ | N/A
68 | msgget | ❌ | N/A
69 | msgsnd | ❌ | N/A
70 | msgrcv | ❌ | N/A
71 | msgctl | ❌ | N/A
72 | fcntl | ✅ | ⚠️
73 | flock | ✅ | 💯
74 | fsync | ✅ | 💯
75 | fdatasync | ✅ | 💯
76 | truncate | ✅ | 💯
77 | ftruncate | ✅ | 💯
78 | getdents | ✅ | 💯
79 | getcwd | ✅ | 💯
80 | chdir | ✅ | 💯
81 | fchdir | ✅ | 💯
82 | rename | ✅ | 💯
83 | mkdir | ✅ | 💯
84 | rmdir | ✅ | 💯
85 | creat | ✅ | 💯
86 | link | ✅ | 💯
87 | unlink | ✅ | 💯
88 | symlink | ✅ | 💯
89 | readlink | ✅ | 💯
90 | chmod | ✅ | 💯
91 | fchmod | ✅ | 💯
92 | chown | ✅ | 💯
93 | fchown | ✅ | 💯
94 | lchown | ✅ | 💯
95 | umask | ✅ | 💯
96 | gettimeofday | ✅ | 💯
97 | getrlimit | ✅ | 💯
98 | getrusage | ✅ | ⚠️
99 | sysinfo | ✅ | 💯
100 | times | ❌ | N/A
101 | ptrace | ❌ | N/A
102 | getuid | ✅ | 💯
103 | syslog | ❌ | N/A
104 | getgid | ✅ | 💯
105 | setuid | ✅ | 💯
106 | setgid | ✅ | 💯
107 | geteuid | ✅ | 💯
108 | getegid | ✅ | 💯
109 | setpgid | ✅ | 💯
110 | getppid | ✅ | 💯
111 | getpgrp | ✅ | 💯
112 | setsid | ✅ | 💯
113 | setreuid | ✅ | 💯
114 | setregid | ✅ | 💯
115 | getgroups | ✅ | 💯
116 | setgroups | ✅ | 💯
117 | setresuid | ✅ | 💯
118 | getresuid | ✅ | 💯
119 | setresgid | ✅ | 💯
120 | getresgid | ✅ | 💯
121 | getpgid | ✅ | 💯
122 | setfsuid | ✅ | 💯
123 | setfsgid | ✅ | 💯
124 | getsid | ✅ | 💯
125 | capget | ✅ | ⚠️
126 | capset | ✅ | ⚠️
127 | rt_sigpending | ✅ | 💯
128 | rt_sigtimedwait | ✅ | 💯
129 | rt_sigqueueinfo | ❌ | N/A
130 | rt_sigsuspend | ✅ | 💯
131 | sigaltstack | ✅ | 💯
132 | utime | ✅ | 💯
133 | mknod | ✅ | 💯
134 | uselib | ❌ | N/A
135 | personality | ❌ | N/A
136 | ustat | ❌ | N/A
137 | statfs | ✅ | 💯
138 | fstatfs | ✅ | 💯
139 | sysfs | ❌ | N/A
140 | getpriority | ✅ | 💯
141 | setpriority | ✅ | 💯
142 | sched_setparam | ✅ | 💯
143 | sched_getparam | ✅ | 💯
144 | sched_setscheduler | ✅ | ⚠️
145 | sched_getscheduler | ✅ | 💯
146 | sched_get_priority_max | ✅ | 💯
147 | sched_get_priority_min | ✅ | 💯
148 | sched_rr_get_interval | ❌ | N/A
149 | mlock | ❌ | N/A
150 | munlock | ❌ | N/A
151 | mlockall | ❌ | N/A
152 | munlockall | ❌ | N/A
153 | vhangup | ❌ | N/A
154 | modify_ldt | ❌ | N/A
155 | pivot_root | ✅ | 💯
156 | _sysctl | ❌ | N/A
157 | prctl | ✅ | ⚠️
158 | arch_prctl | ✅ | ⚠️
159 | adjtimex | ❌ | N/A
160 | setrlimit | ✅ | 💯
161 | chroot | ✅ | 💯
162 | sync | ✅ | 💯
163 | acct | ❌ | N/A
164 | settimeofday | ❌ | N/A
165 | mount | ✅ | ⚠️
166 | umount2 | ✅ | ⚠️
167 | swapon | ❌ | N/A
168 | swapoff | ❌ | N/A
169 | reboot | ✅ | ⚠️
170 | sethostname | ✅ | 💯
171 | setdomainname | ✅ | 💯
172 | iopl | ❌ | N/A
173 | ioperm | ❌ | N/A
174 | create_module | ❌ | N/A
175 | init_module | ❌ | N/A
176 | delete_module | ❌ | N/A
177 | get_kernel_syms | ❌ | N/A
178 | query_module | ❌ | N/A
179 | quotactl | ❌ | N/A
180 | nfsservctl | ❌ | N/A
181 | getpmsg | ❌ | N/A
182 | putpmsg | ❌ | N/A
183 | afs_syscall | ❌ | N/A
184 | tuxcall | ❌ | N/A
185 | security | ❌ | N/A
186 | gettid | ✅ | 💯
187 | readahead | ❌ | N/A
188 | setxattr | ✅ | 💯
189 | lsetxattr | ✅ | 💯
190 | fsetxattr | ✅ | 💯
191 | getxattr | ✅ | 💯
192 | lgetxattr | ✅ | 💯
193 | fgetxattr | ✅ | 💯
194 | listxattr | ✅ | 💯
195 | llistxattr | ✅ | 💯
196 | flistxattr | ✅ | 💯
197 | removexattr | ✅ | 💯
198 | lremovexattr | ✅ | 💯
199 | fremovexattr | ✅ | 💯
200 | tkill | ❌ | N/A
201 | time | ✅ | 💯
202 | futex | ✅ | ⚠️
203 | sched_setaffinity | ✅ | 💯
204 | sched_getaffinity | ✅ | 💯
205 | set_thread_area | ❌ | N/A
206 | io_setup | ❌ | N/A
207 | io_destroy | ❌ | N/A
208 | io_getevents | ❌ | N/A
209 | io_submit | ❌ | N/A
210 | io_cancel | ❌ | N/A
211 | get_thread_area | ❌ | N/A
212 | lookup_dcookie | ❌ | N/A
213 | epoll_create | ✅ | 💯
214 | epoll_ctl_old | ❌ | N/A
215 | epoll_wait_old | ❌ | N/A
216 | remap_file_pages | ❌ | N/A
217 | getdents64 | ✅ | 💯
218 | set_tid_address | ✅ | 💯
219 | restart_syscall | ❌ | N/A
220 | semtimedop | ✅ | ⚠️
221 | fadvise64 | ✅ | ⚠️
222 | timer_create | ✅ | ⚠️
223 | timer_settime | ✅ | 💯
224 | timer_gettime | ✅ | 💯
225 | timer_getoverrun | ❌ | N/A
226 | timer_delete | ✅ | 💯
227 | clock_settime | ❌ | N/A
228 | clock_gettime | ✅ | ⚠️
229 | clock_getres | ❌ | N/A
230 | clock_nanosleep | ✅ | ⚠️
231 | exit_group | ✅ | 💯
232 | epoll_wait | ✅ | 💯
233 | epoll_ctl | ✅ | ⚠️
234 | tgkill | ✅ | 💯
235 | utimes | ✅ | 💯
236 | vserver | ❌ | N/A
237 | mbind | ❌ | N/A
238 | set_mempolicy | ❌ | N/A
239 | get_mempolicy | ❌ | N/A
240 | mq_open | ❌ | N/A
241 | mq_unlink | ❌ | N/A
242 | mq_timedsend | ❌ | N/A
243 | mq_timedreceive | ❌ | N/A
244 | mq_notify | ❌ | N/A
245 | mq_getsetattr | ❌ | N/A
246 | kexec_load | ❌ | N/A
247 | waitid | ✅ | ⚠️
248 | add_key | ❌ | N/A
249 | request_key | ❌ | N/A
250 | keyctl | ❌ | N/A
251 | ioprio_set | ✅ | ⚠️
252 | ioprio_get | ✅ | ⚠️
253 | inotify_init | ✅ | 💯
254 | inotify_add_watch | ✅ | ⚠️
255 | inotify_rm_watch | ✅ | 💯
256 | migrate_pages | ❌ | N/A
257 | openat | ✅ | ⚠️
258 | mkdirat | ✅ | 💯
259 | mknodat | ✅ | 💯
260 | fchownat | ✅ | 💯
261 | futimesat | ✅ | 💯
262 | newfstatat | ✅ | ⚠️
263 | unlinkat | ✅ | 💯
264 | renameat | ✅ | 💯
265 | linkat | ✅ | 💯
266 | symlinkat | ✅ | 💯
267 | readlinkat | ✅ | 💯
268 | fchmodat | ✅ | 💯
269 | faccessat | ✅ | 💯
270 | pselect6 | ✅ | 💯
271 | ppoll | ✅ | ⚠️
272 | unshare | ✅ | ⚠️
273 | set_robust_list | ✅ | 💯
274 | get_robust_list | ❌ | N/A
275 | splice | ❌ | N/A
276 | tee | ❌ | N/A
277 | sync_file_range | ❌ | N/A
278 | vmsplice | ❌ | N/A
279 | move_pages | ❌ | N/A
280 | utimensat | ✅ | ⚠️
281 | epoll_pwait | ✅ | 💯
282 | signalfd | ✅ | 💯
283 | timerfd_create | ✅ | ⚠️
284 | eventfd | ✅ | 💯
285 | fallocate | ✅ | ⚠️
286 | timerfd_settime | ✅ | ⚠️
287 | timerfd_gettime | ✅ | 💯
288 | accept4 | ✅ | ⚠️
289 | signalfd4 | ✅ | 💯
290 | eventfd2 | ✅ | ⚠️
291 | epoll_create1 | ✅ | 💯
292 | dup3 | ✅ | 💯
293 | pipe2 | ✅ | ⚠️
294 | inotify_init1 | ✅ | ⚠️
295 | preadv | ✅ | 💯
296 | pwritev | ✅ | 💯
297 | rt_tgsigqueueinfo | ❌ | N/A
298 | perf_event_open | ❌ | N/A
299 | recvmmsg | ❌ | N/A
300 | fanotify_init | ❌ | N/A
301 | fanotify_mark | ❌ | N/A
302 | prlimit64 | ✅ | 💯
303 | name_to_handle_at | ❌ | N/A
304 | open_by_handle_at | ❌ | N/A
305 | clock_adjtime | ❌ | N/A
306 | syncfs | ✅ | 💯
307 | sendmmsg | ✅ | ⚠️
308 | setns | ✅ | ⚠️
309 | getcpu | ✅ | 💯
310 | process_vm_readv | ❌ | N/A
311 | process_vm_writev | ❌ | N/A
312 | kcmp | ❌ | N/A
313 | finit_module | ❌ | N/A
314 | sched_setattr | ✅ | ⚠️
315 | sched_getattr | ✅ | ⚠️
316 | renameat2 | ✅ | ⚠️
318 | getrandom | ✅ | ⚠️
319 | memfd_create | ✅ | ⚠️
322 | execveat | ✅ | 💯
327 | preadv2 | ✅ | ⚠️
328 | pwritev2 | ✅ | ⚠️
332 | statx | ✅ | ⚠️
424 | pidfd_send_signal | ✅ | 💯
434 | pidfd_open | ✅ | 💯
435 | clone3 | ✅ | ⚠️
436 | close_range | ✅ | 💯
438 | pidfd_getfd | ✅ | 💯
439 | faccessat2 | ✅ | ⚠️
441 | epoll_pwait2 | ✅ | 💯
452 | fchmodat2 | ✅ | 💯
  • Supported:

    • ✅ = syscall supported
    • ❌ = not supported
  • Flag Coverage:

    • 💯 = all flags/commands/modes are supported
    • ⚠️ = syscall works, but some flags/modes are not implemented
    • ❓ = implementation exists, but we have not audited its coverage yet
    • N/A = not applicable (e.g., syscall not supported)

Most of these system calls (or their variants) are also supported for the RISC-V and LoongArch architectures.

File Systems

Here is the list of supported file systems:

  • Devfs
  • Devpts
  • Ext2
  • Procfs
  • Ramfs

Sockets

Here is the list of supported socket types:

  • TCP sockets over IPv4
  • UDP sockets over IPv4
  • Unix sockets

vDSO

Here is the list of supported symbols in vDSO:

  • __vdso_clock_gettime
  • __vdso_gettimeofday
  • __vdso_time

Boot Protocols

Here is the list of supported boot protocols:

Syscall Flag Coverage

This section documents the flag coverage of Asterinas’s implementation of Linux system calls. It introduces System Call Matching Language (SCML), a lightweight domain‑specific language for specifying allowed and disallowed patterns of system‑call invocations.

The rest of this section uses SCML to accurately and concisely describe both supported and unsupported functionality of system calls, which are divided into the following categories:

System Call Matching Language (SCML)

SCML specifies matching patterns for system‑call invocations. Asterinas developers can easily write SCML rules to describe supported patterns. Likewise, users and developers can intuitively read these rules to understand which system calls and features are available.

SCML is designed to integrate seamlessly with strace, the standard Linux system‑call tracer. Strace emits each invocation in a C‑style syntax; given a set of SCML rules, a tool can automatically determine whether a strace log entry conforms to the supported patterns. This paves the way for an SCML‑based analyzer that reports unsupported calls in any application’s trace.

Strace: A Quick Example

To illustrate, run strace on a simple “Hello, World!” program:

$ strace ./hello_world

A typical trace might look like this:

execve("./hello_world", ["./hello_world"], 0xffffffd3f710 /* 4 vars */) = 0
brk(NULL)                               = 0xaaaabdc1b000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff890f4000
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\360\206\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1722920, ...}) = 0
…
write(1, "Hello, World!\n", 14)         = 14
exit_group(0)                           = ?

Key points of this output:

  • System calls are rendered as name(arg1, …, argN).
  • Flags appear as FLAG1|FLAG2|…|FLAGN.
  • Structs use {field1=value1, …}.
  • Arrays are shown as [value1, …].

SCML’s syntax draws directly from these conventions.

SCML by Example

SCML is intentionally simple: most Linux system‑call semantics hinge on bitflags. SCML rules act as templates: you define a rule once, and a human or an analyzer can then check whether a given syscall invocation matches it.

Imagine you’re developing a Linux-compatible OS (like Asterinas) that supports just a restricted subset of syscalls and their options. We will use SCML to describe the restricted functionality.

Matching Rules for System Calls

For example, suppose your OS supports the open system call with one or more of the four flags O_RDONLY, O_WRONLY, O_RDWR, and O_CLOEXEC. This constraint can be expressed in the following system call matching rule.

open(path, flags = O_RDONLY | O_WRONLY | O_RDWR | O_CLOEXEC);

To allow file creation, you add another matching rule that includes the O_CREAT flag and requires a mode argument:

open(path, flags = O_CREAT | O_RDONLY | O_WRONLY | O_RDWR | O_CLOEXEC, mode);

To support the O_PATH flag (only valid with O_CLOEXEC, not with O_RDONLY, O_WRONLY, or O_RDWR), you add a third matching rule:

open(path, flags = O_PATH | O_CLOEXEC);

SCML rules constrain only the flagged arguments; other parameters (like path and mode) accept any value.

In many system calls, the number of arguments may vary depending on the flags provided. To accommodate this, SCML allows you to use the .. wildcard in the parameter list. This indicates that any additional arguments are accepted, regardless of their value or count.

For example:

open(path, flags = O_RDONLY | O_WRONLY | O_RDWR | O_CLOEXEC, ..);

Here, the .. wildcard makes the rule flexible enough to match invocations of open with extra parameters, such as when the O_CREAT flag is present and a mode argument is required. This approach makes it easy to write concise rules that only constrain the arguments of interest, while allowing other parameters to vary as needed.
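To make the matching semantics concrete, here is a minimal sketch in Python (with hypothetical helper names — not part of any official SCML tooling) of how a single bitflags pattern is checked: the flags named in a rule are optional, so an invocation matches when every flag it passes is named by the rule.

```python
def flags_match(rule_flags: set, invocation: str) -> bool:
    # strace renders flags as FLAG1|FLAG2|...|FLAGN; split on '|' to get
    # the individual flag names, then test for set inclusion.
    return set(invocation.split("|")) <= rule_flags

# The first open rule above, as a set of allowed flag names.
OPEN_RULE = {"O_RDONLY", "O_WRONLY", "O_RDWR", "O_CLOEXEC"}

print(flags_match(OPEN_RULE, "O_RDONLY|O_CLOEXEC"))  # True
print(flags_match(OPEN_RULE, "O_CREAT|O_WRONLY"))    # False: O_CREAT needs another rule
```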

C-Style Comments

SCML also supports C‑style comments:

// All matching rules for the open syscall.
// A supported invocation of the open syscall must match at least one of the rules.
open(path, flags = O_RDONLY | O_WRONLY | O_RDWR | O_CLOEXEC);
open(path, flags = O_CREAT | O_RDONLY | O_WRONLY | O_RDWR | O_CLOEXEC, mode);
open(path, flags = O_PATH | O_CLOEXEC);

Matching Rules for Bitflags

Above, we embedded flag combinations directly within individual system‑call rules, which can lead to duplication and make maintenance harder. SCML allows you to define named bitflag rules that can be reused across multiple rules. This reduces repetition and centralizes your flag definitions. For example:

// Define a reusable bitflags rule
access_mode = O_RDONLY | O_WRONLY | O_RDWR;

open(path, flags = <access_mode> | O_CLOEXEC);
open(path, flags = O_CREAT | <access_mode> | O_CLOEXEC, mode);
open(path, flags = O_PATH | O_CLOEXEC);

Matching Rules for Structs

SCML can match flags inside struct fields. Consider sigaction:

struct sigaction = {
    sa_flags = SA_NOCLDSTOP | SA_NOCLDWAIT,
    ..
};

Here, .. is a wildcard for the remaining fields that we do not care about.

Then, we can write a system call rule that refers to the struct rule using the <struct_rule> syntax.

sigaction(signum, act = <sigaction>, oldact = <sigaction>);

Instead of defining a separate struct rule, you can also inline the struct pattern directly in the parameter list. This is convenient when the struct pattern is only used once or when you want to express different constraints for the same struct type in different contexts.

For example, the following rule inlines the struct pattern for capget:

capget(
    hdrp = {
        version = _LINUX_CAPABILITY_VERSION_3,
        ..
    },
    datap
);

Matching Rules for Arrays

SCML can describe how to match flags embedded inside the struct values of an array. This is the case for the poll system call: it takes an array of struct pollfd values, whose events and revents fields are bitflags.

// Support all but the POLLPRI flags
events = POLLIN | POLLOUT | POLLRDHUP | POLLERR | POLLHUP | POLLNVAL;

struct pollfd = {
    events  = <events>,
    revents = <events>,
    ..
};

poll(fds = [ <pollfd> ], nfds, timeout);

Notice how SCML denotes an array with the [ <struct_rule> ] syntax.
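A sketch of how such an array rule could be checked follows (hypothetical helper names; only the events field is inspected here for brevity, and `..` means unconstrained fields are simply never looked at):

```python
# Allowed events, per the `events` bitflags rule above (no POLLPRI).
EVENTS = {"POLLIN", "POLLOUT", "POLLRDHUP", "POLLERR", "POLLHUP", "POLLNVAL"}

def pollfd_matches(entry: dict) -> bool:
    # Discard the empty string that split() yields for a missing field.
    flags = set(entry.get("events", "").split("|")) - {""}
    return flags <= EVENTS

fds = [{"fd": 3, "events": "POLLIN|POLLOUT"},
       {"fd": 4, "events": "POLLPRI"}]
print([pollfd_matches(e) for e in fds])  # [True, False]
```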

Special Built-in Matching Rules

The bitflags-based matching rules described above are expressive enough to capture most interesting patterns of system call arguments. But some system call arguments cannot be characterized with bitflags. To address such cases, SCML introduces two special built-in matching rules: <PATH> and <INTEGER>.

The file path matching rule

The <PATH> matching rule is used to denote a system call argument of a C-string file path. For example, the matching rules for the open system call can be enhanced with <PATH> as follows:

access_mode = O_RDONLY | O_WRONLY | O_RDWR;

open(path = <PATH>, flags = <access_mode> | O_CLOEXEC);
open(path = <PATH>, flags = O_CREAT | <access_mode> | O_CLOEXEC, mode);
open(path = <PATH>, flags = O_PATH | O_CLOEXEC);

File paths provide a new dimension to determine whether a system call is supported or not. Linux has multiple pseudo file systems such as DevTmpFS, ProcFS, SysFS, CgroupFS, and ConfigFS, mounted at well-known locations. A Linux-compatible OS such as Asterinas may only support a sub-tree of a pseudo FS. Knowing which system call arguments refer to file paths, a tool may be built to automatically issue warnings when unsupported file paths are accessed by system calls.

The integer matching rule

The <INTEGER> matching rule can match any integer system call argument, such as 1234, -100, 0xdeadbeef, and 0o666. It can be used as a fallback rule when a system call takes an argument that may be either bitflags or a plain integer.

timer_create(
    clockid =
        // Static clock IDs represented as bitflags
        CLOCK_PROCESS_CPUTIME_ID | CLOCK_THREAD_CPUTIME_ID | CLOCK_REALTIME | CLOCK_MONOTONIC | CLOCK_BOOTTIME |
        // Dynamic clock IDs (per-process or per-thread clock IDs)
        // represented as an integer value.
        <INTEGER>,
    sevp,
    timerid
);
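One plausible way to recognize <INTEGER> arguments is a small regular expression over the literal forms listed above; the exact accepted grammar here is an assumption of this sketch, not a specification.

```python
import re

# Decimal (possibly negative), hexadecimal, and octal literal forms,
# matching the examples given for the <INTEGER> rule.
INTEGER = re.compile(r"^-?(0[xX][0-9a-fA-F]+|0[oO][0-7]+|[0-9]+)$")

for text in ["1234", "-100", "0xdeadbeef", "0o666"]:
    print(bool(INTEGER.match(text)))  # True for each
```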

Advanced Usage

Just as you can write multiple rules for the same system call, you may define multiple rules for the same struct:

// Rules for control message header
struct cmsghdr = {
    cmsg_level = SOL_SOCKET,
    cmsg_type  = SO_TIMESTAMP_OLD | SCM_RIGHTS | SCM_CREDENTIALS,
    ..
};
struct cmsghdr = {
    cmsg_level = SOL_IP,
    cmsg_type  = IP_TTL,
    ..
};

A cmsghdr value matches if it satisfies any one rule.

Struct rules may also be nested:

// Rule for message header, which refers to the rules for control message header
struct msghdr = {
    msg_control = [ <cmsghdr> ],
    ..
};

recvmsg(socket, message = <msghdr>, flags);

SCML supports arrays with nested structures and heterogeneous element types, as encountered in system calls like recvmsg where netlink message payloads follow the TLV (Type-Length-Value) format. Arrays can contain multiple elements of varying types: inline struct patterns ({ ... }), nested arrays ([ ... ]), or references to named rules (<identifier>). This flexibility allows SCML to represent hierarchical data structures as they appear in strace output.

For example, when receiving a netlink message about adding a network address:

struct iovec = {
    iov_base = [
        [
            {
                nlmsg_type = RTM_NEWADDR,
                ..
            },
            [
                [ { nla_type = IFA_CACHEINFO, .. } ]
            ]
        ]
    ],
    ..
};

recvmsg(
    sockfd,
    msg = {
        msg_iov = [ <iovec> ],
        ..
    },
    flags
);

This example demonstrates receiving a netlink message of type RTM_NEWADDR containing nested attributes with cache information (IFA_CACHEINFO). The nested array structure illustrates how SCML handles heterogeneous arrays where elements can be both structs and nested arrays, reflecting the hierarchical TLV encoding typical of netlink.

Formal Syntax

Below is the formal syntax of SCML, expressed in Extended Backus–Naur Form (EBNF). Non‑terminals are in angle brackets, terminals in quotes.

<scml>           ::= { <rule> }
<rule>           ::= <syscall-rule> ';' 
                   | <struct-rule> ';'
                   | <bitflags-rule> ';'

<syscall-rule>   ::= <identifier> '(' [ <param-list> ] ')'
<param-list>     ::= '..'
                   | <param> { ',' <param> } [ ',' '..' ]
<param>          ::= <identifier> '=' <flag-pattern>
                   | <identifier> '=' <struct-pattern>
                   | <identifier> '=' <array-pattern>
                   | <identifier>

<flag-pattern>   ::= <flag-part> { '|' <flag-part> }
<flag-part>      ::= <identifier>
                   | '<' <identifier> '>'

<array-pattern>  ::= '[' <array-element> { ',' <array-element> } ']'
<array-element>  ::= '<' <identifier> '>'
                   | <struct-pattern>
                   | <array-pattern>

<struct-rule>    ::= 'struct' <identifier> '=' <struct-pattern>
<struct-pattern> ::= '{' <field-list> [ ',' '..' ] '}'
<field-list>     ::= <field> { ',' <field> }
<field>          ::= <identifier>
                   | <identifier> '=' <flag-pattern>
                   | <identifier> '=' <struct-pattern>
                   | <identifier> '=' <array-pattern>

<bitflags-rule>  ::= <identifier> '=' <flag-pattern>

<identifier>     ::= letter { letter | digit | '_' }

comment          ::= '//' { any-char }
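The grammar above is small enough to drive a toy parser. The sketch below is an illustrative assumption, not the official SCML tooling: it handles only bitflags rules and syscall rules with flag patterns, omitting struct and array patterns.

```python
import re

# Token alternatives: comments, the `..` wildcard, identifiers, punctuation.
TOKEN = re.compile(r"\s*(//[^\n]*|\.\.|[A-Za-z_][A-Za-z0-9_]*|[(),;=|<>])")

def tokenize(src):
    toks, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            if src[pos:].strip():
                raise SyntaxError(f"unexpected input at offset {pos}")
            break
        pos = m.end()
        if not m.group(1).startswith("//"):  # comments are skipped
            toks.append(m.group(1))
    return toks

def parse(src):
    toks = tokenize(src)
    i = 0

    def eat(expected=None):
        nonlocal i
        tok = toks[i]
        i += 1
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        return tok

    def flag_part():
        if toks[i] == "<":                   # '<' identifier '>'
            eat("<"); name = eat(); eat(">")
            return ("ref", name)
        return ("flag", eat())

    def flag_pattern():
        parts = [flag_part()]
        while i < len(toks) and toks[i] == "|":
            eat("|")
            parts.append(flag_part())
        return parts

    rules = []
    while i < len(toks):
        name = eat()
        if toks[i] == "=":                   # bitflags-rule
            eat("=")
            rules.append(("bitflags", name, flag_pattern()))
        else:                                # syscall-rule
            eat("(")
            params = []
            while toks[i] != ")":
                if toks[i] == "..":
                    eat("..")
                    params.append(("rest",))
                else:
                    pname = eat()
                    pat = None
                    if toks[i] == "=":
                        eat("=")
                        pat = flag_pattern()
                    params.append((pname, pat))
                if toks[i] == ",":
                    eat(",")
            eat(")")
            rules.append(("syscall", name, params))
        eat(";")
    return rules

rules = parse("""
// A bitflags rule and a syscall rule that refers to it.
access_mode = O_RDONLY | O_WRONLY | O_RDWR;
open(path, flags = <access_mode> | O_CLOEXEC, ..);
""")
print([r[:2] for r in rules])  # [('bitflags', 'access_mode'), ('syscall', 'open')]
```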

Process & Thread Management

sched_getattr and sched_setattr

Supported functionality in SCML:

// Get the scheduling policy of a "normal" thread
sched_getattr(
    pid,
    attr = {
        sched_policy = SCHED_OTHER | SCHED_BATCH | SCHED_IDLE,
        sched_flags = 0,
        ..
    },
    flags = 0
);
// Set the scheduling policy of a "normal" thread
sched_setattr(
    pid,
    attr = {
        sched_policy = SCHED_OTHER | SCHED_BATCH | SCHED_IDLE,
        sched_flags = 0,
        ..
    },
    flags = 0
);

// Get the scheduling policy of a real-time thread
sched_getattr(
    pid,
    attr = {
        sched_policy = SCHED_FIFO | SCHED_RR,
        sched_flags = 0,
        ..
    },
    flags = 0
);
// Set the scheduling policy of a real-time thread
sched_setattr(
    pid,
    attr = {
        sched_policy = SCHED_FIFO | SCHED_RR,
        sched_flags = 0,
        ..
    },
    flags = 0
);

Unsupported scheduling policies:

  • SCHED_DEADLINE

Unsupported scheduling flags:

  • SCHED_FLAG_RESET_ON_FORK
  • SCHED_FLAG_RECLAIM
  • SCHED_FLAG_DL_OVERRUN
  • SCHED_FLAG_UTIL_CLAMP_MIN
  • SCHED_FLAG_UTIL_CLAMP_MAX

wait4

Supported functionality in SCML:

// Wait until a specified child process undergoes a state change (termination, stopping and resumption)
wait4(
    pid, wstatus,
    options = WNOHANG | WSTOPPED | WCONTINUED | WNOWAIT,
    rusage
);

Ignored options:

  • WEXITED
  • WNOTHREAD
  • WALL
  • WCLONE

For more information, see the man page.
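The supported wait pattern can be exercised from Python, whose os.waitpid wraps this family of calls; WNOHANG polls for a state change without blocking (this is a sketch for a Unix host):

```python
import os

pid = os.fork()
if pid == 0:
    os._exit(7)          # the child terminates immediately with status 7

while True:
    wpid, status = os.waitpid(pid, os.WNOHANG)
    if wpid == pid:      # a returned pid of 0 means "no state change yet"
        break

print(os.waitstatus_to_exitcode(status))  # 7
```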

clone and clone3

Supported functionality in SCML:

signal_flags = SIGHUP | SIGINT | SIGQUIT | SIGILL |
               SIGTRAP | SIGABRT | SIGSTKFLT | SIGFPE |
               SIGKILL | SIGBUS | SIGSEGV | SIGXCPU |
               SIGPIPE | SIGALRM | SIGTERM | SIGUSR1 |
               SIGUSR2 | SIGCHLD | SIGPWR | SIGVTALRM |
               SIGPROF | SIGIO | SIGWINCH | SIGSTOP |
               SIGTSTP | SIGCONT | SIGTTIN | SIGTTOU |
               SIGURG | SIGXFSZ | SIGSYS | SIGRTMIN;

opt_flags =
    // Optional flags
    //
    // Share the parent's virtual memory
    CLONE_VM |
    // Share the parent's filesystem
    CLONE_FS |
    // Share the parent's file descriptor table
    CLONE_FILES |
    // Share the parent's signal handlers
    CLONE_SIGHAND |
    // Place child in the same thread group as parent
    CLONE_THREAD |
    // Share the parent's System V semaphore adjustments
    CLONE_SYSVSEM |
    // Suspend parent until the child exits or calls `execve`
    CLONE_VFORK |
    // Create a new mount namespace for the child
    CLONE_NEWNS |
    // Write child `TID` to parent's memory
    CLONE_PARENT_SETTID |
    // Allocate a `PID` file descriptor for the child
    CLONE_PIDFD |
    // Set thread-local storage for the child
    CLONE_SETTLS |
    // Write child `TID` to child's memory
    CLONE_CHILD_SETTID |
    // Clear child `TID` in child's memory on exit
    CLONE_CHILD_CLEARTID |
    // Make the child's parent the same as the caller's parent
    CLONE_PARENT;

// Create a thread or process
clone(
    fn, stack,
    flags = <opt_flags> | <signal_flags>,
    func_arg, ..
);

// Create a thread or process with enhanced control by providing structured arguments
clone3(
    clone_args = {
        flags = <opt_flags>,
        ..
    },
    size
);

For more information, see the man page.

sched_setscheduler

Supported functionality in SCML:

// Set scheduling policy
sched_setscheduler(pid, policy = SCHED_OTHER | SCHED_BATCH | SCHED_IDLE | SCHED_FIFO | SCHED_RR, param);

Unsupported policies or flags:

  • SCHED_RESET_ON_FORK

For more information, see the man page.
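From Python, the policy of the calling process can be queried through os.sched_getscheduler (pid 0 means the caller); on most systems the default policy is SCHED_OTHER, the "normal" policy named in the rule above.

```python
import os

# Query the scheduling policy of the calling process.
policy = os.sched_getscheduler(0)
print(policy == os.SCHED_OTHER)
```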

waitid

Supported functionality in SCML:

// Wait for a child process to change state
waitid(
    which = P_PID | P_PIDFD | P_PGID | P_ALL,
    pid, infop,
    options = WNOHANG | WSTOPPED | WCONTINUED | WNOWAIT,
    ru
);

Ignored options:

  • WEXITED

For more information, see the man page.

Memory Management

Memory Mappings

mmap and munmap

Supported functionality in SCML:

prot = PROT_NONE |
    PROT_EXEC |
    PROT_READ |
    PROT_WRITE;
opt_flags =
    MAP_ANONYMOUS |
    MAP_FIXED |
    MAP_FIXED_NOREPLACE |
    MAP_GROWSDOWN |
    MAP_HUGETLB |
    MAP_LOCKED |
    MAP_NONBLOCK |
    MAP_NORESERVE |
    MAP_POPULATE;

// Create a private memory mapping
mmap(
    addr, length,
    prot = <prot>,
    flags = MAP_PRIVATE | <opt_flags>,
    fd, offset
);

// Create a shared memory mapping
mmap(
    addr, length,
    prot = <prot>,
    flags = MAP_SHARED | MAP_SHARED_VALIDATE | <opt_flags>,
    fd, offset
);

// Unmap a memory mapping
munmap(addr, length);

Silently-ignored flags:

  • MAP_HUGETLB
  • MAP_GROWSDOWN
  • MAP_LOCKED
  • MAP_NONBLOCK
  • MAP_NORESERVE
  • MAP_POPULATE

Partially supported flags:

  • MAP_FIXED_NOREPLACE is treated as MAP_FIXED

Unsupported flags:

  • MAP_32BIT
  • MAP_HUGE_1GB
  • MAP_HUGE_2MB
  • MAP_UNINITIALIZED

For more information, see the man page.
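The first rule above (a private anonymous mapping) can be exercised through Python's mmap module; this sketch assumes a Linux host, where MAP_ANONYMOUS is exposed.

```python
import mmap

# A private anonymous read/write mapping, i.e. the
# MAP_PRIVATE | MAP_ANONYMOUS pattern; close() unmaps it.
m = mmap.mmap(-1, 8192,
              flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
              prot=mmap.PROT_READ | mmap.PROT_WRITE)
m[:13] = b"Hello, World!"
data = bytes(m[:13])
print(data)  # b'Hello, World!'
m.close()
```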

msync

Supported functionality in SCML:

// Flush memory region to disk asynchronously
msync(
    addr, length,
    flags = MS_ASYNC | MS_INVALIDATE
);

// Flush memory region to disk synchronously
msync(
    addr, length,
    flags = MS_SYNC | MS_INVALIDATE
);

Silently-ignored flags:

  • MS_INVALIDATE is ignored because all processes use the same page cache

For more information, see the man page.

mremap

Supported functionality in SCML:

// Resize an existing memory mapping. Relocation is allowed if given `MREMAP_MAYMOVE`.
mremap(
    old_address,
    old_size,
    new_size,
    flags = MREMAP_MAYMOVE
);

// Resize an existing memory mapping and force relocation to a specified location.
mremap(
    old_address,
    old_size,
    new_size,
    flags = MREMAP_MAYMOVE | MREMAP_FIXED,
    new_address
);

For more information, see the man page.

mprotect

Supported functionality in SCML:

prot = PROT_NONE |
    PROT_EXEC |
    PROT_READ |
    PROT_WRITE;

// Set memory access permissions
mprotect(
    addr,
    len,
    prot = <prot>
);

Silently-ignored protection flags:

  • PROT_SEM
  • PROT_SAO
  • PROT_GROWSUP
  • PROT_GROWSDOWN

For more information, see the man page.

madvise

Supported functionality in SCML:

// Do not expect access in the near future and free associated resources
madvise(addr, length, advice = MADV_DONTNEED);

Silently-ignored advice:

  • MADV_NORMAL
  • MADV_RANDOM
  • MADV_SEQUENTIAL
  • MADV_WILLNEED
  • MADV_FREE
  • MADV_MERGEABLE
  • MADV_UNMERGEABLE
  • MADV_HUGEPAGE
  • MADV_NOHUGEPAGE

Unsupported advice:

  • MADV_REMOVE
  • MADV_DONTFORK
  • MADV_DOFORK
  • MADV_HWPOISON
  • MADV_SOFT_OFFLINE
  • MADV_DONTDUMP
  • MADV_DODUMP
  • MADV_WIPEONFORK
  • MADV_KEEPONFORK

For more information, see the man page.

File and Directory Operations

open and openat

Supported functionality of open in SCML:

access_mode =
    O_RDONLY |
    O_WRONLY |
    O_RDWR;
creation_flags =
    O_CLOEXEC |
    O_DIRECTORY |
    O_EXCL |
    O_NOCTTY |
    O_NOFOLLOW |
    O_TRUNC;
status_flags =
    O_APPEND |
    O_ASYNC |
    O_DIRECT |
    O_LARGEFILE |
    O_NOATIME |
    O_NONBLOCK |
    O_SYNC;

// Open an existing file
open(
    path,
    flags = <access_mode> | <creation_flags> | <status_flags>
);
openat(
    dirfd,
    path,
    flags = <access_mode> | <creation_flags> | <status_flags>
);

// Create a new file
open(
    path,
    flags = O_CREAT | <access_mode> | <creation_flags> | <status_flags>,
    mode
);
openat(
    dirfd,
    path,
    flags = O_CREAT | <access_mode> | <creation_flags> | <status_flags>,
    mode
);

// Status flags that are meaningful with O_PATH
opath_valid_flags = O_CLOEXEC | O_DIRECTORY | O_NOFOLLOW;
// All other flags are ignored with O_PATH
opath_ignored_flags = O_CREAT | <creation_flags> | <status_flags>;
// Obtain a file descriptor to indicate a location in FS
open(
    path,
    flags = O_PATH | <opath_valid_flags> | <opath_ignored_flags>
);
openat(
    dirfd,
    path,
    flags = O_PATH | <opath_valid_flags> | <opath_ignored_flags>
);

// Create an unnamed file
// open(path, flags = O_TMPFILE | <creation_flags> | <status_flags>)

Silently-ignored flags:

  • O_NOCTTY
  • O_DSYNC
  • O_SYNC
  • O_LARGEFILE
  • O_NOATIME

Partially-supported flags:

  • O_PATH

Unsupported flags:

  • O_TMPFILE

As the rules above show, the supported and unsupported functionality of openat is the same as that of open.

For more information, see the man page.

renameat2

Supported functionality in SCML:

// Rename a file, moving it between directories if required.
renameat2(olddirfd, oldpath, newdirfd, newpath, flags = 0);

Unsupported flags:

  • RENAME_EXCHANGE
  • RENAME_NOREPLACE
  • RENAME_WHITEOUT

For more information, see the man page.

lseek

Supported functionality in SCML:

// Set file offset
lseek(
    fd, offset,
    whence = SEEK_SET | SEEK_CUR | SEEK_END
);

Unsupported flags:

  • SEEK_DATA
  • SEEK_HOLE

For more information, see the man page.

newfstatat

Supported functionality in SCML:

// Retrieve file status relative to a directory file descriptor
newfstatat(
    dirfd, path, statbuf,
    flags = AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW
);

Silently-ignored flags:

  • AT_NO_AUTOMOUNT

For more information, see the man page.

preadv2 and pwritev2

Supported functionality in SCML:

// Read data into multiple buffers
preadv2(fd, iov, iovcnt, offset, flags = 0);

// Write data from multiple buffers
pwritev2(fd, iov, iovcnt, offset, flags = 0);

Silently-ignored flags:

  • RWF_DSYNC
  • RWF_HIPRI
  • RWF_SYNC
  • RWF_NOWAIT

Unsupported flags:

  • RWF_APPEND
  • RWF_NOAPPEND
  • RWF_ATOMIC

For more information, see the man page.
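On a Linux host, os.pwritev and os.preadv expose this interface from Python; with the flags argument omitted, the calls correspond to the flags = 0 rules above.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.pwritev(fd, [b"Hello, ", b"World!"], 0)   # gather-write two buffers at offset 0
    bufs = [bytearray(7), bytearray(6)]
    n = os.preadv(fd, bufs, 0)                   # scatter-read back into two buffers
    print(n, bytes(bufs[0]) + bytes(bufs[1]))    # 13 b'Hello, World!'
finally:
    os.close(fd)
    os.unlink(path)
```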

faccessat2

Supported functionality in SCML:

// Check user's permissions for a file
faccessat2(
    dirfd, path, mode,
    flags = AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW
);

Silently-ignored flags:

  • AT_EACCESS

For more information, see the man page.

statx

Supported functionality in SCML:

statx_flags = AT_EMPTY_PATH | AT_NO_AUTOMOUNT | AT_STATX_FORCE_SYNC |
              AT_STATX_DONT_SYNC | AT_STATX_SYNC_AS_STAT | AT_SYMLINK_NOFOLLOW;
statx_mask = STATX_TYPE | STATX_MODE | STATX_NLINK | STATX_UID | STATX_GID |
             STATX_ATIME | STATX_MTIME | STATX_CTIME | STATX_INO | STATX_SIZE |
             STATX_BLOCKS | STATX_BASIC_STATS | STATX_BTIME | STATX_ALL |
             STATX_MNT_ID | STATX_DIOALIGN | STATX_MNT_ID_UNIQUE | STATX_SUBVOL |
             STATX_WRITE_ATOMIC | STATX_DIO_READ_ALIGN;

// Get file status (extended)
statx(
    dirfd, pathname,
    flags = <statx_flags>,
    mask = <statx_mask>,
    statxbuf
);

Silently-ignored flags:

  • AT_NO_AUTOMOUNT
  • AT_STATX_FORCE_SYNC
  • AT_STATX_DONT_SYNC

Silently-ignored masks:

  • STATX_DIOALIGN
  • STATX_MNT_ID_UNIQUE
  • STATX_SUBVOL
  • STATX_WRITE_ATOMIC
  • STATX_DIO_READ_ALIGN

For more information, see the man page.

fallocate

Supported functionality in SCML:

// Allocate disk space within the range specified
fallocate(fd, mode = FALLOC_FL_KEEP_SIZE, offset, size);

// Deallocate space (create a hole) while keeping the file size unchanged
fallocate(fd, mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, size);

Unsupported modes:

  • FALLOC_FL_UNSHARE_RANGE
  • FALLOC_FL_COLLAPSE_RANGE
  • FALLOC_FL_ZERO_RANGE
  • FALLOC_FL_INSERT_RANGE

For more information, see the man page.

utimensat

Supported functionality in SCML:

// Change file timestamps with nanosecond precision
utimensat(dirfd, path, times, flags = AT_SYMLINK_NOFOLLOW);

Unsupported flags:

  • AT_EMPTY_PATH

For more information, see the man page.

File Systems & Mount Control

Mount and Unmount File Systems

mount

Supported functionality in SCML:

// Create a new mount
mount(
    source, target, filesystemtype,
    mountflags = 0,
    data
);

// Move the existing mount point
mount(
    source, target, filesystemtype,
    mountflags = MS_MOVE,
    data
);

// Create a bind mount
mount(
    source, target, filesystemtype,
    mountflags = MS_BIND | MS_REC,
    data
);

Partially supported mount flags:

  • MS_REC is only effective when used in conjunction with MS_BIND
  • MS_REMOUNT can be used, but the set options have no actual effect.
  • MS_DIRSYNC, MS_LAZYTIME, MS_MANDLOCK, MS_NOATIME, MS_NODEV, MS_NODIRATIME, MS_NOEXEC, MS_NOSUID, MS_RDONLY, MS_RELATIME, MS_SILENT, MS_STRICTATIME, and MS_SYNCHRONOUS can be set but have no actual effect.

Unsupported mount flags:

  • MS_SHARED
  • MS_SLAVE
  • MS_UNBINDABLE

For more information, see the man page.

umount and umount2

Supported functionality in SCML:

// Unmount a mounted file system
umount(target);

// Unmount a mounted file system with enhanced behavior control
umount2(target, flags = UMOUNT_NOFOLLOW);

Silently-ignored flags:

  • MNT_FORCE
  • MNT_DETACH
  • MNT_EXPIRE

For more information, see the man page.

Event Notifications

inotify_init and inotify_init1

Supported functionality in SCML:

// Create an inotify instance
inotify_init();

// Create an inotify instance with enhanced behavior control
inotify_init1(flags = IN_CLOEXEC | IN_NONBLOCK);

For more information, see the man page.

inotify_add_watch

Supported functionality in SCML:

inotify_events = IN_ACCESS | IN_MODIFY | IN_ATTRIB | IN_CLOSE_WRITE |
                 IN_CLOSE_NOWRITE | IN_OPEN | IN_CREATE | IN_DELETE |
                 IN_DELETE_SELF | IN_CLOSE;

inotify_controls = IN_ONLYDIR | IN_DONT_FOLLOW | IN_MASK_CREATE |
                   IN_MASK_ADD | IN_ONESHOT;

// Add a watch to an initialized inotify instance
inotify_add_watch(fd, pathname, mask = <inotify_events> | <inotify_controls>);

Unsupported event flags:

  • IN_MOVED_FROM and IN_MOVED_TO - Move events are not generated
  • IN_MOVE_SELF - Self move events are not generated
  • IN_UNMOUNT - Unmount events are not generated
  • IN_Q_OVERFLOW - Queue overflow events are not generated (events are silently dropped when queue is full)
  • IN_ALL_EVENTS - Only includes actually supported events

Unsupported control flags:

  • IN_EXCL_UNLINK - Events on unlinked files are not excluded

For more information, see the man page.

File Descriptor and I/O Control

fcntl

Supported functionality in SCML:

ignore_flags = O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC;
can_change_flags = O_APPEND | O_ASYNC | O_DIRECT | O_NOATIME | O_NONBLOCK;

// Duplicate a file descriptor
fcntl(fd, cmd = F_DUPFD | F_DUPFD_CLOEXEC, arg);

// Retrieve file descriptor flags (F_GETFD), file status flags (F_GETFL)
// or SIGIO/SIGURG owner process (F_GETOWN)
fcntl(fd, cmd = F_GETFD | F_GETFL | F_GETOWN);

// Set file descriptor flags
fcntl(fd, cmd = F_SETFD, arg = FD_CLOEXEC);

// Set file status flags
fcntl(fd, cmd = F_SETFL, arg = <ignore_flags> | <can_change_flags>);

// Manage record locks: test (F_GETLK), non-blocking set (F_SETLK), blocking set (F_SETLKW)
fcntl(fd, cmd = F_GETLK | F_SETLK | F_SETLKW, arg);

// Assign SIGIO/SIGURG owner process
fcntl(fd, cmd = F_SETOWN, arg);

// Add seals to the inode referred to
fcntl(fd, cmd = F_ADD_SEALS, arg = F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_FUTURE_WRITE);

// Get seals from the inode referred to
fcntl(fd, cmd = F_GET_SEALS);

Unsupported commands:

  • F_NOTIFY
  • F_OFD_SETLK, F_OFD_SETLKW and F_OFD_GETLK
  • F_GETOWN_EX and F_SETOWN_EX
  • F_GETSIG and F_SETSIG
  • F_SETLEASE and F_GETLEASE
  • F_SETPIPE_SZ and F_GETPIPE_SZ
  • F_GET_RW_HINT and F_SET_RW_HINT
  • F_GET_FILE_RW_HINT and F_SET_FILE_RW_HINT

For more information, see the man page.

pipe and pipe2

Supported functionality in SCML:

// Create pipe
pipe(pipefd);

// Create pipe with enhanced behavior control
pipe2(pipefd, flags = O_CLOEXEC);

Silently-ignored flags:

  • O_DIRECT
  • O_NONBLOCK

For more information, see the man page.

eventfd and eventfd2

Supported functionality in SCML:

// Create event notification descriptor
eventfd(initval);

// Create event notification descriptor with enhanced behavior control
eventfd2(initval, flags = EFD_CLOEXEC);

Silently-ignored flags:

  • EFD_NONBLOCK
  • EFD_SEMAPHORE

For more information, see the man page.

memfd_create

Supported functionality in SCML:

// Create an anonymous file and return a file descriptor that refers to it
memfd_create(name, flags = MFD_CLOEXEC | MFD_ALLOW_SEALING);

Silently-ignored flags:

  • MFD_HUGETLB

Unsupported flags:

  • MFD_HUGE_64KB
  • MFD_HUGE_512KB
  • MFD_HUGE_1MB
  • MFD_HUGE_2MB
  • MFD_HUGE_8MB
  • MFD_HUGE_16MB
  • MFD_HUGE_32MB
  • MFD_HUGE_256MB
  • MFD_HUGE_512MB
  • MFD_HUGE_1GB
  • MFD_HUGE_2GB
  • MFD_HUGE_16GB

For more information, see the man page.

fadvise64

Supported functionality in SCML:

// Announce an intention to access file data in a specific pattern in the future
fadvise64(fd, offset, len, advice = 0);

Silently-ignored flags:

  • POSIX_FADV_NORMAL
  • POSIX_FADV_RANDOM
  • POSIX_FADV_SEQUENTIAL
  • POSIX_FADV_WILLNEED
  • POSIX_FADV_DONTNEED
  • POSIX_FADV_NOREUSE

For more information, see the man page.

epoll_ctl

Supported functionality in SCML:

struct epoll_event = {
    events = EPOLLIN | EPOLLOUT | EPOLLRDHUP | EPOLLPRI | EPOLLERR | EPOLLHUP |
             EPOLLET | EPOLLONESHOT,
    ..
};

// Add, modify, or delete entries in the interest list of an epoll file descriptor
epoll_ctl(
    epfd,
    op = EPOLL_CTL_ADD | EPOLL_CTL_MOD | EPOLL_CTL_DEL,
    fd, event = <epoll_event>
);

Unsupported flags in events:

  • EPOLLEXCLUSIVE
  • EPOLLWAKEUP

For more information, see the man page.

poll and ppoll

Supported functionality in SCML:

struct pollfd = {
    events = POLLIN | POLLPRI | POLLOUT | POLLRDHUP | POLLERR |
             POLLHUP | POLLNVAL | POLLRDNORM,
    ..
};

// Wait for I/O event on a set of file descriptors
poll(fds = [ <pollfd> ], nfds, timeout);

// Wait for I/O event on a set of file descriptors with specified signals temporarily blocked
ppoll(fds = [ <pollfd> ], nfds, timeout, sigset, sigset_size);

Unsupported events:

  • POLLRDBAND
  • POLLWRNORM
  • POLLWRBAND

For more information, see the man page.
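The rule above can be exercised through Python's select.poll, waiting for POLLIN on the read end of a pipe that already has data pending:

```python
import os
import select

r, w = os.pipe()
p = select.poll()
p.register(r, select.POLLIN)     # interest set: readable
os.write(w, b"x")
events = p.poll(1000)            # list of (fd, revents) pairs, 1 s timeout
print(bool(events and events[0][1] & select.POLLIN))  # True
os.close(r); os.close(w)
```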

ioctl

Supported functionality in SCML:

tty_ops = TCGETS | TCSETS | TCSETSW | TCSETSF | TIOCGWINSZ | TIOCSWINSZ |
          TIOCGPTN | FIONREAD | KDFONTOP | KDSETMODE | KDGETMODE |
          KDSKBMODE | KDGKBMODE;
term_ops = TIOCGPGRP | TIOCSPGRP | TIOCSCTTY | TIOCNOTTY | TIOCGSID;
pty_master_ops = TIOCSPTLCK | TIOCGPTLCK | TIOCGPTPEER | TIOCPKT | TIOCGPKT;

// Control file descriptor flags and I/O modes
ioctl(fd, op = FIONCLEX | FIOCLEX | FIONBIO | FIOASYNC, ..);

// Control terminal devices
ioctl(fd, op = <pty_master_ops> | <tty_ops> | <term_ops>, ..);

// Control framebuffer display
ioctl(
    fd,
    op = FBIOGET_VSCREENINFO | FBIOPUT_VSCREENINFO | FBIOGET_FSCREENINFO |
         FBIOGETCMAP | FBIOPUTCMAP | FBIOPAN_DISPLAY | FBIOBLANK,
    ..
);

// Control Event devices (evdev)
ioctl(
    fd,
    op = EVIOCGVERSION | EVIOCGID | EVIOCGNAME | EVIOCGPHYS | EVIOCGUNIQ |
         EVIOCGKEY | EVIOCGLED | EVIOCGSW | EVIOCSCLOCKID | EVIOCGBIT,
    ..
);

// Control block devices
ioctl(fd, op = BLKGETSIZE64, ..);

// Control Trust Domain Extensions (TDX) guest devices
ioctl(fd, op = TDX_CMD_GET_REPORT0, ..);

For more information, see the man page.
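As a small illustration, FIONREAD (listed in tty_ops above) can be issued from Python via fcntl.ioctl; here it reports the number of bytes pending on a pipe:

```python
import fcntl
import os
import struct
import termios

r, w = os.pipe()
os.write(w, b"abc")
# Pass a packed int as the argument buffer; ioctl returns the result bytes.
raw = fcntl.ioctl(r, termios.FIONREAD, struct.pack("i", 0))
print(struct.unpack("i", raw)[0])  # 3
os.close(r); os.close(w)
```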

ioprio_set and ioprio_get

Supported functionality in SCML:

// Set the I/O priority for a single thread
ioprio_set(which = IOPRIO_WHO_PROCESS, who, ioprio);

// Get the I/O priority for a single thread
ioprio_get(which = IOPRIO_WHO_PROCESS, who);

Unsupported selectors:

  • IOPRIO_WHO_PGRP
  • IOPRIO_WHO_USER

For more information, see the man page.

Inter-Process Communication

futex

Supported functionality in SCML:

opt_flags = FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME;

// Block current thread if target value at `uaddr` matches `val`, and wait up to `timeout`.
futex(
    uaddr,
    futex_op = FUTEX_WAIT | <opt_flags>,
    val, timeout
);

// Block current thread with bitmask condition if target value at `uaddr` matches `val`,
// and wait up to `timeout`.
futex(
    uaddr,
    futex_op = FUTEX_WAIT_BITSET | <opt_flags>,
    val, timeout, unused = NULL, bitmask
);

// Unblock up to `max_waiters` threads waiting on `uaddr`
futex(
    uaddr,
    futex_op = FUTEX_WAKE | <opt_flags>,
    max_waiters
);

// Unblock up to `max_waiters` threads on `uaddr`, if the value on `uaddr` matches `bitmask`
futex(
    uaddr,
    futex_op = FUTEX_WAKE_BITSET | <opt_flags>,
    max_waiters, unused0 = NULL, unused1 = NULL, bitmask
);

// Unblock up to `max_waiters` threads waiting on `uaddr`, and requeue up to
// `max_requeue_waiters` of the remaining waiters to the target futex at `uaddr2`.
futex(
    uaddr,
    futex_op = FUTEX_REQUEUE | <opt_flags>,
    max_waiters, max_requeue_waiters, uaddr2
);

// Perform atomic operation encoded in `operation` on `uaddr2`. Unblock up to `max_waiters`
// threads waiting on `uaddr`, and conditionally unblock up to `max_waiters2` threads
// waiting on `uaddr2` based on the result of the atomic operation.
futex(
    uaddr,
    futex_op = FUTEX_WAKE_OP | <opt_flags>,
    max_waiters, max_waiters2, uaddr2, operation
);

Unsupported operations:

  • FUTEX_FD
  • FUTEX_CMP_REQUEUE
  • FUTEX_LOCK_PI
  • FUTEX_UNLOCK_PI
  • FUTEX_TRYLOCK_PI

For more information, see the man page.

System V semaphore

semget

Supported functionality in SCML:

// Create or open a semaphore set
semget(
    key,
    nsems,
    semflg = IPC_CREAT | IPC_EXCL
);

For more information, see the man page.

semop and semtimedop

Supported functionality in SCML:

struct sembuf = {
    sem_flg = IPC_NOWAIT,
    ..
};

// Semaphore operations without blocking
semop(
    semid,
    sops = [ <sembuf> ],
    nsops
);

Unsupported semaphore flags:

  • SEM_UNDO

The supported and unsupported functionality of semtimedop is the same as that of semop. The SCML rules are omitted for brevity.

For more information, see the man page.

semctl

Supported functionality in SCML:

// Remove the semaphore set
semctl(
    semid,
    semnum,
    cmd = IPC_RMID
);

// Initialize the value of the semnum-th semaphore
semctl(
    semid,
    semnum,
    cmd = SETVAL,
    arg
);

// Return the current value (GETVAL), last operating process's PID (GETPID),
// count of processes awaiting increment (GETNCNT) or count of processes awaiting
// zero (GETZCNT) of the semnum-th semaphore
semctl(
    semid,
    semnum,
    cmd = GETVAL | GETPID | GETNCNT | GETZCNT
);

// Retrieve a copy of the `semid_ds` kernel structure for the specified semaphore set
semctl(
    semid,
    semnum,
    cmd = IPC_STAT,
    arg
);

Unsupported commands:

  • IPC_INFO
  • SEM_INFO
  • SEM_STAT
  • SEM_STAT_ANY
  • GETALL
  • SETALL

For more information, see the man page.
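A minimal round trip through semget and semctl — create a private one-semaphore set, then remove it with IPC_RMID — can be sketched with ctypes on Linux. The constant values below are the common Linux ones and are an assumption of this sketch; it is a user-space illustration, not Asterinas code.

```python
import ctypes

libc = ctypes.CDLL(None, use_errno=True)

IPC_PRIVATE = 0      # assumption: Linux value
IPC_CREAT = 0o1000   # assumption: Linux value
IPC_RMID = 0         # assumption: Linux value

# Create a new set containing one semaphore, with mode 0600.
semid = libc.semget(IPC_PRIVATE, 1, IPC_CREAT | 0o600)

# glibc's semctl is variadic; IPC_RMID takes no fourth argument.
removed = libc.semctl(semid, 0, IPC_RMID)
```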

Networking & Sockets

Socket Creation

socket

Supported functionality in SCML:

// Optional flags for socket type
opt_type_flags = SOCK_NONBLOCK | SOCK_CLOEXEC;

// Create a UNIX socket
socket(
    family = AF_UNIX,
    type = SOCK_STREAM | SOCK_SEQPACKET | <opt_type_flags>,
    protocol = 0
);

// Create an IPv4 socket (TCP or UDP)
socket(
    family = AF_INET,
    type = SOCK_STREAM | SOCK_DGRAM | <opt_type_flags>,
    protocol = IPPROTO_IP | IPPROTO_TCP | IPPROTO_UDP
);

// Create a netlink socket
socket(
    family = AF_NETLINK,
    type = SOCK_RAW | SOCK_DGRAM | <opt_type_flags>,
    protocol = NETLINK_ROUTE | NETLINK_KOBJECT_UEVENT
);

// Create a VSOCK socket
socket(
    family = AF_VSOCK,
    type = SOCK_STREAM | <opt_type_flags>,
    protocol = 0
);

For more information, see the man page.
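The supported family/type/protocol combinations map directly onto ordinary socket calls. For instance, with Python's socket module (a user-space sketch run on any Linux-ABI system, not Asterinas code):

```python
import socket

# AF_INET stream socket with an explicit TCP protocol and close-on-exec.
tcp = socket.socket(socket.AF_INET,
                    socket.SOCK_STREAM | socket.SOCK_CLOEXEC,
                    socket.IPPROTO_TCP)

# AF_INET datagram socket (UDP).
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)

# AF_UNIX sequenced-packet socket.
unx = socket.socket(socket.AF_UNIX, socket.SOCK_SEQPACKET)

families = (tcp.family, udp.family, unx.family)
for s in (tcp, udp, unx):
    s.close()
```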

socketpair

Supported functionality in SCML:

// Optional flags for socket type
opt_type_flags = SOCK_NONBLOCK | SOCK_CLOEXEC;

// Create a pair of connected UNIX sockets
socketpair(
    family = AF_UNIX,
    type = SOCK_STREAM | SOCK_SEQPACKET | <opt_type_flags>,
    protocol = 0,
    sv
);

For more information, see the man page.
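Because the two returned descriptors are already connected, data written to one end is immediately readable from the other. A user-space sketch:

```python
import socket

# Create a connected pair of UNIX stream sockets (sv[0] <-> sv[1]).
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

a.sendall(b"ping")
msg = b.recv(4)   # bytes written to `a` come out of `b`

a.close()
b.close()
```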

Socket Setup

bind

Supported functionality in SCML:

struct sockaddr = {
    sa_family = AF_INET | AF_UNIX | AF_NETLINK | AF_VSOCK,
    ..
};

// Bind a socket to an address
bind(
    sockfd, addr = <sockaddr>, addrlen
);

For more information, see the man page.

connect

Supported functionality in SCML:

struct sockaddr = {
    sa_family = AF_INET | AF_UNIX | AF_NETLINK | AF_VSOCK,
    ..
};

// Connect to a peer socket
connect(
    sockfd, addr = <sockaddr>, addrlen
);

For more information, see the man page.

accept and accept4

Supported functionality in SCML:

struct sockaddr = {
    sa_family = AF_INET | AF_UNIX | AF_VSOCK,
    ..
};

// Accept an incoming connection
accept(
    sockfd, addr = <sockaddr>, addrlen
);

// Accept an incoming connection and set flags for the new socket
accept4(
    sockfd, addr = <sockaddr>, addrlen,
    flags = SOCK_NONBLOCK | SOCK_CLOEXEC
);

For more information, see the man page.
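The bind/listen/connect/accept sequence for a UNIX stream socket can be exercised within a single process, since connect() completes as soon as the connection is queued in the listen backlog. A user-space sketch (the socket path is illustrative):

```python
import os
import socket
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.sock")

# Server side: bind to a filesystem address and listen.
srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(path)
srv.listen(1)

# Client side: connect() succeeds once the connection is queued.
cli = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
cli.connect(path)

# accept() dequeues the pending connection and returns a new socket.
conn, _peer = srv.accept()
cli.sendall(b"hi")
reply = conn.recv(2)

for s in (cli, conn, srv):
    s.close()
os.unlink(path)
```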

Socket Communication

sendto, sendmsg and sendmmsg

Supported functionality in SCML:

struct sockaddr = {
    sa_family = AF_INET | AF_UNIX | AF_NETLINK | AF_VSOCK,
    ..
};

struct msg_hdr = {
    msg_name = <sockaddr>,
    msg_control = NULL,
    ..
};

struct mmsg_hdr = {
    hdr = <msg_hdr>,
    msg_len
};

// Send message on a socket
sendto(
    sockfd, buf, len,
    flags = 0,
    dest_addr = <sockaddr>,
    addrlen
);

// Send message using scatter-gather buffers and ancillary data
sendmsg(
    sockfd,
    msg = <msg_hdr>,
    flags = 0
);


// Send multiple messages on a socket
sendmmsg(
    sockfd,
    mmsgs = [ <mmsg_hdr> ],
    mmsg_count,
    flags = 0
);

Unsupported flags:

  • MSG_CONFIRM
  • MSG_DONTROUTE
  • MSG_DONTWAIT
  • MSG_EOR
  • MSG_MORE
  • MSG_NOSIGNAL
  • MSG_OOB
  • MSG_FASTOPEN

For more information, see the man page.

recvfrom and recvmsg

Supported functionality in SCML:

// Receive message from a socket
recvfrom(
    sockfd, buf, size,
    flags = 0,
    src_addr, addrlen
);

// Receive message using scatter-gather buffers and ancillary data
recvmsg(
    sockfd,
    msg,
    flags = 0
);

Partially-supported flags:

  • MSG_PEEK, because it is only supported for netlink sockets

For more information, see the man page.
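The difference between sendto (one buffer plus a destination address) and sendmsg (scatter-gather buffers gathered into a single message) shows up clearly over a loopback UDP socket. A user-space sketch:

```python
import socket

# Receiver: bind a UDP socket to an ephemeral loopback port.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
addr = rx.getsockname()

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# sendto: one buffer, one destination address.
tx.sendto(b"datagram", addr)
first, src = rx.recvfrom(64)

# sendmsg: scatter-gather buffers are concatenated into one datagram.
tx.sendmsg([b"hello, ", b"world"], [], 0, addr)
second, _ = rx.recvfrom(64)

tx.close()
rx.close()
```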

Socket Options

getsockopt and setsockopt

Supported functionality in SCML:

socket_options = SO_SNDBUF | SO_RCVBUF | SO_REUSEADDR | SO_REUSEPORT |
                 SO_PRIORITY | SO_LINGER | SO_PASSCRED | SO_KEEPALIVE |
                 SO_SNDBUFFORCE | SO_RCVBUFFORCE | SO_ERROR |
                 SO_PEERCRED | SO_ACCEPTCONN | SO_PEERGROUPS;

ip_options = IP_TOS | IP_TTL | IP_HDRINCL;

tcp_options = TCP_NODELAY | TCP_MAXSEG | TCP_KEEPIDLE | TCP_SYNCNT |
              TCP_DEFER_ACCEPT | TCP_WINDOW_CLAMP | TCP_CONGESTION |
              TCP_USER_TIMEOUT | TCP_INQ;

// Get options at socket level
getsockopt(
    sockfd, level = SOL_SOCKET,
    optname = <socket_options>,
    optval, optlen
);

// Get options at IP level
getsockopt(
    sockfd, level = SOL_IP,
    optname = <ip_options>,
    optval, optlen
);

// Get options at TCP level
getsockopt(
    sockfd, level = SOL_TCP,
    optname = <tcp_options>,
    optval, optlen
);

// Set options at socket level
setsockopt(
    sockfd, level = SOL_SOCKET,
    optname = <socket_options>,
    optval, optlen
);

// Set options at IP level
setsockopt(
    sockfd, level = SOL_IP,
    optname = <ip_options>,
    optval, optlen
);

// Set options at TCP level
setsockopt(
    sockfd, level = SOL_TCP,
    optname = <tcp_options>,
    optval, optlen
);

// Set options at netlink level
setsockopt(
    sockfd, level = SOL_NETLINK,
    optname = NETLINK_ADD_MEMBERSHIP | NETLINK_DROP_MEMBERSHIP,
    optval, optlen
);

For more information, see the man page.
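Setting an option and reading it back illustrates the symmetry of the two calls, as well as a Linux quirk worth knowing: the kernel doubles the requested SO_RCVBUF value to leave room for bookkeeping, so getsockopt returns at least what was requested. A user-space sketch:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Socket-level option: enable address reuse and read it back.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
reuse = s.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR)

# TCP-level option: disable Nagle's algorithm.
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

# Buffer sizing: the value read back is at least what was requested.
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 65536)
rcvbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

s.close()
```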

Signals & Timers

Signals

rt_sigaction

Supported functionality in SCML:

// Change and/or retrieve a signal action
rt_sigaction(
    signum,
    act = {
        sa_flags = SA_ONSTACK | SA_RESTART | SA_NODEFER | SA_RESTORER | SA_SIGINFO | SA_RESETHAND,
        ..
    },
    oldact, sigsetsize
);

Unsupported sigaction flags:

  • SA_NOCLDSTOP
  • SA_NOCLDWAIT

For more information, see the man page.

rt_sigprocmask

Supported functionality in SCML:

// Change and/or retrieve blocked signals
rt_sigprocmask(
    how = SIG_BLOCK | SIG_UNBLOCK | SIG_SETMASK, set, oldset, sigsetsize
);

For more information, see the man page.
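The interaction of the two syscalls — installing a handler, blocking a signal so it stays pending, then unblocking it to trigger delivery — can be demonstrated from user space. Python's signal.signal uses rt_sigaction under the hood, and signal.pthread_sigmask maps to rt_sigprocmask; this is a user-space sketch, not Asterinas code.

```python
import signal

delivered = []
# Install a handler via rt_sigaction.
signal.signal(signal.SIGUSR1, lambda sig, frame: delivered.append(sig))

# Block SIGUSR1 (rt_sigprocmask with SIG_BLOCK): the raised signal
# stays pending and the handler does not run yet.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
signal.raise_signal(signal.SIGUSR1)
blocked_state = list(delivered)

# Unblock it (SIG_UNBLOCK): the pending signal is delivered now.
signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
```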

POSIX Interval Timers

timer_create

Supported functionality in SCML:

opt_notify_methods = SIGEV_NONE | SIGEV_SIGNAL | SIGEV_THREAD_ID;

// Create a timer with predefined clock source
timer_create(
    clockid = CLOCK_PROCESS_CPUTIME_ID | CLOCK_THREAD_CPUTIME_ID | CLOCK_REALTIME | CLOCK_MONOTONIC,
    sevp = {
        sigev_notify = <opt_notify_methods>,
        ..
    },
    timerid
);

// Create a timer based on a per-process or per-thread clock
timer_create(
    clockid = <INTEGER>,
    sevp = {
        sigev_notify = <opt_notify_methods>,
        ..
    },
    timerid
);

Partially-supported clock IDs:

  • CLOCK_BOOTTIME is treated as CLOCK_MONOTONIC

Unsupported predefined clock IDs:

  • CLOCK_REALTIME_ALARM
  • CLOCK_BOOTTIME_ALARM
  • CLOCK_TAI

Unsupported notification methods:

  • SIGEV_THREAD

For more information, see the man page.

timerfd_create

Supported functionality in SCML:

// Create a timer file descriptor with a predefined clock source
timerfd_create(
    clockid = CLOCK_REALTIME | CLOCK_MONOTONIC,
    flags = TFD_CLOEXEC | TFD_NONBLOCK
);

Partially-supported clock IDs:

  • CLOCK_BOOTTIME is treated as CLOCK_MONOTONIC

Unsupported predefined clock IDs:

  • CLOCK_REALTIME_ALARM
  • CLOCK_BOOTTIME_ALARM

For more information, see the man page.

timerfd_settime

Supported functionality in SCML:

// Arm or disarm a timer file descriptor
timerfd_settime(
    ufd,
    flags = TFD_TIMER_ABSTIME,
    utmr,
    otmr
);

Ignored flags:

  • TFD_TIMER_CANCEL_ON_SET

For more information, see the man page.
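Arming a one-shot relative timer (flags = 0, i.e. without TFD_TIMER_ABSTIME) and reading the resulting expiration count ties timerfd_create and timerfd_settime together. The sketch below is a Linux user-space illustration via glibc's timerfd wrappers; CLOCK_MONOTONIC == 1 is an assumption about the Linux ABI values.

```python
import ctypes
import os

libc = ctypes.CDLL(None, use_errno=True)

CLOCK_MONOTONIC = 1  # assumption: Linux clock ID value

class Timespec(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_nsec", ctypes.c_long)]

class Itimerspec(ctypes.Structure):
    _fields_ = [("it_interval", Timespec), ("it_value", Timespec)]

# Create the timer fd, then arm a one-shot 10 ms relative timer
# (a zero `it_interval` means the timer does not repeat).
fd = libc.timerfd_create(CLOCK_MONOTONIC, 0)
spec = Itimerspec(Timespec(0, 0), Timespec(0, 10_000_000))
ret = libc.timerfd_settime(fd, 0, ctypes.byref(spec), None)

# Reading the fd blocks until expiry and yields a u64 expiration count.
count = int.from_bytes(os.read(fd, 8), "little")
os.close(fd)
```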

Namespaces, Cgroups & Security

prctl

Supported functionality in SCML:

// Retrieve or set the parent-death signal
prctl(op = PR_GET_PDEATHSIG | PR_SET_PDEATHSIG, sig);

// Get or set the name of the calling thread
prctl(op = PR_GET_NAME | PR_SET_NAME, name);

// Query whether process retains permitted capabilities after `UID` changes
prctl(op = PR_GET_KEEPCAPS);

// Configure permitted capabilities retention after `UID` changes
prctl(op = PR_SET_KEEPCAPS, state);

// Retrieve or set the "child subreaper" attribute
prctl(op = PR_GET_CHILD_SUBREAPER | PR_SET_CHILD_SUBREAPER, isset);

// Retrieve or set the timer slack value (nanoseconds)
prctl(op = PR_GET_TIMERSLACK | PR_SET_TIMERSLACK, slack_ns);

Partially-supported operations:

  • PR_GET_DUMPABLE and PR_SET_DUMPABLE because coredump is not supported

Unsupported operations:

  • PR_CAP_AMBIENT, PR_CAPBSET_READ and PR_CAPBSET_DROP
  • PR_GET_ENDIAN and PR_SET_ENDIAN
  • PR_GET_FP_MODE and PR_SET_FP_MODE
  • PR_GET_FPEMU and PR_SET_FPEMU
  • PR_GET_FPEXC and PR_SET_FPEXC
  • PR_GET_IO_FLUSHER and PR_SET_IO_FLUSHER
  • PR_MCE_KILL and PR_MCE_KILL_GET
  • PR_SET_MM and PR_SET_VMA
  • PR_MPX_ENABLE_MANAGEMENT and PR_MPX_DISABLE_MANAGEMENT
  • PR_GET_NO_NEW_PRIVS and PR_SET_NO_NEW_PRIVS
  • PR_PAC_RESET_KEYS
  • PR_SET_PTRACER
  • PR_GET_SECCOMP and PR_SET_SECCOMP
  • PR_GET_SPECULATION_CTRL and PR_SET_SPECULATION_CTRL
  • PR_SVE_GET_VL and PR_SVE_SET_VL
  • PR_SET_SYSCALL_USER_DISPATCH
  • PR_GET_TAGGED_ADDR_CTRL and PR_SET_TAGGED_ADDR_CTRL
  • PR_TASK_PERF_EVENTS_ENABLE and PR_TASK_PERF_EVENTS_DISABLE
  • PR_GET_THP_DISABLE and PR_SET_THP_DISABLE
  • PR_GET_TID_ADDRESS
  • PR_GET_TIMING and PR_SET_TIMING
  • PR_GET_TSC and PR_SET_TSC
  • PR_GET_UNALIGN and PR_SET_UNALIGN
  • PR_GET_AUXV
  • PR_GET_MDWE and PR_SET_MDWE
  • PR_RISCV_SET_ICACHE_FLUSH_CTX

For more information, see the man page.
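The PR_SET_NAME/PR_GET_NAME pair round-trips a thread name of up to 15 bytes (plus a terminating NUL). The sketch below calls glibc's prctl wrapper via ctypes; the operation values (15 and 16) are the Linux ones and the thread name is illustrative.

```python
import ctypes

libc = ctypes.CDLL(None, use_errno=True)

PR_SET_NAME = 15   # assumption: Linux operation values
PR_GET_NAME = 16

# Set the calling thread's name (silently truncated to 15 bytes).
set_ret = libc.prctl(PR_SET_NAME, b"scml-demo", 0, 0, 0)

# Read it back into a 16-byte buffer.
buf = ctypes.create_string_buffer(16)
get_ret = libc.prctl(PR_GET_NAME, buf, 0, 0, 0)
name = buf.value
```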

capget and capset

Supported functionality in SCML:

// Get capabilities of thread
capget(
    hdrp = {
        version = _LINUX_CAPABILITY_VERSION_3,
        ..
    },
    datap
);

// Set capabilities of thread
capset(
    hdrp = {
        version = _LINUX_CAPABILITY_VERSION_3,
        ..
    },
    datap
);

Unsupported versions:

  • _LINUX_CAPABILITY_VERSION_1
  • _LINUX_CAPABILITY_VERSION_2

For more information, see the man page.

unshare

Supported functionality in SCML:

// Disassociate parts of the process execution context
unshare(flags = CLONE_FILES | CLONE_FS | CLONE_NEWNS | CLONE_NEWUTS | CLONE_THREAD | CLONE_SIGHAND | CLONE_VM);

Unsupported flags:

  • CLONE_NEWCGROUP
  • CLONE_NEWIPC
  • CLONE_NEWNET
  • CLONE_NEWPID
  • CLONE_NEWTIME
  • CLONE_NEWUSER

Silently-ignored flags:

  • CLONE_SYSVSEM

For more information, see the man page.

setns

Supported functionality in SCML:

// Reassociate thread with a namespace
setns(fd, ns_type = CLONE_NEWNS | CLONE_NEWUTS);

Unsupported flags:

  • CLONE_NEWCGROUP
  • CLONE_NEWIPC
  • CLONE_NEWNET
  • CLONE_NEWPID
  • CLONE_NEWTIME
  • CLONE_NEWUSER

For more information, see the man page.

System Information & Misc.

arch_prctl

Supported functionality in SCML:

// Get or set the FS register
arch_prctl(
    code = ARCH_GET_FS | ARCH_SET_FS,
    addr
);

Unsupported codes:

  • ARCH_GET_CPUID and ARCH_SET_CPUID
  • ARCH_GET_GS and ARCH_SET_GS

For more information, see the man page.

getrusage

Supported functionality in SCML:

// Return resource usage statistics for the calling process
getrusage(
    who = RUSAGE_SELF,
    usage
);

// Return resource usage statistics for the calling thread
getrusage(
    who = RUSAGE_THREAD,
    usage
);

Unsupported who flags:

  • RUSAGE_CHILDREN

For more information, see the man page.

getrandom

Supported functionality in SCML:

// Obtain random bytes
getrandom(
    buf, buflen,
    flags =
        // Optional flags:
        //
        // High-entropy pool
        GRND_RANDOM
);

Silently-ignored flags:

  • GRND_NONBLOCK because the underlying operation never blocks

For more information, see the man page.
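Python exposes this syscall directly as os.getrandom, which makes the flag behavior easy to try. A user-space sketch:

```python
import os

# With flags = 0, getrandom draws from the urandom source and blocks
# only until the entropy pool has been initially seeded.
data = os.getrandom(16)

# GRND_NONBLOCK asks the call to fail rather than block.
more = os.getrandom(16, os.GRND_NONBLOCK)
```

On Asterinas, GRND_NONBLOCK is silently ignored precisely because the underlying operation never blocks, so both calls behave identically.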

reboot

Supported functionality in SCML:

reboot_magic2 = LINUX_REBOOT_MAGIC2 | LINUX_REBOOT_MAGIC2A | LINUX_REBOOT_MAGIC2B | LINUX_REBOOT_MAGIC2C;

// Stop the current system
reboot(
    magic  = LINUX_REBOOT_MAGIC1,
    magic2 = <reboot_magic2>,
    op     = LINUX_REBOOT_CMD_HALT | LINUX_REBOOT_CMD_POWER_OFF | LINUX_REBOOT_CMD_RESTART,
    arg
);

Unsupported op flags:

  • LINUX_REBOOT_CMD_CAD_OFF
  • LINUX_REBOOT_CMD_CAD_ON
  • LINUX_REBOOT_CMD_KEXEC
  • LINUX_REBOOT_CMD_RESTART2
  • LINUX_REBOOT_CMD_SW_SUSPEND

For more information, see the man page.

POSIX Clocks

clock_gettime

Supported functionality in SCML:

predefined_clockid = CLOCK_REALTIME | CLOCK_MONOTONIC | CLOCK_MONOTONIC_RAW |
                     CLOCK_REALTIME_COARSE | CLOCK_MONOTONIC_COARSE | CLOCK_BOOTTIME |
                     CLOCK_PROCESS_CPUTIME_ID | CLOCK_THREAD_CPUTIME_ID;

// Get the time of a clock specified by a static ID
clock_gettime(clockid = <predefined_clockid>, tp);

// Get the time of a clock specified by a dynamic ID
clock_gettime(clockid = <INTEGER>, tp);

Unsupported predefined clock IDs:

  • CLOCK_REALTIME_ALARM
  • CLOCK_BOOTTIME_ALARM
  • CLOCK_TAI

For more information, see the man page.
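Python's time module wraps clock_gettime and clock_getres for the predefined clock IDs, which makes the monotonicity guarantee easy to check from user space:

```python
import time

# CLOCK_MONOTONIC never goes backwards, so successive reads are ordered.
t0 = time.clock_gettime(time.CLOCK_MONOTONIC)
t1 = time.clock_gettime(time.CLOCK_MONOTONIC)

# CLOCK_REALTIME reports wall-clock time; clock_getres reports the
# clock's resolution in seconds.
wall = time.clock_gettime(time.CLOCK_REALTIME)
res = time.clock_getres(time.CLOCK_MONOTONIC)
```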

clock_nanosleep

Supported functionality in SCML:

// Sleep with a specified clock
clock_nanosleep(
    clockid = CLOCK_REALTIME | CLOCK_MONOTONIC | CLOCK_BOOTTIME | CLOCK_PROCESS_CPUTIME_ID,
    flags =
        // Optional flags:
        //
        // Sleep until an absolute time point
        TIMER_ABSTIME,
    t, remain
);

Unsupported clock IDs:

  • CLOCK_TAI

For more information, see the man page.

Kernel Parameters

This section documents kernel command-line parameters supported by Asterinas.

Inherited from Linux

init

Run the specified binary as init.

Example:

init=/bin/busybox

Notes:

  • The value is the path to the executable.
  • If omitted, Asterinas will fail to boot.

console

Select console devices for kernel messages. This parameter may be specified multiple times. Kernel messages are delivered to each listed console.

Valid values:

  • tty0
  • ttyS0
  • hvc0

Examples:

console=ttyS0
console=ttyS0 console=hvc0

Asterinas-specific

ostd.log_level

Set the verbosity level for Asterinas’s logs.

Valid values:

  • off
  • error
  • warn
  • info
  • debug
  • trace

Example:

ostd.log_level=error

i8042.exist

Override ACPI’s indication of whether a PS/2 (i8042) controller exists.

Valid values:

  • 1, on, yes, true or no value — treat the i8042 controller as present (force probing)
  • 0, off, no, false — treat the i8042 controller as absent (skip probing)

Examples:

i8042.exist
i8042.exist=1
i8042.exist=0
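Putting the parameters above together, a complete kernel command line might look like the following (the init path is taken from the earlier example and is illustrative):

```
init=/bin/busybox console=ttyS0 ostd.log_level=info i8042.exist=0
```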

Roadmap

Asterinas is a general-purpose OS kernel designed to support multiple CPU architectures and a variety of use cases. Currently, it only supports x86-64 VMs. Our roadmap includes the following plans:

  • By 2024, we aim to achieve production-ready status for VM environments on x86-64.
  • In 2025 and beyond, we will expand our support for CPU architectures and hardware devices.

Target Early Use Cases

One of the biggest challenges for a new OS kernel is driver support. Linux has been widely accepted due to its wide range of hardware support. As a newcomer, Asterinas faces the challenge of implementing drivers for all devices on a target platform, which would take a significant amount of time.

To address this obstacle, we have decided to enter the cloud market first. In an IaaS (Infrastructure-as-a-Service) cloud, workloads of different tenants are run in VMs or VM-style bare-metal servers for maximum isolation and elasticity. The main device driver requirement for the VM environment is virtio, which is already supported by Asterinas. Therefore, using Asterinas as the guest OS of a VM or the host OS of a VM-style bare-metal server in production looks quite feasible in the near future.

Asterinas provides high assurance of memory safety thanks to the framekernel architecture. Thus, in the cloud setting, Asterinas is attractive for usage scenarios where the Linux ABI is necessary but Linux itself is considered insecure due to its large Trusted Computing Base (TCB) and memory unsafety. Specifically, we are focusing on two use cases:

  1. VM-based TEEs: All major CPU architectures have introduced VM-based Trusted Execution Environment (TEE) technology, such as ARM CCA, AMD SEV, and Intel TDX. Applications running inside TEEs often handle private or sensitive data. By running on a lightweight and memory-safe OS kernel like Asterinas, they can greatly enhance security and privacy.

  2. Secure containers: In the cloud-native era, applications are commonly deployed in containers. The popular container runtimes like runc and Docker rely on the OS-level isolation enforced by Linux. However, Linux containers are prone to privilege escalation bugs. With its safety- and security-prioritized architecture, Asterinas can offer more reliable OS-level isolation, making it ideal for secure containers.

Asterinas OSTD

Confucius remarked, “I could follow whatever my heart desired without transgressing the law.”

子曰: “从心所欲,不逾矩。”

With the Asterinas OSTD (Operating System Standard Library), you don’t have to learn the dark art of unsafe Rust programming and risk shooting yourself in the foot. You will be doing whatever your heart desires, and be confident that your kernel will never crash or be hacked due to undefined behaviors, even if today marks your Day 1 as a Rust programmer.

APIs

Asterinas OSTD stands as a powerful and solid foundation for safe kernel development, providing high-level safe Rust APIs that are

  1. Essential for OS development, and
  2. Impossible to implement without the use of unsafe Rust.

Most of these APIs fall into the following categories:

  • Memory management (e.g., allocating and accessing physical memory pages)
  • Task management (e.g., context switching between kernel tasks)
  • User space (e.g., manipulating and entering the user space)
  • Interrupt handling (e.g., registering interrupt handlers)
  • Timer management (e.g., registering timer handlers)
  • Driver support (e.g., performing DMA and MMIO)
  • Boot support (e.g., retrieving information from the bootloader)
  • Synchronization (e.g., locking and sleeping)

To explore how these APIs come into play, see the example of a 100-line kernel in safe Rust.

The OSTD APIs have been extensively documented. You can access the comprehensive API documentation on docs.rs.

Four Requirements Satisfied

In designing and implementing OSTD, we have risen to meet the challenge of fulfilling the aforementioned four criteria as demanded by the framekernel architecture.

Expressiveness is evident through Asterinas Kernel itself, where all system calls, file systems, network protocols, and device drivers (e.g., Virtio drivers) have been implemented in safe Rust by leveraging OSTD.

Adopting a minimalist philosophy, OSTD has a small codebase. At its core lies the ostd crate, currently encompassing about 10K lines of code — a figure that is even smaller than those of many microkernels. As OSTD evolves, its codebase will expand, albeit at a relatively slow rate in comparison to the OS services layered atop it.

The OSTD’s efficiency is measurable through the performance metrics of its APIs and the system calls of Asterinas Kernel. No intrinsic limitations have been identified within Rust or the framekernel architecture that could hinder kernel performance.

Soundness, unlike the other three requirements, is not as easily quantified or proved. While formal verification stands as the gold standard, it requires considerable resources and time and is not an immediate priority. As a more pragmatic approach, we will explain why the high-level design is sound in the soundness analysis and rely on the many eyes of the community to catch any potential flaws in the implementation.

Example: Writing a Kernel in About 100 Lines of Safe Rust

To give you a sense of how Asterinas OSTD enables writing kernels in safe Rust, we will show a new kernel in about 100 lines of safe Rust.

Our new kernel will be able to run the following Hello World program.

# SPDX-License-Identifier: MPL-2.0

.global _start                      # entry point
.section .text                      # code section
_start:
    mov     $1, %rax                # syscall number of write
    mov     $1, %rdi                # stdout
    mov     $message, %rsi          # address of message         
    mov     $message_end, %rdx
    sub     %rsi, %rdx              # calculate message len
    syscall
    mov     $60, %rax               # syscall number of exit, move it to rax
    mov     $0, %rdi                # exit code, move it to rdi
    syscall  

.section .rodata                    # read only data section
message:
    .ascii  "Hello, world\n"
message_end:

The assembly program above can be compiled with the following command.

gcc -static -nostdlib hello.S -o hello

The user program above requires our kernel to support three main features:

  1. Loading a program as a process image in user space;
  2. Handling the write system call;
  3. Handling the exit system call.

A sample implementation of the kernel in safe Rust is given below. Comments are added to highlight how the APIs of Asterinas OSTD enable safe kernel development.

// SPDX-License-Identifier: MPL-2.0

#![no_std]
#![deny(unsafe_code)]

extern crate alloc;

use align_ext::AlignExt;
use core::str;

use alloc::sync::Arc;
use alloc::vec;

use ostd::arch::cpu::context::UserContext;
use ostd::mm::{
    CachePolicy, FallibleVmRead, FrameAllocOptions, PageFlags, PageProperty, Vaddr, VmIo, VmSpace,
    VmWriter, PAGE_SIZE,
};
use ostd::power::{poweroff, ExitCode};
use ostd::prelude::*;
use ostd::task::{disable_preempt, Task, TaskOptions};
use ostd::user::{ReturnReason, UserMode};

/// The kernel's boot and initialization process is managed by OSTD.
/// After the process is done, the kernel's execution environment
/// (e.g., stack, heap, tasks) will be ready for use and the entry function
/// labeled as `#[ostd::main]` will be called.
#[ostd::main]
pub fn main() {
    let program_binary = include_bytes!("../hello");
    let vm_space = Arc::new(create_vm_space(program_binary));
    vm_space.activate();
    let user_task = create_user_task(vm_space);
    user_task.run();
}

fn create_vm_space(program: &[u8]) -> VmSpace {
    let nbytes = program.len().align_up(PAGE_SIZE);
    let user_pages = {
        let segment = FrameAllocOptions::new()
            .alloc_segment(nbytes / PAGE_SIZE)
            .unwrap();
        // Physical memory pages can be only accessed
        // via the `UFrame` or `USegment` abstraction.
        segment.write_bytes(0, program).unwrap();
        segment
    };

    // The page table of the user space can be
    // created and manipulated safely through
    // the `VmSpace` abstraction.
    let vm_space = VmSpace::new();
    const MAP_ADDR: Vaddr = 0x0040_0000; // The map addr for statically-linked executable
    let preempt_guard = disable_preempt();
    let mut cursor = vm_space
        .cursor_mut(&preempt_guard, &(MAP_ADDR..MAP_ADDR + nbytes))
        .unwrap();
    let map_prop = PageProperty::new_user(PageFlags::RWX, CachePolicy::Writeback);
    for frame in user_pages {
        cursor.map(frame.into(), map_prop);
    }
    drop(cursor);
    vm_space
}

fn create_user_task(vm_space: Arc<VmSpace>) -> Arc<Task> {
    fn user_task() {
        let current = Task::current().unwrap();
        // Switching between user-kernel space is
        // performed via the UserMode abstraction.
        let mut user_mode = {
            let user_ctx = create_user_context();
            UserMode::new(user_ctx)
        };

        loop {
            // The execute method returns when system
            // calls or CPU exceptions occur or some
            // events specified by the kernel occur.
            let return_reason = user_mode.execute(|| false);

            // The CPU registers of the user space
            // can be accessed and manipulated via
            // the `UserContext` abstraction.
            let user_context = user_mode.context_mut();
            if ReturnReason::UserSyscall == return_reason {
                let vm_space = current.data().downcast_ref::<Arc<VmSpace>>().unwrap();
                handle_syscall(user_context, &vm_space);
            }
        }
    }

    // Kernel tasks are managed by OSTD,
    // while scheduling algorithms for them can be
    // determined by the users of OSTD.
    Arc::new(TaskOptions::new(user_task).data(vm_space).build().unwrap())
}

fn create_user_context() -> UserContext {
    // The user-space CPU states can be initialized
    // to arbitrary values via the `UserContext`
    // abstraction.
    let mut user_ctx = UserContext::default();
    const ENTRY_POINT: Vaddr = 0x0040_1000; // The entry point for statically-linked executable
    user_ctx.set_rip(ENTRY_POINT);
    user_ctx
}

fn handle_syscall(user_context: &mut UserContext, vm_space: &VmSpace) {
    const SYS_WRITE: usize = 1;
    const SYS_EXIT: usize = 60;

    match user_context.rax() {
        SYS_WRITE => {
            // Access the user-space CPU registers safely.
            let (_, buf_addr, buf_len) =
                (user_context.rdi(), user_context.rsi(), user_context.rdx());
            let buf = {
                let mut buf = vec![0u8; buf_len];
                // Copy data from the user space without
                // unsafe pointer dereferencing.
                let mut reader = vm_space.reader(buf_addr, buf_len).unwrap();
                reader
                    .read_fallible(&mut VmWriter::from(&mut buf as &mut [u8]))
                    .unwrap();
                buf
            };
            // Use the console for output safely.
            println!("{}", str::from_utf8(&buf).unwrap());
            // Manipulate the user-space CPU registers safely.
            user_context.set_rax(buf_len);
        }
        SYS_EXIT => poweroff(ExitCode::Success),
        _ => unimplemented!(),
    }
}

Soundness Analysis

OSTD is the privileged OS framework — the memory-safety Trusted Computing Base (TCB) — of the Asterinas framekernel OS. OSTD is privileged because it is the only crate in the entire Asterinas codebase that is permitted to use unsafe Rust. Every other crate, including the full kernel (kernel/), enforces #![deny(unsafe_code)].

The privilege boundary in an OSTD-based framekernel is OSTD’s public safe API surface: every pub function, type, and trait that OSTD exposes. This document demonstrates that OSTD is sound: its unsafe code does not introduce undefined behavior (UB), regardless of how the rest of the system behaves.

Outline

This section presents a systematic argument for the soundness of OSTD. The argument proceeds as follows. We first define exactly what soundness means in an OS context and identify the threat model. We then explain the systematic principle OSTD uses to decide which resources require protection: the sensitivity classification, which partitions CPU, memory, and device resources into sensitive (kept inside OSTD) and insensitive (safely exposed) categories. This principle motivates every design decision in the sections that follow.

With the threat model and classification in hand, we address each attack surface in turn. The most fundamental is physical memory: if a client could obtain two mutable views of the same memory, or write to memory hosting Rust objects, all other guarantees collapse. Safe Physical Memory Management presents the typed/untyped memory model — the cornerstone of OSTD’s soundness — and argues that the frame lifecycle, reference counting, and metadata system prevent these violations. Building on this memory foundation, Safe User-Kernel Interactions shows that user-space programs cannot corrupt kernel memory (because only untyped frames are mapped into user space) and that the user-kernel transition preserves CPU state integrity. Safe Kernel-Peripheral Interactions closes the device boundary: IOMMU DMA remapping restricts devices to untyped memory, interrupt remapping prevents interrupt spoofing, and I/O access control hides sensitive hardware registers.

The remaining sections address threats that arise from within kernel-mode safe Rust code itself. Safe Kernel Logic demonstrates that OSTD’s synchronization primitives, preemption control, and CPU-local storage cannot be misused by clients to cause data races or deadlocks that lead to UB. Safe Policy Injection shows that even the large, complex policy components injected into OSTD (scheduler, frame allocator, slab allocator) cannot violate soundness regardless of their behavior, because OSTD validates their outputs independently.

Together, these sections cover all major UB risks of OSTD.

What Soundness Means for an OS Framework

The Soundness Goal

The soundness of OSTD provides the following strong guarantee:

No sequence of calls to OSTD’s safe public API — combined with any behavior from user-space programs and peripheral devices — from a client in safe Rust can trigger undefined behaviors (UBs).

Note that soundness is distinct from correctness. A sound OSTD may still contain logic bugs (e.g., a scheduler that starves tasks, or an allocator that wastes memory). Soundness means these bugs cannot escalate into memory corruption, data races, type confusion, or any other form of UB. The OS may crash, deadlock, or behave incorrectly — but it will never silently corrupt memory.

Three Levels of Undefined Behavior

An OS kernel faces undefined behavior (UB) threats that ordinary Rust programs do not. We classify them into three levels:

Language-level UB. These are the UB categories defined by the Rust Reference: data races, use-after-free, dangling pointers, null pointer dereference, type confusion, aliasing violations (breaking the “shared XOR mutable” rule), and reading uninitialized memory. Rust’s safe subset prevents all of these at compile time. OSTD must ensure that its unsafe code does not reintroduce them.

Environment-level UB. Programming languages make implicit assumptions about their execution environment: the stack is valid, the heap is not corrupted, and the code being executed has not been overwritten. In a kernel, these assumptions must be actively maintained. A stack overflow that jumps over a guard page, a heap corruption from a faulty allocator, or a code page overwritten by a rogue DMA transfer can all silently violate these assumptions.

Architecture-level UB. OS kernels directly manipulate hardware state: CPU control registers, page tables, interrupt descriptor tables, IOMMU configuration. Incorrect manipulation can corrupt the execution environment in ways invisible to the Rust compiler. For example: misconfiguring a page table can cause the CPU to write kernel data to the wrong physical address; allowing a device to DMA into a page table page can corrupt address translation; improperly saving/restoring CPU registers across context switches can corrupt task state.

OSTD must defend against all three levels simultaneously.

Note

To be concrete, our soundness analysis targets the x86-64 CPU architecture, but our arguments can be generalized to other CPU architectures.

The Trust Model

The following components are trusted — their correctness is assumed:

  • The Rust compiler and core libraries (rustc, core, alloc). We rely on the compiler to correctly enforce Rust’s safety guarantees for safe code and to correctly compile unsafe code according to the language specification.

  • OSTD itself. This is the subject of this document: we argue that OSTD’s unsafe code is sound.

  • The bootloader and firmware. We trust the bootloader to load the kernel correctly, provide accurate memory maps, and configure the initial page table. We trust firmware (ACPI tables, device trees) to accurately describe the hardware.

  • Core hardware: the CPU (correct instruction execution, correct privilege level enforcement, correct page table walks), memory controller (correct physical memory access), and IOMMU (correct DMA remapping and interrupt remapping when enabled).

The following components are not trusted — OSTD must remain sound regardless of their behavior:

  • All OS services in kernel/. These are written in safe Rust. They may contain logic bugs (incorrect syscall implementations, file system corruption, network protocol violations) but cannot cause UB. This includes all device drivers, file systems, networking stacks, process management, and IPC.

  • Injected policies: task schedulers, frame allocators, and slab allocators. These are registered by OS services via safe traits. They may make arbitrarily bad decisions (schedule the same task twice, return an already-allocated frame, return a misaligned slab slot) but cannot cause UB. OSTD’s mechanisms enforce soundness independently of policy correctness (Safe Policy Injection).

  • User-space programs. They may execute arbitrary instructions, make arbitrary system calls with arbitrary arguments, and attempt to corrupt kernel state through any user-accessible mechanism. OSTD must ensure that user-space behavior cannot cause kernel UB.

  • Peripheral devices (NICs, GPUs, disks, USB devices). They may issue arbitrary DMA transfers to any physical address, generate arbitrary interrupts, and return arbitrary data in response to MMIO/PIO reads. OSTD must ensure that peripheral behavior cannot corrupt kernel memory safety. This is enforced by the IOMMU (Safe Kernel-Peripheral Interactions) and the typed/untyped memory model (Safe Physical Memory Management).

The Sensitivity Classification Principle

At the heart of the framekernel architecture is a systematic classification of OS resources. An OS manages three fundamental classes of resources — CPU, memory, and devices — each subdivided into sensitive (can compromise kernel memory safety if misused) and insensitive (cannot compromise kernel memory safety even if misused). To ensure the soundness and minimality of the TCB, OSTD adopts the following design principle:

Keep sensitive resources inside the framework (for soundness) and move insensitive resources outside (for minimality).

CPU Resources

| Resource | Classification | Rationale | OSTD Handling |
|---|---|---|---|
| Kernel-mode registers (CR0–CR4, MSRs, GDT/IDT/TSS pointers, kernel GS base) | Sensitive | Can corrupt execution environment | Set once at boot; never exposed to clients |
| Kernel-mode traps (exception/interrupt handlers) | Sensitive | Can hijack control flow | IDT configured at boot; handlers are internal |
| User-mode registers (GP registers, user RFLAGS subset, FS base) | Insensitive | Cannot directly affect kernel state | Exposed via UserContext with sanitization |
| User-mode traps | Insensitive | Routed through kernel trap handler | Dispatched by OSTD; clients handle via callbacks |

UserContext sanitizes user-visible registers: it forces the IF (Interrupt Flag) and ID flags in RFLAGS and strips IOPL, preventing user space from disabling interrupts or accessing I/O ports directly. Kernel-mode registers are never represented in any client-visible type.
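Concretely, the RFLAGS sanitization can be modeled with plain bit arithmetic. The bit positions below follow the architectural x86-64 RFLAGS layout; the function itself is an illustrative sketch, not OSTD's actual code:

```rust
/// Architectural RFLAGS bit positions (x86-64).
const RFLAGS_IF: u64 = 1 << 9; // Interrupt Flag
const RFLAGS_IOPL: u64 = 0b11 << 12; // I/O Privilege Level (bits 12-13)
const RFLAGS_ID: u64 = 1 << 21; // CPUID-availability flag

/// Sanitize a user-supplied RFLAGS value before entering user space:
/// force IF and ID on, strip IOPL down to 0.
fn sanitize_rflags(user_rflags: u64) -> u64 {
    (user_rflags | RFLAGS_IF | RFLAGS_ID) & !RFLAGS_IOPL
}
```

Whatever value a client writes into the saved RFLAGS, the result after sanitization always has interrupts enabled and IOPL at 0, so user space can neither mask interrupts nor touch I/O ports.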

Memory Resources

| Resource | Classification | Rationale | OSTD Handling |
|---|---|---|---|
| Kernel code pages | Sensitive | Overwriting code = arbitrary execution | Mapped read-only; typed frames; never exposed |
| Kernel stack pages | Sensitive | Stack corruption = control flow hijack | Typed frames; guard pages; never exposed |
| Kernel heap pages | Sensitive | Heap corruption = type confusion | Typed frames (SlabMeta); never exposed |
| Page table pages | Sensitive | PT corruption = arbitrary memory access | Typed frames (PageTablePageMeta); managed by OSTD |
| Frame metadata pages | Sensitive | Metadata corruption = use-after-free | In dedicated FRAME_METADATA_RANGE; never exposed |
| User-space virtual memory | Insensitive | Kernel safety does not depend on it | Manipulated safely via VmSpace |
| Untyped physical memory pages | Insensitive | Do not host Rust objects; accessed only via POD copy | Exposed as UFrame / USegment; used for user pages, DMA buffers |

The sensitive/insensitive distinction maps directly to the typed/untyped frame distinction (Safe Physical Memory Management). All sensitive memory is held in typed frames, which are never exposed to clients, user space, or devices. All insensitive memory is held in untyped frames, which can be safely shared.

Device Resources

| Resource | Classification | Rationale | OSTD Handling |
|---|---|---|---|
| Local APIC | Sensitive | Can reset CPUs, mask interrupts | IoMem<Sensitive>, pub(crate) only |
| I/O APIC | Sensitive | Can redirect interrupts | IoMem<Sensitive>, pub(crate) only |
| IOMMU | Sensitive | Controls DMA access | IoMem<Sensitive>, pub(crate) only |
| PIC (8259A) | Sensitive | Legacy interrupt controller | Sensitive I/O ports, pub(crate) only |
| Peripheral MMIO (NIC, disk, USB, GPU) | Insensitive | Failure confined to the device | IoMem<Insensitive> via IoMem::acquire(range) |
| Peripheral I/O ports | Insensitive | Failure confined to the device | IoPort via IoPort::acquire(port) |
| Peripheral interrupts | Insensitive | Routed through OSTD's IRQ framework | IrqLine callback registration |
| DMA mappings | Insensitive | Restricted by IOMMU to untyped memory | DmaCoherent / DmaStream APIs |

Internally, OSTD maintains two I/O resource allocators (IoMemAllocator, IoPortAllocator) that are initialized at boot time by removing all sensitive ranges. Only the remaining insensitive ranges are available for client allocation. This makes it impossible for a client to accidentally or maliciously access a sensitive device.
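The boot-time carve-out can be sketched as a simple range allocator. This is a toy std-only model — `IoRangeAllocator` and its methods are hypothetical names standing in for OSTD's internal allocators, not its API:

```rust
use std::ops::Range;

/// Toy model: initialized with all available I/O ranges, then sensitive
/// ranges are removed at boot so `acquire` can only hand out insensitive ones.
struct IoRangeAllocator {
    free: Vec<Range<u64>>,
}

impl IoRangeAllocator {
    fn new(all: Vec<Range<u64>>) -> Self {
        Self { free: all }
    }

    /// Remove a sensitive range, splitting any free range that overlaps it.
    fn remove(&mut self, sensitive: Range<u64>) {
        let mut next = Vec::new();
        for r in self.free.drain(..) {
            if r.end <= sensitive.start || r.start >= sensitive.end {
                next.push(r); // no overlap: keep as-is
            } else {
                if r.start < sensitive.start {
                    next.push(r.start..sensitive.start);
                }
                if r.end > sensitive.end {
                    next.push(sensitive.end..r.end);
                }
            }
        }
        self.free = next;
    }

    /// Succeeds only if `want` lies entirely within a remaining free range.
    fn acquire(&self, want: Range<u64>) -> Option<Range<u64>> {
        self.free
            .iter()
            .find(|r| r.start <= want.start && want.end <= r.end)
            .map(|_| want)
    }
}
```

After `remove()` has run for every sensitive region, no sequence of `acquire()` calls can ever return a sensitive address — the ranges simply no longer exist in the allocator.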

Safe Physical Memory Management

The typed/untyped memory model is the cornerstone of OSTD’s soundness. It addresses a fundamental challenge: how to safely handle memory that may be modified by entities outside the Rust compiler’s control (hardware devices, user-space programs).

The Problem: Externally-Modifiable Memory

Consider a kernel that maps a page into user space and also holds a Rust reference (&[u8]) to the same page. If the user program writes to the page, the kernel’s reference now points to data that has changed without the compiler’s knowledge. For immutable references, this violates Rust’s aliasing rules (UB). For mutable references, the concurrent modification is a data race (UB).

The same problem arises with DMA: a device may write to a page at any time, and any Rust reference to that page is invalidated.

The Solution: Typed and Untyped Frames

OSTD partitions all physical page frames into two categories:

Typed frames host Rust objects — page tables, kernel stacks, heap objects, frame metadata. They participate fully in Rust’s ownership, borrowing, and lifetime system. The Rust compiler’s aliasing assumptions are valid for typed frames because no external entity can access them.

Safety Invariant: Typed frames cannot be manipulated directly by OSTD clients.

Untyped frames are raw byte buffers. They do not host Rust objects. They are accessed exclusively through a POD (Plain Old Data) copy interface (VmReader/VmWriter) that never creates Rust references to the frame contents.

Safety Invariant: Untyped frames are never accessed via Rust references.

For example, copying data into and out of an untyped segment looks like:

// Writing data into an untyped segment (e.g., preparing a DMA buffer)
let mut writer = segment.writer();
writer.write_val(&header)?;

// Reading data from an untyped segment (e.g., receiving from a device)
let mut reader = segment.reader();
let header: Header = reader.read_val()?;

The reader() and writer() methods return VmReader<Infallible> / VmWriter<Infallible> cursors that operate on raw pointers internally and access the memory with volatile reads/writes — no &T or &mut T references to the frame contents are ever created. Because of this, external modifications by user programs or DMA devices cannot violate Rust’s aliasing rules.

The same VmReader/VmWriter mechanism is reused for accessing user-space virtual memory, but with a Fallible marker instead of Infallible since user-space accesses may trigger page faults.

Type-Level Encoding

Frame<M> is parameterized by a metadata type M. Clients associate custom metadata with each frame by choosing M — for example, page table nodes store PageTablePageMeta and heap slabs store SlabMeta. Frame::meta() provides typed access to the per-frame metadata.

The same type parameter enforces the typed/untyped distinction at compile time. VmReader/VmWriter access to frame contents is gated on M implementing AnyUFrameMeta:

// Only untyped frames expose reader()/writer()
impl<UM: AnyUFrameMeta> HasVmReaderWriter for Frame<UM> {
    fn reader(&self) -> VmReader<'_, Infallible> { ... }
    fn writer(&self) -> VmWriter<'_, Infallible> { ... }
}

A typed Frame<PageTablePageMeta> does not satisfy this bound, so the compiler rejects any attempt to read or write its contents. The two safety invariants from the previous section are therefore compile-time guarantees, not runtime checks: typed frame contents are inaccessible to clients because the operation does not exist in the type system, and untyped frame contents can only be accessed via VmReader/VmWriter (which never create Rust references).

The type hierarchy:

struct Frame<M: AnyFrameMeta> { ... }

unsafe trait AnyFrameMeta: Any + Send + Sync { ... }
trait AnyUFrameMeta: AnyFrameMeta { ... }

type UFrame = Frame<dyn AnyUFrameMeta>;
type USegment = Segment<dyn AnyUFrameMeta>;

AnyFrameMeta is unsafe, preventing clients from implementing it directly (kernel code enforces #![deny(unsafe_code)]). Instead, OSTD provides safe macros — impl_frame_meta_for! for typed metadata and impl_untyped_frame_meta_for! for untyped metadata — that produce correct implementations. A client cannot misclassify a frame’s sensitivity because only the macros can set the AnyFrameMeta::is_untyped() return value.
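The compile-time gating can be illustrated with a simplified std-only model. The trait and type names below mirror the text, but the types are toy stand-ins for OSTD's real ones:

```rust
use std::marker::PhantomData;

/// Toy versions of the metadata marker traits.
trait AnyFrameMeta {}
trait AnyUFrameMeta: AnyFrameMeta {}

struct Frame<M: AnyFrameMeta> {
    bytes: Vec<u8>,
    _meta: PhantomData<M>,
}

// Byte-level access exists only when the metadata is untyped...
impl<M: AnyUFrameMeta> Frame<M> {
    fn read_byte(&self, i: usize) -> u8 {
        self.bytes[i]
    }
}

// ...so a typed frame (e.g. a page-table page) has no read_byte method
// at all: any call is rejected at compile time, not at run time.
struct PageTableMeta;
impl AnyFrameMeta for PageTableMeta {}

// Untyped metadata opts in to both traits.
struct DmaBufMeta;
impl AnyFrameMeta for DmaBufMeta {}
impl AnyUFrameMeta for DmaBufMeta {}
```

A call like `Frame::<PageTableMeta>::read_byte` would fail to compile because the `AnyUFrameMeta` bound is unsatisfied — the same shape of guarantee the real `HasVmReaderWriter` bound provides.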

The Frame Lifecycle

Each physical page frame has a MetaSlot (64 bytes) in a linearly-mapped metadata region. The MetaSlot contains an AtomicU64 reference count that encodes the frame’s lifecycle state:

stateDiagram-v2
    UNUSED: UNUSED<br>(ref_count = MAX_VALUE)
    BUSY_INIT: BUSY<br>(ref_count = 0)
    IN_USE: IN_USE<br>(ref_count = 1..N)
    BUSY_DROP: BUSY<br>(ref_count = 0)

    UNUSED --> BUSY_INIT : from_unused()<br>CAS(UNUSED, 0)
    BUSY_INIT --> IN_USE : write metadata,<br>set ref_count = 1
    IN_USE --> IN_USE : clone (inc) / drop (dec)
    IN_USE --> BUSY_DROP : drop last ref<br>(ref_count → 0)
    BUSY_DROP --> UNUSED : drop metadata,<br>dealloc

Special states:

  • REF_COUNT_UNUSED (u64::MAX): Frame is not in use. Only Frame::from_unused(paddr, metadata) can transition out.
  • 0: Busy — frame is being constructed or destructed. No concurrent access permitted.
  • 1..REF_COUNT_MAX: Normal shared reference counting. REF_COUNT_MAX is a guard value; exceeding it calls abort() (same pattern as Arc).

The transitions are atomic: from_unused uses compare_exchange(UNUSED, 0, Acquire, Relaxed) to claim a frame. inc_ref_count uses fetch_add(1, Relaxed) with an overflow check. drop uses fetch_sub(1, Release) with an Acquire fence when the count reaches zero.

This state machine is key to the safety of physical memory page management.

Safety Invariant: Unused frames are inaccessible.

A frame with ref_count = REF_COUNT_UNUSED has no Frame handle in existence. The only way to obtain a handle is from_unused(), which atomically transitions the frame out of the UNUSED state via CAS.

Safety Invariant: Frames in the BUSY state are exclusively owned by the transitioning thread.

A frame with ref_count = 0 is being constructed or destructed. No other thread can obtain a handle to it: inc_ref_count cannot increment from 0, and from_unused only transitions from UNUSED. This guarantees exclusive access during metadata initialization and cleanup.

The BUSY state prevents a data race during frame initialization. Without it, from_unused would transition directly from UNUSED to IN_USE (ref_count = 1). This creates a critical window: after the CAS succeeds but before metadata is written, another thread could call inc_ref_count (which uses Relaxed ordering) to clone the frame, then read uninitialized metadata — a data race UB. By transitioning to BUSY (ref_count = 0) first, no other thread can obtain a handle: inc_ref_count requires an existing reference (impossible at ref_count = 0), and get_from_in_use rejects BUSY frames. Metadata is written safely during the BUSY state, then the store(1, Release) transition to IN_USE pairs with Acquire in subsequent readers, ensuring the metadata is visible before any other thread can access the frame.
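The lifecycle transitions above can be sketched as a std-only model of the reference-count state machine. `MetaSlot` here is a simplified stand-in, not OSTD's actual implementation:

```rust
use std::sync::atomic::{fence, AtomicU64, Ordering};

const REF_COUNT_UNUSED: u64 = u64::MAX;
const REF_COUNT_MAX: u64 = i64::MAX as u64; // guard value, as with Arc

struct MetaSlot {
    ref_count: AtomicU64,
}

impl MetaSlot {
    fn new_unused() -> Self {
        Self { ref_count: AtomicU64::new(REF_COUNT_UNUSED) }
    }

    /// UNUSED -> BUSY via CAS; returns false if already claimed.
    fn claim(&self) -> bool {
        self.ref_count
            .compare_exchange(REF_COUNT_UNUSED, 0, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// BUSY -> IN_USE, after the metadata has been written. The Release
    /// store makes the metadata visible to any thread that later clones.
    fn publish(&self) {
        self.ref_count.store(1, Ordering::Release);
    }

    /// Clone a handle; only callable by a thread already holding one,
    /// so the count can never be 0 or UNUSED here.
    fn inc(&self) {
        if self.ref_count.fetch_add(1, Ordering::Relaxed) > REF_COUNT_MAX {
            std::process::abort();
        }
    }

    /// Drop a handle; returns true when the caller must run destruction
    /// (IN_USE -> BUSY on the last reference).
    fn dec(&self) -> bool {
        if self.ref_count.fetch_sub(1, Ordering::Release) == 1 {
            fence(Ordering::Acquire);
            true
        } else {
            false
        }
    }
}
```

Note how a second `claim()` on a BUSY frame fails: the CAS expects `REF_COUNT_UNUSED`, but the count is 0, closing the initialization race window described above.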

The Linear Mapping

OSTD maps all physical memory into the kernel’s virtual address space via a linear mapping (paddr + LINEAR_MAPPING_BASE = vaddr). This means typed frame contents are technically accessible at a known virtual address. However:

  • The linear mapping is established by OSTD’s boot code and is not controllable by clients.
  • The only safe way to read/write through the linear mapping is via VmReader/VmWriter, which require the caller to prove the memory is untyped (via the type system).
  • There is no safe API that returns a raw pointer or reference to a typed frame’s linear-mapping address.

A client cannot exploit the linear mapping to access typed frames because they cannot construct a VmReader/VmWriter for a typed frame — the type system prevents it.

Safe User-Kernel Interactions

OSTD provides three API categories for user-kernel interactions: entering/exiting user space (UserMode, UserContext), managing user address spaces (VmSpace), and accessing user memory (VmReader<Fallible>, VmWriter<Fallible>).

UserMode and UserContext

UserContext stores the user-space register state (general-purpose registers, RFLAGS, FS/GS base, and exception info). Clients read and write this state between user-space entries — for instance, to set up syscall return values or inspect exception details.

UserMode owns a UserContext and provides the safe enter/exit loop for user space. It is !Send: once created on a task, it cannot be sent to another, preventing cross-task confusion of register state. UserMode::context() and UserMode::context_mut() give access to the underlying UserContext for register inspection and modification between entries.

Clients may set arbitrary values in UserContext, but the sanitization inside UserMode::execute(has_kernel_event) ensures kernel safety invariants are enforced every time before entering user space:

  1. RFLAGS sanitization: IF (Interrupt Flag) and ID are forced on. IOPL is stripped. This prevents user space from disabling interrupts or gaining I/O port access through RFLAGS manipulation.

  2. IRQ disabling before transition: Local IRQs are disabled before calling the assembly syscall_return routine. This is critical: the swapgs instruction (which switches between kernel and user GS base) must execute atomically with the actual privilege level transition (sysret or iret). If an interrupt occurred between swapgs and sysret, the interrupt handler would see the user GS base instead of the kernel GS base, corrupting CPU-local storage access.

  3. FS base: The user’s fsbase is stored in UserContext and restored via wrfsbase. The kernel uses GS (not FS) for CPU-local storage, so a user-controlled FS base cannot affect kernel state.

The above safety measures are key to achieving the following invariant:

Safety Invariant: User space cannot manipulate kernel-mode CPU state.

VmSpace

VmSpace manages a user-space page table. To modify page table mappings, clients obtain a CursorMut via VmSpace::cursor_mut(guard, &range), which gives exclusive access to a virtual address range within the page table.

The key safety invariant of VmSpace is:

Safety Invariant. Only untyped frames and I/O memory can be mapped into user space.

CursorMut exposes exactly two methods for creating mappings, and neither accepts typed frames — mappings can be backed only by untyped frames or I/O memory.

TLB coherence: When frames are unmapped from user space, they are wrapped in RcuDrop and attached to the TlbFlusher, which tracks which CPUs have cached the translation.

Safety Invariant. The frames are not freed until TLB flush IPIs have been sent to all relevant CPUs and acknowledged.

This prevents use-after-free via stale TLB entries.

Accessing User Memory from the Kernel

The kernel frequently needs to read/write user-space memory (e.g., for read()/write() syscalls). Because user space can modify its own pages at any time, creating a Rust reference (&T or &mut T) to user memory would violate Rust’s aliasing rules — the same problem that motivates the VmReader/VmWriter interface for untyped frames.

Safety Invariant. The kernel never creates Rust references to user memory.

VmSpace::reader(vaddr, len) and VmSpace::writer(vaddr, len) are the only safe APIs for user memory access. They return VmReader<Fallible> / VmWriter<Fallible> cursors — the same raw-pointer-based mechanism used for untyped frame access, but operating on user virtual addresses instead of physical frame contents. The Fallible marker (as opposed to Infallible for frame access) indicates that individual reads and writes may trigger page faults (e.g., on unmapped or copy-on-write pages); OSTD handles these faults gracefully and propagates the error to the caller.

These methods also enforce two runtime checks before constructing the cursors:

  • Page table identity: The call fails if the VmSpace’s page table is not the one currently active on this CPU, preventing access to another task’s address space.
  • Address range: The call fails if the address range extends beyond the user-space address limit, preventing accidental access to kernel memory through this API.
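The runtime checks and fault propagation can be illustrated with a toy model. `ToyVmSpace` is a hypothetical stand-in; real OSTD handles hardware page faults rather than hash-map lookups:

```rust
use std::collections::HashMap;

/// Toy user-space address limit (the x86-64 canonical lower half).
const USER_SPACE_LIMIT: u64 = 0x0000_8000_0000_0000;

struct ToyVmSpace {
    /// Mapped user pages, keyed by page-aligned virtual address.
    pages: HashMap<u64, [u8; 4096]>,
}

impl ToyVmSpace {
    /// Fallible read: fails on out-of-limit ranges or unmapped pages,
    /// mirroring how VmReader<Fallible> propagates page faults.
    fn reader(&self, vaddr: u64, len: u64) -> Result<Vec<u8>, &'static str> {
        // Runtime check: the range must stay within the user-space limit.
        if vaddr.checked_add(len).map_or(true, |end| end > USER_SPACE_LIMIT) {
            return Err("address range exceeds user-space limit");
        }
        let mut out = Vec::new();
        for addr in vaddr..vaddr + len {
            // An unmapped page models a page fault, surfaced as an error.
            let page = self.pages.get(&(addr & !0xFFF)).ok_or("page fault")?;
            out.push(page[(addr & 0xFFF) as usize]);
        }
        Ok(out)
    }
}
```

The key property mirrored here is that failure is an ordinary `Err` value the caller must handle, never UB: the kernel never holds a Rust reference into the user page.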

Safe Kernel-Peripheral Interactions

OSTD provides APIs for device drivers to interact with peripheral hardware: DMA memory, MMIO, PIO, and interrupt handling. All are designed so that a safe-Rust driver cannot compromise kernel memory safety.

DMA Safety

DMA (Direct Memory Access) allows peripheral devices to read/write physical memory independently of the CPU. Without protection, a malicious or buggy device could DMA into kernel code, page tables, or the heap — catastrophic for memory safety.

OSTD provides two DMA abstractions, DmaCoherent and DmaStream, both restricted to untyped memory.

IOMMU enforcement: When IOMMU DMA remapping is enabled (the default on x86), OSTD configures the IOMMU so that no physical memory is DMA-accessible by default. DMA mappings are created on-demand via iommu::map, which inserts entries into the IOMMU page table. Only addresses that have been explicitly mapped via DmaCoherent or DmaStream are accessible to devices. Since these APIs only accept untyped memory, the IOMMU page table only ever contains mappings to untyped frames.

Safety Invariant. Devices can only DMA to untyped memory.

A peripheral device can only DMA to addresses present in the IOMMU page table. The IOMMU page table only maps untyped memory. Untyped memory does not host Rust objects. Therefore, even a malicious device performing arbitrary DMA cannot corrupt any Rust data structure.

Without IOMMU (when hardware lacks it), OSTD cannot enforce DMA restrictions. In this case, the threat model explicitly excludes malicious devices — but the API-level restriction (untyped-only) still prevents accidental DMA to sensitive memory by well-behaved drivers.
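The default-deny remapping policy can be sketched as a toy model. `ToyIommu` is hypothetical; real OSTD programs hardware IOMMU page tables, and the untyped-only restriction is enforced by the type system rather than a boolean flag:

```rust
use std::collections::HashSet;

/// Toy IOMMU: a device access succeeds only for explicitly mapped pages,
/// and only untyped pages may ever be mapped.
struct ToyIommu {
    mapped: HashSet<u64>, // device-visible page frame numbers
}

impl ToyIommu {
    fn new() -> Self {
        // Default policy: no physical memory is DMA-accessible.
        Self { mapped: HashSet::new() }
    }

    /// Map a page for DMA. `is_untyped` models the compile-time fact
    /// that DmaCoherent/DmaStream only accept untyped memory.
    fn map(&mut self, pfn: u64, is_untyped: bool) -> Result<(), &'static str> {
        if !is_untyped {
            return Err("only untyped frames may be DMA-mapped");
        }
        self.mapped.insert(pfn);
        Ok(())
    }

    /// A device DMA transaction: permitted only for mapped pages.
    fn device_access(&self, pfn: u64) -> bool {
        self.mapped.contains(&pfn)
    }
}
```

Because the map operation rejects typed memory and the table starts empty, even an adversarial device issuing arbitrary DMA can only ever reach untyped frames.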

Interrupt Remapping

Beyond DMA, devices can also send interrupts — and on x86, interrupt delivery involves writing to a physical address (the APIC’s message signaled interrupt region). Without protection, a device could forge interrupts to arbitrary vectors, potentially triggering unexpected trap handlers.

OSTD enables IOMMU interrupt remapping, which interposes a translation table between device interrupt requests and CPU interrupt delivery. The interrupt remapping table is managed exclusively by OSTD — no client can access or modify it.

Safety Invariant. Devices cannot inject interrupts to arbitrary vectors.

All interrupt delivery is mediated by the remapping table, which OSTD controls exclusively.

I/O Memory Access Control

MMIO (Memory-Mapped I/O) allows the CPU to interact with device registers by reading/writing specific physical addresses. Some MMIO regions are sensitive (APIC, IOMMU registers) and must not be accessible to device drivers.

OSTD’s IoMem type has two sensitivity levels, enforced at the type level:

pub struct IoMem<S: SecuritySensitivity> { ... }
  • IoMem<Sensitive>: The read_once and write_once methods are pub(crate) unsafe. Only OSTD-internal system drivers (APIC, IOMMU, I/O APIC) can access sensitive I/O memory.
  • IoMem<Insensitive>: Provides safe reader() and writer() methods via the HasVmReaderWriter trait. Available to device drivers via IoMem::acquire(range).

The IoMemAllocator enforces the boundary:

  1. At initialization, it is populated with all MMIO address ranges that do not overlap with RAM (from firmware tables).
  2. System device ranges (APIC, IOMMU, etc.) are removed via remove().
  3. Only the remaining ranges are available for acquire().

Safety Invariant. Device drivers cannot access sensitive I/O memory through any safe API.

The IoMemAllocator has removed all sensitive ranges before clients can access it. Even if a driver obtains IoMem<Insensitive> for a peripheral’s MMIO, the worst it can do is misconfigure that specific peripheral — it cannot access kernel-critical hardware.

I/O Port Access Control (x86)

I/O ports, represented as IoPort, are an x86-specific mechanism for device communication. The same sensitivity distinction applies.

The IoPortAllocator removes sensitive ports at initialization. Sensitive ports are declared statically via the sensitive_io_port! macro, which places their addresses in the .sensitive_io_ports linker section. At boot, OSTD reads this section and removes all listed ports from the allocator.

Safety Invariant. Device drivers cannot access sensitive I/O ports through any safe API.

Same enforcement as I/O memory — the IoPortAllocator removes all sensitive ports before clients can access it.

Interrupt Handling

Device interrupts are managed through the IrqLine abstraction. Clients register callbacks but cannot manipulate the IDT, APIC, or interrupt routing directly. All interrupt configuration is mediated by OSTD’s safe APIs.

Safety Invariant. Clients cannot corrupt interrupt delivery mechanisms.

They have no access to the IDT, APIC registers, or interrupt remapping table.

Safe Kernel Logic

OSTD provides preemption control, synchronization primitives, and CPU-local storage for OS services. These must be sound even when used by potentially buggy (but safe-Rust) clients.

Preemption and Atomic Mode

In a preemptive kernel, a running task can be interrupted at almost any point and replaced by another task on the same CPU. Certain operations — such as holding a spinlock or executing an interrupt handler — require that preemption be temporarily disabled. OSTD calls this state atomic mode: a CPU is in atomic mode when preemption is disabled or local IRQs are disabled.

Sleeping while in atomic mode is a classic kernel bug. If a spinlock holder sleeps, the lock remains held while the CPU runs another task; if that task tries to acquire the same lock, a deadlock results. OSTD prevents this unconditionally:

Safety Invariant. A task cannot sleep while in atomic mode.

OSTD tracks atomic mode using guard types. DisabledPreemptGuard disables preemption for its lifetime, and DisabledLocalIrqGuard disables local IRQs (which also implies preemption is disabled). Both guards are !Send — they cannot be moved to another CPU, which would corrupt the CPU-local preemption state.

Every context switch and every sleeping-lock wait path checks that the current CPU is not in atomic mode, panicking if the check fails. This enforcement is comprehensive: a spinlock’s guard puts the CPU in atomic mode, so any attempt to sleep while holding a spinlock is caught.

Synchronization Primitives

OSTD provides six synchronization primitives: SpinLock, Mutex, RwLock, RwMutex, Rcu, and WaitQueue. Their soundness rests on two design principles:

Principle 1: Guards are !Send. Every lock guard type (SpinLockGuard, MutexGuard, RwLockReadGuard, etc.) is !Send. This prevents a client from acquiring a lock on one CPU and releasing it on another — which would corrupt the CPU-local preemption state (since the embedded preemption or IRQ guard is tied to the acquiring CPU).

Principle 2: Atomic mode is entered before lock acquisition. SpinLock::lock() disables preemption (or IRQs) before entering the CAS loop. If the order were reversed (acquire lock, then disable preemption), there would be a window where the lock is held but preemption is still enabled — an interrupt could preempt the task, and if the interrupt handler tries to acquire the same lock, a deadlock results.

The kind of guard — preemption-only (PreemptDisabled) or IRQ-disabled (LocalIrqDisabled) — is chosen at lock declaration time and statically enforced by the type system, ensuring consistent usage throughout the lock’s lifetime.
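Principle 2 can be illustrated with a std-only toy spinlock that tracks a per-thread "preemption count". All names are hypothetical; real OSTD guards are also !Send and use CPU-local, not thread-local, state:

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicBool, Ordering};

thread_local! {
    /// Models the CPU-local preemption-disable depth.
    static PREEMPT_COUNT: Cell<u32> = Cell::new(0);
}

struct ToySpinLock {
    locked: AtomicBool,
}

impl ToySpinLock {
    fn lock(&self) -> ToyGuard<'_> {
        // Step 1: enter atomic mode FIRST...
        PREEMPT_COUNT.with(|c| c.set(c.get() + 1));
        // Step 2: ...and only then spin on the lock word, so there is
        // no window where the lock is held but preemption is enabled.
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        ToyGuard { lock: self }
    }
}

/// The guard leaves atomic mode on drop, after releasing the lock.
struct ToyGuard<'a> {
    lock: &'a ToySpinLock,
}

impl Drop for ToyGuard<'_> {
    fn drop(&mut self) {
        self.lock.locked.store(false, Ordering::Release);
        PREEMPT_COUNT.with(|c| c.set(c.get() - 1));
    }
}

/// The sleep paths check this and panic if it returns true.
fn in_atomic_mode() -> bool {
    PREEMPT_COUNT.with(|c| c.get() > 0)
}
```

While the guard is alive, `in_atomic_mode()` is true, so any attempt to sleep would be caught by the check described above.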

RCU (Read-Copy-Update)

Rcu<P> provides lock-free read access with deferred reclamation:

  • Read side: Rcu::read() disables preemption and loads the pointer with Acquire ordering. The read-side critical section is non-preemptible.
  • Write side: Rcu::update(new_ptr) atomically swaps the pointer and enqueues the old value for deferred drop.
  • Grace period: OSTD detects when all CPUs have left their read-side critical sections by observing context switches. A CPU that has performed a context switch must have passed through preemption-enabled state (since the context switch path enforces that), and therefore cannot still be inside a read-side critical section. Once all CPUs have been observed in this state, deferred drops are executed safely.

Safety Invariant. No safe API allows a client to violate lock invariants.

The guard type system (!Send, atomic mode tracking, statically-chosen guard kinds) ensures that locks are used correctly regardless of client behavior.

CPU-Local Storage

OSTD provides two CPU-local storage mechanisms:

cpu_local! (static variables): Access requires a DisabledLocalIrqGuard via CpuLocal::get_with(&irq_guard). This ensures two properties:

  • The task is pinned to the current CPU (IRQs disabled implies no preemption and no migration).
  • No interrupt handler can observe a partially-modified state.

For Sync types, CpuLocal::get_on_cpu(target_cpu) allows remote access without a guard (since Sync types are safe to share across threads).

cpu_local_cell! (cell variables): These provide inner mutability through operations that are atomic with respect to interrupts on the same CPU. OSTD uses this mechanism internally for preemption state and current-task tracking, which must be modified without risk of interrupt interference.

Safety Invariant. No safe API allows a client to access another CPU’s local storage without proper synchronization.

Static CPU-local variables require an IRQ guard to access, ensuring CPU pinning. Cell variables use interrupt-safe operations by construction.

Safe Policy Injection

OSTD separates mechanism (in the TCB) from policy (outside). Three complex policy components — task scheduler, frame allocator, and slab allocator — are implemented outside OSTD in safe Rust and injected via safe trait interfaces. This keeps the TCB minimal while allowing sophisticated, evolving implementations.

The key principle is: OSTD’s mechanisms enforce soundness independently of policy correctness. A buggy or adversarial (but safe-Rust) policy may cause incorrect behavior (wrong scheduling decisions, suboptimal memory allocation), but it cannot cause UB.

Task Scheduler

The scheduler is injected via two safe traits: Scheduler<T> (global scheduling decisions) and LocalRunQueue<T> (per-CPU run queue management). Neither trait contains unsafe methods.

The threat: A buggy scheduler might return the same task as the “next task” on two different CPUs simultaneously, or return a task that is already running.

The defense: The switched_to_cpu: AtomicBool flag in each Task. In before_switching_to(), OSTD performs compare_exchange(false, true, AcqRel, Relaxed). If the flag is already true (the task is running on another CPU), the CAS fails and the CPU spins until the flag becomes false.

This defense is unconditional — it does not depend on the scheduler being correct. Even if the scheduler’s LocalRunQueue::pick_next() returns a task that is actively running on another CPU, the CAS loop prevents the second CPU from entering the task’s context. The first CPU will eventually switch away from the task (setting switched_to_cpu = false in after_switching_to), at which point the second CPU can proceed.

Safety Invariant. A task never executes on two CPUs simultaneously, regardless of scheduler behavior.

Running a task on two CPUs would mean two CPUs share a single kernel stack — a catastrophic memory safety violation. The switched_to_cpu CAS prevents this unconditionally.
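The CAS defense can be modeled in a few lines of std-only Rust. `ToyTask` is hypothetical; the atomic orderings mirror those described above:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

struct ToyTask {
    switched_to_cpu: AtomicBool,
}

impl ToyTask {
    /// Called before a CPU enters the task's context. Returns true on
    /// success; false means the task is already running elsewhere, in
    /// which case the real kernel spins until it becomes available.
    fn try_switch_to(&self) -> bool {
        self.switched_to_cpu
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Relaxed)
            .is_ok()
    }

    /// Called after a CPU switches away from the task.
    fn switched_away(&self) {
        self.switched_to_cpu.store(false, Ordering::Release);
    }
}
```

Even if a buggy scheduler hands the same task to two CPUs, only one CAS can succeed, so only one CPU ever uses the task's kernel stack at a time.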

Frame Allocator

The frame allocator is injected via the GlobalFrameAllocator trait (safe). The allocator decides which physical addresses to return for allocation requests.

The threat: A buggy allocator might return an address that is already in use (double-allocation), an address outside physical memory, or an address in a reserved region.

The defense: Frame::from_unused(paddr, metadata) enforces the invariant by checking the frame metadata:

Safety Invariant. The allocator cannot trick OSTD into creating two Frame handles to the same physical page.

The metadata system acts as a second line of defense behind the allocator. Even if the allocator returns a wrong address, from_unused will detect the error (frame already in use, or out of range) and reject it. The atomic CAS on the reference count prevents double-allocation.

Slab Allocator

The slab allocator is injected via the GlobalHeapAllocator trait (safe).

The threat: A buggy slab allocator might return a slot that is already allocated (double-allocation), a slot with wrong size/alignment, or a slot from a freed slab.

The defenses: OSTD controls key slab allocator building blocks such as Slab and HeapSlot to prevent an injected slab allocator from tricking OSTD into using invalid memory slots.

Safety Invariant. A Slab cannot be freed while any of its slots are still allocated.

SlabMeta::alloc() increments and Slab::dealloc(slot) decrements nr_allocated. The slab panics on drop if nr_allocated > 0, preventing use-after-free of slab memory. Additionally, Slab::dealloc(slot) validates that the slot’s physical address falls within the slab’s range, rejecting slots that do not belong to it.
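The liveness check can be sketched as a std-only toy. `ToySlab` is hypothetical; the real Slab additionally validates that a deallocated slot's address belongs to the slab:

```rust
/// Toy slab: dropping it while slots are still live panics,
/// preventing use-after-free of the slab's backing memory.
struct ToySlab {
    nr_allocated: u32,
}

impl ToySlab {
    fn alloc(&mut self) {
        self.nr_allocated += 1;
    }
    fn dealloc(&mut self) {
        self.nr_allocated -= 1;
    }
}

impl Drop for ToySlab {
    fn drop(&mut self) {
        if self.nr_allocated > 0 {
            panic!("slab dropped while {} slots are live", self.nr_allocated);
        }
    }
}
```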

Safety Invariant. Every heap allocation is backed by a slot whose size and alignment match the requested layout.

AllocDispatch (the GlobalAlloc shim) validates every HeapSlot returned by the injected allocator: the slot’s size must match the size determined by slot_size_from_layout, it must be at least as large as the requested layout, and its address must satisfy the layout’s alignment requirement. Mismatches cause abort before the memory is used.
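Two of these checks can be sketched as a small predicate. This is an illustrative model: `validate_slot` is a hypothetical name, and the `slot_size_from_layout` equality check is omitted here:

```rust
use std::alloc::Layout;

/// Model of the layout validation the GlobalAlloc shim performs on
/// every slot returned by the injected allocator.
fn validate_slot(slot_addr: usize, slot_size: usize, layout: Layout) -> bool {
    slot_size >= layout.size() // slot is large enough for the request
        && slot_addr % layout.align() == 0 // slot satisfies alignment
}
```

A slot that fails validation is never handed to the caller; the real shim aborts instead of returning misaligned or undersized memory.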

Safety Invariant. Client code cannot forge a HeapSlot.

HeapSlot::new() is pub(super) unsafe, so only OSTD’s heap module can create HeapSlot values. The injected allocator receives HeapSlots from SlabMeta::alloc() or HeapSlot::alloc_large(size) and must return them through GlobalHeapAllocator::dealloc() — it cannot construct arbitrary slots pointing to sensitive memory.

Other Injected Callbacks

Beyond the three complex policies above, OSTD has simpler injection points for logging, power management, scheduling hooks, page fault handling, SMP boot, interrupt bottom halves, and timer callbacks.

These callbacks are fundamentally different from the scheduler and allocators: they do not return information that OSTD must validate for correctness. A buggy scheduler can return the wrong task; a buggy allocator can return the wrong address. These callbacks simply do something when invoked — log a message, handle a fault, process deferred work.

Safety Invariant. No injected callback can compromise memory safety.

Two properties guarantee this. First, every callback is a safe function pointer or trait object, and the kernel enforces #![deny(unsafe_code)], so injected code is constrained to safe Rust. Second, OSTD’s execution context at each invocation point prevents the most dangerous side effects. Schedule hooks and bottom-half handlers run with IRQs or preemption disabled, which prevents sleeping and limits reentrancy. Timer callbacks run in IRQ-disabled interrupt context with a RefCell borrow guard that detects reentrant registration. Power handlers are divergent (-> !) with a machine-halt fallback.

The one notable exception is the logger: it can be called from virtually any context — normal tasks, interrupt handlers, panic handlers, and while OSTD holds internal locks. A logger that allocates, sleeps, or holds its own lock risks deadlock or cascading panics. The doc comment on inject_logger warns that the implementation must be simple, non-sleeping, and heapless, but this is advisory rather than enforced. Even so, the consequences are liveness failures (hangs, aborts) — not memory safety violations.
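The injection pattern behind these callbacks can be sketched with std types standing in for OSTD's mechanism. The signature of inject_logger here is hypothetical; the point is the "safe function pointer" boundary, not OSTD's actual API:

```rust
use std::sync::OnceLock;

// The registered callback is a safe `fn` pointer: whatever the kernel
// injects is ordinary safe Rust, checked by the compiler.
static LOGGER: OnceLock<fn(&str)> = OnceLock::new();

fn inject_logger(logger: fn(&str)) {
    LOGGER.set(logger).expect("logger already injected");
}

fn log(msg: &str) {
    // If no logger was injected, the message is silently dropped.
    if let Some(logger) = LOGGER.get() {
        logger(msg);
    }
}
```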

OSDK User Guide

Overview

The Asterinas OSDK (short for Operating System Development Kit) is designed to simplify the development of Rust operating systems. It aims to streamline the process by leveraging the framekernel architecture.

The OSDK provides a command-line tool cargo-osdk, which facilitates the management of projects developed on the framekernel architecture. cargo-osdk can be used as a subcommand of Cargo. Much like Cargo for Rust projects, cargo-osdk enables building, running, and testing projects conveniently.

Install OSDK

Requirements

Currently, OSDK is only supported on x86_64 Ubuntu systems. We will add support for more operating systems in the future.

To run a kernel developed by OSDK with QEMU, the following tools need to be installed:

  • Rust >= 1.75.0
  • cargo-binutils
  • gcc
  • gdb
  • grub
  • ovmf
  • qemu-system-x86_64
  • xorriso

The dependencies required for installing Rust and running QEMU can be installed by:

apt install build-essential curl gdb grub-efi-amd64 grub2-common \
    libpixman-1-dev mtools ovmf qemu-system-x86 xorriso

For instructions on how to install Rust, you can refer to the official site.

cargo-binutils can be installed after Rust is installed by

cargo install cargo-binutils

Install

cargo-osdk is published on crates.io, and can be installed by running

cargo install cargo-osdk

Upgrade

If cargo-osdk is already installed, the tool can be upgraded by running

cargo install --force cargo-osdk

Why OSDK

OSDK is designed to elevate the development experience for Rust OS developers to the ease and convenience typically associated with Rust application development. Imagine crafting operating systems with the same simplicity as applications! This is important to Asterinas as we believe that the project’s success is intricately tied to the productivity and happiness of its developers. So the OSDK is here to upgrade your dev experience.

To be honest, writing OS kernels is hard. Even when you’re using Rust, which is a total game-changer for OS devs, the challenge stands tall. There are a bunch of reasons.

First, it is hard to write a new kernel from scratch. Everything that has been taken for granted by application developers is gone: no stack, no heap, no threads, not even the standard I/O. It’s just you and the no_std world of Rust. You have to implement these basic programming primitives by getting your hands dirty with the most low-level, error-prone, and nitty-gritty details of computer architecture. It’s a journey of learning, doing, and a whole lot of finger-crossing to make sure everything clicks into place. This means a high entry bar for new OS creators.

Second, it is hard to reuse OS-related libraries/crates across projects. Think about it: most applications share a common groundwork, like libc, Rust’s std library, or an SDK. This isn’t the case with kernels - they lack this shared starting point, forcing each one to craft its own set of tools from the ground up. Take device drivers, for example. They often need DMA-capable buffers for chatting with hardware, but since every kernel has its own DMA API flavor, a driver for one kernel is pretty much a no-go for another. This means that for each new kernel out there, developers find themselves having to ‘reinvent the wheel’ for many core components that are standard in other kernels.

Third, it is hard to do unit tests for OS functionalities. Unit testing plays a crucial role in ensuring code quality, but when you’re dealing with a monolithic kernel like Linux, it’s like a spaghetti bowl of intertwined parts. Trying to isolate one part for testing? Forget about it. You’d have to boot the whole kernel just to test a slice of it. Loadable kernel modules are no exception: you can’t test them without plugging them into a live kernel. This monolithic approach to unit testing is slow and unproductive as it performs the job of unit tests at the price of integration tests. Regardless of the kernel architecture, Rust’s built-in unit testing facility is not suited for kernel development, leaving each kernel to hack together its own testing framework.

Last, it is hard to avoid writing unsafe Rust in a Rust kernel. Rust brings safety… well, at least for Rust applications, where you can pretty much stay in the wonderland of safe Rust all the way through. But for a Rust kernel, one cannot help but use unsafe Rust. This is because, among other reasons, low-level operations (e.g., managing page tables, doing context switching, handling interrupts, and interacting with devices) have to be expressed with unsafe Rust features (like executing assembly code or dereferencing raw pointers). The misuse of unsafe Rust could lead to various safety and security issues, as reported by RustSec Advisory Database. Despite having a whole book to document “the Dark Arts of Unsafe Rust”, unsafe Rust is still tricky to use correctly, even among seasoned Rust developers.

Creating an OS Project

The OSDK can be used to create a new kernel project or a new library project. A kernel project defines the entry point of the kernel and can be run with QEMU. A library project can provide certain OS functionalities and be imported by other OSes.

Creating a new kernel project

Creating a new kernel project is simple. You only need to execute the following command:

cargo osdk new --kernel myos

Creating a new library project

Creating a new library project requires just one command:

cargo osdk new mylib

Generated files

Next, we will introduce the contents of the generated project in detail. If you don’t wish to delve into the details, you can skip the following sections.

Overview

The generated directory for both the kernel project and library project contains the following contents:

myos/
├── Cargo.toml
├── OSDK.toml
├── rust-toolchain.toml
└── src/
    └── lib.rs

src/lib.rs

Kernel project

The src/lib.rs file contains the code for a simple kernel. The function marked with the #[ostd::main] macro is considered the kernel entry point by OSDK. The kernel will print Hello world from guest kernel! to the console and then abort.

#![no_std]
#![deny(unsafe_code)]

use ostd::prelude::*;

#[ostd::main]
fn kernel_main() {
    println!("Hello world from guest kernel!");
}

Library project

The src/lib.rs of a library project contains only a simple kernel mode unit test. It follows a similar code pattern as user mode unit tests. The test module is marked with the #[cfg(ktest)] attribute, and each test case is marked with #[ktest].

#![no_std]
#![deny(unsafe_code)]

#[cfg(ktest)]
mod tests {
    use ostd::prelude::*;

    #[ktest]
    fn it_works() {
        let memory_regions = &ostd::boot::boot_info().memory_regions;
        assert!(!memory_regions.is_empty());
    }
}

Cargo.toml

The Cargo.toml file is the Rust project manifest. In addition to the contents of a normal Rust project, OSDK will add the dependencies of the Asterinas OSTD to the file. The dependency version may change over time.

[dependencies.ostd]
git = "https://github.com/asterinas/asterinas"
branch = "main"

OSDK will also exclude the directory which is used to generate temporary files.

[workspace]
exclude = ["target/osdk/base"]

OSDK.toml

The OSDK.toml file is a manifest that defines the exact behavior of OSDK. By default, it includes settings on how to start QEMU to run a kernel. The meaning of each key can be found in the manifest documentation. Please avoid changing the default settings unless you know what you are doing.

The default manifest of a kernel project:

project_type = "kernel"

[boot]
method = "grub-rescue-iso"

[qemu]
args = """\
    -machine q35,kernel-irqchip=split \
    -cpu Icelake-Server,+x2apic \
    --no-reboot \
    -m 8G \
    -smp 1 \
    -nographic \
    -serial chardev:mux \
    -monitor chardev:mux \
    -chardev stdio,id=mux,mux=on,signal=off \
    -display none \
    -device isa-debug-exit,iobase=0xf4,iosize=0x04 \
    -drive if=pflash,format=raw,unit=0,readonly=on,file=/root/ovmf/release/OVMF.fd \
    -drive if=pflash,format=raw,unit=1,file=/root/ovmf/release/OVMF_VARS.fd \
"""

rust-toolchain.toml

The Rust toolchain for the kernel. It is the same as the toolchain of the Asterinas OSTD.

Running or Testing an OS Project

The OSDK allows for convenient building, running, and testing of an OS project. The following example shows the typical workflow.

Suppose you have created a new kernel project named myos and you are in the project directory:

cargo osdk new --kernel myos && cd myos

Build the project

To build the project and its dependencies, simply type:

cargo osdk build

The initial build of an OSDK project may take a considerable amount of time as it involves downloading the Rust toolchain used by the framekernel. However, this is a one-time process.

Run the project

To launch the kernel with QEMU, use the following command:

cargo osdk run

OSDK will boot the kernel and initialize OS resources like the console for output, and then hand over control to the kernel entry point to execute the kernel code.

Note: Only kernel projects (the projects that define a function marked with #[ostd::main]) can be run; library projects cannot.

Test the project

Suppose you have created a new library project named mylib which contains a default test case and you are in the project directory.

cargo osdk new --lib mylib && cd mylib

To run the kernel mode tests, use the following command:

cargo osdk test

OSDK will run all the kernel mode tests in the crate.

Test cases can be added not only in library projects but also in kernel projects.

If you want to run a specific test with a given name, for example, if the test is named foo, use the following command:

cargo osdk test foo

Options

All of the build, run, and test commands accept options to control their behavior, such as how to compile and launch the kernel. The following documentation provides details on all the available options:

Working in a Workspace

Typically, an operating system may consist of multiple crates, and these crates may be organized in a workspace. The OSDK also supports managing projects in a workspace. Below is an example that demonstrates how to create, build, run, and test projects in a workspace.

Creating a new workspace

Create a new workspace by executing the following commands:

mkdir myworkspace && cd myworkspace
touch Cargo.toml

Then, add the following content to Cargo.toml:

[workspace]
members = []
resolver = "2"

Creating a kernel project and a library project

The two projects can be created using the following commands:

cargo osdk new --kernel myos
cargo osdk new mylib

The generated directory structure will be as follows:

myworkspace/
  ├── Cargo.toml
  ├── OSDK.toml
  ├── rust-toolchain.toml
  ├── myos/
  │   ├── Cargo.toml
  │   └── src/
  │       └── lib.rs
  └── mylib/
      ├── Cargo.toml
      └── src/
          └── lib.rs

At present, the OSDK mandates that there must be only one kernel project within a workspace.

In addition to the two projects, the OSDK will also generate OSDK.toml and rust-toolchain.toml at the root of the workspace.

Next, add the following function to mylib/src/lib.rs. This function will calculate the available memory after booting:

// SPDX-License-Identifier: MPL-2.0

pub fn available_memory() -> usize {
    let regions = &ostd::boot::boot_info().memory_regions;
    regions.iter().map(|region| region.len()).sum()
}

Then, add a dependency on mylib to myos/Cargo.toml:

[dependencies]
mylib = { path = "../mylib" }

In myos/src/lib.rs, modify the file content as follows. This main function will call the function from mylib:

// SPDX-License-Identifier: MPL-2.0

#![no_std]
#![deny(unsafe_code)]

use ostd::prelude::*;

#[ostd::main]
fn kernel_main() {
    let avail_mem_as_mb = mylib::available_memory() / 1_000_000;
    println!("The available memory is {} MB", avail_mem_as_mb);
}

Building and Running the kernel

Build and run the project using the following commands:

cargo osdk build
cargo osdk run

If everything goes well, you will see the output from the guest kernel.

Running unit test

You can run test cases from all crates by using the following command in the workspace folder:

cargo osdk test

If you want to run test cases from a specific crate, navigate to the crate’s folder and run cargo osdk test. For example, if you want to test mylib, use the following command:

cd mylib && cargo osdk test

Advanced topics

This chapter delves into advanced topics regarding OSDK, including its application in TEE environments such as Intel TDX.

Running an OS in Intel TDX env

The OSDK supports running your OS in an Intel TDX environment conveniently. Intel TDX can provide a more secure environment for your OS.

Prepare the Intel TDX Environment

Please make sure your server supports Intel TDX.

See this guide or other materials to enable Intel TDX in host OS.

To verify the TDX host status, you can type:

dmesg | grep "TDX module initialized"

The following result is an example:

[   20.507296] tdx: TDX module initialized.

If you see the message “TDX module initialized”, it means the TDX module has loaded successfully.

The Intel TDX environment requires TDX-enhanced versions of QEMU, KVM, GRUB, and other essential software for running an OS. Therefore, it is recommended to use a Docker image to deploy the environment.

Run a TDX Docker container:

docker run -it --privileged --network=host --device=/dev/kvm asterinas/osdk:0.17.1-20260317

Edit OSDK.toml for Intel TDX support

As Intel TDX has extra requirements or restrictions for VMs, it demands adjusting the OSDK configurations accordingly. This can be easily achieved with the scheme feature of the OSDK, which provides a convenient way to override the default OSDK configurations for a specific environment.

For example, you can append the following TDX-specific scheme to your OSDK.toml file.

[scheme."tdx"]
supported_archs = ["x86_64"]
boot.method = "grub-qcow2"
grub.mkrescue_path = "~/tdx-tools/grub"
grub.boot_protocol = "linux"
qemu.args = '''\
    -accel kvm \
    -m 8G \
    -vga none \
    -monitor pty \
    -nodefaults \
    -drive file=target/osdk/asterinas/asterinas.qcow2,if=virtio,format=qcow2 \
    -monitor telnet:127.0.0.1:9001,server,nowait \
    -bios /root/ovmf/release/OVMF.fd \
    -object '{ \"qom-type\": \"tdx-guest\", \"id\": \"tdx0\", \"sept-ve-disable\": true, \"quote-generation-socket\": { \"type\": \"vsock\", \"cid\": \"2\", \"port\": \"4050\" } }' \
    -cpu host,-kvm-steal-time,pmu=off \
    -machine q35,kernel-irqchip=split,confidential-guest-support=tdx0 \
    -smp 1 \
    -nographic \
'''

To choose the configurations specified by the TDX scheme over the default ones, add the --scheme argument to the build, run, or test command.

cargo osdk build --scheme tdx
cargo osdk run --scheme tdx
cargo osdk test --scheme tdx

OSDK User Reference

The Asterinas OSDK is a command line tool that can be used as a subcommand of Cargo. The common usage of OSDK is:

cargo osdk <COMMAND>

You can use cargo osdk -h to see the full list of available commands. For the specific usage of a subcommand, you can use cargo osdk help <COMMAND>.

Manifest

The OSDK utilizes a manifest named OSDK.toml to define its precise behavior regarding how to run a kernel with QEMU. The OSDK.toml file should be placed in the same folder as the project’s Cargo.toml. The Manifest documentation provides an introduction to all the available configuration options.

The command line tool can also be used to set the options in the manifest. If both occur, the command line options will always take priority over the options in the manifest. For example, if the manifest defines the path of QEMU as:

[qemu]
path = "/usr/bin/qemu-system-x86_64"

But the user provides a new QEMU path when running the project using:

cargo osdk run --qemu.path="/usr/local/qemu-kvm"

Then, the actual path of QEMU should be /usr/local/qemu-kvm since command line options have higher priority.
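This precedence rule can be sketched as a one-liner (the built-in fallback path shown here is an assumption for illustration):

```rust
// Command-line value wins over the manifest value, which wins over a
// built-in default (the default shown here is hypothetical).
fn effective_qemu_path<'a>(cli: Option<&'a str>, manifest: Option<&'a str>) -> &'a str {
    cli.or(manifest).unwrap_or("qemu-system-x86_64")
}
```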

Commands

OSDK provides similar subcommands as Cargo, and these subcommands have similar meanings as corresponding Cargo subcommands.

Currently, OSDK supports the following subcommands:

  • new: Create a new kernel package or library package
  • build: Compile the project and its dependencies
  • run: Run the kernel with a VMM
  • test: Execute kernel mode unit tests by starting a VMM
  • debug: Debug a remote target via GDB
  • profile: Profile a remote GDB debug target to collect stack traces
  • check: Analyze the current package and report errors
  • clippy: Check the current package and catch common mistakes

The new, build, run, test and debug subcommands can accept additional options, while the check and clippy subcommands can only accept arguments that are compatible with the corresponding Cargo subcommands.

cargo osdk new

Overview

The cargo osdk new command is used to create a kernel project or a new library project. The usage is as follows:

cargo osdk new [OPTIONS] <name>

Arguments

<name>: the name of the crate.

Options

--kernel: Use the kernel template. If this option is not set, the library template will be used by default.

--library: Use the library template. This is the default option.

Examples

  • Create a new kernel named myos:
cargo osdk new --kernel myos
  • Create a new library named mylib:
cargo osdk new mylib

cargo osdk build

Overview

The cargo osdk build command is used to compile the project and its dependencies. The usage is as follows:

cargo osdk build [OPTIONS]

Options

The options can be divided into two types: Cargo options that can be accepted by Cargo, and Manifest options that can also be defined in the manifest named OSDK.toml.

Cargo options

  • --profile <PROFILE>: Build artifacts with the specified Cargo profile (built-in candidates are ‘dev’, ‘release’, ‘test’, and ‘bench’) [default: dev]

  • --release: Build artifacts in release mode, with optimizations

  • --features <FEATURES>: Space or comma separated list of features to activate

  • --no-default-features: Do not activate the default features

  • --config <KEY=VALUE>: Override a configuration value

More Cargo options will be supported in future versions of OSDK.

Manifest options

These options can also be defined in the project’s manifest named OSDK.toml. Command line options are used to override or append values in OSDK.toml. The allowed values for each option can be found in the Manifest Documentation.

  • --kcmd-args <ARGS>: Command line arguments for the guest kernel
  • --init-args <ARGS>: Command line arguments for the init process
  • --initramfs <PATH>: Path of the initramfs
  • --boot-method <METHOD>: The method to boot the kernel
  • --grub-mkrescue <PATH>: Path of grub-mkrescue
  • --grub-boot-protocol <PROTOCOL>: The boot protocol for booting the kernel
  • --display-grub-menu: To display the GRUB menu if booting with GRUB
  • --qemu-exe <FILE>: The QEMU executable file
  • --qemu-args <ARGS>: Extra arguments for running QEMU
  • --strip-elf: Whether to strip the built kernel ELF using rust-strip
  • --scheme <SCHEME>: Select the specific configuration scheme provided in the OSDK manifest
  • --encoding <FORMAT>: Denote the encoding format for kernel self-decompression

Examples

  • Build a project with ./initramfs.cpio.gz as the initramfs and multiboot2 as the boot protocol used by GRUB:
cargo osdk build --initramfs="./initramfs.cpio.gz" --grub-boot-protocol="multiboot2"
  • Build a project and append sh, -l to init process arguments:
cargo osdk build --init_args="sh" --init_args="-l"

cargo osdk run

Overview

cargo osdk run is used to run the kernel with QEMU. The usage is as follows:

cargo osdk run [OPTIONS]

Options

Most options are the same as those of cargo osdk build. Refer to the documentation of cargo osdk build for more details.

Additionally, when running the kernel using QEMU, we can set up QEMU as a debug server using the option --gdb-server. This option supports an additional comma-separated configuration list:

  • addr=ADDR: the network or unix socket address on which the GDB server listens (default: .osdk-gdb-socket, a local UNIX socket);
  • wait-client: let the GDB server wait for the GDB client before execution;
  • vscode: generate a ‘.vscode/launch.json’ for debugging with Visual Studio Code (Requires CodeLLDB).

Besides, to collect coverage data, we can use the option --coverage. This option enables the coverage feature and collects coverage data into coverage.profraw on exit.

See Debug Command to interact with the GDB server in terminal.

Examples

Launch a debug server via QEMU with a UNIX socket stub, e.g., .debug:

cargo osdk run --gdb-server addr=.debug


Launch a debug server via QEMU with a TCP stub, e.g., localhost:1234:

cargo osdk run --gdb-server addr=:1234

Launch a debug server via QEMU and use VSCode to interact with:

cargo osdk run --gdb-server wait-client,vscode,addr=:1234


cargo osdk test

cargo osdk test is used to execute kernel mode unit tests by starting QEMU. The usage is as follows:

cargo osdk test [TESTNAME] [OPTIONS] 

Arguments

TESTNAME: Only run tests containing this string in their names

Options

The options are the same as those of cargo osdk build. Refer to the documentation of cargo osdk build for more details.

Examples

  • Execute tests that include foo in their names using QEMU with 3GB of memory
cargo osdk test foo --qemu-args="-m 3G"

cargo osdk debug

Overview

cargo osdk debug is used to debug a remote target via GDB. You need a running debug server to attach to; one can be started with the run subcommand and its --gdb-server option. Then you can use the following command to attach to the server and start debugging.

cargo osdk debug [OPTIONS]

Note that when KVM is enabled, hardware-assisted breakpoints (hbreak) are needed instead of the normal breakpoints (break/b) in GDB.

Options

--remote <REMOTE>: Specify the address of the remote target [default: .osdk-gdb-socket]. The address can be either a path for the UNIX domain socket or a TCP port on an IP address.

Examples

To debug a remote target started with QEMU GDB stub or the run subcommand, use the following commands.

Connect to a UNIX socket, e.g., ./debug:

cargo osdk debug --remote ./debug

Connect to a TCP port ([IP]:PORT), e.g., localhost:1234:

cargo osdk debug --remote localhost:1234

cargo osdk profile

Overview

The profile command is used to collect stack traces when running the target kernel in QEMU. It attaches to the GDB server, initiated with the run subcommand, and collects the stack trace periodically. The collected information can be used to directly generate a flame graph, or be stored for later analysis using the original flame graph tool.

Options

--remote <REMOTE>:

Specify the address of the remote target. By default, this is .osdk-gdb-socket.

--samples <SAMPLES>:

The number of samples to collect (default: 200). At least 100 samples are recommended for meaningful performance analysis.

--interval <INTERVAL>:

The interval between samples in seconds (default 0.1).

--parse <PATH>:

Parse a collected JSON profile file into other formats.

--format <FORMAT>:

Possible values:

  • json: the parsed stack trace log from GDB, in JSON
  • folded: the folded stack trace for flame graphs
  • flame-graph: an SVG flame graph

If the user does not specify the format, it will be inferred from the output file extension. If the output file does not have an extension, the default format is flame graph.

--cpu-mask <CPU_MASK>:

The mask of the CPUs to generate traces for in the output profile data (default: the first 128 cores). This mask is given as an integer.

--output <PATH>:

The path to the output profile data file.

If the user does not specify the output path, it will be generated from the crate name, the current timestamp, and the format.
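Two of the rules above can be sketched in Rust: inferring the output format from the file extension, and testing whether a CPU is selected by the mask. The exact extension-to-format mapping below is an assumption for illustration:

```rust
// Infer the output format from the extension; with no extension (or an
// unrecognized one), fall back to a flame graph.
fn infer_format(output: &str) -> &'static str {
    match output.rsplit_once('.') {
        Some((_, "json")) => "json",
        Some((_, "folded")) => "folded",
        _ => "flame-graph",
    }
}

// CPU `n` is included when bit `n` of the integer mask is set; the
// default of "first 128 cores" corresponds to all low 128 bits set.
fn cpu_in_mask(mask: u128, cpu: u32) -> bool {
    cpu < 128 && (mask >> cpu) & 1 == 1
}
```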

Examples

To profile a remote QEMU GDB server running some workload for flame graph, do:

cargo osdk profile --remote :1234 \
	--samples 100 --interval 0.01

If you want a detailed analysis, do:

cargo osdk profile --remote :1234 \
	--samples 100 --interval 0.01 --output trace.json

Once you have the detailed analysis, you can also use the JSON file to generate the folded format for flame graphs.

cargo osdk profile --parse trace.json --output trace.folded

Manifest

Overview

The OSDK tool utilizes a manifest to define its precise behavior. Typically, the configuration file is named OSDK.toml and is placed in the root directory of the workspace (the same directory as the workspace’s Cargo.toml). If there is only one crate and no workspace, the file is placed in the crate’s root directory.

For a crate inside a workspace, there may be two related manifests: one for the workspace (in the same directory as the workspace’s Cargo.toml) and one for the crate (in the same directory as the crate’s Cargo.toml). OSDK first tries to find the crate-level manifest. If the crate-level manifest is found, OSDK uses only that manifest; otherwise, OSDK falls back to the workspace-level manifest.

Configurations

Below, you will find a comprehensive version of the available configurations in the manifest.

project_type = "kernel"                     # <1> 

# --------------------------- the default scheme settings -------------------------------
supported_archs = ["x86_64", "riscv64"]     # <2>

# The common options for all build, run and test subcommands 
[build]                                     # <3>
features = ["no_std", "alloc"]              # <4>
profile = "dev"                             # <5>
strip_elf = false                           # <6>
encoding = "raw"                            # <7>
[boot]                                      # <8>
method = "qemu-direct"                      # <9>
kcmd_args = ["SHELL=/bin/sh", "HOME=/"]     # <10>
init_args = ["sh", "-l"]                    # <11>
initramfs = "path/to/it"                    # <12>
[grub]                                      # <13>  
mkrescue_path = "path/to/it"                # <14>
boot_protocol = "multiboot2"                # <15> 
display_grub_menu = false                   # <16>
[qemu]                                      # <17>
path = "path/to/it"                         # <18>
args = "-machine q35 -m 2G"                 # <19>

# Special options for run subcommand
[run]                                       # <20>
[run.build]                                 # <3>
[run.boot]                                  # <8>
[run.grub]                                  # <13>
[run.qemu]                                  # <17>

# Special options for test subcommand
[test]                                      # <21>
[test.build]                                # <3>
[test.boot]                                 # <8>
[test.grub]                                 # <13>
[test.qemu]                                 # <17>
# ----------------------- end of the default scheme settings ----------------------------

# A customized scheme settings
[scheme."custom"]                           # <22>
[scheme."custom".build]                     # <3>
[scheme."custom".run]                       # <20>
[scheme."custom".test]                      # <21>

Here are some additional notes for the fields:

  1. The type of current crate.

    Optional. If not specified, the default value is inferred from the usage of the macro #[ostd::main]. If the macro is used, the default value is kernel; otherwise, the default value is library.

    Possible values are library or kernel.

  2. The architectures that can be supported.

    Optional. By default OSDK supports all architectures. When building or running, if not specified in the CLI, the architecture of the host machine will be used.

    Possible values are aarch64, riscv64, x86_64.

  3. Options for compilation stage.

  4. Cargo features to activate.

    Optional. The default value is empty.

    Only features defined in Cargo.toml can be added to this array.

  5. Build artifacts with the specified Cargo profile.

    Optional. The default value is dev.

    Possible values are dev, release, test, and bench and other profiles defined in Cargo.toml.

  6. Whether to strip the built kernel ELF using rust-strip.

    Optional. The default value is false.

  7. Denote the encoding format for kernel self-decompression

    Optional. The default value is raw.

    Possible values are raw, gzip and zlib.

    If the boot protocol is not linux, the encoding format must not be specified.

  8. Options for booting the kernel.

  9. The boot method.

    Optional. The default value is qemu-direct.

    Possible values are grub-rescue-iso, grub-qcow2 and qemu-direct.

  10. The arguments provided will be passed to the guest kernel.

    Optional. The default value is empty.

    Each argument should be in one of the following two forms: KEY=VALUE or KEY if no value is required. Each KEY can appear at most once.

  11. The arguments provided will be passed to the init process, usually, the init shell.

    Optional. The default value is empty.

  12. The path to the initramfs.

    Optional. The default value is empty.

    If the path is relative, it is relative to the manifest’s enclosing directory.

  13. Grub options. Only take effect if boot method is grub-rescue-iso or grub-qcow2.

  14. The path to the grub-mkrescue executable.

    Optional. The default value is the executable in the system path, if any.

    If the path is relative, it is relative to the manifest’s enclosing directory.

  15. The boot protocol GRUB uses.

    Optional. The default value is multiboot2.

    Possible values are linux, multiboot, multiboot2.

  16. Whether to display the GRUB menu when booting with GRUB.

    Optional. The default value is false.

  17. Options for finding and starting QEMU.

  18. The path to the QEMU executable.

    Optional. The default value is the executable in the system path, if any.

    If the path is relative, it is relative to the manifest’s enclosing directory.

  19. Additional arguments passed to QEMU, organized in a single string that can include any POSIX-shell-compliant separators.

    Optional. The default value is empty.

    Each argument should be in the form of KEY and VALUE or KEY if no value is required. Some keys can appear multiple times (e.g., -device, -netdev), while other keys can appear at most once. Certain keys, such as -kernel and -initrd, are not allowed to be set here as they may conflict with the settings of OSDK.

    The field will be evaluated, so it is ok to use environment variables in the arguments (usually for paths or conditional arguments). You can even use this mechanism to read from files by using command replacement $(cat path/to/your/custom/args/file).

  20. Special settings for running. These only take effect when running cargo osdk run.

    By default, it inherits common options.

    Values set here are used to override common options.

  21. Special settings for testing.

    Similar to 20, but these only take effect when running cargo osdk test.

  22. The definition of a customized scheme.

    A customized scheme has the same fields as the default scheme. By default, a customized scheme will inherit all options from the default scheme, unless overridden by new options.

Example

Here is a sound, self-explanatory example that is used by the OSDK in the Asterinas project.

In the script ./tools/qemu_args.sh, environment variables are used to determine the actual set of QEMU arguments.

# The OSDK manifest at the Asterinas root virtual workspace 
# provides default OSDK settings for all packages.

# The common options for build, run and test
[boot]
method = "grub-rescue-iso"

[grub]
boot_protocol = "multiboot2"

[qemu]
args = "$(./tools/qemu_args.sh normal)"

# Special options for running
[run.boot]
kcmd_args = [
    "SHELL=/bin/sh",
    "LOGNAME=root",
    "HOME=/",
    "USER=root",
    "PATH=/bin:/benchmark",
    "init=/usr/bin/busybox",
]
init_args = ["sh", "-l"]
initramfs = "test/initramfs/build/initramfs.cpio.gz"

# Special options for testing
[test.qemu]
args = "$(./tools/qemu_args.sh test)"

# Other Schemes

[scheme."microvm"]
boot.method = "qemu-direct"
build.strip_elf = true
qemu.args = "$(./tools/qemu_args.sh microvm)"

[scheme."iommu"]
supported_archs = ["x86_64"]
qemu.args = "$(./tools/qemu_args.sh iommu)"

[scheme."tdx"]
supported_archs = ["x86_64"]
build.features = ["cvm_guest"]
boot.method = "grub-qcow2"
grub.boot_protocol = "linux"
qemu.args = "$(./tools/qemu_args.sh tdx)"

[scheme."riscv"]
supported_archs = ["riscv64"]
boot.method = "qemu-direct"
build.strip_elf = false

qemu.args = """\
    -cpu rv64,svpbmt=true \
    -machine virt \
    -m ${MEM:-8G} \
    -smp ${SMP:-1} \
    --no-reboot \
    -nographic \
    -display none \
    -serial chardev:mux \
    -monitor chardev:mux \
    -chardev stdio,id=mux,mux=on,signal=off,logfile=qemu.log \
    -drive if=none,format=raw,id=x0,file=./test/initramfs/build/ext2.img \
    -drive if=none,format=raw,id=x1,file=./test/initramfs/build/exfat.img \
    # NOTE: The `/etc/profile.d/init.sh` assumes that `ext2.img` appears as the first block device (`/dev/vda`).
    # The ordering below ensures `x0` (ext2.img) is discovered before `x1`, maintaining this assumption.
    # TODO: Once UUID-based mounting is implemented, this strict ordering will no longer be required.
    -device virtio-blk-device,drive=x1 \
    -device virtio-blk-device,drive=x0 \
    -device virtio-keyboard-device \
    -device virtio-serial-device \
    -device virtconsole,chardev=mux \
"""

[scheme."sifive_u"]
supported_archs = ["riscv64"]
build.features = ["riscv_sv39_mode"]
boot.method = "qemu-direct"
build.strip_elf = false
qemu.with_monitor = true

qemu.args = """\
    -machine sifive_u \
    -m 8G \
    -smp 5 \
    -no-reboot \
    -nographic \
    -display none \
    -serial chardev:mux \
    -monitor chardev:mux \
    -chardev stdio,id=mux,mux=on,signal=off,logfile=qemu.log
"""

[scheme."loongarch"]
boot.method = "qemu-direct"
build.strip_elf = false

qemu.args = """\
    -machine virt \
    -m 8G \
    -smp 1 \
    --no-reboot \
    -nographic \
    -display none \
    -serial chardev:mux \
    -monitor chardev:mux \
    -chardev stdio,id=mux,mux=on,signal=off,logfile=qemu.log \
    -device virtio-keyboard-pci \
    -device virtio-serial \
    -device virtconsole,chardev=mux \
    -rtc base=utc
"""

Scheme

Scheme is an advanced feature that allows you to create multiple profiles for the same action (build, run, or test) under different scenarios (e.g., x86 vs. RISC-V). Schemes support any user-defined keys (see the 22nd configuration) and can be selected using the --scheme CLI argument.

If a scheme <s> is selected for an action (such as test), the value of an unspecified but required configuration key will be determined by the following rules:

  • If the selected scheme sets this key (scheme.<s>.key), use that value.
  • Otherwise, if the default scheme sets this key for the action (test.key), use that value.
  • Otherwise, if the default scheme sets this key globally (key), use that value.

If the value is still not found, either the default value for the key will be used or an error will be thrown.
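As an illustration of these rules, consider the following hypothetical manifest fragment. With cargo osdk test --scheme microvm, boot.method comes from the scheme itself, while the QEMU arguments fall back to the default scheme's test options:

```toml
# Hypothetical manifest fragment illustrating scheme resolution.

[boot]
method = "grub-rescue-iso"    # default-scheme common value

[test.qemu]
args = "-m 1G"                # default-scheme value for the test action

[scheme."microvm"]
boot.method = "qemu-direct"   # overrides [boot] for this scheme
# No qemu.args is given here, so `cargo osdk test --scheme microvm`
# falls back to the [test.qemu] args above.
```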

Environment Variables

The OSDK tool uses the following environment variables to customize compilation and target behavior.

OSDK-Specific Variables

  • OSDK_TARGET_ARCH: If set, OSDK will use the specified value as its default target architecture (e.g., x86_64, riscv64). If not set, OSDK will default to the host architecture.
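The fallback behavior can be pictured with ordinary POSIX parameter expansion. This is only a sketch of the equivalent logic, not OSDK's actual implementation:

```shell
# Use OSDK_TARGET_ARCH if set; otherwise fall back to the host architecture.
ARCH="${OSDK_TARGET_ARCH:-$(uname -m)}"
echo "target architecture: $ARCH"
```

For example, running OSDK_TARGET_ARCH=riscv64 cargo osdk build cross-builds for RISC-V without editing any manifest.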

Before You Contribute

This part of the Book provides guidelines for contributing to Asterinas.

Coding Guidelines

This section describes coding and collaboration conventions for the Asterinas project. These guidelines aim to keep code clear, consistent, maintainable, correct, and efficient.

For the underlying philosophy, principles, and quality criteria for individual guidelines, see How Guidelines Are Written.

The guidelines are organized into the following pages:

  • General Guidelines — Language-agnostic guidance for naming, comments, layout, formatting, and API design.
  • Rust Guidelines — Rust-specific guidance for naming, language items, and cross-cutting topics.
  • Git Guidelines — Commit hygiene and pull-request conventions.
  • Testing Guidelines — Test behavior, assertions, regression policy, and cleanup.
  • Assembly Guidelines — .S and global_asm! conventions for sectioning, function metadata, labels, and alignment.

The guidelines represent the desired state of the codebase. If you find conflicting cases in the codebase, you are welcome to fix the code to match the guidelines.

Index

| Category | Guideline | Short Name |
|----------|-----------|------------|
| General | Be descriptive | `descriptive-names` |
| General | Be accurate | `accurate-names` |
| General | Encode units and important attributes in names | `encode-units` |
| General | Use assertion-style boolean names | `bool-names` |
| General | Prefer semantic line breaks | `semantic-line-breaks` |
| General | Explain why, not what | `explain-why` |
| General | Document design decisions | `design-decisions` |
| General | Cite specifications and algorithm sources | `cite-sources` |
| General | One concept per file | `one-concept-per-file` |
| General | Organize code for top-down reading | `top-down-reading` |
| General | Group statements into logical paragraphs | `logical-paragraphs` |
| General | Format error messages consistently | `error-message-format` |
| General | Stick to familiar conventions | `familiar-conventions` |
| General | Hide implementation details | `hide-impl-details` |
| General | Validate at boundaries, trust internally | `validate-at-boundaries` |
| Rust | Follow Rust CamelCase and acronym capitalization | `camel-case-acronyms` |
| Rust | End closure variables with `_fn` | `closure-fn-suffix` |
| Rust | Introduce explaining variables | `explain-variables` |
| Rust | Use block expressions to scope temporary state | `block-expressions` |
| Rust | Use checked or saturating arithmetic | `checked-arithmetic` |
| Rust | Minimize nesting | `minimize-nesting` |
| Rust | Keep functions small and focused | `small-functions` |
| Rust | Avoid boolean arguments | `no-bool-args` |
| Rust | Use types to enforce invariants | `rust-type-invariants` |
| Rust | Prefer enum over trait objects for closed sets | `enum-over-dyn` |
| Rust | Encapsulate fields behind getters | `getter-encapsulation` |
| Rust | Follow RFC 1574 summary line conventions | `rfc1574-summary` |
| Rust | End sentence comments with punctuation | `comment-punctuation` |
| Rust | Wrap identifiers in backticks | `backtick-identifiers` |
| Rust | Do not disclose implementation details in doc comments | `no-impl-in-docs` |
| Rust | Add module-level documentation for major components | `module-docs` |
| Rust | Justify every use of unsafe | `justify-unsafe-use` |
| Rust | Document safety conditions | `document-safety-conds` |
| Rust | Deny unsafe code in `kernel/` | `deny-unsafe-kernel` |
| Rust | Reason about safety at the module boundary | `module-boundary-safety` |
| Rust | Default to the narrowest visibility | `narrow-visibility` |
| Rust | Use workspace dependencies | `workspace-deps` |
| Rust | Suppress lints at the narrowest scope | `narrow-lint-suppression` |
| Rust | Use `#[expect(dead_code)]` with restraint | `expect-dead-code` |
| Rust | Prefer functions over macros | `macros-as-last-resort` |
| Rust | Establish and enforce a consistent lock order | `lock-ordering` |
| Rust | Never do I/O or blocking operations while holding a spinlock | `no-io-under-spinlock` |
| Rust | Do not use atomics casually | `careful-atomics` |
| Rust | Critical sections must not be split across lock boundaries | `atomic-critical-sections` |
| Rust | Use `debug_assert` for correctness-only checks | `debug-assert` |
| Rust | Propagate errors with `?` | `propagate-errors` |
| Rust | Use `log` crate macros exclusively | `log-crate-only` |
| Rust | Choose appropriate log levels | `log-levels` |
| Rust | Use RAII for all resource acquisition and release | `raii` |
| Rust | Avoid O(n) algorithms on hot paths | `no-linear-hot-paths` |
| Rust | Minimize unnecessary copies and allocations | `minimize-copies` |
| Rust | No premature optimization without evidence | `no-premature-optimization` |
| Git | Write imperative, descriptive subject lines | `imperative-subject` |
| Git | One logical change per commit | `atomic-commits` |
| Git | Separate refactoring from features | `refactor-then-feature` |
| Git | Keep pull requests focused | `focused-prs` |
| Testing | Add regression tests for every bug fix | `add-regression-tests` |
| Testing | Test user-visible behavior, not internals | `test-visible-behavior` |
| Testing | Use assertion macros, not manual inspection | `use-assertions` |
| Testing | Clean up resources after every test | `test-cleanup` |
| Assembly | Use the correct section directive | `asm-section-directives` |
| Assembly | Place code-width directives after the section definition | `asm-code-width` |
| Assembly | Place attributes directly before the function | `asm-function-attributes` |
| Assembly | Add `.type` and `.size` for Rust-callable functions | `asm-type-and-size` |
| Assembly | Use unique label prefixes to avoid name clashes | `asm-label-prefixes` |
| Assembly | Prefer `.balign` over `.align` | `asm-prefer-balign` |

How Guidelines Are Written

The guidelines in this collection reflect widely recognized philosophies and principles for writing high-quality software. Three books have influenced the guidelines the most:

  1. The Art of Readable Code
  2. Clean Code
  3. Code Complete

The guidelines are derived from code review experience, where we point out code smells, observe anti-patterns, and fix recurring bugs. By the time the initial version of these guidelines was formulated, Asterinas had seen thousands of code reviews. We collected the historical review comments as a dataset and used it as both inspiration and evidence.

The remainder of this page moves from values to practice. The Philosophy section captures foundational beliefs about what makes code understandable and maintainable. The Principles section turns those beliefs into design-level rules that guide everyday decisions. Finally, the Quality Criteria section defines how individual guidelines are written and accepted into this collection.

Philosophy

Minimize time to understand

Code should be written to minimize the time it would take someone else to fully understand it. This is the fundamental theorem of readability and the single most important measure of code quality in this project. “Someone else” includes your future self.

Code is read far more often than it is written. If a technique makes code shorter but harder to follow at a glance, choose clarity over brevity.

Managing complexity is the primary technical imperative

No one can hold an entire modern program in their head. The purpose of every technique in software construction — decomposition, naming, encapsulation, abstraction — is to break complex problems into simple pieces so that you can safely focus on one thing at a time.

Craftsmanship and care

Clean code looks like it was written by someone who cares. Professionalism means never knowingly leaving a mess. The only way to go fast is to keep the code clean at all times.

Continuous improvement

Leave code cleaner than you found it. Small, steady improvements — renaming a variable, extracting a function, eliminating duplication — prevent code from rotting over time.

Principles

Single Responsibility

Each module, type, or function should have one, and only one, reason to change. If you cannot describe what a unit does without the words “and,” “or,” or “but,” it has too many responsibilities.

Don’t Repeat Yourself (DRY)

Every piece of knowledge should have a single, unambiguous representation. Duplication harms readability and maintainability. When the same pattern appears three or more times, eliminate the duplication (e.g., adding a helper function).

Information Hiding

Hide design decisions behind well-defined interfaces. A module’s public surface should contain only what its consumers need. Internal data structures, helper types, and bookkeeping fields should remain private.

Open for Extension, Closed for Modification

Stable modules and APIs should be open to extension but closed to breaking modification. Prefer adding new behavior through existing interfaces (traits, enums, and pluggable components) instead of repeatedly editing established call paths. Do not introduce extension points preemptively; add them when there is a concrete extension need.

Least Surprise

Functions, types, and APIs should behave as their names and signatures suggest. When an obvious behavior is not implemented, readers lose trust in the codebase and must fall back on reading implementation details.

Loose Coupling, Strong Cohesion

Connections between modules should be small, visible, and flexible. Within a module, every part should contribute to a single, well-defined purpose.

Consistency

Do similar things the same way throughout the codebase. Consistency reduces surprise and cognitive load even when neither approach is objectively superior. When a convention already exists, follow it; do not introduce a competing convention without compelling justification.

Test as the Source of Confidence

Tests exist to make change safe. A comprehensive test suite should give developers confidence that a passing run means the system works and a failing run pinpoints what broke. Every test must earn its place by increasing that confidence. A test that does not — flaky tests, tautological assertions, tests coupled to implementation details — is worse than no test at all.

Rust-Native Approach

Asterinas is inspired by Linux but is not a C port. The language shapes how we think about problems: where C code relies on conventions and manual discipline (return-code checking, paired init/cleanup, header-file contracts), Rust offers compiler-enforced, zero-cost abstractions (the ? operator, RAII, trait bounds).

Learn from Linux’s design, not its idioms. The result should read like idiomatic Rust, not like C written in Rust syntax.

Quality Criteria

Every guideline carries a descriptive short name in kebab-case (e.g., explain-variables, lock-ordering). Short names are kept intact even as the guidelines evolve and should be used when referencing guidelines in code reviews.

A guideline is accepted into this collection when it satisfies all four quality criteria:

  1. Concrete — Framed as an actionable item with an illustrating example when possible.
  2. Concise — Kept short; we do not want to intimidate readers.
  3. Grounded — Opinionated or non-obvious guidelines should include a “See also” line with supportive materials (literature, PR reviews, codebase examples).
  4. Relevant — Included only if it has codebase examples, prevents a past bug, or matches anti-patterns observed in code reviews.

When present, the “See also” line lists sources in the order: literature; PR reviews; codebase examples. Not every guideline needs all three; for strongly opinionated or non-obvious guidelines, include the line by default.

Do not add a guideline whose only value is mechanical enforcement already provided by automated tools such as rustfmt and clippy. If a tool-enforced convention appears frequently in review or needs project-specific rationale, keep a short explanatory guideline and point to the tool configuration.

Example

Below is a sample guideline demonstrating these conventions:

### Introduce explaining variables (`explain-variables`) {#explain-variables}

Break down complex expressions
by assigning intermediate results to well-named variables.
An explaining variable turns an opaque expression
into self-documenting code:

```rust
// Good — intent is clear
let is_page_aligned = addr % PAGE_SIZE == 0;
let is_within_range = addr < max_addr;
debug_assert!(is_page_aligned && is_within_range);

// Bad — reader must parse the whole expression
debug_assert!(addr % PAGE_SIZE == 0 && addr < max_addr);
```

See also:
_The Art of Readable Code_, Chapter 8 "Breaking Down Giant Expressions";
PR [#2083](https://github.com/asterinas/asterinas/pull/2083#discussion_r2512772091)
and [#643](https://github.com/asterinas/asterinas/pull/643#discussion_r1497243812).

This example demonstrates:

  • A stable short name (explain-variables) shown parenthetically and used as the heading anchor
  • A one-sentence actionable recommendation followed by a code example
  • A “See also” line ordered as: literature, PR reviews, codebase examples

General Guidelines

Names

Be descriptive (descriptive-names)

Choose names that convey meaning at the point of use. Avoid single-letter names and ambiguous abbreviations. Prefer full words over cryptic shorthand so that readers do not need surrounding context to understand a variable’s purpose. Prefer names that are as short as possible while still being unambiguous at the point of use.

Be accurate (accurate-names)

Avoid confusing names. If a name can be misread to imply the wrong meaning, behavior, or side effects, it must be corrected immediately.

// Good — clearly a count
nr_deleted_watches: usize,
// Bad — looks like a collection
// rather than a numeric counter
deleted_watches: usize,

Choose verbs that reflect the actual work being done.

impl PciCommonDevice {
    // Good — implies an MMIO read is involved
    pub fn read_command(&self) -> Command { /* .. */ }
    // Bad — looks like a plain field access
    pub fn command(&self) -> Command { /* .. */ }
}
mod char_device {
    // Good — implies an O(n) collection pass
    pub fn collect_all() -> Vec<Arc<dyn Device>> { /* .. */ }
    // Bad — sounds like an accessor returning a reference
    pub fn get_all() -> Vec<Arc<dyn Device>> { /* .. */ }
}

See also: PR #1488 and #2964.

Encode units and important attributes in names (encode-units)

When the type does not encode the unit, the name must. Kernel code deals with bytes, pages, frames, nanoseconds, ticks, and sectors — ambiguous units are a source of real bugs.

// Good — unit is unambiguous
timeout_ns
offset_bytes
size_pages
delay_ms

// Bad — unit is ambiguous
timeout
offset
size
delay

Where the language’s type system can enforce units (e.g., newtypes), prefer that. Where it cannot, the name must carry the information.
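A minimal newtype sketch of this idea (the names are hypothetical, not from the Asterinas codebase):

```rust
// The unit is enforced by the type system instead of the variable name.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Nanos(pub u64);

impl Nanos {
    pub fn from_ms(ms: u64) -> Self {
        Nanos(ms * 1_000_000)
    }
}

// A parameter of type `Nanos` cannot silently receive milliseconds.
pub fn set_timeout(timeout: Nanos) -> u64 {
    timeout.0
}
```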

See also: PR #2796.

Use assertion-style boolean names (bool-names)

Boolean variables and functions should read as assertions of fact. Use is_, has_, can_, should_, was_, or needs_ prefixes. Never use negated names (is_not_empty, no_error); prefer the positive form (is_empty, ok or succeeded). A bare name like found, done, or ready is acceptable when the context is unambiguous.

// Good — reads as an assertion
fn is_page_aligned(&self) -> bool { ... }
fn has_permission(&self, perm: Permission) -> bool { ... }
let can_read = mode.is_readable();

// Bad — verb suggests an action, not a query
fn check_permission(&self, perm: Permission) -> bool { ... }
// Bad — negated name
let is_not_empty = !buf.is_empty();

See also: PR #1488.

Comments

Prefer semantic line breaks (semantic-line-breaks)

For prose in Markdown and doc comments, insert line breaks at semantic boundaries so each line carries one coherent idea. At minimum, break at sentence boundaries. For longer sentences, also consider breaking at clause boundaries.

Semantic line breaks make diffs smaller, reviews easier, and merge conflicts less noisy.

As an exception, RFC documents that are mostly read-only can use regular paragraph wrapping.
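For example, the same doc-comment sentences reflowed (illustrative text):

```text
Bad — one long physical line:
/// Acquires the lock and returns a guard that releases it on drop, and callers must not hold the guard across blocking operations.

Good — one idea per line:
/// Acquires the lock and returns a guard that releases it on drop.
/// Callers must not hold the guard across blocking operations.
```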

See also: Semantic Line Breaks.

Explain why, not what (explain-why)

Comments should explain the intent behind the code, not restate what the code does. If a comment merely paraphrases the code, it adds noise without insight.

If a comment is needed to explain what code does, first try to rewrite the code. Do not write good comments to compensate for bad code — rewrite it to be straightforward.
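A small sketch of the difference (the function, names, and retry limit are hypothetical):

```rust
const RETRY_LIMIT: u32 = 3;

// Bad — restates what the code does:
//   "Try probing up to RETRY_LIMIT times."
//
// Good — explains why the code is written this way:
//   The device may NACK the first probe right after reset,
//   so we retry a bounded number of times before giving up.
pub fn probe_with_retries(mut probe: impl FnMut() -> bool) -> bool {
    for _ in 0..RETRY_LIMIT {
        if probe() {
            return true;
        }
    }
    false
}
```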

See also: The Art of Readable Code, Chapter 6 “Knowing What to Comment”; PR #2265 and #2050.

Document design decisions (design-decisions)

When the code makes a non-obvious choice — a particular data structure, a locking strategy, a deviation from Linux behavior — add a comment explaining the rationale and any alternatives considered. Design-decision comments (“director’s commentary”) are the most valuable kind of comment.

// We use a radix tree rather than a HashMap
// because lookups must be O(log n) worst-case
// for the page fault handler.
// A HashMap gives O(1) amortized
// but O(n) worst-case due to rehashing,
// which is unacceptable on the page fault path.

See also: PR #2265 and #2050.

Cite specifications and algorithm sources (cite-sources)

When implementing behavior defined by an external specification or a non-trivial algorithm, cite the source: the relevant POSIX section, Linux man page, hardware reference manual, or academic paper.

/// Maximum number of bytes guaranteed to be written to a pipe atomically.
///
/// For more details, see the description of `PIPE_BUF` in
/// <https://man7.org/linux/man-pages/man7/pipe.7.html>.
const PIPE_BUF: usize = 4096;

Layout

One concept per file (one-concept-per-file)

When a file grows long or contains multiple distinct concepts, split it. Each major data structure, each subsystem entry point, each significant abstraction deserves its own file.

Organize code for top-down reading (top-down-reading)

A source file should read from top to bottom. Start with high-level entry points and core flow. Move implementation details downward so readers can understand the big picture first before diving into low-level helpers.

Within each visibility group (e.g., a module), order methods so that callers appear before callees where possible, enabling the file to be read top to bottom. Place public methods before private helpers.

Group statements into logical paragraphs (logical-paragraphs)

Within functions, group related statements into logical paragraphs separated by blank lines. Each paragraph should represent one sub-step of the function’s overall purpose.

For long functions, add a one-line summary comment at the start of each paragraph when the paragraph intent is not obvious.
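A minimal sketch of paragraphing within a function (the function itself is hypothetical):

```rust
// Each blank-line paragraph is one sub-step of the function's purpose.
pub fn normalize_scores(raw: &[u32]) -> Vec<u32> {
    // Determine the scaling divisor (at least 1 to avoid dividing by zero).
    let max = raw.iter().copied().max().unwrap_or(1).max(1);

    // Scale every score into the 0..=100 range.
    raw.iter().map(|&s| s * 100 / max).collect()
}
```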

Formatting

Format error messages consistently (error-message-format)

Start with a lowercase letter (unless the first word is a proper noun or identifier). Be specific: prefer “len is too large” over “the argument is invalid”.

For system call errors, follow the style and descriptions in Linux man pages.
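A short illustration of the convention (the helper function is hypothetical):

```rust
// Error-message helper following the convention:
// lowercase start, specific values, no vague wording.
pub fn len_error(len: usize, max: usize) -> String {
    // Good — lowercase start, specific:
    format!("len {len} is too large (maximum is {max})")
    // Bad — capitalized and vague:
    // "The argument is invalid."
}
```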

API Design

Stick to familiar conventions (familiar-conventions)

Prefer names and API shapes that users already know from Rust and Linux. Do not invent new terms for well-known operations.

// Good — follows common Rust naming conventions
pub fn len(&self) -> usize { ... }
pub fn as_ptr(&self) -> *const u8 { ... }

// Bad — unfamiliar synonyms for common operations
pub fn length(&self) -> usize { ... }
pub fn to_pointer(&self) -> *const u8 { ... }

See also: Least Surprise.

Hide implementation details (hide-impl-details)

Do not expose internal implementation details through public APIs (including their documentation). A module’s public surface should contain only what its consumers need.

See also: Modules and Crates for Rust-specific visibility rules; PR #2951.

Validate at boundaries, trust internally (validate-at-boundaries)

Designate certain interfaces as validation boundaries. In Asterinas, syscall entry points are the primary boundary: all user-supplied data (pointers, file descriptors, sizes, flags, strings) must be validated at the syscall boundary. Once validated, internal kernel functions may trust these values without re-validation.
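A simplified sketch of the pattern, assuming a hypothetical syscall handler and limit (not actual Asterinas code):

```rust
#[derive(Debug, PartialEq)]
enum Error {
    InvalidLen,
}

/// Boundary: validates the user-supplied length exactly once.
fn sys_write_len(user_len: usize) -> Result<usize, Error> {
    const MAX_RW_BYTES: usize = 0x7fff_f000; // illustrative limit
    if user_len > MAX_RW_BYTES {
        return Err(Error::InvalidLen);
    }
    // Internal code may now trust `user_len` without re-checking.
    Ok(do_write(user_len))
}

/// Internal: trusts its caller to pass a validated length.
fn do_write(len: usize) -> usize {
    len // pretend we wrote `len` bytes
}
```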

See also: PR #2806.

Rust Guidelines

Asterinas follows the Rust API Guidelines and the project-specific conventions below.

Naming

Asterinas enforces strict, Rust-idiomatic naming across the entire codebase. Names must be accurate, unabbreviated, and follow Rust API Guidelines on naming.

Follow Rust CamelCase and acronym capitalization (camel-case-acronyms)

Type names follow Rust’s CamelCase convention. Acronyms are title-cased per the Rust API Guidelines:

// Good
IoMemoryArea
PciDeviceLocation
Nvme
Tcp

// Bad
IOMemoryArea
PCIDeviceLocation
NVMe
TCP

End closure variables with _fn (closure-fn-suffix)

Variables holding closures or function pointers must signal they are callable by ending with _fn. Treating a closure variable as if it were a data object misleads readers.

// Good — clearly a callable
let task_fn = self.func.take().unwrap();
let thread_fn = move || {
    let _ = oops::catch_panics_as_oops(task_fn);
    current_thread!().exit();
};

let expired_fn = move |_guard: TimerGuard| {
    ticks.fetch_add(1, Ordering::Relaxed);
    pollee.notify(IoEvents::IN);
};

See also: PR #395 and #783.

Language Items

This section covers conventions for specific Rust language constructs. Each page addresses one category with actionable guidelines.

Variables, Expressions, and Statements

Introduce explaining variables (explain-variables)

Break down complex expressions by assigning intermediate results to well-named variables. An explaining variable turns an opaque expression into self-documenting code:

// Good — intent is clear
let is_page_aligned = addr % PAGE_SIZE == 0;
let is_within_range = addr < max_addr;
debug_assert!(is_page_aligned && is_within_range);

// Bad — reader must parse the whole expression
debug_assert!(addr % PAGE_SIZE == 0 && addr < max_addr);

See also: The Art of Readable Code, Chapter 8 “Breaking Down Giant Expressions”; PR #2083 and #643.

Use block expressions to scope temporary state (block-expressions)

Use block expressions when temporary variables are only needed to produce one final value. This keeps temporary state local and avoids leaking one-off names into outer scope.

// Good — intermediate values are scoped to the block
let socket_addr = {
    let bytes = read_bytes_from_user(addr, len as usize)?;
    parse_socket_addr(&bytes)?
};
connect(socket_addr)?;

// Bad — temporary variables leak into outer scope
let bytes = read_bytes_from_user(addr, len as usize)?;
let socket_addr = parse_socket_addr(&bytes)?;
connect(socket_addr)?;

Use checked or saturating arithmetic (checked-arithmetic)

Use checked or saturating arithmetic for operations that could overflow. Prefer explicit overflow handling over silent wrapping:

// Good — overflow is handled explicitly
let total = base.checked_add(offset)
    .ok_or(Error::new(Errno::EOVERFLOW))?;

// Good — clamps instead of wrapping
let remaining = budget.saturating_sub(cost);

// Bad — may silently wrap in release builds
let total = base + offset;

If wraparound behavior is intentional, use explicit wrapping_* or overflowing_* operations and document why wrapping is correct.

Functions and Methods

Minimize nesting (minimize-nesting)

Minimize nesting depth. Code nested more than three levels deep should be reviewed for refactoring opportunities. Each nesting level multiplies the reader’s cognitive load.

Techniques for flattening nesting:

  • Early returns and guard clauses for error paths.
  • let...else to collapse if let chains.
  • The ? operator for error propagation.
  • continue to skip loop iterations.
  • Extracting the nested body into a helper function.

The normal/expected code path should be the first visible path; error and edge cases should be handled and dismissed early.

pub(crate) fn init() {
    let Some(framebuffer_arg) = boot_info().framebuffer_arg else {
        log::warn!("Framebuffer not found");
        return;
    };
    // ... main logic at the top level
}

See also: PR #2877.

Keep functions small and focused (small-functions)

Each function should do one thing, do it well, and do it only. If you can extract another function from it with a name that is not merely a restatement of its implementation, the original function is doing more than one thing.

Do not mix levels of abstraction. For example, a syscall handler should read like a specification; byte-level manipulation belongs in a helper.

// Good — each function operates at one level of abstraction
pub fn sys_connect(sockfd: i32, addr: Vaddr, len: u32) -> Result<()> {
    let socket = get_socket(sockfd)?;
    let remote_addr = parse_socket_addr(addr, len)?;
    socket.connect(remote_addr)
}

// Bad — mixes high-level logic with low-level details
pub fn sys_connect(sockfd: i32, addr: Vaddr, len: u32) -> Result<()> {
    let fd_table = current_process().fd_table().lock();
    let file = fd_table.get(sockfd).ok_or(Errno::EBADF)?;
    let socket = file.downcast_ref::<Socket>().ok_or(Errno::ENOTSOCK)?;
    let bytes = read_bytes_from_user(addr, len as usize)?;
    let family = u16::from_ne_bytes([bytes[0], bytes[1]]);
    // ... 30 more lines of byte parsing ...
}

See also: Clean Code, Chapter 3 “Functions”; PR #639.

Avoid boolean arguments (no-bool-args)

A boolean parameter that selects between two behaviors signals the function does two things. Split it into two functions or use a typed enum.

// Good — two separate functions
fn read(&self, buf: &mut [u8]) -> Result<usize> { ... }
fn read_nonblocking(&self, buf: &mut [u8]) -> Result<usize> { ... }

// Good — typed enum
enum ReadMode { Blocking, NonBlocking }
fn read(&self, buf: &mut [u8], mode: ReadMode) -> Result<usize> { ... }

// Bad — boolean argument
fn read(&self, buf: &mut [u8], blocking: bool) -> Result<usize> { ... }

See also: Clean Code, Chapter 3 “Flag Arguments”.

Types and Traits

Use types to enforce invariants (rust-type-invariants)

Leverage the type system to make illegal states unrepresentable.

Define newtypes to encode domain constraints.

// Good — a `Nice` value is guaranteed to be valid
pub struct Nice(NiceValue);
type NiceValue = RangedI8<-20, 19>;

// Bad — `i8` admits invalid values for nice levels
pub type Nice = i8;

Prefer enums over bare integers and boolean flags.

// Good — access mode is constrained by the enum
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AccessMode {
    O_RDONLY = 0,
    O_WRONLY = 1,
    O_RDWR = 2,
}

// Bad — `u8` admits invalid values
pub type AccessMode = u8;

Encode invariants in generic parameters where needed.

impl IoMem<Sensitive> {
    // Good — only unsafe code can write to sensitive MMIO
    pub unsafe fn write_u32(&self, offset: usize, new_val: u32) { /* .. */ }
}

impl IoMem<Insensitive> {
    // Good — safe code can write to insensitive MMIO
    pub fn write_u32(&self, offset: usize, new_val: u32) { /* .. */ }
}

pub enum Sensitive {}
pub enum Insensitive {}

Asterinas uses this pattern widely, for example with newtypes such as CpuId and AlignedUsize<const N: u16>.

See also: PR #2265 and #2514.

Prefer enum over trait objects for closed sets (enum-over-dyn)

When the set of variants is known and closed, an enum is often preferable to Box<dyn Trait> for both performance and pattern-matching expressiveness.

// Good — closed set modeled as an enum
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TermStatus {
    Exited(u8),
    Killed(SigNum),
}
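Pattern matching over such an enum is exhaustive, so adding a variant forces every call site to be updated at compile time. A self-contained sketch (with a simplified `SigNum` and an illustrative wait-status-style encoding):

```rust
type SigNum = u8; // simplified for illustration

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TermStatus {
    Exited(u8),
    Killed(SigNum),
}

// Exhaustive matching: the compiler rejects this function
// if a new variant is added but not handled here.
fn as_wait_status(status: TermStatus) -> u32 {
    match status {
        TermStatus::Exited(code) => (code as u32) << 8,
        TermStatus::Killed(signum) => signum as u32,
    }
}
```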

Encapsulate fields behind getters (getter-encapsulation)

Do not make fields public when a simple getter method would do. A getter preserves naming flexibility and leaves room for future invariants.

// Good — field is private, accessed via getter
pub struct Vma {
    perms: VmPerms,
}

impl Vma {
    pub fn perms(&self) -> VmPerms {
        self.perms
    }
}

// Bad — public field exposes representation
pub struct Vma {
    pub perms: VmPerms,
}

Comments and Documentation

API documentation describes API meaning and usage and is rendered by rustdoc. Public APIs should be documented, including crates, modules, structs, traits, functions, and macros. The #![warn(missing_docs)] lint helps enforce this baseline.

Asterinas follows Rust community documentation conventions. Two primary references are:

  1. The rustdoc book: How to write documentation
  2. The Rust RFC book: API Documentation Conventions

Follow RFC 1574 summary line conventions (rfc1574-summary)

The first line of a doc comment should be third-person singular present indicative (“Returns”, “Creates”, “Acquires”), concise, and one sentence.

/// Returns the mapping's start address.
pub fn map_to_addr(&self) -> Vaddr {
    self.map_to_addr
}

End sentence comments with punctuation (comment-punctuation)

If a comment line is a full sentence, end it with proper punctuation. This improves readability in dense code and avoids fragmented prose.

// Good — complete sentence with punctuation.
// SAFETY: The pointer is derived from a live allocation.

// Bad — complete sentence without punctuation
// SAFETY: The pointer is derived from a live allocation

Wrap identifiers in backticks (backtick-identifiers)

Type names, method names, and code identifiers in doc comments should be wrapped in backticks for rustdoc rendering. When referring to types, prefer rustdoc links ([TypeName]) where possible.

/// Acquires the [`SpinLock`] and returns a guard
/// that releases the lock on [`Drop`].
///
/// Callers must not call `acquire` while holding
/// a [`RwMutex`] to avoid deadlock.
pub fn acquire(&self) -> SpinLockGuard<'_, T> { ... }

Do not disclose implementation details in doc comments (no-impl-in-docs)

Doc comments should describe what the API does and how to use it, not how it is implemented internally.

// Good — behavior-oriented
/// Returns the number of active connections.

// Bad — leaks implementation details
/// Returns the length of the internal `HashMap`
/// that tracks connections by socket address.

Add module-level documentation for major components (module-docs)

A module file that serves as an important kernel component (e.g., subsystem entry point, major data structure, driver) should begin with a //! comment explaining:

  1. What the module does
  2. The key types it exposes
  3. How it relates to neighboring modules

//! Virtual memory area (VMA) management.
//!
//! This module defines [`VmMapping`] and associated types,
//! which represent contiguous regions of a process's virtual address space.
//! VMAs are managed by the [`Vmar`] tree in the parent module.

Unsafety

Justify every use of unsafe (justify-unsafe-use)

Every unsafe block must have a preceding // SAFETY: comment that justifies why the operation is sound. For multi-condition invariants, use a numbered list:

// SAFETY:
// 1. We have exclusive access to both the current context
//    and the next context (see above).
// 2. The next context is valid (because it is either
//    correctly initialized or written by a previous
//    `context_switch`).
unsafe {
    context_switch(next_task_ctx_ptr, current_task_ctx_ptr);
}

See also: PR #2958 and #836.

Document safety conditions (document-safety-conds)

All unsafe functions and traits must include a # Safety section in their doc comments describing the conditions, properties, or invariants that callers must uphold. State exactly what the caller must guarantee — not implementation details or side effects.

/// A marker trait for guard types that enforce the atomic mode.
///
/// # Safety
///
/// The implementer must ensure that the atomic mode is maintained while
/// the guard type is alive.
pub unsafe trait InAtomicMode: core::fmt::Debug {}

Deny unsafe code in kernel/ (deny-unsafe-kernel)

All crates under kernel/ must deny unsafe:

#![deny(unsafe_code)]

Only OSTD (ostd/) crates may contain unsafe code. If a kernel crate requires an unsafe operation, the functionality should be provided as a safe API in OSTD.
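
The shape of such a safe API can be sketched as follows (`MmioRegion` and its methods are hypothetical illustrations, not real OSTD items): the unsafe block lives behind a bounds check inside the abstraction, so kernel crates compiled with #![deny(unsafe_code)] can only reach the safe method.

```rust
/// Hypothetical sketch of an OSTD-style safe wrapper over raw MMIO writes.
pub struct MmioRegion {
    base: *mut u8,
    len: usize,
}

impl MmioRegion {
    /// # Safety
    ///
    /// `base` must point to `len` writable bytes that outlive `self`.
    pub unsafe fn new(base: *mut u8, len: usize) -> Self {
        Self { base, len }
    }

    /// Safe API: the bounds check upholds the pointer-arithmetic invariant,
    /// so callers in `#![deny(unsafe_code)]` crates need no `unsafe`.
    pub fn write_u8(&self, offset: usize, val: u8) -> Result<(), &'static str> {
        if offset >= self.len {
            return Err("offset out of range");
        }
        // SAFETY: `offset < self.len`, so the write stays inside the region.
        unsafe { core::ptr::write_volatile(self.base.add(offset), val) };
        Ok(())
    }
}
```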

Reason about safety at the module boundary (module-boundary-safety)

The safety of an unsafe block depends on ALL code that can access the same private state. Encapsulate unsafe abstractions in the smallest possible module to minimize the “audit surface.” Any code in the same module that can modify relied-upon fields is part of the safety argument.

// Good — small, focused module limits the audit surface
mod frame_allocator {
    /// Invariant: `next` is always a valid frame index.
    struct FrameAlloc {
        next: usize,
        // ...
    }

    impl FrameAlloc {
        pub fn alloc(&mut self) -> PhysAddr {
            // SAFETY: `next` is always valid (see invariant above).
            // Only code in this module can modify `next`.
            unsafe { self.alloc_frame_unchecked(self.next) }
        }
    }
}

Modules and Crates

Default to the narrowest visibility (narrow-visibility)

Start private, then widen to pub(super), pub(crate), or pub only when an actual external consumer requires it.

// Good — restricted to the parent module
pub(super) static I8042_CONTROLLER:
    Once<SpinLock<I8042Controller, LocalIrqDisabled>> = Once::new();

pub(super) fn init() -> Result<(), I8042ControllerError> {
    // ...
}

// Bad — unnecessarily wide
pub static I8042_CONTROLLER: ...

See also: PR #2951 and #2605.

Use workspace dependencies (workspace-deps)

Always declare shared dependencies in the workspace [workspace.dependencies] table and reference them with .workspace = true in member crates.

# In the workspace root Cargo.toml
[workspace.dependencies]
ostd = { version = "0.17.0", path = "ostd" }
bitflags = "2.6"

# In a member crate's Cargo.toml
[dependencies]
ostd.workspace = true
bitflags.workspace = true

Macros and Attributes

Suppress lints at the narrowest scope (narrow-lint-suppression)

When suppressing lints, the suppression should cover as narrow a scope as possible. This shows readers exactly where the lint is triggered and makes the suppression easier for subsequent committers to maintain.

// Good — each method is individually marked
trait SomeTrait {
    #[expect(dead_code)]
    fn foo();

    #[expect(dead_code)]
    fn bar();

    fn baz();
}

// Bad — the entire trait is suppressed
#[expect(dead_code)]
trait SomeTrait { ... }

There is one exception: if it is clear enough that every member will trigger the lint, it is reasonable to expect the lint at the type level.

#[expect(non_camel_case_types)]
enum SomeEnum {
    FOO_ABC,
    BAR_DEF,
}

See also: Clippy allow_attributes, Clippy allow_attributes_without_reason, and rustc unfulfilled_lint_expectations.

Use #[expect(dead_code)] with restraint (expect-dead-code)

In general, dead code should be avoided because (i) it introduces unnecessary maintenance overhead, and (ii) its correctness can only be guaranteed by manual and error-prone review.

Dead code is acceptable only when all of these hold:

  1. A concrete case will be implemented in the future that turns the dead code into used code.
  2. The semantics are clear enough, even without the use case.
  3. The dead code is simple enough that both the committer and the reviewer can be confident it is correct without testing.
  4. It serves as a counterpart to existing non-dead code.

For example, it is fine to add ABI constants that are unused because the corresponding feature is partially implemented.
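
A hedged, hypothetical sketch of that case (the flag names and values below are made up, not real ABI constants): an unused constant can be acceptable when it completes a set whose other members are already in use.

```rust
// Hypothetical ABI flag set; values are illustrative only.
const EXAMPLE_FLAG_READ: u32 = 0x1; // used by the implemented path

// Not consumed yet: the feature that reads it is only partially
// implemented, but the constant's meaning is fixed by the ABI.
#[expect(dead_code)]
const EXAMPLE_FLAG_SYNC: u32 = 0x2;

fn is_read(flags: u32) -> bool {
    flags & EXAMPLE_FLAG_READ != 0
}
```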

See also: Rust Reference: Diagnostic attributes and rustc unfulfilled_lint_expectations.

Prefer functions over macros (macros-as-last-resort)

Prefer functions and generics over macros. Macros are powerful but harder to understand, debug, test, and format. Reach for a macro only when the type system or generics cannot express what you need (e.g., variadic arguments, compile-time code generation, or DSL syntax).

// Good — a generic function covers all types
fn align_up<T: Into<usize>>(val: T, align: usize) -> usize {
    let val = val.into();
    (val + align - 1) & !(align - 1)
}

// Bad — a macro where a function would suffice
macro_rules! align_up {
    ($val:expr, $align:expr) => {
        ($val + $align - 1) & !($align - 1)
    };
}

See also: The Rust Programming Language, Chapter 20.5 “Macros”; Rust by Example: Macros.

Select Topics

This section covers cross-cutting concerns that span multiple language items. Each page focuses on one topic with actionable guidelines.

Concurrency and Races

Concurrency code is reviewed with extreme rigor. Lock ordering, atomic correctness, memory ordering, and race condition analysis are all demanded explicitly.

Establish and enforce a consistent lock order (lock-ordering)

Acquiring two locks in different orders from different code paths is a potential deadlock. Hierarchical lock order must be established and documented.

pub(super) fn set_control(
    self: Arc<Self>,
    process: &Process,
) -> Result<()> {
    // Lock order: group of process -> session inner -> job control
    let process_group_mut = process.process_group.lock();
    // ...
}
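
The principle can be sketched in a self-contained, hedged form (using std::sync::Mutex and a hypothetical `Account` type as stand-ins): when two locks must be held together, derive the acquisition order from a stable key such as an ID, so every code path acquires them in the same order.

```rust
use std::sync::Mutex;

pub struct Account {
    pub id: u64,
    pub balance: Mutex<i64>,
}

// Lock order: always acquire the account with the lower `id` first.
// Because `transfer(a, b, _)` and `transfer(b, a, _)` follow the same
// order, two concurrent transfers cannot deadlock on each other.
pub fn transfer(from: &Account, to: &Account, amount: i64) {
    let (first, second) = if from.id < to.id { (from, to) } else { (to, from) };
    let mut g1 = first.balance.lock().unwrap();
    let mut g2 = second.balance.lock().unwrap();
    // Re-associate the guards with their logical roles.
    let (from_bal, to_bal) = if std::ptr::eq(first, from) {
        (&mut *g1, &mut *g2)
    } else {
        (&mut *g2, &mut *g1)
    };
    *from_bal -= amount;
    *to_bal += amount;
}
```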

See also: PR #2942.

Never do I/O or blocking operations while holding a spinlock (no-io-under-spinlock)

Holding a spinlock while performing I/O or blocking operations is a deadlock hazard. Use a sleeping mutex or restructure to drop the lock first.

// Good — spinlock dropped before I/O
let data = {
    let guard = self.state.lock(); // state: SpinLock<...>
    guard.pending_data.clone()
};
self.device.write(&data)?;

// Bad — I/O while holding spinlock
let guard = self.state.lock(); // state: SpinLock<...>
self.device.write(&guard.pending_data)?;

See also: PR #925.

Do not use atomics casually (careful-atomics)

When multiple atomic fields must be updated in concert, use a lock. Only use atomics when a single value is genuinely independent.

// Good — a lock protects correlated fields
struct Stats {
    inner: SpinLock<StatsInner>,
}
struct StatsInner {
    total_bytes: u64,
    total_packets: u64,
}

// Bad — two atomics that must be consistent
// but can be observed in an inconsistent state
struct Stats {
    total_bytes: AtomicU64,
    total_packets: AtomicU64,
}

Critical sections must not be split across lock boundaries (atomic-critical-sections)

Operations that must be atomic (check + conditional action) must happen under the same lock acquisition. Moving a comparison outside the critical region is a correctness bug.

// Good — check and action under the same lock
let mut inner = self.inner.lock();
if inner.state == State::Ready {
    inner.state = State::Running;
    inner.start();
}

// Bad — TOCTOU race: state can change
// between the check and the action
let is_ready = self.inner.lock().state == State::Ready;
if is_ready {
    self.inner.lock().state = State::Running;
    self.inner.lock().start();
}

See also: PR #2277.

Defensive Programming

Assertions verify invariants that must hold for the program to be correct. Choosing the right assertion type balances safety against runtime cost.

Use debug_assert for correctness-only checks (debug-assert)

Assertions verifying invariants that should never fail in correct code belong in debug_assert!, not assert!. debug_assert! is compiled out in release builds, so the check catches bugs during development without costing anything in production.

debug_assert!(self.align.is_multiple_of(PAGE_SIZE));
debug_assert!(self.align.is_power_of_two());

See also: std::debug_assert! and Rust Reference: debug_assertions.

Error Handling

Rust’s error handling model — Result, ?, and typed errors — is central to writing reliable kernel code.

Propagate errors with ? (propagate-errors)

Use the ? operator to propagate errors idiomatically. In kernel code, .unwrap() is rejected wherever failure is a legitimate possibility.

// Good — propagate with ?
let tsc_info = cpuid.get_tsc_info()?;
let frequency = tsc_info.nominal_frequency()?;

// Bad — unwrap hides the failure path
let tsc_info = cpuid.get_tsc_info().unwrap();

See also: The Rust Programming Language, Chapter 9 “Error Handling” and Rust by Example: unpacking options and defaults with ?.

Logging

Consistent logging makes debugging tractable across a large kernel codebase.

Use log crate macros exclusively (log-crate-only)

The project standardizes on the log crate’s macros: trace!, debug!, info!, warn!, error!. Custom output functions, println!, and hand-rolled serial print macros are not acceptable in production code. Exception: code that runs before the logging subsystem is initialized may use early-boot output helpers.

// Good
log::info!("VirtIO block device initialized: {} sectors", num_sectors);

// Bad
println!("VirtIO block device initialized: {} sectors", num_sectors);

Choose appropriate log levels (log-levels)

| Level | Use for |
| --- | --- |
| trace! | High-frequency events: every interrupt, every packet, every page fault. |
| debug! | Development diagnostics: state transitions, intermediate values. |
| info! | Rare, noteworthy events: subsystem initialization, configuration changes. |
| warn! | Recoverable problems: fallback paths taken, deprecated usage detected. |
| error! | Serious failures: resource exhaustion, invariant violations caught at runtime. |

A log statement that fires on every syscall or every timer tick must use trace!, not debug!.

See also: PR #2260.

Memory and Resource Management

Rust’s ownership model is the primary tool for safe resource management in the kernel.

Use RAII for all resource acquisition and release (raii)

Resources — IRQ enable/disable state, port numbers, file handles, DMA buffers, lock guards — must use the Drop trait for automatic cleanup. Manual enable()/disable() call pairs are rejected.

// Good — RAII guard ensures IRQs are re-enabled
fn disable_local() -> DisabledLocalIrqGuard { ... }

impl Drop for DisabledLocalIrqGuard {
    fn drop(&mut self) {
        enable_local_irqs();
    }
}

// Bad — caller can forget to re-enable
fn disable_local_irqs() { ... }
fn enable_local_irqs() { ... }

Prefer lexical lifetimes, letting the Rust compiler insert drops automatically, over calling drop() manually. Use explicit drop() calls only when the default drop order is incorrect.
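
A hedged sketch of that exception (`Device` and its methods are hypothetical): locals drop in reverse declaration order at the end of their scope, so when a lock guard must be released before a slow operation, an explicit drop() both enforces and documents that the guard's lifetime ends early.

```rust
use std::sync::Mutex;

pub struct Device {
    pending: Mutex<Vec<u8>>,
}

impl Device {
    pub fn flush(&self) -> usize {
        let guard = self.pending.lock().unwrap();
        let data: Vec<u8> = guard.clone();
        // Default scoping would hold the lock across `slow_write`;
        // drop the guard explicitly so the lock is released first.
        drop(guard);
        Self::slow_write(&data)
    }

    fn slow_write(data: &[u8]) -> usize {
        // Stand-in for real I/O.
        data.len()
    }
}
```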

See also: PR #164.

Performance

Performance on critical paths is taken very seriously. Changes to hot paths must be benchmarked. Unnecessary copies, allocations, and O(n) algorithms are rejected.

Avoid O(n) algorithms on hot paths (no-linear-hot-paths)

System call dispatch, scheduler enqueue, and frequent query operations must not introduce O(n) complexity where n is a quantity that can be large (number of processes, number of file descriptors, etc.). Demand sub-linear alternatives.

// Bad — O(n) scan on every enqueue
fn select_cpu(&self, cpus: &[CpuState]) -> CpuId {
    cpus.iter()
        .min_by_key(|c| c.load())
        .expect("at least one CPU")
        .id()
}

// Good — maintain a priority queue
// so selection is O(log n)
fn select_cpu(&self) -> CpuId {
    self.cpu_heap.peek().expect("at least one CPU").id()
}

See also: PR #1790.

Minimize unnecessary copies and allocations (minimize-copies)

Extra data copies — serializing to a stack buffer before writing, cloning an Arc when a & reference suffices, collecting into a Vec when an iterator would do — should be avoided.

// Bad — unnecessary Arc::clone
fn process(&self, stream: Arc<DmaStream>) {
    let s = stream.clone();
    s.sync();
}

// Good — borrow when ownership is not needed
fn process(&self, stream: &DmaStream) {
    stream.sync();
}

See also: PR #2582 and #2725.

No premature optimization without evidence (no-premature-optimization)

Performance optimizations must be justified with data. Introducing complexity to solve a non-existent problem is rejected. If you claim a change improves performance, show the numbers.

Git Guidelines

These guidelines cover commit hygiene and pull request conventions. For the underlying philosophy, see How Guidelines Are Written.

Write imperative, descriptive subject lines (imperative-subject)

Write commit messages in imperative mood with the subject line at or below 72 characters. Wrap identifiers in backticks.

Common prefixes used in the Asterinas commit log:

  • Fix — correct a bug
  • Add — introduce new functionality
  • Remove — delete code or features
  • Refactor — restructure without changing behavior
  • Rename — change names of files, modules, or symbols
  • Implement — add a new subsystem or feature
  • Enable — turn on a previously disabled capability
  • Clean up — minor tidying without functional change
  • Bump — update a dependency version

Examples:

Fix deadlock in `Vmar::protect` when holding the page table lock

Add initial support for the io_uring subsystem

Refactor `TcpSocket` to separate connection state from I/O logic

If the commit requires further explanation, add a blank line after the subject followed by a body paragraph describing the why behind the change.

See also: PR #2877 and #2700.

One logical change per commit (atomic-commits)

Each commit should represent one logical change. Do not mix unrelated changes in a single commit. When fixing an issue discovered during review on a local or private branch, use git rebase -i to amend the commit that introduced the issue rather than appending a fixup commit at the end.

See also: PR #2791 and #2260.

Separate refactoring from features (refactor-then-feature)

If a feature requires preparatory refactoring, put the refactoring in its own commit(s) before the feature commit. This makes each commit easier to review and bisect.

See also: PR #2877.

Keep pull requests focused (focused-prs)

Keep pull requests focused on a single topic. A PR that mixes a bug fix, a refactoring, and a new feature is difficult to review.

Ensure that CI passes before requesting review. If CI fails on an unrelated flake, note it in the PR description.

Testing Guidelines

This page covers language-agnostic testing conventions. For Rust-specific assertion policy (assert! vs debug_assert!), see Rust Guidelines — Defensive Programming.

Add regression tests for every bug fix (add-regression-tests)

When a bug is fixed, a test that would have caught the bug should accompany the fix. Include a reference to the issue number in a comment so future readers can recover the original context.
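
A hedged Rust sketch of the convention (the function, the bug, and the issue number are all hypothetical): the test name describes the behavior being verified, and a comment links back to the issue for context.

```rust
// Minimal stand-in for the function that was fixed (hypothetical).
fn normalize_path(path: &str) -> String {
    if path.len() > 1 {
        path.trim_end_matches('/').to_string()
    } else {
        path.to_string()
    }
}

// Regression test for issue #1234 (hypothetical number): paths with a
// trailing slash used to fail lookup. This test fails on the pre-fix code.
fn trailing_slash_is_ignored_in_lookup() {
    assert_eq!(normalize_path("/tmp/dir/"), "/tmp/dir");
    assert_eq!(normalize_path("/"), "/");
}
```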

See also: PR #2962.

Test user-visible behavior, not internals (test-visible-behavior)

Tests should validate observable, user-facing outcomes. Prefer testing through public APIs rather than exposing internal constants in test code.

Name tests after the behavior or specification concept being verified, not after internal implementation details. Using kernel-internal names in user-space regression tests creates unnecessary coupling.

See also: PR #2926.

Use assertion macros, not manual inspection (use-assertions)

Use language- or framework-provided assertion helpers instead of printing values and manually inspecting output. Assertions provide clear failure messages and make tests self-checking.
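
A hedged Rust sketch of the difference (`checksum` is a hypothetical function under test): the assertion makes the test self-checking and reports both values on failure, while the println! version passes silently no matter what the output is.

```rust
// Hypothetical function under test.
fn checksum(data: &[u8]) -> u32 {
    data.iter().map(|&b| u32::from(b)).sum()
}

fn test_checksum() {
    // Good — self-checking: a mismatch fails the test and reports
    // both the expected and the actual value.
    assert_eq!(checksum(&[1, 2, 3]), 6);

    // Bad — requires a human to read the output to notice a regression:
    // println!("checksum = {}", checksum(&[1, 2, 3]));
}
```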

See also: PR #2877 and #2926.

Clean up resources after every test (test-cleanup)

Always clean up resources after a test: close file descriptors, unlink temporary files, and call waitpid on child processes. Leftover resources can cause flaky failures in subsequent tests.

// Good — cleanup after use
int fd = open("/tmp/test_file", O_CREAT | O_RDWR, 0644);
// ... test logic ...
close(fd);
unlink("/tmp/test_file");

See also: PR #2926 and #2969.

Assembly Guidelines

These guidelines apply to assembly code in module-level global_asm! blocks and standalone .S files. For the underlying philosophy, see How Guidelines Are Written.

Sections

Use the correct section directive (asm-section-directives)

For built-in sections, use the short directive (e.g., .text). For custom sections, use the .section directive with flags and type (e.g., .section ".bsp_boot", "awx", @progbits).

A blank line should follow each section definition to visually separate it from the code that follows.

.section ".bsp_boot.stack", "aw", @nobits

boot_stack_bottom:
    .balign 4096
    .skip 0x40000  # 256 KiB
boot_stack_top:

Place code-width directives after the section definition (asm-code-width)

On x86-64, if an executable section contains only 64-bit code, place the .code64 directive directly after the section definition. The same applies to .code32 for 32-bit code. In mixed sections, treat .code64 and .code32 as function attributes (see below).

.text
.code64

.global foo
foo:
    mov rax, 1
    ret

Functions

Place attributes directly before the function (asm-function-attributes)

Function attributes (.global, .balign, .type) should be placed directly before the function label and should not be indented. Prefer .global over .globl for clarity.

.balign 4
.global foo
foo:
    mov rax, 1
    ret

Add .type and .size for Rust-callable functions (asm-type-and-size)

Functions that can be called from Rust code should include the .type and .size directives. This gives debuggers a better understanding of the function.

.global bar
.type bar, @function
bar:
    mov rax, 2
    ret
.size bar, .-bar

This does not apply to boot entry points, exception trampolines, or interrupt trampolines — they may not fit the typical definition of “function” and their sizes may be ill-defined.

See also: PR #2320.

Use unique label prefixes to avoid name clashes (asm-label-prefixes)

A Rust crate is a single translation unit, so global_asm! labels in different modules within the same crate share the same global namespace. Add custom prefixes to labels to avoid name clashes (e.g., bsp_ for BSP boot code, ap_ for AP boot code).

# Good — prefixed to avoid clashes
bsp_boot_stack_top:
ap_boot_stack_top:

# Bad — generic names risk duplication
boot_stack_top:

See also: PR #2571 and #2573.

Prefer .balign over .align (asm-prefer-balign)

The .align directive’s behavior varies across architectures — on some it specifies a byte count, on others a power of two. Use .balign for unambiguous byte-count alignment.

# Good — unambiguous
.balign 4096

# Bad — architecture-dependent meaning
.align 12

See also: PR #2368.

Boterinas

Introduction

@boterinas is a general-purpose bot designed for a wide variety of tasks in Asterinas. It streamlines maintenance tasks to enhance workflow efficiency.

Commands are issued by writing comments that start with the text @boterinas. The available commands depend on which repository you are using. The main Asterinas repository contains a triagebot.toml file where you can see which features are enabled.

Commands for GitHub issues or pull requests should be issued by writing @boterinas followed by the command anywhere in the comment. Note that @boterinas will ignore commands in Markdown code blocks, inline code spans, or blockquotes. You can enter multiple @boterinas commands in a single comment.

For example, you can claim an issue and add a label in the same comment.

@boterinas claim
@boterinas label C-enhancement

Additionally, @boterinas allows for editing comments. If you don’t change the text of a command, the edit will be ignored. However, if you modify an existing command or add new ones, those commands will be processed.

Below, you’ll find a comprehensive guide on how to use @boterinas effectively.

Commands and Usage

Workflow Management

  • @boterinas rerun
    Restarts the workflow of the current pull request if it has failed unexpectedly. Only the author of the pull request can use this command.

Code Review

  • @boterinas codex
    Triggers an AI-powered code review of the pull request using OpenAI Codex. The review posts inline comments highlighting potential issues with correctness, performance, security, and maintainability. Only repository members and collaborators can use this command.

Issue and Pull Request Management

  • @boterinas claim
    Assigns the issue or pull request to yourself.

  • @boterinas release-assignment
    Removes the current assignee from an issue or pull request. This command can only be executed by the current assignee or a team member.

  • @boterinas assign @user
    Assigns a specific user to the issue or pull request. Only team members have permission to assign other users.

Label Management

  • @boterinas label <label>
    Adds a label to the issue or pull request.
    Example: @boterinas label C-enhancement C-rfc

  • @boterinas label -<label>
    Removes a label from the issue or pull request.
    Example: @boterinas label -C-enhancement -C-bug

Status Indicators

  • @boterinas author
    Indicates that a pull request is waiting on the author. It assigns the S-waiting-on-author label and removes both S-waiting-on-review and S-blocked, if present.

  • @boterinas blocked
    Marks a pull request as blocked on something.

  • @boterinas ready
    Indicates that a pull request is ready for review. This command can also be invoked with the aliases @boterinas review or @boterinas reviewer.

Notes

  • Only team members can assign users or remove assignments.
  • Labels are crucial for organizing issues and pull requests, so ensure they are used consistently and accurately.
  • For any issues or questions regarding @boterinas, please reach out to the team for support.

Version Bump

Version Numbers

Currently, Asterinas regularly releases two main artifacts for the Rust OS developer community: the OSDK and OSTD. To support development with these tools, we also publish companion Docker images on DockerHub (e.g., asterinas/osdk). While the Asterinas kernel is not yet ready for public binary release, its development Docker images (i.e., asterinas/asterinas) are released regularly.

All released crates for OSDK and OSTD share the same version number, which is stored in the VERSION file at the project root. The current content of this file is shown below.

0.17.1

Similarly, the Docker images’ version number is stored in the DOCKER_IMAGE_VERSION file, as shown below.

0.17.1-20260317

We use a custom format for Docker image versions: MAJOR.MINOR.PATCH-DATE. The MAJOR.MINOR.PATCH component aligns with the target version of published crates, while the DATE component allows us to introduce non-breaking updates to the Docker images without publishing a new crate version. Normally, the version in VERSION and the version in DOCKER_IMAGE_VERSION (ignoring the DATE part) are identical, except during the version bump process.

How to Bump Versions

We recommend a three-commit procedure to bump versions:

  1. Commit 1 “Bump the Docker image version” triggers the generation of a new Docker image.
  2. Commit 2 “Switch to a new Docker image” makes the codebase use the new Docker image.
  3. Commit 3 “Bump the project version” triggers the release of new crates.

Depending on your exact purpose, you may complete the version bump process with at most three commits within two PRs.

  • To make non-breaking changes to the Docker images, submit Commit 1 in a PR, then Commit 2 in another.
  • To make breaking changes to the Docker images and the crates’ APIs, submit Commit 1 in a PR, then Commit 2 and 3 in another.

Across the three commits, a convenient utility script, tools/bump_version.sh, will assist you.

Commit 1: “Bump the Docker image version”

After updating the Docker image content, increment the Docker image version using the following command:

./bump_version.sh --docker_version_file [major | minor | patch | date]

The second argument specifies which part of the Docker image version to increment. Use date for non-breaking Docker image changes. If the changes affect the crates to be published, select major, minor, or patch in line with semantic versioning.

This command updates the DOCKER_IMAGE_VERSION file. Submit these changes as a pull request. Once merged, the CI will automatically trigger the creation of new Docker images.

Commit 2: “Switch to a new Docker image”

Creating new Docker images can be time-consuming. Once the images have been pushed to DockerHub, write a follow-up commit to update all Docker image version references across the codebase.

./bump_version.sh --docker_version_refs

If your goal is to publish non-breaking changes to the Docker images, submit this commit in a PR and your job is finished. Otherwise, proceed to Commit 3.

Commit 3: “Bump the project version”

In this commit, synchronize the version number in VERSION with that in DOCKER_IMAGE_VERSION by running:

./bump_version.sh --version_file

This command also updates all version numbers in the Cargo.toml files of all crates scheduled for release. Pack these changes into a third commit and submit the last two commits in a single PR. After the PR is merged into the main branch, the CI will automatically publish the new crate versions.

RFC Overview

The Asterinas RFC process is intended to provide a consistent, transparent, structured path for the community to make “big” decisions. For example, the RFC process can be used to evolve the project roadmap and the system architecture.

For more details, see the RFC-0001 RFC process.

RFC-0001: RFC process

  • Status: Implemented
  • Pull request: https://github.com/asterinas/asterinas/pull/2365/
  • Date submitted: 2025-08-24
  • Date approved: 2025-09-05

Summary

The “RFC” (request for comments) process is intended to provide a consistent, transparent, structured path for the community to make “big” decisions. For example, the RFC process can be used to evolve the project roadmap and the system architecture.

Motivation

As the Asterinas project grows in scale and community size, an informal decision-making process becomes insufficient. A formalized process is crucial for maintaining the quality and coherence of the project. Several highly successful open-source projects have demonstrated the value of such a process, including Rust (RFCs), Fuchsia (RFCs), Python (Python Enhancement Proposals or PEPs), and Kubernetes (Kubernetes Enhancement Proposals or KEPs).

Adopting a formal RFC process will:

  • Enhance decision quality: By requiring a detailed proposal and public discussion, we can ensure that decisions are well-reasoned, technically sound, and consider a wide range of perspectives.
  • Foster Consensus: The process provides a clear framework for the community to come together, discuss differing viewpoints, and arrive at a consensus on important issues.
  • Create design records: Approved RFCs will serve as a historical archive, documenting the design choices made and the reasoning behind them. This is invaluable for new contributors and for future reference.

By introducing this process, we aim to support the healthy and sustainable growth of the Asterinas project.

Design

When to follow the RFC process

The RFC process should be used for any “substantial” change to Asterinas. Examples of changes that require an RFC include:

  • Launching new top-level sub-projects of a similar significance to OSTD and OSDK.
  • Defining or significantly altering the project roadmap and goals.
  • Establishing or changing project-wide norms, such as this RFC process itself or coding style guides.
  • Proposing significant architectural designs, such as the framekernel architecture or safe policy injection.
  • Introducing changes that affect user-space programs, such as adding a non-Linux system call or feature.

Examples of changes that do not require an RFC include:

  • Proposing a design whose impact is confined to a single sub-project or module. If the design is significant or non-trivial, submit a design proposal on GitHub Issues for discussion.
  • Adding well-understood features with established patterns, such as a standard Linux system call or a device driver.
  • Refactoring existing code, fixing bugs, or improving performance.

When in doubt, it is best to consult with the project maintainers using the methods described in the next section.

How the RFC process works

The RFC process consists of several distinct stages:

The Socialization Stage

Before investing the time to write a full RFC, it is highly recommended to socialize the core idea with the community. This helps gauge interest, gather early feedback, and refine the proposal. Good venues for this include:

  • Starting a discussion on the project’s GitHub Discussions page.
  • Posting a “Pre-RFC” document to solicit more detailed feedback.
  • Talking to key contributors, code owners, or maintainers directly.

Having support from at least a few other community members is a strong signal that the idea is ready for a formal RFC. Additionally, having a proof-of-concept implementation or identifying individuals committed to the implementation can strengthen the proposal.

The Draft Stage

Once you are ready to proceed, create a formal RFC document.

  1. Fork the asterinas/asterinas repository.
  2. In the rfcs directory, copy rfc-template.md and rename it to 0000-your-rfc-title.md. The 0000 is a placeholder for the RFC number, which will be assigned later.
  3. Fill out the template with your proposal. The template provides a solid structure, but feel free to adapt it to best suit your proposal.

The Iteration Stage

Submit your RFC draft as a pull request to the Asterinas repository. Once the PR is opened, the formal review process begins:

  • A project maintainer will be assigned as the facilitator for the RFC. The facilitator’s role is to guide the discussion, ensure it remains productive, and ultimately determine when consensus has been reached.
  • All members of the community are encouraged to review the proposal and provide constructive feedback through comments on the pull request.
  • The RFC author is responsible for engaging with the feedback, addressing concerns, and updating the RFC text to reflect the evolving consensus.

The Voting Stage

When the discussion has converged and the major points of feedback have been addressed, the author can request that the facilitator initiate a final vote.

  • The vote is open to all project maintainers and code owners of top-level sub-projects. See the CODEOWNERS file for details.
  • Voters can express either “approval” or “rejection.”
  • The final decision will be based on a rough consensus as determined by the facilitator. This means that while unanimous agreement is not required, any major objections must be addressed.

If the final decision is approval:

  1. The facilitator will assign a unique RFC number.
  2. The author will update the RFC’s file name with the assigned RFC number. The author should also update the metadata fields of the RFC.
  3. The facilitator will merge the pull request.

The Maintenance Stage

Once an RFC is approved, the proposed changes are ready for implementation. When the corresponding work is completed and merged, the RFC’s status is updated from “approved” to “implemented.”

Approved RFCs are considered immutable historical records of design decisions. As such, only minor corrective edits like fixing typos or broken links are allowed.

Significant changes or amendments to an existing RFC must be proposed through an entirely new RFC. This new proposal can also be used to formally deprecate a previous RFC, in which case the original’s status will be updated to “deprecated.”

License

All RFCs submitted to the Asterinas project are licensed under the Apache License, Version 2.0, which is more permissive than the MPL used for the Asterinas code.

Drawbacks

Introducing a formal RFC process can increase the overhead for making changes to the project. The process is intentionally deliberate, which may slow down the pace of development for major features. There is also a risk that the process could become overly bureaucratic if not managed carefully. For contributors, the effort required to write a high-quality RFC and see it through the review process can be significant.

Prior Art and References

The Asterinas RFC process is heavily inspired by the well-established processes of the following two open-source projects:

  • Rust RFC Process: The overall structure and philosophy are closely modeled on Rust’s RFC process.
  • Fuchsia RFC Process: We have drawn on the Fuchsia process for its clear definition of roles and its emphasis on transparent decision-making.

RFC-0002: Asterinas NixOS

  • Status: Approved
  • Pull request: https://github.com/asterinas/asterinas/pull/2584
  • Date submitted: 2025-11-14
  • Date approved: 2025-12-01

Summary

This RFC formally proposes the establishment of an Asterinas distribution as a new, top-level sub-project of Asterinas. We intend for this distribution to leverage NixOS due to its unparalleled customizability and rich package ecosystem. Accordingly, we propose naming this new sub-project Asterinas NixOS (AsterNixOS).

Motivations

What is a “distro”?

In the context of operating systems (OSes), a “distribution” (or “distro”) typically refers to a complete OS built around a kernel, complemented by userspace tools, libraries, and applications managed via a package management system. A Linux distro, for example, combines the Linux kernel with a rich userspace. Similarly, an Asterinas distro will pair the Asterinas kernel with a comprehensive userspace environment. For the purpose of this discussion, “distro” will refer broadly to either a Linux or Asterinas-based distro.

Short-term strategic advantages

Achieving a Minimum Viable Product (MVP) milestone is crucial for Asterinas’s maturation. Reaching MVP means that Asterinas is ready for evaluation by early adopters, who expect a seamless experience comparable to mainstream OSes. This includes easy application installation and out-of-the-box functionality. Simply providing a kernel is insufficient; we must deliver a user-friendly experience, which necessitates a full-fledged distro complete with an intuitive package manager.

Furthermore, direct control over an Asterinas distro offers a pragmatic approach to addressing Linux compatibility challenges. While Asterinas is committed to a high degree of Linux compatibility, achieving perfect parity in the near term is impractical. A dedicated distro allows us to configure or patch specific packages to circumvent reliance on advanced or bleeding-edge Linux features that do not have high priority in Asterinas. This significantly reduces pressure on Asterinas kernel developers to implement complex features prematurely, enabling a more focused and stable development roadmap.

Beyond external users, Asterinas developers themselves will greatly benefit from a dedicated distro. Testing complex applications with intricate dependencies is a significant hurdle. While “Hello World” or even medium-sized projects like Nginx or Redis can be manually built and integrated into a disk image, this approach does not scale for complex applications such as Spark, PyTorch, or Chromium. Identifying, understanding, and building every dependency for such projects would divert critical kernel development resources. A distro, by its very nature, abstracts this complexity through its package manager, streamlining the testing and development workflow for all Asterinas developers.

Long-term ecosystem vision

Looking ahead, we envision a thriving ecosystem of Asterinas distros. This initial, kernel-developer-maintained distro will serve as a vital reference implementation, fostering the creation of diverse Asterinas-based distros. While the Linux world boasts numerous distros, Asterinas-specific distros will be essential for two primary reasons:

Firstly, Asterinas prioritizes Linux compatibility at the external ABI level, not the internal kernel module interface. This means Asterinas cannot load Linux kernel modules (.ko files). An Asterinas distro will ensure that all system programs exclusively load Asterinas kernel modules.

Secondly, Asterinas will eventually provide features and value propositions that Linux simply does not have—for example, new system calls, file systems, and device drivers. These benefits only become real when applications can detect and leverage them.

An Asterinas distro is the natural vehicle for this. It can ship new packages written specifically for Asterinas and carry patches to existing packages to make them “Asterinas-aware”, ensuring that userspace can meaningfully take advantage of the kernel’s unique capabilities.

In essence, establishing Asterinas distros is paramount for Asterinas’s long-term success and differentiation. This RFC lays the groundwork for that future by proposing our foundational distro.

Design

Why NixOS as the base distro?

Building the first Asterinas distro from scratch would be an immense undertaking, diverting focus from core kernel development. Therefore, basing it on an existing Linux distro is the most rational path forward. This leads to the crucial question: which Linux distro should serve as our foundation?

The landscape of Linux distros is vast, featuring prominent names such as Arch, CentOS, Debian, and Gentoo. While theoretically, any of these could serve, we have identified NixOS as a particularly attractive and uniquely suited option.

In most distros, package recipes are centered around shell scripts (e.g., PKGBUILD in Arch Linux or the rules file in a Debian package). NixOS, however, utilizes a purpose-built, purely functional language called Nix, which offers unparalleled expressiveness and flexibility for defining package recipes.

A core requirement for our base distro is the ability to easily tweak existing package recipes to work around Asterinas’s limitations or customize the new distro’s look or behavior. The Nix language excels in this regard.

For instance, consider overriding specific attributes of an existing Nix package, such as xfdesktop, with minimal code:

{ pkgs }:
{
  xfdesktop = pkgs.xfce.xfdesktop.overrideAttrs (oldAttrs: {
    version = "4.16.0";
    patches = (oldAttrs.patches or []) ++ [
      ./patches/xfdesktop4/0001-Fix-not-using-consistent-monitor-identifiers.patch
    ];
  });
}

This Nix file creates a customized xfdesktop package without requiring intrusive modifications to its original recipe. This capability, known as the override design pattern in Nix, is uniquely feasible due to Nix’s first-class functions and lazy evaluation.
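The mechanics behind this pattern can be sketched in a few lines of plain Nix. The following is a toy illustration, not the real Nixpkgs implementation: because Nix functions are first-class values and attribute sets are evaluated lazily, a "package" can simply remember the arguments it was built from and rebuild itself from a modified copy of them.

```nix
# Toy illustration of the override pattern (not the real Nixpkgs code):
# mkPackage keeps its argument set around, so `override` can re-invoke
# it with an updated copy. Laziness means nothing is recomputed until
# the resulting attributes are actually read.
let
  mkPackage = args: args // {
    override = newArgs: mkPackage (args // newArgs);
  };
  hello = mkPackage { name = "hello"; version = "1.0"; };
  hello2 = hello.override { version = "2.0"; };
in
hello2.version  # evaluates to "2.0"
```

Nixpkgs builds `overrideAttrs` and `override` on essentially this idea, applied to real derivations instead of plain attribute sets.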

Another powerful example of NixOS’s flexibility is how easily we can create an Asterinas distro ISO installer by customizing an existing NixOS installer ISO. Consider the following iso-image.nix file:

{ lib, pkgs, ... }:
let
  asterinas_kernel = builtins.path { path = ./asterinas_bin; };
  auto_install_script = pkgs.replaceVarsWith {
    src = ./auto_install.sh;
    isExecutable = true;
    replacements = {
      shell = "${pkgs.bash}/bin/sh";
      inherit asterinas_kernel;
    };
  };
  configuration = {
    imports = [
      "${pkgs.path}/nixos/modules/installer/cd-dvd/installation-cd-minimal.nix"
      "${pkgs.path}/nixos/modules/installer/cd-dvd/channel.nix"
    ];

    services.getty.autologinUser = lib.mkForce "root";
    environment.loginShellInit = "${auto_install_script}";
  };
in (pkgs.nixos configuration).config.system.build.isoImage

This file defines a new installer ISO that differs from the original in two key ways. First, it ships with the Asterinas kernel (asterinas_kernel). Second, instead of dropping the user into an interactive shell, the ISO’s init process immediately runs an automatic installation script (auto_install.sh).

A notable NixOS feature relevant to our design is /etc/nixos/configuration.nix, the primary configuration file that determines all system-wide states of a NixOS installation, including the kernel, installed software, and configuration. A minimal configuration.nix for our distro might look like this:

{ config, pkgs, lib, ... }:
{
  # Do not edit the following system configuration for Asterinas NixOS
  nixpkgs.overlays = [ (import ./asterinas.nix) ];

  # Edit this list to add or remove installed software
  environment.systemPackages = with pkgs; [
    gcc
    python3
    vim
  ];
}

Here, the sample configuration.nix uses a NixOS feature called overlays to customize and extend Nixpkgs without modifying its upstream source. Overlays apply globally within the configuration, so our Asterinas-specific changes to vanilla NixOS can be expressed cleanly as reusable overlays rather than as ad-hoc patches scattered throughout the system.
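As a minimal, hypothetical sketch, the asterinas.nix overlay imported above might take the following shape (the overridden package and configure flag are purely illustrative):

```nix
# Hypothetical sketch of asterinas.nix. `final` and `prev` are the
# conventional overlay arguments: the fully customized package set and
# the original one. The package and flag below are illustrative only.
final: prev: {
  util-linux = prev.util-linux.overrideAttrs (oldAttrs: {
    # Example: avoid a feature that depends on a Linux-only interface
    # that Asterinas does not yet implement.
    configureFlags = (oldAttrs.configureFlags or [ ]) ++ [ "--disable-ipcs" ];
  });
}
```

Because the overlay is an ordinary Nix function, the same file can be reused across machines, installers, and CI configurations.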

A further significant advantage of NixOS is the portability of Nix, its package manager (and the Nix language it uses), across various Linux distros and even macOS.

Unlike most package managers, which are tightly coupled to their parent distros (e.g., pacman for Arch, dpkg for Debian, rpm for Red Hat-based systems), Nix operates independently. Nix packages are installed in the /nix/store directory, ensuring they do not conflict with native packages. This portability is invaluable for debugging: if a package in our NixOS-based distro encounters an issue, we can replicate the exact same package environment on any Linux development machine, greatly simplifying troubleshooting.
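For example, a hypothetical shell.nix like the one below can recreate the same package environment on any Linux development machine with Nix installed (the package selection is illustrative; pinning nixpkgs to a specific revision would make the environment fully reproducible):

```nix
# Hypothetical shell.nix for reproducing a package environment outside
# the distro; everything lives under /nix/store and does not interfere
# with the host's native packages.
let
  pkgs = import <nixpkgs> { };
in
pkgs.mkShell {
  # Illustrative selection for debugging a misbehaving service.
  packages = [ pkgs.redis pkgs.strace ];
}
```

Entering this environment with `nix-shell` on any Linux distro yields the same package closure that the Asterinas distro would use, which is what makes cross-machine debugging practical.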

The New Top-Level Sub-Project

Given these compelling rationales, we propose the creation of a new top-level sub-project named Asterinas NixOS (AsterNixOS). This sub-project will reside in a new distro/ directory at the project root.

Initially, AsterNixOS will share version numbers with the Asterinas kernel, emphasizing its early development stage and close coupling. Every kernel release will be accompanied by a compatible distro release. In the long term, once both the kernel and the distro achieve maturity and stability, AsterNixOS will adopt its own independent versioning scheme, likely following a “YY.MM” format (e.g., 25.12).

Drawbacks, Alternatives, and Unknowns

Drawbacks

  • Learning curve for Nix/NixOS: This is arguably the most significant hurdle. While exceptionally powerful, the Nix ecosystem, including the Nix language and its functional paradigm, presents a steeper learning curve compared to conventional package managers and build systems. This could pose an initial barrier for new contributors and existing Asterinas developers unfamiliar with Nix. Investing in clear documentation and onboarding resources will be critical.
  • Maintenance overhead of Nixpkgs overlay: While the override pattern offers flexibility, maintaining a dedicated overlay for Asterinas-specific patches and configurations within the vast nixpkgs repository will still require continuous effort. Keeping pace with upstream nixpkgs changes, resolving potential conflicts, and ensuring compatibility will be an ongoing challenge requiring dedicated resources.

Alternatives

  • Embedded Linux distros: Buildroot and Yocto are established and highly capable tools for building embedded Linux distros, offering extensive customization for toolchains and root filesystems from scratch.
    • Why NixOS is superior: While powerful for embedded use cases, Buildroot and Yocto are less oriented toward a general-purpose desktop/server distro, which is our primary initial target. Furthermore, their configuration languages are generally less expressive and composable than the Nix language, limiting the flexibility we seek for deep customization and elegant overriding.
  • Direct port of Debian/Arch/etc.: This approach would involve directly modifying the build systems of a more traditional distro like Debian or Arch to target Asterinas.
    • Why NixOS is superior: As elaborated, traditional packaging systems are less flexible for the granular, targeted patching and configuration required when adapting to a new kernel. More critically, their package managers (e.g., dpkg, pacman, rpm) are inherently tied to their parent distros, making cross-distro development and debugging significantly more complex. The Nix package manager’s portability offers a clear advantage here.

Prior Art and References

The NixOS project itself serves as the most prominent piece of prior art, demonstrating the viability and advantages of a purely functional approach to OS configuration and package management. Its design principles, particularly around atomic upgrades, rollbacks, and reproducible builds, have been thoroughly proven since its inception in 2003. Other projects, like GNU Guix, also follow a similar functional package management paradigm.

RFC-0000: Your short, descriptive title

  • Status: Draft
  • Pull request: (link to PR)
  • Date submitted: YYYY-MM-DD
  • Date approved: YYYY-MM-DD

Summary

Provide a concise, one-paragraph summary of the proposed change. This should explain the core idea at a high level, making it understandable to someone without deep context.

Motivation

Explain the “why” behind this RFC. What problem does this solve? What is the user or developer story that justifies this change? Why is this change important for the Asterinas project now? This section should clearly articulate the benefits and convince readers of the proposal’s necessity.

Design

This is the heart of the RFC. Describe the proposed design in sufficient detail for a technical review. Explain how it works, how it interacts with other parts of the system, and any new APIs or concepts being introduced. Use code snippets, diagrams, and bullet points where they add clarity.

Drawbacks, Alternatives, and Unknowns

  • Drawbacks: What are the negative consequences of this proposal? What compromises are being made?
  • Alternatives: What other designs or solutions were considered? Why was this specific design chosen over them? What is the impact of not doing this?
  • Unresolved Questions: What parts of the design are still to be determined (TBD)? What questions will need to be answered during the implementation phase?

Prior Art and References

Include links to relevant work.