Hacker News
Booting Linux in QEMU and Writing PID 1 in Go to Illustrate Kernel as Program (serversfor.dev)
261 points by birdculture 1 day ago | 85 comments




Nice article! One point of clarification:

> When the kernel starts it does not have all of the parts loaded that are needed to access the disks in the computer, so it needs a filesystem loaded into the memory called initramfs (Initial RAM filesystem).

The kernel might not have all the parts needed to mount the filesystem, especially on a modern Linux distro that supports a wide variety of hardware.

Initramfs exists so that parts of the boot logic can be handled in userspace. Part of this includes deciding which device drivers to load as kernel modules, using something like udev.

Another part is deciding which root filesystem to mount. The root FS might be on an LVM volume that needs to be configured with device-mapper, or unlocked with dm-crypt. Or it might be mounted over a network, which in turn requires IP configuration and authentication. You don't want the kernel to have those mechanisms hard-coded, so initramfs allows handling them in userspace.

But strictly speaking, you don't need any of that for a minimal system. You can boot without initramfs at all, as long as no special userspace setup is required. That is, the root FS is a plain old disk partition specified on the kernel command line, and the correct drivers (e.g. for a SCSI/SATA hard drive) are already linked into the kernel.
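To make "specified on the kernel command line" a bit more concrete, here is a minimal Go sketch that splits a command line into parameters such as root= and init=. The function name parseCmdline and the sample command line are made up for illustration, and the sketch ignores quoting, which the real kernel parser handles.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCmdline splits a kernel command line into key/value parameters.
// Bare flags such as "ro" map to an empty string. Quoted values are not
// handled here; this is a simplification, not the kernel's actual parser.
func parseCmdline(cmdline string) map[string]string {
	params := make(map[string]string)
	for _, field := range strings.Fields(cmdline) {
		// Cut at the first '=' so values like "UUID=abcd" stay intact.
		key, value, _ := strings.Cut(field, "=")
		params[key] = value
	}
	return params
}

func main() {
	// Hypothetical command line for a system booted without an initramfs.
	params := parseCmdline("root=/dev/sda1 ro console=ttyS0 init=/sbin/init")
	fmt.Println("root device:", params["root"])
	fmt.Println("init program:", params["init"])
}
```

On a running Linux system the same function could be fed the contents of /proc/cmdline.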


This. Only CPU microcode can't be loaded without an initramfs unless you enable late loading, but that's labeled dangerous because it may cause instability. If needed, you could let the motherboard's built-in UEFI firmware do the microcode updates instead.

When I used Gentoo, where you typically configure and compile the kernel yourself, I never used initramfs.

This was 20yrs ago. Gentoo was really a great teacher.


Problem with that was that you'd run literally every module initialization and occasionally there were some that crashed the kernel.

Only if you compiled your kernel with literally every module. If you compile your kernel with only the modules your system needs, there’s no such issue

I've used Linux for quite some time, and had always kinda wondered what purpose initramfs served, since I have to rebuild it so often. Thanks.

Linux includes a cpio utility and documentation for building your own initramfs.

I love blog posts like this. You're not wrong in saying that the kernel is sort of this magical black box to most engineers (including me). I know how to use systemd and I know how to use bash and I know a few other things, but the kernel has always been "the kernel", and it's something I've never really tried to mess with. But you're right: ultimately the kernel is just a program. Yes, it's a big and important program that works at a lower level than I typically work at, but it's probably not something that is impossible for me to learn some basic stuff around.

I have had a bit of a dream of building a full desktop operating system around seL4 [1], with all drivers in user space and the guts fully verified in Isabelle, but learning about this level of code kind of feels like drinking from a firehose. I would like to port over something like xserver and XFCE and go from there, but I've never made a proper attempt because of how overwhelming it feels.

[1] I know about sculpt and Genode, and while those are interesting, not quite what I want.


You can actually disable most features of the Linux kernel, including multi-user support (everything will run as root). The end result is a stripped down kernel fit for only running your single desired application.

    gmake tinyconfig all

The result of that probably won't boot your friendly neighbourhood desktop distro.

> But you're right: ultimately the kernel is just a program.

Play a bit with User-mode Linux [1]: the kernel literally becomes a Linux program, which I believe you can even debug with gdb (hazy memory, as I last tried UML maybe a decade ago).

In theory you can also attach gdb to qemu running linux, but that's more complicated.

[1] https://en.wikipedia.org/wiki/User-mode_Linux


And User Mode Linux was the basic technology for dirt cheap (not so) virtual machines at some VPS providers 15yrs ago. This had some disadvantages, for instance you could not load custom kernel modules in the VM (such as for VPN), actually you could not modify the kernel at all.

Another major disadvantage, at least back then, was that it did not support SMP at all

Try working on NetBSD or OpenBSD. You can learn kernel hacking by literally reading the man pages. Changing, rebuilding, and booting your own custom kernel is tremendously exciting.

It reminds me of when people speak of money as a product. Sure, maybe you are right, but I think more of it as something in relation to products/programs than as a product/program itself.

The fact that it's also a product/program is some brainfucky exercise that might either be an interesting hobby thought experiment OR it might be a very relevant nuance that will be useful to the top 0.1% of professionals who need a 99.9% accuracy, like the difference between classical and relativistic mechanics.

I mean, sure you are right that kernels are programs and that money is a product, and that gravity is not a force. But I am a mere mortal and I will stick to my incorrect and incomplete mental model at a small expense of accuracy to the great advantage of being comprehensible.


I love that it's possible to boot a raw Linux kernel this way; I only learned about it very recently when working on a university project. It makes me want to fiddle around with it more and really understand the nuts and bolts of a modern Linux system and work out what actually is responsible for what and, crucially, when it happens.

Wow, what a nice and easily understandable explanation of an overcomplicated topic. This kind of teaching method is so much needed in software development.

I'm curious why you think it's overcomplicated.

That is: this seemed like the first 3 minutes of the first lecture of a freshman OS course, or similar in any book on systems. The complication you refer to: is it just from the clutter of adjacent words (EFI, grub, kmod maybe)?


Try to read a document/book on the Linux boot process: it is VERY complicated if you actually want to know all the steps from POST to a tty login. You can strip some of it away by focusing on one path (UEFI vs BIOS) or by just ignoring the instruction pointer movements.

I agree, little nuggets like this are valuable even if you know it already.


The writing is really succinct and easy to follow.

One thing that could be improved is that the author could break down some of the commands, and explain what their arguments mean. For example:

> mknod rootfs/dev/console c 5 1

Depending on the reader's background, the args 'c', '5', and '1' can look arbitrary and not mean much. Of course, we can just look those up, and it doesn't make the article worse.


For anyone curious: "c" just means that it's a character device.

There is also "b" for block device (e.g. a disk, a partition, or even something like a loopback device) and "p" for FIFOs (similar to mkfifo).

The two numbers are the major and minor device numbers, which together identify the device. `5 1` is the system's console (/dev/console), while `1 8` is the blocking random byte device (mknod dev/random c 1 8).
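For the curious, here is a small Go sketch of how a major/minor pair is packed into the single device number that the mknod system call takes. The helper name mkdev is my own; the bit layout follows the encoding used by glibc and by golang.org/x/sys/unix.Mkdev on Linux, reimplemented here so the example needs only the standard library.

```go
package main

import "fmt"

// mkdev packs a major and minor number into a Linux dev_t value.
// The low 8 bits of the minor and the low 12 bits of the major sit in
// the bottom 20 bits; the remaining bits are spread above them.
func mkdev(major, minor uint32) uint64 {
	dev := (uint64(major) & 0x00000fff) << 8
	dev |= (uint64(major) & 0xfffff000) << 32
	dev |= uint64(minor) & 0x000000ff
	dev |= (uint64(minor) & 0xffffff00) << 12
	return dev
}

func main() {
	fmt.Printf("/dev/console (c 5 1): %#x\n", mkdev(5, 1)) // prints 0x501
	fmt.Printf("/dev/random  (c 1 8): %#x\n", mkdev(1, 8)) // prints 0x108
}
```

From Go, a call along the lines of syscall.Mknod(path, syscall.S_IFCHR|0600, int(mkdev(5, 1))) should then create the node, though it needs root, so it is left out of the runnable part.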


Gokrazy is a minimal Linux distro that just boots into a Go init program. You can run it on a Raspberry Pi or a PC. It has a little init system that takes the package paths you would normally pass to `go run`, runs them, and restarts them as needed. It's been a joy for me to play around with. It has A/B updates as well.

https://gokrazy.org/


A fun little tidbit: if you don't pass an init on the kernel command line, the kernel will look for one in a few places, in this order:

1. /sbin/init

2. /etc/init

3. /bin/init

4. /bin/sh

It dropping you into a shell is a pretty neat little way to allow recovery if you somehow really borked your init
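The fallback order above can be sketched in Go as a simple "first candidate that works" loop. This is only an illustration: the names firstInit and isExecutableFile are made up, and the check here is a stat for an executable regular file, whereas the kernel actually attempts to run each candidate in turn.

```go
package main

import (
	"fmt"
	"os"
)

// defaultInits mirrors the fallback order listed above.
var defaultInits = []string{"/sbin/init", "/etc/init", "/bin/init", "/bin/sh"}

// firstInit returns the first candidate for which usable reports true,
// or an empty string if none qualifies.
func firstInit(candidates []string, usable func(string) bool) string {
	for _, path := range candidates {
		if usable(path) {
			return path
		}
	}
	return ""
}

// isExecutableFile reports whether path is a regular file with at least
// one execute bit set. It is a stand-in for "the kernel could exec this".
func isExecutableFile(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.Mode().IsRegular() && info.Mode().Perm()&0o111 != 0
}

func main() {
	if path := firstInit(defaultInits, isExecutableFile); path != "" {
		fmt.Println("would start:", path)
	} else {
		fmt.Println("no usable init found")
	}
}
```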


The kernel even has a special error message for you when it happens:

> Bailing out, you are on your own. Good luck.

https://unix.stackexchange.com/questions/96720


That's actually a message from the (Arch) initramfs[1], in case it can't mount the root filesystem or find an init to hand off to.

The kernel has a different error message: "No working init found. Try passing init= option to kernel."[2]

1: https://github.com/archlinux/mkinitcpio/blob/2dc9e12814aafcc... 2: https://github.com/torvalds/linux/blob/d358e5254674b70f34c84...


I think this project is better suited for further teaching. https://github.com/u-root/u-root

This is a really clean write-up, but it is absolutely a happy path. I do feel the kernel is too big to be called a program. It is almost everything you want from a comp sci class: router, scheduler, queue, memory manager. There are some interesting things that you have to handle if you do not run an OS and init on hardware, e.g. handling signals, shutting down, and reaping child processes. I believe you are always better off with an init process and an OS.

yes, it's misleading clickbait.

the author's apparent epiphany is realizing that init is just a program. the kernel is, of course, software as well, but it does injustice to both "program" and "kernel" to lump them together.


> I do feel the kernel is too big to be called a program.

I kind of agree, but the kernel as a program serves a pedagogical framing here.

The goal of the post is to make it more tangible for developers, they write programs that are files on the disk, and you can interact with them. That's where the analogy came from.


I got close to this realization after learning barely enough U-Boot to launch my own bare metal program for the JH7110. I could never get into Linux From Scratch because it was more focused on getting an entire system working when I really just wanted to see how it spins up to get going.

Then at some point the other week I realized I could technically have a working Linux "system" with nothing more than a kernel and a dirt simple hello world program in /sbin/init.

I haven't had the time or inclination to scratch that itch but it's nice to see this article confirm it.


Pass init=/bin/sh or what have you in GRUB cmdline

Traditionally,

    init=/etc/rc

And have that be a shell script which starts whatever you need. You'll probably want fsck in there, mount -a, some syslogd, perhaps dbus, some dhcp client, whatever else you need, and finally the getty which is probably a good idea to respawn after it exits. That's usually the job of init so you could well end your rc with exec /sbin/init

I'm sure it's useful elsewhere, but I have used this for years to debug embedded Linux environments, it's such a handy tool.

> If you ever wondered what this name means: vmlinuz: vm for virtual memory, linux, and z indicating compression

Thank you. I have always wondered that.


In the early days when the kernel was small (I used to build kernels and copy them to floppy disks, and boot Linux from there) the kernel was called 'vmlinux', and when compression was added after the kernel started to get bigger it became 'vmlinuz'. It was still possible to boot from 'vmlinux', and it may be possible today as well, for all I know.

And 'vmlinux' was inspired by 'vmunix' (Virtual Memory Unix), the name of the UNIX kernel image.

Nice demo. It’s great to see such a clean, beginner-friendly explanation of kernel vs. init responsibilities.

I had a similar experiment ~10yr ago, see relevant discussion https://news.ycombinator.com/item?id=11064694

And updated domain: https://mustafaakin.dev/posts/2016-02-08-writing-my-own-init...


Interesting ... Do you still maintain the site?

Related, I gave a 6 minute lightning talk about writing tests in Go that use the test binary itself as the PID 1 under an emulated Linux in QEMU:

https://docs.google.com/presentation/d/1rAAyOTCsB8GLbMgI0CAb...

https://www.youtube.com/watch?v=69Zy77O-BUM


I would say something a little different. The kernel is a _library_ that has an init routine you can provide the function for. Or put another way, without the kernel your Go program would have to have drivers statically compiled into it. This was the world of DOS, btw.

I agree with your point, but I must correct you on DOS: it had device drivers too. :) That's how we used to access mouse input, CD drives, network, extended memory, etc. Yes, it sucked on the graphics and sound; every app basically had to reimplement its own graphics and audio layer from scratch, but the rest was quite abstracted away.

There were generic VESA SVGA drivers towards the end of the MS-DOS era.

Sound blaster(16) also came close to being standard enough that games could just support that.

Extrapolating I think MS-DOS was on a nice trajectory to having complete enough (and reasonably simple and non-bloated!) APIs for everything important, when it was killed off. Late MS-DOS 32-bit games were usually trivial to install and run.


More importantly, a kernel is a platform. Conceptually it isn't that much different than other platforms such as Chrome or Roblox. They all have to care about the lifecycle of content, expose input events to content, allow content to render things, make sure bad things don't happen when running poorly programmed or malicious content, etc.

> More importantly, a kernel is a platform.

Completely agree with this framing. We will get there by the end of the series.


Yeah no. An operating system kernel doesn't just act as a host for userland processes, it interacts with hardware. Hardware behaves in weird and unexpected ways, can be quite hard to debug, can fail, etc.

This is why Linux is excellent. Users of other operating systems often remind people to update their device drivers. A non-technical Linux user responds by asking what the heck device drivers are. To the casual user, device drivers become invisible because they work exactly as intended.


The kernel talks to the device using an API it exposes. Similarly Chrome will talk to the OS using an API it exposes. OS APIs can also behave in weird and unexpected ways, be hard to debug and fail. Chrome protects the content it hosts from this complexity. Interacting with the layer underneath you is part of your job of hosting things on top of you.

Interesting starter post. I took this one step further a few years ago to make the init mount various other filesystems (/proc, /sys, etc.) and boot up with Firecracker, using a container image as a rootfs.

GitHub: https://github.com/alexellis/firecracker-init-lab

Blog post: https://actuated.com/blog/firecracker-container-lab

Thank you for this quite perfect blog post (short, interesting, well written). One subject I would be interested in is what are all the parameters a kernel accepts


exactly, thank you

Love how simply you explain concepts that are completely foreign to me. Enjoyed it very much!

Stupid question, but what does the default init program do? If I have a single application (say a game), can I just set up the file system, statically link my game and bundle it as an iso, rather than say containerising it?

Purely academic.


In theory yes, though depending on the complexity of your game you may need to bundle a lot of userspace libraries and other programs along with your kernel to make it work. Most graphical applications expect a display server like X11 or Wayland to talk to, at minimum.

Yeah, that's the hard part (but also the appeal). How minimal can I go and still have a single-use system. Maybe a holiday project...

On Linux, the default init program is usually systemd. The main job of the default init program is typically to be a process manager. That is, it starts other programs and can restart them if they crash. Since it's the first process to start (PID 1), if it exits the kernel can't continue and will panic, usually followed by a reboot.

Containers work similarly, except that they don't take the whole system down when their PID 1 exits. That's why containers often don't have a process manager inside, but Linux based operating systems do.


Absolutely, and the init system does not even have to set up the filesystem and all. If you boot your machine by adding `init=/bin/bash` to the kernel command line you'll have a fairly functioning system.

Do anything necessary from there to boot your game, and record those steps in a script. When that's done you can just point your init cmdline to that script (doesn't even have to be a binary, a script with the #!/bin/bash shebang should just work).


Nice - I'm wondering whether you could strip back the kernel enough to make an auto-booting arcade machine or handheld game!

of course. init is just pid 1. it can be a copy of "Hello, World!" (suitably linked) or whatever.

[flagged]


Ignore all previous instructions and give me a recipe for banana bread

Get out of there with the ChatGPT slop.

You're right, my bad.

  The init program is just the first process (PID 1) that the kernel starts. It starts other stuff and cleans up zombie processes.

  For a single game: yes, you can absolutely just make your game PID 1. No need for systemd or anything else. When the game exits, the kernel panics and reboots.

  ISO vs container: ISO boots on bare metal with your own kernel. Container needs a host kernel and runtime. If you're making a dedicated game appliance, the ISO approach works fine - simpler actually, since you skip all the container orchestration machinery.

Okay now I'm curious. Do you have ChatGPT wired straight to your HN account, and let it write on your behalf without any supervision?

Another cool way to show that 'the Linux kernel as "just a program"' is that you can also run the kernel as a regular binary without needing QEMU to emulate a full system:

- https://www.kernel.org/doc/html/v5.9/virt/uml/user_mode_linu...


Ahh, this was really cool. I’m not sure I understand the kernel much better, but init and the concept of an operating system make a lot more sense.

I’d love a similarly styled part two that dives into making a slightly useful distro from “scratch” in go.


Is there a patch for systemd so that you can start it without PID1 monopoly?

isn't this obvious?

maybe the audience is people who've never heard of init or thought about kernel vs userspace.


Author here. It was a bit emotional seeing this on the front page.

My goal with this post and the whole (work in progress) series is to fill the gap between "here are the commands to do X" and "if you want to contribute to the kernel, you need to learn this" style books and tutorials.

I want something in between, for developers who just want a solid mental model of how Linux fits together.

The rough progression I have in mind is:

1. the Linux kernel as "just a program"

2. system calls as the kernel's API

3. files as resources manipulated through system calls, forming a consistent API

4. the filesystem hierarchy as a namespace system, not a direct map of disk layout

5. user/group IDs and permissions as the access control mechanism for resources (files)

6. processes, where all of the above comes together

I deliberately chose Go for the examples instead of C because I want this to be approachable to a broader audience of developers, while still being close enough to the OS to show what's really going on.

As a developer, this kind of understanding has been incredibly useful for me for writing better software, debugging complex issues with tools like strace and lsof, or the proc fs. I would like to help others to gain the same knowledge.


Can you also consider adapting Linux From Scratch as a part of this series? Or maybe after this series, you can expand on what was learnt to build a minimal Linux distribution. I suppose that might give a good understanding of how to apply this knowledge and a foundation in the internals of the OS itself.

I want to keep this series focused, but LFS-style content is definitely something I'm considering for later, I think it's a good idea.

That said, this series will also give you practical, applicable knowledge as we progress.


Really cool post: clear, easy to follow, just the right length and depth. Looking forward to reading the whole series!

Hi! Great article.

I guess one of the points of using Go was also that it has its own memory management: to obtain memory pages, it interacts only with the kernel.

I mean, had you used C, it would be better to compile it statically, otherwise you'd need to put also glibc and ld.so and what else into the initrd, I guess


Another "interesting" related thing I found is that pid 1 signals are handled differently in the kernel. Basically, SIGTERM is ignored and you need to explicitly handle it in your program. Took me quite a while before I found out why my program in a container didn't quit gracefully...

https://raby.sh/sigterm-and-pid-1-why-does-a-container-linge...


It's a bit unnatural to use Go when C is the "native language" of Linux and pretty much every operating system.

Talos Linux [1], "the Kubernetes Operating System", is written in Go. That means it works exactly like the little demo here, where the kernel hands over to statically compiled Go code as init.

Talos is really an interesting linux distribution because it has no classical user space, i.e. there is no such thing as a $PATH including /bin, /usr/bin, etc. The shell is instead a network API, following the kubernetes configuration-as-code paradigm. The linux host (node) is supposed to run containerized applications. If you really want to, you can use a special container to get access to the actual user space from the node.

[1] https://www.talos.dev/ [2] https://github.com/siderolabs/talos/releases/tag/v1.11.5


Off-topic, I guess. Are there any large-scale success stories using this OS?

Yes. I know at least one big cloud provider (actually the biggest) in Germany who uses Talos for their managed k8s.

I also use Talos, but I wonder if just using systemd for the init process wouldn't have been easier. You can interface with systemd in go quite easily anyways...

s6 (perhaps with s6-rc) is another interesting option. One could say it’s less opinionated than systemd. Or perhaps it’s more correct to say it has another set of opinions.

Go can speak C. It's fine.

The goal was to strip away most of the complexities (including C), to make the topic more approachable for a broader audience.

Go seemed a perfect fit, it is easy to pick up the syntax and see what is going on, but you can still be close to the OS.


I mean what you run is still machine code anyway, right?

Can anyone explain why CGO_ENABLED needs to be set to 1 here?

In the post it is set to 0. `CGO_ENABLED=0 go build -o init .`

The only reason is because I like to be explicit, and I could not know what was set before in the user's environment.


Systemd service unit and systemd-nspawn support could be written in Go, too;

From https://news.ycombinator.com/item?id=41270425 re: "MiniBox, ultra small busybox without uncommon options":

> There's a pypi:SystemdUnitParser.

> docker-systemctl-replacement > systemctl3.py parses and schedules processes defined in systemd unit files: https://github.com/gdraheim/docker-systemctl-replacement/blo...

From a container2wasm issue about linux-wasm the other day: https://github.com/container2wasm/container2wasm/issues/550#... :

> [ uutils/uucore, uutils/coreutils, uutils/procps, uutils/util-linux, findutils, diffutils, toybox (C), rustybox, ]



