
Onno Kortman has taken semu, a minimal RISC-V emulator, and cross-compiled it with llvm-mos, an LLVM port to the MOS 6502 processor, in order to run Linux on the Commodore 64. Kortman writes: "The screenshots took VICE a couple hours in 'warp mode' to generate. So, as is, a real C64 should be able to boot Linux within a week or so."


The 6502 is a notoriously poor target for C compilation, especially C that hasn't been written with the 6502's limitations in mind. I'll bet if you wrote a RISC-V emulator in native 6502 you could get Linux booting on a real machine in a day instead of a week. Think about how many lives that'd save!

https://www.folklore.org/StoryView.py?story=Saving_Lives.txt


Says you. llvm-mos generates surprisingly efficient 6502 code given its age and maturity. Don't take my word for it; try some experiments with it on godbolt.


Sure. Here are a couple of functions to iterate over an array of "Ball" objects, as you might do in a Breakout-style game that has a multi-ball powerup. I didn't do anything to make it particularly 6502-amenable; it's probably how I'd write it for a modern machine. Even compiled with -O2 the result is, as expected, slow and enormous - a single addition compiles to something like 30 instructions.

[0] https://godbolt.org/z/f6Ysv7nve
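
For a rough idea of the shape of that code (a sketch only - the field names, types, and function signatures here are assumptions; the real listing is at [0]):

    #include <stdint.h>

    struct Ball {
        int16_t x, y;    /* position */
        int16_t dx, dy;  /* velocity */
    };

    /* Array-of-structures, pointer-based: every field access needs a 16-bit
       struct-base-plus-offset calculation. */
    void MoveBalls(struct Ball *balls, uint8_t count) {
        for (uint8_t i = 0; i < count; i++) {
            balls[i].x += balls[i].dx;
            balls[i].y += balls[i].dy;
        }
    }

    /* A second pass over the same array: bounce off the side walls. */
    void BounceBalls(struct Ball *balls, uint8_t count, int16_t width) {
        for (uint8_t i = 0; i < count; i++) {
            if (balls[i].x < 0 || balls[i].x >= width)
                balls[i].dx = -balls[i].dx;
        }
    }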


Seems like your problem is more with the venerable 6502 itself than with the compiler. Most of that assembly is spent calculating offsets into the Ball struct, which must be done at 16-bit resolution in every case. The compiler is using the indirect indexed (zero page address with Y offset) 6502 addressing mode to get at all the fields in your struct. It has placed all the variables in zero page, so no instruction is more than two bytes long; additionally, the code in question is entirely linear, with no JSRs or other subroutine calls. Note in particular how it efficiently uses DEY/INY pairs of one-byte instructions to get at the low and high bytes of 16-bit values in memory. Hand-written assembly might be speedier, but not by much, and it would still have to deal with all the corner cases that your generated code does.

"While writing Apple BASIC for a 6502 microprocessor I repeatedly encountered a variant of Murphy's Law. Briefly stated, any routine operating on 16 bit data will require at least twice the code that it should." -Steve Wozniak
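
To make Wozniak's point concrete, here is a C-level sketch of what a single 16-bit addition costs when the hardware can only add 8 bits at a time (the 6502 does this as a CLC/ADC/ADC sequence):

    #include <stdint.h>

    /* A 16-bit add decomposed the way an 8-bit CPU must do it: add the low
       bytes, then add the high bytes plus the carry from the first add. */
    uint16_t add16(uint16_t a, uint16_t b) {
        uint8_t lo    = (uint8_t)(a + b);                       /* ADC, low bytes  */
        uint8_t carry = lo < (uint8_t)a;                        /* carry out?      */
        uint8_t hi    = (uint8_t)((a >> 8) + (b >> 8) + carry); /* ADC, high bytes */
        return (uint16_t)((hi << 8) | lo);
    }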


I think that was partially his point - on the 6502, typical-looking C code will be horrifically inefficient, at least when compared to architectures better suited to C.

To get optimal assembly on the 6502, you'd have to structure your data as structure-of-arrays instead of array-of-structures and use indices instead of pointers as much as possible (at least when the number of Ball objects is < 256).
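
A minimal sketch of that layout change, assuming a ball count below 256 so a one-byte index reaches everything:

    #include <stdint.h>

    /* Array-of-structures: every field access needs struct-base + field-offset
       arithmetic, carried out in 16 bits. */
    struct Ball { int16_t x, y, dx, dy; };
    struct Ball balls_aos[8];

    /* Structure-of-arrays: each field gets its own array, reached with a
       small 8-bit index instead of a pointer. */
    int16_t ball_x[8], ball_y[8];
    int16_t ball_dx[8], ball_dy[8];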


Yes, exactly. Were I hand-writing the assembly for the 6502, I'd make all sorts of decisions that the C code doesn't - and that a compiler can't - to make it more efficient.

Instead of passing in a pointer to two separate functions, I'd write a single UpdateBalls procedure that operated on global data. This data is going to be core to my game logic and physics, so I'd put it all on the ZP. As you suggested, "structure-of-arrays". I'd choose a fixed number of balls so I don't need an argument; maybe I'd set my loop to iterate backwards so I get a free zero check with the decrement, maybe I'd unroll the loop ("dead" balls can be placed off-screen with a dx/dy of 0). I'd probably decide that I don't need 16-bit precision for the deltas (how fast could the balls move, really?), and a 16-8 addition is going to be quicker than a 16-16 one.

The compiler isn't going to make these optimizations; that's not a slight against the compiler. In fact, I just checked - the output [0] when I write my C code this way is pretty close to what I'd hand-write. It's roughly a third the number of instructions and - I'm not going to cycle count, so this is a stab in the dark - would take maybe an order of magnitude fewer cycles to run. semu wasn't written with performance on the 6502 in mind and won't have taken considerations like this into account, so it's inevitably going to be slow when compiled.

[0] https://godbolt.org/z/WYKKeh9b7
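
The code at [0] isn't reproduced here, but it would look something like the sketch below: global structure-of-arrays data, a fixed ball count, a backwards loop for the free zero check, and 8-bit deltas so each update is a 16+8 add (the names and the exact ball count are assumptions):

    #include <stdint.h>

    #define NUM_BALLS 4

    /* Global game state - on the 6502 this would ideally live in zero page.
       "Dead" balls sit off-screen with dx = dy = 0. */
    int16_t ball_x[NUM_BALLS], ball_y[NUM_BALLS];
    int8_t  ball_dx[NUM_BALLS], ball_dy[NUM_BALLS];

    void UpdateBalls(void) {
        /* Count down so the decrement doubles as the loop's zero test. */
        for (uint8_t i = NUM_BALLS; i-- > 0; ) {
            ball_x[i] += ball_dx[i];   /* 16-bit position += 8-bit delta */
            ball_y[i] += ball_dy[i];
        }
    }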


I'd actually like llvm-mos to do an automated AoS-to-SoA analysis and rewrite, but I haven't gotten around to it yet. There aren't any intrinsic theoretical obstacles I'm aware of, though; it's just difficult code to write.

Now that this has come up again as the stock reason "you can't do C well on the 6502" (replacing the stack, the zero page, and the register set as the usual objections), I'm probably going to reprioritize it and put the register allocator work on pause.


I found it interesting that it didn't utilize the X register; maybe X could have been used instead of updating Y. Then I scrutinized the Y register use:

On line 14 it uses Y, then decrements it to 0, uses it, increments it, uses it, decrements it, uses it, then increments it again... Why not perform the indirect load on lines 18 and 26 without the Y index and eliminate lines 16, 21, and 25?

Here's my pseudocode:

    rc2 = base of struct
    rc4 = rc2 + 4            // addr of dx
    rc5 = rc2 + 0            // addr of x
    rc6 = *(&rc2 + 4)
    rc4 = *(&rc4 + 1)        // get low byte
    rc5 = rc6 + *(&rc2)      // add high byte
    rc4 = rc4 + *(&rc2 + 1)  // add low byte
    rc2 = rc5                // store high result
    *(&rc2 + 1) = rc4        // store low result

I believe it could have done more to do the work in place, but my battery is about to die :(


I've yet to see a 6502 C compiler that can beat good assembly code. I appreciate the convenience, and llvm-mos has certainly improved greatly, but if you want speed on a 6502, there's no substitute.


I've yet to see a 6502 C compiler get within sight of good assembly code. C makes many assumptions that are slow to implement on a 6502, and are notoriously hard to optimize.



The problem likely isn't what the compiler produces given the semantics of a piece of code, but rather what you'd do differently with data structures and memory layout if you were targeting the 6502 directly.


Oh wow! This bootstrapping method reminds me of yet another Linux-on-an-8-bit-micro project (https://dmitry.gr/?r=05.Projects&proj=07.%20Linux%20on%208bi...), which used an 8-bit AVR with an ARMv5 emulator. But this takes the cake in terms of geek coolness.


That's exactly what it reminded me of too; there's been plenty of HN discussion about it: https://news.ycombinator.com/item?id=19762928


I'd be curious to see this running on real hardware. I hope someone's able to make it work.


It might be more interesting than watching paint dry, if only because of the risk that an old C64 will let the smoke out.


Why would it let smoke out? I doubt a C64 has any power management. So whether it idles or boots Linux via a couple of emulation layers, the thermal load will be exactly the same.


I know there can be issues with thermal saturation in a heatsink design: the machine was expected not just to generate N watts of heat, but to only do so for M hours at a time. Can you expect to leave a real 64 on for days or weeks and have it stay up? I wonder whether stores that had them as demo units, back when it was a relevant product, power-cycled them regularly.

I know from experience that if you block the bottom intake vents on a VIC 20 so it can't convect properly, it will eventually start acting funky.

On a related note, I understand the 64's original power bricks are considered time bombs; they might also not appreciate being left on for weeks at a time.


> Can you expect to leave a real 64 on for days or weeks and have it stay up?

Yes. These early computers found their way into various embedded control applications too, and I suspect there are quite a few C64s still in operation that way; they would've been replaced long ago if they weren't stable. An article occasionally appears when someone discovers this:

https://news.ycombinator.com/item?id=12604414


A fairly simple diagnostic test for a C64 is to touch the chips: if the machine is in working order, the chips will be warm to the touch but not uncomfortably so. If any are uncomfortably warm, odds are they've failed.

If you block all air circulation, sure, you might eventually end up with problems, but it takes quite a bit.


I just saw a museum exhibition that featured a C64C, running all day, with a dust cover on, on what appeared to be the original PSU. I think it makes sense to be cautious about your own unit, but they’re probably not as vulnerable as people assume.


Definitely the Achilles heel in any C64 setup. Perhaps new PSUs are visually distinct from the old ones, but practically everyone had to replace theirs as they aged.

Now the 64C was released in 1986, four years after the 64 and its faulty power supplies came out. I don't know whether Commodore had decisively fixed the flawed PSUs by that time, but I know for sure that my second PSU lasted for the lifetime of that device too.


Just because it's likely an old and long-dormant piece of electronics; it has nothing to do with Linux beyond having to run the machine 24x7 for multiple days. My understanding is they don't come out of deep, dusty storage in ready-for-service condition. Leaky caps.


C64 caps are generally fine and do not need replacing. The most common things to spontaneously go bad of "old age" on a 64 are probably the RAM chips and the PLA, and of course the power supply is a time bomb.




