Inside the HP Nanoprocessor (righto.com)
133 points by picture on Sept 2, 2020 | 41 comments


The resistor compensating for manufacturing process differences reminds me of when I worked on the 3DFx Voodoo. There was a chain of transistors that sat inline with the clock, and you could select which output would be sent to the remote TMUs that were clocked by this line. Code in the startup would draw textured test patterns and examine the frame buffer to adjust the clock timing by nanoseconds using the chain of transistors. This was actually necessary because of variances in the manufacturing. When 3DFx switched to a completely new chip maker, our boards failed and we had to fix our startup code because it didn't have enough margin. Thankfully there were more transistors in the chain that we weren't using before. Crisis averted. The reason our boards were more susceptible than the reference design is that we had one of our TMUs slightly further away from the FBI.
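
For anyone wondering what that kind of startup calibration could look like, here is a rough Python sketch. It is not the actual 3DFx code: the rule of picking the middle of the longest run of working delay taps is my own assumption, and `pick_delay_tap` and its input list are hypothetical names. The input stands in for "did the test pattern read back correctly with this tap selected".

```python
def pick_delay_tap(tap_passes):
    """Return the tap index in the middle of the longest run of passing taps.

    tap_passes: sequence of booleans, one per selectable delay tap
    (hypothetical stand-in for the test-pattern readback result).
    Returns None if no tap passes.
    """
    best_start, best_len = None, 0
    run_start = None
    # A trailing False acts as a sentinel that closes a run ending at the last tap.
    for i, ok in enumerate(list(tap_passes) + [False]):
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    if best_start is None:
        return None
    # Centering in the passing window leaves margin on both sides.
    return best_start + (best_len - 1) // 2
```

Choosing the center rather than the first working tap is one way to get the kind of margin that mattered when the chip maker changed.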


It's interesting to hear that the 3DFx adjusted the clock that way. Coincidentally, I was just reading about similar clock adjustment in the Pentium II and 4. They had "adaptive deskewing", where a phase comparator would adjust the clock delay as needed. It sounds like 3DFx did the adjustment at startup, but the Pentium did it during use so it could compensate for temperature drift. The Itanium 2 had similar deskewing, except the value was set during manufacturing by blowing fuses.

Source: "CMOS VLSI Design", page 806.


IIRC Intel does similar but way more advanced automatic deskewing black magic in the Thunderbolt controllers. That's how they can carry high-speed PCIe signals so effortlessly across your average copper cable (it was originally supposed to be optical).


That's a somewhat standard part of transceivers, with the addition that the PCI-E lower layers implement forced skew themselves, even on the motherboard.

While I don't know exactly how Thunderbolt handles it, "normal" DisplayPort uses the PCI-E physical layer, just unidirectional and with a different protocol on top.

On networking equipment, the necessary signal corrections are part of why (other than DRM) it's more expensive to use full transceivers that accept cables, vs. fixed-length cables with fixed transceivers vs. direct-attach cables which have minimal logic for signal quality.


This is an amazing story. Thank you for sharing it.


I particularly liked the description of the HP clock module referenced in the article:

> The design of the clock module was rather unusual. To preserve the time when the computer was powered-down, the clock module was built around a digital watch chip with a backup battery. Inconveniently, the digital watch chip wasn't designed for computer control: it generated 7-segment signals to drive an LED, and it was set through three buttons. To read the time, the Nanoprocessor had to convert the 7-segment display outputs back into digits. And to set the time, the Nanoprocessor had to simulate the right sequence of button presses to advance through the digits.

That's quite the convoluted bit of interfacing, but no doubt using the off-the-shelf digital watch chip made it a "win". It's pleasingly Rube Goldberg.
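
To make the "convert the 7-segment outputs back into digits" step concrete, here is a small Python sketch of a reverse lookup. It is only an illustration, not the Nanoprocessor's actual routine: the bit assignment (segment a in bit 0 through segment g in bit 6) and the names `SEGMENTS_TO_DIGIT` and `decode_digits` are my assumptions.

```python
# Conventional 7-segment patterns for 0-9, assuming bit 0 = segment a ... bit 6 = segment g.
DIGIT_TO_SEGMENTS = [0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F]

# Invert the table so a segment pattern maps back to its digit.
SEGMENTS_TO_DIGIT = {pattern: digit for digit, pattern in enumerate(DIGIT_TO_SEGMENTS)}


def decode_digits(patterns):
    """Map each 7-bit segment pattern to a digit; unknown patterns become None."""
    return [SEGMENTS_TO_DIGIT.get(p & 0x7F) for p in patterns]
```

For example, `decode_digits([0x06, 0x5B, 0x4F, 0x66])` decodes a display showing 1, 2, 3, 4 under the assumed bit layout.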


I just broke someone's brain by relating this fact to them.

Layers of abstraction, even in the hardware.


I remember reading a part in iWoz about a circuit he built with an IC, that he created by ignoring the interface and knowing the internal circuit diagram.


Makes me think about https://en.wikipedia.org/wiki/Casio_F-91W, "the gitmo watch".


One of these arrived on my doorstep today.

I was realizing that I don't have a decent portable way to tell the time for long periods. Cellphones die in a few days, my AA-powered wall clock is hardly pocket-sized, and I haven't worn a watch in years. Seemed prudent to acquire.

I hit up the Casio outlet on eBay, searched "digital alarm" and sorted by price. Bam, F91W. Buy it now, done and done.

Then I started reading about its storied history! What a wild ride. If it's good enough for.... literally everyone from Presidents to terrorists... it's good enough for me.


I saw some projects in the 70s where they added a math co-processor to a standard 8-bit CPU by interfacing to an off-the-shelf calculator chip, with all the same issues. I'm sure it would be slower, but maybe the physical size might be the same as the ROM chips (1702s?) that would be required for the floating point code.


Back in the 90s, Intel ran a print advertisement for the Intel 387 portraying their competitors' math coprocessors as pocket calculators:

https://i.pinimg.com/originals/fd/1d/01/fd1d012149d9e7d67371...

I guess there was something to that? :)


That's an interesting ad, but a bit ironic given Intel's later Pentium floating point division bug.

On the topic of using calculator chips as coprocessors, National Semiconductor introduced the MM57109 Number Cruncher Unit (that is the real name) in 1977. It was essentially a repackaged 12-digit scientific calculator chip, operating on binary-coded decimal values with values entered in Reverse Polish Notation. This chip was absurdly slow; a tangent, for instance, could take over a second.

http://www.projects.scorchingbay.nz/dokuwiki/_media/electron...


Very much so, it reminded me of when you see some 'clever' code working around some legacy APIs.


Author here for all your Nanoprocessor questions. It's an unusual processor, lacking the ability to add or subtract. Even so, it was used in HP equipment, not just as a controller, but parsing strings and doing calculations.


The whole aspect of each chip's voltage being so variable that they had to test each one and hand-write the operating voltage on it, so that any use of the chip came down to matching that voltage, must have made drop-in replacements interesting for repairs.

Then there's the last number on the chip to indicate speed.

All that hands-on work for each chip, selling for $15 at that time, makes you wonder how much they made on them with all that manual binning needed.

Any idea on the margins back then for this chip?


Since the chip was used in HP products, there wasn't a margin as such. Much of the benefit was that they weren't paying margin to another company.

As for repairs, each product's service manual has a table specifying the correct resistor value for each Nanocomputer bias voltage. So you'd need to change the resistor if you replaced the processor.


Keep in mind that $15 USD in 1974 is more similar to $80 USD today. So budget appropriately.


> The Nanoprocessor supports indexed register access, but lacks the complex addressing modes of the other processors.

It wouldn't have much use for complex addressing modes, given the lack of RAM, no?

I'm trying to get my head around a RAMless computer and I'm not quite sure this all makes sense just yet.


If you can do everything in 16 registers, you don't need RAM. You have to get in the mindset of a control processor application, doing things like reading buttons on a piece of HP test equipment or sending bytes over the network. You're probably just dealing with a couple of values at a time and don't need RAM. This is a completely different set of applications from a typical computer.

As far as addressing modes, the Nanoprocessor has two modes for accessing registers. The first is direct, e.g. store the accumulator in register #3.

The second is indexed through R0. Store the accumulator in register # (3 OR R0). This lets you do table lookups (in a small table). Adding to R0 would make more sense, but the Nanoprocessor doesn't have an adder so they use OR.

So it makes sense to have multiple addressing modes even without RAM, but there's not a whole lot you can do with them.
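
A quick Python sketch of the two register modes described above, for anyone who wants to play with the OR behavior. This is a toy model, not an emulator: the function names are made up, and masking the result to 4 bits to stay within 16 registers is my assumption.

```python
NUM_REGS = 16  # the 16 registers mentioned above


def effective_register(n, r0):
    """Indexed mode: register number is (n OR R0), kept within 0..15 (assumed masking)."""
    return (n | r0) & (NUM_REGS - 1)


def store_direct(regs, acc, n):
    """Direct mode: store the accumulator in register n."""
    regs[n] = acc & 0xFF


def store_indexed(regs, acc, n):
    """Indexed mode: store the accumulator in register (n OR R0)."""
    regs[effective_register(n, regs[0])] = acc & 0xFF
```

OR behaves like addition only when the base and the index have no bits in common: a table based at register 8 with indexes 0 to 7 works (`8 | 3` is 11), while `3 | 1` is still 3, not 4. So small tables have to sit on aligned register numbers.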


Thanks for doing this. I remember stumbling over this mystery controller in some piece of HP equipment I bought, and at the time there was basically zero information about these around.


Did they sell it on the open market or was it an in house device?


It was an in-house device, not something HP sold as a product.


Thanks, that’s what I had assumed but read some comments from people who thought otherwise...


The processor was covered recently here as well. https://news.ycombinator.com/item?id=24109437

One neat aspect is it was intended to allow the use of an off chip, MMIO ALU if the design required it (and was still faster than a 6502 even with the separate ALU).


Yes, the HP voltmeter used two 74LS181 ALU chips so it could do error and scaling calculations.

The ALU was accessed through four I/O ports: two for the arguments, one for the operation and carry-in, and one to read the result. It wasn't memory-mapped, but I/O mapped since the Nanoprocessor didn't have memory operations (except reading instructions from ROM).
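
Here is a toy Python model of that four-port arrangement, just to show the shape of the access sequence. The port numbers, the operation codes, and the carry-in bit position are all invented for illustration; they are not the voltmeter's real assignments or the 74LS181's real select codes.

```python
# Hypothetical port numbers and encodings, chosen only for this sketch.
PORT_ARG_A, PORT_ARG_B, PORT_OP, PORT_RESULT = 0, 1, 2, 3
OP_ADD, OP_AND = 0, 1
CARRY_IN = 0x80  # assumed: top bit of the operation port carries the carry-in


class ExternalAlu:
    """An 8-bit ALU sitting behind output latches, read back through one input port."""

    def __init__(self):
        self.latches = {PORT_ARG_A: 0, PORT_ARG_B: 0, PORT_OP: 0}

    def write(self, port, value):
        self.latches[port] = value & 0xFF

    def read(self, port):
        if port != PORT_RESULT:
            raise ValueError("only the result port can be read")
        a, b = self.latches[PORT_ARG_A], self.latches[PORT_ARG_B]
        op = self.latches[PORT_OP] & 0x7F
        carry = 1 if self.latches[PORT_OP] & CARRY_IN else 0
        if op == OP_ADD:
            return (a + b + carry) & 0xFF
        if op == OP_AND:
            return a & b
        raise ValueError("unknown operation")


def alu_add(alu, a, b):
    """One addition costs three port writes and one port read."""
    alu.write(PORT_ARG_A, a)
    alu.write(PORT_ARG_B, b)
    alu.write(PORT_OP, OP_ADD)
    return alu.read(PORT_RESULT)
```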

Instead of memory-mapped I/O, the Nanoprocessor had I/O-mapped memory. The real time clock module had 256 bytes of RAM that were accessed through I/O ports.


What's the distinction you're making between mmio and I/O mapped? That it only has absolute addressing? Or that it just calls it I/O?


Memory and I/O were separate spaces with separate pins and separate operations. The Nanoprocessor had 11 address lines for reading instructions from a 2K ROM. It had 4 I/O device select lines for accessing 15 I/O devices.

So if you added RAM (as in the real time clock), the RAM was accessed through I/O instructions. You'd write the address to one port and read the data through another port. It ended up looking a lot like microcode, with memory accesses split into two pieces.
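
The two-piece memory access can be sketched in Python like this. It is a simplified model under my own assumptions: the port numbers and the `IoMappedRam`, `ram_write`, and `ram_read` names are hypothetical, and the real clock module's port layout may differ.

```python
PORT_ADDR, PORT_DATA = 4, 5  # hypothetical port numbers


class IoMappedRam:
    """RAM reachable only through an address port and a data port."""

    def __init__(self, size=256):
        self.cells = bytearray(size)
        self.addr = 0  # address latch, set by writing PORT_ADDR

    def out(self, port, value):
        if port == PORT_ADDR:
            self.addr = value % len(self.cells)
        elif port == PORT_DATA:
            self.cells[self.addr] = value & 0xFF
        else:
            raise ValueError("unknown port")

    def inp(self, port):
        if port != PORT_DATA:
            raise ValueError("only the data port can be read")
        return self.cells[self.addr]


def ram_write(ram, addr, value):
    ram.out(PORT_ADDR, addr)   # step 1: latch the address
    ram.out(PORT_DATA, value)  # step 2: write the data


def ram_read(ram, addr):
    ram.out(PORT_ADDR, addr)   # step 1: latch the address
    return ram.inp(PORT_DATA)  # step 2: read the data
```

Every load or store becomes two I/O operations, which is the "split into two pieces" feel mentioned above.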


It doesn’t have an ALU but can do other critical arithmetic, notably increment/decrement and, crucially, indexing in the addressing unit. Also bit manipulation. So for a state machine that’s mostly lookup tables it’s not worth building an ALU.

I was surprised by the two-instruction skip — skip was still pretty common in those days, but I haven’t seen two before. I suppose it would be useful for setting a flag before branching, but I wonder how valuable it was in the end.


The two-byte skip was typically used to skip over a jump instruction, giving you a conditional branch. But in many cases, two instructions were enough to implement the conditional case.

The two-instruction skip could also be used in tricky ways to implement two entry points to a function. E.g.

  Entry 1: Set Accumulator bit 1
           If accumulator bit 1 set, skip two instructions
  Entry 2: Set something different for entry 2
           More setup for entry 2
           Code continues for both entry 1 and entry 2


It’s much later, but Arm’s “IT (If-then) makes up to four following instructions conditional (known as the IT block). The conditions can all be the same, or some can be the logical inverse of others. IT is a pseudo-instruction in ARM state.”


The masks show how critical alignment is in metal gate transistors. The green, magenta and light blue have to just touch. Too much overlap or too far apart and you don't have a working transistor.

With polysilicon gates the equivalent of the green would be one big rectangle, but since it would come after the gate (instead of being the first step like here) it would actually become two separate rectangles just touching the gate on each side.


Teacher: you'll never not need addition and subtraction.

HP: hold my -2 voltage


The earliest AVRs (the family of MCUs used in Arduino) had no RAM either, only 32 8-bit registers. One of these was the AT90S1200. AFAIK it had a higher max clock frequency than the AT90S2313, the one with SRAM.


Note that it has sufficient instructions to emulate addition and subtraction, since it has compare and decrement/increment. It would take O(N) instructions to add or subtract N.


This is the algorithm the HP clock module uses to combine two BCD digits into one byte. It adds the two values by incrementing one and decrementing the other in a loop. Since the BCD digit is at most 9, this is fairly quick.
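
In Python terms, the counting approach looks roughly like this. It is a sketch of the general idea, not a transcription of the clock module's code; `add_by_counting` is a made-up name, and the `+ 1` and `- 1` stand in for the increment and decrement instructions.

```python
def add_by_counting(a, b):
    """Add two 8-bit values using only increment, decrement, and a zero test."""
    a &= 0xFF
    b &= 0xFF
    while b != 0:
        a = (a + 1) & 0xFF  # increment, wrapping at 8 bits
        b = (b - 1) & 0xFF  # decrement
    return a
```

The loop runs `b` times, so it pays to count down the smaller operand, such as a BCD digit that is at most 9.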

I think you could implement a faster addition algorithm by testing the high order bit of the arguments, incrementing the result as needed, and then shifting. Repeating this 8 times should give you the sum, compared with up to 255 steps for the simple algorithm.
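
A Python sketch of that bit-at-a-time idea, to check that it works. This is my reading of the description rather than tested Nanoprocessor code. It shifts the result before adding the increments instead of after, which comes to the same thing without a stray final shift, and it lets the result grow past 8 bits so the carry is visible.

```python
def add_by_shifting(a, b, bits=8):
    """Add using only high-bit tests, increments, and left shifts."""
    top = 1 << (bits - 1)
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    result = 0
    for _ in range(bits):
        result <<= 1      # make room for the next bit position
        if a & top:       # high-order bit of the first argument
            result += 1   # increment
        if b & top:       # high-order bit of the second argument
            result += 1   # increment
        a = (a << 1) & mask
        b = (b << 1) & mask
    return result
```

Each pass computes `result = 2 * result + bit(a) + bit(b)`, working from the most significant bit down, so after 8 passes the result is the full sum. On a real 8-bit register the ninth bit would have to be caught as a carry.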


You could also use a look up table in memory, a la IBM 1620. (CADET, can’t add, doesn’t even try.)
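
Roughly what table-driven decimal addition looks like, as a Python sketch. The table layout here is invented for illustration and is not the IBM 1620's actual memory format; the tables are filled in with `+` up front, standing in for contents that would be preloaded, and the add routine itself only does lookups.

```python
# Preloaded tables (hypothetical layout): digit sum and carry for every digit pair.
ADD_TABLE = [[(x + y) % 10 for y in range(10)] for x in range(10)]
CARRY_TABLE = [[(x + y) // 10 for y in range(10)] for x in range(10)]


def add_digits(a, b):
    """Add two equal-length lists of decimal digits, least significant digit first."""
    out = []
    carry = 0
    for x, y in zip(a, b):
        s = ADD_TABLE[x][y]
        c = CARRY_TABLE[x][y]
        if carry:
            # Fold in the incoming carry with a second lookup (digit + 1).
            c |= CARRY_TABLE[s][1]
            s = ADD_TABLE[s][1]
        out.append(s)
        carry = c
    if carry:
        out.append(1)
    return out
```

For example, 58 + 67 is `add_digits([8, 5], [7, 6])`, which gives the digits of 125 in least-significant-first order.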


@kens: small typo in the article:

"lacking even a mentioned on Wikipedia"


Thanks, fixed.


𝚂̶𝚘̶ ̶𝚠̶𝚑̶𝚎̶𝚗̶ ̶𝚊̶𝚛̶𝚎̶ ̶𝚢̶𝚘̶𝚞̶ ̶𝚐̶𝚘̶𝚒̶𝚗̶𝚐̶ ̶𝚝̶𝚘̶ ̶𝚠̶𝚛̶𝚒̶𝚝̶𝚎̶ ̶𝚝̶𝚑̶𝚎̶ ̶𝚆̶𝚒̶𝚔̶𝚒̶𝚙̶𝚎̶𝚍̶𝚒̶𝚊̶ ̶𝚙̶𝚊̶𝚐̶𝚎̶,̶ ̶𝚊̶𝚗̶𝚍̶ ̶𝚌̶𝚘̶𝚛̶𝚛̶𝚎̶𝚌̶𝚝̶ ̶𝚝̶𝚑̶𝚎̶ ̶𝚊̶𝚛̶𝚝̶𝚒̶𝚌̶𝚕̶𝚎̶ ̶𝚊̶𝚐̶𝚊̶𝚒̶𝚗̶?̶ ̶:̶)̶

Never mind, someone already created one today (which is not too surprising).


The clock module that they talked about is amazing.

I recommend the article just for that bit.



