And having a CPU with "128-by-64" or "64-by-32" division instruction (such as x86/x64) simplifies the algorithm even further.
Inputs: uint[128] N, uint[128] D
Outputs: uint[128] Q, uint[128] R
such that Q ≤ N, R < D, and N = Q * D + R
Uses: (uint[64] quot, uint[64] rem) hw_divmod(uint[128] dividend, uint[64] divisor)
if D.high = 0:
# long division; quotient can be 128-bit wide, but remainder will fit in 64 bits
uint[192] N' := N ≪ lzcnt(D.low)
uint[64] D' := D ≪ lzcnt(D.low)
uint[64] Q', uint[64] R' := hw_divmod(N'.high ≪ 64 + N'.mid, D')
uint[64] Q'', uint[64] R'' := hw_divmod(R' ≪ 64 + N'.low, D')
Q := Q' ≪ 64 + Q''
R := R'' ≫ lzcnt(D.low)
else:
# Quotient will fit in 64 bits, but remainder could be 128-bits wide
uint[192] N' := N ≪ lzcnt(D.high)
uint[128] D' := D ≪ lzcnt(D.high)
uint[64] Q', uint[64] R' := hw_divmod(N'.high ≪ 64 + N'.mid, D'.high)
uint[128] R'' := R' ≪ 64 + N'.low
uint[128] T := Q' * D'.low
if R'' < T:
Q := Q' - 1
R := D' + R'' - T
else:
Q := Q'
R := R'' - T
R := R ≫ lzcnt(D.high)
It's pseudocode. A mix of Python's block structure, C's style of variable introduction, and Pascal's use of ":=" for assignment and "=" for equality. Oh, and assigning to output variables instead of explicit return is from Go, although the tradition is much older, of course.
As for using "'" in variable names, can't recall from the top of my head what language it's from (I don't think it's Lisp), but something does that.
Nice, that's what i noticed; I rather like its clean structure.
But since it was put together by you/not-you, does it have a name and is it used as a Specification/IR/DSL language somewhere (since you gave the algorithm in it) ?