Show HN: Local LLM Notepad – run a GPT-style model from a USB stick (github.com/runzhouye)
40 points by davidye324 8 months ago | 9 comments
What it is: A single 45 MB Windows .exe that embeds llama.cpp and a minimal Tk UI. Copy it (plus any .gguf model) to a flash drive, double-click on any Windows PC, and you're chatting with an LLM. No admin rights, cloud access, or network required.

Why I built it: Existing “local LLM” GUIs assume you can pip install, pass long CLI flags, or download GBs of extras.

I wanted something my less-technical colleagues could run during a client visit by literally plugging in a USB drive.

How it works: A PyInstaller one-file build bundles the Python runtime, llama_cpp_python, and the UI into a single PE.

On first launch, it memory-maps the .gguf; subsequent prompts stream at ~20 tok/s on an i7-10750H with gemma-3-1b-it-Q4_K_M.gguf (0.8 GB).
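The memory-mapping idea can be sketched with the stdlib alone. GGUF files open with a 4-byte "GGUF" magic and a little-endian uint32 version, so peeking at the header never faults in the remaining gigabytes (the helper name is mine, not from the project):

```python
import mmap
import struct

def gguf_header(path):
    # Memory-map the file read-only: the OS pages bytes in on demand,
    # so inspecting a 0.8 GB .gguf costs almost nothing up front.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            magic = bytes(mm[:4])                    # b"GGUF" for valid files
            version = struct.unpack("<I", mm[4:8])[0]
    return magic, version
```

llama.cpp applies the same trick to the weight tensors themselves (its mmap mode), which is why a second prompt starts fast: the pages are already warm in the OS cache.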

A tick-driven render loop keeps the UI responsive while llama.cpp crunches.
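The tick pattern is the standard Tk trick: a worker thread pushes generated tokens into a queue.Queue, and the UI thread polls it on a timer instead of blocking on the model. A minimal sketch under that assumption (function names are mine; in real Tk code, `schedule` would be `root.after`):

```python
import queue

def drain(q, max_items=50):
    # Pull at most max_items tokens per tick so each tick stays short
    # and the event loop can keep servicing clicks and redraws.
    out = []
    try:
        for _ in range(max_items):
            out.append(q.get_nowait())
    except queue.Empty:
        pass
    return out

def make_tick(q, render, schedule, interval_ms=30):
    # schedule(ms, fn) re-arms the timer; with Tk this is root.after.
    def tick():
        for tok in drain(q):
            render(tok)
        schedule(interval_ms, tick)
    return tick
```

Because only the UI thread touches the widgets, this also sidesteps Tk's thread-safety rules: the worker never calls into tkinter at all.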

A parser bold-underlines every token that originated in the prompt; Ctrl+click pops a “source viewer” to trace facts. (Helps spot hallucinations fast.)
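The highlighting pass reduces to set membership: index the prompt's words once, then tag each output word by whether it appeared in the prompt. A word-level approximation (the actual tool works on tokens, and all names here are mine):

```python
import re

def prompt_index(prompt):
    # Lowercased word set: the "did this come from the prompt?" lookup.
    return set(re.findall(r"\w+", prompt.lower()))

def mark_output(output, index):
    # Pair each output word with True if it is grounded in the prompt;
    # the UI would bold-underline the True ones and wire Ctrl+click
    # to jump back to where they occur in the prompt.
    return [(w, w.lower() in index) for w in re.findall(r"\w+", output)]
```

For example, `mark_output("Paris is in France", prompt_index("Tell me about France and Paris"))` tags "Paris" and "France" as sourced and leaves "is" and "in" unmarked.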



> walk up to any computer

Windows users seem to think their OS is ubiquitous. But in fact for most hackers reading this site, using Windows is a huge step backwards in productivity and capability.


However, the facts say otherwise: Windows sits at 70%+ desktop share globally versus 4.1% for Linux. https://gs.statcounter.com/os-market-share/desktop/worldwide


> But in fact for most hackers reading this site

https://survey.stackoverflow.co/2024/technology#1-operating-...


idk... I gave up after years of trying to switch to Linux as my main OS, given the obvious difference in stability, support, ecosystem, and... yes, even responsiveness in many apps.


Surely you're hinting at Linux, in which case this runs fine with WINE


Why not llamafile? Runs on everything from toothbrushes to toasters...


Seconded for llamafile; here's a link for reference: https://github.com/Mozilla-Ocho/llamafile . It works on all major platforms, and its tooling makes it easy to create new llamafiles from new models. The only caveat is Windows, which caps executables at 4 GB, so there you run a llamafile launcher alongside the .gguf file itself. That approach works everywhere anyway.


Interesting, will definitely try it. What performance can be expected? What other models perform OK with this?


Wonder if you can use/interface with those Coral accelerator boards.



