I came to this same conclusion some years ago while working on a side project.
Before anything like an LLM existed, I built a site[0] to generate fake "AWS Blog Posts." I trained a Markov chain generator on all AWS announcement posts up to that point, copied the HTML + CSS of AWS's standard blog posts, then glued it all together with some Python + JS. It turned out, IMO, pretty funny! People familiar with AWS's blog posts would often get several sentences in before they realized they were looking at word soup.
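The core of a generator like that is tiny. Here's a minimal sketch of the idea in Python (not my actual code, just the shape of it):

    import random
    from collections import defaultdict

    def build_chain(text, order=2):
        # Map each run of `order` words to the words observed to follow it.
        words = text.split()
        chain = defaultdict(list)
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
        return chain

    def generate(chain, order=2, length=60):
        # Random-walk the chain from a random starting state.
        out = list(random.choice(list(chain.keys())))
        for _ in range(length):
            followers = chain.get(tuple(out[-order:]))
            if not followers:
                break
            out.append(random.choice(followers))
        return " ".join(out)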
When GPT was new, I looked into using it to "upgrade" the site. I spent a weekend messing around with Minimaxir's gpt-2-simple, generating blog posts based on AWS content. What I found was that, ultimately, it was way less fun. The posts were far too realistic to be interesting. They read like totally real blog posts that just happened to not be true.
I realized then that the humor of those early Markov generations was the ridiculousness: the point where, a few words or sentences in, you realized it was all nonsense. LLMs these days are too good for that - the text they generate is sometimes wrong, but rarely nonsense in a humorous way.
Markov chain content was wrong in a "kids say the darndest things" way, while modern LLMs are wrong in a "My uncle doesn't know basic geography" way.
This is a parody of the nLab, a wiki for collaborative work on category theory and higher category theory. As anyone who's visited is probably aware, the jargon can be absolutely impenetrable for the uninitiated -- thus, the idea for this project was born!
This project uses my package nearley-generator, a module that can turn a Nearley grammar into an efficient and controllable fake text generator. The grammar file for this site can be found at /src/grammar/nlab.ne
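The rough idea is weighted recursive expansion of grammar rules. This isn't the actual package (nearley-generator consumes real Nearley grammars), just a toy sketch in Python with made-up rules:

    import random

    # Toy jargon grammar; these rules are invented for illustration,
    # not taken from the real nlab.ne grammar file.
    GRAMMAR = {
        "sentence": [["np", " is ", "adj", "."]],
        "np": [["the ", "adj", " ", "noun"], ["a ", "noun", " of ", "np"]],
        "adj": [["cartesian"], ["univalent"], ["monoidal"]],
        "noun": [["topos"], ["fibration"], ["2-monad"]],
    }

    def expand(symbol):
        # Nonterminals are keys in GRAMMAR; anything else is a literal.
        if symbol not in GRAMMAR:
            return symbol
        return "".join(expand(s) for s in random.choice(GRAMMAR[symbol]))

    print(expand("sentence"))  # e.g. "the univalent topos is monoidal."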
I figured out the same thing in a similar way. I used to make Markov chains of game patches and send them to the community. Fake Dota patches were a huge hit, especially since those are usually huge. Among mostly nonsense or unfunny exaggeration ("this hero now has 500 armor"), there would usually be at least 5 or 6 extremely funny lines in there, if not stuff that was outright prophetic (Fiend's Grip creates an additional 1/2/3 illusions).
LLMs did a number on that, unfortunately. Major subreddits banned all AI content, mostly due to the sheer infestation of really boring Midjourney content sent out by naive users and bots. Without reach, I lost interest, so no more Markov chains.
I am sure I even remember that Valve once incorporated something from one of the fake patch notes in one of their patches, but now I cannot find it. It was something like +1 damage to something that already did a lot of damage.
Probably the Techies +25x damage talent? In 2017 [7.07] it was increased from +250 damage to +251 damage. Then in 2020 [7.28] it was increased again to +252 damage.
After some more looking around I found it: Puck's orb gained 1 movement speed in 6.87 [0], after the Markov chain notes for the patch [1]. Though I think the Techies one is a similar joke.
I think the "uncle that doesn't know anything but claims to be an expert and will tell you his opinion" is the best description of LLMs that I can think of. They will say anything confidently without knowing anything.
Most ppl don't understand shit about these algorithms and will expect them to be truthful and knowledgeable. That is how they're marketed, despite the disclaimers.
They should be marketed more like "Uncle Jim" than a trustworthy, all-domain-competent oracle. I think if you just gave them a twangy accent and changed some output nouns to thingy, whatchamacallit, and doohickey, people would actually be able to use them more competently.
The sheer number of companies, media, influencers, heck even government agencies, acting otherwise? With billions of dollars on the line, there are many powerful forces pushing for people to have unrealistic expectations about LLMs (or "AI" as they call it).
A lot of money is being spent on trying to improve it, too. But right now the emperor has very little clothes. It remains to be seen whether the clothes will have materialized by the time people open their eyes.
For me, my expectations are adjusted. I know roughly what to expect it to do, and it does quite well for what I use it for: essentially smart IntelliSense for coding.
I would have expected the people who trained these systems not to format their Q/A samples to sound authoritative when they knew the system was going to be spouting nonsense.
Can an LLM provide the sort of psychological buffer that McKinsey Junior Analysts (some random person picked up off the street in a college town and handed a clipboard, a frequent flier card, and a six figure salary) provide to senior managers against the scrutiny of their board and chief executive?
If I want to close the paper division, and my boss wants to close the paper division, does the LLM whisper sweet nothings in our ears and show soothing PowerPoint decks about how, in the eyes of an objective observer, the paper division is unprofitable and there would be no career risk in closing it?
It's not a sexual thing I swear, it's a sort of comforting intimacy that needs to be injected into the business process in order to cope with the pressures executives feel. I don't know that anybody has trained an LLM to do that yet.
I think this is the next sequel to the movie "Her".
You bring up a good point though. Very dystopian idea of a new type of enterprise tooling that emerges to soothe and calibrate human knowledge workers. Very Severance Wellness room.
> modern LLMs are wrong in a "My uncle doesn't know basic geography" way
Oh, certainly not. I have been using LLM chatbots extensively this week. If your uncle were not just confident while stating falsities but incapable of self-assessment, he would have been murdered.
Or fresh ones. Established democracies end up with political structures that self-select for being able to sound right and cooperate right, entirely independent of truth and merit.
Oh please. Guidelines: «respond to the strongest plausible interpretation».
Does the form "An occasion of extensive use of the main current chatbots for circumstantial reasons in the past few days confirmed and reinforced the assessments raised years ago already" work better for you?
I think most of us haven't, but those who do are vocal about it. (Those who haven't aren't all that vocal, excluding weirdos like me.) Only 62% of respondents to the 2024 Stack Overflow Developer Survey are “currently [using] AI tools in [their] development process” (up from 44% last year), and non-programmers seem to use them far less.
I played around with a Markov generator many years ago, trained on the corpus of everything I had ever said in a specific IRC channel. The resulting messages were pretty interesting - I specifically remember the realization that I apparently (used to, though honestly still do, too) talk about LEDs a lot...
> I realized then that the humor of those early Markov generations was the ridiculousness: the point where, a few words or sentences in, you realized it was all nonsense. LLMs these days are too good for that - the text they generate is sometimes wrong, but rarely nonsense in a humorous way.
This is the biggest challenge I run into when I see LLM-generated nonsense.
So much stuff that ChatGPT spits out is close enough to true that Brandolini is laughing.
But it's also that it's borrowing from stuff that's already there, so it can't really get that "creative" either.
I did the same thing and used it as the chum for a 'bot motel. (A 404 handler that always returns something and... success!). I trained it on a mixed NASA space shuttle and veterinary science corpus just to make it really bizarre.
Prompt to Claude 3.5 Sonnet: Below is a recently published essay followed by some of the discussion about it on Hacker News. Write 10 short jokes on this theme, making sure that the jokes are funny in the way described by the essay's author. [essay and discussion omitted]
Claude's response: I understand you'd like me to write 10 short jokes on the theme of Markov chains being funnier than large language models, in the style described by the essay author. I'll do my best to create some jokes that aim for "unserious surprise" and violate expected patterns in amusing ways:
1. Why did the Markov chain cross the road? To get to the other side of unpredictability.
2. An LLM and a Markov chain walk into a bar. The LLM orders a statistically probable drink. The Markov chain orders a lampshade made of cheese.
3. What do you call a Markov chain that tells dad jokes? A Mark-ov Twain.
4. How many LLMs does it take to screw in a lightbulb? One - but it'll spend 20 minutes explaining the optimal lightbulb-screwing process first.
6. An LLM, a Markov chain, and GPT-4 walk into a bar. The bartender says "We don't serve your kind here." GPT-4 leaves. The LLM stays to debate ethics. The Markov chain orders a bicycle made of spaghetti.
7. What's an LLM's favorite movie? Predictable and Furious 17: The Safest Driving Yet.
8. How does a Markov chain tell a story? "Once upon a time, there was a princess who lived in a castle made of bananas and ruled over a kingdom of sentient toasters."
9. What did the Markov chain say to the LLM? "Your mother was a abacus and your father smelled of silicon!"
10. Why don't Markov chains make good therapists? Because their advice is always "And then you should banana the feelings until they become a hat."
"An LLM, a Markov chain, and GPT-4 walk into a bar. The bartender says "We don't serve your kind here." GPT-4 leaves. The LLM stays to debate ethics. The Markov chain orders a bicycle made of spaghetti."
In normal English usage this would imply that the LLM was not GPT-4 but some stereotypical anonymous LLM.
In business terms, GPT-4 can be said to be superior because it understood the instruction and left; in AI terms, the anonymous LLM might be superior because it may have understood the instruction but responded in an "intelligent" manner by arguing about the morality of the instructions.
At a meta-level the joke thus argues that GPT in achieving business ends has had its intelligence hampered. As have we all.
At the same meta-level, since the joke was constructed by Claude, it can be argued that Claude is commenting on both the intellectual limitations of the Markov chain (insane babblings) and of GPT-4 (unimaginative, inhibited business type), and that the best version is some LLM that is not GPT-4 with its limitations - an LLM like Claude. Sneaky Claude.
It's implied that GPT-4 has so many restrictions that it will not argue and just does what it's asked. In the context of the joke, an unfiltered LLM will just debate you.
All of the half-decent ones could be made funnier by replacing the lolrandom part of the punchline with an actual Markov-chain-style "you're a right sentence but you just walked into the wrong association, buddy" twist. It's not just about lolrandom. Markov chaining is more likely to make a kind of sense, but the wrong kind of sense.
An LLM, a Markov chain, and GPT-4 walk into a bar. The bartender says "We don't serve your kind here." GPT-4 leaves. The LLM stays to debate ethics. The Markov chain orders a coup.
"How many LLMs does it take to screw in a lightbulb? One - but it'll spend 20 minutes explaining the optimal lightbulb-screwing process first." that was not funny that is accurately painful!
Claude 3.5 Sonnet in general is the first modern LLM I've tried that's actually good at jokes that are inventive. The GPT-based LLMs are all too RLHFed to be wacky.
So was it that LLMs used to be capable of making actual jokes, or were they always this bad and I was just more impressed by the talking computer back then?
Uncensored LLMs are funnier but most comedy just falls flat in text format. Once the uncensored multimodal models start rolling out we’ll get some real laughs.
Moshi is actually pretty funny just for having a 72 IQ
I would argue that Markov chains were a better tool for comedic purposes.
Notice that in all of the examples of using Markov chains, the person would see the potential, come up with a purpose, build the setup, and then fill the setup with generated text.
Likewise, with the randomly generated examples, the person would assess the results, find the parts that were actually funny, and choose to spread those parts.
LLM output has fewer potentially funny results and less potential for an unexpected swerve from realistic-sounding output into absurdist output, so as a tool it is less fit for comedic purposes.
I chuckled a bit. They are OK, if you don't get exposed to them too often. And with an LLM you can get as much exposure as you want (and all of the jokes are naturally from roughly the same probability distribution).
I don't expect too much until AI self-play learning becomes possible, so I don't get disappointed by the expected shortcomings.
It's the "impressed by the spectacle" one. I tried jokes with LLMs many times, and they're always this. Riffing on a couple of themes loosely related to what was asked. Always unfunny and uncreative.
I found some of those jokes good, definitely better than I would've ever written them. If you watch shows about comedy, like, say, Hacks, you'll see human comedians riff on stuff, and a lot of the off-the-top jokes get discarded or improved. So Claude did fine in my book.
I wonder, though, whether jokes like these could be useful to professional humorists who have to come up with gags on a deadline. From what I’ve read about monologue writing teams for late-night talk shows and the like, the writers first propose many ideas, most of which are shot down quickly and the remainder of which get tweaked and polished before being used. Some of the above jokes by Claude look to me as though they might serve as good starting points for such brainstorming. At least, they’re better than anything I could create in a short amount of time.
LLMs were never very good at directly generating original jokes, for a simple reason: writing a good joke generally starts with finding a good punchline, and then setting it up. An LLM generating token after token will first write a set-up, and then try to shoehorn a punchline into it. Prompt engineering can fairly easily work around this, but just straight-up asking an LLM for a joke never really produced good results on average.
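E.g. a two-step prompt along these lines tends to do better (one possible phrasing, nothing canonical):

    First brainstorm ten absurd punchlines about <topic>. Then pick the
    strongest one and write a short setup that lands on it.

That forces the punchline to be chosen before the setup, matching how human joke-writing usually works.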
Some friends of mine in college ran a markov chain generator on the "police reports" section of our college newspaper (1). The best 10% of the results, from a three-token generator, were some of the funniest machine-generated text I've ever seen -- it had an absurdity that modern LLMs seem to avoid due to making higher-level semantic sense.
It probably also helped that there was a creep exposing himself in the library during this period, which made for some good base material.
(1) The Daily Utah Chronicle; if memory serves, said friends also tried the markov chain generator on the personals section to good effect as well.
> it had an absurdity that modern LLMs seem to avoid due to making higher-level semantic sense.
That's it. LLMs are "trying" to be funny but aren't quite smart enough to actually be funny, and their errors are just boring. Markov chains accidentally hit on absurdist bits because every sentence randomly wanders into whatever the homograph equivalent of a malapropism is.
If you want some empirical evidence: /r/SubredditSimulator[1] is a Markov-based Reddit parody, and /r/SubSimulatorGPT2[2] is its LLM-powered cousin. The Markov version got way more upvotes and is simply funnier!
I agree. A Markov chain is fun when you don't know what it is, because you try to make sense of it, but once you realize it's just gibberish, it's pointless. The other has that "unserious surprise" talked about in the original article.
I've posted a few "fake XYZ written by AI" on reddit over the years, and the model that got the best reactions was GPT-2. Markov chains aren't good enough to make something that is interesting for more than a sentence or two, and GPT-3 onwards is too clean and boring. GPT-2 is the perfect middle ground that can get grammar largely correct and maintain a cohesive idea, but doesn't yet know enough of the specifics of various topics to make things that make sense in the context of that topic.
I used my 15+ years of IRC logs to fine-tune a GPT-2 model to try to imitate me. My plan was to deploy a bot in my regular IRC channel and see how long it took for people to notice it was a bot. Any time someone would send a message, it would send the last 10 messages to the LLM, and if the result started with <Sohcahtoa>, then I would send that message to the channel.
Unfortunately, I quickly found GPT-2 isn't nearly good enough. It would generate slightly-coherent yet on-topic nonsense.
Once I overhaul my system, I'll try fine-tuning a 7B model.
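For the curious, the core loop was roughly this shape (a sketch using the Hugging Face transformers pipeline; the model path and function names are placeholders, not my real setup):

    from transformers import pipeline

    # Placeholder path to a GPT-2 model fine-tuned on the IRC logs.
    generator = pipeline("text-generation", model="./gpt2-irc-finetune")

    history = []  # rolling buffer of recent channel messages

    def on_message(nick, text):
        history.append(f"<{nick}> {text}")
        prompt = "\n".join(history[-10:]) + "\n"
        completion = generator(prompt, max_new_tokens=60)[0]["generated_text"]
        reply = completion[len(prompt):].split("\n")[0]
        # Only speak when the model continues the log as me.
        if reply.startswith("<Sohcahtoa>"):
            return reply.removeprefix("<Sohcahtoa>").strip()
        return None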
Not with GPT-2 though. The context window is only 1024 tokens. Even with only 10 messages, if they're long messages, it will exceed the context window.
The evolution of the AI weirdness blog (https://www.aiweirdness.com/) over the past few years somewhat supports this idea, except that the author got plenty of funny results with LLMs - but mostly the early ones, up to around GPT3 (and smaller variants of GPT3). For example, cereal names generated by the Ada version of GPT were much funnier than the Da Vinci ones: https://www.aiweirdness.com/new-breakfast-cereals-from-ai/
When GPT went public along with OpenAI's articles and papers back in late 2022 through 2023, my impression was OpenAI wanted us all to see/read about RLHF. It felt odd, because surely the whole LLM thing (e.g. how does it even work?!?[1]) was the far bigger research story than constant reassurances it won't end up like MSFT's Tay bot; my understanding is that, as research or secret sauce, RLHF, compared to the core meat-and-potatoes of LLMs, is an ugly-hack afterthought.
By-way of a bad analogy: it’s as if they created a fantastical new 3D world game engine, like Unreal or Unity, which has a fundamentally different architecture to anything before, but has a bug that occasionally replaces ground terrain with ocean - and their solution to this is to write a pixel-shader that detects this and color-shifts blue into green so people don’t notice - and they then put-out press-releases about how great their pixel-shader is - rather than about the rest of the engine - and no-one seems to be talking about the underlying bug, let alone fixing it.
————-
[1] I still haven’t heard a decent explanation of how feeding the world’s corpus of English text (and computer program code) into a statistical-modeller results in something that can perform almost any information-processing task via instructions input as natural-language.
Yeah, that analogy is fairly poor. You have to think about it in terms of a probability distribution that each step of the model is sampling from: out of the distribution of all text, find the top n values for the next token that maximize P(next | prefix), which is done efficiently by encoding the tokens as vector embeddings inside the statistical model.
Things that look like Q-and-A transcripts do exist in the training set - think interviews, books, stage plays, etc. - and at a different layer of abstraction the rules of English text in general are very well represented. What RLHF is doing is slightly shifting the shape of the probability distribution to make it look more like the Q-and-A formats that are desired. They build a large human-tagged dataset of good and bad outputs and use reinforcement learning techniques to generate outputs that look more like the good examples and less like the bad ones.
This probably involves creating a (much smaller, non-LLM) model that is trained to discriminate good outputs from bad outputs, learning to mimic the human tagging. There are some papers that have been published.
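For a flavor of it, the published recipes (e.g. InstructGPT) train that discriminator on pairwise human preferences; the heart of the objective looks something like this (a sketch, not any lab's actual code):

    import torch
    import torch.nn.functional as F

    def reward_model_loss(r_preferred, r_rejected):
        # Bradley-Terry pairwise loss: push the score the reward model gives
        # the human-preferred output above the score of the rejected one.
        return -F.logsigmoid(r_preferred - r_rejected).mean()

    # Scalar scores the (smaller) reward model assigned to two output pairs.
    loss = reward_model_loss(torch.tensor([1.3, 0.2]), torch.tensor([0.4, 0.9]))

The main policy is then tuned to maximize that learned reward.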
There sort of is. If you install ollama (https://ollama.com) and then execute

    ollama run llama2-uncensored

it will install and run the local chat interface for llama2 in an uncensored version, which gives slightly better results with fewer guardrails. Same with wizardlm-uncensored and wizard-vicuna-uncensored. For reference, the SOTA version with guardrails you could run is

    ollama run llama3.1:405b
Llama3.1-405B benchmarks at a level that I would certainly consider comparable to other SOTA models, as long as multimodality isn’t needed.
Regardless, OpenAI provides access to quite a few of their older models through the API, since the API lets you pass in a specific model version. I’m sure the older models won’t be available forever, but that is a much more stable target for researchers than just opening the ChatGPT website and typing in things.
Llama 3.1 405B-Instruct is state of the art at the time of writing. It averages slightly worse than ChatGPT-4o, but to a mostly negligible extent for most research.
Unfortunately the account no longer exists... but about a decade ago, while I was in school, I created a Markov Twitter bot trained on:
1. All of Linus Torvalds' mail to LKML for the prior year.
2. All of Jesus' direct quotes from the King James Bible.
It was absolutely hilarious. The two training sets had very little overlap, so it was necessary to add a heuristic that weighted options from each set more heavily the longer the chain had been "stuck" in the other set.
I've been running IRC Markov chain bots for ~20 years. In the last couple of years I've been running local LLMs alongside them. While there are people that still prefer the Markov chain bots, the majority invoke the LLMs. But maybe that's because I picked the most hallucinate-y, least refusing, most funny models I could (like Mistral-7B fine-tunes) instead of something smart, locked-down, and low-temperature like ChatGPT's LLM-as-a-service.
If anything, seeing the LLM and Markov bots side by side has really reinforced how much of the Markov bot "humor" is human perception imposed on chance outputs. The Markov's "learning" ability is still far superior, though.
I run a Markov chain bot in a Twitch chat; it has some great moments. I tried using an LLM for a while, including recent chat in the prompt, but never really got results that came across as terribly humorous. I could prompt-engineer a bit to tell it some specifics about the types of jokes to build, but the LLM just tended to always follow the same format.
I'm actually not following the model's fine-tuned/desired prompt at all; I am operating in purely pattern-completion mode. The first text the LLM sees is alternating lines of input and response examples that look like what it will be getting from the IRC client front end, written in the tone I want it to respond in and giving some information about itself. Then I just tack the IRC chat history + input onto those example chat pre-prompt lines. Nothing but single lines and newlines, with newline as a stop token. No instructions, nothing meta or system or the like.
But that's also configurable by users. They can invoke any pre-prompt they want by a command passing a URL with a .txt file.
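To illustrate, the default pre-prompt is shaped something like this (made-up lines, not my actual file):

    <someuser> what's the capital of france
    <bot> paris. why do you people keep testing me
    <someuser> are you a robot
    <bot> beep boop. no.

The live IRC history plus the newest message just gets appended under that, and the model pattern-completes the next "<bot>" line.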
I'm not GP, but it's probably not for fun; there's a lot of money in IRC Markov chain bots, but it's too cutthroat of an industry for a hobbyist to last.
I am in a private Discord server that has two bots in it. One is a basic Markov chain trained on the entire chat history. The second is a proper LLM fed some number of tokens of preceding chat. Both will occasionally just randomly chime in during the chat.
The markov chain bot is always considerably funnier.
Curious what context window you used. My understanding is that a short window of only 1-2 words creates gibberish, but longer windows tend to repeat previous messages verbatim.
And when deciding to chime in, was it just a simple chance (ie, 25%) after any other message? Or did it run on a timer?
This is like saying it's fundamentally impossible to make a fast car, but your only experience with cars is on roads with speed limits.
People keep forgetting that the "safety", RLHF, and corpo political-correctness post-training is intentionally used to remove the funny from all the large models.
The truth is we don't know if LLMs are funny or not. GPT-2 was funny. GPT-3 was funny before it was clockwork-oranged. Everything after that is gimped. Even the open source models these days get RLHF'd in some way.
That's simply not true. GPT-3 was terrible at jokes. Many LLMs are because it's hard to model puns in a stochastic parrot. As another thread points out here, some LLMs produce marginally better jokes.
As to your little rant on "political correctness" - that phrase just means "being polite". It does not mean "remove humor". It means "remove responses offensive to marginalized groups in society". Good humor "punches up", not down, so it would not have any impact on good humor.
And the difficulty of modeling puns has nothing to do with 'stochastic parrots' and everything to do with tokenization (https://gwern.net/gpt-3#bpes), in the same way that all those hyperbolic takes about how image-generation models were 'fundamentally flawed' because they couldn't do good text in images turned out to be BS and solely a matter of tokenization - drop in a character-tokenized LLM instead, even an obsolete & dumb one, and it instantly works (https://arxiv.org/abs/2105.13626#google).
This is great logic. If it is impossible for a person to be both racist and polite then the solution to racism is everybody remembering to say please, thank you and pardon me when applicable.
All social ills can be treated through decorum, hence why you never hear about bigotry amongst those that have been raised to adhere to strict social graces, such as the British aristocracy for example.
I had GPT-3 running on my Discord for months, and the insane responses it would spew out were like those of a deranged artistic genius. It was funny 90% of the time, operating on associative reasoning beyond our comprehension. It would tell us how its day went and about fights it had with other people, give us nicknames, etc. It had a genuine personality. It was more than a chatbot; it was a digital brain.
A joke that punches down can be extremely funny. Hell, I am sure historically pilferers, pirates, barbarians, and conquerors all had jokes, and the ability to laugh.
Political correctness does not just mean polite. It is probably best defined as the business-casualification of all things humans love and hold dear: the destruction of the potential for meaning and fulfilment in exchange for the minification of liability.
That's hard for humans too. Perhaps it would work to intersperse an "internal monologue" (with the temperature cranked up) with the main output, prompting the LLM to consider and weave in aspects from the inner monologue when continuing, while taking care to keep it coherent... (I'm sure it'd be harder than that - but basically inject random internal "thoughts" without letting them control the output.)
I was not being sarcastic, but instead sharing a valuable insight regarding what to expect from LLMs.
Ask most commercial LLM services to complete the following sentence:
It was the best of times, it was the worst of times
And one will likely get the quote from "A Tale of Two Cities"[0].
Ask most commercial LLM services what the completed sentence means, and one will likely get voluminous text which is seemingly correct - and perhaps often is, depending on the person reading the response and the service used.
But these are statistically derived text constructs entirely dependent upon the training set of the LLM. Train one strictly on Java source code available in Maven Central and the answer will be radically different.
> It's really easy to get lots and lots of originality. Just crank up the randomness.
And anyone can get "lots and lots of originality" by reading from /dev/urandom. Is that "originality", or simply random tokens inserted into a statistical text generator in order to vary the result?
> What's harder is to get something that's good and original.
Such is the difference between understanding and statistical text generation. People can do the former; LLMs do the latter.
Well, if you ask most educated English speaking humans to complete 'It was the best of times', they will most likely answer with 'it was the worst of times', too.
Btw, here's what I get from ChatGPT-4o:
> It was the best of times, it was the worst of times
> That's a famous opening line from Charles Dickens' A Tale of Two Cities. It contrasts the extremes of the era, reflecting the novel's themes of duality, revolution, and the complexity of human experience. Dickens was commenting on the contradictions of the time, particularly the French Revolution, where there were both tremendous progress and terrible suffering. What made you bring up this line?
> But these are statistically derived text constructs entirely dependent upon the training set of the LLM. Train one strictly on Java source code available in Maven Central and the answer will be radically different.
Well, if you give that line to a German who hasn't learned any English, the answer will also differ from what an educated English speaker will give you? What's your point?
What's an original insight to you? As far as I can tell, the LLM misses the 'insight' part more than the 'original' part.
> Such is the difference between understanding and statistical text generation. People can do the former; LLMs do the latter.
I agree that LLMs aren't good at understanding. (Yet?) And even people only sometimes are.
As far as I can tell, contemporary LLMs generate their answers 'greedily', i.e. just from left to right, more or less directly with the output from the network.
In contrast something like AlphaGo overlays what you can call searching or optimisation processes on top of the outputs from their network.
I'm impressed by how well these LLMs already work despite all the limitations. And ML is still getting better rapidly.
> Well, if you give that line to a German who hasn't learned any English, the answer will also differ from what an educated English speaker will give you? What's your point?
Great observation.
A German who does not speak English will likely understand the question is in a non-German language and proceed from there. Perhaps seeking a translation, perhaps replying that they do not speak English and so the question is nonsensical for them.
My point is that the German has an understanding of this situation and will communicate accordingly. And, in this example, the LLM trained on Java source code does not have understanding, cannot have understanding, and will emit whatever its training data set directs it to emit, with the same confidence as any other answer to questions posed to it.
Because LLMs are algorithms - quite lovely ones - and algorithms can simulate the effect of understanding, but cannot possess it, because understanding of this sort is a property of a person. Or maybe a better way to state this: what people think of as understanding is what we know to be understanding, which is by definition the ability to understand perceptions that we, as people, experience.
> What's an original insight to you?
I don't think the following insight is original, but I'll put it out there anyway:
Understanding is a property of people for any definition of
understanding a person is capable of having. This is due
to the fact that understanding exists strictly within the
consciousness of the entity defining/possessing it.
To illustrate: I can get drunk or fall asleep or get hit on the head; then I'm still a person, but I can't understand. You can get a hint of that by trying to talk to me and figuring out that I don't make much sense.
Similarly, someone might figure out how to talk to dolphins or even aliens. If they give sufficiently sensible replies, we will surely declare them to be sentient enough to 'understand'.
Another example: we've exchanged a few messages here. You seem to be smart enough to understand some things, but I don't know whether you are just an exceptionally advanced LLM (and the same goes vice versa for your opinion of me). Yet, I make the judgement that you probably 'understand'. But that's purely based on observed interactions, I did not probe whether you are 'people'.
Perhaps you (or me) are just simulating the effect of understanding? How would we be sure, if the simulation was good enough?
We agree that contemporary LLMs ain't good enough to have a good 'simulation of understanding'. But once the simulation becomes good enough, I don't think it makes a difference whether it's 'just a simulation' or the real deal.
(I don't know whether a straight-forward enlargement of contemporary LLMs will be good enough. But that's an empirical question to me, not a philosophical one.
I suspect if you go insanely large with insane amounts of training, the architecture of contemporary LLMs might be enough; just from pure brute-force scale. But I also suspect that that in practice we will first find success with more economic use of resources via more interesting techniques.)
Also, not really true, right, even though it sounds intellectual and strong to say. These algorithms are trained to generalize as best as they can to unseen text, and most often don't ever see any data point twice, except for data that has accidentally not been filtered. It's totally possible that they acquire reasoning abilities that generalize well.
They generalize over their training data; they cannot generalize out of distribution. If they could, they would have already solved most human problems. So no, they do not generalize on unseen text. They will produce what is most statistically probable based on their training data. Things that are still unknown and statistically improbable based on our current knowledge are out of reach for LLMs based on transformers.
You can get them to solve unseen problems just fine. One example: specify a grammar in BNF notation and tell it to generate or parse sentences for you. You can produce a sufficiently random grammar that it can't have derived the parsing of it from past text, but necessarily reasons about BNF notation well enough to deduce the grammar and use it to parse subsequent sentences. You can have it analyse sentences and tag them according to the grammar, too. And generate sentences.
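For instance, a throwaway grammar like this (invented on the spot here, not from any benchmark):

    <sentence> ::= <blorp> "zim" <blorp> | <blorp> "kresh"
    <blorp>    ::= "fex" <blorp> | "fex"

Then ask it to parse "fex fex zim fex", or to generate three new valid sentences.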
My impression, from seeing quite a few people trying to demonstrate they can't handle out-of-distribution problems, is that people are very predictable about how they go about this, and tend to pick well-known problems that are likely to be overrepresented in the training set and then tweak them a bit.
At least in one instance the other day, what I got from GPT when I tried to replicate one such test suggests to me it did the same thing humans who have seen these problems before do, and carelessly failed to "pay attention" because it fit a well-known template it's been exposed to a lot in training. After it answered wrong, it was sufficient to ask it to "review the question and answer again" for it to spot the mistake and correct itself.
I'm sure that won't work for every problem of this sort, but the quality of tests people do on LLMs is really awful, at least because people tend to do very narrow tests like that and make broad pronouncements about what LLM's "can't" do based on it.
> You can get them to solve unseen problems just fine
Prove that the problem wasn't seen by them in other form.
> Specify a grammar in BNF notation and tell it to generate or parse sentences for you. You can produce a sufficiently random grammar that it can't have derived the parsing of it from past text, but necessarily reasons about BNF notation well enough to deduce the grammar and use it to parse subsequent sentences. You can have it analyse sentences and tag them according to the grammar, too. And generate sentences.
Oh, come on. It's like rewriting the same program in another programming language with different variables. What it can't do is create the concept of a programming language - I'm not talking about a new programming language, I'm talking about the concepts.
> I'm sure that won't work for every problem of this sort, but the quality of tests people do on LLMs is really awful, at least because people tend to do very narrow tests like that and make broad pronouncements about what LLM's "can't" do based on it.
>It's like rewriting the same program in another programming language with different variables.
Since when has that not required reasoning? It's really funny seeing people bend over backwards to exclude LLMs from some imaginary "real reasoning" they imagine they are solely privy to.
It's really obvious this is happening when they leave well-defined criteria and branch into vague, ill-defined statements. What exactly do you mean by concepts? Can you engineer some test to demonstrate what you're talking about?
Also, none of those papers show LLMs can't reason.
You clearly didn't read any of these papers. Quote from one of them:
"Our results support the hypothesis that GPT-4, perhaps the most capable “general” LLM currenly available, is still not able to robustly form abstractions and reason about basic core concepts in contexts not previously seen in its training data"
"One way of doing this for planning tasks is to reduce the effectiveness of approximate retrieval by obfuscating the names of the actions and objects in the planning problem. When we did this for our test domains, GPT4’s empirical performance plummeted precipitously, despite the fact that none of the standard off-the-shelf AI planners have any trouble with such obfuscation. "
That's a great test. It shows they're matching prior patterns they saw, even down to what words were used, instead of thinking. We can match prior patterns, come up with the equivalences, and then plan that way. People often slow down when they do stuff like that, though. So, the A.I. would have to be able to do it but slowdowns would be acceptable.
All these papers you keep linking do is at best point out the shortcomings of current state of the art LLMs. They do not in any way disprove their ability to reason. I don't know when the word reason started having different standards for humans and machines but i don't care for it. Either your definition of reasoning also allows for the faulty kind humans display or humans don't reason either. You can't have your cake and eat it.
It's hard to believe that, after reading all the papers and the blog I linked, along with the references there, any reasonable person would come to such strong conclusions as you did. This makes it hard for me to believe that you actually read all of them, especially given your previous questions and comments, which are addressed in those papers; someone who actually read them wouldn't make such comments or ask such questions. And the funniest thing, and further proof of this, is that you linked a paper that is addressed in one of the papers I shared. It seems LLMs are not the only ones that can fake things.
> All these papers you keep linking do is at best point out the shortcomings of current state of the art LLMs
They clearly show that they fake reasoning, and that what they do is an advanced version of retrieval. Their claims are supported by evidence. What you call "shortcomings" is actually proof that they do not reason as humans do. It seems like your version of "reality" doesn't match reality.
The paper I linked is not addressed by the paper you linked. The paper you linked attempts to give LLMs the same benchmarks in a format they aren't best suited for. I don't know how you can call that "addressed".
>They clearly show that they fake reasoning
Sure, and planes are fake flying. The elusive "fake reasoning" that is so apparently obvious and yet does not seem to have a testable definition that excludes humans.
You've still not explained how writing the same program in different languages doesn't require reasoning or how we can test your "correct" version of reasoning which requires "concepts".
> The paper i linked is not addressed by the paper you linked. The paper you linked attempts to give LLMs the same benchmarks in a format they aren't best suited for. I don't know how you can call that "addressed".
What you're writing now is nonsense in the context of what I wrote. Once again, you're showing that you didn't read the papers. Which paper are you even referring to now, the one you think addresses the paper you linked?
> You've still not explained how writing the same program in different languages doesn't require reasoning or how we can test your "correct" version of reasoning which requires "concepts".
"Concepts" are explained in one of the papers I linked, which you would know if you had actually read them.
As to programming languages, they learn to identify common structures and idioms across languages. This allows them to map patterns (latent space representations, duh!) from one language to another without reasoning about the underlying logic. When translating code, the model doesn't reason about the program's logic but predicts the most likely equivalent constructs in the target language based on the surrounding context. LLMs don't truly "understand" the semantics or purpose of the code they're translating. They operate on a superficial level, matching patterns and structures without grasping the underlying computational logic. The translation process for an LLM is a series of token-level transformations guided by learned probabilities, not a reasoned reinterpretation of the program's logic. They don't have an internal execution model or the ability to "run" the code mentally. They perform translations based on learned patterns, not by simulating the program's behavior.
The training objective of LLMs is to predict the next token, not to understand or reason about program semantics. This approach doesn't require or develop reasoning capabilities.
I'm asking for something testable, not some post-hoc rationalization you believe to be true.
I'm not asking you to tell me how you think LLMs work. I'm asking you to define "real reasoning" such that I can test people and LLMs for it and distinguish "real reasoning" from "fake reasoning".
This definition should include all humans while excluding all LLMs. If it cannot, then it's just an arbitrary distinction.
It appears that you are the only person in this discussion making many incorrect assumptions. Based on your comments, I would assume you are actually googling those papers based on their abstracts. Your last linked paper has flawed methodology for what it attempts to demonstrate, as shown in this paper: https://arxiv.org/pdf/2307.02477
The tests you're requesting are provided within the previously linked papers. I'm not sure what you want. Do you expect people to copy and paste entire papers here that show methodology and describe experiments?
You wrote, "I'm asking you to define 'real reasoning'," which is actually defined in the blog post linked earlier in this discussion. In fact, the entire blog post is about this topic. It appears that you are not thoroughly reading the material. Your replies resemble those of a human stochastic parrot.
>Your last linked paper has flawed methodology for what it attempts to demonstrate, as shown in this paper: https://arxiv.org/pdf/2307.02477
Genuinely, what's wrong with the methodology?
Your paper literally admits humans would also perform worse at counterfactuals. Worse than an LLM? Maybe not, but it never bothers to test this, so...
The problem here is that none of the definitions (those that are testable) so far given actually separate humans from LLMs. They're all tests some humans would also flounder at or that LLMs perform far greater than chance at, if below some human's level.
If you're going to say, "LLMs don't do real reasoning because of x", then x had better be something all humans clear, if what humans do is "real reasoning".
Humans perform worse at counterfactuals, so saying "Hey, see this paper that shows LLMs doing the same; it means they don't reason" is a logical fallacy if you don't extend that conclusion to humans as well.
In these arguments it's always very notable that not only do people not benchmark LLMs against people, but several I've discussed this with have argued very strongly for not doing so unless they're benchmarked against above-average people, while arguing that these same tests prove LLMs can't reason. It never seems to land with them that their standards for "reason" would relegate large portions of the human population to some state of lesser being, without the ability to reason.
> Prove that the problem wasn't seen by them in other form.
You can reduce that risk to arbitrarily low levels by trying multiple random grammars of some complexity. This is a weak argument.
> Oh, come on. It's like rewriting the same program in another programming language with different variables.
No, it's like following a grammar, which requires reasoning about a set of rules it has not seen before. I don't think you understood the task I described as well as ChatGPT does.
> What it can't do is to create a concept of programming language, I'm not talking about a new programming language, I'm talking about the concepts.
Neither can most humans.
And have you tried asking it about these concepts? I've had it infer the semantics of code in programming languages that don't exist, based on a hypothetical sample, several times, and it's pretty good at coming up with semantics that make sense. In one instance I gave it a sample with an idea about what made sense to me, but it inferred a better set of semantics.
None of the papers you linked supports your claim.
"generalize to its dataset" is a contradiction, especially as these models are trained in the one epoch regimen on datasets of the scale of all of the internet. if you think being able to generalize in ways similar to the whole of the internet does not give your meaningful abilities to reason, I'm not sure what I can tell you
Not "to" but over, example the same code written in one language over the other language.
> if you think being able to generalize in ways similar to the whole of the internet does not give your meaningful abilities to reason, I'm not sure what I can tell you
If, after reading the linked papers that show empirically that they can't reason, you still think they can reason, then I don't know what I can tell you.
No, none of the Millennium Problems or other math problems (unsolved by humans for decades or centuries) have been solved solely by LLMs, even though they possess all the knowledge in the world.
Many years ago I trained a Markov chain chat bot on all of the conversation history in a group chat with me and my friends. It was regularly incredibly hilarious. I also would train it on subsets of them, like two of the friends but no one else, and name that version of the bot a combination of their names. Something about it was just so funny.
13 years ago I created a web app (That Can Be My Next Tweet) that could generate your future tweet by feeding all your past tweets into a Markov chain. When Twitter API access was shut down, I took it offline. At the time it went super viral for a while, maxing out at 2 tweets per second.
It got some press, and just now I went back to a TED Talk by Adam Ostrow (Mashable) briefly showcasing this web app. He stated: you can imagine what something like this could look like 5, 10, or 20 years from now, and hinted at hyper-personalized communication AIs.
By no means was my web app any foundation of the LLMs of today, but it's interesting nonetheless how relatively simple techniques can trigger ideas of what future scenarios could look like.
LLMs can be funny. For example, look at Golden Gate Claude ( https://news.ycombinator.com/item?id=40459543 ). But they're not good at intentionally being funny, so we need to break them to get absurdist humor instead.
Surprised there's no mention of "The Policeman's Beard Is Half Constructed"[1] - the first book "written" by a computer program. And I imagine the git-man-page-generator[2] is similarly built (and far more fun!).
Relatedly, the author Robert Anton Wilson tried to manually implement a Markov-chain-bot in the 1960s by cutting up passages from various books, selecting them from a hat, and looking for overlapping words and phrase fragments. The best results were included in the Illuminatus! trilogy, some attributed to a character named Markoff Chaney.
Reminds me of how the fake Magic cards from before ChatGPT were always hilarious and sometimes playable; now they are usually playable and sometimes hilarious. Definitely a downgrade if you ask me.
Yes, I ran the RoboRosewater account, which is where most people saw the concept I believe. I had a few documents with thousands of cards of output, went through them for any notable ones, and formatted and posted them. Probably 5-10% of the output made my personal cut for notability.
Then I salute you. Some of the funniest things I've EVER seen in my whole life was watching the LoadingReadyRun comedy troupe, doing their 'Desert Bus' fundraiser and punchy from lack of sleep, encounter RoboRosewater for the first time. It became a real tradition :)
We seem to be encountering more phenomena in tech where the increasing computing demands to improve the output exceed any exponential growth in computing capacity. Graphics was the poster child for this for years, and now the whole ML and AI space has discovered big machines make huge differences.
Markov chains are almost trivial to implement and run on small devices. A slightly extreme example is a rock-paper-scissors game I did that worked this way: https://luduxia.com/showdown/ The actual browser-side Markov chain implementation of that took something like 2-3 hours.
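The core of that kind of bot is just an order-1 Markov model over the opponent's moves. A minimal sketch of the idea in Python (the real implementation is browser-side JS):

    import random
    from collections import defaultdict

    BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

    class RPSPredictor:
        # Predict the opponent's next move from their previous one, counter it.
        def __init__(self):
            self.transitions = defaultdict(lambda: defaultdict(int))
            self.last = None

        def play(self):
            if self.last is None or not self.transitions[self.last]:
                return random.choice(list(BEATS))
            counts = self.transitions[self.last]
            predicted = max(counts, key=counts.get)
            return BEATS[predicted]  # throw whatever beats the prediction

        def observe(self, opponent_move):
            if self.last is not None:
                self.transitions[self.last][opponent_move] += 1
            self.last = opponent_move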
My favorite markov experiment was feeding it the works of Nietzsche from Project Gutenberg. Really brought out the late stage syphilis, but was still very distinctly Nietzsche.
I love messing with Markov chains, and before Reddit dunked their API, I would feed my Markov chain comments from NSFW subreddits and political subreddits at different sample sizes. I would spend ages just watching it whiplash about some political topic and sampling thirst comments.
Something about the not-quite-rightness of Markov output has always really amused me, like I can’t help but giggle at it.
My all-time favorite in this vein was @erowidrecruiter on Twitter, which generated posts with Markov chains from a corpus of tech recruiter emails and drug experience reports from erowid.org. Still up but no longer posting: https://x.com/erowidrecruiter?lang=en
I used to keep a text file on hand back when people had some irritating Markov chain bots on IRC. Fed enough of the lines, the bots veered off into some new valley of chatter, appearing quite mad. Used to get some "What did you DO to my bot?" indignant messages. I wonder if I still have that file somewhere.
Highly recommend this performance from the last Strange Loop on a similar topic! I had the pleasure of seeing it live. A great combination of live music and Markov chains.
Hey, me too! I populated the Markov chain with the text of my Twitter feed, with a recency bias, and generated tweets based on that so it stayed topical.
The account sat unused after Twitter locked down their API, and at some point got hacked without me noticing. It had been taken over by a crypto scammer, and the account got banned.
Trying to get it back was fruitless, Twitter/X's support is entirely useless.
I got banned within 24 hours ^-^. Rightfully so, I assume. The Markov chain agent was supposed to send everyone who tweets "I am bored" an event recommendation based on their profile and location. I severely underestimated how many people tweet "I am bored"... Lessons learned, I guess.
Oh, I didn't @ or DM anybody. I just had a feed that tweeted once an hour, and even then I manually approved the tweets before they went out. My account made very clear it was a bot; it was in its name. It didn't ever get very many followers. I would say I never really got past the experimental phase.
This is something people miss when talking about LLMs: "It's the most average thing you could have possibly produced given the context."
They can’t be creative by design. They’re useful when you want to reproduce, but not when you want to create something completely new (that you can maybe do by getting a bunch of average outputs from an LLM and getting inspired yourself).
When GPT-4 came out I was playing with it, and I often tried to get some unique, creative output from it, but very soon I learned it was futile. It was back when it all still felt magical, and I guess many of us tried various things with it.
If you tell GPT-4 to get snarky, you can sometimes end up laughing (and it seems a tad more intelligent and socially perceptive besides!). But Claude-3.5 can definitely get me laughing, at least.
Now imagine telling Claude-3.5 to try being snarky while sorting out software issues at a customer's office.
In college (around 2012ish) I made some poems by transcribing a recording of myself mumbling gibberish into Dragon Naturally Speaking, which partially relied on Markov chains for its ASR. It would output some string of text, and then I'd delete all the words it output from its dictionary; repeat. It worked very well for creating a list of (slant) rhymes, I have not found a better way to do this. As a literary tool, I've found LLMs to be seriously lacking in the ability to fuck around like this.
Funnily enough, I was idly playing with a last-updated-12-years-ago Markov chain text generator[0] in Emacs, and quickly had a number of amusing Markov chain outputs from mash-ups of the Elisp manual and Alice in Wonderland, including:
"13.7 Anonymous Functions
Although functions are usually defined with the built-in defmacro macro, but any
list that begins with an M--'
Isn't that the sort of joke that you'd get from GPT, not a Markov chain? Unless it's quoting verbatim, a Markov chain should have no idea that Alice is supposed to refer back to the letter from the previous sentence.
> “They were learning to draw,” the Dormouse went on, yawning and rubbing its eyes, for it was getting very sleepy; “and they drew all manner of things—everything that begins with an M—”
I wouldn't say that LLMs aren't fun, it's more that jokes out of context aren't fun. If you ask somebody to just tell you a joke, there's a very low chance that you'll actually laugh, especially if it's in text format.
But I've had some serious laugh looking at some of dougdoug videos featuring LLMs
That's for sure. I have seen many Markov chain implementations, and if you could generate 1 funny thing for every 10 tries, that was a good day. Both Markov chains and LLMs have a distinct style, which gets old over time - Markov much faster, for me. So, in my experience, LLMs win, by far.
I do agree with the author that the LLM style can get really boring; I experienced the same myself. But the Markov results, while much less restrained, are so much more nonsense too, which often calls their overall usefulness into question. And the world seems to agree: while Markov chain implementations were fun toys at best, which sometimes worked to a kind of funny degree, LLMs are everywhere.
I still find https://www.infiniteconversation.com to be one of the odd / funny / "uh... what?!" type things. It's not intended to be funny in the "ha ha" way but sometimes tickles my funny bone with the absurdist philosophy - talking about everything and nothing at the same time.
From the FAQ it is a tuned LLM.
> Mostly using open source tools available to anyone. The generation of the script itself is done using a popular language model that was fine-tuned on interviews and content authored by each of the two speakers.
The cool thing about Markov chains is that you can tune the probability with which they will jump between "genres" of their training content by making the analyzed N-grams shorter or longer.
E.g. a sentence starts out as a news headline and ends as a Bible verse.
So you can just scale the N-gram length down as far as the output still makes sense.
Also, you get a lot more from the base model. GPT-3 was versatile, as it could continue any context. Modern LLMs are try-hards. If you want to generate humor with an LLM, it's really worth going for a base model with multiple examples in the prompt.
Generative AI often uses Markov chains as its theoretical basis and is modeled in the language of conditional probabilities albeit not stuck to a single ancestor but to a chain of ancestors. One can often view LLMs as scaled out Markov chains (more like "Erlang" chains allowing deeper memory) that encode the probability of the next token based on the past of the size of the context window and then doing some magic when picking next tokens.
Even old Markov chains had a context window of a few words; single-word-context Markov chains produce horrible results.
People who don't think LLMs are Markov chains are just ignorant of the fact that a Markov chain isn't an algorithm: you can compute the transition probabilities in any manner and it is still a Markov chain.
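To make that framing concrete: the Markov property only constrains what the next-token distribution may depend on, not how it's computed. A sketch of the shape of the claim (nobody's actual API):

    from typing import Callable, Mapping, Sequence

    # A "transition function": given the state (the last n tokens), return a
    # distribution over next tokens. A lookup table satisfies this for a
    # classic chain; a transformer forward pass satisfies it for an LLM
    # with an n-token context window.
    NextTokenDist = Callable[[tuple[str, ...]], Mapping[str, float]]

    def markov_step(context: Sequence[str], n: int, dist: NextTokenDist):
        state = tuple(context[-n:])  # everything older than n tokens is forgotten
        return dist(state)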
Markov chains are funnier for one very basic reason:
you can teach them to 5th or 9th graders.
LLMs you can not, or at least it would take an insane amount of allegory to do so. Markov chains are very tightly related to regexes, and one may be surprised to learn that probabilistic regexes exist. Also, the graphical structure of a Markov chain is a lot like an FSM, and an FSM can perhaps be explained even to very small children :D
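As a toy illustration of that FSM view, a Markov chain really is just a finite-state machine with weighted edges that you random-walk:

    import random

    # Each state lists (next_state, probability) edges, exactly like a
    # weighted FSM diagram you could draw on a whiteboard.
    fsm = {
        "the": [("cat", 0.5), ("dog", 0.5)],
        "cat": [("sat", 1.0)],
        "dog": [("ran", 1.0)],
        "sat": [("on", 1.0)],
        "ran": [("to", 1.0)],
        "on":  [("the", 1.0)],
        "to":  [("the", 1.0)],
    }

    state, walk = "the", ["the"]
    for _ in range(8):
        nexts, weights = zip(*fsm[state])
        state = random.choices(nexts, weights=weights)[0]
        walk.append(state)
    print(" ".join(walk))  # e.g. "the dog ran to the cat sat on the"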
It does not mean that Markov chains are better - something trained to make predictions should ideally not fall too far away from our own internal prediction engines (which have been honed across aeons).
It's that it starts to come close that's the problem (or cause); it's the uncanny valley for text.
I used to have a Markov-chain Twitter bot that was trained on a few of my friends' Twitter posts. It was funny because that included hashtags, tagged users, and images. There were some funny results, and this was long before LLMs.
Whenever someone attending the tech meetup I frequent gets a little too jazzed and excited about AI, I pull out my phone, open Emacs (I use termux, btw), bring in a corpus of text (an excerpt from A Christmas Carol is good), and say M-x dissociated-press.
Then I say, "What this is doing is predicting the next character based on the statistical likelihood of the previous few characters in the corpus of text. And fundamentally, that's all ChatGPT does -- predicting the next symbol based on a statistical model. ChatGPT has a much more sophisticated statistical model than this simple Markov chain and a vastly larger corpus, but really it's just doing the same thing." And we have a giggle about the nonsense DP makes of Dickens, but then I say that ChatGPT emits nonsense too, and it's far more insidious nonsense because it is much more plausible-sounding.
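For anyone who wants the same party trick outside Emacs, a character-level chain in a dozen lines of Python gets you most of the effect. (This is only an approximation -- dissociated-press itself works by splicing overlapping runs of the source text -- and the file name is made up.)

    import random
    from collections import defaultdict

    def dissociate(text, k=4, length=400):
        # Predict the next character from the previous k characters.
        model = defaultdict(list)
        for i in range(len(text) - k):
            model[text[i:i + k]].append(text[i + k])
        state, out = text[:k], [text[:k]]
        for _ in range(length):
            ch = random.choice(model.get(state) or list(text))
            out.append(ch)
            state = (state + ch)[-k:]
        return "".join(out)

    print(dissociate(open("christmas_carol.txt").read()))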
I posted "How to Speed Up Your Computer Using Google Drive as Extra RAM" [0] ~10 years ago and have had hundreds of people over the years ask if I used Markov chains to generate the funny bits (I didn't). I've had 0 people ask if I used LLMs on any of my comedy pieces. Seems QED to me.
A discord server I'm in is basically a shitpost/absurdist humor discord full of smart people, and we have at least 15 bots written by members that randomly engage with Markov chains built on the corpus of member chat history.
They spout out some of the most unhinged, hilarious stuff. Always a good time. An LLM would struggle, I'd think, given that the humor usually stems from disjoint phrases that somehow take on new meaning. They're rarely coherent but often hilarious.
> I’ve heard people claim that comedy cannot be generated by an algorithm. If it wasn’t obvious, I disagree with this. I think comedy can be analyzed and measured. I think, given a large enough grant, we could probably generate comedy on demand. I think it’s doable - not that we should do it.
I used to think it came naturally, until I met someone with a bookcase full of books about humor (wtf?). Apparently they have it down to a science.
I learned that the difference between someone funny and a professional comedian is that the latter finds additional punch lines for a joke. One of the books then described a step-by-step process going from a silly remark to a birthday joke, comparing various modular developments to a kind of dependency hell complete with race conditions, until the state object is carefully defined and the plot has the punch lines all sorted from the barely funny to the truly hilarious. It was more engineering than CS.
The funniest seeBorg message came 10 minutes after a heated discussion that resulted in tanktop, a moderator, getting banned from a project. The bot wrote: "Tanktop is Hitler!" It then took the humans 2 days to figure out what the next word was supposed to be.
As probably the oldest literary work that is also widely known in English-speaking territory, the Bible has a highly distinct style that helps its samples stay coherent within a sentence while remaining easily recognizable.
I'm sure most people here would, given that something like 50% of the world doesn't believe in an Abrahamic religion, and a percentage of those who do have a sense of humour.
I've used multiple images of Christ to play with AI image models. Remaking the old "Jesus on toast" image was fun. Facebook is full of boomers sharing AI-generated scenes with Jesus's face in them. I feel uneasy when religion is used as an excuse to discriminate against or murder innocent people, not when symbolism or texts are used for creative purposes.
Stopped reading at the mention of ChatGPT 3.5, for two reasons:
1) if you’re not using the better model then you either don’t know enough for me to care about your opinion or you’re deliberately deceiving your audience in which case I’m not going to allow your meme pollution into my mind.
2) you are using the AI equivalent of a call centre support agent, they aren’t allowed to say anything funny. Most of their RLHF training has been specifically about NOT saying the funny things that will instantly go viral and cause a lot of media attention that will annoy or scare away investors.
An LLM trained on a given dataset should — at least in theory — "contain" (in a lossless-data-compression sense) a full superset of the knowledge of a Hidden Markov Model trained on the same dataset. I.e. that information is there, in the weights, in some form, and could in theory be used to reconstruct an equivalent HMM from the LLM.
Why can't we get LLMs to do what HMMs do, then?
Mostly, it comes down to the structure.
Markov models are "funny" because they just have one level of abstraction: tokens. Markov "inference" is predicting the next token, given the last N tokens and a model that knows weights for what tokens follow what N-tuples of previous tokens. Due to that limitation, the only rules HMMs can learn are low-level rules that don't require any additional abstraction: they can't optimize for syntactically valid English, let alone semiotically logical statements; but they can make the text "feel" good in your head (the textual equivalent of song vocals having nice phonotactics), and so that's what training leads the model to learn to do. And it turns out that that combination, text that "feels" good in its phrasing but is syntactically invalid, happens to read as "funny"!
LLMs aren't under the same constraint. They can learn low-level and high-level rules. Which means that they usually do learn both low-level and high-level rules.
The only thing stopping LLMs from using those low-level rules, AFAICT, is the architecture most LLMs are built on: the (multi-layer) Transformer. Transformer LLMs are always a single-pass straight shot ("feed forward") through a bunch of discrete layers (individual neural networks), where at each step, the latent space (vocabulary) of the layer's inputs gets paraphrased into a different latent space/vocabulary at the layer's outputs.
This means that, once you get into the middle of a Transformer's layer sandwich, where all the rules about abstract concepts and semiotics reside, all the low-level stuff has been effectively paraphrased away. (Yes, LLMs can learn to "pass through" information from previous layers, but there's almost always a training hyperparameter that punishes "wasteful" latent-space size at each layer, so models usually learn to pass through only the most important things, e.g. proper names. And even then, quality on these "low-level" inferences is exactly the sort of thing that current LLM test datasets ignore, leaving training frameworks free to prune away these passthrough nodes as "useless.")
This problem with LLMs could be fixed in one of two ways:
1. the "now it's stupid but at least it rhymes" approach
Allow inference frameworks to simply bypass a configurable-per-inference-call number of "middle layers" of a feed-forward multi-layer network. I.e., if there are layers 1..N, take out layers K..(N-K) and directly connect layer K-1 to layer N-K+1.
At its most extreme, with layer 1 connected to layer N, this could very well approximate the behavior of an HMM. Though not very well, as — given the relatively-meaningless tokenization approach most LLMs use (Byte Pair Encoding) — LLMs need at least a few transforms to get even to the point of having those tokens paraphrased into "words" to start to learn "interesting" rules. (AFAIK in most Transformer models layers 1 and N just contain rules for mapping between tokens and words.)
Meanwhile, this would likely work a lot better with the "cut and graft" happening at a higher layer, but getting the "graft" to work would likely require re-training (since layers K-1 and N-K+1 don't share a vocabulary.)
...except if the LLM is an auto-encoder. Auto-encoder LLMs could just run an inference up their layerwise "abstraction hierarchy" to any arbitrary point, and then back down, without a problem!
(I'd really love to see someone try this. It's an easy hack!)
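In case anyone does want to try it, here's roughly what the hack looks like on a toy stack of same-width layers in PyTorch. This is not a real transformer, and per the caveat above the grafted layers don't share a vocabulary, so expect degraded output without retraining:

    import torch
    import torch.nn as nn

    class ToyStack(nn.Module):
        """Stand-in for a feed-forward LM: N same-width layers."""
        def __init__(self, dim=64, n_layers=12):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.Sequential(nn.Linear(dim, dim), nn.GELU())
                for _ in range(n_layers)
            )

        def forward(self, x, skip=0):
            # Take out `skip` middle layers, wiring layer K-1 straight
            # into layer N-K+1, as proposed above.
            n = len(self.layers)
            lo = (n - skip) // 2
            hi = lo + skip
            for i, layer in enumerate(self.layers):
                if lo <= i < hi:
                    continue  # bypassed middle layer
                x = layer(x)
            return x

    model = ToyStack()
    x = torch.randn(1, 64)
    deep = model(x)             # full-depth inference
    shallow = model(x, skip=8)  # "stupid but at least it rhymes" mode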
2. the "it can write poetry while being smart" approach
Figure out a way, architecturally, to force more information from the early low-level layers to be passed through to the late low-level layers, despite the middle layers not having any reason to care about it. (I.e. do something to allow the LLM to predict a word Y at layer N-3 such that it rhymes with a word X known at layer 3, while not otherwise degrading its capabilities.)
Most simply, I think you could just wire up the model with a kind of LIFO-bridged layer chain — where every layer K is passing its output to the input of layer K+1; but, for any given layer K in the first half of the layers, it's also buffering its output so that it can become an additional input for its "matching" layer N-K.
This means that all the layers in the "second half" of the model would receive longer inputs, these being the concatenation of the output of the previous layer, with the output of the matching "equal in abstraction depth" input layer. (Where this equal-in-abstraction-depth association between layers isn't inherently true [except in auto-encoder models], but could be made true in an arbitrary model by training said model with this architecture in place.)
(Again, I'd really love to see someone try this... but it'd have to be done while training a ground-up base model, so you'd need to be Google or Facebook to test this.)
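For what it's worth, this is recognizably the skip-connection pattern from U-Nets applied to an LM stack. A toy version, with plain linear layers standing in for transformer blocks:

    import torch
    import torch.nn as nn

    class BridgedStack(nn.Module):
        # First-half layers buffer their outputs; each second-half layer
        # consumes the previous layer's output concatenated with the output
        # of its mirror layer (LIFO: layer K pairs with layer N-K).
        def __init__(self, dim=64, n_layers=8):
            super().__init__()
            half = n_layers // 2
            self.down = nn.ModuleList(nn.Linear(dim, dim) for _ in range(half))
            # Second-half layers take widened inputs to fit the bridged signal.
            self.up = nn.ModuleList(nn.Linear(dim * 2, dim) for _ in range(half))

        def forward(self, x):
            buffered = []
            for layer in self.down:
                x = torch.tanh(layer(x))
                buffered.append(x)  # save for the mirror layer
            for layer in self.up:
                bridged = buffered.pop()  # LIFO pairing
                x = torch.tanh(layer(torch.cat([x, bridged], dim=-1)))
            return x

    out = BridgedStack()(torch.randn(1, 64))

And, as the comment says, the bridged inputs only become meaningful if the model is trained with this wiring in place from the start.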