ClosedAI scraped human content without asking and explained why this was acceptable... but when the outputs of their models are scraped, it is THEIR dataset and this is NOT acceptable!
Oh, the irony! :D
I shared a few screenshots of DeepSeek answering using ChatGPT's output in yesterday's article!
Also, DeepSeek is allegedly... better? So saying they just copied ClosedAI isn't really a sufficient answer. Seems to be just bluster, because the US Govt would probably accept any excuse to ban it; see TikTok.
It’s not better. In most of my tests (C++/Qt code) it just runs out of context before it can really do anything. And the output is very bad: it mashes together the header and cpp file. The reasoning output is fun to look at and occasionally useful though.
The max token output is only 8K (32K thinking tokens). o1 is 128K, which is far more useful, and it doesn’t get stuck like R1 does.
The hype around the DeepSeek release is insane and I’m starting to really doubt their numbers.
Is this a local run of one of the smaller models and/or other-models-distilled-with-r1, or are you using their Chat interface?
I've also compared o1 and (online-hosted) r1 on Qt/C++ code, being a KDE Plasma dev, and my impression so far was that the output is roughly on par. I've given both models some tricky tasks about dark corners of the meta-object system in crafting classes etc. and they came up with generally the same sort of suggestions and implementations.
I do appreciate that "asking about gotchas with few definitive solutions, even if they require some perspective" and "rote day-to-day coding ops" are very different benchmarks due to how things are represented in the training data corpus, though.
I use it through Kagi Assistant which has the proper R1 model through Together.ai/Fireworks.ai
My standard test is to ask the model to write a QSyntaxHighlighter subclass that uses TreeSitter to implement syntax highlighting. O1 can do it after a few iterations, but R1’s output has been a mess. That said, its thought process revealed a few issues that I then fixed in my canonical implementation.
Thanks for adding detail! My prompts have been very in-the-bubble-of-Qt I'd say, less so about mashing together Qt and something else, which I agree is a good real-world test case.
I haven’t had the chance to try it out with R1 yet but if you implement a debugger class that screenshots the widget/QML element, dumps its metadata like GammaRay, and includes the source, you can feed that context into Sonnet and o1. They are scarily good at identifying bugs and making modifications if you include all that context (although you have to be selective with what metadata you include. I usually just dump a few things like properties, bindings, signals, etc).
R1 is trained for a context length of 128K. Where are you getting 8K/32K? The model doesn't distinguish "thinking" tokens and "output" tokens, so this must be some specific API limitations.
> max_tokens: The maximum length of the final response after the CoT output is completed, defaulting to 4K, with a maximum of 8K. Note that the CoT output can reach up to 32K tokens, and the parameter to control the CoT length (reasoning_effort) will be available soon. [1]
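For what it's worth, the cap described there is per-request plumbing, not a property of the model. A minimal sketch of how it would surface in an OpenAI-compatible chat request (the model name and the client-side clamping here are my assumptions for illustration, not taken from any particular host's docs):

```javascript
// Sketch only: how the quoted 8K cap on the final answer would look in an
// OpenAI-compatible chat request body. The model name and exact cap value
// are assumptions based on the docs quoted above.
function buildChatRequest(prompt, requestedMaxTokens) {
  const FINAL_ANSWER_CAP = 8192; // "maximum of 8K" from the quoted docs
  return {
    model: "deepseek-reasoner",
    messages: [{ role: "user", content: prompt }],
    // Whatever the client asks for, the final answer is clamped here;
    // the (up to 32K) CoT tokens are budgeted separately by the server.
    max_tokens: Math.min(requestedMaxTokens, FINAL_ANSWER_CAP),
  };
}

console.log(buildChatRequest("hello", 128000).max_tokens); // 8192
```

So a 128K-trained model can still feel like an 8K model if every host in the chain applies a cap like this.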
I’m using it through Kagi which doesn’t use DeepSeek’s official API [1]. That limitation from the docs seems to be everywhere.
In practice I don’t think anyone can economically host the whole model plus the KV cache for the entire 128K context (and I’m skeptical of DeepSeek’s claims now anyway).
Edit: a Kagi team member just said on Discord that they’ll be increasing max tokens next release
He's just repeating a lot of disinformation that has been released about DeepSeek in the last few days. People who took the time to test DeepSeek models know that the results have the same or better quality for coding tasks.
Benchmarks are great to have but individual/org experiences on specific codebases still matter tremendously.
If an org consistently finds one model performs worse on their corpus than another, they aren't going to keep using it because it ranks higher in some set of benchmarks.
But you should also be very wary of these kinds of anecdotes, and this thread highlights exactly why. That commenter says in another comment (https://news.ycombinator.com/item?id=42866350) that the token limitation he is complaining about actually has nothing to do with DeepSeek's model or their API, but is a consequence of an artificial limit that Kagi imposes. In other words, his conclusion about DeepSeek is completely unwarranted.
It mashed the header and C++ file together, which is egregiously bad in the context of Qt. This isn’t a new library; it’s been around for almost thirty years. Max token sizes have nothing to do with that.
I invite anyone to post a chat transcript showing a successful run of R1 against this prompt (and please tell me which API/service it came from so I can go use it too!)
I wasn't suggesting using the anecdotes of others to make a decision.
I'm talking about individuals and organizations making a decision on whether or not to use a model based on their own testing. That's what ultimately matters here.
It's not great at super-complex tasks due to limited context, but it's quite a good "junior intern that has memorized the Internet." Local deepseek-r1 on my laptop (M1 w/64GiB RAM) can answer about any question I can throw at it... as long as it's not something on China's censored list. :)
Thanks for saying this, I thought I was insane, DeepSeek is kinda bad. I guess it’s impressive all things considered but in absolute terms it’s not great.
I have run personal tests and the results are at least as good as I get from OpenAI. Smarter people have also reached the same conclusion. Of course you can find contrary datapoints, but it doesn't change the big picture.
To be fair, it's amazing by the standards of six months ago. The only models that beat it are o1, the latest gemini models and (for some things) sonnet 3.6
It’s definitely not all hype, it really is a breakthrough for open source reasoning models. I don’t mean to diminish their contribution, especially since being able to read the reasoning output is a very interesting new modality (for lack of a better word) for me as a developer.
It’s just not as impressive as people make it out to be. It might be better than o1 on Python or JavaScript that's all over the training data, but o1 is overwhelmingly better at anything outside the happy path.
> An AACS encryption key (09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0) that came to prominence in May 2007 is an example of a number claimed to be a secret, and whose publication or inappropriate possession is claimed to be illegal in the United States.
This is a silly take for anyone in tech. Any binary sequence is a number. Any information can be, for practical purposes, rendered in binary [1].
Getting worked up about restrictions on numbers works as a meme, for the masses, because it sounds silly, but is tantamount to technically arguing against privacy, confidentiality, the concept of national secrets, IP as a whole, et cetera.
> Any piece of digital information is representable as a number; consequently, if communicating a specific set of information is illegal in some way, then the number may be illegal as well.
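The equivalence is trivial to demonstrate. A short sketch, using the key hex string quoted above and nothing but standard BigInt arithmetic:

```javascript
// The hex key quoted above, read as a single large integer and back.
// "Illegal number" arguments rest on exactly this equivalence.
const keyHex = "09F911029D74E35BD84156C5635688C0";
const asNumber = BigInt("0x" + keyHex);

// Round-trip: the integer alone is enough to reconstruct the byte string
// (pad back to 32 hex digits since toString drops the leading zero).
const backToHex = asNumber.toString(16).toUpperCase().padStart(32, "0");
console.log(asNumber > 0n && backToHex === keyHex); // true
```

The 128-bit integer and the 16-byte key are the same object; banning one bans the other.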
There is thought-stopping satire and thought-provoking satire. Much of it depends on the context. I’m not getting the latter from a “USA land of the ‘free’” comment.
> It depends on where you live. In many places, collecting rainwater is completely legal and even encouraged, but some regions have regulations or restrictions.
United States: Most states allow rainwater collection, but some have restrictions on how much you can collect or how it can be used. For example, Colorado has limits on the amount of rainwater homeowners can store.
Australia: Generally legal and encouraged, with many homes using rainwater tanks.
UK & Canada: Legal with few restrictions.
India & Many Other Countries: Often encouraged due to water scarcity.
I think so; I joined Reddit when it was in tech news as people left Digg after the big redesign. I'm not sure when the exodus started. I left Fark over the hd-dvd mess.
In both cases, legality depends entirely on repercussions, i.e. if there's someone to enforce the ban. I suspect that in the "illegal numbers" case there might be.
It's not open source. They provide the model and the weights, but not the source code and, crucially, the training data. As long as LLM makers don't provide the training data (and they never will, because then they would be admitting to stealing), LLMs are never going to be open source.
(a) You have everything you need to be able to re-create something, and at any step of the process change it.
(b) You have broad permission to put the result to whatever use you like.
The "open source" models from both Meta and DeepSeek so far fail one or both of these checks (Meta's fail both). We should resist the dilution of the term open source to the point where it means nothing useful.
Agreed, but the "connotations don't match" is mostly because the folks who chose to call it open source wanted the marketing benefits of doing so. Otherwise it'd match pretty well.
At the risk of being called rms, no, that's not what open source means. Open source just means you have access to the source code. Which you do. Code that is open source but restrictively licensed is still open source.
That's why terms like "libre" were born to describe certain kinds of software. And that's what you're describing.
This is a debate that started, like, twenty years ago or something when we started getting big code projects that were open source but encumbered by patents so that they couldn't be redistributed, but could still be read and modified for internal use.
> Open source just means you have access to the source code.
That's https://en.wikipedia.org/wiki/Source-available_software , not 'open source'. The latter was specifically coined [1] as a way to talk about "free software" (with its freedom connotations) without the price connotations:
The argument was as follows: those new to the term "free software" assume it is referring to the price. Oldtimers must then launch into an explanation, usually given as follows: "We mean free as in freedom, not free as in beer." At this point, a discussion on software has turned into one about the price of an alcoholic beverage. The problem was not that explaining the meaning is impossible—the problem was that the name for an important idea should not be so confusing to newcomers. A clearer term was needed. No political issues were raised regarding the free software term; the issue was its lack of clarity to those new to the concept.
It's common for terms to have a more specific meaning when combined with other terms. "Open source" has had a specific meaning now for decades, which goes beyond "you can see the source" to, among other things, "you're allowed to use it without restriction".
I don't know why you've been downvoted. This is a 100% correct history. "Open source" was specifically coined as a synonym to "free software", and has always been used that way.
> Open source just means you have access to the source code. Which you do.
No, they also fail even that test. Neither Meta nor DeepSeek have released the source code of their training pipeline or anything like that. There's very little literal "source code" in any of these releases at all.
What you can get from them is the model weights, which for the purpose of this discussion, is very similar to compiler binary executable output you cannot easily reverse, which is what open source seeks to address. In the case of Meta, this comes with additional usage limitations on how you may put them to use.
As a sibling comment said, this is basically "freeware" (with asterisks) but has nothing to do with open source, either according to RMS or OSI.
> This is a debate that started, like, twenty years ago
For the record, I do appreciate the distinction. This isn't meant as an argument from authority at all, but I've been an active open source (and free software) developer for close to those 20 years, am on the board of one of the larger FOSS orgs, and most households have a few copies of FOSS code I've written running. It's also why I care! :-)
The weights, which are part of the source, are open. Now you are arguing that it is not open source because they don't provide the source for that part of the source. If you follow that reasoning, you can claim ad infinitum that the sources are absent, since every source originates from something.
The source is the training data and the code used to turn the training data _into_ the weights. Thus GP is correct, the weights are more akin to a binary from a traditional compiler.
To me this 'source' requirement does not make sense. It is not as if you bring the training data and the application together and press a train button; there are many more steps involved.
Also, the training data is massive in volume.
Additionally, what about human-in-the-loop training: do you deliver humans as part of the source?
> they also fail even that test. Neither Meta nor DeepSeek have released the source code of their training pipeline
This debate is over and makes the open source community look silly. Open model and weights is, practically speaking, open source for LLMs.
I have tremendous respect for FOSS and those who build and maintain it. But arguing for open training data means only toy models can practically exist. As a result, the practical definition will prevail. And if the only people putting forward a practical definition are Meta et al, this is what you get: source available.
I'm not arguing for open training data BTW, and the problem is exactly this sort of myopic focus on the concerns of the AI community and the benefits of open-washing marketing.
Completely, fully breaking the meaning of the term "open source" is causing collateral damage outside the AI topic, that's where it really hurts. The open source principle is still useful and necessary, and we need words to communicate about it and raise correct expectations and apply correct standards. As a dev you very likely don't want to live in a tech environment where we regress on this.
It's not "source available" either. There's no source. It's freeware.
"I can download it and run it" isn't open source.
I'm actually not too worried that people won't eventually re-discover the same needs that open source originally discovered, but it's pretty lame if we lose a whole bunch of time and effort to re-learn some lessons yet again.
> it's pretty lame if we lose a whole bunch of time and effort to re-learn some lessons yet again
We need to relearn because we need a different definition for LLMs. One that works in practice, not just at the peripheries.
Maybe we can have FOSS LLMs vs open-source ones, like we do with software licenses. The former refers to the hardcore definition. The latter the practical (and widely used) one.
Sure, I don't disagree. I fully understand the open-weights folks looking for a word to communicate their approach and its benefits, and I support them in doing so. It's just a shame they picked this one in - and that's giving folks a lot of benefit of the doubt - a snap judgement.
> Maybe we can have FOSS LLMs vs open-source ones, like we do with software licenses.
Why not just call them freeware LLMs, which would be much more accurate?
There's nothing "hardcore" or "zealot" about not calling these open source LLMs because there's just ... absolutely nothing there that you call open source in any way. We don't call any other freeware "open source" for being a free download with a limited use license.
This is just "we chose a word to communicate we are different from the other guys". In games, they chose to call it "free to play (f2p)" when addressing a similar issue (but it's also not a great fit since f2p games usually have a server dependency).
> Why not just call them freeware LLMs, which would be much more accurate?
Most of the public is unfamiliar with the term. And with some of the FOSS community arguing for open training data, it was easy to overrule them and take the term.
Most of the public is also unfamiliar with the term open source, and I'm not sure they did themselves any favors by picking one that invites far more questions and needs for explanation. In that sense, it may have accomplished little but its harmful effects.
I get your overall take is "this is just how things go in language", but you can escalate that non-caring perspective all the way to entropy and the heat death of the universe, and I guess I prefer being an element that creates some structure in things, however fleeting.
The only practical and widely used definition of open source is the one known as the Open Source Definition published by the OSI.
The set of free/libre licenses (as defined by the FSF) is almost identical to the set of open sources licenses (as defined by the OSI).
The debate within FOSS communities has been between copyleft licenses like the GPL, and permissive licenses like the MIT licence. Both copyleft and permissive licenses are considered free/libre by the FSF, and both of them are considered open source by the OSI.
People say this, but when it comes to AI models, the training data is not owned by these companies/groups, so it cannot be "open sourced" in any sense. And the training code is basically accessing that training data that cannot be open sourced, therefore it also cannot be shared. So the full open source model you wish to have can only provide subpar results.
They could easily list the data used though.
These datasets are mostly known and floating around.
When they are constructed, instructions for replication could be provided too
But I think my argument still stands though? Users can run DeepSeek locally, so unless the US Gov't wants to reach book-burning levels of idiocy, there is not really a feasible way to ban the American public from running DeepSeek, no?
Yes, your argument still stands. But I think it's important to stand firm that the term "open source" is not a good label for what these "freeware" LLMs are.
There was an executive order issued by the previous administration that makes using anything with more than 10 billion parameters illegal and punishable by government force if done without authorization. Of course, like most government regulations (even though this is an executive action, not a regulation), the point is not to stop the behavior but to create a system where everyone breaks the rule constantly, so that if anyone rocks the boat they can be indicted/charged and dealt with.
>(k) The term “dual-use foundation model” means an AI model that is trained on broad data; generally uses self-supervision; contains at least tens of billions of parameters; is applicable across a wide range of contexts; and that exhibits, or could be easily modified to exhibit, high levels of performance at tasks that pose a serious risk to security, national economic security, national public health or safety, or any combination of those matters, such as by: ...
That order does not "make using anything with more than 10 billion parameters illegal and punishable by government force if done without authorization".
It orders the Secretary of Commerce to "solicit input from the private sector, academia, civil society, and other stakeholders through a public consultation process on potential risks, benefits, other implications, and appropriate policy and regulatory approaches related to dual-use foundation models for which the model weights are widely available".
Many regulations are created by executive action, without input from Congress. The Council on Environmental Quality, created by the National Environmental Policy Act, has the power to issue its own regulations. Executive Orders can function similarly, and the executive can order rulemaking bodies to create and remove regulations, though there is a judicial effort to restrict this kind of policymaking and return regulatory power back to Congress.
There’s an effort to restrict certain regulatory rule-making where it’s ideologically convenient, but it isn’t “returning” regulatory power. That rulemaking authority isn’t derived by some bullshit executive order, but by Federal law, as implemented by congress.
Congress has never ceded power to anyone. They wield legislative authority and power of the purse, and wield it as they see fit. The special interests campaigning about this are extreme reactionaries whose stated purpose is to make government ineffective.
If I'm not wrong, wasn't PGP encryption once illegal to export?
Not quite the same, but the government has a nice habit of feeling like it can ban the export of research.
Add PS1 too. The US government banned sale of PlayStation to China because the PLA would apparently have access to cutting edge chips for their missiles
But that's not the goal; the goal is to protect the "intellectual property" of American companies only. Countries not on the "friends list" cannot sell products in that area without suffering repercussions. That's how the US has maintained technological dominance in some areas: by restricting what other countries can do.
> We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome.
> Our primary fiduciary duty is to humanity. We anticipate needing to marshal substantial resources to fulfill our mission, but will always diligently act to minimize conflicts of interest among our employees and stakeholders that could compromise broad benefit.
> We will actively cooperate with other research and policy institutions; we seek to create a global community working together to address AGI’s global challenges.
I think one good thing to come out of all this tech elite flip flopping is that I now see these tech leaders for exactly who they are. It makes me kind of sad, because as someone who came of age early in the Web era I really wanted to believe that there was a bigger moral good to all we were doing.
I now view any moralistic statement by any of these big tech companies as complete and total bullshit, which is probably for the best, because that is what it is. These companies now exist solely to amass power and wealth. They will still use moralistic language to try to motivate their employees, but I hope folks still see it for the complete nonsense that it is.
The picture at the end showing DeepSeek's privacy policy and being concerned that it's "a security risk" is hilarious[1]. Basically every B2C company collects this sort of information[2], and it is far less intrusive than what social networks collect[3]. But because it's Chinese and on the verge of overtaking Western companies, people are suddenly worried about device information and IP addresses?
I welcome friction, so I'll be blunt: I disagree with you, not because what you are saying is wrong but because you only consider systematic data collection.
That's not the issue here.
There's a difference between democracies like the United States or European countries, no matter how IMPERFECT they are, and a dictatorship that does not allow dissenting opinions.
There's a difference in how the data collected will be used.
Freedom of speech, even when it is relative, is better than totalitarianism.
It’s also important to recognize that the Chinese government is known to walk into internet service companies and demand they censor, alter data, delete things. No court order or search warrant required.
China considers industry to be completely subservient to government. Checks and balances are secondary to ideas like harmony and collective well being.
>There's a difference between democracies like the United States or European countries, no matter how IMPERFECT they are, and a dictatorship that does not allow dissenting opinions.
>There's a difference in how the data collected will be used.
>Freedom of speech, even when it is relative, is better than totalitarianism.
I don't disagree with "democracy is better than totalitarianism", but what does that have to do with collecting device information and IP addresses? Is that excuse a cudgel you can use against any behavior that would otherwise be innocuous? It's fine to be against deepseek because you're concerned about them getting sensitive data via queries, or even that their models be a backdoor to project chinese soft power, but hand wringing about device information and IP addresses is absurd. It makes as much sense as being concerned that the CCP/deepseek does meetings, because even though every other companies does meetings, CCP/deepseek meetings could be used for totalitarianism.
Also, the same people that complain about this are just fine with a Western government having access to the same data via big corporations. Why does being democratic give you a free pass to disregard privacy, in other words, to do exactly the opposite of what is expected from a free society?
I don't disagree with you either and like you, I'm entirely against privacy violations in any way, shape or form.
I admit I am concerned when I see blatant algorithmic manipulation of social platforms to favor any narrative that aligns with geopolitical objectives.
I also wrote about the TikTok algo a few days ago. You'll see what I think of user privacy violations (closed ecosystem + basically a keylogger in this case):
>I'm entirely against privacy violations in any way, shape or form.
>Our privacy should be respected.
Characterizing device information and IP addresses as "privacy violations" is a stretch. If you showed a history railing against this sort of stuff, agnostic of geopolitical alignment, then you get a pass, but I think it's fair to assume the converse until proven otherwise.
>In the meantime: strong encryption at every corner, please!
Irrelevant. The data collection is done by first parties. Encryption doesn't do anything.
>I admit I am concerned when I see blatant algorithmic manipulation of social platforms to favor any narrative that aligns with geopolitical objectives.
>I cannot stand when dissenting voices or opinions are shadow-banned.
>What does this have to do with privacy? Again, it's fine to be against "blatant algorithmic manipulation of social platforms" or whatever, but dragging in seemingly unrelated topics in an attempt to amass as big a pile of grievances as possible is disingenuous.
>I also wrote about the TikTok algo a few days ago. You'll see what I think of user privacy violations (closed ecosystem + basically a keylogger in this case):
>Where's the keylogging? I skimmed the article and the only thing I could find was a passing mention of an article that you "was advised not to publish it and I didn’t". How much keylogging could possibly be going on in a short video app? Is the "keylogging" just a way to make "we measure how engaged someone is with a video" sound as sinister as possible?
>Characterizing device information and IP addresses as "privacy violations" is a stretch.
I agree: this is a characterization I never made. FYI, I also collect this type of data about you when you visit my website. That said, telemetry + totalitarianism = bad combo.
>Irrelevant. The data collection is done by first parties. Encryption doesn't do anything.
Even if data is collected by first parties, encryption is still highly relevant because it ensures that the data remains secure in transit and at rest. It does a lot.
>What does this have to do with privacy? Again, it's fine to be against "blatant algorithmic manipulation of social platforms" or whatever, but dragging seemingly unrelated topics in an attempt to amass as big pile of greviances as possible is disingenuous.
You are aggressive for no reason whatsoever. There's nothing disingenuous: when users are shadow-banned by platforms under dictatorships, they end up flagged, and their private data is often analyzed for nefarious reasons. There's a link with privacy but I'll stop at this stage if we cannot have a civilized discussion.
>Where's the keylogging? I skimmed the article and the only thing I could find was a passing mention of an article that you "was advised not to publish it and I didn’t". How much keylogging could possibly be going on in a short video app? Is the "keylogging" just a way to make "we measure how engaged someone is with a video" sound as sinister as possible?
“TikTok iOS subscribes to every keystroke (text inputs) happening on third party websites rendered inside the TikTok app. This can include passwords, credit card information and other sensitive user data. (keypress and keydown). We can’t know what TikTok uses the subscription for, but from a technical perspective, this is the equivalent of installing a keylogger on third party websites.”
Please note that this article is outdated (August 2022). Importantly, the article does not claim that any data logging or transmission is actively occurring. Instead, it highlights the potential technical capabilities of in-app browsers to inject JavaScript code, which could theoretically be used to monitor user interactions.
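For the curious, the capability described there is mundane DOM plumbing. A hypothetical sketch (not TikTok's actual code; only the keydown/keypress event names come from the quoted report, everything else is invented for illustration):

```javascript
// Hypothetical sketch of the capability the quoted report describes:
// a script injected into an in-app browser subscribing to every
// keystroke on the hosted page. What, if anything, is done with the
// reported keys is exactly the open question the report raises.
function installKeystrokeTap(target, report) {
  const handler = (e) => report({ type: e.type, key: e.key });
  target.addEventListener("keydown", handler);
  target.addEventListener("keypress", handler);
  // Return an uninstaller so the tap can be removed again.
  return () => {
    target.removeEventListener("keydown", handler);
    target.removeEventListener("keypress", handler);
  };
}
```

In a real page `target` would be `document`, which is why the report calls it "the equivalent of installing a keylogger": the subscription sees every key, including ones typed into password fields.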
> I admit I am concerned when I see blatant algorithmic manipulation of social platforms to favor any narrative that aligns with geopolitical objectives.
I'm curious how robust this principle is for you, because China and Russia are not the first countries that come to mind when talking about the (actual, existing, documented) manipulation of US speech and media by a foreign government.
Yet it seems we can only have this discussion, ironically, when the subject is a US government-approved one like China. Anything else would be problematic and unsafe.
Amusing that Bruno seems to think in terms of labels, when the reality is that the USA imprisons far more people per capita and very often blatantly disregards its so-called "core freedoms" (i.e., the Bill of Rights) for its citizens.
This kind of person has a lot of cognitive dissonance going on.
While all of this is true, and DeepSeek wouldn't be here were it not for the research that preceded it, notably Google's paper, then Llama, and ChatGPT, which they're modeled after, its release still did something profound to their psyche, the motivation and self-actualization this instills in the Chinese. They witnessed the power of their accomplishments: a side-hustle project knocked off an easy trillion. This is only egging them on and will serve to ramp up their efforts even more.
Separately, I do think that now that the Chinese leadership saw this, that they have the chops to pull this off and then some, they are probably going to rein in future innovations; they'll likely demand that the big future discoveries remain closed-sourced (or even unannounced/unpublicized).
OpenAI wouldn't be here without the work that Yann Lecun did at Facebook (back when it was facebook). Science is built on top of science, that's just how things work.
Yes, but in science you reference your work and credit those who came before you.
Edit: I am not defending OpenAI and we are all enjoying the irony here. But it puts into perspective some of the wilder claims circulating that DeepSeek was able to somehow compete with OpenAI for only $5M, as if on a level playing field.
OpenAI has been hiding their datasets, and certainly haven't credited me for the data they stole from my website and github repositories. If OpenAI doesn't think they should give attribution to the data they used, it seems weird to require that of others.
Edit: Responding to your edit, DeepSeek only claimed that the final training run was $5M, not that the whole process cost that (they even call this out). I think it's important to acknowledge that, even if they did get some training data from OpenAI, this is a remarkable achievement.
It is a remarkable achievement. But if “some training data from OpenAI” turns out to essentially be a wholesale distillation of their entire model (along with Llama etc) I do think that somewhat dampens the spirit of it.
We don’t know that of course. OpenAI claim to have some evidence and I guess we’ll just have to wait and see how this plays out.
There’s also a substantial difference between training of the entire internet and one that very specifically targets your competitor's products (or any specific work directly).
That's $5M for the final training run. Which is an improvement to be sure, but it doesn't include the other training runs -- prototypes, failed runs and so forth.
It is OpenAI that discredits themselves when they say that each new model is the result of hundreds of millions of USD in training. They throw this around as if it were a big advantage of their models.
Is that really true? If anything, OpenAI was dependent on the transformers paper from Google, by Ashish Vaswani and others. LeCun has been criticizing LLM architectures for a long time and has been wrong about them for a long time.
Personally, I have not seen anything from him that is meaningful. OpenAI and Anthropic (itself started by former OpenAI people) of course have built their models without LeCun’s contributions. And for a few years now, LeCun has been giving the same talk anywhere he makes appearances, saying that large language models are a dead end and that other approaches like his JEPA architecture are the future. Meanwhile current LLM architecture has continued to evolve and become very useful. As for the misuse of the term “open source”, I think that really began once he was at Meta, and is a way to use his fame to market Llama and help Meta not look irrelevant.
By the way, as someone who once did classical image recognition using convolutions, I can't say I was very impressed by the CNN approach, especially since their implementation didn't even use FFTs for efficiency.
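For context on the FFT remark: by the convolution theorem, convolution in the spatial domain is a pointwise product in the frequency domain, which is why FFT-based implementations win for large kernels. A minimal 1-D sketch in Python/NumPy (purely illustrative, not anyone's actual implementation):

```python
import numpy as np

def fft_convolve(signal, kernel):
    """Convolve via the convolution theorem: multiply spectra instead of
    sliding the kernel. Cost drops from O(n*k) to O(n log n)."""
    n = len(signal) + len(kernel) - 1       # full convolution length
    size = 1 << (n - 1).bit_length()        # next power of two for the FFT
    spectrum = np.fft.rfft(signal, size) * np.fft.rfft(kernel, size)
    return np.fft.irfft(spectrum, size)[:n]

signal = np.random.rand(1024)
kernel = np.random.rand(64)
direct = np.convolve(signal, kernel)        # O(n*k) sliding window
fast = fft_convolve(signal, kernel)         # O(n log n) via FFT
assert np.allclose(direct, fast)
```

The two results agree to floating-point precision; the FFT path pulls ahead as the kernel grows.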
We wouldn't be here discussing this if nobody had invented the internet... nor if these models had no training data at all.
> Separately, I do think that now that the Chinese leadership saw this, that they have the chops to pull this off and then some, they are probably going to rein in future innovations; they'll likely demand that the big future discoveries remain closed-sourced (or even unannounced/unpublicized).
How do we know that this is not already happening with OpenAI/Meta and the U.S. government at some level? Power works the same way everywhere, whether we want it or not. We don't have to pretend to be "better" all the time.
> they'll likely demand that the big future discoveries remain closed-sourced
Depends on whether they want these tools to be adopted in the wider world. Rightly or wrongly there is a lot of suspicion in the West and an open source approach builds trust.
> While all of this is true, that DeepSeek wouldn't be here were it not for the research that preceded it (notably Llama), and ChatGPT which they're modeled after...
If the allegation is true (we don't know yet), then what you've written perfectly proves the point everyone is making. ChatGPT wouldn't be here if it weren't for all the research and work that preceded it in terms of tons of scrapable content being available on the Internet, and it's not like OpenAI invented transformers either.
Nobody is accusing DeepSeek of hacking into OpenAI's systems and stealing their content. OpenAI is just saying they scraped them in an "unauthorized" manner. The hypocrisy is laughably striking, but sadly nobody has any shame anymore in this world it seems. Play me the world's tiniest violin for OpenAI.
"That's hilarious!" was my first reaction as well, when I heard about it the first time. When I came to HN and saw this story on top I was hoping this was the top comment. I was not disappointed.
US AI folk were leading for two years by just throwing more and more compute at the same thing that Google threw them like a bone years ago (namely transformers). They made next to no innovation in any area other than how to connect more compute together. The idea of additional inference-time compute, looping the network back on its own outputs, which is the only significant conceptual advancement of recent years, was something I, as a layman, came up with after a few days of thinking about why AI sucks and what could be done to make it tackle problems that require iterative reasoning. They announced it a few weeks after I came up with the idea, so it was in the works for some time, but it shows you how basic an idea it was. There was nothing else.
Then, when a small company comes along and introduces a few actual algorithmic advancements resulting in a 100x optimization, which is exactly what you expect from algorithmic optimizations, big AI suddenly goes into full "dog ate my homework" mode, blaming everyone and everything around.
Let's not even mention the fact that if the full outputs of their models could enable training a better model at 1% of the cost, it puts them in an even worse light that they didn't do it themselves.
It’s not often that you get a 100x optimization from a few small improvements, so I’m kind of skeptical.
We have an apples-and-oranges thing here which DeepSeek is intentionally leaning into. They get very cheap electricity and are bragging about their low cost, while OpenAI etc. typically brag about how expensive their training is. But it’s all PR and lies.
> They get very cheap electricity and are bragging about their cheap cost
The cost of $5.5 million was quoted at $2/GPU-hour, which is a reasonable price for on-demand H100s that anyone in the US could access, and likely on the high side given bulk pricing and that they are using nerfed versions. OpenAI might be all PR and lies, but everything I've seen so far says that DeepSeek's claims about cost are legit.
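As a back-of-the-envelope check using only the figures above (the 2048-GPU cluster size below is an assumed round number for illustration, not something stated in this thread):

```python
# Sanity-check the quoted training cost: $5.5M at $2/GPU-hour.
cost_usd = 5_500_000
rate_per_gpu_hour = 2.0

gpu_hours = cost_usd / rate_per_gpu_hour    # 2,750,000 GPU-hours
# On an assumed 2048-GPU cluster, that works out to roughly:
cluster_size = 2048
days = gpu_hours / (cluster_size * 24)
print(f"{gpu_hours:,.0f} GPU-hours ~= {days:.0f} days on {cluster_size} GPUs")
```

That is on the order of two months of cluster time, which is at least in the plausible range for a frontier-scale training run.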
Hypocrisy or not, the US government has managed to make this work for a long time now, the Biden administration just proves the point. Thankfully, other countries are starting to catch up to this scam.
Yes, to be fair, as a foreigner (not a US citizen, basically) I don't mean to offend anybody. But the USA just seems to be built on top of hypocrisy.
Like the fact that the US revolution was basically kickstarted by blatantly breaking patent law (there was this one mill, specifically); I think it's a historic event. And now here we are! The scam of national security.
To be honest, people seem to be really keen on the fall of the USA. I am not that interested, since the rise of China terrifies me. But the hypocrisy of the USA, the loss of so much soft power (like here I am, from a random country, critiquing the USA based on facts; it really downplays it being a superpower), that would be the downfall of the USA.
The future terrifies me. In fact, the present terrifies me. I think the world is running crazy, or maybe it's just me.
> Like the fact that US revolution was basically kickstarted by blatantly breaking the patent law...
Hollywood also started by using non-regulation / non-licensed movie equipment when nobody was looking.
So, the USA has had this "move fast, break things, and monopolize the new thing so hard that no one can get near" mentality since forever, and it moves in cycles.
It's now AI's turn, but it turns out they democratized the world so hard that everybody can act fast now.
In nature, nobody can stay at the top forever. People should understand this.
Any ML based service with an API is basically a dataset builder for more ML. This has been known forever and is actually a useful "law" of ML-based systems.
Aye, this should be obvious even to non-technical folks. Much has been written about how LLMs regurgitate the data they were trained on. So if you're looking for data to train on, you can certainly extract it there.
Plus, of course, for people within the tech bubble, there are plenty of research results on the value of synthetically augmented and expanded training data, which put the impact well past just regurgitating source data.
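The "dataset builder" point above can be sketched in a few lines. `query_model` below is a hypothetical stand-in for any hosted LLM API client; this is an illustration of the idea, not a real scraper:

```python
import json

def query_model(prompt: str) -> str:
    # Placeholder for a real API call (e.g. an HTTP request to a hosted
    # chat endpoint). Hardcoded here so the sketch is self-contained.
    return f"[model answer to: {prompt}]"

def build_distillation_set(prompts, path="distill.jsonl"):
    """Record prompt/response pairs from a hosted model as JSONL,
    the typical input format for supervised fine-tuning."""
    with open(path, "w") as f:
        for prompt in prompts:
            pair = {"prompt": prompt, "completion": query_model(prompt)}
            f.write(json.dumps(pair) + "\n")

build_distillation_set(["What is backpropagation?", "Explain attention."])
```

Every public inference endpoint is, structurally, exactly this loop with a bigger prompt list.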
Most of all, this whole episode is a failure of reporting: of setting expectations about what comes next, of projecting running costs, and so on.
They really lost their minds. They're all scared and worried because companies in other countries can also access the same data they stole from the Internet.
I liked Matt Levine’s newsletter a few days ago, where he hypothesized scenarios in which it’s much more profitable to short your competitors, then release a much better version of some widget completely free, and then profit $$$.
Which is plausible here too, considering DeepSeek is made by a hedge fund.
I share the sentiment here, but asking as a noob: does this mean the performance comparison is not really apples to apples? If it required the distillation of the expensive model in order to get such good results for a much lower price, is that shady accounting?
Exactly this, especially as journalism melts down into slag. Soon all anybody will have to train on is social media, Wikipedia and GitHub, and that last one will slowly be metastasized by AI-generated code anyway.
It reminds me of 1984 in a sense. "Don’t you see that the whole aim of Newspeak is to narrow the range of thought? In the end we shall make thoughtcrime literally impossible, because there will be no words in which to express it."
Unlike 1984 I don't see this winnowing of new concepts as purposeful, but on the other hand I keep asking myself how we can be so stupid as to keep doing it.
I agree, (2) seems much less problematic since the AI outputs are not copyrightable and since OpenAI gives up ownership of the outputs. [1]
So, if you really really care about ToS, then just never enter into a contract with OpenAI. Company A uses OpenAI to generate data and posts it on the open Internet. Company B scrapes open Internet, including the data from Company A [2].
[1]: Ownership of content. As between you and OpenAI, and to the extent permitted by applicable law, you (a) retain your ownership rights in Input and (b) own the Output. We hereby assign to you all our right, title, and interest, if any, in and to Output.
[2]: This is not hypothetical. When ChatGPT got first released, several big AI labs accidentally and not so accidentally trained on the contents of the ShareGPT website (site that was made for sharing ChatGPT outputs). ;)
But arguably these actions share enough characteristics that it’s reasonable to place them in the same category. Something like: “products that exist largely/solely because of the work of other people”. The nonconsensual nature of this and the lack of compensation is what people understandably take issue with.
There is enough similarity that it evokes specific feelings about OpenAI when they suddenly find themselves on the other side of the situation.
Number 2 is already possible with open models. You can do distillation using Llama, which could likely be doing #1 to build their models (I'm not sure it's the case though)
Not that poster, but I think both are equally fine.
It's funny if OpenAI were to complain about this, but at least on Twitter I don't see that much whining about it from OpenAI employees. Sam publicly praised DeepSeek.
I do see some of them spreading the "they're hiding GPUs they got through sanction evasion" theory, which is disappointing, though.
You’re right. The second one is far more ethical. Especially when stealing from a thief.
Doesn’t Sam Altman keep parroting they’re developing AI “for the good of humanity”? Well then, someone taking their model and improving on it, making it open-source, having it consume less, and having a cheaper API, should make him delighted. Unless he *gasp* was full of shit the whole time. Who could have guessed?
You say "public", but what I think you mean is "publicly available". Even publicly available data has copyrights, and unless that copyright is "public domain", you need to follow some rules. Even licenses like Creative Commons, which would be the most permissive, come with caveats which OpenAI doesn't follow [0].
It is unclear if someone breaking someone else's copyright to use A can claim copyright on a work B, derived from A. My point is that OpenAI played loose with the copyright rules to build its various models, so the legality of their claims against DeepSeek might not be so strong.
I am not saying OpenAI did good by using publicly available data. I mean these are separate activities. Neither is good. But DeepSeek is slightly better for making theirs open source.
So far the whole business model of Silicon Valley since social media has been to monetize other peoples' content given out for free. The whole empire is built on this.
I wonder if this is going to come to an end through a combination of social media fatigue, social media fragmentation, and open source LLMs just giving it all back to us for free. LLMs are analogous to a "JPEG for ideas" so they're just lossy compression blobs of human thought expressed through language.
> So far the whole business model of Silicon Valley since social media has been to monetize other peoples' content given out for free. The whole empire is built on this.
They scraped literally all the content of the internet without permissions. And I won't even be surprised if they scraped the output of other LLMs as well.
The schadenfreude is very real right now. I have difficulty putting to words my level of antipathy towards Altman, and I hope to watch gleefully as this all blows up in his smug face.
Well, anyone who will flex their spine in every (im)possible position as required of them, just to get even more money and power.
I could understand that from someone with an empty stomach. But so many people doing it when their pockets are already overflowing is exactly the kind of rot that degrades an entire society.
We're all just seeing the results so much more clearly now that they can't even be bothered to pretend they were ever more than this.
Later edit: The way this submission fell ~400 spots within just two hours despite having 1250 points and 550 comments, and had its comments flagged and shuffled around to different submissions as soon as they touched too close to YC&Co, is a good mirror of how today's society works.
It's an addiction. There's no amount of money that will be enough, there's no amount of power that will be enough. They'll burn the world for another hit, and we know that because we've been watching them do it for 50 years now.
I've read a lot about Aaron's time at Reddit / Not A Bug. I somewhat think his fame exceeds his actual accomplishments at times. He was perceived to be very hostile to his peers and subordinates.
Kind of a cliche, but aspire to be the best version yourself every day. Learn from the successes and failures of others, but don't aspire to be anyone else because eventually you'll be very disappointed.
Yeah, definitely not a statement on Aaron himself. More a statement on idolizing people. There will always be instances where they didn't live up to what people think of them as. I think Aaron was fine and a normal human being.
Aaron was not happy. Neither is Trump, or Musk. I don’t know if Bernie is happy, or AOC. Obama seems happy. Hillary doesn’t. Harris seems happy.
Striving for good isn’t gonna be fun all the time, but when choosing role models I like to factor in how happy they seem. I’d like to spend some time happy.
Try to imagine a society where people only did things that were rewarded.
Could such a society even exist?
Thought experiment: make a list of all the jobs, professions, and vocations that are not rewarded in the sense you mean,
and imagine they don't exist.
What would be left?
I don't need to imagine. Teachers almost everywhere around the globe have poor salaries. In my country, the university enrolment requirements to become a school teacher are lower than for almost every other field of study, which means the dumbest students end up there.
And then later they go to schools to teach our future, working under high stress for a low salary.
Same with medical school in many countries where healthcare is not privatized. Insane hours, huge responsibilities and poor pay for doctors and nurses in many countries.
Nowadays everyone wants to be an influencer or software developer.
Teachers, sure. But what about janitors & garbage collectors, paramedics, farm laborers, artists, librarians, musicians, case managers, religious/spiritual leaders?
Because only one person can be king, but everybody can participate and contribute. Also there's too many things out side of just being "the best" that decide who gets to be king. Often that person is a terrible leader.
Upvoted not because I agree, but because I think it's a valid question that shouldn't be greyed out. My kid's dream job is YouTube influencer. I don't like it, but can I blame them? It's money for nothing and the chicks for free.
Tragedy of current times: no one wants to be a firefighter, astronaut, or doctor. Influencers everywhere! Can you blame the kids? Do you know any firefighters who earn a million dollars annually?
AaronSw exfiltrated data without authorization. You can argue the morality of that, but I think you could make the argument for OpenAI as well. I'm not opining on either, just pointing out the marked similarity here.
edit: It appears I'm wrong. Will someone correct me on what he did?
This is an argument, but isn't this where your scenario diverges completely? OpenAI's "means to an end" is further than you state; not initial advancement but the control and profit from AI.
Yes, they intended for control and profit, but it's looking like they can't keep it under control and ultimately its advancements will be available more broadly.
So, the argument goes that despite its intention, OpenAI has been one of the largest drivers of innovation in an emerging technology.
At that same link is an account of the unlawful activity. He was not authorized to access a restricted area, set up a sieve on the network, and collect the contents of JSTOR for outside distribution.
He wasn't authorised to access the wiring closet. There are many troubling things about the case, but it's fairly clear Aaron knew he was doing something he wasn't authorised to do.
> He wasn't authorised to access the wiring closet.
For which MIT can certainly have a) locked the door and b) trespassed him, but that's a very different issue than having authorization to access JSTOR.
I don’t think your links are evidence of a flip flop.
The first link is from mid-2016. The second link is from January 2025.
It is entirely reasonable for someone to genuinely change his or her views of a person over the course of 8.5 years. That is a substantial length of time in a person’s life.
To me a “flip-flop” is when one changes views on something in a very short amount of time.
This is quite honestly one of the major problems with our society right now. Once you take a public stance, you are not allowed to revisit and re-evaluate. I think that this is by and large driving most of the polarization in the country, since "my view is right and I will not give an inch lest I be seen as weak".
Most of the things affected are highly political situations, e.g. Trump's ideas or Biden's fitness. But we also seem to have thrown out things that we used to consider cornerstones of liberal democracy, i.e. our ideas regarding free speech and censorship, where we claim censorship isn't happening because it is a private company doing it.
In 2016: Sam alluded to Trump's rise as not dissimilar to Hitler's. He said that Trump's ideas on how to fix things are so far off the mark that they are dangerous. He even quoted the famous: "The only thing necessary for the triumph of evil is for good men to do nothing."
In 2025: "I'm not going to agree with him on everything, but I think he will be incredible for the country"
This is quite obviously someone who is pandering for their own benefit.
IMO it probably is and Altman probably still (rightly) hates Trump. He's playing politics because he needs to. I don't really blame him for it, though his tweet certainly did make me wince.
That's the thing though right, that we all created this mess together. Like yeah, why don't you (and the rest of us) blame him?. We're all pretty warped and it's going to take collective rehab.
Super pretentious to quote MLK, but the man had stuff to say so here it is (on Inaction):
"He who passively accepts evil is as much involved in it as he who helps to perpetrate it"
"The ultimate tragedy is not the oppression and cruelty by the bad people but the silence over that by the good people"
It seems he was virtue signaling before. So it would be more accurate to blame him for having let himself become an ego driven person in the past. Or to put it nicely and to add the context of Brian Armstrong of Coinbase, who has also been showing public support for Trump, a mission-driven person.
Yes, the first mistake was a business leader in tech taking a public political position. It was popular and accepted (if not expected) in the valley in 2016.
Doing that then (and banking the social and reputational proceeds) created the problem of dissonance now. If he'd just stayed neutral in public in 2016, he could do what he's doing now and we could assume he's just being a pragmatic business person lobbying the government to further his company's interests.
I think “progressive” is probably the safest position to take. It also works if you want to get involved in a different sort of politics later on. David Sacks had no problem doing that when he was no longer interested in being CEO of a large company.
The evidence indicates not taking a position is the optimal position.
I have a lot of respect for CEOs who just focus on being a good CEO. It's a hard enough job as is. I don't care about or want to know some CEO's personal position on politics, religion or sports teams. It's all a distraction from the job at hand. Same goes for actors, athletes and singers. They aren't qualified to have an opinion any more relevant than anyone else's, except on acting, athletics, singing - or CEO-ing.
Sadly, my perspective is in the minority. Which is why I think so many public figures keep making this mistake. The media, pundits and social sphere need them to keep making this mistake.
I guess I think they should study what a neutral position looks like, and avoid going beyond it as best as they can. I had in mind a "progressive" who avoids any hot button issues. Someone with a high profile will be asked about politics from time to time. I think Brian Chesky is a good example of acting like a progressive in a way that stays low profile, but maybe he doesn't really act like one. https://www.businessinsider.com/brian-chesky-airbnb-new-bree...
Also it helps to have sincere political views. GitHub's CEO at the time of #DropICE was too cynical and his image suffered because of it.
There are no neutral positions in today's political landscape. I'm not stating my opinion here; this is according to most political positions on the spectrum. You suggested "Progressive" (but without hot-button issues) as a way of signaling a neutral position. That may be true in parts of the valley tech sphere, but it certainly doesn't hold in the rest of the U.S. "Progressive" is usually defined as being to the left of "Liberal", so it's hardly neutral. Over half of U.S. voters cast their ballot for the Republican candidate. Almost all those people interpret anyone identifying themselves as "Liberal" as definitely partisan (and negative, of course). Most of them see "Progressive" as even worse, slipping dangerously toward "Socialist". And the same holds true for the term "Conservative" on the other side of the spectrum, of course.
No, identifying as "Progressive" wouldn't distance you from political connotations and culture warring, it's leaping into the maelstrom yelling "Yipee-Ki-Yay!" You may want to update your priors regarding how the broad populace perceives political labels. With voters divided almost exactly in half regarding politics and cultural war issues and a large percentage on both sides having "Strong" or "Very Strong" feelings, stating any position will be seen as strongly negative by tens of millions of people. If you're a CEO (or actor, athlete, singer, etc) who relies on appealing to a broad audience, when it comes to publicly discussing politics (or religion), the downsides can be large and long-lasting but the upsides are small and fleeting. As was said in the movie "WarGames", the only winning move is not playing.
I especially like how he quoted Napoleon or something, framing himself as the heart of the revolution and DeepSeek as a child of the revolution, only to get a response from some random guy: "It's not that deep bro. Just release a better model."
I worked on something back then that had to interface with payment networks. All the payment networks had software for Windows to accomplish this that you could run under NT, while under Linux you had to implement your own connector -- which usually involved interacting with hideous old COBOL systems and/or XML and other abominations. In many cases you had to use dialup lines to talk to the banks. Again, software was available for Windows NT but not Linux.
Our solution was to run stuff to talk to banks on NT systems and everything else on Linux. Yes, those NT machines had banks of modems.
In the late 90s, using NT for something that talks to banks was not necessarily a terrible idea, seen through the lens of the time. Linux was also far less mature back then, and we did not have today's embarrassment of riches when it comes to Linux management, clustering, and orchestration software.
If you're a tech leader and confuse Linux boxes for mainframes then I don't think it's hindsight that makes you look foolish. It's that you do not, in fact, understand what you're talking about or how to talk about it - which is your job as a tech leader.
Yeah Elon has gotten annoying (my god has he been insufferable lately) but his companies have done genuine good for the human race. It's really hard for me to think of any of the other recently made billionaires who have gotten rich off of something other than addicting devices, time-wasting social media and financial schemes.
"Donald Trump represents an unprecedented threat to America, and voting for Hillary is the best way to defend our country against it"
- Sam Altman - 2016
"If you elect a reality TV star as President, you can't be surprised when you get a reality TV show"
- Sam Altman - 2017
"When the future of the republic is at risk, the duty to the country and our values transcends the duty to your particular company and your stock price."
- Sam Altman - 2017
"I think I started that a little bit earlier than other people, but at this point I am in really good company"
- Sam Altman - 2017 ( On his criticism of Trump )
"Very few people realize just how much @reidhoffman did and spent to stop Trump from getting re-elected -- it seems reasonably likely to me that Trump would still be in office without his efforts. Thank you, Reid!"
As a society we might talk about virtue, but the reason we put it as a goal in stories is that in the real world, we don't reward it. It's not just that corruption wins sometimes, but we directly punish those that fight it. The mood of the times, if anything, comes from people realizing that what we called moral behavior leads to worse outcomes for the virtuous.
A community only espouses good values when it punishes bad behavior. How do we do this when those misbehaving are very rich, and attempting to punish the misbehavior has negative consequences on you? There just aren't many available tools that don't require significant sacrifices.
That is particularly gross, but that really feels like the norm among all the tech elite these days - Zuckerberg, Bezos, etc. all doing the most laughable flip flops.
The reason the flip flops are so laughable to me is because they attempt to couch them in some noble, moralistic viewpoint, instead of the obvious reason "We own big companies, the government has extreme power to make or break these companies, and everyone knows kissing up to Trump is what is required to be on his good side."
I think Tim Sweeney's (CEO of Epic Games) comment was spot on:
> After years of pretending to be Democrats, Big Tech leaders are now pretending to be Republicans, in hopes of currying favor with the new administration. Beware of the scummy monopoly campaign to vilify competition law as they rip off consumers and crush competitors.
This is exactly what OpenAI is trying to do with these allegations.
Those men and their companies are responsible for hundreds of thousands of jobs and a significant portion of the global economy. I'm actually thankful that they aren't shooting their mouths off to the new boss like spoiled children at their first job. It wouldn't make the world better, it would make their companies and the lives of those who depend on them, worse.
There is a fine line between cowardice and common sense.
In what sense is the federal government "the boss" of private sector businesses? This isn't an oligarchy yet, right? They don't have to behave obsequiously, they are choosing to. They're doing it for themselves, not for their shareholders or their employees. It's an attempt to grab power and become oligarchs because they see in this government a gullible mark.
The richest man in the world has a government office down the street from the white house, which the taxpayers are funding. He's rumored to sleep there.
Puhleeeese. I'm not advocating that these leaders all lead protest marches against the new administration. But the transparent obsequiousness and Trump ball gargling under the guise of some moralistic principles is so nauseating. And please spare me the idea that the likes of Zuckerberg or Bezos gives a rat's ass about their employees.
For a contrast to the Bezos, Zuckerberg and Altman types, look at Tim Cook. Sure, Apple paid the $1 million inauguration "donation", and Cook was at the inauguration, and I'm not arguing he's winning any "Profiles in Courage" awards, but he didn't come out with lots of tweets claiming how massuh Trump is so wise and awesome, Apple didn't do a 180 on their previous policies, etc.
Although I dislike him now glazing Trump, I understand why he's doing it. Trump runs a racket and this is part of the game.
One of my most contrarian positions is I still like and support Altman, despite most of the internet now hating him almost as much as they (justifiably) hate Elon. Was a fan of Sam pre-YC presidency and still am now.
For me, it’s the technical results. Same as for Musk.
Tesla accelerated us forward into the electric car age. SpaceX revolutionized launches.
OpenAI added some real startup oomph to the AI arms race which was dominated by megacorps with entrenched products that they would have disrupted only slowly.
So these guys are doing useful things, however you feel about their other conduct. Personally I find the gross political flip-flops hard to stomach.
Why would you support someone you said was part of a racket in the sentence before? We're talking about real life, where actions have consequences, not a TV show where we're expected to identify with Tony Soprano.
Yeah I don't know, Altman is a sociopath who is now trying to get intertwined with local governments (SF) as well as the federal government. He's going to do a lot of weaseling to get what he wants: laws that forcibly make OpenAI a monopoly.
Society will always have crazy sociopaths destroying things for their own gain, and now is Altman's turn.
I don’t care for Sam Altman and his general untrustworthy behavior. But DeepSeek is perhaps more untrustworthy. Models from American companies at least aren’t surprising us with government driven misinformation, and even though safety can also be censorship, the companies that make these models at least openly talk about their safety programs. DeepSeek is implementing a censorship and propaganda program without admitting it at all, and once they become good at doing it in less obvious ways, it can become very damaging and corrupt the political process of other societies, because users will trust the tools they use are neutral.
I think DeepSeek’s strategy to announce a misleading low cost (just the final training run that optimizes a base model that in turn is possibly based on OpenAI) is also purposeful. After all, High Flyer, the parent company of DeepSeek, is a hedge fund - and I bet they took out big short positions on Nvidia before their recent announcements. The Chinese government, of course, benefits from a misleading number being announced broadly, causing doubt among investors who would otherwise continue to prop up American technology startups. Not to mention the big fall in American markets as a result.
I do think there’s also a big difference between scraping the Internet for training data, which might just be fair use, and training off other LLMs or obtaining their assets in some other way. The latter feels like the kind of copying and industrial espionage that used to get China ridiculed in the 2000s and 2010s. Note that DeepSeek has never detailed their training data, even at a high level. This is true even in their previous papers, where they were very vague about the pre-training process, which feels suspicious.
> Models from American companies at least aren’t surprising us with government driven misinformation, and even though safety can also be censorship
Being a citizen of a western nation, I'm inclined to agree with the general sentiment here, but how can you definitively say this? Neither you nor I know with any certainty what influence the US government has had on domestic LLMs, or what lies they have fabricated and cultivated that are now part of those LLMs' collective knowledge. We can see DeepSeek's censorship more clearly, but that isn't evidence that we're in any safer territory.
> There are loads of examples on the internet of LLMs pushing (foreign) government narratives e.g. on Israel-Palestine
There isn’t even a single example of that. If an LLM takes a certain position because it has learned from articles on that topic, that’s different from it being deliberately manipulated to answer differently on that topic. You’re confusing an LLM simply reflecting the complexity out there in the world on some topics (which shows up in the training data) with the government-forced censorship and propaganda in DeepSeek.
Fine, whatever. It's actually much more concerning if the overall information landscape has been so curated by censors that a naively-trained LLM comes "pre-censored", as you are asserting. This issue is so "complex" when it comes to one side, and "morally clear" when it comes to the other. Classic doublespeak.
That's far more dystopian than a post-hoc "guardrailed" model (that you can run locally without guardrails).
> I don’t care for Sam Altman and his general untrustworthy behavior. But DeepSeek is perhaps more untrustworthy. Models from American companies at least aren’t surprising us with government driven misinformation, and even though safety can also be censorship, the companies that make these models at least openly talk about their safety programs. DeepSeek is implementing a censorship and propaganda program without admitting it at all, and once they become good at doing it in less obvious ways, it can become very damaging and corrupt the political process of other societies, because users will trust the tools they use are neutral.
These arguments always remind me of the arguments against Huawei because they _might_ be spying on western countries. On the other hand we had the US government working hand in hand with US corporations in proven spying operations against western allies for political and economic gain. So why should we choose an American supplier over a Chinese one?
> I think DeepSeek’s strategy to announce a misleading low cost (just the final training run that optimizes a base model that in turn is possibly based on OpenAI) is also purposeful. After all, High Flyer, the parent company of DeepSeek, is a hedge fund - and I bet they took out big short positions on Nvidia before their recent announcements. The Chinese government, of course, benefits from a misleading number being announced broadly, causing doubt among investors who would otherwise continue to prop up American technology startups. Not to mention the big fall in American markets as a result.
Why should I care about the stock value of US corporations?
> I do think there’s also a big difference between scraping the Internet for training data, which might just be fair use, and training off other LLMs or obtaining their assets in some other way.
So if training on copyrighted work scraped off the Internet is fair use, how would training on the outputs of LLMs not be fair use as well? You can't have it both ways.
> Models from American companies at least aren’t surprising us with government driven misinformation
Is corporate misinformation so much better? Recall about Tiananmen Square might be more honest, but if LLMs had been available over the past 50 years, I would expect many popular models would have cheerfully told us company towns are a great place to live, cigarettes are healthy, industrial pollution has no impact on your health, and anthropogenic climate change isn't real.
Especially after the recent behaviour of Meta, Twitter, and Amazon in open support of Trump and Republican interests, I'll be shocked if we don't start seeing that reflected in their LLMs over the next few years.
Yes, the irony is so thick in the air that it could be cut with a Swiss Army knife lol
I had literally come to this post to say the same. You beat me to it.
The USA is going crazy over DeepSeek, and to me it just shows that this is an AI bubble and DeepSeek is its black swan.
I am not saying AI has no use. I regularly use it to create things, but it's just not something I'd recommend. I am going to stop using AI, to grow my own mind.
And it's definitely way overpriced. People are investing so much money without seeing the returns, and I think people are also using AI out of a sense of FOMO. I don't know, to me it's funny.
I really, really want an index fund with strictly no AI companies, since the current ones don't feel diversified enough. Sure, Nvidia gave a huge return last year, but at this point it almost feels the same as bitcoin. The reason I don't and won't invest in bitcoin is that I don't want "that" kind of risk.
This has been a mind-boggling year.
I have realized that the world is crazy. Truly. Trump going from getting shot to winning, DeepSeek causing Nvidia and the American stock market to go down (heck, even bitcoin!), Trump launching his meme coin. It's so crazy.
If the world is crazy, just be the sane person around; you will stick around. That's my philosophy. I won't jump on the AI bandwagon. But it's still absolutely wild and horrifying to see how a "side project" (DeepSeek) put the American stock market in shambles.
I want more diversification. I am not satisfied with the current system. This feels like a bubble and I want no part in it.
Copyright is weird, and often legal ≠ moral, but I'm having a hard time constructing a mental model where it's OK to scrape a novel written by a person but not OK to scrape a story written by ChatGPT.
ClosedAI scraped human content without asking and explained why this was acceptable... but when the outputs of their model are scraped, it is THEIR dataset and this is NOT acceptable!
Oh, the irony! :D
I shared a few screenshots of DeepSeek answering using ChatGPT's output in yesterday's article!
https://semking.com/deepseek-china-ai-model-breakthrough-sec...