The hard truth is that LLMs will fully replace Wikipedia.
Let's put aside Wikipedia being rotten with bureaucracy and obsession-driven bias, similar to Stack Overflow's preexisting flaws before LLMs steamrolled it.
Fact is, Wikipedia is a human-driven summarization engine for secondary sources, hopefully in a way that echoes the sources' consensus.
This is exactly what LLMs are best at: summarizing huge amounts of text. And training can easily focus on high-quality books and thus exceed Wikipedia in quality.
It's enough to read an AI summary whose first line actually addresses the subject at hand, compared to Wikipedia, where the first line is the product of some petty argument over a political disagreement.
And wait until you see what happens to their quality as soon as the AI companies have to generate profit.
I doubt you can collect secondary-source references at the same price Wikipedia does.
People aren't as eager to help a billion-dollar company make a profit for free.
LLMs lack the human nuance that a good Wikipedia article requires. Weighing quality sources and digesting them in the most useful way that a human would want and expect — that is very difficult for both humans and machines, and it is why Wikipedia as a whole is such a treasure: because a community of editors takes the time to tweak the articles and aim for perfection.
There are guidelines across all Wikipedia articles that make for a good reader experience. We can't even get the world's greatest LLMs to follow a set of rules within a single short conversation.
In my opinion, simply training on a dataset of high-quality books and highly rated academic journals is enough to surpass current Wikipedia quality.
In my experience using LLMs as a replacement for Wikipedia (learning about history), they are often of higher quality on niche topics and far less biased in politically contentious areas.
For me, Wikipedia is only good for introductions and exploration. You don't have time to read a dense tome, but also don't have enough experience reading research papers in that area? Wikipedia it is, then.
Wikipedia is the tabloid equivalent for scientific topics.
LLMs tend to be much more useful for niche topics, because they've most likely been trained directly on the sources themselves.
> and training can easily focus on high quality books and thus exceed wikipedia in quality.
Then it will never exceed Wikipedia. There are no high-quality books behind many of Wikipedia's articles, and even if there were, who decides which books qualify as high quality?
And declining traffic is a problem not only for Wikipedia but for other websites too.
So AI is killing the source it feeds on.
This is the golden age of AI.
The next age will be filled with fewer good sources, more AI-generated content, and sites that actively try to poison LLMs.
As others have said, Wikipedia is a tertiary source. A primary source is the original material, e.g. research papers. Secondary sources analyze or summarize, but also connect the primary sources into a bigger work; this is how comprehensive textbooks that cover a wide breadth of topics are written. Wikipedia is for those who are too lazy for even the secondary sources — i.e. blog-post-sized articles about a particular topic. That's what a tertiary source is, and nobody really needs tertiary sources except for convenience and time savings.
Even if all this is true, LLMs would still need to create a repository of such aggregated "summarizations of secondary sources".
LLMs cannot find and read thousands of secondary sources in real time — especially not when some of them have already disappeared or were never digitized.
I can see a future where LLM labs a) donate to Wikipedia and b) contribute to it with agents that suggest edits and review facts.
I think LLM companies will need to donate and contribute to more direct sources — newspapers, journalists, bloggers, coders posting on Stack Overflow, etc. — and not to Wikipedia (which is more of a tertiary source).