Hacker News

I think perhaps what's more upsetting is that GPT-3 flips the traditional notions of what machines are good at and what humans are good at on their respective heads.

GPT-3 seems to indicate there's a chance that "creative" domains such as poetry, literature, music, etc. will be taken over by AI (i.e. AIs will have superhuman performance) before "logical" domains such as logic, mathematics, and the sciences.

This means that it is becoming more and more conceivable to more and more people that sometime in the foreseeable future an AI will be better than any human along any dimension you choose to measure, even when it comes to the ability to elicit emotions and reactions in other humans.



I think you hit the nail on the head, with the salient point here being that in the near future "creative" things will be automated first (see Image GPT, Jukebox, etc.; Google has 100 billion dollars in cash, countless TPUs, the best engineers, infra, etc., and could probably replicate each of these OpenAI projects with far better results within a few years). One of the things that got me into ML research was the notion that we could automate a lot of the hard work humans do every day (agriculture, cooking, desk jobs, etc.) so that humans could do things that were uniquely theirs and interesting, that were human, that were beautiful... Unfortunately it turns out that classical music and waxing poetic are easily generative in an enjoyable way. In the most ironic fashion possible, it turns out that the very thing we do when we conduct ML research, what you call the "logical domain", is one of the only things that stays human-only in the foreseeable future.

GPT-3 and other projects seem to drive hype cycles in the tech community and convince people like Elon Musk that the AGI revolution is near. But I think recent progress is just another example of machine learning models being able to generalize on super large datasets, even if it's the biggest model so far. It's not clear to me that larger models will solve this in the limit; take the way GPT3 fails on addition past a certain number, and the fundamental inability for transformers to learn certain algorithms. It is certainly still possible for this type of large-dataset, large-model style of ML to make human life better in many ways, like Tesla is trying to do with self-driving cars, or Covariant with automating Amazon-like jobs. But I think when it comes to tackling the hard problems of true intelligence, we're missing a dimension somewhere.


Disclaimer: I'm a composer

> Unfortunately it turns out that classical music and waxing poetic are easily generative in an enjoyable way

On the contrary, I would say that generating convincing and original classical music is an incredibly hard (if not impossible) task. All the current music AI projects give results which may sound “good” to a casual listener, but they sound horribly wrong to any educated listener. The reason is that AI can only imitate the surface, but completely fails to recognize/synthesize larger structures. This might be OK for some background noodling in a TV drama, but not for the concert stage.

Finally, we rarely perceive art works in isolation. We know and appreciate the fact that a certain work has been created by a certain person in a certain time.


The reality is likely neither here nor there, i.e. computing may have more to offer to the creative endeavor than creators would like to admit, but still leave an obvious gap which technologists might be loath to admit.

It may be instructive to look at David Cope's [1] work (what he calls "recombinant music" [2]). Cope has been writing algorithms to compose in the styles of the masters (Mozart/Chopin/et al.) for about three decades now, well before the recent surge in "AI". His techniques are much less sexy to "deep learning" enthusiasts, and yet he managed to outrage an audience of connoisseurs who had assembled to listen to a "lost Chopin piece", only to be told, after they had applauded, that it was composed by a computer taught to mimic Chopin's style (the composition was performed by a musician). The response, in my opinion, also points to music as a socially constructed experience and not purely attributable to the sound signal itself, i.e. if I give you a romantic background story for a lost composition by a master, you may be inclined to experience the piece in a more favorable light than if I told you it was generated by an algorithm (or the converse).

You're absolutely right that the musical output of the current crop of "AI" projects (especially the ones using deep learning / neural networks) is crappy to even a modestly trained listener .. or even a lay untrained listener for that matter. However, more involved modeling (such as Cope's) produced some very compelling results decades ago, so it would be a mistake to assume that the current crop won't get close enough [3]. The fact that DL systems don't need to be instructed in the way Cope had to encode his musical understanding should also be considered, both in evaluating them and in scoping their capabilities going forward.

[1]: https://en.wikipedia.org/wiki/David_Cope

[2]: https://www.recombinantinc.com

[3]: https://deepmind.com/blog/article/wavenet-generative-model-r... (see the "Making Music" section and examples there)


I am also a computer musician, btw, so I am well aware of the creative potentials of algorithmic composition. ;-)

However, we have to make a clear distinction between creative and recreative methods. David Cope's work is impressive, but it focuses on the recreation of existing musical styles. This is interesting from a musicological perspective, but not very interesting artistically.

I would certainly say that deep learning generates lots of interesting “material” (like many other methods of algorithmic composition), but we still need a human being to curate, edit and assemble the material into a meaningful piece of art.

Finally, I think the current AI debate can be very fruitful for the arts. In a way, it raises similar questions to those raised by the concept of the “readymade” and the pop art movement in the 20th century.

Btw, I'm currently working on an opera which uses AI generated lyrics :-)


Humans also need other humans to curate their work. We are comparing AI not only to the best composers alive, but also to the best composers ever. Nobody remembers millions of failed musicians.

BTW - I'm curious, what do you think about bird songs? Are they interesting artistically? How do you think they were composed?


Oh, you're opening up a huge topic there. Actually, there have been philosophers who claimed that the beauty/sublimity of nature was ultimately superior to the sensations produced by the arts. You can find this reasoning in Kant's "Kritik der Urteilskraft" (Critique of Judgment), for example.

On the other hand, you have composers like John Cage (or more recently: Peter Ablinger) who claim that the act of listening itself can be/create art, blurring the borders between nature and art. There are conceptual pieces which only consist of listening instructions.

Finally, bird "songs" have been used as source material for musical composition for centuries. You can find it in Beethoven, Mahler, Debussy, Stravinsky, etc. Olivier Messiaen was even an amateur ornithologist; he faithfully transcribed hundreds of bird songs and used them in his music (see for example his piano cycle "Catalogue d’oiseaux").

As for the question of who composed the actual bird songs, the answer probably depends on the theological background of the person you ask ;-)


I'm willing to go a little further with recombination, given that a good part of a traditional musician's education consists of studying and re-performing "standards", be they jazz, Western classical or Indian classical (which is my background). A simple example is how pretty much every heroic-sounding film score smells of Also Sprach Zarathustra to me. I do think that musicians stand as much on the shoulders of giants as scientists do .. but sometimes don't quite acknowledge that explicitly in their works.

I think this topic will keep reverting to the point you raise: "meaningful art". As long as the "meaning" we're looking for is a construct in a human brain, we have little to say about AI and its capabilities (consider Joshua Bell's hardly noticed performance of Bach in a Washington, D.C. metro station, as opposed to when he performs in a concert hall).

.. (edit) and I do think that active listening is itself a creative act.


> All the current music AI projects give results which may sound “good” to a casual listener, but they sound horribly wrong to any educated listener

I think you're right, in that AI won't be able to create deeper themes and patterns, but I disagree with the above point: AI will take over the music industry because the vast, vast majority of people aren't educated listeners. The popularity of 6ix9ine is a fantastic example.

To put it another way, I don't need another Terry Riley, Clint Mansell, or Meredith Monk; I just need something good enough to occupy some brainspace while I drive home after work. A movie soundtrack just needs something sad, or exciting, or tension-building. The AI can and will get there soon enough.


Even if it takes over the industry (I can actually imagine this happening), my original point still holds: the educated/experienced listener will notice and will care. For some people at least, music, or art in general, will always be an existential form of human expression, not some random exchangeable consumer product.


> All the current music AI projects give results which may sound “good” to a casual listener, but they sound horribly wrong to any educated listener. The reason is that AI can only imitate the surface, but completely fails to recognize/synthesize larger structures.

Lack of "larger structures" is the key here. That's where GPT-1 was. Each sentence, in isolation, seemed to make sense, but after a few lines, it was clear the text wasn't going anywhere. By GPT-2, paragraphs seemed semi-reasonable, but multiple paragraphs didn't hold together. GPT-3 is able to keep it together for a few paragraphs, but probably not for a book chapter.

Music synthesis has the same scaling issue. Generators which imitate known patterns work for a few bars, but after a while you realize the music is going nowhere. The GPT results on text indicate that a scaleup may fix that problem.


This is the same argument people made against MP3 compression.

Lossy is bad. Humans will never stand for it.

Perfectionists will not stand for it. Pragmatists won’t notice.

This isn’t a bad thing. We need perfectionists to drag us across the “good enough” line, despite our childish kicking and screaming.


Absolutely terrible comparison, completely not relevant.


Could you give some AI-generated examples that people like but professionals would not like?

Is originality the key point? I ask because AI-generated music has a high probability of containing pieces of rhythm from its training dataset.


This is going to sound very dismissive and condescending: "meh."

Generative music has been around for half a century, or longer depending on how you want to interpret things. Mimicry as a mechanism for composition has been around for as long as humans have made music.

It is wholly uninteresting to discover that we can design generative systems for music that excel at mimicry, because we've already perfected that mechanism in analog. The interesting bit is that the genesis of new musical ideas is driven by manual interaction and direction of the generative system, and at that point it's the guiding hand of the engineer turned artist that we can respect and appreciate, not the mimicry of a machine.


That's like saying we've had abacuses since forever, therefore these computers will never be revolutionary. Quantitative change by orders of magnitude is qualitative change.

Imagine a world where you ask your smartphone to make you a death metal song about fishing and feminism in Australia, using Freddie Mercury's voice and jazz harmonies, and it does that on the fly and generates something objectively good.

Wouldn't that be revolutionary for music? Because it's entirely possible in the next decade. Probable even.


To be honest, that doesn't sound _that_ revolutionary for music. Because I'm pretty sure if you went digging, you could already find somewhere on Spotify a pretty decent death metal song with a vocalist who sounds like Freddie Mercury and jazz harmonies (I will concede, the specified subject matter is unlikely). Would you go looking for that, though? Probably not, because musical tastes and interests aren't about wanting a very specific set of attributes in a song. They're about tribalism, cults of personality, senses of belonging, nostalgia etc. The world is not short of good music, or variation in styles of good music, and what causes songs to be popular is not the objective quality of the music.

To put it another way: if an AI could generate new Beatles music on the fly, making it sound exactly like the Beatles, with the same creativity of lyrics, tight harmonies and beautiful melodies, would Beatles fans go out in their millions to buy it? No. Consider the dusty demos from the '60s found in an attic somewhere that became valuable when it was discovered that they were Beatles demos. The music didn't change; it didn't get better or worse. The personal story attached to it was what mattered.


My point isn't that any particular generated song will be revolutionary. My point is that you can get any song you can describe. There will be billions of good quality songs made because billions of people will be able to produce a song just by describing it.

I expect new genres to be created almost immediately. And I'm not sure how real musicians can compete with that level of noise out there.


This only works if there is a vast enough body of existing music in the desired sound and themes. It's fine if a casual listener is a fan of something like house, pop, or electro. It's more difficult if your taste is more obscure: a specific artist's style, or a specific juxtaposition produced on a one-off album. In that case there is quite literally not enough data to train on to produce more.


Even when there's not enough data to train on, it might still be possible to generate something in a desired rare style - provided this style is a mixture of several more common styles. Modern generative models are pretty good at interpolating.


That sounds more like a meme than something which would revolutionize music. It would be a funny gag, but what really determines whether it's good music is... whether it's good music. If my phone's idea of "generate a death metal song" is to parrot what every other death metal song sounds like, it will be boring and not enjoyable to listen to.


The border between "parroting" and "generating something good" may be very hard to discern at some point.


> Generative music has been around for half a century,

You're referring to results from 50 years ago; have you tried listening to state-of-the-art generative music systems lately? They can probably compose music better than 99% of humans.


But we mostly listen to music written by humans who are better at writing music than 99.9999% of humans.


Yes. And there was a time when we literally used paintings to assess progress in a mine. Cars didn't outperform horse carriages for at least 10, arguably 30 years after their invention.

This "human music" > "ai music" will flip. Suddenly. And it will never flip back.


> This "human music" > "ai music" will flip. Suddenly. And it will never flip back.

Already starting to happen with ai lyrics I use for inspiration in creating EDM music ( i.e. https://TheseLyricsDoNotExist.com/ )


This shares the same foundation as the argument that ebooks will kill physical sales and Soylent will change how people see food, namely that we're all purely motivated by boiling every need we have down to its most fundamental version.

It never seems to play out that way at population scale.


Have you been moved by any of that music though? Am I missing something?


Listen to some samples:

- https://openai.com/blog/jukebox/ (2020, quite good, but no classical music)

- https://openai.com/blog/musenet/ (2019 so not as good as the 2020 one, but showcases classical music)

There is no reason to assume that one cannot be moved by AI-generated music, as the AI has learnt from human-generated music and tries to mimic its styles.


While it's technically impressive and has a decent surface-level resemblance, none of the samples had any sense of direction or substance.

I can see this kind of tech taking over stuff like stock music that's automatically added to consumer holiday videos or played on the phone while you wait for a customer service agent.

That said, I'd expect the agent to be an AI long before generated music becomes independently musically relevant.


Yeah, it's very moving to see a human-made machine do such wonders. Fills me with awe, appreciation, and hope.


That's exactly it, though. This stuff is interesting because of the novelty of AI. The works themselves are not independently relevant (not yet, at least).


Elsewhere someone replied that art is interesting in large part because of the personal story. How is this different?


99% of the time, I don't listen to music for the personal story of the artists involved. In fact, a lot of the music I listen to is made by artists that I know very little about.


Yes - the older music is better, because it was an exploration of nondeterminism in art, and not automated replication.

Doing what has already been done is rarely compelling.


Is there anything you'd recommend for SOTA music gen?



GPT-3 can write working React components. But we can't expect it to scale up to complete useful programs soon.

GPT-3 can write hauntingly beautiful snippets of prose. Can we expect it to scale up to coherent novels?

It's easier to see the limitations in the areas you know best. It's significant that it's this good at creative tasks, but I'm not convinced that creative tasks are the most at risk.


> It's not clear to me that larger models will solve this in the limit; take the way GPT3 fails on addition past a certain number, and the fundamental inability for transformers to learn certain algorithms.

GPT-3 was OpenAI's exercise in how far pure scaling can get you. They used a method that was already about two years old. Even when they started training GPT-3, there were readily available remedies for many of GPT-3's issues. Given how they energized the wider community, I'm sure even more focus will be given to improving language models in the following years.

Some rough ideas right now:

- People think that cherry-picking the best GPT-3 examples is cheating - why? Train a model that will select the best examples for you. My proposal is to train a model that guesses whether some text was GPT-3 generated or human made, then select the samples that look the most human-like.

- Use a good search method to look for the best samples. Monte Carlo Tree Search? AlphaZero? MuZero? If MuZero can play Chess, Shogi, Go and the Atari games, then why should it not be able to play a game of what word will come next?

- Hook up the language model to a search engine. Instead of writing a whole program yourself, why not copy-paste some stuff from StackOverflow with some slight modifications?

Etc.

It doesn't address the issues with agency, grounding and multi-modality, but it's a good road map for the next 2-3 years.
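The first bullet (sample several candidates, let a human-vs-generated classifier pick the most human-looking one) amounts to best-of-N reranking. Below is a minimal Python sketch under stated assumptions: `generate` and `human_score` are hypothetical callables standing in for a language model and a trained discriminator, and the toy versions exist only so the code runs. It is not the AdvDecoder code linked later in the thread, which may work differently.

```python
import random
from typing import Callable, List


def best_of_n(
    generate: Callable[[str, random.Random], str],
    human_score: Callable[[str], float],
    prompt: str,
    n: int = 8,
    seed: int = 0,
) -> str:
    """Sample n continuations and return the one the discriminator rates most human-like."""
    rng = random.Random(seed)
    candidates: List[str] = [generate(prompt, rng) for _ in range(n)]
    # max() returns the first of any tied candidates, so results are reproducible per seed.
    return max(candidates, key=human_score)


# Hypothetical toy stand-ins so the sketch runs end to end; these are not real models.
VOCAB = ["the", "cat", "sat", "on", "a", "mat"]


def toy_generate(prompt: str, rng: random.Random) -> str:
    """Pretend language model: appends five random words to the prompt."""
    return prompt + " " + " ".join(rng.choice(VOCAB) for _ in range(5))


def toy_human_score(text: str) -> float:
    """Pretend discriminator: immediate word repetition counts as machine-like."""
    words = text.split()
    return -float(sum(a == b for a, b in zip(words, words[1:])))


if __name__ == "__main__":
    print(best_of_n(toy_generate, toy_human_score, "Once upon a time"))
```

Reranking only chooses among samples the generator already produced; unlike a GAN, no error signal flows back into the generator's weights.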


> train a model that guesses whether some text was GPT-3 generated or human made - select samples that look the most human like.

What you said is essentially: "Train a better GPT model". Humans have trouble distinguishing between (some of) GPT-3 and human writing. The only way to build a classifier that can do this is to build a model that is better than GPT-3 at understanding text. It would need to have features currently absent in GPT-3, such as common sense and understanding the world (e.g. causality, physics, psychology, history, etc). If what you say could be done, GPT-3 would have been designed as a GAN.


It's a lot easier to notice logical mistakes in already written text, than it is to avoid making them in the first place. When you write text do you write it in one pass or do you read yourself and fix mistakes, reformulate sentences etc.? I have reformulated this piece of text at least once in order to make my argument clear.

That's the difference between GPT and BERT. GPT can only attend to past tokens, while BERT can also attend to future tokens.
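The difference can be pictured as an attention mask, where row i marks which positions token i is allowed to look at. This is a simplified sketch: real implementations also handle padding, and BERT is trained with masked-token prediction rather than next-token prediction.

```python
from typing import List

Mask = List[List[bool]]


def causal_mask(n: int) -> Mask:
    """GPT-style (decoder) mask: position i may attend only to positions 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> Mask:
    """BERT-style (encoder) mask: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def render(mask: Mask) -> str:
    """Draw a mask with 'x' for allowed attention and '.' for blocked attention."""
    return "\n".join("".join("x" if allowed else "." for allowed in row) for row in mask)


print(render(causal_mask(4)))
# x...
# xx..
# xxx.
# xxxx
```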

Now imagine that what you are going to say is not actually determined by you, but is sampled randomly from what seems like a reasonable thing to say. This is how GPT-3 works. If somebody asks you a yes/no question, you might estimate 70% yes and 30% no, then roll a 10-sided die to pick one; but once you pick, there is no way back.
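The die analogy maps directly onto code: faces 1 to 7 mean "yes" and 8 to 10 mean "no". The sketch below also shows plain autoregressive sampling, where each drawn token is appended and never revisited. `next_probs` is a hypothetical stand-in for a model's predicted next-token distribution; real decoders add tricks such as temperature or top-k, which are omitted here.

```python
import random
from typing import Callable, Dict, List


def d10_answer(face: int) -> str:
    """Map one roll of a 10-sided die (1..10) onto a 70% 'yes' / 30% 'no' distribution."""
    if not 1 <= face <= 10:
        raise ValueError("a d10 shows 1..10")
    return "yes" if face <= 7 else "no"


def sample_sequence(
    next_probs: Callable[[List[str]], Dict[str, float]],
    steps: int,
    rng: random.Random,
) -> List[str]:
    """Autoregressive sampling: each token is drawn once and appended; nothing is ever revised."""
    tokens: List[str] = []
    for _ in range(steps):
        probs = next_probs(tokens)  # distribution over the next token, given the prefix so far
        words = list(probs)
        weights = [probs[w] for w in words]
        tokens.append(rng.choices(words, weights=weights, k=1)[0])
    return tokens


faces = [d10_answer(f) for f in range(1, 11)]
print(faces.count("yes"), faces.count("no"))  # 7 3
```

For example, `sample_sequence(lambda prefix: {"yes": 0.7, "no": 0.3}, 5, random.Random(0))` draws five answers; an early unlucky draw stays in the prefix and conditions everything after it.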

And I already mentioned that it does not address agency, grounding and multi-modality, but it could improve GPT's ability to formulate coherent arguments, follow instructions, write mathematical proofs and computer programs, or play games.

BTW - I actually have implemented it and it works quite reasonably.

Here are samples from GPT-2 small and GPT-2 small + RoBERTa adversarial decoder.

https://github.com/Isinlor/AdvDecoder/tree/master/outputs


> It's a lot easier to notice logical mistakes in already written text, than it is to avoid making them in the first place

For a human who does logical thinking, yes. But for a language model? I'm actually not sure, because it's possible that a sufficiently complex language model like GPT-3 does form some kind of general logical rules encoded in its weights somehow. This would be interesting to explore.

> I actually have implemented it and it works quite reasonably.

Oh, so you are trying to design GPT-2 like a GAN, or at least move in that direction. Interesting. Yes, I don't see why not. What do you think about taking a step further and actually making it a GAN, i.e. propagating the error from the discriminator into the generator? I'm sure you're aware of multiple attempts to do this with smaller models, with mediocre results, but maybe GPT-3 scale is what's needed to make it work?


But in the arts, can AI come up with something truly new?

This should be testable: train AI on all the music ever written before Bach, and see if it ever produces something resembling Bach.

Maybe that kind of test has already been done; it would be interesting to know what comes out of it.


The GPT-2-based MuseNet music generator is already interesting but far from perfect. You can try it in the middle of this article: https://openai.com/blog/musenet/ (you can even upload custom prompts in the advanced mode). It would be interesting to see it with the updated GPT-3.

There is also AIVA with more production ready results:

https://www.youtube.com/watch?v=gzGkC_o9hXI&list=PLv7BOfa4Cx...

Not sure how it works, but its results may be better because it uses more predefined components and less AI, which also makes it less "creative".

More AI music projects here: https://magenta.tensorflow.org/


> This should be testable

There was music resembling Bach written before Bach (e.g. https://www.youtube.com/watch?v=VUcdBz3LIuU). How much more resemblance do you hope for?


Obviously no.

But there’s so much classical music out there, that an average person would never be able to tell the difference between something that is generated anew and something just really obscure.

Have you ever tried copying and pasting sections of GPT output into Google?
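Pasting output into a search engine is an informal test for verbatim copying. Assuming you have a reference corpus of your own (a model's actual training data is usually not available), a rough local analogue is to find the longest run of words that the generated text shares with any document. This sketch catches only exact copies, not paraphrases; the same idea could be applied to sequences of notes instead of words.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(words: List[str], n: int) -> Set[Tuple[str, ...]]:
    """All contiguous n-word sequences in `words` (empty if the text is shorter than n)."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def longest_shared_ngram(generated: str, corpus: Iterable[str], max_n: int = 50) -> int:
    """Length in words of the longest sequence in `generated` that also appears verbatim in the corpus."""
    gen = generated.split()
    docs = [doc.split() for doc in corpus]
    best = 0
    for n in range(1, min(max_n, len(gen)) + 1):
        gen_grams = ngrams(gen, n)
        if any(gen_grams & ngrams(doc, n) for doc in docs):
            best = n
        else:
            # Every shared (n+1)-gram contains a shared n-gram, so no longer match exists.
            break
    return best


corpus = ["the quick brown fox jumps over the lazy dog"]
print(longest_shared_ngram("a quick brown fox sleeps", corpus))  # 3
```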


A better, or more hopeful, projection is that "creative" things will split into casually consumed content, which is largely automated, and more actively/deeply experienced content, which will be human-made or human-directed. The first already exists in formulaic content generated by humans with little consideration for a cohesive story free of self-contradiction.

I don't know which way things will go. Will newer generations be accustomed to, and accept, lower-fidelity art? Will the uncanny valley be bridged from both sides? Or will attention be drawn to what is 'real' vs 'synthetic'? Good art is pain. Labelling these things distinctly will probably reveal that I consume some 'real' content, and am annoyed by some 'synthetic' content while enjoying just as much of it. This will get challenging as machine-generated content can seem more 'real' than much human-made content: 'real' is/was a subset of human-made, and machine-made is/was a subset of 'synthetic'.

This line of reasoning leads me to believe that premium content will be interactive. This means that the content has to either have a human connection or come closer and closer to passing a Turing test. The current examples of machine-made static content won't cut it.


But does the fact that machines can also create works of music and art make it any less enjoyable for humans to create them? Will we suddenly stop writing or drawing for pleasure?


There is nothing like the feeling of performing music for a crowd. There is also nothing like hitting a chord in a big empty space and listening while the sound slowly fades away.

Related to instruments themselves, trial and error is one very important enjoyable aspect I can think of right now: playing something off beat or out of tune and correcting yourself. The feeling of correction and improvement.

It is a real pity the actual algorithm itself has no way to enjoy what it is creating.


Probably not. Humans are still playing Jeopardy and chess despite losing dominance in those games a long time ago.


Here is the scary bit.

This 1957 novel

https://www.commentarymagazine.com/articles/wallace-markfiel...

points out that low-status jobs are jobs where you can be held accountable for doing something wrong (e.g. a bank teller who gives out two $20 bills instead of one) and high-status jobs are jobs where you can't. (Back in the 1980s, looting a bank as CEO could get you jailed; today the DOJ seems to think a judge and jury couldn't understand how a bank gets looted.)

If current patterns continued, GPT-3 would get the "Brahmin" jobs and real people would get the "Dalit" jobs. GPT-3 can do the job of Bill Lumbergh, probably better than Lumbergh himself, but if it tried to pass as anybody who gets real work done, it wouldn't.


There's a quote attributed to Donald Knuth that goes "Science is what we understand well enough to explain to a computer. Art is everything else we do."

Now if you take the word "explain" broadly and maintain that we've actually found a way to "explain" a huge volume of information to GPT-3 then you might hold that Knuth had got it backwards.

But maybe that's the crux of it. Nothing actually gets explained to GPT-3. You might better say it was force-fed.


How about politics? Load all the political punditry, polling data, blogs, transcripts of Fox News and CNBC and build the perfect Presidential tweet bot, speech writer and campaign adviser.

Of course what you'd end up with is a presidency that only cared about electoral chances, and would have no understanding whatsoever of the actual impact of policies or how to manage issues and crises to achieve actual goals.


Nothing new there, then.


AI systems have been known to be able to elicit emotions and reactions in humans, even very strong such emotions and reactions, since the early days of the field. A classic example is Joseph Weizenbaum's ELIZA, which gives its name to the "Eliza effect", i.e. the tendency to anthropomorphise AI programs [1], even very simple ones, with a small range of pre-scripted behaviours, like ELIZA.

For a longer example involving a robot specifically designed to mimic emotions by manipulating actuators to change its "facial" expressions, see the third part of Rodney Brooks' tripartite essay on "Steps towards super-intelligence", specifically the chapter titled "7. Bond With Humans" [2] (there's no direct link to the chapter, but you can search for it in the article).

I quote from Rodney Brooks' article:

> In the 1990’s my PhD student Cynthia Breazeal used to ask whether we would want the then future robots in our homes to be “an appliance or a friend”. So far they have been appliances. For Cynthia’s PhD thesis (defended in the year 2000) she built a robot, Kismet, an embodied head, that could interact with people. She tested it with lab members who were familiar with robots and with dozens of volunteers who had no previous experience with robots, and certainly not a social robot like Kismet.

> I have put two videos (cameras were much lower resolution back then) from her PhD defense online.

> In the first one Cynthia asked six members of our lab group to variously praise the robot, get its attention, prohibit the robot, and soothe the robot. As you can see, the robot has simple facial expressions, and head motions. Cynthia had mapped out an emotional space for the robot and had it express its emotion state with these parameters controlling how it moved its head, its ears and its eyelids. A largely independent system controlled the direction of its eyes, designed to look like human eyes, with cameras behind each retina–its gaze direction is both emotional and functional in that gaze direction determines what it can see. It also looked for people’s eyes and made eye contact when appropriate, while generally picking up on motions in its field of view, and sometimes attending to those motions, based on a model of how humans seem to do so at the preconscious level. In the video Kismet easily picks up on the somewhat exaggerated prosody in the humans’ voices, and responds appropriately.

> In the second video, a naïve subject, i.e., one who had no previous knowledge of the robot, was asked to “talk to the robot”. He did not know that the robot did not understand English, but instead only detected when he was speaking along with detecting the prosody in his voice (and in fact it was much better tuned to prosody in women’s voices–you may have noticed that all the human participants in the previous video were women). Also he did not know that Kismet only uttered nonsense words made up of English language phonemes but not actual English words. Nevertheless he is able to have a somewhat coherent conversation with the robot. They take turns in speaking (as with all subjects he adjusts his delay to match the timing that Kismet needed so they would not speak over each other), and he successfully shows it his watch, in that it looks right at his watch when he says “I want to show you my watch”. It does this because instinctively he moves his hand to the center of its visual field and makes a motion towards the watch, tapping the face with his index finger. Kismet knows nothing about watches but does know to follow simple motions. Kismet also makes eye contact with him, follows his face, and when it loses his face, the subject re-engages it with a hand motion. And when he gets close to Kismet’s face and Kismet pulls back he says “Am I too close?”.

The article includes links to the videos.

_____________

[1] https://en.wikipedia.org/wiki/ELIZA_effect

[2] https://rodneybrooks.com/forai-steps-toward-super-intelligen...



