Using AI to manipulate film score audio (dialogue removal, stem separation, etc)


Jay

On 08/08/2024 at 8:27 PM, The Great Gonzales said:

 

Which model specifically did you use to make these?


Just now, Jay said:

 

Which model specifically did you use to make these?

"DeNoise by aufr33"

Also, the vocal model "BSRoformer" has been updated with an additional version "2024.08"


 

Elsa's Betrayal 

Just A Gigolo, using the Atmos mix and the stereo mix. There's still some SFX, but the zeppelin engine noise is dialed back.

The Holy Grail, not using the boot at all, only the channels from the Atmos mix.

 

 


  • 2 weeks later...

I've been working on a project and wanted to try out the Crowd Removal model before separating out the guitar to see if it made a difference. I've tried it out for a number of concerts too and it works amazingly well.


  • 3 weeks later...

Anyways here is the "press release":

 


 

1) We have added new piano models. The MVSep Piano model now comes in several variants based on the MDX23C, MelRoformer and SCNet Large neural net architectures. The model produces high-quality separation of music into piano and everything else. See the results in the table below. For comparison, the table also shows metrics for the open model Demucs4HT (6 stems) and the old model "mdx23c (2023.08)". The metric used is SDR; higher is better.

 

Algorithm name | Validation: piano (SDR) | Validation: other (SDR)
Demucs4HT (6 stems) | 2.23 | 14.51
mdx23c (2023.08, SDR: 4.79) | 4.79 | 17.07
mdx23c (2024.09, SDR: 5.59) | 5.59 | 17.89
MelRoformer (viperx, SDR: 5.67) | 5.67 | 17.95
SCNet Large (2024.09, SDR: 5.89) | 5.89 | 18.16
Ensemble (SCNet + Mel, SDR: 6.19) | 6.19 | 18.47
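
For anyone wondering what the SDR numbers in the table mean, here is a minimal sketch of one common way to compute a signal-to-distortion ratio. It assumes the simple definition 10 * log10(signal energy / error energy). The press release doesn't say exactly how MVSep evaluates, so treat the formula, the `sdr` function name, and the toy sine-wave data as illustrative assumptions, not their actual method.

```python
import math

def sdr(reference, estimate, eps=1e-9):
    """Signal-to-distortion ratio in dB (assumed simple definition).

    reference: samples of the true stem; estimate: samples of the separated stem.
    eps is a small constant to avoid dividing by zero on silent input.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_power = sum(r * r for r in reference)
    error_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_power + eps) / (error_power + eps))

# Toy data (hypothetical): a 440 Hz tone as the "true" stem
# and two imperfect estimates of it.
reference = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
close_estimate = [0.95 * s for s in reference]  # small amplitude error
rough_estimate = [0.50 * s for s in reference]  # large amplitude error

print(f"close: {sdr(reference, close_estimate):.2f} dB")
print(f"rough: {sdr(reference, rough_estimate):.2f} dB")
```

The closer estimate scores higher, which is all "higher is better" means here. Since the scale is logarithmic, a gain of a few tenths of a dB between models is a modest but real reduction in error energy.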

 

Listen to: demo, user demos.

 

2) We have updated our guitar models. A model based on the BSRoformer architecture by viperx has been added. The ensemble has also been updated. It is the one used by default. SDR on our test dataset increased from 7.18 to 7.51.

 

Listen to: demo, user demos

 

3) We added a new version of MelBand Roformer for vocals, which showed record results on the Synth dataset. You can select it from the list called "Bas Curtiz edition (SDR vocals: 11.18, SDR instrument: 17.49)" in the "MelBand Roformer (vocals, instrumental)" section.

 

4) We added a new algorithm to the Experimental section: "Apollo MP3 Enhancer (by JusperLee)". This algorithm improves the sound quality of MP3 files compressed with a bitrate of 128 kbps or less. The algorithm is based on the paper "Apollo: Band-sequence Modeling for High-Quality Audio Restoration" and the model is available on huggingface. Below are the spectrograms for the audio compressed to 32 kbps (left) and restored by the new algorithm (right).

[Images: apollo_mix.png (audio compressed to 32 kbps), apollo_restore.png (restored)]

 

Listen to: demo, user demos.

 

5) We added the "Aspiration by Sucial" algorithm. This algorithm extracts whispers from the voice. The algorithm has limited use, but may be useful to someone. The model was published in our open models topic on github and is also available for download on huggingface.

 

Listen to: demo, user demos.

 

 

 


31 minutes ago, The Great Gonzales said:

Anyways here is the "press release":

 


 

Huh the demo is pretty impressive actually. I thought the "restored" sounded like total shit until I listened to the original and I couldn't believe how much worse it was.

 

It definitely doesn't make a crap source listenable but it's much better than I thought it'd be


That's actually not bad at all in comparison to the original file. This could probably be used well for a lot of music that's only ever surfaced in the shittiest of quality.

 

My only concern is some folks will start using it on, say, MP3 session leaks, and claiming that they're "lossless". People are already doing that of course, but I think this'll potentially make the issue much worse.


2 hours ago, Andy said:

Now, can the AI separate the vocals from my brain, 

I think Elon Musk is working on that :lol:

 

1 hour ago, Manakin Skywalker said:

My only concern is some folks will start using it on, say, MP3 session leaks, and claiming that they're "lossless". People are already doing that of course, but I think this'll potentially make the issue much worse.

That's the problem with a tool like this, some will abuse it and make it more difficult to know what's legit. But then again we've got people like you to call it out when it does pop up ;)


3 hours ago, Groovygoth666 said:

Did this for @Holko the other day, used lalal.ai to remove the vocals, but needed vocalremover.org for the very ending (sorry Holko couldn't get the timpani from the film audio) -

 

 

 

 

Was that with Perseus?


8 minutes ago, Groovygoth666 said:

I think so, just checked and it looks like that's the default, which is what I picked and didn't tamper with the settings. 

Yeah, I was just asking, because I just got an email this morning (5:34 AM) announcing its "release"


3 hours ago, The Great Gonzales said:

Yeah, I was just asking, because I just got an email this morning (5:34 AM) announcing its "release"

Ooohhh I did this a couple days ago so not sure then, I'll put it through again and see if it's any different just to confirm.

 


 


Someone at FSM mentioned that neither the ST box set nor the 1701 Series have Shatner’s narration.  Anyone feel up to using the best models to isolate that narration from either the DVDs or Blu Rays? Or suggesting which ones I should consider?


19 minutes ago, Andy said:

Someone at FSM mentioned that neither the ST box set nor the 1701 Series have Shatner’s narration.  Anyone feel up to using the best models to isolate that narration from either the DVDs or Blu Rays? Or suggesting which ones I should consider?

For MVSep, it's still BS Roformer, with the 2024.08 version as the model type.

 

LALAL.AI has a new one called Perseus, which I have not checked yet.


2 minutes ago, ThePenitentMan1 said:

One day we'll be able to extract individual piano keys from a recording, too!

We're already able to separate individual drum kit pieces, so that wouldn't be too surprising!


Sounds good! I did the Omen a bunch of months ago. Is this getting better such that I should run it through again with an updated model?


3 minutes ago, Andy said:

Sounds good! I did the Omen a bunch of months ago. Is this getting better such that I should run it through again with an updated model?

 

Why, did you want to sing the lyrics yourself?


29 minutes ago, Andy said:

Sounds good! I did the Omen a bunch of months ago. Is this getting better such that I should run it through again with an updated model?

It's definitely better than back then. BS RoFormer with 2024.08 would be recommended


These sound incredible! Something I’ve always been interested in is isolating the voice stem in, say, “When You’re Alone” from Hook, or “Christmas, Why Can’t I Find You?” from The Grinch, and then pitch correcting the rough vocal performances, then combining the stems back together. I tried a while ago, but wasn’t very successful…


@The Great Gonzales - I am just dipping my toes into this tech. If I wanted to re-create DME stems for a feature film, I'm assuming I'd use "BanditPlus"? 

Any advantage to using 7.1 stems? Does that produce cleaner results?


24 minutes ago, harryfrishberg said:

@The Great Gonzales - I am just dipping my toes into this tech. If I wanted to re-create DME stems for a feature film, I'm assuming I'd use "BanditPlus"? 

Any advantage to using 7.1 stems? Does that produce cleaner results?

BandIt V2 probably.

 

No idea


Has anyone tried AudioModify? What are your experiences with it? I found that it has some cool sound features, and the voice customization works really well. Definitely worth a try, and you even get royalty-free output, so you can publish it.

 


On 05/10/2024 at 3:52 PM, Trope said:

These sound incredible! Something I’ve always been interested in is isolating the voice stem in, say, “When You’re Alone” from Hook, or “Christmas, Why Can’t I Find You?” from The Grinch, and then pitch correcting the rough vocal performances, then combining the stems back together. I tried a while ago, but wasn’t very successful…


For the latter, I just did an instrumental version of it instead. 


 

This is a literal dream come true. The technology has caught up enough to finally help me achieve this goal: 

Using AI stem separation and extensive editing, filtering and adjustments of the film audio, I was able to *finally* restore the three cues of the "War" sequence, as originally intended, to the full audio mix. 

