Using AI to manipulate film score audio (dialogue removal, stem separation, etc)


Jay

I remember the wind and guitar models were very fucky, so that method was mostly good for getting brass and string bits to patch up other tracks where the vocal remover messed them up.

1 minute ago, Jay said:

Wow, this is like our hobby on expert level +++

I was very happy that there were only 2 full unreleased cues to extract and a few others to clean up! :lol:


I tried using Becruily on Bilbo and Gandalf singing and it sounds decent at low volume, but on headphones you really hear the weird stretching artifact left behind :(

 

 

 

 


32 minutes ago, Jay said:

I tried using Becruily on Bilbo and Gandalf singing and it sounds decent at low volume, but on headphones you really hear the weird stretching artifact left behind :(

 

 

 

 

The Bilbo one is in the DVD menus.

 

 


36 minutes ago, Jay said:

Wow I definitely see why the CRs included Gandalf singing, that definitely wasn't meant to be listened to without it 😂

 

I prefer the other one without Bilbo though


3 hours ago, Jay said:

I tried using Becruily on Bilbo and Gandalf singing and it sounds decent at low volume, but on headphones you really hear the weird stretching artifact left behind :(

 

 

 

 

This was what I got with BS Roformer when I tried it in September :

 

 

Green Dragon instrumental using Becruily:

 

 


4 hours ago, Jay said:

The music I'm uploading is mono to begin with, so I don't know what it would do to stereo files, unfortunately


Earth Star Voyager perhaps?


18 hours ago, Doo_liss said:

This was what I got with BS Roformer when I tried it in September :

 

The Becruily version I made definitely sounds better than your BS Roformer version, but neither is good enough to fool anybody

 

 

18 hours ago, Doo_liss said:

Green Dragon instrumental using Becruily:

 

Woah!  That's so cool!!!


12 hours ago, Jay said:

Haha, yup

 

I'm not familiar with it, only through your enthusiasm for the score.  Post up some samples if you can when you finish.  I'd love to hear it.


Yeah I was thinking I'd try to throw together some kind of suite on youtube or something to try to get it more exposure.

 

Since it's a Disney-owned score, only Intrada can expand it, and I believe Disney has only ever let them work on theatrical films, and not TV shows or TV movies, so my hopes are low, sadly.


I just completed a series of sequences from TITANIC that I'd done previously, but this time with the MVSep AI and the StemRoller program, which has completely changed how successful these experiments are. Before, I was never able to really get at the existing edited music within the sound mix of a movie and remove it with any real precision.

I have them on my Google Drive, but if I upload them to YouTube, they'll get flagged for copyright by Disney, which will block them from being viewable everywhere in the world except North America, because Paramount owns the rights in the US and Canada.

I'll upload them to Vimeo when I can, but they limit free accounts to two videos per month and I have to wait until January 12 to put anything up. Ugh. 



I hate how frustrating it is getting this all to work (the extensive editing is an unending tedious nightmare, but the results are always worth it) but at the same time there isn't much else I'd rather do. 


I experimented with taking the output of one model and putting it into another and it works great. De-crowd, then vocal remover.
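The chaining idea above (run one model, then feed its output into the next) can be sketched in a few lines of Python. This is only an illustration: `run_chain`, `toy_decrowd` and `toy_vocal_remover` are hypothetical names, and the two toy stages are crude stand-ins, not real separation models. The point is just that the order of the stages changes the result.

```python
from typing import Callable, List, Sequence

Samples = List[float]
Stage = Callable[[Samples], Samples]


def run_chain(audio: Samples, stages: Sequence[Stage]) -> Samples:
    """Feed the output of each stage into the next one, in order."""
    for stage in stages:
        audio = stage(audio)
    return audio


def toy_decrowd(audio: Samples) -> Samples:
    # Hypothetical placeholder: a crude gate that zeroes quiet samples.
    # A real de-crowd model would go here.
    return [s if abs(s) >= 0.1 else 0.0 for s in audio]


def toy_vocal_remover(audio: Samples) -> Samples:
    # Hypothetical placeholder: halves every sample.
    # A real vocal remover would go here.
    return [s * 0.5 for s in audio]


source = [0.05, 0.5, -0.15]

# De-crowd first, then vocal remover (the order described in the post).
decrowd_first = run_chain(source, [toy_decrowd, toy_vocal_remover])

# Same stages in the opposite order give a different result.
vocals_first = run_chain(source, [toy_vocal_remover, toy_decrowd])
```

With real models, each stage would read and write audio files rather than Python lists, but the structure stays the same: try both orders and keep whichever leaves fewer artifacts.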

 

My main application recently has been one of those 90s TV shows where the live sound was tossed due to copyright, and custom music and custom applause were added in post. What that meant was that the end result was more clinical because you didn't have live acoustics muddying everything.

 

Results ranged from basically getting an isolated score for some performances, to some noticeable impact where the applause had been, but nothing worse than like when a cassette tape is a bit worn - vastly better than intrusive sounds and completely listenable. I was ecstatic when I got these results, even though they're far from perfect.

 

And for one or two cases, I get rid of the applause, then run through a vocal remover, and the worst I'm left with is the occasional foley-like sound, and those can sort of work with the music.

 

This one is missing a loud vocal at 0:11, and applause for the entire rest of the track. Considering how flat some library tracks sound to start with, and that I thought I'd never hear this music, this is pretty good.

 

 

This starts with heavy applause, which reduces as the track goes. More affected, but I think still listenable considering the source.

 


My need has always been to create instrumental versions of songs from films and musicals, and the best one I've found so far for the job is Mel-RoFormer by unwa v1e. It's only available via a test link for subscribers (I'm on the 'Ultimate' package) from the Ultimate Vocal Remover Online website and is well worth the money I can tell you! It produces the cleanest result possible when used in conjunction with the Mel-RoFormer correct phase/de-noise plugin and here is an example from my YouTube channel:-

 

 

Incidentally, I have been testing out various models on the MVSep website, but certainly where the removal of SFX from film soundtracks is concerned, none of the models there do an adequate job of it unfortunately and I've tried all sorts of things on all the available models as well. 


3 hours ago, Giftheck said:

6M5 Duel of Yoda And Sidious, the cleanest version so far.

 

 

Nearly clean on both fronts - orchestral for Duel of the Fates, and choir from the version in Episode III - some minor dips where the AI took out SFX from the chorus, but this is the closest we've gotten so far to having ROTS 6M5 as it was recorded rather than just straight-up reusing the TPM recording in its entirety.

One of the TCW Season 3 trailers uses the Episode III version with a bit less sfx than the film, you might be able to combine the two for even better results 


2 hours ago, enderdrag64 said:

One of the TCW Season 3 trailers uses the Episode III version with a bit less sfx than the film, you might be able to combine the two for even better results 

 

I've only found a really low-quality, watery version of that, unfortunately. Which is a pity because I think if I had it in high quality I could get the one part majorly affected by SFX dropout in my edit fixed.


40 minutes ago, Giftheck said:

 

I've only found a really low-quality, watery version of that, unfortunately. Which is a pity because I think if I had it in high quality I could get the one part majorly affected by SFX dropout in my edit fixed.

Yeah the youtube upload is abysmal quality

 

There's a higher quality version on starwars.com: https://www.starwars.com/video/the-clone-wars-season-3-trailer


Just now, enderdrag64 said:

I don't remember seeing it on my S3 Blu-ray

Did they ever do any of those episode selection releases like they did for SE1? That might contain it.


19 hours ago, enderdrag64 said:

Yeah the youtube upload is abysmal quality

 

There's a higher quality version on starwars.com: https://www.starwars.com/video/the-clone-wars-season-3-trailer

 

Weirdly enough, neither MVSep nor LALAL.AI could pick up the chorus in this trailer's audio.


My favourite result so far - this piece is heard a few times on TV, but 100% of the time under applause.

 

 

Removing applause also effectively removes the percussion, so I isolated the percussion (extremely clean) from another episode that has the same instrumentation (but a different melody) and combined the two. Plus a pass through a vocal remover to remove a bird sound. The synth part of the full quality original has a slightly wobbly 90s sound.
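Combining two stems like this boils down to adding them sample by sample. Below is a minimal sketch, assuming both tracks are mono, already time-aligned, and held as lists of floats in the range -1.0 to 1.0. `mix_tracks` and its gain parameters are hypothetical names for illustration; in practice this step would normally be done in an audio editor.

```python
from typing import List


def mix_tracks(a: List[float], b: List[float],
               gain_a: float = 1.0, gain_b: float = 1.0) -> List[float]:
    """Sum two mono tracks sample by sample.

    The shorter track is padded with silence, and the sum is clipped
    to the range [-1.0, 1.0] so it stays a valid float signal.
    """
    length = max(len(a), len(b))
    mixed = []
    for i in range(length):
        sample_a = a[i] if i < len(a) else 0.0
        sample_b = b[i] if i < len(b) else 0.0
        total = gain_a * sample_a + gain_b * sample_b
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed


# Toy data standing in for the de-crowded music and the isolated percussion.
music = [0.5, 0.5]
percussion = [0.25]
combined = mix_tracks(music, percussion)
```

Lowering `gain_b` is the equivalent of pulling the percussion fader down if the borrowed stem sits too loud against the cleaned-up music.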

 

I'm loving the crowd removal - it may degrade the music a bit (much more in some cases) but you don't get any applause left behind, and that's the part that makes TV rips generally no good. If you can live with music that adopts a slightly worn cassette tape feeling in places (and I most certainly can), it's a game changer.


Spoiler

January news

1) We have changed the way of selecting models in the menu. Now, instead of a dropdown menu, there is a list with the ability to display information about the models and statistics. If you wish, you can roll back to the old version of the list.

2) By popular demand, we have added the HQ5 instrumental model to the site for the MDX-B algorithm (vocals, instrumental).

3) We have published weights obtained on the MUSDB18 dataset for the top models BSRoformer, MelBandRoformer and SCNet XL. These weights can be an excellent starting point for training your own models.

4) We added three models from unwa and 2 models from becruily, which are based on the Mel-Band RoFormer architecture. All models are focused on increasing the fullness metric either for vocals or for instrumental. They give a fuller sound but may contain more noise. The new models are available under the names:

  • unwa Instrumental v1 (SDR vocals: 10.24, SDR instrum: 16.54)
  • unwa Instrumental v1e (SDR vocals: 10.05, SDR instrum: 16.36)
  • unwa big beta v5e (SDR vocals: 10.59, SDR instrum: 16.89)
  • becruily instrum high fullness (SDR instrum: 16.47)
  • becruily vocals high fullness (SDR vocals: 10.55)

The models are located in the "MelBand Roformer (vocals, instrumental)" section. Detailed metrics are available in the table below:

| Model | Vocals fullness | Vocals bleedless | Vocals SDR | Vocals L1Freq | Instrum fullness | Instrum bleedless | Instrum SDR | Instrum L1Freq |
|---|---|---|---|---|---|---|---|---|
| MelBand Roformer (Kimberley Jensen) | 16.66 | 36.51 | 11.01 | 38.96 | 27.71 | 46.72 | 17.32 | 39.77 |
| MelBand Roformer (ver. 2024.08) | 16.39 | 39.13 | 11.18 | 39.26 | 27.74 | 47.07 | 17.49 | 40.16 |
| Bas Curtiz edition | 16.30 | 38.94 | 11.18 | 39.18 | 27.49 | 47.00 | 17.49 | 40.15 |
| MelBand Roformer (ver. 2024.10) | 16.92 | 37.78 | 11.28 | 39.41 | 27.71 | 47.29 | 17.59 | 40.29 |
| unwa Instrumental v1 (SDR vocals: 10.24, SDR instrum: 16.54) | 15.89 | 27.48 | 10.24 | 36.06 | 35.44 | 38.02 | 16.55 | 38.67 |
| unwa Instrumental v1e (SDR vocals: 10.05, SDR instrum: 16.36) | 14.67 | 26.83 | 10.06 | 34.37 | 38.85 | 35.68 | 16.37 | 38.31 |
| unwa big beta v5e (SDR vocals: 10.59, SDR instrum: 16.89) | 20.78 | 32.02 | 10.59 | 38.53 | 25.65 | 45.90 | 16.90 | 37.31 |
| becruily instrum high fullness (SDR instrum: 16.47) | 15.76 | 30.15 | 10.16 | 35.84 | 33.93 | 40.55 | 16.47 | 38.86 |
| becruily vocals high fullness (SDR vocals: 10.55) | 20.72 | 31.25 | 10.55 | 38.84 | 28.28 | 40.85 | 16.86 | 38.24 |

5) We have added 2 models from lew for the Super Resolution task. The first, "Universal Super Resolution (by Lew)", restores high frequencies for music. The second, more specialized "Vocals Super Resolution (by Lew)" restores the quality and high frequencies of vocals. They are available for selection in the menu under the item "Apollo Enhancers (by JusperLee and Lew)".

6) We have added a set of models for separating vocals into Male/Female. There are 2 models from Sucial and aufr33. There are also two models trained by the MVSep team based on SCNet XL and MelBand RoFormer. All models are available in "MVSep Male/Female separation".

Male/Female validation dataset:

| Algorithm name | SDR Male | SDR Female | L1_Freq Male | L1_Freq Female |
|---|---|---|---|---|
| BSRoformer by Sucial (SDR: 6.52) | 6.82 | 6.23 | 40.99 | 40.62 |
| BSRoformer by aufr33 (SDR: 8.18) | 8.47 | 7.89 | 46.65 | 44.73 |
| SCNet XL (SDR: 11.83) | 12.08 | 11.58 | 50.50 | 51.51 |
| MelRoformer (2025.01) (SDR: 13.03) | 13.39 | 12.68 | 57.61 | 56.76 |

7) We have added a new SCNet XL model for bass with a very high SDR: 13.81. In the ensemble, the SDR metric reached 14.07, which is a record. The model is available under the item MVSep Bass (bass, other)

8) We have added the second version of Sucial's model for removing reverb to the Reverb Removal (noreverb) section. Model name: Reverb removal by Sucial v2 (MelRoformer).

9) We have prepared a new model for vocals based on the SCNet XL architecture, it has achieved quite high metrics.

| Algorithm name | Multisong dataset: SDR Vocals | Multisong dataset: SDR Instrumental | Synth dataset: SDR Vocals | Synth dataset: SDR Instrumental | MDX23 Leaderboard: SDR Vocals |
|---|---|---|---|---|---|
| SCNet | 10.25 | 16.56 | 12.27 | 11.97 | --- |
| SCNet Large | 10.74 | 17.05 | 12.89 | 12.59 | --- |
| SCNet XL | 10.96 | 17.27 | 13.08 | 12.78 | --- |

Adding SCNet XL to Mel and BS roformers in the ensemble increased the SDR metric:
vocals: 11.54 -> 11.61
instrumental: 17.84 -> 17.92

10) We have added a new model for the organ. It is available in the list under the name: MVSep Organ (organ, other).

11) We have updated our API, adding more functionality related to the task queue, rating, and the use of different types of separation, as well as added a Quality Checker to the API. More information is available in the documentation: https://mvsep.com/full_api

12) We are testing an Android application, which will soon appear on Google Play. We will announce this separately.

13) In the near future, we plan to publish examples of using the MVSep API in Python, both simple console programs and those with a graphical interface.
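For anyone wondering what the SDR numbers in the update mean: SDR (signal-to-distortion ratio) is measured in dB, and higher is better. A common basic definition compares the energy of the true stem with the energy of the error between the true stem and the model's output. The sketch below uses that basic definition; whether MVSep computes its figures with exactly this variant is an assumption here, and `sdr_db` is a hypothetical name.

```python
import math
from typing import Sequence


def sdr_db(reference: Sequence[float], estimate: Sequence[float],
           eps: float = 1e-12) -> float:
    """Basic signal-to-distortion ratio in dB.

    reference: the true isolated stem.
    estimate:  the model's attempt at that stem (same length).
    eps:       small constant to avoid dividing by zero or log of zero.
    """
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy example: the estimate is the reference scaled to 90%,
# so the error is 10% of the signal in amplitude, or 1% in energy.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
score = sdr_db(reference, estimate)
```

Because the scale is logarithmic, the ensemble gains quoted in item 9 (11.54 to 11.61 for vocals, 17.84 to 17.92 for instrumental) are differences of 0.07 dB and 0.08 dB, which are small steps even though they set a new high for the site.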

MVSep January update:

 

Testing the MDX-B instrumental model mentioned in the update:

 

 

 

Eowyn and Pippin's songs from TTT and ROTK instrumental with Becruily:

 

 

 

 

 


On 22/01/2025 at 6:22 AM, Giftheck said:

Weirdly enough, neither MVSep nor LALAL.AI could pick up the chorus in this trailer's audio.

 

I got it mostly working with Becruily instrumental, but it sounds like crap.

 

Choir.flac


  • 4 weeks later...

Tried assembling the film version of What's Up Danger (Into the Spider-Verse) with MVSep.

 

 

First I used MDX23c to separate the vocals and instrumental of the song as it is on the album, then I ripped the version in the film both with channel 3 and without it. Then I ran the models Demucs4HT, DnR v3 and MDX23 on both of the rips to hear the results, which led to 8 files to sort. For 01:35, the model was DnR v3, but it wasn't clean enough for 02:27 onwards. Since I was already using the album track, I had to clean the percussion too and use just Pemberton's overlay.

 

Using Demucs4 on the results from DnR v3 and MDX23, I got a clean but thin overlay, so I put them together alongside the version with the percussion to fill it out. It doesn't sound that great, but it's something.

 

It ended up looking like this:

image.png

 


  • 2 months later...

The saxophone model I've been trying for a while often mistook any brass for a saxophone. Or some brass. When it felt like it. Worth a shot, it was useful here and there when using the ground up method.


  • 2 weeks later...

It's a shame you can't feed it the cues from the soundtrack, so that when you put the film audio in it knows what the music is and the separation is more accurate.


On 06/05/2025 at 9:49 AM, Giftheck said:

The second is Return of the Jedi's Victory Celebration (film version) and End Credits, properly segued together with the End Credits (which now also contains the correct intro). I used the newer unwa Instrumental v1e plus model for this one.

 

I'm aware it lacks the 'Stormtrooper steel drum' the film has, but I'm not convinced that wasn't just something Ben Burtt threw in on his end.

 

 

That solo flute is priceless compared to the Ewok voices on the album.


Just now, Cameron007 said:

 

That solo flute is priceless compared to the Ewok voices on the album.

 

I am so happy the new MVSEP model was able to remove those Ewok voices. LALAL.AI had previously been my go-to but even its most recent model tended to drag out the flute with the voices.
