In this work we explore the explicit use of stereophonic spatial information in music source separation. A naïve approach would be to assume that the source of interest stays at a fixed location throughout the entire song, but we know that this assumption does not always hold. Instead, we formulate the problem as an informed source separation task, where the source separation system benefits from auxiliary information about the source’s spatial location. The remaining question is then: “where do we get this information from?”
Here, we assume that the user may want to interact with the source separation process: they are eager to get a better separation result, so they listen to the music carefully and give the system some guidance as to which direction to focus on in order to extract sources efficiently. This kind of interactive source separation is not a new idea. For example, users can be asked to scribble on the 2D image representation of the song (i.e., its spectrogram) to extract a source of interest. This time, we focus on spatially informed music source separation, and we name our approach SpaIn-Net.
Check out our ICASSP paper, source code, and project page provided above!