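Uses an encoder-decoder architecture with attention mechanisms to separate vocals, drums, bass, and other instruments from audio mixtures, supporting music post-production and audio analysis.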
license: mit
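Introduction
In the ICASSP 2024 Cadenza Challenge, the demucs model is a source separation technique that uses deep learning to separate clean individual tracks from a mixed audio signal. It is built on a neural network with an encoder-decoder architecture and attention mechanisms, which are intended to improve the audio quality and accuracy of the separation. The model performs well in music post-production, audio analysis, and music information retrieval.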
Usage
import torch
import torchaudio
from typing import Callable
from functools import partial
from dataclasses import dataclass
from modelscope import snapshot_download
from torchaudio.models import hdemucs_high


@dataclass
class SourceSeparationBundle:
    """Dataclass that bundles components for performing source separation.

    Example
        >>> import torchaudio
        >>> from torchaudio.pipelines import CONVTASNET_BASE_LIBRI2MIX
        >>> import torch
        >>>
        >>> # Build the separation model.
        >>> model = CONVTASNET_BASE_LIBRI2MIX.get_model()
        >>> 100%|███████████████████████████████|19.1M/19.1M [00:04<00:00, 4.93MB/s]
        >>>
        >>> # Instantiate the test set of Libri2Mix dataset.
        >>> dataset = torchaudio.datasets.LibriMix("/home/datasets/", subset="test")
        >>>
        >>> # Apply source separation on mixture audio.
        >>> for i, data in enumerate(dataset):
        >>>     sample_rate, mixture, clean_sources = data
        >>>     # Make sure the shape of input suits the model requirement.
        >>>     mixture = mixture.reshape(1, 1, -1)
        >>>     estimated_sources = model(mixture)
        >>>     score = si_snr_pit(estimated_sources, clean_sources)  # for demonstration
        >>>     print(f"Si-SNR score is : {score}.")
        >>>     break
        >>> Si-SNR score is : 16.24.
        >>>
    """

    _model_path: str
    _model_factory_func: Callable[[], torch.nn.Module]
    _sample_rate: int

    @property
    def sample_rate(self) -> int:
        """Sample rate of the audio that the model is trained on.

        :type: int
        """
        return self._sample_rate

    def get_model(self) -> torch.nn.Module:
        """Construct the model and load the pretrained weight."""
        model = self._model_factory_func()
        # The checkpoint is already on disk (fetched by snapshot_download),
        # so load it directly instead of going through torchaudio's asset downloader.
        state_dict = torch.load(self._model_path, map_location="cpu")
        model.load_state_dict(state_dict)
        model.eval()
        return model


model_dir = snapshot_download(
    "monetjoe/hdemucs_high_musdbhq",
    cache_dir="./__pycache__",
)

HDEMUCS_HIGH_MUSDB = SourceSeparationBundle(
    _model_path=f"{model_dir}/hdemucs_high_musdbhq_only.pt",
    _model_factory_func=partial(
        hdemucs_high, sources=["drums", "bass", "other", "vocals"]
    ),
    _sample_rate=44100,
)
HDEMUCS_HIGH_MUSDB.__doc__ = """Pre-trained music source separation pipeline with
*Hybrid Demucs* :cite:`defossez2021hybrid` trained on the training set of MUSDB-HQ :cite:`MUSDB18HQ`.
The model is constructed by :func:`~torchaudio.models.hdemucs_high`.
Training was performed in the original HDemucs repository `here <https://github.com/facebookresearch/demucs/>`__.
Please refer to :class:`SourceSeparationBundle` for usage instructions.
"""
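
The code above only builds the bundle; a minimal sketch of running the model on a mixture might look like the following. The helper name `separate_sources` is hypothetical and is not part of torchaudio or this repository. It assumes the mixture is a `(channels, frames)` tensor already at the bundle's sample rate (44100 Hz), and that the model maps `(batch, channels, frames)` to `(batch, sources, channels, frames)`, as torchaudio's HDemucs does. The normalization step is also an assumption, borrowed from common practice with Demucs-style models.

```python
import torch


def separate_sources(model, mixture, source_names):
    """Split a (channels, frames) mixture into one tensor per source.

    `separate_sources` is a hypothetical helper for illustration only.
    """
    # Normalize with statistics of the mono downmix (assumed preprocessing).
    ref = mixture.mean(dim=0)
    mean = ref.mean()
    scale = ref.std() + 1e-8  # small constant avoids division by zero

    normalized = (mixture - mean) / scale

    # Add a batch dimension, run inference without gradients, drop the batch.
    with torch.no_grad():
        estimates = model(normalized.unsqueeze(0))[0]  # (sources, channels, frames)

    # Undo the normalization so the stems are on the mixture's original scale.
    estimates = estimates * scale + mean

    # Pair each estimated stem with its source name, in model order.
    return dict(zip(source_names, estimates))
```

With the bundle defined above, this could be called as `model = HDEMUCS_HIGH_MUSDB.get_model()` followed by `stems = separate_sources(model, waveform, model.sources)`, where `waveform` is stereo audio sampled at `HDEMUCS_HIGH_MUSDB.sample_rate`. Long tracks may need to be processed in chunks to fit in memory; that is not handled in this sketch.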
Maintenance
git clone git@hf.co:monetjoe/hdemucs_high_musdbhq
cd hdemucs_high_musdbhq
Mirror
https://www.modelscope.cn/models/monetjoe/hdemucs_high_musdbhq