hdemucs_high_musdbhq: a deep-learning model for high-quality music source separation

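Uses an encoder-decoder architecture with attention to separate vocals, drums, bass and other instruments from an audio mixture, supporting music post-production and audio analysis. (This summary was generated by AI.)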


license: mit

Introduction

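For the ICASSP 2024 Cadenza Challenge, Demucs is a deep-learning model that separates clean audio tracks from a mixed audio signal. Its neural network combines an encoder-decoder architecture with attention to improve the quality and accuracy of the separated audio. This checkpoint is the Hybrid Demucs `hdemucs_high` configuration trained on MUSDB-HQ; it operates at 44.1 kHz and outputs four stems: drums, bass, other and vocals. The model is suited to music post-production, audio analysis and music information retrieval.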
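Separation quality is usually quantified with SDR-family metrics (MUSDB results are commonly reported as SDR). The docstring in the usage code calls a `si_snr_pit` helper without defining it; below is a minimal sketch of plain scale-invariant SNR (SI-SNR) in PyTorch, assuming estimate and reference tensors of the same shape `(..., time)`. `si_snr` is a hypothetical helper name. Unlike a PIT variant it does not search over source permutations, which should not be needed here because the model returns its stems in the fixed order of the `sources` list.

```python
import torch


def si_snr(estimate: torch.Tensor, reference: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale-invariant SNR in dB, computed over the last (time) dimension.

    Hypothetical helper. Both tensors have shape (..., time); estimate[i] is
    compared with reference[i] (no permutation search).
    """
    # Remove the DC offset so constant shifts do not affect the score.
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    reference = reference - reference.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference: s_target = <s_hat, s> / ||s||^2 * s.
    dot = (estimate * reference).sum(dim=-1, keepdim=True)
    ref_energy = (reference ** 2).sum(dim=-1, keepdim=True) + eps
    target = dot / ref_energy * reference
    # Everything not explained by the scaled reference counts as error.
    noise = estimate - target
    ratio = (target ** 2).sum(dim=-1) / ((noise ** 2).sum(dim=-1) + eps)
    return 10 * torch.log10(ratio + eps)
```

For example, with `estimated` of shape `(batch, sources, channels, time)` as returned by the model, `si_snr(estimated[0, 3], reference_vocals)` would score the vocals stem against a hypothetical reference tensor of shape `(channels, time)`, giving one value per channel. Because the reference is rescaled by the projection, multiplying the estimate by a constant leaves the score unchanged.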

Usage

```python
import torch
import torchaudio
from typing import Callable
from functools import partial
from dataclasses import dataclass
from modelscope import snapshot_download
from torchaudio.models import hdemucs_high


@dataclass
class SourceSeparationBundle:
    """Dataclass that bundles components for performing source separation.

    Example
        >>> import torchaudio
        >>> from torchaudio.pipelines import CONVTASNET_BASE_LIBRI2MIX
        >>> import torch
        >>>
        >>> # Build the separation model.
        >>> model = CONVTASNET_BASE_LIBRI2MIX.get_model()
        100%|███████████████████████████████|19.1M/19.1M [00:04<00:00, 4.93MB/s]
        >>>
        >>> # Instantiate the test set of Libri2Mix dataset.
        >>> dataset = torchaudio.datasets.LibriMix("/home/datasets/", subset="test")
        >>>
        >>> # Apply source separation on mixture audio.
        >>> for i, data in enumerate(dataset):
        ...     sample_rate, mixture, clean_sources = data
        ...     # Make sure the shape of input suits the model requirement.
        ...     mixture = mixture.reshape(1, 1, -1)
        ...     estimated_sources = model(mixture)
        ...     # si_snr_pit is not defined here; for demonstration only.
        ...     score = si_snr_pit(estimated_sources, clean_sources)
        ...     print(f"Si-SNR score is : {score}.")
        ...     break
        Si-SNR score is : 16.24.
    """

    _model_path: str
    _model_factory_func: Callable[[], torch.nn.Module]
    _sample_rate: int

    @property
    def sample_rate(self) -> int:
        """Sample rate of the audio that the model is trained on.

        :type: int
        """
        return self._sample_rate

    def get_model(self) -> torch.nn.Module:
        """Construct the model and load the pretrained weights."""
        model = self._model_factory_func()
        # _model_path already points to a local file fetched by snapshot_download,
        # so load it directly rather than through torchaudio.utils.download_asset,
        # which resolves keys against torchaudio's own asset store.
        state_dict = torch.load(self._model_path, map_location="cpu")
        model.load_state_dict(state_dict)
        model.eval()
        return model


model_dir = snapshot_download(
    "monetjoe/hdemucs_high_musdbhq",
    cache_dir="./__pycache__",
)
HDEMUCS_HIGH_MUSDB = SourceSeparationBundle(
    _model_path=f"{model_dir}/hdemucs_high_musdbhq_only.pt",
    _model_factory_func=partial(
        hdemucs_high, sources=["drums", "bass", "other", "vocals"]
    ),
    _sample_rate=44100,
)
HDEMUCS_HIGH_MUSDB.__doc__ = """Pre-trained music source separation pipeline with
*Hybrid Demucs* :cite:`defossez2021hybrid` trained on the training set of MUSDB-HQ :cite:`MUSDB18HQ`.

The model is constructed by :func:`~torchaudio.models.hdemucs_high`.
Training was performed in the original HDemucs repository `here <https://github.com/facebookresearch/demucs/>`__.

Please refer to :class:`SourceSeparationBundle` for usage instructions.
"""
```

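The bundle above only builds the model, and the docstring example it refers to covers ConvTasNet on Libri2Mix rather than music. Below is a minimal sketch of applying a music separation model to a full track, assuming a mixture tensor of shape `(batch, channels, time)` already at `HDEMUCS_HIGH_MUSDB.sample_rate`. `separate_in_chunks` and `separate_track` are hypothetical helpers, not torchaudio API. Long tracks are processed in overlapping segments to bound memory, and the segments are blended with a normalized weighted overlap-add; this is a swapped-in technique, whereas the torchaudio Hybrid Demucs tutorial cross-fades with a `Fade` transform. The mixture is normalized by the mean and standard deviation of its mono downmix before separation and the stems are rescaled afterwards, following the approach of that tutorial. The 10 s segment and 0.1 s overlap defaults are assumptions.

```python
import torch


@torch.inference_mode()
def separate_in_chunks(model, mix, chunk_len, overlap_len):
    """Apply `model` to `mix` in overlapping chunks and blend the results.

    Hypothetical helper. mix: (batch, channels, time). `model` is assumed to
    map (batch, channels, t) to (batch, num_sources, channels, t).
    Returns a tensor of shape (batch, num_sources, channels, time).
    """
    if not 0 <= overlap_len < chunk_len:
        raise ValueError("need 0 <= overlap_len < chunk_len")
    length = mix.shape[-1]
    step = chunk_len - overlap_len
    # Window with linear ramps over the overlap region; starting the ramp at
    # 1 / (overlap_len + 1) keeps every weight strictly positive.
    window = torch.ones(chunk_len, dtype=mix.dtype, device=mix.device)
    if overlap_len > 0:
        ramp = torch.arange(1, overlap_len + 1, dtype=mix.dtype, device=mix.device) / (overlap_len + 1)
        window[:overlap_len] = ramp
        window[-overlap_len:] = ramp.flip(0)
    out = None
    weight = torch.zeros(length, dtype=mix.dtype, device=mix.device)
    for start in range(0, length, step):
        end = min(start + chunk_len, length)
        est = model(mix[..., start:end])
        w = window[: end - start]
        if out is None:
            out = torch.zeros(*est.shape[:-1], length, dtype=est.dtype, device=est.device)
        out[..., start:end] += est * w  # weighted overlap-add
        weight[start:end] += w
        if end == length:
            break
    # Dividing by the summed weights turns the overlap-add into a weighted average.
    return out / weight


def separate_track(model, mix, sample_rate, segment_s=10.0, overlap_s=0.1):
    """Normalise a (batch, channels, time) mixture, separate it, then undo the normalisation."""
    ref = mix.mean(dim=1)                                   # (batch, time) mono downmix
    mean = ref.mean(dim=-1)[:, None, None]                  # (batch, 1, 1)
    std = ref.std(dim=-1).clamp_min(1e-8)[:, None, None]    # avoid dividing by zero on silence
    sources = separate_in_chunks(
        model,
        (mix - mean) / std,
        chunk_len=int(segment_s * sample_rate),
        overlap_len=int(overlap_s * sample_rate),
    )
    # Rescale every stem: sources has shape (batch, num_sources, channels, time).
    return sources * std[:, None] + mean[:, None]
```

Wiring this to the bundle would look roughly like `model = HDEMUCS_HIGH_MUSDB.get_model()`, then `waveform, sr = torchaudio.load(path)` for a stereo file, resampling with `torchaudio.functional.resample(waveform, sr, HDEMUCS_HIGH_MUSDB.sample_rate)` when `sr` differs, and `stems = separate_track(model, waveform[None], HDEMUCS_HIGH_MUSDB.sample_rate)[0]`. `stems[i]` then follows the order of the `sources` list passed to `hdemucs_high`.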
Maintenance

```shell
git clone git@hf.co:monetjoe/hdemucs_high_musdbhq
cd hdemucs_high_musdbhq
```

Mirror

https://www.modelscope.cn/models/monetjoe/hdemucs_high_musdbhq

