pyrosage-koc-attentivefp: an AttentiveFP-based model for predicting the organic carbon partition coefficient (log KOC)

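Uses the AttentiveFP graph neural network to predict the organic carbon partition coefficient (log KOC) directly from SMILES strings, supporting the assessment of soil sorption behavior and environmental mobility. (This summary was AI-generated.)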


license: mit
tags:
  • chemistry
  • molecular-property-prediction
  • graph-neural-networks
  • attentivefp
  • pytorch-geometric
  • toxicity-prediction
language:
  • en
pipeline_tag: tabular-regression

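Pyrosage KOC AttentiveFP Model

Model Description

This is an AttentiveFP (Attentive Fingerprint) graph neural network trained to predict the organic carbon partition coefficient (log KOC). This property indicates how strongly a compound sorbs to soil and is a key input for assessing environmental mobility. The model takes a SMILES string as input and predicts the property directly from the molecular structure.

Model Details

  • Model type: AttentiveFP (graph neural network)
  • Task: regression
  • Input: SMILES string (molecular representation)
  • Output: continuous value (log KOC)
  • Framework: PyTorch Geometric
  • Architecture: AttentiveFP with enhanced atom features (10 dimensions) and bond features (6 dimensions)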

Hyperparameters

{
  "name": "larger_model",
  "hidden_channels": 128,
  "num_layers": 3,
  "num_timesteps": 3,
  "dropout": 0.1,
  "learning_rate": 0.0005,
  "weight_decay": 0.0001,
  "batch_size": 32,
  "epochs": 50,
  "patience": 10
}
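
The `epochs` and `patience` values suggest a training loop with early stopping, although this card does not describe the loop itself. The following is a minimal sketch of one common interpretation, assuming training halts once the validation loss has failed to improve for `patience` consecutive epochs. The `EarlyStopper` class and `epochs_run` function are hypothetical and are not taken from the Pyrosage code.

```python
class EarlyStopper:
    """Hypothetical helper: stop when validation loss stalls for `patience` epochs."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def epochs_run(val_losses, max_epochs=50, patience=10):
    """Return how many epochs would run for a given sequence of validation losses."""
    stopper = EarlyStopper(patience)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), max_epochs)
```

Under this reading, `epochs: 50` is an upper bound, and a run whose validation loss plateaus early would end well before it.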

Usage

Installation

pip install torch torch-geometric rdkit

Loading the Model

import torch
from torch_geometric.nn import AttentiveFP
from rdkit import Chem
from torch_geometric.data import Data

# Load the model
model_dict = torch.load('pytorch_model.pt', map_location='cpu')
state_dict = model_dict['model_state_dict']
hyperparams = model_dict['hyperparameters']

# Create model with correct architecture
model = AttentiveFP(
    in_channels=10,  # Enhanced atom features
    hidden_channels=hyperparams["hidden_channels"],
    out_channels=1,
    edge_dim=6,  # Enhanced bond features
    num_layers=hyperparams["num_layers"],
    num_timesteps=hyperparams["num_timesteps"],
    dropout=hyperparams["dropout"],
)

model.load_state_dict(state_dict)
model.eval()
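
The loading code above assumes the checkpoint is a dictionary with `model_state_dict` and `hyperparameters` keys, and that the hyperparameters include the four architecture fields passed to `AttentiveFP`. A small check along these lines can surface a mismatched file before the model is constructed. `check_checkpoint` is a hypothetical helper, not part of the Pyrosage release.

```python
REQUIRED_KEYS = ("model_state_dict", "hyperparameters")
REQUIRED_HYPERPARAMS = ("hidden_channels", "num_layers", "num_timesteps", "dropout")


def check_checkpoint(model_dict):
    """Return a list of problems found in a loaded checkpoint (empty if none)."""
    if not isinstance(model_dict, dict):
        return ["checkpoint is not a dict"]

    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in model_dict]

    # Only inspect individual hyperparameters when the section itself is present.
    hyperparams = model_dict.get("hyperparameters")
    if isinstance(hyperparams, dict):
        problems += [
            f"missing hyperparameter: {name}"
            for name in REQUIRED_HYPERPARAMS
            if name not in hyperparams
        ]
    return problems
```

Calling `check_checkpoint(model_dict)` right after `torch.load` and raising an error when the list is non-empty keeps failures close to their cause.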

Making Predictions

def smiles_to_data(smiles):
    """Convert SMILES string to PyG Data object"""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None

    # Enhanced atom features (10 dimensions)
    atom_features = []
    for atom in mol.GetAtoms():
        features = [
            atom.GetAtomicNum(),
            atom.GetTotalDegree(),
            atom.GetFormalCharge(),
            atom.GetTotalNumHs(),
            atom.GetNumRadicalElectrons(),
            int(atom.GetIsAromatic()),
            int(atom.IsInRing()),
            # Hybridization as one-hot (3 dimensions)
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP),
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP2),
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP3)
        ]
        atom_features.append(features)

    x = torch.tensor(atom_features, dtype=torch.float)

    # Enhanced bond features (6 dimensions)
    edges_list = []
    edge_features = []
    for bond in mol.GetBonds():
        i = bond.GetBeginAtomIdx()
        j = bond.GetEndAtomIdx()
        edges_list.extend([[i, j], [j, i]])

        features = [
            # Bond type as one-hot (4 dimensions)
            int(bond.GetBondType() == Chem.rdchem.BondType.SINGLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.DOUBLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.TRIPLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.AROMATIC),
            # Additional features (2 dimensions)
            int(bond.GetIsConjugated()),
            int(bond.IsInRing())
        ]
        edge_features.extend([features, features])

    if not edges_list:
        return None

    edge_index = torch.tensor(edges_list, dtype=torch.long).t().contiguous()
    edge_attr = torch.tensor(edge_features, dtype=torch.float)

    return Data(x=x, edge_index=edge_index, edge_attr=edge_attr)

def predict(model, smiles):
    """Make prediction for a SMILES string"""
    data = smiles_to_data(smiles)
    if data is None:
        return None
    
    batch = torch.zeros(data.num_nodes, dtype=torch.long)
    with torch.no_grad():
        output = model(data.x, data.edge_index, data.edge_attr, batch)
        return output.item()

# Example usage
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
prediction = predict(model, smiles)
print(f"Predicted log KOC for {smiles}: {prediction}")
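
Because the model outputs log KOC, the coefficient itself can be recovered as 10 raised to the prediction. This assumes a base-10 logarithm, the usual convention for log KOC, which this card does not state explicitly. The sketch below wraps any single-molecule prediction function, such as `lambda s: predict(model, s)`, and sets aside SMILES for which it returns `None`. `predict_many` is a hypothetical helper.

```python
def predict_many(predict_fn, smiles_list):
    """Apply predict_fn to each SMILES string and return (results, failed).

    results maps SMILES -> {"log_koc": float, "koc": float};
    failed lists the SMILES for which predict_fn returned None.
    """
    results, failed = {}, []
    for smiles in smiles_list:
        log_koc = predict_fn(smiles)
        if log_koc is None:
            failed.append(smiles)
            continue
        # Assumption: the model output is a base-10 logarithm of KOC.
        results[smiles] = {"log_koc": log_koc, "koc": 10.0 ** log_koc}
    return results, failed
```

Passing the prediction function as an argument keeps the helper independent of how the model was loaded.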

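Training Data

The model was trained on the KOC dataset from the Pyrosage project, which focuses on predicting molecular toxicity and environmental properties.

Model Performance

See the training logs for detailed performance metrics.

Limitations

  • The model was trained on a specific chemical dataset and may not generalize to all types of molecules.
  • Performance may degrade for molecules that differ significantly from the training distribution.
  • Input must be a valid SMILES string; strings that RDKit cannot parse yield no prediction.

Citation

If you use this model, please cite the Pyrosage project: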

@misc{pyrosagekoc,
  title={Pyrosage KOC AttentiveFP Model},
  author={UPCI NTUA},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/upci-ntua/pyrosage-koc-attentivefp}
}

License

MIT License. See the LICENSE file for details.
