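Predicts the organic carbon partition coefficient (log KOC) directly from SMILES strings with the AttentiveFP graph neural network, supporting assessment of soil sorption behavior and environmental mobility.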
license: mit
tags:
- chemistry
- molecular-property-prediction
- graph-neural-networks
- attentivefp
- pytorch-geometric
- toxicity-prediction
language:
- en
pipeline_tag: tabular-regression
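Pyrosage KOC AttentiveFP Model
Model Description
This is an AttentiveFP (Attentive Fingerprint) graph neural network trained to predict the organic carbon partition coefficient (log KOC). This property predicts soil sorption behavior and is a key indicator in environmental mobility assessment. The model takes a SMILES string as input and predicts the property directly from the molecular structure.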
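Because the model outputs log KOC rather than a sorption coefficient for a particular soil, a short worked conversion may help when interpreting a prediction. The sketch below assumes the output is a base-10 logarithm of KOC in L/kg (the model card does not state the units) and uses the standard relation Kd = KOC × fOC, where fOC is the organic carbon mass fraction of the soil. The function name and the example values are illustrative and are not part of the model.

```python
def kd_from_log_koc(log_koc: float, f_oc: float) -> float:
    """Estimate the soil-water distribution coefficient Kd from log KOC.

    Assumes log_koc is log10(KOC) with KOC in L/kg, and f_oc is the
    organic carbon mass fraction of the soil (0 < f_oc <= 1).
    """
    if not 0.0 < f_oc <= 1.0:
        raise ValueError("f_oc must be a mass fraction in (0, 1]")
    koc = 10.0 ** log_koc   # undo the logarithm
    return koc * f_oc       # Kd = KOC * fOC


# Illustrative numbers only: log KOC = 2.0 and a soil with 2% organic carbon.
kd = kd_from_log_koc(2.0, 0.02)
print(f"Kd ~ {kd:.2f} L/kg")
```

A larger Kd means stronger sorption to soil and therefore lower mobility; the same predicted log KOC gives different Kd values for soils with different organic carbon content.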
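Model Details
- Model type: AttentiveFP (graph neural network)
- Task: Regression
- Input: SMILES string (molecular representation)
- Output: Continuous value (predicted log KOC)
- Framework: PyTorch Geometric
- Architecture: AttentiveFP with enhanced atom and bond features
Hyperparameters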
{
  "name": "larger_model",
  "hidden_channels": 128,
  "num_layers": 3,
  "num_timesteps": 3,
  "dropout": 0.1,
  "learning_rate": 0.0005,
  "weight_decay": 0.0001,
  "batch_size": 32,
  "epochs": 50,
  "patience": 10
}
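Only some of these values describe the network itself; the rest appear to be training settings. As a minimal sketch, the snippet below parses the JSON above and separates the keys that the AttentiveFP constructor accepts from the others. The helper name `split_hyperparams` is hypothetical, and treating the remaining keys as training-loop settings is an assumption based on their names.

```python
import json

# Hyperparameters exactly as listed in this model card.
HYPERPARAMS_JSON = """
{
  "name": "larger_model",
  "hidden_channels": 128,
  "num_layers": 3,
  "num_timesteps": 3,
  "dropout": 0.1,
  "learning_rate": 0.0005,
  "weight_decay": 0.0001,
  "batch_size": 32,
  "epochs": 50,
  "patience": 10
}
"""

# Keys that AttentiveFP takes as constructor arguments; the remaining keys
# are assumed to be optimizer and training-loop settings.
ARCHITECTURE_KEYS = ("hidden_channels", "num_layers", "num_timesteps", "dropout")


def split_hyperparams(hp: dict) -> tuple:
    """Return (architecture kwargs, all other settings)."""
    arch = {k: hp[k] for k in ARCHITECTURE_KEYS}
    other = {k: v for k, v in hp.items() if k not in ARCHITECTURE_KEYS}
    return arch, other


arch_kwargs, train_settings = split_hyperparams(json.loads(HYPERPARAMS_JSON))
print(arch_kwargs)
print(train_settings)
```

The architecture dictionary can be unpacked into the model constructor alongside `in_channels`, `out_channels`, and `edge_dim`, which are fixed by the featurization rather than stored as hyperparameters.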
Usage
Installation
pip install torch torch-geometric rdkit-pypi
Loading the Model
import torch
from torch_geometric.nn import AttentiveFP
from rdkit import Chem
from torch_geometric.data import Data
# Load the model
model_dict = torch.load('pytorch_model.pt', map_location='cpu')
state_dict = model_dict['model_state_dict']
hyperparams = model_dict['hyperparameters']
# Create model with correct architecture
model = AttentiveFP(
    in_channels=10,  # Enhanced atom features
    hidden_channels=hyperparams["hidden_channels"],
    out_channels=1,
    edge_dim=6,  # Enhanced bond features
    num_layers=hyperparams["num_layers"],
    num_timesteps=hyperparams["num_timesteps"],
    dropout=hyperparams["dropout"],
)
model.load_state_dict(state_dict)
model.eval()
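The loading code above indexes the checkpoint by key, so a file with a different layout fails with a bare `KeyError`. The sketch below shows one way to check the layout first; the function `check_checkpoint` is hypothetical, and the required keys are simply the ones the loading code reads. It runs on a stand-in dictionary here, so it needs neither PyTorch nor the weights file.

```python
REQUIRED_KEYS = ("model_state_dict", "hyperparameters")
REQUIRED_HYPERPARAMS = ("hidden_channels", "num_layers", "num_timesteps", "dropout")


def check_checkpoint(checkpoint) -> None:
    """Raise a descriptive error if the checkpoint lacks expected entries."""
    if not isinstance(checkpoint, dict):
        raise TypeError(f"expected a dict checkpoint, got {type(checkpoint).__name__}")
    missing = [k for k in REQUIRED_KEYS if k not in checkpoint]
    if missing:
        raise KeyError(f"checkpoint is missing keys: {missing}")
    hp = checkpoint["hyperparameters"]
    missing_hp = [k for k in REQUIRED_HYPERPARAMS if k not in hp]
    if missing_hp:
        raise KeyError(f"hyperparameters are missing: {missing_hp}")


# Stand-in with the same layout as the loaded file (no real weights).
example = {
    "model_state_dict": {},
    "hyperparameters": {
        "hidden_channels": 128,
        "num_layers": 3,
        "num_timesteps": 3,
        "dropout": 0.1,
    },
}
check_checkpoint(example)  # returns None when the layout is as expected
```

In practice this would be called as `check_checkpoint(model_dict)` right after `torch.load` and before the model is constructed.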
Making Predictions
def smiles_to_data(smiles):
    """Convert SMILES string to PyG Data object"""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # Enhanced atom features (10 dimensions)
    atom_features = []
    for atom in mol.GetAtoms():
        features = [
            atom.GetAtomicNum(),
            atom.GetTotalDegree(),
            atom.GetFormalCharge(),
            atom.GetTotalNumHs(),
            atom.GetNumRadicalElectrons(),
            int(atom.GetIsAromatic()),
            int(atom.IsInRing()),
            # Hybridization as one-hot (3 dimensions)
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP),
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP2),
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP3)
        ]
        atom_features.append(features)
    x = torch.tensor(atom_features, dtype=torch.float)
    # Enhanced bond features (6 dimensions)
    edges_list = []
    edge_features = []
    for bond in mol.GetBonds():
        i = bond.GetBeginAtomIdx()
        j = bond.GetEndAtomIdx()
        # Add both directions so the graph is undirected
        edges_list.extend([[i, j], [j, i]])
        features = [
            # Bond type as one-hot (4 dimensions)
            int(bond.GetBondType() == Chem.rdchem.BondType.SINGLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.DOUBLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.TRIPLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.AROMATIC),
            # Additional features (2 dimensions)
            int(bond.GetIsConjugated()),
            int(bond.IsInRing())
        ]
        edge_features.extend([features, features])
    # Molecules without bonds (e.g. a single atom) are not supported
    if not edges_list:
        return None
    edge_index = torch.tensor(edges_list, dtype=torch.long).t()
    edge_attr = torch.tensor(edge_features, dtype=torch.float)
    return Data(x=x, edge_index=edge_index, edge_attr=edge_attr)

def predict(model, smiles):
    """Make prediction for a SMILES string"""
    data = smiles_to_data(smiles)
    if data is None:
        return None
    # Single molecule: every node belongs to graph 0
    batch = torch.zeros(data.num_nodes, dtype=torch.long)
    with torch.no_grad():
        output = model(data.x, data.edge_index, data.edge_attr, batch)
    return output.item()

# Example usage
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
prediction = predict(model, smiles)
print(f"Prediction for {smiles}: {prediction}")
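Since `predict` returns `None` both for SMILES that RDKit cannot parse and for molecules without bonds, code that scores many molecules needs to keep track of the failures. A minimal sketch of such a wrapper follows; `predict_many` and `fake_predict` are hypothetical names, and the stand-in predictor exists only so that the example runs without the trained weights.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def predict_many(
    predict_fn: Callable[[str], Optional[float]],
    smiles_list: Iterable[str],
) -> Tuple[Dict[str, float], List[str]]:
    """Apply predict_fn to each SMILES; return (results, failed inputs)."""
    results: Dict[str, float] = {}
    failed: List[str] = []
    for smi in smiles_list:
        value = predict_fn(smi)
        if value is None:  # unparsable SMILES or a molecule without bonds
            failed.append(smi)
        else:
            results[smi] = value
    return results, failed


# Stand-in predictor so the sketch runs without the trained weights;
# with the real model, pass `lambda s: predict(model, s)` instead.
def fake_predict(smiles: str) -> Optional[float]:
    return None if smiles == "not_a_smiles" else float(len(smiles))


results, failed = predict_many(fake_predict, ["CCO", "c1ccccc1", "not_a_smiles"])
print(results)
print(failed)
```

Passing the predictor as a callable keeps the wrapper independent of PyTorch, and returning the failed inputs separately avoids silently dropping molecules from a screening list.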
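Training Data
The model was trained on the KOC dataset from the Pyrosage project, which focuses on predicting molecular toxicity and environmental properties.
Model Performance
See the training logs for detailed performance metrics.
Limitations
- The model was trained on a specific chemical dataset and may not generalize to all molecule types
- Performance may vary for molecules that differ significantly from the training distribution
- Input must be a valid SMILES string
Citation
If you use this model, please cite the Pyrosage project: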
@misc{pyrosagekoc,
  title={Pyrosage KOC AttentiveFP Model},
  author={UPCI NTUA},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/upci-ntua/pyrosage-koc-attentivefp}
}
License
MIT License - see the LICENSE file for details.