from kdock.data.core import Data
from kdock.px.core import *
Protenix: virtual screening
Install
git clone https://github.com/bytedance/Protenix.git
cd Protenix
pip install .
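After `pip install .`, the `protenix` command used later on this page should be on your PATH. Below is a minimal sanity check. It assumes the package installs a console script named `protenix`, which is the name the commands below use. The helper `protenix_cli_available` is hypothetical.

```python
import shutil


def protenix_cli_available(executable: str = "protenix") -> bool:
    """Return True if `executable` resolves to a program on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if protenix_cli_available():
        print("protenix CLI found at", shutil.which("protenix"))
    else:
        print("protenix CLI not found; re-run `pip install .` inside the Protenix checkout")
```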
Setup
Protein sequence
kras = Data.get_kras_seq()
kras
|   | ID | WT_sequence | g12d_seq | g12c_seq |
|---|---|---|---|---|
| 0 | kras_human | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | MTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | MTEYKLVVVGACGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... |
| 1 | kras_human_isoform2b | MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | MTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... | MTEYKLVVVGACGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... |
g12d = kras.iloc[0]['g12d_seq']
g12d
'MTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDLPSRTVDTKQAQDLARSYGIPFIETSAKTRQRVEDAFYTLVREIRQYRLKKISKEEKTPGCVKIKKCIIM'
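To confirm that `g12d_seq` carries the intended substitution, you can compare it residue by residue against `WT_sequence`. Below is a minimal sketch using a hypothetical helper, `point_mutations`. It assumes both sequences have the same length (substitutions only, no indels). The example uses the truncated prefixes shown in the table above. In the notebook, you would pass `kras.iloc[0]['WT_sequence']` and `g12d` instead.

```python
def point_mutations(wt: str, mut: str) -> list[str]:
    """List substitutions between two equal-length sequences, e.g. 'G12D'.

    Positions are 1-based, following the usual mutation-naming convention.
    """
    if len(wt) != len(mut):
        raise ValueError("sequences must have equal length (substitutions only)")
    return [
        f"{a}{i}{b}"
        for i, (a, b) in enumerate(zip(wt, mut), start=1)
        if a != b
    ]


# Truncated prefixes copied from the kras table.
wt_prefix = "MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI"
g12d_prefix = "MTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI"
g12c_prefix = "MTEYKLVVVGACGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI"
print(point_mutations(wt_prefix, g12d_prefix))  # ['G12D']
print(point_mutations(wt_prefix, g12c_prefix))  # ['G12C']
```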
Get MSA on server
The G12D sequence was submitted to the Protenix server to compute its MSA, which is returned as a paired and an unpaired a3m file.
Put both files in a single folder and pass that folder as msa_dir (here, kras_g12d_msa).
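Before building any JSON, it is worth checking that the folder really holds the two alignments. Below is a minimal sketch using a hypothetical helper, `summarize_msa_dir`. It checks only for exactly two `.a3m` files and does not assume particular file names, which are set by Protenix. It counts one sequence per FASTA-style header line starting with `>`.

```python
from pathlib import Path


def summarize_msa_dir(msa_dir: str) -> dict[str, int]:
    """Map each .a3m file in msa_dir to its number of sequences.

    Raises ValueError unless exactly two .a3m files are present.
    """
    files = sorted(Path(msa_dir).glob("*.a3m"))
    if len(files) != 2:
        raise ValueError(f"expected 2 .a3m files in {msa_dir}, found {len(files)}")
    counts = {}
    for f in files:
        with f.open() as handle:
            # Each '>' header line starts a new aligned sequence.
            counts[f.name] = sum(1 for line in handle if line.startswith(">"))
    return counts
```

Usage: `summarize_msa_dir('kras_g12d_msa')` should return two entries, each with a sequence count well above 1 if the MSA search found homologs.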
SMILES
df = Data.get_mirati_g12d()
df.ID.duplicated(keep=False).sum()
0
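With `keep=False`, `duplicated` flags every row whose ID occurs more than once, so a sum of 0 means all IDs are unique. This matters if the IDs are later used as job names, which is our assumption about how `id_col` is used. A toy Series with made-up IDs shows the difference from the default `keep="first"`:

```python
import pandas as pd

ids = pd.Series(["A", "B", "A", "C"])

# keep=False flags every member of a duplicated group, so the sum counts
# all rows involved in a clash (here both "A" rows).
print(ids.duplicated(keep=False).sum())    # 2
# The default keep="first" leaves the first occurrence unflagged.
print(ids.duplicated(keep="first").sum())  # 1
```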
df.head()
|   | ID | SMILES | Kd | IC50 | erk_IC50 |
|---|---|---|---|---|---|
| 0 | US_1 | CN1CCC[C@H]1COc1nc(N2CC3CCC(C2)N3)c2cnc(cc2n1)... | 97.7 | 124.7 | 3159.1 |
| 1 | US_4 | Oc1cc(-c2ncc3c(nc(OCCc4ccccn4)nc3c2F)N2CC3CCC(... | 155.7 | 496.2 | 8530.0 |
| 2 | US_5 | Cn1nccc1COc1nc(N2CC3CCC(C2)N3)c2cnc(c(F)c2n1)-... | 294.8 | 722.9 | 8193.8 |
| 3 | US_6 | Cc1cccnc1CCOc1nc(N2CC3CCC(C2)N3)c2cnc(c(F)c2n1... | 442.2 | 434.1 | 11518.2 |
| 4 | US_7 | Oc1cc(-c2ncc3c(nc(OCCc4ncccn4)nc3c2F)N2CC3CCC(... | 463.5 | 1867.3 | NaN |
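When ranking or comparing these compounds, potencies are often easier to handle on a log scale. Below is a minimal sketch that converts a potency to its negative log10 molar value (for example, pIC50 = 9 - log10(IC50 in nM)). It assumes `Kd`, `IC50` and `erk_IC50` are in nM, which the table does not state. The helper name is hypothetical.

```python
import math


def nanomolar_to_p(value_nm: float) -> float:
    """Convert a potency in nM to -log10(molar), e.g. IC50 -> pIC50.

    -log10(value_nm * 1e-9) simplifies to 9 - log10(value_nm).
    NaN inputs (like the missing erk_IC50 of US_7) pass through as NaN.
    """
    if value_nm is None or math.isnan(value_nm):
        return float("nan")
    if value_nm <= 0:
        raise ValueError("potency must be positive")
    return 9.0 - math.log10(value_nm)


# IC50 of US_1 from df.head(), assuming the value is in nM.
print(round(nanomolar_to_p(124.7), 2))  # 6.9
```

In the notebook this could be applied column-wise, e.g. `df['pIC50'] = df['IC50'].map(nanomolar_to_p)`.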
Test a positive control
MRTX
get_single_protein_ligand_json?
Signature:
get_single_protein_ligand_json(
    job_name,
    protein_seq,
    msa_dir,
    SMILES=None,
    CCD=None,
    json_path=None,
)
Docstring: Generate and optionally save a JSON config for one protein-ligand job.
File:      /tmp/ipykernel_1724/1197518213.py
Type:      function
_ = get_single_protein_ligand_json('kras_g12d_mrtx',
                                   g12d,
                                   msa_dir='kras_g12d_msa',
                                   SMILES="C#CC1=C(C=CC2=CC(=CC(=C21)C3=NC=C4C(=C3F)N=C(N=C4N5CC6CCC(C5)N6)OC[C@@]78CCCN7C[C@@H](C8)F)O)F",
                                   json_path='g12d_mrtx.json')
JSON saved to g12d_mrtx.json
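The docstring does not show the layout of the saved JSON. For orientation, the sketch below builds one job in the Protenix input format as we understand it. The file holds a list of jobs, and each job has a `name` and a `sequences` list containing a `proteinChain` (with a precomputed MSA directory) and a `ligand`. The key names, the `CCD_` prefix and the `pairing_db` value are our assumptions about the Protenix schema, not a copy of what `get_single_protein_ligand_json` writes. Compare the output with the saved `g12d_mrtx.json` before relying on it. Both helper names are hypothetical.

```python
import json
from typing import Optional


def build_protein_ligand_job(job_name: str, protein_seq: str, msa_dir: str,
                             smiles: Optional[str] = None,
                             ccd: Optional[str] = None) -> dict:
    """Hypothetical sketch of one protein-ligand job in Protenix input format.

    Key names reflect our reading of the Protenix schema (an assumption).
    """
    if (smiles is None) == (ccd is None):
        raise ValueError("provide exactly one of smiles or ccd")
    # Assumed convention: CCD ligands are referenced as "CCD_<code>".
    ligand = smiles if smiles is not None else f"CCD_{ccd}"
    return {
        "name": job_name,
        "sequences": [
            {"proteinChain": {
                "sequence": protein_seq,
                "count": 1,
                # Folder holding the paired and unpaired .a3m files.
                "msa": {"precomputed_msa_dir": msa_dir,
                        "pairing_db": "uniref100"},
            }},
            {"ligand": {"ligand": ligand, "count": 1}},
        ],
    }


def save_jobs(jobs: list, json_path: str) -> None:
    """Write a list of jobs to json_path as a JSON array."""
    with open(json_path, "w") as fh:
        json.dump(jobs, fh, indent=2)
```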
Protenix command
protenix predict --input g12d_mrtx.json --out_dir ./output --seeds 101
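To launch the same command from the notebook (for example, in a loop over several input files), you can wrap it with `subprocess`. Below is a minimal sketch that uses only the flags shown in the command above. The helper names are hypothetical. By default it does a dry run and only returns the argument list.

```python
import subprocess


def protenix_predict_cmd(input_json: str, out_dir: str = "./output",
                         seed: int = 101) -> list[str]:
    """Build the `protenix predict` argument list used on this page."""
    return ["protenix", "predict",
            "--input", input_json,
            "--out_dir", out_dir,
            "--seeds", str(seed)]


def run_protenix(input_json: str, dry_run: bool = True, **kwargs) -> list[str]:
    """Return the command and, unless dry_run is True, execute it."""
    cmd = protenix_predict_cmd(input_json, **kwargs)
    if not dry_run:
        # check=True raises CalledProcessError if the prediction fails.
        subprocess.run(cmd, check=True)
    return cmd
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file paths.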
Run with other SMILES
get_virtual_screening_json?
Signature:
get_virtual_screening_json(
    df,
    protein_seq,
    msa_dir,
    id_col,
    smi_col=None,
    ccd_col=None,
    save_json=None,
)
Docstring: Get json file of single protein against multiple SMILES in a dataframe.
File:      /tmp/ipykernel_1724/3782683879.py
Type:      function
_ = get_virtual_screening_json(df,
                               g12d,
                               'kras_g12d_msa',
                               id_col='ID',
                               smi_col='SMILES',
                               save_json='kras_g12d_input.json')
JSON saved to kras_g12d_input.json
protenix predict --input kras_g12d_input.json --out_dir ./output --seeds 101
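The screening file `kras_g12d_input.json` bundles one job per compound. The sketch below shows one way to build such a file from a dataframe. It is a hypothetical re-creation, not kdock's implementation. It assumes the same Protenix job layout as a single protein-ligand job (`proteinChain` with a precomputed MSA directory, plus a `ligand` entry). It also assumes that the values in `id_col` become job names, which is why duplicate IDs are rejected.

```python
import json
from typing import Optional

import pandas as pd


def build_screening_jobs(df: pd.DataFrame, protein_seq: str, msa_dir: str,
                         id_col: str, smi_col: str,
                         save_json: Optional[str] = None) -> list:
    """Hypothetical sketch: one protein-ligand job per dataframe row."""
    ids = df[id_col]
    if ids.duplicated(keep=False).any():
        # IDs are assumed to become job names, so they must be unique.
        raise ValueError(f"duplicate values in column {id_col!r}")
    jobs = []
    for job_name, smiles in zip(ids, df[smi_col]):
        jobs.append({
            "name": str(job_name),
            "sequences": [
                {"proteinChain": {
                    "sequence": protein_seq,
                    "count": 1,
                    "msa": {"precomputed_msa_dir": msa_dir,
                            "pairing_db": "uniref100"},
                }},
                {"ligand": {"ligand": smiles, "count": 1}},
            ],
        })
    if save_json is not None:
        with open(save_json, "w") as fh:
            json.dump(jobs, fh, indent=2)
    return jobs
```

Because every job reuses the same `msa_dir`, the protein MSA is computed once on the server and shared across the whole screen.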