Protenix: virtual screening

Install

git clone https://github.com/bytedance/Protenix.git
cd Protenix
pip install .

Setup

from kdock.data.core import Data
from kdock.px.core import *

Protein sequence

kras = Data.get_kras_seq()
kras
ID WT_sequence g12d_seq g12c_seq
0 kras_human MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... MTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... MTEYKLVVVGACGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI...
1 kras_human_isoform2b MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... MTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... MTEYKLVVVGACGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI...
g12d = kras.iloc[0]['g12d_seq']
g12d
'MTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDLPSRTVDTKQAQDLARSYGIPFIETSAKTRQRVEDAFYTLVREIRQYRLKKISKEEKTPGCVKIKKCIIM'

Get MSA on server

Submit the protein sequence to the Protenix server to generate the MSA. The server returns two a3m files, one for the paired and one for the unpaired alignment.

Upload the two files into a single folder and use that folder as msa_dir.
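
Before building any job config, it can help to confirm that the folder really holds both alignment files. The sketch below is a minimal check that assumes only that the server output consists of files ending in .a3m. The helper names list_a3m_files and check_msa_dir, and the expected count of two, are illustrative and not part of kdock or Protenix.

```python
from pathlib import Path


def list_a3m_files(msa_dir):
    """Return the sorted names of all .a3m files directly inside msa_dir."""
    folder = Path(msa_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"MSA folder not found: {folder}")
    return sorted(
        p.name for p in folder.iterdir() if p.is_file() and p.suffix == ".a3m"
    )


def check_msa_dir(msa_dir, expected=2):
    """Raise if msa_dir does not hold the expected number of .a3m files."""
    names = list_a3m_files(msa_dir)
    if len(names) != expected:
        raise ValueError(
            f"expected {expected} .a3m files in {msa_dir}, found {len(names)}: {names}"
        )
    return names
```

Calling check_msa_dir('kras_g12d_msa') returns the file names when the folder is complete and raises otherwise, so a missing upload is caught before a prediction run starts.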

SMILES

df = Data.get_mirati_g12d()
df.ID.duplicated(keep=False).sum()  # number of rows whose ID occurs more than once
0
df.head()
ID SMILES Kd IC50 erk_IC50
0 US_1 CN1CCC[C@H]1COc1nc(N2CC3CCC(C2)N3)c2cnc(cc2n1)... 97.7 124.7 3159.1
1 US_4 Oc1cc(-c2ncc3c(nc(OCCc4ccccn4)nc3c2F)N2CC3CCC(... 155.7 496.2 8530.0
2 US_5 Cn1nccc1COc1nc(N2CC3CCC(C2)N3)c2cnc(c(F)c2n1)-... 294.8 722.9 8193.8
3 US_6 Cc1cccnc1CCOc1nc(N2CC3CCC(C2)N3)c2cnc(c(F)c2n1... 442.2 434.1 11518.2
4 US_7 Oc1cc(-c2ncc3c(nc(OCCc4ncccn4)nc3c2F)N2CC3CCC(... 463.5 1867.3 NaN

Test a positive control

MRTX

get_single_protein_ligand_json?
Signature:
get_single_protein_ligand_json(
    job_name,
    protein_seq,
    msa_dir,
    SMILES=None,
    CCD=None,
    json_path=None,
)
Docstring: Generate and optionally save a JSON config for one protein-ligand job.
Type:      function
_ = get_single_protein_ligand_json('kras_g12d_mrtx',
                                 g12d,
                                 msa_dir='kras_g12d_msa',
                                 SMILES="C#CC1=C(C=CC2=CC(=CC(=C21)C3=NC=C4C(=C3F)N=C(N=C4N5CC6CCC(C5)N6)OC[C@@]78CCCN7C[C@@H](C8)F)O)F",
                                 json_path='g12d_mrtx.json'
                                )
JSON saved to g12d_mrtx.json
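
To make the saved config less of a black box, the sketch below builds one protein-ligand job entry by hand. It is not the kdock implementation: the helper name sketch_job is hypothetical, and the field names (name, sequences, proteinChain, msa, precomputed_msa_dir, pairing_db, ligand, count) are an assumption about the Protenix inference JSON layout that should be checked against the Protenix documentation. The sequence is a truncated prefix of the G12D sequence above and the ligand "CCO" is only a placeholder SMILES.

```python
import json


def sketch_job(job_name, protein_seq, msa_dir, ligand):
    """Build one job entry; field names are assumed, not taken from kdock."""
    return {
        "name": job_name,
        "sequences": [
            {
                "proteinChain": {
                    "sequence": protein_seq,
                    "count": 1,
                    # Folder holding the precomputed a3m files (assumed key names).
                    "msa": {
                        "precomputed_msa_dir": msa_dir,
                        "pairing_db": "uniref100",
                    },
                }
            },
            # The ligand is given here as a SMILES string.
            {"ligand": {"ligand": ligand, "count": 1}},
        ],
    }


job = sketch_job("demo_job", "MTEYKLVVVGADGVGKSALTIQ", "kras_g12d_msa", "CCO")
# The input file is assumed to be a JSON list with one entry per job.
print(json.dumps([job], indent=2))
```

Opening g12d_mrtx.json and comparing it with this printout is a quick way to see which of these assumptions hold for the installed versions.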

Protenix command

protenix predict --input g12d_mrtx.json --out_dir ./output --seeds 101
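
When the command has to be issued from Python, for instance from the same notebook, it can be assembled as an argument list first. The sketch below only builds and prints the command using the three flags shown above. The helper name protenix_predict_cmd is hypothetical, and nothing is executed.

```python
import shlex


def protenix_predict_cmd(input_json, out_dir="./output", seed=101):
    """Return the argv list for a protenix predict call without running it."""
    return [
        "protenix", "predict",
        "--input", input_json,
        "--out_dir", out_dir,
        "--seeds", str(seed),
    ]


cmd = protenix_predict_cmd("g12d_mrtx.json")
print(shlex.join(cmd))
# prints: protenix predict --input g12d_mrtx.json --out_dir ./output --seeds 101
```

Passing cmd to subprocess.run(cmd, check=True) would launch the prediction. Keeping the arguments as a list avoids shell quoting problems with file names.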

Run with other SMILES

get_virtual_screening_json?
Signature:
get_virtual_screening_json(
    df,
    protein_seq,
    msa_dir,
    id_col,
    smi_col=None,
    ccd_col=None,
    save_json=None,
)
Docstring: Get json file of single protein against multiple SMILES in a dataframe.
Type:      function
_ = get_virtual_screening_json(df,
                               g12d,
                               'kras_g12d_msa',
                               id_col='ID',
                               smi_col='SMILES',
                               save_json='kras_g12d_input.json')
JSON saved to kras_g12d_input.json
protenix predict --input kras_g12d_input.json --out_dir ./output --seeds 101