These helpers prepare inputs for protein-ligand prediction tasks with the Protenix GitHub repository.
Install
git clone https://github.com/bytedance/Protenix.git
cd Protenix
pip install .
Single job JSON
Run the protein sequence on the server to obtain an MSA folder that contains pairing.a3m and unpairing.a3m.
Pass that folder as msa_dir to the functions below.
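Before building a job, it can help to confirm that the MSA folder actually holds both files. The helper below is a minimal sketch and not part of this package: `missing_msa_files` is a hypothetical name, and the expected file names are simply the two mentioned above.

```python
from pathlib import Path

# File names taken from the text above; adjust them if your MSA output differs.
EXPECTED_MSA_FILES = ("pairing.a3m", "unpairing.a3m")


def missing_msa_files(msa_dir, expected=EXPECTED_MSA_FILES):
    """Return the expected MSA file names that are absent from msa_dir."""
    folder = Path(msa_dir)
    return [name for name in expected if not (folder / name).is_file()]
```

An empty list means the folder is ready to be used as msa_dir.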
get_single_job
get_single_job (job_name, protein_seq, msa_dir, SMILES=None, CCD=None)
Build the Protenix JSON entry (a dict) for one protein and one ligand.
get_single_job('job_name', 'AAA', './msa', SMILES='CCC', CCD=None)
{'name': 'job_name',
'sequences': [{'proteinChain': {'count': 1,
'sequence': 'AAA',
'msa': {'precomputed_msa_dir': './msa', 'pairing_db': 'uniref100'}}},
{'ligand': {'count': 1, 'ligand': 'CCC'}}]}
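For readers who want to see how a dict of this shape can be assembled, here is a minimal sketch that reproduces the output above. It is not the package's implementation: `build_single_job` is a hypothetical stand-in, the rule that exactly one of `smiles` or `ccd` must be given is an assumption, and so is the `CCD_` prefix for CCD codes (it follows the convention used in Protenix's input examples).

```python
def build_single_job(job_name, protein_seq, msa_dir, smiles=None, ccd=None):
    """Hypothetical stand-in for get_single_job: one protein chain plus one ligand."""
    # Assumption: exactly one ligand description is supplied.
    if (smiles is None) == (ccd is None):
        raise ValueError("Provide exactly one of smiles or ccd")
    # Assumption: CCD codes are written with a "CCD_" prefix.
    ligand = smiles if smiles is not None else f"CCD_{ccd}"
    return {
        "name": job_name,
        "sequences": [
            {
                "proteinChain": {
                    "count": 1,
                    "sequence": protein_seq,
                    "msa": {
                        "precomputed_msa_dir": msa_dir,
                        "pairing_db": "uniref100",
                    },
                }
            },
            {"ligand": {"count": 1, "ligand": ligand}},
        ],
    }
```

Calling `build_single_job('job_name', 'AAA', './msa', smiles='CCC')` yields the same dict as the example output shown above.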
get_single_protein_ligand_json
get_single_protein_ligand_json (job_name, protein_seq, msa_dir,
SMILES=None, CCD=None, json_path=None)
Generate the JSON input for one protein-ligand job and, if json_path is given, save it to that file.
# _ = get_single_protein_ligand_json('kras_g12d_mrtx',
# g12d,
# msa_dir='kras_g12d_msa',
# SMILES="C#CC1=C(C=CC2=CC(=CC(=C21)C3=NC=C4C(=C3F)N=C(N=C4N5CC6CCC(C5)N6)OC[C@@]78CCCN7C[C@@H](C8)F)O)F",
# json_path='g12d_mrtx.json'
# )
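Protenix's example inputs are a JSON list of job dicts, so saving a single job presumably means wrapping it in a list before writing. The sketch below assumes that layout; `write_jobs` is a hypothetical helper, not a function from this package.

```python
import json


def write_jobs(jobs, json_path):
    """Write job dicts to json_path as a JSON list and return that list."""
    jobs = list(jobs)  # accept any iterable, including a one-element list
    with open(json_path, "w") as f:
        json.dump(jobs, f, indent=2)
    return jobs
```

For a single job, pass a one-element list, for example `write_jobs([job], 'g12d_mrtx.json')`.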
Use the JSON file as the input for Protenix:
protenix predict --input input.json --out_dir ./output --seeds 101
Different protein-ligand pairs in a DataFrame
get_protein_ligand_df_json
get_protein_ligand_df_json (df, id_col, seq_col, msa_col, smi_col=None,
ccd_col=None, save_json=None)
Generate a JSON file of protein-ligand jobs from a DataFrame, where each row holds a protein and a ligand.
# _ = get_protein_ligand_df_json(df,
# id_col='ID',
# seq_col='sequence',
# msa_col='msa_dir',
# smi_col="SMILES",
# ccd_col=None,
# save_json="input.json")
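The row-by-row conversion can be sketched without pandas by working on plain records. This is an illustration under stated assumptions, not the package's code: `jobs_from_records` is a hypothetical name, each row is assumed to become one job, and only the SMILES case is handled.

```python
def jobs_from_records(records, id_col, seq_col, msa_col, smi_col):
    """Build one job dict per record; each record has its own protein, MSA folder, and ligand."""
    jobs = []
    for row in records:
        jobs.append({
            "name": str(row[id_col]),
            "sequences": [
                {
                    "proteinChain": {
                        "count": 1,
                        "sequence": row[seq_col],
                        "msa": {
                            "precomputed_msa_dir": row[msa_col],
                            "pairing_db": "uniref100",
                        },
                    }
                },
                {"ligand": {"count": 1, "ligand": row[smi_col]}},
            ],
        })
    return jobs
```

With pandas, the records can be obtained from a DataFrame via `df.to_dict("records")`.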
Virtual screening
Screen a single protein against multiple ligands.
get_virtual_screening_json
get_virtual_screening_json (df, protein_seq, msa_dir, id_col,
smi_col=None, ccd_col=None, save_json=None)
Generate a JSON file that pairs a single protein with each ligand in a DataFrame.
# _ = get_virtual_screening_json(df,
# g12d_seq,
# 'kras_g12d_msa',
# id_col='ID',
# smi_col='SMILES',
# save_json='kras_g12d_input.json')
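In a screening run, the protein block stays fixed and only the ligand changes from job to job. The following is a minimal sketch of that idea, assuming one job per ligand row and SMILES input only; `screening_jobs` is a hypothetical name, not the package's function.

```python
def screening_jobs(records, protein_seq, msa_dir, id_col, smi_col):
    """Pair one protein with every ligand record, producing one job dict per ligand."""
    jobs = []
    for row in records:
        # Build a fresh protein block per job so that jobs never share mutable state.
        protein = {
            "proteinChain": {
                "count": 1,
                "sequence": protein_seq,
                "msa": {"precomputed_msa_dir": msa_dir, "pairing_db": "uniref100"},
            }
        }
        ligand = {"ligand": {"count": 1, "ligand": row[smi_col]}}
        jobs.append({"name": str(row[id_col]), "sequences": [protein, ligand]})
    return jobs
```

Naming each job after the ligand ID makes it easier to match outputs back to the compounds after prediction.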