Protenix

This page covers protein-ligand prediction tasks using the Protenix GitHub repository.

Install

git clone https://github.com/bytedance/Protenix.git
cd Protenix
pip install .

Setup

Single job json

Run the protein sequence on an MSA server to get an MSA folder that contains pairing.a3m and unpairing.a3m.

Use this folder as msa_dir.
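Before building jobs, a quick check can catch a misplaced or incomplete MSA folder. This is a minimal sketch that assumes the server output has been downloaded to a local directory and that the two files are named exactly pairing.a3m and unpairing.a3m, as stated above. check_msa_dir is a hypothetical helper and is not part of this package.

```python
from pathlib import Path

# File names taken from the text above; adjust if your MSA server names them differently.
REQUIRED_MSA_FILES = ("pairing.a3m", "unpairing.a3m")


def check_msa_dir(msa_dir):
    """Return the required .a3m files missing from msa_dir (hypothetical helper)."""
    path = Path(msa_dir)
    if not path.is_dir():
        raise FileNotFoundError(f"MSA directory not found: {path}")
    # Keep the order of REQUIRED_MSA_FILES so the result is predictable.
    return [name for name in REQUIRED_MSA_FILES if not (path / name).is_file()]


# Usage: an empty list means the folder is ready to pass as msa_dir.
# missing = check_msa_dir("./msa")
```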


get_single_job

 get_single_job (job_name, protein_seq, msa_dir, SMILES=None, CCD=None)

Build a Protenix job in JSON format (as a Python dict) for one protein and one ligand.

get_single_job('job_name', 'AAA', './msa', SMILES='CCC', CCD=None)
{'name': 'job_name',
 'sequences': [{'proteinChain': {'count': 1,
    'sequence': 'AAA',
    'msa': {'precomputed_msa_dir': './msa', 'pairing_db': 'uniref100'}}},
  {'ligand': {'count': 1, 'ligand': 'CCC'}}]}
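To make the structure of the output above explicit, here is a sketch that reproduces it for the SMILES case. It is not the package's implementation: get_single_job_sketch is hypothetical, it requires exactly one of SMILES or CCD (a design choice of this sketch), and it assumes CCD codes are written with Protenix's "CCD_" prefix (for example "CCD_ATP"), which may differ from how get_single_job handles CCD.

```python
def get_single_job_sketch(job_name, protein_seq, msa_dir, SMILES=None, CCD=None):
    """Build one Protenix job dict for a protein and a ligand (hypothetical sketch)."""
    if (SMILES is None) == (CCD is None):
        raise ValueError("Provide exactly one of SMILES or CCD")
    if SMILES is not None:
        ligand = SMILES
    else:
        # Assumption: Protenix expects CCD codes as "CCD_<code>".
        ligand = CCD if CCD.startswith("CCD_") else f"CCD_{CCD}"
    return {
        "name": job_name,
        "sequences": [
            {"proteinChain": {
                "count": 1,
                "sequence": protein_seq,
                # Precomputed MSA folder from the server, paired against uniref100.
                "msa": {"precomputed_msa_dir": msa_dir, "pairing_db": "uniref100"},
            }},
            {"ligand": {"count": 1, "ligand": ligand}},
        ],
    }
```

Calling get_single_job_sketch('job_name', 'AAA', './msa', SMILES='CCC') returns the same dict as the get_single_job example above.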

get_single_protein_ligand_json

 get_single_protein_ligand_json (job_name, protein_seq, msa_dir,
                                 SMILES=None, CCD=None, json_path=None)

Generate json input for one protein-ligand job.

# _ = get_single_protein_ligand_json('kras_g12d_mrtx',
#                                  g12d,
#                                  msa_dir='kras_g12d_msa',
#                                  SMILES="C#CC1=C(C=CC2=CC(=CC(=C21)C3=NC=C4C(=C3F)N=C(N=C4N5CC6CCC(C5)N6)OC[C@@]78CCCN7C[C@@H](C8)F)O)F",
#                                  json_path='g12d_mrtx.json'
#                                 )
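Saving a job to disk mostly amounts to serializing the dict. A minimal sketch, assuming the Protenix input file is a JSON list of job dicts (as in the Protenix example inputs), so a single job is wrapped in a one-element list. save_protenix_jobs is a hypothetical helper; get_single_protein_ligand_json may write its file differently.

```python
import json
from pathlib import Path


def save_protenix_jobs(jobs, json_path):
    """Write one job dict or a list of job dicts to a Protenix input file (hypothetical)."""
    if isinstance(jobs, dict):
        # Assumption: the input file holds a list of jobs, even for a single job.
        jobs = [jobs]
    path = Path(json_path)
    path.write_text(json.dumps(jobs, indent=2))
    return path


# Usage (job is any dict shaped like the get_single_job output):
# save_protenix_jobs(job, "g12d_mrtx.json")
```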

Use the JSON file as the input file for Protenix:

protenix predict --input input.json --out_dir ./output --seeds 101
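If you prefer to launch predictions from Python (for example, in a notebook), the command above can be wrapped with subprocess. This sketch only reuses the three flags shown above (--input, --out_dir, --seeds); build_predict_cmd and run_predict are hypothetical helpers, and run_predict assumes the protenix CLI is installed and on PATH.

```python
import shutil
import subprocess


def build_predict_cmd(input_json, out_dir="./output", seeds="101"):
    """Assemble the protenix predict command as an argument list (hypothetical helper)."""
    return ["protenix", "predict",
            "--input", str(input_json),
            "--out_dir", str(out_dir),
            "--seeds", str(seeds)]


def run_predict(input_json, out_dir="./output", seeds="101"):
    """Run the command; raises CalledProcessError if protenix exits with an error."""
    if shutil.which("protenix") is None:
        raise RuntimeError("protenix CLI not found on PATH; install Protenix first")
    return subprocess.run(build_predict_cmd(input_json, out_dir, seeds), check=True)


# Usage:
# run_predict("input.json", out_dir="./output", seeds="101")
```

Passing an argument list instead of a shell string avoids quoting problems with paths that contain spaces.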

Different protein-ligand pairs in a DataFrame


get_protein_ligand_df_json

 get_protein_ligand_df_json (df, id_col, seq_col, msa_col, smi_col=None,
                             ccd_col=None, save_json=None)

Build a JSON file with one job per protein-ligand pair (one row each) in a DataFrame.

# _ = get_protein_ligand_df_json(df,
#                                id_col='ID',
#                                seq_col='sequence',
#                                msa_col='msa_dir',
#                                smi_col="SMILES",
#                                ccd_col=None,
#                                save_json="input.json")
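Conceptually, the DataFrame version builds one job per row, reading the name, sequence, MSA folder and ligand from the named columns. The sketch below illustrates that idea under the same assumptions as the single-job sketch (exactly one ligand column, CCD codes prefixed with "CCD_"); df_to_jobs_sketch is hypothetical and expects a pandas DataFrame. The returned list can then be written with json.dump.

```python
def df_to_jobs_sketch(df, id_col, seq_col, msa_col, smi_col=None, ccd_col=None):
    """Build one Protenix job dict per DataFrame row (hypothetical sketch)."""
    if (smi_col is None) == (ccd_col is None):
        raise ValueError("Provide exactly one of smi_col or ccd_col")
    lig_col = smi_col if smi_col is not None else ccd_col
    jobs = []
    for row in df.to_dict(orient="records"):
        ligand = str(row[lig_col])
        if ccd_col is not None and not ligand.startswith("CCD_"):
            ligand = f"CCD_{ligand}"  # assumed Protenix CCD prefix
        jobs.append({
            "name": str(row[id_col]),
            "sequences": [
                {"proteinChain": {"count": 1,
                                  "sequence": row[seq_col],
                                  "msa": {"precomputed_msa_dir": row[msa_col],
                                          "pairing_db": "uniref100"}}},
                {"ligand": {"count": 1, "ligand": ligand}},
            ],
        })
    return jobs
```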

Virtual screening

Screen a single protein against multiple ligands.


get_virtual_screening_json

 get_virtual_screening_json (df, protein_seq, msa_dir, id_col,
                             smi_col=None, ccd_col=None, save_json=None)

Build a JSON file that pairs a single protein with each ligand (SMILES or CCD code) in a DataFrame.

# _ = get_virtual_screening_json(df,
#                                g12d_seq,
#                                'kras_g12d_msa',
#                                id_col='ID',
#                                smi_col='SMILES',
#                                save_json='kras_g12d_input.json')
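Virtual screening differs from the previous case only in that the protein sequence and MSA folder are fixed while the ligand changes per row. A minimal sketch under the same assumptions as above (one ligand column, "CCD_" prefix for CCD codes); screening_jobs_sketch is hypothetical and expects a pandas DataFrame.

```python
def screening_jobs_sketch(df, protein_seq, msa_dir, id_col, smi_col=None, ccd_col=None):
    """One job per DataFrame row, all sharing the same protein and MSA (hypothetical)."""
    if (smi_col is None) == (ccd_col is None):
        raise ValueError("Provide exactly one of smi_col or ccd_col")
    lig_col = smi_col if smi_col is not None else ccd_col
    # The protein entry is identical for every job, so build it once.
    protein = {"proteinChain": {"count": 1,
                                "sequence": protein_seq,
                                "msa": {"precomputed_msa_dir": msa_dir,
                                        "pairing_db": "uniref100"}}}
    jobs = []
    for row in df.to_dict(orient="records"):
        ligand = str(row[lig_col])
        if ccd_col is not None and not ligand.startswith("CCD_"):
            ligand = f"CCD_{ligand}"  # assumed Protenix CCD prefix
        jobs.append({"name": str(row[id_col]),
                     "sequences": [protein, {"ligand": {"count": 1, "ligand": ligand}}]})
    return jobs
```

Reusing one protein dict is safe for writing JSON, but copy it (for example with copy.deepcopy) if you plan to edit individual jobs afterwards.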
