Protenix

This page prepares protein-ligand prediction jobs for Protenix, installed from its GitHub repository.

Install

git clone https://github.com/bytedance/Protenix.git
cd Protenix
pip install .

Setup

Single job json

Run the protein sequence on an MSA server to get an MSA folder containing pairing.a3m and unpairing.a3m.

Use this folder as msa_dir.
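Before building any job, it can help to confirm that the MSA folder has the two files named above. The sketch below does only that. check_msa_dir is a hypothetical helper, not part of this module or of Protenix.

```python
from pathlib import Path

# File names the msa_dir is expected to contain, per the text above.
REQUIRED_MSA_FILES = ("pairing.a3m", "unpairing.a3m")


def check_msa_dir(msa_dir):
    """Hypothetical helper: return the expected .a3m files missing from msa_dir."""
    path = Path(msa_dir)
    if not path.is_dir():
        raise FileNotFoundError(f"MSA folder not found: {path}")
    return [name for name in REQUIRED_MSA_FILES if not (path / name).is_file()]
```

An empty list means the folder looks ready to pass as msa_dir.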


get_single_job


def get_single_job(
    job_name, protein_seq, msa_dir, SMILES=None, CCD=None
):

Get the Protenix JSON entry (a dict) for one protein and one ligand.

get_single_job('job_name', 'AAA', './msa', SMILES='CCC', CCD=None)

{'name': 'job_name',
 'sequences': [{'proteinChain': {'count': 1,
    'sequence': 'AAA',
    'msa': {'precomputed_msa_dir': './msa', 'pairing_db': 'uniref100'}}},
  {'ligand': {'count': 1, 'ligand': 'CCC'}}]}
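To make the format concrete, here is a minimal stand-alone sketch that rebuilds the dict shown above. make_job is a hypothetical name and is not this module's code. Three details are assumptions: that exactly one of SMILES or CCD is given, that CCD codes are written with the "CCD_" prefix that Protenix's input format uses for CCD ligands (for example "CCD_ATP"), and that pairing_db is always "uniref100".

```python
def make_job(job_name, protein_seq, msa_dir, SMILES=None, CCD=None):
    """Sketch of the job dict shown above (not the library's actual code)."""
    # Assumption: exactly one ligand representation must be supplied.
    if (SMILES is None) == (CCD is None):
        raise ValueError("Provide exactly one of SMILES or CCD")
    if SMILES is not None:
        ligand = SMILES
    else:
        # Assumption: CCD codes are prefixed with "CCD_" unless already prefixed.
        ligand = CCD if CCD.startswith("CCD_") else f"CCD_{CCD}"
    return {
        "name": job_name,
        "sequences": [
            {"proteinChain": {
                "count": 1,
                "sequence": protein_seq,
                "msa": {"precomputed_msa_dir": msa_dir, "pairing_db": "uniref100"},
            }},
            {"ligand": {"count": 1, "ligand": ligand}},
        ],
    }
```

With the example arguments, make_job('job_name', 'AAA', './msa', SMILES='CCC') returns the same dict as the output above.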


get_single_protein_ligand_json


def get_single_protein_ligand_json(
    job_name, protein_seq, msa_dir, SMILES=None, CCD=None, json_path=None
):

Generate the Protenix JSON input for one protein-ligand job.

# _ = get_single_protein_ligand_json('kras_g12d_mrtx',
#                                  g12d,
#                                  msa_dir='kras_g12d_msa',
#                                  SMILES="C#CC1=C(C=CC2=CC(=CC(=C21)C3=NC=C4C(=C3F)N=C(N=C4N5CC6CCC(C5)N6)OC[C@@]78CCCN7C[C@@H](C8)F)O)F",
#                                  json_path='g12d_mrtx.json'
#                                 )
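get_single_protein_ligand_json appears to build the same job dict and save it to json_path. The sketch below shows only the saving step. It assumes that Protenix reads a JSON list of job dicts, as its example input files are lists. write_protenix_json is a hypothetical helper.

```python
import json


def write_protenix_json(jobs, json_path):
    """Hypothetical helper: save one or more job dicts as a Protenix input file."""
    # Assumption: the input file is a JSON list of jobs, so a single dict is wrapped.
    if isinstance(jobs, dict):
        jobs = [jobs]
    with open(json_path, "w") as f:
        json.dump(jobs, f, indent=2)
    return jobs
```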

Use the JSON file as the input for Protenix:

protenix predict --input input.json --out_dir ./output --seeds 101
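The same command can be launched from Python, for example inside a notebook. This sketch passes only the flags in the command above and adds no others. protenix_predict_cmd and run_protenix_predict are hypothetical names.

```python
import shlex
import subprocess


def protenix_predict_cmd(input_json, out_dir="./output", seed=101):
    """Build the `protenix predict` call above as an argument list (same flags only)."""
    return ["protenix", "predict",
            "--input", str(input_json),
            "--out_dir", str(out_dir),
            "--seeds", str(seed)]


def run_protenix_predict(input_json, out_dir="./output", seed=101):
    """Run the CLI; raises CalledProcessError if protenix exits non-zero."""
    cmd = protenix_predict_cmd(input_json, out_dir, seed)
    print(shlex.join(cmd))  # echo the equivalent shell command
    return subprocess.run(cmd, check=True)
```

Passing an argument list instead of a single string avoids shell-quoting problems when paths contain spaces.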

Different protein-ligand pairs in a DataFrame


get_protein_ligand_df_json


def get_protein_ligand_df_json(
    df, id_col, seq_col, msa_col, smi_col=None, ccd_col=None, save_json=None
):

Build a Protenix JSON file from protein-ligand pairs listed in a DataFrame.

# _ = get_protein_ligand_df_json(df,
#                                id_col='ID',
#                                seq_col='sequence',
#                                msa_col='msa_dir',
#                                smi_col="SMILES",
#                                ccd_col=None,
#                                save_json="input.json")
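The following is a minimal sketch of the per-row logic. It assumes that each row becomes one job with its own sequence, MSA folder and ligand. To avoid a pandas dependency, it takes rows as a list of dicts, which is what df.to_dict("records") returns for a pandas DataFrame. rows_to_jobs is a hypothetical name. SMILES taking precedence over CCD and the "CCD_" prefixing are both assumptions.

```python
import json


def _ligand_string(row, smi_col, ccd_col):
    # Assumption: SMILES wins when both columns are given; CCD codes get "CCD_".
    if smi_col is not None:
        return row[smi_col]
    code = str(row[ccd_col])
    return code if code.startswith("CCD_") else f"CCD_{code}"


def rows_to_jobs(rows, id_col, seq_col, msa_col, smi_col=None, ccd_col=None, save_json=None):
    """Sketch: one Protenix job per row; rows is e.g. df.to_dict("records")."""
    if smi_col is None and ccd_col is None:
        raise ValueError("Provide smi_col or ccd_col")
    jobs = []
    for row in rows:
        jobs.append({
            "name": str(row[id_col]),
            "sequences": [
                {"proteinChain": {
                    "count": 1,
                    "sequence": row[seq_col],
                    "msa": {"precomputed_msa_dir": row[msa_col], "pairing_db": "uniref100"},
                }},
                {"ligand": {"count": 1, "ligand": _ligand_string(row, smi_col, ccd_col)}},
            ],
        })
    if save_json is not None:
        # Write all jobs into one JSON list, the same shape used for a single job.
        with open(save_json, "w") as f:
            json.dump(jobs, f, indent=2)
    return jobs
```

Usage with a DataFrame would look like rows_to_jobs(df.to_dict("records"), "ID", "sequence", "msa_dir", smi_col="SMILES", save_json="input.json").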

Virtual screening

Single protein against multiple ligands


get_virtual_screening_json


def get_virtual_screening_json(
    df, protein_seq, msa_dir, id_col, smi_col=None, ccd_col=None, save_json=None
):

Build a Protenix JSON file that pairs a single protein with each ligand listed in a DataFrame.

# _ = get_virtual_screening_json(df,
#                                g12d_seq,
#                                'kras_g12d_msa',
#                                id_col='ID',
#                                smi_col='SMILES',
#                                save_json='kras_g12d_input.json')
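Here is a minimal sketch of the screening layout, under the same assumptions as above: rows as a list of dicts, SMILES preferred over CCD, and CCD codes prefixed with "CCD_". It also assumes one job per ligand, all sharing the same protein sequence and MSA folder. screening_jobs is a hypothetical name.

```python
def screening_jobs(rows, protein_seq, msa_dir, id_col, smi_col=None, ccd_col=None):
    """Sketch: pair one protein (and its single MSA folder) with every ligand row."""
    if smi_col is None and ccd_col is None:
        raise ValueError("Provide smi_col or ccd_col")
    jobs = []
    for row in rows:
        if smi_col is not None:
            ligand = row[smi_col]
        else:
            code = str(row[ccd_col])
            ligand = code if code.startswith("CCD_") else f"CCD_{code}"
        jobs.append({
            "name": str(row[id_col]),
            "sequences": [
                {"proteinChain": {
                    "count": 1,
                    "sequence": protein_seq,
                    # Every job points at the same precomputed MSA folder.
                    "msa": {"precomputed_msa_dir": msa_dir, "pairing_db": "uniref100"},
                }},
                {"ligand": {"count": 1, "ligand": ligand}},
            ],
        })
    return jobs
```

The MSA depends only on the protein, so the folder is computed once and reused for every ligand in the screen. The resulting list can be saved with json.dump in the same way as the other inputs.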
