Data

Setup

Compound datasets



fetch_csv

 fetch_csv (url)
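`fetch_csv` is listed without a docstring. The sketch below shows what a minimal helper with this signature could look like, assuming it simply reads a CSV from a URL into a pandas DataFrame. The name `fetch_csv_sketch` is hypothetical and is not the library's code.

```python
import pandas as pd


def fetch_csv_sketch(url: str, **kwargs) -> pd.DataFrame:
    """Hypothetical stand-in for fetch_csv: read a CSV from a URL or local path.

    pandas.read_csv already accepts http(s), ftp, s3, gs and file URLs as well
    as local paths; extra keyword arguments are passed through unchanged.
    """
    return pd.read_csv(url, **kwargs)


# Usage (hypothetical URL):
# df = fetch_csv_sketch("https://example.com/data.csv")
```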


Collins

 Collins ()

A class for loading compound datasets from the Collins lab.

Collins lab dataset

A list of the lab's publications is available on the lab page.



Collins.get_antibiotics_2k

 Collins.get_antibiotics_2k ()

Antibiotic screening dataset of 2,560 compounds tested at 50 µM against E. coli K12 BW25113; 2,335 unique compounds remain after deduplication. Source: Table S1B of the 2020 Cell paper "A Deep Learning Approach to Antibiotic Discovery".

Collins.get_antibiotics_2k()
name SMILES inhibition activity
0 CEFPIRAMIDE Cc1cc(O)c(C(=O)NC(C(=O)NC2C(=O)N3C(C(=O)O)=C(C... 0.041572 1
1 GEMIFLOXACIN MESYLATE CON=C1CN(c2nc3c(cc2F)c(=O)c(C(=O)O)cn3C2CC2)CC... 0.041876 1
... ... ... ... ...
2333 EVANS BLUE Cc1cc(-c2ccc(N=Nc3ccc4c(S(=O)(=O)[O-])cc(S(=O)... 2.263200 0
2334 PROTOPORPHYRINOGEN IX C=Cc1c(C)c2cc3[nH]c(cc4nc(cc5[nH]c(cc1n2)c(C)c... 2.627450 0

2335 rows × 4 columns
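The description mentions deduplication from 2,560 screened compounds down to 2,335 unique ones, but the rule used is not documented. The sketch below shows one common approach on toy data: group rows by identical SMILES strings, keep the first name and average the inhibition values. Both the grouping key and the averaging rule are assumptions, and the toy values are hypothetical.

```python
import pandas as pd

# Toy screening table (hypothetical values, not from the dataset): the same
# SMILES appears twice, as happens when a library contains duplicate entries.
screen = pd.DataFrame({
    "name": ["CMPD_A", "CMPD_A_SALT", "CMPD_B"],
    "SMILES": ["CCO", "CCO", "c1ccccc1"],
    "inhibition": [0.10, 0.30, 1.20],
})

# Assumed rule: one row per unique SMILES string, keeping the first name and
# the mean inhibition; sort=False preserves the order of first appearance.
dedup = (
    screen.groupby("SMILES", as_index=False, sort=False)
    .agg(name=("name", "first"), inhibition=("inhibition", "mean"))
)
print(len(screen), "->", len(dedup))  # 3 -> 2
```

Identical-string matching only catches duplicates written the same way; structures encoded with different SMILES notations would need canonicalization first.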



Collins.get_antibiotics_39k

 Collins.get_antibiotics_39k ()

Antibiotic screening dataset of 39,128 compounds tested at 50 µM against E. coli K12 BW25113. Source: Supplementary Dataset EV1 of the 2022 Molecular Systems Biology paper "Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery".

Collins.get_antibiotics_39k()
CANONICAL_SMILES NAME R1_50uM R2_50uM MEAN_50uM activity
0 O=C1NC(=O)C=C1 BRD-K78206682 0.029762 0.028344 0.029053 1
1 CC(C)CCCC(=O)N[C@@H](CCNCS(=O)(=O)O)C(=O)N[C@@... BRD-K01666924 0.029695 0.030176 0.029935 1
... ... ... ... ... ... ...
39126 COC(=O)CCc1c(c/2[nH]c1/C=C/1\N=C(/C=C/3\N=C(/C... BRD-K81849500 1.924381 1.924847 1.924614 0
39127 Cc1cc(-c2ccc(N=Nc3ccc4c(S(=O)(=O)[O-])cc(S(=O)... EVANS BLUE 2.242000 2.284400 2.263200 0

39128 rows × 6 columns
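In the rows displayed above, MEAN_50uM equals the arithmetic mean of the two replicate columns R1_50uM and R2_50uM, to the six decimals shown. The snippet checks this on those four rows only and then selects the active compounds. It assumes, without verifying it on the full table, that the same relationship holds for every row.

```python
import pandas as pd

# The four rows printed above (SMILES and NAME omitted).
reps = pd.DataFrame({
    "R1_50uM": [0.029762, 0.029695, 1.924381, 2.242000],
    "R2_50uM": [0.028344, 0.030176, 1.924847, 2.284400],
    "MEAN_50uM": [0.029053, 0.029935, 1.924614, 2.263200],
    "activity": [1, 1, 0, 0],
})

# Recompute the replicate mean and compare it with the stored column.
recomputed = reps[["R1_50uM", "R2_50uM"]].mean(axis=1)
print((recomputed - reps["MEAN_50uM"]).abs().max() < 1e-6)  # True

# Rows labelled active (activity == 1).
actives = reps[reps["activity"] == 1]
```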



Collins.get_antibiotics_enzyme

 Collins.get_antibiotics_enzyme ()

Enzymatic inhibition dataset of 218 compounds tested at 100 µM against 12 essential E. coli K12 BW25113 proteins, flattened to one row per compound/enzyme pair (218 × 12 = 2,616 rows). Source: the benchmark dataset (Supplementary Dataset EV4) of the 2022 Molecular Systems Biology paper "Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery".

Collins.get_antibiotics_enzyme()
compound_ID SMILES enzyme_uniprot enzyme enzyme_type rep1 rep2 both_less_05
0 BRD-K78206682 O=C1NC(=O)C=C1 P0AES4, P0AES6 gyrA, gyrB DNA gyrase 0.610697 0.411738 0
1 BRD-K01666924 CC(C)CCCC(=O)N[C@@H](CCNCS(=O)(=O)O)C(=O)N[C@@... P0AES4, P0AES6 gyrA, gyrB DNA gyrase 0.574242 0.536372 0
... ... ... ... ... ... ... ... ...
2614 OXYQUINOLINE HEMISULFATE O=S(=O)(O)O.Oc1cccc2cccnc12 P11880 murF MurF 0.746858 1.154308 0
2615 CHLOROXINE Oc1c(Cl)cc(Cl)c2cccnc12 P11880 murF MurF 0.851553 1.108231 0

2616 rows × 8 columns
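Because the table is in long format (one row per compound/enzyme pair), a compound × enzyme matrix can be recovered with a pivot. The sketch uses the four rows shown above. It also checks one reading of the `both_less_05` column, namely that it flags pairs where both replicates fall below 0.5. That interpretation is inferred from the column name and is an assumption.

```python
import pandas as pd

# The four rows printed above (SMILES, UniProt IDs and enzyme_type omitted).
df = pd.DataFrame({
    "compound_ID": ["BRD-K78206682", "BRD-K01666924", "OXYQUINOLINE HEMISULFATE", "CHLOROXINE"],
    "enzyme": ["gyrA, gyrB", "gyrA, gyrB", "murF", "murF"],
    "rep1": [0.610697, 0.574242, 0.746858, 0.851553],
    "rep2": [0.411738, 0.536372, 1.154308, 1.108231],
    "both_less_05": [0, 0, 0, 0],
})

# Assumption: both_less_05 == 1 when rep1 < 0.5 and rep2 < 0.5.
recomputed = ((df["rep1"] < 0.5) & (df["rep2"] < 0.5)).astype(int)
assert (recomputed == df["both_less_05"]).all()

# Average the replicates, then reshape to one row per compound and one column
# per enzyme; pairs missing from this small subset become NaN.
df["rep_mean"] = df[["rep1", "rep2"]].mean(axis=1)
wide = df.pivot_table(index="compound_ID", columns="enzyme", values="rep_mean")
print(wide)
```

On the full table this should give a 218 × 12 matrix, provided every compound was tested against every target, as the row count suggests.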

Kras datasets



Kras

 Kras ()

A class for fetching various KRAS datasets.



Kras.get_mirati_g12d_raw

 Kras.get_mirati_g12d_raw ()

Raw KRAS G12D dataset compiled from the Mirati paper and patents, without deduplication.

Kras.get_mirati_g12d_raw()
ID SMILES group with_3F racemic_trans mixture_isomer trans Kd IC50 erk_IC50
0 US_1 CN1CCC[C@H]1COc1nc(N2CC3CCC(C2)N3)c2cnc(cc2n1)... US 0 0 0 0 97.7 124.7 3159.1
1 US_2 CN1CCC[C@H]1COc1nc(N2CC3CCC(C2)N3)c2cnc(c(F)c2... US 1 0 0 0 2.4 2.7 721.4
... ... ... ... ... ... ... ... ... ... ...
720 paper_37 FC1=C(C2=C(C(Cl)=CC=C3)C3=CC(O)=C2)N=CC4=C1N=C... paper 0 0 0 0 NaN 2.0 63.0
721 paper_38 FC1=C(C2=C(C(C#C)=CC=C3)C3=CC(O)=C2)N=CC4=C1N=... paper 0 0 0 0 NaN 2.0 14.0

722 rows × 10 columns
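The `group` column records whether a row comes from a patent (`US`) or from the paper, and Kd is missing for the paper rows shown. The snippet below, built from the four rows printed above, counts rows per source and keeps only rows that have a measured Kd. It is an illustrative filter, not the library's deduplication step.

```python
import pandas as pd

# The four rows printed above (SMILES omitted).
raw = pd.DataFrame({
    "ID": ["US_1", "US_2", "paper_37", "paper_38"],
    "group": ["US", "US", "paper", "paper"],
    "with_3F": [0, 1, 0, 0],
    "racemic_trans": [0, 0, 0, 0],
    "mixture_isomer": [0, 0, 0, 0],
    "trans": [0, 0, 0, 0],
    "Kd": [97.7, 2.4, float("nan"), float("nan")],
    "IC50": [124.7, 2.7, 2.0, 2.0],
    "erk_IC50": [3159.1, 721.4, 63.0, 14.0],
})

# Rows per source; groupby sorts keys, and "US" sorts before "paper".
print(raw.groupby("group").size().to_dict())  # {'US': 2, 'paper': 2}

# Keep only rows with a measured Kd (drops the paper rows shown here).
with_kd = raw.dropna(subset=["Kd"])
```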



Kras.get_mirati_g12d

 Kras.get_mirati_g12d ()

Deduplicated KRAS G12D dataset from the Mirati paper and patents (722 raw rows reduced to 660).

Kras.get_mirati_g12d()
ID SMILES Kd IC50 erk_IC50
0 US_1 CN1CCC[C@H]1COc1nc(N2CC3CCC(C2)N3)c2cnc(cc2n1)... 97.7 124.7 3159.1
1 US_4 Oc1cc(-c2ncc3c(nc(OCCc4ccccn4)nc3c2F)N2CC3CCC(... 155.7 496.2 8530.0
... ... ... ... ... ...
658 US_56 OC[C@@H](O)COc1nc(N2CC3CCC(C2)N3)c2cnc(c(F)c2n... 13805.3 6024.0 NaN
659 US_66 Fc1c(ncc2c(nc(OC[C@@]34CCCN3C(CCl)CC4)nc12)N1C... NaN 273.8 1332.6

660 rows × 5 columns
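Potency columns such as Kd, IC50 and erk_IC50 are often modelled on a negative log scale. The units are not stated here, so the sketch assumes nanomolar values, for which p(value) = −log10(value × 10⁻⁹) = 9 − log10(value). The helper name `to_p` is hypothetical. Missing measurements stay NaN.

```python
import numpy as np
import pandas as pd

# The four rows printed above (SMILES omitted); nM units are an assumption.
g12d = pd.DataFrame({
    "ID": ["US_1", "US_4", "US_56", "US_66"],
    "Kd": [97.7, 155.7, 13805.3, np.nan],
    "IC50": [124.7, 496.2, 6024.0, 273.8],
    "erk_IC50": [3159.1, 8530.0, np.nan, 1332.6],
})


def to_p(values_nm: pd.Series) -> pd.Series:
    """Convert a potency in nM to -log10 of the molar value (e.g. pIC50)."""
    return 9 - np.log10(values_nm)


for col in ["Kd", "IC50", "erk_IC50"]:
    g12d[f"p{col}"] = to_p(g12d[col])  # NaN inputs give NaN outputs
```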



Kras.get_seq

 Kras.get_seq ()

Protein sequences of two human KRAS isoforms (kras_human and kras_human_isoform2b), each with its G12D and G12C mutant.

Kras.get_seq()
ID WT_sequence g12d_seq g12c_seq
0 kras_human MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... MTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... MTEYKLVVVGACGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI...
1 kras_human_isoform2b MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... MTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI... MTEYKLVVVGACGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI...
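The mutant sequences differ from the wild type at a single residue, position 12 (glycine to aspartate or cysteine). The sketch below confirms this by comparing sequences position by position. Because the printed output is truncated, it uses only the prefix shown. The helper `substitutions` is hypothetical and assumes equal-length, already-aligned sequences.

```python
from typing import List

# Prefixes of the sequences printed above (the full strings come from Kras.get_seq()).
WT = "MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI"
G12D = "MTEYKLVVVGADGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI"
G12C = "MTEYKLVVVGACGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVI"


def substitutions(wt: str, mutant: str) -> List[str]:
    """List point substitutions as <wt residue><1-based position><mutant residue>."""
    if len(wt) != len(mutant):
        raise ValueError("sequences must have equal length")
    return [f"{a}{i}{b}" for i, (a, b) in enumerate(zip(wt, mutant), start=1) if a != b]


print(substitutions(WT, G12D))  # ['G12D']
print(substitutions(WT, G12C))  # ['G12C']
```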

End