Datahub

from katlas.core import *
import pandas as pd

Kinase basic information

Kinase group, family, and subfamily annotations come from the Coral GitHub repository; protein sequences come from UniProt; kinase domain sequences come from kinase.com; and subcellular location data come from Zhang et al.

info = Data.get_kinase_info()
info
kinase ID_coral uniprot ID_HGNC group family subfamily_coral subfamily in_ST_paper in_Tyr_paper ... cytosol cytoskeleton plasma membrane mitochondrion Golgi apparatus endoplasmic reticulum vesicle centrosome aggresome main_location
0 AAK1 AAK1 Q2M2I8 AAK1 Other NAK NaN NAK 1 0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 ABL1 ABL1 P00519 ABL1 TK Abl NaN Abl 0 1 ... 6.0 NaN 4.0 NaN NaN NaN NaN NaN NaN cytosol
2 ABL2 ABL2 P42684 ABL2 TK Abl NaN Abl 0 1 ... 4.0 6.0 NaN NaN NaN NaN NaN NaN NaN cytoskeleton
3 TNK2 ACK Q07912 TNK2 TK Ack NaN Ack 0 1 ... NaN NaN NaN NaN NaN NaN 8.0 NaN 2.0 vesicle
4 ACVR2A ACTR2 P27037 ACVR2A TKL STKR STKR2 STKR2 1 0 ... 5.0 NaN NaN NaN NaN 5.0 NaN NaN NaN cytosol
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
518 YSK1 YSK1 O00506 STK25 STE STE20 YSK YSK 1 0 ... 6.0 NaN NaN NaN 4.0 NaN NaN NaN NaN cytosol
519 ZAK ZAK Q9NYL2 MAP3K20 TKL MLK ZAK ZAK 1 0 ... 5.0 NaN NaN NaN NaN NaN NaN NaN NaN nucleus
520 ZAP70 ZAP70 P43403 ZAP70 TK Syk NaN Syk 0 1 ... 5.0 NaN 2.0 NaN NaN NaN NaN NaN NaN cytosol
521 EEF2K eEF2K O00418 EEF2K Atypical Alpha eEF2K eEF2K 1 0 ... 9.0 NaN 1.0 NaN NaN NaN NaN NaN NaN cytosol
522 FAM20C FAM20C Q8IXL6 FAM20C Atypical FAM20C NaN FAM20C 1 0 ... 2.0 NaN NaN NaN 7.0 1.0 NaN NaN NaN Golgi apparatus

523 rows × 30 columns

info.columns
Index(['kinase', 'ID_coral', 'uniprot', 'ID_HGNC', 'group', 'family',
       'subfamily_coral', 'subfamily', 'in_ST_paper', 'in_Tyr_paper',
       'in_cddm', 'pseudo', 'pspa_category_small', 'pspa_category_big',
       'cddm_big', 'cddm_small', 'length', 'human_uniprot_sequence',
       'kinasecom_domain', 'nucleus', 'cytosol', 'cytoskeleton',
       'plasma membrane', 'mitochondrion', 'Golgi apparatus',
       'endoplasmic reticulum', 'vesicle', 'centrosome', 'aggresome',
       'main_location'],
      dtype='object')
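
In the rows shown above, main_location coincides with the location column holding the largest value (for example, ABL2 has cytosol 4.0 and cytoskeleton 6.0, and its main_location is cytoskeleton). The sketch below reproduces that pattern on toy dictionaries. It is only an assumption about how the column was derived, the helper name guess_main_location is hypothetical, and the nucleus values are hidden by the "..." in the display, so only the visible values are used.

```python
import math

# Location columns in the order listed by info.columns above.
LOCATION_COLUMNS = [
    "nucleus", "cytosol", "cytoskeleton", "plasma membrane", "mitochondrion",
    "Golgi apparatus", "endoplasmic reticulum", "vesicle", "centrosome",
    "aggresome",
]


def guess_main_location(scores):
    """Return the highest-scoring location, or None if every score is missing.

    `scores` maps a location name to a number or NaN. Ties go to the location
    that comes first in LOCATION_COLUMNS (an assumption, not documented).
    """
    best_name, best_score = None, float("-inf")
    for name in LOCATION_COLUMNS:
        value = scores.get(name, float("nan"))
        if math.isnan(value):
            continue  # skip missing annotations
        if value > best_score:
            best_name, best_score = name, value
    return best_name


nan = float("nan")
# Visible (non-elided) values from three rows of the table above.
abl2 = {"cytosol": 4.0, "cytoskeleton": 6.0}
fam20c = {"cytosol": 2.0, "Golgi apparatus": 7.0, "endoplasmic reticulum": 1.0}
aak1 = {"cytosol": nan, "vesicle": nan}

abl2_location = guess_main_location(abl2)      # "cytoskeleton"
fam20c_location = guess_main_location(fam20c)  # "Golgi apparatus"
aak1_location = guess_main_location(aak1)      # None
```

With a real DataFrame, the same idea would typically be expressed as a row-wise maximum over the location columns.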

Kinase-substrate dataset

A combination of the kinase-substrate datasets from Sugiyama et al. and PhosphoSitePlus.

Data.get_ks_dataset()
kin_sub_site kinase_uniprot substrate_uniprot site source substrate_genes substrate_phosphoseq position site_seq sub_site
0 O00141_A4FU28_S140 O00141 A4FU28 S140 Sugiyama CTAGE9 MEEPGATPQPYLGLVLEELGRVVAALPESMRPDENPYGFPSELVVC... 140 AAAEEARSLEATCEKLSRsNsELEDEILCLEKDLKEEKSKH A4FU28_S140
1 O00141_O00141_S252 O00141 O00141 S252 Sugiyama SGK1 SGK MTVKTEAAKGTLTYSRMRGMVAILIAFMKQRRMGLNDFIQKIANNS... 252 SQGHIVLTDFGLCKENIEHNsTtstFCGtPEyLAPEVLHKQ O00141_S252
2 O00141_O00141_S255 O00141 O00141 S255 Sugiyama SGK1 SGK MTVKTEAAKGTLTYSRMRGMVAILIAFMKQRRMGLNDFIQKIANNS... 255 HIVLTDFGLCKENIEHNsTtstFCGtPEyLAPEVLHKQPYD O00141_S255
3 O00141_O00141_S397 O00141 O00141 S397 Sugiyama SGK1 SGK MTVKTEAAKGTLTYSRMRGMVAILIAFMKQRRMGLNDFIQKIANNS... 397 sGPNDLRHFDPEFTEEPVPNsIGKsPDsVLVTAsVKEAAEA O00141_S397
4 O00141_O00141_S404 O00141 O00141 S404 Sugiyama SGK1 SGK MTVKTEAAKGTLTYSRMRGMVAILIAFMKQRRMGLNDFIQKIANNS... 404 HFDPEFTEEPVPNsIGKsPDsVLVTAsVKEAAEAFLGFsYA O00141_S404
... ... ... ... ... ... ... ... ... ... ...
187061 Q9Y6R4_P62273_Y7 Q9Y6R4 P62273 Y7 Sugiyama RPS29 MGHQQLyWsHPRKFGQGSRSCRVCSNRHGLIRKyGLNMCRQCFRQY... 7 ______________MGHQQLyWsHPRKFGQGSRSCRVCSNR P62273_Y7
187062 Q9Y6R4_Q86W56_Y832 Q9Y6R4 Q86W56 Y832 Sugiyama PARG MNAGPGCEPCTKRPRWGAATtsPAASDARSFPSRQRRVLDPKDAHV... 832 DDWQRRCTEIVAIDALHFRRyLDQFVPEKMRRELNKAYCGF Q86W56_Y832
187063 Q9Y6R4_Q9Y6R4_T1324 Q9Y6R4 Q9Y6R4 T1324 Sugiyama MAP3K4 KIAA0213 MAPKKK4 MEKK4 MTK1 MREAAAALVPPPAFAVTPAAAMEEPPPPPPPPPPPPEPETESEPEC... 1324 FEEKRYREMRRKNIIGQVCDtPKSyDNVMHVGLRKVTFKWQ Q9Y6R4_T1324
187064 Q9Y6R4_Q9Y6R4_T1494 Q9Y6R4 Q9Y6R4 T1494 SIGNOR|EPSD|PSP MAP3K4 KIAA0213 MAPKKK4 MEKK4 MTK1 MREAAAALVPPPAFAVTPAAAMEEPPPPPPPPPPPPEPETESEPEC... 1494 SGLIKLGDFGCSVKLKNNAQtMPGEVNSTLGTAAYMAPEVI Q9Y6R4_T1494
187065 Q9Y6R4_Q9Y6R4_Y1328 Q9Y6R4 Q9Y6R4 Y1328 Sugiyama MAP3K4 KIAA0213 MAPKKK4 MEKK4 MTK1 MREAAAALVPPPAFAVTPAAAMEEPPPPPPPPPPPPEPETESEPEC... 1328 RYREMRRKNIIGQVCDtPKSyDNVMHVGLRKVTFKWQRGNK Q9Y6R4_Y1328

187066 rows × 10 columns
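
The identifier columns in this table follow a visible pattern: kin_sub_site joins kinase_uniprot, substrate_uniprot, and site with underscores, and site_seq looks like a 41-residue window (20 residues on each side of the site) padded with "_" at the protein ends. The following is a minimal sketch of both conventions, not the library's implementation; the function names are hypothetical, and the sequence is only the visible prefix of the RPS29 entry shown above.

```python
def parse_kin_sub_site(key):
    """Split 'KINASE_SUBSTRATE_SITE' into its parts plus the numeric position."""
    kinase_uniprot, substrate_uniprot, site = key.split("_")
    return {
        "kinase_uniprot": kinase_uniprot,
        "substrate_uniprot": substrate_uniprot,
        "site": site,
        "sub_site": f"{substrate_uniprot}_{site}",
        "position": int(site[1:]),  # 'Y7' -> 7
    }


def site_window(sequence, position, flank=20, pad="_"):
    """Return the residues within `flank` of a 1-based position, padded with `pad`."""
    center = position - 1
    padded = pad * flank + sequence + pad * flank
    # After padding, the window starts at index `center` and spans 2*flank + 1 residues.
    return padded[center:center + 2 * flank + 1]


parsed = parse_kin_sub_site("Q9Y6R4_P62273_Y7")

# Visible prefix of substrate_phosphoseq for P62273 (the rest is elided above).
rps29_prefix = "MGHQQLyWsHPRKFGQGSRSCRVCSNRHGLIRKyGLNMCRQCFRQY"
window = site_window(rps29_prefix, parsed["position"])

print(parsed["sub_site"])  # P62273_Y7
print(len(window))         # 41
print(window[20])          # y (the phosphorylated tyrosine sits in the middle)
```

Padding first and slicing afterwards avoids separate branches for sites near the N- or C-terminus.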

Phosphoproteomics data

PhosphoSitePlus human

Data.get_psp_human_site()
gene protein uniprot site gene_site SITE_GRP_ID species site_seq LT_LIT MS_LIT MS_CST CST_CAT# Ambiguous_Site
0 YWHAB 14-3-3 beta P31946 T2 YWHAB_T2 15718712 human ______MtMDksELV NaN 3.0 1.0 None 0
1 YWHAB 14-3-3 beta P31946 S6 YWHAB_S6 15718709 human __MtMDksELVQkAk NaN 8.0 NaN None 0
2 YWHAB 14-3-3 beta P31946 Y21 YWHAB_Y21 3426383 human LAEQAERyDDMAAAM NaN NaN 4.0 None 0
3 YWHAB 14-3-3 beta P31946 T32 YWHAB_T32 23077803 human AAAMkAVtEQGHELs NaN NaN 1.0 None 0
4 YWHAB 14-3-3 beta P31946 S39 YWHAB_S39 27442700 human tEQGHELsNEERNLL NaN 4.0 NaN None 0
... ... ... ... ... ... ... ... ... ... ... ... ... ...
240006 ZZZ3 ZZZ3 Q8IYH5 S474 ZZZ3_S474 482028 human PsAKESAsQHITEEE NaN 1.0 NaN None 0
240007 ZZZ3 ZZZ3 Q8IYH5 S606 ZZZ3_S606 23077718 human GLPARPksPLDPKKD NaN 6.0 4.0 None 0
240008 ZZZ3 ZZZ3 Q8IYH5 Y670 ZZZ3_Y670 23077724 human LEQLLIKyPPEEVEs NaN NaN 1.0 None 0
240009 ZZZ3 ZZZ3 Q8IYH5 S677 ZZZ3_S677 23077721 human yPPEEVEsRRWQKIA NaN NaN 1.0 None 0
240010 ZZZ3 ZZZ3 Q8IYH5 S777 ZZZ3_S777 41455930 human NTAVEDAsDDESIPI NaN 2.0 NaN None 0

240011 rows × 13 columns
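
In this table site_seq appears to be a 15-residue window whose middle residue is the lowercase (phosphorylated) form of the residue named in site. A quick consistency check along those lines can be sketched as follows; the helper name is hypothetical and the rows are copied from the output above.

```python
def center_matches_site(site, site_seq):
    """True if the middle residue of site_seq is the lowercase form of the site residue."""
    center = site_seq[len(site_seq) // 2]
    return center == site[0].lower()


# (site, site_seq) pairs from the first rows above; padding is built explicitly
# so the number of underscores is unambiguous.
rows = [
    ("T2", "_" * 6 + "MtMDksELV"),
    ("S6", "_" * 2 + "MtMDksELVQkAk"),
    ("Y21", "LAEQAERyDDMAAAM"),
]

checks = [center_matches_site(site, seq) for site, seq in rows]
print(checks)  # [True, True, True]

# A deliberately mismatched pair is flagged.
mismatch = center_matches_site("S6", "LAEQAERyDDMAAAM")
print(mismatch)  # False
```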

Ochoa et al. dataset

Data.get_ochoa_site()
uniprot position residue is_disopred disopred_score log10_hotspot_pval_min isHotspot uniprot_position functional_score current_uniprot name gene Sequence is_valid site_seq gene_site
0 A0A075B6Q4 24 S True 0.91 6.839384 True A0A075B6Q4_24 0.149257 A0A075B6Q4 A0A075B6Q4_HUMAN None MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... True VDDEKGDSNDDYDSA A0A075B6Q4_S24
1 A0A075B6Q4 35 S True 0.87 9.192622 False A0A075B6Q4_35 0.136966 A0A075B6Q4 A0A075B6Q4_HUMAN None MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... True YDSAGLLSDEDCMSV A0A075B6Q4_S35
2 A0A075B6Q4 57 S False 0.28 0.818834 False A0A075B6Q4_57 0.125364 A0A075B6Q4 A0A075B6Q4_HUMAN None MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... True IADHLFWSEETKSRF A0A075B6Q4_S57
3 A0A075B6Q4 68 S False 0.03 0.375986 False A0A075B6Q4_68 0.119811 A0A075B6Q4 A0A075B6Q4_HUMAN None MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... True KSRFTEYSMTSSVMR A0A075B6Q4_S68
4 A0A075B6Q4 71 S False 0.05 0.000000 False A0A075B6Q4_71 0.095193 A0A075B6Q4 A0A075B6Q4_HUMAN None MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... True FTEYSMTSSVMRRNE A0A075B6Q4_S71
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
112276 V9GYY5 127 S True 0.97 3.193174 False V9GYY5_127 0.292446 V9GYY5 V9GYY5_HUMAN None KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... True EGGAGDRSEEEASST V9GYY5_S127
112277 V9GYY5 132 S True 0.93 2.055830 False V9GYY5_132 0.219329 V9GYY5 V9GYY5_HUMAN None KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... True DRSEEEASSTEKPTK V9GYY5_S132
112278 V9GYY5 133 S True 0.89 2.055830 False V9GYY5_133 0.202808 V9GYY5 V9GYY5_HUMAN None KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... True RSEEEASSTEKPTKA V9GYY5_S133
112279 V9GYY5 134 T True 0.83 2.055830 False V9GYY5_134 0.187417 V9GYY5 V9GYY5_HUMAN None KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... True SEEEASSTEKPTKAL V9GYY5_T134
112280 V9GYY5 138 T True 0.82 0.726611 False V9GYY5_138 0.121025 V9GYY5 V9GYY5_HUMAN None KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... True ASSTEKPTKALPRKS V9GYY5_T138

112281 rows × 16 columns

Combined Ochoa and PhosphoSitePlus

Data.get_combine_site_psp_ochoa()
uniprot gene site site_seq source AM_pathogenicity CDDM_upper CDDM_max_score
0 A0A024R4G9 C19orf48 S20 ITGSRLLSMVPGPAR psp NaN PRKX,AKT1,PKG1,P90RSK,HIPK4,AKT3,HIPK1,PKACB,H... 2.407041
1 A0A075B6Q4 None S24 VDDEKGDSNDDYDSA ochoa NaN CK2A2,CK2A1,GRK7,GRK5,CK1G1,CK1A,IKKA,CK1G2,CA... 2.295654
2 A0A075B6Q4 None S35 YDSAGLLSDEDCMSV ochoa NaN CK2A2,CK2A1,IKKA,ATM,IKKB,CAMK1D,MARK2,GRK7,IK... 2.488683
3 A0A075B6Q4 None S57 IADHLFWSEETKSRF ochoa NaN GRK7,CK2A1,CK2A2,PKN2,GRK1,GRK5,MARK1,MARK2,UL... 1.851894
4 A0A075B6Q4 None S68 KSRFTEYSMTSSVMR ochoa NaN AKT1,P90RSK,AKT3,SGK1,AKT2,NDR2,RSK2,P70S6K,RS... 2.026384
... ... ... ... ... ... ... ... ...
121414 V9GYY5 None S127 EGGAGDRSEEEASST ochoa NaN CK2A1,CK2A2,GRK7,GRK5,ALK2,GRK1,CK1E,PLK3,CK1A... 2.665606
121415 V9GYY5 None S132 DRSEEEASSTEKPTK ochoa NaN CK2A2,CK2A1,GRK7,TGFBR1,GRK2,ALK2,PLK3,CLK3,BM... 2.445179
121416 V9GYY5 None S133 RSEEEASSTEKPTKA ochoa NaN CK2A1,ATR,GRK1,CK1G1,PLK3,CLK3,GRK7,CK1G2,MARK... 2.090739
121417 V9GYY5 None T134 SEEEASSTEKPTKAL ochoa NaN ASK1,PERK,EEF2K,MAP2K4,MEKK2,MST1,BMPR1B,OSR1,... 1.832532
121418 V9GYY5 None T138 ASSTEKPTKALPRKS ochoa NaN ASK1,MEK2,MPSK1,TNIK,PBK,MST2,MINK,NEK4,LKB1,MEK5 1.807565

121419 rows × 8 columns

Phosphorylated version of the above dataset

The Ochoa dataset was first phosphorylated based on the given site information, then combined with the PSP dataset, which carries phosphorylation status.

Data.get_combine_site_phosphorylated()
uniprot gene site site_seq source AM_pathogenicity CDDM PSPA CDDM_max_score PSPA_max_score
0 A0A024R4G9 C19orf48 S20 ITGSRLLsMVPGPAR psp NaN PRKX,PKG1,AKT1,AKT3,HIPK4,P90RSK,PKACB,PKACA,P... MAPKAPK5,AKT1,RSK3,P70S6K,MAPKAPK3,AKT2,DYRK1A... 2.339278 3.726109
1 A0A075B6Q4 None S24 VDDEKGDsNDDYDSA ochoa NaN CK2A2,CK2A1,GRK7,GRK5,CK1G1,IKKA,CAMK1D,MARK2,... CAMK2B,CK2A2,CAMK2A,CK2A1,GRK7,TLK2,FAM20C,CAM... 2.253027 4.940056
2 A0A075B6Q4 None S35 YDSAGLLsDEDCMSV ochoa NaN CK2A2,CK2A1,ATM,CAMK1D,IKKB,IKKA,MARK2,MARK1,G... CK2A1,CK2A2,FAM20C,CDC7,GRK6,SMG1,ALK2,TGFBR1,... 2.396014 5.803230
3 A0A075B6Q4 None S57 IADHLFWsEETKSRF ochoa NaN GRK7,CK2A2,PKN2,CK2A1,GRK1,MARK1,TSSK2,MARK2,N... FAM20C,ACVR2B,GRK1,CDC7,BMPR1B,BMPR1A,ACVR2A,D... 1.793644 4.038678
4 A0A075B6Q4 None S68 KSRFTEYsMTssVMR ochoa NaN AKT1,P90RSK,RSK2,RSK4,CLK3,NDR2,P70S6K,AKT3,SG... GSK3B,GSK3A,CK1G2,PLK2,PLK3,TGFBR1,TLK2,GRK3,A... 1.789278 7.268416
... ... ... ... ... ... ... ... ... ... ...
120099 V9GYY5 None S127 EGGAGDRsEEEAsst ochoa NaN CK2A2,CK2A1,GRK7,ALK2,GRK5,GRK1,CK1E,PLK3,CK1A... FAM20C,CK2A1,CK2A2,GRK7,BMPR1B,GRK1,ALK2,BMPR1... 2.575281 7.326407
120100 V9GYY5 None S132 DRsEEEAsstEKPtK ochoa NaN CK2A2,CK2A1,GRK7,TGFBR1,ALK2,GRK2,PLK3,BMPR1B,... BMPR1B,BMPR1A,GRK3,CK2A1,PLK2,GRK7,ACVR2A,GRK2... 2.359323 9.746005
120101 V9GYY5 None S133 RsEEEAsstEKPtKA ochoa NaN GRK1,CK2A1,CK1G1,PLK3,GRK7,CK2A2,CK1G2,CLK3,AT... BMPR1B,CK1G2,GRK7,BMPR1A,GRK3,PLK2,GRK1,ACVR2B... 2.019862 5.370222
120102 V9GYY5 None T134 sEEEAsstEKPtKAL ochoa NaN PERK,ASK1,EEF2K,MST1,BMPR1B,PBK,MEKK2,OSR1,MST... CK1G2,GSK3A,ALPHAK3,GRK1,GRK7,GSK3B,BMPR1B,BMP... 1.723089 7.009429
120103 V9GYY5 None T138 AsstEKPtKALPRKS ochoa NaN ASK1,PBK,TNIK,MPSK1,MINK,MST2,NEK4,MEK2,MST1,BUB1 CK1G3,CK1G2,CK1A2,CK1D,CK1A,GRK3,PASK,GRK2,CK1... 1.651888 4.350109

120104 rows × 10 columns
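
"Phosphorylating" a sequence here means writing the residue at each known site in lowercase. The sketch below shows that step on the visible prefix of the A0A075B6Q4 sequence with the sites S24 and S35 from the Ochoa table; the function name is hypothetical and this is an illustration of the idea rather than the library's own code.

```python
def phosphorylate(sequence, sites):
    """Lowercase the residue at each site such as 'S24' (1-based position)."""
    residues = list(sequence)
    for site in sites:
        residue, position = site[0], int(site[1:])
        index = position - 1
        if index >= len(residues):
            raise ValueError(f"{site} lies beyond the end of the sequence")
        if residues[index].upper() != residue:
            raise ValueError(f"{site} does not match residue {residues[index]!r}")
        residues[index] = residue.lower()
    return "".join(residues)


# Visible prefix of the A0A075B6Q4 sequence (the remainder is elided in the tables).
prefix = "MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT"
phospho_prefix = phosphorylate(prefix, ["S24", "S35"])
print(phospho_prefix)
# MDIQKSENEDDSEWEDVDDEKGDsNDDYDSAGLLsDEDCMSVPGKT
```

Checking the residue letter before lowercasing catches sites that were numbered against a different isoform or sequence version.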

Human phosphoproteome

With phosphorylated protein sequence and site sequence

Data.get_human_site()
substrate_uniprot substrate_genes site source AM_pathogenicity substrate_sequence substrate_species sub_site substrate_phosphoseq position site_seq
0 A0A024R4G9 C19orf48 MGC13170 hCG_2008493 S20 psp NaN MTVLEAVLEIQAITGSRLLSMVPGPARPPGSCWDPTQCTRTWLLSH... Homo sapiens (Human) A0A024R4G9_S20 MTVLEAVLEIQAITGSRLLsMVPGPARPPGSCWDPTQCTRTWLLSH... 20 _MTVLEAVLEIQAITGSRLLsMVPGPARPPGSCWDPTQCTR
1 A0A075B6Q4 None S24 ochoa NaN MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... Homo sapiens (Human) A0A075B6Q4_S24 MDIQKSENEDDSEWEDVDDEKGDsNDDYDSAGLLsDEDCMSVPGKT... 24 QKSENEDDSEWEDVDDEKGDsNDDYDSAGLLsDEDCMSVPG
2 A0A075B6Q4 None S35 ochoa NaN MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... Homo sapiens (Human) A0A075B6Q4_S35 MDIQKSENEDDSEWEDVDDEKGDsNDDYDSAGLLsDEDCMSVPGKT... 35 EDVDDEKGDsNDDYDSAGLLsDEDCMSVPGKTHRAIADHLF
3 A0A075B6Q4 None S57 ochoa NaN MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... Homo sapiens (Human) A0A075B6Q4_S57 MDIQKSENEDDSEWEDVDDEKGDsNDDYDSAGLLsDEDCMSVPGKT... 57 EDCMSVPGKTHRAIADHLFWsEETKSRFTEYsMTssVMRRN
4 A0A075B6Q4 None S68 ochoa NaN MDIQKSENEDDSEWEDVDDEKGDSNDDYDSAGLLSDEDCMSVPGKT... Homo sapiens (Human) A0A075B6Q4_S68 MDIQKSENEDDSEWEDVDDEKGDsNDDYDSAGLLsDEDCMSVPGKT... 68 RAIADHLFWsEETKSRFTEYsMTssVMRRNEQLTLHDERFE
... ... ... ... ... ... ... ... ... ... ... ...
121327 V9GYY5 None S127 ochoa NaN KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... Homo sapiens (Human) V9GYY5_S127 KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... 127 DLsGARLLGLtPPEGGAGDRsEEEAsstEKPtKALPRKSRD
121328 V9GYY5 None S132 ochoa NaN KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... Homo sapiens (Human) V9GYY5_S132 KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... 132 RLLGLtPPEGGAGDRsEEEAsstEKPtKALPRKSRDPLLSQ
121329 V9GYY5 None S133 ochoa NaN KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... Homo sapiens (Human) V9GYY5_S133 KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... 133 LLGLtPPEGGAGDRsEEEAsstEKPtKALPRKSRDPLLSQR
121330 V9GYY5 None T134 ochoa NaN KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... Homo sapiens (Human) V9GYY5_T134 KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... 134 LGLtPPEGGAGDRsEEEAsstEKPtKALPRKSRDPLLSQRI
121331 V9GYY5 None T138 ochoa NaN KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... Homo sapiens (Human) V9GYY5_T138 KRDGDDRRPRLVLSFDEEKRREYLTGFHKRKVERKKAAIEEIKQRL... 138 PPEGGAGDRsEEEAsstEKPtKALPRKSRDPLLSQRISSLT

119955 rows × 11 columns

CPTAC data

Query specific tumor type

CPTAC.list_cancer()
['HNSCC', 'GBM', 'COAD', 'CCRCC', 'LSCC', 'BRCA', 'UCEC', 'LUAD', 'PDAC', 'OV']

To load CPTAC phosphorylation site information, use CPTAC.get_id(). Fold changes for various conditions can be obtained from LinkedOmics or LinkedOmicsKB; use is_KB to indicate which of the two the phosphorylation site information should come from.

tumor = CPTAC.get_id('CCRCC',is_KB=True)
normal = CPTAC.get_id('CCRCC',is_KB=True, is_Tumor=False)
tumor.head()
the CCRCC dataset length is: 54238
after id mapping, the length is 213737
0 sites does not have a mapped gene name
after removing duplicates of protein_site, the length is 212814
the CCRCC dataset length is: 53152
after id mapping, the length is 209188
0 sites does not have a mapped gene name
after removing duplicates of protein_site, the length is 208298
gene site site_seq protein gene_name gene_site protein_site
0 ENSG00000003056.8 S267 DDQLGEESEERDDHL ENSP00000000412.3 M6PR M6PR_S267 ENSP00000000412_S267
1 ENSG00000003056.8 S267 DDQLGEESEERDDHL ENSP00000440488.2 M6PR M6PR_S267 ENSP00000440488_S267
2 ENSG00000048028.11 S1053 PPTIRPNSPYDLCSR ENSP00000003302.4 USP28 USP28_S1053 ENSP00000003302_S1053
3 ENSG00000048028.11 S1053 PPTIRPNSPYDLCSR ENSP00000445743.1 USP28 USP28_S1053 ENSP00000445743_S1053
4 ENSG00000048028.11 S1053 PPTIRPNSPYDLCSR ENSP00000442431.1 USP28 USP28_S1053 ENSP00000442431_S1053
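
The log above mentions removing duplicates of protein_site, and the table suggests how the keys are built: protein_site is the Ensembl protein ID without its version suffix plus the site, and gene_site is the gene name plus the site. A minimal sketch of that keying and de-duplication, with hypothetical helper names and rows taken from the output above (the last row is a deliberate repeat):

```python
def protein_site_key(protein, site):
    """Drop the Ensembl version suffix and append the site."""
    return f"{protein.split('.')[0]}_{site}"


def drop_duplicate_protein_sites(rows):
    """Keep the first row for every protein_site key and attach both keys."""
    seen = set()
    unique = []
    for row in rows:
        key = protein_site_key(row["protein"], row["site"])
        if key in seen:
            continue
        seen.add(key)
        gene_site = f"{row['gene_name']}_{row['site']}"
        unique.append({**row, "protein_site": key, "gene_site": gene_site})
    return unique


rows = [
    {"protein": "ENSP00000000412.3", "gene_name": "M6PR", "site": "S267"},
    {"protein": "ENSP00000440488.2", "gene_name": "M6PR", "site": "S267"},
    {"protein": "ENSP00000000412.3", "gene_name": "M6PR", "site": "S267"},  # repeat
]

unique_rows = drop_duplicate_protein_sites(rows)
print([r["protein_site"] for r in unique_rows])
# ['ENSP00000000412_S267', 'ENSP00000440488_S267']
```

Both remaining rows share the gene_site M6PR_S267, which is why the gene-level table further down is smaller than the protein-level one.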

Unique Ensembl protein ID + site

Query all cancer types and compile

Data.get_cptac_ensembl_site()
gene site site_seq protein gene_name gene_site protein_site
0 ENSG00000003056.8 S267 DDQLGEESEERDDHL ENSP00000000412.3 M6PR M6PR_S267 ENSP00000000412_S267
1 ENSG00000003056.8 S267 DDQLGEESEERDDHL ENSP00000440488.2 M6PR M6PR_S267 ENSP00000440488_S267
2 ENSG00000048028.11 S1053 PPTIRPNSPYDLCSR ENSP00000003302.4 USP28 USP28_S1053 ENSP00000003302_S1053
3 ENSG00000048028.11 S1053 PPTIRPNSPYDLCSR ENSP00000445743.1 USP28 USP28_S1053 ENSP00000445743_S1053
4 ENSG00000048028.11 S1053 PPTIRPNSPYDLCSR ENSP00000442431.1 USP28 USP28_S1053 ENSP00000442431_S1053
... ... ... ... ... ... ... ...
488581 ENSG00000173230.15 S2878 TSPAEVQSLKKAMSS ENSP00000484083.1 GOLGB1 GOLGB1_S2878 ENSP00000484083_S2878
488582 ENSG00000143631.11 S1642 SHQEDRASHGHSAES ENSP00000357789.1 FLG FLG_S1642 ENSP00000357789_S1642
488583 ENSG00000143631.11 S495 STGGRQGSHHEQARD ENSP00000357789.1 FLG FLG_S495 ENSP00000357789_S495
488584 ENSG00000143631.11 S648 ASRNHHGSAQEQSRD ENSP00000357789.1 FLG FLG_S648 ENSP00000357789_S648
488585 ENSG00000143520.6 S2310 DTTRHGHSGYGQSTQ ENSP00000373370.4 FLG2 FLG2_S2310 ENSP00000373370_S2310

488586 rows × 7 columns

Unique site sequences

Data.get_cptac_unique_site()
site_seq gene_site num_site acceptor
0 AAAAAAASFPWSAFG ZBTB7A_S182 1 S
1 AAAAAAASGAAGGGG INTS3_S16 1 S
2 AAAAAAASGALLGAY TMEM64_S62 1 S
3 AAAAAAASGGAGSDN PBX1_S136 1 S
4 AAAAAAASGGGVSPD PBX2_S146 1 S
... ... ... ... ...
125471 ______MTMETLPKV PIRT_T2 1 T
125472 ______MTPPPPPPP ESRP2_T2 1 T
125473 ______MTVSGPGTP UNC45A_T2 1 T
125474 ______MYPAGPPAG TIGD5_Y2 1 Y
125475 _______SPASLPLA RFLNB_S1 1 S

125476 rows × 4 columns
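
A table like this can be thought of as the site-level data grouped by site_seq, with acceptor taken from the middle residue of the 15-mer. The sketch below makes two assumptions that the output does not confirm: that num_site counts the distinct gene_site values sharing a sequence, and that multiple gene_site values would be joined with "|". The second input pair uses a made-up gene name (GENEX) to show the grouping.

```python
def collapse_by_site_seq(pairs):
    """Group (site_seq, gene_site) pairs by sequence and summarise each group."""
    grouped = {}
    for site_seq, gene_site in pairs:
        grouped.setdefault(site_seq, set()).add(gene_site)

    table = []
    for site_seq in sorted(grouped):  # '_' sorts after A-Z, as in the output above
        gene_sites = sorted(grouped[site_seq])
        table.append({
            "site_seq": site_seq,
            "gene_site": "|".join(gene_sites),         # separator is an assumption
            "num_site": len(gene_sites),               # meaning is an assumption
            "acceptor": site_seq[len(site_seq) // 2],  # middle residue of the 15-mer
        })
    return table


pairs = [
    ("PPTIRPNSPYDLCSR", "USP28_S1053"),
    ("PPTIRPNSPYDLCSR", "GENEX_S10"),   # hypothetical second gene, for illustration
    ("DDQLGEESEERDDHL", "M6PR_S267"),
    ("DDQLGEESEERDDHL", "M6PR_S267"),   # same site seen on two protein isoforms
    ("_" * 6 + "MTMETLPKV", "PIRT_T2"),
]

unique_sites = collapse_by_site_seq(pairs)
for row in unique_sites:
    print(row)
```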

Unique gene name + site

Data.get_cptac_gene_site()
gene site site_seq protein gene_name gene_site protein_site
0 ENSG00000003056.8 S267 DDQLGEESEERDDHL ENSP00000000412.3 M6PR M6PR_S267 ENSP00000000412_S267
1 ENSG00000048028.11 S1053 PPTIRPNSPYDLCSR ENSP00000003302.4 USP28 USP28_S1053 ENSP00000003302_S1053
2 ENSG00000004776.13 S16 PSWLRRASAPLPGLS ENSP00000004982.3 HSPB6 HSPB6_S16 ENSP00000004982_S16
3 ENSG00000005175.10 S116 DSTHESLSQESESEE ENSP00000005386.3 RPAP3 RPAP3_S116 ENSP00000005386_S116
4 ENSG00000005175.10 S121 SLSQESESEEDGIHV ENSP00000005386.3 RPAP3 RPAP3_S121 ENSP00000005386_S121
... ... ... ... ... ... ... ...
126220 ENSG00000173230.15 S2878 TSPAEVQSLKKAMSS ENSP00000341848.5 GOLGB1 GOLGB1_S2878 ENSP00000341848_S2878
126221 ENSG00000143631.11 S1642 SHQEDRASHGHSAES ENSP00000357789.1 FLG FLG_S1642 ENSP00000357789_S1642
126222 ENSG00000143631.11 S495 STGGRQGSHHEQARD ENSP00000357789.1 FLG FLG_S495 ENSP00000357789_S495
126223 ENSG00000143631.11 S648 ASRNHHGSAQEQSRD ENSP00000357789.1 FLG FLG_S648 ENSP00000357789_S648
126224 ENSG00000143520.6 S2310 DTTRHGHSGYGQSTQ ENSP00000373370.4 FLG2 FLG2_S2310 ENSP00000373370_S2310

126225 rows × 7 columns

PSSM: CDDM data

Data.get_cddm()
substrate -7P -7G -7A -7C -7S -7T -7V -7I -7L -7M ... 7E 7s 7t 7y 0s 0t 0y 0S 0T 0Y
kinase
SRC 0.055749 0.064895 0.060105 0.010017 0.045732 0.033101 0.049216 0.037892 0.080139 0.020035 ... 0.085192 0.029438 0.013381 0.017841 0.038927 0.034602 0.926471 0.038927 0.034602 0.926471
EPHA3 0.042881 0.075316 0.068169 0.013194 0.039582 0.031336 0.048378 0.043430 0.079714 0.021440 ... 0.092551 0.024266 0.013544 0.022009 0.054526 0.035442 0.910033 0.054526 0.035442 0.910033
FES 0.049633 0.075578 0.065990 0.012972 0.036097 0.029893 0.055274 0.040045 0.080090 0.014100 ... 0.084483 0.024713 0.017816 0.022414 0.038699 0.026921 0.934380 0.038699 0.026921 0.934380
NTRK3 0.042771 0.074699 0.073494 0.012048 0.034940 0.022289 0.052410 0.044578 0.080723 0.016867 ... 0.079679 0.026560 0.022236 0.022236 0.101796 0.060479 0.837725 0.101796 0.060479 0.837725
ALK 0.045482 0.076214 0.070682 0.014136 0.034419 0.028273 0.049170 0.035648 0.079902 0.018439 ... 0.081835 0.028518 0.018599 0.023559 0.059038 0.044431 0.896531 0.059038 0.044431 0.896531
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
CDK8 0.056818 0.079545 0.090909 0.011364 0.011364 0.022727 0.068182 0.045455 0.011364 0.011364 ... 0.103448 0.045977 0.022989 0.000000 0.752809 0.202247 0.044944 0.752809 0.202247 0.044944
BUB1 0.023256 0.069767 0.081395 0.000000 0.023256 0.011628 0.058140 0.023256 0.058140 0.023256 ... 0.105882 0.058824 0.070588 0.011765 0.558140 0.406977 0.034884 0.558140 0.406977 0.034884
MEKK3 0.083333 0.071429 0.059524 0.000000 0.071429 0.000000 0.047619 0.059524 0.059524 0.011905 ... 0.073171 0.048780 0.012195 0.000000 0.458824 0.470588 0.070588 0.458824 0.470588 0.070588
MAP2K3 0.045977 0.057471 0.114943 0.000000 0.045977 0.045977 0.022989 0.022989 0.022989 0.011494 ... 0.109756 0.085366 0.036585 0.000000 0.528090 0.191011 0.280899 0.528090 0.191011 0.280899
GRK1 0.060241 0.072289 0.084337 0.000000 0.048193 0.036145 0.024096 0.060241 0.012048 0.012048 ... 0.197368 0.039474 0.000000 0.013158 0.831325 0.156627 0.012048 0.831325 0.156627 0.012048

289 rows × 328 columns
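
Each PSSM column label combines a position relative to the phosphoacceptor with a residue letter, for example -7P or 0y. To look a site sequence up in such a matrix, it first has to be turned into the same labels. The sketch below does only that conversion and a plain lookup; it does not reproduce the library's scoring. The function name is hypothetical, and the partial SRC row holds a few values copied from the table above.

```python
def site_seq_to_columns(site_seq):
    """Turn a centred site sequence into PSSM column labels such as '-7L' or '0y'."""
    flank = len(site_seq) // 2
    labels = []
    for offset, residue in enumerate(site_seq, start=-flank):
        if residue == "_":
            continue  # padding beyond the protein ends has no column
        labels.append(f"{offset}{residue}")
    return labels


labels = site_seq_to_columns("LAEQAERyDDMAAAM")
print(labels[0], labels[7], labels[-1])  # -7L 0y 7M

# A few SRC values copied from the table above; all other columns are omitted.
src_partial = {"-7P": 0.055749, "-7L": 0.080139, "0y": 0.926471, "7E": 0.085192}
matched = {label: src_partial[label] for label in labels if label in src_partial}
print(matched)  # {'-7L': 0.080139, '0y': 0.926471}

# Padded sequences simply yield fewer labels.
padded_labels = site_seq_to_columns("_" * 6 + "MtMDksELV")
```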

All uppercase

Data.get_cddm_upper()
substrate -7P -7G -7A -7C -7S -7T -7V -7I -7L -7M ... 7Q 7N 7D 7E 0S 0T 0Y 0s 0t 0y
kinase
SRC 0.055749 0.064895 0.060105 0.010017 0.071429 0.046167 0.049216 0.037892 0.080139 0.020035 ... 0.045941 0.036574 0.074487 0.085192 0.038927 0.034602 0.926471 0.038927 0.034602 0.926471
EPHA3 0.042881 0.075316 0.068169 0.013194 0.064871 0.042881 0.048378 0.043430 0.079714 0.021440 ... 0.046275 0.046840 0.073928 0.092551 0.054526 0.035442 0.910033 0.054526 0.035442 0.910033
FES 0.049633 0.075578 0.065990 0.012972 0.059222 0.038353 0.055274 0.040045 0.080090 0.014100 ... 0.048276 0.044828 0.074138 0.084483 0.038699 0.026921 0.934380 0.038699 0.026921 0.934380
NTRK3 0.042771 0.074699 0.073494 0.012048 0.060843 0.039157 0.052410 0.044578 0.080723 0.016867 ... 0.040148 0.040148 0.071032 0.079679 0.101796 0.060479 0.837725 0.101796 0.060479 0.837725
ALK 0.045482 0.076214 0.070682 0.014136 0.056546 0.041795 0.049170 0.035648 0.079902 0.018439 ... 0.046497 0.039058 0.068196 0.081835 0.059038 0.044431 0.896531 0.059038 0.044431 0.896531
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
CDK8 0.056818 0.079545 0.090909 0.011364 0.056818 0.090909 0.068182 0.045455 0.011364 0.011364 ... 0.080460 0.057471 0.080460 0.103448 0.752809 0.202247 0.044944 0.752809 0.202247 0.044944
BUB1 0.023256 0.069767 0.081395 0.000000 0.081395 0.069767 0.058140 0.023256 0.058140 0.023256 ... 0.023529 0.070588 0.035294 0.105882 0.558140 0.406977 0.034884 0.558140 0.406977 0.034884
MEKK3 0.083333 0.071429 0.059524 0.000000 0.083333 0.011905 0.047619 0.059524 0.059524 0.011905 ... 0.060976 0.024390 0.036585 0.073171 0.458824 0.470588 0.070588 0.458824 0.470588 0.070588
MAP2K3 0.045977 0.057471 0.114943 0.000000 0.068966 0.045977 0.022989 0.022989 0.022989 0.011494 ... 0.024390 0.060976 0.073171 0.109756 0.528090 0.191011 0.280899 0.528090 0.191011 0.280899
GRK1 0.060241 0.072289 0.084337 0.000000 0.084337 0.096386 0.024096 0.060241 0.012048 0.012048 ... 0.013158 0.026316 0.118421 0.197368 0.831325 0.156627 0.012048 0.831325 0.156627 0.012048

289 rows × 286 columns
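
Comparing the two tables, the uppercase matrix has no lowercase columns at the flanking positions, and its S and T values are larger (SRC at -7S goes from 0.045732 to 0.071429). This is consistent with the phosphorylated s, t, and y frequencies being folded into S, T, and Y, although that is an inference from the numbers and not a documented fact. Under that assumption, the folding step could look like the following; the function name and the toy values are made up for illustration.

```python
def collapse_phospho_columns(row):
    """Fold lowercase (phosphorylated) residue columns into their uppercase columns.

    Position-0 columns are left untouched, since both tables list 0s/0t/0y
    and 0S/0T/0Y separately.
    """
    collapsed = {}
    for label, value in row.items():
        position, residue = label[:-1], label[-1]
        if position != "0":
            residue = residue.upper()
        key = position + residue
        collapsed[key] = collapsed.get(key, 0.0) + value
    return collapsed


# Toy frequencies, not taken from the real matrix.
toy_row = {"-7S": 0.05, "-7s": 0.02, "-7T": 0.03, "-7t": 0.01, "0s": 0.04, "0y": 0.93}
upper_row = collapse_phospho_columns(toy_row)
print(sorted(upper_row))  # ['-7S', '-7T', '0s', '0y']
```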

Other kinases

Data.get_cddm_others().head()
substrate -7P -7G -7A -7C -7S -7T -7V -7I -7L -7M ... 7Q 7N 7D 7E 7s 7t 7y 0s 0t 0y
kinase
LYNb 0.045929 0.068894 0.061935 0.013222 0.034795 0.029923 0.050800 0.045233 0.083507 0.022269 ... 0.045032 0.036455 0.074339 0.082202 0.027162 0.019299 0.017870 0.038010 0.035245 0.926745
ABL1[T315I] 0.046140 0.074534 0.066548 0.010648 0.039042 0.023957 0.055013 0.037267 0.075421 0.021295 ... 0.048693 0.035167 0.064022 0.079351 0.027953 0.022543 0.019838 0.085613 0.045013 0.869373
ABL1[E255K] 0.039631 0.065438 0.060829 0.014747 0.044240 0.030415 0.053456 0.036866 0.067281 0.021198 ... 0.048598 0.039252 0.073832 0.085047 0.025234 0.018692 0.017757 0.062271 0.042125 0.895604
RET[M918T] 0.046422 0.080271 0.066731 0.010638 0.029014 0.023211 0.038685 0.044487 0.069632 0.023211 ... 0.053202 0.042365 0.077833 0.086700 0.032512 0.023645 0.018719 0.042025 0.025788 0.932187
FGFR3[K650M] 0.031985 0.072437 0.068674 0.015992 0.031044 0.026341 0.035748 0.038570 0.079962 0.018815 ... 0.045977 0.040230 0.076628 0.089080 0.028736 0.022031 0.021073 0.051115 0.027881 0.921004

5 rows × 325 columns

Data.get_cddm_others_info().head()
kinase count
0 ALK 1889
1 ABL1 1837
2 RET 1769
3 LYNb 1694
4 MET 1485

PSSM: PSPA data

Normalized Tyr

Data.get_pspa_tyr_norm()
-5P -5G -5A -5C -5S -5T -5V -5I -5L -5M ... 5E 5s 5t 5y 0S 0T 0Y 0s 0t 0y
kinase
ABL1 0.0668 0.0689 0.0646 0.0520 0.0564 0.0539 0.0485 0.0448 0.0520 0.0536 ... 0.0339 0.0254 0.0254 0.0337 0 0 1 0 0 1
TNK2 0.0679 0.0818 0.0627 0.0617 0.0529 0.0528 0.0419 0.0463 0.0437 0.0453 ... 0.0572 0.0364 0.0364 0.0572 0 0 1 0 0 1
ALK 0.0675 0.0640 0.0590 0.0511 0.0476 0.0422 0.0455 0.0514 0.0546 0.0543 ... 0.0226 0.0181 0.0181 0.0172 0 0 1 0 0 1
ABL2 0.0687 0.0715 0.0611 0.0448 0.0537 0.0513 0.0467 0.0398 0.0462 0.0505 ... 0.0381 0.0252 0.0252 0.0289 0 0 1 0 0 1
AXL 0.0656 0.0753 0.0535 0.0525 0.0468 0.0467 0.0459 0.0538 0.0507 0.0542 ... 0.0559 0.0413 0.0413 0.0455 0 0 1 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
KDR 0.0634 0.0672 0.0556 0.0517 0.0541 0.0526 0.0427 0.0420 0.0428 0.0476 ... 0.0387 0.0335 0.0335 0.0406 0 0 1 0 0 1
FLT4 0.0457 0.0531 0.0488 0.0553 0.0512 0.0471 0.0432 0.0499 0.0474 0.0530 ... 0.0528 0.0600 0.0600 0.0464 0 0 1 0 0 1
WEE1_TYR 0.0531 0.0640 0.0559 0.0560 0.0433 0.0435 0.0568 0.0571 0.0637 0.0562 ... 0.0365 0.0453 0.0453 0.0490 0 0 1 0 0 1
YES1 0.0677 0.0571 0.0537 0.0530 0.0527 0.0505 0.0435 0.0375 0.0400 0.0463 ... 0.0482 0.0374 0.0374 0.0411 0 0 1 0 0 1
ZAP70 0.0602 0.0880 0.0623 0.0496 0.0471 0.0514 0.0465 0.0380 0.0307 0.0526 ... 0.0710 0.0862 0.0862 0.0605 0 0 1 0 0 1

93 rows × 236 columns

Normalized Ser/Thr

Data.get_pspa_st_norm()
-5P -5G -5A -5C -5S -5T -5V -5I -5L -5M ... 4E 4s 4t 4y 0s 0t 0y 0S 0T 0Y
kinase
AAK1 0.0720 0.0245 0.0284 0.0456 0.0425 0.0425 0.0951 0.1554 0.0993 0.0864 ... 0.0457 0.0251 0.0251 0.0270 0.1013 1.0000 0.0 0.1013 1.0000 0.0
ACVR2A 0.0415 0.0481 0.0584 0.0489 0.0578 0.0578 0.0598 0.0625 0.0596 0.0521 ... 0.0640 0.0703 0.0703 0.0589 0.9833 1.0000 0.0 0.9833 1.0000 0.0
ACVR2B 0.0533 0.0517 0.0566 0.0772 0.0533 0.0533 0.0543 0.0442 0.0471 0.0516 ... 0.0697 0.0761 0.0761 0.0637 0.9593 1.0000 0.0 0.9593 1.0000 0.0
AKT1 0.0603 0.0594 0.0552 0.0605 0.0516 0.0516 0.0427 0.0435 0.0464 0.0505 ... 0.0312 0.0393 0.0393 0.0263 1.0000 0.6440 0.0 1.0000 0.6440 0.0
AKT2 0.0602 0.0617 0.0643 0.0582 0.0534 0.0534 0.0433 0.0418 0.0493 0.0513 ... 0.0350 0.0548 0.0548 0.0417 1.0000 0.6077 0.0 1.0000 0.6077 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
YANK2 0.0580 0.0699 0.0637 0.0602 0.0580 0.0580 0.0433 0.0470 0.0459 0.0469 ... 0.0452 0.1095 0.1095 0.6305 0.6321 1.0000 0.0 0.6321 1.0000 0.0
YANK3 0.0625 0.0776 0.0647 0.0598 0.0545 0.0545 0.0502 0.0537 0.0561 0.0543 ... 0.0862 0.1204 0.1204 0.5776 1.0000 0.8985 0.0 1.0000 0.8985 0.0
YSK1 0.0590 0.0713 0.0731 0.0606 0.0542 0.0542 0.0499 0.0471 0.0446 0.0529 ... 0.0267 0.0256 0.0256 0.0219 0.2558 1.0000 0.0 0.2558 1.0000 0.0
YSK4 0.0593 0.0728 0.0744 0.0734 0.0597 0.0597 0.0517 0.0400 0.0433 0.0512 ... 0.0484 0.0634 0.0634 0.0389 0.7907 1.0000 0.0 0.7907 1.0000 0.0
ZAK 0.0604 0.0641 0.0659 0.0631 0.0597 0.0597 0.0454 0.0431 0.0477 0.0484 ... 0.0370 0.0390 0.0390 0.0408 0.6135 1.0000 0.0 0.6135 1.0000 0.0

303 rows × 213 columns

Normalized all

Data.get_pspa_all_norm()
-5P -5G -5A -5C -5S -5T -5V -5I -5L -5M ... 5H 5K 5R 5Q 5N 5D 5E 5s 5t 5y
kinase
AAK1 0.0720 0.0245 0.0284 0.0456 0.0425 0.0425 0.0951 0.1554 0.0993 0.0864 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
ACVR2A 0.0415 0.0481 0.0584 0.0489 0.0578 0.0578 0.0598 0.0625 0.0596 0.0521 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
ACVR2B 0.0533 0.0517 0.0566 0.0772 0.0533 0.0533 0.0543 0.0442 0.0471 0.0516 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
AKT1 0.0603 0.0594 0.0552 0.0605 0.0516 0.0516 0.0427 0.0435 0.0464 0.0505 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
AKT2 0.0602 0.0617 0.0643 0.0582 0.0534 0.0534 0.0433 0.0418 0.0493 0.0513 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
KDR 0.0634 0.0672 0.0556 0.0517 0.0541 0.0526 0.0427 0.0420 0.0428 0.0476 ... 0.0543 0.0653 0.0771 0.0509 0.0582 0.0414 0.0387 0.0335 0.0335 0.0406
FLT4 0.0457 0.0531 0.0488 0.0553 0.0512 0.0471 0.0432 0.0499 0.0474 0.0530 ... 0.0624 0.0564 0.0559 0.0537 0.0610 0.0620 0.0528 0.0600 0.0600 0.0464
WEE1_TYR 0.0531 0.0640 0.0559 0.0560 0.0433 0.0435 0.0568 0.0571 0.0637 0.0562 ... 0.0585 0.1058 0.1658 0.0447 0.0495 0.0312 0.0365 0.0453 0.0453 0.0490
YES1 0.0677 0.0571 0.0537 0.0530 0.0527 0.0505 0.0435 0.0375 0.0400 0.0463 ... 0.0593 0.0662 0.0840 0.0559 0.0604 0.0422 0.0482 0.0374 0.0374 0.0411
ZAP70 0.0602 0.0880 0.0623 0.0496 0.0471 0.0514 0.0465 0.0380 0.0307 0.0526 ... 0.0484 0.0477 0.0290 0.0520 0.0537 0.0709 0.0710 0.0862 0.0862 0.0605

396 rows × 236 columns

(?) Combined PSPA

Data.get_combine()
-5P -5G -5A -5C -5S -5T -5V -5I -5L -5M ... 4E 4s 4t 4y 0s 0t 0y 0S 0T 0Y
kinase
CK1A 0.029499 0.106195 0.058997 0.008850 0.029499 0.020649 0.035398 0.029499 0.085546 0.061947 ... 0.124629 0.074184 0.029674 0.023739 0.800587 0.129032 0.070381 0.800587 0.129032 0.070381
CK1D 0.047619 0.084942 0.082368 0.011583 0.029601 0.023166 0.052767 0.034749 0.047619 0.024453 ... 0.143969 0.067445 0.018158 0.032425 0.745174 0.213642 0.041184 0.745174 0.213642 0.041184
CK1E 0.060386 0.099034 0.086957 0.024155 0.016908 0.024155 0.057971 0.028986 0.060386 0.024155 ... 0.168704 0.075795 0.012225 0.029340 0.729469 0.219807 0.050725 0.729469 0.219807 0.050725
CK1G1 0.034749 0.111969 0.073359 0.015444 0.023166 0.019305 0.023166 0.042471 0.046332 0.027027 ... 0.119691 0.061776 0.007722 0.054054 0.818533 0.146718 0.034749 0.818533 0.146718 0.034749
CK1G2 0.023055 0.086455 0.112392 0.011527 0.023055 0.031700 0.060519 0.031700 0.043228 0.025937 ... 0.127907 0.061047 0.017442 0.040698 0.835735 0.126801 0.037464 0.835735 0.126801 0.037464
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
VRK2 0.050454 0.049864 0.046913 0.047503 0.042266 0.042266 0.042266 0.037619 0.037545 0.038430 ... 0.033657 0.045816 0.045816 0.059786 0.294284 0.705716 0.000000 0.294284 0.705716 0.000000
WNK4 0.028356 0.040191 0.041420 0.041804 0.044571 0.044571 0.040267 0.048490 0.051333 0.044571 ... 0.029169 0.028236 0.028236 0.030803 0.455990 0.544010 0.000000 0.455990 0.544010 0.000000
YANK2 0.039266 0.047322 0.043125 0.040756 0.039266 0.039266 0.029314 0.031819 0.031074 0.031751 ... 0.022667 0.054912 0.054912 0.316183 0.387292 0.612708 0.000000 0.387292 0.612708 0.000000
YANK3 0.045607 0.056626 0.047212 0.043637 0.039769 0.039769 0.036632 0.039186 0.040937 0.039623 ... 0.043549 0.060827 0.060827 0.291806 0.526732 0.473268 0.000000 0.526732 0.473268 0.000000
YSK4 0.042943 0.052719 0.053878 0.053154 0.043233 0.043233 0.037439 0.028967 0.031356 0.037077 ... 0.036228 0.047455 0.047455 0.029117 0.441559 0.558441 0.000000 0.441559 0.558441 0.000000

390 rows × 213 columns

Reference data to calculate percentile

PSPA Ser/Thr score across human phosphoproteome

Data.get_pspa_st_pct()
kinase AAK1 ACVR2A ACVR2B AKT1 AKT2 AKT3 ALK2 ALK4 ALPHAK3 AMPKA1 ... VRK1 VRK2 WNK1 WNK3 WNK4 YANK2 YANK3 YSK1 YSK4 ZAK
0 -10.960 -0.581 0.329 -3.891 -3.591 -5.312 0.814 -0.559 -0.933 -2.607 ... -4.682 -2.854 -1.669 -1.527 -2.965 -2.877 -1.792 -6.283 -1.715 -3.204
1 -6.788 -0.166 0.307 -5.886 -4.786 -6.576 1.561 -0.865 -3.399 -3.261 ... -5.670 -2.817 -4.071 -3.394 -5.097 -1.874 -1.480 -8.709 -3.708 -6.093
2 -9.031 1.232 1.775 -6.164 -5.446 -8.330 0.778 -1.355 -0.929 -4.998 ... -5.832 -3.243 -4.249 -2.750 -5.053 0.581 -0.503 -6.448 -1.897 -2.847
3 -4.849 2.272 2.057 -2.886 -2.380 -3.635 1.547 2.735 -2.826 -1.697 ... -2.758 -1.699 -1.725 -0.091 -0.673 0.313 -0.207 -2.316 -0.054 -1.118
4 -6.597 -1.388 -0.956 -2.834 -3.794 -4.969 -1.862 -1.717 -2.653 -3.515 ... -1.546 -1.457 -1.278 0.511 -1.046 -0.314 -1.023 -2.482 -2.227 -1.593
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
89779 -7.310 3.942 3.002 -6.844 -4.643 -7.518 2.899 3.402 -1.935 -3.813 ... -5.589 -3.034 -4.303 -4.057 -4.097 0.384 -0.680 -5.214 -2.004 -1.670
89780 -8.009 1.134 1.012 -4.002 -2.917 -4.024 -0.157 -0.607 0.828 -4.222 ... -5.383 -2.866 -2.781 -2.824 -3.990 1.137 0.139 -5.171 -1.500 -2.052
89781 -0.940 -2.553 -2.435 -5.031 -3.635 -6.779 -0.982 -0.106 -3.507 -4.883 ... -2.522 -0.100 -3.280 -4.375 -3.447 -5.780 -3.851 -4.275 -3.504 -4.381
89782 -3.753 1.451 1.883 -5.583 -5.253 -7.164 1.226 -0.399 3.341 -5.932 ... -1.930 -1.420 -5.949 -4.854 -5.401 -1.853 -2.068 -2.824 -0.340 -1.326
89783 -1.540 -2.180 -2.014 -2.416 -0.592 -1.364 -3.320 -0.826 -4.438 -1.393 ... -1.979 -0.661 -2.586 -4.076 -2.832 -0.575 -0.859 -2.415 -2.999 -2.550

89784 rows × 303 columns
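
A reference table like this lets a raw kinase score be expressed as a percentile: the share of reference sites that score at or below it. The sketch below uses that definition, which is an assumption and may differ from the library's exact convention, and applies it to the ten AKT1 values visible above, not to the full 89784-row column. The function name is hypothetical.

```python
from bisect import bisect_right


def percentile_of_score(reference, score):
    """Percentage of reference scores that are less than or equal to `score`."""
    ordered = sorted(reference)
    rank = bisect_right(ordered, score)  # number of reference values <= score
    return 100.0 * rank / len(ordered)


# The ten AKT1 scores visible in the table above (first five and last five rows).
akt1_visible = [-3.891, -5.886, -6.164, -2.886, -2.834,
                -6.844, -4.002, -5.031, -5.583, -2.416]

pct = percentile_of_score(akt1_visible, -3.0)
print(pct)  # 70.0
```

When many scores are ranked against the same kinase, sorting the reference column once and reusing it avoids repeating the sort for every query.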

PSPA Tyr score across human phosphoproteome

Data.get_pspa_tyr_pct()
kinase ABL1 TNK2 ALK ABL2 AXL BLK BMPR2_TYR PTK6 BTK CSF1R ... NTRK3 TXK TYK2 TYRO3 FLT1 KDR FLT4 WEE1_TYR YES1 ZAP70
0 -0.709617 -3.624831 -2.136338 -0.022776 -0.737589 2.345905 0.504821 2.417165 -0.121611 -1.205218 ... -0.368491 1.187208 -1.601712 -1.143748 -0.891566 -1.888643 -1.758264 -1.610344 4.545175 0.280174
1 0.986158 -1.645273 -1.183920 0.553010 -1.098784 -1.245678 -0.276461 -0.156496 -1.322652 -0.684989 ... -0.777541 -0.385554 -0.624216 -0.737089 -0.315447 -1.293708 -1.182827 -1.891533 -0.456570 -2.465316
2 -4.000671 0.543232 -4.721913 -3.662958 -2.086910 -6.134138 -0.380569 -2.595287 -3.307418 -2.386468 ... -6.363768 -4.401061 -1.096380 -2.017356 -2.000577 -1.511887 -1.844273 2.112679 -3.783810 -5.066184
3 1.496697 1.335568 -1.360722 1.760211 1.016971 -0.106255 -0.547279 -0.916277 -0.572105 -1.044687 ... -1.244076 1.742046 -1.782387 0.598170 -1.859460 -1.254715 -2.740284 0.392029 -1.136538 -0.588075
4 -0.992936 -1.729882 -1.510540 -0.906642 -0.261331 -0.977430 0.886090 -0.460256 0.173188 0.970516 ... 1.234777 0.244627 0.108616 0.952371 0.615983 0.423058 0.148546 1.342049 -0.721687 0.909419
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
7310 -0.696420 -1.151890 -2.088003 -0.799744 -0.758219 -3.110850 -0.060125 0.637491 -1.755881 -1.128610 ... -1.532597 -0.325391 -1.102283 -0.877872 0.152143 -1.525713 -1.607762 -1.833375 -2.765932 -0.503243
7311 0.741063 0.847899 0.151897 0.858172 0.348263 0.187113 0.204329 -0.822278 -1.292904 0.546294 ... -0.195173 -0.924637 0.086444 0.114326 -2.471192 -0.677875 -1.549373 -2.780892 -1.075559 0.499488
7312 -2.858631 0.269949 -3.219751 -2.741632 -1.810684 -5.456068 -0.460094 -2.476075 -4.069701 -1.687347 ... -6.592211 -5.588492 -2.446790 -2.328961 -2.656506 -0.631762 -2.268915 -0.440015 -3.189601 -2.032972
7313 0.737694 -0.477689 -0.646850 0.928066 0.187149 -1.000041 -0.283551 -3.053869 -0.750475 0.132043 ... -0.122134 -1.275022 -0.020350 0.483620 -0.060204 1.378042 0.573273 -2.383657 -0.246005 1.174693
7314 2.115113 0.153795 0.356357 1.846239 -0.856035 -0.422296 -0.985140 0.554181 0.381133 -1.666383 ... -1.670434 1.684176 -0.508297 -0.304215 -2.045909 -1.629804 -2.227050 -2.294855 0.428825 -1.789086

7315 rows × 93 columns
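
A common follow-up with a wide table like this is to rank the kinase columns within each row. The sketch below uses two rows and eight columns copied from the output above rather than calling katlas. It assumes that a higher value means a stronger match, which the source does not state, and the variable names are illustrative.

```python
import pandas as pd

# Two rows and eight kinase columns copied from the table above.
scores = pd.DataFrame(
    {
        "ABL1": [-0.709617, 0.986158],
        "TNK2": [-3.624831, -1.645273],
        "ALK": [-2.136338, -1.183920],
        "ABL2": [-0.022776, 0.553010],
        "AXL": [-0.737589, -1.098784],
        "BLK": [2.345905, -1.245678],
        "YES1": [4.545175, -0.456570],
        "ZAP70": [0.280174, -2.465316],
    }
)

# For each row, keep the names of the three highest-scoring kinases.
# Assumption: higher score = better match.
top3 = {idx: row.nlargest(3).index.tolist() for idx, row in scores.iterrows()}

for idx, kinases in top3.items():
    print(idx, kinases)
```

The ranking here covers only the eight columns in the excerpt; on the full 93-column table the top entries may differ.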

Amino acids

Data.get_aa_info().head()
Name SMILES MW pKa1 pKb2 pKx3 pl4 H VSC P1 P2 SASA NCISC phospho
aa
A Alanine C[C@@H](C(=O)O)N 89.10 2.34 9.69 NaN 6.00 0.62 27.5 8.1 0.046 1.181 0.007187 0
C Cysteine C([C@@H](C(=O)O)N)S 121.16 1.96 10.28 8.18 5.07 0.29 44.6 5.5 0.128 1.461 -0.036610 0
D Aspartic acid C([C@@H](C(=O)O)N)C(=O)O 133.11 1.88 9.60 3.65 2.77 -0.90 40.0 13.0 0.105 1.587 -0.023820 0
E Glutamic acid C(CC(=O)O)[C@@H](C(=O)O)N 147.13 2.19 9.67 4.25 3.22 -0.74 62.0 12.3 0.151 1.862 0.006802 0
F Phenylalanine c1ccc(cc1)C[C@@H](C(=O)O)N 165.19 1.83 9.13 NaN 5.48 1.19 115.5 5.2 0.290 2.228 0.037552 0

RDKit features

Data.get_aa_rdkit().head()
MaxAbsEStateIndex MinAbsEStateIndex MinEStateIndex qed MolWt MinPartialCharge MaxAbsPartialCharge FpDensityMorgan1 FpDensityMorgan2 FpDensityMorgan3 ... fr_Ar_N fr_C_O fr_NH0 fr_NH1 fr_NH2 fr_SH fr_imidazole fr_priamide fr_sulfide fr_unbrch_alkane
aa
A 9.574074 0.731481 -0.962963 0.451352 89.094 -0.480094 0.480094 2.000000 2.166667 2.166667 ... 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
C 9.756435 0.189815 -1.004630 0.424382 121.161 -0.480064 0.480064 2.000000 2.428571 2.428571 ... 0.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0
D 9.846435 0.532407 -1.294074 0.452021 133.103 -0.481175 0.481175 1.444444 1.888889 2.000000 ... 0.0 2.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
E 9.993880 0.023148 -1.165509 0.485976 147.130 -0.481229 0.481229 1.400000 1.900000 2.200000 ... 0.0 2.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
F 10.378642 0.385093 -0.959395 0.690463 165.192 -0.480078 0.480078 1.416667 2.000000 2.500000 ... 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0

5 rows × 106 columns

Morgan features

Data.get_aa_morgan().head()
morgan_1 morgan_11 morgan_24 morgan_27 morgan_70 morgan_74 morgan_79 morgan_80 morgan_82 morgan_116 ... morgan_1879 morgan_1882 morgan_1898 morgan_1911 morgan_1912 morgan_1926 morgan_1937 morgan_1942 morgan_1946 morgan_1970
aa
A 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
C 1 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 1 0 0 0 0
D 1 0 0 0 1 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
E 1 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
F 1 0 0 0 0 0 1 1 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 168 columns

Others

Number of random amino acids (aa) for each kinase, used when calculating the PSPA score

# Data.get_num_dict() returns a dict; display it as a one-column table
pd.DataFrame.from_dict(Data.get_num_dict(), orient="index")
0
SYK 18
PTK2 18
ZAP70 18
ERBB2 18
CSK 18
... ...
YANK3 17
YSK1 17
ZAK 17
EEF2K 17
FAM20C 17

396 rows × 1 columns
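
To see how the counts are distributed, the mapping can be tallied by value. The sketch below uses only the ten entries visible in the output above as a stand-in for the full 396-entry dictionary; the variable names are illustrative.

```python
from collections import Counter

# The ten entries visible in the output above; the full mapping
# would come from Data.get_num_dict().
num_dict = {
    "SYK": 18, "PTK2": 18, "ZAP70": 18, "ERBB2": 18, "CSK": 18,
    "YANK3": 17, "YSK1": 17, "ZAK": 17, "EEF2K": 17, "FAM20C": 17,
}

# Count how many kinases share each number of random amino acids.
counts = Counter(num_dict.values())

for n_random, n_kinases in sorted(counts.items()):
    print(f"{n_random} random aa: {n_kinases} kinases")
```

In this excerpt the five kinases listed with 18 and the five listed with 17 split evenly; the split over the full dictionary is not shown in the output above.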