from katlas.core import *
import pandas as pd
Scoring with single input sequence
In this session, we score substrate sequences from well-known kinase-substrate pairs and show examples for each of the two methods:
- CDDM, computational data-driven method
- PSPA, positional scanning peptide array
The diagram above shows how the score is calculated with sum as the aggregation function, which is what CDDM uses. For PSPA, the values are multiplied and then log-transformed, which is equivalent to log-transforming first and then summing.
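The equivalence between the two PSPA formulations can be checked with a few lines of plain Python. This is only an illustrative sketch: the numbers are made up, the log base (2) is an assumption, and none of this is the katlas implementation.

```python
import math

# Toy per-position values for a single kinase (hypothetical numbers).
position_values = [0.12, 0.30, 0.08, 0.25]

# Sum aggregation (the CDDM-style scheme described above).
sum_score = sum(position_values)

# Multiply, then log-transform (the PSPA-style scheme described above).
log_of_product = math.log2(math.prod(position_values))

# Log-transform each value first, then sum: the same number.
sum_of_logs = sum(math.log2(v) for v in position_values)

print(sum_score, log_of_product, sum_of_logs)
```

Summing logs is usually the numerically safer form, since a product of many small values can underflow.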
Setup
Run below to install:
!pip install python-katlas
Test with CDDM
We provide two parameter sets for CDDM:
- param_CDDM: for sequences whose phosphorylation status is known
- param_CDDM_upper: for all-uppercase sequences
Below are substrate sequences from PhosphoSitePlus:
# ATM/ATR/DNAPK --> H2AX
predict_kinase('GkkAtQAsQEy____', **Params('CDDM'))
considering string: ['-7G', '-6K', '-5K', '-4A', '-3t', '-2Q', '-1A', '0s', '1Q', '2E', '3y']
kinase
ATR 2.321
ATM 2.291
DNAPK 2.013
NIM1 1.663
MARK3 1.658
...
HCK 0.668
SRC 0.656
FYN 0.654
JAK2 0.654
TNK1 0.641
Length: 289, dtype: float64
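The "considering string" line in the output above lists position-residue tokens centered on the phosphoacceptor. The helper below is a rough sketch of how such tokens could be derived from a sequence. The function name `tokenize_site` and its rules (skip `_` padding, keep lowercase only for s/t/y, optional window) are assumptions inferred from the printed output, not the katlas implementation.

```python
def tokenize_site(seq, window=None):
    """Turn a site-centered sequence into tokens such as '-3t' or '0s'.

    Hypothetical helper for illustration only.
    """
    center = len(seq) // 2  # assumes the phosphoacceptor sits in the middle
    tokens = []
    for i, aa in enumerate(seq):
        pos = i - center
        if aa == "_":
            continue  # padding carries no residue information
        if window is not None and abs(pos) > window:
            continue  # outside the positions being considered
        if aa not in "sty":
            aa = aa.upper()  # only s/t/y stay lowercase
        tokens.append(f"{pos}{aa}")
    return tokens


print(tokenize_site("GkkAtQAsQEy____"))
```

Passing `window=5` restricts the tokens to positions -5 through 5.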
# AKT --> mTOR
predict_kinase('RsRtRtDsysAGQsV', **Params('CDDM'))
considering string: ['-7R', '-6s', '-5R', '-4t', '-3R', '-2t', '-1D', '0s', '1y', '2s', '3A', '4G', '5Q', '6s', '7V']
kinase
AKT1 2.702
SGK1 2.476
P90RSK 2.473
AKT2 2.437
AKT3 2.436
...
FLT3 0.719
LCK 0.717
SRC 0.708
TEC 0.686
FYN 0.685
Length: 289, dtype: float64
# ATM/ATR --> p53, S15
predict_kinase('PsVEPPLsQEtFsDL', **Params('CDDM'))
considering string: ['-7P', '-6s', '-5V', '-4E', '-3P', '-2P', '-1L', '0s', '1Q', '2E', '3t', '4F', '5s', '6D', '7L']
kinase
ATR 2.998
ATM 2.800
DNAPK 2.340
CK2A1 1.912
CDK8 1.908
...
YES1 0.850
DDR2 0.833
WEE1 0.818
TNK1 0.809
TYK2 0.804
Length: 289, dtype: float64
# ABL --> CRKL, Y207
predict_kinase('IPEPAHAyAQPQttt', **Params('CDDM'))
considering string: ['-7I', '-6P', '-5E', '-4P', '-3A', '-2H', '-1A', '0y', '1A', '2Q', '3P', '4Q', '5t', '6t', '7t']
kinase
ABL1 1.722
TNK2 1.700
ABL2 1.672
JAK2 1.669
FER 1.652
...
CK1G2 0.560
DCAMKL1 0.551
CK1G1 0.540
GRK7 0.526
CK2A1 0.518
Length: 289, dtype: float64
# EGFR --> EGFR, Y1092
predict_kinase('tFLPVPEyINQsVPk', **Params('CDDM'))
considering string: ['-7t', '-6F', '-5L', '-4P', '-3V', '-2P', '-1E', '0y', '1I', '2N', '3Q', '4s', '5V', '6P', '7K']
kinase
EGFR 1.774
CSK 1.733
JAK2 1.731
ERBB4 1.725
FLT3 1.719
...
PKACB 0.641
PAK6 0.630
NIM1 0.627
PAK5 0.593
SGK2 0.571
Length: 289, dtype: float64
# JAK2 --> STAT3, Y705
predict_kinase('DPGsAAPyLktKFIC', **Params('CDDM'))
considering string: ['-7D', '-6P', '-5G', '-4s', '-3A', '-2A', '-1P', '0y', '1L', '2K', '3t', '4K', '5F', '6I', '7C']
kinase
JAK2 1.716
EPHA4 1.709
KIT 1.702
FLT3 1.696
TNK1 1.696
...
CAMK4 0.567
BRSK2 0.558
PAK5 0.555
CAMK1D 0.535
PRKX 0.532
Length: 289, dtype: float64
# LCK --> CD3 zeta, Y83
predict_kinase('NLGRREEyDVLDkRR', **Params('CDDM'))
considering string: ['-7N', '-6L', '-5G', '-4R', '-3R', '-2E', '-1E', '0y', '1D', '2V', '3L', '4D', '5K', '6R', '7R']
kinase
PTK2 2.155
LCK 2.127
ZAP70 2.117
EPHA2 2.117
BLK 2.105
...
GSK3A 0.801
ERK1 0.778
ATR 0.767
GSK3B 0.758
DNAPK 0.689
Length: 289, dtype: float64
# SYK --> BLNK, Y96
predict_kinase('EENADDSyEPPPVEQ', **Params('CDDM'))
considering string: ['-7E', '-6E', '-5N', '-4A', '-3D', '-2D', '-1S', '0y', '1E', '2P', '3P', '4P', '5V', '6E', '7Q']
kinase
SYK 2.045
ZAP70 2.038
LCK 1.975
EGFR 1.967
PTK2 1.966
...
PKCA 0.656
PKCB 0.651
PKCD 0.640
PHKG1 0.638
PKCZ 0.631
Length: 289, dtype: float64
# CDK4 --> RB1, S807
predict_kinase('PGGNIyIsPLksPyk', **Params('CDDM'))
considering string: ['-7P', '-6G', '-5G', '-4N', '-3I', '-2y', '-1I', '0s', '1P', '2L', '3K', '4s', '5P', '6y', '7K']
kinase
CDK2 2.369
CDK4 2.351
CDK1 2.346
CDK3 2.300
CDK5 2.296
...
JAK3 0.700
EPHA4 0.698
ERBB4 0.688
FGFR4 0.687
TNK2 0.665
Length: 289, dtype: float64
# AKT --> TSC2, S939
predict_kinase('sFRARstsLNERPKs', **Params('CDDM'))
considering string: ['-7s', '-6F', '-5R', '-4A', '-3R', '-2s', '-1t', '0s', '1L', '2N', '3E', '4R', '5P', '6K', '7s']
kinase
AKT1 2.776
SGK1 2.578
AKT3 2.526
AKT2 2.437
P90RSK 2.420
...
EPHA4 0.717
FER 0.714
TNK2 0.705
TEC 0.702
FYN 0.696
Length: 289, dtype: float64
# CK1G1 --> NFkB, RELA S536
predict_kinase('sGDEDFSsIADMDFS', **Params('CDDM'))
considering string: ['-7s', '-6G', '-5D', '-4E', '-3D', '-2F', '-1S', '0s', '1I', '2A', '3D', '4M', '5D', '6F', '7S']
kinase
CK1G1 2.177
CK1G2 2.100
CK1G3 2.012
CK2A1 1.942
PAK6 1.916
...
DDR2 0.769
TYRO3 0.769
TNK1 0.768
TNK2 0.746
AXL 0.743
Length: 289, dtype: float64
# LKB1 --> AMPK
predict_kinase('sDGEFLRtsCGsPNY', **Params('CDDM'))
considering string: ['-7s', '-6D', '-5G', '-4E', '-3F', '-2L', '-1R', '0t', '1s', '2C', '3G', '4s', '5P', '6N', '7Y']
kinase
LKB1 1.898
CAMKK2 1.690
CAMKK1 1.684
PBK 1.485
GSK3A 1.403
...
CSK 0.611
DDR2 0.611
KIT 0.607
FGFR4 0.594
TSSK2 0.590
Length: 289, dtype: float64
Test with PSPA
We provide three parameter sets for PSPA:
- param_PSPA_st: for S/T sequences
- param_PSPA_y: for Y sequences
- param_PSPA: lazy mode, for both S/T and Y sequences; runs slower
PSPA performs best on substrate sequences whose phosphorylation status is annotated.
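To avoid the slower lazy mode, the parameter set can be chosen from the phosphoacceptor residue. The sketch below assumes the phosphoacceptor is the center residue of the sequence, as in the examples on this page. `pick_pspa_params` is a hypothetical helper, not part of katlas; it only returns the name that would be passed to `Params`.

```python
def pick_pspa_params(seq):
    """Return 'PSPA_st' or 'PSPA_y' based on the center residue.

    Hypothetical helper; assumes the phosphoacceptor is at the center.
    """
    center = seq[len(seq) // 2].lower()
    if center in ("s", "t"):
        return "PSPA_st"
    if center == "y":
        return "PSPA_y"
    raise ValueError(f"center residue {center!r} is not S, T, or Y")


print(pick_pspa_params("GRKGsGDyMPMsPKs"))
print(pick_pspa_params("sGDEDFSsIADMDFS"))
```

The returned name could then be used as `Params(pick_pspa_params(seq))`.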
# Insulin Receptor and IRS-1 (Insulin Receptor Substrate 1)
# Kinase: Insulin Receptor
# Substrate: IRS-1 (Y612, Y632, Y662)
predict_kinase('GRKGsGDyMPMsPKs', **Params('PSPA'))
considering string: ['-5K', '-4G', '-3s', '-2G', '-1D', '0y', '1M', '2P', '3M', '4s', '5P']
kinase
ZAP70 6.625
INSRR 4.442
IGF1R 3.792
FLT1 3.693
ERBB4 3.503
...
YANK2 NaN
YANK3 NaN
YSK1 NaN
YSK4 NaN
ZAK NaN
Length: 396, dtype: float64
We get the same result with PSPA_y, which skips the calculation for Ser/Thr kinases (the NaNs above) and runs faster.
predict_kinase('GRKGsGDyMPMsPKs', **Params('PSPA_y'))
considering string: ['-5K', '-4G', '-3s', '-2G', '-1D', '0y', '1M', '2P', '3M', '4s', '5P']
kinase
ZAP70 6.625
INSRR 4.442
IGF1R 3.792
FLT1 3.693
ERBB4 3.503
...
TEC -1.348
TNNI3K_TYR -1.713
LIMK1_TYR -2.112
TNK1 -2.217
BTK -2.622
Length: 93, dtype: float64
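If a result was already computed in lazy mode, the NaN entries can be dropped afterwards with pandas. The sketch below uses a small hand-built Series that copies a few values from the output above; it does not call katlas.

```python
import numpy as np
import pandas as pd

# A few entries copied from the PSPA output above; NaN marks kinases
# that were not scored for this tyrosine site.
scores = pd.Series(
    {"ZAP70": 6.625, "INSRR": 4.442, "IGF1R": 3.792, "YANK2": np.nan, "ZAK": np.nan},
    name="score",
)

# Keep only kinases that received a score, highest first.
scored_only = scores.dropna().sort_values(ascending=False)
print(scored_only)
```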
Let’s try param_PSPA_st on a Ser/Thr site:
# CK1G1 --> NFkB, RELA S536
predict_kinase('sGDEDFSsIADMDFS', **Params('PSPA_st'))
considering string: ['-5D', '-4E', '-3D', '-2F', '-1S', '0s', '1I', '2A', '3D', '4M']
kinase
IKKA 5.435
CK1G3 4.977
GRK1 4.488
IKKB 4.286
CK1G2 4.184
...
DMPK1 -8.521
MOK -9.204
BUB1 -9.361
CDK10 -10.330
AAK1 -10.342
Length: 303, dtype: float64
param_PSPA_st gives the same result as param_PSPA, but runs faster:
predict_kinase('sGDEDFSsIADMDFS', **Params('PSPA'))
considering string: ['-5D', '-4E', '-3D', '-2F', '-1S', '0s', '1I', '2A', '3D', '4M', '5D']
kinase
IKKA 5.435
CK1G3 4.977
GRK1 4.488
IKKB 4.286
CK1G2 4.184
...
KDR NaN
FLT4 NaN
WEE1_TYR NaN
YES1 NaN
ZAP70 NaN
Length: 396, dtype: float64
Customize reference PSSM and aggregation function
You can pass your own PSSM dataframe and aggregation function to predict_kinase and predict_kinase_df.
For example: predict_kinase('sGDEDFSsIADMDFS', ref=df, func=some_func)
Here we use the PSPA canonical tyrosine kinases (TK) as ref and sumup as func:
ref = Params('PSPA')['ref']
TK = ref[ref['0Y']==1]
canonical_TK = TK.loc[~TK.index.str.contains('_TYR'),:]
canonical_TK.head()
kinase | -5P | -5G | -5A | -5C | -5S | -5T | -5V | -5I | -5L | -5M | ... | 5H | 5K | 5R | 5Q | 5N | 5D | 5E | 5s | 5t | 5y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ABL1 | 0.0668 | 0.0689 | 0.0646 | 0.0520 | 0.0564 | 0.0539 | 0.0485 | 0.0448 | 0.0520 | 0.0536 | ... | 0.0613 | 0.0652 | 0.0756 | 0.0526 | 0.0512 | 0.0362 | 0.0339 | 0.0254 | 0.0254 | 0.0337 |
TNK2 | 0.0679 | 0.0818 | 0.0627 | 0.0617 | 0.0529 | 0.0528 | 0.0419 | 0.0463 | 0.0437 | 0.0453 | ... | 0.0499 | 0.0385 | 0.0302 | 0.0531 | 0.0465 | 0.0630 | 0.0572 | 0.0364 | 0.0364 | 0.0572 |
ALK | 0.0675 | 0.0640 | 0.0590 | 0.0511 | 0.0476 | 0.0422 | 0.0455 | 0.0514 | 0.0546 | 0.0543 | ... | 0.0448 | 0.0367 | 0.0489 | 0.0334 | 0.0387 | 0.0245 | 0.0226 | 0.0181 | 0.0181 | 0.0172 |
ABL2 | 0.0687 | 0.0715 | 0.0611 | 0.0448 | 0.0537 | 0.0513 | 0.0467 | 0.0398 | 0.0462 | 0.0505 | ... | 0.0566 | 0.0640 | 0.0779 | 0.0538 | 0.0565 | 0.0378 | 0.0381 | 0.0252 | 0.0252 | 0.0289 |
AXL | 0.0656 | 0.0753 | 0.0535 | 0.0525 | 0.0468 | 0.0467 | 0.0459 | 0.0538 | 0.0507 | 0.0542 | ... | 0.0441 | 0.0506 | 0.0355 | 0.0635 | 0.0696 | 0.0592 | 0.0559 | 0.0413 | 0.0413 | 0.0455 |
5 rows × 236 columns
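The two filtering steps used to build canonical_TK can be tried on a toy dataframe. The kinase names, the `0S` column, and the values below are placeholders chosen for illustration. Reading the `_TYR` suffix as marking non-canonical tyrosine entries is an inference from the filter itself.

```python
import pandas as pd

# Toy reference table: one row per kinase, with a flag column "0Y".
toy_ref = pd.DataFrame(
    {"0Y": [1, 1, 0], "0S": [0, 0, 1]},
    index=pd.Index(["ABL1", "LIMK1_TYR", "AKT1"], name="kinase"),
)

# Step 1: keep rows whose "0Y" entry equals 1.
tk = toy_ref[toy_ref["0Y"] == 1]

# Step 2: drop rows whose index label contains "_TYR".
canonical = tk.loc[~tk.index.str.contains("_TYR"), :]
print(canonical)
```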
predict_kinase('GRKGsGDyMPMsPKs', ref=canonical_TK, func=sumup)
considering string: ['-5K', '-4G', '-3s', '-2G', '-1D', '0y', '1M', '2P', '3M', '4s', '5P']
kinase
ZAP70 2.041
INSRR 1.907
FLT1 1.906
PTK2 1.873
SYK 1.842
...
PTK6 1.546
LYN 1.541
PDGFRA 1.539
TEC 1.539
BTK 1.496
Length: 78, dtype: float64
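To get a feel for how a custom reference and aggregation function interact, here is a self-contained sketch on a toy PSSM. Everything in it is hypothetical: the `score_with` and `mean_agg` helpers, the kinase names, and the values. In particular, the call signature that predict_kinase expects for func is not shown on this page, so the row-wise signature used here is an assumption.

```python
import pandas as pd

# Toy PSSM: rows are kinases, columns are position-residue tokens.
toy_pssm = pd.DataFrame(
    {"-1D": [0.05, 0.08], "0y": [1.0, 1.0], "1M": [0.07, 0.10]},
    index=pd.Index(["KINASE_A", "KINASE_B"], name="kinase"),
)
tokens = ["-1D", "0y", "1M"]


def score_with(ref, tokens, func):
    """Apply an aggregation function row-wise over the matching columns."""
    cols = [t for t in tokens if t in ref.columns]
    return ref[cols].apply(func, axis=1).sort_values(ascending=False)


def mean_agg(values):
    """Example aggregation: average instead of sum."""
    return values.mean()


ranked = score_with(toy_pssm, tokens, mean_agg)
print(ranked)
```

Swapping `mean_agg` for another function changes the scale of the scores, and it may change the ranking of kinases as well.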