cut_seq('AAkUuPSFSTtH',-5,4)
'AkUuPSFSTt'
cut_seq (input_string:str, min_position:int, max_position:int)
Extract sequence based on a range relative to its center position
| | Type | Details |
|---|---|---|
| input_string | str | site sequence |
| min_position | int | minimum position relative to its center |
| max_position | int | maximum position relative to its center |
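The example call `cut_seq('AAkUuPSFSTtH',-5,4)` can be reproduced with plain slicing. The following is a minimal sketch, assuming the center is the character at index `len(input_string) // 2` (which is consistent with that example). `cut_seq_sketch` is a hypothetical name used for illustration, not the library function.

```python
def cut_seq_sketch(input_string: str, min_position: int, max_position: int) -> str:
    """Slice a site sequence by offsets relative to its center residue."""
    # Assumption: the center is the character at index len // 2.
    center = len(input_string) // 2
    # Clamp both ends so out-of-range offsets do not wrap around.
    start = max(center + min_position, 0)
    end = min(center + max_position + 1, len(input_string))
    return input_string[start:end]


print(cut_seq_sketch('AAkUuPSFSTtH', -5, 4))  # AkUuPSFSTt
```

Both offsets are inclusive, so `-5, 4` keeps ten residues: five before the center, the center itself, and four after it.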
STY2sty (input_string:str)
Replace all ‘STY’ with ‘sty’ in a sequence
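As a rough illustration of this replacement, the sketch below lowercases only S, T and Y and leaves every other character alone. `sty_to_lower_sketch` is a hypothetical name, and using `str.translate` is an assumption about one convenient way to do it, not necessarily how `STY2sty` is written.

```python
def sty_to_lower_sketch(input_string: str) -> str:
    """Lowercase only S, T and Y; every other character is left unchanged."""
    return input_string.translate(str.maketrans('STY', 'sty'))


print(sty_to_lower_sketch('PSVEPPLSQETFSDL'))  # PsVEPPLsQEtFsDL
```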
get_dict (input_string:str)
Get a dictionary of the input string. The star in the middle is not needed; make sure the string has length 15 or 10.
| | Type | Details |
|---|---|---|
| input_string | str | phosphorylation site sequence |
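Judging from the `considering string:` lines printed in the examples on this page, each residue is labelled with its offset from the center. A minimal sketch of that labelling follows, assuming the center sits at index `len(input_string) // 2`. `get_keys_sketch` is a hypothetical name, and the real `get_dict` may filter or validate characters in ways not shown here.

```python
def get_keys_sketch(input_string: str) -> list:
    """Return position-residue keys such as '-1L', '0s', '1Q'."""
    # Assumption: the phosphoacceptor is the character at index len // 2.
    center = len(input_string) // 2
    return [f"{i - center}{aa}" for i, aa in enumerate(input_string)]


print(get_keys_sketch('PSVEPPLsQETFSDL'))
print(get_keys_sketch('VEPPLSQETF'))
```

With a 15-residue string the keys run from `-7` to `7`; with a 10-residue string they run from `-5` to `4`.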
multiply (values, kinase=None, num_aa=23)
Multiply the probabilities of the amino acids at each position in a phosphorylation site
| | Type | Default | Details |
|---|---|---|---|
| values | | | list of values, probabilities of amino acids at certain positions |
| kinase | NoneType | None | |
| num_aa | int | 23 | number of amino acids, 23 for standard CDDM, 20 for all uppercase CDDM |
\[ \text{Score} = \log_2 \left( \frac{ \prod P_{\text{KinX}}(\text{AA}, \text{Position}) }{ \left( \frac{1}{\#\text{Random AA}} \right)^{\text{length(Position except 0)}} } \right) \]
The function implements the formula from Johnson et al., Nature, "An atlas of substrate specificities for the human serine/threonine kinome", Supplementary Note 2 (page 160).
A variant of multiply that accounts for a kinase-dependent scale factor:
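To make the formula concrete, here is a small sketch that evaluates it in log space. It assumes `values` holds one probability per position including position 0, so the exponent "length(Position except 0)" becomes `len(values) - 1`. `multiply_sketch` is a hypothetical name and may differ from the library's `multiply` in detail.

```python
import math


def multiply_sketch(values, num_aa=23):
    """log2( prod(values) / (1 / num_aa) ** (len(values) - 1) )."""
    # Summing logs avoids underflow when many small probabilities are multiplied.
    log_product = sum(math.log2(v) for v in values)
    # Assumption: position 0 is included in `values`, hence the "- 1".
    log_background = (len(values) - 1) * math.log2(1 / num_aa)
    return log_product - log_background


# Four flanking positions at the random level (1/23) and probability 1 at position 0:
# the site is no better than random, so the score is close to 0.
random_like = [1 / 23] * 4 + [1.0]
print(multiply_sketch(random_like))

# Doubling one flanking probability adds about one bit to the score.
enriched = [2 / 23] + [1 / 23] * 3 + [1.0]
print(multiply_sketch(enriched))
```

A positive score therefore means the residues are, in aggregate, favored over a uniform background of `num_aa` amino acids, and a negative score means they are disfavored.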
multiply_pspa (values, kinase, num_aa_dict={'SYK': 18, 'PTK2': 18, 'ZAP70': 18, 'ERBB2': 18, 'CSK': 18, 'FGFR4': 18, 'EGFR': 18, 'ERBB4': 18, 'EPHA8': 18, 'EPHA7': 18, 'EPHA5': 18, 'EPHA2': 18, 'EPHB2': 18, 'EPHB1': 18, 'EPHB3': 18, 'EPHB4': 18, 'EPHA4': 18, 'EPHA3': 18, 'EPHA6': 18, 'FRK': 18, 'EPHA1': 18, 'TEC': 18, 'BTK': 18, 'ITK': 18, 'BMX': 18, 'TXK': 16, 'ABL2': 18, 'ABL1': 18, 'SRMS': 18, 'PTK2B': 18, 'FER': 18, 'MERTK': 18, 'AXL': 18, 'FES': 18, 'PTK6': 18, 'YES1': 18, 'FGR': 18, 'SRC': 18, 'FYN': 18, 'LCK': 18, 'BLK': 18, 'LYN': 18, 'HCK': 18, 'PDGFRB': 18, 'PDGFRA': 18, 'FLT3': 18, 'TYRO3': 18, 'ROS1': 18, 'TEK': 18, 'LTK': 18, 'ALK': 18, 'MUSK': 18, 'KIT': 18, 'CSF1R': 18, 'MET': 18, 'KDR': 18, 'RET': 18, 'MST1R': 16, 'JAK3': 16, 'FLT1': 16, 'MATK': 18, 'FGFR3': 18, 'FGFR2': 18, 'FGFR1': 18, 'FLT4': 18, 'INSR': 18, 'IGF1R': 18, 'INSRR': 16, 'NTRK3': 18, 'NTRK1': 18, 'NTRK2': 18, 'TNK1': 18, 'TNK2': 18, 'DDR2': 18, 'DDR1': 18, 'TYK2': 18, 'JAK2': 18, 'JAK1': 18, 'TNNI3K_TYR': 18, 'NEK10_TYR': 16, 'PINK1_TYR': 16, 'MAP2K7_TYR': 16, 'PKMYT1_TYR': 16, 'TESK1_TYR': 16, 'LIMK1_TYR': 16, 'LIMK2_TYR': 16, 'WEE1_TYR': 18, 'MAP2K6_TYR': 16, 'MAP2K4_TYR': 16, 'PDHK1_TYR': 16, 'BMPR2_TYR': 16, 'PDHK4_TYR': 16, 'PDHK3_TYR': 16, 'AAK1': 17, 'ACVR2A': 17, 'ACVR2B': 17, 'AKT1': 17, 'AKT2': 17, 'AKT3': 17, 'ALK2': 17, 'ALK4': 17, 'ALPHAK3': 17, 'AMPKA1': 17, 'AMPKA2': 17, 'ANKRD3': 17, 'ATM': 17, 'ATR': 17, 'AURA': 17, 'AURB': 17, 'AURC': 17, 'GRK2': 17, 'GRK3': 17, 'BCKDK': 17, 'BIKE': 17, 'BMPR1A': 17, 'BMPR1B': 17, 'BMPR2': 17, 'BRAF': 17, 'BRSK1': 17, 'BRSK2': 17, 'BUB1': 17, 'CAMK1A': 17, 'CAMK1B': 17, 'CAMK1D': 17, 'CAMK1G': 17, 'CAMK2A': 17, 'CAMK2B': 17, 'CAMK2D': 17, 'CAMK2G': 17, 'CAMK4': 17, 'CAMKK1': 17, 'CAMKK2': 17, 'CAMLCK': 17, 'CDK1': 17, 'CDC7': 17, 'CDK10': 17, 'CDK19': 17, 'CDK2': 17, 'CDK3': 17, 'CDK4': 17, 'CDK5': 17, 'CDK6': 17, 'CDK7': 17, 'CDK8': 17, 'CDK9': 17, 'CDKL1': 17, 'CDKL5': 17, 'CHAK1': 17, 'CHAK2': 17, 'CDK13': 17, 'CHK1': 17, 'CHK2': 
17, 'CK1A': 17, 'CK1A2': 17, 'CK1D': 17, 'CK1E': 17, 'CK1G1': 17, 'CK1G2': 17, 'CK1G3': 17, 'CK2A1': 17, 'CK2A2': 17, 'CLK1': 17, 'CLK2': 17, 'CLK3': 17, 'CLK4': 17, 'COT': 17, 'CRIK': 17, 'CDK12': 17, 'DAPK1': 17, 'DAPK2': 17, 'DAPK3': 17, 'DCAMKL1': 17, 'DCAMKL2': 17, 'DLK': 17, 'DMPK1': 17, 'DNAPK': 17, 'DRAK1': 17, 'DYRK1A': 17, 'DYRK1B': 17, 'DYRK2': 17, 'DYRK3': 17, 'DYRK4': 17, 'ERK1': 17, 'ERK2': 17, 'ERK5': 17, 'ERK7': 17, 'MTOR': 17, 'GAK': 17, 'GCK': 17, 'GCN2': 17, 'GRK4': 17, 'GRK5': 17, 'GRK6': 17, 'GRK7': 17, 'GSK3A': 17, 'GSK3B': 17, 'HASPIN': 17, 'HGK': 17, 'HIPK1': 17, 'HIPK2': 17, 'HIPK3': 17, 'HIPK4': 17, 'HPK1': 17, 'HRI': 17, 'HUNK': 17, 'ICK': 17, 'IKKA': 17, 'IKKB': 17, 'IKKE': 17, 'IRAK1': 17, 'IRAK4': 17, 'IRE1': 17, 'IRE2': 17, 'JNK1': 17, 'JNK2': 17, 'JNK3': 17, 'KHS1': 17, 'KHS2': 17, 'KIS': 17, 'LATS1': 17, 'LATS2': 17, 'LKB1': 17, 'LOK': 17, 'LRRK2': 17, 'MAK': 17, 'MEK1': 17, 'MEK2': 17, 'MEK5': 17, 'MEKK1': 17, 'YSK4': 17, 'MEKK2': 17, 'MEKK3': 17, 'ASK1': 17, 'MEKK6': 17, 'MAP3K15': 17, 'MAPKAPK2': 17, 'MAPKAPK3': 17, 'MAPKAPK5': 17, 'MARK1': 17, 'MARK2': 17, 'MARK3': 17, 'MARK4': 17, 'MASTL': 17, 'MELK': 17, 'MINK': 17, 'MLK1': 17, 'MLK2': 17, 'MLK3': 17, 'MLK4': 17, 'MNK1': 17, 'MNK2': 17, 'MOK': 17, 'MOS': 17, 'MPSK1': 17, 'MRCKA': 17, 'MRCKB': 17, 'MSK1': 17, 'MSK2': 17, 'SRPK3': 17, 'MST1': 17, 'MST2': 17, 'MST3': 17, 'MST4': 17, 'MYO3A': 17, 'MYO3B': 17, 'NDR1': 17, 'NDR2': 17, 'NEK1': 17, 'NEK11': 17, 'NEK2': 17, 'NEK3': 17, 'NEK4': 17, 'NEK5': 17, 'NEK6': 17, 'NEK7': 17, 'NEK8': 17, 'NEK9': 17, 'NIK': 17, 'NIM1': 17, 'NLK': 17, 'NUAK1': 17, 'NUAK2': 17, 'OSR1': 17, 'P38A': 17, 'P38B': 17, 'P38D': 17, 'P38G': 17, 'P70S6K': 17, 'P70S6KB': 17, 'PAK1': 17, 'PAK2': 17, 'PAK3': 17, 'PAK4': 17, 'PAK5': 17, 'PAK6': 17, 'PASK': 17, 'PBK': 17, 'CDK16': 17, 'CDK17': 17, 'CDK18': 17, 'PDHK1': 16, 'PDHK4': 16, 'PDK1': 17, 'PERK': 17, 'CDK14': 17, 'PHKG1': 17, 'PHKG2': 17, 'PIM1': 17, 'PIM2': 17, 'PIM3': 17, 'PINK1': 17, 'PKACA': 17, 
'PKACB': 17, 'PKACG': 17, 'PKCA': 17, 'PKCB': 17, 'PKCD': 17, 'PKCE': 17, 'PKCG': 17, 'PKCH': 17, 'PKCI': 17, 'PKCT': 17, 'PKCZ': 17, 'PRKD1': 17, 'PRKD2': 17, 'PRKD3': 17, 'PKG1': 17, 'PKG2': 17, 'PKN1': 17, 'PKN2': 17, 'PKN3': 17, 'PKR': 17, 'PLK1': 17, 'PLK2': 17, 'PLK3': 17, 'PLK4': 17, 'PRKX': 17, 'PRP4': 17, 'PRPK': 17, 'QIK': 17, 'QSK': 17, 'RAF1': 17, 'GRK1': 17, 'RIPK1': 17, 'RIPK2': 17, 'RIPK3': 17, 'ROCK1': 17, 'ROCK2': 17, 'P90RSK': 17, 'RSK2': 17, 'RSK3': 17, 'RSK4': 17, 'SBK': 17, 'MYLK4': 17, 'SGK1': 17, 'SGK3': 17, 'DSTYK': 17, 'SIK': 17, 'SKMLCK': 17, 'SLK': 17, 'SMG1': 17, 'SMMLCK': 17, 'SNRK': 17, 'SRPK1': 17, 'SRPK2': 17, 'SSTK': 17, 'STK33': 17, 'STLK3': 17, 'TAK1': 17, 'TAO1': 17, 'TAO2': 17, 'TAO3': 17, 'TBK1': 17, 'TGFBR1': 17, 'TGFBR2': 17, 'TLK1': 17, 'TLK2': 17, 'TNIK': 17, 'TSSK1': 17, 'TSSK2': 17, 'TTBK1': 17, 'TTBK2': 17, 'TTK': 17, 'ULK1': 17, 'ULK2': 17, 'VRK1': 17, 'VRK2': 17, 'WNK1': 17, 'WNK3': 17, 'WNK4': 17, 'YANK2': 17, 'YANK3': 17, 'YSK1': 17, 'ZAK': 17, 'EEF2K': 17, 'FAM20C': 17})
Multiply values using a kinase-specific scale factor: the number of random amino acids in PSPA for that kinase, looked up in num_aa_dict.
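The effect of the per-kinase scale factor can be sketched as follows. The three entries in `NUM_AA_EXCERPT` are copied from the default `num_aa_dict` in the signature above; the function name and the log-space arithmetic are assumptions made for illustration, not the library code.

```python
import math

# Excerpt of the default num_aa_dict shown in the signature (not the full table).
NUM_AA_EXCERPT = {'SRC': 18, 'TXK': 16, 'AKT1': 17}


def multiply_pspa_sketch(values, kinase, num_aa_dict=NUM_AA_EXCERPT):
    """Same log2 score as the multiply formula, with num_aa chosen per kinase."""
    num_aa = num_aa_dict[kinase]
    # Assumption: `values` includes position 0, so len(values) - 1 positions are scaled.
    return sum(math.log2(v) for v in values) + (len(values) - 1) * math.log2(num_aa)


values = [0.1] * 4 + [1.0]
for kinase in ('SRC', 'AKT1', 'TXK'):
    print(kinase, round(multiply_pspa_sketch(values, kinase), 3))
```

Identical probabilities yield slightly different scores for different kinases because the random background is `1/18`, `1/17` or `1/16` depending on the entry.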
sumup (values, kinase=None)
Sum up the probabilities of the amino acids at each position in a phosphorylation site sequence
| | Type | Default | Details |
|---|---|---|---|
| values | | | list of values, probabilities of amino acids at certain positions |
| kinase | NoneType | None | |
duplicate_ref_zero (df:pandas.core.frame.DataFrame)
If ‘0S’, ‘0T’, ‘0Y’ exist with non-zero values, create ‘0s’, ‘0t’, ‘0y’ with same values. If ‘0s’, ‘0t’, ‘0y’ exist with non-zero values, create ‘0S’, ‘0T’, ‘0Y’ with same values.
preprocess_ref (ref)
Convert pS/T/Y in ref columns to s/t/y if any; mirror 0S/T/Y to 0s/t/y.
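The two preprocessing steps described above can be sketched without pandas. In this sketch a plain dict that maps column names to lists of per-kinase values stands in for the reference DataFrame; that stand-in, the name `preprocess_ref_sketch`, and the choice never to overwrite a column that already exists are all assumptions made for illustration.

```python
def preprocess_ref_sketch(ref: dict) -> dict:
    """ref maps a column name such as '-1pS' or '0S' to a list of per-kinase values."""
    # Step 1: rename phospho-residue columns, e.g. '-1pS' -> '-1s'.
    out = {}
    for col, vals in ref.items():
        for old, new in (('pS', 's'), ('pT', 't'), ('pY', 'y')):
            col = col.replace(old, new)
        out[col] = list(vals)
    # Step 2: mirror position-0 columns in both directions when they carry signal.
    for upper, lower in (('0S', '0s'), ('0T', '0t'), ('0Y', '0y')):
        for src, dst in ((upper, lower), (lower, upper)):
            has_signal = src in out and any(v != 0 for v in out[src])
            # Assumption: an existing destination column is left untouched.
            if has_signal and dst not in out:
                out[dst] = list(out[src])
    return out


toy_ref = {'-1pS': [0.1, 0.2], '0S': [0.5, 0.0], '0T': [0.0, 0.0], '1P': [0.3, 0.4]}
print(sorted(preprocess_ref_sketch(toy_ref)))
```

In the toy input, `0S` is mirrored to `0s` because it has a non-zero entry, while the all-zero `0T` column produces no `0t`.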
predict_kinase (input_string:str, ref:pandas.core.frame.DataFrame, func:Callable, to_lower:bool=False, to_upper:bool=False, verbose=True)
Predict kinase given a phosphorylation site sequence
| | Type | Default | Details |
|---|---|---|---|
| input_string | str | | site sequence |
| ref | DataFrame | | reference dataframe for scoring |
| func | Callable | | function to calculate score |
| to_lower | bool | False | convert capital STY to lower case |
| to_upper | bool | False | convert all letters to uppercase |
| verbose | bool | True | |
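The overall flow (build position keys, look up each kinase's values for those keys, score, then rank) can be sketched in a few lines. To stay dependency-free, a nested dict stands in for the reference DataFrame, the `to_lower`/`to_upper` options are omitted, and the kinase names and probabilities in `toy_ref` are invented purely for illustration. All names here are hypothetical.

```python
import math


def log2_multiply(values, kinase=None, num_aa=23):
    """Scoring function in the spirit of the multiply formula (assumed form)."""
    return sum(math.log2(v) for v in values) + (len(values) - 1) * math.log2(num_aa)


def predict_kinase_sketch(input_string, ref, func):
    """ref: {kinase: {key: probability}}, a stand-in for the reference DataFrame."""
    center = len(input_string) // 2
    keys = [f"{i - center}{aa}" for i, aa in enumerate(input_string)]
    scores = {kinase: func([row[k] for k in keys], kinase) for kinase, row in ref.items()}
    # Highest score first, mirroring the sorted output shown on this page.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Invented toy reference: two made-up kinases and a 3-residue site.
toy_ref = {
    'KIN_B': {'-1L': 0.02, '0s': 1.0, '1Q': 0.05},
    'KIN_A': {'-1L': 0.20, '0s': 1.0, '1Q': 0.30},
}
ranking = predict_kinase_sketch('LsQ', toy_ref, log2_multiply)
print(list(ranking))  # ['KIN_A', 'KIN_B']
```

Passing the scoring function as an argument is what lets the same routine serve the PSPA and CDDM settings: only `ref` and `func` change.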
PSPA scoring:
considering string: ['-5V', '-4E', '-3P', '-2P', '-1L', '0s', '1Q', '2E', '3T', '4F', '5S']
kinase
ATM 5.037
SMG1 4.385
DNAPK 3.818
ATR 3.507
FAM20C 3.170
...
PKN1 -7.275
P70S6K -7.295
AKT3 -7.375
PKCI -7.742
NEK3 -8.254
Length: 303, dtype: float64
CDDM scoring, LO + sum
considering string: ['-7P', '-6S', '-5V', '-4E', '-3P', '-2P', '-1L', '0s', '1Q', '2E', '3T', '4F', '5S', '6D', '7L']
ATR 12.751
ATM 10.960
DNAPK 6.039
SRPK2 2.079
SMMLCK 1.876
...
ROR1 -89.216
CDC7 -91.457
CAMK1B -91.577
TNNI3K -118.835
BRAF -134.851
Length: 328, dtype: float64
CDDM scoring, PSSM + multiply (#23aa)
considering string: ['-7P', '-6S', '-5V', '-4E', '-3P', '-2P', '-1L', '0s', '1Q', '2E', '3T', '4F', '5S', '6D', '7L']
ATR 16.824
ATM 15.033
DNAPK 10.112
SRPK2 6.152
SMMLCK 5.949
...
ROR1 -85.143
CDC7 -87.384
CAMK1B -87.503
TNNI3K -114.762
BRAF -130.778
Length: 328, dtype: float64
CDDM scoring, PSSM + multiply (#20aa)
considering string: ['-7P', '-6S', '-5V', '-4E', '-3P', '-2P', '-1L', '0s', '1Q', '2E', '3T', '4F', '5S', '6D', '7L']
ATR 16.587
ATM 14.362
DNAPK 10.430
SRPK2 8.044
CHK2 7.955
...
TTK -43.375
GAK -45.159
CAMK1B -69.395
TNNI3K -70.993
BRAF -109.130
Length: 328, dtype: float64
Here we provide different PSSM settings, derived from either PSPA data or the kinase-substrate dataset, for kinase prediction:
Params (name=None, load=True)
considering string: ['-5V', '-4E', '-3P', '-2P', '-1L', '0s', '1Q', '2E', '3T', '4F', '5S']
kinase
ATM 5.037
SMG1 4.385
DNAPK 3.818
ATR 3.507
FAM20C 3.170
dtype: float64
considering string: ['-7P', '-6S', '-5V', '-4E', '-3P', '-2P', '-1L', '0s', '1Q', '2E', '3T', '4F', '5S', '6D', '7L']
ATR 12.751
ATM 10.960
DNAPK 6.039
SRPK2 2.079
SMMLCK 1.876
dtype: float64
considering string: ['-7P', '-6S', '-5V', '-4E', '-3P', '-2P', '-1L', '0S', '1Q', '2E', '3T', '4F', '5S', '6D', '7L']
ATR 11.815
ATM 9.590
DNAPK 5.659
SRPK2 3.272
CHK2 3.183
dtype: float64
multiply_generic (merged_df, kinases, df_index, divide_factor_func)
Multiply-based log-sum aggregation across kinases.
predict_kinase_df (df, seq_col, ref, func, to_lower=False, to_upper=False)
Predict kinase scores based on reference PSSM or weight matrix. Applies preprocessing, merges long format keys, then aggregates using given func.
Input dataframe has 100 rows
Preprocessing...
Preprocessing done. Expanding sequences...
Merging reference...
Merge complete.
Computing multiply_generic: 100%|██████████| 396/396 [00:00<00:00, 650.95it/s]
get_pct (site, ref, func, pct_ref)
Replicate the percentile results from The Kinase Library.
considering string: ['-5V', '-4E', '-3P', '-2P', '-1L', '0Y', '1Q', '2E', '3T', '4F', '5S']
| | log2(score) | percentile |
|---|---|---|
| ABL2 | 3.137 | 96.568694 |
| BMX | 2.816 | 96.117567 |
| BTK | 1.956 | 95.693780 |
| CSK | 2.303 | 95.174299 |
| MERTK | 2.509 | 93.588517 |
| ... | ... | ... |
| FLT1 | -1.919 | 25.358852 |
| PINK1_TYR | -1.227 | 21.927546 |
| MUSK | -3.031 | 21.298701 |
| TNNI3K_TYR | -3.549 | 11.004785 |
| PKMYT1_TYR | -1.739 | 4.798360 |
93 rows × 2 columns
considering string: ['-5V', '-4E', '-3P', '-2P', '-1L', '0S', '1Q', '2E', '3T', '4F']
| | log2(score) | percentile |
|---|---|---|
| ATM | 5.037 | 99.822351 |
| SMG1 | 4.385 | 99.831819 |
| DNAPK | 3.818 | 99.205315 |
| ATR | 3.507 | 99.680344 |
| FAM20C | 3.170 | 95.370556 |
| ... | ... | ... |
| PKN1 | -7.275 | 14.070436 |
| P70S6K | -7.295 | 4.089816 |
| AKT3 | -7.375 | 11.432995 |
| PKCI | -7.742 | 8.129511 |
| NEK3 | -8.254 | 4.637240 |
303 rows × 2 columns
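The percentile column in the `get_pct` output can be read as "what share of a reference score distribution does this site's score reach". The sketch below illustrates that idea under explicit assumptions: `pct_ref` is taken to hold, per kinase, the scores of a reference set of sites, and the percentile is the percentage of those scores at or below the query score. The tie-handling convention and the toy numbers are assumptions; The Kinase Library may compute it differently.

```python
from bisect import bisect_right


def percentile_of_score(reference_scores, score):
    """Percentage of reference scores that are <= score."""
    ordered = sorted(reference_scores)
    # bisect_right counts how many reference scores are at or below `score`.
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Invented reference distribution for a single hypothetical kinase.
toy_reference = [-4.0, -2.5, -1.0, 0.0, 0.5, 1.5, 2.0, 3.0, 4.5, 6.0]
print(percentile_of_score(toy_reference, 2.3))  # 70.0
```

Because each kinase has its own reference distribution, a lower log2(score) can still rank at a higher percentile, as with SMG1 and ATM in the table above.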
get_pct_df (score_df, pct_ref)
Replicate the percentile results from The Kinase Library.
| | Details |
|---|---|
| score_df | output from predict_kinase_df |
| pct_ref | a reference df for percentile calculation |