cut_seq('AAkUuPSFSTtH', -5, 4)
'AkUuPSFSTt'
Extract sequence based on a range relative to its center position
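A minimal sketch of this behaviour, assuming the center is the residue at index `len(seq) // 2` (which is consistent with the example above, where the center of `'AAkUuPSFSTtH'` is `'S'` at index 6) and that both ends of the range are inclusive. This is a hypothetical re-implementation, not the package's own code.

```python
def cut_seq(seq: str, start: int, end: int) -> str:
    """Hypothetical re-implementation: keep residues from center+start to center+end, inclusive.

    Assumes the center residue sits at index len(seq) // 2.
    """
    center = len(seq) // 2
    # Clip the left edge at 0 so a large negative offset does not wrap around.
    return seq[max(0, center + start): center + end + 1]


print(cut_seq('AAkUuPSFSTtH', -5, 4))  # prints AkUuPSFSTt
```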
Get a dictionary from the input site string; no '*' is needed to mark the center residue. Make sure the string is 15 or 10 residues long.
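The position-labelled strings printed as "considering string" in the examples below (for instance `'-5V'` or `'0s'`) can be reproduced by indexing each residue relative to the center. This sketch assumes the center is at index `len(seq) // 2`; `site_dict` and `site_keys` are hypothetical names, not the package's API.

```python
def site_dict(seq: str) -> dict[int, str]:
    """Hypothetical helper: map each residue to its offset from the center residue.

    Assumes no '*' marker and a center at index len(seq) // 2.
    """
    if '*' in seq:
        raise ValueError("remove the '*'; the center is inferred from the length")
    center = len(seq) // 2
    return {i - center: aa for i, aa in enumerate(seq)}


def site_keys(seq: str) -> list[str]:
    # Join offset and residue, e.g. -5 and 'V' -> '-5V', 0 and 's' -> '0s'.
    return [f"{pos}{aa}" for pos, aa in site_dict(seq).items()]


print(site_keys('VEPPLsQETFS'))
```

With a 15-residue input such as `'PSVEPPLsQETFSDL'` the same logic yields keys from `'-7P'` to `'7L'`, and a 10-residue input yields offsets -5 to 4.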
Multiply the probabilities of the amino acids at each position in a phosphorylation site.
\[ \text{Score} = \log_2 \left( \frac{ \prod P_{\text{KinX}}(\text{AA}, \text{Position}) }{ \left( \frac{1}{\#\text{Random AA}} \right)^{\text{length(Position except 0)}} } \right) \]
This function implements the formula from Johnson et al., Nature, "An atlas of substrate specificities for the human serine/threonine kinome", Supplementary Note 2 (page 160).
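The formula above can be sketched directly in Python. This takes it literally: the product runs over every position supplied, while the exponent in the denominator counts only the non-central positions. The function name `pspa_score` and the `{offset: probability}` input layout are assumptions for illustration.

```python
import math


def pspa_score(probs: dict[int, float], n_random_aa: int) -> float:
    """Hypothetical sketch: log2( prod P / (1/n_random_aa) ** n_flank )."""
    # Numerator: product of the matrix values at every supplied position.
    numerator = math.prod(probs.values())
    # Exponent: number of positions other than the central one (position 0).
    n_flank = sum(1 for pos in probs if pos != 0)
    denominator = (1 / n_random_aa) ** n_flank
    return math.log2(numerator / denominator)


# Two flanking positions at 0.1 each with 20 random amino acids:
# 0.01 / 0.05**2 = 4, so the score is about 2.
print(pspa_score({-1: 0.1, 0: 1.0, 1: 0.1}, 20))
```

Summing log2 values instead of multiplying gives the same score and is numerically safer for long windows.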
Multiply class that accounts for a variable scale factor.
Multiply values, accounting for the scale factor, i.e. the number of random amino acids in the PSPA library.
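A callable class is one way to keep the scale factor configurable. In this sketch, dividing by `(1/scale_factor) ** n_flank` becomes adding `n_flank * log2(scale_factor)` in log space. The `(#23aa)` and `(#20aa)` labels in the examples below appear to refer to this setting, but the class name `MultiplyScore` and its key format (`'-1L'`, `'0s'`) are assumptions, not the package's API.

```python
import math


class MultiplyScore:
    """Hypothetical callable scorer: log2-odds of a product of matrix values.

    scale_factor is assumed to be the number of random amino acids in the
    PSPA library (e.g. 20 or 23); the background probability is 1/scale_factor.
    """

    def __init__(self, scale_factor: int = 20):
        self.scale_factor = scale_factor

    def __call__(self, values: dict[str, float]) -> float:
        # Keys look like '-1L' or '0s': an offset followed by a one-letter residue.
        n_flank = sum(1 for key in values if int(key[:-1]) != 0)
        # log2 of the product, computed as a sum of logs; every value must be > 0.
        log_prod = sum(math.log2(v) for v in values.values())
        # Dividing by (1/scale_factor)**n_flank adds n_flank * log2(scale_factor).
        return log_prod + n_flank * math.log2(self.scale_factor)


site = {'-1L': 0.1, '0s': 1.0, '1Q': 0.1}
print(MultiplyScore(20)(site), MultiplyScore(23)(site))
```

Changing the scale factor shifts every score by `n_flank * log2(new/old)`, so it changes absolute values but not the ranking for a fixed window length.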
Sum up the probabilities of the amino acids at each position in a phosphorylation site sequence.
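When the reference already holds log-odds (LO) values, the score is a plain sum over the site's position keys, as in the "LO + sum" setting shown below. This sketch assumes keys missing from the table contribute 0; `sum_score` is a hypothetical name.

```python
def sum_score(log_odds: dict[str, float], keys: list[str]) -> float:
    """Hypothetical sum-based scorer over position-labelled keys such as '-1L'.

    Assumes keys absent from the log-odds table contribute 0.
    """
    return sum(log_odds.get(key, 0.0) for key in keys)


print(sum_score({'-1L': 1.5, '0s': 0.0, '1Q': -0.5}, ['-1L', '0s', '1Q', '2E']))  # prints 1.0
```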
If ‘0S’, ‘0T’, ‘0Y’ exist with non-zero values, create ‘0s’, ‘0t’, ‘0y’ with the same values. If ‘0s’, ‘0t’, ‘0y’ exist with non-zero values, create ‘0S’, ‘0T’, ‘0Y’ with the same values.
Convert pS/pT/pY in reference columns to s/t/y, if present; mirror 0S/0T/0Y to 0s/0t/0y.
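The two normalisation rules above can be sketched on a plain dictionary of reference values. The key format (`'-1pS'` for a phosphoserine at -1, `'0S'` for the center) is an assumption, and mirroring here only fills a counterpart that is missing or zero; `normalize_ref_keys` is a hypothetical name.

```python
import re


def normalize_ref_keys(ref: dict[str, float]) -> dict[str, float]:
    """Hypothetical helper for a reference table keyed like '-1pS', '0S', '3T'."""
    # 1) Rename phospho-residues: '-1pS' -> '-1s', '2pT' -> '2t', '4pY' -> '4y'.
    #    Upper-case 'P' (proline) is not matched because the pattern uses lower-case 'p'.
    out = {re.sub(r"p([STY])$", lambda m: m.group(1).lower(), key): val
           for key, val in ref.items()}
    # 2) Mirror the central residue between upper and lower case when one side is
    #    non-zero and the other is missing or zero.
    for upper in ("0S", "0T", "0Y"):
        lower = upper.lower()
        if out.get(upper) and not out.get(lower):
            out[lower] = out[upper]
        elif out.get(lower) and not out.get(upper):
            out[upper] = out[lower]
    return out


print(normalize_ref_keys({'-1pS': 0.5, '0S': 1.2, '0t': 0.8, '1L': 0.3}))
```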
Predict kinase given a phosphorylation site sequence
PSPA scoring:
considering string: ['-5V', '-4E', '-3P', '-2P', '-1L', '0s', '1Q', '2E', '3T', '4F', '5S']
kinase
ATM 5.037
SMG1 4.385
DNAPK 3.818
ATR 3.507
FAM20C 3.170
...
PKN1 -7.275
P70S6K -7.295
AKT3 -7.375
PKCI -7.742
NEK3 -8.254
Length: 303, dtype: float64
CDDM scoring, LO + sum
considering string: ['-7P', '-6S', '-5V', '-4E', '-3P', '-2P', '-1L', '0s', '1Q', '2E', '3T', '4F', '5S', '6D', '7L']
ATR 12.751
ATM 10.960
DNAPK 6.039
SRPK2 2.079
SMMLCK 1.876
...
ROR1 -89.216
CDC7 -91.457
CAMK1B -91.577
TNNI3K -118.835
BRAF -134.851
Length: 328, dtype: float64
CDDM scoring, PSSM + multiply (#23aa)
considering string: ['-7P', '-6S', '-5V', '-4E', '-3P', '-2P', '-1L', '0s', '1Q', '2E', '3T', '4F', '5S', '6D', '7L']
ATR 16.824
ATM 15.033
DNAPK 10.112
SRPK2 6.152
SMMLCK 5.949
...
ROR1 -85.143
CDC7 -87.384
CAMK1B -87.503
TNNI3K -114.762
BRAF -130.778
Length: 328, dtype: float64
CDDM scoring, PSSM + multiply (#20aa)
considering string: ['-7P', '-6S', '-5V', '-4E', '-3P', '-2P', '-1L', '0s', '1Q', '2E', '3T', '4F', '5S', '6D', '7L']
ATR 16.587
ATM 14.362
DNAPK 10.430
SRPK2 8.044
CHK2 7.955
...
TTK -43.375
GAK -45.159
CAMK1B -69.395
TNNI3K -70.993
BRAF -109.130
Length: 328, dtype: float64
Here we provide different PSSM settings, derived from either PSPA data or a kinase-substrate dataset, for kinase prediction:
Call self as a function.
considering string: ['-5V', '-4E', '-3P', '-2P', '-1L', '0s', '1Q', '2E', '3T', '4F', '5S']
kinase
ATM 5.037
SMG1 4.385
DNAPK 3.818
ATR 3.507
FAM20C 3.170
dtype: float64
considering string: ['-7P', '-6S', '-5V', '-4E', '-3P', '-2P', '-1L', '0s', '1Q', '2E', '3T', '4F', '5S', '6D', '7L']
ATR 12.751
ATM 10.960
DNAPK 6.039
SRPK2 2.079
SMMLCK 1.876
dtype: float64
considering string: ['-7P', '-6S', '-5V', '-4E', '-3P', '-2P', '-1L', '0S', '1Q', '2E', '3T', '4F', '5S', '6D', '7L']
ATR 11.815
ATM 9.590
DNAPK 5.659
SRPK2 3.272
CHK2 3.183
dtype: float64
Multiply-based log-sum aggregation across kinases.
Predict kinase scores from a reference PSSM or weight matrix. Applies preprocessing, merges the long-format position keys with the reference, then aggregates them using the given func.
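The expand, merge, and aggregate steps can be illustrated with the standard library alone. This is a sketch under stated assumptions: sites arrive already split into position keys, keys absent from a kinase's reference are dropped, and `predict_long` is a hypothetical name rather than the package's function.

```python
from collections import defaultdict
from typing import Callable


def predict_long(sites: dict[str, list[str]],
                 ref: dict[str, dict[str, float]],
                 func: Callable[[list[float]], float]) -> dict[str, dict[str, float]]:
    """Hypothetical sketch: long-format expansion, reference merge, per-kinase aggregation."""
    # 1) Expand each site into long format: one (site, position-key) row per residue.
    long_rows = [(site, key) for site, keys in sites.items() for key in keys]
    # 2) "Merge": look up each key in every kinase's reference; missing keys are dropped.
    grouped: dict[tuple[str, str], list[float]] = defaultdict(list)
    for site, key in long_rows:
        for kinase, weights in ref.items():
            if key in weights:
                grouped[(site, kinase)].append(weights[key])
    # 3) Aggregate the matched values per (site, kinase) with the supplied function.
    out: dict[str, dict[str, float]] = defaultdict(dict)
    for (site, kinase), vals in grouped.items():
        out[site][kinase] = func(vals)
    return dict(out)


ref = {'K1': {'-1L': 1.0, '0s': 0.5, '1Q': 2.0}, 'K2': {'-1L': -1.0, '1Q': 0.5}}
print(predict_long({'s1': ['-1L', '0s', '1Q']}, ref, sum))
```

Passing `sum` suits log-odds matrices; for probability matrices a multiply-based log-sum such as `lambda vals: sum(math.log2(v) for v in vals)` (after `import math`) would play that role.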
Input dataframe has 100 rows
Preprocessing...
Preprocessing done. Expanding sequences...
Merging reference...
Merge complete.
Computing multiply_generic: 100%|██████████| 396/396 [00:00<00:00, 1816.26it/s]
Replicate the percentile results from The Kinase Library.
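A percentile places a raw score within a background distribution of scores for the same kinase. This sketch assumes the percentile is the share of that kinase's background scores at or below the query score; the exact definition used by The Kinase Library may differ, and `percentile_of` is a hypothetical name. A per-kinase background would explain why, in the tables below, ordering by log2(score) and ordering by percentile disagree.

```python
from bisect import bisect_right


def percentile_of(score: float, background: list[float]) -> float:
    """Hypothetical: percentage of background scores that are <= score."""
    ranked = sorted(background)
    # bisect_right counts how many background scores are <= score.
    return 100 * bisect_right(ranked, score) / len(ranked)


print(percentile_of(2.5, [4.0, 1.0, 3.0, 2.0]))  # prints 50.0
```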
considering string: ['-5V', '-4E', '-3P', '-2P', '-1L', '0Y', '1Q', '2E', '3T', '4F', '5S']
| kinase | log2(score) | percentile |
|---|---|---|
| ABL2 | 3.137 | 96.568694 |
| BMX | 2.816 | 96.117567 |
| BTK | 1.956 | 95.693780 |
| CSK | 2.303 | 95.174299 |
| MERTK | 2.509 | 93.588517 |
| ... | ... | ... |
| FLT1 | -1.919 | 25.358852 |
| PINK1_TYR | -1.227 | 21.927546 |
| MUSK | -3.031 | 21.298701 |
| TNNI3K_TYR | -3.549 | 11.004785 |
| PKMYT1_TYR | -1.739 | 4.798360 |
93 rows × 2 columns
considering string: ['-5V', '-4E', '-3P', '-2P', '-1L', '0S', '1Q', '2E', '3T', '4F']
| kinase | log2(score) | percentile |
|---|---|---|
| ATM | 5.037 | 99.822351 |
| SMG1 | 4.385 | 99.831819 |
| DNAPK | 3.818 | 99.205315 |
| ATR | 3.507 | 99.680344 |
| FAM20C | 3.170 | 95.370556 |
| ... | ... | ... |
| PKN1 | -7.275 | 14.070436 |
| P70S6K | -7.295 | 4.089816 |
| AKT3 | -7.375 | 11.432995 |
| PKCI | -7.742 | 8.129511 |
| NEK3 | -8.254 | 4.637240 |
303 rows × 2 columns
Replicate the percentile results from The Kinase Library.