data = Data.get_ks_dataset()
CPU times: user 1.05 s, sys: 447 ms, total: 1.5 s
Wall time: 14.2 s
get_prob (df:pandas.core.frame.DataFrame, col:str, aa_order=['P', 'G', 'A', 'C', 'S', 'T', 'V', 'I', 'L', 'M', 'F', 'Y', 'W', 'H', 'K', 'R', 'Q', 'N', 'D', 'E', 's', 't', 'y'])
Get the probability matrix of PSSM from phosphorylation site sequences.
This function computes a position-specific probability matrix (PSSM) from a list of aligned phosphorylation site sequences.
For each position \(i\) (e.g., from \(-7\) to \(+7\)), the probability of observing amino acid \(x\) is:
\[ P_i(x) = \frac{\text{count of amino acid } x \text{ at position } i}{\text{total counts at position } i} \]
The following 23 amino acids are included: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, plus s, t, y (often used to denote phosphorylated S, T, Y).
In the output, the modified residues are renamed as:
- s → pS
- t → pT
- y → pY
The resulting matrix has:
- Rows: amino acids (including pS, pT, pY)
- Columns: sequence positions (centered on the phosphosite)
- Values: probabilities of each amino acid at each position
Position | -20 | -19 | -18 | -17 | -16 | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
aa | |||||||||||||||||||||||||||||||||||||||||
P | 0.100762 | 0.083686 | 0.082422 | 0.090680 | 0.079631 | 0.082008 | 0.088333 | 0.077815 | 0.091584 | 0.083128 | 0.071487 | 0.080526 | 0.085106 | 0.091354 | 0.097561 | 0.087520 | 0.091350 | 0.104116 | 0.159677 | 0.082192 | 0.0 | 0.758065 | 0.086360 | 0.088781 | 0.084951 | 0.101297 | 0.090171 | 0.107492 | 0.086743 | 0.098280 | 0.088889 | 0.085950 | 0.091211 | 0.078138 | 0.078138 | 0.085762 | 0.096909 | 0.096477 | 0.063973 | 0.081218 | 0.094996 |
G | 0.069433 | 0.073542 | 0.068966 | 0.054576 | 0.081308 | 0.064435 | 0.084167 | 0.072848 | 0.080858 | 0.075720 | 0.087099 | 0.089565 | 0.060556 | 0.075041 | 0.083740 | 0.076985 | 0.080032 | 0.082324 | 0.066935 | 0.096696 | 0.0 | 0.031452 | 0.092817 | 0.062954 | 0.050162 | 0.081037 | 0.066613 | 0.065961 | 0.076105 | 0.066339 | 0.074074 | 0.065289 | 0.067993 | 0.087282 | 0.068994 | 0.071607 | 0.085213 | 0.060403 | 0.074074 | 0.082910 | 0.064461 |
A | 0.075360 | 0.071006 | 0.079058 | 0.058774 | 0.070411 | 0.067782 | 0.075833 | 0.086921 | 0.068482 | 0.069959 | 0.067379 | 0.062449 | 0.069558 | 0.070962 | 0.069919 | 0.073744 | 0.074373 | 0.071832 | 0.083065 | 0.087027 | 0.0 | 0.022581 | 0.092010 | 0.056497 | 0.079288 | 0.057536 | 0.077985 | 0.072476 | 0.068740 | 0.066339 | 0.075720 | 0.071074 | 0.065506 | 0.063175 | 0.069825 | 0.072440 | 0.067669 | 0.079698 | 0.058081 | 0.061760 | 0.078880 |
C | 0.013548 | 0.006762 | 0.007569 | 0.009236 | 0.014250 | 0.011715 | 0.008333 | 0.007450 | 0.009901 | 0.007407 | 0.010682 | 0.008217 | 0.007365 | 0.008972 | 0.010569 | 0.005673 | 0.004042 | 0.007264 | 0.013710 | 0.008864 | 0.0 | 0.001613 | 0.008878 | 0.006457 | 0.010518 | 0.009724 | 0.008936 | 0.010586 | 0.007365 | 0.010647 | 0.006584 | 0.011570 | 0.012438 | 0.004988 | 0.011638 | 0.010824 | 0.011696 | 0.009228 | 0.013468 | 0.007614 | 0.012723 |
S | 0.041490 | 0.053254 | 0.043734 | 0.052057 | 0.035205 | 0.041004 | 0.050833 | 0.050497 | 0.043729 | 0.039506 | 0.043550 | 0.041085 | 0.040098 | 0.038336 | 0.034146 | 0.029984 | 0.034762 | 0.016949 | 0.020968 | 0.028203 | 0.0 | 0.003226 | 0.029056 | 0.025020 | 0.038026 | 0.033225 | 0.044679 | 0.041531 | 0.043372 | 0.044226 | 0.050206 | 0.041322 | 0.046434 | 0.044057 | 0.050707 | 0.045795 | 0.045948 | 0.046980 | 0.050505 | 0.054146 | 0.047498 |
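As a rough illustration of the counting formula above, the sketch below computes per-position probabilities for a few toy sequences. It is not the library's `get_prob` implementation: the helper name `position_probabilities`, the toy sequences, and the assumption that all sequences have equal length with the phosphosite in the middle are illustrative only.

```python
import pandas as pd

# Residue order taken from the default aa_order in the signature above.
AA_ORDER = list("PGACSTVILMFYWHKRQNDE") + ["s", "t", "y"]

def position_probabilities(seqs, aa_order=AA_ORDER):
    """Per-position residue frequencies for equal-length, centered sequences.

    Hypothetical helper for illustration; not the library function.
    """
    seqs = list(seqs)
    width = len(seqs[0])
    half = width // 2
    # One column per position: split each sequence into single characters.
    chars = pd.DataFrame([list(s) for s in seqs],
                         columns=range(-half, width - half))
    # Count each residue per position; residues never observed get a count of 0.
    counts = chars.apply(lambda col: col.value_counts()).reindex(aa_order).fillna(0)
    # Divide by the column total so that each position sums to 1.
    probs = counts / counts.sum(axis=0)
    # Rename s/t/y to pS/pT/pY, as described for the output.
    probs.index = [{"s": "pS", "t": "pT", "y": "pY"}.get(a, a) for a in aa_order]
    return probs

toy = ["AAsPK", "GRsPE", "KRtPA", "AKyGE"]  # made-up sequences
pssm = position_probabilities(toy)
print(pssm.loc["P", 1])   # 0.75: three of the four sequences have P at +1
```

Every column of the toy result sums to 1, which is the property the formula guarantees.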
pSTY2sty (string)
Convert pS/pT/pY to s/t/y in a string.
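A minimal sketch of this kind of label conversion, assuming plain substring replacement is sufficient. The function name `psty_to_sty` is hypothetical and the library's `pSTY2sty` may be implemented differently.

```python
def psty_to_sty(text: str) -> str:
    """Replace the two-character labels pS/pT/pY with s/t/y (illustrative only)."""
    for long_label, short_label in (("pS", "s"), ("pT", "t"), ("pY", "y")):
        text = text.replace(long_label, short_label)
    return text

print(psty_to_sty("-5pS"))  # -5s
```

Labels without a phospho prefix, such as `3P`, pass through unchanged.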
-5P | -5G | -5A | -5C | -5S | -5T | -5V | -5I | -5L | -5M | -5F | -5Y | -5W | -5H | -5K | -5R | -5Q | -5N | -5D | -5E | -5pS | -5pT | -5pY | -4P | -4G | -4A | -4C | -4S | -4T | -4V | -4I | -4L | -4M | -4F | -4Y | -4W | -4H | -4K | -4R | -4Q | -4N | -4D | -4E | -4pS | -4pT | -4pY | -3P | -3G | -3A | -3C | ... | 2E | 2pS | 2pT | 2pY | 3P | 3G | 3A | 3C | 3S | 3T | 3V | 3I | 3L | 3M | 3F | 3Y | 3W | 3H | 3K | 3R | 3Q | 3N | 3D | 3E | 3pS | 3pT | 3pY | 4P | 4G | 4A | 4C | 4S | 4T | 4V | 4I | 4L | 4M | 4F | 4Y | 4W | 4H | 4K | 4R | 4Q | 4N | 4D | 4E | 4pS | 4pT | 4pY | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
kinase | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
AAK1 | 0.05845 | 0.01989 | 0.02305 | 0.03702 | 0.03450 | 0.03450 | 0.07720 | 0.12615 | 0.08061 | 0.07014 | 0.03450 | 0.07728 | 0.02557 | 0.02687 | 0.02127 | 0.07760 | 0.04546 | 0.02232 | 0.01299 | 0.01242 | 0.01632 | 0.01632 | 0.04960 | 0.04172 | 0.05015 | 0.05515 | 0.04375 | 0.04836 | 0.04836 | 0.04836 | 0.04851 | 0.05796 | 0.05414 | 0.04062 | 0.04172 | 0.03148 | 0.04015 | 0.06320 | 0.05586 | 0.04898 | 0.03351 | 0.02594 | 0.04375 | 0.02594 | 0.02594 | 0.02648 | 0.08610 | 0.04067 | 0.08888 | 0.05203 | ... | 0.04025 | 0.03142 | 0.03142 | 0.03149 | 0.05264 | 0.07135 | 0.04499 | 0.05735 | 0.04499 | 0.04499 | 0.04715 | 0.02999 | 0.03780 | 0.03378 | 0.03324 | 0.04120 | 0.03718 | 0.05210 | 0.05712 | 0.06965 | 0.04816 | 0.05681 | 0.03131 | 0.02868 | 0.02589 | 0.02589 | 0.02775 | 0.05026 | 0.05618 | 0.05170 | 0.04826 | 0.04482 | 0.04482 | 0.03377 | 0.03321 | 0.03689 | 0.03713 | 0.04186 | 0.04170 | 0.06611 | 0.04482 | 0.06651 | 0.07427 | 0.05082 | 0.04738 | 0.03113 | 0.03657 | 0.02009 | 0.02009 | 0.02161 |
ACVR2A | 0.02971 | 0.03443 | 0.04180 | 0.03500 | 0.04137 | 0.04137 | 0.04281 | 0.04474 | 0.04266 | 0.03729 | 0.04295 | 0.04137 | 0.05748 | 0.04080 | 0.03651 | 0.03400 | 0.03078 | 0.03837 | 0.06356 | 0.05648 | 0.05605 | 0.05605 | 0.05440 | 0.03341 | 0.03936 | 0.03979 | 0.03950 | 0.03936 | 0.03936 | 0.03893 | 0.03771 | 0.03728 | 0.04130 | 0.04438 | 0.04201 | 0.05406 | 0.03950 | 0.02911 | 0.03276 | 0.03456 | 0.03592 | 0.07456 | 0.06230 | 0.05800 | 0.05800 | 0.04882 | 0.03345 | 0.04351 | 0.03578 | 0.03918 | ... | 0.04447 | 0.04786 | 0.04786 | 0.03799 | 0.04958 | 0.04381 | 0.03914 | 0.03559 | 0.04366 | 0.04366 | 0.04196 | 0.04099 | 0.04529 | 0.04358 | 0.04765 | 0.04839 | 0.04699 | 0.04366 | 0.03419 | 0.02864 | 0.03692 | 0.03877 | 0.04603 | 0.06438 | 0.03840 | 0.03840 | 0.06031 | 0.05559 | 0.03989 | 0.03652 | 0.03791 | 0.04129 | 0.04129 | 0.03784 | 0.04129 | 0.03755 | 0.04855 | 0.03835 | 0.04246 | 0.05867 | 0.04202 | 0.03865 | 0.03601 | 0.04517 | 0.04077 | 0.04693 | 0.04693 | 0.05155 | 0.05155 | 0.04319 |
ACVR2B | 0.03779 | 0.03665 | 0.04013 | 0.05473 | 0.03779 | 0.03779 | 0.03850 | 0.03134 | 0.03339 | 0.03658 | 0.04282 | 0.04303 | 0.05112 | 0.03672 | 0.03063 | 0.03346 | 0.03531 | 0.04218 | 0.07551 | 0.06381 | 0.05402 | 0.05402 | 0.05268 | 0.03774 | 0.04406 | 0.04048 | 0.04645 | 0.04013 | 0.04013 | 0.03647 | 0.03226 | 0.03640 | 0.03521 | 0.04392 | 0.04273 | 0.04954 | 0.04287 | 0.02663 | 0.02586 | 0.03465 | 0.04013 | 0.07224 | 0.06149 | 0.05538 | 0.05538 | 0.05987 | 0.03044 | 0.03658 | 0.03579 | 0.03882 | ... | 0.04579 | 0.06240 | 0.06240 | 0.04636 | 0.05072 | 0.04699 | 0.04188 | 0.03976 | 0.04268 | 0.04268 | 0.04210 | 0.04297 | 0.05006 | 0.04443 | 0.04122 | 0.05123 | 0.04845 | 0.03727 | 0.02653 | 0.03311 | 0.03888 | 0.03815 | 0.04268 | 0.05423 | 0.04633 | 0.04633 | 0.05130 | 0.05205 | 0.04200 | 0.04735 | 0.03984 | 0.04056 | 0.04056 | 0.03998 | 0.03926 | 0.03832 | 0.03774 | 0.03485 | 0.04244 | 0.05675 | 0.04056 | 0.03261 | 0.03514 | 0.04229 | 0.03846 | 0.05278 | 0.05039 | 0.05502 | 0.05502 | 0.04605 |
AKT1 | 0.04669 | 0.04599 | 0.04274 | 0.04684 | 0.03995 | 0.03995 | 0.03306 | 0.03368 | 0.03592 | 0.03910 | 0.04235 | 0.03995 | 0.04065 | 0.05056 | 0.07355 | 0.11753 | 0.03801 | 0.03616 | 0.02911 | 0.02911 | 0.03298 | 0.03298 | 0.03314 | 0.04161 | 0.04729 | 0.04091 | 0.03858 | 0.04091 | 0.04091 | 0.03461 | 0.03018 | 0.03430 | 0.04114 | 0.03360 | 0.03593 | 0.04449 | 0.04667 | 0.08455 | 0.12592 | 0.04161 | 0.03679 | 0.02598 | 0.03220 | 0.03508 | 0.03508 | 0.03166 | 0.02642 | 0.02358 | 0.02837 | 0.02748 | ... | 0.01816 | 0.02271 | 0.02271 | 0.02975 | 0.03828 | 0.05555 | 0.03828 | 0.05109 | 0.04422 | 0.04422 | 0.03594 | 0.03852 | 0.03797 | 0.03852 | 0.04750 | 0.04500 | 0.04422 | 0.06000 | 0.07180 | 0.08906 | 0.04430 | 0.04875 | 0.02289 | 0.02469 | 0.02633 | 0.02633 | 0.02656 | 0.07361 | 0.04755 | 0.03900 | 0.03884 | 0.03900 | 0.03900 | 0.03301 | 0.03037 | 0.03692 | 0.03637 | 0.03109 | 0.02789 | 0.04156 | 0.05299 | 0.09151 | 0.08648 | 0.05874 | 0.05187 | 0.03541 | 0.02494 | 0.03141 | 0.03141 | 0.02102 |
AKT2 | 0.04617 | 0.04732 | 0.04931 | 0.04464 | 0.04095 | 0.04095 | 0.03321 | 0.03206 | 0.03781 | 0.03934 | 0.04203 | 0.04180 | 0.04095 | 0.04671 | 0.07516 | 0.11320 | 0.03796 | 0.03643 | 0.02339 | 0.02416 | 0.03497 | 0.03497 | 0.03651 | 0.04437 | 0.05416 | 0.05245 | 0.04390 | 0.04056 | 0.04056 | 0.03201 | 0.03023 | 0.03644 | 0.04056 | 0.03194 | 0.03668 | 0.03085 | 0.04701 | 0.09053 | 0.10723 | 0.04507 | 0.04181 | 0.02611 | 0.02953 | 0.03435 | 0.03435 | 0.02930 | 0.01778 | 0.01831 | 0.02729 | 0.02649 | ... | 0.02437 | 0.04664 | 0.04664 | 0.04794 | 0.04509 | 0.05588 | 0.04689 | 0.05708 | 0.03918 | 0.03918 | 0.03273 | 0.02719 | 0.03783 | 0.03236 | 0.03790 | 0.03910 | 0.03918 | 0.06127 | 0.07865 | 0.08030 | 0.04247 | 0.04052 | 0.02592 | 0.02577 | 0.03558 | 0.03558 | 0.04434 | 0.07404 | 0.05528 | 0.04158 | 0.04502 | 0.03668 | 0.03668 | 0.02887 | 0.03093 | 0.03315 | 0.03668 | 0.03063 | 0.03446 | 0.03178 | 0.05199 | 0.08844 | 0.07580 | 0.04992 | 0.04770 | 0.02772 | 0.02680 | 0.04196 | 0.04196 | 0.03193 |
5 rows × 230 columns
Index(['-5P', '-5G', '-5A', '-5C', '-5S', '-5T', '-5V', '-5I', '-5L', '-5M',
...
'4H', '4K', '4R', '4Q', '4N', '4D', '4E', '4s', '4t', '4y'],
dtype='object', length=230)
flatten_pssm (pssm_df, use_sty=False, column_wise=True)
Flatten PSSM dataframe to dictionary
Parameter | Type | Default | Details |
---|---|---|---|
pssm_df | | | |
use_sty | bool | False | if True, use s, t, y instead of pS, pT, pY |
column_wise | bool | True | if True, flatten column-major; otherwise flatten row-wise (for PyTorch training) |
-20P 0.100762
-20G 0.069433
...
20pT 0.018660
20pY 0.008482
Length: 943, dtype: float64
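The column-major flattening shown above can be sketched with `DataFrame.unstack`, which walks the table one position at a time. This is an illustration under assumptions, not the library code: `flatten_column_major` and the toy matrix are hypothetical, and the `use_sty` and `column_wise` options are not reproduced.

```python
import pandas as pd

def flatten_column_major(pssm_df: pd.DataFrame) -> pd.Series:
    """Flatten a PSSM position by position into labels like '-1P' (illustrative)."""
    # unstack() on a plain DataFrame yields a Series indexed by (position, aa),
    # ordered so that all residues of one position come before the next position.
    flat = pssm_df.unstack()
    flat.index = [f"{pos}{aa}" for pos, aa in flat.index]
    return flat

toy = pd.DataFrame([[0.7, 0.0, 0.4],
                    [0.3, 1.0, 0.6]],
                   index=["P", "pS"], columns=[-1, 0, 1])
flat = flatten_column_major(toy)
print(list(flat.index))  # ['-1P', '-1pS', '0P', '0pS', '1P', '1pS']
```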
recover_pssm (flat_pssm:pandas.core.series.Series)
Recover 2D PSSM from flattened PSSM Series.
Position | -20 | -19 | -18 | -17 | -16 | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
aa | |||||||||||||||||||||||||||||||||||||||||
P | 0.100762 | 0.083686 | 0.082422 | 0.090680 | 0.079631 | 0.082008 | 0.088333 | 0.077815 | 0.091584 | 0.083128 | 0.071487 | 0.080526 | 0.085106 | 0.091354 | 0.097561 | 0.087520 | 0.091350 | 0.104116 | 0.159677 | 0.082192 | 0.000000 | 0.758065 | 0.086360 | 0.088781 | 0.084951 | 0.101297 | 0.090171 | 0.107492 | 0.086743 | 0.098280 | 0.088889 | 0.085950 | 0.091211 | 0.078138 | 0.078138 | 0.085762 | 0.096909 | 0.096477 | 0.063973 | 0.081218 | 0.094996 |
G | 0.069433 | 0.073542 | 0.068966 | 0.054576 | 0.081308 | 0.064435 | 0.084167 | 0.072848 | 0.080858 | 0.075720 | 0.087099 | 0.089565 | 0.060556 | 0.075041 | 0.083740 | 0.076985 | 0.080032 | 0.082324 | 0.066935 | 0.096696 | 0.000000 | 0.031452 | 0.092817 | 0.062954 | 0.050162 | 0.081037 | 0.066613 | 0.065961 | 0.076105 | 0.066339 | 0.074074 | 0.065289 | 0.067993 | 0.087282 | 0.068994 | 0.071607 | 0.085213 | 0.060403 | 0.074074 | 0.082910 | 0.064461 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
pT | 0.028789 | 0.027895 | 0.031960 | 0.020151 | 0.020956 | 0.038494 | 0.019167 | 0.031457 | 0.023102 | 0.023868 | 0.032868 | 0.027938 | 0.033552 | 0.027732 | 0.038211 | 0.038088 | 0.039612 | 0.047619 | 0.045161 | 0.031426 | 0.324738 | 0.016129 | 0.037934 | 0.025827 | 0.038835 | 0.021070 | 0.035743 | 0.035831 | 0.027005 | 0.026208 | 0.023868 | 0.026446 | 0.022388 | 0.033250 | 0.023275 | 0.022481 | 0.028404 | 0.024329 | 0.028620 | 0.027919 | 0.018660 |
pY | 0.003387 | 0.005917 | 0.011775 | 0.007557 | 0.005029 | 0.010879 | 0.005000 | 0.004139 | 0.006601 | 0.009877 | 0.009860 | 0.011504 | 0.009820 | 0.006525 | 0.010569 | 0.012156 | 0.016168 | 0.008071 | 0.012097 | 0.008058 | 0.010475 | 0.004032 | 0.009685 | 0.015335 | 0.009709 | 0.014587 | 0.004874 | 0.006515 | 0.005728 | 0.007371 | 0.010700 | 0.007438 | 0.010779 | 0.007481 | 0.008313 | 0.004996 | 0.004177 | 0.002517 | 0.008418 | 0.007614 | 0.008482 |
23 rows × 41 columns
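One way to reverse the flattening is to split each label into its signed position and residue name, then pivot back to a 2D table. The sketch below assumes labels of the form `<position><aa>`; the name `recover_from_flat` is hypothetical and the library's `recover_pssm` may work differently.

```python
import re
import pandas as pd

def recover_from_flat(flat: pd.Series) -> pd.DataFrame:
    """Rebuild an (aa x position) table from labels such as '-5P' or '0pS'."""
    records = []
    for label, value in flat.items():
        # A label is a signed integer position followed by the residue name.
        pos, aa = re.fullmatch(r"(-?\d+)(.+)", label).groups()
        records.append((aa, int(pos), value))
    long = pd.DataFrame(records, columns=["aa", "Position", "value"])
    wide = long.pivot(index="aa", columns="Position", values="value")
    # pivot sorts the rows alphabetically; restore the first-seen residue order.
    return wide.reindex(index=list(dict.fromkeys(long["aa"])))

flat_toy = pd.Series([0.7, 0.3, 0.0, 1.0, 0.4, 0.6],
                     index=["-1P", "-1pS", "0P", "0pS", "1P", "1pS"])
recovered = recover_from_flat(flat_toy)
print(recovered.loc["pS", 0])  # 1.0
```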
Or recover from PSPA data, where s, t, y will be converted to pS, pT, and pY:
-5P 0.0720
-5G 0.0245
...
0T 1.0000
0Y 0.0000
Name: AAK1, Length: 213, dtype: float64
Position | -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|---|---|---|---|---|
aa | ||||||||||
P | 0.0720 | 0.0534 | 0.1084 | 0.0226 | 0.1136 | 0.0 | 0.0463 | 0.0527 | 0.0681 | 0.0628 |
G | 0.0245 | 0.0642 | 0.0512 | 0.0283 | 0.0706 | 0.0 | 0.7216 | 0.0749 | 0.0923 | 0.0702 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
pT | 0.0201 | 0.0332 | 0.0303 | 0.0209 | 0.0121 | 1.0 | 0.0123 | 0.0409 | 0.0335 | 0.0251 |
pY | 0.0611 | 0.0339 | 0.0274 | 0.0486 | 0.0178 | 0.0 | 0.0100 | 0.0410 | 0.0359 | 0.0270 |
23 rows × 10 columns
PSPA data are not scaled per position, and the recovered pssm_df also contains copies of the pS, pT, pY values in the S, T, Y rows at position 0.
We therefore need to remove the redundant copies at position 0 (keeping only pS/pT/pY) and rescale each position to sum to 1.
clean_zero_normalize (pssm_df)
Zero out all but the last three values at position 0 (keeping only the s, t, y values at the center), then normalize per position.
This function applies phosphosite-specific cleaning and normalization to a PSSM.
At the center position (\(i = 0\)), only the last three rows of the matrix, corresponding to the phosphorylatable residues s, t, and y, are retained. All other amino acid values at position 0 are set to 0.
After masking, the matrix is column-normalized to ensure the probabilities at each position sum to 1:
\[ P_i(x) = \frac{P_i(x)}{\sum_{x'} P_i(x')} \]
Position | -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|---|---|---|---|---|
aa | ||||||||||
P | 0.058446 | 0.041715 | 0.086100 | 0.017935 | 0.096068 | 0.0 | 0.042649 | 0.040482 | 0.052640 | 0.050260 |
G | 0.019888 | 0.050152 | 0.040667 | 0.022459 | 0.059704 | 0.0 | 0.664702 | 0.057536 | 0.071346 | 0.056182 |
A | 0.023054 | 0.055152 | 0.088880 | 0.042695 | 0.032558 | 0.0 | 0.028740 | 0.057613 | 0.044987 | 0.051701 |
C | 0.037016 | 0.043747 | 0.052025 | 0.046663 | 0.026469 | 0.0 | 0.020542 | 0.052543 | 0.057355 | 0.048259 |
S | 0.034500 | 0.048356 | 0.041859 | 0.044044 | 0.046089 | 0.0 | 0.013172 | 0.042403 | 0.044987 | 0.044818 |
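The masking and column normalization described above can be sketched as follows. This is an assumption-laden illustration rather than the library's `clean_zero_normalize`: it assumes the last three rows are the phospho-residue rows, that the center column is labelled with the integer 0, and the toy values are made up.

```python
import pandas as pd

def clean_center_and_normalize(pssm_df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the last three rows at position 0, then rescale each column to 1."""
    out = pssm_df.astype(float).copy()
    # Mask everything except the last three rows in the center column.
    out.loc[out.index[:-3], 0] = 0.0
    # Column-wise normalization: divide every position by its own total.
    return out / out.sum(axis=0)

toy = pd.DataFrame({-1: [2, 2, 2, 1, 1],
                     0: [1, 1, 3, 1, 0],
                     1: [1, 3, 0, 0, 0]},
                   index=["P", "G", "pS", "pT", "pY"])
cleaned = clean_center_and_normalize(toy)
print(cleaned.loc["pS", 0])  # 0.75
```

In the toy input, the center column holds 1, 1, 3, 1, 0. After masking only 3 and 1 remain, so pS becomes 3/4 and pT becomes 1/4.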
get_pssm_LO (pssm_df, site_type)
Get log odds PSSM: log2 (freq pssm/background pssm).
Parameter | Details |
---|---|
pssm_df | |
site_type | S, T, Y, ST, or STY |
Let \(P_i(x)\) be the frequency of amino acid \(x\) at position \(i\) in the input PSSM, and let \(B_i(x)\) be the background frequency of amino acid \(x\) at the same position, derived from a background model corresponding to the specified site type (S, T, Y, ST, or STY).
The log-odds score at each position \(i\) for amino acid \(x\) is computed as:
\[ \mathrm{LO}_i(x) = \log_2 \left( \frac{P_i(x) + \varepsilon}{B_i(x) + \varepsilon} \right) \]
where \(\varepsilon = 10^{-8}\) is a small constant added for numerical stability and to avoid division by zero.
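This results in a matrix where positive values indicate enrichment relative to the background, negative values indicate depletion, and values near zero indicate no difference from the background: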
Position | -20 | -19 | -18 | -17 | -16 | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
aa | |||||||||||||||||||||||||||||||||||||||||
P | 3.346444 | -22.444049 | -22.483700 | -22.372787 | -22.330740 | -22.389426 | -22.374910 | -22.472588 | -22.395191 | 1.987608 | -22.494239 | 0.445166 | -22.568947 | 1.592353 | 0.481364 | -22.533741 | -22.473084 | 0.559090 | 0.126612 | -22.159002 | 0.0 | -21.219041 | -22.470604 | -22.994381 | -22.573808 | 0.385149 | 2.042397 | -22.444554 | -22.627785 | -22.737480 | -22.549102 | -22.614367 | -22.347328 | -22.450356 | -22.595824 | -22.811667 | -22.536115 | -22.489836 | -22.515329 | -22.349759 | -22.683033 |
G | -22.787957 | -22.672065 | -22.655564 | -22.817534 | -22.708515 | -22.680674 | -22.968775 | -22.718374 | -22.624219 | -22.752463 | -22.755726 | -22.704299 | 1.820192 | -22.601956 | 1.706261 | -22.640859 | 1.574369 | 2.033835 | 2.576232 | -22.485421 | 0.0 | 1.630751 | 1.522260 | -22.302012 | -22.682175 | -22.810294 | 0.400111 | 0.432922 | 1.787866 | -22.691364 | -22.606707 | -22.871736 | 2.968492 | -22.716491 | 0.361120 | 0.294845 | -22.713768 | 0.228233 | -22.828422 | 0.399882 | -22.816948 |
A | -22.752667 | -22.787160 | 2.244457 | -22.502032 | 1.566399 | 0.633076 | 0.209845 | -22.641080 | -22.668390 | -22.755318 | -22.862306 | -22.751646 | 0.302795 | -22.576117 | -22.685387 | -22.680889 | -22.568251 | -22.719210 | -22.719001 | -22.414159 | 0.0 | -22.708936 | 1.140189 | 0.379526 | 0.404134 | -22.548397 | 1.174491 | 1.373178 | -22.526834 | -22.873051 | -22.715444 | -22.607610 | 1.739854 | -22.700917 | 0.117036 | -22.679601 | 3.016049 | 1.792057 | -22.670553 | -22.569643 | 0.388053 |
C | -20.605126 | 2.946638 | -20.610543 | -20.696255 | -20.705566 | -20.614143 | -20.674869 | -20.556372 | -20.395192 | -20.465631 | -20.723175 | -20.683462 | -20.272986 | -20.562907 | -20.150661 | -20.637326 | -20.490327 | -20.198206 | -20.074310 | -20.198206 | 0.0 | -20.724008 | -20.427265 | -20.857569 | -20.533741 | -20.380471 | -20.784367 | -20.540379 | -20.325535 | -20.087319 | -20.474570 | -20.292215 | -20.675731 | -20.827647 | -20.497167 | -20.411835 | -20.843836 | -20.777160 | -20.951627 | -20.735623 | -20.464634 |
S | -22.084726 | -22.150350 | -22.199704 | -22.059532 | -22.210743 | -22.220518 | -22.174019 | 0.711677 | -22.086347 | -21.972111 | -21.907566 | -21.850337 | -22.021406 | -22.147869 | -21.771126 | -21.831131 | -21.924700 | -21.627563 | -21.621315 | -21.091844 | 0.0 | -21.500574 | -21.519414 | -21.615939 | -22.014952 | -22.077246 | -21.879422 | -21.929403 | -21.962611 | -21.994381 | -21.856429 | -22.144138 | -22.079527 | -22.071038 | -22.125341 | -22.129434 | -22.149420 | -22.204488 | -22.361586 | -22.234619 | -22.274408 |
aa
P 0.0
G 0.0
...
pT 0.0
pY 0.0
Name: 0, Length: 23, dtype: float64
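The log-odds formula can be tried out on toy numbers with the sketch below. The `background` table here is invented for illustration, because the library's per-site-type background model is not shown on this page. `log_odds` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

EPS = 1e-8  # same small constant as in the formula above

def log_odds(freq: pd.DataFrame, background: pd.DataFrame, eps: float = EPS) -> pd.DataFrame:
    """Element-wise log2((P + eps) / (B + eps)); both tables must share labels."""
    return np.log2((freq + eps) / (background + eps))

freq = pd.DataFrame({-1: [0.50, 0.00], 0: [0.0, 0.0]}, index=["P", "G"])
background = pd.DataFrame({-1: [0.125, 0.08], 0: [0.0, 0.0]}, index=["P", "G"])  # made up
lo = log_odds(freq, background)
print(lo.round(3))
```

Two effects of the small constant show up in the toy result. A frequency of exactly 0 against a non-zero background gives a large negative score (about -23 for a background of 0.08), and a cell where both tables are 0 gives exactly 0.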
get_pssm_LO_flat (flat_pssm, site_type)
Parameter | Details |
---|---|
flat_pssm | |
site_type | S, T, Y, ST, or STY |
Position | -20 | -19 | -18 | -17 | -16 | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
aa | |||||||||||||||||||||||||||||||||||||||||
P | 0.586568 | 0.394138 | 0.374460 | 0.453737 | 0.289568 | 0.359052 | 0.482577 | 0.248766 | 0.474820 | 0.389707 | 0.109113 | 0.301928 | 0.336733 | 0.501051 | 0.515686 | 0.480461 | 0.515589 | 0.755086 | 1.058803 | 0.375194 | 0.000000 | 2.390053 | 0.400190 | 0.338784 | 0.471693 | 0.492316 | 0.449098 | 0.719735 | 0.363785 | 0.522481 | 0.321545 | 0.304417 | 0.526085 | 0.281430 | 0.236798 | 0.265018 | 0.545229 | 0.560005 | -0.022531 | 0.367827 | 0.484451 |
G | 0.016776 | 0.076372 | -0.002393 | -0.350614 | 0.235123 | -0.081289 | 0.200026 | 0.090254 | 0.242240 | 0.091234 | 0.285729 | 0.325186 | -0.114535 | 0.097026 | 0.230394 | 0.205385 | 0.155193 | 0.172634 | 0.046777 | 0.230465 | 0.000000 | -1.025163 | 0.477781 | -0.167616 | -0.357107 | 0.253900 | 0.024959 | -0.031779 | 0.169683 | -0.014948 | 0.151715 | -0.064406 | 0.045991 | 0.379368 | 0.067291 | 0.072022 | 0.275089 | -0.179173 | 0.127471 | 0.299874 | -0.100668 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
pT | 0.911994 | 0.912470 | 1.056133 | 0.470833 | 0.670078 | 1.311424 | 0.428650 | 1.116331 | 0.677966 | 0.513521 | 0.960501 | 0.625419 | 0.681497 | 0.332761 | 0.822303 | 0.639484 | 0.489218 | 0.812713 | 0.380983 | 0.113433 | 0.517665 | -0.851359 | 0.094853 | -0.047944 | 0.336769 | -0.327007 | 0.592785 | 0.869603 | 0.411031 | 0.426534 | 0.319260 | 0.750463 | 0.527671 | 1.073438 | 0.567107 | 0.517958 | 0.980042 | 0.740658 | 1.009078 | 0.904955 | 0.266741 |
pY | -1.507542 | -0.328457 | 0.480363 | -0.162906 | -0.511604 | 0.115742 | -0.743298 | -0.952493 | -0.347779 | -0.199342 | -0.067135 | 0.148291 | -0.000005 | -0.745499 | -0.322869 | -0.285925 | 0.016887 | -1.040359 | -0.684405 | -1.496734 | -4.500353 | -2.397121 | -1.122700 | -0.237571 | -0.778166 | -0.000325 | -1.465473 | -0.918068 | -0.956973 | -0.367583 | 0.107690 | -0.646100 | 0.325873 | -0.135850 | -0.111424 | -0.763337 | -1.033418 | -1.598209 | 0.046593 | -0.108687 | -0.180172 |
23 rows × 41 columns
get_cluster_pssms (df, cluster_col, seq_col='site_seq', id_col='sub_site', count_thr=10, valid_thr=None, IC_thr=None, plot=False)
Extract motifs from clusters in a dataframe
Parameter | Type | Default | Details |
---|---|---|---|
df | | | |
cluster_col | | | |
seq_col | str | site_seq | |
id_col | str | sub_site | |
count_thr | int | 10 | clusters whose count is below this threshold are excluded from the result |
valid_thr | NoneType | None | percentage of non-NaN values in the PSSM |
IC_thr | NoneType | None | |
plot | bool | False | |
-20P | -20G | -20A | -20C | -20S | -20T | -20V | -20I | -20L | -20M | -20F | -20Y | -20W | -20H | -20K | -20R | -20Q | -20N | -20D | -20E | -20pS | -20pT | -20pY | -19P | -19G | -19A | -19C | -19S | -19T | -19V | -19I | -19L | -19M | -19F | -19Y | -19W | -19H | -19K | -19R | -19Q | -19N | -19D | -19E | -19pS | -19pT | -19pY | -18P | -18G | -18A | -18C | ... | 18E | 18pS | 18pT | 18pY | 19P | 19G | 19A | 19C | 19S | 19T | 19V | 19I | 19L | 19M | 19F | 19Y | 19W | 19H | 19K | 19R | 19Q | 19N | 19D | 19E | 19pS | 19pT | 19pY | 20P | 20G | 20A | 20C | 20S | 20T | 20V | 20I | 20L | 20M | 20F | 20Y | 20W | 20H | 20K | 20R | 20Q | 20N | 20D | 20E | 20pS | 20pT | 20pY | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Src | 0.054423 | 0.074083 | 0.068329 | 0.013906 | 0.041237 | 0.036922 | 0.056821 | 0.051786 | 0.081036 | 0.022776 | 0.033805 | 0.020858 | 0.007193 | 0.021338 | 0.075282 | 0.056821 | 0.049628 | 0.037881 | 0.060897 | 0.076960 | 0.025414 | 0.014625 | 0.017981 | 0.054041 | 0.071497 | 0.079388 | 0.016738 | 0.040411 | 0.037064 | 0.057628 | 0.052128 | 0.080344 | 0.022238 | 0.031803 | 0.017456 | 0.009565 | 0.020803 | 0.075801 | 0.061932 | 0.045911 | 0.039216 | 0.060976 | 0.073888 | 0.025586 | 0.011478 | 0.014108 | 0.052418 | 0.071718 | 0.064332 | 0.014772 | ... | 0.076174 | 0.029720 | 0.013237 | 0.01024 | 0.056462 | 0.069009 | 0.065997 | 0.016311 | 0.045420 | 0.040402 | 0.055207 | 0.049937 | 0.085571 | 0.020577 | 0.041405 | 0.015307 | 0.009536 | 0.016562 | 0.068005 | 0.067754 | 0.046173 | 0.036637 | 0.058720 | 0.077290 | 0.030615 | 0.011794 | 0.015307 | 0.059874 | 0.082767 | 0.059874 | 0.014591 | 0.043019 | 0.036478 | 0.048050 | 0.052075 | 0.073459 | 0.026164 | 0.041761 | 0.018616 | 0.009057 | 0.015849 | 0.077987 | 0.061384 | 0.039245 | 0.038239 | 0.055849 | 0.086792 | 0.025660 | 0.014340 | 0.018868 |
STE20 | 0.049319 | 0.085559 | 0.079019 | 0.013079 | 0.035422 | 0.032698 | 0.069755 | 0.049046 | 0.072752 | 0.023433 | 0.031063 | 0.018256 | 0.010354 | 0.019346 | 0.075749 | 0.050136 | 0.041144 | 0.036512 | 0.062670 | 0.077112 | 0.031608 | 0.024523 | 0.011444 | 0.053834 | 0.076400 | 0.075041 | 0.010604 | 0.034530 | 0.039695 | 0.057368 | 0.048396 | 0.083741 | 0.026101 | 0.027732 | 0.016585 | 0.009244 | 0.018760 | 0.083197 | 0.055193 | 0.043230 | 0.036161 | 0.051115 | 0.080479 | 0.038064 | 0.022295 | 0.012235 | 0.048860 | 0.064875 | 0.076276 | 0.014115 | ... | 0.082520 | 0.032897 | 0.023418 | 0.01143 | 0.051454 | 0.072427 | 0.074105 | 0.014262 | 0.036633 | 0.028803 | 0.053691 | 0.050336 | 0.081376 | 0.017617 | 0.032438 | 0.015380 | 0.015101 | 0.021532 | 0.085850 | 0.063199 | 0.038591 | 0.043904 | 0.051174 | 0.089206 | 0.034396 | 0.019295 | 0.009228 | 0.050548 | 0.068801 | 0.079753 | 0.016849 | 0.036787 | 0.029486 | 0.065993 | 0.048301 | 0.071609 | 0.022466 | 0.037630 | 0.015445 | 0.006459 | 0.025274 | 0.085650 | 0.052233 | 0.037349 | 0.043808 | 0.055883 | 0.075821 | 0.041000 | 0.019657 | 0.013199 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
FAM20C | 0.000000 | 0.153846 | 0.000000 | 0.000000 | 0.076923 | 0.000000 | 0.000000 | 0.000000 | 0.153846 | 0.000000 | 0.000000 | 0.076923 | 0.000000 | 0.153846 | 0.000000 | 0.076923 | 0.000000 | 0.076923 | 0.000000 | 0.076923 | 0.076923 | 0.076923 | 0.000000 | 0.000000 | 0.000000 | 0.307692 | 0.000000 | 0.000000 | 0.076923 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.230769 | 0.076923 | 0.076923 | 0.076923 | 0.153846 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.153846 | 0.076923 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.00000 | 0.083333 | 0.000000 | 0.250000 | 0.000000 | 0.000000 | 0.083333 | 0.000000 | 0.000000 | 0.000000 | 0.083333 | 0.000000 | 0.000000 | 0.000000 | 0.083333 | 0.000000 | 0.083333 | 0.000000 | 0.166667 | 0.166667 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.083333 | 0.000000 | 0.083333 | 0.083333 | 0.000000 | 0.083333 | 0.083333 | 0.000000 | 0.083333 | 0.000000 | 0.166667 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.083333 | 0.083333 | 0.000000 | 0.083333 | 0.083333 | 0.000000 | 0.000000 |
KIS | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.111111 | 0.000000 | 0.000000 | 0.222222 | 0.000000 | 0.111111 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.222222 | 0.000000 | 0.222222 | 0.000000 | 0.111111 | 0.000000 | 0.000000 | 0.111111 | 0.000000 | 0.222222 | 0.000000 | 0.000000 | 0.000000 | 0.111111 | 0.000000 | 0.111111 | 0.000000 | 0.111111 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.111111 | 0.000000 | 0.111111 | 0.000000 | 0.000000 | 0.111111 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.111111 | 0.000000 | ... | 0.222222 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.222222 | 0.111111 | 0.000000 | 0.111111 | 0.000000 | 0.000000 | 0.222222 | 0.000000 | 0.111111 | 0.000000 | 0.000000 | 0.111111 | 0.000000 | 0.000000 | 0.111111 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.111111 | 0.000000 | 0.111111 | 0.000000 | 0.111111 | 0.000000 | 0.111111 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.333333 | 0.111111 | 0.000000 | 0.000000 | 0.111111 | 0.000000 | 0.000000 | 0.000000 |
97 rows × 943 columns
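The general idea of one flattened PSSM per cluster can be sketched with a `groupby` loop. This is only an approximation under stated assumptions: `cluster_flat_pssms` is a hypothetical name, the count is taken as the number of rows per cluster, and the `id_col`, `valid_thr`, `IC_thr`, and `plot` options are not reproduced.

```python
import pandas as pd

def cluster_flat_pssms(df, cluster_col, seq_col="site_seq", count_thr=2):
    """Return one flattened probability row per cluster that has enough sequences."""
    rows = {}
    for cluster, group in df.groupby(cluster_col):
        if len(group) < count_thr:
            continue  # too few sequences: leave this cluster out
        chars = pd.DataFrame([list(s) for s in group[seq_col]])
        half = chars.shape[1] // 2
        chars.columns = range(-half, chars.shape[1] - half)
        # Per-position counts, then per-position probabilities.
        counts = chars.apply(lambda col: col.value_counts()).fillna(0)
        probs = counts / counts.sum(axis=0)
        # Flatten column-major into labels such as '-1A' or '0s'.
        flat = probs.unstack()
        flat.index = [f"{pos}{aa}" for pos, aa in flat.index]
        rows[cluster] = flat
    return pd.DataFrame(rows).T

toy = pd.DataFrame({"cluster": ["A", "A", "A", "B"],
                    "site_seq": ["AsP", "GsP", "AsK", "RtP"]})  # made-up data
result = cluster_flat_pssms(toy, "cluster")
print(list(result.index))  # ['A']
```

Cluster B has a single sequence, which is below the threshold of 2, so only cluster A appears in the result.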
get_entropy (pssm_df, return_min=False, exclude_zero=False, clean_zero=True)
Calculate the entropy at each position of a PSSM surrounding position 0. The lower the entropy, the more information the position contains.
Parameter | Type | Default | Details |
---|---|---|---|
pssm_df | | | a DataFrame of a PSSM with amino acids as the index and positions as the columns |
return_min | bool | False | if True, return the minimum entropy as a single value; otherwise return all entropies as a pd.Series |
exclude_zero | bool | False | exclude the column of 0 (center position) from the entropy calculation |
clean_zero | bool | True | if True, zero out all but the last three values at position 0 (keep only the s, t, y values at the center) |
Let \(P_i(x)\) be the probability of amino acid \(x\) at position \(i\) in the PSSM, with \(i \in \{-k, \dots, -1, 0, +1, \dots, +k\}\). The entropy at each position \(i\) is defined as:
\[ H_i = - \sum_{x} P_i(x) \log_2 \left( P_i(x) + \varepsilon \right) \]
where \(\varepsilon = 10^{-8}\) is a small constant added for numerical stability.
If exclude_zero=True, the central position \(i = 0\) is omitted from the entropy calculation.
If clean_zero=True, all values at position \(i = 0\) are zeroed out except those of the phospho-acceptor residues serine, threonine, and tyrosine (the s, t, y rows), typically the only possible phospho-acceptors in kinase motif analysis.
If return_min=True, the function returns the minimum entropy across all positions:
\[ H_{\text{spec}} = \min_i H_i \]
Otherwise, the function returns the full vector \(\{H_i\}\) for each position \(i\), reflecting how much information (or uncertainty) is contained at each position in the motif.
Position
0 0.987416
1 1.740698
...
-18 4.284598
14 4.285491
Length: 41, dtype: float64
# calculate the minimum entropy of the surrounding positions
get_entropy(pssm_df,return_min=True,exclude_zero=True)
1.7406981100302623
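The entropy definition above can be checked on a small matrix with the sketch below. It only covers the core formula. `positional_entropy` is a hypothetical helper, and the `clean_zero` handling of the library function is left out.

```python
import numpy as np
import pandas as pd

def positional_entropy(pssm_df: pd.DataFrame, eps: float = 1e-8) -> pd.Series:
    """H_i = -sum_x P_i(x) * log2(P_i(x) + eps), computed per column."""
    p = pssm_df.astype(float)
    return -(p * np.log2(p + eps)).sum(axis=0)

toy = pd.DataFrame({-1: [0.25, 0.25, 0.25, 0.25],   # uniform: maximal uncertainty
                     0: [0.00, 0.50, 0.50, 0.00],
                     1: [1.00, 0.00, 0.00, 0.00]},  # one residue only: no uncertainty
                   index=["P", "pS", "pT", "pY"])
entropy = positional_entropy(toy)
min_surrounding = entropy.drop(0).min()  # analogous to return_min with exclude_zero
print(entropy.round(3))
```

A uniform column over four residues gives about 2 bits, a column with a single residue gives about 0 bits, and the small constant only shifts these values by a negligible amount.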
get_entropy_flat (flat_pssm:pandas.core.series.Series, return_min=False, exclude_zero=False, clean_zero=True)
Calculate entropy per position of a flat PSSM surrounding 0
Parameter | Type | Default | Details |
---|---|---|---|
flat_pssm | Series | | |
return_min | bool | False | if True, return the minimum entropy as a single value; otherwise return all entropies as a pd.Series |
exclude_zero | bool | False | exclude the column of 0 (center position) from the entropy calculation |
clean_zero | bool | True | if True, zero out all but the last three values at position 0 (keep only the s, t, y values at the center) |
Position
0 0.987416
1 1.740698
...
-18 4.284598
14 4.285491
Length: 41, dtype: float64
get_IC (pssm_df, return_min=False, exclude_zero=False, clean_zero=True)
Calculate the information content (in bits) from a frequency matrix, using log2(3) as the maximum entropy for the middle position and log2(len(pssm_df)) for the others. The higher the value, the more information the position contains.
Parameter | Type | Default | Details |
---|---|---|---|
pssm_df | | | a DataFrame of a PSSM with amino acids as the index and positions as the columns |
return_min | bool | False | return min entropy as a single value or return all entropy as a pd.Series |
exclude_zero | bool | False | exclude the column of 0 (center position) from the entropy calculation |
clean_zero | bool | True | if True, zero out all but the last three values at position 0 (keep only the s, t, y values at the center) |
Let \(P_i(x)\) be the frequency (probability) of amino acid \(x\) at position \(i\) in the PSSM. The standard information content (IC) at position \(i\) is defined as:
\[ \mathrm{IC}_i = \max H_i - H_i \]
which is:
\[ \mathrm{IC}_i = \log_2(N) - H_i \]
where \(N\) is the number of possible amino acids (i.e., \(N = \text{len}(P_i)\)).
At the center position (\(i = 0\)), only three amino acids (S, T, Y) are relevant, so the maximum entropy at each position is defined as:
\[ \max H_i = \begin{cases} \log_2(3) & \text{if } i = 0 \\ \log_2(N) & \text{otherwise} \end{cases} \]
Position
14 0.238071
-18 0.238964
...
3 0.575586
1 2.782864
Length: 40, dtype: float64
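The information content definition above can be sketched on a toy matrix as follows. The function name `information_content` and its `center` and `n_center` parameters are hypothetical, and the toy values are made up; the library's own function may differ in details such as how the center column is cleaned.

```python
import numpy as np
import pandas as pd

def information_content(pssm_df: pd.DataFrame, center=0, n_center=3) -> pd.Series:
    """IC_i = max H_i - H_i, with max H_i = log2(3) at the center, log2(N) elsewhere."""
    p = pssm_df.to_numpy(dtype=float)
    logp = np.zeros_like(p)
    np.log2(p, out=logp, where=p > 0)      # 0 * log2(0) is treated as 0
    entropy = -(p * logp).sum(axis=0)
    # Maximum entropy: log2(N) for every position, then log2(3) at the center.
    max_entropy = np.full(p.shape[1], np.log2(len(pssm_df)))
    max_entropy[pssm_df.columns == center] = np.log2(n_center)
    return pd.Series(max_entropy - entropy, index=pssm_df.columns)

# Toy matrix with N = 4 rows (hypothetical values).
toy = pd.DataFrame(
    {-1: [0.25, 0.25, 0.25, 0.25], 0: [0.0, 1 / 3, 1 / 3, 1 / 3], 1: [1.0, 0.0, 0.0, 0.0]},
    index=["A", "pS", "pT", "pY"],
)
ic = information_content(toy)
```

A uniform column carries no information, a column with a single residue reaches the maximum of log2(N) bits, and a center column split evenly over the three acceptors scores zero against its log2(3) ceiling.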
Check all zero cases:
Position
-20 0.000000
1 1.740698
...
-18 4.284598
14 4.285491
Length: 40, dtype: float64
get_IC_flat (flat_pssm:pandas.core.series.Series, return_min=False, exclude_zero=False, clean_zero=True)
Calculate the information content (bits) from a flattened pssm pd.Series, using log2(3) for the middle position and log2(len(pssm_df)) for others.
| | Type | Default | Details |
|---|---|---|---|
| flat_pssm | Series | | |
| return_min | bool | False | return min entropy as a single value or return all entropy as a pd.Series |
| exclude_zero | bool | False | exclude the column of 0 (center position) in the entropy calculation |
| clean_zero | bool | True | if true, zero out all but the last three values at position 0 (keep only the s, t, y values at the center) |
Position
14 0.238071
-18 0.238964
...
3 0.575586
1 2.782864
Length: 40, dtype: float64
get_specificity (pssm_df)
Get specificity score of a pssm, excluding zero position.
We evaluated the overall specificity of a PSSM by combining two metrics: the maximum IC across surrounding positions and the variance of IC values:
\[ \text{Specificity Score} = 2 \times \max(\text{IC}) + \mathrm{Var}(\text{IC}) \]
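A minimal sketch of the score above, given a Series of IC values indexed by position. The name `specificity_score` is hypothetical, and the use of the sample variance (pandas' default, `ddof=1`) is an assumption, since the text does not say which variance estimator is used.

```python
import pandas as pd

def specificity_score(ic: pd.Series, center=0) -> float:
    """2 * max(IC) + Var(IC), computed over positions other than the center."""
    surrounding = ic.drop(center, errors="ignore")
    # Assumption: sample variance (ddof=1), which is the pandas default.
    return 2 * surrounding.max() + surrounding.var()

# Hypothetical IC values; the center value is ignored.
ic_values = pd.Series({-1: 0.5, 0: 1.2, 1: 2.5, 2: 0.5})
score = specificity_score(ic_values)
```

The max term rewards a single highly constrained position, while the variance term rewards motifs whose information is concentrated at a few positions instead of being spread evenly.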
get_specificity_flat (flat_pssm)
Get specificity score of a pssm, excluding zero position.
plot_heatmap_simple (matrix, title:str='heatmap', figsize:tuple=(6, 7), cmap:str='binary', vmin=None, vmax=None, center=None, robust=False, annot=None, fmt='.2g', annot_kws=None, linewidths=0, linecolor='white', cbar=True, cbar_kws=None, cbar_ax=None, square=False, xticklabels='auto', yticklabels='auto', mask=None, ax=None)
Plot heatmap based on a matrix of values
| | Type | Default | Details |
|---|---|---|---|
| matrix | | | a matrix of values |
| title | str | heatmap | title of the heatmap |
| figsize | tuple | (6, 7) | (width, height) |
| cmap | str | binary | color map; the default is black and white |
| vmin | NoneType | None | |
| vmax | NoneType | None | |
| center | NoneType | None | The value at which to center the colormap when plotting divergent data. Using this parameter will change the default cmap if none is specified. |
| robust | bool | False | If True and vmin or vmax are absent, the colormap range is computed with robust quantiles instead of the extreme values. |
| annot | NoneType | None | If True, write the data value in each cell. If an array-like with the same shape as data, then use this to annotate the heatmap instead of the data. Note that DataFrames will match on position, not index. |
| fmt | str | .2g | String formatting code to use when adding annotations. |
| annot_kws | NoneType | None | Keyword arguments for matplotlib.axes.Axes.text when annot is True. |
| linewidths | int | 0 | Width of the lines that will divide each cell. |
| linecolor | str | white | Color of the lines that will divide each cell. |
| cbar | bool | True | Whether to draw a colorbar. |
| cbar_kws | NoneType | None | Keyword arguments for matplotlib.figure.Figure.colorbar. |
| cbar_ax | NoneType | None | Axes in which to draw the colorbar, otherwise take space from the main Axes. |
| square | bool | False | If True, set the Axes aspect to “equal” so each cell will be square-shaped. |
| xticklabels | str | auto | |
| yticklabels | str | auto | |
| mask | NoneType | None | If passed, data will not be shown in cells where mask is True. Cells with missing values are automatically masked. |
| ax | NoneType | None | Axes in which to draw the plot, otherwise use the currently-active Axes. |
| Returns | matplotlib Axes | | Axes object with the heatmap. |
plot_heatmap (heatmap_df, ax=None, position_label=True, figsize=(5, 6), include_zero=True, scale_pos_neg=False, colorbar_title='Prob.')
Plots a heatmap with specific formatting.
This function visualizes a PSSM or log-odds matrix as a heatmap with diverging color scales centered at 0.
Color scale behavior:
By default (scale_pos_neg=False), the colormap is centered at 0, but the full data range determines the color intensity:
\[ \text{color range} = [\min(\text{data}), \max(\text{data})], \quad \text{with center at } 0 \]
This is useful when you want to emphasize whether values are above or below zero, but without enforcing symmetry.
If scale_pos_neg=True, the function uses a balanced diverging scale via TwoSlopeNorm, such that:
\[ \text{min color} = \min(\text{data}), \quad \text{center} = 0, \quad \text{max color} = \max(\text{data}) \]
The positive and negative ranges are scaled separately, ensuring that both ends of the heatmap have equal visual weight, which is especially helpful for symmetric data like log-odds matrices.
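To make the idea of scaling the two sides separately concrete, here is a small hand-written sketch of a two-slope mapping: the minimum goes to 0, the center to 0.5, and the maximum to 1, with a separate linear slope on each side. The function `two_slope_normalize` is an illustration written for this page and is not the plotting function's code; matplotlib's `TwoSlopeNorm` performs a mapping of this kind.

```python
import numpy as np

def two_slope_normalize(values, vmin, vcenter, vmax):
    """Map vmin -> 0, vcenter -> 0.5, vmax -> 1 with a separate slope on each side."""
    values = np.asarray(values, dtype=float)
    # Piecewise-linear interpolation through the three anchor points.
    return np.interp(values, [vmin, vcenter, vmax], [0.0, 0.5, 1.0])

# Asymmetric toy data: the negative side spans 1 unit, the positive side 4 units.
data = np.array([-1.0, -0.5, 0.0, 2.0, 4.0])
normed = two_slope_normalize(data, data.min(), 0.0, data.max())
```

Even though the positive range is four times wider than the negative one, both ends reach the extremes of the colormap, so the most negative and most positive cells are drawn with equally saturated colors.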
Additional visual features:
- The center position (\(i = 0\)) can be masked out using include_zero=False.
change_center_name (df)
Rename the center-position pS, pT, pY to S, T, Y for plotting.
Now instead of pS, pT, and pY, the center name becomes S, T and Y:
aa
S 0.664786
T 0.324738
Y 0.010475
A 0.000000
G 0.000000
Name: 0, dtype: float64
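One way such a renaming could be done is to move the center-column values from the pS, pT, pY rows to the S, T, Y rows. The sketch below assumes the matrix contains both sets of rows and an integer column label 0 for the center; the helper name `rename_center` and the toy values are hypothetical, and the library may implement this differently.

```python
import pandas as pd

def rename_center(pssm_df: pd.DataFrame, center=0) -> pd.DataFrame:
    """Move center-position values from pS/pT/pY rows to S/T/Y rows."""
    out = pssm_df.astype(float).copy()
    for phospho, plain in {"pS": "S", "pT": "T", "pY": "Y"}.items():
        out.loc[plain, center] = out.loc[phospho, center]
        out.loc[phospho, center] = 0.0   # the phospho row no longer draws a letter
    return out

# Toy matrix (hypothetical values): only the center column is affected.
toy = pd.DataFrame(
    {-1: [0.2, 0.1, 0.1, 0.3, 0.2, 0.1], 0: [0.0, 0.0, 0.0, 0.6, 0.3, 0.1]},
    index=["S", "T", "Y", "pS", "pT", "pY"],
)
renamed = rename_center(toy)
```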
get_pos_min_max (pssm_df)
Get min and max value of sum of positive and negative values across each position.
scale_zero_position (pssm_df)
Scale position 0 so that:
- Positive values match the max positive column sum of other positions
- Negative values match the min (most negative) column sum of other positions
This function rescales position 0 in a log-odds PSSM so that its total positive and negative stack heights match those of the most extreme positions on either side.
This ensures the central position visually matches the dynamic range of surrounding positions in log-odds logo plots.
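The rescaling described above can be sketched as follows on a toy log-odds matrix. The helper names `pos_neg_sums` and `scale_center` are hypothetical and the values are made up; this is an illustration of the stated rule, not the library's code.

```python
import pandas as pd

def pos_neg_sums(df: pd.DataFrame):
    """Per-column sum of positive values and of negative values."""
    return df.clip(lower=0).sum(axis=0), df.clip(upper=0).sum(axis=0)

def scale_center(lo_df: pd.DataFrame, center=0) -> pd.DataFrame:
    out = lo_df.astype(float).copy()
    pos, neg = pos_neg_sums(out.drop(columns=center))
    target_pos, target_neg = pos.max(), neg.min()   # extremes of the other positions
    col = out[center]
    pos_sum, neg_sum = col[col > 0].sum(), col[col < 0].sum()
    scaled = col.copy()
    if pos_sum > 0:   # stretch the positive stack to the tallest neighbor stack
        scaled[col > 0] = col[col > 0] * target_pos / pos_sum
    if neg_sum < 0:   # stretch the negative stack to the deepest neighbor stack
        scaled[col < 0] = col[col < 0] * target_neg / neg_sum
    out[center] = scaled
    return out

toy = pd.DataFrame(
    {-1: [1.0, -2.0, 0.5], 0: [-0.5, -0.5, 4.0], 1: [3.0, -0.5, -0.5]},
    index=["A", "G", "pS"],
)
scaled_center = scale_center(toy)
```

Here the tallest positive stack among the neighbors is 3.0 and the deepest negative stack is -2.0, so the center column is rescaled to those two totals while the other columns are left untouched.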
scale_pos_neg_values (pssm_df)
Globally scale all positive values by max positive column sum, and negative values by min negative column sum (preserving sign).
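A minimal sketch of this global scaling, assuming a log-odds matrix with amino acids as rows and positions as columns. The name `scale_pos_neg_sketch` and the toy values are hypothetical; the library's function may handle edge cases differently.

```python
import pandas as pd

def scale_pos_neg_sketch(lo_df: pd.DataFrame) -> pd.DataFrame:
    df = lo_df.astype(float)
    pos = df.clip(lower=0)               # positive part, zeros elsewhere
    neg = df.clip(upper=0)               # negative part, zeros elsewhere
    max_pos = pos.sum(axis=0).max()      # tallest positive stack
    min_neg = neg.sum(axis=0).min()      # deepest negative stack
    scaled_pos = pos / max_pos if max_pos > 0 else pos
    scaled_neg = neg / abs(min_neg) if min_neg < 0 else neg   # sign is preserved
    return scaled_pos + scaled_neg

toy_lo = pd.DataFrame(
    {-1: [1.0, -2.0, 0.5], 0: [-0.5, -0.5, 4.0], 1: [3.0, -0.5, -0.5]},
    index=["A", "G", "pS"],
)
scaled_lo = scale_pos_neg_sketch(toy_lo)
```

After scaling, the tallest positive stack sums to 1 and the deepest negative stack sums to -1, so both directions occupy the same vertical range in a logo.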
convert_logo_df (pssm_df, scale_zero=True, scale_pos_neg=False)
Change the center names from pS, pT, pY to S, T, Y in a PSSM and scale the zero position to the max of its neighbors.
plot_logo_raw (pssm_df, ax=None, title='Motif', ytitle='Bits', figsize=(10, 2))
Plot logo motif using Logomaker.
get_logo_IC (pssm_df)
For plotting purpose, calculate the scaled information content (bits) from a frequency matrix, using log2(3) for the middle position and log2(len(pssm_df)) for others.
To visualize the motif using Logomaker, the scaled PSSM is computed by weighting each amino acid’s frequency at position \(i\) by the position’s information content:
\[ \text{PSSM\_scaled}_i(x) = P_i(x) \cdot \mathrm{IC}_i \]
This results in a matrix where the total stack height at each position equals the information content, and each letter’s height is proportional to its contribution. This is the standard format used by Logomaker to generate sequence logos.
Position | -20 | -19 | -18 | -17 | -16 | -15 | -14 | -13 | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
aa | |||||||||||||||||||||||||||||||||||||||||
P | 0.028979 | 0.021736 | 0.019696 | 0.026208 | 0.020191 | 0.020036 | 0.026940 | 0.020702 | 0.024652 | 0.022535 | 0.017114 | 0.021499 | 0.023403 | 0.026208 | 0.031930 | 0.024593 | 0.032412 | 0.039787 | 0.068936 | 0.029358 | 0.000000 | 2.109590 | 0.033707 | 0.051101 | 0.028948 | 0.037501 | 0.028892 | 0.036076 | 0.027630 | 0.028354 | 0.025461 | 0.025050 | 0.022227 | 0.020044 | 0.018602 | 0.024891 | 0.026862 | 0.027559 | 0.015651 | 0.022606 | 0.025402 |
G | 0.019969 | 0.019102 | 0.016480 | 0.015774 | 0.020616 | 0.015743 | 0.025669 | 0.019380 | 0.021765 | 0.020527 | 0.020852 | 0.023912 | 0.016652 | 0.021528 | 0.027406 | 0.021633 | 0.028396 | 0.031460 | 0.028898 | 0.034539 | 0.000000 | 0.087526 | 0.036227 | 0.036235 | 0.017093 | 0.030001 | 0.021343 | 0.022138 | 0.024241 | 0.019139 | 0.021217 | 0.019028 | 0.016569 | 0.022389 | 0.016426 | 0.020782 | 0.023620 | 0.017255 | 0.018122 | 0.023077 | 0.017237 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
pT | 0.008280 | 0.007245 | 0.007637 | 0.005824 | 0.005313 | 0.009405 | 0.005845 | 0.008369 | 0.006219 | 0.006470 | 0.007869 | 0.007459 | 0.009226 | 0.007956 | 0.012506 | 0.010703 | 0.014055 | 0.018197 | 0.019497 | 0.011225 | 0.194046 | 0.044885 | 0.014806 | 0.014866 | 0.013233 | 0.007800 | 0.011453 | 0.012025 | 0.008602 | 0.007561 | 0.006837 | 0.007708 | 0.005456 | 0.008529 | 0.005541 | 0.006525 | 0.007873 | 0.006950 | 0.007002 | 0.007771 | 0.004990 |
pY | 0.000974 | 0.001537 | 0.002814 | 0.002184 | 0.001275 | 0.002658 | 0.001525 | 0.001101 | 0.001777 | 0.002677 | 0.002361 | 0.003071 | 0.002700 | 0.001872 | 0.003459 | 0.003416 | 0.005737 | 0.003084 | 0.005222 | 0.002878 | 0.006260 | 0.011221 | 0.003780 | 0.008827 | 0.003308 | 0.005400 | 0.001562 | 0.002186 | 0.001825 | 0.002127 | 0.003065 | 0.002168 | 0.002627 | 0.001919 | 0.001979 | 0.001450 | 0.001158 | 0.000719 | 0.002059 | 0.002119 | 0.002268 |
23 rows × 41 columns
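The scaling \(P_i(x) \cdot \mathrm{IC}_i\) described above can be sketched on a toy frequency matrix. The function name `scale_by_ic` and the toy values are assumptions for illustration; the maximum entropy uses log2(3) at the center and log2(N) elsewhere, as stated in the text.

```python
import numpy as np
import pandas as pd

def scale_by_ic(pssm_df: pd.DataFrame, center=0) -> pd.DataFrame:
    """Multiply each column of probabilities by that position's information content."""
    p = pssm_df.to_numpy(dtype=float)
    logp = np.zeros_like(p)
    np.log2(p, out=logp, where=p > 0)            # 0 * log2(0) is treated as 0
    entropy = -(p * logp).sum(axis=0)
    max_entropy = np.full(p.shape[1], np.log2(len(pssm_df)))
    max_entropy[pssm_df.columns == center] = np.log2(3)
    ic = pd.Series(max_entropy - entropy, index=pssm_df.columns)
    # Align IC values with the column labels and scale every row.
    return pssm_df.mul(ic, axis="columns")

toy = pd.DataFrame(
    {-1: [0.5, 0.5, 0.0, 0.0], 0: [0.0, 0.5, 0.5, 0.0], 1: [1.0, 0.0, 0.0, 0.0]},
    index=["A", "pS", "pT", "pY"],
)
logo_df = scale_by_ic(toy)
stack_heights = logo_df.sum(axis=0)
```

Because each column of probabilities sums to 1, the stack height at every position equals that position's information content.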
plot_logo (pssm_df, title='Motif', scale_zero=True, ax=None, figsize=(10, 1))
Plot logo of information content given a frequency PSSM.
Keeping scale_zero at its default of True makes the flanking amino acids easier to see.
plot_logo_LO (pssm_LO, title='Motif', acceptor=None, scale_zero=True, scale_pos_neg=True, ax=None, figsize=(10, 1))
Plot logo of log-odds given a frequency PSSM.
To ensure the phosphorylated residue is visible at the center of a log-odds motif (position 0), two mechanisms are used:
- Acceptor override: If the center column is entirely zero (e.g., masked), the user can specify an acceptor ('S', 'T', 'Y', or 'STY'). The function then assigns a small nonzero value (e.g., 0.1) to the corresponding phospho-residue row (pS, pT, pY) at position 0. This ensures the central letter appears in the logo plot, even when real log-odds values are absent.
- Stack height rescaling: To maintain visual consistency with surrounding columns, position 0 is rescaled so that its total positive and negative stack heights match the most extreme values observed elsewhere.

Together, these adjustments ensure that:
- The phospho-acceptor appears explicitly at the center,
- The visual scale remains consistent with neighboring positions,
- The resulting logo can faithfully reflect both biological relevance and statistical signal.
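The acceptor override can be sketched as below. The helper name `apply_acceptor`, its `value` parameter, and the toy matrix are hypothetical; the value 0.1 follows the example given in the text, and the real function may differ.

```python
import pandas as pd

def apply_acceptor(lo_df: pd.DataFrame, acceptor: str, center=0, value=0.1) -> pd.DataFrame:
    """If the center column is all zero, give the chosen phospho-residue rows a small value."""
    out = lo_df.astype(float).copy()
    if (out[center] == 0).all():
        for residue in acceptor:                 # 'STY' -> 'S', 'T', 'Y'
            out.loc[f"p{residue}", center] = value
    return out

# Toy log-odds matrix with a masked (all-zero) center column.
toy = pd.DataFrame(
    {-1: [0.4, -0.2, 0.1, 0.0], 0: [0.0, 0.0, 0.0, 0.0], 1: [-0.3, 0.2, 0.0, 0.1]},
    index=["A", "pS", "pT", "pY"],
)
with_sty = apply_acceptor(toy, "STY")
with_s = apply_acceptor(toy, "S")
```

When the center column already contains nonzero values, this sketch leaves it unchanged, so the override only applies to the masked case described above.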
plot_logos_idx (pssms_df, *idxs)
Plot logos of a dataframe with flattened PSSMs with index as IDs.
plot_logos (pssms_df, count_dict=None, path=None, prefix='Motif')
Plot all logos from a dataframe of flattened PSSMs as subplots in a single figure.
| | Type | Default | Details |
|---|---|---|---|
| pssms_df | | | |
| count_dict | NoneType | None | used to display n in motif title |
| path | NoneType | None | |
| prefix | str | Motif | |
plot_logo_heatmap (pssm_df, title='Motif', figsize=(17, 10), include_zero=False)
Plot logo and heatmap vertically
| | Type | Default | Details |
|---|---|---|---|
| pssm_df | | | column is position, index is aa |
| title | str | Motif | |
| figsize | tuple | (17, 10) | |
| include_zero | bool | False | |
plot_logo_heatmap_LO (pssm_LO, title='Motif', acceptor=None, figsize=(17, 10), include_zero=False, scale_pos_neg=True)
Plot logo and heatmap of enrichment bits vertically
| | Type | Default | Details |
|---|---|---|---|
| pssm_LO | | | pssm of log-odds |
| title | str | Motif | |
| acceptor | NoneType | None | |
| figsize | tuple | (17, 10) | |
| include_zero | bool | False | |
| scale_pos_neg | bool | True | |
raw2norm (df:pandas.core.frame.DataFrame, PDHK:bool=False)
Normalize single ST kinase data
| | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | single kinase’s df has position as index, and single amino acid as columns |
| PDHK | bool | False | whether this kinase belongs to PDHK family |
This function implements the normalization method from Johnson et al. Nature: An atlas of substrate specificities for the human serine/threonine kinome
Specifically,
> - matrices were column-normalized at all positions by the sum of the 17 randomized amino acids (excluding serine, threonine and cysteine), to yield PSSMs.
> - PDHK1 and PDHK4 were normalized to the 16 randomized amino acids (excluding serine, threonine, cysteine and additionally tyrosine)
> - The cysteine row was scaled by its median to be 1/17 (1/16 for PDHK1 and PDHK4).
> - The serine and threonine values in each position were set to be the median of that position.
> - The S0/T0 ratio was determined by summing the values of S and T rows in the matrix (SS and ST, respectively), accounting for the different S vs. T composition of the central (1:1) and peripheral (only S or only T) positions (Sctrl and Tctrl, respectively), and then normalizing to the higher value among the two (S0 and T0, respectively, Supplementary Note 1)
This function is usually called through the function below, with normalize as a bool argument.
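The first steps of the quoted procedure (normalizing each position by the sum of the randomized amino acids, rescaling cysteine by its median, and setting S and T to a per-position median) can be sketched on a toy table. This is a loose illustration under stated assumptions, not the function's implementation: the name `normalize_sketch` and its parameters are hypothetical, the toy alphabet has three randomized residues instead of 17, the per-position median is assumed to be taken over the randomized residues, and the S0/T0 step is omitted.

```python
import pandas as pd

def normalize_sketch(df: pd.DataFrame, randomized: list, n_random: int) -> pd.DataFrame:
    """df has positions as rows and amino acids as columns (raw signal)."""
    # Divide every position by the sum over the randomized amino acids.
    out = df.astype(float).div(df[randomized].sum(axis=1), axis=0)
    # Rescale cysteine so that its median across positions becomes 1/n_random.
    out["C"] = out["C"] / (out["C"].median() * n_random)
    # Assumption: S and T are set to the median of the randomized residues at each position.
    pos_median = out[randomized].median(axis=1)
    out["S"] = pos_median
    out["T"] = pos_median
    return out

# Toy raw data with three randomized residues (A, D, K); values are made up.
raw = pd.DataFrame(
    {"A": [2.0, 1.0], "D": [4.0, 1.0], "K": [4.0, 2.0],
     "C": [5.0, 2.0], "S": [1.0, 3.0], "T": [1.0, 3.0]},
    index=pd.Index([-1, 1], name="position"),
)
norm = normalize_sketch(raw, randomized=["A", "D", "K"], n_random=3)
```

With the real data, `randomized` would hold the 17 residues named in the quote (16 for PDHK1 and PDHK4) and `n_random` would be 17 or 16 accordingly.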
get_one_kinase (df:pandas.core.frame.DataFrame, kinase:str, normalize:bool=False, drop_s:bool=True)
Obtain a specific kinase data from stacked dataframe
| | Type | Default | Details |
|---|---|---|---|
| df | DataFrame | | stacked dataframe (paper’s raw data) |
| kinase | str | | a specific kinase |
| normalize | bool | False | normalize according to the paper; special for PDHK1/4 |
| drop_s | bool | True | drop s, since s duplicates t in PSPA |
Retrieve a single kinase's data from PSPA data formatted with kinase as index and position+amino acid as columns.
aa | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y | t | y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
position | ||||||||||||||||||||||
-5 | 0.0594 | 0.0625 | 0.0589 | 0.0550 | 0.0775 | 0.0697 | 0.0687 | 0.0590 | 0.0515 | 0.0657 | 0.0687 | 0.0613 | 0.0451 | 0.0424 | 0.0594 | 0.0594 | 0.0594 | 0.0573 | 0.1001 | 0.0775 | 0.0583 | 0.0658 |
-4 | 0.0618 | 0.0621 | 0.0550 | 0.0511 | 0.0739 | 0.0715 | 0.0598 | 0.0601 | 0.0520 | 0.0614 | 0.0744 | 0.0549 | 0.0637 | 0.0552 | 0.0617 | 0.0608 | 0.0608 | 0.0519 | 0.0916 | 0.0739 | 0.0528 | 0.0752 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3 | 0.0486 | 0.0609 | 0.0938 | 0.0684 | 0.1024 | 0.0676 | 0.0544 | 0.0583 | 0.0388 | 0.0552 | 0.0637 | 0.0505 | 0.0686 | 0.0502 | 0.0561 | 0.0588 | 0.0588 | 0.0593 | 0.0641 | 0.1024 | 0.0539 | 0.0431 |
4 | 0.0565 | 0.0749 | 0.0631 | 0.0535 | 0.0732 | 0.0655 | 0.0664 | 0.0625 | 0.0496 | 0.0552 | 0.0627 | 0.0640 | 0.0677 | 0.0553 | 0.0604 | 0.0626 | 0.0626 | 0.0579 | 0.0864 | 0.0732 | 0.0548 | 0.0575 |
10 rows × 22 columns
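As an illustration of the reshaping involved, the sketch below turns one row whose labels combine position and amino acid (such as -5P or 4y) into a position by amino acid matrix. The helper name `row_to_matrix`, the regular expression, and the toy values are assumptions; the library function may parse the labels differently and additionally handles normalization and dropping s.

```python
import re
import pandas as pd

def row_to_matrix(row: pd.Series) -> pd.DataFrame:
    """Split labels like '-5P' into (position, aa) and pivot into a matrix."""
    records = []
    for label, value in row.items():
        match = re.fullmatch(r"(-?\d+)([A-Za-z])", label)
        if match is None:
            raise ValueError(f"unexpected column label: {label!r}")
        records.append((int(match.group(1)), match.group(2), value))
    long_df = pd.DataFrame(records, columns=["position", "aa", "value"])
    # Rows become positions, columns become amino acids.
    return long_df.pivot(index="position", columns="aa", values="value")

# Toy row for a single kinase (hypothetical values).
row = pd.Series({"-1P": 1.0, "-1G": 2.0, "1P": 3.0, "1G": 4.0}, name="AAK1")
matrix = row_to_matrix(row)
```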
get_logo (df:pandas.core.frame.DataFrame, kinase:str)
Given stacked df (index as kinase, columns as substrates), get a specific kinase’s logo
| | Type | Details |
|---|---|---|
| df | DataFrame | stacked DataFrame with kinase as index, substrates as columns |
| kinase | str | a specific kinase name in index |
This function replicates the motif logo from Johnson et al. Nature: An atlas of substrate specificities for the human serine/threonine kinome. Given raw PSPA data, it can output a motif logo.
# load raw PSPA data
df = pd.read_csv('https://github.com/sky1ove/katlas_raw/raw/refs/heads/main/nbs/raw/pspa_st_raw.csv').set_index('kinase')
df.head()
-5P | -5G | -5A | -5C | -5S | -5T | -5V | -5I | -5L | -5M | -5F | -5Y | -5W | -5H | -5K | -5R | -5Q | -5N | -5D | -5E | -5s | -5t | -5y | -4P | -4G | -4A | -4C | -4S | -4T | -4V | -4I | -4L | -4M | -4F | -4Y | -4W | -4H | -4K | -4R | -4Q | -4N | -4D | -4E | -4s | -4t | -4y | -3P | -3G | -3A | -3C | ... | 2E | 2s | 2t | 2y | 3P | 3G | 3A | 3C | 3S | 3T | 3V | 3I | 3L | 3M | 3F | 3Y | 3W | 3H | 3K | 3R | 3Q | 3N | 3D | 3E | 3s | 3t | 3y | 4P | 4G | 4A | 4C | 4S | 4T | 4V | 4I | 4L | 4M | 4F | 4Y | 4W | 4H | 4K | 4R | 4Q | 4N | 4D | 4E | 4s | 4t | 4y | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
kinase | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
AAK1 | 7614134.38 | 2590563.43 | 3001315.49 | 4696631.43 | 4944311.77 | 8315837.72 | 10056545.00 | 16433061.43 | 10499735.53 | 9133577.86 | 4493053.86 | 10062728.22 | 3327454.51 | 3504742.95 | 2767294.24 | 10105742.33 | 5923673.04 | 2909152.87 | 1695155.97 | 1617848.59 | 2128670.48 | 2128670.48 | 6460994.89 | 5260312.92 | 6325834.43 | 6957993.77 | 5369434.90 | 5713920.54 | 6612201.68 | 6093662.03 | 6120308.98 | 7306988.18 | 6829677.84 | 5119221.55 | 5263235.93 | 3974771.07 | 5065007.89 | 7968511.43 | 7041049.08 | 6174443.51 | 4228327.20 | 3271230.67 | 5511933.84 | 3267817.62 | 3267817.62 | 3338569.94 | 8921287.46 | 4210322.63 | 9202467.84 | 5247517.95 | ... | 5087031.12 | 3976345.18 | 3976345.18 | 3984759.21 | 7873214.56 | 10666925.10 | 6726092.35 | 8347110.75 | 8474126.59 | 36243425.13 | 7049439.08 | 4480458.41 | 5646461.38 | 5049205.04 | 4966940.21 | 6154422.64 | 5554384.65 | 7784625.71 | 8536454.84 | 10411516.21 | 7199439.88 | 8496115.61 | 4678462.79 | 4293019.55 | 3871242.35 | 3871242.35 | 4144314.24 | 6754640.94 | 7548893.13 | 6945441.59 | 6316583.85 | 5852227.64 | 11986373.78 | 4544765.44 | 4468425.80 | 4958371.35 | 4992757.20 | 5630292.14 | 5605199.37 | 8889242.83 | 6020662.73 | 8938081.41 | 9983402.01 | 6833481.55 | 6364453.29 | 4189045.89 | 4921595.57 | 2705053.53 | 2705053.53 | 2909279.71 |
ACVR2A | 4991039.28 | 5783855.86 | 7015770.78 | 8367603.09 | 7072052.48 | 7601399.57 | 7188292.41 | 7513915.73 | 7159894.71 | 6266122.81 | 7217726.01 | 6944709.95 | 9655463.75 | 6855044.90 | 6135259.88 | 5714942.29 | 5174360.28 | 6446237.55 | 10676798.47 | 9490370.51 | 9417512.45 | 9417512.45 | 9143262.67 | 5189500.90 | 6115977.27 | 6183207.45 | 8746774.91 | 8620216.35 | 8958568.82 | 6057960.27 | 5865979.65 | 5795429.17 | 6425254.28 | 6896823.79 | 6528270.38 | 8404648.40 | 6144455.59 | 4524121.26 | 5095303.46 | 5374811.94 | 5585576.72 | 11592053.32 | 9685649.12 | 9011965.48 | 9011965.48 | 7594632.10 | 5362570.64 | 6972103.63 | 5730145.40 | 8939563.00 | ... | 6089086.81 | 6553062.94 | 6553062.94 | 5204999.87 | 6765402.33 | 5981896.69 | 5346578.80 | 6919984.14 | 7959489.88 | 7230276.28 | 5724908.70 | 5600557.92 | 6186548.03 | 5952584.60 | 6508513.22 | 6613614.54 | 6419485.14 | 5958101.56 | 4666926.40 | 3909037.15 | 5041118.65 | 5297856.53 | 6281516.23 | 8795439.82 | 5241575.71 | 5241575.71 | 8237893.33 | 7993593.88 | 5729648.65 | 5252569.87 | 7759899.88 | 5847330.49 | 6832130.05 | 5439639.57 | 5935276.66 | 5396841.45 | 6976824.69 | 5517910.17 | 6107147.03 | 8435953.93 | 6039472.76 | 5556300.56 | 5178734.62 | 6490097.70 | 5862480.97 | 6742905.78 | 6750653.36 | 7414220.16 | 7414220.16 | 6209576.97 |
ACVR2B | 26480329.10 | 25689687.16 | 28137300.90 | 45175909.30 | 32876722.90 | 33516959.03 | 27011194.06 | 21996255.94 | 23412987.54 | 25670581.40 | 30029680.93 | 30172687.84 | 35861732.85 | 25743398.12 | 21466618.54 | 23457282.42 | 24765933.65 | 29600378.31 | 52942189.79 | 44756418.68 | 37869524.53 | 37869524.53 | 36929423.91 | 26315617.68 | 30726667.27 | 28226685.89 | 38126762.75 | 43013450.33 | 42772589.49 | 25461877.69 | 22496529.73 | 25367364.10 | 24579622.29 | 30632363.88 | 29811628.74 | 34569034.39 | 29901290.43 | 18566682.92 | 18058410.71 | 24160712.63 | 28003909.47 | 50383510.79 | 42873444.64 | 38601826.06 | 38601826.06 | 41781415.03 | 21589896.66 | 25896930.06 | 25366399.23 | 32391161.86 | ... | 28198813.13 | 38385326.47 | 38385326.47 | 28511534.88 | 32570983.87 | 30150790.48 | 26899530.88 | 30059325.25 | 38558739.93 | 36859921.47 | 27039358.24 | 27590185.37 | 32159022.90 | 28530956.88 | 26440586.17 | 32902030.17 | 31106381.62 | 23931820.75 | 17025117.96 | 21234075.57 | 24959228.30 | 24492089.19 | 27379743.65 | 34799587.30 | 29745626.40 | 29745626.40 | 32930899.01 | 35872341.41 | 28942663.95 | 32630294.18 | 32307682.27 | 29351484.80 | 32158594.62 | 27585750.22 | 27087769.55 | 26427108.32 | 26008460.67 | 24006599.74 | 29260306.53 | 39105460.91 | 27984195.21 | 22496915.32 | 24236904.72 | 29132857.30 | 26527389.14 | 36388726.15 | 34729319.54 | 37906081.09 | 37906081.09 | 31761418.56 |
AKT1 | 18399509.29 | 18104681.05 | 16831835.48 | 17247743.90 | 22647275.57 | 17801288.32 | 13037570.99 | 13271896.32 | 14156489.52 | 15409761.84 | 16671963.73 | 15742204.09 | 16027501.16 | 19907160.04 | 28966209.27 | 46308665.22 | 14988023.16 | 14258599.40 | 11464166.24 | 11466588.37 | 12987224.59 | 12987224.59 | 13061088.75 | 19398931.56 | 22044179.39 | 19063613.39 | 16798065.92 | 24561075.64 | 21053645.01 | 16134289.55 | 14065393.27 | 15980319.99 | 19175233.16 | 15650650.91 | 16726542.08 | 20714995.14 | 21727731.71 | 39387269.30 | 58649797.06 | 19398853.50 | 17142314.75 | 12090323.96 | 14986249.30 | 16353759.94 | 16353759.94 | 14758361.18 | 13509722.97 | 12072788.52 | 14485106.48 | 13140602.43 | ... | 7899072.12 | 9884167.72 | 9884167.72 | 12951695.66 | 19773952.19 | 28710820.58 | 19788527.29 | 24659376.30 | 38048939.96 | 28284495.88 | 18552859.18 | 19892556.30 | 19599278.84 | 19914391.75 | 24525110.15 | 23248076.40 | 22854369.53 | 30978724.22 | 37068344.22 | 45991399.36 | 22887074.13 | 25185236.76 | 11842652.96 | 12741276.18 | 13591360.01 | 13591360.01 | 13703183.16 | 41007225.21 | 26477432.58 | 21719674.69 | 20203616.26 | 38961301.73 | 32270913.63 | 18364889.10 | 16918422.73 | 20570253.26 | 20228125.32 | 17323199.08 | 15512400.43 | 23151572.85 | 29511541.69 | 50942663.29 | 48152924.11 | 32693882.62 | 28896602.57 | 19701350.30 | 13887460.52 | 17483074.60 | 17483074.60 | 11696833.54 |
AKT2 | 5439237.54 | 5569477.23 | 5805462.70 | 6301076.01 | 5004932.12 | 4812022.80 | 3906822.27 | 3776845.45 | 4450344.85 | 4629319.80 | 4945257.93 | 4922327.73 | 4818865.35 | 5502849.58 | 8846468.40 | 13331891.81 | 4466206.11 | 4288906.37 | 2757476.64 | 2846855.07 | 4120973.53 | 4120973.53 | 4296409.60 | 5553404.98 | 6777166.70 | 6560098.67 | 6582761.90 | 5632446.40 | 5626768.67 | 4006942.83 | 3777456.81 | 4557921.65 | 5073875.76 | 3998927.33 | 4589150.59 | 3853565.73 | 5877347.98 | 11323980.56 | 13410263.75 | 5637101.36 | 5224016.00 | 3264181.56 | 3696233.79 | 4296297.38 | 4296297.38 | 3662821.27 | 2964033.65 | 3057508.32 | 4553667.71 | 5296786.48 | ... | 2833442.97 | 5420413.57 | 5420413.57 | 5570730.59 | 5492323.72 | 6803045.82 | 5712262.24 | 8338449.42 | 6916137.16 | 5765528.76 | 3987390.37 | 3310626.06 | 4606344.89 | 3944710.78 | 4615743.61 | 4760316.90 | 4766437.21 | 7463546.47 | 9581858.17 | 9777288.15 | 5173060.60 | 4932877.74 | 3157981.08 | 3133584.99 | 4337162.19 | 4337162.19 | 5399811.38 | 9707178.16 | 7244546.68 | 5450860.89 | 7077129.19 | 7739122.90 | 7823932.94 | 3778464.64 | 4053742.59 | 4346509.71 | 4803778.43 | 4018212.65 | 4513237.41 | 4161648.84 | 6812201.58 | 11590683.50 | 9932525.89 | 6544476.93 | 6252360.75 | 3629091.99 | 3510048.19 | 5499662.30 | 5499662.30 | 4188620.88 |
5 rows × 207 columns