Plot

A collection of plotting functions

Setup

Utils



set_sns

 set_sns ()

Set seaborn resolution for notebook display



get_color_dict

 get_color_dict (categories, palette:str='tab20')

Assign a color to each name in a list (duplicates allowed); returns a dictionary that maps each unique name to its color

Type Default Details
categories list of names to assign color
palette str tab20 choose from sns.color_palette
get_color_dict(['a','a','b'])
{'a': (0.6823529411764706, 0.7803921568627451, 0.9098039215686274),
 'b': (1.0, 0.4980392156862745, 0.054901960784313725)}
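The output above hints at how duplicates are handled: 'a' receives the second tab20 color rather than the first, which is what one would see if colors were paired with the input list by position and a later duplicate overwrote an earlier one. The sketch below reproduces that behavior in plain Python. It is an illustration under that assumption, not the library's implementation; `TAB20_HEAD` and `color_dict_by_position` are hypothetical names, and the first three tab20 colors are hard-coded so the snippet needs no plotting library.

```python
# First three RGB tuples of matplotlib's "tab20" palette, hard-coded for illustration.
TAB20_HEAD = [
    (0.12156862745098039, 0.4666666666666667, 0.7058823529411765),
    (0.6823529411764706, 0.7803921568627451, 0.9098039215686274),
    (1.0, 0.4980392156862745, 0.054901960784313725),
]

def color_dict_by_position(categories, palette=TAB20_HEAD):
    """Pair each name with the color at the same position (hypothetical helper).

    A repeated name keeps the color of its last occurrence, because building
    a dict from pairs overwrites earlier keys.
    """
    if len(categories) > len(palette):
        raise ValueError("palette has fewer colors than categories")
    return dict(zip(categories, palette))

colors = color_dict_by_position(['a', 'a', 'b'])
print(colors)
```

If a stable color per name matters across plots, it may be safer to deduplicate the names first and build the dictionary once, then reuse it.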

Plot single kinase

logo_func

 logo_func (df:pandas.core.frame.DataFrame, title:str='logo')

Use logomaker to plot a motif logo from a dataframe matrix

Type Default Details
df DataFrame a dataframe that contains ratios for each amino acid at each position
title str logo title of the motif logo


get_logo2

 get_logo2 (full:pandas.core.frame.DataFrame, title:str='logo')

Plot a logo from the full frequency matrix of a kinase

Type Default Details
full DataFrame a dataframe that contains the full matrix of a kinase, with index as amino acid, and columns as positions
title str logo title of the graph
# get kinase-substrate dataset
df = Data.get_ks_dataset()

# get data for a specific kinase
df_k = df.query('kinase == "DYRK2"')

# get the full freq matrix
_,full = get_freq(df_k)

# plot logo
get_logo2(full,'DYRK2')
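To make the expected shape of `full` concrete, here is a minimal sketch of a per-position frequency matrix built from aligned substrate sequences, with amino acids as rows and positions as columns. The sequences and the `freq_matrix` helper are hypothetical and use only the standard library; `get_freq` may compute its matrix differently.

```python
from collections import Counter

def freq_matrix(seqs, positions):
    """Return {amino_acid: {position: fraction}} for equal-length aligned sequences."""
    if any(len(s) != len(positions) for s in seqs):
        raise ValueError("every sequence must have one residue per position")
    residues = sorted(set("".join(seqs)))
    matrix = {aa: {} for aa in residues}
    for i, pos in enumerate(positions):
        counts = Counter(s[i] for s in seqs)  # residues observed at this position
        for aa in residues:
            matrix[aa][pos] = counts[aa] / len(seqs)
    return matrix

# Hypothetical aligned sites: two residues on each side of the central residue
seqs = ["RPSPA", "RPTPG", "KPSPA", "RASPG"]
positions = [-2, -1, 0, 1, 2]
full_sketch = freq_matrix(seqs, positions)

for aa, row in full_sketch.items():
    print(aa, [round(row[p], 2) for p in positions])
```

Each column of the result sums to 1, which is the property a logo or heatmap of ratios relies on.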

Rank



plot_rank

 plot_rank (sorted_df:pandas.core.frame.DataFrame, x:str, y:str,
            n_hi:int=10, n_lo:int=10, figsize:tuple=(10, 8), data=None,
            hue=None, size=None, style=None, palette=None, hue_order=None,
            hue_norm=None, sizes=None, size_order=None, size_norm=None,
            markers=True, style_order=None, legend='auto', ax=None)

Plot rank from a sorted dataframe

Type Default Details
sorted_df DataFrame a sorted dataframe
x str column name for x axis
y str column name for y axis
n_hi int 10 if not None, show the head n names
n_lo int 10 if not None, show the tail n names
figsize tuple (10, 8) figure size
data NoneType None Input data structure. Either a long-form collection of vectors that can be
assigned to named variables or a wide-form dataset that will be internally
reshaped.
hue NoneType None Grouping variable that will produce points with different colors.
Can be either categorical or numeric, although color mapping will
behave differently in latter case.
size NoneType None Grouping variable that will produce points with different sizes.
Can be either categorical or numeric, although size mapping will
behave differently in latter case.
style NoneType None Grouping variable that will produce points with different markers.
Can have a numeric dtype but will always be treated as categorical.
palette NoneType None Method for choosing the colors to use when mapping the hue semantic.
String values are passed to color_palette(). List or dict values
imply categorical mapping, while a colormap object implies numeric mapping.
hue_order NoneType None Specify the order of processing and plotting for categorical levels of the
hue semantic.
hue_norm NoneType None Either a pair of values that set the normalization range in data units
or an object that will map from data units into a [0, 1] interval. Usage
implies numeric mapping.
sizes NoneType None An object that determines how sizes are chosen when size is used.
List or dict arguments should provide a size for each unique data value,
which forces a categorical interpretation. The argument may also be a
min, max tuple.
size_order NoneType None Specified order for appearance of the size variable levels,
otherwise they are determined from the data. Not relevant when the
size variable is numeric.
size_norm NoneType None Normalization in data units for scaling plot objects when the
size variable is numeric.
markers bool True Object determining how to draw the markers for different levels of the
style variable. Setting to True will use default markers, or
you can pass a list of markers or a dictionary mapping levels of the
style variable to markers. Setting to False will draw
marker-less lines. Markers are specified as in matplotlib.
style_order NoneType None Specified order for appearance of the style variable levels
otherwise they are determined from the data. Not relevant when the
style variable is numeric.
legend str auto How to draw the legend. If “brief”, numeric hue and size
variables will be represented with a sample of evenly spaced values.
If “full”, every group will get an entry in the legend. If “auto”,
choose between brief or full representation based on number of levels.
If False, no legend data is added and no legend is drawn.
ax NoneType None Pre-existing axes for the plot. Otherwise, matplotlib.pyplot.gca() is called
internally.
Returns matplotlib.axes.Axes The matplotlib axes containing the plot.
# load data
# df = Data.get_pspa_raw().set_index('kinase')
df = pd.read_csv('https://github.com/sky1ove/katlas_raw/raw/refs/heads/main/nbs/raw/pspa_st_raw.csv').set_index('kinase')


# get sorted dataframe
sorted_df = df.max(axis=1).reset_index(name='values').sort_values('values')
sorted_df.head()
kinase values
68 CK1G2 189898.392
294 VRK2 4191709.640
8 ALPHAK3 4573611.730
249 PRPK 8495330.790
38 CAMLCK 9413689.600
plot_rank(sorted_df,x='kinase',y='values')
plt.xlabel('kinase');
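The `n_hi` and `n_lo` parameters are described as showing the head and tail names of the sorted dataframe. A minimal sketch of that selection, assuming it amounts to taking the first `n_hi` and last `n_lo` rows, could look like this; `rows_to_label` is a hypothetical helper, not part of the library.

```python
def rows_to_label(names, n_hi=10, n_lo=10):
    """Split off the first n_hi and last n_lo names of an already sorted list.

    Hypothetical helper: None (or 0) disables that end, and the two slices
    are kept from overlapping when the list is short.
    """
    n_hi = min(n_hi or 0, len(names))
    n_lo = min(n_lo or 0, len(names) - n_hi)
    head = names[:n_hi]
    tail = names[len(names) - n_lo:]
    return head, tail

# Kinase names in the order shown by sorted_df.head() above
names = ['CK1G2', 'VRK2', 'ALPHAK3', 'PRPK', 'CAMLCK']
head, tail = rows_to_label(names, n_hi=2, n_lo=2)
print(head, tail)
```

Labeling only the two ends keeps a rank plot with hundreds of kinases readable.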

Histogram



plot_hist

 plot_hist (df:pandas.core.frame.DataFrame, x:str, figsize:tuple=(6, 2),
            data=None, y=None, hue=None, weights=None, stat='count',
            bins='auto', binwidth=None, binrange=None, discrete=None,
            cumulative=False, common_bins=True, common_norm=True,
            multiple='layer', element='bars', fill=True, shrink=1,
            kde=False, kde_kws=None, line_kws=None, thresh=0,
            pthresh=None, pmax=None, cbar=False, cbar_ax=None,
            cbar_kws=None, palette=None, hue_order=None, hue_norm=None,
            color=None, log_scale=None, legend=True, ax=None)
Type Default Details
df DataFrame a dataframe that contains the values to plot
x str column name of values
figsize tuple (6, 2)
data NoneType None Input data structure. Either a long-form collection of vectors that can be
assigned to named variables or a wide-form dataset that will be internally
reshaped.
y NoneType None
hue NoneType None Semantic variable that is mapped to determine the color of plot elements.
weights NoneType None If provided, weight the contribution of the corresponding data points
towards the count in each bin by these factors.
stat str count Aggregate statistic to compute in each bin.

- count: show the number of observations in each bin
- frequency: show the number of observations divided by the bin width
- probability or proportion: normalize such that bar heights sum to 1
- percent: normalize such that bar heights sum to 100
- density: normalize such that the total area of the histogram equals 1
bins str auto Generic bin parameter that can be the name of a reference rule,
the number of bins, or the breaks of the bins.
Passed to numpy.histogram_bin_edges().
binwidth NoneType None Width of each bin, overrides bins but can be used with
binrange.
binrange NoneType None Lowest and highest value for bin edges; can be used either
with bins or binwidth. Defaults to data extremes.
discrete NoneType None If True, default to binwidth=1 and draw the bars so that they are
centered on their corresponding data points. This avoids “gaps” that may
otherwise appear when using discrete (integer) data.
cumulative bool False If True, plot the cumulative counts as bins increase.
common_bins bool True If True, use the same bins when semantic variables produce multiple
plots. If using a reference rule to determine the bins, it will be computed
with the full dataset.
common_norm bool True If True and using a normalized statistic, the normalization will apply over
the full dataset. Otherwise, normalize each histogram independently.
multiple str layer Approach to resolving multiple elements when semantic mapping creates subsets.
Only relevant with univariate data.
element str bars Visual representation of the histogram statistic.
Only relevant with univariate data.
fill bool True If True, fill in the space under the histogram.
Only relevant with univariate data.
shrink int 1 Scale the width of each bar relative to the binwidth by this factor.
Only relevant with univariate data.
kde bool False If True, compute a kernel density estimate to smooth the distribution
and show on the plot as (one or more) line(s).
Only relevant with univariate data.
kde_kws NoneType None Parameters that control the KDE computation, as in kdeplot().
line_kws NoneType None Parameters that control the KDE visualization, passed to
matplotlib.axes.Axes.plot().
thresh int 0 Cells with a statistic less than or equal to this value will be transparent.
Only relevant with bivariate data.
pthresh NoneType None Like thresh, but a value in [0, 1] such that cells with aggregate counts
(or other statistics, when used) up to this proportion of the total will be
transparent.
pmax NoneType None A value in [0, 1] that sets that saturation point for the colormap at a value
such that cells below constitute this proportion of the total count (or
other statistic, when used).
cbar bool False If True, add a colorbar to annotate the color mapping in a bivariate plot.
Note: Does not currently support plots with a hue variable well.
cbar_ax NoneType None Pre-existing axes for the colorbar.
cbar_kws NoneType None Additional parameters passed to matplotlib.figure.Figure.colorbar().
palette NoneType None Method for choosing the colors to use when mapping the hue semantic.
String values are passed to color_palette(). List or dict values
imply categorical mapping, while a colormap object implies numeric mapping.
hue_order NoneType None Specify the order of processing and plotting for categorical levels of the
hue semantic.
hue_norm NoneType None Either a pair of values that set the normalization range in data units
or an object that will map from data units into a [0, 1] interval. Usage
implies numeric mapping.
color NoneType None Single color specification for when hue mapping is not used. Otherwise, the
plot will try to hook into the matplotlib property cycle.
log_scale NoneType None Set axis scale(s) to log. A single value sets the data axis for any numeric
axes in the plot. A pair of values sets each axis independently.
Numeric values are interpreted as the desired base (default 10).
When None or False, seaborn defers to the existing Axes scale.
legend bool True If False, suppress the legend for semantic variables.
ax NoneType None Pre-existing axes for the plot. Otherwise, matplotlib.pyplot.gca() is called
internally.
Returns matplotlib.axes.Axes The matplotlib axes containing the plot.
# we can use the same df
sorted_df.head()
kinase values
68 CK1G2 189898.392
294 VRK2 4191709.640
8 ALPHAK3 4573611.730
249 PRPK 8495330.790
38 CAMLCK 9413689.600
plot_hist(sorted_df,'values')
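The `stat` options in the parameter table differ only in how raw bin counts are rescaled. The sketch below works through each one for equal-width bins, using made-up counts; `normalize_counts` is a hypothetical helper written for illustration and is not seaborn's code.

```python
def normalize_counts(counts, bin_width, stat="count"):
    """Rescale raw bin counts according to a histogram statistic (equal-width bins)."""
    total = sum(counts)
    if stat == "count":
        return list(counts)
    if stat == "frequency":                      # observations per unit of x
        return [c / bin_width for c in counts]
    if stat in ("probability", "proportion"):    # bar heights sum to 1
        return [c / total for c in counts]
    if stat == "percent":                        # bar heights sum to 100
        return [100 * c / total for c in counts]
    if stat == "density":                        # total bar area equals 1
        return [c / (total * bin_width) for c in counts]
    raise ValueError(f"unknown stat: {stat}")

counts = [2, 5, 3]   # made-up counts for three bins
bin_width = 0.5

for stat in ("count", "frequency", "probability", "percent", "density"):
    print(stat, normalize_counts(counts, bin_width, stat))
```

Note that `density` heights can exceed 1 when bins are narrower than one unit; it is the area (height times width), not the height, that sums to 1.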

Heatmap



plot_heatmap

 plot_heatmap (matrix, title:str='heatmap', figsize:tuple=(6, 10),
               cmap:str='binary', vmin=None, vmax=None, center=None,
               robust=False, annot=None, fmt='.2g', annot_kws=None,
               linewidths=0, linecolor='white', cbar=True, cbar_kws=None,
               cbar_ax=None, square=False, xticklabels='auto',
               yticklabels='auto', mask=None, ax=None)

Plot heatmap based on a matrix of values

Type Default Details
matrix a matrix of values
title str heatmap title of the heatmap
figsize tuple (6, 10) figure size of the heatmap
cmap str binary color map; the default, binary, runs from white to black
vmin NoneType None
vmax NoneType None
center NoneType None The value at which to center the colormap when plotting divergent data.
Using this parameter will change the default cmap if none is
specified.
robust bool False If True and vmin or vmax are absent, the colormap range is
computed with robust quantiles instead of the extreme values.
annot NoneType None If True, write the data value in each cell. If an array-like with the
same shape as data, then use this to annotate the heatmap instead
of the data. Note that DataFrames will match on position, not index.
fmt str .2g String formatting code to use when adding annotations.
annot_kws NoneType None Keyword arguments for matplotlib.axes.Axes.text() when annot
is True.
linewidths int 0 Width of the lines that will divide each cell.
linecolor str white Color of the lines that will divide each cell.
cbar bool True Whether to draw a colorbar.
cbar_kws NoneType None Keyword arguments for matplotlib.figure.Figure.colorbar().
cbar_ax NoneType None Axes in which to draw the colorbar, otherwise take space from the
main Axes.
square bool False If True, set the Axes aspect to “equal” so each cell will be
square-shaped.
xticklabels str auto
yticklabels str auto
mask NoneType None If passed, data will not be shown in cells where mask is True.
Cells with missing values are automatically masked.
ax NoneType None Axes in which to draw the plot, otherwise use the currently-active
Axes.
Returns matplotlib Axes Axes object with the heatmap.
_.head()
Position -5 -4 -3 -2 -1 1 2 3 4
aa
P 0.060639 0.066152 0.074972 0.110254 0.110254 0.386313 0.057459 0.135105 0.062361
G 0.076075 0.074972 0.126792 0.061742 0.087100 0.046358 0.068508 0.101883 0.067929
A 0.091510 0.083793 0.061742 0.142227 0.100331 0.089404 0.108287 0.071982 0.080178
C 0.011025 0.006615 0.011025 0.030871 0.017641 0.012141 0.023204 0.018826 0.006682
S 0.036384 0.049614 0.024256 0.036384 0.023153 0.027594 0.028729 0.035437 0.038976
plot_heatmap(_)

Visualize features in 2D



plot_2d

 plot_2d (X:pandas.core.frame.DataFrame, data=None, x=None, y=None,
          hue=None, size=None, style=None, palette=None, hue_order=None,
          hue_norm=None, sizes=None, size_order=None, size_norm=None,
          markers=True, style_order=None, legend='auto', ax=None)

Make a 2D plot from a dataframe whose first column is x and whose second column is y

Type Default Details
X DataFrame a dataframe whose first column is x and whose second column is y
data NoneType None Input data structure. Either a long-form collection of vectors that can be
assigned to named variables or a wide-form dataset that will be internally
reshaped.
x NoneType None
y NoneType None
hue NoneType None Grouping variable that will produce points with different colors.
Can be either categorical or numeric, although color mapping will
behave differently in latter case.
size NoneType None Grouping variable that will produce points with different sizes.
Can be either categorical or numeric, although size mapping will
behave differently in latter case.
style NoneType None Grouping variable that will produce points with different markers.
Can have a numeric dtype but will always be treated as categorical.
palette NoneType None Method for choosing the colors to use when mapping the hue semantic.
String values are passed to color_palette(). List or dict values
imply categorical mapping, while a colormap object implies numeric mapping.
hue_order NoneType None Specify the order of processing and plotting for categorical levels of the
hue semantic.
hue_norm NoneType None Either a pair of values that set the normalization range in data units
or an object that will map from data units into a [0, 1] interval. Usage
implies numeric mapping.
sizes NoneType None An object that determines how sizes are chosen when size is used.
List or dict arguments should provide a size for each unique data value,
which forces a categorical interpretation. The argument may also be a
min, max tuple.
size_order NoneType None Specified order for appearance of the size variable levels,
otherwise they are determined from the data. Not relevant when the
size variable is numeric.
size_norm NoneType None Normalization in data units for scaling plot objects when the
size variable is numeric.
markers bool True Object determining how to draw the markers for different levels of the
style variable. Setting to True will use default markers, or
you can pass a list of markers or a dictionary mapping levels of the
style variable to markers. Setting to False will draw
marker-less lines. Markers are specified as in matplotlib.
style_order NoneType None Specified order for appearance of the style variable levels
otherwise they are determined from the data. Not relevant when the
style variable is numeric.
legend str auto How to draw the legend. If “brief”, numeric hue and size
variables will be represented with a sample of evenly spaced values.
If “full”, every group will get an entry in the legend. If “auto”,
choose between brief or full representation based on number of levels.
If False, no legend data is added and no legend is drawn.
ax NoneType None Pre-existing axes for the plot. Otherwise, matplotlib.pyplot.gca() is called
internally.
Returns matplotlib.axes.Axes The matplotlib axes containing the plot.
plot_2d(_.iloc[:,:2])



plot_cluster

 plot_cluster (df:pandas.core.frame.DataFrame, method:str='pca',
               hue:str=None, complexity:int=30, palette:str='tab20',
               legend:bool=False, name_list=None, seed:int=123, s:int=50,
               **kwargs)

Given a dataframe of values, reduce it to 2D and plot it; the method can be pca, tsne, or umap

Type Default Details
df DataFrame a dataframe of values to be reduced in dimensionality
method str pca dimensionality reduction method, choose from pca, umap, and tsne
hue str None column name used for color
complexity int 30 recommended: 30 for tsne, 15 for umap, none for pca
palette str tab20 color scheme; tab10 can be used when there are fewer categories
legend bool False whether to add the legend on the side
name_list NoneType None a list of names to annotate each dot in the plot
seed int 123 seed for dimensionality reduction
s int 50 size of the dot
kwargs
# load data
aa = Data.get_aa_info()
aa_rdkit = get_rdkit(aa, 'SMILES') # get rdkit features from SMILES columns
aa_rdkit = preprocess(aa_rdkit) # remove similar columns
info=Data.get_aa_info()
removing columns: {'fr_aniline', 'HeavyAtomMolWt', 'fr_benzodiazepine', 'fr_nitrile', 'SMR_VSA2', 'fr_N_O', 'fr_furan', 'fr_oxazole', 'NumAliphaticRings', 'fr_alkyl_halide', 'fr_C_S', 'fr_Ndealkylation2', 'SMR_VSA8', 'SlogP_VSA6', 'fr_quatN', 'fr_oxime', 'fr_prisulfonamd', 'fr_isothiocyan', 'fr_nitro_arom', 'fr_imide', 'fr_methoxy', 'fr_epoxide', 'NumAliphaticCarbocycles', 'fr_tetrazole', 'PEOE_VSA5', 'SlogP_VSA11', 'fr_benzene', 'fr_azo', 'fr_dihydropyridine', 'fr_Nhpyrrole', 'fr_ketone', 'fr_nitro_arom_nonortho', 'fr_diazo', 'fr_para_hydroxylation', 'fr_piperzine', 'fr_Ar_COO', 'fr_morpholine', 'fr_sulfone', 'ExactMolWt', 'SlogP_VSA7', 'fr_thiocyan', 'LabuteASA', 'fr_phenol', 'EState_VSA11', 'fr_bicyclic', 'fr_HOCCN', 'fr_isocyan', 'fr_phos_acid', 'fr_lactone', 'MaxPartialCharge', 'fr_Ar_OH', 'fr_thiazole', 'fr_phenol_noOrthoHbond', 'NumSaturatedHeterocycles', 'fr_alkyl_carbamate', 'BCUT2D_MRHI', 'MinAbsPartialCharge', 'fr_nitroso', 'fr_barbitur', 'fr_azide', 'fr_phos_ester', 'fr_Al_OH_noTert', 'fr_ether', 'fr_hdrzone', 'SlogP_VSA10', 'fr_Ar_NH', 'NumSaturatedCarbocycles', 'Chi1n', 'fr_sulfonamd', 'fr_C_O_noCOO', 'fr_guanido', 'fr_halogen', 'fr_thiophene', 'fr_aldehyde', 'fr_ketone_Topliss', 'fr_nitro', 'fr_urea', 'fr_pyridine', 'fr_piperdine', 'fr_ArN', 'SlogP_VSA12', 'fr_ester', 'fr_COO2', 'fr_hdrzine', 'NumRadicalElectrons', 'MaxEStateIndex', 'Chi0', 'fr_term_acetylene', 'PEOE_VSA13', 'fr_allylic_oxid', 'fr_amidine', 'SlogP_VSA9', 'fr_lactam', 'NumSaturatedRings', 'fr_COO', 'fr_aryl_methyl', 'MolMR', 'fr_amide', 'HeavyAtomCount', 'NumValenceElectrons', 'fr_Imine', 'fr_Ndealkylation1', 'VSA_EState1'}
plot_cluster(aa_rdkit, name_list = aa.Name.tolist(), hue = 'aa')

Interactive plot



plot_bokeh

 plot_bokeh (X:pandas.core.frame.DataFrame, idx, hue:None, s:int=3,
             **kwargs)

Make an interactive 2D plot with a search box and a tooltip that shows point information on hover

Type Default Details
X DataFrame a dataframe of two columns from dimensionality reduction
idx pd.Series or list of identities used by the search box
hue None pd.Series or list that indicates category for each sample
s int 3 dot size
kwargs
# reduce the features to 2 dimensions with PCA
X = reduce_feature(aa_rdkit)

# get info
info = Data.get_aa_info()

# plot
plot_bokeh(X,
           idx=info.Name,
           hue=info.Name,
           s=7,
           smiles=info.SMILES)

Bar graph



plot_count

 plot_count (cnt, tick_spacing:float=None, palette:str='tab20')

Make a bar plot from df['x'].value_counts()

Type Default Details
cnt from df['x'].value_counts()
tick_spacing float None tick spacing for x axis
palette str tab20
# get count
cnt = aa_rdkit.fr_sulfide.round(3).value_counts()

# make plot
plot_count(cnt)
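For readers unfamiliar with `value_counts()`, the input to `plot_count` is simply a tally of how often each value occurs. A standard-library sketch of the same idea, with made-up values standing in for a descriptor column, is shown below; `cnt_sketch` is a hypothetical name.

```python
from collections import Counter

# Made-up descriptor values; real ones would come from a column such as fr_sulfide
values = [0.0, 0.0, 1.0, 0.0, 1.0, 0.0]

# Round, then count occurrences; most_common() orders by descending count,
# similar to the default ordering of value_counts()
cnt_sketch = Counter(round(v, 3) for v in values).most_common()

# A text-mode bar per unique value
for value, count in cnt_sketch:
    print(f"{value:<5} {'#' * count} ({count})")
```

Rounding before counting, as the example above does with `round(3)`, avoids splitting nearly identical floats into separate bars.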



plot_bar

 plot_bar (df, value, group, title=None, figsize=(12, 5), fontsize=14,
           dots=True, rotation=90, ascending=False, data=None, x=None,
           y=None, hue=None, order=None, hue_order=None, estimator='mean',
           errorbar=('ci', 95), n_boot=1000, seed=None, units=None,
           weights=None, orient=None, color=None, palette=None,
           saturation=0.75, fill=True, hue_norm=None, width=0.8,
           dodge='auto', gap=0, log_scale=None, native_scale=False,
           formatter=None, legend='auto', capsize=0, err_kws=None,
           ci=<deprecated>, errcolor=<deprecated>, errwidth=<deprecated>,
           ax=None)

Plot a bar graph from an unstacked dataframe; specify the column of values and the column of categories

Type Default Details
df
value colname of value
group colname of group
title NoneType None
figsize tuple (12, 5)
fontsize int 14
dots bool True whether to add dots to the graph
rotation int 90
ascending bool False
data NoneType None Dataset for plotting. If x and y are absent, this is
interpreted as wide-form. Otherwise it is expected to be long-form.
x NoneType None
y NoneType None
hue NoneType None
order NoneType None
hue_order NoneType None
estimator str mean Statistical function to estimate within each categorical bin.
errorbar tuple ('ci', 95) Name of errorbar method (either “ci”, “pi”, “se”, or “sd”), or a tuple
with a method name and a level parameter, or a function that maps from a
vector to a (min, max) interval, or None to hide errorbar. See the
seaborn error bars tutorial for more information.

.. versionadded:: v0.12.0
n_boot int 1000 Number of bootstrap samples used to compute confidence intervals.
seed NoneType None Seed or random number generator for reproducible bootstrapping.
units NoneType None Identifier of sampling units; used by the errorbar function to
perform a multilevel bootstrap and account for repeated measures
weights NoneType None Data values or column used to compute weighted statistics.
Note that the use of weights may limit other statistical options.

.. versionadded:: v0.13.1
orient NoneType None Orientation of the plot (vertical or horizontal). This is usually
inferred based on the type of the input variables, but it can be used
to resolve ambiguity when both x and y are numeric or when
plotting wide-form data.

.. versionchanged:: v0.13.0
Added ‘x’/‘y’ as options, equivalent to ‘v’/‘h’.
color NoneType None Single color for the elements in the plot.
palette NoneType None Colors to use for the different levels of the hue variable. Should
be something that can be interpreted by color_palette(), or a
dictionary mapping hue levels to matplotlib colors.
saturation float 0.75 Proportion of the original saturation to draw fill colors in. Large
patches often look better with desaturated colors, but set this to
1 if you want the colors to perfectly match the input values.
fill bool True If True, use a solid patch. Otherwise, draw as line art.

.. versionadded:: v0.13.0
hue_norm NoneType None Normalization in data units for colormap applied to the hue
variable when it is numeric. Not relevant if hue is categorical.

.. versionadded:: v0.12.0
width float 0.8 Width allotted to each element on the orient axis. When native_scale=True,
it is relative to the minimum distance between two values in the native scale.
dodge str auto When hue mapping is used, whether elements should be narrowed and shifted along
the orient axis to eliminate overlap. If "auto", set to True when the
orient variable is crossed with the categorical variable or False otherwise.

.. versionchanged:: 0.13.0

Added "auto" mode as a new default.
gap int 0 Shrink on the orient axis by this factor to add a gap between dodged elements.

.. versionadded:: 0.13.0
log_scale NoneType None Set axis scale(s) to log. A single value sets the data axis for any numeric
axes in the plot. A pair of values sets each axis independently.
Numeric values are interpreted as the desired base (default 10).
When None or False, seaborn defers to the existing Axes scale.

.. versionadded:: v0.13.0
native_scale bool False When True, numeric or datetime values on the categorical axis will maintain
their original scaling rather than being converted to fixed indices.

.. versionadded:: v0.13.0
formatter NoneType None Function for converting categorical data into strings. Affects both grouping
and tick labels.

.. versionadded:: v0.13.0
legend str auto How to draw the legend. If “brief”, numeric hue and size
variables will be represented with a sample of evenly spaced values.
If “full”, every group will get an entry in the legend. If “auto”,
choose between brief or full representation based on number of levels.
If False, no legend data is added and no legend is drawn.

.. versionadded:: v0.13.0
capsize int 0 Width of the “caps” on error bars, relative to bar spacing.
err_kws NoneType None Parameters of matplotlib.lines.Line2D, for the error bar artists.

.. versionadded:: v0.13.0
ci Deprecated Level of the confidence interval to show, in [0, 100].

.. deprecated:: v0.12.0
Use errorbar=("ci", ...).
errcolor Deprecated Color used for the error bar lines.

.. deprecated:: 0.13.0
Use err_kws={'color': ...}.
errwidth Deprecated Thickness of error bar lines (and caps), in points.

.. deprecated:: 0.13.0
Use err_kws={'linewidth': ...}.
ax NoneType None Axes object to draw the plot onto, otherwise uses the current Axes.
Returns matplotlib Axes Returns the Axes object with the plot drawn onto it.
# add a category column (1 if pKa1 > 2, otherwise 0)
info['cat'] = (info.pKa1>2).astype(int)
# plot from the unstacked dataframe
plot_bar(info,value='MW',group='cat',palette='tab20')
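The default `errorbar=('ci', 95)` together with `n_boot` and `seed` refers to a bootstrap confidence interval around the bar height. A minimal percentile-bootstrap sketch for the mean, using only the standard library, is given below. The `bootstrap_ci` helper and the sample values are hypothetical, and the percentile lookup is a simple index-based approximation rather than seaborn's exact computation.

```python
import random
import statistics

def bootstrap_ci(values, level=95, n_boot=1000, seed=None):
    """Approximate percentile-bootstrap confidence interval for the mean."""
    rng = random.Random(seed)          # seeded generator for reproducibility
    n = len(values)
    # Resample with replacement n_boot times and record each resample's mean
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    alpha = (100 - level) / 2 / 100    # tail probability on each side
    lo = means[int(alpha * n_boot)]
    hi = means[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return lo, hi

# Made-up measurements for one category
values = [89.1, 75.1, 131.2, 117.1, 105.1, 119.1]
lo, hi = bootstrap_ci(values, level=95, n_boot=1000, seed=0)
print(f"mean={statistics.fmean(values):.1f}, 95% CI=({lo:.1f}, {hi:.1f})")
```

Passing a fixed `seed` makes the error bars identical between runs, which is useful when figures are regenerated.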



plot_group_bar

 plot_group_bar (df, value_cols, group, figsize=(12, 5), order=None,
                 title=None, fontsize=14, rotation=90, data=None, x=None,
                 y=None, hue=None, hue_order=None, estimator='mean',
                 errorbar=('ci', 95), n_boot=1000, seed=None, units=None,
                 weights=None, orient=None, color=None, palette=None,
                 saturation=0.75, fill=True, hue_norm=None, width=0.8,
                 dodge='auto', gap=0, log_scale=None, native_scale=False,
                 formatter=None, legend='auto', capsize=0, err_kws=None,
                 ci=<deprecated>, errcolor=<deprecated>,
                 errwidth=<deprecated>, ax=None)

Plot grouped bar graph from dataframe.

Type Default Details
df
value_cols list of column names for values, the order depends on the first item
group column name of group (e.g., ‘kinase’)
figsize tuple (12, 5)
order NoneType None
title NoneType None
fontsize int 14
rotation int 90
data NoneType None Dataset for plotting. If x and y are absent, this is
interpreted as wide-form. Otherwise it is expected to be long-form.
x NoneType None
y NoneType None
hue NoneType None
hue_order NoneType None
estimator str mean Statistical function to estimate within each categorical bin.
errorbar tuple ('ci', 95) Name of errorbar method (either “ci”, “pi”, “se”, or “sd”), or a tuple
with a method name and a level parameter, or a function that maps from a
vector to a (min, max) interval, or None to hide errorbar. See the
seaborn error bars tutorial for more information.

.. versionadded:: v0.12.0
n_boot int 1000 Number of bootstrap samples used to compute confidence intervals.
seed NoneType None Seed or random number generator for reproducible bootstrapping.
units NoneType None Identifier of sampling units; used by the errorbar function to
perform a multilevel bootstrap and account for repeated measures
weights NoneType None Data values or column used to compute weighted statistics.
Note that the use of weights may limit other statistical options.

.. versionadded:: v0.13.1
orient NoneType None Orientation of the plot (vertical or horizontal). This is usually
inferred based on the type of the input variables, but it can be used
to resolve ambiguity when both x and y are numeric or when
plotting wide-form data.

.. versionchanged:: v0.13.0
Added ‘x’/‘y’ as options, equivalent to ‘v’/‘h’.
color NoneType None Single color for the elements in the plot.
palette NoneType None Colors to use for the different levels of the hue variable. Should
be something that can be interpreted by color_palette(), or a
dictionary mapping hue levels to matplotlib colors.
saturation float 0.75 Proportion of the original saturation to draw fill colors in. Large
patches often look better with desaturated colors, but set this to
1 if you want the colors to perfectly match the input values.
fill bool True If True, use a solid patch. Otherwise, draw as line art.

.. versionadded:: v0.13.0
hue_norm NoneType None Normalization in data units for colormap applied to the hue
variable when it is numeric. Not relevant if hue is categorical.

.. versionadded:: v0.12.0
width float 0.8 Width allotted to each element on the orient axis. When native_scale=True,
it is relative to the minimum distance between two values in the native scale.
dodge str auto When hue mapping is used, whether elements should be narrowed and shifted along
the orient axis to eliminate overlap. If "auto", set to True when the
orient variable is crossed with the categorical variable or False otherwise.

.. versionchanged:: 0.13.0

Added "auto" mode as a new default.
gap int 0 Shrink on the orient axis by this factor to add a gap between dodged elements.

.. versionadded:: 0.13.0
log_scale NoneType None Set axis scale(s) to log. A single value sets the data axis for any numeric
axes in the plot. A pair of values sets each axis independently.
Numeric values are interpreted as the desired base (default 10).
When None or False, seaborn defers to the existing Axes scale.

.. versionadded:: v0.13.0
native_scale bool False When True, numeric or datetime values on the categorical axis will maintain
their original scaling rather than being converted to fixed indices.

.. versionadded:: v0.13.0
formatter NoneType None Function for converting categorical data into strings. Affects both grouping
and tick labels.

.. versionadded:: v0.13.0
legend str auto How to draw the legend. If “brief”, numeric hue and size
variables will be represented with a sample of evenly spaced values.
If “full”, every group will get an entry in the legend. If “auto”,
choose between brief or full representation based on number of levels.
If False, no legend data is added and no legend is drawn.

.. versionadded:: v0.13.0
capsize int 0 Width of the “caps” on error bars, relative to bar spacing.
err_kws NoneType None Parameters of matplotlib.lines.Line2D, for the error bar artists.

.. versionadded:: v0.13.0
ci Deprecated Level of the confidence interval to show, in [0, 100].

.. deprecated:: v0.12.0
Use errorbar=("ci", ...).
errcolor Deprecated Color used for the error bar lines.

.. deprecated:: 0.13.0
Use err_kws={'color': ...}.
errwidth Deprecated Thickness of error bar lines (and caps), in points.

.. deprecated:: 0.13.0
Use err_kws={'linewidth': ...}.
ax NoneType None Axes object to draw the plot onto, otherwise uses the current Axes.
Returns matplotlib Axes Returns the Axes object with the plot drawn onto it.
plot_group_bar(info,['pKa1','pKb2'],'Name')

Box plot


source

plot_box

 plot_box (df, value, group, title=None, figsize=(6, 3), fontsize=14,
           dots=True, rotation=90, data=None, x=None, y=None, hue=None,
           order=None, hue_order=None, orient=None, color=None,
           palette=None, saturation=0.75, fill=True, dodge='auto',
           width=0.8, gap=0, whis=1.5, linecolor='auto', linewidth=None,
           fliersize=None, hue_norm=None, native_scale=False,
           log_scale=None, formatter=None, legend='auto', ax=None)

Plot box plot.

Type Default Details
df
value colname of value
group colname of group
title NoneType None
figsize tuple (6, 3)
fontsize int 14
dots bool True
rotation int 90
data NoneType None Dataset for plotting. If x and y are absent, this is
interpreted as wide-form. Otherwise it is expected to be long-form.
x NoneType None
y NoneType None
hue NoneType None
order NoneType None
hue_order NoneType None
orient NoneType None Orientation of the plot (vertical or horizontal). This is usually
inferred based on the type of the input variables, but it can be used
to resolve ambiguity when both x and y are numeric or when
plotting wide-form data.

.. versionchanged:: v0.13.0
Added ‘x’/‘y’ as options, equivalent to ‘v’/‘h’.
color NoneType None Single color for the elements in the plot.
palette NoneType None Colors to use for the different levels of the hue variable. Should
be something that can be interpreted by seaborn.color_palette, or a
dictionary mapping hue levels to matplotlib colors.
saturation float 0.75 Proportion of the original saturation to draw fill colors in. Large
patches often look better with desaturated colors, but set this to
1 if you want the colors to perfectly match the input values.
fill bool True If True, use a solid patch. Otherwise, draw as line art.

.. versionadded:: v0.13.0
dodge str auto When hue mapping is used, whether elements should be narrowed and shifted along
the orient axis to eliminate overlap. If "auto", set to True when the
orient variable is crossed with the categorical variable or False otherwise.

.. versionchanged:: 0.13.0

Added "auto" mode as a new default.
width float 0.8 Width allotted to each element on the orient axis. When native_scale=True,
it is relative to the minimum distance between two values in the native scale.
gap int 0 Shrink on the orient axis by this factor to add a gap between dodged elements.

.. versionadded:: 0.13.0
whis float 1.5 Parameter that controls whisker length. If scalar, whiskers are drawn
to the farthest datapoint within whis * IQR from the nearest hinge.
If a tuple, it is interpreted as percentiles that whiskers represent.
linecolor str auto Color to use for line elements, when fill is True.

.. versionadded:: v0.13.0
linewidth NoneType None Width of the lines that frame the plot elements.
fliersize NoneType None Size of the markers used to indicate outlier observations.
hue_norm NoneType None Normalization in data units for colormap applied to the hue
variable when it is numeric. Not relevant if hue is categorical.

.. versionadded:: v0.12.0
native_scale bool False When True, numeric or datetime values on the categorical axis will maintain
their original scaling rather than being converted to fixed indices.

.. versionadded:: v0.13.0
log_scale NoneType None Set axis scale(s) to log. A single value sets the data axis for any numeric
axes in the plot. A pair of values sets each axis independently.
Numeric values are interpreted as the desired base (default 10).
When None or False, seaborn defers to the existing Axes scale.

.. versionadded:: v0.13.0
formatter NoneType None Function for converting categorical data into strings. Affects both grouping
and tick labels.

.. versionadded:: v0.13.0
legend str auto How to draw the legend. If “brief”, numeric hue and size
variables will be represented with a sample of evenly spaced values.
If “full”, every group will get an entry in the legend. If “auto”,
choose between brief or full representation based on number of levels.
If False, no legend data is added and no legend is drawn.

.. versionadded:: v0.13.0
ax NoneType None Axes object to draw the plot onto, otherwise uses the current Axes.
Returns matplotlib Axes Returns the Axes object with the plot drawn onto it.
plot_box(info,value='MW',group='cat',palette='tab20')
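
The scalar whis rule in the table above can be checked by hand. The sketch below is an illustration in plain Python, not the code used by plot_box or seaborn: the helper name whisker_bounds and the sample values are hypothetical, and the quartiles come from statistics.quantiles with method="inclusive", which should agree with linear-interpolation percentiles.

```python
import statistics

def whisker_bounds(values, whis=1.5):
    """Return the (low, high) whisker ends for a scalar `whis` (hypothetical helper)."""
    # Hinges: first and third quartiles of the data
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_limit = q1 - whis * iqr
    hi_limit = q3 + whis * iqr
    # Whiskers stop at the most extreme data points still inside the limits
    low = min(v for v in values if v >= lo_limit)
    high = max(v for v in values if v <= hi_limit)
    return low, high

sample = [1, 2, 3, 4, 5, 6, 7, 8, 30]
low, high = whisker_bounds(sample)          # (1, 8)
# Points beyond the whiskers are the ones drawn as fliers
outliers = [v for v in sample if v < low or v > high]   # [30]
```

Here the quartiles are 3 and 7, so the limits are -3 and 13; the value 30 falls outside and would be drawn as an outlier marker whose size is set by fliersize.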

Correlation graph with regression line


source

plot_corr

 plot_corr (x, y, xlabel=None, ylabel=None, data=None, text_location=[0.8,
            0.1], x_estimator=None, x_bins=None, x_ci='ci', scatter=True,
            fit_reg=True, ci=95, n_boot=1000, units=None, seed=None,
            order=1, logistic=False, lowess=False, robust=False,
            logx=False, x_partial=None, y_partial=None, truncate=True,
            dropna=True, x_jitter=None, y_jitter=None, label=None,
            color=None, marker='o', scatter_kws=None, line_kws=None,
            ax=None)

Given a dataframe and the names of two columns, plot the correlation between the two columns with a regression line

Type Default Details
x x axis values, or colname of x axis
y y axis values, or colname of y axis
xlabel NoneType None x axis label
ylabel NoneType None y axis label
data NoneType None dataframe that contains data
text_location list [0.8, 0.1]
x_estimator NoneType None Apply this function to each unique value of x and plot the
resulting estimate. This is useful when x is a discrete variable.
If x_ci is given, this estimate will be bootstrapped and a
confidence interval will be drawn.
x_bins NoneType None Bin the x variable into discrete bins and then estimate the central
tendency and a confidence interval. This binning only influences how
the scatterplot is drawn; the regression is still fit to the original
data. This parameter is interpreted either as the number of
evenly-sized (not necessarily evenly spaced) bins or the positions of the bin
centers. When this parameter is used, it implies that the default of
x_estimator is numpy.mean.
x_ci str ci Size of the confidence interval used when plotting a central tendency
for discrete values of x. If "ci", defer to the value of the
ci parameter. If "sd", skip bootstrapping and show the
standard deviation of the observations in each bin.
scatter bool True If True, draw a scatterplot with the underlying observations (or
the x_estimator values).
fit_reg bool True If True, estimate and plot a regression model relating the x
and y variables.
ci int 95 Size of the confidence interval for the regression estimate. This will
be drawn using translucent bands around the regression line. The
confidence interval is estimated using a bootstrap; for large
datasets, it may be advisable to avoid that computation by setting
this parameter to None.
n_boot int 1000 Number of bootstrap resamples used to estimate the ci. The default
value attempts to balance time and stability; you may want to increase
this value for “final” versions of plots.
units NoneType None If the x and y observations are nested within sampling units,
those can be specified here. This will be taken into account when
computing the confidence intervals by performing a multilevel bootstrap
that resamples both units and observations (within unit). This does not
otherwise influence how the regression is estimated or drawn.
seed NoneType None Seed or random number generator for reproducible bootstrapping.
order int 1 If order is greater than 1, use numpy.polyfit to estimate a
polynomial regression.
logistic bool False If True, assume that y is a binary variable and use
statsmodels to estimate a logistic regression model. Note that this
is substantially more computationally intensive than linear regression,
so you may wish to decrease the number of bootstrap resamples
(n_boot) or set ci to None.
lowess bool False If True, use statsmodels to estimate a nonparametric lowess
model (locally weighted linear regression). Note that confidence
intervals cannot currently be drawn for this kind of model.
robust bool False If True, use statsmodels to estimate a robust regression. This
will de-weight outliers. Note that this is substantially more
computationally intensive than standard linear regression, so you may
wish to decrease the number of bootstrap resamples (n_boot) or set
ci to None.
logx bool False If True, estimate a linear regression of the form y ~ log(x), but
plot the scatterplot and regression model in the input space. Note that
x must be positive for this to work.
x_partial NoneType None
y_partial NoneType None
truncate bool True If True, the regression line is bounded by the data limits. If
False, it extends to the x axis limits.
dropna bool True
x_jitter NoneType None
y_jitter NoneType None
label NoneType None Label to apply to either the scatterplot or regression line (if
scatter is False) for use in a legend.
color NoneType None Color to apply to all plot elements; will be superseded by colors
passed in scatter_kws or line_kws.
marker str o Marker to use for the scatterplot glyphs.
scatter_kws NoneType None
line_kws NoneType None
ax NoneType None Axes object to draw the plot onto, otherwise uses the current Axes.
Returns matplotlib Axes The Axes object containing the plot.
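
The ci and n_boot parameters above refer to a bootstrap confidence interval. As a rough illustration of the idea, the sketch below computes a percentile bootstrap interval for a regression slope by resampling (x, y) pairs with replacement. It is an assumption-laden sketch in plain Python: the helper names ols_slope and bootstrap_slope_ci and the toy data are hypothetical, and seaborn's own implementation may differ in detail.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of y regressed on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def bootstrap_slope_ci(xs, ys, ci=95, n_boot=1000, seed=None):
    """Percentile bootstrap interval for the slope (hypothetical helper)."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    slopes = []
    while len(slopes) < n_boot:
        # Resample observations with replacement, keeping x and y paired
        sample = [rng.choice(pairs) for _ in pairs]
        bx, by = zip(*sample)
        if len(set(bx)) < 2:
            continue  # slope is undefined when all x values coincide
        slopes.append(ols_slope(bx, by))
    slopes.sort()
    alpha = (100 - ci) / 200
    lo = slopes[int(alpha * (n_boot - 1))]
    hi = slopes[int((1 - alpha) * (n_boot - 1))]
    return lo, hi

# Toy data: a line with slope 2 plus a small alternating offset
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
lo, hi = bootstrap_slope_ci(xs, ys, ci=95, n_boot=1000, seed=0)
```

Raising n_boot makes the interval ends more stable at the cost of more refits, which is the trade-off the n_boot description refers to.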
norm = Data.get_pspa_st_norm().iloc[:,:-6].T

norm.head()
kinase AAK1 ACVR2A ACVR2B AKT1 AKT2 AKT3 ALK2 ALK4 ALPHAK3 AMPKA1 AMPKA2 ANKRD3 ASK1 ATM ATR AURA AURB AURC BCKDK BIKE BMPR1A BMPR1B BMPR2 BRAF BRSK1 BRSK2 BUB1 CAMK1A CAMK1B CAMK1D CAMK1G CAMK2A CAMK2B CAMK2D CAMK2G CAMK4 CAMKK1 CAMKK2 CAMLCK CDC7 CDK1 CDK10 CDK12 CDK13 CDK14 CDK16 CDK17 CDK18 CDK19 CDK2 CDK3 CDK4 CDK5 CDK6 CDK7 CDK8 CDK9 CDKL1 CDKL5 CHAK1 CHAK2 CHK1 CHK2 CK1A CK1A2 CK1D CK1E CK1G1 CK1G2 CK1G3 CK2A1 CK2A2 CLK1 CLK2 CLK3 CLK4 COT CRIK DAPK1 DAPK2 DAPK3 DCAMKL1 DCAMKL2 DLK DMPK1 DNAPK DRAK1 DSTYK DYRK1A DYRK1B DYRK2 DYRK3 DYRK4 EEF2K ERK1 ERK2 ERK5 ERK7 FAM20C GAK GCK GCN2 GRK1 GRK2 GRK3 GRK4 GRK5 GRK6 GRK7 GSK3A GSK3B HASPIN HGK HIPK1 HIPK2 HIPK3 HIPK4 HPK1 HRI HUNK ICK IKKA IKKB IKKE IRAK1 IRAK4 IRE1 IRE2 JNK1 JNK2 JNK3 KHS1 KHS2 KIS LATS1 LATS2 LKB1 LOK LRRK2 MAK MAP3K15 MAPKAPK2 MAPKAPK3 MAPKAPK5 MARK1 MARK2 MARK3 MARK4 MASTL MEK1 MEK2 MEK5 MEKK1 MEKK2 MEKK3 MEKK6 MELK MINK MLK1 MLK2 MLK3 MLK4 MNK1 MNK2 MOK MOS MPSK1 MRCKA MRCKB MSK1 MSK2 MST1 MST2 MST3 MST4 MTOR MYLK4 MYO3A MYO3B NDR1 NDR2 NEK1 NEK11 NEK2 NEK3 NEK4 NEK5 NEK6 NEK7 NEK8 NEK9 NIK NIM1 NLK NUAK1 NUAK2 OSR1 P38A P38B P38D P38G P70S6K P70S6KB P90RSK PAK1 PAK2 PAK3 PAK4 PAK5 PAK6 PASK PBK PDHK1 PDHK4 PDK1 PERK PHKG1 PHKG2 PIM1 PIM2 PIM3 PINK1 PKACA PKACB PKACG PKCA PKCB PKCD PKCE PKCG PKCH PKCI PKCT PKCZ PKG1 PKG2 PKN1 PKN2 PKN3 PKR PLK1 PLK2 PLK3 PLK4 PRKD1 PRKD2 PRKD3 PRKX PRP4 PRPK QIK QSK RAF1 RIPK1 RIPK2 RIPK3 ROCK1 ROCK2 RSK2 RSK3 RSK4 SBK SGK1 SGK3 SIK SKMLCK SLK SMG1 SMMLCK SNRK SRPK1 SRPK2 SRPK3 SSTK STK33 STLK3 TAK1 TAO1 TAO2 TAO3 TBK1 TGFBR1 TGFBR2 TLK1 TLK2 TNIK TSSK1 TSSK2 TTBK1 TTBK2 TTK ULK1 ULK2 VRK1 VRK2 WNK1 WNK3 WNK4 YANK2 YANK3 YSK1 YSK4 ZAK
-5P 0.0720 0.0415 0.0533 0.0603 0.0602 0.0705 0.0536 0.0552 0.0571 0.0555 0.0567 0.0542 0.0830 0.0461 0.0535 0.0434 0.0579 0.0734 0.0482 0.0664 0.0411 0.0644 0.0558 0.0676 0.0552 0.0561 0.0899 0.0908 0.0585 0.0699 0.0549 0.0737 0.0618 0.0659 0.0508 0.0487 0.0711 0.0756 0.0654 0.0537 0.0684 0.0570 0.0753 0.0689 0.0563 0.0534 0.0626 0.0662 0.0454 0.0648 0.0886 0.0673 0.0854 0.0728 0.0627 0.0527 0.0597 0.0540 0.0535 0.0649 0.0532 0.0288 0.0608 0.0843 0.0514 0.0600 0.0488 0.0512 0.0438 0.0387 0.0442 0.0493 0.0494 0.0574 0.0558 0.0535 0.0773 0.0371 0.0444 0.0632 0.0564 0.0685 0.0557 0.0585 0.0513 0.0555 0.0584 0.0539 0.0681 0.0581 0.0582 0.0529 0.0601 0.0603 0.0655 0.0556 0.0608 0.0699 0.0496 0.0410 0.0661 0.0485 0.0770 0.0527 0.0463 0.0525 0.0602 0.0508 0.0683 0.0768 0.0644 0.0775 0.0716 0.0712 0.0877 0.0688 0.0675 0.0641 0.0465 0.0540 0.0611 0.0577 0.0572 0.0564 0.0409 0.0718 0.0535 0.0538 0.0818 0.0813 0.0784 0.0809 0.0654 0.0560 0.0490 0.0401 0.1095 0.0639 0.0582 0.1117 0.0705 0.0835 0.0677 0.0440 0.0456 0.0446 0.0567 0.0525 0.0628 0.0550 0.0654 0.0526 0.0629 0.0584 0.0438 0.0634 0.0605 0.0621 0.0529 0.0753 0.0825 0.0633 0.0738 0.0878 0.0653 0.0745 0.0907 0.0392 0.0515 0.0543 0.0522 0.0782 0.0723 0.0704 0.0465 0.0867 0.0496 0.0640 0.0749 0.0440 0.0412 0.0825 0.0545 0.0613 0.0483 0.0719 0.0819 0.0641 0.0521 0.0656 0.0583 0.0620 0.0501 0.0602 0.0428 0.0432 0.0821 0.0746 0.0804 0.0566 0.0745 0.0379 0.0490 0.0584 0.0683 0.0545 0.0640 0.0603 0.0529 0.0525 0.0388 0.0538 0.0451 0.0452 0.0671 0.0492 0.0596 0.0415 0.0565 0.0465 0.0588 0.0516 0.0528 0.0712 0.0506 0.0720 0.0719 0.0687 0.0542 0.0551 0.0469 0.0465 0.0599 0.0658 0.0471 0.0629 0.0542 0.0582 0.0562 0.0547 0.0551 0.0562 0.0611 0.0545 0.0790 0.0795 0.0634 0.0477 0.0691 0.0582 0.0451 0.0561 0.0606 0.0490 0.0459 0.0526 0.0547 0.0610 0.0560 0.0509 0.0444 0.0676 0.0519 0.0583 0.0489 0.0955 0.0513 0.0590 0.0425 0.0475 0.0594 0.0446 0.0435 0.0856 0.0657 0.0731 0.0624 0.0599 0.0570 0.0579 0.0577 0.0607 0.0559 0.0528 0.0591 
0.0832 0.0739 0.0791 0.0412 0.0577 0.0816 0.0477 0.0593 0.0710 0.0684 0.0482 0.0413 0.0369 0.0580 0.0625 0.0590 0.0593 0.0604
-5G 0.0245 0.0481 0.0517 0.0594 0.0617 0.0624 0.0659 0.0574 0.0478 0.0504 0.0479 0.0555 0.0753 0.0581 0.0596 0.0694 0.0728 0.0956 0.0672 0.0333 0.0547 0.0706 0.0621 0.0583 0.0565 0.0567 0.0222 0.0236 0.0490 0.0332 0.0370 0.0446 0.0492 0.0486 0.0571 0.0371 0.0779 0.0758 0.0578 0.0550 0.0823 0.0619 0.0638 0.0578 0.0662 0.0502 0.0593 0.0679 0.0601 0.0485 0.0804 0.0755 0.0795 0.0724 0.0660 0.0519 0.0596 0.0590 0.0727 0.0823 0.0844 0.0195 0.0302 0.0590 0.0528 0.0579 0.0663 0.0614 0.0465 0.0506 0.0577 0.0601 0.0516 0.0725 0.0546 0.0603 0.0827 0.0293 0.0603 0.0634 0.0608 0.0627 0.0447 0.0659 0.0539 0.0665 0.0653 0.0772 0.0647 0.0677 0.0682 0.0551 0.0648 0.0627 0.0733 0.0598 0.0575 0.0641 0.0620 0.0489 0.0743 0.0622 0.1220 0.0493 0.0529 0.0599 0.0684 0.0548 0.0791 0.0585 0.0709 0.0522 0.0745 0.0583 0.0643 0.0508 0.0741 0.0675 0.0527 0.0806 0.0605 0.0824 0.0721 0.0709 0.0596 0.0758 0.0736 0.0593 0.0623 0.0677 0.0674 0.0834 0.0774 0.0585 0.0640 0.0416 0.0754 0.0754 0.0584 0.0881 0.0731 0.0216 0.0267 0.0310 0.0533 0.0575 0.0607 0.0678 0.0752 0.0480 0.0628 0.0567 0.0982 0.0718 0.0539 0.0677 0.0520 0.0708 0.0664 0.0818 0.0816 0.0731 0.0556 0.0615 0.0612 0.0714 0.0561 0.0370 0.0613 0.0695 0.0623 0.0535 0.0582 0.0705 0.0608 0.0629 0.0432 0.0657 0.0814 0.0518 0.0571 0.0626 0.0800 0.0670 0.0736 0.0609 0.0856 0.0798 0.0572 0.0618 0.0765 0.0540 0.0658 0.0578 0.0358 0.0355 0.0824 0.0692 0.0717 0.0781 0.0787 0.0358 0.0565 0.0575 0.0845 0.0707 0.0730 0.0759 0.0670 0.0700 0.0477 0.0590 0.0697 0.0645 0.0711 0.0579 0.0551 0.0452 0.0565 0.0464 0.0738 0.0474 0.0697 0.0834 0.0641 0.0901 0.0852 0.0696 0.0616 0.0722 0.0562 0.0598 0.0710 0.0896 0.0621 0.0711 0.0464 0.0610 0.0572 0.0712 0.0713 0.0738 0.0681 0.0839 0.0416 0.0448 0.0331 0.0823 0.0583 0.0574 0.0524 0.0681 0.0586 0.0694 0.0525 0.0663 0.0626 0.0701 0.0596 0.0646 0.0548 0.0225 0.0557 0.0697 0.0434 0.0884 0.0624 0.0751 0.0434 0.0563 0.0753 0.0660 0.0618 0.0319 0.0841 0.0629 0.0609 0.0695 0.0664 0.0686 0.0701 0.0647 0.0652 0.0408 0.0703 
0.0772 0.0374 0.0258 0.0516 0.0752 0.0740 0.0693 0.0724 0.0786 0.0676 0.0510 0.0572 0.0523 0.0699 0.0776 0.0713 0.0728 0.0641
-5A 0.0284 0.0584 0.0566 0.0552 0.0643 0.0745 0.0662 0.0605 0.0253 0.0534 0.0523 0.0611 0.0595 0.0646 0.0571 0.0637 0.0633 0.0857 0.0598 0.0376 0.0578 0.0787 0.0638 0.0623 0.0616 0.0593 0.0249 0.0204 0.0504 0.0313 0.0369 0.0542 0.0519 0.0568 0.0588 0.0351 0.0781 0.0771 0.0579 0.0740 0.0613 0.0497 0.0665 0.0667 0.0506 0.0606 0.0629 0.0613 0.0532 0.0609 0.0653 0.0600 0.0719 0.0613 0.0657 0.0506 0.0664 0.0567 0.0660 0.0686 0.0761 0.0415 0.0101 0.0664 0.0542 0.0553 0.0634 0.0715 0.0497 0.0593 0.0642 0.0594 0.0491 0.0673 0.0625 0.0544 0.0748 0.0264 0.0536 0.0605 0.0561 0.0642 0.0480 0.0529 0.0391 0.0596 0.0576 0.0747 0.0566 0.0530 0.0638 0.0497 0.0585 0.0635 0.0622 0.0538 0.0668 0.0633 0.0669 0.0561 0.0557 0.0631 0.0682 0.0538 0.0605 0.0589 0.0622 0.0592 0.0512 0.0672 0.0676 0.0492 0.0770 0.0532 0.0636 0.0510 0.0569 0.0631 0.0576 0.0745 0.0519 0.0758 0.0718 0.0742 0.0579 0.0679 0.0631 0.0570 0.0692 0.0669 0.0689 0.0767 0.0629 0.0577 0.0471 0.0362 0.0716 0.0678 0.0594 0.0684 0.0582 0.0444 0.0387 0.0459 0.0572 0.0619 0.0685 0.0777 0.0718 0.0608 0.0637 0.0544 0.0609 0.0659 0.0563 0.0677 0.0545 0.0756 0.0659 0.0868 0.0713 0.0629 0.0735 0.0723 0.0480 0.0671 0.0649 0.0456 0.0542 0.0613 0.0624 0.0567 0.0608 0.0788 0.0721 0.0522 0.0492 0.0669 0.0674 0.0488 0.0559 0.0569 0.0602 0.0767 0.0621 0.0578 0.0712 0.0680 0.0631 0.0571 0.0702 0.0572 0.0841 0.0587 0.0491 0.0473 0.0721 0.0554 0.0659 0.0767 0.0631 0.0317 0.0457 0.0539 0.0817 0.0617 0.0780 0.0764 0.0628 0.0712 0.0445 0.0759 0.0594 0.0665 0.0660 0.0631 0.0590 0.0547 0.0561 0.0409 0.0681 0.0612 0.0638 0.0694 0.0642 0.0640 0.0796 0.0613 0.0680 0.0684 0.0702 0.0591 0.0666 0.0778 0.0652 0.0732 0.0509 0.0567 0.0541 0.0665 0.0681 0.0573 0.0628 0.0741 0.0958 0.0742 0.0582 0.0633 0.0552 0.0619 0.0572 0.0778 0.0620 0.0494 0.0591 0.0681 0.0561 0.0589 0.0565 0.0662 0.0514 0.0439 0.0562 0.0628 0.0592 0.0804 0.0629 0.0694 0.0414 0.0629 0.0889 0.0596 0.0556 0.1256 0.0690 0.0667 0.0615 0.0580 0.0607 0.0682 0.0624 0.0657 0.0589 0.0498 0.0718 
0.0802 0.0616 0.0706 0.0503 0.0605 0.0570 0.0678 0.0812 0.0633 0.0636 0.0555 0.0503 0.0539 0.0637 0.0647 0.0731 0.0744 0.0659
-5C 0.0456 0.0489 0.0772 0.0605 0.0582 0.0628 0.0762 0.0483 0.0384 0.0588 0.0588 0.0521 0.0604 0.0716 0.0582 0.0608 0.0655 0.0759 0.0694 0.0560 0.0581 0.0556 0.0716 0.0516 0.0935 0.0789 0.0470 0.0364 0.0546 0.0478 0.0569 0.0722 0.0588 0.0627 0.0600 0.0584 0.0774 0.0772 0.0588 0.0683 0.0675 0.0588 0.0626 0.0623 0.0607 0.0584 0.0671 0.0636 0.0606 0.0726 0.0665 0.0588 0.0643 0.0632 0.0695 0.0650 0.0671 0.0622 0.0668 0.0995 0.0626 0.0588 0.0526 0.0588 0.0535 0.0542 0.0495 0.0567 0.0582 0.0604 0.0579 0.0588 0.0754 0.0709 0.0669 0.0645 0.0651 0.0355 0.0551 0.0588 0.0553 0.0588 0.0594 0.0639 0.0540 0.0554 0.0621 0.0700 0.0478 0.1766 0.0708 0.0543 0.0614 0.0586 0.0665 0.0588 0.0584 0.0767 0.0649 0.0578 0.0702 0.0630 0.0618 0.0588 0.0619 0.0588 0.0638 0.0598 0.0718 0.0805 0.0716 0.0495 0.0725 0.0591 0.0656 0.0453 0.0663 0.0644 0.0590 0.0625 0.0588 0.0786 0.0661 0.0657 0.0807 0.0647 0.0588 0.0585 0.0588 0.0588 0.0588 0.0716 0.0722 0.0494 0.0724 0.0684 0.0550 0.0732 0.0610 0.0604 0.0551 0.0525 0.0431 0.0439 0.0815 0.0806 0.0722 0.0689 0.0635 0.0588 0.0552 0.0554 0.0645 0.0613 0.0553 0.0531 0.0562 0.0732 0.0704 0.0674 0.0639 0.0509 0.0852 0.0713 0.0476 0.0679 0.0670 0.0559 0.0479 0.0612 0.0732 0.0804 0.0940 0.0588 0.0699 0.0785 0.0588 0.0766 0.0624 0.0560 0.0596 0.0503 0.0860 0.0666 0.0716 0.0486 0.0809 0.0651 0.0773 0.0588 0.0651 0.0587 0.0683 0.0612 0.0592 0.0544 0.0658 0.0588 0.0567 0.0592 0.0607 0.0379 0.0490 0.0550 0.0630 0.0595 0.0621 0.0689 0.0582 0.0665 0.0435 0.0660 0.0625 0.0672 0.0732 0.0622 0.0588 0.0525 0.0638 0.0420 0.0640 0.0570 0.0552 0.0654 0.0593 0.0751 0.0758 0.0758 0.0751 0.0806 0.0749 0.0627 0.0779 0.0802 0.0721 0.0623 0.0609 0.0780 0.0730 0.0639 0.0639 0.0592 0.0615 0.0597 0.0588 0.0470 0.0452 0.0679 0.0686 0.0545 0.0660 0.0859 0.0588 0.0843 0.0716 0.0789 0.0627 0.0588 0.0489 0.0675 0.0588 0.0505 0.0588 0.0550 0.0588 0.0610 0.0972 0.0714 0.0506 0.0746 0.0814 0.0694 0.0622 0.0761 0.0751 0.0691 0.0588 0.0728 0.0663 0.0735 0.0658 0.0633 0.0653 0.0493 0.0760 
0.0698 0.0683 0.0638 0.0460 0.0588 0.0588 0.0718 0.0682 0.0641 0.0644 0.0576 0.0732 0.0544 0.0602 0.0598 0.0606 0.0734 0.0631
-5S 0.0425 0.0578 0.0533 0.0516 0.0534 0.0442 0.0567 0.0574 0.0571 0.0504 0.0503 0.0554 0.0586 0.0581 0.0571 0.0567 0.0536 0.0473 0.0566 0.0507 0.0534 0.0542 0.0571 0.0594 0.0555 0.0593 0.0286 0.0544 0.0563 0.0565 0.0517 0.0542 0.0603 0.0568 0.0583 0.0577 0.0613 0.0601 0.0579 0.0561 0.0550 0.0536 0.0552 0.0549 0.0563 0.0530 0.0593 0.0541 0.0527 0.0609 0.0584 0.0539 0.0596 0.0582 0.0597 0.0522 0.0596 0.0567 0.0573 0.0608 0.0588 0.0389 0.0538 0.0590 0.0546 0.0590 0.0579 0.0588 0.0497 0.0536 0.0574 0.0580 0.0524 0.0547 0.0546 0.0535 0.0635 0.0371 0.0536 0.0582 0.0545 0.0597 0.0557 0.0597 0.0493 0.0604 0.0557 0.0550 0.0566 0.0530 0.0526 0.0529 0.0542 0.0602 0.0573 0.0602 0.0593 0.0611 0.0564 0.0591 0.0595 0.0612 0.0470 0.0548 0.0567 0.0561 0.0602 0.0581 0.0504 0.0578 0.0589 0.0522 0.0578 0.0524 0.0584 0.0560 0.0569 0.0627 0.0589 0.0557 0.0588 0.0466 0.0545 0.0567 0.0592 0.0526 0.0566 0.0538 0.0573 0.0557 0.0569 0.0573 0.0591 0.0577 0.0490 0.0395 0.0552 0.0497 0.0584 0.0508 0.0603 0.0578 0.0662 0.0459 0.0533 0.0557 0.0567 0.0572 0.0561 0.0555 0.0592 0.0528 0.0609 0.0584 0.0594 0.0605 0.0531 0.0576 0.0563 0.0560 0.0584 0.0595 0.0570 0.0574 0.0539 0.0561 0.0565 0.0454 0.0478 0.0543 0.0522 0.0603 0.0606 0.0530 0.0570 0.0596 0.0537 0.0633 0.0584 0.0433 0.0436 0.0569 0.0602 0.0588 0.0601 0.0578 0.0590 0.0518 0.0552 0.0600 0.0594 0.0556 0.0511 0.0594 0.0472 0.0473 0.0574 0.0589 0.0555 0.0566 0.0604 0.0388 0.0457 0.0499 0.0462 0.0505 0.0508 0.0565 0.0558 0.0520 0.0477 0.0490 0.0594 0.0622 0.0572 0.0586 0.0582 0.0564 0.0481 0.0409 0.0545 0.0523 0.0526 0.0454 0.0503 0.0520 0.0589 0.0548 0.0550 0.0522 0.0516 0.0557 0.0529 0.0634 0.0538 0.0520 0.0545 0.0556 0.0570 0.0576 0.0597 0.0562 0.0628 0.0540 0.0419 0.0463 0.0510 0.0477 0.0581 0.0566 0.0569 0.0561 0.0599 0.0638 0.0579 0.0580 0.0458 0.0446 0.0492 0.0507 0.0410 0.0439 0.0433 0.0489 0.0526 0.0547 0.0589 0.0567 0.0519 0.0563 0.0525 0.0491 0.0541 0.0531 0.0511 0.0565 0.0585 0.0578 0.0574 0.0579 0.0605 0.0576 0.0559 0.0551 0.0526 
0.0543 0.0460 0.0418 0.0516 0.0577 0.0579 0.0598 0.0603 0.0595 0.0573 0.0561 0.0569 0.0580 0.0580 0.0545 0.0542 0.0597 0.0597
plot_corr(data=norm, x='AAK1', y='BIKE')
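
For reference, the quantities behind a correlation plot with a regression line can be computed by hand. The sketch below assumes Pearson's r and an ordinary least-squares line; the source does not state which correlation statistic plot_corr reports, and the helper names pearson_r and fit_line and the toy data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fit_line(xs, ys):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)                 # about 0.775
slope, intercept = fit_line(x, y)   # about 0.6 and 2.2
```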

Correlation with heatmap


source

draw_corr

 draw_corr (corr)

Plot a heatmap from a correlation matrix, such as the output of df.corr()

rdkit_corr = aa_rdkit.T.corr()
rdkit_corr.head()
aa A C D E F G H I K L M N P Q R S T V W Y s t y Kac Kme3
aa
A 1.000000 0.396026 0.047421 -0.175815 -0.170865 0.749792 -0.002553 0.116027 -0.054081 0.242662 0.188048 0.047783 0.216545 -0.115347 -0.244142 0.507632 0.383204 0.286786 -0.319528 -0.274776 -0.252319 -0.356094 -0.510190 -0.368160 -0.320212
C 0.396026 1.000000 -0.058477 -0.133039 -0.225357 0.417041 -0.014784 -0.054076 -0.204713 -0.080324 0.435169 0.003458 0.098103 -0.110953 -0.144396 0.298194 0.085377 0.036676 -0.224113 -0.270975 -0.020420 -0.158218 -0.322427 -0.309861 -0.187490
D 0.047421 -0.058477 1.000000 0.817318 -0.312307 0.038680 -0.220570 -0.346854 -0.270019 -0.325001 -0.305533 0.422209 -0.296263 0.284730 -0.045075 0.291661 0.259357 -0.180636 -0.332358 -0.187248 0.400617 0.333868 -0.012449 -0.105319 -0.230464
E -0.175815 -0.133039 0.817318 1.000000 -0.285335 -0.067490 -0.224050 -0.218968 -0.207841 -0.284995 -0.321922 0.308917 -0.241104 0.406776 0.019323 0.023854 0.009260 -0.134792 -0.255025 -0.049448 0.240428 0.203028 0.107233 0.144592 -0.199208
F -0.170865 -0.225357 -0.312307 -0.285335 1.000000 -0.139753 0.040292 0.218310 -0.055113 0.216716 -0.021011 -0.328251 0.081243 -0.283719 -0.293145 -0.363187 -0.319373 0.029431 0.670357 0.452073 -0.385055 -0.308111 0.224323 -0.112850 0.005131
draw_corr(rdkit_corr)

AUCDF


source

get_AUCDF

 get_AUCDF (df, col, reverse=False, plot=True, xlabel='Rank of reported
            kinase')

Plot CDF curve and get relative area under the curve

get_AUCDF(sorted_df,'values')

0.8754977337946649
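
One way to read "relative area under the curve" is the area under the empirical CDF of the ranks, divided by the largest area the curve could cover. The sketch below implements that reading in plain Python. It is an assumption about the definition, not the code of get_AUCDF, so it is not expected to reproduce the value above; the name relative_aucdf and the toy ranks are hypothetical.

```python
def relative_aucdf(ranks, n_total):
    """Area under the empirical CDF of `ranks` on 1..n_total, as a fraction of the maximum."""
    n = len(ranks)
    # cdf[k - 1] is the fraction of observations with rank <= k
    cdf = [sum(r <= k for r in ranks) / n for k in range(1, n_total + 1)]
    # Unit-width rectangles: the mean CDF height is the area divided by the axis width
    return sum(cdf) / n_total

ranks = [1, 1, 2, 5]   # toy ranks out of 5 candidates
score = relative_aucdf(ranks, n_total=5)   # 0.75
```

Under this reading, a score of 1 means every observation is ranked first, and lower scores mean the ranks are spread further down the list.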

Confusion matrix


source

plot_confusion_matrix

 plot_confusion_matrix (target, pred, class_names:list=['0', '1'],
                        normalize=False, title='Confusion matrix',
                        cmap=<matplotlib.colors.LinearSegmentedColormap
                        object at 0x7fce0fa98220>)

Plot the confusion matrix.

Type Default Details
target pd.Series
pred pd.Series
class_names list [‘0’, ‘1’]
normalize bool False
title str Confusion matrix
cmap LinearSegmentedColormap <matplotlib.colors.LinearSegmentedColormap object at 0x7fce0fa98220>
target = info.MW<160
pred = info.pKa1>2.1
plot_confusion_matrix(target,pred,normalize=True)
Normalized confusion matrix
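
As a sketch of what normalize=True typically means, the example below counts a confusion matrix from two boolean sequences and divides each row by its total, so every row shows the fraction of each true class assigned to each predicted class. Row-wise normalization is an assumption here, since the source does not show how plot_confusion_matrix normalizes; the helper names and toy data are hypothetical.

```python
def confusion_counts(target, pred, labels=(False, True)):
    """Count matrix with rows as true labels and columns as predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(target, pred):
        counts[index[t]][index[p]] += 1
    return counts

def normalize_rows(counts):
    """Divide each row by its sum; rows with no observations stay at zero."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

target = [True, True, False, False, False]
pred = [True, False, False, False, True]
counts = confusion_counts(target, pred)   # [[2, 1], [1, 1]]
normalized = normalize_rows(counts)       # rows sum to 1
```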