Plot

Functions for reducing feature dimensionality and plotting the result in 2D

Setup

Dimensionality reduction

reduce_feature

 reduce_feature (data, method='pca', complexity=20, n=2, seed:int=123,
                 **kwargs)

Reduce the dimensionality of a dataframe (or numpy array) of values

	Type	Default	Details
data			df or numpy array
method	str	pca	dimensionality reduction method; accepts both upper and lower case
complexity	int	20	None for PCA; perplexity for TSNE (recommended: 30); n_neighbors for UMAP (recommended: 15)
n	int	2	n_components
seed	int	123	seed for random_state
kwargs	VAR_KEYWORD
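
The parameter table above can be illustrated with a minimal sketch of how such a dispatcher might look. This is not the library's implementation: `reduce_feature_sketch` is a hypothetical name, it covers only PCA and t-SNE from scikit-learn (UMAP is left out because it needs the separate `umap-learn` package), and the column naming simply mirrors the `PCA1`, `PCA2`, ... output shown on this page. The toy input is random bits, not real fingerprints.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def reduce_feature_sketch(data, method="pca", complexity=20, n=2, seed=123, **kwargs):
    """Hypothetical stand-in for reduce_feature; handles PCA and t-SNE only."""
    method = method.lower()  # accept 'PCA', 'pca', 'TSNE', 'tsne', ...
    if method == "pca":
        # PCA has no perplexity-like knob, so `complexity` is ignored here
        reducer = PCA(n_components=n, random_state=seed, **kwargs)
    elif method == "tsne":
        # For t-SNE, `complexity` is passed through as the perplexity
        reducer = TSNE(n_components=n, perplexity=complexity,
                       random_state=seed, **kwargs)
    else:
        raise ValueError(f"unsupported method in this sketch: {method!r}")

    embedding = reducer.fit_transform(np.asarray(data))

    # Name the columns PCA1, PCA2, ... (or TSNE1, TSNE2, ...)
    columns = [f"{method.upper()}{i + 1}" for i in range(n)]
    index = data.index if isinstance(data, pd.DataFrame) else None
    return pd.DataFrame(embedding, columns=columns, index=index)


# Random binary matrix standing in for fingerprint bits (assumed toy data)
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.integers(0, 2, size=(50, 64)))
reduced = reduce_feature_sketch(toy, "PCA", n=3)
```

Fixing `random_state` from `seed` keeps repeated runs comparable, which matters most for t-SNE, where the embedding depends on the random initialisation.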

# morgan fingerprints
df = pd.read_csv('files/morgan.csv')
df

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27	28	29	30	31	32	33	34	35	36	37	38	39	40	41	42	43	44	45	46	47	48	49	...	1998	1999	2000	2001	2002	2003	2004	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021	2022	2023	2024	2025	2026	2027	2028	2029	2030	2031	2032	2033	2034	2035	2036	2037	2038	2039	2040	2041	2042	2043	2044	2045	2046	2047
0	0	1	1	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	...	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
1	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	...	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
298	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0
299	0	0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

300 rows × 2048 columns

pca = reduce_feature(df,'pca',n=10)
pca

	PCA1	PCA2	PCA3	PCA4	PCA5	PCA6	PCA7	PCA8	PCA9	PCA10
0	5.055364	-0.201475	0.017985	-1.484066	1.548818	0.998801	-1.959704	-1.446077	2.579568	0.791852
1	0.720893	0.104023	3.616964	0.774077	0.262882	-0.813578	0.194586	0.606086	0.337897	1.006187
...	...	...	...	...	...	...	...	...	...	...
298	-0.911653	-0.834387	0.054771	-0.141513	-0.385500	0.036934	0.139089	-0.157255	-0.316494	-0.042620
299	-0.506653	-0.217700	-0.309063	0.005900	-0.275369	-0.652045	-0.151574	0.838589	-0.281150	-0.323430

300 rows × 10 columns

2d plot

set_sns

 set_sns ()

set_sns()

plot_2d

 plot_2d (X:pandas.core.frame.DataFrame, data=None, x=None, y=None,
          hue=None, size=None, style=None, palette=None, hue_order=None,
          hue_norm=None, sizes=None, size_order=None, size_norm=None,
          markers=True, style_order=None, legend='auto', ax=None)

Make a 2D plot from a dataframe whose first column is x and whose second column is y

	Type	Default	Details
X	DataFrame		a dataframe whose first column is x and whose second column is y
data	NoneType	None	Input data structure. Either a long-form collection of vectors that can be assigned to named variables or a wide-form dataset that will be internally reshaped.
x	NoneType	None
y	NoneType	None
hue	NoneType	None	Grouping variable that will produce points with different colors. Can be either categorical or numeric, although color mapping will behave differently in latter case.
size	NoneType	None	Grouping variable that will produce points with different sizes. Can be either categorical or numeric, although size mapping will behave differently in latter case.
style	NoneType	None	Grouping variable that will produce points with different markers. Can have a numeric dtype but will always be treated as categorical.
palette	NoneType	None	Method for choosing the colors to use when mapping the `hue` semantic. String values are passed to `color_palette`. List or dict values imply categorical mapping, while a colormap object implies numeric mapping.
hue_order	NoneType	None	Specify the order of processing and plotting for categorical levels of the `hue` semantic.
hue_norm	NoneType	None	Either a pair of values that set the normalization range in data units or an object that will map from data units into a [0, 1] interval. Usage implies numeric mapping.
sizes	NoneType	None	An object that determines how sizes are chosen when `size` is used. List or dict arguments should provide a size for each unique data value, which forces a categorical interpretation. The argument may also be a min, max tuple.
size_order	NoneType	None	Specified order for appearance of the `size` variable levels, otherwise they are determined from the data. Not relevant when the `size` variable is numeric.
size_norm	NoneType	None	Normalization in data units for scaling plot objects when the `size` variable is numeric.
markers	bool	True	Object determining how to draw the markers for different levels of the `style` variable. Setting to `True` will use default markers, or you can pass a list of markers or a dictionary mapping levels of the `style` variable to markers. Setting to `False` will draw marker-less lines. Markers are specified as in matplotlib.
style_order	NoneType	None	Specified order for appearance of the `style` variable levels otherwise they are determined from the data. Not relevant when the `style` variable is numeric.
legend	str	auto	How to draw the legend. If “brief”, numeric `hue` and `size` variables will be represented with a sample of evenly spaced values. If “full”, every group will get an entry in the legend. If “auto”, choose between brief or full representation based on number of levels. If `False`, no legend data is added and no legend is drawn.
ax	NoneType	None	Pre-existing axes for the plot. Otherwise, call `matplotlib.pyplot.gca` internally.
Returns	`matplotlib.axes.Axes`		The matplotlib axes containing the plot.

plot_2d(pca.iloc[:,:2])
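
As a rough illustration of the "first column is x, second column is y" convention, the sketch below wraps `seaborn.scatterplot`. It is an assumption that the real function works this way: `plot_2d_sketch` is a hypothetical name, the `group` column is invented toy data, and here the `hue` column is taken from the same dataframe for simplicity.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns


def plot_2d_sketch(X, **kwargs):
    """Hypothetical stand-in: scatter the first column against the second."""
    x_name, y_name = X.columns[:2]
    # Extra keyword arguments (hue, size, style, ...) go straight to seaborn
    return sns.scatterplot(data=X, x=x_name, y=y_name, **kwargs)


# Toy 2D embedding with an invented grouping column
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(30, 2)), columns=["PCA1", "PCA2"])
toy["group"] = np.where(toy["PCA1"] > 0, "right", "left")

ax = plot_2d_sketch(toy, hue="group")
```

Because the axes object is returned, titles, limits, or annotations can be added afterwards with the usual matplotlib methods.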

plot_corr

 plot_corr (x, y, xlabel=None, ylabel=None, order=3)

	Type	Default	Details
x			a column of df
y			a column of df
xlabel	NoneType	None	x axis label
ylabel	NoneType	None	y axis label
order	int	3	polynomial order of the fit; use order=1 for a straight line

plot_corr(pca.PCA1,pca.PCA2)
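
The effect of `order` can be sketched with `seaborn.regplot`, which fits a polynomial of the given degree when `order` is greater than 1. This is only an assumed approximation of what `plot_corr` does: `plot_corr_sketch` is a hypothetical name, and showing the Pearson correlation in the title is an illustrative choice, not something this page documents.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns


def plot_corr_sketch(x, y, xlabel=None, ylabel=None, order=3):
    """Hypothetical stand-in: scatter plus a polynomial fit of degree `order`."""
    ax = sns.regplot(x=x, y=y, order=order)
    if xlabel is not None:
        ax.set_xlabel(xlabel)
    if ylabel is not None:
        ax.set_ylabel(ylabel)
    # Assumption: report the Pearson correlation in the title
    r = np.corrcoef(x, y)[0, 1]
    ax.set_title(f"Pearson r = {r:.2f}")
    return ax


# Toy data with a curved relationship, so order=2 fits better than order=1
rng = np.random.default_rng(0)
a = pd.Series(rng.normal(size=40), name="a")
b = pd.Series(a**2 + rng.normal(scale=0.1, size=40), name="b")

ax = plot_corr_sketch(a, b, xlabel="feature a", ylabel="feature b", order=2)
```

With `order=1` the same call draws an ordinary linear regression line.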
