Feature

A collection of tools to extract features from SMILES, proteins, etc.

Setup

Features from SMILES

RDKit descriptors


get_rdkit

 get_rdkit (df:pandas.core.frame.DataFrame, col:str='SMILES',
            normalize:bool=True)

Extract chemical features from SMILES via rdkit.Chem.Descriptors; if normalize is True, standardize them with StandardScaler

Parameter  Type       Default  Details
df         DataFrame           a dataframe that contains SMILES
col        str        SMILES   name of the column that holds the SMILES strings
normalize  bool       True     standardize features with StandardScaler()
aa = Data.get_aa_info()
aa.head()
Name SMILES MW pKa1 pKb2 pKx3 pl4 H VSC P1 P2 SASA NCISC phospho
aa
A Alanine C[C@@H](C(=O)O)N 89.10 2.34 9.69 NaN 6.00 0.62 27.5 8.1 0.046 1.181 0.007187 0
C Cysteine C([C@@H](C(=O)O)N)S 121.16 1.96 10.28 8.18 5.07 0.29 44.6 5.5 0.128 1.461 -0.036610 0
D Aspartic acid C([C@@H](C(=O)O)N)C(=O)O 133.11 1.88 9.60 3.65 2.77 -0.90 40.0 13.0 0.105 1.587 -0.023820 0
E Glutamic acid C(CC(=O)O)[C@@H](C(=O)O)N 147.13 2.19 9.67 4.25 3.22 -0.74 62.0 12.3 0.151 1.862 0.006802 0
F Phenylalanine c1ccc(cc1)C[C@@H](C(=O)O)N 165.19 1.83 9.13 NaN 5.48 1.19 115.5 5.2 0.290 2.228 0.037552 0
aa_rdkit = get_rdkit(aa, 'SMILES')
aa_rdkit.head()
MaxAbsEStateIndex MaxEStateIndex MinAbsEStateIndex MinEStateIndex qed SPS MolWt HeavyAtomMolWt ExactMolWt NumValenceElectrons ... fr_sulfide fr_sulfonamd fr_sulfone fr_term_acetylene fr_tetrazole fr_thiazole fr_thiocyan fr_thiophene fr_unbrch_alkane fr_urea
aa
A -1.653421 -1.653421 1.218945 0.407753 -0.383393 -0.070345 -1.523488 -1.498443 -1.523163 -1.551232 ... -0.204124 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.408248 0.0
C -1.058215 -1.058215 -0.588000 0.372307 -0.641865 -0.138884 -0.727067 -0.666442 -0.728722 -1.137202 ... -0.204124 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.408248 0.0
D -0.764466 -0.764466 0.554854 0.126078 -0.376981 -0.390192 -0.430473 -0.356599 -0.430106 -0.447152 ... -0.204124 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.408248 0.0
E -0.283221 -0.283221 -1.143984 0.235448 -0.051582 -0.406185 -0.082096 -0.044965 -0.081845 -0.033122 ... -0.204124 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.408248 0.0
F 0.972596 0.972596 0.063427 0.410788 1.908107 -0.430173 0.366494 0.371360 0.366059 0.380907 ... -0.204124 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -0.408248 0.0

5 rows × 210 columns
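
As a rough illustration of what normalize=True does, the sketch below standardizes each column to zero mean and unit variance with plain pandas. It is not the get_rdkit implementation; the helper name standardize and the toy values are hypothetical. Constant columns are mapped to 0.0, which is how StandardScaler treats them and is consistent with the all-zero fr_* columns in the output above.

```python
import pandas as pd

def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Z-score each column: subtract the mean, divide by the population std."""
    mean = df.mean()
    std = df.std(ddof=0)        # ddof=0 is the population std that StandardScaler uses
    std = std.replace(0, 1.0)   # constant columns: avoid 0/0 so the result is 0.0
    return (df - mean) / std

# Hypothetical toy descriptor table (values are made up for illustration)
toy = pd.DataFrame(
    {"MolWt": [89.1, 121.2, 133.1, 147.1], "fr_urea": [0.0, 0.0, 0.0, 0.0]},
    index=["A", "C", "D", "E"],
)
scaled = standardize(toy)
print(scaled)
```

Because the scaling statistics depend on the rows passed in, features normalized on different dataframes are not directly comparable.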

Morgan fingerprint


get_morgan

 get_morgan (df:pandas.core.frame.DataFrame, col:str='SMILES', radius=3)

Get 2048-bit Morgan fingerprints (binary features) from SMILES in a dataframe

Parameter  Type       Default  Details
df         DataFrame           a dataframe that contains SMILES
col        str        SMILES   name of the column that holds the SMILES strings
radius     int        3
aa_morgan = get_morgan(aa, 'SMILES')
aa_morgan.head()
morgan_0 morgan_1 morgan_2 morgan_3 morgan_4 morgan_5 morgan_6 morgan_7 morgan_8 morgan_9 ... morgan_2038 morgan_2039 morgan_2040 morgan_2041 morgan_2042 morgan_2043 morgan_2044 morgan_2045 morgan_2046 morgan_2047
aa
A 0 1 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
C 0 1 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
D 0 1 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
E 0 1 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
F 0 1 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 2048 columns

Features from protein sequence

ESM2


get_esm

 get_esm (df:pandas.core.frame.DataFrame, col:str='sequence',
          model_name:str='esm2_t33_650M_UR50D')

Extract ESM2 embeddings from protein sequences in a dataframe

Parameter   Type       Default              Details
df          DataFrame                       a dataframe that contains amino acid sequences
col         str        sequence             name of the column that holds the amino acid sequences
model_name  str        esm2_t33_650M_UR50D  name of the ESM model used for the embeddings

ESM2 models are trained on UniRef sequences. The default model, esm2_t33_650M_UR50D, is trained on UniRef50.

Uncomment the lines below to use:

# # Examples
# df = Data.get_kinase_info().set_index('kinase')
# sample = df[:5]
# esmfeature = get_esm(sample,'sequence')
# esmfeature.head()

ProtT5


get_t5

 get_t5 (df:pandas.core.frame.DataFrame, col:str='sequence')

Extract ProtT5-XL-uniref50 embeddings from protein sequence in a dataframe

ProtT5-XL-UniRef50 is a T5-3B model trained on the UniRef50 dataset.

Uncomment the lines below to use:

# t5feature = get_t5(sample,'sequence')
# t5feature.head()

get_t5_bfd

 get_t5_bfd (df:pandas.core.frame.DataFrame, col:str='sequence')

Extract ProtT5-XL-BFD embeddings from protein sequence in a dataframe

ProtT5-XL-BFD is a T5-3B model trained on the Big Fantastic Database (BFD).

Uncomment the lines below to use:

# t5bfd = get_t5_bfd(sample,'sequence')
# t5bfd.head()

Dimensionality reduction


reduce_feature

 reduce_feature (df:pandas.core.frame.DataFrame, method:str='pca',
                 complexity:int=20, n:int=2, load:str=None, save:str=None,
                 seed:int=123, **kwargs)

Reduce the dimensionality of a dataframe of feature values

Parameter   Type       Default  Details
df          DataFrame
method      str        pca      dimensionality reduction method; case-insensitive
complexity  int        20       unused for PCA; perplexity for TSNE (recommended: 30); n_neighbors for UMAP (recommended: 15)
n           int        2        n_components
load        str        None     path of a previously saved model to load, e.g. model.pkl
save        str        None     path of the pkl file to save the model to, e.g. pca_model.pkl
seed        int        123      seed for random_state
kwargs

A common way to reduce the number of features is dimensionality reduction. reduce_feature supports three methods: PCA, UMAP, and TSNE. PCA is a linear transformation, whereas UMAP and TSNE are nonlinear. For plotting, UMAP or TSNE with n (n_components) set to 2 gives a 2D embedding; for building model features, PCA with n set to a moderate value such as 64 or 128 is a good choice.

# Load data
df = Data.get_aa_rdkit()

# Use PCA to reduce dimension; reduce the number of features to 20
reduce_feature(df,'pca',n=20).head()
PCA1 PCA2 PCA3 PCA4 PCA5 PCA6 PCA7 PCA8 PCA9 PCA10 PCA11 PCA12 PCA13 PCA14 PCA15 PCA16 PCA17 PCA18 PCA19 PCA20
aa
A -463.014948 -79.180061 -8.957621 -13.455810 1.334975 0.996915 -6.228205 -2.573987 0.637178 1.904173 -1.165818 5.803809 -3.519867 1.620306 -3.686674 -1.070729 -1.044587 -2.245493 -0.023173 2.143434
C -446.251885 -52.851228 1.200874 0.469406 -16.721236 12.310611 5.623647 17.543569 6.290376 -4.818617 0.871101 -1.274344 3.983329 -6.019231 -5.866159 -0.339787 -3.342606 0.934348 -0.335121 -0.045040
D -407.721016 9.532878 10.375789 -21.871983 -3.757091 -2.804468 3.684495 -8.257556 0.885011 4.454468 4.085862 3.059634 3.463971 2.626308 1.260286 -3.054156 -1.823627 4.300302 -2.407938 1.076088
E -355.786380 21.077202 11.870110 -10.861780 4.869825 -3.906521 -2.281413 -2.893303 8.997722 3.828554 2.004998 -1.002484 9.471326 -1.945113 3.684237 2.119809 0.362597 -1.166347 -3.604024 -0.752169
F 69.598210 74.375112 -68.407808 2.572185 6.659703 16.787547 10.585299 -1.588954 -10.532959 -2.643261 -7.012191 4.522682 4.779671 0.908865 -0.218389 1.152286 0.560021 1.736196 0.152801 -0.567469
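
To make the "PCA is a linear transformation" point concrete, here is a minimal NumPy sketch of a PCA projection. It is an illustration under stated assumptions, not the reduce_feature implementation: the helper name pca_project and the random toy data are hypothetical, and the signs of individual components may differ from those returned by other PCA implementations.

```python
import numpy as np
import pandas as pd

def pca_project(df: pd.DataFrame, n: int = 2) -> pd.DataFrame:
    """Project the rows of df onto the first n principal components."""
    X = df.to_numpy(dtype=float)
    Xc = X - X.mean(axis=0)                            # center every feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are the principal axes
    scores = Xc @ Vt[:n].T                             # the linear projection
    cols = [f"PCA{i + 1}" for i in range(n)]
    return pd.DataFrame(scores, index=df.index, columns=cols)

# Hypothetical random feature matrix: 25 samples, 10 features
rng = np.random.default_rng(123)
toy = pd.DataFrame(rng.normal(size=(25, 10)), columns=[f"f{i}" for i in range(10)])
reduced = pca_project(toy, n=3)
print(reduced.shape)  # (25, 3)
```

The components come out ordered by decreasing variance and are mutually uncorrelated, which is why the leading PCA columns are a compact feature set for downstream models.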

remove_hi_corr

 remove_hi_corr (df:pandas.core.frame.DataFrame, thr:float=0.98)

Remove highly correlated features in a dataframe given a Pearson correlation threshold

Parameter  Type       Default  Details
df         DataFrame
thr        float      0.98     Pearson correlation threshold

Besides PCA/UMAP/TSNE, another way to reduce the number of features is to remove those that are highly correlated with others. remove_hi_corr drops highly correlated features based on a threshold on the pairwise Pearson correlation.

# Load data
df = Data.get_aa_rdkit()
df.shape
(25, 106)
remove_hi_corr(df,thr=0.9).shape
(25, 78)
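
One common way to implement such a filter is sketched below. It is not necessarily how remove_hi_corr works internally: the helper name drop_correlated, the use of the absolute correlation, and the choice to drop the later column of each correlated pair are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, thr: float = 0.98) -> pd.DataFrame:
    """Drop every column whose |Pearson r| with an earlier column exceeds thr."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > thr).any()]
    return df.drop(columns=to_drop)

# Hypothetical toy features: "b" is almost a multiple of "a", "c" is unrelated
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 11.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})
result = drop_correlated(toy, thr=0.98)
print(list(result.columns))  # ['a', 'c']
```

Lowering thr removes more columns, as in the thr=0.9 call above.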

preprocess

 preprocess (df:pandas.core.frame.DataFrame, thr:float=0.98)

Remove features with no variance, and highly correlated features based on threshold

This function is similar to remove_hi_corr, but additionally removes zero-variance features (e.g., a feature that equals 1 for every sample).

preprocess(df,thr=0.9).shape
removing columns: {'fr_SH', 'NumAromaticCarbocycles', 'Chi0v', 'NOCount', 'SlogP_VSA5', 'VSA_EState2', 'Chi2n', 'NumHDonors', 'Chi1v', 'VSA_EState10', 'fr_NH2', 'Chi4n', 'Ipc', 'Chi3v', 'Chi1', 'NumHeteroatoms', 'Kappa1', 'Chi2v', 'RingCount', 'Chi3n', 'SMR_VSA1', 'Chi4v', 'VSA_EState6', 'SMR_VSA9', 'fr_Ar_N', 'NumAromaticRings', 'NumRotatableBonds', 'Chi0n'}
(25, 78)
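
The zero-variance step matters because the Pearson correlation of a constant column is undefined, so a correlation filter on its own cannot flag it. The sketch below shows this with pandas; the toy dataframe and the nunique-based check are illustrative assumptions, not the preprocess implementation.

```python
import pandas as pd

# Hypothetical toy features; "const" has the same value for every sample
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "const": [1.0, 1.0, 1.0, 1.0],
    "b": [4.0, 1.0, 3.0, 2.0],
})

# Correlation with a constant column is NaN, so "r > thr" is never True for it
r_a_const = toy.corr().loc["a", "const"]

# Detect constant columns directly by counting distinct values
zero_var = [c for c in toy.columns if toy[c].nunique() <= 1]
filtered = toy.drop(columns=zero_var)
print(zero_var)  # ['const']
```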