Reactome pathway
source
get_reactome_raw
get_reactome_raw (gene_list)
Reactome pathway analysis for a given gene set; returns raw output in dataframe.
pi3ks= ['PIK3CA' ,'PIK3CB' ,'PIK3CD' ,'PIK3CG' ,'PIK3R1' ,'PIK3R2' ,'PIK3R3' ,'PTEN' ,'AKT1' ,'AKT2' ,'AKT3' ,'MTOR' ,'RICTOR' ,'RPTOR' ,'TSC1' ,'TSC2' ,'PDK1' ,'IRS1' ,'IRS2' ,'INSR' ,'IGF1R' ,'GAB1' ,'HRAS' ,'NRAS' ,'KRAS' ,'EGFR' ,'ERBB2' ,'ERBB3' ,'ERBB4' ]
raw_out = get_reactome_raw(pi3ks)
raw_out.head()
0
R-HSA-1963640
1963640
GRB2 events in ERBB2 signaling
True
False
48887
9606
Homo sapiens
TOTAL
21
9
0.001308
1.110223e-16
1.110223e-15
[]
TOTAL
4
4
0.000257
1
R-HSA-9665348
9665348
Signaling by ERBB2 ECD mutants
True
True
48887
9606
Homo sapiens
TOTAL
23
9
0.001432
1.110223e-16
1.110223e-15
[]
TOTAL
15
15
0.000964
2
R-HSA-9664565
9664565
Signaling by ERBB2 KD Mutants
True
True
48887
9606
Homo sapiens
TOTAL
35
13
0.002179
1.110223e-16
1.110223e-15
[]
TOTAL
17
17
0.001093
3
R-HSA-1227990
1227990
Signaling by ERBB2 in Cancer
False
True
48887
9606
Homo sapiens
TOTAL
36
13
0.002242
1.110223e-16
1.110223e-15
[]
TOTAL
62
62
0.003986
4
R-HSA-9665686
9665686
Signaling by ERBB2 TMD/JMD mutants
True
True
48887
9606
Homo sapiens
TOTAL
30
10
0.001868
1.110223e-16
1.110223e-15
[]
TOTAL
13
13
0.000836
source
get_reactome
get_reactome (gene_list, p_type='FDR')
Reactome pathway analysis for a given gene set; returns formated output in dataframe with additional -log10(p)
gene_list
p_type
str
FDR
or p
format_out = get_reactome(pi3ks,p_type= 'p' )
format_out.head()
Running pathway anlysis
Done
0
GRB2 events in ERBB2 signaling
R-HSA-1963640
1.110223e-16
15.955
1
Signaling by ERBB2 ECD mutants
R-HSA-9665348
1.110223e-16
15.955
2
Signaling by ERBB2 KD Mutants
R-HSA-9664565
1.110223e-16
15.955
3
Signaling by ERBB2 in Cancer
R-HSA-1227990
1.110223e-16
15.955
4
Signaling by ERBB2 TMD/JMD mutants
R-HSA-9665686
1.110223e-16
15.955
format_out = get_reactome(pi3ks,p_type= 'FDR' )
format_out.head()
Running pathway anlysis
Done
0
GRB2 events in ERBB2 signaling
R-HSA-1963640
1.110223e-15
14.955
1
Signaling by ERBB2 ECD mutants
R-HSA-9665348
1.110223e-15
14.955
2
Signaling by ERBB2 KD Mutants
R-HSA-9664565
1.110223e-15
14.955
3
Signaling by ERBB2 in Cancer
R-HSA-1227990
1.110223e-15
14.955
4
Signaling by ERBB2 TMD/JMD mutants
R-HSA-9665686
1.110223e-15
14.955
format_out[format_out.FDR< 0.05 ]
0
GRB2 events in ERBB2 signaling
R-HSA-1963640
1.110223e-15
14.955
1
Signaling by ERBB2 ECD mutants
R-HSA-9665348
1.110223e-15
14.955
2
Signaling by ERBB2 KD Mutants
R-HSA-9664565
1.110223e-15
14.955
3
Signaling by ERBB2 in Cancer
R-HSA-1227990
1.110223e-15
14.955
4
Signaling by ERBB2 TMD/JMD mutants
R-HSA-9665686
1.110223e-15
14.955
...
...
...
...
...
318
Viral Infection Pathways
R-HSA-9824446
3.936417e-02
1.405
319
RHO GTPase cycle
R-HSA-9012999
3.955153e-02
1.403
320
RUNX3 regulates p14-ARF
R-HSA-8951936
4.386315e-02
1.358
321
Cellular response to chemical stress
R-HSA-9711123
4.685451e-02
1.329
322
Signaling by Rho GTPases
R-HSA-194315
4.717396e-02
1.326
323 rows × 4 columns
Reference
Download from Reactome/Download_data: https://reactome.org/download-data
Download UniProt to All pathways
under Identifier mapping files
for type, there are IEA (Inferred from Electronic Annotation) and TAS (Traceable Author Statement, higher confidence)
# all level
ref = Data.get_reactome_pathway()
CPU times: user 5 μs, sys: 0 ns, total: 5 μs
Wall time: 7.39 μs
0
A0A023GPK8
R-DME-1500931
Cell-Cell communication
IEA
Drosophila melanogaster
1
A0A023GPK8
R-DME-373753
Nephrin family interactions
IEA
Drosophila melanogaster
2
A0A023GRW3
R-DME-72163
mRNA Splicing - Major Pathway
IEA
Drosophila melanogaster
3
A0A023GRW3
R-DME-72172
mRNA Splicing
IEA
Drosophila melanogaster
4
A0A023GRW3
R-DME-72203
Processing of Capped Intron-Containing Pre-mRNA
IEA
Drosophila melanogaster
source
query_reactome
query_reactome (uniprot_id, level='lowest')
Query uniprot ID in Reactome all level pathway database.
uniprot_id
level
str
lowest
or all
akt_all = query_reactome('P31749' ,level= 'all' )
akt_lowest = query_reactome('P31749' ,level= 'lowest' )
0
R-HSA-109581
P31749
Apoptosis
IEA
Homo sapiens
1
R-HSA-109582
P31749
Hemostasis
TAS
Homo sapiens
2
R-HSA-109606
P31749
Intrinsic Pathway for Apoptosis
IEA
Homo sapiens
3
R-HSA-111447
P31749
Activation of BAD and translocation to mitocho...
IEA
Homo sapiens
4
R-HSA-114452
P31749
Activation of BH3-only proteins
IEA
Homo sapiens
print (akt_all.shape)
print (akt_lowest.shape)
source
add_react_ref
add_react_ref (react_df, uniprot)
format_out = add_react_ref(format_out,'P31749' )
format_out
0
GRB2 events in ERBB2 signaling
R-HSA-1963640
1.110223e-15
14.955
0
0
1
Signaling by ERBB2 ECD mutants
R-HSA-9665348
1.110223e-15
14.955
0
0
2
Signaling by ERBB2 KD Mutants
R-HSA-9664565
1.110223e-15
14.955
0
0
3
Signaling by ERBB2 in Cancer
R-HSA-1227990
1.110223e-15
14.955
0
0
4
Signaling by ERBB2 TMD/JMD mutants
R-HSA-9665686
1.110223e-15
14.955
0
0
...
...
...
...
...
...
...
397
Metabolism of vitamins and cofactors
R-HSA-196854
6.634597e-01
0.178
1
0
398
Neutrophil degranulation
R-HSA-6798695
7.432593e-01
0.129
0
0
399
Metabolism of RNA
R-HSA-8953854
9.149735e-01
0.039
1
0
400
Post-translational protein modification
R-HSA-597592
9.930721e-01
0.003
0
0
401
Metabolism of proteins
R-HSA-392499
9.941750e-01
0.003
0
0
402 rows × 6 columns
Plot
plot_path
plot_path (react_df, p_type='FDR', ref_id_list=None, ref_col=None,
top_n=10, max_label_length=80)
Plot the output of get_reactome. If ref_df is provided, bars corresponding to pathways in ref_df are shown in dark red.
react_df
the output df of get_reactome
p_type
str
FDR
ref_id_list
NoneType
None
list of reactome_id
ref_col
NoneType
None
column in reac_df, 1 or 0 to indicate whether it’s in ref
top_n
int
10
max_label_length
int
80
plot_path(format_out)
plt.title('PI3K Pathways' );
# All level
plot_path(format_out,p_type= 'FDR' ,ref_id_list= akt_all.reactome_id,top_n= 15 )
plt.title('PI3K Pathways (with highlight as overlap with all level Reactome database)' );
0
GRB2 events in ERBB2 signaling
R-HSA-1963640
1.110223e-15
14.955
0
0
1
Signaling by ERBB2 ECD mutants
R-HSA-9665348
1.110223e-15
14.955
0
0
2
Signaling by ERBB2 KD Mutants
R-HSA-9664565
1.110223e-15
14.955
0
0
3
Signaling by ERBB2 in Cancer
R-HSA-1227990
1.110223e-15
14.955
0
0
4
Signaling by ERBB2 TMD/JMD mutants
R-HSA-9665686
1.110223e-15
14.955
0
0
# All level, use ref_col
plot_path(format_out,p_type= 'FDR' ,ref_col= 'P31749_path_all' ,top_n= 15 )
plt.title('PI3K Pathways (with highlight as overlap with all level Reactome database)' );
# All level
plot_path(format_out,p_type= 'FDR' ,ref_col= 'P31749_path_lowest' ,top_n= 15 )
plt.title('PI3K Pathways (with highlight as overlap with lowest level Reactome database)' );
plot_overlap
plot_overlap (react_df, ref_id_list=None, ref_col=None, p_type='FDR',
thr=0.05)
react_df
ref_id_list
NoneType
None
ref_col
NoneType
None
column in reac_df, 1 or 0 to indicate whether it’s in ref
p_type
str
FDR
thr
float
0.05
original threshold of p value, will be log10 transformed
def plot_overlap(react_df,
ref_id_list= None ,
ref_col= None , # column in react_df, 1 or 0 to indicate whether it's in ref
p_type= 'FDR' ,
thr= 0.05 # original threshold of p value, will be log10 transformed
):
assert p_type in ['p' , 'FDR' ]
p_col = f'-log10_ { p_type} '
p_col_convert = p_col.replace('_' , '(' , 1 ) + ')' # e.g., -log10(FDR)
threshold = - np.log10(thr)
# Subset based on input
if ref_id_list is not None :
subset = react_df[react_df.reactome_id.isin(ref_id_list)].copy()
elif ref_col is not None :
subset = react_df[react_df[ref_col] == 1 ].copy()
else :
raise ValueError ("Need to give values to ref_id_list or ref_col" )
# Plot histogram
subset[p_col].hist(bins= 100 )
# Add threshold line
plt.axvline(x= threshold, color= 'red' , linestyle= '--' , label= f' { p_type} = { thr} ' )
plt.legend()
# Label axes
plt.xlabel(p_col_convert)
plt.title(f'Histogram of { p_col_convert} ' )
# Calculate and print statistics
num_total = len (subset)
num_pass = (subset[p_col] > threshold).sum ()
percent_pass = (num_pass / num_total) * 100
# Optionally show percentage on plot
plt.text(0.66 , 0.85 , f' { percent_pass:.1f} % ( { num_pass} / { num_total} ) pass' ,
transform= plt.gca().transAxes,
ha= 'right' , va= 'top' , fontsize= 10 , color= 'green' )
return float (num_pass / num_total)
plot_overlap(format_out, ref_id_list= akt_all.reactome_id)
0
GRB2 events in ERBB2 signaling
R-HSA-1963640
1.110223e-15
14.955
0
0
1
Signaling by ERBB2 ECD mutants
R-HSA-9665348
1.110223e-15
14.955
0
0
2
Signaling by ERBB2 KD Mutants
R-HSA-9664565
1.110223e-15
14.955
0
0
3
Signaling by ERBB2 in Cancer
R-HSA-1227990
1.110223e-15
14.955
0
0
4
Signaling by ERBB2 TMD/JMD mutants
R-HSA-9665686
1.110223e-15
14.955
0
0
plot_overlap(format_out, ref_col= 'P31749_path_all' )
plot_overlap(format_out, ref_col= 'P31749_path_lowest' )
Statistics
subset_all = format_out[format_out.P31749_path_all== 1 ]
subset_lowest = format_out[format_out.P31749_path_lowest== 1 ]
(subset_all.FDR< 0.05 ).value_counts()/ len (subset_all)
FDR
True 0.803571
False 0.196429
Name: count, dtype: float64
(subset_all.FDR< 0.05 ).value_counts()/ len (subset_all)
FDR
True 0.803571
False 0.196429
Name: count, dtype: float64
subset_all['-log10_FDR' ].mean()
np.float64(5.994508928571428)
subset_lowest['-log10_FDR' ].mean()
np.float64(5.884880952380952)