Pathway analysis

Setup

Reactome pathway


source

get_reactome_raw

 get_reactome_raw (gene_list)

Reactome pathway analysis for a given gene set; returns raw output in dataframe.

pi3ks=['PIK3CA','PIK3CB','PIK3CD','PIK3CG','PIK3R1','PIK3R2','PIK3R3','PTEN','AKT1','AKT2','AKT3','MTOR','RICTOR','RPTOR','TSC1','TSC2','PDK1','IRS1','IRS2','INSR','IGF1R','GAB1','HRAS','NRAS','KRAS','EGFR','ERBB2','ERBB3','ERBB4']
raw_out = get_reactome_raw(pi3ks)
raw_out.head()
stId dbId name llp inDisease species.dbId species.taxId species.name entities.resource entities.total entities.found entities.ratio entities.pValue entities.fdr entities.exp reactions.resource reactions.total reactions.found reactions.ratio
0 R-HSA-1963640 1963640 GRB2 events in ERBB2 signaling True False 48887 9606 Homo sapiens TOTAL 21 9 0.001308 1.110223e-16 1.110223e-15 [] TOTAL 4 4 0.000257
1 R-HSA-9665348 9665348 Signaling by ERBB2 ECD mutants True True 48887 9606 Homo sapiens TOTAL 23 9 0.001432 1.110223e-16 1.110223e-15 [] TOTAL 15 15 0.000964
2 R-HSA-9664565 9664565 Signaling by ERBB2 KD Mutants True True 48887 9606 Homo sapiens TOTAL 35 13 0.002179 1.110223e-16 1.110223e-15 [] TOTAL 17 17 0.001093
3 R-HSA-1227990 1227990 Signaling by ERBB2 in Cancer False True 48887 9606 Homo sapiens TOTAL 36 13 0.002242 1.110223e-16 1.110223e-15 [] TOTAL 62 62 0.003986
4 R-HSA-9665686 9665686 Signaling by ERBB2 TMD/JMD mutants True True 48887 9606 Homo sapiens TOTAL 30 10 0.001868 1.110223e-16 1.110223e-15 [] TOTAL 13 13 0.000836

source

get_reactome

 get_reactome (gene_list, p_type='FDR')

Reactome pathway analysis for a given gene set; returns formated output in dataframe with additional -log10(p)

Type Default Details
gene_list
p_type str FDR or p
format_out = get_reactome(pi3ks,p_type='p')
format_out.head()
Running pathway anlysis
Done
name reactome_id p -log10_p
0 GRB2 events in ERBB2 signaling R-HSA-1963640 1.110223e-16 15.955
1 Signaling by ERBB2 ECD mutants R-HSA-9665348 1.110223e-16 15.955
2 Signaling by ERBB2 KD Mutants R-HSA-9664565 1.110223e-16 15.955
3 Signaling by ERBB2 in Cancer R-HSA-1227990 1.110223e-16 15.955
4 Signaling by ERBB2 TMD/JMD mutants R-HSA-9665686 1.110223e-16 15.955
format_out = get_reactome(pi3ks,p_type='FDR')
format_out.head()
Running pathway anlysis
Done
name reactome_id FDR -log10_FDR
0 GRB2 events in ERBB2 signaling R-HSA-1963640 1.110223e-15 14.955
1 Signaling by ERBB2 ECD mutants R-HSA-9665348 1.110223e-15 14.955
2 Signaling by ERBB2 KD Mutants R-HSA-9664565 1.110223e-15 14.955
3 Signaling by ERBB2 in Cancer R-HSA-1227990 1.110223e-15 14.955
4 Signaling by ERBB2 TMD/JMD mutants R-HSA-9665686 1.110223e-15 14.955
format_out[format_out.FDR<0.05]
name reactome_id FDR -log10_FDR
0 GRB2 events in ERBB2 signaling R-HSA-1963640 1.110223e-15 14.955
1 Signaling by ERBB2 ECD mutants R-HSA-9665348 1.110223e-15 14.955
2 Signaling by ERBB2 KD Mutants R-HSA-9664565 1.110223e-15 14.955
3 Signaling by ERBB2 in Cancer R-HSA-1227990 1.110223e-15 14.955
4 Signaling by ERBB2 TMD/JMD mutants R-HSA-9665686 1.110223e-15 14.955
... ... ... ... ...
318 Viral Infection Pathways R-HSA-9824446 3.936417e-02 1.405
319 RHO GTPase cycle R-HSA-9012999 3.955153e-02 1.403
320 RUNX3 regulates p14-ARF R-HSA-8951936 4.386315e-02 1.358
321 Cellular response to chemical stress R-HSA-9711123 4.685451e-02 1.329
322 Signaling by Rho GTPases R-HSA-194315 4.717396e-02 1.326

323 rows × 4 columns

Reference

Download from Reactome/Download_data: https://reactome.org/download-data

Download UniProt to All pathways under Identifier mapping files

for type, there are IEA (Inferred from Electronic Annotation) and TAS (Traceable Author Statement, higher confidence)

# all level
ref = Data.get_reactome_pathway()
CPU times: user 5 μs, sys: 0 ns, total: 5 μs
Wall time: 7.39 μs
ref.head()
uniprot reactome_id pathway type species
0 A0A023GPK8 R-DME-1500931 Cell-Cell communication IEA Drosophila melanogaster
1 A0A023GPK8 R-DME-373753 Nephrin family interactions IEA Drosophila melanogaster
2 A0A023GRW3 R-DME-72163 mRNA Splicing - Major Pathway IEA Drosophila melanogaster
3 A0A023GRW3 R-DME-72172 mRNA Splicing IEA Drosophila melanogaster
4 A0A023GRW3 R-DME-72203 Processing of Capped Intron-Containing Pre-mRNA IEA Drosophila melanogaster

source

query_reactome

 query_reactome (uniprot_id, level='lowest')

Query uniprot ID in Reactome all level pathway database.

Type Default Details
uniprot_id
level str lowest or all
akt_all = query_reactome('P31749',level='all')
akt_lowest = query_reactome('P31749',level='lowest')
akt_all.head()
reactome_id uniprot pathway type species
0 R-HSA-109581 P31749 Apoptosis IEA Homo sapiens
1 R-HSA-109582 P31749 Hemostasis TAS Homo sapiens
2 R-HSA-109606 P31749 Intrinsic Pathway for Apoptosis IEA Homo sapiens
3 R-HSA-111447 P31749 Activation of BAD and translocation to mitocho... IEA Homo sapiens
4 R-HSA-114452 P31749 Activation of BH3-only proteins IEA Homo sapiens
print(akt_all.shape)
print(akt_lowest.shape)
(112, 5)
(42, 5)

source

add_react_ref

 add_react_ref (react_df, uniprot)
format_out = add_react_ref(format_out,'P31749')
format_out
name reactome_id FDR -log10_FDR P31749_path_all P31749_path_lowest
0 GRB2 events in ERBB2 signaling R-HSA-1963640 1.110223e-15 14.955 0 0
1 Signaling by ERBB2 ECD mutants R-HSA-9665348 1.110223e-15 14.955 0 0
2 Signaling by ERBB2 KD Mutants R-HSA-9664565 1.110223e-15 14.955 0 0
3 Signaling by ERBB2 in Cancer R-HSA-1227990 1.110223e-15 14.955 0 0
4 Signaling by ERBB2 TMD/JMD mutants R-HSA-9665686 1.110223e-15 14.955 0 0
... ... ... ... ... ... ...
397 Metabolism of vitamins and cofactors R-HSA-196854 6.634597e-01 0.178 1 0
398 Neutrophil degranulation R-HSA-6798695 7.432593e-01 0.129 0 0
399 Metabolism of RNA R-HSA-8953854 9.149735e-01 0.039 1 0
400 Post-translational protein modification R-HSA-597592 9.930721e-01 0.003 0 0
401 Metabolism of proteins R-HSA-392499 9.941750e-01 0.003 0 0

402 rows × 6 columns

Plot

Bar plot of pathways


source

plot_path

 plot_path (react_df, p_type='FDR', ref_id_list=None, ref_col=None,
            top_n=10, max_label_length=80)

Plot the output of get_reactome. If ref_df is provided, bars corresponding to pathways in ref_df are shown in dark red.

Type Default Details
react_df the output df of get_reactome
p_type str FDR
ref_id_list NoneType None list of reactome_id
ref_col NoneType None column in reac_df, 1 or 0 to indicate whether it’s in ref
top_n int 10
max_label_length int 80
plot_path(format_out)
plt.title('PI3K Pathways');

# All level
plot_path(format_out,p_type='FDR',ref_id_list=akt_all.reactome_id,top_n=15)
plt.title('PI3K Pathways (with highlight as overlap with all level Reactome database)');

format_out.head()
name reactome_id FDR -log10_FDR P31749_path_all P31749_path_lowest
0 GRB2 events in ERBB2 signaling R-HSA-1963640 1.110223e-15 14.955 0 0
1 Signaling by ERBB2 ECD mutants R-HSA-9665348 1.110223e-15 14.955 0 0
2 Signaling by ERBB2 KD Mutants R-HSA-9664565 1.110223e-15 14.955 0 0
3 Signaling by ERBB2 in Cancer R-HSA-1227990 1.110223e-15 14.955 0 0
4 Signaling by ERBB2 TMD/JMD mutants R-HSA-9665686 1.110223e-15 14.955 0 0
# All level, use ref_col
plot_path(format_out,p_type='FDR',ref_col='P31749_path_all',top_n=15)
plt.title('PI3K Pathways (with highlight as overlap with all level Reactome database)');

# All level
plot_path(format_out,p_type='FDR',ref_col='P31749_path_lowest',top_n=15)
plt.title('PI3K Pathways (with highlight as overlap with lowest level Reactome database)');

Histogram of overlaps


source

plot_overlap

 plot_overlap (react_df, ref_id_list=None, ref_col=None, p_type='FDR',
               thr=0.05)
Type Default Details
react_df
ref_id_list NoneType None
ref_col NoneType None column in reac_df, 1 or 0 to indicate whether it’s in ref
p_type str FDR
thr float 0.05 original threshold of p value, will be log10 transformed
def plot_overlap(react_df, 
                 ref_id_list=None,
                 ref_col=None,  # column in react_df, 1 or 0 to indicate whether it's in ref
                 p_type='FDR',
                 thr=0.05  # original threshold of p value, will be log10 transformed
                ):
    assert p_type in ['p', 'FDR']
    p_col = f'-log10_{p_type}'
    p_col_convert = p_col.replace('_', '(', 1) + ')'  # e.g., -log10(FDR)

    threshold = -np.log10(thr)

    # Subset based on input
    if ref_id_list is not None:
        subset = react_df[react_df.reactome_id.isin(ref_id_list)].copy()
    elif ref_col is not None:
        subset = react_df[react_df[ref_col] == 1].copy()
    else:
        raise ValueError("Need to give values to ref_id_list or ref_col")

    # Plot histogram
    subset[p_col].hist(bins=100)

    # Add threshold line
    plt.axvline(x=threshold, color='red', linestyle='--', label=f'{p_type} = {thr}')
    plt.legend()

    # Label axes
    plt.xlabel(p_col_convert)
    plt.title(f'Histogram of {p_col_convert}')

    # Calculate and print statistics
    num_total = len(subset)
    num_pass = (subset[p_col] > threshold).sum()

    percent_pass = (num_pass / num_total) * 100

    # Optionally show percentage on plot
    plt.text(0.66, 0.85, f'{percent_pass:.1f}% ({num_pass}/{num_total}) pass',
             transform=plt.gca().transAxes,
             ha='right', va='top', fontsize=10, color='green')

    return float(num_pass / num_total)
plot_overlap(format_out, ref_id_list=akt_all.reactome_id)
0.8035714285714286

format_out.head()
name reactome_id FDR -log10_FDR P31749_path_all P31749_path_lowest
0 GRB2 events in ERBB2 signaling R-HSA-1963640 1.110223e-15 14.955 0 0
1 Signaling by ERBB2 ECD mutants R-HSA-9665348 1.110223e-15 14.955 0 0
2 Signaling by ERBB2 KD Mutants R-HSA-9664565 1.110223e-15 14.955 0 0
3 Signaling by ERBB2 in Cancer R-HSA-1227990 1.110223e-15 14.955 0 0
4 Signaling by ERBB2 TMD/JMD mutants R-HSA-9665686 1.110223e-15 14.955 0 0
plot_overlap(format_out, ref_col='P31749_path_all')
0.8035714285714286

plot_overlap(format_out, ref_col='P31749_path_lowest')
0.8333333333333334

Statistics

subset_all = format_out[format_out.P31749_path_all==1]

subset_lowest = format_out[format_out.P31749_path_lowest==1]
(subset_all.FDR<0.05).value_counts()/len(subset_all)
FDR
True     0.803571
False    0.196429
Name: count, dtype: float64
(subset_all.FDR<0.05).value_counts()/len(subset_all)
FDR
True     0.803571
False    0.196429
Name: count, dtype: float64
subset_all['-log10_FDR'].mean()
np.float64(5.994508928571428)
subset_lowest['-log10_FDR'].mean()
np.float64(5.884880952380952)

End