Protein

Setup

UniProt sequence



get_uniprot_seq

 get_uniprot_seq (uniprot_id)

Queries the UniProt database to retrieve the protein sequence for a given UniProt ID.

get_uniprot_seq('P04626')
'MELAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNLELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNGDPLNNTTPVTGASPGGLRELQLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLALTLIDTNRSRACHPCSPMCKGSRCWGESSEDCQSLTRTVCAGGCARCKGPLPTDCCHEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGASCVTACPYNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHLREVRAVTSANIQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLPDLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTVPWDQLFRNPHQALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQECVEECRVLQGLPREYVNARHCLPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARCPSGVKPDLSYMPIWKFPDEEGACQPCPINCTHSCVDLDDKGCPAEQRASPLTSIISAVVGILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPLTPSGAMPNQAQMRILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVRLVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMARDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDAEEYLVPQQGFFCPDPAPGAGGMVHHRHRSSSTRSGGGDLTLGLEPSEEEAPRSPLAPSEGAGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPLPSETDGYVAPLTCSPQPEYVNQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGAVENPEYLTPQGGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV'
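
As a rough illustration of what such a lookup can involve, the sketch below downloads a FASTA record and joins its sequence lines. It is not the implementation of get_uniprot_seq. The names `parse_fasta_sequence` and `fetch_uniprot_seq_sketch` are hypothetical, and the URL pattern is an assumption based on UniProt's public REST service.

```python
from urllib.request import urlopen

# Assumption: UniProt's public REST service returns FASTA at this URL pattern.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{uniprot_id}.fasta"


def parse_fasta_sequence(fasta_text):
    """Return the residues of the first record in a FASTA string."""
    lines = [line.strip() for line in fasta_text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not look like a FASTA record")
    residues = []
    for line in lines[1:]:
        if line.startswith(">"):  # a second record begins here; stop
            break
        residues.append(line)
    return "".join(residues)


def fetch_uniprot_seq_sketch(uniprot_id, timeout=10):
    """Hypothetical helper: download one entry as FASTA and return its sequence."""
    url = UNIPROT_FASTA_URL.format(uniprot_id=uniprot_id)
    with urlopen(url, timeout=timeout) as response:
        return parse_fasta_sequence(response.read().decode("utf-8"))
```

Keeping the parsing separate from the download means the parser can be checked offline with a hand-written FASTA string.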


get_uniprot_features

 get_uniprot_features (uniprot_id)

Retrieve the annotated sequence features (signal peptide, chain, domains, and so on) for a UniProt ID, each with its start and end position.

get_uniprot_features('P04626')[:3]
[{'type': 'Signal',
  'location': {'start': {'value': 1, 'modifier': 'EXACT'},
   'end': {'value': 22, 'modifier': 'EXACT'}},
  'description': '',
  'evidences': [{'evidenceCode': 'ECO:0000255'}]},
 {'type': 'Chain',
  'location': {'start': {'value': 23, 'modifier': 'EXACT'},
   'end': {'value': 1255, 'modifier': 'EXACT'}},
  'description': 'Receptor tyrosine-protein kinase erbB-2',
  'featureId': 'PRO_0000016669'},
 {'type': 'Topological domain',
  'location': {'start': {'value': 23, 'modifier': 'EXACT'},
   'end': {'value': 652, 'modifier': 'EXACT'}},
  'description': 'Extracellular',
  'evidences': [{'evidenceCode': 'ECO:0000255'}]}]
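
The nested location dictionaries shown above can be flattened into one row per feature, which is easier to scan or to load into a table. This is a minimal sketch that assumes every feature has the `type`, `location.start.value` and `location.end.value` keys seen in the output; `summarize_features` is a hypothetical helper, not part of the library.

```python
def summarize_features(features):
    """Flatten UniProt-style feature dicts into rows with plain start/end values."""
    rows = []
    for feature in features:
        location = feature["location"]
        rows.append(
            {
                "type": feature["type"],
                "start": location["start"]["value"],
                "end": location["end"]["value"],
                # 'description' may be empty or absent, so default to ''.
                "description": feature.get("description", ""),
            }
        )
    return rows


# Two entries copied from the get_uniprot_features('P04626') output above.
example_features = [
    {"type": "Signal",
     "location": {"start": {"value": 1, "modifier": "EXACT"},
                  "end": {"value": 22, "modifier": "EXACT"}},
     "description": ""},
    {"type": "Topological domain",
     "location": {"start": {"value": 23, "modifier": "EXACT"},
                  "end": {"value": 652, "modifier": "EXACT"}},
     "description": "Extracellular"},
]
rows = summarize_features(example_features)
```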


get_uniprot_kd

 get_uniprot_kd (uniprot_id)

Find the features annotated as ‘Domain: Protein kinase’ for a UniProt ID and return each one's coordinates and sequence.

get_uniprot_kd('P04626')
[{'uniprot_id': 'P04626',
  'type': 'Domain',
  'start': 720,
  'end': 987,
  'description': 'Protein kinase',
  'sequence': 'LRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVRLVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMARDPQRFV'}]


get_uniprot_type

 get_uniprot_type (uniprot_id, type_='Signal')

Return the regions of a given feature type (for example, Signal or Transmembrane) for a UniProt ID, together with their sequences.

get_uniprot_type('P04626','Signal') # signal peptide
[{'uniprot_id': 'P04626',
  'type': 'Signal',
  'start': 1,
  'end': 22,
  'description': '',
  'sequence': 'MELAALCRWGLLLALLPPGAAS'}]
get_uniprot_type('P04626','Transmembrane') # tm domain
[{'uniprot_id': 'P04626',
  'type': 'Transmembrane',
  'start': 653,
  'end': 675,
  'description': 'Helical',
  'sequence': 'SIISAVVGILLVVVLGVVFGILI'}]
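
The start and end values in these results are 1-based and inclusive, so a region maps to the Python slice `seq[start - 1:end]`. The sketch below filters a feature list by type and slices the matching regions. `extract_regions` is a hypothetical helper written for illustration, and the 30-residue string is only the beginning of the P04626 sequence shown earlier.

```python
def extract_regions(uniprot_id, seq, features, type_="Signal"):
    """Return one record per feature of the requested type, with its subsequence."""
    regions = []
    for feature in features:
        if feature["type"] != type_:
            continue
        start = feature["location"]["start"]["value"]
        end = feature["location"]["end"]["value"]
        regions.append(
            {
                "uniprot_id": uniprot_id,
                "type": feature["type"],
                "start": start,
                "end": end,
                "description": feature.get("description", ""),
                # 1-based inclusive coordinates -> 0-based half-open slice.
                "sequence": seq[start - 1:end],
            }
        )
    return regions


seq_prefix = "MELAALCRWGLLLALLPPGAASTQVCTGTD"  # first 30 residues of P04626
signal_feature = {
    "type": "Signal",
    "location": {"start": {"value": 1}, "end": {"value": 22}},
    "description": "",
}
signal_regions = extract_regions("P04626", seq_prefix, [signal_feature], "Signal")
```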

Mutate sequence



apply_mut_single

 apply_mut_single (seq, *mutations, start_pos=1)

Apply mutations to a protein sequence.

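| | Type | Default | Details |
|---|---|---|---|
| seq | | | protein sequence |
| mutations | VAR_POSITIONAL | | point mutations, e.g., E709A |
| start_pos | int | 1 | residue number of the first character of seq; set it when the sequence does not start at residue 1 so that mutation positions line up |
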
seq = get_uniprot_seq('P04626')
mut_seq = apply_mut_single(seq,'M1A','E2S')
mut_seq
Converted: M1A
Converted: E2S
'ASLAALCRWGLLLALLPPGAASTQVCTGTDMKLRLPASPETHLDMLRHLYQGCQVVQGNLELTYLPTNASLSFLQDIQEVQGYVLIAHNQVRQVPLQRLRIVRGTQLFEDNYALAVLDNGDPLNNTTPVTGASPGGLRELQLRSLTEILKGGVLIQRNPQLCYQDTILWKDIFHKNNQLALTLIDTNRSRACHPCSPMCKGSRCWGESSEDCQSLTRTVCAGGCARCKGPLPTDCCHEQCAAGCTGPKHSDCLACLHFNHSGICELHCPALVTYNTDTFESMPNPEGRYTFGASCVTACPYNYLSTDVGSCTLVCPLHNQEVTAEDGTQRCEKCSKPCARVCYGLGMEHLREVRAVTSANIQEFAGCKKIFGSLAFLPESFDGDPASNTAPLQPEQLQVFETLEEITGYLYISAWPDSLPDLSVFQNLQVIRGRILHNGAYSLTLQGLGISWLGLRSLRELGSGLALIHHNTHLCFVHTVPWDQLFRNPHQALLHTANRPEDECVGEGLACHQLCARGHCWGPGPTQCVNCSQFLRGQECVEECRVLQGLPREYVNARHCLPCHPECQPQNGSVTCFGPEADQCVACAHYKDPPFCVARCPSGVKPDLSYMPIWKFPDEEGACQPCPINCTHSCVDLDDKGCPAEQRASPLTSIISAVVGILLVVVLGVVFGILIKRRQQKIRKYTMRRLLQETELVEPLTPSGAMPNQAQMRILKETELRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVRLVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMARDPQRFVVIQNEDLGPASPLDSTFYRSLLEDDDMGDLVDAEEYLVPQQGFFCPDPAPGAGGMVHHRHRSSSTRSGGGDLTLGLEPSEEEAPRSPLAPSEGAGSDVFDGDLGMGAAKGLQSLPTHDPSPLQRYSEDPTVPLPSETDGYVAPLTCSPQPEYVNQPDVRPQPPSPREGPLPAARPAGATLERPKTLSPGKNGVVKDVFAFGGAVENPEYLTPQGGAAPQPHPPPAFSPAFDNLYYWDQDPPERGAPPSTFKGTPTAENPEYLGLDVPV'
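
A point mutation such as E709A names the reference residue, its position and the replacement residue. The sketch below shows one way to parse and apply that notation, checking that the reference residue is actually present at that position. It is an illustration under those assumptions, not the code behind apply_mut_single, and `apply_point_mutations` is a hypothetical name.

```python
import re

# Reference residue, 1-based position, replacement residue (e.g. E709A).
POINT_MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_point_mutations(seq, *mutations, start_pos=1):
    """Apply substitutions to seq, where seq[0] is residue number start_pos."""
    residues = list(seq)
    for mutation in mutations:
        match = POINT_MUTATION.match(mutation)
        if match is None:
            raise ValueError(f"unrecognised mutation: {mutation!r}")
        ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        index = pos - start_pos  # convert residue number to 0-based index
        if not 0 <= index < len(residues):
            raise ValueError(f"{mutation}: position outside the sequence")
        if residues[index] != ref:
            raise ValueError(
                f"{mutation}: expected {ref} at {pos}, found {residues[index]}"
            )
        residues[index] = alt
    return "".join(residues)
```

Checking the reference residue catches the most common mistake, which is a start_pos that does not match the numbering used in the mutation.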


apply_mut_complex

 apply_mut_complex (seq, mut, start_pos=1)

Apply a composite mutation like ‘G776delinsVC/S783C’ to seq, assuming seq[0] corresponds to residue number start_pos.

  • At most one delins or dup is allowed.
  • Point substitutions are executed first; the indel/dup is done last.

| | Type | Default | Details |
|---|---|---|---|
| seq | | | protein sequence |
| mut | | | mutation (e.g., G776delinsVC/S783C, G778dupGSP) |
| start_pos | int | 1 | residue number at which a truncated sequence starts, so that mutation positions line up |

her2_seq = 'LRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAGVGSPYVSRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVRLVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMARDPQRFV'
mut_seq = apply_mut_complex(her2_seq,'G776delinsVC/S783C',720)
mut_seq
'LRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAVCVGSPYVCRLLGICLTSTVQLVTQLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVRLVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYHADGGKVPIKWMALESILRRRFTHQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKCWMIDSECRPRFRELVSEFSRMARDPQRFV'
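
The ordering rule above (substitutions first, the single indel or dup last) works because substitutions do not change the sequence length, so the position of the indel is still valid when it is applied. The sketch below illustrates that ordering. It is not the library's implementation, `apply_composite_sketch` is a hypothetical name, and reading G778dupGSP as "insert a second copy of GSP right after the GSP that starts at residue 778" is an assumption.

```python
import re

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")
INDEL = re.compile(r"^([A-Z])(\d+)(delins|dup)([A-Z]+)$")


def apply_composite_sketch(seq, mut, start_pos=1):
    """Apply '/'-separated mutations; seq[0] is residue number start_pos."""
    residues = list(seq)
    indel = None
    for part in mut.split("/"):
        sub, ind = SUBSTITUTION.match(part), INDEL.match(part)
        if sub:
            ref, pos, alt = sub.group(1), int(sub.group(2)), sub.group(3)
            index = pos - start_pos
            if not 0 <= index < len(residues) or residues[index] != ref:
                raise ValueError(f"{part}: reference residue does not match")
            residues[index] = alt  # length-preserving, safe to apply now
        elif ind:
            if indel is not None:
                raise ValueError("at most one delins or dup is allowed")
            indel = ind  # defer until all substitutions are done
        else:
            raise ValueError(f"unrecognised mutation: {part!r}")
    if indel is not None:
        ref, pos = indel.group(1), int(indel.group(2))
        kind, payload = indel.group(3), indel.group(4)
        index = pos - start_pos
        if not 0 <= index < len(residues) or residues[index] != ref:
            raise ValueError(f"{indel.group(0)}: reference residue does not match")
        if kind == "delins":
            # Replace the single reference residue with the inserted residues.
            residues[index:index + 1] = list(payload)
        else:
            # Assumed dup semantics: payload must already be present at pos.
            end = index + len(payload)
            if "".join(residues[index:end]) != payload:
                raise ValueError(f"{indel.group(0)}: duplicated segment not found")
            residues[end:end] = list(payload)
    return "".join(residues)


fragment = "AYVMAGVGSPYVSRLL"  # residues 771-786 of the kinase domain above
mutant_fragment = apply_composite_sketch(fragment, "G776delinsVC/S783C", 771)
```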


compare_seq

 compare_seq (seq1:str, seq2:str, start_pos:int=1, label1:str='Original',
              label2:str='Mutant', visualize:bool=True)

Align two protein sequences and summarise differences.

| | Type | Default | Details |
|---|---|---|---|
| seq1 | str | | original |
| seq2 | str | | mutant |
| start_pos | int | 1 | |
| label1 | str | Original | |
| label2 | str | Mutant | |
| visualize | bool | True | |

compare_seq(her2_seq,mut_seq)
Original       1-79   : LRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMA-GVGSPYVSRLLGICLTSTVQLVT
Mutant                : LRKVKVLGSGAFGTVYKGIWIPDGENVKIPVAIKVLRENTSPKANKEILDEAYVMAVCVGSPYVCRLLGICLTSTVQLVT
                                                                                ^^      ^               

Original      80-159  : QLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVRLVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYH
Mutant                : QLMPYGCLLDHVRENRGRLGSQDLLNWCMQIAKGMSYLEDVRLVHRDLAARNVLVKSPNHVKITDFGLARLLDIDETEYH
                                                                                                        

Original     160-239  : ADGGKVPIKWMALESILRRRFTHQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKC
Mutant                : ADGGKVPIKWMALESILRRRFTHQSDVWSYGVTVWELMTFGAKPYDGIPAREIPDLLEKGERLPQPPICTIDVYMIMVKC
                                                                                                        

Original     240-268  : WMIDSECRPRFRELVSEFSRMARDPQRFV
Mutant                : WMIDSECRPRFRELVSEFSRMARDPQRFV
                                                     

Differences:
  insertion    at   57: - → V
  substitution at   57: G → C
  substitution at   64: S → C
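
For a quick difference summary without the visual alignment, Python's standard difflib can report the segments that do not match. This sketch swaps in `difflib.SequenceMatcher` for whatever alignment compare_seq uses internally, so its grouping of changes can differ from the output above. `summarize_differences` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def summarize_differences(seq1, seq2, start_pos=1):
    """List non-matching segments as (tag, residue_number, seq1_part, seq2_part)."""
    matcher = SequenceMatcher(None, seq1, seq2, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Report the position in residue numbering, not 0-based indexing.
            differences.append((tag, start_pos + i1, seq1[i1:i2], seq2[j1:j2]))
    return differences


original = "AYVMAGVGSPYVSRLL"   # residues 771-786 of the kinase domain above
mutant = "AYVMAVCVGSPYVCRLL"    # same fragment after G776delinsVC/S783C
for tag, position, before, after in summarize_differences(original, mutant, 771):
    print(f"{tag:8s} at {position}: {before or '-'} -> {after or '-'}")
```

Passing start_pos=771 reports positions in full-length residue numbering, whereas the compare_seq call above used the default of 1 and therefore reports positions relative to the fragment.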

End