1. Data statistics of the experimental and potential TM proteins in topPTM.

      Transmembrane (TM) proteins can be classified by structure into alpha-helical proteins and beta-barrel proteins. Alpha-helical TM proteins are a major class of membrane proteins; an estimated 27% of all human proteins are alpha-helical membrane proteins (Almen, Nordstrom et al. 2009). Beta-barrel TM proteins, which are found in the outer membranes of Gram-negative bacteria, in the cell walls of Gram-positive bacteria, and in the outer membranes of mitochondria and chloroplasts, participate in essential cellular functions by acting as porins, transporters, enzymes, virulence factors and receptors. Experimentally verified TM proteins annotated with membrane topology information were mainly collected from PDB_TM (Tusnady, Dosztanyi et al. 2005), OPM (Lomize, Lomize et al. 2006), TOPDB (Tusnady, Kalmar et al. 2008), and TMPad (Lo, Cheng et al. 2011). After the removal of redundant protein entries, a total of 5394 TM proteins with experimentally curated annotations of transmembrane topology remained. A set of candidate TM proteins was also extracted from UniProtKB by choosing protein entries that contain the keyword "TRANSMEM" in the feature ("FT") line, a localization of "membrane", and transmembrane topology information. The candidate TM proteins were further filtered using HMMTOP (Tusnady and Simon 2001) and MEMSAT (Nugent and Jones 2009) to determine their transmembrane topologies. The following table shows that the filtering process yielded 69402 potential TM proteins with annotated topologies.

 

Resource | Experimentally verified TM proteins: All | Experimentally verified: α-helical | Experimentally verified: β-barrel | Potential TM proteins: All | Potential: α-helical | Potential: β-barrel
TMPad | 379 | 379 | 0 | N/A | N/A | N/A
OPM | 651 | 1435 | 44 | N/A | N/A | N/A
TOPDB | 1479 | 667 | 91 | N/A | N/A | N/A
PDBTM | 785 | 556 | 96 | N/A | N/A | N/A
UniProtKB | 4964 | 4920 | 139 | 69402 | 68545 | 856
Total | 5394 | 4991 | 170 | 69402 | 68545 | 856
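
The UniProtKB selection step described above can be illustrated with a minimal sketch. It assumes the entries are available in the UniProtKB flat-file text format, where feature lines begin with "FT", comment lines begin with "CC", and each entry ends with "//". The function name `select_tm_candidates` and the sample accessions are hypothetical, and the sketch covers only the keyword and localization checks, not the HMMTOP or MEMSAT filtering.

```python
def select_tm_candidates(flat_text):
    """Return accessions of entries that have a TRANSMEM feature line and
    mention "membrane" in a SUBCELLULAR LOCATION comment (illustrative only)."""
    selected = []
    for entry in flat_text.split("\n//"):
        accession = None
        has_transmem = False
        has_membrane = False
        in_location = False
        for line in entry.splitlines():
            if line.startswith("AC   ") and accession is None:
                # The first accession on the AC line is the primary one.
                accession = line[5:].split(";")[0].strip()
            elif line.startswith("FT   TRANSMEM"):
                has_transmem = True
            elif line.startswith("CC   "):
                if line.startswith("CC   -!- "):
                    # A new comment topic starts here.
                    in_location = line.startswith("CC   -!- SUBCELLULAR LOCATION")
                elif not line.startswith("CC       "):
                    # Not a continuation line, so the topic has ended.
                    in_location = False
                if in_location and "membrane" in line.lower():
                    has_membrane = True
        if accession and has_transmem and has_membrane:
            selected.append(accession)
    return selected


# Two made-up entries: only the first satisfies both criteria.
SAMPLE = """ID   TEST1_HUMAN   Reviewed;   100 AA.
AC   P00001;
CC   -!- SUBCELLULAR LOCATION: Cell membrane; Multi-pass membrane
CC       protein.
FT   TRANSMEM        10..30
//
ID   TEST2_HUMAN   Reviewed;   80 AA.
AC   P00002; Q00002;
CC   -!- SUBCELLULAR LOCATION: Cytoplasm.
FT   CHAIN           1..80
//
"""

print(select_tm_candidates(SAMPLE))
```

In practice the selected entries would then be passed to a topology predictor; that step is outside the scope of this sketch.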

 


 

 

  2. Data statistics of the substrate sites according to PTM type.

      Because tandem mass spectrometry (MS/MS)-based proteomics is increasingly used to identify post-translational modifications (PTMs), site-specific modified peptides were manually extracted from approximately 500 MS/MS-associated research articles with the aid of a text-mining approach. After removing redundant PTM instances collected from a variety of public resources, a total of 4747 and 47358 experimental PTM sites were annotated on 1049 experimental and 8674 potential TM proteins, respectively. According to the statistics for each PTM type shown in the table below, protein phosphorylation accounts for the most substrate sites on experimental TM proteins, including 2108 phosphoserines on 603 TM proteins, 645 phosphothreonines on 333 TM proteins, and 585 phosphotyrosines on 268 TM proteins. On potential TM proteins, there are 25789 phosphoserines, 7510 phosphothreonines and 5939 phosphotyrosines.

 

PTM Instance Type | Number of PTM sites on TM proteins (experimental) | Number of PTM sites on TM proteins (potential)
Phosphoserine | 2108 | 25789
Phosphothreonine | 645 | 7510
Phosphotyrosine | 585 | 5939
N-linked (GlcNAc...) | 593 | 2519
N6-acetyllysine | 114 | 1214
S-nitrosocysteine | 70 | 655
N-linked (Glc...) | 128 | 570
O-linked (GalNAc...) | 63 | 497
S-cysteinyl 3-(oxidosulfanyl)alanine (Cys-Cys) | 110 | 222
S-palmitoyl cysteine | 43 | 210
N-acetylalanine | 13 | 155
N-palmitoyl cysteine | 20 | 121
N-myristoyl glycine | 6 | 129
O-linked (GlcNAc) | 8 | 122
N-acetylserine | 20 | 93
N-acetylmethionine | 12 | 90
S-farnesyl cysteine | 0 | 93
Caspase cleavage aspartic acid | 6 | 81
Methionine sulfone | 4 | 78
N2,N2-dimethylarginine | 9 | 69
N6-(retinylidene)lysine | 57 | 18
5-methylarginine | 8 | 66
DePhosphotyrosine | 12 | 55
S-geranylgeranyl cysteine | 1 | 47
O-linked (GlcNAc...) | 1 | 46
Cysteine methyl ester | 1 | 45
4-hydroxyproline | 2 | 43
O-linked (Man) | 4 | 37
Asymmetric dimethylarginine | 0 | 39
Pyrrolidone carboxylic acid | 6 | 32
S-diacylglycerol cysteine | 2 | 36
N-acetylthreonine | 2 | 34
Sulfotyrosine | 11 | 25
N-formylmethionine | 12 | 21
DePhosphoserine | 4 | 27
N6,N6-dimethyllysine | 0 | 31
Glutamate methyl ester (Glu) | 2 | 27
Nitrated | 5 | 22
Omega-N-methylarginine | 3 | 23
Deamidated asparagine | 1 | 23
O-linked (Man...) | 0 | 23
N4-methylasparagine | 0 | 22
(3S)-3-hydroxyasparagine | 0 | 20
GPI-anchor amidated serine | 2 | 18
N6-methyllysine | 2 | 17
Omega-N-methylated arginine | 3 | 15
Glutamate methyl ester (Gln) | 2 | 14
N6-succinyllysine | 0 | 15
N-linked (Glc) | 0 | 15
C-linked (Man) | 2 | 12
Phosphohistidine | 2 | 12
Citrulline | 0 | 13
Deamidated glutamine | 0 | 13
Nitrated tyrosine | 4 | 9
O-linked (Xyl...) | 3 | 10
O-linked (Xyl...) (glycosaminoglycan) | 1 | 11
N-acetylglycine | 0 | 11
Hydroxyproline | 1 | 8
Blocked amino end (Met) | 3 | 5
GPI-anchor amidated asparagine | 0 | 8
N2-acetylarginine | 1 | 7
N6-malonyllysine | 0 | 8
O-linked (HexNAc) | 1 | 7
Symmetric dimethylarginine | 0 | 8
Dimethylated arginine | 3 | 4
Leucine amide | 0 | 7
N6-(pyridoxal phosphate)lysine | 0 | 7
Neddyllysine | 7 | 0
O-linked (Fuc) | 0 | 7
ADP-ribosylarginine | 0 | 6
N6,N6,N6-trimethyllysine | 1 | 5
O-AMP-tyrosine | 0 | 6
DePhosphothreonine | 0 | 5
GPI-anchor amidated aspartate | 1 | 4
Methylhistidine | 0 | 5
O-linked (Hex...) | 0 | 5
S-methylcysteine | 0 | 5
Hypusine | 0 | 4
Lysine amide | 0 | 4
N6-palmitoyl lysine | 0 | 4
none | 0 | 4
Phosphatidylethanolamine amidated glycine | 0 | 4
4-aspartylphosphate | 0 | 3
Alkylcysteine | 0 | 3
Blocked amino end (Gln) | 0 | 3
Carbamidation cysteine | 0 | 3
Glutamine amide | 0 | 3
GPI-anchor amidated glycine | 0 | 3
N6-myristoyl lysine | 2 | 1
N-acetyltyrosine | 0 | 3
O-AMP-threonine | 0 | 3
O-linked (Xyl...) (keratan sulfate) | 0 | 3
S-8alpha-FAD cysteine | 3 | 0
S-glutathionyl cysteine | 1 | 2
S-stearoyl cysteine | 0 | 3
Sulfoserine | 1 | 2
Tele-8alpha-FAD histidine | 3 | 0
(3S)-3-hydroxyhistidine | 0 | 2
5-hydroxylysine | 0 | 2
Arginine amide | 0 | 2
Blocked amino end (Ser) | 0 | 2
Cholesterol glycine ester | 0 | 2
FMN phosphoryl threonine | 0 | 2
Glycine amide | 0 | 2
Glycosylation alanine | 1 | 1
Glycosylation methionine | 0 | 2
GPI-anchor amidated alanine | 0 | 2
GPI-anchor amidated cysteine | 0 | 2
N6-carboxylysine | 0 | 2
N-acetylcysteine | 0 | 2
N-acetylproline | 0 | 2
N-acetylvaline | 0 | 2
O-(5'-phospho-RNA)-serine | 0 | 2
O-linked (Fuc...) | 0 | 2
O-linked (P-Man...) | 0 | 2
Oxidation arginine | 0 | 2
Phenylalanine amide | 0 | 2
Pros-methylhistidine | 0 | 2
S-12-hydroxyfarnesyl cysteine | 0 | 2
S-4a-FMN cysteine | 0 | 2
(3S)-3-hydroxyaspartate | 0 | 1
2',4',5'-topaquinone | 1 | 0
3',4'-dihydroxyphenylalanine | 0 | 1
3'-nitrotyrosine | 0 | 1
3-oxoalanine (Cys) | 1 | 0
4-carboxycysteine | 0 | 1
4-carboxytyrosine | 0 | 1
ADP-ribosylasparagine | 0 | 1
ADP-ribosylcysteine | 0 | 1
Alanine amide | 1 | 0
Alkyllysine | 0 | 1
Aspartic acid 1-[(3-aminopropyl)(5'-adenosyl)phosphono]amide | 0 | 1
Aspartyl aldehyde | 0 | 1
Blocked amino end (Ala) | 0 | 1
Blocked amino end (Thr) | 0 | 1
Blocked amino end (Xaa) | 0 | 1
Cysteine persulfide | 0 | 1
Glutamic acid 1-amide | 0 | 1
Glycosylation glutamine | 0 | 1
GPI-like-anchor amidated asparagine | 0 | 1
GPI-like-anchor amidated serine | 0 | 1
N6-murein peptidoglycan lysine | 0 | 1
N-acetylaspartate | 0 | 1
N-acetylglutamate | 1 | 0
N-D-glucuronoyl asparagine | 0 | 1
N-formylglycine | 0 | 1
N-linked (GalNAc...) | 0 | 1
N-linked (Glc) (glycation) | 0 | 1
N-palmitoyl glycine | 1 | 0
O-(2-cholinephosphoryl)serine | 0 | 1
O-(5'-phospho-RNA)-tyrosine | 0 | 1
O-acetylthreonine | 0 | 1
O-linked (HexNAc...) | 0 | 1
O-linked (Man6P...) | 0 | 1
O-linked (Xyl...) (heparan sulfate) | 0 | 1
O-palmitoyl threonine | 1 | 0
Phosphocysteine | 0 | 1
Phosphorproline | 0 | 1
Pyruvic acid (Ser) | 0 | 1
S-(15-deoxy-Delta12,14-prostaglandin J2-9-yl)cysteine | 0 | 1
S-archaeol cysteine | 0 | 1
S-farnesyl serine | 0 | 1
Tryptophan amide | 0 | 1
Valine amide | 0 | 1
Total | 4747 | 47358
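
As a worked example of the statement that phosphorylation dominates, the short sketch below computes the share of all annotated sites contributed by the three phosphorylation types, using only counts copied from the table above. The variable and function names are illustrative and are not part of topPTM.

```python
# (experimental, potential) site counts copied from the table above.
phospho = {
    "Phosphoserine": (2108, 25789),
    "Phosphothreonine": (645, 7510),
    "Phosphotyrosine": (585, 5939),
}
total_experimental, total_potential = 4747, 47358


def phospho_share(counts, total, column):
    """Fraction of all sites in one column that are phosphorylation sites."""
    return sum(row[column] for row in counts.values()) / total


exp_share = phospho_share(phospho, total_experimental, 0)
pot_share = phospho_share(phospho, total_potential, 1)
print(f"experimental: {exp_share:.1%}, potential: {pot_share:.1%}")
```

With these counts, phosphorylation accounts for roughly 70% of the sites on experimental TM proteins and roughly 83% of the sites on potential TM proteins.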


 

 

   3. The structural distribution of PTMs containing more than ten substrate sites on experimental transmembrane proteins.

      Based on the experimentally verified PTMs collected in the topPTM database, the table below presents the structural distribution of PTMs containing more than ten substrate sites on experimental TM proteins. The structural topology of a TM protein is categorized into five region types: Extracellular, Intracellular (listed as Cytoplasmic in the table), Transmembrane, Other and Unknown.

 

Number of substrate sites by region

PTM Type | Extracellular | Cytoplasmic | Transmembrane | Other | Unknown
Phosphoserine | 72 | 1603 | 24 | 210 | 199
Phosphothreonine | 52 | 416 | 12 | 66 | 99
Phosphotyrosine | 53 | 374 | 21 | 88 | 49
N-linked (GlcNAc...) | 417 | 0 | 0 | 146 | 30
N6-acetyllysine | 4 | 48 | 8 | 41 | 13
S-nitrosocysteine | 8 | 26 | 6 | 12 | 18
N-linked (Glc...) | 101 | 0 | 1 | 21 | 5
O-linked (GalNAc...) | 57 | 0 | 0 | 6 | 0
S-cysteinyl 3-(oxidosulfanyl)alanine (Cys-Cys) | 92 | 0 | 0 | 16 | 2
S-palmitoyl cysteine | 0 | 32 | 4 | 1 | 6
N-acetylalanine | 0 | 4 | 0 | 1 | 8
N-palmitoyl cysteine | 0 | 17 | 1 | 0 | 2
N-myristoyl glycine | 0 | 1 | 0 | 5 | 0
O-linked (GlcNAc) | 3 | 4 | 0 | 0 | 1
N-acetylserine | 0 | 12 | 0 | 4 | 4
N-acetylmethionine | 1 | 5 | 1 | 1 | 4
S-farnesyl cysteine | 0 | 0 | 0 | 0 | 0
Caspase cleavage aspartic acid | 0 | 6 | 0 | 0 | 0
Methionine sulfone | 0 | 4 | 0 | 0 | 0
N2,N2-dimethylarginine | 1 | 4 | 0 | 4 | 0
N6-(retinylidene)lysine | 0 | 0 | 57 | 0 | 0
5-methylarginine | 1 | 3 | 0 | 4 | 0
DePhosphotyrosine | 0 | 12 | 0 | 0 | 0
S-geranylgeranyl cysteine | 0 | 0 | 0 | 0 | 1
O-linked (GlcNAc...) | 1 | 0 | 0 | 0 | 0
Cysteine methyl ester | 0 | 0 | 0 | 0 | 1
4-hydroxyproline | 0 | 2 | 0 | 0 | 0
O-linked (Man) | 4 | 0 | 0 | 0 | 0
Asymmetric dimethylarginine | 0 | 0 | 0 | 0 | 0
Pyrrolidone carboxylic acid | 3 | 0 | 0 | 1 | 2
S-diacylglycerol cysteine | 0 | 0 | 0 | 0 | 2
N-acetylthreonine | 0 | 0 | 0 | 0 | 2
Sulfotyrosine | 11 | 0 | 0 | 0 | 0
N-formylmethionine | 0 | 2 | 1 | 5 | 4
DePhosphoserine | 0 | 4 | 0 | 0 | 0
N6,N6-dimethyllysine | 0 | 0 | 0 | 0 | 0
Glutamate methyl ester (Glu) | 0 | 2 | 0 | 0 | 0
Nitrated | 0 | 3 | 1 | 0 | 1
Omega-N-methylarginine | 0 | 0 | 0 | 0 | 3
Deamidated asparagine | 0 | 1 | 0 | 0 | 0
O-linked (Man...) | 0 | 0 | 0 | 0 | 0
N4-methylasparagine | 0 | 0 | 0 | 0 | 0
(3S)-3-hydroxyasparagine | 0 | 0 | 0 | 0 | 0
GPI-anchor amidated serine | 1 | 0 | 0 | 0 | 1
N6-methyllysine | 0 | 2 | 0 | 0 | 0
Omega-N-methylated arginine | 0 | 3 | 0 | 0 | 0
Glutamate methyl ester (Gln) | 0 | 2 | 0 | 0 | 0
N6-succinyllysine | 0 | 0 | 0 | 0 | 0
N-linked (Glc) | 0 | 0 | 0 | 0 | 0
C-linked (Man) | 2 | 0 | 0 | 0 | 0
Phosphohistidine | 0 | 1 | 0 | 0 | 1
Citrulline | 0 | 0 | 0 | 0 | 0
Deamidated glutamine | 0 | 0 | 0 | 0 | 0
Nitrated tyrosine | 0 | 2 | 0 | 0 | 2
O-linked (Xyl...) | 3 | 0 | 0 | 0 | 0
O-linked (Xyl...) (glycosaminoglycan) | 0 | 0 | 0 | 1 | 0
N-acetylglycine | 0 | 0 | 0 | 0 | 0
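
To make the regional pattern in the table easier to read, the sketch below converts the site counts of a few PTM types into fractions per region and reports the region holding the most sites. The three rows are copied from the table above; the function names are illustrative and are not part of topPTM.

```python
REGIONS = ("Extracellular", "Cytoplasmic", "Transmembrane", "Other", "Unknown")

# Site counts per region, copied from the table above.
rows = {
    "Phosphoserine": (72, 1603, 24, 210, 199),
    "N-linked (GlcNAc...)": (417, 0, 0, 146, 30),
    "N6-(retinylidene)lysine": (0, 0, 57, 0, 0),
}


def region_fractions(counts):
    """Map each region to its share of the PTM's sites (empty if no sites)."""
    total = sum(counts)
    if total == 0:
        return {}
    return {region: n / total for region, n in zip(REGIONS, counts)}


def dominant_region(counts):
    """Return the region with the most sites, or None for an all-zero row."""
    fractions = region_fractions(counts)
    return max(fractions, key=fractions.get) if fractions else None


for ptm, counts in rows.items():
    region = dominant_region(counts)
    share = region_fractions(counts)[region]
    print(f"{ptm}: {region} ({share:.0%})")
```

For these three rows, phosphoserine sites fall mostly in cytoplasmic regions, N-linked (GlcNAc...) glycosylation sites mostly in extracellular regions, and all N6-(retinylidene)lysine sites in transmembrane regions.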