Blog-like notes

What’s the average size of a human protein?

Proteins vary in size from a few kDa to a few MDa (a little less than 4 MDa for titin, the largest known protein). But what is the average weight of a human protein?

A student asked me that question a few years ago. I had never thought about it before, so my best-of-my-knowledge answer was “probably about 50 kDa”. I came back the following week with a more precise and confident answer, obtained as explained below.

A comprehensive list of all known proteins in some organisms is available on UniProt. Let’s download the reference proteome for Homo sapiens (in FASTA format, about 40 MB).

Given any protein sequence, the pepstats program, part of the EMBOSS package, computes various statistics, including the protein’s molecular weight:

$ pepstats -auto -outfile=stdout human_proteome.fasta
PEPSTATS of 1433B_HUMAN from 1 to 246

Molecular weight = 28082.40             Residues = 246
Average Residue Weight  = 114.156       Charge   = -13.0
Isoelectric Point = 4.4720
[…]

Pipe that into sed(1) to extract only the molecular weight:

$ pepstats -auto -outfile=stdout human_proteome.fasta | \
    sed -nre 's/^Molecular weight = ([.0-9]+)\W+Residues = [0-9]+\W+$/\1/p' \
    > sizes.dat
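Before going further, it may be worth checking that the sed pattern caught one weight per protein. The following is only a sketch: the helper names count_fasta_records and count_weights are made up for this note, and the file names are the ones used above.

```shell
# Hypothetical helper names; the file names follow the post.
count_fasta_records() {
    # Every FASTA record starts with a '>' header line.
    grep -c '^>' "$1"
}

count_weights() {
    # The sed output is expected to hold one molecular weight per line.
    awk 'END { print NR }' "$1"
}

# Only report when both files are actually present.
if [ -f human_proteome.fasta ] && [ -f sizes.dat ]; then
    echo "FASTA records:     $(count_fasta_records human_proteome.fasta)"
    echo "extracted weights: $(count_weights sizes.dat)"
fi
```

The two counts should normally agree. A mismatch would suggest that the pattern missed some records, for instance because the pepstats output is laid out slightly differently than expected.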

The sizes.dat file now contains the molecular weights (in Da) of all human proteins, one per line. Let’s compute the average value:

$ R --vanilla --slave
w <- read.table('sizes.dat')
colMeans(w)
      V1
38229.43

And here is the answer to our question: the average size (molecular weight) of a human protein is ~38.2 kDa.
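If R is not at hand, the same arithmetic mean can be obtained with awk. This is a minimal sketch, assuming sizes.dat holds exactly one number per line with no blank lines; the function name mean_weight is made up for this note.

```shell
# Hypothetical helper: arithmetic mean of the first column of a file.
mean_weight() {
    # Sum the first field of every line, then divide by the line count.
    # Nothing is printed for an empty file (avoids a division by zero).
    awk '{ sum += $1 } END { if (NR > 0) printf "%.2f\n", sum / NR }' "$1"
}

# Only run on the real data when the file is present.
if [ -f sizes.dat ]; then
    mean_weight sizes.dat
fi
```

Note that this is an unweighted mean over the entries of the proteome file: every listed protein counts once, regardless of how abundant it is in a cell.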

Obviously, the same method may be applied to compute the average size of proteins from any other organism (as long as the proteome data for that organism is available), and it may also be slightly adapted to compute other average values, such as the average length.
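As an illustration of the last point, here is one way the pipeline might be adapted to the average length. It is a sketch only: the function names extract_lengths and mean_length are made up for this note, and the pattern is a variant of the one above, written with POSIX character classes and capturing the residue count instead of the weight.

```shell
# Hypothetical helper: read pepstats output on stdin, print one length per line.
extract_lengths() {
    # Same line as for the weights, but the capture group is the residue count.
    sed -nE 's/^Molecular weight = [.0-9]+[[:space:]]+Residues = ([0-9]+)[[:space:]]*$/\1/p'
}

# Hypothetical helper: read pepstats output on stdin, print the mean length.
mean_length() {
    extract_lengths |
        awk '{ sum += $1 } END { if (NR > 0) printf "%.1f\n", sum / NR }'
}

# Usage (assumes pepstats from EMBOSS is installed, as earlier in the post):
#   pepstats -auto -outfile=stdout human_proteome.fasta | mean_length
```

Swapping in the proteome file of another organism is the only change needed to repeat the computation elsewhere.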