Proteins vary in size from a few kDa to a few MDa (a little less than 4 MDa for titin, the largest known protein). But what is the average weight of a human protein?
A student asked me that question a few years ago. I had never thought about it before, so my best-of-my-knowledge answer was “probably about 50 kDa”. I came back the following week with a more precise and confident answer, obtained as explained here.
A comprehensive list of all known proteins in some organisms is available on UniProt. Let’s download the reference proteome for Homo sapiens (in FASTA format, about 40 MB).
Given any protein sequence, the pepstats program, part of the EMBOSS package, computes all kinds of information, including the protein’s molecular weight:
    $ pepstats -auto -outfile=stdout human_proteome.fasta
    PEPSTATS of 1433B_HUMAN from 1 to 246
    Molecular weight = 28082.40    Residues = 246
    Average Residue Weight = 114.156    Charge = -13.0
    Isoelectric Point = 4.4720
    […]
Pipe that into sed(1) to extract only the molecular weight:
    $ pepstats -auto -outfile=stdout human_proteome.fasta | \
        sed -nre 's/^Molecular weight = ([.0-9]+)\W+Residues = [0-9]+\W+$/\1/p' \
        > sizes.dat
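For readers who would rather not rely on GNU sed extensions such as `\W`, the same extraction can be sketched in Python. This is only an illustration, not part of the original pipeline: the function name `extract_weights` is my own, and the sample text is the pepstats record shown above with its line breaks assumed from the regular expression.

```python
import re

# Same pattern as the sed expression: the weight and the residue count
# are expected on one line of the pepstats report.
WEIGHT_LINE = re.compile(
    r"^Molecular weight = ([.0-9]+)\W+Residues = [0-9]+\W*$",
    re.MULTILINE,
)


def extract_weights(pepstats_output: str) -> list[float]:
    """Return every molecular weight (in Da) found in a pepstats report."""
    return [float(value) for value in WEIGHT_LINE.findall(pepstats_output)]


# One record, copied from the output above (layout is an assumption).
sample = (
    "PEPSTATS of 1433B_HUMAN from 1 to 246\n"
    "Molecular weight = 28082.40  Residues = 246\n"
    "Average Residue Weight = 114.156  Charge = -13.0\n"
    "Isoelectric Point = 4.4720\n"
)

print(extract_weights(sample))  # prints [28082.4]
```

Writing one value per line to a file would give the same kind of content as the sizes.dat produced by the sed pipeline.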
The sizes.dat file now contains the sizes of all human proteins. Let’s compute the average value:
    $ R --vanilla --slave
    w <- read.table('sizes.dat')
    colMeans(w)
          V1
    38229.43
And here is the answer to our question: the average size of a human protein is ~38.2 kDa.
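If R is not at hand, the averaging step is easy to reproduce. Here is a minimal Python sketch, assuming a file with one molecular weight in Da per line, as in sizes.dat. The function name `mean_weight_kda` is hypothetical, and the second value in the example is made up purely for illustration.

```python
from statistics import fmean


def mean_weight_kda(lines):
    """Average the weights (one value in Da per line) and convert to kDa."""
    weights = [float(line) for line in lines if line.strip()]
    return fmean(weights) / 1000.0


# Illustrative input only: the first value is the 1433B_HUMAN weight shown
# earlier, the second is an invented round number, the third a blank line.
example_lines = ["28082.40\n", "50000.00\n", "\n"]
example_mean = mean_weight_kda(example_lines)
print(f"{example_mean:.2f} kDa")
```

On the real data, one would pass the open file instead of the example list, for instance `mean_weight_kda(open('sizes.dat'))`.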
Obviously, the same method may be applied to compute the average size of proteins from any other organism (as long as the proteome data for that organism is available), and it may also be slightly adapted to compute other average values, such as the average length.
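As an example of such an adaptation, the average length can be computed directly from the FASTA file, without pepstats. The sketch below is one possible way to do it in Python, not the method used above. The function name `sequence_lengths` and the two toy records are hypothetical and only serve to show the input format.

```python
def sequence_lengths(fasta_lines):
    """Return the number of residues of each record in a FASTA file."""
    lengths = []
    current = None  # residues counted so far in the current record
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # A header line closes the previous record and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


# Two invented records, just to exercise the parser.
toy_fasta = [
    ">sp|X1|TOY1_HUMAN hypothetical record\n",
    "MTMDKSELVQ\n",
    "KAKLAEQ\n",
    ">sp|X2|TOY2_HUMAN hypothetical record\n",
    "MEK\n",
]

toy_lengths = sequence_lengths(toy_fasta)
toy_average = sum(toy_lengths) / len(toy_lengths)
print(toy_lengths, toy_average)  # prints [17, 3] 10.0
```

For the human proteome, the list of lines would simply be replaced by the open human_proteome.fasta file.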