Systems biology of energetic and atomic costs in the yeast
transcriptome, proteome, and metabolome
Michael D Barton¹, Balázs Papp¹,², Daniela Delneri¹, Stephen G Oliver¹, Magnus Rattray³, Casey M Bergman¹
¹ Faculty of Life Sciences, University of Manchester, Manchester, M13 9PT, UK
² Institute of Biochemistry, Biological Research Centre of the Hungarian Academy of Sciences, H-6726 Szeged, Hungary
³ School of Computer Science, University of Manchester, Manchester, M13 9PL, UK
Present address: Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge, CB2
1GA, UK
Email: Michael D Barton - mail@michaelbarton.me.uk;
Corresponding author
Abstract
Background:
Every protein has a variable atomic and energetic cost to the cell based on the synthesis of its
constituent amino acids. Quantifying the cost of amino acid synthesis is challenging; however, natural selection
is expected to favour the use of proteins whose constituents are cheaper to produce in terms of energetic and
atomic cost.
Results:
We develop a systems biology approach to estimate the cost of amino acid synthesis based on
genome-scale metabolic models, and directly investigate the effects of the cost of amino acid synthesis on
transcriptomic, proteomic and metabolomic data in Saccharomyces cerevisiae. We used our two new and six
previously reported measures of amino acid cost in conjunction with codon usage bias, tRNA gene number and
atomic composition to identify the factors that predict transcript, protein and free amino acid levels in the yeast
cell. While most previously reported cost measures are highly correlated, we find that our systems approach to
formulating the cost of amino acid synthesis produces a novel measure of cost, which explains similar levels of
variation in gene expression. Regardless of the measure used, the cost of amino acid synthesis is weakly
associated with transcript and protein levels, independent of codon usage bias. In contrast, energetic costs
explain a large proportion of variation in levels of free amino acids.
Conclusions:
In the economy of the yeast cell, the cost of amino acid synthesis correlates with transcript and
protein levels to a lesser degree than translational optimisation, whereas atomic and energetic cost plays a much
larger role in explaining levels of free amino acids. However, as there appears to be no single currency to
compute the cost of amino acid synthesis, a systems approach is necessary to uncover the full effects of amino
acid biosynthetic cost in complex biological systems that vary with cellular and environmental conditions.
Background
Everything in a living cell has a cost: from the energy needed to transform molecules against
thermodynamic equilibria, to the raw materials needed to produce the constituents of a new cell. Natural
selection may be expected to minimise such cellular costs, and evidence for adaptation to require less
energy and matter may exist at the molecular or cellular level. Testing this hypothesis requires answering
several questions about the meaning of cost in the cell, and how to measure it. For example, how does one
assign a biochemical price to a molecule whose state is dependent on changing environmental and cellular
conditions? Similarly, is it possible to separate different costs from one another, or from other molecular
constraints in the cell? Moreover, are biosynthetic costs the same at different levels of the gene expression
hierarchy? Knowing the answers to these questions is central to a systematic understanding of the chemical
forces that shape the composition of biomolecules, and how biomolecular composition relates to gene
expression at the transcriptional and post-transcriptional levels.
Craig and Weber [1] pioneered the quantitative analysis of cost at the cellular level to investigate the
effects on the synthesis and evolution of a small number of Escherichia coli proteins. These authors
estimated the cost of a protein as the sum of how many units of high energy phosphate bonds (e.g. ATP)
and reducing hydrogen atoms (e.g. NADPH) are diverted from the available energy pool to produce each
of the constituent amino acids from glucose, divided by the length of the protein. Akashi and Gojobori [2]
used a modified version of this approach to show, in the chemoheterotrophic bacteria E. coli and Bacillus subtilis,
that gene expression levels predicted from codon usage bias show a negative correlation with
average protein cost. This work provided the first genome-wide evidence that evolution has optimised
prokaryotic cells to produce highly expressed proteins with less expensive amino acids, and established an
important link between the metabolism of a cell and the evolution of its genome sequence. Heizer et al. [3]
extended these findings to four additional prokaryotic species including photoautotrophs, demonstrating
that this cost optimisation occurs whether the source of energy is organic or inorganic. More recently,
Swire [4] used Craig and Weber's [1] cost values to generate a new cost measure for an amino acid based on
its usage in proteins as a function of overall protein cost computed from all other amino acids, and showed
that cost selection affects multiple prokaryotic, archaeal and eukaryotic genomes. Wagner [5] developed a
method similar to Craig and Weber [1] that includes the energetic costs of synthesising both mRNA and
protein for Saccharomyces cerevisiae, and showed that the cost of doubling gene expression after a gene
duplication is likely to be significant enough to be selected.
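The per-protein cost used in this line of work (the summed cost of the constituent amino acids divided by protein length) can be sketched as follows. This is a minimal illustration only: the function name and the three cost values are hypothetical placeholders, not the values published by Craig and Weber [1] or Akashi and Gojobori [2].

```python
# Hypothetical per-amino-acid costs in arbitrary energy units.
# These numbers are illustrative placeholders, NOT published values.
EXAMPLE_COST = {"G": 10.0, "A": 12.0, "W": 75.0}


def mean_cost_per_residue(sequence, cost_table):
    """Sum the cost of every residue and divide by the protein length."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(cost_table[residue] for residue in sequence) / len(sequence)


# Two toy peptides of equal length: swapping in tryptophan raises the average cost.
cheap_peptide = mean_cost_per_residue("GAGA", EXAMPLE_COST)
costly_peptide = mean_cost_per_residue("GAWW", EXAMPLE_COST)
```

Under the cost selection hypothesis, highly expressed proteins would be expected to resemble the first peptide more than the second.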
Seligmann [6] argued that, while the number of high energy molecules is an important part of the energetic
investment of synthesising an amino acid, this approach is unlikely to explain the entire investment made
by a cell when producing an amino acid. Instead, Seligmann [6] used the molecular weight of an amino acid
as a proxy for energetic costs, reasoning that this may take into account all the manifold effects of the
complexity of producing larger amino acids. Molecular weight also has the advantage of being constant
across species, and therefore can be used to test the cost selection hypothesis where the genome sequence is
available but the topology of amino acid synthetic pathways is unknown. Seligmann [6] used this to show that,
on an individual protein basis, molecular weight is indeed minimised across a range of bacterial and
eukaryotic genomes. Estimating the cost of an amino acid based on its molecular weight also raises the
issue of the potential costs of the atomic content in biomolecules. In their seminal work, Mazel &
Marliere [7] showed in the cyanobacterium Calothrix that abundant proteins expressed under sulphur
limiting conditions are depleted for sulphur-containing cysteine and methionine residues. Baudouin-Cornu et al. [8]
showed that enzymes in pathways scavenging sulphur, carbon, or nitrogen from the environment are
under-represented in terms of their atomic composition for that particular nutrient. Further research by
Bragg et al. [9] showed across 141 genomes that the sulphur content of the encoded protein varies widely
and is associated with environmental conditions of the species. All of these results indicate that atomic
composition may play an equally important role as biosynthetic cost in the evolution of protein sequence.
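The atomic content of a protein discussed above can be computed directly from its sequence. The sketch below counts sulphur and nitrogen atoms per residue using the standard chemical formulae of the amino acids; the function name and the toy sequence are hypothetical and serve only as an illustration.

```python
# Sulphur and nitrogen atoms per amino acid, from standard chemical formulae.
# Amino acids not listed contain no sulphur and exactly one nitrogen atom.
SULPHUR_ATOMS = {"C": 1, "M": 1}
NITROGEN_ATOMS = {"R": 4, "H": 3, "K": 2, "N": 2, "Q": 2, "W": 2}


def mean_atoms_per_residue(sequence, atom_table, default):
    """Average number of atoms of one element per residue of a protein."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    total = sum(atom_table.get(residue, default) for residue in sequence)
    return total / len(sequence)


toy_protein = "MCKR"  # hypothetical four-residue peptide
sulphur_per_residue = mean_atoms_per_residue(toy_protein, SULPHUR_ATOMS, default=0)
nitrogen_per_residue = mean_atoms_per_residue(toy_protein, NITROGEN_ATOMS, default=1)
```

A protein depleted in cysteine and methionine would show a sulphur value close to zero under this measure.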
Even if all of the atoms and energy required for protein synthesis could be accurately
predicted, this may not represent the true cost under all cellular or environmental conditions. Just as in
supply and demand economics, when a particular atom is scarce in the cell or environment, synthesis of
molecules abundant in this atom will be more expensive, in comparison to molecules where that atom is
under-represented. As an example of the effect of supply and demand in cellular economics, Varma et al. [10]
showed in E. coli that the "shadow price" of using molecules involved in energy production changes
according to the availability of oxygen. As the availability of oxygen decreases, the cost of its use rises,
while the cost of ethanol use decreases as the energy to redox ratio becomes less efficient. The field of
metabolic control analysis takes a similar approach, estimating the importance of an entity in a system, by
perturbing a given parameter and determining the resultant change in flux in the rest of the
system [11, 12]. Carlson [13] demonstrated in silico the importance of supply and demand by illustrating
that E. coli will favour the expression of pathways using inexpensive proteins in stress inducing
environments. In this report, we use a systems biology approach based on Flux Balance Analysis (FBA)
similar to that of Varma et al. [10] to estimate the cost of synthesising amino acids in the context of the
availability of nutrients. We first estimate the "relative" cost of synthesising an amino acid by examining
the sensitivity of growth rate to the required quantity of the amino acid per gram of biomass. We then
estimate a second "absolute cost" by multiplying the relative cost by the biomass requirement of the amino
acid in the FBA model. Similar to Varma et al. [10], who examined the sensitivity of growth to
small increases in oxygen availability, we use in silico models to estimate the effect that increasing the amino acid
requirement per gram of biomass has on the uptake of environmental nutrients. We calculated each of
these cost types for four nutrient limiting conditions (glucose, nitrogen, sulphur, and phosphorus) to
investigate how cost varies according to environmental conditions. As in previous studies [1, 3–6], we focus
on amino acid synthesis, because this allows us to analyse the effects of cost on gene expression.
Using our in silico systems approach, we analysed the joint effects of energetic and atomic costs on the
transcriptome, proteome and metabolome of S. cerevisiae. Our aim is to determine if there is a measurable
relationship between biosynthetic cost and gene expression that could indicate a possible role for cost
minimisation as a selective pressure on genomic regulatory systems. Our results show that biosynthetic
cost and atomic composition do indeed have a measurable relationship with gene expression, but that the
effects of cost are dependent on the level at which the gene expression hierarchy is considered. Importantly,
we also analyse transcriptomic and proteomic data directly to study gene expression, whereas previous
work has used codon usage bias as a proxy for gene expression [2, 3, 14], examined atomic composition for
only a small set of expressed genes [15], or attempted to predict gene expression based on amino acid
composition without respect to biosynthetic cost [16]. By analysing the joint effects of cost and atomic
content together with other factors such as codon usage bias and genomic tRNA count, we are able
to account for any possible correlations between variables, and thereby assess the importance of each
independently. We show that our systems biology approach explains a significant proportion of variation in
gene expression levels, independent of codon usage bias, tRNA gene number or atomic content. Our
relative measure of cost is poorly correlated with previously reported measures of cost, but also explains a
comparable amount of gene expression, suggesting that no single measure currently captures all aspects of
biosynthetic cost. We also extend cost analysis to levels of free amino acids in the metabolome, an
aspect of cellular economics that has not been considered in previous research, but which provides an
intimate link to protein synthesis.
Results
A systems biology approach to estimating the cost of amino acid synthesis
To estimate the cost of synthesising an amino acid in S. cerevisiae, we used the in silico genome-scale
metabolic model created by Duarte et al. [17]. For each amino acid, the required quantity for growth was
altered and the effect on one of four nutrient uptake fluxes was measured. These uptake fluxes were
glucose, ammonium, sulphate and phosphate. The biomass production rate for the model was fixed at a
constant value. For each amino acid, a small percentage increase and decrease in the requirement (mmol of
amino acid per gram of biomass produced) was applied. This change was applied to the S_ij position of the
stoichiometric matrix, where j corresponds to the biomass reaction, and i is the position of the amino acid
in the reaction. This allowed the cost of an amino acid relative to growth ("relative cost") to be measured
as the slope between the percentage change in amino acid requirement and the predicted uptake flux. A
per-molecule "absolute cost" was then calculated for each amino acid by multiplying the relative cost by
the biomass requirement in the FBA model. Details of the relationship between these two estimates of cost
are outlined in the Materials and Methods. An alternative method to estimate the cost of an amino acid
would be to fix nutrient influx and determine the effect on maximal growth rate; however, our approach of
fixing biomass and minimising influx has the advantage of scaling each cost to the same growth rate,
which allows costs to be compared on the same scale. The left hand side of Figure 1 shows previous cost
measures reported in the literature and our novel systems biology cost estimates. The right hand side
shows an agglomerative hierarchical clustering dendrogram of all cost measures listed in Table 1, based on
the Spearman's Rank correlation (Additional File 2). The correlation between absolute and relative costs
in S. cerevisiae and E. coli with Akashi and Gojobori's [2] energetic cost and molecular weight are also
shown in Additional File 3.
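The relative cost calculation described above can be illustrated on a toy network. The sketch below is not the genome-scale model of Duarte et al. [17]: the two amino acids "A" and "B", their glucose yields, and their biomass requirements are all hypothetical. The toy network is a simple linear pathway, so the minimal glucose uptake at fixed biomass production follows directly from mass balance; a genome-scale model would instead require a linear programming solver.

```python
# Hypothetical toy network: glucose is taken up and converted into two
# amino acids, which are then drained by the biomass reaction.
GLUCOSE_PER_MOLECULE = {"A": 2.0, "B": 5.0}   # glucose consumed per amino acid (made up)
BIOMASS_REQUIREMENT = {"A": 0.5, "B": 0.1}    # mmol per gram of biomass (made up)


def glucose_uptake(requirements, biomass_flux=1.0):
    """Minimal glucose uptake needed to sustain a fixed biomass flux.

    In this toy network steady-state mass balance fixes every flux, so the
    uptake is simply the glucose demand of each amino acid summed together.
    """
    demand = sum(GLUCOSE_PER_MOLECULE[aa] * req for aa, req in requirements.items())
    return biomass_flux * demand


def relative_cost(amino_acid, percent=1.0):
    """Slope of uptake flux against percentage change in biomass requirement."""
    def uptake_after(change_percent):
        perturbed = dict(BIOMASS_REQUIREMENT)
        perturbed[amino_acid] *= 1.0 + change_percent / 100.0
        return glucose_uptake(perturbed)

    # Central difference between a small increase and a small decrease.
    return (uptake_after(percent) - uptake_after(-percent)) / (2.0 * percent)


relative_costs = {aa: relative_cost(aa) for aa in BIOMASS_REQUIREMENT}
```

In this toy example "A" consumes less glucose per molecule than "B", yet it has the larger relative cost because the biomass reaction requires five times more of it.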
As expected if the limiting factor is the availability of atoms to create the molecule rather than energetic
limitation, we find that the absolute cost of an amino acid under S and N limitation is directly
proportional to the number of atoms of that nutrient in the molecule. The absolute costs of all amino acids
under P limitation are zero, in accordance with the fact that amino acids do not contain phosphorus atoms.
Absolute cost estimates under glucose limitation have Spearman correlation coefficients greater than 0.8
with Akashi and Gojobori's [2] energetic cost, Craig and Weber's [1] energetic cost, Wagner's [5]
respiratory energetic cost, and molecular weight (Additional File 2). Wagner's [5] fermentative energy cost
and Craig and Weber's [1] biosynthetic complexity show lower coefficients of 0.522 and 0.65, respectively,
but are still significantly correlated. These results show our absolute cost measure under glucose limitation
is in good agreement with the previous manually-curated measures described in the literature, and indicate
that the most relevant measure of amino acid biosynthetic cost is most likely a function of energetic
limitation in yeast.
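The rank correlations used to compare cost measures can be computed as in the following sketch, which implements Spearman's coefficient as the Pearson correlation of average ranks. The function names are hypothetical and the example vectors are made up; they are not values from Table 1.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up cost vectors: monotonically related but not linearly related.
cost_measure_one = [1.0, 2.0, 3.0, 4.0]
cost_measure_two = [1.0, 4.0, 9.0, 100.0]
rho = spearman(cost_measure_one, cost_measure_two)
```

Because only ranks are compared, two cost measures on very different scales can still be perfectly correlated under this statistic.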
Our relative costs can be viewed as the absolute cost of synthesising the amino acid, scaled by its use in
the proteome. For example, cysteine and methionine have the same absolute cost under sulphur limiting
conditions, but in relative terms the cost of methionine is much greater because it is used more in the
proteome. A similar pattern is also observed for histidine and lysine, whose rank order absolute costs switch
when scaled by proteome use. As expected, phosphate limitation does not show any effect on the relative
cost of amino acid synthesis. Compared with previously reported cost measures, relative cost under glucose
limitation shows no significant correlation with any previously described cost measure (all p > 0.05). The
highest Spearman coefficient between relative cost under glucose limitation and any other literature
dataset is 0.077 (p = 0.49, with Wagner's [5] fermentative energetic cost), indicating that our relative cost
measure under glucose limitation has little in common with previous descriptions of amino acid cost.
We used similar measures to calculate amino acid cost using the iJR904 model of the E. coli metabolic
network [18] to estimate the generality of our results between species and FBA models. Absolute costs of
synthesis are highly correlated between species under glucose limitation (Spearman R = 0.94, p = 0), as are
relative costs under glucose limitation (Spearman R = 0.74, p < 0.001). The high correlation of absolute
cost is expected given the conservation of core metabolic pathways across species [19], whereas the lower
correlation of relative costs may arise from differences in amino acid composition of the proteome across
species. The estimated costs for E. coli are also included in Additional File 3 for illustrative purposes, and
demonstrate the general applicability of our method to any species with a genome-scale metabolic model.
The cost of amino acid synthesis on the yeast transcriptome, proteome and metabolome
Transcriptome
We investigated the capacity of absolute and relative cost under glucose limited growth, as well as each
previously reported cost measure, to explain transcript expression levels in each of the four nutrient
limiting environments from the S. cerevisiae dataset of Castrillo et al. [20] using multivariate regression.
The expression of each transcript was modelled as a function of the codon adaptation index (CAI) of the
coding sequence, average tRNA gene number, the mean energetic cost per residue of the protein, and the
mean atomic composition per residue of the protein. We included CAI and tRNA gene number in the
model since these factors are known to correlate with gene expression levels [21, 22], and including them allows us to
demonstrate an independent effect of cost that controls for these factors. We note that we do not imply
that CAI or genomic tRNA count are causal factors in gene expression; however, these variables have been
shown to be correlated with cost and/or gene expression [2, 22], and we aim to consider the
importance of each separately. Each of the selected cost types was cycled as the cost variable in the regression
equation. Only the relative cost under glucose limitation was used, as this is the environment most relevant
to yeast [20, 23, 24] and the other costs under P, N and S limitation are proportional to atomic composition,
which is already included in the model. Table 2 shows the explanatory power for the full regression model
for each cost type in predicting transcript levels. All models explain approximately 40% of the variation in transcript
levels across genes, with the difference in variation explained by the best and worst model being only 4.5%.
Using Akaike's Information Criterion (AIC) [25] the importance of the variables in the regression equation
was measured by removing each in turn, then comparing the goodness of fit with the model containing all
terms. Figure 2A compares the importance of each variable with other variables in the same model for
each cost type. Compared to characteristics of the encoded protein, the CAI of the transcript is at least
half an order of magnitude more important than the nearest explanatory variable, regardless of which cost
type is included. This result supports the well-established fact that codon bias correlates with gene
expression levels in growing yeast cells [21, 26, 27]. The dominant influence of CAI over other factors also
explains why the different cost types do not yield substantially different predictive power in the model. In
the molecular weight model which has the greatest explanatory power (42.2%), the most important
individual variable for predicting transcript level after CAI is cost, and carbon content is the third most
important. A general trend across all models is that the most important variable after CAI is either cost,
carbon content or nitrogen content. The importance of tRNA gene number on transcript levels appears
relatively fixed regardless of which cost is used. Finally, average sulphur content appears to be the least
predictive measure of transcript levels.
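The variable removal procedure can be sketched as follows: fit the full regression, drop each predictor in turn, and record how much AIC worsens. This is a minimal illustration under stated assumptions, not the analysis code used here. The data are synthetic, the predictor names "cai" and "cost" and their effect sizes are hypothetical, and AIC is computed for a Gaussian linear model up to an additive constant as n ln(RSS/n) + 2k.

```python
import math
import random


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def aic(columns, y):
    """AIC (up to a constant) of a least-squares fit of y on columns + intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y[r] for r, row in enumerate(design)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((y[r] - sum(b * v for b, v in zip(beta, row))) ** 2
              for r, row in enumerate(design))
    return n * math.log(rss / n) + 2 * k


def delta_aic(predictors, y):
    """Increase in AIC when each predictor is removed from the full model."""
    full = aic(list(predictors.values()), y)
    return {name: aic([col for other, col in predictors.items() if other != name], y) - full
            for name in predictors}


# Synthetic data: expression depends strongly on "cai" and weakly on "cost".
random.seed(0)
n_genes = 300
predictors = {"cai": [random.gauss(0, 1) for _ in range(n_genes)],
              "cost": [random.gauss(0, 1) for _ in range(n_genes)]}
expression = [2.0 * predictors["cai"][i] - 0.5 * predictors["cost"][i] + random.gauss(0, 1)
              for i in range(n_genes)]
importance = delta_aic(predictors, expression)
```

A large positive value means the fit deteriorates when the variable is removed; a negative value would mean that the simpler model is preferred.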
Proteome
The importance of cost in explaining protein levels was also modelled using multivariate regression followed
by variable removal. To analyse the effect of cost on gene expression at the protein level, we used data
from Ghaemmaghami et al. [28], who measured expression of TAP-tagged proteins by antibody detection, since protein
expression levels from Castrillo et al. [20] were measured relative to a background (see Methods for
details). Table 2 illustrates the explanatory power of each cost model to predict protein expression levels.
As with the transcript data, each model explains approximately 40% of gene expression, and the
difference in explained variation between the best and worst model is very small (0.8%), relative to the
overall variance explained.
Figure 2B shows the relative importance of each factor in the multivariate regression model for protein
levels. This analysis illustrates similar trends to that of the transcript data where CAI is, by an order of
magnitude, the most important factor in the model. This is not unexpected given that in the original
paper, Ghaemmaghami et al. [28] showed Spearman R = 0.57 for the relationship between CAI and protein
expression. The best fit model uses absolute cost under glucose limited conditions, in which biosynthetic
cost, carbon content and nitrogen content all have a similar importance in explaining variation. Sulphur
content is again the least important variable. This is a similar trend to the transcript data where generally
(i.e. across all models) biosynthetic cost, carbon content and nitrogen content all have similar importance
in explaining variation in gene expression levels, and sulphur content is the least important. However, the
importance of tRNA gene number and sulphur content is more variable than in the transcript data, and
in some instances their removal improves model parsimony, as indicated by a negative change in AIC.
Metabolome
The availability of comprehensive metabolomic data for S. cerevisiae from Castrillo et al. [20], allows us to
determine if atomic and energetic costs are important in maintenance of free amino acid levels in the cell.
Using similar multivariate regression and variable removal, we investigated the importance of each variable
in explaining free amino acid levels. We used the same factors as the previous two analyses, with the
exception of CAI (which is not applicable to amino acids). The main difference between the analysis of the
metabolomic data and that of either the transcriptomic or proteomic data is the smaller number of data points for free
amino acids than for transcripts and proteins. Table 2 shows how much of the variance in free
amino acid level is explained by each of the multivariate models. In contrast to transcript or protein levels,
cost type shows the greatest range in explaining variation in free amino acid levels, with R² coefficients
ranging from 76.7% for molecular weight, to 87.5% for relative cost under glucose limitation. The
explanatory power of these models at the metabolomic level is remarkable given that CAI was not
included, and is due only to the effects of energetic costs, atomic costs and genomic tRNA gene number.
Figure 2C shows the importance of each cost type in explaining free amino acid levels. The general trend
across these models is that all variables appear important, though there is more variability for carbon and
sulphur content. In particular under glucose limitation, carbon and sulphur content are less important and
therefore the explanation of free amino acid levels can be attributed to nitrogen content, tRNA gene
number, and the relative cost of amino acid synthesis.
Discussion
The principal achievements of this work are twofold. First, we developed a novel method to estimate the
cost of amino acid synthesis using a systems biology approach that incorporates sensitivity analysis and
flux balance analysis of genome-scale metabolic models. We compared our novel estimates of amino acid
costs to six measures reported in the literature and showed that absolute cost under glucose limitation is
highly correlated with previous cost measures, while relative cost under glucose limitation is not.
Furthermore we showed that our systems biology approach can be applied to calculate environment-specific
biosynthetic costs, which highlighted the effects of limiting elements of amino acid cost. Second, we
investigated the utility of energetic cost measures in conjunction with atomic costs and other factors to
analyse transcript, proteomic, and metabolomic data from S. cerevisiae. Our analysis shows that amino
acid costs do show an association with gene expression, but explain only a minor component of transcript
and protein levels relative to factors related to translational optimisation such as CAI. In contrast, we find
that energetic and atomic costs do explain a substantial degree of the variation in levels of free amino acids
in the metabolome.
No single currency for amino acid biosynthetic cost
Our systematic review and comparison of energetic cost types previously described in the literature (Table
1) shows that they are highly correlated with one another. Among previously reported measures, molecular
weight is the least related (Figure 1), which is expected since the other energetic cost measures are based
on manual curation of metabolic networks. This finding supports the view of Seligmann [6] that the
molecular weight of an amino acid includes investments by the cell that may not easily be estimated from
the metabolic network alone. Nevertheless, molecular weight and biosynthetic cost based on curated
metabolic networks are highly correlated (see also [6]). Of the two costs estimated from a glucose limited
state, which is probably most relevant to yeast biology, our absolute cost measure correlates with those
previously described in the literature, confirming previous cost measures and validating our general
approach to estimating biosynthetic cost. Our absolute cost measure, like all previously reported cost
measures (with the exception of Wagner's fermentative measure [5]), points to tryptophan as being the
most expensive amino acid for the cell to produce (Table 1). Tryptophan is considered expensive because
of its complex double ring structure and the number of high energy molecules required for its synthesis and
is (along with methionine) unusual in that it is encoded by only one codon in the genetic code.
In contrast to our absolute cost in glucose limitation, the corresponding relative cost shows little
relationship with any previously described cost metric under the same conditions, and provides a novel
perspective on how to measure the cost of amino acid biosynthesis. Under glucose limitation, relative cost
shows leucine and lysine to be the most expensive amino acids and tryptophan is estimated as one of the
cheapest, in contrast to other previously reported cost measures (see above). Our relative cost measure
reflects the absolute cost of synthesising the amino acid, scaled to its use in the proteome. Therefore
although a tryptophan molecule may be expensive to produce individually, its low usage in the proteome
makes it cheaper to maintain overall at the cellular level. An interesting observation is that carbon limited
relative cost and nitrogen limited relative cost are correlated (Spearman R = 0.63, p = 0.003). We
speculate that this may be a possible selective advantage as any mutations to minimise biosynthetic cost
under carbon limiting conditions would also have an effect to minimise cost under nitrogen limiting
conditions at the same time, and vice versa.
To test whether absolute or relative cost may have shaped the long-term evolution of yeast genes, we
compared the cost of each amino acid estimated under glucose limitation with their proportional usage in
the genome (Figure 3). It is important to note that our relative costs are estimated using dry weight amino
acid biomass composition in the FBA model, not amino acid usage in the genome, and therefore these two
datasets are not intrinsically correlated. We find a high correlation between relative cost and amino acid
usage in the genome (Spearman R = 0.65, p = 0.0021), but not for absolute cost (Spearman R = -0.37, p = 0.1053).
This result supports the observation that certain amino acids in S. cerevisiae are more likely to
appear in highly expressed proteins noted by Jansen & Gerstein [29], who suggested that this could be
related to their cost of synthesis. An interesting exception to the relationship between glucose limited
relative cost and usage in the genome is that serine does not follow the proportional use versus cost trend.
Serine was also previously identified as an outlier in an analysis of the relationship between cost and rates
of amino acid substitution [30]. We speculate that there are biological reasons why serine may be less
costly than expected relative to other amino acids based on its usage in yeast genes. Serine is involved in
nucleic acid synthesis, as well as that of glycine and cysteine, therefore additional demand for serine may
be buffered by the many pathways to which it is linked. Alternatively, the low cost and the fact that six
codons encode this amino acid may permit serine to be used at relatively high abundance in unconstrained
positions of proteins.
While it is clear that no single measure may fully capture all aspects of the cost of amino acid synthesis, we
believe our systems biology method for computing amino acid cost has a number of advantages over
previous methods. Given a genome-scale FBA model, computationally generated cost measures require no
manual curation and allow cost calculations that are more explicitly replicable. Moreover, use of a
computational model allows costs to be calculated under a variety of nutrient conditions, permitting a
more flexible approach to exploring costs under different cellular and environmental conditions.
Additionally, we believe our approach takes into account the whole cellular state, including all simulated
reactions and metabolites, not just those between substrates and products in amino acid metabolism.
Furthermore, as more information is included in genome scale models, the in silico predictions of amino
acid cost may come to more closely represent their costs in vivo. In particular the inclusion of
thermodynamic constraints in the S. cerevisiae model, as has been done in E. coli [31], would be of
particular relevance when estimating amino acid biosynthetic cost. Possible drawbacks are that a
species-specific FBA model must be available to perform the analysis, though our results show that
absolute cost of synthesis is conserved across species, while relative requirements may vary. Secondly, the
FBA estimated cost of an amino acid may be dependent on the objective function used in the model, for
example do we assume that the S. cerevisiae growth strategy is to maximise biomass, or another function
such as to maximise ATP yield? Work by Schuetz et al. [32] on this problem showed that the biological
relevance of the objective function is dependent on the environment considered. The analysis of the
transcript and proteomic data showed relatively little difference depending on which cost
measure was used. Therefore, we would expect that further detailed objective function investigation would
have little impact on these results. In the analysis of the metabolomic data the choice of objective function
may be more relevant, as the metabolome will likely be very sensitive to changes in the environment. Future
work could therefore address how costs vary between organisms that have evolved and adapted their
proteome to markedly different environmental conditions, with specific analysis of organisms that would
likely have different objective functions given the environment. In addition it may also be interesting to
consider the impact of different flux optima on determining amino acid cost. Of particular relevance to this
is an exhaustive study of flux optima by Reed and Palsson [33] which showed that a significant portion of
network flexibility associated with different optima is observed in energy metabolism, which could point to
variability in amino acid cost dependent on the flux distribution phenotype.
Translational optimisation over cost minimisation
At the transcript and protein levels, our models explain approximately 40% of the variation in
expression (Table 2). Overall, codon usage bias is the most important factor for explaining variance in gene
expression levels, and the other factors only show a limited impact on model fit. Therefore, of the variance
in gene expression explained by our models, the majority is due to optimisation of the coding sequence for
translation rather than cost minimisation. Nevertheless, we can demonstrate a small effect of amino acid
cost on transcript and protein levels that is independent of codon usage bias (Table 2, Figure 2).
Importantly, the rank ordering of these constraints on gene expression could not be made in previous
studies that used codon usage bias as a proxy for gene expression levels [2]. A secondary role for cost in
gene expression is consistent with the results of Dekel and Alon [34], who compared the benefit of increased
enzymatic activity with the cost of expressing the protein, both in terms of their effect on growth rate. These
authors showed that the expression of the lac operon attenuated to the optimal concentration given the
environmental lactose concentration, indicating that expression levels are a more significant factor than any
optimisation of a per molecule cost. The small effect of cost, regardless of which cost measure is used in
the model, may also be expected, as we have shown that the measures are all highly correlated, and therefore little
variation exists between them. The exception to this is the relative cost measure under glucose
limited conditions, which shows no correlation with previous measures, yet still explains a significant degree
of variation in gene expression levels. This indicates that the physiological maintenance of amino acids in
the proteome, and not just their absolute cost, is an important factor in considering the cost of gene
expression.
For the analysis of the metabolomic data, the variation in the R² values between models is much greater
than observed at the transcript or protein levels. One possible explanation for this is the reduced number
of data points (N = 184 [20]), compared with the large transcript (N = 36264 [20]) and protein (N =
2204 [28]) data sets used in the previous analyses. In contrast to transcript and protein levels, the model
that explained the greatest variation in free amino acid levels was based on our relative cost measure,
demonstrating the value of this model for interpreting energetic investment at the metabolomic level. This
is of interest as the relative cost relates the cost of amino acid synthesis to its usage in the proteome, therefore
highlighting a possible link between the maintenance of free amino acid levels and their usage
in the genome.
Conclusions
We have conducted a systematic investigation of the hypothesis that the cost of synthesising amino acids
has shaped the evolution of protein primary structure and gene expression in yeast. Our analysis indicates
that cost plays a role, but not as large as might be expected given that a predicted 80% [5] of the cellular
ATP budget is devoted to protein synthesis. Instead, our research shows that codon usage bias, and
therefore translational efficiency, is a more dominant factor in the evolution of gene expression. We believe
this indicates that the optimisation of translation outweighs any benefits that would be gained from the
use of cheaper amino acids. This is further illustrated by our analysis of the metabolomic data, where the
cost measure that shows the greatest explanatory power is highly correlated with the usage of amino acids
in the proteome.
Materials and Methods
Simulating a genome scale model of metabolism
The S. cerevisiae and E. coli genome scale models are each a matrix detailing the stoichiometry of a set of
metabolic reactions, that is the ratios of metabolites used and produced in each reaction. Each model
matrix S is of size m × n, where m is the total number of metabolites, and n is the total number of
metabolic reactions. The position S_ij in the matrix gives the coefficient of metabolite i in reaction j. A
positive value indicates the metabolite is produced, while a negative value indicates the metabolite is
consumed. A value of 0 indicates the metabolite does not participate in the reaction.
Flux balance analysis of a genome scale model aims to solve the equation S · v = 0 using linear
programming, where v is a vector of predicted fluxes, i.e. reaction rates, and is therefore of
length n. Multiple solutions may exist for v, so the flux through a biologically relevant reaction is usually
optimised, such as production of biomass or ATP. Additional constraints may be placed on the model, for example that certain
fluxes may not be negative, indicating that the reaction may only proceed in the forward direction. Setting
additional constraints on the reactions that move nutrients from the exterior to the interior of the cell can
also be used to simulate the availability of different nutrients in the environment.
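As a minimal sketch of the steady-state constraint S · v = 0, the following Python example builds a hypothetical three-reaction linear chain (it is not part of iND750 or iJR904) and checks whether a flux vector balances every metabolite within its bounds. The network, the bounds and the function names are all assumptions made for illustration, and fluxes are taken as positive for simplicity. Because every steady-state flux in a linear chain is equal, the optimum is simply the tightest upper bound; a real genome scale model needs a linear programming solver such as the one used through the COBRA toolbox.

```python
# Toy stoichiometric matrix S (rows: metabolites A, B; columns: reactions).
# Hypothetical reactions: uptake (-> A), conversion (A -> B), biomass (B ->).
S = [
    [1, -1, 0],   # A: produced by uptake, consumed by conversion
    [0, 1, -1],   # B: produced by conversion, consumed by biomass
]
UPPER_BOUNDS = [10.0, 1000.0, 1000.0]  # assumed maximum flux per reaction


def steady_state_residual(S, v):
    """Return S . v, which is all zeros when every metabolite is balanced."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_feasible(S, v, upper, tol=1e-9):
    """Check S . v = 0 and 0 <= v_j <= upper_j (irreversible reactions)."""
    balanced = all(abs(r) <= tol for r in steady_state_residual(S, v))
    bounded = all(0.0 <= v_j <= u_j for v_j, u_j in zip(v, upper))
    return balanced and bounded


def max_biomass_linear_chain(upper):
    """In a linear chain all steady-state fluxes are equal, so the maximal
    biomass flux is set by the smallest upper bound in the chain."""
    flux = min(upper)
    return [flux] * len(upper)


v_opt = max_biomass_linear_chain(UPPER_BOUNDS)
print(v_opt, is_feasible(S, v_opt, UPPER_BOUNDS))
```

Here the uptake bound of 10 limits the whole chain, which mirrors how constraining a nutrient uptake reaction simulates a nutrient limited environment.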
A systems biology approach to estimating the cost of amino acid synthesis
Flux balance analysis was performed using the COBRA toolbox [35] and the lpsolve library [36], running in
the MATLAB environment. The genome scale models used were iND750 for S. cerevisiae [17] and iJR904
for E. coli [18]. Costs between species and environment models were compared by fixing biomass flux to a
constant value. The units used in the model are mmol of reactant per gram of biomass per hour.
For each of the twenty amino acids, all of which are included in the biomass reaction, we altered in turn the
requirement of each for the production of biomass, i.e. the entry S_ij in the stoichiometric matrix where i is the
position of the amino acid and j is the biomass reaction. The perturbation ranged from a 0.0002% increase in
requirement to a 0.0002% decrease, at 0.0001% intervals. For each perturbation, biomass production flux
was fixed at 0.05, and the model was solved to maximise one of the four input fluxes: glucose, ammonium,
sulphate, and phosphate. The figure 0.05 was chosen as an arbitrary value for biomass flux, as it was
much smaller than the possible magnitude of the intake fluxes (-1000), making them effectively unbounded, and as
shown, our results using this value are consistent with previous estimates of amino acid cost. Because intake
fluxes are represented as negative values, maximising the uptake flux of a given nutrient is equivalent to
finding the solution with the minimal flux of that
molecule into the cell. The aim of this was to simulate the expense of the amino acid under a given
nutrient limiting environment, while scaling each cost to the same growth rate. The relative cost for a
given amino acid, under a given nutrient limitation, was then estimated as the slope between amino acid
requirement (x) and the corresponding nutrient uptake flux (U), at the level of requirement defined in the
model (x_0). As the relative cost is estimated from a percentage change in amino acid requirement, it can be
scaled to an absolute per molecule cost by multiplying by 100/x_0. The derivation of this relationship is
shown below. The code used in this analysis is available in the supplementary materials.
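$$\text{Absolute Cost} = \left.\frac{dU}{dx}\right|_{x_0} = U'(x_0)$$

$$\text{Relative Cost} = \left.\frac{d}{dx}\,U\!\left(x_0\left(1 + \frac{x}{100}\right)\right)\right|_{x=0} = \frac{x_0}{100}\,U'(x_0)$$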
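The relationship between the relative and absolute cost can be checked numerically. The sketch below is illustrative only: `uptake_flux` is a hypothetical linear stand-in for solving the flux balance model, and the values of `x0`, `cost_per_unit` and `baseline` are invented. It perturbs the requirement over the same percentage range described above, estimates the relative cost as a least-squares slope, and rescales it by 100/x_0.

```python
def uptake_flux(requirement):
    """Hypothetical stand-in for an FBA solve: the nutrient uptake needed at a
    given amino acid requirement. A real analysis would solve the model."""
    cost_per_unit = 3.0   # assumed true per molecule cost (arbitrary units)
    baseline = 1.5        # assumed uptake needed for all other biomass
    return baseline + cost_per_unit * requirement


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


x0 = 0.25  # hypothetical biomass requirement of one amino acid
percent_changes = [-0.0002, -0.0001, 0.0, 0.0001, 0.0002]
fluxes = [uptake_flux(x0 * (1 + p / 100)) for p in percent_changes]

# Relative cost: change in uptake per percent change in requirement.
relative_cost = least_squares_slope(percent_changes, fluxes)
# Absolute cost: rescaled to a per molecule change in requirement.
absolute_cost = relative_cost * 100 / x0
print(relative_cost, absolute_cost)
```

In this toy case the recovered absolute cost should match the assumed `cost_per_unit` up to floating-point error, since the stand-in function is exactly linear.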
Determination of transcript, protein, and amino acid characteristics
The Codon Adaptation Index (CAI) for each S. cerevisiae gene was taken from Wall et al. 2005 [37], and tRNA
gene number was taken from Akashi [22]. Previously reported amino acid energetic costs were obtained
from Craig & Weber [1], Akashi & Gojobori [2], Wagner [5], and Seligmann [6]. For each gene, the average
tRNA gene number, energetic cost, or atomic cost was computed as the sum of the count or cost over the
encoded protein, divided by its length, excluding stop codons. Prior to analysis, each of these variables was
transformed by the natural logarithm, then scaled to have the same mean and variance. This was to
reduce excess variation and heteroscedasticity that could bias model estimation. Scaling was performed by
subtracting the mean, then dividing by the root mean square for each data set. For the metabolomic data
set, a small constant (0.0001) was added to sulphur content so that this variable could be logged.
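The per-gene averaging and transformation steps can be sketched as follows. This is an illustration under stated assumptions, not the code used in the analysis: the cost values are the Akashi & Gojobori (2002) column of Table 1 keyed by one-letter amino acid code, the three protein sequences are invented, and the "root mean square" of the centred values is taken to be the sample standard deviation.

```python
import math
import statistics

# Akashi & Gojobori (2002) energetic costs from Table 1, by one-letter code.
COST = {
    "A": 11.7, "R": 27.3, "N": 14.7, "D": 12.7, "C": 24.7,
    "Q": 16.3, "E": 15.3, "G": 11.7, "H": 38.3, "I": 32.3,
    "L": 27.3, "K": 30.3, "M": 34.3, "F": 52.0, "P": 20.3,
    "S": 11.7, "T": 18.7, "W": 74.3, "Y": 50.0, "V": 23.3,
}


def mean_cost(protein):
    """Average cost per residue: summed cost divided by protein length."""
    return sum(COST[aa] for aa in protein) / len(protein)


def log_and_scale(values):
    """Natural-log transform, then subtract the mean and divide by the
    sample standard deviation (an assumed reading of the scaling step)."""
    logged = [math.log(v) for v in values]
    centre = statistics.mean(logged)
    spread = statistics.stdev(logged)
    return [(x - centre) / spread for x in logged]


# Hypothetical protein sequences with stop codons already excluded.
proteins = ["MKWF", "GASG", "MLLKV"]
costs = [mean_cost(p) for p in proteins]
scaled = log_and_scale(costs)
print(costs)
print(scaled)
```

After scaling, every variable has mean zero and unit variance, so regression coefficients for different cost measures are on a comparable scale.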
Determining explanatory power of factors in transcript, protein, and metabolite data
Multiple regression was used to measure the importance of atomic and energetic cost on transcript and
protein expression using the R statistical computing language [38]. For each data set, a multiple regression
model was fitted. The measured quantity of each transcript, protein, or metabolite was treated as the
response variable, and atomic cost, energetic cost, and the CAI (if applicable) were used as explanatory
variables. Atomic cost consisted of three independent variables: carbon, nitrogen and sulphur content.
Experimental conditions that differed among replicates in the datasets were treated as fixed effects in the
model, and included as interaction terms. Initially, all possible interaction terms were considered, and
automated step-wise regression was used to remove superfluous interaction terms based on a penalised
log-likelihood score, Akaike's Information Criterion (AIC) [25].
To estimate the importance of each of the equation parameters, the data set was modelled without the
variable in question, and then compared to the model containing all terms, again using AIC. For example,
to estimate the importance of nitrogen in the Castrillo et al. 2007 [20] data set, the data were first
modelled using all factors - environment, dilution rate, CAI, tRNA gene count, energetic cost, nitrogen,
carbon and sulphur content. The importance of nitrogen was then determined by repeating the data
modelling with the same variables, except nitrogen content. The contribution of nitrogen content to
explaining the variation was then estimated from the difference in AIC between the model without nitrogen and
the model containing all terms. This process was performed for all factors in the equation, and then
repeated with each energetic cost estimate in turn as the cost variable in the equation.
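The drop-one comparison can be illustrated with a small self-contained example. The analysis itself was carried out in R; the sketch below is a pure-Python stand-in on synthetic data, with ordinary least squares fitted through the normal equations and AIC computed as n ln(RSS/n) + 2k with additive constants dropped. The variable names `cai` and `cost`, the data values and the helper functions are all assumptions for illustration, and interaction terms and fixed effects are omitted.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def aic_of_fit(columns, y):
    """Fit y ~ intercept + columns by least squares and return the AIC,
    computed as n * ln(RSS / n) + 2k (additive constants dropped)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * y_i for row, y_i in zip(design, y)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)
    rss = sum((y_i - sum(b * x for b, x in zip(beta, row))) ** 2
              for row, y_i in zip(design, y))
    return n * math.log(rss / n) + 2 * k


def drop_one_importance(predictors, y):
    """AIC increase when each predictor is left out of the full model."""
    full = aic_of_fit(list(predictors.values()), y)
    result = {}
    for name in predictors:
        reduced = [col for other, col in predictors.items() if other != name]
        result[name] = aic_of_fit(reduced, y) - full
    return result


# Synthetic data: expression depends strongly on cai and weakly on cost.
cai = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
cost = [3.0, 1.0, 1.0, 3.0, 3.0, 1.0, 1.0, 3.0]
noise = [0.05, 0.05, -0.05, -0.05, -0.05, -0.05, 0.05, 0.05]
expression = [1.0 + 2.0 * c - 0.1 * k + e for c, k, e in zip(cai, cost, noise)]

importance = drop_one_importance({"cai": cai, "cost": cost}, expression)
print(importance)
```

A larger AIC increase on removal indicates a more important variable; in this synthetic example removing `cai` degrades the fit far more than removing `cost`.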
Experimental data
Experimental transcriptomic, proteomic and metabolomic data used in this analysis are from Castrillo et al.
2007 [20], and an additional proteomic dataset is from Ghaemmaghami et al. 2003 [28]. Briefly, the
Castrillo et al. 2007 [20] experiments continuously cultured S. cerevisiae in a chemostat under four
nutrient limiting conditions and three (two for protein data) dilution rates, for a total of twelve (eight for
protein) different experimental conditions. The transcript data were produced from replicate microarray
analysis of total RNA, and were processed by robust multi-array (RMA) quantile normalisation [39].
Proteomic data were produced using Isotope Tags for Relative and Absolute Quantification (iTRAQ)
LC-MS/MS, standardised relative to a standard pool sample and normalised by median absolute
deviation. Metabolomic data were obtained by GC/TOF-MS and also normalised using median absolute
deviation; missing values were inferred from replicates in the same conditions.
As the protein data from Castrillo et al. [20] measured up/down regulation of a protein against a
background, which is not suitable as a measure of absolute protein expression levels, we instead used data
from Ghaemmaghami et al. 2003 [28] for our analyses of cost on protein expression. This reasoning was
borne out by the small explanatory power (R² < 3%) for any cost measure model using the protein data
from Castrillo et al. [20]. Protein expression data from Ghaemmaghami et al. 2003 [28] is based on tandem
affinity purification (TAP) of TAP-tagged S. cerevisiae ORFs. Expression levels for each protein were
determined using antibody-tag based quantification. These data were converted to absolute protein
molecules per cell using a purified E. coli INFA-TAP construct standardised against the range of yeast
TAP tag protein observations.
For the model analysis, metabolite levels were averaged within each experimental condition to prevent
pseudo-replication of observations. Protein and metabolite levels were logged then scaled. Transcript levels
were scaled, but not logged, as they had already been logged in the original processing. The rationale for
these transformations, and the scaling method, are the same as described above.
Authors' contributions
MDB performed the research. MDB, MR and CMB analysed the data. BP contributed ideas and methods
to the development of the systems biology cost measures. DD and SGO provided supervision. MDB and
CMB wrote the manuscript. All authors read, revised and approved the final manuscript.
Acknowledgements
We thank Nick Gresham, Simon Oliver and Markus Herrgard for help implementing the COBRA toolbox;
Nick Gresham and Adam Huffman for support of the University of Manchester Faculty of Life Sciences
Bioinformatics Beowulf cluster; Evangelos Simeonidis for discussion of flux balance analysis; Leo Zeef, Juan
Castrillo, Pinar Pir and Andy Hayes for helpful discussion of the transcriptomic, proteomic and
metabolomic datasets; and Hans Westerhoff, Sam Griffiths-Jones, Simon Whelan and Laurence Hurst for
their critical comments on this work. This work is funded by NERC Ph.D. studentship
NER/S/R/2005/13609 to MDB, NERC advanced fellowship NE/B500190/1 to DD, NER/T/S/2001/00343
to SGO, and BBSRC grant BB/C008219/1 to SGO, MR, and CMB and others. This is a contribution
from the Manchester Centre for Integrative Systems Biology [40].
References
1. Craig CL, Weber RS: Selection costs of amino acid substitutions in ColE1 and ColIa gene clusters harbored by Escherichia coli. Mol Biol Evol 1998, 15(6):774-776.
2. Akashi H, Gojobori T: Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc Natl Acad Sci U S A 2002, 99(6):3695-3700.
3. Heizer EMJ, Raiford DW, Raymer ML, Doom TE, Miller RV, Krane DE: Amino acid cost and codon-usage biases in 6 prokaryotic genomes: a whole-genome analysis. Mol Biol Evol 2006, 23(9):1670-1680.
4. Swire J: Selection on synthesis cost affects interprotein amino acid usage in all three domains of life. J Mol Evol 2007, 64(5):558-571.
5. Wagner A: Energy costs constrain the evolution of gene expression. J Exp Zoolog B Mol Dev Evol 2007, 308(3):322-324.
6. Seligmann H: Cost-minimization of amino acid usage. J Mol Evol 2003, 56(2):151-161.
7. Mazel D, Marlière P: Adaptive eradication of methionine and cysteine from cyanobacterial light-harvesting proteins. Nature 1989, 341(6239):245-248.
8. Baudouin-Cornu P, Surdin-Kerjan Y, Marlière P, Thomas D: Molecular evolution of protein atomic composition. Science 2001, 293(5528):297-300.
9. Bragg JG, Thomas D, Baudouin-Cornu P: Variation among species in proteomic sulphur content is related to environmental conditions. Proc Biol Sci 2006, 273(1591):1293-1300.
10. Varma A, Boesch BW, Palsson BØ: Stoichiometric interpretation of Escherichia coli glucose catabolism under various oxygenation rates. Appl Environ Microbiol 1993, 59(8):2465-2473.
11. Fell D: Understanding the control of metabolism. Portland Press 1997.
12. ter Kuile BH, Westerhoff HV: Transcriptome meets metabolome: hierarchical and metabolic regulation of the glycolytic pathway. FEBS Lett 2001, 500(3):169-171.
13. Carlson RP: Metabolic systems cost-benefit analysis for interpreting network structure and regulation. Bioinformatics 2007, 23(10):1258-1264.
14. Das S, Ghosh S, Pan A, Dutta C: Compositional variation in bacterial genes and proteins with potential expression level. FEBS Lett 2005, 579(23):5205-5210.
15. Bragg JG, Wagner A: Protein carbon content evolves in response to carbon availability and may influence the fate of duplicated genes. Proc Biol Sci 2007, 274(1613):1063-1070.
16. Raghava GP, Han JH: Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein. BMC Bioinformatics 2005, 6:59.
17. Duarte NC, Palsson BØ, Fu P: Integrated analysis of metabolic phenotypes in Saccharomyces cerevisiae. BMC Genomics 2004, 5:63.
18. Reed JL, Vo TD, Schilling CH, Palsson BØ: An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol 2003, 4(9).
19. Brauer MJ, Yuan J, Bennett BD, Lu W, Kimball E, Botstein D, Rabinowitz JD: Conservation of the metabolomic response to starvation across two divergent microbes. Proc Natl Acad Sci U S A 2006, 103(51):19302-19307.
20. Castrillo JI, Zeef LA, Hoyle DC, Zhang N, Hayes A, Gardner DC, Cornell MJ, Petty J, Hakes L, Wardleworth L, Rash B, Brown M, Dunn WB, Broadhurst D, O'Donoghue K, Hester SS, Dunkley TP, Hart SR, Swainston N, Li P, Gaskell SJ, Paton NW, Lilley KS, Kell DB, Oliver SG: Growth control of the eukaryote cell: a systems biology study in yeast. J Biol 2007, 6(2):4.
21. Ikemura T: Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting transfer RNAs. J Mol Biol 1982, 158(4):573-597.
22. Akashi H: Translational selection and yeast proteome evolution. Genetics 2003, 164(4):1291-1303.
23. DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 1997, 278(5338):680-686.
24. Brauer MJ, Saldanha AJ, Dolinski K, Botstein D: Homeostatic adjustment and metabolic remodeling in glucose-limited yeast cultures. Mol Biol Cell 2005, 16(5):2503-2517.
25. Akaike H: A new look at the statistical model identification. IEEE Transactions on Automatic Control 1974, 19(6):716-723.
26. Brockmann R, Beyer A, Heinisch JJ, Wilhelm T: Posttranscriptional expression regulation: what determines translation rates? PLoS Comput Biol 2007, 3(3).
27. Jansen R, Bussemaker HJ, Gerstein M: Revisiting the codon adaptation index from a whole-genome perspective: analyzing the relationship between gene expression and codon occurrence in yeast using a variety of models. Nucleic Acids Res 2003, 31(8):2242-2251.
28. Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, Dephoure N, O'Shea EK, Weissman JS: Global analysis of protein expression in yeast. Nature 2003, 425(6959):737-741.
29. Jansen R, Gerstein M: Analysis of the yeast transcriptome with structural and functional categories: characterizing highly expressed proteins. Nucleic Acids Res 2000, 28(6):1481-1488.
30. Hurst LD, Feil EJ, Rocha EPC: Protein evolution: Causes of trends in amino-acid gain and loss. Nature 2006, 442(7105):E11-E12.
31. Feist AM, Henry CS, Reed JL, Krummenacker M, Joyce AR, Karp PD, Broadbelt LJ, Hatzimanikatis V, Palsson BØ: A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol 2007, 3.
32. Schuetz R, Kuepfer L, Sauer U: Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Mol Syst Biol 2007, 3.
33. Reed JL, Palsson BØ: Genome-scale in silico models of E. coli have multiple equivalent phenotypic states: assessment of correlated reaction subsets that comprise network states. Genome Res 2004, 14(9):1797-1805.
34. Dekel E, Alon U: Optimality and evolutionary tuning of the expression level of a protein. Nature 2005, 436(7050):588-592.
35. Becker SA, Feist AM, Mo ML, Hannum G, Palsson BØ, Herrgard MJ: Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat Protoc 2007, 2(3):727-738.
36. lp_solve: http://sourceforge.net/projects/lpsolve.
37. Wall DP, Hirsh AE, Fraser HB, Kumm J, Giaever G, Eisen MB, Feldman MW: Functional genomic analysis of the rates of protein evolution. Proc Natl Acad Sci U S A 2005, 102(15):5483-5488.
38. R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria 2006, [http://www.R-project.org]. [ISBN 3-900051-07-0].
39. Bolstad BM, Irizarry RA, Astrand M, Speed TP: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003, 19(2):185-193.
40. The Manchester Centre for Integrative Systems Biology: http://www.mcisb.org/.
Figures
Figure 1 - Comparison of amino acid cost estimates.
Amino acid cost estimates are shown as bar charts on the left-hand side. Each bar chart axis shows the
minimum and maximum value of each cost type, rounded to three significant figures. The correlations
between costs are compared in a dendrogram on the right-hand side, computed by complete-linkage agglomerative
clustering using Spearman's rank correlation distance between data sets (see Additional File 2).
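As an illustration of the distance underlying the dendrogram, the sketch below computes Spearman's rank correlation between two of the cost measures in Table 1 (Akashi & Gojobori (2002) and molecular weight, in the order ala to val). Taking the distance as one minus the correlation is an assumption made here, since the caption does not define it, and the helper functions are written for this example rather than taken from the analysis code.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Table 1 columns, amino acids in the order ala, arg, asn, ..., val.
akashi = [11.7, 27.3, 14.7, 12.7, 24.7, 16.3, 15.3, 11.7, 38.3, 32.3,
          27.3, 30.3, 34.3, 52, 20.3, 11.7, 18.7, 74.3, 50, 23.3]
mol_weight = [89.1, 174.2, 132.1, 133.1, 121.2, 146.2, 147.1, 75.1, 155.2,
              131.2, 131.2, 146.2, 149.2, 165.2, 115.1, 105.1, 119.1, 204.2,
              181.2, 117.2]

rho = spearman(akashi, mol_weight)
distance = 1 - rho  # assumed definition of the correlation distance
print(rho, distance)
```

Repeating this for every pair of cost measures would give the distance matrix that a hierarchical clustering routine takes as input.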
Figure 2 - Comparison of models and variable explanatory effects for transcript, protein and
metabolite data.
A multiple regression model was fitted to explain transcript, protein and metabolite levels using carbon,
nitrogen, sulphur, cost and the codon adaptation index (CAI). Each variable was then removed from
the model and the effect on model explanatory power was measured using Akaike's Information Criterion
(AIC). Eight different cost measures described in the text were used as the cost explanatory variable.
Figure 3 - Comparison of absolute and relative cost of amino acid biosynthesis and the percentage use
in the yeast genome.
'Loess' smoothing is used to indicate the trend line.
Tables
Table 1 - Predicted amino acid cost estimates
Datasets taken from the literature are indicated with a reference. The Akashi & Gojobori [2], Craig & Weber
energy [1], and the two Wagner [5] data sets are based on the curation of the number of high-energy
molecules used during synthesis, where a defined ratio is used to convert them into a single measure,
usually ATP. The Craig & Weber 'steps' measure [1] is based on the number of biosynthetic
steps between central metabolism and the produced amino acid. Molecular weight is in Daltons. Our cost
measures are the slope of the relationship between the amino acid requirement for growth and nutrient
uptake flux.
| Amino acid | S. cerevisiae Absolute | S. cerevisiae Relative | Akashi & Gojobori (2002) | Craig & Weber (1998) Energy | Craig & Weber (1998) Steps | Wagner (2005) Fermentative | Wagner (2005) Respiratory | Molecular Weight |
|---|---|---|---|---|---|---|---|---|
| ala | 0.00474 | 0.002174 | 11.7 | 12.5 | 1 | 2 | 14.5 | 89.1 |
| arg | 0.0132 | 0.00212 | 27.3 | 18.5 | 10 | 13 | 20.5 | 174.2 |
| asn | 0.00746 | 0.000757 | 14.7 | 4 | 1 | 6 | 18.5 | 132.1 |
| asp | 0.00586 | 0.00173 | 12.7 | 1 | 1 | 3 | 15.5 | 133.1 |
| cys | 0.0071 | 0.000047 | 24.7 | 24.5 | 9 | 13 | 26.5 | 121.2 |
| gln | 0.00874 | 0.00092 | 16.3 | 9.5 | 2 | 3 | 10.5 | 146.2 |
| glu | 0.0082 | 0.00247 | 15.3 | 8.5 | 1 | 2 | 9.5 | 147.1 |
| gly | 0.0029 | 0.000845 | 11.7 | 14.5 | 4 | 1 | 14.5 | 75.1 |
| his | 0.01386 | 0.000918 | 38.3 | 33 | 1 | 5 | 29 | 155.2 |
| ile | 0.01146 | 0.002205 | 32.3 | 20 | 11 | 14 | 38 | 131.2 |
| leu | 0.01146 | 0.00339 | 27.3 | 33 | 7 | 4 | 37 | 131.2 |
| lys | 0.01246 | 0.003562 | 30.3 | 18.5 | 10 | 12 | 36 | 146.2 |
| met | 0.01186 | 0.0006 | 34.3 | 18.5 | 9 | 24 | 36.5 | 149.2 |
| phe | 0.0174 | 0.00233 | 52 | 63 | 9 | 10 | 61 | 165.2 |
| pro | 0.0094 | 0.00155 | 20.3 | 12.5 | 4 | 7 | 14.5 | 115.1 |
| ser | 0.00468 | 0.000865 | 11.7 | 15 | 3 | 1 | 14.5 | 105.1 |
| thr | 0.0065 | 0.001243 | 18.7 | 6 | 6 | 9 | 21.5 | 119.1 |
| trp | 0.02264 | 0.000645 | 74.3 | 78.5 | 12 | 14 | 75.5 | 204.2 |
| tyr | 0.0168 | 0.001715 | 50 | 56.5 | 9 | 8 | 59 | 181.2 |
| val | 0.00908 | 0.0024 | 23.3 | 25 | 4 | 4 | 29 | 117.2 |
Table 2 - Adjusted R² coefficients for multiple regression models
The R² describes the fit of the model with tRNA gene count and all atomic and energetic factors to
experimental data. CAI is also included in the model for transcript and protein data. Each row represents
the specific cost factor used in that model.

| Cost type | Castrillo et al. 2007 Transcripts | Ghaemmaghami et al. 2003 Proteins | Castrillo et al. 2007 Metabolites |
|---|---|---|---|
| S. cerevisiae Absolute | 0.389 | 0.406 | 0.782 |
| S. cerevisiae Relative | 0.383 | 0.408 | 0.875 |
| Akashi & Gojobori (2002) | 0.398 | 0.405 | 0.805 |
| Craig & Weber (1998) Energy | 0.416 | 0.40 | 0.835 |
| Craig & Weber (1998) Steps | 0.375 | 0.404 | 0.866 |
| Wagner (2005) Respiratory | 0.382 | 0.405 | 0.822 |
| Wagner (2005) Fermentative | 0.377 | 0.407 | 0.851 |
| Molecular Weight | 0.422 | 0.405 | 0.767 |
Additional Files
Additional file 1 -- Amino acid costs
Amino acid cost data sets used in the analysis.
Additional file 2 -- Amino acid cost correlations
Spearman's rank correlations between cost data sets.
Additional file 3 -- Comparison of the genome scale model derived cost data sets.
Comparison of estimated amino acid cost with number of ATP and NADPH molecules used in synthesis
(left), and molecular weight (right). On the y axis are the amino acid costs estimated using flux balance
analysis. Both S. cerevisiae and E. coli measures are included to illustrate correlation of cost estimates
between species. Estimated cost values have been rescaled around their mean value to allow
comparisons across species. The trends in each plot are drawn using 'loess' smoothing.
Additional file 4 -- Tabulated transcript data set
The transcript data from Castrillo et al. 2007 tabulated with cost, atomic composition, tRNA gene number
and CAI.
Additional file 5 -- Tabulated protein data set
The protein data from Ghaemmaghami et al. 2003 tabulated with cost, atomic composition, tRNA gene
number and CAI.
Additional file 6 -- Tabulated metabolite data set
The metabolite data from Castrillo et al. 2007 tabulated with cost, atomic composition, and tRNA gene
number.
Additional file 7 -- Matlab code to estimate amino acid cost
The Matlab and COBRA [35] code used to estimate amino acid cost in this analysis.
[Figure 1 image. Bar chart panels, one per cost measure, over the twenty amino acids (ala to val): Carbon Limited Relative; Nitrogen Limited Relative; Molecular Weight; Wagner 2005 Fermentative; Craig & Weber 1998 Steps; Akashi & Gojobori 2002; Carbon Limited Absolute; Wagner 2005 Respiratory; Craig & Weber 1998 Energy; Nitrogen Limited Absolute; Sulphur Limited Relative; Sulphur Limited Absolute.]
[Figure 2 image. Panels: Castrillo 2007 Metabolites; Ghaemmaghami 2003 Proteins; Castrillo 2007 Transcripts. Axes: variable removal effect (log10 AIC distance); variable removed (sulphur, nitrogen, carbon, cost, tRNA, and CAI in the protein and transcript panels). Legend: Akashi & Gojobori 2002; Craig & Weber 1998 Energy; Craig & Weber 1998 Steps; Wagner 2005 Respiratory; Wagner 2005 Fermentative; Molecular Weight; Carbon Limited Relative; Carbon Limited Absolute.]
[Figure 3 image. Panels: Absolute; Relative. Axes: mean rescaled estimated amino acid cost; percent use in the yeast genome. Points are labelled by amino acid (ala to val).]