Prospects for the Directed Evolution of Cellulases for Biofuels Production

Caravaggio Caniglia
December 4, 2020

Submitted as coursework for PH240, Stanford University, Fall 2020

Introduction

Fig. 1: Structure of Cellulose, composed of monomeric cellobiose units that can, in turn, be broken down into glucose. (Source: C. Caniglia)

Cellulose (Fig. 1), the fibrous polymer responsible for the strength and indigestibility of wood, has long been considered an attractive precursor to alternative fuels such as bio-ethanol, with production dating back more than a century (the prefix simply indicates derivation from biological sources such as starch in plants, notably corn). [1] While starch, like cellulose, consists of long chains of glucose molecules joined by bridging oxygens in what are known as glycosidic bonds, the geometry of these linkages allows a variety of processes, in human and animal guts and in industrial reactors, to break down starch and use its individual sugar units for fuel. At present, very few biological or chemical processes can efficiently achieve the same effect with cellulose. [2] Because specialized enzymes called cellulases, which split cellulose into its glucose monomer units, are of great importance to the future development of plant-based ethanol fuels, it is important to understand how much improvement in the efficiency and yield of ethanol generation can be achieved by maximizing their reactivity. [3] Here, the possibility of improving efficiency through directed evolution, a biological technique used to increase the efficacy of enzymatic chemical reactions, is examined.

Cellulases

Cellulases, first discovered during the Second World War in fungi, are enzymes responsible for the degradation of the cell walls in plant cells, which consist of polymers formed from repeating units of simple sugars, most notably cellulose. [4] This discovery had major ramifications for our understanding of cellulose decomposition, a process that the aforementioned fungi accomplish 10^18 times more quickly than it occurs under ambient conditions. [3] The fungus discovered in World War II, Trichoderma reesei (T. reesei), is effective enough that it remains the go-to organism in industry. As biofuels have entered the renewable energy conversation, the use of cellulases to generate ethanol precursor sugars has received increased attention because, unlike the starch traditionally used to derive corn ethanol, cellulose is indigestible, and its use in fuel therefore does not present as immediate a conflict with the food supply.

Although cellulases are often spoken of collectively, the term encompasses a wide variety of enzymes responsible for diverse aspects of the cellulose breakdown process. [3] Of these, enzymes known as glycoside hydrolases or cellobiohydrolases, terms which simply indicate that they break the glycosidic bonds between every other glucose unit in a cellulose polymer, are responsible for the polymer's decomposition. [5] This leaves a monomer unit called cellobiose, which consists of two glucose molecules oriented differently in space and joined by an oxygen atom. This oxygen linkage can then be broken by β-glucosidase enzymes so that cellobiose is split into two glucose molecules. These can subsequently be turned into ethanol via fermentation. [5]

Glycoside hydrolases are of more interest than other cellulases because they systematically break apart the cellulose polymer from end to end. [5] Other enzymes that split glycosidic bonds do so randomly and are therefore less amenable to industrial processes that require the polymer to be completely split apart. Glycoside hydrolases are also, however, expensive and somewhat inefficient: their activity is slowed by the cellobiose and glucose they produce, and large amounts (about 15 milligrams of enzyme for every gram of cellulose broken down) are needed for complete disassembly of the polymer chain. [6,7] At present, the cost of energy produced from bio-ethanol is increased by $1.00 per gallon of oil equivalent by these inefficiencies alone, yielding 2007 U.S. government estimates of $2.65/gallon, or $3.60 per energy equivalent of a gallon of gasoline. [2]

Cellulases are important in the laundry detergent industry, where they are used to smooth fabric. Their share of the detergent market was worth $110 million annually in 1995 and has grown since, so research into improving their functionality is likely to have some benefits. [8,9] Whether these benefits will extend downstream to the emerging biofuels industry is difficult to say. The United States Renewable Fuel Standard, set in 2007, mandates use of biofuels derived from cellulose on the order of tens of billions of gallons annually by 2020. [10] The Environmental Protection Agency (EPA), however, has reserved the right to set smaller mandates based on the improvement of cellulose decomposition technologies. In 2019, the mandate was 418 million gallons, or 5% of the 2007 target. [11] Clearly, drastic advances in the production or efficiency of cellulases are required for cellulosic biofuels to compete with starch-derived bioethanol for a seat in the renewable energy marketplace.

Directed Evolution

Directed evolution, a laboratory mimic of natural selection and Darwinian evolution, aims to allow quick development of desirable traits in enzymes. The procedure, first used by Frances Arnold in 1993 and for which she received a share of the 2018 Nobel Prize in Chemistry, involves cycles during which protein-encoding DNA is mutated so that it encodes a variety of mutated enzymes, which are then subjected to an assay that selects for a desirable chemical trait (such as reaction rate or specificity in production of a chemical product). [12] The genes for the resultant enzymes that prove desirable enough to pass through the assay can be amplified by polymerase chain reaction (a biology technique used to quickly make copies of a DNA sequence, which in this case has been mutated) and isolated for further study. [12]

Enzymes are composed of a sequence of linked amino acids in a certain order that both facilitates their active site reactivity and results in their shape, on a larger scale. There are twenty amino acids, which in theory can be linked in any order, meaning that any protein containing n amino acids has a sequence space of 20^n ordering combinations of those amino acids. Groupings of three base pairs in DNA encode for certain amino acids, so performing mutagenesis on a DNA strand allows researchers to co-opt cellular mechanisms and quickly generate catalytic enzymes that, because they are encoded for by mutated DNA strands, contain amino acids in different orderings than what is found in nature. [12] The key to the success of directed evolution has been the use of mutagenesis techniques to screen enormous regions of sequence space, on the order of 10^15 enzyme mutants, for improvements in chemical activity. [13] While a single change to a single amino acid is more likely than not to decrease the activity of an enzyme, often a handful out of 10^15 or so enzymes in a mutant library exhibit increased activity or non-natural reactivity that can then be repurposed for synthetic chemistry.

Traditional synthetic chemistry relies on the rational design of molecules to create new catalysts for reactions from nitrogen fixing (Haber-Bosch) to polymerization of basic petroleum-sourced carbon compounds (e.g. Ziegler-Natta Catalysis). While this approach has had remarkable success, small-molecule catalysts are simply not large enough to create the complex web of electronic and spatial environments that proteins can. [12] As a result, while proteins cannot perform all of the reactions that humans have found to be important (making plastic, for instance), the reactions that they do perform are done with fewer errors and far faster than most industrial reactions can be. Since nature is, in general, far more efficient in its chemistry than laboratory science, altering the active site (catalytic portion) of an enzyme to get it to perform new-to-nature reactions with human significance and subsequently using directed evolution to screen the new enzyme for reactivity improvements has recently been highly successful. [14,15] It is not unusual to see reaction parameters such as turnover rate (reactions performed, or essentially products produced, per unit time) improve by many orders of magnitude in an evolved enzyme compared with a small-molecule catalyst. [16]

In contrast, naturally occurring reactions are harder to improve. The reason is that enzymes in biology have already evolved for billions of years. [17] Because cells have undergone mutations throughout all of that time, selection for effective mutations has resulted in the creation of highly sophisticated and efficient catalysts. Estimates of the number of amino acid sequences tried by biology over billions of years range from ~10^20 to 10^50, quite large compared with the 10^15 sequences often cited as the norm for in vitro directed evolution mutations. [17] Thus, it seems reasonable to argue that nature has explored a far greater breadth of the available enzyme sequence space than human researchers can at present, and that it is no wonder the efficacy of reactions useful in nature is difficult to improve.

The rationale behind this argument is less clear quantitatively than it sounds intuitively. The set of possible orderings of amino acids in a protein is referred to as its sequence space, and since there are 20 amino acids it is often quoted that an enzyme 100 amino acids in length (not particularly large!) would have a sequence space of 20^100, or about 10^130, possible structures. If, generously, biology has tried 10^50 sequences and humans have tried 10^17 over the course of many experiments, there are about 10^80 sequences and 10^113 sequences left untried by nature and science, respectively, for every one already tried. In essence, both the earth and modern research have failed even to scratch the surface of the possible configurations of an enzyme, so it is understandable to wonder why laboratory science might not just get lucky, or, through rational designation of regions for mutation, discover a cellulase far more effective than any in nature, for instance. As it turns out, however, modern amino acids were likely derived from 5 or so earlier precursors and can generally be grouped by similar properties. [17] If this is the case, it is highly likely that biology's 10^50 attempts have examined the most different portions of sequence space, or those that produce the most drastic changes in enzyme function. [17]
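The order-of-magnitude bookkeeping in this paragraph can be checked with a short Python sketch. It works in log10 units so that numbers such as 20^100 stay manageable. The function names are illustrative and not taken from any cited source, and the counts of sequences tried (10^50 and 10^17) are simply the figures quoted above.

```python
import math

AMINO_ACIDS = 20  # size of the standard amino acid alphabet


def log10_sequence_space(length: int) -> float:
    """Return log10 of 20**length, the number of orderings for `length` residues."""
    return length * math.log10(AMINO_ACIDS)


def log10_untried_per_tried(length: int, log10_tried: float) -> float:
    """Return log10 of the untried sequences per sequence already tried.

    Because the number tried is negligible next to the full space,
    (total - tried) / tried is approximated here by total / tried.
    """
    return log10_sequence_space(length) - log10_tried


print(round(log10_sequence_space(100)))         # exponent of 20^100
print(round(log10_untried_per_tried(100, 50)))  # nature, 10^50 sequences tried
print(round(log10_untried_per_tried(100, 17)))  # laboratory, 10^17 sequences tried
```

The three printed exponents correspond to the 10^130, 10^80 and 10^113 figures in the text.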

This argument may feel somewhat pedantic, but it is important: if biology has, in fact, tried most of the combinations of very different amino acids in the impactful regions of sequence space, we would expect that only marginal improvements in catalytic efficiency are achievable by directed evolution. Rational choice of regions for mutagenesis should give laboratory science the upper hand over nature in deciding where to construct mutant libraries, and the rise of site-directed mutagenesis methods as an alternative to error-prone PCR techniques lends further credence to this view. Even so, we would expect any resulting improvements to be within an order of magnitude of the catalytic activity of natural cellulases.

This is, indeed, borne out in the data. Of the evolved enzymes listed in Dadwal et al., not one involved in breaking down cellulose into cellobiose showed an improvement in catalytic activity beyond ten times that of a naturally occurring cellulase. [18] Only one (a β-glucosidase that was not made using directed evolution) showed a more than tenfold improvement over nature in any industrially relevant cellulase reaction. [18,19] Other reviews suggest a similar trend. [20] The sheer size of unexplored sequence space means that the possibility of a hyper-efficient cellulase being made in the future cannot be ruled out, but, barring a massive advance in the size and variety of protein sequences that can be tested in an experiment (e.g. the ability to use directed evolution assays in the analysis of 10^30 sequences involving unnatural as well as natural amino acids), it seems that 10× natural catalytic activity is the cap for engineered enzymes breaking down cellulose.

Costs

It should be noted that T. reesei has been subjected to mutagenesis in the decades since World War II. [4] For the purposes of a cost analysis, however, we will nonetheless assume that a tenfold improvement in the activity of cellulase enzymes, and especially glycoside hydrolases, is possible. It is quite difficult to locate input costs for cellulosic ethanol, but its current share of ethanol production (corn starch is the precursor for 95% of ethanol production in the United States, with other crop starches and cellulose making up the remaining 5%) makes clear that it is, in general, not cost-competitive. We will therefore accept the American government's 2007 suggestion that a reduction in costs of enzyme from ~$0.40/gallon of ethanol to $0.05/gallon and of cellulosic feedstock from $1.00/gallon to $0.33/gallon would make cellulosic ethanol cost-competitive with starch ethanol. [2,21]

Since corn ethanol costs have not, in fact, deviated much from an ~$1.50/gallon baseline since 2007, we can take the government-referenced point of cost-competitiveness at face value. [22] The cost of enzyme is another story, however. Comparisons with other industries suggest values between $0.69/gallon and $2.71/gallon of ethanol produced, depending on process parameters. [23] Generously, we will use a cost value of $0.30/gallon of ethanol produced, which is obtained by assuming cellulase production occurs at or near the site of ethanol production. [23]

In this case, a tenfold increase in enzyme efficiency would reduce the amount needed per gram of cellulose from 15 milligrams to 1.5. Doing so would, in turn, decrease the enzyme cost per gallon of ethanol produced to about $0.03, below government targets. If the evolved enzyme cost less than 1.67 times as much as the wildtype, then, the government target of $0.05 spent on cellulases per gallon of ethanol produced could be met. Whether such an advance without a dramatic cost increase is likely is tough to say, but it would seem, judging by back-of-the-envelope calculations, to be possible.
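The back-of-the-envelope enzyme cost estimate can be written out as a minimal Python sketch. It assumes, as the paragraph implicitly does, that enzyme cost per gallon scales linearly with the mass of enzyme required; the function and variable names are illustrative only.

```python
def enzyme_cost_per_gallon(baseline_cost: float, activity_gain: float,
                           price_ratio: float = 1.0) -> float:
    """Enzyme cost per gallon of ethanol after an activity improvement.

    baseline_cost: current enzyme cost in $/gallon of ethanol
    activity_gain: fold improvement in activity (enzyme needed scales as 1/gain)
    price_ratio:   price of the evolved enzyme relative to wild type, per unit mass
    Assumes cost is directly proportional to the mass of enzyme used.
    """
    return baseline_cost / activity_gain * price_ratio


BASELINE = 0.30  # $/gallon, the on-site production figure adopted in the text
TARGET = 0.05    # $/gallon, the 2007 government target
GAIN = 10        # assumed tenfold activity improvement

evolved_cost = enzyme_cost_per_gallon(BASELINE, GAIN)
max_price_ratio = TARGET / evolved_cost  # how much pricier the evolved enzyme may be

print(f"enzyme cost after improvement: ${evolved_cost:.2f}/gallon")
print(f"largest tolerable price ratio: {max_price_ratio:.2f}")
```

The second value is the 1.67 quoted above: any evolved enzyme priced below that multiple of the wildtype would still meet the $0.05/gallon target under these assumptions.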

The larger cost per gallon, though, comes from the need to buy cellulose feedstocks, which cost about $60/ton in 2007 and even more today, with one study suggesting an average price of $84.45/ton. [2,24] Since one ton can yield about 60 gallons in an industrial setup, the 2007 price corresponds to $1.00/gallon of ethanol and the more recent price to about $1.40/gallon, so the feedstock cost has not fallen below its 2007 value. [2] Improving the efficiency with which the feedstock is converted to glucose has no bearing on the cost of the feed itself, and since prices have not decreased in at least 13 years, it is unclear how the government target of $30 per ton of feedstock can be met. Barring a drastic shift in economic conditions or a change in the structure of subsidies, it appears reasonable to assume that cellulosic feedstock prices will remain relatively stable and that, no matter how effectively they can be converted to glucose, their cost will prohibit cellulosic ethanol from becoming competitive on the open market.
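The feedstock contribution follows from a single division, sketched below in Python using the two prices and the 60 gallons/ton yield quoted in this paragraph. The function name is illustrative.

```python
def feedstock_cost_per_gallon(price_per_ton: float,
                              gallons_per_ton: float = 60.0) -> float:
    """Feedstock cost in $/gallon of ethanol for a given feedstock price."""
    return price_per_ton / gallons_per_ton


# Feedstock prices quoted in the text, in $/ton
prices = {"2007 price": 60.00, "more recent study": 84.45}

for label, price in prices.items():
    print(f"{label}: ${feedstock_cost_per_gallon(price):.2f}/gallon")
```

Enzyme activity does not appear anywhere in this calculation, which is the point of the paragraph: a better cellulase leaves the feedstock term untouched.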

Emissions

Estimates of the energy balance of cellulosic ethanol often cite drastic emission reductions compared with both standard corn ethanol and gasoline. [25] Even studies that factor in transportation and land-use changes often fail to consider cellulase feeding, however. Culturing cellulase enzymes in T. reesei requires glucose concentrations above 1% to be maintained in solution to feed the fungi. [26] An academic study cites an initial glucose concentration of 3%, and since subsequent feeding, once the concentration dips as the fungi eat, is difficult to quantify, 3% will be used here as the working figure. [26]

The same study showed a maximum production of 80 mg cellulase/liter of glucose-containing growth medium. [26] Since an evolved cellulase could feasibly be produced in T. reesei, this value should not change drastically if directed evolution is used to improve the enzyme's reactivity. One liter of water weighs one kilogram, so a 3% by mass concentration indicates that at least 30 g of glucose are needed to produce 80 mg of cellulase. Using present industrial loading, 15 mg of cellulase are used to break down 1 g of cellulose with about 90% efficiency, for a yield of 900 mg of glucose. [7] Thus, 80 mg of cellulase could be used to decompose 5.33 g of cellulose into 4.8 g of glucose, an order of magnitude lower than the amount consumed in making the cellulase in the first place! In other words, the cellulases essentially convert glucose to glucose, with many intermediate steps, at around 10% efficiency. It would be more environmentally friendly to ferment the sugar in the fungal growth medium than to use it to make cellulases!
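The glucose balance just described can be laid out as a small Python sketch. All default values are the figures quoted above (80 mg of cellulase per liter, 30 g of glucose per liter, 90% hydrolysis yield); the function name and parameter names are illustrative.

```python
def glucose_balance(loading_mg_per_g: float,
                    cellulase_mg: float = 80.0,
                    glucose_in_g: float = 30.0,
                    hydrolysis_yield: float = 0.90):
    """Return (cellulose processed in g, glucose released in g, output/input ratio).

    loading_mg_per_g: mg of cellulase needed per g of cellulose
    cellulase_mg:     cellulase obtained from one liter of growth medium
    glucose_in_g:     glucose consumed to grow that cellulase
    """
    cellulose_g = cellulase_mg / loading_mg_per_g
    glucose_out_g = cellulose_g * hydrolysis_yield
    return cellulose_g, glucose_out_g, glucose_out_g / glucose_in_g


cellulose_g, glucose_out_g, ratio = glucose_balance(15.0)  # present industrial loading
print(f"cellulose processed: {cellulose_g:.2f} g")
print(f"glucose released:    {glucose_out_g:.1f} g")
print(f"glucose out / in:    {ratio:.0%}")
```

At the present loading the ratio comes out at 16%, which the text rounds to "around 10%"; either way, far less glucose is released than was fed to the fungi.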

Let us, again generously, assume that directed evolution can be used to produce a cellulase that only needs to be supplied in amounts of 2 mg per 1 g of cellulose in industrial reactors to break down 90% of the cellulose. Then our 30 g of glucose input would yield 36 g of glucose output. Let us again disregard the inefficiencies of fermentation and the fact that glucose is added sporadically to growth medium while cellulases are produced. At a maximum, only 6/36, or ~16.7%, of cellulosic ethanol is the result of a net increase in available glucose. So, a figure of about 70 g CO2 per megajoule produced by combustion of cellulosic ethanol is inaccurate, because a net of only 0.167 MJ is actually produced per nominal megajoule of that ethanol based on the glucose balance involved in making it. This gives gross carbon dioxide emissions of 420 g/MJ, which, subtracting the 75 g/MJ absorbed from the air in the corn used for cellulose, gives a carbon cost of ~345 g CO2/MJ, more than three times the emissions from gasoline! [25]

Let us now, more generously still, assume the full tenfold improvement considered in the cost analysis, so that directed evolution produces a cellulase that only needs to be supplied in amounts of 1.5 mg per 1 g of cellulose in industrial reactors to break down 90% of the cellulose. Then our 30 g of glucose input would yield 48 g of glucose output. Making the same simplifications as before, at a maximum only 18/48, or ~37.5%, of cellulosic ethanol is the result of a net increase in available glucose. So, a figure of about 70 g CO2 per megajoule produced by combustion of cellulosic ethanol is inaccurate, because a net of only 0.375 MJ is actually produced per nominal megajoule of that ethanol based on the glucose balance involved in making it. This gives gross carbon dioxide emissions of 187 g/MJ, which, subtracting the 75 g/MJ absorbed from the air in the corn used for cellulose, gives a carbon cost of ~110 g CO2/MJ, considerably more than the emissions from gasoline! [25] In other words, any reasonable expectation for improvement in the catalytic performance of cellulases would fail to overcome the incredible energy cost inherent in their production and use in ethanol generation.
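The emissions arithmetic in this section can be reproduced with the Python sketch below. It takes the enzyme loading as a parameter and reuses the figures quoted in the text (80 mg of cellulase from 30 g of glucose, 90% hydrolysis, 70 g CO2/MJ from combustion, 75 g CO2/MJ absorbed during growth). The function name is illustrative, and the model shares the text's simplifications: fermentation losses and later glucose feeding are ignored.

```python
def net_carbon_cost(loading_mg_per_g: float,
                    cellulase_mg: float = 80.0,
                    glucose_in_g: float = 30.0,
                    hydrolysis_yield: float = 0.90,
                    combustion_g_per_mj: float = 70.0,
                    uptake_g_per_mj: float = 75.0):
    """Return (glucose out in g, net glucose fraction, net g CO2 per net MJ)."""
    glucose_out_g = cellulase_mg / loading_mg_per_g * hydrolysis_yield
    if glucose_out_g <= glucose_in_g:
        # More glucose is consumed than released, so there is no net energy
        raise ValueError("no net glucose gain at this enzyme loading")
    net_fraction = (glucose_out_g - glucose_in_g) / glucose_out_g
    gross_g_per_mj = combustion_g_per_mj / net_fraction
    return glucose_out_g, net_fraction, gross_g_per_mj - uptake_g_per_mj


for loading in (2.0, 1.5):  # mg cellulase per g cellulose
    out_g, fraction, cost = net_carbon_cost(loading)
    print(f"{loading} mg/g: {out_g:.0f} g glucose out, "
          f"net fraction {fraction:.3f}, ~{cost:.0f} g CO2/MJ")
```

For a loading of 1.5 mg/g the sketch gives about 112 g CO2/MJ, which the text rounds to ~110. At the present loading of 15 mg/g the function raises an error, reflecting that the process then consumes more glucose than it releases.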

Conclusions

Due to the vast size of sequence space for large enzymes like cellulases, it is impossible to say for certain that many-orders-of-magnitude improvements are unachievable by mutagenesis. Glycoside hydrolases are generally 40-70 kilodaltons in mass, or, using the conversion that one amino acid weighs ~110 grams per mole, contain over 330 amino acid residues, for a sequence space of 20^333, or about 10^433, orderings of amino acids. As such, it is conceivable that an evolved cellulase with extraordinarily high reaction efficiency could drastically alter the discouraging carbon emission arithmetic of cellulosic ethanol production.

Whether this is likely is another story, however. The improvements in cellulase activity thus far achieved through directed evolution are on the less-than-an-order-of-magnitude scale, indicating that the most useful parts of its sequence space have been probed by evolution in some degree of completeness. Furthermore, even if an enzyme mutant could erase the environmental worries in cellulosic ethanol production, it would do nothing to ameliorate the issue of cost-competitiveness, since cellulosic feedstock is prohibitively expensive. For now, the glucose consumed in cellulase production erases any positive environmental impact, and it seems reasonable to assume that any improvements in the enzyme, at least in the near future, will benefit detergent companies far more than ethanol producers.

© Caravaggio Caniglia. The author warrants that the work is the author's own and that Stanford University provided no input other than typesetting and referencing guidelines. The author grants permission to copy, distribute and display this work in unaltered form, with attribution to the author, for noncommercial purposes only. All other rights, including commercial rights, are reserved to the author.

References

[1] B. D. Solomon, J. R. Barnes, and K. E. Halvorsen, "Grain and Cellulosic Ethanol: History, Economics, and Energy Policy," Biomass Bioenerg. 31, 416 (2007).

[2] S. Osborne, "Energy in 2020: Assessing the Economic Effects of Commercialization of Cellulosic Ethanol," U.S. International Trade Administration, November 2007.

[3] C. M. Payne et al., "Fungal Cellulases," Chem. Rev. 115, 1308 (2015).

[4] V. Seidl et al., "Sexual Development in the Industrial Workhorse Trichoderma reesei," Proc. Natl. Acad. Sci. (USA) 106, 13909 (2009).

[5] P. V. Harris et al., "Stimulation of Lignocellulosic Biomass Hydrolysis by Proteins of Glycoside Hydrolase Family 61: Structure and Function of a Large, Enigmatic Family," Biochemistry 49, 3305 (2010).

[6] R. K. Sukumaran et al., "Cellulase Production Using Biomass Feed Stock and its Application in Lignocellulose Saccharification for Bio-Ethanol Production," Renew. Energ. 34, 421 (2009).

[7] B. Yang et al., "Enzymatic Hydrolysis of Cellulosic Biomass," Biofuels 2, 421 (2014).

[8] F. N. Niyonzima, "Detergent-Compatible Bacterial Cellulases," J. Basic Microb. 59, 134 (2019).

[9] J. H. Houston, "Detergent Enzymes' Market," in Enzymes in Detergency, ed. by J. H. van Ee, O. Misset, and E. J. Baas (Marcel Dekker, 1997).

[10] "The Renewable Fuel Standard (RFS): An Overview," R43325, Congressional Research Service, April 2020.

[11] "Renewable Fuel Standard Program: Standards for 2019 and Biomass-Based Diesel Volume for 2020," Federal Register 83, 63704 (2018).

[12] F. H. Arnold, "Innovation by Evolution: Bringing New Chemistry to Life (Nobel Lecture)," Angew. Chem. Int. Edit. 58, 14420 (2019).

[13] P. A. Romero and F. H. Arnold, "Exploring Protein Fitness Landscapes by Directed Evolution," Nat. Rev. Mol. Cell. Biol. 10, 866 (2009).

[14] Y. Wang, X. Yu, and H. Zhao, "Biosystems Design by Directed Evolution," AIChE J. 66, e16716 (2020).

[15] H. Renata, Z. J. Wang, and F. H. Arnold, "Expanding the Enzyme Universe: Accessing Non-Natural Reactions by Mechanism-Guided Directed Evolution," Angew. Chem. Int. Edit. 54, 3351 (2015).

[16] S. C. Hammer, A. M. Knight, and F. H. Arnold, "Design and Evolution of Enzymes for Non-Natural Chemistry," Curr. Opin. Green Sustain. Chem. 7, 23 (2017).

[17] D. T. F. Dryden, A. R. Thomson, and J. H. White, "How Much of Protein Sequence Space has been Explored by Life on Earth?" J. R. Soc. Interface 5, 953 (2008).

[18] A. Dadwal, S. Sharma, and T. Satyanarayana, "Progress in Ameliorating Beneficial Characteristics of Microbial Cellulases by Genetic Engineering Approaches for Cellulose Saccharification," Front. Microbiol. 11, 1387 (2020).

[19] G. Yao et al., "Production of a High-Efficiency Cellulase Complex via β-glucosidase Engineering in Penicillium oxalicum," Biotechnol. Biofuels 9, 78 (2016).

[20] H. Lin et al., "Advances in the Study of Directed Evolution for Cellulases," Front. Environ. Sci. Eng. 5, 519 (2011).

[21] A. Bušić et al., "Bioethanol Production from Renewable Raw Materials and its Separation and Purification: A Review," Food Technol. Biotechnol. 56, 289 (2018).

[22] S. Irwin, "2019 Ethanol Production Profits: Just How Bad Was It?" Farmdoc Daily, 19 Jan 20.

[23] G. Liu, J. Zhang, and J. Bao, "Cost Evaluation of Cellulase Enzyme for Industrial Cellulosic Ethanol Production Based on Rigorous Aspen Plus Modeling," Bioproc. Biosyst. Eng. 39, 133 (2016).

[24] L. R. Lynd et al., "Cellulosic Ethanol: Status and Innovation," Curr. Opin. Biotechnol. 45, 202 (2017).

[25] M. Q. Wang et al., "Energy and Greenhouse Gas Emission Effects of Corn and Cellulosic Ethanol with Technology Improvements and Land Use Changes," Biomass Bioenerg. 35, 1885 (2011).

[26] T. Nakari-Setälä and M. Penttilä, "Production of Trichoderma reesei Cellulases on Glucose-Containing Media," Appl. Environ. Microbiol. 61, 3650 (1995).