A roadmap to directed enzyme evolution and screening systems for biotechnological applications

Enzymes have long been used in man-made biochemical processes, from brewing and fermentation to the current industrial production of fine chemicals. The ever-growing demand for enzymes in increasingly specific applications requires tailoring naturally occurring enzymes to the non-natural conditions found in industrial processes. Relationships between enzyme sequence, structure and activity are far from understood, which hinders the capacity to design tailored biocatalysts. In the field of protein engineering, directed enzyme evolution is a powerful algorithm to generate and identify novel and improved enzymes through iterative rounds of mutagenesis and screening under a specific evolutionary pressure. In practice, the critical checkpoints in directed evolution are: selection of the starting point, generation of the mutant library, development of the screening assay and analysis of the output of the screening campaign. Each step in directed evolution can be performed using conceptually and technically different approaches, all having inherent advantages and challenges. In this article, we present a general overview of the challenges of designing and performing a directed enzyme evolution campaign, discuss current advances in methods, and highlight some examples of its application to industrially relevant enzymes.


INTRODUCTION
Proteins are the functional and structural units in biological systems. Among proteins, enzymes have evolved across millions of years to specifically carry out essential functions in living organisms. The use of enzymes as biocatalysts was initially tied to the discovery of enzymes or organisms able to perform a reaction of interest in nature, such as in brewing or fermentation processes. However, as immense as natural diversity is, the reaction conditions, reaction chemistry, and substrate and product concentrations for which industrial processes are conceived often differ from physiological conditions, resulting in reduced enzymatic efficiency, reduced stability or even complete loss of activity.
Generating and identifying enzymes adapted to non-natural (non-biological) conditions, and achieving functional or economically efficient catalytic rates in such conditions, is a particularly difficult task. Although protein sequences can be successfully modified to change enzymatic properties, our understanding of the molecular mechanisms that would allow us to predict which amino acid sequence changes result in a specific behavior remains incomplete.
Successfully engineered enzymes for non-natural conditions have enabled biocatalysts to be used efficiently in industrial processes which were traditionally performed chemically. In turn, over the past decade, the increasing demand for biocatalysts as replacements for chemical processes has progressively fueled the need for novel and improved enzymes in product-driven biocatalysis fields such as the synthesis of pharmaceutical precursors and the production of fine chemicals.
Protein engineering strategies comprise rational and semi-rational approaches, directed enzyme evolution and, more recently, "data-driven" enzyme design (Bommarius et al., 2011).
This review aims to present and discuss the advantages and challenges of designing and performing a directed enzyme evolution campaign and current advances in methods, and to highlight some examples of its application to industrially relevant enzymes.
Enzyme engineering strategies seek to generate suitable enzymes having a higher performance compared to the starting variant. Rational design aims to modify specific amino acids which are known or expected to be directly involved in the activity, stability or substrate/product specificity of the target enzyme, commonly generating single variants or small focused libraries (<100 variants) which can be rapidly screened. Such identification of critical amino acid residues strongly relies on the available information, in the form of structures (X-ray, NMR), catalytic mechanisms, or the number of sequences with high identity. In the case of well-known enzymes, rational design allows the mutant library diversity to be focused on a few amino acids, thus generating very small libraries. On the other hand, when working with recently isolated enzymes having limited or no available information, or those with few described homologs, assigning amino acids to a specific function becomes a high-risk bet. Increasing the confidence of this (educated) guess often means generating tens of single-saturation mutagenesis libraries, or combinatorial libraries in which the theoretical diversity grows exponentially up to millions of variants.
Directed enzyme evolution (Farinas et al., 2001) represents a highly versatile approach for tailoring enzyme properties to the needs of industrial applications through iterative cycles of gene diversity generation and high-throughput screening, providing, in addition, valuable insights into structure-function relationships in recently isolated and less studied enzymes (Tee and . A classic directed evolution experiment comprises three main iterative steps: design and construction of a mutant library from a parent gene, screening to identify improved variants out of a large mutant pool, and isolation of the genes encoding the improved variants (Fig. 1).
Although the concept of directed enzyme evolution seems simple and straightforward, each of its steps is a key process involving decisions that will ultimately affect the output of the directed evolution campaign. The following sections discuss the checkpoints of a directed enzyme evolution campaign, their role in the whole process and the current and novel strategies available for each step.

Starting point
Enzymes can be evolved towards improving an existing activity, converting novel substrates (substrate promiscuity) or performing novel reactions (chemical promiscuity) under non-natural conditions. Enzymes catalyzing a specific chemical reaction can be identified from extensive databases, such as BRENDA (Schomburg et al., 2013). Many naturally occurring enzymes are promiscuous (O'Brien and Herschlag, 1999). When aiming to improve an existing activity, special consideration has to be given to how much nature has already worked on that specific reaction under natural selection conditions. In the case of the main catalytic activity, a dramatic improvement should not be expected. Higher improvement rates can be expected when expanding side activities, or when laboratory evolution focuses on non-natural aspects of the final application of the target enzyme. Therefore, the optimization of enzymes for novel substrates is more likely to succeed if the parents already show low catalytic rates towards the target substrate or a closely related molecule (Leemhuis et al., 2009, O'Brien and Herschlag, 1999).
On the other hand, physicochemical properties of the parent, such as thermal and solvent stability, are considered to have an effect on the "evolvability" of the enzyme. A more stable parent is able to withstand a greater number of amino acid substitutions than a less stable homolog. This is not trivial, since catalytically beneficial amino acid substitutions often reduce enzyme stability as a side effect (Bloom et al., 2006, Martinez et al., 2013). The ability of a parent enzyme to accept multiple destabilizing amino acid substitutions increases the probability of identifying beneficial substitutions. Enzymes isolated from thermophilic organisms are therefore often a first choice as parents for directed evolution. However, thermophilic enzymes are reported to show lower activity at lower reaction temperatures (Demirjian et al., 2001, Kumar and Nussinov, 2001). A suitable combination of activity and stability in the selected parent is a main criterion for judging an appropriate starting point for a directed evolution campaign.
Additionally, multiple homologous enzymes that only partially fulfill the activity or stability requirements can be recombined using gene recombination methods, such as DNA shuffling (Coco et al., 2001), StEP (Zhao et al., 1998), SCOPE (O'Maille et al., 2002) or PTRec (Marienhagen et al., 2012), in order to generate, from that limited diversity, a hybrid molecule with both appropriate activity and high stability. An overview of the developments in diversity generation can be found in a recent review (Ruff et al., 2013).

Library construction
Rather than introducing genetic diversity by accumulation of mutations and recombination over generations, in a directed evolution experiment genetic diversity is generated by introducing random mutations in the parent gene in a single event over multiple rounds, yielding a population of mutant variants that can be screened for a desired trait (Fig. 1).
Methods to generate mutant libraries range from codon focused to random unbiased mutagenesis, and every approach has advantages and drawbacks.
A straightforward brute-force random mutagenesis approach would consist of generating a mutant library covering the whole diversity available within the sequence space of the target protein, and subsequently screening for improved variants. Put into numbers, the theoretical sequence space of unique peptides of 10 amino acids is $20^{10} = 1.024 \times 10^{13}$ (20 amino acid building blocks per position). Even at the impressive screening rate of one million variants per hour, around the clock and seven days a week, the first year of screening would cover less than 0.1% of the theoretical sequence space. Given that enzymes range from 180 to 600 amino acids, strategies and methods are required to understand and manage sequence randomization, in order to generate sequence diversity enriched in variants having a higher chance of improvement for a given trait (Shivange et al., 2009).
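The arithmetic in the paragraph above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are illustrative, and it optimistically assumes that no variant is ever screened twice.

```python
AMINO_ACIDS = 20  # building blocks per position


def sequence_space(length: int) -> int:
    """Number of distinct peptides of a given length."""
    return AMINO_ACIDS ** length


def fraction_screened(length: int, variants_per_hour: float, hours: float) -> float:
    """Fraction of the theoretical sequence space sampled (assumes no duplicates)."""
    return variants_per_hour * hours / sequence_space(length)


HOURS_PER_YEAR = 24 * 365  # screening around the clock, seven days a week

space = sequence_space(10)
covered = fraction_screened(10, 1e6, HOURS_PER_YEAR)

print(f"sequence space of a 10-mer: {space:.3e}")
print(f"fraction covered after one year: {covered:.4%}")
```

Changing the peptide length from 10 to the 180 to 600 residues of a typical enzyme makes the covered fraction vanish for any realistic screening rate, which is the motivation for the focused and biased library strategies discussed next.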
Figure 1. Schematic representation of a directed enzyme evolution campaign. A typical directed evolution campaign comprises iterative rounds in which a parent gene coding for an enzyme of interest is subjected to random mutagenesis, generating a mutant library. Enzyme variants are expressed and screened for a given property. Improved variants are isolated and confirmed, and the mutations found are evaluated and recombined, generating an overall improved variant which can be used as input for the next round.
Understanding how generated sequence diversity translates into functional diversity goes hand in hand with understanding the protein properties that ought to be improved. In the case of localized enzyme properties, such as substrate binding (enantioselectivity, substrate specificity or regioselectivity), a focused mutant library approach can be used when a 3-dimensional model or a crystal structure of the enzyme is available. Successful attempts are reported in which one or multiple amino acid substitutions were sufficient to alter enzyme enantioselectivity and substrate specificity (Jakoblinnert et al., 2013, Jakoblinnert et al., 2013, Reetz, 2009, Reetz et al., 2005). Through single or simultaneous site-saturation mutagenesis (SSM), specific codons are randomized using degenerate codons (NNN; N = A, T, C or G), yielding all 64 possible codons in a single SSM (Table I). Considering an oversampling factor of 3 for ~95% expected coverage of the library, screening 192 randomly selected variants would be necessary to cover the diversity generated. Simultaneous SSM multiplies this diversity: saturating two codons simultaneously with NNN means screening 12,288 variants (64 × 64 codons, with threefold oversampling), whereas saturating three codons simultaneously demands screening 786,432 variants (Table I). This diversity explosion is due to the genetic redundancy of the NNN codon. By employing NNK codons (K = guanine or thymine), which represent 32 possible codons still coding for all 20 amino acids, the screening demands of single, double and triple SSM libraries are reduced to 96, 3,072 and 98,304, respectively (Table I). The ability to control the codon alphabet can be used to construct SSM libraries enriched in specific amino acids with a double purpose: to reduce the generated diversity, and thus the screening effort, and to focus the chemical properties of the introduced amino acids according to a predicted effect on the protein.
The degenerate codon NDT (N = A, T, C or G; D = A, G or T) has been proposed to further decrease the chemical redundancy, since it reduces the diversity to 12 amino acids (Gly, Phe, Ile, Leu, Val, Tyr, His, Cys, Ser, Asn, Asp, Arg), representing a balanced sample of aromatic, aliphatic, non-polar, polar, negatively charged and positively charged residues. The effectiveness of an NDT library was shown to be 13-fold higher compared to NNK after screening 5000 clones, supporting the idea that a genetically and chemically non-redundant codon randomization approach will greatly decrease screening efforts without hindering the chance to find improved variants, despite the reduced amino acid diversity.
Following this approach, the type and number of amino acids introduced in SSM libraries can nowadays be controlled by selecting degenerate codons for virtually any subset of amino acid residues (Firth and Patrick, 2005, Mena and Daugherty, 2005). Combined with the computational prediction of important amino acid positions and catalytic mechanisms, this forms the basis for the construction of "smart" libraries.
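The codon counts and screening efforts quoted for NNN, NNK and NDT can be reproduced by enumerating each degenerate codon against the standard genetic code. The following is a minimal sketch: the function names are illustrative, the screening effort is simply the oversampling factor times the number of codon combinations, and the coverage estimate assumes an idealized library in which all codons are equally frequent (Poisson sampling).

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC symbols needed for the degenerate codons discussed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT", "D": "AGT"}


def expand(degenerate: str) -> list[str]:
    """All concrete codons encoded by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in degenerate))]


def summarize(degenerate: str, sites: int = 1, oversampling: int = 3) -> dict:
    """Codon, amino acid and stop counts plus the screening effort for `sites` codons."""
    codons = expand(degenerate)
    encoded = {CODON_TABLE[c] for c in codons} - {"*"}
    return {
        "codons": len(codons),
        "amino_acids": len(encoded),
        "stop_codons": sum(CODON_TABLE[c] == "*" for c in codons),
        "variants_to_screen": oversampling * len(codons) ** sites,
    }


def expected_coverage(oversampling: float) -> float:
    """Expected fraction of an equiprobable library seen at a given oversampling factor."""
    return 1.0 - math.exp(-oversampling)


for codon in ("NNN", "NNK", "NDT"):
    for sites in (1, 2, 3):
        print(codon, sites, summarize(codon, sites))
print(f"expected coverage at 3x oversampling: {expected_coverage(3):.3f}")
```

Real libraries deviate from the equiprobable assumption (primer synthesis bias, uneven amino acid representation), so the computed screening efforts are best read as lower bounds.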
There are, however, enzyme properties which are not well understood, such as solvent-related properties (solubility, co-solvent resistance), salt and surfactant effects, allosteric inhibition, and temperature activity and stability, for which focused libraries cannot be applied. Suggestions based on collections of mutational and activity data can be obtained using databases and servers driven by enzyme structure-function relationships, such as HotSpot Wizard (Pavelka et al., 2009), MAP and MAP 3D (Verma et al., 2012). Random mutagenesis libraries offer the possibility to improve enzymes in virtually any property that can be reflected in a screening assay, providing, in addition to improved variants, novel structure-function relationship data for the target enzyme.
Random mutagenesis libraries are commonly generated using PCR-based methods, which differ mainly in how they tackle the mutational bias of DNA polymerases, the organization of the genetic code, and the limitation to single nucleotide exchanges.
Commonly used DNA polymerases (Taq, Pfu, Mutazyme, Φ29) show a mutational bias towards transitions (purine to purine and pyrimidine to pyrimidine; nucleotide substitutions: A ↔ G, T ↔ C) over transversions (purine to pyrimidine or pyrimidine to purine; nucleotide substitutions: A ↔ T, A ↔ C, T ↔ G, G ↔ C). In the context of the genetic code, transition mutations result in amino acids chemically similar or identical to the original residue, causing a limited chemical diversity in the constructed library.
Another challenge presented by the organization of the genetic code is the uneven representation of the 20 amino acids, ranging from six codons per amino acid in the case of Leu, Ser and Arg, to only one codon in the case of Trp and Met.
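Both points, the transition/transversion distinction and the uneven codon representation, follow directly from the standard genetic code and can be illustrated with a short sketch (function and variable names are illustrative):

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

PURINES = {"A", "G"}


def substitution_type(old: str, new: str) -> str:
    """Classify a single nucleotide exchange as a transition or a transversion."""
    if old == new:
        raise ValueError("not a substitution")
    same_class = (old in PURINES) == (new in PURINES)
    return "transition" if same_class else "transversion"


# Number of codons encoding each amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

# Count the 12 possible directed nucleotide exchanges by type.
exchange_types = Counter(
    substitution_type(a, b) for a, b in product("ACGT", repeat=2) if a != b
)

print(codons_per_amino_acid.most_common())
print(exchange_types)
```

Only 4 of the 12 possible nucleotide exchanges are transitions, so a polymerase that favors them samples a small part of the available exchange types.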

In practice, this means that random libraries have a naturally biased amino acid distribution, which further limits their protein diversity. Furthermore, codon diversity in PCR-based mutagenesis methods is introduced a single nucleotide at a time, given the inability of DNA polymerases to elongate multiple nucleotide mismatches. All the abovementioned challenges are integrated in the genetic code, most likely as a protection mechanism of the biological system to hamper the accumulation of mutations and drastic protein modifications. Codon redundancy, the bias of DNA polymerases towards transitions and their inability to introduce consecutive mutations ensure a very small but necessary genetic variability, in the form of silent mutations or substitutions with chemically similar amino acids.

TABLE I
Codon diversity generated by site-saturation mutagenesis (SSM) using different degenerate codons. Generated codon diversity for NNN (64 codons, 20 amino acids), NNK (32 codons, 20 amino acids) and NDT degenerate codons (12 codons, 12 amino acids). An oversampling factor of 3 is necessary to sample ~95% of the diversity. The diversity explosion is dramatically reduced when using the NDT codon for simultaneous SSM; however, only 12 amino acids are represented (Gly, Phe, Ile, Leu, Val, Tyr, His, Cys, Ser, Asn, Asp, Arg).
Error-prone PCR (epPCR) is the workhorse of random library generation methods. It relies on the introduction of mutations in the template gene by performing a PCR reaction under error-inducing conditions, such as the presence of Mn$^{2+}$, unbalanced nucleotide concentrations or excessive PCR cycles.
Error-prone PCR libraries, despite being straightforward to construct, are affected by the diversity challenges presented by the organization of the genetic code discussed above, and the introduced mutations are transition biased, resulting in chemically similar amino acid substitutions. As a result, on average, less than 40% of all possible amino acid substitutions are accessible using epPCR methods.
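The limitation to single nucleotide exchanges can be made concrete by listing, for every sense codon, the amino acids reachable through exactly one nucleotide change. The sketch below is a rough per-codon estimate under simplifying assumptions (all sense codons weighted equally, no polymerase-specific error spectrum), so it is not expected to reproduce the published percentages exactly; the function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SENSE_CODONS = [c for c, aa in CODON_TABLE.items() if aa != "*"]
PURINES = {"A", "G"}


def is_transition(old: str, new: str) -> bool:
    """True for purine-to-purine or pyrimidine-to-pyrimidine exchanges."""
    return (old in PURINES) == (new in PURINES)


def accessible_amino_acids(codon: str, transitions_only: bool = False) -> set[str]:
    """Amino acids (other than the original) reachable by one nucleotide exchange."""
    original = CODON_TABLE[codon]
    reachable = set()
    for pos, old in enumerate(codon):
        for new in "ACGT":
            if new == old or (transitions_only and not is_transition(old, new)):
                continue
            aa = CODON_TABLE[codon[:pos] + new + codon[pos + 1:]]
            if aa not in ("*", original):
                reachable.add(aa)
    return reachable


def mean_accessible_fraction(transitions_only: bool = False) -> float:
    """Average fraction of the 19 alternative amino acids reachable per sense codon."""
    total = sum(len(accessible_amino_acids(c, transitions_only)) for c in SENSE_CODONS)
    return total / (19 * len(SENSE_CODONS))


print(sorted(accessible_amino_acids("ATG")))
print(f"any single exchange: {mean_accessible_fraction():.1%}")
print(f"transitions only:    {mean_accessible_fraction(transitions_only=True):.1%}")
```

Restricting the exchanges to transitions shrinks the reachable set further, which is the effect the transversion-enriched methods described below aim to counter.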
Diverse library generation methods, developed to overcome mutational biases, have been presented and discussed in detail, offering a more complete diversity at the amino acid level (Shivange et al., 2009). The addition of nucleotide analogues to bypass mutational bias, Random Insertion and Deletion (RID) mutagenesis, Random Insertional-deletional Strand Exchange (RAISE) mutagenesis and Sequence Saturation Mutagenesis (SeSaM) offer libraries with enhanced diversity. In particular, SeSaM (Wong et al., 2004) and its transversion-enriched version SeSaM-Tv (Mundhada et al., 2011, Wong et al., 2008) offer the additional benefits of generating transversion-enriched libraries and the ability to incorporate consecutive mutations in a codon, which increases the obtained codon diversity from 39.5% (single mutation in a codon) to 83% coverage of the natural diversity (Shivange et al., 2009).
The main difference between biased and unbiased random mutagenesis libraries lies in the probability of generating a more complete (or less incomplete) representation of the actual diversity that can be generated from the parent gene. However, given the immense theoretical diversity present in a random mutagenesis library and the possible screening limitations, researchers should not expect to identify "the best" variant from screening a random mutagenesis library. Instead, significant enzyme improvement can be achieved using random mutagenesis by performing several subsequent rounds of screening in a directed evolution campaign, where one or more improved enzyme variants identified from screening are used as parents for the next round of random mutagenesis.
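The iterative logic of such a campaign can be sketched as a toy simulation. Everything in it is hypothetical: the "assay" is an arbitrary scoring function (number of positions matching a made-up target sequence), mutations are introduced directly at the amino acid level, and the library size and number of rounds are arbitrary. It illustrates only the loop of mutagenesis, screening and parent selection, not any real enzyme.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 proteinogenic amino acids


def toy_fitness(seq: str, target: str) -> int:
    """Hypothetical stand-in for a screening assay: positions matching `target`."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(parent: str, rng: random.Random, n_mutations: int = 1) -> str:
    """Return a copy of `parent` with `n_mutations` random amino acid substitutions."""
    seq = list(parent)
    for pos in rng.sample(range(len(seq)), n_mutations):
        seq[pos] = rng.choice([aa for aa in ALPHABET if aa != seq[pos]])
    return "".join(seq)


def directed_evolution(parent, target, rounds=5, library_size=200, seed=0):
    """Run iterative rounds; the best variant of a round becomes the next parent."""
    rng = random.Random(seed)
    history = [toy_fitness(parent, target)]
    for _ in range(rounds):
        library = [mutate(parent, rng) for _ in range(library_size)]  # mutagenesis
        best = max(library, key=lambda s: toy_fitness(s, target))      # screening
        if toy_fitness(best, target) > toy_fitness(parent, target):    # selection
            parent = best
        history.append(toy_fitness(parent, target))
    return parent, history


start = "A" * 12
goal = "MKTWYIHKQRQL"  # arbitrary, hypothetical target used only for scoring
final, history = directed_evolution(start, goal)
print(final, history)
```

Because only a variant that beats its parent is carried forward, the recorded score never decreases from round to round, mirroring the stepwise accumulation of improvements rather than a search for "the best" variant in a single library.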

Library screening
A reliable enzyme production system is a prerequisite for a suitable high-throughput screening system. Although enzymes are isolated from numerous organisms, directed evolution is performed in only a few different host organisms, with Escherichia coli, Bacillus subtilis and Saccharomyces cerevisiae being the standard hosts, based on their high transformation efficiencies and well-established genetic manipulation tools. Further considerations on selecting an appropriate host in directed evolution have been discussed, describing bacterial, yeast, insect and mammalian cells as feasible hosts for directed enzyme evolution (Pourmir and Johannes, 2012). It is important that enzyme production is sufficient for activity detection and homogeneous across the host population, which can be statistically assessed by measuring enzymatic activity from multiple initial cultures of the selected host.
Discussions regarding theoretical diversity in random mutagenesis methods lose relevance if there is no screening or selection platform that can sample a meaningful percentage of the generated library.
High-throughput assay development usually involves a balance between substrate selection, assay complexity, and detection limits. Protein engineers often face the challenge of not being able to use the "real" substrate for reasons of availability, cost or complexity. Using a model substrate for a library screening campaign, though initially more convenient, might lead to optimization of the enzyme towards the model substrate, to the detriment of the "real" substrate.
Assay complexity usually has an inverse relationship with throughput. Selection systems, which rely on the link between cell growth and enzyme functionality, provide the highest throughput relative to complexity and reduce the need for special instruments (Fig. 2A). Selection follows library transformation into the host and allows large libraries to be screened by growing transformed cells in a selective medium lacking essential nutrients or containing toxic compounds. Selection assays are therefore mostly limited to detoxifying enzymes or enzymes synthesizing compounds necessary for cell growth and survival. Selection assays can also be designed to screen for enantioselective enzymes (Boersma et al., 2008), where the desired enantiopure substrate releases essential nutrients upon reaction, while the undesired enantiomer is supplied as an inhibitor-releasing or poison-releasing substrate, allowing only the variants having the desired activity to grow. Next in the scale of complexity and throughput are agar plate assays, in which substrate conversion around single colonies generates a visual signal directly or indirectly in the agar medium (Fig. 2A). Conversion of chromogenic or fluorogenic substrates can be observed as a change in colony color or fluorescence, or as the generation of halos around the active colonies, allowing active variants to be identified among those growing on the agar plate. Agar plate screening methods are, however, not suitable for quantifying the catalytic activities of individual variants in the library, which makes them best suited as pre-screening methods.
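One simple way to assess the homogeneity of enzyme production is the coefficient of variation of the activity measured in replicate cultures of the parent. The sketch below uses made-up activity readings; the rule for flagging putative hits (parent mean plus three standard deviations) is an assumption chosen for illustration, not a criterion taken from this article.

```python
from statistics import mean, stdev


def coefficient_of_variation(activities):
    """Sample standard deviation divided by the mean (dimensionless)."""
    return stdev(activities) / mean(activities)


def hit_threshold(parent_activities, n_sd=3.0):
    """Activity above which a variant is flagged (assumed rule: mean + n_sd * SD)."""
    return mean(parent_activities) + n_sd * stdev(parent_activities)


# Hypothetical activities (arbitrary units) from wells inoculated with the parent only.
parent_wells = [1.02, 0.95, 1.08, 0.99, 1.05, 0.97, 1.01, 0.93]

cv = coefficient_of_variation(parent_wells)
print(f"CV of parent activity: {cv:.1%}")
print(f"putative hit threshold: {hit_threshold(parent_wells):.3f}")
```

The larger the variation among identical parent cultures, the larger the improvement a variant must show before it can be distinguished from noise.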
The microtiter plate (MTP) is the most used format for library screening, mainly because it performs as a miniature cuvette system in which the optimal ratio between assay complexity and throughput is achieved with current mainstream technology (Fig. 2B). A large diversity of enzymatic assays has been ported to the MTP format, and an equal number have been developed specifically for the format. Single colonies isolated from a mutant library are inoculated and grown in 96-well plates (master plate) and replicated into an expression plate, where the enzyme is produced and extracted by removal of the biomass (extracellular enzymes) or by cell disruption (intracellular enzymes); subsequently, the assay is performed in 96- or 384-well format. The main advantage of MTP screening systems is the wide availability of analytical tools and standardized equipment, enabling quantitative activity measurements of each screened variant and thus providing a more complete dataset for hit evaluation compared to all other high-throughput screening methods. On the other hand, MTP assays have a throughput of $10^{3}$ to $10^{5}$ variants per round, which nowadays places 96-well plates in the medium-throughput category.
Further miniaturization of the system leads to the use of single host cells as femtoliter-scale reaction compartments, with individual assessment using flow cytometry (Fig. 2C). A key requirement in this approach is to consistently keep the link between the genotype and the phenotype of the cell, meaning that, in the context of the reaction, active enzyme variants should label exclusively the cell in which they are produced. This is achieved by the accumulation of a fluorescent product in the cell, enabling the identification and recovery of the active variants. A whole-cell cytochrome P450 monooxygenase assay for O-dealkylation was reported in which the fluorogenic substrate 7-benzoxy-3-carboxy-coumarin ethyl ester is converted and retained in the E. coli cell due to the spontaneous cleavage of the ethyl ester group to the corresponding carboxylic acid. An epPCR library of cytochrome P450 BM3 monooxygenase was analyzed at a rate of $3.6 \times 10^{7}$ events per day using flow cytometry; active cells were sorted onto agar plates, and the enriched active population was re-analyzed and screened by flow cytometry. After three rounds of enrichment, variants with up to 7-fold increased activity were identified, validating the direct cell labeling approach (Ruff et al., 2012). Success in this approach relies heavily on the ability of the cell to take up the substrate and to accumulate the fluorescent product, allowing the identification of active variants.
The product accumulation challenge has been addressed by the development of microencapsulation methods in which cells producing the target enzyme are entrapped together with the substrate and product in water/oil/water micro-emulsions (< 70 μm), which are subsequently analyzed at a speed of thousands per second by flow cytometry (Griffiths and Tawfik, 2006) (Fig. 2D, i). This approach retains the high-throughput analysis of flow cytometry whilst allowing a wider chemical range of substrates and products, which need not remain inside the cell producing the active variant (Tu et al., 2011).
The microencapsulation approach, combined with flow cytometry analysis, made it possible to skip the cell host completely by producing the enzyme directly from the DNA library by in vitro protein translation, an approach termed in vitro compartmentalization (IVC) (Miller et al., 2006, Tawfik and Griffiths, 1998) (Fig. 2D, ii). In vitro (cell-free) enzyme production provides a simpler biochemical and genetic environment for the selection pressure, enabling for example the production and screening of toxic proteins, and the addition of non-natural cofactors or modified amino acids for labeling the translated protein (Wang et al., 2009). IVC prevents the loss of gene diversity incurred in the cloning and transformation steps by performing protein translation directly from the DNA obtained in the PCR reaction used to generate the mutant library. The use of water-in-oil-in-water emulsions and cell-free protein expression reduces experimental time by removing the cell growth steps in cloning, transformation and protein production. Despite the conceptual elegance of the IVC screening method and its promising potential for directed enzyme evolution, only a small number of examples describing routine application of in vitro compartmentalization have been reported relative to the technical significance of the method; these include the directed evolution of a methyltransferase, a beta-galactosidase, and a thiolactonase (Aharoni et al., 2005, Mastrobattista et al., 2005, Tawfik and Griffiths, 1998). This suggests that the IVC approach is not yet established in most protein engineering laboratories, likely due to multiple technical challenges (Lu and Ellington, 2012, Nishikawa et al., 2012). An important challenge of water-in-oil-in-water microencapsulation approaches is the generation of monodisperse, homogeneous droplets containing a single inner water phase, which must be stable in an aqueous sheath fluid and compatible with the pressure fluctuations of flow cytometry devices.
Current droplet formation approaches include two-step emulsification by dispersion and extrusion, which can generate a population of droplets around a specific diameter. The droplet size variation, however, can reach more than 100% between the smallest and biggest subpopulations (Mastrobattista et al., 2005).
Microfluidics technology has recently been introduced in the field of directed evolution as an alternative approach to overcome the encapsulation challenge in IVC. Homogeneous single and double emulsion generation has been shown to be a mature technology with broad applications in physics, chemistry and biology (Theberge et al., 2010). Successful approaches to expressing green fluorescent protein (GFP) from single DNA templates paved the way for the use of microfluidic encapsulation in directed evolution experiments (Dittrich et al., 2005). Besides the encapsulation of mutant libraries and protein translation, microfluidic devices have the capacity to integrate and quantitatively detect single-cell enzymatic assays, such as the phosphatase-driven cleavage of 3-O-methylfluorescein phosphate in the periplasm of encapsulated E. coli cells (Huebner et al., 2008) and the in-droplet cell lysis-based screening system for detecting the activity of an epPCR library of a promiscuous sulfatase towards phosphonate using the fluorogenic bis-(methylphosphonyl)fluorescein (Kintses et al., 2012). The latter example is especially interesting, since the microfluidic device included droplet fluorescence detection and a dielectrophoresis-based sorting system which allowed the recovery of droplets showing phosphonate-cleaving activity for DNA isolation and transformation into E. coli cells for the next round of directed evolution. Very recently, a completely in vitro high-throughput microfluidic screening system was reported for specific application in directed enzyme evolution using β-galactosidase as a proof of principle (Fallah-Araghi et al., 2012), allowing detection and sorting of up to 2000 droplets per second.
Screening systems available for directed enzyme evolution vary in both conceptual and technical complexity. To date, MTP screening platforms are the ones generally used by protein engineers due to their flexibility and automation. Emerging platforms, though conceptually and technically more advanced in terms of throughput, have yet to reach the routine maturity needed to displace MTP platforms from their privileged position.
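To put the throughput figures quoted in this section side by side, the sketch below estimates what screening one library would demand in each format. It uses the triple NNK library size from the library construction section and the flow cytometry and microfluidic rates cited above; it ignores control wells, empty droplets and repeated events, so the results are optimistic order-of-magnitude estimates. The function names are illustrative.

```python
import math


def plates_needed(n_variants: int, wells_per_plate: int = 96) -> int:
    """Microtiter plates required if every well holds one variant (no controls)."""
    return math.ceil(n_variants / wells_per_plate)


def seconds_to_screen(n_variants: int, events_per_second: float) -> float:
    """Analysis time if every detected event is a distinct variant."""
    return n_variants / events_per_second


LIBRARY = 98_304                      # triple NNK SSM library, threefold oversampled
FLOW_CYTOMETRY_RATE = 3.6e7 / 86_400  # events per second, from 3.6 x 10^7 events per day
MICROFLUIDIC_RATE = 2000              # droplets per second (upper value quoted above)

print(f"96-well plates:  {plates_needed(LIBRARY)}")
print(f"384-well plates: {plates_needed(LIBRARY, 384)}")
print(f"flow cytometry:  {seconds_to_screen(LIBRARY, FLOW_CYTOMETRY_RATE):.0f} s")
print(f"microfluidics:   {seconds_to_screen(LIBRARY, MICROFLUIDIC_RATE):.0f} s")
```

The comparison covers raw analysis capacity only; it says nothing about the quantitative data per variant, which is where the text locates the main advantage of the MTP format.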

Analysis of the output from a directed evolution round
The identifi cation of improved enzyme variants after library screening is still far from the end of a directed evolution campaign. It is infrequent that a directed evolution round yields a single "better" variant; commonly, a population of 3 to 5 variants showing various levels of increased performance compared to the parent enzyme are identifi ed each screening round. Depending on the mutational load applied in the library, each of the improved variants might contain one or more amino acid substitutions. A common approach is to "clean" selected variants by performing single site-directed mutagenesis (SDM) on the parent gene, thus identifying amino acids substitutions responsible of the observed activity changes. Furthermore, mutational cleaning, in addition to isolating meaningful amino acid substitutions, results in identifying synergistic interactions between amino acid substitutions and also removing neutral and nonbeneficial amino acid substitutions which are thought to decrease inherent stability and thus the "evolvability" of the enzyme for future rounds (Bloom et al., 2006). Experimental characterization of cleaned variants is also convenient when structure-function relationships are investigated, since changes in activity and stability can be directly attributed to the present amino acid substitutions. A more complex scenario is when a large number of selected variants have multiple amino acid substitutions. A statistical-based approach is reported in which sequence-activity data is analyzed for a population of variants, in order to extract and label the beneficial/detrimental factor of single amino acid substitutions and proposing new combinations expecting increased activity, without the need of constructing the single mutants (Brouk et al., 2010). 
Using this approach and a broad dataset of sequence-activity obtained from directed evolution and rational design, 2-phenylethanol degradation by a multicomponent toluene 4-monooxygenase from Pseudomonas mendocina was improved up to 7.3 times in average from the initially found improved variants only by recombination of known amino acid substitutions from previous rounds (Brouk et al., 2010). The main advantage of the statistical approach is that, similar to directed evolution, no previous structural information is necessary to propose new combinations, though it may be complemented by it. The interaction between amino acid substitutions, however, is regarded as additive -the sum of their individual eff ect-thus underestimating synergistic eff ects between substitutions.
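The idea of an additive sequence-activity model can be illustrated with a small least-squares fit. This is a generic sketch and not necessarily the procedure used in the cited study: the substitution labels and activity values are invented, activities are assumed to be expressed as log fold-changes relative to the parent, and the solver is a plain Gauss-Jordan elimination on the normal equations.

```python
def fit_additive_model(variants, log_activities):
    """Least-squares estimate of one additive contribution per substitution."""
    subs = sorted({s for v in variants for s in v})
    X = [[1.0 if s in v else 0.0 for s in subs] for v in variants]
    n = len(subs)
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(X, log_activities)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("substitution effects cannot be separated with these variants")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return {s: b[i] / A[i][i] for i, s in enumerate(subs)}


def predict(contributions, variant):
    """Predicted log fold-change of a variant as the sum of its contributions."""
    return sum(contributions[s] for s in variant)


# Hypothetical screening output: only multiply substituted variants were measured.
variants = [{"sub1", "sub2"}, {"sub2", "sub3"}, {"sub1", "sub3"}, {"sub1", "sub2", "sub3"}]
log_activities = [0.8, 0.1, 0.3, 0.6]

contributions = fit_additive_model(variants, log_activities)
beneficial = {s for s, c in contributions.items() if c > 0}
print(contributions)
print("proposed combination:", sorted(beneficial), predict(contributions, beneficial))
```

In this made-up dataset the fit separates the individual contributions without any single mutant having been measured, and proposes combining only the substitutions with a positive estimated effect. Because the model is purely additive, any synergy or antagonism between substitutions ends up in the residuals, which is the limitation noted above.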
Large populations of amino acid substitutions found in different improved variants can be analyzed computationally, when structural information is available, by identifying clusters of amino acid substitutions within the targeted protein structure. This clustering allows the construction of variants containing amino acid substitutions that "can interfere with each other", so that combinatorial effects can be identified and analyzed in smaller subpopulations. Such an analysis makes it possible to identify the most beneficial combinations in each cluster and subsequently integrate them into a single variant (Bocola et al., 2004).

CURRENT CHALLENGES OF ENZYME EVOLUTION
It is clear that for each step in directed evolution there are multiple methods and strategies that can be, and have been shown to be, successful in identifying improved enzymes. Choosing the appropriate approach affects not only how easy or hard it will be to generate and identify the desired enzyme variant, but also the information that will be generated along the way. Nowadays, challenges in directed enzyme evolution can be divided into conceptual challenges and technical challenges.
The ultimate challenge in protein engineering is to understand and therefore predict protein behavior. A main conceptual challenge of directed enzyme evolution is to predict enzyme evolutionary behavior in the laboratory and relate it to that of natural evolution. Laboratory evolution allows enzymes to be studied and modified as isolated entities in the context of the selection pressure to which they are subjected during the screening campaign. In natural evolution, on the other hand, the selection pressure is multifactorial, acts over an extended period of time, and varies across different evolutionary periods. It is proposed that nature evolves enzymes through accumulation of neutral or even deleterious mutations and, in contrast to directed evolution, allows the continuity of inactive sequences (Romero and Arnold, 2009), suggesting that every natural enzyme carries a certain share of accumulated neutral mutations and that inactive sequences can over time recover or give rise to enzymes with new activities, for example after gene duplication events. Considering that the starting points for directed evolution are natural enzymes, there is always uncertainty about how the neutral mutations will translate when a unique evolutionary pressure is applied to the enzyme. An interesting way to convey the enzyme's response to the pressure applied in laboratory evolution is the concept of the protein fitness landscape: the sequence space (genotype) of a given protein is represented as (theoretical) single variants, with fitness (phenotype) added as a third dimension for each of these variants with respect to one or more applied evolutionary pressures (screening assay) (Carneiro and Hartl, 2010; Romero and Arnold, 2009).
The shape of this landscape ranges from the "Mount Fuji" shape, in which the uphill climb is always connected by increasingly active sequences that finally reach a single optimum sequence, to the "rugged badlands" landscape, in which many optima are present but separated by low-activity or inactive sequences. The former landscape allows the optimum to be reached by navigating through multiple active sequences along different evolutionary paths; the latter, however, makes it nearly impossible to navigate between the different optima, since directed evolution does not allow inactive sequences to be used in further rounds.
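The contrast between the two landscape shapes can be made concrete with a toy model. The sketch below is an illustration under strong simplifying assumptions, not a description of any real enzyme: sequences are binary strings of length 8, the "Mount Fuji" landscape is a simple sum of independent position contributions, the rugged landscape assigns an independent random fitness to every sequence, and a round of directed evolution is reduced to picking the best single mutant. All names in the code are hypothetical.

```python
import itertools
import random

SEQ_LEN = 8  # toy binary sequences; real proteins have 20 options per position

def fuji_fitness(seq):
    """Single smooth peak: every position contributes independently."""
    return sum(seq)  # the all-ones sequence is the only optimum

def make_rugged_fitness(seed=1):
    """Uncorrelated landscape: each sequence gets its own random fitness."""
    rng = random.Random(seed)
    table = {seq: rng.random()
             for seq in itertools.product((0, 1), repeat=SEQ_LEN)}
    return table.__getitem__

def neighbors(seq):
    """All sequences that differ from seq at exactly one position."""
    return [seq[:i] + (1 - seq[i],) + seq[i + 1:] for i in range(len(seq))]

def adaptive_walk(start, fitness):
    """Move to the best single mutant each round; stop when none improves."""
    current = start
    while True:
        best = max(neighbors(current), key=fitness)
        if fitness(best) <= fitness(current):
            return current
        current = best

start = (0,) * SEQ_LEN
rugged = make_rugged_fitness()
fuji_end = adaptive_walk(start, fuji_fitness)
rugged_end = adaptive_walk(start, rugged)
print("Fuji walk ends at:  ", fuji_end)
print("Rugged walk ends at:", rugged_end)
```

On the smooth landscape the walk always ends at the single optimum. On the rugged one it stops at the first local optimum it reaches, which need not be the global one, because every path onward would pass through a less fit sequence that the selection step discards.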
Genetic diversity generation has shifted from a technical challenge to a conceptual one. Nowadays, routinely used methods for generating mutant libraries range from single site-saturation mutagenesis using the QuikChange method to the transversion-enriched full sequence saturation mutagenesis (SeSaM). This range of possibilities has generated fruitful debate about which is the most effective strategy to achieve rapid and dramatic improvements in directed evolution. The "low mutagenic load" approach, in which on average 1 or 2 amino acid substitutions are introduced, generates libraries with a high percentage of active variants from which incremental improvements (2- to 3-fold in activity/stability) are obtained, relying on several iterative rounds of evolution to achieve a dramatic improvement in the desired property.
On the other hand, a high mutational rate approach generates libraries with a low percentage of active clones, relying heavily on a high throughput screening system to identify and recover active variants for further analysis. A higher mutagenic rate could generate epistatic effects that could result in alternative adaptation pathways which are not reachable by low mutational rate approaches (Salverda et al., 2011).
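The trade-off between mutational load and the share of active clones can be sketched numerically. The example below rests on two assumptions that are not taken from the source: the number of amino acid substitutions per clone follows a Poisson distribution, and each substitution independently preserves activity with a fixed probability q. The value of q and the two mean loads are hypothetical and chosen only for illustration.

```python
import math

def poisson_pmf(k, mean):
    """Probability that a clone carries exactly k substitutions."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def fraction_active(mean, q):
    """Expected active fraction if each substitution independently
    preserves activity with probability q (sum over k of P(k) * q**k)."""
    return math.exp(-mean * (1.0 - q))

Q_TOLERATED = 0.7  # hypothetical value, chosen only for illustration

for mean in (1.5, 6.0):  # hypothetical "low" and "high" mutational loads
    print(f"mean load {mean}: "
          f"unmutated clones {poisson_pmf(0, mean):.3f}, "
          f"single mutants {poisson_pmf(1, mean):.3f}, "
          f"expected active fraction {fraction_active(mean, Q_TOLERATED):.3f}")
```

Under these assumptions the active fraction falls exponentially with the mean load, which is why high mutational rate libraries depend so strongly on screening throughput.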
Random mutagenesis and focused libraries have moved from a mutually exclusive to a complementary relationship; a successful directed evolution campaign will often start with epPCR or SeSaM as an "explorative" screening round in order to identify relevant amino acid positions, which can subsequently be saturated and recombined using SDM or SSM to obtain optimized variants and generate structure-function knowledge. Conversely, in semi-rational active-site-focused campaigns, random mutagenesis libraries can be screened in parallel for increased stability in order to identify stabilizing amino acid substitutions, which can then be introduced into improved variants to increase enzyme evolvability in further rounds.
Reducing the timescale of a directed evolution campaign is the main challenge for studying the evolutionary behavior of enzymes. Generally, directed evolution campaigns generate isolated examples of improved variants and hypotheses on the effect of the introduced amino acid substitutions, rarely reporting a long-term evolutionary story. In one such case, the lineage of a P450 monooxygenase evolved towards propane hydroxylation was studied, revealing how the original P450 BM3 monooxygenase, which hydroxylates C12-C20 fatty acids, was transformed, over 20 amino acid substitutions, into a P450 propane monooxygenase. Key changes in the substrate binding pocket of P450 BM3 allowed the appearance of activity towards a wide range of substrates, including propane. Further changes in the substrate recognition mechanisms narrowed the substrate profile around propane, ending in a propane-specific P450 PMO variant (Fasan et al., 2008). Reducing the time required for a directed evolution round requires overcoming two major bottlenecks: i) the growth rate of the host organism, which affects transformation, pre-culture and protein production timescales, and ii) the library screening throughput. Currently, in vitro compartmentalization (IVC), owing to its cell-free enzyme production and ultrahigh-throughput screening capabilities, seems to be the most promising methodology for achieving multiple rounds of directed enzyme evolution in a matter of weeks or even days. An advanced IVC technology will allow a higher throughput per round of directed evolution and more rounds per campaign, thus increasing the probability of identifying significantly improved enzyme variants and allowing the study of the evolutionary behavior of enzymes under different selection pressures.

GENERATING AND IMPROVING INDUSTRIAL ENZYMES BY DIRECTED EVOLUTION
Industrial ("white") biotechnology (Frazzetto, 2003) comprises mainly the production of chemicals, fuels and materials using isolated enzymes (biocatalysis) or enzyme-producing living cells (fermentation). The rapid development of industrial biotechnology in recent years can be attributed to two main driving forces: the increased economic efficiency of biotechnological production compared to traditional chemical processes, and the current social demand for sustainable, environment-friendly industrial processes (Tang and Zhao, 2009). The global demand for replacing fossil resources with renewable agricultural crops or low-value side products as raw starting materials ensures that the market share of chemicals produced by industrial biotechnology (fuels, plastics, fine and bulk chemicals) will continue to grow in the coming years (Soetaert and Vandamme, 2006).
Enzymes are an important part of industrial biotechnology: the use of proteases and lipases in the detergent industry, α-amylases for the production of glucose, phytases in the animal feed industry, and the biosynthesis of riboflavin (vitamin B2) and cobalamin (vitamin B12) are just the classical examples of enzymes applied to industrial production. Nowadays, enzymes have taken an important role in the production of fine chemicals, pharmaceuticals, food additives and supplements, colorants, vitamins, pesticides, bio-plastics, solvents, bulk chemicals and biofuels (Singh, 2012; Soetaert and Vandamme, 2006).
The use of enzymes, however, is limited when their economic efficiency, stability or substrate/product specificity do not meet the demands of chemical processes on an industrial scale. Protein engineering, whether rational, evolutionary or combined, is currently the preferred approach for tailoring biocatalysts to specific needs in products or production. As an example, we discuss two directed enzyme evolution campaigns in which hydrolases were tailored for specific industrial applications.

Adaptation of biomass degrading enzymes for non-conventional solvents
During the last decade, chemical industries have been evaluating biomass-based alternatives for bio- and chemical manufacturing as a way to secure energy production (Grande and Domínguez de María, 2012). In that mindset, cellulose depolymerization has become a key step in biorefineries to produce fermentable sugars. Cellulases, together with hemicellulases, cellobioases, and endo- and exoglucanases, are able to perform such depolymerization under mild conditions. Biomass pretreatment, nevertheless, includes several mechanical and chemical steps to decrease cellulose crystallinity and increase its availability for dissolution. Ionic liquids (IL) and, more recently, deep eutectic solvents (DES) have been proposed as "green" solvents for cellulose pretreatment (Domínguez de María and Maugeri, 2011); however, high IL or DES content usually leads to rapid enzyme inactivation, hindering the subsequent depolymerization steps. A cellulase, CelA2, isolated from a metagenomic library constructed from a biogas plant (Ilmberger et al., 2012), was subjected to directed evolution in order to increase its resistance to choline chloride:glycerol (ChCl:Gly) as a co-solvent. In the first round, a CelA2 epPCR library was generated and screened using a specifically developed 4-methylumbelliferyl-β-D-cellobioside fluorescent MTP assay, yielding a variant, 4D1, harboring 6 amino acid substitutions (Leu21Pro; Leu184Gln; His288Arg; Lys299Ile; Asp330Gly; Asn442Asp) with an increased specific activity in 30% (v/v) ChCl:Gly. In the second evolution round, 6 single SSM libraries were constructed in order to identify which amino acid substitutions were responsible for the activity change.
Screening results revealed that only variants from the library at position 288 showed a significant increase in activity. Two substitutions were identified: one that increased the resistance to ChCl:Gly (His288Arg), behaving similarly to 4D1, and one that resulted in a more than 8-fold increase in specific activity compared to the original CelA2 cellulase, with a similar inactivation rate in ChCl:Gly (Lehmann et al., 2012). This cellulase evolution study is an interesting example of how the mutational cleaning approach, through single SSM, can identify which of the six amino acid substitutions present in variant 4D1 (here His288Arg) is responsible for the increased co-solvent resistance, while additionally revealing a further substitution (His288Phe) that dramatically increased specific activity (Fig. 3 B).
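The design of the second round can be written down as a short sketch. The six substitutions are those reported for variant 4D1 above; the helper names and the representation of each site-saturation library as a list of 19 labelled single substitutions are illustrative assumptions, and the code says nothing about the codon degeneracy or library size actually used by Lehmann et al. (2012).

```python
import re

AMINO_ACIDS = ["Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
               "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val"]

# Substitutions reported for variant 4D1 in the text
VARIANT_4D1 = ["Leu21Pro", "Leu184Gln", "His288Arg",
               "Lys299Ile", "Asp330Gly", "Asn442Asp"]

def parse_substitution(label):
    """Split e.g. 'His288Arg' into ('His', 288, 'Arg')."""
    match = re.fullmatch(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})", label)
    if match is None:
        raise ValueError(f"unrecognized substitution label: {label}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant

def ssm_library(wild_type, position):
    """All 19 single substitutions possible at one position."""
    return [f"{wild_type}{position}{aa}" for aa in AMINO_ACIDS if aa != wild_type]

# One saturation library per mutated position of 4D1, built on the parent gene
libraries = {}
for label in VARIANT_4D1:
    wild_type, position, _ = parse_substitution(label)
    libraries[position] = ssm_library(wild_type, position)

for position, members in libraries.items():
    print(position, len(members), "single variants")
```

Saturating each position, rather than reintroducing only the six observed substitutions, is what allows a substitution absent from 4D1, such as His288Phe, to be found during cleaning.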

Optimizing and developing subtilisin proteases for detergent industry
Alkaline proteases from Bacillus species are among the first successful examples of the use of bacterial enzymes in end products. Since the 1960s subtilisin proteases have been applied in the detergent industry as active additives for the degradation of protein stains, and at the beginning of the 1990s the first engineered enzymes entered the laundry market, establishing themselves as industrial benchmarks (Maurer, 2004). The application of subtilisin proteases in detergents continues to be challenged by market changes and by the type of product in which they are applied, for example the need for improved proteolytic performance at low temperatures for energy-efficient "cold" washing cycles and for storage stability in liquid detergents. Subtilisins are produced as extracellular enzymes, which greatly simplifies protease separation from cell biomass and facilitates downstream purification procedures (Gupta et al., 2002). In order to generate stable subtilisins able to perform at low temperatures, the activity at 15°C of a Bacillus gibsonii alkaline protease (BgAP) was increased through 3 rounds of SeSaM-based directed evolution. Libraries were screened in parallel for increased activity and for increased thermal stability by measuring proteolytic activity at 15°C and after incubation at 58°C. Directed BgAP evolution yielded a set of BgAP variants with increased specific activity at 15°C and a set with increased thermal resistance. Recombination of both sets of amino acid substitutions finally resulted in the variant MF1, with a 1.5-fold increased specific activity and a more than 100-fold prolonged half-life at 60°C (224 min compared to 2 min for the WT BgAP) (Martinez et al., 2013). Through iterative combination of these amino acid subsets it was shown that improved thermal resistance (requiring strong interactions) and improved activity (often requiring flexibility) can be obtained within a single protease variant harboring six amino acid substitutions.
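To put the reported half-lives into perspective, the following sketch converts them into residual activity over time. It assumes simple first-order thermal inactivation, which is an assumption made here for illustration and is not stated in the source; only the two half-lives (2 min and 224 min at 60°C) are taken from the text.

```python
WT_HALF_LIFE_MIN = 2.0     # WT BgAP at 60°C, from the text
MF1_HALF_LIFE_MIN = 224.0  # variant MF1 at 60°C, from the text

def residual_activity(t_min, half_life_min):
    """Fraction of activity left after t_min, assuming first-order decay."""
    return 0.5 ** (t_min / half_life_min)

fold_improvement = MF1_HALF_LIFE_MIN / WT_HALF_LIFE_MIN  # 112.0

for t in (2, 10, 60):
    print(f"{t:>3} min at 60°C: "
          f"WT {residual_activity(t, WT_HALF_LIFE_MIN):.3f}, "
          f"MF1 {residual_activity(t, MF1_HALF_LIFE_MIN):.3f}")
```

Under this assumption the wild type retains about 3% of its activity after 10 min, whereas MF1 retains about 97%, consistent with the more than 100-fold difference in half-life.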
Following a similar strategy, a novel fluorescent microtiter plate screening assay for peroxo acid (R-COOOH) detection was developed, based on the hypobromite-induced O-dearylation of 7-(4′-aminophenoxy)-3-carboxycoumarin. The screening system was validated by improving the perhydrolytic activity of a subtilisin Carlsberg variant for methyl butyrate (Despotovic et al., 2012). Peroxo acids are typical bleaching agents (Maurer, 2004) and their enzymatic in situ production in the cleaning process would be desirable. The oxidative resistance of the peroxo acid producing subtilisin Carlsberg variant was improved by site-saturation mutagenesis at positions Trp216 and Met221 and further recombination. Variant M4 (W216M, M221) showed a ~5-fold increase in the peroxyacetic acid concentration necessary to decrease the enzyme activity by 50% (3.1 mM for M4 versus 0.6 mM for the parent) (Vojcic et al., 2013).

CLOSING REMARKS AND OUTLOOK
Each individual directed enzyme evolution campaign is a challenge, in which the best possible enzyme for a specific trait should be generated and identified using the quickest and most promising strategy. Selecting the appropriate evolution strategy implies considering the expected improvement from the selected parent, the conceptual and technical limitations of library generation and screening systems, and how to evaluate the output of each evolution round, in order to identify and select amino acid substitutions to be carried over to the next round.
Directed evolution is nowadays a general and often validated approach to generate tailored enzymes for specific applications, additionally providing valuable information on how enzymes adapt and hints on how selection under simple evolutionary pressure can be extrapolated to natural evolution.
The future of directed enzyme evolution as a research field seems quite promising; the increasing demand for engineered enzymes, especially for industrial applications, will continue to fuel research and development of faster, better, more efficient mutagenesis and screening methods for tailoring biocatalysts.
Due to rapid advances in diversity generation and screening technologies, it is not difficult to envision a near future in which protein engineering, and especially directed protein evolution, shares a primary role in developing biocatalysts for industrial and end-user processes, leading the way in the understanding of protein adaptation and evolutionary behavior.

ACKNOWLEDGEMENTS
The authors thank Dr. Marco Bocola for critical discussion and for his editing role in this article.

Fig. 3. Enzyme evolution strategies applied to two industry relevant enzymes. A) Directed evolution of Bacillus gibsonii alkaline protease (BgAP) using parallel screening for high activity at low temperatures (left) and thermal resistance (right) with 3 rounds of sequence saturation mutagenesis (SeSaM). Two single variants with improved temperature-activity profiles were identified, and the identified amino acid substitutions were combined to generate a single enzyme variant with a wider temperature profile compared to the wild type (Martinez et al., 2013). B) Directed evolution of a metagenome-isolated cellulase (CelA2) towards increased performance in aqueous solutions of the deep eutectic solvent choline chloride with glycerol (ChCl:Gly). After screening an error-prone PCR library, variant 4D1 was identified, which harbored 6 amino acid substitutions. In order to identify single amino acid substitutions, 6 site-saturation mutagenesis (SSM) libraries were constructed and screened, revealing that substitution His288Arg was responsible for the observed improvement in 4D1. Additionally, substitution His288Phe was identified, which results in an 8-fold increased specific activity compared to the wild type (Lehmann et al., 2012).