What's an Evolution Strategy (ES)?

In 1963 two students at the Technical University of Berlin (TUB) met and were soon to collaborate on experiments which used the wind tunnel of the Institute of Flow Engineering. During the search for the optimal shapes of bodies in a flow, which was then a matter of laborious intuitive experimentation, the idea was conceived of proceeding strategically. However, attempts with the coordinate and simple gradient strategies were unsuccessful. Then one of the students, Ingo Rechenberg, now Professor of Bionics and Evolutionary Engineering, hit upon the idea of trying random changes in the parameters defining the shape, following the example of natural MUTATIONs. The EVOLUTION STRATEGY was born. A third student, Peter Bienert, joined them and started the construction of an automatic experimenter, which would work according to the simple rules of mutation and SELECTION. The second student, Hans-Paul Schwefel, set about testing the efficiency of the new methods with the help of a Zuse Z23 computer; for there were plenty of objections to these "random strategies."

In spite of an occasional lack of financial support, the Evolutionary Engineering Group which had been formed held firmly together. Ingo Rechenberg received his doctorate in 1970 (Rechenberg 73). It contains the theory of the two membered EVOLUTION strategy and a first proposal for a multimembered strategy which in the nomenclature introduced here is of the (m+1) type. In the same year financial support from the Deutsche Forschungsgemeinschaft (DFG, Germany's National Science Foundation) enabled the work, that was concluded, at least temporarily, in 1974 with the thesis "Evolutionsstrategie und numerische Optimierung" (Schwefel 77).

Thus, EVOLUTION STRATEGIEs were invented to solve technical OPTIMIZATION problems (TOPs) like e.g. constructing an optimal flashing nozzle, and until recently ES were only known to civil engineering folks, as an alternative to standard solutions. Usually no closed form analytical objective function is available for TOPs and hence, no applicable optimization method exists, but the engineer's intuition.

The first attempts to imitate principles of organic evolution on a computer still resembled the iterative optimization methods known up to that time: In a two-membered or (1+1) ES, one PARENT generates one OFFSPRING per GENERATION by applying NORMALLY DISTRIBUTED mutations, i.e. smaller steps occur more likely than big ones, until a child performs better than its ancestor and takes its place. Because of this simple structure, theoretical results for STEPSIZE control and CONVERGENCE VELOCITY could be derived. The ratio between successful and all mutations should come to 1/5: the so-called 1/5 SUCCESS RULE was discovered. This first algorithm, using mutation only, has then been enhanced to a (m+1) strategy which incorporated RECOMBINATION due to several, i.e. m parents being available. The mutation scheme and the exogenous stepsize control were taken across unchanged from (1+1) ESs.

Schwefel later generalized these strategies to the multimembered ES now denoted by (m+l) and (m,l) which imitates the following basic principles of organic evolution: a POPULATION, leading to the possibility of recombination with random mating, mutation and selection. These strategies are termed PLUS STRATEGY and COMMA STRATEGY, respectively: in the plus case, the parental generation is taken into account during selection, while in the comma case only the offspring undergoes selection, and the parents die off. m (usually a lowercase mu, denotes the population size, and l, usually a lowercase lambda denotes the number of offspring generated per generation). Or to put it in an utterly insignificant and hopelessly outdated language:

     (define (Evolution-strategy population)
       (if (terminate? population)
             (cond (plus-strategy?
                     (union (mutate
                              (recombine population))
                       (recombine population))))))))
However, dealing with ES is sometimes seen as "strong tobacco," for it takes a decent amount of probability theory and applied STATISTICS to understand the inner workings of an ES, while it navigates through the hyperspace of the usually n-dimensional problem space, by throwing hyperelipses into the deep...

Luckily, this medium doesn't allow for much mathematical ballyhoo; the author therefore has to come up with a simple but brilliantly intriguing explanation to save the reader from falling asleep during the rest of this section, so here we go:

Imagine a black box. A large black box. As large as, say for example, a Coca-Cola vending machine. Now, [..] (to be continued)

A single INDIVIDUAL of the ES' population consists of the following GENOTYPE representing a point in the SEARCH SPACE:

OBJECT VARIABLES Real-valued x_i have to be tuned by recombination and mutation such that an objective function reaches its global optimum. Referring to the metaphor mentioned previously, the x_i represent the regulators of the alien Coka-Cola vending machine.

STRATEGY VARIABLEs Real-valued s_i (usually denoted by a lowercase sigma) or mean stepsizes determine the mutability of the x_i. They represent the STANDARD DEVIATION of a (0, s_i) GAUSSIAN DISTRIBUTION (GD) being added to each x_i as an undirected mutation. With an "expectancy value" of 0 the parents will produce offspring similar to themselves on average. In order to make a doubling and a halving of a stepsize equally probable, the s_i mutate log-normally, distributed, i.e. exp(GD), from generation to generation. These stepsizes hide the internal model the population has made of its ENVIRONMENT, i.e. a SELF-ADAPTATION of the stepsizes has replaced the exogenous control of the (1+1) ES.

This concept works because selection sooner or later prefers those individuals having built a good model of the objective function, thus producing better offspring. Hence, learning takes place on two levels: (1) at the genotypic, i.e. the object and strategy variable level and (2) at the phenotypic level, i.e. the FITNESS level.

Depending on an individual's x_i, the resulting objective function value f(x), where x denotes the vector of objective variables, serves as the PHENOTYPE (fitness) in the selection step. In a plus strategy, the m best of all (m+l) individuals survive to become the parents of the next generation. Using the comma variant, selection takes place only among the l offspring. The second scheme is more realistic and therefore more successful, because no individual may survive forever, which could at least theoretically occur using the plus variant. Untypical for conventional optimization algorithms and lavish at first sight, a comma strategy allowing intermediate deterioration performs better! Only by forgetting highly fit individuals can a permanent adaptation of the stepsizes take place and avoid long stagnation phases due to misadapted s_i's. This means that these individuals have built an internal model that is no longer appropriate for further progress, and thus should better be discarded.

By choosing a certain ratio m/l, one can determine the convergence property of the evolution strategy: If one wants a fast, but local convergence, one should choose a small HARD SELECTION, ratio, e.g. (5,100), but looking for the global optimum, one should favour a softer selection (15,100).

Self-adaptation within ESs depends on the following agents (Schwefel 87):

Randomness: One cannot model mutation as a purely random process. This would mean that a child is completely independent of its parents.

Population size: The population has to be sufficiently large. Not only the current best should be allowed to reproduce, but a set of good individuals. Biologists have coined the term "requisite variety" to mean the genetic variety necessary to prevent a SPECIES from becoming poorer and poorer genetically and eventually dying out.

COOPERATION: In order to exploit the effects of a population (m > 1), the individuals should recombine their knowledge with that of others (cooperate) because one cannot expect the knowledge to accumulate in the best individual only.

Deterioration: In order to allow better internal models (stepsizes) to provide better progress in the future, one should accept deterioration from one generation to the next. A limited life-span in nature is not a sign of failure, but an important means of preventing a species from freezing genetically.

ESs prove to be successful when compared to other iterative methods on a large number of test problems (Schwefel 77). They are adaptable to nearly all sorts of problems in optimization, because they need very little information about the problem, especially no derivatives of the objective function. For a list of some 300 applications of EAs, see the SyS-2/92 report . ESs are capable of solving high dimensional, multimodal, nonlinear problems subject to linear and/or nonlinear constraints. The objective function can also, e.g. be the result of a SIMULATION, it does not have to be given in a closed form. This also holds for the constraints which may represent the outcome of, e.g. a finite elements method (FEM). ESs have been adapted to VECTOR OPTIMIZATION problems (Kursawe 92), and they can also serve as a heuristic for NP-complete combinatorial problems like the TRAVELLING SALESMAN PROBLEM or problems with a noisy or changing response surface.


Kursawe, F. (1992) " Evolution strategies for vector optimization", Taipei, National Chiao Tung University, 187-193.

Kursawe, F. (1994) " Evolution strategies: Simple models of natural processes?", Revue Internationale de Syst?mique, France (to appear).

Rechenberg, I. (1973) "Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution", Stuttgart: Fromman-Holzboog.

Schwefel, H.-P. (1977) "Numerische Optimierung von Computermodellen mittels der Evolutionsstrategie", Basel: Birkh?user.

Schwefel, H.-P. (1987) "Collective Phaenomena in Evolutionary Systems", 31st Annu. Meet. Inter'l Soc. for General System Research, Budapest, 1025-1033.