In spite of an occasional lack of financial support, the Evolutionary
Engineering Group which had been formed held firmly together. Ingo Rechenberg
received his doctorate in 1970 (Rechenberg 73). It contains the theory
of the two membered EVOLUTION strategy and a first
proposal for a multimembered strategy which in the nomenclature introduced
here is of the (*m*+1) type. In the same year financial support from
the Deutsche Forschungsgemeinschaft (DFG, Germany's National Science Foundation)
enabled the work, that was concluded, at least temporarily, in 1974 with
the thesis "Evolutionsstrategie und numerische Optimierung" (Schwefel 77).

Thus, EVOLUTION STRATEGIEs were invented to solve technical OPTIMIZATION problems (TOPs) like e.g. constructing an optimal flashing nozzle, and until recently ES were only known to civil engineering folks, as an alternative to standard solutions. Usually no closed form analytical objective function is available for TOPs and hence, no applicable optimization method exists, but the engineer's intuition.

The first attempts to imitate principles of organic evolution on a computer
still resembled the iterative optimization methods known up to that time:
In a two-membered or (1+1) ES, one PARENT generates
one OFFSPRING per GENERATION
by applying NORMALLY DISTRIBUTED mutations, i.e. smaller
steps occur more likely than big ones, until a child performs better than
its ancestor and takes its place. Because of this simple structure, theoretical
results for STEPSIZE control and CONVERGENCE
VELOCITY could be derived. The ratio between successful and all
mutations should come to 1/5: the so-called 1/5 SUCCESS RULE
was discovered. This first algorithm, using mutation only, has then been
enhanced to a (*m*+1) strategy which incorporated RECOMBINATION
due to several, i.e. *m* parents being available. The mutation scheme
and the exogenous stepsize control were taken across unchanged from (1+1)
ESs.

Schwefel later generalized these strategies to the multimembered ES
now denoted by (*m*+*l*) and (*m,l*) which imitates the
following basic principles of organic evolution: a POPULATION,
leading to the possibility of recombination with random mating, mutation
and selection. These strategies are termed PLUS STRATEGY
and COMMA STRATEGY, respectively: in the plus case,
the parental generation is taken into account during selection, while in
the comma case only the offspring undergoes selection, and the parents
die off. *m* (usually a lowercase *mu*, denotes the population
size, and *l*, usually a lowercase *lambda* denotes the number
of offspring generated per generation). Or to put it in an utterly insignificant
and hopelessly outdated language:

(However, dealing with ES is sometimes seen as "strong tobacco," for it takes a decent amount of probability theory and applied STATISTICS to understand the inner workings of an ES, while it navigates through the hyperspace of the usually n-dimensional problem space, by throwing hyperelipses into the deep...define(Evolution-strategy population) (if(terminate? population) population (evolution-strategy (select (cond(plus-strategy? (union(mutate (recombine population)) population)) (comma-strategy? (mutate (recombine population))))))))

Luckily, this medium doesn't allow for much mathematical ballyhoo; the author therefore has to come up with a simple but brilliantly intriguing explanation to save the reader from falling asleep during the rest of this section, so here we go:

Imagine a black box. A large black box. As large as, say for example, a Coca-Cola vending machine. Now, [..] (to be continued)

A single INDIVIDUAL of the ES' population consists of the following GENOTYPE representing a point in the SEARCH SPACE:

OBJECT VARIABLES Real-valued x_i have to be tuned by recombination and mutation such that an objective function reaches its global optimum. Referring to the metaphor mentioned previously, the x_i represent the regulators of the alien Coka-Cola vending machine.

STRATEGY VARIABLEs Real-valued s_i (usually denoted by a lowercase sigma) or mean stepsizes determine the mutability of the x_i. They represent the STANDARD DEVIATION of a (0, s_i) GAUSSIAN DISTRIBUTION (GD) being added to each x_i as an undirected mutation. With an "expectancy value" of 0 the parents will produce offspring similar to themselves on average. In order to make a doubling and a halving of a stepsize equally probable, the s_i mutate log-normally, distributed, i.e. exp(GD), from generation to generation. These stepsizes hide the internal model the population has made of its ENVIRONMENT, i.e. a SELF-ADAPTATION of the stepsizes has replaced the exogenous control of the (1+1) ES.

This concept works because selection sooner or later prefers those individuals having built a good model of the objective function, thus producing better offspring. Hence, learning takes place on two levels: (1) at the genotypic, i.e. the object and strategy variable level and (2) at the phenotypic level, i.e. the FITNESS level.

Depending on an individual's x_i, the resulting objective function value
f(x), where x denotes the vector of objective variables, serves as the
PHENOTYPE (fitness) in the selection step. In a plus
strategy, the *m* best of all (*m+l*) individuals survive to
become the parents of the next generation. Using the comma variant, selection
takes place only among the *l* offspring. The second scheme is more
realistic and therefore more successful, because no individual may survive
forever, which could at least theoretically occur using the plus variant.
Untypical for conventional optimization algorithms and lavish at first
sight, a comma strategy allowing intermediate deterioration performs better!
Only by *forgetting* highly fit individuals can a permanent adaptation
of the stepsizes take place and avoid long stagnation phases due to misadapted
s_i's. This means that these individuals have built an internal model that
is no longer appropriate for further progress, and thus should better be
discarded.

By choosing a certain ratio *m/l*, one can determine the convergence
property of the evolution strategy: If one wants a fast, but local convergence,
one should choose a small HARD SELECTION, ratio, e.g.
(5,100), but looking for the global optimum, one should favour a softer
selection (15,100).

Self-adaptation within ESs depends on the following agents (Schwefel 87):

Randomness: One cannot model mutation as a purely random process. This would mean that a child is completely independent of its parents.

Population size: The population has to be sufficiently large. Not only the current best should be allowed to reproduce, but a set of good individuals. Biologists have coined the term "requisite variety" to mean the genetic variety necessary to prevent a SPECIES from becoming poorer and poorer genetically and eventually dying out.

COOPERATION: In order to exploit the effects of
a population (*m* > 1), the individuals should recombine their knowledge
with that of others (cooperate) because one cannot expect the knowledge
to accumulate in the best individual only.

Deterioration: In order to allow better internal models (stepsizes) to provide better progress in the future, one should accept deterioration from one generation to the next. A limited life-span in nature is not a sign of failure, but an important means of preventing a species from freezing genetically.

ESs prove to be successful when compared to other iterative methods
on a large number of test problems (Schwefel 77). They are adaptable to
nearly all sorts of problems in optimization, because they need very little
information about the problem, especially no derivatives of the objective
function. For a list of some 300 applications of EAs, see the SyS-2/92
report . ESs are capable of solving *high dimensional, multimodal, nonlinear
problems* subject to *linear and/or nonlinear constraints.* The
objective function can also, e.g. be the result of a SIMULATION,
it does not have to be given in a closed form. This also holds for the
constraints which may represent the outcome of, e.g. a finite elements
method (FEM). ESs have been adapted to VECTOR OPTIMIZATION
problems (Kursawe 92), and they can also serve as a heuristic for NP-complete
combinatorial problems like the TRAVELLING SALESMAN PROBLEM
or problems with a noisy or changing response surface.

**References**

Kursawe, F. (1992) " Evolution strategies for vector optimization", Taipei, National Chiao Tung University, 187-193.

Kursawe, F. (1994) " Evolution strategies: Simple models of natural processes?", Revue Internationale de Syst?mique, France (to appear).

Rechenberg, I. (1973) "Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution", Stuttgart: Fromman-Holzboog.

Schwefel, H.-P. (1977) "Numerische Optimierung von Computermodellen mittels der Evolutionsstrategie", Basel: Birkh?user.

Schwefel, H.-P. (1987) "Collective Phaenomena in Evolutionary Systems", 31st Annu. Meet. Inter'l Soc. for General System Research, Budapest, 1025-1033.