Questo sito utilizza cookie di terze parti per raccogliere informazioni statistiche.
scorrendo questa pagina, cliccando su un link o proseguendo la navigazione in altra maniera, acconsenti all' uso dei cookie.

Genetic Algorithm. Introduction

 

 

1 Genetic algortihm and evolutionary algorithm
The genetic algorithms (GA) constitutes a sub-set of the Evolutionary Algorithms, generic term that indicates a range of systems of resoultion of problems based on the use of computer similar to the evolutionary processes. Apart from the genetic algorithms, they include the Evolutionary Programming, the Evolutionary Strategies, the Classifying Systems and the Genetic Programming.
In general, the algorithms used in the disciplines of Artificial Intelligence work on the research of a global minimum or maximum in a finite space on the basis of bounds on the space of the solutions. From a formal point of view we can state that, given an element X belonging to a cartesian space D (in the case in which n is the cardinality of D, then X is a vector), and given a function called objective function, then the research of the global optimum is the research of an X* that maximises such function, that is e .


Factors as the presence of more points of local maximum, bounds on the domain D, the non-linearity, can make the research very difficult, so the problem is not solvable in an acceptable time. Then we use algorithms of an euristic type that, even solving the problem with degrees of uncertanties and not assuring the convergence of the search of the solution only in some cases, require much less time of convergence.
From here the distinction between "strong" and "weak" methods. The formers are oriented to the soltution of a specific problem, on the basis of the knowledge of the particular domain and of the inner representation of the system under examination. The good solutions obtained are hardly adaptable to other tasks and with unsatisfying results. The weak methods use less knowledge of the domain, they are not oriented to a specific target and solve a wide range of problems. The evolutionary algorithms are algorithms of euristic research, considered weak methods. However, the new typology of the weak evolutionary methods has been recently introduced, methods that have initially little knowledge of the domain but that during their evolution acquire a greater awareness of the problem, implementing some characteristics of the strong methods ("emerging intelligence").

 

 

 

2 The genetic algorithms
Between the end of the 50’es and the beginning of the 60’es the researchers in the field of evolutionary computation began taking interest in the natural systems convinced that they could build a model for the new algorithms of optimisation. In this optics, the mechanisms of the evolution can be adapted in order to face some of the most interesting computational problem, related to the search for the solution among an enormous number of alternatives. For insance, to solve the problem of design of the proteins with the help of the computer it is necessary to build an algorithm that locates a protein with certain characeristics among a very high number of possible sequences of aminoacids. Similarly, we can search a group of rules, or equations, that allow to foresee the behaviour of the financial markets. Algorithms of this type have to be adaptive, "interact" with a changing environment.

 

From this point of view the organisms can be consdidered as very good problem solvers, as they are able to survive in their environment, they develope behaviours and skills that are the result of the natural evolution. The biologic evolution is similar to a method of research inside a very big number of solutions, constituted by the set of all the genetic sequences, whose results, the desired solutions, are highly adapted organisms, with a strong capacity of survival and of reprodution in a changeable environment, that will transmit to the future generations their genetic material. Essentially, the evolution of a species is ruled thus, by two fondamental processes: the natural selection and the sexual reproduction, the latter determines the re-combination of the genetic material of the parents generating an evolution much more rapid the one that might be obtained if all the descendants contained simply a copy of genes of one parent, modified randomly by a mutation. It is a process with a high degree of parallelism: it does not work on a species at the time, but it tries and changes milions of species in parallel.

 

In short, a genetic algorithm (AG) is an iterative algorithm that operates on a population of individuals that encode the possible solutions of a given problem. The individuals are evaluated through a function that measures the capacity to solve the problem and identifies the most suitable to the reproduction. The new population evolves on the basis of random operators, inspired to the sexual reproduction and to the mutation. The complete cycle is repeated until reaching a given criteria of stop. The use of these algorithms is essentially linked to the programming of the artificial intelligence in robotics, to the biocomputation, to the study of the evolution of the parallel cell systems, to particular problems of management and systems of optimization in engineering.
The GA have thus these strong points:

  1. Possibility to solve complex problems without knowing the precise method of solution,
  2. capacity of auto-modification on the basis of the mutation of the problem,
  3. capacity of simulating some phenomena given a structure and operative modalitites isomorphic with those of the biological evolution.

The first attempts of designing instruments of optimisation, the Evolutionary Strategies of Rechemberg and the Evolutionary Planning of Fogel Owens and Walsh, did not produce interesting results as the tests of biology of the first 60’ies highlitened the operator of the mutation, rather than the redroductive process for the generation of new genes. In the mid 60ies John Holland’s proposal marked a meaningful progress, whose Genetic Algorithms underlined, for the first time, the importance of the sexual reproduction.
In some applications, the GA find good solutions in reasonable times. In others, they might take days, months or even years to find an acceptable solution. But since they work with populations of independent solutions, it is possible to distribute the computational load on more camputers, that produce simultaneously different hypothesis with the consequent reduction of the calculation time .

 

 

 

3 Natural evolution and artificial evolution. Terminology

 

3.1 The natural evolution
The modalities of action of the Darwinian principle of the natural selection can be summarised as follows:

  • The natural evolution acts on the genetic material (genotype) of an individual and not on his physical characteristics, the fenotype. Each variation that promotes the adaptation of an individual emerges from the genetic property, not from what the parents have learnt in their life.
  • The natural selection favours the reproduction of the individuals that improve the adaptability to the changing environment and eliminates the individuals with less reproductive potentials. From the genetic point of view, the natural selection promotes those particular genetic combinations that give birth to a more efficient organism, selecting the genotype, not the fenotype.
  • The reproduction is the central core of the evolutionary process: the generational variability of a species is determined by the genic recombination and by the little random mutations of the genetic code. The differences between individual and parents are established. The variability is an essential condition of the evolution.
  • Natural evolution operates over entire populations through cyclic and generational processes and determined exclusively by the environment and by the interactions among the various organisms.

 

The terminology used draws inspiration directly from the studies on the natural biological evoultion.
The combination of the Darwinian hypothesis with genetics gave birth to principles that constitutes the basis of the genetics of the populations, the explanation of the evolution at a genetic level of the populations.
A population is defined as a group of individuals of the same species, that operate and cross in the same place.
In biology the cromosomes are the filaments of DNA that act as project for the organism. Each cromosome is composed by genes, each of which encodes a particular protein, that in its turn determines the specific characteristics of the organism, like, in example the colour of the eyes. The positions of the genes inside the cromosome are said locus and the different configurations of the proteins are said allels. Most of the organisms have more than a cromosome, whose set is said genome. With genotype is meant the set of the genes of the genome. The final risult of the fetal evolution, the individual, is said fenotype.

 

The sexual reproduction consists in the recombination (crossover) of the genetic material of the parents that produces a new complete genetic material for the progeny. Mutations on single parts of DNA may verify. The fitness is the suitability of the individual, the probability that he lives enough to reproduce. The natural selection promotes the individuals that have the more suitable fenotypes - encoded from particular genotypes- as parents for the next generation. It can be directional, in case it helps the increase of frequence of an extreme form of the character, stabilising if it helps the individuals carrying an intermediate form of a certain character and diverging if they the extreme forms of a character are favoured at the expense of the intermediate ones.
The evolution is based on the following mechanisms:

  • Mutation of allel: primary source of genetic variability;
  • Genic Flux: variation of the frequences of the allels, due to the migratory movements of some individuals, with a consequent introduction or removal of certain genotypes.
  • Genetic Drift: unpredictable variations in the frequence of the allels in case a population have a small number of components. Actually, from a probabilisitc point of view it is easier that events that are less probable happen in a small population with bigger effects.

 

3.2 The artificial evolution
In the terminology of the Genetic Algoritms the cromosome encodes a candidate solution of a given research problem. In Holland’s model with a binary encoding, the cromosome identifies a string of bits, the genes are the bits of the string and the allels, as property of the genes, they can be 1 or 0. The crossover, is the recombination of the genetic material of two parents composed by a single cromosome and the mutation in the random variation of the value of the allels in each locus of the cromosome. The fenotype is the meaning of the cromosome, that is the decoding of the candidate solution of the problem. In the common applications the individuals at a single cromosome are used, thus, in terms of genotype, cromosome and individual are equivalent. In the cases in which the coding of the cromosome represents directly a candidate solution, as in some applications in which the cromosome is a string of real numbers rather than the bits, the terms genotype and fenotype can coincide too.

 

 

4 Holland's model
Stating that the Genetic Algorithms are adaptive complex procedures, finalized to the resolution of research and optimization problems is equivalent to stating that they are procedures searching for the maximum point of a certain function, when this function is too complex to be rapidly maximised with analytical tecniques and a procedure that explores randomly the space of the solutions is inconceivable. The GA selects the best solutions and recombines them with different modalities so that they evolve towards a maximum point. The function to be maximised is said fitness. The term has many aspects: it may means "adaptation", "adaptability", "biologic success”, "competition".

The original model by Holland is based on a population of n strings of bits with a fixed lenght l (n, l Î N), generated randomly. The set of the binary strings with a lenght l has 2l elements and represents the space of the solutions of the problems. Each string (genotype) is the binary coding of a candidate solution (fenotype). In general, the function of fitness is given in the following form:

Through this function, at each genotype gi of the initial population P(t=0) is associated a value Fi=F(gi) which represents the capacity of the individual to solve the given problem. In order to determine the value of adaptability, the function of fitness receives in input a genotype, it decodes it into the correspondent fenotype and tests it on the given problem. Once the phase of evaluation of the individuals of the initial population is concluded, a new population P(t+1) of new n candidate solutions obtained applying the operators of selection, crossover, mutation and inversion is generated.

 

4.1 Selection
Within a population, a probability of selection linked to the fitness is associated to each individual. The selection of operator generates a random number c Î (0,1) which determines which individual will be chosen. The chosen individual is copied in the so called mating pool. The mating pool is filled with n copies of the selected individuals, at the time P(t=0). The new population P(t+1) is obtained through the operator of crossover, mutation and inversion. The operator of selection, chosing the individuals that have the possibility to generate descendants with a high fitness, plays, in the context of the genetic algorithm the role of the natural selection for the living organismsi

 

4.2 Crossover
Within the mating pool two individuals, called parents, and a point of cut, called point of crossover, are randomly chosen. The portions of genotype on the right of the crossover point are exchaned generating two descendants, as in the picture

 

The so called operator of crossover one point is applied n/2 times to obtain n descendants on the basis of a given probability p. In the case in which the crossover is not applied, the descendants coincide with the parents.
An other technique used is the two points crossover: the individuals are not represented by linear strings, but by circles. A portion of circle of an individual is substituted with tath of another one, selecting two points of crossover.
The uniform crossover states that for each couple of parents a binary string of the same lenght called mask is generated. The descendant is generated copying the bit of the father or the one of the mother if the corresponding position of the mask is respectively 0 or 1.
The crossover is a metaphor of the sexual reproduction in which the genetic material of the descendants is a combination of the one of the parents.

 

4.3 Mutation
This operator is inspired to the rare variation of elements of the genome of the living creatures during the evoultion. On the basis of a little probability p, the value of the bits of each individual is changed (from 0 to 1 and viceversa)

 



like in nature, the mutation adds a "noise" or a certain randomness to the whole procedure, so to assure that starting from a population generated randomly there are no points of a space of the solutions that are not explored.

 

4.4 Inversion
On the basis of a fixed probability p, two points in the string are chosen. The string encodes the individual and inverts the bits between two positions.

 



On an initial large population it is difficult to estimate which values of the probability of crossover and of the probability of mutation will give the best performances. The experience shows that there is a strong dependence from the type of problem. Generally the probability of crossover is between 60 and 80%, while the one of the mutation fluctuates between 0,1-1%. If the probabilities that an individual is selected for the reproduction are proportional to its fitness (if f is the value of fitness of a solution and F the sum of the values of fitness of all the population, the probability might be f/F), it is probable that, after the crossover, the best individuals are recombined, with the consequent loss of the best cromosome. In order to avoid this and to speed up the convergence times the best individual of a generation may be cloned. Through this technique, called elitism, keeping a high number of populations, it is possible to clone more individuals in the next generation, while for the others we proceed in the calssical way.

 

 

5 Types of encofing
In general, it is possible to encode the solutions of a problem with binary strings. In other cases, representation of higher level are used and operators of crossover and mutation able to operate on such representations are defined. An other type of encoding used, besides the binary one, is the one based on the real numbers.
The ebinary encoding is important not only from the historical point of view, but also because the more relevant theorical results have been obtained with models based on it. The structure of the data is a vector of bit with a lenght l, which associated a space of 2l possible solutions. We need to define a function that encodes the genotype, or parts, in one or more of real values. The most used operator of crossover is, in this case, the crossover n points (n points of cut).
The encoding based on floating point numbers is the most natural for problems of optimisations of real parameters. The structure of the data is a vector of lenght l where each element is a real number. Each candidiate solution is a point in the space of the research and it is not necessary to foresee functions of decoding of the genotype. The operator of crossover can be the classic one point of cut, but for the mutation operator, in order to alter the genes of the individuals, the elements of another vector are added.

 

 

 

 

 

 

 



Semeion Centro Ricerche di Scienza della Comunicazione
Via Sersale 117, 00128 Roma - Tel. 0650652350, Fax 065060064 - C.F. 06911530589 P.IVA 01644611004

Joomla templates by a4joomla