Tuesday, October 7, 2008

Working through Genetic Programming

> import Random
> import Control.Monad
> import Control.Monad.Random
> import Data.Ratio
> import Data.List (sortBy)

So I really did accomplish my reading of the first three chapters of Koza's first book on Genetic Programming the week I said I would, but I sat on this post for a good bit afterwards. There wasn't a lot of meat in those initial chapters except for a few salient points:

  1. Genetic programming can give you good answer if you're willing to accept that the answer is not 'correct', merely accurate and if you're willing to lay aside preconceived notions of beauty in your solutions.
  2. Due to the schema theorem, a genetic recombination/fitness based search can effectively cover a large swath of the solution space with a small population.

The majority of Ch. 3 of the book is devoted to Old School genetic algorithms and the schema theorem. I'm not going to bother going through fully fledged examples of the schema theorem, but I think the point can be understood fairly intuitively: when you proportionally select the fittest specimen to be in the mating pool of each generation, you are making judgements about the fitness of patterns of genes and thus gather information about not just the specimen you have but all the possible specimen that have overlapping genetic structure.

What I want to do in the rest of this post is just write my own tiny, not terribly well-featured, genetic algorithm implementation. I'm hoping that this will provide not only more practice for me, but also be instructive to others. I base the implementation more upon the heuristic description that Koza gives than on anything else and make no claim that this is any kind of optimal construction.

First, since I hope to express my code at least somewhat generically between genetic algorithms and genetic programs, I'm going to define a typeclass to capture the basic behavior required of a "genetic" structure. Also, it is worth noting that in this post I'm going to make gratuitious use of the Control.Monad.Random library just because it feels somehow appropriate to be doing these calculations in a monad.

> class Genome a where
> mutation :: MonadRandom m => a -> m a
> recombination :: MonadRandom m => a -> a -> m (a,a)

Oddly enough, that didn't seem too terrible. We have no notion of how often something like mutation in the typeclass, because that should be controlled at the level of the actual implementation.

Now, though, we'll define an instance of this class for lists as that's probably the easiest thing to do. I do, however, make the implicit assumption that in a given simulation all the specimen are of the same length.

> instance Random a => Genome [a] where
> mutation xs = do
> i <- getRandomR (0,length xs - 1)
> x' <- getRandom
> return $ replaceAt i xs x'
> recombination xs xs' = do
> i <- getRandomR (1,length xs - 2)
> let (xs1,xs2) = splitAt i xs
> (xs1',xs2') = splitAt i xs'
> return (xs1++xs2',xs1'++xs2)
> replaceAt i xs x = xs1++[x]++(drop 1 xs2)
> where (xs1,xs2) = splitAt i xs

At this point we can start putting together the real framework for a genetic algorithm run. The first, and relatively easiest function to compute will be the one to choose the mating pool for the next generation given a fitness function and a list of specimen. I assume a constant population size for the course of the simulation, so the outgoing list will have just as many members as the in going list. We also do our selection based entirely upon the relative fitness between the specimen of the population, an idiom captured quite well in the fromList function in the Control.Monad.Random library.

> chooseMatingPool :: MonadRandom m => (a -> Integer) -> [a] -> m [a]
> chooseMatingPool eval pop = replicateM l $ fromList weighted
> where weighted = map (\p -> (p, eval p % 1)) pop
> l = length pop

Now I'll admit that I'm limiting the fitness function a little by requiring it to return Integer values, but I honestly feel like that's a generality that I'm willing to lose for the purposes of this post and for the convenience of using fromList.

Next will be the handler that sexually recombines the mating pool. I cheat a little here and presume that since the order of the mating pool was selected randomly that I don't need to shuffle them any more when I perform the selection and thus I can just take pairs of specimen, breed them, then put them back in the list. If there's an odd man out, or as I like to call him "Creighton in High School", then we don't breed him with anyone and just put him back in the pool.

The function to implement mutation in the population is also extremely trivial so I'll throw it in below without much ado.

> breed :: (MonadRandom m, Genome a) => [a] -> m [a]
> breed [] = return []
> breed (x:[]) = return [x]
> breed (x:y:xs) = do
> (x',y') <- recombination x y
> xs' <- breed xs
> return $ x':y':xs'
> mutate :: (MonadRandom m, Genome a) => Double -> [a] -> m [a]
> mutate chance pop = mapM (mutate' chance) pop
> where mutate' c a = do
> result <- getRandomR (0,1)
> if result < c then mutation a else return a

Alright, so the next minor hurdle is going to be generating the initial population we need. Now, my cheating with lists has painted us into a slight corner where we can't be as generic as we'd like. The problem is that there are two conflicting imperatives at work here. First, that lists are a very convenient way to deal with sets of genes. Second, that the lists in a single simulation must all be of the same size. The result is that I can't write any generic instance such as

instance (Random a) => Random [a] where

useful in all instances, since I want to be able to use different size lists, or arrays for that matter, for different simulations. What would be the slickest solution? Depedently typed arrays with the size as a part of the type. However, in a world without dependent types we can still write something that works even if the lack of clean semantics annoys me.

In any case we can now start wrapping the whole thing up a bit more into larger control structures. I think these are fairly straight forward, but the main function is runSimulation. It takes in the number of runs to execute, the performance threshold for declaring an early end, the probability of mutation, the evaluation function, and the initial population. That's a lot of parameters!

> randomList :: (Random a,MonadRandom m) => Int -> m [a]
> randomList i = liftM (take i) getRandoms
> initialListPopulation :: (Random a, MonadRandom m) => Int -> Int -> m [[a]]
> initialListPopulation s l = sequence $ replicate s $ randomList l
> simulationStep :: (MonadRandom m, Genome a) => Double -> (a -> Integer) -> [a] -> m [a]
> simulationStep mut_chance eval pop = chooseMatingPool eval pop >>= breed >>= mutate mut_chance
> maxInPopulation :: (a -> Integer) -> [a] -> (a,Integer)
> maxInPopulation f = head . reverse . sortBy (\(_,w) (_,w') -> compare w w') . map (\p -> (p,f p))
> runSimulation :: (MonadRandom m, Genome a) => Int -> Integer -> Double -> (a -> Integer) -> [a] -> m a
> runSimulation 0 _ _ eval pop = return $ fst $ maxInPopulation eval pop
> runSimulation runs threshold mut_chance eval pop = do
> if snd (maxInPopulation eval pop) >= threshold
> then return $ fst (maxInPopulation eval pop)
> else do
> newpop <- simulationStep mut_chance eval pop
> let finalpop = takeBest eval pop newpop
> runSimulation (runs-1) threshold mut_chance eval finalpop
> takeBest :: (a -> Integer) -> [a] -> [a] -> [a]
> takeBest eval xs xs' = take (length xs) . reverse . sortBy (\p p' -> compare (eval p) (eval p')) $ xs++xs'

Now we have enough framework in place that we can introduce an actual, executable, example. In accordance with a tutorial I saw once, we'll evolve the string of all 'True's as a quick test.

> evalOnes :: [Bool] -> Integer
> evalOnes = foldr (\b s -> let v=if b then 1 else 0 in v+s) 0
> main = do
> p <- initialListPopulation 50 20
> final <- runSimulation 200 20 0.01 evalOnes p
> print final

I've done some preliminary testing and found that for this size of problem it still quite often finds the list of 20 Trues within the 200 generations. I'm sure if I played a bit more with how the population was chosen I could get it to be much more optimal, but I plan to worry more about that when I tweak my code to accommodate monotypic genetic programming instead of just fixed length genetic algorithms.

Random Thoughts and Idle Speculation

So first off, I found this code extremely easy to write. Now that I'm pretty much using Haskell for all of my personal projects and toys, I've come to appreciate just how effortless writing code starts to feel. It's just a language that stays out of my way when I'm trying to figure out a problem, and that's about the best compliment I think I can give a tool.

Also, as I've continued reading Koza's book I'm starting to wonder about what exactly makes genetic programming so much more powerful than straight up genetic algorithsm. I'm guessing it really comes down to the richness and varying size of the structure being evolved rather than the fact that you're evolving a parse tree. There's some ideas tickling at the back of my head about what about qualities a data structure, a functor, needs to satisify to be "evolvable". Hopefully I'll be able to clarify some thoughts about that as I continue reading the book.


Antoine said...

You seem to be taking it as a given that the reader understands the difference between "genetic algorithms" and "genetic programming."

Off to Wikipedia with me!

Unknown said...

There's something I don't quite get: breed doesn't seem to take into account the relative fitness of both individuals being mated; furthermore, the offspring do not seem to be included in the resulting population… Am I missing something here?

Creighton Hogg said...

First, thanks for catching the typo in the function definition: should have been x' & y' not x & y in the return statement.

Also, breed should be taking in the "mating pool" population, which has already been randomly generated roulette-wheel style from the population of that simulation step. I didn't think I needed to do anything more than perform the recombination on pairs after that. I'm still learning this, so if I'm wrong please let me know.

Christian said...

Thanks for posting your experiences with haskell in the field of evolutionary algorithms