The Knapsack Problem, An Evolutionary Approach

**Shaggy Hiker** · Jan 10th, 2009, 06:03 PM

There is a general class of problems known as the Knapsack Problem, described in detail at this link:

http://en.wikipedia.org/wiki/Knapsack_problem

This problem comes up in various guises in questions asked on this forum about once a year or so. In this thread:

http://www.vbforums.com/showthread.p...ckpack+problem

the question came in the form of filling orders from stock material, which is a somewhat simpler subset of the general problem, but still inordinately difficult for all but trivial cases. I decided to modify an existing genetic algorithm to solve this particular problem, in hopes that it would serve as a basis for others to work with. The project took about six hours, but that was in two parts separated by about three months. All of the files needed for the project are attached here, though the final exe is not included in the zip file, nor is the sln file, so a person would have to create their own project and include the files. All of the code was written in VB.NET2005.

There are a few things I didn't really like about the code, such as the graphics button. The project that I modified for this project had a Hollywood-style interface (lots of colorful lines waving around on the screen) that I had written so that I could have a backdrop for a presentation. That graphical interface made no sense in the current project, so I just put a couple colored panels on the screen, and it really didn't work very well. There are a few things about the output that are less than ideal, as well, including a couple forms leftover from the first project that have meaning, but are of little use to most people.

Having said that, the purpose of posting the code is not for that interface, but because the general design could be useful to somebody who wants to use an evolutionary approach to solving a difficult problem. In particular, the code includes a few features that I have found to be highly useful for genetic algorithms (GA) in general.

The code is divided into three main classes: Genes, Genomes, and Population. The Population class holds a list of genomes, while each genome is primarily a list of genes. The most difficult part of a GA is figuring out what constitutes a gene. In general, a gene seems like it should be the smallest item, but the only real key is that a gene needs to be an item such that there are a bunch of them forming a genome. In this program, there would be a set of stock items of various lengths, and an order, which would consist of N1 pieces of length L1, N2 pieces of length L2, N3 of length L3, etc. One possible gene would have been N, so a gene could have a number of items needed, and a length. I rejected that idea, because the goal of the program was to combine cuts to minimize waste. If Nx was a gene, then the only options for minimizing lengths would be to find combinations of lengths that were less than a stock length. That could work, but it would leave out the possibility that two items of Nx could come from a single piece of stock. Every Nx would have to be combined with an Ny, because there would only be one Nx in every genome. Therefore, I settled on each single piece in the order being a gene. A piece had a length as it's only characteristic, and there were N1 pieces of length L1, so N1 genes with a length of L1.

However, if I just listed these genes in order, there would be no variation from one genome to the next, and since the length could never change for any one item, there was no mutation that was possible. I realized quickly that I would need to be able to mix up the order of the genes, such that gene 0 might need length L2, then gene 1 need length L5, etc. I also realized that for every piece, there were a couple of decisions to make as to which piece of stock to cut the item from. For example, I could cut the piece from the smallest stock item that was longer than the length needed, but if two items could be cut from a single, longer, piece of stock, then that would be more efficient than cutting from the smallest stock available. Furthermore, I had to keep track of the scrap, so that I could cut items from any scrap that was large enough.

To address those two issues I added a starting point, a stock direction, and a scrap direction. The starting point was the point in the stock list to begin looking for a suitable piece of stock. This would be randomized for new genes. From that point, the stock direction would tell the gene whether to look up the list, or down the list, for suitably sized stock items. The scrap direction did the same for the scrap list, which is built up over time as stock items are cut.

Mixing up the order of the cuts proved to be far more difficult. At first, I thought that I could simply scramble the genes within the genome, but during the sex routine, the parental genomes line up, and a few genes are taken from parent 1, then a few are taken from parent 2, and so on. This means that all N1 items of length L1 need to be in exactly the same place in each genome, or crossing over will mean that there will be too many or too few items of length L1 in the offspring. I solved this by giving each gene an Evaluation Order member, which was just a random number between 1 and 100 million. When it came time to evaluate the genome, the genes are moved to a temporary list, and sorted by this evaluation order member, which will effectively scramble them as far as the order of cuts is performed, even though they will remain in the proper order within the genome.

The result of these items is that a mutation could:
1) Flip the Stock Direction
2) Flip the Scrap Direction
3) Change the starting point within the stock or scrap pile.
4) Change the evaluation order to something totally different, which will change the order in which the cut for that particular gene is made.

Four different kinds of mutation is nice, especially since the first two types of mutations are just flipping booleans to one value or another.

Mutations play an odd roll in evolution. For one thing, mutation can introduce completely novel genes into the population, by creating a gene that doesn't otherwise exist in the population. If that gene is beneficial, the mutation will be passed on. The other thing that mutations can do is liven up a stagnant population. Over time, bad genes tend to get weeded out of the population, which can make the population rather homogenous. In fact, with so little variation between genes as exist in this case, I found that genomes would become exact clones at an alarming rate. Therefore, I opted to check for clones in the population, and as the number of clones increased, I increased the mutation rate. This meant that as the population became more homogenized, the rate of mutation increased to shake things up a bit more.

The cleanest way to handle mutation is to let the gene alter themselves, but that can be risky. If the mutation rate is set at a fixed rate, such as 0.1%, then on average one in every thousand genes could mutate. However, for this program, the length of a genome can vary depending on the order size, so a fixed rate might mean that there is one mutation every generation, or one mutation in every reproduction event. The first may be too small, while the second is almost certainly too large, as no offspring would truly be the sum of their parents. Therefore, I set the mutation rate based not on the genes, but on the reproduction event, such that mutations are handled at the genome level (a genome is told to mutate) rather than the individual gene level. That's nothing but a matter of taste. The only key is that the mutation rate be greater than zero, but not so high that the offspring don't resemble their parents.

The next section will discuss the multi-tier approach to the GA.

**Shaggy Hiker** · Jan 10th, 2009, 06:22 PM

To start a GA running, the first step is to create a random starting population. Of course, a randomly created population is going to be pretty bad, and quite often we would be able to look at individuals and KNOW that they sucked. Wouldn't it be better if the initial population was not random, but only included individuals that were better than average? How would we know that the individuals were better than average?

As it turns out, the GA is a tool for finding individuals that are better than average, and that's the basis for the two tiered GA here. The way the program runs, a random population is created, that population is evolved for the designated number of generations (the default value is 100, in this program), then the winner of that round is added to a winners population. These steps are repeated with new, random, populations, until enough winners have been added to the winners population to make up the full populations size (default population size is also 100, in this program). The winners group is now a population of individuals, all drawn from different random starting populations, who are significantly better than average. The winners group is then evolved, and the winner of the winners is the best solution to the problem.

In other GA's, though not in this one, I determined that the winner of the winners was always considerably better than any of the winners themselves. I have not taken the time to confirm those results for this program, but I would expect that it holds true. Of course, this leads to the idea that a population of winners of winners could be formed and evolved to create the winner of the winners of the winners, and so forth. My previous experience has suggested that this third tier is unlikely to produce significantly superior results. The winner of the winners of the winners is often no better than the best winner of winners. Furthermore, every tier takes far longer to run. If tier 1 takes time T1, then T2 = ((T1 * population size) + T1). For the parent of this program, T1 was so long that T2 generally was between 48 and 72 hours of continuous runtime. For the current program, T2 for a population size of 100 would be only about 8 hours, but since there would be little gained in that time, why bother?

There is a name in biology for what is happening with these two tiers: Hybrid vigor. If members of two separate populations breed, the offspring is often superior to either of the parents populations. There is also a phenomena called Outbreeding Depression that acts in the exact opposite fashion, but I feel that the mechanism causing Outbreeding Depression is not so likely to affect software GAs.

An alternative to the multi-tier approach to hybrid vigor would be to have multiple populations evolving simultaneously on multiple computers, or multiple cores within the same computer, and periodically take the best member or two from different populations, and mix them into other populations. This can't happen very often, or else the two populations will be effectively a single population, but if this is done every 100 generations or so, it should allow each population to evolve in isolation enough that the hybrid vigor effect should take place. I don't know the exact number of generation needed for this to work, as I prefer not to tie down more than one computer at a time on a single problem.

So there it is, the code, and whatever things I can think of to say about it.

**NickThissen** · Jan 10th, 2009, 06:22 PM

Interesting, I'm going to check this out. I did a fairly simple GA project (in VB6) a few years back for school, back when I knew nearly nothing about programming. I did got it to work fairly well in the end, but I know now that it could have been written soooo much better... I might try to rewrite it now!

**Shaggy Hiker** · Mar 25th, 2010, 07:31 PM

Just updated this with a 2008 version that includes a few files missing in the original upload.

Thread: The Knapsack Problem, An Evolutionary Approach

Thread Tools

Display

The Knapsack Problem, An Evolutionary Approach

Re: The Knapsack Problem, An Evolutionary Approach

Re: The Knapsack Problem, An Evolutionary Approach

Re: The Knapsack Problem, An Evolutionary Approach

Posting Permissions