Artificial Neural Networks - Question about hidden layers.

**Atheist** · Jun 2nd, 2009, 08:30 AM

Hey.
Some background:
I have finished implementing an artificial neural network (ANN), with the help of articles, my beloved AI book, and some pre-existing code.
I do not yet know if my implementation is 100% correct, so I am doing tests to see if it is.
I have given the net very simple training data as such:
{5.0, 0.0, 0.0} -> "A"
{0.0, 5.0, 0.0} -> "B"
{0.0, 0.0, 5.0} -> "C"

It seems to be able to learn these patterns alright. The network I used for testing this was a very simple one, 3 input neurons and 3 output neurons. No hidden layers.
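A minimal sketch of that sanity test, for readers who want to reproduce it. The post does not show the implementation, so the sigmoid activation, the delta-rule update, the one-hot targets, and every name here (`train_single_layer`, `classify`) are assumptions for illustration only.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_single_layer(samples, n_in, n_out, rate=0.2, epochs=2000, seed=0):
    """Train a network with no hidden layer using the delta rule (assumed here)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    for _ in range(epochs):
        for inputs, target in samples:
            for j in range(n_out):
                out = sigmoid(sum(w * x for w, x in zip(weights[j], inputs)))
                # Error signal scaled by the sigmoid derivative out * (1 - out).
                delta = (target[j] - out) * out * (1.0 - out)
                for i in range(n_in):
                    weights[j][i] += rate * delta * inputs[i]
    return weights

def classify(weights, inputs, labels):
    """Return the label of the output neuron with the largest activation."""
    outs = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]
    return labels[outs.index(max(outs))]

LABELS = ["A", "B", "C"]
# The three training patterns from the post, with one-hot targets (an assumption).
SAMPLES = [
    ([5.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.0, 5.0, 0.0], [0.0, 1.0, 0.0]),
    ([0.0, 0.0, 5.0], [0.0, 0.0, 1.0]),
]

weights = train_single_layer(SAMPLES, n_in=3, n_out=3)
for inputs, _ in SAMPLES:
    print(inputs, "->", classify(weights, inputs, LABELS))
```

Because each pattern activates exactly one input, every weight is only ever pushed in one direction, which is why this problem is learnable without a hidden layer.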

What I'm trying to do now:
Now I'm trying to train my network on something similar to above. I'm giving it images of every letter in the English alphabet (+ space and 0-9), in an attempt to later have it recognize letters in an image.

The problem:
Given that each image of a letter is 8x10 in size, I'm setting my ANN to have 80 input nodes, one for each pixel. There are 36 different characters that it will train on, so I let the ANN have 36 output nodes.
My problem is: how many hidden layers should I use? And what should the sizes of these hidden layers be? Is there a rule of thumb that one should use?
It's practically impossible for me to "try" a bunch of settings, given the time it actually takes to train these networks. (Or perhaps the slowness is due to my implementation? I know training ANNs is slow, but this is almost unbearable!)

I'm grateful for any tips on this.
Thanks.
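The 80-input, 36-output layout described above could be sketched as follows. This is only an illustration: the post does not say how pixels or targets are encoded, so the `'#'` = ink bitmap format, the one-hot targets, and the choice of A-Z plus 0-9 as the 36 classes (leaving out the space the post also mentions) are all assumptions.

```python
import string

WIDTH, HEIGHT = 8, 10
# Hypothetical class list: 26 letters + 10 digits = 36 output nodes.
CLASSES = string.ascii_uppercase + string.digits

def image_to_inputs(rows):
    """Flatten 10 rows of 8 characters ('#' = ink) into 80 input values."""
    if len(rows) != HEIGHT or any(len(row) != WIDTH for row in rows):
        raise ValueError("expected an 8x10 bitmap")
    return [1.0 if ch == "#" else 0.0 for row in rows for ch in row]

def one_hot(char):
    """Target vector with 1.0 at the position of `char` and 0.0 elsewhere."""
    target = [0.0] * len(CLASSES)
    target[CLASSES.index(char)] = 1.0
    return target

# A crude letter "I" drawn by hand, just to exercise the functions.
letter_i = [
    "..####..",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    "...##...",
    "..####..",
]
inputs = image_to_inputs(letter_i)
target = one_hot("I")
print(len(inputs), len(target))
```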

**Shaggy Hiker** · Jun 2nd, 2009, 09:12 AM

I'd like to see an answer to that, as well. As far as I can determine, there is no solid rule as to how many hidden layers and/or how big they should be.

How painfully slow are you finding the ANN to train?

**Atheist** · Jun 2nd, 2009, 09:27 AM

Well, currently I've just gone with a random setting of 2 hidden layers, the first one 77 in size, the second one 75 in size.

So:
Layer 1: 80
Layer 2: 77
Layer 3: 75
Layer 4: 36

I have set the maximum-error threshold to 1.1, and the learning-rate to 0.2.
As soon as I begin training the net, the current error is 17 (with a lot of decimals).
I have set it to give me an update of the current error every 10th training iteration (conveniently, 10 training iterations take approximately 1 second), and it seems to lower the error by only about 1/1000000 per second. And since my goal is to reach 1.1, and if my math isn't all that lousy:

Error decreases 1/1 000 000 per second. This means that it'll take 1 000 000 seconds to decrease the error by 1. It needs to decrease by approximately 16 to reach 1.1, which would give me 1 000 000*16 = 16 000 000 seconds = 185 days.
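The estimate above can be checked with a few lines of arithmetic. The variable names are only for illustration, and the calculation assumes, as the post does, that the error keeps dropping at a constant rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60

error_drop_per_second = 1 / 1_000_000  # observed rate from the post
error_to_remove = 16                   # roughly 17 down to 1.1

seconds_needed = error_to_remove / error_drop_per_second
days_needed = seconds_needed / SECONDS_PER_DAY

print(f"{seconds_needed:,.0f} seconds is about {days_needed:.0f} days")
```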

185 days of training. I know training an ANN is slow, but that is not reasonable, is it? I can think of two possibilities:
1. An error in my implementation.
2. Wrong settings (hidden layer count and sizes, learning rate, etc.)

What's your take on this, SH?

**namrekka** · Jun 2nd, 2009, 10:31 AM

I used only 1 hidden layer for those 36 different characters. I must admit it was some time ago, and I can't remember all the details. Perhaps I could dig up that old VB6 project of mine.
I followed the guidelines and setup of:
http://www.dspguide.com/pdfbook.htm
Chapter 26 is about those networks. I was amazed by the results.

**Shaggy Hiker** · Jun 2nd, 2009, 10:41 AM

I was going to say that the three stage GA I have, which runs through (popSize * generations +1) * popsize^2 iterations before completing, should be around the upper boundary, though I don't have exact experience with ANN systems, yet. The reason for this is that a GA can be used for the ANN training, and the runtime of that GA would be the time taken to perfect the GA. That time would be three days, and even that is excessive (the answer found without squaring that last popsize factor is as good as the final answer, in general, which means that time to completion could be divided by the popSize, or about 72/100 hours for a popSize of 100). Therefore, I would say that 185 days is decidedly inefficient.

I had a good link to an ANN design that I was going to modify to my needs, but I think I've misplaced it by now. I got sidetracked on some other, expensive projects, and will still be sidetracked for another month or so, but then I'll be back to it.

**Atheist** · Jun 2nd, 2009, 11:03 AM

Thanks for the information guys.
namrekka:
Thank you, that site was really useful. It suggests that most neural networks are designed with only 1 hidden layer that is about 10% smaller than the input layer. I tried that but noticed that, at a couple of iterations, my current Error actually increased after performing the backpropagation. This is a major setback as it is my understanding that the backpropagation should never increase the error at all.

I also read about a bias weight in there, this is something I do not have. Could it make a difference to add it?

SH:
Just to avoid misunderstanding, you're referring to genetic algorithms, right? I'm relatively new in the field of AI so I don't know all the abbreviations yet.
So GA can be used for ANN training? That's interesting. My "professor" at university once told us "if you think training an ANN is slow, just wait until you deal with genetic algorithms".

Anyway I'm going to try implementing the bias node thing now and see if it makes any difference.
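One way to see what a bias weight changes: without it, a sigmoid neuron fed an all-zero input always outputs exactly 0.5, whatever its weights are. The sketch below assumes a sigmoid activation and a bias input of 1.0; neither is stated in the thread at this point, and the function name `neuron` is made up for the example.

```python
import math

def neuron(inputs, weights, bias_weight=0.0, bias_input=1.0):
    """Sigmoid neuron; the bias is an extra weighted input with a fixed value."""
    total = sum(w * x for w, x in zip(weights, inputs))
    total += bias_weight * bias_input
    return 1.0 / (1.0 + math.exp(-total))

blank = [0.0, 0.0, 0.0]
weights = [3.0, -2.0, 7.0]

without_bias = neuron(blank, weights)
with_bias = neuron(blank, weights, bias_weight=-2.0)
print(without_bias, with_bias)
```

With the bias weight the neuron can shift its threshold, so an empty input no longer has to sit at the midpoint of the output range.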

**Atheist** · Jun 2nd, 2009, 12:16 PM

The bias node didn't make any noticeable difference, but it could be a good thing for future usage. I just read that the learning rate value could range from 10E-6 to 0.1; I use 0.2, which I'm guessing might be a little too high. I changed it to use a learning rate of 0.0006 and indeed it does seem to decrease the error at a faster pace now than earlier. Still very slow, but better. The speed at which the error decreases is itself decreasing over time, so I'll have to wait to see if it comes to an unfortunate halt, in which case I'll have to try another learning rate.

The initial question I had in this thread is partially solved. I have learned that there is no definite rule of thumb, but a matter of trying different settings. It is good to know that most neural networks manage with only 1 hidden layer, about 5 to 30% smaller than the input layer.

I'll keep this thread unresolved, in case of further discussions on these ANNs.
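As a side note on why the learning rate matters so much: with a fixed step size, gradient descent is not guaranteed to lower the error on every step, and a step that is too large can make the error grow. The toy sketch below shows this on the one-dimensional error function E(w) = w^2 rather than on a real network; the function `descend` and the two rates are chosen purely for illustration.

```python
def descend(rate, steps=5, w=1.0):
    """Run plain gradient descent on E(w) = w**2 and record the error per step."""
    errors = []
    for _ in range(steps):
        errors.append(w * w)
        gradient = 2.0 * w      # dE/dw
        w -= rate * gradient
    return errors

small_rate_errors = descend(0.1)   # each step multiplies w by 0.8
large_rate_errors = descend(1.1)   # each step multiplies w by -1.2

print("rate 0.1:", small_rate_errors)
print("rate 1.1:", large_rate_errors)
```

The thresholds for a real network depend on the data and the weights, so the specific numbers here say nothing about which rate is right for the character-recognition net.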

**Shaggy Hiker** · Jun 2nd, 2009, 03:23 PM

The site I found was using a GA (which is Genetic Algorithm) to train the ANN. I think the design was something along the lines of setting up the ANN and using the GA to evolve for a set of weights such that the ANN produced the correct results. In that case, the genes in the GA would be the weights in whichever sequence you wanted to try them in. This is a pretty clean design for a GA, and the one I posted in the CodeBank, with the caveats I noted about GA design, could be applied to the problem with relatively minor modification.

The link to that would be here, in case you want to see it.

http://www.vbforums.com/showthread.php?t=553187
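A rough sketch of the idea described above, where each genome is simply the network's weights in a fixed order and fitness is the network's squared error. This is not the CodeBank implementation; the truncation selection, single-point crossover, Gaussian mutation, elitism, and the tiny 3-input, 3-output network without a hidden layer are all assumptions made to keep the example short.

```python
import math
import random

# Three toy patterns with one-hot targets (hypothetical test data).
SAMPLES = [
    ([5.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.0, 5.0, 0.0], [0.0, 1.0, 0.0]),
    ([0.0, 0.0, 5.0], [0.0, 0.0, 1.0]),
]
N_IN, N_OUT = 3, 3

def forward(genome, inputs):
    """Interpret the genome as an N_OUT x N_IN weight matrix, row by row."""
    outs = []
    for j in range(N_OUT):
        total = sum(genome[j * N_IN + i] * inputs[i] for i in range(N_IN))
        outs.append(1.0 / (1.0 + math.exp(-total)))
    return outs

def squared_error(genome):
    return sum((t - o) ** 2
               for inputs, target in SAMPLES
               for t, o in zip(target, forward(genome, inputs)))

def evolve(pop_size=30, generations=40, seed=1):
    rng = random.Random(seed)
    length = N_IN * N_OUT
    population = [[rng.uniform(-0.5, 0.5) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=squared_error)
        history.append(squared_error(population[0]))
        parents = population[: pop_size // 2]   # keep the better half
        children = [population[0][:]]           # elitism: best survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)      # single-point crossover
            child = a[:cut] + b[cut:]
            child = [g + rng.gauss(0.0, 0.1) if rng.random() < 0.2 else g
                     for g in child]            # occasional small mutation
            children.append(child)
        population = children
    population.sort(key=squared_error)
    history.append(squared_error(population[0]))
    return population[0], history

best, history = evolve()
print("error of best genome, first vs last:", history[0], history[-1])
```

Because the best genome is always copied into the next generation, the best error can never get worse from one generation to the next, although nothing guarantees it improves quickly.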

**namrekka** · Jun 3rd, 2009, 02:58 AM

I used a dynamic learning rate value: the bigger the error, the higher that value. Indeed, be careful with those values. The input weights of the neurons can be pushed too far from their values. With more than 1 hidden layer I couldn't get it stable. After ca. 100 training runs with a certain character set it was able to read the alphabet (capitals) of that character set. However, the probability between a "P" and "R", and between an "I", "J" and "1", for example, wasn't that good. But it was good enough to make a decision. Reading another character set was bad, however.
After training for ca. 10000 runs with different character sets it worked amazingly well, even in a test with a new character set the neurons hadn't seen before.

I didn't use a bias. The pixel values in my system were "1" for white and "0" for black. I was thinking of shifting that value with a bias, or applying a bias directly to the pixel value to get +/- 0.5, but I didn't.

EDIT:
After finding the project again and reading the source I did use a bias input node with a value of -1
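The dynamic learning rate mentioned above ("the bigger the error, the higher that value") could look something like the sketch below. The actual formula used in the VB6 project is not given, so the linear scaling, the clamping, and the names and default values of `scale`, `lowest`, and `highest` are all hypothetical.

```python
def dynamic_rate(error, scale=0.01, lowest=1e-5, highest=0.1):
    """Learning rate proportional to the current error, clamped to a safe range."""
    return max(lowest, min(highest, scale * error))

for error in (169.0, 31.0, 2.0, 0.0):
    print(error, "->", dynamic_rate(error))
```

The upper clamp is there for the reason given in the post: a rate that is too high can push the weights too far from useful values.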

**Atheist** · Jun 3rd, 2009, 09:57 AM

A dynamic learning rate value is a good idea, I've adopted that in my implementation, thanks!
100 training runs you say? When I train my net (currently in progress!), after the first training run my squared error is as high as 169. This descends at a rather quick pace until about 80, and from this point on the rate at which the error descends gets slower and slower. Currently, the training has been going for 1 hour and 26 minutes, resulting in approx. 517 000 training runs and a total error of 31. Since it's getting exponentially slower as time progresses, I fear reaching a total error of at least 1 is going to be about as slow as earlier.

I must have set something wrong in order to get this kind of result...if you managed to train your net in just 100 training runs, it would seem rather silly for me to do 500 000 runs and still have a total error of 31.

**Atheist** · Jun 3rd, 2009, 11:11 AM

Slightly better news.
I have made many changes to my neural network since it was implemented to try to successfully train it on these images of characters. But none of the changes seems to have made much difference...until I removed the hidden layer. Now, all of a sudden, using only 2 layers and a combination of the settings I've set (I have tried 2 layers before to no avail), it completes the training after only 2 seconds of training.
I immediately tested it on the following letters: R O I E C A W
It was 100% accurate in recognizing the letters.
However I still wish to make this work with 1 hidden layer. All attempts I've made have had to be canceled due to taking too much time (basically begins crawling around a total error of 18).

**namrekka** · Jun 4th, 2009, 05:27 AM

Here are some details of what I did 3 years ago:
[Attached image: ocr.jpg]

The char generator produced characters (A..Z) of a certain font in a picturebox. This was copied to another picturebox of 16*10 pixels with a 1 pixel shift in each direction, so 9 times in total (8 shifts and one with no shift). I used 188 passive input nodes (no weights) for the picturebox and 1 for a bias with value -1. These nodes produced 1 for a white pixel and 0 for a black pixel.
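The nine shifted copies described above could be generated along these lines. The original was done with VB6 pictureboxes, so this is only an approximation: it assumes the bitmap is a list of rows with 1 for white and 0 for black, that pixels shifted off the edge are dropped, and that vacated pixels become white. The names `shift` and `nine_shifts` are made up for the example.

```python
def shift(bitmap, dx, dy, fill=1):
    """Return a copy of the bitmap moved dx columns right and dy rows down."""
    height, width = len(bitmap), len(bitmap[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_y, new_x = y + dy, x + dx
            if 0 <= new_y < height and 0 <= new_x < width:
                shifted[new_y][new_x] = bitmap[y][x]
    return shifted

def nine_shifts(bitmap):
    """The unshifted bitmap plus its 8 one-pixel shifts, in row-major order."""
    return [shift(bitmap, dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

# Tiny 3x3 example with a single black pixel in the centre.
dot = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
variants = nine_shifts(dot)
print(len(variants))
```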
I used 51 hidden layer nodes and 27 output nodes: 26 for A...Z and 1 unused, for observation. The outputs had values between 0 and 1, where 0 = 100% fail and 1 = 100% OK.
At first startup all weights are filled with random values between -0.5 and +0.5.
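The layer sizes and weight initialisation described above can be sketched like this. The sigmoid activation is an assumption (it matches outputs between 0 and 1 but is not stated), as is the choice to feed the bias only into the hidden layer; all function names are illustrative.

```python
import math
import random

N_PIXELS, N_HIDDEN, N_OUTPUT = 188, 51, 27   # sizes quoted in the post
BIAS_INPUT = -1.0                            # bias node value from the post

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_weights(n_from, n_to, rng):
    """One row of weights per receiving neuron, uniform in [-0.5, +0.5]."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_from)] for _ in range(n_to)]

def layer(inputs, weights):
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(pixels, w_hidden, w_output):
    """Pixels plus the bias input feed the hidden layer, which feeds the outputs."""
    hidden = layer(pixels + [BIAS_INPUT], w_hidden)
    return layer(hidden, w_output)

rng = random.Random(0)
w_hidden = init_weights(N_PIXELS + 1, N_HIDDEN, rng)
w_output = init_weights(N_HIDDEN, N_OUTPUT, rng)

all_white = [1.0] * N_PIXELS                 # 1 = white pixel, as in the post
outputs = forward(all_white, w_hidden, w_output)
print(len(outputs), min(outputs), max(outputs))
```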
