When adding nodes to an already spanned tree, do I need to re-calculate the entire tree from scratch, or can I just locally re-structure it?
To illustrate
I want to add both the green and black nodes. The green one seems straightforward because it doesn't affect the local structure much. But the black node's insertion would disrupt it quite a lot.
Now I have a theory...
I draw a new Node (P) on the graph. Its 6 closest neighbouring nodes are the only ones potentially affected by the new node P.
Is that theory viable? I'm not sure how to go about proving/disproving it and I wondered if anyone had done it already.
(I came by the number 6 because 6 is the maximum number of connections a single node can have. On a 2D plane it is impossible for a node to be connected to 7 or more other nodes. 6 is possible but only by chance and the order in which the nodes are processed. 5 is rare, 3 and 4 are common)
I won't be able to help you but I'm curious, what does this diagram mean? I've never seen this type of diagramatical representation of anything before.
Imagine you have a list of (X,Y) coordinates called nodes (which could represent telephone systems for example) you need to have each node connected to the group as a whole. But the problem is that telephone cable is very expensive so you have to use as little of it as possible. There must be no circular loops in the finished spanned tree, as this would indicate a waste of cable.
The algorithm itself is quite fascinating.
Among other things it can serve as a rudimentary starting point for the Travelling Salesman problem. But it's mainly used in telephone and computing networks to connect things cheaply together.
Anyway this kind of thing is a LOT more useful than it might look. Its good programming practice too linked lists, binary searches, line intersection the list goes on.
Well, you know all the co-ordinates so can't you simply look for the closest node and draw the line and add the length of that line to the total length?
I'll try making this program on my calc, I'm not good enough at VB.
your theory doesn't hold, imagine the six closest nodes are to the left of the black node, and the seventh closest node to the right of the black node, and it is connected to one of them on the left side.
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
Acidic: Well, not always (that would work when P is on the outskirts of the pattern).
lets look at the black node again...
I'd need to delete the 2 blue dashed lines and add the 3 green ones. So a local re-plotting is required.
My concern is that such a replot might interfere with other localities and ultimately lead to a complete re-calculation of the whole structure in certain cases.
Kedaman: Damn, hadn't considered that. Good point.
So do you think I'll have to re-calc after add each batch of new nodes, in all cases?
Last edited by wossname; Dec 13th, 2003 at 08:21 AM.
if it is not necessary then your theory is flawed.
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
A total recalc might be needed, but I'm certain that you can localize it somehow for most cases.. although I'm not sure if there is a general solution..
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
You should go a bit further with the theory: you need to recalculate the closest nodes connections as well. Actually, recalculate each node which will have changes in its connections. However... that's already a lot of calculations.
This is interesting... a thing one would like to do himself
For that graph: is there dependancy on where the calculation starts that'll affect the end result, or will it always end up in the same result, no matter what order the nodes are processed?
It shouldn't matter where you start. here's my psuedo code
1. ask how many points there are
2. ask X and Y co-ord for each
3. plot each point
4. loop through all items looking for shortest connection if the two aren't already connected (stuck but will get through)
5. draw the shortest line
6. Loop through 4 & 5 for each item.
There it shouldn't matter which co-ord you start off with. I will do so that adding another item will cause a total re-draw (which shouldn't take long anyway)
a total recalc would be extremely slow (btw that algoritm is called prim's algoritm and has an algoritmic complexity of nlogn) compared to a local recalc if the node count is high
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
i was thinking:
if we take each node starting from the closest, and exclude all nodes behind the perpendicular bisector until all nodes are excluded or included, and the included ones might be affected. Not sure if this is efficient though, could be an nlogn operation but if there are many nodes, they will effectively be excluded.
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
That picture is very ambigous.
the yellow circle would not cut off any lines and a line would form to the top left node. This would turn out OK. As far as I can see, Merri might have hit the nail on the head. Please make a clearer image kedeman. sorry, I know it's time consuming but I need to know.
hmm, the idea i got of merris was a circle with radius=distance to closest node. In the picture, the radius=2, but in this case it should form a line to rightmost node, but that doesn't happen.
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
The only time that you might get a different spanning tree when calculated in a different order, is in a scenario when several pair of nodes (in close-ish proximity to each other) have identical distances. For instance, imagine 3 nodes in an equilateral triangle formation...
AB=AC=BC
If this formation is among several other nodes then we might get A--B--C or B--C--A or C--A--B depending on the order in which calculations were performed.
But this doesn't really matter because they are equivalent anyway, none of these 3 variants is better than the other 2.
1. Load node Coords from file and plot them on imaginary graph
2. Calculate table of distances. For each node, calculate its distance from every other node. (Taking care not to do each calculation twice by mistake).
3. Sort the distances into ascending order. This is the most time-consuming bit, as N increases, the number of distances to sort increases geometrically (Dists = 0.5N^2 - 0.5N)
4. Keep adding the lines until N-1 have been added, you can't add a line if it makes a loop. When done, optionally discard the rest of the lines.
Now we should have something looking like the picture in the first posting (without black and green nodes).
Its about the fastest and memory efficient way I have found so far.
Last edited by wossname; Dec 13th, 2003 at 10:44 AM.
kedaman, why should it make a line to the rightmost if the leftmost is the nearest? Atm I see no flaw in what I suggested... only that it is a little on the harder side to accomplish and does need some processing power. It would eventually be faster than recounting everything when the number of nodes increase though.
an MST has necessarily got to be the or one of the smallest trees (trough all nodes) possible. If we name the nodes in my picture a,b,c and d (from top to bottom, left to right (as we read that is))
calculate the distances for:
ab, ad, bc
ab, ad, cd
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
1. when calculating the distances, calculate their squares instead, dx*dx+dy*dy where dx=x0-x1, dy=y0-y1. Taking the root is timeconsuming, and squares wont affect the sort
2. do this for each node0 for each node1 starting at node0 +1 to the last node (this avoids the need to calculate distance twice).
3. use an nlogn sort like quicksort
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
Originally posted by kedaman wossname, some suggestions:
1. when calculating the distances, calculate their squares instead, dx*dx+dy*dy where dx=x0-x1, dy=y0-y1. Taking the root is timeconsuming, and squares wont affect the sort
2. do this for each node0 for each node1 starting at node0 +1 to the last node (this avoids the need to calculate distance twice).
3. use an nlogn sort like quicksort
I'd considered dropping the Sqrt() but I'd have to calc it later anyway so I might as well do it when it could be masked slightly by the hard-drive read routine, it might not be so noticeable. Besides I've set a restriction of a maximum of 300 nodes in any network. This means only 44850 sqrts need be calculated. But it does mean a hell of a lot of sorting needs to happen.
I already had suggestion 2 down Hence the (0.5(x^2))-0.5x formula.
I'm currently using a linked list to sort the ever growing list of distances. Would it be better (memory and speed wise) to create an array instead and run a quicksort? If so would it be better to write quicksort to suit my needs or is it generic enough to handle my problem anyway?
When I've made the program stable enough, I'm planning to run it overnight on a LARGE problem Maybe 1000+ nodes. Probably take a few days.
Last edited by wossname; Dec 14th, 2003 at 08:59 AM.
i still suggest you leave the roots til after sorting out all edges not included in the tree, they will be growing in a quadratic manner, the tree linearily (1000 nodes gives 999 times sqrt after but 500000 before, and expensive operations like sqrt will become really nasty in these quantities)
an array could be nasty if you resize it when its large, a middle thing would be to use a tree, but it consumes more memory, but in this case its nothing, trees can keep themselves sorted, while you add items to them. alternatively you could use a heap, especially if you drop chunks of items on them, or a hashtable, but they are arrays as well. lists would definitely be the worst option still.
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
if you still need help with the original problem, i posted a solution here earlier..
still: instead of finding the nearest nodes (which would require calculating distances again) start with any node and start excluding the nodes in their bisector shadows right away and you'll save a lot of time.
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
With the new Node N:
1) Add the edge to its closest neighbour.
2) For each other node K with an edge to N:
a) If the smallest current edge off K is > N, add the new edge
b) Scan for loop. if there is one, remove the longest edge in the loop
I'm not sure its necessary to sort edges, you might have to use all of them, you might have to use a lot of large ones.
You would just need, for every node, the smallest edge coming off that node. (including for the new node to be added)
I now realise that you have to take them in the order from the closest.. but i guess you have to live with it
A perpendicular bisector between two points (in this case the new node added and closest node not yet excluded or included) is the symmetry line between those points.
by shadow i meant everything behind this line (in the new node pov)
Reason for this is (look at the picture) if there was to be a node inside the bisectors it would form a triangle with shorter distance to the inner node (new node) than to the outher, and thus an edge to the outher node would potentially be broken. However this is not possible for any node outside.
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.
btw: don't take what i say for granted, I could be wrong too.
In the figure:
yellow node: the new node
red node: an arbitrary node you test
Redline: the perpendicular bisector between yellow and red
yellow circle: with radius=distance to the arbitrary node
blue node: This one is inside the bisector - therefore don't exclude it (draw a line from it to the red one and then to the yellow, and see which is shorter, you'll find that if its shorter then it will link to yellow, thus edge to red might break)
green node: This one is outside the bisector - therefore exclude it (it has shorter way to the red node and thus the yellow node can't affect it)
purple node: This is a special case, because its within the circle, which means that it can connect to yellow even if its outside the bisector, so don't exclude it.
This way you wont have to start from the closest node which saves time again. The algorithm is:
Take any node not excluded or included
make up equation for bisector and take every node behind it and outside the circle, and exclude them (including those earlier included). The node itself is included.
keep going til there are no free nodes left, and voila you have localized all affected nodes.
Use
writing software in C++ is like driving rivets into steel beam with a toothpick.
writing haskell makes your life easier:
reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.