Suppose that there is a container that has a bunch of widgets moved into it, then a series of moves out of it. The widgets have properties. All of them are either A or B, and there are four moves into the container, each of which is equally split between A and B. So, if each move in is 2000 widgets, then 1000 will be A and 1000 will be B, after which the container holds 8000 widgets, 4000 of which are A and 4000 of which are B.

Now assume that each of the moves in adds some distinct trait to the widgets in the move. Move 1 adds trait 1, move 2 adds trait 2, etc. Therefore, widgets that enter the container from move one are now A1 and B1, widgets that enter the container from move two are A2 and B2, and so on. This means that the container ends up holding 1000 A1, 1000 A2, 1000 A3, 1000 A4, 1000 B1, 1000 B2, 1000 B3, and 1000 B4.

If widgets are selected at random for any move out, then the numbers of each trait combination in the move out will be proportional to the numbers that were found in the source. This is easy to calculate. The problem is, what happens if there are filters on the moves out. For example, what if the first move out takes out 10 widgets, but ONLY type A. The next move out takes out 20 widgets but ONLY type 3. The problem is to determine what is the number of A3 remaining in the container after those removals.

To solve this, I know the number of {A, 3} prior to any moves out, as that is 1000. For the first move out, I calculate the ratio {A, 3} / ({A, 3} + {A, !3}). In other words, the portion of the widgets removed that are {A, 3} is based on the ratio of {A, 3} to ANY A, which is the sum of {A, 1} + {A, 2} + {A, 3} + {A, 4}. This is easy enough to calculate, since I know how many widgets are in each of these sets. The calculation becomes 1K / (1K + 1K + 1K + 1K), or 0.25, which when multiplied by the size of the move out (10 widgets) shows that roughly 2.5 of those widgets will be {A, 3}

The same sequence of events can be performed on the next move out, where I am looking at the number of the 20 type 3 widgets that are {A, 3}, which is calculated by:

portion that are {A, 3} = 20 * {A, 3} / ({A, 3} + {B, 3})

or

20 * (997/(997 + 1000)

which is 10.

So, after those two moves out, the number of widgets remaining that are {A, 3} is about 987.

What I am trying to do is generalize this solution for more than two traits. In this example, the first trait is either A or B, the second is either 1, 2, 3, or 4, and the number of buckets is the combination of all of those. That's a large number of buckets, as it is 2 * 4 buckets.

In the real world example that I am looking at, there could be a dozen traits, but with several restrictions that make things simpler. Most of the traits won't matter because they will be on 100% of the widgets. The preceding example would be much simpler if ALL the widgets were A. In fact, A would then be ignored for the calculations. Realistically, I can only filter on one trait. In the example, I had a filter on A vs B on one move and a filter on 3 vs 1, 2, or 4 on the second move. In the real world scenario, I have only the latter and never the former, but I can't count on that remaining forever true.

Furthermore, there are traits that people will want to ask about, but for which there can't be a filter on the move out, and one of the most important of those could have 20 different values (think of that second trait being 1 to 20, rather than 1 to 4).

The example shown would be more or less possible, because all the buckets make up an N dimensional array, and the calculations at any move out are straightforward in that they would sum across one of the indexes, but N seems like it could easily become absurd. N = 2 is tractable. N = 3 is getting a bit much. N = 5 is kind of insane.

What I'm looking for is whether or not this general problem has a known solution. ]]>