Crosstab over a 3 coumns array
Hello,
I have a 50k rows array of 3 columns. The 2 first columns are categorical while the third one is numeric
I want to build a cross tabulation where each cell represents the average of the corresponding two categories
I'm doing so by looping over the 1st column then over the 2nd one to create a temp array from which I extract the average
I wonder if it is the most effective way to do it?
Thanks
Re: Crosstab over a 3 coumns array
You might look at Tabulator, Crosstab Class.
There you can find several versions of a Tabulator class. Version 4 there auto-tabulates a number of aggregate statistics or you could add your own specialized aggregates.
Re: Crosstab over a 3 coumns array
Looks extremely promising
Thanks
2 Attachment(s)
Re: Crosstab over a 3 coumns array
Here's a more stripped down demo that might be easier to follow. It just generates random data (always the same since Randomize wasn't called) and tabulates that, then reports it.
Timings are reported for the Gen & Tabulate phase and the Reporting phase.
Here the generated "Code" category runs from A00 through M99, the column catagories are A through M, and the values are numbers. Averages of the values are reported here.
Re: Crosstab over a 3 coumns array
Actually I am more interested in the previous version where 2 variables are displayed for each column (price,volume)
Moreover I wonder if you could instruct about 3 or even 4 variables per column
Thanks again
Avi
Re: Crosstab over a 3 coumns array
The main difference is storing objects as values instead of numbers. Those objects are instances of a simple value Class with multiple value properties.
Because of this the "cell" Class (TabulatorValue or TabulatorCell depending on the "version") can't accumulate aggregate statistics aside from the count, so the more primitive TabulatorValue used earlier is adequate.
The idea was to create generic reusable code for the Tabulator Class and its helper Value/Cell Class.
It is possible to create specialized versions for handling a multivalued Value Class and accumulate aggregate statistics for each "cell." You could combine the two approaches in the demos and have a Value Class and a TabulatorCell Class that is "aware of" the Value Class' structure.
Since the count statistic is simple enough that's covered. Then you need only worry about accumulating sums (as well as min, max, etc. when needed). The code in TabulatorCell gets less complex because you have a hard-coded Value Class so there is no need for interpreting data types at runtime, yet more complex because of dealing with the multiple values of the Value Class.
I'll see if I can cook up a demo.
2 Attachment(s)
Re: Crosstab over a 3 coumns array
Ok, instead of a specific Value Class I decided to do this more generically using a Variant containing an array instead.
That makes it easier to use this version of Tabulator with different numbers of different values. However I only implemented Sum & Avg, so if you want things like Min, Max, etc. you could add those too based on earlier versions of TabulatorCell.
Nothing special to accomplish this or do other kinds of things with Tabulator, it's all "plain sailing" as programming goes.
A wrinkle here is the use of two menu items for "Random" and "Verify" runs. I added the verification run because I got suspicious of some reported values I was seeing.
Here we have a set of 3 values (Alpha, Beta, Gamma) that are simulating some sorts of "measurements" from some fictitious device. We're reporting averages of these by hour by day.
Since we have a lot more data points it takes a lot more time than the previous example did.
Edit
Ok, since the performance of the reporting phase was nagging at me I went in and optimized it. Should more than 10 times as fast as before.
Reposted attachment.
Re: Crosstab over a 3 coumns array