Results 1 to 8 of 8

Thread: Crosstab over a 3 coumns array

  1. #1

    Thread Starter
    Member
    Join Date
    Apr 2014
    Posts
    52

    Crosstab over a 3 coumns array

    Hello,

    I have a 50k rows array of 3 columns. The 2 first columns are categorical while the third one is numeric

    I want to build a cross tabulation where each cell represents the average of the corresponding two categories

    I'm doing so by looping over the 1st column then over the 2nd one to create a temp array from which I extract the average

    I wonder if it is the most effective way to do it?

    Thanks

  2. #2
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: Crosstab over a 3 coumns array

    You might look at Tabulator, Crosstab Class.

    There you can find several versions of a Tabulator class. Version 4 there auto-tabulates a number of aggregate statistics or you could add your own specialized aggregates.

  3. #3

    Thread Starter
    Member
    Join Date
    Apr 2014
    Posts
    52

    Re: Crosstab over a 3 coumns array

    Looks extremely promising

    Thanks

  4. #4
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: Crosstab over a 3 coumns array

    Here's a more stripped down demo that might be easier to follow. It just generates random data (always the same since Randomize wasn't called) and tabulates that, then reports it.

    Timings are reported for the Gen & Tabulate phase and the Reporting phase.

    Name:  sshot.png
Views: 286
Size:  11.9 KB

    Here the generated "Code" category runs from A00 through M99, the column catagories are A through M, and the values are numbers. Averages of the values are reported here.
    Attached Files Attached Files
    Last edited by dilettante; Feb 22nd, 2015 at 01:06 PM. Reason: screenshot

  5. #5

    Thread Starter
    Member
    Join Date
    Apr 2014
    Posts
    52

    Re: Crosstab over a 3 coumns array

    Actually I am more interested in the previous version where 2 variables are displayed for each column (price,volume)

    Moreover I wonder if you could instruct about 3 or even 4 variables per column

    Thanks again

    Avi

  6. #6
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: Crosstab over a 3 coumns array

    The main difference is storing objects as values instead of numbers. Those objects are instances of a simple value Class with multiple value properties.

    Because of this the "cell" Class (TabulatorValue or TabulatorCell depending on the "version") can't accumulate aggregate statistics aside from the count, so the more primitive TabulatorValue used earlier is adequate.

    The idea was to create generic reusable code for the Tabulator Class and its helper Value/Cell Class.


    It is possible to create specialized versions for handling a multivalued Value Class and accumulate aggregate statistics for each "cell." You could combine the two approaches in the demos and have a Value Class and a TabulatorCell Class that is "aware of" the Value Class' structure.

    Since the count statistic is simple enough that's covered. Then you need only worry about accumulating sums (as well as min, max, etc. when needed). The code in TabulatorCell gets less complex because you have a hard-coded Value Class so there is no need for interpreting data types at runtime, yet more complex because of dealing with the multiple values of the Value Class.

    I'll see if I can cook up a demo.

  7. #7
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: Crosstab over a 3 coumns array

    Ok, instead of a specific Value Class I decided to do this more generically using a Variant containing an array instead.

    That makes it easier to use this version of Tabulator with different numbers of different values. However I only implemented Sum & Avg, so if you want things like Min, Max, etc. you could add those too based on earlier versions of TabulatorCell.

    Nothing special to accomplish this or do other kinds of things with Tabulator, it's all "plain sailing" as programming goes.

    A wrinkle here is the use of two menu items for "Random" and "Verify" runs. I added the verification run because I got suspicious of some reported values I was seeing.

    Name:  sshot.png
Views: 223
Size:  11.8 KB

    Here we have a set of 3 values (Alpha, Beta, Gamma) that are simulating some sorts of "measurements" from some fictitious device. We're reporting averages of these by hour by day.

    Since we have a lot more data points it takes a lot more time than the previous example did.


    Edit

    Ok, since the performance of the reporting phase was nagging at me I went in and optimized it. Should more than 10 times as fast as before.

    Reposted attachment.
    Attached Files Attached Files
    Last edited by dilettante; Feb 25th, 2015 at 12:20 PM.

  8. #8

    Thread Starter
    Member
    Join Date
    Apr 2014
    Posts
    52

    Re: Crosstab over a 3 coumns array

    Remarkable!!!!

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width