
Thread: Exotic stats problem ; mode, fuzzy clusters, etc

  1. #1

    Thread Starter
    New Member
    Join Date
    Apr 2016
    Posts
    1

    Exotic stats problem ; mode, fuzzy clusters, etc

    I was originally thinking of doing this in Python, but now I'm wondering if it's possible in VB?

    Hi I'm working on a math problem and not sure how to approach it.

    Trying to enumerate, rank and extract most common numeric ranges
    from a list, with a twist ;

    Basic operation example is, I have ten numeric values representing
    weights in variable unit measure types, i.e. grams, ounces, etc

    Each value is unique and is a decimal value,
    for example we have the following set of numbers
    shown below. In this set, we are interested in the
    most common magnitudinal range. Here the most
    common range value is shown by three values ;

    295.999, 312.015, 330.111

    the complete set is shown ;
    .........................
    102.35
    8000.32
    330.111
    295.999
    77.01
    16.999
    1099.222
    645
    890.01
    312.015
    .........................

    What I want to be able to do is input a list of ten values similar to the above and have some simple way to derive the most common value by range.

    If I were using values that were more static, for example if several values in the list were identical, such as "310", then I could just use the mode function and it would easily tell me this.

    However, since the values are arbitrary decimals, I am a bit stumped as to how I would accomplish this. I came across Python fuzzy clustering, and it looks like it might work in combination with mode, but I'm wondering if there is a simpler, easier, faster way to do this?

    The end goal is I want to be able to do pattern analysis on a list of numbers and return the most common range of highest magnitude.

    Desired output, using the list above as an example, would be the three values printed to the screen, written to a text file, or stored in a variable.

    Is there a better way to do this than:

    1.) Fuzzy Clustering
    2.) Mode (most common value) of discrete data.

  2. #2
    PowerPoster
    Join Date
    Jun 2013
    Posts
    3,827

    Re: Exotic stats problem ; mode, fuzzy clusters, etc

    Since you're working with sets, there's of course SQL which comes to mind
    (which was designed to deal with sets)...

    Here's a bit of VB6-code, which is using an SQLite-InMemory-DB
    (so that you can avoid creating a DB-Table on the FileSystem when using it for statistic-queries)...

    Code:
    Option Explicit
     
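    Private Sub Form_Load()
      'New_c, cRecordset and cField are not VB6 built-ins; they appear to come
      'from the vbRichClient library (an assumption, the post does not name it)
      Const cmdInst = "Insert Into T Values(?)"
      'bucket key: for |V| >= 1, the largest power of ten not exceeding |V|
      '(312.015 -> 100); IIf maps V = 0 to 1 so that Log10 never sees zero
      Const grpCond = "Pow(10,Fix(Log10(Abs(IIf(V,V,1)+1e-30))))"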
      With New_c.MemDB
        'create a little table for the values
        .Exec "Create Table T(V Double Primary Key)"
        
        'insert the value-test-set
        .ExecCmd cmdInst, -0.22
        .ExecCmd cmdInst, 102.35
        .ExecCmd cmdInst, 8000.32
        .ExecCmd cmdInst, 330.111
        .ExecCmd cmdInst, 295.999
        .ExecCmd cmdInst, 77.01
        .ExecCmd cmdInst, 16.999
        .ExecCmd cmdInst, 1099.222
        .ExecCmd cmdInst, 645
        .ExecCmd cmdInst, 890.01
        .ExecCmd cmdInst, 312.015
          
        DumpResult .GetRs("Select " & grpCond & ", Avg(V), group_concat(V) From (Select * From T Order By V) Group By " & grpCond)
      End With
    End Sub
    
    Private Sub DumpResult(Rs As cRecordset)
      Do Until Rs.EOF
        Dim F As cField
        For Each F In Rs.Fields
          Debug.Print F.Value,
        Next
        Debug.Print
        Rs.MoveNext
      Loop
    End Sub
    Not sure what you're after exactly - but the above does a Grouping using the Log10-function,
    to "map values into their appropriate magnitude-buckets"...

    It will print out the following in the Debug-Window in 3 Columns, which are:
    - the magnitude-bucket (the power of ten derived via Log10)
    - Average of the Values within the Group
    - comma-separated (and ordered) List of Values within the given Group

    Code:
     1            -0.22         -0.22         
     10            47.0045      16.999,77.01  
     100           429.2475     102.35,295.999,312.015,330.111,645.0,890.01             
     1000          4549.771     1099.222,8000.32
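
Since the thread starter mentioned Python, here is a rough Python sketch of the same power-of-ten bucketing. The function name `magnitude_buckets` is made up for illustration, and the zero handling and truncation only imitate what the SQL expression above appears to do; treat it as an approximation, not a port of the library code.

```python
import math
from collections import defaultdict

def magnitude_buckets(values):
    """Group values by power-of-ten magnitude, imitating the SQL grouping expression."""
    buckets = defaultdict(list)
    for v in sorted(values):
        a = abs(v) if v != 0 else 1.0          # zero is mapped to 1, like IIf(V,V,1)
        key = 10 ** math.trunc(math.log10(a))  # trunc rounds toward zero, like Fix()
        buckets[key].append(v)
    return dict(buckets)

values = [-0.22, 102.35, 8000.32, 330.111, 295.999, 77.01,
          16.999, 1099.222, 645, 890.01, 312.015]

# Print bucket, average of the bucket, and its ordered members.
for key, group in sorted(magnitude_buckets(values).items()):
    print(key, sum(group) / len(group), group)
```

Note that the 100 bucket holds six values, not only the three from the question, so fixed decade buckets are coarser than the range the thread starter seems to have in mind.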
    HTH

    Olaf

  3. #3
    Sinecure devotee
    Join Date
    Aug 2013
    Location
    Southern Tier NY
    Posts
    4,632

    Re: Exotic stats problem ; mode, fuzzy clusters, etc

    I don't know stats, fuzzy clustering, etc., so I probably shouldn't bother asking. But you gave an example, and while I can come up with an algorithm that would return the answer you gave for it, I don't know whether it would be correct for other sets or return what you expect.

    I guess I'll modify your example set and ask whether the result would still be {295.999, 312.015, 330.111} or something different, perhaps {1000.01, 1057.02, 1099.222, 1200.3}. The first has 3 numbers within a range of 34.112, but the second has 4 numbers in a range of 200.29.
    .........................
    102.35
    1200.3
    330.111
    295.999
    77.01
    16.999
    1099.222
    1000.01
    1057.02
    312.015
    .........................
    Four numbers in a range of 200.29 give an average delta of 50.0725 (200.29 / 4), which is much greater than the average delta of 11.37 (34.112 / 3) for the three. So 3 out of 10 fall in the much smaller range, compared to the 4 out of 10 subset.
    But then, if you include the set {77.01, 102.35, 295.999, 312.015, 330.111}, you have half your numbers (5 out of 10) falling in a range of 253.101, which is an average delta of 50.62, only slightly higher than that of the set of 4.
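
The arithmetic above can be checked with a few lines of Python. This is only a sketch: `average_delta` is a hypothetical helper that uses the definition from this post (range divided by the number of members), which is not a standard statistic.

```python
def average_delta(subset):
    """Spread (max - min) divided by the number of members, as defined in the post."""
    return (max(subset) - min(subset)) / len(subset)

tight = [295.999, 312.015, 330.111]
wide = [1000.01, 1057.02, 1099.222, 1200.3]
half = [77.01, 102.35, 295.999, 312.015, 330.111]

# Print name, member count, range, and average delta for each candidate subset.
for name, subset in (("tight", tight), ("wide", wide), ("half", half)):
    spread = max(subset) - min(subset)
    print(name, len(subset), round(spread, 3), round(average_delta(subset), 4))
```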

    I'm curious what the criterion is for determining the "most common magnitudinal range", and whether that is what you want, since later you say it a bit differently: "most common range of highest magnitude".

    I'm not sure if that implies a different result. For example, if it turns out that you have two ranges, both having four members, do you want to return the four that have the highest values? Or would you really want the four with the lowest delta between high and low, which statistically is the denser set, since it packs the same number of members into a narrower range?
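
One way to make the question concrete is a sliding window over the sorted values with an explicit tie-break. The sketch below is an assumption-laden illustration, not anything proposed by the thread starter: the function name `densest_window`, the fixed `width` parameter, and the two `prefer` modes are all made up here to show how the choice of criterion changes the answer.

```python
def densest_window(values, width, prefer="tightest"):
    """Return the largest group of sorted values whose max - min <= width.

    Ties on group size are broken by `prefer`:
      "tightest" -> smallest spread between low and high
      "highest"  -> largest top value
    """
    xs = sorted(values)
    best_key, best_group = None, []
    hi = 0
    for lo in range(len(xs)):
        hi = max(hi, lo)
        # Extend the window while the next value still fits within `width`.
        while hi + 1 < len(xs) and xs[hi + 1] - xs[lo] <= width:
            hi += 1
        group = xs[lo:hi + 1]
        spread = group[-1] - group[0]
        if prefer == "tightest":
            key = (len(group), -spread)
        else:
            key = (len(group), group[-1])
        if best_key is None or key > best_key:
            best_key, best_group = key, group
    return best_group

modified = [102.35, 1200.3, 330.111, 295.999, 77.01,
            16.999, 1099.222, 1000.01, 1057.02, 312.015]

print(densest_window(modified, width=50))
print(densest_window(modified, width=250))
```

With the modified set, a narrow width picks out the three values around 300, while a wide one favours the four values above 1000, so the expected answer depends on how wide a "range" is allowed to be.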
