
Jul 31st, 2018, 02:59 PM
#1
Thread Starter
New Member
Exotic stats problem ; mode, fuzzy clusters, etc
I was originally thinking of doing this in Python, but now I'm wondering if it's possible in VB?
Hi, I'm working on a math problem and I'm not sure how to approach it.
I'm trying to enumerate, rank, and extract the most common numeric ranges
from a list, with a twist.
As a basic example, I have ten numeric values representing
weights in varying units of measure (grams, ounces, etc.).
Each value is unique and is a decimal value.
For example, take the set of numbers
shown below. In this set, we are interested in the
most common magnitudinal range. Here the most
common range is represented by three values:
295.999, 312.015, 330.111
The complete set is:
.........................
102.35
8000.32
330.111
295.999
77.01
16.999
1099.222
645
890.01
312.015
.........................
What I want to be able to do is input a list of ten values similar to the above and have some simple way to derive the most common value by range.
If the values were more static, for example if the values in that range were all identical, such as "310", then I could just use the mode function and it would easily tell me this.
However, since the values are varying decimals, I am a bit stumped as to how to accomplish this. I came across Python fuzzy clustering, and it looks like it might work in combination with mode, but I'm wondering if there is a simpler, easier, faster way to do this?
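To make the difficulty concrete, here is a minimal Python sketch using the standard library's `statistics.multimode` (Python 3.8+). The `static_values` list is made up for illustration; `values` is the set from this post. With repeated values the mode is obvious, but when every value is distinct, each one ties as a mode and the function tells you nothing.

```python
from statistics import multimode

# Hypothetical "static" readings: the repeated value 310 is the clear mode.
static_values = [310, 310, 310, 77, 645]
static_modes = multimode(static_values)
print(static_modes)

# The values from this post are all distinct, so every value ties as a mode.
values = [102.35, 8000.32, 330.111, 295.999, 77.01,
          16.999, 1099.222, 645, 890.01, 312.015]
modes = multimode(values)
print(len(modes), "modes out of", len(values), "values")
```
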
The end goal is to do pattern analysis on a list of numbers and return the most common range of highest magnitude.
The desired output, using the list above as an example, would be the three values printed to the screen, a text file, or a variable.
Is there a better way to do this than:
1.) Fuzzy clustering
2.) Mode (most common value) of discrete data
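One possible simpler alternative, sketched in Python under stated assumptions: sort the values and start a new group whenever a value exceeds its predecessor by more than a chosen ratio. The function name `group_by_relative_gap` and the threshold `max_ratio=1.2` are hypothetical choices made for this illustration (the threshold was picked to suit this sample set, not derived from any rule), and the sketch assumes a non-empty list of positive values.

```python
def group_by_relative_gap(values, max_ratio=1.2):
    """Split sorted positive values wherever a neighbor ratio exceeds max_ratio.

    max_ratio is an assumed tuning parameter, not a value from the thread.
    """
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur / prev <= max_ratio:
            groups[-1].append(cur)   # close enough: same group
        else:
            groups.append([cur])     # gap too large: start a new group
    return groups

values = [102.35, 8000.32, 330.111, 295.999, 77.01,
          16.999, 1099.222, 645, 890.01, 312.015]

groups = group_by_relative_gap(values)
largest = max(groups, key=len)       # first group wins on a tie
print(largest)  # [295.999, 312.015, 330.111]
```

A relative (ratio) gap is used rather than an absolute one because the values span several orders of magnitude. A different threshold will give different groups, so the result depends on how "same range" is defined.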

Aug 18th, 2018, 03:47 PM
#2
Re: Exotic stats problem ; mode, fuzzy clusters, etc
Since you're working with sets, SQL of course comes to mind
(it was designed to deal with sets)...
Here's a bit of VB6 code which uses an SQLite in-memory DB
(so that you can avoid creating a DB table on the file system when using it for statistics queries)...
Code:
Option Explicit

Private Sub Form_Load()
  Const cmdInst = "Insert Into T Values(?)"
  'maps each value to the power of ten of its magnitude (a 0 is treated as 1)
  Const grpCond = "Pow(10,Fix(Log10(Abs(IIf(V,V,1)+1e-30))))"

  With New_c.MemDB
    'create a little table for the values
    .Exec "Create Table T(V Double Primary Key)"

    'insert the value test set
    .ExecCmd cmdInst, 0.22
    .ExecCmd cmdInst, 102.35
    .ExecCmd cmdInst, 8000.32
    .ExecCmd cmdInst, 330.111
    .ExecCmd cmdInst, 295.999
    .ExecCmd cmdInst, 77.01
    .ExecCmd cmdInst, 16.999
    .ExecCmd cmdInst, 1099.222
    .ExecCmd cmdInst, 645
    .ExecCmd cmdInst, 890.01
    .ExecCmd cmdInst, 312.015

    'group the ordered values by magnitude and print one row per group
    DumpResult .GetRs("Select " & grpCond & ", Avg(V), group_concat(V) From (Select * From T Order By V) Group By " & grpCond)
  End With
End Sub

Private Sub DumpResult(Rs As cRecordset)
  Do Until Rs.EOF
    Dim F As cField
    For Each F In Rs.Fields
      Debug.Print F.Value,
    Next
    Debug.Print
    Rs.MoveNext
  Loop
End Sub
Not sure what you're after exactly, but the above does a grouping using the Log10 function
to "map values into their appropriate magnitude buckets"...
It will print out the following in the Debug window in 3 columns, which are:
 Log10 magnitude
 Average of the values within the group
 Comma-separated (and ordered) list of values within the given group
Code:
1       0.22       0.22
10      47.0045    16.999,77.01
100     429.2475   102.35,295.999,312.015,330.111,645.0,890.01
1000    4549.771   1099.222,8000.32
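For comparison, since Python was mentioned at the start of the thread, the same magnitude grouping can be sketched without a database. This is an illustrative equivalent rather than a translation of the RichClient API: `magnitude_bucket` is a hypothetical helper name, `math.trunc` stands in for `Fix`, and a zero is treated as 1 in the same way `IIf(V,V,1)` does in the SQL.

```python
import math
from collections import defaultdict

def magnitude_bucket(v):
    """Power of ten at or below |v| for |v| >= 1 (truncation toward zero)."""
    # Treat 0 as 1 so that log10 is defined, as IIf(V,V,1) does in the SQL.
    return 10 ** math.trunc(math.log10(abs(v if v else 1)))

values = [0.22, 102.35, 8000.32, 330.111, 295.999, 77.01,
          16.999, 1099.222, 645, 890.01, 312.015]

# Collect the sorted values into their magnitude buckets.
buckets = defaultdict(list)
for v in sorted(values):
    buckets[magnitude_bucket(v)].append(v)

# One row per bucket: magnitude, average, ordered members.
for magnitude in sorted(buckets):
    members = buckets[magnitude]
    print(magnitude, sum(members) / len(members), members)
```

Note that fixed power-of-ten buckets put 102.35 and 890.01 in the same group, so this answers "which order of magnitude is most common" rather than picking out a tight cluster such as the three values near 300.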
HTH
Olaf

Aug 18th, 2018, 08:58 PM
#3
Re: Exotic stats problem ; mode, fuzzy clusters, etc
I don't know stats, fuzzy clustering, etc., so I probably shouldn't bother asking. But since you gave an example, I can come up with an algorithm that would return the answer you gave for it; I just don't know whether it would be correct for other sets, or return what you expect.
So I'll modify your example set and ask whether the result would still be {295.999, 312.015, 330.111} or something different, perhaps {1000.01, 1057.02, 1099.222, 1200.3}. The first has 3 numbers within a range of 34.112, but the second has 4 numbers within a range of 200.29.
.........................
102.35
1200.3
330.111
295.999
77.01
16.999
1099.222
1000.01
1057.02
312.015
.........................
Four numbers in a range of 200.29 gives an average delta of 50.0725 (200.29 / 4), which is much greater than the average delta of 11.37 (34.112 / 3) for the three. So 3 out of 10 fall in the much smaller range, compared to the 4-out-of-10 subset.
But then, if you include the set {77.01, 102.35, 295.999, 312.015, 330.111}, you have half your numbers (5 out of 10) falling in a range of 253.101, which is an average delta of 50.62, so the average is only slightly higher than for the set of 4.
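The arithmetic above can be checked with a few lines of Python. This sketch only restates the comparison, taking "average delta" to mean the range (max minus min) divided by the number of members, as in the figures quoted; the `average_delta` name and the labels in `candidates` are made up for the illustration.

```python
def average_delta(subset):
    """Range (max - min) divided by the member count."""
    return (max(subset) - min(subset)) / len(subset)

# The three candidate subsets discussed above (labels are illustrative).
candidates = {
    "three near 300": [295.999, 312.015, 330.111],
    "four near 1000": [1000.01, 1057.02, 1099.222, 1200.3],
    "five from 77 to 330": [77.01, 102.35, 295.999, 312.015, 330.111],
}

for name, subset in candidates.items():
    spread = max(subset) - min(subset)
    print(f"{name}: n={len(subset)} range={spread:.3f} "
          f"avg delta={average_delta(subset):.4f}")
```

Ranking by member count alone favors the set of five, while ranking by average delta favors the set of three, which is why the selection criterion matters.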
I'm curious what the criterion is for determining the "most common magnitudinal range", and whether you want that or, as you put it a bit differently later, the "most common range of highest magnitude".
I'm not sure if that implies a different result. That is, if it turns out that you have two ranges, both having four members, do you want to return the four that have the highest values, or would you really want to return the four with the lowest delta between high and low? The latter is statistically the denser set, i.e. the one that would likely contain more numbers if the ranges were equal.