 # Thread: Exotic stats problem ; mode, fuzzy clusters, etc

1. ## Exotic stats problem ; mode, fuzzy clusters, etc

I was originally thinking of doing this in Python, but now I'm wondering if it's possible in VB?

Hi, I'm working on a math problem and I'm not sure how to approach it.

I'm trying to enumerate, rank, and extract the most common numeric ranges
from a list, with a twist:

Here's a basic example: I have ten numeric values representing
weights in various units of measure, e.g. grams, ounces, etc.

Each value is unique and is a decimal value.
For example, take the set of numbers
shown below. In this set, we are interested in the
most common range of magnitude. Here the most
common range is represented by three values:

295.999, 312.015, 330.111

The complete set is:
.........................
102.35
8000.32
330.111
295.999
77.01
16.999
1099.222
645
890.01
312.015
.........................

What I want to be able to do is input a list of ten values similar to the above and have a simple, easy way to derive the most common value by range.

If I were using more static values, for example if several values in the list were identical, such as "310", I could just use the mode function and it would easily tell me this.

However, since the values are arbitrary decimals, I'm a bit stumped as to how to accomplish this. I came across fuzzy clustering in Python, and it looks like it might work, possibly in combination with the mode, but I'm wondering if there is a simpler, easier, faster
way to do this.
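One simple alternative to fuzzy clustering is to discretize the values first and then take the ordinary mode of the discretized keys. The sketch below is in Python (the language the question first considered) and assumes a fixed bucket width of 100, i.e. rounding to the nearest hundred; `mode_bucket` and its `ndigits` parameter are hypothetical names, not part of any library.

```python
from collections import Counter


def mode_bucket(values, ndigits=-2):
    """Round each value (ndigits=-2 means to the nearest 100), find the most
    common rounded key, and return the original values that share that key.

    The bucket width is an assumption; it must be chosen for the data.
    """
    keys = [round(v, ndigits) for v in values]
    # most_common(1) returns [(key, count)] for the most frequent key
    bucket, _count = Counter(keys).most_common(1)[0]
    return sorted(v for v, k in zip(values, keys) if k == bucket)


if __name__ == "__main__":
    data = [102.35, 8000.32, 330.111, 295.999, 77.01,
            16.999, 1099.222, 645, 890.01, 312.015]
    print(mode_bucket(data))
```

On the set above, the bucket for 300 wins with 295.999, 312.015 and 330.111. Fixed-width buckets are sensitive to where the boundaries fall, though: 249 and 251 are only 2 apart but round to 200 and 300.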

The end goal is to be able to do pattern analysis on a list of numbers and return the most common range of highest magnitude.

The desired output, using the list above as an example, would be the three values, printed to the screen, written to a text file, or stored in a variable.

Is there a better way to do this than:

1.) Fuzzy clustering
2.) Mode (most common value) of discrete data?
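Because the values span several orders of magnitude, another option is to compare them relatively rather than with fixed-width buckets: sort them, then slide a window to find the largest run whose maximum is within a chosen tolerance of its minimum. This is a plain sliding-window search, not fuzzy clustering. The function name `densest_window`, the 20% tolerance (`rel_tol=0.2`) and the restriction to positive values are assumptions for illustration. Ties between equally large runs go to the higher-valued one, which is one possible reading of "most common range of highest magnitude".

```python
def densest_window(values, rel_tol=0.2):
    """Return the largest run of sorted values with max <= min * (1 + rel_tol).

    Assumes all values are positive (e.g. weights). Among runs of equal
    length, the one with the higher values wins.
    """
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes strictly positive values")
    xs = sorted(values)
    best = []
    lo = 0
    for hi in range(len(xs)):
        # drop values from the left until the run fits the tolerance
        while xs[hi] > xs[lo] * (1 + rel_tol):
            lo += 1
        window = xs[lo:hi + 1]
        # ">=" lets a later (higher-valued) run of the same length win
        if len(window) >= len(best):
            best = window
    return best


if __name__ == "__main__":
    data = [102.35, 8000.32, 330.111, 295.999, 77.01,
            16.999, 1099.222, 645, 890.01, 312.015]
    print(densest_window(data))
```

With `rel_tol=0.2`, the example set yields `[295.999, 312.015, 330.111]`. A different tolerance can give a different answer, so it has to be picked to suit the data.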

2. ## Re: Exotic stats problem ; mode, fuzzy clusters, etc

Since you're working with sets, SQL of course comes to mind
(it was designed to deal with sets)...

Here's a bit of VB6 code that uses an in-memory SQLite DB
(so you can avoid creating a DB table on the file system when you only need it for statistics queries)...

Code:
```
Option Explicit

'requires a project reference to the RichClient library (New_c, cRecordset, cField)
Const cmdInst = "Insert Into T Values(?)"
'magnitude bucket: 10^Fix(Log10(|V|)); IIf(V,V,1) maps V=0 to 1 so Log10 stays defined
Const grpCond = "Pow(10,Fix(Log10(Abs(IIf(V,V,1)+1e-30))))"

Private Sub Form_Load() 'run from a Form (any Sub will do)
  With New_c.MemDB
    'create a little table for the values
    .Exec "Create Table T(V Double Primary Key)"

    'insert the value-test-set
    .ExecCmd cmdInst, -0.22
    .ExecCmd cmdInst, 102.35
    .ExecCmd cmdInst, 8000.32
    .ExecCmd cmdInst, 330.111
    .ExecCmd cmdInst, 295.999
    .ExecCmd cmdInst, 77.01
    .ExecCmd cmdInst, 16.999
    .ExecCmd cmdInst, 1099.222
    .ExecCmd cmdInst, 645
    .ExecCmd cmdInst, 890.01
    .ExecCmd cmdInst, 312.015

    'group by magnitude bucket; the sub-select is ordered by V so the concatenated lists come out sorted
    DumpResult .GetRs("Select " & grpCond & ", Avg(V), group_concat(V) From (Select * From T Order By V) Group By " & grpCond)
  End With
End Sub

'print each record's fields on one line in the Debug window
Private Sub DumpResult(Rs As cRecordset)
  Dim F As cField
  Do Until Rs.EOF
    For Each F In Rs.Fields
      Debug.Print F.Value,
    Next
    Debug.Print
    Rs.MoveNext
  Loop
End Sub
```
I'm not sure what you're after exactly, but the above does a grouping using the Log10 function
to map values into their appropriate magnitude buckets...

It will print the following in the Debug window, in 3 columns:
- Log10 magnitude (the bucket's power of ten)
- average of the values within the group
- comma-separated (and ordered) list of the values in the group

Code:
```
1        -0.22       -0.22
10       47.0045     16.999,77.01
100      429.2475    102.35,295.999,312.015,330.111,645.0,890.01
1000     4549.771    1099.222,8000.32
```
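For readers following along in Python, the same magnitude grouping can be sketched without a database. This is a hypothetical re-implementation of the SQL expression above, not part of the posted code: `math.trunc` plays the role of VB's `Fix` (truncation toward zero), zero is mapped to 1 as `IIf(V,V,1)` does, and `magnitude_buckets` is an illustrative name.

```python
import math
from collections import defaultdict


def magnitude_buckets(values):
    """Group values by 10 ** trunc(log10(|v|)), mirroring the SQL grouping.

    Zero is mapped to 1 before taking the log (like IIf(V,V,1)); trunc()
    rounds toward zero like VB's Fix(), so -0.22 lands in bucket 1.
    """
    groups = defaultdict(list)
    for v in sorted(values):  # sorted, so each group's list is ordered
        a = abs(v) if v != 0 else 1.0
        groups[10 ** math.trunc(math.log10(a))].append(v)
    return dict(groups)


if __name__ == "__main__":
    data = [-0.22, 102.35, 8000.32, 330.111, 295.999, 77.01,
            16.999, 1099.222, 645, 890.01, 312.015]
    for key, members in sorted(magnitude_buckets(data).items()):
        print(key, sum(members) / len(members), members)
```

It should produce the same groups as the Debug output above, although the floating-point averages may print with more digits.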
HTH

Olaf

3. ## Re: Exotic stats problem ; mode, fuzzy clusters, etc

I don't know stats, fuzzy clustering, etc., so I probably shouldn't bother asking. But since you gave an example, I could come up with an algorithm that returns the answer you gave; I just don't know whether it would be correct for other sets, or return what you expect.

So I'll modify your example set and ask whether the result would be the same, {295.999, 312.015, 330.111}, or something different, perhaps {1000.01, 1057.02, 1099.222, 1200.3}. The first has 3 numbers within a range of 34.112, while the second has 4 numbers within a range of 200.29.
.........................
102.35
1200.3
330.111
295.999
77.01
16.999
1099.222
1000.01
1057.02
312.015
.........................
Four numbers in a range of 200.29 give an average delta of 50.0725 (200.29 / 4), which is much greater than the average delta of 11.37 for the first set (34.112 / 3). So 3 out of 10 fall in a much smaller range, compared to the 4-out-of-10 subset.
But then, if you include the set (77.01, 102.35, 295.999, 312.015, 330.111), you have half your numbers (5 out of 10) falling in a range of 253.101, which is an average delta of 50.62, only slightly higher than that of the set of 4.
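The arithmetic above (span = max - min, divided by the number of values) can be checked with a few lines of Python; `span_stats` is a hypothetical helper name. It follows the post in dividing by the count; dividing by count - 1 would instead give the mean gap between consecutive sorted values.

```python
def span_stats(subset):
    """Return (span, average delta): span = max - min, average = span / count."""
    span = max(subset) - min(subset)
    return span, span / len(subset)


candidates = {
    "3 values": [295.999, 312.015, 330.111],
    "4 values": [1000.01, 1057.02, 1099.222, 1200.3],
    "5 values": [77.01, 102.35, 295.999, 312.015, 330.111],
}

for name, subset in candidates.items():
    span, avg = span_stats(subset)
    print(f"{name}: span={span:.3f}, average delta={avg:.4f}")
```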

I'm curious about what the criterion is for determining the "most common magnitudinal range", and whether that is what you want, since later you phrase it a bit differently: "most common range of highest magnitude".

I'm not sure whether that implies a different result. For example, if it turns out that you have two ranges that both have four members, do you want to return the four with the highest values, or the four with the lowest delta between high and low? The latter would statistically suggest that the set would likely contain more numbers if the ranges were equal.
