Mar 20th, 2006, 08:59 PM
#1
Thread Starter
New Member
normalize weighting
I'm trying to figure out a fair weighting system for words in a file. Here's what I mean: say I have a file with 200 unique words, and the word 'foo' appears in it 5 times. Then I have another file with 150 unique words, where 'foo' also appears 5 times. The weights work out to 5/200 = 0.025 and 5/150 ≈ 0.033. Just because the second file has fewer words doesn't make it a better match for 'foo'; the two should be equal. Is there any way I can normalize the weights? I can't just use the number of occurrences, because there are other metrics involved. I hope this makes sense. Any help would be appreciated.
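To make the arithmetic concrete, here is a minimal Python sketch of the ratio described above, next to one possible way to soften the effect of file size. The second function is an assumption, not something from the post: it borrows the BM25-style term weight, where a tunable parameter `b` sets how much file size matters (`b = 0` ignores size, `b = 1` applies it fully). The names `ratio_weight`, `damped_weight`, `avg_unique_words`, `b` and `k` are all hypothetical.

```python
def ratio_weight(count, unique_words):
    """Weight as described in the post: occurrences / unique words."""
    return count / unique_words


def damped_weight(count, unique_words, avg_unique_words, b=0.5, k=1.2):
    """BM25-style term weight (an assumed alternative, not from the post).

    b: how strongly file size influences the weight (0 = not at all).
    k: how quickly repeated occurrences stop adding weight.
    """
    # Size of this file relative to the average, blended in by b.
    length_norm = 1.0 - b + b * (unique_words / avg_unique_words)
    return count * (k + 1.0) / (count + k * length_norm)


# The two files from the post: (occurrences of 'foo', unique words).
files = {"file_a": (5, 200), "file_b": (5, 150)}
avg_unique = sum(unique for _, unique in files.values()) / len(files)

for name, (count, unique) in files.items():
    print(
        name,
        round(ratio_weight(count, unique), 4),
        round(damped_weight(count, unique, avg_unique, b=0.0), 4),
        round(damped_weight(count, unique, avg_unique, b=0.5), 4),
    )
```

With `b = 0.0` both files get the same weight for 'foo', which is the behaviour asked for; raising `b` brings file size back in gradually, so it can be balanced against the other metrics instead of being all or nothing.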