Thread: Loosely compare string

  1. #1

    Thread Starter
    Lively Member
    Join Date
    Jan 2007
    Posts
    79

    Question Loosely compare string

    Hey,

    The subject sounds strange, but let me explain:

    I have a list of filenames. 99% of them look alike, and the remaining 1% is what I need to capture.
    How can I compare the list and get the ugly duckling?

    This is the list:

    uxm_543_001.jpg
    uxm_543_001a.jpg
    uxm_543_001b.png
    uxm_543_002.jpg
    uxm_543_003.png
    uxm_543_004.jpg
    uxm_543_005.jpg
    uxm_543_006.jpg
    uxm_543_007.jpg
    uxm_543_008.jpg
    uxm_543_009.jpg
    uxm_543_010.jpg
    uxm_543_011.jpg
    uxm_543_012.png
    uxm_543_013.jpg
    uxm_543_014.jpg
    uxm_543_015.jpg
    uxm_543_016.jpg
    uxm_543_017.jpg
    uxm_543_018.jpg
    uxm_543_019.png
    uxm_543_020.jpg
    uxm_543_021.jpg
    uxm_543_022.jpg
    prettypretty.png

    I really appreciate the help, thank you

    Kind Regards, Starf0x

  2. #2
    Lively Member ShadowTzu's Avatar
    Join Date
    Oct 2014
    Location
    France
    Posts
    68

    Re: Loosely compare string

    vb.net Code:
    'Requires: Imports System.Text.RegularExpressions

    'Add every filename to a list of strings
    Dim list_files As New List(Of String)
    'For ...
    '   list_files.Add(filename)
    'Next

    'Keep only the ugly ducklings
    list_files = list_files.FindAll(AddressOf isUgly_duckling)

    'A file is an ugly duckling when its name does not start with the literal prefix "uxm_".
    '"^[uxm_]" would be a character class (any single one of u, x, m or _), so "^uxm_" is used.
    Private Function isUgly_duckling(filename As String) As Boolean
        Return Not Regex.IsMatch(filename, "^uxm_")
    End Function

  3. #3

    Thread Starter
    Lively Member
    Join Date
    Jan 2007
    Posts
    79

    Re: Loosely compare string

    That would be nice, ShadowTzu, if the file names were always the same.
    But I did my own research and found this; I'm not sure how watertight it is:

    Code:
            'Requires: Imports System.Text.RegularExpressions
            For Each strFile In strFileList
                'Note: [00-99] is a character class equivalent to [0-9], so this pattern matches any run of two or more digits
                Dim strMatch As Match = Regex.Match(strFile.ToString, "([00-99])\d+")
                If Not strMatch.Success Then
                    strWrongOnes.Add(strFile.ToString)
                End If
            Next
    Kind Regards, Starf0x
    Last edited by Starf0x; Mar 13th, 2015 at 05:12 AM.

  4. #4
    Powered By Medtronic dbasnett's Avatar
    Join Date
    Dec 2007
    Location
    Jefferson City, MO
    Posts
    9,764

    Re: Loosely compare string

    All of this depends on the specifics, which you did not state. My guess at the rules for the file names:

    1. must start with lowercase uxm_
    2. followed by three digits
    3. followed by _
    4. followed by three digits
    5. optionally followed by one lowercase letter
    6. followed by a period
    7. followed by either jpg or png

    If those are the rules, then here is a match string that meets them:

    Code:
    ^uxm_\d{3}_\d{3}[a-z]?\.(jpg|png)$
    This will match all of the names in the list you provided except for the last.
    It will not match UXM_123_456.jpg or uxm_12_345.jpg.
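
    A minimal sketch of how such a match string could be applied to a list, assuming the seven rules above hold. It uses an anchored form of the pattern (^ and $, with [a-z]? for the optional letter). The module name, the FindMismatches helper and the shortened sample list are made up for illustration.

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.Text.RegularExpressions

    Public Module PatternDemo
        'Assumed rules: uxm_, three digits, _, three digits, optional lowercase letter, .jpg or .png
        Private Const NamePattern As String = "^uxm_\d{3}_\d{3}[a-z]?\.(jpg|png)$"

        'Hypothetical helper: returns every name that does NOT follow the pattern
        Public Function FindMismatches(fileNames As IEnumerable(Of String)) As List(Of String)
            Dim mismatches As New List(Of String)
            For Each fileName As String In fileNames
                If Not Regex.IsMatch(fileName, NamePattern) Then
                    mismatches.Add(fileName)
                End If
            Next
            Return mismatches
        End Function

        Public Sub Main()
            'Shortened sample taken from the first post
            Dim fileNames() As String = {"uxm_543_001.jpg", "uxm_543_001a.jpg", "uxm_543_001b.png", "uxm_543_022.jpg", "prettypretty.png"}
            For Each fileName As String In FindMismatches(fileNames)
                Console.WriteLine(fileName) 'prints prettypretty.png
            Next
        End Sub
    End Module
    ```

    This only helps when the naming rules are known up front; a list with a different naming scheme needs a different pattern.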

    "Those who use Application.DoEvents have no idea what it does and those who know what it does never use it." John Wein

  5. #5

    Thread Starter
    Lively Member
    Join Date
    Jan 2007
    Posts
    79

    Re: Loosely compare string

    Hi dbasnett,

    As I said, the names are not always the same, e.g.:

    Legacy v2 015-000.jpg
    Legacy v2 015-013.jpg
    Legacy v2 015-014.png
    Legacy v2 015-015.jpg
    Legacy v2 015-016.jpg
    Legacy v2 015-017.png
    Legacy v2 015-018.jpg
    Legacy v2 015-019.jpg
    Legacy v2 015-020.jpg
    Legacy v2 015-021.jpg
    dropletter12.jpg

    It could be anything. I need to find the one or two files that are not the same as the rest.

    Kind Regards, Starf0x

  6. #6
    PowerPoster SJWhiteley's Avatar
    Join Date
    Feb 2009
    Location
    South of the Mason-Dixon Line
    Posts
    2,256

    Re: Loosely compare string

    Well, you have 10 files not the same as the rest. Whichever file you choose, there are 10 files not the same as the rest. As DB noted, you need some rules.

    I suspect what you are encountering, here, is that the human mind can make assumptions - a program cannot: presumably, the last file is 'not the same as the rest', but how is a program to know? How do you know the last file is not the same as the rest?
    "Ok, my response to that is pending a Google search" - Bucky Katt.
    "There are two types of people in the world: Those who can extrapolate from incomplete data sets." - Unk.
    "Before you can 'think outside the box' you need to understand where the box is."

  7. #7
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    11,753

    Re: Loosely compare string

    Here is my try at it:
    Code:
    Imports System
    Public Module Module1
    	Public Sub Main()
    		'Get the contents of the file
    		Dim contents() As String = { _
    			"Legacy v2 015-000.jpg", "Legacy v2 015-013.jpg", "Legacy v2 015-014.png", "Legacy v2 015-015.jpg", "Legacy v2 015-016.jpg", "Legacy v2 015-017.png", "Legacy v2 015-018.jpg", "Legacy v2 015-019.jpg", "Legacy v2 015-020.jpg", "Legacy v2 015-021.jpg", "dropletter12.jpg"}
    
    		'Sort the file
    		Array.Sort(contents)
    
    		'The most uncommon string will be either at the beginning or end
    		If contents(0)(0) = contents(1)(0) Then
    			Console.WriteLine(contents(contents.Length - 1))
    		Else
    			Console.WriteLine(contents(0))
    		End If
    
    		Console.ReadLine()
    	End Sub
    End Module
    Edit -
    That example assumes that there are at least 2 items in the array and that the first letter of the ugly duckling will be different from the rest.
    Last edited by dday9; Mar 13th, 2015 at 09:17 AM.
    "Code is like humor. When you have to explain it, it is bad." - Cory House
    VbLessons | Code Tags | Sword of Fury - Jameram

  8. #8

    Thread Starter
    Lively Member
    Join Date
    Jan 2007
    Posts
    79

    Re: Loosely compare string

    Hi dday9,

    That's not a feasible answer, is it?
    As I said, I do not know how the files are named. I only know there are one or two that are not part of the collection, and those are the ones I need.

    Kind Regards, Starf0x

  9. #9
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    11,753

    Re: Loosely compare string

    I made an edit to my previous post, but it assumes that there are at least 2 items in the array and that the first letter of the ugly duckling will be different from the rest.

  10. #10
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    11,753

    Re: Loosely compare string

    Here is a more concise version as a function that takes an array:
    Code:
    	Private Function FindUglyDuckling(ByVal contents() As String) As Integer
    		'Note: this sorts the caller's array in place, so the returned index refers to the sorted array
    		Array.Sort(contents)
    		
    		'If the first two names start alike, the odd one out (if any) sorted to the end
    		If contents(0)(0) = contents(1)(0) AndAlso contents(contents.Length - 1)(0) <> contents(1)(0) Then
    			Return contents.Length - 1
    		'If the last name starts like the second, the odd one out (if any) sorted to the front
    		ElseIf contents(contents.Length - 1)(0) = contents(1)(0) AndAlso contents(0)(0) <> contents(1)(0) Then
    			Return 0
    		Else
    			Return -1
    		End If
    	End Function
    It will return -1 when there are no more ugly ducklings.
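
    The "no more ugly ducklings" wording suggests calling the function repeatedly until it returns -1. A minimal sketch of such a loop, under the same first-character assumption. CollectUglyDucklings is a hypothetical helper, and the copy of FindUglyDuckling used here adds a length guard and an ordinal sort, neither of which is in the post above.

    ```vbnet
    Imports System
    Imports System.Collections.Generic

    Public Module UglyDucklingLoop
        'Same first-character heuristic as above; the guard and the ordinal comparer are assumptions
        Private Function FindUglyDuckling(ByVal contents() As String) As Integer
            'With fewer than 3 names there is no majority to compare against
            If contents.Length < 3 Then Return -1
            Array.Sort(contents, StringComparer.Ordinal)
            Dim lastIndex As Integer = contents.Length - 1
            If contents(0)(0) = contents(1)(0) AndAlso contents(lastIndex)(0) <> contents(1)(0) Then
                Return lastIndex
            ElseIf contents(lastIndex)(0) = contents(1)(0) AndAlso contents(0)(0) <> contents(1)(0) Then
                Return 0
            Else
                Return -1
            End If
        End Function

        'Hypothetical helper: pulls ugly ducklings out one at a time until none are left
        Public Function CollectUglyDucklings(fileNames As IEnumerable(Of String)) As List(Of String)
            Dim remaining As New List(Of String)(fileNames)
            Dim found As New List(Of String)
            Do
                'Work on a copy, because FindUglyDuckling sorts its argument in place
                Dim sorted() As String = remaining.ToArray()
                Dim index As Integer = FindUglyDuckling(sorted)
                If index = -1 Then Exit Do
                found.Add(sorted(index))
                remaining.Remove(sorted(index))
            Loop
            Return found
        End Function

        Public Sub Main()
            Dim fileNames() As String = {"Legacy v2 015-000.jpg", "Legacy v2 015-013.jpg", "Legacy v2 015-014.png", "dropletter12.jpg"}
            For Each fileName As String In CollectUglyDucklings(fileNames)
                Console.WriteLine(fileName) 'prints dropletter12.jpg
            Next
        End Sub
    End Module
    ```

    The loop inherits the heuristic's limits: it finds nothing when the odd names sort to both ends of the list, or when an odd name starts with the same character as the majority.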

  11. #11
    Frenzied Member IanRyder's Avatar
    Join Date
    Jan 2013
    Location
    Healing, UK
    Posts
    1,232

    Re: Loosely compare string

    Hi,

    Here is my contribution, for what it's worth. You are going to have to do some sort of file name “Pattern Recognition” and work out a way to discern what is an “Ugly Duckling” and what is a “Pretty Pattern”.

    As a starting point, please see below something I came up with. In this example I have assumed that if the largest group of file names sharing a common prefix accounts for 90% or more of the total number of files in a collection, then the “Ugly Ducklings” must be in the remaining 10%.

    There is quite a lot going on here, so please do take the time to understand it and potentially refine it into a workable solution for yourself.
    vb.net Code:
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
      'Let's set a threshold to work with. Let's say 90%
      Const PercentageThreshold As Decimal = 90D
      'Get the file names whichever way you want
      Dim fileNames As String() = IO.File.ReadAllLines("c:\temp\filenames.txt")

      'Get the shortest file name length (without the extension)
      Dim shortestFileNameLength As Integer = fileNames.Min(Function(x) IO.Path.GetFileNameWithoutExtension(x).Length)
      'A couple of variables to hold information
      Dim fileGroups As IEnumerable(Of IGrouping(Of String, String)) = Nothing
      Dim prettyFileGroupFound As Boolean

      'Decrease the prefix length, one character at a time, using a For loop to look for patterns in the file names
      For fileNameLengthToUse As Integer = shortestFileNameLength To 1 Step -1
        Dim currentfileNameLengthToUse As Integer = fileNameLengthToUse
        'Group the file names by their prefix and look for a group that reaches the percentage threshold
        fileGroups = fileNames.GroupBy(Function(x) New String(x.Take(currentfileNameLengthToUse).ToArray))
        Dim maxGroupPercentage As Decimal = (fileGroups.Max(Function(x) x.Count) / fileNames.Count) * 100

        If maxGroupPercentage >= PercentageThreshold Then
          'If we get here then a common file pattern has been recognised
          prettyFileGroupFound = True
          Exit For
        End If
      Next

      'If a common file pattern HAS been recognised and there is ALSO more than 1 file pattern group then we have some "Ugly Ducklings"
      If prettyFileGroupFound AndAlso fileGroups.Count > 1 Then
        'Let's loop through the "Ugly Ducklings" and display them
        For Each currentFileGroup As IGrouping(Of String, String) In fileGroups.Where(Function(x) (x.Count / fileNames.Count) * 100 < PercentageThreshold)
          For Each specificFileName In currentFileGroup
            MsgBox(specificFileName)
          Next
        Next
      End If
    End Sub

    Hope that helps.

    Cheers,

    Ian

  12. #12

    Thread Starter
    Lively Member
    Join Date
    Jan 2007
    Posts
    79

    Re: Loosely compare string

    Thanks Ian,

    This is a completely different approach; I hadn't thought of looking at it this way.

    What Imports are you using? Am I correct to assume:
    vb.net Code:
    Imports System.Linq

    Kind Regards, Starf0x

  13. #13

    Thread Starter
    Lively Member
    Join Date
    Jan 2007
    Posts
    79

    Re: Loosely compare string

    Hey IanRyder,

    Your code is hard for me to comprehend (I never studied programming).
    I'm trying, believe me.

    But it doesn't always work; I have issues with this list:

    001.jpg
    002.jpg
    003.jpg
    005.jpg
    006.jpg
    007.jpg
    008.jpg
    Zone50ft.jpg

    It just doesn't recognize the "Ugly Duckling", and won't go past:

    vb.net Code:
    If prettyFileGroupFound = True AndAlso fileGroups.Count > 1 Then

    Kind Regards, Starf0x

  14. #14
    Frenzied Member IanRyder's Avatar
    Join Date
    Jan 2013
    Location
    Healing, UK
    Posts
    1,232

    Re: Loosely compare string

    Hi,

    I assume you got the LINQ question sorted, so I will ignore that for the moment. With regard to your last post, don’t forget that the example I provided works on the principle that at least 90% of the files belong to the “prettyFileGroup”. In the last case that you presented there are only 8 files, so the “prettyFileGroup” (7 of 8) accounts for only 87.5% of the files. That is below the threshold, which means the routine does not work.

    This is where you need to expand on my logic to come up with something that works for you.

    One suggestion could be to have a tiered PercentageThreshold: if the total number of files is less than 10, the PercentageThreshold could be 80%; if the total number of files is between 10 and 50, it could be 85%; and so on.
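
    A minimal sketch of the tiered idea, not a definitive implementation. Only the 80% and 85% values come from the suggestion above. The exact tier boundaries (10 files counted in the second tier), the 90% fallback for larger lists, and the names GetPercentageThreshold and GroupPercentage are assumptions for illustration.

    ```vbnet
    Imports System

    Public Module TieredThreshold
        'Hypothetical tiers: 80% below 10 files, 85% up to 50 files, otherwise the original 90%
        Public Function GetPercentageThreshold(totalFiles As Integer) As Decimal
            If totalFiles < 10 Then
                Return 80D
            ElseIf totalFiles <= 50 Then
                Return 85D
            Else
                Return 90D
            End If
        End Function

        'Share of the files that fall in one group, as a percentage
        Public Function GroupPercentage(groupSize As Integer, totalFiles As Integer) As Decimal
            Return CDec(groupSize) / CDec(totalFiles) * 100D
        End Function

        Public Sub Main()
            'The eight-file list from post #13: seven numbered files plus Zone50ft.jpg
            Dim total As Integer = 8
            Dim largestGroup As Integer = 7
            Dim percentage As Decimal = GroupPercentage(largestGroup, total) '7 of 8 is 87.5

            Console.WriteLine(percentage >= 90D)                          'False: the fixed threshold misses it
            Console.WriteLine(percentage >= GetPercentageThreshold(total)) 'True: the 80% tier accepts it
        End Sub
    End Module
    ```

    In the routine from post #11, the constant PercentageThreshold would be replaced by a call such as GetPercentageThreshold(fileNames.Count).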

    Over to you to come up with something that works.

    Hope that helps.

    Cheers,

    Ian
    Last edited by IanRyder; Mar 17th, 2015 at 09:33 AM. Reason: I Can't Count?

  15. #15

    Thread Starter
    Lively Member
    Join Date
    Jan 2007
    Posts
    79

    Re: Loosely compare string

    Ian,

    Thank you for all your help. I will do my best to do so.

    Cheers,

    Starf0x
