
Thread: [2008] Parsing text file, need some help

  1. #1

    Thread Starter
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    [2008] Parsing text file, need some help

    Hi,

    I am trying to parse a very large text file for certain strings. The text file is part of a level-making software for an old game I play.

    The text file basically contains all the information the level designer software needs, but the only important bit is the 'texture information'.

    Basically what I'm trying to create is a little program that parses the text files and shows the user a list of every texture in that text file.

    The problem is, the strings denoting textures are not really easy to find, and I can't think of any sensible and fast way to get them...

    Here is part of an actual file:
    Code:
    // brush 5064
    {
    ( 2716 -3384 896 ) ( 2720 -3376 896 ) ( 2716 -3384 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 0 0 0
    ( 2980 -3504 1024 ) ( 2884 -3408 1024 ) ( 2892 -3392 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
    ( 2928 -3512 896 ) ( 2944 -3512 896 ) ( 2928 -3024 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
    ( 2752 -3302 976 ) ( 2768 -3310 976 ) ( 2752 -3302 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
    ( 2756 -3404 896 ) ( 2752 -3412 896 ) ( 2756 -3404 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 0 0 0
    ( 2780 -3356 896 ) ( 2772 -3352 896 ) ( 2780 -3356 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
    }
    // brush 5065
    {
    ( 2928 -3512 896 ) ( 2944 -3512 896 ) ( 2928 -3024 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 4 0
    ( 2980 -3504 1024 ) ( 2884 -3408 1024 ) ( 2892 -3392 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 4 0
    ( 3000 -3476 1008 ) ( 2744 -3348 1008 ) ( 3000 -3476 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 0 0
    ( 2744 -3348 896 ) ( 2740 -3336 896 ) ( 2744 -3348 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 0 0
    ( 2780 -3356 896 ) ( 2768 -3360 896 ) ( 2780 -3356 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 0 0
    ( 2740 -3336 896 ) ( 2748 -3340 896 ) ( 2740 -3336 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 4 0
    }
    // brush 5066
    {
    ( 2748 -3320 880 ) ( 2752 -3312 880 ) ( 2748 -3320 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
    ( 2828 -3296 832 ) ( 2820 -3296 832 ) ( 2828 -3368 832 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
    ( 2794 -3296 896 ) ( 2798 -3296 896 ) ( 2794 -3368 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
    ( 2740 -3336 832 ) ( 2748 -3340 832 ) ( 2740 -3336 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
    ( 2780 -3356 832 ) ( 2776 -3364 832 ) ( 2780 -3356 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
    ( 2716 -3504 832 ) ( 2708 -3500 832 ) ( 2716 -3504 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
    }
    There are only two different textures in this example: "common/caulk" and "battery_wall/wall02_bot". (Note that some 'brushes' may have up to six different textures.)

    Obviously, my program should not list duplicate textures, so for this example it only needs to list the two textures. That is not a problem, though: it is easy enough to check whether the list already contains a texture before adding it.
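
    A minimal sketch of that duplicate check, assuming the texture names have already been pulled out of the file somehow. The module name and the sample array are made up for illustration, and HashSet(Of String) assumes the project targets .NET 3.5 (the VS 2008 default):

    ```vbnet
    Imports System
    Imports System.Collections.Generic

    Module DedupeSketch
        Sub Main()
            ' Hypothetical sample: texture names in the order they were read.
            Dim found() As String = {"battery_wall/wall02_bot", "common/caulk", _
                                     "common/caulk", "battery_wall/wall02_bot"}

            ' HashSet.Add returns False when the item is already present,
            ' so no separate Contains check is needed.
            Dim seen As New HashSet(Of String)()
            Dim unique As New List(Of String)()
            For Each name As String In found
                If seen.Add(name) Then unique.Add(name)
            Next

            ' The list keeps first-seen order and holds each texture once.
            For Each name As String In unique
                Console.WriteLine(name)
            Next
        End Sub
    End Module
    ```

    A plain List(Of String) with a Contains check would also work; the set just avoids rescanning the list for every line of a large file.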

    The problem is in the 'format'. It seems to be like this all the time:
    Code:
    // brush <number>
    {
    ( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
    ( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
    ( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
    ( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
    ( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
    ( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
    }
    // brush <number>
    {
    ( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
    ( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
    ( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
    ( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
    ( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
    ( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
    }
    As you can see, the [TEXTURE STRING] is always right behind the three sets of coordinates (...) (...) (...).

    The only way I can think of to parse this is by using something like:
    Code:
    1. Search for an opening parenthesis "("
    2. Search for the next closing parenthesis ")"
    3. Search for an opening parenthesis "("
    4. Search for the next closing parenthesis ")"
    5. Search for an opening parenthesis "("
    6. Search for the next closing parenthesis ")"
    7. The texture string now starts 1 character (a space) after the last closing parenthesis, and ends as soon as another space is reached.
    However, I am pretty certain this will be very slow, especially with files of up to 150,000 lines...
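
    For reference, the seven steps above can be sketched with String.IndexOf roughly as follows. The function and module names are hypothetical, and it assumes exactly one space between the third ")" and the texture name, as in the sample. Each line is scanned only once from left to right, so it may well be fast enough, though that has not been measured here:

    ```vbnet
    Imports System

    Module ManualScanSketch
        ' Returns the texture name on a brush face line, or Nothing when the
        ' line does not contain three "( ... )" groups followed by a name.
        Function ExtractTexture(ByVal line As String) As String
            Dim pos As Integer = 0
            ' Steps 1-6: skip three "(" ... ")" pairs.
            For i As Integer = 1 To 3
                Dim openPos As Integer = line.IndexOf("("c, pos)
                If openPos < 0 Then Return Nothing
                Dim closePos As Integer = line.IndexOf(")"c, openPos + 1)
                If closePos < 0 Then Return Nothing
                pos = closePos + 1
            Next

            ' Step 7: pos is the space after the third ")"; the name starts
            ' one character later and runs to the next space (or end of line).
            Dim startPos As Integer = pos + 1
            If startPos >= line.Length Then Return Nothing
            Dim endPos As Integer = line.IndexOf(" "c, startPos)
            If endPos < 0 Then endPos = line.Length
            Return line.Substring(startPos, endPos - startPos)
        End Function

        Sub Main()
            Dim sample() As String = { _
                "// brush 5064", _
                "{", _
                "( 2716 -3384 896 ) ( 2720 -3376 896 ) ( 2716 -3384 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 0 0 0", _
                "}"}

            For Each line As String In sample
                Dim texture As String = ExtractTexture(line)
                ' Comment and brace lines yield Nothing and are skipped.
                If texture IsNot Nothing Then Console.WriteLine(texture)
            Next
        End Sub
    End Module
    ```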

    Can this be done using regular expressions? I have never used them before so I haven't got a clue...


    If anyone has any idea how to make this faster, thanks!!

  2. #2
    PowerPoster gep13's Avatar
    Join Date
    Nov 2004
    Location
    The Granite City
    Posts
    21,963

    Re: [2008] Parsing text file, need some help

    Hey,

    This can definitely be done using regular expressions. I am not at my own machine just now, so I can't knock up an example, but I am sure you can construct a regular expression that will match on the parentheses that you have identified and then grab the texture beside them.

    There is a nice little application called Expresso that can help you graphically construct the regular expression; then simply put that in a Regex.Match method and you should be good.

    Hope that helps!

    Gary

  3. #3
    Fanatic Member vbasicgirl's Avatar
    Join Date
    Jan 2004
    Location
    Manchester, UK
    Posts
    1,016

    Re: [2008] Parsing text file, need some help

    Or use the LastIndexOf method.
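
    A rough sketch of what the LastIndexOf idea could look like, working on one line at a time. The function and module names are made up, and it assumes that the last ")" on a face line is the one closing the third coordinate group, i.e. that texture names and the trailing numbers never contain a ")":

    ```vbnet
    Imports System

    Module LastIndexOfSketch
        ' Returns the first space-delimited token after the last ")" on the
        ' line, or Nothing when the line has no ")" or nothing follows it.
        Function TextureAfterLastParen(ByVal line As String) As String
            Dim closePos As Integer = line.LastIndexOf(")"c)
            If closePos < 0 Then Return Nothing

            ' Everything after the last ")", minus the leading space(s).
            Dim rest As String = line.Substring(closePos + 1).TrimStart()
            If rest.Length = 0 Then Return Nothing

            ' The texture name ends at the next space, if there is one.
            Dim spacePos As Integer = rest.IndexOf(" "c)
            If spacePos < 0 Then Return rest
            Return rest.Substring(0, spacePos)
        End Function

        Sub Main()
            Dim line As String = "( 2980 -3504 1024 ) ( 2884 -3408 1024 ) ( 2892 -3392 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0"
            Console.WriteLine(TextureAfterLastParen(line))
        End Sub
    End Module
    ```

    Lines such as "// brush 5064", "{" and "}" contain no ")" and so return Nothing.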

    Casey.

  4. #4

    Thread Starter
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    Re: [2008] Parsing text file, need some help

    I have been trying to get the regular expressions to work, but I just haven't got a clue how to use them.

    I think this particular case is just too difficult for someone with absolutely zero experience with regex.

    I would be really grateful if someone could cook up the required regex pattern for me... Or is that really a lot more work than I think it is? It seems simple enough for someone with a little experience, no?

  5. #5
    PowerPoster gep13's Avatar
    Join Date
    Nov 2004
    Location
    The Granite City
    Posts
    21,963

    Re: [2008] Parsing text file, need some help

    Hey,

    If I get a chance, I will knock up an example and post back here, unless someone beats me to it. I am currently re-installing an OS on a relative's PC, so no Visual Studio; just filling time while SP3 downloads.

    Gary

  6. #6

    Thread Starter
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    Re: [2008] Parsing text file, need some help

    Heh, I spent all day re-installing my PC as well! Vista decided after two years that my (borrowed) serial key was invalid, so I reinstalled XP.

    Anyway, it would be great if you could knock something up because I don't think anything except regex will do the job efficiently here...

    If only the coordinates in the brackets were the same each time, this would be much easier, but because they are different every time I have found it quite impossible to do with my limited parsing skills.

  7. #7

    Thread Starter
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    Re: [2008] Parsing text file, need some help

    EDIT
    Never mind, I thought I had found a better solution, but it turns out that needs another few problems solved first... So I'm going to stick with this...

    So I still need help!

  8. #8
    PowerPoster gep13's Avatar
    Join Date
    Nov 2004
    Location
    The Granite City
    Posts
    21,963

    Re: [2008] Parsing text file, need some help

    Hey,

    Okay, this is a very quick example this morning, and I am only just starting to wake up so likely to be things wrong with it, but have a look, and let me know what you think.

    By the way, TextFile1 is just a text file that I copied your example input from above into.

    Code:
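        Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
            ' Read the whole map file into memory in one go.
            Dim inputFile As String = System.IO.File.ReadAllText("TextFile1.txt")
            Dim m As System.Text.RegularExpressions.Match
    
            ' Three "( x -y z )" groups followed by a "folder/name " texture string,
            ' which is captured (including its trailing space) as group 1.
            ' Crude: assumes 3-4 digit coordinates with a negative second value, as in the sample.
            Dim myRegex As New System.Text.RegularExpressions.Regex("\( \d{3,4} -\d{3,4} \d{3,4} \) \( \d{3,4} -\d{3,4} \d{3,4} \) \( \d{3,4} -\d{3,4} \d{3,4} \) (?<1>\w*/\w* )", System.Text.RegularExpressions.RegexOptions.IgnoreCase Or System.Text.RegularExpressions.RegexOptions.Compiled)
    
            ' Walk through every match and show the captured texture and its position in the file.
            m = myRegex.Match(inputFile)
            While m.Success
                MessageBox.Show("Found match " & m.Groups(1).Value & " at " & m.Groups(1).Index.ToString())
                m = m.NextMatch()
            End While
        End Sub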
    It is doing what I expect it to at least, i.e. displays the texture information.

    It is quite a crude regex, and could easily be refined, but hopefully it will get you on your way.
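
    One possible refinement, offered only as a sketch: instead of spelling out the digits, accept anything between each pair of parentheses and capture the next run of non-space characters. The pattern, module name and in-memory sample below are illustrative assumptions, not something taken from the actual map format specification:

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.Text.RegularExpressions

    Module TolerantRegexSketch
        Sub Main()
            ' Stand-in for File.ReadAllText: a few lines from the sample above.
            Dim lines() As String = { _
                "// brush 5064", _
                "{", _
                "( 2716 -3384 896 ) ( 2720 -3376 896 ) ( 2716 -3384 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 0 0 0", _
                "( 2980 -3504 1024 ) ( 2884 -3408 1024 ) ( 2892 -3392 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0", _
                "( 2928 -3512 896 ) ( 2944 -3512 896 ) ( 2928 -3024 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0", _
                "}"}
            Dim text As String = String.Join(Environment.NewLine, lines)

            ' At the start of a line: three "( anything )" groups, each followed
            ' by spaces or tabs, then the texture name captured as group 1.
            Dim faceLine As New Regex("^[ \t]*(?:\([^)\r\n]*\)[ \t]+){3}(\S+)", _
                                      RegexOptions.Multiline Or RegexOptions.Compiled)

            ' Collect each texture once, in first-seen order.
            Dim textures As New List(Of String)()
            For Each m As Match In faceLine.Matches(text)
                Dim name As String = m.Groups(1).Value
                If Not textures.Contains(name) Then textures.Add(name)
            Next

            For Each name As String In textures
                Console.WriteLine(name)
            Next
        End Sub
    End Module
    ```

    Because the coordinates are matched as "anything except a closing parenthesis", this version does not care about the sign or the number of digits, and the captured name has no trailing space.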

    Gary

  9. #9

    Thread Starter
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    Re: [2008] Parsing text file, need some help

    Thanks, it does seem to work!

    I also found a different way of getting the textures (using a different file) with the help of Negative0 on this forum, but this is going to come in handy.

  10. #10
    Frenzied Member MaximilianMayrhofer's Avatar
    Join Date
    Aug 2007
    Location
    IM IN YR LOOP
    Posts
    2,001

    Re: [2008] Parsing text file, need some help

    Code:
    "(?<=(\([^\)]+\) ){3}).+?(?=\d\.)"
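
    For anyone wanting to try that pattern, a small sketch of how it could be used (the module name is made up). The lookbehind requires three "( ... ) " groups directly before the match, and the lazy .+? stops just before the first digit that is followed by a dot, so on the sample data the match value still carries the trailing space and is trimmed here:

    ```vbnet
    Imports System
    Imports System.Text.RegularExpressions

    Module LookbehindSketch
        Sub Main()
            Dim line As String = "( 2980 -3504 1024 ) ( 2884 -3408 1024 ) ( 2892 -3392 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0"

            ' The pattern exactly as posted above.
            Dim pattern As String = "(?<=(\([^\)]+\) ){3}).+?(?=\d\.)"

            Dim m As Match = Regex.Match(line, pattern)
            If m.Success Then
                ' Trim removes the space between the name and the first number.
                Console.WriteLine("[" & m.Value.Trim() & "]")
            End If
        End Sub
    End Module
    ```

    One thing to watch for, as far as I can tell: if the first number after the texture had more than one digit before the decimal point or were negative (for example -16.000000), the match would run on until it reached a digit followed by a dot and would pick up part of that number. The sample lines all start with 0.000000, so it does not show up there.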

  11. #11
    PowerPoster gep13's Avatar
    Join Date
    Nov 2004
    Location
    The Granite City
    Posts
    21,963

    Re: [2008] Parsing text file, need some help

    Good stuff, I was obviously more awake than I thought

  12. #12
    PowerPoster gep13's Avatar
    Join Date
    Nov 2004
    Location
    The Granite City
    Posts
    21,963

    Re: [2008] Parsing text file, need some help

    Quote Originally Posted by MaximilianMayrhofer
    Code:
    "(?<=(\([^\)]+\) ){3}).+?(?=\d\.)"
    Now you are just showing off

  13. #13
    Frenzied Member MaximilianMayrhofer's Avatar
    Join Date
    Aug 2007
    Location
    IM IN YR LOOP
    Posts
    2,001

    Re: [2008] Parsing text file, need some help

