-
Feb 22nd, 2009, 10:18 AM
#1
[2008] Parsing text file, need some help
Hi,
I am trying to parse a very large text file for certain strings. The text file is part of a level-making software for an old game I play.
The text file basically contains all the information the level designer software needs, but the only important bit is the 'texture information'.
Basically what I'm trying to create is a little program that parses the text files and shows the user a list of every texture in that text file.
The problem is, the strings denoting textures are not really easy to find, and I can't think of any sensible and fast way to get them...
Here is part of an actual file:
Code:
// brush 5064
{
( 2716 -3384 896 ) ( 2720 -3376 896 ) ( 2716 -3384 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 0 0 0
( 2980 -3504 1024 ) ( 2884 -3408 1024 ) ( 2892 -3392 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2928 -3512 896 ) ( 2944 -3512 896 ) ( 2928 -3024 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2752 -3302 976 ) ( 2768 -3310 976 ) ( 2752 -3302 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2756 -3404 896 ) ( 2752 -3412 896 ) ( 2756 -3404 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 0 0 0
( 2780 -3356 896 ) ( 2772 -3352 896 ) ( 2780 -3356 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
}
// brush 5065
{
( 2928 -3512 896 ) ( 2944 -3512 896 ) ( 2928 -3024 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 4 0
( 2980 -3504 1024 ) ( 2884 -3408 1024 ) ( 2892 -3392 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 4 0
( 3000 -3476 1008 ) ( 2744 -3348 1008 ) ( 3000 -3476 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 0 0
( 2744 -3348 896 ) ( 2740 -3336 896 ) ( 2744 -3348 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 0 0
( 2780 -3356 896 ) ( 2768 -3360 896 ) ( 2780 -3356 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 0 0
( 2740 -3336 896 ) ( 2748 -3340 896 ) ( 2740 -3336 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 4 0
}
// brush 5066
{
( 2748 -3320 880 ) ( 2752 -3312 880 ) ( 2748 -3320 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2828 -3296 832 ) ( 2820 -3296 832 ) ( 2828 -3368 832 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2794 -3296 896 ) ( 2798 -3296 896 ) ( 2794 -3368 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2740 -3336 832 ) ( 2748 -3340 832 ) ( 2740 -3336 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2780 -3356 832 ) ( 2776 -3364 832 ) ( 2780 -3356 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2716 -3504 832 ) ( 2708 -3500 832 ) ( 2716 -3504 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
}
There are only two different textures in this example: "common/caulk" and "battery_wall/wall02_bot". (Note that a single 'brush' may have up to six different textures.)
Obviously, my program should not list duplicate textures, so for this example it only needs to list those two. That is not a problem, though; it is easy enough to check whether the list already contains a texture before adding it.
The problem is in the 'format'. It seems to be like this all the time:
Code:
// brush <number>
{
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
}
// brush <number>
{
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
}
As you can see, the [TEXTURE STRING] always comes right after the three sets of coordinates ( ... ) ( ... ) ( ... ).
The only way I can think of to parse this is by using something like:
Code:
1. Search for an opening parenthesis "("
2. Search for the next closing parenthesis ")"
3. Search for an opening parenthesis "("
4. Search for the next closing parenthesis ")"
5. Search for an opening parenthesis "("
6. Search for the next closing parenthesis ")"
7. The texture string starts 1 character (a space) after the last closing parenthesis and ends as soon as another space is reached.
However, I am pretty certain this will be very slow, especially with files of up to 150,000 lines...
Can this be done using regular expressions? I have never used them before so I haven't got a clue...
If anyone has any idea how to make this faster, thanks!!
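For reference, the seven steps above can be sketched roughly like this in VB. It is only a sketch under assumptions: ExtractTexture and TextureScanSketch are made-up names, and it assumes every face line holds exactly three "( ... )" groups before the texture name. Each line is read once from left to right, so the work grows linearly with the size of the file.

```vbnet
Imports System
Imports System.Collections.Generic

Module TextureScanSketch

    ' Hypothetical helper: returns the texture name on one brush line, or
    ' Nothing if the line does not hold three "( ... )" groups and a name.
    Function ExtractTexture(ByVal line As String) As String
        Dim pos As Integer = 0

        ' Steps 1-6: skip three "( ... )" groups.
        For i As Integer = 1 To 3
            Dim openAt As Integer = line.IndexOf("("c, pos)
            If openAt < 0 Then Return Nothing
            Dim closeAt As Integer = line.IndexOf(")"c, openAt + 1)
            If closeAt < 0 Then Return Nothing
            pos = closeAt + 1
        Next

        ' Step 7: the name starts after the space and runs to the next space.
        Dim rest As String = line.Substring(pos).TrimStart()
        If rest.Length = 0 Then Return Nothing
        Dim spaceAt As Integer = rest.IndexOf(" "c)
        If spaceAt < 0 Then Return rest
        Return rest.Substring(0, spaceAt)
    End Function

    Sub Main()
        ' A few lines taken from the sample file above.
        Dim sample As String() = New String() { _
            "// brush 5064", _
            "{", _
            "( 2716 -3384 896 ) ( 2720 -3376 896 ) ( 2716 -3384 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 0 0 0", _
            "( 2980 -3504 1024 ) ( 2884 -3408 1024 ) ( 2892 -3392 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0", _
            "}"}

        ' Collect each texture once, in order of first appearance.
        Dim textures As New List(Of String)
        For Each line As String In sample
            Dim found As String = ExtractTexture(line)
            If found IsNot Nothing AndAlso Not textures.Contains(found) Then
                textures.Add(found)
            End If
        Next

        For Each texture As String In textures
            Console.WriteLine(texture)
        Next
    End Sub

End Module
```

For a real file, the sample array could be swapped for System.IO.File.ReadAllLines on the map file.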
-
Feb 22nd, 2009, 10:48 AM
#2
Re: [2008] Parsing text file, need some help
Hey,
This can definitely be done using regular expressions. I am not at my own machine just now, so I can't knock up an example, but I am sure you can construct a regular expression that will match on the parentheses that you have identified and then grab the texture beside them.
There is a nice little application called Expresso that can help you graphically construct the regular expression; then simply put that in a Regex.Match call and you should be good.
Hope that helps!
Gary
-
Feb 22nd, 2009, 10:49 AM
#3
Re: [2008] Parsing text file, need some help
Or use the LastIndexOf method.
Casey.
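A rough sketch of how the LastIndexOf idea might look, working one line at a time. It assumes that nothing after the third coordinate group contains a ")", so the last ")" on the line marks where the texture name begins; TextureAfterLastParen and LastIndexOfSketch are made-up names for illustration.

```vbnet
Imports System

Module LastIndexOfSketch

    ' Hypothetical helper. Assumes nothing after the third coordinate
    ' group contains ")", so the last ")" sits right before the texture.
    Function TextureAfterLastParen(ByVal line As String) As String
        Dim closeAt As Integer = line.LastIndexOf(")"c)
        If closeAt < 0 Then Return Nothing

        ' Split what follows on spaces; the first token is the texture name.
        Dim tail As String = line.Substring(closeAt + 1).Trim()
        If tail.Length = 0 Then Return Nothing
        Return tail.Split(" "c)(0)
    End Function

    Sub Main()
        Dim line As String = "( 2928 -3512 896 ) ( 2944 -3512 896 ) ( 2928 -3024 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0"
        Console.WriteLine(TextureAfterLastParen(line))
    End Sub

End Module
```

Lines without any ")" (the "// brush" comments and the braces) simply return Nothing and can be skipped.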
-
Feb 22nd, 2009, 11:00 AM
#4
Re: [2008] Parsing text file, need some help
I have been trying to get the regular expressions to work but I just haven't got a clue how to use them.
I think this particular case is just too difficult for someone with absolutely 0 experience with regex.
I would be really grateful if someone could cook me up the required regex pattern... Or is that really a lot more work than I think it is? Seems simple enough for someone with a little experience, no?
-
Feb 22nd, 2009, 11:02 AM
#5
Re: [2008] Parsing text file, need some help
Hey,
If I get a chance, I will knock up an example and post back here, unless someone beats me to it. I'm currently re-installing an OS on a relative's PC, so no Visual Studio; just filling time while SP3 downloads.
Gary
-
Feb 22nd, 2009, 03:08 PM
#6
Re: [2008] Parsing text file, need some help
Heh, I spent all day re-installing my PC as well! Vista decided after two years that my (borrowed) serial key was invalid so I reinstalled XP.
Anyway, it would be great if you could knock something up because I don't think anything except regex will do the job efficiently here...
If only the coordinates in the brackets were the same each time this would be much easier, but because they are different every time I have found it quite impossible to do with my limited parsing skills.
-
Feb 22nd, 2009, 03:21 PM
#7
Re: [2008] Parsing text file, need some help
EDIT
Never mind, I thought I had found a better solution, but it turns out that needs a few other problems solved first... So I'm going to stick with this approach.
So I still need help!
Last edited by NickThissen; Feb 22nd, 2009 at 04:21 PM.
-
Feb 23rd, 2009, 03:16 AM
#8
Re: [2008] Parsing text file, need some help
Hey,
Okay, this is a very quick example this morning, and I am only just starting to wake up, so there are likely to be things wrong with it, but have a look and let me know what you think.
By the way, TextFile1.txt is just a text file into which I copied the example input that you gave above.
Code:
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    ' Read the whole file into memory so the regex can scan it in one pass.
    Dim inputFile As String = System.IO.File.ReadAllText("TextFile1.txt")
    Dim m As System.Text.RegularExpressions.Match
    ' Three "( x -y z )" groups of 3-4 digit numbers (the minus sign on the second number is hard-coded),
    ' followed by the texture name (word/word plus its trailing space), captured as group 1.
    Dim myRegex As New System.Text.RegularExpressions.Regex("\( \d{3,4} -\d{3,4} \d{3,4} \) \( \d{3,4} -\d{3,4} \d{3,4} \) \( \d{3,4} -\d{3,4} \d{3,4} \) (?<1>\w*/\w* )", System.Text.RegularExpressions.RegexOptions.IgnoreCase Or System.Text.RegularExpressions.RegexOptions.Compiled)
    m = myRegex.Match(inputFile)
    ' Step through every match in the file and show the captured texture.
    While m.Success
        MessageBox.Show("Found match " & m.Groups(1).Value & " at " & m.Groups(1).Index.ToString())
        m = m.NextMatch()
    End While
End Sub
It is doing what I expect it to at least, i.e. displays the texture information.
It is quite a crude regex, and could easily be refined, but hopefully it will get you on your way.
Gary
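One possible refinement of the crude regex, offered as a sketch rather than a tested replacement: instead of spelling out the digits, it accepts anything inside each of the three parenthesised groups and captures the next run of non-space characters as the texture. It assumes a face line always starts with the three groups; FaceLine, UniqueTextures and RefinedRegexSketch are made-up names. Duplicates are skipped with the List.Contains check mentioned in the first post.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RefinedRegexSketch

    ' Three "( ... )" groups at the start of a line, then the texture
    ' name captured as group 1 (any run of non-whitespace characters).
    Private ReadOnly FaceLine As New Regex( _
        "^[ \t]*(?:\([^)\r\n]*\)[ \t]*){3}(\S+)", _
        RegexOptions.Multiline Or RegexOptions.Compiled)

    ' Hypothetical helper: every distinct texture, in order of first use.
    Function UniqueTextures(ByVal mapText As String) As List(Of String)
        Dim result As New List(Of String)
        For Each m As Match In FaceLine.Matches(mapText)
            Dim texture As String = m.Groups(1).Value
            If Not result.Contains(texture) Then result.Add(texture)
        Next
        Return result
    End Function

    Sub Main()
        ' A few lines from the sample input, joined into one string.
        Dim nl As String = Environment.NewLine
        Dim sample As String = _
            "// brush 5064" & nl & _
            "{" & nl & _
            "( 2716 -3384 896 ) ( 2720 -3376 896 ) ( 2716 -3384 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 0 0 0" & nl & _
            "( 2980 -3504 1024 ) ( 2884 -3408 1024 ) ( 2892 -3392 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0" & nl & _
            "( 2928 -3512 896 ) ( 2944 -3512 896 ) ( 2928 -3024 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0" & nl & _
            "}"

        For Each texture As String In UniqueTextures(sample)
            Console.WriteLine(texture)
        Next
    End Sub

End Module
```

RegexOptions.Multiline makes ^ match at the start of every line, so the whole file can still be read with File.ReadAllText and scanned in one call.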
-
Feb 23rd, 2009, 04:53 AM
#9
Re: [2008] Parsing text file, need some help
Thanks, it does seem to work!
I also found a different way of getting the textures (using a different file) with the help of Negative0 on this forum, but this is going to come in handy.
-
Feb 23rd, 2009, 04:55 AM
#10
Re: [2008] Parsing text file, need some help
Code:
"(?<=(\([^\)]+\) ){3}).+?(?=\d\.)"
-
Feb 23rd, 2009, 04:59 AM
#11
Re: [2008] Parsing text file, need some help
Originally Posted by MaximilianMayrhofer
Code:
"(?<=(\([^\)]+\) ){3}).+?(?=\d\.)"
Now you are just showing off
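For anyone who wants to try the quoted pattern, here is a small sketch of one way to use it. TextureOf and LookbehindSketch are made-up names. Note that the match includes the space before the first number, so the sketch trims it, and the lookahead only stops at a digit followed by a dot: a negative first value would leave a trailing "-" in the match.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookbehindSketch

    ' (?<=(\([^\)]+\) ){3})  lookbehind: three "( ... ) " groups come just before
    ' .+?                    the texture name, matched lazily
    ' (?=\d\.)               lookahead: stop before the first "0." style number
    Private Const Pattern As String = "(?<=(\([^\)]+\) ){3}).+?(?=\d\.)"

    ' Hypothetical helper: the texture on one line, or Nothing if no match.
    Function TextureOf(ByVal line As String) As String
        Dim m As Match = Regex.Match(line, Pattern)
        If Not m.Success Then Return Nothing
        ' The match ends with the space before the number, so trim it.
        Return m.Value.Trim()
    End Function

    Sub Main()
        Dim line As String = "( 2744 -3348 896 ) ( 2740 -3336 896 ) ( 2744 -3348 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 0 0"
        Console.WriteLine(TextureOf(line))
    End Sub

End Module
```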
-
Feb 23rd, 2009, 04:56 AM
#12
Re: [2008] Parsing text file, need some help
Good stuff, I was obviously more awake than I thought