Hi,
I am trying to parse a very large text file for certain strings. The text file is part of a level-making program for an old game I play.
It contains all the information the level designer software needs, but the only part I care about is the texture information.
What I'm trying to create is a small program that parses these text files and shows the user a list of every texture used in them.
The problem is that the strings denoting textures are not easy to find, and I can't think of a sensible, fast way to extract them...
Here is part of an actual file:
Code:
// brush 5064
{
( 2716 -3384 896 ) ( 2720 -3376 896 ) ( 2716 -3384 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 0 0 0
( 2980 -3504 1024 ) ( 2884 -3408 1024 ) ( 2892 -3392 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2928 -3512 896 ) ( 2944 -3512 896 ) ( 2928 -3024 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2752 -3302 976 ) ( 2768 -3310 976 ) ( 2752 -3302 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2756 -3404 896 ) ( 2752 -3412 896 ) ( 2756 -3404 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 0 0 0
( 2780 -3356 896 ) ( 2772 -3352 896 ) ( 2780 -3356 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
}
// brush 5065
{
( 2928 -3512 896 ) ( 2944 -3512 896 ) ( 2928 -3024 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 4 0
( 2980 -3504 1024 ) ( 2884 -3408 1024 ) ( 2892 -3392 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 4 0
( 3000 -3476 1008 ) ( 2744 -3348 1008 ) ( 3000 -3476 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 0 0
( 2744 -3348 896 ) ( 2740 -3336 896 ) ( 2744 -3348 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 0 0
( 2780 -3356 896 ) ( 2768 -3360 896 ) ( 2780 -3356 1024 ) battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 0 0
( 2740 -3336 896 ) ( 2748 -3340 896 ) ( 2740 -3336 1024 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 134217728 4 0
}
// brush 5066
{
( 2748 -3320 880 ) ( 2752 -3312 880 ) ( 2748 -3320 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2828 -3296 832 ) ( 2820 -3296 832 ) ( 2828 -3368 832 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2794 -3296 896 ) ( 2798 -3296 896 ) ( 2794 -3368 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2740 -3336 832 ) ( 2748 -3340 832 ) ( 2740 -3336 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2780 -3356 832 ) ( 2776 -3364 832 ) ( 2780 -3356 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
( 2716 -3504 832 ) ( 2708 -3500 832 ) ( 2716 -3504 896 ) common/caulk 0.000000 0.000000 0.000000 0.500000 0.500000 0 4 0
}
There are only two (different) textures in this example, they are "common/caulk" and "battery_wall/wall02_bot". (Note that some 'brushes' may have up to six different textures).
Obviously, my program should not list duplicate textures, so for the example above it would only need to list the two textures. That is not a problem, though: it is easy enough to check whether the list already contains a certain texture before adding it.
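To illustrate the duplicate check, here is a minimal sketch in Python (the language is chosen only for illustration, and `unique_textures` is a made-up name). It keeps a set next to the list so that the "already seen?" test stays cheap even for long lists:

```python
def unique_textures(names):
    """Return the texture names in first-seen order, without duplicates."""
    seen = set()      # set lookups take constant time on average
    ordered = []      # keeps the order in which textures first appeared
    for name in names:
        if name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered


found = ["battery_wall/wall02_bot", "common/caulk", "common/caulk"]
print(unique_textures(found))
```

Checking a plain list with `in` would also work, but it rescans the list on every lookup, which adds up when the file has many faces.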
The problem is in the 'format'. It seems to be like this all the time:
Code:
// brush <number>
{
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
}
// brush <number>
{
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
( ... ) ( ... ) ( ... ) [TEXTURE STRING] ....
}
As you can see, the [TEXTURE STRING] is always right behind the three sets of coordinates (...) (...) (...).
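If the layout really is that regular, the texture is simply the sixteenth whitespace-separated token on a face line. A minimal Python sketch of that observation follows. It assumes one face per line and spaces around every parenthesis, exactly as in the sample; `texture_from_face_line` is a hypothetical helper name.

```python
def texture_from_face_line(line):
    """Return the texture name on a face line, or None for any other line."""
    tokens = line.split()
    # Three "( x y z )" groups take 5 tokens each, so the texture is token 15.
    if len(tokens) > 15 and tokens[0] == "(" and tokens[14] == ")":
        return tokens[15]
    return None  # comment lines, "{", "}" and blank lines end up here


face = ("( 2716 -3384 896 ) ( 2720 -3376 896 ) ( 2716 -3384 1024 ) "
        "battery_wall/wall02_bot 0.000000 0.000000 0.000000 0.500000 0.500000 0 0 0")
print(texture_from_face_line(face))
print(texture_from_face_line("// brush 5064"))
```

This would break if the editor ever writes coordinates without the surrounding spaces, so it is only as reliable as the assumption about the format.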
The only way I can think of to parse this is by using something like:
Code:
1. Search for an opening parenthesis "("
2. Search for the next closing parenthesis ")"
3. Search for an opening parenthesis "("
4. Search for the next closing parenthesis ")"
5. Search for an opening parenthesis "("
6. Search for the next closing parenthesis ")"
7. The texture string now starts one character (a space) after the last closing parenthesis and ends as soon as another space is reached.
However, I am pretty certain this will be very slow, especially with files of up to 150,000 lines...
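For reference, the seven steps above could be sketched roughly as follows in Python. `scan_textures` is a hypothetical name, and the sketch assumes parentheses only ever appear around coordinates. Every character is visited at most once, so the work grows linearly with the file size.

```python
def scan_textures(text):
    """Collect the token that follows every run of three "( ... )" groups."""
    textures = []
    pos = 0
    while True:
        # Steps 1-6: skip three parenthesised coordinate groups.
        for _ in range(3):
            open_at = text.find("(", pos)
            if open_at == -1:
                return textures          # no more faces in the text
            close_at = text.find(")", open_at)
            if close_at == -1:
                return textures          # unbalanced input, stop here
            pos = close_at + 1
        # Step 7: skip the separating whitespace, then read up to the next one.
        start = pos
        while start < len(text) and text[start].isspace():
            start += 1
        end = start
        while end < len(text) and not text[end].isspace():
            end += 1
        if end > start:
            textures.append(text[start:end])
        pos = end


sample = (
    "// brush 0\n{\n"
    "( 1 2 3 ) ( 4 5 6 ) ( 7 8 9 ) common/caulk 0.0 0.0 0 4 0\n"
    "( 1 2 3 ) ( 4 5 6 ) ( 7 8 9 ) battery_wall/wall02_bot 0.0 0.0 0 0 0\n"
    "}\n"
)
print(scan_textures(sample))
```

The returned list still contains duplicates; removing them is a separate step.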
Can this be done using regular expressions? I have never used them before so I haven't got a clue...
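A regular expression does seem able to express "three parenthesised groups, then a word". Below is a minimal sketch using Python's `re` module, assuming each face starts on its own line as in the sample. The names `FACE_RE` and `find_textures` are made up for illustration, and this has not been tried on a real 150,000-line file.

```python
import re

# Line start, optional indentation, exactly three "( ... )" groups,
# then the first run of non-whitespace characters, which is the texture.
FACE_RE = re.compile(r"^[ \t]*(?:\([^()\n]*\)[ \t]*){3}(\S+)", re.MULTILINE)


def find_textures(text):
    """Return every texture string in the text, duplicates included."""
    return FACE_RE.findall(text)


sample = (
    "// brush 0\n{\n"
    "( 1 2 3 ) ( 4 5 6 ) ( 7 8 9 ) common/caulk 0.0 0.0 0 4 0\n"
    "( 1 2 3 ) ( 4 5 6 ) ( 7 8 9 ) battery_wall/wall02_bot 0.0 0.0 0 0 0\n"
    "( 1 2 3 ) ( 4 5 6 ) ( 7 8 9 ) common/caulk 0.0 0.0 0 4 0\n"
    "}\n"
)
print(sorted(set(find_textures(sample))))
```

Wrapping the result in `set(...)` drops the duplicates, and `sorted(...)` only makes the printed order predictable.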
If anyone has any idea how to make this faster, thanks!!



