Hi,

I am writing a tool that logs stats for a first-person shooter game. The game writes its information (including who killed whom with which weapon) to a log file, which I read every few seconds and parse so that I can write that information to a database.

I am now having trouble parsing the weapon out of that string of information.


Some additional information is required: a weapon in this game can have attachments (such as a scope, a grenade launcher, etc.). Weapons can have either 1 or 2 attachments. Some weapons, however, cannot have any attachments at all.

Each log file entry that describes a kill contains a code that describes the weapon that was used. This code is a concatenation of either 3 or 4 parts:
Code:
<weapon>_<attachment>_mp
<weapon>_<attachment1>_<attachment2>_mp
where <weapon> is a code that describes a weapon and <attachment> is another code describing the attachment.


This should be easy to parse by just splitting along the underscore characters, but there are a few catches:

1. Some attachments are able to kill players as well. Specifically: grenade launchers, flamethrowers and underbarrel shotguns. In this case, the attachment is listed before the weapon:
Code:
<attachment>_<weapon>_mp
Note also that in this case there is always only 1 attachment.

2. The biggest catch: some weapon names have an underscore in them (some even have 3 underscores)! So simply splitting along the underscore won't work in all cases; if the weapon name contains an underscore, I end up splitting the weapon name itself.
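
To illustrate the second catch, here is a small Python snippet (the `naive_split` helper is just a made-up name for this post). Both codes come out as three parts, so the number of parts alone can't tell a weapon with one attachment from a weapon whose name contains an underscore:

```python
def naive_split(code: str) -> list[str]:
    """Split a weapon code on every underscore, ignoring what the parts mean."""
    return code.split("_")


# "ak47_acog_mp" is the weapon ak47 with an ACOG sight attached;
# "china_lake_mp" is the single weapon china_lake with no attachments.
for code in ("ak47_acog_mp", "china_lake_mp"):
    print(code, "->", naive_split(code))
```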


This makes the list of possible combinations a lot longer. The ones I can think of (I think these are all):
Code:
Weapon names without underscores:

1. <weapon>_mp								-	No attachments
2. <weapon>_<attachment>_mp					-	One attachment
3. <attachment>_<weapon>_mp					- 	Attachment kill
4. <weapon>_<attachment>_<attachment>_mp	-	Two attachments

Weapon names with underscores:

5. <weaponpart1>_<weaponpart2>_mp										- No attachments
6. <weaponpart1>_<weaponpart2>_<weaponpart3>_mp							- No attachments
7. <weaponpart1>_<weaponpart2>_<weaponpart3>_<weaponpart4>_mp			- No attachments
Luckily, all weapons with an underscore in their name cannot have any attachments. Probably a coincidence, but that makes the list a lot shorter (otherwise the first 4 options would be repeated for options 5, 6 and 7, making a total of 16 options if I counted correctly).
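
To make the seven cases above concrete, here is a rough Python sketch of how they might be told apart by looking parts up in the known code lists instead of counting parts. It is only a sketch under some assumptions: every code ends in `_mp`, the lookup sets are complete (only a handful of entries are shown here), and all names (`WEAPONS`, `ATTACHMENTS`, `KILLING_ATTACHMENTS`, `ParsedWeapon`, `parse_weapon_code`) are made up for this post.

```python
from typing import NamedTuple

# Small subsets of the weapon and attachment codes; a real tool would load them all.
WEAPONS = {"ak47", "m16", "python", "china_lake", "knife_ballistic",
           "hind_minigun_pilot_firstperson"}
ATTACHMENTS = {"acog", "reflex", "extclip", "silencer", "gl", "ft", "mk"}
# Attachments that can kill on their own and are then listed before the weapon.
KILLING_ATTACHMENTS = {"gl", "ft", "mk"}


class ParsedWeapon(NamedTuple):
    weapon: str
    attachments: tuple
    killed_by_attachment: bool


def parse_weapon_code(code: str) -> ParsedWeapon:
    if not code.endswith("_mp"):
        raise ValueError(f"unexpected weapon code: {code!r}")
    body = code[:-len("_mp")]

    # Options 1, 5, 6 and 7: the whole remainder is a known weapon name.
    # Underscored names are matched as a whole, so they are never split.
    if body in WEAPONS:
        return ParsedWeapon(body, (), False)

    parts = body.split("_")

    # Option 3: <attachment>_<weapon>, always exactly one attachment.
    if len(parts) == 2 and parts[0] in KILLING_ATTACHMENTS and parts[1] in WEAPONS:
        return ParsedWeapon(parts[1], (parts[0],), True)

    # Options 2 and 4: <weapon> followed by one or two attachments.
    weapon, attachments = parts[0], tuple(parts[1:])
    if (weapon in WEAPONS and 1 <= len(attachments) <= 2
            and all(a in ATTACHMENTS for a in attachments)):
        return ParsedWeapon(weapon, attachments, False)

    raise ValueError(f"cannot parse weapon code: {code!r}")


for code in ("ak47_mp", "ak47_gl_mp", "gl_ak47_mp",
             "ak47_acog_extclip_mp", "china_lake_mp"):
    print(code, "->", parse_weapon_code(code))
```

This only works because weapons with underscores in their name never take attachments; if that ever changes, the whole-name lookup would have to become something like a longest-prefix match against the weapon list.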


So my question is: can anyone see an easy way to parse this weapon code so that I can extract the weapon name and the attachments separately? I started out by splitting along the underscores and checking the length of the resulting array. If the length is 2, it's always option 1. If the length is 3, there are already 3 possibilities (options 2, 3 and 5). This adds up really fast; my code was already way too long at this point and impossible to understand. This approach requires a lot of "trial and error", where I take the first part and basically do:
1. Check if it represents a weapon
2. If not, check if it's an attachment
3. If not, check if it's part of a weapon

As you can imagine this gets really ugly. There must be a better way to parse this stuff?

If anyone can see an easy way please let me know!


Finally, in case you need it, here's the list of weapons and attachments:

Weapons (code, full name, bitmask of possible attachments):
Code:
ak47   AK47   37627
ak74u   AK74u   49595
asp   ASP   262144
aug   AUG   37627
knife_ballistic   Ballistic Knife   0
china_lake   China Lake   0
m1911   M1911   295968
commando   Commando   37627
crossbow_explosive   Crossbow   0
cz75   CZ75   820256
dragunov   Dragunov   164385
enfield   Enfield   37627
famas   Famas   37627
fnfal   FN FAL   37627
g11   G11   133120
galil   Galil   37627
hk21   HK21   563
hs10   HS10   262144
ithaca   Stakeout   256
kiparis   Kiparis   311603
knife   Knife   0
l96a1   L96A1   164385
m14   M14   37875
m16   M16   37627
m60   M60   819
m72_law   M72 LAW   0
mac11   MAC11   311602
makarov   Makarov   295968
mp5k   MP5K   49203
mpl   MPL   49435
pm63   PM63   278816
psg1   PSG1   164385
python   Python   335873
rottweil72   Olympia   0
rpg   RPG   0
rpk   RPK   571
skorpion   Skorpion   311584
spas   SPAS-12   32768
spectre   Spectre   49459
stoner63   Stoner63   563
strela   Strela-3   0
uzi   Uzi   49459
wa2000   WA2000   164385
concussion_grenade   Concussion Grenade   0
flash_grenade   Flash Grenade   0
frag_grenade   Frag Grenade   0
sticky_grenade   Semtex   0
tabun_gas   Nova Gas   0
willy_pete   Willy Pete   0
ft   Flamethrower   0
gl   Grenade Launcher   0
mk   Masterkey Shotgun   0
airstrike   Rolling Thunder   0
auto_gun_turret   Sentry Gun   0
cobra_20mm_comlink   Attack Helicopter   0
dog_bite   Attack Dogs   0
hind_minigun_pilot_firstperson   Gunship [Minigun]   0
huey_minigun_gunner   Chopper Gunner   0
m220_tow   Valkyrie Rockets   0
mortar   Mortar Team   0
napalm   Napalm Strike   0
rcbomb   RC-XD   0
claymore   Claymore   0
satchel_charge   C4   0
hatchet   Tomahawk   0
explosive_bolt   Explosive Bolt   0
explodable_barrel   Explodable Barrel   0
hind_rockets_firstperson   Gunship [Rockets]   0
minigun_mp   Death Machine   0
m202_flash   Grim Reaper   0
Attachments (code, full name, bitmask):
Code:
acog   ACOG Sight   1
reflex   Reflex Sight   2
drum   Drum Mag   4
dualclip   Dual Mag   8
elbit   Red Dot Sight   16
extclip   Extended Mag   32
ft   Flamethrower   64
gl   Grenade Launcher   128
grip   Grip   256
ir   Infrared Scope   512
upgradesight   Upgraded Iron Sights   1024
lps   Low Power Scope   2048
mk   Masterkey   4096
speed   Speed Reloader   8192
rf   Rapid Fire   16384
silencer   Suppressor   32768
snub   Snub Nose   65536
vzoom   Variable Zoom   131072
dw   Dual Wield   262144
auto   Full Auto Upgrade   524288
The code is the name of the weapon as it gets written to the log file (so this is the name I am parsing out; the full name is irrelevant). As you can also see, there are some weapons with underscores in the name, but all of them have an attachments bitmask of 0, meaning they cannot have any attachments.
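
In case the bitmask column is unclear: each attachment has its own bit, and a weapon's mask is the sum of the bits of the attachments it accepts. Here is a short Python sketch of how the mask could be used to sanity-check a parsed weapon/attachment pair. The names `ATTACHMENT_BITS`, `WEAPON_MASKS`, `can_attach` and `allowed_attachments` are made up for this post, and only three weapons are included.

```python
# Attachment code -> bit value, copied from the attachment list above.
ATTACHMENT_BITS = {
    "acog": 1, "reflex": 2, "drum": 4, "dualclip": 8, "elbit": 16,
    "extclip": 32, "ft": 64, "gl": 128, "grip": 256, "ir": 512,
    "upgradesight": 1024, "lps": 2048, "mk": 4096, "speed": 8192,
    "rf": 16384, "silencer": 32768, "snub": 65536, "vzoom": 131072,
    "dw": 262144, "auto": 524288,
}

# Weapon code -> bitmask of possible attachments (three entries from the weapon list).
WEAPON_MASKS = {"ak47": 37627, "asp": 262144, "china_lake": 0}


def can_attach(weapon: str, attachment: str) -> bool:
    """True if the attachment's bit is set in the weapon's bitmask."""
    return bool(WEAPON_MASKS[weapon] & ATTACHMENT_BITS[attachment])


def allowed_attachments(weapon: str) -> list[str]:
    """All attachment codes whose bit is set in the weapon's bitmask."""
    mask = WEAPON_MASKS[weapon]
    return sorted(code for code, bit in ATTACHMENT_BITS.items() if mask & bit)


print(can_attach("ak47", "acog"))
print(can_attach("ak47", "grip"))
print(allowed_attachments("asp"))
```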