-
Apr 7th, 2009, 10:24 AM
#1
Thread Starter
Hyperactive Member
[RESOLVED] RegEx
I'm messing around with regular expressions and I can't seem to figure out how to parse this file.
It's basically a lua file containing a table like:
Code:
TableName = {
["QuotedString"] = {
["QuotedString"] = {
["QuotedString-Key"] = 1,
["QuotedString-Key"] = 0,
["QuotedString-Key"] = 0,
},
["QuotedString"] = {
["QuotedString-Key"] = 1,
["QuotedString-Key"] = 0,
["QuotedString-Key"] = 0,
},
},
["QuotedString-Key"] = true,
["QuotedString-Key"] = true,
}
Some things about the file are standard: the "TableName" (never in quotes) starts the table, and variables always have [" "] around them and are then assigned something, like ["Test"] = 0. However, a variable could itself contain multiple variables.
My original idea was to split up the file on each {} block and then try to parse each ["QuotedString-Key"] = 0 line.
My question is, how can I do this?
-
Apr 7th, 2009, 10:59 AM
#2
Re: RegEx
The simplest way would be to look for a JSON parser in VB.NET, because that's JSON.
Here are a few:
http://james.newtonking.com/projects/json-net.aspx
http://sourceforge.net/projects/csjson
If you're feeling adventurous, then you can also write your own JSON parser.
-
Apr 7th, 2009, 11:33 AM
#3
Re: RegEx
I'm not very good at Regex but this is what I came up with. What I did was copy what you posted into a text file and ran this regex on it:
Code:
Imports System.Text.RegularExpressions
Public Class Form1
    Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        ' Match the run of value characters at the end of each line
        Dim MyRegex As New Regex("[A-Za-z0-9-,]*$")
        Dim MyFile() As String = System.IO.File.ReadAllLines("C:\test\RegexMe.txt")
        For Each Line As String In MyFile
            Dim MyValue As String = MyRegex.Match(Line).Value
            ' Skip lines where the match is only a comma or is empty
            If Not MyValue.Equals(",") AndAlso Not MyValue.Equals("") Then
                MsgBox(MyValue.Trim(","c))
            End If
        Next
    End Sub
End Class
Last edited by ForumAccount; Apr 7th, 2009 at 11:45 AM.
Reason: Fixed Regex pattern
-
Apr 7th, 2009, 11:39 AM
#4
Thread Starter
Hyperactive Member
Re: RegEx
 Originally Posted by ForumAccount
I'm not very good at Regex but this is what I came up with. What I did was copy what you posted into a text file and ran this regex on it:
Code:
Imports System.Text.RegularExpressions
Public Class Form1
Private Sub Form1_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
Dim MyRegex As New Regex("[A-Za-z0-9-,*$]")
Dim MyFile() As String = System.IO.File.ReadAllLines("C:\test\RegexMe.txt")
For Each Line As String In MyFile
Dim MyValue As String = (MyRegex.Match(Line).Value)
If Not MyValue.Equals(CStr(",")) AndAlso Not MyValue.Equals(CStr("")) Then
MsgBox(MyValue.Trim(","c))
End If
Next
End Sub
End Class
Thanks!
It displays each key name, but it only displays the first letter of the key name. So ["TestKey"] appears as T.
Besides that, how easy would it be to display the value of the key?
Say I have ["TestKey"] = 123 or even ["TestKey"] = { ["ChildKey"] = 3, }
-
Apr 7th, 2009, 11:44 AM
#5
Re: RegEx
Sorry! The Regex I put was wrong. I've fixed the pattern in my post above. The Regex should be giving you the values: 0, 1, True, etc.
-
Apr 7th, 2009, 11:58 AM
#6
Re: RegEx
Just like Mendhak, I would recommend you put some time to learn JSON. It's worth it for the file you are trying to work with.
Pradeep
-
Apr 7th, 2009, 12:09 PM
#7
Thread Starter
Hyperactive Member
Re: RegEx
I'm playing with all 3 resources Mendhak provided. A little daunting.
My issue is that the file isn't true JSON. Instead of "Name": "test" you have ["Name"] = "Test",
Not to mention each variable could have thousands of sub variables, which JSON supports, but it uses [ ... ] where my file uses { ... }
Why can't everyone just use something like XML? lol
-
Apr 7th, 2009, 01:11 PM
#8
Re: RegEx
That should be easy, shouldn't it?
You just need to make a couple of replacements so that it becomes pure JSON format:
[" --> "
"] --> "
= --> :
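A rough sketch of those replacements in VB.NET, assuming keys never contain escaped quotes. The module and function names here are made up for illustration. Instead of replacing every = blindly, it uses one regex anchored on the ["..."] = shape, so an = sign inside a string value is left alone:

```vbnet
Imports System.Text.RegularExpressions

Module KeySyntax
    ' Rewrites Lua-style keys:  ["Key"] = value   ->   "Key" : value
    ' Assumption: key names contain no escaped double quotes.
    Function ConvertKeys(ByVal luaText As String) As String
        ' Group 1 captures the quoted key, including its quotes
        Return Regex.Replace(luaText, "\[(""[^""]*"")\]\s*=\s*", "$1 : ")
    End Function
End Module
```

For example, ConvertKeys applied to ["Name"] = "a=b", should leave the "a=b" value untouched and only rewrite the key part.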
Last edited by Pradeep1210; Apr 7th, 2009 at 01:14 PM.
-
Apr 7th, 2009, 01:36 PM
#9
Thread Starter
Hyperactive Member
Re: RegEx
 Originally Posted by Pradeep1210
That should be easy, shouldn't it?
You just need to make a couple of replacements so that it becomes pure JSON format:
[" --> "
"] --> "
= --> :
Actually that makes a lot of sense. For the most part the parsing works, but (of course) I now run into the issue that my file's formatting starts like this:
TableName = { ... all the variables and stuff here ... }
Stripping "TableName = " works, but how could I go through (foreach) each table? There could be more than one table in a given file.
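One possible way to loop over the tables, as a sketch only: it assumes every top-level table starts at the very beginning of a line as Name = { and that nested keys never do (they start with [ in the sample). SplitTables is a hypothetical helper name, not something from a library.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TableSplitter
    ' Returns one entry per top-level table: table name -> text from "{" to the closing "}".
    ' Assumption: only top-level table names appear at the start of a line followed by " = {".
    Function SplitTables(ByVal fileData As String) As Dictionary(Of String, String)
        Dim tables As New Dictionary(Of String, String)
        ' The lookahead checks for "{" without consuming it, so the body keeps its opening brace
        Dim matches As MatchCollection = Regex.Matches(fileData, "^(\w+) = (?=\{)", RegexOptions.Multiline)
        For i As Integer = 0 To matches.Count - 1
            Dim bodyStart As Integer = matches(i).Index + matches(i).Length
            ' A table's body runs up to the next table name, or to the end of the file
            Dim bodyEnd As Integer = If(i + 1 < matches.Count, matches(i + 1).Index, fileData.Length)
            tables(matches(i).Groups(1).Value) = fileData.Substring(bodyStart, bodyEnd - bodyStart).Trim()
        Next
        Return tables
    End Function
End Module
```

Each value can then be converted and handed to the JSON parser on its own with a For Each over the dictionary.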
-
Apr 7th, 2009, 02:12 PM
#10
Thread Starter
Hyperactive Member
Re: RegEx
Turns out there are a few more differences I'm still trying to figure out.
The first is that JSON arrays don't end with a , ...
So "Key1": 1800,
"Key2": 5800,
"Key3": 18050,
Key3 in JSON should not have a trailing comma. However, the Lua file I have does.
So right now I'm trying to strip out a comma if there is a new line and then a ]
-
Apr 7th, 2009, 02:19 PM
#11
Re: RegEx
hmm... so include this one also:
[" --> "
"] --> "
= --> :
,} --> }
-
Apr 7th, 2009, 02:24 PM
#12
Thread Starter
Hyperactive Member
Re: RegEx
Nope. I think it's because the ] is (almost) always on a new line, so it's not replacing ,] with ]
-
Apr 7th, 2009, 02:37 PM
#13
Re: RegEx
Whitespace characters like space, newline etc. are immaterial in the JSON format, so you should make provisions to ignore them while making the replacements.
BTW, that was ,} i.e. curly brackets and they are perfectly ok as far as I see from your example in post #1.
[] and {} have different meanings in JSON
[] represents an array
{} represents object (or a subset of objects)
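As a sketch of a whitespace-tolerant replacement (the function name is invented for illustration): the pattern removes a comma only when the next non-whitespace character is } or ]. It uses a lookahead, so the bracket itself is not consumed and a run of closing brackets on consecutive lines is handled in one go. It assumes no string value contains a comma directly followed by a closing bracket.

```vbnet
Imports System.Text.RegularExpressions

Module TrailingCommas
    ' Removes a comma when only whitespace separates it from "}" or "]".
    ' Assumption: string values never contain ",}" or ",]" style sequences.
    Function StripTrailingCommas(ByVal jsonText As String) As String
        Return Regex.Replace(jsonText, ",(?=\s*[}\]])", "")
    End Function
End Module
```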
-
Apr 7th, 2009, 04:18 PM
#14
Re: RegEx
I suppose it's a semi-red herring either way... it's almost JSON (I missed the [ bits) so you have the temptation to strip the characters out and then convert using any existing library; or regex which I think is going to be far more complicated. Perhaps a combination of both - regex to remove unnecessary bits and then the JSON converter to get it to XML to understand it easier.
-
Apr 7th, 2009, 04:24 PM
#15
Thread Starter
Hyperactive Member
Re: RegEx
 Originally Posted by mendhak
I suppose it's a semi-red herring either way... it's almost JSON (I missed the [ bits) so you have the temptation to strip the characters out and then convert using any existing library; or regex which I think is going to be far more complicated. Perhaps a combination of both - regex to remove unnecessary bits and then the JSON converter to get it to XML to understand it easier.
I'm definitely liking the JSON conversion, it's just a matter of swapping out bits to make it JSON-compatible.
-
Apr 8th, 2009, 10:57 AM
#16
Thread Starter
Hyperactive Member
Re: RegEx
Hmm, the multiple variables nested within each other are proving confusing.
I have an idea, what about using RegEx or some other means to isolate blocks of the file:
Code:
["QuotedString"] = {
["QuotedString-Key"] = 1,
["QuotedString-Key"] = 0,
["QuotedString-Key"] = 0,
}
Somehow grab each block that starts with [""] = { and ends with }
Then I can check within that for even more variable blocks at which point I can strip and format the text for JSON to be acceptable.
If the file is just:
Code:
{
["QuotedString-Key"] = 1,
["QuotedString-Key"] = 0,
["QuotedString-Key"] = 0,
}
I can parse that with the JSON converter no problem, but if it contains other blocks, it starts to become an issue.
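Something like the following might isolate those blocks. It is only a sketch: FindLeafBlocks is a made-up name, and it only finds the innermost blocks (ones with no further { } inside them), on the assumption that no string value contains a brace.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module BlockFinder
    ' Finds every ["Key"] = { ... } block that has no nested braces inside it.
    ' Returns pairs of (key name, block text including its braces).
    Function FindLeafBlocks(ByVal luaText As String) As List(Of KeyValuePair(Of String, String))
        Dim blocks As New List(Of KeyValuePair(Of String, String))
        ' [^{}]* cannot cross another brace, so only innermost blocks can match
        For Each m As Match In Regex.Matches(luaText, "\[""([^""]*)""\] = (\{[^{}]*\})")
            blocks.Add(New KeyValuePair(Of String, String)(m.Groups(1).Value, m.Groups(2).Value))
        Next
        Return blocks
    End Function
End Module
```

Outer blocks could then be handled by replacing each leaf with its converted text and running the search again.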
-
Apr 8th, 2009, 02:57 PM
#17
Re: RegEx
I copied the data from your first post to a file on my disk and ran the following code:
vb.net Code:
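' Requires: Imports System.IO and Imports System.Text.RegularExpressions
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim fileData = File.ReadAllText("C:\Temp\test.txt")
    ' ["Key"] = value  ->  "Key" : value
    fileData = fileData.Replace("[""", """")
    fileData = fileData.Replace("""] = ", """ : ")
    ' Drop a comma that is followed by whitespace and a closing }
    fileData = Regex.Replace(fileData, "([^\s]),(\s+})", "$1$2")
    Debug.Print(fileData)
End Sub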
Output
Code:
TableName = {
"QuotedString" : {
"QuotedString" : {
"QuotedString-Key" : 1,
"QuotedString-Key" : 0,
"QuotedString-Key" : 0
},
"QuotedString" : {
"QuotedString-Key" : 1,
"QuotedString-Key" : 0,
"QuotedString-Key" : 0
},
},
"QuotedString-Key" : true,
"QuotedString-Key" : true
}
I think this is a good JSON string and now your parser should work with this new string (assuming the whole of your file is formatted the same way you showed in the first post).
-
Apr 8th, 2009, 04:10 PM
#18
Thread Starter
Hyperactive Member
Re: RegEx
Everything is working except for one last thing (I'm pretty sure this is the last issue). In my haste, I overlooked the array part.
I noticed a few blocks like this:
Code:
["MyArray"] = {
80,
"test",
"2009-04-06",
}
So my last question is: how can I (probably using RegEx) replace the { } with [ ] if it's an array?
Again, I cannot thank you enough for your help. I had this issue once before a few years ago and never got it working.
-
Apr 8th, 2009, 04:41 PM
#19
Re: RegEx
hmm.. that's tricky. So here we have to replace the curly brackets with square brackets in this special case.
I'll try that and post here.
-
Apr 8th, 2009, 04:44 PM
#20
Thread Starter
Hyperactive Member
Re: RegEx
 Originally Posted by Pradeep1210
hmm.. that's tricky. So here we have to replace the curly brackets with square brackets in this special case.
I'll try that and post here.
Maybe looking for a ["Test"] = { "..test.."...... and replacing there since we know [""] won't be in the array. and then replacing the } at the end of the block
Err correction, look for [""] within a bracket, if it contains [""] then it can't be an array otherwise it would be a string, integer, etc.
-
Apr 8th, 2009, 04:49 PM
#21
Re: RegEx
Yes.. I was thinking along the same lines. We will need to replace that thing before we remove the square brackets [" "], otherwise we won't have any other way to find it.
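A minimal sketch of that idea, with an invented function name: a block becomes an array only if nothing between its braces is a [ or ], and the pattern also refuses to cross another { or } so it cannot run over into a neighbouring block. It has to run while the keys still have their [" "] brackets. Note that under this rule an empty { } also turns into [ ].

```vbnet
Imports System.Text.RegularExpressions

Module ArrayBlocks
    ' Turns  ["Key"] = { 80, "test", }  into  ["Key"] = [ 80, "test", ]
    ' when the block holds no ["..."] keys and no nested braces.
    ' Must be called BEFORE the [" and "] around keys are removed.
    Function ConvertArrayBlocks(ByVal luaText As String) As String
        ' Group 1: the "] = prefix; group 2: the block contents
        Return Regex.Replace(luaText, "(""\] = )\{([^\[\]{}]*)\}", "$1[$2]")
    End Function
End Module
```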
-
Apr 8th, 2009, 04:51 PM
#22
Thread Starter
Hyperactive Member
Re: RegEx
 Originally Posted by Pradeep1210
Yes.. I was thinking along the same lines. We will need to replace that thing before we remove the square brackets [" "], otherwise we won't have any other way to find it.
That makes sense. Now we're getting somewhere!
-
Apr 9th, 2009, 05:43 AM
#23
Re: RegEx
Try this now.
The new regex now considers text between { and } only if there are no occurrences of [ or ] between them. We use this to match JSON arrays.
The regex used to replace the trailing comma and brackets, now checks for both ,} or ,]
vb.net Code:
' Requires: Imports System.IO and Imports System.Text.RegularExpressions
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim fileData = File.ReadAllText("C:\Temp\test.txt")
    ' { ... } with no [ or ] inside becomes a JSON array [ ... ]
    fileData = Regex.Replace(fileData, "(""] = ){([^\[\]]+?)}", "$1[$2]")
    ' Drop a comma that is followed by whitespace and a closing } or ]
    fileData = Regex.Replace(fileData, "([^\s]),(\s+[}\]])", "$1$2")
    ' ["Key"] = value  ->  "Key" : value
    fileData = fileData.Replace("[""", """")
    fileData = fileData.Replace("""] = ", """ : ")
    Debug.Print(fileData)
End Sub
I inserted the above block at 2 places in the original file data and this is the output:
Code:
TableName = {
"QuotedString" : {
"QuotedString" : {
"QuotedString-Key" : 1,
"QuotedString-Key" : 0,
"QuotedString-Key" : 0
},
"QuotedString" : {
"QuotedString-Key" : 1,
"QuotedString-Key" : 0,
"QuotedString-Key" : 0
},
"MyArray" : [
80,
"test",
"2009-04-06"
]
},
"QuotedString-Key" : true,
"QuotedString-Key" : true,
"MyArray" : [
80,
"test",
"2009-04-06"
]
}
Last edited by Pradeep1210; Apr 9th, 2009 at 05:48 AM.
-
Apr 9th, 2009, 09:25 AM
#24
Thread Starter
Hyperactive Member
Re: RegEx
Hmm, getting closer. Ran this and got this:
Code:
["Servers": {
["Test": {
["Users": {
["TestGroup": {
["Test2": {
["Test3": {
The only part that should contain the [ would be "Test3".
The code above did work if there was only one array in that block of code.
-
Apr 9th, 2009, 09:46 AM
#25
Re: RegEx
How big is the file? Can you attach a sample file here?
-
Apr 9th, 2009, 10:15 AM
#26
Thread Starter
Hyperactive Member
Re: RegEx
This is the basic structure of the file:
Code:
CensusPlus_Database = {
["TimesPlus"] = {
["Test"] = {
["Alliance"] = {
},
},
},
["Guilds"] = {
},
["Info"] = {
["AutoCensusTimer"] = 1800,
["AutoCensus"] = false,
["ClientLocale"] = "enUS",
["CensusButtonPosition"] = 289,
["CensusButtonShown"] = 1,
["Version"] = "4.2.2",
["LoginServer"] = "us.logon.worldofwarcraft.com",
["UseLogBars"] = 1,
["MiniStart"] = 0,
["Locale"] = "US",
},
["Servers"] = {
["Test"] = {
["Alliance"] = {
["Human"] = {
["Rogue"] = {
["Test"] = [
10, -- [1]
"", -- [2]
"2009-04-06", -- [3]
],
},
},
},
},
},
}
CensusPlus_BGInfo = {
}
CensusPlus_Unhandled = {
}
Under servers is where the bulk of the data is (10,000+ lines). It contains a server name then faction then race then class then name.
-
Apr 9th, 2009, 01:35 PM
#27
Re: RegEx
ok.. fixed further and now I get valid JSON from that sample data:
vb.net Code:
' Requires: Imports System.IO and Imports System.Text.RegularExpressions
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim fileData = File.ReadAllText("C:\Temp\test.txt")
    Dim fileLen As Long
    ' Repeat both replacements until a pass no longer changes the length
    Do
        fileLen = fileData.Length
        fileData = Regex.Replace(fileData, "(""] = ){([^\[\]]+?)}", "$1[$2]")
        fileData = Regex.Replace(fileData, "([^\s]),(\s+[}\]])", "$1$2")
    Loop While fileLen <> fileData.Length
    ' ["Key"] = value  ->  "Key" : value
    fileData = fileData.Replace("[""", """")
    fileData = fileData.Replace("""] = ", """ : ")
    Debug.Print(fileData)
End Sub
EDIT: I had to remove the --[1] etc. from your data to get valid data. Not sure if that is a part of actual data though.
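If those markers are in the real file, one way to strip them in code rather than by hand might be the sketch below. The function name is made up, and it assumes no string value contains two hyphens in a row (dates like "2009-04-06" are fine) and that the file has no Lua block comments.

```vbnet
Imports System.Text.RegularExpressions

Module LuaComments
    ' Removes "-- ..." line comments together with the spaces or tabs in front of them.
    ' Assumption: "--" never appears inside a quoted string value.
    Function StripLineComments(ByVal luaText As String) As String
        Return Regex.Replace(luaText, "[ \t]*--[^\r\n]*", "")
    End Function
End Module
```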
Last edited by Pradeep1210; Apr 9th, 2009 at 01:48 PM.
-
Apr 9th, 2009, 01:59 PM
#28
Re: RegEx
What we are actually doing in the above code now is making 2 or 3 passes until all occurrences are replaced, since the first pass leaves some of those commas behind.
This is the last output:
Code:
CensusPlus_Database = {
"TimesPlus" : {
"Test" : {
"Alliance" : [
]
}
},
"Guilds" : [
],
"Info" : {
"AutoCensusTimer" : 1800,
"AutoCensus" : false,
"ClientLocale" : "enUS",
"CensusButtonPosition" : 289,
"CensusButtonShown" : 1,
"Version" : "4.2.2",
"LoginServer" : "us.logon.worldofwarcraft.com",
"UseLogBars" : 1,
"MiniStart" : 0,
"Locale" : "US"
},
"Servers" : {
"Test" : {
"Alliance" : {
"Human" : {
"Rogue" : {
"Test" : [
10,
"",
"2009-04-06"
]
}
}
}
}
}
}
CensusPlus_BGInfo = {
}
CensusPlus_Unhandled = {
}
-
Apr 9th, 2009, 02:16 PM
#29
Thread Starter
Hyperactive Member
Re: RegEx
Thank you again!
The -- [1] was part of the application that exports the data. Not actually useful.
Loading the file (14,000 lines) using the above code with some tweaks and it passed with no problems.