I'd like to see whether I can make this lexical analyzer any more efficient:
Code:
' Required at the top of the file:
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Private Function Scan(ByVal source As String) As KeyValuePair(Of String, String)()
    Dim lexed As New List(Of KeyValuePair(Of String, String))

    ' Key = category name, Value = regex pattern. Definitions are tried in order.
    ' The number pattern takes 0 or a digit run without leading zeros as the
    ' integer part, so values such as 10 or -0.5 lex as a single token.
    Dim definitions() As KeyValuePair(Of String, String) = {
        New KeyValuePair(Of String, String)("number", "-?(?:0|[1-9]\d*)(?:\.\d+)?"),
        New KeyValuePair(Of String, String)("number", "([""'])(?:(?=(\\?))\2.)*?\1"),
        New KeyValuePair(Of String, String)("number", "true|false|null"),
        New KeyValuePair(Of String, String)("literal", "[\[\]{},.:]")
    }

    Dim sourceQueue As New Queue(Of Char)(source.ToCharArray())
    Dim currentMatch As Match = Nothing
    Dim matchDefinition As KeyValuePair(Of String, String)
    Dim currentSource As String

    ' Test the condition first so an empty source returns no tokens
    ' instead of failing on Peek.
    Do Until sourceQueue.Count = 0
        ' Rebuild the remaining, unconsumed source.
        currentSource = New String(sourceQueue.ToArray())

        ' First definition whose pattern matches at the very start of the remaining source.
        matchDefinition = definitions.FirstOrDefault(Function(d)
                                                         currentMatch = New Regex(d.Value).Match(currentSource)
                                                         Return currentMatch.Success AndAlso currentMatch.Index = 0
                                                     End Function)

        If Not String.IsNullOrWhiteSpace(matchDefinition.Key) Then
            lexed.Add(New KeyValuePair(Of String, String)(matchDefinition.Key, currentMatch.Value))
            ' Consume the matched characters.
            For x As Integer = 0 To currentMatch.Value.Length - 1
                sourceQueue.Dequeue()
            Next
        Else
            Throw New InvalidProgramException(String.Format("'{0}' Invalid Character", sourceQueue.Peek()))
        End If
    Loop

    Return lexed.ToArray()
End Function
Here is how it currently works:
- Create a function that returns an array of KeyValuePair(Of String, String), where the Key is the category name and the Value is the lexeme.
- Set up the token definitions as a collection of KeyValuePair(Of String, String), where the Key is the category name and the Value is the matching RegEx pattern.
- Create a Queue(Of Char) populated with all the Chars that make up the source.
- Loop until the Queue is empty
- Inside the loop, get the first definition in the collection whose pattern matches the remaining source, with the match starting at index 0
- If a match is found, add it to the collection to be returned; otherwise throw an Invalid Character exception
- Dequeue one Char for each character of the match, so the matched text is removed from the Queue
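
One possible direction for the efficiency question, as a sketch rather than a definitive answer: the steps above rebuild the remaining source string and construct new Regex objects on every pass through the loop. The hypothetical ScanByIndex below builds each Regex once, keeps an integer position instead of a Queue(Of Char), and relies on the \G anchor together with Regex.Match(input, startat), so a pattern can only match at the current position. The function name, its parameters, and the sample definitions in Main are assumptions, not part of the original code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module LexerSketch
    ' Hypothetical alternative to Scan; the name and parameters are illustrative.
    Function ScanByIndex(ByVal source As String,
                         ByVal definitions As IEnumerable(Of KeyValuePair(Of String, String))) As KeyValuePair(Of String, String)()
        ' Build each Regex once. \G ties a match to the start position
        ' passed to Match, which replaces the "Index = 0" check.
        Dim compiled = definitions.Select(
            Function(d) New KeyValuePair(Of String, Regex)(d.Key, New Regex("\G(?:" & d.Value & ")"))).ToArray()

        Dim lexed As New List(Of KeyValuePair(Of String, String))
        Dim position As Integer = 0

        Do While position < source.Length
            Dim found As Boolean = False

            For Each definition In compiled
                Dim m As Match = definition.Value.Match(source, position)
                ' Ignore failed and zero-length matches so the loop always advances.
                If m.Success AndAlso m.Length > 0 Then
                    lexed.Add(New KeyValuePair(Of String, String)(definition.Key, m.Value))
                    position += m.Length
                    found = True
                    Exit For
                End If
            Next

            If Not found Then
                Throw New InvalidProgramException(String.Format("'{0}' Invalid Character", source(position)))
            End If
        Loop

        Return lexed.ToArray()
    End Function

    Sub Main()
        ' Illustrative definitions only, not the full set from the code above.
        Dim definitions = {
            New KeyValuePair(Of String, String)("number", "\d+"),
            New KeyValuePair(Of String, String)("literal", "[\[\],]")
        }

        For Each token In ScanByIndex("[1,22]", definitions)
            Console.WriteLine("{0}: {1}", token.Key, token.Value)
        Next
    End Sub
End Module
```

With the two sample definitions, the input "[1,22]" should come back as five tokens: "[", "1", ",", "22" and "]". Wrapping each pattern in \G(?:...) adds no capturing group, so backreferences such as \1 in an existing pattern keep their meaning.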