I'd like to know whether this lexical analyzer can be made more efficient:
Code:
    Private Function Scan(ByVal source As String) As KeyValuePair(Of String, String)()
        Dim lexed As New List(Of KeyValuePair(Of String, String))
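        ' Each definition pairs a category name (Key) with its RegEx pattern (Value).
        ' The number pattern is written as 0|[1-9]\d* so that values ending in 0 (e.g. 10) lex as one token.
        Dim definitions() As KeyValuePair(Of String, String) = {
            New KeyValuePair(Of String, String)("number", "-?(0|[1-9]\d*)(\.\d+)?"),
            New KeyValuePair(Of String, String)("string", "([""'])(?:(?=(\\?))\2.)*?\1"),
            New KeyValuePair(Of String, String)("keyword", "true|false|null"),
            New KeyValuePair(Of String, String)("literal", "[\[\]{},.:]")}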
        Dim sourceQueue As New Queue(Of Char)(source.ToCharArray)
        Dim currentMatch As Match
        Dim matchDefinition As KeyValuePair(Of String, String)
        Dim currentSource As String
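        Do
            ' Rebuild the remaining source as a String on every pass.
            currentSource = New String(sourceQueue.ToArray())
            ' Take the first definition whose pattern matches at the very start of the remaining source.
            matchDefinition = definitions.FirstOrDefault(Function(d)
                currentMatch = New Regex(d.Value).Match(currentSource)
                Return currentMatch.Success AndAlso currentMatch.Index = 0
            End Function)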
            If Not String.IsNullOrWhiteSpace(matchDefinition.Key) Then
                lexed.Add(New KeyValuePair(Of String, String)(matchDefinition.Key, currentMatch.Value))
                ' Remove the matched characters from the front of the queue.
                For x As Integer = 0 To currentMatch.Value.Length - 1
                    sourceQueue.Dequeue()
                Next
            Else
                Throw New InvalidProgramException(String.Format("'{0}' Invalid Character", sourceQueue.Peek))
            End If
        Loop Until sourceQueue.Count = 0

        Return lexed.ToArray()
    End Function
Here is how it currently works:
  1. Create a function that returns an array of KeyValuePair(Of String, String), where the Key is the category name and the Value is the lexeme.
  2. Define the token definitions in a collection of KeyValuePair(Of String, String), where the Key is the category name and the Value is the matching RegEx pattern.
  3. Create a Queue(Of Char) populated with all the Chars that make up the source.
  4. Loop until the Queue is empty.
  5. Inside the loop, get the first definition in the collection whose pattern matches the remaining source at index 0.
  6. If a match is found, add it to the collection to be returned; otherwise throw an Invalid Character exception.
  7. Dequeue one Char from the Queue for each character in the match.
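
For comparison, the steps above can also be sketched with an index into the source instead of a Queue(Of Char), so the remaining source is not rebuilt as a new String on every pass. This is only a rough sketch under several assumptions of mine, not a measured improvement: the module name LexerSketch, the category names "string" and "keyword", and the number pattern are all hypothetical, and each pattern is prefixed with \G so that Regex.Match(input, startat) only succeeds at the current position. The Regex objects are built once with RegexOptions.Compiled rather than inside the loop.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module LexerSketch
    ' Hypothetical definitions: category name (Key) and a prebuilt Regex (Value).
    ' \G anchors each pattern at the start position passed to Match.
    Private ReadOnly Definitions As KeyValuePair(Of String, Regex)() = {
        New KeyValuePair(Of String, Regex)("number", New Regex("\G-?(?:0|[1-9]\d*)(?:\.\d+)?", RegexOptions.Compiled)),
        New KeyValuePair(Of String, Regex)("string", New Regex("\G([""'])(?:(?=(\\?))\2.)*?\1", RegexOptions.Compiled)),
        New KeyValuePair(Of String, Regex)("keyword", New Regex("\Gtrue|\Gfalse|\Gnull", RegexOptions.Compiled)),
        New KeyValuePair(Of String, Regex)("literal", New Regex("\G[\[\]{},.:]", RegexOptions.Compiled))}

    Public Function Scan(ByVal source As String) As KeyValuePair(Of String, String)()
        Dim lexed As New List(Of KeyValuePair(Of String, String))
        Dim position As Integer = 0

        While position < source.Length
            Dim matched As Boolean = False

            ' Try each definition in order; the first one that matches at position wins.
            For Each definition As KeyValuePair(Of String, Regex) In Definitions
                Dim current As Match = definition.Value.Match(source, position)
                If current.Success AndAlso current.Length > 0 Then
                    lexed.Add(New KeyValuePair(Of String, String)(definition.Key, current.Value))
                    ' Advance past the lexeme instead of dequeuing characters.
                    position += current.Length
                    matched = True
                    Exit For
                End If
            Next

            If Not matched Then
                Throw New InvalidProgramException(String.Format("'{0}' Invalid Character", source(position)))
            End If
        End While

        Return lexed.ToArray()
    End Function
End Module
```

With these assumed definitions, LexerSketch.Scan("[10,true]") should return five pairs: literal "[", number "10", literal ",", keyword "true" and literal "]". Like the original, the sketch has no whitespace definition, so a space in the source still raises the Invalid Character exception.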