Results 1 to 2 of 2

Thread: Syntax Analyzer

  1. #1

    Thread Starter
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    11,715

    Syntax Analyzer

    I'm wanting to create an interpreter for the language brain****(mask picks it up but the last for letters rhymes with duck) and I'm wanting to make this as efficient as I can. Right now I am on the syntax analysis phase, I skipped the lexical analysis because there really is no need for it considering the language is so small and the syntax will never change. This is what I currently have:
    Code:
        Private Function Parse(ByVal source As String) As XDocument
            Dim parentNode As New XElement("brain****")
            Dim commandNode As XElement = Nothing
            Dim index As Integer = 0
            If index < source.Length AndAlso source(index) = "]"c Then
                'If the source code starts with a closed bracket, inform the user of the syntax error
                Throw New SyntaxErrorException("The source code cannot start with a closed bracket.")
            Else
                'Loop while there is still source code to parse
                Do While index < source.Length
                    'Parse the current source code
                    commandNode = ParseCommand(source, index)
                    If commandNode IsNot Nothing Then
                        If commandNode.Name.LocalName = "command" OrElse commandNode.Value.Length > 0 Then
                            'Only add commands or non-empty loops
                            parentNode.Add(commandNode)
                        End If
                    Else
                        'If the current lexeme is a ']' then inform the user of the syntax error
                        Throw New SyntaxErrorException("Bracket mismatch. There are more closed brackets then there are open brackets.")
                    End If
                Loop
            End If
    
            Return New XDocument(New XDeclaration("1.0", "utf-8", "yes"), parentNode)
        End Function
    
        CONST nonLoopCommands As String = ",.<>-+"
    
        Private Function ParseCommand(ByVal source As String, ByRef index As Integer) As XElement
            Dim terminalCommand As XElement = Nothing
            Dim lexeme As Char = source(index)
            If lexeme = "["c Then
                'If the current lexeme is a '[' then start to parse the <loop> terminal and consume the lexeme
                index += 1
                If index < source.Length Then
                    lexeme = source(index)
                    terminalCommand = New XElement("loop")
                    If lexeme = "]"c Then
                        'If the current lexeme is a ']' then we have a blank <loop> terminal, consume the lexeme
                        index += 1
                    Else
                        'If the current lexeme is not a ']' then continue parsing the <loop> terminal
                        Dim terminalCode As XElement = Nothing
                        'Loop while there is still source left to parse and while something was returned
                        Do
                            'Set the current value of the <command> terminal
                            terminalCode = ParseCommand(source, index)
                            'If the returning <command> terminal is not a blank loop then...
                            If terminalCode IsNot Nothing AndAlso terminalCode.Value.Length > 0 Then
                                'Add it to the <loop> terminal
                                terminalCommand.Add(terminalCode)
                            End If
                        Loop While index < source.Length AndAlso terminalCode IsNot Nothing
    
                        'If the last parsed terminalCode was succesfully parsed then inform the program of the syntax error
                        If terminalCode IsNot Nothing Then
                            Throw New SyntaxErrorException("Bracket mismatch. There are more open brackets then there are closed brackets.")
                        End If
                    End If
                Else
                    'Inform the programmer of the syntax error
                    Throw New SyntaxErrorException("The source code cannot end with an open bracket.")
                End If
            ElseIf lexeme = "]"c Then
                'If the current lexeme is a ']' then simply consume the lexeme
                index += 1
            ElseIf nonLoopCommands.IndexOf(lexeme) <> -1 Then
                'If the current lexeme is any of the non-loop commands(,.<>-+) then create a new <command> terminal with the lexeme as the value and consume the lexeme
                terminalCommand = New XElement("command", lexeme)
                index += 1
            Else
                'If all else fails then inform the user of the syntax error
                Throw New SyntaxErrorException(String.Format("'{0}' is an invalid command symbol.", lexeme))
            End If
    
            Return terminalCommand
        End Function
    
        Private Function FormatSource(ByVal source As String) As String
            'Define the commands to keep
            Dim commands As String = nonLoopCommands & "[]"
            'Declare something to hold the source code to be returned
            Dim builder As New Text.StringBuilder(source.Length - 1)
            'Loop through each character in the source code
            For index As Integer = 0 To source.Length - 1
                Dim lexeme As Char = source(index)
                'Check if the current character is one of the commands to keep
                If commands.IndexOf(lexeme) <> -1 Then
                    'If so then append it to the builder
                    builder.Append(lexeme)
                End If
            Next
    
            Return builder.ToString()
        End Function
    It is a recursive descent parser that returns an XDocument to represent the abstract syntax tree. Is there anything that y'all notice that I could make this more efficient?

    My next phase is optimization where I intend on remove repeating nodes and adding an attribute on the repeating node indicating how many times it repeated. For example:
    Code:
    >++++++++[<+++++++++>-]
    Would build a parse tree with 9 command nodes followed by a loop node with 12 command nodes. Where as the optimization tree would have 2 command nodes followed by a loop node with 4 command nodes, basically transforming it to:
    Code:
    command(>, 1) command(+, 8) loop(command(<, 1) command(+, 9) command(>, 1) command(-, 1))
    "Code is like humor. When you have to explain it, it is bad." - Cory House
    VbLessons | Code Tags | Sword of Fury - Jameram

  2. #2

    Thread Starter
    Super Moderator dday9's Avatar
    Join Date
    Mar 2011
    Location
    South Louisiana
    Posts
    11,715

    Re: Syntax Analyzer

    So, I was able to reduce the size of the code, although I'm unsure if this makes the code more efficient or not. Instead of a recursive descent parser, because of the simplicity of brainf**k's syntax, I was able to roll out a simple brute force parser:
    Code:
    Private Const commands As String = ",.<>-+"
    Private Const openLoop As Char = "["c
    Private Const closeLoop As Char = "]"c
    Private Function Parse(ByVal source As String) As XElement
        Dim parent As New XElement("brain****")
        Dim currentParent As XElement = parent
    
        For Each command As Char In source
            If commands.IndexOf(command) <> -1 Then
                currentParent.Add(New XElement("command", command))
            ElseIf command = openLoop Then
                Dim loopCommand As XElement = New XElement("loop")
                currentParent.Add(loopCommand)
                currentParent = loopCommand
            ElseIf command = closeLoop AndAlso currentParent.Parent IsNot Nothing Then
                currentParent = currentParent.Parent
            ElseIf command = closeLoop Then
                Throw New SyntaxErrorException("Bracket Mismatch. There are more close brackets than opened.")
            Else
                Throw New SyntaxErrorException("Invalid Command. '{0}' is an invalid brain**** command lexeme.")
            End If
        Next
    
        If currentParent IsNot parent Then
            Throw New SyntaxErrorException("Bracket Mismatch. There are more open brackets than closed.")
        End If
    
        Return parent
    End Function
    "Code is like humor. When you have to explain it, it is bad." - Cory House
    VbLessons | Code Tags | Sword of Fury - Jameram

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width