-
Dec 10th, 2013, 12:37 AM
#1
high level compiler
First off, before I start, I want to note that I'm not sure whether this belongs here in the General Developer Forum, the Vb.Net forum, or even the Code it Better forum, but to start I'll post here. Just notify a mod if it should be moved.
I want to learn more about how a compiler really works, so I'm creating my own basic one. Right now I'm on the variable declaration portion, and here is what I've got so far:
Declaration/Initialization
Code:
set <var_name> as <type> (optional) = <value>
Types
type | description | vb.net equivalent
text | Represents any form of text | System.String
num | Represents any number, positive or negative, without any decimal places | System.Int32
decimal | Represents any number, positive or negative, with decimal places | System.Decimal
bool | True or False value | System.Boolean
Examples
Code:
set foo_value as text
set foo_value as text = "hello world"
set i as num = 0
set d as decimal = 4.5
set quit as bool = False
Logic
If the first word is "set" then move on, otherwise return a syntax error
The second word is the variable name, move on
If the third word is "as" then move on, otherwise return a syntax error
If the fourth word is: text, num, decimal, or bool then move on, otherwise return a syntax error
If the fifth word is nothing, then end statement. If the fifth word is "=" then move on. Otherwise return a syntax error
If the sixth word meets the qualifications of the type, then end statement. Otherwise return a syntax error
I've got all that down. Now I've put my coding skills to the test and come up with this:
Code:
Private Function Declaration(ByVal input As String) As String
'Set up an array of strings that will be the tokens we parse
Dim tokens() As String = input.Split({" "}, StringSplitOptions.None)
'declare, but don't initialize a:
'String for the name
'Type for the type
'Object for the value
Dim var_name As String
Dim var_type As Type
Dim var_value As Object
'If the token count is less than 4 or the first word isn't set then return a syntax error
'Otherwise move on
If tokens.Count > 3 AndAlso tokens(0).ToLower = "set" Then
'the variable's name is the 2nd word
var_name = tokens(1)
'if the 3rd word isn't as then return a syntax error
If tokens(2).ToLower = "as" Then
'Try to convert the string to a type
'If that fails, then return a syntax error
Dim temp_type As String = tokens(3).ToLower
Select Case temp_type
Case "text"
var_type = GetType(String)
Case "num"
var_type = GetType(Integer)
Case "decimal"
var_type = GetType(Double)
Case "bool"
var_type = GetType(Boolean)
Case Else
Return "Syntax Error"
End Select
'If the token count is 4 then the user elected to use this declaration:
'set <name> as <type>
If tokens.Count = 4 Then
Return String.Format("You've successfully declared an object.{0}Vb.Net Equivalent: Dim {1} As {2}", _
Environment.NewLine, var_name, var_type.ToString)
'Otherwise the user elected to set the variable
'Next we check if the count is at least 6 and the 5th word is =
'If not, then return a syntax error
ElseIf tokens.Count >= 6 AndAlso tokens(4) = "=" Then
'Next we try to convert the type:
'
'text = System.String
'num = System.Int32
'decimal = System.Decimal
'bool = System.Boolean
'The num, dec, and bool are easy because those 3 types have TryParse
'If the TryParse returns a false value, then return a syntax error
'The string is a bit tricky...
Select Case var_type
Case GetType(String)
'First we get all the text left over by iterating through the remaining array items
Dim temp_text As String = String.Empty
For i As Integer = 5 To tokens.Length - 1
'The temp_text adds that word along with a blank space
'This is to keep the spacing in the string
temp_text &= tokens(i) & " "
Next
'Here we remove the last letter because that is a space that isn't needed
temp_text = temp_text.Substring(0, temp_text.Length - 1)
'Here we check if the first and last characters are a double quote
'If not, then return a syntax error
If temp_text.Substring(0, 1) = """" AndAlso temp_text.Substring(temp_text.Length - 1) = """" Then
'Here we get all the text in between the opening and closing quotes
temp_text = temp_text.Substring(1, temp_text.Length - 2)
'If the text contains any other double quotes, then return a syntax error
'Otherwise we've got the value of the string
If temp_text.Contains("""") = False Then
var_value = """" & temp_text & """"
Else
Return "Syntax Error"
End If
Else
Return "Syntax Error"
End If
Case GetType(Integer)
If Integer.TryParse(tokens(5), New Integer) Then
var_value = CInt(tokens(5))
Else
Return "Syntax Error"
End If
Case GetType(Double)
If Double.TryParse(tokens(5), New Double) Then
var_value = CDbl(tokens(5))
Else
Return "Syntax Error"
End If
Case GetType(Boolean)
If Boolean.TryParse(tokens(5), New Boolean) Then
var_value = CBool(tokens(5))
Else
Return "Syntax Error"
End If
End Select
'Let the user know that they've declared an object and what the vb.net equivalent would be
Return String.Format("You've successfully declared an object.{0}Vb.Net Equivalent: Dim {1} As {2} = {3}", _
Environment.NewLine, var_name, var_type.ToString, var_value.ToString)
Else
Return "Syntax Error"
End If
Else
Return "Syntax Error"
End If
Else
Return "Syntax Error"
End If
End Function
Is that basically how a compiler works? It seems like it would be pretty slow. It isn't slow here because I'm only declaring one variable, but I can see it slowing down once there's quite a bit of work to do. What do y'all think?
-
Dec 10th, 2013, 05:49 AM
#2
Re: high level compiler
It's more a syntax checker than a compiler.
http://en.wikipedia.org/wiki/Compiler
-
Dec 10th, 2013, 09:33 AM
#3
Re: high level compiler
Well my plan is to check the syntax first, then once the syntax is correct, only return the vb.net equivalent. Because from there I can compile that vb.net code using Boops Boops codebank post here.
-
Dec 10th, 2013, 09:47 AM
#4
Re: high level compiler
Well, compilation is not really a simple process. Typically you start with a lexical analyzer whose output is fed to a parser, whose output is then fed to a compiler. Each of these processes is quite complicated in its own right.
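To make the first two stages concrete, here is a minimal sketch of a lexer feeding a parser for the `set` statement from post #1. The names `TokenKind`, `Token`, `Lex`, and `ParseDeclaration` are made up for this example, and only whole-number initial values are handled, so treat it as an illustration rather than a finished design.

```vbnet
Option Strict On
Option Explicit On
Imports System.Collections.Generic

Module PipelineSketch
    ' Hypothetical token kinds for the tiny "set" language in this thread.
    Public Enum TokenKind
        Keyword
        TypeName
        Identifier
        EqualSign
        Number
    End Enum

    Public Structure Token
        Public Kind As TokenKind
        Public Text As String
    End Structure

    ' Stage 1, lexical analysis: raw text in, classified tokens out.
    Public Function Lex(ByVal source As String) As List(Of Token)
        Dim result As New List(Of Token)
        For Each word As String In source.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
            Dim t As Token
            t.Text = word
            Select Case word.ToLower()
                Case "set", "as"
                    t.Kind = TokenKind.Keyword
                Case "text", "num", "decimal", "bool"
                    t.Kind = TokenKind.TypeName
                Case "="
                    t.Kind = TokenKind.EqualSign
                Case Else
                    Dim n As Integer
                    If Integer.TryParse(word, n) Then
                        t.Kind = TokenKind.Number
                    Else
                        t.Kind = TokenKind.Identifier
                    End If
            End Select
            result.Add(t)
        Next
        Return result
    End Function

    ' Stage 2, parsing: tokens in, a yes/no answer about the grammar out.
    ' Accepts only: set <identifier> as <type> [= <number>]
    Public Function ParseDeclaration(ByVal tokens As List(Of Token)) As Boolean
        If tokens.Count <> 4 AndAlso tokens.Count <> 6 Then Return False
        If tokens(0).Text.ToLower() <> "set" Then Return False
        If tokens(1).Kind <> TokenKind.Identifier Then Return False
        If tokens(2).Text.ToLower() <> "as" Then Return False
        If tokens(3).Kind <> TokenKind.TypeName Then Return False
        If tokens.Count = 4 Then Return True
        Return tokens(4).Kind = TokenKind.EqualSign AndAlso tokens(5).Kind = TokenKind.Number
    End Function

    Sub Main()
        Console.WriteLine(ParseDeclaration(Lex("set i as num = 0"))) ' True
        Console.WriteLine(ParseDeclaration(Lex("set as num")))       ' False
    End Sub
End Module
```

The point of the split is that the parser never looks at raw characters. It only asks what kind each token is, which keeps the grammar rules short.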
-
Dec 10th, 2013, 09:55 AM
#5
Re: high level compiler
Originally Posted by dday9
Well my plan is to check the syntax first, then once the syntax is correct, only return the vb.net equivalent. Because from there I can compile that vb.net code using Boops Boops codebank post here.
That would be a Translator, translating from one programming language to another.
-
Dec 10th, 2013, 11:49 AM
#6
Re: high level compiler
So I guess an example of a lexical analyzer would be something like this then:
A base analyzer so whenever I expand to other things I can just inherit the Analyzer
Code:
Option Strict On
Option Explicit On
Public MustInherit Class Analyzer
#Region "Globals"
Private source_code As String
Private _end_statement As New List(Of String)
Private _identifiers As New List(Of String)
Private _keywords As New List(Of String)
Private _literals As New List(Of String)
Private _operators As New List(Of String)
Private _statements() As String = {}
Private _line_errors As New Dictionary(Of String, Exception)
#End Region
#Region "Properties"
Public Property Source() As String
Get
Return source_code
End Get
Set(ByVal value As String)
source_code = value
End Set
End Property
Public Property EndStatements() As List(Of String)
Get
Return _end_statement
End Get
Set(ByVal value As List(Of String))
_end_statement = value
End Set
End Property
Public Property Identifiers() As List(Of String)
Get
Return _identifiers
End Get
Set(ByVal value As List(Of String))
_identifiers = value
End Set
End Property
Public Property Keywords() As List(Of String)
Get
Return _keywords
End Get
Set(ByVal value As List(Of String))
_keywords = value
End Set
End Property
Public Property Literals() As List(Of String)
Get
Return _literals
End Get
Set(ByVal value As List(Of String))
_literals = value
End Set
End Property
Public Property Operators() As List(Of String)
Get
Return _operators
End Get
Set(ByVal value As List(Of String))
_operators = value
End Set
End Property
Public Property Statements() As String()
Get
Return _statements
End Get
Set(value As String())
_statements = value
End Set
End Property
Public Property LineErrors() As Dictionary(Of String, Exception)
Get
Return _line_errors
End Get
Set(ByVal value As Dictionary(Of String, Exception))
_line_errors = value
End Set
End Property
#End Region
End Class
An analyzer specifically for declaration/initialization
Code:
Option Strict On
Option Explicit On
Public Class Declaration_Analyzer
Inherits Analyzer
#Region "Properties"
Private _parser As Declaration_Parser = New Declaration_Parser
Public Property DeclarationParser() As Declaration_Parser
Get
Return _parser
End Get
Set(ByVal value As Declaration_Parser)
_parser = value
End Set
End Property
#End Region
#Region "Private Subs and Functions"
Private Sub load_lexeme()
'_keywords include: set, as, text, num, decimal, bool
Me.Keywords.AddRange({"set", "as", "text", "num", "decimal", "bool"})
'_literals include: 0 - 9, A - Z, True, False, double quote, and decimal point
Me.Literals.AddRange({"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", _
"""", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", _
"True", "False", "."})
'_operators include: addition, subtraction, multiplication, division, and power
Me.Operators.AddRange({"+", "-", "*", "/", "^"})
'End of statement is new line
Me.EndStatements.Add(Environment.NewLine)
End Sub
Private Function GetStatements(ByVal input As String) As String()
Return input.Split(Me.EndStatements.ToArray, StringSplitOptions.None)
End Function
Private Function AnalyseStatements(ByVal statements() As String) As String()
Dim non_identifiable_tokens As New List(Of String)
For Each line As String In statements
Dim words() As String = line.Split({" "}, StringSplitOptions.None)
For Each w As String In words
If Me.Literals.Contains(w) = False AndAlso _
Me.Operators.Contains(w) = False AndAlso _
Me.EndStatements.Contains(w) = False Then
non_identifiable_tokens.Add(line)
Exit For
End If
Next
Next
Return non_identifiable_tokens.ToArray
End Function
Private Sub GetIdentifiers(ByVal non_identifiable_tokens() As String)
For Each statement As String In non_identifiable_tokens
Dim attempt As String = _parser.Declaration(statement)
If attempt.ToLower = "syntax error" Then
Me.LineErrors.Add(statement, New SyntaxErrorException)
ElseIf Me.Literals.Contains(attempt) = True OrElse _
Me.Operators.Contains(attempt) = True OrElse _
Me.EndStatements.Contains(attempt) = True Then
Me.LineErrors.Add(statement, New SyntaxErrorException("Keyword is not a valid identifier."))
Else
Me.Identifiers.Add(attempt)
End If
Next
End Sub
Private Function Tokenize() As Boolean
Me.Statements = GetStatements(Me.Source)
Dim non_identifiable_tokens() As String = AnalyseStatements(Statements)
Call GetIdentifiers(non_identifiable_tokens)
If Me.LineErrors.Count = 0 Then
Return True
Else
Me.Statements = {}
Return False
End If
End Function
#End Region
#Region "Public Methods"
Public Function Convert_To_Vb() As String
Dim str As String = String.Empty
If Tokenize() Then
str = _parser.VbEquivalent
Else
For Each syntaxerror As KeyValuePair(Of String, Exception) In Me.LineErrors
str &= String.Format("An error occurred on line:{1}{0}{2}{0}", Environment.NewLine, syntaxerror.Value, syntaxerror.Key)
Next
End If
Return str
End Function
#End Region
Public Sub New()
Call load_lexeme()
End Sub
End Class
A parser specifically for declaration/initialization
Code:
Option Strict On
Option Explicit On
Public Class Declaration_Parser
Private _vb As String
Public Property VbEquivalent() As String
Get
Return _vb
End Get
Set(ByVal value As String)
_vb = value
End Set
End Property
Public Function Declaration(ByVal input As String) As String
'Set up an array of strings that will be the tokens we parse
Dim tokens() As String = input.Split({" "}, StringSplitOptions.None)
'declare, but don't initialize a:
'String for the name
'Type for the type
'Object for the value
Dim var_name As String
Dim var_type As Type
Dim var_value As Object = Nothing
'If the token count is less than 4 or the first word isn't set then return a syntax error
'Otherwise move on
If tokens.Count > 3 AndAlso tokens(0).ToLower = "set" Then
'the variable's name is the 2nd word
var_name = tokens(1)
'if the 3rd word isn't as then return a syntax error
If tokens(2).ToLower = "as" Then
'Try to convert the string to a type
'If that fails, then return a syntax error
Dim temp_type As String = tokens(3).ToLower
Select Case temp_type
Case "text"
var_type = GetType(String)
Case "num"
var_type = GetType(Integer)
Case "decimal"
var_type = GetType(Double)
Case "bool"
var_type = GetType(Boolean)
Case Else
Return "Syntax Error"
End Select
'If the token count is 4 then the user elected to use this declaration:
'set <name> as <type>
If tokens.Count = 4 Then
_vb &= String.Format("Dim {0} As {1}{2}", var_name, var_type.ToString, Environment.NewLine)
Return var_name
'Otherwise the user elected to set the variable
'Next we check if the count is at least 6 and the 5th word is =
'If not, then return a syntax error
ElseIf tokens.Count >= 6 AndAlso tokens(4) = "=" Then
'Next we try to convert the type:
'
'text = System.String
'num = System.Int32
'decimal = System.Decimal
'bool = System.Boolean
'The num, dec, and bool are easy because those 3 types have TryParse
'If the TryParse returns a false value, then return a syntax error
'The string is a bit tricky...
Select Case var_type
Case GetType(String)
'First we get all the text left over by iterating through the remaining array items
Dim temp_text As String = String.Empty
For i As Integer = 5 To tokens.Length - 1
'The temp_text adds that word along with a blank space
'This is to keep the spacing in the string
temp_text &= tokens(i) & " "
Next
'Here we remove the last letter because that is a space that isn't needed
temp_text = temp_text.Substring(0, temp_text.Length - 1)
'Here we check if the first and last characters are a double quote
'If not, then return a syntax error
If temp_text.Substring(0, 1) = """" AndAlso temp_text.Substring(temp_text.Length - 1) = """" Then
'Here we get all the text in between the opening and closing quotes
temp_text = temp_text.Substring(1, temp_text.Length - 2)
'If the text contains any other double quotes, then return a syntax error
'Otherwise we've got the value of the string
If temp_text.Contains("""") = False Then
var_value = """" & temp_text & """"
Else
Return "Syntax Error"
End If
Else
Return "Syntax Error"
End If
Case GetType(Integer)
If Integer.TryParse(tokens(5), New Integer) Then
var_value = CInt(tokens(5))
Else
Return "Syntax Error"
End If
Case GetType(Double)
If Double.TryParse(tokens(5), New Double) Then
var_value = CDbl(tokens(5))
Else
Return "Syntax Error"
End If
Case GetType(Boolean)
If Boolean.TryParse(tokens(5), New Boolean) Then
var_value = CBool(tokens(5))
Else
Return "Syntax Error"
End If
End Select
_vb &= String.Format("Dim {0} As {1} = {2}{3}", var_name, var_type.ToString, var_value.ToString, Environment.NewLine)
'Let the user know that they've declared an object and what the vb.net equivalent would be
Return var_name
Else
Return "Syntax Error"
End If
Else
Return "Syntax Error"
End If
Else
Return "Syntax Error"
End If
End Function
End Class
And then I'd call something like this:
Code:
Option Strict On
Option Explicit On
Module Module1
Sub Main()
Dim analysis As New Declaration_Analyzer
analysis.Source = ReadFile(IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments), "source.txt"))
Console.WriteLine(analysis.Convert_To_Vb)
Console.ReadLine()
End Sub
Private Function ReadFile(ByVal path As String) As String
Return IO.File.ReadAllText(path)
End Function
End Module
But then again, I guess it's sort of a lexical analyzer mixed with a parser, because I parse every statement that contains a word that isn't a part of my keywords, literals, operators, and end_statement lists. But I parse it to return either A) an Identifier token or B) an Exception token. And on top of it all, it's only a translator, as I just convert the custom source code to vb.net code.
Last edited by dday9; Dec 10th, 2013 at 06:24 PM.
-
Dec 12th, 2013, 07:40 PM
#7
Re: high level compiler
As mentioned in this thread... here is the updated code:
Code:
Option Strict On
Option Explicit On
Module Module1
Sub Main()
Do Until True = False
Dim foo_analyzer As New Analyzer_Declaration
foo_analyzer.Source = Console.ReadLine
Dim foo_parser As New Parser_Declaration
foo_parser.Analyzer = foo_analyzer
If foo_parser.CanParse Then
Console.WriteLine(String.Format("Source code parsed properly{0}Vb.Net Equivalent - {1}{0}", Environment.NewLine, foo_parser.ReturnedObject.GetEquivalent))
Else
For Each ex As Exception In foo_parser.Errors
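'Report the statement that was just parsed; calling Console.ReadLine here would block waiting for new input
Console.WriteLine(String.Format("Error on line:{0}{1}{2}{1}", foo_analyzer.Source, Environment.NewLine, ex))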
Next
End If
Loop
End Sub
End Module
Public MustInherit Class Analyzer
#Region "Globals"
Private source_code As String
Private m_end_statement As New List(Of String)
Private m_identifiers As New List(Of String)
Private m_keywords As New List(Of String)
Private m_text_literals As New List(Of String)
Private m_num_literals As New List(Of String)
Private m_decimal_literals As New List(Of String)
Private m_bool_literals As New List(Of String)
Private m_operators As New List(Of String)
Private m_types As New List(Of String)
#End Region
#Region "Properties"
Public Property Source() As String
Get
Return source_code
End Get
Set(ByVal value As String)
source_code = value
End Set
End Property
Public ReadOnly Property EndStatements() As List(Of String)
Get
Return m_end_statement
End Get
End Property
Public Property Identifiers() As List(Of String)
Get
Return m_identifiers
End Get
Set(ByVal value As List(Of String))
m_identifiers = value
End Set
End Property
Public Property Keywords() As List(Of String)
Get
Return m_keywords
End Get
Set(ByVal value As List(Of String))
m_keywords = value
End Set
End Property
Public ReadOnly Property TextLiterals() As List(Of String)
Get
Return m_text_literals
End Get
End Property
Public ReadOnly Property NumLiterals() As List(Of String)
Get
Return m_num_literals
End Get
End Property
Public ReadOnly Property DecimalLiterals() As List(Of String)
Get
Return m_decimal_literals
End Get
End Property
Public ReadOnly Property BoolLiterals() As List(Of String)
Get
Return m_bool_literals
End Get
End Property
Public ReadOnly Property Types() As List(Of String)
Get
Return m_types
End Get
End Property
Public ReadOnly Property Operators() As List(Of String)
Get
Return m_operators
End Get
End Property
#End Region
Public Sub New()
'Text literals include: the whole alphabet and double quotes
m_text_literals.AddRange({"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", _
""""})
'Num literals include: 0 - 9
m_num_literals.AddRange({"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"})
'Decimal literals include: the negative sign, the decimal point, and 0 - 9
m_decimal_literals.AddRange({"-", ".", _
"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"})
'Bool literals include: True and False
m_bool_literals.AddRange({"true", "false"})
'operators include: addition, subtraction, multiplication, division, and power
m_operators.AddRange({"+", "-", "*", "/", "^", "="})
'types include: text, num, decimal, bool
m_types.AddRange({"text", "num", "decimal", "bool"})
'End of statement is new line
m_end_statement.Add(Environment.NewLine)
End Sub
End Class
Public MustInherit Class Parser
#Region "Globals"
Protected m_errors As New List(Of Exception)
#End Region
#Region "Properties"
Public MustOverride Property Analyzer() As Analyzer
Public ReadOnly Property Errors() As List(Of Exception)
Get
Return m_errors
End Get
End Property
#End Region
End Class
Public Class Analyzer_Declaration
Inherits Analyzer
#Region "Globals"
Private m_tokens As New Dictionary(Of String, TokenType)
#End Region
#Region "Properties/Enums"
Public Enum TokenType
Identifier
Keyword
Literal
[Operator]
[Type]
End Enum
Public ReadOnly Property Tokens As Dictionary(Of String, TokenType)
Get
Return m_tokens
End Get
End Property
#End Region
#Region "Methods"
Public Sub GetTokens()
'Clear any tokens
m_tokens.Clear()
For Each word As String In Me.Source.Split({" "}, StringSplitOptions.None)
If Me.Keywords.Contains(word) Then
m_tokens.Add(word, TokenType.Keyword)
ElseIf Me.Operators.Contains(word) Then
m_tokens.Add(word, TokenType.Operator)
ElseIf Me.NumLiterals.Contains(word) OrElse _
Me.DecimalLiterals.Contains(word) OrElse _
Me.BoolLiterals.Contains(word) Then
m_tokens.Add(word, TokenType.Literal)
ElseIf Me.Types.Contains(word) Then
m_tokens.Add(word, TokenType.Type)
Else
m_tokens.Add(word, TokenType.Identifier)
End If
Next
End Sub
#End Region
End Class
Public Class Parser_Declaration
Inherits Parser
#Region "Globals"
Private m_analyzer As Analyzer_Declaration
Private returned_obj As Variable
#End Region
#Region "Properties"
Public Overrides Property Analyzer As Analyzer
Get
Return m_analyzer
End Get
Set(ByVal value As Analyzer)
m_analyzer = DirectCast(value, Analyzer_Declaration)
End Set
End Property
Public Overloads ReadOnly Property Errors() As List(Of Exception)
Get
Return m_errors
End Get
End Property
Public ReadOnly Property ReturnedObject As Variable
Get
Return returned_obj
End Get
End Property
#End Region
#Region "Methods"
Public Function CanParse() As Boolean
Dim parsable As Boolean = True
m_analyzer.GetTokens()
Dim token_array() As String = m_analyzer.Tokens.Keys.ToArray
Dim type_array() As Analyzer_Declaration.TokenType = m_analyzer.Tokens.Values.ToArray
Dim var_name As String = String.Empty
Dim var_type As Type = Nothing
Dim var_value As Object = Nothing
'A declaration needs at least 4 tokens (set <name> as <type>); fewer would make the index checks below go out of range
If token_array.Length > 3 Then
'If the first word isn't set then return a syntax error
If token_array(0).ToLower <> "set" Then
m_errors.Add(New SyntaxErrorException("Declarations start with the 'set' keyword."))
parsable = False
End If
'The variable's name is the second word, if it's an identifier
'If it's not then return a syntax error
If type_array(1) = Analyzer_Declaration.TokenType.Identifier Then
var_name = token_array(1)
Else
m_errors.Add(New SyntaxErrorException("Invalid identifier name."))
parsable = False
End If
'if the 3rd word isn't as then return a syntax error
If token_array(2).ToLower <> "as" Then
m_errors.Add(New SyntaxErrorException("When declaring a variable, the 'as' keyword follows the identifier."))
parsable = False
End If
'if the 4th word isn't a type, then return a syntax error
'if it is, then set the variable type
If type_array(3) = Analyzer_Declaration.TokenType.Type Then
Select Case token_array(3)
Case "text"
var_type = GetType(String)
Case "num"
var_type = GetType(Integer)
Case "decimal"
var_type = GetType(Decimal)
Case "bool"
var_type = GetType(Boolean)
End Select
Else
m_errors.Add(New SyntaxErrorException("Invalid data type."))
parsable = False
End If
'If there are no more words, then the user elected to use this declaration:
'set <name> as <type>
If token_array.Length = 4 Then
If parsable <> False Then
parsable = True
returned_obj = New Variable(var_name, var_type, var_value)
End If
Else
'Check if the 5th word is the equal sign, if not return an error
If token_array(4) <> "=" Then
m_errors.Add(New SyntaxErrorException("Invalid operator type. When initializing a variable, use the '=' sign."))
parsable = False
End If
'The num, dec, and bool are easy because those 3 types have TryParse
'If the TryParse returns a false value, then return a syntax error
'The string is a bit tricky...
Select Case var_type
Case GetType(String)
'First we get all the text left over by iterating through the remaining array items
Dim temp_text As String = String.Empty
For i As Integer = 5 To token_array.Length - 1
'The temp_text adds that word along with a blank space
'This is to keep the spacing in the string
temp_text &= token_array(i) & " "
Next
'Here we remove the last letter because that is a space that isn't needed
temp_text = temp_text.Substring(0, temp_text.Length - 1)
'Here we check if the first and last characters are a double quote
'If not, then return a syntax error
If temp_text.Substring(0, 1) = """" AndAlso temp_text.Substring(temp_text.Length - 1) = """" Then
'Here we get all the text in between the opening and closing quotes
temp_text = temp_text.Substring(1, temp_text.Length - 2)
'If the text contains any other double quotes, then return a syntax error
'Otherwise we've got the value of the string
If temp_text.Contains("""") = False Then
var_value = """" & temp_text & """"
Else
m_errors.Add(New SyntaxErrorException("Text cannot store double quotes."))
parsable = False
End If
Else
m_errors.Add(New SyntaxErrorException("Text variables store their value by wrapping the value in opening and closing quotation marks."))
parsable = False
End If
Case GetType(Integer)
If Integer.TryParse(token_array(5), New Integer) Then
var_value = CInt(token_array(5))
Else
m_errors.Add(New SyntaxErrorException("Value is not of Num type"))
parsable = False
End If
Case GetType(Decimal)
If Decimal.TryParse(token_array(5), New Decimal) Then
var_value = CDec(token_array(5))
Else
m_errors.Add(New SyntaxErrorException("Value is not of Decimal type"))
parsable = False
End If
Case GetType(Boolean)
If Boolean.TryParse(token_array(5), New Boolean) Then
var_value = CBool(token_array(5))
Else
m_errors.Add(New SyntaxErrorException("Value is not of Bool type."))
parsable = False
End If
End Select
If parsable <> False Then
parsable = True
'set the variable type
returned_obj = New Variable(var_name, var_type, var_value)
End If
End If
Else
parsable = False
End If
Return parsable
End Function
#End Region
End Class
Public Class Variable
#Region "Globals"
Private m_name As String
Private m_type As Type
Private m_val As Object
#End Region
#Region "Properties"
Public Property Name() As String
Get
Return m_name
End Get
Set(ByVal value As String)
m_name = value
End Set
End Property
Public Property Type() As Type
Get
Return m_type
End Get
Set(ByVal value As Type)
m_type = value
End Set
End Property
Public Property Value() As Object
Get
Return m_val
End Get
Set(ByVal value As Object)
m_val = value
End Set
End Property
#End Region
#Region "Methods"
Public Function GetEquivalent() As String
If m_val Is Nothing Then
Return String.Format("Dim {0} As {1}", m_name, m_type)
Else
Return String.Format("Dim {0} As {1} = {2}", m_name, m_type, m_val)
End If
End Function
#End Region
Public Sub New(ByVal _name As String, ByVal _type As Type, Optional ByVal _val As Object = Nothing)
m_name = _name
m_type = _type
m_val = _val
End Sub
End Class
moved to next post....
-
Dec 12th, 2013, 07:41 PM
#8
Re: high level compiler
Sorry, my last post was too long, so I'm just going to continue in 3, 2, 1...
So my analyzer processes all the words in a statement and stores them as tokens. My parser goes through every token and makes sure that they follow the rules (the logic I posted above) by calling a Boolean function. If that function returns False, the code cannot be parsed and the parser fills up a list of exceptions. If it returns True, the code is syntactically correct. I guess the next step would be to make sure that code with correct syntax is also valid in context. For example:
Code:
set myname as text = "david day"
set myname as text = "David Daniel Day"
wouldn't be valid because a variable called myname has already been declared. To correct it, I'd write something like:
Code:
set myname as text = "david day"
set myfullname as text = "David Daniel Day"
or
Code:
set myname as text = "david day"
myname = "David Daniel Day"
But what would come after that? How would I be able to perform that command without translating it to vb.net code and compiling it with the JIT compiler?
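The redeclaration check described here is usually done with a symbol table: a set of the names declared so far. Below is a rough sketch, assuming statements arrive one at a time as strings. `CheckStatement` and `declared` are hypothetical names, and treating names as case-insensitive is an assumption chosen to match VB.NET.

```vbnet
Option Strict On
Option Explicit On
Imports System.Collections.Generic

Module SymbolTableSketch
    ' Names declared so far; the comparer makes "MyName" and "myname" collide.
    Private ReadOnly declared As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)

    ' Returns Nothing when the statement is fine, otherwise an error message.
    Public Function CheckStatement(ByVal statement As String) As String
        Dim words() As String = statement.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
        If words.Length = 0 Then Return Nothing

        If words(0).ToLower() = "set" AndAlso words.Length > 1 Then
            ' Declaration: HashSet.Add returns False if the name already exists.
            If Not declared.Add(words(1)) Then
                Return "Variable '" & words(1) & "' is already declared."
            End If
        ElseIf words.Length > 1 AndAlso words(1) = "=" Then
            ' Assignment: the target must have been declared earlier.
            If Not declared.Contains(words(0)) Then
                Return "Variable '" & words(0) & "' has not been declared."
            End If
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim program() As String = {
            "set myname as text = ""david day""",
            "myname = ""David Daniel Day""",
            "set myname as text = ""again"""}
        For Each line As String In program
            Dim problem As String = CheckStatement(line)
            Console.WriteLine(If(problem Is Nothing, "ok", problem))
        Next
    End Sub
End Module
```

The first two statements print "ok" and the third reports that myname is already declared. In a fuller version the set would probably become a dictionary from name to type, so assignments can be type-checked too.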
-
Dec 12th, 2013, 10:39 PM
#9
Re: high level compiler
Originally Posted by dday9
But what would come after that? How would I be able to perform that command without translating it to vb.net code and compiling it with the JIT compiler?
You could write your own interpreter. Or you could make your own stack-based language and a VM that executes its instructions, and write a compiler that compiles for the VM.
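For a feel of what the stack-based VM option looks like, here is a toy sketch. The instruction set (`Push`, `Add`, `Store`, `Load`, `Print`) is invented for this example, it handles whole numbers only, and the `print` statement it mentions does not exist in the language from this thread.

```vbnet
Option Strict On
Option Explicit On
Imports System.Collections.Generic

Module StackVmSketch
    ' Hypothetical instruction set, invented for this example.
    Public Enum OpCode
        Push    ' push Operand onto the stack
        Add     ' pop two values, push their sum
        Store   ' pop a value into the variable called Name
        Load    ' push the value of the variable called Name
        Print   ' pop a value and write it to the console
    End Enum

    Public Structure Instruction
        Public Op As OpCode
        Public Operand As Integer
        Public Name As String

        Public Sub New(ByVal kind As OpCode, Optional ByVal value As Integer = 0, Optional ByVal varName As String = Nothing)
            Op = kind
            Operand = value
            Name = varName
        End Sub
    End Structure

    ' Executes the instructions in order and returns the variables at the end.
    Public Function Run(ByVal code As IEnumerable(Of Instruction)) As Dictionary(Of String, Integer)
        Dim operands As New Stack(Of Integer)
        Dim variables As New Dictionary(Of String, Integer)
        For Each ins As Instruction In code
            Select Case ins.Op
                Case OpCode.Push
                    operands.Push(ins.Operand)
                Case OpCode.Add
                    operands.Push(operands.Pop() + operands.Pop())
                Case OpCode.Store
                    variables(ins.Name) = operands.Pop()
                Case OpCode.Load
                    operands.Push(variables(ins.Name))
                Case OpCode.Print
                    Console.WriteLine(operands.Pop())
            End Select
        Next
        Return variables
    End Function

    Sub Main()
        ' What a compiler might emit for:  set i as num = 2   then   print i + 3
        Dim code As New List(Of Instruction) From {
            New Instruction(OpCode.Push, 2),
            New Instruction(OpCode.Store, varName:="i"),
            New Instruction(OpCode.Load, varName:="i"),
            New Instruction(OpCode.Push, 3),
            New Instruction(OpCode.Add),
            New Instruction(OpCode.Print)}
        Run(code) ' prints 5
    End Sub
End Module
```

With this split, the compiler's job is only to turn parsed statements into a list of instructions, and the VM loop does the actual work, so no VB.NET code has to be generated or compiled.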
-
Apr 22nd, 2014, 11:38 PM
#10
Re: high level compiler
Well I've fiddled around with the mini-language for a while now. I've changed the syntax to this:
Required:
Modifier
Identifier
As
Data Type
Optional:
=
Value
Modifiers are: local, global
As is literally the word "as"
Data types are: text, number, decimal, and Boolean
Equal sign is literally the "="
Value is determined by the data type.
Here is my scanner:
Code:
Option Strict On
Option Explicit On
Public Class Scanner
Private symbols() As String = {"~", "`", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "-", "+", "=", "{", "[", "}", "]", "|", "\", ":", ";", """", "'", "<", ",", ">", ".", "?", "/"}
Public Enum Token
Modifier
Identifier
[As]
Type
Equal
[String]
Number
[Decimal]
[Boolean]
EOL
Exception
[Null]
End Enum
Private pTokens As Dictionary(Of String, Token)
Public ReadOnly Property Tokens() As Dictionary(Of String, Token)
Get
Return pTokens
End Get
End Property
Private code As String
Public Property SourceCode() As String
Get
Return code
End Get
Set(ByVal value As String)
code = value
End Set
End Property
Public Sub Scan()
Dim lines() As String = code.Split({Environment.NewLine}, StringSplitOptions.None)
For Each line As String In lines
For Each word As String In line.Split({" "}, StringSplitOptions.None)
If word.ToLower = "global" OrElse word.ToLower = "local" Then
pTokens.Add(word, Token.Modifier)
ElseIf word.ToLower = "as" Then
pTokens.Add(word, Token.As)
ElseIf word.ToLower = "text" OrElse word.ToLower = "number" OrElse word.ToLower = "decimal" OrElse word.ToLower = "boolean" Then
pTokens.Add(word, Token.Type)
ElseIf word = "=" Then
pTokens.Add(word, Token.Equal)
ElseIf Integer.TryParse(word, New Integer) Then
pTokens.Add(word, Token.Number)
ElseIf Decimal.TryParse(word, New Decimal) Then
pTokens.Add(word, Token.Decimal)
ElseIf Boolean.TryParse(word, New Boolean) Then
pTokens.Add(word, Token.Boolean)
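'Check the text strictly between the quotes; Length - 1 would include the closing quote and never match
ElseIf word.Length > 1 AndAlso word.First = """" AndAlso word.Last = """" AndAlso word.Substring(1, word.Length - 2).Contains("""") = False Then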
pTokens.Add(word, Token.String)
ElseIf symbols.Contains(word) OrElse Integer.TryParse(word.First, New Integer) Then
If symbols.Contains(word) Then
pTokens.Add("There is an invalid symbol in the word: " & word, Token.Exception)
Else
pTokens.Add("There is a number at the beginning of the word: " & word, Token.Exception)
End If
Else
pTokens.Add(word, Token.Identifier)
End If
Next
pTokens.Add(Environment.NewLine, Token.EOL)
Next
End Sub
Public Sub New()
pTokens = New Dictionary(Of String, Token)
End Sub
End Class
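One thing to watch for with a Dictionary(Of String, Token) as the token store: Dictionary.Add throws an ArgumentException when the key already exists, so a source containing the same word twice (for example two lines that each use "as", or the EOL entry added per line) would fail at run time. A list keeps duplicates and preserves order. The sketch below shows the difference; `Kind` and `ScanToList` are stand-in names, and every word is simply tagged `Word` to keep the example short.

```vbnet
Option Strict On
Option Explicit On
Imports System.Collections.Generic

Module TokenListSketch
    ' Simplified stand-in for the scanner's Token enum.
    Public Enum Kind
        Word
        EOL
    End Enum

    ' Collects every word of every line, in order, duplicates included.
    Public Function ScanToList(ByVal source As String) As List(Of KeyValuePair(Of String, Kind))
        Dim tokens As New List(Of KeyValuePair(Of String, Kind))
        For Each line As String In source.Split(New String() {Environment.NewLine}, StringSplitOptions.None)
            For Each word As String In line.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
                tokens.Add(New KeyValuePair(Of String, Kind)(word, Kind.Word))
            Next
            tokens.Add(New KeyValuePair(Of String, Kind)(Environment.NewLine, Kind.EOL))
        Next
        Return tokens
    End Function

    Sub Main()
        Dim source As String = "local a as number" & Environment.NewLine & "local b as number"

        ' A dictionary rejects the second "local": keys must be unique.
        Dim asDictionary As New Dictionary(Of String, Kind)
        Try
            For Each word As String In source.Replace(Environment.NewLine, " ").Split(" "c)
                asDictionary.Add(word, Kind.Word)
            Next
        Catch ex As ArgumentException
            Console.WriteLine("Dictionary failed on a repeated token.")
        End Try

        ' The list keeps all ten entries: eight words plus two end-of-line markers.
        Console.WriteLine(ScanToList(source).Count)
    End Sub
End Module
```

Since the parser only walks the tokens in order, swapping the dictionary for a list of pairs should not change how the For Each loop over the tokens is written.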
Here is my parser:
Code:
Option Strict On
Option Explicit On
Public Class Parser
Private ex As String
Public Property Exception() As String
Get
Return ex
End Get
Set(ByVal value As String)
ex = value
End Set
End Property
Private pTokens As Dictionary(Of String, Scanner.Token)
Public Property Tokens() As Dictionary(Of String, Scanner.Token)
Get
Return pTokens
End Get
Set(ByVal value As Dictionary(Of String, Scanner.Token))
pTokens = value
End Set
End Property
Public Function CanParse() As Boolean
If Not IsNothing(pTokens) Then
Dim prior As Scanner.Token = Scanner.Token.Null
For Each item As KeyValuePair(Of String, Scanner.Token) In pTokens
Select Case item.Value
Case Scanner.Token.As
If AcceptsAs(prior) = False Then
ex = "Syntax error. The as keyword cannot follow a " & prior.ToString
Return False
End If
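'Separate the values with commas; Or would combine them bitwise into a single value that matches none of them
Case Scanner.Token.Boolean, Scanner.Token.Decimal, Scanner.Token.Number, Scanner.Token.String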
If AcceptsValue(prior) = False Then
ex = "Syntax error. Values cannot follow a " & prior.ToString
Return False
End If
Case Scanner.Token.EOL
If AcceptsEOL(prior) = False Then
ex = "Syntax error. The End of Line cannot follow a " & prior.ToString
Return False
End If
Case Scanner.Token.Equal
If AcceptsEqual(prior) = False Then
ex = "Syntax error. The equal sign cannot follow a " & prior.ToString
Return False
End If
Case Scanner.Token.Identifier
If AcceptsIdentifier(prior) = False Then
ex = "Syntax error. A variable cannot follow a " & prior.ToString
Return False
End If
Case Scanner.Token.Modifier
If AcceptsModifier(prior) = False Then
ex = "Syntax error. An access modifier cannot follow a " & prior.ToString
Return False
End If
Case Scanner.Token.Type
If AcceptsType(prior) = False Then
ex = "Syntax error. A data type cannot follow a " & prior.ToString
Return False
End If
End Select
prior = item.Value
Next
Return True
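End If
'No tokens were supplied, so there is nothing to parse
ex = "There are no tokens to parse."
Return False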
End Function
Private Function AcceptsAs(ByVal prior As Scanner.Token) As Boolean
Return (prior = Scanner.Token.Identifier)
End Function
Private Function AcceptsValue(ByVal prior As Scanner.Token) As Boolean
Return (prior = Scanner.Token.Equal)
End Function
Private Function AcceptsEOL(ByVal prior As Scanner.Token) As Boolean
Return (prior = Scanner.Token.Type OrElse prior = Scanner.Token.Boolean OrElse prior = Scanner.Token.Decimal OrElse prior = Scanner.Token.Number OrElse prior = Scanner.Token.String)
End Function
Private Function AcceptsEqual(ByVal prior As Scanner.Token) As Boolean
Return (prior = Scanner.Token.Type)
End Function
Private Function AcceptsIdentifier(ByVal prior As Scanner.Token) As Boolean
Return (prior = Scanner.Token.EOL OrElse prior = Scanner.Token.Modifier)
End Function
Private Function AcceptsModifier(ByVal prior As Scanner.Token) As Boolean
Return (prior = Scanner.Token.Null OrElse prior = Scanner.Token.EOL)
End Function
Private Function AcceptsType(ByVal prior As Scanner.Token) As Boolean
Return (prior = Scanner.Token.As)
End Function
End Class
Here is a test:
Code:
Module Module1
Sub Main()
Dim scan As Scanner = New Scanner
Dim parse As Parser = New Parser
scan.SourceCode = Console.ReadLine
scan.Scan()
If scan.Tokens.Values.Contains(Scanner.Token.Exception) = False Then
parse.Tokens = scan.Tokens
If parse.CanParse Then
Console.WriteLine("Success!")
Else
Console.WriteLine(parse.Exception)
End If
Else
For Each item In scan.Tokens
If item.Value = Scanner.Token.Exception Then
Console.WriteLine(item.Key)
End If
Next
End If
Console.ReadLine()
End Sub
End Module
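One thing to watch for when testing with more than one line: the scanner keeps its tokens in a Dictionary(Of String, Token) keyed by the word, and Dictionary.Add throws an ArgumentException when the same key is added twice. A minimal sketch of keeping the tokens in an ordered List instead. TokenKind, ScannedToken and ScanDeclarations are hypothetical names made up for this illustration, not part of the Scanner above:

```vb
Imports System
Imports System.Collections.Generic

Module TokenListSketch
    'Hypothetical stand-in for Scanner.Token
    Public Enum TokenKind
        Modifier
        Identifier
        [As]
        Type
        EOL
    End Enum

    'Hypothetical pair of the matched text and its kind
    Public Structure ScannedToken
        Public Text As String
        Public Kind As TokenKind
        Public Sub New(ByVal t As String, ByVal k As TokenKind)
            Text = t
            Kind = k
        End Sub
    End Structure

    'Returns True when Dictionary.Add rejects a repeated key
    Public Function DictionaryRejectsDuplicate() As Boolean
        Dim d As New Dictionary(Of String, TokenKind)
        d.Add("as", TokenKind.As)
        Try
            d.Add("as", TokenKind.As)
            Return False
        Catch ex As ArgumentException
            Return True
        End Try
    End Function

    'A List keeps every token in source order, repeated words included
    Public Function ScanDeclarations(ByVal names As IEnumerable(Of String)) As List(Of ScannedToken)
        Dim tokens As New List(Of ScannedToken)
        For Each name As String In names
            'Emits the tokens for: local <name> as text
            tokens.Add(New ScannedToken("local", TokenKind.Modifier))
            tokens.Add(New ScannedToken(name, TokenKind.Identifier))
            tokens.Add(New ScannedToken("as", TokenKind.As))
            tokens.Add(New ScannedToken("text", TokenKind.Type))
            tokens.Add(New ScannedToken(Environment.NewLine, TokenKind.EOL))
        Next
        Return tokens
    End Function
End Module
```

With a list, the parser can still walk the tokens in order with For Each; only the lookup by word is lost, and CanParse does not use it.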
-
Apr 23rd, 2014, 12:08 AM
#11
-
Apr 24th, 2014, 04:22 AM
#12
Re: high level compiler
As a side note, having written a couple of parsers/interpreters of my own mini language, rather than first figuring out how existing commercial compilers work, do what you are doing and parse each statement/line in turn and see how far you get and what problems arise.
You will find that sequential parsing is not as slow as you think on a modern computer - back in the day, compiling/linking, etc. could take hours. We have it so comfy with our JIT compilers (unfortunately, this has some common, unwanted, side effects).
One of the things I found useful (after the fact) was railroad diagrams, such as those used describing the JSON format at json.org
"Ok, my response to that is pending a Google search" - Bucky Katt.
"There are two types of people in the world: Those who can extrapolate from incomplete data sets." - Unk.
"Before you can 'think outside the box' you need to understand where the box is."
-
Apr 24th, 2014, 09:32 AM
#13
Re: high level compiler
Thank you for your input SJWhiteley. I really like the railroad diagrams; they make a grammar so much easier to follow. I tried to Google sequential parsing and didn't get many results. Is that what I'm currently doing?
-
Jun 4th, 2014, 04:20 PM
#14
-
Jun 4th, 2014, 04:21 PM
#15
Re: high level compiler
Declaration Process:
I've also narrowed my language down to a DSL that specializes in matrix math; however, I plan to keep it broad enough that I can eventually expand it.
-
Jun 5th, 2014, 12:22 AM
#16
Re: high level compiler
Here is another important update... I've successfully implemented a true lexical analyzer. It uses RegEx to match each token pattern, which eliminates all the conditional statements:
Code:
Module Lexer
Private classes As List(Of List(Of String))
Private Sub LoadClasses()
classes = New List(Of List(Of String))
Dim identifier As List(Of String) = New List(Of String)
identifier.AddRange({"identifier", "^[a-zA-Z_][a-zA-Z_0-9]*$"}) 'Matches a letter or underscore followed by any number of letters, digits, or underscores
Dim whitespace As List(Of String) = New List(Of String)
whitespace.AddRange({"whitespace", "^\s+$"}) 'Matches any sequence of blanks or new lines
Dim keywords As List(Of String) = New List(Of String)
keywords.AddRange({"keyword", "^global$", "^local$", "^as$", "^bool$", "^decimal$", "^number$", "^text$"}) 'Matches a few keywords
Dim operators As List(Of String) = New List(Of String)
operators.AddRange({"operator", "^=$", "^\+$", "^-$", "^\*$", "^\/$", "^\^$"}) 'Matches a single operator
Dim bool As List(Of String) = New List(Of String)
bool.AddRange({"bool", "^true$", "^false$"}) 'Matches: true, false
Dim [decimal] As List(Of String) = New List(Of String)
[decimal].AddRange({"decimal", "^-?\d+\.\d+$"}) 'Matches any number (positive or negative) with a decimal part
Dim number As List(Of String) = New List(Of String)
number.AddRange({"number", "^-?\d+$"}) 'Matches any number (positive or negative) without a decimal part
Dim text As List(Of String) = New List(Of String)
text.AddRange({"text", "^\""(\\.|[^""])*\""$"}) 'Matches a double-quoted string; an inner double quote must be escaped with a backslash
'Every pattern is anchored with ^ and $ because Regex.Match would otherwise succeed on any word that merely contains the pattern
'IMPORTANT: keep the identifier at the end so that keywords and bool literals are matched first
classes.AddRange({keywords, whitespace, operators, bool, number, [decimal], text, identifier})
End Sub
Public Function Tokenize(ByVal source As String, ByRef tokens() As String, ByRef values() As String) As String
Call LoadClasses()
'A list to store our tokens and values
Dim tokenCollection As List(Of String) = New List(Of String)
Dim valueCollection As List(Of String) = New List(Of String)
'The exception that will be returned(hopefully empty)
Dim exception As String = String.Empty
'Loop through each line in the source
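For Each line As String In source.Split({ControlChars.Cr, ControlChars.Lf}, StringSplitOptions.RemoveEmptyEntries)
'Loop through each word in that line; an empty separator array splits on white space, and empty entries from repeated blanks are skipped
For Each word As String In line.Split(New Char() {}, StringSplitOptions.RemoveEmptyEntries)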
'The possible token
Dim token As String = String.Empty
'Loop through each token class
For Each tokenClass As List(Of String) In classes
'Loop through each REGEX in the token class
For i As Integer = 1 To tokenClass.Count - 1
'Use REGEX
Dim regex As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(tokenClass.Item(i))
Dim match As System.Text.RegularExpressions.Match = regex.Match(word)
If match.Success Then
'If there is a match then set the token variable and leave the REGEX loop
token = tokenClass.Item(0)
Exit For
End If
Next
'Leave the token class loop if there is already a match
If Not String.IsNullOrWhiteSpace(token) Then
Exit For
End If
Next
If String.IsNullOrWhiteSpace(token) Then
'If no token was returned, then throw an exception
tokens = Nothing
values = Nothing
Return "Unrecognizable code at: " & word
Else
'Otherwise add it to the collection
tokenCollection.Add(token)
valueCollection.Add(word)
End If
Next
Next
'Set the tokens and values parameters if there has not been an exception thrown
If String.IsNullOrWhiteSpace(exception) Then
tokens = tokenCollection.ToArray
values = valueCollection.ToArray
End If
'Hopefully return an empty exception
Return exception
End Function
End Module
Here it is implemented:
Code:
Option Strict On
Option Explicit On
Public Class Form1
Private splitter As SplitContainer
Private txtSource, txtTokens As TextBox
Private btnScan As Button
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
splitter = New SplitContainer With {.Dock = DockStyle.Fill}
txtSource = New TextBox With {.Dock = DockStyle.Fill, .Multiline = True, .WordWrap = False}
txtTokens = New TextBox With {.Dock = DockStyle.Fill, .Multiline = True, .ReadOnly = True, .ScrollBars = ScrollBars.Both, .WordWrap = False}
btnScan = New Button With {.Dock = DockStyle.Bottom, .Text = "Scan"}
AddHandler btnScan.Click, AddressOf btnScan_Click
Me.Controls.AddRange({splitter, btnScan})
splitter.Panel1.Controls.Add(txtSource)
splitter.Panel2.Controls.Add(txtTokens)
End Sub
Private Sub btnScan_Click(ByVal sender As Object, ByVal e As EventArgs)
txtTokens.Clear()
Dim tokens() As String = {}
Dim values() As String = {}
Dim exception As String
For Each line As String In txtSource.Lines
exception = Lexer.Tokenize(line, tokens, values)
If String.IsNullOrWhiteSpace(exception) Then
'Format each pair as (<token> , <value>) and join the pairs with a single space
Dim pairs As New List(Of String)
For i As Integer = 0 To tokens.Length - 1
pairs.Add(String.Format("(<{0}> , <{1}>)", tokens(i), values(i)))
Next
txtTokens.AppendText(String.Join(" ", pairs) & Environment.NewLine)
Else
MessageBox.Show(exception, "Exception", MessageBoxButtons.OK)
End If
Next
'Remove the trailing new line (Environment.NewLine can be two characters, so trim rather than cut one character)
txtTokens.Text = txtTokens.Text.TrimEnd()
End Sub
End Class
The tokens are displayed in the txtTokens textbox in this format: (<token> , <value>)
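A note on the patterns in the lexer: Regex.Match succeeds when the pattern occurs anywhere in the word, so a token class only classifies a whole word if its pattern is anchored with ^ and $. Here is a small sketch of the difference. MatchesWholeWord is a hypothetical helper written for this illustration:

```vb
Imports System
Imports System.Text.RegularExpressions

Module AnchorSketch
    'Hypothetical helper: True only when the pattern matches the entire word
    Public Function MatchesWholeWord(ByVal word As String, ByVal pattern As String) As Boolean
        'The non-capturing group keeps alternations such as a|b inside the anchors
        Return Regex.IsMatch(word, "^(?:" & pattern & ")$")
    End Function

    'True when the pattern occurs anywhere inside the word (what an unanchored Regex.Match reports)
    Public Function MatchesAnywhere(ByVal word As String, ByVal pattern As String) As Boolean
        Return Regex.IsMatch(word, pattern)
    End Function
End Module
```

For example, MatchesAnywhere("class", "as") is True because "as" occurs inside "class", while MatchesWholeWord("class", "as") is False. Likewise MatchesWholeWord("4.5", "\d+") is False, so a decimal is no longer picked up by the number class.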
-
Jun 16th, 2014, 01:28 PM
#17
Re: high level compiler
I have two questions, and they both relate to parsers.
A) Are they specific to each process? For example, will I have one parser for my declaration/initialization process, another parser for calling functions, etc.?
And
B) Is a parser just a collection of conditional statements, such as: if I have tokenA, then the token preceding/succeeding it must be tokenB? Or is there a different concept?
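For what it's worth, one common arrangement (recursive descent) uses a single parser with one procedure per grammar rule, and each procedure consumes the tokens it expects in order. The sketch below is only an illustration of that idea under assumed names: the Kind enum, the two rules, and ParseStatement are hypothetical and are not taken from the code in this thread.

```vb
Imports System
Imports System.Collections.Generic

Module DeclarationParserSketch
    'Hypothetical token kinds; the names are assumptions for this sketch
    Public Enum Kind
        Modifier
        Identifier
        AsKeyword
        TypeName
        Equal
        Value
    End Enum

    Private tokens As List(Of Kind)
    Private position As Integer

    'Consumes the current token if it has the expected kind
    Private Function Accept(ByVal expected As Kind) As Boolean
        If position < tokens.Count AndAlso tokens(position) = expected Then
            position += 1
            Return True
        End If
        Return False
    End Function

    'declaration := modifier identifier "as" type [ "=" value ]
    Private Function ParseDeclaration() As Boolean
        If Not Accept(Kind.Modifier) Then Return False
        If Not Accept(Kind.Identifier) Then Return False
        If Not Accept(Kind.AsKeyword) Then Return False
        If Not Accept(Kind.TypeName) Then Return False
        'The initializer is optional, but "=" must be followed by a value
        If Accept(Kind.Equal) Then Return Accept(Kind.Value)
        Return True
    End Function

    'assignment := identifier "=" value
    Private Function ParseAssignment() As Boolean
        Return Accept(Kind.Identifier) AndAlso Accept(Kind.Equal) AndAlso Accept(Kind.Value)
    End Function

    'statement := declaration | assignment; the first token picks the rule
    Public Function ParseStatement(ByVal source As IEnumerable(Of Kind)) As Boolean
        tokens = New List(Of Kind)(source)
        position = 0
        Dim ok As Boolean
        If tokens.Count > 0 AndAlso tokens(0) = Kind.Modifier Then
            ok = ParseDeclaration()
        Else
            ok = ParseAssignment()
        End If
        'Every token must have been consumed
        Return ok AndAlso position = tokens.Count
    End Function
End Module
```

So the answer to B is "partly": the conditionals are still there, but they are grouped by grammar rule instead of by which token may follow which, and a new construct such as a function call would become one more rule procedure rather than a separate parser.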
-
Jun 16th, 2014, 02:30 PM
#18
Re: high level compiler
The .Net language compilers are not "JIT compilers."
What they do is compile from source to interpreted p-code that they chose to call "IL" in an attempt to pretend they'd invented something. The p-code, or bytecode, or IL is interpreted by the CLR.
Like many other script interpreters, the CLR has some limited abilities to JIT-compile fragments of IL that its internal heuristics deem worthy at runtime. But it would be a rare bit of .Net script that ever gets 100% compiled to native code. By its nature this isn't an optimizing compiler because it views the code through small peepholes. NGen just does the same thing via a single, static pass over the script. This can lead to even worse performance than dynamically JITted .Net script.
Compiler theory and practice are usually handled in a full year university course building upon prior courses. Without this background you are likely to get frustrated quickly because the material assumes familiarity with a large amount of prerequisite material.
You might do some searches and find published material such as Compiler Basics, Basic Compiling Theory, or Basics of Compiler Design (PDF).
Last edited by dilettante; Jun 16th, 2014 at 02:33 PM.
-
Jun 16th, 2014, 02:34 PM
#19
Re: high level compiler
Compiler theory and practice are usually handled in a full year university course building upon prior courses. Without this background you are likely to get frustrated quickly because the material assumes familiarity with a large amount of background material.
Thanks for your concern, but as you can tell from the start date of this thread I'm in it for the long haul.
You might do some searches and find published material such as Compiler Basics, Basic Compiling Theory, or Basics of Compiler Design (PDF).
I have read the second link before, but not the first and the last link so thank you for those links.
BTW, where did this come from?
The .Net language compilers are not "JIT compilers."
-
Jun 16th, 2014, 04:44 PM
#20
Re: high level compiler
Originally Posted by dday9
BTW, where did this come from?
There are some ill-informed comments in earlier posts that play into the vicious cycle of misinformation if not nipped in the bud. I assume these stem from crowdsourced "knowledge" such as this very kind of thread.
-
Jun 16th, 2014, 04:55 PM
#21
Re: high level compiler
There are some ill-informed comments in earlier posts that play into the vicious cycle of misinformation if not nipped in the bud.
I think you misunderstood my question. I was asking specifically which statements led you to post that. The only thing that I can see that comes close is this:
We have it so comfy with our JIT compilers (unfortunately, this has some common, unwanted, side effects).
I assume these stem from crowdsourced "knowledge" such as this very kind of thread.
What's the old saying about assumption?
-
Jun 16th, 2014, 06:53 PM
#22
Re: high level compiler
I "assume" the "nip it in the bud" comment was in reference to this question in post #8.
Originally Posted by dday9
...
But what would be after that? How would I be able to preform that command without translating it to vb.net code and compiling from the JIT compiler?
You mentioned "vb.net code" and the "JIT compiler", and dilettante just wanted to make sure you didn't start falsely associating vb.net code with a JIT compiling paradigm.
-
Jun 16th, 2014, 07:11 PM
#23
Re: high level compiler
Ah ok. Nah I was referring to the fact that if I translated my code to VB.Net code then eventually it would go thru JIT.
-
Jun 18th, 2014, 07:32 AM
#24
Re: high level compiler
I had noted 'JIT compilation' and didn't mean to imply that .NET was purely JIT...it was overly simplistic. Prior to .NET, JIT was a popular paradigm to aspire to in scripting and language 'compilation' - it still is the best way to compile for a dynamic language.
Are we trying to mimic .NET compilation, or create a straightforward language compiler? How will the compilation be performed? JIT can be an easy or hard mechanism to implement, depending on how far you want to go; a basic interpreter is simpler. Performing a .NET-style compilation is, I think, a waste of time if you are targeting a single platform (and if you are using .NET, then the single platform is most likely).
Unless you are truly interested in obtaining the fastest compiled code (JIT; I'm assuming a dynamic language, of course), stick with interpretation. Parsing effectively is hard enough, especially when trying to identify coding patterns (e.g. a simple loop). Of course, JIT is really a waste of time if you are compiling to VB code (or, rather, IL).
EDIT: re. .NET as JIT. While it isn't a pure JIT compiler, it has enough of the elements of one that, for 95% of all purposes, it may as well be a JIT compiler (YMMV).
Last edited by SJWhiteley; Jun 18th, 2014 at 08:34 AM.
-
Jun 18th, 2014, 08:40 AM
#25
Re: high level compiler
I'm thinking about creating an interpreter instead of a compiler, just for simplicity. I want to target multiple platforms, so I'm actually initially writing the code in VB.Net and then recoding it in Lua.
-
Jun 18th, 2014, 06:21 PM
#26
Re: high level compiler
I've been reading the Dragon Book lately and it says:
Regular expressions are most useful for describing the structure of lexical constructs such as identifiers, constants, keywords, and so forth. Grammars, on the other hand, are most useful in describing nested structures such as balanced parentheses, matching begin-end's, corresponding if-then-else's, and so on. As we have noted, these nested structures cannot be described by regular expressions.
Page 173 Sec. 4.2 of Compilers Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman
Could I use RegEx to express my declaration and initialization process and if so, should it be in the lexical or syntax analyzer part?
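To make the quoted distinction concrete, here is a sketch under a couple of assumptions: the declaration pattern is my guess at the "local foo as type = value" form used in this thread, and IsFlatDeclaration and IsBalanced are hypothetical names. A flat statement has no nesting, so one regular expression can describe it; balanced parentheses need a counter, which a classic regular expression does not have.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexVersusGrammarSketch
    'Assumed shape of a declaration: modifier, name, "as", type, optional "= value"
    Private ReadOnly declaration As New Regex("^(global|local)\s+[a-zA-Z_][a-zA-Z_0-9]*\s+as\s+(bool|number|text)(\s*=\s*.+)?$")

    'A flat statement can be recognized by a single regular expression
    Public Function IsFlatDeclaration(ByVal line As String) As Boolean
        Return declaration.IsMatch(line)
    End Function

    'Nested structure needs memory of how deep we are, here a simple counter
    Public Function IsBalanced(ByVal text As String) As Boolean
        Dim depth As Integer = 0
        For Each c As Char In text
            If c = "("c Then
                depth += 1
            ElseIf c = ")"c Then
                depth -= 1
                'A closing parenthesis without an opening one
                If depth < 0 Then Return False
            End If
        Next
        Return depth = 0
    End Function
End Module
```

Even where a regular expression can recognize a whole statement, it only answers yes or no; it does not say which part is the name or the type without extra capture groups, which is one reason the structure of a statement is usually left to the syntax analyzer while the lexical analyzer sticks to single tokens.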
-
Jun 18th, 2014, 10:06 PM
#27
Re: high level compiler
Yet again, here is another update:
Token Class:
Code:
Option Strict On
Option Explicit On
Public Class Token
Private nToken As List(Of Token)
<System.ComponentModel.Description("Gets the collection of acceptable subsequent tokens.")> _
Public ReadOnly Property NextTokenCollection() As List(Of Token)
Get
Return nToken
End Get
End Property
Private pattern As String
<System.ComponentModel.Description("Gets or sets the Regular Expression pattern that matches the value for the token.")> _
Public Property RegexPattern() As String
Get
Return pattern
End Get
Set(value As String)
pattern = value
End Set
End Property
Private s As String
<System.ComponentModel.Description("Gets or sets the symbol associated with the token.")> _
Public Property Symbol() As String
Get
Return s
End Get
Set(ByVal value As String)
s = value
End Set
End Property
Private v As String
<System.ComponentModel.Description("Gets or sets the value associated with the token.")> _
Public Property Value() As String
Get
Return v
End Get
Set(ByVal value As String)
v = value
End Set
End Property
<System.ComponentModel.Description("Returns a boolean value based on if the token passed through the argument is in the NextTokenCollection.")> _
Friend Function CanPrecede(ByVal NextToken As Token) As Boolean
Return nToken.Contains(NextToken)
End Function
Sub New()
nToken = New List(Of Token)
End Sub
End Class
Lexical Analyzer:
Code:
Option Strict On
Option Explicit On
Module Lexical_Analyzer
Dim tokenClasses As List(Of Token)
Private Sub LoadClasses()
tokenClasses = New List(Of Token)
Dim [as], id, concatenation, modifier, [operator], type, value, whitespace As New Token
With [as]
'The AS would accept a TYPE in this situation: local foo as type
.NextTokenCollection.Add(type)
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'as - Literally matches the word as
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^(as)$"
.Symbol = "keyword"
End With
With id
'The IDENTIFIER would accept an AS in this situation: local foo as type
'The IDENTIFIER would accept an OPERATOR in this situation: foo = value
'The IDENTIFIER would accept NOTHING in this situation: foo1 = foo2
.NextTokenCollection.AddRange({[as], [operator], Nothing})
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'[a-zA-Z_] - Matches a leading letter or underscore
'[a-zA-Z_0-9]* - Matches any number of following letters, underscores, or digits
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^[a-zA-Z_][a-zA-Z_0-9]*$"
.Symbol = "identifier"
End With
With concatenation
'The CONCATENATION would accept a VALUE in this situation: value .. value
.NextTokenCollection.Add(value)
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'\.\. - Literally matches two dots back to back
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^(\.\.)$"
.Symbol = "concatenation"
End With
With modifier
'The MODIFIER would accept an IDENTIFIER in this situation: local foo as type
.NextTokenCollection.Add(id)
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'global|local - Literally matches the words global or local
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^(global|local)$"
.Symbol = "modifier"
End With
With [operator]
'The OPERATOR would accept a VALUE in this situation: local foo as type = value
'The OPERATOR would accept an IDENTIFIER in this situation: foo1 = foo2
.NextTokenCollection.AddRange({value, id})
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'\+ - Literally matches the +
'- - Literally matches the -
'\* - Literally matches the *
'\/ - Literally matches the /
'\^ - Literally matches the ^
'= - Literally matches the =
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^(\+|-|\*|\/|\^|=)$"
.Symbol = "operator"
End With
With type
'The TYPE would accept a WHITESPACE in this situation: local foo as type
'The TYPE would accept an OPERATOR in this situation: local foo as type = value
'The TYPE would accept a NOTHING in this situation: local foo as type
.NextTokenCollection.AddRange({whitespace, [operator], Nothing})
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'bool - Literally matches the word bool
'number - Literally matches the word number
'text - Literally matches the word text
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^(bool|number|text)$"
.Symbol = "type"
End With
With value
'The VALUE would accept a WHITESPACE in this situation: local foo as type = value
'The VALUE would accept an OPERATOR in this situation: value + value
'The VALUE would accept a CONCATENATION in this situation: value .. value
'The VALUE would accept a NOTHING in this situation: foo = value
.NextTokenCollection.AddRange({whitespace, [operator], concatenation, Nothing})
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'nothing - Literally matches the word nothing
'true|false - Literally matches the words true or false
'\d+(\.\d+)? - Matches any number with an optional decimal part
'\"(\\.|[^"])*\" - Matches a double-quoted string; an inner double quote must be escaped with a backslash
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^(nothing|true|false|\d+(\.\d+)?|\""(\\.|[^""])*\"")$"
.Symbol = "value"
End With
With whitespace
'The WHITESPACE would accept nothing in any situation
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'Environment.NewLine - Literally matches the new line sequence
'+ - Allows one or more consecutive new lines
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = String.Format("^({0})+$", Environment.NewLine)
.Symbol = "whitespace"
End With
'IMPORTANT: Have the IDENTIFIER at the very end of the list
tokenClasses.AddRange({[as], concatenation, modifier, [operator], type, value, whitespace, id})
End Sub
Friend Function Parse(ByVal source As String, ByRef tokens() As Token) As Exception()
If IsNothing(tokenClasses) Then
Call LoadClasses()
End If
Dim ex As List(Of Exception) = New List(Of Exception)
Dim t As List(Of Token) = New List(Of Token)
'Split the source once instead of on every iteration
Dim lines() As String = source.Split({Environment.NewLine}, StringSplitOptions.None)
'Loop through each line
For l As Integer = 0 To lines.Length - 1
Dim words() As String = lines(l).Split({" "}, StringSplitOptions.RemoveEmptyEntries)
'Loop through each word
For w As Integer = 0 To words.Length - 1
Dim word As String = words(w)
'The possible token
Dim token As Token = Nothing
'Loop through each token class
For tc As Integer = 0 To tokenClasses.Count - 1
'The current class in the iteration
Dim tokenClass As Token = tokenClasses.Item(tc)
'Check if the RegEx matches
Dim regex As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(tokenClass.RegexPattern)
Dim match As System.Text.RegularExpressions.Match = regex.Match(word)
If match.Success Then
'If there is a match then set the token variable and leave the token class loop
'Note: token refers to the shared class instance, so assigning Value here also changes every token of this class added earlier
token = tokenClass
token.Value = word
Exit For
End If
Next 'End of token class
'If there was no match found then add an exception
'Otherwise, add it to the list
If token IsNot Nothing Then
t.Add(token)
Else
ex.Add(New Exception(String.Format("'{0}' is not valid syntax.{1}Line: {2}{1}Word: {3}", word, Environment.NewLine, l + 1, w + 1)))
End If
Next 'End of word
Next 'End of line
'Return the tokens if there are no exceptions
tokens = If(ex.Count = 0, t.ToArray, Nothing)
'Return any exceptions(hopefully none)
Return ex.ToArray
End Function
End Module
Syntax Analyzer:
Code:
Option Strict On
Option Explicit On
Module Syntax_Analyzer
Friend Function Parse(ByVal tokens() As Token) As Boolean
For t As Integer = 0 To tokens.Length - 1
'Get the current and next token
Dim currentToken As Token = tokens(t)
Dim nextToken As Token = If(t + 1 <= tokens.Length - 1, tokens(t + 1), Nothing)
If Not currentToken.CanPrecede(nextToken) Then
Return False
End If
Next
Return True
End Function
End Module
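One caveat about the lexical analyzer above: each match stores the shared class object itself (token = tokenClass) and then overwrites its Value, so every token of the same class ends up showing the last word matched. A possible way around that is to keep the description of a token class separate from each occurrence in the source. This is only a sketch; TokenClass, TokenInstance and this CanPrecede are hypothetical names, not the classes posted above.

```vb
Imports System
Imports System.Collections.Generic

Module TokenInstanceSketch
    'Hypothetical: describes a category of token and is created once
    Public Class TokenClass
        Public Property Symbol As String
        'The classes that may follow this one; Nothing stands for the end of the input
        Public ReadOnly Followers As New List(Of TokenClass)
    End Class

    'Hypothetical: one occurrence in the source, with its own text
    Public Class TokenInstance
        Public ReadOnly Kind As TokenClass
        Public ReadOnly Text As String
        Public Sub New(ByVal k As TokenClass, ByVal t As String)
            Kind = k
            Text = t
        End Sub
    End Class

    'The follow check compares classes, so separate instances of one class behave alike
    Public Function CanPrecede(ByVal current As TokenInstance, ByVal following As TokenInstance) As Boolean
        Dim nextKind As TokenClass = If(following Is Nothing, Nothing, following.Kind)
        Return current.Kind.Followers.Contains(nextKind)
    End Function
End Module
```

With this split, foo1 and foo2 in "foo1 = foo2" are two TokenInstance objects that share the identifier class, so each keeps its own text while the follow check still works on the class.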
Last edited by dday9; Jun 18th, 2014 at 10:10 PM.
-
Jun 18th, 2014, 10:12 PM
#28
Re: high level compiler
The syntax analyzer in that example isn't very advanced yet because it doesn't keep track of identifiers. For example, the statement foo1 = foo2 would return True even if foo1 or foo2 isn't declared.
While I haven't addressed conditional logic or loops yet, this syntax analyzer wouldn't pick them up anyway. At least not yet.
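Tracking identifiers is usually done with a symbol table that is filled when a declaration is seen and consulted when a name is used. A minimal sketch of that idea, where SymbolTableSketch, DeclareVariable and FirstUndeclared are all hypothetical names and scoping (global versus local) is ignored:

```vb
Imports System
Imports System.Collections.Generic

Module SymbolTableSketch
    'Hypothetical symbol table: maps a declared name to its type name
    Private ReadOnly symbols As New Dictionary(Of String, String)

    'Forgets every declaration, for example before analyzing a new source
    Public Sub Reset()
        symbols.Clear()
    End Sub

    'Records a declaration such as: local foo as number
    'Returns False if the name was already declared
    Public Function DeclareVariable(ByVal name As String, ByVal typeName As String) As Boolean
        If symbols.ContainsKey(name) Then Return False
        symbols.Add(name, typeName)
        Return True
    End Function

    'Returns the first identifier that is used without a declaration, or Nothing if all are declared
    Public Function FirstUndeclared(ByVal identifiers As IEnumerable(Of String)) As String
        For Each name As String In identifiers
            If Not symbols.ContainsKey(name) Then Return name
        Next
        Return Nothing
    End Function
End Module
```

For "foo1 = foo2", the analyzer would pass both names to FirstUndeclared and report an error when the result is not Nothing.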
-
Jun 18th, 2014, 11:13 PM
#29
Re: high level compiler
To account for conditional statements, I've added a conditional token and have adjusted the parser:
conditional token
Code:
Option Strict On
Option Explicit On
Public Class Conditional_Token
Inherits Token
Private pEnd As Token
<System.ComponentModel.Description("Gets or sets the token that marks the end of the statement.")> _
Public Property EndToken() As Token
Get
Return pEnd
End Get
Set(ByVal value As Token)
pEnd = value
End Set
End Property
Private fEnd As Boolean
<System.ComponentModel.Description("Gets or sets if the conditional token has found its end token.")> _
Public Property FoundEnd() As Boolean
Get
Return fEnd
End Get
Set(ByVal value As Boolean)
fEnd = value
End Set
End Property
Private [end] As Boolean
<System.ComponentModel.Description("Gets or sets if the token is an end token.")> _
Public Property IsEnd() As Boolean
Get
Return [end]
End Get
Set(ByVal value As Boolean)
[end] = value
End Set
End Property
Private used As Boolean
<System.ComponentModel.Description("Gets or sets if the conditional token has been used before.")> _
Public Property IsUsed() As Boolean
Get
Return used
End Get
Set(ByVal value As Boolean)
used = value
End Set
End Property
End Class
Syntax Analyzer(updated):
Code:
Option Strict On
Option Explicit On
Module Syntax_Analyzer
Friend Function Parse(ByVal tokens() As Token) As Boolean
For t As Integer = 0 To tokens.Length - 1
Dim currentToken As Token = tokens(t)
Dim nextToken As Token = If(t + 1 <= tokens.Length - 1, tokens(t + 1), Nothing)
If TypeOf currentToken Is Conditional_Token AndAlso Not DirectCast(currentToken, Conditional_Token).IsEnd Then
'Search forward for the first end token that has not been claimed yet
For nt As Integer = t + 1 To tokens.Length - 1
Dim newCurrent As Token = tokens(nt)
If TypeOf newCurrent Is Conditional_Token AndAlso DirectCast(newCurrent, Conditional_Token).IsEnd AndAlso Not DirectCast(newCurrent, Conditional_Token).IsUsed Then
DirectCast(newCurrent, Conditional_Token).IsUsed = True
With DirectCast(currentToken, Conditional_Token)
.EndToken = newCurrent
.FoundEnd = True
End With
Exit For
End If
Next
If Not DirectCast(currentToken, Conditional_Token).FoundEnd Then
Return False
End If
End If
'Every token, conditional or not, must still be allowed to precede the next token
If Not currentToken.CanPrecede(nextToken) Then
Return False
End If
Next
Return True
End Function
End Module
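A caveat on the flags used above: because the lexical analyzer hands out the same Conditional_Token object for every if and every end, IsUsed and FoundEnd are shared between occurrences and stay set after the first match. An alternative that keeps no state on the tokens is to track the open blocks in a stack while walking the source. A sketch of that, assuming the input is the list of already-split words; BlocksAreBalanced is a hypothetical name:

```vb
Imports System
Imports System.Collections.Generic

Module BlockMatchSketch
    'Returns True when every "if" has a matching "end" and no "end" appears without an "if"
    Public Function BlocksAreBalanced(ByVal words As IEnumerable(Of String)) As Boolean
        'Each entry is the position of an "if" that is still open
        Dim openBlocks As New Stack(Of Integer)
        Dim position As Integer = 0
        For Each word As String In words
            If word = "if" Then
                openBlocks.Push(position)
            ElseIf word = "end" Then
                'An "end" with nothing open is an error
                If openBlocks.Count = 0 Then Return False
                openBlocks.Pop()
            End If
            position += 1
        Next
        'Anything left on the stack is an "if" that was never closed
        Return openBlocks.Count = 0
    End Function
End Module
```

The stored positions are not needed for the yes/no answer, but they make it possible to report where an unclosed if started, and nested blocks pair up correctly because the most recent if is always closed first.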
I'm just working out some bugs in the Lexical Analyzer to match the New Line constant.
Edit - With some help from JMcIlhinney, I was able to fix the lexical analyzer to match the new lines:
Code:
Option Strict On
Option Explicit On
Public Class Lexical_Analyzer
Dim tokenClasses As List(Of Token)
Private Sub LoadClasses()
tokenClasses = New List(Of Token)
Dim [as], concatenation, id, modifier, [operator], type, value, whitespace As New Token
Dim conditional, [end] As New Conditional_Token
With [as]
'The AS would accept a TYPE in this situation: local foo as type
.NextTokenCollection.Add(type)
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'as - Literally matches the word as
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^(as)$"
.Symbol = "keyword"
End With
With concatenation
'The CONCATENATION would accept a VALUE in this situation: value .. value
.NextTokenCollection.Add(value)
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'\.\. - Literally matches two dots back to back
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^(\.\.)$"
.Symbol = "concatenation"
End With
With conditional
'The CONDITIONAL would accept a VALUE in this situation: if value = value
'The CONDITIONAL would accept an IDENTIFIER in this situation: if foo = value
.NextTokenCollection.AddRange({value, id})
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'if - Literally matches the word if
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^(if)$"
.Symbol = "conditional"
.EndToken = [end]
End With
With [end]
'The END would accept a WHITESPACE and NOTHING
.NextTokenCollection.AddRange({whitespace, Nothing})
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'end - Literally matches the word end
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^(end)$"
.Symbol = "end"
.IsEnd = True
End With
With id
'The IDENTIFIER would accept an AS in this situation: local foo as type
'The IDENTIFIER would accept an OPERATOR in this situation: foo = value
'The IDENTIFIER would accept NOTHING in this situation: foo1 = foo2
'The IDENTIFIER would accept WHITESPACE in this situation: foo1 = foo2
.NextTokenCollection.AddRange({[as], [operator], Nothing, whitespace})
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'[a-zA-Z_] - Matches a leading letter or underscore
'[a-zA-Z_0-9]* - Matches any number of following letters, underscores, or digits
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^[a-zA-Z_][a-zA-Z_0-9]*$"
.Symbol = "identifier"
End With
With modifier
'The MODIFIER would accept an IDENTIFIER in this situation: local foo as type
.NextTokenCollection.Add(id)
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'global|local - Literally matches the words global or local
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^(global|local)$"
.Symbol = "modifier"
End With
With [operator]
'The OPERATOR would accept a VALUE in this situation: local foo as type = value
'The OPERATOR would accept an IDENTIFIER in this situation: foo1 = foo2
.NextTokenCollection.AddRange({value, id})
'The pattern broken down:
'^ - Anchor that requires the match to start at the beginning of the string
'\+ - Literally matches the +
'- - Literally matches the -
'\* - Literally matches the *
'\/ - Literally matches the /
'\^ - Literally matches the ^
'= - Literally matches the =
'$ - Anchor that requires the match to finish at the end of the string
.RegexPattern = "^(\+|-|\*|\/|\^|=)$"
.Symbol = "operator"
End With
With type
'The TYPE would accept a WHITESPACE in this situation: local foo as type
'The TYPE would accept an OPERATOR in this situation: local foo as type = value
'The TYPE would accept a NOTHING in this situation: local foo as type
.NextTokenCollection.AddRange({whitespace, [operator], Nothing})
'The pattern broken down:
'^ - Anchor: the match must start at the beginning of the string
'bool - Literally matches the word bool
'number - Literally matches the word number
'text - Literally matches the word text
'$ - Anchor: the match must finish at the end of the string
.RegexPattern = "^(bool|number|text)$"
.Symbol = "type"
End With
With value
'The VALUE would accept a WHITESPACE in this situation: local foo as type = value
'The VALUE would accept an OPERATOR in this situation: value + value
'The VALUE would accept a CONCATENATION in this situation: value .. value
'The VALUE would accept a NOTHING in this situation: foo = value
.NextTokenCollection.AddRange({whitespace, [operator], concatenation, Nothing})
'The pattern broken down:
'^ - Anchor: the match must start at the beginning of the string
'nothing - Literally matches the word nothing
'true|false - Literally matches the words true or false
'\d+(\.\d+)? - Matches a whole number, optionally followed by a decimal point and more digits
'\"(\\.|[^"])*\" - Matches a double-quoted string; \\. allows an escaped character inside the quotes
'$ - Anchor: the match must finish at the end of the string
.RegexPattern = "^(nothing|true|false|\d+(\.\d+)?|\""(\\.|[^""])*\"")$"
.Symbol = "value"
End With
With whitespace
'The WHITESPACE accepts any token as the following token
.NextTokenCollection.AddRange({[as], concatenation, conditional, [end], modifier, [operator], type, value, whitespace, id})
'No regular expression is used for this token:
'RegexPattern holds Environment.NewLine as a marker, and the word is compared against it literally
.RegexPattern = Environment.NewLine
.Symbol = "whitespace"
End With
'IMPORTANT: Have the IDENTIFIER at the very end of the list
tokenClasses.AddRange({[as], concatenation, conditional, [end], modifier, [operator], type, value, whitespace, id})
End Sub
Friend Function Parse(ByVal source As String, ByRef tokens() As Token) As Exception()
If IsNothing(tokenClasses) Then
Call LoadClasses()
End If
Dim ex As List(Of Exception) = New List(Of Exception)
Dim t As List(Of Token) = New List(Of Token)
'Loop through each token
Dim splitTokens() As String = GetTokens(source)
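'Loop over the split words (not the ByRef output array); the last entry is the trailing new line added by GetTokens
For w As Integer = 0 To splitTokens.Length - 2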
Dim word As String = splitTokens(w)
'The possible token
Dim token As Token = Nothing
'Loop through each token class
For tc As Integer = 0 To tokenClasses.Count - 1
'The current class in the iteration
Dim tokenClass As Token = tokenClasses.Item(tc)
If tokenClass.RegexPattern = Environment.NewLine Then
'If the pattern is just the new line constant, then there is no pattern
'I just need to check for if the word is actually the new line constant
If word = Environment.NewLine Then
token = tokenClass
token.Value = word
Exit For
End If
Else
'Check if the RegEx matches
Dim regex As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(tokenClass.RegexPattern)
Dim match As System.Text.RegularExpressions.Match = regex.Match(word)
If match.Success Then
'If there is a match then set the token variable and leave the token class loop
token = tokenClass
token.Value = word
Exit For
End If
End If
'Leave the token class loop if there is already a match
If token IsNot Nothing Then
Exit For
End If
Next 'End of token class
'If there was no match found then add an exception
'Otherwise, add it to the list
If token IsNot Nothing Then
t.Add(token)
Else
ex.Add(New Exception(String.Format("'{0}' is not valid syntax.", word)))
End If
Next 'End of tokens
'Return the tokens if there are no exceptions
tokens = If(ex.Count = 0, t.ToArray, Nothing)
'Return any exceptions(hopefully none)
Return ex.ToArray
End Function
Private Function GetTokens(ByVal text As String) As String()
Dim lines() As String = text.Split(New String() {Environment.NewLine}, StringSplitOptions.None)
Dim tokens As List(Of String) = New List(Of String)
For Each line In lines
tokens.AddRange(line.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries))
tokens.Add(Environment.NewLine)
Next
Return tokens.ToArray()
End Function
End Class
It even allows for nested if statements, though it does not allow for else and elseif statements yet.
Last edited by dday9; Jun 19th, 2014 at 10:08 AM.
-
Jun 20th, 2014, 12:28 PM
#30
Re: high level compiler
Well I've adjusted my code a bit more now and figured that I'd give an update:
Code:
Option Strict On
Option Explicit On
Public Class Token
Private ex As List(Of Exception)
Public ReadOnly Property Exceptions() As List(Of Exception)
Get
Return ex
End Get
End Property
Private nToken As List(Of Token)
<System.ComponentModel.Description("Gets the collection of acceptable subsequent tokens.")> _
Public ReadOnly Property NextTokenCollection() As List(Of Token)
Get
Return nToken
End Get
End Property
Private pattern As String
<System.ComponentModel.Description("Gets or sets the regular expression pattern that matches the value for the token.")> _
Public Property RegexPattern() As String
Get
Return pattern
End Get
Set(ByVal value As String)
pattern = value
End Set
End Property
Private s As String
<System.ComponentModel.Description("Gets or sets the symbol associated with the token.")> _
Public Property Symbol() As String
Get
Return s
End Get
Set(ByVal value As String)
s = value
End Set
End Property
Private v As String
<System.ComponentModel.Description("Gets or sets the value associated with the token.")> _
Public Property Value() As String
Get
Return v
End Get
Set(ByVal value As String)
v = value
End Set
End Property
<System.ComponentModel.Description("Returns a boolean value based on if the token passed through the argument is in the NextTokenCollection.")> _
Friend Function CanPrecede(ByVal NextToken As Token) As Boolean
For Each item As Token In nToken
If NextToken IsNot Nothing AndAlso item IsNot Nothing AndAlso NextToken.Symbol = item.Symbol Then
Return True
ElseIf IsNothing(NextToken) AndAlso item Is Nothing Then
Return True
End If
Next
Return False
End Function
Friend Overridable Function Clone() As Token
Dim clonedToken As Token = New Token
With clonedToken
.Exceptions.AddRange(ex.ToArray)
.NextTokenCollection.AddRange(nToken.ToArray)
.RegexPattern = pattern
.Symbol = s
End With
Return clonedToken
End Function
Sub New()
ex = New List(Of Exception)
nToken = New List(Of Token)
End Sub
End Class
Code:
Option Strict On
Option Explicit On
Public Class Conditional_Token
Inherits Token
Private pEnd As Token
<System.ComponentModel.Description("Gets or sets the token that marks the end of the statement.")> _
Public Property EndToken() As Token
Get
Return pEnd
End Get
Set(ByVal value As Token)
pEnd = value
End Set
End Property
Private fEnd As Boolean
<System.ComponentModel.Description("Gets or sets if the conditional token has found its end token.")> _
Public Property FoundEnd() As Boolean
Get
Return fEnd
End Get
Set(ByVal value As Boolean)
fEnd = value
End Set
End Property
Private [end] As Boolean
<System.ComponentModel.Description("Gets or sets if the token is an end token.")> _
Public Property IsEnd() As Boolean
Get
Return [end]
End Get
Set(ByVal value As Boolean)
[end] = value
End Set
End Property
Private used As Boolean
<System.ComponentModel.Description("Gets or sets if the conditional token has been used before.")> _
Public Property IsUsed() As Boolean
Get
Return used
End Get
Set(ByVal value As Boolean)
used = value
End Set
End Property
Friend Overrides Function Clone() As Token
Dim clonedToken As Conditional_Token = New Conditional_Token
With clonedToken
.Exceptions.AddRange(Me.Exceptions.ToArray)
.NextTokenCollection.AddRange(Me.NextTokenCollection.ToArray)
.RegexPattern = Me.RegexPattern
.Symbol = Me.Symbol
.EndToken = pEnd
.FoundEnd = fEnd
.IsEnd = [end]
.IsUsed = used
End With
Return clonedToken
End Function
End Class
Next Post...
-
Jun 20th, 2014, 12:29 PM
#31
Re: high level compiler
Code:
Option Strict On
Option Explicit On
Public Module Lexical_Analyzer
Dim tokenClasses As List(Of Token)
Private Sub LoadClasses()
tokenClasses = New List(Of Token)
Dim [as], concatenation, id, modifier, [operator], type, value, whitespace As New Token
Dim conditional, [end] As New Conditional_Token
With [as]
'The AS would accept a TYPE in this situation: local foo as type
.NextTokenCollection.Add(type)
'The pattern broken down:
'^ - Anchor: the match must start at the beginning of the string
'as - Literally matches the word as
'$ - Anchor: the match must finish at the end of the string
.RegexPattern = "^(as)$"
.Symbol = "keyword"
End With
With concatenation
'The CONCATENATION would accept a VALUE in this situation: value .. value
.NextTokenCollection.Add(value)
.Symbol = "concatenation"
'The pattern broken down:
'^ - Anchor: the match must start at the beginning of the string
'\.\. - Literally matches two dots back to back
'$ - Anchor: the match must finish at the end of the string
.RegexPattern = "^(\.\.)$"
.Value = "concatenation"
End With
With conditional
'The CONDITIONAL would accept a VALUE in this situation: if value = value
'The CONDITIONAL would accept an IDENTIFIER in this situation: if foo = value
.NextTokenCollection.AddRange({value, id})
'The pattern broken down:
'^ - Anchor: the match must start at the beginning of the string
'if - Literally matches the word if
'$ - Anchor: the match must finish at the end of the string
.RegexPattern = "^(if)$"
.Symbol = "conditional"
.EndToken = [end]
.FoundEnd = False
.IsUsed = False
End With
With [end]
'The END would accept a WHITESPACE and NOTHING
.NextTokenCollection.AddRange({whitespace, Nothing})
'The pattern broken down:
'^ - Anchor: the match must start at the beginning of the string
'end - Literally matches the word end
'$ - Anchor: the match must finish at the end of the string
.RegexPattern = "^(end)$"
.Symbol = "end"
.IsEnd = True
End With
With id
'The IDENTIFIER would accept a KEYWORD in this situation: local foo as type
'The IDENTIFIER would accept an OPERATOR in this situation: foo = value
'The IDENTIFIER would accept NOTHING in this situation: foo1 = foo2
'The IDENTIFIER would accept WHITESPACE in this situation: foo1 = foo2
.NextTokenCollection.AddRange({[as], [operator], Nothing, whitespace})
'The pattern broken down:
'^ - Anchor: the match must start at the beginning of the string
'[a-zA-Z_] - The first character must be a letter or an underscore
'[a-zA-Z_0-9]* - Followed by zero or more letters, underscores, or digits
'$ - Anchor: the match must finish at the end of the string
.RegexPattern = "^[a-zA-Z_][a-zA-Z_0-9]*$"
.Symbol = "identifier"
End With
With modifier
'The MODIFIER would accept an IDENTIFIER in this situation: local foo as type
.NextTokenCollection.Add(id)
'The pattern broken down:
'^ - Anchor: the match must start at the beginning of the string
'global|local - Literally matches the word global or the word local
'$ - Anchor: the match must finish at the end of the string
.RegexPattern = "^(global|local)$"
.Symbol = "modifier"
End With
With [operator]
'The OPERATOR would accept a VALUE in this situation: local foo as type = value
'The OPERATOR would accept an IDENTIFIER in this situation: foo1 = foo2
.NextTokenCollection.AddRange({value, id})
'The pattern broken down:
'^ - Anchor: the match must start at the beginning of the string
'\+ - Literally matches the +
'- - Literally matches the -
'\* - Literally matches the *
'\/ - Literally matches the /
'\^ - Literally matches the ^
'= - Literally matches the =
'$ - Anchor: the match must finish at the end of the string
.RegexPattern = "^(\+|-|\*|\/|\^|=)$"
.Symbol = "operator"
End With
With type
'The TYPE would accept a WHITESPACE in this situation: local foo as type
'The TYPE would accept an OPERATOR in this situation: local foo as type = value
'The TYPE would accept a NOTHING in this situation: local foo as type
.NextTokenCollection.AddRange({whitespace, [operator], Nothing})
'The pattern broken down:
'^ - Anchor: the match must start at the beginning of the string
'bool - Literally matches the word bool
'number - Literally matches the word number
'text - Literally matches the word text
'$ - Anchor: the match must finish at the end of the string
.RegexPattern = "^(bool|number|text)$"
.Symbol = "type"
End With
With value
'The VALUE would accept a WHITESPACE in this situation: local foo as type = value
'The VALUE would accept an OPERATOR in this situation: value + value
'The VALUE would accept a CONCATENATION in this situation: value .. value
'The VALUE would accept a NOTHING in this situation: foo = value
.NextTokenCollection.AddRange({whitespace, [operator], concatenation, Nothing})
'The pattern broken down:
'^ - Anchor: the match must start at the beginning of the string
'nothing - Literally matches the word nothing
'true|false - Literally matches the words true or false
'\d+(\.\d+)? - Matches a whole number, optionally followed by a decimal point and more digits
'\"(\\.|[^"])*\" - Matches a double-quoted string; \\. allows an escaped character inside the quotes
'$ - Anchor: the match must finish at the end of the string
.RegexPattern = "^(nothing|true|false|\d+(\.\d+)?|\""(\\.|[^""])*\"")$"
.Symbol = "value"
End With
With whitespace
'The WHITESPACE accepts any token as the following token
.NextTokenCollection.AddRange({[as], concatenation, conditional, [end], modifier, [operator], type, value, whitespace, id})
'No regular expression is used for this token:
'RegexPattern holds Environment.NewLine as a marker, and the word is compared against it literally
.RegexPattern = Environment.NewLine
.Symbol = "whitespace"
End With
'IMPORTANT: Have the IDENTIFIER at the very end of the list
tokenClasses.AddRange({[as], concatenation, conditional, [end], modifier, [operator], type, value, whitespace, id})
End Sub
Friend Function Scan(ByVal source As String, ByRef tokens() As Token) As Exception()
If IsNothing(tokenClasses) Then
Call LoadClasses()
End If
Dim ex As List(Of Exception) = New List(Of Exception)
Dim t As List(Of Token) = New List(Of Token)
'Loop through each token
Dim splitTokens() As String = GetTokens(source)
For w As Integer = 0 To splitTokens.Length - 2
Dim word As String = splitTokens(w)
'The possible token
Dim token As Token = Nothing
'Loop through each token class
For tc As Integer = 0 To tokenClasses.Count - 1
'The current class in the iteration
Dim tokenClass As Token = tokenClasses.Item(tc)
If tokenClass.RegexPattern = Environment.NewLine Then
'If the pattern is just the new line constant, then there is no pattern
'I just need to check for if the word is actually the new line constant
If word = Environment.NewLine Then
token = tokenClass.Clone
token.Value = word
Exit For
End If
Else
'Check if the RegEx matches
Dim regex As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(tokenClass.RegexPattern)
Dim match As System.Text.RegularExpressions.Match = regex.Match(word)
If match.Success Then
'If there is a match then set the token variable and leave the token class loop
token = tokenClass.Clone
token.Value = word
Exit For
End If
End If
'Leave the token class loop if there is already a match
If token IsNot Nothing Then
Exit For
End If
Next 'End of token class
'If there was no match found then add an exception
'Otherwise, add it to the list
If token IsNot Nothing Then
t.Add(token)
Else
ex.Add(New Exception(String.Format("'{0}' is not a valid token.", word)))
End If
Next 'End of tokens
'Return the tokens if there are no exceptions
tokens = If(ex.Count = 0, t.ToArray, Nothing)
'Return any exceptions(hopefully none)
Return ex.ToArray
End Function
Private Function GetTokens(ByVal text As String) As String()
Dim lines() As String = text.Split(New String() {Environment.NewLine}, StringSplitOptions.None)
Dim tokens As List(Of String) = New List(Of String)
For Each line In lines
tokens.AddRange(line.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries))
tokens.Add(Environment.NewLine)
Next
Return tokens.ToArray()
End Function
End Module
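As a side note on the identifier rule described in the comments (a letter or underscore first, then letters, underscores, or digits), here is a minimal sketch that checks such a pattern on its own. It assumes that rule is the intended one; `IdentifierPatternSketch` and the sample words are made up for illustration and are not part of the analyzer above.

```vbnet
Option Strict On
Option Explicit On

Imports System
Imports System.Text.RegularExpressions

Module IdentifierPatternSketch
    Sub Main()
        '^ and $ anchor the match to the whole word
        'The first character class has no digits, so a leading digit is rejected
        Dim identifierPattern As String = "^[a-zA-Z_][a-zA-Z_0-9]*$"
        'Hypothetical sample words, only for illustration
        For Each word As String In New String() {"foo", "_bar1", "1foo", "foo-bar"}
            Console.WriteLine("{0} -> {1}", word, Regex.IsMatch(word, identifierPattern))
        Next
    End Sub
End Module
```

With these samples, foo and _bar1 should match, while 1foo and foo-bar should not.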
Code:
Option Strict On
Option Explicit On
Module Syntax_Analyzer
Friend Function Parse(ByVal tokens() As Token) As Exception()
Dim ex As List(Of Exception) = New List(Of Exception)
For t As Integer = 0 To tokens.Length - 1
Dim currentToken As Token = tokens(t)
Dim nextToken As Token = If(t + 1 <= tokens.Length - 1, tokens(t + 1), Nothing)
If currentToken.GetType Is GetType(Conditional_Token) AndAlso Not DirectCast(currentToken, Conditional_Token).IsEnd Then
For nt As Integer = t + 1 To tokens.Length - 1
Dim newCurrent As Token = tokens(nt)
If newCurrent.GetType Is GetType(Conditional_Token) AndAlso DirectCast(newCurrent, Conditional_Token).IsEnd AndAlso Not DirectCast(newCurrent, Conditional_Token).IsUsed Then
DirectCast(newCurrent, Conditional_Token).IsUsed = True
With DirectCast(currentToken, Conditional_Token)
.EndToken = newCurrent
.FoundEnd = True
End With
Exit For
End If
Next
If Not DirectCast(currentToken, Conditional_Token).FoundEnd Then
ex.Add(New Exception("'If' must end with a matching 'end' statement."))
End If
ElseIf Not currentToken.CanPrecede(nextToken) Then
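'nextToken is Nothing when the current token is the last one, so guard before reading its Value
ex.Add(New Exception(String.Format("{0} cannot precede {1}", currentToken.Value, If(nextToken Is Nothing, "the end of the input", nextToken.Value))))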
End If
Next
Return ex.ToArray
End Function
End Module
Currently the Syntax Analyzer isn't much of a parser; it's more of a syntax checker, but I plan to change that.
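One possible direction for the if/end pairing is a stack, so that each end closes the most recent open if and a stray end gets reported. This is only a minimal sketch under assumptions: it works on plain words rather than the Token class above, and `IfEndMatchingSketch` and `CheckIfEndPairs` are hypothetical names that are not part of the posted code.

```vbnet
Option Strict On
Option Explicit On

Imports System
Imports System.Collections.Generic

Module IfEndMatchingSketch
    'Hypothetical helper: pairs each "end" with the most recent unmatched "if"
    Function CheckIfEndPairs(ByVal words As IEnumerable(Of String)) As List(Of String)
        Dim errors As New List(Of String)
        'Positions of the "if" words that are still waiting for an "end"
        Dim openIfs As New Stack(Of Integer)
        Dim index As Integer = 0
        For Each word As String In words
            If word = "if" Then
                openIfs.Push(index)
            ElseIf word = "end" Then
                If openIfs.Count = 0 Then
                    errors.Add(String.Format("'end' at position {0} has no matching 'if'.", index))
                Else
                    'Close the innermost open "if"
                    openIfs.Pop()
                End If
            End If
            index += 1
        Next
        'Anything left on the stack never found its "end"
        For Each position As Integer In openIfs
            errors.Add(String.Format("'if' at position {0} must end with a matching 'end'.", position))
        Next
        Return errors
    End Function

    Sub Main()
        'Nested ifs that are properly closed
        Dim nested() As String = {"if", "a", "=", "1", "if", "b", "=", "2", "end", "end"}
        Console.WriteLine(CheckIfEndPairs(nested).Count)
        'One "end" too many
        Dim stray() As String = {"if", "a", "=", "1", "end", "end"}
        For Each message As String In CheckIfEndPairs(stray)
            Console.WriteLine(message)
        Next
    End Sub
End Module
```

The nested input produces no errors, while the second input reports the extra end. The same idea could be applied to the Conditional_Token instances instead of strings, pushing each if token and assigning its EndToken when the matching end is popped.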