Thread: Parse VB/C# code files

  1. #1

    Thread Starter
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    Parse VB/C# code files

    Hi,

    This may be a bit of an odd question, but I was thinking about it and thought it might not be as hard as I first thought it would.

    I am creating a kind of 'lightweight' visual studio IDE, basically a text editor for VB or C# code files, or in other words: a visual studio IDE without all the fancy stuff like actually running projects, debugging, form designer, etc. The idea is that you can view and edit code source files quickly and easily without having to start visual studio itself. I often find myself opening source files in notepad when I just want to take a look at something quickly, or to type some code that is intended to be posted on this forum. I find it too much 'work' to fire up visual studio on those occasions, but it would be nice if I had some basic syntax highlighting (already have that), Intellisense and such.

    What I'm trying to do now is parse a VB or C# source file to extract information such as the types/classes in a file, and the members (as well as their parameters and return type) in those classes.

    I was going to do this manually but I doubt I could pull it off efficiently (if at all), so I started thinking about it and thought maybe .NET has some built in functionality already. I don't know much about this stuff, but I do know that you can compile source code by feeding it a string that represents the code. So if it can 'compile a string', then surely it should be able to parse it as well? With the difference between compiling and parsing (I'm not sure if I have this terminology correct) I mean:
    - compiling: reading the source and converting it to the CIL so it can be run
    - parsing: reading the source and extracting classes/members info, so that the IDE can do some basic error checking (undeclared variable warnings etc) and display the members in a class (in those comboboxes at the top of the code editor for example).

    I don't need to compile the source (I'm not going to run the application), I only need to parse it.


    Can this be done? I think it should be possible but I cannot find what I would need to do this... I would assume that I could call some function that would return a collection of types (Type), and from that I could figure out the members (MethodInfo, PropertyInfo, etc) and their parameters, return types, etc. And all I should give it is the source code as a string.

    Thanks for any help!



    Oh, and if this is impossible after all, then does someone know where I can find some information about the VB and C# languages related to parsing? I think I read something like that a while back, where some article (probably MSDN) explained the format of each language in detail so someone could write their own parser. I didn't look at it too long but I can't find it anymore now that I might need it...

  2. #2
    Frenzied Member
    Join Date
    Sep 2006
    Location
    Scotland
    Posts
    1,054

    Re: Parse VB/C# code files

    Wow... This sounds like an amazing project. It's beyond my skills to help but it's something I will definitely be keeping an eye on.

    The only comment I can make is that even though I'm not sure about the interpretation part of Visual Studio, it shouldn't be too hard to use the compiler from your app? (If that would be of any use....)


    Edit: http://www.vbdotnetheaven.com/Upload...dlineInVB.aspx - That's related to the compiler thing, if it's any help.
    Last edited by 03myersd; Sep 5th, 2010 at 07:04 AM. Reason: Further information.

  3. #3
    Frenzied Member
    Join Date
    Jan 2008
    Posts
    1,754

    Re: Parse VB/C# code files

    It's not what you're asking for but here's some code to compile an application.

    For debugging you could compile the app to a TEMP directory and possibly add a TraceListener, if it is what I think it is. Here's a link for the TraceListener on MSDN: http://msdn.microsoft.com/en-us/libr...elistener.aspx

    Code for Compiling a String into an EXE.
    NOTE * In the below code I have two RichTextBoxes, rtbScript which is where the code is, and rtbResults which displays error messages etc...
    Code:
                // Needs: using System; using System.CodeDom.Compiler; using System.Diagnostics;
                //        using System.Drawing; using System.IO;
                CodeDomProvider codeProvider = CodeDomProvider.CreateProvider("CSharp");
                string output = Path.Combine(projectLocation, projectName.Replace(" ", "") + ".exe");

                CompilerParameters parameters = new CompilerParameters();
                // Make sure we generate an EXE, not a DLL
                parameters.GenerateExecutable = true;
                parameters.OutputAssembly = output;

                string cSharpScriptInput = rtbScript.Text; // Replace rtbScript with your code source.
                CompilerResults results = codeProvider.CompileAssemblyFromSource(parameters, cSharpScriptInput);
                if (results.Errors.Count > 0)
                {
                    // Failed compile: list every reported error (warnings are in this collection too)
                    rtbResults.Select(0, 0);
                    rtbResults.SelectionColor = Color.Red;
                    foreach (CompilerError compErr in results.Errors)
                    {
                        rtbResults.SelectedText = "Line number " + compErr.Line + ", Error Number: " + compErr.ErrorNumber + ", '" + compErr.ErrorText + "'" + Environment.NewLine;
                    }
                }
                else
                {
                    // Successful compile: report it and start the generated EXE
                    rtbResults.Select(0, 0);
                    rtbResults.SelectionColor = Color.Blue;
                    rtbResults.SelectedText = "Success!" + Environment.NewLine;
                    Process.Start(output);
                }
    Just an idea, but is there some way of utilizing the functions of the XMLDocument class to parse data other than that of an XML structure? That way you can find out where the if statement ends, etc...

    Does this answer your question about MethodInfo/PropertyInfo etc..?
    http://msdn.microsoft.com/en-us/libr...eflection.aspx

  4. #4

    Thread Starter
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    Re: Parse VB/C# code files

    Thanks both, but that is indeed not what I need. I don't need to compile the code, I don't need to run it, and I certainly don't need to debug it. That's stuff the VS IDE does best and if you want to do that, you would use the normal IDE. My lightweight IDE is solely for reading and editing the source code files. It doesn't even have any concept of 'project' or 'solution', besides the fact that you can open a project or solution (meaning: it opens all the source files in that project, but nothing else).

    I had a quick look in the System.CodeDom.Compiler namespace, and while there is a class for parsing (CodeParser), it seems to be only an empty implementation of the ICodeParser interface. As far as I can tell, I would still have to parse the code string manually... There must be a better way?

  5. #5
    Frenzied Member
    Join Date
    Sep 2006
    Location
    Scotland
    Posts
    1,054

    Re: Parse VB/C# code files

    How about a plugin for something like notepad++ then? It has all the syntax highlighting. All it would have to do is scan the project file for the files it has to open?

  6. #6

    Thread Starter
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    Re: Parse VB/C# code files

    I don't see how that would help me... I already got syntax highlighting, I already got the entire text editor functionality working, including a tabbed MDI interface with tabgroups as in visual studio. Notepad++ wouldn't parse the code for me.

    Again, this is what I need:
    I need to parse the source code (string) so that I have a collection of the Types in that code, with each type having a collection of members (MethodInfo, PropertyInfo and FieldInfo objects probably) with their arguments and return types. I basically want to be able to build a treeview containing all the types in the current file, with their methods, properties, fields and their return types, similar to the Class View window in Visual Studio. That would then allow me to build the comboboxes on top of the code editor that list the types and their members allowing for quick navigation, and I could use it as an Intellisense source. I think the editor I'm using has Intellisense support built in just like Visual Studio, but I have to supply it with the members it displays manually, so that's not much use if I don't have those...

  7. #7
    Frenzied Member
    Join Date
    Sep 2006
    Location
    Scotland
    Posts
    1,054

    Re: Parse VB/C# code files

    Ah ok. I misunderstood. Apologies!

  8. #8

    Thread Starter
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    Re: Parse VB/C# code files

    I suppose however that I could compile the code first to some temporary path, and then use Reflection on the resulting assembly to extract the Types (Assembly.GetTypes), and from that the members (Type.GetMethods, Type.GetProperties, etc). That would probably work, but:
    1. Compiling is slow,
    2. Reflection is slow.

    So it would likely be extremely slow. That's not good, I need this to happen 'on the fly' just like visual studio does. I'm going to see how far I can get with this just to see how slow it actually is, but I doubt it is a good approach.
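
    The reflection half of that idea can be sketched as follows. This is only an illustration, not code from the thread: the class name MemberLister and its output format are made up, and it assumes the assembly has already been compiled and loaded (for example with Assembly.LoadFrom on the temporary path).

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical helper (not from the thread): lists types and their members.
public static class MemberLister
{
    // Members declared in the type itself, public or not, instance or static.
    private const BindingFlags Declared =
        BindingFlags.Public | BindingFlags.NonPublic |
        BindingFlags.Instance | BindingFlags.Static | BindingFlags.DeclaredOnly;

    // Returns one line per type, method, property and field in the assembly.
    public static List<string> Describe(Assembly assembly)
    {
        var lines = new List<string>();
        foreach (Type type in assembly.GetTypes())
        {
            lines.Add("Type " + type.FullName);

            foreach (MethodInfo method in type.GetMethods(Declared))
            {
                // Skip property/event accessors such as get_Name or add_Changed.
                if (method.IsSpecialName) continue;

                var parameters = new List<string>();
                foreach (ParameterInfo p in method.GetParameters())
                    parameters.Add(p.ParameterType.Name + " " + p.Name);

                lines.Add("  Method " + method.ReturnType.Name + " " + method.Name +
                          "(" + string.Join(", ", parameters.ToArray()) + ")");
            }

            foreach (PropertyInfo property in type.GetProperties(Declared))
                lines.Add("  Property " + property.PropertyType.Name + " " + property.Name);

            // Compiler-generated backing fields show up here as well.
            foreach (FieldInfo field in type.GetFields(Declared))
                lines.Add("  Field " + field.FieldType.Name + " " + field.Name);
        }
        return lines;
    }
}
```

    Calling MemberLister.Describe(Assembly.LoadFrom(path)) would give the raw material for a Class View style tree. It says nothing about how fast the compile step before it is, which is the real open question here.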

  9. #9
    Frenzied Member
    Join Date
    Sep 2006
    Location
    Scotland
    Posts
    1,054

    Re: Parse VB/C# code files

    Again, not quite what you are looking for, but have you tried SharpDevelop? If there are features that you don't want you can strip them out.

  10. #10

    Thread Starter
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    Re: Parse VB/C# code files

    I'm using the text editor from SharpDevelop, and I suppose I could take a look at their source code at how they parse the source, but that would probably be a hell of a job to figure out. I suppose that is my last resort.

    At the moment I'm trying to compile the source code but I'm running into problems: if a class uses another class (which is like 99% of the time) then it can't compile the class because it doesn't know the other class. I tried compiling them at the same time (by providing both sources in the ParamArray parameter instead of just one) but that doesn't seem to work either. I haven't really looked into it after that (dinner time) but I'm not sure if that is the correct approach. I also saw errors telling me that 'Color' was not defined. I suppose I have to let the compiler 'reference' System.Drawing... But how?
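
    One possible answer to both problems, assuming the CodeDom compiler on the .NET Framework: references go into CompilerParameters.ReferencedAssemblies, and CompileAssemblyFromSource accepts several source strings that are compiled together. The sketch below is not from the thread; the class name InMemoryCompiler and its parameters are made up for illustration.

```csharp
using System;
using System.CodeDom.Compiler;
using System.Reflection;

// Hypothetical helper (not from the thread). Assumes the .NET Framework,
// where CodeDom can invoke the C# and VB compilers.
public static class InMemoryCompiler
{
    // language:   "CSharp" or "VisualBasic"
    // references: assembly file names such as "System.dll", "System.Drawing.dll"
    // sources:    the text of every code file, compiled together so that
    //             a class in one file can use a class from another
    // Returns the compiled assembly, or null when there were errors.
    public static Assembly Compile(string language, string[] references, params string[] sources)
    {
        using (CodeDomProvider provider = CodeDomProvider.CreateProvider(language))
        {
            var parameters = new CompilerParameters();
            parameters.GenerateExecutable = false; // build a library, no entry point needed
            parameters.GenerateInMemory = true;    // load the result into this process
            parameters.ReferencedAssemblies.AddRange(references);

            CompilerResults results = provider.CompileAssemblyFromSource(parameters, sources);
            if (results.Errors.HasErrors)
            {
                foreach (CompilerError error in results.Errors)
                    Console.WriteLine(error.FileName + "(" + error.Line + "): " +
                                      error.ErrorNumber + " " + error.ErrorText);
                return null;
            }
            return results.CompiledAssembly;
        }
    }
}
```

    For example, InMemoryCompiler.Compile("CSharp", new[] { "System.dll", "System.Drawing.dll" }, sourceA, sourceB) would let code in sourceA use both Color and a class defined in sourceB. A real project usually needs more references than these two, which is exactly the information a project file carries.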

  11. #11
    Frenzied Member
    Join Date
    Sep 2006
    Location
    Scotland
    Posts
    1,054

    Re: Parse VB/C# code files

    I honestly couldn't tell you. This is way over my head. But when you get it how you like it, if you could post the source that would be great!

  12. #12
    Stack Overflow moderator
    Join Date
    May 2008
    Location
    British Columbia, Canada
    Posts
    2,824

    Re: Parse VB/C# code files

    Well, you could compile the code, load the DLL dynamically, and use reflection to get an object's properties. It's probably about how VS does it (I know code compilation is involved somewhere in there) but I'm not sure how efficient it'll be... that leaves "manual" parsing, which is what I'd go with. It could be really easy with regex, for example a method signature would be:
    Code:
    ^(Private|Public|Friend|Protected\sFriend|Protected)\s(Sub|Function)\s\w+\s*\(([^\)]*)\)(\sAs\s\w+)?$
    Or something like that.
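
    A pattern along those lines is easy to try out. The sketch below is not from the thread: the class name VbSignatureScanner and the named groups are made up, and the pattern is a variant with balanced parentheses, leading indentation allowed, and an optional "As <type>" clause.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the thread): finds simple VB method signatures.
public static class VbSignatureScanner
{
    // One signature per line: access modifier, Sub/Function, name,
    // parameter list, optional return type. VB is case-insensitive.
    private static readonly Regex Signature = new Regex(
        @"^\s*(?<access>Private|Public|Friend|Protected\s+Friend|Protected)\s+" +
        @"(?<kind>Sub|Function)\s+(?<name>\w+)\s*" +
        @"\((?<params>[^)]*)\)(\s+As\s+(?<returns>[\w.]+))?\s*$",
        RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Returns the signatures found, rebuilt in a normalized form.
    public static List<string> FindMethods(string source)
    {
        var found = new List<string>();
        foreach (Match m in Signature.Matches(source))
        {
            string text = m.Groups["kind"].Value + " " + m.Groups["name"].Value +
                          "(" + m.Groups["params"].Value.Trim() + ")";
            if (m.Groups["returns"].Success)
                text += " As " + m.Groups["returns"].Value;
            found.Add(text);
        }
        return found;
    }
}
```

    On a class containing "Public Function Add(ByVal x As Integer, ByVal y As Integer) As Integer", "Private Sub Log()" and "Public Shared Sub Skipped()", this finds the first two and misses the third, because Shared (like Overrides, Overloads, generic parameters, line continuations and so on) is not covered by the pattern.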

  13. #13

    Thread Starter
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    Re: Parse VB/C# code files

    That's what I'm trying to do now: compiling the source and loading the compiled assembly using Reflection. I have tried it a few times but the compilation can fail in so many ways that I haven't really tested how fast it is yet. I'm going to have to research compiling source code in .NET some more because I don't really understand the errors it's returning. For example, if I try to compile a simple VB code file it always returns an error that mentions the '<Default>' namespace (it can't be found or something, can't remember). If I try to compile the exact same code except in C# then it works fine, but as soon as you get multiple files (which is like 99% of the time) then it screws up again. The files I'm compiling are all error free when I open the project in Visual Studio so they should be possible to compile, I probably have to use some different options or something...

    And I doubt regex would be useful to parse the code manually. There are so many different allowed formats that I think any decent regex pattern (one that catches everything) would be several pages long. I'm sure the only feasible way is to do some kind of lexical analysis (which may or may not use regex, I think most parsers simply read 'tokens', one character at a time). I don't know anything about that though so I doubt I'm going to try it anytime soon.

  14. #14
    Stack Overflow moderator
    Join Date
    May 2008
    Location
    British Columbia, Canada
    Posts
    2,824

    Re: Parse VB/C# code files

    No, really - regular expressions would be quite easy to use. Although there are different kinds of modifiers, just put them in order, first.

  15. #15
    You don't want to know.
    Join Date
    Aug 2010
    Posts
    4,578

    Re: Parse VB/C# code files

    Regex might actually work, but it's not a golden hammer. Just because it works and gets the job done doesn't mean it's going to be elegant or easy to maintain. It'd definitely be easier to parse C# than HTML with a regex due to the nature of the grammar, but it just doesn't seem like an appropriate solution given the easier solutions that exist.

    Whichever choice you make, it's good to have the C# Language Specification and the Visual Basic Language Specification handy. These include the grammar for the languages in BNF (or at least something like it). Whether you choose regex or a more traditional parser, it will be invaluable to have these as a guide because it will tell you every possible valid way to express a concept. C# makes a better example here because its syntactical grammar seems to be explained better than VB's (probably because MS intended for other people to implement C# compilers and VB is Microsoft-only property.) Here's a rough specification of a method in (maybe invalid) regex to give you an idea of what you're dealing with:
    ((new|public|protected|internal|private|static|virtual|sealed|override|abstract|extern)\s+)*(partial\s+)?(<type>|void)\s+(<identifier>)(...
    That's up to the parameter list. <type> is more or less the same as <identifier>; it has a fairly complicated definition but [A-Za-z_][0-9A-Za-z_]* is probably adequate. This is just for method definitions; you likely need to detect namespaces, type definitions, variable declarations, delegates, events, properties, and possibly anonymous methods in addition to some other constructs I might have neglected to mention.

    I spent a couple of semesters on formal languages and compiler design, so to me it's more natural to use the lexical/syntax grammar to tokenize the file and create syntactical tokens. Find articles about writing language parsers and you'll usually find them written in 4 or 5 phases; first is lexical analysis and second is syntax analysis (parsing). You only need to get through syntax analysis to be able to build a syntax tree that will make it easy to find the methods in a code file. The code to do this would be a mite laborious, but it would follow the language spec very closely so it wouldn't be hard to catch mistakes. It's fun, but I'm pretty rusty on it and it'd take me a long time to get back with explanations with actual code to back them up.
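
    To give a feel for the first phase only, here is a minimal sketch of a lexer for C#-like text. It is not from the thread and is nowhere near the real C# lexical grammar: the names TokenKind, Token and MiniLexer are made up, and it only knows identifiers, integer literals, plain double-quoted strings, // comments and single-character symbols.

```csharp
using System.Collections.Generic;

public enum TokenKind { Identifier, Number, String, Symbol }

public sealed class Token
{
    public readonly TokenKind Kind;
    public readonly string Text;
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
}

// Hypothetical, heavily simplified lexer (not from the thread).
public static class MiniLexer
{
    public static List<Token> Tokenize(string source)
    {
        var tokens = new List<Token>();
        int i = 0;
        while (i < source.Length)
        {
            char c = source[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }

            // Line comment: skip to the end of the line.
            if (c == '/' && i + 1 < source.Length && source[i + 1] == '/')
            {
                while (i < source.Length && source[i] != '\n') i++;
                continue;
            }

            int start = i;
            if (char.IsLetter(c) || c == '_')
            {
                // Identifier or keyword; telling them apart is left to a later step.
                while (i < source.Length && (char.IsLetterOrDigit(source[i]) || source[i] == '_')) i++;
                tokens.Add(new Token(TokenKind.Identifier, source.Substring(start, i - start)));
            }
            else if (char.IsDigit(c))
            {
                while (i < source.Length && char.IsDigit(source[i])) i++;
                tokens.Add(new Token(TokenKind.Number, source.Substring(start, i - start)));
            }
            else if (c == '"')
            {
                i++; // opening quote
                while (i < source.Length && source[i] != '"')
                {
                    if (source[i] == '\\' && i + 1 < source.Length) i++; // skip the escaped character
                    i++;
                }
                if (i < source.Length) i++; // closing quote, if the string is terminated
                tokens.Add(new Token(TokenKind.String, source.Substring(start, i - start)));
            }
            else
            {
                // Everything else is treated as a one-character symbol.
                i++;
                tokens.Add(new Token(TokenKind.Symbol, c.ToString()));
            }
        }
        return tokens;
    }
}
```

    The syntax analysis phase would then walk this token list instead of raw characters, for example looking for modifiers, a type, an identifier and an opening parenthesis to recognize a method declaration.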

    Code compilation followed by reflection would be as easy as the syntax tree from a parser if you could get the assemblies actually built, but it's magicks I've never fooled with so I don't dare to comment on it. The VB compiler seems to do some funky things compared to C#; it does a lot more auto-generation of code. If you provided more detail about the code you're using and the errors it throws I'm sure someone could help you out with that.

  16. #16

    Thread Starter
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    Re: Parse VB/C# code files

    Yeah, I kind of ruled out manual parsing by now as it will probably turn out to be a project that would take me months if not years, and that is assuming I ever get it right and never give up.

    Compiling is probably the way to go, but again, manual compilation (using the CodeDom namespace) is not going to work either. It is useful for compiling small pieces of code, but entire projects are a completely different story. I recently found out that I could use MSBuild.exe, which I can probably run from the command line, to compile a solution or project file exactly the way VS does it. That gives me some more problems though (mainly because my application has no concept of 'project' other than 'collection of these source files', while the compiler actually needs a valid project to work). I will probably work it out some more in the coming days but it's going to take time... which I don't have at the moment.

  17. #17
    PowerPoster SJWhiteley's Avatar
    Join Date
    Feb 2009
    Location
    South of the Mason-Dixon Line
    Posts
    2,256

    Re: Parse VB/C# code files

    The problem with compiling and reflecting is that it assumes the code can compile... If you have an editor, you are obviously editing it, which means that 90% of the time it doesn't compile at any given instant. And as you have found, the compiler will need to know which files and references it needs to compile (a.k.a. a 'project' of some kind).

    Essentially, you have to add intelligence to the parser (used for syntax highlighting, etc). It seems you are using a third-party editor which means you may not be able to do that...unless it has the ability to extend the parser and give up information about what it has parsed - this is what Sitten indicated (lexical analysis followed by syntactical analysis).

  18. #18

    Thread Starter
    PowerPoster
    Join Date
    Apr 2007
    Location
    The Netherlands
    Posts
    5,070

    Re: Parse VB/C# code files

    Yes, but when the code cannot compile, the CodeDom compiler gives me an excellent list of errors which I can then display to the user, just like Visual Studio does. When it doesn't compile, of course, I cannot provide a list of the classes, and I will simply display the result of the latest successful compile. Visual Studio itself is much better at this, as it can happily display the Class View window or Intellisense in a project that cannot compile, so that's why I know it has to use some kind of parser to parse the language, which is completely separate from compiling it. So that's one thing Visual Studio does better, but I don't really care. It's not like I'm creating a new IDE trying to overthrow MS; I'm just creating a simple lightweight editor, but I would still like some basic form of the class view and Intellisense.

    I am pretty certain the MSBuild.exe compiler supplies the same list of errors by the way, but I haven't looked at it much yet.

    And I have the source code of the editor I'm using (it's the ICSharpCode.TextEditor one used by SharpDevelop). I am pretty sure it uses some kind of regex-based syntax highlighting, since the highlighting scheme is defined in XML files. I don't think it actually parses the language as being VB or C# and applies a scheme for that, as I can simply create a custom XML file for some language 'X' and tell it to use that and it would happily highlight any file written in language 'X' for me, without knowing anything about the structure of language 'X'.

  19. #19
    Stack Overflow moderator
    Join Date
    May 2008
    Location
    British Columbia, Canada
    Posts
    2,824

    Re: Parse VB/C# code files

    BTW, [a-zA-Z0-9_] is the same as \w.
