Thread: [2.0] Problem with Regex formatting C++ code to HTML

  1. #1

    Thread Starter
    Frenzied Member
    Join Date
    Aug 2000
    Location
    Birmingham, AL
    Posts
    1,276

    Question [2.0] Problem with Regex formatting C++ code to HTML

    I'm using regular expressions to format C++ code to HTML with syntax highlighting. I've run into several problems right off the bat. Have a look at the functions I have so far and see if you can help me figure this out.

    I'll start with the simplest function, to format strings for HTML in dark red:
    Code:
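    // Requires: using System.Text.RegularExpressions;
    public string FormatStrings(string strSource)
    {
    	// A quoted "..." or '...' literal, or an unmatched quote through end of line.
    	Regex r = new Regex("(\".*?\")+?|('.*?')+?|(\"|').*");
    	MatchEvaluator eval = new MatchEvaluator(ReplaceRed);
    
    	return r.Replace(strSource, eval);
    }
    
    // Wraps the matched text in a dark red span.
    public string ReplaceRed(Match m)
    {
    	return "<span style=\"color: #800000\">" + m.Value + "</span>";
    }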
    This looks for:

    1) a " followed by any character except \n (.), zero or more times but as few as possible (*?), followed by a closing "; that whole group repeated one or more times, again as few as possible (+?)

    2) the same pattern for single quotes

    3) an unmatched " or ' followed by anything but \n, i.e. through the end of the line
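
    One thing the lazy ".*?" form does not cover is an escaped quote inside a literal, such as "a \"b\" c", where the match would stop at the first \". A minimal sketch of a tighter pattern follows. The class name StringLiteralPattern and the field name Literal are made up for illustration, and this variant deliberately drops case 3 (the unmatched quote):

```csharp
using System.Text.RegularExpressions;

public static class StringLiteralPattern
{
    // An opening quote, then any run of escape pairs (backslash + one char)
    // or ordinary characters (not the quote, a backslash, or a newline),
    // then the closing quote. Written as a verbatim string, so "" is one
    // literal double quote and backslashes reach the regex engine unchanged.
    public static readonly Regex Literal = new Regex(
        @"""(?:\\.|[^""\\\n])*""|'(?:\\.|[^'\\\n])*'");
}
```

    Because the character class excludes the backslash, a \" can only be consumed as an escape pair, so it never ends the literal early.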




    Next, functions to format comments in dark green:
    Code:
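    // Requires: using System.Text.RegularExpressions;
    public string FormatComments(string strSource)
    {
    	// A /* ... */ block, a // line comment, or an unterminated /* through end of input.
    	Regex r = new Regex("(/\\*(.|\n)*?\\*/)+?|//.*|/\\*(.|\n)*");
    	MatchEvaluator eval = new MatchEvaluator(ReplaceGreen);
    
    	return r.Replace(strSource, eval);
    }
    
    // Wraps the matched text in a dark green span.
    public string ReplaceGreen(Match m)
    {
    	return "<span style=\"color: #008000\">" + m.Value + "</span>";
    }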
    This searches for:

    1) a /* followed by any character, including newline, zero or more times but as few as possible (*?), followed by */; that whole group repeated one or more times, again as few as possible (+?)

    2) a // followed by anything but \n

    3) an unterminated /* followed by everything up to the end of the input
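
    The (.|\n) group can probably be avoided: with RegexOptions.Singleline the dot matches \n as well, which saves capturing every character in a group. A minimal sketch, where CommentPattern and Comment are made-up names for illustration:

```csharp
using System.Text.RegularExpressions;

public static class CommentPattern
{
    // Singleline makes '.' match '\n' too, so block comments can span lines
    // without the (.|\n) group. Line comments use [^\r\n]* instead of '.*',
    // because under Singleline '.*' would run to the end of the input.
    public static readonly Regex Comment = new Regex(
        @"/\*.*?\*/|//[^\r\n]*|/\*.*",
        RegexOptions.Singleline);
}
```

    Excluding \r as well as \n from the line comment is deliberate. In .NET the dot matches \r, so with Windows line endings //.* swallows the trailing \r, which is likely why a closing tag can end up on the line after the comment.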



    The first problem: when a string is nested inside a comment, calling both replace functions leaves the inner string tags in place, and they override the comment coloring.

    For example, this is some text before formatting:
    Code:
    // this is a "comment"
    and after running FormatStrings and FormatComments:
    Code:
    <span style="color: #008000">// this is a <span style="color: #800000">"comment"</span>
    </span>
    As I see it, I have at least two options:

    1) Don't match strings that are enclosed in comment tags. This is very difficult, since it would involve a complex lookahead/lookbehind.

    2) Run the strings first; then, when running comments, remove any nested string tags I find.

    Is there a better way to do this?
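
    A third option that is often used for this kind of problem is to match comments and strings in a single pass with one alternation, so neither pattern ever sees text the other has already claimed. The following is only a sketch under assumptions: CppHighlighter, Highlight, Colorize, and the group names comment and str are made up for illustration, and the sub-patterns are tightened variants (escaped quotes handled, unmatched quotes not colored), not the ones posted above.

```csharp
using System.Text.RegularExpressions;

public static class CppHighlighter
{
    // Comments and string/char literals in one alternation. The engine scans
    // left to right, so whichever construct starts first consumes its whole
    // text: quotes inside a comment are never matched as strings, and a //
    // inside a string literal is never matched as a comment.
    private static readonly Regex Token = new Regex(
        @"(?<comment>/\*.*?\*/|//[^\r\n]*|/\*.*)" +
        @"|(?<str>""(?:\\.|[^""\\\n])*""|'(?:\\.|[^'\\\n])*')",
        RegexOptions.Singleline);

    public static string Highlight(string source)
    {
        return Token.Replace(source, new MatchEvaluator(Colorize));
    }

    // Picks the color from the named group that took part in the match.
    private static string Colorize(Match m)
    {
        string color = m.Groups["comment"].Success ? "#008000" : "#800000";
        return "<span style=\"color: " + color + "\">" + m.Value + "</span>";
    }
}
```

    With this, Highlight applied to the // this is a "comment" example should produce a single green span with no nested red span, and no lookbehind or tag stripping is needed. Keywords could presumably be added later as another named group in the same alternation.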

  2. #2
    PowerPoster gep13's Avatar
    Join Date
    Nov 2004
    Location
    The Granite City
    Posts
    21,963

    Re: [2.0] Problem with Regex formatting C++ code to HTML

    Hello,

    To be honest, I haven't read all of your post, so I'm not in any way trying to answer your question here, but I do have a suggestion:

    Syntax Highlighter

    which will highlight all the syntax that you would like. If you are doing this as a learning experience, then ignore it, but if you are just trying to get your syntax highlighted, it might be worth a look!!

    Hope this helps!

    Gary

  3. #3

    Thread Starter

    Re: [2.0] Problem with Regex formatting C++ code to HTML

    Thanks for the link. I looked at what he was trying to do, and he clearly put a lot of work into it, but for all that, it doesn't even highlight C# code properly.

    Live syntax highlighting would be nice to have, but first I just want to be able to parse C++ files (C# later on) and generate HTML from them.

    I was hoping someone with a lot of Regular Expression experience could help.
