Dec 11th, 2006, 04:35 PM
[2.0] Problem with Regex formatting C++ code to HTML
I'm using regular expressions to format C++ code to HTML with syntax highlighting. I've run into several problems right off the bat. Have a look at the functions I have so far and see if you can help me figure this out.
I'll start with the simplest function, to format strings for HTML in dark red:
Code:
using System.Text.RegularExpressions;

// Wraps string and character literals in a dark red span.
public string FormatStrings(string strSource)
{
    Regex r = new Regex("(\".*?\")+?|('.*?')+?|(\"|').*");
    MatchEvaluator eval = new MatchEvaluator(ReplaceRed);
    return r.Replace(strSource, eval);
}

// Called once per match; m.Value is the matched literal.
public string ReplaceRed(Match m)
{
    return "<span style=\"color: #800000\">" + m.Value + "</span>";
}
This looks for:
1) a " followed by any character except \n (.), zero or more times but as few as possible (*?), followed by a closing " - and that whole group one or more times, but as few as possible (+?)
2) the same thing with single quotes
3) an unmatched " or ' followed by any characters except \n, up to the end of the line
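To check that the three cases behave the way I described, here is a minimal sketch that just lists what the pattern matches. The class name, the FindStrings helper and the sample line are made up for the test; only the pattern itself comes from FormatStrings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringPatternDemo
{
    // Hypothetical helper: returns every substring matched by the
    // string pattern used in FormatStrings.
    public static string[] FindStrings(string source)
    {
        Regex r = new Regex("(\".*?\")+?|('.*?')+?|(\"|').*");
        MatchCollection matches = r.Matches(source);
        string[] found = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            found[i] = matches[i].Value;
        }
        return found;
    }

    public static void Main()
    {
        // Made-up sample: a closed "..." string, a '...' literal,
        // and an unterminated string at the end of the line.
        string sample = "printf(\"hi\"); char c = 'x'; s = \"oops";
        foreach (string s in FindStrings(sample))
        {
            Console.WriteLine(s);
        }
    }
}
```

The closed string falls under case 1, the character literal under case 2, and the trailing unterminated string under case 3.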
Next, functions to format comments in dark green:
Code:
// Wraps block and line comments in a dark green span.
public string FormatComments(string strSource)
{
    Regex r = new Regex("(/\\*(.|\n)*?\\*/)+?|//.*|/\\*(.|\n)*");
    MatchEvaluator eval = new MatchEvaluator(ReplaceGreen);
    return r.Replace(strSource, eval);
}

// Called once per match; m.Value is the matched comment.
public string ReplaceGreen(Match m)
{
    return "<span style=\"color: #008000\">" + m.Value + "</span>";
}
This searches for:
1) a /* followed by any character including newline, zero or more times but as few as possible (*?), followed by */ - and that whole group one or more times, but as few as possible (+?)
2) a // followed by any characters except \n, up to the end of the line
3) an unterminated /* followed by everything up to the end of the source
The first problem is with strings nested inside comments: after I call both replace functions, the inner string tags override the comment coloring.
For example, this is some text before formatting:
Code:
// this is a "comment"
and after running FormatStrings and FormatComments:
Code:
<span style="color: #008000">// this is a <span style="color: #800000">"comment"</span>
</span>
So as I see it, I have at least two options:
1) Don't match strings that are enclosed in comments - but this is very difficult since it would involve a complex lookahead/lookbehind
2) Run the strings first, then when running comments, if I find nested string tags, remove the tags
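Option 2 could be sketched roughly like this. It assumes the string spans look exactly like what ReplaceRed emits, and the class name, the StringSpan pattern and the ReplaceGreenStripRed method are hypothetical names I made up for the sketch; I have not run it against real source files.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentTagCleanup
{
    // Hypothetical pattern: the dark red span emitted by ReplaceRed,
    // with the text between the tags captured as group 1.
    private static readonly Regex StringSpan = new Regex(
        "<span style=\"color: #800000\">(.*?)</span>", RegexOptions.Singleline);

    // Same comment pattern as in FormatComments.
    private static readonly Regex Comment = new Regex(
        "(/\\*(.|\n)*?\\*/)+?|//.*|/\\*(.|\n)*");

    public static string FormatComments(string strSource)
    {
        return Comment.Replace(strSource, new MatchEvaluator(ReplaceGreenStripRed));
    }

    // Removes any string spans inside the matched comment (keeping their
    // text), then wraps the whole comment in the dark green span.
    private static string ReplaceGreenStripRed(Match m)
    {
        string inner = StringSpan.Replace(m.Value, "$1");
        return "<span style=\"color: #008000\">" + inner + "</span>";
    }

    public static void Main()
    {
        // Input as it would look after FormatStrings has already run.
        string afterStrings =
            "// this is a <span style=\"color: #800000\">\"comment\"</span>";
        Console.WriteLine(FormatComments(afterStrings));
    }
}
```

This only cleans up string spans that sit entirely inside one comment. A span that starts inside a comment and ends after it (for example an unmatched ' in a /* */ comment) would still come out broken.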
Is there a better way to do this?