-
Dec 11th, 2006, 04:35 PM
#1
Thread Starter
Frenzied Member
[2.0] Problem with Regex formatting C++ code to HTML
I'm using regular expressions to format C++ code to HTML with syntax highlighting. I've run into several problems right off the bat. Have a look at the functions I have so far and see if you can help me figure this out.
I'll start with the simplest function, to format strings for HTML in dark red:
Code:
public string FormatStrings(string strSource)
{
    // Double-quoted string, single-quoted literal, or an unterminated quote running to end of line
    Regex r = new Regex("(\".*?\")+?|('.*?')+?|(\"|').*");
    MatchEvaluator eval = new MatchEvaluator(ReplaceRed);
    strSource = r.Replace(strSource, eval);
    return strSource;
}

// Wraps the matched text in a dark red span
public string ReplaceRed(Match m)
{
    return "<span style=\"color: #800000\">" + m.Value + "</span>";
}
This looks for:
1) a " followed by any characters except \n (.), as few as possible (*?), up to the next " - that whole group one or more times, again as few as possible (+?)
2) the same thing for single quotes
3) an unmatched " or ' followed by everything up to the end of the line
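To check what the three alternatives actually capture, a small harness can help. This is only a sketch: the class name StringPatternDemo and the helper FirstMatch are hypothetical, and the pattern is the same one as in FormatStrings, just written as a verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

static class StringPatternDemo
{
    // Same pattern as FormatStrings, as a verbatim string ("" is one quote).
    static readonly Regex Strings = new Regex(@"("".*?"")+?|('.*?')+?|(""|').*");

    // Hypothetical helper: returns the text of the first match, or "" if none.
    public static string FirstMatch(string source)
    {
        return Strings.Match(source).Value;
    }
}
```

For the input `x = "a" + "b";` the first match is `"a"`, and for the unterminated `char c = 'a` it is `'a`. One thing the harness makes visible: for a literal containing escaped quotes, such as `"say \"hi\""`, the first match stops at the first escaped quote (`"say \"`), because the pattern does not treat a backslash specially.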
Next, functions to format comments in dark green:
Code:
public string FormatComments(string strSource)
{
    // Block comment, line comment, or an unterminated block comment running to end of input
    Regex r = new Regex("(/\\*(.|\n)*?\\*/)+?|//.*|/\\*(.|\n)*");
    MatchEvaluator eval = new MatchEvaluator(ReplaceGreen);
    strSource = r.Replace(strSource, eval);
    return strSource;
}

// Wraps the matched text in a dark green span
public string ReplaceGreen(Match m)
{
    return "<span style=\"color: #008000\">" + m.Value + "</span>";
}
This searches for:
1) a /* followed by any characters including newlines, as few as possible (*?), up to the next */ - that whole group one or more times, again as few as possible (+?)
2) a // followed by everything up to the end of the line
3) an unterminated /* followed by everything up to the end of the source
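The same three cases can also be written with .NET's inline single-line option, (?s:...), in place of (.|\n), which avoids a capturing alternation for every character. The sketch below assumes that rewrite is acceptable. It also drops the outer (...)+? wrapper, since a lazy one-or-more at the end of an alternative only ever takes one repetition. CommentPatternDemo and FirstMatch are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentPatternDemo
{
    // (?s:...) lets '.' match \n inside the group only, so "//.*" still stops
    // at the end of the line while block comments can span several lines.
    static readonly Regex Comments = new Regex(@"/\*(?s:.*?)\*/|//.*|/\*(?s:.*)");

    // Hypothetical helper: returns the text of the first match, or "" if none.
    public static string FirstMatch(string source)
    {
        return Comments.Match(source).Value;
    }
}
```

With this pattern, a block comment that spans two lines is returned whole, `int i; // note` yields `// note`, and an unterminated `/*` runs to the end of the input.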
The first problem is strings nested inside comments: after I call both replace functions, the inner string tags override the comment coloring.
For example, this is some text before formatting:
Code:
// this is a "comment"
and after running FormatStrings and FormatComments:
Code:
<span style="color: #008000">// this is a <span style="color: #800000">"comment"</span>
</span>
So as I see it, I have at least two options:
1) Don't match strings that are enclosed in comments - but this is very difficult, since it would involve a complex lookahead/lookbehind
2) Run the strings first; then, when running comments, strip any string tags found nested inside a comment
Is there a better way to do this?
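A third possibility is to do both jobs in a single pass: put the comment and string patterns into one regex as named groups, so whichever token starts first consumes its text and the other pattern never sees what is inside it. The following is a minimal sketch under that assumption, not a tested solution. SinglePassHighlighter, Tokens, Highlight and the group names are hypothetical, and the patterns are the ones above with (?s:...) standing in for (.|\n). It uses an anonymous delegate so it compiles under C# 2.0.

```csharp
using System;
using System.Text.RegularExpressions;

static class SinglePassHighlighter
{
    // One alternation for both token kinds. The regex engine scans left to
    // right, so a quote inside a comment is consumed as part of the comment,
    // and a // inside a string is consumed as part of the string.
    static readonly Regex Tokens = new Regex(
        @"(?<comment>/\*(?s:.*?)\*/|//.*|/\*(?s:.*))" +
        @"|(?<str>"".*?""|'.*?'|[""'].*)");

    public static string Highlight(string source)
    {
        return Tokens.Replace(source, delegate(Match m)
        {
            // Pick the color from whichever named group matched.
            string color = m.Groups["comment"].Success ? "#008000" : "#800000";
            return "<span style=\"color: " + color + "\">" + m.Value + "</span>";
        });
    }
}
```

Calling Highlight on `// this is a "comment"` wraps the whole line in a single green span with no nested red span, and `s = "// not a comment";` gets only a red span around the string literal. Because the replacement happens in one pass, the quotes inside the generated style attributes are never re-scanned either.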
-
Dec 12th, 2006, 08:19 AM
#2
Re: [2.0] Problem with Regex formatting C++ code to HTML
Hello,
To be honest, I haven't read your whole post, so I'm not trying to answer your question here, but I do have a suggestion:
Syntax Highlighter
which will highlight all the syntax that you would like. If you are doing this as a learning experience, then ignore it, but if you just want your syntax highlighted, it might be worth a look!
Hope this helps!
Gary
-
Dec 12th, 2006, 09:13 AM
#3
Thread Starter
Frenzied Member
Re: [2.0] Problem with Regex formatting C++ code to HTML
 Originally Posted by gep13
Hello,
To be honest, I haven't read your whole post, so I'm not trying to answer your question here, but I do have a suggestion:
Syntax Highlighter
which will highlight all the syntax that you would like. If you are doing this as a learning experience, then ignore it, but if you just want your syntax highlighted, it might be worth a look!
Hope this helps!
Gary
Thanks for the link. I looked at what he was trying to do, and he did put a lot of work into it, but even so it doesn't highlight C# code properly.
Live syntax highlighting would be nice to have, but first I just want to be able to parse C++ files (C# later on) and generate HTML from them.
I was hoping someone with a lot of regular expression experience could help.