VS 2010 I would like a more efficient way to remove characters from nodes in a string of XML


Can anyone offer a more elegant/efficient solution?

I've written a function that cleans out an XML string. The string of XML eventually gets read into a dataset like so

Code:

favoriteDataSet.ReadXml(New System.IO.StringReader(sCleanedXML))

So this function is set up to clean the offending characters out of the tags, without removing them where they are actually valid XML characters.

So if I sent in

<Some&Text/ToClean> blah blah blah </Some&Text/ToClean>
I would get back

<Some_Text_ToClean> blah blah blah </Some_Text_ToClean>
It works but I'd really like to learn how to implement it more efficiently.

It accepts...

sOriginal which is a long string of xml with tags like blah blah blah

asBetween, which is a pipe-delimited string of the characters to look between to find the offending characters (e.g. asBetween = "<|>").

sRemovalTokens, which is a pipe-delimited string of the values to clean out (e.g. sRemovalTokens = "&|/|#").

Here is the code...

Code:

Private Function CleanXMLTags(sOriginal As String, asBetween As String, sRemovalTokens As String) As String
    Try
        Dim aBookEnds() As String = asBetween.Split("|")
        Dim aRemovalTokens() As String = sRemovalTokens.Split("|")
        Dim iStartIndex As Integer = 0
        Dim iEndIndex As Integer = 0
        Dim sCurToken As String = String.Empty
        Dim sReplaceToken As String = String.Empty

        For i As Integer = 0 To aRemovalTokens.Length - 1
            If aBookEnds.Length = 2 AndAlso Not String.Equals(aRemovalTokens(i).Trim, "", StringComparison.CurrentCultureIgnoreCase) Then
                iStartIndex = 0
                iEndIndex = 0

                While iStartIndex > -1
                    iStartIndex = sOriginal.IndexOf(aBookEnds(0), iEndIndex)
                    iEndIndex = sOriginal.IndexOf(aBookEnds(1), iEndIndex + 1)

                    If iStartIndex < 0 OrElse iEndIndex < 0 Then
                        Exit While
                    End If

                    sCurToken = sOriginal.ToString.Substring(iStartIndex, iEndIndex - iStartIndex + 1)

                    If sCurToken.ToString.StartsWith("</") Then
                        sCurToken = sCurToken.Substring(2)
                    End If

                    If sCurToken.ToString.EndsWith("/>") Then
                        sCurToken = sCurToken.Substring(0, sCurToken.Length - 2)
                    End If

                    If sCurToken.Contains(aRemovalTokens(i)) Then
                        sReplaceToken = sCurToken.Replace(aRemovalTokens(i), "_")
                        sOriginal = sOriginal.Replace(sCurToken, sReplaceToken)
                    End If
                End While
            End If
        Next

        Return sOriginal 'Cleaned
    Catch ex As Exception
        Return sOriginal
    End Try
End Function

Re: I would like a more efficient way to remove characters from nodes in a string of

Here's one of my references for dealing with illegal XML characters.
Strip illegal xml characters.

Another option would be to use Regex to clean the XML nodes.
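To make the Regex idea concrete, here is a minimal sketch, not a drop-in replacement. It walks the string once and rewrites only the text between "<" and ">", instead of looping over the whole string once per removal token. The names TagCleaner and CleanXmlTagsRegex are made up for illustration, and it assumes the tags carry no attributes (a "/" or "&" inside an attribute value would be replaced too). It skips tags that start with "<?" or "<!", and keeps the "/" of closing and self-closing tags.

```vbnet
Imports System.Text.RegularExpressions

Module TagCleaner
    ' Group 1: optional leading "/" (closing tag)
    ' Group 2: the tag body, which is the only part that gets cleaned
    ' Group 3: optional trailing "/" (self-closing tag)
    ' The lookahead skips declarations and comments such as <?xml ...?> and <!-- ... -->.
    Private ReadOnly TagPattern As New Regex("<(/?)(?![?!])([^<>]+?)(/?)>", RegexOptions.Compiled)

    ' Hypothetical helper: replaces every character listed in removalChars
    ' with "_" inside tag bodies only; element content is left untouched.
    Public Function CleanXmlTagsRegex(xml As String, removalChars As String) As String
        Return TagPattern.Replace(xml,
            Function(m As Match)
                Dim body As String = m.Groups(2).Value
                For Each c As Char In removalChars
                    body = body.Replace(c, "_"c)
                Next
                Return "<" & m.Groups(1).Value & body & m.Groups(3).Value & ">"
            End Function)
    End Function
End Module
```

Called as CleanXmlTagsRegex("<Some&Text/ToClean> blah blah blah </Some&Text/ToClean>", "&/#"), this returns "<Some_Text_ToClean> blah blah blah </Some_Text_ToClean>". Note that the removal characters are passed as a plain string here ("&/#") rather than pipe delimited.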

Re: I would like a more efficient way to remove characters from nodes in a string of

Are you creating this, <Some&Text/ToClean> blah blah blah </Some&Text/ToClean>, or is it coming from somewhere else? What is it supposed to be when it is in this format?