I would like a more efficient way to remove characters from nodes in a string of XML
Can anyone offer a more elegant/efficient solution?
I've written a function that cleans out an XML string. The string of XML eventually gets read into a dataset like so:
Code:
favoriteDataSet.ReadXml(New System.IO.StringReader(sCleanedXML))
So this function is set up to clean the offending characters out of the tags without removing them where they are actually valid XML characters (such as the / in a closing tag).
So if I sent in
<Some&Text/ToClean> blah blah blah </Some&Text/ToClean>
I would get back
<Some_Text_ToClean> blah blah blah </Some_Text_ToClean>
It works, but I'd really like to learn how to implement it more efficiently.
It accepts...
sOriginal, which is a long string of XML with tags like blah blah blah
asBetween, which is a pipe-delimited string of the characters to look between to find the offending characters (i.e. asBetween = "<|>")
sRemovalTokens, which is a pipe-delimited string of values to clean out (i.e. sRemovalTokens = "&|/|#")
Here is the code...
Code:
Private Function CleanXMLTags(sOriginal As String, asBetween As String, sRemovalTokens As String) As String
    Try
        ' "|"c is a Char literal, so Split also compiles with Option Strict On
        Dim aBookEnds() As String = asBetween.Split("|"c)
        Dim aRemovalTokens() As String = sRemovalTokens.Split("|"c)
        Dim iStartIndex As Integer = 0
        Dim iEndIndex As Integer = 0
        Dim sCurToken As String = String.Empty
        Dim sReplaceToken As String = String.Empty
        ' One full pass over the XML per removal token
        For i As Integer = 0 To aRemovalTokens.Length - 1
            If aBookEnds.Length = 2 AndAlso Not String.Equals(aRemovalTokens(i).Trim, "", StringComparison.CurrentCultureIgnoreCase) Then
                iStartIndex = 0
                iEndIndex = 0
                While iStartIndex > -1
                    ' Find the next opening and closing bookend
                    iStartIndex = sOriginal.IndexOf(aBookEnds(0), iEndIndex)
                    iEndIndex = sOriginal.IndexOf(aBookEnds(1), iEndIndex + 1)
                    If iStartIndex < 0 OrElse iEndIndex < 0 Then
                        Exit While
                    End If
                    sCurToken = sOriginal.Substring(iStartIndex, iEndIndex - iStartIndex + 1)
                    ' Keep the valid "/" of a closing tag or a self-closing tag
                    If sCurToken.StartsWith("</") Then
                        sCurToken = sCurToken.Substring(2)
                    End If
                    If sCurToken.EndsWith("/>") Then
                        sCurToken = sCurToken.Substring(0, sCurToken.Length - 2)
                    End If
                    If sCurToken.Contains(aRemovalTokens(i)) Then
                        sReplaceToken = sCurToken.Replace(aRemovalTokens(i), "_")
                        ' Replaces every occurrence of this tag text in the whole string
                        sOriginal = sOriginal.Replace(sCurToken, sReplaceToken)
                    End If
                End While
            End If
        Next
        Return sOriginal 'Cleaned
    Catch ex As Exception
        ' On any error, return the string as it stands
        Return sOriginal
    End Try
End Function
Re: I would like a more efficient way to remove characters from nodes in a string of
Here's one of my references for dealing with illegal XML characters.
Strip illegal xml characters.
Another option would be to include Regex for cleaning xml nodes.
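As a rough sketch of the Regex idea (not tested against your real data): a single pattern can find each tag and a MatchEvaluator can swap the unwanted characters for "_" in one pass over the string, instead of one pass per token. The names XmlTagCleaner, TagPattern, CleanTagNames and badChars are made up for this illustration. Like the original function, it treats everything between < and > as the tag, so a "/" or "&" inside an attribute value would be replaced too.

```vbnet
Imports System.Text.RegularExpressions

Module XmlTagCleaner
    ' Group 1: "<" or "</"   Group 2: tag contents   Group 3: ">" or "/>"
    ' The lazy "+?" lets a trailing "/" of a self-closing tag fall into group 3.
    Private ReadOnly TagPattern As New Regex("(</?)([^<>]+?)(/?>)", RegexOptions.Compiled)

    ' Hypothetical helper: replaces each character in badChars with "_",
    ' but only inside tags; text between tags is left untouched.
    Public Function CleanTagNames(xml As String, ParamArray badChars As Char()) As String
        Return TagPattern.Replace(xml,
            Function(m As Match)
                Dim contents As String = m.Groups(2).Value
                For Each c As Char In badChars
                    contents = contents.Replace(c, "_"c)
                Next
                ' Put the untouched delimiters back around the cleaned contents
                Return m.Groups(1).Value & contents & m.Groups(3).Value
            End Function)
    End Function
End Module
```

Usage would look something like this, with the same characters as in your sRemovalTokens example:

sCleanedXML = CleanTagNames(sOriginal, "&"c, "/"c, "#"c)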
Re: I would like a more efficient way to remove characters from nodes in a string of
Are you creating this, <Some&Text/ToClean> blah blah blah </Some&Text/ToClean>, or is it coming from somewhere else? What is it supposed to be when it is in this format?