
Thread: I would like a more efficient way to remove characters from nodes in a string of XML

  1. #1

    Thread Starter
    thetimmer (Addicted Member)
    Join Date: Jan 2014
    Location: Plano, Texas
    Posts: 243

    I would like a more efficient way to remove characters from nodes in a string of XML

    Can anyone offer a more elegant/efficient solution?

    I've written a function that cleans up an XML string. The cleaned string is eventually read into a DataSet like so:
    Code:
    favoriteDataSet.ReadXml(New System.IO.StringReader(sCleanedXML))
    So this function is set up to clean out the offending characters without removing them where they are actually valid XML characters.

    So if I send in

    <Some&Text/ToClean> blah blah blah </Some&Text/ToClean>
    I would get back

    <Some_Text_ToClean> blah blah blah </Some_Text_ToClean>
    It works but I'd really like to learn how to implement it more efficiently.

    It accepts...

    sOriginal, which is a long string of XML with tags like the example above.

    asBetween, which is a pipe-delimited string of the characters to look between when finding the offending characters (e.g. asBetween = "<|>").

    sRemovalTokens, which is a pipe-delimited string of the values to clean out (e.g. sRemovalTokens = "&|/|#").

    Here is the code...

    Code:
    Private Function CleanXMLTags(sOriginal As String, asBetween As String, sRemovalTokens As String) As String
        ' asBetween: pipe-delimited start/end delimiters, e.g. "<|>"
        ' sRemovalTokens: pipe-delimited tokens to replace with "_", e.g. "&|/|#"
        Try
            Dim aBookEnds() As String = asBetween.Split("|"c)
            Dim aRemovalTokens() As String = sRemovalTokens.Split("|"c)
            ' Need exactly one start and one end delimiter; otherwise leave the input alone.
            If aBookEnds.Length <> 2 Then Return sOriginal
            Dim iStartIndex As Integer = 0
            Dim iEndIndex As Integer = 0
            Dim sCurToken As String = String.Empty
            Dim sReplaceToken As String = String.Empty
            For i As Integer = 0 To aRemovalTokens.Length - 1
                ' Skip empty or whitespace-only tokens.
                If aRemovalTokens(i).Trim().Length = 0 Then Continue For
                ' Rescan the whole string for this token, one "<...>" tag at a time.
                iStartIndex = 0
                iEndIndex = 0
                While iStartIndex > -1
                    iStartIndex = sOriginal.IndexOf(aBookEnds(0), iEndIndex)
                    iEndIndex = sOriginal.IndexOf(aBookEnds(1), iEndIndex + 1)
                    If iStartIndex < 0 OrElse iEndIndex < 0 Then
                        Exit While
                    End If
                    sCurToken = sOriginal.Substring(iStartIndex, iEndIndex - iStartIndex + 1)
                    ' Drop the "</" and "/>" markers so their "/" is never replaced.
                    If sCurToken.StartsWith("</") Then
                        sCurToken = sCurToken.Substring(2)
                    End If
                    If sCurToken.EndsWith("/>") Then
                        sCurToken = sCurToken.Substring(0, sCurToken.Length - 2)
                    End If
                    If sCurToken.Contains(aRemovalTokens(i)) Then
                        sReplaceToken = sCurToken.Replace(aRemovalTokens(i), "_")
                        ' This Replace rewrites every occurrence in the whole string, not just this tag.
                        sOriginal = sOriginal.Replace(sCurToken, sReplaceToken)
                    End If
                End While
            Next
            Return sOriginal 'Cleaned
        Catch ex As Exception
            Return sOriginal
        End Try
    End Function
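    For comparison, here is a minimal single-pass sketch of the same cleanup (not the poster's code; CleanTagNames and its removalChars parameter are hypothetical names). It walks the string once with a StringBuilder and swaps any listed character for "_" only while inside "<...>", instead of calling Replace on the whole string for every tag and every token. It assumes "<" and ">" never occur in text or attribute values, that removalChars is a plain string of single characters such as "&/#" (not pipe-delimited), and that there are no attributes, comments or processing instructions that should be left alone.

```vbnet
Imports System.Text

Module XmlTagCleaner
    ' Hypothetical single-pass alternative to CleanXMLTags.
    ' Assumes "<" and ">" only appear as tag delimiters, and removalChars is a
    ' plain string of single characters (e.g. "&/#"), not a pipe-delimited list.
    Public Function CleanTagNames(original As String, removalChars As String) As String
        Dim sb As New StringBuilder(original.Length)
        Dim insideTag As Boolean = False

        For i As Integer = 0 To original.Length - 1
            Dim c As Char = original(i)

            If c = "<"c Then
                insideTag = True
            ElseIf c = ">"c Then
                insideTag = False
            ElseIf insideTag AndAlso removalChars.IndexOf(c) >= 0 Then
                ' Keep the "/" that is part of "</" or "/>" so closing and
                ' self-closing tags stay well-formed.
                Dim isClosingSlash As Boolean = (c = "/"c) AndAlso i > 0 AndAlso (original(i - 1) = "<"c)
                Dim isSelfClosingSlash As Boolean = (c = "/"c) AndAlso i + 1 < original.Length AndAlso (original(i + 1) = ">"c)
                If Not (isClosingSlash OrElse isSelfClosingSlash) Then
                    c = "_"c
                End If
            End If

            sb.Append(c)
        Next

        Return sb.ToString()
    End Function
End Module
```

    Called as CleanTagNames("<Some&Text/ToClean> blah blah blah </Some&Text/ToClean>", "&/#"), it returns the same <Some_Text_ToClean> blah blah blah </Some_Text_ToClean> shown in the question, while characters between tags are left untouched.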
    _____________
    Tim




  2. #2
    KGComputers (Frenzied Member)
    Join Date: Dec 2005
    Location: Cebu, PH
    Posts: 2,024

    Re: I would like a more efficient way to remove characters from nodes in a string of XML

    Here's one of my references for dealing with illegal XML characters:
    Strip illegal xml characters.
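    The linked article is not reproduced here. As a minimal sketch of the same idea, assuming .NET Framework 4.0 or later (where XmlConvert.IsXmlChar exists) and a hypothetical StripIllegalXmlChars helper, the characters XML 1.0 forbids can be dropped before the text is parsed. Note that this targets control characters in the content, which is a different problem from bad characters in tag names.

```vbnet
Imports System.Text
Imports System.Xml

Module XmlCharFilter
    ' Hypothetical helper: drops every character that XML 1.0 does not allow
    ' (e.g. control characters below U+0020 other than tab, LF and CR).
    ' XmlConvert.IsXmlChar only accepts non-surrogate chars, so valid
    ' surrogate pairs (characters above U+FFFF) are kept explicitly.
    Public Function StripIllegalXmlChars(text As String) As String
        Dim sb As New StringBuilder(text.Length)
        Dim i As Integer = 0
        While i < text.Length
            Dim c As Char = text(i)
            If XmlConvert.IsXmlChar(c) Then
                sb.Append(c)
                i += 1
            ElseIf i + 1 < text.Length AndAlso Char.IsSurrogatePair(c, text(i + 1)) Then
                sb.Append(c).Append(text(i + 1))
                i += 2
            Else
                ' Illegal character or lone surrogate: skip it.
                i += 1
            End If
        End While
        Return sb.ToString()
    End Function
End Module
```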

    Another option would be to use a Regex to clean the XML nodes.
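    A minimal Regex sketch of that option, assuming tags have no attributes and that "<" and ">" never appear in text (CleanTagNamesRegex is a hypothetical name). The pattern captures an optional leading "/", the tag name, and an optional trailing "/", and only the name group is rewritten, so "</" and "/>" survive. It is not guaranteed to be faster than a hand-written loop; its main advantage is brevity.

```vbnet
Imports System.Text.RegularExpressions

Module XmlTagRegexCleaner
    ' Group 1: optional "/" right after "<" (closing tag).
    ' Group 2: the tag name (lazy, so a trailing "/" is left for group 3).
    ' Group 3: optional "/" right before ">" (self-closing tag).
    Private ReadOnly TagPattern As New Regex("<(/?)([^<>]+?)(/?)>")

    ' Hypothetical helper: replaces each character of removalChars (e.g. "&/#")
    ' with "_" inside tag names only; text between tags is not touched.
    Public Function CleanTagNamesRegex(original As String, removalChars As String) As String
        Return TagPattern.Replace(original,
            Function(m As Match) As String
                Dim name As String = m.Groups(2).Value
                For Each c As Char In removalChars
                    name = name.Replace(c, "_"c)
                Next
                Return "<" & m.Groups(1).Value & name & m.Groups(3).Value & ">"
            End Function)
    End Function
End Module
```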

  3. #3
    dbasnett (Powered By Medtronic)
    Join Date: Dec 2007
    Location: Jefferson City, MO
    Posts: 9,897

    Re: I would like a more efficient way to remove characters from nodes in a string of XML

    Are you creating this, <Some&Text/ToClean> blah blah blah </Some&Text/ToClean>, or is it coming from somewhere else? What is it supposed to be when it is in this format?
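    If the program producing the XML is under your control, one option (a sketch, not necessarily what the poster needs; BuildElement and OriginalName are hypothetical helpers) is to encode element names when they are created with XmlConvert.EncodeName, which turns characters that are invalid in a name into _xHHHH_ escapes, and to recover the original name later with XmlConvert.DecodeName.

```vbnet
Imports System.Xml

Module XmlNameEncoding
    ' Hypothetical helper: builds one element whose name is made XML-safe with
    ' XmlConvert.EncodeName and whose text is escaped automatically by InnerText.
    Public Function BuildElement(name As String, text As String) As String
        Dim doc As New XmlDocument()
        Dim element As XmlElement = doc.CreateElement(XmlConvert.EncodeName(name))
        element.InnerText = text
        Return element.OuterXml
    End Function

    ' Hypothetical helper: reverses the encoding when the original name is needed again.
    Public Function OriginalName(encodedName As String) As String
        Return XmlConvert.DecodeName(encodedName)
    End Function
End Module
```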
