-
Nov 29th, 2010, 12:26 PM
#1
Thread Starter
Hyperactive Member
XML cleanup with missing tags
I am trying to format my XML entries so that I can use the XmlTextReader without getting errors. I add a default header and footer when I notice an entry has no opening or closing tag, remove illegal characters, and check for Unicode. Even so, an entry always slips through and gives the error "Data at the root level is invalid". When I check, that entry either got past the cleaning process or has an unmatched tag somewhere. Right now I use
Code:
Dim stringSplitter() As String = {"</entry>"}
' split the file content based on the closing entry tag
sampleResults = _html.Split(stringSplitter, StringSplitOptions.RemoveEmptyEntries)
to split my XML into individual entries before I start the cleanup process. Here are my default header and footer:
Code:
' Namespace attribute placed on the default <entry> opening tag
Private defaultNameSpace As String = "xmlns=""http://www.w3.org/2005/Atom"""
Private header As String = "<?xml version=""1.0"" encoding=""utf-8""?>" & vbNewLine & "<entry " & defaultNameSpace & ">"
Private footer As String = "</entry>"
Is there any tool in the .NET Framework that can detect and clean up unmatched tags so that I can get this to work? It works for the most part with a whole bunch of If statements, but I am wondering if there is something more practical to use.
-- Please rate me if I am helpful --
-
Nov 29th, 2010, 01:13 PM
#2
Re: XML cleanup with missing tags
What's the source of the XML? Are you generating it yourself? If you are, then don't build XML using strings like that... it'll just cause you heartache (as you're finding out). Instead use the System.Xml namespace and create an XmlDocument. From there you can create nodes, attributes and so on. Once you have your XmlDocument built and all the nodes added, you can use the .OuterXml property to get the XML itself, or use .Save/.Load to write to and read from files (.LoadXml parses a string).
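A minimal sketch of that approach, assuming an Atom-style entry like the one in the first post. The element content and the commented-out file name are made up for illustration:

```vb
Imports System
Imports System.Xml

Module BuildWithXmlDocument
    Sub Main()
        Dim doc As New XmlDocument()
        ' Declaration node: <?xml version="1.0" encoding="utf-8"?>
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", "utf-8", Nothing))

        ' Root <entry> element in the Atom namespace
        Dim atomNs As String = "http://www.w3.org/2005/Atom"
        Dim entry As XmlElement = doc.CreateElement("entry", atomNs)
        doc.AppendChild(entry)

        ' Child element; InnerText escapes & and < for you
        Dim title As XmlElement = doc.CreateElement("title", atomNs)
        title.InnerText = "Fish & chips <today>"
        entry.AppendChild(title)

        ' OuterXml returns the whole document as a string
        Console.WriteLine(doc.OuterXml)
        ' doc.Save("entry.xml")   ' hypothetical file name
    End Sub
End Module
```

Because the document is built from nodes rather than concatenated strings, the tags always match and the special characters in the text are escaped on output.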
-tg
-
Nov 29th, 2010, 01:49 PM
#3
Re: XML cleanup with missing tags
As per techgnome's suggestion, check out the examples at http://msdn.microsoft.com/en-us/vbasic/bb688087.aspx
Here is a quick/simple look at how you can easily construct a document using LINQ.
Code:
Public Sub xDemo()
    Dim Fruits() As String = {"Apple", "Peach", "Orange", "Grape"}
    Dim Prices() As Double = {1.23, 2.0, 1.11, 0.87}
    Dim doc = <?xml version="1.0" encoding="UTF-16" standalone="yes"?><foo/>
    For x As Integer = 0 To Fruits.Count - 1
        doc...<foo>(0).Add( _
            <Entry>
                <Key><%= x + 1 %></Key>
                <Value><%= Fruits(x) %></Value>
                <Price><%= Prices(x) %></Price>
            </Entry>)
    Next
    Console.WriteLine(doc.ToString)
End Sub
Output
Code:
<?xml version="1.0" encoding="utf-16" standalone="yes"?>
<foo>
  <Entry>
    <Key>1</Key>
    <Value>Apple</Value>
    <Price>1.23</Price>
  </Entry>
  <Entry>
    <Key>2</Key>
    <Value>Peach</Value>
    <Price>2</Price>
  </Entry>
  <Entry>
    <Key>3</Key>
    <Value>Orange</Value>
    <Price>1.11</Price>
  </Entry>
  <Entry>
    <Key>4</Key>
    <Value>Grape</Value>
    <Price>0.87</Price>
  </Entry>
</foo>
-
Nov 29th, 2010, 02:00 PM
#4
Thread Starter
Hyperactive Member
Re: XML cleanup with missing tags
Reply to techgnome:
The XML data is pulled unformatted from a site into text files, after which I turn the data into valid XML entries. When I try XmlDocument.LoadXml(), I get the error:
Name cannot begin with the '<' character, hexadecimal value 0x3C.
So I fall back to text parsing when I cannot load an XML document, and I clean out illegal characters before I try to load it.
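For the character-cleaning step, a minimal sketch could look like the following. It assumes .NET 4.0 or later, where XmlConvert.IsXmlChar is available. RemoveInvalidXmlChars is a hypothetical helper name, not a framework method. Note that this only strips characters XML 1.0 does not allow; it does nothing about a stray '<' or an unmatched tag.

```vb
Imports System
Imports System.Text
Imports System.Xml

Module CharCleanup
    ' Hypothetical helper: drops every character that XML 1.0 does not allow.
    Function RemoveInvalidXmlChars(ByVal text As String) As String
        Dim sb As New StringBuilder(text.Length)
        Dim i As Integer = 0
        While i < text.Length
            Dim c As Char = text(i)
            If i + 1 < text.Length AndAlso Char.IsSurrogatePair(c, text(i + 1)) Then
                ' Keep a valid surrogate pair (characters above U+FFFF) together
                sb.Append(c).Append(text(i + 1))
                i += 2
            ElseIf XmlConvert.IsXmlChar(c) Then
                sb.Append(c)
                i += 1
            Else
                ' Control characters such as 0x00 or 0x0B are skipped
                i += 1
            End If
        End While
        Return sb.ToString()
    End Function

    Sub Main()
        Dim dirty As String = "<title>bad" & ChrW(0) & ChrW(&HB) & "char</title>"
        Console.WriteLine(RemoveInvalidXmlChars(dirty))   ' prints <title>badchar</title>
    End Sub
End Module
```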
-
Nov 29th, 2010, 03:01 PM
#5
Re: XML cleanup with missing tags
OK, so it's already in XML format... but there seems to be a problem with it? What does the data look like? From the sound of the error, it sounds like there's something wrong with the data to begin with, and it really should be fixed at the source.
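One way to see exactly where a given entry goes wrong is to let the parser report it. As far as I know the framework only detects malformed XML and does not repair it, so this is a sketch of the detection side only. IsWellFormedEntry is a hypothetical helper name and the sample strings are made up:

```vb
Imports System
Imports System.Xml
Imports System.Xml.Linq

Module EntryCheck
    ' Hypothetical helper: True when the text parses as one well-formed XML element.
    Function IsWellFormedEntry(ByVal entryXml As String, ByRef problem As String) As Boolean
        Try
            XElement.Parse(entryXml)
            problem = Nothing
            Return True
        Catch ex As XmlException
            ' XmlException carries the line and position of the first problem found
            problem = String.Format("Line {0}, position {1}: {2}", _
                                    ex.LineNumber, ex.LinePosition, ex.Message)
            Return False
        End Try
    End Function

    Sub Main()
        Dim problem As String = Nothing
        Console.WriteLine(IsWellFormedEntry("<entry><title>ok</title></entry>", problem))
        Console.WriteLine(IsWellFormedEntry("<entry><title>broken</entry>", problem))
        Console.WriteLine(problem)
    End Sub
End Module
```

The first call should report True and the second False, with the message pointing at the mismatched tag. Logging that message per entry shows which entries are getting past the cleanup and why.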
-tg
-
Nov 29th, 2010, 03:19 PM
#6
Thread Starter
Hyperactive Member
Re: XML cleanup with missing tags
The data comes from an API response stream, and I capture it in chunks because it is too large to just ReadToEnd(). I append the data to a StringBuilder and write it to a text file once it reaches a certain size. So the data in the text files is XML, but each file on its own is not a well-formed document.
For example, at times a file may start with
Code:
erb xmlns:activity="http........</entry></results>
instead of beginning as follows:
Code:
<results data="1" publisher="xxxxxr" endpoint="Notices" refreshURL="https://xxxxxxxx.com.activities.xml?max=10000">
<entry xmlns:.........
So when I read the text files I have two methods: I first try to load the file as an XML document after cleaning illegal characters and adding a default header and footer, and if that fails I fall back to text parsing. The trouble starts when I need to parse these entries and pull out values: using regex is tedious, whereas the XmlTextReader and its ReadElementContent methods are very convenient. The problem is that a typical file does not contain the data in an acceptable format, so I need to work out which opening tag is missing, or which tag was left open, and close it.
I hope this makes sense, since I don't know any other way to handle it. The idea is to capture the response continuously, handle the file parsing on a different thread, and then recreate valid XML files to deliver.
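One possible way around the cut-off entries, sketched under some assumptions: keep the incomplete tail in the buffer instead of writing it to a file, and only hand complete entries on to the parser. TakeCompleteEntries is a hypothetical helper and the chunk strings are made up. The sketch also assumes entries are not nested, that no other element name starts with "entry", and that each entry declares the namespace prefixes it uses (prefixes declared only on the results root would need to be added back before parsing).

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports System.Xml
Imports System.Xml.Linq

Module EntryExtractor
    ' Hypothetical helper: removes every complete <entry>...</entry> block from the
    ' buffer and returns them; an incomplete tail stays in the buffer for the next chunk.
    Function TakeCompleteEntries(ByVal buffer As StringBuilder) As List(Of String)
        Const CloseTag As String = "</entry>"
        Dim entries As New List(Of String)
        Dim text As String = buffer.ToString()
        Dim consumed As Integer = 0

        While True
            Dim openAt As Integer = text.IndexOf("<entry", consumed, StringComparison.Ordinal)
            If openAt < 0 Then Exit While
            Dim closeAt As Integer = text.IndexOf(CloseTag, openAt, StringComparison.Ordinal)
            If closeAt < 0 Then Exit While   ' entry not finished yet, wait for more data
            Dim endAt As Integer = closeAt + CloseTag.Length
            entries.Add(text.Substring(openAt, endAt - openAt))
            consumed = endAt
        End While

        buffer.Remove(0, consumed)
        Return entries
    End Function

    Sub Main()
        Dim buffer As New StringBuilder()
        ' Two chunks that split an entry in the middle of its closing tag
        Dim chunks() As String = {"<results><entry><id>1</id></en", _
                                  "try><entry><id>2</id></entry><entry><id>3"}
        For Each chunk As String In chunks
            buffer.Append(chunk)
            For Each entryXml As String In TakeCompleteEntries(buffer)
                Try
                    Dim entry As XElement = XElement.Parse(entryXml)
                    Console.WriteLine(entry.Element("id").Value)
                Catch ex As XmlException
                    Console.WriteLine("Skipped malformed entry: " & ex.Message)
                End Try
            Next
        Next
    End Sub
End Module
```

With the sample chunks this prints 1 and 2, and the unfinished third entry stays in the buffer until more data arrives. Each complete entry can then be read with the XML classes instead of regex, and a malformed one is reported rather than breaking the whole file.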