-
Dec 3rd, 2009, 02:36 PM
#1
Thread Starter
Fanatic Member
[RESOLVED] XML parsing issue (LINQ)
Hi guys,
My problem is that I have some non-standard XML that I need to tidy before it can be parsed with XElement.Parse(text). The source looks like this (just a sample; the entire file contains many items):
Code:
−<item>
<id>3050</id>
<name>Change of Heart</name>
<plural_name>Changes of Heart</plural_name>
−
<image_url>
{some encoded url that i removed}
</image_url>
<color>Black</color>
<attack>10</attack>
<defense>20</defense>
<favor_points>0</favor_points>
<cost>100</cost>
<num_owned>1</num_owned>
<item_limit>0</item_limit>
<enhancements/>
−
<details>
%3Cspan%20class%3D%27itemTitle%27%3E%20Change%20of%20Heart%3C%2Fspan%3E%3Cbr%3E
</details>
<category>OBJECT</category>
<sub_category>extras</sub_category>
</item>
Notice the "−" characters. Initially I tried to replace these with "" using string.Replace("-", ""), but the resulting string is the same.
I then looked into these dashes and found that if I URL-encode the string, each dash consists of 3 chars, i.e. (in % hexadecimal):
%E2%88%92
I converted these to decimal and created a string of these chars as follows:
Dim glut As String = Chr(227) & Chr(136) & Chr(146)
I then replaced any occurrence of this string in the text, and again the dashes are still there. Any ideas how I can remove these artifacts from my XML so I can parse it correctly?
The original XML is beyond my control.
Here is the relevant code I have used:
Code:
Option Strict On
Option Explicit On
Imports System
Imports System.Web
Imports System.Text
Public Class Form1
    Private Sub btn_populate_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btn_populate.Click
        Dim content As String = TextBox1.Text
        Dim glut As String = Chr(227) & Chr(136) & Chr(146)
        content = content.Replace(glut, "")
        TextBox1.Text = HttpUtility.UrlDecode(content)
    End Sub
    Private Sub btn_encode_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btn_encode.Click
        Dim content As String = TextBox1.Text
        TextBox1.Text = HttpUtility.UrlEncode(content)
    End Sub
End Class
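A note on why neither replacement has any effect, as far as can be told from the details above: %E2%88%92 is the UTF-8 encoding of one character, U+2212 (MINUS SIGN), so the text box holds a single Char there rather than three. Replace("-", "") targets the ASCII hyphen (U+002D), which is a different character, and Chr() builds characters from the ANSI code page, so the three-Chr string never occurs in the text (hex E2 is also 226 in decimal, not 227). A minimal sketch of removing the marker directly, assuming the pasted text really contains U+2212; the module name and sample string are made up for illustration:

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module MinusSignDemo
    Sub Main()
        ' U+2212 MINUS SIGN: three bytes in UTF-8 (E2 88 92), but one Char in a .NET String.
        Dim minusSign As Char = ChrW(&H2212)

        ' Stand-in for text pasted from the browser's collapsible XML view.
        Dim content As String = minusSign & "<item>" & vbLf & "<id>3050</id>" & vbLf & "</item>"

        ' Strip the marker; ordinary ASCII hyphens (U+002D) are left untouched.
        Dim cleaned As String = content.Replace(minusSign.ToString(), "")

        Console.WriteLine(cleaned)
    End Sub
End Module
```

Matching on the exact code point keeps hyphens that legitimately appear inside element values.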
-
Dec 3rd, 2009, 03:08 PM
#2
Re: XML parsing issue (LINQ)
Where are you getting this data from? The only time I have seen the dash as shown here is when someone copied the XML from viewing it in a browser. If this is the case it might be prudent to see if you can get the data another way.
-
Dec 3rd, 2009, 03:28 PM
#3
Thread Starter
Fanatic Member
Re: XML parsing issue (LINQ)
Hi Kevin, thanks for the response. You are correct: this XML was indeed copied from a browser. I intend to get the data directly from the URL, but I'm still learning WebRequest/WebResponse and thought I could at least produce some workable data and functions for later down the road.
My thinking, however, is that I want to produce a simple app where you can copy and paste XML and have it parsed and displayed in a DataGrid. For this to work I would need to remove this artifact, and any others, and ensure the XML is correct.
-
Dec 3rd, 2009, 04:00 PM
#4
Re: XML parsing issue (LINQ)
Here is some really bare bones code to read an XML file from the web.
The file in the code below exists because I just placed it there, but I will remove it before Monday of next week. There are countless methods to read the file into a memory stream, but I wanted to keep the code simple. The downloaded file will be placed in the application folder; if you want it somewhere else, simply tweak the location in both the download and the XDocument.Load call. Hope this helps.
Code:
' Download the file once; later runs reuse the local copy.
If Not IO.File.Exists("Customers.xml") Then
    Dim URL As String = "http://www.jimjacobe.com/Customers.xml"
    Using webClient As New System.Net.WebClient
        ' DownloadFile is synchronous, so no DoEvents call is needed afterwards.
        webClient.DownloadFile(URL, "Customers.xml")
    End Using
End If
Dim mCustomersDocument As XDocument = XDocument.Load("Customers.xml")
' Project each <Customer> element into an anonymous type for the grid.
Dim Customers = _
    ( _
        From customer In mCustomersDocument...<Customer> _
        Select New With _
        { _
            .Identifier = customer.<CustomerID>.Value, _
            .CompanyName = customer.<CompanyName>.Value _
        } _
    ).ToList
' ToList never returns Nothing, so test for an empty result instead.
If Customers.Count > 0 Then
    DataGridView1.DataSource = Customers
Else
    MsgBox("Error")
End If
-
Dec 3rd, 2009, 04:16 PM
#5
Thread Starter
Fanatic Member
Re: XML parsing issue (LINQ)
Thanks for that, kevininstructor. I was trying to load the XML directly into the app and was getting bad requests. I will look at this idea and start another project to play around with it. I like having the XML local, as I only need to get it every few days, and this will make things run quicker when evaluating the data on the days in between.
One of these days I'm going to go over all my projects from the last 5 years and put all the useful code into a few classes <that should be my sig lol
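The "fetch it every few days, work from the local copy in between" idea could be sketched roughly as follows. The three-day interval and the NeedsRefresh helper are assumptions made for illustration; the URL and file name are the ones from post #4, and that file may no longer exist.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Xml.Linq

Module CachedDownload
    ' Hypothetical helper: True when the local copy is missing or older than maxAge.
    Function NeedsRefresh(ByVal fileName As String, ByVal maxAge As TimeSpan) As Boolean
        If Not File.Exists(fileName) Then Return True
        Return DateTime.UtcNow - File.GetLastWriteTimeUtc(fileName) > maxAge
    End Function

    Sub Main()
        Dim localFile As String = "Customers.xml"
        Dim url As String = "http://www.jimjacobe.com/Customers.xml"

        ' Assumed policy: download again only when the copy is over three days old.
        If NeedsRefresh(localFile, TimeSpan.FromDays(3)) Then
            Using client As New WebClient()
                client.DownloadFile(url, localFile)
            End Using
        End If

        ' All queries then run against the local file.
        Dim doc As XDocument = XDocument.Load(localFile)
        Console.WriteLine(doc.Root.Name)
    End Sub
End Module
```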
-
Dec 3rd, 2009, 04:25 PM
#6
Re: XML parsing issue (LINQ)
Food for thought in regards to stockpiling code: consider placing code that is used a lot in a common area, and then link to the code rather than copying it directly into projects. For instance, when adding an existing item, click the down arrow on the Add button and select Add As Link.
-
Dec 3rd, 2009, 04:31 PM
#7
Re: XML parsing issue (LINQ)
Instead of copying the XML from the browser, try right-click -> View Source, copy THAT XML, and use it. The source is going to be closer to what you will get back from a request. Using what's displayed in the browser may cause problems other than the "-" problem, because things are rendered: for instance, an en dash will show up as – in the viewed data, but the raw data may carry it as an entity such as &ndash;. Just something to think about.
-tg
-
Dec 3rd, 2009, 06:16 PM
#8
Thread Starter
Fanatic Member
Re: XML parsing issue (LINQ)
 Originally Posted by kevininstructor
Food for thought in regards to stockpiling code. Consider placing code that is used a lot into a common area and then look at linking the code rather than directly copying into projects. For instance, when adding an existing item click the down arrow on the add button and select link.
I'll look into that, for sure. It's high time I started to manage my code better.
 Originally Posted by techgnome
Instead of copying the XML from the browser, try right-click -> View Source, copy THAT XML, and use it. The source is going to be closer to what you will get back from a request. Using what's displayed in the browser may cause problems other than the "-" problem, because things are rendered: for instance, an en dash will show up as – in the viewed data, but the raw data may carry it as an entity such as &ndash;. Just something to think about.
Now I have no idea why I didn't think of that. I think it's safe to say this is resolved. Thanks guys, got lots to think about over the weekend now :s The gf won't be happy haha, I get so enthusiastic about writing code and she gets that glazed-over look and says uh huh a lot lol
-
Dec 3rd, 2009, 08:06 PM
#9
Thread Starter
Fanatic Member
Re: [RESOLVED] XML parsing issue (LINQ)
Just 2 additional points. Firstly, how can I URL-decode the values within the elements? (I found that doing it prior to building the query caused an error, as one of the blocks in the code is HTML.) I could remove these blocks as I don't need them; is that possible? Secondly, my XML file has some unique elements at the end; how do I access these?
e.g.
<xml>
<item>
{various elements}
</item>
<item>
{various elements}
</item>
... {lots of item blocks} ...
<item>
{various elements}
</item>
<user_id>202342</user_id>
<user_name>megalith</user_name>
</xml>
How do I get the contents of user_id and user_name?
I tried Dim user_id As String = element.<user_id>.Value but it returns null.
-
Dec 4th, 2009, 10:51 AM
#10
Re: [RESOLVED] XML parsing issue (LINQ)
Here is an example of getting unique data, shown by retrieving the CopyRight value. Note that when I get the data for Industry I set a default value in case the element is not found. The same can be done for CopyRight; I simply wanted you to see two different methods.
Code:
Private Sub Demo1()
    ' Build the sample document with an XML literal.
    Dim Document As XDocument = _
        <?xml version="1.0" encoding="utf-8"?>
        <UserName>
            <Settings>
                <Industry Default="3"/>
                <EntryIndex Default="Manufacturer, Equipment"/>
                <RowPosition Default="0"/>
                <MainWindowShowSideMenu Default="True"/>
                <RememberLastRow Default="False"/>
            </Settings>
            <CopyRight>Your CompanyName</CopyRight>
        </UserName>
    ' Read the Default attribute, falling back to "1" when <Industry> is missing.
    Dim Industry As String = (From Index In Document...<Industry> _
                              Select Index.@Default).DefaultIfEmpty("1").FirstOrDefault
    ' Read the element value; the result is Nothing when <CopyRight> is missing.
    Dim Right As String = (From item In Document...<CopyRight> _
                           Select item.Value).FirstOrDefault
    Console.WriteLine("Industry [{0}]", Industry)
    Console.WriteLine("CopyRight [{0}]", Right)
    ' Grab the element itself to change its value, then save the document.
    Dim ChangeRight = (From item In Document...<CopyRight> _
                       Select item).FirstOrDefault
    ChangeRight.Value = "Another Company"
    Console.WriteLine(Document.ToString)
    Document.Save("UserRights.xml")
End Sub
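Applied to the structure from post #9, the same approach might look like the sketch below. It assumes user_id and user_name are direct children of the root element, which would explain the empty result if "element" in the earlier attempt was an item or an XDocument. The sample string is a cut-down stand-in for the real file, and Uri.UnescapeDataString is used here in place of HttpUtility.UrlDecode only so the example does not need a System.Web reference.

```vbnet
Imports System
Imports System.Linq
Imports System.Xml.Linq

Module InventoryDemo
    Sub Main()
        ' Cut-down stand-in for the real file: one item plus the trailing user elements.
        Dim xmlText As String = _
            "<xml>" & _
            "<item><id>3050</id><name>Change of Heart</name>" & _
            "<details>%3Cspan%20class%3D%27itemTitle%27%3E%20Change%20of%20Heart%3C%2Fspan%3E%3Cbr%3E</details>" & _
            "</item>" & _
            "<user_id>202342</user_id><user_name>megalith</user_name>" & _
            "</xml>"

        Dim root As XElement = XElement.Parse(xmlText)

        ' The user elements sit directly under the root, not inside an <item>.
        Dim userId As String = root.<user_id>.Value
        Dim userName As String = root.<user_name>.Value
        Console.WriteLine("{0} / {1}", userId, userName)

        ' Decode each value after parsing, so the decoded HTML never reaches the XML parser.
        For Each item As XElement In root.<item>
            Dim details As String = Uri.UnescapeDataString(item.<details>.Value)
            Console.WriteLine("{0}: {1}", item.<name>.Value, details)
        Next

        ' Alternatively, drop the <details> elements entirely if they are not needed.
        root...<details>.Remove()
        Console.WriteLine(root.ToString())
    End Sub
End Module
```

Decoding the whole document before parsing is what caused the earlier error: the decoded details block contains raw HTML such as an unclosed <br>, which is not well-formed XML.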
-
Dec 7th, 2009, 02:56 PM
#11
Thread Starter
Fanatic Member
Re: [RESOLVED] XML parsing issue (LINQ)
Ah, that's great, kevininstructor. I need to be thinking queries, not data. LINQ is a powerful thing; the more I learn, the more I see how powerful it is. In a way LINQ is as revolutionary to Basic as VB was to QBasic.
-
Dec 7th, 2009, 03:48 PM
#12
Re: [RESOLVED] XML parsing issue (LINQ)
When time permits, check out the LINQ samples here:
http://msdn.microsoft.com/en-us/vbasic/bb688088.aspx