You may well have to declare those characters as entities.
Not sure, but there may also be a character set issue in the XML declaration at the top of the file.
Since I don't know a great deal about XML, how do I do that?
Entities, what?
Have a look here:
http://www.xml.com/pub/a/2001/01/31/qanda.html
Found from doing a search on Google: entities special characters xml
Are those characters part of the standard ASCII character set?
Dave
So I need to declare them somehow? In a DOCTYPE element?
The reference you posted talks about DTDs, but I'm using a schema (XSD). How does all that hang together?
The sorts of characters I'm using are:
° = ALT+248
± = ALT+0177
² = ALT+0178
³ = ALT+0179
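For anyone following along, here is a minimal Python sketch (Python is not used anywhere in this thread, it is just a convenient way to look at bytes) showing how those four characters are numbered in Unicode and how they are stored in ISO-8859-1 versus UTF-8. The helper name `describe` is made up for illustration.

```python
# Inspect the four characters from the list above: Unicode code point,
# the single byte used by ISO-8859-1, and the bytes used by UTF-8.
CHARS = "°±²³"

def describe(ch):
    """Return (code point, Latin-1 bytes as hex, UTF-8 bytes as hex) for one character."""
    return ord(ch), ch.encode("iso-8859-1").hex(), ch.encode("utf-8").hex()

for ch in CHARS:
    code_point, latin1_hex, utf8_hex = describe(ch)
    print(f"{ch}  U+{code_point:04X}  decimal {code_point}  latin-1 {latin1_hex}  utf-8 {utf8_hex}")
```

Each character is a single byte in Latin-1 but two bytes in UTF-8, so whatever encoding the file names has to match the bytes that were actually written. Note also that the Unicode number of ° is 176; as far as I know, ALT+248 only produces ° because that keystroke goes through the old OEM code page.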
Yes, that's right: you need to declare them in a DTD (the DOCTYPE declaration).
I suggest you take a few hours out to learn about DTDs; they really are quite simple. Don't think of XML as anything other than self-describing data (in markup form).
I learnt about DTDs a while back - just starting on Schemas. Take a look here:
http://www.xml101.com/dtd/default.asp
And here:
http://lists.w3.org/Archives/Public/...1Jul/0057.html
It would imply you can use a combination of schema and DTD.
I imagine this is the way you will accomplish what you are trying to do.
Thanks for your help so far, Dave. I have one more thing I'm not sure about.
It isn't practical for me to prevent the '³' characters from appearing in my XML file. From what I've read, entities allow things to be recognised as '&something;', but that wouldn't be the case in my situation. I want the character itself to be recognised.
Am I understanding this correctly?
The alternative is to remove the validation of my documents entirely, which I'm not too keen on...
Entity declarations are a kind of variable, so you can reference any string (it could be a standard footer or disclaimer) with another, much shorter or more convenient one, e.g.
<?xml version="1.0"?>
<!-- Internal DTD Subset-->
<!DOCTYPE page [
<!ENTITY disclaimer "We are not responsible for the content of this page in any way">
]>
<!--DTD end-->
<page>
<content>Blah Blah Blah</content>
<disclaim>&disclaimer;</disclaim>
</page>
So you might implement something like:
<?xml version="1.0"?>
<!-- Internal DTD Subset-->
<!DOCTYPE juddspage [
<!ENTITY degrees "&#176;">
<!ENTITY slash "/">
]>
<!--DTD end-->
<juddspage>
<Unit External="kW&slash;&degrees;C" Internal="kW&slash;&degrees;C" Zero="0." Scale="1." Offset="0.">kW&slash;&degrees;C :: kW&slash;&degrees;C</Unit>
</juddspage>
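To see what a parser actually makes of entities declared in an internal DTD subset, here is a small Python sketch using the standard library's ElementTree. It is only an illustration: the thread does not use Python, the document is trimmed to one attribute, the degree sign is written as the numeric reference `&#176;` (my choice, so the file stays pure ASCII), and `expanded_unit` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Same shape as the example above, cut down to a single attribute.
# Both entities are declared in the internal DTD subset.
DOC = """<?xml version="1.0"?>
<!DOCTYPE juddspage [
<!ENTITY degrees "&#176;">
<!ENTITY slash "/">
]>
<juddspage>
<Unit External="kW&slash;&degrees;C">kW&slash;&degrees;C</Unit>
</juddspage>"""

def expanded_unit(xml_text):
    """Parse the document and return the Unit element's text and External attribute."""
    unit = ET.fromstring(xml_text).find("Unit")
    return unit.text, unit.get("External")

print(expanded_unit(DOC))
```

The application never sees `&degrees;`; the parser hands over the expanded characters, both in element content and in attribute values.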
But I think I haven't explained myself properly.
What you describe is a change to the content of the XML file.
I'm writing large files so it is not really practical to watch every character and replace '°' with '&degrees;' etc.
What I need is a way to instruct the parser (through the schema file? DTD?) that if it comes across a '³' or a '°' (for instance) when validating, not to throw a fatal error event. I really don't see why these are considered invalid characters in the first place.
I guess I need to know if this can even be done?
Sorry to be a pain about this!
Even & and < can't appear literally in content and have to be escaped, either by their numeric character codes or, in the case of >, <, & etc., by the XML equivalents: &gt;, &lt; and &amp; will do fine.
Thinking logically about it, I'm not so sure it's the special characters the parser dislikes. Could it be objecting to the "/"s before them?
Dave
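If watching every character by hand is the worry, the escaping can be left to the code that writes the file. A minimal Python sketch, assuming the text passes through one function on its way out (`to_ascii_xml_text` is a made-up name, and the thread's own writer is not Python): it escapes the reserved characters and turns every non-ASCII character into a numeric character reference, which needs no DTD declaration and reads the same under any encoding.

```python
from xml.sax.saxutils import escape

def to_ascii_xml_text(text):
    """Escape &, <, > and turn every non-ASCII character into a numeric reference."""
    # Escape the reserved characters first, so the & of the numeric
    # references produced in the next step is not escaped again.
    escaped = escape(text)  # & -> &amp;, < -> &lt;, > -> &gt;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_ascii_xml_text("kW/°C & m³"))
```

The "/" goes through untouched, since it is an ordinary character in element content.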
...our internet pipe went down. I found the 'actual' problem was my encoding statement. If I specified the encoding as UTF-8 the validation failed; if I used ISO-8859-1 (Latin-1) the validation passed.
However, I had to hack in the XML declaration manually (hardcode it) because the put_encoding method doesn't seem to work. If you don't use it, it defaults to writing the encoding as UTF-16 (even though it doesn't write in that encoding!!!); if you do use put_encoding, it doesn't write any encoding statement at all, so the parser assumes UTF-8 and validation fails...
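For later readers, the likely mechanism is that the file's bytes were single-byte Latin-1 while the declaration (or the default) said UTF-8, and a lone byte such as 0xB0 is not a valid UTF-8 sequence. A minimal Python sketch of that mismatch, using the standard library parser rather than the parser from this thread (`parse_with_declared_encoding` is a hypothetical helper):

```python
import xml.etree.ElementTree as ET

def parse_with_declared_encoding(declared, payload_bytes):
    """Prepend an XML declaration naming `declared` and return the root text,
    or None if the parser rejects the bytes as not well-formed."""
    header = f'<?xml version="1.0" encoding="{declared}"?>'.encode("ascii")
    try:
        return ET.fromstring(header + payload_bytes).text
    except ET.ParseError:
        return None

# The element content written as single-byte Latin-1 (the degree sign becomes byte 0xB0).
latin1_payload = "<Unit>kW/°C</Unit>".encode("iso-8859-1")

print(parse_with_declared_encoding("UTF-8", latin1_payload))
print(parse_with_declared_encoding("ISO-8859-1", latin1_payload))
```

The same bytes are rejected when labelled UTF-8 and accepted when labelled ISO-8859-1, which matches the behaviour described above. The other way out is to keep the UTF-8 label and make sure the writer really emits UTF-8 bytes.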