ANZMP guide to parser messages
This guide explains some of the messages that you may be presented with
when your geospatial metadata documents are processed by the Australia New
Zealand Metadata Parser (ANZMP).
ANZMP will retrieve your document, validate it against the
ANZMETA Document Type
Definition, and report any problems under the heading: 'Validate against
DTD' errors. The parser is strict and any problems are comprehensively
reported. It will also check the content of some fields in accordance with
the ANZLIC Guidelines for
Core Metadata Elements.
After ensuring that your metadata document looks like ANZMETA XML
format, the parser will read your file as a stream of sequential
characters. When the parser encounters the pattern of an opening tag of an
element, e.g. <origin>, it will run a process to validate this
element's relationship to the structure specified by the DTD. Similar
processes will run when it encounters the closing tag to ensure that all
required children were present. At the close of some elements, the content
is compared to the relevant authority list.
Four classes of error are reported by the parser:
Please report any parsing and
validating problems that are not related to your specific metadata.
The document "About dataset
descriptions" explains geospatial dataset descriptions. Here is
an example XML file.
When reading the error messages you will need to bear in mind that the
parser is processing the flow of your document, character by character and
line by line. It will report errors in the order that they are
encountered. Occasionally you may need to look further down the list to
find the real cause of an earlier error.
If you have used the W3C HTML
Validation service, then you will recognise these error messages
that come directly from the wonderful 'nsgmls'
parser.
- In each example error message below:
- - the first number is the line number in the XML file
- - the second number is the character number on that line
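The message layout described above (file name, line number, character number, an optional severity letter, then the message text) can be picked apart mechanically, which helps when sorting a long report by position. The following is a minimal Python sketch and is not part of ANZMP; the names `ParserMessage` and `parse_message` are hypothetical, and the pattern is inferred only from the examples quoted in this guide.

```python
import re
from typing import NamedTuple, Optional


class ParserMessage(NamedTuple):
    filename: str
    line: int                # line number in the XML file
    column: int              # character number on that line
    severity: Optional[str]  # "E" or "W"; None for follow-up notes
    text: str


# Assumed layout, inferred from the examples: file:line:column:[severity:] text
_MESSAGE_RE = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+):"
    r"(?:(?P<sev>[A-Z]):)?\s*(?P<text>.*)$"
)


def parse_message(raw: str) -> Optional[ParserMessage]:
    """Split one message into its parts, or return None if it does not match."""
    # Messages may be wrapped over several lines, so collapse whitespace first.
    flat = " ".join(raw.split())
    match = _MESSAGE_RE.match(flat)
    if match is None:
        return None
    return ParserMessage(
        match["file"],
        int(match["line"]),
        int(match["col"]),
        match["sev"],
        match["text"],
    )
```

A note such as "dataset/test.sgm:12:2: start tag was here" has no severity letter, so the sketch returns `None` in the `severity` field for it.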
- Missing the closing tag </origin>
- Error message:
- dataset/test.sgm:21:11:E: end tag for "ORIGIN"
omitted, but OMITTAG NO was specified
- dataset/test.sgm:12:2: start tag was here
- Explanation: Every element must have a closing XML tag <origin>
... </origin>
- Missing the required element <custod>
- Error message:
- dataset/test1.xml:13:12:E: document type does not allow
element "JURISDIC" here
- dataset/test1.xml:18:10:E: end tag for "ORIGIN"
which is not finished
- Explanation: The DTD specifies that the <origin>
element must have two sub-elements <custod> and <jurisdic>
and that they must be in that order. The parser expects to find the
<custod> element followed by the <jurisdic> element. It
didn't find the <custod> element first, so it complains about
the <jurisdic> element.
- Missing the required element <jurisdic>
- Error message:
- dataset/test.sgm:16:10:E: end tag for "ORIGIN"
which is not finished
- Explanation: As above, however this time it found the
first component of the <origin> element (the <custod>
element) but it didn't find the second component (the <jurisdic>
element).
- Missing a closing tag </p>
- Error message:
- dataset/test.sgm:26:20:E: document type does not allow
element "P" here
- dataset/test.sgm:27:21:E: document type does not allow
element "UL" here
- dataset/test.sgm:33:12:E: end tag for "P"
omitted, but OMITTAG NO was specified
- dataset/test.sgm:25:3: start tag was here
- Explanation: In this example, the <abstract>
comprised two paragraph elements <p> followed by an unnumbered
list <ul>. The closing tag </p> was missing on the first
paragraph, so the parser complains about the second paragraph and
the list element.
- Order of sub-elements is incorrect - <southbc>
must follow <northbc>
- Error message:
- dataset/test.xml:46:12:E: document type does not allow
element "SOUTHBC" here
- dataset/test.xml:48:11:E: document type does not allow
element "EASTBC" here
- dataset/test.xml:49:11:E: document type does not allow
element "WESTBC" here
- dataset/test.xml:50:13:E: end tag for "BOUNDING"
which is not finished
- Explanation: The DTD specifies that the <bounding>
element must comprise the following four elements: <northbc>, <southbc>,
<eastbc>, <westbc> and they must appear in that order.
In the example, the order was incorrect: <southbc> followed by
<northbc>, ...
- Two <keyword>s in a <place> element
- Error message:
- dataset/test.xml:44:12:E: document type does not allow
element "KEYWORD" here
- Explanation: If you want to define two "Geographic
Place Names" then you must use two <place> elements. The
DTD only allows one <keyword> sub-element for each <place>
element.
- Tried to add an unsupported element
- Error message:
- dataset/test.xml:49:12:E: document type does not allow
element "GUFF" here
- Explanation: You included an element that is not defined
in the DTD or you are adding a metadata element where it is not
allowed.
We often see this error when the custodian has forgotten the required
"contact details" section, so the parser says that the following
element "metainfo" is not allowed here. There are other possible causes.
Perhaps you have a misspelling (like <acconst>
instead of <accconst>). Another cause may be that you have an
uppercase element name (all metadata elements must be lowercase).
Consult the ANZMETA
documentation.
- Incorrect structure of <jurisdic> element
- Error message:
- dataset/test.sgm:17:0:E: character data is not allowed
here
- dataset/test.sgm:18:10:E: end tag for "JURISDIC"
which is not finished
- Explanation: The Jurisdiction element will only accept a
controlled list of keywords. You cannot just add free text. The <jurisdic>
element must enclose a <keyword> sub-element which holds the
text value. This restriction enables validating parsers to check the
keywords against the authority list.
- Missing bounding coordinates
- Error message:
- dataset/test.sgm:45:9:E: end tag for "SPDOM"
which is not finished
- Explanation: The document needs the minimum bounding
rectangle which enables spatial searching. The <spdom> element
must contain the <bounding> element which has its four
sub-elements <northbc> <southbc> <eastbc> <westbc>, in that order.
- Always use lowercase element names
- Error message:
- dataset/test.sgm:17:3:E: end tag for element "P"
which is not open
- dataset/test.sgm:18:2:E: document type does not allow
element "p" here
- dataset/test.sgm:22:15:E: end tag for "p"
omitted, but OMITTAG NO was specified
- dataset/test.sgm:13:9: start tag was here
- Explanation: The closing paragraph tag was uppercase </P>.
All element names must be in lowercase </p>.
- Incomplete elements
- Error message:
- dataset/test.sgm:80:11:E: end tag for "avlform"
which is not finished
- Explanation: The DTD specifies certain required elements.
The parser complains that a parent element is not finished, meaning
that it is missing one of its required child elements.
- Character data is not allowed here
- Error message:
- dataset/test.sgm:17:4:E: character data is not allowed
here
- Explanation: Plain text has been entered where there
should be a <keyword> element which is a container for the
text and forces the text to be from an authority list of controlled
keywords. See Incorrect structure of <jurisdic>
element for a specific example.
- Cannot use & or < or >
- Error message:
- dataset/test.sgm:17:4:W: character "<"
is the first character of a delimiter but occurred as
data
- Explanation: certain characters are reserved by XML and
cannot be used as bare content (e.g. < and > and &).
These characters are used by XML for "markup" - the
tags with angle-brackets (e.g. <title>). You must use an
alternate representation if you want to use such characters (i.e.
use &lt; for < and use &gt; for > and use
&amp; for &).
Character
entities from restricted sets are the only ones allowed.
If you do not abide by this then your documents will be presented
with weird-looking characters.
- Tried to use non-SGML character
- Error message:
- dataset/test.sgm:516:24:W: non SGML character
number 127
- Explanation: a character was discovered in the content
that cannot be represented with the SGML character set. This is
probably a control character like ^? (DEL, character number 127).
Warning: some document editing applications allow
these spurious and unreadable characters to creep into a
document and be hidden.
Hint: do not use cut-and-paste from word processor
applications, rather save as text first.
- element "LI" undefined
- Error message:
- dataset/test.xml:39:24:E: element "LI"
undefined
- Explanation: You used a metadata element that is not
defined in the ANZMETA DTD. Remember also that all element names are strictly
lowercase. So a list element is <li> and not <LI> and a hypertext
anchor is <a href=http://... and not <A HREF=...
- Cannot use & in content
- Error message:
- dataset/test.sgm:17:4:W: general entity
"plan" not defined and no default entity
- Explanation: the bare character "&" was
found in content e.g. define&plan
(See explanation about
Cannot use & or < or >
above). Various different errors will occur
depending on the context of your bare ampersand,
e.g. "Black & White" will cause one type of error,
"B&W photograph" will cause different error messages.
- Missing required attribute
- Error message:
- dataset/test.xml:16:23:E: required attribute
"thesaurus" not specified
- Explanation: the relevant DTD would have specified that
this metadata element has a required attribute called thesaurus.
- entity end not allowed in end tag
- Error message:
- dataset/test.xml:341:9:E: entity end not allowed in end tag
- Explanation: XML element names cannot have any strange
or hidden control characters. Or perhaps you have forgotten the
closing > symbol.
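Several of the messages above (a bare "&", a "<" that occurred as data, a non-SGML character) can be caught before a document is submitted. The following Python sketch is one possible pre-check, not something ANZMP provides: `escape` comes from the standard `xml.sax.saxutils` module, while `escape_content` and `find_control_characters` are hypothetical helper names. Which characters count as non-SGML depends on the SGML declaration in use, so the set tested here (codes below 32 other than tab, newline and carriage return, plus 127) is an assumption.

```python
from typing import List, Tuple
from xml.sax.saxutils import escape


def escape_content(text: str) -> str:
    """Replace bare &, < and > with &amp;, &lt; and &gt;."""
    return escape(text)


def find_control_characters(text: str) -> List[Tuple[int, int]]:
    """Return (position, character number) for suspicious control characters.

    Assumption: tab, newline and carriage return are acceptable; every other
    code below 32, and 127 (DEL), is reported.
    """
    allowed = {"\t", "\n", "\r"}
    return [
        (index, ord(ch))
        for index, ch in enumerate(text)
        if (ord(ch) < 32 and ch not in allowed) or ord(ch) == 127
    ]
```

Apply `escape_content` only to raw text that has not yet been escaped; running it over content that already contains &amp; would escape the ampersand a second time.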
Some checks are carried out before ANZMP even sets off to work. Other
errors include difficulties retrieving your XML file from the URL that
you specified. Perhaps your metadata document does not have the required
XML header. All of these errors are reported in a list at the top of
the page. Simply correct the errors and submit the form again.
- Document Not Found
- Error message:
- The specified URL was not successfully retrieved: server
response error code = 404 - Not Found
- Explanation: This familiar WWW error message means that
the URL that you specified did not match a document on that server.
ANZMP requested your document, but the server said that it did not
have such a thing.
- Authorisation Required
- Error message:
- The specified URL was not successfully retrieved: server
response error code = 401 - Authorisation Required
- Explanation: The remote server asked for user
authentication, because your XML file is located in a password
protected directory. There is no way for Eco Companion to
automatically provide the authentication, so your server denies the
request. You will need to place the XML file in an openly available
directory on your server (that has no such authorisation
restrictions).
- Missing XML declaration
- Error message:
- The XML declaration must be the first line of the XML
file
- Explanation: The first line of your file must be similar
to this ...
- <?xml version="1.0"?>
- Missing SGML DOCTYPE declaration
- Error message:
- The SGML DOCTYPE declaration must be near the top of the
SGML file
- Explanation: The next lines must be ...
- <!DOCTYPE anzmeta PUBLIC "-//ANZLIC//DTD ANZMETA 1.3//EN"
"http://www.auslig.gov.au/anzmeta/anzmeta-1.3.dtd">
- Note that the ANZMP parser can validate documents against other DTDs. Send feedback to have yours included.
- Not the proper filename extension
- Error message:
- The specified URL does not have '.xml' or '.sgm' or
'.sgml' extension
- Explanation: The filename of the document must end with
one of those extensions.
- Poorly constructed name for the dataset
- Error message:
- 'Short dataset name' must only contain alphabetic
characters, digits, or dashes (-)
- Explanation: The storage name that you have chosen for
the dataset can only use a limited set of characters
- Incorrectly formatted name for the dataset
- Error message:
- 'Short dataset name' must not have a filename extension
- Explanation: Don't say "mydataset.sgm", simply
say "mydataset".
- Dataset name is too long
- Error message:
- 'Short dataset name' must not be longer than 20
characters
- Explanation: The storage name that you have chosen for
the dataset must be between 3 and 20 characters long. Long names
form confusing URLs.
- You have reached your document entitlement
- Error message:
- Processing this new dataset will exceed your
entitlement. You must upgrade your membership or delete an
existing dataset.
- Explanation: Grade 1 members are allowed one published
dataset description and two in preparation. You can process a
dataset description with the same dataset shortname a number of
times. However, you cannot try to add a third dataset. The document
About membership explains
the grades of membership, entitlements, and costs.
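The filename and 'Short dataset name' rules listed above are simple enough to test locally before submitting the form. The sketch below is written in Python under stated assumptions and does not reproduce ANZMP's own code: the function names are hypothetical, "alphabetic characters" is taken to mean ASCII letters, the extension test ignores case, only the three document extensions are treated as "a filename extension", and the wording of the too-short message is invented because the guide quotes no message for that case.

```python
import re
from typing import List
from urllib.parse import urlparse

# Extensions and length limits quoted in the messages above.
ALLOWED_EXTENSIONS = (".xml", ".sgm", ".sgml")
MIN_LENGTH, MAX_LENGTH = 3, 20
_NAME_CHARS = re.compile(r"^[A-Za-z0-9-]+$")  # assumption: ASCII letters only


def has_allowed_extension(url: str) -> bool:
    """True if the path part of the URL ends with an accepted extension."""
    return urlparse(url).path.lower().endswith(ALLOWED_EXTENSIONS)


def check_short_name(name: str) -> List[str]:
    """Return the messages a short dataset name would trigger (empty if none)."""
    problems = []
    if name.lower().endswith(ALLOWED_EXTENSIONS):
        # Assumption: only these three extensions count as a filename extension.
        problems.append("'Short dataset name' must not have a filename extension")
    elif not _NAME_CHARS.match(name):
        problems.append(
            "'Short dataset name' must only contain alphabetic characters, "
            "digits, or dashes (-)"
        )
    if len(name) > MAX_LENGTH:
        problems.append("'Short dataset name' must not be longer than 20 characters")
    elif len(name) < MIN_LENGTH:
        # Hypothetical wording: the guide gives the minimum but no message for it.
        problems.append("'Short dataset name' must be at least 3 characters long")
    return problems
```

For example, `check_short_name("mydataset")` returns an empty list, while `check_short_name("mydataset.sgm")` returns the filename-extension message.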
Sometimes the content of a field can cause the post-processor to die.
These errors are at the document production and formatting stage of the
process.
- Couldn't word-wrap because string too long
- Error message:
- couldn't wrap
'ThisLineIsWayLoooooooooooooooooooonnnnnnnnnnnnnnnnnnnngggggggggggggggggggg'
at /usr/lib/perl5/Text/Wrap.pm line 87, STDIN chunk 334.
- Explanation: The post-processor attempts to format the
output for the TEXT file to 70 columns. To do this it needs to
word-wrap the paragraphs so that the words are kept whole. The most
common source of long words is URLs.
- Solution: If your URL is very long then you could use a
Persistent URL (PURL) where
you can assign a simple URL which would be automatically redirected
to your long URL.
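Words that are too long to wrap can be found before submission. The Python sketch below is an illustration only: the post-processor itself uses Perl's Text::Wrap, and the exact length at which it gives up is not stated in this guide, so the 70-column threshold and the name `find_unwrappable_words` are assumptions.

```python
from typing import List

WRAP_WIDTH = 70  # column width quoted in this guide


def find_unwrappable_words(text: str, width: int = WRAP_WIDTH) -> List[str]:
    """Return the whitespace-separated words that are longer than the width."""
    return [word for word in text.split() if len(word) > width]
```

Any word the function returns, typically a long URL, is a candidate for replacement with a shorter persistent URL.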
URL:http://www.indexgeo.com.au/ec/help/anzmp-guide.html
Last Modified: 11 April 2005