Eco Companion Help: ANZMP guide to parser messages

Eco Companion Australasia

ANZMP guide to parser messages

This guide explains some of the messages that you may be presented with when your geospatial metadata documents are processed by the Australia New Zealand Metadata Parser (ANZMP).

ANZMP will retrieve your document, validate it against the ANZMETA Document Type Definition, and report any problems under the heading: 'Validate against DTD' errors. The parser is strict and any problems are comprehensively reported. It will also check the content of some fields in accordance with the ANZLIC Guidelines for Core Metadata Elements.

After ensuring that your metadata document appears to be in ANZMETA XML format, the parser reads your file as a stream of sequential characters. When the parser encounters the opening tag of an element, e.g. <origin>, it runs a process to validate this element's position against the structure specified by the DTD. A similar process runs when it encounters the closing tag, to ensure that all required children were present. At the close of some elements, the content is compared to the relevant authority list.
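The closing-tag check described above can be sketched in a few lines of Python. This is only an illustration, not how ANZMP itself is implemented: the REQUIRED_CHILDREN table is a hypothetical stand-in for the DTD and holds just the two rules quoted later in this guide, and the function name check_required_children is made up for the example. The sketch also assumes the input is already well-formed XML.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical fragment of a content model: parent -> required children, in order.
# Only the two rules quoted in this guide are included; the real DTD has many more.
REQUIRED_CHILDREN = {
    "origin": ["custod", "jurisdic"],
    "bounding": ["northbc", "southbc", "eastbc", "westbc"],
}

def check_required_children(xml_text):
    """Return a list of problems found when each element is closed."""
    problems = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    # The "end" event fires at the closing tag, when all children are known.
    for _event, element in ET.iterparse(source, events=("end",)):
        expected = REQUIRED_CHILDREN.get(element.tag)
        if expected is None:
            continue  # no rule recorded for this element in the sketch
        found = [child.tag for child in element]
        if found != expected:
            problems.append(
                f"<{element.tag}>: expected children {expected}, found {found}"
            )
    return problems
```

For example, check_required_children("<anzmeta><origin><jurisdic/></origin></anzmeta>") reports one problem for <origin>, because <custod> is missing.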

Four classes of error are reported by the parser:

Pre-parser errors
- document not found, not a plain text file, missing XML declaration
- described below
Validation against DTD errors
- missing required elements, not a "well-formed" document, poor XML structure
- described below
Content validation errors
- keywords are not in authority list, coordinates are out of range, invalid date
- these are described in a separate document
Occasional fatal errors
- sometimes the content of a field can cause the parser to die
- a word (mostly URLs) is too long and cannot be word-wrapped for formatted text output
- described below

Please report any parsing and validating problems that are not related to your specific metadata.

The document "About dataset descriptions" explains geospatial dataset descriptions. Here is an example XML file.

Example "Validation against DTD" errors:

When reading the error messages you will need to bear in mind that the parser is processing the flow of your document, character by character and line by line. It will report errors in the order that they are encountered. Occasionally you may need to look further down the list to find the real cause of an earlier error.

If you have used the W3C HTML Validation service, then you will recognise these error messages that come directly from the wonderful 'nsgmls' parser.

In each example error message below:
- the first number is the line number in the XML file
- the second number is the character number on that line
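If you want to sort or filter a long list of messages, the layout described above (file name, line number, character number, an optional one-letter code, then the text) can be split with a short Python sketch. The names ParserMessage and parse_message are invented for this example, and the reading of "E" as an error and "W" as a warning is an assumption based on the examples in this guide.

```python
import re
from typing import NamedTuple, Optional

class ParserMessage(NamedTuple):
    filename: str
    line: int       # line number in the XML file
    column: int     # character number on that line
    severity: str   # e.g. "E" or "W"; empty for follow-up notes
    text: str

# file:line:column:[letter:] text
_MESSAGE_RE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+):(?:(?P<sev>[A-Z]):)?\s*(?P<text>.*)$"
)

def parse_message(raw: str) -> Optional[ParserMessage]:
    """Split one message line into its parts, or return None if it does not match."""
    match = _MESSAGE_RE.match(raw.strip())
    if match is None:
        return None
    return ParserMessage(
        match["file"],
        int(match["line"]),
        int(match["col"]),
        match["sev"] or "",
        match["text"],
    )
```

Follow-up lines such as "dataset/test.sgm:12:2: start tag was here" carry no letter code, so they come back with an empty severity.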

Missing the closing tag </origin>
Missing the required element <custod>
Missing the required element <jurisdic>
Missing a closing tag
Order of sub-elements is incorrect
Two <keyword>s in a <place> element
Tried to add an unsupported element
Incorrect structure of <jurisdic> element

Missing bounding coordinates
Always use lowercase element names
Incomplete elements
Character data is not allowed here
Cannot use & or < or >
Tried to use non-SGML character
element "LI" undefined
Cannot use & in content

Missing required attribute

Entity end not allowed in end tag

Missing the closing tag </origin>
- Error message:
 - dataset/test.sgm:21:11:E: end tag for "ORIGIN" omitted, but OMITTAG NO was specified
 - dataset/test.sgm:12:2: start tag was here
- Explanation: Every element must have a closing XML tag <origin> ... </origin>
Missing the required element <custod>
- Error message:
 - dataset/test1.xml:13:12:E: document type does not allow element "JURISDIC" here
 - dataset/test1.xml:18:10:E: end tag for "ORIGIN" which is not finished
- Explanation: The DTD specifies that the <origin> element must have two sub-elements <custod> and <jurisdic> and that they must be in that order. The parser expects to find the <custod> element followed by the <jurisdic> element. It didn't find the <custod> element first, so it complains about the <jurisdic> element.
Missing the required element <jurisdic>
- Error message:
 - dataset/test.sgm:16:10:E: end tag for "ORIGIN" which is not finished
- Explanation: As above, however this time it found the first component of the <origin> element (the <custod> element) but it didn't find the second component (the <jurisdic> element).
Missing a closing tag 
- Error message:
 - dataset/test.sgm:26:20:E: document type does not allow element "P" here
 - dataset/test.sgm:27:21:E: document type does not allow element "UL" here
 - dataset/test.sgm:33:12:E: end tag for "P" omitted, but OMITTAG NO was specified
 - dataset/test.sgm:25:3: start tag was here
- Explanation: In this example, the <abstract> comprised two paragraph elements followed by an unnumbered list <ul>. The closing tag was missing on the first paragraph, so the parser complains about the second paragraph and the list element.
Order of sub-elements is incorrect - <southbc> must follow <northbc>
- Error message:
 - dataset/test.xml:46:12:E: document type does not allow element "SOUTHBC" here
 - dataset/test.xml:48:11:E: document type does not allow element "EASTBC" here
 - dataset/test.xml:49:11:E: document type does not allow element "WESTBC" here
 - dataset/test.xml:50:13:E: end tag for "BOUNDING" which is not finished
- Explanation: The DTD specifies that the <bounding> element must comprise the following four elements: <northbc>, <southbc>, <eastbc>, <westbc> and they must appear in that order. In the example, the order was incorrect: <southbc> followed by <northbc>, ...
Two <keyword>s in a <place> element
- Error message:
 - dataset/test.xml:44:12:E: document type does not allow element "KEYWORD" here
- Explanation: If you want to define two "Geographic Place Names" then you must use two <place> elements. The DTD only allows one <keyword> sub-element for each <place> element.
Tried to add an unsupported element
- Error message:
 - dataset/test.xml:49:12:E: document type does not allow element "GUFF" here
- Explanation: You included an element that is not defined in the DTD, or you added a metadata element where it is not allowed. We often see this error when the custodian has forgotten the required "contact details" section, so the parser says that the following element "metainfo" is not allowed here. Other causes are possible: perhaps you have a misspelling (like <acconst> instead of <accconst>), or an uppercase element name (all metadata element names must be lowercase). Consult the ANZMETA documentation.
Incorrect structure of <jurisdic> element
- Error message:
 - dataset/test.sgm:17:0:E: character data is not allowed here
 - dataset/test.sgm:18:10:E: end tag for "JURISDIC" which is not finished
- Explanation: The Jurisdiction element will only accept a controlled list of keywords. You cannot just add free text. The <jurisdic> element must enclose a <keyword> sub-element which holds the text value. This restriction enables validating parsers to check the keywords against the authority list.
Missing bounding coordinates
- Error message:
 - dataset/test.sgm:45:9:E: end tag for "SPDOM" which is not finished
- Explanation: The document needs the minimum bounding rectangle, which enables spatial searching. The <spdom> element must contain the <bounding> element, which has four sub-elements: <northbc>, <southbc>, <eastbc>, <westbc>.
Always use lowercase element names
- Error message:
 - dataset/test.sgm:17:3:E: end tag for element "P" which is not open
 - dataset/test.sgm:18:2:E: document type does not allow element "p" here
 - dataset/test.sgm:22:15:E: end tag for "p" omitted, but OMITTAG NO was specified
 - dataset/test.sgm:13:9: start tag was here
- Explanation: The closing paragraph tag was uppercase. All element names must be in lowercase.
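A quick way to hunt for uppercase element names before submitting is sketched below in Python. It is a rough text scan rather than a real parser, and the function name find_uppercase_tags is made up for this example; it simply lists every opening or closing tag whose name contains an uppercase letter.

```python
import re

# Matches the name of an opening or closing tag, e.g. "p" in <p> or "LI" in </LI>.
# Declarations such as <?xml ...?> and <!DOCTYPE ...> are not matched.
_TAG_NAME_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9._-]*)")

def find_uppercase_tags(text):
    """Return (line_number, tag_name) for each tag name that is not all lowercase."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in _TAG_NAME_RE.finditer(line):
            name = match.group(1)
            if name != name.lower():
                hits.append((line_no, name))
    return hits
```

Because it only looks at the text, the scan may also flag angle brackets that appear inside comments, so treat its output as a list of places to inspect.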
Incomplete elements
- Error message:
  - dataset/test.sgm:80:11:E: end tag for "avlform" which is not finished
- Explanation: The DTD specifies certain required elements. The parser complains that a parent element is not finished, meaning that it is missing one of its required children elements.
Character data is not allowed here
- Error message:
 - dataset/test.sgm:17:4:E: character data is not allowed here
- Explanation: Plain text has been entered where there should be a <keyword> element. The <keyword> element is a container for the text and forces the text to come from an authority list of controlled keywords. See Incorrect structure of <jurisdic> element for a specific example.
Cannot use & or < or >
- Error message:
 - dataset/test.sgm:17:4:W: character "<" is the first character of a delimiter but occurred as data
- Explanation: Certain characters are reserved by XML and cannot be used as bare content (e.g. < and > and &). These characters are used by XML for "markup", the tags with angle brackets (e.g. <title>). You must use an alternate representation if you want to use such characters (i.e. use &lt; for < and use &gt; for > and use &amp; for &).
 Character entities from restricted sets are the only ones allowed. If you do not abide by this then your documents will be presented with weird looking characters.
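If you generate your metadata with a script, the three reserved characters can be replaced automatically before the text is placed inside an element. A minimal Python sketch, using the standard library's xml.sax.saxutils.escape (the wrapper name escape_content is made up for this example):

```python
from xml.sax.saxutils import escape

def escape_content(text: str) -> str:
    """Replace &, < and > with &amp;, &lt; and &gt; so the text is safe as element content."""
    return escape(text)

print(escape_content("Black & White"))   # Black &amp; White
print(escape_content("depth < 200 m"))   # depth &lt; 200 m
```

Apply this only to the text content, never to the whole document, otherwise the element tags themselves would be escaped too.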
Tried to use non-SGML character
- Error message:
  - dataset/test.sgm:516:24:W: non SGML character number 127
- Explanation: A character was found in the content that cannot be represented in the SGML character set. This is probably a control character like ^? (the DEL character, number 127).
  Warning: some document editing applications allow these spurious and unreadable characters to creep into a document and be hidden.
  Hint: do not use cut-and-paste from word processor applications, rather save as text first.
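Hidden control characters are hard to spot in an editor, so a small scan can help. The Python sketch below is an illustration only: it assumes that the troublesome characters are those numbered below 32 (other than tab, line feed and carriage return) plus number 127, which may not match the parser's SGML declaration exactly. The function name find_control_characters is made up, and the 1-based column count may not line up with the character number the parser reports.

```python
def find_control_characters(text):
    """Return (line_number, column, character_number) for each suspicious character."""
    allowed = {"\t", "\n", "\r"}
    hits = []
    # Split on "\n" only, so that other control characters stay inside the lines.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            code = ord(ch)
            if ch not in allowed and (code < 32 or code == 127):
                hits.append((line_no, col_no, code))
    return hits
```

An empty result means that none of the assumed control characters were found.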
element "LI" undefined
- Error message:
 - dataset/test.xml:39:24:E: element "LI" undefined
- Explanation: You used a metadata element that is not defined in the ANZMETA DTD. Remember also that all element names are strictly lowercase. So a list element is <li> and not <LI> and a hypertext anchor is <a href=http://... and not <A HREF=...
Cannot use & in content
- Error message:
 - dataset/test.sgm:17:4:W: general entity "plan" not defined and no default entity
- Explanation: The bare character "&" was found in content, e.g. define&plan (see the explanation under Cannot use & or < or > above). Different errors will occur depending on the context of your bare ampersand, e.g. "Black & White" will cause one type of error, while "B&W photograph" will cause different error messages.
Missing required attribute
- Error message:
  - dataset/test.xml:16:23:E: required attribute "thesaurus" not specified
- Explanation: The relevant DTD specifies that this metadata element has a required attribute called thesaurus.
Entity end not allowed in end tag
- Error message:
  - dataset/test.xml:341:9:E: entity end not allowed in end tag
- Explanation: XML element names cannot contain any strange or hidden control characters. Alternatively, you may have forgotten the closing > symbol.

Pre-parser errors

Some checks are carried out before ANZMP starts parsing. These catch problems such as difficulty retrieving your XML file from the URL that you specified, or a metadata document that lacks the required XML header. All of these errors are reported in a list at the top of the page. Simply correct the errors and submit the form again.

Document Not Found
Authorisation Required
Missing XML declaration
Missing SGML DOCTYPE declaration
Not the proper filename extension

Poorly constructed name for the dataset
Incorrectly formatted name for the dataset
Dataset name is too long
You have reached your document entitlement

Document Not Found
- Error message:
  - The specified URL was not successfully retrieved: server response error code = 404 - Not Found
- Explanation: This familiar WWW error message means that the URL that you specified did not match a document on that server. ANZMP requested your document, but the server said that it did not have such a thing.
Authorisation Required
- Error message:
  - The specified URL was not successfully retrieved: server response error code = 401 - Authorisation Required
- Explanation: The remote server asked for user authentication, because your XML file is located in a password protected directory. There is no way for Eco Companion to automatically provide the authentication, so your server denies the request. You will need to place the XML file in an openly available directory on your server (that has no such authorisation restrictions).
Missing XML declaration
- Error message:
 - The XML declaration must be the first line of the XML file
- Explanation: The first line of your file must be similar to this ...
- <?xml version="1.0"?>
Missing SGML DOCTYPE declaration
- Error message:
 - The SGML DOCTYPE declaration must be near the top of the SGML file
- Explanation: The next lines must be ...
- <!DOCTYPE anzmeta PUBLIC "-//ANZLIC//DTD ANZMETA 1.3//EN" "http://www.auslig.gov.au/anzmeta/anzmeta-1.3.dtd">
- Note that the ANZMP parser can validate documents against other DTDs. Send feedback to have yours included.
Not the proper filename extension
- Error message:
  - The specified URL does not have '.xml' or '.sgm' or '.sgml' extension
- Explanation: The filename of the document must end with one of those extensions
Poorly constructed name for the dataset
- Error message:
  - 'Short dataset name' must only contain alphabetic characters, digits, or dashes (-)
- Explanation: The storage name that you have chosen for the dataset can only use a limited set of characters
Incorrectly formatted name for the dataset
- Error message:
  - 'Short dataset name' must not have a filename extension
- Explanation: Don't say "mydataset.sgm", simply say "mydataset".
Dataset name is too long
- Error message:
  - 'Short dataset name' must not be longer than 20 characters
- Explanation: The storage name that you have chosen for the dataset must be between 3 and 20 characters long. Long names form confusing URLs.
You have reached your document entitlement
- Error message:
  - Processing this new dataset will exceed your entitlement. You must upgrade your membership or delete an existing dataset.
- Explanation: Grade 1 members are allowed one published dataset description and two in preparation. You can process a dataset description with the same dataset shortname a number of times. However, you cannot try to add a third dataset. The document About membership explains the grades of membership, entitlements, and costs.
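Several of the pre-parser checks above are simple enough to try locally before you submit the form. The Python sketch below is a reconstruction from the messages quoted in this guide, not the service's own code: the function names are made up, and details such as case sensitivity of the extension are assumptions. The explanation above also mentions a minimum length of 3 characters, but no message is quoted for it, so that check is left out.

```python
import re

ALLOWED_EXTENSIONS = (".xml", ".sgm", ".sgml")

def has_allowed_extension(url: str) -> bool:
    """True if the URL ends with one of the extensions named in the guide."""
    return url.endswith(ALLOWED_EXTENSIONS)

def has_xml_declaration(document_text: str) -> bool:
    """True if the first line of the document starts with an XML declaration."""
    first_line = document_text.split("\n", 1)[0]
    return first_line.startswith("<?xml")

def check_short_name(name: str) -> list:
    """Return the messages that a 'Short dataset name' would trigger (empty if none)."""
    problems = []
    if "." in name:
        problems.append("must not have a filename extension")
    elif not re.fullmatch(r"[A-Za-z0-9-]+", name):
        problems.append("must only contain alphabetic characters, digits, or dashes (-)")
    if len(name) > 20:
        problems.append("must not be longer than 20 characters")
    return problems
```

For example, check_short_name("mydataset.sgm") returns the "filename extension" message, while check_short_name("mydataset") returns an empty list.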

Occasional fatal errors

Sometimes the content of a field can cause the post-processor to die. These errors are at the document production and formatting stage of the process.

Couldn't word-wrap because string too long
- Error message:
  - couldn't wrap 'ThisLineIsWayLoooooooooooooooooooonnnnnnnnnnnnnnnnnnnngggggggggggggggggggg' at /usr/lib/perl5/Text/Wrap.pm line 87, STDIN chunk 334.
- Explanation: The post-processor attempts to format the output for the TEXT file to 70 columns. To do this it needs to word-wrap the paragraphs so that the words are kept whole. The most common source of long words is URLs.
- Solution: If your URL is very long then you could use a Persistent URL (PURL) where you can assign a simple URL which would be automatically redirected to your long URL.
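To find the offending words before submitting, you can scan your text for any single word that is too long to fit on one output line. The Python sketch below assumes the limit is the 70 columns mentioned above; the exact threshold used by the post-processor may differ slightly, and the function name find_unwrappable_words is made up for this example.

```python
def find_unwrappable_words(text, width=70):
    """Return the whitespace-separated words that are longer than the given width."""
    return [word for word in text.split() if len(word) > width]
```

Any word it returns, most likely a long URL, is a candidate for replacement with a shorter address.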

URL:http://www.indexgeo.com.au/ec/help/anzmp-guide.html
Last Modified: 11 April 2005