XMLSchema: Is it possible to calculate how valid an invalid document is (e.g. as a percentage)?
I'm using lxml in Python to validate a number of XML documents against an XML Schema Definition. A number of these documents do not validate (and, at the moment, they're not expected to), but it would be useful if I could calculate how valid they are, as a percentage, for reporting purposes. I also have the ability to use xmllint or other command-line tools, should they be able to provide a useful statistic.
lxml parsers provide a way to get a list of the errors that occurred while trying to parse the document. You could combine that with the parser's recover keyword argument, like this:
    # Warning: untested, may not work
    from lxml import etree

    # recover=True asks the parser to keep going past errors where it can
    parser = etree.XMLParser(recover=True)
    it_would_be_a_tree = etree.parse(your_xml_data, parser)
    total_errors = len(parser.error_log)
Then you can calculate the percentage of the file that total_errors represents. You could use a naive measure, such as errors per line or errors per character, without much trouble. More sophisticated measures are possible if it_would_be_a_tree is a tree structure (total_elements / total_errors, for example).
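To make the measures above concrete, here is a rough sketch, untested against real data. The names error_metrics and measure, and the percent_valid score, are my own inventions for illustration rather than anything lxml provides. It reports errors per recovered element, which is the inverse of the total_elements / total_errors ratio mentioned above and avoids dividing by zero for a clean document.

```python
def error_metrics(total_errors, total_lines, total_elements):
    """Turn raw counts into naive 'how broken is it' ratios."""
    errors_per_line = total_errors / total_lines if total_lines else 0.0
    errors_per_element = total_errors / total_elements if total_elements else 0.0
    # Hypothetical score: share of lines without an error, assuming at most
    # one error per line and clamping at zero. This is an assumption, not a
    # standard definition of "percent valid".
    percent_valid = max(0.0, 1.0 - errors_per_line) * 100
    return {
        "errors_per_line": errors_per_line,
        "errors_per_element": errors_per_element,
        "percent_valid": percent_valid,
    }


def measure(xml_bytes):
    """Parse bytes with a recovering parser and compute the ratios."""
    # Imported here so error_metrics() stays usable without lxml installed.
    from lxml import etree

    parser = etree.XMLParser(recover=True)
    root = etree.fromstring(xml_bytes, parser)
    total_errors = len(parser.error_log)
    total_lines = xml_bytes.count(b"\n") + 1
    # root is None when the parser could not recover anything at all.
    if root is None:
        total_elements = 0
    else:
        # Passing etree.Element as the tag skips comments and
        # processing instructions.
        total_elements = sum(1 for _ in root.iter(etree.Element))
    return error_metrics(total_errors, total_lines, total_elements)
```

You would call it as measure(open("doc.xml", "rb").read()), where doc.xml is a placeholder file name. Bear in mind that parser.error_log records parse errors. For schema violations, an etree.XMLSchema object has its own error_log after you call validate(), and its length could be fed into error_metrics in the same way.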