Centro tecnológico Cartif


Document Management: detection and management of inconsistencies

Documentation, whether on paper or in electronic form, is an essential element of the information society. It is the main means of storing and exchanging information in most human activities, so the data and knowledge it contains must be accurate and clear, avoiding confusion and contradiction. This is no minor goal, and it depends on multiple factors.

It is difficult to find organizations that manage the flow of information in a centralized and formal way, and it is not uncommon to come across collections of heterogeneous information that were not generated according to a single specification and that have multiple structures. This gives rise to inconsistencies, or “incoherences”, in the documentation.

Within this context, an inconsistency or “incoherence” is understood as a lack of coherence between related documents (for example, problems with cross-references). It is detected by comparing documents with the content of regulations or of other documents they refer to, by comparing documents with themselves, or by analysing their structure. In general, an inconsistency is caused by redundant or contradictory data, or by missing or erroneous information.
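As a rough illustration of two of the causes named above (broken cross-references and contradictory data), the following minimal sketch checks a tiny corpus. The `DOC-<number>` identifiers, the `name = value` notation, and both function names are assumptions made for this example; they are not taken from the research line's actual tools.

```python
import re
from collections import defaultdict

# Hypothetical conventions, assumed for illustration only: documents are
# identified as "DOC-<number>" and state parameters as "name = value".
REF_PATTERN = re.compile(r"\bDOC-\d+\b")
PARAM_PATTERN = re.compile(r"\b([a-z_]+)\s*=\s*(\d+(?:\.\d+)?)")


def find_dangling_references(corpus):
    """Return sorted (source, target) pairs whose target is not in the corpus."""
    dangling = set()
    for doc_id, text in corpus.items():
        for target in REF_PATTERN.findall(text):
            if target not in corpus:
                dangling.add((doc_id, target))
    return sorted(dangling)


def find_contradictory_values(corpus):
    """Return {parameter: {value: [doc_ids]}} for parameters given more than one value."""
    seen = defaultdict(lambda: defaultdict(list))
    for doc_id, text in corpus.items():
        for name, value in PARAM_PATTERN.findall(text):
            seen[name][float(value)].append(doc_id)
    return {
        name: {value: sorted(ids) for value, ids in values.items()}
        for name, values in seen.items()
        if len(values) > 1
    }


# Toy corpus: DOC-2 cites a document that does not exist, and the two
# documents disagree on the value of max_load.
corpus = {
    "DOC-1": "Rail spacing is set in DOC-2. max_load = 20",
    "DOC-2": "See DOC-7 for tolerances. max_load = 25",
}
print(find_dangling_references(corpus))
print(find_contradictory_values(corpus))
```

Real documentation rarely follows such rigid conventions, which is why the techniques discussed on this page go beyond pattern matching of this kind.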

These inconsistencies, and the resulting inability to manage documentation coherently, cause problems that are not only technical but also legal, social and economic, since the available documentation could cause confusion, mistakes, or harm to the entities using it.

These problems affect not only the entity or organization that generates the documentation but can also have social repercussions. This justifies the interest in studying the techniques that science and technology offer for detecting these inconsistencies, eliminating them, and establishing a methodology that prevents their reappearance.

This research line applies various soft-computing techniques (pattern recognition, Latent Semantic Analysis, data and text mining) together with formal representation techniques such as grammars and semantics. It addresses the following aspects of document management:

  • Content interpretation.
  • Automatic detection and elimination of inconsistencies.
  • Document coherence.
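To give a feel for the text-mining side of this work, the sketch below compares documents as weighted term vectors. It is a plain TF-IDF cosine similarity, a deliberately simpler stand-in and not Latent Semantic Analysis itself, which would additionally apply a truncated singular value decomposition to the term-document matrix. The function names and the sample sentences are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document to a {term: weight} dict using a smoothed IDF."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    doc_freq = Counter(term for count in counts for term in count)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in doc_freq}
    return [{t: tf * idf[t] for t, tf in count.items()} for count in counts]


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical sample sentences: the first two share vocabulary,
# the third is about an unrelated topic.
docs = [
    "the signal cable must be shielded",
    "shielded signal cable is required",
    "annual accounts are audited in march",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))
```

A score like this only indicates which documents talk about the same subject. Deciding whether two related documents actually contradict each other requires the content interpretation listed above.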

The main object of study is the automatic detection of document incoherences. A lack of document coherence has been found to affect sectors as dissimilar as health, law, and construction. Through various research projects, this research line has focused on the following sectors: power, railway, scientific, financial and accounting, construction, energy, and security and defence.


Related publications:

S. Martin, G.I. Sainz, Y. Dimitriadis. "Detection of incoherences in a technical and normative document corpus". Proceedings of the 10th International Conference on Enterprise Information Systems (ICEIS 2008), vol. II Artificial Intelligence and Decision Support Systems, pp. 282-287, Barcelona (Spain). June 2008.

S. Martín, V. Arribas, G. I. Sáinz. "Detection of incoherences in a document corpus based on the application of neuro-fuzzy system". Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR 2009). Barcelona (Spain). July 2009.
