Tagmatica - TagTools

TagTools - The software components

TagTools contains the following components:

- file format detector that works from the file content,

- file reader (supported formats: Text, HTML, SGML, XML),

- language detector (32 recognized languages),

- text segmenter that produces sentences,

- sentence segmenter that produces words (tokenization),

- spell corrector,

- morphological analyzer for simple words and compound words,

- robust syntactic parser (based on a chunker),

- unknown-word extractor for simple words and compound words, driven by customizable patterns,

- document indexer,

- search engine that queries the index built by the indexer,

- text mining tool that compares and classifies texts, or summarizes a document as a small set of terms.
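
To illustrate the two segmentation steps in the list above (text to sentences, then sentences to words), here is a minimal sketch in Java. It does not use the TagTools API, which is not shown on this page. It relies only on the JDK's standard java.text.BreakIterator, and the class and method names (SegmentationSketch, sentences, words) are hypothetical names chosen for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: this is NOT the TagTools API.
// It uses the JDK's BreakIterator to mimic the two segmentation steps.
class SegmentationSketch {

    // Step 1: split a text into sentences.
    static List<String> sentences(String text, Locale locale) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    // Step 2: split a sentence into words (tokenization).
    // Segments that start with whitespace or punctuation are skipped.
    static List<String> words(String sentence, Locale locale) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(sentence);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = sentence.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "TagTools reads files. It detects the language.";
        for (String sentence : sentences(text, Locale.ENGLISH)) {
            System.out.println(sentence + " -> " + words(sentence, Locale.ENGLISH));
        }
    }
}
```

A real tokenizer for compound words or for languages without spaces between words would need more than this; the sketch only shows how the output of one step feeds the next.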

The result of the development can be delivered as:

a) a library accessed through an API, so that the code can be integrated into a Knowledge Management (KM) application, a text mining application or another application.

b) a ready-to-use application with an HTML or Swing graphical interface.

The code does not depend on the operating system. At the moment, the programs run on Windows and Linux.

Complementary work can be carried out at your office or ours, for instance developing missing functionality, integration, consulting or training.