com.ta.TagExtractAPI
Interface FilteringAPI


public interface FilteringAPI

Top-level API for filtering.
Filtering is applied to the result of an extraction in order to obtain a controlled subset.

See Also:
ExtractAPI

Method Summary
 java.lang.String apply(ExtractAPI exAPI, int countingMinimum, int countingMaximum, java.util.ArrayList<FilteringPattern> patterns, double share, double proxi, java.lang.String cannedText, java.lang.String criterion, java.util.Set<java.lang.String> alreadyCreated, java.util.Set<java.lang.String> alreadyRejected, java.util.Set<java.lang.String> thesaurus, boolean pruneWhenSameWeakTypography, java.util.Set<java.lang.String> exclusion)
          Run the filtering after an extraction.
 java.util.TreeMap<java.lang.String,Candidate> getResult()
          Returns the candidates, i.e. the result of filtering.
Each result is a pair composed of:
- the term's lemmatised form, as a TreeMap key
- the Candidate instance, as a TreeMap value
 void writeResult(java.lang.String fileName)
          Writes the result to an XML file.
 

Method Detail

apply

java.lang.String apply(ExtractAPI exAPI,
                       int countingMinimum,
                       int countingMaximum,
                       java.util.ArrayList<FilteringPattern> patterns,
                       double share,
                       double proxi,
                       java.lang.String cannedText,
                       java.lang.String criterion,
                       java.util.Set<java.lang.String> alreadyCreated,
                       java.util.Set<java.lang.String> alreadyRejected,
                       java.util.Set<java.lang.String> thesaurus,
                       boolean pruneWhenSameWeakTypography,
                       java.util.Set<java.lang.String> exclusion)
Run the filtering after an extraction. It is possible to run multiple filtering processes from one extraction.

Parameters:
exAPI - the extraction result, see ExtractAPI
countingMinimum - lower limit for the counting criterion
countingMaximum - upper limit for the counting criterion
patterns - list of patterns for the structural criterion
share - variable for the distributional criterion
proxi - variable for the distributional criterion
cannedText - character string for the textual criterion
criterion - the criterion or combination of criteria to apply (C = counting, S = structural, D = distributional, T = textual), one of:
"only C", "only S", "only D", "only T",
"C and S", "C or S", "C and D", "C or D", "C and T", "C or T",
"S and D", "S or D", "S and T", "S or T", "D and T", "D or T",
"C and S and D and T", "C or S or D or T"
alreadyCreated - lemmatised forms to ignore when they occur as candidates
alreadyRejected - lemmatised forms to ignore when they occur as candidates
thesaurus - lemmatised forms to ignore when they occur as candidates
pruneWhenSameWeakTypography - if true, merge (factorize) terms that share the same weak-typography lemmatised form
exclusion - strings that cause a candidate to be ignored when its lemmatised form contains one of them
Returns:
"ok" or a message string in case of error.

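The call sequence described above could be sketched as follows. This is only an illustration: ExtractAPI, FilteringPattern and FilteringAPI are declared here as local stand-ins that mirror the documented signature, and the class ApplySketch, its helper methods, and every argument value (2, 50, 0.5, empty sets) are hypothetical. How a real FilteringAPI instance is obtained is not documented on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Local stand-ins (assumption): the real types come from the library.
interface ExtractAPI {}
interface FilteringPattern {}

// Mirrors only the documented apply signature.
interface FilteringAPI {
    String apply(ExtractAPI exAPI, int countingMinimum, int countingMaximum,
                 ArrayList<FilteringPattern> patterns, double share, double proxi,
                 String cannedText, String criterion,
                 Set<String> alreadyCreated, Set<String> alreadyRejected,
                 Set<String> thesaurus, boolean pruneWhenSameWeakTypography,
                 Set<String> exclusion);
}

class ApplySketch {
    // The 18 criterion strings listed in the parameter description.
    static final Set<String> CRITERIA = new HashSet<>(Arrays.asList(
        "only C", "only S", "only D", "only T",
        "C and S", "C or S", "C and D", "C or D", "C and T", "C or T",
        "S and D", "S or D", "S and T", "S or T", "D and T", "D or T",
        "C and S and D and T", "C or S or D or T"));

    // Hypothetical helper: true if the string is one of the documented values.
    static boolean isDocumentedCriterion(String criterion) {
        return CRITERIA.contains(criterion);
    }

    // Hypothetical helper: runs one filtering pass and reports whether
    // apply returned "ok". All numeric values below are placeholders.
    static boolean runFilter(FilteringAPI filter, ExtractAPI extraction, String criterion) {
        if (!isDocumentedCriterion(criterion)) {
            throw new IllegalArgumentException("Undocumented criterion: " + criterion);
        }
        String status = filter.apply(
            extraction,
            2, 50,                      // countingMinimum, countingMaximum
            new ArrayList<>(),          // patterns
            0.5, 0.5,                   // share, proxi
            "",                         // cannedText
            criterion,
            new HashSet<>(),            // alreadyCreated
            new HashSet<>(),            // alreadyRejected
            new HashSet<>(),            // thesaurus
            false,                      // pruneWhenSameWeakTypography
            new HashSet<>());           // exclusion
        return "ok".equals(status);
    }
}
```

Checking the criterion string before the call is a design choice of this sketch, not something the API is known to require; any value other than "ok" returned by apply is treated as an error message.
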
getResult

java.util.TreeMap<java.lang.String,Candidate> getResult()
Returns the candidates, i.e. the result of filtering.

Each result is a pair composed of:
- the term's lemmatised form, as a TreeMap key
- the Candidate instance, as a TreeMap value
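
Because the result is a TreeMap, iterating over it visits the lemmatised forms in sorted key order. The sketch below shows one way to walk such a map; the Candidate class here is an empty stand-in (its real members are not documented on this page), and ResultSketch and lemmas are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Empty stand-in (assumption): the real Candidate class comes from the library.
class Candidate {}

class ResultSketch {
    // Collects the lemmatised forms in the TreeMap's key order
    // (natural String ordering unless the map was built with a comparator).
    static List<String> lemmas(TreeMap<String, Candidate> result) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Candidate> entry : result.entrySet()) {
            String lemma = entry.getKey();          // the term's lemmatised form
            Candidate candidate = entry.getValue(); // the associated Candidate
            if (candidate != null) {
                out.add(lemma);
            }
        }
        return out;
    }
}
```

In real use the map would come from getResult() after a successful apply call rather than being built by hand.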


writeResult

void writeResult(java.lang.String fileName)
Writes the result to an XML file.