Class DictionaryLookup

java.lang.Object
morfologik.stemming.DictionaryLookup
All Implemented Interfaces:
Iterable<WordData>, IStemmer

public final class DictionaryLookup extends Object implements IStemmer, Iterable<WordData>
This class implements a dictionary lookup of an inflected word over a dictionary previously compiled using the dict_compile tool.
  • Constructor Details

    • DictionaryLookup

      public DictionaryLookup(Dictionary dictionary) throws IllegalArgumentException
      Creates a new lookup object that uses the given dictionary's FSA for word lookups and the dictionary's encoding for converting characters to bytes.
      Parameters:
      dictionary - The dictionary to use for lookups.
      Throws:
      IllegalArgumentException - if the FSA's root node cannot be acquired (the dictionary is empty).
  • Method Details

    • lookup

      public List<WordData> lookup(CharSequence word)
      Searches the automaton for a symbol sequence equal to word, followed by a separator. The result is a stem (decompressed according to the dictionary's specification) and optional tag data.
      Specified by:
      lookup in interface IStemmer
      Parameters:
      word - The word (typically inflected) to look up base forms for.
      Returns:
      A list of WordData entries (possibly empty).
    • applyReplacements

      public static String applyReplacements(CharSequence word, LinkedHashMap<String,String> replacements)
      Applies partial string replacements from a given map. Useful if the word needs to be normalized in some way (e.g., ligatures, apostrophes and such).
      Parameters:
      word - The word to apply replacements to.
      replacements - A map of replacements (from->to).
      Returns:
      A new string with all replacements applied.
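The idea of applying partial replacements from an ordered from->to map can be illustrated with a small standalone sketch. This is not the library's implementation: the class name ReplacementsSketch and the method applyAll are hypothetical, and details such as skipping empty keys and resuming the search after the inserted text are assumptions made for the example. It only shows why an insertion-ordered LinkedHashMap matters: replacements run in map order, so a later entry sees the output of an earlier one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of morfologik.
class ReplacementsSketch {

    // Replaces every occurrence of each key with its value, in map insertion order.
    static String applyAll(CharSequence word, LinkedHashMap<String, String> replacements) {
        StringBuilder sb = new StringBuilder(word);
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            String from = e.getKey();
            String to = e.getValue();
            if (from.isEmpty()) {
                continue; // assumption: empty keys are ignored to avoid looping forever
            }
            int index = sb.indexOf(from);
            while (index != -1) {
                sb.replace(index, index + from.length(), to);
                // Resume after the inserted text so it is not rescanned for the same key.
                index = sb.indexOf(from, index + to.length());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> replacements = new LinkedHashMap<>();
        replacements.put("\u0153", "oe"); // oe ligature -> "oe"
        replacements.put("\u2019", "'");  // typographic apostrophe -> ASCII apostrophe
        System.out.println(applyAll("c\u0153ur d\u2019or", replacements));
    }
}
```

With the real class, the equivalent call would be DictionaryLookup.applyReplacements(word, replacements), typically applied to the input before it is passed to lookup.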
    • iterator

      public Iterator<WordData> iterator()
      Returns an iterator over all WordData entries available in the embedded Dictionary.
      Specified by:
      iterator in interface Iterable<WordData>
    • getDictionary

      public Dictionary getDictionary()
      Returns:
      The Dictionary used by this object.
    • getSeparatorChar

      public char getSeparatorChar()
      Returns:
      The logical separator character that splits the inflected form, the lemma correction token and the tag. Note that this character is a best-effort conversion from a byte in DictionaryMetadata.separator and may not be valid in the target encoding (although this is highly unlikely).
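The "best-effort conversion" caveat can be made concrete with a standalone sketch that decodes a single separator byte into a char under a given charset. The class SeparatorSketch and the method separatorAsChar are hypothetical names, and the decoding strategy shown here is an assumption for illustration; it is not taken from the library's source. The point is that an ASCII separator such as '+' survives in common encodings, while a byte that is not a complete character in the target encoding decodes to the Unicode replacement character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of morfologik.
class SeparatorSketch {

    // Decodes one byte with the given charset and returns the first resulting char.
    // Malformed input is replaced with U+FFFD by the String constructor.
    static char separatorAsChar(byte separator, Charset charset) {
        String decoded = new String(new byte[] { separator }, charset);
        return decoded.isEmpty() ? '\uFFFD' : decoded.charAt(0);
    }

    public static void main(String[] args) {
        // An ASCII separator decodes to itself in UTF-8.
        System.out.println(separatorAsChar((byte) '+', StandardCharsets.UTF_8));
        // 0xE9 alone is a truncated multi-byte sequence in UTF-8, so the result is U+FFFD.
        System.out.println((int) separatorAsChar((byte) 0xE9, StandardCharsets.UTF_8));
    }
}
```

In practice separators are usually plain ASCII characters, which is why the mismatch described above is highly unlikely.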