Class DictionaryLookup
java.lang.Object
morfologik.stemming.DictionaryLookup
-
Constructor Summary
Constructors
Constructor, Description
DictionaryLookup(Dictionary dictionary)
    Creates a new object of this class using the given FSA for word lookups and encoding for converting characters to bytes.
-
Method Summary
Modifier and Type, Method, Description
static String applyReplacements(CharSequence word, LinkedHashMap<String, String> replacements)
    Apply partial string replacements from a given map.
iterator()
    Return an iterator over all WordData entries available in the embedded Dictionary.
lookup(CharSequence word)
    Searches the automaton for a symbol sequence equal to word, followed by a separator.
Methods inherited from class Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Iterable
forEach, spliterator
-
Constructor Details
-
DictionaryLookup
Creates a new object of this class using the given FSA for word lookups and encoding for converting characters to bytes.
- Parameters:
dictionary - The dictionary to use for lookups.
- Throws:
IllegalArgumentException - if the FSA's root node cannot be acquired (the dictionary is empty).
-
-
Method Details
-
lookup
Searches the automaton for a symbol sequence equal to word, followed by a separator. The result is a stem (decompressed according to the dictionary's specification) and optional tag data.
-
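To illustrate the lookup description above, here is a minimal, self-contained sketch of the "word followed by a separator" matching idea. It is not the library's implementation: a sorted set stands in for the FSA, the separator `+` and the sample entries are made up, and stems are stored verbatim rather than in the compressed form a real dictionary uses. The class name `SeparatorLookupSketch` and its `lookup` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SeparatorLookupSketch {
    // Assumed separator; a real dictionary defines its own in its metadata.
    static final char SEPARATOR = '+';

    // Toy "dictionary": inflected form, separator, stem, separator, tag.
    static final TreeSet<String> ENTRIES = new TreeSet<>(Arrays.asList(
        "cats+cat+NNS",
        "ran+run+VBD",
        "running+run+VBG",
        "running+running+NN"));

    /** Returns "stem/tag" (or just "stem") for every entry that starts with word + separator. */
    public static List<String> lookup(CharSequence word) {
        String prefix = word.toString() + SEPARATOR;
        List<String> results = new ArrayList<>();
        // tailSet starts at the first entry >= prefix; matching entries are contiguous.
        for (String entry : ENTRIES.tailSet(prefix)) {
            if (!entry.startsWith(prefix)) {
                break;
            }
            String rest = entry.substring(prefix.length());
            int sep = rest.indexOf(SEPARATOR);
            // The tag is optional: without a second separator only the stem is present.
            results.add(sep < 0 ? rest : rest.substring(0, sep) + "/" + rest.substring(sep + 1));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(lookup("running")); // two analyses of the same form
        System.out.println(lookup("run"));     // no entry: "run" is only a prefix of "running"
    }
}
```

Appending the separator before matching is what keeps "run" from matching the entries for "running": only complete inflected forms are found.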
applyReplacements
public static String applyReplacements(CharSequence word, LinkedHashMap<String, String> replacements)
Apply partial string replacements from a given map. Useful if the word needs to be normalized somehow (e.g., ligatures, apostrophes and such).
- Parameters:
word - The word to apply replacements to.
replacements - A map of replacements (from->to).
- Returns:
a new string with all replacements applied.
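As a rough illustration of what map-driven normalization can look like, the sketch below applies each from->to pair in insertion order using only the standard library. It is a stand-in, not the library method: the class `ReplacementsSketch` and the method `applyAll` are hypothetical names, and details such as how overlapping or cascading replacements are handled are assumptions that may differ from `applyReplacements` itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplacementsSketch {
    /** Applies each from->to pair, in map order, to every occurrence in the word. */
    public static String applyAll(CharSequence word, LinkedHashMap<String, String> replacements) {
        String result = word.toString();
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            // String.replace performs literal (non-regex) substitution of all occurrences.
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> replacements = new LinkedHashMap<>();
        replacements.put("\uFB01", "fi"); // the "fi" ligature
        replacements.put("\u2019", "'");  // typographic apostrophe
        System.out.println(applyAll("\uFB01sh\u2019s", replacements)); // prints "fish's"
    }
}
```

A `LinkedHashMap` keeps insertion order, so when one replacement can produce text that a later one matches, the order in which pairs are added determines the result.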
-
iterator
-
getDictionary
- Returns:
the Dictionary used by this object.
-
getSeparatorChar
public char getSeparatorChar()
- Returns:
the logical separator character splitting the inflected form, lemma correction token and a tag. Note that this character is a best-effort conversion from a byte in DictionaryMetadata.separator and may not be valid in the target encoding (although this is highly unlikely).
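The note about byte-to-character conversion can be made concrete with a small sketch of one way a single separator byte might be decoded in a given charset. This uses only `java.nio.charset` and is an assumption about the general technique, not a description of how the library performs the conversion; `SeparatorCharSketch` and `decodeSeparator` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SeparatorCharSketch {
    /** Decodes one byte in the given charset, failing if it does not map to exactly one char. */
    public static char decodeSeparator(byte separator, Charset charset) {
        try {
            CharBuffer decoded = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(new byte[] { separator }));
            if (decoded.remaining() != 1) {
                throw new IllegalArgumentException("Separator byte does not map to a single char.");
            }
            return decoded.get();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("Separator byte is not valid in " + charset, e);
        }
    }

    public static void main(String[] args) {
        // An ASCII separator decodes to the same character in both charsets.
        System.out.println(decodeSeparator((byte) '+', StandardCharsets.UTF_8));
        System.out.println(decodeSeparator((byte) '+', StandardCharsets.ISO_8859_1));
    }
}
```

Separators are typically plain ASCII bytes such as `+`, which decode identically in common encodings; this is why an invalid conversion is described as highly unlikely.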
-