Class LangProfile
java.lang.Object
com.optimaize.langdetect.cybozu.util.LangProfile
- All Implemented Interfaces:
Serializable
Deprecated. Replaced by LanguageProfile.
LangProfile is a language profile class.
It is not meant to be used directly by library users.
TODO: split into a builder and an immutable class.
TODO: currently this only creates n-grams that include the space before a word, not n-grams that include the space after the word.
Example: "foo" produces " fo" as a 3-gram, but not "oo ". Either this is a bug, or, if it is intended, it needs documentation.
- Author:
- Nakatani Shuyo
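The n-gram TODO in the class description can be illustrated with a small sketch. This is not the library's code: the class LeadingSpaceNGrams, the method ngrams, and the padAfter flag are hypothetical names introduced only for this example, and the padding scheme is an assumption based on the "foo" example above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the padding behavior; not part of the library. */
public class LeadingSpaceNGrams {

    /**
     * Returns all n-grams of a word after padding it with one space in front
     * and, only if padAfter is true, one space behind.
     */
    public static List<String> ngrams(String word, int n, boolean padAfter) {
        String padded = " " + word + (padAfter ? " " : "");
        List<String> result = new ArrayList<>();
        // Slide a window of length n over the padded word.
        for (int i = 0; i + n <= padded.length(); i++) {
            result.add(padded.substring(i, i + n));
        }
        return result;
    }

    public static void main(String[] args) {
        // Leading space only: the behavior described in the TODO.
        System.out.println(ngrams("foo", 3, false));
        // Leading and trailing space: what the TODO suggests might be expected.
        System.out.println(ngrams("foo", 3, true));
    }
}
```

With leading padding only, "foo" yields " fo" and "foo". The 3-gram "oo " appears only when the trailing space is added as well.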
Constructor Summary
- LangProfile(): Deprecated. Constructor for JSONIC.
- LangProfile(String name): Deprecated. Normal constructor.
Method Summary
- void add: Deprecated. Add n-gram to profile.
- getFreq(): Deprecated.
- getName(): Deprecated.
- int[] getNWords(): Deprecated.
- void omitLessFreq(): Deprecated. Removes n-grams that occur fewer than MINIMUM_FREQ times, to get rid of rare n-grams.
- void setFreq: Deprecated.
- void setName: Deprecated.
- void setNWords(int[] nWords): Deprecated.
Constructor Details
LangProfile
public LangProfile()
Deprecated. Constructor for JSONIC.
LangProfile
Method Details
add
omitLessFreq
public void omitLessFreq()
Deprecated.
Removes n-grams that occur fewer than MINIMUM_FREQ times, to get rid of rare n-grams. Also removes ASCII n-grams if the total number of ASCII n-grams is less than one third of the total. This is done because non-Latin text (such as Chinese) often has some Latin noise in between.
TODO: split the two cleaning steps into separate methods.
TODO: distinguish ASCII from Latin. Currently it looks for Latin only; it should include characters with diacritics, e.g. Vietnamese.
TODO: the current code counts ASCII but removes any Latin. Is that desired? If so, this needs documentation.
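The two cleaning steps described for omitLessFreq can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's implementation: ProfileCleanupSketch, clean, containsAsciiLetter, and the minimumFreq parameter are hypothetical names; "total" is assumed to mean the sum of n-gram counts remaining after the rare n-grams are dropped; and an n-gram is assumed to count as ASCII when it contains a letter a-z or A-Z. The same test is used for counting and for removal, which sidesteps the ASCII versus Latin mismatch noted in the TODOs rather than reproducing it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of the two cleaning steps; not the library's code. */
public class ProfileCleanupSketch {

    /** Assumption: an n-gram is "ASCII" if it contains a letter a-z or A-Z. */
    public static boolean containsAsciiLetter(String gram) {
        for (int i = 0; i < gram.length(); i++) {
            char c = gram.charAt(i);
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                return true;
            }
        }
        return false;
    }

    /** Returns a cleaned copy of freq; the input map is left unchanged. */
    public static Map<String, Integer> clean(Map<String, Integer> freq, int minimumFreq) {
        Map<String, Integer> result = new HashMap<>(freq);

        // Step 1: drop n-grams that occur fewer than minimumFreq times.
        result.values().removeIf(count -> count < minimumFreq);

        // Step 2: compare the ASCII share of the remaining counts to one third.
        long total = 0;
        long ascii = 0;
        for (Map.Entry<String, Integer> e : result.entrySet()) {
            total += e.getValue();
            if (containsAsciiLetter(e.getKey())) {
                ascii += e.getValue();
            }
        }
        // "Less than one third of the total": treat the ASCII n-grams as noise.
        if (ascii * 3 < total) {
            result.keySet().removeIf(ProfileCleanupSketch::containsAsciiLetter);
        }
        return result;
    }
}
```

Splitting the two steps into separate methods, as the first TODO suggests, would make each rule testable on its own.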
getName
Deprecated.
setName
Deprecated.
getFreq
setFreq
getNWords
public int[] getNWords()
Deprecated.
setNWords
public void setNWords(int[] nWords)
Deprecated.