java.lang.Object
  org.apache.lucene.analysis.Analyzer
    edu.mayo.informatics.indexer.lucene.analyzers.WhiteSpaceLowerCaseAnalyzer
public class WhiteSpaceLowerCaseAnalyzer extends org.apache.lucene.analysis.Analyzer
This analyzer uses the WhiteSpaceTokenizer, LowerCaseFilter, and StopFilter.
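As a rough illustration of the three stages named above, the following plain-Java sketch splits text on whitespace, lower-cases each token, and drops stop words. It does not use Lucene and is not this class's implementation: the class name `PipelineSketch`, the method `analyze`, and the assumption that the stages run in the order listed are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for illustration only; not part of the documented API.
final class PipelineSketch {

    // Splits on whitespace, lower-cases each token, then drops stop words.
    // A null stop-word set is treated as "no stop words" (an assumption that
    // mirrors the "null or empty" wording in the constructor documentation).
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // blank input yields a single empty string from split
            }
            String lower = raw.toLowerCase(Locale.ROOT);
            if (stopWords == null || !stopWords.contains(lower)) {
                tokens.add(lower);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of", "and"));
        // prints [heart, matter]
        System.out.println(analyze("The Heart of the Matter", stopWords));
    }
}
```

Lower-casing before the stop-word check means a stop list written in lower case also matches capitalized input such as "The".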
Constructor Summary | |
---|---|
WhiteSpaceLowerCaseAnalyzer()
Construct the WhiteSpaceLowerCase analyzer, using the stop words from the Standard Analyzer. |
|
WhiteSpaceLowerCaseAnalyzer(java.util.Set stopWords,
java.util.Set charsToRemove,
java.util.Set charsToTreatAsWhitespace)
Construct the WhiteSpaceLowerCase analyzer, using the provided stop words. |
|
WhiteSpaceLowerCaseAnalyzer(java.lang.String[] stopWords,
char[] charsToRemove,
char[] charsToTreatAsWhitespace)
Construct the WhiteSpaceLowerCase analyzer, using the provided stop words. |
Method Summary | |
---|---|
java.util.Set |
getCurrentCharRemovalTable()
|
java.util.Set |
getCurrentStopWordTable()
|
java.util.Set |
getCurrentWhiteSpaceEquivalentTable()
|
static char[] |
getDefaultCharRemovalSet()
Default characters to remove from indexed content. , . / \ ` ' " + * = @ # $ % ^ & ? |
static char[] |
getDefaultWhiteSpaceSet()
Default characters to treat as whitespace (in addition to standard whitespace characters): - : ; ( ) { } [ ] < > | Note that this set does not include the underscore ('_'). |
org.apache.lucene.analysis.TokenStream |
tokenStream(java.lang.String fieldname,
java.io.Reader reader)
|
Methods inherited from class org.apache.lucene.analysis.Analyzer |
---|
getPositionIncrementGap, getPreviousTokenStream, reusableTokenStream, setPreviousTokenStream |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public WhiteSpaceLowerCaseAnalyzer()
See Also:
getDefaultCharRemovalSet(), getDefaultWhiteSpaceSet()
public WhiteSpaceLowerCaseAnalyzer(java.lang.String[] stopWords, char[] charsToRemove, char[] charsToTreatAsWhitespace)
Parameters:
stopWords - Stop words to use. Null or empty disables stop-word removal.
charsToRemove - Characters to strip from input. Null or empty means no characters are removed. See getDefaultCharRemovalSet() for a recommended set of characters to remove from input.
charsToTreatAsWhitespace - Characters to treat as whitespace (that is, as split points during tokenization). Null or empty means the input is split on whitespace only.
public WhiteSpaceLowerCaseAnalyzer(java.util.Set stopWords, java.util.Set charsToRemove, java.util.Set charsToTreatAsWhitespace)
Parameters:
stopWords - Stop words to use. Null or empty disables stop-word removal.
charsToRemove - Characters to strip from input. Null or empty means no characters are removed. See getDefaultCharRemovalSet() for a recommended set of characters to remove from input.
charsToTreatAsWhitespace - Characters to treat as whitespace (that is, as split points during tokenization). Null or empty means the input is split on whitespace only.
Method Detail |
---|
public static char[] getDefaultCharRemovalSet()
public static char[] getDefaultWhiteSpaceSet()
public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldname, java.io.Reader reader)
Specified by:
tokenStream in class org.apache.lucene.analysis.Analyzer
public java.util.Set getCurrentCharRemovalTable()
public java.util.Set getCurrentWhiteSpaceEquivalentTable()
public java.util.Set getCurrentStopWordTable()
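The constructor parameters charsToRemove and charsToTreatAsWhitespace are easier to picture with a small model. The sketch below is plain Java, not this class's implementation: `CharHandlingSketch` and `tokenize` are hypothetical names, and it assumes that a removed character is dropped without ending the current token, while a whitespace-equivalent character ends it. Null or empty arrays fall back to plain whitespace splitting, as the parameter descriptions state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; not the analyzer's implementation.
final class CharHandlingSketch {

    // Returns true if c occurs in set; a null set behaves like an empty one.
    static boolean contains(char[] set, char c) {
        if (set == null) {
            return false;
        }
        for (char s : set) {
            if (s == c) {
                return true;
            }
        }
        return false;
    }

    static List<String> tokenize(String text, char[] charsToRemove,
                                 char[] charsToTreatAsWhitespace) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c) || contains(charsToTreatAsWhitespace, c)) {
                // Split point: close the token collected so far, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else if (!contains(charsToRemove, c)) {
                current.append(c);
            }
            // Characters in charsToRemove are skipped without ending the token.
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        char[] remove = {',', '.', '\''};          // subset of the documented removal defaults
        char[] splitChars = {'-', ':', '(', ')'};  // subset of the documented whitespace defaults
        // prints [ONeil, J, left, sided, x_ray]
        System.out.println(tokenize("O'Neil, J. (left-sided): x_ray", remove, splitChars));
    }
}
```

Because the underscore is in neither default set, a term such as x_ray stays a single token in this model.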
Copyright: (c) 2004-2006 Mayo Foundation for Medical Education and Research (MFMER). All rights reserved. MAYO, MAYO CLINIC, and the triple-shield Mayo logo are trademarks and service marks of MFMER.