edu.mayo.informatics.indexer.lucene.analyzers
Class EncoderAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by edu.mayo.informatics.indexer.lucene.analyzers.EncoderAnalyzer

public class EncoderAnalyzer
extends org.apache.lucene.analysis.Analyzer

This analyzer generates a code for each token to index. It uses the Apache Commons Codec package.

Author:
Dan Armbrust

Constructor Summary
EncoderAnalyzer()
          Create a new EncoderAnalyzer.
EncoderAnalyzer(org.apache.commons.codec.Encoder encoder)
          Create a new EncoderAnalyzer that uses the given encoder.
EncoderAnalyzer(org.apache.commons.codec.Encoder encoder, java.lang.String[] stopWords, char[] charsToRemove, char[] charsToTreatAsWhiteSpace)
          Create a new EncoderAnalyzer - everything configured by the user.
EncoderAnalyzer(java.lang.String[] stopWords, char[] charsToRemove, char[] charsToTreatAsWhiteSpace)
          Create a new EncoderAnalyzer - uses a default configured DoubleMetaphone encoder.
 
Method Summary
 WhiteSpaceLowerCaseAnalyzer getWhiteSpaceLowerCaseAnalyzer()
          This method is not intended to be part of the public API; it is public only because the design requires it.
 void setWhiteSpaceLowerCaseAnalyzer(WhiteSpaceLowerCaseAnalyzer whiteSpaceLowerCaseAnalyzer)
          This method is not intended to be part of the public API; it is public only because the design requires it.
 org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldname, java.io.Reader reader)
          Create a token stream for this analyzer.
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
getPositionIncrementGap, getPreviousTokenStream, reusableTokenStream, setPreviousTokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

EncoderAnalyzer

public EncoderAnalyzer()
Create a new EncoderAnalyzer. Uses all defaults in the WhiteSpaceLowerCaseAnalyzer and a DoubleMetaphone encoder set to the default values.

See Also:
WhiteSpaceLowerCaseAnalyzer

EncoderAnalyzer

public EncoderAnalyzer(org.apache.commons.codec.Encoder encoder,
                       java.lang.String[] stopWords,
                       char[] charsToRemove,
                       char[] charsToTreatAsWhiteSpace)
Create a new EncoderAnalyzer - everything configured by the user.

Parameters:
encoder - The encoder to use. DoubleMetaphone, Metaphone, Soundex, etc.
stopWords - Stop words to use. Not used if null or empty.
charsToRemove - Characters to remove from input (before encoding). Not used if null or empty.
charsToTreatAsWhiteSpace - Characters to treat as whitespace (split points). Defaults to typical whitespace if null or empty.
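
The interplay of the four parameters may be easier to see in code. The following is a minimal, standalone sketch of the processing the parameter descriptions imply; it is not the library's implementation. The class name EncoderPipelineSketch, the analyze and toyEncode helpers, and the use of java.util.function.Function in place of org.apache.commons.codec.Encoder are all hypothetical. The order of the steps and the lowercasing (suggested only by the WhiteSpaceLowerCaseAnalyzer name) are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

public class EncoderPipelineSketch {

    /**
     * Hypothetical stand-in for the analyzer's processing steps (assumed order):
     * split, strip unwanted characters, lowercase, drop stop words, encode.
     */
    public static List<String> analyze(String text,
                                       Function<String, String> encoder,
                                       String[] stopWords,
                                       char[] charsToRemove,
                                       char[] charsToTreatAsWhiteSpace) {
        // Stop words are ignored when null or empty.
        Set<String> stops = new HashSet<>();
        if (stopWords != null) {
            for (String w : stopWords) {
                stops.add(w.toLowerCase(Locale.ROOT));
            }
        }
        // Characters to remove are ignored when null or empty.
        String remove = charsToRemove == null ? "" : new String(charsToRemove);
        // Fall back to ordinary whitespace when no split characters are given.
        boolean customSplit = charsToTreatAsWhiteSpace != null
                && charsToTreatAsWhiteSpace.length > 0;
        String split = customSplit ? new String(charsToTreatAsWhiteSpace) : "";

        List<String> codes = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            boolean atEnd = i == text.length();
            char c = atEnd ? ' ' : text.charAt(i);
            boolean isSplit = atEnd
                    || (customSplit ? split.indexOf(c) >= 0 : Character.isWhitespace(c));
            if (isSplit) {
                // A split point closes the current token, if there is one.
                if (token.length() > 0) {
                    String t = token.toString();
                    if (!stops.contains(t)) {
                        codes.add(encoder.apply(t));
                    }
                    token.setLength(0);
                }
            } else if (remove.indexOf(c) < 0) {
                // Removed characters never reach the encoder.
                token.append(Character.toLowerCase(c));
            }
        }
        return codes;
    }

    /** Toy encoder: keeps the first letter and drops later vowels. Not a real phonetic algorithm. */
    public static String toyEncode(String token) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (i == 0 || "aeiou".indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> codes = analyze("The heart-attack (acute)",
                EncoderPipelineSketch::toyEncode,
                new String[] {"the"},
                new char[] {'(', ')'},
                new char[] {' ', '-'});
        System.out.println(codes); // prints [hrt, attck, act]
    }
}
```

With the real class, the toy encoder would be replaced by a Commons Codec encoder such as DoubleMetaphone, and the emitted codes would differ accordingly.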

EncoderAnalyzer

public EncoderAnalyzer(java.lang.String[] stopWords,
                       char[] charsToRemove,
                       char[] charsToTreatAsWhiteSpace)
Create a new EncoderAnalyzer - uses a default configured DoubleMetaphone encoder.

Parameters:
stopWords - Stop words to use. Not used if null or empty.
charsToRemove - Characters to remove from input (before encoding). Not used if null or empty.
charsToTreatAsWhiteSpace - Characters to treat as whitespace (split points). Defaults to typical whitespace if null or empty.

EncoderAnalyzer

public EncoderAnalyzer(org.apache.commons.codec.Encoder encoder)
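Create a new EncoderAnalyzer that uses the given encoder. Uses all defaults in the WhiteSpaceLowerCaseAnalyzer.

Parameters:
encoder - The encoder to use. DoubleMetaphone, Metaphone, Soundex, etc.
See Also:
WhiteSpaceLowerCaseAnalyzer

Method Detail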

tokenStream

public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldname,
                                                                java.io.Reader reader)
Create a token stream for this analyzer.

Specified by:
tokenStream in class org.apache.lucene.analysis.Analyzer

getWhiteSpaceLowerCaseAnalyzer

public WhiteSpaceLowerCaseAnalyzer getWhiteSpaceLowerCaseAnalyzer()
This method is not intended to be part of the public API; it is public only because the design requires it. Do not use this method.


setWhiteSpaceLowerCaseAnalyzer

public void setWhiteSpaceLowerCaseAnalyzer(WhiteSpaceLowerCaseAnalyzer whiteSpaceLowerCaseAnalyzer)
This method is not intended to be part of the public API; it is public only because the design requires it. Do not use this method.


Copyright: (c) 2004-2006 Mayo Foundation for Medical Education and Research (MFMER). All rights reserved. MAYO, MAYO CLINIC, and the triple-shield Mayo logo are trademarks and service marks of MFMER.