edu.mayo.informatics.indexer.lucene.analyzers
Class NormAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by edu.mayo.informatics.indexer.lucene.analyzers.NormAnalyzer

public class NormAnalyzer
extends org.apache.lucene.analysis.Analyzer

This is an analyzer that uses LVG to normalize each term before it is inserted into the index.

Author:
Dan Armbrust

Field Summary
static int LVG_CACHE_SIZE
           
static java.lang.String LVG_CONFIG_FILE_ABSOLUTE
           
 
Constructor Summary
NormAnalyzer()
          Create a new NormAnalyzer.
NormAnalyzer(boolean keepOrigional)
           
NormAnalyzer(boolean keepOrigional, java.lang.String[] stopWords, char[] charsToRemove, char[] charsToTreatAsWhiteSpace)
          Create a norm analyzer.
NormAnalyzer(java.lang.String lvgConfigFileLocation, boolean keepOrigional)
          The LVG config file location is required.
NormAnalyzer(java.lang.String lvgConfigFileLocation, boolean keepOrigional, java.lang.String[] stopWords, char[] charsToRemove, char[] charsToTreatAsWhiteSpace)
          The LVG config file location is required.
 
Method Summary
 WhiteSpaceLowerCaseAnalyzer getWhiteSpaceLowerCaseAnalyzer()
          This method should not be part of the public API - but design requirements require it to be public.
 void setWhiteSpaceLowerCaseAnalyzer(WhiteSpaceLowerCaseAnalyzer whiteSpaceLowerCaseAnalyzer)
          This method should not be part of the public API - but design requirements require it to be public.
 org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldname, java.io.Reader reader)
           
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
getPositionIncrementGap, getPreviousTokenStream, reusableTokenStream, setPreviousTokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LVG_CONFIG_FILE_ABSOLUTE

public static java.lang.String LVG_CONFIG_FILE_ABSOLUTE

LVG_CACHE_SIZE

public static int LVG_CACHE_SIZE

Constructor Detail

NormAnalyzer

public NormAnalyzer()
Create a new NormAnalyzer. Uses all defaults of the WhiteSpaceLowerCaseAnalyzer and takes the config file location from the LVG_CONFIG_FILE_ABSOLUTE variable.


NormAnalyzer

public NormAnalyzer(java.lang.String lvgConfigFileLocation,
                    boolean keepOrigional,
                    java.lang.String[] stopWords,
                    char[] charsToRemove,
                    char[] charsToTreatAsWhiteSpace)
The LVG config file location is required.

Parameters:
lvgConfigFileLocation - location of the LVG config file (required).
keepOrigional -
stopWords - stop words to use; not used if null or empty.
charsToRemove - characters to remove from the input (before norm); not used if null or empty.
charsToTreatAsWhiteSpace - characters to treat as whitespace (split points); defaults to typical whitespace if null or empty.
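
The null-or-empty rules for these three parameters can be illustrated with a small, self-contained sketch. It is not the NormAnalyzer or WhiteSpaceLowerCaseAnalyzer implementation and it does not call LVG. The class ParameterSemanticsSketch and its tokenize method are hypothetical names. The lowercasing step and the order of operations (remove characters, split, lowercase, filter stop words) are assumptions inferred from the parameter descriptions and the WhiteSpaceLowerCaseAnalyzer class name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the documented parameter rules.
// This is NOT the library's code and performs no LVG normalization.
public class ParameterSemanticsSketch {

    public static List<String> tokenize(String input, String[] stopWords,
                                        char[] charsToRemove,
                                        char[] charsToTreatAsWhiteSpace) {
        // stopWords: not used if null or empty.
        Set<String> stops = new HashSet<String>();
        if (stopWords != null) {
            for (String word : stopWords) {
                stops.add(word.toLowerCase(Locale.ROOT));
            }
        }
        // charsToRemove: not used if null or empty.
        String remove = charsToRemove == null ? "" : new String(charsToRemove);
        // charsToTreatAsWhiteSpace: fall back to typical whitespace if null or empty.
        boolean customSplit = charsToTreatAsWhiteSpace != null
                && charsToTreatAsWhiteSpace.length > 0;
        String split = customSplit ? new String(charsToTreatAsWhiteSpace) : "";

        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (remove.indexOf(c) >= 0) {
                continue; // dropped before any splitting happens
            }
            boolean isSplit = customSplit
                    ? split.indexOf(c) >= 0
                    : Character.isWhitespace(c);
            if (isSplit) {
                flush(current, stops, tokens);
            } else {
                current.append(Character.toLowerCase(c)); // assumed lowercasing
            }
        }
        flush(current, stops, tokens);
        return tokens;
    }

    // Emits the pending token unless it is empty or a stop word.
    private static void flush(StringBuilder current, Set<String> stops,
                              List<String> tokens) {
        if (current.length() > 0) {
            String token = current.toString();
            if (!stops.contains(token)) {
                tokens.add(token);
            }
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        // prints [heart, attack, acute]
        System.out.println(tokenize("The Heart-Attack, (acute)",
                new String[] {"the"},
                new char[] {'(', ')', ','},
                new char[] {' ', '-'}));
        // All three arguments null: whitespace split only, prints [the, heart-attack]
        System.out.println(tokenize("The Heart-Attack", null, null, null));
    }
}
```

In the real analyzer each surviving term would then be normalized by LVG before indexing; that step is deliberately left out of this sketch.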

NormAnalyzer

public NormAnalyzer(boolean keepOrigional,
                    java.lang.String[] stopWords,
                    char[] charsToRemove,
                    char[] charsToTreatAsWhiteSpace)
Create a norm analyzer. Uses the preset LVG_CONFIG_FILE_ABSOLUTE value.

Parameters:
keepOrigional -
stopWords - stop words to use; not used if null or empty.
charsToRemove - characters to remove from the input (before norm); not used if null or empty.
charsToTreatAsWhiteSpace - characters to treat as whitespace (split points); defaults to typical whitespace if null or empty.

NormAnalyzer

public NormAnalyzer(java.lang.String lvgConfigFileLocation,
                    boolean keepOrigional)
The LVG config file location is required. Uses all defaults of the WhiteSpaceLowerCaseAnalyzer.

Parameters:
lvgConfigFileLocation - location of the LVG config file (required).
keepOrigional -

NormAnalyzer

public NormAnalyzer(boolean keepOrigional)

Method Detail

tokenStream

public final org.apache.lucene.analysis.TokenStream tokenStream(java.lang.String fieldname,
                                                                java.io.Reader reader)
Specified by:
tokenStream in class org.apache.lucene.analysis.Analyzer

getWhiteSpaceLowerCaseAnalyzer

public WhiteSpaceLowerCaseAnalyzer getWhiteSpaceLowerCaseAnalyzer()
This method should not be part of the public API - but design requirements require it to be public. Do not use this method.


setWhiteSpaceLowerCaseAnalyzer

public void setWhiteSpaceLowerCaseAnalyzer(WhiteSpaceLowerCaseAnalyzer whiteSpaceLowerCaseAnalyzer)
This method should not be part of the public API - but design requirements require it to be public. Do not use this method.


Copyright: (c) 2004-2006 Mayo Foundation for Medical Education and Research (MFMER). All rights reserved. MAYO, MAYO CLINIC, and the triple-shield Mayo logo are trademarks and service marks of MFMER.