org.LexGrid.LexBIG.example
Class FindCodesForDescription

java.lang.Object
  extended by org.LexGrid.LexBIG.example.FindCodesForDescription

public class FindCodesForDescription
extends java.lang.Object

Example showing how to find codes matching descriptive text. The program accepts up to two parameters.

The first parameter (required) is the text to match against descriptions. Matches are determined through a customized match algorithm, which uses a simple heuristic to rank returned values by relevance.

The second parameter (optional) indicates the types of entity to search. Possible values include the LexGrid built-in types "concept" and "instance"; a coding scheme can also define additional types of its own. If provided, this should be a comma-delimited list of types. If not provided, all entity types are searched.

Example: FindCodesForDescription "blood"
Example: FindCodesForDescription "breast cancer" "concept"
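
As a rough illustration of the parameter handling described above, the following sketch splits the optional second argument into individual entity types. The class name ArgsSketch and the method parseNodeTypes are hypothetical and not part of the LexBIG example; returning an empty array to mean "search all entity types" is also an assumption of this sketch.

```java
// Hypothetical helper, not part of FindCodesForDescription.
class ArgsSketch {
    /**
     * Splits the optional second command-line argument into entity types.
     * Assumption: an empty result stands for "search all entity types".
     */
    static String[] parseNodeTypes(String[] args) {
        if (args.length < 2 || args[1].trim().isEmpty()) {
            return new String[0];
        }
        // Split on commas, tolerating whitespace around each type name.
        return args[1].trim().split("\\s*,\\s*");
    }
}
```

Under these assumptions, the arguments "breast cancer" and "concept, instance" would yield the two types "concept" and "instance", while a lone "blood" would yield an empty array.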


Constructor Summary
FindCodesForDescription()
           
 
Method Summary
static void main(java.lang.String[] args)
          Entry point for processing.
 void run(java.lang.String phrase, java.lang.String[] nodeTypes)
           
protected  float score(java.lang.String text, java.util.List<java.lang.String> keywords, boolean isPreferred, float searchRank)
          Returns a score providing a relative comparison of the given text against a set of keywords.
protected  ResolvedConceptReferencesIterator search(LexBIGService lbs, java.lang.String codingSchemeName, CodingSchemeVersionOrTag csvt, java.lang.String phrase, LocalNameList nodeTypeList)
           
protected  ResolvedConceptReferencesIterator sortByScore(java.lang.String searchTerm, ResolvedConceptReferencesIterator toSort, int maxToReturn)
          Sorts the given concept references based on a scoring algorithm designed to provide a more natural ordering.
protected  java.util.List<java.lang.String> toScoreWords(java.lang.String s)
          Return words from the given string to be used in scoring algorithms, in order of occurrence and ignoring duplicates, stop words, whitespace and common separators.
protected  java.util.List<java.lang.String> toWords(java.lang.String s, java.lang.String delimitRegex, boolean removeStopWords, boolean removeDuplicates)
          Return words from the given string in order of occurrence, normalized to lower case, separated by the given delimiters (regular expression), and optionally removing stop words and duplicates.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FindCodesForDescription

public FindCodesForDescription()
Method Detail

main

public static void main(java.lang.String[] args)
Entry point for processing.

Parameters:
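args - Command-line arguments: the search text (required), optionally followed by a comma-delimited list of entity types to search.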

run

public void run(java.lang.String phrase,
                java.lang.String[] nodeTypes)
         throws LBException
Throws:
LBException

search

protected ResolvedConceptReferencesIterator search(LexBIGService lbs,
                                                   java.lang.String codingSchemeName,
                                                   CodingSchemeVersionOrTag csvt,
                                                   java.lang.String phrase,
                                                   LocalNameList nodeTypeList)

sortByScore

protected ResolvedConceptReferencesIterator sortByScore(java.lang.String searchTerm,
                                                        ResolvedConceptReferencesIterator toSort,
                                                        int maxToReturn)
                                                 throws LBException
Sorts the given concept references based on a scoring algorithm designed to provide a more natural ordering. Scores are determined by comparing each reference against a provided search term.

Parameters:
searchTerm - The term used for comparison; single or multi-word.
toSort - The iterator containing references to sort.
maxToReturn - Sets upper limit for number of top-scored items returned.
Returns:
Iterator over sorted references.
Throws:
LBException
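
The sort-and-truncate step described for sortByScore might look roughly like the sketch below. To stay self-contained it works on a plain java.util.List rather than a ResolvedConceptReferencesIterator, and the class name SortByScoreSketch, the method topScored, and the scorer parameter are all hypothetical; the actual method may be organized differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch; not the LexBIG implementation.
class SortByScoreSketch {
    /**
     * Returns at most maxToReturn items, highest score first.
     * The input list is left unmodified.
     */
    static <T> List<T> topScored(List<T> items,
                                 ToDoubleFunction<T> scorer,
                                 int maxToReturn) {
        List<T> sorted = new ArrayList<>(items);
        // List.sort is stable, so equally scored items keep their input order.
        sorted.sort(Comparator.comparingDouble(scorer).reversed());
        int limit = Math.min(Math.max(maxToReturn, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }
}
```

Because the sort is stable, items with equal scores keep whatever ordering the caller supplied, which is one way to preserve an existing grouping among ties.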

score

protected float score(java.lang.String text,
                      java.util.List<java.lang.String> keywords,
                      boolean isPreferred,
                      float searchRank)
Returns a score providing a relative comparison of the given text against a set of keywords.

Currently the score is evaluated as a simple percentage based on the number of words in the first set that are also in the second, with a minor adjustment for order (matching later words are given slightly higher weight, anticipating that the subject of a search will often follow descriptive text). Weight is also increased for shorter phrases (measured in number of words). If the text is indicated to be preferred, the score is doubled to promote 'bubbling to the top'.

Ranking from the original search is available but not currently used in the heuristic (it tends to throw off the desired alphabetic groupings later).

Parameters:
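text - The text to evaluate.
keywords - The keywords to compare the text against.
isPreferred - True if the text is indicated to be preferred, in which case the score is doubled.
searchRank - Ranking from the original search; not currently used in the heuristic.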
Returns:
The score; a higher value indicates a stronger match.
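
The heuristic described above can be approximated by the following sketch. It is not the LexBIG implementation: the class name ScoreSketch, the 10% per-position bonus, and the 1 + 1/n length boost are illustrative assumptions, as is the reading that the words of the text are the ones checked against the keywords. Whitespace splitting stands in for toScoreWords.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; the weights are illustrative assumptions.
class ScoreSketch {
    static float score(String text, List<String> keywords,
                       boolean isPreferred, float searchRank) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        if (trimmed.isEmpty() || keywords.isEmpty()) {
            return 0f;
        }
        String[] words = trimmed.split("\\s+");
        float matchScore = 0f;
        for (int i = 0; i < words.length; i++) {
            if (keywords.contains(words[i])) {
                // Later positions earn slightly more (assumed 10% per position).
                matchScore += 1f + 0.1f * i;
            }
        }
        // Percentage of the text's words that matched, weighted by position.
        float percent = 100f * matchScore / words.length;
        // Assumed boost that favors shorter phrases.
        float lengthBoost = 1f + 1f / words.length;
        float result = percent * lengthBoost;
        // searchRank is accepted but deliberately unused, as in the description.
        return isPreferred ? result * 2f : result;
    }
}
```

With these assumed weights and the single keyword "blood", the text "blood" scores higher than "high blood", which in turn scores higher than "blood pressure"; marking a text as preferred exactly doubles its score.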

toScoreWords

protected java.util.List<java.lang.String> toScoreWords(java.lang.String s)
Return words from the given string to be used in scoring algorithms, in order of occurrence and ignoring duplicates, stop words, whitespace and common separators.

Parameters:
s - The string to split into words.
Returns:
List of words to use in scoring, in order of occurrence.

toWords

protected java.util.List<java.lang.String> toWords(java.lang.String s,
                                                   java.lang.String delimitRegex,
                                                   boolean removeStopWords,
                                                   boolean removeDuplicates)
Return words from the given string in order of occurrence, normalized to lower case, separated by the given delimiters (regular expression), and optionally removing stop words and duplicates.

Parameters:
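s - The string to split into words.
delimitRegex - Regular expression matching the delimiters that separate words.
removeStopWords - True to remove stop words.
removeDuplicates - True to remove duplicate words.
Returns:
List of words in order of occurrence, normalized to lower case.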
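
A minimal sketch of the word-splitting behavior described for toWords and toScoreWords follows. The class name WordSplitSketch, the stop-word list, and the separator pattern used by toScoreWords are assumptions made for illustration; the actual stop words and separators are not given in this documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; not the LexBIG implementation.
class WordSplitSketch {
    // Assumed stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    static List<String> toWords(String s, String delimitRegex,
                                boolean removeStopWords,
                                boolean removeDuplicates) {
        List<String> words = new ArrayList<>();
        for (String token : s.toLowerCase(Locale.ROOT).split(delimitRegex)) {
            if (token.isEmpty()) {
                continue; // a leading delimiter yields an empty first token
            }
            if (removeStopWords && STOP_WORDS.contains(token)) {
                continue;
            }
            if (removeDuplicates && words.contains(token)) {
                continue; // keep only the first occurrence
            }
            words.add(token);
        }
        return words;
    }

    // Assumed separators: whitespace, comma, colon, plus, hyphen, semicolon.
    static List<String> toScoreWords(String s) {
        return toWords(s, "[\\s,:+\\-;]+", true, true);
    }
}
```

Under these assumptions, calling toWords on "Cancer of the Breast, breast" with the delimiter pattern "[\\s,]+" and both flags set would return the words "cancer" and "breast", in that order.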

Copyright: (c) 2004-2006 Mayo Foundation for Medical Education and Research (MFMER). All rights reserved. MAYO, MAYO CLINIC, and the triple-shield Mayo logo are trademarks and service marks of MFMER.