java.lang.Object
  org.LexGrid.LexBIG.example.FindCodesForDescription
public class FindCodesForDescription
Example showing how to find codes matching descriptive text.
The program accepts up to two parameters.
The first parameter (required) is the text used to search for matching descriptions. Matches are determined through a customized match algorithm, which uses a simple heuristic to try to rank returned values by relevance.
The second parameter (optional) indicates the types of entity to search. Possible values include the LexGrid built-in types "concept" and "instance". Additional supported types can be defined uniquely to a coding scheme. If provided, this should be a comma-delimited list of types. If not provided, all entity types are searched.
Example: FindCodesForDescription "blood"
Example: FindCodesForDescription "breast cancer" "concept"
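The argument handling described above can be sketched as follows. This is a minimal illustration, not the class's actual source: the class name FindCodesArgsSketch and the helper parseNodeTypes are hypothetical, and returning an empty array to mean "search all entity types" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class FindCodesArgsSketch {
    /** Splits the optional second argument into trimmed, non-empty type names. */
    static String[] parseNodeTypes(String[] args) {
        if (args.length < 2 || args[1].trim().isEmpty()) {
            // Assumption: an empty array stands for "search all entity types".
            return new String[0];
        }
        List<String> types = new ArrayList<>();
        for (String type : args[1].split(",")) {
            String trimmed = type.trim();
            if (!trimmed.isEmpty()) {
                types.add(trimmed);
            }
        }
        return types.toArray(new String[0]);
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            // The first parameter (the search phrase) is required.
            System.out.println("Usage: FindCodesForDescription \"phrase\" [\"type1,type2\"]");
            return;
        }
        String phrase = args[0];
        String[] nodeTypes = parseNodeTypes(args);
        System.out.println("phrase=" + phrase + " types=" + String.join(",", nodeTypes));
    }
}
```

In the real class, the parsed phrase and type list would presumably be handed to run(phrase, nodeTypes).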
Constructor Summary |
---|
FindCodesForDescription() |
Method Summary | |
---|---|
static void | main(java.lang.String[] args) Entry point for processing. |
void | run(java.lang.String phrase, java.lang.String[] nodeTypes) |
protected float | score(java.lang.String text, java.util.List<java.lang.String> keywords, boolean isPreferred, float searchRank) Returns a score providing a relative comparison of the given text against a set of keywords. |
protected ResolvedConceptReferencesIterator | search(LexBIGService lbs, java.lang.String codingSchemeName, CodingSchemeVersionOrTag csvt, java.lang.String phrase, LocalNameList nodeTypeList) |
protected ResolvedConceptReferencesIterator | sortByScore(java.lang.String searchTerm, ResolvedConceptReferencesIterator toSort, int maxToReturn) Sorts the given concept references based on a scoring algorithm designed to provide a more natural ordering. |
protected java.util.List<java.lang.String> | toScoreWords(java.lang.String s) Return words from the given string to be used in scoring algorithms, in order of occurrence and ignoring duplicates, stop words, whitespace and common separators. |
protected java.util.List<java.lang.String> | toWords(java.lang.String s, java.lang.String delimitRegex, boolean removeStopWords, boolean removeDuplicates) Return words from the given string in order of occurrence, normalized to lower case, separated by the given delimiters (regular expression), and optionally removing stop words and duplicates. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public FindCodesForDescription()
Method Detail |
---|
public static void main(java.lang.String[] args)
Parameters:
args -
public void run(java.lang.String phrase, java.lang.String[] nodeTypes) throws LBException
Throws:
LBException
protected ResolvedConceptReferencesIterator search(LexBIGService lbs, java.lang.String codingSchemeName, CodingSchemeVersionOrTag csvt, java.lang.String phrase, LocalNameList nodeTypeList)
protected ResolvedConceptReferencesIterator sortByScore(java.lang.String searchTerm, ResolvedConceptReferencesIterator toSort, int maxToReturn) throws LBException
Parameters:
searchTerm - The term used for comparison; single or multi-word.
toSort - The iterator containing references to sort.
maxToReturn - Sets upper limit for number of top-scored items returned.
Throws:
LBException
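The general shape of sortByScore (score each item, order by score, cap the result at maxToReturn) can be sketched without the LexBIG types. This is an illustration under stated assumptions: SortByScoreSketch and topScored are hypothetical names, plain strings stand in for the entries of a ResolvedConceptReferencesIterator, the scorer is supplied by the caller, and the alphabetical tie-break is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

class SortByScoreSketch {
    /** Returns up to maxToReturn descriptions, highest score first. */
    static List<String> topScored(List<String> descriptions,
                                  ToDoubleFunction<String> scorer,
                                  int maxToReturn) {
        List<String> sorted = new ArrayList<>(descriptions);
        // Highest score first; ties fall back to case-insensitive
        // alphabetical order (an assumption made for this sketch).
        sorted.sort(Comparator.<String>comparingDouble(scorer).reversed()
                .thenComparing(String.CASE_INSENSITIVE_ORDER));
        int limit = Math.min(Math.max(maxToReturn, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }
}
```

The scorer is re-evaluated on every comparison, which is fine for a sketch; caching the score per item would be the obvious refinement for long result lists.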
protected float score(java.lang.String text, java.util.List<java.lang.String> keywords, boolean isPreferred, float searchRank)
Currently the score is evaluated as a simple percentage based on the number of words in the first set that are also in the second, with a minor adjustment for order (matching later words are given slightly higher weight, anticipating that the subject of a search often follows descriptive text). Weight is also increased for shorter phrases (measured in number of words). If the text is indicated to be preferred, the score is doubled to promote 'bubbling to the top'.
Ranking from the original search is available but not currently used in the heuristic (it tends to throw off the desired alphabetic groupings later).
Parameters:
text -
keywords -
isPreferred -
searchRank -
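One way to read the heuristic described for score is sketched below. It is not the class's actual source. The weights (0.1 per keyword position, a 1/(10 * word count) bonus for short phrases) are made-up constants, the class name ScoreSketch is hypothetical, and the sketch assumes the "percentage" is the weighted share of keywords that occur in the text and that keywords are already lower case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class ScoreSketch {
    static float score(String text, List<String> keywords,
                       boolean isPreferred, float searchRank) {
        String normalized = text.toLowerCase(Locale.ROOT).trim();
        if (normalized.isEmpty() || keywords.isEmpty()) {
            return 0f;
        }
        List<String> words = Arrays.asList(normalized.split("\\s+"));

        float matched = 0f;
        float total = 0f;
        for (int i = 0; i < keywords.size(); i++) {
            // Later keywords weigh slightly more (0.1 step is an assumed constant).
            float weight = 1f + 0.1f * i;
            total += weight;
            if (words.contains(keywords.get(i))) {
                matched += weight;
            }
        }

        // Weighted percentage of keywords found in the text.
        float result = 100f * matched / total;
        // Shorter phrases get a larger bonus (assumed formula).
        result *= 1f + 1f / (10f * words.size());
        // Preferred text is doubled so that it rises to the top.
        if (isPreferred) {
            result *= 2f;
        }
        // searchRank is intentionally unused, mirroring the description.
        return result;
    }
}
```

With keywords ["breast", "cancer"], this sketch scores the text "cancer" above the text "breast", and "breast cancer" above a longer phrase that contains both words.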
protected java.util.List<java.lang.String> toScoreWords(java.lang.String s)
Parameters:
s -
protected java.util.List<java.lang.String> toWords(java.lang.String s, java.lang.String delimitRegex, boolean removeStopWords, boolean removeDuplicates)
Parameters:
s -
delimitRegex -
removeStopWords -
removeDuplicates -
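The word-splitting behavior described for toWords and toScoreWords can be sketched as follows. This is a minimal sketch, not the class's actual source: the class name ToWordsSketch, the stop-word list, and the separator pattern used by toScoreWords are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ToWordsSketch {
    // Assumption: a small illustrative stop-word list; the real list is not documented here.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "for", "in", "of", "or", "the", "to", "with"));

    static List<String> toWords(String s, String delimitRegex,
                                boolean removeStopWords, boolean removeDuplicates) {
        List<String> words = new ArrayList<>();
        // Normalize to lower case, then split on the caller's delimiter pattern.
        for (String token : s.toLowerCase(Locale.ROOT).split(delimitRegex)) {
            if (token.isEmpty()) {
                continue; // a leading delimiter yields an empty first token
            }
            if (removeStopWords && STOP_WORDS.contains(token)) {
                continue;
            }
            if (removeDuplicates && words.contains(token)) {
                continue; // keep only the first occurrence, preserving order
            }
            words.add(token);
        }
        return words;
    }

    static List<String> toScoreWords(String s) {
        // Assumption: whitespace plus a few common separators.
        return toWords(s, "[\\s,:;.()/\\-]+", true, true);
    }
}
```

For example, toScoreWords("Cancer of the Breast, breast-cancer") yields the two words "cancer" and "breast" in that order.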
Copyright: (c) 2004-2006 Mayo Foundation for Medical Education and Research (MFMER). All rights reserved. MAYO, MAYO CLINIC, and the triple-shield Mayo logo are trademarks and service marks of MFMER.