public class FindCodesForDescription
extends java.lang.Object
Constructor and Description |
---|
FindCodesForDescription() |
Modifier and Type | Method and Description |
---|---|
static void | main(java.lang.String[] args) Entry point for processing. |
void | run(java.lang.String phrase, java.lang.String[] nodeTypes) |
protected float | score(java.lang.String text, java.util.List<java.lang.String> keywords, boolean isPreferred, float searchRank) Returns a score providing a relative comparison of the given text against a set of keywords. |
protected ResolvedConceptReferencesIterator | search(LexBIGService lbs, java.lang.String codingSchemeName, CodingSchemeVersionOrTag csvt, java.lang.String phrase, LocalNameList nodeTypeList) |
protected ResolvedConceptReferencesIterator | sortByScore(java.lang.String searchTerm, ResolvedConceptReferencesIterator toSort, int maxToReturn) Sorts the given concept references based on a scoring algorithm designed to provide a more natural ordering. |
protected java.util.List<java.lang.String> | toScoreWords(java.lang.String s) Return words from the given string to be used in scoring algorithms, in order of occurrence and ignoring duplicates, stop words, whitespace and common separators. |
protected java.util.List<java.lang.String> | toWords(java.lang.String s, java.lang.String delimitRegex, boolean removeStopWords, boolean removeDuplicates) Return words from the given string in order of occurrence, normalized to lower case, separated by the given delimiters (regular expression), and optionally removing stop words and duplicates. |
public static void main(java.lang.String[] args)
Parameters:
args
public void run(java.lang.String phrase, java.lang.String[] nodeTypes) throws LBException
Throws:
LBException
protected ResolvedConceptReferencesIterator search(LexBIGService lbs, java.lang.String codingSchemeName, CodingSchemeVersionOrTag csvt, java.lang.String phrase, LocalNameList nodeTypeList)
protected ResolvedConceptReferencesIterator sortByScore(java.lang.String searchTerm, ResolvedConceptReferencesIterator toSort, int maxToReturn) throws LBException
Parameters:
searchTerm - The term used for comparison; single or multi-word.
toSort - The iterator containing references to sort.
maxToReturn - Sets upper limit for number of top-scored items returned.
Throws:
LBException
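The following is a minimal sketch of the "score, sort, and truncate" idea behind sortByScore, not the class's actual implementation. It works on plain in-memory pairs rather than a ResolvedConceptReferencesIterator; the TopScoredSketch class, the Scored holder, and the topScored method are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TopScoredSketch {
    /** Hypothetical holder pairing a description with an already computed score. */
    public static final class Scored {
        public final String text;
        public final float score;

        public Scored(String text, float score) {
            this.text = text;
            this.score = score;
        }
    }

    /**
     * Returns at most maxToReturn items, highest score first.
     * The sort is stable, so items with equal scores keep their input order.
     */
    public static List<Scored> topScored(List<Scored> items, int maxToReturn) {
        List<Scored> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingDouble((Scored s) -> s.score).reversed());
        int limit = Math.min(Math.max(maxToReturn, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Scored> items = new ArrayList<>();
        items.add(new Scored("heart", 40f));
        items.add(new Scored("heart attack", 90f));
        items.add(new Scored("attack of the heart", 60f));
        for (Scored s : topScored(items, 2)) {
            System.out.println(s.score + " " + s.text);
        }
    }
}
```

Copying the input before sorting leaves the caller's list untouched, which mirrors the method's contract of returning a new, sorted set of references.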
protected float score(java.lang.String text, java.util.List<java.lang.String> keywords, boolean isPreferred, float searchRank)
Currently the score is evaluated as a simple percentage based on the number of words in the first set that are also in the second, with a minor adjustment for order (matching later words are given slightly higher weight, on the expectation that the subject of a search often follows descriptive text). Weight is also increased for shorter phrases (measured in number of words). If the text is indicated to be preferred, the score is doubled to promote 'bubbling to the top'.
Ranking from the original search is available but not currently used in the heuristic (it tends to throw off the desired alphabetic groupings later).
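As a rough illustration of the heuristic described above, the sketch below computes a match percentage, weights later matches slightly more, penalizes longer phrases, and doubles the result for preferred text. It is not the class's actual code: the ScoreSketch class, the 0.1 per-position bonus, and the 2-points-per-word length penalty are all assumptions chosen for illustration, and the sketch assumes the text has already been split into words. The searchRank argument is omitted because the description says it is not used.

```java
import java.util.Arrays;
import java.util.List;

public class ScoreSketch {
    /**
     * Hypothetical scoring heuristic; all weights are illustrative assumptions.
     *
     * @param textWords   words of the candidate text, in order of occurrence
     * @param keywords    words of the search term
     * @param isPreferred whether the text is a preferred description
     */
    public static float score(List<String> textWords, List<String> keywords, boolean isPreferred) {
        if (textWords.isEmpty()) {
            return 0f;
        }
        float matchWeight = 0f;
        for (int i = 0; i < textWords.size(); i++) {
            if (keywords.contains(textWords.get(i))) {
                // Later positions weigh slightly more: 1.0, 1.1, 1.2, ...
                matchWeight += 1f + i / 10f;
            }
        }
        float percent = matchWeight / textWords.size() * 100f;
        // Favor shorter phrases by subtracting a small penalty per word.
        float base = Math.max(0f, percent - 2f * textWords.size());
        // Preferred text is doubled so it rises toward the top of the results.
        return isPreferred ? base * 2f : base;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("heart", "attack");
        System.out.println(score(Arrays.asList("heart", "attack"), keywords, false));
        System.out.println(score(Arrays.asList("acute", "heart", "attack", "syndrome"), keywords, false));
        System.out.println(score(Arrays.asList("heart", "attack"), keywords, true));
    }
}
```

With these assumed weights, an exact two-word match outscores a four-word phrase containing the same two keywords, and marking the text as preferred exactly doubles its score.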
Parameters:
text
keywords
isPreferred
searchRank
protected java.util.List<java.lang.String> toScoreWords(java.lang.String s)
Parameters:
s
protected java.util.List<java.lang.String> toWords(java.lang.String s, java.lang.String delimitRegex, boolean removeStopWords, boolean removeDuplicates)
Parameters:
s
delimitRegex
removeStopWords
removeDuplicates
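The word-splitting behavior described for toWords and toScoreWords can be sketched as follows. This is an illustrative approximation, not the class's actual code: the WordSplitSketch class, the stop-word list, and the separator pattern used by toScoreWords are assumptions, since the documentation does not list either.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class WordSplitSketch {
    // Assumed stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to", "with"));

    /** Splits s on delimitRegex, lower-cases it, and optionally filters stop words and duplicates. */
    public static List<String> toWords(String s, String delimitRegex,
                                       boolean removeStopWords, boolean removeDuplicates) {
        List<String> words = new ArrayList<>();
        for (String token : s.toLowerCase(Locale.ROOT).split(delimitRegex)) {
            if (token.isEmpty()) {
                continue; // produced when the string starts with a delimiter
            }
            if (removeStopWords && STOP_WORDS.contains(token)) {
                continue;
            }
            if (removeDuplicates && words.contains(token)) {
                continue; // keep only the first occurrence
            }
            words.add(token);
        }
        return words;
    }

    /** Assumed separators: whitespace, comma, colon, plus, hyphen, semicolon. */
    public static List<String> toScoreWords(String s) {
        return toWords(s, "[\\s,:+\\-;]+", true, true);
    }

    public static void main(String[] args) {
        System.out.println(toScoreWords("Infarction of the Heart, heart"));
        System.out.println(toWords("Heart  Attack", "\\s+", false, false));
    }
}
```

Using a list rather than a set for the result preserves the order of occurrence, which the scoring heuristic depends on when it weights later words more heavily.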