I need a simple algorithm to build a query (a Lucene query, for example) from user input text.
I have a program which can extract the tokens out of input text.
I have a database storing different senses of tokens.
For example, the word 'renovated' has a synonym sense of 'remodeled'.
I have stored in a database 2000 words and their different senses and relationship types.
I have 6 relationship types for each term.
1. synonym 2. abbreviation 3. antonym 4. subset 5. superset 6. implied or suggested
'balcony', for example, is a superset of 'juliet balcony', whereas 'juliet balcony' is a subset of 'balcony'.
Relationships can also be defined as unidirectional or bidirectional. For example, B can be a suggestion of A without A necessarily being a suggestion of B.
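To make the data model concrete, here is a minimal sketch of how such relationships could be held in memory as a directed graph. The names `graph` and `add_relation` and the `inverse` parameter are hypothetical, not part of my actual schema. An edge `(relation, target)` stored under `source` is read as "target is a <relation> of source".

```python
from collections import defaultdict

# Hypothetical in-memory form of the relationship table:
# term -> list of (relation, related term).
graph = defaultdict(list)

def add_relation(source, relation, target, bidirectional=False, inverse=None):
    """Record that `target` is a `relation` of `source`.

    For a bidirectional relationship the reverse edge is stored as well.
    Symmetric relations (synonym) reuse the same label; asymmetric ones
    (subset/superset) need an explicit inverse label.
    """
    graph[source].append((relation, target))
    if bidirectional:
        graph[target].append((inverse or relation, source))

# 'remodeled' is a synonym of 'renovated', and vice versa.
add_relation("renovated", "synonym", "remodeled", bidirectional=True)
# 'juliet balcony' is a subset of 'balcony'; 'balcony' is a superset of it.
add_relation("balcony", "subset", "juliet balcony",
             bidirectional=True, inverse="superset")
# Unidirectional: B is a suggestion of A, but A is not a suggestion of B.
add_relation("A", "suggested", "B")
```

Storing both directions up front means a lookup never has to scan for incoming edges, at the cost of one extra row per bidirectional relationship.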
I create a Lucene query with all possible senses of the words in the input text.
However, to find all possible senses, we may need to traverse the relationships.
For example, suppose A occurs in the input text.
If B is a synonym of A, then B goes into the Lucene query vector. If C is a synonym of B, then C also goes into the query vector, and so on.
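The A, B, C chain above amounts to a transitive traversal, which could be sketched as a breadth-first search with a visited set. This is only an illustration of the idea: the `synonyms` dictionary and the `expand` function are made-up names, and the data is the A/B/C example rather than anything from my database.

```python
from collections import deque

# Hypothetical synonym edges: term -> terms stored as its synonyms.
synonyms = {"A": ["B"], "B": ["C", "A"], "C": []}

def expand(term, edges):
    """Return every term reachable from `term`, excluding `term` itself."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, ()):
            if nxt not in seen:  # the visited set stops cycles such as A -> B -> A
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(term)
    return seen

print(sorted(expand("A", synonyms)))  # ['B', 'C']
```

Each term is enqueued at most once, so the cost is proportional to the number of terms and relationships actually reached from the input word.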
We form 2 types of Lucene query vectors:
direct = derived words which are synonyms, abbreviations, or subsets
suggested = derived words which are supersets, or implied or suggested
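One possible way to build both vectors in a single traversal is sketched below. It rests on assumptions I have not settled: antonym edges are ignored, and a word counts as direct only if some path to it uses direct relationship types exclusively; once a path crosses a superset or implied/suggested edge, everything further along that path is treated as suggested. The names `build_vectors`, `DIRECT`, `SUGGESTED`, and the sample `graph` are hypothetical.

```python
from collections import deque

DIRECT = {"synonym", "abbreviation", "subset"}
SUGGESTED = {"superset", "implied"}  # "implied" stands for implied or suggested

def build_vectors(tokens, graph):
    """Return (direct, suggested) sets of derived words for the input tokens.

    `graph` maps a term to a list of (relation, target) pairs, meaning
    "target is a <relation> of term".
    """
    # seen[term] is True if the term was reached through direct relations only.
    seen = {t: True for t in tokens}
    queue = deque((t, True) for t in tokens)
    while queue:
        term, is_direct = queue.popleft()
        for relation, target in graph.get(term, ()):
            if relation in DIRECT:
                target_direct = is_direct
            elif relation in SUGGESTED:
                target_direct = False
            else:
                continue  # assumption: antonyms never enter the query
            # Visit new terms, and revisit a term once if a direct path to it
            # turns up after it was first reached as a suggestion.
            if target not in seen or (target_direct and not seen[target]):
                seen[target] = target_direct
                queue.append((target, target_direct))
    direct = {t for t, d in seen.items() if d} - set(tokens)
    suggested = {t for t, d in seen.items() if not d}
    return direct, suggested

graph = {
    "renovated": [("synonym", "remodeled")],
    "remodeled": [("synonym", "renovated")],
    "juliet balcony": [("superset", "balcony")],
    "balcony": [("subset", "juliet balcony")],
}
direct, suggested = build_vectors(["renovated", "juliet balcony"], graph)
print(direct)     # {'remodeled'}
print(suggested)  # {'balcony'}
```

A term is enqueued at most twice (once as suggested, once upgraded to direct), so the traversal stays linear in the number of relationships reached.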
I need an algorithm to build these query vectors efficiently.
It can simply be written down on paper as an algorithm or pseudocode; there is no need to write actual code. I am looking for ideas.