Question about DefaultSimilarity

helei123a 2012-03-31 09:36:51
1 public float coord(int overlap, int maxOverlap): what do these two parameters represent? Is this a scoring factor based on the proportion of the query's terms that each Document matches?
2 public float computeNorm(String field, FieldInvertState state)
{
    int numTerms;
    if (discountOverlaps)
        numTerms = state.getLength() - state.getNumOverlap();
    else
        numTerms = state.getLength();
    return state.getBoost() * (float)(1.0D / Math.sqrt(numTerms));
}
Could someone walk me through this function?
poson 2012-04-30
It is the overlap ratio.
helei123a 2012-04-01
The positionIncrement determines the position of this token relative to the previous Token in a TokenStream, used in phrase searching.
The default value is one.
Some common uses for this are:

Set it to zero to put multiple terms in the same position. This is useful if, e.g., a word has multiple stems. Searches for phrases including either stem will match. In this case, all but the first stem's increment should be set to zero: the increment of the first instance should be one. Repeating a token with an increment of zero can also be used to boost the scores of matches on that token.
Set it to values greater than one to inhibit exact phrase matches. If, for example, one does not want phrases to match across removed stop words, then one could build a stop word filter that removes stop words and also sets the increment to the number of stop words removed before each non-stop word. Then exact phrase queries will only match when the terms occur with no intervening stop words.
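
To make the two uses above concrete, here is a minimal sketch in plain Java. It does not use Lucene; the class PositionIncrementSketch and its methods are hypothetical names made up for illustration. It assumes positions start counting from -1, so that a first token with the default increment of one lands on position 0, and it counts tokens whose increment is zero, which is the quantity getNumOverlap() is documented to report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: it only models the bookkeeping
// described in the quoted documentation.
class PositionIncrementSketch {

    // Turn per-token increments into absolute positions.
    // Assumption: counting starts at -1, so the first token (increment 1) sits at position 0.
    static List<Integer> positions(int[] increments) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        for (int inc : increments) {
            position += inc;
            result.add(position);
        }
        return result;
    }

    // Count tokens stacked on the previous token's position (increment == 0).
    static int numOverlap(int[] increments) {
        int count = 0;
        for (int inc : increments) {
            if (inc == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // A word, a second stem of it at the same position, then the next word.
        int[] withStem = {1, 0, 1};
        System.out.println(positions(withStem));  // prints [0, 0, 1]
        System.out.println(numOverlap(withStem)); // prints 1

        // One stop word removed between two words: the second word gets increment 2.
        int[] withGap = {1, 2};
        System.out.println(positions(withGap));   // prints [0, 2]
    }
}
```

In the second case the two surviving tokens are two positions apart, which is why an exact phrase query for them as adjacent words would not match.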
helei123a 2012-04-01
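A whole day and nobody has answered; this board is really quiet. According to the API documentation, the parameters and methods mean the following: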
overlap - the number of query terms matched in the document
maxOverlap - the total number of terms in the query
field - field name
state - current processing state for this field
public int getLength()
Get total number of terms in this field.
public int getNumOverlap()
Get the number of terms with positionIncrement == 0.
public float getBoost()
Get boost value. This is the cumulative product of document boost and field boost for all field instances sharing the same field name.
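
Putting the descriptions above together, the arithmetic can be sketched in plain Java without Lucene. The class SimilaritySketch is a hypothetical name, and the three values that computeNorm reads from FieldInvertState are passed in as plain arguments here. The computeNorm body mirrors the code quoted in the question; the coord body is the matched fraction overlap / maxOverlap, which is my understanding of what DefaultSimilarity does, so treat it as an assumption and check it against your Lucene version.

```java
// Hypothetical stand-in for the two DefaultSimilarity methods discussed in this thread.
class SimilaritySketch {

    // Fraction of the query's terms that the document matched (assumed formula).
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Length normalization: boost / sqrt(numTerms), as in the quoted computeNorm.
    // length       ~ state.getLength()
    // numOverlap   ~ state.getNumOverlap()
    // boost        ~ state.getBoost()
    static float computeNorm(int length, int numOverlap, float boost, boolean discountOverlaps) {
        // Optionally ignore tokens with positionIncrement == 0 when counting terms.
        int numTerms = discountOverlaps ? length - numOverlap : length;
        return boost * (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // The document matches 2 of the 4 query terms.
        System.out.println(coord(2, 4));                      // prints 0.5

        // 5 tokens, 1 of them stacked on another position, boost 1.0: 1 / sqrt(4).
        System.out.println(computeNorm(5, 1, 1.0f, true));    // prints 0.5

        // Same field without discounting overlaps: 1 / sqrt(5), a slightly smaller norm.
        System.out.println(computeNorm(5, 1, 1.0f, false));

        // 4 tokens, no overlaps, boost 2.0: 2 * (1 / sqrt(4)).
        System.out.println(computeNorm(4, 0, 2.0f, false));   // prints 1.0
    }
}
```

So a document that matches more of the query terms gets a larger coord factor, and a field with more terms gets a smaller norm, which means a match in a short field counts for more than the same match in a long one.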
