Lucene: case sensitivity when searching

lauweiaaa 2002-08-07 05:48:06
How can I configure Lucene so that search is case-insensitive? (I am searching English text.)
5 replies
netnerd 2003-02-18
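I also have a question for you: does your search support Chinese, both Simplified and Traditional?
And does your index support file types such as pdf, eml, and zip?

Many thanks!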
netnerd 2003-02-18
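When you index your files, use the PorterStemAnalyzer class below as the Analyzer (and then use the same PorterStemAnalyzer as the Analyzer when you search). Its LowerCaseTokenizer lower-cases every token, so index terms and query terms end up in the same case: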
...
Analyzer analyzer = new PorterStemAnalyzer();
boolean createFlag = true;
IndexWriter writer = new IndexWriter(indexDir, analyzer, createFlag);
...
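To illustrate why the same analyzer has to be used on both sides, here is a minimal plain-Java sketch. It is not Lucene code: the class `CaseFoldingIndexSketch` and its methods are hypothetical stand-ins for an index that lower-cases terms when documents are added. A query that goes through the same analysis matches regardless of case, while a raw query does not.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a search index; this is NOT the Lucene API.
final class CaseFoldingIndexSketch {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split on non-letters and lower-case each piece.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (!raw.isEmpty()) {
                terms.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Index time: store only analyzed (lower-cased) terms.
    void add(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Search time, same analysis as at index time.
    Set<Integer> search(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : analyze(query)) {
            hits.addAll(postings.getOrDefault(term, Collections.<Integer>emptySet()));
        }
        return hits;
    }

    // Search time, no analysis: the query is looked up exactly as typed.
    Set<Integer> searchRaw(String query) {
        return postings.getOrDefault(query, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        CaseFoldingIndexSketch index = new CaseFoldingIndexSketch();
        index.add(1, "Apache Lucene");
        index.add(2, "lucene in action");
        System.out.println(index.search("LUCENE"));    // prints [1, 2]
        System.out.println(index.searchRaw("LUCENE")); // prints []
    }
}
```

The raw lookup fails because the index only ever contains lower-cased terms, which is the reason for passing the same Analyzer to both the IndexWriter and the query side.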



package lucene;

/**
 * <p>Title: </p>
 * <p>Description: </p>
 * <p>Copyright: Copyright (c) 2003</p>
 * <p>Company: </p>
 * @author unascribed
 * @version 1.0
 */

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.LowerCaseTokenizer;
import org.apache.lucene.analysis.PorterStemFilter;

import java.io.Reader;
import java.util.Hashtable;

/**
 * PorterStemAnalyzer processes input
 * text by stemming English words to their roots.
 * This Analyzer also converts the input to lower case
 * and removes stop words. A small set of default stop
 * words is defined in the STOP_WORDS
 * array, but a caller can specify an alternative set
 * of stop words by calling the non-default constructor.
 */
public class PorterStemAnalyzer extends Analyzer
{
    // Per-instance stop table, so that analyzers built with
    // different stop words do not overwrite each other.
    private final Hashtable _stopTable;

    /**
     * An array containing some common English words
     * that are usually not useful for searching.
     */
    public static final String[] STOP_WORDS =
    {
        "0", "1", "2", "3", "4", "5", "6", "7", "8",
        "9", "000", "$",
        "about", "after", "all", "also", "an", "and",
        "another", "any", "are", "as", "at", "be",
        "because", "been", "before", "being", "between",
        "both", "but", "by", "came", "can", "come",
        "could", "did", "do", "does", "each", "else",
        "for", "from", "get", "got", "has", "had",
        "he", "have", "her", "here", "him", "himself",
        "his", "how", "if", "in", "into", "is", "it",
        "its", "just", "like", "make", "many", "me",
        "might", "more", "most", "much", "must", "my",
        "never", "now", "of", "on", "only", "or",
        "other", "our", "out", "over", "re", "said",
        "same", "see", "should", "since", "so", "some",
        "still", "such", "take", "than", "that", "the",
        "their", "them", "then", "there", "these",
        "they", "this", "those", "through", "to", "too",
        "under", "up", "use", "very", "want", "was",
        "way", "we", "well", "were", "what", "when",
        "where", "which", "while", "who", "will",
        "with", "would", "you", "your",
        "a", "b", "c", "d", "e", "f", "g", "h", "i",
        "j", "k", "l", "m", "n", "o", "p", "q", "r",
        "s", "t", "u", "v", "w", "x", "y", "z"
    };

    /**
     * Builds an analyzer with the default stop words.
     */
    public PorterStemAnalyzer()
    {
        this(STOP_WORDS);
    }

    /**
     * Builds an analyzer with the given stop words.
     *
     * @param stopWords a String array of stop words
     */
    public PorterStemAnalyzer(String[] stopWords)
    {
        _stopTable = StopFilter.makeStopTable(stopWords);
    }

    /**
     * Processes the input by first converting it to
     * lower case, then by eliminating stop words, and
     * finally by performing Porter stemming on it.
     * The lower-casing step is what makes searches
     * case-insensitive.
     *
     * Note: this is the single-argument Lucene 1.x signature;
     * later releases expect tokenStream(String fieldName, Reader reader).
     *
     * @param reader the Reader that
     * provides access to the input text
     * @return an instance of TokenStream
     */
    public final TokenStream tokenStream(Reader reader)
    {
        return new PorterStemFilter(
            new StopFilter(new LowerCaseTokenizer(reader),
                           _stopTable));
    }
}
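The three stages chained in tokenStream can be sketched in plain Java without Lucene, as a rough illustration only. Everything here is an assumption made for the example: the class `AnalysisChainSketch`, the five-word stop list (a subset of STOP_WORDS), and in particular `toyStem`, which is a crude suffix stripper and not the Porter algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration of the three stages; none of this is Lucene API.
final class AnalysisChainSketch {
    // Small illustrative stop list (a subset of STOP_WORDS).
    static final Set<String> STOP =
            new HashSet<>(Arrays.asList("the", "is", "of", "and", "a"));

    // Stage 1: emit maximal runs of letters, lower-cased.
    static List<String> lowerCaseTokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                cur.append(Character.toLowerCase(c));
            } else if (cur.length() > 0) {
                out.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) {
            out.add(cur.toString());
        }
        return out;
    }

    // Stage 2: drop stop words.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // Stage 3: toy suffix stripping ("ing" or a plural "s"); NOT Porter stemming.
    static String toyStem(String t) {
        if (t.length() > 5 && t.endsWith("ing")) {
            return t.substring(0, t.length() - 3);
        }
        if (t.length() > 3 && t.endsWith("s") && !t.endsWith("ss")) {
            return t.substring(0, t.length() - 1);
        }
        return t;
    }

    // Full chain: lower-case tokens -> stop-word removal -> stemming.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : removeStopWords(lowerCaseTokens(text))) {
            out.add(toyStem(t));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [index, document, fast]
        System.out.println(analyze("The Indexing of Documents is FAST"));
    }
}
```

Because stage 1 only emits letters, tokens such as digits or "$" never reach the stop filter in this sketch.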

lauweiaaa 2002-08-14
Could someone bump this for me? Thanks in advance.
lauweiaaa 2002-08-13
Nobody knows how? Bumping again!
lauweiaaa 2002-08-08
Up! Any experts out there?
