Question about full-text search for multiple keywords...

wl8685 2008-01-15 01:10:55

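For example: suppose there is a list of words that fall outside the CET-4/CET-6 (College English Test) syllabus. Given an article, I want to find which of the listed out-of-syllabus words appear in it.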

wl8685 2008-01-15

I wrote one myself. It may be slow, and I have not tested it against large volumes of data, but it does the job.

Part of the code:

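String content = request.getParameter("content"); // the article text
String[] keys = request.getParameter("key").split(","); // the comma-separated keywords
StringBuilder found = new StringBuilder();
for (String k : keys) {
    String word = k.trim();
    // Skip empty entries: indexOf("") is always 0 and would count as a match
    if (word.length() > 0 && content.indexOf(word) != -1) {
        found.append(word).append(' ');
    }
}
System.out.println("Keywords found: " + found);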
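One caveat with the indexOf approach above is that it is a substring test, so a keyword such as "attribute" is also reported when the article only contains "attributes". A minimal sketch of a whole-word variant follows, assuming plain English text. The class name WholeWordMatch and the method findKeywords are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class WholeWordMatch {

    // Returns the keywords that occur in content as whole words, ignoring case.
    static List<String> findKeywords(String content, String[] keys) {
        List<String> found = new ArrayList<String>();
        for (String raw : keys) {
            String key = raw.trim();
            if (key.isEmpty()) {
                continue; // skip empty entries such as "a,,b"
            }
            // \b anchors the match at word boundaries;
            // Pattern.quote escapes any regex metacharacters in the keyword.
            Pattern p = Pattern.compile("\\b" + Pattern.quote(key) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            if (p.matcher(content).find()) {
                found.add(key);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String content = "The container notifies all attributes bound to sessions.";
        String[] keys = "attribute, sessions ,human".split(",");
        // "attribute" is not reported because only "attributes" occurs in the text
        System.out.println(findKeywords(content, keys)); // prints [sessions]
    }
}
```

This still compiles one pattern per keyword and scans the article once for each, so it is no faster than indexOf; it only changes what counts as a match.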

wl8685 2008-01-15

Thanks in advance to everyone above...
I'll go study this first and report back as soon as I have results.

kekeemx 2008-01-15

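Ah... Lucene has already come up, so I won't embarrass myself by showing my log analyzer. Haha.
I've also been working on full-text search lately, using the open-source project Compass, which is built on Lucene.
It seems quite good, and I'm still digging into how it works. My impression is that the output order of the final results is not very controllable.
I don't mean that its addsort method doesn't work; rather, the relevance ranking Compass applies to the results is not as good as I expected.
Maybe my configuration is still off. Anyway, I digress.
It looks like 火龙果 has already given the OP a pretty good solution.

PS: If anyone here knows Compass well, please drop me a message. I'm eager to learn about some of its configuration details.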

aunty_flybird 2008-01-15

If efficiency really matters, I suggest the OP take a look at Lucene. It is one of the Apache open-source projects: http://lucene.apache.org

老紫竹 2008-01-15

Spaces, carriage returns, and tabs: put any of these inside a word and it gets split into two words, so they don't matter.
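A rough sketch of that idea: split the article into words once, keep them in a set, and look each listed word up in the set. This assumes plain English text and additionally treats punctuation as a separator (so "don't" becomes "don" and "t"), which goes beyond what the reply says. TokenSetLookup, tokenize, and findListedWords are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class TokenSetLookup {

    // Lowercases the text and splits it on every run of non-letter characters
    // (spaces, tabs, line breaks, digits, punctuation).
    static Set<String> tokenize(String text) {
        Set<String> words = new HashSet<String>();
        for (String token : text.toLowerCase(Locale.ENGLISH).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Returns the listed words that occur in the article, in list order.
    static List<String> findListedWords(String article, List<String> wordList) {
        Set<String> words = tokenize(article);
        List<String> found = new ArrayList<String>();
        for (String w : wordList) {
            String word = w.trim();
            if (words.contains(word.toLowerCase(Locale.ENGLISH))) {
                found.add(word);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String article = "Provides a convenient\timplementation.\nAttributes bound to sessions";
        List<String> wordList = Arrays.asList("convenient", "attributes", "human");
        System.out.println(findListedWords(article, wordList)); // prints [convenient, attributes]
    }
}
```

The article is scanned only once, and each lookup is a hash-set probe, so the cost no longer grows with article length times the number of listed words.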

火龙果被占用了 2008-01-15

I put one together with Lucene.

The contents of english.txt:

Provides a convenient implementation of the HttpServletRequest interface that can
be subclassed by developers wishing to adapt the request to a Servlet. This class
implements the Wrapper or Decorator pattern. Methods default to calling through to
the wrapped request object.

Objects that are bound to a session may listen to container events notifying them
that sessions will be passivated and that session will be activated. A container
that migrates session between VMs or persists sessions is required to notify all
attributes bound to sessions implementing HttpSessionActivationListener.

Code:

import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;



public class LuceneTest {

    public static void main(String[] args) {
        // List of out-of-syllabus words
        List<String> wordList = new ArrayList<String>();
        wordList.add("convenient");
        wordList.add("attributes");
        wordList.add("required");
        wordList.add("human");
        wordList.add("computer");
        wordList.add("object");

        // Directory where the index is stored
        File indexDir = new File("f:/test/index");
        // The article file
        File srcFile = new File("f:/test/english.txt");
        // Build the Lucene index; this only needs to run once while the file is unchanged
        createIndex(indexDir, srcFile);

        System.out.println("Out-of-syllabus words found:");
        List<String> overList = getOverWords(indexDir, wordList);
        for (String s : overList) {
            System.out.println(s);
        }
    }



    private static void createIndex(File indexDir, File srcFile) {
        IndexWriter writer = null;
        try {
            // true = create a new index, overwriting any existing one
            writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
            writer.setUseCompoundFile(false);
            Document doc = new Document();
            System.out.println("Indexing...");
            // Index the whole article as a single tokenized "content" field
            doc.add(new Field("content", new FileReader(srcFile)));
            writer.addDocument(doc);
            System.out.println("Index finished.");
            writer.optimize();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                // writer stays null if the constructor threw
                if (writer != null) {
                    writer.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }



    private static List<String> getOverWords(File indexDir,
            List<String> wordList) {
        IndexSearcher is = null;
        List<String> overList = new ArrayList<String>();
        try {
            // Searching only reads the index, so open a searcher on the index
            // path directly instead of opening an IndexWriter to get the directory
            is = new IndexSearcher(indexDir.getPath());
            QueryParser parse = new QueryParser("content",
                    new StandardAnalyzer());
            for (int i = 0, k = wordList.size(); i < k; i++) {
                String word = wordList.get(i);
                Query query = parse.parse(word);
                Hits hits = is.search(query);
                // Any hit means the word occurs in the article
                if (hits.length() > 0) {
                    overList.add(word);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                // is stays null if the index could not be opened
                if (is != null) {
                    is.close();
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return overList;
    }
}