Out of memory when using Lucene with IKAnalyzer

xcfdsarfew 2012-12-25 11:14:40

public LuceneDomain searchIndex(String searchStr) throws Exception {
    File indexDir = new File(PropertiesUtil.getPropertyValue(searchDirKEY));
    String[] fields = new String[] {"id", "source", "title", "context", "url"};
    // Index directory
    Directory dir = FSDirectory.open(indexDir);
    // Open an index reader on the index directory
    IndexReader reader = IndexReader.open(dir);
    // Create the searcher
    IndexSearcher searcher = new IndexSearcher(reader);
    try {
        // IKAnalyzer Chinese word segmentation
        Analyzer analyzer = new IKAnalyzer();
        // Create the query parser
        //QueryParser parser = new QueryParser(Version.LUCENE_36,"context", analyzer);
        QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_36, fields, analyzer);
        parser.setDefaultOperator(QueryParser.AND_OPERATOR);
        // Build the query from the fields and the search text
        //Query query = parser.parse(searchStr);
        Query query = IKQueryParser.parseMultiField(fields, searchStr);
        System.out.println("Searching for: " + query.toString("context"));
        // Collect the top hits ranked by relevance score
        TopScoreDocCollector collector = TopScoreDocCollector.create(maxBufferedDocs, false);
        searcher.search(query, collector);
        // Fetch the results
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        int numTotalHits = collector.getTotalHits();
        LuceneDomain lucene = new LuceneDomain();
        lucene.setTotalNum(numTotalHits);
        lucene.setSearchText(searchStr);
        List<SearchDomain> searchList = new ArrayList<SearchDomain>();
        // Convert each hit into a SearchDomain
        for (int i = 0; i < hits.length; i++) {
            SearchDomain search = new SearchDomain();
            Document doc = searcher.doc(hits[i].doc);
            String context = Tools.replaceHtml(doc.get("context"));
            search.setId(Integer.parseInt(doc.get("id")));
            search.setSource(Integer.parseInt(doc.get("source")));
            search.setTitle(Tools.replaceHtml(doc.get("title")));
            // Keep at most the first 100 characters as the summary
            if (context.length() > 100) {
                search.setContext(context.substring(0, 100));
            } else {
                search.setContext(context);
            }
            search.setUrl(doc.get("url"));
            searchList.add(search);
        }
        lucene.setSearchData(searchList);
        return lucene;
    } finally {
        // Release the searcher and reader so repeated searches do not leak resources
        searcher.close();
        reader.close();
    }
}


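At this point in the code:
//Query query = parser.parse(searchStr);
Query query =IKQueryParser.parseMultiField(fields, searchStr);
Everything works if I use the stock analyzer.
With the IK analyzer I get an out-of-memory error! I set the heap to 512 MB and it overflows after only 5,000 news items. I want to know what the problem is, and I would rather not change the memory size,
because the English analyzer can handle 200,000 items.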
xcfdsarfew 2013-01-03
Please close the thread! In the end I gave in and had to build the index in batches.
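For anyone landing here later: the stack trace in this thread shows the OutOfMemoryError being raised inside IndexWriter.addDocument, called from SearchUntil.createIndex, so the heap fills up while the index is being built rather than during search. The indexing code itself was never posted, so the following is only a minimal sketch of the "build in batches" idea, assuming the documents can be iterated one at a time. BatchSink and BatchedIndexing are hypothetical names, not part of Lucene or IK Analyzer. With Lucene 3.6, add() would presumably wrap IndexWriter.addDocument and flush() would wrap IndexWriter.commit(); lowering IndexWriterConfig.setRAMBufferSizeMB is another knob that may help. The thread does not confirm that either one removes the error.

```java
/**
 * Hypothetical sink for documents (not a Lucene type).
 * Assumption: in a Lucene 3.6 setup, add() would call IndexWriter.addDocument
 * and flush() would call IndexWriter.commit().
 */
interface BatchSink<T> {
    void add(T item) throws Exception;
    void flush() throws Exception;
}

/** Hypothetical helper that feeds items to a sink in fixed-size batches. */
final class BatchedIndexing {
    private BatchedIndexing() {}

    /**
     * Adds every item to the sink, flushing after each full batch and once
     * more at the end if a partial batch remains.
     *
     * @return the number of times the sink was flushed
     */
    static <T> int indexInBatches(Iterable<T> items, int batchSize, BatchSink<T> sink)
            throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pending = 0;  // items added since the last flush
        int flushes = 0;
        for (T item : items) {
            sink.add(item);
            pending++;
            if (pending == batchSize) {
                sink.flush();
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) {
            sink.flush();
            flushes++;
        }
        return flushes;
    }
}
```

Called as BatchedIndexing.indexInBatches(newsItems, 500, sink), this flushes after every 500 items, so buffered postings are written out regularly instead of piling up for the whole data set. The batch size of 500 is an arbitrary illustration, not a measured value.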
悲催的程序猿 2012-12-26
xcfdsarfew 2012-12-25
// IKAnalyzer Chinese word segmentation
Analyzer analyzer = new IKAnalyzer();
xcfdsarfew 2012-12-25
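Quoting reply #3 by yjflinchong:
What do you mean it overflows at 5,000 rows of news? Are you tokenizing 5,000 rows of news in one go? In theory that shouldn't happen.
I allocated 512 MB of memory, and it runs out of memory at just 5,000 news items. Lucene's built-in analyzer doesn't have this problem, but it doesn't support Chinese word segmentation, so searches find nothing. It happens when the Tomcat container starts and Spring runs the scheduled task. The error output is as follows: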

Message: Context initialization failed
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'createLecuneIndexTask' defined in file [D:\Program Files\Java\Tomcat 6.0\webapps\onionPortal\WEB-INF\classes\applicationContext_app.xml]: Invocation of init method failed; nested exception is java.lang.OutOfMemoryError: Java heap space
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1420)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:519)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
	at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:291)
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
	at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:288)
	at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:190)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:580)
	at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:895)
	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:425)
	at org.springframework.web.context.ContextLoader.createWebApplicationContext(ContextLoader.java:276)
	at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:197)
	at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:47)
	at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4135)
	at org.apache.catalina.core.StandardContext.start(StandardContext.java:4630)
	at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
	at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
	at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:546)
	at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:905)
	at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:740)
	at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:500)
	at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
	at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
	at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
	at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
	at org.apache.catalina.core.StandardHost.start(StandardHost.java:785)
	at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
	at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:445)
	at org.apache.catalina.core.StandardService.start(StandardService.java:519)
	at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
	at org.apache.catalina.startup.Catalina.start(Catalina.java:581)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
	at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:194)
	at org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:204)
	at org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)
	at org.apache.lucene.index.TermsHashPerField.growParallelPostingsArray(TermsHashPerField.java:137)
	at org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:440)
	at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:184)
	at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:276)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2060)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2034)
	at com.onionportal.lucene.SearchUntil.createIndex(SearchUntil.java:104)
	at com.onionportal.lucene.service.imp.LecuneServiceImp.createNewsIndex(LecuneServiceImp.java:54)
	at com.onionportal.until.task.CreateLecuneIndexTask.createIndex(CreateLecuneIndexTask.java:37)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1544)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1485)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1417)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:519)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:456)
	at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:291)
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
	at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:288)
	at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:190)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:580)
	at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:895)
	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:425)
	at org.springframework.web.context.ContextLoader.createWebApplicationContext(ContextLoader.java:276)
	at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:197)
	at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:47)
2012-12-25 23:46:52 org.apache.catalina.core.StandardContext listenerStart
SEVERE: Exception sending context initialized event to listener instance of class org.springframework.web.context.ContextLoaderListener
yjflinchong 2012-12-25
What do you mean it overflows at 5,000 rows of news? Are you tokenizing 5,000 rows of news in one go? In theory that shouldn't happen.
xcfdsarfew 2012-12-25
Why hasn't anyone replied?
