Testing Nutch: search always returns 0 results (log attached)
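Urack 2010-01-16 07:22:20
I've recently been testing Nutch 0.9 on my machine, and every search returns 0 results. I can't figure out which setting I got wrong.
I've followed a book from the bookstore and many online tutorials, but I still can't get the expected results.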
Environment
Cygwin, latest version (installed with the online installer two days ago) / Tomcat 5.5 / JDK 1.6 / Nutch 0.9
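For example, after crawling the apache.org site, the index appears to have been built correctly. The crawl statistics are below, and Luke does show the indexed content.
However, every time I test with "bin/nutch org.apache.nutch.searcher.NutchBean apache >&search.log",
the result is always "Total hits: 0".
Note: searching through Tomcat also returns no results.
Below is the log from Cygwin: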
=========================================================
bin/nutch readdb apache.org/crawldb -stats >&stats.log
CrawlDb statistics start: apache.org/crawldb
Statistics for CrawlDb: apache.org/crawldb
TOTAL urls: 2207
retry 0: 2207
min score: 0.0
avg score: 0.0030
max score: 1.03
status 1 (db_unfetched): 2106
status 2 (db_fetched): 94
status 3 (db_gone): 2
status 5 (db_redir_perm): 5
CrawlDb statistics: done
bin/nutch org.apache.nutch.searcher.NutchBean apache >&search.log
Total hits: 0
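A quick check worth running from the same working directory as the commands above (a sketch, not part of the original post): NutchBean opens `index` or `indexes`, `segments` and `linkdb` under the directory named by the `searcher.dir` property, which defaults to `crawl` in Nutch 0.9, whereas this crawl was written to `apache.org`. The helpers `check_searcher_dir` and `looks_searchable` are hypothetical, and treating `index` as an alternative to `indexes` assumes the layout that `bin/nutch crawl` normally produces.

```python
from pathlib import Path

# Subdirectories NutchBean opens under searcher.dir (names taken from the
# Tomcat log in this post). "index" is the merged index that a one-step
# `bin/nutch crawl` usually writes (assumption); "indexes" holds the parts.
EXPECTED = ("index", "indexes", "segments", "linkdb")


def check_searcher_dir(searcher_dir):
    """Hypothetical helper: map each expected subdirectory to whether it exists."""
    root = Path(searcher_dir)
    return {name: (root / name).is_dir() for name in EXPECTED}


def looks_searchable(report):
    """True if some index and the segments are present (a rough heuristic)."""
    return (report["index"] or report["indexes"]) and report["segments"]


if __name__ == "__main__":
    # Compare the default searcher.dir ("crawl") with the crawl output
    # directory used in this post ("apache.org").
    for candidate in ("crawl", "apache.org"):
        report = check_searcher_dir(candidate)
        print(candidate, report, "searchable:", looks_searchable(report))
```

If `crawl` reports nothing while `apache.org` looks searchable, pointing `searcher.dir` at the `apache.org` directory, or re-running the crawl with `-dir crawl`, would be the thing to try.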
Below is the search log from Tomcat:
===============================================
2010-01-16 19:17:58,679 WARN Configuration - bad conf file: element not <property>
2010-01-16 19:17:58,689 INFO PluginRepository - Plugins: looking in: C:\Tomcat\webapps\ROOT\WEB-INF\classes\plugins
2010-01-16 19:17:58,879 INFO PluginRepository - Plugin Auto-activation mode: [true]
2010-01-16 19:17:58,879 INFO PluginRepository - Registered Plugins:
2010-01-16 19:17:58,879 INFO PluginRepository - the nutch core extension points (nutch-extensionpoints)
2010-01-16 19:17:58,879 INFO PluginRepository - Basic Query Filter (query-basic)
2010-01-16 19:17:58,879 INFO PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
2010-01-16 19:17:58,879 INFO PluginRepository - Basic Indexing Filter (index-basic)
2010-01-16 19:17:58,879 INFO PluginRepository - Html Parse Plug-in (parse-html)
2010-01-16 19:17:58,879 INFO PluginRepository - Basic Summarizer Plug-in (summary-basic)
2010-01-16 19:17:58,879 INFO PluginRepository - Site Query Filter (query-site)
2010-01-16 19:17:58,879 INFO PluginRepository - HTTP Framework (lib-http)
2010-01-16 19:17:58,879 INFO PluginRepository - Text Parse Plug-in (parse-text)
2010-01-16 19:17:58,879 INFO PluginRepository - Regex URL Filter (urlfilter-regex)
2010-01-16 19:17:58,879 INFO PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
2010-01-16 19:17:58,879 INFO PluginRepository - Http Protocol Plug-in (protocol-http)
2010-01-16 19:17:58,879 INFO PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
2010-01-16 19:17:58,879 INFO PluginRepository - OPIC Scoring Plug-in (scoring-opic)
2010-01-16 19:17:58,879 INFO PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
2010-01-16 19:17:58,879 INFO PluginRepository - JavaScript Parser (parse-js)
2010-01-16 19:17:58,879 INFO PluginRepository - URL Query Filter (query-url)
2010-01-16 19:17:58,879 INFO PluginRepository - Regex URL Filter Framework (lib-regex-filter)
2010-01-16 19:17:58,879 INFO PluginRepository - Registered Extension-Points:
2010-01-16 19:17:58,879 INFO PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
2010-01-16 19:17:58,879 INFO PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2010-01-16 19:17:58,879 INFO PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
2010-01-16 19:17:58,879 INFO PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
2010-01-16 19:17:58,879 INFO PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2010-01-16 19:17:58,879 INFO PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2010-01-16 19:17:58,879 INFO PluginRepository - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
2010-01-16 19:17:58,879 INFO PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2010-01-16 19:17:58,879 INFO PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
2010-01-16 19:17:58,879 INFO PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2010-01-16 19:17:58,879 INFO PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
2010-01-16 19:17:58,879 INFO PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
2010-01-16 19:17:58,889 INFO NutchBean - creating new bean
2010-01-16 19:17:58,909 INFO NutchBean - opening indexes in crawl/indexes
2010-01-16 19:17:58,999 INFO Configuration - found resource common-terms.utf8 at file:/C:/Tomcat/webapps/ROOT/WEB-INF/classes/common-terms.utf8
2010-01-16 19:17:59,009 INFO NutchBean - opening segments in crawl/segments
2010-01-16 19:17:59,029 INFO SummarizerFactory - Using the first summarizer extension found: Basic Summarizer
2010-01-16 19:17:59,029 INFO NutchBean - opening linkdb in crawl/linkdb
2010-01-16 19:17:59,049 INFO NutchBean - query request from 127.0.0.1
2010-01-16 19:17:59,069 INFO NutchBean - query: apache
2010-01-16 19:17:59,069 INFO NutchBean - lang:
2010-01-16 19:17:59,139 INFO NutchBean - searching for 20 raw hits
2010-01-16 19:17:59,310 INFO NutchBean - total hits: 0
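Two details in this Tomcat log may be relevant (offered as hypotheses, not a confirmed diagnosis). The first line, `WARN Configuration - bad conf file: element not <property>`, is the warning Hadoop's `Configuration` class logs when a child of the top-level `<configuration>` element is not a `<property>`, so a setting in one of the XML files on the webapp's classpath may not be taking effect; which file triggered it is not shown. Also, NutchBean opens `crawl/indexes`, `crawl/segments` and `crawl/linkdb`, i.e. the default `searcher.dir` value, while the crawl above was written to `apache.org`. The Python sketch below checks a config file for that structural problem and reads back `searcher.dir`. The function names are hypothetical, the check approximates Hadoop's rule rather than reproducing its code, and the `nutch-site.xml` path is an assumption based on the `WEB-INF\classes` directory in the log.

```python
import xml.etree.ElementTree as ET


def find_bad_conf_elements(xml_text):
    """List structural problems of the kind behind Hadoop's
    'bad conf file: ...' warnings (an approximation, not Hadoop's code)."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "configuration":
        problems.append("top-level element is <%s>, not <configuration>" % root.tag)
    # The default parser drops comments, so only child elements remain here.
    for child in root:
        if child.tag != "property":
            problems.append("element <%s> is not <property>" % child.tag)
    return problems


def get_property(xml_text, name):
    """Return the <value> text of the named <property>, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


if __name__ == "__main__":
    from pathlib import Path

    # Assumed location of the webapp's nutch-site.xml (see the log above).
    site = Path(r"C:\Tomcat\webapps\ROOT\WEB-INF\classes\nutch-site.xml")
    data = site.read_bytes()  # bytes, so any encoding declaration is honoured
    print(find_bad_conf_elements(data) or "structure OK")
    print("searcher.dir =", get_property(data, "searcher.dir"))
```

If `searcher.dir` comes back as `None`, the default `crawl` applies; setting it to the absolute path of the `apache.org` crawl directory and restarting Tomcat would be a reasonable next step to try.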