<html>

<title>LIVE Interactive Comparison of Gigablast vs SOLR Open Source Search Engine</title>

<h2>Comparing Gigablast to SOLR</h2>

<table cellspacing=10 border=1>

<tr>
<td style=max-width:100px;min-width:10%;></td>
<td style=min-width:30%><b><a href=http://www.gigablast.com/>Gigablast</a></b></td>
<td style=min-width:30%><b><a href=http://lucene.apache.org/solr/>SOLR</a></b></td>
<!--
<td><b><a href=http://www.elasticsearch.org/>ElasticSearch</a></b></td>-->
</tr>

<tr valign=top>
<td><b>Package Installation</b></td>

<!-- gb install -->
<td>
<a href=/admin.html#quickstart>Download packages for Ubuntu or Red Hat</a>
</td>

<!-- solr install-->
<td>
<a href=http://wiki.apache.org/solr/SolrInstall>Instructions</a>
</td>

<!-- elastic search install-->
<!--
<td>
<ul>
<li>wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.3.zip
<li>unzip elasticsearch-0.90.3.zip
<li>cd elasticsearch-0.90.3
<li>cd bin
<li>./elasticsearch -f
<li>curl -X GET http://localhost:9200/
</ul>
</td>
-->

</tr>

<tr valign=top>
<td><b>Source Installation</b></td>

<!-- gb install -->
<td>
Just a <a href=/admin.html#src>few simple steps</a>
</td>

<!-- solr install-->
<td>
<a href=http://lucene.apache.org/solr/downloads.html>Source download</a>
</td>

<!-- elastic search install-->
<!--
<td>
<ul>
<li>wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.3.zip
<li>unzip elasticsearch-0.90.3.zip
<li>cd elasticsearch-0.90.3
<li>cd bin
<li>./elasticsearch -f
<li>curl -X GET http://localhost:9200/
</ul>
</td>
-->
</tr>

<tr>
<td>
<b>Complete Web GUI</b>
</td>

<!--gigablast-->
<td>
<font color=green><b>
Yes.
</b></font>
</td>

<!--solr-->
<td>
</td>
</tr>

<tr>
<td>
<b>Indexing a Single File Containing Multiple Documents via cmdline</b>
</td>

<!--gigablast-->
<td>
Use curl with the arguments listed <a href=/api.html#/admin/inject>here</a>.
</td>

<!--solr-->
<td>
Unsupported.
</td>
</tr>

<tr>
<td>
<b>Indexing an Individual File via cmdline</b>
</td>

<!--gigablast-->
<td>
Use curl to POST the file's contents with the arguments listed
<a href=/api.html#/admin/inject>here</a>.
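<br><br>
As a scripted equivalent, here is a minimal Python sketch that builds (but does not send) the POST request. The host and port (localhost:8000), the collection name <i>main</i> and the parameter names <i>c</i>, <i>u</i> and <i>content</i> are assumptions; check them against the API reference linked above. With curl, the same field could be sent with <i>--data-urlencode content@myfile.html</i>.
<pre>
```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions (not confirmed by this page): Gigablast listens on
# localhost:8000 and /admin/inject takes the collection in "c", the URL to
# file the document under in "u", and the raw document text in "content".
GB_HOST = "http://localhost:8000"


def build_file_inject(doc_url, file_bytes, collection="main"):
    """Build, but do not send, a POST request that injects one local file."""
    body = urlencode({
        "c": collection,
        "u": doc_url,
        "content": file_bytes.decode("utf-8", errors="replace"),
    }).encode("ascii")
    # urllib adds the form-urlencoded Content-Type when the request is sent.
    return Request(GB_HOST + "/admin/inject", data=body, method="POST")


req = build_file_inject("http://example.com/myfile.html", b"hello world")
# To send it: urllib.request.urlopen(req)
```
</pre>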
</td>

<!--solr-->
<td>
You can index an individual local file like this:<br>
<b>curl "http://127.0.0.1:8080/solr/update" --data-binary @myfile.html -H 'Content-type: text/html'</b><br>
but it does not seem to work unless the HTML meets certain stringent requirements.
</td>
</tr>

<tr>
<td>
<b>Indexing an Individual URL via cmdline</b>
</td>

<!--gigablast-->
<td>
Use curl to inject the URL with the arguments listed
<a href=/api.html#/admin/inject>here</a>.
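<br><br>
A minimal Python sketch that builds the inject URL for a single page. The host, the collection name <i>main</i> and the parameter names <i>c</i> and <i>u</i> are assumptions to verify against the API reference.
<pre>
```python
from urllib.parse import urlencode

# Assumptions (not confirmed by this page): Gigablast listens on
# localhost:8000 and /admin/inject takes the collection in "c" and the
# page to fetch and index in "u".
GB_HOST = "http://localhost:8000"


def inject_url(url, collection="main"):
    """Return the request URL that asks Gigablast to fetch and index one page."""
    return GB_HOST + "/admin/inject?" + urlencode({"c": collection, "u": url})


print(inject_url("http://www.example.com/"))
```
</pre>
Quote the printed URL when passing it to curl so the shell does not split it.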
</td>

<!--solr-->
<td>
</td>
</tr>

<tr>
<td>
<b>Indexing a File of URLs via cmdline</b>
</td>

<!--gigablast-->
<td>
Use one curl command per URL, using the interface described
<a href=/api.html#/admin/inject>here</a>.
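<br><br>
A minimal Python sketch of the one-request-per-URL approach: it reads URLs line by line, skips blank lines and lines starting with # (a convention chosen for this sketch), and prints one inject URL for each. The host, collection and parameter names (<i>c</i>, <i>u</i>) are assumptions.
<pre>
```python
from urllib.parse import urlencode

# Assumptions (not confirmed by this page): Gigablast listens on
# localhost:8000 and /admin/inject takes "c" (collection) and "u" (URL).
GB_HOST = "http://localhost:8000"


def inject_urls_from_lines(lines, collection="main"):
    """Yield one inject request URL per non-blank, non-comment line."""
    for line in lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue  # skip blank lines and comment lines
        yield GB_HOST + "/admin/inject?" + urlencode({"c": collection, "u": url})


sample = ["http://www.example.com/", "", "# a comment", "  http://www.example.org/a  "]
for request_url in inject_urls_from_lines(sample):
    print(request_url)
```
</pre>
To drive it from a file, pass <i>open("urls.txt")</i> as the <i>lines</i> argument and pipe the printed URLs to <i>xargs -n 1 curl -s</i>, which runs one curl invocation per URL.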
</td>

<!--solr-->
<td>
</td>
</tr>

<tr>
<td>
<b>Deleting Documents via cmdline</b>
</td>

<!--gigablast-->
<td>
Use a curl command to delete a URL, using the interface described
<a href=/api.html#/admin/inject>here</a>.
</td>

<!--solr-->
<td>
You can delete individual documents by specifying queries that match only those documents:
<b>java -Dcommit....</b>
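<br><br>
For reference, Solr's XML update format expresses a delete-by-query as a &lt;query&gt; element inside a &lt;delete&gt; element, POSTed to the /update handler. A minimal Python sketch that builds that request with proper XML escaping; the address (reused from the indexing example above), <i>commit=true</i> and the query <i>id:doc42</i> are illustrative assumptions.
<pre>
```python
import xml.etree.ElementTree as ET
from urllib.request import Request

# Assumption: Solr's update handler lives at the same address as the
# indexing example on this page; commit=true makes the delete visible at once.
SOLR_UPDATE = "http://127.0.0.1:8080/solr/update?commit=true"


def delete_by_query(query):
    """Build, but do not send, a POST of Solr's XML delete-by-query message."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query  # ElementTree escapes special characters
    body = ET.tostring(root, encoding="unicode").encode("utf-8")
    return Request(SOLR_UPDATE, data=body, method="POST",
                   headers={"Content-Type": "text/xml; charset=utf-8"})


req = delete_by_query("id:doc42")
# To send it: urllib.request.urlopen(req)
```
</pre>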
</td>
</tr>

<tr>
<td><b>Getting Results via cmdline</b></td>
<td>
Use a curl command to do a search, using the interface described
<a href=/api.html#/search>here</a>.
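<br><br>
A minimal Python sketch that builds a search URL to fetch with curl. The host, collection and parameter names (<i>q</i>, <i>c</i>, <i>n</i>, <i>format</i>) are assumptions to verify against the API reference; the sample query also uses the <i>site:</i> operator.
<pre>
```python
from urllib.parse import urlencode

# Assumptions (not confirmed by this page): Gigablast listens on
# localhost:8000 and /search takes "q" (query), "c" (collection),
# "n" (number of results) and "format" ("json" or "xml").
GB_HOST = "http://localhost:8000"


def search_url(query, collection="main", num=10, fmt="json"):
    """Return a search request URL to fetch with curl or urllib."""
    params = {"c": collection, "q": query, "n": num, "format": fmt}
    return GB_HOST + "/search?" + urlencode(params)


print(search_url("open source search site:example.com"))
```
</pre>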
</td>
<td>
</td>
</tr>

<tr>
<td><b>Faceted Search</b></td>
<td>
Coming soon.
</td>
<td>
Yes.
</td>
</tr>

<tr>
<td><b>Numeric Fields</b></td>
<td>
You can sort by numeric fields in forward or reverse order and constrain results by them.
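<br><br>
A minimal Python sketch that composes such a query. It assumes the range operators take the form <i>gbminint:field:value</i> and the sort operators the form <i>gbsortbyint:field</i> (as in the Sort by Date row); <i>price</i> is a hypothetical integer field.
<pre>
```python
# Assumptions (not confirmed by this page): range operators take the form
# gbminint:field:value / gbmaxint:field:value, sort operators take
# gbsortbyint:field / gbrevsortbyint:field, and "price" is a hypothetical
# integer field from an indexed JSON or XML document.
def numeric_query(terms, field, lo=None, hi=None, reverse=False):
    """Append integer range constraints and a sort operator to a query."""
    parts = [terms]
    if lo is not None:
        parts.append(f"gbminint:{field}:{lo}")
    if hi is not None:
        parts.append(f"gbmaxint:{field}:{hi}")
    sort_op = "gbrevsortbyint" if reverse else "gbsortbyint"
    parts.append(f"{sort_op}:{field}")
    return " ".join(parts)


print(numeric_query("camera", "price", lo=100, hi=500, reverse=True))
```
</pre>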
</td>
<td>
You can sort by numeric fields in forward or reverse order and constrain results by them.
</td>
</tr>

<tr>
<td><b>Boolean Search</b></td>
<td>
Fully nested Boolean search with AND, OR and NOT.
</td>
<td>
Fully nested Boolean search with AND, OR and NOT.
</td>
</tr>

<!-- title: inurl: -->
<tr>
<td><b>Searchable Fields</b></td>
<td>
Yes. Any meta tag, or any field when indexing JSON or XML.
</td>
<td>
</td>
</tr>

<!-- CTS -->
<tr>
<td><b>Site Restricted Searches</b></td>
<td>
Yes, using the <i>site:</i> query operator.
</td>
<td>
</td>
</tr>

<tr>
<td><b>Spell Checker</b></td>
<td>
Yes, but currently disabled until it is improved.
</td>
<td>
Yes.
</td>
</tr>

<tr>
<td><b>Language Identification</b></td>
<td>
Yes.
</td>
<td>
Yes.
</td>
</tr>

<!-- gigabits -->
<tr>
<td><b>Related Concepts</b></td>
<td>
<font color=green><b>
Yes. Called <i>Gigabits</i>.
</b></font>
</td>
<td>
No.
</td>
</tr>

<tr>
<td><b>Query Expansion (Synonyms)</b></td>
<td>
Yes. You can add your own expansion terms in the mysynonyms.txt file.
</td>
<td>
</td>
</tr>

<tr>
<td><b>Cached Pages</b></td>
<td>
Yes.
</td>
<td>
Unknown.
</td>
</tr>

<tr>
<td><b>RESTful/XML/JSON APIs</b></td>
<td>
Yes, JSON and XML.
</td>
<td>
</td>
</tr>

<tr>
<td><b>Schemas</b></td>
<td>
<font color=green><b>
You do not need to define schemas to begin indexing files and URLs.
</b></font>
</td>
<td>
You have to define a schema before indexing.
</td>
</tr>

<tr>
<td><b>Spidering</b></td>
<td>
<font color=green><b>
Gigablast has a complete web spider.
</b></font>
</td>
<td>
SOLR has no spider.
</td>
</tr>

<tr>
<td><b>Document Filters</b></td>
<td>
antiword (for Microsoft Word)<br>
pdftohtml (for PDF)<br>
xlstohtml (for Excel)<br>
ppthtml (for PowerPoint)<br>
pstotext (for PostScript)
</td>
<td>
Uses Apache Tika for several formats.
</td>
</tr>

<tr>
<td><b>Scalability</b></td>
<td>
<font color=green><b>
Highly scalable. Has scaled to over
12 billion pages while serving millions
of queries per day.
</b></font>
</td>
<td>
Has not scaled nearly as high, to our knowledge.
</td>
</tr>

<tr>
<td><b>Performance</b></td>
<td>
<font color=green><b>
High performance. Written in C/C++.
</b></font>
</td>
<td>
Slower. Written in Java, which adds garbage-collection and other runtime overhead.
</td>
</tr>

<!--
<tr>
<td><b>Configuration Files and Descriptions</b></td>
<td>
</td>
<td>
</td>
</tr>

<tr>
<td><b>Duplicate Content</b></td>
<td>
</td>
<td>
</td>
</tr>

<tr>
<td><b>Duplicate Sections</b></td>
<td>
Can remove duplicate content at spider time
or query time.
</td>
<td>
</td>
</tr>

<tr>
<td><b>Section Classification</b></td>
<td>
</td>
<td>
</td>
</tr>
-->

<!--
<tr>
<td><b>Phrases</b></td>
<td>
</td>
<td>
</td>
</tr>

<tr>
<td><b>Query Weighting</b></td>
<td>
</td>
<td>
</td>
</tr>
-->

<!--
<tr>
<td><b>Index Layout</b></td>
<td>
</td>
<td>
</td>
</tr>
-->

<tr>
<td><b>Ranking Algorithm</b></td>
<td>
<font color=green><b>
Custom algorithm based on query-term proximity. Superior to TF/IDF or cosine-similarity methods.
</b></font>
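<br><br>
To illustrate why proximity matters, a toy Python sketch: textbook TF/IDF (the sum of tf * log(N/df), not Lucene's exact formula) scores two pages identically when they contain the same terms, while the width of the shortest window covering all query terms separates "new york" from a page where the two words are far apart. This is not Gigablast's actual ranking algorithm.
<pre>
```python
import math
from collections import Counter


def tf_idf(query, doc, corpus):
    """Bag-of-words score: sum of tf * log(N / df); term positions are ignored."""
    counts = Counter(doc)
    n_docs = len(corpus)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)
        if df:
            score += counts[term] * math.log(n_docs / df)
    return score


def min_span(query, doc):
    """Width of the shortest window of doc containing every query term, or None."""
    needed = set(query)
    best = None
    for start in range(len(doc)):
        seen = set()
        for end in range(start, len(doc)):
            if doc[end] in needed:
                seen.add(doc[end])
            if seen == needed:
                width = end - start + 1
                if best is None or width < best:
                    best = width
                break
    return best


corpus = [
    "new york pizza is great".split(),
    "york has pizza and new bagels".split(),
    "chicago deep dish".split(),
]
query = ["new", "york"]
for doc in corpus[:2]:
    print(round(tf_idf(query, doc, corpus), 3), min_span(query, doc))
```
</pre>
For the first two sample pages the TF/IDF scores are equal, while the shortest windows are 2 and 5 words wide.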
</td>
<td>
Traditional TF/IDF based on simple term statistics.
</td>
</tr>

<tr>
<td><b>Scoring Explanations</b></td>
<td>
Complete scoring information provided.
</td>
<td>
Complete scoring information provided.
</td>
</tr>

<tr>
<td><b>Inlink Text</b></td>
<td>
<font color=green><b>
Indexes incoming link text and compensates for link spam.
</b></font>
</td>
<td>
None. Not geared for web search.
</td>
</tr>

<tr>
<td><b>Page Rank</b></td>
<td>
<font color=green><b>
Uses <i>Site Rank</i>, based on the number of incoming links to a site
from other sites. Detects link spam and compensates accordingly.
</b></font>
</td>
<td>
None. Not geared for web search.
</td>
</tr>

<tr>
<td><b>On-Page Spam</b></td>
<td>
Demotes terms deemed spammy on a page.
</td>
<td>
None.
</td>
</tr>

<tr>
<td><b>Reliability</b></td>
<td>
Pretty good.
</td>
<td>
Pretty good.
</td>
</tr>

<!--
<tr>
<td><b>Administration</b></td>
<td>
Simple web-based GUI and API.
</td>
<td>
</td>
</tr>
-->

<!--
<tr>
<td><b>File Descriptions</b></td>
<td>
</td>
<td>
</td>
</tr>
-->

<tr>
<td><b>Developer Documentation</b></td>
<td>
Yes. <a href=/developer.html>Here</a>.
</td>
<td>
Yes. Lots of documentation.
</td>
</tr>

<tr>
<td><b>Graphing</b></td>
<td>
Graphs performance of various subroutines and query times.
</td>
<td>
Unknown.
</td>
</tr>

<tr>
<td><b>Monitoring</b></td>
<td>
<font color=green><b>
Monitors drive temperature, disk space, query latency and shard uptime. Sends email alerts.
</b></font>
</td>
<td>
None known.
</td>
</tr>

<tr>
<td><b>Geospatial</b></td>
<td>
Yes, by applying the numeric <i>gbminint:</i> and <i>gbmaxint:</i> query operators to latitude/longitude fields.
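<br><br>
A minimal Python sketch of a bounding-box query built from those operators. It assumes documents carry hypothetical integer fields <i>lat</i> and <i>lon</i> holding millionths of a degree (since the operators are integer-based) and the <i>gbminint:field:value</i> operator form; boxes that cross the 180th meridian are not handled.
<pre>
```python
# Assumptions (not confirmed by this page): documents carry hypothetical
# integer fields "lat" and "lon" in millionths of a degree, and operators
# use the form gbminint:field:value / gbmaxint:field:value.
SCALE = 1_000_000


def to_micro(deg):
    """Convert decimal degrees to the assumed integer micro-degree encoding."""
    return round(deg * SCALE)


def bbox_query(terms, south, west, north, east):
    """Restrict a query to a latitude/longitude bounding box."""
    return " ".join([
        terms,
        f"gbminint:lat:{to_micro(south)}",
        f"gbmaxint:lat:{to_micro(north)}",
        f"gbminint:lon:{to_micro(west)}",
        f"gbmaxint:lon:{to_micro(east)}",
    ])


print(bbox_query("pizza", 40.70, -74.02, 40.80, -73.93))
```
</pre>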
</td>
<td>
Yes.
</td>
</tr>

<tr>
<td><b>Dynamic Summaries</b></td>
<td>
Yes. Summaries contain the query terms.
</td>
<td>
Yes. Summaries contain the query terms.
</td>
</tr>

<tr>
<td><b>Site Clustering</b></td>
<td>
Yes.
</td>
<td>
Unknown.
</td>
</tr>

<tr>
<td><b>More Like This</b></td>
<td>
Coming soon.
</td>
<td>
Yes.
</td>
</tr>

<tr>
<td><b>Sort by Date</b></td>
<td>
<i>gbsortbyint:gbspiderdate</i><br>
<i>gbsortbyint:gbindexdate</i><br>
<i>gbrevsortbyint:gbspiderdate</i><br>
<i>gbrevsortbyint:gbindexdate</i>
</td>
<td>
</td>
</tr>

<tr>
<td><b>Query Completion</b></td>
<td>
Coming soon.
</td>
<td>
Available with an additional module.
</td>
</tr>

<tr>
<td><b>Document Collections</b></td>
<td>
<font color=green><b>
Supports tens of thousands of separate collections,
and federated search across them.
</b></font>
</td>
<td>
</td>
</tr>

</table>