An open source search engine and spider/crawler written in C/C++ for Linux on Intel/AMD. See the README.md file at the very bottom of this page for instructions.
antiword-dir Initial file population. 2013-08-02 13:12:24 -07:00
diffbot-widget widget updates 2014-04-21 09:21:28 -07:00
doxygen codespell: spelling corrections 2021-05-06 01:52:55 +10:00
html Merge pull request #175 from onlyjob/codespell 2021-05-09 10:28:31 -06:00
junkdrawer Move ancient script dir to junkdrawer, create a new gen_cluster_config cmd line tool in fresh script dir 2024-07-02 00:08:44 -04:00
script Move ancient script dir to junkdrawer, create a new gen_cluster_config cmd line tool in fresh script dir 2024-07-02 00:08:44 -04:00
src Add more tests to Abbreviations and some more minor cleanup 2024-12-22 18:53:33 -05:00
test Add some testing for HashTableX 2024-12-22 20:54:06 -05:00
ucdata Initial file population. 2013-08-02 13:12:24 -07:00
.gitignore Start to get CMake working 2024-06-11 07:32:31 -04:00
antiword fix calls to antiword and pdftohtml etc. 2014-06-15 17:44:52 -07:00
badcattable.dat Initial file population. 2013-08-02 13:12:24 -07:00
bmptopnm Initial file population. 2013-08-02 13:12:24 -07:00
catcountry.dat Initial file population. 2013-08-02 13:12:24 -07:00
character-sets Initial file population. 2013-08-02 13:12:24 -07:00
CMakeLists.txt Cleanup Linkdb.cpp/h, remove much commented code. IMPORTANT: removed 2024-06-30 00:45:51 -04:00
control.deb package bldg updates 2014-06-16 21:50:32 -06:00
copyright.head package bldg updates 2014-06-16 21:50:32 -06:00
copyright.tail package bldg updates 2014-06-16 21:50:32 -06:00
Dockerfile Add a dockerfile and entrypoint to make it easy to play with 2024-04-10 23:26:38 -04:00
entrypoint.sh Add a dockerfile and entrypoint to make it easy to play with 2024-04-10 23:26:38 -04:00
gb-1.0.spec make it so we don't need --nodeps with 2014-05-25 22:08:46 -04:00
gb.deb.rules if netpbm pkg already installed use it. 2014-07-06 09:54:28 -07:00
gb.pem make: build and check "gb.pem"; updated expired "gb.pem" (Closes: #178). 2021-05-09 11:08:19 +10:00
giftopnm Initial file population. 2013-08-02 13:12:24 -07:00
gigablast.cbp Rename Errno.h/Errno.cpp to GbErrno.h/GbErrno.cpp to keep from conflicting with errno.h on case-insensitive filesystems 2023-11-04 16:57:58 -04:00
gigablast.layout added Codeblocks project file 2014-10-31 11:00:18 -07:00
init.gb.conf minor make install changes 2014-05-22 18:46:38 -07:00
injectme3 added injectme3 file and documentation into compare.html 2013-08-17 11:02:26 -06:00
injectmedemo fix sections.cpp to not set root title section 2014-12-11 19:54:33 -08:00
jpegtopnm Initial file population. 2013-08-02 13:12:24 -07:00
libjpeg.so.62 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libnetpbm.so.10 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libpng12.so.0 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
libtiff.so.4 thumbnail generation support back in. 2014-04-24 10:13:45 -07:00
LICENSE license fix 2014-06-16 13:52:51 -07:00
Make.depend Separate sources that are used to build gb, put them in src, and put everything else in the junkdrawer dir. Start a new/clean Makefile. 2024-05-15 22:29:34 -04:00
Makefile Fix stack dump on crash to dump symbols rather than log addr2line command. Add gbassert() so the purposeful segfaults can be replaced by just raising a SIGSEGV & it's purpose will be more clear 2024-05-24 15:53:16 -04:00
mysynonyms.txt mysyn fixes 2015-04-22 08:34:29 -06:00
parse_iana_charsets.pl move CollectionRec stuff into Collectiondb files 2013-12-10 15:28:04 -08:00
pdftohtml fix rdbcache init core 2014-12-01 12:37:51 -08:00
pngtopnm Initial file population. 2013-08-02 13:12:24 -07:00
pnmscale Initial file population. 2013-08-02 13:12:24 -07:00
postalCodes.txt Initial file population. 2013-08-02 13:12:24 -07:00
ppmtojpeg Initial file population. 2013-08-02 13:12:24 -07:00
pstotext Initial file population. 2013-08-02 13:12:24 -07:00
README.md Update readme with cmake/catch2 2024-08-16 00:24:25 -04:00
S99gb added S99gb for loading at boot. 2014-06-23 07:32:38 -06:00
sitelinks.txt fixed missing sites in sitelinks.txt 2015-03-05 20:32:01 -08:00
supported_charsets.txt Initial file population. 2013-08-02 13:12:24 -07:00
tifftopnm Initial file population. 2013-08-02 13:12:24 -07:00
unifiedDict.txt Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part1 Initial file population. 2013-08-02 13:12:24 -07:00
wikititles.txt.part2 Initial file population. 2013-08-02 13:12:24 -07:00
wiktionary-buf.txt when user searches for a word without the 2014-06-01 09:37:00 -07:00
wiktionary-lang.txt when user searches for a word without the 2014-06-01 09:37:00 -07:00
wiktionary-syns.dat when user searches for a word without the 2014-06-01 09:37:00 -07:00

open-source-search-engine

An open source web search engine and spider/crawler. This was once the codebase for a search engine called Gigablast, but the site is no longer operational. This is a fork of the original codebase located at https://github.com/gigablast/open-source-search-engine

Quick Start

To experiment quickly, launch the server with Docker:

docker run -p 8000:8000 -it --rm moldybits/open-source-search-engine

To preserve data between runs, mount a host directory as a volume:

docker run -p 8000:8000 -it --rm -v $(pwd)/data:/var/gigablast/data0 moldybits/open-source-search-engine

Major changes in this fork

  • Cleanup! - Moved the sources that are actually used into the src dir. Everything else has been stuffed into the junkdrawer dir.
  • More cleanup - formatting, removing TONS of commented code, fixing some segfaults. This is ongoing...
  • I have replaced the original Makefile with CMake. The build now installs the required files into the build directory, so you can execute ./gb there and run a test server without it borking your source dir.
  • Stubbed out some test scaffolding so tests can be built once this gets cleaned up enough to start making "real" changes.

Building

This does not build on ARM and does not work correctly on modern versions of macOS, though it appears to have been supported at one time.

Install Catch2

git clone https://github.com/catchorg/Catch2.git
cd Catch2
cmake -Bbuild -H. -DBUILD_TESTING=OFF
sudo cmake --build build/ --target install

Debian or Ubuntu

sudo apt-get install make g++ libssl-dev libz-dev cmake

RedHat or AlmaLinux

Last tried with AlmaLinux 9

sudo yum install gcc-c++ openssl-devel libz-devel cmake
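
Before building, it can help to confirm that the command-line tools used in this README are on your PATH. The snippet below is a minimal sketch, not part of this repository: check_tools is a hypothetical helper, and the tool list (git, cmake, make, g++) is an assumption drawn from the commands shown here. It does not verify that the OpenSSL and zlib development headers are installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): report which of the given
# commands are missing from PATH. Returns 0 if all are present, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed tool list, taken from the commands in this README; adjust as needed.
if check_tools git cmake make g++; then
    echo "all build tools found"
fi
```

If anything is reported missing, install it with the package command for your distribution above and run the check again.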

Build

cd open-source-search-engine
cmake -Bbuild
cmake --build build/

Issues & Pull Requests

Issues and pull requests should be filed at https://github.com/twistdroach/open-source-search-engine

Testing

Tests can be put in the test directory. I have written a few simple examples just to make sure it (mostly) works.

Documentation

There are various docs located in the html directory. The FAQ & developer.html are particularly interesting.