Commit Graph

  • 93f9cb9478 Try to fix shared library linkage Ai Lin Chia 2017-07-07 13:59:55 +02:00
  • b969a864e6 Made domain-like query rewrite configurable Ivan Skytte Jørgensen 2017-07-07 13:49:31 +02:00
  • e1811b40ce Build c-ares locally Ai Lin Chia 2017-07-07 11:41:30 +02:00
  • c612582c4c Remove now unused code for getting IP with MsgC in XmlDoc Ai Lin Chia 2017-07-07 10:09:28 +02:00
  • 9ab64d6f4f First implementation of getting ns/a records using c-ares Ai Lin Chia 2017-07-07 09:32:30 +02:00
  • dbfa031c6d Merge branch 'master' into dev-dns Ai Lin Chia 2017-07-07 09:10:53 +02:00
  • 892edba59e Remove system c-ares library as dependency Ai Lin Chia 2017-07-06 17:24:53 +02:00
  • 40fc0c49df Add c-ares as submodule Ai Lin Chia 2017-07-06 17:23:04 +02:00
  • c3d53d6cf2 Point submodules to our own forked version Ai Lin Chia 2017-07-06 16:44:54 +02:00
  • 3a890f2895 Add allowindexpage & allowrootpages criteria to urlblocklist for domain Ai Lin Chia 2017-07-06 13:17:20 +02:00
  • 60b5ea11db corrected display of language boost in scoring for pages with unknown language Brian Rasmusson 2017-07-06 11:51:32 +02:00
  • e3f4925066 Dropped sanity-check in XmlDoc::set2() which could never be hit Ivan Skytte Jørgensen 2017-07-06 14:11:37 +02:00
  • 52ed7f9c44 Merge branch 'master' into dev-dns Ai Lin Chia 2017-07-06 13:52:20 +02:00
  • 4676193dff Add allowindexpage & allowrootpages criteria to urlblocklist for domain Ai Lin Chia 2017-07-06 13:17:20 +02:00
  • e82db4c17e bugfix (harmless) is_alnum_utf8_string() Ivan Skytte Jørgensen 2017-07-06 12:28:30 +02:00
  • c351a8a6c9 corrected display of language boost in scoring for pages with unknown language Brian Rasmusson 2017-07-06 11:51:32 +02:00
  • 2a8b194fb2 Merge branch 'master' into dev-dns Ai Lin Chia 2017-07-06 11:38:54 +02:00
  • 4f51df45bb Code style changes Ai Lin Chia 2017-07-06 10:45:19 +02:00
  • b9b8fd5e2a Merge remote-tracking branch 'origin/staging' into stable Brian Rasmusson 2017-07-05 17:20:45 +02:00
  • acba01a9d8 modified url filters to detect certain errors faster Brian Rasmusson 2017-07-05 17:17:59 +02:00
  • b16a11d06e added ETRYAGAIN, EHOSTDEAD and EINTERNALERROR to list of temporary errors Brian Rasmusson 2017-07-05 17:01:28 +02:00
  • 9bd6287e1f removed 'isdocidbased' URL filter expression as it was the same as 'isreindex' Brian Rasmusson 2017-07-05 16:00:53 +02:00
  • 57e5e61cb3 dump spiderdb: changed format of timestamps to more grep'able and sortable YYYYMMDD-HHMISS format Brian Rasmusson 2017-07-05 11:56:43 +02:00
  • 4aee65813d added sameerrcnt to spider info log line and changed format of logged timestamps to more grep'able and sortable YYYYMMDD-HHMISS format Brian Rasmusson 2017-07-05 11:47:31 +02:00
  • 2a838ec3cc modified url filters to detect certain errors faster Brian Rasmusson 2017-07-05 17:17:59 +02:00
  • 2ec28807fd added ETRYAGAIN, EHOSTDEAD and EINTERNALERROR to list of temporary errors Brian Rasmusson 2017-07-05 17:01:28 +02:00
  • 82ba170705 removed 'isdocidbased' URL filter expression as it was the same as 'isreindex' Brian Rasmusson 2017-07-05 16:00:53 +02:00
  • fb705d163f dump spiderdb: changed format of timestamps to more grep'able and sortable YYYYMMDD-HHMISS format Brian Rasmusson 2017-07-05 11:56:43 +02:00
  • 83e96daafb added sameerrcnt to spider info log line and changed format of logged timestamps to more grep'able and sortable YYYYMMDD-HHMISS format Brian Rasmusson 2017-07-05 11:47:31 +02:00
  • 34e25e9c9d Use case insensitive compare for dnsblocklist Ai Lin Chia 2017-07-05 11:46:33 +02:00
  • fd8690d395 Remove some unused fields Ai Lin Chia 2017-07-05 11:38:12 +02:00
  • 3efde8a60d Merge branch 'master' into dev-dns Ai Lin Chia 2017-07-05 11:27:56 +02:00
  • 2ec678d628 Remove now unused configuration to index spider reply. Remove commented out code. Remove now unused fields. Ai Lin Chia 2017-07-05 11:22:26 +02:00
  • 9f2f00eec6 Remove indexing of status doc, remove commented out code, simplify some statements, remove unused variables Ai Lin Chia 2017-07-05 11:10:25 +02:00
  • eeeed15067 merged spidering related changes from staging Brian Rasmusson 2017-07-04 16:12:01 +02:00
  • 04ba9ba3a6 removed description of obsolete 'age' URL filter expression, and added description of new 'sameerrorcount' Brian Rasmusson 2017-07-04 13:07:11 +02:00
  • 1258456130 added version to SpiderRequest and SpiderReply. Added constructor to SpiderRequest and SpiderReply that initializes all members Brian Rasmusson 2017-07-04 12:51:18 +02:00
  • 515206a3a4 Initial work to store count of consecutive errors of the same kind: Brian Rasmusson 2017-07-04 12:05:08 +02:00
  • 7942ac43fd Added more info to spider debug log and made log lines more grep'able. Added lock and readable timestamp to printWaitingTree. Added more comments to spider code. Brian Rasmusson 2017-07-03 20:00:56 +02:00
  • 4579cae08d Initialize firstIp by using getFirstIp instead of m_firstIp directly Ai Lin Chia 2017-07-03 12:06:48 +02:00
  • c3013411c5 Query rewriting: domains Ivan Skytte Jørgensen 2017-07-04 15:14:53 +02:00
  • ad0c136cdb Merge branch 'master' into tight_words_boost Ivan Skytte Jørgensen 2017-07-04 15:04:50 +02:00
  • b3b7d3380a bugfix is_alnum_utf8_string() Ivan Skytte Jørgensen 2017-07-04 15:02:22 +02:00
  • 0c20494613 Merge branch 'master' into tight_words_boost Ivan Skytte Jørgensen 2017-07-04 14:55:56 +02:00
  • a0296b8091 Added is_alnum_utf8_string() Ivan Skytte Jørgensen 2017-07-04 14:55:36 +02:00
  • ff3ba9674d #const ptr for has_alpha_utf8() Ivan Skytte Jørgensen 2017-07-04 14:48:10 +02:00
  • 3c3fa00894 removed description of obsolete 'age' URL filter expression, and added description of new 'sameerrorcount' Brian Rasmusson 2017-07-04 13:07:11 +02:00
  • 27f614d917 added version to SpiderRequest and SpiderReply. Added constructor to SpiderRequest and SpiderReply that initializes all members Brian Rasmusson 2017-07-04 12:51:18 +02:00
  • 46e468fbd7 Add check for out of memory error for buffer used by cld3 Ai Lin Chia 2017-07-04 12:34:09 +02:00
  • b378faef10 Initial work to store count of consecutive errors of the same kind: Brian Rasmusson 2017-07-04 12:05:08 +02:00
  • ebe2492e85 Merge branch 'master' into dev-dns Ai Lin Chia 2017-07-04 11:19:43 +02:00
  • be4c93c0dd Added more info to spider debug log and made log lines more grep'able. Added lock and readable timestamp to printWaitingTree. Added more comments to spider code. Brian Rasmusson 2017-07-03 20:00:56 +02:00
  • daea451c75 Added more info to spider debug log and made log lines more grep'able. Added lock and readable timestamp to printWaitingTree. Added more comments to spider code. Brian Rasmusson 2017-07-03 20:00:56 +02:00
  • c1d53c73d0 Merge branch 'master' into dev-dns Ai Lin Chia 2017-07-03 18:15:50 +02:00
  • 1f2c4d524b Make sure tools can compile with the latest library additions Ai Lin Chia 2017-07-03 18:14:50 +02:00
  • be1a90abbe Fix status message (don't have duplicates) Ai Lin Chia 2017-07-03 18:14:29 +02:00
  • c5d9357975 Remove commented out code Ai Lin Chia 2017-07-03 18:14:19 +02:00
  • eba5dadde2 Check urlblocklist during start of indexing document. If blocked, simulate a force delete. Placeholder for checking dnsblocklist. Ai Lin Chia 2017-07-03 17:08:46 +02:00
  • fcd39f354d Split EDOCBLOCKED error codes into two (for url & dns) Ai Lin Chia 2017-07-03 16:52:25 +02:00
  • b01e596bbc Support userWeight in scoring Ivan Skytte Jørgensen 2017-07-03 15:41:15 +02:00
  • 3bc3fcb1f9 Merge branch 'master' into tight_words_boost Ivan Skytte Jørgensen 2017-07-03 15:22:05 +02:00
  • 899112a0bf Missing use of helper.syn instead of Posdb::getIsSynonym() Ivan Skytte Jørgensen 2017-07-03 15:20:31 +02:00
  • 77886d34a4 Dropped the Posdb::getKeySize()!=6 loop end conditions Ivan Skytte Jørgensen 2017-07-03 14:59:43 +02:00
  • 0691e6cbab Merge again? Ivan Skytte Jørgensen 2017-07-03 14:58:26 +02:00
  • 3d0780c61c Make posdb/mergebuf pointer advancement uniform Ivan Skytte Jørgensen 2017-07-03 14:38:14 +02:00
  • f9aa714398 Added todo-comment about possible incorrect +6 offset in inlinking-siterank logic Ivan Skytte Jørgensen 2017-07-03 14:20:45 +02:00
  • f65128b825 Add libc-ares-dev to travis Ai Lin Chia 2017-07-03 14:16:00 +02:00
  • bb52f4282a Merge branch 'master' into dev-dns Ai Lin Chia 2017-07-03 14:13:42 +02:00
  • 5115ca7ca6 Don't use the stack for potentially large allocations Ai Lin Chia 2017-07-03 13:19:49 +02:00
  • 6c0d2822db Merge branch 'master' into tight_words_boost Ivan Skytte Jørgensen 2017-07-03 13:00:36 +02:00
  • 7e8ad5f017 Replaced 'do the math' with a simple abs() call Ivan Skytte Jørgensen 2017-06-30 17:34:51 +02:00
  • 27002d8a21 Simplify/align code in PosdbTable by introducing PosdbDecodeHelper Ivan Skytte Jørgensen 2017-06-30 17:31:46 +02:00
  • 07e0d36142 Initialize firstIp by using getFirstIp instead of m_firstIp directly Ai Lin Chia 2017-07-03 12:06:48 +02:00
  • 0485f06f5c Initialize firstIp by using getFirstIp instead of m_firstIp directly Ai Lin Chia 2017-07-03 12:06:48 +02:00
  • 0c2b23dd62 Add error number for blocked documents Ai Lin Chia 2017-06-30 17:43:25 +02:00
  • 2e6b846386 Add first version of DnsBlockList based on UrlBlockList Ai Lin Chia 2017-06-30 12:49:02 +02:00
  • 8fe0e198b4 Fix includes, fix log message, rename urlRegexList to urlBlockList Ai Lin Chia 2017-06-30 12:46:24 +02:00
  • 9d326144a6 Merge branch 'master' into dev-dns Ai Lin Chia 2017-06-30 11:59:05 +02:00
  • 03a02f26a1 Move log rotation out of logR Ai Lin Chia 2017-06-30 11:37:04 +02:00
  • 7f49a5aa5e Remove commented out code Ai Lin Chia 2017-06-30 11:20:22 +02:00
  • 55985c2cf1 Add newly added languages into the allowed list Ai Lin Chia 2017-06-29 16:55:32 +02:00
  • cea8d5b96e more const + syntax error fix Ivan Skytte Jørgensen 2017-06-29 15:30:58 +02:00
  • e8b5601924 Merge branch 'tight_words_boost' of github.com:privacore/open-source-search-engine into tight_words_boost Ivan Skytte Jørgensen 2017-06-29 15:29:03 +02:00
  • f790989caa PosdbTable: moved minimergebuf et al into a dedicated struct Ivan Skytte Jørgensen 2017-06-29 15:19:58 +02:00
  • 335f479705 Don't set S(ynonym) bit in bigram posdb entries. Ivan Skytte Jørgensen 2017-06-27 16:34:30 +02:00
  • a8108ad20c Removed "hack of confusion" in PosdbTable.cpp Ivan Skytte Jørgensen 2017-06-27 16:19:19 +02:00
  • c846624508 More trace info in PosdbTable.cpp Ivan Skytte Jørgensen 2017-06-20 16:49:24 +02:00
  • e3e0f0b740 keep track of where a qti matchingsublist came from (for later referencing the qt->* fields (weights etc.)) Ivan Skytte Jørgensen 2017-06-20 16:41:57 +02:00
  • ef2eadcbc0 Added more strategic VALGRIND_MAKE_MEM_UNDEFINED calls Ivan Skytte Jørgensen 2017-06-20 14:07:19 +02:00
  • 847bf8e899 Changed endless if-chains for qt->m_fieldCode into plain switch() Ivan Skytte Jørgensen 2017-06-20 13:35:52 +02:00
  • 15d3da74c1 Eliminated QueryTermInfo::m_totalSubListsSize Ivan Skytte Jørgensen 2017-06-20 12:51:47 +02:00
  • 883d9fe5e9 More const in PosdbTable.cpp Ivan Skytte Jørgensen 2017-06-20 12:41:25 +02:00
  • f1b9719a08 Restructured QueryTermInfo Ivan Skytte Jørgensen 2017-06-20 12:28:50 +02:00
  • 7ba8c92496 Changed quer:* m_fieldCode from char to an enum Ivan Skytte Jørgensen 2017-06-19 16:25:14 +02:00
  • 9f1aba01f4 Actually find the first position match of a qti, and not the first position in the last list of a qti Ivan Skytte Jørgensen 2017-06-19 14:46:44 +02:00
  • 466b58f65b Factor out common sublist iterator from PosdbTable::prefilterMaxPossibleScoreByDistance() to setRingbufFromQTI() Ivan Skytte Jørgensen 2017-06-19 14:45:14 +02:00
  • cfbb904e94 Use Posdb::getWordPos() in PosdbTable.cpp Ivan Skytte Jørgensen 2017-06-19 14:27:52 +02:00
  • e4fa4e1d97 Move local variable declarations nearer to first use Ivan Skytte Jørgensen 2017-06-19 13:11:59 +02:00
  • 3bb9ae1d3c if...continue -> if {} Ivan Skytte Jørgensen 2017-06-16 15:53:15 +02:00
  • 60dbb43dbc Moved local variable decl to first use Ivan Skytte Jørgensen 2017-06-16 14:57:35 +02:00