Commit Graph

  • e0eb47a840 Remove commented out log("hey") Ai Lin Chia 2016-05-02 16:14:57 +02:00
  • ae53b62bc7 Remove commented out codes, code style changes, remove unused methods Ai Lin Chia 2016-05-02 16:03:40 +02:00
  • b094328b62 Code style changes Ai Lin Chia 2016-05-02 15:38:43 +02:00
  • c762a1b4bb Remove commented out g_sync. Code style changes. Remove unused variable. Ai Lin Chia 2016-05-02 14:07:14 +02:00
  • 320d38de5a Code style changes Ai Lin Chia 2016-05-02 13:51:48 +02:00
  • 62910c2fc7 Use logTrace in BigFile Ai Lin Chia 2016-05-02 12:41:16 +02:00
  • 3a56e7f670 Add m_spiderdbMinFilesToMerge & split out db files configuration into separate page Ai Lin Chia 2016-05-02 12:18:42 +02:00
  • 6f7bdb52c6 Remove unused variable from Parms Ai Lin Chia 2016-05-02 11:51:04 +02:00
  • 61c8f7a49f Add logTrace to RdbBase Ai Lin Chia 2016-05-02 11:45:02 +02:00
  • d3db490590 Remove getValueAs* functions Ai Lin Chia 2016-05-02 11:37:32 +02:00
  • dd2fb62d6c Code style changes Ai Lin Chia 2016-04-29 14:29:06 +02:00
  • 0bbf3a4381 Code style changes Ai Lin Chia 2016-04-29 14:21:21 +02:00
  • 830abf2c36 Remove unused m_allowHttps config Ai Lin Chia 2016-04-29 11:08:24 +02:00
  • 5e93d023d3 Remove unused config variable Ai Lin Chia 2016-04-29 11:05:30 +02:00
  • 79fde3eb1d Use logTrace Ai Lin Chia 2016-04-29 10:36:39 +02:00
  • 4859c18c18 Code style changes Ai Lin Chia 2016-04-29 10:05:14 +02:00
  • 3ffbf5c3c1 Code style changes, remove commented out codes Ai Lin Chia 2016-04-29 10:05:00 +02:00
  • 73ac44b57e Use logDebug, code style changes, remove obsolete codes Ai Lin Chia 2016-04-29 10:03:16 +02:00
  • a7a533334d Code style changes Ai Lin Chia 2016-04-29 09:40:57 +02:00
  • cd62bb442e Fix some stat error in dump spiderdb Ai Lin Chia 2016-04-28 16:26:26 +02:00
  • 22cbd2d46e Move isRobotsTxt into caller function Ai Lin Chia 2016-04-28 12:02:08 +02:00
  • 7e70d56456 Code style changes Ai Lin Chia 2016-04-28 10:33:14 +02:00
  • 4b16f4e453 Minor code style changes Ai Lin Chia 2016-04-27 15:16:49 +02:00
  • 7db27601e5 Remove commented out code in Doledb Ai Lin Chia 2016-04-27 15:16:16 +02:00
  • d0884aac28 Remove commented out code. Simplify some statements. Convert some goto to loops Ai Lin Chia 2016-04-27 15:15:50 +02:00
  • 39179782be Remove _CHECK_FORMAT_STRING_ Ai Lin Chia 2016-04-27 15:04:09 +02:00
  • 6beb443f8f Use logTrace & logDebug for SpiderColl & SpiderLoop. Change a few goto to for loops. Ai Lin Chia 2016-04-27 15:03:18 +02:00
  • e8140d8cab Use logTrace for Spider. Rename getUrlFilterNum2 to getUrlFilterNum and remove original getUrlFilterNum which is a passthrough to getUrlFilterNum2. Ai Lin Chia 2016-04-27 14:43:20 +02:00
  • 9f27f7283b Remove commented out code in Msg4 Ai Lin Chia 2016-04-27 14:11:21 +02:00
  • 70aa4e0f99 Remove unused m_buf & m_bufPtr from Log. Add logTrace & logDebug macro Ai Lin Chia 2016-04-27 14:10:32 +02:00
  • aebda6002f Remove commented out code in Collectiondb Ai Lin Chia 2016-04-27 14:09:39 +02:00
  • a2a593f186 Remove commented out code in PageReindex Ai Lin Chia 2016-04-26 13:45:55 +02:00
  • b1b0abd039 Add constness to strnstr. Remove unused methods. Ai Lin Chia 2016-04-26 11:36:38 +02:00
  • 9fdfd046a1 Rename strnstr2 to strnstr Ai Lin Chia 2016-04-26 11:18:18 +02:00
  • 40cc294182 Remove gettimeofdayInMilliseconds_force which is the same as gettimeofdayInMilliseconds Ai Lin Chia 2016-04-26 10:36:36 +02:00
  • f4b598daea Remove unused doLoadBalancing variable Ai Lin Chia 2016-04-26 00:40:32 +02:00
  • e6e0811725 Remove commented out code in XmlDoc_Indexing Ai Lin Chia 2016-04-25 23:58:16 +02:00
  • 9ca13201ba Remove commented out code in XmlDoc Ai Lin Chia 2016-04-25 23:54:59 +02:00
  • 315020c651 Remove commented out code in SpiderLoop Ai Lin Chia 2016-04-25 23:54:12 +02:00
  • 78cb387b2c Remove commented out code in Msg22 Ai Lin Chia 2016-04-25 23:53:38 +02:00
  • adcbadefb1 Remove commented out codes & minor code style changes Ai Lin Chia 2016-04-21 12:35:52 +02:00
  • be46b7bbec Modify goto to a simple for loop Ai Lin Chia 2016-04-20 16:09:59 +02:00
  • e8f76c601c Remove commented out code Ai Lin Chia 2016-04-20 15:35:19 +02:00
  • 6116fc502f Remove always true m_forwardDownloadRequest variable Ai Lin Chia 2016-04-20 14:38:33 +02:00
  • 5e75b2c607 Remove unused variable & commented out code Ai Lin Chia 2016-04-20 14:32:04 +02:00
  • 70614070b4 Remove commented out code Ai Lin Chia 2016-04-20 13:58:54 +02:00
  • af43c7d1d2 Remove hardcoded google recaptcha page check. We should only hit that page if we're scraping google. Ai Lin Chia 2016-04-20 13:57:39 +02:00
  • 15fdea69d0 Remove unused variable Ai Lin Chia 2016-04-20 13:33:22 +02:00
  • d77f7c69ee Add a more restrictive check for who query parameter Ai Lin Chia 2016-04-20 13:32:39 +02:00
  • 2cb2f4e877 Normalize url. Don't encode characters which are not supposed to be encoded. Ai Lin Chia 2016-04-19 16:03:11 +02:00
  • d09ed9f7f2 Rename MANDATORY_HEX to MANDATORY_ALPHA_HEX Ai Lin Chia 2016-04-19 10:55:46 +02:00
  • bf57e59590 Use queries map when we're not matching partial/case sensitive. Remove more tracking params. Combine strip session id & tracking param into one setting. Doesn't make sense to have 2 separate setting. Ai Lin Chia 2016-04-18 17:30:26 +02:00
  • b24efd2b98 Add more unit test & cater for more session id types Ai Lin Chia 2016-04-18 10:53:18 +02:00
  • 940b74f2e2 Split out path & path param since we can have a much more relaxed criteria for path param. Ai Lin Chia 2016-04-15 15:59:49 +02:00
  • 32120e2ef8 Separate strip tracking param unit test into smaller chunks Ai Lin Chia 2016-04-14 23:21:10 +02:00
  • d0be4f0e1c Fix compilation warnings on debug messages Ai Lin Chia 2016-04-14 23:10:18 +02:00
  • 52ec5e755e Initial commit of UrlParser implementation. Separated TitleRec version from Titledb.h Ai Lin Chia 2016-04-14 23:05:44 +02:00
  • 3e12649824 Whitespace changes in comments Ai Lin Chia 2016-04-10 11:44:03 +02:00
  • 5beba83a4e constness Ivan Skytte Jørgensen 2016-05-10 15:35:40 +02:00
  • c65dd173b9 constness Ivan Skytte Jørgensen 2016-05-10 15:29:37 +02:00
  • f63e84be23 cleanup XmlDoc::setSpam() Ivan Skytte Jørgensen 2016-05-10 12:38:33 +02:00
  • d5abecd3f6 clean up XmlDoc::getProbSpam() Ivan Skytte Jørgensen 2016-05-10 12:27:45 +02:00
  • 02d39254f0 Moved XmlDoc::getProbSpam() from header to .cpp Ivan Skytte Jørgensen 2016-05-10 12:17:24 +02:00
  • 226594ec2e constness Ivan Skytte Jørgensen 2016-05-10 12:15:55 +02:00
  • 9dd5b805a0 Make size calculation clearer with sizeof() Ivan Skytte Jørgensen 2016-05-10 12:11:36 +02:00
  • 5d99ad4b46 Scunthorpe is not an adult word Ivan Skytte Jørgensen 2016-05-09 16:07:52 +02:00
  • 9ef903104d Refactored XmlDoc's adult-check to a separate file Ivan Skytte Jørgensen 2016-05-09 15:51:11 +02:00
  • 8d9104fa23 Speed up family filter Ivan Skytte Jørgensen 2016-05-09 14:30:19 +02:00
  • 042c699e23 include isAdult in JSON result Ivan Skytte Jørgensen 2016-05-09 13:28:18 +02:00
  • 0f07ff3270 Support expireTimeUTC in results Ivan Skytte Jørgensen 2016-05-09 11:45:02 +02:00
  • 4c10f3d9dd few more fixes related to using ugly 'pragma pack(4)'/32-bit dependent pointer arithmetic Brian Rasmusson 2016-05-09 10:41:47 +02:00
  • 3847150c87 Count variable of array parameters like URL filters are now found via offsetof instead of using ugly 'pragma pack(4)'/32-bit dependent pointer arithmetic. Used pointer to array member -4 to get to the int32_t placed before it in the class. Yuck. Added new Privacore URL filter option. Cleaned up code and removed unused code. Brian Rasmusson 2016-05-08 22:00:19 +02:00
  • 6c886eb042 Removed unused #includes from Mem.cpp Ivan Skytte Jørgensen 2016-05-06 23:30:08 +02:00
  • ce64e4a9a8 Fixed incorrect comments in Mem.cpp Ivan Skytte Jørgensen 2016-05-06 23:29:39 +02:00
  • e5ccc016ae Don't increment m_dgramsFrom/m_dgramsTo counters when the host is unknown Ivan Skytte Jørgensen 2016-05-06 17:27:25 +02:00
  • 79f7949b13 Removed code doing flip-flop between eth0 and eth1 Ivan Skytte Jørgensen 2016-05-06 17:12:11 +02:00
  • 10ab603a1e use true/false instead of 1/0 in calls to Hostdb::getHostFromTable() Ivan Skytte Jørgensen 2016-05-06 16:18:36 +02:00
  • 492cb19cf3 Removed mentions of now-unsupported localhosts.conf Ivan Skytte Jørgensen 2016-05-06 16:15:08 +02:00
  • 8c4483cdca Check for same-host using ip-address instead of just loopback Ivan Skytte Jørgensen 2016-05-06 16:10:33 +02:00
  • 9ba39fc52b Use IPAddressChecks instead of hardcoded loopback checks Ivan Skytte Jørgensen 2016-05-06 15:54:41 +02:00
  • 3a33c1c962 Do not index into m_forceDelete using negative value Brian Rasmusson 2016-05-06 15:46:33 +02:00
  • cb92675aac bugfix BigFile read disk: fd < 0. Bad engineer. Ivan Skytte Jørgensen 2016-05-06 14:45:09 +02:00
  • 944fa3bc4c Multiple JobScheduler fixes Ivan Skytte Jørgensen 2016-05-06 11:54:43 +02:00
  • f054b460db Revert "Fix posdb3 cache entry limit calculation" Ivan Skytte Jørgensen 2016-05-06 11:07:26 +02:00
  • 35621243d0 Fix script error which causes gb not to stop properly Ai Lin Chia 2016-05-05 13:03:34 +02:00
  • ab66aeeb6a Modify gbstart to read cpu affinity from taskset.conf file Ai Lin Chia 2016-05-04 17:31:47 +02:00
  • cbebd9cb9a Pass hostid to gbstart.sh to enable some host specific settings Ai Lin Chia 2016-05-04 15:27:31 +02:00
  • f8bedc8d24 Move start script to a real script instead of building the command inside gb. Merge gbcheck.sh into gbstart.sh. Ai Lin Chia 2016-05-04 15:23:21 +02:00
  • a89e93ce4f Rename kstart to start Ai Lin Chia 2016-05-04 12:38:38 +02:00
  • 4b96e34059 Make proxy start default to kstart like the normal gb start Ai Lin Chia 2016-05-04 12:26:57 +02:00
  • 992d6ffab7 Remove noop booltest Ai Lin Chia 2016-05-04 12:23:34 +02:00
  • 8b7bffb7c8 Remove commented out code & simplify statements Ai Lin Chia 2016-05-04 12:21:38 +02:00
  • 22fedf17bf Remove ./gb installgbrcp which uses the rcp command Ai Lin Chia 2016-05-04 12:11:41 +02:00
  • 1e11749b6a Code style changes & add constness Ai Lin Chia 2016-05-04 11:31:18 +02:00
  • f11beb2b17 Add constness & remove unused variable Ai Lin Chia 2016-05-04 11:10:51 +02:00
  • 5a7d9be04d Remove noop ./gb installgb2 Ai Lin Chia 2016-05-04 10:41:24 +02:00
  • 66c679d7ca Remove ./gb dstart Ai Lin Chia 2016-05-04 10:39:53 +02:00
  • ec54816738 Remove ./gb nstart Ai Lin Chia 2016-05-04 10:35:14 +02:00
  • 85795f0b45 Make PageThreads show something in html again Ivan Skytte Jørgensen 2016-05-03 16:32:58 +02:00
  • 9703e7c606 Remove noop command ./gb removedocids Ai Lin Chia 2016-05-03 16:30:05 +02:00