Commit Graph

  • 7ad9058f77 when doing a query reindex on a json child url we need to add the spider request of the original parent url and make sure it does not get "EDOCUNCHANGED" error. otherwise the possibly new json child objects won't get indexed. Matt Wells 2014-05-21 05:43:53 -07:00
  • 34afc7c7cf Merge branch 'diffbot-dan' into diffbot-testing Matt Wells 2014-05-21 05:30:56 -07:00
  • e39dffadcf use "expand" option when calling Diffbot Daniel Steinberg 2014-05-20 22:00:46 -07:00
  • 4b587f168b fix bug of not including empty responses when &icc=1 Matt Wells 2014-05-20 21:07:21 -07:00
  • c729b51ae5 fixed exact # search results hit count when using min/max/sort operators. Matt Wells 2014-05-20 13:45:00 -07:00
  • 6664faa792 fix printing back-to-back commas when showing results in json with &icc=1. Matt Wells 2014-05-20 13:23:29 -07:00
  • ffc4036840 update admin.html mwells 2014-05-19 06:22:34 -07:00
  • cd3e11b6ee Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-05-16 18:48:06 -07:00
  • d2cc117d82 fix oops Matt Wells 2014-05-16 18:47:52 -07:00
  • 526be98ec8 fix core scenario when diffbot reply that was injected using &diffbotreply= contains the http mime. Matt Wells 2014-05-16 18:46:39 -07:00
  • baf1ccb7d5 note updates Matt Wells 2014-05-16 09:52:41 -07:00
  • eea5dff0f5 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-05-16 09:38:42 -07:00
  • a22396c344 quick doc update Matt Wells 2014-05-16 09:38:32 -07:00
  • 2484147403 fix core Matt Wells 2014-05-16 09:30:46 -07:00
  • 1af8ca846f Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-05-16 08:08:42 -07:00
  • a81f2145bd fix sendmail ip to 127.0.0.1 Matt Wells 2014-05-16 08:08:20 -07:00
  • 4684298965 minor doc update Matt Wells 2014-05-16 08:01:29 -07:00
  • 2ce6ed266a fix another core from a 0 docid Matt Wells 2014-05-16 07:59:04 -07:00
  • 6d9fdc975b fix core from not setting m_gotClusterRecs in Msg39.cpp Matt Wells 2014-05-16 06:32:51 -07:00
  • 5c2cc973a8 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-05-15 18:27:13 -07:00
  • a303bda1f8 fix core Matt Wells 2014-05-15 15:10:57 -07:00
  • b38f62c7dc nothing Matt Wells 2014-05-15 14:15:05 -07:00
  • 72c6d032d8 fix query reindex on subdocuments (diffbot json blurbs) so that they just put in a spiderrequest to reindex the parent url. Added &diffbotreply= to the injection interface so dan can provide that along with the pageUrl he passes in with &u= Matt Wells 2014-05-15 14:11:12 -07:00
  • fc5cfa2a62 move list of bulk urls to new directory earlier. May fix Defect #2218 if there is something that is causing the bulk job to restart before this function returns Daniel Steinberg 2014-05-15 13:35:32 -07:00
  • 6afa3f2561 save spots to disk as space separated Daniel Steinberg 2014-05-14 14:40:46 -07:00
  • 00b652581f fix boolean query containing quoted phrase Matt Wells 2014-05-14 11:22:07 -07:00
  • 8ac7fdfa24 Msg39::controlLoop now works Matt Wells 2014-05-14 11:02:09 -07:00
  • d95cbb42d6 Merge branch 'diffbot-testing' into diffbot-matt Matt Wells 2014-05-14 10:52:45 -07:00
  • db543ddd9f nothing Matt Wells 2014-05-14 09:37:59 -07:00
  • 40bca5d120 try to fix msg22 core some more Matt Wells 2014-05-14 08:16:47 -07:00
  • 48df53e74f Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-05-14 07:48:23 -07:00
  • 0242fe88ff try to fix msg22 based cores Matt Wells 2014-05-14 07:46:32 -07:00
  • 88eb44827f fix avail docid logic some more for indexing spider replies Matt Wells 2014-05-13 21:27:05 -07:00
  • 773c9ad8f6 Merge branch 'diffbot-testing' into diffbot-matt Matt Wells 2014-05-13 21:11:14 -07:00
  • 015b9d4597 fix oopsy Matt Wells 2014-05-13 21:10:34 -07:00
  • 4cba959529 revised msg39.cpp in order to fix boolean bug Matt Wells 2014-05-13 20:50:11 -07:00
  • 0905fc48c1 fix bug in getAvailDocId() Matt Wells 2014-05-13 20:10:03 -07:00
  • 3773b84f84 Merge branch 'diffbot-dan' into diffbot-testing Matt Wells 2014-05-13 17:49:05 -07:00
  • 75642f44a3 don't need var i Daniel Steinberg 2014-05-13 17:25:42 -07:00
  • ffee90f3bb Defect #2268 Daniel Steinberg 2014-05-13 17:23:07 -07:00
  • 35f6652ceb make gb start and kstart not use hostid any more. it is now inferred from path of gb binary. Matt Wells 2014-05-12 21:24:41 -07:00
  • 037067170c fix for symlinks in host paths in hosts.conf Matt Wells 2014-05-12 20:50:11 -07:00
  • 32a95cca45 fix 'gb install' Matt Wells 2014-05-12 17:04:38 -07:00
  • c5ae5ca4b5 v3 support for tokenized diffbot replies using the "objects" array in the json. Matt Wells 2014-05-12 16:13:24 -07:00
  • 8d1c4e3097 Merge branch 'diffbot-testing' into diffbot-dan Matt Wells 2014-05-12 15:33:15 -07:00
  • 4bb1f99296 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-05-12 15:15:52 -07:00
  • c58bd016a6 multiple content types for page parser content Matt Wells 2014-05-12 15:15:34 -07:00
  • 5f7bbe7523 fix diffbot smoke tests. do not index spider replies for custom crawls. Matt Wells 2014-05-12 15:14:11 -07:00
  • 78e2bd8171 start implementing handling for array of "objects" Daniel Steinberg 2014-05-12 15:04:36 -07:00
  • 0a2523f361 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-05-12 10:59:00 -07:00
  • 8e00a9e7e1 Merge branch 'diffbot-testing' of github.com:gigablast/open-source-search-engine into diffbot-testing Matt Wells 2014-05-12 10:58:44 -07:00
  • 7b840f1231 updated err msg Matt Wells 2014-05-12 10:58:36 -07:00
  • 5041307c2a Merge branch 'testing' into diffbot-testing Matt Wells 2014-05-12 10:53:57 -07:00
  • 7ca1e8e790 gb.conf update for new parm Matt Wells 2014-05-12 10:53:22 -07:00
  • 6c72292e57 added mysynomyms.txt file to official list mwells 2014-05-12 10:36:54 -07:00
  • 8f6d54f5a0 Merge branch 'master' into diffbot-testing Matt Wells 2014-05-12 10:09:36 -07:00
  • deaaf69968 fix core from federated search and m_r being null Matt Wells 2014-05-12 10:05:38 -07:00
  • 85818e1b98 doc update mwells 2014-05-12 07:56:22 -07:00
  • 1717bd2ab1 Merge branch 'master' into testing mwells 2014-05-12 07:32:28 -07:00
  • 1d2b234831 quick fix for core mwells 2014-05-12 07:32:05 -07:00
  • 45b8bb3421 log msg cleanups mwells 2014-05-11 21:55:44 -07:00
  • a9dc18c866 fix more bugs. mwells 2014-05-11 19:44:41 -07:00
  • c3a1c674c3 now we run gb without a hostid. we use its path and the local ip to identify its hostid # in the hosts.conf. mwells 2014-05-11 19:36:24 -07:00
  • 463dc2159f more make install updates mwells 2014-05-11 17:02:15 -07:00
  • 5df28bb147 more pkg fixes mwells 2014-05-11 15:02:03 -07:00
  • a7fbcfc188 make install updates mwells 2014-05-11 14:47:46 -07:00
  • 7c30c6b970 make install fixes. getting ready for pkg build. mwells 2014-05-11 14:20:24 -07:00
  • 70016ec3a3 work on make install. mwells 2014-05-11 12:48:56 -07:00
  • aa76b36bf0 nothing mwells 2014-05-11 12:04:10 -07:00
  • 467e70bd98 improvements for thumbnail generator. mwells 2014-05-11 08:44:38 -07:00
  • 533a6caef7 image formatting fixes mwells 2014-05-11 07:06:35 -07:00
  • 2d6bc12866 thumbnails gen off by default for now mwells 2014-05-10 17:24:48 -07:00
  • 8ecf05fe04 updated features to include federated search mwells 2014-05-10 17:14:31 -07:00
  • d2f0824f8b ah screw it mwells 2014-05-10 16:54:56 -07:00
  • de5388c79d default images off until we fix better mwells 2014-05-10 16:54:07 -07:00
  • 35a94afcc9 thumb display fixes mwells 2014-05-10 14:30:30 -07:00
  • 1a13342782 fix thumbnail printing. mwells 2014-05-10 14:24:13 -07:00
  • b49e4ab05f fix core mwells 2014-05-10 12:57:11 -07:00
  • 898ffa40bc image core fix. image log cleanups. mwells 2014-05-10 12:46:10 -07:00
  • 6e922722da tree repair logic. mwells 2014-05-10 12:32:01 -07:00
  • 7a0971ca39 fix nasty spider bug that was not prioritizing things right. fixed image debug logging. mwells 2014-05-10 10:07:37 -07:00
  • 7e1429cc30 more bug fixes mwells 2014-05-10 08:22:26 -07:00
  • 8e381504a1 fix makeTrashDir() mwells 2014-05-10 08:02:46 -07:00
  • 4c2a6a2519 minor fix mwells 2014-05-10 07:58:26 -07:00
  • 2b37f56e4c Merge branch 'diffbot-matt' into testing mwells 2014-05-10 07:56:45 -07:00
  • 38a79888b6 Merge branch 'diffbot-testing' into testing mwells 2014-05-10 07:49:29 -07:00
  • ed816b2c11 a few bug fixes mwells 2014-05-10 07:48:23 -07:00
  • 6b92a1f3d4 Merge branch 'master' into testing mwells 2014-05-10 06:43:27 -07:00
  • f19014cc6c fixed missing / mwells 2014-05-10 06:39:36 -07:00
  • e70f760d87 use gbstatus: and gbstatusmsg: field operators Matt Wells 2014-05-09 18:10:38 -07:00
  • b1cd0cac86 indexing spider replies now working. use type:status to see them or gbstatus:success or gbstatus:tcp or gbstatus:0. Matt Wells 2014-05-09 18:07:38 -07:00
  • 941c8f1892 now added CT_STATUS type results into serps. one for each spider reply we add so we can query spider replies. using url: or type:status etc. Matt Wells 2014-05-09 13:52:12 -07:00
  • eb49094343 try to start indexing spider replies as regular search results in the index so you can query on those. get histograms of spider status msgs, etc. ability to turn that and images on/off. Matt Wells 2014-05-09 11:18:24 -07:00
  • 6048ae849b added support for spidering a particular language with higher priority. mwells 2014-05-09 10:03:24 -06:00
  • 305340e4ff minor update Matt Wells 2014-05-07 16:33:16 -07:00
  • a2c47750fa temp disable wpid logic. do indexing of err pages first. Matt Wells 2014-05-07 16:26:57 -07:00
  • 3bf52f0f2d if "wpid" is supplied try to update sitelist for that wpid. hopefully we can get the wp admin tools to send a /search?wpid=xxxx&sites=xyz.com request so we can start spidering those sites before they even see the widget. also it is simpler than trying to update m_siteListBuf each time someone does a query since those can be hundreds a second. Matt Wells 2014-05-07 16:10:26 -07:00
  • 01a6ae1166 take html column out of csv Matt Wells 2014-05-07 13:28:20 -07:00
  • dd5f35b06d added icon Matt Wells 2014-05-07 13:06:56 -07:00
  • dd3ab38e55 formatting updates for widget Matt Wells 2014-05-07 13:06:42 -07:00