mirror of https://github.com/yacy/yacy_search_server.git synced 2025-07-23 09:24:39 -04:00

Commit Graph

  • 61f627eb85 fix for ssl-connections from proxy-usage staying in close-wait-state + some extra 'close' in HttpClient sixcooler 2013-10-10 20:57:37 +02:00
  • 91fa99e9bb added new icon/image for latest commit Michael Peter Christen 2013-10-09 22:07:59 +02:00
  • 9fac9249bc - replaced 'edit' link with a clone symbol in Table_API_p since that is what it does: it clones the crawl, it does not change the crawl. - moved the appearance of this clone link to the type column since this makes it visible also if the URL column is not visible. Michael Peter Christen 2013-10-09 22:07:32 +02:00
  • 0f6db6ad5b Merge remote-tracking branch 'jensbees/crawlexpert-post' Michael Peter Christen 2013-10-09 21:32:27 +02:00
  • 3fcf7a94c5 rolling back wrong merge bhoerdzn 2013-10-09 21:06:11 +02:00
  • 3252c1ec39 Merge upstream/master into crawlexpert-post Jens Bertram 2013-10-09 20:49:14 +02:00
  • d328cc4a83 fix for didyoumean, added also more asian alphabets Michael Peter Christen 2013-10-09 16:17:50 +02:00
  • 90c8577840 enhanced ranking; patches to replace old ranking Michael Peter Christen 2013-10-09 15:10:03 +02:00
  • 9f6b98d374 Merge master into crawlexpert-post Jens Bertram 2013-10-09 14:39:20 +02:00
  • 6e33be4ce6 reverting local changes to project.xml bhoerdzn 2013-10-09 14:23:06 +02:00
  • a3824dfbaa check URL on initial load, if set bhoerdzn 2013-10-09 13:52:44 +02:00
  • 52f49d475b add a hidden field for "crawlingstart" since jQuery omits the submit button value bhoerdzn 2013-10-09 13:38:20 +02:00
  • b0c0ec2dec link recorded crawl starts back to "CrawlStartExpert_p" in "Process Scheduler" bhoerdzn 2013-10-09 12:55:42 +02:00
  • d64d45361c use integer types for boolean values bhoerdzn 2013-10-09 12:42:04 +02:00
  • eda123d6fd remove debugging code intercepting post requests bhoerdzn 2013-10-09 11:51:07 +02:00
  • 5057f27bbd fix typo in parsing "cachePolicy" parameter bhoerdzn 2013-10-09 11:41:15 +02:00
  • 98f5c9018d Fixed template vars for "deleteold". Fixed parsing "deleteold" parameter. Stop "setState" overwriting "deleteold" state on load. bhoerdzn 2013-10-09 11:32:17 +02:00
  • a6a62986d4 correct state handling for country code restriction bhoerdzn 2013-10-09 10:42:35 +02:00
  • 4066b85155 correctly set initial state for load filters bhoerdzn 2013-10-09 10:36:08 +02:00
  • 8c91c3e7cd set form boolean values to 0 & 1 instead of false & true bhoerdzn 2013-10-09 10:05:51 +02:00
  • c27fabc88e fixed wrong parameter check bhoerdzn 2013-10-09 10:00:16 +02:00
  • 2214bf5396 Remove some post parameters, if they are set to default values, as their values are already set by YaCy. Added some documentation. bhoerdzn 2013-10-09 09:48:00 +02:00
  • e74f548551 make legacy http server (serverCore) implement YaCyHttpServer interface reger 2013-10-09 01:07:22 +02:00
  • 71d2655c02 downgrade to Jetty 8 to assure support of JRE 1.6 - introduce a YaCyHttp interface to modulize/separate http server - adjust the Jetty version specific implementation part (in package net.yacy.http) - putting the version specific code in classes starting with Jetty8xxxx - moved existing Jetty9xxx implementation into a test class (to keep the code) - adjust build to the changed jars - make use of the introduced YaCyHttpServer interface in related htroot servlets reger 2013-10-09 00:40:48 +02:00
  • 1b61bd40ed - Added new solr field url_file_name_tokens_t which stores the file name tokens. This can be used to enhance the ranking. - Added also a rating_i field as basis for later usage. - enhanced the tokenization process. Michael Peter Christen 2013-10-08 23:48:13 +02:00
  • 6efa7532d2 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-10-08 19:04:57 +02:00
  • 5f5a97bafc added the anchor text within web pages to the searchable entities of a web page. This can be of benefit for the ranking if these fields are used for boosts. orbiter 2013-10-08 18:41:07 +02:00
  • 705b3338ee list more fields available for search and for ranking boosts orbiter 2013-10-08 18:15:35 +02:00
  • d536092fe4 fix false fill NAME_CACHE_MISS-DNS-Cache in case of a timeout for eg. caused by massive requests when crawl from file sixcooler 2013-10-08 18:02:42 +02:00
  • 405878182f Use list template for all other option lists. Fixed some template expressions. bhoerdzn 2013-10-08 15:04:31 +02:00
  • 8e74098cd4 Use list template for "reloadIfOlderNumber". bhoerdzn 2013-10-08 13:26:09 +02:00
  • 52bad7b908 Dynamic toggling of form fields, based on passed in and selected values. This will also cut down the post string by disabling not needed fields. bhoerdzn 2013-10-08 13:24:27 +02:00
  • 78e7aadb26 removed unused initialization method Michael Peter Christen 2013-10-07 23:51:28 +02:00
  • e56aa4fe93 fixed search navigation Michael Peter Christen 2013-10-07 23:51:08 +02:00
  • 4fbc4740df removed warnings Michael Peter Christen 2013-10-07 23:41:50 +02:00
  • 202a9fbdad adding synonyms from German OpenThesaurus ready for use in YaCy Lotus 2013-10-07 22:02:42 +02:00
  • 21aa6a0321 migration to Solr 4.5.0 Michael Peter Christen 2013-10-07 17:09:40 +02:00
  • 45cf553bc3 try to guess default crawling mode, if none set bhoerdzn 2013-10-07 13:13:22 +02:00
  • b4f0c822f2 assign strings before checking contents bhoerdzn 2013-10-07 13:01:39 +02:00
  • ef31d0f279 fix for rss reader, see http://bugs.yacy.net/view.php?id=294 Michael Peter Christen 2013-10-07 12:59:54 +02:00
  • 499abe8f91 set default values for string parameters bhoerdzn 2013-10-07 12:32:23 +02:00
  • 85316b3ac6 Merge branch 'master' into crawlexpert-post Jens Bertram 2013-10-07 12:02:52 +02:00
  • 42ea56eaad made CrawlStartExpert_p aware of post variables; extended template where needed bhoerdzn 2013-10-07 11:25:59 +02:00
  • 101a6e6e14 Patch the citation index for links with canonical tags. This shall fulfill the following requirement: If a document A links to B and B contains a 'canonical C', then the citation rank computation shall consider that A links to C and B does not link to C. To do so, we first must collect all canonical links, find all references to them, get the anchor list of the documents and patch the citation reference of these links. Michael Peter Christen 2013-10-07 11:15:58 +02:00
  • daebeb93aa add call to AccessTracker to jetty security handler reger 2013-10-04 01:16:17 +02:00
  • 172aefaeeb adjust YaCySecurityHandler to Jetty 9 conventions - mainly adjust prepareConstraintInfo to use the RoleInfo.setChecked as in Jetty Source distribution - use constraint check behavior as in ConstraintSecurityHandler see http://git.eclipse.org/c/jetty/org.eclipse.jetty.project.git/tree/jetty-security/src/main/java/org/eclipse/jetty/security/ConstraintSecurityHandler.java?id=jetty-9.0.5.v20130813 reger 2013-10-03 19:38:03 +02:00
  • ba3c173077 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-10-03 18:19:02 +02:00
  • 6f9ed439d3 - expand localHostName check of AbstractRemoteHandler to prevent the request from being handled as a proxy request - make domain handler not rely on included path in resolved .yacy address reger 2013-10-01 03:04:32 +02:00
  • 561ea135af fix : forgot adding security handler reger 2013-09-30 04:35:17 +02:00
  • f46771bdf5 upd build script from rc1/master reger 2013-09-30 03:47:55 +02:00
  • c7c706fd9f merge with rc1/master reger 2013-09-30 03:46:39 +02:00
  • 272b196d05 update Jetty server init() to activate yacy-domain and transparent proxy handler - adding domain & proxy handler to a context (as it was in the initial design) (context required for dispatcher) - make handler context and servlet context parallel available (to allow use of YaCyDefaultServlet to handle legacyServlets) - set transparent proxy request handled after dispatch.forward to skip further handling for .yacy domain requests reger 2013-09-30 03:12:52 +02:00
  • fd119deb00 fix NPE on modified since check ( Response.requestHeader allowed to be null) reger 2013-09-30 02:50:53 +02:00
  • 66145a0410 - add welcome file (index.html) support to YaCyDefaultServlet - change SolrServlet default search field (&df) to text_t reger 2013-09-29 03:34:00 +02:00
  • a3b5d84c81 Merge remote-tracking branch 'origin/master' orbiter 2013-09-28 15:46:59 +02:00
  • adfae074cf added classpath for debugging orbiter 2013-09-28 15:45:33 +02:00
  • b28d43decc added two more fields source_cr_host_norm_i,target_cr_host_norm_i in webgraph and an addition to postprocessing to copy all cr ranking attributes to the link edges associated to the postprocessing documents Michael Peter Christen 2013-09-27 16:57:05 +02:00
  • a52f3a597e fix for canonical-from-http-header feature Michael Peter Christen 2013-09-27 15:09:04 +02:00
  • 2dd7c5be44 added parsing of http-canonical tags (untested, could not find an example page) Michael Peter Christen 2013-09-27 13:17:50 +02:00
  • 4476dea5ba do not fail if a wrong boost key is used; instead, print only a warning See also: http://bugs.yacy.net/view.php?id=293 Michael Peter Christen 2013-09-27 12:28:09 +02:00
  • ab9583d429 add default field (&df) to SolrServlet query if missing reger 2013-09-26 22:20:35 +02:00
  • 3bf0104199 fix for crawl domain counter limitation (limit was reached too early) Michael Peter Christen 2013-09-26 13:41:52 +02:00
  • 82bfd9e00a - crawl profiles shall be deleted from active and passive stacks if they are deleted to terminate the crawl because otherwise the crawl will go on after the load-from-passive stack policy. - better check if a crawl is terminated using the loader queue. Michael Peter Christen 2013-09-26 10:22:31 +02:00
  • 1b3d26dd23 hack to remove most of the warning: deprecated messages (but not all, one is left) Michael Peter Christen 2013-09-25 21:14:52 +02:00
  • a496313248 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-09-25 20:41:02 +02:00
  • 3c48fc65fd reverted RemoteInstance to deprecated methods of httpClient-4.2 this should work with current remote-Solr-Instances sixcooler 2013-09-25 18:45:16 +02:00
  • 91a875dff5 self-healing of mistakenly deactivated crawl profiles. This fixes a bug which can happen in rare cases when a crawl start and a cleanup process happen at the same time. Michael Peter Christen 2013-09-25 18:27:54 +02:00
  • 095053a9b4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2013-09-25 17:32:52 +02:00
  • 0cae420d8e some dns-timing changes: since httpclient uses the domain-cache it is useful not to clean the domain cache until crawling is running (domains are filled into this cache) On huge crawl-starts (eg. from file) my DNS did not follow the high rates - so I reduced the rate and give some more time(-out) sixcooler 2013-09-25 15:01:28 +02:00
  • 15b1bb2513 bump to httpClient-4.3 sixcooler 2013-09-25 14:48:37 +02:00
  • 4f83d5f18c added the new field harvestkey_s to the collection index and the webgraph index which is temporary filled with the crawl profile key. This is used to select a set of documents for post-processing as soon as a crawl is finished. Now the postprocessing for a specific crawl is started when that specific crawl is finished and not at the end of all post-processing steps. Michael Peter Christen 2013-09-25 14:38:24 +02:00
  • 14442efa6d when profiles are cleaned, there shall be first a callback showing which profiles are cleaned. This shall enable a profile-termination-driven postprocessing. To do this, index writings must carry the profile key which will be implemented in another (next) step. orbiter 2013-09-25 11:04:12 +02:00
  • 0013d0d0bb removed superfluous class orbiter 2013-09-24 21:18:37 +02:00
  • f90d5296cb Added new data structure to be used by the balancer (not used yet). These data structures will enable the balancer to store the crawl queue into individual queues, one each for a single host. orbiter 2013-09-24 21:08:40 +02:00
  • 0e8d752462 refactoring orbiter 2013-09-24 19:55:59 +02:00
  • 8ac2e8c8c9 added location navigator which causes that the image to the map search is visible whenever a location is available in the search result. To activate this, the search.navigation property in yacy.conf must be modified to the new default values. orbiter 2013-09-24 11:26:51 +02:00
  • d86d2be5c3 automatically removed Places autotagging if no location library is wanted orbiter 2013-09-24 11:23:45 +02:00
  • 214a087cdf Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2013-09-23 20:59:03 +02:00
  • 96ed0c980e - added hosthash to all documents (also fail documents which is needed there for deletion), this fixes a problem for the deletion of old documents for new crawl starts - added clickdepth and citation computation for fail documents Michael Peter Christen 2013-09-23 18:09:42 +02:00
  • 179ad281f9 close include byte buffer after usage Michael Peter Christen 2013-09-23 12:19:51 +02:00
  • 52dd491c04 fix not necessary use of DigestURL reger 2013-09-23 03:05:09 +02:00
  • 6b9a624808 remove double declaration of TLD_any_zone_filter reger 2013-09-23 03:01:08 +02:00
  • 5111841e5b - reduce Jetty debug logging - fix Context path initialization reger 2013-09-23 01:30:45 +02:00
  • bc6ebb3c06 adjust to DigestURI changes from master to DigestURL reger 2013-09-22 20:57:50 +02:00
  • 561cbc7ee2 use more YaCy HeaderFramework constants (instead of Jetty's) reger 2013-09-22 04:23:42 +02:00
  • 5c4ba9b5db merge rc1 master reger 2013-09-22 02:21:24 +02:00
  • 70c51775ae Merge remote-tracking branch 'origin/master' into jetty reger 2013-09-22 02:09:02 +02:00
  • 4b77733e59 implement a YaCyDefaultServlet to handle YaCy-servlets within Jetty server - the implementation is inspired by Jetty's DefaultServlet - handles static html content and YaCy servlets - translates between standard servlet request/response and YaCy request/response specification With the implementation of YaCy-servlets as servlet instead via a jetty handler it's closer to servlet standard and carries less jetty specific dependencies. reger 2013-09-22 01:57:32 +02:00
  • d2effd21db fix for npe during location search orbiter 2013-09-21 21:03:58 +02:00
  • 828603e4f1 fix for 100%CPU problem in error cache cleaning process orbiter 2013-09-21 10:20:13 +02:00
  • c64b51134e hack to add all tokens from the url to text_t. This was working for the RWI index (and still is working) but not for solr-only search indexes. Maybe we should find a solution using a separate search field instead. orbiter 2013-09-21 08:57:43 +02:00
  • 6e8377b8ad do not check all words with synonym library if the library is empty orbiter 2013-09-21 08:56:24 +02:00
  • 70ba74b23a disabled ipv4 preference to enable ipv6-only networks like freifunk orbiter 2013-09-20 16:52:37 +02:00
  • f3be1930cb CPU problem when pushing to the error cache; wrong class, ConcurrentHashMap needed for concurrency orbiter 2013-09-20 16:51:50 +02:00
  • e40671ddb7 better and consistent deletions for error urls Michael Peter Christen 2013-09-17 15:52:57 +02:00
  • 2602be8d1e - removed ZURL data structure; removed also the ZURL data file - replaced load failure logging by information which is stored in Solr - fixed a bug with crawling of feeds: added must-match pattern application to feed urls to filter out such urls which shall not be in a wanted domain - delegatedURLs, which also used ZURLs are now temporary objects in memory Michael Peter Christen 2013-09-17 15:27:02 +02:00
  • 31920385f7 set anchor rel attribute of all links to "nofollow" if the html meta contains a robots:nofollow or if the http header contains a "X-Robots-Tag: nofollow" Michael Peter Christen 2013-09-16 16:14:56 +02:00
  • 9619b8743c add Solr Servlet reger 2013-09-16 03:01:18 +02:00
  • 57e00baf26 fix for parsing of image links inside of anchor links (image-links) Michael Peter Christen 2013-09-15 23:54:46 +02:00
  • 61c5e40687 - replaced the properties object in AnchorURL with distinct variables for anchor attributes. - this caused that large portions of the parser code had to be adopted as well - added a counter target_order_i for anchor links in webgraph computation Michael Peter Christen 2013-09-15 23:27:04 +02:00
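
Note on commit 101a6e6e14 (citation index patch for canonical tags): the rule stated in that commit message can be illustrated with a small sketch. This is not YaCy's implementation, which is written in Java and works on the Solr-backed citation index. The function name `patch_citations`, the dictionary-based link graph, and the choice to drop self-links created by the rewrite are all assumptions made for illustration only.

```python
def patch_citations(links, canonical):
    """Return a patched copy of a link graph (hypothetical helper).

    links:     dict mapping a source URL to the set of URLs it links to
    canonical: dict mapping a URL to the canonical URL it declares

    Rule from the commit message: if A links to B and B declares
    'canonical C', count the link as A -> C, and do not count B -> C.
    """
    patched = {}
    for source, targets in links.items():
        new_targets = set()
        for target in targets:
            # B -> C, where C is B's own canonical: not counted as a citation.
            if canonical.get(source) == target:
                continue
            # A -> B, where B declares canonical C: rewrite to A -> C.
            rewritten = canonical.get(target, target)
            # Assumption: a rewrite that yields a self-link is dropped.
            if rewritten == source:
                continue
            new_targets.add(rewritten)
        patched[source] = new_targets
    return patched


links = {"A": {"B"}, "B": {"C", "D"}}
canonical = {"B": "C"}
result = patch_citations(links, canonical)
```

With this input, `result` maps "A" to {"C"} and "B" to {"D"}: the citation from A is credited to the canonical target C, and the canonical declaration in B is no longer counted as a link.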