Commit Graph

  • 181784a5cb Merge branch 'master' of git@gitorious.org:yacy/rc1.git orbiter 2014-05-15 08:06:59 +02:00
  • 0587077d06 cleanup obsolete and not used serverswitch Authentify code as auth is mostly delegated to Jetty container. reger 2014-05-14 23:13:49 +02:00
  • c9f66be20b move unnecessary nested else out of condition orbiter 2014-05-13 22:31:12 +02:00
  • 0d8072aa99 removed warnings orbiter 2014-05-13 22:29:05 +02:00
  • 88f4af90da removed warnings orbiter 2014-05-13 22:27:31 +02:00
  • 0f425e01ca another circle computation enhancement orbiter 2014-05-13 21:30:47 +02:00
  • be7c99dbe8 switched menu position of ConfigPortal.html and ConfigSearchBox.html orbiter 2014-05-13 08:14:56 +02:00
  • a8d162810c Exclude = from percent-encoding in MultiProtocolURL fix http://mantis.tokeek.de/view.php?id=185 and http://mantis.tokeek.de/view.php?id=280 reger 2014-05-13 02:33:35 +02:00
  • 024f8e9b33 fix truncated urls containing "," addressing http://mantis.tokeek.de/view.php?id=58 reger 2014-05-13 01:50:15 +02:00
  • 9112f0a2df enhanced circle tool initialization Michael Peter Christen 2014-05-12 16:21:24 +02:00
  • a1ac4c3b76 automatically clear graphics cache Michael Peter Christen 2014-05-12 15:45:25 +02:00
  • 505f58c79c enhanced circle computation time and memory footprint Michael Peter Christen 2014-05-12 15:34:56 +02:00
  • f87ac716f3 improve IndexDeletion by query adding transparently text_t as pseudo default search field if no fieldname (no : ) is included. addressing bug report http://mantis.tokeek.de/view.php?id=274 reger 2014-05-12 00:12:05 +02:00
  • f02203fb2f fix xml validation error on defaults/web.xml reger 2014-05-11 04:39:59 +02:00
  • cd8c0dbda9 assign serialVersionUID for proxyservlet, too. reger 2014-05-11 03:51:47 +02:00
  • b300d7f4ce set serialVersionUID on urlproxyservlet to skip compiler warning - remove commented out code reger 2014-05-11 03:31:07 +02:00
  • c947ee06bf remove redundant servlet-api reger 2014-05-11 02:37:00 +02:00
  • e9060d31bd update to Jetty 9 besides adjustments in code it makes the servlet settings in web.xml significant. This applies to solr, gsa and proxy servlet. There is no longer a default setup in code during init (as jetty 9 checks for double definition). reger 2014-05-11 01:53:11 +02:00
  • 1432a817dd respect "index media" switched off in CrawlStartExpert.html fix http://mantis.tokeek.de/view.php?id=64 reger 2014-05-08 22:21:24 +02:00
  • 6122f8df91 Merge origin/master reger 2014-05-08 22:19:47 +02:00
  • f7b6ca1f4b switch pom to v1.73 and java 1.7 reger 2014-05-08 22:18:12 +02:00
  • b9c1a61814 added a peername=<peername> property in the seedlist API orbiter 2014-05-08 07:41:40 +02:00
  • 39e1913585 next development step: migration to java 1.7 This includes also a small code change to test generic type inference, a java 1.7 feature orbiter 2014-05-08 07:41:11 +02:00
  • f42d291039 Release 1.72 Release_1.72 Michael Peter Christen 2014-05-06 18:54:56 +02:00
  • 4e734815e8 enhanced snippets: remove lines which are identical to the title and choose longer versions if possible. Prefer the description part. Michael Peter Christen 2014-05-06 16:48:50 +02:00
  • e84e07399a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-05-06 14:51:57 +02:00
  • c637955e67 fix for navigation steering / p2p mode see also: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=5198&p=29958#p29958 orbiter 2014-05-06 05:58:51 +02:00
  • 89f76da24b Merge branch 'master' of git@gitorious.org:yacy/rc1.git orbiter 2014-05-06 05:38:38 +02:00
  • f98ccf952f Improved Blacklist API: Marc Nause 2014-05-05 23:16:01 +02:00
  • 390f03e041 do not check for segments-count on optimize: this is also done in Solr and our getSegmentsCount() does not return up-to-date values sixcooler 2014-05-05 13:24:41 +02:00
  • 8a7c68e4c7 content of surrogates/out never accessed (remove) After import the content is never accessed but may take up a lot of disk space, also the getLoadedOAIServer (which lists the files in surrogate out) is not used. Making the surrogate.out obsolete. Removed keeping of xmls after import. reger 2014-05-04 09:29:07 +02:00
  • d781fcd809 Merge origin/master reger 2014-05-03 21:57:06 +02:00
  • 91bd384cf6 fix input-group layout on index.html see bug http://mantis.tokeek.de/view.php?id=391 reger 2014-05-03 21:55:10 +02:00
  • b8cee9b7d8 remove tables from tabletracker on close to avoid lots of dead entries in /PerformanceMemory_p.html sixcooler 2014-05-02 22:55:47 +02:00
  • 1600414450 fix NPE on continuing crawls after YaCy restart (Agent is then null) reger 2014-05-02 19:32:09 +02:00
  • c600d5d144 Merge branch 'master' of git@gitorious.org:yacy/rc1.git orbiter 2014-05-02 17:28:20 +02:00
  • 0d88f292dc Key for parameter "blacklist name" is "list" in all servlets now. Marc Nause 2014-05-02 14:18:52 +02:00
  • dc3159df17 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-05-02 08:16:43 +02:00
  • 80e0ee92e5 adjust search page layout - search box to current style reger 2014-05-02 01:15:03 +02:00
  • a81dfc27eb remove obsolete css class bookmarkfieldset reger 2014-05-02 00:35:54 +02:00
  • 229f2248b8 added configuration option for maximum load and minimum ram for postprocessing Michael Peter Christen 2014-04-30 13:26:32 +02:00
  • f15c832587 Merge branch 'master' of git@gitorious.org:yacy/rc1.git orbiter 2014-04-30 07:42:52 +02:00
  • 0898f0be17 input-group for main search input window Michael Peter Christen 2014-04-30 06:46:06 +02:00
  • 9bb616d778 enhanced HostBrowser buttons and fixed text input alignment Michael Peter Christen 2014-04-30 06:21:53 +02:00
  • 4a818ad72c fix for strange fail reason Michael Peter Christen 2014-04-30 05:14:01 +02:00
  • a2fba6584f use submitted default userAgent if cloning a crawl Michael Peter Christen 2014-04-30 05:05:02 +02:00
  • e0822fa008 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Marc Nause 2014-04-30 00:48:55 +02:00
  • c97da1a0d8 First draft of a blacklist API. Marc Nause 2014-04-30 00:48:38 +02:00
  • 312972c586 add display filter (active/disabled) to IndexSchema_p.html config for easier overview of schema fields reger 2014-04-29 22:51:01 +02:00
  • d4f65833a1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-04-29 19:51:01 +02:00
  • c1c1be8f02 fix for slow crawling and better logging in balancer Michael Peter Christen 2014-04-29 19:50:33 +02:00
  • 3acf416335 npe fix Michael Peter Christen 2014-04-29 19:24:05 +02:00
  • 7d37e74b44 fix to menu colours Michael Peter Christen 2014-04-29 19:13:54 +02:00
  • 3d5e354471 small changes to search headline colour Michael Peter Christen 2014-04-29 18:46:50 +02:00
  • d79d7dde55 fix for result display Michael Peter Christen 2014-04-29 16:24:21 +02:00
  • 362c988c05 design fixes to better use the new colours Michael Peter Christen 2014-04-29 16:24:01 +02:00
  • 71efc76170 new default skin pdbootstrap which keeps the design shapes but slightly changes the colours to match with bootstrap colours Michael Peter Christen 2014-04-29 16:23:42 +02:00
  • bbadccbd8d better buttons Michael Peter Christen 2014-04-29 16:22:31 +02:00
  • 2eb7682772 add html5 audio/video <source> tag to html content scraper - <source src=.. type=..> tag content is added to embed collection reger 2014-04-29 00:41:29 +02:00
  • a9963d5c95 bootstrap update Michael Peter Christen 2014-04-28 11:52:13 +02:00
  • b2bbb9a0b5 Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1 Michael Peter Christen 2014-04-28 09:17:21 +02:00
  • 0b6db04e40 fix contentscraper img height/width parsing prevent numberformat exception on common "100px" property reger 2014-04-28 04:59:47 +02:00
  • 37424b0c42 Update russian translation malykhin.dmitry 2014-04-28 01:54:34 +04:00
  • 4e57000a40 remove redundant javascript & id in index.html to set focus to query field in IE11 reger 2014-04-27 22:22:00 +02:00
  • ffc5b75c73 optimize and fix lat / lon assignment reger 2014-04-27 20:52:06 +02:00
  • 9313447de2 reimplement tighter lat/lon calc in URIMetadataNode from old MetadataRow, considering http://mantis.tokeek.de/view.php?id=272 reger 2014-04-27 18:20:33 +02:00
  • d812f80784 add exit proxy link to UrlProxy on proxied pages a link to exit proxy is added to top of page. Link text can be configured in web.xml init-parameter (see default/web.xml). If missing no link is displayed. reger 2014-04-26 22:27:59 +02:00
  • 78d08998db throw MalformedURLException on unknown protocol on other than the supported http https ftp file smb \\ mailto reger 2014-04-26 01:30:51 +02:00
  • bb8181b2be fix: resolve url without path but searchpart e.g. http://yacy.net?q=test was resolved as host "yacy.net?q=test" now host="yacy.net" path="/" fixes http://mantis.tokeek.de/view.php?id=47 reger 2014-04-25 20:15:55 +02:00
  • a3542f29b4 npe fix orbiter 2014-04-25 09:26:20 +02:00
  • c48d2a2a02 npe fix orbiter 2014-04-25 09:23:10 +02:00
  • 121d25be38 recover sax fatal error on OAI-PMH import of xml with entity error this allows to continue loading next resumptionToken even if import file caused sax parser error fix http://mantis.tokeek.de/view.php?id=63 reger 2014-04-25 01:05:28 +02:00
  • 81dc2aa536 add current css to HTMLResponseWriter to fix metadata view (using css from metas.template except js links) reger 2014-04-23 23:41:10 +02:00
  • 2fd8a0ead6 Merge branch 'master' of git@gitorious.org:yacy/rc1.git orbiter 2014-04-23 23:13:23 +02:00
  • 8e5ce7cd51 fixed a situation where finished crawls had not been detected. orbiter 2014-04-23 23:13:07 +02:00
  • c6f0bd05f8 better removal of stored urls when doing a crawl start orbiter 2014-04-23 23:12:08 +02:00
  • 2f63bd0261 enhanced Host Balancer strategy: fair round robin orbiter 2014-04-23 23:11:37 +02:00
  • 0c88a32c36 do not apply lazy value instantiation for numeric or boolean values because that is misleading and confusing in case of 0- or false-values and may cause NPEs in retrieval functions. orbiter 2014-04-23 08:41:36 +02:00
  • 8e04030596 in case of short memory, do not cut down robinson peers to 1, just reduce by 50% orbiter 2014-04-23 08:37:19 +02:00
  • 86f6975edc exclude html tags in in/outboundlinks_anchortext_txt parsed text - some outboundlinks_anchortext_txt in index contain e.g. <span>text</span> or more tags, remove all tags for text property (inline img tags are still parsed) - added test case for above (to htmlParserTest) - fix solr test case reger 2014-04-23 00:55:16 +02:00
  • 469e0a62f1 added new button to terminate all crawls orbiter 2014-04-22 23:14:54 +02:00
  • ccb1864d55 catch IllegalArgumentException for wrong process types (that is needed for migrations when new process types are introduced or disappear) orbiter 2014-04-22 23:14:05 +02:00
  • 4ee4ba1576 fix for NPE in IndexCreateParserErrors_p.html caused by bad handling of lazy value instantiation of 0-value in crawldepth_i orbiter 2014-04-22 19:48:49 +02:00
  • 12ba890205 removed warnings orbiter 2014-04-22 19:35:15 +02:00
  • d51f9cc863 add custom Jetty errorhandler to provide custom error page footer line - remove redundant mime check in UrlProxyServlet reger 2014-04-21 17:28:21 +02:00
  • c193a02023 defer creation of new ArrayList after possible early return (to skip not used object allocation) reger 2014-04-21 17:16:06 +02:00
  • 727dfb5875 refactor URIMetadataNode to further unify interaction with index - URIMetadataNode extending SolrDocument - use language as stored (String), reducing conversion to string - optimize debug code in transferIndex reger 2014-04-20 01:41:30 +02:00
  • 79e7947442 - remove empty http0_9 status text array and unused default_charset = ISO-8859-1 reger 2014-04-18 22:03:16 +02:00
  • 2dabe2009d - remove unused manual http KeepAlive config (reducing references to obsolete httpdemon) - add port info to settings_http reger 2014-04-18 19:57:35 +02:00
  • 5746aae3db add canonical links to the same crawldepth, not the next crawldepth Michael Peter Christen 2014-04-18 06:51:46 +02:00
  • 74ab5ef9fa increased runtime for postprocessing query job Michael Peter Christen 2014-04-18 06:51:10 +02:00
  • 8b32dd5f9e special strategy for balancer: do not remove targets with zero wait time from the queue Michael Peter Christen 2014-04-18 06:50:07 +02:00
  • 9c6228d948 fix for deadlocks in crawler Michael Peter Christen 2014-04-17 16:58:17 +02:00
  • 7a2f3e2353 increased resource.disk.used.max.steadystate and resource.disk.used.max.overshot by 4 times because first users reached that limit and wondered why the crawler was paused automatically :) Michael Peter Christen 2014-04-17 16:19:38 +02:00
  • 10cf8215bd added crawl depth for failed documents Michael Peter Christen 2014-04-17 13:21:43 +02:00
  • 7fefebaeca Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-04-17 12:55:38 +02:00
  • c2f62e783f - better subgraph handling, less overhead for crawls without the webgraph - usage of crawler crawldepth cache for the linkgraph target depth computation Michael Peter Christen 2014-04-17 12:54:18 +02:00
  • 06afb568e2 new Strategies in Balancer: - doublecheck cache now records the crawl depth as well - doublecheck cache is available from the outside (made static) - no more need to crawl hosts with lowest depth first, instead all hosts which have only singleton entries are preferred to reduce the number of files. Michael Peter Christen 2014-04-17 12:52:54 +02:00
  • 1aea01fe5b fix for Table in case that requested file does not exist and paths also do not exist Michael Peter Christen 2014-04-17 12:44:05 +02:00
  • 710054bb37 implement gzip input handling directly in defaultservlet (making reference to legacy httpdemon obsolete) reger 2014-04-17 03:20:29 +02:00