Commit Graph

  • 1f300217f8 more protection for the cleanup thread orbiter 2011-07-17 08:39:39 +00:00
  • d13103a0a7 changed how the index cache is flushed: do not flush when a put is made, because that could cause many put calls to synchronize for a long time while a dump or a merge is performed. Instead, a watchdog thread does the dump, so puts can no longer block, which is good when a put happens during search result preparation. orbiter 2011-07-17 00:02:42 +00:00
  • b06faab9d3 do not allocate a StringBuilder object in case that there is not enough memory for that orbiter 2011-07-16 23:17:19 +00:00
  • 6a6f27eaf3 do not sort arrays again if arrays are already sorted orbiter 2011-07-16 19:21:39 +00:00
  • 3d043ce9d6 - refactoring - do not start worker threads in Array class if concurrency is not used orbiter 2011-07-16 19:13:30 +00:00
  • 48b78e9ff4 disabling concurrency in new sort since that is not working yet correctly orbiter 2011-07-16 11:54:47 +00:00
  • 62ac73a108 fixed bugs and deadlocks in core database indexing structures: - added new Array class that contains an abstraction of the java Arrays class which replaces the home-brew quicksort algorithm. - the new class is about four times slower than the old one, but it works correctly (the old one had errors) - fixed a synchronization problem orbiter 2011-07-16 10:08:43 +00:00
  • aff875baef smaller ping-entry @ ProfilingGraph sixcooler 2011-07-15 09:14:21 +00:00
  • 1912d0cccc changed handling of RowSet element retrieval: until today all elements had been copied from the underlying byte[] arrays into a new Entry object that again had a copy of a portion of that byte[] in its own byte[]. There was an option to just refer to the underlying byte[] with a pointer but that was almost never used. This commit now changes an interface to the Row class where it is now necessary to tell if a copy is always required. Fortunately the copy is only needed in very rare cases. That means that this change should cause much less memory allocation; it is expected that this happens especially during search situations. orbiter 2011-07-15 08:38:10 +00:00
  • bb8e3f8523 code cleanup orbiter 2011-07-14 21:42:30 +00:00
  • be15874be1 added request line in http which can support better debugging orbiter 2011-07-14 11:00:38 +00:00
  • 11dc653de3 added a visualization of peer pings to the performance graphic orbiter 2011-07-14 07:07:06 +00:00
  • 3a191cdf14 because newcomers are alarmed by the memory consumption shown in the performance graph, and because of arguments about high memory consumption that stem from poor knowledge of java garbage collection techniques, the memory display has been removed from the performance graph shown on the Status.html page. The memory graph can still be seen on the Performance page, where it is unchanged. orbiter 2011-07-14 03:25:57 +00:00
  • 09bb7a390c do not replace malformed or invalid URLs in urlproxy cominch 2011-07-12 07:44:23 +00:00
  • c0d9474b31 update to eclipse class path environment orbiter 2011-07-06 14:29:17 +00:00
  • 52d799e7c8 fix for solr auth orbiter 2011-07-05 09:21:30 +00:00
  • 9eb8e9acd9 no error message about missing browser in headless environments orbiter 2011-07-05 06:54:05 +00:00
  • d3c89b90ce temporarily adding the old httpclient-3.1 again because the solrj classes need it. should be removed as soon as solrj supports httpclient-4 orbiter 2011-07-04 17:04:49 +00:00
  • bd99969758 fixed bad query orbiter 2011-07-04 16:53:18 +00:00
  • 768c59740c - replaced solrj 3.1 with solrj 3.3 - updated also slf4j - added authentication for solrj orbiter 2011-07-04 16:35:30 +00:00
  • e7c7598923 docfix orbiter 2011-07-04 10:48:01 +00:00
  • c7b95e8c81 *) Invalid crawl profiles (containing invalid mustmatch/mustnotmatch filters) will be moved from active crawls to invalid crawls (new file: DATA/INDEX/freeworld/QUEUES/crawlProfilesInvalid.heap). This file cannot be edited yet, but it should be easy to extend the CrawlProfileEditor accordingly. *) Corrupt crawlProfilesPassive.heap would cause crawlProfilesActive.heap to be deleted. Don't know if this ever happened, but will not happen anymore. *) Cleaned up a little bit. *) Added some comments. low012 2011-07-03 23:55:55 +00:00
  • b84089ff04 fix for solr scheme list definition orbiter 2011-07-03 22:59:43 +00:00
  • fd02d6d9f8 fixed solr scheme table view orbiter 2011-07-03 22:55:36 +00:00
  • 4f730a711b same for debian as for latest commit orbiter 2011-07-03 21:40:12 +00:00
  • 60ee245486 setting startup options: -Xss256k and -XX:ReservedCodeCacheSize=1024m after appearance of a malloc error together with a crash of the jvm which stated at the end of the log: orbiter 2011-07-03 21:33:24 +00:00
  • 6d2e252bcf fix for: java.lang.NullPointerException at net.yacy.kelondro.index.RowCollection.<init>(RowCollection.java:97) at net.yacy.kelondro.index.RowSet.<init>(RowSet.java:48) at net.yacy.kelondro.rwi.ReferenceContainer.<init>(ReferenceContainer.java:58) at net.yacy.kelondro.rwi.ReferenceIterator.next(ReferenceIterator.java:69) at net.yacy.kelondro.rwi.ReferenceIterator.next(ReferenceIterator.java:43) at net.yacy.kelondro.blob.ArrayStack.merge(ArrayStack.java:1023) at net.yacy.kelondro.blob.ArrayStack.mergeWorker(ArrayStack.java:922) at net.yacy.kelondro.blob.ArrayStack.mergeMount(ArrayStack.java:869) at net.yacy.kelondro.rwi.IODispatcher$MergeJob.merge(IODispatcher.java:267) at net.yacy.kelondro.rwi.IODispatcher$MergeJob.access$300(IODispatcher.java:239) at net.yacy.kelondro.rwi.IODispatcher.run(IODispatcher.java:180) orbiter 2011-07-03 20:44:33 +00:00
  • 719777b2a7 replaced method to call getUsableSpace using reflection with direct call since we now use java 1.6 orbiter 2011-07-03 18:13:37 +00:00
  • 2d4bb139d3 - added counting of links with noindex tag for solr index - bugfixes for solr index orbiter 2011-07-03 06:40:05 +00:00
  • 528b59e078 replaced xerces.jar library that was originally added in 2005 with SVN 126 to the libx directory and that was moved to lib in SVN 5781. The new replacement is taken from http://xerces.apache.org, has the version 2.11.0, was inside the file Xerces-J-bin.2.11.0.tar.gz and consists of two files named xercesImpl.jar and xml-apis.jar. The original purpose of that library was to support: - content parsers - optional seed uploader - SOAP API (which will be committed later). Since the SOAP API does not exist any more, the purpose is to support content parsers and an optional seed uploader orbiter 2011-07-02 22:33:35 +00:00
  • e7e1a0f328 replaced commons-io v1.4 with v2.0.1 orbiter 2011-07-02 21:10:13 +00:00
  • 5092a14bcb replaced fontbox, jempbox, pdfbox v 1.5 with v1.6 orbiter 2011-07-02 20:52:33 +00:00
  • 68681a9576 hint for proxy scraping lotus 2011-07-02 17:23:37 +00:00
  • fa6f2c2b44 use proxy accounts by default for more security http://bugs.yacy.net/view.php?id=45 lotus 2011-07-02 17:16:00 +00:00
  • 892caccdca added default configuration in ConfigurationSet in case of new values orbiter 2011-07-02 00:09:49 +00:00
  • 7bf39c8bcf added XX:MaxPermSize to debian and mac start scripts orbiter 2011-07-01 22:50:46 +00:00
  • bda3eec0ff added parsing of canonical link element to html parser orbiter 2011-07-01 16:38:01 +00:00
  • b6f09a475d - added an index profile editor in the /indexFederated_p.html servlet for solr indexes orbiter 2011-06-30 15:49:21 +00:00
  • 214ea005cf added "-XX:MaxPermSize=256m" to start script orbiter 2011-06-30 15:44:06 +00:00
  • b666a929e7 fixed Semaphore handling in case of interruptions orbiter 2011-06-30 15:37:14 +00:00
  • de7a054d77 added parser for files like the new solr.key.list; it parses text files with the following syntax: - all lines beginning with '##' are comments - all non-empty lines not beginning with '#' are keyword lines - all lines beginning with '#' and where the second character is not '#' are commented-out keyword lines orbiter 2011-06-29 15:35:45 +00:00
  • 6deef60bc0 added keyword list for solr index attributes orbiter 2011-06-29 15:33:27 +00:00
  • a17351dcfe * navigation bar for filetype constraints f1ori 2011-06-29 15:30:24 +00:00
  • 96957375cc * fix url proxy for relative links and chromium f1ori 2011-06-29 09:32:02 +00:00
  • fdc84d8319 small pi link on index page to administration pages f1ori 2011-06-29 09:32:00 +00:00
  • 9ebc75db4b fix for channel authorization orbiter 2011-06-26 23:14:02 +00:00
  • 267290a821 removed the semaphores from the cache dump process because I believe some of the semaphores may be lost somewhere, with the result that the cache is never flushed and the peer then dies from an OOM. The re-introduced synchronization may not be the best solution but should ensure that the caches are flushed. orbiter 2011-06-26 21:45:04 +00:00
  • 6d9e5865ee faster appearance of search result page (but complete search time is the same) this was inspired by http://bugs.yacy.net/view.php?id=37 orbiter 2011-06-26 21:17:02 +00:00
  • f7ca84cfc0 enhanced template engine orbiter 2011-06-26 21:15:13 +00:00
  • 4fe1329de2 *) trying to at least fix symptoms of http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3293#p22791 low012 2011-06-25 10:15:42 +00:00
  • d8072d1866 added more info to DNS cache in /PerformanceMemory_p.html orbiter 2011-06-24 08:27:36 +00:00
  • f803da8aae code cleanup orbiter 2011-06-24 00:24:00 +00:00
  • 4999740790 added new navigation to search trailer json and xml files so that this navigation is also available in the search widget orbiter 2011-06-24 00:22:57 +00:00
  • 84c9658644 added a file type navigator; added a protocol navigator orbiter 2011-06-23 15:39:52 +00:00
  • 31283ecd07 - added a search option to filter only specific network protocols, e.g. get only results from ftp servers. Just add '/ftp' to your search. for example search for "passwd /ftp". This can also be done with /http /https and /smb - fixed some search throttling processes that should protect your peer against search DoS or strong search load orbiter 2011-06-23 11:57:17 +00:00
  • 4b425ffdd2 fix for http://bugs.yacy.net/view.php?id=41 added another RSS channel "PROXY". the rss feed for peer news filters this channel if there is not an authorized access on that channel orbiter 2011-06-22 10:19:32 +00:00
  • a65ecffef6 fix for http://bugs.yacy.net/view.php?id=42 orbiter 2011-06-22 10:04:30 +00:00
  • 7db208c992 performance hacks: more pre-allocated StringBuilder orbiter 2011-06-21 23:10:50 +00:00
  • 87bd559c42 fixed warning orbiter 2011-06-20 22:53:43 +00:00
  • 07e89a7ae5 added @Deprecated orbiter 2011-06-20 22:33:45 +00:00
  • 9706fc55aa enhanced content scraper (should discover urls much faster in case of very large plain texts) orbiter 2011-06-20 22:29:45 +00:00
  • 996f0a8764 disabled assert in Base64Order which eats away too much performance during testing with -l orbiter 2011-06-19 13:34:55 +00:00
  • f667b9c289 enhanced identificator: using AtomicInteger for counter orbiter 2011-06-19 13:31:10 +00:00
  • 16327d1cbe unwrapping of call depth (one call less for UTF8.String) orbiter 2011-06-19 13:15:01 +00:00
  • f30d36b101 enhanced template engine orbiter 2011-06-19 13:02:06 +00:00
  • aa6c32d753 enhanced UTCDiffString orbiter 2011-06-19 12:38:06 +00:00
  • 07cbb6cb5f display cache hit/miss values in correct column lotus 2011-06-15 18:57:04 +00:00
  • f87865a50b always shutdown log, fixes zombie processes in init stop script f1ori 2011-06-15 09:14:51 +00:00
  • 115abc8917 - more attributes for search progress bar - moved cache strategy to cora package orbiter 2011-06-13 21:44:03 +00:00
  • ccad615f58 Inserted the Java Xms and Xmx values for the "run" target (run YaCy). suessthomas 2011-06-11 21:22:08 +00:00
  • 7bfa6bb4b6 prevent getting a yacySeed from zero-length-hash-string by chance (e.g. proxy-crawls got displayed as initiated by some other peer) sixcooler 2011-06-05 22:58:17 +00:00
  • bce280a308 update on options for interface graphics orbiter 2011-06-05 22:48:21 +00:00
  • 77fe69395d added jempbox-1.5.0.jar which is required by pdfbox-1.5 as stated in http://pdfbox.apache.org/dependencies.html orbiter 2011-06-05 20:04:41 +00:00
  • 72a3cd5832 equalize lock icon for Status.html lotus 2011-06-04 18:55:09 +00:00
  • df1725ef43 re-enable POST over proxy, which didn't work since update to httpcore-4.1.1 sixcooler 2011-06-04 13:25:03 +00:00
  • 66c477129e Creates a new network definition, yacy.networks.metager.unit. The YaCy freeworld network is used in this network definition; minor enhancements for the MetaGer feed were integrated. suessthomas 2011-06-03 22:34:42 +00:00
  • 2683162ec5 - added more options to access grid picture, web structure picture and network graphics - remove test class orbiter 2011-06-02 23:27:26 +00:00
  • efcd21e0ed new httpclient, httpcore (bugfix release) sixcooler 2011-06-02 21:34:50 +00:00
  • d0d6123b18 added a deploy script that can be used to deploy yacy releases into the current release for testing orbiter 2011-06-01 19:52:05 +00:00
  • 265b7ce4f9 removed pause in search test orbiter 2011-06-01 19:49:44 +00:00
  • 0c1b29f3c9 - applied many small performance hacks - added a memory limitation in the zip parser and the pdf parser - added a search throttling: if too many search queries are still to be computed, then new requests are not accepted for some time. if after one second there is still no room to perform another search, the search terminates with no results. this case should only happen in DoS-like situations and under strong load on a peer, e.g. if it is integrated in metager. - added a search cache deletion process that removes search requests in case that throttling happens orbiter 2011-06-01 19:31:56 +00:00
  • 900dacbf97 * improve link rewriting in proxy-url * only rewrites links, which are in current search domain f1ori 2011-06-01 13:27:04 +00:00
  • 7fea51ecee check filter to be a correct pattern on edit CrawlProfiles, see: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=3277&p=22662#p22660 sixcooler 2011-05-31 16:13:33 +00:00
  • af63aa1d0e added fresh links to java regular expression api-doc orbiter 2011-05-31 13:33:04 +00:00
  • dc855d881b * further improve proxyurl f1ori 2011-05-30 21:25:20 +00:00
  • 761b1c71dc added latest pdfbox orbiter 2011-05-30 14:56:36 +00:00
  • a7a6b392f5 code cleanup orbiter 2011-05-30 10:16:43 +00:00
  • fe0c08455b more concurrency (enhancement) hacks orbiter 2011-05-30 08:53:58 +00:00
  • 0e9a99cb05 another resource hack orbiter 2011-05-30 07:51:18 +00:00
  • 535b6b953c more hacks to omit superfluous string object allocation orbiter 2011-05-30 07:31:17 +00:00
  • 87082f407e less String object creation during search orbiter 2011-05-30 04:19:20 +00:00
  • ab5a16b957 less memory occupation during ranking and faster host navigator orbiter 2011-05-29 20:33:12 +00:00
  • b8aa41a1b4 show nsis version in installer again for more transparency lotus 2011-05-27 16:30:55 +00:00
  • 1489ebeedf one more hack to free ram for search events orbiter 2011-05-27 14:26:37 +00:00
  • 3c2b994bd6 write access/load time to solr index orbiter 2011-05-27 12:35:08 +00:00
  • a36fda991e hack to increase speed of url hash computation orbiter 2011-05-27 12:34:38 +00:00
  • 752576b521 - localsearch test script does also a snippet-fetch - killYACY.sh does not need a sleep between kill -3 and kill -9 orbiter 2011-05-27 12:08:45 +00:00
  • ddcc333acc * fix negative result counts f1ori 2011-05-27 11:21:00 +00:00
  • fa734bdf9f better memory protection in search logger orbiter 2011-05-27 11:18:22 +00:00
  • dbea40d536 - changed snippet fetch strategy logic: do not check if entry is in cache. This should reduce IO load on the HTCACHE which is a showstopper during large number of search requests - forced a possible short memory status when a search is started to flush caches that may cause search-heaps with resource contention effects orbiter 2011-05-27 09:32:03 +00:00
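
Commit de7a054d77 above describes a three-rule line syntax for keyword lists such as solr.key.list. The following is a minimal sketch of those rules only, not the code from that commit: the class name KeywordListSketch, the Parsed holder and the parse method are hypothetical. It also assumes that lines are trimmed before classification and that a line consisting of a lone '#' is ignored; the commit message specifies neither.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the keyword-list syntax described in commit de7a054d77. */
public class KeywordListSketch {

    /** Holds the active keywords and the commented-out keywords separately. */
    public static final class Parsed {
        public final List<String> enabled = new ArrayList<>();
        public final List<String> disabled = new ArrayList<>();
    }

    public static Parsed parse(final Iterable<String> lines) {
        final Parsed result = new Parsed();
        for (final String raw : lines) {
            final String line = raw.trim(); // assumption: surrounding whitespace is not significant
            if (line.isEmpty()) continue;   // empty lines carry no keyword
            if (line.startsWith("##")) continue; // rule 1: '##' starts a comment
            if (line.charAt(0) == '#') {
                // rule 3: a single '#' marks a commented-out keyword line
                final String keyword = line.substring(1).trim();
                if (!keyword.isEmpty()) result.disabled.add(keyword); // assumption: a lone '#' is ignored
                continue;
            }
            result.enabled.add(line); // rule 2: any other non-empty line is a keyword line
        }
        return result;
    }
}
```

Calling KeywordListSketch.parse on the lines of such a file returns the active keywords in enabled and the commented-out ones in disabled; keeping the two lists apart would let an editor show disabled keywords so they can be switched back on.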