Commit Graph

  • 601d63ef48 removed comment tag (no use at this point) orbiter 2009-04-11 12:24:45 +00:00
  • c2d85b039e *) added language statistics files low012 2009-04-11 10:18:42 +00:00
  • 0c8fd811dc *) first and very limited version of XML import, does not use benefits provided by XML yet low012 2009-04-11 09:55:46 +00:00
  • fcb77c3140 * added .im (Isle of Man) to TLD-list f1ori 2009-04-10 23:06:48 +00:00
  • 8ce5bb4f31 added shell scripts that list host addresses orbiter 2009-04-10 09:45:22 +00:00
  • 51ea865569 small fix for localsearch shell script orbiter 2009-04-10 09:44:03 +00:00
  • b81c7467d8 protection against too many files in RICELL in case of massive emergency dumps caused by low memory orbiter 2009-04-09 23:55:47 +00:00
  • d4d87d90c4 - extended experimental wikipedia dump parser - removed historic, possibly unused code from wiki parser that was in conflict with actual wikipedia wiki code orbiter 2009-04-09 14:55:20 +00:00
  • c3aff2521e fix for NPE orbiter 2009-04-09 13:32:56 +00:00
  • 57c00dd8c9 fix for bad filtering of common http error orbiter 2009-04-09 13:20:09 +00:00
  • 14361f1ca4 added log message for index generation in HeapReader orbiter 2009-04-09 10:34:22 +00:00
  • 43bcd192cd oops orbiter 2009-04-08 15:56:38 +00:00
  • c08f9b36a4 refactoring of wiki parser. This was done to prepare the wiki parser as parser for wikipedia dumps, which will be used for performance test (to omit crawling) orbiter 2009-04-08 15:28:45 +00:00
  • faeff21012 - fix for display of automatic ReCrawls in surftips apfelmaennchen 2009-04-08 07:35:08 +00:00
  • 44e01afa5b - refactoring - a little bit more abstraction - new interfaces for index abstraction orbiter 2009-04-07 09:34:41 +00:00
  • 82fb60a720 increased memory limit for emergency cache flush orbiter 2009-04-06 15:54:19 +00:00
  • 4905a17f6a moved xerces.jar from libx to lib orbiter 2009-04-06 14:45:33 +00:00
  • 9180617dd9 *) Classes to handle import of lists (especially blacklists) from XML files, not used yet, but will be used soon. low012 2009-04-05 13:36:44 +00:00
  • 596e6215dc fix in case of white space in path name lotus 2009-04-03 16:07:24 +00:00
  • b887f4a116 keep more free mem orbiter 2009-04-03 14:27:04 +00:00
  • c2359f20dd refactoring: better abstraction of reference and metadata prototypes. This is a preparation to introduce other index tables as used now only for reverse text indexes. Next application of the reverse index is a citation index. Moved to version 0.74 orbiter 2009-04-03 13:23:45 +00:00
  • ab656687d7 more strict BLOB initialization .. may also help to save some ram orbiter 2009-04-03 12:42:24 +00:00
  • 5b138ada16 fixes to web structure reference collection and url construction orbiter 2009-04-03 08:29:40 +00:00
  • a29a11e526 added evaluation of incoming links in webstructure api. The api hash changed; new XML schema. orbiter 2009-04-03 07:59:49 +00:00
  • f6691411b5 - migration of files from SplitTable (which are used for the URL-DB) to a different file name format. - the file generation logic is slightly different: files may now have only a maximum size of one gigabyte and a maximum age of one month. orbiter 2009-04-02 22:15:33 +00:00
  • 1f37cc6107 Robots.txt is now reused after one day. See forum-topic: http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1669&p=13565#p13565 shostakovich 2009-04-02 15:29:36 +00:00
  • f21a8c9e9c a different naming scheme for BLOBArray files. This may be necessary if blobs are written more often than once in a second. orbiter 2009-04-02 15:08:56 +00:00
  • 7ba078daa1 - added fast site-operator - refactoring merge into BLOBArray orbiter 2009-04-02 13:26:47 +00:00
  • b4126432bc hardening of index dump write process orbiter 2009-04-02 12:24:15 +00:00
  • 9bfb2641db - removed deprecated threads - added automatic http client reset. this was necessary because excessive intranet crawling caused deadlocks. this hack solved the problem. orbiter 2009-04-01 20:13:57 +00:00
  • 293290c317 fix for bad assert in last commit orbiter 2009-04-01 15:17:14 +00:00
  • bd409fb7ba added web structure analysis for a special domain that can be requested from the api. orbiter 2009-04-01 14:53:23 +00:00
  • b6c2167143 - patch for bad web structure dumps - added automatic slow-down of accesses to specific domains when access to a web page fails orbiter 2009-04-01 13:21:47 +00:00
  • 0139988c04 - added writing of temporary file names and renaming to final file name when index dump/merge are done. Interrupted merges can be cleaned up. - added clean-up of unfinished merges and unused idx/gap files - enhanced merge file selection method orbiter 2009-04-01 12:39:11 +00:00
  • 3621aa96ab - added a memory protection for the IndexCell migration - fix for bad cell file selection orbiter 2009-03-31 19:17:45 +00:00
  • 568e8f1741 fix in unmountBLOB orbiter 2009-03-31 17:03:13 +00:00
  • 9da69d6b68 - better selection of files to be merged - fix for getChannel().close(), which works on windows but not on macs and linux orbiter 2009-03-31 16:49:02 +00:00
  • d39a5b42ca more care about open file handles. Now files also close on windows and can be deleted afterwards. orbiter 2009-03-31 12:42:12 +00:00
  • 029495e64d fixed bug introduced in SVN 5756 in EcoTable.put() orbiter 2009-03-31 07:51:32 +00:00
  • 587838bd09 git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@5758 6c8d7289-2bf4-0310-a012-ef5d649a1542 orbiter 2009-03-30 21:13:53 +00:00
  • d2e2420a68 - added another file selection method for index cell merge - more hacks to check that files are closed properly and file handles do not exist after files are closed. orbiter 2009-03-30 19:05:08 +00:00
  • 96eaecda3e - added migration class to go from index collections to the index cell data structure. - added better control over file deletion, because this sometimes fails, especially on windows orbiter 2009-03-30 15:31:25 +00:00
  • 9ab009b16b fix for http://forum.yacy-websuche.de/viewtopic.php?f=6&t=1890#p13476 apfelmaennchen 2009-03-30 07:33:43 +00:00
  • 0f0b4aec75 better index cell merge logic orbiter 2009-03-30 06:22:27 +00:00
  • 832fef670f migration of urls-files into subdirectory METADATA orbiter 2009-03-30 04:41:06 +00:00
  • fa07234d4e fix for clear method: now deletes files orbiter 2009-03-29 21:28:14 +00:00
  • eb65990f85 small fix for opera in yacyui-portalsearch apfelmaennchen 2009-03-29 12:49:42 +00:00
  • 695c420bcd small fix for yacyui-portalsearch apfelmaennchen 2009-03-29 10:31:15 +00:00
  • 95885a263a - added default properties to yacyui-portalsearch - see http://localhost.8080/yacy/ui/yacyui-portaltest.html apfelmaennchen 2009-03-29 09:29:08 +00:00
  • c001a020a9 - small modifications to yacyui-portalsearch - see http://forum.yacy-websuche.de/viewtopic.php?f=15&t=1762&p=13459#p13459 apfelmaennchen 2009-03-28 23:06:15 +00:00
  • df87e4dbf6 missing count of sent Index and URLs lulabad 2009-03-28 20:49:58 +00:00
  • 34a825f90d small fix for yacyui-portaltest.html apfelmaennchen 2009-03-27 06:41:39 +00:00
  • 9f9d7f875d small fix apfelmaennchen 2009-03-27 06:30:59 +00:00
  • 453f3aaa94 RichClient: further clean-up apfelmaennchen 2009-03-27 06:06:14 +00:00
  • e888c9a934 RichClient: - renamed base theme to start theme - removed all but start theme - additional themes can be downloaded from http://jquery-ui.googlecode.com/files/jquery-ui-themes-1.7.zip - or a custom theme can be generated at http://jqueryui.com/themeroller/ - themes are installed into DATA/LOCALE/htroot/yacy/ui/css/themes - update for RichClient theme selection will follow soon apfelmaennchen 2009-03-27 06:04:39 +00:00
  • 42c5f930c8 reverted an accidental commit apfelmaennchen 2009-03-24 05:41:29 +00:00
  • b5e6232f8d small correction of font-size for portal search apfelmaennchen 2009-03-23 21:26:28 +00:00
  • 7425c6c3ca added an ajax loading graph to portal search apfelmaennchen 2009-03-23 20:22:53 +00:00
  • a975ae4a7e Added YaCy portal search: http://localhost:8080/yacy/ui/yacyui-portaltest.html apfelmaennchen 2009-03-23 20:16:07 +00:00
  • b57a1820bd small fix for jquery-faviconize-1.0.js to handle https properly apfelmaennchen 2009-03-21 19:35:20 +00:00
  • 075b58a0a9 minor fixes for RichClient search apfelmaennchen 2009-03-21 18:54:42 +00:00
  • c450e3746b svn attributes added borg-0300 2009-03-20 15:44:59 +00:00
  • 37f892b988 added new concurrent merger class for IndexCell RWI data orbiter 2009-03-20 14:54:37 +00:00
  • 8c494afcfe svn attributes added borg-0300 2009-03-20 11:21:32 +00:00
  • 67aaffc0a2 - added Latency control to the crawler: because of the strongly enhanced indexing speed when using the new IndexCell RWI data structures (> 2000 PPM on my notebook), it is now necessary to control the crawling speed depending on the response time of the target server (which is also YaCy in case of some intranet indexing use cases). The latency factor in crawl delay times is derived from the time that a target host takes to answer http requests. For internet domains, the crawl delay is a minimum of twice the response time; in intranet cases the delay time is now half of the response time. orbiter 2009-03-20 10:21:23 +00:00
  • 7426dde6a6 windows installer: * install 64 bit JRE in case of 64 bit OS (not tested yet) * fewer languages but localized hint boxes * some cosmetics lotus 2009-03-19 18:53:30 +00:00
  • 0926310461 another performance hack orbiter 2009-03-18 22:33:36 +00:00
  • ebe5d69d14 performance hacks orbiter 2009-03-18 22:19:08 +00:00
  • 61f9dbf0cc - fixed a display problem in watch crawler - another small enhancement in balancer orbiter 2009-03-18 21:25:52 +00:00
  • b3f75e48fa - enhanced balancer: auto-solving of waiting-deadlocks - removed deprecated cache-init size value - more debug lines for IndexCell cache dump merge orbiter 2009-03-18 20:21:19 +00:00
  • 9a90ea05e0 added a merge operation for IndexCell data structures orbiter 2009-03-18 16:14:31 +00:00
  • d99ff745aa fix for http://forum.yacy-websuche.de/viewtopic.php?p=13378#p13378 orbiter 2009-03-18 10:29:13 +00:00
  • 0c3ab291c4 fix for http://forum.yacy-websuche.de/viewtopic.php?p=13354#p13354 orbiter 2009-03-17 22:20:58 +00:00
  • a9cea419ef Integration of the new index data structure IndexCell. This is the start of a testing phase for the IndexCell data structure, which will replace the collections and caching strategy. IndexCell creation and maintenance is fast, has no caching overhead, very low IO load and is the basis for the next data structure, index segments. orbiter 2009-03-17 13:03:27 +00:00
  • fd0976c0a7 refactoring borg-0300 2009-03-16 18:08:43 +00:00
  • 83792d9233 more refactoring orbiter 2009-03-16 16:24:53 +00:00
  • ce79239322 "typo" borg-0300 2009-03-16 16:22:33 +00:00
  • cdbdc731c5 small updates: unescape, isCGI borg-0300 2009-03-16 08:49:49 +00:00
  • 474aac65af more refactoring orbiter 2009-03-16 08:32:28 +00:00
  • 209f25f5f5 refactoring to integrate indexCell data structures orbiter 2009-03-16 00:18:37 +00:00
  • 359a238acf faster isCGI() borg-0300 2009-03-14 13:47:49 +00:00
  • f75628e53b some corrections borg-0300 2009-03-14 11:08:32 +00:00
  • b7138e5fcb even more efficient comparator calls (less System.arraycopy for primary keys) orbiter 2009-03-14 00:41:20 +00:00
  • 65784eb656 - more efficient comparator calls - fix for http://forum.yacy-websuche.de/viewtopic.php?p=13331#p13331 orbiter 2009-03-14 00:07:37 +00:00
  • 44874cb550 added a deleteOnExit for blob file deletion in case that a deletion is not successful. orbiter 2009-03-13 22:47:31 +00:00
  • 66f78d67e0 bad idea. Concurrency in index management will be done differently orbiter 2009-03-13 22:22:11 +00:00
  • 7dff1cba62 removed option to use different primary keys in kelondro tables. This option was never used and there is also no use to set other columns but the first as the primary key. As a result, access methods to the key do not need to compute key positions, and they work faster. orbiter 2009-03-13 16:52:31 +00:00
  • 7f67238f8b refactoring of plasmaWordIndex: less methods in the class, separated the index to CachedIndexCollection orbiter 2009-03-13 14:56:25 +00:00
  • 14a1c33823 refactoring of wordIndex class orbiter 2009-03-13 10:34:51 +00:00
  • d49238a637 more performance hacks: better default values for scaling, less memory usage orbiter 2009-03-13 10:07:04 +00:00
  • 39644dc14e performance hacks to compare methods in database core orbiter 2009-03-13 09:30:19 +00:00
  • e2e7949feb replaced old PPM computation with a better one that simply sums up events that had been stored in the profiling table. orbiter 2009-03-13 00:13:47 +00:00
  • f6d989aa04 added new class RowSetArray which arranges RowSet objects like Elements in a hashtable, but still provides the functionality of sorted enumeration. The new class is now integrated into the ObjectIndexCache, which is the core class to provide index functions to all database files. The new index access is about twice as fast as before. This has strong speed enhancement effects on all parts of YaCy. orbiter 2009-03-12 23:05:18 +00:00
  • 0a2fabeef3 static TMPDIR borg-0300 2009-03-12 16:23:12 +00:00
  • 9f7e62e900 refactoring lotus 2009-03-12 16:20:04 +00:00
  • f35dc11dc4 allow crawl start from pages with script tags http://forum.yacy-websuche.de/viewtopic.php?f=5&t=1910 lotus 2009-03-12 16:12:50 +00:00
  • 6958eff196 removed unnecessary exceptions, extended testing in IntegerHandleIndex orbiter 2009-03-12 07:35:17 +00:00
  • 13c666adef performance hack to ObjectIndex put() method: Java standard classes provide a Map interface that has a put() method that returns the object that was replaced by the argument of the put call. The kelondro ObjectIndex defined a put method in the same way, that is, it also returned the previous value of the Entry object before the put call. However, this value was not used by the calling code in most cases. This change implements a put method that does not return the previous value, to reflect the common use. Omitting the return of previous values will cause some benefit in performance. The functionality to get the previous value is still maintained, and provided with a new 'replace' method. orbiter 2009-03-11 20:23:19 +00:00
  • 1f1be1518c added stub for another performance hack: concurrent indexes orbiter 2009-03-11 15:52:03 +00:00
  • 3e4c28e188 enhanced count feature for kelondroRowSet. This is about twice as fast as before. Should speed up the collection analysis (half time!) orbiter 2009-03-11 15:10:38 +00:00
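
The latency rule stated in commit 67aaffc0a2 (internet hosts: delay of at least twice the response time; intranet hosts: half the response time) could be sketched roughly as below. This is only an illustration, not YaCy's actual code: the class `LatencySketch`, the method `crawlDelayMillis`, and the `minDelayMillis` parameter (a configured base delay) are hypothetical names, and ignoring the base delay for intranet hosts is an assumption.

```java
public class LatencySketch {

    /**
     * Hypothetical helper: derives a crawl delay from a measured response time.
     *
     * @param responseTimeMillis time the target host took to answer an http request
     * @param intranet           true if the target is an intranet host
     * @param minDelayMillis     assumed configured base delay (not named in the commit)
     */
    public static long crawlDelayMillis(long responseTimeMillis, boolean intranet, long minDelayMillis) {
        if (responseTimeMillis < 0 || minDelayMillis < 0) {
            throw new IllegalArgumentException("times must not be negative");
        }
        if (intranet) {
            // intranet case: half of the response time (base delay ignored by assumption)
            return responseTimeMillis / 2;
        }
        // internet case: never less than twice the response time
        return Math.max(minDelayMillis, 2 * responseTimeMillis);
    }

    public static void main(String[] args) {
        System.out.println(crawlDelayMillis(300, false, 500)); // slow internet host
        System.out.println(crawlDelayMillis(100, false, 500)); // fast internet host
        System.out.println(crawlDelayMillis(300, true, 500));  // intranet host
    }
}
```

With this shape, a slow internet host is automatically crawled more gently, while a fast intranet peer is throttled only lightly.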