Commit Graph

  • 2f536cb54d code cleanup: removed unused methods and made more methods and objects private Michael Peter Christen 2012-10-08 10:50:24 +02:00
  • 584663ae8c - redesign of solr query construction - fix for solr boosts and location search - fix for number of search results in local search Michael Peter Christen 2012-10-07 07:46:55 +02:00
  • 6ab64746d7 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-10-06 03:35:32 +02:00
  • a8167e6e5b clean-up: removed unused methods in kelondro Michael Peter Christen 2012-10-06 03:34:52 +02:00
  • 5cb244b79b Merge remote branch 'origin/master' sof 2012-10-05 18:54:39 +02:00
  • 88b062210c Added a parser for audio file tags (e.g. ID3 tags for MP3 files) based on the jaudiotagger library. The parser is disabled by default as it needs to store temporary files for non file:// protocols, which might be disliked. For your local MP3-collection it loads nicely Artist, Title, Album etc. from the audio files meta data. apfelmaennchen 2012-10-05 18:54:26 +02:00
  • 28bd3e62b1 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-10-05 00:04:09 +02:00
  • 4fed4a86d8 another fix to location search orbiter 2012-10-04 22:44:44 +02:00
  • 507c612015 Merge branch 'master' of git://gitorious.org/~reger/yacy/bbyacy-rc1 orbiter 2012-10-04 21:32:04 +02:00
  • 5650b0333e adjusted Netbeans-IDE classpath to current jars change solr jars to 3.6.1 (from 3.6.0) change lucene jars to 3.6.1 (from 3.6.0) added jsoup-1.6.3 reger 2012-10-04 21:12:09 +02:00
  • b58e1f6d67 - add translation for ConfigHeuristics_p.html # section search-result - removed old/unused scroogle text reger 2012-10-04 20:57:29 +02:00
  • 0f7a54452d fix for location search query encoding orbiter 2012-10-04 14:46:40 +02:00
  • 679d562908 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-10-04 13:18:52 +02:00
  • 9aa21506be bump to httpcore-4.2.2 (maintenance release) sixcooler 2012-10-03 02:15:02 +02:00
  • 31485a963d refactoring Michael Peter Christen 2012-10-02 21:57:50 +02:00
  • 406e1f3e7e added an option to start indexing right from the host browser Michael Peter Christen 2012-10-02 21:18:27 +02:00
  • f8a3ab2d82 added the usage of synonyms to the GSA search interface Michael Peter Christen 2012-10-02 14:29:45 +02:00
  • 3d33a5bdf6 turned the synonyms_t Text field into a multi-valued String field synonyms_sxt Michael Peter Christen 2012-10-02 11:13:06 +02:00
  • 41ab2a2279 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-10-02 10:24:03 +02:00
  • c8b1a693dc ups, added missing class for last commit orbiter 2012-10-02 10:23:10 +02:00
  • 3b959ee002 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-10-02 10:14:09 +02:00
  • 3190347814 added a synonyms_t field to solr and a process to read synonym files. This can be used to add another stemming to solr using stemming files that are expressed as synonyms for grammatical alternatives. The synonym/stemming files must have the following form: - each line is a comma-separated list of synonyms - the list of synonyms may be enclosed with {} (like the GSA synonyms file) - the file may contain comments which are lines starting with a '#' The synonym file(s) must be placed in DATA/DICTIONARIES/synonyms/ and are activated by default whenever a synonym file is in place. Then, for each word that is found in a document all synonyms are added to a long text field which is stored into synonyms_t. Processes using the synonyms must query with that field as optional matcher. orbiter 2012-10-02 00:02:50 +02:00
  • 411d0e839b added an underline text field to solr to record all underlined texts Michael Peter Christen 2012-10-01 14:16:49 +02:00
  • be4c96f3b1 The HostBrowser now offers to index files that are discovered because they are linked in the web interface. orbiter 2012-09-30 13:23:06 +02:00
  • c4a3d8870f fixed computation of links in host browser which are not indexed but known by the crawler. Such links are now displayed in grey color. Michael Peter Christen 2012-09-29 02:13:11 +02:00
  • 97a47319c8 added nice links to the host browser: - click on the file icon to get the metadata of the file - click on the link icon behind the link to open the original file in the browser Michael Peter Christen 2012-09-28 23:09:21 +02:00
  • f45f7fc12e added new Host Browser to main menu: this new search interface is something completely new for search, but completely common on desktops: browse a web space like one would browse a file system in a file browser. The file listing is created using the search index and a faceted restriction to specific domains. Michael Peter Christen 2012-09-28 22:45:16 +02:00
  • 8556a3d521 extended solr connector with a method to retrieve a single facet. Michael Peter Christen 2012-09-28 13:50:13 +02:00
  • d0015df61c added lucene memory library which is now necessary as solr has to process more complex queries Michael Peter Christen 2012-09-28 13:48:51 +02:00
  • 80edd8ecd7 some more after-refactoring fixes Michael Peter Christen 2012-09-28 10:24:57 +02:00
  • 816cb6ce93 another fix for the debian installer: the installer fails because some classes had unresolved dependencies. This fix removes the dependencies. Michael Peter Christen 2012-09-28 09:00:40 +02:00
  • c461c28c5d fix for debian package installation (caused by refactoring) Michael Peter Christen 2012-09-27 17:23:10 +02:00
  • 280e36c90b allow Cross-Origin Resource Sharing for all stream servlets, that is the solr and the gsa search interface. That means that all JavaScript in browsers now can Cross-Origin access all YaCy search interfaces, which opens the option of 'YaCy Client in Browser' and 'End-Point Fail-over' concepts. Michael Peter Christen 2012-09-27 12:02:24 +02:00
  • ccd65ecf8d fixed url search in IndexControlURLs_p.html / using now the solr interface Michael Peter Christen 2012-09-27 00:31:59 +02:00
  • 016ffa7434 increased strength of crawling waves in network image Michael Peter Christen 2012-09-26 23:32:13 +02:00
  • 23f68f2a69 force usage of default faceting mechanisms for search Michael Peter Christen 2012-09-26 18:48:59 +02:00
  • 24d2ee3c52 - better date ranking - more protection against NPE and time travel effects Michael Peter Christen 2012-09-26 18:36:32 +02:00
  • ca313e404f - if a "/date" modifier is used, the solr remote query applies an ordering by date (ascending) - added also some 'anti-timetravel' protection (check if date is in the future within any metadata date field) Michael Peter Christen 2012-09-26 16:56:33 +02:00
  • a4214694df We assert that no other metadata storage than solr is used now. Therefore a property like solrConnected() must be true all the time. Removal of this method causes removal of all write operations to the old metadata index. Michael Peter Christen 2012-09-26 16:05:11 +02:00
  • abab291162 made the index schema retrieval public and allow cross-domain retrieval Michael Peter Christen 2012-09-26 15:44:50 +02:00
  • 0cec7e761a enhanced snippet extractor to find snippets also inside of tokens of an url Michael Peter Christen 2012-09-26 15:33:37 +02:00
  • c65b576a6f added filename for missing crawlname when crawling from file sixcooler 2012-09-26 14:05:33 +02:00
  • 6c50d016ed pdf- and zipParser should not use forced Memory-Limits sixcooler 2012-09-26 14:03:51 +02:00
  • 562183932b - removed ip_s from default profile since that needs a DNS lookup to create a document entry. This makes remote search much slower. - removed synchronization of add method if ip_s is activated to prevent that a user configuration causes bad behavior. The disadvantage of that is that an index dump can cause data loss if an indexing is running during index dump - caught more exceptions and more NPE - better abstraction in MirrorSolrConnector - slight performance enhancement when only the index count is requested (rows=0 is sufficient to get a total count) Michael Peter Christen 2012-09-26 13:38:04 +02:00
  • 24f4ca4d85 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-09-26 12:01:34 +02:00
  • 7efe9eb37b adding CORS access header for Network.xml to overcome cross domain restriction (e.g. necessary to build a JavaScript YaCy client). apfelmaennchen 2012-09-26 10:36:09 +02:00
  • 116f429e35 fix for java.lang.RuntimeException: TableColumnIndex not available... apfelmaennchen 2012-09-26 09:56:16 +02:00
  • 5ac61591f3 better abstraction for solr query params Michael Peter Christen 2012-09-25 23:59:30 +02:00
  • c913b2ba77 - fix for NPEs during remote solr configuration - fixed remote solr setting switch - added more logging Michael Peter Christen 2012-09-25 23:59:09 +02:00
  • b5192e03d7 fixed bad output in stopYACY.sh Michael Peter Christen 2012-09-25 23:20:09 +02:00
  • 882d54067a added dummy update servlet Michael Peter Christen 2012-09-25 23:09:32 +02:00
  • 1533bfd63b refactoring Michael Peter Christen 2012-09-25 21:20:03 +02:00
  • e49359cc95 removed tenant query attribute since it is not used any more and is replaced by the site-operator in the GSA interface. This operator can also be simulated in the Solr interface using the collections_sxt field. Michael Peter Christen 2012-09-25 21:09:06 +02:00
  • 872f83ebe0 refactoring Michael Peter Christen 2012-09-25 21:04:58 +02:00
  • fb9460f0a8 using the search filter to drill down search to file types. A search like "mp3 filetype:mp3" will now maybe surprise you. Michael Peter Christen 2012-09-25 17:52:33 +02:00
  • bc865ab816 more cleaning (yacy-cora) Michael Peter Christen 2012-09-25 12:19:24 +02:00
  • 640339ee21 added the indexrestore.sh script which must be called with the path of the index dump. This is the reverse of indexdump.sh which takes the output of indexdump.sh as input to restore an index. Now it should be possible to transfer a complete YaCy Solr index from one peer yacy1 to another peer yacy2 with the following command: yacy2/bin/indexrestore.sh ´yacy1/bin/indexdump.sh´ Michael Peter Christen 2012-09-25 00:28:20 +02:00
  • 15ea053c3a - added xml output in IndexControlURLs to get the storage page of index dump commands - adjusted the apicall.sh script to get the downloaded text as output to stdout which is necessary to parse the content out of it - added indexdump.sh script which creates a solr dump and prints out the storage path for the index dump - added synchronization to the Fulltext class to prevent that data is stored to a non-existing solr index while this index is disabled during the storage of the dump Michael Peter Christen 2012-09-25 00:19:52 +02:00
  • 1b474139dd used the new zip writer/reader to add a solr dump process: the whole solr index can be written to a zip dump and also restored during runtime Michael Peter Christen 2012-09-24 17:05:28 +02:00
  • 4a3e684f8c added a directory-to-zip writer and zip-to-directory reader Michael Peter Christen 2012-09-24 17:04:37 +02:00
  • d9ebf4a40f a bit more logging Michael Peter Christen 2012-09-24 15:01:44 +02:00
  • 5683162bd3 simplifications in DHT Distribution class and more documentation Michael Peter Christen 2012-09-24 12:01:09 +02:00
  • e57bf2ca39 simplified DHT classes Michael Peter Christen 2012-09-24 01:04:39 +02:00
  • a053b356ee added new classes to renovate the YaCy protocol based on simple data structures in cora: - added the Peer object, which is a fresh version of Seed - added the Peers object, which is a fresh version of Network - added the Network api access class to retrieve a list of peers based on the Network.xml servlet in all YaCy peers. orbiter 2012-09-22 11:10:11 +02:00
  • 14897d4bfc fixed mistake in wt-option which caused that the yacy json format overlapped the solr built-in json format orbiter 2012-09-21 21:38:50 +02:00
  • 8219a445f3 refactoring Michael Peter Christen 2012-09-21 16:46:57 +02:00
  • f879a344e7 fix for no depth limit default value Michael Peter Christen 2012-09-21 16:05:17 +02:00
  • fa7f6f0be8 added HostBrowser servlet (stub) Michael Peter Christen 2012-09-21 15:48:40 +02:00
  • 00c1c777fa refactoring Michael Peter Christen 2012-09-21 15:48:16 +02:00
  • 563d584420 removed more dependencies in cora from kelondro orbiter 2012-09-21 11:02:36 +02:00
  • aa65282259 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2012-09-21 10:27:30 +02:00
  • 63762d8f89 removed kelondro dependencies from cora orbiter 2012-09-20 19:38:22 +02:00
  • 39564fddbd more ignore orbiter 2012-09-20 18:45:51 +02:00
  • 6e0f4557f8 added ftp to getName orbiter 2012-09-20 18:29:04 +02:00
  • 23204d2245 change parameter to support the smw extension for list import cominch 2012-09-20 15:02:57 +02:00
  • c235d5c0f1 fixed size parsing in RSS message parser (for YaCy size parameter) Michael Peter Christen 2012-09-19 06:36:07 +02:00
  • 089a03114e full memory usage for debian and when changing the size: debian seems to dislike the big difference between xmx and xms (I have crashes here which stop if both values are same) orbiter 2012-09-18 22:31:01 +02:00
  • 5bc8f34150 fix for success query counter Michael Peter Christen 2012-09-18 11:06:36 +02:00
  • 60b1e23f05 added new crawl options: - indexUrlMustMatch and indexUrlMustNotMatch which can be used to select loaded pages for indexing. Default patterns are in such a way that all loaded pages are also indexed (as before) but when doing an expert crawl start, then the user may select only specific urls to be indexed. - crawlerNoDepthLimitMatch is a new pattern that can be used to remove the crawl depth limitation. This filter is a never-match by default (which causes that the depth is used) but the user can select paths which will be loaded completely even if a crawl depth is reached. orbiter 2012-09-16 21:27:55 +02:00
  • 4987921d3d fixed the size() method which counted also failed pages (which are also inside the solr index) orbiter 2012-09-16 21:22:56 +02:00
  • 6ec02deec6 added new crawl attributes in crawl profile (not active yet) Michael Peter Christen 2012-09-14 16:49:29 +02:00
  • a13e5153ac - added the possibility to have not one but a list of crawl start urls - the list of urls is entered in the expert crawl start in a textfield; the one-line input field was replaced with a text box - start urls can also be given in one single line where the urls are separated by a '|'-character - as an effect, the crawl profile cannot carry a single start url for identification because it is possible to have more. Therefore the url was removed from the crawl profile - this affects all servlets which display a crawl profile: removed the url field from all these servlets - to work consistently with several start urls and the other crawl starts which computed crawl start url lists from sitelists or sitemaps, the crawl start servlet was restructured completely - new rules for must-match patterns were created to make it possible that site crawl starts also work with several crawl starts at once Michael Peter Christen 2012-09-14 12:25:46 +02:00
  • 975bc95ddf added default facet fields for json response format (stub) Michael Peter Christen 2012-09-14 12:09:20 +02:00
  • 2f218df55d added missing license headers Michael Peter Christen 2012-09-14 12:06:06 +02:00
  • a30653a864 added a regular expression test servlet which is linked within the parser/crawler error page whenever a problem with regular expression occurs. This makes it easy to correct and enhance the must-match and must-not-match patterns just by trying out which pattern could be correct. Michael Peter Christen 2012-09-14 12:04:54 +02:00
  • 0504b01bdc Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-09-14 00:48:17 +02:00
  • 9413f77b65 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2012-09-13 23:54:26 +02:00
  • a55e77a115 added twitter search heuristic orbiter 2012-09-13 23:53:53 +02:00
  • e54ac38095 - some corrections in usage of getFile() and getFileName() - added more attributes in json response writer according to yacy servlet Michael Peter Christen 2012-09-11 23:28:21 +02:00
  • 62add1d564 added the protocol and the file name extension to the solr fields since these fields are probably facets in file search Michael Peter Christen 2012-09-11 22:46:39 +02:00
  • e072632a54 no complaints about memory if the database is empty Michael Peter Christen 2012-09-11 22:28:10 +02:00
  • b846f585fa fixed a bug with size_i field usage Michael Peter Christen 2012-09-11 20:24:27 +02:00
  • 9db032664e activate two solr fields which will be used by administration interface (later) Michael Peter Christen 2012-09-11 20:15:54 +02:00
  • fcd5c7eec3 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git orbiter 2012-09-11 09:16:38 +02:00
  • 6171143b4a added facet stub in JsonResponseWriter orbiter 2012-09-11 09:15:47 +02:00
  • e6330f648a Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2012-09-11 03:02:47 +02:00
  • e84ffdb4f3 enhanced solr writers Michael Peter Christen 2012-09-11 03:02:02 +02:00
  • 9644c186a4 added search functionality to ViewFile.html servlet Michael Peter Christen 2012-09-11 02:03:14 +02:00
  • 03f3a8b647 *) fix for http://www.yacy-forum.org/viewtopic.php?f=2&t=759 Marc Nause 2012-09-10 20:22:26 +02:00
  • b69ed96f0b - added collections to yacydoc - changed yacydoc.htm to yacydoc.json - added query logging in solr and gsa search result Michael Peter Christen 2012-09-10 15:20:55 +02:00
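
The synonym file format described in commit 3190347814 (comma-separated synonyms per line, optional enclosing {}, comment lines starting with '#') can be illustrated with a minimal sketch. This is not the YaCy implementation: the function names parse_synonym_lines, build_lookup and synonyms_for_text are hypothetical, and the lowercasing, the whitespace tokenization and the skipping of single-word lines are assumptions made only for this example.

```python
def parse_synonym_lines(lines):
    """Turn synonym-file lines into a list of synonym groups.

    Format as described in the commit message:
    - each line is a comma-separated list of synonyms
    - the list may be enclosed in {}
    - lines starting with '#' are comments
    """
    groups = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("{") and line.endswith("}"):
            line = line[1:-1]  # drop the optional enclosing braces
        # Assumption: synonyms are compared case-insensitively.
        words = [w.strip().lower() for w in line.split(",") if w.strip()]
        if len(words) >= 2:  # assumption: a single word has no synonyms
            groups.append(words)
    return groups


def build_lookup(groups):
    """Map every word to the set of its synonyms (excluding itself)."""
    lookup = {}
    for group in groups:
        for word in group:
            lookup.setdefault(word, set()).update(
                w for w in group if w != word
            )
    return lookup


def synonyms_for_text(text, lookup):
    """Collect the synonyms of all words in a text, without duplicates.

    Assumption: plain whitespace tokenization stands in for the real
    document tokenizer.
    """
    found = []
    for token in text.lower().split():
        for synonym in sorted(lookup.get(token, ())):
            if synonym not in found:
                found.append(synonym)
    return found
```

Under these assumptions, the list returned by synonyms_for_text corresponds to the values that would be stored in a synonym field next to the document text, so that a query can match the field as an optional matcher.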