Commit Graph

  • 54bea96e67 Merge branch 'master' of git@gitorious.org:yacy/rc1.git orbiter 2014-07-02 23:23:34 +02:00
  • 15b2fad6a2 reverted latest change for reindexing because that actually works only for internal Solr indexes. This is mainly caused by the fact that an external Solr may also be a SolrCloud, which does not support LukeRequests, which are needed to request the old Schema. Michael Peter Christen 2014-07-02 14:56:34 +02:00
  • 841cc77391 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-07-02 14:35:02 +02:00
  • e09218129c remove check for local solr. This check was made during a time when Solr was optional and another alternative metadata store was available. Since that store is now removed, Solr is always available (internally or externally) Michael Peter Christen 2014-07-02 14:34:48 +02:00
  • 2073e69034 fix for long periods in timeline orbiter 2014-07-02 11:29:50 +02:00
  • 1f94df29e7 fix NPE in solr rss where snippet contains only the title text and adjusted xslt, for solr snippets (&hl=true) to decode the xml encoded html <b> tag by adding disable-output-escaping (still open item description may be double as dc: tag and rss.description tag) reger 2014-07-01 23:24:26 +02:00
  • 09dcdb9b19 update to solr 4.9.0 Michael Peter Christen 2014-07-01 16:39:00 +02:00
  • 282b53db42 update of commons-io and slf4j-api (as preparation for Solr 4.9.0) Michael Peter Christen 2014-07-01 16:18:12 +02:00
  • 1cd4b2e8be Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-07-01 16:06:12 +02:00
  • 8c52f0651b refactoring of AccessTracker events & timeline fix Michael Peter Christen 2014-07-01 16:06:01 +02:00
  • 431a5f9c4e added test case for TextSnippet, removed obsolete/unused parameter and reference to MediaSnippet reger 2014-06-30 05:36:48 +02:00
  • 5b94a257ce no timeout for large reference collections Michael Peter Christen 2014-06-29 22:26:22 +02:00
  • f5b817bac4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-06-29 22:25:08 +02:00
  • cb2c17d236 extract author and keywords in .doc and .ppt parser reger 2014-06-29 02:54:09 +02:00
  • a5707cd2eb enable proper Author navigator - author facet is based on omitted author_sxt field - adjust to make author nav available on exist of author field but keep using author_sxt to construct the facet (why!?) - add check for querymodifier author in searchevent reger 2014-06-27 23:05:06 +02:00
  • 1b279d7a7e fixed external link Michael Peter Christen 2014-06-27 15:12:53 +02:00
  • 74206a10c7 refactoring Michael Peter Christen 2014-06-27 14:40:36 +02:00
  • fec673c9d1 Merge branch 'master' of git@gitorious.org:yacy/rc1.git orbiter 2014-06-27 10:15:37 +02:00
  • 4a66af716d added apkParser stub (work in progress) orbiter 2014-06-27 10:15:01 +02:00
  • c59da9fe7a added access tracker log reader stub orbiter 2014-06-27 10:14:36 +02:00
  • 2d67f29244 adjust mergeDocument after parsing to - preserve charset and languages - fix merge of author reger 2014-06-26 22:16:15 +02:00
  • 0d29b972cc Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-06-26 13:02:56 +02:00
  • 36e623d8bf enhanced metadata enrichment for media file type search: - Web servers may now deliver YaCy-specific http header fields with a title and keywords. The new http header fields are: X-YaCy-Media-Title - to be used for media (image, audio, video) titles X-YaCy-Media-Keywords - to be used for media (image, audio, video) keywords - both fields are written to document fields title and keywords and are searched also during image search. - to make the usage of arbitrary http header fields (including these new fields) possible in the /api/push_p.json servlet, a new POST argument is also introduced to push http header fields. The new POST attribute is named "responseHeader-X" (where X is the counter). It is allowed to use this attribute as multi-attribute several times, each can be filled with a http header line. - see /api/push_p.html for examples Michael Peter Christen 2014-06-26 13:02:35 +02:00
  • 49886fab08 enhanced debugging Michael Peter Christen 2014-06-26 12:57:01 +02:00
  • b893c42a0f bugfix for image search Michael Peter Christen 2014-06-26 12:56:33 +02:00
  • c7995d3e2a increased fixed limit for http POST request sizes to 100MB Michael Peter Christen 2014-06-26 11:58:07 +02:00
  • 7847a93558 fix AbstractParser.singleList not adding null strings - prevents null titles in oo... parser (as detected by ParserTest) - correct ParserTest dc_description check (dc_description allowed to return 0 length array) reger 2014-06-26 02:56:45 +02:00
  • 8acae852a0 write <em>-tagged texts also into the bold_txt field Michael Peter Christen 2014-06-25 11:51:11 +02:00
  • a88ea14e09 harmonize use of style for "delete" button - apply the mostly used btn-danger class reger 2014-06-22 23:33:59 +02:00
  • 66c784c552 bump to httpclient-4.3.4 sixcooler 2014-06-22 16:24:45 +02:00
  • b9f6acee23 update to Jetty 9.2.1 reger 2014-06-22 00:21:47 +02:00
  • 90c4576361 add a link to recrawl index entry to metadata html page - to allow manually renew index content for this url (e.g. in case it is a remote search result with metadata only) - use simply a QuickCrawlLink_p javascript snippet (minimalistic 1st solution) reger 2014-06-21 04:21:29 +02:00
  • 8fd72b5e8b Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-06-20 13:57:06 +02:00
  • 81d0f01a6f added 'synchronous' and 'commit' flags in push api Michael Peter Christen 2014-06-20 13:56:55 +02:00
  • 2626c8f6db using concurrency to do base64 encoding in file POST commands Michael Peter Christen 2014-06-20 13:55:15 +02:00
  • e132689818 fixed and enhanced Base64 (en)coder (again) Michael Peter Christen 2014-06-20 13:54:18 +02:00
  • 2415e3db43 enhanced ASCII byte[] -> String conversion Michael Peter Christen 2014-06-20 13:53:22 +02:00
  • 5043eff33a move page navigation below results (image search) force page navigation to be displayed below results in image search for any number of displayed images instead of to the right of the last image. reger 2014-06-20 01:02:43 +02:00
  • 4751ed974f enhanced base64 encoding Michael Peter Christen 2014-06-19 12:11:02 +02:00
  • e949071160 removed superfluous date method Michael Peter Christen 2014-06-19 12:10:42 +02:00
  • 501d55cd35 removed superfluous assert Michael Peter Christen 2014-06-19 12:10:12 +02:00
  • f443cfa32d Improvements and bugfixes for recording actions of blacklist API. Marc Nause 2014-06-17 22:54:47 +02:00
  • 0ba6b98d5b fix for broken json Michael Peter Christen 2014-06-17 11:36:20 +02:00
  • 4177c9cf05 fix for crawl start check orbiter 2014-06-15 22:50:04 +02:00
  • 515e63c274 ignore the api javadoc directory in git commits orbiter 2014-06-15 12:41:14 +02:00
  • 0bbb5040b8 Merge branch 'master' of git@gitorious.org:yacy/rc1.git orbiter 2014-06-15 12:38:52 +02:00
  • 9d5d86cd03 Added filter query options to the ranking servlet /RankingSolr_p.html. Filter queries are not actually related to ranking, but user requests have pointed out that specific boost queries to move results to the end of the result list are not sufficient. Such boost filters may be better executed as actual filter and therefore such a filter can now be statically applied to every search request. A typical use could be the expression "http_unique_b:true AND www_unique_b:true" which uses the recently introduced fields http_unique_b and www_unique_b which are true only for one of the alternatives with/without http(s) and with/without prefix 'www.' in host names. orbiter 2014-06-15 12:38:30 +02:00
  • d2151857f1 Added collection navigation: The collection field (can be filled e.g. in Crawl Start) can be used to add categories to YaCy index entries. The usage of that field was restricted to solr searches and post argument filters as implemented in commit f7571386a3. This commit extends collections to a full navigation option in the standard YaCy search interface. The field is not active by default but can be activated easily in the /ConfigSearchPage_p.html servlet (just check the 'Collection' facet field). Collections can now be used for (at least) two purposes: - to provide search tenants (through post argument collection) - to provide self-made category navigation Search requests may now have (independently from switched on or off collection facet) a "collection:<collection-name>" modifier attached; furthermore collection names may use disjunctions using the '|' pipe symbol. For example, this is a valid search request: www collection:user|proxy Michael Peter Christen 2014-06-15 12:11:23 +02:00
  • 74c249288a added a push api to make it possible to upload files directly without crawling to the YaCy indexer. Files are uploaded using POST multipart requests; multiple file uploads are possible as well. Each file has attached the file date and mime type which is used to get the right parser for the submitted data. Also an url is submitted which is assigned to the document. The CrawlSwitchboard has a new option for default Crawl Profiles which are assigned dynamically from the new push interface. Michael Peter Christen 2014-06-12 18:10:07 +02:00
  • f13c8aa7dd re-implementation of file push option in the context of POST http requests. The internal representation of post-arguments is String and therefore not appropriate for byte[] objects as submitted by file pushes. Therefore all pushed files are encoded to base64 _after_ uploading with an http form (you do not need to do that encoding yourself) to hand over the byte[] as string in the post argument. Servlets which read such files must decode the base64 data to get the original byte[] array. This is considered a temporary solution for file uploads and a proper implementation would need to consider all attributes as handed over as Objects with either String or byte[] Object instances. This would be a major code change and is not done at this time. The feature was submitted to realize a feature as pushed with the next commit. Michael Peter Christen 2014-06-12 18:06:22 +02:00
  • ba6ffddefc refactoring Michael Peter Christen 2014-06-12 05:23:26 +02:00
  • 982601017e crawling of filenames with + fails due to url decoding modified UTF8.decodeURL to apply x-www-form-urlencoded ( space -> + ) to the query part of the url only. reger 2014-06-11 04:13:55 +02:00
  • 3b559e7846 optimize pdfParser skip starting reader thread if all content already read reger 2014-06-10 04:25:20 +02:00
  • 09f73b790f fix pdfParser not closed warning from pdfbox for encrypted pdf on exit due to missing permission to extract reger 2014-06-08 08:20:30 +02:00
  • c798a9d1bb fix unresolved pattern in yacysearch.rss title and rss xml error due to html & encoding in url entries reger 2014-06-07 03:01:26 +02:00
  • 92d1604a31 Crawler hostbalancer does not delete finished queue files, use alternative delete to fight the symptom (and fix deletion of host dirs on startup) Root cause (which class holds a lock on .stack) not found. http://mantis.tokeek.de/view.php?id=404 reger 2014-06-05 02:13:08 +02:00
  • e64be5dcad in case that the network is switched to any other than freeworld, RWIs are disabled. This is a temporary fix. There must be a better way to determine if RWIs are to be switched on or off. Michael Peter Christen 2014-06-04 13:59:37 +02:00
  • 0c324d735c NPE fix for postprocessing without term index Michael Peter Christen 2014-06-04 12:28:28 +02:00
  • 87f171675b doing index deletions using a get string which makes it easier to copy-paste deletion examples (see: #EuGH :( ) Michael Peter Christen 2014-06-04 12:09:49 +02:00
  • a2f800cd8f fix for bad String conversion Michael Peter Christen 2014-06-04 12:07:07 +02:00
  • 922979aae1 added option to prefer http over https in unique-protocol ranking Michael Peter Christen 2014-06-02 17:40:56 +02:00
  • b3b174e2b8 fixed webgraph postprocessing and status display in Crawler_p servlet Michael Peter Christen 2014-06-02 15:06:38 +02:00
  • 3b53bee90f Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-06-02 13:12:08 +02:00
  • e6b28f5958 removed check on protocol for double content (user request) Michael Peter Christen 2014-06-02 13:11:44 +02:00
  • 7a52a6ba3f add links to port config in status panel - pom upd to match javadoc location reger 2014-06-02 02:11:54 +02:00
  • b803622ac3 changed javadoc publishing path from 'api' to 'javadoc' because there are also other APIs in YaCy. Michael Peter Christen 2014-06-01 22:25:55 +02:00
  • d8d318233e fix logging settings - add missing .level - remove obsolete jena settings - set default level=INFO to prevent debug logging of not explicitly specified classes reger 2014-06-01 06:43:50 +02:00
  • c3e40c82fe make https port setting changeable via front end somewhere (chosen Http Networking page /Settings_p.html?page=http ) reger 2014-06-01 03:15:38 +02:00
  • 698f053658 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-06-01 01:02:12 +02:00
  • f23c4142e0 added option to configure a custom user agent within allip networks Michael Peter Christen 2014-06-01 01:02:03 +02:00
  • 8e233e2eb4 - fix typo in Message_p (defaultpath) - use more existing switchboardconstants for getproperties - replace deprecated call defaultservlet reger 2014-06-01 00:20:25 +02:00
  • d7d38f9135 made number of open files in crawler configurable and increased default maximum number of open files from 100 to 1000. This number can be changed with the attribute crawler.onDemandLimit orbiter 2014-05-31 09:29:55 +02:00
  • 20cffa34bf Merge branch 'master' of gitorious.org:yacy/icewindxs-rc1 Michael Peter Christen 2014-05-30 16:13:09 +02:00
  • 873f8c2d2c Update russian translation malykhin.dmitry 2014-05-30 07:12:56 +04:00
  • c43acb0e80 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-05-29 13:24:44 +02:00
  • 8ad41a882c fixed several problems with postprocessing: - unique-postprocessing was destroying results from other postprocessings; removed cross-updates as they had been not necessary - unique-postprocessing did not restrict on same protocol - inefficient concurrent update cache was redesigned completely - increased limits for concurrent blocking queues to prevent early time-out Michael Peter Christen 2014-05-29 13:24:24 +02:00
  • 370f1c408e Changed Windows Firewall Rules to just honor the default Port 8090, but not use any program path. This should match more installations in different paths and also running YaCy as service (prunsrv). sixcooler 2014-05-29 00:01:48 +02:00
  • 640b684bb6 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-05-28 19:19:17 +02:00
  • 2f5477ea59 a try to fix the mixed up terms 'Active' -> 'Senior' and 'Passive' -> 'Junior' Michael Peter Christen 2014-05-28 18:48:54 +02:00
  • ca5437dd50 fix crawl of file:// , also http://mantis.tokeek.de/view.php?id=149 local files can be crawled (intranet mode) url parsing fixed according to RFC 1738 (for unix and windows) for win like file:///c:/tmp or file://localhost/c:/tmp for linux like file:///tmp or file://localhost/tmp Host is ignored and path must be absolute reger 2014-05-28 03:01:34 +02:00
  • 9b4282344b changed debian dependency to openjdk-7-jre-headless Michael Peter Christen 2014-05-27 18:57:05 +02:00
  • ff5b3ac84d added new fields http_unique_b and www_unique_b which can be used for ranking to prefer urls containing a www subdomain or using the https protocol Michael Peter Christen 2014-05-27 15:28:28 +02:00
  • 66f6797f52 make config search page layout closer to actual page appearance reger 2014-05-25 01:06:39 +02:00
  • 9ecf28b708 - upd pom to Solr 4.8.1 and latest jar updates - upd nsis java autodownload package to jre 7u55 reger 2014-05-24 01:01:27 +02:00
  • 06f2eeda22 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git Michael Peter Christen 2014-05-23 00:50:10 +02:00
  • 5d5896b3f6 fixed dependency in debian package on java 7 Michael Peter Christen 2014-05-23 00:49:50 +02:00
  • 5b1c4ef191 Monitoring and limit connection-count for Jetty sixcooler 2014-05-22 22:16:39 +02:00
  • 046e41e376 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git sixcooler 2014-05-22 21:59:57 +02:00
  • ee7416816b upgraded poi library (office document format parser) from 3.9 to 3.10 orbiter 2014-05-22 15:53:07 +02:00
  • ce1dbfeb0f fix appearance of image search thumbnails. orbiter 2014-05-22 15:01:58 +02:00
  • 6daae59479 switch on core.service.rwi when switching back from portal mode to p2p mode orbiter 2014-05-22 12:55:22 +02:00
  • a12701ddf6 upgraded Bouncy Castle libraries (needed for encrypted pdfs, dependency in pdfbox) to 1.46 removed the activation.jar library; I don't know which other library depends on it. orbiter 2014-05-22 12:09:21 +02:00
  • f0db501630 better handling of ranking parameters and new default values for date navigation which is done using ranking in solr. Michael Peter Christen 2014-05-22 03:01:07 +02:00
  • 53948da7d0 tried to make last_modified recognition smarter Michael Peter Christen 2014-05-22 00:28:51 +02:00
  • 2d03037965 'Last-Modified', not 'Last-modified' according to http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html Michael Peter Christen 2014-05-21 23:21:31 +02:00
  • 2520590b45 migrated from pdfbox 1.8.4 to 1.8.5. They have a very long bugfix list for that update: http://www.apache.org/dist/pdfbox/1.8.5/RELEASE-NOTES.txt Michael Peter Christen 2014-05-21 22:48:41 +02:00
  • 2d508618a4 Merge branch 'master' of ssh://git@gitorious.org/yacy/rc1.git sixcooler 2014-05-21 20:57:47 +02:00
  • 3dc5fb0050 fix for operator precedence bug (cast binds stronger than bitwise AND) in peer hash hashing. This should not change anything if java casts long to int by masking with 0xFFFFFFFFL but you never know. The important thing is, that the hashCode() should not return numbers that have the same order as the hash code order because hashing of seeds is used to remove the order in some places. Michael Peter Christen 2014-05-21 18:37:52 +02:00
  • 6634b5b737 debug code for index distribution testing Michael Peter Christen 2014-05-21 18:20:16 +02:00
  • 89e13fa34e fixed bug in test function Michael Peter Christen 2014-05-21 15:31:47 +02:00
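
Commit 982601017e above restricts x-www-form-urlencoded decoding (where '+' stands for a space) to the query part of a URL, so that file names containing a literal '+' survive. The following is a minimal Python sketch of that idea, not the actual UTF8.decodeURL code from YaCy; the function name decode_url_parts and the example URL are assumptions made for illustration.

```python
from typing import Tuple
from urllib.parse import urlsplit, unquote, unquote_plus


def decode_url_parts(url: str) -> Tuple[str, str]:
    """Return (decoded_path, decoded_query) for a URL (hypothetical helper).

    Only percent-escapes are decoded in the path, so a literal '+' in a
    file name is kept. In the query, '+' is additionally treated as a
    space, following x-www-form-urlencoded rules.
    """
    parts = urlsplit(url)
    path = unquote(parts.path)          # '%20' -> ' ', '+' stays '+'
    query = unquote_plus(parts.query)   # '+' -> ' ', '%2B' -> '+'
    return path, query


if __name__ == "__main__":
    # Example URL is made up for illustration.
    print(decode_url_parts("http://example.org/files/a+b%20c.txt?q=x+y%2Bz"))
```

Decoding the whole URL with form rules would turn the file name "a+b c.txt" into "a b c.txt", which is the kind of failure the commit message describes.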