Commit Graph

8 Commits

SHA1 Message Date
b1ace63607 codespell: spelling corrections 2021-05-06 01:52:55 +10:00
a54471849b sitemap.xml support for harvesting loc urls.
parse xml docs as pure xml again but set nodeid
to TAG_LINK etc. so Linkdb.cpp can get links again.
added isparentsitemap url filter to prioritize urls
from sitemaps. added isrssext to url filters to
prioritize new possible rss feed urls. added numinlinks
to url filters to prioritize popular urls for spidering.
use those filters in default web filter set.
fix filters that delete urls from the index using
the 'DELETE' priority. they weren't getting deleted.
2015-03-17 14:26:16 -06:00
96b8197ad3 now it compiles with -m32 2014-11-10 14:45:11 -08:00
e7dd8f7956 replace long long with int64_t 2014-10-30 13:36:39 -06:00
a2beb23d87 added Xml::getCompoundName() 2014-09-28 08:39:46 -07:00
caee238c46 fixes to make easier to compile on mac os x. 2014-08-28 12:55:02 -07:00
a72c5dae51 fix <script> tags that immediately end in </script> or
never end but hit another <script> or a </gbiframe> tag.
2014-07-14 17:24:20 -07:00
f6e560c1f4 Initial file population. 2013-08-02 13:12:24 -07:00