gallery-dl

mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-23 19:22:32 +01:00

Author	SHA1	Message	Date
Vrihub	96fcff182c	generic extractor (#735) * Generic extractor, see issue #683 * Fix failed test_names test, no subcategory needed * Prefix directory_fmt with "generic" * Relax regex (would break some urls) * Flake8 compliance * pattern: don't require a scheme This fixes a bug when we force the generic extractor on urls without a scheme (that are allowed by all other extractors). * Fix using g: and r: on urls without http(s) scheme Almost all extractors accept urls without an initial http(s) scheme. Many extractors also allow for generic subdomains in their "pattern" variable; some of them implement this with the regex character class "[^.]+" (everything but a dot). This leads to a problem when the extractor is given a url starting with g: or r: (to force using the generic or recursive extractor) and without the http(s) scheme: e.g. with "r:foobar.tumblr.com" the "r:" is wrongly considered part of the subdomain. This commit fixes the bug, replacing the too generic "[^.]+" with the more specific "[\w-]+" (letters, digits and "-", the only characters allowed in domain names), which is already used by some extractors. * Relax imageurl_pattern_ext: allow relative urls * First round of small suggested changes * Support image urls starting with "//" * self.baseurl: remove trailing slash * Relax regexp (didn't catch some image urls) * Some fixes and cleanup * Fix domain pattern; option to enable extractor Fixed the domain section for "pattern", to pass "test_add" and "test_add_module" tests. Added the "enabled" configuration option (default False) to enable the generic extractor. Using "g(eneric):URL" forces using the extractor.	2021-12-29 22:39:29 +01:00
Mike Fährmann	bd08ee2859	remove most 'yield Message.Version' statements only leave them in oauth.py as noop results	2021-08-16 03:10:48 +02:00
Mike Fährmann	2919d78bfc	update extractor test results	2021-02-14 15:37:39 +01:00
Mike Fährmann	193dca2ce1	update extractor test results	2021-01-21 21:35:42 +01:00
Mike Fährmann	93ce7466e2	[2chan] skip external links	2020-11-24 16:41:47 +01:00
Mike Fährmann	b9bfa4c675	update extractor test results	2020-11-07 02:03:22 +01:00
Mike Fährmann	71acbdabf4	[2chan] fix metadata extraction	2019-12-03 17:01:11 +01:00
Mike Fährmann	2a3bd4e3c7	rename extractor classes starting with a digit	2019-11-02 20:42:09 +01:00
Mike Fährmann	2e516a1e3e	store the full original URL in Extractor.url	2019-02-12 18:46:48 +01:00
Mike Fährmann	4b1880fa5e	propagate 'match' to base extractor constructor	2019-02-11 13:31:10 +01:00
Mike Fährmann	6284731107	simplify extractor constants - single strings for URL patterns - tuples instead of lists for 'directory_fmt' and 'test' - single-tuple tests where applicable	2019-02-08 13:45:40 +01:00
Mike Fährmann	9e12e073ab	[2chan] fix extraction	2018-11-10 19:15:21 +01:00
Mike Fährmann	34873dbd90	set 'archive_fmt' values These are going to be used to create an unique id for each image.	2018-02-01 15:30:49 +01:00
Mike Fährmann	6f30cf4c64	change keyword names to valid Python identifiers This commit mostly replaces all minus-signs ('-') in keyword names with underscores ('_') to allow them to be used in filter-expressions. For example 'gallery-id' got renamed to 'gallery_id'. (It is theoretically possible to access any variable, regardless of its name, with 'locals()["NAME"]', but that seems a bit too convoluted if just 'NAME' could be enough)	2017-09-10 22:20:47 +02:00
Mike Fährmann	394241cd6f	[2chan] fix extraction	2017-07-20 15:01:47 +02:00
Mike Fährmann	30d3a5f9b2	support redirects on 4chan archives	2017-07-14 13:24:09 +02:00
Mike Fährmann	47692f28da	[2chan] add thread extractor	2017-07-14 08:44:31 +02:00

17 Commits
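
The message of commit 96fcff182c describes how a subdomain pattern written as "[^.]+" swallows the "r:" prefix of a scheme-less URL, and how "[\w-]+" avoids that. The sketch below reproduces the effect with two simplified patterns. The names LOOSE and STRICT and the exact regular expressions are assumptions made for illustration; they are not copied from the gallery-dl source.

```python
import re

# Hypothetical, simplified patterns for illustration only;
# the real extractor patterns in gallery-dl are more involved.
LOOSE = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")    # "everything but a dot"
STRICT = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")  # letters, digits, "_" and "-"

# A scheme-less URL carrying the "r:" prefix, as in the commit message.
url = "r:foobar.tumblr.com"

loose_match = LOOSE.match(url)
strict_match = STRICT.match(url)

# The loose class accepts ":" and so treats "r:foobar" as the subdomain.
print(loose_match.group(1) if loose_match else None)

# The strict class stops at ":", so the prefixed URL is not matched at all
# and the prefix can be handled separately.
print(strict_match.group(1) if strict_match else None)

# Without the prefix, both patterns agree on the subdomain.
plain = "foobar.tumblr.com"
print(LOOSE.match(plain).group(1), STRICT.match(plain).group(1))
```

Note that Python's "\w" also matches the underscore, so "[\w-]+" is slightly broader than "letters, digits and '-'"; that difference does not affect the prefix problem shown here.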