The changes in 9a255344 caused a warning about missing cookies to be
displayed even if those cookies were present, because _check_cookies()
did not account for an empty cookiedomain.
* Generic extractor, see issue #683
* Fix failed test_names test, no subcategory needed
* Prefix directory_fmt with "generic"
* Relax regex (would break some urls)
* Flake8 compliance
* pattern: don't require a scheme
This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).
* Fix using g: and r: on urls without http(s) scheme
Almost all extractors accept urls without an initial http(s) scheme.
Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).
This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.
This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.
* Relax imageurl_pattern_ext: allow relative urls
* First round of small suggested changes
* Support image urls starting with "//"
* self.baseurl: remove trailing slash
* Relax regexp (didn't catch some image urls)
* Some fixes and cleanup
* Fix domain pattern; option to enable extractor
Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
More precisely, it now splits the full 'filename' into 'name' and 'id'
instead of overwriting 'filename'. The format string stays the same as
before. Use '{name}.{extension}' to restore the old behavior.
before:
- filename: foobar
- id : 12345
now:
- filename: foobar-12345
- name : foobar
- id : 12345
Gelbooru only allows to retrieve the latest 20k posts for a tag search.
Add 'id:<N' to the search tags to work around that limitation, where N
is the ID of the last retrieved post.
http://gelbooru.me/index.php?page=forum&s=view&id=1467