Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
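The split described in the commit above can be pictured with a minimal sketch. Everything here is a hypothetical stand-in (the 'Extractor' and 'Job' classes, the 'config' dict, the 'retries' option), not the project's actual code; it only illustrates a constructor that stays cheap and an 'initialize()' step that is called later.

```python
class Extractor:
    """Hypothetical stand-in, not the project's real Extractor class."""

    def __init__(self, url):
        # The constructor only records cheap state; no config is read here.
        self.url = url
        self.initialized = False
        self.options = {}

    def initialize(self, config):
        # The actual setup happens in a separate, explicitly called step.
        # (Only config options are modeled; session and cookies are omitted.)
        self.options = dict(config)
        self.initialized = True


class Job:
    """Hypothetical job that adjusts config before initialization."""

    def __init__(self, url, config):
        self.extractor = Extractor(url)
        # Config can still be changed here, because nothing has read it yet.
        adjusted = dict(config, retries=config.get("retries", 4))
        self.extractor.initialize(adjusted)


job = Job("https://example.org/post/1", {"timeout": 30})
print(job.extractor.options)
```

With a constructor that did all of the setup itself, the adjustment inside 'Job' would come too late to have any effect.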
Mike Fährmann
d97b8c2fba
consistent cookie-related names
...
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
f0cb951566
[paheal] unescape 'source'
2023-07-07 20:03:00 +02:00
Mike Fährmann
b480b7076a
[paheal] fix a78f8ce5 for enabled 'metadata' (#4262)
2023-07-07 20:00:49 +02:00
Mike Fährmann
a78f8ce5b0
[paheal] fix extraction (#4262)
...
swap ' and "
2023-07-04 17:36:41 +02:00
Mike Fährmann
7865067d19
[shimmie2] add generic extractors for Shimmie2 sites (#3734)
...
add support for
- loudbooru.com (#3734)
- booru.cavemanon.xyz (#3734)
- giantessbooru.com (#943)
- tentaclerape.net
2023-04-26 19:20:44 +02:00
Mike Fährmann
2ed58029f9
[paheal] add proper support for videos (#2892)
2022-09-04 13:30:48 +02:00
Mike Fährmann
4b78bd423f
[paheal] add 'metadata' option (#2641)
2022-06-04 16:05:49 +02:00
Mike Fährmann
61fa9b535a
[paheal] improve metadata extraction (#2641)
...
- unescape 'tags'
- add 'date', 'source', and 'uploader' for single posts
2022-05-30 17:23:08 +02:00
Mike Fährmann
211de95dd0
update extractor test results
2021-11-01 02:58:53 +01:00
Mike Fährmann
4b1cda4cf7
[paheal] fix metadata extraction
2021-02-14 15:43:39 +01:00
Mike Fährmann
43120407cc
[paheal] create directory for each post (closes #1147)
2020-12-01 12:14:55 +01:00
Mike Fährmann
1e3dd7330e
merge SharedConfigMixin functionality into Extractor
2020-11-17 00:34:07 +01:00
Mike Fährmann
558cde139c
[paheal] fix extraction (fixes #1088)
2020-10-28 21:51:31 +01:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
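A small sketch of what the pattern change in the commit above means in practice. It assumes the quoted fragments are used inside negated regex character classes such as '[^/?&#]+'; the example URL and both patterns are hypothetical and not taken from the repository.

```python
import re

# Hypothetical URL whose path segment contains an ampersand.
url = "https://example.org/post/list/black&white/1"

# Old style: '&' is treated as a delimiter, so the match stops early.
old_pattern = re.compile(r"/post/list/([^/?&#]+)")
# New style: only '/', '?', and '#' end the path segment.
new_pattern = re.compile(r"/post/list/([^/?#]+)")

old_match = old_pattern.search(url).group(1)
new_match = new_pattern.search(url).group(1)

print(old_match)
print(new_match)
```

The old pattern yields 'black', the new one 'black&white', which matches the RFC's view that '&' does not delimit URL components.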
Mike Fährmann
844793847c
update extractor test results
2020-10-11 18:15:41 +02:00
Mike Fährmann
19bf76bcf8
update extractor test results
2020-08-03 21:57:00 +02:00
Mike Fährmann
1d4a369ea2
update extractor test results
2020-02-27 22:15:40 +01:00
Mike Fährmann
e6cd49e78b
update extractor test results
2020-02-16 21:48:46 +01:00
Mike Fährmann
2852691d78
[paheal] replace test URL
...
searching for 'k-on' doesn't yield any results anymore
2020-01-27 22:19:41 +01:00
Mike Fährmann
62335b9015
[paheal] adjust test results
2019-06-05 11:42:01 +02:00
Mike Fährmann
6a34f4b0c1
skip tests on read timeouts; print list of skipped tests
2019-06-01 20:47:31 +02:00
Mike Fährmann
d6ddb74cde
update test results
...
- deviantart: 'index' is now an integer
- flickr: image file with lower quality
- paheal: image server name changed
- rule34: post got deleted
2019-04-12 09:59:48 +02:00
Mike Fährmann
f8782c05f2
[paheal] rename "tags" to "search_tags"
...
to better match field names of other booru extractors
2019-02-17 18:18:09 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
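The new result shape from the commit above can be illustrated with a rough sketch. 'nameext_sketch' is a hypothetical helper written for this illustration; it is not the body of text.nameext_from_url(), and its handling of names without a dot is an assumption.

```python
from urllib.parse import unquote, urlsplit


def nameext_sketch(url):
    """Hypothetical illustration of the 'now' result shape."""
    # Take the last path segment and decode percent-escapes.
    basename = unquote(urlsplit(url).path.rpartition("/")[2])
    # Split at the last dot into name and extension.
    name, _, ext = basename.rpartition(".")
    if not name:
        # No dot, or only a leading dot: treat everything as the filename.
        return {"filename": basename, "extension": ""}
    return {"filename": name, "extension": ext}


result = nameext_sketch("https://example.org/path/filename.ext")
print(result)
```

For the example URL from the commit message this gives 'filename' for the filename and 'ext' for the extension, with no separate 'name' field.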
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
4d656a81ca
replace SharedConfigExtractor class with a Mixin
2019-02-04 13:46:02 +01:00
Mike Fährmann
4d73cc785d
update test results
2018-12-14 16:07:32 +01:00
Mike Fährmann
c9f70e0a19
[paheal] use HTTPS
2018-07-17 21:25:03 +02:00
Mike Fährmann
7a58151566
fix util.parse_bytes invocations
...
(should be text.parse_bytes)
2018-05-10 22:07:55 +02:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
2018-04-20 14:53:21 +02:00
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create a unique ID for each image.
2018-02-01 15:30:49 +01:00
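One way to picture how a format string can produce a unique ID per image, as the commit above describes. The format string, the metadata fields, and the in-memory 'seen' set are all assumptions made for this sketch; the commit does not show how the values are actually consumed.

```python
# Hypothetical format string; real 'archive_fmt' values may differ.
archive_fmt = "{category}_{id}"


def archive_key(fmt, metadata):
    # Fill the format string with fields from an image's metadata.
    return fmt.format_map(metadata)


seen = set()
images = [
    {"category": "paheal", "id": 101, "extension": "jpg"},
    {"category": "paheal", "id": 102, "extension": "png"},
    {"category": "paheal", "id": 101, "extension": "jpg"},  # duplicate
]

new_keys = []
for image in images:
    key = archive_key(archive_fmt, image)
    if key not in seen:
        # First time this ID appears: remember it.
        seen.add(key)
        new_keys.append(key)

print(new_keys)
```

The duplicate third entry maps to the same key as the first and is skipped.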
Mike Fährmann
40d35c87bc
[paheal] add tag- and post-extractors (closes #69)
2018-01-15 16:39:05 +01:00