Mike Fährmann
2818973981
[gelbooru_v02] unescape categorized tags
2024-10-10 17:30:55 +02:00
Mike Fährmann
51fd14f87d
[gelbooru_v02] use total number of posts as end marker ( #5830 )
...
… and potentially retry on empty responses
2024-07-12 22:51:46 +02:00
Mike Fährmann
807e2f7094
[realbooru] fix videos and provide fallback URLs ( #2530 )
...
revert acc94ac187
2024-05-31 23:58:40 +02:00
Mike Fährmann
acc94ac187
[realbooru] fix extraction
...
revert ac97aca99c
2024-01-20 17:56:07 +01:00
Mike Fährmann
93b4120e77
[gelbooru] support 'all' and empty tag ( #5076 )
2024-01-18 21:49:33 +01:00
Mike Fährmann
89066844f4
add 'config_instance' method
...
to allow for a more streamlined access to BaseExtractor instance options
2024-01-18 03:20:36 +01:00
Mike Fährmann
085411f3f1
[rule34] recognize URLs with 'www' subdomain ( #4984 )
2023-12-30 16:07:56 +01:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
6eca1fab9b
[gelbooru_v02] support 'xbooru.com' ( #4493 )
2023-09-03 15:39:02 +02:00
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This makes it possible, for example, to adjust config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
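A minimal sketch of the two-phase pattern this message describes, for illustration only. The class name, the 'config' dict, and the 'user-agent' key are hypothetical and are not taken from the project's code; only the idea of a cheap __init__() followed by a separate initialize() comes from the message above.

```python
class SketchExtractor:
    """Hypothetical extractor with construction and initialization split."""

    def __init__(self, url):
        # Cheap setup only: nothing here reads config or opens a session.
        self.url = url
        self.config = {"user-agent": "default"}
        self.session = None
        self.initialized = False

    def initialize(self):
        # The actual init step: config is read here, not in __init__(),
        # so callers can still adjust it after construction.
        self.session = {
            "headers": {"User-Agent": self.config["user-agent"]},
            "cookies": {},
        }
        self.initialized = True


extractor = SketchExtractor("https://example.org/post/1")
extractor.config["user-agent"] = "custom"  # adjusted before init happens
extractor.initialize()
```

Because the session is only built inside initialize(), the adjusted value is the one that ends up in the session headers.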
2023-07-25 22:16:16 +02:00
Mike Fährmann
ac97aca99c
[realbooru] fix extraction
...
get file URLs from HTML pages
2023-04-02 20:45:16 +02:00
Mike Fährmann
cd931e1139
update extractor test results
2022-12-08 18:58:29 +01:00
Mike Fährmann
6423f990de
[realbooru] fix 'tags' extraction ( #2530 )
2022-11-10 17:04:02 +01:00
Mike Fährmann
ecad02cf3f
[realbooru] fix download URLs ( #2530 )
2022-11-10 13:29:35 +01:00
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2022-11-05 01:14:09 +01:00
Mike Fährmann
4fd3c893fa
[booru] adjust/match '_tags' and '_notes' code
2022-11-04 19:49:39 +01:00
Mike Fährmann
88954aa2e4
[gelbooru_v02] implement 'notes' extraction
...
same code as for 'moebooru' works here as well
2022-11-04 19:49:39 +01:00
Mike Fährmann
775895f44b
[booru] refactor 'tags' and 'notes' extraction
...
- move HTML request for post pages into its own function
- move gelbooru_v02.py notes extraction to gelbooru.py
since it only works there
- clean up some code
2022-10-31 12:01:19 +01:00
Mike Fährmann
67a2efb885
[rule34] implement 'pool' pagination ( #2853 )
2022-08-26 17:57:17 +02:00
Mike Fährmann
f225247670
[gelbooru] add support for 'api_key' and 'user_id' ( #2767 )
2022-07-18 18:46:31 +02:00
Mike Fährmann
c6a9bab019
update extractor test results
2022-07-12 15:49:22 +02:00
Mike Fährmann
ff5e10a86d
[hypnohub] move to gelbooru_v02 instances ( #2631 )
2022-05-28 21:10:05 +02:00
Mike Fährmann
d26da3b9e5
add pre-generated 'pattern' for supported BaseExtractor sites
2022-05-09 22:20:09 +02:00
Mike Fährmann
3e926bd465
[realbooru] fix extraction ( fixes #2530 )
2022-05-02 09:03:34 +02:00
Mike Fährmann
dee0d22561
update extractor test results
2022-02-06 21:39:24 +01:00
Mike Fährmann
199e7616a7
[rule34] use https://api.rule34.xxx for API requests
2022-01-08 17:14:50 +01:00
Mike Fährmann
93cef78450
[gelbooru] work around pagination limits
...
Gelbooru only allows retrieving the latest 20k posts for a tag search.
Add 'id:<N' to the search tags to work around that limitation, where N
is the ID of the last retrieved post.
http://gelbooru.me/index.php?page=forum&s=view&id=1467
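The 'id:<N' workaround can be sketched roughly as follows. This is a simplified illustration, not the extractor's actual code: 'fetch_page' is a hypothetical callable that returns one page of posts (dicts with an 'id' key) sorted newest first, and the sketch restarts the search after every page rather than only once the limit is reached.

```python
from typing import Callable, Iterator


def paginate_by_id(fetch_page: Callable[[str], list], tags: str) -> Iterator[dict]:
    """Yield all posts for 'tags' by repeatedly searching below the last seen ID."""
    query = tags
    while True:
        posts = fetch_page(query)  # hypothetical: one page, newest first
        if not posts:
            return
        yield from posts
        # N is the ID of the last retrieved post; the next search only
        # returns posts older than that one.
        last_id = int(posts[-1]["id"])
        query = f"{tags} id:<{last_id}".strip()
```

Each request is a fresh search that starts at its own first page, so no single search has to page past the server-side limit.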
2021-11-26 18:56:31 +01:00
Mike Fährmann
7bbb1f92d7
[gelbooru_v02] add 'favorite' extractor ( closes #1834 )
2021-09-10 20:43:59 +02:00
thatfuckingbird
dff03a6605
[booru] add an option to extract notes (only gelbooru for now) ( #1457 )
...
* [booru] add an option to extract notes (currently implemented only for gelbooru)
* appease linter
* [gelbooru] rename "text" to "body" in note extraction
* add a code comment about reusing return value of _extended_tags
2021-04-13 23:40:24 +02:00
thatfuckingbird
918b0441fb
[gelbooru] fix tag category extraction ( #1455 )
2021-04-10 19:05:00 +02:00
Mike Fährmann
3df527ee2c
update extractor test results
2021-02-27 21:01:29 +01:00
Mike Fährmann
59fd740b47
[tbib] add support for https://tbib.org/ ( #473 , closes #1082 )
2021-02-17 00:28:25 +01:00
Mike Fährmann
08d7934c6e
move extractors from booru.py into their own gelbooru_v02 module
2021-02-17 00:26:24 +01:00