Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This allows, for example, a Job to adjust config access before it
takes place; previously most of it had already happened when calling
'extractor.find()'.
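A minimal sketch of the two-phase setup described above. The class body, the 'run_job' helper, and the dict-based config are hypothetical illustrations and not the project's actual code; only the names '__init__()' and 'initialize()' come from the commit message.

```python
class Extractor:
    """Hypothetical extractor with two-phase setup."""

    def __init__(self, url):
        # The constructor only records cheap state; no config access yet.
        self.url = url
        self.initialized = False
        self.session = None
        self.options = {}

    def initialize(self, config):
        # Deferred setup: session, cookies, and config options.
        if self.initialized:
            return
        self.options = dict(config)
        self.session = {"cookies": dict(config.get("cookies", {}))}
        self.initialized = True


def run_job(url, config, overrides):
    """Hypothetical job: adjust config first, then initialize."""
    extractor = Extractor(url)            # nothing is read from config here
    effective = {**config, **overrides}   # the job adjusts config access
    extractor.initialize(effective)       # the actual init happens now
    return extractor
```

Because the constructor no longer touches the config, whatever the job changes between construction and 'initialize()' is what the extractor ends up seeing.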
2023-07-25 22:16:16 +02:00
Mike Fährmann
cc15fbe71a
[moebooru] add generalized extractors for moebooru sites
...
- add support for sakugabooru.com (closes #1136)
- add support for lolibooru.moe (closes #1050)
This allows users to dynamically add support for moebooru/myimouto
based sites by adding an entry to their config file
(like for foolslide, foolfuuka, etc)
For example:
    {
        "extractor": {
            "moebooru": {
                "new-site-1": {"root": "https://site1.net"},
                "new-site-2": {"root": "https://www.site2.moe"}
            }
        }
    }
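As a rough illustration of how such config entries could be turned into site support at runtime, the sketch below builds a host lookup table from the "moebooru" section and matches URLs against it. The helper names 'build_site_table' and 'match_site' are assumptions made for this example; the project's real matching logic may differ.

```python
import json
import re

# Same structure as the example config in the commit message.
CONFIG = json.loads("""
{
    "extractor": {
        "moebooru": {
            "new-site-1": {"root": "https://site1.net"},
            "new-site-2": {"root": "https://www.site2.moe"}
        }
    }
}
""")

# Strips the scheme and an optional leading "www." from a URL.
_PREFIX = re.compile(r"^https?://(www\.)?")


def build_site_table(config):
    """Map each configured host to its (name, root) pair."""
    sites = config.get("extractor", {}).get("moebooru", {})
    table = {}
    for name, info in sites.items():
        root = info["root"].strip().rstrip("/")
        table[_PREFIX.sub("", root)] = (name, root)
    return table


def match_site(url, table):
    """Return (name, root) for a URL on a configured site, else None."""
    host = _PREFIX.sub("", url).split("/", 1)[0]
    return table.get(host)
```

Looking up by host keeps the example small; adding a site then only requires a new config entry, with no code change.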
2020-12-01 22:27:18 +01:00
Mike Fährmann
1d4a369ea2
update extractor test results
2020-02-27 22:15:40 +01:00
Mike Fährmann
978cb03f81
update misc test results
...
- Livedoor now uses https:// for its image URLs
- Instagram image URLs got simplified
2019-11-20 21:45:48 +01:00
Mike Fährmann
2a3bd4e3c7
rename extractor classes starting with a digit
2019-11-02 20:42:09 +01:00
Mike Fährmann
11ea689013
[simplyhentai] fix image and video URLs
2019-09-16 21:37:16 +02:00
Mike Fährmann
f2cf1c1d73
use 'text.extract_from()' in a few places
2019-04-21 15:19:20 +02:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
4a57509392
generalize tag-splitting option (#92)
...
- extend functionality to other booru sites:
  - http://behoimi.org/
  - https://konachan.com/
  - https://e621.net/
  - https://rule34.xxx/
  - https://safebooru.org/
  - https://yande.re/
2018-07-04 12:21:16 +02:00
Mike Fährmann
974e73bdbb
[booru] smaller code adjustments
2018-01-06 17:48:49 +01:00
Mike Fährmann
9e8a84ab6c
[booru] rewrite using Mixin classes (#59)
...
- improved code structure
- improved URL patterns
- better pagination to work around page limits on
  - Danbooru
  - e621
  - 3dbooru
2018-01-04 00:01:39 +01:00
Mike Fährmann
e6814aebe2
add 'extractor.*.user-agent' config option
2017-11-15 14:01:33 +01:00
Mike Fährmann
158e60ee89
[3dbooru] enable download continuation
...
behoimi.org doesn't respect 'Range' headers and doesn't report
'Content-Length' for compressed content encodings.
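One way a downloader might cope with a server that ignores 'Range' is to decide how to resume from the response status alone. The sketch below is an assumption-laden illustration, not the project's implementation: 'plan_resume' and its return values are hypothetical, and only the HTTP status semantics (206, 200, 416) are standard.

```python
def plan_resume(status_code, offset):
    """Decide how to continue a partial download.

    offset is the number of bytes already on disk.
    Returns (mode, skip): mode is "write", "append" or "complete";
    skip is how many leading body bytes to discard before writing.
    """
    if offset == 0:
        # Nothing on disk yet: a plain full download.
        return ("write", 0)
    if status_code == 206:
        # Partial Content: the server honored the Range request.
        return ("append", 0)
    if status_code == 200:
        # Range was ignored and the whole file is sent again:
        # discard the bytes we already have, then append the rest.
        return ("append", offset)
    if status_code == 416:
        # Range Not Satisfiable: treated here as "file already complete".
        return ("complete", 0)
    raise ValueError("unexpected status: %d" % status_code)
```

The skip-and-append branch does not depend on 'Content-Length', which matters when the server omits that header for compressed responses.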
2017-10-24 13:05:31 +02:00
Mike Fährmann
81a7788b40
replace space characters in unit test URLs
2017-10-23 17:00:53 +02:00
Mike Fährmann
41adb99e9c
[pawoo] fix extraction
...
- changed access_token
- use account-search instead of general search
2017-10-02 18:33:52 +02:00
Mike Fährmann
00420ff202
[booru] consistent order for "popular" results
2017-09-06 12:33:19 +02:00
Mike Fährmann
65997d835b
replace popular/ranking tests with older ones
...
Metadata of lists that are several years old shouldn't change as much
as it would for newer ones, which makes metadata comparisons of the
output of build_testresult_db.py easier.
2017-08-31 15:09:18 +02:00
Mike Fährmann
88a386977e
[booru] add "popular" extractors for more sites
...
- konachan.com
- behoimi.org
- e621.net
2017-08-26 23:08:52 +02:00
Mike Fährmann
07214f4007
[booru] place subcategories into base classes
2017-08-26 22:27:55 +02:00
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2
2017-02-01 00:53:19 +01:00
Mike Fährmann
d7e168799d
consistent extractor naming scheme + docstrings
2016-09-12 10:34:31 +02:00
Mike Fährmann
616e0aedd6
update booru testdata
2015-12-22 03:10:52 +01:00
Mike Fährmann
ba99506c72
more extractor test-cases
2015-12-14 03:00:58 +01:00
Mike Fährmann
f7c47a6018
add subcategories to extractors
2015-11-30 01:11:13 +01:00
Mike Fährmann
1bce63124b
[3dbooru] update to new format
2015-11-21 01:48:44 +01:00
Mike Fährmann
3b0fe8f544
unify booru filename-patterns
2015-11-06 16:48:33 +01:00
Mike Fährmann
3c13548f29
rewrite extractors to use config-module
2015-10-05 15:51:08 +02:00
Mike Fährmann
9c25c15438
[3dbooru] fix default regex
2015-05-04 18:22:07 +02:00
Mike Fährmann
a2cfbe445f
add extractor '3dbooru'
2015-04-15 22:24:27 +02:00