Mike Fährmann
1f9b16a70b
replace static 'sleep-request' defaults with dynamic ones
2023-12-18 22:06:26 +01:00
Mike Fährmann
3ecb512722
send Referer headers by default
2023-09-19 00:02:04 +02:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
1d2b5d0c60
update test comment positions
...
always put them above the test they're referring to
2023-09-06 18:16:09 +02:00
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can called separately from
the constructor __init__().
This allows, for example, to adjust config access inside a Job
before most of it already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
Mike Fährmann
dd884b02ee
replace json.loads with direct calls to JSONDecoder.decode
2023-02-09 15:22:00 +01:00
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2022-11-05 01:14:09 +01:00
Mike Fährmann
ea8113ff36
[reactor] match 'best', 'new', 'all' URLs ( #3073 )
2022-10-19 10:52:33 +02:00
Mike Fährmann
d26da3b9e5
add pre-generated 'pattern' for supported BaseExtractor sites
2022-05-09 22:20:09 +02:00
Mike Fährmann
addb72e1bb
[reactor] support thatpervert.com ( closes #2029 )
2021-11-26 18:58:07 +01:00
Mike Fährmann
d8d9502e1e
[reactor] inherit from BaseExtractor
2021-11-26 18:58:07 +01:00
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
...
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Nyasume
fa6af46756
Added ability to download GIFs instead of mp4 from Luscious and Reactor ( #1701 )
2021-08-12 15:12:42 +02:00
Mike Fährmann
21c2da454f
update extractor test results
2021-07-04 22:00:32 +02:00
Mike Fährmann
2c60c7d798
[reactor] skip deleted/empty posts
2021-05-21 16:14:09 +02:00
Mike Fährmann
bae874f370
replace 'wait-min/-max' with 'sleep-request'
...
on exhentai, idolcomplex, reactor
2021-03-02 22:55:45 +01:00
Mike Fährmann
3df527ee2c
update extractor test results
2021-02-27 21:01:29 +01:00
Mike Fährmann
65ca923b4e
fix 'whitelist' option for BaseExtractor instances
2021-02-15 21:58:33 +01:00
Mike Fährmann
912eea29bc
update extractor test results
2020-12-27 17:41:08 +01:00
Mike Fährmann
1e3dd7330e
merge SharedConfigMixin functionality into Extractor
2020-11-17 00:34:07 +01:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt , URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
dawidsowa
43b156fb40
[reactor] match URLs without subdomain ( #1053 )
2020-10-11 18:15:06 +02:00
Mike Fährmann
7619152988
[reactor] sort 'tags'
...
to ensure a consistent order for test results
2020-08-15 18:22:31 +02:00
Mike Fährmann
c50d60a53d
[reactor] fix image URLs
2019-08-16 14:07:22 +02:00
Mike Fährmann
b1db194c14
[reactor] update and improve
...
- split 'tags' into a list
- parse 'date' into a datetime object
- fix webm/mp4 URLs
2019-05-09 23:24:49 +02:00
Mike Fährmann
0f02e85961
[reactor] use "/full/" URLs ( closes #210 )
...
Putting a "/full/" in image URLs potentially gives higher resolution
and better quality.
2019-03-30 22:14:57 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from an URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext "
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url
2019-02-12 18:46:48 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
050bc1aa4a
[reactor] simplify tests
...
Some posts have, for whatever reason, a slightly different text
formatting the first time they are accessed that day
compared to any further time.
2019-02-05 10:37:44 +01:00
Mike Fährmann
4d656a81ca
replace SharedConfigExtractor class with a Mixin
2019-02-04 13:46:02 +01:00
Mike Fährmann
1734a6c879
[reactor] detect "circular" redirects ( #148 )
2019-01-09 14:59:15 +01:00
Mike Fährmann
e53cdfd6a8
update build_supportedsites.py
2019-01-09 14:58:35 +01:00
Mike Fährmann
e95b24f056
[reactor] add wait-min & -max options ( #148 )
2019-01-07 18:04:47 +01:00
Mike Fährmann
8e01cf0ef8
[reactor] generalize extractors ( #148 )
...
- support *.reactor.cc domains
- combine joyreactor and pornreactor modules
2019-01-07 17:06:47 +01:00