Mike Fährmann
3ecb512722
send Referer headers by default
2023-09-19 00:02:04 +02:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
1d2b5d0c60
update test comment positions
...
always put them above the test they're referring to
2023-09-06 18:16:09 +02:00
Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This makes it possible, for example, to adjust config access inside a
Job before it is used, instead of most of it having already happened
when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
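The split described in this commit can be sketched as a two-phase setup. This is a minimal illustration, not the project's actual code: the class layout, the 'options' and 'session' attributes, and the 'user-agent' key are all assumptions made for the example.

```python
class Extractor:
    """Hypothetical stand-in for an extractor base class."""

    def __init__(self, url):
        # The constructor only records cheap state;
        # no session, cookie, or config work happens here.
        self.url = url
        self.session = None
        self.options = {}
        self._initialized = False

    def initialize(self, config=None):
        """Do the actual setup; repeated calls are ignored."""
        if self._initialized:
            return
        self.options = dict(config or {})
        user_agent = self.options.get("user-agent", "sketch")
        # A plain dict stands in for a real HTTP session object.
        self.session = {"headers": {"User-Agent": user_agent}}
        self._initialized = True


extr = Extractor("https://example.org/album/abc")
# A caller can still change the configuration at this point,
# because nothing has read it yet.
extr.initialize({"user-agent": "custom"})
```

Keeping the constructor cheap lets a caller build the object first and decide on its configuration afterwards.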
Mike Fährmann
d12dd3813c
[imgur] fix internal image/album URLs
...
URLs from "link" attributes of newer images/albums were all returned
as 'https://imgur.com/gallery/ ...' instead of the expected format,
causing them to be ignored.
2023-05-06 15:13:38 +02:00
Mike Fährmann
8520de57f0
[imgur] add 'favorite-folder' extractor (#4016)
2023-05-06 15:10:13 +02:00
Mike Fährmann
aaf58a1259
[imgur] document 'client-id' option (#3937)
2023-04-21 15:08:50 +02:00
ClosedPort22
bf1649dadb
[imgur] add support for imgur.io URLs
2022-12-17 14:33:44 +08:00
Mike Fährmann
4598d32370
[imgur] prevent exception for empty albums (closes #2557)
2022-05-04 17:34:50 +02:00
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
...
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
4fc9668922
[imgur] update URL patterns (#1561)
2021-05-19 15:44:10 +02:00
Mike Fährmann
0b55f5ad84
[imgur] fix/improve rate limit handling (#1386)
...
- also wait-and-retry on 429 status codes
- use infinite loop instead of recursive calls
- 'extractor.sleep()' -> 'extractor.wait()'
2021-03-18 15:45:26 +01:00
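The loop-based retry named in this commit can be sketched as follows. This is only an illustration under assumptions: 'request_with_retry', the dict-shaped responses, and the 'max_tries' cap are hypothetical and do not come from the commit, which describes an unbounded loop.

```python
import time


def request_with_retry(send, wait=time.sleep, delay=60.0, max_tries=5):
    """Call send() until it returns a response that is not a 429.

    'send' is any zero-argument callable returning a dict with a
    'status' key. 'max_tries' is an addition for this sketch so that
    the example always terminates.
    """
    tries = 0
    while True:
        response = send()
        if response["status"] != 429:
            return response
        tries += 1
        if tries >= max_tries:
            raise RuntimeError("still rate limited after %d tries" % tries)
        # Pause before the next attempt; the loop keeps the call
        # stack flat, unlike a function that calls itself again.
        wait(delay)
```

A loop avoids growing the call stack on every retry, which a recursive retry would do under a long-lasting rate limit.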
Mike Fährmann
3df527ee2c
update extractor test results
2021-02-27 21:01:29 +01:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
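The effect of dropping '&' from a delimiter class can be sketched with a small pattern. The host 'example.org', the '/album/' path, and the 'album_id' helper are assumptions for illustration, not patterns from the project.

```python
import re

# Only '/', '?' and '#' end the captured path segment;
# '&' is treated as an ordinary character.
PATTERN = re.compile(r"https?://example\.org/album/([^/?#]+)")


def album_id(url):
    """Return the captured album segment, or None if the URL does not match."""
    match = PATTERN.match(url)
    return match.group(1) if match else None


print(album_id("https://example.org/album/a&b?page=2#top"))
```

With '&' still in the class, the capture in this example would stop at the ampersand and return only the first character.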
Mike Fährmann
799ca07fc8
[imgur] update
...
- fix image/album detection for galleries
- use new API endpoints for image/album data
2020-09-06 21:11:32 +02:00
Mike Fährmann
ab1af66a97
[imgur] add 'search' extractor (#934)
2020-08-27 22:46:17 +02:00
Mike Fährmann
e4bbc1fb5c
[imgur] add 'tag' extractor (#934)
2020-08-27 22:46:17 +02:00
Mike Fährmann
ec5870576d
[imgur] handle 403 overcapacity responses (closes #910)
2020-07-30 19:26:01 +02:00
Mike Fährmann
27d163afb3
[imgur] support all '/t/...' URLs (closes #880)
...
… instead of just '/t/unmuted/'
2020-07-09 22:17:01 +02:00
Mike Fährmann
bd0e1ca1a5
[imgur] build directory path for each file (closes #842)
2020-06-21 19:25:52 +02:00
Mike Fährmann
6bcdb264e0
[imgur] treat 't/unmuted' URLs as galleries
2020-05-25 22:21:57 +02:00
Mike Fährmann
b6cee3e45b
[imgur] fix extraction of animated images without 'mp4' entry
2020-05-25 22:21:57 +02:00
Mike Fährmann
4e361b3008
add tests for specific datetime values
2020-02-23 16:48:30 +01:00
Mike Fährmann
32d7195d08
[pinterest] improve detection of invalid pin.it links
2020-01-18 21:06:44 +01:00
Mike Fährmann
1f2a69f3c5
add '_extractor' information to redirect results
2019-12-29 23:37:34 +01:00
Mike Fährmann
6e23c0da09
[imgur] add extractor for subreddit links (closes #500)
2019-12-02 23:44:13 +01:00
Mike Fährmann
e9aed62c91
[imgur] unescape image titles
2019-11-28 22:13:24 +01:00
Mike Fährmann
b0197098e6
[imgur] get title from webpage if missing in API response
...
(closes #467)
2019-11-07 21:10:04 +01:00
Mike Fährmann
8f38a35b91
[imgur] use API with "public" client_id (#446)
...
Using the API endpoints makes it possible to access NSFW content
without logging in.
2019-10-23 21:43:55 +02:00
Mike Fährmann
7ebd984e8d
[imgur] print error message if no JSON data is found (#446)
2019-10-16 17:45:14 +02:00
Mike Fährmann
5882b00f2f
[imgur] implement login support (#446)
2019-10-15 22:00:22 +02:00
Mike Fährmann
913460240d
[reddit] fix 'extractor.blacklist()' arguments
...
The second argument must support 'append()'.
2019-09-24 23:01:12 +02:00
Mike Fährmann
4330133114
[imgur] add 'favorite' extractor (closes #420)
...
… and use a newer site-internal API endpoint for user posts
2019-09-19 15:54:26 +02:00
Mike Fährmann
d780f0357e
[imgur] add user extractor
2019-09-17 22:58:18 +02:00
Mike Fährmann
7d6af936c5
[imgur] simplify gallery extraction
2019-08-20 20:00:43 +02:00
Mike Fährmann
829b1ccf04
[imgur] distinguish album and gallery URLs (#380)
...
A gallery can be either an album or a single image.
2019-08-14 21:40:14 +02:00
Mike Fährmann
fdec59f8e2
replace extractor.request() 'expect' argument
...
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
2019-07-05 00:42:16 +02:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
34bab080ae
rewrite URL patterns to use only 1 per extractor
2019-02-08 12:03:10 +01:00
Mike Fährmann
ff436692bf
[deviantart] add 'journals' option
2018-07-16 18:14:41 +02:00
Mike Fährmann
017188d268
improve extractor.request()
...
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
2018-06-18 16:29:56 +02:00
Mike Fährmann
ad14de19c6
[imgur] support "unmuted" URLs
2018-05-30 16:19:01 +02:00
Mike Fährmann
4cea886177
[imgur] allow longer album hashes
2018-05-13 11:21:51 +02:00
Mike Fährmann
1b80fa82a9
[imgur] update URL pattern and tests
2018-04-08 21:06:21 +02:00
Mike Fährmann
179bcdd349
adjust archive-ids
2018-02-13 04:50:45 +01:00
Mike Fährmann
3cec533c28
Merge branch 'archive'
2018-02-12 18:07:58 +01:00
Mike Fährmann
20af86b2ea
add more extractor tests
...
for mangastream, reddit and imgur
2018-02-12 17:07:18 +01:00
Mike Fährmann
7e0207bcf4
[imgur] strip trailing '?1' from 'ext'
2018-02-10 21:33:40 +01:00
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create a unique id for each image.
2018-02-01 15:30:49 +01:00