Mike Fährmann
eb3185d6a3
update exception hierarchy
2018-09-05 18:15:33 +02:00
Mike Fährmann
e9ae6fd080
improve downloader/postprocessor module loading
...
- handle arguments of any type without propagating an exception
- prevent potential security risk through relative imports
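A minimal sketch of the two points above, not the project's actual loader: the helper name `find_module` and the identifier check are assumptions made for illustration, and the stdlib `json` package stands in for the downloader/postprocessor packages.

```python
import importlib


def find_module(name, package):
    """Import 'package.name' and return it, or return None on any bad input.

    Hypothetical helper: the name and the validation rule are assumptions.
    """
    # Reject non-strings and anything that is not a plain identifier,
    # so values such as "..os" or "a.b" never reach the import machinery.
    if not isinstance(name, str) or not name.isidentifier():
        return None
    try:
        return importlib.import_module("." + name, package)
    except ImportError:
        # Unknown module names are reported as "not found", not raised.
        return None


print(find_module("decoder", "json"))   # the json.decoder module
print(find_module("..os", "json"))      # None
print(find_module(42, "json"))          # None
```

Restricting names to plain identifiers is one simple way to keep a user-supplied value from walking up or down the package tree.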
2018-09-05 16:39:40 +02:00
Mike Fährmann
712b58a93b
[postprocessor] add black-/whitelist options
...
Each post-processor config dict now supports a list of extractor
categories for which it should or shouldn't be active.
For example:
"postprocessors": [
{"name": "classify",
"whitelist": ["tumblr", "deviantart"],
...
}
]
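How such a filter might be evaluated can be sketched as follows. This is an illustration only: the function name `is_active`, the `"blacklist"` key spelling, and the rule that a whitelist takes precedence over a blacklist are all assumptions, not taken from the commit.

```python
def is_active(pp_config, category):
    """Return True if the post-processor should run for 'category'.

    Hypothetical helper; key names and precedence are assumptions.
    """
    whitelist = pp_config.get("whitelist")
    blacklist = pp_config.get("blacklist")
    if whitelist is not None:
        # Only the listed categories are allowed.
        return category in whitelist
    if blacklist is not None:
        # Every category except the listed ones is allowed.
        return category not in blacklist
    # No list given: active for all categories.
    return True


config = {"name": "classify", "whitelist": ["tumblr", "deviantart"]}
print(is_active(config, "tumblr"))   # True
print(is_active(config, "pixiv"))    # False
```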
2018-09-03 14:53:43 +02:00
Mike Fährmann
8a23b21d0e
[tests] let 'pattern' require at least 1 URL
2018-09-02 21:19:44 +02:00
Mike Fährmann
0bc8ef51c8
[smugmug] Handle albums with no explicit owner (#100)
2018-09-01 12:55:02 +02:00
Mike Fährmann
ff83ee22b0
release version 1.5.2
2018-08-31 20:27:09 +02:00
Mike Fährmann
b47af4637a
[mangadex] update URL pattern
...
Manga URLs now begin with /title/ instead of /manga/
2018-08-31 20:16:50 +02:00
Mike Fährmann
75862715ac
[behance] add user extractor
2018-08-31 17:42:09 +02:00
Mike Fährmann
a493fed376
[deviantart] fix journal creation if no 'username' is set
2018-08-31 17:38:12 +02:00
Mike Fährmann
6ecb36d88c
[postprocessor:ugoira] add 'ffmpeg-output' option
2018-08-31 17:37:35 +02:00
Mike Fährmann
02a4a67f6d
[postprocessor:ugoira] support danbooru sources
2018-08-27 20:58:45 +02:00
Mike Fährmann
5b8a314de7
[tumblr] replace inline URLs with higher quality ones (#98)
2018-08-25 18:43:51 +02:00
Mike Fährmann
2af2bb7911
[mangadex] fix relative page URLs
2018-08-25 11:07:26 +02:00
Mike Fährmann
590c0b3ad5
re-implement and improve filename formatter
...
A format string now gets parsed only once instead of re-parsing it each
time it is applied to a set of data.
The initial parsing makes directory path creation about 2x
slower than before, since each format string there is used only once,
but building a filename, the more common operation, is at least 2x
faster. The "directory slowness" is offset after about 5 filenames and
everything above that is significantly faster.
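The parse-once idea can be sketched with the standard library's `string.Formatter.parse()`. This is not the project's formatter: the class name `CompiledFormat` is hypothetical, and the sketch handles only plain dictionary keys with standard format specs (no conversions, attribute access, or custom options).

```python
import string


class CompiledFormat:
    """Parse a format string once, then apply it to many data dicts."""

    def __init__(self, format_string):
        # Each entry is (literal_text, field_name or None, format_spec).
        self.parts = [
            (literal, field, spec or "")
            for literal, field, spec, _conversion
            in string.Formatter().parse(format_string)
        ]

    def apply(self, data):
        out = []
        for literal, field, spec in self.parts:
            out.append(literal)
            if field is not None:
                out.append(format(data[field], spec))
        return "".join(out)


fmt = CompiledFormat("{category}_{id:03d}.{extension}")
print(fmt.apply({"category": "tumblr", "id": 7, "extension": "jpg"}))
# tumblr_007.jpg
```

The constructor carries the one-time parsing cost; `apply()` only walks the prepared list, which is why repeated use pays off while single use does not.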
2018-08-25 10:45:14 +02:00
Mike Fährmann
34b556922d
update/restore tests
2018-08-23 15:47:40 +02:00
Mike Fährmann
ab2bfaeb46
[ngomik] add replacement for 'subapics'
...
http://subapics.com/ got discontinued and replaced by http://ngomik.in/.
ngomik.in is still displaying a link to the "old site" showing a big
"Account Suspended" sign.
2018-08-23 15:29:53 +02:00
Mike Fährmann
a2eeef1f5e
[behance] replace test
...
The "UVMW Studio" account and their galleries are gone.
2018-08-19 21:17:21 +02:00
Mike Fährmann
e9dd2eff1d
[twitter] add extractor for media-tweet timelines (#96)
...
For example "https://twitter.com/PicturesEarth/media".
They are different from normal timelines in that they do not contain
any (re)tweets from other users and feature all media the user ever
posted, including responses to other tweets.
2018-08-19 20:46:12 +02:00
Mike Fährmann
f45c9f2141
[gfycat] test-updates and code-adjustments
2018-08-18 23:04:45 +02:00
Mike Fährmann
9b1c39032c
[twitter] changes and improvements
...
- rename User- to TimelineExtractor
- rename 'userid' to 'user_id' to conform to the other ..._id values
- adjust archive_fmt to deal with retweets
- emulate browser behavior for API calls
2018-08-18 23:04:45 +02:00
Mike Fährmann
10365394d7
[twitter] add support for user-timelines (closes #96)
...
also adds a 'retweets' option to filter retweeted content
2018-08-17 20:04:11 +02:00
Mike Fährmann
e3055d356c
release version 1.5.1
2018-08-17 13:21:36 +02:00
Mike Fährmann
d3f1eed2a6
[pinterest] improvements
...
- add stop condition for pin-related pins
- improve URL patterns
- make Pylint happy
2018-08-16 18:11:39 +02:00
Mike Fährmann
2801a0d997
[exhentai] skip "Content Warning" page when not logged in
...
(closes #97)
2018-08-16 09:17:22 +02:00
Mike Fährmann
63fa0b2006
[pinterest] add extractors for related pins
...
Related pins can now be accessed by adding a "#related" fragment
to the end of a Pinterest URL, for example:
- https://www.pinterest.com/pin/858146903966145189/#related
- https://www.pinterest.com/g1952849/test-/#related
There are no explicit real URLs for related pins,
using an option to enable them results in "clunky" code,
and a custom "related:<URL>" scheme doesn't feel right either.
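Recognizing the "#related" fragment could look roughly like the sketch below. The regular expressions and the function name `classify_related` are assumptions for illustration and are not the extractor's real URL patterns.

```python
import re

# Hypothetical patterns; the real extractor patterns may differ.
BASE = r"(?:https?://)?(?:www\.)?pinterest\.com"
RELATED_PIN = re.compile(BASE + r"/pin/([^/?#]+)/?#related$")
RELATED_BOARD = re.compile(BASE + r"/([^/?#]+)/([^/?#]+)/?#related$")


def classify_related(url):
    """Return ("pin", id), ("board", user, board), or None."""
    # Pins are checked first, since "/pin/<id>" also fits the board shape.
    match = RELATED_PIN.match(url)
    if match:
        return ("pin", match.group(1))
    match = RELATED_BOARD.match(url)
    if match:
        return ("board", match.group(1), match.group(2))
    return None


print(classify_related(
    "https://www.pinterest.com/pin/858146903966145189/#related"))
print(classify_related(
    "https://www.pinterest.com/g1952849/test-/#related"))
```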
2018-08-15 21:49:45 +02:00
Mike Fährmann
1694039de0
[komikcast] update ad-filter
2018-08-15 21:49:44 +02:00
Mike Fährmann
f9ded38d89
[test:results] add support for "range" options in tests
2018-08-15 21:49:44 +02:00
Mike Fährmann
c9e6ccbd7c
[test:extractor] small fixes and improvements
2018-08-15 21:49:33 +02:00
Mike Fährmann
792135a339
enable Python 3.7 for Travis CI tests
2018-08-14 11:54:01 +02:00
Mike Fährmann
a74591b84b
[tumblr] remove "original image" functionality
...
Accessing higher/original quality images on
https://s3.amazonaws.com/data.tumblr.com and http://data.tumblr.com
is no longer possible and any HTTP request results in 403 Forbidden.
A few images can still be accessed through https://a.tumblr.com [1][2],
but not as "_raw", just "_1280", and that might also be "fixed" in
the near future.
[1] https://a.tumblr.com/tumblr_kzjlfiTnfe1qz4rgho1_1280.jpg
[2] https://a.tumblr.com/ee589c6345f29d2d5935cecb49b0a705/tumblr_oztu02dIHp1wgha4yo1_1280.png
2018-08-14 11:51:17 +02:00
Mike Fährmann
38d4f43cc0
[komikcast] skip ads
2018-08-14 11:17:59 +02:00
Mike Fährmann
4313c95bc9
improve error message for OAuth2 authentication
2018-08-11 23:54:25 +02:00
Mike Fährmann
7f4e41c989
increase timeout during extractor tests
...
Cloudflare's 522 response takes longer than 30 seconds
2018-08-10 16:51:05 +02:00
Mike Fährmann
b55e39d1ee
[mangadex] improve extraction
...
- cache manga API results
- add artist, author and date fields to chapter metadata
- remove Manga-/ChapterExtractor inheritance
- minor code simplifications and improvements
2018-08-10 16:50:07 +02:00
Mike Fährmann
b1c4c1e13c
[mangadex] fix extraction
2018-08-08 18:08:26 +02:00
Mike Fährmann
3c90df6635
[piczel] add user, folder and image extractors
2018-08-08 10:53:01 +02:00
Mike Fährmann
2a9f3341a2
[behance] fix title extraction
2018-08-08 10:48:58 +02:00
Mike Fährmann
3fc2f269fa
[behance] filter 'fields' list
2018-08-07 12:14:41 +02:00
Mike Fährmann
b67339155f
[rule34] update test results
...
'metadata' tag type has been removed
2018-08-07 12:13:34 +02:00
Mike Fährmann
a86f2bfc80
[pinterest] update not-found redirects
2018-08-07 12:13:19 +02:00
Mike Fährmann
7442d2940c
release version 1.5.0
2018-08-03 17:50:27 +02:00
Mike Fährmann
b040ca0718
[rule34] small unit test fixes
2018-08-03 17:28:47 +02:00
Mike Fährmann
b164231bca
[sankaku] increase default values for 'wait-min/-max'
2018-08-03 17:06:51 +02:00
Mike Fährmann
68d6033a5d
use 'retries' and 'timeout' options for regular HTTP requests
2018-08-02 16:11:54 +02:00
Mike Fährmann
f3793660ef
update tests
2018-08-02 14:57:28 +02:00
Mike Fährmann
42a346413b
fix "re:" prefix for keyword tests
2018-08-02 14:48:51 +02:00
Mike Fährmann
df082e923c
[behance] add gallery extractor (#95)
2018-08-01 21:46:55 +02:00
Mike Fährmann
c83fc62abc
prioritize archive over disk access (#87)
2018-07-30 17:48:23 +02:00
Mike Fährmann
e0dd8dff5f
implement L<maxlen>/<replacement>/ format option
...
The L option allows for the contents of a format field to be replaced
with <replacement> if its length is greater than <maxlen>.
Example:
{f:L5/too long/} -> "foo"      (if "f" is "foo")
                 -> "too long" (if "f" is "foobar")
(#92) (#94)
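The behavior of the L option can be sketched as a small standalone function. This is an illustration under assumptions, not the formatter's implementation: the name `apply_length_option` is hypothetical, and the sketch expects the spec to contain only the L option.

```python
def apply_length_option(value, spec):
    """Apply an 'L<maxlen>/<replacement>/' spec to a string value.

    Hypothetical helper; any other spec is returned unchanged.
    """
    if not spec.startswith("L"):
        return value
    # "5/too long/" -> ["5", "too long", ""]
    maxlen, replacement, _rest = spec[1:].split("/", 2)
    return replacement if len(value) > int(maxlen) else value


print(apply_length_option("foo", "L5/too long/"))     # foo
print(apply_length_option("foobar", "L5/too long/"))  # too long
```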
2018-07-29 13:52:07 +02:00
Mike Fährmann
5f27cfeff6
[deviantart] remove `prefer-public` option
...
All API requests now always use a public token and only switch to
a private token for pagination results if `refresh-token` is set
and fewer deviations than requested were returned.
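The switching rule described above can be written as a single predicate. The function name and parameters are hypothetical; the sketch only restates the condition from the commit message and performs no API calls.

```python
def needs_private_token(refresh_token, requested, returned):
    """Return True if a paginated request should be retried privately.

    Hypothetical helper: switch only when a refresh token is configured
    and the public request returned fewer items than were asked for.
    """
    return bool(refresh_token) and returned < requested


print(needs_private_token("abc123", requested=24, returned=20))  # True
print(needs_private_token(None, requested=24, returned=20))      # False
print(needs_private_token("abc123", requested=24, returned=24))  # False
```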
2018-07-26 19:43:46 +02:00