Mike Fährmann
0232d80cec
[deviantart] convert 'published_time' to int ( fixes #108 )
...
The 'published_time' field (a timestamp) changed from integer to string
and caused journal creation to fail.
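
A minimal sketch of the normalization described above, assuming the API may deliver the timestamp either as an integer or as a numeric string. The helper name `normalize_published_time` and the sample dicts are hypothetical and not taken from the actual extractor code.

```python
from datetime import datetime, timezone


def normalize_published_time(deviation):
    """Return a copy of 'deviation' with 'published_time' coerced to int."""
    result = dict(deviation)
    # int() accepts both 1536861121 and "1536861121"
    result["published_time"] = int(result["published_time"])
    return result


# hypothetical sample data: old (int) and new (str) API behavior
old_style = normalize_published_time({"published_time": 1536861121})
new_style = normalize_published_time({"published_time": "1536861121"})

# code that expects an integer timestamp works again for both variants
date = datetime.fromtimestamp(new_style["published_time"], tz=timezone.utc)
```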
2018-09-13 19:52:01 +02:00
Mike Fährmann
7742cf8601
[tumblr] change 'reblogs' option ( #103 )
...
- rename "deleted" to "same-blog"
- replace the check for a deleted original post with a check whether
the original post's owner has the same UUID (full blog name) as the
blog being downloaded from
- add 'blog[uuid]' metadata to allow comparison with
'reblogged_from_uuid'
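
The comparison described in the list above might look roughly like the following sketch. It assumes that a reblog can be recognized by the presence of a 'reblogged_from_uuid' key and that 'reblogs' otherwise accepts true/false; the function name `keep_post` and the sample posts are hypothetical.

```python
def keep_post(post, blog, reblogs):
    """Decide whether a post should be processed for the given 'reblogs' value."""
    # assumption: only reblogs carry a 'reblogged_from_uuid' field
    if "reblogged_from_uuid" not in post:
        return True
    if reblogs == "same-blog":
        # keep a reblog only if it was reblogged from the blog itself
        return post["reblogged_from_uuid"] == blog["uuid"]
    # assumption: any other value is a plain on/off switch
    return bool(reblogs)


blog = {"uuid": "example.tumblr.com"}
original = {"id": 1}
self_reblog = {"id": 2, "reblogged_from_uuid": "example.tumblr.com"}
foreign_reblog = {"id": 3, "reblogged_from_uuid": "other.tumblr.com"}

kept = [
    post["id"]
    for post in (original, self_reblog, foreign_reblog)
    if keep_post(post, blog, "same-blog")
]
```

Unlike the earlier "deleted" check, this needs no extra API request per reblogged post, since both UUIDs are already part of the metadata.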
2018-09-10 15:40:25 +02:00
Mike Fährmann
f1695567e8
adjust values in template config file
...
(#104 )
2018-09-09 14:10:55 +02:00
Mike Fährmann
d4d95d3154
[tumblr] improve rewrite rules for video URLs
2018-09-09 14:09:47 +02:00
Mike Fährmann
542a25c389
[ngomik] fix extraction
2018-09-09 13:45:40 +02:00
Mike Fährmann
a666ddd16b
[tumblr] extend 'reblogs' functionality ( #103 )
...
Setting 'reblogs' to "deleted" will check if the parent post of a
reblog has been deleted and download its media content if that is the
case, otherwise it will be skipped.
This is a rather costly operation (1 API request per reblogged post)
and should therefore be used with care.
2018-09-07 19:13:52 +02:00
Mike Fährmann
c9b8e6aefc
[reddit] fix submission-ID parsing ( #104 )
...
Uppercase characters caused a ValueError exception
2018-09-07 18:27:54 +02:00
Mike Fährmann
488abeca0b
[hentaicafe] adjust default directory format
...
A separate folder for each chapter is rather pointless if almost all
manga have only one chapter each.
2018-09-07 18:25:58 +02:00
Mike Fährmann
b4eca2633e
[tumblr] support /archive URLs
2018-09-06 11:09:13 +02:00
Mike Fährmann
aa1de70da0
[tumblr] recognize inline videos ( #102 )
2018-09-06 10:37:40 +02:00
Mike Fährmann
3ecea4cf36
[hentaicafe] add chapter and manga extractors ( #101 )
2018-09-05 21:08:40 +02:00
Mike Fährmann
41249f3ead
improve extractor.get_downloader()
2018-09-05 18:17:16 +02:00
Mike Fährmann
eb3185d6a3
update exception hierarchy
2018-09-05 18:15:33 +02:00
Mike Fährmann
e9ae6fd080
improve downloader/postprocessor module loading
...
- handle arguments of any type without propagating an exception
- prevent potential security risk through relative imports
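
One way to get both properties listed above is to validate the module name before importing it. This is only a sketch under assumptions: the function name `load_module` is hypothetical, and the standard-library `json` package stands in for the real downloader/postprocessor packages.

```python
import importlib


def load_module(package, name):
    """Import 'package.name', or return None if that is not possible."""
    # non-string arguments (None, ints, lists, ...) are rejected up front
    # instead of raising a TypeError somewhere inside the import machinery
    if not isinstance(name, str) or not name.isidentifier():
        # names containing dots such as "..os" would otherwise be treated
        # as relative imports and could leave the intended package
        return None
    try:
        return importlib.import_module("." + name, package)
    except ImportError:
        return None


found = load_module("json", "decoder")
missing = load_module("json", "does_not_exist")
escaped = load_module("json", "..os")
wrong_type = load_module("json", None)
```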
2018-09-05 16:39:40 +02:00
Mike Fährmann
712b58a93b
[postprocessor] add black-/whitelist options
...
Each post-processor config dict now supports a list of extractor
categories for which it should or shouldn't be active.
For example:
    "postprocessors": [
        {"name": "classify",
         "whitelist": ["tumblr", "deviantart"],
         ...
        }
    ]
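
How such a config dict could be evaluated is sketched below. The "blacklist" key is assumed from the commit subject, the rule that a whitelist takes precedence over a blacklist is an assumption as well, and `is_active` is a hypothetical helper name.

```python
def is_active(pp_config, category):
    """Check whether a post-processor applies to an extractor category."""
    whitelist = pp_config.get("whitelist")
    if whitelist is not None:
        # only the listed categories enable this post-processor
        return category in whitelist
    blacklist = pp_config.get("blacklist")  # assumed key name
    if blacklist is not None:
        # every category except the listed ones enables it
        return category not in blacklist
    # neither list given: active for all categories
    return True


classify = {"name": "classify", "whitelist": ["tumblr", "deviantart"]}
zip_all_but_pixiv = {"name": "zip", "blacklist": ["pixiv"]}  # hypothetical
```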
2018-09-03 14:53:43 +02:00
Mike Fährmann
8a23b21d0e
[tests] let 'pattern' require at least 1 URL
2018-09-02 21:19:44 +02:00
Mike Fährmann
0bc8ef51c8
[smugmug] Handle albums with no explicit owner ( #100 )
2018-09-01 12:55:02 +02:00
Mike Fährmann
ff83ee22b0
release version 1.5.2
2018-08-31 20:27:09 +02:00
Mike Fährmann
b47af4637a
[mangadex] update URL pattern
...
Manga URLs now begin with /title/ instead of /manga/
2018-08-31 20:16:50 +02:00
Mike Fährmann
75862715ac
[behance] add user extractor
2018-08-31 17:42:09 +02:00
Mike Fährmann
a493fed376
[deviantart] fix journal creation if no 'username' is set
2018-08-31 17:38:12 +02:00
Mike Fährmann
6ecb36d88c
[postprocessor:ugoira] add 'ffmpeg-output' option
2018-08-31 17:37:35 +02:00
Mike Fährmann
02a4a67f6d
[postprocessor:ugoira] support danbooru sources
2018-08-27 20:58:45 +02:00
Mike Fährmann
5b8a314de7
[tumblr] replace inline URLs with higher quality ones ( #98 )
2018-08-25 18:43:51 +02:00
Mike Fährmann
2af2bb7911
[mangadex] fix relative page URLs
2018-08-25 11:07:26 +02:00
Mike Fährmann
590c0b3ad5
re-implement and improve filename formatter
...
A format string now gets parsed only once instead of re-parsing it each
time it is applied to a set of data.
The initial parsing causes directory path creation to be about 2x
slower than before, since each format string there is used only once,
but building a filename, the more common operation, is at least 2x
faster. The "directory slowness" cancels out at about 5 filenames, and
everything above that is significantly faster.
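
The parse-once idea can be illustrated with the standard library's `string.Formatter`. This is a simplified sketch, not the actual formatter: the class name `ParsedFormat` is hypothetical, and it only handles plain field names with an optional format spec (no conversions, attribute access or indexing).

```python
import string


class ParsedFormat:
    """Parse a format string once, then apply it to many data sets."""

    def __init__(self, format_string):
        self.parts = []
        parser = string.Formatter()
        # parse() yields (literal_text, field_name, format_spec, conversion)
        for literal, field, spec, _conversion in parser.parse(format_string):
            if literal:
                self.parts.append(literal)
            if field is not None:
                self.parts.append((field, spec or ""))

    def apply(self, data):
        # no re-parsing here: just walk the precomputed parts
        result = []
        for part in self.parts:
            if isinstance(part, str):
                result.append(part)
            else:
                field, spec = part
                result.append(format(data[field], spec))
        return "".join(result)


filename_fmt = ParsedFormat("{category}_{id:03}.{ext}")
first = filename_fmt.apply({"category": "tumblr", "id": 7, "ext": "jpg"})
second = filename_fmt.apply({"category": "tumblr", "id": 8, "ext": "png"})
```

The constructor carries the one-time parsing cost, which is why a format string that is applied only once (as for a directory path) gains nothing, while one that is applied to many files amortizes it quickly.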
2018-08-25 10:45:14 +02:00
Mike Fährmann
34b556922d
update/restore tests
2018-08-23 15:47:40 +02:00
Mike Fährmann
ab2bfaeb46
[ngomik] add replacement for 'subapics'
...
http://subapics.com/ was discontinued and replaced by http://ngomik.in/.
ngomik.in still displays a link to the "old site", which shows a big
"Account Suspended" sign.
2018-08-23 15:29:53 +02:00
Mike Fährmann
a2eeef1f5e
[behance] replace test
...
The "UVMW Studio" account and their galleries are gone.
2018-08-19 21:17:21 +02:00
Mike Fährmann
e9dd2eff1d
[twitter] add extractor for media-tweet timelines ( #96 )
...
For example "https://twitter.com/PicturesEarth/media".
They are different from normal timelines in that they do not contain
any (re)tweets from other users and feature all media the user ever
posted, including responses to other tweets.
2018-08-19 20:46:12 +02:00
Mike Fährmann
f45c9f2141
[gfycat] test-updates and code-adjustments
2018-08-18 23:04:45 +02:00
Mike Fährmann
9b1c39032c
[twitter] changes and improvements
...
- rename User- to TimelineExtractor
- rename 'userid' to 'user_id' to conform to the other ..._id values
- adjust archive_fmt to deal with retweets
- emulate browser behavior for API calls
2018-08-18 23:04:45 +02:00
Mike Fährmann
10365394d7
[twitter] add support for user-timelines ( closes #96 )
...
also adds a 'retweets' option to filter retweeted content
2018-08-17 20:04:11 +02:00
Mike Fährmann
e3055d356c
release version 1.5.1
2018-08-17 13:21:36 +02:00
Mike Fährmann
d3f1eed2a6
[pinterest] improvements
...
- add stop condition for pin-related pins
- improve URL patterns
- make Pylint happy
2018-08-16 18:11:39 +02:00
Mike Fährmann
2801a0d997
[exhentai] skip "Content Warning" page when not logged in
...
(closes #97 )
2018-08-16 09:17:22 +02:00
Mike Fährmann
63fa0b2006
[pinterest] add extractors for related pins
...
Related pins can now be accessed by adding a "#related" fragment
to the end of a Pinterest URL, for example:
- https://www.pinterest.com/pin/858146903966145189/#related
- https://www.pinterest.com/g1952849/test-/#related
There are no explicit real URLs for related pins;
using an option to enable them results in "clunky" code,
and a custom "related:<URL>" scheme doesn't feel right either.
2018-08-15 21:49:45 +02:00
Mike Fährmann
1694039de0
[komikcast] update ad-filter
2018-08-15 21:49:44 +02:00
Mike Fährmann
f9ded38d89
[test:results] add support for "range" options in tests
2018-08-15 21:49:44 +02:00
Mike Fährmann
c9e6ccbd7c
[test:extractor] small fixes and improvements
2018-08-15 21:49:33 +02:00
Mike Fährmann
792135a339
enable Python 3.7 for Travis CI tests
2018-08-14 11:54:01 +02:00
Mike Fährmann
a74591b84b
[tumblr] remove "original image" functionality
...
Accessing higher/original quality images on
https://s3.amazonaws.com/data.tumblr.com and http://data.tumblr.com
is no longer possible and any HTTP request results in 403 Forbidden.
A few images can still be accessed through https://a.tumblr.com [1][2],
but not as "_raw", just "_1280", and that might also be "fixed" in
the near future.
[1] https://a.tumblr.com/tumblr_kzjlfiTnfe1qz4rgho1_1280.jpg
[2] https://a.tumblr.com/ee589c6345f29d2d5935cecb49b0a705/tumblr_oztu02dIHp1wgha4yo1_1280.png
2018-08-14 11:51:17 +02:00
Mike Fährmann
38d4f43cc0
[komikcast] skip ads
2018-08-14 11:17:59 +02:00
Mike Fährmann
4313c95bc9
improve error message for OAuth2 authentication
2018-08-11 23:54:25 +02:00
Mike Fährmann
7f4e41c989
increase timeout during extractor tests
...
Cloudflare's 522 response takes longer than 30 seconds
2018-08-10 16:51:05 +02:00
Mike Fährmann
b55e39d1ee
[mangadex] improve extraction
...
- cache manga API results
- add artist, author and date fields to chapter metadata
- remove Manga-/ChapterExtractor inheritance
- minor code simplifications and improvements
2018-08-10 16:50:07 +02:00
Mike Fährmann
b1c4c1e13c
[mangadex] fix extraction
2018-08-08 18:08:26 +02:00
Mike Fährmann
3c90df6635
[piczel] add user, folder and image extractors
2018-08-08 10:53:01 +02:00
Mike Fährmann
2a9f3341a2
[behance] fix title extraction
2018-08-08 10:48:58 +02:00
Mike Fährmann
3fc2f269fa
[behance] filter 'fields' list
2018-08-07 12:14:41 +02:00