Mike Fährmann
e5d81bdc7b
[mangadex] handle 'external' chapters (closes #1154)
2020-12-04 20:56:30 +01:00
Mike Fährmann
447488fb18
[instagram] rewrite
(#1113, #1122, #1128, #1130, #1149)
Rely on the results of GraphQL queries instead of requesting data
for each post separately via '/p/<shortcode>/?__a=1'.
This might result in some missing metadata, and there might be some
issues for '/channel/' and '/saved/' URLs, but at least downloading
from the regular post listings should work without issues and without
getting users blocked/banned.
TODO: reimplement support for stories
2020-12-03 14:30:59 +01:00
Mike Fährmann
cc15fbe71a
[moebooru] add generalized extractors for moebooru sites
- add support for sakugabooru.com (closes #1136)
- add support for lolibooru.moe (closes #1050)
This allows users to dynamically add support for moebooru/myimouto-based
sites by adding an entry to their config file
(like for foolslide, foolfuuka, etc.).
For example:
{
    "extractor": {
        "moebooru": {
            "new-site-1": {"root": "https://site1.net"},
            "new-site-2": {"root": "https://www.site2.moe"}
        }
    }
}
2020-12-01 22:27:18 +01:00
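To show how per-site config entries like the moebooru example can be turned into site definitions, here is a minimal sketch. It assumes each entry under "moebooru" is a dict with a "root" key; the function names `moebooru_instances` and `find_category` are hypothetical and this is not gallery-dl's actual implementation.

```python
import json
from urllib.parse import urlsplit

# The example config shown in the commit message.
CONFIG_TEXT = """
{
    "extractor": {
        "moebooru": {
            "new-site-1": {"root": "https://site1.net"},
            "new-site-2": {"root": "https://www.site2.moe"}
        }
    }
}
"""


def moebooru_instances(config):
    """Map each user-defined category name to its root URL and hostname."""
    sites = config.get("extractor", {}).get("moebooru", {})
    instances = {}
    for category, options in sites.items():
        # Only dict entries with a 'root' key are treated as site definitions.
        if not isinstance(options, dict) or "root" not in options:
            continue
        root = options["root"].strip().rstrip("/")
        instances[category] = {"root": root, "host": urlsplit(root).netloc}
    return instances


def find_category(instances, url):
    """Return the category whose hostname matches 'url', or None."""
    host = urlsplit(url).netloc
    for category, info in instances.items():
        if info["host"] == host:
            return category
    return None


instances = moebooru_instances(json.loads(CONFIG_TEXT))
print(find_category(instances, "https://www.site2.moe/post/show/123"))  # new-site-2
```

Matching on the hostname keeps the lookup independent of the URL path, so one generic extractor can serve every configured site.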
Mike Fährmann
43120407cc
[paheal] create directory for each post (closes #1147)
2020-12-01 12:14:55 +01:00
Mike Fährmann
63e61a0932
[twitter] update image URL format (#1145)
Use '/<name>?format=<fmt>&name=<size>'
instead of the potentially deprecated '/<name>.<fmt>:<size>',
but keep all of them as fallback URLs.
2020-12-01 11:53:51 +01:00
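The Twitter URL change can be illustrated with a small sketch that builds a primary URL in the new '/<name>?format=<fmt>&name=<size>' form and keeps the old '/<name>.<fmt>:<size>' forms as fallbacks. The base URL `https://pbs.twimg.com/media`, the size names, and the exact fallback order are assumptions for illustration; `image_urls` is a hypothetical helper.

```python
# Assumed image host; the commit message only gives the path formats.
BASE = "https://pbs.twimg.com/media"


def image_urls(name, fmt, sizes, base=BASE):
    """Return (primary_url, fallback_urls) for one image.

    New-style '/<name>?format=<fmt>&name=<size>' URLs come first, in the
    order of 'sizes'; old-style '/<name>.<fmt>:<size>' URLs follow as
    further fallbacks.
    """
    new_style = [f"{base}/{name}?format={fmt}&name={size}" for size in sizes]
    old_style = [f"{base}/{name}.{fmt}:{size}" for size in sizes]
    urls = new_style + old_style
    return urls[0], urls[1:]


primary, fallbacks = image_urls("ABC123", "jpg", ["orig", "large"])
print(primary)  # https://pbs.twimg.com/media/ABC123?format=jpg&name=orig
print(fallbacks)
```

A downloader would try `primary` first and walk through `fallbacks` only if a request fails.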
Mike Fährmann
1a4b61f7eb
[downloader:http] fix issues with chunked transfer encoding
(fixes #1144)
2020-11-30 01:10:45 +01:00
Mike Fährmann
536c088462
[downloader:http] improve 'adjust-extensions' (#776)
Check file headers against a list of file signatures before
downloading the whole file and writing it to disk.
The file signature check needs some improvements (*),
but it produces usable results for the most part.
(*)
- 'webp', 'wav', and others start with 'RIFF'
- 'svg' uses the same "signature" as all XML documents
- 'webm' has the same signature as 'mkv' files
- only 'mp3' files in an ID3v2 container get recognized
2020-11-29 20:55:35 +01:00
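A file-signature check like the one described for 'adjust-extensions' can be sketched as follows. Only the first few bytes are needed, so the header could come from the first chunk of the HTTP response before the rest is written to disk. The signature table and the helper names `guess_extension`/`adjust_extension` are hypothetical; the RIFF branch shows one possible way to address the 'webp'/'wav' ambiguity listed above by reading the form type at bytes 8 to 12, and is not taken from the commit.

```python
# Hypothetical signature table: (leading bytes, extension), checked in order.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF-", "pdf"),
    (b"ID3", "mp3"),               # only MP3 files with an ID3v2 tag
    (b"\x1a\x45\xdf\xa3", "mkv"),  # EBML header; WebM files start the same way
]

# RIFF is a container format: the form type at bytes 8-12 tells them apart.
RIFF_TYPES = {b"WEBP": "webp", b"WAVE": "wav", b"AVI ": "avi"}


def guess_extension(header):
    """Guess a filename extension from the first bytes of a file, or None."""
    if header.startswith(b"RIFF"):
        return RIFF_TYPES.get(header[8:12])
    for magic, extension in SIGNATURES:
        if header.startswith(magic):
            return extension
    return None


def adjust_extension(filename, header):
    """Replace the extension of 'filename' if the file header disagrees."""
    extension = guess_extension(header)
    if extension is None:
        return filename  # unknown signature: keep the name unchanged
    stem, dot, current = filename.rpartition(".")
    if not dot:
        return f"{filename}.{extension}"
    if current.lower() == extension:
        return filename
    return f"{stem}.{extension}"


print(adjust_extension("image.png", b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # image.jpg
```

The 'svg' and 'webm' caveats remain: both would need a look further into the file than a fixed prefix.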
Mike Fährmann
46323ae6ff
initialize 'hooks' as empty tuple
follow-up to 9c29fc4e
Prevent a "race" between initializing 'pathfmt' and 'hooks',
and receiving a signal in between (e.g. ctrl+c),
which would then crash in 'handle_finalize()'.
2020-11-28 18:18:49 +01:00
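The 'hooks' race described above (and the related "always initialize DownloadJob.hooks" fix) can be reproduced in a simplified sketch. The class below is hypothetical and much smaller than the real job class; it gives `hooks` a class-level empty-tuple default so that `handle_finalize()` still works when a `KeyboardInterrupt` arrives after `pathfmt` is set but before `hooks` is built.

```python
class DownloadJob:
    """Hypothetical, simplified job class for illustration only."""

    # Class-level default: every instance starts with no hooks, so
    # handle_finalize() never hits an AttributeError.
    hooks = ()

    def __init__(self):
        self.pathfmt = None
        self.calls = []

    def initialize(self, postprocessors, interrupt_midway=False):
        self.pathfmt = "<pathfmt>"  # stand-in for a real path formatter
        if interrupt_midway:
            # Simulate a signal (e.g. Ctrl+C) arriving between the two steps.
            raise KeyboardInterrupt
        self.hooks = tuple(postprocessors)

    def handle_finalize(self):
        # Safe even when initialize() was interrupted: () yields nothing.
        for hook in self.hooks:
            self.calls.append(hook(self.pathfmt))


def run(job, postprocessors, interrupt_midway=False):
    try:
        job.initialize(postprocessors, interrupt_midway)
    except KeyboardInterrupt:
        pass  # swallowed for the demo; a real CLI would report it
    finally:
        job.handle_finalize()
    return job.calls


print(run(DownloadJob(), [lambda p: "pp:" + p], interrupt_midway=True))  # []
```

Without the default, the interrupted path would raise `AttributeError` inside the `finally` block and mask the original interrupt.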
Mike Fährmann
06af57e84a
update CHANGELOG and README for 1.15.4
2020-11-28 00:09:34 +01:00
Mike Fährmann
9c29fc4e55
always initialize DownloadJob.hooks (fixes #1135)
and not just when any (potential) post processors are defined
2020-11-28 00:09:19 +01:00
Mike Fährmann
ae6a1d5fbc
[mangoxo] fix extraction 2
2020-11-27 13:55:30 +01:00
Mike Fährmann
0bc492c0fa
add docs for 'event' and 'filename' options
from 9c3568c3 and ca59bd69
2020-11-25 12:12:41 +01:00
Mike Fährmann
f6a684bc37
[hentainexus] update data decoding procedure (#1125)
2020-11-25 11:26:26 +01:00
Mike Fährmann
c57a918f4a
[e621] implement delay via '_request_interval_min'
2020-11-25 00:19:32 +01:00
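A delay based on a minimum request interval, as in the e621 commit, can be sketched as a small throttle that sleeps only for the time still remaining since the previous request. Only the name `_request_interval_min` comes from the commit; the `Throttle` class, its injectable clock and sleep functions, and the interval values are assumptions for this sketch.

```python
import time


class Throttle:
    """Hypothetical helper: keep at least 'interval_min' seconds between calls."""

    def __init__(self, interval_min, clock=time.monotonic, sleep=time.sleep):
        self._request_interval_min = interval_min
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough, then record the time of this request."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self._request_interval_min - now
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last = now


throttle = Throttle(0.5)  # 0.5 s is an arbitrary value for this sketch
for page in range(2):
    throttle.wait()
    # ... perform the HTTP request for 'page' here ...
```

Using a monotonic clock avoids negative or oversized delays if the system time changes while a download is running.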
Mike Fährmann
93ce7466e2
[2chan] skip external links
2020-11-24 16:41:47 +01:00
Mike Fährmann
547107307e
fix 'Metadata' messages in result tests
2020-11-24 13:34:54 +01:00
Mike Fährmann
b214e89b5c
[mangoxo] fix extraction
2020-11-24 12:50:46 +01:00
Mike Fährmann
578dcf805c
[mangapanda] don't force https://
2020-11-21 20:24:37 +01:00
Mike Fährmann
102c482f5e
[reddit] skip invalid/failed gallery items (fixes #1127)
2020-11-21 17:34:38 +01:00
Mike Fährmann
174945d2b2
[hentainexus] fix extraction (fixes #1125)
2020-11-20 22:31:35 +01:00
Mike Fährmann
ca59bd691c
[postprocessor:metadata] add 'event' and 'filename' options
2020-11-20 22:29:11 +01:00
Mike Fährmann
9c3568c397
[postprocessor:exec] add 'event' option
and remove 'final' option -- use '"event": "finalize"' instead.
2020-11-19 02:30:48 +01:00
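The 'event' option suggests that each post processor subscribes to a named event, with "finalize" replacing the old 'final' option. A minimal sketch of such event-based callbacks follows; only 'event' and "finalize" come from the commits, while the "file" default event, the `factories` mapping, and `build_hooks`/`dispatch` are hypothetical.

```python
from collections import defaultdict


def build_hooks(pp_configs, factories, default_event="file"):
    """Group post processor callbacks by the event(s) they subscribe to."""
    hooks = defaultdict(list)
    for conf in pp_configs:
        callback = factories[conf["name"]](conf)
        events = conf.get("event", default_event)
        if isinstance(events, str):
            events = [events]
        for event in events:
            hooks[event].append(callback)
    return hooks


def dispatch(hooks, event, *args):
    """Call every callback registered for 'event', in registration order."""
    for callback in hooks.get(event, ()):
        callback(*args)


# Stand-in for an 'exec' post processor that just records its command.
log = []
factories = {
    "exec": lambda conf: (lambda *args: log.append((conf["command"], args))),
}
hooks = build_hooks(
    [
        {"name": "exec", "command": "notify", "event": "finalize"},
        {"name": "exec", "command": "convert"},  # uses the default event
    ],
    factories,
)
dispatch(hooks, "file", "image.jpg")
dispatch(hooks, "finalize")
print(log)  # [('convert', ('image.jpg',)), ('notify', ())]
```

Grouping callbacks by event up front means the download loop only looks up one list per event instead of checking every post processor's options each time.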
Mike Fährmann
9fffa9c343
rework post processor callbacks
2020-11-19 02:29:06 +01:00
Mike Fährmann
d6986be8b0
Move CI to GitHub Actions
2020-11-17 19:40:45 +01:00
Mike Fährmann
f99c6031e0
apply post processor blacklists/whitelists to basecategories
(#1103)
2020-11-17 02:02:31 +01:00
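Applying post processor blacklists/whitelists to basecategories means a list entry can match either an extractor's own category or its basecategory. A minimal sketch, assuming the options are called 'whitelist' and 'blacklist' and that a whitelist takes precedence; `pp_applies` and the example category names are hypothetical.

```python
def pp_applies(pp_conf, category, basecategory=None):
    """Decide whether a post processor runs for an extractor.

    Both the extractor's own category and its basecategory are checked
    against the (assumed) 'whitelist'/'blacklist' entries.
    """
    names = {category}
    if basecategory:
        names.add(basecategory)
    whitelist = pp_conf.get("whitelist")
    if whitelist is not None:
        return not names.isdisjoint(whitelist)
    blacklist = pp_conf.get("blacklist")
    if blacklist is not None:
        return names.isdisjoint(blacklist)
    return True


# A whitelist entry for the basecategory covers every moebooru-based site.
print(pp_applies({"whitelist": ["moebooru"]}, "sakugabooru", "moebooru"))  # True
print(pp_applies({"whitelist": ["moebooru"]}, "pixiv"))  # False
```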
Mike Fährmann
1e3dd7330e
merge SharedConfigMixin functionality into Extractor
2020-11-17 00:34:07 +01:00
Mike Fährmann
ddfb4fd07a
[twitter] use 'https://twitter.com/i/api/' for logged-in users
Doesn't seem to make a difference from what I can tell,
i.e. downloaded files are the same, but the website does it.
2020-11-16 11:26:37 +01:00
Mike Fährmann
42ccae53c4
[mangadex] switch to API v2
https://mangadex.org/api/v2/
https://mangadex.org/thread/351011
2020-11-16 11:05:17 +01:00
Mike Fährmann
ca44111726
[flickr] update
- ensure every photo has an 'owner' (#828)
- change default directories to a more consistent schema
- create directory for each photo
2020-11-15 10:44:29 +01:00
Mike Fährmann
9b1bd09454
change 'extension-map' default
Replace all JPEG filename extensions with 'jpg'.
2020-11-14 22:40:31 +01:00
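The new 'extension-map' default replaces all JPEG filename extensions with 'jpg'. A lookup table is a natural way to express this; the exact set of JPEG spellings below is an assumption, and `map_extension` is a hypothetical helper.

```python
# Assumed set of JPEG spellings; the commit only says they all map to 'jpg'.
DEFAULT_EXTENSION_MAP = {
    "jpeg": "jpg",
    "jpe": "jpg",
    "jfif": "jpg",
    "jif": "jpg",
    "jfi": "jpg",
}


def map_extension(extension, extension_map=None):
    """Return the replacement for 'extension', or the extension unchanged."""
    if extension_map is None:
        extension_map = DEFAULT_EXTENSION_MAP
    return extension_map.get(extension.lower(), extension)


print(map_extension("jpeg"))                   # jpg
print(map_extension("png"))                    # png
print(map_extension("webm", {"webm": "mkv"}))  # mkv
```

In this sketch a user-supplied map replaces the default entirely rather than being merged with it.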
Mike Fährmann
e5438b8a29
release version 1.15.3
2020-11-13 15:50:05 +01:00
Mike Fährmann
de0c57886d
[twitter] add 'list-members' extractor (closes #1096)
2020-11-13 06:47:45 +01:00
Mike Fährmann
904ba08568
[gfycat] fix default filename format
2020-11-13 06:37:21 +01:00
Mike Fährmann
a46561bc16
[500px] update query hashes
2020-11-13 06:36:11 +01:00
Mike Fährmann
2e3a0dff21
[8kun] fix file URLs of older posts (fixes #1101)
2020-11-07 23:10:37 +01:00
Mike Fährmann
00825cddf5
[hentaifoundry] use scheme from input URL (fixes #1095)
Let the user choose between http and https,
instead of always forcing https.
2020-11-07 22:40:02 +01:00
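Taking the scheme from the input URL instead of always forcing https can be sketched with a URL pattern whose scheme group is optional. The pattern, the fixed 'www.' host, the https fallback for scheme-less input, and `root_from_url` are all assumptions for illustration.

```python
import re

# Hypothetical pattern; the scheme group is optional so bare URLs also match.
PATTERN = re.compile(r"(https?://)?(?:www\.)?hentai-foundry\.com(?:/.*)?$")


def root_from_url(url):
    """Build the site root, reusing the scheme of the input URL."""
    match = PATTERN.match(url)
    if match is None:
        raise ValueError(f"unsupported URL: {url}")
    scheme = match.group(1) or "https://"  # assumed default when none is given
    return scheme + "www.hentai-foundry.com"


print(root_from_url("http://www.hentai-foundry.com/user/Example/profile"))
# http://www.hentai-foundry.com
```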
Mike Fährmann
8a98d3549a
[weasyl] create directory for each favorite submission
(#1032)
2020-11-07 18:47:55 +01:00
Mike Fährmann
91db8df1c7
[deviantart] add 'index_base36' metadata field (closes #1099)
This is the same ID as found in 'filename' without the 'd' in front,
which is just 'index' encoded in base36.
2020-11-07 18:39:50 +01:00
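Since 'index_base36' is just 'index' encoded in base 36, it can be computed with repeated division by 36. The sketch below assumes lowercase digits; `to_base36` is a hypothetical helper, and the result can be checked against Python's built-in `int(value, 36)`.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # 0-9 then a-z: 36 symbols


def to_base36(number):
    """Encode a non-negative integer in base 36 using lowercase digits."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    out = []
    while number:
        number, remainder = divmod(number, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


print(to_base36(35))    # z
print(to_base36(36))    # 10
print(to_base36(1295))  # zz
```

Following the commit's description, the ID in a DeviantArt 'filename' would then correspond to `"d" + to_base36(index)`.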
Mike Fährmann
b9bfa4c675
update extractor test results
2020-11-07 02:03:22 +01:00
Mike Fährmann
1b5b789401
[mangoxo] fix metadata extraction
2020-11-07 01:35:29 +01:00
Mike Fährmann
41d4968866
[twitter] add 'list' extractor (#1096)
2020-11-05 22:55:38 +01:00
Mike Fährmann
5d10520f4c
[twitter] update GraphQL endpoint & fix width/height entries
2020-11-05 22:53:29 +01:00
Mike Fährmann
bc7b1d91bc
fix rST markup in configuration.rst
[ci skip]
2020-11-02 15:32:29 +01:00
Mike Fährmann
9b2e5f72d6
[exhentai] update image URL parsing (#1094)
2020-11-02 15:28:54 +01:00
Mike Fährmann
e3480bc8de
implement 'extension-map' option (#318)
2020-11-02 15:27:07 +01:00
Mike Fährmann
98a4d86a01
[sankakucomplex] extract videos and embeds (closes #308)
2020-10-30 01:21:11 +01:00
Mike Fährmann
c3f01dc4e6
implement 'util.unique()'
2020-10-29 23:33:41 +01:00
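The commit only names 'util.unique()' and does not describe its signature or behaviour. A common shape for such a helper is an order-preserving de-duplicating generator, sketched below as an assumption rather than the project's actual code.

```python
def unique(iterable):
    """Yield each element once, in first-seen order (elements must be hashable)."""
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element


print(list(unique(["b", "a", "b", "c", "a"])))  # ['b', 'a', 'c']
```

Unlike `set(iterable)`, this keeps the original order, and as a generator it can filter a stream of URLs lazily.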
Mike Fährmann
558cde139c
[paheal] fix extraction (fixes #1088)
2020-10-28 21:51:31 +01:00
Mike Fährmann
0211af7ca8
[hentaifoundry] update 'YII_CSRF_TOKEN' cookie handling
(fixes #1083)
2020-10-28 21:49:03 +01:00
Mike Fährmann
d83b95fd28
[postprocessor:metadata] accept a string-list for 'content-format'
(closes #1080)
2020-10-27 20:09:58 +01:00
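Accepting a string-list for 'content-format' allows a multi-line template to be written as one list entry per line. The sketch below assumes list items are joined with newlines; `render_content` is hypothetical, and `str.format_map` stands in for the project's own formatter.

```python
def render_content(content_format, metadata):
    """Render metadata with a 'content-format' given as a string or list.

    A list of strings is joined with newlines into a single template
    (an assumption made for this sketch).
    """
    if isinstance(content_format, (list, tuple)):
        content_format = "\n".join(content_format)
    # str.format_map stands in for the project's own formatter here.
    return content_format.format_map(metadata)


meta = {"title": "Sunset", "artist": "someone"}
print(render_content(["{title}", "by {artist}"], meta))
```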