Mike Fährmann
5d4494b15f
add "ascii" as a special 'path-restrict' value
2021-01-09 02:41:20 +01:00
Mike Fährmann
5818c928c4
refactor 'path-restrict' parsing
2021-01-09 02:33:42 +01:00
Mike Fährmann
aac00a2024
add 'd' conversion for format strings
...
to convert a timestamp to a formattable 'datetime' object.
For example '{created_at!d:%Y-%m-%d}'
transforms the timestamp in 'created_at' into a 'datetime' object
and then formats its content using '%Y-%m-%d' as template.
1262304000 -> datetime(2010, 1, 1) -> "2010-01-01"
2021-01-09 01:58:44 +01:00
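A minimal sketch of what the 'd' conversion in aac00a2024 amounts to, using only the standard library. The helper name 'convert_timestamp' is hypothetical, and treating the timestamp as UTC is an assumption; the actual implementation may differ.

```python
from datetime import datetime, timezone


def convert_timestamp(value):
    """Turn a Unix timestamp (int or numeric string) into a UTC datetime."""
    return datetime.fromtimestamp(int(value), timezone.utc)


created_at = 1262304000
date = convert_timestamp(created_at)

# The format spec after ':' is handed to datetime.__format__,
# which treats it as a strftime() template.
formatted = format(date, "%Y-%m-%d")
print(formatted)
```

This mirrors the 1262304000 -> datetime(2010, 1, 1) -> "2010-01-01" chain from the commit message.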
Mike Fährmann
20bd9cd296
[wikiart] add extractor for single paintings ( closes #1233 )
...
There is no API endpoint for single paintings from what I can tell,
so this uses the site's search.
2021-01-08 23:19:00 +01:00
Mike Fährmann
e2d4ca4955
[deviantart] improve '--range' for favorites ( closes #1226 )
2021-01-08 22:57:35 +01:00
Mike Fährmann
56ccb9951a
[gfycat] add 'date' metadata field ( #1138 )
2021-01-08 17:45:09 +01:00
Mike Fährmann
f2b83b8578
[gfycat] convert IDs to lowercase
...
Redgifs expects all IDs and names to be lowercase
and throws a 404 if an ID contains an uppercase letter.
Gfycat on the other hand doesn't care about case,
so it's fine to just convert all IDs.
(#1138 )
2021-01-08 17:41:45 +01:00
Mike Fährmann
b3bc646236
[redgifs] match embedded URLs
...
https://redgifs.com/ifr/<ID>
2021-01-08 16:01:01 +01:00
Mike Fährmann
98e0d21383
[instagram] categorize single highlight URLs as 'highlights'
...
They were categorized as 'stories' before.
(fixes #1222 )
2021-01-08 15:56:27 +01:00
Mike Fährmann
1c9435e0df
add '-G' command-line option ( #1217 )
...
A "stronger" version of '-g', resolving all intermediate URLs.
2021-01-07 19:07:05 +01:00
Mike Fährmann
fa8ee6eac4
[derpibooru] add search and gallery extractors ( #862 )
2021-01-07 18:05:32 +01:00
Mike Fährmann
3759d0cb42
[redgifs] fix search results
...
The metadata for Redgifs search results got stripped down to a bare
minimum, including download URLs. (Clicking on search results on the
website itself is broken as well)
As a workaround, we make an extra call to '/v1/gfycats/<ID>'
for each search result entry to fetch the missing data.
2021-01-06 18:16:06 +01:00
Mike Fährmann
8a88025dc4
[pinterest] support generic user URLs ( #1205 )
...
i.e. https://www.pinterest.com/USERNAME
also renames 'BoardsExtractor' to 'UserExtractor'
2021-01-02 02:36:53 +01:00
Mike Fährmann
56b460dcea
[foolfuuka] add 'search' extractors ( #1174 )
2021-01-02 02:34:06 +01:00
Mike Fährmann
fb64183d53
[foolfuuka] add 'board' extractors ( closes #1044 )
2021-01-01 19:33:35 +01:00
Mike Fährmann
0594821fcd
[downloader:http] add MIME type and signature for .ico files
...
(closes #1211 )
2021-01-01 16:07:33 +01:00
Mike Fährmann
b0beed7a06
[sankaku] add support for book searches ( closes #1204 )
2020-12-29 17:36:37 +01:00
Mike Fährmann
6cdbab07b5
[pinterest] add support for getting all boards of a user
...
(#1205 )
2020-12-29 16:57:03 +01:00
Mike Fährmann
25074aec47
[twitter] fetch media from pinned tweets ( #1203 )
2020-12-29 16:27:43 +01:00
Mike Fährmann
2475176d99
[twitter] fetch tweets from 'homeConversation' entries
...
When logged in, some entries returned by Twitter's API are so-called
'homeConversation's (they would be regular tweet entries otherwise).
Those weren't picked up before and resulted in missing files compared
to accessing a timeline as guest.
('/media' timelines and search results were not affected)
2020-12-29 00:42:46 +01:00
Mike Fährmann
3af9350648
[twitter] update API calls
...
- use 'https://twitter.com/i/api' for all requests
except '/guest/activate.json'
- update (default) URL parameters
- update GraphQL endpoints
2020-12-28 22:05:48 +01:00
Mike Fährmann
b656b829db
[twitter] fix login with username & password
...
It is no longer possible to get an 'authenticity_token' from Twitter's
JavaScript-free login form, which got disabled a few days ago.
Generating a random 16 byte hex string client-side and sending that as
a cookie alongside the regular login form works just as well.
2020-12-28 16:10:19 +01:00
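A rough sketch of the token generation described in b656b829db. Reading "16 byte hex string" as 16 random bytes rendered as 32 hex characters is an assumption, as is the function name 'generate_token'; the project's own helper may be written differently.

```python
import secrets


def generate_token(size=16):
    """Return 'size' random bytes as a lowercase hex string."""
    return secrets.token_hex(size)


token = generate_token()
print(token)
```

The resulting value would then be sent as a cookie alongside the regular login form data.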
Mike Fährmann
d1903589a5
release version 1.16.1
2020-12-27 18:28:33 +01:00
Mike Fährmann
912eea29bc
update extractor test results
2020-12-27 17:41:08 +01:00
Mike Fährmann
47a7a51944
[sankaku] fix 'invalid_token' detection
2020-12-27 02:31:01 +01:00
Mike Fährmann
ba5df84f7e
[keenspot] improve redirect handling
...
Before it would use http:// for all requests and
get a redirect to a https:// version if those are supported.
Now the redirect only happens once during the first request.
2020-12-26 21:38:40 +01:00
Mike Fährmann
d781e6ac44
[e621] return pool posts in order ( closes #1195 )
...
… and add a 'num' enumeration index.
A bit more code than the PR version, but it prints some helpful messages
and doesn't call 'metadata()' twice.
2020-12-26 19:00:29 +01:00
Mike Fährmann
e7d446a8f7
[danbooru] slight code refactoring
2020-12-25 22:06:25 +01:00
Mike Fährmann
e41e2be2f9
[booru] split '_prepare_post()'
2020-12-24 01:13:54 +01:00
Mike Fährmann
53222445d5
[hentaicafe] simplify default filenames
2020-12-23 01:03:08 +01:00
Mike Fährmann
712c792fbe
[hentaicafe] prefer title of /hc.fyi/ pages ( closes #1106 )
2020-12-23 01:01:15 +01:00
Mike Fährmann
2c4d4a75db
[mangadex] respect 'chapter-reverse' settings ( closes #1194 )
...
The extractor in question doesn't inherit from MangaExtractor
and therefore didn't do this automatically.
2020-12-22 15:08:10 +01:00
Mike Fährmann
3bd08acc8f
[pixiv] output debug message on failed login attempt
...
(#1192 )
2020-12-22 14:59:31 +01:00
Mike Fährmann
b58e605dc7
raise error when required username or password are missing
...
do not try to login as 'None' (#1192 )
2020-12-22 14:40:18 +01:00
Mike Fährmann
b233531aaa
[sankaku] use '/posts' endpoint for single posts
2020-12-22 02:44:40 +01:00
Mike Fährmann
459a0af4f8
[sankaku] add support for sankaku.app URLs ( closes #1193 )
2020-12-22 01:57:53 +01:00
Mike Fährmann
371e9ca6df
[pinterest] implement video support ( closes #1189 )
2020-12-21 16:09:06 +01:00
Mike Fährmann
537742c0ee
[sankaku] normalize 'created_at' metadata ( closes #1190 )
2020-12-21 02:06:29 +01:00
Mike Fährmann
ae6748996a
[pornhub] update tests
2020-12-21 02:06:28 +01:00
Mike Fährmann
bf629a2818
[instagram] add 'include' option ( closes #1180 )
...
Split the functionality of the old 'user' extractor into separate
'posts' and 'highlights' extractors, which respond to virtual URLs
('/<user>/posts' and '/<user>/highlights')
2020-12-21 02:06:28 +01:00
Mike Fährmann
78061658ea
[booru] reduce exceptions caught during _prepare_post()
...
don't catch HttpErrors etc.
2020-12-21 02:05:59 +01:00
Mike Fährmann
212ae0c399
[mangapanda] remove module
...
site now redirects to mangareader.net
2020-12-20 17:42:15 +01:00
Mike Fährmann
337b118e25
[instagram] warn about private profiles ( #1187 )
2020-12-19 22:32:28 +01:00
Mike Fährmann
e8c64dd961
[postprocessor:exec] do not auto-add '{}' to command ( #1185 )
...
This was initially done to mimic youtube-dl's behavior and
implementation of --exec, and it seemed reasonable at the time.
2020-12-19 20:53:46 +01:00
Mike Fährmann
0a3bbc9c63
[postprocessor:exec] update output
2020-12-19 20:36:39 +01:00
Mike Fährmann
511d8d3fa3
increase SQLite connection timeouts ( #1173 )
2020-12-19 20:15:07 +01:00
Mike Fährmann
465015f75a
[sankaku] reimplement login support ( #1176 , #1182 )
2020-12-17 16:12:59 +01:00
Mike Fährmann
8d2e4e5f13
[booru] improve error handling
...
e.g. for posts without a valid 'file_url' (#1176 )
2020-12-17 01:16:45 +01:00
Mike Fährmann
1f9121fecb
release version 1.16.0
2020-12-12 23:08:25 +01:00
Mike Fährmann
1d753542c2
[hentainexus] fix extraction ( fixes #1166 )
2020-12-12 20:30:51 +01:00
Mike Fährmann
b6f1fe59cb
add deprecation warnings for exec.final and metadata.bypost
2020-12-12 16:58:23 +01:00
Mike Fährmann
476d563ec2
[downloader:http] add MIME type and signature for .swf files
2020-12-11 14:21:04 +01:00
Mike Fährmann
a00b60fbe7
[twitter] update 'x-csrf-token' header ( fixes #1170 )
...
Twitter started using a bigger (80 instead of 16 bytes) CSRF token for
logged in users, and expects those to be used as 'x-csrf-token' header
when sent via 'ct0' cookie.
Generating an 80 byte token ourselves doesn't work, and Twitter will
still insist on using its own.
2020-12-11 13:46:58 +01:00
Mike Fährmann
b88c97b873
[instagram] add 'cursor' option ( #1149 )
...
To enable at least 'some' way to continue downloading from the middle
of a user profile listing.
2020-12-11 13:46:58 +01:00
Mike Fährmann
0d406c8daf
[common] restrict values used in 'generate_extractors()'
2020-12-11 13:46:47 +01:00
Mike Fährmann
fe0265c7a5
[downloader.http] small improvements to file signature list
...
- specify multiple entries for gif, mp3, zip
- add entries for pdf
2020-12-08 21:20:18 +01:00
Mike Fährmann
b2c55f0a72
[sankaku] remove login support
...
The old login method for 'https://chan.sankakucomplex.com/user/login'
and the cookies it produces have no effect on the results from
'beta.sankakucomplex.com'.
2020-12-08 21:05:47 +01:00
Mike Fährmann
7f3d811d7b
[moebooru] inherit from BooruExtractor
2020-12-08 18:34:56 +01:00
Mike Fährmann
a3a863fc13
[booru] add generalized extractors for *booru sites
...
similar to cc15fbe7
2020-12-08 18:34:30 +01:00
Mike Fährmann
5f23441e12
[piczel] update API URLs
2020-12-07 15:56:32 +01:00
Mike Fährmann
47114339a2
[webtoons] update 'ageGate' cookie
2020-12-07 14:56:32 +01:00
Mike Fährmann
4225f12783
[nozomi] handle empty 'date' fields ( fixes #1163 )
2020-12-07 00:08:53 +01:00
Mike Fährmann
2b93515ee0
[instagram] reimplement support for stories ( #1149 )
2020-12-06 21:32:10 +01:00
Mike Fährmann
ecdea799dd
[sankaku] use 'beta.sankakucomplex.com' API endpoints
2020-12-05 22:08:58 +01:00
Mike Fährmann
b3ecc89a9a
[instagram] use double quotes for strings when possible
2020-12-05 19:33:42 +01:00
Mike Fährmann
76285eb60d
[instagram] reimplement support for story highlights ( #1149 )
2020-12-05 19:13:00 +01:00
Mike Fährmann
8ca7f54750
rename '_request_…' variables
...
- remove '_' at the beginning
- _request_last -> request_timestamp
2020-12-05 00:09:15 +01:00
Mike Fährmann
15a122aff3
[instagram] update 'X-IG-WWW-Claim' headers
2020-12-04 20:58:34 +01:00
Mike Fährmann
e5d81bdc7b
[mangadex] handle 'external' chapters ( closes #1154 )
2020-12-04 20:56:30 +01:00
Mike Fährmann
447488fb18
[instagram] rewrite
...
(#1113 , #1122 , #1128 , #1130 , #1149 )
Rely on the results of GraphQL queries instead of requesting data
for each post separately via '/p/<shortcode>/?__a=1'.
This might result in some missing metadata, and there might be some
issues for '/channel/' and '/saved/' URLs, but at least downloading
from the regular post listings should work without issues and without
getting users blocked/banned.
TODO: reimplement support for stories
2020-12-03 14:30:59 +01:00
Mike Fährmann
cc15fbe71a
[moebooru] add generalized extractors for moebooru sites
...
- add support for sakugabooru.com (closes #1136 )
- add support for lolibooru.moe (closes #1050 )
This allows users to dynamically add support for moebooru/myimouto
based sites by adding an entry to their config file
(like for foolslide, foolfuuka, etc)
For example:
{
    "extractor": {
        "moebooru": {
            "new-site-1": {"root": "https://site1.net"},
            "new-site-2": {"root": "https://www.site2.moe"}
        }
    }
}
2020-12-01 22:27:18 +01:00
Mike Fährmann
43120407cc
[paheal] create directory for each post ( closes #1147 )
2020-12-01 12:14:55 +01:00
Mike Fährmann
63e61a0932
[twitter] update image URL format ( #1145 )
...
use
'/<name>?format=<fmt>&name=<size>'
instead of the potentially deprecated
'/<name>.<fmt>:<size>'
but keep all of them as fallback URLs
2020-12-01 11:53:51 +01:00
Mike Fährmann
1a4b61f7eb
[downloader:http] fix issues with chunked transfer encoding
...
(fixes #1144 )
2020-11-30 01:10:45 +01:00
Mike Fährmann
536c088462
[downloader:http] improve 'adjust-extensions' ( #776 )
...
Check file headers against a list of file signatures before
downloading the whole file and writing it to disk.
The file signature check needs some improvements (*),
but it produces usable results for the most part.
(*)
- 'webp', 'wav', and others start with 'RIFF'
- 'svg' uses the same "signature" as all XML documents
- 'webm' has the same signature as 'mkv' files
- only 'mp3' files in an ID3v2 container get recognized
2020-11-29 20:55:35 +01:00
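To illustrate the signature check from 536c088462, here is a small sketch that guesses an extension from the first bytes of a file. The table is an illustrative subset and not the project's actual list, and 'guess_extension' is a hypothetical name. The RIFF branch shows one possible way to tell 'webp' and 'wav' apart, which the commit lists as an open issue.

```python
# Illustrative subset of well-known file signatures ("magic numbers").
SIGNATURES = (
    (b"\xFF\xD8\xFF", "jpg"),
    (b"\x89PNG\r\n\x1A\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF-", "pdf"),
    (b"ID3", "mp3"),  # only matches mp3 files with an ID3v2 header
)


def guess_extension(head):
    """Return a filename extension for the leading bytes, or None."""
    for signature, extension in SIGNATURES:
        if head.startswith(signature):
            return extension
    # RIFF containers share their first 4 bytes;
    # the form type at offset 8 identifies the actual format.
    if head[:4] == b"RIFF":
        form_type = head[8:12]
        if form_type == b"WEBP":
            return "webp"
        if form_type == b"WAVE":
            return "wav"
    return None


print(guess_extension(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
```

Only the first few bytes of a response are needed for this, so the check can run before the whole file is written to disk.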
Mike Fährmann
46323ae6ff
initialize 'hooks' as empty tuple
...
follow-up to 9c29fc4e
Prevent a "race" between initializing 'pathfmt' and 'hooks',
and receiving a signal in between (e.g. ctrl+c),
which would then crash in 'handle_finalize()'.
2020-11-28 18:18:49 +01:00
Mike Fährmann
9c29fc4e55
always initialize DownloadJob.hooks ( fixes #1135 )
...
and not just when any (potential) post processors are defined
2020-11-28 00:09:19 +01:00
Mike Fährmann
ae6a1d5fbc
[mangoxo] fix extraction 2
2020-11-27 13:55:30 +01:00
Mike Fährmann
f6a684bc37
[hentainexus] update data decoding procedure ( #1125 )
2020-11-25 11:26:26 +01:00
Mike Fährmann
c57a918f4a
[e621] implement delay via '_request_interval_min'
2020-11-25 00:19:32 +01:00
Mike Fährmann
93ce7466e2
[2chan] skip external links
2020-11-24 16:41:47 +01:00
Mike Fährmann
b214e89b5c
[mangoxo] fix extraction
2020-11-24 12:50:46 +01:00
Mike Fährmann
578dcf805c
[mangapanda] don't force https://
2020-11-21 20:24:37 +01:00
Mike Fährmann
102c482f5e
[reddit] skip invalid/failed gallery items ( fixes #1127 )
2020-11-21 17:34:38 +01:00
Mike Fährmann
174945d2b2
[hentainexus] fix extraction ( fixes #1125 )
2020-11-20 22:31:35 +01:00
Mike Fährmann
ca59bd691c
[postprocessor:metadata] add 'event' and 'filename' options
2020-11-20 22:29:11 +01:00
Mike Fährmann
9c3568c397
[postprocessor:exec] add 'event' option
...
and remove 'final' option -- use '"event": "finalize"' instead.
2020-11-19 02:30:48 +01:00
Mike Fährmann
9fffa9c343
rework post processor callbacks
2020-11-19 02:29:06 +01:00
Mike Fährmann
f99c6031e0
apply post processor blacklists/whitelists to basecategories
...
(#1103 )
2020-11-17 02:02:31 +01:00
Mike Fährmann
1e3dd7330e
merge SharedConfigMixin functionality into Extractor
2020-11-17 00:34:07 +01:00
Mike Fährmann
ddfb4fd07a
[twitter] use 'https://twitter.com/i/api/' for logged in users
...
Doesn't seem to make a difference from what I can tell,
i.e. downloaded files are the same, but the website does it.
2020-11-16 11:26:37 +01:00
Mike Fährmann
42ccae53c4
[mangadex] switch to API v2
...
https://mangadex.org/api/v2/
https://mangadex.org/thread/351011
2020-11-16 11:05:17 +01:00
Mike Fährmann
ca44111726
[flickr] update
...
- ensure every photo has an 'owner' (#828 )
- change default directories to a more consistent schema
- create directory for each photo
2020-11-15 10:44:29 +01:00
Mike Fährmann
9b1bd09454
change 'extension-map' default
...
Replace all JPEG filename extensions with 'jpg'.
2020-11-14 22:40:31 +01:00
Mike Fährmann
e5438b8a29
release version 1.15.3
2020-11-13 15:50:05 +01:00
Mike Fährmann
de0c57886d
[twitter] add 'list-members' extractor ( closes #1096 )
2020-11-13 06:47:45 +01:00
Mike Fährmann
904ba08568
[gfycat] fix default filename format
2020-11-13 06:37:21 +01:00
Mike Fährmann
a46561bc16
[500px] update query hashes
2020-11-13 06:36:11 +01:00
Mike Fährmann
2e3a0dff21
[8kun] fix file URLs of older posts ( fixes #1101 )
2020-11-07 23:10:37 +01:00
Mike Fährmann
00825cddf5
[hentaifoundry] use scheme from input URL ( fixes #1095 )
...
Let the user choose between http and https,
instead of always forcing https.
2020-11-07 22:40:02 +01:00
Mike Fährmann
8a98d3549a
[weasyl] create directory for each favorite submission
...
(#1032 )
2020-11-07 18:47:55 +01:00
Mike Fährmann
91db8df1c7
[deviantart] add 'index_base36' metadata field ( closes #1099 )
...
This is the same ID as found in 'filename' without the 'd' in front,
which is just 'index' encoded in base36.
2020-11-07 18:39:50 +01:00
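A short sketch of the base36 encoding that 91db8df1c7 refers to. The function name 'to_base36' and the lowercase alphabet are assumptions made for illustration; the extractor's own code may differ.

```python
import string

# Digits 0-9 followed by a-z give the 36 symbols of base36.
ALPHABET = string.digits + string.ascii_lowercase


def to_base36(number):
    """Encode a non-negative integer as a base36 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


index = 123456789
encoded = to_base36(index)
print(encoded, int(encoded, 36))
```

Python's built-in int(text, 36) performs the reverse conversion, which makes round trips easy to check.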
Mike Fährmann
b9bfa4c675
update extractor test results
2020-11-07 02:03:22 +01:00
Mike Fährmann
1b5b789401
[mangoxo] fix metadata extraction
2020-11-07 01:35:29 +01:00
Mike Fährmann
41d4968866
[twitter] add 'list' extractor ( #1096 )
2020-11-05 22:55:38 +01:00
Mike Fährmann
5d10520f4c
[twitter] update GraphQL endpoint & fix width/height entries
2020-11-05 22:53:29 +01:00
Mike Fährmann
9b2e5f72d6
[exhentai] update image URL parsing ( #1094 )
2020-11-02 15:28:54 +01:00
Mike Fährmann
e3480bc8de
implement 'extension-map' option ( #318 )
2020-11-02 15:27:07 +01:00
Mike Fährmann
98a4d86a01
[sankakucomplex] extract videos and embeds ( closes #308 )
2020-10-30 01:21:11 +01:00
Mike Fährmann
c3f01dc4e6
implement 'util.unique()'
2020-10-29 23:33:41 +01:00
Mike Fährmann
558cde139c
[paheal] fix extraction ( fixes #1088 )
2020-10-28 21:51:31 +01:00
Mike Fährmann
0211af7ca8
[hentaifoundry] update 'YII_CSRF_TOKEN' cookie handling
...
(fixes #1083 )
2020-10-28 21:49:03 +01:00
Mike Fährmann
d83b95fd28
[postprocessor:metadata] accept a string-list for 'content-format'
...
(closes #1080 )
2020-10-27 20:09:58 +01:00
Mike Fährmann
198c33ec36
also collect post processors from 'basecategory' entries
...
(fixes #1084 )
2020-10-27 19:56:48 +01:00
Mike Fährmann
350b1afe1c
speed up _list_classes() after iterating over all modules once
2020-10-26 22:18:15 +01:00
Mike Fährmann
5bcf28de93
add a 'extractor.modules' option
2020-10-25 03:05:10 +01:00
Mike Fährmann
18213dc5ba
release version 1.15.2
2020-10-24 18:57:29 +02:00
Mike Fährmann
de4a1e45c9
improve 'generate_csrf_token()'
...
no need to use hashlib.md5()
2020-10-24 02:56:40 +02:00
Mike Fährmann
b788712844
[fallenangels] fix extraction of '.5' chapters
2020-10-23 16:56:08 +02:00
Mike Fährmann
28d8541cb3
[mangafox] ensure download URLs have a scheme
2020-10-23 02:45:15 +02:00
Mike Fährmann
8e3a324c91
[mangakakalot] ignore "Go Home" buttons in chapter pages
2020-10-23 02:33:35 +02:00
Mike Fährmann
c14c5d82d6
[newgrounds] use generator for fallback URLs
2020-10-23 00:39:19 +02:00
Mike Fährmann
a09f42f6b3
improve filename_from_url() performance
...
Manually extracting the part between the last '/' and '?' instead of
relying on the standard library's 'urllib.parse.urlsplit()' increases
performance by ~400%.
urlsplit() : 3.64 secs per 1.000.000 iterations
partition(): 0.87 secs per 1.000.000 iterations
2020-10-23 00:14:06 +02:00
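The two approaches compared in a09f42f6b3 can be sketched as follows. Both function names are hypothetical, and the partition-based variant assumes the URL carries no '#fragment' without a preceding '?'.

```python
from urllib.parse import urlsplit


def filename_via_urlsplit(url):
    """Parse the full URL, then take the last path segment."""
    return urlsplit(url).path.rpartition("/")[2]


def filename_via_partition(url):
    """Cut at the first '?', then take everything after the last '/'."""
    return url.partition("?")[0].rpartition("/")[2]


url = "https://example.org/gallery/image_01.jpg?size=large"
name_a = filename_via_urlsplit(url)
name_b = filename_via_partition(url)
print(name_a, name_b)
```

The second variant skips all URL parsing and only performs two string splits, which is where the speedup comes from.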
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt , URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
Mike Fährmann
1686dc1757
[twitter] support media from Cards ( #1005 , #937 )
...
Can be enabled with 'extractor.twitter.cards', but for now disabled by
default because cards can redirect to rather large videos from YouTube
or Twitch.
2020-10-22 21:33:53 +02:00
Mike Fährmann
ffd38215a4
[hitomi] fix image URLs and URL pattern
...
- non-webp files are now hosted on [a-c]b.hitomi.la
- removed ampersand from invalid slug characters
2020-10-22 15:15:34 +02:00
Mike Fährmann
286718950c
[mangahere] ensure download URLs have a scheme ( fixes #1070 )
2020-10-17 22:43:59 +02:00
Mike Fährmann
76dfa11a65
[reddit] add 'date' metadata field ( closes #1068 )
2020-10-16 15:48:04 +02:00
Mike Fährmann
3f2ba629ea
[newgrounds] provide fallback URLs for video downloads ( #1042 )
2020-10-16 01:16:12 +02:00
Mike Fährmann
a3ca2f6080
update fallback URL handling
...
remove Message.Urllist and use a '_fallback' field inside a kwdict
2020-10-16 01:09:55 +02:00
Mike Fährmann
43dab3a228
[mangadex] unescape more metadata fields ( fixes #1066 )
...
like 'manga', 'author', 'artist', etc.
2020-10-16 00:41:15 +02:00
Mike Fährmann
ec61696316
add 't' format string conversion ( closes #1065 )
...
to Trim whitespace from the beginning and end of strings.
Example: '{field!t}' becomes 'foo' for 'field' == " \nfoo\t\r"
2020-10-16 00:37:22 +02:00
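As a sketch of how a conversion such as 't' from ec61696316 might be dispatched, the mapping below pairs conversion characters with plain string methods. The table and the name 'apply_conversion' are assumptions for illustration, not the project's actual formatter.

```python
# Hypothetical conversion table: 'l' lowercases, 't' trims whitespace.
CONVERSIONS = {
    "l": str.lower,
    "t": str.strip,
}


def apply_conversion(value, conversion):
    """Apply the conversion named by a single character to 'value'."""
    return CONVERSIONS[conversion](value)


field = " \nfoo\t\r"
trimmed = apply_conversion(field, "t")
print(repr(trimmed))
```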
Mike Fährmann
5565025221
[xhamster] fix user profile extraction
2020-10-15 18:57:35 +02:00
Mike Fährmann
07432d6262
[seiga] fix flake8 and cookie test ( #1063 )
2020-10-15 15:37:58 +02:00
Mike Fährmann
b8daabc3ca
[pinterest] implement login support ( closes #1055 )
...
being logged in allows access to secret/protected boards
2020-10-15 15:14:18 +02:00
Mike Fährmann
1b1cf01d0d
add a general 'generate_csrf_token()' function
2020-10-15 15:14:18 +02:00
Mike Fährmann
7a0ba370d1
[gelbooru] rewrite mp4 video URLs ( fixes #1048 )
2020-10-15 15:14:18 +02:00
Mike Fährmann
6491db3eaf
[blogger] handle URLs with specified width/height ( closes #1061 )
...
get highest quality for images with
/wXXX-hXXX/ instead of the usual /sXXX/
2020-10-15 15:14:18 +02:00
Mike Fährmann
783e0af26d
[hentaifoundry] update and simplify
2020-10-15 15:14:17 +02:00
Mike Fährmann
5b844a72b7
[newgrounds] handle embeds without scheme ( #1033 )
2020-10-15 15:13:54 +02:00
kurumigi
7e0e872f4f
[seiga] Add metadata for single image downloads ( #1063 )
...
* [seiga] Support image metadata.
* [seiga] Update test data.
* [seiga] Fix cookie check.
* [test_cookies] [seiga] Fit test_cookies.py to the last commit.
2020-10-15 15:13:27 +02:00
Zanny
3ec60e894a
[weasyl] api-key authentication ( #1057 )
...
* [weasyl] support api keys
* [weasyl] document api-key authentication
* [weasyl] usernames can contain ~
2020-10-15 15:12:09 +02:00
Mike Fährmann
35056a07d1
release version 1.15.1
2020-10-11 18:44:46 +02:00
Mike Fährmann
844793847c
update extractor test results
2020-10-11 18:15:41 +02:00
Mike Fährmann
ddd6840509
[behance] fix 'collection' extraction
2020-10-11 18:15:41 +02:00
Mike Fährmann
c5e3971b18
[newgrounds] extract image embeds ( closes #1033 )
2020-10-11 18:15:40 +02:00
dawidsowa
43b156fb40
[reactor] match URLs without subdomain ( #1053 )
2020-10-11 18:15:06 +02:00
Mike Fährmann
fd20093c96
allow blacklist/whitelist to be empty lists/strings ( #1051 )
2020-10-08 14:55:21 +02:00
Mike Fährmann
3ebb174f2c
add missing extractor info when spawning new ones ( fixes #1051 )
...
Not having this information causes the blacklist/whitelist logic to
trigger and prevents things from functioning as intended when using
default settings.
Fixes issues for 8muses, deviantart, exhentai, and mangoxo.
2020-10-08 14:34:53 +02:00
Mike Fährmann
f9c1684af7
[newgrounds] restore original video URLs ( #1042 )
2020-10-07 22:53:53 +02:00
Mike Fährmann
73373c06ec
[weibo] handle posts with more than 9 images ( closes #926 )
...
Responses from '/api/container/getIndex' don't list more than
9 images per 'status' object, but the embedded JSON from a
'/detail/<ID>' page does.
2020-10-06 18:16:08 +02:00
Mike Fährmann
dd1e545597
[hentaifoundry] rename GalleryExtractor to PicturesExtractor
2020-10-04 22:53:23 +02:00
Mike Fährmann
c874071f5a
[kissmanga] remove module
2020-10-04 22:46:41 +02:00
Mike Fährmann
93e04bf9a9
[500px] update query hashes
2020-10-03 19:25:28 +02:00
Mike Fährmann
844502cad5
update extractor test results
2020-10-03 19:24:19 +02:00
Mike Fährmann
fad7748b6b
[xvideos] fix 'title' extraction
2020-10-01 22:04:14 +02:00
Mike Fährmann
5b927c15df
[newgrounds] fix video extraction ( closes #1042 )
2020-10-01 20:14:16 +02:00
Mike Fährmann
bdc6c8f074
improve message for 'oauth:deviantart' etc ( closes #989 )
2020-09-29 21:25:24 +02:00
Mike Fährmann
430b6d6e2e
[twitter] extend 'retweets' option ( closes #1026 )
...
Setting 'retweets' to '"original"' will use metadata from the
original retweeted Tweets, and not from the Retweet entry.
2020-09-28 23:03:35 +02:00
Mike Fährmann
b9bdd2c564
[hentaifoundry] add support for stories ( closes #734 )
2020-09-27 02:27:40 +02:00
Mike Fährmann
9a9d1924d8
[hentaicafe] add 'manga_id' metadata field ( closes #1036 )
...
This field is only available when using a non-foolslide URL
like '/hc.fyi/9874' or '/hazuki-yuuto-summer-blues/'
2020-09-26 14:34:48 +02:00
Mike Fährmann
cc4ac80302
[weasyl] add 'favorite' extractor ( #1032 )
2020-09-26 13:09:03 +02:00
Mike Fährmann
e9cc719497
[weasyl] update and simplify
...
- simplify 'pattern' regexps
- parse 'posted_at' as 'date'
- use unaltered 'title' ({title!l:R /_/} to lowercase and replace spaces)
2020-09-26 02:10:45 +02:00
Mike Fährmann
6514312126
[nijie] add 'include' option ( closes #1018 )
2020-09-25 18:18:35 +02:00
Mike Fährmann
0d43456323
[hentaifoundry] add 'include' option
2020-09-25 18:18:03 +02:00
Zanny
ebb7737b9b
Weasyl Extractor ( #977 )
...
* weasyl extractor
* @kattjevfel suggested changes
* @mikf changes
2020-09-25 15:18:21 +02:00
Mike Fährmann
d5fa716d89
fix crash when using 'skip=false' and archive ( fixes #1023 )
...
Separating the archive check from pathfmt.exists() in b5243297
had some unintended side effects.
It is also not possible to monkey-patch a dunder method like
__contains__ because of the special method lookup that gets
performed for them.
2020-09-23 19:07:40 +02:00
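The special method lookup mentioned in d5fa716d89 can be demonstrated with a few lines. The 'Archive' class here is a stand-in written for this example and not the project's archive class.

```python
class Archive:
    def __contains__(self, item):
        return False


archive = Archive()

# Assigning to the instance does not affect the 'in' operator:
# implicit special method lookup goes through the type, not the instance.
archive.__contains__ = lambda item: True

via_operator = "entry" in archive               # uses Archive.__contains__
via_attribute = archive.__contains__("entry")   # uses the instance attribute
print(via_operator, via_attribute)
```

Patching the class (or using a subclass) would change the operator's behavior, but that affects every instance.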
Mike Fährmann
aeb0d32333
[twitter] improve twitpic extraction ( fixes #1019 )
...
- ignore twitpic.com/photos/… URLs
- ignore empty image URLs
2020-09-22 22:22:35 +02:00
Mike Fährmann
2184ec5d78
release version 1.15.0
2020-09-20 22:06:46 +02:00
Mike Fährmann
7cd383c0f9
update extractor test results
2020-09-20 21:54:39 +02:00
Mike Fährmann
1e313d5b84
implement 'sleep-request' option
2020-09-20 20:28:17 +02:00
Mike Fährmann
65744a7a31
use alternative for all falsey values in format strings
...
… and not just None (#525 )
It would be better to consistently use None for all non-existent
fields and/or fields without a valid value, but this is a good
enough workaround for now.
2020-09-19 22:02:47 +02:00
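The behavior change in 65744a7a31 can be sketched by contrasting a None-only check with a truthiness check. Both helper names and the metadata dict are made up for this example.

```python
def first_not_none(kwdict, keys, default="None"):
    """Old behavior: only skip fields that are missing or None."""
    for key in keys:
        value = kwdict.get(key)
        if value is not None:
            return value
    return default


def first_truthy(kwdict, keys, default="None"):
    """New behavior: skip every falsey value ('', 0, [], None, ...)."""
    for key in keys:
        value = kwdict.get(key)
        if value:
            return value
    return default


metadata = {"title": "", "filename": "abc"}
old = first_not_none(metadata, ("title", "filename"))
new = first_truthy(metadata, ("title", "filename"))
print(repr(old), repr(new))
```

With an empty 'title', the old check returns the empty string while the new one falls through to 'filename'.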
Mike Fährmann
c43b3894be
[myhentaigallery] update and fix extraction ( #1001 )
...
- extract more metadata
- match "/show/" URLs
- complete test results
- fix missing images for lines starting with " <img"
- fix missing comma in supportedsites.py
2020-09-17 18:14:23 +02:00
choeronline
05b9ac8d37
[myhentaigallery] add extractor ( #1001 )
...
* adds support for myhentaigallery
* fixes linting issues in myhentaigallery extractor
2020-09-17 17:32:54 +02:00
Mike Fährmann
2626629117
[danbooru] handle posts without 'id' ( fixes #1004 )
2020-09-16 21:35:27 +02:00
Mike Fährmann
cc1fb0b4ea
[500px] update query hash
2020-09-16 01:26:31 +02:00
Mike Fährmann
da87a5fb7e
[exhentai] fix accessing config before main constructor
...
bug introduced with 055c32e0
Making 'Extractor.config()' quite a bit faster is worth the "cost"
of having to set _cfgpath in exhentai constructors, I think.
2020-09-15 18:09:50 +02:00
Mike Fährmann
f5b7ae01c1
update extractor test results
2020-09-15 18:07:08 +02:00
Mike Fährmann
136df52d1f
[deviantart] support watchers-only/paid deviations ( #995 )
2020-09-15 16:03:46 +02:00
Mike Fährmann
055c32e0f7
precompute extractor config paths
2020-09-14 22:06:54 +02:00
Mike Fährmann
231dd4c800
accumulate postprocessor objects ( #994 )
...
Instead of one 'postprocessors' setting overwriting all others lower
in the hierarchy, all postprocessors along the config path will now
get collected into one big list.
For example '--mtime-from-date' will therefore no longer cause
other postprocessor settings in a config file to get ignored.
2020-09-14 21:51:55 +02:00
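A rough sketch of the accumulation described in 231dd4c800, assuming the config is a nested dict and the config path is a sequence of keys. The function signature, the example config, and the order of the collected entries are all assumptions; the real 'config.accumulate()' may behave differently.

```python
def accumulate(config, path, key):
    """Collect the list stored under 'key' at every level along 'path'."""
    result = []
    node = config
    steps = list(path)
    while True:
        value = node.get(key)
        if value:
            result.extend(value)
        if not steps:
            break
        node = node.get(steps.pop(0))
        if not isinstance(node, dict):
            break  # path does not exist in this config
    return result


config = {
    "postprocessors": [{"name": "mtime"}],
    "extractor": {
        "postprocessors": [{"name": "zip"}],
        "twitter": {
            "postprocessors": [{"name": "exec"}],
        },
    },
}

collected = accumulate(config, ("extractor", "twitter"), "postprocessors")
names = [pp["name"] for pp in collected]
print(names)
```

Every level contributes its entries, so a setting at one level no longer hides the ones defined elsewhere along the path.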
Mike Fährmann
392d022b04
implement 'config.accumulate()' ( #994 )
2020-09-14 21:13:08 +02:00
Mike Fährmann
3afd362e2e
add 'sleep-extractor' option ( closes #964 )
...
(would have been nice if this were possible without code duplication)
2020-09-12 21:04:47 +02:00
Mike Fährmann
3108e85b89
[worldthree] remove extractors
...
http://www.slide.world-three.org/ hasn't been accessible for a long time.
2020-09-11 18:12:57 +02:00
Mike Fährmann
8fed3eb8cb
[jaiminisbox] remove extractors
...
https://jaiminisbox.com/post.html
2020-09-11 18:09:35 +02:00
Mike Fährmann
dcf3ad7eef
[furaffinity] update download URL extraction ( fixes #988 )
...
support the new 'd2.facdn.net' subdomain
2020-09-11 13:23:57 +02:00
Mike Fährmann
3918b69677
remove 'extractor.blacklist' context manager
2020-09-11 13:17:35 +02:00
Mike Fährmann
c78aa17506
add general 'blacklist' and 'whitelist' options ( #492 , #844 )
2020-09-11 13:17:12 +02:00
Mike Fährmann
abda352a5b
add '--no-skip' command-line option ( closes #986 )
2020-09-11 01:23:39 +02:00
Mike Fährmann
5912727b88
support format string replacement fields in archive paths
...
(closes #985 )
2020-09-10 22:09:30 +02:00
Mike Fährmann
2b8d57f0ab
[twitter] support '/intent/user?user_id=…' URLs ( #980 )
2020-09-08 23:17:50 +02:00
Mike Fährmann
a3b473bd2f
[twitter] support specifying users by ID ( #980 )
...
by using 'id:…' as their screen name, i.e.
https://www.twitter.com/id:2976459548/media
instead of
https://twitter.com/supernaturepics/media
The user ID can, for example, be obtained from the output of
$ gallery-dl -j --range 1 https://twitter.com/<screen-name>
2020-09-08 22:56:52 +02:00
Mike Fährmann
a0d916ed41
[exhentai] update wait time before original image download ( #978 )
...
depend on 'wait-max', don't use a hard-coded value
2020-09-07 23:48:28 +02:00
Mike Fährmann
f6fd449b59
reduce wait time growth rate from exponential to linear
...
Waiting for 2**N seconds after each error grows too fast.
Simply waiting N seconds seems far more reasonable.
2020-09-06 22:38:25 +02:00
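For a sense of scale in f6fd449b59, the snippet below compares both growth rates over the first few retries. The function names are hypothetical and the retry count of 6 is arbitrary.

```python
def wait_exponential(tries):
    """Old scheme: 2**N seconds after the N-th error."""
    return 2 ** tries


def wait_linear(tries):
    """New scheme: N seconds after the N-th error."""
    return tries


attempts = range(1, 7)
exponential = [wait_exponential(n) for n in attempts]
linear = [wait_linear(n) for n in attempts]
print(exponential, sum(exponential))
print(linear, sum(linear))
```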
Mike Fährmann
bc48514d84
[aryion] get post ID via gallery-item ( fixes #981 , closes #982 )
...
this even works when fetching post IDs from '/latest.php?id='
2020-09-06 22:17:23 +02:00
Mike Fährmann
799ca07fc8
[imgur] update
...
- fix image/album detection for galleries
- use new API endpoints for image/album data
2020-09-06 21:11:32 +02:00
Mike Fährmann
b5243297ff
write skipped files to archive ( closes #550 )
2020-09-03 18:37:38 +02:00
Mike Fährmann
ac3036ef56
add 'filesize-min' and 'filesize-max' options ( closes #780 )
2020-09-03 18:21:04 +02:00
Mike Fährmann
7876a03ece
[tumblr] create directories for each post ( fixes #965 )
...
This changes the identifiers for directory format string fields.
Everything blog related is now inside a 'blog' object
and not at the "base level" anymore.
E.g. '{name}' for directories is now '{blog[name]}'
(or '{blog_name}', since that is also available)
2020-08-31 21:58:20 +02:00
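The field change in 7876a03ece can be tried out with Python's built-in string formatting, which supports the same '{blog[name]}' item access. The metadata dict below is invented for this example, and the project's own formatter may handle more cases than str.format_map() does.

```python
# Invented metadata; blog-related values now live inside a 'blog' object.
kwdict = {
    "id": 12345,
    "blog_name": "example-blog",
    "blog": {"name": "example-blog", "title": "Example"},
}

nested = "{blog[name]}/{id}".format_map(kwdict)
flat = "{blog_name}/{id}".format_map(kwdict)
print(nested, flat)
```

A directory format that still uses a bare '{name}' would no longer find the field at the top level.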
Mike Fährmann
fd0685d9b5
[postprocessor:zip] defer zip file creation ( fixes #968 )
...
don't try to create zip files on postprocessor construction,
wait until directory creation during file download,
2020-08-31 21:53:18 +02:00
Mike Fährmann
33fe67b594
release version 1.14.5
2020-08-30 21:20:26 +02:00
Mike Fährmann
d50f3b333a
update extractor test results
2020-08-30 20:55:22 +02:00
Mike Fährmann
0f55b8e80a
[exhentai] fix type check from dbbbb21 ( #940 )
...
'bool' is a subclass of 'int', and therefore
'isinstance(self.limits, int)' also returns True when
'self.limits' has a boolean value
2020-08-30 20:51:22 +02:00
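The pitfall described in 0f55b8e80a is easy to reproduce. The helper 'is_custom_limit' is a hypothetical name, and excluding 'bool' explicitly is just one possible fix, not necessarily the one used in the commit.

```python
def is_custom_limit(value):
    """True for real integers only, not for True/False."""
    return isinstance(value, int) and not isinstance(value, bool)


# 'bool' is a subclass of 'int', so a plain isinstance() check accepts it.
naive_check = isinstance(True, int)

print(naive_check, is_custom_limit(True), is_custom_limit(5000))
```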
Mike Fährmann
e33293fdd8
[hentaihand] update to new site layout
2020-08-30 00:41:03 +02:00
Mike Fährmann
fda9e296dd
[gelbooru] fix extraction without API
2020-08-28 22:33:37 +02:00
Mike Fährmann
69e4871005
update extractor test results
...
- sensescans: replace 404d chapters
- mangapark: replace 404d chapters
- subscribestar: update test for attached files
2020-08-28 22:32:32 +02:00
Mike Fährmann
ab1af66a97
[imgur] add 'search' extractor ( #934 )
2020-08-27 22:46:17 +02:00
Mike Fährmann
e4bbc1fb5c
[imgur] add 'tag' extractor ( #934 )
2020-08-27 22:46:17 +02:00
Mike Fährmann
deaacc70bb
[hitomi] update URL pattern for tag searches
2020-08-27 22:46:03 +02:00
ArtaxIsSleeping
0e941553ec
[aryion] Add username/password support ( #960 )
...
* Add username/password support to aryion extractor
* Update docs to match
* Fix code style
2020-08-27 22:45:30 +02:00
Mike Fährmann
84e04cc23b
[500px] fix extraction and update URL patterns ( fixes #956 )
...
- rewrite most API calls to GraphQL queries
- match '500px.com/p/<user>' URLs
2020-08-24 18:25:31 +02:00
Mike Fährmann
d4ff767291
[reddit] improve gallery extraction ( fixes #955 )
2020-08-23 22:06:06 +02:00
Mike Fährmann
7140fe7e6d
[hitomi] fix redirect processing
2020-08-23 15:18:44 +02:00
Mike Fährmann
a57b6b3c3a
[reddit] handle deleted galleries ( fixes #953 )
2020-08-20 20:14:07 +02:00
Mike Fährmann
063c71cd84
[furaffinity] add 'search' extractor ( closes #915 )
2020-08-18 21:26:46 +02:00
Mike Fährmann
dbbbb21180
[exhentai] add ability to specify custom image limit ( #940 )
2020-08-17 22:29:20 +02:00
Mike Fährmann
b2009ea39e
[aryion] update folder mime type list ( fixes #945 )
2020-08-16 22:30:15 +02:00
Mike Fährmann
688bd046fc
release version 1.14.4
2020-08-15 21:29:02 +02:00
Mike Fährmann
d06ad148c7
[shopify] use alternate regex for products on collection pages
...
when the first one doesn't yield any results
2020-08-15 18:24:14 +02:00
Mike Fährmann
7619152988
[reactor] sort 'tags'
...
to ensure a consistent order for test results
2020-08-15 18:22:31 +02:00
Mike Fährmann
cd9de613a2
[exhentai] adjust image limit costs ( #940 )
...
Each original file costs 10 points per 10^6 bytes,
not 10 per 2^20 == 1048576 bytes.
2020-08-15 18:19:33 +02:00
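To show the size of the difference, here is a small worked comparison of the two divisors. The function names are hypothetical, and how the site or the extractor rounds the result is not stated in the message, so the values are left unrounded.

```python
def cost_decimal(size_bytes):
    """10 points per 10^6 bytes (the corrected rule)."""
    return 10 * size_bytes / 10**6


def cost_binary(size_bytes):
    """10 points per 2^20 bytes (the earlier assumption)."""
    return 10 * size_bytes / 2**20


size = 50_000_000  # example file size in bytes
print(cost_decimal(size))  # 500.0
print(cost_binary(size))   # roughly 476.8, an underestimate
```

Because 2^20 is larger than 10^6, the binary divisor always yields a lower cost than the decimal one.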
Mike Fährmann
2e6f6ee1c1
[mangoxo] fix login
2020-08-13 22:30:37 +02:00
Mike Fährmann
a6a080656c
[pixnet] detect password-protected albums ( #177 )
2020-08-08 20:48:47 +02:00
Mike Fährmann
67ac6667af
[mangareader] fix extraction
2020-08-07 22:30:10 +02:00
Mike Fährmann
2b88c90f6f
[blogger] add search extractor ( #925 )
2020-08-06 19:43:39 +02:00
Mike Fährmann
d5067c51c5
[instagram] support '/reel/' URLs
2020-08-06 19:20:25 +02:00
Mike Fährmann
2c9766b29f
fix UnboundLocalError in Extractor.request()
...
introduced in d6a271d
2020-08-05 21:52:04 +02:00
Mike Fährmann
aa64149583
[blogger] support searching posts by labels ( closes #925 )
2020-08-04 22:49:37 +02:00
Mike Fährmann
60ba3cb946
[reddit] support gallery posts ( closes #920 )
2020-08-03 22:06:15 +02:00
Mike Fährmann
0d84d3af55
[subscribestar] extract attached media files ( #852 )
2020-08-03 22:02:42 +02:00
Mike Fährmann
19bf76bcf8
update extractor test results
2020-08-03 21:57:00 +02:00
Mike Fährmann
0762d6b29c
[inkbunny] add 'num' field ( #283 )
2020-07-30 19:26:09 +02:00
Mike Fährmann
fbc4278fe4
[instagram] wait before GraphQL requests ( #901 )
2020-07-30 19:26:09 +02:00
Mike Fährmann
ec5870576d
[imgur] handle 403 overcapacity responses ( closes #910 )
2020-07-30 19:26:01 +02:00
Mike Fährmann
d6a271d2c7
add 'response' objects to 'HttpError's
2020-07-30 18:23:26 +02:00
Mike Fährmann
72c5578a27
[hentainexus] improve/simplify code
2020-07-30 00:35:49 +02:00
Mike Fährmann
627d2141d3
[xhamster] fix extraction ( closes #917 )
2020-07-29 22:51:34 +02:00
Mike Fährmann
3f73cc6855
allow 'parent-directory' to work recursively ( fixes #905 )
2020-07-29 00:31:23 +02:00
Mike Fährmann
27e31f4a16
[myportfolio] raise 'NotFoundError' for deleted posts
2020-07-27 16:15:24 +02:00
Mike Fährmann
f317a57c5e
[simplyhentai] fix 'gallery_id' extraction
2020-07-27 16:14:06 +02:00
Mike Fährmann
daeef8a5e3
[vsco] handle missing 'description' fields
2020-07-27 14:45:17 +02:00
Mike Fährmann
26a967cbd4
[pinterest] match 'pinterest.co.uk' URLs ( fixes #914 )
2020-07-27 14:41:34 +02:00
Mike Fährmann
c5aaa1de77
[inkbunny] simplify metadata structure ( #283 )
...
Just put everything at the top level,
instead of having a separate 'post' object.
2020-07-26 23:43:50 +02:00
Mike Fährmann
b921fee24d
[inkbunny] fix submission order ( #283 )
...
Getting detailed submission info via /api_submissions.php reordered the
input submissions and sorted them by ID. InkbunnyAPI.detail() now sorts
them back and ensures they are returned in their original order.
This commit also removes the 'metadata' option and always requests
submission descriptions.
2020-07-26 23:12:45 +02:00
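Restoring the input order after an ID-sorted response can be sketched as below. This is not the code of InkbunnyAPI.detail(); `restore_order` and the `submission_id` key are assumptions made for illustration.

```python
def restore_order(requested_ids, results, key="submission_id"):
    """Return 'results' arranged in the order of 'requested_ids'."""
    # Index the response by ID, then walk the IDs in their original order.
    by_id = {item[key]: item for item in results}
    return [by_id[i] for i in requested_ids if i in by_id]


requested = [30, 10, 20]
# Simulated response that came back sorted by ID
response = [{"submission_id": 10}, {"submission_id": 20}, {"submission_id": 30}]

ordered = restore_order(requested, response)
print([item["submission_id"] for item in ordered])  # [30, 10, 20]
```

IDs missing from the response are skipped rather than raising an error; whether that is the desired behavior depends on the caller.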
Mike Fährmann
e50c75628c
[subscribestar] update 'date' parsing
2020-07-24 22:27:36 +02:00
Mike Fährmann
c4ed9f4faa
[inkbunny] add 'metadata' option ( #283 )
2020-07-24 18:05:53 +02:00
Mike Fährmann
493cadb1e7
[inkbunny] add 'orderby' option ( #283 )
2020-07-24 17:50:32 +02:00
Mike Fährmann
336e682a7a
[inkbunny] handle gallery/scraps URLs ( #283 )
2020-07-24 17:05:00 +02:00
Mike Fährmann
8dbf827649
[bobx] remove module
2020-07-24 17:00:43 +02:00
Mike Fährmann
8f64585ff2
[twitter] handle 429 responses without x-rate-limit-reset header
2020-07-23 22:38:17 +02:00