mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-24 03:32:33 +01:00
2055 Commits

Author SHA1 Message Date
Mike Fährmann
c5e3971b18
[newgrounds] extract image embeds (closes #1033) 2020-10-11 18:15:40 +02:00
dawidsowa
43b156fb40
[reactor] match URLs without subdomain (#1053) 2020-10-11 18:15:06 +02:00
Mike Fährmann
3ebb174f2c
add missing extractor info when spawning new ones (fixes #1051)
Not having this information causes the blacklist/whitelist logic to
trigger and prevents things from functioning as intended when using
default settings.

Fixes issues for 8muses, deviantart, exhentai, and mangoxo.
2020-10-08 14:34:53 +02:00
Mike Fährmann
f9c1684af7
[newgrounds] restore original video URLs (#1042) 2020-10-07 22:53:53 +02:00
Mike Fährmann
73373c06ec
[weibo] handle posts with more than 9 images (closes #926)
Responses from '/api/container/getIndex' don't list more than
9 images per 'status' object, but the embedded JSON from a
'/detail/<ID>' page does.
2020-10-06 18:16:08 +02:00
Mike Fährmann
dd1e545597
[hentaifoundry] rename GalleryExtractor to PicturesExtractor 2020-10-04 22:53:23 +02:00
Mike Fährmann
c874071f5a
[kissmanga] remove module 2020-10-04 22:46:41 +02:00
Mike Fährmann
93e04bf9a9
[500px] update query hashes 2020-10-03 19:25:28 +02:00
Mike Fährmann
844502cad5
update extractor test results 2020-10-03 19:24:19 +02:00
Mike Fährmann
fad7748b6b
[xvideos] fix 'title' extraction 2020-10-01 22:04:14 +02:00
Mike Fährmann
5b927c15df
[newgrounds] fix video extraction (closes #1042) 2020-10-01 20:14:16 +02:00
Mike Fährmann
bdc6c8f074
improve message for 'oauth:deviantart' etc (closes #989) 2020-09-29 21:25:24 +02:00
Mike Fährmann
430b6d6e2e
[twitter] extend 'retweets' option (closes #1026)
Setting 'retweets' to '"original"' will use metadata from the
original retweeted Tweets, and not from the Retweet entry.
2020-09-28 23:03:35 +02:00
Mike Fährmann
b9bdd2c564
[hentaifoundry] add support for stories (closes #734) 2020-09-27 02:27:40 +02:00
Mike Fährmann
9a9d1924d8
[hentaicafe] add 'manga_id' metadata field (closes #1036)
This field is only available when using a non-foolslide URL
like '/hc.fyi/9874' or '/hazuki-yuuto-summer-blues/'
2020-09-26 14:34:48 +02:00
Mike Fährmann
cc4ac80302
[weasyl] add 'favorite' extractor (#1032) 2020-09-26 13:09:03 +02:00
Mike Fährmann
e9cc719497
[weasyl] update and simplify
- simplify 'pattern' regexps
- parse 'posted_at' as 'date'
- use unaltered 'title' ({title!l:R /_/} to lowercase and replace spaces)
2020-09-26 02:10:45 +02:00
Mike Fährmann
6514312126
[nijie] add 'include' option (closes #1018) 2020-09-25 18:18:35 +02:00
Mike Fährmann
0d43456323
[hentaifoundry] add 'include' option 2020-09-25 18:18:03 +02:00
Zanny
ebb7737b9b
Weasyl Extractor (#977)
* weasyl extractor

* @kattjevfel suggested changes

* @mikf changes
2020-09-25 15:18:21 +02:00
Mike Fährmann
aeb0d32333
[twitter] improve twitpic extraction (fixes #1019)
- ignore twitpic.com/photos/… URLs
- ignore empty image URLs
2020-09-22 22:22:35 +02:00
Mike Fährmann
7cd383c0f9
update extractor test results 2020-09-20 21:54:39 +02:00
Mike Fährmann
1e313d5b84
implement 'sleep-request' option 2020-09-20 20:28:17 +02:00
Mike Fährmann
c43b3894be
[myhentaigallery] update and fix extraction (#1001)
- extract more metadata
- match "/show/" URLs
- complete test results
- fix missing images for lines starting with " <img"
- fix missing comma in supportedsites.py
2020-09-17 18:14:23 +02:00
choeronline
05b9ac8d37
[myhentaigallery] add extractor (#1001)
* adds support for myhentaigallery

* fixes linting issues in myhentaigallery extractor
2020-09-17 17:32:54 +02:00
Mike Fährmann
2626629117
[danbooru] handle posts without 'id' (fixes #1004) 2020-09-16 21:35:27 +02:00
Mike Fährmann
cc1fb0b4ea
[500px] update query hash 2020-09-16 01:26:31 +02:00
Mike Fährmann
da87a5fb7e
[exhentai] fix accessing config before main constructor
bug introduced with 055c32e0

Making 'Extractor.config()' quite a bit faster is worth the "cost"
of having to set _cfgpath in exhentai constructors, I think.
2020-09-15 18:09:50 +02:00
Mike Fährmann
f5b7ae01c1
update extractor test results 2020-09-15 18:07:08 +02:00
Mike Fährmann
136df52d1f
[deviantart] support watchers-only/paid deviations (#995) 2020-09-15 16:03:46 +02:00
Mike Fährmann
055c32e0f7
precompute extractor config paths 2020-09-14 22:06:54 +02:00
Mike Fährmann
231dd4c800
accumulate postprocessor objects (#994)
Instead of one 'postprocessors' setting overwriting all others lower
in the hierarchy, all postprocessors along the config path will now
get collected into one big list.

For example '--mtime-from-date' will therefore no longer cause
other postprocessor settings in a config file to get ignored.
2020-09-14 21:51:55 +02:00
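The accumulation described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `collect_postprocessors`, the nested-dict config layout, and the ordering (outermost level first) are all assumptions made for the example.

```python
def collect_postprocessors(config: dict, path: tuple) -> list:
    """Gather 'postprocessors' entries from every level along a config path.

    Hypothetical helper: instead of letting one level's setting replace
    the others, each level's list is appended to a single result.
    """
    collected = []
    node = config
    # Top level first (assumed order), then each nested level on the path.
    collected.extend(node.get("postprocessors") or [])
    for key in path:
        node = node.get(key)
        if not isinstance(node, dict):
            break
        collected.extend(node.get("postprocessors") or [])
    return collected


# Example config (made up): one postprocessor on each of three levels.
example_config = {
    "postprocessors": [{"name": "mtime"}],
    "extractor": {
        "postprocessors": [{"name": "zip"}],
        "twitter": {"postprocessors": [{"name": "exec"}]},
    },
}
names = [pp["name"] for pp in
         collect_postprocessors(example_config, ("extractor", "twitter"))]
```

With an overwrite-style lookup only one of the three lists would survive; here all three end up in `names`.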
Mike Fährmann
3108e85b89
[worldthree] remove extractors
http://www.slide.world-three.org/ hasn't been accessible for a long time.
2020-09-11 18:12:57 +02:00
Mike Fährmann
8fed3eb8cb
[jaiminisbox] remove extractors
https://jaiminisbox.com/post.html
2020-09-11 18:09:35 +02:00
Mike Fährmann
dcf3ad7eef
[furaffinity] update download URL extraction (fixes #988)
support the new 'd2.facdn.net' subdomain
2020-09-11 13:23:57 +02:00
Mike Fährmann
3918b69677
remove 'extractor.blacklist' context manager 2020-09-11 13:17:35 +02:00
Mike Fährmann
2b8d57f0ab
[twitter] support '/intent/user?user_id=…' URLs (#980) 2020-09-08 23:17:50 +02:00
Mike Fährmann
a3b473bd2f
[twitter] support specifying users by ID (#980)
by using 'id:…' as their screen name, i.e.
https://www.twitter.com/id:2976459548/media
instead of
https://twitter.com/supernaturepics/media

The user ID can, for example, be obtained from the output of
$ gallery-dl -j --range 1 https://twitter.com/<screen-name>
2020-09-08 22:56:52 +02:00
Mike Fährmann
a0d916ed41
[exhentai] update wait time before original image download (#978)
depend on 'wait-max', don't use a hard-coded value
2020-09-07 23:48:28 +02:00
Mike Fährmann
f6fd449b59
reduce wait time growth rate from exponential to linear
Waiting for 2**N seconds after each error grows too fast.
Simply waiting N seconds seems far more reasonable.
2020-09-06 22:38:25 +02:00
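To make the difference in the commit above concrete, the two growth rates can be compared directly. The function names are hypothetical and only restate the formulas from the commit message (2**N versus N seconds after the N-th error).

```python
def wait_exponential(tries: int) -> int:
    """Old behaviour as described: wait 2**N seconds after the N-th error."""
    return 2 ** tries


def wait_linear(tries: int) -> int:
    """New behaviour as described: wait N seconds after the N-th error."""
    return tries


# Total time spent waiting across the first 8 consecutive errors.
total_exponential = sum(wait_exponential(n) for n in range(1, 9))
total_linear = sum(wait_linear(n) for n in range(1, 9))

for n in range(1, 9):
    print(n, wait_exponential(n), wait_linear(n))
```

After eight errors the exponential schedule has already waited several minutes in total, while the linear one is still well under a minute.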
Mike Fährmann
bc48514d84
[aryion] get post ID via gallery-item (fixes #981, closes #982)
this even works when fetching post IDs from '/latest.php?id='
2020-09-06 22:17:23 +02:00
Mike Fährmann
799ca07fc8
[imgur] update
- fix image/album detection for galleries
- use new API endpoints for image/album data
2020-09-06 21:11:32 +02:00
Mike Fährmann
7876a03ece
[tumblr] create directories for each post (fixes #965)
This changes the identifiers for directory format string fields.
Everything blog related is now inside a 'blog' object
and not at the "base level" anymore.

E.g. '{name}' for directories is now '{blog[name]}'
(or '{blog_name}', since that is also available)
2020-08-31 21:58:20 +02:00
Mike Fährmann
d50f3b333a
update extractor test results 2020-08-30 20:55:22 +02:00
Mike Fährmann
0f55b8e80a
[exhentai] fix type check from dbbbb21 (#940)
'bool' is a subclass of 'int', and therefore
'isinstance(self.limits, int)' also returns True when
'self.limits' has a boolean value
2020-08-30 20:51:22 +02:00
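The pitfall named in the commit above is easy to reproduce in plain Python. The helper below is one possible way to tell a real integer from a boolean; it is an illustrative sketch with a hypothetical name, not necessarily the check the commit introduced.

```python
def is_integer_limit(value) -> bool:
    """Return True only for genuine integers, not for True/False.

    'bool' is a subclass of 'int', so a plain isinstance(value, int)
    check accepts booleans as well; they are excluded explicitly here.
    """
    return isinstance(value, int) and not isinstance(value, bool)


# The underlying language behaviour:
bool_is_int_subclass = issubclass(bool, int)
naive_check_on_true = isinstance(True, int)
```

Both `bool_is_int_subclass` and `naive_check_on_true` are true, which is why a naive type check treats a boolean option value like a numeric one.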
Mike Fährmann
e33293fdd8
[hentaihand] update to new site layout 2020-08-30 00:41:03 +02:00
Mike Fährmann
fda9e296dd
[gelbooru] fix extraction without API 2020-08-28 22:33:37 +02:00
Mike Fährmann
69e4871005
update extractor test results
- sensescans: replace 404d chapters
- mangapark: replace 404d chapters
- subscribestar: update test for attached files
2020-08-28 22:32:32 +02:00
Mike Fährmann
ab1af66a97
[imgur] add 'search' extractor (#934) 2020-08-27 22:46:17 +02:00
Mike Fährmann
e4bbc1fb5c
[imgur] add 'tag' extractor (#934) 2020-08-27 22:46:17 +02:00
Mike Fährmann
deaacc70bb
[hitomi] update URL pattern for tag searches 2020-08-27 22:46:03 +02:00
ArtaxIsSleeping
0e941553ec
[aryion] Add username/password support (#960)
* Add username/password support to aryion extractor

* Update docs to match

* Fix code style
2020-08-27 22:45:30 +02:00
Mike Fährmann
84e04cc23b
[500px] fix extraction and update URL patterns (fixes #956)
- rewrite most API calls to GraphQL queries
- match '500px.com/p/<user>' URLs
2020-08-24 18:25:31 +02:00
Mike Fährmann
d4ff767291
[reddit] improve gallery extraction (fixes #955) 2020-08-23 22:06:06 +02:00
Mike Fährmann
7140fe7e6d
[hitomi] fix redirect processing 2020-08-23 15:18:44 +02:00
Mike Fährmann
a57b6b3c3a
[reddit] handle deleted galleries (fixes #953) 2020-08-20 20:14:07 +02:00
Mike Fährmann
063c71cd84
[furaffinity] add 'search' extractor (closes #915) 2020-08-18 21:26:46 +02:00
Mike Fährmann
dbbbb21180
[exhentai] add ability to specify custom image limit (#940) 2020-08-17 22:29:20 +02:00
Mike Fährmann
b2009ea39e
[aryion] update folder mime type list (fixes #945) 2020-08-16 22:30:15 +02:00
Mike Fährmann
d06ad148c7
[shopify] use alternate regex for products on collection pages
when the first one doesn't yield any results
2020-08-15 18:24:14 +02:00
Mike Fährmann
7619152988
[reactor] sort 'tags'
to ensure a consistent order for test results
2020-08-15 18:22:31 +02:00
Mike Fährmann
cd9de613a2
[exhentai] adjust image limit costs (#940)
Each original file costs 10 points per 10^6 bytes,
not 10 per 2^20 == 1048576 bytes.
2020-08-15 18:19:33 +02:00
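The unit change in the commit above can be checked with a little arithmetic. The function name and the decision to return an unrounded float are assumptions for this sketch; how the real code rounds the result is not stated in the commit message.

```python
MEGABYTE = 10 ** 6   # decimal unit, the corrected basis per the commit
MEBIBYTE = 2 ** 20   # 1048576 bytes, the basis used before


def cost_points(size_bytes: int, unit: int = MEGABYTE) -> float:
    """Image-limit cost of one original file: 10 points per 'unit' bytes."""
    return 10 * size_bytes / unit


size = 5 * MEBIBYTE  # a 5 MiB file
cost_decimal = cost_points(size)             # per 10^6 bytes
cost_binary = cost_points(size, MEBIBYTE)    # per 2^20 bytes
```

Using the binary unit underestimates the cost by roughly 4.9 percent, so a running total based on it would drift below the real one.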
Mike Fährmann
2e6f6ee1c1
[mangoxo] fix login 2020-08-13 22:30:37 +02:00
Mike Fährmann
a6a080656c
[pixnet] detect password-protected albums (#177) 2020-08-08 20:48:47 +02:00
Mike Fährmann
67ac6667af
[mangareader] fix extraction 2020-08-07 22:30:10 +02:00
Mike Fährmann
2b88c90f6f
[blogger] add search extractor (#925) 2020-08-06 19:43:39 +02:00
Mike Fährmann
d5067c51c5
[instagram] support '/reel/' URLs 2020-08-06 19:20:25 +02:00
Mike Fährmann
2c9766b29f
fix UnboundLocalError in Extractor.request()
introduced in d6a271d
2020-08-05 21:52:04 +02:00
Mike Fährmann
aa64149583
[blogger] support searching posts by labels (closes #925) 2020-08-04 22:49:37 +02:00
Mike Fährmann
60ba3cb946
[reddit] support gallery posts (closes #920) 2020-08-03 22:06:15 +02:00
Mike Fährmann
0d84d3af55
[subscribestar] extract attached media files (#852) 2020-08-03 22:02:42 +02:00
Mike Fährmann
19bf76bcf8
update extractor test results 2020-08-03 21:57:00 +02:00
Mike Fährmann
0762d6b29c
[inkbunny] add 'num' field (#283) 2020-07-30 19:26:09 +02:00
Mike Fährmann
fbc4278fe4
[instagram] wait before GraphQL requests (#901) 2020-07-30 19:26:09 +02:00
Mike Fährmann
ec5870576d
[imgur] handle 403 overcapacity responses (closes #910) 2020-07-30 19:26:01 +02:00
Mike Fährmann
d6a271d2c7
add 'response' objects to 'HttpError's 2020-07-30 18:23:26 +02:00
Mike Fährmann
72c5578a27
[hentainexus] improve/simplify code 2020-07-30 00:35:49 +02:00
Mike Fährmann
627d2141d3
[xhamster] fix extraction (closes #917) 2020-07-29 22:51:34 +02:00
Mike Fährmann
27e31f4a16
[myportfolio] raise 'NotFoundError' for deleted posts 2020-07-27 16:15:24 +02:00
Mike Fährmann
f317a57c5e
[simplyhentai] fix 'gallery_id' extraction 2020-07-27 16:14:06 +02:00
Mike Fährmann
daeef8a5e3
[vsco] handle missing 'description' fields 2020-07-27 14:45:17 +02:00
Mike Fährmann
26a967cbd4
[pinterest] match 'pinterest.co.uk' URLs (fixes #914) 2020-07-27 14:41:34 +02:00
Mike Fährmann
c5aaa1de77
[inkbunny] simplify metadata structure (#283)
Just put everything at the top level,
instead of having a separate 'post' object.
2020-07-26 23:43:50 +02:00
Mike Fährmann
b921fee24d
[inkbunny] fix submission order (#283)
Getting detailed submission info via /api_submissions.php reordered the
input submissions and sorted them by ID. InkbunnyAPI.detail() now sorts
them back and ensures they are returned in their original order.

This commit also removes the 'metadata' option and always requests
submission descriptions.
2020-07-26 23:12:45 +02:00
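Restoring the requested order after an API has re-sorted its results, as the commit above describes, might look like the sketch below. The function name and the 'submission_id' key are assumptions for illustration and are not taken from the InkbunnyAPI code.

```python
def restore_order(requested_ids: list, returned: list) -> list:
    """Reorder 'returned' items so they follow 'requested_ids'.

    Builds an ID -> item index once, then walks the requested IDs in
    their original order. IDs missing from the response are skipped.
    """
    by_id = {item["submission_id"]: item for item in returned}
    return [by_id[i] for i in requested_ids if i in by_id]


# The response below comes back sorted by ID, unlike the request.
requested = [30, 10, 20]
response = [{"submission_id": 10}, {"submission_id": 20}, {"submission_id": 30}]
ordered = restore_order(requested, response)
```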
Mike Fährmann
e50c75628c
[subscribestar] update 'date' parsing 2020-07-24 22:27:36 +02:00
Mike Fährmann
c4ed9f4faa
[inkbunny] add 'metadata' option (#283) 2020-07-24 18:05:53 +02:00
Mike Fährmann
493cadb1e7
[inkbunny] add 'orderby' option (#283) 2020-07-24 17:50:32 +02:00
Mike Fährmann
336e682a7a
[inkbunny] handle gallery/scraps URLs (#283) 2020-07-24 17:05:00 +02:00
Mike Fährmann
8dbf827649
[bobx] remove module 2020-07-24 17:00:43 +02:00
Mike Fährmann
8f64585ff2
[twitter] handle 429 responses without x-rate-limit-reset header 2020-07-23 22:38:17 +02:00
Mike Fährmann
d2e17e16bf
[inkbunny] update tests (#283) 2020-07-23 22:37:05 +02:00
Mike Fährmann
57f7d9b790
[inkbunny] improve error handling (#283) 2020-07-23 22:31:22 +02:00
Mike Fährmann
baf5d0e3c1
[gfycat] skip malformed gfycat responses (closes #902) 2020-07-22 23:59:56 +02:00
Mike Fährmann
453f3bc519
[blogger] improve error messages for missing posts/blogs (#903) 2020-07-22 23:51:48 +02:00
Mike Fährmann
87202b8d74
[inkbunny] add 'user' and 'post' extractors (#283) 2020-07-22 22:21:30 +02:00
Mike Fährmann
2ecf1efb16
update extractor test results
- tumblr: remove deleted post
- jaiminisbox: replace removed manga/chapters
- smugmug: one inconsequential field got removed
2020-07-18 15:12:28 +02:00
Mike Fährmann
d5fcffcced
[subscribestar] add login capabilities (#852) 2020-07-17 22:18:01 +02:00
Mike Fährmann
ecaecc4064
[exhentai] add 'domain' option (#897) 2020-07-17 22:17:46 +02:00
Mike Fährmann
45c32213dc
[gfycat] retry 404'ed videos on redgifs (closes #874) 2020-07-16 15:00:32 +02:00
Mike Fährmann
cf44571fe0
[gfycat] add 'user' and 'search' extractors 2020-07-16 15:00:32 +02:00
Mike Fährmann
11b744d971
[mangakakalot] improve/fix chapter extraction 2020-07-16 15:00:31 +02:00
Mike Fährmann
2da71cb561
[twitter] raise proper exception if user doesn't exist (#891) 2020-07-16 15:00:31 +02:00
Leonardo Taccari
86e5a05e29
[twitter] add support for nitter.net URLs in pattern (#890)
Please note that URLs are only "translated"; all requests are still
always made via the Twitter API.
2020-07-13 23:48:42 +02:00
Mike Fährmann
e17d4f44f6
[newgrounds] fix favorites extraction 2020-07-13 23:08:45 +02:00
Mike Fährmann
c51fbd72ba
update extractor test results 2020-07-13 22:57:48 +02:00
Mike Fährmann
9cd1bc6907
[mangakakalot] update URL patterns, fix flake8 errors (#876) 2020-07-13 22:47:24 +02:00
jakem72360
7dfdcc3fbf
[mangakakalot] Added extractors for MangaKakalot (#876) 2020-07-13 21:20:09 +02:00
Mike Fährmann
cb0132e441
[khinsider] add 'format' option (closes #840) 2020-07-13 17:17:58 +02:00
Mike Fährmann
d594977ca1
[artstation] add 'following' extractor (closes #888) 2020-07-12 23:03:05 +02:00
Mike Fährmann
3855d0dd3c
[twitter] add debug messages for all skipped Tweets (#867) 2020-07-11 00:41:50 +02:00
Mike Fährmann
27d163afb3
[imgur] support all '/t/...' URLs (closes #880)
… instead of just '/t/unmuted/'
2020-07-09 22:17:01 +02:00
Mike Fährmann
f5c9f1d066
[subscribestar] use current date instead of hard-coded '2020' (#852) 2020-07-09 22:12:39 +02:00
Mike Fährmann
5a6e750704
[reddit] fix AttributeError when using 'recursion' (fixes #879) 2020-07-09 19:19:05 +02:00
Mike Fährmann
94a08f0bcb
[reddit] limit title length in default filenames (#873) 2020-07-09 18:19:33 +02:00
Mike Fährmann
3424fb96c3
[redgifs] support gifsdeliverynetwork.com URLs (#874) 2020-07-09 18:04:30 +02:00
Mike Fährmann
f1344fe552
[patreon] yield images and attachments before postfiles (#871)
The reported filename of the 'postfile' entry of each post may differ
from the corresponding entry in the list of images or attachments,
and be outright "wrong".
2020-07-09 00:10:26 +02:00
Mike Fährmann
6e2af9a8d8
[twitter] improve error message formatting 2020-07-06 23:13:05 +02:00
Mike Fährmann
c28db7a6ea
[8muses] support 'comics.8muses.com' URLs 2020-07-05 19:43:45 +02:00
Mike Fährmann
d5bfb0b38c
set pseudo extension for Metadata messages (#865)
This prevents pathfmt.filename from potentially being empty.
2020-07-04 22:14:39 +02:00
Mike Fährmann
821524e4ee
[subscribestar] add 'user' and 'post' extractors (#852) 2020-07-03 21:08:47 +02:00
Mike Fährmann
e62ebb4643
update CHANGELOG before building sdist and wheel packages 2020-06-27 19:45:09 +02:00
Mike Fährmann
f1ddbff0b5
[aryion] add 'recursive' option (fixes #832)
This is enabled by default and will recursively go through all
(sub)folders in an artist's gallery.

The old method of using "Latest Updates" lists can be restored by
disabling this option.
2020-06-26 23:36:50 +02:00
Mike Fährmann
699062b91f
Revert "[kissmanga] workaround for CAPTCHAs (#818)"
This reverts commit 4cf3d54718.
2020-06-25 19:35:03 +02:00
Mike Fährmann
0cac14c3bd
update extractor test results 2020-06-25 19:11:47 +02:00
Mike Fährmann
5e5be67c26
[tumblr] prevent KeyErrors when using reblogs=same-blog
(fixes #851)
2020-06-25 19:00:12 +02:00
Mike Fährmann
9da2bc67f8
[twitter] add option to filter media from quoted tweets (#854) 2020-06-25 18:59:25 +02:00
Mike Fährmann
56ab5fb8f4
[twitter] improve handling of quoted tweets (#854)
Split each "quote" into two parts:
- the original tweet
- the tweet that quoted the original
2020-06-24 21:14:18 +02:00
Mike Fährmann
bd0e1ca1a5
[imgur] build directory path for each file (closes #842) 2020-06-21 19:25:52 +02:00
Mike Fährmann
a8c2d997e8
[twitter] treat quoted tweets like retweets (#833)
- filter them when 'retweets' is disabled
- set 'author' to the creator of the quoted tweet

like it was before the rewrite
2020-06-21 19:14:12 +02:00
Mike Fährmann
aed1c63e51
[twitter] improve search results (fixes #847)
Adding 'tweet_search_mode=live' to the query parameters
is the most important part here.
2020-06-21 15:53:20 +02:00
Mike Fährmann
0e714b9a0e
[pinterest] add 'section' extractor (#835) 2020-06-21 00:08:14 +02:00
Mike Fährmann
53cc498d9c
improve config lookup when there are multiple possible locations
This specifically applies to all Mastodon extractors and all
extractors with a 'basecategory', i.e. 'booru', 'foolslide', etc.

Values inside those general config locations wouldn't be recognized
when a value with the same name was set on the 'extractor' level.

For example 'extractor.mastodon.directory' should be used over
'extractor.directory' when both are set, but this was impossible
with the previous implementation.

(fixes #843)
2020-06-21 00:07:10 +02:00
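The lookup rule from the commit above (a more specific location wins over the general 'extractor' level) can be sketched as a search over candidate paths, most specific first. The function name, the nested-dict layout, and the example instance name 'mastodon.social' are assumptions for illustration, not the project's real implementation.

```python
def lookup(config: dict, paths: list, key: str, default=None):
    """Return the first value for 'key' found along the candidate paths.

    'paths' must be ordered from most to least specific, so a value at
    ('extractor', 'mastodon') is preferred over one at ('extractor',).
    """
    for path in paths:
        node = config
        for part in path:
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        else:
            # Path exists completely; use it if it defines the key.
            if isinstance(node, dict) and key in node:
                return node[key]
    return default


example_config = {
    "extractor": {
        "directory": ["general"],
        "mastodon": {"directory": ["mastodon"]},
    },
}
candidate_paths = [
    ("extractor", "mastodon.social"),  # specific instance (not configured)
    ("extractor", "mastodon"),         # shared base category
    ("extractor",),                    # general level
]
directory = lookup(example_config, candidate_paths, "directory")
```

Here `directory` resolves to the value under 'extractor.mastodon', even though 'extractor.directory' is also set.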
Mike Fährmann
d81a8e6544
[twitter] update tests 2020-06-19 23:01:02 +02:00
Mike Fährmann
d39eedd9bb
[twitter] improve handling of deleted tweets (fixes #838) 2020-06-19 18:11:37 +02:00
Mike Fährmann
1ae1df0d27
update '--write-pages' (#737)
- fix infinite recursion for responses with multiple entries in
  'history'
- hide values of Set-Cookie headers
- only write the response content by default
  (use '-o write-pages=all' to also include HTTP headers)
2020-06-18 15:07:30 +02:00
Mike Fährmann
dc16f73965
[twitter] move '_guest_token()' into TwitterAPI class 2020-06-18 15:02:51 +02:00
Mike Fährmann
3561d1020a
[twitter] always provide an 'author' field (#831, #833)
The idea was to have less metadata clutter for most Tweets where
'author' and 'user' are the same (non-retweets), and only provide
a 'user' field.

The original Tweet author could be gotten with
{author[…]|user[…]}, but basically no one knows about that.
2020-06-18 15:02:51 +02:00
Mike Fährmann
7158bdd7c7
[weibo] improve extractor logic (#829) 2020-06-18 15:00:31 +02:00
Mike Fährmann
0371fd54a1
[artstation] add 'date' metadata field (#839) 2020-06-17 20:22:18 +02:00
Mike Fährmann
8c857052d7
[mastodon] ignore toots without media attachments 2020-06-17 20:21:28 +02:00
Mike Fährmann
de045d39b2
[mastodon] add 'date' metadata field (#839) 2020-06-17 19:22:28 +02:00
Mike Fährmann
d5d90a0450
[weibo] add 'date' field to 'status' objects (#829) 2020-06-16 14:46:46 +02:00
Mike Fährmann
5ba90f72ca
[pinterest] add support for sections (closes #835) 2020-06-16 14:41:05 +02:00
Mike Fährmann
c37a1c06c8
[twitter] add extractor for liked tweets (closes #837)
You need to be logged in to get access to anyone's liked tweets,
it seems.
2020-06-16 14:27:22 +02:00
Mike Fährmann
b94394104c
[twitter] don't download video previews (#833)
when 'videos' is set to False
2020-06-16 14:10:51 +02:00
Mike Fährmann
bb882b8cdb
improve output of '-K' for parent extractors (#825) 2020-06-14 21:39:21 +02:00
Mike Fährmann
4cf3d54718
[kissmanga] workaround for CAPTCHAs (fixes #818)
Requesting the same page again when being redirected to a CAPTCHA
lets us access that page without solving it.
2020-06-12 00:41:49 +02:00
Mike Fährmann
7daef6ee70
update extractor test results
- certain posts on Instagram now return
  https://static.cdninstagram.com/rsrc.php/null.jpg
  for public users
- MangaDex is deploying its new MangaDex@Home network similar to
  exhentai's Hentai@Home
- realbooru has a new site layout, but the underlying booru API still
  works like before
2020-06-12 00:36:06 +02:00
Mike Fährmann
ffb6c5277a
[furaffinity] add 'artist_url' metadata field (closes #821) 2020-06-11 18:36:24 +02:00
Mike Fährmann
be04e44e2c
[reddit] catch JSON decode errors (#765) 2020-06-11 18:32:52 +02:00
Mike Fährmann
cf863f60b3
[redgifs] add 'user' and 'search' extractors (closes #724) 2020-06-10 22:03:52 +02:00
Mike Fährmann
998d1d3a5c
[webtoons] generalize and improve comic extraction (fixes #820) 2020-06-10 21:44:42 +02:00
Mike Fährmann
036a40943a
[twitter] don't cache results of 'user_by_screen_name()'
A 'keyarg=1' argument to the memcache decorator would have worked as
well, but keeping the user object in memory isn't useful for the vast
majority of use cases and only wastes space.

(closes #817)
2020-06-10 20:58:42 +02:00
Mike Fährmann
4442dfe7b8
[twitter] add 'reply_to' metadata to replies 2020-06-09 21:48:04 +02:00
Mike Fährmann
83b7bd0413
[nhentai] fix extraction (closes #819) 2020-06-09 21:27:07 +02:00
Mike Fährmann
d769bb4b80
[twitter] improve pagination 2020-06-07 15:23:45 +02:00
Mike Fährmann
5bc1097f9d
[twitter] metadata cleanup #2
- remove useless clutter by creating new tweet-data dicts instead of
  reusing the original Tweet objects
- rename fields to how they were named before
  ('id_str' -> 'tweet_id', etc.)
- only include 'author' if it would differ from 'user'
- restore 'archive_fmt'
2020-06-07 02:25:29 +02:00
Mike Fährmann
c6c06c41f6
[deviantart] don't add journal text to description (#712) 2020-06-05 21:56:12 +02:00
Mike Fährmann
4aea5138dd
[sensescans] use https:// 2020-06-05 21:55:19 +02:00
Mike Fährmann
3eed5f52d7
[twitter] small metadata cleanup
- add 'date' field
- remove 'entities' and 'extended_entities'
- don't include 'focus_fields' from 'original_info'
2020-06-04 18:21:54 +02:00
Mike Fährmann
655c98cbef
[twitter] skip unavailable tweets 2020-06-04 14:51:25 +02:00
Mike Fährmann
41d03160ff
[deviantart] also search journals for sta.sh links (#712)
when 'extra' is enabled
2020-06-04 14:47:08 +02:00
Mike Fährmann
2132e5461a
[twitter] restore TwitPic support 2020-06-04 01:22:34 +02:00
Mike Fährmann
bd0f21478a
[twitter] login using the mobile nojs login page 2020-06-04 00:07:12 +02:00
Mike Fährmann
a10f31dde5
[twitter] rewrite; use new interface (#740, #806)
Everything except logging in with username & password and TwitPic
embeds should be working again.

Metadata per Tweet is massively different than before (mostly raw API
responses - might need some cleaning up) and the default 'archive_fmt'
changed.
2020-06-03 20:51:29 +02:00
Mike Fährmann
3bad1579ee
update extractor test results 2020-05-31 17:42:07 +02:00
Mike Fährmann
864f4220d9
update output of 'oauth:…' (#616) 2020-05-31 17:41:40 +02:00
Mike Fährmann
0f459f340b
[instagram] fix and re-enable login with username&password
This reverts commit 3e0848a482.
(#756, #771, #797, #803)

https://github.com/althonos/InsaLooter/issues/287#issuecomment-630456522
2020-05-31 00:29:09 +02:00
Mike Fährmann
3e0848a482
[instagram] disable login with username&password (#756) 2020-05-29 23:29:40 +02:00
Mike Fährmann
a32aea41e1
[instagram] update 'query_hash' values 2020-05-29 23:11:42 +02:00
Mike Fährmann
2bff8dd465
[hentainexus] fix flake8 issues (#787) 2020-05-28 22:45:08 +02:00
Mike Fährmann
a63682a9c0
[instagram] simplify code & complete tests (#743) 2020-05-28 22:31:01 +02:00
墨焓
a4e3d40672
hentainexus.py minor fix (#787)
* rectify code of `join_title`, some minor fix.

* + hentainexus self.data

* fixed: call staticmethod join_title with data
2020-05-28 21:59:26 +02:00
Vrihub
62b65e59d0
Add instagram metadata: post_pageurl, post_tags (#743)
* Add instagram metadata: post_pageurl, post_tags

Add the following metadata for instagram:
- post_pageurl: json string with url of the post page
- post_tags: json array with instagram tags extracted from the post description

* Oops: rename post_tags to tags for --write-tags

This way, --write-tags will pick up the post tags.

* Rename to post_url, improve regex

* Add post_url and tags to tests

* Remove duplicate tags and sort them

* Bugfix: don't create empty tag lists

* Metadata: add location

* Metadata: add tagged_users for each media

* Move self._find_tags() to base class

* Make flake happy
2020-05-28 21:58:24 +02:00
Mike Fährmann
275cceeb6a
[redgifs] fix extraction (#724)
… and prepare for more potential extractors
2020-05-28 02:18:42 +02:00
Mike Fährmann
45baa13615
update extractor test results
- don't run Instagram tests on Travis anymore
- replace Twitter test because timeline was made private
- update Hiperdex domain to '.com' (again ...)
2020-05-28 02:18:06 +02:00
Mike Fährmann
dfcf2a2c91
write OAuth token to cache by default (#616) 2020-05-25 22:35:45 +02:00
Mike Fährmann
15c3d29062
move dump_response() into a separate function (#737) 2020-05-25 22:21:58 +02:00
Mike Fährmann
a363da4b43
include redirects and headers in --write-pages dumps (#737) 2020-05-25 22:21:57 +02:00
Mike Fährmann
6bcdb264e0
[imgur] treat 't/unmuted' URLs as galleries 2020-05-25 22:21:57 +02:00
Mike Fährmann
b6cee3e45b
[imgur] fix extraction of animated images without 'mp4' entry 2020-05-25 22:21:57 +02:00
Leonardo Taccari
bcac31b7c7
[webtoons] make archive_fmt unique (#779)
close #778
2020-05-25 21:23:54 +02:00
Mike Fährmann
e19f665a44
[danbooru] change default for 'ugoira' to 'false'
Downloading the pre-rendered versions should be a better default
than .zip files with individual frames.
2020-05-20 19:57:28 +02:00
Mike Fährmann
3201fe3521
add global SENTINEL object 2020-05-19 22:32:53 +02:00
Mike Fährmann
c8787647ed
add global WINDOWS bool 2020-05-19 22:32:53 +02:00
Mike Fährmann
6294e2c540
add 'text.ensure_http_scheme()' 2020-05-19 22:32:53 +02:00
Mike Fährmann
0378d079a5
[webtoons] fixes and simplifications (#593, #761)
- fix episode listings for french comics
- allow input URLs without explicit scheme
- add 'lang'/'language' metadata
- use str.format() instead of '+' to assemble URLs
2020-05-18 20:20:03 +02:00
Mike Fährmann
ab11b1c896
[imagechest] simplify code (#750) 2020-05-18 19:11:26 +02:00
Mike Fährmann
846d3a2466
[sexcom] replace 404ed test 2020-05-18 19:04:51 +02:00
Mike Fährmann
9b4635917f
[gelbooru] simplify and fix pool extraction
use 'pool:<pool id>' as search tag to get pool posts
2020-05-18 19:04:51 +02:00
Leonardo Taccari
39cd389679
[webtoons] Add a new extractor for webtoons.com (#761)
The webtoons extractor can extract single episodes and entire comics
(all episodes) from webtoons.com.

All the logic of the extractors should be trivial except for a couple
of kludges needed:

 - `ageGatePass' cookie is always set to avoid possible redirect and stop of
    extraction, especially in the comic extractor
 - The image URLs returned by the episode extractor could not be fetched
   directly and the `Referer:' HTTP header needs to be passed to fetch them

Close #593.
2020-05-18 19:04:20 +02:00
Bepis
7b5711ee04
[imagechest] Add new extractor for ImageChest (#750)
* [imagechest] Add new extractor for ImageChest

* [imagechest] Fix flake8 compliance issues
2020-05-18 19:02:56 +02:00
Mike Fährmann
a1e739b96c
reuse connection adapters from parent extractors 2020-05-12 23:52:01 +02:00
Mike Fährmann
f8f95e68a7
improve '--write-pages' (#737)
- move code into its own function
- add enumeration index to filenames
- dump responses regardless of status code
2020-05-12 20:40:25 +02:00
Mike Fährmann
09cc9dbec0
prevent flake8 errors from comments looking like type annotations 2020-05-12 20:08:05 +02:00
Mike Fährmann
2d6724180b
[hiperdex] update domain to hiperdex.info 2020-05-12 17:00:51 +02:00
Vrihub
4cc761c730
Implement --write-pages option (#736)
* Implement --write-pages option

* Fix long lines

* Fix file mode to binary

* Fix pattern for Windows compatibility
2020-05-12 14:25:21 +02:00
Mike Fährmann
f557cac074
[redgifs] add image extractor (#724) 2020-05-10 00:31:42 +02:00
Mike Fährmann
65b1cb7acd
[deviantart] use private access tokens for Journals (fixes #738) 2020-05-08 21:45:01 +02:00
Mike Fährmann
0bf0146bfe
[reddit] don't send OAuth headers for file downloads (fixes #729) 2020-05-08 21:42:52 +02:00