mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-23 19:22:32 +01:00
Commit Graph

2589 Commits

Author SHA1 Message Date
Mike Fährmann
f2e8aedd74
[twitter] changes to 'cards' option
- change default value to 'true'
- only invoke youtube-dl for cards unsupported by gallery-dl
  when 'cards' is set to "ytdl"

"cards": true   --> only download card images
"cards": "ytdl" --> download card images and
                    use youtube_dl on otherwise unsupported cards
2022-01-15 22:02:57 +01:00
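
The two 'cards' values described in this commit can be summarized in a small sketch. The function name `card_actions` and the action labels are hypothetical and not taken from the extractor; treating a false value as "skip cards entirely" is an assumption.

```python
def card_actions(cards):
    """Return the steps implied by a 'cards' option value (illustrative only)."""
    if not cards:
        # Assumption: a false value disables card handling altogether.
        return []
    actions = ["images"]  # both true and "ytdl" download card images
    if cards == "ytdl":
        # Only "ytdl" hands otherwise unsupported cards to youtube-dl.
        actions.append("ytdl")
    return actions


print(card_actions(True))
print(card_actions("ytdl"))
```
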
Mike Fährmann
2d34d8ff8b
[reddit] allow downloading from quarantined subreddits (#2180) 2022-01-14 21:55:59 +01:00
Mike Fährmann
17c9c47ca0
[hitomi] fix 'tag' extraction (fixes #2189) 2022-01-13 16:45:46 +01:00
Mike Fährmann
df2f0c09bb
[twitter] support "image_carousel_website" unified cards 2022-01-13 16:05:52 +01:00
Mike Fährmann
cdc96e1217
[gelbooru] improve video file detection (fixes #2188)
not all files from 'https://video-cdnN.gelbooru.com' are videos
2022-01-12 21:33:02 +01:00
Mike Fährmann
4acc31bd9f
[newgrounds] set suitabilities filter before starting a search 2022-01-11 23:50:29 +01:00
Mike Fährmann
170711af7e
[mangadex] fix extraction (closes #2177) 2022-01-08 17:21:35 +01:00
Mike Fährmann
199e7616a7
[rule34] use https://api.rule34.xxx for API requests 2022-01-08 17:14:50 +01:00
Mike Fährmann
37beb1298e
[newgrounds] add 'search' extractor (closes #2161) 2022-01-06 19:32:39 +01:00
Mike Fährmann
8b910dd8ae
[hitomi] fix image URLs
again and again ...
2022-01-06 18:21:26 +01:00
Mike Fährmann
3085aac4d8
[gelbooru] handle changed API response format (#2157) 2022-01-03 16:42:48 +01:00
Mike Fährmann
38e2af29d6
[hitomi] fix image URLs
update '_parse_gg()' yet again
2022-01-03 16:41:00 +01:00
Mike Fährmann
6f2e0c9c3d
fix cookie checks for patreon, fanbox, fantia
The changes in 9a255344 caused a warning about missing cookies to be
displayed even if those cookies were present, because _check_cookies()
did not account for an empty cookiedomain.
2022-01-01 17:55:58 +01:00
Mike Fährmann
1e0278702d
[hitomi] update '_parse_gg()' 2022-01-01 17:55:58 +01:00
Mike Fährmann
becc7f85a6
[hitomi] fix image URLs 2021-12-29 22:46:17 +01:00
Mike Fährmann
6af8d71da6
[kemonoparty] use service as subcategory (closes #2147) 2021-12-29 22:46:17 +01:00
Vrihub
96fcff182c
generic extractor (#735)
* Generic extractor, see issue #683

* Fix failed test_names test, no subcategory needed

* Prefix directory_fmt with "generic"

* Relax regex (would break some urls)

* Flake8 compliance

* pattern: don't require a scheme

This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).

* Fix using g: and r: on urls without http(s) scheme

Almost all extractors accept urls without an initial http(s) scheme.

Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).

This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.

This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.

* Relax imageurl_pattern_ext: allow relative urls

* First round of small suggested changes

* Support image urls starting with "//"

* self.baseurl: remove trailing slash

* Relax regexp (didn't catch some image urls)

* Some fixes and cleanup

* Fix domain pattern; option to enable extractor

Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
2021-12-29 22:39:29 +01:00
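
The subdomain problem fixed in this commit can be reproduced with a minimal sketch. The two patterns below are hypothetical simplifications written for illustration; the extractors' real "pattern" values are more involved.

```python
import re

# Hypothetical, simplified patterns (not the actual extractor patterns).
LOOSE = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")    # everything but a dot
STRICT = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")  # letters, digits, "_" and "-"

url = "r:foobar.tumblr.com"

# "[^.]+" also accepts ":" and therefore swallows the "r:" prefix.
loose = LOOSE.match(url)
print(loose.group(1))

# "[\w-]+" cannot cross the ":", so the prefixed URL no longer matches
# and can be handled by the recursive extractor instead.
print(STRICT.match(url))

# URLs without the prefix still match as before.
print(STRICT.match("foobar.tumblr.com").group(1))
```

Note that Python's `\w` also matches the underscore, so the strict class is slightly wider than "letters, digits and '-'".
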
Mike Fährmann
4376b39a2b
[sexcom] fix and improve embed extraction (fixes #2145) 2021-12-28 21:59:39 +01:00
Mike Fährmann
6d190834ee
[instagram] fix error when PostPage data is not in GraphQL format
(#2037)
2021-12-28 00:27:59 +01:00
Mike Fährmann
dd67e24aa9
[lolisafe] include file ID in filenames
More precisely, it now splits the full 'filename' into 'name' and 'id'
instead of overwriting 'filename'. The format string stays the same as
before. Use '{name}.{extension}' to restore the old behavior.

before:
- filename: foobar
- id      : 12345

now:
- filename: foobar-12345
- name    : foobar
- id      : 12345
2021-12-25 17:16:45 +01:00
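
A rough sketch of the 'name' / 'id' split described in this commit. The helper `split_filename` is hypothetical, and splitting on the last hyphen is an assumption; the commit message does not state how the extractor separates the two parts.

```python
def split_filename(filename):
    """Split 'foobar-12345' into ('foobar', '12345') at the last hyphen."""
    name, sep, file_id = filename.rpartition("-")
    if not sep:
        # No hyphen found: keep the whole string as the name.
        return filename, ""
    return name, file_id


meta = {"filename": "foobar-12345", "extension": "jpg"}
meta["name"], meta["id"] = split_filename(meta["filename"])

# '{name}.{extension}' yields the file name without the ID.
old_style = "{name}.{extension}".format_map(meta)
print(old_style)
```
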
Mike Fährmann
f3d61de18d
[artstation] create directories per asset (closes #2136) 2021-12-25 17:16:45 +01:00
Mike Fährmann
49a50fb2eb
[500px] create directories per photo 2021-12-25 17:16:45 +01:00
Mike Fährmann
89bebe1bef
[500px] add 'favorite' extractor (closes #1927) 2021-12-25 17:16:45 +01:00
Mike Fährmann
22b0433985
[fanbox] support pixiv redirects (closes #2122) 2021-12-25 17:15:39 +01:00
Mike Fährmann
281828b58b
[tumblrgallery] improve search pagination (fixes #2132) 2021-12-24 03:42:28 +01:00
Mike Fährmann
4bec34fc94
[pixiv] allow setting a date range for search results (#2133)
with the 'scd' and 'ecd' query parameters
2021-12-23 23:03:39 +01:00
Mike Fährmann
882c614281
add album extractor for lolisafe/chibisafe instances
- support bunkr.is (closes #2038)
- support zz.ht    (closes #2105)
2021-12-21 19:24:17 +01:00
Mike Fährmann
d441888bfb
[deviantart] adjust API endpoints
Start all endpoints with a forward slash '/'
to be consistent with other API interfaces.
2021-12-21 00:18:06 +01:00
Mike Fährmann
8f0cf0bf71
[deviantart] use '/browse/newest' for most-recent searches
(#2096)
2021-12-20 22:40:03 +01:00
Mike Fährmann
0bd7607da5
[tumblrgallery] improve 'id' extraction (#2115) 2021-12-19 05:46:02 +01:00
Mike Fährmann
0d02a7861e
[tumblrgallery] fix extraction (closes #2112) 2021-12-17 19:55:53 +01:00
Mike Fährmann
62692c6842
[exhentai] add 'source' option
setting it to "hitomi" downloads the corresponding gallery from
hitomi.la; might be extended to other sources in the future
2021-12-16 23:16:19 +01:00
Mike Fährmann
099ed72de7
[hitomi] disable extra 'metadata' by default
saves one HTTP request that is not needed with the default filename settings
2021-12-16 22:21:07 +01:00
Mike Fährmann
9a25534490
use Extractor._check_cookies() for all cookie checks 2021-12-16 02:21:16 +01:00
Mike Fährmann
63c6bc26b5
[rule34us] extract tags per category (#1527)
like for other boorus with 'tags': true
2021-12-16 00:06:52 +01:00
Mike Fährmann
f587458a3c
[twitter] include '4096x4096' as a default image fallback
(closes #2107, closes #1881)
2021-12-15 23:19:30 +01:00
Mike Fährmann
8ed282f7f2
[kemonoparty] support coomer.party URLs (#2100) 2021-12-15 16:21:05 +01:00
Mike Fährmann
87ce3fa669
[furaffinity] warn when no session cookies were found 2021-12-15 16:21:05 +01:00
Mike Fährmann
159631c808
[philomena] use a default 'filter_id' if none is given 2021-12-15 16:20:53 +01:00
Mike Fährmann
ad30653b17
allow running a BaseExtractor for any URL
by prefixing it with '<base-category>:'

For example:
  shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack
  gelbooru_v01:https://5naf.booru.org/index.php?page=post&s=view&id=46963

Available base categories are:
  mastodon, shopify, moebooru, gelbooru_v01, gelbooru_v02,
  reactor, foolslide, foolfuuka, philomena
2021-12-15 00:32:17 +01:00
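
The '<base-category>:' prefix from this commit could be separated from the URL roughly as follows. This is a sketch only: `split_base_category` is a hypothetical helper, not gallery-dl's actual lookup code; the category names are the ones listed in the commit message.

```python
BASE_CATEGORIES = {
    "mastodon", "shopify", "moebooru", "gelbooru_v01", "gelbooru_v02",
    "reactor", "foolslide", "foolfuuka", "philomena",
}


def split_base_category(url):
    """Return (base_category, url) or (None, url) if no known prefix is present."""
    prefix, sep, rest = url.partition(":")
    if sep and prefix in BASE_CATEGORIES:
        return prefix, rest
    # Ordinary URLs ("https:...") fall through unchanged.
    return None, url


print(split_base_category(
    "shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack"))
print(split_base_category("https://example.org/"))
```
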
Mike Fährmann
299bd2f1f5
[rule34us] add 'tag' and 'post' extractors (#1527) 2021-12-14 00:27:46 +01:00
Mike Fährmann
3cf1075d86
[inkbunny] add 'search' extractor (closes #2094) 2021-12-12 03:08:14 +01:00
Mike Fährmann
c6a23c26d7
[instagram] allow downloading specific stories (closes #2088)
https://instagram.com/stories/<USER>/<ID> now only downloads the one
story specified by <ID> and not all stories from that user.
2021-12-11 21:34:25 +01:00
Mike Fährmann
352ffcddb0
[instagram] match post URLs with usernames (fixes #2085) 2021-12-10 18:37:33 +01:00
Mike Fährmann
f4e3cee6ac
use yt-dlp by default (#1850, #2028) 2021-11-29 18:24:26 +01:00
Mike Fährmann
f1b142e993
[kemonoparty] change default 'files' order to attachments,file,inline
(#1991)
2021-11-29 04:41:30 +01:00
Mike Fährmann
275543b2d2
update extractor test results 2021-11-27 19:26:44 +01:00
Mike Fährmann
e7ea4f2567
[mangoxo] fix metadata extraction 2021-11-27 18:19:51 +01:00
Mike Fährmann
e298882acc
[kemonoparty] match URLs with www subdomain 2021-11-26 18:58:26 +01:00
Mike Fährmann
addb72e1bb
[reactor] support thatpervert.com (closes #2029) 2021-11-26 18:58:07 +01:00
Mike Fährmann
d8d9502e1e
[reactor] inherit from BaseExtractor 2021-11-26 18:58:07 +01:00
Mike Fährmann
f4ea216c95
[shopify] support loungeunderwear.com (closes #2053) 2021-11-26 18:58:06 +01:00
Mike Fährmann
93cef78450
[gelbooru] work around pagination limits
Gelbooru only allows retrieving the latest 20k posts for a tag search.
Add 'id:<N' to the search tags to work around that limitation, where N
is the ID of the last retrieved post.

http://gelbooru.me/index.php?page=forum&s=view&id=1467
2021-11-26 18:56:31 +01:00
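
The 'id:<N' workaround from this commit can be sketched against a simulated API. Everything below is hypothetical: the `fetch` callable, the post layout, and the way the page limit is modelled are stand-ins for illustration, not Gelbooru's real API or the extractor's code.

```python
def make_fake_fetch(total, page_limit):
    """Build a fake search backend that refuses pages beyond 'page_limit'."""
    ids = list(range(total, 0, -1))  # newest post first

    def fetch(query, page, per_page):
        if page >= page_limit:
            return []  # simulated server-side pagination limit
        upper = total + 1
        for token in query.split():
            if token.startswith("id:<"):
                upper = int(token[4:])
        visible = [i for i in ids if i < upper]
        start = page * per_page
        return [{"id": i} for i in visible[start:start + per_page]]

    return fetch


def paginate(fetch, tags, per_page, page_limit):
    """Yield all posts, restarting with 'id:<N' whenever the limit is reached."""
    query = tags
    page = 0
    while True:
        posts = fetch(query, page, per_page)
        if not posts:
            return
        yield from posts
        page += 1
        if page >= page_limit:
            # N is the ID of the last retrieved post; start over from page 0.
            query = "{} id:<{}".format(tags, posts[-1]["id"])
            page = 0


fetch = make_fake_fetch(total=25, page_limit=2)
posts = list(paginate(fetch, "some_tag", per_page=5, page_limit=2))
print(len(posts))
```

Without the restart, only the first `per_page * page_limit` posts of the simulated search would be reachable.
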
Mike Fährmann
f2ae179713
[exhentai] fix extraction for disowned galleries (closes #2055) 2021-11-24 21:26:16 +01:00
Alice
612850438e
[skeb] add 'thumbnails' option (#2047) (#2051) 2021-11-23 21:16:42 +01:00
Mike Fährmann
11a3d96d13
[mangadex] load additional metadata using includes[] directives
- always provide 'artist', 'author', and 'group' metadata fields (#2049)
- remove 'metadata' option
2021-11-22 01:16:33 +01:00
Mike Fährmann
19e00f1322
[dynastyscans] provide 'date' as proper datetime object (#2050) 2021-11-21 22:50:52 +01:00
Mike Fährmann
af6424f398
allow testing metadata in list elements 2021-11-21 22:46:34 +01:00
Mike Fährmann
c67756e187
[kemonoparty] add 'dms' option (#2008) 2021-11-20 23:36:16 +01:00
Mike Fährmann
3a7a19c7b9
[dynastyscans] add 'manga' extractor (closes #2035) 2021-11-19 22:51:26 +01:00
Mike Fährmann
9bc83af3a6
[kemonoparty] 'postfile' -> 'file' (#1991)
to stay consistent with the existing file types for kemono
2021-11-19 01:50:48 +01:00
Mike Fährmann
522782c09d
[subscribestar] emit metadata for posts without media (#1569) 2021-11-18 23:42:17 +01:00
Mike Fährmann
1c8aaf9318
[subscribestar] add 'num' enumeration index (closes #2040) 2021-11-18 23:38:41 +01:00
Mike Fährmann
d433735750
[kemonoparty] skip duplicate files (#2032, #1991, #1899)
Extract the SHA-256 file hash from URLs
and skip files with the same hash in the same post.

- provide a 'hash' metadata field (empty string if not available)
- remove 'patreon-skip-file' option
2021-11-17 22:44:15 +01:00
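
A minimal sketch of hash-based duplicate skipping as described in this commit. The URL layout is an assumption (a 64-character lowercase hex path component), and `file_hash` / `unique_files` are hypothetical helpers rather than the extractor's implementation.

```python
import re

# Assumption: the SHA-256 hash is a 64-character hex component of the URL path.
HASH_RE = re.compile(r"/([0-9a-f]{64})(?:\.|/|$)")


def file_hash(url):
    """Return the hash found in 'url', or an empty string if there is none."""
    match = HASH_RE.search(url)
    return match.group(1) if match else ""


def unique_files(urls):
    """Return (url, hash) pairs for one post, skipping repeated hashes."""
    seen = set()
    result = []
    for url in urls:
        digest = file_hash(url)
        if digest:
            if digest in seen:
                continue  # same content already queued for this post
            seen.add(digest)
        # Files without a hash cannot be compared and are always kept.
        result.append((url, digest))
    return result


h1, h2 = "a" * 64, "b" * 64
urls = [
    "https://example.org/data/aa/aa/" + h1 + ".png",
    "https://example.org/files/aa/aa/" + h1 + ".png",
    "https://example.org/data/bb/bb/" + h2 + ".jpg",
    "https://example.org/inline/picture.png",
]
for url, digest in unique_files(urls):
    print(digest[:8] or "(none)", url)
```
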
Mike Fährmann
d4ec245554
[kemonoparty] implement a 'files' option (#1991)
similar to 8d676151
2021-11-17 22:43:41 +01:00
Mike Fährmann
ab8eea1a24
[twitter] fix extractor for direct image links (fixes #2030) 2021-11-16 22:57:46 +01:00
Mike Fährmann
2076d40681
[ytdl] improve error handling (#1680) 2021-11-15 22:56:42 +01:00
Mike Fährmann
2aaac3c997
[instagram] include user metadata for 'tagged' downloads (#2024)
Adds
- tagged_owner_id
- tagged_full_name
- tagged_username
containing the values for the user profile the URL originated from,
e.g. 'instagram' for https://www.instagram.com/instagram/tagged/.
2021-11-15 21:21:59 +01:00
Mike Fährmann
cfa4876848
[philomena] support furbooru.org (closes #1995) 2021-11-15 20:57:51 +01:00
Mike Fährmann
4377f1c284
[twitter] distinguish between fatal & nonfatal errors (#2020)
only show a warning for nonfatal errors
and do not raise a StopExtraction exception
2021-11-13 22:46:40 +01:00
Kyle Anthony Williams
a14b72be21
[webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net (#2005)
* [webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net

This trick to avoid having to set a Referer header comes from
Webtoon's RSS feeds. The two URLs below are equivalent in content:

https://webtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
https://swebtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90

The URL with the domain "webtoon-phinf.pstatic.net" needs a Referer
header, and the domain "swebtoon-phinf.pstatic.net" does not. This
is because of the environment "swebtoon" images live in, one without
explicit network control: RSS feeds on sites such as Feedly. This change should
make it easier for gallery-dl developers to embed Webtoon comics without
worrying about headers.
2021-11-11 20:03:34 +01:00
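
The domain swap described in this commit amounts to a one-line rewrite. The helper name `without_referer` is hypothetical; the example URL is the one quoted in the commit message.

```python
def without_referer(url):
    """Rewrite a webtoon-phinf image URL to its swebtoon-phinf equivalent."""
    # Including "://" keeps URLs that already use "swebtoon-phinf" unchanged.
    return url.replace("://webtoon-phinf.", "://swebtoon-phinf.", 1)


src = ("https://webtoon-phinf.pstatic.net/20210929_153/"
       "1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90")
print(without_referer(src))
```
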
Mike Fährmann
6e3658ef52
[kemonoparty] provide 'date' metadata for gumroad (#2007)
Not the 'published' or 'edited' values since they are 'null',
but still better than nothing at all.
2021-11-11 19:38:10 +01:00
Mike Fährmann
37c9dedee1
[seisoparty] remove module 2021-11-09 22:41:04 +01:00
Mike Fährmann
efa178cc91
[ytdl] implement parsing ytdl command-line options (#1680)
- adds 'config-file' and 'cmdline-args' options
  for both ytdl downloader and extractor
- create 'ytdl' helper module, which combines YoutubeDL creation
  and option parsing.
- most likely a buggy mess due to incompatibilities between the
  original youtube-dl and yt-dlp.
2021-11-07 02:44:11 +01:00
Mike Fährmann
7cb303d745
[redgifs] improve URL extraction
Fields inside 'urls' can be None, which would have caused an exception
with the old method.
2021-11-05 20:02:43 +01:00
Mike Fährmann
2befed1a96
[redgifs] update search URL pattern (#1984) 2021-11-05 20:00:06 +01:00
Mike Fährmann
b315a0ecef
[redgifs] update to API v2 (#1984) 2021-11-04 21:31:20 +01:00
Mike Fährmann
f0fc3b0ba1
[kemonoparty] add 'comments' option (#1980) 2021-11-03 23:02:13 +01:00
Mike Fährmann
1fac74b14d
[reddit] prevent crash for galleries with no 'media_metadata'
(fixes #2001)
2021-11-03 17:55:40 +01:00
Mike Fährmann
211de95dd0
update extractor test results 2021-11-01 02:58:53 +01:00
Mike Fährmann
8bea02c38c
[deviantart] fix 'index' values for stashed deviations 2021-11-01 01:08:24 +01:00
Mike Fährmann
dd88a7d980
[cyberdrop] restore video extraction (fixes #1993)
fixes a regression introduced in f33c2ef7
2021-10-31 04:34:01 +01:00
Mike Fährmann
fa5646eadc
[mangoxo] fix login and extraction 2021-10-31 02:16:13 +01:00
Mike Fährmann
4c49174579
[mangakakalot] update domain and fix extraction 2021-10-31 02:16:13 +01:00
YongChan Cho
14852f7050
[hitomi] fix image path (#1988) 2021-10-30 21:45:01 +02:00
Mike Fährmann
dad2875a3e
fix calculating retry sleep times (fixes #1990) 2021-10-29 23:53:48 +02:00
Mike Fährmann
9156e90f1f
[twitter] add 'pinned' option 2021-10-29 22:10:58 +02:00
Mike Fährmann
06b414c9a3
[redgifs] 'gfyId' -> 'id' (#1984) 2021-10-29 02:05:39 +02:00
Ryu juheon
d4614e5ba4
[hitomi] fix image URLs (#1982) 2021-10-28 19:29:48 +02:00
Mike Fährmann
6434ccf9e8
[redgifs] split from 'gfycat' (#1984)
Update API endpoints and metadata names - mostly 'gfycat' -> 'gif' -
and remove some obsolete checks.
2021-10-28 19:22:41 +02:00
Mike Fährmann
e4696b40ba
[instagram] update query hashes 2021-10-27 21:37:31 +02:00
Alice
bfd7401b1e
[skeb] add 'user' and 'post' extractors (#1031) (#1971)
* Create skeb.py

* Update __init__.py

* Update supportedsites.py

* Update supportedsites.md

* Update supportedsites.py

* Update skeb.py
2021-10-26 20:00:41 +02:00
Ryu juheon
6b6d92d51c
[hitomi]: fix image URLs (#1975) 2021-10-26 19:35:01 +02:00
Mike Fährmann
dcb201ff19
[gfycat] show warning when there are no available formats 2021-10-26 19:26:50 +02:00
Mike Fährmann
e436a2607b
[gfycat] consistent 'userName' values for 'user' downloads (#1962)
by using the name from the input URL and not relying on possibly faulty
or incomplete API results.

'userData[username]', if available, will still have the original name.
2021-10-26 19:15:30 +02:00
Mike Fährmann
f1487a3cfa
[kemonoparty:discord] improve 'inline' extraction (#1940)
- extract media.discordapp.*NET* URLs
- rewrite media.discordapp.net to cdn.discordapp.com
- use a more restricted set of characters for the URL path
2021-10-24 21:15:21 +02:00
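
The three points of this commit can be sketched together. The regular expression and the set of permitted path characters are assumptions made for illustration; the extractor's actual pattern is not shown in the commit message.

```python
import re

# Hypothetical pattern: both Discord hosts, with a restricted path character set.
INLINE_RE = re.compile(
    r"https?://(?:cdn\.discordapp\.com|media\.discordapp\.net)/[\w./%-]+")


def extract_inline(text):
    """Find inline Discord URLs and normalize them to cdn.discordapp.com."""
    return [
        url.replace("//media.discordapp.net/", "//cdn.discordapp.com/", 1)
        for url in INLINE_RE.findall(text)
    ]


text = ("see (https://media.discordapp.net/attachments/1/2/a.png) and "
        "https://cdn.discordapp.com/attachments/3/4/b.jpg")
print(extract_inline(text))
```

Because the path class excludes characters such as ")" and whitespace, surrounding punctuation in the message text is not pulled into the URL.
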
Mike Fährmann
02a247f4e5
[deviantart] full resolution for non-downloadable images (#293)
Many thanks to @Ironchest337 for discovering this method
and providing a well-documented implementation.
2021-10-24 21:11:12 +02:00
Mike Fährmann
a7ddb5f5fa
[deviantart] update 'search' argument handling (fixes #1911)
- use 'alltime' by default
- support newer 'order' values (most-recent, this-week, etc)
2021-10-23 21:48:02 +02:00
Mike Fährmann
c19e762fdf
[vk] add 'album' extractor (#474, fixes #1952)
todo: better metadata for albums
2021-10-23 00:46:20 +02:00
Mike Fährmann
8bb442f20d
[redgifs][gfycat] provide fallback URLs (fixes #1962)
and extend the 'format' option
2021-10-22 22:47:29 +02:00