mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-23 19:22:32 +01:00

2816 Commits

Author SHA1 Message Date
Mike Fährmann
1bc77efa02
[artstation] use "browser": "firefox" by default (#2527) 2022-05-02 09:03:13 +02:00
Mike Fährmann
a39e7b7366
[vk] handle photos without width/height info (fixes #2535) 2022-05-02 09:03:00 +02:00
Federico Ravasio
0381752575
[photovogue] switch to .com, update api endpoint (#2494) 2022-04-27 22:37:53 +02:00
Mike Fährmann
3f02e483c6
[e621] fix applying request_interval_min (#2533)
Setting this property after calling Extractor.__init__() has no effect.
2022-04-27 21:10:34 +02:00
Mike Fährmann
afde76269c
[weibo] fix infinite retries for deleted accounts (fixes #2521) 2022-04-27 20:23:11 +02:00
Mike Fährmann
d85e66bcac
[vk] fix extraction (#2512)
Use a different API endpoint, since thumbnail URLs from the old one
cannot be transformed into URLs for "original" photos anymore.
2022-04-21 14:01:50 +02:00
Mike Fährmann
9e6ff42a9d
[pixiv] implement 'background' option (#623, #1124, #2495) 2022-04-21 13:53:02 +02:00
Mike Fährmann
4d1896830f
[mangadex] download chapters with 'externalUrl' (fixes #2503)
if they have pages hosted on mangadex
2022-04-18 18:09:52 +02:00
Mike Fährmann
97e8a15295
[deviantart] implement 'pagination' option (#2488) 2022-04-18 18:08:01 +02:00
Mike Fährmann
1f9a0e2fd8
update extractor test results 2022-04-18 17:24:00 +02:00
Mike Fährmann
ad5a4b1756
[twitter] fix various syndication issues
- handle retweets
- fix videos without dimensions in URL (3e942a58)
- fix '"retweets": "self"' filter (#2499)
2022-04-15 20:49:26 +02:00
Mike Fährmann
12bd9ba33a
[readcomiconline] add 'quality' option (#2467) 2022-04-15 18:10:37 +02:00
Mike Fährmann
60ad46ddcc
[readcomiconline] unobfuscate image URLs (#2481) 2022-04-15 18:04:09 +02:00
Mike Fährmann
a6c4ff58fb
[cyberdrop] match cyberdrop.to URLs (closes #2496) 2022-04-15 15:39:29 +02:00
Mike Fährmann
13ed18b9aa
[lolisafe] fix typo
LolisafelbumExtractor -> LolisafeAlbumExtractor
2022-04-15 15:02:30 +02:00
Mike Fährmann
3e942a58be
[twitter] improve syndication video selection (#2354)
- ignore .m3u8 manifests
- always select largest format
2022-04-11 17:06:10 +02:00
Mike Fährmann
0794027100
[issuu] fix extraction (#2483) 2022-04-10 14:23:10 +02:00
Mike Fährmann
5d5a08cc69
[sexcom] add fallback for empty files (#2485) 2022-04-10 14:22:07 +02:00
thatfuckingbird
4527a35aba
[twitter] accept fxtwitter.com URLs (#2484) 2022-04-08 14:32:08 +02:00
Mike Fährmann
c1768972c2
[newgrounds] update and fix pagination (#2456) 2022-04-07 15:38:41 +02:00
Mike Fährmann
78e5d0c423
[kissgoddess] extract all images (closes #2473)
and not only the first two per page
https://github.com/mikf/gallery-dl/issues/1052#issuecomment-1047367383
2022-04-06 21:28:40 +02:00
Mike Fährmann
0b33435da5
[pinterest] support multiple files per pin (closes #1619, #2452) 2022-04-06 21:21:33 +02:00
Mike Fährmann
9c5d2d7af3
[pinterest] add extractor for created pins (#2452) 2022-04-01 16:59:58 +02:00
Mike Fährmann
1171911dc3
[twitter] add 'syndication' option (#2354)
to fetch age-restricted content using Twitter's syndication API
2022-04-01 16:56:47 +02:00
Mike Fährmann
a53cfc845e
[newgrounds] warn about age-restricted posts (#2456) 2022-03-30 16:18:33 +02:00
Mike Fährmann
ecee315bbf
[mangasee] unescape manga names (fixes #2454) 2022-03-30 16:18:18 +02:00
loragja
7e545a3ae9
[gofile] add gofile.io extractor (#2364)
* Add gofile extractor

* add gofile extractor to module list

* add support for tiny monitors and ancient python versions

* seriously, f-strings are not *that* new...

* i love flake8 :)

* add 'api-token' and 'recursive' options
* add tests
2022-03-29 17:31:57 +02:00
Layerex
625f4d4cc4
[telegraph] Add telegra.ph extractor (#2312) 2022-03-28 19:18:13 +02:00
Mike Fährmann
48cc4853be
[skeb] refactor 'sent-requests' and add tests 2022-03-28 11:26:24 +02:00
Mike Fährmann
37d584a9b2
[hitomi] update metadata extraction (fixes #2444)
remove 'hitomi.metadata' option, as it is no longer necessary
to make additional HTTP requests to fetch all metadata.
2022-03-26 12:46:18 +01:00
Mike Fährmann
b03ca7f10c
[aryion] provide correct 'date' independent of dst 2022-03-24 22:57:18 +01:00
Mike Fährmann
ba69fb669d
[kemonoparty] add 'duplicates' option (closes #2440) 2022-03-24 11:58:38 +01:00
Mike Fährmann
29db716a63
implement 'datetime_to_timestamp()'
and rename 'to_timestamp()'
to the more descriptive 'datetime_to_timestamp_string()'
2022-03-23 22:36:01 +01:00
Mike Fährmann
9313d4dc10
[pinterest] do not force 'm3u8_native' for video downloads (#2436) 2022-03-21 10:11:51 +01:00
Mike Fährmann
42f2fd2ed7
[twibooru] fix posts without 'name' (fixes #2434) 2022-03-21 10:08:37 +01:00
chinggg
6f1d5e8ab9
[unsplash] replace dash with space in search API queries (#2429) 2022-03-19 16:00:05 +01:00
Mike Fährmann
f8230dde43
[instagram] add 'previews' option (#2135) 2022-03-19 15:26:40 +01:00
Mike Fährmann
500a479026
fix a third(!) bug in _check_cookies() (#2372)
turns out tests are worthless if you get em wrong ...
2022-03-18 19:52:37 +01:00
Mike Fährmann
c4cc387f7d
[furaffinity] fix search result pagination (fixes #2402) 2022-03-18 13:44:36 +01:00
Mike Fährmann
281a5b3b28
[newgrounds] fix video descriptions (#2328) 2022-03-14 08:38:20 +01:00
Mike Fährmann
b1b15d6cef
[imagebam] add support for /view/ paths (closes #2378) 2022-03-14 08:38:20 +01:00
Mike Fährmann
e64c2b85d0
[fantia] apply patch (#2381)
from @thatfuckingbird with small adjustments

https://github.com/mikf/gallery-dl/issues/2381#issuecomment-1063208696
2022-03-11 18:02:31 +01:00
Mike Fährmann
f31ab0d2ec
[fanbox] fetch data for each individual post (fixes #2388)
Posts from 'https://api.fanbox.cc/post.listCreator'
do not contain a 'body' with all images anymore.

https://github.com/mikf/gallery-dl/pull/1459#discussion_r614322881
2022-03-11 17:36:05 +01:00
Mike Fährmann
fc277fa45f
[seiga] require authentication with 'user_session' cookie (#2372)
Login with username & password would now require entering a 2FA token.

see also 7b009cc893
2022-03-11 02:10:15 +01:00
Mike Fährmann
47cf05c4ab
refactor proxy handling code (#2357)
- allow gallery-dl proxy settings to overwrite environment proxies
- allow specifying different proxies for data extraction and download
  - add 'downloader.proxy' option
  - '-o extractor.proxy=PROXY_URL -o downloader.proxy=null'
    now has the same effect as youtube-dl's '--geo-verification-proxy'
2022-03-10 23:55:35 +01:00
Mike Fährmann
d50a1ec2cc
[subscribestar] unescape attachment URLs (fixes #2370) 2022-03-09 19:06:04 +01:00
Mike Fährmann
3ddc620ef6
[skeb] fix post extractor (#2330) 2022-03-09 18:45:07 +01:00
Orkun Koçyiğit
eb2bb7d998
[fantia] add 'num' enumeration index (#2377)
* Adding numerical ordering to fantia

* Fixed line to fit PEP8 line size limit
2022-03-08 22:06:41 +01:00
Mike Fährmann
fac8047899
[kemonoparty] limit default filename length (#2373) 2022-03-08 21:14:47 +01:00
Mike Fährmann
bfa5e61900
[patreon] add explicit 'image_large' file type (#2257)
to allow more control over when and if to download 'large_url' images

4fee3a0e52 forced them to be downloaded
instead of regular images, even though 'large_url' images are most likely
an upscaled version of the original.
2022-03-06 17:07:13 +01:00
Mike Fährmann
6ea3ff5173
[tumblr] notify users about registering an oauth application
if they hit the daily rate limit and are using default API credentials
2022-03-06 16:28:53 +01:00
Mike Fährmann
b5236656d5
[deviantart] notify users about registering an oauth application
if they get repeated 429 errors and are using default API credentials
2022-03-06 16:24:39 +01:00
Mike Fährmann
2aa47e8382
[twitter] handle Tweets with "softIntervention" entries
or other such things where the actual Tweet data is one level deeper
than usual
2022-03-03 02:06:54 +01:00
Mike Fährmann
64bbc7969d
[twitter] warn about age-restricted Tweets (#2354) 2022-03-03 02:03:27 +01:00
Mike Fährmann
e778be52bc
[twitter] update query hashes 2022-03-02 23:05:31 +01:00
Mike Fährmann
bddcec49f1
implement 'text.root_from_url()'
use domain from input URL for kemono
2022-03-01 03:09:57 +01:00
Mike Fährmann
92c492dc09
[kemonoparty] match beta.kemono.party URLs (#2348) 2022-03-01 03:02:30 +01:00
Mike Fährmann
4ea9157d51
[mangadex] fix chapters without 'translatedLanguage' (#2352) 2022-03-01 02:04:25 +01:00
Alice
f1cab23724
[skeb] add 'sent-requests' option (#2322) (#2330)
* Update skeb.py

* Update configuration.rst

* flake8
2022-02-28 22:42:15 +01:00
dragobit
781fdfa212
[hentaicosplays] add Referer to headers (#2317) 2022-02-28 22:19:32 +01:00
Mike Fährmann
4385a34e05
[twitter] fix handling of 429 responses (fixes #2339)
Twitter doesn't return a valid JSON response for 429 errors anymore.
2022-02-28 16:42:55 +01:00
Mike Fährmann
5a50569360
[toyhouse] support 'art' listings (#1546, #2331) 2022-02-27 16:22:50 +01:00
Mike Fährmann
1c79044433
[imagebam] set 'nsfw_inter' cookie (fixes #2334) 2022-02-27 16:12:28 +01:00
Mike Fährmann
d71c173150
[newgrounds] strip incomplete HTML tag from '_comment' (#2328) 2022-02-23 21:42:28 +01:00
Mike Fährmann
cf58048bd4
[newgrounds] add 'post_url' metadata field (#2328) 2022-02-23 00:00:23 +01:00
Mike Fährmann
7aa2e2cd84
[slideshare] fix extraction 2022-02-21 02:52:45 +01:00
Mike Fährmann
fdfdc1b614
[kissgoddess] add 'gallery' and 'model' extractors
(closes #1052, #2304)
2022-02-20 04:45:37 +01:00
Mike Fährmann
79a461a2c1
[mememuseum] add 'tag' and 'post' extractors (closes #2264) 2022-02-20 02:15:38 +01:00
Mike Fährmann
e5f6af6e32
[oauth:pixiv] add note about 'code' expiring in 30 seconds (#2306) 2022-02-19 23:47:30 +01:00
Mike Fährmann
bbc4190017
[bunkr] fix .mp4 downloads (#2239)
again ...
2022-02-19 03:55:14 +01:00
Mike Fährmann
254a5b26e0
[twibooru] add extractors for searches, galleries, and posts
(#2219)
2022-02-18 23:43:57 +01:00
Mike Fährmann
9ebc20e290
[booru] call nameext_from_url() before update() and _prepare()
to be able to overwrite filename and extension in _prepare()
2022-02-18 00:37:59 +01:00
Mike Fährmann
4fee3a0e52
[patreon] download 'large_url' images if available (#2257) 2022-02-17 18:23:59 +01:00
Mike Fährmann
f5b2b9333f
fix another bug in _check_cookies() (#2160)
regression introduced in ed317bfc

Added a couple of tests to hopefully catch such bugs
before they land in a release.
2022-02-16 22:58:57 +01:00
Ailothaen
203a04a4a3
[reddit] Support of standalone submissions on personal pages of users (#2301)
* [reddit] Support of submissions on personal pages of users

* [reddit] Design improvement for user submissions

* [reddit] Removed functions declared twice
2022-02-13 23:03:46 +01:00
Mike Fährmann
806bc62379
[redgifs] support 'i.redgifs.com' URLs (closes #2300) 2022-02-13 23:00:50 +01:00
Mike Fährmann
655b2de5d9
[vk] fix infinite pagination loops (fixes #2297) 2022-02-13 23:00:50 +01:00
Mike Fährmann
cc5b1ce91a
[inkbunny] rename search parameters to their API equivalents
(fixes #2292)
2022-02-13 23:00:49 +01:00
Mike Fährmann
ed317bfcf1
warn about cookies expiring in less than 24 hours
requires an expiration timestamp,
so this only works with cookies from a cookies.txt file
2022-02-13 23:00:49 +01:00
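The expiry check described in this commit can be sketched as follows. This is a minimal illustration, not gallery-dl's actual code: the function name `expiring_soon`, the `Cookie` stand-in, and the `threshold` parameter are assumptions. It relies only on each cookie exposing `name` and `expires` (seconds since the epoch, or `None` when no expiration timestamp is known), as `http.cookiejar.Cookie` objects do.

```python
import time
from collections import namedtuple

# Stand-in for http.cookiejar.Cookie; only 'name' and 'expires' are used.
Cookie = namedtuple("Cookie", ["name", "expires"])


def expiring_soon(cookies, now=None, threshold=24 * 3600):
    """Return the names of cookies that expire within 'threshold' seconds."""
    now = time.time() if now is None else now
    names = []
    for cookie in cookies:
        if cookie.expires is None:
            # No expiration timestamp available, so nothing can be checked.
            continue
        if 0 <= cookie.expires - now < threshold:
            names.append(cookie.name)
    return names


# Hypothetical cookies, with 'now' fixed at 1000 to keep the example stable.
jar = [
    Cookie("session", 1000 + 3600),         # expires in one hour
    Cookie("longlived", 1000 + 48 * 3600),  # expires in two days
    Cookie("no_expiry", None),              # no timestamp
]
soon = expiring_soon(jar, now=1000)
```

Cookies without a timestamp are skipped rather than reported, which matches the note that the warning only works with cookies loaded from a cookies.txt file.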
David Hoppenbrouwers
b17e2dcf93
[wallpapercave] add extractor for images (#2205) 2022-02-11 23:44:51 +01:00
v-delta
c661737f36
[Imgbox] Fix ImgboxExtractor (#2281) 2022-02-11 22:17:02 +01:00
Thomas Jost
a7de819aca
[lightroom] add Lightroom gallery extractor (#2263) 2022-02-11 21:30:59 +01:00
Mike Fährmann
563bd0ecf4
[danbooru] inherit from BaseExtractor
- merge danbooru and e621 code
- support booru.allthefallen.moe (closes #2283)
- remove support for old e621 tag search URLs
2022-02-11 21:01:51 +01:00
Mike Fährmann
bc0e853d30
combine KeyError & IndexError to common base class LookupError 2022-02-11 00:42:49 +01:00
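As a short illustration of why this works: `KeyError` and `IndexError` are both subclasses of Python's built-in `LookupError`, so one `except` clause covers both. The helper `first_tag` and the post layout below are hypothetical and not taken from gallery-dl.

```python
def first_tag(post):
    """Return the first tag of a post dict, or None if it has none."""
    try:
        return post["tags"][0]
    except LookupError:
        # Covers a missing 'tags' key (KeyError)
        # as well as an empty tag list (IndexError).
        return None


missing_key = first_tag({})
empty_list = first_tag({"tags": []})
regular = first_tag({"tags": ["landscape", "sky"]})
```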
Mike Fährmann
f1c853c6ef
[furaffinity] add 'layout' option (#2277)
to be able to force gallery-dl to parse according to a specific layout
in case its auto-detect fails
2022-02-11 00:28:47 +01:00
Mike Fährmann
b4f8e15a1f
allow BaseExtractors to use the domain of the matched URL 2022-02-10 01:38:50 +01:00
Mike Fährmann
a57a44f510
[kemonoparty] handle files without 'name' (fixes #2276) 2022-02-08 18:27:05 +01:00
Mike Fährmann
4efe56f419
[furaffinity] improve new/old layout detection (fixes #2277) 2022-02-08 18:10:52 +01:00
Mike Fährmann
0f1e7ff319
[twitter] fix extraction (#2275) 2022-02-07 23:18:35 +01:00
Mike Fährmann
dee0d22561
update extractor test results 2022-02-06 21:39:24 +01:00
Mike Fährmann
d7b8e04b50
[kemonoparty] use 'Accept-Encoding: identity' for all downloads
(#2267)

fixes issues when data sent with 'Content-Encoding: gzip' or other
encodings is larger than the actual file
2022-02-05 18:06:58 +01:00
enormous-muscles
55326377d8
Add Kohlchan extractor (#2251) 2022-02-04 23:22:17 +01:00
Mike Fährmann
cc7dce5755
[sexcom] add 'pins' extractor (closes #2265) 2022-02-04 20:55:00 +01:00
Mike Fährmann
02e18f56be
[e621] add 'favorite' extractor (closes #2250) 2022-02-04 20:54:48 +01:00
Mike Fährmann
70e6e1549e
[twitter] provide fallback URLs for card images
f2e8aedd74 (commitcomment-64057751)
2022-02-03 23:43:18 +01:00
Mike Fährmann
86fa412b47
[hitomi] add 'format' option (#2260)
default is 'webp' since downloading original files is no longer allowed
2022-02-03 23:32:19 +01:00
Mike Fährmann
492436f936
[twitter] add 'warnings' option (#2258)
disable reporting any non-fatal errors by default
2022-02-02 18:37:19 +01:00
Mike Fährmann
a5163e4c70
[twitter] restore 'logout' functionality (#1719) 2022-02-02 18:21:15 +01:00
Mike Fährmann
f58364f6a8
update Firefox cipher list 2022-02-01 02:33:01 +01:00
Mike Fährmann
7e6981dda6
rename 'disabletls12' to 'tls12'
and let config options override any default settings
2022-02-01 01:37:03 +01:00
Mike Fährmann
bb3e182562
overhaul session initialization
- share adapter & connection pool across sessions with the same
  ssl options, ssl ciphers, and source address
- simplify browser emulation to just a list of headers and ciphers
2022-01-31 23:12:08 +01:00
Mike Fährmann
e670dc518e
[weibo] update pagination code (fixes #2244)
- send proper headers and query parameters
- use 'since_id' instead of page numbers
- set a 1-2 second delay between requests
2022-01-31 19:16:01 +01:00
Robert Pendell
4c651f6252
[patreon] Disable TLS 1.2 by default (#2249)
Disables TLS 1.2 on Patreon by default.
2022-01-30 23:30:44 +01:00
Robert Pendell
392cf079f7
Add ability to disable TLS 1.2 (#2243)
Fix for Patreon Cloudflare issues by having only TLS v1.3 or higher establish HTTPS connections

This now allows you to disable it on a per-host or global basis.  Add disabletls12 as a config option either under extractor.(host) or just under extractor.  Option is false by default.

Example:
        "patreon":
        {
            "disabletls12": true,
            "cookies": {
                "session_id": "X"
            }
        }
2022-01-30 22:14:43 +01:00
Mike Fährmann
d33227fc38
[twitter] restore errors for protected timelines etc (fixes #2237) 2022-01-30 16:42:13 +01:00
Mike Fährmann
ebd3d5c1cc
[bunkr] fix .mp4 downloads (closes #2239) 2022-01-28 23:21:16 +01:00
Mike Fährmann
e2be199124
[gelbooru] improve and fix pagination (#2230, #2232)
Use 'id:<POSTID' as a tag instead of going through pages with 'pid'.

Something similar was already implemented in 93cef784,
but that got broken again in 3085aac4.
2022-01-27 17:44:47 +01:00
Mike Fährmann
8230f31800
[twitter] update query hashes 2022-01-26 00:49:46 +01:00
Mike Fährmann
c180806cec
[twitter] fix deleted/invalid retweets (#2225) 2022-01-25 23:57:13 +01:00
Mike Fährmann
a2eecc6aa8
[kemonoparty] fix DMs extraction (#2008) 2022-01-25 23:16:13 +01:00
Mike Fährmann
2bf554a896
[twitter] fix several errors (#2212, #2216, #2225)
- fix Tweets with deleted quotes
- fix suspended Tweets without 'legacy' entry
- fix unified_cards without 'type'
2022-01-25 16:13:22 +01:00
Mike Fährmann
e5242b83bf
[twitter] define directory format for events (#2109) 2022-01-24 17:44:17 +01:00
Mike Fährmann
efb3e65a6a
[sexcom] extend URL pattern (fixes #2220) 2022-01-24 01:19:40 +01:00
vsyx
3f2b6335d7
[instagram] fix highlights extraction (#2197)
* [instagram] fix highlights extraction

* [instagram] improve highlights extraction

- 'yield' individual reels instead of collecting them in a list
  and returning them all at once
- reduce 'chunk_size' to an even safer value
  (instagram.com also uses 5)
2022-01-24 00:20:12 +01:00
Mike Fährmann
5ed26e1773
[twitter] fix pinned tweets (#2216)
caused by the changes in dffa440ede
2022-01-23 22:52:57 +01:00
Mike Fährmann
a9f78e6527
[twitter] improve error handling
- handle accounts without 'rest_id'
- handle timelines with empty 'instructions'
2022-01-23 18:01:05 +01:00
Mike Fährmann
729b07c1f5
[twitter] simplify
- use dict with common GraphQL variables
- reduce 'variables' size with custom JSON encoder instance
- centralise TwitterAPI() creation
2022-01-23 01:44:55 +01:00
Mike Fährmann
7cb29224f0
[philomena] fix search parameter escaping (#2215)
The pluses from search terms in /tags/ URLs need to be
replaced with spaces to get accepted by Philomena.
2022-01-23 01:03:37 +01:00
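The plus-to-space step can be sketched with the standard library. This is only an approximation of the fix: the helper name `search_terms_from_path` is hypothetical, and any other escaping rules Philomena applies to tag URLs are not covered here.

```python
from urllib.parse import unquote_plus


def search_terms_from_path(segment):
    """Turn the tag part of a /tags/ URL into a search string.

    unquote_plus() replaces '+' with spaces and decodes percent-escapes.
    """
    return unquote_plus(segment)


query = search_terms_from_path("twilight+sparkle%2C+safe")
```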
Mike Fährmann
9ca8bb2dc0
[twitter] improve error handling 2022-01-22 23:09:45 +01:00
Mike Fährmann
9a221494c3
[twitter] add 'event' extractor (closes #2109) 2022-01-22 20:55:50 +01:00
Mike Fährmann
14867dad6b
[twitter] fix unified cards from search results 2022-01-22 20:25:10 +01:00
Mike Fährmann
dffa440ede
[twitter] improve handling of deleted tweets (#2212) 2022-01-22 00:41:58 +01:00
Mike Fährmann
54ef874ba4
[twitter] fix retweet filter (#2212) 2022-01-21 23:53:59 +01:00
Mike Fährmann
cb43f7731b
[twitter] update to GraphQL API (#2212)
The old REST API endpoints, which were not used by Twitter since
summer 2021, are going to finally be phased out it seems, with
'/2/timeline/profile/USERID.json' being the first one.

Only Twitter's search doesn't have a GraphQL interface yet.
2022-01-21 23:34:41 +01:00
Mike Fährmann
de754590e0
add --source-address command-line option (closes #2206) 2022-01-21 17:07:56 +01:00
Mike Fährmann
698f35215e
[blogger] support new image domain (fixes #2204) 2022-01-20 23:13:07 +01:00
Mike Fährmann
c587b678d0
[mangadex] re-enable warning for external chapters (#2193) 2022-01-16 03:21:50 +01:00
Mike Fährmann
f2e8aedd74
[twitter] changes to 'cards' option
- change default value to 'true'
- only invoke youtube-dl for cards unsupported by gallery
  when 'cards' is set to "ytdl"

"cards": true   --> only download card images
"cards": "ytdl" --> download card images and
                    use youtube_dl on otherwise unsupported cards
2022-01-15 22:02:57 +01:00
Mike Fährmann
2d34d8ff8b
[reddit] allow downloading from quarantined subreddits (#2180) 2022-01-14 21:55:59 +01:00
Mike Fährmann
17c9c47ca0
[hitomi] fix 'tag' extraction (fixes #2189) 2022-01-13 16:45:46 +01:00
Mike Fährmann
df2f0c09bb
[twitter] support "image_carousel_website" unified cards 2022-01-13 16:05:52 +01:00
Mike Fährmann
cdc96e1217
[gelbooru] improve video file detection (fixes #2188)
not all files from 'https://video-cdnN.gelbooru.com' are videos
2022-01-12 21:33:02 +01:00
Mike Fährmann
4acc31bd9f
[newgrounds] set suitabilities filter before starting a search 2022-01-11 23:50:29 +01:00
Mike Fährmann
170711af7e
[mangadex] fix extraction (closes #2177) 2022-01-08 17:21:35 +01:00
Mike Fährmann
199e7616a7
[rule34] use https://api.rule34.xxx for API requests 2022-01-08 17:14:50 +01:00
Mike Fährmann
37beb1298e
[newgrounds] add 'search' extractor (closes #2161) 2022-01-06 19:32:39 +01:00
Mike Fährmann
8b910dd8ae
[hitomi] fix image URLs
again and again ...
2022-01-06 18:21:26 +01:00
Mike Fährmann
3085aac4d8
[gelbooru] handle changed API response format (#2157) 2022-01-03 16:42:48 +01:00
Mike Fährmann
38e2af29d6
[hitomi] fix image URLs
update '_parse_gg()' yet again
2022-01-03 16:41:00 +01:00
Mike Fährmann
6f2e0c9c3d
fix cookie checks for patreon, fanbox, fantia
The changes in 9a255344 caused a warning about missing cookies to be
displayed even if those cookies were present, because _check_cookies()
did not account for an empty cookiedomain.
2022-01-01 17:55:58 +01:00
Mike Fährmann
1e0278702d
[hitomi] update '_parse_gg()' 2022-01-01 17:55:58 +01:00
Mike Fährmann
becc7f85a6
[hitomi] fix image URLs 2021-12-29 22:46:17 +01:00
Mike Fährmann
6af8d71da6
[kemonoparty] use service as subcategory (closes #2147) 2021-12-29 22:46:17 +01:00
Vrihub
96fcff182c
generic extractor (#735)
* Generic extractor, see issue #683

* Fix failed test_names test, no subcategory needed

* Prefix directory_fmt with "generic"

* Relax regex (would break some urls)

* Flake8 compliance

* pattern: don't require a scheme

This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).

* Fix using g: and r: on urls without http(s) scheme

Almost all extractors accept urls without an initial http(s) scheme.

Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).

This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.

This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.

* Relax imageurl_pattern_ext: allow relative urls

* First round of small suggested changes

* Support image urls starting with "//"

* self.baseurl: remove trailing slash

* Relax regexp (didn't catch some image urls)

* Some fixes and cleanup

* Fix domain pattern; option to enable extractor

Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
2021-12-29 22:39:29 +01:00
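The subdomain problem fixed by the "g: and r:" change above can be reproduced with two simplified patterns. These are illustrative stand-ins, not the extractors' real `pattern` values, which are longer.

```python
import re

# Simplified, hypothetical patterns for a site with user subdomains.
LOOSE = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")
STRICT = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")

url = "r:foobar.tumblr.com"

# "[^.]+" accepts the colon, so "r:" becomes part of the subdomain.
loose_match = LOOSE.match(url)
loose_subdomain = loose_match.group(1)

# "[\w-]+" rejects the colon, so the prefixed URL no longer matches here
# and is left for the recursive extractor.
strict_match = STRICT.match(url)

# URLs without a prefix still match as before.
plain_subdomain = STRICT.match("foobar.tumblr.com").group(1)
```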
Mike Fährmann
4376b39a2b
[sexcom] fix and improve embed extraction (fixes #2145) 2021-12-28 21:59:39 +01:00
Mike Fährmann
6d190834ee
[instagram] fix error when PostPage data is not in GraphQL format
(#2037)
2021-12-28 00:27:59 +01:00
Mike Fährmann
dd67e24aa9
[lolisafe] include file ID in filenames
More precisely, it now splits the full 'filename' into 'name' and 'id'
instead of overwriting 'filename'. The format string stays the same as
before. Use '{name}.{extension}' to restore the old behavior.

before:
- filename: foobar
- id      : 12345

now:
- filename: foobar-12345
- name    : foobar
- id      : 12345
2021-12-25 17:16:45 +01:00
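The before/now listing in the lolisafe commit above amounts to a split like the following sketch. It assumes the file ID is whatever follows the last hyphen of the full filename; the helper name `split_filename` is hypothetical and the actual extractor code may differ.

```python
def split_filename(filename):
    """Split 'foobar-12345' into separate 'name' and 'id' fields."""
    name, sep, file_id = filename.rpartition("-")
    if not sep:
        # No hyphen: treat the whole string as the name, with an empty ID.
        name, file_id = filename, ""
    return {"filename": filename, "name": name, "id": file_id}


fields = split_filename("foobar-12345")
default_name = "{filename}.{extension}".format(extension="jpg", **fields)
old_style_name = "{name}.{extension}".format(extension="jpg", **fields)
```

With these fields, `{filename}.{extension}` yields `foobar-12345.jpg`, while `{name}.{extension}` restores the old `foobar.jpg`.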
Mike Fährmann
f3d61de18d
[artstation] create directories per asset (closes #2136) 2021-12-25 17:16:45 +01:00
Mike Fährmann
49a50fb2eb
[500px] create directories per photo 2021-12-25 17:16:45 +01:00
Mike Fährmann
89bebe1bef
[500px] add 'favorite' extractor (closes #1927) 2021-12-25 17:16:45 +01:00
Mike Fährmann
22b0433985
[fanbox] support pixiv redirects (closes #2122) 2021-12-25 17:15:39 +01:00
Mike Fährmann
281828b58b
[tumblrgallery] improve search pagination (fixes #2132) 2021-12-24 03:42:28 +01:00
Mike Fährmann
4bec34fc94
[pixiv] allow setting a date range for search results (#2133)
with the 'scd' and 'ecd' query parameters
2021-12-23 23:03:39 +01:00
Mike Fährmann
882c614281
add album extractor for lolisafe/chibisafe instances
- support bunkr.is (closes #2038)
- support zz.ht    (closes #2105)
2021-12-21 19:24:17 +01:00
Mike Fährmann
d441888bfb
[deviantart] adjust API endpoints
Start all endpoints with a forward slash '/'
to be consistent with other API interfaces.
2021-12-21 00:18:06 +01:00
Mike Fährmann
8f0cf0bf71
[deviantart] use '/browse/newest' for most-recent searches
(#2096)
2021-12-20 22:40:03 +01:00
Mike Fährmann
0bd7607da5
[tumblrgallery] improve 'id' extraction (#2115) 2021-12-19 05:46:02 +01:00
Mike Fährmann
0d02a7861e
[tumblrgallery] fix extraction (closes #2112) 2021-12-17 19:55:53 +01:00
Mike Fährmann
62692c6842
[exhentai] add 'source' option
setting it to "hitomi" downloads the corresponding gallery from
hitomi.la; might be extended to other sources in the future
2021-12-16 23:16:19 +01:00
Mike Fährmann
099ed72de7
[hitomi] disable extra 'metadata' by default
saves one HTTP request that is not needed with default filename settings
2021-12-16 22:21:07 +01:00
Mike Fährmann
9a25534490
use Extractor._check_cookies() for all cookie checks 2021-12-16 02:21:16 +01:00
Mike Fährmann
63c6bc26b5
[rule34us] extract tags per category (#1527)
like for other boorus with 'tags': true
2021-12-16 00:06:52 +01:00
Mike Fährmann
f587458a3c
[twitter] include '4096x4096' as a default image fallback
(closes #2107, closes #1881)
2021-12-15 23:19:30 +01:00
Mike Fährmann
8ed282f7f2
[kemonoparty] support coomer.party URLs (#2100) 2021-12-15 16:21:05 +01:00
Mike Fährmann
87ce3fa669
[furaffinity] warn when no session cookies were found 2021-12-15 16:21:05 +01:00
Mike Fährmann
159631c808
[philomena] use a default 'filter_id' if none is given 2021-12-15 16:20:53 +01:00
Mike Fährmann
ad30653b17
allow running a BaseExtractor for any URL
by prefixing it with '<base-category>:'

For example:
  shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack
  gelbooru_v01:https://5naf.booru.org/index.php?page=post&s=view&id=46963

Available base categories are:
  mastodon, shopify, moebooru, gelbooru_v01, gelbooru_v02,
  reactor, foolslide, foolfuuka, philomena
2021-12-15 00:32:17 +01:00
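Splitting such a prefixed URL can be sketched as below. The function `split_base_category` is a hypothetical helper for illustration and not how gallery-dl matches URLs internally; the category names are the ones listed in the commit message.

```python
BASE_CATEGORIES = {
    "mastodon", "shopify", "moebooru", "gelbooru_v01", "gelbooru_v02",
    "reactor", "foolslide", "foolfuuka", "philomena",
}


def split_base_category(url):
    """Return (base_category, url) or (None, url) if there is no prefix."""
    prefix, sep, rest = url.partition(":")
    if sep and prefix in BASE_CATEGORIES:
        return prefix, rest
    # Ordinary URLs such as 'https://...' fall through unchanged.
    return None, url


forced = split_base_category(
    "shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack")
plain = split_base_category("https://example.org/gallery/1")
```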
Mike Fährmann
299bd2f1f5
[rule34us] add 'tag' and 'post' extractors (#1527) 2021-12-14 00:27:46 +01:00
Mike Fährmann
3cf1075d86
[inkbunny] add 'search' extractor (closes #2094) 2021-12-12 03:08:14 +01:00
Mike Fährmann
c6a23c26d7
[instagram] allow downloading specific stories (closes #2088)
https://instagram.com/stories/<USER>/<ID> now only downloads the one
story specified by <ID> and not all stories from that user.
2021-12-11 21:34:25 +01:00
Mike Fährmann
352ffcddb0
[instagram] match post URLs with usernames (fixes #2085) 2021-12-10 18:37:33 +01:00
Mike Fährmann
f4e3cee6ac
use yt-dlp by default (#1850, #2028) 2021-11-29 18:24:26 +01:00
Mike Fährmann
f1b142e993
[kemonoparty] change default 'files' order to attachments,file,inline
(#1991)
2021-11-29 04:41:30 +01:00
Mike Fährmann
275543b2d2
update extractor test results 2021-11-27 19:26:44 +01:00
Mike Fährmann
e7ea4f2567
[mangoxo] fix metadata extraction 2021-11-27 18:19:51 +01:00
Mike Fährmann
e298882acc
[kemonoparty] match URLs with www subdomain 2021-11-26 18:58:26 +01:00
Mike Fährmann
addb72e1bb
[reactor] support thatpervert.com (closes #2029) 2021-11-26 18:58:07 +01:00
Mike Fährmann
d8d9502e1e
[reactor] inherit from BaseExtractor 2021-11-26 18:58:07 +01:00
Mike Fährmann
f4ea216c95
[shopify] support loungeunderwear.com (closes #2053) 2021-11-26 18:58:06 +01:00
Mike Fährmann
93cef78450
[gelbooru] workaround pagination limits
Gelbooru only allows retrieving the latest 20k posts for a tag search.
Add 'id:<N' to the search tags to work around that limitation, where N
is the ID of the last retrieved post.

http://gelbooru.me/index.php?page=forum&s=view&id=1467
2021-11-26 18:56:31 +01:00
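The 'id:<N' workaround can be sketched as a generator. This is an illustration under assumptions, not the extractor's code: `fetch_page` is a hypothetical callable that takes a tag string and returns a list of post dicts sorted by descending 'id', and `fake_fetch` is an in-memory stand-in for the API.

```python
def paginate_by_id(fetch_page, tags):
    """Yield all posts for 'tags' by narrowing the search with 'id:<N'."""
    query = tags
    while True:
        posts = fetch_page(query)
        if not posts:
            return
        yield from posts
        # Restart the search below the ID of the last retrieved post
        # instead of requesting a higher page number.
        last_id = posts[-1]["id"]
        query = "{} id:<{}".format(tags, last_id)


def fake_fetch(query, per_page=4):
    """In-memory stand-in for the API: posts with IDs 10 down to 1."""
    limit = None
    for token in query.split():
        if token.startswith("id:<"):
            limit = int(token[4:])
    ids = [i for i in range(10, 0, -1) if limit is None or i < limit]
    return [{"id": i} for i in ids[:per_page]]


ids = [post["id"] for post in paginate_by_id(fake_fetch, "some_tag")]
```

Because every request starts from the first page of a narrower search, the page offset never grows and the 20k limit is not reached.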
Mike Fährmann
f2ae179713
[exhentai] fix extraction for disowned galleries (closes #2055) 2021-11-24 21:26:16 +01:00
Alice
612850438e
[skeb] add 'thumbnails' option (#2047) (#2051) 2021-11-23 21:16:42 +01:00
Mike Fährmann
11a3d96d13
[mangadex] load additional metadata using includes[] directives
- always provide 'artist', 'author', and 'group' metadata fields (#2049)
- remove 'metadata' option
2021-11-22 01:16:33 +01:00
Mike Fährmann
19e00f1322
[dynastyscans] provide 'date' as proper datetime object (#2050) 2021-11-21 22:50:52 +01:00
Mike Fährmann
af6424f398
allow testing metadata in list elements 2021-11-21 22:46:34 +01:00
Mike Fährmann
c67756e187
[kemonoparty] add 'dms' option (#2008) 2021-11-20 23:36:16 +01:00
Mike Fährmann
3a7a19c7b9
[dynastyscans] add 'manga' extractor (closes #2035) 2021-11-19 22:51:26 +01:00
Mike Fährmann
9bc83af3a6
[kemonoparty] 'postfile' -> 'file' (#1991)
to stay consistent with the existing file types for kemono
2021-11-19 01:50:48 +01:00
Mike Fährmann
522782c09d
[subscribestar] emit metadata for posts without media (#1569) 2021-11-18 23:42:17 +01:00
Mike Fährmann
1c8aaf9318
[subscribestar] add 'num' enumeration index (closes #2040) 2021-11-18 23:38:41 +01:00
Mike Fährmann
d433735750
[kemonoparty] skip duplicate files (#2032, #1991, #1899)
Extract the SHA-256 file hash from URLs
and skip files with the same hash in the same post.

- provide a 'hash' metadata field (empty string if not available)
- remove 'patreon-skip-file' option
2021-11-17 22:44:15 +01:00
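The duplicate check can be sketched as follows. The regular expression and the URL layout are assumptions (any run of 64 lowercase hex digits is treated as the SHA-256 hash), and `unique_files` is a hypothetical helper rather than the extractor's actual method.

```python
import re

# Assumption: the hash appears in the URL as 64 lowercase hex digits.
find_hash = re.compile(r"([0-9a-f]{64})").search


def unique_files(urls):
    """Yield (url, hash) pairs, skipping URLs whose hash was already seen."""
    seen = set()
    for url in urls:
        match = find_hash(url)
        file_hash = match.group(1) if match else ""
        if file_hash:
            if file_hash in seen:
                continue
            seen.add(file_hash)
        # Files without a recognizable hash are always kept,
        # with an empty string as their 'hash' value.
        yield url, file_hash


h = "a" * 64  # placeholder hash for the example
post_urls = [
    "https://example.org/data/" + h + ".jpg",
    "https://example.org/attachments/" + h + ".jpg",
    "https://example.org/files/no-hash.png",
]
kept = list(unique_files(post_urls))
```

Creating `seen` inside the function keeps the check scoped to one call, mirroring the "same hash in the same post" rule.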
Mike Fährmann
d4ec245554
[kemonoparty] implement a 'files' option (#1991)
similar to 8d676151
2021-11-17 22:43:41 +01:00
Mike Fährmann
ab8eea1a24
[twitter] fix extractor for direct image links (fixes #2030) 2021-11-16 22:57:46 +01:00
Mike Fährmann
2076d40681
[ytdl] improve error handling (#1680) 2021-11-15 22:56:42 +01:00
Mike Fährmann
2aaac3c997
[instagram] include user metadata for 'tagged' downloads (#2024)
Adds
- tagged_owner_id
- tagged_full_name
- tagged_username
containing the values for the user profile the URL originated from,
e.g. 'instagram' for https://www.instagram.com/instagram/tagged/.
2021-11-15 21:21:59 +01:00
Mike Fährmann
cfa4876848
[philomena] support furbooru.org (closes #1995) 2021-11-15 20:57:51 +01:00
Mike Fährmann
4377f1c284
[twitter] distinguish between fatal & nonfatal errors (#2020)
only show a warning for nonfatal errors
and do not raise a StopExtraction exception
2021-11-13 22:46:40 +01:00
Kyle Anthony Williams
a14b72be21
[webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net (#2005)
* [webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net

This trick to avoid having to set a Referer header comes from
Webtoon's RSS feeds. The two URLs below are equivalent in content:

https://webtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
https://swebtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90

The URL with the domain "webtoon-phinf.pstatic.net" needs a Referer
header, and the domain "swebtoon-phinf.pstatic.net" does not. This
is because of the environment "swebtoon" images live in, one without
explicit network control: RSS feeds on sites such as Feedly. This change should
make it easier for gallery-dl developers to embed Webtoon comics without
worrying about headers.
2021-11-11 20:03:34 +01:00
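The domain swap described in this commit amounts to a single string replacement, sketched below. The helper name `referer_free_url` is hypothetical; the two domains and the sample URL are the ones quoted in the commit message.

```python
def referer_free_url(url):
    """Rewrite a webtoon-phinf image URL to its swebtoon-phinf equivalent."""
    return url.replace(
        "://webtoon-phinf.pstatic.net/",
        "://swebtoon-phinf.pstatic.net/",
        1,
    )


original = ("https://webtoon-phinf.pstatic.net/20210929_153/"
            "1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90")
rewritten = referer_free_url(original)
```

Matching on the `://` prefix leaves URLs that already use the `swebtoon-phinf` domain unchanged.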
Mike Fährmann
6e3658ef52
[kemonoparty] provide 'date' metadata for gumroad (#2007)
Not the 'published' or 'edited' values since they are 'null',
but still better than nothing at all.
2021-11-11 19:38:10 +01:00
Mike Fährmann
37c9dedee1
[seisoparty] remove module 2021-11-09 22:41:04 +01:00