Mike Fährmann
1bc77efa02
[artstation] use "browser": "firefox" by default ( #2527 )
2022-05-02 09:03:13 +02:00
Mike Fährmann
a39e7b7366
[vk] handle photos without width/height info ( fixes #2535 )
2022-05-02 09:03:00 +02:00
Federico Ravasio
0381752575
[photovogue] switch to .com, update api endpoint ( #2494 )
2022-04-27 22:37:53 +02:00
Mike Fährmann
3f02e483c6
[e621] fix applying request_interval_min ( #2533 )
Setting this property after calling Extractor.__init__() has no effect.
2022-04-27 21:10:34 +02:00
Mike Fährmann
afde76269c
[weibo] fix infinite retries for deleted accounts ( fixes #2521 )
2022-04-27 20:23:11 +02:00
Mike Fährmann
d85e66bcac
[vk] fix extraction ( #2512 )
Use a different API endpoint, since thumbnail URLs from the old one
cannot be transformed into URLs for "original" photos anymore.
2022-04-21 14:01:50 +02:00
Mike Fährmann
9e6ff42a9d
[pixiv] implement 'background' option ( #623 , #1124 , #2495 )
2022-04-21 13:53:02 +02:00
Mike Fährmann
4d1896830f
[mangadex] download chapters with 'externalUrl' ( fixes #2503 )
if they have pages hosted on mangadex
2022-04-18 18:09:52 +02:00
Mike Fährmann
97e8a15295
[deviantart] implement 'pagination' option ( #2488 )
2022-04-18 18:08:01 +02:00
Mike Fährmann
1f9a0e2fd8
update extractor test results
2022-04-18 17:24:00 +02:00
Mike Fährmann
ad5a4b1756
[twitter] fix various syndication issues
- handle retweets
- fix videos without dimensions in URL (3e942a58)
- fix '"retweets": "self"' filter (#2499 )
2022-04-15 20:49:26 +02:00
Mike Fährmann
12bd9ba33a
[readcomiconline] add 'quality' option ( #2467 )
2022-04-15 18:10:37 +02:00
Mike Fährmann
60ad46ddcc
[readcomiconline] unobfuscate image URLs ( #2481 )
2022-04-15 18:04:09 +02:00
Mike Fährmann
a6c4ff58fb
[cyberdrop] match cyberdrop.to URLs ( closes #2496 )
2022-04-15 15:39:29 +02:00
Mike Fährmann
13ed18b9aa
[lolisafe] fix typo
LolisafelbumExtractor -> LolisafeAlbumExtractor
2022-04-15 15:02:30 +02:00
Mike Fährmann
3e942a58be
[twitter] improve syndication video selection ( #2354 )
- ignore .m3u8 manifests
- always select largest format
2022-04-11 17:06:10 +02:00
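A minimal sketch of the selection rule described in 3e942a58be (ignore .m3u8 manifests, always pick the largest format). The shape of each variant dict ('url', 'content_type', 'bitrate') is an assumption modeled on Twitter's video variant lists, "largest" is approximated by bitrate, and best_video_url() is a hypothetical helper, not gallery-dl's code:

```python
def best_video_url(variants):
    """Return the URL of the highest-bitrate variant, skipping HLS
    manifests. Returns None if no direct video file is available."""
    candidates = [
        v for v in variants
        if v.get("content_type") != "application/x-mpegURL"
        and not v["url"].split("?")[0].endswith(".m3u8")
    ]
    if not candidates:
        return None
    # "largest format" is approximated here by the highest bitrate
    return max(candidates, key=lambda v: v.get("bitrate", 0))["url"]


variants = [
    {"url": "https://video.example/pl/a.m3u8?tag=12",
     "content_type": "application/x-mpegURL"},
    {"url": "https://video.example/vid/480x270/a.mp4",
     "content_type": "video/mp4", "bitrate": 256000},
    {"url": "https://video.example/vid/1280x720/a.mp4",
     "content_type": "video/mp4", "bitrate": 2176000},
]
print(best_video_url(variants))  # https://video.example/vid/1280x720/a.mp4
```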
Mike Fährmann
0794027100
[issuu] fix extraction ( #2483 )
2022-04-10 14:23:10 +02:00
Mike Fährmann
5d5a08cc69
[sexcom] add fallback for empty files ( #2485 )
2022-04-10 14:22:07 +02:00
thatfuckingbird
4527a35aba
[twitter] accept fxtwitter.com URLs ( #2484 )
2022-04-08 14:32:08 +02:00
Mike Fährmann
c1768972c2
[newgrounds] update and fix pagination ( #2456 )
2022-04-07 15:38:41 +02:00
Mike Fährmann
78e5d0c423
[kissgoddess] extract all images ( closes #2473 )
and not only the first two per page
https://github.com/mikf/gallery-dl/issues/1052#issuecomment-1047367383
2022-04-06 21:28:40 +02:00
Mike Fährmann
0b33435da5
[pinterest] support multiple files per pin ( closes #1619 , #2452 )
2022-04-06 21:21:33 +02:00
Mike Fährmann
9c5d2d7af3
[pinterest] add extractor for created pins ( #2452 )
2022-04-01 16:59:58 +02:00
Mike Fährmann
1171911dc3
[twitter] add 'syndication' option ( #2354 )
to fetch age-restricted content using Twitter's syndication API
2022-04-01 16:56:47 +02:00
Mike Fährmann
a53cfc845e
[newgrounds] warn about age-restricted posts ( #2456 )
2022-03-30 16:18:33 +02:00
Mike Fährmann
ecee315bbf
[mangasee] unescape manga names ( fixes #2454 )
2022-03-30 16:18:18 +02:00
loragja
7e545a3ae9
[gofile] add gofile.io extractor ( #2364 )
* Add gofile extractor
* add gofile extractor to module list
* add support for tiny monitors and ancient python versions
* seriously, f-strings are not *that* new...
* i love flake8 :)
* add 'api-token' and 'recursive' options
* add tests
2022-03-29 17:31:57 +02:00
Layerex
625f4d4cc4
[telegraph] Add telegra.ph extractor ( #2312 )
2022-03-28 19:18:13 +02:00
Mike Fährmann
48cc4853be
[skeb] refactor 'sent-requests' and add tests
2022-03-28 11:26:24 +02:00
Mike Fährmann
37d584a9b2
[hitomi] update metadata extraction ( fixes #2444 )
remove 'hitomi.metadata' option, as it is no longer necessary
to make additional HTTP requests to fetch all metadata.
2022-03-26 12:46:18 +01:00
Mike Fährmann
b03ca7f10c
[aryion] provide correct 'date' independent of dst
2022-03-24 22:57:18 +01:00
Mike Fährmann
ba69fb669d
[kemonoparty] add 'duplicates' option ( closes #2440 )
2022-03-24 11:58:38 +01:00
Mike Fährmann
29db716a63
implement 'datetime_to_timestamp()'
and rename 'to_timestamp()'
to the more descriptive 'datetime_to_timestamp_string()'
2022-03-23 22:36:01 +01:00
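The commit only names the two helpers. A minimal sketch of what they could compute, assuming naive datetime objects that represent UTC; these are standalone functions for illustration, not gallery-dl's actual implementations:

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)  # assumption: naive datetimes are in UTC


def datetime_to_timestamp(dt):
    """Return seconds since the Unix epoch as a float."""
    return (dt - EPOCH).total_seconds()


def datetime_to_timestamp_string(dt):
    """Return the same value as a whole-number string."""
    return str(int(datetime_to_timestamp(dt)))


print(datetime_to_timestamp(datetime(1970, 1, 2)))               # 86400.0
print(datetime_to_timestamp_string(datetime(1970, 1, 1, 0, 1)))  # 60
```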
Mike Fährmann
9313d4dc10
[pinterest] do not force 'm3u8_native' for video downloads ( #2436 )
2022-03-21 10:11:51 +01:00
Mike Fährmann
42f2fd2ed7
[twibooru] fix posts without 'name' ( fixes #2434 )
2022-03-21 10:08:37 +01:00
chinggg
6f1d5e8ab9
[unsplash] replace dash with space in search API queries ( #2429 )
2022-03-19 16:00:05 +01:00
Mike Fährmann
f8230dde43
[instagram] add 'previews' option ( #2135 )
2022-03-19 15:26:40 +01:00
Mike Fährmann
500a479026
fix a third(!) bug in _check_cookies() ( #2372 )
turns out tests are worthless if you get em wrong ...
2022-03-18 19:52:37 +01:00
Mike Fährmann
c4cc387f7d
[furaffinity] fix search result pagination ( fixes #2402 )
2022-03-18 13:44:36 +01:00
Mike Fährmann
281a5b3b28
[newgrounds] fix video descriptions ( #2328 )
2022-03-14 08:38:20 +01:00
Mike Fährmann
b1b15d6cef
[imagebam] add support for /view/ paths ( closes #2378 )
2022-03-14 08:38:20 +01:00
Mike Fährmann
e64c2b85d0
[fantia] apply patch ( #2381 )
from @thatfuckingbird with small adjustments
https://github.com/mikf/gallery-dl/issues/2381#issuecomment-1063208696
2022-03-11 18:02:31 +01:00
Mike Fährmann
f31ab0d2ec
[fanbox] fetch data for each individual post ( fixes #2388 )
Posts from 'https://api.fanbox.cc/post.listCreator'
do not contain a 'body' with all images anymore.
https://github.com/mikf/gallery-dl/pull/1459#discussion_r614322881
2022-03-11 17:36:05 +01:00
Mike Fährmann
fc277fa45f
[seiga] require authentication with 'user_session' cookie ( #2372 )
Login with username & password would now require entering a 2FA token.
see also 7b009cc893
2022-03-11 02:10:15 +01:00
Mike Fährmann
47cf05c4ab
refactor proxy handling code ( #2357 )
- allow gallery-dl proxy settings to overwrite environment proxies
- allow specifying different proxies for data extraction and download
- add 'downloader.proxy' option
- '-o extractor.proxy=PROXY_URL -o downloader.proxy=null'
now has the same effect as youtube-dl's '--geo-verification-proxy'
2022-03-10 23:55:35 +01:00
Mike Fährmann
d50a1ec2cc
[subscribestar] unescape attachment URLs ( fixes #2370 )
2022-03-09 19:06:04 +01:00
Mike Fährmann
3ddc620ef6
[skeb] fix post extractor ( #2330 )
2022-03-09 18:45:07 +01:00
Orkun Koçyiğit
eb2bb7d998
[fantia] add 'num' enumeration index ( #2377 )
* Adding numerical ordering to fantia
* Fixed line to fit PEP8 line size limit
2022-03-08 22:06:41 +01:00
Mike Fährmann
fac8047899
[kemonoparty] limit default filename length ( #2373 )
2022-03-08 21:14:47 +01:00
Mike Fährmann
bfa5e61900
[patreon] add explicit 'image_large' file type ( #2257 )
to allow more control over when and if to download 'large_url' images.
4fee3a0e52 forced them to be downloaded
instead of regular images, even though 'large_url' images are most likely
an upscaled version of the original.
2022-03-06 17:07:13 +01:00
Mike Fährmann
6ea3ff5173
[tumblr] notify users about registering an oauth application
if they hit the daily rate limit and are using default API credentials
2022-03-06 16:28:53 +01:00
Mike Fährmann
b5236656d5
[deviantart] notify users about registering an oauth application
if they get repeated 429 errors and are using default API credentials
2022-03-06 16:24:39 +01:00
Mike Fährmann
2aa47e8382
[twitter] handle Tweets with "softIntervention" entries
or other such things where the actual Tweet data is one level deeper
than usual
2022-03-03 02:06:54 +01:00
Mike Fährmann
64bbc7969d
[twitter] warn about age-restricted Tweets ( #2354 )
2022-03-03 02:03:27 +01:00
Mike Fährmann
e778be52bc
[twitter] update query hashes
2022-03-02 23:05:31 +01:00
Mike Fährmann
bddcec49f1
implement 'text.root_from_url()'
use domain from input URL for kemono
2022-03-01 03:09:57 +01:00
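The commit only names text.root_from_url(). A possible implementation, written as a standalone function on top of urllib.parse; the 'scheme' default applied to scheme-less input is an assumption, and gallery-dl's real helper may behave differently:

```python
from urllib.parse import urlsplit


def root_from_url(url, scheme="https://"):
    """Return 'scheme://host' of 'url'; scheme-less input gets 'scheme'."""
    if "://" not in url:
        url = scheme + url
    parts = urlsplit(url)
    return "{}://{}".format(parts.scheme, parts.netloc)


print(root_from_url("https://beta.kemono.party/patreon/user/123"))
# https://beta.kemono.party
print(root_from_url("kemono.party/fanbox/user/456"))
# https://kemono.party
```

Building API URLs from this root keeps requests on whichever domain the user supplied (e.g. beta.kemono.party vs. kemono.party).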
Mike Fährmann
92c492dc09
[kemonoparty] match beta.kemono.party URLs ( #2348 )
2022-03-01 03:02:30 +01:00
Mike Fährmann
4ea9157d51
[mangadex] fix chapters without 'translatedLanguage' ( #2352 )
2022-03-01 02:04:25 +01:00
Alice
f1cab23724
[skeb] add 'sent-requests' option ( #2322 ) ( #2330 )
* Update skeb.py
* Update configuration.rst
* flake8
2022-02-28 22:42:15 +01:00
dragobit
781fdfa212
[hentaicosplays] add Referer to headers ( #2317 )
2022-02-28 22:19:32 +01:00
Mike Fährmann
4385a34e05
[twitter] fix handling of 429 responses ( fixes #2339 )
Twitter doesn't return a valid JSON response for 429 errors anymore.
2022-02-28 16:42:55 +01:00
Mike Fährmann
5a50569360
[toyhouse] support 'art' listings ( #1546 , #2331 )
2022-02-27 16:22:50 +01:00
Mike Fährmann
1c79044433
[imagebam] set 'nsfw_inter' cookie ( fixes #2334 )
2022-02-27 16:12:28 +01:00
Mike Fährmann
d71c173150
[newgrounds] strip incomplete HTML tag from '_comment' ( #2328 )
2022-02-23 21:42:28 +01:00
Mike Fährmann
cf58048bd4
[newgrounds] add 'post_url' metadata field ( #2328 )
2022-02-23 00:00:23 +01:00
Mike Fährmann
7aa2e2cd84
[slideshare] fix extraction
2022-02-21 02:52:45 +01:00
Mike Fährmann
fdfdc1b614
[kissgoddess] add 'gallery' and 'model' extractors
(closes #1052 , #2304 )
2022-02-20 04:45:37 +01:00
Mike Fährmann
79a461a2c1
[mememuseum] add 'tag' and 'post' extractors ( closes #2264 )
2022-02-20 02:15:38 +01:00
Mike Fährmann
e5f6af6e32
[oauth:pixiv] add note about 'code' expiring in 30 seconds ( #2306 )
2022-02-19 23:47:30 +01:00
Mike Fährmann
bbc4190017
[bunkr] fix .mp4 downloads ( #2239 )
again ...
2022-02-19 03:55:14 +01:00
Mike Fährmann
254a5b26e0
[twibooru] add extractors for searches, galleries, and posts
(#2219 )
2022-02-18 23:43:57 +01:00
Mike Fährmann
9ebc20e290
[booru] call nameext_from_url() before update() and _prepare()
to be able to overwrite filename and extension in _prepare()
2022-02-18 00:37:59 +01:00
Mike Fährmann
4fee3a0e52
[patreon] download 'large_url' images if available ( #2257 )
2022-02-17 18:23:59 +01:00
Mike Fährmann
f5b2b9333f
fix another bug in _check_cookies ( #2160 )
regression introduced in ed317bfc
Added a couple of tests to hopefully catch such bugs
before they land in a release.
2022-02-16 22:58:57 +01:00
Ailothaen
203a04a4a3
[reddit] Support standalone submissions on users' personal pages ( #2301 )
* [reddit] Support submissions on users' personal pages
* [reddit] Design improvement for user submissions
* [reddit] Removed functions declared twice
2022-02-13 23:03:46 +01:00
Mike Fährmann
806bc62379
[redgifs] support 'i.redgifs.com' URLs ( closes #2300 )
2022-02-13 23:00:50 +01:00
Mike Fährmann
655b2de5d9
[vk] fix infinite pagination loops ( fixes #2297 )
2022-02-13 23:00:50 +01:00
Mike Fährmann
cc5b1ce91a
[inkbunny] rename search parameters to their API equivalents
(fixes #2292 )
2022-02-13 23:00:49 +01:00
Mike Fährmann
ed317bfcf1
warn about cookies expiring in less than 24 hours
requires an expiration timestamp,
so this only works with cookies from a cookies.txt file
2022-02-13 23:00:49 +01:00
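The commit does not show the check itself. A rough sketch using the standard library's http.cookiejar, assuming the cookies were already loaded into a CookieJar (for example from a cookies.txt file via MozillaCookieJar); expiring_cookies(), make_cookie(), and the sample cookies are hypothetical:

```python
import time
from http.cookiejar import Cookie, CookieJar


def expiring_cookies(jar, threshold=24 * 3600, now=None):
    """Return the names of cookies that expire within 'threshold' seconds.

    Cookies without an expiration timestamp are skipped, since there is
    nothing to compare against."""
    if now is None:
        now = time.time()
    return [cookie.name for cookie in jar
            if cookie.expires is not None and cookie.expires - now < threshold]


def make_cookie(name, expires):
    """Build a test cookie for .example.org (hypothetical values)."""
    return Cookie(0, name, "x", None, False, ".example.org", True, True,
                  "/", True, True, expires, expires is None, None, None, {})


now = 1_000_000
jar = CookieJar()
jar.set_cookie(make_cookie("session", now + 3600))        # 1 hour left
jar.set_cookie(make_cookie("remember", now + 30 * 86400))  # 30 days left
jar.set_cookie(make_cookie("tmp", None))                   # no expiration
print(expiring_cookies(jar, now=now))  # ['session']
```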
David Hoppenbrouwers
b17e2dcf93
[wallpapercave] add extractor for images ( #2205 )
2022-02-11 23:44:51 +01:00
v-delta
c661737f36
[Imgbox] Fix ImgboxExtractor ( #2281 )
2022-02-11 22:17:02 +01:00
Thomas Jost
a7de819aca
[lightroom] add Lightroom gallery extractor ( #2263 )
2022-02-11 21:30:59 +01:00
Mike Fährmann
563bd0ecf4
[danbooru] inherit from BaseExtractor
- merge danbooru and e621 code
- support booru.allthefallen.moe (closes #2283 )
- remove support for old e621 tag search URLs
2022-02-11 21:01:51 +01:00
Mike Fährmann
bc0e853d30
combine KeyError & IndexError to common base class LookupError
2022-02-11 00:42:49 +01:00
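Since KeyError and IndexError both subclass LookupError, a single except clause covers missing dict keys and out-of-range list indices. A small illustration; first_tag() is a made-up example, not gallery-dl code:

```python
def first_tag(post):
    """Return the first entry of post["tags"], or None if there is none."""
    try:
        return post["tags"][0]
    except LookupError:  # KeyError (no "tags") or IndexError (empty list)
        return None


print(first_tag({"tags": ["landscape"]}))  # landscape
print(first_tag({}))                       # None
print(first_tag({"tags": []}))             # None
```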
Mike Fährmann
f1c853c6ef
[furaffinity] add 'layout' option ( #2277 )
to be able to force gallery-dl to parse according to a specific layout
in case its auto-detection fails
2022-02-11 00:28:47 +01:00
Mike Fährmann
b4f8e15a1f
allow BaseExtractors to use the domain of the matched URL
2022-02-10 01:38:50 +01:00
Mike Fährmann
a57a44f510
[kemonoparty] handle files without 'name' ( fixes #2276 )
2022-02-08 18:27:05 +01:00
Mike Fährmann
4efe56f419
[furaffinity] improve new/old layout detection ( fixes #2277 )
2022-02-08 18:10:52 +01:00
Mike Fährmann
0f1e7ff319
[twitter] fix extraction ( #2275 )
2022-02-07 23:18:35 +01:00
Mike Fährmann
dee0d22561
update extractor test results
2022-02-06 21:39:24 +01:00
Mike Fährmann
d7b8e04b50
[kemonoparty] use 'Accept-Encoding: identity' for all downloads
(#2267 )
fixes issues when data sent with 'Content-Encoding: gzip' or other
encodings is larger than the actual file
2022-02-05 18:06:58 +01:00
enormous-muscles
55326377d8
Add Kohlchan extractor ( #2251 )
2022-02-04 23:22:17 +01:00
Mike Fährmann
cc7dce5755
[sexcom] add 'pins' extractor ( closes #2265 )
2022-02-04 20:55:00 +01:00
Mike Fährmann
02e18f56be
[e621] add 'favorite' extractor ( closes #2250 )
2022-02-04 20:54:48 +01:00
Mike Fährmann
70e6e1549e
[twitter] provide fallback URLs for card images
f2e8aedd74 (commitcomment-64057751)
2022-02-03 23:43:18 +01:00
Mike Fährmann
86fa412b47
[hitomi] add 'format' option ( #2260 )
default is 'webp' since downloading original files is no longer allowed
2022-02-03 23:32:19 +01:00
Mike Fährmann
492436f936
[twitter] add 'warnings' option ( #2258 )
disable reporting any non-fatal errors by default
2022-02-02 18:37:19 +01:00
Mike Fährmann
a5163e4c70
[twitter] restore 'logout' functionality ( #1719 )
2022-02-02 18:21:15 +01:00
Mike Fährmann
f58364f6a8
update Firefox cipher list
2022-02-01 02:33:01 +01:00
Mike Fährmann
7e6981dda6
rename 'disabletls12' to 'tls12'
and let config options override any default settings
2022-02-01 01:37:03 +01:00
Mike Fährmann
bb3e182562
overhaul session initialization
- share adapter & connection pool across sessions with the same
ssl options, ssl ciphers, and source address
- simplify browser emulation to just a list of headers and ciphers
2022-01-31 23:12:08 +01:00
Mike Fährmann
e670dc518e
[weibo] update pagination code ( fixes #2244 )
- send proper headers and query parameters
- use 'since_id' instead of page numbers
- set a 1-2 second delay between requests
2022-01-31 19:16:01 +01:00
Robert Pendell
4c651f6252
[patreon] Disable TLS 1.2 by default ( #2249 )
Disables TLS 1.2 on Patreon by default.
2022-01-30 23:30:44 +01:00
Robert Pendell
392cf079f7
Add ability to disable TLS 1.2 ( #2243 )
Fix for Patreon Cloudflare issues: allow only TLS v1.3 or higher to establish HTTPS connections.
TLS 1.2 can now be disabled on a per-host or global basis by adding 'disabletls12' as a config option either under extractor.(host) or directly under extractor. The option is false by default.
Example:
{
    "extractor": {
        "patreon": {
            "disabletls12": true,
            "cookies": {
                "session_id": "X"
            }
        }
    }
}
2022-01-30 22:14:43 +01:00
Mike Fährmann
d33227fc38
[twitter] restore errors for protected timelines etc ( fixes #2237 )
2022-01-30 16:42:13 +01:00
Mike Fährmann
ebd3d5c1cc
[bunkr] fix .mp4 downloads ( closes #2239 )
2022-01-28 23:21:16 +01:00
Mike Fährmann
e2be199124
[gelbooru] improve and fix pagination ( #2230 , #2232 )
Use 'id:<POSTID' as a tag instead of going through pages with 'pid'.
Something similar was already implemented in 93cef784,
but that got broken again in 3085aac4.
2022-01-27 17:44:47 +01:00
Mike Fährmann
8230f31800
[twitter] update query hashes
2022-01-26 00:49:46 +01:00
Mike Fährmann
c180806cec
[twitter] fix deleted/invalid retweets ( #2225 )
2022-01-25 23:57:13 +01:00
Mike Fährmann
a2eecc6aa8
[kemonoparty] fix DMs extraction ( #2008 )
2022-01-25 23:16:13 +01:00
Mike Fährmann
2bf554a896
[twitter] fix several errors ( #2212 , #2216 , #2225 )
- fix Tweets with deleted quotes
- fix suspended Tweets without 'legacy' entry
- fix unified_cards without 'type'
2022-01-25 16:13:22 +01:00
Mike Fährmann
e5242b83bf
[twitter] define directory format for events ( #2109 )
2022-01-24 17:44:17 +01:00
Mike Fährmann
efb3e65a6a
[sexcom] extend URL pattern ( fixes #2220 )
2022-01-24 01:19:40 +01:00
vsyx
3f2b6335d7
[instagram] fix highlights extraction ( #2197 )
* [instagram] fix highlights extraction
* [instagram] improve highlights extraction
- 'yield' individual reels instead of collecting them in a list
and returning them all at once
- reduce 'chunk_size' to an even safer value
(instagram.com also uses 5)
2022-01-24 00:20:12 +01:00
Mike Fährmann
5ed26e1773
[twitter] fix pinned tweets ( #2216 )
caused by the changes in dffa440ede
2022-01-23 22:52:57 +01:00
Mike Fährmann
a9f78e6527
[twitter] improve error handling
- handle accounts without 'rest_id'
- handle timelines with empty 'instructions'
2022-01-23 18:01:05 +01:00
Mike Fährmann
729b07c1f5
[twitter] simplify
- use dict with common GraphQL variables
- reduce 'variables' size with custom JSON encoder instance
- centralise TwitterAPI() creation
2022-01-23 01:44:55 +01:00
Mike Fährmann
7cb29224f0
[philomena] fix search parameter escaping ( #2215 )
The pluses from search terms in /tags/ URLs need to be
replaced with spaces to get accepted by Philomena.
2022-01-23 01:03:37 +01:00
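A minimal sketch of the plus-to-space conversion using urllib.parse.unquote_plus(), assuming the tag segment is otherwise plain percent-encoding; query_from_tag_path() is a hypothetical name and only '+' and %XX escapes are handled:

```python
from urllib.parse import unquote_plus


def query_from_tag_path(segment):
    """Turn the tag segment of a /tags/ URL into a search query:
    '+' becomes a space and %XX escapes are decoded."""
    return unquote_plus(segment)


print(query_from_tag_path("safe+cute"))  # safe cute
print(query_from_tag_path("a%2Bb"))      # a+b
```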
Mike Fährmann
9ca8bb2dc0
[twitter] improve error handling
2022-01-22 23:09:45 +01:00
Mike Fährmann
9a221494c3
[twitter] add 'event' extractor ( closes #2109 )
2022-01-22 20:55:50 +01:00
Mike Fährmann
14867dad6b
[twitter] fix unified cards from search results
2022-01-22 20:25:10 +01:00
Mike Fährmann
dffa440ede
[twitter] improve handling of deleted tweets ( #2212 )
2022-01-22 00:41:58 +01:00
Mike Fährmann
54ef874ba4
[twitter] fix retweet filter ( #2212 )
2022-01-21 23:53:59 +01:00
Mike Fährmann
cb43f7731b
[twitter] update to GraphQL API ( #2212 )
The old REST API endpoints, which were not used by Twitter since
summer 2021, are going to finally be phased out it seems, with
'/2/timeline/profile/USERID.json' being the first one.
Only Twitter's search doesn't have a GraphQL interface yet.
2022-01-21 23:34:41 +01:00
Mike Fährmann
de754590e0
add --source-address command-line option ( closes #2206 )
2022-01-21 17:07:56 +01:00
Mike Fährmann
698f35215e
[blogger] support new image domain ( fixes #2204 )
2022-01-20 23:13:07 +01:00
Mike Fährmann
c587b678d0
[mangadex] re-enable warning for external chapters ( #2193 )
2022-01-16 03:21:50 +01:00
Mike Fährmann
f2e8aedd74
[twitter] changes to 'cards' option
- change default value to 'true'
- only invoke youtube-dl for cards unsupported by gallery-dl
when 'cards' is set to "ytdl"
"cards": true --> only download card images
"cards": "ytdl" --> download card images and
use youtube_dl on otherwise unsupported cards
2022-01-15 22:02:57 +01:00
Mike Fährmann
2d34d8ff8b
[reddit] allow downloading from quarantined subreddits ( #2180 )
2022-01-14 21:55:59 +01:00
Mike Fährmann
17c9c47ca0
[hitomi] fix 'tag' extraction ( fixes #2189 )
2022-01-13 16:45:46 +01:00
Mike Fährmann
df2f0c09bb
[twitter] support "image_carousel_website" unified cards
2022-01-13 16:05:52 +01:00
Mike Fährmann
cdc96e1217
[gelbooru] improve video file detection ( fixes #2188 )
not all files from 'https://video-cdnN.gelbooru.com' are videos
2022-01-12 21:33:02 +01:00
Mike Fährmann
4acc31bd9f
[newgrounds] set suitabilities filter before starting a search
2022-01-11 23:50:29 +01:00
Mike Fährmann
170711af7e
[mangadex] fix extraction ( closes #2177 )
2022-01-08 17:21:35 +01:00
Mike Fährmann
199e7616a7
[rule34] use https://api.rule34.xxx for API requests
2022-01-08 17:14:50 +01:00
Mike Fährmann
37beb1298e
[newgrounds] add 'search' extractor ( closes #2161 )
2022-01-06 19:32:39 +01:00
Mike Fährmann
8b910dd8ae
[hitomi] fix image URLs
again and again ...
2022-01-06 18:21:26 +01:00
Mike Fährmann
3085aac4d8
[gelbooru] handle changed API response format ( #2157 )
2022-01-03 16:42:48 +01:00
Mike Fährmann
38e2af29d6
[hitomi] fix image URLs
update '_parse_gg()' yet again
2022-01-03 16:41:00 +01:00
Mike Fährmann
6f2e0c9c3d
fix cookie checks for patreon, fanbox, fantia
The changes in 9a255344 caused a warning about missing cookies to be
displayed even if those cookies were present, because _check_cookies()
did not account for an empty cookiedomain.
2022-01-01 17:55:58 +01:00
Mike Fährmann
1e0278702d
[hitomi] update '_parse_gg()'
2022-01-01 17:55:58 +01:00
Mike Fährmann
becc7f85a6
[hitomi] fix image URLs
2021-12-29 22:46:17 +01:00
Mike Fährmann
6af8d71da6
[kemonoparty] use service as subcategory ( closes #2147 )
2021-12-29 22:46:17 +01:00
Vrihub
96fcff182c
generic extractor ( #735 )
* Generic extractor, see issue #683
* Fix failed test_names test, no subcategory needed
* Prefix directory_fmt with "generic"
* Relax regex (would break some urls)
* Flake8 compliance
* pattern: don't require a scheme
This fixes a bug when we force the generic extractor on urls without a
scheme (that are allowed by all other extractors).
* Fix using g: and r: on urls without http(s) scheme
Almost all extractors accept urls without an initial http(s) scheme.
Many extractors also allow for generic subdomains in their "pattern"
variable; some of them implement this with the regex character class
"[^.]+" (everything but a dot).
This leads to a problem when the extractor is given a url starting
with g: or r: (to force using the generic or recursive extractor)
and without the http(s) scheme: e.g. with "r:foobar.tumblr.com"
the "r:" is wrongly considered part of the subdomain.
This commit fixes the bug, replacing the too generic "[^.]+" with the
more specific "[\w-]+" (letters, digits and "-", the only characters
allowed in domain names), which is already used by some extractors.
* Relax imageurl_pattern_ext: allow relative urls
* First round of small suggested changes
* Support image urls starting with "//"
* self.baseurl: remove trailing slash
* Relax regexp (didn't catch some image urls)
* Some fixes and cleanup
* Fix domain pattern; option to enable extractor
Fixed the domain section for "pattern", to pass "test_add" and
"test_add_module" tests.
Added the "enabled" configuration option (default False) to enable the
generic extractor. Using "g(eneric):URL" forces using the extractor.
2021-12-29 22:39:29 +01:00
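To illustrate the pattern change in 96fcff182c: a minimal sketch with two simplified tumblr-style patterns (not the real extractor patterns), assuming URL patterns are tried with re.match, i.e. anchored at the start of the input:

```python
import re

# Simplified subdomain patterns for illustration only
OLD = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")   # "[^.]+"
NEW = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")  # "[\w-]+"

url = "r:foobar.tumblr.com"
print(OLD.match(url).group(1))  # r:foobar  ("r:" swallowed as subdomain)
print(NEW.match(url))           # None  (left for the r: prefix handler)
print(NEW.match("foobar.tumblr.com").group(1))  # foobar
```

Because ':' is not in [\w-], the narrower class cannot absorb the "r:" prefix, so the site extractor no longer claims the URL.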
Mike Fährmann
4376b39a2b
[sexcom] fix and improve embed extraction ( fixes #2145 )
2021-12-28 21:59:39 +01:00
Mike Fährmann
6d190834ee
[instagram] fix error when PostPage data is not in GraphQL format
(#2037 )
2021-12-28 00:27:59 +01:00
Mike Fährmann
dd67e24aa9
[lolisafe] include file ID in filenames
More precisely, it now splits the full 'filename' into 'name' and 'id'
instead of overwriting 'filename'. The format string stays the same as
before. Use '{name}.{extension}' to restore the old behavior.
before:
- filename: foobar
- id : 12345
now:
- filename: foobar-12345
- name : foobar
- id : 12345
2021-12-25 17:16:45 +01:00
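A sketch of the 'filename' split described in dd67e24aa9, assuming the ID is whatever follows the last '-'; split_filename() is hypothetical and the real extractor may parse this differently:

```python
def split_filename(filename):
    """Split 'foobar-12345' into ('foobar', '12345') at the last hyphen.

    Without a hyphen, the whole string is treated as the name (assumption).
    """
    name, sep, file_id = filename.rpartition("-")
    if not sep:
        return filename, ""
    return name, file_id


print(split_filename("foobar-12345"))  # ('foobar', '12345')
print(split_filename("my-file-abc"))   # ('my-file', 'abc')
```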
Mike Fährmann
f3d61de18d
[artstation] create directories per asset ( closes #2136 )
2021-12-25 17:16:45 +01:00
Mike Fährmann
49a50fb2eb
[500px] create directories per photo
2021-12-25 17:16:45 +01:00
Mike Fährmann
89bebe1bef
[500px] add 'favorite' extractor ( closes #1927 )
2021-12-25 17:16:45 +01:00
Mike Fährmann
22b0433985
[fanbox] support pixiv redirects ( closes #2122 )
2021-12-25 17:15:39 +01:00
Mike Fährmann
281828b58b
[tumblrgallery] improve search pagination ( fixes #2132 )
2021-12-24 03:42:28 +01:00
Mike Fährmann
4bec34fc94
[pixiv] allow setting a date range for search results ( #2133 )
with the 'scd' and 'ecd' query parameters
2021-12-23 23:03:39 +01:00
Mike Fährmann
882c614281
add album extractor for lolisafe/chibisafe instances
- support bunkr.is (closes #2038 )
- support zz.ht (closes #2105 )
2021-12-21 19:24:17 +01:00
Mike Fährmann
d441888bfb
[deviantart] adjust API endpoints
Start all endpoints with a forward slash '/'
to be consistent with other API interfaces.
2021-12-21 00:18:06 +01:00
Mike Fährmann
8f0cf0bf71
[deviantart] use '/browse/newest' for most-recent searches
(#2096 )
2021-12-20 22:40:03 +01:00
Mike Fährmann
0bd7607da5
[tumblrgallery] improve 'id' extraction ( #2115 )
2021-12-19 05:46:02 +01:00
Mike Fährmann
0d02a7861e
[tumblrgallery] fix extraction ( closes #2112 )
2021-12-17 19:55:53 +01:00
Mike Fährmann
62692c6842
[exhentai] add 'source' option
setting it to "hitomi" downloads the corresponding gallery from
hitomi.la; might be extended to other sources in the future
2021-12-16 23:16:19 +01:00
Mike Fährmann
099ed72de7
[hitomi] disable extra 'metadata' by default
saves one HTTP request that is not needed with default filename settings
2021-12-16 22:21:07 +01:00
Mike Fährmann
9a25534490
use Extractor._check_cookies() for all cookie checks
2021-12-16 02:21:16 +01:00
Mike Fährmann
63c6bc26b5
[rule34us] extract tags per category ( #1527 )
like for other boorus with 'tags': true
2021-12-16 00:06:52 +01:00
Mike Fährmann
f587458a3c
[twitter] include '4096x4096' as a default image fallback
(closes #2107 , closes #1881 )
2021-12-15 23:19:30 +01:00
Mike Fährmann
8ed282f7f2
[kemonoparty] support coomer.party URLs ( #2100 )
2021-12-15 16:21:05 +01:00
Mike Fährmann
87ce3fa669
[furaffinity] warn when no session cookies were found
2021-12-15 16:21:05 +01:00
Mike Fährmann
159631c808
[philomena] use a default 'filter_id' if none is given
2021-12-15 16:20:53 +01:00
Mike Fährmann
ad30653b17
allow running a BaseExtractor for any URL
by prefixing it with '<base-category>:'
For example:
shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack
gelbooru_v01:https://5naf.booru.org/index.php?page=post&s=view&id=46963
Available base categories are:
mastodon, shopify, moebooru, gelbooru_v01, gelbooru_v02,
reactor, foolslide, foolfuuka, philomena
2021-12-15 00:32:17 +01:00
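A minimal sketch of recognizing the '<base-category>:' prefix, using the category list from the commit message above; split_base_prefix() is a hypothetical helper, and gallery-dl may well handle this inside its URL patterns instead:

```python
BASE_CATEGORIES = {
    "mastodon", "shopify", "moebooru", "gelbooru_v01", "gelbooru_v02",
    "reactor", "foolslide", "foolfuuka", "philomena",
}


def split_base_prefix(url):
    """Return (category, url) for '<base-category>:URL', else (None, url)."""
    prefix, sep, rest = url.partition(":")
    if sep and prefix in BASE_CATEGORIES:
        return prefix, rest
    return None, url


print(split_base_prefix(
    "gelbooru_v01:https://5naf.booru.org/index.php?page=post&s=view&id=46963"))
# ('gelbooru_v01', 'https://5naf.booru.org/index.php?page=post&s=view&id=46963')
print(split_base_prefix("https://example.org/"))  # (None, 'https://example.org/')
```

Checking the prefix against a fixed set keeps ordinary URLs, whose "https:" also contains a colon, from being misread as prefixed.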
Mike Fährmann
299bd2f1f5
[rule34us] add 'tag' and 'post' extractors ( #1527 )
2021-12-14 00:27:46 +01:00
Mike Fährmann
3cf1075d86
[inkbunny] add 'search' extractor ( closes #2094 )
2021-12-12 03:08:14 +01:00
Mike Fährmann
c6a23c26d7
[instagram] allow downloading specific stories ( closes #2088 )
https://instagram.com/stories/<USER>/<ID> now only downloads the one
story specified by <ID> and not all stories from that user.
2021-12-11 21:34:25 +01:00
Mike Fährmann
352ffcddb0
[instagram] match post URLs with usernames ( fixes #2085 )
2021-12-10 18:37:33 +01:00
Mike Fährmann
f4e3cee6ac
use yt-dlp by default ( #1850 , #2028 )
2021-11-29 18:24:26 +01:00
Mike Fährmann
f1b142e993
[kemonoparty] change default 'files' order to attachments,file,inline
(#1991 )
2021-11-29 04:41:30 +01:00
Mike Fährmann
275543b2d2
update extractor test results
2021-11-27 19:26:44 +01:00
Mike Fährmann
e7ea4f2567
[mangoxo] fix metadata extraction
2021-11-27 18:19:51 +01:00
Mike Fährmann
e298882acc
[kemonoparty] match URLs with www subdomain
2021-11-26 18:58:26 +01:00
Mike Fährmann
addb72e1bb
[reactor] support thatpervert.com ( closes #2029 )
2021-11-26 18:58:07 +01:00
Mike Fährmann
d8d9502e1e
[reactor] inherit from BaseExtractor
2021-11-26 18:58:07 +01:00
Mike Fährmann
f4ea216c95
[shopify] support loungeunderwear.com ( closes #2053 )
2021-11-26 18:58:06 +01:00
Mike Fährmann
93cef78450
[gelbooru] workaround pagination limits
Gelbooru only allows retrieving the latest 20k posts for a tag search.
Add 'id:<N' to the search tags to work around that limitation, where N
is the ID of the last retrieved post.
http://gelbooru.me/index.php?page=forum&s=view&id=1467
2021-11-26 18:56:31 +01:00
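A minimal sketch of the 'id:<N' workaround, independent of Gelbooru's real API parameters: search() stands in for one API request returning posts sorted by descending ID, and paginate_by_id() and the in-memory fake_search() are hypothetical:

```python
import re


def paginate_by_id(search, tags, per_page=100):
    """Yield every post for 'tags', newest first.

    Instead of a page offset, each follow-up request adds 'id:<N', where
    N is the ID of the last post already retrieved. 'search(query, limit)'
    is a hypothetical callable returning post dicts sorted by descending id.
    """
    query = tags
    while True:
        posts = search(query, per_page)
        yield from posts
        if len(posts) < per_page:
            return
        query = "{} id:<{}".format(tags, posts[-1]["id"])


# In-memory stand-in for the API: 250 posts with IDs 250..1
POSTS = [{"id": i} for i in range(250, 0, -1)]


def fake_search(query, limit):
    match = re.search(r"id:<(\d+)", query)
    below = int(match.group(1)) if match else float("inf")
    return [p for p in POSTS if p["id"] < below][:limit]


ids = [post["id"] for post in paginate_by_id(fake_search, "cat_ears")]
print(len(ids), ids[0], ids[-1])  # 250 250 1
```

Since every request restarts from the newest matching post below N, no request ever needs a deep page offset.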
Mike Fährmann
f2ae179713
[exhentai] fix extraction for disowned galleries ( closes #2055 )
2021-11-24 21:26:16 +01:00
Alice
612850438e
[skeb] add 'thumbnails' option ( #2047 ) ( #2051 )
2021-11-23 21:16:42 +01:00
Mike Fährmann
11a3d96d13
[mangadex] load additional metadata using includes[] directives
- always provide 'artist', 'author', and 'group' metadata fields (#2049 )
- remove 'metadata' option
2021-11-22 01:16:33 +01:00
Mike Fährmann
19e00f1322
[dynastyscans] provide 'date' as proper datetime object ( #2050 )
2021-11-21 22:50:52 +01:00
Mike Fährmann
af6424f398
allow testing metadata in list elements
2021-11-21 22:46:34 +01:00
Mike Fährmann
c67756e187
[kemonoparty] add 'dms' option ( #2008 )
2021-11-20 23:36:16 +01:00
Mike Fährmann
3a7a19c7b9
[dynastyscans] add 'manga' extractor ( closes #2035 )
2021-11-19 22:51:26 +01:00
Mike Fährmann
9bc83af3a6
[kemonoparty] 'postfile' -> 'file' ( #1991 )
to stay consistent with the existing file types for kemono
2021-11-19 01:50:48 +01:00
Mike Fährmann
522782c09d
[subscribestar] emit metadata for posts without media ( #1569 )
2021-11-18 23:42:17 +01:00
Mike Fährmann
1c8aaf9318
[subscribestar] add 'num' enumeration index ( closes #2040 )
2021-11-18 23:38:41 +01:00
Mike Fährmann
d433735750
[kemonoparty] skip duplicate files ( #2032 , #1991 , #1899 )
Extract the SHA-256 file hash from URLs
and skip files with the same hash in the same post.
- provide a 'hash' metadata field (empty string if not available)
- remove 'patreon-skip-file' option
2021-11-17 22:44:15 +01:00
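A sketch of per-post deduplication by file hash, assuming the SHA-256 digest appears as a 64-digit lowercase hex file name in the URL path; unique_files() and the sample URLs are hypothetical:

```python
import re

# Assumption: the SHA-256 hash is the 64-hex-digit file name in the URL,
# e.g. https://kemono.party/data/12/34/<sha256>.jpg
HASH_RE = re.compile(r"/([0-9a-f]{64})\.")


def unique_files(urls):
    """Yield (url, hash) for each file of one post, skipping files whose
    hash was already seen. 'hash' is "" when the URL contains none, and
    such files are never treated as duplicates."""
    seen = set()
    for url in urls:
        match = HASH_RE.search(url)
        file_hash = match.group(1) if match else ""
        if file_hash:
            if file_hash in seen:
                continue
            seen.add(file_hash)
        yield url, file_hash


sha = "0f" * 32
post_files = [
    "https://kemono.party/data/0f/0f/" + sha + ".jpg",
    "https://kemono.party/data/0f/0f/" + sha + ".jpg?f=copy.jpg",
    "https://kemono.party/data/misc/cover.png",
]
for url, file_hash in unique_files(post_files):
    print(url, file_hash or "(no hash)")
```

The 'seen' set is created per call, so duplicates are only skipped within the same post, matching the scope described in the commit.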
Mike Fährmann
d4ec245554
[kemonoparty] implement a 'files' option ( #1991 )
similar to 8d676151
2021-11-17 22:43:41 +01:00
Mike Fährmann
ab8eea1a24
[twitter] fix extractor for direct image links ( fixes #2030 )
2021-11-16 22:57:46 +01:00
Mike Fährmann
2076d40681
[ytdl] improve error handling ( #1680 )
2021-11-15 22:56:42 +01:00
Mike Fährmann
2aaac3c997
[instagram] include user metadata for 'tagged' downloads ( #2024 )
Adds
- tagged_owner_id
- tagged_full_name
- tagged_username
containing the values for the user profile the URL originated from,
e.g. 'instagram' for https://www.instagram.com/instagram/tagged/.
2021-11-15 21:21:59 +01:00
Mike Fährmann
cfa4876848
[philomena] support furbooru.org ( closes #1995 )
2021-11-15 20:57:51 +01:00
Mike Fährmann
4377f1c284
[twitter] distinguish between fatal & nonfatal errors ( #2020 )
only show a warning for nonfatal errors
and do not raise a StopExtraction exception
2021-11-13 22:46:40 +01:00
Kyle Anthony Williams
a14b72be21
[webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net ( #2005 )
* [webtoons] Use swebtoon-phinf.pstatic.net instead of webtoon-phinf.pstatic.net
This trick to avoid having to set a Referer header comes from
Webtoon's RSS feeds. The two URLs below are equivalent in content:
https://webtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
https://swebtoon-phinf.pstatic.net/20210929_153/1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90
The URL with the domain "webtoon-phinf.pstatic.net" needs a Referer
header, and the domain "swebtoon-phinf.pstatic.net" does not. This
is because of the environment "swebtoon" images live in, one without
explicit network control: RSS feeds on sites such as Feedly. This change should
make it easier for gallery-dl developers to embed Webtoon comics without
worrying about headers.
2021-11-11 20:03:34 +01:00
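The commit describes a plain host substitution. A small standalone sketch of that rewrite with urllib.parse; to_swebtoon() is a hypothetical name, and it does not check that a given image is actually served from the "s" host:

```python
from urllib.parse import urlsplit, urlunsplit


def to_swebtoon(url):
    """Swap the webtoon-phinf.pstatic.net host for swebtoon-phinf.pstatic.net,
    keeping path and query; any other URL is returned unchanged."""
    parts = urlsplit(url)
    if parts.netloc != "webtoon-phinf.pstatic.net":
        return url
    return urlunsplit(parts._replace(netloc="swebtoon-phinf.pstatic.net"))


print(to_swebtoon("https://webtoon-phinf.pstatic.net/20210929_153/"
                  "1632867980912DmcGK_JPEG/16328679808882705182.jpg?type=q90"))
```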
Mike Fährmann
6e3658ef52
[kemonoparty] provide 'date' metadata for gumroad ( #2007 )
Not the 'published' or 'edited' values since they are 'null',
but still better than nothing at all.
2021-11-11 19:38:10 +01:00
Mike Fährmann
37c9dedee1
[seisoparty] remove module
2021-11-09 22:41:04 +01:00