gallery-dl

mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-23 19:22:32 +01:00

Author	SHA1	Message	Date
Mike Fährmann	e59bcb8437	[weibo] ensure media URLs use https://	2022-06-03 17:37:57 +02:00
Mike Fährmann	73f673e3ca	[weibo] handle 'gif' pictures	2022-06-03 17:33:14 +02:00
Mike Fährmann	345199a3ec	[pixiv] include '.gif' in background fallback URLs (#2495 )	2022-06-03 17:25:23 +02:00
Mike Fährmann	57508d3bb7	[weibo] support all different 'tabtype' listings (#686 , #2601 )	2022-06-03 16:36:22 +02:00
Mike Fährmann	2687ef6bd9	[nozomi] remove slashes from search terms (fixes #2653 )	2022-06-02 22:17:15 +02:00
Mike Fährmann	ee7cea888e	[instagram] it is now possible to use 'id:…' instead of a user's screen name: - https://www.instagram.com/instagram/ - https://www.instagram.com/id:25025320/ similar to the same functionality for twitter: `a3b473bd2f` for /tagged/ URLs, using a user ID will only have 'tagged_owner_id' defined. 'tagged_username' and 'tagged_full_name', which are available when using a screen name, will not be defined.	2022-06-02 21:49:09 +02:00
Mike Fährmann	d0dc29f312	[instagram] fix stories (#2644 )	2022-06-02 13:14:19 +02:00
Mike Fährmann	2fb01938f4	[instagram] fix and update extractors (#2644 ) - use different way to fetch user IDs - use new API endpoints for /tagged/ and single posts	2022-06-01 22:05:45 +02:00
Mike Fährmann	05d4a0215a	[sankaku] extend URL patterns (fixes #2647 ) - support URLs with ISO 639-1 language codes - support black.… and white.… subdomains	2022-06-01 21:31:11 +02:00
Mike Fährmann	e0ac358aa5	[gofile] fix 401 Unauthorized errors (#2632 )	2022-06-01 13:02:34 +02:00
Mike Fährmann	8a42d859bf	[bunkr] change domain to 'app.bunkr.is' (#2634 )	2022-06-01 11:30:27 +02:00
Mike Fährmann	7a9cba9c10	[weibo] add support for usernames in URLs (#1662 )	2022-05-31 22:48:34 +02:00
Mike Fährmann	4bf5bc2403	[weibo] support 'livephoto' entries (#2146 )	2022-05-31 15:35:24 +02:00
Mike Fährmann	a0692818af	[weibo] switch to desktop API (#2601 )	2022-05-31 12:46:35 +02:00
Mike Fährmann	61fa9b535a	[paheal] improve metadata extraction (#2641 ) - unescape 'tags' - add 'date', 'source', and 'uploader' for single posts	2022-05-30 17:23:08 +02:00
Mike Fährmann	415c208c1f	[gfycat] cleanup	2022-05-29 15:24:23 +02:00
Mike Fährmann	a80ba17ed4	[gfycat] add 'collections' extractor (#2629 )	2022-05-29 14:39:08 +02:00
Mike Fährmann	ff5e10a86d	[hypnohub] move to gelbooru_v02 instances (#2631 )	2022-05-28 21:10:05 +02:00
Mike Fährmann	d6e744bf0f	[gfycat] add 'collection' extractor (#2629 )	2022-05-28 16:53:27 +02:00
Mike Fährmann	4f7fe9b4be	[deviantart] fix folder listings with 'pagination: manual' (#2488)	2022-05-27 18:41:06 +02:00
Mike Fährmann	310fee99d5	[readcomiconline] remove automatic 'browser' setting (#2625 )	2022-05-27 13:44:28 +02:00
Mike Fährmann	d4e9d51760	[reddit] add 'home' extractor (#2614 )	2022-05-26 15:28:33 +02:00
Infinitay	f54525573b	[Instagram] Add tagged_users to keywords for stories (#2582 ) (#2584 )	2022-05-25 17:02:42 +02:00
thatfuckingbird	da0696e1f5	recognize vxtwitter URLs (#2621 )	2022-05-25 17:01:58 +02:00
Mike Fährmann	dcb580240d	[twitter] extract alt texts as 'description' (closes #2617 )	2022-05-24 12:37:38 +02:00
Mike Fährmann	915dba8345	[twitter] improve results for regular user URLs - continuation of `3346f58a` - use media timeline results (or tweet timeline if retweets are enabled) plus search results starting from the last tweet id of the first timeline, similar to how Twitter Media Downloader operates - the old behavior can be forced by appending '/tweets' to a user URL, like with '/media' (https://twitter.com/USER/tweets) although there should be no need to ever do that	2022-05-23 18:33:52 +02:00
Mike Fährmann	9df4e0f65b	[twitter] disable 'cards' by default	2022-05-21 15:39:25 +02:00
Mike Fährmann	79dce8ae68	[weasyl] implement 'metadata' option (#2610 )	2022-05-20 22:32:35 +02:00
Mike Fährmann	9d5580a091	[khinsider] fix metadata extraction (closes #2611 )	2022-05-20 20:00:39 +02:00
Mike Fährmann	688d6553b4	replace calls to print() with stdout_write() (#2529 )	2022-05-19 17:09:24 +02:00
Mike Fährmann	86cbf485ab	[webtoons] extract real episode number (#2591 ) The number from the 'episode_no' query parameter got renamed to 'episode_no'.	2022-05-17 22:33:29 +02:00
Mike Fährmann	82c1cc130b	[readcomiconline] update deobfuscation code (#2481 )	2022-05-17 10:52:45 +02:00
Mike Fährmann	4005171db3	[pixiv] provide more metadata fields when option enabled (#2594 )	2022-05-15 14:47:14 +02:00
Mike Fährmann	c8abb16c60	[mangahere] send Referer headers (#2592 )	2022-05-15 14:41:16 +02:00
Mike Fährmann	3fd9249717	[mangafox] send Referer headers (#2592 )	2022-05-15 14:41:16 +02:00
Mike Fährmann	90d28387ef	[instagram] detect empty story listings faster	2022-05-15 14:41:03 +02:00
Mike Fährmann	bd6ec5c352	[foolfuuka] match 4chan filenames (#2577 ) introduce two new metadata fields: - filename_media: original filename of file uploaded to 4chan - timestamp_ms : timestamp with millisecond precision (tim)	2022-05-15 14:39:54 +02:00
Mike Fährmann	feb470d19a	[shopify] natively support a few more sites (closes #2089 ) - chelseacrew.com - michaels.com.au - modcloth.com - pinupgirlclothing.com - raidlondon.com (loveraid.com) - unique-vintage.com	2022-05-10 15:49:36 +02:00
Mike Fährmann	60f4d59b1e	[gelbooru_v01] remove 'tlb.booru.org' from supported domains 403 Forbidden nginx it is also no longer listed on https://booru.org/top	2022-05-10 12:23:05 +02:00
Mike Fährmann	6b6eb0b8f6	[lolisafe] implement 'domain' option (#2575 )	2022-05-10 12:17:59 +02:00
Mike Fährmann	d26da3b9e5	add pre-generated 'pattern' for supported BaseExtractor sites	2022-05-09 22:20:09 +02:00
Mike Fährmann	6ae3a5cdb0	[pixiv] make retrieving ugoira metadata non-fatal (#2562 )	2022-05-08 20:05:38 +02:00
Mike Fährmann	6742f3bc1e	implement --cookies-from-browser (#1606 ) most of the code is adapted from yt-dlp's implementation and should work the same.	2022-05-07 23:06:37 +02:00
Mike Fährmann	c4b9f7bab8	update functions working with cookies.txt files - rename - load_cookiestxt -> cookiestxt_load - save_cookiestxt -> cookiestxt_store - in cookiestxt_load, add cookies directly to a cookie jar instead of storing them in a list first - other unnoticeable performance increases	2022-05-06 13:21:29 +02:00
Mike Fährmann	f190018e37	[mangasee] use randomly generated PHPSESSID cookie (#2560 )	2022-05-05 19:35:32 +02:00
Mike Fährmann	4c47dfffdd	[instagram] report redirects to captcha challenges (#2543 )	2022-05-05 13:18:24 +02:00
Mike Fährmann	4598d32370	[imgur] prevent exception for empty albums (closes #2557 )	2022-05-04 17:34:50 +02:00
Mike Fährmann	435e9c5d2e	[vk] report errors for private albums (#2556 )	2022-05-04 17:34:50 +02:00
Mike Fährmann	9adea93aef	[pixiv] updates to avatar/background extractors (#2495 ) - add 'date' metadata to avatar/background files when available and use that in default filenames / archive ids - remove deprecation warnings as their option names clash with subcategory names	2022-05-04 17:30:54 +02:00
Mike Fährmann	3e6aba05ab	[vk] add fallback for user ID extraction (#2535 )	2022-05-03 13:42:45 +02:00
Mike Fährmann	52b47c3cf9	[gelbooru_v01] add 'favorite' extractor (#2546 )	2022-05-02 11:33:28 +02:00
Mike Fährmann	5b7423d14c	[vk] fix URLs for older photos (#2535 )	2022-05-02 11:19:18 +02:00
Mike Fährmann	3346f58a2a	[twitter] use twMediaDownloader strategy for user URLs - use media timeline + search for default user URLs like https://twitter.com/SCREEN_NAME - fetches all/most media for the type of twitter URL that most users use with gallery-dl - can be disabled by setting 'strategy' to any truthy value, like "timeline"	2022-05-02 09:03:35 +02:00
Mike Fährmann	84756982e9	[pixiv] implement 'include' option - split 'user' extractor and its 'avatar' and 'background' options into separate extractors ('artworks', 'avatar', 'background') - avatars can now be downloaded with https://www.pixiv.net/en/users/ID/avatar as URL and will use a proper archive key; similar for backgrounds - options for the 'user' subcategory must be moved to 'artworks' to have the same effect as before	2022-05-02 09:03:35 +02:00
Mike Fährmann	d11e2191ae	[nijie] support /history_nuita.php listings (closes #2541 )	2022-05-02 09:03:34 +02:00
Mike Fährmann	4aca29b7b4	[naverwebtoon] support (best)challenge comics (closes #2542 ) and update URL pattern to match URLs without '.nhn'	2022-05-02 09:03:34 +02:00
Mike Fährmann	3e926bd465	[realbooru] fix extraction (fixes #2530 )	2022-05-02 09:03:34 +02:00
Mike Fährmann	82eee72b39	[pixiv] update API interface - start all endpoints with '/' - use extractor.wait() for rate limit - retry with while loop instead of recursion - in case of error, write entire response to debug log	2022-05-02 09:03:34 +02:00
Mike Fährmann	1bc77efa02	[artstation] use "browser": "firefox" by default (#2527 )	2022-05-02 09:03:13 +02:00
Mike Fährmann	a39e7b7366	[vk] handle photos without width/height info (fixes #2535 )	2022-05-02 09:03:00 +02:00
Federico Ravasio	0381752575	[photovogue] switch to .com, update api endpoint (#2494 )	2022-04-27 22:37:53 +02:00
Mike Fährmann	3f02e483c6	[e621] fix applying request_interval_min (#2533 ) Setting this property after calling Extractor.__init__() has no effect.	2022-04-27 21:10:34 +02:00
Mike Fährmann	afde76269c	[weibo] fix infinite retries for deleted accounts (fixes #2521 )	2022-04-27 20:23:11 +02:00
Mike Fährmann	d85e66bcac	[vk] fix extraction (#2512 ) Use a different API endpoint, since thumbnail URLs from the old one cannot be transformed into URLs for "original" photos anymore.	2022-04-21 14:01:50 +02:00
Mike Fährmann	9e6ff42a9d	[pixiv] implement 'background' option (#623 , #1124 , #2495 )	2022-04-21 13:53:02 +02:00
Mike Fährmann	4d1896830f	[mangadex] download chapters with 'externalUrl' (fixes #2503 ) if they have pages hosted on mangadex	2022-04-18 18:09:52 +02:00
Mike Fährmann	97e8a15295	[deviantart] implement 'pagination' option (#2488 )	2022-04-18 18:08:01 +02:00
Mike Fährmann	1f9a0e2fd8	update extractor test results	2022-04-18 17:24:00 +02:00
Mike Fährmann	ad5a4b1756	[twitter] fix various syndication issues - handle retweets - fix videos without dimensions in URL (`3e942a58`) - fix '"retweets": "self"' filter (#2499)	2022-04-15 20:49:26 +02:00
Mike Fährmann	12bd9ba33a	[readcomiconline] add 'quality' option (#2467 )	2022-04-15 18:10:37 +02:00
Mike Fährmann	60ad46ddcc	[readcomiconline] unobfuscate image URLs (#2481 )	2022-04-15 18:04:09 +02:00
Mike Fährmann	a6c4ff58fb	[cyberdrop] match cyberdrop.to URLs (closes #2496 )	2022-04-15 15:39:29 +02:00
Mike Fährmann	13ed18b9aa	[lolisafe] fix typo LolisafelbumExtractor -> LolisafeAlbumExtractor	2022-04-15 15:02:30 +02:00
Mike Fährmann	3e942a58be	[twitter] improve syndication video selection (#2354 ) - ignore .m3u8 manifests - always select largest format	2022-04-11 17:06:10 +02:00
Mike Fährmann	0794027100	[issuu] fix extraction (#2483 )	2022-04-10 14:23:10 +02:00
Mike Fährmann	5d5a08cc69	[sexcom] add fallback for empty files (#2485 )	2022-04-10 14:22:07 +02:00
thatfuckingbird	4527a35aba	[twitter] accept fxtwitter.com URLs (#2484 )	2022-04-08 14:32:08 +02:00
Mike Fährmann	c1768972c2	[newgrounds] update and fix pagination (#2456 )	2022-04-07 15:38:41 +02:00
Mike Fährmann	78e5d0c423	[kissgoddess] extract all images (closes #2473 ) and not only the first two per page https://github.com/mikf/gallery-dl/issues/1052#issuecomment-1047367383	2022-04-06 21:28:40 +02:00
Mike Fährmann	0b33435da5	[pinterest] support multiple files per pin (closes #1619 , #2452 )	2022-04-06 21:21:33 +02:00
Mike Fährmann	9c5d2d7af3	[pinterest] add extractor for created pins (#2452 )	2022-04-01 16:59:58 +02:00
Mike Fährmann	1171911dc3	[twitter] add 'syndication' option (#2354 ) to fetch age-restricted content using Twitter's syndication API	2022-04-01 16:56:47 +02:00
Mike Fährmann	a53cfc845e	[newgrounds] warn about age-restricted posts (#2456 )	2022-03-30 16:18:33 +02:00
Mike Fährmann	ecee315bbf	[mangasee] unescape manga names (fixes #2454 )	2022-03-30 16:18:18 +02:00
loragja	7e545a3ae9	[gofile] add gofile.io extractor (#2364 ) * Add gofile extractor * add gofile extractor to module list * add support for tiny monitors and ancient python versions * seriously, f-strings are not that new... * i love flake8 :) * add 'api-token' and 'recursive' options * add tests	2022-03-29 17:31:57 +02:00
Layerex	625f4d4cc4	[telegraph] Add telegra.ph extractor (#2312 )	2022-03-28 19:18:13 +02:00
Mike Fährmann	48cc4853be	[skeb] refactor 'sent-requests' and add tests	2022-03-28 11:26:24 +02:00
Mike Fährmann	37d584a9b2	[hitomi] update metadata extraction (fixes #2444 ) remove 'hitomi.metadata' option, as it is no longer necessary to make additional HTTP requests to fetch all metadata.	2022-03-26 12:46:18 +01:00
Mike Fährmann	b03ca7f10c	[aryion] provide correct 'date' independent of dst	2022-03-24 22:57:18 +01:00
Mike Fährmann	ba69fb669d	[kemonoparty] add 'duplicates' option (closes #2440 )	2022-03-24 11:58:38 +01:00
Mike Fährmann	29db716a63	implement 'datetime_to_timestamp()' and rename 'to_timestamp()' to the more descriptive 'datetime_to_timestamp_string()'	2022-03-23 22:36:01 +01:00
Mike Fährmann	9313d4dc10	[pinterest] do not force 'm3u8_native' for video downloads (#2436 )	2022-03-21 10:11:51 +01:00
Mike Fährmann	42f2fd2ed7	[twibooru] fix posts without 'name' (fixes #2434 )	2022-03-21 10:08:37 +01:00
chinggg	6f1d5e8ab9	[unsplash] replace dash with space in search API queries (#2429 )	2022-03-19 16:00:05 +01:00
Mike Fährmann	f8230dde43	[instagram] add 'previews' option (#2135 )	2022-03-19 15:26:40 +01:00
Mike Fährmann	500a479026	fix a third(!) bug in _check_cookies() (#2372 ) turns out tests are worthless if you get em wrong ...	2022-03-18 19:52:37 +01:00
Mike Fährmann	c4cc387f7d	[furaffinity] fix search result pagination (fixes #2402 )	2022-03-18 13:44:36 +01:00
Mike Fährmann	281a5b3b28	[newgrounds] fix video descriptions (#2328 )	2022-03-14 08:38:20 +01:00
Mike Fährmann	b1b15d6cef	[imagebam] add support for /view/ paths (closes #2378 )	2022-03-14 08:38:20 +01:00
Mike Fährmann	e64c2b85d0	[fantia] apply patch (#2381 ) from @thatfuckingbird with small adjustments https://github.com/mikf/gallery-dl/issues/2381#issuecomment-1063208696	2022-03-11 18:02:31 +01:00
Mike Fährmann	f31ab0d2ec	[fanbox] fetch data for each individual post (fixes #2388 ) Posts from 'https://api.fanbox.cc/post.listCreator' do not contain a 'body' with all images anymore. https://github.com/mikf/gallery-dl/pull/1459#discussion_r614322881	2022-03-11 17:36:05 +01:00
Mike Fährmann	fc277fa45f	[seiga] require authentication with 'user_session' cookie (#2372 ) Login with username & password would now require entering a 2FA token. see also `7b009cc893`	2022-03-11 02:10:15 +01:00
Mike Fährmann	47cf05c4ab	refactor proxy handling code (#2357 ) - allow gallery-dl proxy settings to overwrite environment proxies - allow specifying different proxies for data extraction and download - add 'downloader.proxy' option - '-o extractor.proxy=PROXY_URL -o downloader.proxy=null' now has the same effect as youtube-dl's '--geo-verification-proxy'	2022-03-10 23:55:35 +01:00
Mike Fährmann	d50a1ec2cc	[subscribestar] unescape attachment URLs (fixes #2370 )	2022-03-09 19:06:04 +01:00
Mike Fährmann	3ddc620ef6	[skeb] fix post extractor (#2330 )	2022-03-09 18:45:07 +01:00
Orkun Koçyiğit	eb2bb7d998	[fantia] add 'num' enumeration index (#2377 ) * Adding numerical ordering to fantia * Fixed line to fit PEP8 line size limit	2022-03-08 22:06:41 +01:00
Mike Fährmann	fac8047899	[kemonoparty] limit default filename length (#2373 )	2022-03-08 21:14:47 +01:00
Mike Fährmann	bfa5e61900	[patreon] add explicit 'image_large' file type (#2257 ) to allow more control over when and if to download 'large_url' images `4fee3a0e52` forced them to be downloaded instead of regular images, even though 'large_url' images are most likely an upscaled version of the original.	2022-03-06 17:07:13 +01:00
Mike Fährmann	6ea3ff5173	[tumblr] notify users about registering an oauth application if they hit the daily rate limit and are using default API credentials	2022-03-06 16:28:53 +01:00
Mike Fährmann	b5236656d5	[deviantart] notify users about registering an oauth application if they get repeated 429 errors and are using default API credentials	2022-03-06 16:24:39 +01:00
Mike Fährmann	2aa47e8382	[twitter] handle Tweets with "softIntervention" entries or other such things where the actual Tweet data is one level deeper than usual	2022-03-03 02:06:54 +01:00
Mike Fährmann	64bbc7969d	[twitter] warn about age-restricted Tweets (#2354 )	2022-03-03 02:03:27 +01:00
Mike Fährmann	e778be52bc	[twitter] update query hashes	2022-03-02 23:05:31 +01:00
Mike Fährmann	bddcec49f1	implement 'text.root_from_url()' use domain from input URL for kemono	2022-03-01 03:09:57 +01:00
Mike Fährmann	92c492dc09	[kemonoparty] match beta.kemono.party URLs (#2348 )	2022-03-01 03:02:30 +01:00
Mike Fährmann	4ea9157d51	[mangadex] fix chapters without 'translatedLanguage' (#2352 )	2022-03-01 02:04:25 +01:00
Alice	f1cab23724	[skeb] add 'sent-requests' option (#2322 ) (#2330 ) * Update skeb.py * Update configuration.rst * flake8	2022-02-28 22:42:15 +01:00
dragobit	781fdfa212	[hentaicosplays] add Referer to headers (#2317 )	2022-02-28 22:19:32 +01:00
Mike Fährmann	4385a34e05	[twitter] fix handling of 429 responses (fixes #2339 ) Twitter doesn't return a valid JSON response for 429 errors anymore.	2022-02-28 16:42:55 +01:00
Mike Fährmann	5a50569360	[toyhouse] support 'art' listings (#1546 , #2331 )	2022-02-27 16:22:50 +01:00
Mike Fährmann	1c79044433	[imagebam] set 'nsfw_inter' cookie (fixes #2334 )	2022-02-27 16:12:28 +01:00
Mike Fährmann	d71c173150	[newgrounds] strip incomplete HTML tag from '_comment' (#2328 )	2022-02-23 21:42:28 +01:00
Mike Fährmann	cf58048bd4	[newgrounds] add 'post_url' metadata field (#2328 )	2022-02-23 00:00:23 +01:00
Mike Fährmann	7aa2e2cd84	[slideshare] fix extraction	2022-02-21 02:52:45 +01:00
Mike Fährmann	fdfdc1b614	[kissgoddess] add 'gallery' and 'model' extractors (closes #1052, #2304)	2022-02-20 04:45:37 +01:00
Mike Fährmann	79a461a2c1	[mememuseum] add 'tag' and 'post' extractors (closes #2264 )	2022-02-20 02:15:38 +01:00
Mike Fährmann	e5f6af6e32	[oauth:pixiv] add note about 'code' expiring in 30 seconds (#2306 )	2022-02-19 23:47:30 +01:00
Mike Fährmann	bbc4190017	[bunkr] fix .mp4 downloads (#2239 ) again ...	2022-02-19 03:55:14 +01:00
Mike Fährmann	254a5b26e0	[twibooru] add extractors for searches, galleries, and posts (#2219)	2022-02-18 23:43:57 +01:00
Mike Fährmann	9ebc20e290	[booru] call nameext_from_url() before update() and _prepare() to be able to overwrite filename and extension in _prepare()	2022-02-18 00:37:59 +01:00
Mike Fährmann	4fee3a0e52	[patreon] download 'large_url' images if available (#2257 )	2022-02-17 18:23:59 +01:00
Mike Fährmann	f5b2b9333f	fix another bug in _check_cookies (#2160 ) regression introduced in `ed317bfc` Added a couple of tests to hopefully catch such bugs before they land in a release.	2022-02-16 22:58:57 +01:00
Ailothaen	203a04a4a3	[reddit] Support of standalone submissions on personal pages of users (#2301 ) * [reddit] Support of submissions on personal pages of users * [reddit] Design improvement for user submissions * [reddit] Removed functions declared twice	2022-02-13 23:03:46 +01:00
Mike Fährmann	806bc62379	[redgifs] support 'i.redgifs.com' URLs (closes #2300 )	2022-02-13 23:00:50 +01:00
Mike Fährmann	655b2de5d9	[vk] fix infinite pagination loops (fixes #2297 )	2022-02-13 23:00:50 +01:00
Mike Fährmann	cc5b1ce91a	[inkbunny] rename search parameters to their API equivalents (fixes #2292)	2022-02-13 23:00:49 +01:00
Mike Fährmann	ed317bfcf1	warn about cookies expiring in less than 24 hours requires an expiration timestamp, so this only works with cookies from a cookies.txt file	2022-02-13 23:00:49 +01:00
David Hoppenbrouwers	b17e2dcf93	[wallpapercave] add extractor for images (#2205 )	2022-02-11 23:44:51 +01:00
v-delta	c661737f36	[Imgbox] Fix ImgboxExtractor (#2281 )	2022-02-11 22:17:02 +01:00
Thomas Jost	a7de819aca	[lightroom] add Lightroom gallery extractor (#2263 )	2022-02-11 21:30:59 +01:00
Mike Fährmann	563bd0ecf4	[danbooru] inherit from BaseExtractor - merge danbooru and e621 code - support booru.allthefallen.moe (closes #2283) - remove support for old e621 tag search URLs	2022-02-11 21:01:51 +01:00
Mike Fährmann	bc0e853d30	combine KeyError & IndexError to common base class LookupError	2022-02-11 00:42:49 +01:00
Mike Fährmann	f1c853c6ef	[furaffinity] add 'layout' option (#2277 ) to be able to force gallery-dl to parse according to a specific layout in case its auto-detect fails	2022-02-11 00:28:47 +01:00
Mike Fährmann	b4f8e15a1f	allow BaseExtractors to use the domain of the matched URL	2022-02-10 01:38:50 +01:00
Mike Fährmann	a57a44f510	[kemonoparty] handle files without 'name' (fixes #2276 )	2022-02-08 18:27:05 +01:00
Mike Fährmann	4efe56f419	[furaffinity] improve new/old layout detection (fixes #2277 )	2022-02-08 18:10:52 +01:00
Mike Fährmann	0f1e7ff319	[twitter] fix extraction (#2275 )	2022-02-07 23:18:35 +01:00
Mike Fährmann	dee0d22561	update extractor test results	2022-02-06 21:39:24 +01:00
Mike Fährmann	d7b8e04b50	[kemonoparty] use 'Accept-Encoding: identity' for all downloads (#2267) fixes issues when data sent with 'Content-Encoding: gzip' or other encodings is larger than the actual file	2022-02-05 18:06:58 +01:00
enormous-muscles	55326377d8	Add Kohlchan extractor (#2251 )	2022-02-04 23:22:17 +01:00
Mike Fährmann	cc7dce5755	[sexcom] add 'pins' extractor (closes #2265 )	2022-02-04 20:55:00 +01:00
Mike Fährmann	02e18f56be	[e621] add 'favorite' extractor (closes #2250 )	2022-02-04 20:54:48 +01:00
Mike Fährmann	70e6e1549e	[twitter] provide fallback URLs for card images `f2e8aedd74 (commitcomment-64057751)`	2022-02-03 23:43:18 +01:00
Mike Fährmann	86fa412b47	[hitomi] add 'format' option (#2260 ) default is 'webp' since downloading original files is no longer allowed	2022-02-03 23:32:19 +01:00
Mike Fährmann	492436f936	[twitter] add 'warnings' option (#2258 ) disable reporting any non-fatal errors by default	2022-02-02 18:37:19 +01:00
Mike Fährmann	a5163e4c70	[twitter] restore 'logout' functionality (#1719 )	2022-02-02 18:21:15 +01:00
Mike Fährmann	f58364f6a8	update Firefox cipher list	2022-02-01 02:33:01 +01:00
Mike Fährmann	7e6981dda6	rename 'disabletls12' to 'tls12' and let config options override any default settings	2022-02-01 01:37:03 +01:00
Mike Fährmann	bb3e182562	overhaul session initialization - share adapter & connection pool across sessions with the same ssl options, ssl ciphers, and source address - simplify browser emulation to just a list of headers and ciphers	2022-01-31 23:12:08 +01:00
Mike Fährmann	e670dc518e	[weibo] update pagination code (fixes #2244 ) - send proper headers and query parameters - use 'since_id' instead of page numbers - set a 1-2 second delay between requests	2022-01-31 19:16:01 +01:00
Robert Pendell	4c651f6252	[patreon] Disable TLS 1.2 by default (#2249 ) Disables TLS 1.2 on Patreon by default.	2022-01-30 23:30:44 +01:00
Robert Pendell	392cf079f7	Add ability to disable TLS 1.2 (#2243 ) Fix for Patreon Cloudflare issues by having only TLS v1.3 or higher establish HTTPS connections This now allows you to disable it on a per-host or global basis. Add disabletls12 as a config option either under extractor.(host) or just under extractor. Option is false by default. Example: "patreon": { "disabletls12": true, "cookies": { "session_id": "X" } }	2022-01-30 22:14:43 +01:00
Mike Fährmann	d33227fc38	[twitter] restore errors for protected timelines etc (fixes #2237 )	2022-01-30 16:42:13 +01:00
Mike Fährmann	ebd3d5c1cc	[bunkr] fix .mp4 downloads (closes #2239 )	2022-01-28 23:21:16 +01:00
Mike Fährmann	e2be199124	[gelbooru] improve and fix pagination (#2230 , #2232 ) Use 'id:<POSTID' as a tag instead of going through pages with 'pid'. Something similar was already implemented in `93cef784`, but that got broken again in `3085aac4`.	2022-01-27 17:44:47 +01:00
Mike Fährmann	8230f31800	[twitter] update query hashes	2022-01-26 00:49:46 +01:00
Mike Fährmann	c180806cec	[twitter] fix deleted/invalid retweets (#2225 )	2022-01-25 23:57:13 +01:00
Mike Fährmann	a2eecc6aa8	[kemonoparty] fix DMs extraction (#2008 )	2022-01-25 23:16:13 +01:00
Mike Fährmann	2bf554a896	[twitter] fix several errors (#2212 , #2216 , #2225 ) - fix Tweets with deleted quotes - fix suspended Tweets without 'legacy' entry - fix unified_cards without 'type'	2022-01-25 16:13:22 +01:00
Mike Fährmann	e5242b83bf	[twitter] define directory format for events (#2109 )	2022-01-24 17:44:17 +01:00
Mike Fährmann	efb3e65a6a	[sexcom] extend URL pattern (fixes #2220 )	2022-01-24 01:19:40 +01:00
vsyx	3f2b6335d7	[instagram] fix highlights extraction (#2197 ) * [instagram] fix highlights extraction * [instagram] improve highlights extraction - 'yield' individual reels instead of collecting them in a list and returning them all at once - reduce 'chunk_size' to an even safer value (instagram.com also uses 5)	2022-01-24 00:20:12 +01:00
Mike Fährmann	5ed26e1773	[twitter] fix pinned tweets (#2216 ) caused by the changes in `dffa440ede`	2022-01-23 22:52:57 +01:00
Mike Fährmann	a9f78e6527	[twitter] improve error handling - handle accounts without 'rest_id' - handle timelines with empty 'instructions'	2022-01-23 18:01:05 +01:00
Mike Fährmann	729b07c1f5	[twitter] simplify - use dict with common GraphQL variables - reduce 'variables' size with custom JSON encoder instance - centralise TwitterAPI() creation	2022-01-23 01:44:55 +01:00
Mike Fährmann	7cb29224f0	[philomena] fix search parameter escaping (#2215 ) The pluses from search terms in /tags/ URLs need to be replaced with spaces to get accepted by Philomena.	2022-01-23 01:03:37 +01:00
Mike Fährmann	9ca8bb2dc0	[twitter] improve error handling	2022-01-22 23:09:45 +01:00
Mike Fährmann	9a221494c3	[twitter] add 'event' extractor (closes #2109 )	2022-01-22 20:55:50 +01:00
Mike Fährmann	14867dad6b	[twitter] fix unified cards from search results	2022-01-22 20:25:10 +01:00
Mike Fährmann	dffa440ede	[twitter] improve handling of deleted tweets (#2212 )	2022-01-22 00:41:58 +01:00
Mike Fährmann	54ef874ba4	[twitter] fix retweet filter (#2212 )	2022-01-21 23:53:59 +01:00
Mike Fährmann	cb43f7731b	[twitter] update to GraphQL API (#2212 ) The old REST API endpoints, which were not used by Twitter since summer 2021, are going to finally be phased out it seems, with '/2/timeline/profile/USERID.json' being the first one. Only Twitter's search doesn't have a GraphQL interface yet.	2022-01-21 23:34:41 +01:00
Mike Fährmann	de754590e0	add --source-address command-line option (closes #2206 )	2022-01-21 17:07:56 +01:00
Mike Fährmann	698f35215e	[blogger] support new image domain (fixes #2204 )	2022-01-20 23:13:07 +01:00
Mike Fährmann	c587b678d0	[mangadex] re-enable warning for external chapters (#2193 )	2022-01-16 03:21:50 +01:00
Mike Fährmann	f2e8aedd74	[twitter] changes to 'cards' option - change default value to 'true' - only invoke youtube-dl for cards unsupported by gallery when 'cards' is set to "ytdl" "cards": true --> only download card images "cards": "ytdl" --> download card images and use youtube_dl on otherwise unsupported cards	2022-01-15 22:02:57 +01:00
Mike Fährmann	2d34d8ff8b	[reddit] allow downloading from quarantined subreddits (#2180 )	2022-01-14 21:55:59 +01:00
Mike Fährmann	17c9c47ca0	[hitomi] fix 'tag' extraction (fixes #2189 )	2022-01-13 16:45:46 +01:00
Mike Fährmann	df2f0c09bb	[twitter] support "image_carousel_website" unified cards	2022-01-13 16:05:52 +01:00
Mike Fährmann	cdc96e1217	[gelbooru] improve video file detection (fixes #2188 ) not all files from 'https://video-cdnN.gelbooru.com' are videos	2022-01-12 21:33:02 +01:00
Mike Fährmann	4acc31bd9f	[newgrounds] set suitabilities filter before starting a search	2022-01-11 23:50:29 +01:00
Mike Fährmann	170711af7e	[mangadex] fix extraction (closes #2177 )	2022-01-08 17:21:35 +01:00
Mike Fährmann	199e7616a7	[rule34] use https://api.rule34.xxx for API requests	2022-01-08 17:14:50 +01:00
Mike Fährmann	37beb1298e	[newgrounds] add 'search' extractor (closes #2161 )	2022-01-06 19:32:39 +01:00
Mike Fährmann	8b910dd8ae	[hitomi] fix image URLs again and again ...	2022-01-06 18:21:26 +01:00
Mike Fährmann	3085aac4d8	[gelbooru] handle changed API response format (#2157 )	2022-01-03 16:42:48 +01:00
Mike Fährmann	38e2af29d6	[hitomi] fix image URLs update '_parse_gg()' yet again	2022-01-03 16:41:00 +01:00
Mike Fährmann	6f2e0c9c3d	fix cookie checks for patreon, fanbox, fantia The changes in `9a255344` caused a warning about missing cookies to be displayed even if those cookies were present, because _check_cookies() did not account for an empty cookiedomain.	2022-01-01 17:55:58 +01:00
Mike Fährmann	1e0278702d	[hitomi] update '_parse_gg()'	2022-01-01 17:55:58 +01:00
Mike Fährmann	becc7f85a6	[hitomi] fix image URLs	2021-12-29 22:46:17 +01:00
Mike Fährmann	6af8d71da6	[kemonoparty] use service as subcategory (closes #2147 )	2021-12-29 22:46:17 +01:00
Vrihub	96fcff182c	generic extractor (#735 ) * Generic extractor, see issue #683 * Fix failed test_names test, no subcategory needed * Prefix directory_fmt with "generic" * Relax regex (would break some urls) * Flake8 compliance * pattern: don't require a scheme This fixes a bug when we force the generic extractor on urls without a scheme (that are allowed by all other extractors). * Fix using g: and r: on urls without http(s) scheme Almost all extractors accept urls without an initial http(s) scheme. Many extractors also allow for generic subdomains in their "pattern" variable; some of them implement this with the regex character class "[^.]+" (everything but a dot). This leads to a problem when the extractor is given a url starting with g: or r: (to force using the generic or recursive extractor) and without the http(s) scheme: e.g. with "r:foobar.tumblr.com" the "r:" is wrongly considered part of the subdomain. This commit fixes the bug, replacing the too generic "[^.]+" with the more specific "[\w-]+" (letters, digits and "-", the only characters allowed in domain names), which is already used by some extractors. * Relax imageurl_pattern_ext: allow relative urls * First round of small suggested changes * Support image urls starting with "//" * self.baseurl: remove trailing slash * Relax regexp (didn't catch some image urls) * Some fixes and cleanup * Fix domain pattern; option to enable extractor Fixed the domain section for "pattern", to pass "test_add" and "test_add_module" tests. Added the "enabled" configuration option (default False) to enable the generic extractor. Using "g(eneric):URL" forces using the extractor.	2021-12-29 22:39:29 +01:00
Mike Fährmann	4376b39a2b	[sexcom] fix and improve embed extraction (fixes #2145 )	2021-12-28 21:59:39 +01:00
Mike Fährmann	6d190834ee	[instagram] fix error when PostPage data is not in GraphQL format (#2037)	2021-12-28 00:27:59 +01:00
Mike Fährmann	dd67e24aa9	[lolisafe] include file ID in filenames More precisely, it now splits the full 'filename' into 'name' and 'id' instead of overwriting 'filename'. The format string stays the same as before. Use '{name}.{extension}' to restore the old behavior. before: - filename: foobar - id : 12345 now: - filename: foobar-12345 - name : foobar - id : 12345	2021-12-25 17:16:45 +01:00
Mike Fährmann	f3d61de18d	[artstation] create directories per asset (closes #2136 )	2021-12-25 17:16:45 +01:00
Mike Fährmann	49a50fb2eb	[500px] create directories per photo	2021-12-25 17:16:45 +01:00
Mike Fährmann	89bebe1bef	[500px] add 'favorite' extractor (closes #1927 )	2021-12-25 17:16:45 +01:00
Mike Fährmann	22b0433985	[fanbox] support pixiv redirects (closes #2122 )	2021-12-25 17:15:39 +01:00
Mike Fährmann	281828b58b	[tumblrgallery] improve search pagination (fixes #2132 )	2021-12-24 03:42:28 +01:00
Mike Fährmann	4bec34fc94	[pixiv] allow setting a date range for search results (#2133 ) with the 'scd' and 'ecd' query parameters	2021-12-23 23:03:39 +01:00
Mike Fährmann	882c614281	add album extractor for lolisafe/chibisafe instances - support bunkr.is (closes #2038) - support zz.ht (closes #2105)	2021-12-21 19:24:17 +01:00
Mike Fährmann	d441888bfb	[deviantart] adjust API endpoints Start all endpoints with a forward slash '/' to be consistent with other API interfaces.	2021-12-21 00:18:06 +01:00
Mike Fährmann	8f0cf0bf71	[deviantart] use '/browse/newest' for most-recent searches (#2096)	2021-12-20 22:40:03 +01:00
Mike Fährmann	0bd7607da5	[tumblrgallery] improve 'id' extraction (#2115 )	2021-12-19 05:46:02 +01:00
Mike Fährmann	0d02a7861e	[tumblrgallery] fix extraction (closes #2112 )	2021-12-17 19:55:53 +01:00
Mike Fährmann	62692c6842	[exhentai] add 'source' option setting it to "hitomi" downloads the corresponding gallery from hitomi.la; might be extended to other sources in the future	2021-12-16 23:16:19 +01:00
Mike Fährmann	099ed72de7	[hitomi] disable extra 'metadata' by default; saves one HTTP request that is not needed with default filename settings	2021-12-16 22:21:07 +01:00
Mike Fährmann	9a25534490	use Extractor._check_cookies() for all cookie checks	2021-12-16 02:21:16 +01:00
Mike Fährmann	63c6bc26b5	[rule34us] extract tags per category (#1527 ) like for other boorus with 'tags': true	2021-12-16 00:06:52 +01:00
Mike Fährmann	f587458a3c	[twitter] include '4096x4096' as a default image fallback (closes #2107, closes #1881)	2021-12-15 23:19:30 +01:00
Mike Fährmann	8ed282f7f2	[kemonoparty] support coomer.party URLs (#2100 )	2021-12-15 16:21:05 +01:00
Mike Fährmann	87ce3fa669	[furaffinity] warn when no session cookies were found	2021-12-15 16:21:05 +01:00
Mike Fährmann	159631c808	[philomena] use a default 'filter_id' if none is given	2021-12-15 16:20:53 +01:00
Mike Fährmann	ad30653b17	allow running a BaseExtractor for any URL by prefixing it with '<base-category>:' For example: shopify:https://partakefoods.com/products/crunchy-cookie-variety-pack gelbooru_v01:https://5naf.booru.org/index.php?page=post&s=view&id=46963 Available base categories are: mastodon, shopify, moebooru, gelbooru_v01, gelbooru_v02, reactor, foolslide, foolfuuka, philomena	2021-12-15 00:32:17 +01:00
Mike Fährmann	299bd2f1f5	[rule34us] add 'tag' and 'post' extractors (#1527 )	2021-12-14 00:27:46 +01:00
Mike Fährmann	3cf1075d86	[inkbunny] add 'search' extractor (closes #2094 )	2021-12-12 03:08:14 +01:00
Mike Fährmann	c6a23c26d7	[instagram] allow downloading specific stories (closes #2088 ) https://instagram.com/stories/<USER>/<ID> now only downloads the one story specified by <ID> and not all stories from that user.	2021-12-11 21:34:25 +01:00
Mike Fährmann	352ffcddb0	[instagram] match post URLs with usernames (fixes #2085)	2021-12-10 18:37:33 +01:00
Mike Fährmann	f4e3cee6ac	use yt-dlp by default (#1850 , #2028 )	2021-11-29 18:24:26 +01:00
Mike Fährmann	f1b142e993	[kemonoparty] change default 'files' order to attachments,file,inline (#1991)	2021-11-29 04:41:30 +01:00
Mike Fährmann	275543b2d2	update extractor test results	2021-11-27 19:26:44 +01:00
Mike Fährmann	e7ea4f2567	[mangoxo] fix metadata extraction	2021-11-27 18:19:51 +01:00
Mike Fährmann	e298882acc	[kemonoparty] match URLs with www subdomain	2021-11-26 18:58:26 +01:00
Mike Fährmann	addb72e1bb	[reactor] support thatpervert.com (closes #2029 )	2021-11-26 18:58:07 +01:00
Mike Fährmann	d8d9502e1e	[reactor] inherit from BaseExtractor	2021-11-26 18:58:07 +01:00
Mike Fährmann	f4ea216c95	[shopify] support loungeunderwear.com (closes #2053 )	2021-11-26 18:58:06 +01:00
Mike Fährmann	93cef78450	[gelbooru] workaround pagination limits Gelbooru only allows to retrieve the latest 20k posts for a tag search. Add 'id:<N' to the search tags to work around that limitation, where N is the ID of the last retrieved post. http://gelbooru.me/index.php?page=forum&s=view&id=1467	2021-11-26 18:56:31 +01:00
Mike Fährmann	f2ae179713	[exhentai] fix extraction for disowned galleries (closes #2055 )	2021-11-24 21:26:16 +01:00
Alice	612850438e	[skeb] add 'thumbnails' option (#2047 ) (#2051 )	2021-11-23 21:16:42 +01:00
Mike Fährmann	11a3d96d13	[mangadex] load additional metadata using includes[] directives - always provide 'artist', 'author', and 'group' metadata fields (#2049) - remove 'metadata' option	2021-11-22 01:16:33 +01:00
Mike Fährmann	19e00f1322	[dynastyscans] provide 'date' as proper datetime object (#2050 )	2021-11-21 22:50:52 +01:00
Mike Fährmann	af6424f398	allow testing metadata in list elements	2021-11-21 22:46:34 +01:00
Mike Fährmann	c67756e187	[kemonoparty] add 'dms' option (#2008 )	2021-11-20 23:36:16 +01:00
Mike Fährmann	3a7a19c7b9	[dynastyscans] add 'manga' extractor (closes #2035 )	2021-11-19 22:51:26 +01:00
Mike Fährmann	9bc83af3a6	[kemonoparty] 'postfile' -> 'file' (#1991 ) to stay consistent with the existing file types for kemono	2021-11-19 01:50:48 +01:00
Mike Fährmann	522782c09d	[subscribestar] emit metadata for posts without media (#1569 )	2021-11-18 23:42:17 +01:00
Mike Fährmann	1c8aaf9318	[subscribestar] add 'num' enumeration index (closes #2040 )	2021-11-18 23:38:41 +01:00
Mike Fährmann	d433735750	[kemonoparty] skip duplicate files (#2032 , #1991 , #1899 ) Extract the SHA-256 file hash from URLs and skip files with the same hash in the same post. - provide a 'hash' metadata field (empty string if not available) - remove 'patreon-skip-file' option	2021-11-17 22:44:15 +01:00
Mike Fährmann	d4ec245554	[kemonoparty] implement a 'files' option (#1991 ) similar to `8d676151`	2021-11-17 22:43:41 +01:00

2924 Commits
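
Note on commit 96fcff182c (generic extractor): the message describes how a subdomain character class of "[^.]+" wrongly absorbs a leading "r:" or "g:" prefix when the URL has no http(s) scheme, and how "[\w-]+" avoids this. The following is a minimal sketch of that behavior. The two patterns are simplified, hypothetical stand-ins and are not the actual extractor patterns from the repository.

```python
import re

# Hypothetical, simplified patterns for illustration only;
# the real extractor patterns in gallery-dl are longer.
OLD_PATTERN = re.compile(r"(?:https?://)?([^.]+)\.tumblr\.com")
NEW_PATTERN = re.compile(r"(?:https?://)?([\w-]+)\.tumblr\.com")


def subdomain(pattern, url):
    """Return the captured subdomain, or None if the URL does not match."""
    match = pattern.match(url)
    return match.group(1) if match else None


# URL without a scheme, prefixed with "r:" to force the recursive extractor
url = "r:foobar.tumblr.com"

# "[^.]+" accepts ":" and therefore treats "r:" as part of the subdomain
old_result = subdomain(OLD_PATTERN, url)

# "[\w-]+" rejects ":", so the prefixed URL no longer matches this pattern
new_result = subdomain(NEW_PATTERN, url)

# An unprefixed URL still matches as before
plain_result = subdomain(NEW_PATTERN, "foobar.tumblr.com")

print(old_result, new_result, plain_result)
```

With the stricter class, the prefixed URL is left unmatched by the site-specific pattern, which is what allows the "r:" or "g:" prefix to be handled separately.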