gallery-dl

mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-24 03:32:33 +01:00

Author	SHA1	Message	Date
Mike Fährmann	73373c06ec	[weibo] handle posts with more than 9 images (closes #926 ) Responses from '/api/container/getIndex' don't list more than 9 images per 'status' object, but the embedded JSON from a '/detail/<ID>' page does.	2020-10-06 18:16:08 +02:00
Mike Fährmann	dd1e545597	[hentaifoundry] rename GalleryExtractor to PicturesExtractor	2020-10-04 22:53:23 +02:00
Mike Fährmann	c874071f5a	[kissmanga] remove module	2020-10-04 22:46:41 +02:00
Mike Fährmann	93e04bf9a9	[500px] update query hashes	2020-10-03 19:25:28 +02:00
Mike Fährmann	844502cad5	update extractor test results	2020-10-03 19:24:19 +02:00
Mike Fährmann	fad7748b6b	[xvideos] fix 'title' extraction	2020-10-01 22:04:14 +02:00
Mike Fährmann	5b927c15df	[newgrounds] fix video extraction (closes #1042 )	2020-10-01 20:14:16 +02:00
Mike Fährmann	bdc6c8f074	improve message for 'oauth:deviantart' etc (closes #989 )	2020-09-29 21:25:24 +02:00
Mike Fährmann	430b6d6e2e	[twitter] extend 'retweets' option (closes #1026 ) Setting 'retweets' to '"original"' will use metadata from the original retweeted Tweets, and not from the Retweet entry.	2020-09-28 23:03:35 +02:00
Mike Fährmann	b9bdd2c564	[hentaifoundry] add support for stories (closes #734 )	2020-09-27 02:27:40 +02:00
Mike Fährmann	9a9d1924d8	[hentaicafe] add 'manga_id' metadata field (closes #1036 ) This field is only available when using a non-foolslide URL like '/hc.fyi/9874' or '/hazuki-yuuto-summer-blues/'	2020-09-26 14:34:48 +02:00
Mike Fährmann	cc4ac80302	[weasyl] add 'favorite' extractor (#1032 )	2020-09-26 13:09:03 +02:00
Mike Fährmann	e9cc719497	[weasyl] update and simplify - simplify 'pattern' regexps - parse 'posted_at' as 'date' - use unaltered 'title' ({title!l:R /_/} to lowercase and replace spaces)	2020-09-26 02:10:45 +02:00
Mike Fährmann	6514312126	[nijie] add 'include' option (closes #1018 )	2020-09-25 18:18:35 +02:00
Mike Fährmann	0d43456323	[hentaifoundry] add 'include' option	2020-09-25 18:18:03 +02:00
Zanny	ebb7737b9b	Weasyl Extractor (#977 ) * weasyl extractor * @kattjevfel suggested changes * @mikf changes	2020-09-25 15:18:21 +02:00
Mike Fährmann	aeb0d32333	[twitter] improve twitpic extraction (fixes #1019 ) - ignore twitpic.com/photos/… URLs - ignore empty image URLs	2020-09-22 22:22:35 +02:00
Mike Fährmann	7cd383c0f9	update extractor test results	2020-09-20 21:54:39 +02:00
Mike Fährmann	1e313d5b84	implement 'sleep-request' option	2020-09-20 20:28:17 +02:00
Mike Fährmann	c43b3894be	[myhentaigallery] update and fix extraction (#1001 ) - extract more metadata - match "/show/" URLs - complete test results - fix missing images for lines starting with " <img" - fix missing comma in supportedsites.py	2020-09-17 18:14:23 +02:00
choeronline	05b9ac8d37	[myhentaigallery] add extractor (#1001 ) * adds support for myhentaigallery * fixes linting issues in myhentaigallery extractor	2020-09-17 17:32:54 +02:00
Mike Fährmann	2626629117	[danbooru] handle posts without 'id' (fixes #1004 )	2020-09-16 21:35:27 +02:00
Mike Fährmann	cc1fb0b4ea	[500px] update query hash	2020-09-16 01:26:31 +02:00
Mike Fährmann	da87a5fb7e	[exhentai] fix accessing config before main constructor bug introduced with `055c32e0` Making 'Extractor.config()' quite a bit faster is worth the "cost" of having to set _cfgpath in exhentai constructors, I think.	2020-09-15 18:09:50 +02:00
Mike Fährmann	f5b7ae01c1	update extractor test results	2020-09-15 18:07:08 +02:00
Mike Fährmann	136df52d1f	[deviantart] support watchers-only/paid deviations (#995 )	2020-09-15 16:03:46 +02:00
Mike Fährmann	055c32e0f7	precompute extractor config paths	2020-09-14 22:06:54 +02:00
Mike Fährmann	231dd4c800	accumulate postprocessor objects (#994 ) Instead of one 'postprocessors' setting overwriting all others lower in the hierarchy, all postprocessors along the config path will now get collected into one big list. For example '--mtime-from-date' will therefore no longer cause other postprocessor settings in a config file to get ignored.	2020-09-14 21:51:55 +02:00
Mike Fährmann	3108e85b89	[worldthree] remove extractors http://www.slide.world-three.org/ hasn't been accessible for a long time.	2020-09-11 18:12:57 +02:00
Mike Fährmann	8fed3eb8cb	[jaiminisbox] remove extractors https://jaiminisbox.com/post.html	2020-09-11 18:09:35 +02:00
Mike Fährmann	dcf3ad7eef	[furaffinity] update download URL extraction (fixes #988 ) support the new 'd2.facdn.net' subdomain	2020-09-11 13:23:57 +02:00
Mike Fährmann	3918b69677	remove 'extractor.blacklist' context manager	2020-09-11 13:17:35 +02:00
Mike Fährmann	2b8d57f0ab	[twitter] support '/intent/user?user_id=…' URLs (#980 )	2020-09-08 23:17:50 +02:00
Mike Fährmann	a3b473bd2f	[twitter] support specifying users by ID (#980 ) by using 'id:…' as their screen name, i.e. https://www.twitter.com/id:2976459548/media instead of https://twitter.com/supernaturepics/media The user ID can, for example, be obtained from the output of $ gallery-dl -j --range 1 https://twitter.com/<screen-name>	2020-09-08 22:56:52 +02:00
Mike Fährmann	a0d916ed41	[exhentai] update wait time before original image download (#978 ) depend on 'wait-max', don't use a hard-coded value	2020-09-07 23:48:28 +02:00
Mike Fährmann	f6fd449b59	reduce wait time growth rate from exponential to linear Waiting for 2**N seconds after each error grows too fast. Simply waiting N seconds seems far more reasonable.	2020-09-06 22:38:25 +02:00
Mike Fährmann	bc48514d84	[aryion] get post ID via gallery-item (fixes #981 , closes #982 ) this even works when fetching post IDs from '/latest.php?id='	2020-09-06 22:17:23 +02:00
Mike Fährmann	799ca07fc8	[imgur] update - fix image/album detection for galleries - use new API endpoints for image/album data	2020-09-06 21:11:32 +02:00
Mike Fährmann	7876a03ece	[tumblr] create directories for each post (fixes #965 ) This changes the identifiers for directory format string fields. Everything blog related is now inside a 'blog' object and not at the "base level" anymore. E.g. '{name}' for directories is now '{blog[name]}' (or '{blog_name}', since that is also available)	2020-08-31 21:58:20 +02:00
Mike Fährmann	d50f3b333a	update extractor test results	2020-08-30 20:55:22 +02:00
Mike Fährmann	0f55b8e80a	[exhentai] fix type check from `dbbbb21` (#940 ) 'bool' is a subclass of 'int', and therefore 'isinstance(self.limits, int)' also returns True when 'self.limits' has a boolean value	2020-08-30 20:51:22 +02:00
Mike Fährmann	e33293fdd8	[hentaihand] update to new site layout	2020-08-30 00:41:03 +02:00
Mike Fährmann	fda9e296dd	[gelbooru] fix extraction without API	2020-08-28 22:33:37 +02:00
Mike Fährmann	69e4871005	update extractor test results - sensescans: replace 404d chapters - mangapark: replace 404d chapters - subscribestar: update test for attached files	2020-08-28 22:32:32 +02:00
Mike Fährmann	ab1af66a97	[imgur] add 'search' extractor (#934 )	2020-08-27 22:46:17 +02:00
Mike Fährmann	e4bbc1fb5c	[imgur] add 'tag' extractor (#934 )	2020-08-27 22:46:17 +02:00
Mike Fährmann	deaacc70bb	[hitomi] update URL pattern for tag searches	2020-08-27 22:46:03 +02:00
ArtaxIsSleeping	0e941553ec	[aryion] Add username/password support (#960 ) * Add username/password support to aryion extractor * Update docs to match * Fix code style	2020-08-27 22:45:30 +02:00
Mike Fährmann	84e04cc23b	[500px] fix extraction and update URL patterns (fixes #956 ) - rewrite most API calls to GraphQL queries - match '500px.com/p/<user>' URLs	2020-08-24 18:25:31 +02:00
Mike Fährmann	d4ff767291	[reddit] improve gallery extraction (fixes #955 )	2020-08-23 22:06:06 +02:00
Mike Fährmann	7140fe7e6d	[hitomi] fix redirect processing	2020-08-23 15:18:44 +02:00
Mike Fährmann	a57b6b3c3a	[reddit] handle deleted galleries (fixes #953 )	2020-08-20 20:14:07 +02:00
Mike Fährmann	063c71cd84	[furaffinity] add 'search' extractor (closes #915 )	2020-08-18 21:26:46 +02:00
Mike Fährmann	dbbbb21180	[exhentai] add ability to specify custom image limit (#940 )	2020-08-17 22:29:20 +02:00
Mike Fährmann	b2009ea39e	[aryion] update folder mime type list (fixes #945 )	2020-08-16 22:30:15 +02:00
Mike Fährmann	d06ad148c7	[shopify] use alternate regex for products on collection pages when the first one doesn't yield any results	2020-08-15 18:24:14 +02:00
Mike Fährmann	7619152988	[reactor] sort 'tags' to ensure a consistent order for test results	2020-08-15 18:22:31 +02:00
Mike Fährmann	cd9de613a2	[exhentai] adjust image limit costs (#940 ) Each original file costs 10 points per 10^6 bytes, not 10 per 2^20 == 1048576 bytes.	2020-08-15 18:19:33 +02:00
Mike Fährmann	2e6f6ee1c1	[mangoxo] fix login	2020-08-13 22:30:37 +02:00
Mike Fährmann	a6a080656c	[pixnet] detect password-protected albums (#177 )	2020-08-08 20:48:47 +02:00
Mike Fährmann	67ac6667af	[mangareader] fix extraction	2020-08-07 22:30:10 +02:00
Mike Fährmann	2b88c90f6f	[blogger] add search extractor (#925 )	2020-08-06 19:43:39 +02:00
Mike Fährmann	d5067c51c5	[instagram] support '/reel/' URLs	2020-08-06 19:20:25 +02:00
Mike Fährmann	2c9766b29f	fix UnboundLocalError in Extractor.request() introduced in `d6a271d`	2020-08-05 21:52:04 +02:00
Mike Fährmann	aa64149583	[blogger] support searching posts by labels (closes #925 )	2020-08-04 22:49:37 +02:00
Mike Fährmann	60ba3cb946	[reddit] support gallery posts (closes #920 )	2020-08-03 22:06:15 +02:00
Mike Fährmann	0d84d3af55	[subscribestar] extract attached media files (#852 )	2020-08-03 22:02:42 +02:00
Mike Fährmann	19bf76bcf8	update extractor test results	2020-08-03 21:57:00 +02:00
Mike Fährmann	0762d6b29c	[inkbunny] add 'num' field (#283 )	2020-07-30 19:26:09 +02:00
Mike Fährmann	fbc4278fe4	[instagram] wait before GraphQL requests (#901 )	2020-07-30 19:26:09 +02:00
Mike Fährmann	ec5870576d	[imgur] handle 403 overcapacity responses (closes #910 )	2020-07-30 19:26:01 +02:00
Mike Fährmann	d6a271d2c7	add 'response' objects to 'HttpError's	2020-07-30 18:23:26 +02:00
Mike Fährmann	72c5578a27	[hentainexus] improve/simplify code	2020-07-30 00:35:49 +02:00
Mike Fährmann	627d2141d3	[xhamster] fix extraction (closes #917 )	2020-07-29 22:51:34 +02:00
Mike Fährmann	27e31f4a16	[myportfolio] raise 'NotFoundError' for deleted posts	2020-07-27 16:15:24 +02:00
Mike Fährmann	f317a57c5e	[simplyhentai] fix 'gallery_id' extraction	2020-07-27 16:14:06 +02:00
Mike Fährmann	daeef8a5e3	[vsco] handle missing 'description' fields	2020-07-27 14:45:17 +02:00
Mike Fährmann	26a967cbd4	[pinterest] match 'pinterest.co.uk' URLs (fixes #914 )	2020-07-27 14:41:34 +02:00
Mike Fährmann	c5aaa1de77	[inkbunny] simplify metadata structure (#283 ) Just put everything at the top level, instead of having a separate 'post' object.	2020-07-26 23:43:50 +02:00
Mike Fährmann	b921fee24d	[inkbunny] fix submission order (#283 ) Getting detailed submission info via /api_submissions.php reordered the input submissions and sorted them by ID. InkbunnyAPI.detail() now sorts them back and ensures they are returned in their original order. This commit also removes the 'metadata' option and always requests submission descriptions.	2020-07-26 23:12:45 +02:00
Mike Fährmann	e50c75628c	[subscribestar] update 'date' parsing	2020-07-24 22:27:36 +02:00
Mike Fährmann	c4ed9f4faa	[inkbunny] add 'metadata' option (#283 )	2020-07-24 18:05:53 +02:00
Mike Fährmann	493cadb1e7	[inkbunny] add 'orderby' option (#283 )	2020-07-24 17:50:32 +02:00
Mike Fährmann	336e682a7a	[inkbunny] handle gallery/scraps URLs (#283 )	2020-07-24 17:05:00 +02:00
Mike Fährmann	8dbf827649	[bobx] remove module	2020-07-24 17:00:43 +02:00
Mike Fährmann	8f64585ff2	[twitter] handle 429 responses without x-rate-limit-reset header	2020-07-23 22:38:17 +02:00
Mike Fährmann	d2e17e16bf	[inkbunny] update tests (#283 )	2020-07-23 22:37:05 +02:00
Mike Fährmann	57f7d9b790	[inkbunny] improve error handling (#283 )	2020-07-23 22:31:22 +02:00
Mike Fährmann	baf5d0e3c1	[gfycat] skip malformed gfycat responses (closes #902 )	2020-07-22 23:59:56 +02:00
Mike Fährmann	453f3bc519	[blogger] improve error messages for missing posts/blogs (#903 )	2020-07-22 23:51:48 +02:00
Mike Fährmann	87202b8d74	[inkbunny] add 'user' and 'post' extractors (#283 )	2020-07-22 22:21:30 +02:00
Mike Fährmann	2ecf1efb16	update extractor test results - tumblr: remove deleted post - jaiminisbox: replace removed manga/chapters - smugmug: one inconsequential field got removed	2020-07-18 15:12:28 +02:00
Mike Fährmann	d5fcffcced	[subscribestar] add login capabilities (#852 )	2020-07-17 22:18:01 +02:00
Mike Fährmann	ecaecc4064	[exhentai] add 'domain' option (#897 )	2020-07-17 22:17:46 +02:00
Mike Fährmann	45c32213dc	[gfycat] retry 404'ed videos on redgifs (closes #874 )	2020-07-16 15:00:32 +02:00
Mike Fährmann	cf44571fe0	[gfycat] add 'user' and 'search' extractors	2020-07-16 15:00:32 +02:00
Mike Fährmann	11b744d971	[mangakakalot] improve/fix chapter extraction	2020-07-16 15:00:31 +02:00
Mike Fährmann	2da71cb561	[twitter] raise proper exception if user doesn't exist (#891 )	2020-07-16 15:00:31 +02:00
Leonardo Taccari	86e5a05e29	[twitter] add support for nitter.net URLs in pattern (#890 ) Please note that URLs are only "translated", all requests are still done always via the Twitter API.	2020-07-13 23:48:42 +02:00
Mike Fährmann	e17d4f44f6	[newgrounds] fix favorites extraction	2020-07-13 23:08:45 +02:00
Mike Fährmann	c51fbd72ba	update extractor test results	2020-07-13 22:57:48 +02:00
Mike Fährmann	9cd1bc6907	[mangakakalot] update URL patterns, fix flake8 errors (#876 )	2020-07-13 22:47:24 +02:00
jakem72360	7dfdcc3fbf	[mangakakalot] Added extractors for MangaKakalot (#876 )	2020-07-13 21:20:09 +02:00
Mike Fährmann	cb0132e441	[khinsider] add 'format' option (closes #840 )	2020-07-13 17:17:58 +02:00
Mike Fährmann	d594977ca1	[artstation] add 'following' extractor (closes #888 )	2020-07-12 23:03:05 +02:00
Mike Fährmann	3855d0dd3c	[twitter] add debug messages for all skipped Tweets (#867 )	2020-07-11 00:41:50 +02:00
Mike Fährmann	27d163afb3	[imgur] support all '/t/...' URLs (closes #880 ) … instead of just '/t/unmuted/'	2020-07-09 22:17:01 +02:00
Mike Fährmann	f5c9f1d066	[subscribestar] use current date instead of hard-coded '2020' (#852 )	2020-07-09 22:12:39 +02:00
Mike Fährmann	5a6e750704	[reddit] fix AttributeError when using 'recursion' (fixes #879 )	2020-07-09 19:19:05 +02:00
Mike Fährmann	94a08f0bcb	[reddit] limit title length in default filenames (#873 )	2020-07-09 18:19:33 +02:00
Mike Fährmann	3424fb96c3	[redgifs] support gifsdeliverynetwork.com URLs (#874 )	2020-07-09 18:04:30 +02:00
Mike Fährmann	f1344fe552	[patreon] yield images and attachments before postfiles (#871 ) The reported filename of the 'postfile' entry of each post may differ from the corresponding entry in the list of images or attachments, and be outright "wrong".	2020-07-09 00:10:26 +02:00
Mike Fährmann	6e2af9a8d8	[twitter] improve error message formatting	2020-07-06 23:13:05 +02:00
Mike Fährmann	c28db7a6ea	[8muses] support 'comics.8muses.com' URLs	2020-07-05 19:43:45 +02:00
Mike Fährmann	d5bfb0b38c	set pseudo extension for Metadata messages (#865 ) This prevents pathfmt.filename from potentially being empty.	2020-07-04 22:14:39 +02:00
Mike Fährmann	821524e4ee	[subscribestar] add 'user' and 'post' extractors (#852 )	2020-07-03 21:08:47 +02:00
Mike Fährmann	e62ebb4643	update CHANGELOG before building sdist and wheel packages	2020-06-27 19:45:09 +02:00
Mike Fährmann	f1ddbff0b5	[aryion] add 'recursive' option (fixes #832 ) This is enabled by default and will recursively go through all (sub)folders in an artist's gallery. The old method of using "Latest Updates" lists can be restored by disabling this option.	2020-06-26 23:36:50 +02:00
Mike Fährmann	699062b91f	Revert "[kissmanga] workaround for CAPTCHAs (#818 )" This reverts commit `4cf3d54718`.	2020-06-25 19:35:03 +02:00
Mike Fährmann	0cac14c3bd	update extractor test results	2020-06-25 19:11:47 +02:00
Mike Fährmann	5e5be67c26	[tumblr] prevent KeyErrors when using reblogs=same-blog (fixes #851)	2020-06-25 19:00:12 +02:00
Mike Fährmann	9da2bc67f8	[twitter] add option to filter media from quoted tweets (#854 )	2020-06-25 18:59:25 +02:00
Mike Fährmann	56ab5fb8f4	[twitter] improve handling of quoted tweets (#854 ) Split each "quote" into two parts: - the original tweet - the tweet that quoted the original	2020-06-24 21:14:18 +02:00
Mike Fährmann	bd0e1ca1a5	[imgur] build directory path for each file (closes #842 )	2020-06-21 19:25:52 +02:00
Mike Fährmann	a8c2d997e8	[twitter] treat quoted tweets like retweets (#833 ) - filter them when 'retweets' is disabled - set 'author' to the creator of the quoted tweet like it was before the rewrite	2020-06-21 19:14:12 +02:00
Mike Fährmann	aed1c63e51	[twitter] improve search results (fixes #847 ) Adding 'tweet_search_mode=live' to the query parameters is the most important part here.	2020-06-21 15:53:20 +02:00
Mike Fährmann	0e714b9a0e	[pinterest] add 'section' extractor (#835 )	2020-06-21 00:08:14 +02:00
Mike Fährmann	53cc498d9c	improve config lookup when there are multiple possible locations This specifically applies to all Mastodon extractors and all extractors with a 'basecategory', i.e. 'booru', 'foolslide', etc. Values inside those general config locations wouldn't be recognized when a value with the same name was set on the 'extractor' level. For example 'extractor.mastodon.directory' should be used over 'extractor.directory' when both are set, but this was impossible with the previous implementation. (fixes #843)	2020-06-21 00:07:10 +02:00
Mike Fährmann	d81a8e6544	[twitter] update tests	2020-06-19 23:01:02 +02:00
Mike Fährmann	d39eedd9bb	[twitter] improve handling of deleted tweets (fixes #838 )	2020-06-19 18:11:37 +02:00
Mike Fährmann	1ae1df0d27	update '--write-pages' (#737 ) - fix infinite recursion for responses with multiple entries in 'history' - hide values of Set-Cookie headers - only write the response content by default (use '-o write-pages=all' to also include HTTP headers)	2020-06-18 15:07:30 +02:00
Mike Fährmann	dc16f73965	[twitter] move '_guest_token()' into TwitterAPI class	2020-06-18 15:02:51 +02:00
Mike Fährmann	3561d1020a	[twitter] always provide an 'author' field (#831 , #833 ) The idea was to have less metadata clutter for most Tweets where 'author' and 'user' are the same (non-retweets), and only provide a 'user' field. The original Tweet author could be gotten with {author[…]|user[…]}, but basically no one knows about that.	2020-06-18 15:02:51 +02:00
Mike Fährmann	7158bdd7c7	[weibo] improve extractor logic (#829 )	2020-06-18 15:00:31 +02:00
Mike Fährmann	0371fd54a1	[artstation] add 'date' metadata field (#839 )	2020-06-17 20:22:18 +02:00
Mike Fährmann	8c857052d7	[mastodon] ignore toots without media attachments	2020-06-17 20:21:28 +02:00
Mike Fährmann	de045d39b2	[mastodon] add 'date' metadata field (#839 )	2020-06-17 19:22:28 +02:00
Mike Fährmann	d5d90a0450	[weibo] add 'date' field to 'status' objects (#829 )	2020-06-16 14:46:46 +02:00
Mike Fährmann	5ba90f72ca	[pinterest] add support for sections (closes #835 )	2020-06-16 14:41:05 +02:00
Mike Fährmann	c37a1c06c8	[twitter] add extractor for liked tweets (closes #837 ) You need to be logged in to get access to anyone's liked tweets, it seems.	2020-06-16 14:27:22 +02:00
Mike Fährmann	b94394104c	[twitter] don't download video previews (#833 ) when 'videos' is set to False	2020-06-16 14:10:51 +02:00
Mike Fährmann	bb882b8cdb	improve output of '-K' for parent extractors (#825 )	2020-06-14 21:39:21 +02:00
Mike Fährmann	4cf3d54718	[kissmanga] workaround for CAPTCHAs (fixes #818 ) Requesting the same page again when being redirected to a CAPTCHA lets us access that page without solving it.	2020-06-12 00:41:49 +02:00
Mike Fährmann	7daef6ee70	update extractor test results - certain posts on Instagram now return https://static.cdninstagram.com/rsrc.php/null.jpg for public users - MangaDex is deploying its new MangaDex@Home network similar to exhentai's Hentai@Home - realbooru has a new site layout, but the underlying booru API still works like before	2020-06-12 00:36:06 +02:00
Mike Fährmann	ffb6c5277a	[furaffinity] add 'artist_url' metadata field (closes #821 )	2020-06-11 18:36:24 +02:00
Mike Fährmann	be04e44e2c	[reddit] catch JSON decode errors (#765 )	2020-06-11 18:32:52 +02:00
Mike Fährmann	cf863f60b3	[redgifs] add 'user' and 'search' extractors (closes #724 )	2020-06-10 22:03:52 +02:00
Mike Fährmann	998d1d3a5c	[webtoons] generalize and improve comic extraction (fixes #820 )	2020-06-10 21:44:42 +02:00
Mike Fährmann	036a40943a	[twitter] don't cache results of 'user_by_screen_name()' A 'keyarg=1' argument to the memcache decorator would have worked as well, but keeping the user object in memory isn't useful for the vast majority of use cases and only wastes space. (closes #817)	2020-06-10 20:58:42 +02:00
Mike Fährmann	4442dfe7b8	[twitter] add 'reply_to' metadata to replies	2020-06-09 21:48:04 +02:00
Mike Fährmann	83b7bd0413	[nhentai] fix extraction (closes #819 )	2020-06-09 21:27:07 +02:00
Mike Fährmann	d769bb4b80	[twitter] improve pagination	2020-06-07 15:23:45 +02:00
Mike Fährmann	5bc1097f9d	[twitter] metadata cleanup #2 - remove useless clutter by creating new tweet-data dicts instead of reusing the original Tweet objects - rename fields to how they were named before ('id_str' -> 'tweet_id', etc.) - only include 'author' if it would differ from 'user' - restore 'archive_fmt'	2020-06-07 02:25:29 +02:00
Mike Fährmann	c6c06c41f6	[deviantart] don't add journal text to description (#712 )	2020-06-05 21:56:12 +02:00
Mike Fährmann	4aea5138dd	[sensescans] use https://	2020-06-05 21:55:19 +02:00
Mike Fährmann	3eed5f52d7	[twitter] small metadata cleanup - add 'date' field - remove 'entities' and 'extended_entities' - don't include 'focus_fields' from 'original_info'	2020-06-04 18:21:54 +02:00
Mike Fährmann	655c98cbef	[twitter] skip unavailable tweets	2020-06-04 14:51:25 +02:00
Mike Fährmann	41d03160ff	[deviantart] also search journals for sta.sh links (#712 ) when 'extra' is enabled	2020-06-04 14:47:08 +02:00
Mike Fährmann	2132e5461a	[twitter] restore TwitPic support	2020-06-04 01:22:34 +02:00
Mike Fährmann	bd0f21478a	[twitter] login using the mobile nojs login page	2020-06-04 00:07:12 +02:00
Mike Fährmann	a10f31dde5	[twitter] rewrite; use new interface (#740 , #806 ) Everything except logging in with username & password and TwitPic embeds should be working again. Metadata per Tweet is massively different than before (mostly raw API responses - might need some cleaning up) and the default 'archive_fmt' changed.	2020-06-03 20:51:29 +02:00
Mike Fährmann	3bad1579ee	update extractor test results	2020-05-31 17:42:07 +02:00
Mike Fährmann	864f4220d9	update output of 'oauth:…' (#616 )	2020-05-31 17:41:40 +02:00
Mike Fährmann	0f459f340b	[instagram] fix and re-enable login with username&password This reverts commit `3e0848a482`. (#756, #771, #797, #803) https://github.com/althonos/InsaLooter/issues/287#issuecomment-630456522	2020-05-31 00:29:09 +02:00
Mike Fährmann	3e0848a482	[instagram] disable login with username&password (#756 )	2020-05-29 23:29:40 +02:00
Mike Fährmann	a32aea41e1	[instagram] update 'query_hash' values	2020-05-29 23:11:42 +02:00
Mike Fährmann	2bff8dd465	[hentainexus] fix flake8 issues (#787 )	2020-05-28 22:45:08 +02:00
Mike Fährmann	a63682a9c0	[instagram] simplify code & complete tests (#743 )	2020-05-28 22:31:01 +02:00
墨焓	a4e3d40672	hentainexus.py minor fix (#787 ) * rectify code of `join_title`, some minor fix. * + hentainexus self.data * fixed: call staticmethod join_title with data	2020-05-28 21:59:26 +02:00
Vrihub	62b65e59d0	Add instagram metadata: post_pageurl, post_tags (#743 ) * Add instagram metadata: post_pageurl, post_tags Add the following metadata for instagram: - post_pageurl: json string with url of the post page - post_tags: json array with instagram tags extracted from the post description * Oops: rename post_tags to tags for --write-tags This way, --write-tags will pick up the post tags. * Rename to post_url, improve regex * Add post_url and tags to tests * Remove duplicate tags and sort them * Bugfix: don't create empty tag lists * Metadata: add location * Metadata: add tagged_users for each media * Move self._find_tags() to base class * Make flake happy	2020-05-28 21:58:24 +02:00
Mike Fährmann	275cceeb6a	[redgifs] fix extraction (#724 ) … and prepare for more potential extractors	2020-05-28 02:18:42 +02:00
Mike Fährmann	45baa13615	update extractor test results - don't run Instagram tests on Travis anymore - replace Twitter test because timeline was made private - update Hiperdex domain to '.com' (again ...)	2020-05-28 02:18:06 +02:00
Mike Fährmann	dfcf2a2c91	write OAuth token to cache by default (#616 )	2020-05-25 22:35:45 +02:00
Mike Fährmann	15c3d29062	move dump_response() into a separate function (#737 )	2020-05-25 22:21:58 +02:00
Mike Fährmann	a363da4b43	include redirects and headers in --write-pages dumps (#737 )	2020-05-25 22:21:57 +02:00
Mike Fährmann	6bcdb264e0	[imgur] treat 't/unmuted' URLs as galleries	2020-05-25 22:21:57 +02:00
Mike Fährmann	b6cee3e45b	[imgur] fix extraction of animated images without 'mp4' entry	2020-05-25 22:21:57 +02:00
Leonardo Taccari	bcac31b7c7	[webtoons] make archive_fmt unique (#779 ) close #778	2020-05-25 21:23:54 +02:00
Mike Fährmann	e19f665a44	[danbooru] change default for 'ugoira' to 'false' Downloading the pre-rendered versions should be a better default than .zip files with individual frames.	2020-05-20 19:57:28 +02:00
Mike Fährmann	3201fe3521	add global SENTINEL object	2020-05-19 22:32:53 +02:00
Mike Fährmann	c8787647ed	add global WINDOWS bool	2020-05-19 22:32:53 +02:00
Mike Fährmann	6294e2c540	add 'text.ensure_http_scheme()'	2020-05-19 22:32:53 +02:00
Mike Fährmann	0378d079a5	[webtoons] fixes and simplifications (#593 , #761 ) - fix episode listings for french comics - allow input URLs without explicit scheme - add 'lang'/'language' metadata - use str.format() instead of '+' to assemble URLs	2020-05-18 20:20:03 +02:00
Mike Fährmann	ab11b1c896	[imagechest] simplify code (#750 )	2020-05-18 19:11:26 +02:00
Mike Fährmann	846d3a2466	[sexcom] replace 404ed test	2020-05-18 19:04:51 +02:00
Mike Fährmann	9b4635917f	[gelbooru] simplify and fix pool extraction use 'pool:<pool id>' as search tag to get pool posts	2020-05-18 19:04:51 +02:00
Leonardo Taccari	39cd389679	[webtoons] Add a new extractor for webtoons.com (#761 ) The webtoons extractor can extract episode and entire comic (all episodes) from webtoons.com. All the logic of the extractors should be trivial except for a couple of kludges needed: - `ageGatePass' cookie is always set to avoid possible redirect and stop of extraction, especially in the comic extractor - The image URLs returned by the episode extractor could not be fetched directly and the `Referer:' HTTP header needs to be passed to fetch them Close #593.	2020-05-18 19:04:20 +02:00
Bepis	7b5711ee04	[imagechest] Add new extractor for ImageChest (#750 ) * [imagechest] Add new extractor for ImageChest * [imagechest] Fix flake8 compliance issues	2020-05-18 19:02:56 +02:00
Mike Fährmann	a1e739b96c	reuse connection adapters from parent extractors	2020-05-12 23:52:01 +02:00
Mike Fährmann	f8f95e68a7	improve '--write-pages' (#737 ) - move code into its own function - add enumeration index to filenames - dump responses regardless of status code	2020-05-12 20:40:25 +02:00
Mike Fährmann	09cc9dbec0	prevent flake8 errors from comments looking like type annotations	2020-05-12 20:08:05 +02:00
Mike Fährmann	2d6724180b	[hiperdex] update domain to hiperdex.info	2020-05-12 17:00:51 +02:00
Vrihub	4cc761c730	Implement --write-pages option (#736 ) * Implement --write-pages option * Fix long lines * Fix file mode to binary * Fix pattern for Windows compatibility	2020-05-12 14:25:21 +02:00
Mike Fährmann	f557cac074	[redgifs] add image extractor (#724 )	2020-05-10 00:31:42 +02:00
Mike Fährmann	65b1cb7acd	[deviantart] use private access tokens for Journals (fixes #738 )	2020-05-08 21:45:01 +02:00
Mike Fährmann	0bf0146bfe	[reddit] don't send OAuth headers for file downloads (fixes #729 )	2020-05-08 21:42:52 +02:00
Mike Fährmann	d6a480682f	update test results	2020-05-02 21:13:00 +02:00
Leonardo Taccari	b47cfc5ac9	[speakerdeck] Add a new extractor for speakerdeck.com (#726 )	2020-05-01 22:32:22 +02:00
Mike Fährmann	90491ab606	[artstation] improve embed extraction (#720 )	2020-04-30 21:25:03 +02:00
Mike Fährmann	999efec5cc	[deviantart] limit API wait times to 2**9=512 seconds (#721 )	2020-04-30 21:16:09 +02:00
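
Commits f6fd449b59 and 999efec5cc both deal with how long to wait after repeated errors: the first replaces 2**N seconds with N seconds, the second caps an exponential wait at 2**9 = 512 seconds. The following is a minimal sketch of those three schedules for comparison. The function names and the `max_exponent` parameter are hypothetical and are not taken from gallery-dl's source.

```python
def wait_exponential(attempt):
    """Wait 2**N seconds after the N-th error (the growth rate f6fd449b59 moved away from)."""
    return 2 ** attempt


def wait_linear(attempt):
    """Wait N seconds after the N-th error (the replacement described in f6fd449b59)."""
    return attempt


def wait_capped(attempt, max_exponent=9):
    """Exponential wait that never exceeds 2**max_exponent seconds (cf. 999efec5cc)."""
    return 2 ** min(attempt, max_exponent)


# Side-by-side comparison for the first few errors (hypothetical driver code).
for n in range(1, 6):
    print(n, wait_exponential(n), wait_linear(n), wait_capped(n))
```

After five consecutive errors the exponential schedule is already at 32 seconds while the linear one is at 5, which illustrates the "grows too fast" remark in the commit message.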

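Commit 231dd4c800 describes collecting every 'postprocessors' list found along a config path instead of letting one setting overwrite the others. A rough sketch of that idea, assuming a nested-dict config, might look like the code below. The `accumulate` helper, the sample config, the entry names, and the general-to-specific ordering are all assumptions made for illustration, not gallery-dl's actual implementation.

```python
def accumulate(config, path, key="postprocessors"):
    """Collect the lists stored under `key` at every level along `path`."""
    collected = list(config.get(key, ()))
    node = config
    for name in path:
        node = node.get(name)
        if not isinstance(node, dict):
            break  # the path ends here; keep what was collected so far
        collected.extend(node.get(key, ()))
    return collected


# Hypothetical config: one general entry and one site-specific entry.
config = {
    "extractor": {
        "postprocessors": [{"name": "mtime"}],
        "twitter": {"postprocessors": [{"name": "metadata"}]},
    }
}

names = [pp["name"] for pp in accumulate(config, ("extractor", "twitter"))]
print(names)
```

With an overwrite-style lookup only the most specific list would survive; here both entries end up in the result.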

2051 Commits
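
The type-check pitfall noted in commit 0f55b8e80a ('bool' is a subclass of 'int') can be reproduced in a few lines of Python. The `classify_limits` helper below is hypothetical and only illustrates one way to tell a boolean setting from a numeric one; it is not the code used in the exhentai extractor.

```python
def classify_limits(value):
    """Distinguish a boolean flag from a numeric limit.

    bool has to be tested before int, because isinstance(True, int) is True.
    """
    if isinstance(value, bool):
        return "flag"
    if isinstance(value, int):
        return "number"
    return "other"


print(isinstance(True, int))   # True, since bool subclasses int
print(classify_limits(True))   # flag
print(classify_limits(5000))   # number
```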