gallery-dl

mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-23 03:02:50 +01:00

Author	SHA1	Message	Date
Mike Fährmann	b06c372e4d	[postprocessor:exec] improve; add command-line option (#421 )	2019-10-05 23:46:55 +02:00
Mike Fährmann	5a54efa025	[xhamster] unescape 'title' and 'description'	2019-10-04 14:44:51 +02:00
Mike Fährmann	1b9bf4fc6e	[behance] fix 'tags' extraction	2019-10-03 17:36:02 +02:00
Mike Fährmann	bb97e87989	[komikcast] ignore banner image	2019-10-03 17:34:06 +02:00
Mike Fährmann	0ff90a3f7d	[gfycat] include title in default filenames (closes #434 )	2019-10-02 21:46:01 +02:00
Mike Fährmann	fabdc3b0c6	release version 1.10.5	2019-09-28 22:13:41 +02:00
Mike Fährmann	de4e2029d1	[nsfwalbum] update test album the old one is no longer available	2019-09-28 20:48:15 +02:00
Mike Fährmann	1faec285d1	[nijie] further improvements (closes #423 ) - provide a 'user_name' metadata field - usually the same as 'artist_id', except for favorite downloads - extract the whole description text and properly escape HTML entities - fixed an issue with titles or tags containing double quotes	2019-09-27 23:14:32 +02:00
Mike Fährmann	6d0a533d68	[reddit] respect 'comments:0' for single submissions (#429 )	2019-09-27 23:11:28 +02:00
Mike Fährmann	803d8f814e	[oauth] update scope for reddit tokens (#428 ) '/user/<username>/...' requires the 'history' scope to be accessible (https://www.reddit.com/dev/api/#GET_user_{username}_{where})	2019-09-27 17:38:55 +02:00
Mike Fährmann	46ba173ded	[reddit] fix documentation inconsistencies (closes #429 ) - Require 'reddit.comments' to be a number and convert it to an integer to be extra sure - Link to the README's OAuth section where appropriate	2019-09-27 17:34:10 +02:00
Mike Fährmann	20eb6c401f	[nijie] improvements and fixes (#423 ) - ignore unavailable image pages - more metadata fields: artist_name, date, tags - rename 'index' to 'num' - improved code structure	2019-09-26 21:45:01 +02:00
Mike Fährmann	d1ea08c67d	[weibo] fixes and improvements - ignore unavailable videos (fixes #427) - handle empty 'geo' fields - consistent metadata fields for images and videos	2019-09-26 14:57:35 +02:00
Mike Fährmann	38d97f3da6	[deviantart] add debug message about API credentials (#424 )	2019-09-25 21:20:55 +02:00
Mike Fährmann	80c2104fb5	[deviantart] fix 429 handling if 'fatal' is False (closes #424 )	2019-09-25 21:16:35 +02:00
Mike Fährmann	913460240d	[reddit] fix 'extractor.blacklist()' arguments The second argument must support 'append()'.	2019-09-24 23:01:12 +02:00
Mike Fährmann	22bac14452	[pixiv] match '/artworks/' URLs	2019-09-24 21:53:14 +02:00
Mike Fährmann	66cac207ac	[twitter] match and use 'i/web' status URLs	2019-09-24 21:18:05 +02:00
Mike Fährmann	946f2751e2	[reddit] add 'user' extractor (closes #350 )	2019-09-22 22:18:17 +02:00
Mike Fährmann	c14abb9fb8	[reddit] improve URL parameter handling for subreddit links	2019-09-22 22:03:22 +02:00
Mike Fährmann	ee8b654464	[instagram] implement 'highlights' option (closes #329 )	2019-09-21 23:38:20 +02:00
Mike Fährmann	f63c3097a9	[instagram] rework some code paths - combine fetching an HTML page and extracting its 'shared_data' - move 'shared_data' and field access info out of '_extract_page()' - introduce a '_request_graphql()' method	2019-09-21 23:10:41 +02:00
Mike Fährmann	4330133114	[imgur] add 'favorite' extractor (closes #420 ) … and use a newer site-internal API endpoint for user posts	2019-09-19 15:54:26 +02:00
Mike Fährmann	ee5e20221f	[imgth] fix image URLs	2019-09-19 14:56:48 +02:00
Mike Fährmann	b63b126808	[hentaicafe] extend URL pattern	2019-09-18 19:08:45 +02:00
Mike Fährmann	d780f0357e	[imgur] add user extractor	2019-09-17 22:58:18 +02:00
Mike Fährmann	11ea689013	[simplyhentai] fix image and video URLs	2019-09-16 21:37:16 +02:00
Mike Fährmann	15632a1570	[tsumino] fix extraction	2019-09-15 22:09:59 +02:00
Mike Fährmann	d92802fd37	[luscious] fix detection of unavailable galleries	2019-09-15 21:16:25 +02:00
Mike Fährmann	f99da2b866	[imgbb] detect invalid album and user profile links and update test results, since the old album got deleted	2019-09-14 23:22:08 +02:00
Mike Fährmann	01bc7adadc	[deviantart] improve journal detection (#419 ) Some journal-like posts are not reported to be journals (isJournal is set to False), even though they have a textContent field. https://www.deviantart.com/gliitchlord/art/brashstrokes-812942668	2019-09-14 22:45:22 +02:00
Mike Fährmann	776e9e073f	close archive on job completion (#417 )	2019-09-10 22:43:51 +02:00
Mike Fährmann	5ac9732adc	call 'sys.exit()' on Ctrl+c	2019-09-10 16:53:21 +02:00
Mike Fährmann	9178b54eae	handle errors when opening download archive file (#417 )	2019-09-10 16:44:47 +02:00
Mike Fährmann	6e12907de6	[deviantart] improve handling of private deviations (#414 ) - don't try to call '/deviation/metadata' with an empty list of deviation ids - print a warning when detecting private deviations without having a 'refresh-token'	2019-09-10 16:09:03 +02:00
Mike Fährmann	4203931d79	release version 1.10.4	2019-09-08 13:54:45 +02:00
Mike Fährmann	e7690ac694	[vsco] update URL pattern (closes #410 )	2019-09-08 11:37:27 +02:00
Mike Fährmann	1848788970	update test results etc	2019-09-08 11:33:35 +02:00
Mike Fährmann	d5fbb2d9de	[tumblr] ignore audio links from Spotify etc.	2019-09-07 18:18:12 +02:00
Mike Fährmann	b1cddce865	Revert "[simplyhentai] fix extraction; remove image+video extractors" This reverts commit `d1db5180ab`.	2019-09-07 14:48:31 +02:00
Mike Fährmann	d23660c04d	[hentaicafe] restore default 'request()' behavior	2019-09-07 14:35:00 +02:00
Mike Fährmann	9ae58a6b3e	[exhentai] update image limit checks - adjust cost of original images - delay limit initialization until gallery and first image page have been requested and all cookies are available	2019-09-07 13:29:01 +02:00
Mike Fährmann	6fe9a134bf	[lineblog] add blog and post extractors (closes #404 )	2019-09-06 22:16:42 +02:00
Mike Fährmann	4e8a548a61	[livedoor] update metadata extraction	2019-09-06 21:44:25 +02:00
Mike Fährmann	f9285f99e6	[pixiv] fix authentication	2019-09-02 22:38:56 +02:00
Mike Fährmann	6f3df3999a	[fuskator] add gallery and search extractor (closes #407 )	2019-09-02 21:20:02 +02:00
Mike Fährmann	bc0ca66c99	[twitter] small improvements - handle reply tweets (#403) - unset cookies in Tweet extractor to "force" the legacy interface	2019-09-01 17:37:48 +02:00
Mike Fährmann	682105b8ee	prevent crash when loading unavailable downloader (#405 )	2019-08-31 21:58:33 +02:00
Mike Fährmann	5fcebb69c2	[postprocessor:ugoira] improve error messages (#406 )	2019-08-31 21:55:42 +02:00
Mike Fährmann	f02a768b5c	[danbooru] add 'ugoira' option (#406 ) to choose between ZIP archives or converted video files for Ugoira posts	2019-08-31 21:51:52 +02:00
Mike Fährmann	9646ccb320	release version 1.10.3	2019-08-30 19:41:16 +02:00
Mike Fährmann	dedea3b4db	[deviantart] fix journal creation (#400 )	2019-08-30 18:50:04 +02:00
Mike Fährmann	c6c5cb1898	improve 'deviantart.quality' description	2019-08-30 18:41:18 +02:00
Mike Fährmann	8eaae58045	[downloader:http] change log message level to 'debug'	2019-08-29 23:05:47 +02:00
Mike Fährmann	efb64ad031	[deviantart] generate filenames (#392 , #400 )	2019-08-29 10:09:21 +02:00
Mike Fährmann	0ce98169b8	improve path generation - fix 'abspath()' results for Python <3.7 (closes #402) - 'abspath()' in Python 3.7+ removes trailing path separators - in Python <3.7 it doesn't - filter empty path segments	2019-08-28 23:25:18 +02:00
Mike Fährmann	b2151f3928	[seiga] support mobile URLs (closes #401 )	2019-08-28 22:56:43 +02:00
Mike Fährmann	20fd2d8450	[flickr] skip unavailable images/videos (fixes #398 )	2019-08-27 23:26:49 +02:00
Mike Fährmann	60c8e090da	[postprocessor:zip] fix archive names (closes #397 ) Remove the trailing path separator introduced in `3284c62` before adding the archive's filename extension. [ci skip]	2019-08-24 23:14:26 +02:00
Mike Fährmann	7c09545f70	[downloader:ytdl] add 'outtmpl' option (#395 )	2019-08-24 22:47:59 +02:00
Mike Fährmann	5cc7be2536	[piczel] update and improve - use proper pagination (fixes #396) - update API host and endpoints - "fix" double slash // in image URLs	2019-08-24 20:37:33 +02:00
Mike Fährmann	0c1c7abb4d	release version 1.10.2	2019-08-23 22:10:54 +02:00
Mike Fährmann	49f6d7176d	[deviantart] restore filenames (#392 ) <title>_by_<user>_<id> --> <title>_by_<user>-<id>	2019-08-23 22:02:03 +02:00
Mike Fährmann	63daa68d67	[deviantart] improvements (#392 ) - consistent 'filename' entries, at least as far as possible - GIFs and SWFs don't have a <title>_by_<artist>_<id> anywhere in their metadata - Generating <id> (from 'deviationid'?) might be something that needs to be figured out, so we can build those filenames ourselves - better code structure etc. - tests for videos, archives, and flash animations	2019-08-23 12:27:19 +02:00
Mike Fährmann	d1db5180ab	[simplyhentai] fix extraction; remove image+video extractors	2019-08-22 23:56:41 +02:00
Mike Fährmann	30d6e284b0	[deviantart] use NAPI for artworks and scraps (#392 ) TODO: - journal downloads - test for all media types	2019-08-21 23:56:06 +02:00
Mike Fährmann	7d6af936c5	[imgur] simplify gallery extraction	2019-08-20 20:00:43 +02:00
Mike Fährmann	3284c62f22	ensure PathFormat.directory ends with a path separator ... plus some other small optimizations	2019-08-20 00:25:13 +02:00
Mike Fährmann	ebabc5caf1	[downloader:http] treat 416 without downloaded data as error Downloading https://pbs.twimg.com/media/EB2cGUYX4AI2Vuu.jpg:orig (NSFW) sometimes returns a 416 status code, even though no 'Range' header was sent and no data was downloaded prior. This code usually means a file has already been downloaded completely and the download method indicates success, but in this case it causes an exception down the pipeline since no file was created.	2019-08-20 00:15:17 +02:00
Mike Fährmann	2495b99347	[postprocessor:classify] improve path generation (fixes #138 ) It still doesn't work for converted ugoira animations thanks to how those files are handled, but everything else, including files with unknown or changing file extension, now works as it should.	2019-08-19 23:05:28 +02:00
Mike Fährmann	e77a656437	optimize directory path generation - use str.join() instead of os.path.join() (less "features", but 10x as fast) - cache directory formatters - detect and optimize field access for 1-element format strings	2019-08-19 15:56:20 +02:00
Mike Fährmann	51d10783fc	[patreon] include image info in API results (#383 )	2019-08-18 23:28:47 +02:00
Mike Fährmann	7a5e78741c	[booru] build directory path for each file (#385 )	2019-08-18 23:28:33 +02:00
Mike Fährmann	b1728f512d	[patreon] support multi image posts and post URLs (#383 )	2019-08-17 23:24:46 +02:00
Mike Fährmann	454bf1ebf9	preserve enumeration index after 'set_extension()' (#306 )	2019-08-16 23:12:33 +02:00
Mike Fährmann	f5039b897f	replace DownloadArchive.check() with __contains__() Interestingly enough, 'a in obj' is slightly faster than 'obj.check(a)' and is also nicer to look at, I think.	2019-08-16 23:12:32 +02:00
Mike Fährmann	5a210991b6	Remove control characters from filesystem paths - add 'path-remove' option to specify the set of characters that should be removed - rename 'restrict-filenames' to 'path-restrict' - #348, #380	2019-08-16 23:12:16 +02:00
Mike Fährmann	c50d60a53d	[reactor] fix image URLs	2019-08-16 14:07:22 +02:00
Mike Fährmann	32447d0d24	[pixiv] simplify default filename format (#366)	2019-08-15 13:32:47 +02:00
Mike Fährmann	5f8621b29d	improve output of active post processor modules	2019-08-15 13:31:04 +02:00
Mike Fährmann	2cbbc3dec4	add a 'whitelist' to '--ugoira-conv' (#382 )	2019-08-15 13:27:57 +02:00
Mike Fährmann	829b1ccf04	[imgur] distinguish album and gallery URLs (#380 ) A gallery can be either an album or a single image.	2019-08-14 21:40:14 +02:00
Mike Fährmann	23251356cb	require 'extension' data for each URL (#382 )	2019-08-14 20:03:03 +02:00
Mike Fährmann	a67413d64f	[xhamster] use input URL domain Don't rewrite all URLs as 'https://xhamster.com/...'	2019-08-14 00:21:30 +02:00
Mike Fährmann	0bb873757a	update PathFormat class - change 'has_extension' from a simple flag/bool to a field that contains the original filename extension - rename 'keywords' to 'kwdict' and some other stuff as well - inline 'adjust_path()' - put enumeration index before filename extension (#306)	2019-08-12 21:40:37 +02:00
Mike Fährmann	423f68f585	[deviantart] fix scraps extraction (closes #376 )	2019-08-11 16:06:15 +02:00
Mike Fährmann	3bf20ffb70	[instagram] add support for story highlights	2019-08-10 14:34:22 +02:00
Mike Fährmann	a732e9c430	[instagram] update query hashes and headers	2019-08-10 14:13:08 +02:00
Mike Fährmann	2ccf6a9e35	[instagram] make extractor tests happy (#373 )	2019-08-08 18:50:26 +02:00
Mike Fährmann	8dc42bb178	implement 'enumerate' for 'extractor.skip' (#306 ) [ci skip]	2019-08-08 18:37:54 +02:00
Leonardo Taccari	bc5eaf7746	[instagram] Add support for IGTV (#373 ) Add support for IGTV profile (instagram.com/<username>/channel/) and IGTV medias (instagram.com/tv/<short_id>).	2019-08-08 18:33:13 +02:00
Mike Fährmann	b7fb93e2b2	[downloader:http] add 'adjust-extensions' option	2019-08-08 16:54:20 +02:00
Mike Fährmann	eb7da159e2	[imagebam] update URL test results Image URLs are now using https://, but the website itself is still served as http://.	2019-08-07 21:47:44 +02:00
Mike Fährmann	189acbeac9	[imgbb] add extractor for individual images (closes #363 )	2019-08-05 22:52:08 +02:00
Mike Fährmann	ad3ac02fbc	[pixiv] update metadata entries (#366 ) - change 'num' to a simple enumerating integer - change default filename format - provide content of the old 'num' field as 'suffix' - add 'filename' for ugoira	2019-08-05 22:41:56 +02:00
Mike Fährmann	1ff4c4ec03	[adultempire] consistent artist order	2019-08-05 22:06:11 +02:00
Leonardo Taccari	2df050e627	[instagram] Add support for stories (#371 ) * [instagram] Add support for stories Add support for Instagram user's stories (https://www.instagram.com/stories/<username>/). First the shared_data in instagram.com/stories/<username> is fetched in order to retrieve the user_id that is then passed to fetch the stories via the corresponding graphql query. Please note that fetching stories is supported only when authentication is enabled and the corresponding <username> is followed. * [instagram] Add an only-matching test for stories * [instagram] Simplify InstagramExtractor.items() and _extract_stories() Simplify handling of typename in InstagramExtractor.items() and multi-line string in _extract_stories(). NFCI.	2019-08-05 22:04:34 +02:00
Mike Fährmann	f4bc75e854	fix rate limit handling for OAuth APIs (#368 )	2019-08-03 13:43:00 +02:00
Mike Fährmann	3957d27d79	[deviantart] add 'quality' option (#369 )	2019-08-03 11:40:35 +02:00
Mike Fährmann	64b2935d8e	[pixiv] provide 'filename' and change default filename format to '{filename}.{extension}' (closes #366)	2019-08-02 22:35:10 +02:00
Mike Fährmann	2f33bac030	release version 1.10.1	2019-08-02 21:23:06 +02:00
Mike Fährmann	fa60109e97	[exhentai] don't use e-hentai.org for exhentai URLs	2019-08-02 21:10:09 +02:00
Mike Fährmann	dfe552421b	release version 1.10.0	2019-08-01 23:22:58 +02:00
Mike Fährmann	0609afd1e4	update default cache directory ... again Use a 'gallery-dl' subdirectory in ~/.cache to adhere to how other programs store their cached data, and call os.makedirs() so it also works without an existing ~/.cache directory.	2019-08-01 22:11:00 +02:00
Mike Fährmann	4a0c98bfc9	miscellaneous fixes and adjustments	2019-08-01 22:09:43 +02:00
Mike Fährmann	2c839f3760	[imgbb] add user extractor + login support (#361 )	2019-08-01 21:39:20 +02:00
Mike Fährmann	a8b60b2bd9	change default cache directory for unix systems Use either $XDG_CACHE_HOME or ~/.cache (if the former isn't set) and store potentially sensitive cookies and tokens in a user's home directory and not in the world-readable /tmp.	2019-07-31 22:56:14 +02:00
Mike Fährmann	4b6edfbfd2	restrict permissions without importing 'pathlib' and only on non-Windows systems. 1. On Windows the 'mode' argument for os.open() has no (visible) effect on access permissions for new files. 2. The default location for 'cache.file' on Windows is in %USERPROFILE%\AppData\Local\Temp which can only be accessed by the owner himself (or an admin).	2019-07-31 21:48:09 +02:00
Leonardo Taccari	afce1ee1eb	Avoid possible sensitive information disclosure via cache.file Previously cache.file could be created world readable leading to possible sensitive information disclosure on multi-user systems. Restrict permissions only to the owner by creating an empty file. Please note that cache.file created before this commit may need a `chmod 600' or similar!	2019-07-31 15:05:26 +02:00
Mike Fährmann	2153206093	[imgbb] add album extractor (#361 )	2019-07-30 23:11:19 +02:00
Mike Fährmann	beb4fab2e6	[exhentai] improve limit and error handling (#360 ) - check image limit before opening the first gallery or image page - prevent any further exhentai extractors from running after the image limit has been reached	2019-07-30 22:58:35 +02:00
Mike Fährmann	81b35ed3cb	[exhentai] catch more error states (#356 , #360 ) - warn on MPV-enabled galleries - catch parsing errors for gallery pages and image info - write page content to debug output	2019-07-29 16:54:31 +02:00
Mike Fährmann	a90280f4e7	[postprocessor:zip] add 'mode' option (#355 )	2019-07-29 16:51:26 +02:00
Mike Fährmann	6ce22f606b	[exhentai] update login procedure and tests Logging in now follows the natural login flow that also happens in a browser more closely and collects more cookies than just ipb_member_id and ipb_pass_hash. Test URLs have been updated and now point to the e-hentai.org domain.	2019-07-28 16:51:05 +02:00
Mike Fährmann	dc73d02d87	[exhentai] always use e-hentai.org as domain + set nw cookie	2019-07-28 10:54:17 +02:00
Mike Fährmann	40637556fa	[ngomik] fix extraction	2019-07-28 10:53:46 +02:00
Mike Fährmann	3969f9cbbd	[behance] fix collection extraction	2019-07-27 14:26:40 +02:00
Mike Fährmann	20f7b07312	ensure postproc finalize() is called during C-c or crash (#355 )	2019-07-27 11:14:52 +02:00
Mike Fährmann	17a3426845	[gelbooru] enable all content when not using API	2019-07-27 11:13:38 +02:00
Mike Fährmann	279db2c5b2	[vsco] add collection & image extractor + video support (#331 )	2019-07-26 19:06:15 +02:00
Mike Fährmann	547ea71463	[downloader.ytdl] add 'forward-cookies' option (#352 ) The "long" name is necessary because just calling it 'cookies' would clash with how the lookup for '--cookies' is implemented.	2019-07-24 21:19:11 +02:00
Mike Fährmann	d9d44ad953	[tsumino] update test results	2019-07-24 21:17:23 +02:00
Mike Fährmann	b1bea8aaeb	add 'restrict-filenames' option (#348 )	2019-07-23 17:41:24 +02:00
Mike Fährmann	60cf40380a	[vsco] add user extractor (#331 )	2019-07-23 16:23:11 +02:00
Mike Fährmann	3fe5ccdfa6	[adultempire] add gallery extractor (closes #340 )	2019-07-21 22:29:57 +02:00
Mike Fährmann	b3851e01d9	release version 1.9.0	2019-07-19 21:55:25 +02:00
Mike Fährmann	5d968412ca	[deviantart] case-insensitive folder name matching (fixes #343 )	2019-07-19 18:05:31 +02:00
Mike Fährmann	a3c736fedc	[500px] fix extraction Maximum available image dimensions have been reduced to 4096px on the longest edge. (from 5000px) A few (unimportant) metadata fields are no longer available or have been changed to 'null'.	2019-07-19 17:23:03 +02:00
Mike Fährmann	1133b7fcbd	[smugmug] update unit tests The account used for tests before has been deleted.	2019-07-19 17:16:24 +02:00
Mike Fährmann	21991acc49	add 'ciphers' option; update default User-Agent	2019-07-19 17:14:40 +02:00
Mike Fährmann	84f4d3bc0b	replace urllib3's default cipher list with Firefox's (#342 ) Avoids Cloudflare CAPTCHAs on both Linux and Windows without pyOpenSSL installed.	2019-07-18 19:42:13 +02:00
Mike Fährmann	feb98cf196	[twitter] improve 'content' formatting; add option (#338 ) - include emoticons - leave newlines intact - remove pic.twitter.com/ links at the end	2019-07-17 16:02:51 +02:00
Mike Fährmann	1740086d8a	add 'repl' and 'sep' arguments to text.replace_html()	2019-07-17 14:48:24 +02:00
Mike Fährmann	8d1ae9b715	[tumblr] enable date-min/-max/-format options (#337 )	2019-07-17 14:36:41 +02:00
Mike Fährmann	09f37fde39	[reddit] move date-min/-max handling into Extractor class	2019-07-16 22:54:39 +02:00
Mike Fährmann	7b77ecc35a	fix paths for files without extension (#220 )	2019-07-15 16:39:03 +02:00
Mike Fährmann	c41ff9441e	improve find() for downloaders and postprocessors	2019-07-15 16:33:03 +02:00
Mike Fährmann	0151e250f5	[twitter] extract 'content' metadata (closes #333 )	2019-07-15 16:25:22 +02:00
Mike Fährmann	16c582aaf9	implement 'mtime' post-processor (#332 ) This can set a file's modification time according to a UNIX timestamp or a datetime object from its metadata.	2019-07-14 22:39:17 +02:00
Mike Fährmann	62097284fe	add 'download' option (#220 )	2019-07-14 18:48:18 +02:00
Mike Fährmann	fe7805de7c	improve attribute access in DownloadJob.handle_url() Storing a value in a local variable and accessing it that way is faster than going through 'self' if it is accessed more than once.	2019-07-13 21:42:07 +02:00
Mike Fährmann	56c7a66a4a	detect Cloudflare CAPTCHAs and update cipher list	2019-07-10 15:18:20 +02:00
Mike Fährmann	a7b42b37a2	[35photo] fix extraction	2019-07-09 20:33:57 +02:00
Mike Fährmann	04b8d0894a	[newgrounds] improve metadata extraction	2019-07-08 17:53:55 +02:00
Mike Fährmann	12da6bd0c9	[simplyhentai] fix/improve extraction	2019-07-06 20:25:53 +02:00
Mike Fährmann	fdec59f8e2	replace extractor.request() 'expect' argument with - 'fatal': allow 4xx status codes - 'notfound': raise NotFoundError on 404	2019-07-05 00:42:16 +02:00
Mike Fährmann	2ff73873f0	[erolord] add gallery extractor (closes #326 )	2019-07-04 20:28:04 +02:00
Mike Fährmann	b4da8c5a97	[sexcom] add extractor for related pins (#325 )	2019-07-03 21:04:23 +02:00
Mike Fährmann	69997e92db	[sexcom] skip unavailable pins (#325 )	2019-07-02 22:05:54 +02:00
Mike Fährmann	8966930c5c	[downloader:http] try to import SSL exception class from OpenSSL (#324)	2019-07-01 20:10:26 +02:00
Mike Fährmann	bc6b0cfddc	[shopify] skip consecutive duplicate products Not filtering duplicate URLs anymore caused the archive ID uniqueness test to fail.	2019-07-01 20:04:57 +02:00
Mike Fährmann	b89f0d8d3c	update extractor result tests	2019-07-01 20:02:47 +02:00
Mike Fährmann	69205df68d	allow '-1' for infinite retries (#300 )	2019-06-30 23:10:47 +02:00
Mike Fährmann	f7b5c4c3e7	use values of 'retries' options correctly The RE-tries option now specifies exactly that: the maximum number a failed HTTP request is re-tried. For example a value of 2 will now correctly stop after 3 attempts: the initial one + 2 re-tries. The maximum wait-time now also caps at 30min and increases exponentially for both extractor.request() and downloader.http.download().	2019-06-30 23:10:18 +02:00
Mike Fährmann	6393b47db2	add '-A/--abort'; deprecate '--abort-on-skip'	2019-06-30 14:28:28 +02:00
Mike Fährmann	f2000a69aa	implement 'image-unique' and 'chapter-unique' options (#303 ) The default value for both is 'false', i.e. duplicate URLs are NOT ignored. The previous behavior was to always ignore duplicate URLs to make '--abort-on-skip' work properly when new images were added to the beginning of a collection while gallery-dl is running.	2019-06-29 22:50:17 +02:00
Mike Fährmann	40da44b17f	Merge branch 'v1.9.0'	2019-06-29 15:39:52 +02:00
Mike Fährmann	9a216a6c6c	release version 1.8.7	2019-06-28 21:04:00 +02:00
Mike Fährmann	7a99e85943	[kissmanga] fix download URLs and file extensions The current Blogspot image URLs hosted on Kissmanga end with an "invalid" query parameter (/000.png&upx=...), which doesn't get recognized by 'spliturl()' and 'parseurl()' as such and gets therefore included in the 'extension' field from 'text.nameext_from_url()'.	2019-06-28 20:34:43 +02:00
Mike Fährmann	055102431f	[hitomi] handle Game CG galleries with scenes (fixes #321 )	2019-06-27 20:25:40 +02:00
Mike Fährmann	a9c89085fb	[instagram] implement login support (#195 )	2019-06-26 23:58:47 +02:00
Mike Fährmann	f1b0c2bf5c	[downloader:ytdl] forward cookies to youtube-dl to be able to download private videos from Twitter, Instagram, etc.	2019-06-26 19:32:07 +02:00
Mike Fährmann	7856e5e7dc	[deviantart] "fix" scraps extraction	2019-06-25 18:18:12 +02:00
Mike Fährmann	082cb24acd	[pururin] fix extraction Missing metadata information would lead to unnecessary exceptions.	2019-06-24 22:27:50 +02:00
Mike Fährmann	98554cbab8	[mangoxo] fix login	2019-06-24 21:57:17 +02:00
Mike Fährmann	108963d138	[imagefap] include Referer headers	2019-06-24 21:31:29 +02:00
Mike Fährmann	e314621366	[nsfwalbum] fix default directory_fmt (#287 )	2019-06-24 18:29:54 +02:00
Mike Fährmann	95b1e4c3c0	implement R<old>/<new>/ format option (#318 )	2019-06-23 22:45:44 +02:00
Mike Fährmann	18a1f8c6cd	[vanillarock] add post and tag extractors (closes #254 )	2019-06-23 22:45:36 +02:00
Mike Fährmann	f0c5093812	[nsfwalbum] add album extractor (closes #287 )	2019-06-23 22:45:07 +02:00
Mike Fährmann	15e4ddf46d	implement custom logging formatter supports custom log message formats for each loglevel and, by extension, custom ANSI codes and colors for errors and warnings (#304)	2019-06-21 20:17:58 +02:00
Mike Fährmann	61e413d85d	[hentaifoundry] stop disabling IPv6 addresses The rogue address mentioned in `a138d58` is no longer included in the DNS results for www.hentai-foundry.com.	2019-06-21 20:03:14 +02:00
Mike Fährmann	76ae9957c2	[deviantart] force legacy version for single deviations Let's see how long this works ... DeviantArt is rolling out a new version of their website, including a new internal and potentially usable API (rewrite incoming, yay). The issue with the new layout is that it doesn't include the "old" UUIDs for single deviations, i.e. mapping a numeric deviation ID to its UUID counterpart is impossible with the new layout.	2019-06-20 19:26:15 +02:00
Mike Fährmann	db3f52881a	add 'mtime' option	2019-06-20 17:19:44 +02:00
Mike Fährmann	ee4d7c3d89	update downloader.find() and related code Instead of replacing 'https' with 'http' for every URL in 'get_downloader()', this now only happens once during downloader initialization. Also unit tests.	2019-06-20 16:59:44 +02:00
Mike Fährmann	f4ba98771d	use Last-Modified header to set file modification time (#236, #277)	2019-06-19 23:16:32 +02:00
Mike Fährmann	179d112083	[downloader] overhaul http and text modules Get rid of the modular structure and simplify/specialize those modules.	2019-06-19 22:56:11 +02:00
Mike Fährmann	a01f99728c	[postprocessor:zip] delete empty archives when done (#316 )	2019-06-19 18:14:33 +02:00
Mike Fährmann	520c8ba106	[hentaicafe] extract 'tags' and 'artist' metadata (closes #238 ) These metadata fields will only be filled in when using a top-level URL, because that's the only place this information is available. Using a Foolslide URL (1) will leave these fields empty. (1) https://hentai.cafe/manga/read/.../en/0/1/	2019-06-18 14:30:26 +02:00
Mike Fährmann	b51baa9a4b	[hitomi] fix empty language detection; parse datetime	2019-06-17 20:02:58 +02:00
Mike Fährmann	258e8b2060	[deviantart] small code improvements	2019-06-17 19:49:50 +02:00
Mike Fährmann	a77340c647	[keenspot] fix extraction for "TwoKinds"	2019-06-17 19:49:39 +02:00
Mike Fährmann	03e6876fbe	[instagram] provide 'description' metadata (#310 )	2019-06-16 21:54:01 +02:00
Mike Fährmann	b171befa87	implement 'parse_unicode_escapes()'	2019-06-16 21:47:24 +02:00
Mike Fährmann	3a36a0fa1e	release version 1.8.6	2019-06-14 21:11:58 +02:00
Mike Fährmann	ec3e8601f1	[slickpic] add user extractor (#249 )	2019-06-14 18:55:56 +02:00
Mike Fährmann	97ef416218	[8muses] support multi-page listings (#305 )	2019-06-14 18:48:22 +02:00
Mike Fährmann	f5961ac968	[deviantart] download deviations with no 'content' field Some deviations (possibly only from sta.sh sources) are downloadable (i.e. 'is_downloadable' is true and /deviation/download/ works), but have no 'content' or similar in their JSON representation. (fixes #307)	2019-06-13 21:14:12 +02:00
Mike Fährmann	4e07f99e3e	[mangoxo] change token message level to debug The login page currently doesn't provide and require a login token (logging in works without a token), so printing a warning during each login is unnecessary.	2019-06-13 21:09:11 +02:00
Mike Fährmann	d997c10320	[8muses] add album extractor (#305 )	2019-06-10 22:17:46 +02:00
Mike Fährmann	e05a96db5e	[deviantart] rename 'stash' to 'extra' (#302 ) 'stash' is already used as a name for the StashExtractor and therefore expected to be a dictionary.	2019-06-10 21:05:25 +02:00
Mike Fährmann	2184e3a86b	[slickpic] add album extractor (#249 )	2019-06-09 21:59:22 +02:00
Mike Fährmann	c23bf263fe	[deviantart] rename 'external' to 'stash' (#302 ) restrict extracted URLs to ones from https://sta.sh/...	2019-06-09 11:16:02 +02:00
Mike Fährmann	c73c2cda50	[pornhub] add gallery & user extractor (#282 )	2019-06-07 16:31:20 +02:00
Mike Fährmann	7c6cb908f9	[xhamster] update test results	2019-06-07 16:28:49 +02:00
Mike Fährmann	2fb85178da	[deviantart] add 'external' option (#302 ) If a description is available, this will extract URLs from the description text and try to find Extractors for them.	2019-06-06 18:53:50 +02:00
Mike Fährmann	f85e42cffc	[deviantart] fix --range for deviation & stash extractor	2019-06-06 18:45:10 +02:00
Mike Fährmann	40c7eb3424	[livedoor] improve extraction (fixes #301 )	2019-06-06 15:22:27 +02:00
Mike Fährmann	62335b9015	[paheal] adjust test results	2019-06-05 11:42:01 +02:00
Mike Fährmann	aa1ca4ed35	[shopify] skip deleted products (#175 ) Product pages which return a 4xx status code will now be skipped instead of raising an exception.	2019-06-05 11:40:54 +02:00
Mike Fährmann	096009367b	[xhamster] add gallery & user extractor (#281 )	2019-06-05 11:11:51 +02:00
Mike Fährmann	208202b962	[tumblr] improve error handling (#297 ) In some cases Tumblr's API responds with an HTML document. Trying to decode it as JSON would raise an uncaught exception.	2019-06-04 14:02:17 +02:00
Mike Fährmann	c08c340178	[directlink] make pattern case insensitive (fixes #296 )	2019-06-03 10:56:14 +02:00
Mike Fährmann	95b4a53b9c	[keenspot] improve pagination (#223 ) The old code would skip the last comic page for some series.	2019-06-02 22:12:21 +02:00
Mike Fährmann	12c965d547	release version 1.8.5	2019-06-01 20:57:55 +02:00
Mike Fährmann	731c7cbd5b	[keenspot] support all comics and "random" access (#223 )	2019-06-01 20:48:13 +02:00
Mike Fährmann	6a34f4b0c1	skip tests on read timeouts; print list of skipped tests	2019-06-01 20:47:31 +02:00
Mike Fährmann	1c36e65e9b	[exhentai] choose site version depending on input URL (#278 ) Use e-hentai.org as root and cookiedomain if the input URL is from e-hentai (or g.e-hentai), use exhentai.org otherwise.	2019-05-31 15:34:39 +02:00
Mike Fährmann	6da3e21237	[downloader:ytdl] provide 'filename' metadata (closes #291 )	2019-05-31 14:56:45 +02:00
Mike Fährmann	d33f5a7423	[wallhaven] rewrite - use API - remove login support, add 'api-key' option - remove support for "alpha" subdomain - alpha.wallhaven.cc used numeric IDs that can't be translated to the new ID system - support direct links to wallpapers	2019-05-31 14:53:02 +02:00
Mike Fährmann	5499934ae2	[ngomik] fix extraction	2019-05-30 20:18:36 +02:00
Mike Fährmann	f1893b2b5b	[deviantart] add 'folders' option (#276 )	2019-05-30 17:28:12 +02:00
Mike Fährmann	c849574def	[keenspot] add comic extractor (#223 ) Doesn't work for - http://brawlinthefamily.keenspot.com/ - http://flipside.keenspot.com/ - http://lastblood.keenspot.com/ - http://mysticrevolution.keenspot.com/ - http://porcelain.keenspot.com/ - http://twokinds.keenspot.com/ yet, because of custom layouts.	2019-05-28 21:34:38 +02:00
Mike Fährmann	2b1999476e	implement 'text.rextract()'	2019-05-28 21:03:41 +02:00
Mike Fährmann	8bd5a19515	[hentainexus] add '_extractor' data	2019-05-28 00:20:01 +02:00
Mike Fährmann	2a085a5e96	[sankakucomplex] fix 'date' values (#258 )	2019-05-28 00:18:58 +02:00
Mike Fährmann	bcd1801aa8	[sankakucomplex] add 'tag' extractor (#258 )	2019-05-27 23:57:44 +02:00
Mike Fährmann	74c2415138	[sankakucomplex] move article extractor to its own module (#258 )	2019-05-27 23:49:23 +02:00
Mike Fährmann	4465a3ea68	[kissmanga][readcomiconline] add 'captcha' option (#279 ) to configure how to handle CAPTCHA page redirects: - either interactively wait for the user to solve the CAPTCHA - or raise StopExtraction like before	2019-05-27 22:24:48 +02:00
Mike Fährmann	1e3e15c4f3	[sankaku] add article extractor (#258 )	2019-05-26 17:42:36 +02:00
Mike Fährmann	48233f00c0	[readcomiconline] detect 'AreYouHuman' redirects (#279 )	2019-05-26 15:58:37 +02:00
Mike Fährmann	1cde38110d	[livedoor] return 'date' as datetime object	2019-05-25 23:45:56 +02:00
Mike Fährmann	e88824e1a7	[livedoor] fix adjustments for https:// URLs	2019-05-25 23:45:22 +02:00
Mike Fährmann	2316e0ed3d	fix strptime workaround from `b0e85a4` Don't return a modified version of 'date_time' if strptime fails.	2019-05-25 23:22:26 +02:00
Mike Fährmann	b3e4664715	[hentainexus] fix extraction	2019-05-25 22:35:04 +02:00
Mike Fährmann	399e8e965a	also update urllib3's cipher list for versions >= 1.25	2019-05-21 23:02:20 +02:00
Mike Fährmann	f837ea98cb	[deviantart] don't call 'extend()' on folders (fixes #271 )	2019-05-20 16:24:13 +02:00
Mike Fährmann	bb32a2d490	[patreon] use file extensions from original filenames (#268 )	2019-05-20 15:46:59 +02:00
Mike Fährmann	efa805c5d7	[sankaku] update pagination end condition (fixes #265 ) Pagination over popular listings (`date:...+order:popular`) never terminates, not even on the site itself, and at some point returns the same results over and over again.	2019-05-20 15:46:06 +02:00
Mike Fährmann	d514d49c72	release version 1.8.4	2019-05-17 23:52:09 +02:00
Mike Fährmann	a4ba34c835	[booru] prevent crash when no tags are present (#259 )	2019-05-17 19:32:53 +02:00
Mike Fährmann	ca3bad1779	[patreon] small fixes and adjustments (#226 ) - fix datetime parsing - rename 'user' to 'creator' - convert 'id' to integer - improve tests	2019-05-17 19:32:41 +02:00
Leonardo Taccari	fb09dd962a	[instagram] Fix extraction after `rhx_gis' field removal	2019-05-17 18:17:42 +02:00
Mike Fährmann	7a14aaed7d	[luscious] fix extraction	2019-05-17 10:48:47 +02:00
Mike Fährmann	e82cadac61	[patreon] add extractors (#226 )	2019-05-17 10:47:22 +02:00
Mike Fährmann	4891f4a328	[hentainexus] add search extractor (#256 )	2019-05-16 23:55:30 +02:00
Mike Fährmann	c02f12ce2f	avoid Cloudflare CAPTCHAs for OpenSSL < 1.1.1 see https://github.com/Anorov/cloudflare-scrape/pull/242	2019-05-15 12:25:20 +02:00
Mike Fährmann	0b4be57a10	[sankaku] fix error when no tags available (closes #259 ) [ci skip]	2019-05-14 23:40:07 +02:00
Mike Fährmann	9890bfdf23	[flickr] improve code and metadata - simplify pagination - add more metadata and slightly change its structure - convert suitable values to int or list - move keys from ["photo"] to the base level - proper video support (#246) - rename method and variable names to better fit with other extractors	2019-05-14 22:10:50 +02:00
Mike Fährmann	aa8e366b90	[luscious] fix tag extraction	2019-05-14 17:35:52 +02:00
Mike Fährmann	ba8eb1ffec	[hentainexus] add gallery extractor (#256 )	2019-05-12 23:59:41 +02:00
Mike Fährmann	bd9cb3d191	improve job class selection code + consistent argument order for add_argument() calls	2019-05-10 22:05:57 +02:00
Mike Fährmann	e64773ffdd	allow multiple post-processor command-line options (#253 ) ... without overwriting any previous ones	2019-05-10 15:32:23 +02:00
Mike Fährmann	b1db194c14	[reactor] update and improve - split 'tags' into a list - parse 'date' into a datetime object - fix webm/mp4 URLs	2019-05-09 23:24:49 +02:00
Mike Fährmann	b0e85a42e3	apply workaround from `4736912` in parse_datetime() itself	2019-05-09 21:53:17 +02:00
Mike Fährmann	523ebc9b0b	Fix serialization of 'datetime' objects in '--write-metadata' Simplified universal serialization support in json.dump() can be achieved by passing 'default=str', which was already the case in DataJob.run() for -j/--dump-json, but not for the 'metadata' post-processor. This commit introduces util.dump_json() that (more or less) unifies the JSON output procedure of both --write-metadata and --dump-json. (#251, #252)	2019-05-09 16:49:22 +02:00
Mike Fährmann	8de5866fd2	[twitter] replace unit test URLs https://twitter.com/PicturesEarth was deleted	2019-05-09 10:17:55 +02:00
Mike Fährmann	74c7304c6b	[newgrounds] extract 'date', 'favorites', and 'score'	2019-05-08 18:09:17 +02:00
Mike Fährmann	4736912d4e	[pixiv] work around strptime limitations in Python < 3.7 "%z" doesn't allow a colon separator in older Python versions: - "+0900" is OK - "+09:00" raises an exception	2019-05-08 18:08:03 +02:00
Mike Fährmann	1f7fa9dc8e	[exhentai] update data extraction code - parse 'date' to datetime object - use 'text.extract_from()'	2019-05-08 15:44:29 +02:00
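
Note on commits 4736912d4e, b0e85a42e3 and 2316e0ed3d: the messages describe a workaround for `strptime()` in Python < 3.7, where "%z" accepts "+0900" but rejects "+09:00", and a follow-up fix so that the unmodified input is returned when parsing fails. The sketch below illustrates that idea only. The function name `parse_datetime_compat`, its default format string, and the exact normalization rule are assumptions made for this example and are not taken from gallery-dl's source.

```python
from datetime import datetime


def parse_datetime_compat(date_string, fmt="%Y-%m-%dT%H:%M:%S%z"):
    """Hypothetical helper (not gallery-dl's actual code).

    Parses 'date_string' with 'fmt', tolerating a colon inside a
    trailing UTC offset on Python versions where "%z" rejects it.
    Returns the original, unmodified string if parsing fails.
    """
    normalized = date_string
    # Assumption: the offset, if present, is the last six characters,
    # e.g. "+09:00". Drop the colon so that "%z" sees "+0900".
    if (len(normalized) >= 6 and normalized[-3] == ":"
            and normalized[-6] in "+-"):
        normalized = normalized[:-3] + normalized[-2:]
    try:
        return datetime.strptime(normalized, fmt)
    except ValueError:
        # Return the caller's input, not the normalized copy.
        return date_string


if __name__ == "__main__":
    print(parse_datetime_compat("2019-05-08T18:08:03+09:00"))
    print(parse_datetime_compat("2019-05-08T18:08:03+0900"))
    print(parse_datetime_compat("not a date"))
```

Normalizing a copy and returning the original string on failure mirrors the behavior described in 2316e0ed3d, so callers never receive a silently altered value.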

2012 Commits