gallery-dl

mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-26 04:32:51 +01:00

Author	SHA1	Message	Date
Mike Fährmann	fc19010808	[downloader:ytdl] fix 'outtmpl' setting for yt_dlp (#1680) yt_dlp supports multiple outtmpl settings for different file types and uses its 'outtmpl_dict' for that.	2021-07-16 15:05:16 +02:00
Mike Fährmann	e622e004f0	[ytdl] improve module imports (#1680) Apply 'extractor.ytdl.module' for every URL, not just the first.	2021-07-14 03:08:00 +02:00
Mike Fährmann	36ac2197db	[ytdl] add extractor for sites supported by youtube-dl (#1680, #878) Can be used by prefixing any URL with 'ytdl:', or by setting 'extractor.ytdl.enabled' to 'true'.	2021-07-10 20:55:47 +02:00
Mike Fährmann	221015e586	[downloader:http] disable filename extension changes for ugoira (#1507)	2021-04-27 01:29:09 +02:00
Mike Fährmann	1a38fae785	add option to use different youtube-dl modules (fixes #1330) by setting the 'downloader.ytdl.module' value. For example { "downloader": { "ytdl": { "module": "yt_dlp" } } } or '-o module=yt_dlp'	2021-03-01 03:10:42 +01:00
Mike Fährmann	8821dceb79	use __import__() to dynamically load modules	2021-03-01 01:27:02 +01:00
Mike Fährmann	cf5fa75d4c	add 'browser' option (#1117) - change default user agent to Firefox ESR 78 on Windows 10 - remove 'ciphers' option	2021-02-26 13:41:27 +01:00
Mike Fährmann	560277394e	[downloader:http] add 'headers' option (#1322)	2021-02-21 19:13:39 +01:00
Mike Fährmann	a228bb3a5f	[downloader:http] support callbacks to validate responses	2021-01-29 22:15:21 +01:00
Mike Fährmann	0594821fcd	[downloader:http] add MIME type and signature for .ico files (closes #1211)	2021-01-01 16:07:33 +01:00
Mike Fährmann	476d563ec2	[downloader:http] add MIME type and signature for .swf files	2020-12-11 14:21:04 +01:00
Mike Fährmann	fe0265c7a5	[downloader.http] small improvements to file signature list - specify multiple entries for gif, mp3, zip - add entries for pdf	2020-12-08 21:20:18 +01:00
Mike Fährmann	1a4b61f7eb	[downloader:http] fix issues with chunked transfer encoding (fixes #1144)	2020-11-30 01:10:45 +01:00
Mike Fährmann	536c088462	[downloader:http] improve 'adjust-extensions' (#776) Check file headers against a list of file signatures before downloading the whole file and writing it to disk. The file signature check still needs some improvements, but it produces usable results for the most part. Known limitations: - 'webp', 'wav', and others start with 'RIFF' - 'svg' uses the same "signature" as all XML documents - 'webm' has the same signature as 'mkv' files - only 'mp3' files in an ID3v2 container get recognized	2020-11-29 20:55:35 +01:00
Mike Fährmann	f6fd449b59	reduce wait time growth rate from exponential to linear Waiting for 2**N seconds after each error grows too fast. Simply waiting N seconds seems far more reasonable.	2020-09-06 22:38:25 +02:00
Mike Fährmann	ac3036ef56	add 'filesize-min' and 'filesize-max' options (closes #780)	2020-09-03 18:21:04 +02:00
Mike Fährmann	34929f673f	readd 'session' to base downloader class (fixes #768)	2020-05-20 20:04:46 +02:00
Mike Fährmann	ece73b5b2a	make 'path' and 'keywords' available in logging messages Wrap all loggers used by job, extractor, downloader, and postprocessor objects into a (custom) LoggerAdapter that provides access to the underlying job, extractor, pathfmt, and kwdict objects and their properties. __init__() signatures for all downloader and postprocessor classes have been changed to take the current Job object as their first argument, instead of the current extractor or pathfmt. (#574, #575)	2020-05-18 19:04:51 +02:00
Mike Fährmann	f8661c6578	[downloader:ytdl] fix file extensions when merging into mkv	2020-05-13 22:35:33 +02:00
Mike Fährmann	dba87ca99e	[downloader:ytdl] change 'forward-cookies' default to 'false' There are currently no situations where forwarding gallery-dl's cookies to youtube-dl is necessary, and it only causes problems when forcing youtube-dl for Twitter video downloads while logged in.	2020-05-12 20:17:08 +02:00
Mike Fährmann	19a7afdd9b	[downloader:http] add MIME types for .psd files (closes #714)	2020-04-29 23:01:42 +02:00
Mike Fährmann	38bc6430d3	[downloader:http] don't overwrite existing '_mtime' fields	2020-04-10 23:08:03 +02:00
Mike Fährmann	115fd2c6f2	"fix" incomplete MIME types (#632) e-/exhentai's original image downloads currently send incomplete/invalid Content-Type headers, "jpg" instead of "image/jpg" etc, since the last update. (https://forums.e-hentai.org/index.php?showtopic=236113) This change prepends any Content-Type value missing a media type specification with "image/", transforming it into a valid MIME type. (A global solution to a local problem, but it shouldn't cause any issues anywhere else)	2020-03-03 21:21:57 +01:00
Mike Fährmann	adcd7cb24a	[downloader:http] add another MIME type for '.rar' files (#628)	2020-03-01 20:42:13 +01:00
Mike Fährmann	380b693fad	[downloader:http] add more MIME types for '.bmp' files (#621)	2020-02-23 16:51:04 +01:00
Mike Fährmann	760b9b4db4	add remove_file() and remove_directory() helpers these functions call os.unlink() or os.rmdir() while catching and suppressing potential OSErrors	2020-01-18 00:21:26 +01:00
Mike Fährmann	200aea308a	[downloader:common] enable 'job'/'extractor' for logging messages (#574)	2020-01-12 21:41:16 +01:00
Mike Fährmann	c4702ec9b6	simplify some logging calls	2019-12-10 21:30:08 +01:00
Mike Fährmann	c59b98c81b	[downloader:http] improve rate limit handling - Move the download "logic" with rate limit checks into its own method that only gets used if a rate limit should be enforced - Fix an issue where suspending gallery-dl during a download would basically ignore the rate limit for the remaining download when resuming its execution.	2019-12-09 20:34:22 +01:00
Mike Fährmann	bbbafc1c24	[downloader:http] catch both possible SSLException instances With pyOpenSSL installed, but disabled, the SSLError exception would be set to the one from pyOpenSSL, which could never get raised. This commit solves this problem by catching both the native SSLError exception and the one from pyOpenSSL (if available).	2019-12-09 20:34:10 +01:00
Mike Fährmann	f5604492c3	update interface of config functions	2019-11-24 00:42:28 +01:00
Mike Fährmann	bbbeff4c41	[downloader.http] implement file-specific HTTP headers	2019-11-19 23:50:54 +01:00
Mike Fährmann	a5be08a830	[downloader:ytdl] forward proxy settings	2019-11-05 16:16:26 +01:00
Mike Fährmann	d44f790e81	adjust output for HTTP status related errors	2019-10-27 23:55:02 +01:00
Mike Fährmann	083e14ad9a	[downloader:ytdl] add data from '_ytdl_extra' to info_dicts	2019-10-25 13:17:13 +02:00
Mike Fährmann	1032cfa34b	[downloader:http] extend mimetype map with archive formats	2019-10-10 18:30:23 +02:00
Mike Fährmann	8eaae58045	[downloader:http] change log message level to 'debug'	2019-08-29 23:05:47 +02:00
Mike Fährmann	7c09545f70	[downloader:ytdl] add 'outtmpl' option (#395)	2019-08-24 22:47:59 +02:00
Mike Fährmann	ebabc5caf1	[downloader:http] treat 416 without downloaded data as error Downloading https://pbs.twimg.com/media/EB2cGUYX4AI2Vuu.jpg:orig (NSFW) sometimes returns a 416 status code, even though no 'Range' header was sent and no data was downloaded prior. This code usually means a file has already been downloaded completely and the download method indicates success, but in this case it causes an exception down the pipeline since no file was created.	2019-08-20 00:15:17 +02:00
Mike Fährmann	0bb873757a	update PathFormat class - change 'has_extension' from a simple flag/bool to a field that contains the original filename extension - rename 'keywords' to 'kwdict' and some other stuff as well - inline 'adjust_path()' - put enumeration index before filename extension (#306)	2019-08-12 21:40:37 +02:00
Mike Fährmann	b7fb93e2b2	[downloader:http] add 'adjust-extensions' option	2019-08-08 16:54:20 +02:00
Mike Fährmann	547ea71463	[downloader.ytdl] add 'forward-cookies' option (#352) The "long" name is necessary because just calling it 'cookies' would clash with how the lookup for '--cookies' is implemented.	2019-07-24 21:19:11 +02:00
Mike Fährmann	c41ff9441e	improve find() for downloaders and postprocessors	2019-07-15 16:33:03 +02:00
Mike Fährmann	16c582aaf9	implement 'mtime' post-processor (#332) This can set a file's modification time according to a UNIX timestamp or a datetime object from its metadata.	2019-07-14 22:39:17 +02:00
Mike Fährmann	8966930c5c	[downloader:http] try to import SSL exception class from OpenSSL (#324)	2019-07-01 20:10:26 +02:00
Mike Fährmann	69205df68d	allow '-1' for infinite retries (#300)	2019-06-30 23:10:47 +02:00
Mike Fährmann	f7b5c4c3e7	use values of 'retries' options correctly The RE-tries option now specifies exactly that: the maximum number a failed HTTP request is re-tried. For example a value of 2 will now correctly stop after 3 attempts: the initial one + 2 re-tries. The maximum wait-time now also caps at 30min and increases exponentially for both extractor.request() and downloader.http.download().	2019-06-30 23:10:18 +02:00
Mike Fährmann	f1b0c2bf5c	[downloader:ytdl] forward cookies to youtube-dl to be able to download private videos from Twitter, Instagram, etc.	2019-06-26 19:32:07 +02:00
Mike Fährmann	db3f52881a	add 'mtime' option	2019-06-20 17:19:44 +02:00
Mike Fährmann	ee4d7c3d89	update downloader.find() and related code Instead of replacing 'https' with 'http' for every URL in 'get_downloader()', this now only happens once during downloader initialization. Also unit tests.	2019-06-20 16:59:44 +02:00
Mike Fährmann	f4ba98771d	use Last-Modified header to set file modification time (#236, #277)	2019-06-19 23:16:32 +02:00
Mike Fährmann	179d112083	[downloader] overhaul http and text modules Get rid of the modular structure and simplify/specialize those modules.	2019-06-19 22:56:11 +02:00
Mike Fährmann	6da3e21237	[downloader:ytdl] provide 'filename' metadata (closes #291)	2019-05-31 14:56:45 +02:00
Mike Fährmann	7973419b54	restrict downloader and postprocessor module imports	2019-04-16 18:09:30 +02:00
Mike Fährmann	114b8eecc5	[downloader:ytdl] utilize '_ytdl_index' metadata fields	2019-03-24 11:27:20 +01:00
Mike Fährmann	c14d44e1bc	[downloader:common] retry downloads on SSL errors (#130)	2018-12-14 16:33:04 +01:00
Mike Fährmann	b17a5d6f3b	give downloader classes proper names	2018-11-16 14:40:05 +01:00
Mike Fährmann	655549df7c	[downloader:ytdl] add several options The "default" downloader options (rate, retries, timeout, verify) are mapped to corresponding youtube-dl options. downloader.ytdl.logging tells the downloader to pass youtube-dl's output to a Logger object. downloader.ytdl.raw-options allows passing arbitrary options to the YoutubeDL constructor.	2018-10-20 18:26:49 +02:00
Mike Fährmann	4a348990f4	adjust value resolution for retries/timeout/verify options This change introduces 'extractor.*.retries/timeout/verify' options as a general way to set these values for all HTTP requests. 'downloader.http.retries/timeout/verify' is a way to override these options for file downloads only and will fall back to 'extractor.*.…' values if they haven't been explicitly set. Also: downloader classes now take an extractor object as first argument instead of a requests.session.	2018-10-07 21:13:39 +02:00
Mike Fährmann	188876d814	implement youtube-dl downloader module URLs starting with 'ytdl:' will now be handled by youtube-dl. There is probably a lot to fix and improve, but the basic use case works. TODO: - format selection and ytdl options in general - better filename/path handling - ytdl support for "unsupported URLs" - ...	2018-10-05 18:05:11 +02:00
Mike Fährmann	e9ae6fd080	improve downloader/postprocessor module loading - handle arguments of any type without propagating an exception - prevent potential security risk through relative imports	2018-09-05 16:39:40 +02:00
Mike Fährmann	973cf98e88	fix download skip for files without extension	2018-06-27 17:16:07 +02:00
Mike Fährmann	821535b458	adjust PathFormat class	2018-06-06 20:17:17 +02:00
Mike Fährmann	cc36f88586	rename safe_int to parse_int; move parse_* to text module	2018-04-20 14:53:21 +02:00
Mike Fährmann	1d54a8e07d	fix logging output during downloads from: filename.ext[download][warning] ... to: filename.ext [download][warning] ...	2018-03-01 18:43:43 +01:00
Mike Fährmann	915807dd77	log HTTP errors as warnings	2018-01-29 21:55:46 +01:00
Mike Fährmann	f94e3706a8	use logging module for error messages during downloads	2018-01-26 18:11:13 +01:00
Mike Fährmann	b837420291	fix minor urllist issues	2018-01-19 22:54:15 +01:00
Mike Fährmann	6174a5c4ef	[download] adjust filename extension on filetype mismatch (closes #63)	2018-01-17 18:37:06 +01:00
Mike Fährmann	ebe9b0a04c	another attempt at downloader retry behavior This commit changes the general behavior from 'Retry on every exception and abort on DownloadError' to 'Only retry on DownloadRetry exceptions and abort on every other one' The previous version would have retried on several states which would have no chance of ever succeeding (invalid URLs, etc.)	2017-12-07 15:31:14 +01:00
Mike Fährmann	8f518e03f8	add options to set maximum download rate - -r/--limit-rate as cmdline option - downloader.http.rate as config option This implementation very roughly uses the idea of the token bucket algorithm [1] and mostly uses Wget's approach [2] as inspiration. [1] https://en.wikipedia.org/wiki/Token_bucket [2] http://git.savannah.gnu.org/cgit/wget.git/tree/src/retr.c?h=v1.19.2&id=ba6b44f6745b14dce414761a8e4b35d31b176bba#n111	2017-12-02 01:47:26 +01:00
Mike Fährmann	3dc1169736	use own mapping before relying on the 'mimetypes' module	2017-12-01 13:50:31 +01:00
Mike Fährmann	79bcaa8726	improve downloader retry behavior - only retry download on 5xx and 429 status codes - immediately fail on 4xx status codes	2017-11-10 21:46:18 +01:00
Mike Fährmann	42e948584d	fix downloader error handling RequestException being a subclass of OSError caused all exceptions during file downloads to be ignored/re-raised.	2017-11-07 15:23:07 +01:00
Mike Fährmann	707b15b586	create missing directories for 'part-directory' also some code improvements regarding downloader config values	2017-10-27 12:22:45 +02:00
Mike Fährmann	caf26412dd	add option to set alternate location of .part files (#29) Note: The path set for 'downloader.*.part-directory' needs to point to an already existing directory.	2017-10-26 00:16:48 +02:00
Mike Fährmann	9a41002b77	fix partial downloads for 'text:' URLs Using a filesize in bytes as offset into a Python string is not a good idea if said file contains non-ASCII characters.	2017-10-25 15:04:45 +02:00
Mike Fährmann	963670d73b	add options to control usage of .part files (#29) - '--no-part' command line option to disable them - 'downloader.http.part' and 'downloader.text.part' config options Disabling .part files restores the behaviour of the old downloader implementation.	2017-10-24 23:33:44 +02:00
Mike Fährmann	b0353aa02d	rewrite download modules (#29) - use '.part' files during file-download - implement continuation of incomplete downloads - check if file size matches the one reported by server	2017-10-24 12:53:03 +02:00
Mike Fährmann	2e982f56af	use 'Content-Length' to determine incomplete downloads (#29)	2017-10-20 18:56:18 +02:00
Mike Fährmann	b8862ff15e	add 'downloader.http.verify' option (also: change the default 'timeout' from None to 30)	2017-08-31 15:21:08 +02:00
Mike Fährmann	d70c66c516	fix "text:" downloader	2017-08-16 12:11:47 +02:00
Mike Fährmann	58e95a7487	share extractor and downloader sessions There was never any "good" reason for the strict separation between extractors and downloaders. This change allows for reduced resource usage (probably unnoticeable) and fewer lines of code at the "cost" of tighter coupling.	2017-06-30 19:38:14 +02:00
Mike Fährmann	fac6c02224	[downloader] fix extension from content-type	2017-06-19 09:24:00 +02:00
Mike Fährmann	107d29ad8a	improve handling of text:... URLs - don't require // after the colon - open output files in text mode	2017-05-12 14:10:25 +02:00
Mike Fährmann	48a5b11204	fix error if no file extension is found	2017-04-26 12:31:42 +02:00
Mike Fährmann	e3212dd98f	fix some smaller stuff - remove support for old windows config paths - catch exception if cache-database can't be opened - fix username/password settings for unit tests - rename variable 'max_tries' to 'retries'	2017-03-27 14:30:32 +02:00
Mike Fährmann	e2b5cd9918	change config-path for 'retries' and 'timeout'	2017-03-26 18:24:46 +02:00
Mike Fährmann	0b5076815d	always delete incompletely downloaded files	2017-03-21 15:53:43 +01:00
Mike Fährmann	22910f9562	improve error handling of http file downloads (#10)	2017-03-16 04:17:35 +01:00
Mike Fährmann	4f123b8513	code adjustments according to pep8	2017-01-30 19:40:15 +01:00
Mike Fährmann	3c1daef839	don't delete downloaded files in certain edge cases	2016-11-27 23:43:25 +01:00
Mike Fährmann	2b2bdce366	don't raise an exception if a download fails (#5)	2016-11-23 13:07:44 +01:00
Mike Fährmann	dd8236e733	enable non-standard MIME types	2016-09-30 16:41:49 +02:00
Mike Fährmann	29692c5784	get extension from Content-Type header if not provided	2016-09-30 12:32:48 +02:00
Mike Fährmann	ecc6542fc8	change required parameter type to file-like objects	2015-12-21 22:46:49 +01:00
Mike Fährmann	a8c0b4531d	fix issue with Ctrl+c on windows	2015-12-02 01:01:33 +01:00
Mike Fährmann	4b377ccc09	use output-module during downloads	2015-12-01 21:22:58 +01:00
Mike Fährmann	352950eebe	new method to import downloaders	2015-11-12 02:29:59 +01:00
Mike Fährmann	28fa7c53b4	docstrings and other small fixes for downloaders	2015-04-10 21:45:41 +02:00


154 Commits