mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-26 04:32:51 +01:00
154 Commits

Author SHA1 Message Date
Mike Fährmann
fc19010808
[downloader:ytdl] fix 'outtmpl' setting for yt_dlp (#1680)
yt_dlp supports multiple outtmpl settings for different file types and
uses its 'outtmpl_dict' for that.
2021-07-16 15:05:16 +02:00
Mike Fährmann
e622e004f0
[ytdl] improve module imports (#1680)
Apply 'extractor.ytdl.module' for every URL, not just the first.
2021-07-14 03:08:00 +02:00
Mike Fährmann
36ac2197db
[ytdl] add extractor for sites supported by youtube-dl
(#1680, #878)

Can be used by prefixing any URL with 'ytdl:',
or by setting 'extractor.ytdl.enabled' to 'true'.
2021-07-10 20:55:47 +02:00
Mike Fährmann
221015e586
[downloader:http] disable filename extension changes for ugoira
(#1507)
2021-04-27 01:29:09 +02:00
Mike Fährmann
1a38fae785
add option to use different youtube-dl modules (fixes #1330)
by setting the 'downloader.ytdl.module' value. For example

{
    "downloader": {
        "ytdl": {
            "module": "yt_dlp"
        }
    }
}

or '-o module=yt_dlp'
2021-03-01 03:10:42 +01:00
Mike Fährmann
8821dceb79
use __import__() to dynamically load modules 2021-03-01 01:27:02 +01:00
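
A minimal sketch of what loading a module by its configured name can look like with `__import__()`. The helper name `load_module` and the sample config are hypothetical, and the standard-library module `json` stands in for 'yt_dlp' so the example runs without extra packages; this is not the project's actual code.

```python
def load_module(name):
    """Import a module by its dotted name and return the module object."""
    # __import__("a.b") returns the top-level package "a",
    # so walk the remaining attributes to reach the leaf module.
    module = __import__(name)
    for part in name.split(".")[1:]:
        module = getattr(module, part)
    return module


# Hypothetical config; "json" is a stand-in for a name like "yt_dlp".
config = {"downloader": {"ytdl": {"module": "json"}}}

module = load_module(config["downloader"]["ytdl"]["module"])
print(module.__name__)
```

The standard library also offers `importlib.import_module()`, which returns the leaf module directly.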
Mike Fährmann
cf5fa75d4c
add 'browser' option (#1117)
- change default user agent to Firefox ESR 78 on Windows 10
- remove 'ciphers' option
2021-02-26 13:41:27 +01:00
Mike Fährmann
560277394e
[downloader:http] add 'headers' option (#1322) 2021-02-21 19:13:39 +01:00
Mike Fährmann
a228bb3a5f
[downloader:http] support callbacks to validate responses 2021-01-29 22:15:21 +01:00
Mike Fährmann
0594821fcd
[downloader:http] add MIME type and signature for .ico files
(closes #1211)
2021-01-01 16:07:33 +01:00
Mike Fährmann
476d563ec2
[downloader:http] add MIME type and signature for .swf files 2020-12-11 14:21:04 +01:00
Mike Fährmann
fe0265c7a5
[downloader.http] small improvements to file signature list
- specify multiple entries for gif, mp3, zip
- add entries for pdf
2020-12-08 21:20:18 +01:00
Mike Fährmann
1a4b61f7eb
[downloader:http] fix issues with chunked transfer encoding
(fixes #1144)
2020-11-30 01:10:45 +01:00
Mike Fährmann
536c088462
[downloader:http] improve 'adjust-extensions' (#776)
Check file headers against a list of file signatures before
downloading the whole file and writing it to disk.

The file signature check needs some improvements (*),
but it produces usable results for the most part.

(*)
- 'webp', 'wav', and others start with 'RIFF'
- 'svg' uses the same "signature" as all XML documents
- 'webm' has the same signature as 'mkv' files
- only 'mp3' files in an ID3v2 container get recognized
2020-11-29 20:55:35 +01:00
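
A rough sketch of the signature check described in the entry above, assuming a small illustrative table of magic numbers (not the project's actual list). The name `detect_extension` is hypothetical. The RIFF branch shows one possible way to tell 'webp' from 'wav', since both containers carry a format identifier at byte offsets 8 to 11.

```python
# Illustrative subset of file signatures; not the project's actual table.
SIGNATURES = (
    ("jpg", b"\xff\xd8\xff"),
    ("png", b"\x89PNG\r\n\x1a\n"),
    ("gif", b"GIF87a"),
    ("gif", b"GIF89a"),
    ("pdf", b"%PDF-"),
)


def detect_extension(header):
    """Guess a filename extension from the first bytes of a file."""
    for ext, signature in SIGNATURES:
        if header.startswith(signature):
            return ext
    # RIFF containers share their first four bytes; the format
    # identifier sits at byte offsets 8..11.
    if header[:4] == b"RIFF":
        return {b"WEBP": "webp", b"WAVE": "wav"}.get(header[8:12])
    return None


print(detect_extension(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
print(detect_extension(b"RIFF\x24\x00\x00\x00WEBPVP8 "))
```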
Mike Fährmann
f6fd449b59
reduce wait time growth rate from exponential to linear
Waiting for 2**N seconds after each error grows too fast.
Simply waiting N seconds seems far more reasonable.
2020-09-06 22:38:25 +02:00
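
To make the difference concrete, the following sketch compares the two wait-time rules named in the entry above. The function names are hypothetical; only the formulas (2**N versus N seconds) come from the commit message.

```python
def wait_exponential(tries):
    """Old rule: wait 2**N seconds after the N-th error."""
    return 2 ** tries


def wait_linear(tries):
    """New rule: wait N seconds after the N-th error."""
    return tries


for n in range(1, 11):
    print(n, wait_exponential(n), wait_linear(n))
```

After ten consecutive errors the exponential rule asks for 1024 seconds, the linear one for 10.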
Mike Fährmann
ac3036ef56
add 'filesize-min' and 'filesize-max' options (closes #780) 2020-09-03 18:21:04 +02:00
Mike Fährmann
34929f673f
readd 'session' to base downloader class (fixes #768) 2020-05-20 20:04:46 +02:00
Mike Fährmann
ece73b5b2a
make 'path' and 'keywords' available in logging messages
Wrap all loggers used by job, extractor, downloader, and postprocessor
objects into a (custom) LoggerAdapter that provides access to the
underlying job, extractor, pathfmt, and kwdict objects and their
properties.

__init__() signatures for all downloader and postprocessor classes have
been changed to take the current Job object as their first argument,
instead of the current extractor or pathfmt.

(#574, #575)
2020-05-18 19:04:51 +02:00
Mike Fährmann
f8661c6578
[downloader:ytdl] fix file extensions when merging into mkv 2020-05-13 22:35:33 +02:00
Mike Fährmann
dba87ca99e
[downloader:ytdl] change 'forward-cookies' default to 'false'
There are currently no situations where forwarding gallery-dl's cookies
to youtube-dl is necessary, and it only causes problems when forcing
youtube-dl for Twitter video downloads while logged in.
2020-05-12 20:17:08 +02:00
Mike Fährmann
19a7afdd9b
[downloader:http] add MIME types for .psd files (closes #714) 2020-04-29 23:01:42 +02:00
Mike Fährmann
38bc6430d3
[downloader:http] don't overwrite existing '_mtime' fields 2020-04-10 23:08:03 +02:00
Mike Fährmann
115fd2c6f2
"fix" incomplete MIME types (#632)
e-/exhentai's original image downloads currently send
incomplete/invalid Content-Type headers, "jpg" instead
of "image/jpg" etc, since the last update.
(https://forums.e-hentai.org/index.php?showtopic=236113)

This change prepends any Content-Type value missing a
media type specification with "image/", transforming it
into a valid MIME type.

(A global solution to a local problem, but it shouldn't
 cause any issues anywhere else)
2020-03-03 21:21:57 +01:00
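
The transformation described in the entry above could look roughly like this. The function name `fix_mime_type` is hypothetical, and stripping parameters after ';' is an assumption added for the sketch.

```python
def fix_mime_type(content_type):
    """Return a MIME type, prepending 'image/' if the media type is missing."""
    # Drop optional parameters such as '; charset=...' (assumption).
    mtype = content_type.partition(";")[0].strip()
    if "/" not in mtype:
        mtype = "image/" + mtype
    return mtype


print(fix_mime_type("jpg"))
print(fix_mime_type("image/png"))
```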
Mike Fährmann
adcd7cb24a
[downloader:http] add another MIME type for '.rar' files (#628) 2020-03-01 20:42:13 +01:00
Mike Fährmann
380b693fad
[downloader:http] add more MIME types for '.bmp' files (#621) 2020-02-23 16:51:04 +01:00
Mike Fährmann
760b9b4db4
add remove_file() and remove_directory() helpers
these functions call os.unlink() or os.rmdir()
while catching and suppressing potential OSErrors
2020-01-18 00:21:26 +01:00
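
A sketch of helpers matching the description in the entry above: each one calls os.unlink() or os.rmdir() and suppresses any OSError. The bodies are a plausible reading of the commit message, not a copy of the project's code.

```python
import os
import tempfile


def remove_file(path):
    """Delete a file, ignoring errors such as a missing path."""
    try:
        os.unlink(path)
    except OSError:
        pass


def remove_directory(path):
    """Delete an empty directory, ignoring errors."""
    try:
        os.rmdir(path)
    except OSError:
        pass


directory = tempfile.mkdtemp()
path = os.path.join(directory, "file.part")
open(path, "w").close()

remove_file(path)
remove_file(path)  # already gone: the error is swallowed
remove_directory(directory)
print(os.path.exists(directory))
```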
Mike Fährmann
200aea308a
[downloader:common] enable 'job'/'extractor' for logging messages
(#574)
2020-01-12 21:41:16 +01:00
Mike Fährmann
c4702ec9b6
simplify some logging calls 2019-12-10 21:30:08 +01:00
Mike Fährmann
c59b98c81b
[downloader:http] improve rate limit handling
- Move the download "logic" with rate limit checks into its own
  method that only gets used if a rate limit should be enforced
- Fix an issue where suspending gallery-dl during a download would
  basically ignore the rate limit for the remaining download when
  resuming its execution.
2019-12-09 20:34:22 +01:00
Mike Fährmann
bbbafc1c24
[downloader:http] catch both possible SSLException instances
With pyOpenSSL installed, but disabled, the SSLError exception
would be set to the one from pyOpenSSL, which could never get raised.

This commit solves this problem by catching both the native SSLError
exception and the one from pyOpenSSL (if available).
2019-12-09 20:34:10 +01:00
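
One way to catch both exception classes is to build a tuple once at import time. This is a sketch under the assumption that pyOpenSSL may or may not be installed; the name `SSL_ERRORS` is hypothetical.

```python
import ssl

try:
    # pyOpenSSL's exception class, only available if the package is installed
    from OpenSSL.SSL import Error as OpenSSLError
    SSL_ERRORS = (ssl.SSLError, OpenSSLError)
except ImportError:
    SSL_ERRORS = (ssl.SSLError,)


def is_ssl_error(exc):
    """Return True for the native SSLError and, if present, pyOpenSSL's."""
    return isinstance(exc, SSL_ERRORS)


print(is_ssl_error(ssl.SSLError("handshake failed")))
```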
Mike Fährmann
f5604492c3
update interface of config functions 2019-11-24 00:42:28 +01:00
Mike Fährmann
bbbeff4c41
[downloader.http] implement file-specific HTTP headers 2019-11-19 23:50:54 +01:00
Mike Fährmann
a5be08a830
[downloader:ytdl] forward proxy settings 2019-11-05 16:16:26 +01:00
Mike Fährmann
d44f790e81
adjust output for HTTP status related errors 2019-10-27 23:55:02 +01:00
Mike Fährmann
083e14ad9a
[downloader:ytdl] add data from '_ytdl_extra' to info_dicts 2019-10-25 13:17:13 +02:00
Mike Fährmann
1032cfa34b
[downloader:http] extend mimetype map with archive formats 2019-10-10 18:30:23 +02:00
Mike Fährmann
8eaae58045
[downloader:http] change log message level to 'debug' 2019-08-29 23:05:47 +02:00
Mike Fährmann
7c09545f70
[downloader:ytdl] add 'outtmpl' option (#395) 2019-08-24 22:47:59 +02:00
Mike Fährmann
ebabc5caf1
[downloader:http] treat 416 without downloaded data as error
Downloading https://pbs.twimg.com/media/EB2cGUYX4AI2Vuu.jpg:orig (NSFW)
sometimes returns a 416 status code, even though no 'Range' header was
sent and no data was downloaded prior.
This code usually means a file has already been downloaded completely
and the download method indicates success, but in this case it causes
an exception down the pipeline since no file was created.
2019-08-20 00:15:17 +02:00
Mike Fährmann
0bb873757a
update PathFormat class
- change 'has_extension' from a simple flag/bool to a field that
  contains the original filename extension
- rename 'keywords' to 'kwdict' and some other stuff as well
- inline 'adjust_path()'
- put enumeration index before filename extension (#306)
2019-08-12 21:40:37 +02:00
Mike Fährmann
b7fb93e2b2
[downloader:http] add 'adjust-extensions' option 2019-08-08 16:54:20 +02:00
Mike Fährmann
547ea71463
[downloader.ytdl] add 'forward-cookies' option (#352)
The "long" name is necessary because just calling it 'cookies' would
clash with how the lookup for '--cookies' is implemented.
2019-07-24 21:19:11 +02:00
Mike Fährmann
c41ff9441e
improve find() for downloaders and postprocessors 2019-07-15 16:33:03 +02:00
Mike Fährmann
16c582aaf9
implement 'mtime' post-processor (#332)
This can set a file's modification time according to a UNIX timestamp
or a datetime object from its metadata.
2019-07-14 22:39:17 +02:00
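
A minimal sketch of setting a modification time from either kind of value mentioned in the entry above. The function name `set_mtime` is hypothetical. Note that `datetime.timestamp()` interprets naive datetime objects as local time, so the example uses a timezone-aware one; how the project treats naive values is not stated here.

```python
import datetime
import os
import tempfile


def set_mtime(path, value):
    """Set a file's mtime from a UNIX timestamp or a datetime object."""
    if isinstance(value, datetime.datetime):
        value = value.timestamp()
    os.utime(path, (value, value))  # (access time, modification time)


fd, path = tempfile.mkstemp()
os.close(fd)

set_mtime(path, datetime.datetime(2019, 7, 14, tzinfo=datetime.timezone.utc))
print(os.path.getmtime(path))
os.unlink(path)
```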
Mike Fährmann
8966930c5c
[downloader:http] try to import SSL exception class from OpenSSL
(#324)
2019-07-01 20:10:26 +02:00
Mike Fährmann
69205df68d
allow '-1' for infinite retries (#300) 2019-06-30 23:10:47 +02:00
Mike Fährmann
f7b5c4c3e7
use values of 'retries' options correctly
The RE-tries option now specifies exactly that: the maximum number of times a
failed HTTP request is re-tried. For example a value of 2 will now
correctly stop after 3 attempts: the initial one + 2 re-tries.

The maximum wait-time now also caps at 30min and increases exponentially
for both extractor.request() and downloader.http.download().
2019-06-30 23:10:18 +02:00
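
The counting rule from the entry above (initial attempt plus N re-tries, with '-1' meaning no limit as in #300) can be sketched as follows. The function name and the use of ConnectionError are assumptions, and the wait between attempts is left out to keep the example short.

```python
def download_with_retries(request, retries):
    """Call request() until it succeeds or the retry budget is used up."""
    tries = 0
    while True:
        try:
            return request()
        except ConnectionError:
            tries += 1
            # A negative 'retries' value never satisfies this check,
            # so the loop keeps going indefinitely.
            if 0 <= retries < tries:
                raise


calls = []


def always_fails():
    calls.append(1)
    raise ConnectionError("simulated failure")


try:
    download_with_retries(always_fails, retries=2)
except ConnectionError:
    pass
print(len(calls))
```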
Mike Fährmann
f1b0c2bf5c
[downloader:ytdl] forward cookies to youtube-dl
to be able to download private videos from Twitter, Instagram, etc.
2019-06-26 19:32:07 +02:00
Mike Fährmann
db3f52881a
add 'mtime' option 2019-06-20 17:19:44 +02:00
Mike Fährmann
ee4d7c3d89
update downloader.find() and related code
Instead of replacing 'https' with 'http' for every URL in
'get_downloader()', this now only happens once during downloader
initialization. Also unit tests.
2019-06-20 16:59:44 +02:00
Mike Fährmann
f4ba98771d
use Last-Modified header to set file modification time
(#236, #277)
2019-06-19 23:16:32 +02:00
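
A sketch of turning a Last-Modified header value into a UNIX timestamp with the standard library. The helper name is hypothetical and the header value is a made-up example; the resulting number could then be passed to os.utime().

```python
from email.utils import parsedate_to_datetime


def last_modified_timestamp(header_value):
    """Convert an HTTP date such as a Last-Modified value to a timestamp."""
    return parsedate_to_datetime(header_value).timestamp()


print(last_modified_timestamp("Wed, 21 Oct 2015 07:28:00 GMT"))
```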
Mike Fährmann
179d112083
[downloader] overhaul http and text modules
Get rid of the modular structure and simplify/specialize those modules.
2019-06-19 22:56:11 +02:00
Mike Fährmann
6da3e21237
[downloader:ytdl] provide 'filename' metadata (closes #291) 2019-05-31 14:56:45 +02:00
Mike Fährmann
7973419b54
restrict downloader and postprocessor module imports 2019-04-16 18:09:30 +02:00
Mike Fährmann
114b8eecc5
[downloader:ytdl] utilize '_ytdl_index' metadata fields 2019-03-24 11:27:20 +01:00
Mike Fährmann
c14d44e1bc
[downloader:common] retry downloads on SSL errors (#130) 2018-12-14 16:33:04 +01:00
Mike Fährmann
b17a5d6f3b
give downloader classes proper names 2018-11-16 14:40:05 +01:00
Mike Fährmann
655549df7c
[downloader:ytdl] add several options
The "default" downloader options (rate, retries, timeout, verify) are
mapped to corresponding youtube-dl options.

downloader.ytdl.logging tells the downloader to pass youtube-dl's output
to a Logger object.

downloader.ytdl.raw-options allows passing arbitrary options to the
YoutubeDL constructor.
2018-10-20 18:26:49 +02:00
Mike Fährmann
4a348990f4
adjust value resolution for retries/timeout/verify options
This change introduces 'extractor.*.retries/timeout/verify' options
as a general way to set these values for all HTTP requests.

'downloader.http.retries/timeout/verify' is a way to override these
options for file downloads only and will fall back to 'extractor.*.…'
values if they haven't been explicitly set.

Also: downloader classes now take an extractor object as first argument
instead of a requests.session.
2018-10-07 21:13:39 +02:00
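
The fallback order described in the entry above can be illustrated with a plain nested dict. This is a simplified sketch: the helper name `resolve_option` is hypothetical, and the per-extractor wildcard in 'extractor.*' is flattened into a single "extractor" section for brevity.

```python
def resolve_option(config, key, default=None):
    """Look up 'downloader.http.<key>', falling back to 'extractor.<key>'."""
    value = config.get("downloader", {}).get("http", {}).get(key)
    if value is None:
        value = config.get("extractor", {}).get(key, default)
    return value


config = {
    "extractor": {"retries": 4, "timeout": 30.0},
    "downloader": {"http": {"retries": 10}},
}

print(resolve_option(config, "retries"))  # set explicitly for downloads
print(resolve_option(config, "timeout"))  # inherited from 'extractor'
```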
Mike Fährmann
188876d814
implement youtube-dl downloader module
URLs starting with 'ytdl:' will now be handled by youtube-dl.
There is probably a lot to fix and improve, but the basic use case
works.

TODO:
- format selection and ytdl options in general
- better filename/path handling
- ytdl support for "unsupported URLs"
- ...
2018-10-05 18:05:11 +02:00
Mike Fährmann
e9ae6fd080
improve downloader/postprocessor module loading
- handle arguments of any type without propagating an exception
- prevent potential security risk through relative imports
2018-09-05 16:39:40 +02:00
Mike Fährmann
973cf98e88
fix download skip for files without extension 2018-06-27 17:16:07 +02:00
Mike Fährmann
821535b458
adjust PathFormat class 2018-06-06 20:17:17 +02:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
1d54a8e07d
fix logging output during downloads
from:
filename.ext[download][warning] ...

to:
filename.ext
[download][warning] ...
2018-03-01 18:43:43 +01:00
Mike Fährmann
915807dd77
log HTTP errors as warnings 2018-01-29 21:55:46 +01:00
Mike Fährmann
f94e3706a8
use logging module for error messages during downloads 2018-01-26 18:11:13 +01:00
Mike Fährmann
b837420291
fix minor urllist issues 2018-01-19 22:54:15 +01:00
Mike Fährmann
6174a5c4ef
[download] adjust filename extension on filetype mismatch
(closes #63)
2018-01-17 18:37:06 +01:00
Mike Fährmann
ebe9b0a04c
another attempt at downloader retry behavior
This commit changes the general behavior from
'Retry on every exception and abort on DownloadError' to
'Only retry on DownloadRetry exceptions and abort on every other one'

The previous version would have retried on several states which
would have no chance of ever succeeding (invalid URLs, etc.)
2017-12-07 15:31:14 +01:00
Mike Fährmann
8f518e03f8
add options to set maximum download rate
- -r/--limit-rate as cmdline option
- downloader.http.rate as config option

This implementation very roughly uses the idea of the token bucket
algorithm [1] and mostly uses Wget's approach [2] as inspiration.

[1] https://en.wikipedia.org/wiki/Token_bucket
[2] http://git.savannah.gnu.org/cgit/wget.git/tree/src/retr.c?h=v1.19.2&id=ba6b44f6745b14dce414761a8e4b35d31b176bba#n111
2017-12-02 01:47:26 +01:00
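
A rough sketch of the general idea behind such a rate limit: after each chunk, compare the time the transfer should have taken at the target rate with the time actually elapsed, and sleep for the difference. This illustrates the approach in broad strokes only; the function name and parameters are hypothetical, and the clock and sleep functions are injectable so the logic can be exercised without waiting.

```python
import time


def throttled_copy(chunks, write, rate, clock=time.monotonic, sleep=time.sleep):
    """Write chunks while keeping the average throughput at `rate` bytes/s."""
    start = clock()
    sent = 0
    for chunk in chunks:
        write(chunk)
        sent += len(chunk)
        expected = sent / rate       # seconds the transfer should have taken
        elapsed = clock() - start    # seconds it actually took
        if expected > elapsed:
            sleep(expected - elapsed)


received = []
throttled_copy([b"x" * 1000] * 3, received.append, rate=100_000)
print(sum(len(chunk) for chunk in received))
```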
Mike Fährmann
3dc1169736
use own mapping before relying on the 'mimetypes' module 2017-12-01 13:50:31 +01:00
Mike Fährmann
79bcaa8726
improve downloader retry behavior
- only retry download on 5xx and 429 status codes
- immediately fail on 4xx status codes
2017-11-10 21:46:18 +01:00
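
The status code rule from the entry above fits in one predicate. The function name `should_retry` is hypothetical.

```python
def should_retry(status):
    """Retry on 429 and 5xx; every other 4xx code fails immediately."""
    return status == 429 or 500 <= status < 600


for code in (404, 429, 500, 503):
    print(code, should_retry(code))
```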
Mike Fährmann
42e948584d
fix downloader error handling
RequestException being a subclass of OSError caused all exceptions
during file downloads to be ignored/re-raised.
2017-11-07 15:23:07 +01:00
Mike Fährmann
707b15b586
create missing directories for 'part-directory'
also some code improvements regarding downloader config values
2017-10-27 12:22:45 +02:00
Mike Fährmann
caf26412dd
add option to set alternate location of .part files (#29)
Note: The path set for 'downloader.*.part-directory' needs to point to an
already existing directory.
2017-10-26 00:16:48 +02:00
Mike Fährmann
9a41002b77
fix partial downloads for 'text:' URLs
Using a filesize in bytes as offset into a Python string is not
a good idea if said file contains non-ASCII characters.
2017-10-25 15:04:45 +02:00
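
The problem named in the entry above is easy to reproduce: a file size counts bytes, while slicing a Python string counts characters. The sketch below uses a made-up sample string and a hypothetical helper name, and assumes UTF-8 encoding.

```python
def remaining_bytes(text, offset):
    """Return the part of `text` not yet written, given a byte offset."""
    return text.encode("utf-8")[offset:]


text = "héllo wörld"
offset = 3  # bytes already on disk: 'h' (1 byte) + 'é' (2 bytes)

print(text[offset:])                                   # skips one character too many
print(remaining_bytes(text, offset).decode("utf-8"))   # resumes at the right place
```

An offset that falls inside a multi-byte character would still make the final decode() fail, so the bytes are best written to the file as they are.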
Mike Fährmann
963670d73b
add options to control usage of .part files (#29)
- '--no-part' command line option to disable them
- 'downloader.http.part' and 'downloader.text.part' config options

Disabling .part files restores the behaviour of the old downloader
implementation.
2017-10-24 23:33:44 +02:00
Mike Fährmann
b0353aa02d
rewrite download modules (#29)
- use '.part' files during file-download
- implement continuation of incomplete downloads
- check if file size matches the one reported by server
2017-10-24 12:53:03 +02:00
Mike Fährmann
2e982f56af
use 'Content-Length' to determine incomplete downloads (#29) 2017-10-20 18:56:18 +02:00
Mike Fährmann
b8862ff15e
add 'downloader.http.verify' option
(also: change the default 'timeout' from None to 30)
2017-08-31 15:21:08 +02:00
Mike Fährmann
d70c66c516
fix "text:" downloader 2017-08-16 12:11:47 +02:00
Mike Fährmann
58e95a7487
share extractor and downloader sessions
There was never any "good" reason for the strict separation
between extractors and downloaders. This change allows for
reduced resource usage (probably unnoticeable) and fewer lines
of code at the "cost" of tighter coupling.
2017-06-30 19:38:14 +02:00
Mike Fährmann
fac6c02224
[downloader] fix extension from content-type 2017-06-19 09:24:00 +02:00
Mike Fährmann
107d29ad8a
improve handling of text:... URLs
- don't require // after the colon
- open output files in text mode
2017-05-12 14:10:25 +02:00
Mike Fährmann
48a5b11204
fix error if no file extension is found 2017-04-26 12:31:42 +02:00
Mike Fährmann
e3212dd98f
fix some smaller stuff
- remove support for old windows config paths
- catch exception if cache-database can't be opened
- fix username/password settings for unit tests
- rename variable 'max_tries' to 'retries'
2017-03-27 14:30:32 +02:00
Mike Fährmann
e2b5cd9918
change config-path for 'retries' and 'timeout' 2017-03-26 18:24:46 +02:00
Mike Fährmann
0b5076815d
always delete incompletely downloaded files 2017-03-21 15:53:43 +01:00
Mike Fährmann
22910f9562
improve error handling of http file downloads
(#10)
2017-03-16 04:17:35 +01:00
Mike Fährmann
4f123b8513
code adjustments according to pep8 2017-01-30 19:40:15 +01:00
Mike Fährmann
3c1daef839
don't delete downloaded files in certain edge cases 2016-11-27 23:43:25 +01:00
Mike Fährmann
2b2bdce366
don't raise an exception if a download fails (#5) 2016-11-23 13:07:44 +01:00
Mike Fährmann
dd8236e733
enable non-standard MIME types 2016-09-30 16:41:49 +02:00
Mike Fährmann
29692c5784
get extension from Content-Type header if not provided 2016-09-30 12:32:48 +02:00
Mike Fährmann
ecc6542fc8
change required parameter type to file-like objects 2015-12-21 22:46:49 +01:00
Mike Fährmann
a8c0b4531d
fix issue with Ctrl+c on windows 2015-12-02 01:01:33 +01:00
Mike Fährmann
4b377ccc09
use output-module during downloads 2015-12-01 21:22:58 +01:00
Mike Fährmann
352950eebe
new method to import downloaders 2015-11-12 02:29:59 +01:00
Mike Fährmann
28fa7c53b4
docstrings and other small fixes for downloaders 2015-04-10 21:45:41 +02:00