mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-22 18:53:21 +01:00
69 Commits

Author SHA1 Message Date
Mike Fährmann
590c0b3ad5
re-implement and improve filename formatter
A format string now gets parsed only once instead of re-parsing it each
time it is applied to a set of data.

The initial parsing makes directory path creation about 2x slower than
before, since each format string there is used only once, but building
a filename, the more common operation, is at least 2x faster. The
"directory slowness" cancels out at about 5 filenames, and everything
above that is significantly faster.
2018-08-25 10:45:14 +02:00
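
The parse-once idea from 590c0b3ad5 can be sketched as follows. This is not the code from that commit: the class name `CompiledFormat` and its `apply()` method are hypothetical, and the sketch only handles plain `{name}` fields without conversions or format specs.

```python
import string


class CompiledFormat:
    """Hypothetical helper: parse a format string once, apply it many times."""

    def __init__(self, format_string):
        # string.Formatter.parse() yields
        # (literal_text, field_name, format_spec, conversion) tuples.
        self.parts = [
            (literal, field)
            for literal, field, _spec, _conv
            in string.Formatter().parse(format_string)
        ]

    def apply(self, data):
        # No re-parsing here: only walk the precomputed parts.
        out = []
        for literal, field in self.parts:
            out.append(literal)
            if field is not None:
                out.append(str(data[field]))
        return "".join(out)


fmt = CompiledFormat("{id}.{extension}")
print(fmt.apply({"id": 123, "extension": "jpg"}))  # 123.jpg
```

The parsing cost is paid in `__init__()`, which mirrors the trade-off in the commit message: a format string used once gets slower, one used repeatedly gets faster.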
Mike Fährmann
c83fc62abc
prioritize archive over disk access (#87) 2018-07-30 17:48:23 +02:00
Mike Fährmann
e0dd8dff5f
implement L<maxlen>/<replacement>/ format option
The L option allows for the contents of a format field to be replaced
with <replacement> if its length is greater than <maxlen>.

Example:
{f:L5/too long/} -> "foo"      (if "f" is "foo")
                 -> "too long" (if "f" is "foobar")

(#92) (#94)
2018-07-29 13:52:07 +02:00
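
The behaviour of the L option from e0dd8dff5f can be approximated with a small standalone function. The name `apply_maxlen` and the way the spec is split are assumptions made for illustration, not the project's implementation.

```python
def apply_maxlen(value, spec):
    """Hypothetical helper for a spec such as 'L5/too long/'."""
    # Drop the leading 'L', then split into <maxlen>, <replacement>, rest.
    maxlen, replacement, _rest = spec[1:].split("/", 2)
    return replacement if len(value) > int(maxlen) else value


print(apply_maxlen("foo", "L5/too long/"))     # foo
print(apply_maxlen("foobar", "L5/too long/"))  # too long
```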
Mike Fährmann
8fe9056b16
implement string slicing for format strings
It is now possible to slice string (or list) values of format string
replacement fields with the same syntax as in regular Python code.

"{digits}"       -> "0123456789"
"{digits[2:-2]}" -> "234567"
"{digits[:5]}"   -> "01234"

The optional third parameter (step) has been left out to simplify things.
2018-07-14 09:53:15 +02:00
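
One possible way to support the two-parameter slice syntax from 8fe9056b16 is shown below. The regular expression and the function name `get_field` are hypothetical; the commit itself may parse field names quite differently.

```python
import re

# Matches "name[start:stop]" where start and stop are optional integers.
_SLICE = re.compile(r"^(\w+)\[(-?\d+)?:(-?\d+)?\]$")


def get_field(field, data):
    """Hypothetical lookup supporting 'name' and 'name[start:stop]'."""
    match = _SLICE.match(field)
    if not match:
        return data[field]
    name, start, stop = match.groups()
    # Missing bounds become None, exactly as in a regular Python slice.
    return data[name][
        int(start) if start else None:
        int(stop) if stop else None
    ]


data = {"digits": "0123456789"}
print(get_field("digits", data))        # 0123456789
print(get_field("digits[2:-2]", data))  # 234567
print(get_field("digits[:5]", data))    # 01234
```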
Mike Fährmann
a9e276bc37
reset delete-flag
Since 'PathFormat' objects are being reused, setting `delete`
to True once caused all files downloaded afterwards to be deleted as well.
2018-06-20 18:12:59 +02:00
Mike Fährmann
baccf8a958
improve postprocessor handling
- add pathfmt argument for __init__()
- add finalization step
- add option to keep or delete zipped files
2018-06-08 17:39:02 +02:00
Mike Fährmann
7646bdbcfd
improve postprocessor initialization code 2018-06-07 22:29:54 +02:00
Mike Fährmann
821535b458
adjust PathFormat class 2018-06-06 20:17:17 +02:00
Mike Fährmann
6a31ada9e3
re-implement OAuth1.0 code
OAuth support for SmugMug needs some additional features
(auth-rebuild on redirect, query parameters in URL, ...)
and fixing this in the old code wouldn't work all that well.
2018-05-10 18:47:05 +02:00
Mike Fährmann
69a5e6ddb3
Merge branch 'master' into 1.4-dev 2018-05-04 10:19:02 +02:00
Mike Fährmann
16e014baaa
[smugmug] added image and album extractor
just some initial code that still requires a lot of work ...

TODO:
- folders
- old-style albums (which are nearly all of them ...)
- images from users
- OAuth

It could also happen that the API credentials used will become invalid
whenever my 14 day trial period ends (7 days remaining), but that
would just require users to supply their own.
2018-04-29 21:27:25 +02:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
51ea699083
add 'abort()' as function to filter expressions
calling 'abort()' in a filter aborts the current extractor run
in a cleaner way than using something like 1/0, which
causes an error message to be printed
2018-04-12 17:07:12 +02:00
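
A minimal sketch of how an 'abort()' function could be exposed to filter expressions, as described in 51ea699083. The exception class `StopExtraction` and the driver function `run_filter` are hypothetical names chosen for this example.

```python
class StopExtraction(Exception):
    """Hypothetical signal used to end a run without an error message."""


def abort():
    raise StopExtraction()


def run_filter(items, expression):
    # Compile once, then evaluate against each item's metadata.
    code = compile(expression, "<filter>", "eval")
    kept = []
    try:
        for item in items:
            if eval(code, {"abort": abort}, item):
                kept.append(item)
    except StopExtraction:
        pass  # clean stop: nothing is printed, nothing is re-raised
    return kept


items = [{"num": 1}, {"num": 2}, {"num": 3}, {"num": 4}]
print(run_filter(items, "num < 3 or abort()"))
```

With something like `1/0` in place of `abort()`, the same loop would end with a `ZeroDivisionError` that has to be reported or handled separately.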
Mike Fährmann
3f2dd6b6f8
avoid double path-separators
(#74)
2018-03-22 10:24:59 +01:00
Mike Fährmann
b69cc94f0e
[util] implement bencode() 2018-03-14 13:17:34 +01:00
Mike Fährmann
749fbbfa6c
[mangadex] add chapter- and manga-extractor 2018-03-05 18:37:21 +01:00
Mike Fährmann
2fad0b1f1b
add 'U' conversion for format strings to unquote their content
(#74)
2018-02-25 21:57:59 +01:00
Mike Fährmann
8cdce21dcb
make archive keys user-configurable 2018-02-25 21:57:01 +01:00
Mike Fährmann
e1e0668ca8
add option to set default replacement field value
Missing or undefined keywords will now be replaced with the value
set for 'keywords-default'. The default is Python's 'None', which
is equivalent to setting this option to JSON's 'null'.
2018-02-23 00:59:20 +01:00
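
The effect of a default replacement value, as described in e1e0668ca8, can be imitated with a `dict` subclass and `str.format_map()`. The class name `DefaultingDict` is hypothetical, and the project's own formatter is likely implemented differently.

```python
class DefaultingDict(dict):
    """Hypothetical mapping that returns a default for missing keys."""

    def __init__(self, data, default=None):
        super().__init__(data)
        self.default = default

    def __missing__(self, key):
        return self.default


kw = DefaultingDict({"title": "example"})
print("{title}_{page}".format_map(kw))  # example_None

kw = DefaultingDict({"title": "example"}, default="unknown")
print("{title}_{page}".format_map(kw))  # example_unknown
```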
Mike Fährmann
ac3da8115e
[util] don't add text: URLs to list of downloaded URLs 2018-02-20 18:14:27 +01:00
Mike Fährmann
b50bdbf3d7
change config specifiers in input file format
Instead of a dictionary/object, input file options are now specified
by a 'key=value' pair starting with '-' for options only applying to
the next URL or '-G' for Global options applying to all following URLs.

See the docstring of parse_inputfile() for details.

Example option specifiers:

- filename = "{id}.{extension}"
- extractor.pixiv.user.directory = ["Pixiv Users", "{user[id]}"]
-spaces="are_optional"
-G keywords = {"global": "option"}
2018-02-16 03:10:41 +01:00
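
To make the option specifiers from b50bdbf3d7 more concrete, here is a rough sketch of a parser for them. It is not parse_inputfile(): the function name `parse_lines`, the use of JSON for values, and the returned data layout are all assumptions, and comments and error handling are left out.

```python
import json


def parse_lines(lines):
    """Hypothetical parser: returns a list of (url, options) tuples."""
    global_opts = {}   # set by '-G key = value', apply to all following URLs
    pending = {}       # set by '- key = value', apply to the next URL only
    result = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("-G"):
            key, _, value = line[2:].partition("=")
            global_opts[key.strip()] = json.loads(value)
        elif line.startswith("-"):
            key, _, value = line[1:].partition("=")
            pending[key.strip()] = json.loads(value)
        else:
            options = dict(global_opts)
            options.update(pending)
            result.append((line, options))
            pending = {}
    return result


lines = [
    '-G keywords = {"global": "option"}',
    '- filename = "{id}.{extension}"',
    "https://example.org/a",
    "https://example.org/b",
]
for url, options in parse_lines(lines):
    print(url, options)
```

Note that this sketch cannot tell '-G' apart from a one-shot option whose key starts with "G"; a real parser would need a rule for that case.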
Mike Fährmann
f970a8f13c
fix adding keys to download archive when using skip=false 2018-02-13 23:45:30 +01:00
Mike Fährmann
179bcdd349
adjust archive-ids 2018-02-13 04:50:45 +01:00
Mike Fährmann
3cec533c28
Merge branch 'archive' 2018-02-12 18:07:58 +01:00
Mike Fährmann
b73b8b4f50
add OAuth unittests 2018-02-12 17:07:07 +01:00
Mike Fährmann
4d2fadfb6f
restore skip actions with download archive 2018-02-12 16:56:45 +01:00
Mike Fährmann
65773263fc
[util] implement OAuthSession.urlencode() (closes #75)
- Python's own urllib.parse.urlencode() has no quote_via argument in
  Python 3.3 and 3.4, which is necessary to follow OAuth 1.0 quoting
  rules.
2018-02-10 21:56:13 +01:00
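
A sketch of what a replacement for urlencode() without quote_via might look like, in the spirit of 65773263fc. The standalone function name `oauth_urlencode` is hypothetical; the commit implements this as a method of OAuthSession, whose actual body is not shown here.

```python
import urllib.parse


def oauth_urlencode(params):
    """Hypothetical urlencode() variant that percent-encodes via quote()."""
    quote = urllib.parse.quote
    # safe="" makes quote() encode "/" as well; spaces become "%20",
    # not "+", which is what OAuth 1.0 expects.
    return "&".join(
        quote(str(key), safe="") + "=" + quote(str(value), safe="")
        for key, value in params.items()
    )


print(oauth_urlencode({"q": "a b/c"}))  # q=a%20b%2Fc
```

For comparison, `urllib.parse.urlencode({"q": "a b/c"})` produces `q=a+b%2Fc`, because it quotes with `quote_plus()` by default.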
Mike Fährmann
057668e17e
extend input-file format with per-URL config and comments
- see docstring of parse_inputfile() for details
- TODO: unittests, recursion (currently setting for example
  {"extractor": {"key": "value"}} will override the whole "extractor"
  branch instead of merging {"key": "value"} into the already existing
  dictionary)
2018-02-07 21:47:27 +01:00
Mike Fährmann
347baf7ac5
improve util.parse_range() performance
It is never going to actually matter, but using partition() instead
of split() is twice as fast.
2018-02-05 22:28:11 +01:00
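
The partition() approach mentioned in 347baf7ac5 can be illustrated on a single "first-last" piece of a range. The function `parse_piece` is a hypothetical simplification; it does not reproduce the full behaviour of util.parse_range().

```python
def parse_piece(piece):
    """Hypothetical helper: '2-5' -> (2, 5), '7' -> (7, 7)."""
    # partition() always returns exactly three strings, so there is
    # no list to build or length to check, unlike with split().
    first, sep, last = piece.partition("-")
    if not sep:
        return int(first), int(first)
    return int(first), int(last)


print(parse_piece("2-5"))  # (2, 5)
print(parse_piece("7"))    # (7, 7)
```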
Mike Fährmann
aa38eab2be
allow not-defined fields in format strings
... and replace them with "None", for now
2018-02-03 22:28:41 +01:00
Mike Fährmann
84a52a9256
add DownloadArchive class 2018-01-30 15:23:23 +01:00
Mike Fährmann
db7f04dd97
emit log messages on download failure
and when retrying with fallback URLs
2018-01-28 18:44:10 +01:00
Mike Fährmann
6174a5c4ef
[download] adjust filename extension on filetype mismatch
(closes #63)
2018-01-17 18:37:06 +01:00
Mike Fährmann
f10ffc0839
update extractor blacklist to also allow classes 2018-01-14 18:47:22 +01:00
Mike Fährmann
29d75fc3fa
[tumblr] add support for OAuth authentication (#65) 2018-01-11 14:11:37 +01:00
Mike Fährmann
d241a0fb60
[util] replace '/' with '\' in base-directory paths
... on Windows to have consistent path separators.
2017-12-21 21:56:24 +01:00
Mike Fährmann
93482a1f88
implement 'util.advance()' 2017-12-03 01:38:24 +01:00
Mike Fährmann
a718c6c6cd
implement 'util.parse_bytes()' 2017-12-02 01:24:49 +01:00
Mike Fährmann
caf26412dd
add option to set alternate location of .part files (#29)
Note: The path set for 'downloader.*.part-directory' needs to point to an
already existing directory.
2017-10-26 00:16:48 +02:00
Mike Fährmann
ea8ca4cfa4
add 'util.expand_path()' 2017-10-26 00:04:28 +02:00
Mike Fährmann
963670d73b
add options to control usage of .part files (#29)
- '--no-part' command line option to disable them
- 'downloader.http.part' and 'downloader.text.part' config options

Disabling .part files restores the behaviour of the old downloader
implementation.
2017-10-24 23:33:44 +02:00
Mike Fährmann
b0353aa02d
rewrite download modules (#29)
- use '.part' files during file-download
- implement continuation of incomplete downloads
- check if file size matches the one reported by server
2017-10-24 12:53:03 +02:00
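
The '.part' file idea from b0353aa02d, reduced to its core: write to a temporary name and rename only once all data has arrived. The function `save_with_part` is hypothetical, and continuation of incomplete downloads and the file size check are not covered by this sketch.

```python
import os
import tempfile


def save_with_part(path, chunks):
    """Hypothetical helper: write chunks to 'path.part', then rename."""
    part_path = path + ".part"
    with open(part_path, "wb") as fp:
        for chunk in chunks:
            fp.write(chunk)
    # Only a completely written file ever appears under its final name.
    os.replace(part_path, path)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "image.jpg")
    save_with_part(target, [b"abc", b"def"])
    print(os.listdir(tmp))  # ['image.jpg']
```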
Mike Fährmann
832b8b76ac
[util] extend global namespace for filter expressions 2017-10-09 22:12:58 +02:00
Mike Fährmann
8e6a767109
[util] restructure formatter for better exception propagation 2017-10-06 17:10:35 +02:00
Mike Fährmann
8df023e144
[util:filter] re-enable builtins
Restrictions on access to Python's builtin functions (exec,
print, __import__, ...) can easily be circumvented and are
therefore completely pointless.

This also adds 'safe_int()' and the 'datetime' module to the global
namespace used when evaluating filter expressions.
2017-10-04 16:00:12 +02:00
Mike Fährmann
b319f4bab3
smaller code and text changes 2017-10-01 18:23:40 +02:00
Mike Fährmann
c1f0afe4c6
add custom string formatter class 2017-09-28 17:12:39 +02:00
Mike Fährmann
9fc1d0c901
implement and use 'util.safe_int()'
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
2017-09-24 15:59:25 +02:00
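
A function matching the description in 9fc1d0c901 could look like this. The default of `0` is an assumption; the commit message only states that a default value is accepted.

```python
def safe_int(value, default=0):
    """Like int(), but return 'default' instead of raising an exception."""
    try:
        return int(value)
    except (ValueError, TypeError):
        return default


print(safe_int("12"))       # 12
print(safe_int("abc", -1))  # -1
print(safe_int(None))       # 0
```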
Mike Fährmann
9b21d3f13c
add '--filter' command-line option
This allows for image filtering via Python expressions by the same
metadata that is also used to build filenames (--list-keywords).

The usually shunned eval() function is used to evaluate
filter-expressions, but it seemed quite appropriate in this case and
shouldn't introduce any new security issues, as any attacker that could do
> gallery-dl --filter "delete-everything()" ...
could as well do
> python -c "delete-everything()"
2017-09-08 17:52:00 +02:00
Mike Fährmann
268cfa3cfe
filter duplicate URLs (#36)
Duplicate URLs might occur if, for example, an artist adds another
image to his gallery while an extractor is running and images are being
downloaded on sites like pixiv/nijie/hentaifoundry.
The next image on the next page will have already been downloaded and
will cause a premature end if '--abort-on-skip' is being used.
2017-09-06 17:08:50 +02:00
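
Filtering duplicate URLs as described in 268cfa3cfe can be done with a set of already seen URLs. The generator name `unique` is hypothetical and the sketch ignores where in the program such a filter would be applied.

```python
def unique(urls):
    """Hypothetical filter: yield each URL only the first time it appears."""
    seen = set()
    for url in urls:
        if url not in seen:
            seen.add(url)
            yield url


page1 = ["https://example.org/3", "https://example.org/2"]
# A new image was added in the meantime, so page 2 starts with
# the last URL of page 1.
page2 = ["https://example.org/2", "https://example.org/1"]
print(list(unique(page1 + page2)))
```

Because the repeated URL is dropped before any skip logic runs, it no longer counts as an already downloaded file.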