Simplified universal serialization support in json.dump() can be achieved
by passing 'default=str', which was already the case in DataJob.run()
for -j/--dump-json, but not for the 'metadata' post-processor.
This commit introduces util.dump_json() that (more or less) unifies the
JSON output procedure of both --write-metadata and --dump-json.
(#251, #252)
This allows for stuff like "{extractor.url}" and "{extractor.category}"
in logging format strings.
Accessing 'extractor' and 'job' in any way will return "None" if those
fields aren't defined, i.e. in general logging messages.
In addition to 'abort' and 'exit', it is now possible to specify
'abort:N' and 'exit:N' (where N is any integer) as value for 'skip'
to abort/exit after consecutively skipping N downloads.
The functionality of --(chapter-)filter and --(chapter-)range are now
also exposed as the following config-file options:
- extractor.*.image-filter
- extractor.*.image-range
- extractor.*.chapter-filter
- extractor.*.chapter-range
TODO: update configuration.rst
A format string now gets parsed only once instead of re-parsing it each
time it is applied to a set of data.
The initial parsing causes directory path creation to be at about 2x
slower than before, since each format string there is used only once,
but building a filename, the more common operation, is at least 2x
faster. The "directory slowness" cancels at about 5 filenames and
everything above that is significantly faster.
The L option allows for the contents of a format field to be replaced
with <replacement> if its length is greater than <maxlen>.
Example:
{f:L5/too long/} -> "foo" (if "f" is "foo")
-> "too long" (if "f" is "foobar")
(#92) (#94)
It is now possible to slice string (or list) values of format string
replacement fields with the same syntax as in regular Python code.
"{digits}" -> "0123456789"
"{digits[2:-2]}" -> "234567"
"{digits[:5]}" -> "01234"
The optional third parameter (step) has been left out to simplify things.
OAuth support for SmugMug needs some additional features
(auth-rebuild on redirect, query parameters in URL, ...)
and fixing this in the old code wouldn't work all that well.
just some initial code that still requires a lot of work ...
TODO:
- folders
- old-style albums (which are nearly all of them ...)
- images from users
- OAuth
It could also happen that the API credentials used will become invalid
whenever my 14 day trial period ends (7 days remaining), but that
would just require users to supply their own.
calling 'abort()' in a filter aborts the current extractor run
in a cleaner way than using something like 1/0, which
causes an error message to be printed
Missing or undefined keywords will now be replaced with the value
set for 'keywords-default'. The default is Python's 'None', which
is equivalent to setting this option to JSON's 'null'.
Instead of a dictionary/object, input file options are now specified
by a 'key=value' pair starting with '-' for options only applying to
the next URL or '-G' for Global options applying to all following URLs.
See the docstring of parse_inputfile() for details.
Example option specifiers:
- filename = "{id}.{extension}"
- extractor.pixiv.user.directory = ["Pixiv Users", "{user[id]}"]
-spaces="are_optional"
-G keywords = {"global": "option"}
- see docstring of parse_inputfile() for details
- TODO: unittests, recursion (currently setting for example
{"extractor": {"key": "value"}} will override the whole "extractor"
branch instead of merging {"key": "value"} into the already existing
dictionary)