mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-23 19:22:32 +01:00

144 Commits

Author SHA1 Message Date
Mike Fährmann
0ce98169b8
improve path generation
- fix 'abspath()' results for Python <3.7 (closes #402)
  - 'abspath()' in Python 3.7+ removes trailing path separators
  - in Python <3.7 it doesn't
- filter empty path segments
2019-08-28 23:25:18 +02:00
Mike Fährmann
3284c62f22
ensure PathFormat.directory ends with a path separator
... plus some other small optimizations
2019-08-20 00:25:13 +02:00
Mike Fährmann
e77a656437
optimize directory path generation
- use str.join() instead of os.path.join()
  (less "features", but 10x as fast)
- cache directory formatters
- detect and optimize field access for 1-element format strings
2019-08-19 15:56:20 +02:00
Mike Fährmann
454bf1ebf9
preserve enumeration index after 'set_extension()' (#306) 2019-08-16 23:12:33 +02:00
Mike Fährmann
f5039b897f
replace DownloadArchive.check() with __contains__()
Interestingly enough, 'a in obj' is slightly faster than
'obj.check(a)' and is also nicer to look at, I think.
2019-08-16 23:12:32 +02:00
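The switch from a 'check()' method to '__contains__()' in the commit above can be sketched as follows. The class here is a hypothetical in-memory stand-in used only for illustration, not gallery-dl's actual DownloadArchive:

```python
class ArchiveSketch:
    """Hypothetical in-memory stand-in for a download archive."""

    def __init__(self):
        self._keys = set()

    def add(self, key):
        self._keys.add(key)

    def __contains__(self, key):
        # Lets callers write 'key in archive' instead of 'archive.check(key)'
        return key in self._keys


archive = ArchiveSketch()
archive.add("example123")
print("example123" in archive)
print("missing" in archive)
```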
Mike Fährmann
5a210991b6
Remove control characters from filesystem paths
- add 'path-remove' option to specify the set of characters that
  should be removed
- rename 'restrict-filenames' to 'path-restrict'
- #348, #380
2019-08-16 23:12:16 +02:00
Mike Fährmann
0bb873757a
update PathFormat class
- change 'has_extension' from a simple flag/bool to a field that
  contains the original filename extension
- rename 'keywords' to 'kwdict' and some other stuff as well
- inline 'adjust_path()'
- put enumeration index before filename extension (#306)
2019-08-12 21:40:37 +02:00
Mike Fährmann
8dc42bb178
implement 'enumerate' for 'extractor.skip' (#306)
[ci skip]
2019-08-08 18:37:54 +02:00
Mike Fährmann
b1bea8aaeb
add 'restrict-filenames' option (#348) 2019-07-23 17:41:24 +02:00
Mike Fährmann
7b77ecc35a
fix paths for files without extension (#220) 2019-07-15 16:39:03 +02:00
Mike Fährmann
16c582aaf9
implement 'mtime' post-processor (#332)
This can set a file's modification time according to a UNIX timestamp
or a datetime object from its metadata.
2019-07-14 22:39:17 +02:00
Mike Fährmann
40da44b17f
Merge branch 'v1.9.0' 2019-06-29 15:39:52 +02:00
Mike Fährmann
95b1e4c3c0
implement R<old>/<new>/ format option (#318) 2019-06-23 22:45:44 +02:00
Mike Fährmann
f4ba98771d
use Last-Modified header to set file modification time
(#236, #277)
2019-06-19 23:16:32 +02:00
Mike Fährmann
523ebc9b0b
Fix serialization of 'datetime' objects in '--write-metadata'
Simplified universal serialization support in json.dump() can be achieved
by passing 'default=str', which was already the case in DataJob.run()
for -j/--dump-json, but not for the 'metadata' post-processor.

This commit introduces util.dump_json() that (more or less) unifies the
JSON output procedure of both --write-metadata and --dump-json.

(#251, #252)
2019-05-09 16:49:22 +02:00
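The effect of 'default=str' mentioned in the commit above can be seen with the standard library alone. This is a minimal sketch with made-up metadata, not the code of util.dump_json():

```python
import datetime
import json

# Made-up metadata containing a value that json cannot serialize natively
metadata = {"id": 1, "date": datetime.datetime(2019, 5, 9, 16, 49, 22)}

# Without 'default', json.dumps() raises TypeError for the datetime value.
# With 'default=str', every non-serializable object is converted via str().
text = json.dumps(metadata, default=str)
print(text)
```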
Mike Fährmann
23baecb29e
fix 'CONVERSIONS' variable name 2019-03-05 22:50:56 +01:00
Mike Fährmann
105097ddcf
add 'S' conversion options for format string fields
Same as 's' (convert to string), but has a better, human-readable
conversion for lists.
2019-03-04 21:13:34 +01:00
Mike Fährmann
148b8f15d0
update tests for util.py 2019-02-14 11:15:19 +01:00
Mike Fährmann
ae353ed3b0
provide "extractor" and "job" keys for logging output
This allows for stuff like "{extractor.url}" and "{extractor.category}"
in logging format strings.
Accessing 'extractor' and 'job' in any way will return "None" if those
fields aren't defined, i.e. in general logging messages.
2019-02-14 11:09:58 +01:00
Mike Fährmann
79c01ec7ae
implement J<separator>/ format option
J joins list elements by calling <separator>.join(list):

Example:
{f:J - /} -> "a - b - c" (if "f" is ["a", "b", "c"])
2019-01-17 17:01:58 +01:00
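In plain Python, the 'J' option described in the commit above boils down to a single str.join() call. The helper name below is hypothetical and only illustrates the behavior:

```python
def join_field(value, separator):
    """Hypothetical helper mirroring what 'J<separator>/' is described to do."""
    return separator.join(value)


# Corresponds to {f:J - /} with "f" being ["a", "b", "c"]
print(join_field(["a", "b", "c"], " - "))
```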
Mike Fährmann
c5d4f558c9
allow missing field access keys in format strings (#136) 2018-12-22 13:54:14 +01:00
Mike Fährmann
d3d7f01543
add 'prepare()' step for post-processors
This allows post-processors to modify the destination path before
checking if a file already exists.
2018-10-18 22:32:03 +02:00
Mike Fährmann
6ed629f2b6
allow specifying number of skips before abort/exit (closes #115)
In addition to 'abort' and 'exit', it is now possible to specify
'abort:N' and 'exit:N' (where N is any integer) as value for 'skip'
to abort/exit after consecutively skipping N downloads.
2018-10-13 17:21:55 +02:00
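One way to handle the 'abort:N' and 'exit:N' values from the commit above is sketched below. Both the function and the class are hypothetical, and treating a bare 'abort' or 'exit' as a limit of 1 is an assumption, not something stated in the commit:

```python
def parse_skip(value):
    """Hypothetical parser: split 'abort:N' or 'exit:N' into (action, limit)."""
    action, _, count = value.partition(":")
    # Assumption: a bare 'abort' / 'exit' triggers on the first skip
    limit = int(count) if count else 1
    return action, limit


class SkipCounter:
    """Hypothetical counter for consecutive skips."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def skipped(self):
        # Returns True once 'limit' downloads were skipped in a row
        self.count += 1
        return self.count >= self.limit

    def downloaded(self):
        # A successful download breaks the run of consecutive skips
        self.count = 0


print(parse_skip("abort:3"))
print(parse_skip("exit"))
```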
Mike Fährmann
48a8717a7c
add 'output.num-to-str' option
... to convert any numeric values to string when outputting them as JSON
(during '--dump-json' or otherwise)
2018-10-08 20:28:54 +02:00
Mike Fährmann
0514d6a0ae
make --filter and --range config-file options
The functionality of --(chapter-)filter and --(chapter-)range are now
also exposed as the following config-file options:

- extractor.*.image-filter
- extractor.*.image-range
- extractor.*.chapter-filter
- extractor.*.chapter-range

TODO: update configuration.rst
2018-10-07 21:39:56 +02:00
Mike Fährmann
590c0b3ad5
re-implement and improve filename formatter
A format string now gets parsed only once instead of re-parsing it each
time it is applied to a set of data.

The initial parsing causes directory path creation to be about 2x
slower than before, since each format string there is used only once,
but building a filename, the more common operation, is at least 2x
faster. The "directory slowness" cancels out at about 5 filenames and
everything above that is significantly faster.
2018-08-25 10:45:14 +02:00
Mike Fährmann
c83fc62abc
prioritize archive over disk access (#87) 2018-07-30 17:48:23 +02:00
Mike Fährmann
e0dd8dff5f
implement L<maxlen>/<replacement>/ format option
The L option allows for the contents of a format field to be replaced
with <replacement> if its length is greater than <maxlen>.

Example:
{f:L5/too long/} -> "foo"      (if "f" is "foo")
                 -> "too long" (if "f" is "foobar")

(#92) (#94)
2018-07-29 13:52:07 +02:00
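The rule behind the 'L' option in the commit above can be written as a short Python function. The function name is hypothetical and this is only a sketch of the described behavior, not the formatter's implementation:

```python
def limit_length(value, maxlen, replacement):
    """Hypothetical helper: return 'replacement' if 'value' exceeds 'maxlen'."""
    return replacement if len(value) > maxlen else value


# Corresponds to {f:L5/too long/}
print(limit_length("foo", 5, "too long"))
print(limit_length("foobar", 5, "too long"))
```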
Mike Fährmann
8fe9056b16
implement string slicing for format strings
It is now possible to slice string (or list) values of format string
replacement fields with the same syntax as in regular Python code.

"{digits}"       -> "0123456789"
"{digits[2:-2]}" -> "234567"
"{digits[:5]}"   -> "01234"

The optional third parameter (step) has been left out to simplify things.
2018-07-14 09:53:15 +02:00
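Since the commit above states that the slicing syntax matches regular Python, the examples can be reproduced directly in Python (two-parameter slices only, without a step):

```python
digits = "0123456789"

# Same start/stop semantics as "{digits[2:-2]}" and "{digits[:5]}"
middle = digits[2:-2]
head = digits[:5]
print(middle)
print(head)
```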
Mike Fährmann
a9e276bc37
reset delete-flag
Since 'PathFormat' objects are being reused, setting `delete`
to True once caused all files downloaded afterwards to be deleted as well.
2018-06-20 18:12:59 +02:00
Mike Fährmann
baccf8a958
improve postprocessor handling
- add pathfmt argument for __init__()
- add finalization step
- add option to keep or delete zipped files
2018-06-08 17:39:02 +02:00
Mike Fährmann
7646bdbcfd
improve postprocessor initialization code 2018-06-07 22:29:54 +02:00
Mike Fährmann
821535b458
adjust PathFormat class 2018-06-06 20:17:17 +02:00
Mike Fährmann
6a31ada9e3
re-implement OAuth1.0 code
OAuth support for SmugMug needs some additional features
(auth-rebuild on redirect, query parameters in URL, ...)
and fixing this in the old code wouldn't work all that well.
2018-05-10 18:47:05 +02:00
Mike Fährmann
69a5e6ddb3
Merge branch 'master' into 1.4-dev 2018-05-04 10:19:02 +02:00
Mike Fährmann
16e014baaa
[smugmug] added image and album extractor
just some initial code that still requires a lot of work ...

TODO:
- folders
- old-style albums (which are nearly all of them ...)
- images from users
- OAuth

It could also happen that the API credentials used will become invalid
whenever my 14 day trial period ends (7 days remaining), but that
would just require users to supply their own.
2018-04-29 21:27:25 +02:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
51ea699083
add 'abort()' as function to filter expressions
calling 'abort()' in a filter aborts the current extractor run
in a cleaner way than using something like 1/0, which
causes an error message to be printed
2018-04-12 17:07:12 +02:00
Mike Fährmann
3f2dd6b6f8
avoid double path-separators
(#74)
2018-03-22 10:24:59 +01:00
Mike Fährmann
b69cc94f0e
[util] implement bencode() 2018-03-14 13:17:34 +01:00
Mike Fährmann
749fbbfa6c
[mangadex] add chapter- and manga-extractor 2018-03-05 18:37:21 +01:00
Mike Fährmann
2fad0b1f1b
add 'U' conversion for format strings to unquote their content
(#74)
2018-02-25 21:57:59 +01:00
Mike Fährmann
8cdce21dcb
make archive keys user-configurable 2018-02-25 21:57:01 +01:00
Mike Fährmann
e1e0668ca8
add option to set default replacement field value
Missing or undefined keywords will now be replaced with the value
set for 'keywords-default'. The default is Python's 'None', which
is equivalent to setting this option to JSON's 'null'.
2018-02-23 00:59:20 +01:00
Mike Fährmann
ac3da8115e
[util] don't add text: URLs to list of downloaded URLs 2018-02-20 18:14:27 +01:00
Mike Fährmann
b50bdbf3d7
change config specifiers in input file format
Instead of a dictionary/object, input file options are now specified
by a 'key=value' pair starting with '-' for options only applying to
the next URL or '-G' for Global options applying to all following URLs.

See the docstring of parse_inputfile() for details.

Example option specifiers:

- filename = "{id}.{extension}"
- extractor.pixiv.user.directory = ["Pixiv Users", "{user[id]}"]
-spaces="are_optional"
-G keywords = {"global": "option"}
2018-02-16 03:10:41 +01:00
Mike Fährmann
f970a8f13c
fix adding keys to download archive when using skip=false 2018-02-13 23:45:30 +01:00
Mike Fährmann
179bcdd349
adjust archive-ids 2018-02-13 04:50:45 +01:00
Mike Fährmann
3cec533c28
Merge branch 'archive' 2018-02-12 18:07:58 +01:00
Mike Fährmann
b73b8b4f50
add OAuth unittests 2018-02-12 17:07:07 +01:00
Mike Fährmann
4d2fadfb6f
restore skip actions with download archive 2018-02-12 16:56:45 +01:00
Mike Fährmann
65773263fc
[util] implement OAuthSession.urlencode() (closes #75)
- Python's own urllib.parse.urlencode() has no quote_via argument in
  Python 3.3 and 3.4, which is necessary to follow OAuth 1.0 quoting
  rules.
2018-02-10 21:56:13 +01:00
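For reference, the difference that 'quote_via' makes can be shown on Python 3.5 or later, where the argument exists. This sketch only uses the standard library and is not the OAuthSession.urlencode() code:

```python
from urllib.parse import quote, urlencode

params = {"q": "a b"}

# Default quoting uses quote_plus(), which encodes a space as '+'
plus_encoded = urlencode(params)

# Passing quote_via=quote percent-encodes the space instead
percent_encoded = urlencode(params, quote_via=quote)

print(plus_encoded)
print(percent_encoded)
```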
Mike Fährmann
057668e17e
extend input-file format with per-URL config and comments
- see docstring of parse_inputfile() for details
- TODO: unittests, recursion (currently setting for example
  {"extractor": {"key": "value"}} will override the whole "extractor"
  branch instead of merging {"key": "value"} into the already existing
  dictionary)
2018-02-07 21:47:27 +01:00
Mike Fährmann
347baf7ac5
improve util.parse_range() performance
It is never going to actually matter, but using partition() instead
of split() is twice as fast.
2018-02-05 22:28:11 +01:00
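The difference between the two string methods named in the commit above is easiest to see side by side. The input strings are made up for illustration, and no timing claim is made here:

```python
# partition() always returns a 3-tuple: (before, separator, after)
first, sep, last = "2-5".partition("-")
print(first, sep, last)

# When the separator is missing, the remaining two parts are empty strings
single = "7".partition("-")
print(single)

# split() returns a list whose length depends on the input
print("2-5".split("-"))
```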
Mike Fährmann
aa38eab2be
allow not-defined fields in format strings
... and replace them with "None", for now
2018-02-03 22:28:41 +01:00
Mike Fährmann
84a52a9256
add DownloadArchive class 2018-01-30 15:23:23 +01:00
Mike Fährmann
db7f04dd97
emit log messages on download failure
and when retrying with fallback URLs
2018-01-28 18:44:10 +01:00
Mike Fährmann
6174a5c4ef
[download] adjust filename extension on filetype mismatch
(closes #63)
2018-01-17 18:37:06 +01:00
Mike Fährmann
f10ffc0839
update extractor blacklist to also allow classes 2018-01-14 18:47:22 +01:00
Mike Fährmann
29d75fc3fa
[tumblr] add support for OAuth authentication (#65) 2018-01-11 14:11:37 +01:00
Mike Fährmann
d241a0fb60
[util] replace '/' with '\' in base-directory paths
... on Windows to have consistent path separators.
2017-12-21 21:56:24 +01:00
Mike Fährmann
93482a1f88
implement 'util.advance()' 2017-12-03 01:38:24 +01:00
Mike Fährmann
a718c6c6cd
implement 'util.parse_bytes()' 2017-12-02 01:24:49 +01:00
Mike Fährmann
caf26412dd
add option to set alternate location of .part files (#29)
Note: The path set for 'downloader.*.part-directory' needs to point to an
already existing directory.
2017-10-26 00:16:48 +02:00
Mike Fährmann
ea8ca4cfa4
add 'util.expand_path()' 2017-10-26 00:04:28 +02:00
Mike Fährmann
963670d73b
add options to control usage of .part files (#29)
- '--no-part' command line option to disable them
- 'downloader.http.part' and 'downloader.text.part' config options

Disabling .part files restores the behaviour of the old downloader
implementation.
2017-10-24 23:33:44 +02:00
Mike Fährmann
b0353aa02d
rewrite download modules (#29)
- use '.part' files during file-download
- implement continuation of incomplete downloads
- check if file size matches the one reported by server
2017-10-24 12:53:03 +02:00
Mike Fährmann
832b8b76ac
[util] extend global namespace for filter expressions 2017-10-09 22:12:58 +02:00
Mike Fährmann
8e6a767109
[util] restructure formatter for better exception propagation 2017-10-06 17:10:35 +02:00
Mike Fährmann
8df023e144
[util:filter] re-enable builtins
Trying to restrict access to Python's builtin functions (exec,
print, __import__, ...) can easily be circumvented and is
therefore completely pointless.

This also adds 'safe_int()' and the 'datetime' module to the global
namespace used when evaluating filter expressions.
2017-10-04 16:00:12 +02:00
Mike Fährmann
b319f4bab3
smaller code and text changes 2017-10-01 18:23:40 +02:00
Mike Fährmann
c1f0afe4c6
add custom string formatter class 2017-09-28 17:12:39 +02:00
Mike Fährmann
9fc1d0c901
implement and use 'util.safe_int()'
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
2017-09-24 15:59:25 +02:00
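A function with the behavior described in the commit above could look roughly like this. The default of 0 and the exact set of caught exceptions are assumptions, not taken from the commit:

```python
def safe_int(value, default=0):
    """Sketch: like int(), but return 'default' instead of raising."""
    try:
        return int(value)
    except (ValueError, TypeError, OverflowError):
        # Assumption: these are the failures worth swallowing
        return default


print(safe_int("12"))
print(safe_int("abc"))
print(safe_int(None, -1))
```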
Mike Fährmann
9b21d3f13c
add '--filter' command-line option
This allows for image filtering via Python expressions by the same
metadata that is also used to build filenames (--list-keywords).

The usually shunned eval() function is used to evaluate
filter-expressions, but it seemed quite appropriate in this case and
shouldn't introduce any new security issues, as any attacker that could do
> gallery-dl --filter "delete-everything()" ...
could as well do
> python -c "delete-everything()"
2017-09-08 17:52:00 +02:00
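Evaluating a filter expression against image metadata with eval(), as the commit above describes, can be sketched like this. The function name and the metadata keys are hypothetical:

```python
def build_filter(expression):
    """Hypothetical helper: compile a Python expression into a predicate."""
    code = compile(expression, "<filter>", "eval")

    def matches(metadata):
        # The metadata dict serves as the local namespace of the expression
        return bool(eval(code, {}, metadata))

    return matches


is_large_png = build_filter("width >= 1000 and extension == 'png'")
print(is_large_png({"width": 1200, "extension": "png"}))
print(is_large_png({"width": 800, "extension": "png"}))
```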
Mike Fährmann
268cfa3cfe
filter duplicate URLs (#36)
Duplicate URLs might occur if, for example, an artist adds another
image to his gallery while an extractor is running and images are being
downloaded on sites like pixiv/nijie/hentaifoundry.
The next image on the next page will have already been downloaded and
will cause a premature end if '--abort-on-skip' is being used.
2017-09-06 17:08:50 +02:00
Mike Fährmann
9bf9d64ad8
update unittests for util.py 2017-08-13 14:31:22 +02:00
Mike Fährmann
e3bfb8325a
fix circular dependency
- util.py imported config.py and vice versa
- Python < 3.5 doesn't like this
2017-08-12 21:32:24 +02:00
Mike Fährmann
004456d5d5
properly update the config-dictionary
When using 2 or more config files, the values of the second would
improperly overwrite nested dictionaries of the first one.
The new method properly combines these nested dictionaries as well.
2017-08-12 20:07:27 +02:00
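Combining nested dictionaries, as described in the commit above, is commonly done with a small recursive function. The version below is a sketch with a hypothetical name and made-up config values, not the code from this commit:

```python
def combine(target, source):
    """Hypothetical helper: merge 'source' into 'target', recursing into dicts."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are dicts: merge instead of overwriting
            combine(target[key], value)
        else:
            target[key] = value
    return target


first = {"extractor": {"pixiv": {"username": "a"}, "skip": True}}
second = {"extractor": {"pixiv": {"password": "b"}}}
merged = combine(first, second)
print(merged)
```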
Mike Fährmann
ae2d61e5b3
handle format string exceptions separately 2017-08-11 21:48:37 +02:00
Mike Fährmann
d74a635e41
[util] update 'default' values and improve test coverage
for 'code_to_language()' and 'language_to_code()'
2017-08-08 19:22:04 +02:00
rachmadani haryono
dcd573806e
chg: dev: fix error (#32)
* fix: dev: error

* fix: dev: AttributeError when getting artist

* fix: dev: typo on luscious parser
2017-08-04 15:01:10 +02:00
Mike Fährmann
0610ae5000
skip login if cookies are present 2017-07-17 10:33:36 +02:00
Mike Fährmann
2993206c4b
smaller fixes and "security" measures
- move the OAuthSession class into util.py
- block special extractors for reddit and recursive
- ignore 'only matching' tests for testresults script
2017-06-16 21:01:40 +02:00
Mike Fährmann
72f1c6f87a
[flickr] add support for flic.kr/p/... URLs
Example:
    https://flic.kr/p/FPVo9U
2017-06-02 09:01:35 +02:00
Mike Fährmann
107d29ad8a
improve handling of text:... URLs
- don't require // after the colon
- open output files in text mode
2017-05-12 14:10:25 +02:00
Mike Fährmann
ef90a2de2f
implement the "exit" option for the "skip" config-key 2017-05-05 15:49:58 +02:00
Mike Fährmann
fc9223c072
add '--abort-on-skip' option and ability to control skip behavior
the 'skip' config option controls skipping behavior:
    true    - skip download if file already exists (default)
    false   - download and overwrite a file even if it exists
    "abort" - abort extractor run if a download would be skipped
              (same as '--abort-on-skip')
2017-05-03 15:26:04 +02:00
Mike Fährmann
841fd50242
move code into util.py 2017-03-28 13:12:44 +02:00
Mike Fährmann
7a9d66fbce
implement basic way to tell extractors to skip ahead 2017-03-03 17:26:50 +01:00
Mike Fährmann
6208d9dd79
implement '--images' and '--chapters' options
- the former '--items' has been renamed to '--chapters'
- #6
2017-02-23 21:51:29 +01:00
Mike Fährmann
2a32b12043
add '--items' option
this allows specifying which manga-chapters/comic-issues to download
when using gallery-dl on a manga/comic URL
2017-02-20 22:02:49 +01:00
Mike Fährmann
513808d156
move code from util.py 2015-04-08 01:46:04 +02:00
Mike Fährmann
b630753e5e
add 'method' parameter 2014-10-31 23:38:21 +01:00
Mike Fährmann
deef91eddc
initial commit 2014-10-12 21:56:44 +02:00