mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-25 12:12:34 +01:00

231 Commits

Author SHA1 Message Date
Mike Fährmann
bb3e182562
overhaul session initialization
- share adapter & connection pool across sessions with the same
  ssl options, ssl ciphers, and source address
- simplify browser emulation to just a list of headers and ciphers
2022-01-31 23:12:08 +01:00
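One way to picture "share adapter & connection pool across sessions with the same ssl options, ssl ciphers, and source address" is a cache keyed on exactly those three values. The sketch below is an assumption, not gallery-dl's code: `get_adapter`, `_adapter_cache`, and the `factory` callback are hypothetical names, and the factory stands in for whatever actually builds the adapter.

```python
from typing import Callable, Dict, Hashable, Tuple

# Hypothetical module-level cache: one adapter per distinct option tuple
_adapter_cache: Dict[Tuple[Hashable, ...], object] = {}


def get_adapter(ssl_options, ssl_ciphers, source_address,
                factory: Callable[[], object]):
    """Return a shared adapter for this combination of SSL settings.

    Sessions created with identical ssl options, ciphers, and source
    address receive the same adapter object, and therefore the same
    underlying connection pool.
    """
    key = (ssl_options, ssl_ciphers, source_address)
    adapter = _adapter_cache.get(key)
    if adapter is None:
        # First session with these settings: build and remember the adapter
        adapter = _adapter_cache[key] = factory()
    return adapter
```

With `requests`, the factory could return an `HTTPAdapter` that each session then attaches via `session.mount("https://", adapter)`; sessions with different cipher lists would get separate adapters.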
Mike Fährmann
6e0a6c484f
apply SPECIAL_EXTRACTORS only for blacklist settings
as was the case before 010d65dc
2022-01-06 21:09:30 +01:00
Mike Fährmann
010d65dcec
extend blacklist/whitelist syntax (#2025)
Each entry in such a list can now also include a subcategory
'<category>:<subcategory>'
and it is possible to use '*' or an empty string as placeholder
'*:<subcategory>', ':<subcategory>', '<category>:*'

For example
  "blacklist": "imgur,*:tag,gfycat:user" or
  "blacklist": ["imgur", "*:tag", "gfycat:user"]
will filter all 'imgur' extractors, all extractors with a 'tag'
subcategory (e.g. https://danbooru.donmai.us/posts?tags=bonocho),
and all 'gfycat' user extractors.
2021-11-23 20:31:43 +01:00
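The extended syntax above can be read as a list of `(category, subcategory)` pairs in which `*` or an empty string matches anything. A minimal sketch of that matching, assuming these semantics; `parse_filter` and `is_filtered` are hypothetical helpers, and gallery-dl's real implementation may differ in detail:

```python
def parse_filter(spec):
    """Split a blacklist/whitelist spec into (category, subcategory) pairs.

    Accepts a comma-separated string or a list of strings. A missing
    subcategory, '*', or '' is kept as-is and treated as a wildcard.
    """
    if isinstance(spec, str):
        spec = spec.split(",")
    entries = []
    for entry in spec:
        entry = entry.strip()
        if not entry:
            continue
        category, _, subcategory = entry.partition(":")
        entries.append((category, subcategory))
    return entries


def _field_matches(pattern, value):
    # '*' and '' are placeholders that match any value
    return pattern in ("*", "") or pattern == value


def is_filtered(entries, category, subcategory):
    """Return True if (category, subcategory) matches any entry."""
    return any(
        _field_matches(cat, category) and _field_matches(sub, subcategory)
        for cat, sub in entries
    )
```

Both spellings from the commit message produce the same entries, e.g. `parse_filter("imgur,*:tag,gfycat:user")`.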
Mike Fährmann
cad85640de
move 'util.PathFormat' into its own 'path' module
to prevent circular imports between 'formatter' and 'util'
2021-09-27 21:29:37 +02:00
Mike Fährmann
74145467dd
move 'util.Formatter' into its own 'formatter' module 2021-09-27 02:37:04 +02:00
Mike Fährmann
c9e6693530
allow specifying a minimum/maximum for 'sleep-*' options (#1835)
for example '"sleep-request": [5.0, 10.0]' to wait between 5 and 10
seconds between each HTTP request
2021-09-14 17:40:05 +02:00
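A `[min, max]` value for a `sleep-*` option amounts to drawing a fresh random delay in that range each time. A minimal sketch, assuming a uniform distribution; `build_sleep` is a hypothetical name, not gallery-dl's API:

```python
import random
from typing import Callable, Sequence, Union


def build_sleep(value: Union[float, Sequence[float]]) -> Callable[[], float]:
    """Turn a 'sleep-*' option value into a function returning seconds to wait.

    A single number yields a fixed delay; a [min, max] pair yields a
    uniformly random delay in that range on every call.
    """
    if isinstance(value, (list, tuple)):
        low, high = value
        return lambda: random.uniform(low, high)
    return lambda: float(value)
```

A caller would then run something like `time.sleep(sleep_request())` before each HTTP request, where `sleep_request = build_sleep([5.0, 10.0])`.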
Mike Fährmann
d79bcb6236
allow extractors to register a 'finalize()' method 2021-09-07 21:15:30 +02:00
Mike Fährmann
72c0cd30c7
do not return with a nonzero exit status when no results found
also change loglevel from 'warning' to 'info'
(#1789)
2021-08-24 18:49:13 +02:00
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
bdfdabf498
show warning if extractor doesn't yield any results (#1759) 2021-08-16 02:49:36 +02:00
Mike Fährmann
d320ee6251
implement a 'fallback' option (closes #1770) 2021-08-16 01:47:59 +02:00
Mike Fährmann
1b2f9050fb
rename all instances of 'kwds' to 'kwdict' 2021-07-20 20:21:19 +02:00
Mike Fährmann
b9783403d9
add 'url-metadata' option (#1659, #1073) 2021-07-14 03:08:49 +02:00
Mike Fährmann
e95f99882f
extend 'parent-metadata' functionality (#1687, #1651, #1364) 2021-07-14 02:53:41 +02:00
Mike Fährmann
64986f9435
fix depth counter in UrlJob
regression from adf4d661

It would either stop at the first level (-g) or go infinitely deep (-G).
Going down to, for example, level 3 with -ggg didn't work.
2021-06-26 00:30:03 +02:00
Mike Fährmann
83fc4c1098
update post processor config capabilities
This change makes it possible to specify just the name of a post processor
in the "postprocessors" list instead of a dict with all of its options.
The options for it will then be taken from inside the "postprocessor"
block similar to "extractor", "downloader", or "output" blocks.

This makes it possible, for example, to override the default settings for
--write-metadata by specifying a custom "metadata" block, or to define a
custom post processor block ("cbz") and then use it by referencing just
its name in "postprocessors" lists.

{
    "postprocessor":
    {
        "metadata": {
            "name": "metadata",
            "event": "post",
            "filename": "{tweet_id|post_id|id}.json"
        },
        "cbz": {
            "name"       : "zip",
            "compression": "store",
            "extension"  : "cbz"
        }
    }
}
2021-06-05 14:11:16 +02:00
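Resolving a name in a "postprocessors" list boils down to a lookup in the top-level "postprocessor" block. The sketch below uses the config shown above. Its fallback is an assumption: an unknown name becomes `{"name": name}`, meaning that post processor's defaults. `resolve_postprocessors` is a hypothetical helper.

```python
def resolve_postprocessors(pp_list, pp_block):
    """Expand a 'postprocessors' list into full option dicts.

    Dict entries are used as-is; string entries are looked up by name
    in the top-level 'postprocessor' config block.
    """
    resolved = []
    for entry in pp_list:
        if isinstance(entry, str):
            # Assumption: an unknown name falls back to that post processor's defaults
            entry = pp_block.get(entry, {"name": entry})
        resolved.append(entry)
    return resolved


config = {
    "postprocessor": {
        "metadata": {
            "name": "metadata",
            "event": "post",
            "filename": "{tweet_id|post_id|id}.json",
        },
        "cbz": {"name": "zip", "compression": "store", "extension": "cbz"},
    }
}

pps = resolve_postprocessors(["cbz", {"name": "mtime"}, "exec"],
                             config["postprocessor"])
```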
Mike Fährmann
3cbbefd4ed
support 'filter' option for post processors (#1460) 2021-06-04 18:23:32 +02:00
Mike Fährmann
adf4d661b3
use '_extractor' info in UrlJobs 2021-05-19 15:52:30 +02:00
Mike Fährmann
b50b8e6cf4
refactor applying 'parent-…' options 2021-05-13 21:56:34 +02:00
Mike Fährmann
7ab8374385
add 'parent-skip' option (#1399) 2021-05-13 16:40:04 +02:00
Mike Fährmann
c693db5b1a
add '"skip": "terminate"' option
Stops not only the current extractor/job,
but all parent extractors/jobs as well.
2021-05-12 02:22:28 +02:00
Mike Fährmann
c5ca7905ce
add 'noop()' and 'identity()' functions 2021-05-04 19:27:17 +02:00
Mike Fährmann
5b4da4b4bf
reorder config access in Job constructor
(#1111)
2021-04-27 15:12:59 +02:00
Mike Fährmann
b4ed7cb961
fix 'category-transfer' (#1111)
broken since commit 055c32e0
2021-04-19 00:55:44 +02:00
Mike Fährmann
a86ffb04bb
add 'output.fallback' option
to enable/disable fallback URLs for -g/--get-urls
2021-04-12 02:00:41 +02:00
Mike Fährmann
a75e485461
add archive format to InfoJob output (#875) 2021-04-07 21:50:16 +02:00
Mike Fährmann
bf241811dd
allow '_extractor' fields to be None or empty 2021-03-20 01:19:31 +01:00
Mike Fährmann
23641742a3
improve 'parent-directory' (#1364)
Allow forwarding metadata from the top-level extractor to all children
if 'parent-directory' is enabled for all extractors along the way.

For example 'reddit' -> 'gfycat' -> 'redgifs'
2021-03-14 17:19:57 +01:00
Mike Fährmann
df94182e11
implement 'parent-metadata' option (#1364)
experimental, might not work as expected, etc.
2021-03-11 01:10:34 +01:00
Mike Fährmann
b6719becf1
ensure '-s/--simulate' always prints filenames (#1360)
by assuming a potentially wrong filename extension in cases where the
correct one would only get known after a download started
2021-03-07 22:38:20 +01:00
Mike Fährmann
c963741860
add '-E/--extractor-info' command-line option (#875) 2021-03-02 23:59:56 +01:00
Mike Fährmann
65ca923b4e
fix 'whitelist' option for BaseExtractor instances 2021-02-15 21:58:33 +01:00
Mike Fährmann
56a8968435
remove 'Message.Metadata' (#866) 2021-01-31 02:12:37 +01:00
Mike Fährmann
46323ae6ff
initialize 'hooks' as empty tuple
follow-up to 9c29fc4e

Prevent a "race" between initializing 'pathfmt' and 'hooks',
and receiving a signal in between (e.g. ctrl+c),
which would then crash in 'handle_finalize()'.
2020-11-28 18:18:49 +01:00
Mike Fährmann
9c29fc4e55
always initialize DownloadJob.hooks (fixes #1135)
and not just when any (potential) post processors are defined
2020-11-28 00:09:19 +01:00
Mike Fährmann
9fffa9c343
rework post processor callbacks 2020-11-19 02:29:06 +01:00
Mike Fährmann
f99c6031e0
apply post processor blacklists/whitelists to basecategories
(#1103)
2020-11-17 02:02:31 +01:00
Mike Fährmann
a3ca2f6080
update fallback URL handling
remove Message.Urllist and use a '_fallback' field inside a kwdict
2020-10-16 01:09:55 +02:00
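With fallbacks stored in a `_fallback` field inside the kwdict, a downloader can try the primary URL first and then each fallback in order. A minimal sketch, assuming `download(url)` returns True on success; `download_with_fallback` is a hypothetical helper, not gallery-dl's actual method:

```python
def download_with_fallback(url, kwdict, download):
    """Try 'url', then each URL in kwdict['_fallback'], until one succeeds.

    Returns the URL that worked, or None if every attempt failed.
    """
    # '_fallback' may be missing, None, or empty
    for candidate in (url, *(kwdict.get("_fallback") or ())):
        if download(candidate):
            return candidate
    return None
```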
Mike Fährmann
fd20093c96
allow blacklist/whitelist to be empty lists/strings (#1051) 2020-10-08 14:55:21 +02:00
Mike Fährmann
d5fa716d89
fix crash when using 'skip=false' and archive (fixes #1023)
Separating the archive check from pathfmt.exists() in b5243297
had some unintended side effects.

It is also not possible to monkey-patch a dunder method like
__contains__ because of the special method lookup that gets
performed for them.
2020-09-23 19:07:40 +02:00
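The point about `__contains__` can be checked directly. For implicit operations such as `in`, Python looks up special methods on the type, not on the instance, so a patched instance attribute only takes effect when called explicitly. `Archive` here is a hypothetical stand-in class:

```python
class Archive:
    """Hypothetical stand-in for a download archive."""

    def __init__(self, entries):
        self.entries = set(entries)

    def __contains__(self, key):
        return key in self.entries


archive = Archive({"a"})

# Patch the *instance*, not the class
archive.__contains__ = lambda key: True

explicit = archive.__contains__("b")  # True: normal attribute lookup finds the patch
implicit = "b" in archive             # False: 'in' uses type(archive).__contains__
```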
Mike Fährmann
231dd4c800
accumulate postprocessor objects (#994)
Instead of one 'postprocessors' setting overwriting all others lower
in the hierarchy, all postprocessors along the config path will now
get collected into one big list.

For example '--mtime-from-date' will therefore no longer cause
other postprocessor settings in a config file to get ignored.
2020-09-14 21:51:55 +02:00
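Accumulating instead of overwriting means walking the config path and concatenating every "postprocessors" list found on the way. The config layout and the root-first ordering below are assumptions for illustration; `collect_postprocessors` is a hypothetical helper.

```python
def collect_postprocessors(config, path):
    """Concatenate all 'postprocessors' lists found along a config path.

    'path' is a sequence of keys such as ("extractor", "twitter").
    Lists are gathered from the root down, so a deeper list adds to,
    rather than replaces, the ones above it.
    """
    nodes = [config]
    node = config
    for key in path:
        node = node.get(key)
        if not isinstance(node, dict):
            break  # path ends early; keep what was found so far
        nodes.append(node)

    collected = []
    for node in nodes:
        collected.extend(node.get("postprocessors") or ())
    return collected


# Assumed layout: a command-line option at the root plus config-file entries
config = {
    "postprocessors": [{"name": "mtime"}],
    "extractor": {
        "postprocessors": [{"name": "metadata"}],
        "twitter": {"postprocessors": [{"name": "zip"}]},
    },
}
```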
Mike Fährmann
3afd362e2e
add 'sleep-extractor' option (closes #964)
(would have been nice if this were possible without code duplication)
2020-09-12 21:04:47 +02:00
Mike Fährmann
c78aa17506
add general 'blacklist' and 'whitelist' options (#492, #844) 2020-09-11 13:17:12 +02:00
Mike Fährmann
5912727b88
support format string replacement fields in archive paths
(closes #985)
2020-09-10 22:09:30 +02:00
Mike Fährmann
b5243297ff
write skipped files to archive (closes #550) 2020-09-03 18:37:38 +02:00
Mike Fährmann
3f73cc6855
allow 'parent-directory' to work recursively (fixes #905) 2020-07-29 00:31:23 +02:00
Mike Fährmann
d5bfb0b38c
set pseudo extension for Metadata messages (#865)
This prevents pathfmt.filename from potentially being empty.
2020-07-04 22:14:39 +02:00
Mike Fährmann
1b3870a4be
flush after writing JSON in DataJob() (#727)
… and remove the dead handle_finalize() method,
which is never called since DataJob() overrides run().
2020-06-19 23:05:44 +02:00
Mike Fährmann
7e8a747c56
improve output of '-K' for parent extractors 2 (#825)
This is what bb882b8 was supposed to be, but I managed to
not include those changes in the first commit …
2020-06-18 15:04:15 +02:00
Mike Fährmann
ece73b5b2a
make 'path' and 'keywords' available in logging messages
Wrap all loggers used by job, extractor, downloader, and postprocessor
objects into a (custom) LoggerAdapter that provides access to the
underlying job, extractor, pathfmt, and kwdict objects and their
properties.

__init__() signatures for all downloader and postprocessor classes have
been changed to take the current Job object as their first argument,
instead of the current extractor or pathfmt.

(#574, #575)
2020-05-18 19:04:51 +02:00
Mike Fährmann
a1e739b96c
reuse connection adapters from parent extractors 2020-05-12 23:52:01 +02:00
Mike Fährmann
42f29c3e11
improve and simplify attribute access in DownloadJob.initialize() 2020-05-09 00:57:59 +02:00
Mike Fährmann
56f1c96168
implement 'parent-directory' option (#551) 2020-01-29 18:32:37 +01:00
Mike Fährmann
37247dbaff
miscellaneous fixes 2020-01-19 22:53:06 +01:00
Mike Fährmann
0e9dc5c88e
fix AttributeError when accessing 'temppath'
[ci skip]
2020-01-19 00:41:21 +01:00
Mike Fährmann
0b84068d84
remove temp files before downloading from fallback URLs
otherwise the next call to download() with a fallback URL could see
the partially downloaded "remains" from the previous, failed download
attempt and "continue" it, writing the second half of a potentially
different version of that file.
2020-01-18 00:47:17 +01:00
Mike Fährmann
2d4887b75b
improve KeywordJob output for "parent" extractors (closes #548) 2019-12-28 22:26:49 +01:00
Mike Fährmann
2e2fc7f0ad
prevent infinite recursion when spawning extractors (closes #489) 2019-12-26 23:38:16 +01:00
Mike Fährmann
1921c127a5
make OSErrors during file downloads nonfatal (closes #512)
… except ENOSPC (No space left on device), since there is no reason to
continue downloading in that case.

All other errors that would prevent downloading data and writing it to
disk are already raised during directory creation and are therefore not
checked here.
2019-12-19 18:34:05 +01:00
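Treating OSErrors as nonfatal except for ENOSPC comes down to checking `errno` and re-raising only that case. A minimal sketch; `download_all` and the `download` callback are hypothetical names:

```python
import errno


def download_all(urls, download):
    """Call download(url) for each URL, treating most OSErrors as nonfatal.

    Returns the URLs that failed. ENOSPC is re-raised, since no further
    download can succeed once the disk is full.
    """
    failed = []
    for url in urls:
        try:
            download(url)
        except OSError as exc:
            if exc.errno == errno.ENOSPC:
                raise
            failed.append(url)
    return failed
```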
Mike Fährmann
63e6993716
merge 'bypost' functionality into metadata postprocessor 2019-12-16 17:19:23 +01:00
Gio
c0b9ad678d Separate metadata from handle_url into handle_metadata, commenting 2019-12-09 16:02:15 -06:00
Gio
6ed4fc07ff Don't print intentional metadata skips to the console. 2019-12-09 01:02:17 -06:00
Gio
cfc70a97ab Added an additional channel for downloading the metadata of an entire post or gallery. 2019-12-09 00:56:27 -06:00
Mike Fährmann
f5604492c3
update interface of config functions 2019-11-24 00:42:28 +01:00
Mike Fährmann
3fc1e12949
[postprocessor:metadata] filter private entries
i.e. keys starting with an underscore
2019-11-21 16:58:44 +01:00
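Filtering private entries (keys starting with an underscore) before writing metadata can be a one-line dict comprehension. A minimal, top-level-only sketch; `filter_private` is a hypothetical name:

```python
def filter_private(kwdict):
    """Return a copy of 'kwdict' without keys that start with an underscore."""
    return {key: value for key, value in kwdict.items()
            if not key.startswith("_")}
```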
Mike Fährmann
9e88e7a344
[postprocessor:exec] improve (#421, #413)
- add 'final' option
- include job status in pp finalization
- improve and extend documentation
2019-11-03 21:45:45 +01:00
Mike Fährmann
5af291ba5c
include failed downloads and child extractors in exit status 2019-10-29 15:56:54 +01:00
Mike Fährmann
322c2e7ed4
renaming variables
mostly 'keyword(s)' to 'kwdict'
2019-10-29 15:46:35 +01:00
Mike Fährmann
4409d00141
embed error messages in StopExtraction exceptions 2019-10-28 16:39:49 +01:00
Mike Fährmann
c887493a80
overhaul exception stuff 2019-10-27 23:53:37 +01:00
Mike Fährmann
389d2d7e38
implement 'cookies-update' option (#445) 2019-10-19 15:23:55 +02:00
Mike Fährmann
03bc8adfc7
[postprocessor:exec] run after file moved to target location
(#421)
2019-10-06 23:12:22 +02:00
Mike Fährmann
776e9e073f
close archive on job completion (#417) 2019-09-10 22:43:51 +02:00
Mike Fährmann
9178b54eae
handle errors when opening download archive file (#417) 2019-09-10 16:44:47 +02:00
Mike Fährmann
682105b8ee
prevent crash when loading unavailable downloader (#405) 2019-08-31 21:58:33 +02:00
Mike Fährmann
5f8621b29d
improve output of active post processor modules 2019-08-15 13:31:04 +02:00
Mike Fährmann
0bb873757a
update PathFormat class
- change 'has_extension' from a simple flag/bool to a field that
  contains the original filename extension
- rename 'keywords' to 'kwdict' and some other stuff as well
- inline 'adjust_path()'
- put enumeration index before filename extension (#306)
2019-08-12 21:40:37 +02:00
Mike Fährmann
8dc42bb178
implement 'enumerate' for 'extractor.skip' (#306)
[ci skip]
2019-08-08 18:37:54 +02:00
Mike Fährmann
20f7b07312
ensure postproc finalize() is called during C-c or crash (#355) 2019-07-27 11:14:52 +02:00
Mike Fährmann
7b77ecc35a
fix paths for files without extension (#220) 2019-07-15 16:39:03 +02:00
Mike Fährmann
62097284fe
add 'download' option (#220) 2019-07-14 18:48:18 +02:00
Mike Fährmann
fe7805de7c
improve attribute access in DownloadJob.handle_url()
Storing a value in a local variable and accessing it that way is faster
than going through 'self' if it is accessed more than once.
2019-07-13 21:42:07 +02:00
Mike Fährmann
f2000a69aa
implement 'image-unique' and 'chapter-unique' options (#303)
The default value for both is 'false', i.e. duplicate URLs are NOT
ignored.

The previous behavior was to always ignore duplicate URLs to make
'--abort-on-skip' work properly when new images were added to the
beginning of a collection while gallery-dl is running.
2019-06-29 22:50:17 +02:00
Mike Fährmann
ee4d7c3d89
update downloader.find() and related code
Instead of replacing 'https' with 'http' for every URL in
'get_downloader()', this now only happens once during downloader
initialization. Also add unit tests.
2019-06-20 16:59:44 +02:00
Mike Fährmann
523ebc9b0b
Fix serialization of 'datetime' objects in '--write-metadata'
Simplified universal serialization support in json.dump() can be achieved
by passing 'default=str', which was already the case in DataJob.run()
for -j/--dump-json, but not for the 'metadata' post-processor.

This commit introduces util.dump_json() that (more or less) unifies the
JSON output procedure of both --write-metadata and --dump-json.

(#251, #252)
2019-05-09 16:49:22 +02:00
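Passing `default=str` to `json.dump()` makes any otherwise unserializable value, such as a `datetime`, fall back to its `str()` form. A minimal sketch of a shared helper along the lines of `util.dump_json()`. The parameter list and defaults here are assumptions, not necessarily the real signature.

```python
import datetime
import io
import json
import sys


def dump_json(obj, fp=sys.stdout, ensure_ascii=True, indent=4):
    """Serialize 'obj' as JSON, converting unsupported types via str()."""
    json.dump(obj, fp, ensure_ascii=ensure_ascii, indent=indent,
              default=str, sort_keys=True)
    fp.write("\n")


# Example: a datetime value is written as its str() form instead of
# raising TypeError
buffer = io.StringIO()
dump_json({"date": datetime.datetime(2019, 5, 9, 16, 49, 22)}, buffer)
```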
Mike Fährmann
b09a8184ca
move TestJob into test module; test _extractor values 2019-02-17 18:18:31 +01:00
Mike Fährmann
ae353ed3b0
provide "extractor" and "job" keys for logging output
This allows for stuff like "{extractor.url}" and "{extractor.category}"
in logging format strings.
Accessing 'extractor' and 'job' in any way will return "None" if those
fields aren't defined, i.e. in general logging messages.
2019-02-14 11:09:58 +01:00
Mike Fährmann
89ee8cd7e4
filter "private" kwdict entries 2019-02-13 13:22:11 +01:00
Mike Fährmann
61741d7333
provide type information for Queue messages
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
2019-02-12 21:32:32 +01:00
Mike Fährmann
277b52101a
add 'category-transfer' option
[ci skip]
2019-01-19 20:28:19 +01:00
Mike Fährmann
5f38ac9609
[postprocessor:exec] add a better error message (#155) 2019-01-13 13:59:11 +01:00
Mike Fährmann
0225d90078
add exception name and traceback for OSErrors 2018-12-04 19:24:50 +01:00
Mike Fährmann
fb53b5dd55
fix control+c during -j and range tests 2018-11-25 18:54:05 +01:00
Mike Fährmann
13cb270326
set target directory before postprocessor init (fixes #126) 2018-11-21 22:21:26 +01:00
Mike Fährmann
b828473aa3
retry HTTP requests for more exception classes 2018-11-19 15:49:13 +01:00
Mike Fährmann
c47482b110
smaller changes, missing docs, etc.
- make 'netrc' extractor-specific
- rename 'downloader.enable' to 'enabled'
- document 'downloader.ytdl.format'
- consistent newlines in configuration.rst
2018-11-16 18:18:07 +01:00
Mike Fährmann
3c25fa2dad
update build_testresult_db.py script 2018-11-15 22:58:14 +01:00
Mike Fährmann
8ef84a6823
add option to enable/disable specific downloader modules
... and write URLs with no (active) downloader to unsupported-file
2018-11-13 18:06:36 +01:00
Mike Fährmann
d3d7f01543
add 'prepare()' step for post-processors
This allows post-processors to modify the destination path before
checking if a file already exists.
2018-10-18 22:32:03 +02:00
Mike Fährmann
6ed629f2b6
allow specifying number of skips before abort/exit (closes #115)
In addition to 'abort' and 'exit', it is now possible to specify
'abort:N' and 'exit:N' (where N is any integer) as value for 'skip'
to abort/exit after consecutively skipping N downloads.
2018-10-13 17:21:55 +02:00
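Supporting 'abort:N' and 'exit:N' needs two pieces: parsing the option value, and counting consecutive skips that reset on every real download. A minimal sketch, assuming a plain 'abort'/'exit' means N = 1. `parse_skip` and `SkipCounter` are hypothetical names.

```python
def parse_skip(value):
    """Split a 'skip' option value into (action, limit).

    'abort:N' / 'exit:N' -> (action, N); plain 'abort' / 'exit' -> (action, 1);
    any other value (True, False, ...) -> (value, None).
    """
    if isinstance(value, str):
        action, sep, count = value.partition(":")
        if action in ("abort", "exit"):
            return action, int(count) if sep else 1
    return value, None


class SkipCounter:
    """Signal the configured action after 'limit' consecutive skips."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def skipped(self):
        # Returns True once the consecutive-skip limit is reached
        self.count += 1
        return self.count >= self.limit

    def downloaded(self):
        # A successful download breaks the streak
        self.count = 0
```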