gallery-dl

mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-25 12:12:34 +01:00

Author	SHA1	Message	Date
Mike Fährmann	bb3e182562	overhaul session initialization: share adapter & connection pool across sessions with the same ssl options, ssl ciphers, and source address; simplify browser emulation to just a list of headers and ciphers	2022-01-31 23:12:08 +01:00
Mike Fährmann	6e0a6c484f	apply SPECIAL_EXTRACTORS only for blacklist settings as was the case before `010d65dc`	2022-01-06 21:09:30 +01:00
Mike Fährmann	010d65dcec	extend blacklist/whitelist syntax (#2025). Each entry in such a list can now also include a subcategory, '<category>:<subcategory>', and it is possible to use '*' or an empty string as placeholder: '*:<subcategory>', ':<subcategory>', '<category>:'. For example, "blacklist": "imgur,:tag,gfycat:user" or "blacklist": ["imgur", "*:tag", "gfycat:user"] will filter all 'imgur' extractors, all extractors with a 'tag' subcategory (e.g. https://danbooru.donmai.us/posts?tags=bonocho), and all 'gfycat' user extractors.	2021-11-23 20:31:43 +01:00
Mike Fährmann	cad85640de	move 'util.PathFormat' into its own 'path' module to prevent circular imports between 'formatter' and 'util'	2021-09-27 21:29:37 +02:00
Mike Fährmann	74145467dd	move 'util.Formatter' into its own 'formatter' module	2021-09-27 02:37:04 +02:00
Mike Fährmann	c9e6693530	allow specifying a minimum/maximum for 'sleep-*' options (#1835); for example, '"sleep-request": [5.0, 10.0]' to wait between 5 and 10 seconds between each HTTP request	2021-09-14 17:40:05 +02:00
Mike Fährmann	d79bcb6236	allow extractors to register a 'finalize()' method	2021-09-07 21:15:30 +02:00
Mike Fährmann	72c0cd30c7	do not return with a nonzero exit status when no results found; also change loglevel from 'warning' to 'info' (#1789)	2021-08-24 18:49:13 +02:00
Mike Fährmann	bd08ee2859	remove most 'yield Message.Version' statements; only leave them in oauth.py as noop results	2021-08-16 03:10:48 +02:00
Mike Fährmann	bdfdabf498	show warning if extractor doesn't yield any results (#1759 )	2021-08-16 02:49:36 +02:00
Mike Fährmann	d320ee6251	implement a 'fallback' option (closes #1770 )	2021-08-16 01:47:59 +02:00
Mike Fährmann	1b2f9050fb	rename all instances of 'kwds' to 'kwdict'	2021-07-20 20:21:19 +02:00
Mike Fährmann	b9783403d9	add 'url-metadata' option (#1659 , #1073 )	2021-07-14 03:08:49 +02:00
Mike Fährmann	e95f99882f	extend 'parent-metadata' functionality (#1687 , #1651 , #1364 )	2021-07-14 02:53:41 +02:00
Mike Fährmann	64986f9435	fix depth counter in UrlJob (regression from `adf4d661`). It would either stop at the first level (-g) or go infinitely deep (-G). Going down to, for example, level 3 with -ggg didn't work.	2021-06-26 00:30:03 +02:00
Mike Fährmann	83fc4c1098	update post processor config capabilities. This change makes it possible to specify just the name of a post processor in the "postprocessors" list instead of a dict with all of its options. The options for it will then be taken from inside the "postprocessor" block, similar to "extractor", "downloader", or "output" blocks. This makes it possible to, for example, override the default settings for --write-metadata by specifying a custom "metadata" block, or to set a custom post processor block ("cbz") and then use it by referencing just its name in "postprocessors" lists. { "postprocessor": { "metadata": { "name": "metadata", "event": "post", "filename": "{tweet_id|post_id|id}.json" }, "cbz": { "name" : "zip", "compression": "store", "extension" : "cbz" } } }	2021-06-05 14:11:16 +02:00
Mike Fährmann	3cbbefd4ed	support 'filter' option for post processors (#1460 )	2021-06-04 18:23:32 +02:00
Mike Fährmann	adf4d661b3	use '_extractor' info in UrlJobs	2021-05-19 15:52:30 +02:00
Mike Fährmann	b50b8e6cf4	refactor applying 'parent-…' options	2021-05-13 21:56:34 +02:00
Mike Fährmann	7ab8374385	add 'parent-skip' option (#1399 )	2021-05-13 16:40:04 +02:00
Mike Fährmann	c693db5b1a	add '"skip": "terminate"' option. Stops not only the current extractor/job, but all parent extractors/jobs as well.	2021-05-12 02:22:28 +02:00
Mike Fährmann	c5ca7905ce	add 'noop()' and 'identity()' functions	2021-05-04 19:27:17 +02:00
Mike Fährmann	5b4da4b4bf	reorder config access in Job constructor (#1111)	2021-04-27 15:12:59 +02:00
Mike Fährmann	b4ed7cb961	fix 'category-transfer' (#1111), broken since commit `055c32e0`	2021-04-19 00:55:44 +02:00
Mike Fährmann	a86ffb04bb	add 'output.fallback' option to enable/disable fallback URLs for -g/--get-urls	2021-04-12 02:00:41 +02:00
Mike Fährmann	a75e485461	add archive format to InfoJob output (#875 )	2021-04-07 21:50:16 +02:00
Mike Fährmann	bf241811dd	allow '_extractor' fields to be None or empty	2021-03-20 01:19:31 +01:00
Mike Fährmann	23641742a3	improve 'parent-directory' (#1364). Allow forwarding metadata from the top-level extractor to all children if 'parent-directory' is enabled for all extractors along the way. For example, 'reddit' -> 'gfycat' -> 'redgifs'	2021-03-14 17:19:57 +01:00
Mike Fährmann	df94182e11	implement 'parent-metadata' option (#1364); experimental, might not work as expected, etc.	2021-03-11 01:10:34 +01:00
Mike Fährmann	b6719becf1	ensure '-s/--simulate' always prints filenames (#1360 ) by assuming a potentially wrong filename extension in cases where the correct one would only get known after a download started	2021-03-07 22:38:20 +01:00
Mike Fährmann	c963741860	add '-E/--extractor-info' command-line option (#875 )	2021-03-02 23:59:56 +01:00
Mike Fährmann	65ca923b4e	fix 'whitelist' option for BaseExtractor instances	2021-02-15 21:58:33 +01:00
Mike Fährmann	56a8968435	remove 'Message.Metadata' (#866 )	2021-01-31 02:12:37 +01:00
Mike Fährmann	46323ae6ff	initialize 'hooks' as empty tuple (follow-up to `9c29fc4e`). Prevent a "race" between initializing 'pathfmt' and 'hooks', and receiving a signal in between (e.g. ctrl+c), which would then crash in 'handle_finalize()'.	2020-11-28 18:18:49 +01:00
Mike Fährmann	9c29fc4e55	always initialize DownloadJob.hooks (fixes #1135), and not just when any (potential) post processors are defined	2020-11-28 00:09:19 +01:00
Mike Fährmann	9fffa9c343	rework post processor callbacks	2020-11-19 02:29:06 +01:00
Mike Fährmann	f99c6031e0	apply post processor blacklists/whitelists to basecategories (#1103)	2020-11-17 02:02:31 +01:00
Mike Fährmann	a3ca2f6080	update fallback URL handling: remove Message.Urllist and use a '_fallback' field inside a kwdict	2020-10-16 01:09:55 +02:00
Mike Fährmann	fd20093c96	allow blacklist/whitelist to be empty lists/strings (#1051 )	2020-10-08 14:55:21 +02:00
Mike Fährmann	d5fa716d89	fix crash when using 'skip=false' and archive (fixes #1023). Separating the archive check from pathfmt.exists() in `b5243297` had some unintended side effects. It is also not possible to monkey-patch a dunder method like __contains__ because of the special method lookup that gets performed for them.	2020-09-23 19:07:40 +02:00
Mike Fährmann	231dd4c800	accumulate postprocessor objects (#994). Instead of one 'postprocessors' setting overwriting all others lower in the hierarchy, all postprocessors along the config path will now get collected into one big list. For example, '--mtime-from-date' will therefore no longer cause other postprocessor settings in a config file to get ignored.	2020-09-14 21:51:55 +02:00
Mike Fährmann	3afd362e2e	add 'sleep-extractor' option (closes #964 ) (would have been nice if this were possible without code duplication)	2020-09-12 21:04:47 +02:00
Mike Fährmann	c78aa17506	add general 'blacklist' and 'whitelist' options (#492 , #844 )	2020-09-11 13:17:12 +02:00
Mike Fährmann	5912727b88	support format string replacement fields in archive paths (closes #985)	2020-09-10 22:09:30 +02:00
Mike Fährmann	b5243297ff	write skipped files to archive (closes #550 )	2020-09-03 18:37:38 +02:00
Mike Fährmann	3f73cc6855	allow 'parent-directory' to work recursively (fixes #905 )	2020-07-29 00:31:23 +02:00
Mike Fährmann	d5bfb0b38c	set pseudo extension for Metadata messages (#865). This prevents pathfmt.filename from potentially being empty.	2020-07-04 22:14:39 +02:00
Mike Fährmann	1b3870a4be	flush after writing JSON in DataJob() (#727 ) … and remove the dead handle_finalize() method, which is never called since DataJob() overrides run().	2020-06-19 23:05:44 +02:00
Mike Fährmann	7e8a747c56	improve output of '-K' for parent extractors 2 (#825). This is what `bb882b8` was supposed to be, but I managed to not include those changes in the first commit …	2020-06-18 15:04:15 +02:00
Mike Fährmann	ece73b5b2a	make 'path' and 'keywords' available in logging messages. Wrap all loggers used by job, extractor, downloader, and postprocessor objects into a (custom) LoggerAdapter that provides access to the underlying job, extractor, pathfmt, and kwdict objects and their properties. __init__() signatures for all downloader and postprocessor classes have been changed to take the current Job object as their first argument, instead of the current extractor or pathfmt. (#574, #575)	2020-05-18 19:04:51 +02:00
Mike Fährmann	a1e739b96c	reuse connection adapters from parent extractors	2020-05-12 23:52:01 +02:00
Mike Fährmann	42f29c3e11	improve and simplify attribute access in DownloadJob.initialize()	2020-05-09 00:57:59 +02:00
Mike Fährmann	56f1c96168	implement 'parent-directory' option (#551 )	2020-01-29 18:32:37 +01:00
Mike Fährmann	37247dbaff	miscellaneous fixes	2020-01-19 22:53:06 +01:00
Mike Fährmann	0e9dc5c88e	fix AttributeError when accessing 'temppath' [ci skip]	2020-01-19 00:41:21 +01:00
Mike Fährmann	0b84068d84	remove temp files before downloading from fallback URLs; otherwise the next call to download() with a fallback URL could see the partially downloaded "remains" from the previous, failed download attempt and "continue" it, writing the second half of a potentially different version of that file.	2020-01-18 00:47:17 +01:00
Mike Fährmann	2d4887b75b	improve KeywordJob output for "parent" extractors (closes #548 )	2019-12-28 22:26:49 +01:00
Mike Fährmann	2e2fc7f0ad	prevent infinite recursion when spawning extractors (closes #489 )	2019-12-26 23:38:16 +01:00
Mike Fährmann	1921c127a5	make OSErrors during file downloads nonfatal (closes #512 ) … except ENOSPC (No space left on device), since there is no reason to continue downloading in that case. All other errors that would prevent downloading data and writing it to disk get already raised during directory creation and are therefore not checked here.	2019-12-19 18:34:05 +01:00
Mike Fährmann	63e6993716	merge 'bypost' functionality into metadata postprocessor	2019-12-16 17:19:23 +01:00
Gio	c0b9ad678d	Separate metadata from handle_url into handle_metadata, commenting	2019-12-09 16:02:15 -06:00
Gio	6ed4fc07ff	Don't print intentional metadata skips to the console.	2019-12-09 01:02:17 -06:00
Gio	cfc70a97ab	Added an additional channel for downloading the metadata of an entire post or gallery.	2019-12-09 00:56:27 -06:00
Mike Fährmann	f5604492c3	update interface of config functions	2019-11-24 00:42:28 +01:00
Mike Fährmann	3fc1e12949	[postprocessor:metadata] filter private entries i.e. keys starting with an underscore	2019-11-21 16:58:44 +01:00
Mike Fährmann	9e88e7a344	[postprocessor:exec] improve (#421, #413): add 'final' option; include job status in pp finalization; improve and extend documentation	2019-11-03 21:45:45 +01:00
Mike Fährmann	5af291ba5c	include failed downloads and child extractors in exit status	2019-10-29 15:56:54 +01:00
Mike Fährmann	322c2e7ed4	renaming variables mostly 'keyword(s)' to 'kwdict'	2019-10-29 15:46:35 +01:00
Mike Fährmann	4409d00141	embed error messages in StopExtraction exceptions	2019-10-28 16:39:49 +01:00
Mike Fährmann	c887493a80	overhaul exception stuff	2019-10-27 23:53:37 +01:00
Mike Fährmann	389d2d7e38	implement 'cookies-update' option (#445 )	2019-10-19 15:23:55 +02:00
Mike Fährmann	03bc8adfc7	[postprocessor:exec] run after file moved to target location (#421)	2019-10-06 23:12:22 +02:00
Mike Fährmann	776e9e073f	close archive on job completion (#417 )	2019-09-10 22:43:51 +02:00
Mike Fährmann	9178b54eae	handle errors when opening download archive file (#417 )	2019-09-10 16:44:47 +02:00
Mike Fährmann	682105b8ee	prevent crash when loading unavailable downloader (#405 )	2019-08-31 21:58:33 +02:00
Mike Fährmann	5f8621b29d	improve output of active post processor modules	2019-08-15 13:31:04 +02:00
Mike Fährmann	0bb873757a	update PathFormat class: change 'has_extension' from a simple flag/bool to a field that contains the original filename extension; rename 'keywords' to 'kwdict' and some other stuff as well; inline 'adjust_path()'; put enumeration index before filename extension (#306)	2019-08-12 21:40:37 +02:00
Mike Fährmann	8dc42bb178	implement 'enumerate' for 'extractor.skip' (#306 ) [ci skip]	2019-08-08 18:37:54 +02:00
Mike Fährmann	20f7b07312	ensure postproc finalize() is called during C-c or crash (#355 )	2019-07-27 11:14:52 +02:00
Mike Fährmann	7b77ecc35a	fix paths for files without extension (#220 )	2019-07-15 16:39:03 +02:00
Mike Fährmann	62097284fe	add 'download' option (#220 )	2019-07-14 18:48:18 +02:00
Mike Fährmann	fe7805de7c	improve attribute access in DownloadJob.handle_url(). Storing a value in a local variable and accessing it that way is faster than going through 'self' if it is accessed more than once.	2019-07-13 21:42:07 +02:00
Mike Fährmann	f2000a69aa	implement 'image-unique' and 'chapter-unique' options (#303). The default value for both is 'false', i.e. duplicate URLs are NOT ignored. The previous behavior was to always ignore duplicate URLs to make '--abort-on-skip' work properly when new images were added to the beginning of a collection while gallery-dl is running.	2019-06-29 22:50:17 +02:00
Mike Fährmann	ee4d7c3d89	update downloader.find() and related code. Instead of replacing 'https' with 'http' for every URL in 'get_downloader()', this now only happens once during downloader initialization. Also unit tests.	2019-06-20 16:59:44 +02:00
Mike Fährmann	523ebc9b0b	Fix serialization of 'datetime' objects in '--write-metadata'. Simplified universal serialization support in json.dump() can be achieved by passing 'default=str', which was already the case in DataJob.run() for -j/--dump-json, but not for the 'metadata' post-processor. This commit introduces util.dump_json() that (more or less) unifies the JSON output procedure of both --write-metadata and --dump-json. (#251, #252)	2019-05-09 16:49:22 +02:00
Mike Fährmann	b09a8184ca	move TestJob into test module; test _extractor values	2019-02-17 18:18:31 +01:00
Mike Fährmann	ae353ed3b0	provide "extractor" and "job" keys for logging output. This allows for stuff like "{extractor.url}" and "{extractor.category}" in logging format strings. Accessing 'extractor' and 'job' in any way will return "None" if those fields aren't defined, i.e. in general logging messages.	2019-02-14 11:09:58 +01:00
Mike Fährmann	89ee8cd7e4	filter "private" kwdict entries	2019-02-13 13:22:11 +01:00
Mike Fährmann	61741d7333	provide type information for Queue messages. Child extractors are now directly constructed with Extractor.from_url() if the extractor class is known beforehand, instead of using extractor.find() and searching through all possible extractor classes.	2019-02-12 21:32:32 +01:00
Mike Fährmann	277b52101a	add 'category-transfer' option [ci skip]	2019-01-19 20:28:19 +01:00
Mike Fährmann	5f38ac9609	[postprocessor:exec] add a better error message (#155 )	2019-01-13 13:59:11 +01:00
Mike Fährmann	0225d90078	add exception name and traceback for OSErrors	2018-12-04 19:24:50 +01:00
Mike Fährmann	fb53b5dd55	fix control+c during -j and range tests	2018-11-25 18:54:05 +01:00
Mike Fährmann	13cb270326	set target directory before postprocessor init (fixes #126 )	2018-11-21 22:21:26 +01:00
Mike Fährmann	b828473aa3	retry HTTP requests for more exception classes	2018-11-19 15:49:13 +01:00
Mike Fährmann	c47482b110	smaller changes, missing docs, etc.: make 'netrc' extractor-specific; rename 'downloader.enable' to 'enabled'; document 'downloader.ytdl.format'; consistent newlines in configuration.rst	2018-11-16 18:18:07 +01:00
Mike Fährmann	3c25fa2dad	update build_testresult_db.py script	2018-11-15 22:58:14 +01:00
Mike Fährmann	8ef84a6823	add option to enable/disable specific downloader modules ... and write URLs with no (active) downloader to unsupported-file	2018-11-13 18:06:36 +01:00
Mike Fährmann	d3d7f01543	add 'prepare()' step for post-processors. This allows post-processors to modify the destination path before checking if a file already exists.	2018-10-18 22:32:03 +02:00
Mike Fährmann	6ed629f2b6	allow specifying number of skips before abort/exit (closes #115). In addition to 'abort' and 'exit', it is now possible to specify 'abort:N' and 'exit:N' (where N is any integer) as value for 'skip' to abort/exit after consecutively skipping N downloads.	2018-10-13 17:21:55 +02:00
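
Note on d5fa716d89: the message says a dunder method like __contains__ cannot be monkey-patched because of special method lookup. The snippet below is a minimal sketch of that general Python behavior. The Archive class here is a hypothetical stand-in for illustration and is not gallery-dl's actual code.

```python
class Archive:
    """Hypothetical stand-in for an archive object (not gallery-dl's class)."""

    def __contains__(self, key):
        return False


archive = Archive()

# Assigning a replacement on the instance has no effect on the 'in' operator:
# implicit special method lookup goes through type(archive), not the instance.
archive.__contains__ = lambda key: True
instance_patch_result = "x" in archive  # still calls Archive.__contains__

# Replacing the method on the class itself does take effect.
Archive.__contains__ = lambda self, key: True
class_patch_result = "x" in archive
```

With the instance-level assignment the membership test still goes through the class's original method. Only the class-level replacement changes the result.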


231 Commits
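
Note on 523ebc9b0b: the message describes passing 'default=str' to json.dump() so that 'datetime' objects can be serialized. The snippet below is a rough sketch of that idea using only the standard library. The helper name dump_json_sketch and the sample metadata are assumptions for illustration; this is not the util.dump_json() introduced by that commit.

```python
import json
from datetime import datetime


def dump_json_sketch(obj):
    """Hypothetical helper: serialize obj, falling back to str() for unknown types."""
    return json.dumps(obj, default=str, sort_keys=True)


# Sample metadata, invented for this example.
metadata = {"id": 1, "date": datetime(2019, 5, 9, 16, 49, 22)}

# Without a 'default' callable, json cannot serialize datetime objects.
try:
    json.dumps(metadata)
    plain_failed = False
except TypeError:
    plain_failed = True

# With default=str, the datetime is written as its str() representation.
serialized = dump_json_sketch(metadata)
```

The trade-off of 'default=str' is that every non-JSON type is written as its string representation, so the original type is not recoverable on load.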