gallery-dl

mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-26 04:32:51 +01:00

Author	SHA1	Message	Date
Mike Fährmann	41249f3ead	improve extractor.get_downloader()	2018-09-05 18:17:16 +02:00
Mike Fährmann	712b58a93b	[postprocessor] add black-/whitelist options Each post-processor config dict now supports a list of extractor categories for which it should/shouldn't be active for. For example: "postprocessors": [ {"name": "classify", "whitelist": ["tumblr", "deviantart"], ... } ]	2018-09-03 14:53:43 +02:00
Mike Fährmann	4313c95bc9	improve error message for OAuth2 authentication	2018-08-11 23:54:25 +02:00
Mike Fährmann	973cf98e88	fix download skip for files without extension	2018-06-27 17:16:07 +02:00
Mike Fährmann	2403c405e3	Merge branch 'postprocessor'	2018-06-08 17:43:11 +02:00
Mike Fährmann	baccf8a958	improve postprocessor handling - add pathfmt argument for __init__() - add finalization step - add option to keep or delete zipped files	2018-06-08 17:39:02 +02:00
Mike Fährmann	7646bdbcfd	improve postprocessor initialization code	2018-06-07 22:29:54 +02:00
Mike Fährmann	821535b458	adjust PathFormat class	2018-06-06 20:17:17 +02:00
Mike Fährmann	2df1a15fb8	add '-s/--simulate' to run data extraction without download Useful for quick testing (even though -g and -j kind of do the same) and to fill a download archive without actually downloading the files. -s does the same as the default behaviour, except downloading stuff. Maybe it should get a more fitting name, as it does actually write to disk (cache, archive)?	2018-05-25 16:07:18 +02:00
Mike Fährmann	76c32d58e5	[postprocessor] initial code	2018-05-22 14:59:22 +02:00
Mike Fährmann	8bf3cdd82b	implement logging options Standard logging to stderr, logfiles, and unsupported URL files (which are now handled through the logging module) can now be configured by setting their respective option keys (log, logfile, unsupportedfile) to a dict and specifying the following options; - format: format string for logging messages available keys: see [1] default: "[{name}][{levelname}] {message}" - format-date: format string for {asctime} fields in logging messages available keys: see [2] default: "%Y-%m-%d %H:%M:%S" - level: the lowercase levelname until which the logger should activate; available levels are debug, info, warning, error, exception default: "info" - path: path of the file to be written to - mode: 'mode' argument when opening the specified file can be either "w" to truncate the file or "a" to append to it (see [3]) If 'output.log', '.logfile', or '.unsupportedfile' is a string, it will be interpreted, as it has been, as the filepath (or as format string for .log) [1] https://docs.python.org/3/library/logging.html#logrecord-attributes [2] https://docs.python.org/3/library/time.html#time.strftime [3] https://docs.python.org/3/library/functions.html#open	2018-05-01 17:54:52 +02:00
Mike Fährmann	9fb82e6b43	apply expand_path() to archive paths	2018-03-08 18:06:39 +01:00
Mike Fährmann	f970a8f13c	fix adding keys to download archive when using skip=false	2018-02-13 23:45:30 +01:00
Mike Fährmann	be3ea4425d	test archive-id creation and uniqueness	2018-02-12 23:02:09 +01:00
Mike Fährmann	3cec533c28	Merge branch 'archive'	2018-02-12 18:07:58 +01:00
Mike Fährmann	4d2fadfb6f	restore skip actions with download archive	2018-02-12 16:56:45 +01:00
Mike Fährmann	7f7c16ae37	add option to specify additional key-value pairs	2018-02-08 23:10:58 +01:00
Mike Fährmann	8c3b713362	rework DownloadJob.handle_url(); include archive functionality todo: "abort" and "exit" skip modes if download is skipped because of archive	2018-02-01 20:49:41 +01:00
Mike Fährmann	db7f04dd97	emit log messages on download failure and when retrying with fallback URLs	2018-01-28 18:44:10 +01:00
Mike Fährmann	27fce6f600	fix UrlJob behavior	2018-01-23 15:42:26 +01:00
Mike Fährmann	b837420291	fix minor urllist issues	2018-01-19 22:54:15 +01:00
Mike Fährmann	9d69401391	initial support for multiple URLs per image	2018-01-17 22:08:19 +01:00
Mike Fährmann	6174a5c4ef	[download] adjust filename extension on filetype mismatch (closes #63)	2018-01-17 18:37:06 +01:00
Mike Fährmann	1a70857a12	update extractor-unittest capabilities - "count" can now be a string defining a comparison in the form of '<operator> <value>', for example: '> 12' or '!= 1'. If its value is not a string, it is assumed to be a concrete integer as before. - "keyword" can now be a dictionary defining tests for individual keys. These tests can either be a type, a concrete value or a regex starting with "re:". Dictionaries can be stacked inside each other. Optional keys can be indicated with a "?" before their name. For example: "keyword": { "image_id": int, "gallery_id": 123, "name": "re:pattern", "user": { "id": 321, }, "?optional": None, }	2017-12-30 19:05:37 +01:00
Mike Fährmann	88bb0798fd	delay initialization of PathFormat objects This allows the DeviantArt group-check to be moved inside the Extractor.items() method which in turn allows for better exception handling. As a new general rule: Never raise exceptions during extractor initialization.	2017-12-29 22:15:57 +01:00
Mike Fährmann	9d73ed4772	fix issue with using 'skip()' when a filter is present calling skip() skips over unfiltered items and does not apply the filter expression to them, which is not what should happen	2017-12-27 22:09:10 +01:00
Mike Fährmann	291369eab2	various smaller changes/additions	2017-12-06 21:45:56 +01:00
Mike Fährmann	4fb6803fa6	add option to sleep before each download	2017-12-04 17:33:10 +01:00
Mike Fährmann	6c9da67581	apply selection options (filter, range) when using '-j'	2017-11-18 17:35:57 +01:00
Mike Fährmann	27c026543f	re-enable download unit tests	2017-10-25 12:55:36 +02:00
Mike Fährmann	2e982f56af	use 'Content-Length' to determine incomplete downloads (#29)	2017-10-20 18:56:18 +02:00
Mike Fährmann	2ef3c35c98	smaller textual changes - swapped doc for deviantart.mature and .original - updated gallery-dl.conf - "transferred" -> "delegated"	2017-10-09 23:23:19 +02:00
Mike Fährmann	0386503c80	fix (sub)category-transfer for DownloadJob instances (#41) ... and extend "parent" parameters to TestJob- and DataJob-classes as well.	2017-10-06 15:38:35 +02:00
Mike Fährmann	b319f4bab3	smaller code and text changes	2017-10-01 18:23:40 +02:00
Mike Fährmann	26a866e7d8	implement (sub)category-transfer between extractors (#41) ImageFap- and all Manga-Extractors will transfer their (sub)category values to other extractors instantiated by them, which will in turn allow those to use options set for their parents. Example: ImagefapGalleryExtractors will use options set under extractor.imagefap.user, if (and only if) they have been instantiated by an ImagefapUserExtractor; and options from extractor.imagefap.gallery otherwise.	2017-09-26 21:05:11 +02:00
Mike Fährmann	9c138dfc1f	[common] detect empty HTTP response bodies	2017-09-26 16:49:58 +02:00
Mike Fährmann	0dedbe759c	enable '--chapter-filter' The same filter infrastructure that can be applied to image URLs now also works for manga chapters and other delegated URLs. TODO: actually provide any metadata (currently supported is only deviantart and imagefap).	2017-09-12 16:19:00 +02:00
Mike Fährmann	5704c709fa	apply filter before range	2017-09-09 14:51:31 +02:00
Mike Fährmann	9b21d3f13c	add '--filter' command-line option This allows for image filtering via Python expressions by the same metadata that is also used to build filenames (--list-keywords). The usually shunned eval() function is used to evaluate filter-expressions, but it seemed quite appropriate in this case and shouldn't introduce any new security issues, as any attacker that could do > gallery-dl --filter "delete-everything()" ... could as well do > python -c "delete-everything()"	2017-09-08 17:52:00 +02:00
Mike Fährmann	268cfa3cfe	filter duplicate URLs (#36) Duplicate URLs might occur if, for example, an artist adds another image to his gallery while an extractor is running and images are being downloaded on sites like pixiv/nijie/hentaifoundry. The next image on the next page will have already been downloaded and will cause a premature end if '--abort-on-skip' is being used.	2017-09-06 17:08:50 +02:00
Mike Fährmann	47bcf53ec1	implement support for additional unit test result types - "pattern" matches all resulting URLs against the given regex - "count" allows to specify the amount of returned URLs	2017-08-25 22:01:14 +02:00
Mike Fährmann	ae2d61e5b3	handle format string exceptions separately	2017-08-11 21:48:37 +02:00
Mike Fährmann	3c9f190757	extend output of --list-keywords	2017-08-10 17:36:21 +02:00
Mike Fährmann	cfa479fab5	update error message for unspecified exceptions - ask user to report unexpected errors, which usually indicate extractor failure - handle OSErrors separately (permissions, disk full, etc) - revert `30eef52`	2017-08-10 16:35:46 +02:00
Mike Fährmann	915a0137de	improve 'extractor.request' - add 'fatal' argument - improve internal logic and flow - raise known exception on error - update exception hierarchy	2017-08-05 16:11:46 +02:00
Mike Fährmann	58e95a7487	share extractor and downloader sessions There was never any "good" reason for the strict separation between extractors and downloaders. This change allows for reduced resource usage (probably unnoticeable) and fewer lines of code at the "cost" of tighter coupling.	2017-06-30 19:38:14 +02:00
Mike Fährmann	c921b4f32a	code cleanup and fixing tests	2017-06-02 09:10:58 +02:00
Mike Fährmann	25bcdc8aa9	add `--write-unsupported` option (#15)	2017-05-27 16:16:57 +02:00
Mike Fährmann	99b72130ee	[reddit] enable recursion (#15) reddit extractors now recursively visit other submissions/posts linked to in the initial set of submissions. This behaviour can be configured via the 'extractor.reddit.recursion' key in the configuration file or by `-o recursion=<value>`. Example: {"extractor": { "reddit": { "recursion": <value> }}} Possible values: * -1 - infinite recursion (don't do this) * 0 - recursion is disabled (default) * 1 and higher - maximum recursion level	2017-05-26 17:01:27 +02:00
Mike Fährmann	ae686c4c08	run queue items immediately	2017-05-24 15:15:06 +02:00
Mike Fährmann	30eef527d8	update output logic on error [ci skip]	2017-05-23 20:12:57 +02:00
Mike Fährmann	e425243b1e	[reddit] some small fixes - filter or complete some URLs - remove the 'nofollow:' scheme before printing URLs - (#15)	2017-05-23 11:48:00 +02:00
Mike Fährmann	a90c6acc9c	code cleanup + fixes	2017-05-18 15:18:18 +02:00
Mike Fährmann	4c88c0d496	rework the output format for --list-keywords	2017-05-15 18:30:47 +02:00
Mike Fährmann	13dc5d72bc	update some extractors to use https	2017-04-20 13:32:40 +02:00
Mike Fährmann	5af35ea150	add -v/--verbose option and reduce error verbosity (#12)	2017-04-18 11:38:48 +02:00
Mike Fährmann	b43cd88101	add '-j/--dump-json' option this outputs the extractor-results in JSON format rather than downloading files	2017-04-12 18:43:41 +02:00
Mike Fährmann	841fd50242	move code into util.py	2017-03-28 13:12:44 +02:00
Mike Fährmann	ed94d9b92d	fix/improve various things	2017-03-17 09:39:46 +01:00
Mike Fährmann	27ae152f57	use logging to report errors	2017-03-11 01:47:57 +01:00
Mike Fährmann	7a9d66fbce	implement basic way to tell extractors to skip ahead	2017-03-03 17:26:50 +01:00
Mike Fährmann	2fa575b273	restore exception-testing to its old form	2017-02-27 23:05:08 +01:00
Mike Fährmann	40be4933b8	fix exception based tests	2017-02-26 02:06:56 +01:00
Mike Fährmann	24f41e13b3	move some exception handling code	2017-02-25 23:53:31 +01:00
Mike Fährmann	6208d9dd79	implement '--images' and '--chapters' options - the former '--items' has been renamed to '--chapters' - #6	2017-02-23 21:51:29 +01:00
Mike Fährmann	2a32b12043	add '--items' option this allows to specify which manga-chapters/comic-issues to download when using gallery-dl on a manga/comic URL	2017-02-20 22:02:49 +01:00
Mike Fährmann	3bca866185	rework the '-g' cmdline option the amount of how often the -g option is given now determines up until what level URLs are resolved. example: $ gallery-dl -g http://kissmanga.com/Manga/Dropout http://kissmanga.com/Manga/Dropout/Ch-000---Oneshot-?id=145847 - when applied to a manga-extractor, specifying the -g option once will now print a list of all chapter URLs $ gallery-dl -gg http://kissmanga.com/Manga/Dropout http://2.bp.blogspot.com/.../000.png http://2.bp.blogspot.com/.../001.png ... - specifying it twice (or even more often) will go a level deeper and print the image URLs found in those chapters	2017-02-17 22:18:16 +01:00
Mike Fährmann	4f123b8513	code adjustments according to pep8	2017-01-30 19:40:15 +01:00
Mike Fährmann	29692c5784	get extension from Content-Type header if not provided	2016-09-30 12:32:48 +02:00
Mike Fährmann	1134339c1f	Merge branch 'category'	2016-09-25 17:52:55 +02:00
Mike Fährmann	f32cf28758	enable long pathnames on windows (#4)	2016-09-25 09:30:06 +02:00
Mike Fährmann	581daebc4b	remove trailing spaces from path segments (#4)	2016-09-24 11:29:25 +02:00
Mike Fährmann	a347d50ef5	add (sub)category keyword automatically	2016-09-24 10:45:11 +02:00
Mike Fährmann	406add217c	print urls recursively	2016-08-11 13:20:21 +02:00
Mike Fährmann	6f7f29d684	rename a few files	2016-07-14 14:25:56 +02:00
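
The black-/whitelist option from commit 712b58a93b amounts to a per-category activation check. The Python sketch below assumes each post-processor config is a dict with optional "whitelist" or "blacklist" lists of category names. The helper `is_active()`, the "zip" entry, and the rule that a whitelist wins when both keys are present are illustrative assumptions, not gallery-dl's actual code.

```python
def is_active(pp_config, category):
    """Return True if a post-processor config applies to `category`.

    Hypothetical helper; assumes "whitelist" takes precedence over
    "blacklist" when both keys are present.
    """
    whitelist = pp_config.get("whitelist")
    if whitelist is not None:
        return category in whitelist      # only the listed categories
    blacklist = pp_config.get("blacklist")
    if blacklist is not None:
        return category not in blacklist  # everything except the listed ones
    return True                           # no list: active everywhere


postprocessors = [
    {"name": "classify", "whitelist": ["tumblr", "deviantart"]},
    {"name": "zip", "blacklist": ["pixiv"]},
]

for category in ("tumblr", "pixiv"):
    names = [pp["name"] for pp in postprocessors if is_active(pp, category)]
    print(category, names)
```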
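
Commit 8bf3cdd82b maps its logging options onto Python's `logging` module. One way to turn such an options dict into a handler is sketched below. `build_handler()` is a hypothetical name, mapping "exception" to `logging.ERROR` is an assumption (it is the level `Logger.exception()` logs at), the default mode "w" is assumed, and the plain-string shorthand described in the commit is not handled.

```python
import logging
import os
import tempfile

# Levels named in the commit; "exception" -> ERROR is an assumption.
LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
    "exception": logging.ERROR,
}


def build_handler(opts):
    """Build a handler from a dict with "format", "format-date", "level",
    "path" and "mode" keys (hypothetical helper)."""
    path = opts.get("path")
    if path:
        # "w" truncates, "a" appends; "w" as the default is assumed.
        handler = logging.FileHandler(path, mode=opts.get("mode", "w"), encoding="utf-8")
    else:
        handler = logging.StreamHandler()  # defaults to sys.stderr
    handler.setFormatter(logging.Formatter(
        fmt=opts.get("format", "[{name}][{levelname}] {message}"),
        datefmt=opts.get("format-date", "%Y-%m-%d %H:%M:%S"),
        style="{",  # the format strings use str.format()-style fields
    ))
    handler.setLevel(LEVELS[opts.get("level", "info")])
    return handler


# Usage: only records at "warning" or above reach the file.
path = os.path.join(tempfile.mkdtemp(), "gallery-dl.log")
logger = logging.getLogger("example")
logger.setLevel(logging.DEBUG)
handler = build_handler({"path": path, "mode": "w", "level": "warning"})
logger.addHandler(handler)
logger.info("not written")
logger.warning("written")
handler.close()
```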
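
Commits 29692c5784 and 6174a5c4ef derive or correct a file extension from the `Content-Type` header. A minimal sketch, assuming a small hand-written MIME table (not gallery-dl's real list) and the hypothetical helpers `extensions_for()` and `choose_extension()`:

```python
# Illustrative MIME table (assumption): preferred extension first, then aliases.
MIME_EXTENSIONS = {
    "image/jpeg": ("jpg", "jpeg"),
    "image/png": ("png",),
    "image/gif": ("gif",),
    "video/webm": ("webm",),
}


def extensions_for(content_type):
    """Return accepted extensions for a Content-Type value, or () if unknown."""
    if not content_type:
        return ()
    # Drop parameters such as "; charset=..." and normalize case.
    mimetype = content_type.split(";", 1)[0].strip().lower()
    return MIME_EXTENSIONS.get(mimetype, ())


def choose_extension(current, content_type):
    """Keep `current` if it fits the Content-Type, else use the preferred one."""
    accepted = extensions_for(content_type)
    if not accepted:
        return current  # unknown type: nothing to compare against
    if current and current.lower() in accepted:
        return current
    return accepted[0]  # missing or mismatched extension
```

Keeping aliases per type avoids renaming a valid ".jpeg" file just because ".jpg" is the preferred spelling.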
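
The "count" and "pattern" test types from commits 1a70857a12 and 47bcf53ec1 could be checked roughly as follows. The operator set beyond the '>' and '!=' examples, and the use of `re.match()` (anchored at the start only), are assumptions; `check_count()` and `check_pattern()` are hypothetical names.

```python
import operator
import re

# Comparison operators accepted in a "count" string (set is an assumption).
OPERATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


def check_count(expected, actual):
    """Compare a result count against an int or a '<operator> <value>' string."""
    if isinstance(expected, str):
        op, _, value = expected.partition(" ")
        return OPERATORS[op](actual, int(value))
    return actual == expected  # non-string: concrete integer


def check_pattern(pattern, urls):
    """Return True if every URL matches `pattern` from its start."""
    regex = re.compile(pattern)
    return all(regex.match(url) for url in urls)
```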
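
The dictionary form of "keyword" tests (commit 1a70857a12) lends itself to a recursive checker. This sketch assumes a test is a type, a "re:" regex, a nested dict, or a value compared with `==`, and that a "?"-prefixed key may be absent but must pass its test when present; `check_keywords()` is hypothetical.

```python
import re


def check_keywords(tests, data):
    """Check a metadata dict against per-key tests (hypothetical helper)."""
    for key, test in tests.items():
        if key.startswith("?"):
            key = key[1:]
            if key not in data:
                continue  # optional key may be missing
        if key not in data:
            return False
        value = data[key]
        if isinstance(test, dict):
            # Nested dicts are tested recursively.
            if not isinstance(value, dict) or not check_keywords(test, value):
                return False
        elif isinstance(test, type):
            if not isinstance(value, test):
                return False
        elif isinstance(test, str) and test.startswith("re:"):
            if not isinstance(value, str) or not re.match(test[3:], value):
                return False
        elif value != test:
            return False
    return True


# The example from the commit message, as valid Python.
TESTS = {
    "image_id": int,
    "gallery_id": 123,
    "name": "re:pattern",
    "user": {"id": 321},
    "?optional": None,
}
```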
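
Commit 2e982f56af uses 'Content-Length' to detect incomplete downloads. The comparison reduces to the sketch below. `is_incomplete()` is hypothetical, and a plain dict stands in for a response's header mapping (real HTTP header lookups are case-insensitive, as in `requests`).

```python
def is_incomplete(headers, received):
    """Return True if fewer bytes arrived than Content-Length announced.

    A missing or malformed header means completeness cannot be judged
    this way, so False is returned (hypothetical helper).
    """
    value = headers.get("Content-Length")
    if value is None:
        return False
    try:
        expected = int(value)
    except ValueError:
        return False
    return received < expected
```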
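
The (sub)category transfer from commit 26a866e7d8 can be modelled by copying the parent's values at construction time, so config lookups resolve under the parent's keys. The `Extractor` dataclass and `lookup()` below are hypothetical stand-ins, not gallery-dl's classes, and "skip" is only an example option key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Extractor:
    """Hypothetical stand-in for an extractor's config-relevant fields."""
    category: str
    subcategory: str
    parent: Optional["Extractor"] = None

    def __post_init__(self):
        # Transfer: a child created by a parent adopts the parent's
        # (sub)category, so it reads the parent's options.
        if self.parent is not None:
            self.category = self.parent.category
            self.subcategory = self.parent.subcategory

    def config_path(self):
        return ("extractor", self.category, self.subcategory)


def lookup(config, path, key, default=None):
    """Walk nested dicts along `path` and return `key`, or `default`."""
    node = config
    for part in path:
        node = node.get(part, {})
    return node.get(key, default)


config = {"extractor": {"imagefap": {
    "user": {"skip": False},
    "gallery": {"skip": True},
}}}
user = Extractor("imagefap", "user")
child = Extractor("imagefap", "gallery", parent=user)  # created by the user extractor
standalone = Extractor("imagefap", "gallery")
```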
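
Commits 9b21d3f13c and 5704c709fa evaluate `--filter` expressions with `eval()` against each item's metadata and apply the filter before the range. The sketch below reads "filter before range" as: range positions count only items that passed the filter. It assumes a 1-based inclusive range given as `first`/`last` (the actual range syntax is not described here) and uses a hypothetical `select()` helper.

```python
def select(items, filter_expr=None, first=1, last=None):
    """Yield items passing `filter_expr`, then keep positions first..last.

    Positions are 1-based, inclusive, and counted only over items that
    passed the filter (hypothetical helper).
    """
    code = compile(filter_expr, "<filter>", "eval") if filter_expr else None
    position = 0
    for item in items:
        # The item's metadata keys become names inside the expression.
        if code is not None and not eval(code, {}, item):
            continue
        position += 1
        if position < first:
            continue
        if last is not None and position > last:
            break  # past the range: stop consuming items
        yield item


images = [{"num": n, "width": w} for n, w in enumerate([500, 1200, 1600, 800, 2000], 1)]
wide = select(images, "width >= 1000", first=2, last=3)
print([img["num"] for img in wide])  # → [3, 5]
```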
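
Duplicate-URL filtering (commit 268cfa3cfe) only needs a set of URLs seen so far, preserving first-seen order. `unique_urls()` is a hypothetical name:

```python
def unique_urls(urls):
    """Yield each URL once, in first-seen order (hypothetical helper)."""
    seen = set()
    for url in urls:
        if url in seen:
            continue  # e.g. an image pushed onto the next page mid-run
        seen.add(url)
        yield url
```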
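
The reddit recursion levels from commit 99b72130ee (-1 unlimited, 0 disabled, n a maximum depth) map onto a breadth-first walk with a depth limit. `crawl()` and the `links` mapping are hypothetical, and the visited set is an added assumption that also keeps -1 from looping forever on cyclic links.

```python
from collections import deque


def crawl(start, links, recursion=0):
    """Return submissions reachable from `start` within the recursion limit.

    `links` maps a submission to the submissions it links to.
    recursion: -1 = unlimited, 0 = disabled, n > 0 = maximum depth.
    """
    visited = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        current, depth = queue.popleft()
        if recursion >= 0 and depth >= recursion:
            continue  # maximum recursion level reached
        for target in links.get(current, ()):
            if target not in visited:
                visited.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order
```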
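
Commit 3bca866185 makes the number of `-g` flags the depth to which URLs are resolved. A sketch over a hypothetical `children` mapping (manga URL to chapter URLs, chapter URL to image URLs), using placeholder strings instead of real URLs; `list_urls()` is not gallery-dl's API.

```python
def list_urls(url, children, level):
    """Return URLs below `url`, resolving `level` levels deep (hypothetical).

    level 1 lists direct children (e.g. chapters); each extra level
    replaces a child with its own children if it has any.
    """
    result = []
    for child in children.get(url, ()):
        if level > 1 and child in children:
            result.extend(list_urls(child, children, level - 1))
        else:
            result.append(child)
    return result


children = {
    "manga": ["ch1", "ch2"],
    "ch1": ["ch1/000.png", "ch1/001.png"],
    "ch2": ["ch2/000.png"],
}
```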
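
Commits 581daebc4b and f32cf28758 strip trailing spaces from path segments and enable long pathnames on Windows. A common mechanism for the latter is the \\?\ extended-length prefix, which lifts the 260-character MAX_PATH limit; whether gallery-dl does exactly this is an assumption. `ntpath` is used so the sketch runs on any OS, and the helper names are hypothetical.

```python
import ntpath

EXTENDED_PREFIX = "\\\\?\\"  # the four characters \\?\


def clean_segment(segment):
    """Strip trailing spaces from a single path segment."""
    return segment.rstrip(" ")


def long_windows_path(path):
    """Prefix an absolute, backslash-separated Windows path with \\\\?\\.

    UNC paths (\\\\server\\share) need a different form and are not handled.
    """
    if path.startswith(EXTENDED_PREFIX):
        return path
    return EXTENDED_PREFIX + path


def build_path(directory, *segments):
    """Join cleaned segments onto `directory` and apply the long-path prefix."""
    cleaned = (clean_segment(s) for s in segments)
    return long_windows_path(ntpath.join(directory, *cleaned))
```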

125 Commits