gallery-dl

mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-23 11:12:40 +01:00

Author	SHA1	Message	Date
Mike Fährmann	05255f5be0	add 'default' argument to 'text.extr()'	2022-11-09 11:00:32 +01:00
Mike Fährmann	eb33e6cf2d	add 'text.extr()', a stripped-down version of text.extract() that: always returns a string (like 'extract_from'); only returns a string; does not deal with 'pos' arguments; is ~20% faster	2022-11-04 21:37:36 +01:00
Mike Fährmann	67bad04dda	[formatter] add 'g' conversion to sluGify a string (#2410)	2022-08-26 17:57:17 +02:00
Mike Fährmann	bddcec49f1	implement 'text.root_from_url()' use domain from input URL for kemono	2022-03-01 03:09:57 +01:00
Mike Fährmann	bc0e853d30	combine KeyError & IndexError to common base class LookupError	2022-02-11 00:42:49 +01:00
Mike Fährmann	bc868e7bb8	consider apparently long extensions as part of the filename (#1516)	2021-05-02 21:15:50 +02:00
Mike Fährmann	387fe415d5	unescape items in text.split_html()	2021-03-29 02:12:29 +02:00
Mike Fährmann	78fd63b8f0	remove 'text.clean_xml()' was not used anywhere	2021-03-28 04:05:16 +02:00
Mike Fährmann	8553b218d9	replace calls to 'os.path.splitext()' with 'str.rpartition()'. Makes functions that used it more than twice as fast and we can get rid of an import as well.	2021-03-28 04:01:27 +02:00
Mike Fährmann	a09f42f6b3	improve filename_from_url() performance. Manually extracting the part between the last '/' and '?' instead of relying on the standard library's 'urllib.parse.urlsplit()' increases performance by ~400%. urlsplit(): 3.64 secs per 1.000.000 iterations; partition(): 0.87 secs per 1.000.000 iterations	2020-10-23 00:14:06 +02:00
Mike Fährmann	37d71f6e09	strip microseconds in text.parse_datetime()	2020-06-17 21:40:16 +02:00
Mike Fährmann	6294e2c540	add 'text.ensure_http_scheme()'	2020-05-19 22:32:53 +02:00
Mike Fährmann	a0f4c295c0	add optional 'utcoffset' argument to 'parse_datetime()'	2020-04-11 02:05:00 +02:00
Mike Fährmann	f6c5edb76b	pre-compile regex pattern for remove_html() and split_html()	2020-03-13 23:31:54 +01:00
Mike Fährmann	b1bea8aaeb	add 'restrict-filenames' option (#348)	2019-07-23 17:41:24 +02:00
Mike Fährmann	1740086d8a	add 'repl' and 'sep' arguments to text.replace_html()	2019-07-17 14:48:24 +02:00
Mike Fährmann	b171befa87	implement 'parse_unicode_escapes()'	2019-06-16 21:47:24 +02:00
Mike Fährmann	2b1999476e	implement 'text.rextract()'	2019-05-28 21:03:41 +02:00
Mike Fährmann	2316e0ed3d	fix strptime workaround from `b0e85a4` Don't return a modified version of 'date_time' if strptime fails.	2019-05-25 23:22:26 +02:00
Mike Fährmann	b0e85a42e3	apply workaround from `4736912` in parse_datetime() itself	2019-05-09 21:53:17 +02:00
Mike Fährmann	d09864b581	implement text.parse_datetime()	2019-05-08 15:43:59 +02:00
Mike Fährmann	6264a46212	use 'utcfromtimestamp()' 'fromtimestamp()' converts its results to the local timezone and causes problems when running tests on a different machine.	2019-04-21 16:22:53 +02:00
Mike Fährmann	d670de0344	implement 'text.parse_timestamp()'	2019-04-21 15:28:27 +02:00
Mike Fährmann	21a7e395a7	implement convenience wrapper for text.extract functionality	2019-04-19 22:30:11 +02:00
Mike Fährmann	8f249f1d54	improve text.extract_iter() performance by roughly 40% through - inlining code - pre-calculating reused values - entering a try-except block only once	2019-04-18 23:37:17 +02:00
Mike Fährmann	5530871b5a	change results of text.nameext_from_url(). Instead of getting a complete 'filename' from a URL and splitting that into 'name' and 'extension', the new approach gets rid of the complete version and renames 'name' to 'filename'. (Using anything other than {extension} for a filename extension doesn't really work anyway.) Example: "https://example.org/path/filename.ext"; before: filename = filename.ext, name = filename, extension = ext; now: filename = filename, extension = ext	2019-02-14 16:07:17 +01:00
Mike Fährmann	e1d3e9a926	add 'ext_from_url' to text.py	2019-01-31 12:23:25 +01:00
Mike Fährmann	2d2953a5bf	add 'text.parse_float()' + cleanup in text.py	2019-01-29 16:46:21 +01:00
Mike Fährmann	ae9a37a528	implement text.split_html()	2018-05-27 15:00:41 +02:00
Mike Fährmann	cc36f88586	rename safe_int to parse_int; move parse_* to text module	2018-04-20 14:53:21 +02:00
Mike Fährmann	4ffa94f634	remove 'shorten_path()' and 'shorten_filename()'	2018-04-15 18:44:13 +02:00
Mike Fährmann	27eab4e467	rewrite text tests and improve functions - test more edge cases - consistently return an empty string for invalid arguments - remove the ungreedy-flag in 'remove_html()'	2018-04-15 18:13:46 +02:00
Mike Fährmann	e3f2bd4087	add tests for 'text.clean_xml()' and improve it	2018-04-14 22:07:01 +02:00
Mike Fährmann	6d8b191ea7	improve 'parse_query()' and add tests - another irrelevant micro-optimization ! - use urllib.parse.parse_qsl directly instead of parse_qs, which just packs the results of parse_qsl in a different data structure - reduced memory requirements since no additional dict and lists are created	2018-04-13 19:21:32 +02:00
Mike Fährmann	731ffd4986	improve text.filename_from_url() performance - urlsplit() is faster than urlparse() - rpartition() is faster than rindex() + slicing - new version is 2.3 times as fast	2018-02-18 16:50:07 +01:00
Mike Fährmann	f7cdfd4c25	add a simplified version of 'parse_qs' This version only returns a dict of plain string to string key-value pairs and ignores multiple values for the same query variable.	2017-08-24 20:55:58 +02:00
Mike Fährmann	e5f79ae839	[deviantart] add support for all media types - this includes - images - videos - flash-animations - journals - also renamed some of the extractors - User -> Gallery - Image -> Deviation	2017-05-10 16:45:45 +02:00
Mike Fährmann	ed94d9b92d	fix/improve various things	2017-03-17 09:39:46 +01:00
Mike Fährmann	619c74159a	[seiga] fix file extension and xml parsing - The file extension of the first image had been used for all further images - API responses can contain invalid characters, which cause the XML parser to fail (http://seiga.nicovideo.jp/user/illust/26377934 contains several \x08 characters)	2017-03-14 09:09:04 +01:00
Mike Fährmann	4f123b8513	code adjustments according to pep8	2017-01-30 19:40:15 +01:00
Mike Fährmann	8780abcc77	fix a small spelling error	2017-01-10 14:24:58 +01:00
Mike Fährmann	00074a71d7	several changes to make travis build work - fixed html.unescape not being available on Python3.3 - removed inconsistent test result - added username/password pairs for authenticating extractors	2017-01-10 13:41:00 +01:00
Mike Fährmann	91c446805b	replace platform.system() with os.name	2016-10-25 15:44:36 +02:00
Mike Fährmann	8a49a28d13	replace deprecated 'unescape' method	2016-02-18 15:54:58 +01:00
Mike Fährmann	99b4fbb081	implement text.extract_iter	2015-11-28 01:46:34 +01:00
Mike Fährmann	7fd284a705	always provide lowercase fileextensions	2015-11-16 17:40:05 +01:00
Mike Fährmann	ca523b9f64	add helper method to text module	2015-11-16 03:46:43 +01:00
Mike Fährmann	d0bebd9ce3	allow adding values to existing dict	2015-11-03 00:05:18 +01:00
Mike Fährmann	629133a27a	document text.extract	2015-11-02 15:52:26 +01:00
Mike Fährmann	692d0c95cc	reimplement text.extract_all	2015-11-02 15:51:32 +01:00
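
Commits eb33e6cf2d and 05255f5be0 describe 'text.extr()' only in words: it returns the string between two markers, always returns a string, takes no 'pos' argument, and accepts a 'default'. The following is a minimal sketch of a helper with that behaviour. The name `extr_sketch` and the exact exception handling are assumptions for illustration and are not taken from the repository.

```python
def extr_sketch(txt, begin, end, default=""):
    """Return the text between 'begin' and 'end' in 'txt'.

    Hypothetical illustration of the behaviour described in the commit
    messages; not the project's actual code.
    """
    try:
        # index() raises ValueError when a marker is missing
        first = txt.index(begin) + len(begin)
        return txt[first:txt.index(end, first)]
    except ValueError:
        # always hand back a string, never None or a tuple
        return default


if __name__ == "__main__":
    page = '<a href="/image/123.jpg">link</a>'
    print(extr_sketch(page, 'href="', '"'))
    print(repr(extr_sketch(page, 'src="', '"')))
    print(extr_sketch(page, 'src="', '"', "n/a"))
```

Because no position is tracked, each call scans from the start of the string. That keeps the helper simple when only one value is needed from a page.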


53 Commits
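
Several entries above (a09f42f6b3, 8553b218d9, 5530871b5a, 7fd284a705) describe pulling a filename and extension out of a URL with 'str.partition()' and 'str.rpartition()' instead of 'urllib.parse.urlsplit()' and 'os.path.splitext()'. The sketch below shows one way that approach could look. The function names are hypothetical and the code is an assumption based on the commit messages alone, not a copy of the project's implementation. It does not handle URL fragments or the long-extension rule from bc868e7bb8.

```python
def filename_from_url_sketch(url):
    """Return the part between the last '/' and the first '?' of 'url'."""
    # Cut off the query string first so a '/' inside it is ignored,
    # then keep everything after the last remaining '/'.
    return url.partition("?")[0].rpartition("/")[2]


def nameext_from_url_sketch(url):
    """Split the URL's filename into 'filename' and a lowercase 'extension'."""
    filename = filename_from_url_sketch(url)
    name, _, ext = filename.rpartition(".")
    if not name:
        # No dot, or only a leading dot: treat everything as the name.
        return {"filename": filename, "extension": ""}
    return {"filename": name, "extension": ext.lower()}


if __name__ == "__main__":
    print(nameext_from_url_sketch("https://example.org/path/filename.ext"))
    print(nameext_from_url_sketch("https://example.org/a/b.JPG?x=1/2"))
```

For the example URL from 5530871b5a, this yields 'filename' as "filename" and 'extension' as "ext", matching the "now" form described in that commit.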