Mike Fährmann
387fe415d5
unescape items in text.split_html()
2021-03-29 02:12:29 +02:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt , URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
Mike Fährmann
4a0c98bfc9
miscellaneous fixes and adjustments
2019-08-01 22:09:43 +02:00
Mike Fährmann
0d7e8be987
[dynastyscans] simplify image extractor
2019-04-27 13:24:30 +02:00
Mike Fährmann
9aa0bb5afe
[dynastyscans] encode "[]" in search queries
...
urllib3 1.25 classifies URLs with unencoded "[" or "]" as invalid
and raises an exception
2019-04-27 13:22:40 +02:00
Mike Fährmann
f2cf1c1d73
use 'text.extract_from()' in a few places
2019-04-21 15:19:20 +02:00
Mike Fährmann
937a802b49
[dynastyscans] add extractors for images and image searches
...
(closes #163 )
2019-02-18 12:25:52 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from an URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext "
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
...
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
2019-02-11 18:38:47 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
2018-04-20 14:53:21 +02:00
Mike Fährmann
7a412f5c32
implement generic manga-chapter extractor
2018-02-04 22:02:04 +01:00
Mike Fährmann
92027f67f9
use consistent names for URL constants
...
root := <scheme>://<host>
base_url := <root>/<common path>
2017-11-06 20:56:49 +01:00
Mike Fährmann
31ea6001e8
[dynastyscans] improve metadata and filename formats
2017-10-10 17:14:39 +02:00
Mike Fährmann
c921b4f32a
code cleanup and fixing tests
2017-06-02 09:10:58 +02:00
Mike Fährmann
13dc5d72bc
update some extractors to use https
2017-04-20 13:32:40 +02:00
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2
2017-02-01 00:53:19 +01:00
Mike Fährmann
56d810c896
update keyword hashes for tests
2016-09-25 17:28:46 +02:00
Mike Fährmann
19c2d4ff6f
remove explicit (sub)category keywords
2016-09-25 14:22:07 +02:00
Mike Fährmann
9d107b8e1b
[dynastyscans] add chapter extractor
2016-09-22 17:20:57 +02:00