Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
dd884b02ee
replace json.loads with direct calls to JSONDecoder.decode
2023-02-09 15:22:00 +01:00
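A minimal sketch of what the json.loads replacement in dd884b02ee could look like. The name json_loads is a hypothetical alias chosen for illustration; the commit message does not say what the project calls it.

```python
import json

# One shared decoder instance, created once at import time.
_decoder = json.JSONDecoder()

# Hypothetical alias: callers use this in place of json.loads().
json_loads = _decoder.decode

data = json_loads('{"title": "example", "pages": [1, 2, 3]}')
print(data["pages"])
```

Binding the bound method once skips the per-call wrapper in json.loads; any actual speed benefit is an assumption here and would need to be measured.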
Mike Fährmann
d2fc73f20b
[hentai2read] fix manga metadata extraction
...
and update tests
2022-10-24 16:31:01 +02:00
enduser420
fd19c4b228
[hentai2read] recognize '.' in chapter (#3089)
2022-10-24 15:53:51 +02:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
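The effect of dropping '&' from the delimiter character class in 968d3e8465 can be illustrated with a small sketch. Both patterns and the example.org URLs are hypothetical and not taken from any extractor module.

```python
import re

# Hypothetical patterns: only the trailing delimiter class differs.
OLD = re.compile(r"(?:https?://)?example\.org/gallery/(\d+)(?:[/?&#]|$)")
NEW = re.compile(r"(?:https?://)?example\.org/gallery/(\d+)(?:[/?#]|$)")

urls = (
    "https://example.org/gallery/123",
    "https://example.org/gallery/123/",
    "https://example.org/gallery/123?page=2",
    "https://example.org/gallery/123#top",
    "https://example.org/gallery/123&page=2",  # '&' directly after the path
)

for url in urls:
    print(url, bool(OLD.match(url)), bool(NEW.match(url)))
```

Only the last URL is treated differently: the old pattern accepts '&' as a component delimiter, while the new one follows RFC 3986 and rejects it.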
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
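The result format described in 5530871b5a can be sketched as follows. This is an illustrative reimplementation under the assumption that the function fills a dict with 'filename' and 'extension'; it is not the project's actual text.nameext_from_url() code.

```python
from urllib.parse import unquote, urlsplit


def nameext_from_url(url, data=None):
    """Store 'filename' and 'extension' derived from 'url' in 'data'."""
    if data is None:
        data = {}
    # Last path segment, ignoring query string and fragment.
    name = unquote(urlsplit(url).path.rpartition("/")[2])
    filename, _, ext = name.rpartition(".")
    if filename:
        data["filename"], data["extension"] = filename, ext
    else:
        # No dot, or a leading dot only: keep the whole name as filename.
        data["filename"], data["extension"] = name, ""
    return data


print(nameext_from_url("https://example.org/path/filename.ext"))
```

With this shape, a format string only needs {filename} and {extension}; the former complete name can be rebuilt by joining the two with a dot.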
Mike Fährmann
32edf4fc7b
add '_extractor' info to manga extractor results
2019-02-13 13:23:36 +01:00
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
...
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
2019-02-11 18:38:47 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
2018-04-20 14:53:21 +02:00
Mike Fährmann
179bcdd349
adjust archive-ids
2018-02-13 04:50:45 +01:00
Mike Fährmann
cf147dfee9
[hentai2read] fix manga extraction
...
- site changed its HTML structure
2018-02-09 22:24:34 +01:00
Mike Fährmann
5b3c34aa96
use generic chapter-extractor in more modules
2018-02-07 12:36:39 +01:00
Mike Fährmann
377b78b3c9
[hentai2read] fix manga name extraction
2018-02-04 22:12:24 +01:00
Mike Fährmann
6b8e3003df
[hentai2read] ensure consistent extraction results
2017-12-03 02:34:35 +01:00
Mike Fährmann
9fc1d0c901
implement and use 'util.safe_int()'
...
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
2017-09-24 15:59:25 +02:00
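A minimal sketch of a helper matching the description in 9fc1d0c901. The default of 0 and the exact set of caught exceptions are assumptions; the commit message only states that no exception is raised and that a default value is accepted.

```python
def safe_int(value, default=0):
    """Convert 'value' to int, returning 'default' if that is not possible."""
    try:
        return int(value)
    except (ValueError, TypeError, OverflowError):
        # ValueError: non-numeric string, TypeError: unsupported type
        # such as None, OverflowError: float infinity.
        return default


print(safe_int("42"), safe_int("n/a"), safe_int(None, -1))
```

This keeps metadata extraction going when a scraped field is missing or malformed, at the cost of hiding the original error.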
Mike Fährmann
92c8a6cb01
[hentai2read] extract hmanga metadata
2017-09-20 13:28:57 +02:00
Mike Fährmann
6f30cf4c64
change keyword names to valid Python identifiers
...
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.
(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough)
2017-09-10 22:20:47 +02:00
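Why the rename in 6f30cf4c64 matters for filter expressions can be shown with a small sketch. The evaluate_filter helper is hypothetical and assumes that filters are Python expressions evaluated with the keyword dict as local names; the commit message does not describe the actual mechanism.

```python
def evaluate_filter(expression, keywords):
    """Evaluate a Python expression with 'keywords' available as names."""
    code = compile(expression, "<filter>", "eval")
    return eval(code, {"__builtins__": {}}, dict(keywords))


# An underscore name is a valid identifier and can be used directly.
print(evaluate_filter("gallery_id > 10000", {"gallery_id": 12345}))

# 'gallery-id' parses as the subtraction 'gallery - id', so it fails.
try:
    evaluate_filter("gallery-id > 10000", {"gallery-id": 12345})
except NameError as exc:
    print("NameError:", exc)
```

The value stored under 'gallery-id' is still present in the dict, but reaching it would require the locals()["NAME"] detour mentioned in the commit message.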
Mike Fährmann
e61a3a56d1
[hentai2read] fix and update keywords
...
Added the "author" keyword and changed the name of a few others to be
consistent with other manga/chapter extractors.
2017-08-22 15:01:47 +02:00
Mike Fährmann
6950708e52
[hentaicdn] use HTTPS
2017-08-02 18:31:21 +02:00
Mike Fährmann
f226417420
simplify code by using a MangaExtractor base class
2017-05-20 11:27:43 +02:00
Mike Fährmann
f361cb13e0
[hentai2read] fix extraction
2017-02-26 02:25:36 +01:00
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2
2017-02-01 00:53:19 +01:00
Mike Fährmann
c155c7b94b
[hentai2read] put some common code in a base class
2016-10-05 09:19:09 +02:00
Mike Fährmann
56d810c896
update keyword hashes for tests
2016-09-25 17:28:46 +02:00
Mike Fährmann
19c2d4ff6f
remove explicit (sub)category keywords
2016-09-25 14:22:07 +02:00
Mike Fährmann
d7e168799d
consistent extractor naming scheme + docstrings
2016-09-12 10:34:31 +02:00
Mike Fährmann
1416e7f6f7
[hentai2read] fix parsing for new page layout
2016-04-20 08:25:06 +02:00
Mike Fährmann
595e5872d3
[hentai2read] add multi-chapter extractor
2016-02-20 06:49:35 +01:00
Mike Fährmann
f3dc8851c8
[hentai2read] add a couple more keywords
2016-02-20 06:39:13 +01:00
Mike Fährmann
20228a003f
[hentai2read] add extractor
2016-02-19 15:24:49 +01:00