Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
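As a rough illustration of commit 6284731107, the class below shows what the simplified constants might look like. The class name, URL, and values are hypothetical, and the "before" comments are only inferred from the commit message, not taken from the actual code.

```python
import re


class ExampleChapterExtractor:
    """Hypothetical extractor; names and values are for illustration only."""

    # assumed before: pattern = [r"(?:https?://)?example\.org/chapter/(\d+)"]
    pattern = r"(?:https?://)?example\.org/chapter/(\d+)"

    # assumed before: directory_fmt = ["{category}", "{manga}", "c{chapter}"]
    directory_fmt = ("{category}", "{manga}", "c{chapter}")

    # assumed before: test = [("https://example.org/chapter/1", {"count": 10})]
    test = ("https://example.org/chapter/1", {"count": 10})


# A single pattern string can be passed to re.match() directly.
match = re.match(ExampleChapterExtractor.pattern, ExampleChapterExtractor.test[0])
```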
Mike Fährmann
0e46db6f45
rename some base classes
...
They shouldn't be called …Extractor if they don't have 'Extractor' as
their base class.
2019-02-08 11:43:40 +01:00
Mike Fährmann
8095f5f81a
[mangapark] fix manga title extraction
2019-01-28 18:04:42 +01:00
Mike Fährmann
217a0687ef
[behance] add 'collection' extractor (closes #157)
2019-01-19 18:11:20 +01:00
Mike Fährmann
66460337f1
[mangapark] fix extraction
2019-01-17 21:24:53 +01:00
Mike Fährmann
fa7fa2f8ff
[deviantart] update tests
2019-01-01 15:39:34 +01:00
Mike Fährmann
b7b5456a32
[kissmanga] use HTTPS
2018-12-30 14:04:46 +01:00
Mike Fährmann
98314aa04c
[mangapark] detect non-existent chapters
2018-12-27 21:41:50 +01:00
Mike Fährmann
f9ace0f4a3
[mangapark] fix manga extraction ... again
2018-12-26 18:56:57 +01:00
Mike Fährmann
0c9762f00e
[mangapark] fix extraction
2018-12-22 13:52:48 +01:00
Mike Fährmann
4d73cc785d
update test results
2018-12-14 16:07:32 +01:00
Mike Fährmann
1c6b9ba322
[readcomiconline] use HTTPS
2018-12-09 14:54:55 +01:00
Mike Fährmann
fd8ed35591
[turboimagehost] fix extraction
2018-10-23 21:08:24 +02:00
Mike Fährmann
d1f3d32eec
[fallenangels] unescape chapter titles
2018-10-20 18:31:26 +02:00
Mike Fährmann
2eefaa99a3
[mangapark] support .net and .com mirrors
2018-07-05 14:45:05 +02:00
Mike Fährmann
95392554ee
use text.urljoin()
2018-04-26 17:00:26 +02:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
2018-04-20 14:53:21 +02:00
Mike Fährmann
f5c6a2d7f5
[nhentai] use API to get gallery info
2018-03-21 12:58:41 +01:00
Mike Fährmann
7a412f5c32
implement generic manga-chapter extractor
2018-02-04 22:02:04 +01:00
Mike Fährmann
35e09869d1
[mangapark] fix image URLs and use HTTPS
2018-01-12 14:59:49 +01:00
Mike Fährmann
c1e331edbb
[mangapark] replace manga test
2017-12-28 13:58:32 +01:00
Mike Fährmann
444008a14a
[khinsider] use urljoin() to complete page URLs
2017-12-17 16:21:05 +01:00
Mike Fährmann
633b376f35
improve/adjust default filename formats for manga sites
2017-10-02 19:06:24 +02:00
Mike Fährmann
9fc1d0c901
implement and use 'util.safe_int()'
...
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
2017-09-24 15:59:25 +02:00
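A minimal sketch of the helper described in commit 9fc1d0c901, assuming a default of 0 and assuming that only conversion errors need to be swallowed. The real 'util.safe_int()' may differ in signature and detail.

```python
def safe_int(value, default=0):
    """Convert 'value' like int(), but return 'default' instead of raising."""
    try:
        return int(value)
    except (ValueError, TypeError, OverflowError):
        # ValueError: non-numeric string; TypeError: None or other
        # unsupported type; OverflowError: float infinity
        return default
```

With this sketch, safe_int("12") gives 12, while safe_int("abc") and safe_int(None) fall back to the default.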
Mike Fährmann
b7a54a51d0
[mangapark] extract manga metadata + code improvements
2017-09-22 17:53:32 +02:00
Mike Fährmann
6f30cf4c64
change keyword names to valid Python identifiers
...
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.
(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough)
2017-09-10 22:20:47 +02:00
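The reasoning in commit 6f30cf4c64 can be illustrated with plain 'eval()'. This is only a sketch: the 'matches' helper is hypothetical, and how filter-expressions are actually evaluated is an assumption here.

```python
def matches(expression, keywords):
    """Evaluate a filter expression with the keywords as local names.

    Note: eval() must never be used on untrusted input.
    """
    return bool(eval(expression, {}, keywords))


# New-style name: a valid identifier, usable directly in an expression.
new_style = {"gallery_id": 1234}
direct = matches("gallery_id > 1000", new_style)

# Old-style name: 'gallery-id' would parse as 'gallery - id', so it can
# only be reached through the locals() mapping.
old_style = {"gallery-id": 1234}
indirect = matches('locals()["gallery-id"] > 1000', old_style)
```

Both calls give the same answer, but only the underscore form reads like an ordinary variable.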
Mike Fährmann
47bcf53ec1
implement support for additional unit test result types
...
- "pattern" matches all resulting URLs against the given regex
- "count" specifies the expected number of returned URLs
2017-08-25 22:01:14 +02:00
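A sketch of how the two result types from commit 47bcf53ec1 could be checked. The 'check_result' function and the layout of the 'expected' dict are hypothetical, and using 're.match' (anchored at the start of each URL) is an assumption.

```python
import re


def check_result(urls, expected):
    """Check a list of URLs against optional 'pattern' and 'count' entries."""
    if "pattern" in expected:
        regex = re.compile(expected["pattern"])
        # every resulting URL has to match the regex
        if not all(regex.match(url) for url in urls):
            return False
    if "count" in expected:
        # the number of returned URLs has to be exact
        if len(urls) != expected["count"]:
            return False
    return True
```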
Mike Fährmann
c45770331a
use 'str.partition()'
...
The (r)partition method is always faster than split() or any other
method that has been replaced in this commit.
2017-08-21 18:29:50 +02:00
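For reference, a small comparison of the kind of replacement commit c45770331a describes. The URL and function names are made up, and no timing claim is verified here; the sketch only shows that both forms return the same pieces.

```python
def split_version(url):
    """Separate path and last path segment using split()/rsplit()."""
    path = url.split("?", 1)[0]
    name = path.rsplit("/", 1)[-1]
    return path, name


def partition_version(url):
    """Same result using partition()/rpartition()."""
    # partition() always returns a 3-tuple, even if the separator is missing
    path, _, _query = url.partition("?")
    _directory, _, name = path.rpartition("/")
    return path, name


url = "https://example.org/manga/title/c12?page=3"
```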
Mike Fährmann
9759fe8c6b
allow 'only_matching' tests
2017-06-14 08:43:05 +02:00
Mike Fährmann
c921b4f32a
code cleanup and fixing tests
2017-06-02 09:10:58 +02:00
Mike Fährmann
f79320e35b
fix tests
2017-05-27 11:47:15 +02:00
Mike Fährmann
99b72130ee
[reddit] enable recursion (#15)
...
reddit extractors now recursively visit other submissions/posts
linked to in the initial set of submissions.
This behaviour can be configured via the 'extractor.reddit.recursion'
key in the configuration file or by `-o recursion=<value>`.
Example:
    {"extractor": {
        "reddit": {
            "recursion": <value>
        }}}
Possible values:
* -1 - infinite recursion (don't do this)
* 0 - recursion is disabled (default)
* 1 and higher - maximum recursion level
2017-05-26 17:01:27 +02:00
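The three kinds of 'recursion' values from commit 99b72130ee can be sketched as a depth-limited traversal. The 'visit' function and the 'links' mapping are hypothetical stand-ins for the real extractor logic, and the duplicate check is an addition made here so that the example terminates.

```python
def visit(start, links, recursion=0):
    """Return the submissions visited from 'start' under a recursion limit.

    'links' maps a submission ID to the IDs it links to (hypothetical data).
    """
    seen = []
    queue = [(sid, 0) for sid in start]
    while queue:
        sid, depth = queue.pop(0)
        if sid in seen:
            continue  # assumption: skip submissions that were already visited
        seen.append(sid)
        # -1 follows links without limit, 0 never follows them,
        # N >= 1 follows them up to N levels deep
        if recursion < 0 or depth < recursion:
            queue.extend((linked, depth + 1) for linked in links.get(sid, ()))
    return seen


links = {"a": ["b"], "b": ["c"], "c": ["a"]}
```

With this data, a value of 0 visits only "a", a value of 1 also reaches "b", and -1 walks every linked submission.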
Mike Fährmann
f226417420
simplify code by using a MangaExtractor base class
2017-05-20 11:27:43 +02:00
Mike Fährmann
f4aa452bd1
update unit test results
2017-04-14 14:40:36 +02:00
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2
2017-02-01 00:53:19 +01:00
Mike Fährmann
c333bc33e3
[mangapark] small fixes and additions
...
- add a 'title' keyword for chapter-titles and update the directory
format accordingly
- add a 'type' keyword to distinguish between manga and manhwa
- fix an issue where an exception would be thrown if a chapter number
did not have any special additions (2.5, 55a, v2, etc.)
- add a test-case without a special chapter number
- unescape manga title
2016-11-16 14:42:13 +01:00
Mike Fährmann
883e702fd6
[mangapark] remove 'url' keyword + fix tests
2016-10-03 15:56:27 +02:00
Mike Fährmann
56d810c896
update keyword hashes for tests
2016-09-25 17:28:46 +02:00
Mike Fährmann
19c2d4ff6f
remove explicit (sub)category keywords
2016-09-25 14:22:07 +02:00
Mike Fährmann
49a05c32ed
add missing tests
2016-09-19 16:15:27 +02:00
Mike Fährmann
d7e168799d
consistent extractor naming scheme + docstrings
2016-09-12 10:34:31 +02:00
Mike Fährmann
ba99506c72
more extractor test-cases
2015-12-14 03:00:58 +01:00
Mike Fährmann
a99fdb0d1e
[mangapark] fix regexes
2015-12-14 01:56:49 +01:00
Mike Fährmann
50ec170b00
[mangapark] add manga extractor
2015-12-09 00:07:18 +01:00
Mike Fährmann
0ff437ca88
[mangapark] add chapter extractor
2015-12-08 22:29:34 +01:00