Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
dd884b02ee
replace json.loads with direct calls to JSONDecoder.decode
2023-02-09 15:22:00 +01:00
Mike Fährmann
f5b7ae01c1
update extractor test results
2020-09-15 18:07:08 +02:00
Mike Fährmann
a28552fd19
update test results
...
- hbrowse: one tag got removed
- mangoxo: gallery changed owner
- photobucket: ?, but photo still downloads
2019-11-30 23:59:32 +01:00
Mike Fährmann
4409d00141
embed error messages in StopExtraction exceptions
2019-10-28 16:39:49 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from an URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext "
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
Mike Fährmann
32edf4fc7b
add '_extractor' info to manga extractor results
2019-02-13 13:23:36 +01:00
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
...
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
2019-02-11 18:38:47 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
0e46db6f45
rename some base classes
...
They shouldn't be called …Extractor if they don't have 'Extractor' as
their base class.
2019-02-08 11:43:40 +01:00
Mike Fährmann
0fb98d1d79
[hbrowse] extract tag metadata
2019-01-15 18:08:10 +01:00
Mike Fährmann
9bbbadd93a
[hbrowse] use HTTPS
2019-01-15 18:07:39 +01:00
Mike Fährmann
d7a4739cf6
[hbrowse] print error message if site is down
...
... instead of crashing with a meaningless exception
2019-01-14 15:44:23 +01:00
Mike Fährmann
95392554ee
use text.urljoin()
2018-04-26 17:00:26 +02:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
2018-04-20 14:53:21 +02:00
Mike Fährmann
3cec533c28
Merge branch 'archive'
2018-02-12 18:07:58 +01:00
Mike Fährmann
7a412f5c32
implement generic manga-chapter extractor
2018-02-04 22:02:04 +01:00
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create an unique id for each image.
2018-02-01 15:30:49 +01:00
Mike Fährmann
7e0d9257a7
[hbrowse] fix manga extraction
2017-11-15 13:59:50 +01:00
Mike Fährmann
9fc1d0c901
implement and use 'util.safe_int()'
...
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
2017-09-24 15:59:25 +02:00
Mike Fährmann
c265cc074a
[hbrowse] fix syntax for Python3.3 and 3.4
2017-09-20 16:41:39 +02:00
Mike Fährmann
a9e7145651
[hbrowse] extract hmanga metadata & general maintenance
2017-09-20 16:25:25 +02:00
Mike Fährmann
6f30cf4c64
change keyword names to valid Python identifiers
...
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.
(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough)
2017-09-10 22:20:47 +02:00
Mike Fährmann
0245a0ba5f
fix extraction and update test results
...
- fixes for hbrowse, imgyt, imgcandy, hosturimage
- test updates for deviantart, gfycat
2017-08-08 19:11:13 +02:00
Mike Fährmann
f226417420
simplify code by using a MangaExtractor base class
2017-05-20 11:27:43 +02:00
Mike Fährmann
fc9223c072
add '--abort-on-skip' option and ability to control skip behavior
...
the 'skip' config option controls skipping behavior:
true - skip download if file already exist (default)
false - download and overwrite files even if it exists
"abort" - abort extractor run if a download would be skipped
(same as '--abort-on-skip')
2017-05-03 15:26:04 +02:00
Mike Fährmann
af56887a47
[exhentai] fall back to e-hentai if no username is given
2017-04-28 15:59:56 +02:00
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2
2017-02-01 00:53:19 +01:00
Mike Fährmann
56d810c896
update keyword hashes for tests
2016-09-25 17:28:46 +02:00
Mike Fährmann
19c2d4ff6f
remove explicit (sub)category keywords
2016-09-25 14:22:07 +02:00
Mike Fährmann
80d98f97fa
[hbrowse] add manga extractor
2016-09-14 09:48:47 +02:00
Mike Fährmann
4d56b76aa8
update all other extractors
2015-11-21 04:26:30 +01:00
Mike Fährmann
c2f0720184
code cleanup to use nameext_from_url
2015-11-16 17:32:26 +01:00
Mike Fährmann
3f2fbd874d
[hbrowse] add extractor
2015-11-15 01:30:26 +01:00