Mike Fährmann
a453335a9f
remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6
decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This makes it possible, for example, to adjust config access inside a Job
before most of the initialization has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
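The split between a cheap constructor and a separate 'initialize()' step can be illustrated with a minimal Python sketch. Everything except the names `__init__` and `initialize` is hypothetical: the class body, the attributes, and the plain dict standing in for a session are assumptions for illustration, not gallery-dl's actual Extractor code.

```python
import re


class Extractor:
    """Hypothetical sketch: construction and initialization are separate steps."""

    def __init__(self, match):
        # Cheap: only record data taken from the URL match.
        self.url = match.string
        self.groups = match.groups()
        self.config = None
        self.session = None  # stand-in for a real HTTP session (assumption)

    def initialize(self, config=None):
        # Expensive setup (config options, "session", cookies) is deferred,
        # so a caller can adjust config access before it runs.
        if self.session is not None:
            return  # already initialized; do nothing
        self.config = dict(config or {})
        self.session = {"cookies": self.config.get("cookies", {})}


match = re.match(r"https://example\.org/g/(\d+)", "https://example.org/g/123")
extr = Extractor(match)                      # no setup work happens here
extr.initialize({"cookies": {"sid": "x"}})  # setup happens on demand
```

A job-like caller could create the extractor, tweak configuration, and only then call `initialize()`.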
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2022-11-05 01:14:09 +01:00
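The difference in calling conventions that makes this replacement possible can be sketched as follows. These are simplified, hypothetical re-implementations written for illustration: they assume 'extract()' returns a (value, new_position) pair and 'extr()' returns only the value, and they are not copied from gallery-dl's 'text' module.

```python
def extract(txt, begin, end, pos=0):
    """Return (value, new_pos) for the text between 'begin' and 'end'."""
    try:
        first = txt.index(begin, pos) + len(begin)
        last = txt.index(end, first)
        return txt[first:last], last + len(end)
    except ValueError:
        # Delimiters not found: no value, position unchanged.
        return None, pos


def extr(txt, begin, end, default=""):
    """Return only the text between 'begin' and 'end', or 'default'."""
    try:
        first = txt.index(begin) + len(begin)
        return txt[first:txt.index(end, first)]
    except ValueError:
        return default


page = "<title>Gallery 42</title>"
title, _ = extract(page, "<title>", "</title>")  # position is discarded
title = extr(page, "<title>", "</title>")        # simpler when pos is unused
```

Under these assumptions, 'extr()' fits wherever the returned position was thrown away, which is what "where possible" suggests.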
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
d900edfcfb
[simplyhentai] fix extraction
2021-04-25 18:51:43 +02:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
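The effect of dropping '&' from a terminating character class can be shown with a small regex sketch. The two patterns are hypothetical examples of a URL pattern ending in '[/?&#]' versus '[/?#]'; they are not taken from any actual extractor module.

```python
import re

# Hypothetical patterns: an ID followed by a component delimiter or end of URL.
OLD = re.compile(r"https?://example\.org/g/(\d+)(?:[/?&#]|$)")
NEW = re.compile(r"https?://example\.org/g/(\d+)(?:[/?#]|$)")

urls = [
    "https://example.org/g/123",
    "https://example.org/g/123/",
    "https://example.org/g/123?page=2",
    "https://example.org/g/123#top",
]

# Both patterns accept URLs delimited by '/', '?', '#', or nothing.
both = [bool(OLD.match(u)) and bool(NEW.match(u)) for u in urls]

# Only the old pattern also treats '&' as a delimiter.
odd = "https://example.org/g/123&x"
old_hit, new_hit = OLD.match(odd), NEW.match(odd)
```

Since '/', '?', and '#' are the characters RFC 3986 uses to delimit components, the narrower class still accepts every well-formed variant above.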
Mike Fährmann
f317a57c5e
[simplyhentai] fix 'gallery_id' extraction
2020-07-27 16:14:06 +02:00
Mike Fährmann
7499d71d02
[simplyhentai] ignore certificate errors in video test
2020-03-28 21:07:30 +01:00
Mike Fährmann
87a87bff7e
[simplyhentai] fix image URLs
2019-10-28 21:11:06 +01:00
Mike Fährmann
ef17d94469
update test results
2019-10-21 21:53:21 +02:00
Mike Fährmann
1693d97bd3
update extractor class hierarchies
- let the GalleryExtractor class inherit directly from Extractor
- make ChapterExtractor a subclass of GalleryExtractor
- change enumeration field names of GalleryExtractors to 'num'
2019-10-16 18:15:29 +02:00
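The reshaped hierarchy can be summarized in a short Python sketch. Only the names Extractor, GalleryExtractor, ChapterExtractor, images(), and the 'num' field come from this log; the method bodies, the (url, metadata) pairs, and the DemoChapter subclass are hypothetical illustrations.

```python
class Extractor:
    """Minimal stand-in for the common base class (hypothetical body)."""

    def __init__(self, url):
        self.url = url


class GalleryExtractor(Extractor):
    """Inherits directly from Extractor."""

    def images(self):
        # Hypothetical: subclasses return a list of (url, metadata) pairs.
        return []

    def items(self):
        # Enumerate images, storing the position in a field named 'num'.
        for num, (url, data) in enumerate(self.images(), 1):
            yield url, dict(data or {}, num=num)


class ChapterExtractor(GalleryExtractor):
    """Now a subclass of GalleryExtractor."""


class DemoChapter(ChapterExtractor):
    """Hypothetical concrete extractor used only for this example."""

    def images(self):
        return [("https://example.org/1.jpg", None),
                ("https://example.org/2.jpg", {"width": 800})]


results = list(DemoChapter("https://example.org/chapter/1").items())
```

With this layout, chapter-style extractors reuse the gallery enumeration logic instead of duplicating it.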
Mike Fährmann
11ea689013
[simplyhentai] fix image and video URLs
2019-09-16 21:37:16 +02:00
Mike Fährmann
b1cddce865
Revert "[simplyhentai] fix extraction; remove image+video extractors"
This reverts commit d1db5180ab.
2019-09-07 14:48:31 +02:00
Mike Fährmann
d1db5180ab
[simplyhentai] fix extraction; remove image+video extractors
2019-08-22 23:56:41 +02:00
Mike Fährmann
12da6bd0c9
[simplyhentai] fix/improve extraction
2019-07-06 20:25:53 +02:00
Mike Fährmann
26c4365baa
adjust metadata types for GalleryExtractors
2019-03-02 14:53:04 +01:00
Mike Fährmann
3595cd582f
use GalleryExtractor as common base class
2019-03-01 14:13:16 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
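The new result layout ('filename' without extension, plus 'extension') can be reproduced with a minimal sketch. This is a hypothetical re-implementation based only on the before/after example above; gallery-dl's actual 'text.nameext_from_url()' may differ in details such as decoding or case handling.

```python
from urllib.parse import unquote, urlsplit


def nameext_from_url(url, data=None):
    """Sketch: store 'filename' (no extension) and 'extension' in 'data'."""
    if data is None:
        data = {}
    # Last path segment, e.g. "filename.ext".
    basename = unquote(urlsplit(url).path.rpartition("/")[2])
    name, dot, ext = basename.rpartition(".")
    if dot:
        data["filename"], data["extension"] = name, ext
    else:
        # No dot: the whole segment is the filename.
        data["filename"], data["extension"] = basename, ""
    return data


result = nameext_from_url("https://example.org/path/filename.ext")
```

There is no longer a separate 'name' key; the former 'name' value is now stored under 'filename'.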
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url
2019-02-12 18:46:48 +01:00
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
2019-02-11 18:38:47 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
02d733d219
[simplyhentai] fix and improve tag extraction
The "tags" field is now a list instead of a string.
In format strings, use "{tags:J, }" to Join them.
2019-02-10 13:52:09 +01:00
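How a "{tags:J, }" format spec can turn a list into a single string is sketched below using Python's standard 'string.Formatter'. The JoinFormatter class is hypothetical and only mimics the described behavior (the text after 'J' is the separator); it is not gallery-dl's own formatter.

```python
import string


class JoinFormatter(string.Formatter):
    """Hypothetical formatter: a spec starting with 'J' joins a list,
    using the rest of the spec as separator, e.g. "{tags:J, }"."""

    def format_field(self, value, format_spec):
        if format_spec.startswith("J") and isinstance(value, list):
            return format_spec[1:].join(str(v) for v in value)
        # Any other spec falls back to normal formatting.
        return super().format_field(value, format_spec)


fmt = JoinFormatter()
result = fmt.format("{tags:J, }", tags=["sample", "example", "demo"])
```

Here the spec "J, " yields the separator ", ", so the list is rendered as a comma-separated string.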
Mike Fährmann
6284731107
simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
8e01cf0ef8
[reactor] generalize extractors (#148)
- support *.reactor.cc domains
- combine joyreactor and pornreactor modules
2019-01-07 17:06:47 +01:00
Mike Fährmann
a47c6136cd
[simplyhentai] avoid redirects for all-pages.json (#89)
2018-06-01 22:06:34 +02:00
Mike Fährmann
72e66f0aac
[simplyhentai] improve URL pattern
[ci skip]
2018-05-30 11:44:43 +02:00
Mike Fährmann
cdcc3427a0
[simplyhentai] add video extractor (#89)
All videos hosted on their own servers seem to be dead,
but myhentai.tv embeds, which are most of the videos, work fine.
2018-05-30 11:25:23 +02:00
Mike Fährmann
f9a6a19658
[simplyhentai] add image extractor (#89)
2018-05-30 10:58:48 +02:00
Mike Fährmann
55b0913412
[simplyhentai] add gallery extractor (#89)
2018-05-27 15:25:04 +02:00