Mike Fährmann
fc8f86bf24
[hitomi] recognize 'imageset' gallery URLs (#4756)
2023-11-02 15:29:44 +01:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6
decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This allows, for example, adjusting config access inside a Job
before most of it has already happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
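The two-phase setup described in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class body, the `config` dictionary, and the `session` structure are hypothetical stand-ins. Only the idea of an 'initialize()' method separate from __init__() comes from the commit message.

```python
class Extractor:
    """Hypothetical extractor with cheap construction and deferred setup."""

    def __init__(self, url):
        # The constructor only records its arguments; nothing expensive yet.
        self.url = url
        self.config = {}          # hypothetical per-instance options
        self.session = None
        self.initialized = False

    def initialize(self):
        # The actual init (session, cookies, config options) happens here,
        # once, at a moment the caller chooses.
        if self.initialized:
            return
        self.session = {"headers": dict(self.config.get("headers", {}))}
        self.initialized = True


extractor = Extractor("https://example.org/gallery/1")
# A caller (for example a Job) can still adjust configuration here ...
extractor.config["headers"] = {"Referer": "https://example.org/"}
# ... before triggering the real initialization.
extractor.initialize()
```

Because the constructor stays cheap, an object can be created just to match a URL and then configured before any session state is built.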
Mike Fährmann
dd884b02ee
replace json.loads with direct calls to JSONDecoder.decode
2023-02-09 15:22:00 +01:00
sudo
a6305d031c
[hitomi] apply format check for every image (#3030) (#3280)
2022-11-27 15:55:25 +01:00
Mike Fährmann
b2b0b1c455
[hitomi] fall back to webp when format not available (#3030)
2022-10-11 10:48:28 +02:00
Mike Fährmann
2eb0ddd083
[hitomi] fix error when number of tag results is multiple of 25 (#2870)
2022-08-28 17:06:11 +02:00
Mike Fährmann
946643c23c
[hitomi] use maxage for gg.js cache (#2863)
cached values become invalid after 1-2 hours
2022-08-26 17:57:17 +02:00
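The idea of a cache entry with a maximum age can be sketched as below. This is an illustrative stand-in, not the project's cache module: the `cache` decorator, the `fetch_gg` function, the injectable `now` parameter, and the one-hour `maxage` value are all assumptions chosen for the example.

```python
import time


def cache(maxage):
    """Hypothetical decorator: reuse a result until it is maxage seconds old."""
    def decorator(func):
        state = {"value": None, "expires": None}

        def wrapper(now=None):
            # 'now' is injectable so the expiry logic can be exercised
            # without actually waiting.
            now = time.monotonic() if now is None else now
            if state["expires"] is None or now >= state["expires"]:
                state["value"] = func()
                state["expires"] = now + maxage
            return state["value"]
        return wrapper
    return decorator


calls = []


@cache(maxage=3600)  # assumed value: refresh after one hour
def fetch_gg():
    # Stand-in for downloading and parsing gg.js.
    calls.append(1)
    return len(calls)


first = fetch_gg(now=0)      # fetched
second = fetch_gg(now=1800)  # still fresh, served from cache
third = fetch_gg(now=3600)   # expired, fetched again
```

Without an expiry, a stale value would be reused indefinitely, which matches the problem the commit describes.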
Mike Fährmann
37d584a9b2
[hitomi] update metadata extraction (fixes #2444)
remove 'hitomi.metadata' option, as it is no longer necessary
to make additional HTTP requests to fetch all metadata.
2022-03-26 12:46:18 +01:00
Mike Fährmann
dee0d22561
update extractor test results
2022-02-06 21:39:24 +01:00
Mike Fährmann
86fa412b47
[hitomi] add 'format' option (#2260)
default is 'webp' since downloading original files is no longer allowed
2022-02-03 23:32:19 +01:00
Mike Fährmann
17c9c47ca0
[hitomi] fix 'tag' extraction (fixes #2189)
2022-01-13 16:45:46 +01:00
Mike Fährmann
8b910dd8ae
[hitomi] fix image URLs
again and again ...
2022-01-06 18:21:26 +01:00
Mike Fährmann
38e2af29d6
[hitomi] fix image URLs
update '_parse_gg()' yet again
2022-01-03 16:41:00 +01:00
Mike Fährmann
1e0278702d
[hitomi] update '_parse_gg()'
2022-01-01 17:55:58 +01:00
Mike Fährmann
becc7f85a6
[hitomi] fix image URLs
2021-12-29 22:46:17 +01:00
Mike Fährmann
099ed72de7
[hitomi] disable extra 'metadata' by default
saves one HTTP request that is not needed with default filename settings
2021-12-16 22:21:07 +01:00
Mike Fährmann
211de95dd0
update extractor test results
2021-11-01 02:58:53 +01:00
YongChan Cho
14852f7050
[hitomi] fix image path (#1988)
2021-10-30 21:45:01 +02:00
Ryu juheon
d4614e5ba4
[hitomi] fix image URLs (#1982)
2021-10-28 19:29:48 +02:00
Ryu juheon
6b6d92d51c
[hitomi]: fix image URLs (#1975)
2021-10-26 19:35:01 +02:00
Mike Fährmann
47a780942c
update extractor test results
2021-09-03 19:36:12 +02:00
Ryu JuHeon
9429eaa0a3
[hitomi]: fix image URLs (#1765)
2021-08-12 14:39:10 +02:00
Mike Fährmann
5612ca31c2
[hitomi] fix image URLs (closes #1679)
2021-07-09 18:01:49 +02:00
Mike Fährmann
e98fa01c44
[hitomi] update image URL code (fixes #1637)
2021-06-18 16:44:22 +02:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
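The effect of dropping '&' from a delimiter character class can be illustrated with a small example. The two patterns and the URL below are hypothetical and only mirror the '/?&#' -> '/?#' change named in the commit message; they are not the extractor's real patterns.

```python
import re

# Hypothetical patterns: OLD treats '&' as a delimiter, NEW does not.
OLD = re.compile(r"https://example\.org/tag/([^/?&#]+)")
NEW = re.compile(r"https://example\.org/tag/([^/?#]+)")

url = "https://example.org/tag/black&white?page=2#top"

# The old class stops at the ampersand and truncates the path segment.
old_slug = OLD.match(url).group(1)
# The new class stops only at '/', '?' or '#', the delimiters RFC 3986
# uses to separate URL components.
new_slug = NEW.match(url).group(1)
```

Here `old_slug` is "black" while `new_slug` is "black&white", so a path segment containing an ampersand is no longer cut short.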
Mike Fährmann
ffd38215a4
[hitomi] fix image URLs and URL pattern
- non-webp files are now hosted on [a-c]b.hitomi.la
- removed ampersand from invalid slug characters
2020-10-22 15:15:34 +02:00
Mike Fährmann
7cd383c0f9
update extractor test results
2020-09-20 21:54:39 +02:00
Mike Fährmann
deaacc70bb
[hitomi] update URL pattern for tag searches
2020-08-27 22:46:03 +02:00
Mike Fährmann
7140fe7e6d
[hitomi] fix redirect processing
2020-08-23 15:18:44 +02:00
Mike Fährmann
a3de234e70
[hitomi] add extractor for tag searches (closes #697)
2020-04-20 21:55:19 +02:00
Mike Fährmann
55ac408bdf
[hitomi] fix extraction of galleries without tags
2020-04-20 21:42:14 +02:00
Mike Fährmann
59edcdc822
[hitomi] restore metadata fields from before f33b13a
... and add a 'metadata' option to disable
visiting the gallery page and extracting data from it
if this is not needed.
2020-03-12 23:43:41 +01:00
Mike Fährmann
f33b13aacf
[hitomi] simplify metadata extraction
Use the data from https://ltn.hitomi.la/galleries/<id>.js for both
image URLs and metadata and ignore any gallery or reader pages.
This removes 'artist', 'characters', 'group', and 'parody' metadata
fields since this information is, as of now, only available in
gallery pages.
2020-03-04 01:22:32 +01:00
Mike Fährmann
80ecb99089
[hitomi] fix extraction
2020-02-22 22:07:21 +01:00
Mike Fährmann
5607dd3646
[hitomi] follow multiple redirects
2020-02-20 18:22:13 +01:00
Mike Fährmann
d1de7dc296
[hitomi] implement workaround for "broken" redirects
Some galleries redirect to a new "version" with a different gallery id.
This new version might not be available any more, but the /reader/
page for the original gallery id can still work.
2020-02-02 17:24:23 +01:00
Mike Fährmann
8bb32ee188
[hitomi] fix image URLs
2020-01-14 12:04:48 +01:00
Mike Fährmann
f8ac67ce50
[hitomi] extend URL pattern + follow redirects
2019-11-01 21:40:10 +01:00
Mike Fährmann
8361d874d7
[hitomi] fix extraction
2019-10-29 16:23:20 +01:00
Mike Fährmann
1693d97bd3
update extractor class hierarchies
- let the GalleryExtractor class inherit directly from Extractor
- make ChapterExtractor a subclass of GalleryExtractor
- change enumeration field names of GalleryExtractors to 'num'
2019-10-16 18:15:29 +02:00
Mike Fährmann
15af2f8464
[hitomi] fall back to /reader/ page if main page returns 404
Some galleries return a 404: Not Found error when trying to access
them through the main gallery URL, but their content is still
available on the respective /reader/ page.
2019-10-11 18:39:52 +02:00
Mike Fährmann
cf5e716b9d
[hitomi] fix image URLs
2019-10-09 17:21:37 +02:00
Mike Fährmann
a732e9c430
[instagram] update query hashes and headers
2019-08-10 14:13:08 +02:00
Mike Fährmann
055102431f
[hitomi] handle Game CG galleries with scenes (fixes #321)
2019-06-27 20:25:40 +02:00
Mike Fährmann
b51baa9a4b
[hitomi] fix empty language detection; parse datetime
2019-06-17 20:02:58 +02:00
Mike Fährmann
fc5e4f2b21
[hitomi] simplify data extraction code
2019-05-01 11:14:21 +02:00
Mike Fährmann
2756cc8dde
[hitomi] set Referer header (fixes #239)
2019-05-01 10:56:00 +02:00
Mike Fährmann
26c4365baa
adjust metadata types for GalleryExtractors
2019-03-02 14:53:04 +01:00
Mike Fährmann
3595cd582f
use GalleryExtractor as common base class
2019-03-01 14:13:16 +01:00