Mike Fährmann
e70af6a550
[hentaifoundry] do not update filters when cookies are provided
2023-04-13 14:16:53 +02:00
Mike Fährmann
d84a617273
[hentaifoundry] fix setting content filters (#3887)
2023-04-09 18:04:49 +02:00
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2022-11-05 01:14:09 +01:00
Mike Fährmann
4e11ca737e
[hentaifoundry] fix metadata extraction
2022-07-12 22:19:22 +02:00
Mike Fährmann
00825cddf5
[hentaifoundry] use scheme from input URL (fixes #1095)
...
Let the user choose between http and https,
instead of always forcing https.
2020-11-07 22:40:02 +01:00
Mike Fährmann
0211af7ca8
[hentaifoundry] update 'YII_CSRF_TOKEN' cookie handling
...
(fixes #1083)
2020-10-28 21:49:03 +01:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
Mike Fährmann
783e0af26d
[hentaifoundry] update and simplify
2020-10-15 15:14:17 +02:00
Mike Fährmann
dd1e545597
[hentaifoundry] rename GalleryExtractor to PicturesExtractor
2020-10-04 22:53:23 +02:00
Mike Fährmann
b9bdd2c564
[hentaifoundry] add support for stories (closes #734)
2020-09-27 02:27:40 +02:00
Mike Fährmann
0d43456323
[hentaifoundry] add 'include' option
2020-09-25 18:18:03 +02:00
Mike Fährmann
4e361b3008
add tests for specific datetime values
2020-02-23 16:48:30 +01:00
Mike Fährmann
33a6e0ac6e
[hentaifoundry] extract more metadata (closes #565)
2020-01-11 23:22:50 +01:00
Mike Fährmann
1848788970
update test results etc
2019-09-08 11:33:35 +02:00
Mike Fährmann
61e413d85d
[hentaifoundry] stop disabling IPv6 addresses
...
The rogue address mentioned in a138d58
is no longer included in the DNS
results for www.hentai-foundry.com.
2019-06-21 20:03:14 +02:00
Mike Fährmann
a138d5873d
[hentaifoundry] improve/fix extraction
...
- Sometimes an ad interfered when trying to get a download URL
- Resolving "www.hentai-foundry.com" yields an invalid(?) IPv6 address
(2607:5300:60:ca9e:feed:dead:beef:1) and urllib3 only tries to connect
to the IPv4 variant after a rather long wait time
2019-02-25 16:16:09 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway.)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
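The before/after keys above can be illustrated with a minimal sketch. The function name `nameext_sketch` is hypothetical and the body is an assumption written for illustration only; it is not the actual code of text.nameext_from_url().

```python
from urllib.parse import urlsplit, unquote


def nameext_sketch(url, data=None):
    """Hypothetical illustration of the new 'filename'/'extension' keys."""
    if data is None:
        data = {}
    # Last path segment of the URL, ignoring query string and fragment
    name = unquote(urlsplit(url).path.rpartition("/")[2])
    # Split at the last dot: text before it is the filename, after it the extension
    filename, _, ext = name.rpartition(".")
    if filename:
        data["filename"] = filename
        data["extension"] = ext.lower()
    else:
        # No usable dot in the name: treat the whole name as the filename
        data["filename"] = name
        data["extension"] = ""
    return data


print(nameext_sketch("https://example.org/path/filename.ext"))
```

Note that the result no longer carries a separate 'name' key; only 'filename' (without extension) and 'extension' remain.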
2019-02-14 16:07:17 +01:00
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url
2019-02-12 18:46:48 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
041bd501fc
[hentaifoundry] unescape YII_CSRF_TOKEN value
...
This fixes the POST requests to /site/filters
2018-11-19 21:46:17 +01:00
Mike Fährmann
c402cc4047
[hentaifoundry] add 'popular' and 'recent' extractors
...
for "Popular Pictures" and "Recent Pictures" listings
2018-09-24 13:11:18 +02:00
Mike Fährmann
a5fc311dfa
[hentaifoundry] add 'favorite' extractor
2018-09-22 21:23:29 +02:00
Mike Fährmann
1c95a0173f
[hentaifoundry] split 'artist' into 'user'+'artist'
...
and some smaller changes ...
'user' is the name of the account an image is listed at and
'artist' is now the name of the account that created the image.
For example "https://www.hentai-foundry.com/user/Tenpura/faves/pictures"
- 'user': Tenpura
- 'artist' of the only image: LewdBrush
2018-09-22 21:21:07 +02:00
Mike Fährmann
006f75b538
[hentaifoundry] rewrite + more metadata
...
- extract width, height, artist per image
- improve pattern regex
- better extensibility for other listings
2018-09-21 11:23:51 +02:00
Mike Fährmann
eeb7424783
[hentaifoundry] add support for "scraps" (#110)
2018-09-20 13:41:23 +02:00
Mike Fährmann
017188d268
improve extractor.request()
...
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
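A minimal sketch of how such an 'expect' check could behave. The helper name `is_accepted` is hypothetical and not part of extractor.request(); it only illustrates the rule described above, assuming that status codes below 400 are always accepted.

```python
def is_accepted(status_code, expect=()):
    """Hypothetical helper: decide whether a response status is acceptable."""
    # Codes below 400 are assumed to be fine; codes >= 400 are only
    # accepted when they are explicitly listed in 'expect'.
    return status_code < 400 or status_code in expect


print(is_accepted(404))                          # not listed
print(is_accepted(404, expect=(404,)))           # listed explicitly
print(is_accepted(451, expect=range(400, 500)))  # covered by a range
```

Because `expect` is only tested with `in`, both a list and a range work as described in the commit message.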
2018-06-18 16:29:56 +02:00
Mike Fährmann
95392554ee
use text.urljoin()
2018-04-26 17:00:26 +02:00
Mike Fährmann
2d17a9e07f
improve extractor.request()
...
- better retry behavior
- exponential back-off
- removed 'allow_empty' argument
2018-04-23 18:45:59 +02:00
Mike Fährmann
f471161920
Merge branch 'master' into 1.4-dev
2018-04-21 12:15:40 +02:00
Mike Fährmann
eb37fbf0e8
[hentaifoundry] improve extractor
...
- use common base class
- better pagination
- respect '.../page/<num>'
- implement skip() / --range support
- get YII_CSRF_TOKEN from cookies
2018-04-20 18:26:23 +02:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
2018-04-20 14:53:21 +02:00
Mike Fährmann
179bcdd349
adjust archive-ids
2018-02-13 04:50:45 +01:00
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create a unique ID for each image.
2018-02-01 15:30:49 +01:00
Mike Fährmann
92027f67f9
use consistent names for URL constants
...
root := <scheme>://<host>
base_url := <root>/<common path>
2017-11-06 20:56:49 +01:00
Mike Fährmann
9c138dfc1f
[common] detect empty HTTP response bodies
2017-09-26 16:49:58 +02:00
Mike Fährmann
9fc1d0c901
implement and use 'util.safe_int()'
...
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
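A minimal sketch of the behavior described above. The default of 0 and the exact set of caught exceptions are assumptions; the real 'util.safe_int()' may differ.

```python
def safe_int(value, default=0):
    """Convert 'value' to int; return 'default' instead of raising."""
    try:
        return int(value)
    except (ValueError, TypeError, OverflowError):
        # ValueError: non-numeric string, TypeError: e.g. None,
        # OverflowError: e.g. float("inf")
        return default


print(safe_int("25"))
print(safe_int("abc"))
print(safe_int(None, -1))
```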
2017-09-24 15:59:25 +02:00
Mike Fährmann
915a0137de
improve 'extractor.request'
...
- add 'fatal' argument
- improve internal logic and flow
- raise known exception on error
- update exception hierarchy
2017-08-05 16:11:46 +02:00
Mike Fährmann
dcc1d3b2ea
[hentaifoundry] fix infinite loop for multiple of 25 images
2017-07-03 14:16:08 +02:00
Mike Fährmann
13dc5d72bc
update some extractors to use https
2017-04-20 13:32:40 +02:00
Mike Fährmann
bd95fea82c
update unit test results
2017-04-11 21:03:09 +02:00
Mike Fährmann
0456efaa5a
[hentaifoundry] update unit tests
2017-04-10 10:50:34 +02:00
Mike Fährmann
0257d3e7ac
[mangamint] remove extractors - site is down
2017-03-20 13:38:54 +01:00
Mike Fährmann
7880cc1ad7
[imgtrex] remove extractor - domain no longer exists
2017-02-05 16:54:04 +01:00
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2
2017-02-01 00:53:19 +01:00
Mike Fährmann
a849d8f2f7
add a few more tests
2016-12-31 00:51:06 +01:00
Mike Fährmann
efdc299547
[hentaifoundry] get artist name from webpage
2016-12-29 17:38:22 +01:00
Mike Fährmann
8b2024a1a5
[hentaifoundry] support direct links to images
2016-12-27 15:24:42 +01:00
Mike Fährmann
dfd1992a2c
[hentaifoundry] small updates
...
- throw an exception if a user or image does not exist
- update tests, since the user of the old ones left
2016-12-07 09:10:18 +01:00
Mike Fährmann
56d810c896
update keyword hashes for tests
2016-09-25 17:28:46 +02:00