mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-25 12:12:34 +01:00

63 Commits

Author SHA1 Message Date
Mike Fährmann
27ec653991
fix bug in test_init and update example URLs 2023-09-14 13:27:03 +02:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6
decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().

This makes it possible, for example, to adjust config access inside a
Job, whereas previously most of it had already happened by the time
'extractor.find()' was called.
2023-07-25 22:16:16 +02:00
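
The decoupling described in a383eca7f6 can be illustrated with a minimal sketch. It is not the project's code: the class name 'SketchExtractor', the 'config' argument, the dict-based stand-in for a session, and the '_initialized' flag are all assumptions made for illustration. The point is only that the constructor stays cheap and the expensive setup runs later, on demand.

```python
class SketchExtractor:
    """Hypothetical extractor; names and fields are illustrative only."""

    def __init__(self, url):
        # The constructor only records its input; no session, cookies,
        # or config lookups happen here.
        self.url = url
        self.session = None
        self.options = {}
        self._initialized = False

    def initialize(self, config=None):
        # The actual setup, callable separately from __init__().
        # Repeated calls are ignored so that callers can invoke it safely.
        if self._initialized:
            return
        self.options = dict(config or {})
        # Stand-in for a real HTTP session object (assumption).
        self.session = {
            "headers": {
                "User-Agent": self.options.get("user-agent", "example"),
            },
        }
        self._initialized = True


extractor = SketchExtractor("https://example.org/")
# A caller can still adjust configuration at this point ...
extractor.initialize({"user-agent": "custom"})
# ... and only now does the session exist.
print(extractor.session["headers"]["User-Agent"])
```

With this split, code that merely needs to find a matching extractor for a URL can construct one without triggering any setup work.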
Mike Fährmann
d97b8c2fba
consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
e70af6a550
[hentaifoundry] do not update filters when cookies are provided 2023-04-13 14:16:53 +02:00
Mike Fährmann
d84a617273
[hentaifoundry] fix setting content filters (#3887) 2023-04-09 18:04:49 +02:00
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible 2022-11-05 01:14:09 +01:00
Mike Fährmann
4e11ca737e
[hentaifoundry] fix metadata extraction 2022-07-12 22:19:22 +02:00
Mike Fährmann
00825cddf5
[hentaifoundry] use scheme from input URL (fixes #1095)
Let the user choose between http and https,
instead of always forcing https.
2020-11-07 22:40:02 +01:00
Mike Fährmann
0211af7ca8
[hentaifoundry] update 'YII_CSRF_TOKEN' cookie handling
(fixes #1083)
2020-10-28 21:49:03 +01:00
Mike Fährmann
968d3e8465
remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'

According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
Mike Fährmann
783e0af26d
[hentaifoundry] update and simplify 2020-10-15 15:14:17 +02:00
Mike Fährmann
dd1e545597
[hentaifoundry] rename GalleryExtractor to PicturesExtractor 2020-10-04 22:53:23 +02:00
Mike Fährmann
b9bdd2c564
[hentaifoundry] add support for stories (closes #734) 2020-09-27 02:27:40 +02:00
Mike Fährmann
0d43456323
[hentaifoundry] add 'include' option 2020-09-25 18:18:03 +02:00
Mike Fährmann
4e361b3008
add tests for specific datetime values 2020-02-23 16:48:30 +01:00
Mike Fährmann
33a6e0ac6e
[hentaifoundry] extract more metadata (closes #565) 2020-01-11 23:22:50 +01:00
Mike Fährmann
1848788970
update test results etc 2019-09-08 11:33:35 +02:00
Mike Fährmann
61e413d85d
[hentaifoundry] stop disabling IPv6 addresses
The rogue address mentioned in a138d58 is no longer included in the DNS
results for www.hentai-foundry.com.
2019-06-21 20:03:14 +02:00
Mike Fährmann
a138d5873d
[hentaifoundry] improve/fix extraction
- Sometimes an ad interfered when trying to get a download URL
- Resolving "www.hentai-foundry.com" yields an invalid(?) IPv6 address
  (2607:5300:60:ca9e:feed:dead:beef:1) and urllib3 only tries to connect
  to the IPv4 variant after a rather long wait time
2019-02-25 16:16:09 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)

Example: "https://example.org/path/filename.ext"

before:
- filename : filename.ext
- name     : filename
- extension: ext

now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
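
The "now" behavior described in 5530871b5a can be sketched as follows. This is an illustration under assumptions, not the body of text.nameext_from_url(): the function name 'split_filename' and the choice to return a plain dict are hypothetical, and only the resulting 'filename' and 'extension' keys come from the commit message.

```python
from urllib.parse import unquote, urlsplit


def split_filename(url):
    """Return 'filename' (without extension) and 'extension' for a URL.

    Hypothetical helper mirroring the result layout described above.
    """
    # Use only the path component, so query strings and fragments
    # do not end up in the filename.
    path = urlsplit(url).path
    basename = unquote(path.rpartition("/")[2])

    # Split at the last dot; if there is none, the extension is empty.
    name, dot, ext = basename.rpartition(".")
    if not dot:
        return {"filename": basename, "extension": ""}
    return {"filename": name, "extension": ext}


print(split_filename("https://example.org/path/filename.ext"))
```

For the example URL from the commit message this yields 'filename' as "filename" and 'extension' as "ext", with no separate 'name' key.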
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url 2019-02-12 18:46:48 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor 2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
041bd501fc
[hentaifoundry] unescape YII_CSRF_TOKEN value
This fixes the POST requests to /site/filters
2018-11-19 21:46:17 +01:00
Mike Fährmann
c402cc4047
[hentaifoundry] add 'popular' and 'recent' extractors
for "Popular Pictures" and "Recent Pictures" listings
2018-09-24 13:11:18 +02:00
Mike Fährmann
a5fc311dfa
[hentaifoundry] add 'favorite' extractor 2018-09-22 21:23:29 +02:00
Mike Fährmann
1c95a0173f
[hentaifoundry] split 'artist' into 'user'+'artist'
and some smaller changes ...

'user' is the name of the account an image is listed under and
'artist' is now the name of the account that created the image.

For example "https://www.hentai-foundry.com/user/Tenpura/faves/pictures"
- 'user': Tenpura
- 'artist' of the only image: LewdBrush
2018-09-22 21:21:07 +02:00
Mike Fährmann
006f75b538
[hentaifoundry] rewrite + more metadata
- extract width, height, artist per image
- improve pattern regex
- better extensibility for other listings
2018-09-21 11:23:51 +02:00
Mike Fährmann
eeb7424783
[hentaifoundry] add support for "scraps" (#110) 2018-09-20 13:41:23 +02:00
Mike Fährmann
017188d268
improve extractor.request()
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
2018-06-18 16:29:56 +02:00
Mike Fährmann
95392554ee
use text.urljoin() 2018-04-26 17:00:26 +02:00
Mike Fährmann
2d17a9e07f
improve extractor.request()
- better retry behavior
- exponential back-off
- removed 'allow_empty' argument
2018-04-23 18:45:59 +02:00
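
The "exponential back-off" item in 2d17a9e07f refers to a general retry technique, which can be sketched as below. The commit message does not state the actual delays or retry counts, so 'retries', 'base_delay', the injectable 'sleep' argument, and the function name 'retry_with_backoff' are all assumptions; this is not extractor.request() itself.

```python
import time


def retry_with_backoff(func, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call 'func' until it succeeds, doubling the wait after each failure.

    Hypothetical helper; parameter names and defaults are illustrative.
    """
    tries = 0
    while True:
        try:
            return func()
        except Exception:
            tries += 1
            if tries > retries:
                # Give up and let the last error propagate.
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (tries - 1))


# Usage: a function that fails twice before succeeding.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise OSError("temporary failure")
    return "ok"


waited = []
print(retry_with_backoff(flaky, sleep=waited.append), waited)
```

Passing 'sleep' as an argument keeps the sketch testable without real waiting; a production version would likely also restrict which exceptions trigger a retry.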
Mike Fährmann
f471161920
Merge branch 'master' into 1.4-dev 2018-04-21 12:15:40 +02:00
Mike Fährmann
eb37fbf0e8
[hentaifoundry] improve extractor
- use common base class
- better pagination
- respect '.../page/<num>'
- implement skip() / --range support
- get YII_CSRF_TOKEN from cookies
2018-04-20 18:26:23 +02:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module 2018-04-20 14:53:21 +02:00
Mike Fährmann
179bcdd349
adjust archive-ids 2018-02-13 04:50:45 +01:00
Mike Fährmann
34873dbd90
set 'archive_fmt' values
These are going to be used to create a unique id for each image.
2018-02-01 15:30:49 +01:00
Mike Fährmann
92027f67f9
use consistent names for URL constants
root := <scheme>://<host>
base_url := <root>/<common path>
2017-11-06 20:56:49 +01:00
Mike Fährmann
9c138dfc1f
[common] detect empty HTTP response bodies 2017-09-26 16:49:58 +02:00
Mike Fährmann
9fc1d0c901
implement and use 'util.safe_int()'
same as Python's 'int()', except it doesn't raise any exceptions and
accepts a default value
2017-09-24 15:59:25 +02:00
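
The behavior described in 9fc1d0c901 ("same as Python's 'int()', except it doesn't raise any exceptions and accepts a default value") can be sketched like this. The default of 0 and the exact set of caught exceptions are assumptions; the commit message does not specify them, and this is not necessarily how 'util.safe_int()' was written.

```python
def safe_int(value, default=0):
    """Convert 'value' to int; return 'default' if that is not possible.

    Sketch only; the default value of 0 is an assumption.
    """
    try:
        return int(value)
    except (ValueError, TypeError, OverflowError):
        # ValueError: non-numeric strings such as "abc"
        # TypeError: unsupported types such as None
        # OverflowError: float infinity
        return default


print(safe_int("25"), safe_int("abc"), safe_int(None, -1))
```

This is convenient when parsing values scraped from HTML, where a missing or malformed number should fall back to a default instead of aborting the extraction.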
Mike Fährmann
915a0137de
improve 'extractor.request'
- add 'fatal' argument
- improve internal logic and flow
- raise known exception on error
- update exception hierarchy
2017-08-05 16:11:46 +02:00
Mike Fährmann
dcc1d3b2ea
[hentaifoundry] fix infinite loop for multiple of 25 images 2017-07-03 14:16:08 +02:00
Mike Fährmann
13dc5d72bc
update some extractors to use https 2017-04-20 13:32:40 +02:00
Mike Fährmann
bd95fea82c
update unit test results 2017-04-11 21:03:09 +02:00
Mike Fährmann
0456efaa5a
[hentaifoundry] update unit tests 2017-04-10 10:50:34 +02:00
Mike Fährmann
0257d3e7ac
[mangamint] remove extractors - site is down 2017-03-20 13:38:54 +01:00
Mike Fährmann
7880cc1ad7
[imgtrex] remove extractor - domain no longer exists 2017-02-05 16:54:04 +01:00
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2 2017-02-01 00:53:19 +01:00
Mike Fährmann
a849d8f2f7
add a few more tests 2016-12-31 00:51:06 +01:00