Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2022-11-05 01:14:09 +01:00
Mike Fährmann
3b369ce3d1
[nijie] add 'followed' extractor (#3048)
2022-10-14 14:59:18 +02:00
Mike Fährmann
c4a62a48ae
[nijie] add 'feed' extractor (#3048)
2022-10-14 12:03:00 +02:00
Mike Fährmann
636d03df95
[nijie] reduce cache maxage to 90 days
2022-08-27 21:57:45 +02:00
Mike Fährmann
241e82e18d
[horne] add support for horne.red (#2700)
2022-06-25 16:52:16 +02:00
Mike Fährmann
d11e2191ae
[nijie] support /history_nuita.php listings (closes #2541)
2022-05-02 09:03:34 +02:00
Mike Fährmann
1f9a0e2fd8
update extractor test results
2022-04-18 17:24:00 +02:00
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
...
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
b58e605dc7
raise error when required username or password are missing
...
do not try to log in as 'None' (#1192)
2020-12-22 14:40:18 +01:00
Mike Fährmann
6514312126
[nijie] add 'include' option (closes #1018)
2020-09-25 18:18:35 +02:00
Mike Fährmann
e62c209ca0
[nijie] fix 'date' parsing
2019-11-30 23:08:21 +01:00
Mike Fährmann
94dbdbf506
[nijie] change default filename format
...
… to be consistent with Pixiv filenames
2019-11-04 20:47:38 +01:00
Mike Fährmann
1faec285d1
[nijie] further improvements (closes #423)
...
- provide a 'user_name' metadata field
  - usually the same as 'artist_id', except for favorite downloads
- extract the whole description text and properly escape HTML entities
- fixed an issue with titles or tags containing double quotes
2019-09-27 23:14:32 +02:00
Mike Fährmann
20eb6c401f
[nijie] improvements and fixes (#423)
...
- ignore unavailable image pages
- more metadata fields: artist_name, date, tags
- rename 'index' to 'num'
- improved code structure
2019-09-26 21:45:01 +02:00
Mike Fährmann
12da6bd0c9
[simplyhentai] fix/improve extraction
2019-07-06 20:25:53 +02:00
Mike Fährmann
fdec59f8e2
replace extractor.request() 'expect' argument
...
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
2019-07-05 00:42:16 +02:00
Mike Fährmann
b89f0d8d3c
update extractor result tests
2019-07-01 20:02:47 +02:00
Mike Fährmann
a2af2d2965
adjust cache maxage values
2019-03-14 22:21:49 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway.)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
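The before/after result can be illustrated with a minimal sketch. The helper name `split_name_ext` is hypothetical and this is not the project's actual `text.nameext_from_url()` code; it only reproduces the "now" result shape described above, using the standard library.

```python
from urllib.parse import unquote, urlsplit


def split_name_ext(url):
    """Hypothetical helper: return 'filename' and 'extension' for a URL.

    Assumption: the last path segment is split at its final dot.
    """
    # Keep only the path, so query strings and fragments are ignored.
    path = urlsplit(url).path
    basename = unquote(path.rpartition("/")[2])

    name, dot, ext = basename.rpartition(".")
    if name and dot:
        return {"filename": name, "extension": ext}
    # No usable dot: treat the whole segment as the filename.
    return {"filename": basename, "extension": ""}


print(split_name_ext("https://example.org/path/filename.ext"))
```

With this shape, a format string needs only `{filename}` and `{extension}`; the combined `filename.ext` value is no longer produced.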
2019-02-14 16:07:17 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
00dc37ccbf
replace AsynchronousExtractor with a Mixin
2019-02-04 14:21:19 +01:00
Mike Fährmann
dd358b4564
improve cookie handling during logins
2019-01-30 17:09:32 +01:00
Mike Fährmann
173add6935
[nijie] fix artist_id extraction
...
view_popup.php pages for older images or dojins either have the
artist_id value at a different place or not at all.
2018-07-10 12:30:53 +02:00
Mike Fährmann
017188d268
improve extractor.request()
...
Replace the 'fatal' parameter with 'expect', which is a list/range
of HTTP status codes >= 400 that should also be accepted.
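The idea of an 'expect' collection can be sketched as follows. `HttpError` and `validate_status` are hypothetical names chosen for illustration, not the project's API; the sketch assumes that any status below 400 is always accepted.

```python
class HttpError(Exception):
    """Hypothetical exception for unexpected HTTP status codes."""


def validate_status(status_code, expect=()):
    """Accept codes below 400, plus any code listed in 'expect'.

    'expect' may be a list, tuple, or range of status codes >= 400.
    """
    if status_code < 400 or status_code in expect:
        return status_code
    raise HttpError("unexpected HTTP status {}".format(status_code))


print(validate_status(200))
print(validate_status(404, expect=(404,)))
print(validate_status(403, expect=range(400, 500)))
```

Passing a `range` keeps the call site short when a whole class of client errors should be tolerated.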
2018-06-18 16:29:56 +02:00
Mike Fährmann
2d17a9e07f
improve extractor.request()
...
- better retry behavior
- exponential back-off
- removed 'allow_empty' argument
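Exponential back-off, in general terms, doubles the wait after every failed attempt. The sketch below is an assumption about the general technique, not the code from this commit; `request_with_retries`, `send`, `retries`, and `base_delay` are hypothetical names.

```python
import time


def request_with_retries(send, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on ConnectionError with doubling delays.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts
    and re-raises the last error once 'retries' retries are used up.
    """
    attempt = 0
    while True:
        try:
            return send()
        except ConnectionError:
            if attempt >= retries:
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

The `sleep` parameter exists only so that the delays can be observed or replaced in tests without actually waiting.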
2018-04-23 18:45:59 +02:00
Mike Fährmann
cc36f88586
rename safe_int to parse_int; move parse_* to text module
2018-04-20 14:53:21 +02:00
Mike Fährmann
7b562907c3
[nijie] add favorites extractor
...
adds support for 'https://nijie.info/user_like_illust_view.php?id=...'
2018-03-31 18:54:25 +02:00
Mike Fährmann
445db75955
[nijie] improve extraction and metadata
...
- add 'title' and 'description'
- split 'artist_id' into 'user_id' and 'artist_id'
  - 'user_id' is the ID of the user from whom the image entry
    originates
  - 'artist_id' is the ID of the actual image artist
- improve pagination and URL patterns
2018-03-31 18:48:41 +02:00
Mike Fährmann
a112e3f2a0
[nijie] add doujin extractor
...
adds support for "https://nijie.info/members_dojin.php?id=<artist_id>"
2018-03-31 18:17:41 +02:00
Mike Fährmann
3cec533c28
Merge branch 'archive'
2018-02-12 18:07:58 +01:00
Mike Fährmann
f5f2d29f56
[nijie] fix dojin extraction
...
- correctly extract artist_id
- set extension to "jpg" if it was empty and let filetype checks do
the rest
2018-02-09 22:06:26 +01:00
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create a unique ID for each image.
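One way such a format value could yield an ID is sketched below. The function `archive_id`, the format string, and the metadata keys are all assumptions made for illustration; the commit message does not show the actual values or how they are combined.

```python
def archive_id(category, archive_fmt, metadata):
    """Hypothetical helper: build an ID string from a format value.

    Assumption: the ID is the category name followed by the
    format string filled in with the image's metadata.
    """
    return category + archive_fmt.format_map(metadata)


# Illustrative values only; not taken from the extractor source.
print(archive_id("nijie", "{image_id}_{num}", {"image_id": 729, "num": 0}))
```

Two images with different metadata then map to different strings, which is what makes the value usable as a lookup key.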
2018-02-01 15:30:49 +01:00
Mike Fährmann
9c138dfc1f
[common] detect empty HTTP response bodies
2017-09-26 16:49:58 +02:00
Mike Fährmann
6f30cf4c64
change keyword names to valid Python identifiers
...
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.
(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough.)
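The reason for the rename can be shown with plain `eval()`. This is a sketch that assumes filter expressions are evaluated as Python expressions with the keywords as local variables; the helper `matches` is hypothetical and not the project's filter code.

```python
def matches(expression, keywords):
    """Hypothetical filter: evaluate a Python expression over keywords."""
    return bool(eval(expression, {}, keywords))


old_keywords = {"gallery-id": 12345}
new_keywords = {"gallery_id": 12345}

# A valid identifier can be used directly in the expression.
print(matches("gallery_id > 1000", new_keywords))

# 'gallery-id' parses as the subtraction 'gallery - id', so the
# lookup of the name 'gallery' fails.
try:
    matches("gallery-id > 1000", old_keywords)
except NameError as exc:
    print("NameError:", exc)

# The workaround mentioned in the commit message still works,
# but it is noticeably more awkward to write.
print(matches('locals()["gallery-id"] > 1000', old_keywords))
```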
2017-09-10 22:20:47 +02:00
Mike Fährmann
915a0137de
improve 'extractor.request'
...
- add 'fatal' argument
- improve internal logic and flow
- raise known exception on error
- update exception hierarchy
2017-08-05 16:11:46 +02:00
Mike Fährmann
7aa9fa796a
code cleanup and fixes
2017-07-25 14:59:41 +02:00
Mike Fährmann
808f67ba7d
use 'cookiedomain' for cookies set by object-config-values
...
otherwise these cookies would not be picked up by the
_check_cookies() method.
2017-07-22 15:43:35 +02:00
Mike Fährmann
0610ae5000
skip login if cookies are present
2017-07-17 10:33:36 +02:00
Mike Fährmann
d3b04076f7
add .netrc support (#22)
...
Use the '--netrc' cmdline option or set the 'netrc' config option
to 'true' to enable the use of .netrc authentication data.
The 'machine' names for the .netrc info are the lowercase extractor
names (or categories): batoto, exhentai, nijie, pixiv, seiga.
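Looking up credentials by 'machine' name can be sketched with Python's standard `netrc` module. The helper `lookup_credentials` and its `netrc_path` parameter are hypothetical; the sketch does not claim to match how the project reads the file.

```python
import netrc


def lookup_credentials(machine, netrc_path=None):
    """Hypothetical helper: return (login, password) for a machine name.

    With netrc_path=None the user's default ~/.netrc file is read.
    Returns (None, None) if the file or the entry is missing.
    """
    try:
        auth = netrc.netrc(netrc_path).authenticators(machine)
    except (OSError, netrc.NetrcParseError):
        return None, None
    if auth is None:
        return None, None
    login, _account, password = auth
    return login, password
```

A matching entry in the .netrc file would look like `machine nijie login <username> password <password>`, where the machine name is the lowercase extractor category.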
2017-06-24 12:17:26 +02:00
Mike Fährmann
4b967fa189
implement and use extractor.config() method
2017-04-25 17:12:48 +02:00
Mike Fährmann
298d7c45f7
[nijie] support multi-page image listings
2017-04-02 11:43:23 +02:00
Mike Fährmann
1d46be545c
add login notifications
2017-03-17 09:42:59 +01:00
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2
2017-02-01 00:53:19 +01:00
Mike Fährmann
4a8d74973c
adjust login methods to a specific style
2017-01-08 17:33:25 +01:00
Mike Fährmann
7952b8d18d
add a few tests expecting exceptions
2016-12-30 01:46:42 +01:00
Mike Fährmann
56d810c896
update keyword hashes for tests
2016-09-25 17:28:46 +02:00
Mike Fährmann
19c2d4ff6f
remove explicit (sub)category keywords
2016-09-25 14:22:07 +02:00
Mike Fährmann
fea3be0aed
[nijie] add image-extractor
2016-09-19 08:51:49 +02:00
Mike Fährmann
d7e168799d
consistent extractor naming scheme + docstrings
2016-09-12 10:34:31 +02:00