Mike Fährmann
a383eca7f6
decouple extractor initialization
...
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This allows a Job, for example, to adjust config access before it
takes place, instead of most of it having already happened when
calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
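The split between construction and initialization described in a383eca7f6 can be sketched roughly as follows. This is a minimal illustration, not gallery-dl's actual code: the class 'SketchExtractor', the module-level 'find()' helper, and the 'user-agent' option are all hypothetical names chosen for the example.

```python
class SketchExtractor:
    """Hypothetical extractor; not gallery-dl's actual class."""

    def __init__(self, url):
        # Cheap: only remember what was matched, do no real setup.
        self.url = url
        self.initialized = False
        self.session = None
        self.cookies = None
        self.options = {}

    def initialize(self, config=None):
        # Expensive setup (session, cookies, config options),
        # deferred until the caller asks for it.
        if self.initialized:
            return
        self.options = dict(config or {})
        self.cookies = {}
        agent = self.options.get("user-agent", "sketch/1.0")
        self.session = {"headers": {"User-Agent": agent}}
        self.initialized = True


def find(url):
    """Hypothetical stand-in for an 'extractor.find()'-style lookup."""
    return SketchExtractor(url) if url.startswith("https://") else None


extr = find("https://example.org/g4/view/1")
config = {"user-agent": "custom-agent"}  # caller adjusts config first
extr.initialize(config)                  # only now is config read
```

Because 'find()' no longer triggers any config access, the caller gets a window between lookup and 'initialize()' in which it can still change what the extractor will see.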
Mike Fährmann
d97b8c2fba
consistent cookie-related names
...
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2022-11-05 01:14:09 +01:00
Mike Fährmann
b03ca7f10c
[aryion] provide correct 'date' independent of dst
2022-03-24 22:57:18 +01:00
Mike Fährmann
4b3e309b90
[aryion] update/improve pagination (#1849)
...
Manually increment the 'p' query parameter,
instead of relying on a "Next" link which only works up to page 200.
2021-09-16 16:27:25 +02:00
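Pagination by manually incrementing 'p', as described in 4b3e309b90, might look roughly like this. It is a sketch under assumptions: 'paginate()', 'fake_fetch()', and the in-memory 'PAGES' data are hypothetical, and stopping at the first empty page is an assumed termination rule that the commit message does not state.

```python
def paginate(fetch_page, start=1):
    """Yield items page by page, incrementing 'p' until a page is empty."""
    p = start
    while True:
        items = fetch_page({"p": p})
        if not items:
            return          # assumed stop condition: first empty page
        yield from items
        p += 1              # no dependence on a "Next" link


# Hypothetical in-memory "site" with three pages of results
PAGES = {1: ["a", "b"], 2: ["c"], 3: ["d", "e"]}


def fake_fetch(params):
    return PAGES.get(params["p"], [])


results = list(paginate(fake_fetch))
```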
Mike Fährmann
266ed9b62e
[aryion] add 'tag' extractor (closes #1849)
2021-09-14 23:33:33 +02:00
Mike Fährmann
0f35aca728
[aryion] minor code updates
2021-05-19 23:46:33 +02:00
Mike Fährmann
2eb46452ad
[aryion] update 'needle' to not skip text posts (fixes #1568)
...
on "Latest Updates" pages
"class='thumb scrollthumb' href='/g4/view/" and
"class='thumb' href='/g4/view/" both end with
"thumb' href='/g4/view/"
2021-05-19 23:35:05 +02:00
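The point of 2eb46452ad is that one shared suffix matches both link variants. The following sketch shows this with plain string search; 'post_ids()' and the sample HTML are hypothetical and only illustrate the matching idea, not the extractor's real parsing code.

```python
NEEDLE = "thumb' href='/g4/view/"


def post_ids(page):
    """Return the text between each needle occurrence and the next quote."""
    ids = []
    pos = 0
    while True:
        start = page.find(NEEDLE, pos)
        if start < 0:
            return ids
        start += len(NEEDLE)
        end = page.find("'", start)
        if end < 0:
            return ids
        ids.append(page[start:end])
        pos = end


# Hypothetical page fragment containing both link variants
page = (
    "<a class='thumb scrollthumb' href='/g4/view/111'>...</a>"
    "<a class='thumb' href='/g4/view/222'>...</a>"
)
ids = post_ids(page)
```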
Mike Fährmann
387fe415d5
unescape items in text.split_html()
2021-03-29 02:12:29 +02:00
Magnus Boman
522d0a834c
[aryion] Unescape paths too (#1414)
...
Without this you'll get paths like this:
- Starcross - Ch. 2 &quot;The Ins and Outs of Sarah&quot;
This commit changes it to:
- Starcross - Ch. 2 "The Ins and Outs of Sarah"
2021-03-27 18:25:38 +01:00
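The effect of unescaping a path can be reproduced with Python's standard library. This is only an illustration: it assumes the escaped path contained '&quot;' entities, and it uses 'html.unescape()' rather than whatever helper the extractor itself calls.

```python
import html

# Assumed escaped form of the path segment
escaped = "Starcross - Ch. 2 &quot;The Ins and Outs of Sarah&quot;"
unescaped = html.unescape(escaped)
```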
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
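A small regex comparison shows what dropping '&' changes. It assumes that '/?&#' and '/?#' appear inside character classes of URL patterns; the 'example.org' patterns below are hypothetical and not taken from any extractor.

```python
import re

# Old style: '&' also terminates a path segment
OLD = re.compile(r"https://example\.org/user/([^/?&#]+)")
# New style: only '/', '?' and '#' delimit components
NEW = re.compile(r"https://example\.org/user/([^/?#]+)")

url = "https://example.org/user/a&b?page=2#top"
old_name = OLD.match(url).group(1)
new_name = NEW.match(url).group(1)
```

With the old class the captured segment stops at the ampersand; with the new one it runs up to the query delimiter.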
Mike Fährmann
bc48514d84
[aryion] get post ID via gallery-item (fixes #981, closes #982)
...
this even works when fetching post IDs from '/latest.php?id='
2020-09-06 22:17:23 +02:00
ArtaxIsSleeping
0e941553ec
[aryion] Add username/password support (#960)
...
* Add username/password support to aryion extractor
* Update docs to match
* Fix code style
2020-08-27 22:45:30 +02:00
Mike Fährmann
b2009ea39e
[aryion] update folder mime type list (fixes #945)
2020-08-16 22:30:15 +02:00
Mike Fährmann
f1ddbff0b5
[aryion] add 'recursive' option (fixes #832)
...
This is enabled by default and will recursively go through all
(sub)folders in an artist's gallery.
The old method of using "Latest Updates" lists can be restored by
disabling this option.
2020-06-26 23:36:50 +02:00
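The behaviour of the 'recursive' option from f1ddbff0b5 can be sketched as a depth-first walk with a fallback. Everything here is hypothetical: the nested-dict gallery layout, the 'walk()' and 'posts()' helpers, and the 'latest' list standing in for a "Latest Updates" listing.

```python
def walk(folder):
    """Depth-first walk over a folder and all of its subfolders."""
    for entry in folder["entries"]:
        if "entries" in entry:
            yield from walk(entry)   # descend into the subfolder
        else:
            yield entry["id"]


def posts(gallery, latest, recursive=True):
    """Use the folder walk by default, else the flat 'latest' list."""
    return list(walk(gallery)) if recursive else list(latest)


# Hypothetical gallery: two top-level posts and nested subfolders
gallery = {"entries": [
    {"id": 1},
    {"entries": [{"id": 2}, {"entries": [{"id": 3}]}]},
    {"id": 4},
]}
latest = [4, 3, 2, 1]  # hypothetical "Latest Updates" order

by_folder = posts(gallery, latest)
by_latest = posts(gallery, latest, recursive=False)
```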
Mike Fährmann
db6685eeae
[aryion] support downloading from folders (fixes #694)
2020-04-18 01:25:54 +02:00
Mike Fährmann
cf4cef3d63
[aryion] adjust 'date' to UTC time
2020-04-11 02:08:05 +02:00
Mike Fährmann
6c531be294
[aryion] fix malformed 'last-modified' headers (#390)
2020-04-10 23:08:52 +02:00
Mike Fährmann
dc65f7d8dc
[aryion] use generic download URLs (#390)
...
i.e. /g4/data.php?id=…
- get filename & extension from Content-Disposition header
- handle all downloadable file types (docx, swf, etc)
2020-04-10 22:08:45 +02:00
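Taking the filename and extension from a Content-Disposition header, as dc65f7d8dc describes, can be approximated with the standard library. This is a sketch, not the extractor's own parsing: 'filename_from_header()' is a hypothetical helper, and the sample header values are made up.

```python
from email.message import Message


def filename_from_header(value):
    """Split the 'filename' parameter of a Content-Disposition value
    into (name, extension); return (None, None) if it is absent."""
    msg = Message()
    msg["Content-Disposition"] = value
    name = msg.get_filename()
    if not name:
        return None, None
    stem, dot, ext = name.rpartition(".")
    if not dot:
        return name, ""      # no extension present
    return stem, ext.lower()


doc = filename_from_header('attachment; filename="story.docx"')
swf = filename_from_header('attachment; filename="game.SWF"')
```

Since the extension comes from the server-supplied filename rather than from the URL, this works for any file type behind a generic download URL.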
Mike Fährmann
96b78bcf04
[aryion] include path in default directory format (#390)
2020-04-10 21:58:46 +02:00
Mike Fährmann
6143050980
[aryion] add gallery and post extractors (#390, #673)
2020-04-08 21:52:51 +02:00