Mike Fährmann
3ecb512722
send Referer headers by default
2023-09-19 00:02:04 +02:00
Mike Fährmann
d13c82eff1
[kemonoparty] update favorites API endpoint (#4522)
2023-09-14 14:57:01 +02:00
Mike Fährmann
27ec653991
fix bug in test_init and update example URLs
2023-09-14 13:27:03 +02:00
Mike Fährmann
a453335a9f
remove test results in extractor modules
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
a383eca7f6
decouple extractor initialization
Introduce an 'initialize()' function that does the actual init
(session, cookies, config options) and can be called separately from
the constructor __init__().
This allows, for example, adjusting config access inside a Job
before most of it would already have happened when calling 'extractor.find()'.
2023-07-25 22:16:16 +02:00
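A minimal sketch of the split between a cheap constructor and a deferred 'initialize()', assuming a simplified extractor. The names 'ExampleExtractor', 'find', 'options' and the dict-based "session" are hypothetical and do not reproduce gallery-dl's actual classes; the point is only that a caller can adjust options between 'find()' and 'initialize()'.

```python
import re


class ExampleExtractor:
    """Hypothetical extractor illustrating deferred initialization."""

    pattern = re.compile(r"https?://example\.org/user/(\w+)")

    def __init__(self, match):
        # Cheap: only remember what the URL matched.
        self.user = match.group(1)
        self.options = {}
        self.session = None

    def initialize(self):
        # Expensive setup (session, cookies, config options) happens here,
        # after the caller had a chance to adjust self.options.
        if self.session is None:
            self.session = {
                "cookies": {},
                "timeout": self.options.get("timeout", 30),
            }


def find(url):
    """Return an uninitialized extractor for 'url', or None."""
    match = ExampleExtractor.pattern.match(url)
    return ExampleExtractor(match) if match else None


extr = find("https://example.org/user/alice")
extr.options["timeout"] = 5   # a Job could adjust config access here
extr.initialize()
print(extr.session["timeout"])  # 5
```

Calling 'initialize()' a second time is a no-op in this sketch, so a Job can call it unconditionally.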
Mike Fährmann
d97b8c2fba
consistent cookie-related names
- rename every cookie variable or method to 'cookies_*'
- simplify '.session.cookies' to just '.cookies'
- more consistent 'login()' structure
2023-07-22 01:20:50 +02:00
Mike Fährmann
4ae925c88f
[kemonoparty] support '.su' TLD (#4139)
2023-06-06 20:55:03 +02:00
Mike Fährmann
3516fdae74
[kemonoparty] fix kemono and coomer logins using the same cache (#4098)
2023-05-26 13:35:02 +02:00
Mike Fährmann
76b01b64cf
[kemonoparty] remove MD5 hash extraction (#3531)
This partially reverts commit 20d6194ffa.
2023-01-25 11:10:09 +01:00
ClosedPort22
20d6194ffa
[kemonoparty] improve hash extraction
- extract MD5 hash from URLs
- extract MD5 and SHA256 hash from Discord URLs (kemono.party only)
- minor optimization (do not call 'hashes.add' when 'duplicates' is true)
- update tests accordingly
Co-authored-by: Mike Fährmann <mike_faehrmann@web.de>
2023-01-15 12:01:13 +08:00
Mike Fährmann
85bd1cbc89
[kemonoparty] fix regression from 473bd380 (#3519)
- do not access 'response.content' unless necessary
- only validate responses if filename extensions differ
2023-01-11 15:25:01 +01:00
Mike Fährmann
473bd380c8
[kemonoparty] reject invalid/empty files (#3510)
2023-01-10 19:04:47 +01:00
Mike Fährmann
b0cb4a1b9c
replace 'text.extract()' with 'text.extr()' where possible
2022-11-05 01:14:09 +01:00
Mike Fährmann
77173694d5
[kemonoparty] fix 'dms' extraction (#3106)
2022-10-26 14:25:43 +02:00
Mike Fährmann
94a2dfe205
[kemonoparty] update pagination offset
2022-10-17 10:22:12 +02:00
Mike Fährmann
78694a61bb
[kemonoparty] restore 'favorites' API endpoints (#2994)
2022-10-01 12:15:32 +02:00
Mike Fährmann
b84982b2f9
[kemonoparty] send Referer headers (#2989, #2990)
2022-10-01 11:45:56 +02:00
Mike Fährmann
779e75c6f8
[kemonoparty] fix attachment IDs overwriting post IDs (#2984)
regression from 09a5cc61
2022-09-30 16:47:09 +02:00
Mike Fährmann
09a5cc6103
[kemonoparty] add 'count' metadata field (#2952)
2022-09-23 10:44:12 +02:00
enduser420
574e38a287
[kemonoparty] add 'favorites' option (#2826) (#2831)
* [kemonoparty] add 'favorites' option (#2826)
* [kemonoparty] add regex for the url parameter and fall back on the config option
* [kemonoparty] simplify
2022-08-18 18:01:42 +02:00
Mike Fährmann
7c0505868c
[kemonoparty] ensure all files have an 'extension' (#2740)
2022-07-10 13:53:07 +02:00
Mike Fährmann
ba69fb669d
[kemonoparty] add 'duplicates' option (closes #2440)
2022-03-24 11:58:38 +01:00
Mike Fährmann
fac8047899
[kemonoparty] limit default filename length (#2373)
2022-03-08 21:14:47 +01:00
Mike Fährmann
bddcec49f1
implement 'text.root_from_url()'
use domain from input URL for kemono
2022-03-01 03:09:57 +01:00
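A sketch of what deriving a root from the input URL can look like, assuming the root is just "scheme://host". This is a stand-alone illustration built on 'urllib.parse.urlsplit', not the actual implementation of gallery-dl's 'text.root_from_url()'; the default-scheme handling is an assumption.

```python
from urllib.parse import urlsplit


def root_from_url(url, scheme="https://"):
    """Return 'scheme://host' for 'url' (illustrative sketch)."""
    if "://" not in url:
        # No scheme given: prepend the default so urlsplit sees a netloc.
        url = scheme + url
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"
```

With this, an input URL on kemono.party, beta.kemono.party, or coomer.party keeps its own domain as the API root instead of a hard-coded one.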
Mike Fährmann
92c492dc09
[kemonoparty] match beta.kemono.party URLs (#2348)
2022-03-01 03:02:30 +01:00
Mike Fährmann
a57a44f510
[kemonoparty] handle files without 'name' (fixes #2276)
2022-02-08 18:27:05 +01:00
Mike Fährmann
d7b8e04b50
[kemonoparty] use 'Accept-Encoding: identity' for all downloads (#2267)
fixes issues when data sent with 'Content-Encoding: gzip' or other
encodings is larger than the actual file
2022-02-05 18:06:58 +01:00
Mike Fährmann
a2eecc6aa8
[kemonoparty] fix DMs extraction (#2008)
2022-01-25 23:16:13 +01:00
Mike Fährmann
6af8d71da6
[kemonoparty] use service as subcategory (closes #2147)
2021-12-29 22:46:17 +01:00
Mike Fährmann
8ed282f7f2
[kemonoparty] support coomer.party URLs (#2100)
2021-12-15 16:21:05 +01:00
Mike Fährmann
f1b142e993
[kemonoparty] change default 'files' order to attachments,file,inline (#1991)
2021-11-29 04:41:30 +01:00
Mike Fährmann
e298882acc
[kemonoparty] match URLs with www subdomain
2021-11-26 18:58:26 +01:00
Mike Fährmann
af6424f398
allow testing metadata in list elements
2021-11-21 22:46:34 +01:00
Mike Fährmann
c67756e187
[kemonoparty] add 'dms' option (#2008)
2021-11-20 23:36:16 +01:00
Mike Fährmann
9bc83af3a6
[kemonoparty] 'postfile' -> 'file' (#1991)
to stay consistent with the existing file types for kemono
2021-11-19 01:50:48 +01:00
Mike Fährmann
d433735750
[kemonoparty] skip duplicate files (#2032, #1991, #1899)
Extract the SHA-256 file hash from URLs
and skip files with the same hash in the same post.
- provide a 'hash' metadata field (empty string if not available)
- remove 'patreon-skip-file' option
2021-11-17 22:44:15 +01:00
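A sketch of per-post duplicate skipping by URL hash, assuming the SHA-256 digest appears in the file URL as a standalone run of 64 lowercase hex digits (the '/data/ab/ab/<hash>.png' layout used below is illustrative). 'extract_hash' and 'unique_files' are hypothetical helper names, not gallery-dl functions. Files without a recognizable hash get an empty 'hash' and are never skipped.

```python
import re

# Assumption: the digest is a 64-hex-digit run not adjacent to other hex digits.
HASH_RE = re.compile(r"(?<![0-9a-f])[0-9a-f]{64}(?![0-9a-f])")


def extract_hash(url):
    """Return the SHA-256 hex digest found in 'url', or '' if none."""
    match = HASH_RE.search(url)
    return match.group(0) if match else ""


def unique_files(urls):
    """Yield (url, hash) pairs, skipping repeated hashes within one post."""
    seen = set()  # reset per post: duplicates are only detected inside a post
    for url in urls:
        file_hash = extract_hash(url)
        if file_hash:
            if file_hash in seen:
                continue
            seen.add(file_hash)
        yield url, file_hash
```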
Mike Fährmann
d4ec245554
[kemonoparty] implement a 'files' option (#1991)
similar to 8d676151
2021-11-17 22:43:41 +01:00
Mike Fährmann
6e3658ef52
[kemonoparty] provide 'date' metadata for gumroad (#2007)
This date is not taken from the 'published' or 'edited' values, since those are 'null',
but it is still better than nothing at all.
2021-11-11 19:38:10 +01:00
Mike Fährmann
f0fc3b0ba1
[kemonoparty] add 'comments' option (#1980)
2021-11-03 23:02:13 +01:00
Mike Fährmann
f1487a3cfa
[kemonoparty:discord] improve 'inline' extraction (#1940)
- extract media.discordapp.*NET* URLs
- rewrite media.discordapp.net to cdn.discordapp.com
- use a more restricted set of characters for the URL path
2021-10-24 21:15:21 +02:00
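A sketch of the three bullet points above, assuming inline media appear as plain absolute URLs in the message text. The exact allowed character set ('DISCORD_RE') and the helper name 'inline_discord_urls' are hypothetical; gallery-dl's actual pattern may differ. Both hosts are matched, and every result is rewritten to cdn.discordapp.com.

```python
import re

# Restricted path characters (an assumption): letters, digits, - . _ ~ /
DISCORD_RE = re.compile(
    r"https://(?:cdn\.discordapp\.com|media\.discordapp\.net)"
    r"(/[A-Za-z0-9\-._~/]+)"
)


def inline_discord_urls(content):
    """Find Discord media URLs in 'content' and normalize them to the CDN host."""
    return [
        "https://cdn.discordapp.com" + path
        for path in DISCORD_RE.findall(content)
    ]
```

Because '.' is an allowed path character here, a URL directly followed by a sentence-ending period would keep that period; a stricter pattern would need to handle this case.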
Mike Fährmann
b6443c576d
[kemonoparty:discord] extract 'inline' files
2021-10-22 02:50:47 +02:00
Mike Fährmann
bcbf9bcf36
[kemonoparty] split 'discord' extractor (#1940)
into 'server' and 'channel'
2021-10-18 04:04:58 +02:00
Mike Fährmann
db857b40d8
[kemonoparty] improve inline extraction (#1899)
2021-10-17 21:47:11 +02:00
Mike Fährmann
70005e3275
[kemonoparty:discord] support downloading from a specific channel
https://kemono.party/discord/server/<server-id>#<channel-name>
2021-10-15 18:50:08 +02:00
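A sketch of splitting such a URL into server ID and channel name with 'urllib.parse.urlsplit', assuming the path is exactly 'discord/server/<server-id>' and the channel name is the URL fragment. 'parse_discord_url' is a hypothetical helper; returning None for a missing fragment is an assumed way of signalling "all channels of the server".

```python
from urllib.parse import urlsplit


def parse_discord_url(url):
    """Split .../discord/server/<server-id>#<channel-name> into its parts."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Assumed path layout: discord/server/<server-id>
    if len(segments) != 3 or segments[:2] != ["discord", "server"]:
        raise ValueError("not a Discord server URL: " + url)
    server_id = segments[2]
    channel_name = parts.fragment or None  # None: no specific channel requested
    return server_id, channel_name
```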
Mike Fährmann
003f25931d
[kemonoparty:discord] provide a 'channel_name'
2021-10-15 18:37:08 +02:00
Mike Fährmann
fe6ce5495a
[kemonoparty] add 'discord' extractor (#1827, #1940)
2021-10-13 20:33:05 +02:00
Mike Fährmann
da9685609c
[kemonoparty] update file download URLs (closes #1902, fixes #1903)
2021-09-30 23:02:46 +02:00
Mike Fährmann
4ec11af6a4
[kemonoparty] implement login with username & password (#1824)
2021-09-09 01:06:25 +02:00
Mike Fährmann
83bbb628d8
[kemonoparty] add 'favorite' extractor (#1824)
2021-09-08 00:32:49 +02:00
Mike Fährmann
bb6a130942
automatically set required DDoS-GUARD cookies (#1779)
for kemono.party and seiso.party
2021-08-16 17:40:29 +02:00