Mike Fährmann
f1ddbff0b5
[aryion] add 'recursive' option ( fixes #832 )
...
This is enabled by default and will recursively go through all
(sub)folders in an artist's gallery.
The old method of using "Latest Updates" lists can be restored by
disabling this option.
2020-06-26 23:36:50 +02:00
Mike Fährmann
699062b91f
Revert "[kissmanga] workaround for CAPTCHAs ( #818 )"
...
This reverts commit 4cf3d54718.
2020-06-25 19:35:03 +02:00
Mike Fährmann
0cac14c3bd
update extractor test results
2020-06-25 19:11:47 +02:00
Mike Fährmann
5e5be67c26
[tumblr] prevent KeyErrors when using reblogs=same-blog
...
(fixes #851 )
2020-06-25 19:00:12 +02:00
Mike Fährmann
9da2bc67f8
[twitter] add option to filter media from quoted tweets ( #854 )
2020-06-25 18:59:25 +02:00
Mike Fährmann
56ab5fb8f4
[twitter] improve handling of quoted tweets ( #854 )
...
Split each "quote" into two parts:
- the original tweet
- the tweet that quoted the original
2020-06-24 21:14:18 +02:00
Mike Fährmann
bd0e1ca1a5
[imgur] build directory path for each file ( closes #842 )
2020-06-21 19:25:52 +02:00
Mike Fährmann
a8c2d997e8
[twitter] treat quoted tweets like retweets ( #833 )
...
- filter them when 'retweets' is disabled
- set 'author' to the creator of the quoted tweet
like it was before the rewrite
2020-06-21 19:14:12 +02:00
Mike Fährmann
aed1c63e51
[twitter] improve search results ( fixes #847 )
...
Adding 'tweet_search_mode=live' to the query parameters
is the most important part here.
2020-06-21 15:53:20 +02:00
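A minimal sketch of adding the parameter to a search query string. Only
'tweet_search_mode=live' comes from the commit message; the 'q' parameter
and its value are assumptions made up for illustration.

```python
from urllib.parse import urlencode

# 'q' and its value are hypothetical; only 'tweet_search_mode=live'
# is taken from the commit message.
params = {
    "q": "from:example filter:images",
    "tweet_search_mode": "live",
}

# Encode the parameters into a URL query string.
query = urlencode(params)
print(query)
```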
Mike Fährmann
0e714b9a0e
[pinterest] add 'section' extractor ( #835 )
2020-06-21 00:08:14 +02:00
Mike Fährmann
53cc498d9c
improve config lookup when there are multiple possible locations
...
This specifically applies to all Mastodon extractors and all
extractors with a 'basecategory', i.e. 'booru', 'foolslide', etc.
Values inside those general config locations wouldn't be recognized
when a value with the same name was set on the 'extractor' level.
For example 'extractor.mastodon.directory' should be used over
'extractor.directory' when both are set, but this was impossible
with the previous implementation.
(fixes #843 )
2020-06-21 00:07:10 +02:00
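The lookup order described above can be sketched roughly as follows. This
is an illustration under assumptions, not the actual implementation: the
'lookup()' helper, its signature, and the sample config are hypothetical.
The point is only that the most specific location that sets a key wins.

```python
# Hypothetical sketch; not the project's actual config code.
def lookup(config, paths, key, default=None):
    """Return 'key' from the first location in 'paths' that sets it.

    'paths' lists candidate locations from most to least specific,
    each one a tuple of nested section names.
    """
    for path in paths:
        section = config
        for name in path:
            section = section.get(name)
            if not isinstance(section, dict):
                break  # this location does not exist
        else:
            if key in section:
                return section[key]
    return default


config = {
    "extractor": {
        "directory": ["general"],
        "mastodon": {"directory": ["mastodon"]},
    }
}

# Most specific location first.
paths = [("extractor", "mastodon"), ("extractor",)]
result = lookup(config, paths, "directory")
print(result)
```

With both values set, 'extractor.mastodon.directory' is returned; when the
more specific section is missing, the lookup falls back to
'extractor.directory'.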
Mike Fährmann
1b3870a4be
flush after writing JSON in DataJob() ( #727 )
...
… and remove the dead handle_finalize() method,
which is never called since DataJob() overrides run().
2020-06-19 23:05:44 +02:00
Mike Fährmann
d81a8e6544
[twitter] update tests
2020-06-19 23:01:02 +02:00
Mike Fährmann
d39eedd9bb
[twitter] improve handling of deleted tweets ( fixes #838 )
2020-06-19 18:11:37 +02:00
Mike Fährmann
1ae1df0d27
update '--write-pages' ( #737 )
...
- fix infinite recursion for responses with multiple entries in
'history'
- hide values of Set-Cookie headers
- only write the response content by default
(use '-o write-pages=all' to also include HTTP headers)
2020-06-18 15:07:30 +02:00
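One way to hide Set-Cookie values when dumping headers, as a sketch only.
The commit message does not say how the values are hidden; the function
names and the '***' placeholder below are assumptions.

```python
# Hypothetical helpers; names and the '***' placeholder are assumptions.
def mask_set_cookie(value):
    """Replace the cookie value with '***', keeping name and attributes."""
    pair, sep, attributes = value.partition(";")
    name, eq, _ = pair.partition("=")
    if not eq:
        return "***"
    return f"{name}=***{sep}{attributes}"


def dump_headers(headers):
    """Format headers as 'Name: value' lines with Set-Cookie values masked."""
    lines = []
    for name, value in headers.items():
        if name.lower() == "set-cookie":
            value = mask_set_cookie(value)
        lines.append(f"{name}: {value}")
    return "\n".join(lines)


headers = {
    "Content-Type": "text/html",
    "Set-Cookie": "sid=abc123; Path=/; HttpOnly",
}
dump = dump_headers(headers)
print(dump)
```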
Mike Fährmann
7e8a747c56
improve output of '-K' for parent extractors 2 ( #825 )
...
This is what bb882b8 was supposed to be, but I managed to
not include those changes in the first commit …
2020-06-18 15:04:15 +02:00
Mike Fährmann
dc16f73965
[twitter] move '_guest_token()' into TwitterAPI class
2020-06-18 15:02:51 +02:00
Mike Fährmann
3561d1020a
[twitter] always provide an 'author' field ( #831 , #833 )
...
The idea was to have less metadata clutter for most Tweets where
'author' and 'user' are the same (non-retweets), and only provide
a 'user' field.
The original Tweet author could be gotten with
{author[…]|user[…]}, but basically no one knows about that.
2020-06-18 15:02:51 +02:00
Mike Fährmann
7158bdd7c7
[weibo] improve extractor logic ( #829 )
2020-06-18 15:00:31 +02:00
Mike Fährmann
37d71f6e09
strip microseconds in text.parse_datetime()
2020-06-17 21:40:16 +02:00
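Stripping microseconds from a parsed timestamp could look like the sketch
below. The function name, the default format string, and the sample input
are assumptions; the actual text.parse_datetime() may differ.

```python
from datetime import datetime


# Hypothetical stand-in for illustration; not the actual
# text.parse_datetime() implementation.
def parse_datetime_sketch(date_string, fmt="%Y-%m-%dT%H:%M:%S.%f%z"):
    """Parse 'date_string' and drop its microseconds."""
    return datetime.strptime(date_string, fmt).replace(microsecond=0)


parsed = parse_datetime_sketch("2020-06-17T21:40:16.123456+0200")
print(parsed.isoformat())
```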
Mike Fährmann
0371fd54a1
[artstation] add 'date' metadata field ( #839 )
2020-06-17 20:22:18 +02:00
Mike Fährmann
8c857052d7
[mastodon] ignore toots without media attachments
2020-06-17 20:21:28 +02:00
Mike Fährmann
de045d39b2
[mastodon] add 'date' metadata field ( #839 )
2020-06-17 19:22:28 +02:00
Mike Fährmann
d5d90a0450
[weibo] add 'date' field to 'status' objects ( #829 )
2020-06-16 14:46:46 +02:00
Mike Fährmann
5ba90f72ca
[pinterest] add support for sections ( closes #835 )
2020-06-16 14:41:05 +02:00
Mike Fährmann
c37a1c06c8
[twitter] add extractor for liked tweets ( closes #837 )
...
You need to be logged in to get access to anyone's liked tweets,
it seems.
2020-06-16 14:27:22 +02:00
Mike Fährmann
b94394104c
[twitter] don't download video previews ( #833 )
...
when 'videos' is set to False
2020-06-16 14:10:51 +02:00
Mike Fährmann
bb882b8cdb
improve output of '-K' for parent extractors ( #825 )
2020-06-14 21:39:21 +02:00
Mike Fährmann
6db7ed90cb
release version 1.14.1
2020-06-12 20:12:09 +02:00
Mike Fährmann
087e3184dc
use a non-twitter URL when testing snap creation
2020-06-12 18:31:14 +02:00
Mike Fährmann
c184cce876
update configuration.rst
...
- fix anonymous links
- update description of 'extractor.twitter.videos'
- document 'extractor.redgifs.format' (#724 )
2020-06-12 18:25:17 +02:00
Mike Fährmann
4cf3d54718
[kissmanga] workaround for CAPTCHAs ( fixes #818 )
...
Requesting the same page again when being redirected to a CAPTCHA
lets us access that page without solving it.
2020-06-12 00:41:49 +02:00
Mike Fährmann
7daef6ee70
update extractor test results
...
- certain posts on Instagram now return
https://static.cdninstagram.com/rsrc.php/null.jpg
for public users
- MangaDex is deploying its new MangaDex@Home network similar to
exhentai's Hentai@Home
- realbooru has a new site layout, but the underlying booru API still
works like before
2020-06-12 00:36:06 +02:00
Mike Fährmann
ffb6c5277a
[furaffinity] add 'artist_url' metadata field ( closes #821 )
2020-06-11 18:36:24 +02:00
Mike Fährmann
be04e44e2c
[reddit] catch JSON decode errors ( #765 )
2020-06-11 18:32:52 +02:00
Mike Fährmann
cf863f60b3
[redgifs] add 'user' and 'search' extractors ( closes #724 )
2020-06-10 22:03:52 +02:00
Mike Fährmann
998d1d3a5c
[webtoons] generalize and improve comic extraction ( fixes #820 )
2020-06-10 21:44:42 +02:00
Mike Fährmann
1489712325
resolve redirects after solving Cloudflare challenges
2020-06-10 21:14:57 +02:00
Mike Fährmann
b0b1feaa67
request 'transparent.gif' when solving Cloudflare challenges
...
This currently also works without, but they might be using these to
detect potential bots in the future.
2020-06-10 21:04:33 +02:00
Mike Fährmann
036a40943a
[twitter] don't cache results of 'user_by_screen_name()'
...
A 'keyarg=1' argument to the memcache decorator would have worked as
well, but keeping the user object in memory isn't useful for the vast
majority of use cases and only wastes space.
(closes #817 )
2020-06-10 20:58:42 +02:00
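To illustrate what a 'keyarg' argument does for a caching decorator, here
is a simplified sketch. The 'memoize' decorator and the 'API' class are
hypothetical stand-ins, not the project's memcache decorator, and the
behaviour without 'keyarg' (one shared cache entry for all calls) is an
assumption.

```python
import functools


# Hypothetical stand-in; not the project's memcache decorator.
def memoize(keyarg=None):
    """Cache results, keyed by the positional argument at index 'keyarg'.

    Assumption: without 'keyarg', all calls share a single cache entry.
    """
    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = "" if keyarg is None else args[keyarg]
            if key not in cache:
                cache[key] = func(*args, **kwargs)
            return cache[key]
        return wrapper
    return decorator


class API:
    def __init__(self):
        self.calls = 0

    # args[0] is 'self', so index 1 selects 'screen_name'
    @memoize(keyarg=1)
    def user_by_screen_name(self, screen_name):
        self.calls += 1
        return {"screen_name": screen_name}


api = API()
first = api.user_by_screen_name("alice")
second = api.user_by_screen_name("bob")
again = api.user_by_screen_name("alice")  # served from the cache
```

With 'keyarg=1' each screen name gets its own cache entry. Not caching at
all, as this commit does, avoids keeping user objects in memory.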
Mike Fährmann
4442dfe7b8
[twitter] add 'reply_to' metadata to replies
2020-06-09 21:48:04 +02:00
Mike Fährmann
83b7bd0413
[nhentai] fix extraction ( closes #819 )
2020-06-09 21:27:07 +02:00
Mike Fährmann
d769bb4b80
[twitter] improve pagination
2020-06-07 15:23:45 +02:00
Mike Fährmann
5bc1097f9d
[twitter] metadata cleanup #2
...
- remove useless clutter by creating new tweet-data dicts instead of
reusing the original Tweet objects
- rename fields to how they were named before
('id_str' -> 'tweet_id', etc.)
- only include 'author' if it would differ from 'user'
- restore 'archive_fmt'
2020-06-07 02:25:29 +02:00
Mike Fährmann
1fcf938f9c
implement a general 'delete_items()' function
2020-06-06 23:49:49 +02:00
Mike Fährmann
c6c06c41f6
[deviantart] don't add journal text to description ( #712 )
2020-06-05 21:56:12 +02:00
Mike Fährmann
4aea5138dd
[sensescans] use https://
2020-06-05 21:55:19 +02:00
Mike Fährmann
3eed5f52d7
[twitter] small metadata cleanup
...
- add 'date' field
- remove 'entities' and 'extended_entities'
- don't include 'focus_fields' from 'original_info'
2020-06-04 18:21:54 +02:00
Mike Fährmann
655c98cbef
[twitter] skip unavailable tweets
2020-06-04 14:51:25 +02:00
Mike Fährmann
41d03160ff
[deviantart] also search journals for sta.sh links ( #712 )
...
when 'extra' is enabled
2020-06-04 14:47:08 +02:00