Mike Fährmann
208202b962
[tumblr] improve error handling ( #297 )
...
In some cases Tumblr's API responds with an HTML document.
Trying to decode it as JSON would raise an uncaught exception.
2019-06-04 14:02:17 +02:00
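The failure mode addressed by 208202b962 can be illustrated with a minimal sketch. The names `parse_api_response` and `TumblrResponseError` are hypothetical and not taken from the extractor itself; the sketch only shows one way to turn a non-JSON body into a handled error instead of an uncaught exception.

```python
import json


class TumblrResponseError(Exception):
    """Hypothetical exception type, used only in this sketch."""


def parse_api_response(text):
    """Decode an API response body; raise a clear error for non-JSON data."""
    try:
        return json.loads(text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError, so this also
        # covers the case where the body is an HTML document
        raise TumblrResponseError("API response is not valid JSON") from None
```

A caller could catch `TumblrResponseError` and log a message rather than crash.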
Mike Fährmann
add7e693d0
[tumblr] provide parsed 'date' metadata ( #232 )
2019-04-29 17:30:42 +02:00
Mike Fährmann
fb14f80d62
[tumblr] fix avatar URLs for non-OAuth1.0 calls ( closes #193 )
2019-03-17 11:07:22 +01:00
Mike Fährmann
d0059cab79
[tumblr] check for null URLs ( closes #165 )
2019-02-19 13:49:55 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
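The new result shape described in 5530871b5a can be sketched as follows. This is an illustrative reimplementation under the assumption that only the last path component matters; it is not the actual body of text.nameext_from_url().

```python
import os.path
from urllib.parse import unquote, urlsplit


def nameext_from_url(url):
    """Return 'filename' (without extension) and 'extension' for a URL."""
    # take the last path component, ignoring query string and fragment
    basename = unquote(urlsplit(url).path).rpartition("/")[2]
    name, ext = os.path.splitext(basename)
    # splitext keeps the leading dot on the extension, so strip it
    return {"filename": name, "extension": ext[1:]}
```

For the example URL above, this yields 'filename' as the filename and 'ext' as the extension.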
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
0afa913de4
[tumblr] add tests for hidden and private blogs ( #145 )
...
Hidden / dashboard-only blogs are pretty straightforward and "only"
require a valid 'access-token' and 'access-token-secret' for the given
'api-key' and 'api-secret', so that signed OAuth1.0 requests are possible.
Private / password protected blogs on the other hand are a bit
cumbersome. In addition to a valid 'access-token' and
'access-token-secret', they also require the account belonging to those
tokens to be a member of the blog itself. Knowing the password and
entering it in the website isn't enough to access a blog through the
API. Following a private blog is also impossible, so that option can't
work either.
2019-01-03 16:12:24 +01:00
Mike Fährmann
2f4f60de33
[tumblr] add tests for each post type
2018-12-27 22:41:42 +01:00
Mike Fährmann
28f9539551
[tumblr] change default values for post types and inline media
2018-12-26 18:55:59 +01:00
Mike Fährmann
5be95034ba
[tumblr] add option to download avatars ( #137 )
2018-12-26 14:29:30 +01:00
Mike Fährmann
2e5f82e59e
[tumblr] don't follow 'external' Tumblr URLs ( #139 )
2018-12-22 14:05:43 +01:00
Mike Fährmann
049a9575c4
[tumblr] fix inline extraction #2
...
Using only the "comment" field isn't enough ...
[ci skip]
2018-12-11 21:57:20 +01:00
Mike Fährmann
b7a9f6cc49
[tumblr] improve inline extraction ( #137 )
2018-12-11 20:02:48 +01:00
HRXN
e80ee77d71
tumblr.py: update regex for video ( #133 )
...
There seems to be another subdomain for videos.
Not just
`vt(.media).tumblr`
`vtt(media).tumblr`
But also
`ve(.media).tumblr`
2018-12-09 09:07:46 +01:00
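A pattern covering the three host prefixes listed in e80ee77d71 might look like the sketch below. It assumes that each prefix may optionally be followed by '.media' and that the full domain is 'tumblr.com'; the commit writes the second variant without a dot, so the exact form of that host is an assumption here. The name `is_video_url` is hypothetical.

```python
import re

# Assumed host pattern: one of the prefixes 'vtt', 'vt' or 've',
# optionally followed by '.media', then '.tumblr.com'.
VIDEO_HOST = re.compile(r"https?://(?:vtt|vt|ve)(?:\.media)?\.tumblr\.com/")


def is_video_url(url):
    """Return True if the URL points at one of the assumed video hosts."""
    return VIDEO_HOST.match(url) is not None
```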
Mike Fährmann
9a98b6769d
use extractor.request for API calls ( #130 )
...
... at least for OAuth1.0 based APIs (flickr, smugmug, tumblr)
2018-12-04 21:29:06 +01:00
Mike Fährmann
ad2cefda6b
[tumblr] in case of exception use filename as 'hash' ( #129 )
...
While a filename might not be a real 'hash', or comparable to what
Tumblr usually provides, it is still better than an empty string.
At least as long as "alternatives" in format strings aren't implemented.
2018-12-04 19:15:23 +01:00
Mike Fährmann
95636418ad
[tumblr] catch exception for 'hash' extraction ( fixes #129 )
2018-12-02 19:48:09 +01:00
Mike Fährmann
7742cf8601
[tumblr] change 'reblogs' option ( #103 )
...
- rename "deleted" to "same-blog"
- change test for deleted original post to test if
original post owner has the same UUID (full blog name) as the one
being downloaded from
- add 'blog[uuid]' metadata to allow comparison with
'reblogged_from_uuid'
2018-09-10 15:40:25 +02:00
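The "same-blog" test from 7742cf8601 amounts to a UUID comparison. A minimal sketch, assuming the post and blog metadata are plain dictionaries; the function name `is_same_blog_reblog` is hypothetical, while the keys 'uuid' and 'reblogged_from_uuid' are the ones named in the commit message.

```python
def is_same_blog_reblog(post, blog):
    """Return True if a reblog's source UUID matches the blog being downloaded."""
    uuid = blog.get("uuid")
    # a missing UUID on either side must not count as a match
    return uuid is not None and post.get("reblogged_from_uuid") == uuid
```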
Mike Fährmann
d4d95d3154
[tumblr] improve rewrite rules for video URLs
2018-09-09 14:09:47 +02:00
Mike Fährmann
a666ddd16b
[tumblr] extend 'reblogs' functionality ( #103 )
...
Setting 'reblogs' to "deleted" will check if the parent post of a
reblog has been deleted and download its media content if that is the
case, otherwise it will be skipped.
This is a rather costly operation (1 API request per reblogged post)
and should therefore be used with care.
2018-09-07 19:13:52 +02:00
Mike Fährmann
b4eca2633e
[tumblr] support /archive URLs
2018-09-06 11:09:13 +02:00
Mike Fährmann
aa1de70da0
[tumblr] recognize inline videos ( #102 )
2018-09-06 10:37:40 +02:00
Mike Fährmann
5b8a314de7
[tumblr] replace inline URLs with higher quality ones ( #98 )
2018-08-25 18:43:51 +02:00
Mike Fährmann
a74591b84b
[tumblr] remove "original image" functionality
...
Accessing higher/original quality images on
https://s3.amazonaws.com/data.tumblr.com and http://data.tumblr.com
is no longer possible and any HTTP request results in 403 Forbidden.
A few images can still be accessed through https://a.tumblr.com [1][2],
but not as "_raw", just "_1280", and that might also be "fixed" in
the near future.
[1] https://a.tumblr.com/tumblr_kzjlfiTnfe1qz4rgho1_1280.jpg
[2] https://a.tumblr.com/ee589c6345f29d2d5935cecb49b0a705/tumblr_oztu02dIHp1wgha4yo1_1280.png
2018-08-14 11:51:17 +02:00
Mike Fährmann
1c1e086d01
use common base class for OAuth1.0 based API interfaces
2018-05-10 21:57:45 +02:00
Mike Fährmann
6a31ada9e3
re-implement OAuth1.0 code
...
OAuth support for SmugMug needs some additional features
(auth-rebuild on redirect, query parameters in URL, ...)
and fixing this in the old code wouldn't work all that well.
2018-05-10 18:47:05 +02:00
Mike Fährmann
69a5e6ddb3
Merge branch 'master' into 1.4-dev
2018-05-04 10:19:02 +02:00
Mike Fährmann
8b79eaafea
[tumblr] log actual time of rate limit resets
...
... instead of the amount of seconds until a reset
2018-04-25 16:13:03 +02:00
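Converting a "seconds until reset" value into a wall-clock time, as 8b79eaafea describes, can be sketched like this. The function name `reset_time` and the "%H:%M:%S" output format are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def reset_time(seconds, now=None):
    """Return the local time at which a rate limit resets, as HH:MM:SS."""
    if now is None:
        now = datetime.now()
    return (now + timedelta(seconds=seconds)).strftime("%H:%M:%S")
```

A log message can then say when the limit resets instead of how long to wait.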
Mike Fährmann
f471161920
Merge branch 'master' into 1.4-dev
2018-04-21 12:15:40 +02:00
Mike Fährmann
b1325d4d2c
fix extractor docstrings
2018-04-18 18:03:43 +02:00
Mike Fährmann
728c64a3fb
[tumblr] rename 'offset' to 'num' and adjust formats
...
Trying to somehow emulate Tumblr filenames is a bad idea ...
2018-04-15 18:58:32 +02:00
Mike Fährmann
6bd857a319
[tumblr] handle rate limits / 429 errors
...
- wait for the hourly limit to reset
- abort upon exceeding the daily limit (it doesn't seem useful to
potentially wait for several hours)
2018-04-12 16:25:20 +02:00
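The two rules from 6bd857a319 (wait for the hourly reset, abort on the daily limit) can be sketched as a small decision function. The header names used below are assumptions for this sketch, as are the names `seconds_until_retry` and `DailyLimitExceeded`.

```python
class DailyLimitExceeded(Exception):
    """Hypothetical exception type, used only in this sketch."""


def seconds_until_retry(headers):
    """Decide how to react to a 429 response based on rate limit headers."""
    # assumed header names; adjust to whatever the API actually sends
    day_remaining = int(headers.get("X-Ratelimit-Perday-Remaining", 1))
    if day_remaining == 0:
        # waiting could take several hours, so give up instead
        raise DailyLimitExceeded("daily API rate limit exceeded")
    # otherwise only the hourly limit is exhausted: wait for its reset
    return int(headers.get("X-Ratelimit-Perhour-Reset", 0))
```

The caller would pass the returned value to `time.sleep()` before repeating the request.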
Mike Fährmann
a1fa4b43b0
Revert "[tumblr] add option to sort photosets by upload order"
...
This reverts commit 4a26ae32df.
2018-04-09 16:08:08 +02:00
Mike Fährmann
4a26ae32df
[tumblr] add option to sort photosets by upload order
2018-04-07 15:57:55 +02:00
Mike Fährmann
6b72be8ee6
[tumblr] add 'hash' keyword
...
'hash' is the middle part of the filename in a tumblr image URL.
For example an image with '.../tumblr_p6tgemp1NZ1wgha4yo1_250.png' as
its URL would have 'p6tgemp1NZ1wgha4yo1' as hash.
2018-04-07 15:54:30 +02:00
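The 'hash' keyword from 6b72be8ee6 can be sketched as a split on underscores. The function name `extract_hash` is hypothetical, and falling back to the plain filename follows the later commit ad2cefda6b rather than this one.

```python
import os.path


def extract_hash(url):
    """Return the middle part of a 'tumblr_<hash>_<size>.<ext>' filename."""
    filename = url.rpartition("/")[2]
    stem = os.path.splitext(filename)[0]
    parts = stem.split("_")
    if len(parts) >= 3:
        return parts[1]
    # unexpected filename layout: use the filename itself
    return stem
```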
Mike Fährmann
68e9fbee16
[tumblr] check all 4 keys/secrets before using OAuth
...
It was possible to cause a crash by setting 'api-key' or 'api-secret' to null.
(this commit also slightly improves the blog-cache implementation)
2018-04-05 15:42:23 +02:00
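The check introduced in 68e9fbee16 can be sketched as a guard that requires all four values. The function name `can_use_oauth` and its parameter names are hypothetical.

```python
def can_use_oauth(api_key, api_secret, access_token, access_token_secret):
    """Return True only if none of the four values is missing or empty."""
    # a null (None) value in any position disables signed requests
    return all((api_key, api_secret, access_token, access_token_secret))
```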
Mike Fährmann
f8168c693e
[tumblr] avoid calls to '/blog/.../info'
...
The same information returned by the 'blog/.../info' API endpoint
is also included in the result of every 'blog/.../posts' call.
2018-04-04 14:15:24 +02:00
Mike Fährmann
858fdbdb22
[tumblr] improve 'inline' extraction
...
'quote' posts store their HTML content in the 'source' field
2018-03-02 06:59:44 +01:00
Mike Fährmann
5008e105ee
update archive IDs
...
... to behave in a more straightforward way when dealing with
bookmarks/favourites/etc.
Specific IDs are now grouped by their owner, album-id, ... to
allow for duplicates when it would be expected.
2018-03-01 18:20:50 +01:00
Mike Fährmann
3cec533c28
Merge branch 'archive'
2018-02-12 18:07:58 +01:00
Mike Fährmann
d38bf2f54c
[tumblr] recognize /image/... URLs
...
xyz.tumblr.com/image/123 refers to the same images
as xyz.tumblr.com/post/123.
2018-02-08 23:08:14 +01:00
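Treating /image/ and /post/ URLs alike, as in d38bf2f54c, could be done with a single alternation. The pattern below is a sketch under the assumption that blogs live on a 'tumblr.com' subdomain; it is not the extractor's actual URL pattern, and `parse_post_url` is a hypothetical name.

```python
import re

# the scheme is optional so that 'xyz.tumblr.com/image/123' matches as well
POST_URL = re.compile(r"(?:https?://)?([^/]+\.tumblr\.com)/(?:post|image)/(\d+)")


def parse_post_url(url):
    """Return (blog, post_id) for /post/ and /image/ URLs, else None."""
    match = POST_URL.match(url)
    return match.groups() if match else None
```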
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create a unique ID for each image.
2018-02-01 15:30:49 +01:00
Mike Fährmann
9fccd7b783
[tumblr] provide fallback URLs ( #64 )
...
Each image now produces 3 URLs:
- amazonaws.com _raw (or _1280 for older images)
- amazonaws.com _500
- media.tumblr.com (URL returned by API)
2018-01-19 23:12:15 +01:00
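The fallback idea from 9fccd7b783 is to try candidate URLs in order of preference and use the first one that works. A generic sketch follows; `first_available` and the injected `is_available` callable are hypothetical and stand in for whatever download check is actually used.

```python
def first_available(urls, is_available):
    """Return the first URL accepted by is_available(), or None."""
    for url in urls:
        if is_available(url):
            return url
    return None
```

Passing the availability check in as a function keeps the ordering logic independent of how requests are made.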
Mike Fährmann
421a9740a3
[tumblr] add 'tumblr:' to force Tumblr extractor ( #71 )
2018-01-15 18:27:58 +01:00
Mike Fährmann
9a049bdf51
[tumblr] add 'likes' extractor ( #65 )
2018-01-12 14:56:01 +01:00
Mike Fährmann
29d75fc3fa
[tumblr] add support for OAuth authentication ( #65 )
2018-01-11 14:11:37 +01:00
Mike Fährmann
75b2e84b6d
[tumblr] use s3.amazonaws.com for image URLs ( #64 )
2018-01-09 15:13:00 +01:00
Mike Fährmann
03b8a548cb
[tumblr] change 'reblogs' default value to true ( #61 )
2018-01-06 15:52:08 +01:00
Mike Fährmann
d235f68f59
[tumblr] add option to filter reblogged posts ( #61 )
...
Reblogs are ignored by default, but can be included by setting
'extractor.tumblr.reblogs' to 'true'.
2018-01-05 13:05:57 +01:00
Mike Fährmann
b14de6ffc2
[tumblr] small improvements
...
- don't transform inline GIF URLs
- set 'type' parameter for API calls if there is only
one post type selected
2017-11-24 16:51:07 +01:00
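The second point of b14de6ffc2 can be sketched as follows: when exactly one post type is selected, the filter is passed as a 'type' parameter. The function name `build_posts_params` is hypothetical.

```python
def build_posts_params(post_types):
    """Build query parameters for a posts request from the selected types."""
    params = {}
    if len(post_types) == 1:
        # a single selected type is sent as the 'type' parameter
        params["type"] = next(iter(post_types))
    return params
```

With several types selected, no 'type' parameter is set and filtering has to happen on the client.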
Mike Fährmann
9296a26eae
[tumblr] add warning messages
2017-11-23 16:12:07 +01:00
Mike Fährmann
12de658937
[tumblr] add options to control extraction behavior ( #48 )
...
- posts : list of post-types to inspect
- inline : scan post bodies for inline images
- external: follow external links
2017-11-23 15:32:54 +01:00
Mike Fährmann
077f8c12be
[tumblr] original video URLs + continuous offset
2017-11-20 20:51:02 +01:00
Mike Fährmann
8eb12ebeae
[tumblr] support more post/media types ( #48 )
...
This adds support for audio and video posts (most videos are shared
from youtube/instagram which isn't supported -> youtube-dl),
as well as link posts and image-search inside of text posts.
Most of this is just WIP and will need some sort of improvement
and options to enable/disable different media types etc.
2017-11-18 23:11:32 +01:00
Mike Fährmann
980fd3616d
[tumblr] use API v2 ( #48 )
2017-11-03 22:16:57 +01:00
Mike Fährmann
d6bed9f36f
[tumblr] prevent premature exit to get all images ( fixes #48 )
2017-11-03 14:59:31 +01:00
Mike Fährmann
81a7788b40
replace space characters in unit test URLs
2017-10-23 17:00:53 +02:00
Mike Fährmann
393755ee94
[tumblr] update tests
2017-10-09 00:10:37 +02:00
Mike Fährmann
6f30cf4c64
change keyword names to valid Python identifiers
...
This commit mostly replaces all minus-signs ('-') in keyword names with
underscores ('_') to allow them to be used in filter-expressions. For
example 'gallery-id' got renamed to 'gallery_id'.
(It is theoretically possible to access any variable, regardless of its
name, with 'locals()["NAME"]', but that seems a bit too convoluted if
just 'NAME' could be enough)
2017-09-10 22:20:47 +02:00
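The motivation in 6f30cf4c64 can be shown with a short sketch: once keyword names are valid identifiers, a filter expression can refer to them directly. Both function names are hypothetical, and the use of `eval` is an assumption about how such expressions might be evaluated.

```python
def to_identifier_keys(metadata):
    """Replace '-' with '_' in all keyword names."""
    return {key.replace("-", "_"): value for key, value in metadata.items()}


def matches_filter(expression, metadata):
    """Evaluate a filter expression with keywords available as local names."""
    # eval() is only acceptable here because expressions come from the user
    return bool(eval(expression, {}, dict(metadata)))
```

With the old name, 'gallery-id > 100' would be parsed as a subtraction of 'id' from 'gallery'.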
Mike Fährmann
80c2e03aaa
[reddit] allow 'date-min/max' to be human readable dates
...
If the date-min/max config value is a string, try parsing it using
datetime.strptime [1] with 'date-format' as format string [2]
(default: "%Y-%m-%dT%H:%M:%S")
Example: get all submissions posted in 2016
$ gallery-dl reddit.com/r/... \
-o date-format=%Y \
-o date-min=\"2016\" \
-o date-max=\"2017\"
[1] https://docs.python.org/3/library/datetime.html#datetime.datetime.strptime
[2] https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
2017-07-01 18:46:38 +02:00
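The parsing rule from 80c2e03aaa can be sketched with datetime.strptime. The function name `parse_date_option` is hypothetical, and interpreting the parsed date as UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone


def parse_date_option(value, fmt="%Y-%m-%dT%H:%M:%S"):
    """Return a Unix timestamp for a numeric or human-readable date value."""
    if isinstance(value, str):
        # strings are parsed with the configured format, assumed to be UTC
        parsed = datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        return int(parsed.timestamp())
    # numeric values are taken as timestamps already
    return value
```

With the format "%Y", the values "2016" and "2017" bound exactly one calendar year.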
Mike Fährmann
71e08dc9c4
[tumblr] keyword consistency
2017-04-13 20:47:22 +02:00
Mike Fährmann
bd95fea82c
update unit test results
2017-04-11 21:03:09 +02:00
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2
2017-02-01 00:53:19 +01:00
Mike Fährmann
0211ec4114
update some tests
2016-12-08 00:24:23 +01:00
Mike Fährmann
8d106a447c
[tumblr] delete more useless keywords
2016-09-27 21:49:38 +02:00
Mike Fährmann
56d810c896
update keyword hashes for tests
2016-09-25 17:28:46 +02:00
Mike Fährmann
19c2d4ff6f
remove explicit (sub)category keywords
2016-09-25 14:22:07 +02:00
Mike Fährmann
85ff3d160e
[tumblr] fix json parsing + metadata consistency
2016-09-16 09:38:14 +02:00
Mike Fährmann
d7e168799d
consistent extractor naming scheme + docstrings
2016-09-12 10:34:31 +02:00
Mike Fährmann
808cf69556
update a few tests
2016-09-01 18:28:16 +02:00
Mike Fährmann
6f7d42b974
update tests
2016-07-12 12:08:36 +02:00
Mike Fährmann
81096f7790
[tumblr] fix json parsing
2016-03-06 15:30:55 +01:00
Mike Fährmann
f974ea73db
[tumblr] add tag-extractor
2016-02-20 15:24:55 +01:00
Mike Fährmann
58a0029bb2
[tumblr] add post-extractor
2016-02-20 15:24:30 +01:00
Mike Fährmann
8eb7232169
[tumblr] add extractor
2016-02-20 11:29:10 +01:00