With --metadata-json, a JSON file is created for each post, saving the
properties defined in the instaloader.Post class, e.g. caption, number
of likes, people tagged in the caption or in the picture itself, etc.
This closes #33 and closes #47.
caption_mentions is a list of all lowercased profiles that are mentioned
in the Post's caption, without the preceding '@'.
tagged_users is a list of all lowercased users that are tagged in the
Post. This was requested in #47.
Just like all properties of instaloader.Post class, caption_mentions and
tagged_users are available for --only-if filters.
caption_hashtags is a list of all hashtags that are mentioned in the
Post's caption. It makes it easy to filter Posts that have multiple
hashtags, and thus fixes #24.
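The caption-derived lists can be sketched with two regular expressions. This is a simplified, hypothetical version: the patterns (a hashtag is '#' followed by word characters; a mention is '@' followed by word characters with optional inner dots) are assumptions, not necessarily the rules Instaloader or Instagram apply.

```python
import re
from typing import List, Optional

# Assumed, simplified patterns; Instagram's exact rules may differ.
HASHTAG_RE = re.compile(r"#(\w+)")
# '@', then a word character, optionally followed by word characters or dots
# ending in a word character (so a trailing '.' is not captured).
MENTION_RE = re.compile(r"@(\w(?:[\w.]*\w)?)")


def caption_hashtags(caption: Optional[str]) -> List[str]:
    """All hashtags in the caption, without the preceding '#'."""
    return HASHTAG_RE.findall(caption or "")


def caption_mentions(caption: Optional[str]) -> List[str]:
    """All mentioned profiles, lowercased and without the preceding '@'."""
    return [name.lower() for name in MENTION_RE.findall(caption or "")]


caption = "Sunset with @Alice. and @bob_smith #Beach #sunset"
print(caption_hashtags(caption))  # ['Beach', 'sunset']
print(caption_mentions(caption))  # ['alice', 'bob_smith']
```

This sketch keeps hashtags in their original case, since only mentions are described as lowercased; a filter such as --only-if="'sunset' in caption_hashtags and 'Beach' in caption_hashtags" then becomes a plain membership test.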
Further, the documentation of --only-if has been completed by linking to
a description of the syntax in the Python documentation, and by linking
to a list of all defined properties with their meanings. So, this commit
also closes #42.
I've come across several implausible values for `device_timestamp` (such as 182428140, which would be in October 1975 if interpreted as seconds). I guess this is due to improperly configured phones, or perhaps third-party software that mangles the EXIF data of images before they are posted. In any case, by the nature of stories, the `taken_at` timestamp (presumably when the Instagram servers received the post) should be approximately when the image was actually taken, so there is no real value in using the timestamp provided by the photo-taking device.
Additional sleeps are necessary because Instagram rate-limits GraphQL
queries. The error does not occur if no more than 100 queries are made
within a sliding window of eleven minutes.
Ports a894c2d to version 3.
If --only-if='likes>1000 or viewer_has_liked' is given, it is not
necessary to evaluate viewer_has_liked if the post has more than 1000
likes. The new implementation short-circuits in this case and only
evaluates viewer_has_liked when likes>1000 is false.
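One way to get this behaviour is to let Python's own short-circuiting of 'and'/'or' do the work and resolve names lazily. The sketch below is an illustration, not necessarily Instaloader's implementation: it passes a mapping to eval() as locals that looks up a Post property only when the expression actually needs it. The names LazyProperties, filterstr_to_func and FakePost are hypothetical.

```python
class LazyProperties:
    """Mapping that looks up a name on the wrapped object only when asked.

    eval() accepts any mapping as its locals argument, and 'and'/'or'
    short-circuit, so names in an unevaluated operand are never looked up.
    """

    def __init__(self, obj):
        self._obj = obj

    def __getitem__(self, name):
        try:
            return getattr(self._obj, name)
        except AttributeError:
            # KeyError makes eval() fall back to globals/builtins, then NameError.
            raise KeyError(name) from None


def filterstr_to_func(filter_str):
    """Turn an --only-if expression into a predicate on Post-like objects."""
    code = compile(filter_str, "<--only-if>", "eval")

    def filter_func(post):
        return bool(eval(code, {"__builtins__": {}}, LazyProperties(post)))

    return filter_func


class FakePost:
    """Stand-in for instaloader.Post that counts expensive lookups."""

    def __init__(self, likes):
        self.likes = likes
        self.liked_lookups = 0

    @property
    def viewer_has_liked(self):
        self.liked_lookups += 1  # in real life this could cost a request
        return False


only_if = filterstr_to_func("likes>1000 or viewer_has_liked")
popular, quiet = FakePost(5000), FakePost(10)
print(only_if(popular), popular.liked_lookups)  # True 0
print(only_if(quiet), quiet.liked_lookups)      # False 1
```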
download_stories() may throw a BadResponseException, which should not
abort download_profile(). Now, all calls of download_stories() are
within an _error_catcher context.
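A minimal sketch of such a context manager, with hypothetical names (error_catcher, and a download_profile() that takes callables) and a stand-in BadResponseException; Instaloader's real _error_catcher() may differ in what it catches and how it reports.

```python
import sys
from contextlib import contextmanager


class BadResponseException(Exception):
    """Stand-in for Instaloader's exception of the same name."""


@contextmanager
def error_catcher(extra_info=None):
    """Report a BadResponseException and continue instead of aborting."""
    try:
        yield
    except BadResponseException as err:
        prefix = f"{extra_info}: " if extra_info else ""
        print(f"{prefix}{err}", file=sys.stderr)


def download_profile(name, download_stories, download_posts):
    # A failure while fetching stories must not prevent downloading the posts.
    with error_catcher(f"stories of {name}"):
        download_stories(name)
    download_posts(name)


done = []


def failing_stories(name):
    raise BadResponseException("unexpected response")


download_profile("someprofile", failing_stories, done.append)
print(done)  # ['someprofile']
```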
These are now adapted to the number of requests already made. With the
current settings, Instaloader makes no more than
12 requests in the first ten seconds,
28 requests in the first minute,
40 requests in the first two minutes,
63 requests in the first five minutes,
90 requests in the first ten minutes,
and after that 50 requests per ten minutes.
This should make it less likely that Instaloader is rate-limited by
Instagram, while still being fast if downloading only a few posts.
Further, the --no-sleep option is now hidden from the --help output and
README.rst.
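The limits listed above can be expressed as a cumulative request budget. This sketch is not Instaloader's implementation (which adapts sleep durations to the request count); it only computes how long to wait so that the listed caps hold. It assumes that "50 requests per ten minutes" means 50 more requests per ten-minute block after the first ten minutes, rather than a sliding window. All names are hypothetical.

```python
# Cumulative caps from the list above: (end of period in seconds, max requests).
STARTUP_BUDGET = [(10, 12), (60, 28), (120, 40), (300, 63), (600, 90)]
BLOCK_SECONDS, BLOCK_REQUESTS = 600, 50  # assumed steady-state rule


def request_budget(elapsed: float) -> int:
    """Maximum number of requests allowed within the first `elapsed` seconds."""
    for period, limit in STARTUP_BUDGET:
        if elapsed < period:
            return limit
    last_period, last_limit = STARTUP_BUDGET[-1]
    blocks = int((elapsed - last_period) // BLOCK_SECONDS) + 1
    return last_limit + BLOCK_REQUESTS * blocks


def next_boundary(elapsed: float) -> float:
    """The next point in time at which the budget increases."""
    for period, _ in STARTUP_BUDGET:
        if elapsed < period:
            return float(period)
    last_period, _ = STARTUP_BUDGET[-1]
    blocks = int((elapsed - last_period) // BLOCK_SECONDS) + 1
    return float(last_period + BLOCK_SECONDS * blocks)


def seconds_to_wait(requests_done: int, elapsed: float) -> float:
    """How long to sleep before making one more request."""
    t = elapsed
    while request_budget(t) <= requests_done:
        t = next_boundary(t)
    return t - elapsed


print(seconds_to_wait(5, 2.0))     # 0.0
print(seconds_to_wait(12, 3.0))    # 7.0
print(seconds_to_wait(90, 400.0))  # 200.0
```

A download loop would sleep for seconds_to_wait(count, time.monotonic() - start) before each request.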
Fixes a KeyError which occurred when fetching some information not
available in Post._node. All properties now return None instead of
raising an exception if the information cannot be obtained.
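A sketch of this pattern: a helper walks into the metadata dict and returns None as soon as a level is missing, and properties are built on top of it. The node layout used here ("caption", "likes" -> "count") is hypothetical and does not reflect Instagram's actual metadata structure.

```python
from typing import Any, Optional


class Post:
    """Minimal stand-in for instaloader.Post, built from a metadata dict."""

    def __init__(self, node: dict):
        self._node = node

    def _field(self, *keys: str) -> Optional[Any]:
        """Follow keys into the node; None if any level is missing."""
        value: Any = self._node
        for key in keys:
            if not isinstance(value, dict) or key not in value:
                return None
            value = value[key]
        return value

    @property
    def caption(self) -> Optional[str]:
        return self._field("caption")

    @property
    def likes(self) -> Optional[int]:
        return self._field("likes", "count")


post = Post({"likes": {"count": 42}})
print(post.likes, post.caption)  # 42 None
```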
sidecar_edges is now a method get_sidecar_edges(). get_comments() does
not make any additional requests if the post has no comments.
Fixes #39.
where FILTER is a boolean expression in Python syntax in which all names
are evaluated as instaloader.Post properties.
Examples:
instaloader --login=USER --only-if='viewer_has_liked' :feed
instaloader --only-if='likes>1000 and comments>5' profile
A plain `requests.head()` call does not send the default Instaloader headers, so a user agent such as `"python-requests/2.18.1"` leaks.
This simplifies accessing properties of a Post. Method download_post()
remains in class Instaloader rather than Post, as it fits there better.
Also, since it is now easily possible, all download_*() functions now
have a filter_func parameter. Its meaning has been reversed to be
consistent with how a filter is commonly understood: a post is
downloaded iff filter_func is None or evaluates to True.
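The filter semantics can be sketched as follows; download_filtered is a hypothetical helper, and plain integers stand in for posts.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def download_filtered(posts: Iterable[T],
                      download_post: Callable[[T], None],
                      filter_func: Optional[Callable[[T], bool]] = None) -> List[T]:
    """Download every post for which filter_func is None or returns True."""
    downloaded = []
    for post in posts:
        if filter_func is not None and not filter_func(post):
            continue  # filtered out: skip this post
        download_post(post)
        downloaded.append(post)
    return downloaded


got = []
print(download_filtered([1, 2, 3, 4], got.append, lambda n: n % 2 == 0))  # [2, 4]
print(download_filtered([1, 2], got.append))  # [1, 2]
```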
Post.get_comments() foreports commit 86fb80d ("Avoid GraphQL queries if
all comments in metadata").
The original method of substituting 2048x2048 for whatever resolution was given seemed somewhat convoluted. This accomplishes the same thing, except that it raises an exception if the given URL is not on the right domain.
Instead of retrying a download attempt answered with a 404, the download
is aborted after the first attempt. Thanks to the _error_catcher(), a
message is printed and Instaloader goes on with the next files to
download.
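The retry policy can be sketched with hypothetical names: NotFoundError stands in for whatever exception signals a 404, and ConnectionError for a transient failure. A 404 is re-raised immediately so that a surrounding handler (such as _error_catcher()) can print a message and continue; the number of retries here is an assumption.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class NotFoundError(Exception):
    """Hypothetical exception for an HTTP 404 response."""


def fetch_with_retries(fetch: Callable[[], T], tries: int = 3) -> T:
    """Call fetch(), retrying transient connection errors but never a 404."""
    for attempt in range(1, tries + 1):
        try:
            return fetch()
        except NotFoundError:
            raise  # retrying will not make a missing resource appear
        except ConnectionError:
            if attempt == tries:
                raise
    raise ValueError("tries must be at least 1")
```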
Further, this commit removes the unneeded NodeUnavailableException and
adjusts docstrings accordingly.
Additional sleeps are necessary because Instagram rate-limits GraphQL
queries. The error does not occur if no more than 100 queries are made
within a sliding window of eleven minutes.
Fixes #29
If get_node_metadata() is able to provide all comments of a node, no
additional query is needed. GraphQL queries in particular are expensive
in time, because no more than 100 can be made in ten minutes. Since
get_node_metadata() does not use GraphQL queries, this is a useful
tradeoff.
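A sketch of the decision: use the comments embedded in the metadata when their number matches the total count, and query only otherwise. The key names (edge_media_to_comment, count, edges, node) are assumptions about the metadata layout, and query_comments is a hypothetical stand-in for the paginated GraphQL query.

```python
from typing import Callable, Iterator


def get_comments(node: dict,
                 query_comments: Callable[[], Iterator[dict]]) -> Iterator[dict]:
    """Yield all comments of a node, querying only if the metadata is incomplete."""
    comments = node.get("edge_media_to_comment", {})
    edges = comments.get("edges", [])
    count = comments.get("count", 0)
    if count == 0:
        return  # no comments, so no request at all
    if len(edges) == count:
        # The metadata already holds every comment; skip the expensive query.
        for edge in edges:
            yield edge["node"]
        return
    yield from query_comments()
```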
+ Additional error handling
Now, setup.py no longer assumes that it is called from the directory
where the source tree resides. This fixes getting the long_description
and the version if setup.py is called from elsewhere.
These options instruct Instaloader not to save captions or geotags,
respectively, even if the corresponding information can be obtained
without any additional queries to Instagram.
This feature was proposed in #25, and thus this commit should close #25.
Get rid of NonfatalException (an exception is nonfatal iff it is
caught somewhere)
Foreport fixes for #26 and #30.
The current __version__ string is now kept in instaloader.py rather than
setup.py. This lets instaloader.__version__ always deliver the version
of the actually-loaded Instaloader module.
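With the version string living in instaloader.py, setup.py has to read it from there, and, following the earlier setup.py fix, it should locate that file relative to setup.py itself rather than the current working directory. A minimal sketch; the regular expression and the helper names are assumptions, not necessarily what Instaloader's setup.py does.

```python
import os
import re


def parse_version(source: str) -> str:
    """Extract the __version__ string from Python source text."""
    match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", source, re.MULTILINE)
    if match is None:
        raise RuntimeError("__version__ not found")
    return match.group(1)


def read_version(setup_py_path: str) -> str:
    # Resolve instaloader.py relative to setup.py, not the working directory.
    source_dir = os.path.dirname(os.path.abspath(setup_py_path))
    with open(os.path.join(source_dir, "instaloader.py"), encoding="utf-8") as f:
        return parse_version(f.read())
```

In setup.py this would be used as version=read_version(__file__).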
Minor changes to README.rst, error handling and which class methods are
public.
With these and the changes of the previous commit, we saved 31 lines of
code, which suggests that the code may be easier to understand and to
maintain.
Instead of relying on datetime.fromtimestamp() raising a ValueError if
the timestamp of a story is in milliseconds rather than seconds, we now
compare the timestamp value with a timestamp in the year 2286 to decide
whether to divide it by 1000.
This is motivated by #30, where an OSError is raised in
datetime.fromtimestamp() under Windows.
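The comparison can be sketched as below. The threshold 10**10 seconds after the Unix epoch falls in November 2286, so any larger value is treated as milliseconds. The function name is hypothetical, and the example uses UTC only to keep the output deterministic.

```python
from datetime import datetime, timezone

# 10**10 seconds after the Unix epoch falls in November 2286; larger values
# are assumed to be timestamps in milliseconds.
MAX_TIMESTAMP_SECONDS = 10_000_000_000


def story_datetime(timestamp: float) -> datetime:
    """Convert a story timestamp given in seconds or milliseconds."""
    if timestamp > MAX_TIMESTAMP_SECONDS:
        timestamp = timestamp / 1000
    # UTC keeps the example deterministic; local time would work the same way.
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


print(story_datetime(1_500_000_000))      # 2017-07-14 02:40:00+00:00
print(story_datetime(1_500_000_000_000))  # 2017-07-14 02:40:00+00:00
```

Because the value is scaled before calling fromtimestamp(), no platform-specific exception (ValueError or OSError) is ever relied upon.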