mirror of https://github.com/instaloader/instaloader.git synced 2024-10-03 22:07:11 +02:00

Merge branch 'master' into upcoming/v4.6

This commit is contained in:
Alexander Graf 2020-11-28 19:00:49 +01:00
commit 4c02a186d3
12 changed files with 274 additions and 68 deletions


@ -13,7 +13,7 @@ Steps to reproduce the behavior:
(e.g. Instaloader command line)
**Expected behavior**
A clear and concise description of what you expected to happen (if not obvious).
A clear and concise description of what you expected to happen (even if it seems obvious).
**Error messages and tracebacks**
If applicable, add error messages and tracebacks to help explain your problem.


@ -6,3 +6,4 @@ Instaloader is written by
- Alexander Graf (@aandergr)
- André Koch-Kramer (@Thammus)
- Lars Lindqvist (@e5150)
- ... and many more, see https://github.com/instaloader/instaloader/graphs/contributors

CODE_OF_CONDUCT.md Normal file

@ -0,0 +1,135 @@
# Contributor Covenant Code of Conduct
## Our Pledge
We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, religion, or sexual identity
and orientation.
We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.
## Our Standards
Examples of behavior that contributes to a positive environment for our
community include:
* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
overall community
Examples of unacceptable behavior include:
* The use of sexualized language or imagery, and sexual attention or
advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting
## Enforcement Responsibilities
Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.
Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.
## Scope
This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.
## Enforcement
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement by opening an
issue or contacting one or more of the project maintainers.
All complaints will be reviewed and investigated promptly and fairly.
All community leaders are obligated to respect the privacy and security of the
reporter of any incident.
## Enforcement Guidelines
Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:
### 1. Correction
**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.
**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.
### 2. Warning
**Community Impact**: A violation through a single incident or series
of actions.
**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.
### 3. Temporary Ban
**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.
**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.
### 4. Permanent Ban
**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.
**Consequence**: A permanent ban from any sort of public interaction within
the community.
## Attribution
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
[https://www.contributor-covenant.org/version/2/0/code_of_conduct.html][v2.0].
Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available
at [https://www.contributor-covenant.org/translations][translations].
[homepage]: https://www.contributor-covenant.org
[v2.0]: https://www.contributor-covenant.org/version/2/0/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations


@ -1,6 +1,6 @@
The MIT License (MIT)
Copyright (c) 2016-2019 Alexander Graf and André Koch-Kramer.
Copyright (c) 2016-2020 Alexander Graf and André Koch-Kramer.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal


@ -32,7 +32,7 @@ reporting a problem, please keep the following in mind:
- Include all **error messages and tracebacks** in the report.
- If not obvious, describe **which behavior you expected**
- Even if it seems obvious, describe **which behavior you expected**
instead of what actually happened.
- If we have closed an issue apparently inadvertently or inappropriately, please


@ -33,6 +33,9 @@ To **upgrade Instaloader** to its current version, do::
- On **Windows 10**, you may download the standalone executable from the
`current release page <https://github.com/instaloader/instaloader/releases/latest>`__.
- On **Android**, you can use Instaloader with `Termux <https://play.google.com/store/apps/details?id=com.termux>`__
after typing ``pkg install python`` and ``pip3 install instaloader``.
- To test the most current **pre-release** version of Instaloader::
pip3 install --pre instaloader


@ -9,10 +9,7 @@ Troubleshooting
---------------------
Instaloader has a logic to keep track of its requests to Instagram and to obey
their rate limits. Since they are nowhere documented, we try them out
experimentally. We have a daily cron job running to confirm that Instaloader
still stays within the rate limits. Nevertheless, the rate control logic assumes
that
their rate limits. The rate controller assumes that
- at one time, Instaloader is the only application that consumes requests, i.e.
neither the Instagram browser interface, nor a mobile app, nor another
@ -21,7 +18,13 @@ that
- no requests had been consumed when Instaloader starts.
The latter one implies that restarting or reinstantiating Instaloader often
within short time is prone to cause a 429. If a request is denied with a 429,
within short time is prone to cause a 429.
Since the behavior of the rate controller might change between different
versions of Instaloader, make sure to use the current version of Instaloader,
especially when encountering many 429 errors.
If a request is denied with a 429,
Instaloader retries the request as soon as the temporary ban is assumed to be
expired. In case the retry continuously fails for some reason, which should not
happen under normal conditions, consider adjusting the
@ -32,13 +35,18 @@ promiscuous IP addresses, such as cloud, VPN and public proxy services, might be
subject to significantly stricter limits for anonymous access. However,
logged-in accesses (see :option:`--login`) do not seem to be affected.
Instaloader allows to adjust the rate controlling behavior by overriding
:class:`instaloader.RateController`.
Too many queries in the last time
---------------------------------
**"Too many queries in the last time"** is not an error. It is a notice that the
rate limit has almost been reached, according to Instaloader's own rate
accounting mechanism. We regularly adjust this mechanism to match Instagram's
current rate limiting.
accounting mechanism.
Instaloader allows to adjust the rate controlling behavior by overriding
:class:`instaloader.RateController`.
Private but not followed
------------------------
@ -57,9 +65,8 @@ pointing the user to an URL to be opened in a browser.
Nevertheless, in :issue:`92` and :issue:`615` users reported problems with
logging in. We recommend to always keep the session file which Instaloader
creates when using :option:`--login`. If a session file is present,
:option:`--login` does not make make use of the failure-prone login procedure.
Also, session files usually do not expire and can be copied between different
computers without any problems.
:option:`--login` does not make use of the failure-prone login procedure.
Also, session files usually do not expire.
If you do not have a session file present, you may use the following script
(:example:`615_import_firefox_session.py`) to work around login problems by


@ -1,7 +1,7 @@
"""Download pictures (or videos) along with their captions and other metadata from Instagram."""
__version__ = '4.5.1'
__version__ = '4.5.4'
try:


@ -514,15 +514,16 @@ class Instaloader:
# Download the image(s) / video thumbnail and videos within sidecars if desired
downloaded = True
if post.typename == 'GraphSidecar':
for edge_number, sidecar_node in enumerate(post.get_sidecar_nodes(), start=1):
if self.download_pictures and (not sidecar_node.is_video or self.download_video_thumbnails):
# Download sidecar picture or video thumbnail (--no-pictures implies --no-video-thumbnails)
downloaded &= self.download_pic(filename=filename, url=sidecar_node.display_url,
mtime=post.date_local, filename_suffix=str(edge_number))
if sidecar_node.is_video and self.download_videos:
# Download sidecar video if desired
downloaded &= self.download_pic(filename=filename, url=sidecar_node.video_url,
mtime=post.date_local, filename_suffix=str(edge_number))
if self.download_pictures or self.download_videos:
for edge_number, sidecar_node in enumerate(post.get_sidecar_nodes(), start=1):
if self.download_pictures and (not sidecar_node.is_video or self.download_video_thumbnails):
# Download sidecar picture or video thumbnail (--no-pictures implies --no-video-thumbnails)
downloaded &= self.download_pic(filename=filename, url=sidecar_node.display_url,
mtime=post.date_local, filename_suffix=str(edge_number))
if sidecar_node.is_video and self.download_videos:
# Download sidecar video if desired
downloaded &= self.download_pic(filename=filename, url=sidecar_node.video_url,
mtime=post.date_local, filename_suffix=str(edge_number))
elif post.typename == 'GraphImage':
# Download picture
if self.download_pictures:
@ -578,12 +579,12 @@ class Instaloader:
def _userid_chunks():
assert userids is not None
userids_per_query = 100
userids_per_query = 50
for i in range(0, len(userids), userids_per_query):
yield userids[i:i + userids_per_query]
for userid_chunk in _userid_chunks():
stories = self.context.graphql_query("bf41e22b1c4ba4c9f31b844ebb7d9056",
stories = self.context.graphql_query("303a4ae99711322310f25250d988f3b7",
{"reel_ids": userid_chunk, "precomposed_overlay": False})["data"]
yield from (Story(self.context, media) for media in stories['reels_media'])
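The chunking done by `_userid_chunks` in the hunk above can be tried in isolation. The following is a minimal standalone sketch, not the Instaloader code itself: the free function `userid_chunks` and the integer IDs are made up for illustration, and only the chunk size of 50 is taken from the diff.

```python
from typing import Iterator, List


def userid_chunks(userids: List[int], userids_per_query: int = 50) -> Iterator[List[int]]:
    """Yield consecutive slices of at most userids_per_query IDs (hypothetical helper)."""
    for i in range(0, len(userids), userids_per_query):
        yield userids[i:i + userids_per_query]


# 120 made-up user IDs are split into two full chunks and one partial chunk.
chunk_sizes = [len(chunk) for chunk in userid_chunks(list(range(120)))]
```

With 120 IDs this yields chunks of 50, 50 and 20 entries, so each story query in the loop would carry at most 50 `reel_ids`.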
@ -856,7 +857,7 @@ class Instaloader:
"""
self.context.log("Retrieving saved posts...")
assert self.context.username is not None # safe due to @_requires_login; required by typechecker
node_iterator = Profile.from_username(self.context, self.context.username).get_saved_posts()
node_iterator = Profile.own_profile(self.context).get_saved_posts()
self.posts_download_loop(node_iterator, ":saved",
fast_update, post_filter,
max_count=max_count, total_count=node_iterator.count)


@ -347,10 +347,10 @@ class InstaloaderContext:
raise ConnectionException("\"window._sharedData\" does not contain required keys.")
# If GraphQL data is missing in `window._sharedData`, search for it in `__additionalDataLoaded`.
if 'graphql' not in post_or_profile_page[0]:
match = re.search(r'window\.__additionalDataLoaded\([^{]+{"graphql":({.*})}\);</script>',
match = re.search(r'window\.__additionalDataLoaded\(.*?({.*"graphql":.*})\);</script>',
resp.text)
if match is not None:
post_or_profile_page[0]['graphql'] = json.loads(match.group(1))
post_or_profile_page[0]['graphql'] = json.loads(match.group(1))['graphql']
return resp_json
else:
resp_json = resp.json()
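To see what the changed pattern in the hunk above accepts, here is a small sketch that runs both patterns against a made-up page fragment. The sample HTML, the `num_results` key and the helper `extract_graphql` are assumptions for illustration only; real responses are larger and may look different.

```python
import json
import re

# Hypothetical page fragment in which "graphql" is not the first key of the object.
SAMPLE_HTML = (
    '<script type="text/javascript">'
    "window.__additionalDataLoaded('/p/ABC/',"
    '{"num_results":1,"graphql":{"shortcode_media":{"shortcode":"ABC"}}}'
    ');</script>'
)

# Both patterns are copied from the diff above.
OLD_PATTERN = r'window\.__additionalDataLoaded\([^{]+{"graphql":({.*})}\);</script>'
NEW_PATTERN = r'window\.__additionalDataLoaded\(.*?({.*"graphql":.*})\);</script>'


def extract_graphql(html: str):
    """Return the decoded "graphql" value, or None if the pattern does not match."""
    match = re.search(NEW_PATTERN, html)
    if match is None:
        return None
    # The new pattern captures the whole JSON object, so the key is selected after decoding.
    return json.loads(match.group(1))['graphql']


graphql = extract_graphql(SAMPLE_HTML)
old_match = re.search(OLD_PATTERN, SAMPLE_HTML)
```

On this sample the old pattern finds nothing because it expects `"graphql"` to directly follow the opening brace, while the new pattern captures the full object and leaves key selection to `json.loads`.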
@ -545,8 +545,8 @@ class RateController:
def __init__(self, context: InstaloaderContext):
self._context = context
self._graphql_query_timestamps = dict() # type: Dict[str, List[float]]
self._graphql_earliest_next_request_time = 0.0
self._query_timestamps = dict() # type: Dict[str, List[float]]
self._earliest_next_request_time = 0.0
def sleep(self, secs: float):
"""Wait given number of seconds."""
@ -556,11 +556,11 @@ class RateController:
time.sleep(secs)
def _dump_query_timestamps(self, current_time: float, failed_query_type: str):
windows = [10, 11, 15, 20, 30, 60]
windows = [10, 11, 20, 22, 30, 60]
self._context.error("Requests within last {} minutes grouped by type:"
.format('/'.join(str(w) for w in windows)),
repeat_at_end=False)
for query_type, times in self._graphql_query_timestamps.items():
for query_type, times in self._query_timestamps.items():
reqs_in_sliding_window = [sum(t > current_time - w * 60 for t in times) for w in windows]
self._context.error(" {} {:>32}: {}".format(
"*" if query_type == failed_query_type else " ",
@ -569,28 +569,61 @@ class RateController:
), repeat_at_end=False)
def count_per_sliding_window(self, query_type: str) -> int:
"""Return how many GraphQL requests can be done within the sliding window."""
"""Return how many requests can be done within the sliding window."""
# Not static, to allow for the count_per_sliding_window to depend on context-inherent properties, such as
# whether we are logged in.
# pylint:disable=no-self-use,unused-argument
return 200
# pylint:disable=no-self-use
return 75 if query_type in ['iphone', 'other'] else 200
def _reqs_in_sliding_window(self, query_type: Optional[str], current_time: float, window: float) -> List[float]:
if query_type is not None:
# timestamps of type query_type
relevant_timestamps = self._query_timestamps[query_type]
else:
# all GraphQL queries, i.e. not 'iphone' or 'other'
graphql_query_timestamps = filter(lambda tp: tp[0] not in ['iphone', 'other'],
self._query_timestamps.items())
relevant_timestamps = [t for times in (tp[1] for tp in graphql_query_timestamps) for t in times]
return list(filter(lambda t: t > current_time - window, relevant_timestamps))
def query_waittime(self, query_type: str, current_time: float, untracked_queries: bool = False) -> float:
"""Calculate time needed to wait before GraphQL query can be executed."""
sliding_window = 660
if query_type not in self._graphql_query_timestamps:
self._graphql_query_timestamps[query_type] = []
self._graphql_query_timestamps[query_type] = list(filter(lambda t: t > current_time - 60 * 60,
self._graphql_query_timestamps[query_type]))
reqs_in_sliding_window = list(filter(lambda t: t > current_time - sliding_window,
self._graphql_query_timestamps[query_type]))
count_per_sliding_window = self.count_per_sliding_window(query_type)
if len(reqs_in_sliding_window) < count_per_sliding_window and not untracked_queries:
return max(0.0, self._graphql_earliest_next_request_time - current_time)
next_request_time = min(reqs_in_sliding_window) + sliding_window + 6
if untracked_queries:
self._graphql_earliest_next_request_time = next_request_time
return max(next_request_time, self._graphql_earliest_next_request_time) - current_time
"""Calculate time needed to wait before query can be executed."""
per_type_sliding_window = 660
if query_type not in self._query_timestamps:
self._query_timestamps[query_type] = []
self._query_timestamps[query_type] = list(filter(lambda t: t > current_time - 60 * 60,
self._query_timestamps[query_type]))
def per_type_next_request_time():
reqs_in_sliding_window = self._reqs_in_sliding_window(query_type, current_time, per_type_sliding_window)
if len(reqs_in_sliding_window) < self.count_per_sliding_window(query_type):
return 0.0
else:
return min(reqs_in_sliding_window) + per_type_sliding_window + 6
def gql_accumulated_next_request_time():
if query_type in ['iphone', 'other']:
return 0.0
gql_accumulated_sliding_window = 600
gql_accumulated_max_count = 275
reqs_in_sliding_window = self._reqs_in_sliding_window(None, current_time, gql_accumulated_sliding_window)
if len(reqs_in_sliding_window) < gql_accumulated_max_count:
return 0.0
else:
return min(reqs_in_sliding_window) + gql_accumulated_sliding_window
def untracked_next_request_time():
if untracked_queries:
reqs_in_sliding_window = self._reqs_in_sliding_window(query_type, current_time, per_type_sliding_window)
self._earliest_next_request_time = min(reqs_in_sliding_window) + per_type_sliding_window + 6
return self._earliest_next_request_time
return max(0.0,
max(
per_type_next_request_time(),
gql_accumulated_next_request_time(),
untracked_next_request_time(),
) - current_time)
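The two sliding-window limits combined in `query_waittime` can be explored without an Instaloader session. The following is a simplified standalone sketch under stated assumptions: it uses plain functions and an explicit `timestamps` dictionary instead of the `RateController` state, it leaves out the `untracked_queries` branch, and the function names are made up. The numeric limits (660 s and 200 or 75 requests per type, 600 s and 275 requests for all GraphQL queries together) are taken from the diff.

```python
from typing import Dict, List, Optional

NON_GRAPHQL_TYPES = ("iphone", "other")


def count_per_sliding_window(query_type: str) -> int:
    """Per-type request budget, as in the diff."""
    return 75 if query_type in NON_GRAPHQL_TYPES else 200


def reqs_in_window(timestamps: Dict[str, List[float]], query_type: Optional[str],
                   now: float, window: float) -> List[float]:
    """Timestamps inside the window; query_type=None means all GraphQL types."""
    if query_type is not None:
        relevant = timestamps.get(query_type, [])
    else:
        relevant = [t for qt, times in timestamps.items()
                    if qt not in NON_GRAPHQL_TYPES for t in times]
    return [t for t in relevant if t > now - window]


def query_waittime(timestamps: Dict[str, List[float]], query_type: str, now: float) -> float:
    """Seconds to wait so that neither the per-type nor the accumulated limit is exceeded."""
    per_type_window = 660
    per_type = reqs_in_window(timestamps, query_type, now, per_type_window)
    per_type_next = 0.0
    if len(per_type) >= count_per_sliding_window(query_type):
        # The oldest request must leave the window, plus 6 seconds of margin.
        per_type_next = min(per_type) + per_type_window + 6

    gql_next = 0.0
    if query_type not in NON_GRAPHQL_TYPES:
        gql_window, gql_max = 600, 275
        gql = reqs_in_window(timestamps, None, now, gql_window)
        if len(gql) >= gql_max:
            gql_next = min(gql) + gql_window

    return max(0.0, max(per_type_next, gql_next) - now)


# Two made-up GraphQL query types with 150 requests each, all sent at t=100.
history = {"a": [100.0] * 150, "b": [100.0] * 150}
wait = query_waittime(history, "a", now=200.0)
```

In this example neither type has used up its own budget of 200, but the 300 requests together exceed the accumulated GraphQL limit of 275, so the sketch waits until the oldest request is 600 seconds old.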
def wait_before_query(self, query_type: str) -> None:
"""This method is called before a query to Instagram. It calls :meth:`RateController.sleep` to wait
@ -602,10 +635,10 @@ class RateController:
.format(round(waittime), datetime.now() + timedelta(seconds=waittime)))
if waittime > 0:
self.sleep(waittime)
if query_type not in self._graphql_query_timestamps:
self._graphql_query_timestamps[query_type] = [time.monotonic()]
if query_type not in self._query_timestamps:
self._query_timestamps[query_type] = [time.monotonic()]
else:
self._graphql_query_timestamps[query_type].append(time.monotonic())
self._query_timestamps[query_type].append(time.monotonic())
def handle_429(self, query_type: str) -> None:
"""This method is called to handle a 429 Too Many Requests response. It calls :meth:`RateController.sleep` to


@ -81,11 +81,13 @@ class NodeIterator(Iterator[T]):
self._node_wrapper = node_wrapper
self._query_variables = query_variables if query_variables is not None else {}
self._query_referer = query_referer
self._data = first_data
self._page_index = 0
self._total_index = 0
self._best_before = (None if first_data is None else
datetime.now() + NodeIterator._shelf_life)
if first_data is not None:
self._data = first_data
self._best_before = datetime.now() + NodeIterator._shelf_life
else:
self._data = self._query()
def _query(self, after: Optional[str] = None) -> Dict:
pagination_variables = {'first': NodeIterator._graphql_page_length} # type: Dict[str, Any]
@ -113,8 +115,6 @@ class NodeIterator(Iterator[T]):
return self
def __next__(self) -> T:
if self._data is None:
self._data = self._query()
if self._page_index < len(self._data['edges']):
node = self._data['edges'][self._page_index]['node']
page_index, total_index = self._page_index, self._total_index
@ -193,8 +193,12 @@ class NodeIterator(Iterator[T]):
self._query_referer != frozen.query_referer or
self._context.username != frozen.context_username):
raise InvalidArgumentException("Mismatching resume information.")
if not frozen.best_before:
raise InvalidArgumentException("\"best before\" date missing.")
if frozen.remaining_data is None:
raise InvalidArgumentException("\"remaining_data\" missing.")
self._total_index = frozen.total_index
self._best_before = datetime.fromtimestamp(frozen.best_before) if frozen.best_before else None
self._best_before = datetime.fromtimestamp(frozen.best_before)
self._data = frozen.remaining_data


@ -147,10 +147,6 @@ class Post:
)
self._full_metadata_dict = pic_json['data']['shortcode_media']
if self._full_metadata_dict is None:
# issue #449
self._context.error("Fetching Post metadata failed (issue #449). "
"The following data has been returned:\n"
+ json.dumps(pic_json['entry_data'], indent=2))
raise BadResponseException("Fetching Post metadata failed.")
if self.shortcode != self._full_metadata_dict['shortcode']:
self._node.update(self._full_metadata_dict)
@ -207,7 +203,13 @@ class Post:
@property
def owner_id(self) -> int:
"""The ID of the Post's owner."""
return self.owner_profile.userid
# The ID may already be available, e.g. if the post instance was created
# from an `hashtag.get_posts()` iterator, so no need to make another
# http request.
if 'owner' in self._node and 'id' in self._node['owner']:
return self._node['owner']['id']
else:
return self.owner_profile.userid
@property
def date_local(self) -> datetime:
@ -441,7 +443,14 @@ class Post:
)
def get_likes(self) -> Iterator['Profile']:
"""Iterate over all likes of the post. A :class:`Profile` instance of each likee is yielded."""
"""
Iterate over all likes of the post. A :class:`Profile` instance of each likee is yielded.
.. versionchanged:: 4.5.4
Require being logged in (as required by Instagram).
"""
if not self._context.is_logged_in:
raise LoginRequiredException("--login required to access likes of a post.")
if self.likes == 0:
# Avoid doing additional requests if there are no likes
return
@ -555,7 +564,7 @@ class Profile:
"""
# pylint:disable=protected-access
profile = cls(context, {'username': username.lower()})
profile._obtain_metadata() # to raise ProfileNotExistException now in case username is invalid
profile._obtain_metadata() # to raise ProfileNotExistsException now in case username is invalid
return profile
@classmethod
@ -585,6 +594,17 @@ class Profile:
context.profile_id_cache[profile_id] = profile
return profile
@classmethod
def own_profile(cls, context: InstaloaderContext):
"""Return own profile if logged-in.
:param context: :attr:`Instaloader.context`
.. versionadded:: 4.5.2"""
if not context.is_logged_in:
raise LoginRequiredException("--login required to access own profile.")
return cls(context, context.graphql_query("d6f4427fbe92d846298cf93df0b937d3", {})["data"]["user"])
def _asdict(self):
json_node = self._node.copy()
# remove posts to avoid "Circular reference detected" exception
@ -808,7 +828,6 @@ class Profile:
if self.username != self._context.username:
raise LoginRequiredException("--login={} required to get that profile's saved posts.".format(self.username))
self._obtain_metadata()
return NodeIterator(
self._context,
'f883d95537fbcd400f466f63d42bd8a1',
@ -816,7 +835,6 @@ class Profile:
lambda n: Post(self._context, n),
{'id': self.userid},
'https://www.instagram.com/{0}/'.format(self.username),
self._metadata('edge_saved_media'),
)
def get_tagged_posts(self) -> NodeIterator[Post]:
@ -1369,9 +1387,13 @@ class Hashtag:
next_other = next(other_posts, None)
while next_top is not None or next_other is not None:
if next_other is None:
assert next_top is not None
yield next_top
yield from sorted_top_posts
break
if next_top is None:
assert next_other is not None
yield next_other
yield from other_posts
break
if next_top == next_other: