mirror of https://github.com/instaloader/instaloader.git synced 2024-08-17 12:19:38 +02:00

caption_hashtags property for only-if evaluation

caption_hashtags is a list of all hashtags that are mentioned in the
Post's caption. It allows to easily filter Posts that have multiple
hashtags, and as such fixes #24.

Further, the documentation of --only-if has been completed by linking to
a description of the syntax in the Python documentation, and by linking
to a list of all defined properties with their meanings. So, this commit
also closes #42.
Alexander Graf 2017-08-29 11:03:12 +02:00
parent d84136b2dd
commit 5b5d540310
4 changed files with 29 additions and 2 deletions
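The behaviour of the hashtag pattern this commit introduces can be tried outside of Instaloader. The following is a minimal sketch: the regular expression is copied from the `caption_hashtags` property in the diff, while the standalone function `extract_hashtags` and the sample caption are hypothetical names and data used only for illustration.

```python
import re
from typing import List, Optional

# Pattern copied from the caption_hashtags property added in this commit.
# It captures the text after '#': word characters, optionally with single
# dots in between, up to 30 characters in total.
HASHTAG_REGEX = re.compile(r"(?:#)(\w(?:(?:\w|(?:\.(?!\.))){0,28}(?:\w))?)")


def extract_hashtags(caption: Optional[str]) -> List[str]:
    """Hypothetical standalone helper mirroring the property's logic."""
    if not caption:
        # A missing or empty caption yields no hashtags.
        return []
    # With a single capture group, findall returns the captured strings,
    # so the leading '#' is not part of the result.
    return HASHTAG_REGEX.findall(caption)


tags = extract_hashtags("Sleepy #kitten being #cute, #très_mignon!")
```

Because Python's `\w` matches Unicode word characters for `str` patterns, hashtags with accented letters are captured as well, and trailing punctuation is left out of the match.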


@@ -129,8 +129,8 @@ Filter Posts
 The ``--only-if`` option allows to specify criterias that posts have to
 meet to be downloaded. If not given, all posts are downloaded. It must
 be a boolean Python expression where the variables ``likes``,
-``comments``, ``viewer_has_liked``, ``is_video``, ``date``, and some
-more (see class ``instaloader.Post`` for a full list) are defined.
+``comments``, ``viewer_has_liked``, ``is_video``, and many
+more are defined.
 A few examples:
@@ -153,8 +153,17 @@ Or you may **skip videos**:
 instaloader --only-if="not is_video" target
+Or you may filter by hashtags that occur in the Post's caption. For
+example, to download posts of kittens that are cute: ::
+instaloader --only-if="'cute' in caption_hashtags" "#kitten"
 .. basic-usage-end
+(For a more complete description of the ``-only-if`` option, refer to
+the `Instaloader Documentation <https://instaloader.readthedocs.io/basic-usage.html#filter-posts>`__)
 Advanced Options
 ----------------


@@ -16,6 +16,8 @@ Introduction
 :members:
 :undoc-members:
+.. _post-class:
 ``Post`` Class
 ^^^^^^^^^^^^^^


@@ -7,3 +7,9 @@ Basic Usage
 .. include:: ../README.rst
 :start-after: basic-usage-start
 :end-before: basic-usage-end
+.. (continuation of --only-if explanation)
+The given string is evaluated as a
+`Python boolean expression <https://docs.python.org/3/reference/expressions.html#boolean-operations>`__,
+where all occuring variables are properties of the :ref:`post-class`.
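To illustrate what evaluating a filter string against post properties can look like, here is a minimal sketch. It is not Instaloader's actual implementation: the helper `matches_filter`, the explicit list of variable names, and the stand-in post object built with `SimpleNamespace` are all assumptions made for this example.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def matches_filter(filter_str: str, post: Any, names: Iterable[str]) -> bool:
    """Hypothetical helper: evaluate filter_str with post attributes as variables."""
    # Expose only the listed attributes of the post as variable names.
    variables = {name: getattr(post, name) for name in names}
    # An empty __builtins__ hides built-in functions from the expression.
    # This keeps the sketch small; it is not a security sandbox.
    return bool(eval(filter_str, {"__builtins__": {}}, variables))


# Stand-in for a Post object, with made-up values.
post = SimpleNamespace(likes=120, is_video=False,
                       caption_hashtags=["cute", "kitten"])
names = ["likes", "is_video", "caption_hashtags"]

result_a = matches_filter("'cute' in caption_hashtags and not is_video", post, names)
result_b = matches_filter("likes > 1000", post, names)
```

With these sample values the first expression holds and the second does not, so only the first filter would select the post.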


@@ -275,6 +275,16 @@ class Post:
         elif "caption" in self._node:
             return self._node["caption"]
 
+    @property
+    def caption_hashtags(self) -> List[str]:
+        """List of all hashtags (without preceeding #) which occur in the Post's caption."""
+        if not self.caption:
+            return []
+        # This regular expression is from jStassen, adjusted to use Python's \w to support Unicode
+        # http://blog.jstassen.com/2016/03/code-regex-for-instagram-username-and-hashtags/
+        hashtag_regex = re.compile(r"(?:#)(\w(?:(?:\w|(?:\.(?!\.))){0,28}(?:\w))?)")
+        return re.findall(hashtag_regex, self.caption)
+
     @property
     def is_video(self) -> bool:
         """True if the Post is a video."""