mirror of https://github.com/instaloader/instaloader.git synced 2024-07-14 23:00:06 +02:00

caption_hashtags property for only-if evaluation

caption_hashtags is a list of all hashtags that are mentioned in the
Post's caption. It makes it easy to filter Posts that have multiple
hashtags, and as such fixes #24.

Further, the documentation of --only-if has been completed by linking to
a description of the syntax in the Python documentation, and by linking
to a list of all defined properties with their meanings. So, this commit
also closes #42.
Alexander Graf 2017-08-29 11:03:12 +02:00
parent d84136b2dd
commit 5b5d540310
4 changed files with 29 additions and 2 deletions


@@ -129,8 +129,8 @@ Filter Posts
The ``--only-if`` option allows specifying criteria that posts have to
meet to be downloaded. If not given, all posts are downloaded. It must
be a boolean Python expression where the variables ``likes``,
``comments``, ``viewer_has_liked``, ``is_video``, ``date``, and some
more (see class ``instaloader.Post`` for a full list) are defined.
``comments``, ``viewer_has_liked``, ``is_video``, and many
more are defined.
A few examples:
@@ -153,8 +153,17 @@ Or you may **skip videos**:
    instaloader --only-if="not is_video" target
Or you may filter by hashtags that occur in the Post's caption. For
example, to download posts of kittens that are cute: ::
    instaloader --only-if="'cute' in caption_hashtags" "#kitten"
.. basic-usage-end
(For a more complete description of the ``--only-if`` option, refer to
the `Instaloader Documentation <https://instaloader.readthedocs.io/basic-usage.html#filter-posts>`__.)
Advanced Options
----------------


@@ -16,6 +16,8 @@ Introduction
   :members:
   :undoc-members:
.. _post-class:
``Post`` Class
^^^^^^^^^^^^^^


@@ -7,3 +7,9 @@ Basic Usage
.. include:: ../README.rst
   :start-after: basic-usage-start
   :end-before: basic-usage-end
.. (continuation of --only-if explanation)
The given string is evaluated as a
`Python boolean expression <https://docs.python.org/3/reference/expressions.html#boolean-operations>`__,
where all occurring variables are properties of the :ref:`post-class`.
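
To make this evaluation model concrete, here is a minimal sketch of how a filter string could be checked against a set of property values. It is not Instaloader's actual implementation: the helper ``matches`` and the ``sample`` values are hypothetical, and a plain dict stands in for the properties of a real ``Post``.

```python
from typing import Any, Dict


def matches(expression: str, properties: Dict[str, Any]) -> bool:
    """Evaluate a filter string against a mapping of property names to values.

    Hypothetical helper for illustration only; not part of Instaloader.
    """
    # Compile in "eval" mode so that only a single expression is accepted.
    code = compile(expression, "<only-if>", "eval")
    # An empty __builtins__ keeps names such as open() out of reach. This is a
    # convenience for the sketch, not a security boundary.
    return bool(eval(code, {"__builtins__": {}}, dict(properties)))


# Made-up property values standing in for a real Post.
sample = {"likes": 120, "is_video": False, "caption_hashtags": ["cute", "kitten"]}

print(matches("not is_video", sample))
print(matches("'cute' in caption_hashtags and likes > 500", sample))
```

In this sketch, a name that is not among the supplied properties raises ``NameError``, so a misspelled property surfaces as an error rather than silently filtering everything out.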


@@ -275,6 +275,16 @@ class Post:
        elif "caption" in self._node:
            return self._node["caption"]

    @property
    def caption_hashtags(self) -> List[str]:
        """List of all hashtags (without preceding #) which occur in the Post's caption."""
        if not self.caption:
            return []
        # This regular expression is from jStassen, adjusted to use Python's \w to support Unicode
        # http://blog.jstassen.com/2016/03/code-regex-for-instagram-username-and-hashtags/
        hashtag_regex = re.compile(r"(?:#)(\w(?:(?:\w|(?:\.(?!\.))){0,28}(?:\w))?)")
        return re.findall(hashtag_regex, self.caption)

    @property
    def is_video(self) -> bool:
        """True if the Post is a video."""
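
The behaviour of the regular expression in the hunk above can be tried outside the ``Post`` class. The following is a small standalone sketch: the pattern is copied verbatim from the commit, while the function name ``extract_hashtags`` and the sample caption are made up for illustration.

```python
import re
from typing import List, Optional

# Pattern copied from the caption_hashtags property in this commit.
HASHTAG_REGEX = re.compile(r"(?:#)(\w(?:(?:\w|(?:\.(?!\.))){0,28}(?:\w))?)")


def extract_hashtags(caption: Optional[str]) -> List[str]:
    """Return all hashtags in the caption, without the leading '#'.

    Hypothetical standalone equivalent of the property, for experimentation.
    """
    if not caption:
        return []
    # With exactly one capturing group, findall returns that group's matches.
    return HASHTAG_REGEX.findall(caption)


# Made-up caption; the last tag exercises the Unicode-aware \w.
caption = "Sunday nap #cute #kitten #süß"
hashtags = extract_hashtags(caption)

print(hashtags)
# Filtering on several hashtags at once, the use case named in the commit message.
print("cute" in hashtags and "kitten" in hashtags)
```

Because the result is an ordinary list, an ``--only-if`` expression can combine several membership tests with ``and`` or ``or``.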