Mirror of https://github.com/mikf/gallery-dl.git
Synced 2024-11-24 19:52:32 +01:00

Merge branch 'master' into support-webtoonxyz
This commit is contained in: commit 6f98527111

.github/workflows/tests.yml (vendored): 23 changed lines
@@ -20,19 +20,36 @@ jobs:
    steps:
    - uses: actions/checkout@v3

    - name: Check file permissions
      run: |
        if [[ "$(find ./gallery_dl -type f -not -perm 644)" ]]; then exit 1; fi

    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}

    - name: Install dependencies
      env:
        PYV: ${{ matrix.python-version }}
      run: |
        pip install -r requirements.txt
        pip install "flake8<4" "importlib-metadata<5"
        pip install youtube-dl
        if [[ "$PYV" != "3.4" && "$PYV" != "3.5" ]]; then pip install yt-dlp; fi

    - name: Install yt-dlp
      run: |
        case "${{ matrix.python-version }}" in
          3.4|3.5)
            # don't install yt-dlp
            ;;
          3.6)
            # install from PyPI
            pip install yt-dlp
            ;;
          *)
            # install from master
            pip install https://github.com/yt-dlp/yt-dlp/archive/refs/heads/master.tar.gz
            ;;
        esac

    - name: Lint with flake8
      run: |
CHANGELOG.md: 172 changed lines
@@ -1,5 +1,177 @@

# Changelog
|
||||
|
||||
## 1.25.8 - 2023-07-15
|
||||
### Changes
|
||||
- update default User-Agent header to Firefox 115 ESR
|
||||
### Additions
|
||||
- [gfycat] support `@me` user ([#3770](https://github.com/mikf/gallery-dl/issues/3770), [#4271](https://github.com/mikf/gallery-dl/issues/4271))
|
||||
- [gfycat] implement login support ([#3770](https://github.com/mikf/gallery-dl/issues/3770), [#4271](https://github.com/mikf/gallery-dl/issues/4271))
|
||||
- [reddit] notify users about registering an OAuth application ([#4292](https://github.com/mikf/gallery-dl/issues/4292))
|
||||
- [twitter] add `ratelimit` option ([#4251](https://github.com/mikf/gallery-dl/issues/4251))
|
||||
- [twitter] use `TweetResultByRestId` endpoint that allows accessing single Tweets without login ([#4250](https://github.com/mikf/gallery-dl/issues/4250))
|
||||
### Fixes
|
||||
- [bunkr] use `.la` TLD for `media-files12` servers ([#4147](https://github.com/mikf/gallery-dl/issues/4147), [#4276](https://github.com/mikf/gallery-dl/issues/4276))
|
||||
- [erome] ignore duplicate album IDs
|
||||
- [fantia] send `X-Requested-With` header ([#4273](https://github.com/mikf/gallery-dl/issues/4273))
|
||||
- [gelbooru_v01] fix `source` metadata ([#4302](https://github.com/mikf/gallery-dl/issues/4302), [#4303](https://github.com/mikf/gallery-dl/issues/4303))
|
||||
- [gelbooru_v01] update `vidyart` domain
|
||||
- [jpgfish] update domain to `jpeg.pet`
|
||||
- [mangaread] fix `tags` metadata extraction
|
||||
- [naverwebtoon] fix `comic` metadata extraction
|
||||
- [newgrounds] extract & pass auth token during login ([#4268](https://github.com/mikf/gallery-dl/issues/4268))
|
||||
- [paheal] fix extraction ([#4262](https://github.com/mikf/gallery-dl/issues/4262), [#4293](https://github.com/mikf/gallery-dl/issues/4293))
|
||||
- [paheal] unescape `source`
|
||||
- [philomena] fix `--range` ([#4288](https://github.com/mikf/gallery-dl/issues/4288))
|
||||
- [philomena] handle `429 Too Many Requests` errors ([#4288](https://github.com/mikf/gallery-dl/issues/4288))
|
||||
- [pornhub] set `accessAgeDisclaimerPH` cookie ([#4301](https://github.com/mikf/gallery-dl/issues/4301))
|
||||
- [reddit] use 0.6s delay between API requests ([#4292](https://github.com/mikf/gallery-dl/issues/4292))
|
||||
- [seiga] set `skip_fetish_warning` cookie ([#4242](https://github.com/mikf/gallery-dl/issues/4242))
|
||||
- [slideshare] fix extraction
|
||||
- [twitter] fix `following` extractor not getting all users ([#4287](https://github.com/mikf/gallery-dl/issues/4287))
|
||||
- [twitter] use GraphQL search endpoint by default ([#4264](https://github.com/mikf/gallery-dl/issues/4264))
|
||||
- [twitter] do not treat missing `TimelineAddEntries` instruction as fatal ([#4278](https://github.com/mikf/gallery-dl/issues/4278))
|
||||
- [weibo] fix cursor based pagination
|
||||
- [wikifeet] fix `tag` extraction ([#4289](https://github.com/mikf/gallery-dl/issues/4289), [#4291](https://github.com/mikf/gallery-dl/issues/4291))
|
||||
### Removals
|
||||
- [bcy] remove module
|
||||
- [lineblog] remove module
|
||||
|
||||
## 1.25.7 - 2023-07-02
|
||||
### Additions
|
||||
- [flickr] add 'exif' option
|
||||
- [flickr] add 'metadata' option ([#4227](https://github.com/mikf/gallery-dl/issues/4227))
|
||||
- [mangapark] add 'source' option ([#3969](https://github.com/mikf/gallery-dl/issues/3969))
|
||||
- [twitter] extend 'conversations' option ([#4211](https://github.com/mikf/gallery-dl/issues/4211))
|
||||
### Fixes
|
||||
- [furaffinity] improve 'description' HTML ([#4224](https://github.com/mikf/gallery-dl/issues/4224))
|
||||
- [gelbooru_v01] fix '--range' ([#4167](https://github.com/mikf/gallery-dl/issues/4167))
|
||||
- [hentaifox] fix titles containing '@' ([#4201](https://github.com/mikf/gallery-dl/issues/4201))
|
||||
- [mangapark] update to v5 ([#3969](https://github.com/mikf/gallery-dl/issues/3969))
|
||||
- [piczel] update API server address ([#4244](https://github.com/mikf/gallery-dl/issues/4244))
|
||||
- [poipiku] improve error detection ([#4206](https://github.com/mikf/gallery-dl/issues/4206))
|
||||
- [sankaku] improve warnings for unavailable posts
|
||||
- [senmanga] ensure download URLs have a scheme ([#4235](https://github.com/mikf/gallery-dl/issues/4235))
|
||||
|
||||
## 1.25.6 - 2023-06-17
|
||||
### Additions
|
||||
- [blogger] download files from `lh*.googleusercontent.com` ([#4070](https://github.com/mikf/gallery-dl/issues/4070))
|
||||
- [fantia] extract `plan` metadata ([#2477](https://github.com/mikf/gallery-dl/issues/2477))
|
||||
- [fantia] emit warning for non-visible content sections ([#4128](https://github.com/mikf/gallery-dl/issues/4128))
|
||||
- [furaffinity] extract `favorite_id` metadata ([#4133](https://github.com/mikf/gallery-dl/issues/4133))
|
||||
- [jschan] add generic extractors for jschan image boards ([#3447](https://github.com/mikf/gallery-dl/issues/3447))
|
||||
- [kemonoparty] support `.su` TLDs ([#4139](https://github.com/mikf/gallery-dl/issues/4139))
|
||||
- [pixiv:novel] add `novel-bookmark` extractor ([#4111](https://github.com/mikf/gallery-dl/issues/4111))
|
||||
- [pixiv:novel] add `full-series` option ([#4111](https://github.com/mikf/gallery-dl/issues/4111))
|
||||
- [postimage] add gallery support, update image extractor ([#3115](https://github.com/mikf/gallery-dl/issues/3115), [#4134](https://github.com/mikf/gallery-dl/issues/4134))
|
||||
- [redgifs] support galleries ([#4021](https://github.com/mikf/gallery-dl/issues/4021))
|
||||
- [twitter] extract `conversation_id` metadata ([#3839](https://github.com/mikf/gallery-dl/issues/3839))
|
||||
- [vipergirls] add login support ([#4166](https://github.com/mikf/gallery-dl/issues/4166))
|
||||
- [vipergirls] use API endpoints ([#4166](https://github.com/mikf/gallery-dl/issues/4166))
|
||||
- [formatter] implement `H` conversion ([#4164](https://github.com/mikf/gallery-dl/issues/4164))
|
||||
### Fixes
|
||||
- [acidimg] fix extraction ([#4136](https://github.com/mikf/gallery-dl/issues/4136))
|
||||
- [bunkr] update domain to bunkrr.su ([#4159](https://github.com/mikf/gallery-dl/issues/4159), [#4189](https://github.com/mikf/gallery-dl/issues/4189))
|
||||
- [bunkr] fix video downloads
|
||||
- [fanbox] prevent exception due to missing embeds ([#4088](https://github.com/mikf/gallery-dl/issues/4088))
|
||||
- [instagram] fix retrieving `/tagged` posts ([#4122](https://github.com/mikf/gallery-dl/issues/4122))
|
||||
- [jpgfish] update domain to `jpg.pet` ([#4138](https://github.com/mikf/gallery-dl/issues/4138))
|
||||
- [pixiv:novel] fix error with embeds extraction ([#4175](https://github.com/mikf/gallery-dl/issues/4175))
|
||||
- [pornhub] improve redirect handling ([#4188](https://github.com/mikf/gallery-dl/issues/4188))
|
||||
- [reddit] fix crash due to empty `crosspost_parent_lists` ([#4120](https://github.com/mikf/gallery-dl/issues/4120), [#4172](https://github.com/mikf/gallery-dl/issues/4172))
|
||||
- [redgifs] update `search` URL pattern ([#4115](https://github.com/mikf/gallery-dl/issues/4115), [#4185](https://github.com/mikf/gallery-dl/issues/4185))
|
||||
- [senmanga] fix and update ([#4160](https://github.com/mikf/gallery-dl/issues/4160))
|
||||
- [twitter] use GraphQL API search endpoint ([#3942](https://github.com/mikf/gallery-dl/issues/3942))
|
||||
- [wallhaven] improve HTTP error handling ([#4192](https://github.com/mikf/gallery-dl/issues/4192))
|
||||
- [weibo] prevent fatal exception due to missing video data ([#4150](https://github.com/mikf/gallery-dl/issues/4150))
|
||||
- [weibo] fix `.json` extension for some videos
|
||||
|
||||
## 1.25.5 - 2023-05-27
|
||||
### Additions
|
||||
- [8muses] add `parts` metadata field ([#3329](https://github.com/mikf/gallery-dl/issues/3329))
|
||||
- [danbooru] add `date` metadata field ([#4047](https://github.com/mikf/gallery-dl/issues/4047))
|
||||
- [e621] add `date` metadata field ([#4047](https://github.com/mikf/gallery-dl/issues/4047))
|
||||
- [gofile] add basic password support ([#4056](https://github.com/mikf/gallery-dl/issues/4056))
|
||||
- [imagechest] implement API support ([#4065](https://github.com/mikf/gallery-dl/issues/4065))
|
||||
- [instagram] add `order-files` option ([#3993](https://github.com/mikf/gallery-dl/issues/3993), [#4017](https://github.com/mikf/gallery-dl/issues/4017))
|
||||
- [instagram] add `order-posts` option ([#3993](https://github.com/mikf/gallery-dl/issues/3993), [#4017](https://github.com/mikf/gallery-dl/issues/4017))
|
||||
- [instagram] add `metadata` option ([#3107](https://github.com/mikf/gallery-dl/issues/3107))
|
||||
- [jpgfish] add `jpg.fishing` extractors ([#2657](https://github.com/mikf/gallery-dl/issues/2657), [#2719](https://github.com/mikf/gallery-dl/issues/2719))
|
||||
- [lensdump] add `lensdump.com` extractors ([#2078](https://github.com/mikf/gallery-dl/issues/2078), [#4104](https://github.com/mikf/gallery-dl/issues/4104))
|
||||
- [mangaread] add `mangaread.org` extractors ([#2425](https://github.com/mikf/gallery-dl/issues/2425), [#2781](https://github.com/mikf/gallery-dl/issues/2781))
|
||||
- [misskey] add `favorite` extractor ([#3950](https://github.com/mikf/gallery-dl/issues/3950))
|
||||
- [pixiv] add `novel` support ([#1241](https://github.com/mikf/gallery-dl/issues/1241), [#4044](https://github.com/mikf/gallery-dl/issues/4044))
|
||||
- [reddit] support cross-posted media ([#887](https://github.com/mikf/gallery-dl/issues/887), [#3586](https://github.com/mikf/gallery-dl/issues/3586), [#3976](https://github.com/mikf/gallery-dl/issues/3976))
|
||||
- [postprocessor:exec] support tilde expansion for `command`
|
||||
- [formatter] support slicing strings as bytes ([#4087](https://github.com/mikf/gallery-dl/issues/4087))
|
||||
### Fixes
|
||||
- [8muses] fix value of `album[url]` ([#3329](https://github.com/mikf/gallery-dl/issues/3329))
|
||||
- [danbooru] refactor pagination logic ([#4002](https://github.com/mikf/gallery-dl/issues/4002))
|
||||
- [fanbox] skip invalid posts ([#4088](https://github.com/mikf/gallery-dl/issues/4088))
|
||||
- [gofile] automatically fetch `website-token`
|
||||
- [kemonoparty] fix kemono and coomer logins sharing the same cache ([#4098](https://github.com/mikf/gallery-dl/issues/4098))
|
||||
- [newgrounds] add default delay between requests ([#4046](https://github.com/mikf/gallery-dl/issues/4046))
|
||||
- [nsfwalbum] detect placeholder images
|
||||
- [poipiku] extract full `descriptions` ([#4066](https://github.com/mikf/gallery-dl/issues/4066))
|
||||
- [tcbscans] update domain to `tcbscans.com` ([#4080](https://github.com/mikf/gallery-dl/issues/4080))
|
||||
- [twitter] extract TwitPic URLs in text ([#3792](https://github.com/mikf/gallery-dl/issues/3792), [#3796](https://github.com/mikf/gallery-dl/issues/3796))
|
||||
- [weibo] require numeric IDs to have length >= 10 ([#4059](https://github.com/mikf/gallery-dl/issues/4059))
|
||||
- [ytdl] fix crash due to removed `no_color` attribute
|
||||
- [cookies] improve logging behavior ([#4050](https://github.com/mikf/gallery-dl/issues/4050))
|
||||
|
||||
## 1.25.4 - 2023-05-07
|
||||
### Additions
|
||||
- [4chanarchives] add `thread` and `board` extractors ([#4012](https://github.com/mikf/gallery-dl/issues/4012))
|
||||
- [foolfuuka] add `archive.palanq.win`
|
||||
- [imgur] add `favorite-folder` extractor ([#4016](https://github.com/mikf/gallery-dl/issues/4016))
|
||||
- [mangadex] add `status` and `tags` metadata ([#4031](https://github.com/mikf/gallery-dl/issues/4031))
|
||||
- allow selecting a domain with `--cookies-from-browser`
|
||||
- add `--cookies-export` command-line option
|
||||
- add `-C` as short option for `--cookies`
|
||||
- include exception type in config error messages
|
||||
### Fixes
|
||||
- [exhentai] update sadpanda check
|
||||
- [imagechest] load all images when a "Load More" button is present ([#4028](https://github.com/mikf/gallery-dl/issues/4028))
|
||||
- [imgur] fix bug causing some images/albums from user profiles and favorites to be ignored
|
||||
- [pinterest] update endpoint for related board pins
|
||||
- [pinterest] fix `pin.it` extractor
|
||||
- [ytdl] fix yt-dlp `--xff/--geo-bypass` tests ([#3989](https://github.com/mikf/gallery-dl/issues/3989))
|
||||
### Removals
|
||||
- [420chan] remove module
|
||||
- [foolfuuka] remove `archive.alice.al` and `tokyochronos.net`
|
||||
- [foolslide] remove `sensescans.com`
|
||||
- [nana] remove module
|
||||
|
||||
## 1.25.3 - 2023-04-30
|
||||
### Additions
|
||||
- [imagefap] extract `description` and `categories` metadata ([#3905](https://github.com/mikf/gallery-dl/issues/3905))
|
||||
- [imxto] add `gallery` extractor ([#1289](https://github.com/mikf/gallery-dl/issues/1289))
|
||||
- [itchio] add `game` extractor ([#3923](https://github.com/mikf/gallery-dl/issues/3923))
|
||||
- [nitter] extract user IDs from encoded banner URLs
|
||||
- [pixiv] allow sorting search results by popularity ([#3970](https://github.com/mikf/gallery-dl/issues/3970))
|
||||
- [reddit] match `preview.redd.it` URLs ([#3935](https://github.com/mikf/gallery-dl/issues/3935))
|
||||
- [sankaku] support post URLs with MD5 hashes ([#3952](https://github.com/mikf/gallery-dl/issues/3952))
|
||||
- [shimmie2] add generic extractors for Shimmie2 sites ([#3734](https://github.com/mikf/gallery-dl/issues/3734), [#943](https://github.com/mikf/gallery-dl/issues/943))
|
||||
- [tumblr] add `day` extractor ([#3951](https://github.com/mikf/gallery-dl/issues/3951))
|
||||
- [twitter] support `profile-conversation` entries ([#3938](https://github.com/mikf/gallery-dl/issues/3938))
|
||||
- [vipergirls] add `thread` and `post` extractors ([#3812](https://github.com/mikf/gallery-dl/issues/3812), [#2720](https://github.com/mikf/gallery-dl/issues/2720), [#731](https://github.com/mikf/gallery-dl/issues/731))
|
||||
- [downloader:http] add `consume-content` option ([#3748](https://github.com/mikf/gallery-dl/issues/3748))
|
||||
### Fixes
|
||||
- [2chen] update domain to sturdychan.help
|
||||
- [behance] fix extraction ([#3980](https://github.com/mikf/gallery-dl/issues/3980))
|
||||
- [deviantart] retry downloads with private token ([#3941](https://github.com/mikf/gallery-dl/issues/3941))
|
||||
- [imagefap] fix empty `tags` metadata
|
||||
- [manganelo] support arbitrary minor version separators ([#3972](https://github.com/mikf/gallery-dl/issues/3972))
|
||||
- [nozomi] fix file URLs ([#3925](https://github.com/mikf/gallery-dl/issues/3925))
|
||||
- [oauth] catch exceptions from `webbrowser.get()` ([#3947](https://github.com/mikf/gallery-dl/issues/3947))
|
||||
- [pixiv] fix `pixivision` extraction
|
||||
- [reddit] ignore `id-max` value `"zik0zj"`/`2147483647` ([#3939](https://github.com/mikf/gallery-dl/issues/3939), [#3862](https://github.com/mikf/gallery-dl/issues/3862), [#3697](https://github.com/mikf/gallery-dl/issues/3697), [#3606](https://github.com/mikf/gallery-dl/issues/3606), [#3546](https://github.com/mikf/gallery-dl/issues/3546), [#3521](https://github.com/mikf/gallery-dl/issues/3521), [#3412](https://github.com/mikf/gallery-dl/issues/3412))
|
||||
- [sankaku] sanitize `date:` tags ([#1790](https://github.com/mikf/gallery-dl/issues/1790))
|
||||
- [tumblr] fix and update pagination logic ([#2191](https://github.com/mikf/gallery-dl/issues/2191))
|
||||
- [twitter] fix `user` metadata when downloading quoted Tweets ([#3922](https://github.com/mikf/gallery-dl/issues/3922))
|
||||
- [ytdl] fix crash due to `--geo-bypass` deprecation ([#3975](https://github.com/mikf/gallery-dl/issues/3975))
|
||||
- [postprocessor:metadata] support putting keys in quotes
|
||||
- include more optional dependencies in executables ([#3907](https://github.com/mikf/gallery-dl/issues/3907))
|
||||
|
||||
## 1.25.2 - 2023-04-15
|
||||
### Additions
|
||||
- [deviantart] add `public` option
|
||||
|
README.rst: 13 changed lines
@@ -72,9 +72,9 @@ Standalone Executable
Prebuilt executable files with a Python interpreter and
|
||||
required Python packages included are available for
|
||||
|
||||
- `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.25.2/gallery-dl.exe>`__
|
||||
- `Windows <https://github.com/mikf/gallery-dl/releases/download/v1.25.8/gallery-dl.exe>`__
|
||||
(Requires `Microsoft Visual C++ Redistributable Package (x86) <https://aka.ms/vs/17/release/vc_redist.x86.exe>`__)
|
||||
- `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.25.2/gallery-dl.bin>`__
|
||||
- `Linux <https://github.com/mikf/gallery-dl/releases/download/v1.25.8/gallery-dl.bin>`__
|
||||
|
||||
|
||||
Nightly Builds
|
||||
@ -123,6 +123,15 @@ For macOS or Linux users using Homebrew:
|
||||
|
||||
brew install gallery-dl
|
||||
|
||||
MacPorts
|
||||
--------
|
||||
|
||||
For macOS users with MacPorts:
|
||||
|
||||
.. code:: bash
|
||||
|
||||
sudo port install gallery-dl
|
||||
|
||||
|
||||
Usage
|
||||
=====
|
||||
|
@ -382,6 +382,7 @@ Description
|
||||
* ``e621`` (*)
|
||||
* ``e926`` (*)
|
||||
* ``exhentai``
|
||||
* ``gfycat``
|
||||
* ``idolcomplex``
|
||||
* ``imgbb``
|
||||
* ``inkbunny``
|
||||
@ -395,6 +396,7 @@ Description
|
||||
* ``tapas``
|
||||
* ``tsumino``
|
||||
* ``twitter``
|
||||
* ``vipergirls``
|
||||
* ``zerochan``
|
||||
|
||||
These values can also be specified via the
|
||||
@ -440,30 +442,35 @@ Description
|
||||
"isAdult" : "1"
|
||||
}
|
||||
|
||||
* A ``list`` with up to 4 entries specifying a browser profile.
|
||||
* A ``list`` with up to 5 entries specifying a browser profile.
|
||||
|
||||
* The first entry is the browser name
|
||||
* The optional second entry is a profile name or an absolute path to a profile directory
|
||||
* The optional third entry is the keyring to retrieve passwords for decrypting cookies from
|
||||
* The optional fourth entry is a (Firefox) container name (``"none"`` for only cookies with no container)
|
||||
* The optional fifth entry is the domain to extract cookies for. Prefix it with a dot ``.`` to include cookies for subdomains. Has no effect when also specifying a container.
|
||||
|
||||
.. code:: json
|
||||
|
||||
["firefox"]
|
||||
["firefox", null, null, "Personal"]
|
||||
["chromium", "Private", "kwallet"]
|
||||
["chromium", "Private", "kwallet", null, ".twitter.com"]
|
||||
|
||||
|
||||
extractor.*.cookies-update
|
||||
--------------------------
|
||||
Type
|
||||
``bool``
|
||||
* ``bool``
|
||||
* |Path|_
|
||||
Default
|
||||
``true``
|
||||
Description
|
||||
If `extractor.*.cookies`_ specifies the |Path|_ of a cookies.txt
|
||||
file and it can be opened and parsed without errors,
|
||||
update its contents with cookies received during data extraction.
|
||||
Export session cookies in cookies.txt format.
|
||||
|
||||
* If this is a |Path|_, write cookies to the given file path.
|
||||
|
||||
* If this is ``true`` and `extractor.*.cookies`_ specifies the |Path|_
|
||||
of a valid cookies.txt file, update its contents.
|
||||
|
||||
|
||||
extractor.*.proxy
|
||||
@ -519,7 +526,7 @@ extractor.*.user-agent
|
||||
Type
|
||||
``string``
|
||||
Default
|
||||
``"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0"``
|
||||
``"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0"``
|
||||
Description
|
||||
User-Agent header value to be used for HTTP requests.
|
||||
|
||||
@ -1151,7 +1158,7 @@ Description
|
||||
Note: This requires 1 additional HTTP request per 200-post batch.
|
||||
|
||||
|
||||
extractor.{Danbooru].threshold
|
||||
extractor.[Danbooru].threshold
|
||||
------------------------------
|
||||
Type
|
||||
* ``string``
|
||||
@ -1535,6 +1542,39 @@ Description
|
||||
from `linking your Flickr account to gallery-dl <OAuth_>`__.
|
||||
|
||||
|
||||
extractor.flickr.exif
|
||||
---------------------
|
||||
Type
|
||||
``bool``
|
||||
Default
|
||||
``false``
|
||||
Description
|
||||
Fetch `exif` and `camera` metadata for each photo.
|
||||
|
||||
Note: This requires 1 additional API call per photo.
|
||||
|
||||
|
||||
extractor.flickr.metadata
|
||||
-------------------------
|
||||
Type
|
||||
* ``bool``
|
||||
* ``string``
|
||||
* ``list`` of ``strings``
|
||||
Default
|
||||
``false``
|
||||
Example
|
||||
* ``license,last_update,machine_tags``
|
||||
* ``["license", "last_update", "machine_tags"]``
|
||||
Description
|
||||
Extract additional metadata
|
||||
(license, date_taken, original_format, last_update, geo, machine_tags, o_dims)
|
||||
|
||||
It is possible to specify a custom list of metadata includes.
|
||||
See `the extras parameter <https://www.flickr.com/services/api/flickr.people.getPhotos.html>`__
|
||||
in `Flickr API docs <https://www.flickr.com/services/api/>`__
|
||||
for possible field names.
|
||||
|
||||
|
||||
extractor.flickr.videos
|
||||
-----------------------
|
||||
Type
|
||||
@ -1651,7 +1691,11 @@ Default
|
||||
``["mp4", "webm", "mobile", "gif"]``
|
||||
Description
|
||||
List of names of the preferred animation format, which can be
|
||||
``"mp4"``, ``"webm"``, ``"mobile"``, ``"gif"``, or ``"webp"``.
|
||||
``"mp4"``,
|
||||
``"webm"``,
|
||||
``"mobile"``,
|
||||
``"gif"``, or
|
||||
``"webp"``.
|
||||
|
||||
If a selected format is not available, the next one in the list will be
|
||||
tried until an available format is found.
|
||||
@ -1677,15 +1721,14 @@ extractor.gofile.website-token
|
||||
------------------------------
|
||||
Type
|
||||
``string``
|
||||
Default
|
||||
``"12345"``
|
||||
Description
|
||||
API token value used during API requests.
|
||||
|
||||
A not up-to-date value will result in ``401 Unauthorized`` errors.
|
||||
An invalid or not up-to-date value
|
||||
will result in ``401 Unauthorized`` errors.
|
||||
|
||||
Setting this value to ``null`` will do an extra HTTP request to fetch
|
||||
the current value used by gofile.
|
||||
Keeping this option unset will use an extra HTTP request
|
||||
to attempt to fetch the current value used by gofile.
|
||||
|
||||
|
||||
extractor.gofile.recursive
|
||||
@ -1733,6 +1776,21 @@ Description
|
||||
but is most likely going to fail with ``403 Forbidden`` errors.
|
||||
|
||||
|
||||
extractor.imagechest.access-token
|
||||
---------------------------------
|
||||
Type
|
||||
``string``
|
||||
Description
|
||||
Your personal Image Chest access token.
|
||||
|
||||
These tokens allow using the API instead of having to scrape HTML pages,
|
||||
providing more detailed metadata.
|
||||
(``date``, ``description``, etc)
|
||||
|
||||
See https://imgchest.com/docs/api/1.0/general/authorization
|
||||
for instructions on how to generate such a token.
|
||||
|
||||
|
||||
extractor.imgur.client-id
|
||||
-------------------------
|
||||
Type
|
||||
@ -1808,6 +1866,55 @@ Description
|
||||
It is possible to use ``"all"`` instead of listing all values separately.
|
||||
|
||||
|
||||
extractor.instagram.metadata
|
||||
----------------------------
|
||||
Type
|
||||
``bool``
|
||||
Default
|
||||
``false``
|
||||
Description
|
||||
Provide extended ``user`` metadata even when referring to a user by ID,
|
||||
e.g. ``instagram.com/id:12345678``.
|
||||
|
||||
Note: This metadata is always available when referring to a user by name,
|
||||
e.g. ``instagram.com/USERNAME``.
|
||||
|
||||
|
||||
extractor.instagram.order-files
|
||||
-------------------------------
|
||||
Type
|
||||
``string``
|
||||
Default
|
||||
``"asc"``
|
||||
Description
|
||||
Controls the order in which files of each post are returned.
|
||||
|
||||
* ``"asc"``: Same order as displayed in a post
|
||||
* ``"desc"``: Reverse order as displayed in a post
|
||||
* ``"reverse"``: Same as ``"desc"``
|
||||
|
||||
Note: This option does *not* affect ``{num}``.
|
||||
To enumerate files in reverse order, use ``count - num + 1``.
|
||||
|
||||
|
||||
extractor.instagram.order-posts
|
||||
-------------------------------
|
||||
Type
|
||||
``string``
|
||||
Default
|
||||
``"asc"``
|
||||
Description
|
||||
Controls the order in which posts are returned.
|
||||
|
||||
* ``"asc"``: Same order as displayed
|
||||
* ``"desc"``: Reverse order as displayed
|
||||
* ``"id"`` or ``"id_asc"``: Ascending order by ID
|
||||
* ``"id_desc"``: Descending order by ID
|
||||
* ``"reverse"``: Same as ``"desc"``
|
||||
|
||||
Note: This option only affects ``highlights``.
|
||||
|
||||
|
||||
extractor.instagram.previews
|
||||
----------------------------
|
||||
Type
|
||||
@ -1979,18 +2086,21 @@ Example
|
||||
Description
|
||||
Additional query parameters to send when fetching manga chapters.
|
||||
|
||||
(See `/manga/{id}/feed <https://api.mangadex.org/docs.html#operation/get-manga-id-feed>`_
|
||||
and `/user/follows/manga/feed <https://api.mangadex.org/docs.html#operation/get-user-follows-manga-feed>`_)
|
||||
(See `/manga/{id}/feed <https://api.mangadex.org/docs/swagger.html#/Manga/get-manga-id-feed>`__
|
||||
and `/user/follows/manga/feed <https://api.mangadex.org/docs/swagger.html#/Feed/get-user-follows-manga-feed>`__)
|
||||
|
||||
|
||||
extractor.mangadex.lang
|
||||
-----------------------
|
||||
Type
|
||||
``string``
|
||||
* ``string``
|
||||
* ``list`` of ``strings``
|
||||
Example
|
||||
``"en"``
|
||||
* ``"en"``
|
||||
* ``"fr,it"``
|
||||
* ``["fr", "it"]``
|
||||
Description
|
||||
`ISO 639-1 <https://en.wikipedia.org/wiki/ISO_639-1>`__ language code
|
||||
`ISO 639-1 <https://en.wikipedia.org/wiki/ISO_639-1>`__ language codes
|
||||
to filter chapters by.
|
||||
|
||||
|
||||
@ -2004,6 +2114,24 @@ Description
|
||||
List of acceptable content ratings for returned chapters.
|
||||
|
||||
|
||||
extractor.mangapark.source
|
||||
--------------------------
|
||||
Type
|
||||
* ``string``
|
||||
* ``integer``
|
||||
Example
|
||||
* ``"koala:en"``
|
||||
* ``15150116``
|
||||
Description
|
||||
Select chapter source and language for a manga.
|
||||
|
||||
| The general syntax is ``"<source name>:<ISO 639-1 language code>"``.
|
||||
| Both are optional, meaning ``"koala"``, ``"koala:"``, ``":en"``,
|
||||
or even just ``":"`` are possible as well.
|
||||
|
||||
Specifying the numeric ``ID`` of a source is also supported.
|
||||
|
||||
|
||||
extractor.[mastodon].access-token
|
||||
---------------------------------
|
||||
Type
|
||||
@ -2050,8 +2178,16 @@ Description
|
||||
Also emit metadata for text-only posts without media content.
|
||||
|
||||
|
||||
extractor.[misskey].access-token
|
||||
--------------------------------
|
||||
Type
|
||||
``string``
|
||||
Description
|
||||
Your access token, necessary to fetch favorited notes.
|
||||
|
||||
|
||||
extractor.[misskey].renotes
|
||||
----------------------------
|
||||
---------------------------
|
||||
Type
|
||||
``bool``
|
||||
Default
|
||||
@ -2061,7 +2197,7 @@ Description
|
||||
|
||||
|
||||
extractor.[misskey].replies
|
||||
----------------------------
|
||||
---------------------------
|
||||
Type
|
||||
``bool``
|
||||
Default
|
||||
@ -2070,17 +2206,6 @@ Description
|
||||
Fetch media from replies to other notes.
|
||||
|
||||
|
||||
extractor.nana.favkey
|
||||
---------------------
|
||||
Type
|
||||
``string``
|
||||
Default
|
||||
``null``
|
||||
Description
|
||||
Your `Nana Favorite Key <https://nana.my.id/tutorial>`__,
|
||||
used to access your favorite archives.
|
||||
|
||||
|
||||
extractor.newgrounds.flash
|
||||
--------------------------
|
||||
Type
|
||||
@ -2341,7 +2466,12 @@ Description
|
||||
when processing a user profile.
|
||||
|
||||
Possible values are
|
||||
``"artworks"``, ``"avatar"``, ``"background"``, ``"favorite"``.
|
||||
``"artworks"``,
|
||||
``"avatar"``,
|
||||
``"background"``,
|
||||
``"favorite"``,
|
||||
``"novel-user"``,
|
||||
``"novel-bookmark"``.
|
||||
|
||||
It is possible to use ``"all"`` instead of listing all values separately.
|
||||
|
||||
@ -2357,6 +2487,27 @@ Description
|
||||
`gppt <https://github.com/eggplants/get-pixivpy-token>`__.
|
||||
|
||||
|
||||
extractor.pixiv.embeds
|
||||
----------------------
|
||||
Type
|
||||
``bool``
|
||||
Default
|
||||
``false``
|
||||
Description
|
||||
Download images embedded in novels.
|
||||
|
||||
|
||||
extractor.pixiv.novel.full-series
|
||||
---------------------------------
|
||||
Type
|
||||
``bool``
|
||||
Default
|
||||
``false``
|
||||
Description
|
||||
When downloading a novel being part of a series,
|
||||
download all novels of that series.
|
||||
|
||||
|
||||
extractor.pixiv.metadata
|
||||
------------------------
|
||||
Type
|
||||
@ -2602,7 +2753,12 @@ Default
|
||||
``["hd", "sd", "gif"]``
|
||||
Description
|
||||
List of names of the preferred animation format, which can be
|
||||
``"hd"``, ``"sd"``, `"gif"``, `"vthumbnail"``, `"thumbnail"``, or ``"poster"``.
|
||||
``"hd"``,
|
||||
``"sd"``,
|
||||
``"gif"``,
|
||||
``"thumbnail"``,
|
||||
``"vthumbnail"``, or
|
||||
``"poster"``.
|
||||
|
||||
If a selected format is not available, the next one in the list will be
|
||||
tried until an available format is found.
|
||||
@ -2901,15 +3057,19 @@ Description
|
||||
extractor.twitter.conversations
|
||||
-------------------------------
|
||||
Type
|
||||
``bool``
|
||||
* ``bool``
|
||||
* ``string``
|
||||
Default
|
||||
``false``
|
||||
Description
|
||||
For input URLs pointing to a single Tweet,
|
||||
e.g. `https://twitter.com/i/web/status/<TweetID>`,
|
||||
fetch media from all Tweets and replies in this `conversation
|
||||
<https://help.twitter.com/en/using-twitter/twitter-conversations>`__
|
||||
or thread.
|
||||
<https://help.twitter.com/en/using-twitter/twitter-conversations>`__.
|
||||
|
||||
If this option is equal to ``"accessible"``,
|
||||
only download from conversation Tweets
|
||||
if the given initial Tweet is accessible.
|
||||
|
||||
|
||||
extractor.twitter.csrf
|
||||
@ -2945,6 +3105,32 @@ Description
|
||||
`syndication <extractor.twitter.syndication_>`__ API.
|
||||
|
||||
|
||||
extractor.twitter.include
|
||||
-------------------------
|
||||
Type
|
||||
* ``string``
|
||||
* ``list`` of ``strings``
|
||||
Default
|
||||
``"timeline"``
|
||||
Example
|
||||
* ``"avatar,background,media"``
|
||||
* ``["avatar", "background", "media"]``
|
||||
Description
|
||||
A (comma-separated) list of subcategories to include
|
||||
when processing a user profile.
|
||||
|
||||
Possible values are
|
||||
``"avatar"``,
|
||||
``"background"``,
|
||||
``"timeline"``,
|
||||
``"tweets"``,
|
||||
``"media"``,
|
||||
``"replies"``,
|
||||
``"likes"``.
|
||||
|
||||
It is possible to use ``"all"`` instead of listing all values separately.
|
||||
|
||||
|
||||
extractor.twitter.transform
|
||||
---------------------------
|
||||
Type
|
||||
@ -2955,6 +3141,20 @@ Description
|
||||
Transform Tweet and User metadata into a simpler, uniform format.
|
||||
|
||||
|
||||
extractor.twitter.tweet-endpoint
|
||||
--------------------------------
|
||||
Type
|
||||
``string``
|
||||
Default
|
||||
``"auto"``
|
||||
Description
|
||||
Selects the API endpoint used to retrieve single Tweets.
|
||||
|
||||
* ``"restid"``: ``/TweetResultByRestId`` - accessible to guest users
|
||||
* ``"detail"``: ``/TweetDetail`` - more stable
|
||||
* ``"auto"``: ``"detail"`` when logged in, ``"restid"`` otherwise
|
||||
|
||||
|
||||
extractor.twitter.size
|
||||
----------------------
|
||||
Type
|
||||
@ -3027,6 +3227,19 @@ Description
|
||||
a quoted (original) Tweet when it sees the Tweet which quotes it.
|
||||
|
||||
|
||||
extractor.twitter.ratelimit
|
||||
---------------------------
|
||||
Type
|
||||
``string``
|
||||
Default
|
||||
``"wait"``
|
||||
Description
|
||||
Selects how to handle exceeding the API rate limit.
|
||||
|
||||
* ``"abort"``: Raise an error and stop extraction
|
||||
* ``"wait"``: Wait until rate limit reset
|
||||
|
||||
|
||||
extractor.twitter.replies
|
||||
-------------------------
|
||||
Type
|
||||
@ -3067,8 +3280,8 @@ Type
|
||||
Default
|
||||
``"auto"``
|
||||
Description
|
||||
Controls the strategy / tweet source used for user URLs
|
||||
(``https://twitter.com/USER``).
|
||||
Controls the strategy / tweet source used for timeline URLs
|
||||
(``https://twitter.com/USER/timeline``).
|
||||
|
||||
* ``"tweets"``: `/tweets <https://twitter.com/USER/tweets>`__ timeline + search
|
||||
* ``"media"``: `/media <https://twitter.com/USER/media>`__ timeline + search
|
||||
@ -3637,6 +3850,25 @@ Description
|
||||
contains JPEG/JFIF data.
|
||||
|
||||
|
||||
downloader.http.consume-content
|
||||
-------------------------------
|
||||
Type
|
||||
``bool``
|
||||
Default
|
||||
``false``
|
||||
Description
|
||||
Controls the behavior when an HTTP response is considered
|
||||
unsuccessful.
|
||||
|
||||
If the value is ``true``, consume the response body. This
|
||||
avoids closing the connection and therefore improves connection
|
||||
reuse.
|
||||
|
||||
If the value is ``false``, immediately close the connection
|
||||
without reading the response. This can be useful if the server
|
||||
is known to send large bodies for error responses.
|
||||
|
||||
|
||||
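For illustration, a minimal sketch of the trade-off this option controls
(an assumed example using ``requests`` directly, not gallery-dl's actual
downloader code):

.. code:: python

    import requests

    consume_content = True  # corresponds to downloader.http.consume-content
    session = requests.Session()
    response = session.get("https://example.org/file", stream=True)

    if not response.ok:  # response considered unsuccessful
        if consume_content:
            response.content   # read and discard the body; the connection stays reusable
        else:
            response.close()   # close the connection without reading the body
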
downloader.http.chunk-size
|
||||
--------------------------
|
||||
Type
|
||||
@ -4497,7 +4729,7 @@ Default
|
||||
Description
|
||||
Name of the metadata field whose value should be used.
|
||||
|
||||
This value must either be a UNIX timestamp or a
|
||||
This value must be either a UNIX timestamp or a
|
||||
|datetime|_ object.
|
||||
|
||||
Note: This option gets ignored if `mtime.value`_ is set.
|
||||
@ -4515,10 +4747,54 @@ Example
|
||||
Description
|
||||
A `format string`_ whose value should be used.
|
||||
|
||||
The resulting value must either be a UNIX timestamp or a
|
||||
The resulting value must be either a UNIX timestamp or a
|
||||
|datetime|_ object.
|
||||
|
||||
|
||||
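For illustration, a minimal sketch of what the ``mtime`` post processor does
with the resolved value (an assumed stand-in, not the actual implementation;
it only mirrors the "UNIX timestamp or datetime" rule above):

.. code:: python

    import os
    from datetime import datetime, timezone

    def set_mtime(path, value):
        # 'value' is what mtime.key / mtime.value resolves to:
        # either a UNIX timestamp or a datetime object
        if isinstance(value, datetime):
            if value.tzinfo is None:
                value = value.replace(tzinfo=timezone.utc)  # assume naive datetimes are UTC
            value = value.timestamp()
        os.utime(path, (value, value))  # set access and modification time
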
python.archive
|
||||
--------------
|
||||
Type
|
||||
|Path|_
|
||||
Description
|
||||
File to store IDs of called Python functions in,
|
||||
similar to `extractor.*.archive`_.
|
||||
|
||||
``archive-format``, ``archive-prefix``, and ``archive-pragma`` options,
|
||||
akin to
|
||||
`extractor.*.archive-format`_,
|
||||
`extractor.*.archive-prefix`_, and
|
||||
`extractor.*.archive-pragma`_, are supported as well.
|
||||
|
||||
|
||||
python.event
|
||||
------------
|
||||
Type
|
||||
``string``
|
||||
Default
|
||||
``"file"``
|
||||
Description
|
||||
The event for which `python.function`_ gets called.
|
||||
|
||||
See `metadata.event`_ for a list of available events.
|
||||
|
||||
|
||||
python.function
|
||||
---------------
|
||||
Type
|
||||
``string``
|
||||
Example
|
||||
* ``"my_module:generate_text"``
|
||||
* ``"~/.local/share/gdl-utils.py:resize"``
|
||||
Description
|
||||
The Python function to call.
|
||||
|
||||
This function gets specified as ``<module>:<function name>``
|
||||
and gets called with the current metadata dict as argument.
|
||||
|
||||
``module`` is either an importable Python module name
|
||||
or the |Path|_ to a ``.py`` file.
|
||||
|
||||
|
||||
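For illustration, a minimal sketch of such a function (hypothetical module
and field names; only the calling convention, a single metadata ``dict``
argument, comes from this description):

.. code:: python

    # ~/.local/share/gdl-utils.py   (hypothetical file)
    # referenced as  "function": "~/.local/share/gdl-utils.py:generate_text"

    def generate_text(metadata):
        # called with the current metadata dict; changes take effect in place
        metadata["title_short"] = metadata.get("title", "")[:50]
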
ugoira.extension
|
||||
----------------
|
||||
Type
|
||||
@ -4836,17 +5112,6 @@ Description
|
||||
used for (urllib3) warnings.
|
||||
|
||||
|
||||
pyopenssl
|
||||
---------
|
||||
Type
|
||||
``bool``
|
||||
Default
|
||||
``false``
|
||||
Description
|
||||
Use `pyOpenSSL <https://www.pyopenssl.org/en/stable/>`__-backed
|
||||
SSL-support.
|
||||
|
||||
|
||||
|
||||
API Tokens & IDs
|
||||
================
|
||||
@ -4912,6 +5177,10 @@ How To
|
||||
``user-agent`` and replace ``<application name>`` and ``<username>``
|
||||
accordingly (see Reddit's
|
||||
`API access rules <https://github.com/reddit/reddit/wiki/API>`__)
|
||||
* clear your `cache <cache.file_>`__ to delete any remaining
|
||||
``access-token`` entries. (``gallery-dl --clear-cache reddit``)
|
||||
* get a `refresh-token <extractor.reddit.refresh-token_>`__ for the
|
||||
new ``client-id`` (``gallery-dl oauth:reddit``)
|
||||
|
||||
|
||||
extractor.smugmug.api-key & .api-secret
|
||||
@ -5123,6 +5392,8 @@ Description
|
||||
Write metadata to separate files
|
||||
``mtime``
|
||||
Set file modification time according to its metadata
|
||||
``python``
|
||||
Call Python functions
|
||||
``ugoira``
|
||||
Convert Pixiv Ugoira to WebM using `FFmpeg <https://www.ffmpeg.org/>`__
|
||||
``zip``
|
||||
|
@ -11,14 +11,16 @@ Field names select the metadata value to use in a replacement field.
|
||||
|
||||
While simple names are usually enough, more complex forms like accessing values by attribute, element index, or slicing are also supported.
|
||||
|
||||
| | Example | Result |
|
||||
| -------------------- | ----------------- | ---------------------- |
|
||||
| Name | `{title}` | `Hello World` |
|
||||
| Element Index | `{title[6]}` | `W` |
|
||||
| Slicing | `{title[3:8]}` | `lo Wo` |
|
||||
| Alternatives | `{empty\|title}` | `Hello World` |
|
||||
| Element Access | `{user[name]}` | `John Doe` |
|
||||
| Attribute Access | `{extractor.url}` | `https://example.org/` |
|
||||
| | Example | Result |
|
||||
| -------------------- | ------------------- | ---------------------- |
|
||||
| Name | `{title}` | `Hello World` |
|
||||
| Element Index | `{title[6]}` | `W` |
|
||||
| Slicing | `{title[3:8]}` | `lo Wo` |
|
||||
| Slicing (Bytes) | `{title_ja[b3:18]}` | `ロー・ワー` |
|
||||
| Alternatives | `{empty\|title}` | `Hello World` |
|
||||
| Attribute Access | `{extractor.url}` | `https://example.org/` |
|
||||
| Element Access | `{user[name]}` | `John Doe` |
|
||||
| | `{user['name']}` | `John Doe` |
|
||||
|
||||
All of these methods can be combined as needed.
|
||||
For example `{title[24]|empty|extractor.url[15:-1]}` would result in `.org`.
|
||||
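As a rough illustration of the new byte-wise slicing, plain Python gives the
same result (assuming `title_ja` is `ハロー・ワールド` and the filesystem
encoding is UTF-8):

```python
title_ja = "ハロー・ワールド"
data = title_ja.encode("utf-8")     # bytes() representation, 3 bytes per character here
print(data[3:18].decode("utf-8"))   # -> ロー・ワー  (matches {title_ja[b3:18]})
```
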
@ -92,6 +94,18 @@ Conversion specifiers allow to *convert* the value to a different form or type.
|
||||
<td><code>{created!d}</code></td>
|
||||
<td><code>2010-01-01 00:00:00</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td align="center"><code>U</code></td>
|
||||
<td>Convert HTML entities</td>
|
||||
<td><code>{html!U}</code></td>
|
||||
<td><code><p>foo & bar</p></code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td align="center"><code>H</code></td>
|
||||
<td>Convert HTML entities & remove HTML tags</td>
|
||||
<td><code>{html!H}</code></td>
|
||||
<td><code>foo & bar</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td align="center"><code>s</code></td>
|
||||
<td>Convert value to <a href="https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str" rel="nofollow"><code>str</code></a></td>
|
||||
@ -150,6 +164,12 @@ Format specifiers can be used for advanced formatting by using the options provi
|
||||
<td><code>{foo:[1:-1]}</code></td>
|
||||
<td><code>oo Ba</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><code>[b<start>:<stop>]</code></td>
|
||||
<td>Same as above, but applies to the <a href="https://docs.python.org/3/library/stdtypes.html#bytes"><code>bytes()</code></a> representation of a string in <a href="https://docs.python.org/3/library/sys.html#sys.getfilesystemencoding">filesystem encoding</a></td>
|
||||
<td><code>{foo_ja:[b3:-1]}</code></td>
|
||||
<td><code>ー・バ</code></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td rowspan="2"><code>L<maxlen>/<repl>/</code></td>
|
||||
<td rowspan="2">Replaces the entire output with <code><repl></code> if its length exceeds <code><maxlen></code></td>
|
||||
@ -193,7 +213,9 @@ Format specifiers can be used for advanced formatting by using the options provi
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
All special format specifiers (`?`, `L`, `J`, `R`, `D`, `O`) can be chained and combined with one another, but must always come before any standard format specifiers:
|
||||
All special format specifiers (`?`, `L`, `J`, `R`, `D`, `O`, etc)
|
||||
can be chained and combined with one another,
|
||||
but must always appear before any standard format specifiers:
|
||||
|
||||
For example `{foo:?//RF/B/Ro/e/> 10}` -> ` Bee Bar`
|
||||
- `?//` - Tests if `foo` has a value
|
||||
@ -244,7 +266,7 @@ Replacement field names that are available in all format strings.
|
||||
|
||||
## Special Type Format Strings
|
||||
|
||||
Starting a format string with '\f<Type> ' allows to set a different format string type than the default. Available ones are:
|
||||
Starting a format string with `\f<Type> ` allows to set a different format string type than the default. Available ones are:
|
||||
|
||||
<table>
|
||||
<thead>
|
||||
@ -285,13 +307,3 @@ Starting a format string with '\f<Type> ' allows to set a different format strin
|
||||
</tr>
|
||||
</tbody>
|
||||
</table>
|
||||
|
||||
> **Note:**
|
||||
>
|
||||
> `\f` is the [Form Feed](https://en.wikipedia.org/w/index.php?title=Page_break&oldid=1027475805#Form_feed)
|
||||
> character. (ASCII code 12 or 0xc)
|
||||
>
|
||||
> Writing it as `\f` is native to JSON, but will *not* get interpreted
|
||||
> as such by most shells. To use this character there:
|
||||
> * hold `Ctrl`, then press `v` followed by `l`, resulting in `^L` or
|
||||
> * use `echo` or `printf` (e.g. `gallery-dl -f "$(echo -ne \\fM) my_module:generate_text"`)
|
||||
|
@ -10,7 +10,7 @@
|
||||
"proxy": null,
|
||||
"skip": true,
|
||||
|
||||
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0",
|
||||
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0",
|
||||
"retries": 4,
|
||||
"timeout": 30.0,
|
||||
"verify": true,
|
||||
@ -108,8 +108,10 @@
|
||||
},
|
||||
"flickr":
|
||||
{
|
||||
"videos": true,
|
||||
"size-max": null
|
||||
"exif": false,
|
||||
"metadata": false,
|
||||
"size-max": null,
|
||||
"videos": true
|
||||
},
|
||||
"furaffinity":
|
||||
{
|
||||
@ -129,7 +131,7 @@
|
||||
},
|
||||
"gofile": {
|
||||
"api-token": null,
|
||||
"website-token": "12345"
|
||||
"website-token": null
|
||||
},
|
||||
"hentaifoundry":
|
||||
{
|
||||
@ -146,6 +148,9 @@
|
||||
"password": null,
|
||||
"sleep-request": 5.0
|
||||
},
|
||||
"imagechest": {
|
||||
"access-token": null
|
||||
},
|
||||
"imgbb":
|
||||
{
|
||||
"username": null,
|
||||
@ -166,6 +171,9 @@
|
||||
"api": "rest",
|
||||
"cookies": null,
|
||||
"include": "posts",
|
||||
"order-files": "asc",
|
||||
"order-posts": "asc",
|
||||
"previews": false,
|
||||
"sleep-request": [6.0, 12.0],
|
||||
"videos": true
|
||||
},
|
||||
@ -190,6 +198,7 @@
|
||||
"password": null
|
||||
},
|
||||
"misskey": {
|
||||
"access-token": null,
|
||||
"renotes": false,
|
||||
"replies": true
|
||||
},
|
||||
@ -201,10 +210,6 @@
|
||||
"format": "original",
|
||||
"include": "art"
|
||||
},
|
||||
"nana":
|
||||
{
|
||||
"favkey": null
|
||||
},
|
||||
"nijie":
|
||||
{
|
||||
"username": null,
|
||||
@ -243,6 +248,7 @@
|
||||
{
|
||||
"refresh-token": null,
|
||||
"include": "artworks",
|
||||
"embeds": false,
|
||||
"metadata": false,
|
||||
"metadata-bookmark": false,
|
||||
"tags": "japanese",
|
||||
@ -255,6 +261,9 @@
|
||||
},
|
||||
"reddit":
|
||||
{
|
||||
"client-id": null,
|
||||
"user-agent": null,
|
||||
"refresh-token": null,
|
||||
"comments": 0,
|
||||
"morecomments": false,
|
||||
"date-min": 0,
|
||||
|
@ -18,12 +18,6 @@
|
||||
--user-agent UA User-Agent request header
|
||||
--clear-cache MODULE Delete cached login sessions, cookies, etc. for
|
||||
MODULE (ALL to delete everything)
|
||||
--cookies FILE File to load additional cookies from
|
||||
--cookies-from-browser BROWSER[+KEYRING][:PROFILE][::CONTAINER]
|
||||
Name of the browser to load cookies from, with
|
||||
optional keyring name prefixed with '+', profile
|
||||
prefixed with ':', and container prefixed with
|
||||
'::' ('none' for no container)
|
||||
|
||||
## Output Options:
|
||||
-q, --quiet Activate quiet mode
|
||||
@ -84,6 +78,16 @@
|
||||
-p, --password PASS Password belonging to the given username
|
||||
--netrc Enable .netrc authentication data
|
||||
|
||||
## Cookie Options:
|
||||
-C, --cookies FILE File to load additional cookies from
|
||||
--cookies-export FILE Export session cookies to FILE
|
||||
--cookies-from-browser BROWSER[/DOMAIN][+KEYRING][:PROFILE][::CONTAINER]
|
||||
Name of the browser to load cookies from, with
|
||||
optional domain prefixed with '/', keyring name
|
||||
prefixed with '+', profile prefixed with ':',
|
||||
and container prefixed with '::' ('none' for no
|
||||
container)
|
||||
|
||||
## Selection Options:
|
||||
--download-archive FILE Record all downloaded or skipped files in FILE
|
||||
and skip downloading any file already in it
|
||||
|
@ -32,14 +32,14 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>420chan</td>
|
||||
<td>https://420chan.org/</td>
|
||||
<td>4chan</td>
|
||||
<td>https://www.4chan.org/</td>
|
||||
<td>Boards, Threads</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>4chan</td>
|
||||
<td>https://www.4chan.org/</td>
|
||||
<td>4chanarchives</td>
|
||||
<td>https://4chanarchives.com/</td>
|
||||
<td>Boards, Threads</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
@ -111,7 +111,7 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Bunkr</td>
|
||||
<td>https://bunkr.la/</td>
|
||||
<td>https://bunkrr.su/</td>
|
||||
<td>Albums</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
@ -251,7 +251,7 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<td>Gfycat</td>
|
||||
<td>https://gfycat.com/</td>
|
||||
<td>Collections, individual Images, Search Results, User Profiles</td>
|
||||
<td></td>
|
||||
<td>Supported</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Gofile</td>
|
||||
@ -394,7 +394,7 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<tr>
|
||||
<td>imgur</td>
|
||||
<td>https://imgur.com/</td>
|
||||
<td>Albums, Favorites, Galleries, individual Images, Search Results, Subreddits, Tag Searches, User Profiles</td>
|
||||
<td>Albums, Favorites, Favorites Folders, Galleries, individual Images, Search Results, Subreddits, Tag Searches, User Profiles</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
@ -427,6 +427,18 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<td>Galleries, individual Images</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>itch.io</td>
|
||||
<td>https://itch.io/</td>
|
||||
<td>Games</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>JPG Fish</td>
|
||||
<td>https://jpeg.pet/</td>
|
||||
<td>Albums, individual Images, User Profiles</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Keenspot</td>
|
||||
<td>http://www.keenspot.com/</td>
|
||||
@ -451,6 +463,12 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<td>Chapters, Manga</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Lensdump</td>
|
||||
<td>https://lensdump.com/</td>
|
||||
<td>Albums, individual Images</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Lexica</td>
|
||||
<td>https://lexica.art/</td>
|
||||
@ -463,12 +481,6 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<td>Galleries</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>LINE BLOG</td>
|
||||
<td>https://www.lineblog.me/</td>
|
||||
<td>Blogs, Posts</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>livedoor Blog</td>
|
||||
<td>http://blog.livedoor.jp/</td>
|
||||
@ -523,6 +535,12 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<td>Chapters, Manga</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>MangaRead</td>
|
||||
<td>https://mangaread.org/</td>
|
||||
<td>Chapters, Manga</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>MangaSee</td>
|
||||
<td>https://mangasee123.com/</td>
|
||||
@ -535,24 +553,12 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<td>Albums, Channels</td>
|
||||
<td>Supported</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>meme.museum</td>
|
||||
<td>https://meme.museum/</td>
|
||||
<td>Posts, Tag Searches</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>My Hentai Gallery</td>
|
||||
<td>https://myhentaigallery.com/</td>
|
||||
<td>Galleries</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Nana</td>
|
||||
<td>https://nana.my.id/</td>
|
||||
<td>Galleries, Favorites, Search Results</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Naver</td>
|
||||
<td>https://blog.naver.com/</td>
|
||||
@ -652,7 +658,7 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<tr>
|
||||
<td>Pixiv</td>
|
||||
<td>https://www.pixiv.net/</td>
|
||||
<td>Artworks, Avatars, Backgrounds, Favorites, Follows, pixiv.me Links, pixivision, Rankings, Search Results, Series, Sketch, User Profiles, individual Images</td>
|
||||
<td>Artworks, Avatars, Backgrounds, Favorites, Follows, pixiv.me Links, Novels, Novel Bookmarks, Novel Series, pixivision, Rankings, Search Results, Series, Sketch, User Profiles, individual Images</td>
|
||||
<td><a href="https://github.com/mikf/gallery-dl#oauth">OAuth</a></td>
|
||||
</tr>
|
||||
<tr>
|
||||
@ -700,7 +706,7 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<tr>
|
||||
<td>Postimg</td>
|
||||
<td>https://postimages.org/</td>
|
||||
<td>individual Images</td>
|
||||
<td>Galleries, individual Images</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
@ -724,7 +730,7 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<tr>
|
||||
<td>RedGIFs</td>
|
||||
<td>https://redgifs.com/</td>
|
||||
<td>Collections, individual Images, Search Results, User Profiles</td>
|
||||
<td>Collections, individual Images, Niches, Search Results, User Profiles</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
@ -819,7 +825,7 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
</tr>
|
||||
<tr>
|
||||
<td>TCB Scans</td>
|
||||
<td>https://onepiecechapters.com/</td>
|
||||
<td>https://tcbscans.com/</td>
|
||||
<td>Chapters, Manga</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
@ -844,7 +850,7 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<tr>
|
||||
<td>Tumblr</td>
|
||||
<td>https://www.tumblr.com/</td>
|
||||
<td>Likes, Posts, Tag Searches, User Profiles</td>
|
||||
<td>Days, Likes, Posts, Tag Searches, User Profiles</td>
|
||||
<td><a href="https://github.com/mikf/gallery-dl#oauth">OAuth</a></td>
|
||||
</tr>
|
||||
<tr>
|
||||
@ -868,7 +874,7 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<tr>
|
||||
<td>Twitter</td>
|
||||
<td>https://twitter.com/</td>
|
||||
<td>Avatars, Backgrounds, Bookmarks, Events, Hashtags, individual Images, Likes, Lists, List Members, Media Timelines, Search Results, Timelines, Tweets</td>
|
||||
<td>Avatars, Backgrounds, Bookmarks, Events, Hashtags, individual Images, Likes, Lists, List Members, Media Timelines, Search Results, Timelines, Tweets, User Profiles</td>
|
||||
<td>Supported</td>
|
||||
</tr>
|
||||
<tr>
|
||||
@ -887,7 +893,7 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<td>Vipergirls</td>
|
||||
<td>https://vipergirls.to/</td>
|
||||
<td>Posts, Threads</td>
|
||||
<td></td>
|
||||
<td>Supported</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Vipr</td>
|
||||
@ -989,7 +995,7 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<td>Zerochan</td>
|
||||
<td>https://www.zerochan.net/</td>
|
||||
<td>individual Images, Tag Searches</td>
|
||||
<td></td>
|
||||
<td>Supported</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>かべうち</td>
|
||||
@ -1003,12 +1009,6 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<td>Posts, Tag Searches</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>半次元</td>
|
||||
<td>https://bcy.net/</td>
|
||||
<td>Posts, User Profiles</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
|
||||
<tr>
|
||||
<td colspan="4"><strong>Danbooru Instances</strong></td>
|
||||
@ -1031,6 +1031,12 @@ Consider all sites to be NSFW unless otherwise known.
|
||||
<td>Pools, Popular Images, Posts, Tag Searches</td>
|
||||
<td>Supported</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>Booruvar</td>
|
||||
<td>https://booru.borvar.art/</td>
|
||||
<td>Pools, Popular Images, Posts, Tag Searches</td>
|
||||
<td></td>
|
||||
</tr>
|
||||
|
||||
<tr>
|
||||
<td colspan="4"><strong>e621 Instances</strong></td>
|
||||
@ -1047,6 +1053,12 @@ Consider all sites to be NSFW unless otherwise known.
<td>Favorites, Pools, Popular Images, Posts, Tag Searches</td>
<td>Supported</td>
</tr>
<tr>
<td>e6AI</td>
<td>https://e6ai.net/</td>
<td>Favorites, Pools, Popular Images, Posts, Tag Searches</td>
<td></td>
</tr>

<tr>
<td colspan="4"><strong>Gelbooru Beta 0.1.11</strong></td>
@ -1076,8 +1088,8 @@ Consider all sites to be NSFW unless otherwise known.
<td></td>
</tr>
<tr>
<td>/v/idyart</td>
<td>https://vidyart.booru.org/</td>
<td>/v/idyart2</td>
<td>https://vidyart2.booru.org/</td>
<td>Favorites, Posts, Tag Searches</td>
<td></td>
</tr>
@ -1116,6 +1128,16 @@ Consider all sites to be NSFW unless otherwise known.
<td></td>
</tr>

<tr>
<td colspan="4"><strong>jschan Imageboards</strong></td>
</tr>
<tr>
<td>94chan</td>
<td>https://94chan.org/</td>
<td>Boards, Threads</td>
<td></td>
</tr>

<tr>
<td colspan="4"><strong>LynxChan Imageboards</strong></td>
</tr>
@ -1144,19 +1166,19 @@ Consider all sites to be NSFW unless otherwise known.
<tr>
<td>Misskey.io</td>
<td>https://misskey.io/</td>
<td>Images from Notes, User Profiles</td>
<td>Favorites, Images from Notes, User Profiles</td>
<td></td>
</tr>
<tr>
<td>Lesbian.energy</td>
<td>https://lesbian.energy/</td>
<td>Images from Notes, User Profiles</td>
<td>Favorites, Images from Notes, User Profiles</td>
<td></td>
</tr>
<tr>
<td>Sushi.ski</td>
<td>https://sushi.ski/</td>
<td>Images from Notes, User Profiles</td>
<td>Favorites, Images from Notes, User Profiles</td>
<td></td>
</tr>

@ -1266,6 +1288,40 @@ Consider all sites to be NSFW unless otherwise known.
<td></td>
</tr>

<tr>
<td colspan="4"><strong>Shimmie2 Instances</strong></td>
</tr>
<tr>
<td>meme.museum</td>
<td>https://meme.museum/</td>
<td>Posts, Tag Searches</td>
<td></td>
</tr>
<tr>
<td>Loudbooru</td>
<td>https://loudbooru.com/</td>
<td>Posts, Tag Searches</td>
<td></td>
</tr>
<tr>
<td>Giantessbooru</td>
<td>https://giantessbooru.com/</td>
<td>Posts, Tag Searches</td>
<td></td>
</tr>
<tr>
<td>Tentaclerape</td>
<td>https://tentaclerape.net/</td>
<td>Posts, Tag Searches</td>
<td></td>
</tr>
<tr>
<td>Cavemanon</td>
<td>https://booru.cavemanon.xyz/</td>
<td>Posts, Tag Searches</td>
<td></td>
</tr>

<tr>
<td colspan="4"><strong>szurubooru Instances</strong></td>
</tr>
@ -1388,14 +1444,8 @@ Consider all sites to be NSFW unless otherwise known.
<td></td>
</tr>
<tr>
<td>Rozen Arcana</td>
<td>https://archive.alice.al/</td>
<td>Boards, Galleries, Search Results, Threads</td>
<td></td>
</tr>
<tr>
<td>TokyoChronos</td>
<td>https://www.tokyochronos.net/</td>
<td>Palanq</td>
<td>https://archive.palanq.win/</td>
<td>Boards, Galleries, Search Results, Threads</td>
<td></td>
</tr>
@ -1421,12 +1471,6 @@ Consider all sites to be NSFW unless otherwise known.
<td>Chapters, Manga</td>
<td></td>
</tr>
<tr>
<td>Sense-Scans</td>
<td>https://sensescans.com/reader/</td>
<td>Chapters, Manga</td>
<td></td>
</tr>

<tr>
<td colspan="4"><strong>Mastodon Instances</strong></td>
@ -70,12 +70,14 @@ def main():
if args.cookies_from_browser:
browser, _, profile = args.cookies_from_browser.partition(":")
browser, _, keyring = browser.partition("+")
browser, _, domain = browser.partition("/")
if profile.startswith(":"):
container = profile[1:]
profile = None
else:
profile, _, container = profile.partition("::")
config.set((), "cookies", (browser, profile, keyring, container))
config.set((), "cookies", (
browser, profile, keyring, container, domain))
if args.options_pp:
config.set((), "postprocessor-options", args.options_pp)
for opts in args.options:
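The hunk above extends the cookies-from-browser specification string with an optional "/domain" part alongside the existing "+keyring", ":profile", and "::container" parts. A minimal sketch of how such a string would be split by the partitioning shown above; the example value is hypothetical and not taken from the commit:

# Illustrative only: tracing the partitioning above for a made-up spec string.
spec = "firefox/example.org+kwallet:default::Personal"
browser, _, profile = spec.partition(":")     # "firefox/example.org+kwallet" / "default::Personal"
browser, _, keyring = browser.partition("+")  # "firefox/example.org" / "kwallet"
browser, _, domain = browser.partition("/")   # "firefox" / "example.org"
if profile.startswith(":"):                   # spec of the form "browser::container" (no profile)
    container = profile[1:]
    profile = None
else:
    profile, _, container = profile.partition("::")
print(browser, profile, keyring, container, domain)
# firefox default kwallet Personal example.org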
@ -102,7 +102,8 @@ def load(files=None, strict=False, load=util.json_loads):
log.error(exc)
sys.exit(1)
except Exception as exc:
log.warning("Could not parse '%s': %s", path, exc)
log.error("%s when loading '%s': %s",
exc.__class__.__name__, path, exc)
if strict:
sys.exit(2)
else:
@ -118,7 +119,7 @@ def clear():
_config.clear()


def get(path, key, default=None, *, conf=_config):
def get(path, key, default=None, conf=_config):
"""Get the value of property 'key' or a default value"""
try:
for p in path:
@ -128,7 +129,7 @@ def get(path, key, default=None, *, conf=_config):
return default


def interpolate(path, key, default=None, *, conf=_config):
def interpolate(path, key, default=None, conf=_config):
"""Interpolate the value of 'key'"""
if key in conf:
return conf[key]
@ -142,7 +143,7 @@ def interpolate(path, key, default=None, *, conf=_config):
return default


def interpolate_common(common, paths, key, default=None, *, conf=_config):
def interpolate_common(common, paths, key, default=None, conf=_config):
"""Interpolate the value of 'key'
using multiple 'paths' along a 'common' ancestor
"""
@ -174,7 +175,7 @@ def interpolate_common(common, paths, key, default=None, *, conf=_config):
return default


def accumulate(path, key, *, conf=_config):
def accumulate(path, key, conf=_config):
"""Accumulate the values of 'key' along 'path'"""
result = []
try:
@ -193,7 +194,7 @@ def accumulate(path, key, *, conf=_config):
return result


def set(path, key, value, *, conf=_config):
def set(path, key, value, conf=_config):
"""Set the value of property 'key' for this session"""
for p in path:
try:
@ -203,7 +204,7 @@ def set(path, key, value, *, conf=_config):
conf[key] = value


def setdefault(path, key, value, *, conf=_config):
def setdefault(path, key, value, conf=_config):
"""Set the value of property 'key' if it doesn't exist"""
for p in path:
try:
@ -213,7 +214,7 @@ def setdefault(path, key, value, *, conf=_config):
return conf.setdefault(key, value)


def unset(path, key, *, conf=_config):
def unset(path, key, conf=_config):
"""Unset the value of property 'key'"""
try:
for p in path:
@ -20,7 +20,6 @@ import struct
|
||||
import subprocess
|
||||
import sys
|
||||
import tempfile
|
||||
from datetime import datetime, timedelta, timezone
|
||||
from hashlib import pbkdf2_hmac
|
||||
from http.cookiejar import Cookie
|
||||
from . import aes, text, util
|
||||
@ -34,19 +33,19 @@ logger = logging.getLogger("cookies")
|
||||
|
||||
|
||||
def load_cookies(cookiejar, browser_specification):
|
||||
browser_name, profile, keyring, container = \
|
||||
browser_name, profile, keyring, container, domain = \
|
||||
_parse_browser_specification(*browser_specification)
|
||||
if browser_name == "firefox":
|
||||
load_cookies_firefox(cookiejar, profile, container)
|
||||
load_cookies_firefox(cookiejar, profile, container, domain)
|
||||
elif browser_name == "safari":
|
||||
load_cookies_safari(cookiejar, profile)
|
||||
load_cookies_safari(cookiejar, profile, domain)
|
||||
elif browser_name in SUPPORTED_BROWSERS_CHROMIUM:
|
||||
load_cookies_chrome(cookiejar, browser_name, profile, keyring)
|
||||
load_cookies_chrome(cookiejar, browser_name, profile, keyring, domain)
|
||||
else:
|
||||
raise ValueError("unknown browser '{}'".format(browser_name))
|
||||
|
||||
|
||||
def load_cookies_firefox(cookiejar, profile=None, container=None):
|
||||
def load_cookies_firefox(cookiejar, profile=None, container=None, domain=None):
|
||||
path, container_id = _firefox_cookies_database(profile, container)
|
||||
with DatabaseCopy(path) as db:
|
||||
|
||||
@ -60,6 +59,13 @@ def load_cookies_firefox(cookiejar, profile=None, container=None):
|
||||
sql += " WHERE originAttributes LIKE ? OR originAttributes LIKE ?"
|
||||
uid = "%userContextId={}".format(container_id)
|
||||
parameters = (uid, uid + "&%")
|
||||
elif domain:
|
||||
if domain[0] == ".":
|
||||
sql += " WHERE host == ? OR host LIKE ?"
|
||||
parameters = (domain[1:], "%" + domain)
|
||||
else:
|
||||
sql += " WHERE host == ? OR host == ?"
|
||||
parameters = (domain, "." + domain)
|
||||
|
||||
set_cookie = cookiejar.set_cookie
|
||||
for name, value, domain, path, secure, expires in db.execute(
|
||||
@ -69,9 +75,10 @@ def load_cookies_firefox(cookiejar, profile=None, container=None):
|
||||
domain, bool(domain), domain.startswith("."),
|
||||
path, bool(path), secure, expires, False, None, None, {},
|
||||
))
|
||||
_log_info("Extracted %s cookies from Firefox", len(cookiejar))
|
||||
|
||||
|
||||
def load_cookies_safari(cookiejar, profile=None):
|
||||
def load_cookies_safari(cookiejar, profile=None, domain=None):
|
||||
"""Ref.: https://github.com/libyal/dtformats/blob
|
||||
/main/documentation/Safari%20Cookies.asciidoc
|
||||
- This data appears to be out of date
|
||||
@ -87,27 +94,40 @@ def load_cookies_safari(cookiejar, profile=None):
|
||||
_safari_parse_cookies_page(p.read_bytes(page_size), cookiejar)
|
||||
|
||||
|
||||
def load_cookies_chrome(cookiejar, browser_name, profile, keyring):
|
||||
def load_cookies_chrome(cookiejar, browser_name, profile=None,
|
||||
keyring=None, domain=None):
|
||||
config = _get_chromium_based_browser_settings(browser_name)
|
||||
path = _chrome_cookies_database(profile, config)
|
||||
logger.debug("Extracting cookies from %s", path)
|
||||
_log_debug("Extracting cookies from %s", path)
|
||||
|
||||
with DatabaseCopy(path) as db:
|
||||
db.text_factory = bytes
|
||||
decryptor = get_cookie_decryptor(
|
||||
config["directory"], config["keyring"], keyring=keyring)
|
||||
config["directory"], config["keyring"], keyring)
|
||||
|
||||
if domain:
|
||||
if domain[0] == ".":
|
||||
condition = " WHERE host_key == ? OR host_key LIKE ?"
|
||||
parameters = (domain[1:], "%" + domain)
|
||||
else:
|
||||
condition = " WHERE host_key == ? OR host_key == ?"
|
||||
parameters = (domain, "." + domain)
|
||||
else:
|
||||
condition = ""
|
||||
parameters = ()
|
||||
|
||||
try:
|
||||
rows = db.execute(
|
||||
"SELECT host_key, name, value, encrypted_value, path, "
|
||||
"expires_utc, is_secure FROM cookies")
|
||||
"expires_utc, is_secure FROM cookies" + condition, parameters)
|
||||
except sqlite3.OperationalError:
|
||||
rows = db.execute(
|
||||
"SELECT host_key, name, value, encrypted_value, path, "
|
||||
"expires_utc, secure FROM cookies")
|
||||
"expires_utc, secure FROM cookies" + condition, parameters)
|
||||
|
||||
set_cookie = cookiejar.set_cookie
|
||||
failed_cookies = unencrypted_cookies = 0
|
||||
failed_cookies = 0
|
||||
unencrypted_cookies = 0
|
||||
|
||||
for domain, name, value, enc_value, path, expires, secure in rows:
|
||||
|
||||
@ -135,11 +155,11 @@ def load_cookies_chrome(cookiejar, browser_name, profile, keyring):
|
||||
else:
|
||||
failed_message = ""
|
||||
|
||||
logger.info("Extracted %s cookies from %s%s",
|
||||
len(cookiejar), browser_name, failed_message)
|
||||
counts = decryptor.cookie_counts.copy()
|
||||
_log_info("Extracted %s cookies from %s%s",
|
||||
len(cookiejar), browser_name.capitalize(), failed_message)
|
||||
counts = decryptor.cookie_counts
|
||||
counts["unencrypted"] = unencrypted_cookies
|
||||
logger.debug("cookie version breakdown: %s", counts)
|
||||
_log_debug("Cookie version breakdown: %s", counts)
|
||||
|
||||
|
||||
# --------------------------------------------------------------------
|
||||
@ -157,11 +177,11 @@ def _firefox_cookies_database(profile=None, container=None):
|
||||
if path is None:
|
||||
raise FileNotFoundError("Unable to find Firefox cookies database in "
|
||||
"{}".format(search_root))
|
||||
logger.debug("Extracting cookies from %s", path)
|
||||
_log_debug("Extracting cookies from %s", path)
|
||||
|
||||
if container == "none":
|
||||
container_id = False
|
||||
logger.debug("Only loading cookies not belonging to any container")
|
||||
_log_debug("Only loading cookies not belonging to any container")
|
||||
|
||||
elif container:
|
||||
containers_path = os.path.join(
|
||||
@ -171,8 +191,8 @@ def _firefox_cookies_database(profile=None, container=None):
|
||||
with open(containers_path) as file:
|
||||
identities = util.json_loads(file.read())["identities"]
|
||||
except OSError:
|
||||
logger.error("Unable to read Firefox container database at %s",
|
||||
containers_path)
|
||||
_log_error("Unable to read Firefox container database at '%s'",
|
||||
containers_path)
|
||||
raise
|
||||
except KeyError:
|
||||
identities = ()
|
||||
@ -183,10 +203,10 @@ def _firefox_cookies_database(profile=None, container=None):
|
||||
container_id = context["userContextId"]
|
||||
break
|
||||
else:
|
||||
raise ValueError("Unable to find Firefox container {}".format(
|
||||
raise ValueError("Unable to find Firefox container '{}'".format(
|
||||
container))
|
||||
logger.debug("Only loading cookies from container '%s' (ID %s)",
|
||||
container, container_id)
|
||||
_log_debug("Only loading cookies from container '%s' (ID %s)",
|
||||
container, container_id)
|
||||
else:
|
||||
container_id = None
|
||||
|
||||
@ -209,7 +229,7 @@ def _safari_cookies_database():
|
||||
path = os.path.expanduser("~/Library/Cookies/Cookies.binarycookies")
|
||||
return open(path, "rb")
|
||||
except FileNotFoundError:
|
||||
logger.debug("Trying secondary cookie location")
|
||||
_log_debug("Trying secondary cookie location")
|
||||
path = os.path.expanduser("~/Library/Containers/com.apple.Safari/Data"
|
||||
"/Library/Cookies/Cookies.binarycookies")
|
||||
return open(path, "rb")
|
||||
@ -224,13 +244,13 @@ def _safari_parse_cookies_header(data):
|
||||
return page_sizes, p.cursor
|
||||
|
||||
|
||||
def _safari_parse_cookies_page(data, jar):
|
||||
def _safari_parse_cookies_page(data, cookiejar, domain=None):
|
||||
p = DataParser(data)
|
||||
p.expect_bytes(b"\x00\x00\x01\x00", "page signature")
|
||||
number_of_cookies = p.read_uint()
|
||||
record_offsets = [p.read_uint() for _ in range(number_of_cookies)]
|
||||
if number_of_cookies == 0:
|
||||
logger.debug("a cookies page of size %s has no cookies", len(data))
|
||||
_log_debug("Cookies page of size %s has no cookies", len(data))
|
||||
return
|
||||
|
||||
p.skip_to(record_offsets[0], "unknown page header field")
|
||||
@ -238,12 +258,12 @@ def _safari_parse_cookies_page(data, jar):
|
||||
for i, record_offset in enumerate(record_offsets):
|
||||
p.skip_to(record_offset, "space between records")
|
||||
record_length = _safari_parse_cookies_record(
|
||||
data[record_offset:], jar)
|
||||
data[record_offset:], cookiejar, domain)
|
||||
p.read_bytes(record_length)
|
||||
p.skip_to_end("space in between pages")
|
||||
|
||||
|
||||
def _safari_parse_cookies_record(data, cookiejar):
|
||||
def _safari_parse_cookies_record(data, cookiejar, host=None):
|
||||
p = DataParser(data)
|
||||
record_size = p.read_uint()
|
||||
p.skip(4, "unknown record field 1")
|
||||
@ -262,6 +282,14 @@ def _safari_parse_cookies_record(data, cookiejar):
|
||||
p.skip_to(domain_offset)
|
||||
domain = p.read_cstring()
|
||||
|
||||
if host:
|
||||
if host[0] == ".":
|
||||
if host[1:] != domain and not domain.endswith(host):
|
||||
return record_size
|
||||
else:
|
||||
if host != domain and ("." + host) != domain:
|
||||
return record_size
|
||||
|
||||
p.skip_to(name_offset)
|
||||
name = p.read_cstring()
|
||||
|
||||
@ -271,8 +299,7 @@ def _safari_parse_cookies_record(data, cookiejar):
|
||||
p.skip_to(value_offset)
|
||||
value = p.read_cstring()
|
||||
except UnicodeDecodeError:
|
||||
logger.warning("failed to parse Safari cookie "
|
||||
"because UTF-8 decoding failed")
|
||||
_log_warning("Failed to parse Safari cookie")
|
||||
return record_size
|
||||
|
||||
p.skip_to(record_size, "space at the end of the record")
|
||||
@ -300,7 +327,7 @@ def _chrome_cookies_database(profile, config):
|
||||
elif config["profiles"]:
|
||||
search_root = os.path.join(config["directory"], profile)
|
||||
else:
|
||||
logger.warning("%s does not support profiles", config["browser"])
|
||||
_log_warning("%s does not support profiles", config["browser"])
|
||||
search_root = config["directory"]
|
||||
|
||||
path = _find_most_recently_used_file(search_root, "Cookies")
|
||||
@ -412,18 +439,17 @@ class ChromeCookieDecryptor:
|
||||
raise NotImplementedError("Must be implemented by sub classes")
|
||||
|
||||
|
||||
def get_cookie_decryptor(browser_root, browser_keyring_name, *, keyring=None):
|
||||
def get_cookie_decryptor(browser_root, browser_keyring_name, keyring=None):
|
||||
if sys.platform in ("win32", "cygwin"):
|
||||
return WindowsChromeCookieDecryptor(browser_root)
|
||||
elif sys.platform == "darwin":
|
||||
return MacChromeCookieDecryptor(browser_keyring_name)
|
||||
else:
|
||||
return LinuxChromeCookieDecryptor(
|
||||
browser_keyring_name, keyring=keyring)
|
||||
return LinuxChromeCookieDecryptor(browser_keyring_name, keyring)
|
||||
|
||||
|
||||
class LinuxChromeCookieDecryptor(ChromeCookieDecryptor):
|
||||
def __init__(self, browser_keyring_name, *, keyring=None):
|
||||
def __init__(self, browser_keyring_name, keyring=None):
|
||||
self._v10_key = self.derive_key(b"peanuts")
|
||||
password = _get_linux_keyring_password(browser_keyring_name, keyring)
|
||||
self._v11_key = None if password is None else self.derive_key(password)
|
||||
@ -452,7 +478,7 @@ class LinuxChromeCookieDecryptor(ChromeCookieDecryptor):
|
||||
elif version == b"v11":
|
||||
self._cookie_counts["v11"] += 1
|
||||
if self._v11_key is None:
|
||||
logger.warning("cannot decrypt v11 cookies: no key found")
|
||||
_log_warning("Unable to decrypt v11 cookies: no key found")
|
||||
return None
|
||||
return _decrypt_aes_cbc(ciphertext, self._v11_key)
|
||||
|
||||
@ -486,7 +512,7 @@ class MacChromeCookieDecryptor(ChromeCookieDecryptor):
|
||||
if version == b"v10":
|
||||
self._cookie_counts["v10"] += 1
|
||||
if self._v10_key is None:
|
||||
logger.warning("cannot decrypt v10 cookies: no key found")
|
||||
_log_warning("Unable to decrypt v10 cookies: no key found")
|
||||
return None
|
||||
|
||||
return _decrypt_aes_cbc(ciphertext, self._v10_key)
|
||||
@ -516,7 +542,7 @@ class WindowsChromeCookieDecryptor(ChromeCookieDecryptor):
|
||||
if version == b"v10":
|
||||
self._cookie_counts["v10"] += 1
|
||||
if self._v10_key is None:
|
||||
logger.warning("cannot decrypt v10 cookies: no key found")
|
||||
_log_warning("Unable to decrypt v10 cookies: no key found")
|
||||
return None
|
||||
|
||||
# https://chromium.googlesource.com/chromium/src/+/refs/heads
|
||||
@ -554,7 +580,7 @@ def _choose_linux_keyring():
|
||||
SelectBackend
|
||||
"""
|
||||
desktop_environment = _get_linux_desktop_environment(os.environ)
|
||||
logger.debug("Detected desktop environment: %s", desktop_environment)
|
||||
_log_debug("Detected desktop environment: %s", desktop_environment)
|
||||
if desktop_environment == DE_KDE:
|
||||
return KEYRING_KWALLET
|
||||
if desktop_environment == DE_OTHER:
|
||||
@ -582,23 +608,23 @@ def _get_kwallet_network_wallet():
|
||||
)
|
||||
|
||||
if proc.returncode != 0:
|
||||
logger.warning("failed to read NetworkWallet")
|
||||
_log_warning("Failed to read NetworkWallet")
|
||||
return default_wallet
|
||||
else:
|
||||
network_wallet = stdout.decode().strip()
|
||||
logger.debug("NetworkWallet = '%s'", network_wallet)
|
||||
_log_debug("NetworkWallet = '%s'", network_wallet)
|
||||
return network_wallet
|
||||
except Exception as exc:
|
||||
logger.warning("exception while obtaining NetworkWallet (%s: %s)",
|
||||
exc.__class__.__name__, exc)
|
||||
_log_warning("Error while obtaining NetworkWallet (%s: %s)",
|
||||
exc.__class__.__name__, exc)
|
||||
return default_wallet
|
||||
|
||||
|
||||
def _get_kwallet_password(browser_keyring_name):
|
||||
logger.debug("using kwallet-query to obtain password from kwallet")
|
||||
_log_debug("Using kwallet-query to obtain password from kwallet")
|
||||
|
||||
if shutil.which("kwallet-query") is None:
|
||||
logger.error(
|
||||
_log_error(
|
||||
"kwallet-query command not found. KWallet and kwallet-query "
|
||||
"must be installed to read from KWallet. kwallet-query should be "
|
||||
"included in the kwallet package for your distribution")
|
||||
@ -615,14 +641,14 @@ def _get_kwallet_password(browser_keyring_name):
|
||||
)
|
||||
|
||||
if proc.returncode != 0:
|
||||
logger.error("kwallet-query failed with return code {}. "
|
||||
"Please consult the kwallet-query man page "
|
||||
"for details".format(proc.returncode))
|
||||
_log_error("kwallet-query failed with return code {}. "
|
||||
"Please consult the kwallet-query man page "
|
||||
"for details".format(proc.returncode))
|
||||
return b""
|
||||
|
||||
if stdout.lower().startswith(b"failed to read"):
|
||||
logger.debug("Failed to read password from kwallet. "
|
||||
"Using empty string instead")
|
||||
_log_debug("Failed to read password from kwallet. "
|
||||
"Using empty string instead")
|
||||
# This sometimes occurs in KDE because chrome does not check
|
||||
# hasEntry and instead just tries to read the value (which
|
||||
# kwallet returns "") whereas kwallet-query checks hasEntry.
|
||||
@ -633,13 +659,12 @@ def _get_kwallet_password(browser_keyring_name):
|
||||
# random password and store it, but that doesn't matter here.
|
||||
return b""
|
||||
else:
|
||||
logger.debug("password found")
|
||||
if stdout[-1:] == b"\n":
|
||||
stdout = stdout[:-1]
|
||||
return stdout
|
||||
except Exception as exc:
|
||||
logger.warning("exception running kwallet-query (%s: %s)",
|
||||
exc.__class__.__name__, exc)
|
||||
_log_warning("Error when running kwallet-query (%s: %s)",
|
||||
exc.__class__.__name__, exc)
|
||||
return b""
|
||||
|
||||
|
||||
@ -647,7 +672,7 @@ def _get_gnome_keyring_password(browser_keyring_name):
|
||||
try:
|
||||
import secretstorage
|
||||
except ImportError:
|
||||
logger.error("secretstorage not available")
|
||||
_log_error("'secretstorage' Python package not available")
|
||||
return b""
|
||||
|
||||
# Gnome keyring does not seem to organise keys in the same way as KWallet,
|
||||
@ -662,7 +687,7 @@ def _get_gnome_keyring_password(browser_keyring_name):
|
||||
if item.get_label() == label:
|
||||
return item.get_secret()
|
||||
else:
|
||||
logger.error("failed to read from keyring")
|
||||
_log_error("Failed to read from GNOME keyring")
|
||||
return b""
|
||||
|
||||
|
||||
@ -676,7 +701,7 @@ def _get_linux_keyring_password(browser_keyring_name, keyring):
|
||||
|
||||
if not keyring:
|
||||
keyring = _choose_linux_keyring()
|
||||
logger.debug("Chosen keyring: %s", keyring)
|
||||
_log_debug("Chosen keyring: %s", keyring)
|
||||
|
||||
if keyring == KEYRING_KWALLET:
|
||||
return _get_kwallet_password(browser_keyring_name)
|
||||
@ -690,8 +715,8 @@ def _get_linux_keyring_password(browser_keyring_name, keyring):
|
||||
|
||||
|
||||
def _get_mac_keyring_password(browser_keyring_name):
|
||||
logger.debug("using find-generic-password to obtain "
|
||||
"password from OSX keychain")
|
||||
_log_debug("Using find-generic-password to obtain "
|
||||
"password from OSX keychain")
|
||||
try:
|
||||
proc, stdout = Popen_communicate(
|
||||
"security", "find-generic-password",
|
||||
@ -704,28 +729,28 @@ def _get_mac_keyring_password(browser_keyring_name):
|
||||
stdout = stdout[:-1]
|
||||
return stdout
|
||||
except Exception as exc:
|
||||
logger.warning("exception running find-generic-password (%s: %s)",
|
||||
exc.__class__.__name__, exc)
|
||||
_log_warning("Error when using find-generic-password (%s: %s)",
|
||||
exc.__class__.__name__, exc)
|
||||
return None
|
||||
|
||||
|
||||
def _get_windows_v10_key(browser_root):
|
||||
path = _find_most_recently_used_file(browser_root, "Local State")
|
||||
if path is None:
|
||||
logger.error("could not find local state file")
|
||||
_log_error("Unable to find Local State file")
|
||||
return None
|
||||
logger.debug("Found local state file at '%s'", path)
|
||||
_log_debug("Found Local State file at '%s'", path)
|
||||
with open(path, encoding="utf-8") as file:
|
||||
data = util.json_loads(file.read())
|
||||
try:
|
||||
base64_key = data["os_crypt"]["encrypted_key"]
|
||||
except KeyError:
|
||||
logger.error("no encrypted key in Local State")
|
||||
_log_error("Unable to find encrypted key in Local State")
|
||||
return None
|
||||
encrypted_key = binascii.a2b_base64(base64_key)
|
||||
prefix = b"DPAPI"
|
||||
if not encrypted_key.startswith(prefix):
|
||||
logger.error("invalid key")
|
||||
_log_error("Invalid Local State key")
|
||||
return None
|
||||
return _decrypt_windows_dpapi(encrypted_key[len(prefix):])
|
||||
|
||||
@ -777,10 +802,10 @@ class DataParser:
|
||||
|
||||
def skip(self, num_bytes, description="unknown"):
|
||||
if num_bytes > 0:
|
||||
logger.debug("skipping {} bytes ({}): {!r}".format(
|
||||
_log_debug("Skipping {} bytes ({}): {!r}".format(
|
||||
num_bytes, description, self.read_bytes(num_bytes)))
|
||||
elif num_bytes < 0:
|
||||
raise ParserError("invalid skip of {} bytes".format(num_bytes))
|
||||
raise ParserError("Invalid skip of {} bytes".format(num_bytes))
|
||||
|
||||
def skip_to(self, offset, description="unknown"):
|
||||
self.skip(offset - self.cursor, description)
|
||||
@ -893,8 +918,8 @@ def _get_linux_desktop_environment(env):
|
||||
|
||||
|
||||
def _mac_absolute_time_to_posix(timestamp):
|
||||
return int((datetime(2001, 1, 1, 0, 0, tzinfo=timezone.utc) +
|
||||
timedelta(seconds=timestamp)).timestamp())
|
||||
# 978307200 is timestamp of 2001-01-01 00:00:00
|
||||
return 978307200 + int(timestamp)
|
||||
|
||||
|
||||
def pbkdf2_sha1(password, salt, iterations, key_length):
|
||||
@ -902,31 +927,25 @@ def pbkdf2_sha1(password, salt, iterations, key_length):
|
||||
|
||||
|
||||
def _decrypt_aes_cbc(ciphertext, key, initialization_vector=b" " * 16):
|
||||
plaintext = aes.unpad_pkcs7(
|
||||
aes.aes_cbc_decrypt_bytes(ciphertext, key, initialization_vector))
|
||||
try:
|
||||
return plaintext.decode()
|
||||
return aes.unpad_pkcs7(aes.aes_cbc_decrypt_bytes(
|
||||
ciphertext, key, initialization_vector)).decode()
|
||||
except UnicodeDecodeError:
|
||||
logger.warning("failed to decrypt cookie (AES-CBC) because UTF-8 "
|
||||
"decoding failed. Possibly the key is wrong?")
|
||||
return None
|
||||
_log_warning("Failed to decrypt cookie (AES-CBC Unicode)")
|
||||
except ValueError:
|
||||
_log_warning("Failed to decrypt cookie (AES-CBC)")
|
||||
return None
|
||||
|
||||
|
||||
def _decrypt_aes_gcm(ciphertext, key, nonce, authentication_tag):
|
||||
try:
|
||||
plaintext = aes.aes_gcm_decrypt_and_verify_bytes(
|
||||
ciphertext, key, authentication_tag, nonce)
|
||||
except ValueError:
|
||||
logger.warning("failed to decrypt cookie (AES-GCM) because MAC check "
|
||||
"failed. Possibly the key is wrong?")
|
||||
return None
|
||||
|
||||
try:
|
||||
return plaintext.decode()
|
||||
return aes.aes_gcm_decrypt_and_verify_bytes(
|
||||
ciphertext, key, authentication_tag, nonce).decode()
|
||||
except UnicodeDecodeError:
|
||||
logger.warning("failed to decrypt cookie (AES-GCM) because UTF-8 "
|
||||
"decoding failed. Possibly the key is wrong?")
|
||||
return None
|
||||
_log_warning("Failed to decrypt cookie (AES-GCM Unicode)")
|
||||
except ValueError:
|
||||
_log_warning("Failed to decrypt cookie (AES-GCM MAC)")
|
||||
return None
|
||||
|
||||
|
||||
def _decrypt_windows_dpapi(ciphertext):
|
||||
@ -954,7 +973,7 @@ def _decrypt_windows_dpapi(ciphertext):
|
||||
ctypes.byref(blob_out) # pDataOut
|
||||
)
|
||||
if not ret:
|
||||
logger.warning("failed to decrypt with DPAPI")
|
||||
_log_warning("Failed to decrypt cookie (DPAPI)")
|
||||
return None
|
||||
|
||||
result = ctypes.string_at(blob_out.pbData, blob_out.cbData)
|
||||
@ -979,12 +998,29 @@ def _is_path(value):
|
||||
|
||||
|
||||
def _parse_browser_specification(
|
||||
browser, profile=None, keyring=None, container=None):
|
||||
browser, profile=None, keyring=None, container=None, domain=None):
|
||||
browser = browser.lower()
|
||||
if browser not in SUPPORTED_BROWSERS:
|
||||
raise ValueError("unsupported browser '{}'".format(browser))
|
||||
raise ValueError("Unsupported browser '{}'".format(browser))
|
||||
if keyring and keyring not in SUPPORTED_KEYRINGS:
|
||||
raise ValueError("unsupported keyring '{}'".format(keyring))
|
||||
raise ValueError("Unsupported keyring '{}'".format(keyring))
|
||||
if profile and _is_path(profile):
|
||||
profile = os.path.expanduser(profile)
|
||||
return browser, profile, keyring, container
|
||||
return browser, profile, keyring, container, domain
|
||||
|
||||
|
||||
_log_cache = set()
|
||||
_log_debug = logger.debug
|
||||
_log_info = logger.info
|
||||
|
||||
|
||||
def _log_warning(msg, *args):
|
||||
if msg not in _log_cache:
|
||||
_log_cache.add(msg)
|
||||
logger.warning(msg, *args)
|
||||
|
||||
|
||||
def _log_error(msg, *args):
|
||||
if msg not in _log_cache:
|
||||
_log_cache.add(msg)
|
||||
logger.error(msg, *args)
|
||||
|
@ -44,6 +44,12 @@ class HttpDownloader(DownloaderBase):
self.mtime = self.config("mtime", True)
self.rate = self.config("rate")

if not self.config("consume-content", False):
# this resets the underlying TCP connection, and therefore
# if the program makes another request to the same domain,
# a new connection (either TLS or plain TCP) must be made
self.release_conn = lambda resp: resp.close()

if self.retries < 0:
self.retries = float("inf")
if self.minsize:
@ -106,7 +112,7 @@ class HttpDownloader(DownloaderBase):
while True:
if tries:
if response:
response.close()
self.release_conn(response)
response = None
self.log.warning("%s (%s/%s)", msg, tries, self.retries+1)
if tries > self.retries:
@ -165,18 +171,24 @@ class HttpDownloader(DownloaderBase):
retry = kwdict.get("_http_retry")
if retry and retry(response):
continue
self.release_conn(response)
self.log.warning(msg)
return False

# check for invalid responses
validate = kwdict.get("_http_validate")
if validate and self.validate:
result = validate(response)
try:
result = validate(response)
except Exception:
self.release_conn(response)
raise
if isinstance(result, str):
url = result
tries -= 1
continue
if not result:
self.release_conn(response)
self.log.warning("Invalid response")
return False

@ -184,11 +196,13 @@ class HttpDownloader(DownloaderBase):
size = text.parse_int(size, None)
if size is not None:
if self.minsize and size < self.minsize:
self.release_conn(response)
self.log.warning(
"File size smaller than allowed minimum (%s < %s)",
size, self.minsize)
return False
if self.maxsize and size > self.maxsize:
self.release_conn(response)
self.log.warning(
"File size larger than allowed maximum (%s > %s)",
size, self.maxsize)
@ -280,6 +294,18 @@ class HttpDownloader(DownloaderBase):

return True

def release_conn(self, response):
"""Release connection back to pool by consuming response body"""
try:
for _ in response.iter_content(self.chunk_size):
pass
except (RequestException, SSLError, OpenSSLError) as exc:
print()
self.log.debug(
"Unable to consume response body (%s: %s); "
"closing the connection anyway", exc.__class__.__name__, exc)
response.close()

@staticmethod
def receive(fp, content, bytes_total, bytes_start):
write = fp.write
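For context on the "consume-content" option introduced above, a hedged configuration sketch written as a Python literal; the "downloader" / "http" nesting is assumed from gallery-dl's usual downloader option layout and does not itself appear in this diff.

# Hypothetical configuration sketch (not part of the commit): with this set,
# release_conn() drains the response body so the pooled HTTP connection can
# be reused; with the default of False the connection is simply closed/reset.
CONSUME_CONTENT_EXAMPLE = {
    "downloader": {
        "http": {
            "consume-content": True,
        },
    },
}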
@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-

# Copyright 2015-2020 Mike Fährmann
# Copyright 2015-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
@ -17,12 +17,10 @@ class _3dbooruBase():
basecategory = "booru"
root = "http://behoimi.org"

def __init__(self, match):
super().__init__(match)
self.session.headers.update({
"Referer": "http://behoimi.org/post/show/",
"Accept-Encoding": "identity",
})
def _init(self):
headers = self.session.headers
headers["Referer"] = "http://behoimi.org/post/show/"
headers["Accept-Encoding"] = "identity"


class _3dbooruTagExtractor(_3dbooruBase, moebooru.MoebooruTagExtractor):
@ -1,76 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2021 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
# published by the Free Software Foundation.
|
||||
|
||||
"""Extractors for https://420chan.org/"""
|
||||
|
||||
from .common import Extractor, Message
|
||||
|
||||
|
||||
class _420chanThreadExtractor(Extractor):
|
||||
"""Extractor for 420chan threads"""
|
||||
category = "420chan"
|
||||
subcategory = "thread"
|
||||
directory_fmt = ("{category}", "{board}", "{thread} {title}")
|
||||
archive_fmt = "{board}_{thread}_{filename}"
|
||||
pattern = r"(?:https?://)?boards\.420chan\.org/([^/?#]+)/thread/(\d+)"
|
||||
test = ("https://boards.420chan.org/ani/thread/33251/chow-chows", {
|
||||
"pattern": r"https://boards\.420chan\.org/ani/src/\d+\.jpg",
|
||||
"content": "b07c803b0da78de159709da923e54e883c100934",
|
||||
"count": 2,
|
||||
})
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.board, self.thread = match.groups()
|
||||
|
||||
def items(self):
|
||||
url = "https://api.420chan.org/{}/res/{}.json".format(
|
||||
self.board, self.thread)
|
||||
posts = self.request(url).json()["posts"]
|
||||
|
||||
data = {
|
||||
"board" : self.board,
|
||||
"thread": self.thread,
|
||||
"title" : posts[0].get("sub") or posts[0]["com"][:50],
|
||||
}
|
||||
|
||||
yield Message.Directory, data
|
||||
for post in posts:
|
||||
if "filename" in post:
|
||||
post.update(data)
|
||||
post["extension"] = post["ext"][1:]
|
||||
url = "https://boards.420chan.org/{}/src/{}{}".format(
|
||||
post["board"], post["filename"], post["ext"])
|
||||
yield Message.Url, url, post
|
||||
|
||||
|
||||
class _420chanBoardExtractor(Extractor):
|
||||
"""Extractor for 420chan boards"""
|
||||
category = "420chan"
|
||||
subcategory = "board"
|
||||
pattern = r"(?:https?://)?boards\.420chan\.org/([^/?#]+)/\d*$"
|
||||
test = ("https://boards.420chan.org/po/", {
|
||||
"pattern": _420chanThreadExtractor.pattern,
|
||||
"count": ">= 100",
|
||||
})
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.board = match.group(1)
|
||||
|
||||
def items(self):
|
||||
url = "https://api.420chan.org/{}/threads.json".format(self.board)
|
||||
threads = self.request(url).json()
|
||||
|
||||
for page in threads:
|
||||
for thread in page["threads"]:
|
||||
url = "https://boards.420chan.org/{}/thread/{}/".format(
|
||||
self.board, thread["no"])
|
||||
thread["page"] = page["page"]
|
||||
thread["_extractor"] = _420chanThreadExtractor
|
||||
yield Message.Queue, url, thread
|
139
gallery_dl/extractor/4chanarchives.py
Normal file
@ -0,0 +1,139 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
# published by the Free Software Foundation.
|
||||
|
||||
"""Extractors for https://4chanarchives.com/"""
|
||||
|
||||
from .common import Extractor, Message
|
||||
from .. import text
|
||||
|
||||
|
||||
class _4chanarchivesThreadExtractor(Extractor):
|
||||
"""Extractor for threads on 4chanarchives.com"""
|
||||
category = "4chanarchives"
|
||||
subcategory = "thread"
|
||||
root = "https://4chanarchives.com"
|
||||
directory_fmt = ("{category}", "{board}", "{thread} - {title}")
|
||||
filename_fmt = "{no}-{filename}.{extension}"
|
||||
archive_fmt = "{board}_{thread}_{no}"
|
||||
pattern = r"(?:https?://)?4chanarchives\.com/board/([^/?#]+)/thread/(\d+)"
|
||||
test = (
|
||||
("https://4chanarchives.com/board/c/thread/2707110", {
|
||||
"pattern": r"https://i\.imgur\.com/(0wLGseE|qbByWDc)\.jpg",
|
||||
"count": 2,
|
||||
"keyword": {
|
||||
"board": "c",
|
||||
"com": str,
|
||||
"name": "Anonymous",
|
||||
"no": int,
|
||||
"thread": "2707110",
|
||||
"time": r"re:2016-07-1\d \d\d:\d\d:\d\d",
|
||||
"title": "Ren Kagami from 'Oyako Neburi'",
|
||||
},
|
||||
}),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.board, self.thread = match.groups()
|
||||
|
||||
def items(self):
|
||||
url = "{}/board/{}/thread/{}".format(
|
||||
self.root, self.board, self.thread)
|
||||
page = self.request(url).text
|
||||
data = self.metadata(page)
|
||||
posts = self.posts(page)
|
||||
|
||||
if not data["title"]:
|
||||
data["title"] = text.unescape(text.remove_html(
|
||||
posts[0]["com"]))[:50]
|
||||
|
||||
for post in posts:
|
||||
post.update(data)
|
||||
yield Message.Directory, post
|
||||
if "url" in post:
|
||||
yield Message.Url, post["url"], post
|
||||
|
||||
def metadata(self, page):
|
||||
return {
|
||||
"board" : self.board,
|
||||
"thread" : self.thread,
|
||||
"title" : text.unescape(text.extr(
|
||||
page, 'property="og:title" content="', '"')),
|
||||
}
|
||||
|
||||
def posts(self, page):
|
||||
"""Build a list of all post objects"""
|
||||
return [self.parse(html) for html in text.extract_iter(
|
||||
page, 'id="pc', '</blockquote>')]
|
||||
|
||||
def parse(self, html):
|
||||
"""Build post object by extracting data from an HTML post"""
|
||||
post = self._extract_post(html)
|
||||
if ">File: <" in html:
|
||||
self._extract_file(html, post)
|
||||
post["extension"] = post["url"].rpartition(".")[2]
|
||||
return post
|
||||
|
||||
@staticmethod
|
||||
def _extract_post(html):
|
||||
extr = text.extract_from(html)
|
||||
return {
|
||||
"no" : text.parse_int(extr('', '"')),
|
||||
"name": extr('class="name">', '<'),
|
||||
"time": extr('class="dateTime postNum" >', '<').rstrip(),
|
||||
"com" : text.unescape(
|
||||
html[html.find('<blockquote'):].partition(">")[2]),
|
||||
}
|
||||
|
||||
@staticmethod
|
||||
def _extract_file(html, post):
|
||||
extr = text.extract_from(html, html.index(">File: <"))
|
||||
post["url"] = extr('href="', '"')
|
||||
post["filename"] = text.unquote(extr(">", "<").rpartition(".")[0])
|
||||
post["fsize"] = extr("(", ", ")
|
||||
post["w"] = text.parse_int(extr("", "x"))
|
||||
post["h"] = text.parse_int(extr("", ")"))
|
||||
|
||||
|
||||
class _4chanarchivesBoardExtractor(Extractor):
|
||||
"""Extractor for boards on 4chanarchives.com"""
|
||||
category = "4chanarchives"
|
||||
subcategory = "board"
|
||||
root = "https://4chanarchives.com"
|
||||
pattern = r"(?:https?://)?4chanarchives\.com/board/([^/?#]+)(?:/(\d+))?/?$"
|
||||
test = (
|
||||
("https://4chanarchives.com/board/c/", {
|
||||
"pattern": _4chanarchivesThreadExtractor.pattern,
|
||||
"range": "1-40",
|
||||
"count": 40,
|
||||
}),
|
||||
("https://4chanarchives.com/board/c"),
|
||||
("https://4chanarchives.com/board/c/10"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.board, self.page = match.groups()
|
||||
|
||||
def items(self):
|
||||
data = {"_extractor": _4chanarchivesThreadExtractor}
|
||||
pnum = text.parse_int(self.page, 1)
|
||||
needle = '''<span class="postNum desktop">
|
||||
<span><a href="'''
|
||||
|
||||
while True:
|
||||
url = "{}/board/{}/{}".format(self.root, self.board, pnum)
|
||||
page = self.request(url).text
|
||||
|
||||
thread = None
|
||||
for thread in text.extract_iter(page, needle, '"'):
|
||||
yield Message.Queue, thread, data
|
||||
|
||||
if thread is None:
|
||||
return
|
||||
pnum += 1
|
@ -21,10 +21,9 @@ class _500pxExtractor(Extractor):
filename_fmt = "{id}_{name}.{extension}"
archive_fmt = "{id}"
root = "https://500px.com"
cookiedomain = ".500px.com"
cookies_domain = ".500px.com"

def __init__(self, match):
Extractor.__init__(self, match)
def _init(self):
self.session.headers["Referer"] = self.root + "/"

def items(self):
@ -73,7 +72,7 @@ class _500pxExtractor(Extractor):
def _request_api(self, url, params):
headers = {
"Origin": self.root,
"x-csrf-token": self.session.cookies.get(
"x-csrf-token": self.cookies.get(
"x-csrf-token", domain=".500px.com"),
}
return self.request(url, headers=headers, params=params).json()
@ -81,7 +80,7 @@ class _500pxExtractor(Extractor):
def _request_graphql(self, opname, variables):
url = "https://api.500px.com/graphql"
headers = {
"x-csrf-token": self.session.cookies.get(
"x-csrf-token": self.cookies.get(
"x-csrf-token", domain=".500px.com"),
}
data = {
@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2022 Mike Fährmann
|
||||
# Copyright 2022-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -27,7 +27,7 @@ class _8chanExtractor(Extractor):
|
||||
Extractor.__init__(self, match)
|
||||
|
||||
@memcache()
|
||||
def _prepare_cookies(self):
|
||||
def cookies_prepare(self):
|
||||
# fetch captcha cookies
|
||||
# (necessary to download without getting interrupted)
|
||||
now = datetime.utcnow()
|
||||
@ -39,14 +39,14 @@ class _8chanExtractor(Extractor):
|
||||
# - remove 'expires' timestamp
|
||||
# - move 'captchaexpiration' value forward by 1 month)
|
||||
domain = self.root.rpartition("/")[2]
|
||||
for cookie in self.session.cookies:
|
||||
for cookie in self.cookies:
|
||||
if cookie.domain.endswith(domain):
|
||||
cookie.expires = None
|
||||
if cookie.name == "captchaexpiration":
|
||||
cookie.value = (now + timedelta(30, 300)).strftime(
|
||||
"%a, %d %b %Y %H:%M:%S GMT")
|
||||
|
||||
return self.session.cookies
|
||||
return self.cookies
|
||||
|
||||
|
||||
class _8chanThreadExtractor(_8chanExtractor):
|
||||
@ -113,7 +113,7 @@ class _8chanThreadExtractor(_8chanExtractor):
|
||||
thread["_http_headers"] = {"Referer": url + "html"}
|
||||
|
||||
try:
|
||||
self.session.cookies = self._prepare_cookies()
|
||||
self.cookies = self.cookies_prepare()
|
||||
except Exception as exc:
|
||||
self.log.debug("Failed to fetch captcha cookies: %s: %s",
|
||||
exc.__class__.__name__, exc, exc_info=True)
|
||||
@ -150,6 +150,8 @@ class _8chanBoardExtractor(_8chanExtractor):
|
||||
def __init__(self, match):
|
||||
_8chanExtractor.__init__(self, match)
|
||||
_, self.board, self.page = match.groups()
|
||||
|
||||
def _init(self):
|
||||
self.session.headers["Referer"] = self.root + "/"
|
||||
|
||||
def items(self):
|
||||
|
@ -35,8 +35,10 @@ class _8musesAlbumExtractor(Extractor):
"id" : 10467,
"title" : "Liar",
"path" : "Fakku Comics/mogg/Liar",
"parts" : ["Fakku Comics", "mogg", "Liar"],
"private": False,
"url" : str,
"url" : "https://comics.8muses.com/comics"
"/album/Fakku-Comics/mogg/Liar",
"parent" : 10464,
"views" : int,
"likes" : int,
@ -118,9 +120,10 @@ class _8musesAlbumExtractor(Extractor):
return {
"id" : album["id"],
"path" : album["path"],
"parts" : album["path"].split("/"),
"title" : album["name"],
"private": album["isPrivate"],
"url" : self.root + album["permalink"],
"url" : self.root + "/comics/album/" + album["permalink"],
"parent" : text.parse_int(album["parentId"]),
"views" : text.parse_int(album["numberViews"]),
"likes" : text.parse_int(album["numberLikes"]),
@ -14,8 +14,8 @@ modules = [
"2chen",
"35photo",
"3dbooru",
"420chan",
"4chan",
"4chanarchives",
"500px",
"8chan",
"8muses",
@ -24,7 +24,6 @@ modules = [
"artstation",
"aryion",
"bbc",
"bcy",
"behance",
"blogger",
"bunkr",
@ -74,14 +73,17 @@ modules = [
"instagram",
"issuu",
"itaku",
"itchio",
"jpgfish",
"jschan",
"kabeuchi",
"keenspot",
"kemonoparty",
"khinsider",
"komikcast",
"lensdump",
"lexica",
"lightroom",
"lineblog",
"livedoor",
"luscious",
"lynxchan",
@ -91,13 +93,12 @@ modules = [
"mangakakalot",
"manganelo",
"mangapark",
"mangaread",
"mangasee",
"mangoxo",
"mememuseum",
"misskey",
"myhentaigallery",
"myportfolio",
"nana",
"naver",
"naverwebtoon",
"newgrounds",
@ -133,6 +134,7 @@ modules = [
"seiga",
"senmanga",
"sexcom",
"shimmie2",
"simplyhentai",
"skeb",
"slickpic",
@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2018-2022 Mike Fährmann
|
||||
# Copyright 2018-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -27,12 +27,12 @@ class ArtstationExtractor(Extractor):
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.user = match.group(1) or match.group(2)
|
||||
self.external = self.config("external", False)
|
||||
|
||||
def items(self):
|
||||
data = self.metadata()
|
||||
|
||||
projects = self.projects()
|
||||
external = self.config("external", False)
|
||||
max_posts = self.config("max-posts")
|
||||
if max_posts:
|
||||
projects = itertools.islice(projects, max_posts)
|
||||
@ -45,7 +45,7 @@ class ArtstationExtractor(Extractor):
|
||||
asset["num"] = num
|
||||
yield Message.Directory, asset
|
||||
|
||||
if adict["has_embedded_player"] and self.external:
|
||||
if adict["has_embedded_player"] and external:
|
||||
player = adict["player_embedded"]
|
||||
url = (text.extr(player, 'src="', '"') or
|
||||
text.extr(player, "src='", "'"))
|
||||
|
@ -23,8 +23,8 @@ class AryionExtractor(Extractor):
|
||||
directory_fmt = ("{category}", "{user!l}", "{path:J - }")
|
||||
filename_fmt = "{id} {title}.{extension}"
|
||||
archive_fmt = "{id}"
|
||||
cookiedomain = ".aryion.com"
|
||||
cookienames = ("phpbb3_rl7a3_sid",)
|
||||
cookies_domain = ".aryion.com"
|
||||
cookies_names = ("phpbb3_rl7a3_sid",)
|
||||
root = "https://aryion.com"
|
||||
|
||||
def __init__(self, match):
|
||||
@ -33,11 +33,12 @@ class AryionExtractor(Extractor):
|
||||
self.recursive = True
|
||||
|
||||
def login(self):
|
||||
if self._check_cookies(self.cookienames):
|
||||
if self.cookies_check(self.cookies_names):
|
||||
return
|
||||
|
||||
username, password = self._get_auth_info()
|
||||
if username:
|
||||
self._update_cookies(self._login_impl(username, password))
|
||||
self.cookies_update(self._login_impl(username, password))
|
||||
|
||||
@cache(maxage=14*24*3600, keyarg=1)
|
||||
def _login_impl(self, username, password):
|
||||
@ -53,7 +54,7 @@ class AryionExtractor(Extractor):
|
||||
response = self.request(url, method="POST", data=data)
|
||||
if b"You have been successfully logged in." not in response.content:
|
||||
raise exception.AuthenticationError()
|
||||
return {c: response.cookies[c] for c in self.cookienames}
|
||||
return {c: response.cookies[c] for c in self.cookies_names}
|
||||
|
||||
def items(self):
|
||||
self.login()
|
||||
@ -188,9 +189,11 @@ class AryionGalleryExtractor(AryionExtractor):
|
||||
|
||||
def __init__(self, match):
|
||||
AryionExtractor.__init__(self, match)
|
||||
self.recursive = self.config("recursive", True)
|
||||
self.offset = 0
|
||||
|
||||
def _init(self):
|
||||
self.recursive = self.config("recursive", True)
|
||||
|
||||
def skip(self, num):
|
||||
if self.recursive:
|
||||
return 0
|
||||
@ -216,9 +219,11 @@ class AryionTagExtractor(AryionExtractor):
|
||||
"count": ">= 5",
|
||||
})
|
||||
|
||||
def metadata(self):
|
||||
def _init(self):
|
||||
self.params = text.parse_query(self.user)
|
||||
self.user = None
|
||||
|
||||
def metadata(self):
|
||||
return {"search_tags": self.params.get("tag")}
|
||||
|
||||
def posts(self):
|
||||
|
@ -1,206 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2020-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
# published by the Free Software Foundation.
|
||||
|
||||
"""Extractors for https://bcy.net/"""
|
||||
|
||||
from .common import Extractor, Message
|
||||
from .. import text, util, exception
|
||||
import re
|
||||
|
||||
|
||||
class BcyExtractor(Extractor):
|
||||
"""Base class for bcy extractors"""
|
||||
category = "bcy"
|
||||
directory_fmt = ("{category}", "{user[id]} {user[name]}")
|
||||
filename_fmt = "{post[id]} {id}.{extension}"
|
||||
archive_fmt = "{post[id]}_{id}"
|
||||
root = "https://bcy.net"
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.item_id = match.group(1)
|
||||
self.session.headers["Referer"] = self.root + "/"
|
||||
|
||||
def items(self):
|
||||
sub = re.compile(r"^https?://p\d+-bcy"
|
||||
r"(?:-sign\.bcyimg\.com|\.byteimg\.com/img)"
|
||||
r"/banciyuan").sub
|
||||
iroot = "https://img-bcy-qn.pstatp.com"
|
||||
noop = self.config("noop")
|
||||
|
||||
for post in self.posts():
|
||||
if not post["image_list"]:
|
||||
continue
|
||||
|
||||
multi = None
|
||||
tags = post.get("post_tags") or ()
|
||||
data = {
|
||||
"user": {
|
||||
"id" : post["uid"],
|
||||
"name" : post["uname"],
|
||||
"avatar" : sub(iroot, post["avatar"].partition("~")[0]),
|
||||
},
|
||||
"post": {
|
||||
"id" : text.parse_int(post["item_id"]),
|
||||
"tags" : [t["tag_name"] for t in tags],
|
||||
"date" : text.parse_timestamp(post["ctime"]),
|
||||
"parody" : post["work"],
|
||||
"content": post["plain"],
|
||||
"likes" : post["like_count"],
|
||||
"shares" : post["share_count"],
|
||||
"replies": post["reply_count"],
|
||||
},
|
||||
}
|
||||
|
||||
yield Message.Directory, data
|
||||
for data["num"], image in enumerate(post["image_list"], 1):
|
||||
data["id"] = image["mid"]
|
||||
data["width"] = image["w"]
|
||||
data["height"] = image["h"]
|
||||
|
||||
url = image["path"].partition("~")[0]
|
||||
text.nameext_from_url(url, data)
|
||||
|
||||
# full-resolution image without watermark
|
||||
if data["extension"]:
|
||||
if not url.startswith(iroot):
|
||||
url = sub(iroot, url)
|
||||
data["filter"] = ""
|
||||
yield Message.Url, url, data
|
||||
|
||||
# watermarked image & low quality noop filter
|
||||
else:
|
||||
if multi is None:
|
||||
multi = self._data_from_post(
|
||||
post["item_id"])["post_data"]["multi"]
|
||||
image = multi[data["num"] - 1]
|
||||
|
||||
if image["origin"]:
|
||||
data["filter"] = "watermark"
|
||||
yield Message.Url, image["origin"], data
|
||||
|
||||
if noop:
|
||||
data["extension"] = ""
|
||||
data["filter"] = "noop"
|
||||
yield Message.Url, image["original_path"], data
|
||||
|
||||
def posts(self):
|
||||
"""Returns an iterable with all relevant 'post' objects"""
|
||||
|
||||
def _data_from_post(self, post_id):
|
||||
url = "{}/item/detail/{}".format(self.root, post_id)
|
||||
page = self.request(url, notfound="post").text
|
||||
data = (text.extr(page, 'JSON.parse("', '");')
|
||||
.replace('\\\\u002F', '/')
|
||||
.replace('\\"', '"'))
|
||||
try:
|
||||
return util.json_loads(data)["detail"]
|
||||
except ValueError:
|
||||
return util.json_loads(data.replace('\\"', '"'))["detail"]
|
||||
|
||||
|
||||
class BcyUserExtractor(BcyExtractor):
|
||||
"""Extractor for user timelines"""
|
||||
subcategory = "user"
|
||||
pattern = r"(?:https?://)?bcy\.net/u/(\d+)"
|
||||
test = (
|
||||
("https://bcy.net/u/1933712", {
|
||||
"pattern": r"https://img-bcy-qn.pstatp.com/\w+/\d+/post/\w+/.+jpg",
|
||||
"count": ">= 20",
|
||||
}),
|
||||
("https://bcy.net/u/109282764041", {
|
||||
"pattern": r"https://p\d-bcy-sign\.bcyimg\.com/banciyuan/[0-9a-f]+"
|
||||
r"~tplv-bcyx-yuan-logo-v1:.+\.image",
|
||||
"range": "1-25",
|
||||
"count": 25,
|
||||
}),
|
||||
)
|
||||
|
||||
def posts(self):
|
||||
url = self.root + "/apiv3/user/selfPosts"
|
||||
params = {"uid": self.item_id, "since": None}
|
||||
|
||||
while True:
|
||||
data = self.request(url, params=params).json()
|
||||
|
||||
try:
|
||||
items = data["data"]["items"]
|
||||
except KeyError:
|
||||
return
|
||||
if not items:
|
||||
return
|
||||
|
||||
for item in items:
|
||||
yield item["item_detail"]
|
||||
params["since"] = item["since"]
|
||||
|
||||
|
||||
class BcyPostExtractor(BcyExtractor):
|
||||
"""Extractor for individual posts"""
|
||||
subcategory = "post"
|
||||
pattern = r"(?:https?://)?bcy\.net/item/detail/(\d+)"
|
||||
test = (
|
||||
("https://bcy.net/item/detail/6355835481002893070", {
|
||||
"url": "301202375e61fd6e0e2e35de6c3ac9f74885dec3",
|
||||
"count": 1,
|
||||
"keyword": {
|
||||
"user": {
|
||||
"id" : 1933712,
|
||||
"name" : "wukloo",
|
||||
"avatar" : "re:https://img-bcy-qn.pstatp.com/Public/",
|
||||
},
|
||||
"post": {
|
||||
"id" : 6355835481002893070,
|
||||
"tags" : list,
|
||||
"date" : "dt:2016-11-22 08:47:46",
|
||||
"parody" : "东方PROJECT",
|
||||
"content": "re:根据微博的建议稍微做了点修改",
|
||||
"likes" : int,
|
||||
"shares" : int,
|
||||
"replies": int,
|
||||
},
|
||||
"id": 8330182,
|
||||
"num": 1,
|
||||
"width" : 3000,
|
||||
"height": 1687,
|
||||
"filename": "712e0780b09011e696f973c3d1568337",
|
||||
"extension": "jpg",
|
||||
},
|
||||
}),
|
||||
# only watermarked images available
|
||||
("https://bcy.net/item/detail/6950136331708144648", {
|
||||
"pattern": r"https://p\d-bcy-sign\.bcyimg\.com/banciyuan/[0-9a-f]+"
|
||||
r"~tplv-bcyx-yuan-logo-v1:.+\.image",
|
||||
"count": 10,
|
||||
"keyword": {"filter": "watermark"},
|
||||
}),
|
||||
# deleted
|
||||
("https://bcy.net/item/detail/6780546160802143237", {
|
||||
"exception": exception.NotFoundError,
|
||||
"count": 0,
|
||||
}),
|
||||
# only visible to logged in users
|
||||
("https://bcy.net/item/detail/6747523535150783495", {
|
||||
"count": 0,
|
||||
}),
|
||||
# JSON decode error (#3321)
|
||||
("https://bcy.net/item/detail/7166939271872388110", {
|
||||
"count": 0,
|
||||
}),
|
||||
)
|
||||
|
||||
def posts(self):
|
||||
try:
|
||||
data = self._data_from_post(self.item_id)
|
||||
except KeyError:
|
||||
return ()
|
||||
post = data["post_data"]
|
||||
post["image_list"] = post["multi"]
|
||||
post["plain"] = text.parse_unicode_escapes(post["plain"])
|
||||
post.update(data["detail_user"])
|
||||
return (post,)
|
@ -81,10 +81,13 @@ class BehanceGalleryExtractor(BehanceExtractor):
|
||||
("https://www.behance.net/gallery/88276087/Audi-R8-RWD", {
|
||||
"count": 20,
|
||||
"url": "6bebff0d37f85349f9ad28bd8b76fd66627c1e2f",
|
||||
"pattern": r"https://mir-s3-cdn-cf\.behance\.net/project_modules"
|
||||
r"/source/[0-9a-f]+.[0-9a-f]+\.jpg"
|
||||
}),
|
||||
# 'video' modules (#1282)
|
||||
("https://www.behance.net/gallery/101185577/COLCCI", {
|
||||
"pattern": r"ytdl:https://cdn-prod-ccv\.adobe\.com/",
|
||||
"pattern": r"https://cdn-prod-ccv\.adobe\.com/\w+"
|
||||
r"/rend/\w+_720\.mp4\?",
|
||||
"count": 3,
|
||||
}),
|
||||
)
|
||||
@ -129,26 +132,35 @@ class BehanceGalleryExtractor(BehanceExtractor):
|
||||
append = result.append
|
||||
|
||||
for module in data["modules"]:
|
||||
mtype = module["type"]
|
||||
mtype = module["__typename"]
|
||||
|
||||
if mtype == "image":
|
||||
url = module["sizes"]["original"]
|
||||
if mtype == "ImageModule":
|
||||
url = module["imageSizes"]["size_original"]["url"]
|
||||
append((url, module))
|
||||
|
||||
elif mtype == "video":
|
||||
page = self.request(module["src"]).text
|
||||
url = text.extr(page, '<source src="', '"')
|
||||
if text.ext_from_url(url) == "m3u8":
|
||||
url = "ytdl:" + url
|
||||
elif mtype == "VideoModule":
|
||||
renditions = module["videoData"]["renditions"]
|
||||
try:
|
||||
url = [
|
||||
r["url"] for r in renditions
|
||||
if text.ext_from_url(r["url"]) != "m3u8"
|
||||
][-1]
|
||||
except Exception as exc:
|
||||
self.log.debug("%s: %s", exc.__class__.__name__, exc)
|
||||
url = "ytdl:" + renditions[-1]["url"]
|
||||
append((url, module))
|
||||
|
||||
elif mtype == "media_collection":
|
||||
elif mtype == "MediaCollectionModule":
|
||||
for component in module["components"]:
|
||||
url = component["sizes"]["source"]
|
||||
append((url, module))
|
||||
for size in component["imageSizes"].values():
|
||||
if size:
|
||||
parts = size["url"].split("/")
|
||||
parts[4] = "source"
|
||||
append(("/".join(parts), module))
|
||||
break
|
||||
|
||||
elif mtype == "embed":
|
||||
embed = module.get("original_embed") or module.get("embed")
|
||||
elif mtype == "EmbedModule":
|
||||
embed = module.get("originalEmbed") or module.get("fluidEmbed")
|
||||
if embed:
|
||||
append(("ytdl:" + text.extr(embed, 'src="', '"'), module))
|
||||
|
||||
|
@ -28,12 +28,13 @@ class BloggerExtractor(Extractor):

def __init__(self, match):
Extractor.__init__(self, match)
self.videos = self.config("videos", True)
self.blog = match.group(1) or match.group(2)

def _init(self):
self.api = BloggerAPI(self)
self.videos = self.config("videos", True)

def items(self):

blog = self.api.blog_by_url("http://" + self.blog)
blog["pages"] = blog["pages"]["totalItems"]
blog["posts"] = blog["posts"]["totalItems"]
@ -44,6 +45,7 @@ class BloggerExtractor(Extractor):
findall_image = re.compile(
r'src="(https?://(?:'
r'blogger\.googleusercontent\.com/img|'
r'lh\d+\.googleusercontent\.com/|'
r'\d+\.bp\.blogspot\.com)/[^"]+)').findall
findall_video = re.compile(
r'src="(https?://www\.blogger\.com/video\.g\?token=[^"]+)').findall
@@ -6,19 +6,19 @@
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://bunkr.la/"""
"""Extractors for https://bunkrr.su/"""

from .lolisafe import LolisafeAlbumExtractor
from .. import text


class BunkrAlbumExtractor(LolisafeAlbumExtractor):
"""Extractor for bunkr.la albums"""
"""Extractor for bunkrr.su albums"""
category = "bunkr"
root = "https://bunkr.la"
pattern = r"(?:https?://)?(?:app\.)?bunkr\.(?:la|[sr]u|is|to)/a/([^/?#]+)"
root = "https://bunkrr.su"
pattern = r"(?:https?://)?(?:app\.)?bunkr+\.(?:la|[sr]u|is|to)/a/([^/?#]+)"
test = (
("https://bunkr.la/a/Lktg9Keq", {
("https://bunkrr.su/a/Lktg9Keq", {
"pattern": r"https://cdn\.bunkr\.ru/test-テスト-\"&>-QjgneIQv\.png",
"content": "0c8768055e4e20e7c7259608b67799171b691140",
"keyword": {

@@ -52,6 +52,12 @@ class BunkrAlbumExtractor(LolisafeAlbumExtractor):
"num": int,
},
}),
# cdn12 .ru TLD (#4147)
("https://bunkrr.su/a/j1G29CnD", {
"pattern": r"https://(cdn12.bunkr.ru|media-files12.bunkr.la)/\w+",
"count": 8,
}),
("https://bunkrr.su/a/Lktg9Keq"),
("https://bunkr.la/a/Lktg9Keq"),
("https://bunkr.su/a/Lktg9Keq"),
("https://bunkr.ru/a/Lktg9Keq"),

@@ -70,7 +76,7 @@ class BunkrAlbumExtractor(LolisafeAlbumExtractor):
cdn = None
files = []
append = files.append
headers = {"Referer": self.root.replace("://", "://stream.", 1) + "/"}
headers = {"Referer": self.root + "/"}

pos = page.index('class="grid-images')
for url in text.extract_iter(page, '<a href="', '"', pos):

@@ -86,10 +92,12 @@ class BunkrAlbumExtractor(LolisafeAlbumExtractor):
url = text.unescape(url)
if url.endswith((".mp4", ".m4v", ".mov", ".webm", ".mkv", ".ts",
".zip", ".rar", ".7z")):
append({"file": url.replace("://cdn", "://media-files", 1),
"_http_headers": headers})
else:
append({"file": url})
if url.startswith("https://cdn12."):
url = ("https://media-files12.bunkr.la" +
url[url.find("/", 14):])
else:
url = url.replace("://cdn", "://media-files", 1)
append({"file": url, "_http_headers": headers})

return files, {
"album_id" : self.album_id,
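A stand-alone sketch of the cdn-to-media-files rewrite in the hunk above (issue #4147): cdn12 hosts move to the media-files12.bunkr.la domain, every other cdn host only swaps the "cdn" prefix. The function name is hypothetical; the URL shapes follow the test cases in this hunk.

def to_media_files_url(url):
    """Rewrite a bunkr cdn URL to its media-files counterpart (sketch)."""
    if url.startswith("https://cdn12."):
        # https://cdn12.bunkr.ru/<path> -> https://media-files12.bunkr.la/<path>
        return "https://media-files12.bunkr.la" + url[url.find("/", 14):]
    # https://cdnN.bunkr.ru/<path> -> https://media-filesN.bunkr.ru/<path>
    return url.replace("://cdn", "://media-files", 1)

assert to_media_files_url("https://cdn12.bunkr.ru/file.mp4") == \
    "https://media-files12.bunkr.la/file.mp4"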
@ -32,11 +32,10 @@ class Extractor():
|
||||
directory_fmt = ("{category}",)
|
||||
filename_fmt = "{filename}.{extension}"
|
||||
archive_fmt = ""
|
||||
cookiedomain = ""
|
||||
cookies_domain = ""
|
||||
browser = None
|
||||
root = ""
|
||||
test = None
|
||||
finalize = None
|
||||
request_interval = 0.0
|
||||
request_interval_min = 0.0
|
||||
request_timestamp = 0.0
|
||||
@ -45,32 +44,9 @@ class Extractor():
|
||||
def __init__(self, match):
|
||||
self.log = logging.getLogger(self.category)
|
||||
self.url = match.string
|
||||
|
||||
if self.basecategory:
|
||||
self.config = self._config_shared
|
||||
self.config_accumulate = self._config_shared_accumulate
|
||||
self._cfgpath = ("extractor", self.category, self.subcategory)
|
||||
self._parentdir = ""
|
||||
|
||||
self._write_pages = self.config("write-pages", False)
|
||||
self._retry_codes = self.config("retry-codes")
|
||||
self._retries = self.config("retries", 4)
|
||||
self._timeout = self.config("timeout", 30)
|
||||
self._verify = self.config("verify", True)
|
||||
self._proxies = util.build_proxy_map(self.config("proxy"), self.log)
|
||||
self._interval = util.build_duration_func(
|
||||
self.config("sleep-request", self.request_interval),
|
||||
self.request_interval_min,
|
||||
)
|
||||
|
||||
if self._retries < 0:
|
||||
self._retries = float("inf")
|
||||
if not self._retry_codes:
|
||||
self._retry_codes = ()
|
||||
|
||||
self._init_session()
|
||||
self._init_cookies()
|
||||
|
||||
@classmethod
|
||||
def from_url(cls, url):
|
||||
if isinstance(cls.pattern, str):
|
||||
@@ -79,8 +55,19 @@ class Extractor():
return cls(match) if match else None

def __iter__(self):
self.initialize()
return self.items()

def initialize(self):
self._init_options()
self._init_session()
self._init_cookies()
self._init()
self.initialize = util.noop

def finalize(self):
pass

def items(self):
yield Message.Version, 1
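The __iter__/initialize split above defers option, session, and cookie setup until an extractor is actually iterated, then replaces initialize with a no-op so repeated iteration does not redo the work. A minimal illustration of that pattern, independent of the Extractor class itself (class and attribute names here are made up for the example):

class Lazy:
    def initialize(self):
        print("expensive one-time setup")
        # shadow the method with an instance attribute so later calls do nothing
        self.initialize = lambda: None

    def __iter__(self):
        self.initialize()
        return iter(())

obj = Lazy()
list(obj)   # prints "expensive one-time setup"
list(obj)   # second iteration skips the setup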
@@ -90,23 +77,44 @@ class Extractor():
def config(self, key, default=None):
return config.interpolate(self._cfgpath, key, default)

def config_deprecated(self, key, deprecated, default=None,
sentinel=util.SENTINEL, history=set()):
value = self.config(deprecated, sentinel)
if value is not sentinel:
if deprecated not in history:
history.add(deprecated)
self.log.warning("'%s' is deprecated. Use '%s' instead.",
deprecated, key)
default = value

value = self.config(key, sentinel)
if value is not sentinel:
return value
return default

def config_accumulate(self, key):
return config.accumulate(self._cfgpath, key)
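config_deprecated lets an extractor keep honoring an old option name while warning about it once per process; a value under the new name wins when both are set. A hypothetical call, assuming an option was renamed from "sleep" to "sleep-request":

# hypothetical usage inside an extractor's _init():
# read "sleep-request", fall back to the deprecated "sleep" (warning once),
# and finally to a default of 0.0
interval = self.config_deprecated("sleep-request", "sleep", 0.0)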
def _config_shared(self, key, default=None):
|
||||
return config.interpolate_common(("extractor",), (
|
||||
(self.category, self.subcategory),
|
||||
(self.basecategory, self.subcategory),
|
||||
), key, default)
|
||||
return config.interpolate_common(
|
||||
("extractor",), self._cfgpath, key, default)
|
||||
|
||||
def _config_shared_accumulate(self, key):
|
||||
values = config.accumulate(self._cfgpath, key)
|
||||
conf = config.get(("extractor",), self.basecategory)
|
||||
if conf:
|
||||
values[:0] = config.accumulate((self.subcategory,), key, conf=conf)
|
||||
first = True
|
||||
extr = ("extractor",)
|
||||
|
||||
for path in self._cfgpath:
|
||||
if first:
|
||||
first = False
|
||||
values = config.accumulate(extr + path, key)
|
||||
else:
|
||||
conf = config.get(extr, path[0])
|
||||
if conf:
|
||||
values[:0] = config.accumulate(
|
||||
(self.subcategory,), key, conf=conf)
|
||||
return values
|
||||
|
||||
def request(self, url, *, method="GET", session=None,
|
||||
def request(self, url, method="GET", session=None,
|
||||
retries=None, retry_codes=None, encoding=None,
|
||||
fatal=True, notfound=None, **kwargs):
|
||||
if session is None:
|
||||
@ -180,7 +188,7 @@ class Extractor():
|
||||
|
||||
raise exception.HttpError(msg, response)
|
||||
|
||||
def wait(self, *, seconds=None, until=None, adjust=1.0,
|
||||
def wait(self, seconds=None, until=None, adjust=1.0,
|
||||
reason="rate limit reset"):
|
||||
now = time.time()
|
||||
|
||||
@ -230,6 +238,26 @@ class Extractor():
|
||||
|
||||
return username, password
|
||||
|
||||
def _init(self):
|
||||
pass
|
||||
|
||||
def _init_options(self):
|
||||
self._write_pages = self.config("write-pages", False)
|
||||
self._retry_codes = self.config("retry-codes")
|
||||
self._retries = self.config("retries", 4)
|
||||
self._timeout = self.config("timeout", 30)
|
||||
self._verify = self.config("verify", True)
|
||||
self._proxies = util.build_proxy_map(self.config("proxy"), self.log)
|
||||
self._interval = util.build_duration_func(
|
||||
self.config("sleep-request", self.request_interval),
|
||||
self.request_interval_min,
|
||||
)
|
||||
|
||||
if self._retries < 0:
|
||||
self._retries = float("inf")
|
||||
if not self._retry_codes:
|
||||
self._retry_codes = ()
|
||||
|
||||
def _init_session(self):
|
||||
self.session = session = requests.Session()
|
||||
headers = session.headers
|
||||
@ -271,7 +299,7 @@ class Extractor():
|
||||
useragent = self.config("user-agent")
|
||||
if useragent is None:
|
||||
useragent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; "
|
||||
"rv:102.0) Gecko/20100101 Firefox/102.0")
|
||||
"rv:115.0) Gecko/20100101 Firefox/115.0")
|
||||
elif useragent == "browser":
|
||||
useragent = _browser_useragent()
|
||||
headers["User-Agent"] = useragent
|
||||
@ -315,26 +343,26 @@ class Extractor():
|
||||
|
||||
def _init_cookies(self):
|
||||
"""Populate the session's cookiejar"""
|
||||
self._cookiefile = None
|
||||
self._cookiejar = self.session.cookies
|
||||
if self.cookiedomain is None:
|
||||
self.cookies = self.session.cookies
|
||||
self.cookies_file = None
|
||||
if self.cookies_domain is None:
|
||||
return
|
||||
|
||||
cookies = self.config("cookies")
|
||||
if cookies:
|
||||
if isinstance(cookies, dict):
|
||||
self._update_cookies_dict(cookies, self.cookiedomain)
|
||||
self.cookies_update_dict(cookies, self.cookies_domain)
|
||||
|
||||
elif isinstance(cookies, str):
|
||||
cookiefile = util.expand_path(cookies)
|
||||
path = util.expand_path(cookies)
|
||||
try:
|
||||
with open(cookiefile) as fp:
|
||||
util.cookiestxt_load(fp, self._cookiejar)
|
||||
with open(path) as fp:
|
||||
util.cookiestxt_load(fp, self.cookies)
|
||||
except Exception as exc:
|
||||
self.log.warning("cookies: %s", exc)
|
||||
else:
|
||||
self.log.debug("Loading cookies from '%s'", cookies)
|
||||
self._cookiefile = cookiefile
|
||||
self.cookies_file = path
|
||||
|
||||
elif isinstance(cookies, (list, tuple)):
|
||||
key = tuple(cookies)
|
||||
@ -342,7 +370,7 @@ class Extractor():
|
||||
|
||||
if cookiejar is None:
|
||||
from ..cookies import load_cookies
|
||||
cookiejar = self._cookiejar.__class__()
|
||||
cookiejar = self.cookies.__class__()
|
||||
try:
|
||||
load_cookies(cookiejar, cookies)
|
||||
except Exception as exc:
|
||||
@ -352,9 +380,9 @@ class Extractor():
|
||||
else:
|
||||
self.log.debug("Using cached cookies from %s", key)
|
||||
|
||||
setcookie = self._cookiejar.set_cookie
|
||||
set_cookie = self.cookies.set_cookie
|
||||
for cookie in cookiejar:
|
||||
setcookie(cookie)
|
||||
set_cookie(cookie)
|
||||
|
||||
else:
|
||||
self.log.warning(
|
||||
@ -362,46 +390,56 @@ class Extractor():
|
||||
"option, got '%s' (%s)",
|
||||
cookies.__class__.__name__, cookies)
|
||||
|
||||
def _store_cookies(self):
|
||||
"""Store the session's cookiejar in a cookies.txt file"""
|
||||
if self._cookiefile and self.config("cookies-update", True):
|
||||
try:
|
||||
with open(self._cookiefile, "w") as fp:
|
||||
util.cookiestxt_store(fp, self._cookiejar)
|
||||
except OSError as exc:
|
||||
self.log.warning("cookies: %s", exc)
|
||||
def cookies_store(self):
|
||||
"""Store the session's cookies in a cookies.txt file"""
|
||||
export = self.config("cookies-update", True)
|
||||
if not export:
|
||||
return
|
||||
|
||||
def _update_cookies(self, cookies, *, domain=""):
|
||||
if isinstance(export, str):
|
||||
path = util.expand_path(export)
|
||||
else:
|
||||
path = self.cookies_file
|
||||
if not path:
|
||||
return
|
||||
|
||||
try:
|
||||
with open(path, "w") as fp:
|
||||
util.cookiestxt_store(fp, self.cookies)
|
||||
except OSError as exc:
|
||||
self.log.warning("cookies: %s", exc)
|
||||
|
||||
def cookies_update(self, cookies, domain=""):
|
||||
"""Update the session's cookiejar with 'cookies'"""
|
||||
if isinstance(cookies, dict):
|
||||
self._update_cookies_dict(cookies, domain or self.cookiedomain)
|
||||
self.cookies_update_dict(cookies, domain or self.cookies_domain)
|
||||
else:
|
||||
setcookie = self._cookiejar.set_cookie
|
||||
set_cookie = self.cookies.set_cookie
|
||||
try:
|
||||
cookies = iter(cookies)
|
||||
except TypeError:
|
||||
setcookie(cookies)
|
||||
set_cookie(cookies)
|
||||
else:
|
||||
for cookie in cookies:
|
||||
setcookie(cookie)
|
||||
set_cookie(cookie)
|
||||
|
||||
def _update_cookies_dict(self, cookiedict, domain):
|
||||
def cookies_update_dict(self, cookiedict, domain):
|
||||
"""Update cookiejar with name-value pairs from a dict"""
|
||||
setcookie = self._cookiejar.set
|
||||
set_cookie = self.cookies.set
|
||||
for name, value in cookiedict.items():
|
||||
setcookie(name, value, domain=domain)
|
||||
set_cookie(name, value, domain=domain)
|
||||
|
||||
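cookies_update_dict is the dict branch of cookies_update: each name/value pair is set on the session's cookiejar for the given domain. A hypothetical call from a login routine; the cookie names and values are placeholders:

# hypothetical: store session cookies returned by a login helper
self.cookies_update_dict(
    {"sessionid": "abc123", "csrftoken": "xyz"},
    self.cookies_domain)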
def _check_cookies(self, cookienames, *, domain=None):
|
||||
"""Check if all 'cookienames' are in the session's cookiejar"""
|
||||
if not self._cookiejar:
|
||||
def cookies_check(self, cookies_names, domain=None):
|
||||
"""Check if all 'cookies_names' are in the session's cookiejar"""
|
||||
if not self.cookies:
|
||||
return False
|
||||
|
||||
if domain is None:
|
||||
domain = self.cookiedomain
|
||||
names = set(cookienames)
|
||||
domain = self.cookies_domain
|
||||
names = set(cookies_names)
|
||||
now = time.time()
|
||||
|
||||
for cookie in self._cookiejar:
|
||||
for cookie in self.cookies:
|
||||
if cookie.name in names and (
|
||||
not domain or cookie.domain == domain):
|
||||
|
||||
@ -425,9 +463,16 @@ class Extractor():
|
||||
return False
|
||||
|
||||
def _prepare_ddosguard_cookies(self):
|
||||
if not self._cookiejar.get("__ddg2", domain=self.cookiedomain):
|
||||
self._cookiejar.set(
|
||||
"__ddg2", util.generate_token(), domain=self.cookiedomain)
|
||||
if not self.cookies.get("__ddg2", domain=self.cookies_domain):
|
||||
self.cookies.set(
|
||||
"__ddg2", util.generate_token(), domain=self.cookies_domain)
|
||||
|
||||
def _cache(self, func, maxage, keyarg=None):
|
||||
# return cache.DatabaseCacheDecorator(func, maxage, keyarg)
|
||||
return cache.DatabaseCacheDecorator(func, keyarg, maxage)
|
||||
|
||||
def _cache_memory(self, func, maxage=None, keyarg=None):
|
||||
return cache.Memcache()
|
||||
|
||||
def _get_date_min_max(self, dmin=None, dmax=None):
|
||||
"""Retrieve and parse 'date-min' and 'date-max' config values"""
|
||||
@ -530,7 +575,13 @@ class GalleryExtractor(Extractor):
|
||||
|
||||
def items(self):
|
||||
self.login()
|
||||
page = self.request(self.gallery_url, notfound=self.subcategory).text
|
||||
|
||||
if self.gallery_url:
|
||||
page = self.request(
|
||||
self.gallery_url, notfound=self.subcategory).text
|
||||
else:
|
||||
page = None
|
||||
|
||||
data = self.metadata(page)
|
||||
imgs = self.images(page)
|
||||
|
||||
@ -623,6 +674,8 @@ class AsynchronousMixin():
|
||||
"""Run info extraction in a separate thread"""
|
||||
|
||||
def __iter__(self):
|
||||
self.initialize()
|
||||
|
||||
messages = queue.Queue(5)
|
||||
thread = threading.Thread(
|
||||
target=self.async_items,
|
||||
@ -774,8 +827,8 @@ _browser_cookies = {}
|
||||
|
||||
HTTP_HEADERS = {
|
||||
"firefox": (
|
||||
("User-Agent", "Mozilla/5.0 ({}; rv:102.0) "
|
||||
"Gecko/20100101 Firefox/102.0"),
|
||||
("User-Agent", "Mozilla/5.0 ({}; rv:115.0) "
|
||||
"Gecko/20100101 Firefox/115.0"),
|
||||
("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,"
|
||||
"image/avif,image/webp,*/*;q=0.8"),
|
||||
("Accept-Language", "en-US,en;q=0.5"),
|
||||
@ -866,13 +919,3 @@ if action:
|
||||
except Exception:
|
||||
pass
|
||||
del action
|
||||
|
||||
# Undo automatic pyOpenSSL injection by requests
|
||||
pyopenssl = config.get((), "pyopenssl", False)
|
||||
if not pyopenssl:
|
||||
try:
|
||||
from requests.packages.urllib3.contrib import pyopenssl # noqa
|
||||
pyopenssl.extract_from_urllib3()
|
||||
except ImportError:
|
||||
pass
|
||||
del pyopenssl
|
||||
|
@ -22,8 +22,7 @@ class DanbooruExtractor(BaseExtractor):
|
||||
per_page = 200
|
||||
request_interval = 1.0
|
||||
|
||||
def __init__(self, match):
|
||||
BaseExtractor.__init__(self, match)
|
||||
def _init(self):
|
||||
self.ugoira = self.config("ugoira", False)
|
||||
self.external = self.config("external", False)
|
||||
self.includes = False
|
||||
@ -70,6 +69,8 @@ class DanbooruExtractor(BaseExtractor):
|
||||
continue
|
||||
|
||||
text.nameext_from_url(url, post)
|
||||
post["date"] = text.parse_datetime(
|
||||
post["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z")
|
||||
|
||||
if post["extension"] == "zip":
|
||||
if self.ugoira:
|
||||
@ -92,42 +93,47 @@ class DanbooruExtractor(BaseExtractor):
|
||||
def posts(self):
|
||||
return ()
|
||||
|
||||
def _pagination(self, endpoint, params, pages=False):
|
||||
def _pagination(self, endpoint, params, prefix=None):
|
||||
url = self.root + endpoint
|
||||
params["limit"] = self.per_page
|
||||
params["page"] = self.page_start
|
||||
|
||||
first = True
|
||||
while True:
|
||||
posts = self.request(url, params=params).json()
|
||||
if "posts" in posts:
|
||||
if isinstance(posts, dict):
|
||||
posts = posts["posts"]
|
||||
|
||||
if self.includes and posts:
|
||||
if not pages and "only" not in params:
|
||||
params["page"] = "b{}".format(posts[0]["id"] + 1)
|
||||
params["only"] = self.includes
|
||||
data = {
|
||||
meta["id"]: meta
|
||||
for meta in self.request(url, params=params).json()
|
||||
}
|
||||
for post in posts:
|
||||
post.update(data[post["id"]])
|
||||
params["only"] = None
|
||||
if posts:
|
||||
if self.includes:
|
||||
params_meta = {
|
||||
"only" : self.includes,
|
||||
"limit": len(posts),
|
||||
"tags" : "id:" + ",".join(str(p["id"]) for p in posts),
|
||||
}
|
||||
data = {
|
||||
meta["id"]: meta
|
||||
for meta in self.request(
|
||||
url, params=params_meta).json()
|
||||
}
|
||||
for post in posts:
|
||||
post.update(data[post["id"]])
|
||||
|
||||
yield from posts
|
||||
if prefix == "a" and not first:
|
||||
posts.reverse()
|
||||
|
||||
yield from posts
|
||||
|
||||
if len(posts) < self.threshold:
|
||||
return
|
||||
|
||||
if pages:
|
||||
if prefix:
|
||||
params["page"] = "{}{}".format(prefix, posts[-1]["id"])
|
||||
elif params["page"]:
|
||||
params["page"] += 1
|
||||
else:
|
||||
for post in reversed(posts):
|
||||
if "id" in post:
|
||||
params["page"] = "b{}".format(post["id"])
|
||||
break
|
||||
else:
|
||||
return
|
||||
params["page"] = 2
|
||||
first = False
|
||||
|
||||
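The rewritten _pagination above walks results with Danbooru-style page cursors instead of plain page numbers: after the first batch, "b{id}" asks for posts before a given ID and "a{id}" for posts after it, which keeps paging stable while new posts arrive. A simplified sketch of the cursor loop; 'fetch' is a stand-in for the API request and is assumed to return a list of post dicts with an "id" key:

def paginate(fetch, per_page=200, prefix="b"):
    """Yield posts batch by batch using b{id}/a{id} cursors (sketch)."""
    params = {"limit": per_page, "page": None}
    while True:
        posts = fetch(params)
        if not posts:
            return
        yield from posts
        if len(posts) < per_page:
            return
        # continue from the last ID of this batch in the chosen direction
        params["page"] = "{}{}".format(prefix, posts[-1]["id"])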
def _ugoira_frames(self, post):
|
||||
data = self.request("{}/posts/{}.json?only=media_metadata".format(
|
||||
@ -153,7 +159,11 @@ BASE_PATTERN = DanbooruExtractor.update({
|
||||
"aibooru": {
|
||||
"root": None,
|
||||
"pattern": r"(?:safe.)?aibooru\.online",
|
||||
}
|
||||
},
|
||||
"booruvar": {
|
||||
"root": "https://booru.borvar.art",
|
||||
"pattern": r"booru\.borvar\.art",
|
||||
},
|
||||
})
|
||||
|
||||
|
||||
@ -181,7 +191,12 @@ class DanbooruTagExtractor(DanbooruExtractor):
|
||||
"count": 12,
|
||||
}),
|
||||
("https://aibooru.online/posts?tags=center_frills&z=1", {
|
||||
"pattern": r"https://aibooru\.online/data/original"
|
||||
"pattern": r"https://cdn\.aibooru\.online/original"
|
||||
r"/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{32}\.\w+",
|
||||
"count": ">= 3",
|
||||
}),
|
||||
("https://booru.borvar.art/posts?tags=chibi&z=1", {
|
||||
"pattern": r"https://booru\.borvar\.art/data/original"
|
||||
r"/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{32}\.\w+",
|
||||
"count": ">= 3",
|
||||
}),
|
||||
@ -200,7 +215,21 @@ class DanbooruTagExtractor(DanbooruExtractor):
|
||||
return {"search_tags": self.tags}
|
||||
|
||||
def posts(self):
|
||||
return self._pagination("/posts.json", {"tags": self.tags})
|
||||
prefix = "b"
|
||||
for tag in self.tags.split():
|
||||
if tag.startswith("order:"):
|
||||
if tag == "order:id" or tag == "order:id_asc":
|
||||
prefix = "a"
|
||||
elif tag == "order:id_desc":
|
||||
prefix = "b"
|
||||
else:
|
||||
prefix = None
|
||||
elif tag.startswith(
|
||||
("id:", "md5", "ordfav:", "ordfavgroup:", "ordpool:")):
|
||||
prefix = None
|
||||
break
|
||||
|
||||
return self._pagination("/posts.json", {"tags": self.tags}, prefix)
|
||||
|
||||
|
||||
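The new posts() above picks the cursor prefix from the search tags: the default "b" (descending IDs) is kept unless an order: tag asks for ascending IDs ("a") or for an ordering that cursors cannot serve (no prefix, plain page numbers). A condensed version of that decision as a stand-alone helper (the function name is made up for illustration):

def cursor_prefix(tags):
    """Pick a pagination cursor prefix for a Danbooru tag search (sketch)."""
    prefix = "b"
    for tag in tags.split():
        if tag.startswith("order:"):
            if tag in ("order:id", "order:id_asc"):
                prefix = "a"          # ascending IDs -> "after" cursor
            elif tag != "order:id_desc":
                prefix = None         # custom ordering -> page numbers
        elif tag.startswith(("id:", "md5", "ordfav:", "ordfavgroup:",
                             "ordpool:")):
            return None
    return prefix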
class DanbooruPoolExtractor(DanbooruExtractor):
|
||||
@ -217,6 +246,10 @@ class DanbooruPoolExtractor(DanbooruExtractor):
|
||||
"url": "902549ffcdb00fe033c3f63e12bc3cb95c5fd8d5",
|
||||
"count": 6,
|
||||
}),
|
||||
("https://booru.borvar.art/pools/2", {
|
||||
"url": "77fa3559a3fc919f72611f4e3dd0f919d19d3e0d",
|
||||
"count": 4,
|
||||
}),
|
||||
("https://aibooru.online/pools/1"),
|
||||
("https://danbooru.donmai.us/pool/show/7659"),
|
||||
)
|
||||
@ -234,7 +267,7 @@ class DanbooruPoolExtractor(DanbooruExtractor):
|
||||
|
||||
def posts(self):
|
||||
params = {"tags": "pool:" + self.pool_id}
|
||||
return self._pagination("/posts.json", params)
|
||||
return self._pagination("/posts.json", params, "b")
|
||||
|
||||
|
||||
class DanbooruPostExtractor(DanbooruExtractor):
|
||||
@ -245,6 +278,7 @@ class DanbooruPostExtractor(DanbooruExtractor):
|
||||
test = (
|
||||
("https://danbooru.donmai.us/posts/294929", {
|
||||
"content": "5e255713cbf0a8e0801dc423563c34d896bb9229",
|
||||
"keyword": {"date": "dt:2008-08-12 04:46:05"},
|
||||
}),
|
||||
("https://danbooru.donmai.us/posts/3613024", {
|
||||
"pattern": r"https?://.+\.zip$",
|
||||
@ -256,6 +290,9 @@ class DanbooruPostExtractor(DanbooruExtractor):
|
||||
("https://aibooru.online/posts/1", {
|
||||
"content": "54d548743cd67799a62c77cbae97cfa0fec1b7e9",
|
||||
}),
|
||||
("https://booru.borvar.art/posts/1487", {
|
||||
"content": "91273ac1ea413a12be468841e2b5804656a50bff",
|
||||
}),
|
||||
("https://danbooru.donmai.us/post/show/294929"),
|
||||
)
|
||||
|
||||
@ -287,6 +324,7 @@ class DanbooruPopularExtractor(DanbooruExtractor):
|
||||
}),
|
||||
("https://booru.allthefallen.moe/explore/posts/popular"),
|
||||
("https://aibooru.online/explore/posts/popular"),
|
||||
("https://booru.borvar.art/explore/posts/popular"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
@ -307,7 +345,4 @@ class DanbooruPopularExtractor(DanbooruExtractor):
|
||||
return {"date": date, "scale": scale}
|
||||
|
||||
def posts(self):
|
||||
if self.page_start is None:
|
||||
self.page_start = 1
|
||||
return self._pagination(
|
||||
"/explore/posts/popular.json", self.params, True)
|
||||
return self._pagination("/explore/posts/popular.json", self.params)
|
||||
|
@ -32,20 +32,24 @@ class DeviantartExtractor(Extractor):
|
||||
root = "https://www.deviantart.com"
|
||||
directory_fmt = ("{category}", "{username}")
|
||||
filename_fmt = "{category}_{index}_{title}.{extension}"
|
||||
cookiedomain = None
|
||||
cookienames = ("auth", "auth_secure", "userinfo")
|
||||
cookies_domain = None
|
||||
cookies_names = ("auth", "auth_secure", "userinfo")
|
||||
_last_request = 0
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.user = match.group(1) or match.group(2)
|
||||
|
||||
def _init(self):
|
||||
self.flat = self.config("flat", True)
|
||||
self.extra = self.config("extra", False)
|
||||
self.original = self.config("original", True)
|
||||
self.comments = self.config("comments", False)
|
||||
self.user = match.group(1) or match.group(2)
|
||||
|
||||
self.api = DeviantartOAuthAPI(self)
|
||||
self.group = False
|
||||
self.offset = 0
|
||||
self.api = None
|
||||
self._premium_cache = {}
|
||||
|
||||
unwatch = self.config("auto-unwatch")
|
||||
if unwatch:
|
||||
@ -60,27 +64,28 @@ class DeviantartExtractor(Extractor):
|
||||
self._update_content = self._update_content_image
|
||||
self.original = True
|
||||
|
||||
self._premium_cache = {}
|
||||
self.commit_journal = {
|
||||
"html": self._commit_journal_html,
|
||||
"text": self._commit_journal_text,
|
||||
}.get(self.config("journals", "html"))
|
||||
journals = self.config("journals", "html")
|
||||
if journals == "html":
|
||||
self.commit_journal = self._commit_journal_html
|
||||
elif journals == "text":
|
||||
self.commit_journal = self._commit_journal_text
|
||||
else:
|
||||
self.commit_journal = None
|
||||
|
||||
def skip(self, num):
|
||||
self.offset += num
|
||||
return num
|
||||
|
||||
def login(self):
|
||||
if not self._check_cookies(self.cookienames):
|
||||
username, password = self._get_auth_info()
|
||||
if not username:
|
||||
return False
|
||||
self._update_cookies(_login_impl(self, username, password))
|
||||
return True
|
||||
if self.cookies_check(self.cookies_names):
|
||||
return True
|
||||
|
||||
username, password = self._get_auth_info()
|
||||
if username:
|
||||
self.cookies_update(_login_impl(self, username, password))
|
||||
return True
|
||||
|
||||
def items(self):
|
||||
self.api = DeviantartOAuthAPI(self)
|
||||
|
||||
if self.user and self.config("group", True):
|
||||
profile = self.api.user_profile(self.user)
|
||||
self.group = not profile
|
||||
@ -448,6 +453,9 @@ class DeviantartUserExtractor(DeviantartExtractor):
|
||||
("https://shimoda7.deviantart.com/"),
|
||||
)
|
||||
|
||||
def initialize(self):
|
||||
pass
|
||||
|
||||
def items(self):
|
||||
base = "{}/{}/".format(self.root, self.user)
|
||||
return self._dispatch_extractors((
|
||||
@ -1105,11 +1113,14 @@ class DeviantartDeviationExtractor(DeviantartExtractor):
|
||||
match.group(4) or match.group(5) or id_from_base36(match.group(6))
|
||||
|
||||
def deviations(self):
|
||||
url = "{}/{}/{}/{}".format(
|
||||
self.root, self.user or "u", self.type or "art", self.deviation_id)
|
||||
if self.user:
|
||||
url = "{}/{}/{}/{}".format(
|
||||
self.root, self.user, self.type or "art", self.deviation_id)
|
||||
else:
|
||||
url = "{}/view/{}/".format(self.root, self.deviation_id)
|
||||
|
||||
uuid = text.extract(self._limited_request(url).text,
|
||||
'"deviationUuid\\":\\"', '\\')[0]
|
||||
uuid = text.extr(self._limited_request(url).text,
|
||||
'"deviationUuid\\":\\"', '\\')
|
||||
if not uuid:
|
||||
raise exception.NotFoundError("deviation")
|
||||
return (self.api.deviation(uuid),)
|
||||
@ -1120,7 +1131,7 @@ class DeviantartScrapsExtractor(DeviantartExtractor):
|
||||
subcategory = "scraps"
|
||||
directory_fmt = ("{category}", "{username}", "Scraps")
|
||||
archive_fmt = "s_{_username}_{index}.{extension}"
|
||||
cookiedomain = ".deviantart.com"
|
||||
cookies_domain = ".deviantart.com"
|
||||
pattern = BASE_PATTERN + r"/gallery/(?:\?catpath=)?scraps\b"
|
||||
test = (
|
||||
("https://www.deviantart.com/shimoda7/gallery/scraps", {
|
||||
@ -1143,7 +1154,7 @@ class DeviantartSearchExtractor(DeviantartExtractor):
|
||||
subcategory = "search"
|
||||
directory_fmt = ("{category}", "Search", "{search_tags}")
|
||||
archive_fmt = "Q_{search_tags}_{index}.{extension}"
|
||||
cookiedomain = ".deviantart.com"
|
||||
cookies_domain = ".deviantart.com"
|
||||
pattern = (r"(?:https?://)?www\.deviantart\.com"
|
||||
r"/search(?:/deviations)?/?\?([^#]+)")
|
||||
test = (
|
||||
@ -1202,7 +1213,7 @@ class DeviantartGallerySearchExtractor(DeviantartExtractor):
|
||||
"""Extractor for deviantart gallery searches"""
|
||||
subcategory = "gallery-search"
|
||||
archive_fmt = "g_{_username}_{index}.{extension}"
|
||||
cookiedomain = ".deviantart.com"
|
||||
cookies_domain = ".deviantart.com"
|
||||
pattern = BASE_PATTERN + r"/gallery/?\?(q=[^#]+)"
|
||||
test = (
|
||||
("https://www.deviantart.com/shimoda7/gallery?q=memory", {
|
||||
@ -1417,7 +1428,14 @@ class DeviantartOAuthAPI():
|
||||
"""Get the original file download (if allowed)"""
|
||||
endpoint = "/deviation/download/" + deviation_id
|
||||
params = {"mature_content": self.mature}
|
||||
return self._call(endpoint, params=params, public=public)
|
||||
|
||||
try:
|
||||
return self._call(
|
||||
endpoint, params=params, public=public, log=False)
|
||||
except Exception:
|
||||
if not self.refresh_token_key:
|
||||
raise
|
||||
return self._call(endpoint, params=params, public=False)
|
||||
|
||||
def deviation_metadata(self, deviations):
|
||||
""" Fetch deviation metadata for a set of deviations"""
|
||||
@ -1518,7 +1536,7 @@ class DeviantartOAuthAPI():
|
||||
refresh_token_key, data["refresh_token"])
|
||||
return "Bearer " + data["access_token"]
|
||||
|
||||
def _call(self, endpoint, fatal=True, public=None, **kwargs):
|
||||
def _call(self, endpoint, fatal=True, log=True, public=None, **kwargs):
|
||||
"""Call an API endpoint"""
|
||||
url = "https://www.deviantart.com/api/v1/oauth2" + endpoint
|
||||
kwargs["fatal"] = None
|
||||
@ -1563,7 +1581,8 @@ class DeviantartOAuthAPI():
|
||||
"cs/configuration.rst#extractordeviantartclient-id"
|
||||
"--client-secret")
|
||||
else:
|
||||
self.log.error(msg)
|
||||
if log:
|
||||
self.log.error(msg)
|
||||
return data
|
||||
|
||||
def _pagination(self, endpoint, params,
|
||||
@ -1571,15 +1590,14 @@ class DeviantartOAuthAPI():
|
||||
warn = True
|
||||
if public is None:
|
||||
public = self.public
|
||||
elif not public:
|
||||
self.public = False
|
||||
|
||||
while True:
|
||||
data = self._call(endpoint, params=params, public=public)
|
||||
if key not in data:
|
||||
try:
|
||||
results = data[key]
|
||||
except KeyError:
|
||||
self.log.error("Unexpected API response: %s", data)
|
||||
return
|
||||
results = data[key]
|
||||
|
||||
if unpack:
|
||||
results = [item["journal"] for item in results
|
||||
@ -1588,7 +1606,7 @@ class DeviantartOAuthAPI():
|
||||
if public and len(results) < params["limit"]:
|
||||
if self.refresh_token_key:
|
||||
self.log.debug("Switching to private access token")
|
||||
self.public = public = False
|
||||
public = False
|
||||
continue
|
||||
elif data["has_more"] and warn:
|
||||
warn = False
|
||||
@ -1859,7 +1877,7 @@ def _login_impl(extr, username, password):
|
||||
|
||||
return {
|
||||
cookie.name: cookie.value
|
||||
for cookie in extr.session.cookies
|
||||
for cookie in extr.cookies
|
||||
}
|
||||
|
||||
|
||||
|
@ -57,6 +57,8 @@ class E621Extractor(danbooru.DanbooruExtractor):
|
||||
|
||||
post["filename"] = file["md5"]
|
||||
post["extension"] = file["ext"]
|
||||
post["date"] = text.parse_datetime(
|
||||
post["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z")
|
||||
|
||||
post.update(data)
|
||||
yield Message.Directory, post
|
||||
@ -72,6 +74,10 @@ BASE_PATTERN = E621Extractor.update({
|
||||
"root": "https://e926.net",
|
||||
"pattern": r"e926\.net",
|
||||
},
|
||||
"e6ai": {
|
||||
"root": "https://e6ai.net",
|
||||
"pattern": r"e6ai\.net",
|
||||
},
|
||||
})
|
||||
|
||||
|
||||
@ -92,6 +98,10 @@ class E621TagExtractor(E621Extractor, danbooru.DanbooruTagExtractor):
|
||||
}),
|
||||
("https://e926.net/post/index/1/anry"),
|
||||
("https://e926.net/post?tags=anry"),
|
||||
|
||||
("https://e6ai.net/posts?tags=anry"),
|
||||
("https://e6ai.net/post/index/1/anry"),
|
||||
("https://e6ai.net/post?tags=anry"),
|
||||
)
|
||||
|
||||
|
||||
@ -110,6 +120,11 @@ class E621PoolExtractor(E621Extractor, danbooru.DanbooruPoolExtractor):
|
||||
"content": "91abe5d5334425d9787811d7f06d34c77974cd22",
|
||||
}),
|
||||
("https://e926.net/pool/show/73"),
|
||||
|
||||
("https://e6ai.net/pools/3", {
|
||||
"url": "a6d1ad67a3fa9b9f73731d34d5f6f26f7e85855f",
|
||||
}),
|
||||
("https://e6ai.net/pool/show/3"),
|
||||
)
|
||||
|
||||
def posts(self):
|
||||
@ -140,6 +155,7 @@ class E621PostExtractor(E621Extractor, danbooru.DanbooruPostExtractor):
|
||||
("https://e621.net/posts/535", {
|
||||
"url": "f7f78b44c9b88f8f09caac080adc8d6d9fdaa529",
|
||||
"content": "66f46e96a893fba8e694c4e049b23c2acc9af462",
|
||||
"keyword": {"date": "dt:2007-02-17 19:02:32"},
|
||||
}),
|
||||
("https://e621.net/posts/3181052", {
|
||||
"options": (("metadata", "notes,pools"),),
|
||||
@ -189,6 +205,12 @@ class E621PostExtractor(E621Extractor, danbooru.DanbooruPostExtractor):
|
||||
"content": "66f46e96a893fba8e694c4e049b23c2acc9af462",
|
||||
}),
|
||||
("https://e926.net/post/show/535"),
|
||||
|
||||
("https://e6ai.net/posts/23", {
|
||||
"url": "3c85a806b3d9eec861948af421fe0e8ad6b8f881",
|
||||
"content": "a05a484e4eb64637d56d751c02e659b4bc8ea5d5",
|
||||
}),
|
||||
("https://e6ai.net/post/show/23"),
|
||||
)
|
||||
|
||||
def posts(self):
|
||||
@ -213,12 +235,12 @@ class E621PopularExtractor(E621Extractor, danbooru.DanbooruPopularExtractor):
|
||||
"pattern": r"https://static\d.e926.net/data/../../[0-9a-f]+",
|
||||
"count": ">= 70",
|
||||
}),
|
||||
|
||||
("https://e6ai.net/explore/posts/popular"),
|
||||
)
|
||||
|
||||
def posts(self):
|
||||
if self.page_start is None:
|
||||
self.page_start = 1
|
||||
return self._pagination("/popular.json", self.params, True)
|
||||
return self._pagination("/popular.json", self.params)
|
||||
|
||||
|
||||
class E621FavoriteExtractor(E621Extractor):
|
||||
@ -239,6 +261,8 @@ class E621FavoriteExtractor(E621Extractor):
|
||||
"pattern": r"https://static\d.e926.net/data/../../[0-9a-f]+",
|
||||
"count": "> 260",
|
||||
}),
|
||||
|
||||
("https://e6ai.net/favorites"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
@ -249,6 +273,4 @@ class E621FavoriteExtractor(E621Extractor):
|
||||
return {"user_id": self.query.get("user_id", "")}
|
||||
|
||||
def posts(self):
|
||||
if self.page_start is None:
|
||||
self.page_start = 1
|
||||
return self._pagination("/favorites.json", self.query, True)
|
||||
return self._pagination("/favorites.json", self.query)
|
||||
|
@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-

# Copyright 2021-2022 Mike Fährmann
# Copyright 2021-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as

@@ -65,7 +65,7 @@ class EromeExtractor(Extractor):
def request(self, url, **kwargs):
if self.__cookies:
self.__cookies = False
self.session.cookies.update(_cookie_cache())
self.cookies.update(_cookie_cache())

for _ in range(5):
response = Extractor.request(self, url, **kwargs)

@@ -80,7 +80,7 @@ class EromeExtractor(Extractor):
for params["page"] in itertools.count(1):
page = self.request(url, params=params).text

album_ids = EromeAlbumExtractor.pattern.findall(page)
album_ids = EromeAlbumExtractor.pattern.findall(page)[::2]
yield from album_ids

if len(album_ids) < 36:
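The [::2] slice above drops the duplicate album IDs that occur because each album is matched twice on a listing page. For comparison, an order-preserving dedup that does not depend on the duplicates being exact adjacent pairs; the sample IDs are placeholders:

album_ids = ["a1", "a1", "b2", "b2", "c3", "c3"]

every_other = album_ids[::2]                   # relies on exact pairing
unique      = list(dict.fromkeys(album_ids))   # order-preserving dedup

assert every_other == unique == ["a1", "b2", "c3"]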
@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2014-2022 Mike Fährmann
|
||||
# Copyright 2014-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -21,28 +21,31 @@ class ExhentaiExtractor(Extractor):
|
||||
"""Base class for exhentai extractors"""
|
||||
category = "exhentai"
|
||||
directory_fmt = ("{category}", "{gid} {title[:247]}")
|
||||
filename_fmt = (
|
||||
"{gid}_{num:>04}_{image_token}_{filename}.{extension}")
|
||||
filename_fmt = "{gid}_{num:>04}_{image_token}_{filename}.{extension}"
|
||||
archive_fmt = "{gid}_{num}"
|
||||
cookienames = ("ipb_member_id", "ipb_pass_hash")
|
||||
cookiedomain = ".exhentai.org"
|
||||
cookies_domain = ".exhentai.org"
|
||||
cookies_names = ("ipb_member_id", "ipb_pass_hash")
|
||||
root = "https://exhentai.org"
|
||||
request_interval = 5.0
|
||||
|
||||
LIMIT = False
|
||||
|
||||
def __init__(self, match):
|
||||
# allow calling 'self.config()' before 'Extractor.__init__()'
|
||||
self._cfgpath = ("extractor", self.category, self.subcategory)
|
||||
Extractor.__init__(self, match)
|
||||
self.version = match.group(1)
|
||||
|
||||
version = match.group(1)
|
||||
def initialize(self):
|
||||
domain = self.config("domain", "auto")
|
||||
if domain == "auto":
|
||||
domain = ("ex" if version == "ex" else "e-") + "hentai.org"
|
||||
domain = ("ex" if self.version == "ex" else "e-") + "hentai.org"
|
||||
self.root = "https://" + domain
|
||||
self.cookiedomain = "." + domain
|
||||
self.cookies_domain = "." + domain
|
||||
|
||||
Extractor.__init__(self, match)
|
||||
Extractor.initialize(self)
|
||||
|
||||
if self.version != "ex":
|
||||
self.cookies.set("nw", "1", domain=self.cookies_domain)
|
||||
self.session.headers["Referer"] = self.root + "/"
|
||||
self.original = self.config("original", True)
|
||||
|
||||
limits = self.config("limits", False)
|
||||
@ -52,14 +55,10 @@ class ExhentaiExtractor(Extractor):
|
||||
else:
|
||||
self.limits = False
|
||||
|
||||
self.session.headers["Referer"] = self.root + "/"
|
||||
if version != "ex":
|
||||
self.session.cookies.set("nw", "1", domain=self.cookiedomain)
|
||||
|
||||
def request(self, *args, **kwargs):
|
||||
response = Extractor.request(self, *args, **kwargs)
|
||||
if self._is_sadpanda(response):
|
||||
self.log.info("sadpanda.jpg")
|
||||
def request(self, url, **kwargs):
|
||||
response = Extractor.request(self, url, **kwargs)
|
||||
if response.history and response.headers.get("Content-Length") == "0":
|
||||
self.log.info("blank page")
|
||||
raise exception.AuthorizationError()
|
||||
return response
|
||||
|
||||
@ -67,17 +66,20 @@ class ExhentaiExtractor(Extractor):
|
||||
"""Login and set necessary cookies"""
|
||||
if self.LIMIT:
|
||||
raise exception.StopExtraction("Image limit reached!")
|
||||
if self._check_cookies(self.cookienames):
|
||||
|
||||
if self.cookies_check(self.cookies_names):
|
||||
return
|
||||
|
||||
username, password = self._get_auth_info()
|
||||
if username:
|
||||
self._update_cookies(self._login_impl(username, password))
|
||||
else:
|
||||
self.log.info("no username given; using e-hentai.org")
|
||||
self.root = "https://e-hentai.org"
|
||||
self.original = False
|
||||
self.limits = False
|
||||
self.session.cookies["nw"] = "1"
|
||||
return self.cookies_update(self._login_impl(username, password))
|
||||
|
||||
self.log.info("no username given; using e-hentai.org")
|
||||
self.root = "https://e-hentai.org"
|
||||
self.cookies_domain = ".e-hentai.org"
|
||||
self.cookies.set("nw", "1", domain=self.cookies_domain)
|
||||
self.original = False
|
||||
self.limits = False
|
||||
|
||||
@cache(maxage=90*24*3600, keyarg=1)
|
||||
def _login_impl(self, username, password):
|
||||
@ -98,15 +100,7 @@ class ExhentaiExtractor(Extractor):
|
||||
response = self.request(url, method="POST", headers=headers, data=data)
|
||||
if b"You are now logged in as:" not in response.content:
|
||||
raise exception.AuthenticationError()
|
||||
return {c: response.cookies[c] for c in self.cookienames}
|
||||
|
||||
@staticmethod
|
||||
def _is_sadpanda(response):
|
||||
"""Return True if the response object contains a sad panda"""
|
||||
return (
|
||||
response.headers.get("Content-Length") == "9615" and
|
||||
"sadpanda.jpg" in response.headers.get("Content-Disposition", "")
|
||||
)
|
||||
return {c: response.cookies[c] for c in self.cookies_names}
|
||||
|
||||
|
||||
class ExhentaiGalleryExtractor(ExhentaiExtractor):
|
||||
@ -180,6 +174,7 @@ class ExhentaiGalleryExtractor(ExhentaiExtractor):
|
||||
self.image_token = match.group(4)
|
||||
self.image_num = text.parse_int(match.group(6), 1)
|
||||
|
||||
def _init(self):
|
||||
source = self.config("source")
|
||||
if source == "hitomi":
|
||||
self.items = self._items_hitomi
|
||||
@ -399,8 +394,9 @@ class ExhentaiGalleryExtractor(ExhentaiExtractor):
|
||||
url = "https://e-hentai.org/home.php"
|
||||
cookies = {
|
||||
cookie.name: cookie.value
|
||||
for cookie in self.session.cookies
|
||||
if cookie.domain == self.cookiedomain and cookie.name != "igneous"
|
||||
for cookie in self.cookies
|
||||
if cookie.domain == self.cookies_domain and
|
||||
cookie.name != "igneous"
|
||||
}
|
||||
|
||||
page = self.request(url, cookies=cookies).text
|
||||
|
@ -6,9 +6,9 @@
|
||||
|
||||
"""Extractors for https://www.fanbox.cc/"""
|
||||
|
||||
import re
|
||||
from .common import Extractor, Message
|
||||
from .. import text
|
||||
import re
|
||||
|
||||
|
||||
BASE_PATTERN = (
|
||||
@ -27,14 +27,12 @@ class FanboxExtractor(Extractor):
|
||||
archive_fmt = "{id}_{num}"
|
||||
_warning = True
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
def _init(self):
|
||||
self.embeds = self.config("embeds", True)
|
||||
|
||||
def items(self):
|
||||
|
||||
if self._warning:
|
||||
if not self._check_cookies(("FANBOXSESSID",)):
|
||||
if not self.cookies_check(("FANBOXSESSID",)):
|
||||
self.log.warning("no 'FANBOXSESSID' cookie set")
|
||||
FanboxExtractor._warning = False
|
||||
|
||||
@ -52,8 +50,11 @@ class FanboxExtractor(Extractor):
|
||||
url = text.ensure_http_scheme(url)
|
||||
body = self.request(url, headers=headers).json()["body"]
|
||||
for item in body["items"]:
|
||||
yield self._get_post_data(item["id"])
|
||||
|
||||
try:
|
||||
yield self._get_post_data(item["id"])
|
||||
except Exception as exc:
|
||||
self.log.warning("Skipping post %s (%s: %s)",
|
||||
item["id"], exc.__class__.__name__, exc)
|
||||
url = body["nextUrl"]
|
||||
|
||||
def _get_post_data(self, post_id):
|
||||
@ -211,9 +212,15 @@ class FanboxExtractor(Extractor):
|
||||
# to a proper Fanbox URL
|
||||
url = "https://www.pixiv.net/fanbox/"+content_id
|
||||
# resolve redirect
|
||||
response = self.request(url, method="HEAD", allow_redirects=False)
|
||||
url = response.headers["Location"]
|
||||
final_post["_extractor"] = FanboxPostExtractor
|
||||
try:
|
||||
url = self.request(url, method="HEAD",
|
||||
allow_redirects=False).headers["location"]
|
||||
except Exception as exc:
|
||||
url = None
|
||||
self.log.warning("Unable to extract fanbox embed %s (%s: %s)",
|
||||
content_id, exc.__class__.__name__, exc)
|
||||
else:
|
||||
final_post["_extractor"] = FanboxPostExtractor
|
||||
elif provider == "twitter":
|
||||
url = "https://twitter.com/_/status/"+content_id
|
||||
elif provider == "google_forms":
|
||||
|
@ -23,30 +23,54 @@ class FantiaExtractor(Extractor):
|
||||
self.headers = {
|
||||
"Accept" : "application/json, text/plain, */*",
|
||||
"Referer": self.root,
|
||||
"X-Requested-With": "XMLHttpRequest",
|
||||
}
|
||||
_empty_plan = {
|
||||
"id" : 0,
|
||||
"price": 0,
|
||||
"limit": 0,
|
||||
"name" : "",
|
||||
"description": "",
|
||||
"thumb": self.root + "/images/fallback/plan/thumb_default.png",
|
||||
}
|
||||
|
||||
if self._warning:
|
||||
if not self._check_cookies(("_session_id",)):
|
||||
if not self.cookies_check(("_session_id",)):
|
||||
self.log.warning("no '_session_id' cookie set")
|
||||
FantiaExtractor._warning = False
|
||||
|
||||
for post_id in self.posts():
|
||||
full_response, post = self._get_post_data(post_id)
|
||||
yield Message.Directory, post
|
||||
post = self._get_post_data(post_id)
|
||||
post["num"] = 0
|
||||
for url, url_data in self._get_urls_from_post(full_response, post):
|
||||
post["num"] += 1
|
||||
fname = url_data["content_filename"] or url
|
||||
text.nameext_from_url(fname, url_data)
|
||||
url_data["file_url"] = url
|
||||
yield Message.Url, url, url_data
|
||||
|
||||
for content in self._get_post_contents(post):
|
||||
post["content_category"] = content["category"]
|
||||
post["content_title"] = content["title"]
|
||||
post["content_filename"] = content.get("filename", "")
|
||||
post["content_id"] = content["id"]
|
||||
post["plan"] = content["plan"] or _empty_plan
|
||||
yield Message.Directory, post
|
||||
|
||||
if content["visible_status"] != "visible":
|
||||
self.log.warning(
|
||||
"Unable to download '%s' files from "
|
||||
"%s#post-content-id-%s", content["visible_status"],
|
||||
post["post_url"], content["id"])
|
||||
|
||||
for url in self._get_content_urls(post, content):
|
||||
text.nameext_from_url(
|
||||
post["content_filename"] or url, post)
|
||||
post["file_url"] = url
|
||||
post["num"] += 1
|
||||
yield Message.Url, url, post
|
||||
|
||||
def posts(self):
|
||||
"""Return post IDs"""
|
||||
|
||||
def _pagination(self, url):
|
||||
params = {"page": 1}
|
||||
headers = self.headers
|
||||
headers = self.headers.copy()
|
||||
del headers["X-Requested-With"]
|
||||
|
||||
while True:
|
||||
page = self.request(url, params=params, headers=headers).text
|
||||
@ -71,7 +95,7 @@ class FantiaExtractor(Extractor):
|
||||
"""Fetch and process post data"""
|
||||
url = self.root+"/api/v1/posts/"+post_id
|
||||
resp = self.request(url, headers=self.headers).json()["post"]
|
||||
post = {
|
||||
return {
|
||||
"post_id": resp["id"],
|
||||
"post_url": self.root + "/posts/" + str(resp["id"]),
|
||||
"post_title": resp["title"],
|
||||
@ -85,55 +109,65 @@ class FantiaExtractor(Extractor):
|
||||
"fanclub_user_name": resp["fanclub"]["user"]["name"],
|
||||
"fanclub_name": resp["fanclub"]["name"],
|
||||
"fanclub_url": self.root+"/fanclubs/"+str(resp["fanclub"]["id"]),
|
||||
"tags": resp["tags"]
|
||||
"tags": resp["tags"],
|
||||
"_data": resp,
|
||||
}
|
||||
return resp, post
|
||||
|
||||
def _get_urls_from_post(self, resp, post):
|
||||
def _get_post_contents(self, post):
|
||||
contents = post["_data"]["post_contents"]
|
||||
|
||||
try:
|
||||
url = post["_data"]["thumb"]["original"]
|
||||
except Exception:
|
||||
pass
|
||||
else:
|
||||
contents.insert(0, {
|
||||
"id": "thumb",
|
||||
"title": "thumb",
|
||||
"category": "thumb",
|
||||
"download_uri": url,
|
||||
"visible_status": "visible",
|
||||
"plan": None,
|
||||
})
|
||||
|
||||
return contents
|
||||
|
||||
def _get_content_urls(self, post, content):
|
||||
"""Extract individual URL data from the response"""
|
||||
if "thumb" in resp and resp["thumb"] and "original" in resp["thumb"]:
|
||||
post["content_filename"] = ""
|
||||
post["content_category"] = "thumb"
|
||||
post["file_id"] = "thumb"
|
||||
yield resp["thumb"]["original"], post
|
||||
if "comment" in content:
|
||||
post["content_comment"] = content["comment"]
|
||||
|
||||
for content in resp["post_contents"]:
|
||||
post["content_category"] = content["category"]
|
||||
post["content_title"] = content["title"]
|
||||
post["content_filename"] = content.get("filename", "")
|
||||
post["content_id"] = content["id"]
|
||||
if "post_content_photos" in content:
|
||||
for photo in content["post_content_photos"]:
|
||||
post["file_id"] = photo["id"]
|
||||
yield photo["url"]["original"]
|
||||
|
||||
if "comment" in content:
|
||||
post["content_comment"] = content["comment"]
|
||||
if "download_uri" in content:
|
||||
post["file_id"] = content["id"]
|
||||
url = content["download_uri"]
|
||||
if url[0] == "/":
|
||||
url = self.root + url
|
||||
yield url
|
||||
|
||||
if "post_content_photos" in content:
|
||||
for photo in content["post_content_photos"]:
|
||||
post["file_id"] = photo["id"]
|
||||
yield photo["url"]["original"], post
|
||||
if content["category"] == "blog" and "comment" in content:
|
||||
comment_json = util.json_loads(content["comment"])
|
||||
ops = comment_json.get("ops") or ()
|
||||
|
||||
if "download_uri" in content:
|
||||
post["file_id"] = content["id"]
|
||||
yield self.root+"/"+content["download_uri"], post
|
||||
# collect blogpost text first
|
||||
blog_text = ""
|
||||
for op in ops:
|
||||
insert = op.get("insert")
|
||||
if isinstance(insert, str):
|
||||
blog_text += insert
|
||||
post["blogpost_text"] = blog_text
|
||||
|
||||
if content["category"] == "blog" and "comment" in content:
|
||||
comment_json = util.json_loads(content["comment"])
|
||||
ops = comment_json.get("ops", ())
|
||||
|
||||
# collect blogpost text first
|
||||
blog_text = ""
|
||||
for op in ops:
|
||||
insert = op.get("insert")
|
||||
if isinstance(insert, str):
|
||||
blog_text += insert
|
||||
post["blogpost_text"] = blog_text
|
||||
|
||||
# collect images
|
||||
for op in ops:
|
||||
insert = op.get("insert")
|
||||
if isinstance(insert, dict) and "fantiaImage" in insert:
|
||||
img = insert["fantiaImage"]
|
||||
post["file_id"] = img["id"]
|
||||
yield "https://fantia.jp" + img["original_url"], post
|
||||
# collect images
|
||||
for op in ops:
|
||||
insert = op.get("insert")
|
||||
if isinstance(insert, dict) and "fantiaImage" in insert:
|
||||
img = insert["fantiaImage"]
|
||||
post["file_id"] = img["id"]
|
||||
yield self.root + img["original_url"]
|
||||
|
||||
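Blog-type Fantia contents store their body as a Quill-style delta: a JSON object whose "ops" list mixes plain-text inserts with dict inserts carrying a "fantiaImage". A stand-alone sketch of splitting that structure into text and image URLs; the input shape follows this hunk, while the function itself is hypothetical:

import json

def parse_blog_comment(comment, root="https://fantia.jp"):
    """Return (text, image URLs) from a blog content's 'comment' JSON (sketch)."""
    ops = json.loads(comment).get("ops") or ()
    text_parts, urls = [], []
    for op in ops:
        insert = op.get("insert")
        if isinstance(insert, str):
            text_parts.append(insert)
        elif isinstance(insert, dict) and "fantiaImage" in insert:
            urls.append(root + insert["fantiaImage"]["original_url"])
    return "".join(text_parts), urls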
|
||||
class FantiaCreatorExtractor(FantiaExtractor):
|
||||
|
@ -20,12 +20,16 @@ class FlickrExtractor(Extractor):
|
||||
filename_fmt = "{category}_{id}.{extension}"
|
||||
directory_fmt = ("{category}", "{user[username]}")
|
||||
archive_fmt = "{id}"
|
||||
cookiedomain = None
|
||||
cookies_domain = None
|
||||
request_interval = (1.0, 2.0)
|
||||
request_interval_min = 0.2
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.api = FlickrAPI(self)
|
||||
self.item_id = match.group(1)
|
||||
|
||||
def _init(self):
|
||||
self.api = FlickrAPI(self)
|
||||
self.user = None
|
||||
|
||||
def items(self):
|
||||
@ -106,6 +110,8 @@ class FlickrImageExtractor(FlickrExtractor):
|
||||
|
||||
def items(self):
|
||||
photo = self.api.photos_getInfo(self.item_id)
|
||||
if self.api.exif:
|
||||
photo.update(self.api.photos_getExif(self.item_id))
|
||||
|
||||
if photo["media"] == "video" and self.api.videos:
|
||||
self.api._extract_video(photo)
|
||||
@ -287,8 +293,8 @@ class FlickrAPI(oauth.OAuth1API):
|
||||
"""
|
||||
|
||||
API_URL = "https://api.flickr.com/services/rest/"
|
||||
API_KEY = "ac4fd7aa98585b9eee1ba761c209de68"
|
||||
API_SECRET = "3adb0f568dc68393"
|
||||
API_KEY = "f8f78d1a40debf471f0b22fa2d00525f"
|
||||
API_SECRET = "4f9dae1113e45556"
|
||||
FORMATS = [
|
||||
("o" , "Original" , None),
|
||||
("6k", "X-Large 6K" , 6144),
|
||||
@ -323,6 +329,7 @@ class FlickrAPI(oauth.OAuth1API):
|
||||
def __init__(self, extractor):
|
||||
oauth.OAuth1API.__init__(self, extractor)
|
||||
|
||||
self.exif = extractor.config("exif", False)
|
||||
self.videos = extractor.config("videos", True)
|
||||
self.maxsize = extractor.config("size-max")
|
||||
if isinstance(self.maxsize, str):
|
||||
@ -367,6 +374,11 @@ class FlickrAPI(oauth.OAuth1API):
|
||||
params = {"user_id": user_id}
|
||||
return self._pagination("people.getPhotos", params)
|
||||
|
||||
def photos_getExif(self, photo_id):
|
||||
"""Retrieves a list of EXIF/TIFF/GPS tags for a given photo."""
|
||||
params = {"photo_id": photo_id}
|
||||
return self._call("photos.getExif", params)["photo"]
|
||||
|
||||
def photos_getInfo(self, photo_id):
|
||||
"""Get information about a photo."""
|
||||
params = {"photo_id": photo_id}
|
||||
@ -451,9 +463,19 @@ class FlickrAPI(oauth.OAuth1API):
|
||||
return data
|
||||
|
||||
def _pagination(self, method, params, key="photos"):
|
||||
params["extras"] = ("description,date_upload,tags,views,media,"
|
||||
"path_alias,owner_name,")
|
||||
params["extras"] += ",".join("url_" + fmt[0] for fmt in self.formats)
|
||||
extras = ("description,date_upload,tags,views,media,"
|
||||
"path_alias,owner_name,")
|
||||
includes = self.extractor.config("metadata")
|
||||
if includes:
|
||||
if isinstance(includes, (list, tuple)):
|
||||
includes = ",".join(includes)
|
||||
elif not isinstance(includes, str):
|
||||
includes = ("license,date_taken,original_format,last_update,"
|
||||
"geo,machine_tags,o_dims")
|
||||
extras = extras + includes + ","
|
||||
extras += ",".join("url_" + fmt[0] for fmt in self.formats)
|
||||
|
||||
params["extras"] = extras
|
||||
params["page"] = 1
|
||||
|
||||
while True:
|
||||
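The extras parameter above is assembled from a fixed base, an optional "metadata" config value (string, list, or a bare true), and one url_{format} field per requested size. A condensed sketch of that assembly with the config lookup replaced by a plain argument; the default format list is only an example:

def build_extras(metadata=None, formats=("o", "l", "m")):
    """Build the Flickr API 'extras' parameter string (sketch)."""
    extras = ("description,date_upload,tags,views,media,"
              "path_alias,owner_name,")
    if metadata:
        if isinstance(metadata, (list, tuple)):
            metadata = ",".join(metadata)
        elif not isinstance(metadata, str):
            # a bare 'true' selects a default set of extra fields
            metadata = ("license,date_taken,original_format,last_update,"
                        "geo,machine_tags,o_dims")
        extras += metadata + ","
    return extras + ",".join("url_" + fmt for fmt in formats)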
@ -478,6 +500,9 @@ class FlickrAPI(oauth.OAuth1API):
|
||||
photo["views"] = text.parse_int(photo["views"])
|
||||
photo["date"] = text.parse_timestamp(photo["dateupload"])
|
||||
photo["tags"] = photo["tags"].split()
|
||||
|
||||
if self.exif:
|
||||
photo.update(self.photos_getExif(photo["id"]))
|
||||
photo["id"] = text.parse_int(photo["id"])
|
||||
|
||||
if "owner" in photo:
|
||||
|
@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2019-2022 Mike Fährmann
|
||||
# Copyright 2019-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -22,10 +22,12 @@ class FoolfuukaExtractor(BaseExtractor):
|
||||
|
||||
def __init__(self, match):
|
||||
BaseExtractor.__init__(self, match)
|
||||
self.session.headers["Referer"] = self.root
|
||||
if self.category == "b4k":
|
||||
self.remote = self._remote_direct
|
||||
|
||||
def _init(self):
|
||||
self.session.headers["Referer"] = self.root + "/"
|
||||
|
||||
def items(self):
|
||||
yield Message.Directory, self.metadata()
|
||||
for post in self.posts():
|
||||
@ -88,13 +90,9 @@ BASE_PATTERN = FoolfuukaExtractor.update({
|
||||
"root": "https://boards.fireden.net",
|
||||
"pattern": r"boards\.fireden\.net",
|
||||
},
|
||||
"rozenarcana": {
|
||||
"root": "https://archive.alice.al",
|
||||
"pattern": r"(?:archive\.)?alice\.al",
|
||||
},
|
||||
"tokyochronos": {
|
||||
"root": "https://www.tokyochronos.net",
|
||||
"pattern": r"(?:www\.)?tokyochronos\.net",
|
||||
"palanq": {
|
||||
"root": "https://archive.palanq.win",
|
||||
"pattern": r"archive\.palanq\.win",
|
||||
},
|
||||
"rbt": {
|
||||
"root": "https://rbt.asia",
|
||||
@ -137,11 +135,8 @@ class FoolfuukaThreadExtractor(FoolfuukaExtractor):
|
||||
("https://boards.fireden.net/sci/thread/11264294/", {
|
||||
"url": "61cab625c95584a12a30049d054931d64f8d20aa",
|
||||
}),
|
||||
("https://archive.alice.al/c/thread/2849220/", {
|
||||
"url": "632e2c8de05de6b3847685f4bf1b4e5c6c9e0ed5",
|
||||
}),
|
||||
("https://www.tokyochronos.net/a/thread/241664141/", {
|
||||
"url": "ae03852cf44e3dcfce5be70274cb1828e1dbb7d6",
|
||||
("https://archive.palanq.win/c/thread/4209598/", {
|
||||
"url": "1f9b5570d228f1f2991c827a6631030bc0e5933c",
|
||||
}),
|
||||
("https://rbt.asia/g/thread/61487650/", {
|
||||
"url": "fadd274b25150a1bdf03a40c58db320fa3b617c4",
|
||||
@ -187,8 +182,7 @@ class FoolfuukaBoardExtractor(FoolfuukaExtractor):
|
||||
("https://arch.b4k.co/meta/"),
|
||||
("https://desuarchive.org/a/"),
|
||||
("https://boards.fireden.net/sci/"),
|
||||
("https://archive.alice.al/c/"),
|
||||
("https://www.tokyochronos.net/a/"),
|
||||
("https://archive.palanq.win/c/"),
|
||||
("https://rbt.asia/g/"),
|
||||
("https://thebarchive.com/b/"),
|
||||
)
|
||||
@ -231,8 +225,7 @@ class FoolfuukaSearchExtractor(FoolfuukaExtractor):
|
||||
("https://archiveofsins.com/_/search/text/test/"),
|
||||
("https://desuarchive.org/_/search/text/test/"),
|
||||
("https://boards.fireden.net/_/search/text/test/"),
|
||||
("https://archive.alice.al/_/search/text/test/"),
|
||||
("https://www.tokyochronos.net/_/search/text/test/"),
|
||||
("https://archive.palanq.win/_/search/text/test/"),
|
||||
("https://rbt.asia/_/search/text/test/"),
|
||||
("https://thebarchive.com/_/search/text/test/"),
|
||||
)
|
||||
@ -297,8 +290,7 @@ class FoolfuukaGalleryExtractor(FoolfuukaExtractor):
|
||||
("https://arch.b4k.co/meta/gallery/"),
|
||||
("https://desuarchive.org/a/gallery/5"),
|
||||
("https://boards.fireden.net/sci/gallery/6"),
|
||||
("https://archive.alice.al/c/gallery/7"),
|
||||
("https://www.tokyochronos.net/a/gallery/7"),
|
||||
("https://archive.palanq.win/c/gallery"),
|
||||
("https://rbt.asia/g/gallery/8"),
|
||||
("https://thebarchive.com/b/gallery/9"),
|
||||
)
|
||||
|
@ -42,11 +42,6 @@ BASE_PATTERN = FoolslideExtractor.update({
|
||||
"root": "https://read.powermanga.org",
|
||||
"pattern": r"read(?:er)?\.powermanga\.org",
|
||||
},
|
||||
"sensescans": {
|
||||
"root": "https://sensescans.com/reader",
|
||||
"pattern": r"(?:(?:www\.)?sensescans\.com/reader"
|
||||
r"|reader\.sensescans\.com)",
|
||||
},
|
||||
})
|
||||
|
||||
|
||||
@ -64,11 +59,6 @@ class FoolslideChapterExtractor(FoolslideExtractor):
|
||||
"url": "854c5817f8f767e1bccd05fa9d58ffb5a4b09384",
|
||||
"keyword": "a60c42f2634b7387899299d411ff494ed0ad6dbe",
|
||||
}),
|
||||
("https://sensescans.com/reader/read/ao_no_orchestra/en/0/26/", {
|
||||
"url": "bbd428dc578f5055e9f86ad635b510386cd317cd",
|
||||
"keyword": "083ef6f8831c84127fe4096fa340a249be9d1424",
|
||||
}),
|
||||
("https://reader.sensescans.com/read/ao_no_orchestra/en/0/26/"),
|
||||
)
|
||||
|
||||
def items(self):
|
||||
@ -129,9 +119,6 @@ class FoolslideMangaExtractor(FoolslideExtractor):
|
||||
"volume": int,
|
||||
},
|
||||
}),
|
||||
("https://sensescans.com/reader/series/yotsubato/", {
|
||||
"count": ">= 3",
|
||||
}),
|
||||
)
|
||||
|
||||
def items(self):
|
||||
|
@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2020-2022 Mike Fährmann
|
||||
# Copyright 2020-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -20,13 +20,16 @@ class FuraffinityExtractor(Extractor):
|
||||
directory_fmt = ("{category}", "{user!l}")
|
||||
filename_fmt = "{id}{title:? //}.{extension}"
|
||||
archive_fmt = "{id}"
|
||||
cookiedomain = ".furaffinity.net"
|
||||
cookies_domain = ".furaffinity.net"
|
||||
cookies_names = ("a", "b")
|
||||
root = "https://www.furaffinity.net"
|
||||
_warning = True
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.user = match.group(1)
|
||||
|
||||
def _init(self):
|
||||
self.offset = 0
|
||||
|
||||
if self.config("descriptions") == "html":
|
||||
@ -39,9 +42,8 @@ class FuraffinityExtractor(Extractor):
|
||||
self._new_layout = None
|
||||
|
||||
def items(self):
|
||||
|
||||
if self._warning:
|
||||
if not self._check_cookies(("a", "b")):
|
||||
if not self.cookies_check(self.cookies_names):
|
||||
self.log.warning("no 'a' and 'b' session cookies set")
|
||||
FuraffinityExtractor._warning = False
|
||||
|
||||
@ -98,7 +100,9 @@ class FuraffinityExtractor(Extractor):
|
||||
'class="tags-row">', '</section>'))
|
||||
data["title"] = text.unescape(extr("<h2><p>", "</p></h2>"))
|
||||
data["artist"] = extr("<strong>", "<")
|
||||
data["_description"] = extr('class="section-body">', '</div>')
|
||||
data["_description"] = extr(
|
||||
'class="submission-description user-submitted-links">',
|
||||
' </div>')
|
||||
data["views"] = pi(rh(extr('class="views">', '</span>')))
|
||||
data["favorites"] = pi(rh(extr('class="favorites">', '</span>')))
|
||||
data["comments"] = pi(rh(extr('class="comments">', '</span>')))
|
||||
@ -125,7 +129,9 @@ class FuraffinityExtractor(Extractor):
|
||||
data["tags"] = text.split_html(extr(
|
||||
'id="keywords">', '</div>'))[::2]
|
||||
data["rating"] = extr('<img alt="', ' ')
|
||||
data["_description"] = extr("</table>", "</table>")
|
||||
data["_description"] = extr(
|
||||
'<td valign="top" align="left" width="70%" class="alt1" '
|
||||
'style="padding:8px">', ' </td>')
|
||||
|
||||
data["artist_url"] = data["artist"].replace("_", "").lower()
|
||||
data["user"] = self.user or data["artist_url"]
|
||||
@ -159,7 +165,13 @@ class FuraffinityExtractor(Extractor):
|
||||
|
||||
while path:
|
||||
page = self.request(self.root + path).text
|
||||
yield from text.extract_iter(page, 'id="sid-', '"')
|
||||
extr = text.extract_from(page)
|
||||
while True:
|
||||
post_id = extr('id="sid-', '"')
|
||||
if not post_id:
|
||||
break
|
||||
self._favorite_id = text.parse_int(extr('data-fav-id="', '"'))
|
||||
yield post_id
|
||||
path = text.extr(page, 'right" href="', '"')
|
||||
|
||||
def _pagination_search(self, query):
|
||||
@ -241,6 +253,7 @@ class FuraffinityFavoriteExtractor(FuraffinityExtractor):
|
||||
test = ("https://www.furaffinity.net/favorites/mirlinthloth/", {
|
||||
"pattern": r"https://d\d?\.f(uraffinity|acdn)\.net"
|
||||
r"/art/[^/]+/\d+/\d+.\w+\.\w+",
|
||||
"keyword": {"favorite_id": int},
|
||||
"range": "45-50",
|
||||
"count": 6,
|
||||
})
|
||||
@ -248,6 +261,12 @@ class FuraffinityFavoriteExtractor(FuraffinityExtractor):
|
||||
def posts(self):
|
||||
return self._pagination_favorites()
|
||||
|
||||
def _parse_post(self, post_id):
|
||||
post = FuraffinityExtractor._parse_post(self, post_id)
|
||||
if post:
|
||||
post["favorite_id"] = self._favorite_id
|
||||
return post
|
||||
|
||||
|
||||
class FuraffinitySearchExtractor(FuraffinityExtractor):
|
||||
"""Extractor for furaffinity search results"""
|
||||
@ -354,7 +373,7 @@ class FuraffinityPostExtractor(FuraffinityExtractor):
|
||||
class FuraffinityUserExtractor(FuraffinityExtractor):
|
||||
"""Extractor for furaffinity user profiles"""
|
||||
subcategory = "user"
|
||||
cookiedomain = None
|
||||
cookies_domain = None
|
||||
pattern = BASE_PATTERN + r"/user/([^/?#]+)"
|
||||
test = (
|
||||
("https://www.furaffinity.net/user/mirlinthloth/", {
|
||||
@ -367,6 +386,9 @@ class FuraffinityUserExtractor(FuraffinityExtractor):
|
||||
}),
|
||||
)
|
||||
|
||||
def initialize(self):
|
||||
pass
|
||||
|
||||
def items(self):
|
||||
base = "{}/{{}}/{}/".format(self.root, self.user)
|
||||
return self._dispatch_extractors((
|
||||
|
@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2021-2022 Mike Fährmann
|
||||
# Copyright 2021-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -19,29 +19,32 @@ class GelbooruV01Extractor(booru.BooruExtractor):
|
||||
def _parse_post(self, post_id):
|
||||
url = "{}/index.php?page=post&s=view&id={}".format(
|
||||
self.root, post_id)
|
||||
page = self.request(url).text
|
||||
extr = text.extract_from(self.request(url).text)
|
||||
|
||||
post = text.extract_all(page, (
|
||||
("created_at", 'Posted: ', ' <'),
|
||||
("uploader" , 'By: ', ' <'),
|
||||
("width" , 'Size: ', 'x'),
|
||||
("height" , '', ' <'),
|
||||
("source" , 'Source: <a href="', '"'),
|
||||
("rating" , 'Rating: ', '<'),
|
||||
("score" , 'Score: ', ' <'),
|
||||
("file_url" , '<img alt="img" src="', '"'),
|
||||
("tags" , 'id="tags" name="tags" cols="40" rows="5">', '<'),
|
||||
))[0]
|
||||
post = {
|
||||
"id" : post_id,
|
||||
"created_at": extr('Posted: ', ' <'),
|
||||
"uploader" : extr('By: ', ' <'),
|
||||
"width" : extr('Size: ', 'x'),
|
||||
"height" : extr('', ' <'),
|
||||
"source" : extr('Source: ', ' <'),
|
||||
"rating" : (extr('Rating: ', '<') or "?")[0].lower(),
|
||||
"score" : extr('Score: ', ' <'),
|
||||
"file_url" : extr('<img alt="img" src="', '"'),
|
||||
"tags" : text.unescape(extr(
|
||||
'id="tags" name="tags" cols="40" rows="5">', '<')),
|
||||
}
|
||||
|
||||
post["id"] = post_id
|
||||
post["md5"] = post["file_url"].rpartition("/")[2].partition(".")[0]
|
||||
post["rating"] = (post["rating"] or "?")[0].lower()
|
||||
post["tags"] = text.unescape(post["tags"])
|
||||
post["date"] = text.parse_datetime(
|
||||
post["created_at"], "%Y-%m-%d %H:%M:%S")
|
||||
|
||||
return post
|
||||
|
||||
def skip(self, num):
|
||||
self.page_start += num
|
||||
return num
|
||||
|
||||
def _pagination(self, url, begin, end):
|
||||
pid = self.page_start
|
||||
|
||||
@ -75,9 +78,9 @@ BASE_PATTERN = GelbooruV01Extractor.update({
|
||||
"root": "https://drawfriends.booru.org",
|
||||
"pattern": r"drawfriends\.booru\.org",
|
||||
},
|
||||
"vidyart": {
|
||||
"root": "https://vidyart.booru.org",
|
||||
"pattern": r"vidyart\.booru\.org",
|
||||
"vidyart2": {
|
||||
"root": "https://vidyart2.booru.org",
|
||||
"pattern": r"vidyart2\.booru\.org",
|
||||
},
|
||||
})
|
||||
|
||||
@ -103,7 +106,7 @@ class GelbooruV01TagExtractor(GelbooruV01Extractor):
|
||||
"count": 25,
|
||||
}),
|
||||
("https://drawfriends.booru.org/index.php?page=post&s=list&tags=all"),
|
||||
("https://vidyart.booru.org/index.php?page=post&s=list&tags=all"),
|
||||
("https://vidyart2.booru.org/index.php?page=post&s=list&tags=all"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
@ -138,7 +141,7 @@ class GelbooruV01FavoriteExtractor(GelbooruV01Extractor):
|
||||
"count": 4,
|
||||
}),
|
||||
("https://drawfriends.booru.org/index.php?page=favorites&s=view&id=1"),
|
||||
("https://vidyart.booru.org/index.php?page=favorites&s=view&id=1"),
|
||||
("https://vidyart2.booru.org/index.php?page=favorites&s=view&id=1"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
@ -182,7 +185,7 @@ class GelbooruV01PostExtractor(GelbooruV01Extractor):
|
||||
"md5": "2aaa0438d58fc7baa75a53b4a9621bb89a9d3fdb",
|
||||
"rating": "s",
|
||||
"score": str,
|
||||
"source": None,
|
||||
"source": "",
|
||||
"tags": "blush dress green_eyes green_hair hatsune_miku "
|
||||
"long_hair twintails vocaloid",
|
||||
"uploader": "Honochi31",
|
||||
@ -190,7 +193,7 @@ class GelbooruV01PostExtractor(GelbooruV01Extractor):
|
||||
},
|
||||
}),
|
||||
("https://drawfriends.booru.org/index.php?page=post&s=view&id=107474"),
|
||||
("https://vidyart.booru.org/index.php?page=post&s=view&id=383111"),
|
||||
("https://vidyart2.booru.org/index.php?page=post&s=view&id=39168"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
|
@ -19,8 +19,7 @@ import re
class GelbooruV02Extractor(booru.BooruExtractor):
basecategory = "gelbooru_v02"

def __init__(self, match):
booru.BooruExtractor.__init__(self, match)
def _init(self):
self.api_key = self.config("api-key")
self.user_id = self.config("user-id")

@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2017-2022 Mike Fährmann
|
||||
# Copyright 2017-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -10,6 +10,7 @@
|
||||
|
||||
from .common import Extractor, Message
|
||||
from .. import text, exception
|
||||
from ..cache import cache
|
||||
|
||||
|
||||
class GfycatExtractor(Extractor):
|
||||
@ -23,6 +24,7 @@ class GfycatExtractor(Extractor):
|
||||
Extractor.__init__(self, match)
|
||||
self.key = match.group(1).lower()
|
||||
|
||||
def _init(self):
|
||||
formats = self.config("format")
|
||||
if formats is None:
|
||||
formats = ("mp4", "webm", "mobile", "gif")
|
||||
@ -80,6 +82,8 @@ class GfycatUserExtractor(GfycatExtractor):
|
||||
})
|
||||
|
||||
def gfycats(self):
|
||||
if self.key == "me":
|
||||
return GfycatAPI(self).me()
|
||||
return GfycatAPI(self).user(self.key)
|
||||
|
||||
|
||||
@ -219,15 +223,8 @@ class GfycatAPI():
|
||||
|
||||
def __init__(self, extractor):
|
||||
self.extractor = extractor
|
||||
|
||||
def gfycat(self, gfycat_id):
|
||||
endpoint = "/v1/gfycats/" + gfycat_id
|
||||
return self._call(endpoint)["gfyItem"]
|
||||
|
||||
def user(self, user):
|
||||
endpoint = "/v1/users/{}/gfycats".format(user.lower())
|
||||
params = {"count": 100}
|
||||
return self._pagination(endpoint, params)
|
||||
self.headers = {}
|
||||
self.username, self.password = extractor._get_auth_info()
|
||||
|
||||
def collection(self, user, collection):
|
||||
endpoint = "/v1/users/{}/collections/{}/gfycats".format(
|
||||
@ -240,14 +237,64 @@ class GfycatAPI():
|
||||
params = {"count": 100}
|
||||
return self._pagination(endpoint, params, "gfyCollections")
|
||||
|
||||
def gfycat(self, gfycat_id):
|
||||
endpoint = "/v1/gfycats/" + gfycat_id
|
||||
return self._call(endpoint)["gfyItem"]
|
||||
|
||||
def me(self):
|
||||
endpoint = "/v1/me/gfycats"
|
||||
params = {"count": 100}
|
||||
return self._pagination(endpoint, params)
|
||||
|
||||
def search(self, query):
|
||||
endpoint = "/v1/gfycats/search"
|
||||
params = {"search_text": query, "count": 150}
|
||||
return self._pagination(endpoint, params)
|
||||
|
||||
def user(self, user):
|
||||
endpoint = "/v1/users/{}/gfycats".format(user.lower())
|
||||
params = {"count": 100}
|
||||
return self._pagination(endpoint, params)
|
||||
|
||||
def authenticate(self):
|
||||
self.headers["Authorization"] = \
|
||||
self._authenticate_impl(self.username, self.password)
|
||||
|
||||
@cache(maxage=3600, keyarg=1)
|
||||
def _authenticate_impl(self, username, password):
|
||||
self.extractor.log.info("Logging in as %s", username)
|
||||
|
||||
url = "https://weblogin.gfycat.com/oauth/webtoken"
|
||||
headers = {"Origin": "https://gfycat.com"}
|
||||
data = {
|
||||
"access_key": "Anr96uuqt9EdamSCwK4txKPjMsf2"
|
||||
"M95Rfa5FLLhPFucu8H5HTzeutyAa",
|
||||
}
|
||||
response = self.extractor.request(
|
||||
url, method="POST", headers=headers, json=data).json()
|
||||
|
||||
url = "https://weblogin.gfycat.com/oauth/weblogin"
|
||||
headers["authorization"] = "Bearer " + response["access_token"]
|
||||
data = {
|
||||
"grant_type": "password",
|
||||
"username" : username,
|
||||
"password" : password,
|
||||
}
|
||||
response = self.extractor.request(
|
||||
url, method="POST", headers=headers, json=data, fatal=None).json()
|
||||
|
||||
if "errorMessage" in response:
|
||||
raise exception.AuthenticationError(
|
||||
response["errorMessage"]["description"])
|
||||
return "Bearer " + response["access_token"]
|
||||
|
||||
def _call(self, endpoint, params=None):
|
||||
if self.username:
|
||||
self.authenticate()
|
||||
|
||||
url = self.API_ROOT + endpoint
|
||||
return self.extractor.request(url, params=params).json()
|
||||
return self.extractor.request(
|
||||
url, params=params, headers=self.headers).json()
|
||||
|
||||
def _pagination(self, endpoint, params, key="gfycats"):
|
||||
while True:
|
||||
|
@ -6,7 +6,8 @@
|
||||
|
||||
from .common import Extractor, Message
|
||||
from .. import text, exception
|
||||
from ..cache import memcache
|
||||
from ..cache import cache, memcache
|
||||
import hashlib
|
||||
|
||||
|
||||
class GofileFolderExtractor(Extractor):
|
||||
@ -66,19 +67,18 @@ class GofileFolderExtractor(Extractor):
|
||||
|
||||
def items(self):
|
||||
recursive = self.config("recursive")
|
||||
password = self.config("password")
|
||||
|
||||
token = self.config("api-token")
|
||||
if not token:
|
||||
token = self._create_account()
|
||||
self.session.cookies.set("accountToken", token, domain=".gofile.io")
|
||||
self.cookies.set("accountToken", token, domain=".gofile.io")
|
||||
self.api_token = token
|
||||
|
||||
token = self.config("website-token", "12345")
|
||||
if not token:
|
||||
token = self._get_website_token()
|
||||
self.website_token = token
|
||||
self.website_token = (self.config("website-token") or
|
||||
self._get_website_token())
|
||||
|
||||
folder = self._get_content(self.content_id)
|
||||
folder = self._get_content(self.content_id, password)
|
||||
yield Message.Directory, folder
|
||||
|
||||
num = 0
|
||||
@ -109,17 +109,20 @@ class GofileFolderExtractor(Extractor):
|
||||
self.log.debug("Creating temporary account")
|
||||
return self._api_request("createAccount")["token"]
|
||||
|
||||
@memcache()
|
||||
@cache(maxage=86400)
|
||||
def _get_website_token(self):
|
||||
self.log.debug("Fetching website token")
|
||||
page = self.request(self.root + "/contents/files.html").text
|
||||
return text.extract(page, "websiteToken:", ",")[0].strip("\" ")
|
||||
page = self.request(self.root + "/dist/js/alljs.js").text
|
||||
return text.extr(page, 'fetchData.websiteToken = "', '"')
|
||||
|
||||
def _get_content(self, content_id):
|
||||
def _get_content(self, content_id, password=None):
|
||||
if password is not None:
|
||||
password = hashlib.sha256(password.encode()).hexdigest()
|
||||
return self._api_request("getContent", {
|
||||
"contentId" : content_id,
|
||||
"token" : self.api_token,
|
||||
"websiteToken": self.website_token,
|
||||
"password" : password,
|
||||
})
|
||||
|
||||
def _api_request(self, endpoint, params=None):
|
||||
|
@ -57,7 +57,9 @@ class HentaicosplaysGalleryExtractor(GalleryExtractor):
self.root = text.ensure_http_scheme(root)
url = "{}/story/{}/".format(self.root, self.slug)
GalleryExtractor.__init__(self, match, url)
self.session.headers["Referer"] = url

def _init(self):
self.session.headers["Referer"] = self.gallery_url

def metadata(self, page):
title = text.extr(page, "<title>", "</title>")

@ -20,7 +20,7 @@ class HentaifoundryExtractor(Extractor):
|
||||
directory_fmt = ("{category}", "{user}")
|
||||
filename_fmt = "{category}_{index}_{title}.{extension}"
|
||||
archive_fmt = "{index}"
|
||||
cookiedomain = "www.hentai-foundry.com"
|
||||
cookies_domain = "www.hentai-foundry.com"
|
||||
root = "https://www.hentai-foundry.com"
|
||||
per_page = 25
|
||||
|
||||
@ -123,14 +123,14 @@ class HentaifoundryExtractor(Extractor):
|
||||
|
||||
def _init_site_filters(self):
|
||||
"""Set site-internal filters to show all images"""
|
||||
if self.session.cookies.get("PHPSESSID", domain=self.cookiedomain):
|
||||
if self.cookies.get("PHPSESSID", domain=self.cookies_domain):
|
||||
return
|
||||
|
||||
url = self.root + "/?enterAgree=1"
|
||||
self.request(url, method="HEAD")
|
||||
|
||||
csrf_token = self.session.cookies.get(
|
||||
"YII_CSRF_TOKEN", domain=self.cookiedomain)
|
||||
csrf_token = self.cookies.get(
|
||||
"YII_CSRF_TOKEN", domain=self.cookies_domain)
|
||||
if not csrf_token:
|
||||
self.log.warning("Unable to update site content filters")
|
||||
return
|
||||
@ -170,6 +170,9 @@ class HentaifoundryUserExtractor(HentaifoundryExtractor):
|
||||
pattern = BASE_PATTERN + r"/user/([^/?#]+)/profile"
|
||||
test = ("https://www.hentai-foundry.com/user/Tenpura/profile",)
|
||||
|
||||
def initialize(self):
|
||||
pass
|
||||
|
||||
def items(self):
|
||||
root = self.root
|
||||
user = "/user/" + self.user
|
||||
|
@ -45,6 +45,15 @@ class HentaifoxGalleryExtractor(HentaifoxBase, GalleryExtractor):
|
||||
"type": "doujinshi",
|
||||
},
|
||||
}),
|
||||
# email-protected title (#4201)
|
||||
("https://hentaifox.com/gallery/35261/", {
|
||||
"keyword": {
|
||||
"gallery_id": 35261,
|
||||
"title": "ManageM@ster!",
|
||||
"artist": ["haritama hiroki"],
|
||||
"group": ["studio n.ball"],
|
||||
},
|
||||
}),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
@ -65,13 +74,14 @@ class HentaifoxGalleryExtractor(HentaifoxBase, GalleryExtractor):
|
||||
|
||||
return {
|
||||
"gallery_id": text.parse_int(self.gallery_id),
|
||||
"title" : text.unescape(extr("<h1>", "</h1>")),
|
||||
"parody" : split(extr(">Parodies:" , "</ul>")),
|
||||
"characters": split(extr(">Characters:", "</ul>")),
|
||||
"tags" : split(extr(">Tags:" , "</ul>")),
|
||||
"artist" : split(extr(">Artists:" , "</ul>")),
|
||||
"group" : split(extr(">Groups:" , "</ul>")),
|
||||
"type" : text.remove_html(extr(">Category:", "<span")),
|
||||
"title" : text.unescape(extr(
|
||||
'id="gallery_title" value="', '"')),
|
||||
"language" : "English",
|
||||
"lang" : "en",
|
||||
}
|
||||
|
@ -153,7 +153,7 @@ class HiperdexMangaExtractor(HiperdexBase, MangaExtractor):
"Accept": "*/*",
"X-Requested-With": "XMLHttpRequest",
"Origin": self.root,
"Referer": self.manga_url,
"Referer": "https://" + text.quote(self.manga_url[8:]),
}
html = self.request(url, method="POST", headers=headers).text
|
||||
|
@ -66,12 +66,13 @@ class HitomiGalleryExtractor(GalleryExtractor):
)

def __init__(self, match):
gid = match.group(1)
url = "https://ltn.hitomi.la/galleries/{}.js".format(gid)
self.gid = match.group(1)
url = "https://ltn.hitomi.la/galleries/{}.js".format(self.gid)
GalleryExtractor.__init__(self, match, url)
self.info = None

def _init(self):
self.session.headers["Referer"] = "{}/reader/{}.html".format(
self.root, gid)
self.root, self.gid)

def metadata(self, page):
self.info = info = util.json_loads(page.partition("=")[2])
|
@ -21,9 +21,8 @@ class HotleakExtractor(Extractor):
archive_fmt = "{type}_{creator}_{id}"
root = "https://hotleak.vip"

def __init__(self, match):
Extractor.__init__(self, match)
self.session.headers["Referer"] = self.root
def _init(self):
self.session.headers["Referer"] = self.root + "/"

def items(self):
for post in self.posts():
|
@ -19,9 +19,9 @@ import re
|
||||
class IdolcomplexExtractor(SankakuExtractor):
|
||||
"""Base class for idolcomplex extractors"""
|
||||
category = "idolcomplex"
|
||||
cookienames = ("login", "pass_hash")
|
||||
cookiedomain = "idol.sankakucomplex.com"
|
||||
root = "https://" + cookiedomain
|
||||
cookies_domain = "idol.sankakucomplex.com"
|
||||
cookies_names = ("login", "pass_hash")
|
||||
root = "https://" + cookies_domain
|
||||
request_interval = 5.0
|
||||
|
||||
def __init__(self, match):
|
||||
@ -29,6 +29,8 @@ class IdolcomplexExtractor(SankakuExtractor):
|
||||
self.logged_in = True
|
||||
self.start_page = 1
|
||||
self.start_post = 0
|
||||
|
||||
def _init(self):
|
||||
self.extags = self.config("tags", False)
|
||||
|
||||
def items(self):
|
||||
@ -51,14 +53,14 @@ class IdolcomplexExtractor(SankakuExtractor):
|
||||
"""Return an iterable containing all relevant post ids"""
|
||||
|
||||
def login(self):
|
||||
if self._check_cookies(self.cookienames):
|
||||
if self.cookies_check(self.cookies_names):
|
||||
return
|
||||
|
||||
username, password = self._get_auth_info()
|
||||
if username:
|
||||
cookies = self._login_impl(username, password)
|
||||
self._update_cookies(cookies)
|
||||
else:
|
||||
self.logged_in = False
|
||||
return self.cookies_update(self._login_impl(username, password))
|
||||
|
||||
self.logged_in = False
|
||||
|
||||
@cache(maxage=90*24*3600, keyarg=1)
|
||||
def _login_impl(self, username, password):
|
||||
@ -76,7 +78,7 @@ class IdolcomplexExtractor(SankakuExtractor):
|
||||
if not response.history or response.url != self.root + "/user/home":
|
||||
raise exception.AuthenticationError()
|
||||
cookies = response.history[0].cookies
|
||||
return {c: cookies[c] for c in self.cookienames}
|
||||
return {c: cookies[c] for c in self.cookies_names}
|
||||
|
||||
def _parse_post(self, post_id):
|
||||
"""Extract metadata of a single post"""
|
||||
|
@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-

# Copyright 2014-2022 Mike Fährmann
# Copyright 2014-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as

@ -21,7 +21,9 @@ class ImagebamExtractor(Extractor):
def __init__(self, match):
Extractor.__init__(self, match)
self.path = match.group(1)
self.session.cookies.set("nsfw_inter", "1", domain="www.imagebam.com")

def _init(self):
self.cookies.set("nsfw_inter", "1", domain="www.imagebam.com")

def _parse_image_page(self, path):
page = self.request(self.root + path).text
|
@ -31,6 +31,15 @@ class ImagechestGalleryExtractor(GalleryExtractor):
|
||||
"content": "076959e65be30249a2c651fbe6090dc30ba85193",
|
||||
"count": 3
|
||||
}),
|
||||
# "Load More Files" button (#4028)
|
||||
("https://imgchest.com/p/9p4n3q2z7nq", {
|
||||
"pattern": r"https://cdn\.imgchest\.com/files/\w+\.(jpg|png)",
|
||||
"url": "f5674e8ba79d336193c9f698708d9dcc10e78cc7",
|
||||
"count": 52,
|
||||
}),
|
||||
("https://imgchest.com/p/xxxxxxxxxxx", {
|
||||
"exception": exception.NotFoundError,
|
||||
}),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
@ -38,6 +47,14 @@ class ImagechestGalleryExtractor(GalleryExtractor):
|
||||
url = self.root + "/p/" + self.gallery_id
|
||||
GalleryExtractor.__init__(self, match, url)
|
||||
|
||||
def _init(self):
|
||||
access_token = self.config("access-token")
|
||||
if access_token:
|
||||
self.api = ImagechestAPI(self, access_token)
|
||||
self.gallery_url = None
|
||||
self.metadata = self._metadata_api
|
||||
self.images = self._images_api
|
||||
|
||||
def metadata(self, page):
|
||||
if "Sorry, but the page you requested could not be found." in page:
|
||||
raise exception.NotFoundError("gallery")
|
||||
@ -49,7 +66,84 @@ class ImagechestGalleryExtractor(GalleryExtractor):
|
||||
}
|
||||
|
||||
def images(self, page):
|
||||
if " More Files</button>" in page:
|
||||
url = "{}/p/{}/loadAll".format(self.root, self.gallery_id)
|
||||
headers = {
|
||||
"X-Requested-With": "XMLHttpRequest",
|
||||
"Origin" : self.root,
|
||||
"Referer" : self.gallery_url,
|
||||
}
|
||||
csrf_token = text.extr(page, 'name="csrf-token" content="', '"')
|
||||
data = {"_token": csrf_token}
|
||||
page += self.request(
|
||||
url, method="POST", headers=headers, data=data).text
|
||||
|
||||
return [
|
||||
(url, None)
|
||||
for url in text.extract_iter(page, 'data-url="', '"')
|
||||
]
|
||||
|
||||
def _metadata_api(self, page):
|
||||
post = self.api.post(self.gallery_id)
|
||||
|
||||
post["date"] = text.parse_datetime(
|
||||
post["created"], "%Y-%m-%dT%H:%M:%S.%fZ")
|
||||
for img in post["images"]:
|
||||
img["date"] = text.parse_datetime(
|
||||
img["created"], "%Y-%m-%dT%H:%M:%S.%fZ")
|
||||
|
||||
post["gallery_id"] = self.gallery_id
|
||||
post.pop("image_count", None)
|
||||
self._image_list = post.pop("images")
|
||||
|
||||
return post
|
||||
|
||||
def _images_api(self, page):
|
||||
return [
|
||||
(img["link"], img)
|
||||
for img in self._image_list
|
||||
]
|
||||
|
||||
|
||||
class ImagechestAPI():
|
||||
"""Interface for the Image Chest API
|
||||
|
||||
https://imgchest.com/docs/api/1.0/general/overview
|
||||
"""
|
||||
root = "https://api.imgchest.com"
|
||||
|
||||
def __init__(self, extractor, access_token):
|
||||
self.extractor = extractor
|
||||
self.headers = {"Authorization": "Bearer " + access_token}
|
||||
|
||||
def file(self, file_id):
|
||||
endpoint = "/v1/file/" + file_id
|
||||
return self._call(endpoint)
|
||||
|
||||
def post(self, post_id):
|
||||
endpoint = "/v1/post/" + post_id
|
||||
return self._call(endpoint)
|
||||
|
||||
def user(self, username):
|
||||
endpoint = "/v1/user/" + username
|
||||
return self._call(endpoint)
|
||||
|
||||
def _call(self, endpoint):
|
||||
url = self.root + endpoint
|
||||
|
||||
while True:
|
||||
response = self.extractor.request(
|
||||
url, headers=self.headers, fatal=None, allow_redirects=False)
|
||||
|
||||
if response.status_code < 300:
|
||||
return response.json()["data"]
|
||||
|
||||
elif response.status_code < 400:
|
||||
raise exception.AuthenticationError("Invalid API access token")
|
||||
|
||||
elif response.status_code == 429:
|
||||
self.extractor.wait(seconds=600)
|
||||
|
||||
else:
|
||||
self.extractor.log.debug(response.text)
|
||||
raise exception.StopExtraction("API request failed")
|
||||
|
@ -23,9 +23,8 @@ class ImagefapExtractor(Extractor):
archive_fmt = "{gallery_id}_{image_id}"
request_interval = (2.0, 4.0)

def __init__(self, match):
Extractor.__init__(self, match)
self.session.headers["Referer"] = self.root
def _init(self):
self.session.headers["Referer"] = self.root + "/"

def request(self, url, **kwargs):
response = Extractor.request(self, url, **kwargs)

@ -283,7 +282,7 @@ class ImagefapFolderExtractor(ImagefapExtractor):
yield gid, extr("<b>", "<")
cnt += 1

if cnt < 25:
if cnt < 20:
break
params["page"] += 1
|
||||
|
@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2016-2022 Mike Fährmann
|
||||
# Copyright 2016-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -19,23 +19,23 @@ class ImagehostImageExtractor(Extractor):
|
||||
basecategory = "imagehost"
|
||||
subcategory = "image"
|
||||
archive_fmt = "{token}"
|
||||
https = True
|
||||
params = None
|
||||
cookies = None
|
||||
encoding = None
|
||||
_https = True
|
||||
_params = None
|
||||
_cookies = None
|
||||
_encoding = None
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.page_url = "http{}://{}".format(
|
||||
"s" if self.https else "", match.group(1))
|
||||
"s" if self._https else "", match.group(1))
|
||||
self.token = match.group(2)
|
||||
|
||||
if self.params == "simple":
|
||||
self.params = {
|
||||
if self._params == "simple":
|
||||
self._params = {
|
||||
"imgContinue": "Continue+to+image+...+",
|
||||
}
|
||||
elif self.params == "complex":
|
||||
self.params = {
|
||||
elif self._params == "complex":
|
||||
self._params = {
|
||||
"op": "view",
|
||||
"id": self.token,
|
||||
"pre": "1",
|
||||
@ -46,16 +46,16 @@ class ImagehostImageExtractor(Extractor):
|
||||
def items(self):
|
||||
page = self.request(
|
||||
self.page_url,
|
||||
method=("POST" if self.params else "GET"),
|
||||
data=self.params,
|
||||
cookies=self.cookies,
|
||||
encoding=self.encoding,
|
||||
method=("POST" if self._params else "GET"),
|
||||
data=self._params,
|
||||
cookies=self._cookies,
|
||||
encoding=self._encoding,
|
||||
).text
|
||||
|
||||
url, filename = self.get_info(page)
|
||||
data = text.nameext_from_url(filename, {"token": self.token})
|
||||
data.update(self.metadata(page))
|
||||
if self.https and url.startswith("http:"):
|
||||
if self._https and url.startswith("http:"):
|
||||
url = "https:" + url[5:]
|
||||
|
||||
yield Message.Directory, data
|
||||
@ -102,8 +102,8 @@ class ImxtoImageExtractor(ImagehostImageExtractor):
|
||||
"exception": exception.NotFoundError,
|
||||
}),
|
||||
)
|
||||
params = "simple"
|
||||
encoding = "utf-8"
|
||||
_params = "simple"
|
||||
_encoding = "utf-8"
|
||||
|
||||
def __init__(self, match):
|
||||
ImagehostImageExtractor.__init__(self, match)
|
||||
@ -153,8 +153,9 @@ class ImxtoGalleryExtractor(ImagehostImageExtractor):
|
||||
"_extractor": ImxtoImageExtractor,
|
||||
"title": text.unescape(title.partition(">")[2]).strip(),
|
||||
}
|
||||
for url in text.extract_iter(page, '<a href="', '"', pos):
|
||||
yield Message.Queue, url, data
|
||||
|
||||
for url in text.extract_iter(page, "<a href=", " ", pos):
|
||||
yield Message.Queue, url.strip("\"'"), data
|
||||
|
||||
|
||||
class AcidimgImageExtractor(ImagehostImageExtractor):
|
||||
@ -163,17 +164,23 @@ class AcidimgImageExtractor(ImagehostImageExtractor):
|
||||
pattern = r"(?:https?://)?((?:www\.)?acidimg\.cc/img-([a-z0-9]+)\.html)"
|
||||
test = ("https://acidimg.cc/img-5acb6b9de4640.html", {
|
||||
"url": "f132a630006e8d84f52d59555191ed82b3b64c04",
|
||||
"keyword": "a8bb9ab8b2f6844071945d31f8c6e04724051f37",
|
||||
"keyword": "135347ab4345002fc013863c0d9419ba32d98f78",
|
||||
"content": "0c8768055e4e20e7c7259608b67799171b691140",
|
||||
})
|
||||
params = "simple"
|
||||
encoding = "utf-8"
|
||||
_params = "simple"
|
||||
_encoding = "utf-8"
|
||||
|
||||
def get_info(self, page):
|
||||
url, pos = text.extract(page, "<img class='centred' src='", "'")
|
||||
if not url:
|
||||
raise exception.NotFoundError("image")
|
||||
filename, pos = text.extract(page, " alt='", "'", pos)
|
||||
url, pos = text.extract(page, '<img class="centred" src="', '"')
|
||||
if not url:
|
||||
raise exception.NotFoundError("image")
|
||||
|
||||
filename, pos = text.extract(page, "alt='", "'", pos)
|
||||
if not filename:
|
||||
filename, pos = text.extract(page, 'alt="', '"', pos)
|
||||
|
||||
return url, (filename + splitext(url)[1]) if filename else url
|
||||
|
||||
|
||||
@ -225,7 +232,7 @@ class ImagetwistImageExtractor(ImagehostImageExtractor):
|
||||
|
||||
@property
|
||||
@memcache(maxage=3*3600)
|
||||
def cookies(self):
|
||||
def _cookies(self):
|
||||
return self.request(self.page_url).cookies
|
||||
|
||||
def get_info(self, page):
|
||||
@ -263,7 +270,7 @@ class PixhostImageExtractor(ImagehostImageExtractor):
|
||||
"keyword": "3bad6d59db42a5ebbd7842c2307e1c3ebd35e6b0",
|
||||
"content": "0c8768055e4e20e7c7259608b67799171b691140",
|
||||
})
|
||||
cookies = {"pixhostads": "1", "pixhosttest": "1"}
|
||||
_cookies = {"pixhostads": "1", "pixhosttest": "1"}
|
||||
|
||||
def get_info(self, page):
|
||||
url , pos = text.extract(page, "class=\"image-img\" src=\"", "\"")
|
||||
@ -294,19 +301,38 @@ class PostimgImageExtractor(ImagehostImageExtractor):
|
||||
"""Extractor for single images from postimages.org"""
|
||||
category = "postimg"
|
||||
pattern = (r"(?:https?://)?((?:www\.)?(?:postimg|pixxxels)\.(?:cc|org)"
|
||||
r"/(?:image/)?([^/?#]+)/?)")
|
||||
r"/(?!gallery/)(?:image/)?([^/?#]+)/?)")
|
||||
test = ("https://postimg.cc/Wtn2b3hC", {
|
||||
"url": "0794cfda9b8951a8ac3aa692472484200254ab86",
|
||||
"url": "72f3c8b1d6c6601a20ad58f35635494b4891a99e",
|
||||
"keyword": "2d05808d04e4e83e33200db83521af06e3147a84",
|
||||
"content": "cfaa8def53ed1a575e0c665c9d6d8cf2aac7a0ee",
|
||||
})
|
||||
|
||||
def get_info(self, page):
|
||||
url , pos = text.extract(page, 'id="main-image" src="', '"')
|
||||
pos = page.index(' id="download"')
|
||||
url , pos = text.rextract(page, ' href="', '"', pos)
|
||||
filename, pos = text.extract(page, 'class="imagename">', '<', pos)
|
||||
return url, text.unescape(filename)
|
||||
|
||||
|
||||
class PostimgGalleryExtractor(ImagehostImageExtractor):
|
||||
"""Extractor for images galleries from postimages.org"""
|
||||
category = "postimg"
|
||||
subcategory = "gallery"
|
||||
pattern = (r"(?:https?://)?((?:www\.)?(?:postimg|pixxxels)\.(?:cc|org)"
|
||||
r"/(?:gallery/)([^/?#]+)/?)")
|
||||
test = ("https://postimg.cc/gallery/wxpDLgX", {
|
||||
"pattern": PostimgImageExtractor.pattern,
|
||||
"count": 22,
|
||||
})
|
||||
|
||||
def items(self):
|
||||
page = self.request(self.page_url).text
|
||||
data = {"_extractor": PostimgImageExtractor}
|
||||
for url in text.extract_iter(page, ' class="thumb"><a href="', '"'):
|
||||
yield Message.Queue, url, data
|
||||
|
||||
|
||||
class TurboimagehostImageExtractor(ImagehostImageExtractor):
|
||||
"""Extractor for single images from www.turboimagehost.com"""
|
||||
category = "turboimagehost"
|
||||
@ -315,7 +341,7 @@ class TurboimagehostImageExtractor(ImagehostImageExtractor):
|
||||
test = ("https://www.turboimagehost.com/p/39078423/test--.png.html", {
|
||||
"url": "b94de43612318771ced924cb5085976f13b3b90e",
|
||||
"keyword": "704757ca8825f51cec516ec44c1e627c1f2058ca",
|
||||
"content": "0c8768055e4e20e7c7259608b67799171b691140",
|
||||
"content": "f38b54b17cd7462e687b58d83f00fca88b1b105a",
|
||||
})
|
||||
|
||||
def get_info(self, page):
|
||||
@ -346,8 +372,8 @@ class ImgclickImageExtractor(ImagehostImageExtractor):
|
||||
"keyword": "6895256143eab955622fc149aa367777a8815ba3",
|
||||
"content": "0c8768055e4e20e7c7259608b67799171b691140",
|
||||
})
|
||||
https = False
|
||||
params = "complex"
|
||||
_https = False
|
||||
_params = "complex"
|
||||
|
||||
def get_info(self, page):
|
||||
url , pos = text.extract(page, '<br><img src="', '"')
|
||||
|
@ -62,7 +62,7 @@ class ImgbbExtractor(Extractor):
def login(self):
username, password = self._get_auth_info()
if username:
self._update_cookies(self._login_impl(username, password))
self.cookies_update(self._login_impl(username, password))

@cache(maxage=360*24*3600, keyarg=1)
def _login_impl(self, username, password):

@ -82,7 +82,7 @@ class ImgbbExtractor(Extractor):

if not response.history:
raise exception.AuthenticationError()
return self.session.cookies
return self.cookies

def _pagination(self, page, endpoint, params):
data = None
|
@ -22,8 +22,10 @@ class ImgurExtractor(Extractor):
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.api = ImgurAPI(self)
|
||||
self.key = match.group(1)
|
||||
|
||||
def _init(self):
|
||||
self.api = ImgurAPI(self)
|
||||
self.mp4 = self.config("mp4", True)
|
||||
|
||||
def _prepare(self, image):
|
||||
@ -47,8 +49,13 @@ class ImgurExtractor(Extractor):
|
||||
image_ex = ImgurImageExtractor
|
||||
|
||||
for item in items:
|
||||
item["_extractor"] = album_ex if item["is_album"] else image_ex
|
||||
yield Message.Queue, item["link"], item
|
||||
if item["is_album"]:
|
||||
url = "https://imgur.com/a/" + item["id"]
|
||||
item["_extractor"] = album_ex
|
||||
else:
|
||||
url = "https://imgur.com/" + item["id"]
|
||||
item["_extractor"] = image_ex
|
||||
yield Message.Queue, url, item
|
||||
|
||||
|
||||
class ImgurImageExtractor(ImgurExtractor):
|
||||
@ -272,7 +279,7 @@ class ImgurUserExtractor(ImgurExtractor):
|
||||
("https://imgur.com/user/Miguenzo", {
|
||||
"range": "1-100",
|
||||
"count": 100,
|
||||
"pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+",
|
||||
"pattern": r"https://imgur\.com(/a)?/\w+$",
|
||||
}),
|
||||
("https://imgur.com/user/Miguenzo/posts"),
|
||||
("https://imgur.com/user/Miguenzo/submitted"),
|
||||
@ -285,17 +292,41 @@ class ImgurUserExtractor(ImgurExtractor):
|
||||
class ImgurFavoriteExtractor(ImgurExtractor):
|
||||
"""Extractor for a user's favorites"""
|
||||
subcategory = "favorite"
|
||||
pattern = BASE_PATTERN + r"/user/([^/?#]+)/favorites"
|
||||
pattern = BASE_PATTERN + r"/user/([^/?#]+)/favorites/?$"
|
||||
test = ("https://imgur.com/user/Miguenzo/favorites", {
|
||||
"range": "1-100",
|
||||
"count": 100,
|
||||
"pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+",
|
||||
"pattern": r"https://imgur\.com(/a)?/\w+$",
|
||||
})
|
||||
|
||||
def items(self):
|
||||
return self._items_queue(self.api.account_favorites(self.key))
|
||||
|
||||
|
||||
class ImgurFavoriteFolderExtractor(ImgurExtractor):
|
||||
"""Extractor for a user's favorites folder"""
|
||||
subcategory = "favorite-folder"
|
||||
pattern = BASE_PATTERN + r"/user/([^/?#]+)/favorites/folder/(\d+)"
|
||||
test = (
|
||||
("https://imgur.com/user/mikf1/favorites/folder/11896757/public", {
|
||||
"pattern": r"https://imgur\.com(/a)?/\w+$",
|
||||
"count": 3,
|
||||
}),
|
||||
("https://imgur.com/user/mikf1/favorites/folder/11896741/private", {
|
||||
"pattern": r"https://imgur\.com(/a)?/\w+$",
|
||||
"count": 5,
|
||||
}),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
ImgurExtractor.__init__(self, match)
|
||||
self.folder_id = match.group(2)
|
||||
|
||||
def items(self):
|
||||
return self._items_queue(self.api.account_favorites_folder(
|
||||
self.key, self.folder_id))
|
||||
|
||||
|
||||
class ImgurSubredditExtractor(ImgurExtractor):
|
||||
"""Extractor for a subreddits's imgur links"""
|
||||
subcategory = "subreddit"
|
||||
@ -303,7 +334,7 @@ class ImgurSubredditExtractor(ImgurExtractor):
|
||||
test = ("https://imgur.com/r/pics", {
|
||||
"range": "1-100",
|
||||
"count": 100,
|
||||
"pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+",
|
||||
"pattern": r"https://imgur\.com(/a)?/\w+$",
|
||||
})
|
||||
|
||||
def items(self):
|
||||
@ -317,7 +348,7 @@ class ImgurTagExtractor(ImgurExtractor):
|
||||
test = ("https://imgur.com/t/animals", {
|
||||
"range": "1-100",
|
||||
"count": 100,
|
||||
"pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+",
|
||||
"pattern": r"https://imgur\.com(/a)?/\w+$",
|
||||
})
|
||||
|
||||
def items(self):
|
||||
@ -331,7 +362,7 @@ class ImgurSearchExtractor(ImgurExtractor):
|
||||
test = ("https://imgur.com/search?q=cute+cat", {
|
||||
"range": "1-100",
|
||||
"count": 100,
|
||||
"pattern": r"https?://(i.imgur.com|imgur.com/a)/[\w.]+",
|
||||
"pattern": r"https://imgur\.com(/a)?/\w+$",
|
||||
})
|
||||
|
||||
def items(self):
|
||||
@ -346,15 +377,18 @@ class ImgurAPI():
|
||||
"""
|
||||
def __init__(self, extractor):
|
||||
self.extractor = extractor
|
||||
self.headers = {
|
||||
"Authorization": "Client-ID " + (
|
||||
extractor.config("client-id") or "546c25a59c58ad7"),
|
||||
}
|
||||
self.client_id = extractor.config("client-id") or "546c25a59c58ad7"
|
||||
self.headers = {"Authorization": "Client-ID " + self.client_id}
|
||||
|
||||
def account_favorites(self, account):
|
||||
endpoint = "/3/account/{}/gallery_favorites".format(account)
|
||||
return self._pagination(endpoint)
|
||||
|
||||
def account_favorites_folder(self, account, folder_id):
|
||||
endpoint = "/3/account/{}/folders/{}/favorites".format(
|
||||
account, folder_id)
|
||||
return self._pagination_v2(endpoint)
|
||||
|
||||
def gallery_search(self, query):
|
||||
endpoint = "/3/gallery/search"
|
||||
params = {"q": query}
|
||||
@ -386,12 +420,12 @@ class ImgurAPI():
|
||||
endpoint = "/post/v1/posts/" + gallery_hash
|
||||
return self._call(endpoint)
|
||||
|
||||
def _call(self, endpoint, params=None):
|
||||
def _call(self, endpoint, params=None, headers=None):
|
||||
while True:
|
||||
try:
|
||||
return self.extractor.request(
|
||||
"https://api.imgur.com" + endpoint,
|
||||
params=params, headers=self.headers,
|
||||
params=params, headers=(headers or self.headers),
|
||||
).json()
|
||||
except exception.HttpError as exc:
|
||||
if exc.status not in (403, 429) or \
|
||||
@ -410,3 +444,23 @@ class ImgurAPI():
|
||||
return
|
||||
yield from data
|
||||
num += 1
|
||||
|
||||
def _pagination_v2(self, endpoint, params=None, key=None):
|
||||
if params is None:
|
||||
params = {}
|
||||
params["client_id"] = self.client_id
|
||||
params["page"] = 0
|
||||
params["sort"] = "newest"
|
||||
|
||||
headers = {
|
||||
"Referer": "https://imgur.com/",
|
||||
"Origin": "https://imgur.com",
|
||||
}
|
||||
|
||||
while True:
|
||||
data = self._call(endpoint, params, headers)["data"]
|
||||
if not data:
|
||||
return
|
||||
yield from data
|
||||
|
||||
params["page"] += 1
|
||||
|
@ -24,8 +24,7 @@ class InkbunnyExtractor(Extractor):
archive_fmt = "{file_id}"
root = "https://inkbunny.net"

def __init__(self, match):
Extractor.__init__(self, match)
def _init(self):
self.api = InkbunnyAPI(self)

def items(self):
|
@ -27,34 +27,41 @@ class InstagramExtractor(Extractor):
|
||||
filename_fmt = "{sidecar_media_id:?/_/}{media_id}.{extension}"
|
||||
archive_fmt = "{media_id}"
|
||||
root = "https://www.instagram.com"
|
||||
cookiedomain = ".instagram.com"
|
||||
cookienames = ("sessionid",)
|
||||
cookies_domain = ".instagram.com"
|
||||
cookies_names = ("sessionid",)
|
||||
request_interval = (6.0, 12.0)
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.item = match.group(1)
|
||||
self.api = None
|
||||
|
||||
def _init(self):
|
||||
self.www_claim = "0"
|
||||
self.csrf_token = util.generate_token()
|
||||
self._logged_in = True
|
||||
self._find_tags = re.compile(r"#\w+").findall
|
||||
self._logged_in = True
|
||||
self._cursor = None
|
||||
self._user = None
|
||||
|
||||
def items(self):
|
||||
self.login()
|
||||
self.cookies.set(
|
||||
"csrftoken", self.csrf_token, domain=self.cookies_domain)
|
||||
|
||||
if self.config("api") == "graphql":
|
||||
self.api = InstagramGraphqlAPI(self)
|
||||
else:
|
||||
self.api = InstagramRestAPI(self)
|
||||
|
||||
def items(self):
|
||||
self.login()
|
||||
|
||||
data = self.metadata()
|
||||
videos = self.config("videos", True)
|
||||
previews = self.config("previews", False)
|
||||
video_headers = {"User-Agent": "Mozilla/5.0"}
|
||||
|
||||
order = self.config("order-files")
|
||||
reverse = order[0] in ("r", "d") if order else False
|
||||
|
||||
for post in self.posts():
|
||||
|
||||
if "__typename" in post:
|
||||
@ -71,6 +78,8 @@ class InstagramExtractor(Extractor):
|
||||
|
||||
if "date" in post:
|
||||
del post["date"]
|
||||
if reverse:
|
||||
files.reverse()
|
||||
|
||||
for file in files:
|
||||
file.update(post)
|
||||
@ -126,14 +135,14 @@ class InstagramExtractor(Extractor):
|
||||
return response
|
||||
|
||||
def login(self):
|
||||
if not self._check_cookies(self.cookienames):
|
||||
username, password = self._get_auth_info()
|
||||
if username:
|
||||
self._update_cookies(_login_impl(self, username, password))
|
||||
else:
|
||||
self._logged_in = False
|
||||
self.session.cookies.set(
|
||||
"csrftoken", self.csrf_token, domain=self.cookiedomain)
|
||||
if self.cookies_check(self.cookies_names):
|
||||
return
|
||||
|
||||
username, password = self._get_auth_info()
|
||||
if username:
|
||||
return self.cookies_update(_login_impl(self, username, password))
|
||||
|
||||
self._logged_in = False
|
||||
|
||||
def _parse_post_rest(self, post):
|
||||
if "items" in post: # story or highlight
|
||||
@ -393,6 +402,12 @@ class InstagramUserExtractor(InstagramExtractor):
|
||||
("https://www.instagram.com/id:25025320/"),
|
||||
)
|
||||
|
||||
def initialize(self):
|
||||
pass
|
||||
|
||||
def finalize(self):
|
||||
pass
|
||||
|
||||
def items(self):
|
||||
base = "{}/{}/".format(self.root, self.item)
|
||||
stories = "{}/stories/{}/".format(self.root, self.item)
|
||||
@ -756,10 +771,20 @@ class InstagramRestAPI():
|
||||
endpoint = "/v1/guides/guide/{}/".format(guide_id)
|
||||
return self._pagination_guides(endpoint)
|
||||
|
||||
def highlights_media(self, user_id):
|
||||
chunk_size = 5
|
||||
def highlights_media(self, user_id, chunk_size=5):
|
||||
reel_ids = [hl["id"] for hl in self.highlights_tray(user_id)]
|
||||
|
||||
order = self.extractor.config("order-posts")
|
||||
if order:
|
||||
if order in ("desc", "reverse"):
|
||||
reel_ids.reverse()
|
||||
elif order in ("id", "id_asc"):
|
||||
reel_ids.sort(key=lambda r: int(r[10:]))
|
||||
elif order == "id_desc":
|
||||
reel_ids.sort(key=lambda r: int(r[10:]), reverse=True)
|
||||
elif order != "asc":
|
||||
self.extractor.log.warning("Unknown posts order '%s'", order)
|
||||
|
||||
for offset in range(0, len(reel_ids), chunk_size):
|
||||
yield from self.reels_media(
|
||||
reel_ids[offset : offset+chunk_size])
|
||||
@ -799,13 +824,17 @@ class InstagramRestAPI():
|
||||
params = {"username": screen_name}
|
||||
return self._call(endpoint, params=params)["data"]["user"]
|
||||
|
||||
@memcache(keyarg=1)
|
||||
def user_by_id(self, user_id):
|
||||
endpoint = "/v1/users/{}/info/".format(user_id)
|
||||
return self._call(endpoint)["user"]
|
||||
|
||||
def user_id(self, screen_name, check_private=True):
|
||||
if screen_name.startswith("id:"):
|
||||
if self.extractor.config("metadata"):
|
||||
self.extractor._user = self.user_by_id(screen_name[3:])
|
||||
return screen_name[3:]
|
||||
|
||||
user = self.user_by_name(screen_name)
|
||||
if user is None:
|
||||
raise exception.AuthorizationError(
|
||||
@ -845,7 +874,7 @@ class InstagramRestAPI():
|
||||
|
||||
def user_tagged(self, user_id):
|
||||
endpoint = "/v1/usertags/{}/feed/".format(user_id)
|
||||
params = {"count": 50}
|
||||
params = {"count": 20}
|
||||
return self._pagination(endpoint, params)
|
||||
|
||||
def _call(self, endpoint, **kwargs):
|
||||
|
@ -26,8 +26,10 @@ class ItakuExtractor(Extractor):
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.api = ItakuAPI(self)
|
||||
self.item = match.group(1)
|
||||
|
||||
def _init(self):
|
||||
self.api = ItakuAPI(self)
|
||||
self.videos = self.config("videos", True)
|
||||
|
||||
def items(self):
|
||||
|
82
gallery_dl/extractor/itchio.py
Normal file
@ -0,0 +1,82 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
# published by the Free Software Foundation.
|
||||
|
||||
"""Extractors for https://itch.io/"""
|
||||
|
||||
from .common import Extractor, Message
|
||||
from .. import text
|
||||
|
||||
|
||||
class ItchioGameExtractor(Extractor):
|
||||
"""Extractor for itch.io games"""
|
||||
category = "itchio"
|
||||
subcategory = "game"
|
||||
root = "https://itch.io"
|
||||
directory_fmt = ("{category}", "{user[name]}")
|
||||
filename_fmt = "{game[title]} ({id}).{extension}"
|
||||
archive_fmt = "{id}"
|
||||
pattern = r"(?:https?://)?(\w+).itch\.io/([\w-]+)"
|
||||
test = (
|
||||
("https://sirtartarus.itch.io/a-craft-of-mine", {
|
||||
"pattern": r"https://\w+\.ssl\.hwcdn\.net/upload2"
|
||||
r"/game/1983311/7723751\?",
|
||||
"count": 1,
|
||||
"keyword": {
|
||||
"extension": "",
|
||||
"filename": "7723751",
|
||||
"game": {
|
||||
"id": 1983311,
|
||||
"noun": "game",
|
||||
"title": "A Craft Of Mine",
|
||||
"url": "https://sirtartarus.itch.io/a-craft-of-mine",
|
||||
},
|
||||
"user": {
|
||||
"id": 4060052,
|
||||
"name": "SirTartarus",
|
||||
"url": "https://sirtartarus.itch.io",
|
||||
},
|
||||
},
|
||||
}),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
self.user, self.slug = match.groups()
|
||||
Extractor.__init__(self, match)
|
||||
|
||||
def items(self):
|
||||
game_url = "https://{}.itch.io/{}".format(self.user, self.slug)
|
||||
page = self.request(game_url).text
|
||||
|
||||
params = {
|
||||
"source": "view_game",
|
||||
"as_props": "1",
|
||||
"after_download_lightbox": "true",
|
||||
}
|
||||
headers = {
|
||||
"Referer": game_url,
|
||||
"X-Requested-With": "XMLHttpRequest",
|
||||
"Origin": "https://{}.itch.io".format(self.user),
|
||||
}
|
||||
data = {
|
||||
"csrf_token": text.unquote(self.cookies["itchio_token"]),
|
||||
}
|
||||
|
||||
for upload_id in text.extract_iter(page, 'data-upload_id="', '"'):
|
||||
file_url = "{}/file/{}".format(game_url, upload_id)
|
||||
info = self.request(file_url, method="POST", params=params,
|
||||
headers=headers, data=data).json()
|
||||
|
||||
game = info["lightbox"]["game"]
|
||||
user = info["lightbox"]["user"]
|
||||
game["url"] = game_url
|
||||
user.pop("follow_button", None)
|
||||
game = {"game": game, "user": user, "id": upload_id}
|
||||
|
||||
url = info["url"]
|
||||
yield Message.Directory, game
|
||||
yield Message.Url, url, text.nameext_from_url(url, game)
|
151
gallery_dl/extractor/jpgfish.py
Normal file
@ -0,0 +1,151 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
# published by the Free Software Foundation.
|
||||
|
||||
"""Extractors for https://jpeg.pet/"""
|
||||
|
||||
from .common import Extractor, Message
|
||||
from .. import text
|
||||
|
||||
BASE_PATTERN = r"(?:https?://)?jpe?g\.(?:pet|fish(?:ing)?|church)"
|
||||
|
||||
|
||||
class JpgfishExtractor(Extractor):
|
||||
"""Base class for jpgfish extractors"""
|
||||
category = "jpgfish"
|
||||
root = "https://jpeg.pet"
|
||||
directory_fmt = ("{category}", "{user}", "{album}",)
|
||||
archive_fmt = "{id}"
|
||||
|
||||
def _pagination(self, url):
|
||||
while url:
|
||||
page = self.request(url).text
|
||||
|
||||
for item in text.extract_iter(
|
||||
page, '<div class="list-item-image ', 'image-container'):
|
||||
yield text.extract(item, '<a href="', '"')[0]
|
||||
|
||||
url = text.extract(
|
||||
page, '<a data-pagination="next" href="', '" ><')[0]
|
||||
|
||||
|
||||
class JpgfishImageExtractor(JpgfishExtractor):
|
||||
"""Extractor for jpgfish Images"""
|
||||
subcategory = "image"
|
||||
pattern = BASE_PATTERN + r"/img/((?:[^/?#]+\.)?(\w+))"
|
||||
test = (
|
||||
("https://jpeg.pet/img/funnymeme.LecXGS", {
|
||||
"pattern": r"https://simp3\.jpg\.church/images/funnymeme\.jpg",
|
||||
"content": "098e5e9b17ad634358426e0ffd1c93871474d13c",
|
||||
"keyword": {
|
||||
"album": "",
|
||||
"extension": "jpg",
|
||||
"filename": "funnymeme",
|
||||
"id": "LecXGS",
|
||||
"url": "https://simp3.jpg.church/images/funnymeme.jpg",
|
||||
"user": "exearco",
|
||||
},
|
||||
}),
|
||||
("https://jpg.church/img/auCruA", {
|
||||
"pattern": r"https://simp2\.jpg\.church/hannahowo_00457\.jpg",
|
||||
"keyword": {"album": "401-500"},
|
||||
}),
|
||||
("https://jpg.pet/img/funnymeme.LecXGS"),
|
||||
("https://jpg.fishing/img/funnymeme.LecXGS"),
|
||||
("https://jpg.fish/img/funnymeme.LecXGS"),
|
||||
("https://jpg.church/img/funnymeme.LecXGS"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
JpgfishExtractor.__init__(self, match)
|
||||
self.path, self.image_id = match.groups()
|
||||
|
||||
def items(self):
|
||||
url = "{}/img/{}".format(self.root, self.path)
|
||||
extr = text.extract_from(self.request(url).text)
|
||||
|
||||
image = {
|
||||
"id" : self.image_id,
|
||||
"url" : extr('<meta property="og:image" content="', '"'),
|
||||
"album": text.extract(extr(
|
||||
"Added to <a", "/a>"), ">", "<")[0] or "",
|
||||
"user" : extr('username: "', '"'),
|
||||
}
|
||||
|
||||
text.nameext_from_url(image["url"], image)
|
||||
yield Message.Directory, image
|
||||
yield Message.Url, image["url"], image
|
||||
|
||||
|
||||
class JpgfishAlbumExtractor(JpgfishExtractor):
|
||||
"""Extractor for jpgfish Albums"""
|
||||
subcategory = "album"
|
||||
pattern = BASE_PATTERN + r"/a(?:lbum)?/([^/?#]+)(/sub)?"
|
||||
test = (
|
||||
("https://jpeg.pet/album/CDilP/?sort=date_desc&page=1", {
|
||||
"count": 2,
|
||||
}),
|
||||
("https://jpg.fishing/a/gunggingnsk.N9OOI", {
|
||||
"count": 114,
|
||||
}),
|
||||
("https://jpg.fish/a/101-200.aNJ6A/", {
|
||||
"count": 100,
|
||||
}),
|
||||
("https://jpg.church/a/hannahowo.aNTdH/sub", {
|
||||
"count": 606,
|
||||
}),
|
||||
("https://jpg.pet/album/CDilP/?sort=date_desc&page=1"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
JpgfishExtractor.__init__(self, match)
|
||||
self.album, self.sub_albums = match.groups()
|
||||
|
||||
def items(self):
|
||||
url = "{}/a/{}".format(self.root, self.album)
|
||||
data = {"_extractor": JpgfishImageExtractor}
|
||||
|
||||
if self.sub_albums:
|
||||
albums = self._pagination(url + "/sub")
|
||||
else:
|
||||
albums = (url,)
|
||||
|
||||
for album in albums:
|
||||
for image in self._pagination(album):
|
||||
yield Message.Queue, image, data
|
||||
|
||||
|
||||
class JpgfishUserExtractor(JpgfishExtractor):
|
||||
"""Extractor for jpgfish Users"""
|
||||
subcategory = "user"
|
||||
pattern = BASE_PATTERN + r"/(?!img|a(?:lbum)?)([^/?#]+)(/albums)?"
|
||||
test = (
|
||||
("https://jpeg.pet/exearco", {
|
||||
"count": 3,
|
||||
}),
|
||||
("https://jpg.church/exearco/albums", {
|
||||
"count": 1,
|
||||
}),
|
||||
("https://jpg.pet/exearco"),
|
||||
("https://jpg.fishing/exearco"),
|
||||
("https://jpg.fish/exearco"),
|
||||
("https://jpg.church/exearco"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
JpgfishExtractor.__init__(self, match)
|
||||
self.user, self.albums = match.groups()
|
||||
|
||||
def items(self):
|
||||
url = "{}/{}".format(self.root, self.user)
|
||||
|
||||
if self.albums:
|
||||
url += "/albums"
|
||||
data = {"_extractor": JpgfishAlbumExtractor}
|
||||
else:
|
||||
data = {"_extractor": JpgfishImageExtractor}
|
||||
|
||||
for url in self._pagination(url):
|
||||
yield Message.Queue, url, data
|
94
gallery_dl/extractor/jschan.py
Normal file
@ -0,0 +1,94 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
# published by the Free Software Foundation.
|
||||
|
||||
"""Extractors for jschan Imageboards"""
|
||||
|
||||
from .common import BaseExtractor, Message
|
||||
from .. import text
|
||||
import itertools
|
||||
|
||||
|
||||
class JschanExtractor(BaseExtractor):
|
||||
basecategory = "jschan"
|
||||
|
||||
|
||||
BASE_PATTERN = JschanExtractor.update({
|
||||
"94chan": {
|
||||
"root": "https://94chan.org",
|
||||
"pattern": r"94chan\.org"
|
||||
}
|
||||
})
|
||||
|
||||
|
||||
class JschanThreadExtractor(JschanExtractor):
|
||||
"""Extractor for jschan threads"""
|
||||
subcategory = "thread"
|
||||
directory_fmt = ("{category}", "{board}",
|
||||
"{threadId} {subject|nomarkup[:50]}")
|
||||
filename_fmt = "{postId}{num:?-//} {filename}.{extension}"
|
||||
archive_fmt = "{board}_{postId}_{num}"
|
||||
pattern = BASE_PATTERN + r"/([^/?#]+)/thread/(\d+)\.html"
|
||||
test = (
|
||||
("https://94chan.org/art/thread/25.html", {
|
||||
"pattern": r"https://94chan.org/file/[0-9a-f]{64}(\.\w+)?",
|
||||
"count": ">= 15"
|
||||
})
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
JschanExtractor.__init__(self, match)
|
||||
index = match.lastindex
|
||||
self.board = match.group(index-1)
|
||||
self.thread = match.group(index)
|
||||
|
||||
def items(self):
|
||||
url = "{}/{}/thread/{}.json".format(
|
||||
self.root, self.board, self.thread)
|
||||
thread = self.request(url).json()
|
||||
thread["threadId"] = thread["postId"]
|
||||
posts = thread.pop("replies", ())
|
||||
|
||||
yield Message.Directory, thread
|
||||
for post in itertools.chain((thread,), posts):
|
||||
files = post.pop("files", ())
|
||||
if files:
|
||||
thread.update(post)
|
||||
thread["count"] = len(files)
|
||||
for num, file in enumerate(files):
|
||||
url = self.root + "/file/" + file["filename"]
|
||||
file.update(thread)
|
||||
file["num"] = num
|
||||
file["siteFilename"] = file["filename"]
|
||||
text.nameext_from_url(file["originalFilename"], file)
|
||||
yield Message.Url, url, file
|
||||
|
||||
|
||||
class JschanBoardExtractor(JschanExtractor):
|
||||
"""Extractor for jschan boards"""
|
||||
subcategory = "board"
|
||||
pattern = (BASE_PATTERN + r"/([^/?#]+)"
|
||||
r"(?:/index\.html|/catalog\.html|/\d+\.html|/?$)")
|
||||
test = (
|
||||
("https://94chan.org/art/", {
|
||||
"pattern": JschanThreadExtractor.pattern,
|
||||
"count": ">= 30"
|
||||
}),
|
||||
("https://94chan.org/art/2.html"),
|
||||
("https://94chan.org/art/catalog.html"),
|
||||
("https://94chan.org/art/index.html"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
JschanExtractor.__init__(self, match)
|
||||
self.board = match.group(match.lastindex)
|
||||
|
||||
def items(self):
|
||||
url = "{}/{}/catalog.json".format(self.root, self.board)
|
||||
for thread in self.request(url).json():
|
||||
url = "{}/{}/thread/{}.html".format(
|
||||
self.root, self.board, thread["postId"])
|
||||
thread["_extractor"] = JschanThreadExtractor
|
||||
yield Message.Queue, url, thread
|
@ -14,7 +14,7 @@ from ..cache import cache
|
||||
import itertools
|
||||
import re
|
||||
|
||||
BASE_PATTERN = r"(?:https?://)?(?:www\.|beta\.)?(kemono|coomer)\.party"
|
||||
BASE_PATTERN = r"(?:https?://)?(?:www\.|beta\.)?(kemono|coomer)\.(party|su)"
|
||||
USER_PATTERN = BASE_PATTERN + r"/([^/?#]+)/user/([^/?#]+)"
|
||||
HASH_PATTERN = r"/[0-9a-f]{2}/[0-9a-f]{2}/([0-9a-f]{64})"
|
||||
|
||||
@ -26,22 +26,24 @@ class KemonopartyExtractor(Extractor):
|
||||
directory_fmt = ("{category}", "{service}", "{user}")
|
||||
filename_fmt = "{id}_{title}_{num:>02}_{filename[:180]}.{extension}"
|
||||
archive_fmt = "{service}_{user}_{id}_{num}"
|
||||
cookiedomain = ".kemono.party"
|
||||
cookies_domain = ".kemono.party"
|
||||
|
||||
def __init__(self, match):
|
||||
if match.group(1) == "coomer":
|
||||
self.category = "coomerparty"
|
||||
self.cookiedomain = ".coomer.party"
|
||||
domain = match.group(1)
|
||||
tld = match.group(2)
|
||||
self.category = domain + "party"
|
||||
self.root = text.root_from_url(match.group(0))
|
||||
self.cookies_domain = ".{}.{}".format(domain, tld)
|
||||
Extractor.__init__(self, match)
|
||||
|
||||
def _init(self):
|
||||
self.session.headers["Referer"] = self.root + "/"
|
||||
self._prepare_ddosguard_cookies()
|
||||
self._find_inline = re.compile(
|
||||
r'src="(?:https?://(?:kemono|coomer)\.(?:party|su))?(/inline/[^"]+'
|
||||
r'|/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{64}\.[^"]+)').findall
|
||||
|
||||
def items(self):
|
||||
self._prepare_ddosguard_cookies()
|
||||
|
||||
self._find_inline = re.compile(
|
||||
r'src="(?:https?://(?:kemono|coomer)\.party)?(/inline/[^"]+'
|
||||
r'|/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{64}\.[^"]+)').findall
|
||||
find_hash = re.compile(HASH_PATTERN).match
|
||||
generators = self._build_file_generators(self.config("files"))
|
||||
duplicates = self.config("duplicates")
|
||||
@ -125,10 +127,12 @@ class KemonopartyExtractor(Extractor):
|
||||
def login(self):
|
||||
username, password = self._get_auth_info()
|
||||
if username:
|
||||
self._update_cookies(self._login_impl(username, password))
|
||||
self.cookies_update(self._login_impl(
|
||||
(username, self.cookies_domain), password))
|
||||
|
||||
@cache(maxage=28*24*3600, keyarg=1)
|
||||
def _login_impl(self, username, password):
|
||||
username = username[0]
|
||||
self.log.info("Logging in as %s", username)
|
||||
|
||||
url = self.root + "/account/login"
|
||||
@ -222,11 +226,12 @@ class KemonopartyUserExtractor(KemonopartyExtractor):
|
||||
"options": (("max-posts", 25),),
|
||||
"count": "< 100",
|
||||
}),
|
||||
("https://kemono.su/subscribestar/user/alcorart"),
|
||||
("https://kemono.party/subscribestar/user/alcorart"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
_, service, user_id, offset = match.groups()
|
||||
_, _, service, user_id, offset = match.groups()
|
||||
self.subcategory = service
|
||||
KemonopartyExtractor.__init__(self, match)
|
||||
self.api_url = "{}/api/{}/user/{}".format(self.root, service, user_id)
|
||||
@ -327,13 +332,14 @@ class KemonopartyPostExtractor(KemonopartyExtractor):
|
||||
r"f51c10adc9dabd86e92bd52339f298b9\.txt",
|
||||
"content": "da39a3ee5e6b4b0d3255bfef95601890afd80709", # empty
|
||||
}),
|
||||
("https://kemono.su/subscribestar/user/alcorart/post/184330"),
|
||||
("https://kemono.party/subscribestar/user/alcorart/post/184330"),
|
||||
("https://www.kemono.party/subscribestar/user/alcorart/post/184330"),
|
||||
("https://beta.kemono.party/subscribestar/user/alcorart/post/184330"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
_, service, user_id, post_id = match.groups()
|
||||
_, _, service, user_id, post_id = match.groups()
|
||||
self.subcategory = service
|
||||
KemonopartyExtractor.__init__(self, match)
|
||||
self.api_url = "{}/api/{}/user/{}/post/{}".format(
|
||||
@ -359,9 +365,9 @@ class KemonopartyDiscordExtractor(KemonopartyExtractor):
|
||||
"count": 4,
|
||||
"keyword": {"channel_name": "finish-work"},
|
||||
}),
|
||||
(("https://kemono.party/discord"
|
||||
(("https://kemono.su/discord"
|
||||
"/server/256559665620451329/channel/462437519519383555#"), {
|
||||
"pattern": r"https://kemono\.party/data/("
|
||||
"pattern": r"https://kemono\.su/data/("
|
||||
r"e3/77/e377e3525164559484ace2e64425b0cec1db08.*\.png|"
|
||||
r"51/45/51453640a5e0a4d23fbf57fb85390f9c5ec154.*\.gif)",
|
||||
"keyword": {"hash": "re:e377e3525164559484ace2e64425b0cec1db08"
|
||||
@ -380,7 +386,7 @@ class KemonopartyDiscordExtractor(KemonopartyExtractor):
|
||||
|
||||
def __init__(self, match):
|
||||
KemonopartyExtractor.__init__(self, match)
|
||||
_, self.server, self.channel, self.channel_name = match.groups()
|
||||
_, _, self.server, self.channel, self.channel_name = match.groups()
|
||||
|
||||
def items(self):
|
||||
self._prepare_ddosguard_cookies()
|
||||
@ -455,14 +461,20 @@ class KemonopartyDiscordExtractor(KemonopartyExtractor):
|
||||
class KemonopartyDiscordServerExtractor(KemonopartyExtractor):
|
||||
subcategory = "discord-server"
|
||||
pattern = BASE_PATTERN + r"/discord/server/(\d+)$"
|
||||
test = ("https://kemono.party/discord/server/488668827274444803", {
|
||||
"pattern": KemonopartyDiscordExtractor.pattern,
|
||||
"count": 13,
|
||||
})
|
||||
test = (
|
||||
("https://kemono.party/discord/server/488668827274444803", {
|
||||
"pattern": KemonopartyDiscordExtractor.pattern,
|
||||
"count": 13,
|
||||
}),
|
||||
("https://kemono.su/discord/server/488668827274444803", {
|
||||
"pattern": KemonopartyDiscordExtractor.pattern,
|
||||
"count": 13,
|
||||
}),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
KemonopartyExtractor.__init__(self, match)
|
||||
self.server = match.group(2)
|
||||
self.server = match.group(3)
|
||||
|
||||
def items(self):
|
||||
url = "{}/api/discord/channels/lookup?q={}".format(
|
||||
@ -491,11 +503,16 @@ class KemonopartyFavoriteExtractor(KemonopartyExtractor):
|
||||
"url": "ecfccf5f0d50b8d14caa7bbdcf071de5c1e5b90f",
|
||||
"count": 3,
|
||||
}),
|
||||
("https://kemono.su/favorites?type=post", {
|
||||
"pattern": KemonopartyPostExtractor.pattern,
|
||||
"url": "4be8e84cb384a907a8e7997baaf6287b451783b5",
|
||||
"count": 3,
|
||||
}),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
KemonopartyExtractor.__init__(self, match)
|
||||
self.favorites = (text.parse_query(match.group(2)).get("type") or
|
||||
self.favorites = (text.parse_query(match.group(3)).get("type") or
|
||||
self.config("favorites") or
|
||||
"artist")
|
||||
|
||||
|
161
gallery_dl/extractor/lensdump.py
Normal file
@ -0,0 +1,161 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
# published by the Free Software Foundation.
|
||||
|
||||
"""Extractors for https://lensdump.com/"""
|
||||
|
||||
from .common import GalleryExtractor, Extractor, Message
|
||||
from .. import text, util
|
||||
|
||||
BASE_PATTERN = r"(?:https?://)?lensdump\.com"
|
||||
|
||||
|
||||
class LensdumpBase():
|
||||
"""Base class for lensdump extractors"""
|
||||
category = "lensdump"
|
||||
root = "https://lensdump.com"
|
||||
|
||||
def nodes(self, page=None):
|
||||
if page is None:
|
||||
page = self.request(self.url).text
|
||||
|
||||
# go through all pages starting from the oldest
|
||||
page_url = text.urljoin(self.root, text.extr(
|
||||
text.extr(page, ' id="list-most-oldest-link"', '>'),
|
||||
'href="', '"'))
|
||||
while page_url is not None:
|
||||
if page_url == self.url:
|
||||
current_page = page
|
||||
else:
|
||||
current_page = self.request(page_url).text
|
||||
|
||||
for node in text.extract_iter(
|
||||
current_page, ' class="list-item ', '>'):
|
||||
yield node
|
||||
|
||||
# find url of next page
|
||||
page_url = text.extr(
|
||||
text.extr(current_page, ' data-pagination="next"', '>'),
|
||||
'href="', '"')
|
||||
if page_url is not None and len(page_url) > 0:
|
||||
page_url = text.urljoin(self.root, page_url)
|
||||
else:
|
||||
page_url = None
|
||||
|
||||
|
||||
class LensdumpAlbumExtractor(LensdumpBase, GalleryExtractor):
|
||||
subcategory = "album"
|
||||
pattern = BASE_PATTERN + r"/(?:((?!\w+/albums|a/|i/)\w+)|a/(\w+))"
|
||||
test = (
|
||||
("https://lensdump.com/a/1IhJr", {
|
||||
"pattern": r"https://[abcd]\.l3n\.co/i/tq\w{4}\.png",
|
||||
"keyword": {
|
||||
"extension": "png",
|
||||
"name": str,
|
||||
"num": int,
|
||||
"title": str,
|
||||
"url": str,
|
||||
"width": int,
|
||||
},
|
||||
}),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
GalleryExtractor.__init__(self, match, match.string)
|
||||
self.gallery_id = match.group(1) or match.group(2)
|
||||
|
||||
def metadata(self, page):
|
||||
return {
|
||||
"gallery_id": self.gallery_id,
|
||||
"title": text.unescape(text.extr(
|
||||
page, 'property="og:title" content="', '"').strip())
|
||||
}
|
||||
|
||||
def images(self, page):
|
||||
for node in self.nodes(page):
|
||||
# get urls and filenames of images in current page
|
||||
json_data = util.json_loads(text.unquote(
|
||||
text.extr(node, "data-object='", "'") or
|
||||
text.extr(node, 'data-object="', '"')))
|
||||
image_id = json_data.get('name')
|
||||
image_url = json_data.get('url')
|
||||
image_title = json_data.get('title')
|
||||
if image_title is not None:
|
||||
image_title = text.unescape(image_title)
|
||||
yield (image_url, {
|
||||
'id': image_id,
|
||||
'url': image_url,
|
||||
'title': image_title,
|
||||
'name': json_data.get('filename'),
|
||||
'filename': image_id,
|
||||
'extension': json_data.get('extension'),
|
||||
'height': text.parse_int(json_data.get('height')),
|
||||
'width': text.parse_int(json_data.get('width')),
|
||||
})
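The 'data-object' attribute holds URL-encoded JSON per image; a hypothetical decoded payload, limited to the keys the .get() calls above rely on (all values made up):

json_data = {
    "name": "tq1abc",                  # used as "id" and "filename"
    "url": "https://a.l3n.co/i/tq1abc.png",
    "title": "example title",
    "filename": "tq1abc.png",          # exposed as "name"
    "extension": "png",
    "width": "620", "height": "400",   # parsed with text.parse_int()
}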
|
||||
|
||||
|
||||
class LensdumpAlbumsExtractor(LensdumpBase, Extractor):
|
||||
"""Extractor for album list from lensdump.com"""
|
||||
subcategory = "albums"
|
||||
pattern = BASE_PATTERN + r"/\w+/albums"
|
||||
test = ("https://lensdump.com/vstar925/albums",)
|
||||
|
||||
def items(self):
|
||||
for node in self.nodes():
|
||||
album_url = text.urljoin(self.root, text.extr(
|
||||
node, 'data-url-short="', '"'))
|
||||
yield Message.Queue, album_url, {
|
||||
"_extractor": LensdumpAlbumExtractor}
|
||||
|
||||
|
||||
class LensdumpImageExtractor(LensdumpBase, Extractor):
|
||||
"""Extractor for individual images on lensdump.com"""
|
||||
subcategory = "image"
|
||||
filename_fmt = "{category}_{id}{title:?_//}.{extension}"
|
||||
directory_fmt = ("{category}",)
|
||||
archive_fmt = "{id}"
|
||||
pattern = BASE_PATTERN + r"/i/(\w+)"
|
||||
test = (
|
||||
("https://lensdump.com/i/tyoAyM", {
|
||||
"pattern": r"https://c\.l3n\.co/i/tyoAyM\.webp",
|
||||
"content": "1aa749ed2c0cf679ec8e1df60068edaf3875de46",
|
||||
"keyword": {
|
||||
"date": "dt:2022-08-01 08:24:28",
|
||||
"extension": "webp",
|
||||
"filename": "tyoAyM",
|
||||
"height": 400,
|
||||
"id": "tyoAyM",
|
||||
"title": "MYOBI clovis bookcaseset",
|
||||
"url": "https://c.l3n.co/i/tyoAyM.webp",
|
||||
"width": 620,
|
||||
},
|
||||
}),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.key = match.group(1)
|
||||
|
||||
def items(self):
|
||||
url = "{}/i/{}".format(self.root, self.key)
|
||||
extr = text.extract_from(self.request(url).text)
|
||||
|
||||
data = {
|
||||
"id" : self.key,
|
||||
"title" : text.unescape(extr(
|
||||
'property="og:title" content="', '"')),
|
||||
"url" : extr(
|
||||
'property="og:image" content="', '"'),
|
||||
"width" : text.parse_int(extr(
|
||||
'property="image:width" content="', '"')),
|
||||
"height": text.parse_int(extr(
|
||||
'property="image:height" content="', '"')),
|
||||
"date" : text.parse_datetime(extr(
|
||||
'<span title="', '"'), "%Y-%m-%d %H:%M:%S"),
|
||||
}
|
||||
|
||||
text.nameext_from_url(data["url"], data)
|
||||
yield Message.Directory, data
|
||||
yield Message.Url, data["url"], data
|
@ -1,73 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2019-2020 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
# published by the Free Software Foundation.
|
||||
|
||||
"""Extractors for https://www.lineblog.me/"""
|
||||
|
||||
from .livedoor import LivedoorBlogExtractor, LivedoorPostExtractor
|
||||
from .. import text
|
||||
|
||||
|
||||
class LineblogBase():
|
||||
"""Base class for lineblog extractors"""
|
||||
category = "lineblog"
|
||||
root = "https://lineblog.me"
|
||||
|
||||
def _images(self, post):
|
||||
imgs = []
|
||||
body = post.pop("body")
|
||||
|
||||
for num, img in enumerate(text.extract_iter(body, "<img ", ">"), 1):
|
||||
src = text.extr(img, 'src="', '"')
|
||||
alt = text.extr(img, 'alt="', '"')
|
||||
|
||||
if not src:
|
||||
continue
|
||||
if src.startswith("https://obs.line-scdn.") and src.count("/") > 3:
|
||||
src = src.rpartition("/")[0]
|
||||
|
||||
imgs.append(text.nameext_from_url(alt or src, {
|
||||
"url" : src,
|
||||
"num" : num,
|
||||
"hash": src.rpartition("/")[2],
|
||||
"post": post,
|
||||
}))
|
||||
|
||||
return imgs
|
||||
|
||||
|
||||
class LineblogBlogExtractor(LineblogBase, LivedoorBlogExtractor):
|
||||
"""Extractor for a user's blog on lineblog.me"""
|
||||
pattern = r"(?:https?://)?lineblog\.me/(\w+)/?(?:$|[?#])"
|
||||
test = ("https://lineblog.me/mamoru_miyano/", {
|
||||
"range": "1-20",
|
||||
"count": 20,
|
||||
"pattern": r"https://obs.line-scdn.net/[\w-]+$",
|
||||
"keyword": {
|
||||
"post": {
|
||||
"categories" : tuple,
|
||||
"date" : "type:datetime",
|
||||
"description": str,
|
||||
"id" : int,
|
||||
"tags" : list,
|
||||
"title" : str,
|
||||
"user" : "mamoru_miyano"
|
||||
},
|
||||
"filename": str,
|
||||
"hash" : r"re:\w{32,}",
|
||||
"num" : int,
|
||||
},
|
||||
})
|
||||
|
||||
|
||||
class LineblogPostExtractor(LineblogBase, LivedoorPostExtractor):
|
||||
"""Extractor for blog posts on lineblog.me"""
|
||||
pattern = r"(?:https?://)?lineblog\.me/(\w+)/archives/(\d+)"
|
||||
test = ("https://lineblog.me/mamoru_miyano/archives/1919150.html", {
|
||||
"url": "24afeb4044c554f80c374b52bf8109c6f1c0c757",
|
||||
"keyword": "76a38e2c0074926bd3362f66f9fc0e6c41591dcb",
|
||||
})
|
@ -46,9 +46,10 @@ class LolisafeAlbumExtractor(LolisafeExtractor):
|
||||
LolisafeExtractor.__init__(self, match)
|
||||
self.album_id = match.group(match.lastindex)
|
||||
|
||||
def _init(self):
|
||||
domain = self.config("domain")
|
||||
if domain == "auto":
|
||||
self.root = text.root_from_url(match.group(0))
|
||||
self.root = text.root_from_url(self.url)
|
||||
elif domain:
|
||||
self.root = text.ensure_http_scheme(domain)
|
||||
|
||||
|
@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2016-2022 Mike Fährmann
|
||||
# Copyright 2016-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -15,7 +15,7 @@ from .. import text, exception
|
||||
class LusciousExtractor(Extractor):
|
||||
"""Base class for luscious extractors"""
|
||||
category = "luscious"
|
||||
cookiedomain = ".luscious.net"
|
||||
cookies_domain = ".luscious.net"
|
||||
root = "https://members.luscious.net"
|
||||
|
||||
def _graphql(self, op, variables, query):
|
||||
@ -118,6 +118,8 @@ class LusciousAlbumExtractor(LusciousExtractor):
|
||||
def __init__(self, match):
|
||||
LusciousExtractor.__init__(self, match)
|
||||
self.album_id = match.group(1)
|
||||
|
||||
def _init(self):
|
||||
self.gif = self.config("gif", False)
|
||||
|
||||
def items(self):
|
||||
|
@ -30,9 +30,11 @@ class MangadexExtractor(Extractor):
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.uuid = match.group(1)
|
||||
|
||||
def _init(self):
|
||||
self.session.headers["User-Agent"] = util.USERAGENT
|
||||
self.api = MangadexAPI(self)
|
||||
self.uuid = match.group(1)
|
||||
|
||||
def items(self):
|
||||
for chapter in self.chapters():
|
||||
@ -85,6 +87,10 @@ class MangadexExtractor(Extractor):
|
||||
data["group"] = [group["attributes"]["name"]
|
||||
for group in relationships["scanlation_group"]]
|
||||
|
||||
data["status"] = mattributes["status"]
|
||||
data["tags"] = [tag["attributes"]["name"]["en"]
|
||||
for tag in mattributes["tags"]]
|
||||
|
||||
return data
|
||||
|
||||
|
||||
@ -94,13 +100,13 @@ class MangadexChapterExtractor(MangadexExtractor):
|
||||
pattern = BASE_PATTERN + r"/chapter/([0-9a-f-]+)"
|
||||
test = (
|
||||
("https://mangadex.org/chapter/f946ac53-0b71-4b5d-aeb2-7931b13c4aaa", {
|
||||
"keyword": "86fb262cf767dac6d965cd904ad499adba466404",
|
||||
"keyword": "e86128a79ebe7201b648f1caa828496a2878dc8f",
|
||||
# "content": "50383a4c15124682057b197d40261641a98db514",
|
||||
}),
|
||||
# oneshot
|
||||
("https://mangadex.org/chapter/61a88817-9c29-4281-bdf1-77b3c1be9831", {
|
||||
"count": 64,
|
||||
"keyword": "6abcbe1e24eeb1049dc931958853cd767ee483fb",
|
||||
"keyword": "d11ed057a919854696853362be35fc0ba7dded4c",
|
||||
}),
|
||||
# MANGA Plus (#1154)
|
||||
("https://mangadex.org/chapter/74149a55-e7c4-44ea-8a37-98e879c1096f", {
|
||||
@ -144,6 +150,7 @@ class MangadexMangaExtractor(MangadexExtractor):
|
||||
pattern = BASE_PATTERN + r"/(?:title|manga)/(?!feed$)([0-9a-f-]+)"
|
||||
test = (
|
||||
("https://mangadex.org/title/f90c4398-8aad-4f51-8a1f-024ca09fdcbc", {
|
||||
"count": ">= 5",
|
||||
"keyword": {
|
||||
"manga" : "Souten no Koumori",
|
||||
"manga_id": "f90c4398-8aad-4f51-8a1f-024ca09fdcbc",
|
||||
@ -157,6 +164,19 @@ class MangadexMangaExtractor(MangadexExtractor):
|
||||
"language": str,
|
||||
"artist" : ["Arakawa Hiromu"],
|
||||
"author" : ["Arakawa Hiromu"],
|
||||
"status" : "completed",
|
||||
"tags" : ["Oneshot", "Historical", "Action",
|
||||
"Martial Arts", "Drama", "Tragedy"],
|
||||
},
|
||||
}),
|
||||
# multiple values for 'lang' (#4093)
|
||||
("https://mangadex.org/title/f90c4398-8aad-4f51-8a1f-024ca09fdcbc", {
|
||||
"options": (("lang", "fr,it"),),
|
||||
"count": 2,
|
||||
"keyword": {
|
||||
"manga" : "Souten no Koumori",
|
||||
"lang" : "re:fr|it",
|
||||
"language": "re:French|Italian",
|
||||
},
|
||||
}),
|
||||
("https://mangadex.cc/manga/d0c88e3b-ea64-4e07-9841-c1d2ac982f4a/", {
|
||||
@ -186,13 +206,16 @@ class MangadexFeedExtractor(MangadexExtractor):
|
||||
|
||||
|
||||
class MangadexAPI():
|
||||
"""Interface for the MangaDex API v5"""
|
||||
"""Interface for the MangaDex API v5
|
||||
|
||||
https://api.mangadex.org/docs/
|
||||
"""
|
||||
|
||||
def __init__(self, extr):
|
||||
self.extractor = extr
|
||||
self.headers = {}
|
||||
|
||||
self.username, self.password = self.extractor._get_auth_info()
|
||||
self.username, self.password = extr._get_auth_info()
|
||||
if not self.username:
|
||||
self.authenticate = util.noop
|
||||
|
||||
@ -278,9 +301,13 @@ class MangadexAPI():
|
||||
if ratings is None:
|
||||
ratings = ("safe", "suggestive", "erotica", "pornographic")
|
||||
|
||||
lang = config("lang")
|
||||
if isinstance(lang, str) and "," in lang:
|
||||
lang = lang.split(",")
|
||||
|
||||
params["contentRating[]"] = ratings
|
||||
params["translatedLanguage[]"] = lang
|
||||
params["includes[]"] = ("scanlation_group",)
|
||||
params["translatedLanguage[]"] = config("lang")
|
||||
params["offset"] = 0
|
||||
|
||||
api_params = config("api-parameters")
|
||||
|
@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2017-2022 Mike Fährmann
|
||||
# Copyright 2017-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -33,6 +33,8 @@ class MangafoxChapterExtractor(ChapterExtractor):
|
||||
base, self.cstr, self.volume, self.chapter, self.minor = match.groups()
|
||||
self.urlbase = self.root + base
|
||||
ChapterExtractor.__init__(self, match, self.urlbase + "/1.html")
|
||||
|
||||
def _init(self):
|
||||
self.session.headers["Referer"] = self.root + "/"
|
||||
|
||||
def metadata(self, page):
|
||||
|
@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2015-2022 Mike Fährmann
|
||||
# Copyright 2015-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -42,6 +42,8 @@ class MangahereChapterExtractor(MangahereBase, ChapterExtractor):
|
||||
self.part, self.volume, self.chapter = match.groups()
|
||||
url = self.url_fmt.format(self.part, 1)
|
||||
ChapterExtractor.__init__(self, match, url)
|
||||
|
||||
def _init(self):
|
||||
self.session.headers["Referer"] = self.root_mobile + "/"
|
||||
|
||||
def metadata(self, page):
|
||||
@ -112,9 +114,8 @@ class MangahereMangaExtractor(MangahereBase, MangaExtractor):
|
||||
("https://m.mangahere.co/manga/aria/"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
MangaExtractor.__init__(self, match)
|
||||
self.session.cookies.set("isAdult", "1", domain="www.mangahere.cc")
|
||||
def _init(self):
|
||||
self.cookies.set("isAdult", "1", domain="www.mangahere.cc")
|
||||
|
||||
def chapters(self, page):
|
||||
results = []
|
||||
|
@ -1,7 +1,7 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2020 Jake Mannens
|
||||
# Copyright 2021-2022 Mike Fährmann
|
||||
# Copyright 2021-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -39,7 +39,9 @@ class MangakakalotChapterExtractor(MangakakalotBase, ChapterExtractor):
|
||||
def __init__(self, match):
|
||||
self.path = match.group(1)
|
||||
ChapterExtractor.__init__(self, match, self.root + self.path)
|
||||
self.session.headers['Referer'] = self.root
|
||||
|
||||
def _init(self):
|
||||
self.session.headers['Referer'] = self.root + "/"
|
||||
|
||||
def metadata(self, page):
|
||||
_ , pos = text.extract(page, '<span itemprop="title">', '<')
|
||||
|
@ -16,21 +16,28 @@ BASE_PATTERN = r"(?:https?://)?((?:chap|read|www\.|m\.)?mangan(?:at|el)o\.com)"
|
||||
class ManganeloBase():
|
||||
category = "manganelo"
|
||||
root = "https://chapmanganato.com"
|
||||
_match_chapter = None
|
||||
|
||||
def __init__(self, match):
|
||||
domain, path = match.groups()
|
||||
super().__init__(match, "https://" + domain + path)
|
||||
self.session.headers['Referer'] = self.root
|
||||
|
||||
self._match_chapter = re.compile(
|
||||
r"(?:[Vv]ol\.?\s*(\d+)\s?)?"
|
||||
r"[Cc]hapter\s*([^:]+)"
|
||||
r"(?::\s*(.+))?").match
|
||||
def _init(self):
|
||||
self.session.headers['Referer'] = self.root + "/"
|
||||
|
||||
if self._match_chapter is None:
|
||||
ManganeloBase._match_chapter = re.compile(
|
||||
r"(?:[Vv]ol\.?\s*(\d+)\s?)?"
|
||||
r"[Cc]hapter\s*(\d+)([^:]*)"
|
||||
r"(?::\s*(.+))?").match
|
||||
|
||||
def _parse_chapter(self, info, manga, author, date=None):
|
||||
match = self._match_chapter(info)
|
||||
volume, chapter, title = match.groups() if match else ("", "", info)
|
||||
chapter, sep, minor = chapter.partition(".")
|
||||
if match:
|
||||
volume, chapter, minor, title = match.groups()
|
||||
else:
|
||||
volume = chapter = minor = ""
|
||||
title = info
|
||||
|
||||
return {
|
||||
"manga" : manga,
|
||||
@ -39,7 +46,7 @@ class ManganeloBase():
|
||||
"title" : text.unescape(title) if title else "",
|
||||
"volume" : text.parse_int(volume),
|
||||
"chapter" : text.parse_int(chapter),
|
||||
"chapter_minor": sep + minor,
|
||||
"chapter_minor": minor,
|
||||
"lang" : "en",
|
||||
"language" : "English",
|
||||
}
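A short check of what the new chapter regex yields for typical chapter strings (inputs are illustrative); the minor part now keeps its separator, which is what produces the "-1" value in the test below:

import re

match_chapter = re.compile(
    r"(?:[Vv]ol\.?\s*(\d+)\s?)?"
    r"[Cc]hapter\s*(\d+)([^:]*)"
    r"(?::\s*(.+))?").match

match_chapter("Chapter 8-1").groups()
# (None, '8', '-1', None)  -> chapter 8, chapter_minor "-1"
match_chapter("Vol.2 Chapter 15: The End").groups()
# ('2', '15', '', 'The End')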
|
||||
@ -61,6 +68,10 @@ class ManganeloChapterExtractor(ManganeloBase, ChapterExtractor):
|
||||
"keyword": "06e01fa9b3fc9b5b954c0d4a98f0153b40922ded",
|
||||
"count": 45,
|
||||
}),
|
||||
("https://chapmanganato.com/manga-no991297/chapter-8", {
|
||||
"keyword": {"chapter": 8, "chapter_minor": "-1"},
|
||||
"count": 20,
|
||||
}),
|
||||
("https://readmanganato.com/manga-gn983696/chapter-23"),
|
||||
("https://manganelo.com/chapter/gamers/chapter_15"),
|
||||
("https://manganelo.com/chapter/gq921227/chapter_23"),
|
||||
|
@ -8,155 +8,464 @@
|
||||
|
||||
"""Extractors for https://mangapark.net/"""
|
||||
|
||||
from .common import ChapterExtractor, MangaExtractor
|
||||
from .common import ChapterExtractor, Extractor, Message
|
||||
from .. import text, util, exception
|
||||
import re
|
||||
|
||||
BASE_PATTERN = r"(?:https?://)?(?:www\.)?mangapark\.(?:net|com|org|io|me)"
|
||||
|
||||
|
||||
class MangaparkBase():
|
||||
"""Base class for mangapark extractors"""
|
||||
category = "mangapark"
|
||||
root_fmt = "https://v2.mangapark.{}"
|
||||
browser = "firefox"
|
||||
_match_title = None
|
||||
|
||||
@staticmethod
|
||||
def parse_chapter_path(path, data):
|
||||
"""Get volume/chapter information from url-path of a chapter"""
|
||||
data["volume"], data["chapter_minor"] = 0, ""
|
||||
for part in path.split("/")[1:]:
|
||||
key, value = part[0], part[1:]
|
||||
if key == "c":
|
||||
chapter, dot, minor = value.partition(".")
|
||||
data["chapter"] = text.parse_int(chapter)
|
||||
data["chapter_minor"] = dot + minor
|
||||
elif key == "i":
|
||||
data["chapter_id"] = text.parse_int(value)
|
||||
elif key == "v":
|
||||
data["volume"] = text.parse_int(value)
|
||||
elif key == "s":
|
||||
data["stream"] = text.parse_int(value)
|
||||
elif key == "e":
|
||||
data["chapter_minor"] = "v" + value
|
||||
|
||||
@staticmethod
|
||||
def parse_chapter_title(title, data):
|
||||
match = re.search(r"(?i)(?:vol(?:ume)?[ .]*(\d+) )?"
|
||||
r"ch(?:apter)?[ .]*(\d+)(\.\w+)?", title)
|
||||
if match:
|
||||
vol, ch, data["chapter_minor"] = match.groups()
|
||||
data["volume"] = text.parse_int(vol)
|
||||
data["chapter"] = text.parse_int(ch)
|
||||
def _parse_chapter_title(self, title):
|
||||
if not self._match_title:
|
||||
MangaparkBase._match_title = re.compile(
|
||||
r"(?i)"
|
||||
r"(?:vol(?:\.|ume)?\s*(\d+)\s*)?"
|
||||
r"ch(?:\.|apter)?\s*(\d+)([^\s:]*)"
|
||||
r"(?:\s*:\s*(.*))?"
|
||||
).match
|
||||
match = self._match_title(title)
|
||||
return match.groups() if match else (0, 0, "", "")
|
||||
|
||||
|
||||
class MangaparkChapterExtractor(MangaparkBase, ChapterExtractor):
|
||||
"""Extractor for manga-chapters from mangapark.net"""
|
||||
pattern = (r"(?:https?://)?(?:www\.|v2\.)?mangapark\.(me|net|com)"
|
||||
r"/manga/([^?#]+/i\d+)")
|
||||
pattern = BASE_PATTERN + r"/title/[^/?#]+/(\d+)"
|
||||
test = (
|
||||
("https://mangapark.net/manga/gosu/i811653/c055/1", {
|
||||
"count": 50,
|
||||
"keyword": "db1ed9af4f972756a25dbfa5af69a8f155b043ff",
|
||||
("https://mangapark.net/title/114972-aria/6710214-en-ch.60.2", {
|
||||
"count": 70,
|
||||
"pattern": r"https://[\w-]+\.mpcdn\.org/comic/2002/e67"
|
||||
r"/61e29278a583b9227964076e/\d+_\d+_\d+_\d+\.jpeg"
|
||||
r"\?acc=[^&#]+&exp=\d+",
|
||||
"keyword": {
|
||||
"artist": [],
|
||||
"author": ["Amano Kozue"],
|
||||
"chapter": 60,
|
||||
"chapter_id": 6710214,
|
||||
"chapter_minor": ".2",
|
||||
"count": 70,
|
||||
"date": "dt:2022-01-15 09:25:03",
|
||||
"extension": "jpeg",
|
||||
"filename": str,
|
||||
"genre": ["adventure", "comedy", "drama", "sci_fi",
|
||||
"shounen", "slice_of_life"],
|
||||
"lang": "en",
|
||||
"language": "English",
|
||||
"manga": "Aria",
|
||||
"manga_id": 114972,
|
||||
"page": int,
|
||||
"source": "Koala",
|
||||
"title": "Special Navigation - Aquaria Ii",
|
||||
"volume": 12,
|
||||
},
|
||||
}),
|
||||
(("https://mangapark.net/manga"
|
||||
"/ad-astra-per-aspera-hata-kenjirou/i662051/c001.2/1"), {
|
||||
"count": 40,
|
||||
"keyword": "2bb3a8f426383ea13f17ff5582f3070d096d30ac",
|
||||
}),
|
||||
(("https://mangapark.net/manga"
|
||||
"/gekkan-shoujo-nozaki-kun/i2067426/v7/c70/1"), {
|
||||
"count": 15,
|
||||
"keyword": "edc14993c4752cee3a76e09b2f024d40d854bfd1",
|
||||
}),
|
||||
("https://mangapark.me/manga/gosu/i811615/c55/1"),
|
||||
("https://mangapark.com/manga/gosu/i811615/c55/1"),
|
||||
("https://mangapark.com/title/114972-aria/6710214-en-ch.60.2"),
|
||||
("https://mangapark.org/title/114972-aria/6710214-en-ch.60.2"),
|
||||
("https://mangapark.io/title/114972-aria/6710214-en-ch.60.2"),
|
||||
("https://mangapark.me/title/114972-aria/6710214-en-ch.60.2"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
tld, self.path = match.groups()
|
||||
self.root = self.root_fmt.format(tld)
|
||||
url = "{}/manga/{}?zoom=2".format(self.root, self.path)
|
||||
self.root = text.root_from_url(match.group(0))
|
||||
url = "{}/title/_/{}".format(self.root, match.group(1))
|
||||
ChapterExtractor.__init__(self, match, url)
|
||||
|
||||
def metadata(self, page):
|
||||
data = text.extract_all(page, (
|
||||
("manga_id" , "var _manga_id = '", "'"),
|
||||
("chapter_id", "var _book_id = '", "'"),
|
||||
("stream" , "var _stream = '", "'"),
|
||||
("path" , "var _book_link = '", "'"),
|
||||
("manga" , "<h2>", "</h2>"),
|
||||
("title" , "</a>", "<"),
|
||||
), values={"lang": "en", "language": "English"})[0]
|
||||
data = util.json_loads(text.extr(
|
||||
page, 'id="__NEXT_DATA__" type="application/json">', '<'))
|
||||
chapter = (data["props"]["pageProps"]["dehydratedState"]
|
||||
["queries"][0]["state"]["data"]["data"])
|
||||
manga = chapter["comicNode"]["data"]
|
||||
source = chapter["sourceNode"]["data"]
|
||||
|
||||
if not data["path"]:
|
||||
raise exception.NotFoundError("chapter")
|
||||
self._urls = chapter["imageSet"]["httpLis"]
|
||||
self._params = chapter["imageSet"]["wordLis"]
|
||||
vol, ch, minor, title = self._parse_chapter_title(chapter["dname"])
|
||||
|
||||
self.parse_chapter_path(data["path"], data)
|
||||
if "chapter" not in data:
|
||||
self.parse_chapter_title(data["title"], data)
|
||||
|
||||
data["manga"], _, data["type"] = data["manga"].rpartition(" ")
|
||||
data["manga"] = text.unescape(data["manga"])
|
||||
data["title"] = data["title"].partition(": ")[2]
|
||||
for key in ("manga_id", "chapter_id", "stream"):
|
||||
data[key] = text.parse_int(data[key])
|
||||
|
||||
return data
|
||||
return {
|
||||
"manga" : manga["name"],
|
||||
"manga_id" : manga["id"],
|
||||
"artist" : source["artists"],
|
||||
"author" : source["authors"],
|
||||
"genre" : source["genres"],
|
||||
"volume" : text.parse_int(vol),
|
||||
"chapter" : text.parse_int(ch),
|
||||
"chapter_minor": minor,
|
||||
"chapter_id": chapter["id"],
|
||||
"title" : chapter["title"] or title or "",
|
||||
"lang" : chapter["lang"],
|
||||
"language" : util.code_to_language(chapter["lang"]),
|
||||
"source" : source["srcTitle"],
|
||||
"source_id" : source["id"],
|
||||
"date" : text.parse_timestamp(chapter["dateCreate"] // 1000),
|
||||
}
|
||||
|
||||
def images(self, page):
|
||||
data = util.json_loads(text.extr(page, "var _load_pages =", ";"))
|
||||
return [
|
||||
(text.urljoin(self.root, item["u"]), {
|
||||
"width": text.parse_int(item["w"]),
|
||||
"height": text.parse_int(item["h"]),
|
||||
})
|
||||
for item in data
|
||||
(url + "?" + params, None)
|
||||
for url, params in zip(self._urls, self._params)
|
||||
]
|
||||
|
||||
|
||||
class MangaparkMangaExtractor(MangaparkBase, MangaExtractor):
|
||||
class MangaparkMangaExtractor(MangaparkBase, Extractor):
|
||||
"""Extractor for manga from mangapark.net"""
|
||||
chapterclass = MangaparkChapterExtractor
|
||||
pattern = (r"(?:https?://)?(?:www\.|v2\.)?mangapark\.(me|net|com)"
|
||||
r"(/manga/[^/?#]+)/?$")
|
||||
subcategory = "manga"
|
||||
pattern = BASE_PATTERN + r"/title/(\d+)(?:-[^/?#]*)?/?$"
|
||||
test = (
|
||||
("https://mangapark.net/manga/aria", {
|
||||
"url": "51c6d82aed5c3c78e0d3f980b09a998e6a2a83ee",
|
||||
"keyword": "cabc60cf2efa82749d27ac92c495945961e4b73c",
|
||||
("https://mangapark.net/title/114972-aria", {
|
||||
"count": 141,
|
||||
"pattern": MangaparkChapterExtractor.pattern,
|
||||
"keyword": {
|
||||
"chapter": int,
|
||||
"chapter_id": int,
|
||||
"chapter_minor": str,
|
||||
"date": "type:datetime",
|
||||
"lang": "en",
|
||||
"language": "English",
|
||||
"manga_id": 114972,
|
||||
"source": "re:Horse|Koala",
|
||||
"source_id": int,
|
||||
"title": str,
|
||||
"volume": int,
|
||||
},
|
||||
}),
|
||||
("https://mangapark.me/manga/aria"),
|
||||
("https://mangapark.com/manga/aria"),
|
||||
# 'source' option
|
||||
("https://mangapark.net/title/114972-aria", {
|
||||
"options": (("source", "koala"),),
|
||||
"count": 70,
|
||||
"pattern": MangaparkChapterExtractor.pattern,
|
||||
"keyword": {
|
||||
"source": "Koala",
|
||||
"source_id": 15150116,
|
||||
},
|
||||
}),
|
||||
("https://mangapark.com/title/114972-"),
|
||||
("https://mangapark.com/title/114972"),
|
||||
("https://mangapark.com/title/114972-aria"),
|
||||
("https://mangapark.org/title/114972-aria"),
|
||||
("https://mangapark.io/title/114972-aria"),
|
||||
("https://mangapark.me/title/114972-aria"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
self.root = self.root_fmt.format(match.group(1))
|
||||
MangaExtractor.__init__(self, match, self.root + match.group(2))
|
||||
self.root = text.root_from_url(match.group(0))
|
||||
self.manga_id = int(match.group(1))
|
||||
Extractor.__init__(self, match)
|
||||
|
||||
def chapters(self, page):
|
||||
results = []
|
||||
data = {"lang": "en", "language": "English"}
|
||||
data["manga"] = text.unescape(
|
||||
text.extr(page, '<title>', ' Manga - '))
|
||||
def items(self):
|
||||
for chapter in self.chapters():
|
||||
chapter = chapter["data"]
|
||||
url = self.root + chapter["urlPath"]
|
||||
|
||||
for stream in page.split('<div id="stream_')[1:]:
|
||||
data["stream"] = text.parse_int(text.extr(stream, '', '"'))
|
||||
vol, ch, minor, title = self._parse_chapter_title(chapter["dname"])
|
||||
data = {
|
||||
"manga_id" : self.manga_id,
|
||||
"volume" : text.parse_int(vol),
|
||||
"chapter" : text.parse_int(ch),
|
||||
"chapter_minor": minor,
|
||||
"chapter_id": chapter["id"],
|
||||
"title" : chapter["title"] or title or "",
|
||||
"lang" : chapter["lang"],
|
||||
"language" : util.code_to_language(chapter["lang"]),
|
||||
"source" : chapter["srcTitle"],
|
||||
"source_id" : chapter["sourceId"],
|
||||
"date" : text.parse_timestamp(
|
||||
chapter["dateCreate"] // 1000),
|
||||
"_extractor": MangaparkChapterExtractor,
|
||||
}
|
||||
yield Message.Queue, url, data
|
||||
|
||||
for chapter in text.extract_iter(stream, '<li ', '</li>'):
|
||||
path , pos = text.extract(chapter, 'href="', '"')
|
||||
title1, pos = text.extract(chapter, '>', '<', pos)
|
||||
title2, pos = text.extract(chapter, '>: </span>', '<', pos)
|
||||
count , pos = text.extract(chapter, ' of ', ' ', pos)
|
||||
def chapters(self):
|
||||
source = self.config("source")
|
||||
if not source:
|
||||
return self.chapters_all()
|
||||
|
||||
self.parse_chapter_path(path[8:], data)
|
||||
if "chapter" not in data:
|
||||
self.parse_chapter_title(title1, data)
|
||||
source_id = self._select_source(source)
|
||||
self.log.debug("Requesting chapters for source_id %s", source_id)
|
||||
return self.chapters_source(source_id)
|
||||
|
||||
if title2:
|
||||
data["title"] = title2.strip()
|
||||
else:
|
||||
data["title"] = title1.partition(":")[2].strip()
|
||||
def chapters_all(self):
|
||||
pnum = 0
|
||||
variables = {
|
||||
"select": {
|
||||
"comicId": self.manga_id,
|
||||
"range" : None,
|
||||
"isAsc" : not self.config("chapter-reverse"),
|
||||
}
|
||||
}
|
||||
|
||||
data["count"] = text.parse_int(count)
|
||||
results.append((self.root + path, data.copy()))
|
||||
data.pop("chapter", None)
|
||||
while True:
|
||||
data = self._request_graphql(
|
||||
"get_content_comicChapterRangeList", variables)
|
||||
|
||||
return results
|
||||
for item in data["items"]:
|
||||
yield from item["chapterNodes"]
|
||||
|
||||
if not pnum:
|
||||
pager = data["pager"]
|
||||
pnum += 1
|
||||
|
||||
try:
|
||||
variables["select"]["range"] = pager[pnum]
|
||||
except IndexError:
|
||||
return
|
||||
|
||||
def chapters_source(self, source_id):
|
||||
variables = {
|
||||
"sourceId": source_id,
|
||||
}
|
||||
chapters = self._request_graphql(
|
||||
"get_content_source_chapterList", variables)
|
||||
|
||||
if self.config("chapter-reverse"):
|
||||
chapters.reverse()
|
||||
return chapters
|
||||
|
||||
def _select_source(self, source):
|
||||
if isinstance(source, int):
|
||||
return source
|
||||
|
||||
group, _, lang = source.partition(":")
|
||||
group = group.lower()
|
||||
|
||||
variables = {
|
||||
"comicId" : self.manga_id,
|
||||
"dbStatuss" : ["normal"],
|
||||
"haveChapter": True,
|
||||
}
|
||||
for item in self._request_graphql(
|
||||
"get_content_comic_sources", variables):
|
||||
data = item["data"]
|
||||
if (not group or data["srcTitle"].lower() == group) and (
|
||||
not lang or data["lang"] == lang):
|
||||
return data["id"]
|
||||
|
||||
raise exception.StopExtraction(
|
||||
"'%s' does not match any available source", source)
|
||||
|
||||
def _request_graphql(self, opname, variables):
|
||||
url = self.root + "/apo/"
|
||||
data = {
|
||||
"query" : QUERIES[opname],
|
||||
"variables" : util.json_dumps(variables),
|
||||
"operationName": opname,
|
||||
}
|
||||
return self.request(
|
||||
url, method="POST", json=data).json()["data"][opname]
|
||||
|
||||
|
||||
QUERIES = {
|
||||
"get_content_comicChapterRangeList": """
|
||||
query get_content_comicChapterRangeList($select: Content_ComicChapterRangeList_Select) {
|
||||
get_content_comicChapterRangeList(
|
||||
select: $select
|
||||
) {
|
||||
reqRange{x y}
|
||||
missing
|
||||
pager {x y}
|
||||
items{
|
||||
serial
|
||||
chapterNodes {
|
||||
|
||||
id
|
||||
data {
|
||||
|
||||
|
||||
id
|
||||
sourceId
|
||||
|
||||
dbStatus
|
||||
isNormal
|
||||
isHidden
|
||||
isDeleted
|
||||
isFinal
|
||||
|
||||
dateCreate
|
||||
datePublic
|
||||
dateModify
|
||||
lang
|
||||
volume
|
||||
serial
|
||||
dname
|
||||
title
|
||||
urlPath
|
||||
|
||||
srcTitle srcColor
|
||||
|
||||
count_images
|
||||
|
||||
stat_count_post_child
|
||||
stat_count_post_reply
|
||||
stat_count_views_login
|
||||
stat_count_views_guest
|
||||
|
||||
userId
|
||||
userNode {
|
||||
|
||||
id
|
||||
data {
|
||||
|
||||
id
|
||||
name
|
||||
uniq
|
||||
avatarUrl
|
||||
urlPath
|
||||
|
||||
verified
|
||||
deleted
|
||||
banned
|
||||
|
||||
dateCreate
|
||||
dateOnline
|
||||
|
||||
stat_count_chapters_normal
|
||||
stat_count_chapters_others
|
||||
|
||||
is_adm is_mod is_vip is_upr
|
||||
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
disqusId
|
||||
|
||||
|
||||
}
|
||||
|
||||
sser_read
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
""",
|
||||
|
||||
"get_content_source_chapterList": """
|
||||
query get_content_source_chapterList($sourceId: Int!) {
|
||||
get_content_source_chapterList(
|
||||
sourceId: $sourceId
|
||||
) {
|
||||
|
||||
id
|
||||
data {
|
||||
|
||||
|
||||
id
|
||||
sourceId
|
||||
|
||||
dbStatus
|
||||
isNormal
|
||||
isHidden
|
||||
isDeleted
|
||||
isFinal
|
||||
|
||||
dateCreate
|
||||
datePublic
|
||||
dateModify
|
||||
lang
|
||||
volume
|
||||
serial
|
||||
dname
|
||||
title
|
||||
urlPath
|
||||
|
||||
srcTitle srcColor
|
||||
|
||||
count_images
|
||||
|
||||
stat_count_post_child
|
||||
stat_count_post_reply
|
||||
stat_count_views_login
|
||||
stat_count_views_guest
|
||||
|
||||
userId
|
||||
userNode {
|
||||
|
||||
id
|
||||
data {
|
||||
|
||||
id
|
||||
name
|
||||
uniq
|
||||
avatarUrl
|
||||
urlPath
|
||||
|
||||
verified
|
||||
deleted
|
||||
banned
|
||||
|
||||
dateCreate
|
||||
dateOnline
|
||||
|
||||
stat_count_chapters_normal
|
||||
stat_count_chapters_others
|
||||
|
||||
is_adm is_mod is_vip is_upr
|
||||
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
disqusId
|
||||
|
||||
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
""",
|
||||
|
||||
"get_content_comic_sources": """
|
||||
query get_content_comic_sources($comicId: Int!, $dbStatuss: [String] = [], $userId: Int, $haveChapter: Boolean, $sortFor: String) {
|
||||
get_content_comic_sources(
|
||||
comicId: $comicId
|
||||
dbStatuss: $dbStatuss
|
||||
userId: $userId
|
||||
haveChapter: $haveChapter
|
||||
sortFor: $sortFor
|
||||
) {
|
||||
|
||||
id
|
||||
data{
|
||||
|
||||
id
|
||||
|
||||
dbStatus
|
||||
isNormal
|
||||
isHidden
|
||||
isDeleted
|
||||
|
||||
lang name altNames authors artists
|
||||
|
||||
release
|
||||
genres summary{code} extraInfo{code}
|
||||
|
||||
urlCover600
|
||||
urlCover300
|
||||
urlCoverOri
|
||||
|
||||
srcTitle srcColor
|
||||
|
||||
chapterCount
|
||||
chapterNode_last {
|
||||
id
|
||||
data {
|
||||
dateCreate datePublic dateModify
|
||||
volume serial
|
||||
dname title
|
||||
urlPath
|
||||
userNode {
|
||||
id data {uniq name}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
""",
|
||||
}
|
||||
|
192
gallery_dl/extractor/mangaread.py
Normal file
@ -0,0 +1,192 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
# published by the Free Software Foundation.
|
||||
|
||||
"""Extractors for https://mangaread.org/"""
|
||||
|
||||
from .common import ChapterExtractor, MangaExtractor
|
||||
from .. import text, exception
|
||||
import re
|
||||
|
||||
|
||||
class MangareadBase():
|
||||
"""Base class for Mangaread extractors"""
|
||||
category = "mangaread"
|
||||
root = "https://www.mangaread.org"
|
||||
|
||||
@staticmethod
|
||||
def parse_chapter_string(chapter_string, data):
|
||||
match = re.match(
|
||||
r"(?:(.+)\s*-\s*)?[Cc]hapter\s*(\d+)(\.\d+)?(?:\s*-\s*(.+))?",
|
||||
text.unescape(chapter_string).strip())
|
||||
manga, chapter, minor, title = match.groups()
|
||||
manga = manga.strip() if manga else ""
|
||||
data["manga"] = data.pop("manga", manga)
|
||||
data["chapter"] = text.parse_int(chapter)
|
||||
data["chapter_minor"] = minor or ""
|
||||
data["title"] = title or ""
|
||||
data["lang"] = "en"
|
||||
data["language"] = "English"
|
||||
|
||||
|
||||
class MangareadChapterExtractor(MangareadBase, ChapterExtractor):
|
||||
"""Extractor for manga-chapters from mangaread.org"""
|
||||
pattern = (r"(?:https?://)?(?:www\.)?mangaread\.org"
|
||||
r"(/manga/[^/?#]+/[^/?#]+)")
|
||||
test = (
|
||||
("https://www.mangaread.org/manga/one-piece/chapter-1053-3/", {
|
||||
"pattern": (r"https://www\.mangaread\.org/wp-content/uploads"
|
||||
r"/WP-manga/data/manga_[^/]+/[^/]+/[^.]+\.\w+"),
|
||||
"count": 11,
|
||||
"keyword": {
|
||||
"manga" : "One Piece",
|
||||
"title" : "",
|
||||
"chapter" : 1053,
|
||||
"chapter_minor": ".3",
|
||||
"tags" : ["Oda Eiichiro"],
|
||||
"lang" : "en",
|
||||
"language": "English",
|
||||
}
|
||||
}),
|
||||
("https://www.mangaread.org/manga/one-piece/chapter-1000000/", {
|
||||
"exception": exception.NotFoundError,
|
||||
}),
|
||||
(("https://www.mangaread.org"
|
||||
"/manga/kanan-sama-wa-akumade-choroi/chapter-10/"), {
|
||||
"pattern": (r"https://www\.mangaread\.org/wp-content/uploads"
|
||||
r"/WP-manga/data/manga_[^/]+/[^/]+/[^.]+\.\w+"),
|
||||
"count": 9,
|
||||
"keyword": {
|
||||
"manga" : "Kanan-sama wa Akumade Choroi",
|
||||
"title" : "",
|
||||
"chapter" : 10,
|
||||
"chapter_minor": "",
|
||||
"tags" : list,
|
||||
"lang" : "en",
|
||||
"language": "English",
|
||||
}
|
||||
}),
|
||||
# 'Chapter146.5'
|
||||
# ^^ no whitespace
|
||||
("https://www.mangaread.org/manga/above-all-gods/chapter146-5/", {
|
||||
"pattern": (r"https://www\.mangaread\.org/wp-content/uploads"
|
||||
r"/WP-manga/data/manga_[^/]+/[^/]+/[^.]+\.\w+"),
|
||||
"count": 6,
|
||||
"keyword": {
|
||||
"manga" : "Above All Gods",
|
||||
"title" : "",
|
||||
"chapter" : 146,
|
||||
"chapter_minor": ".5",
|
||||
"tags" : list,
|
||||
"lang" : "en",
|
||||
"language": "English",
|
||||
}
|
||||
}),
|
||||
)
|
||||
|
||||
def metadata(self, page):
|
||||
tags = text.extr(page, 'class="wp-manga-tags-list">', '</div>')
|
||||
data = {"tags": list(text.split_html(tags)[::2])}
|
||||
info = text.extr(page, '<h1 id="chapter-heading">', "</h1>")
|
||||
if not info:
|
||||
raise exception.NotFoundError("chapter")
|
||||
self.parse_chapter_string(info, data)
|
||||
return data
|
||||
|
||||
def images(self, page):
|
||||
page = text.extr(
|
||||
page, '<div class="reading-content">', '<div class="entry-header')
|
||||
return [
|
||||
(url.strip(), None)
|
||||
for url in text.extract_iter(page, 'data-src="', '"')
|
||||
]
|
||||
|
||||
|
||||
class MangareadMangaExtractor(MangareadBase, MangaExtractor):
|
||||
"""Extractor for manga from mangaread.org"""
|
||||
chapterclass = MangareadChapterExtractor
|
||||
pattern = r"(?:https?://)?(?:www\.)?mangaread\.org(/manga/[^/?#]+)/?$"
|
||||
test = (
|
||||
("https://www.mangaread.org/manga/kanan-sama-wa-akumade-choroi", {
|
||||
"pattern": (r"https://www\.mangaread\.org/manga"
|
||||
r"/kanan-sama-wa-akumade-choroi"
|
||||
r"/chapter-\d+(-.+)?/"),
|
||||
"count" : ">= 13",
|
||||
"keyword": {
|
||||
"manga" : "Kanan-sama wa Akumade Choroi",
|
||||
"author" : ["nonco"],
|
||||
"artist" : ["nonco"],
|
||||
"type" : "Manga",
|
||||
"genres" : ["Comedy", "Romance", "Shounen", "Supernatural"],
|
||||
"rating" : float,
|
||||
"release": 2022,
|
||||
"status" : "OnGoing",
|
||||
"lang" : "en",
|
||||
"language" : "English",
|
||||
"manga_alt" : list,
|
||||
"description": str,
|
||||
}
|
||||
}),
|
||||
("https://www.mangaread.org/manga/one-piece", {
|
||||
"pattern": (r"https://www\.mangaread\.org/manga"
|
||||
r"/one-piece/chapter-\d+(-.+)?/"),
|
||||
"count" : ">= 1066",
|
||||
"keyword": {
|
||||
"manga" : "One Piece",
|
||||
"author" : ["Oda Eiichiro"],
|
||||
"artist" : ["Oda Eiichiro"],
|
||||
"type" : "Manga",
|
||||
"genres" : list,
|
||||
"rating" : float,
|
||||
"release": 1997,
|
||||
"status" : "OnGoing",
|
||||
"lang" : "en",
|
||||
"language" : "English",
|
||||
"manga_alt" : ["One Piece"],
|
||||
"description": str,
|
||||
}
|
||||
}),
|
||||
("https://www.mangaread.org/manga/doesnotexist", {
|
||||
"exception": exception.HttpError,
|
||||
}),
|
||||
)
|
||||
|
||||
def chapters(self, page):
|
||||
if 'class="error404' in page:
|
||||
raise exception.NotFoundError("manga")
|
||||
data = self.metadata(page)
|
||||
result = []
|
||||
for chapter in text.extract_iter(
|
||||
page, '<li class="wp-manga-chapter', "</li>"):
|
||||
url , pos = text.extract(chapter, '<a href="', '"')
|
||||
info, _ = text.extract(chapter, ">", "</a>", pos)
|
||||
self.parse_chapter_string(info, data)
|
||||
result.append((url, data.copy()))
|
||||
return result
|
||||
|
||||
def metadata(self, page):
|
||||
extr = text.extract_from(text.extr(
|
||||
page, 'class="summary_content">', 'class="manga-action"'))
|
||||
return {
|
||||
"manga" : text.extr(page, "<h1>", "</h1>").strip(),
|
||||
"description": text.unescape(text.remove_html(text.extract(
|
||||
page, ">", "</div>", page.index("summary__content"))[0])),
|
||||
"rating" : text.parse_float(
|
||||
extr('total_votes">', "</span>").strip()),
|
||||
"manga_alt" : text.remove_html(
|
||||
extr("Alternative </h5>\n</div>", "</div>")).split("; "),
|
||||
"author" : list(text.extract_iter(
|
||||
extr('class="author-content">', "</div>"), '"tag">', "</a>")),
|
||||
"artist" : list(text.extract_iter(
|
||||
extr('class="artist-content">', "</div>"), '"tag">', "</a>")),
|
||||
"genres" : list(text.extract_iter(
|
||||
extr('class="genres-content">', "</div>"), '"tag">', "</a>")),
|
||||
"type" : text.remove_html(
|
||||
extr("Type </h5>\n</div>", "</div>")),
|
||||
"release" : text.parse_int(text.remove_html(
|
||||
extr("Release </h5>\n</div>", "</div>"))),
|
||||
"status" : text.remove_html(
|
||||
extr("Status </h5>\n</div>", "</div>")),
|
||||
}
|
@ -90,10 +90,12 @@ class MangaseeChapterExtractor(MangaseeBase, ChapterExtractor):
|
||||
self.category = "mangalife"
|
||||
self.root = "https://manga4life.com"
|
||||
ChapterExtractor.__init__(self, match, self.root + match.group(2))
|
||||
|
||||
def _init(self):
|
||||
self.session.headers["Referer"] = self.gallery_url
|
||||
|
||||
domain = self.root.rpartition("/")[2]
|
||||
cookies = self.session.cookies
|
||||
cookies = self.cookies
|
||||
if not cookies.get("PHPSESSID", domain=domain):
|
||||
cookies.set("PHPSESSID", util.generate_token(13), domain=domain)
|
||||
|
||||
|
@ -19,14 +19,14 @@ class MangoxoExtractor(Extractor):
|
||||
"""Base class for mangoxo extractors"""
|
||||
category = "mangoxo"
|
||||
root = "https://www.mangoxo.com"
|
||||
cookiedomain = "www.mangoxo.com"
|
||||
cookienames = ("SESSION",)
|
||||
cookies_domain = "www.mangoxo.com"
|
||||
cookies_names = ("SESSION",)
|
||||
_warning = True
|
||||
|
||||
def login(self):
|
||||
username, password = self._get_auth_info()
|
||||
if username:
|
||||
self._update_cookies(self._login_impl(username, password))
|
||||
self.cookies_update(self._login_impl(username, password))
|
||||
elif MangoxoExtractor._warning:
|
||||
MangoxoExtractor._warning = False
|
||||
self.log.warning("Unauthenticated users cannot see "
|
||||
@ -51,7 +51,7 @@ class MangoxoExtractor(Extractor):
|
||||
data = response.json()
|
||||
if str(data.get("result")) != "1":
|
||||
raise exception.AuthenticationError(data.get("msg"))
|
||||
return {"SESSION": self.session.cookies.get("SESSION")}
|
||||
return {"SESSION": self.cookies.get("SESSION")}
|
||||
|
||||
@staticmethod
|
||||
def _sign_by_md5(username, password, token):
|
||||
|
@ -19,12 +19,14 @@ class MastodonExtractor(BaseExtractor):
|
||||
directory_fmt = ("mastodon", "{instance}", "{account[username]}")
|
||||
filename_fmt = "{category}_{id}_{media[id]}.{extension}"
|
||||
archive_fmt = "{media[id]}"
|
||||
cookiedomain = None
|
||||
cookies_domain = None
|
||||
|
||||
def __init__(self, match):
|
||||
BaseExtractor.__init__(self, match)
|
||||
self.instance = self.root.partition("://")[2]
|
||||
self.item = match.group(match.lastindex)
|
||||
|
||||
def _init(self):
|
||||
self.instance = self.root.partition("://")[2]
|
||||
self.reblogs = self.config("reblogs", False)
|
||||
self.replies = self.config("replies", True)
|
||||
|
||||
|
@ -1,120 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2022 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
# published by the Free Software Foundation.
|
||||
|
||||
"""Extractors for https://meme.museum/"""
|
||||
|
||||
from .common import Extractor, Message
|
||||
from .. import text
|
||||
|
||||
|
||||
class MememuseumExtractor(Extractor):
|
||||
"""Base class for meme.museum extractors"""
|
||||
basecategory = "booru"
|
||||
category = "mememuseum"
|
||||
filename_fmt = "{category}_{id}_{md5}.{extension}"
|
||||
archive_fmt = "{id}"
|
||||
root = "https://meme.museum"
|
||||
|
||||
def items(self):
|
||||
data = self.metadata()
|
||||
|
||||
for post in self.posts():
|
||||
url = post["file_url"]
|
||||
for key in ("id", "width", "height"):
|
||||
post[key] = text.parse_int(post[key])
|
||||
post["tags"] = text.unquote(post["tags"])
|
||||
post.update(data)
|
||||
yield Message.Directory, post
|
||||
yield Message.Url, url, text.nameext_from_url(url, post)
|
||||
|
||||
def metadata(self):
|
||||
"""Return general metadata"""
|
||||
return ()
|
||||
|
||||
def posts(self):
|
||||
"""Return an iterable containing data of all relevant posts"""
|
||||
return ()
|
||||
|
||||
|
||||
class MememuseumTagExtractor(MememuseumExtractor):
|
||||
"""Extractor for images from meme.museum by search-tags"""
|
||||
subcategory = "tag"
|
||||
directory_fmt = ("{category}", "{search_tags}")
|
||||
pattern = r"(?:https?://)?meme\.museum/post/list/([^/?#]+)"
|
||||
test = ("https://meme.museum/post/list/animated/1", {
|
||||
"pattern": r"https://meme\.museum/_images/\w+/\d+%20-%20",
|
||||
"count": ">= 30"
|
||||
})
|
||||
per_page = 25
|
||||
|
||||
def __init__(self, match):
|
||||
MememuseumExtractor.__init__(self, match)
|
||||
self.tags = text.unquote(match.group(1))
|
||||
|
||||
def metadata(self):
|
||||
return {"search_tags": self.tags}
|
||||
|
||||
def posts(self):
|
||||
pnum = 1
|
||||
while True:
|
||||
url = "{}/post/list/{}/{}".format(self.root, self.tags, pnum)
|
||||
extr = text.extract_from(self.request(url).text)
|
||||
|
||||
while True:
|
||||
mime = extr("data-mime='", "'")
|
||||
if not mime:
|
||||
break
|
||||
|
||||
pid = extr("data-post-id='", "'")
|
||||
tags, dimensions, size = extr("title='", "'").split(" // ")
|
||||
md5 = extr("/_thumbs/", "/")
|
||||
width, _, height = dimensions.partition("x")
|
||||
|
||||
yield {
|
||||
"file_url": "{}/_images/{}/{}%20-%20{}.{}".format(
|
||||
self.root, md5, pid, text.quote(tags),
|
||||
mime.rpartition("/")[2]),
|
||||
"id": pid, "md5": md5, "tags": tags,
|
||||
"width": width, "height": height,
|
||||
"size": text.parse_bytes(size[:-1]),
|
||||
}
|
||||
|
||||
if not extr(">Next<", ">"):
|
||||
return
|
||||
pnum += 1
|
||||
|
||||
|
||||
class MememuseumPostExtractor(MememuseumExtractor):
|
||||
"""Extractor for single images from meme.museum"""
|
||||
subcategory = "post"
|
||||
pattern = r"(?:https?://)?meme\.museum/post/view/(\d+)"
|
||||
test = ("https://meme.museum/post/view/10243", {
|
||||
"pattern": r"https://meme\.museum/_images/105febebcd5ca791ee332adc4997"
|
||||
r"1f78/10243%20-%20g%20beard%20open_source%20richard_stallm"
|
||||
r"an%20stallman%20tagme%20text\.jpg",
|
||||
"keyword": "3c8009251480cf17248c08b2b194dc0c4d59580e",
|
||||
"content": "45565f3f141fc960a8ae1168b80e718a494c52d2",
|
||||
})
|
||||
|
||||
def __init__(self, match):
|
||||
MememuseumExtractor.__init__(self, match)
|
||||
self.post_id = match.group(1)
|
||||
|
||||
def posts(self):
|
||||
url = "{}/post/view/{}".format(self.root, self.post_id)
|
||||
extr = text.extract_from(self.request(url).text)
|
||||
|
||||
return ({
|
||||
"id" : self.post_id,
|
||||
"tags" : extr(": ", "<"),
|
||||
"md5" : extr("/_thumbs/", "/"),
|
||||
"file_url": self.root + extr("id='main_image' src='", "'"),
|
||||
"width" : extr("data-width=", " ").strip("'\""),
|
||||
"height" : extr("data-height=", " ").strip("'\""),
|
||||
"size" : 0,
|
||||
},)
|
@ -7,7 +7,7 @@
|
||||
"""Extractors for Misskey instances"""
|
||||
|
||||
from .common import BaseExtractor, Message
|
||||
from .. import text
|
||||
from .. import text, exception
|
||||
|
||||
|
||||
class MisskeyExtractor(BaseExtractor):
|
||||
@ -19,14 +19,18 @@ class MisskeyExtractor(BaseExtractor):
|
||||
|
||||
def __init__(self, match):
|
||||
BaseExtractor.__init__(self, match)
|
||||
self.item = match.group(match.lastindex)
|
||||
|
||||
def _init(self):
|
||||
self.api = MisskeyAPI(self)
|
||||
self.instance = self.root.rpartition("://")[2]
|
||||
self.item = match.group(match.lastindex)
|
||||
self.renotes = self.config("renotes", False)
|
||||
self.replies = self.config("replies", True)
|
||||
|
||||
def items(self):
|
||||
for note in self.notes():
|
||||
if "note" in note:
|
||||
note = note["note"]
|
||||
files = note.pop("files") or []
|
||||
renote = note.get("renote")
|
||||
if renote:
|
||||
@ -68,7 +72,7 @@ BASE_PATTERN = MisskeyExtractor.update({
|
||||
},
|
||||
"lesbian.energy": {
|
||||
"root": "https://lesbian.energy",
|
||||
"pattern": r"lesbian\.energy"
|
||||
"pattern": r"lesbian\.energy",
|
||||
},
|
||||
"sushi.ski": {
|
||||
"root": "https://sushi.ski",
|
||||
@ -152,6 +156,21 @@ class MisskeyNoteExtractor(MisskeyExtractor):
|
||||
return (self.api.notes_show(self.item),)
|
||||
|
||||
|
||||
class MisskeyFavoriteExtractor(MisskeyExtractor):
|
||||
"""Extractor for favorited notes"""
|
||||
subcategory = "favorite"
|
||||
pattern = BASE_PATTERN + r"/(?:my|api/i)/favorites"
|
||||
test = (
|
||||
("https://misskey.io/my/favorites"),
|
||||
("https://misskey.io/api/i/favorites"),
|
||||
("https://lesbian.energy/my/favorites"),
|
||||
("https://sushi.ski/my/favorites"),
|
||||
)
|
||||
|
||||
def notes(self):
|
||||
return self.api.i_favorites()
|
||||
|
||||
|
||||
class MisskeyAPI():
|
||||
"""Interface for Misskey API
|
||||
|
||||
@ -164,6 +183,7 @@ class MisskeyAPI():
|
||||
self.root = extractor.root
|
||||
self.extractor = extractor
|
||||
self.headers = {"Content-Type": "application/json"}
|
||||
self.access_token = extractor.config("access-token")
|
||||
|
||||
def user_id_by_username(self, username):
|
||||
endpoint = "/users/show"
|
||||
@ -187,6 +207,13 @@ class MisskeyAPI():
|
||||
data = {"noteId": note_id}
|
||||
return self._call(endpoint, data)
|
||||
|
||||
def i_favorites(self):
|
||||
endpoint = "/i/favorites"
|
||||
if not self.access_token:
|
||||
raise exception.AuthenticationError()
|
||||
data = {"i": self.access_token}
|
||||
return self._pagination(endpoint, data)
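Fetching favorites requires the 'access-token' option; a minimal sketch of the call this adds (the token value is a placeholder):

# POST <root>/api/i/favorites
# body: {"i": "<access-token from the config>"}
# without a token, AuthenticationError is raised before any request is made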
|
||||
|
||||
def _call(self, endpoint, data):
|
||||
url = self.root + "/api" + endpoint
|
||||
return self.extractor.request(
|
||||
|
@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2020-2022 Mike Fährmann
|
||||
# Copyright 2020-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -166,7 +166,7 @@ class MoebooruTagExtractor(MoebooruExtractor):
|
||||
subcategory = "tag"
|
||||
directory_fmt = ("{category}", "{search_tags}")
|
||||
archive_fmt = "t_{search_tags}_{id}"
|
||||
pattern = BASE_PATTERN + r"/post\?(?:[^&#]*&)*tags=([^&#]+)"
|
||||
pattern = BASE_PATTERN + r"/post\?(?:[^&#]*&)*tags=([^&#]*)"
|
||||
test = (
|
||||
("https://yande.re/post?tags=ouzoku+armor", {
|
||||
"content": "59201811c728096b2d95ce6896fd0009235fe683",
|
||||
@ -174,6 +174,8 @@ class MoebooruTagExtractor(MoebooruExtractor):
|
||||
("https://konachan.com/post?tags=patata", {
|
||||
"content": "838cfb815e31f48160855435655ddf7bfc4ecb8d",
|
||||
}),
|
||||
# empty 'tags' (#4354)
|
||||
("https://konachan.com/post?tags="),
|
||||
("https://konachan.net/post?tags=patata"),
|
||||
("https://www.sakugabooru.com/post?tags=nichijou"),
|
||||
("https://lolibooru.moe/post?tags=ruu_%28tksymkw%29"),
|
||||
|
@ -38,7 +38,9 @@ class MyhentaigalleryGalleryExtractor(GalleryExtractor):
|
||||
self.gallery_id = match.group(1)
|
||||
url = "{}/gallery/thumbnails/{}".format(self.root, self.gallery_id)
|
||||
GalleryExtractor.__init__(self, match, url)
|
||||
self.session.headers["Referer"] = url
|
||||
|
||||
def _init(self):
|
||||
self.session.headers["Referer"] = self.gallery_url
|
||||
|
||||
def metadata(self, page):
|
||||
extr = text.extract_from(page)
|
||||
|
@ -1,12 +1,12 @@
# -*- coding: utf-8 -*-

# Copyright 2018-2022 Mike Fährmann
# Copyright 2018-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extract images from https://www.myportfolio.com/"""
"""Extractors for https://www.myportfolio.com/"""

from .common import Extractor, Message
from .. import text, exception
@ -21,7 +21,7 @@ class MyportfolioGalleryExtractor(Extractor):
archive_fmt = "{user}_{filename}"
pattern = (r"(?:myportfolio:(?:https?://)?([^/]+)|"
r"(?:https?://)?([\w-]+\.myportfolio\.com))"
r"(/[^/?&#]+)?")
r"(/[^/?#]+)?")
test = (
("https://andrewling.myportfolio.com/volvo-xc-90-hybrid", {
"url": "acea0690c76db0e5cf267648cefd86e921bc3499",

@ -1,118 +0,0 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
# published by the Free Software Foundation.
|
||||
|
||||
"""Extractors for https://nana.my.id/"""
|
||||
|
||||
from .common import GalleryExtractor, Extractor, Message
|
||||
from .. import text, util, exception
|
||||
|
||||
|
||||
class NanaGalleryExtractor(GalleryExtractor):
|
||||
"""Extractor for image galleries from nana.my.id"""
|
||||
category = "nana"
|
||||
directory_fmt = ("{category}", "{title}")
|
||||
pattern = r"(?:https?://)?nana\.my\.id/reader/([^/?#]+)"
|
||||
test = (
|
||||
(("https://nana.my.id/reader/"
|
||||
"059f7de55a4297413bfbd432ce7d6e724dd42bae"), {
|
||||
"pattern": r"https://nana\.my\.id/reader/"
|
||||
r"\w+/image/page\?path=.*\.\w+",
|
||||
"keyword": {
|
||||
"title" : "Everybody Loves Shion",
|
||||
"artist": "fuzui",
|
||||
"tags" : list,
|
||||
"count" : 29,
|
||||
},
|
||||
}),
|
||||
(("https://nana.my.id/reader/"
|
||||
"77c8712b67013e427923573379f5bafcc0c72e46"), {
|
||||
"pattern": r"https://nana\.my\.id/reader/"
|
||||
r"\w+/image/page\?path=.*\.\w+",
|
||||
"keyword": {
|
||||
"title" : "Lovey-Dovey With an Otaku-Friendly Gyaru",
|
||||
"artist": "Sueyuu",
|
||||
"tags" : ["Sueyuu"],
|
||||
"count" : 58,
|
||||
},
|
||||
}),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
self.gallery_id = match.group(1)
|
||||
url = "https://nana.my.id/reader/" + self.gallery_id
|
||||
GalleryExtractor.__init__(self, match, url)
|
||||
|
||||
def metadata(self, page):
|
||||
title = text.unescape(
|
||||
text.extr(page, '</a> ', '</div>'))
|
||||
artist = text.unescape(text.extr(
|
||||
page, '<title>', '</title>'))[len(title):-10]
|
||||
tags = text.extr(page, 'Reader.tags = "', '"')
|
||||
|
||||
return {
|
||||
"gallery_id": self.gallery_id,
|
||||
"title" : title,
|
||||
"artist" : artist[4:] if artist.startswith(" by ") else "",
|
||||
"tags" : tags.split(", ") if tags else (),
|
||||
"lang" : "en",
|
||||
"language" : "English",
|
||||
}
|
||||
|
||||
def images(self, page):
|
||||
data = util.json_loads(text.extr(page, "Reader.pages = ", ".pages"))
|
||||
return [
|
||||
("https://nana.my.id" + image, None)
|
||||
for image in data["pages"]
|
||||
]
|
||||
|
||||
|
||||
class NanaSearchExtractor(Extractor):
|
||||
"""Extractor for nana search results"""
|
||||
category = "nana"
|
||||
subcategory = "search"
|
||||
pattern = r"(?:https?://)?nana\.my\.id(?:/?\?([^#]+))"
|
||||
test = (
|
||||
('https://nana.my.id/?q=+"elf"&sort=desc', {
|
||||
"pattern": NanaGalleryExtractor.pattern,
|
||||
"range": "1-100",
|
||||
"count": 100,
|
||||
}),
|
||||
("https://nana.my.id/?q=favorites%3A", {
|
||||
"pattern": NanaGalleryExtractor.pattern,
|
||||
"count": ">= 2",
|
||||
}),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.params = text.parse_query(match.group(1))
|
||||
self.params["p"] = text.parse_int(self.params.get("p"), 1)
|
||||
self.params["q"] = self.params.get("q") or ""
|
||||
|
||||
def items(self):
|
||||
if "favorites:" in self.params["q"]:
|
||||
favkey = self.config("favkey")
|
||||
if not favkey:
|
||||
raise exception.AuthenticationError(
|
||||
"'Favorite key' not provided. "
|
||||
"Please see 'https://nana.my.id/tutorial'")
|
||||
self.session.cookies.set("favkey", favkey, domain="nana.my.id")
|
||||
|
||||
data = {"_extractor": NanaGalleryExtractor}
|
||||
while True:
|
||||
try:
|
||||
page = self.request(
|
||||
"https://nana.my.id", params=self.params).text
|
||||
except exception.HttpError:
|
||||
return
|
||||
|
||||
for gallery in text.extract_iter(
|
||||
page, '<div class="id3">', '</div>'):
|
||||
url = "https://nana.my.id" + text.extr(
|
||||
gallery, '<a href="', '"')
|
||||
yield Message.Queue, url, data
|
||||
|
||||
self.params["p"] += 1
|
@ -91,7 +91,7 @@ class NaverwebtoonEpisodeExtractor(NaverwebtoonBase, GalleryExtractor):
return {
"title_id": self.title_id,
"episode" : self.episode,
"comic" : extr("titleName: '", "'"),
"comic" : extr('titleName: "', '"'),
"tags" : [t.strip() for t in text.extract_iter(
extr("tagList: [", "}],"), '"tagName":"', '"')],
"title" : extr('"subtitle":"', '"'),

@ -21,13 +21,16 @@ class NewgroundsExtractor(Extractor):
|
||||
filename_fmt = "{category}_{_index}_{title}.{extension}"
|
||||
archive_fmt = "{_type}{_index}"
|
||||
root = "https://www.newgrounds.com"
|
||||
cookiedomain = ".newgrounds.com"
|
||||
cookienames = ("NG_GG_username", "vmk1du5I8m")
|
||||
cookies_domain = ".newgrounds.com"
|
||||
cookies_names = ("NG_GG_username", "vmk1du5I8m")
|
||||
request_interval = 1.0
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
self.user = match.group(1)
|
||||
self.user_root = "https://{}.newgrounds.com".format(self.user)
|
||||
|
||||
def _init(self):
|
||||
self.flash = self.config("flash", True)
|
||||
|
||||
fmt = self.config("format", "original")
|
||||
@ -71,11 +74,12 @@ class NewgroundsExtractor(Extractor):
|
||||
"""Return general metadata"""
|
||||
|
||||
def login(self):
|
||||
if self._check_cookies(self.cookienames):
|
||||
if self.cookies_check(self.cookies_names):
|
||||
return
|
||||
|
||||
username, password = self._get_auth_info()
|
||||
if username:
|
||||
self._update_cookies(self._login_impl(username, password))
|
||||
self.cookies_update(self._login_impl(username, password))
|
||||
|
||||
@cache(maxage=360*24*3600, keyarg=1)
|
||||
def _login_impl(self, username, password):
|
||||
@ -84,16 +88,17 @@ class NewgroundsExtractor(Extractor):
|
||||
url = self.root + "/passport/"
|
||||
response = self.request(url)
|
||||
if response.history and response.url.endswith("/social"):
|
||||
return self.session.cookies
|
||||
return self.cookies
|
||||
|
||||
page = response.text
|
||||
headers = {"Origin": self.root, "Referer": url}
|
||||
url = text.urljoin(self.root, text.extr(
|
||||
response.text, 'action="', '"'))
|
||||
url = text.urljoin(self.root, text.extr(page, 'action="', '"'))
|
||||
data = {
|
||||
"username": username,
|
||||
"password": password,
|
||||
"remember": "1",
|
||||
"login" : "1",
|
||||
"auth" : text.extr(page, 'name="auth" value="', '"'),
|
||||
}
|
||||
|
||||
response = self.request(url, method="POST", headers=headers, data=data)
|
||||
@ -103,7 +108,7 @@ class NewgroundsExtractor(Extractor):
|
||||
return {
|
||||
cookie.name: cookie.value
|
||||
for cookie in response.history[0].cookies
|
||||
if cookie.expires and cookie.domain == self.cookiedomain
|
||||
if cookie.expires and cookie.domain == self.cookies_domain
|
||||
}
|
||||
|
||||
def extract_post(self, post_url):
|
||||
@ -514,6 +519,9 @@ class NewgroundsUserExtractor(NewgroundsExtractor):
|
||||
}),
|
||||
)
|
||||
|
||||
def initialize(self):
|
||||
pass
|
||||
|
||||
def items(self):
|
||||
base = self.user_root + "/"
|
||||
return self._dispatch_extractors((
|
||||
|
@ -21,19 +21,20 @@ class NijieExtractor(AsynchronousMixin, BaseExtractor):
|
||||
archive_fmt = "{image_id}_{num}"
|
||||
|
||||
def __init__(self, match):
|
||||
self._init_category(match)
|
||||
self.cookiedomain = "." + self.root.rpartition("/")[2]
|
||||
self.cookienames = (self.category + "_tok",)
|
||||
BaseExtractor.__init__(self, match)
|
||||
self.user_id = text.parse_int(match.group(match.lastindex))
|
||||
|
||||
def initialize(self):
|
||||
self.cookies_domain = "." + self.root.rpartition("/")[2]
|
||||
self.cookies_names = (self.category + "_tok",)
|
||||
|
||||
BaseExtractor.initialize(self)
|
||||
|
||||
self.session.headers["Referer"] = self.root + "/"
|
||||
self.user_name = None
|
||||
if self.category == "horne":
|
||||
self._extract_data = self._extract_data_horne
|
||||
|
||||
BaseExtractor.__init__(self, match)
|
||||
|
||||
self.user_id = text.parse_int(match.group(match.lastindex))
|
||||
self.user_name = None
|
||||
self.session.headers["Referer"] = self.root + "/"
|
||||
|
||||
def items(self):
|
||||
self.login()
|
||||
|
||||
@ -121,10 +122,11 @@ class NijieExtractor(AsynchronousMixin, BaseExtractor):
|
||||
return text.unescape(text.extr(page, "<br />", "<"))
|
||||
|
||||
def login(self):
|
||||
"""Login and obtain session cookies"""
|
||||
if not self._check_cookies(self.cookienames):
|
||||
username, password = self._get_auth_info()
|
||||
self._update_cookies(self._login_impl(username, password))
|
||||
if self.cookies_check(self.cookies_names):
|
||||
return
|
||||
|
||||
username, password = self._get_auth_info()
|
||||
self.cookies_update(self._login_impl(username, password))
|
||||
|
||||
@cache(maxage=90*24*3600, keyarg=1)
|
||||
def _login_impl(self, username, password):
|
||||
@ -139,7 +141,7 @@ class NijieExtractor(AsynchronousMixin, BaseExtractor):
|
||||
response = self.request(url, method="POST", data=data)
|
||||
if "/login.php" in response.text:
|
||||
raise exception.AuthenticationError()
|
||||
return self.session.cookies
|
||||
return self.cookies
|
||||
|
||||
def _pagination(self, path):
|
||||
url = "{}/{}.php".format(self.root, path)
|
||||
@ -172,13 +174,16 @@ BASE_PATTERN = NijieExtractor.update({
|
||||
class NijieUserExtractor(NijieExtractor):
|
||||
"""Extractor for nijie user profiles"""
|
||||
subcategory = "user"
|
||||
cookiedomain = None
|
||||
cookies_domain = None
|
||||
pattern = BASE_PATTERN + r"/members\.php\?id=(\d+)"
|
||||
test = (
|
||||
("https://nijie.info/members.php?id=44"),
|
||||
("https://horne.red/members.php?id=58000"),
|
||||
)
|
||||
|
||||
def initialize(self):
|
||||
pass
|
||||
|
||||
def items(self):
|
||||
fmt = "{}/{{}}.php?id={}".format(self.root, self.user_id).format
|
||||
return self._dispatch_extractors((
|
||||
|
@ -21,7 +21,7 @@ class NitterExtractor(BaseExtractor):
|
||||
archive_fmt = "{tweet_id}_{num}"
|
||||
|
||||
def __init__(self, match):
|
||||
self.cookiedomain = self.root.partition("://")[2]
|
||||
self.cookies_domain = self.root.partition("://")[2]
|
||||
BaseExtractor.__init__(self, match)
|
||||
|
||||
lastindex = match.lastindex
|
||||
@ -35,7 +35,7 @@ class NitterExtractor(BaseExtractor):
|
||||
if videos:
|
||||
ytdl = (videos == "ytdl")
|
||||
videos = True
|
||||
self._cookiejar.set("hlsPlayback", "on", domain=self.cookiedomain)
|
||||
self.cookies.set("hlsPlayback", "on", domain=self.cookies_domain)
|
||||
|
||||
for tweet in self.tweets():
|
||||
|
||||
@ -162,7 +162,11 @@ class NitterExtractor(BaseExtractor):
|
||||
banner = extr('class="profile-banner"><a href="', '"')
|
||||
|
||||
try:
|
||||
uid = banner.split("%2F")[4]
|
||||
if "/enc/" in banner:
|
||||
uid = binascii.a2b_base64(banner.rpartition(
|
||||
"/")[2]).decode().split("/")[4]
|
||||
else:
|
||||
uid = banner.split("%2F")[4]
|
||||
except Exception:
|
||||
uid = 0
|
||||
|
||||
@ -302,7 +306,10 @@ class NitterTweetsExtractor(NitterExtractor):
|
||||
r"/media%2FCGMNYZvW0AIVoom\.jpg",
|
||||
"range": "1",
|
||||
}),
|
||||
("https://nitter.1d4.us/supernaturepics"),
|
||||
("https://nitter.1d4.us/supernaturepics", {
|
||||
"range": "1",
|
||||
"keyword": {"user": {"id": "2976459548"}},
|
||||
}),
|
||||
("https://nitter.kavin.rocks/id:2976459548"),
|
||||
("https://nitter.unixfox.eu/supernaturepics"),
|
||||
)
|
||||
|
@ -75,7 +75,8 @@ class NsfwalbumAlbumExtractor(GalleryExtractor):

@staticmethod
def _validate_response(response):
return not response.request.url.endswith("/no_image.jpg")
return not response.request.url.endswith(
("/no_image.jpg", "/placeholder.png"))

@staticmethod
def _annihilate(value, base=6):

@ -28,6 +28,8 @@ class OAuthBase(Extractor):
def __init__(self, match):
Extractor.__init__(self, match)
self.client = None

def _init(self):
self.cache = config.get(("extractor", self.category), "cache", True)

def oauth_config(self, key, default=None):
@ -71,8 +73,11 @@ class OAuthBase(Extractor):

browser = self.config("browser", True)
if browser:
import webbrowser
browser = webbrowser.get()
try:
import webbrowser
browser = webbrowser.get()
except Exception:
browser = None

if browser and browser.open(url):
name = getattr(browser, "name", "Browser")
@ -131,7 +136,7 @@ class OAuthBase(Extractor):

def _oauth2_authorization_code_grant(
self, client_id, client_secret, default_id, default_secret,
auth_url, token_url, *, scope="read", duration="permanent",
auth_url, token_url, scope="read", duration="permanent",
key="refresh_token", auth=True, cache=None, instance=None):
"""Perform an OAuth2 authorization code grant"""

@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2018-2022 Mike Fährmann
|
||||
# Copyright 2018-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -14,14 +14,14 @@ from .. import text
|
||||
|
||||
class PahealExtractor(Extractor):
|
||||
"""Base class for paheal extractors"""
|
||||
basecategory = "booru"
|
||||
basecategory = "shimmie2"
|
||||
category = "paheal"
|
||||
filename_fmt = "{category}_{id}_{md5}.{extension}"
|
||||
archive_fmt = "{id}"
|
||||
root = "https://rule34.paheal.net"
|
||||
|
||||
def items(self):
|
||||
self.session.cookies.set(
|
||||
self.cookies.set(
|
||||
"ui-tnc-agreed", "true", domain="rule34.paheal.net")
|
||||
data = self.get_metadata()
|
||||
|
||||
@ -55,8 +55,8 @@ class PahealExtractor(Extractor):
|
||||
"class='username' href='/user/", "'")),
|
||||
"date" : text.parse_datetime(
|
||||
extr("datetime='", "'"), "%Y-%m-%dT%H:%M:%S%z"),
|
||||
"source" : text.extract(
|
||||
extr(">Source Link<", "</td>"), "href='", "'")[0],
|
||||
"source" : text.unescape(text.extr(
|
||||
extr(">Source Link<", "</td>"), "href='", "'")),
|
||||
}
|
||||
|
||||
dimensions, size, ext = extr("Info</th><td>", ">").split(" // ")
|
||||
@ -74,16 +74,41 @@ class PahealTagExtractor(PahealExtractor):
|
||||
directory_fmt = ("{category}", "{search_tags}")
|
||||
pattern = (r"(?:https?://)?(?:rule34|rule63|cosplay)\.paheal\.net"
|
||||
r"/post/list/([^/?#]+)")
|
||||
test = ("https://rule34.paheal.net/post/list/Ayane_Suzuki/1", {
|
||||
"pattern": r"https://[^.]+\.paheal\.net/_images/\w+/\d+%20-%20",
|
||||
"count": ">= 15"
|
||||
})
|
||||
test = (
|
||||
("https://rule34.paheal.net/post/list/Ayane_Suzuki/1", {
|
||||
"pattern": r"https://[^.]+\.paheal\.net/_images/\w+/\d+%20-%20",
|
||||
"count": ">= 15"
|
||||
}),
|
||||
("https://rule34.paheal.net/post/list/Ayane_Suzuki/1", {
|
||||
"range": "1",
|
||||
"options": (("metadata", True),),
|
||||
"keyword": {
|
||||
"date": "dt:2018-01-07 07:04:05",
|
||||
"duration": 0.0,
|
||||
"extension": "jpg",
|
||||
"filename": "2446128 - Ayane_Suzuki Idolmaster "
|
||||
"idolmaster_dearly_stars Zanzi",
|
||||
"height": 768,
|
||||
"id": 2446128,
|
||||
"md5": "b0ceda9d860df1d15b60293a7eb465c1",
|
||||
"search_tags": "Ayane_Suzuki",
|
||||
"size": 205312,
|
||||
"source": "https://www.pixiv.net/member_illust.php"
|
||||
"?mode=medium&illust_id=19957280",
|
||||
"tags": "Ayane_Suzuki Idolmaster "
|
||||
"idolmaster_dearly_stars Zanzi",
|
||||
"uploader": "XXXname",
|
||||
"width": 1024,
|
||||
},
|
||||
}),
|
||||
)
|
||||
per_page = 70
|
||||
|
||||
def __init__(self, match):
|
||||
PahealExtractor.__init__(self, match)
|
||||
self.tags = text.unquote(match.group(1))
|
||||
|
||||
def _init(self):
|
||||
if self.config("metadata"):
|
||||
self._extract_data = self._extract_data_ex
|
||||
|
||||
@ -96,8 +121,9 @@ class PahealTagExtractor(PahealExtractor):
|
||||
url = "{}/post/list/{}/{}".format(self.root, self.tags, pnum)
|
||||
page = self.request(url).text
|
||||
|
||||
pos = page.find("id='image-list'")
|
||||
for post in text.extract_iter(
|
||||
page, '<img id="thumb_', 'Only</a>'):
|
||||
page, "<img id='thumb_", "Only</a>", pos):
|
||||
yield self._extract_data(post)
|
||||
|
||||
if ">Next<" not in page:
|
||||
@ -106,10 +132,10 @@ class PahealTagExtractor(PahealExtractor):
|
||||
|
||||
@staticmethod
|
||||
def _extract_data(post):
|
||||
pid , pos = text.extract(post, '', '"')
|
||||
data, pos = text.extract(post, 'title="', '"', pos)
|
||||
md5 , pos = text.extract(post, '/_thumbs/', '/', pos)
|
||||
url , pos = text.extract(post, '<a href="', '"', pos)
|
||||
pid , pos = text.extract(post, "", "'")
|
||||
data, pos = text.extract(post, "title='", "'", pos)
|
||||
md5 , pos = text.extract(post, "/_thumbs/", "/", pos)
|
||||
url , pos = text.extract(post, "<a href='", "'", pos)
|
||||
|
||||
tags, data, date = data.split("\n")
|
||||
dimensions, size, ext = data.split(" // ")
|
||||
@ -126,7 +152,7 @@ class PahealTagExtractor(PahealExtractor):
|
||||
}
|
||||
|
||||
def _extract_data_ex(self, post):
|
||||
pid = post[:post.index('"')]
|
||||
pid = post[:post.index("'")]
|
||||
return self._extract_post(pid)
|
||||
|
||||
|
||||
@ -139,19 +165,19 @@ class PahealPostExtractor(PahealExtractor):
|
||||
("https://rule34.paheal.net/post/view/481609", {
|
||||
"pattern": r"https://tulip\.paheal\.net/_images"
|
||||
r"/bbdc1c33410c2cdce7556c7990be26b7/481609%20-%20"
|
||||
r"Azumanga_Daioh%20Osaka%20Vuvuzela%20inanimate\.jpg",
|
||||
r"Azumanga_Daioh%20inanimate%20Osaka%20Vuvuzela\.jpg",
|
||||
"content": "7b924bcf150b352ac75c9d281d061e174c851a11",
|
||||
"keyword": {
|
||||
"date": "dt:2010-06-17 15:40:23",
|
||||
"extension": "jpg",
|
||||
"file_url": "re:https://tulip.paheal.net/_images/bbdc1c33410c",
|
||||
"filename": "481609 - Azumanga_Daioh Osaka Vuvuzela inanimate",
|
||||
"filename": "481609 - Azumanga_Daioh inanimate Osaka Vuvuzela",
|
||||
"height": 660,
|
||||
"id": 481609,
|
||||
"md5": "bbdc1c33410c2cdce7556c7990be26b7",
|
||||
"size": 157389,
|
||||
"source": None,
|
||||
"tags": "Azumanga_Daioh Osaka Vuvuzela inanimate",
|
||||
"source": "",
|
||||
"tags": "Azumanga_Daioh inanimate Osaka Vuvuzela",
|
||||
"uploader": "CaptainButtface",
|
||||
"width": 614,
|
||||
},
|
||||
@ -163,7 +189,7 @@ class PahealPostExtractor(PahealExtractor):
|
||||
"md5": "b39edfe455a0381110c710d6ed2ef57d",
|
||||
"size": 758989,
|
||||
"source": "http://www.furaffinity.net/view/4057821/",
|
||||
"tags": "Vuvuzela inanimate thelost-dragon",
|
||||
"tags": "inanimate thelost-dragon Vuvuzela",
|
||||
"uploader": "leacheate_soup",
|
||||
"width": 1200,
|
||||
},
|
||||
@ -171,8 +197,8 @@ class PahealPostExtractor(PahealExtractor):
|
||||
# video
|
||||
("https://rule34.paheal.net/post/view/3864982", {
|
||||
"pattern": r"https://[\w]+\.paheal\.net/_images/7629fc0ff77e32637d"
|
||||
r"de5bf4f992b2cb/3864982%20-%20Metal_Gear%20Metal_Gear_"
|
||||
r"Solid_V%20Quiet%20Vg_erotica%20animated%20webm\.webm",
|
||||
r"de5bf4f992b2cb/3864982%20-%20animated%20Metal_Gear%20"
|
||||
r"Metal_Gear_Solid_V%20Quiet%20Vg_erotica%20webm\.webm",
|
||||
"keyword": {
|
||||
"date": "dt:2020-09-06 01:59:03",
|
||||
"duration": 30.0,
|
||||
@ -183,8 +209,8 @@ class PahealPostExtractor(PahealExtractor):
|
||||
"size": 18454938,
|
||||
"source": "https://twitter.com/VG_Worklog"
|
||||
"/status/1302407696294055936",
|
||||
"tags": "Metal_Gear Metal_Gear_Solid_V Quiet "
|
||||
"Vg_erotica animated webm",
|
||||
"tags": "animated Metal_Gear Metal_Gear_Solid_V "
|
||||
"Quiet Vg_erotica webm",
|
||||
"uploader": "justausername",
|
||||
"width": 1768,
|
||||
},
|
||||
|
@ -19,7 +19,7 @@ class PatreonExtractor(Extractor):
"""Base class for patreon extractors"""
category = "patreon"
root = "https://www.patreon.com"
cookiedomain = ".patreon.com"
cookies_domain = ".patreon.com"
directory_fmt = ("{category}", "{creator[full_name]}")
filename_fmt = "{id}_{title}_{num:>02}.{extension}"
archive_fmt = "{id}_{num}"
@ -28,11 +28,11 @@ class PatreonExtractor(Extractor):
_warning = True

def items(self):

if self._warning:
if not self._check_cookies(("session_id",)):
if not self.cookies_check(("session_id",)):
self.log.warning("no 'session_id' cookie set")
PatreonExtractor._warning = False

generators = self._build_file_generators(self.config("files"))

for post in self.posts():

@ -19,39 +19,18 @@ class PhilomenaExtractor(BooruExtractor):
|
||||
filename_fmt = "{filename}.{extension}"
|
||||
archive_fmt = "{id}"
|
||||
request_interval = 1.0
|
||||
page_start = 1
|
||||
per_page = 50
|
||||
|
||||
def _init(self):
|
||||
self.api = PhilomenaAPI(self)
|
||||
|
||||
_file_url = operator.itemgetter("view_url")
|
||||
|
||||
@staticmethod
|
||||
def _prepare(post):
|
||||
post["date"] = text.parse_datetime(post["created_at"])
|
||||
|
||||
def _pagination(self, url, params):
|
||||
params["page"] = 1
|
||||
params["per_page"] = self.per_page
|
||||
|
||||
api_key = self.config("api-key")
|
||||
if api_key:
|
||||
params["key"] = api_key
|
||||
|
||||
filter_id = self.config("filter")
|
||||
if filter_id:
|
||||
params["filter_id"] = filter_id
|
||||
elif not api_key:
|
||||
try:
|
||||
params["filter_id"] = INSTANCES[self.category]["filter_id"]
|
||||
except (KeyError, TypeError):
|
||||
params["filter_id"] = "2"
|
||||
|
||||
while True:
|
||||
data = self.request(url, params=params).json()
|
||||
yield from data["images"]
|
||||
|
||||
if len(data["images"]) < self.per_page:
|
||||
return
|
||||
params["page"] += 1
|
||||
|
||||
|
||||
INSTANCES = {
|
||||
"derpibooru": {
|
||||
@ -146,8 +125,7 @@ class PhilomenaPostExtractor(PhilomenaExtractor):
|
||||
self.image_id = match.group(match.lastindex)
|
||||
|
||||
def posts(self):
|
||||
url = self.root + "/api/v1/json/images/" + self.image_id
|
||||
return (self.request(url).json()["image"],)
|
||||
return (self.api.image(self.image_id),)
|
||||
|
||||
|
||||
class PhilomenaSearchExtractor(PhilomenaExtractor):
|
||||
@ -201,8 +179,7 @@ class PhilomenaSearchExtractor(PhilomenaExtractor):
|
||||
return {"search_tags": self.params.get("q", "")}
|
||||
|
||||
def posts(self):
|
||||
url = self.root + "/api/v1/json/search/images"
|
||||
return self._pagination(url, self.params)
|
||||
return self.api.search(self.params)
|
||||
|
||||
|
||||
class PhilomenaGalleryExtractor(PhilomenaExtractor):
|
||||
@ -239,15 +216,81 @@ class PhilomenaGalleryExtractor(PhilomenaExtractor):
|
||||
self.gallery_id = match.group(match.lastindex)
|
||||
|
||||
def metadata(self):
|
||||
url = self.root + "/api/v1/json/search/galleries"
|
||||
params = {"q": "id:" + self.gallery_id}
|
||||
galleries = self.request(url, params=params).json()["galleries"]
|
||||
if not galleries:
|
||||
try:
|
||||
return {"gallery": self.api.gallery(self.gallery_id)}
|
||||
except IndexError:
|
||||
raise exception.NotFoundError("gallery")
|
||||
return {"gallery": galleries[0]}
|
||||
|
||||
def posts(self):
|
||||
gallery_id = "gallery_id:" + self.gallery_id
|
||||
url = self.root + "/api/v1/json/search/images"
|
||||
params = {"sd": "desc", "sf": gallery_id, "q": gallery_id}
|
||||
return self._pagination(url, params)
|
||||
return self.api.search(params)
|
||||
|
||||
|
||||
class PhilomenaAPI():
|
||||
"""Interface for the Philomena API
|
||||
|
||||
https://www.derpibooru.org/pages/api
|
||||
"""
|
||||
|
||||
def __init__(self, extractor):
|
||||
self.extractor = extractor
|
||||
self.root = extractor.root + "/api"
|
||||
|
||||
def gallery(self, gallery_id):
|
||||
endpoint = "/v1/json/search/galleries"
|
||||
params = {"q": "id:" + gallery_id}
|
||||
return self._call(endpoint, params)["galleries"][0]
|
||||
|
||||
def image(self, image_id):
|
||||
endpoint = "/v1/json/images/" + image_id
|
||||
return self._call(endpoint)["image"]
|
||||
|
||||
def search(self, params):
|
||||
endpoint = "/v1/json/search/images"
|
||||
return self._pagination(endpoint, params)
|
||||
|
||||
def _call(self, endpoint, params=None):
|
||||
url = self.root + endpoint
|
||||
|
||||
while True:
|
||||
response = self.extractor.request(url, params=params, fatal=None)
|
||||
|
||||
if response.status_code < 400:
|
||||
return response.json()
|
||||
|
||||
if response.status_code == 429:
|
||||
self.extractor.wait(seconds=600)
|
||||
continue
|
||||
|
||||
# error
|
||||
self.extractor.log.debug(response.content)
|
||||
raise exception.StopExtraction(
|
||||
"%s %s", response.status_code, response.reason)
|
||||
|
||||
def _pagination(self, endpoint, params):
|
||||
extr = self.extractor
|
||||
|
||||
api_key = extr.config("api-key")
|
||||
if api_key:
|
||||
params["key"] = api_key
|
||||
|
||||
filter_id = extr.config("filter")
|
||||
if filter_id:
|
||||
params["filter_id"] = filter_id
|
||||
elif not api_key:
|
||||
try:
|
||||
params["filter_id"] = INSTANCES[extr.category]["filter_id"]
|
||||
except (KeyError, TypeError):
|
||||
params["filter_id"] = "2"
|
||||
|
||||
params["page"] = extr.page_start
|
||||
params["per_page"] = extr.per_page
|
||||
|
||||
while True:
|
||||
data = self._call(endpoint, params)
|
||||
yield from data["images"]
|
||||
|
||||
if len(data["images"]) < extr.per_page:
|
||||
return
|
||||
params["page"] += 1
|
||||
|
@ -48,9 +48,10 @@ class PhotobucketAlbumExtractor(Extractor):
)

def __init__(self, match):
Extractor.__init__(self, match)
self.album_path = ""
self.root = "https://" + match.group(1)
Extractor.__init__(self, match)

def _init(self):
self.session.headers["Referer"] = self.url

def items(self):
@ -129,6 +130,8 @@ class PhotobucketImageExtractor(Extractor):
Extractor.__init__(self, match)
self.user = match.group(1) or match.group(3)
self.media_id = match.group(2)

def _init(self):
self.session.headers["Referer"] = self.url

def items(self):

@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-

# Copyright 2018-2022 Mike Fährmann
# Copyright 2018-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
@ -19,7 +19,7 @@ class PiczelExtractor(Extractor):
filename_fmt = "{category}_{id}_{title}_{num:>02}.{extension}"
archive_fmt = "{id}_{num}"
root = "https://piczel.tv"
api_root = "https://tombstone.piczel.tv"
api_root = root

def items(self):
for post in self.posts():

@ -24,7 +24,7 @@ class PillowfortExtractor(Extractor):
|
||||
filename_fmt = ("{post_id} {title|original_post[title]:?/ /}"
|
||||
"{num:>02}.{extension}")
|
||||
archive_fmt = "{id}"
|
||||
cookiedomain = "www.pillowfort.social"
|
||||
cookies_domain = "www.pillowfort.social"
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
@ -82,15 +82,14 @@ class PillowfortExtractor(Extractor):
|
||||
yield msgtype, url, post
|
||||
|
||||
def login(self):
|
||||
cget = self.session.cookies.get
|
||||
if cget("_Pf_new_session", domain=self.cookiedomain) \
|
||||
or cget("remember_user_token", domain=self.cookiedomain):
|
||||
if self.cookies.get("_Pf_new_session", domain=self.cookies_domain):
|
||||
return
|
||||
if self.cookies.get("remember_user_token", domain=self.cookies_domain):
|
||||
return
|
||||
|
||||
username, password = self._get_auth_info()
|
||||
if username:
|
||||
cookies = self._login_impl(username, password)
|
||||
self._update_cookies(cookies)
|
||||
self.cookies_update(self._login_impl(username, password))
|
||||
|
||||
@cache(maxage=14*24*3600, keyarg=1)
|
||||
def _login_impl(self, username, password):
|
||||
|
@ -23,12 +23,10 @@ class PinterestExtractor(Extractor):
|
||||
archive_fmt = "{id}{media_id}"
|
||||
root = "https://www.pinterest.com"
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
|
||||
def _init(self):
|
||||
domain = self.config("domain")
|
||||
if not domain or domain == "auto" :
|
||||
self.root = text.root_from_url(match.group(0))
|
||||
self.root = text.root_from_url(self.url)
|
||||
else:
|
||||
self.root = text.ensure_http_scheme(domain)
|
||||
|
||||
@ -112,7 +110,7 @@ class PinterestExtractor(Extractor):
|
||||
class PinterestPinExtractor(PinterestExtractor):
|
||||
"""Extractor for images from a single pin from pinterest.com"""
|
||||
subcategory = "pin"
|
||||
pattern = BASE_PATTERN + r"/pin/([^/?#&]+)(?!.*#related$)"
|
||||
pattern = BASE_PATTERN + r"/pin/([^/?#]+)(?!.*#related$)"
|
||||
test = (
|
||||
("https://www.pinterest.com/pin/858146903966145189/", {
|
||||
"url": "afb3c26719e3a530bb0e871c480882a801a4e8a5",
|
||||
@ -121,7 +119,7 @@ class PinterestPinExtractor(PinterestExtractor):
|
||||
}),
|
||||
# video pin (#1189)
|
||||
("https://www.pinterest.com/pin/422564377542934214/", {
|
||||
"pattern": r"https://v\.pinimg\.com/videos/mc/hls/d7/22/ff"
|
||||
"pattern": r"https://v\d*\.pinimg\.com/videos/mc/hls/d7/22/ff"
|
||||
r"/d722ff00ab2352981b89974b37909de8.m3u8",
|
||||
}),
|
||||
("https://www.pinterest.com/pin/858146903966145188/", {
|
||||
@ -147,8 +145,8 @@ class PinterestBoardExtractor(PinterestExtractor):
|
||||
subcategory = "board"
|
||||
directory_fmt = ("{category}", "{board[owner][username]}", "{board[name]}")
|
||||
archive_fmt = "{board[id]}_{id}"
|
||||
pattern = (BASE_PATTERN + r"/(?!pin/)([^/?#&]+)"
|
||||
"/(?!_saved|_created|pins/)([^/?#&]+)/?$")
|
||||
pattern = (BASE_PATTERN + r"/(?!pin/)([^/?#]+)"
|
||||
"/(?!_saved|_created|pins/)([^/?#]+)/?$")
|
||||
test = (
|
||||
("https://www.pinterest.com/g1952849/test-/", {
|
||||
"pattern": r"https://i\.pinimg\.com/originals/",
|
||||
@ -198,7 +196,7 @@ class PinterestBoardExtractor(PinterestExtractor):
|
||||
class PinterestUserExtractor(PinterestExtractor):
|
||||
"""Extractor for a user's boards"""
|
||||
subcategory = "user"
|
||||
pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)(?:/_saved)?/?$"
|
||||
pattern = BASE_PATTERN + r"/(?!pin/)([^/?#]+)(?:/_saved)?/?$"
|
||||
test = (
|
||||
("https://www.pinterest.com/g1952849/", {
|
||||
"pattern": PinterestBoardExtractor.pattern,
|
||||
@ -223,7 +221,7 @@ class PinterestAllpinsExtractor(PinterestExtractor):
|
||||
"""Extractor for a user's 'All Pins' feed"""
|
||||
subcategory = "allpins"
|
||||
directory_fmt = ("{category}", "{user}")
|
||||
pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)/pins/?$"
|
||||
pattern = BASE_PATTERN + r"/(?!pin/)([^/?#]+)/pins/?$"
|
||||
test = ("https://www.pinterest.com/g1952849/pins/", {
|
||||
"pattern": r"https://i\.pinimg\.com/originals/[0-9a-f]{2}"
|
||||
r"/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{32}\.\w{3}",
|
||||
@ -245,10 +243,10 @@ class PinterestCreatedExtractor(PinterestExtractor):
|
||||
"""Extractor for a user's created pins"""
|
||||
subcategory = "created"
|
||||
directory_fmt = ("{category}", "{user}")
|
||||
pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)/_created/?$"
|
||||
pattern = BASE_PATTERN + r"/(?!pin/)([^/?#]+)/_created/?$"
|
||||
test = ("https://www.pinterest.de/digitalmomblog/_created/", {
|
||||
"pattern": r"https://i\.pinimg\.com/originals/[0-9a-f]{2}"
|
||||
r"/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{32}\.jpg",
|
||||
r"/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{32}\.(jpg|png)",
|
||||
"count": 10,
|
||||
"range": "1-10",
|
||||
})
|
||||
@ -270,7 +268,7 @@ class PinterestSectionExtractor(PinterestExtractor):
|
||||
directory_fmt = ("{category}", "{board[owner][username]}",
|
||||
"{board[name]}", "{section[title]}")
|
||||
archive_fmt = "{board[id]}_{id}"
|
||||
pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)/([^/?#&]+)/([^/?#&]+)"
|
||||
pattern = BASE_PATTERN + r"/(?!pin/)([^/?#]+)/([^/?#]+)/([^/?#]+)"
|
||||
test = ("https://www.pinterest.com/g1952849/stuff/section", {
|
||||
"count": 2,
|
||||
})
|
||||
@ -321,7 +319,7 @@ class PinterestRelatedPinExtractor(PinterestPinExtractor):
|
||||
"""Extractor for related pins of another pin from pinterest.com"""
|
||||
subcategory = "related-pin"
|
||||
directory_fmt = ("{category}", "related {original_pin[id]}")
|
||||
pattern = BASE_PATTERN + r"/pin/([^/?#&]+).*#related$"
|
||||
pattern = BASE_PATTERN + r"/pin/([^/?#]+).*#related$"
|
||||
test = ("https://www.pinterest.com/pin/858146903966145189/#related", {
|
||||
"range": "31-70",
|
||||
"count": 40,
|
||||
@ -340,7 +338,7 @@ class PinterestRelatedBoardExtractor(PinterestBoardExtractor):
|
||||
subcategory = "related-board"
|
||||
directory_fmt = ("{category}", "{board[owner][username]}",
|
||||
"{board[name]}", "related")
|
||||
pattern = BASE_PATTERN + r"/(?!pin/)([^/?#&]+)/([^/?#&]+)/?#related$"
|
||||
pattern = BASE_PATTERN + r"/(?!pin/)([^/?#]+)/([^/?#]+)/?#related$"
|
||||
test = ("https://www.pinterest.com/g1952849/test-/#related", {
|
||||
"range": "31-70",
|
||||
"count": 40,
|
||||
@ -348,13 +346,13 @@ class PinterestRelatedBoardExtractor(PinterestBoardExtractor):
|
||||
})
|
||||
|
||||
def pins(self):
|
||||
return self.api.board_related(self.board["id"])
|
||||
return self.api.board_content_recommendation(self.board["id"])
|
||||
|
||||
|
||||
class PinterestPinitExtractor(PinterestExtractor):
|
||||
"""Extractor for images from a pin.it URL"""
|
||||
subcategory = "pinit"
|
||||
pattern = r"(?:https?://)?pin\.it/([^/?#&]+)"
|
||||
pattern = r"(?:https?://)?pin\.it/([^/?#]+)"
|
||||
|
||||
test = (
|
||||
("https://pin.it/Hvt8hgT", {
|
||||
@ -370,7 +368,7 @@ class PinterestPinitExtractor(PinterestExtractor):
|
||||
self.shortened_id = match.group(1)
|
||||
|
||||
def items(self):
|
||||
url = "https://api.pinterest.com/url_shortener/{}/redirect".format(
|
||||
url = "https://api.pinterest.com/url_shortener/{}/redirect/".format(
|
||||
self.shortened_id)
|
||||
response = self.request(url, method="HEAD", allow_redirects=False)
|
||||
location = response.headers.get("Location")
|
||||
@ -458,10 +456,10 @@ class PinterestAPI():
|
||||
options = {"section_id": section_id}
|
||||
return self._pagination("BoardSectionPins", options)
|
||||
|
||||
def board_related(self, board_id):
|
||||
def board_content_recommendation(self, board_id):
|
||||
"""Yield related pins of a specific board"""
|
||||
options = {"board_id": board_id, "add_vase": True}
|
||||
return self._pagination("BoardRelatedPixieFeed", options)
|
||||
options = {"id": board_id, "type": "board", "add_vase": True}
|
||||
return self._pagination("BoardContentRecommendation", options)
|
||||
|
||||
def user_pins(self, user):
|
||||
"""Yield all pins from 'user'"""
|
||||
|
@ -15,6 +15,9 @@ from datetime import datetime, timedelta
|
||||
import itertools
|
||||
import hashlib
|
||||
|
||||
BASE_PATTERN = r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net"
|
||||
USER_PATTERN = BASE_PATTERN + r"/(?:en/)?users/(\d+)"
|
||||
|
||||
|
||||
class PixivExtractor(Extractor):
|
||||
"""Base class for pixiv extractors"""
|
||||
@ -23,10 +26,9 @@ class PixivExtractor(Extractor):
|
||||
directory_fmt = ("{category}", "{user[id]} {user[account]}")
|
||||
filename_fmt = "{id}_p{num}.{extension}"
|
||||
archive_fmt = "{id}{suffix}.{extension}"
|
||||
cookiedomain = None
|
||||
cookies_domain = None
|
||||
|
||||
def __init__(self, match):
|
||||
Extractor.__init__(self, match)
|
||||
def _init(self):
|
||||
self.api = PixivAppAPI(self)
|
||||
self.load_ugoira = self.config("ugoira", True)
|
||||
self.max_posts = self.config("max-posts", 0)
|
||||
@ -44,6 +46,8 @@ class PixivExtractor(Extractor):
|
||||
def transform_tags(work):
|
||||
work["tags"] = [tag["name"] for tag in work["tags"]]
|
||||
|
||||
url_sanity = ("https://s.pximg.net/common/images"
|
||||
"/limit_sanity_level_360.png")
|
||||
ratings = {0: "General", 1: "R-18", 2: "R-18G"}
|
||||
meta_user = self.config("metadata")
|
||||
meta_bookmark = self.config("metadata-bookmark")
|
||||
@ -99,6 +103,10 @@ class PixivExtractor(Extractor):
|
||||
|
||||
elif work["page_count"] == 1:
|
||||
url = meta_single_page["original_image_url"]
|
||||
if url == url_sanity:
|
||||
self.log.debug("Skipping 'sanity_level' warning (%s)",
|
||||
work["id"])
|
||||
continue
|
||||
work["date_url"] = self._date_from_url(url)
|
||||
yield Message.Url, url, text.nameext_from_url(url, work)
|
||||
|
||||
@ -150,7 +158,7 @@ class PixivExtractor(Extractor):
|
||||
class PixivUserExtractor(PixivExtractor):
|
||||
"""Extractor for a pixiv user profile"""
|
||||
subcategory = "user"
|
||||
pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net/(?:"
|
||||
pattern = (BASE_PATTERN + r"/(?:"
|
||||
r"(?:en/)?u(?:sers)?/|member\.php\?id=|(?:mypage\.php)?#id="
|
||||
r")(\d+)(?:$|[?#])")
|
||||
test = (
|
||||
@ -165,20 +173,25 @@ class PixivUserExtractor(PixivExtractor):
|
||||
PixivExtractor.__init__(self, match)
|
||||
self.user_id = match.group(1)
|
||||
|
||||
def initialize(self):
|
||||
pass
|
||||
|
||||
def items(self):
|
||||
base = "{}/users/{}/".format(self.root, self.user_id)
|
||||
return self._dispatch_extractors((
|
||||
(PixivAvatarExtractor , base + "avatar"),
|
||||
(PixivBackgroundExtractor, base + "background"),
|
||||
(PixivArtworksExtractor , base + "artworks"),
|
||||
(PixivFavoriteExtractor , base + "bookmarks/artworks"),
|
||||
(PixivAvatarExtractor , base + "avatar"),
|
||||
(PixivBackgroundExtractor , base + "background"),
|
||||
(PixivArtworksExtractor , base + "artworks"),
|
||||
(PixivFavoriteExtractor , base + "bookmarks/artworks"),
|
||||
(PixivNovelBookmarkExtractor, base + "bookmarks/novels"),
|
||||
(PixivNovelUserExtractor , base + "novels"),
|
||||
), ("artworks",))
|
||||
|
||||
|
||||
class PixivArtworksExtractor(PixivExtractor):
|
||||
"""Extractor for artworks of a pixiv user"""
|
||||
subcategory = "artworks"
|
||||
pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net/(?:"
|
||||
pattern = (BASE_PATTERN + r"/(?:"
|
||||
r"(?:en/)?users/(\d+)/(?:artworks|illustrations|manga)"
|
||||
r"(?:/([^/?#]+))?/?(?:$|[?#])"
|
||||
r"|member_illust\.php\?id=(\d+)(?:&([^#]+))?)")
|
||||
@ -239,8 +252,7 @@ class PixivAvatarExtractor(PixivExtractor):
|
||||
subcategory = "avatar"
|
||||
filename_fmt = "avatar{date:?_//%Y-%m-%d}.{extension}"
|
||||
archive_fmt = "avatar_{user[id]}_{date}"
|
||||
pattern = (r"(?:https?://)?(?:www\.)?pixiv\.net"
|
||||
r"/(?:en/)?users/(\d+)/avatar")
|
||||
pattern = USER_PATTERN + r"/avatar"
|
||||
test = ("https://www.pixiv.net/en/users/173530/avatar", {
|
||||
"content": "4e57544480cc2036ea9608103e8f024fa737fe66",
|
||||
})
|
||||
@ -260,8 +272,7 @@ class PixivBackgroundExtractor(PixivExtractor):
|
||||
subcategory = "background"
|
||||
filename_fmt = "background{date:?_//%Y-%m-%d}.{extension}"
|
||||
archive_fmt = "background_{user[id]}_{date}"
|
||||
pattern = (r"(?:https?://)?(?:www\.)?pixiv\.net"
|
||||
r"/(?:en/)?users/(\d+)/background")
|
||||
pattern = USER_PATTERN + "/background"
|
||||
test = ("https://www.pixiv.net/en/users/194921/background", {
|
||||
"pattern": r"https://i\.pximg\.net/background/img/2021/01/30/16/12/02"
|
||||
r"/194921_af1f71e557a42f499213d4b9eaccc0f8\.jpg",
|
||||
@ -375,12 +386,12 @@ class PixivWorkExtractor(PixivExtractor):
|
||||
|
||||
|
||||
class PixivFavoriteExtractor(PixivExtractor):
|
||||
"""Extractor for all favorites/bookmarks of a pixiv-user"""
|
||||
"""Extractor for all favorites/bookmarks of a pixiv user"""
|
||||
subcategory = "favorite"
|
||||
directory_fmt = ("{category}", "bookmarks",
|
||||
"{user_bookmark[id]} {user_bookmark[account]}")
|
||||
archive_fmt = "f_{user_bookmark[id]}_{id}{num}.{extension}"
|
||||
pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net/(?:(?:en/)?"
|
||||
pattern = (BASE_PATTERN + r"/(?:(?:en/)?"
|
||||
r"users/(\d+)/(bookmarks/artworks|following)(?:/([^/?#]+))?"
|
||||
r"|bookmark\.php)(?:\?([^#]*))?")
|
||||
test = (
|
||||
@ -483,8 +494,7 @@ class PixivRankingExtractor(PixivExtractor):
|
||||
archive_fmt = "r_{ranking[mode]}_{ranking[date]}_{id}{num}.{extension}"
|
||||
directory_fmt = ("{category}", "rankings",
|
||||
"{ranking[mode]}", "{ranking[date]}")
|
||||
pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net"
|
||||
r"/ranking\.php(?:\?([^#]*))?")
|
||||
pattern = BASE_PATTERN + r"/ranking\.php(?:\?([^#]*))?"
|
||||
test = (
|
||||
("https://www.pixiv.net/ranking.php?mode=daily&date=20170818"),
|
||||
("https://www.pixiv.net/ranking.php"),
|
||||
@ -549,8 +559,7 @@ class PixivSearchExtractor(PixivExtractor):
|
||||
subcategory = "search"
|
||||
archive_fmt = "s_{search[word]}_{id}{num}.{extension}"
|
||||
directory_fmt = ("{category}", "search", "{search[word]}")
|
||||
pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net"
|
||||
r"/(?:(?:en/)?tags/([^/?#]+)(?:/[^/?#]+)?/?"
|
||||
pattern = (BASE_PATTERN + r"/(?:(?:en/)?tags/([^/?#]+)(?:/[^/?#]+)?/?"
|
||||
r"|search\.php)(?:\?([^#]+))?")
|
||||
test = (
|
||||
("https://www.pixiv.net/en/tags/Original", {
|
||||
@ -596,6 +605,9 @@ class PixivSearchExtractor(PixivExtractor):
|
||||
sort_map = {
|
||||
"date": "date_asc",
|
||||
"date_d": "date_desc",
|
||||
"popular_d": "popular_desc",
|
||||
"popular_male_d": "popular_male_desc",
|
||||
"popular_female_d": "popular_female_desc",
|
||||
}
|
||||
try:
|
||||
self.sort = sort = sort_map[sort]
|
||||
@ -630,8 +642,7 @@ class PixivFollowExtractor(PixivExtractor):
|
||||
subcategory = "follow"
|
||||
archive_fmt = "F_{user_follow[id]}_{id}{num}.{extension}"
|
||||
directory_fmt = ("{category}", "following")
|
||||
pattern = (r"(?:https?://)?(?:www\.|touch\.)?pixiv\.net"
|
||||
r"/bookmark_new_illust\.php")
|
||||
pattern = BASE_PATTERN + r"/bookmark_new_illust\.php"
|
||||
test = (
|
||||
("https://www.pixiv.net/bookmark_new_illust.php"),
|
||||
("https://touch.pixiv.net/bookmark_new_illust.php"),
|
||||
@ -670,7 +681,7 @@ class PixivPixivisionExtractor(PixivExtractor):
|
||||
|
||||
def works(self):
|
||||
return (
|
||||
self.api.illust_detail(illust_id)
|
||||
self.api.illust_detail(illust_id.partition("?")[0])
|
||||
for illust_id in util.unique_sequence(text.extract_iter(
|
||||
self.page, '<a href="https://www.pixiv.net/en/artworks/', '"'))
|
||||
)
|
||||
@ -693,8 +704,7 @@ class PixivSeriesExtractor(PixivExtractor):
|
||||
directory_fmt = ("{category}", "{user[id]} {user[account]}",
|
||||
"{series[id]} {series[title]}")
|
||||
filename_fmt = "{num_series:>03}_{id}_p{num}.{extension}"
|
||||
pattern = (r"(?:https?://)?(?:www\.)?pixiv\.net"
|
||||
r"/user/(\d+)/series/(\d+)")
|
||||
pattern = BASE_PATTERN + r"/user/(\d+)/series/(\d+)"
|
||||
test = ("https://www.pixiv.net/user/10509347/series/21859", {
|
||||
"range": "1-10",
|
||||
"count": 10,
|
||||
@ -747,6 +757,220 @@ class PixivSeriesExtractor(PixivExtractor):
|
||||
params["p"] += 1
|
||||
|
||||
|
||||
class PixivNovelExtractor(PixivExtractor):
|
||||
"""Extractor for pixiv novels"""
|
||||
subcategory = "novel"
|
||||
request_interval = 1.0
|
||||
pattern = BASE_PATTERN + r"/n(?:ovel/show\.php\?id=|/)(\d+)"
|
||||
test = (
|
||||
("https://www.pixiv.net/novel/show.php?id=19612040", {
|
||||
"count": 1,
|
||||
"content": "8c818474153cbd2f221ee08766e1d634c821d8b4",
|
||||
"keyword": {
|
||||
"caption": r"re:「無能な名無し」と呼ばれ虐げられて育った鈴\(すず\)は、",
|
||||
"comment_access_control": 0,
|
||||
"create_date": "2023-04-02T15:18:58+09:00",
|
||||
"date": "dt:2023-04-02 06:18:58",
|
||||
"id": 19612040,
|
||||
"is_bookmarked": False,
|
||||
"is_muted": False,
|
||||
"is_mypixiv_only": False,
|
||||
"is_original": True,
|
||||
"is_x_restricted": False,
|
||||
"novel_ai_type": 1,
|
||||
"page_count": 1,
|
||||
"rating": "General",
|
||||
"restrict": 0,
|
||||
"series": {
|
||||
"id": 10278364,
|
||||
"title": "龍の贄嫁〜無能な名無しと虐げられていましたが、"
|
||||
"どうやら異母妹に霊力を搾取されていたようです〜",
|
||||
},
|
||||
"tags": ["和風ファンタジー", "溺愛", "神様", "ヤンデレ", "執着",
|
||||
"異能", "ざまぁ", "学園", "神嫁"],
|
||||
"text_length": 5974,
|
||||
"title": "異母妹から「無能な名無し」と虐げられていた私、"
|
||||
"どうやら異母妹に霊力を搾取されていたようです(1)",
|
||||
"user": {
|
||||
"account": "yukinaga_chifuyu",
|
||||
"id": 77055466,
|
||||
},
|
||||
"visible": True,
|
||||
"x_restrict": 0,
|
||||
},
|
||||
}),
|
||||
# embeds
|
||||
("https://www.pixiv.net/novel/show.php?id=16422450", {
|
||||
"options": (("embeds", True),),
|
||||
"count": 3,
|
||||
}),
|
||||
# full series
|
||||
("https://www.pixiv.net/novel/show.php?id=19612040", {
|
||||
"options": (("full-series", True),),
|
||||
"count": 4,
|
||||
}),
|
||||
# short URL
|
||||
("https://www.pixiv.net/n/19612040"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
PixivExtractor.__init__(self, match)
|
||||
self.novel_id = match.group(1)
|
||||
|
||||
def items(self):
|
||||
tags = self.config("tags", "japanese")
|
||||
if tags == "original":
|
||||
transform_tags = None
|
||||
elif tags == "translated":
|
||||
def transform_tags(work):
|
||||
work["tags"] = list(dict.fromkeys(
|
||||
tag["translated_name"] or tag["name"]
|
||||
for tag in work["tags"]))
|
||||
else:
|
||||
def transform_tags(work):
|
||||
work["tags"] = [tag["name"] for tag in work["tags"]]
|
||||
|
||||
ratings = {0: "General", 1: "R-18", 2: "R-18G"}
|
||||
meta_user = self.config("metadata")
|
||||
meta_bookmark = self.config("metadata-bookmark")
|
||||
embeds = self.config("embeds")
|
||||
|
||||
if embeds:
|
||||
headers = {
|
||||
"User-Agent" : "Mozilla/5.0",
|
||||
"App-OS" : None,
|
||||
"App-OS-Version": None,
|
||||
"App-Version" : None,
|
||||
"Referer" : self.root + "/",
|
||||
"Authorization" : None,
|
||||
}
|
||||
|
||||
novels = self.novels()
|
||||
if self.max_posts:
|
||||
novels = itertools.islice(novels, self.max_posts)
|
||||
for novel in novels:
|
||||
if meta_user:
|
||||
novel.update(self.api.user_detail(novel["user"]["id"]))
|
||||
if meta_bookmark and novel["is_bookmarked"]:
|
||||
detail = self.api.novel_bookmark_detail(novel["id"])
|
||||
novel["tags_bookmark"] = [tag["name"] for tag in detail["tags"]
|
||||
if tag["is_registered"]]
|
||||
if transform_tags:
|
||||
transform_tags(novel)
|
||||
novel["num"] = 0
|
||||
novel["date"] = text.parse_datetime(novel["create_date"])
|
||||
novel["rating"] = ratings.get(novel["x_restrict"])
|
||||
novel["suffix"] = ""
|
||||
|
||||
yield Message.Directory, novel
|
||||
|
||||
novel["extension"] = "txt"
|
||||
content = self.api.novel_text(novel["id"])["novel_text"]
|
||||
yield Message.Url, "text:" + content, novel
|
||||
|
||||
if embeds:
|
||||
desktop = False
|
||||
illusts = {}
|
||||
|
||||
for marker in text.extract_iter(content, "[", "]"):
|
||||
if marker.startswith("[jumpuri:If you would like to "):
|
||||
desktop = True
|
||||
elif marker.startswith("pixivimage:"):
|
||||
illusts[marker[11:].partition("-")[0]] = None
|
||||
|
||||
if desktop:
|
||||
novel_id = str(novel["id"])
|
||||
url = "{}/novel/show.php?id={}".format(
|
||||
self.root, novel_id)
|
||||
data = util.json_loads(text.extr(
|
||||
self.request(url, headers=headers).text,
|
||||
"id=\"meta-preload-data\" content='", "'"))
|
||||
|
||||
for image in (data["novel"][novel_id]
|
||||
["textEmbeddedImages"]).values():
|
||||
url = image.pop("urls")["original"]
|
||||
novel.update(image)
|
||||
novel["date_url"] = self._date_from_url(url)
|
||||
novel["num"] += 1
|
||||
novel["suffix"] = "_p{:02}".format(novel["num"])
|
||||
text.nameext_from_url(url, novel)
|
||||
yield Message.Url, url, novel
|
||||
|
||||
if illusts:
|
||||
novel["_extractor"] = PixivWorkExtractor
|
||||
novel["date_url"] = None
|
||||
for illust_id in illusts:
|
||||
novel["num"] += 1
|
||||
novel["suffix"] = "_p{:02}".format(novel["num"])
|
||||
url = "{}/artworks/{}".format(self.root, illust_id)
|
||||
yield Message.Queue, url, novel
|
||||
|
||||
def novels(self):
|
||||
novel = self.api.novel_detail(self.novel_id)
|
||||
if self.config("full-series") and novel["series"]:
|
||||
self.subcategory = PixivNovelSeriesExtractor.subcategory
|
||||
return self.api.novel_series(novel["series"]["id"])
|
||||
return (novel,)
|
||||
|
||||
|
||||
class PixivNovelUserExtractor(PixivNovelExtractor):
|
||||
"""Extractor for pixiv users' novels"""
|
||||
subcategory = "novel-user"
|
||||
pattern = USER_PATTERN + r"/novels"
|
||||
test = ("https://www.pixiv.net/en/users/77055466/novels", {
|
||||
"pattern": "^text:",
|
||||
"range": "1-5",
|
||||
"count": 5,
|
||||
})
|
||||
|
||||
def novels(self):
|
||||
return self.api.user_novels(self.novel_id)
|
||||
|
||||
|
||||
class PixivNovelSeriesExtractor(PixivNovelExtractor):
|
||||
"""Extractor for pixiv novel series"""
|
||||
subcategory = "novel-series"
|
||||
pattern = BASE_PATTERN + r"/novel/series/(\d+)"
|
||||
test = ("https://www.pixiv.net/novel/series/10278364", {
|
||||
"count": 4,
|
||||
"content": "b06abed001b3f6ccfb1579699e9a238b46d38ea2",
|
||||
})
|
||||
|
||||
def novels(self):
|
||||
return self.api.novel_series(self.novel_id)
|
||||
|
||||
|
||||
class PixivNovelBookmarkExtractor(PixivNovelExtractor):
|
||||
"""Extractor for bookmarked pixiv novels"""
|
||||
subcategory = "novel-bookmark"
|
||||
pattern = (USER_PATTERN + r"/bookmarks/novels"
|
||||
r"(?:/([^/?#]+))?(?:/?\?([^#]+))?")
|
||||
test = (
|
||||
("https://www.pixiv.net/en/users/77055466/bookmarks/novels", {
|
||||
"count": 1,
|
||||
"content": "7194e8faa876b2b536f185ee271a2b6e46c69089",
|
||||
}),
|
||||
("https://www.pixiv.net/en/users/11/bookmarks/novels/TAG?rest=hide"),
|
||||
)
|
||||
|
||||
def __init__(self, match):
|
||||
PixivNovelExtractor.__init__(self, match)
|
||||
self.user_id, self.tag, self.query = match.groups()
|
||||
|
||||
def novels(self):
|
||||
if self.tag:
|
||||
tag = text.unquote(self.tag)
|
||||
else:
|
||||
tag = None
|
||||
|
||||
if text.parse_query(self.query).get("rest") == "hide":
|
||||
restrict = "private"
|
||||
else:
|
||||
restrict = "public"
|
||||
|
||||
return self.api.user_bookmarks_novel(self.user_id, tag, restrict)
|
||||
|
||||
|
||||
class PixivSketchExtractor(Extractor):
|
||||
"""Extractor for user pages on sketch.pixiv.net"""
|
||||
category = "pixiv"
|
||||
@ -755,7 +979,7 @@ class PixivSketchExtractor(Extractor):
|
||||
filename_fmt = "{post_id} {id}.{extension}"
|
||||
archive_fmt = "S{user[id]}_{id}"
|
||||
root = "https://sketch.pixiv.net"
|
||||
cookiedomain = ".pixiv.net"
|
||||
cookies_domain = ".pixiv.net"
|
||||
pattern = r"(?:https?://)?sketch\.pixiv\.net/@([^/?#]+)"
|
||||
test = ("https://sketch.pixiv.net/@nicoby", {
|
||||
"pattern": r"https://img\-sketch\.pixiv\.net/uploads/medium"
|
||||
@ -904,6 +1128,23 @@ class PixivAppAPI():
|
||||
params = {"illust_id": illust_id}
|
||||
return self._pagination("/v2/illust/related", params)
|
||||
|
||||
def novel_bookmark_detail(self, novel_id):
|
||||
params = {"novel_id": novel_id}
|
||||
return self._call(
|
||||
"/v2/novel/bookmark/detail", params)["bookmark_detail"]
|
||||
|
||||
def novel_detail(self, novel_id):
|
||||
params = {"novel_id": novel_id}
|
||||
return self._call("/v2/novel/detail", params)["novel"]
|
||||
|
||||
def novel_series(self, series_id):
|
||||
params = {"series_id": series_id}
|
||||
return self._pagination("/v1/novel/series", params, "novels")
|
||||
|
||||
def novel_text(self, novel_id):
|
||||
params = {"novel_id": novel_id}
|
||||
return self._call("/v1/novel/text", params)
|
||||
|
||||
def search_illust(self, word, sort=None, target=None, duration=None,
|
||||
date_start=None, date_end=None):
|
||||
params = {"word": word, "search_target": target,
|
||||
@ -916,6 +1157,11 @@ class PixivAppAPI():
|
||||
params = {"user_id": user_id, "tag": tag, "restrict": restrict}
|
||||
return self._pagination("/v1/user/bookmarks/illust", params)
|
||||
|
||||
def user_bookmarks_novel(self, user_id, tag=None, restrict="public"):
|
||||
"""Return novels bookmarked by a user"""
|
||||
params = {"user_id": user_id, "tag": tag, "restrict": restrict}
|
||||
return self._pagination("/v1/user/bookmarks/novel", params, "novels")
|
||||
|
||||
def user_bookmark_tags_illust(self, user_id, restrict="public"):
|
||||
"""Return bookmark tags defined by a user"""
|
||||
params = {"user_id": user_id, "restrict": restrict}
|
||||
@ -935,6 +1181,10 @@ class PixivAppAPI():
|
||||
params = {"user_id": user_id}
|
||||
return self._pagination("/v1/user/illusts", params)
|
||||
|
||||
def user_novels(self, user_id):
|
||||
params = {"user_id": user_id}
|
||||
return self._pagination("/v1/user/novels", params, "novels")
|
||||
|
||||
def ugoira_metadata(self, illust_id):
|
||||
params = {"illust_id": illust_id}
|
||||
return self._call("/v1/ugoira/metadata", params)["ugoira_metadata"]
|
||||
|
@ -41,7 +41,7 @@ class PoipikuExtractor(Extractor):
|
||||
"user_name" : text.unescape(extr(
|
||||
'<h2 class="UserInfoUserName">', '</').rpartition(">")[2]),
|
||||
"description": text.unescape(extr(
|
||||
'class="IllustItemDesc" >', '<')),
|
||||
'class="IllustItemDesc" >', '</h1>')),
|
||||
"_http_headers": {"Referer": post_url},
|
||||
}
|
||||
|
||||
@ -76,11 +76,12 @@ class PoipikuExtractor(Extractor):
|
||||
"MD" : "0",
|
||||
"TWF": "-1",
|
||||
}
|
||||
page = self.request(
|
||||
url, method="POST", headers=headers, data=data).json()["html"]
|
||||
resp = self.request(
|
||||
url, method="POST", headers=headers, data=data).json()
|
||||
|
||||
if page.startswith(("You need to", "Password is incorrect")):
|
||||
self.log.warning("'%s'", page)
|
||||
page = resp["html"]
|
||||
if (resp.get("result_num") or 0) < 0:
|
||||
self.log.warning("'%s'", page.replace("<br/>", " "))
|
||||
|
||||
for thumb in text.extract_iter(
|
||||
page, 'class="IllustItemThumbImg" src="', '"'):
|
||||
@ -172,7 +173,9 @@ class PoipikuPostExtractor(PoipikuExtractor):
|
||||
"count": 3,
|
||||
"keyword": {
|
||||
"count": "3",
|
||||
"description": "ORANGE OASISボスネタバレ",
|
||||
"description": "ORANGE OASISボスネタバレ<br />曲も大好き<br />"
|
||||
"2枚目以降はほとんど見えなかった1枚目背景"
|
||||
"のヒエログリフ小ネタです𓀀",
|
||||
"num": int,
|
||||
"post_category": "SPOILER",
|
||||
"post_id": "5776587",
|
||||
|
@ -1,6 +1,6 @@
|
||||
# -*- coding: utf-8 -*-
|
||||
|
||||
# Copyright 2019-2021 Mike Fährmann
|
||||
# Copyright 2019-2023 Mike Fährmann
|
||||
#
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License version 2 as
|
||||
@ -11,7 +11,6 @@
|
||||
from .common import Extractor, Message
|
||||
from .. import text, exception
|
||||
|
||||
|
||||
BASE_PATTERN = r"(?:https?://)?(?:[\w-]+\.)?pornhub\.com"
|
||||
|
||||
|
||||
@ -59,6 +58,9 @@ class PornhubGalleryExtractor(PornhubExtractor):
|
||||
self._first = None
|
||||
|
||||
def items(self):
|
||||
self.cookies.set(
|
||||
"accessAgeDisclaimerPH", "1", domain=".pornhub.com")
|
||||
|
||||
data = self.metadata()
|
||||
yield Message.Directory, data
|
||||
for num, image in enumerate(self.images(), 1):
|
||||
@ -109,7 +111,7 @@ class PornhubGalleryExtractor(PornhubExtractor):
|
||||
"views" : text.parse_int(img["times_viewed"]),
|
||||
"score" : text.parse_int(img["vote_percent"]),
|
||||
}
|
||||
key = img["next"]
|
||||
key = str(img["next"])
|
||||
if key == end:
|
||||
return
|
||||
|
||||
@ -146,10 +148,20 @@ class PornhubUserExtractor(PornhubExtractor):
|
||||
|
||||
data = {"_extractor": PornhubGalleryExtractor}
|
||||
while True:
|
||||
page = self.request(
|
||||
url, method="POST", headers=headers, params=params).text
|
||||
if not page:
|
||||
return
|
||||
for gid in text.extract_iter(page, 'id="albumphoto', '"'):
|
||||
response = self.request(
|
||||
url, method="POST", headers=headers, params=params,
|
||||
allow_redirects=False)
|
||||
|
||||
if 300 <= response.status_code < 400:
|
||||
url = "{}{}/photos/{}/ajax".format(
|
||||
self.root, response.headers["location"],
|
||||
self.cat or "public")
|
||||
continue
|
||||
|
||||
gid = None
|
||||
for gid in text.extract_iter(response.text, 'id="albumphoto', '"'):
|
||||
yield Message.Queue, self.root + "/album/" + gid, data
|
||||
if gid is None:
|
||||
return
|
||||
|
||||
params["page"] += 1
|
||||
|
@ -23,7 +23,9 @@ class PornpicsExtractor(Extractor):
def __init__(self, match):
super().__init__(match)
self.item = match.group(1)
self.session.headers["Referer"] = self.root

def _init(self):
self.session.headers["Referer"] = self.root + "/"

def items(self):
for gallery in self.galleries():

@ -22,18 +22,21 @@ class ReactorExtractor(BaseExtractor):

def __init__(self, match):
BaseExtractor.__init__(self, match)

url = text.ensure_http_scheme(match.group(0), "http://")
pos = url.index("/", 10)

self.root, self.path = url[:pos], url[pos:]
self.session.headers["Referer"] = self.root
self.gif = self.config("gif", False)
self.root = url[:pos]
self.path = url[pos:]

if self.category == "reactor":
# set category based on domain name
netloc = urllib.parse.urlsplit(self.root).netloc
self.category = netloc.rpartition(".")[0]

def _init(self):
self.session.headers["Referer"] = self.root
self.gif = self.config("gif", False)

def items(self):
data = self.metadata()
yield Message.Directory, data

@ -57,8 +57,10 @@ class ReadcomiconlineIssueExtractor(ReadcomiconlineBase, ChapterExtractor):

def __init__(self, match):
ChapterExtractor.__init__(self, match)
self.params = match.group(2)

params = text.parse_query(match.group(2))
def _init(self):
params = text.parse_query(self.params)
quality = self.config("quality")

if quality is None or quality == "auto":

Some files were not shown because too many files have changed in this diff.