mirror of https://github.com/mikf/gallery-dl.git synced 2024-11-23 11:12:40 +01:00

1958 Commits

Author SHA1 Message Date
Mike Fährmann
0ff90a3f7d
[gfycat] include title in default filenames (closes #434) 2019-10-02 21:46:01 +02:00
Mike Fährmann
fabdc3b0c6
release version 1.10.5 2019-09-28 22:13:41 +02:00
Mike Fährmann
de4e2029d1
[nsfwalbum] update test album
the old one is no longer available
2019-09-28 20:48:15 +02:00
Mike Fährmann
1faec285d1
[nijie] further improvements (closes #423)
- provide a 'user_name' metadata field
  - usually the same as 'artist_id', except for favorite downloads
- extract the whole description text and properly escape HTML entities
- fixed an issue with titles or tags containing double quotes
2019-09-27 23:14:32 +02:00
Mike Fährmann
6d0a533d68
[reddit] respect 'comments:0' for single submissions (#429) 2019-09-27 23:11:28 +02:00
Mike Fährmann
803d8f814e
[oauth] update scope for reddit tokens (#428)
'/user/<username>/...' requires the 'history' scope to be accessible
(https://www.reddit.com/dev/api/#GET_user_{username}_{where})
2019-09-27 17:38:55 +02:00
Mike Fährmann
46ba173ded
[reddit] fix documentation inconsistencies (closes #429)
- Require 'reddit.comments' to be a number and convert it to an
  integer to be extra sure
- Link to the README's OAuth section where appropriate
2019-09-27 17:34:10 +02:00
Mike Fährmann
20eb6c401f
[nijie] improvements and fixes (#423)
- ignore unavailable image pages
- more metadata fields: artist_name, date, tags
- rename 'index' to 'num'
- improved code structure
2019-09-26 21:45:01 +02:00
Mike Fährmann
d1ea08c67d
[weibo] fixes and improvements
- ignore unavailable videos (fixes #427)
- handle empty 'geo' fields
- consistent metadata fields for images and videos
2019-09-26 14:57:35 +02:00
Mike Fährmann
38d97f3da6
[deviantart] add debug message about API credentials (#424) 2019-09-25 21:20:55 +02:00
Mike Fährmann
80c2104fb5
[deviantart] fix 429 handling if 'fatal' is False (closes #424) 2019-09-25 21:16:35 +02:00
Mike Fährmann
913460240d
[reddit] fix 'extractor.blacklist()' arguments
The second argument must support 'append()'.
2019-09-24 23:01:12 +02:00
Mike Fährmann
22bac14452
[pixiv] match '/artworks/' URLs 2019-09-24 21:53:14 +02:00
Mike Fährmann
66cac207ac
[twitter] match and use 'i/web' status URLs 2019-09-24 21:18:05 +02:00
Mike Fährmann
946f2751e2
[reddit] add 'user' extractor (closes #350) 2019-09-22 22:18:17 +02:00
Mike Fährmann
c14abb9fb8
[reddit] improve URL parameter handling for subreddit links 2019-09-22 22:03:22 +02:00
Mike Fährmann
ee8b654464
[instagram] implement 'highlights' option (closes #329) 2019-09-21 23:38:20 +02:00
Mike Fährmann
f63c3097a9
[instagram] rework some code paths
- combine fetching an HTML page and extracting its 'shared_data'
- move 'shared_data' and field access info out of '_extract_page()'
- introduce a '_request_graphql()' method
2019-09-21 23:10:41 +02:00
Mike Fährmann
4330133114
[imgur] add 'favorite' extractor (closes #420)
… and use a newer site-internal API endpoint for user posts
2019-09-19 15:54:26 +02:00
Mike Fährmann
ee5e20221f
[imgth] fix image URLs 2019-09-19 14:56:48 +02:00
Mike Fährmann
b63b126808
[hentaicafe] extend URL pattern 2019-09-18 19:08:45 +02:00
Mike Fährmann
d780f0357e
[imgur] add user extractor 2019-09-17 22:58:18 +02:00
Mike Fährmann
11ea689013
[simplyhentai] fix image and video URLs 2019-09-16 21:37:16 +02:00
Mike Fährmann
15632a1570
[tsumino] fix extraction 2019-09-15 22:09:59 +02:00
Mike Fährmann
d92802fd37
[luscious] fix detection of unavailable galleries 2019-09-15 21:16:25 +02:00
Mike Fährmann
f99da2b866
[imgbb] detect invalid album and user profile links
and update test results, since the old album got deleted
2019-09-14 23:22:08 +02:00
Mike Fährmann
01bc7adadc
[deviantart] improve journal detection (#419)
Some journal-like posts are not reported to be journals (isJournal
is set to False), even though they have a textContent field.

https://www.deviantart.com/gliitchlord/art/brashstrokes-812942668
2019-09-14 22:45:22 +02:00
Mike Fährmann
776e9e073f
close archive on job completion (#417) 2019-09-10 22:43:51 +02:00
Mike Fährmann
5ac9732adc
call 'sys.exit()' on Ctrl+c 2019-09-10 16:53:21 +02:00
Mike Fährmann
9178b54eae
handle errors when opening download archive file (#417) 2019-09-10 16:44:47 +02:00
Mike Fährmann
6e12907de6
[deviantart] improve handling of private deviations (#414)
- don't try to call '/deviation/metadata' with an empty list of
  deviation ids
- print a warning when detecting private deviations without having
  a 'refresh-token'
2019-09-10 16:09:03 +02:00
Mike Fährmann
4203931d79
release version 1.10.4 2019-09-08 13:54:45 +02:00
Mike Fährmann
e7690ac694
[vsco] update URL pattern (closes #410) 2019-09-08 11:37:27 +02:00
Mike Fährmann
1848788970
update test results etc 2019-09-08 11:33:35 +02:00
Mike Fährmann
d5fbb2d9de
[tumblr] ignore audio links from Spotify etc. 2019-09-07 18:18:12 +02:00
Mike Fährmann
b1cddce865
Revert "[simplyhentai] fix extraction; remove image+video extractors"
This reverts commit d1db5180ab.
2019-09-07 14:48:31 +02:00
Mike Fährmann
d23660c04d
[hentaicafe] restore default 'request()' behavior 2019-09-07 14:35:00 +02:00
Mike Fährmann
9ae58a6b3e
[exhentai] update image limit checks
- adjust cost of original images
- delay limit initialization until gallery and first image page have
  been requested and all cookies are available
2019-09-07 13:29:01 +02:00
Mike Fährmann
6fe9a134bf
[lineblog] add blog and post extractors (closes #404) 2019-09-06 22:16:42 +02:00
Mike Fährmann
4e8a548a61
[livedoor] update metadata extraction 2019-09-06 21:44:25 +02:00
Mike Fährmann
f9285f99e6
[pixiv] fix authentication 2019-09-02 22:38:56 +02:00
Mike Fährmann
6f3df3999a
[fuskator] add gallery and search extractor (closes #407) 2019-09-02 21:20:02 +02:00
Mike Fährmann
bc0ca66c99
[twitter] small improvements
- handle reply tweets (#403)
- unset cookies in Tweet extractor to "force" the legacy interface
2019-09-01 17:37:48 +02:00
Mike Fährmann
682105b8ee
prevent crash when loading unavailable downloader (#405) 2019-08-31 21:58:33 +02:00
Mike Fährmann
5fcebb69c2
[postprocessor:ugoira] improve error messages (#406) 2019-08-31 21:55:42 +02:00
Mike Fährmann
f02a768b5c
[danbooru] add 'ugoira' option (#406)
to choose between ZIP archives or converted video files
for Ugoira posts
2019-08-31 21:51:52 +02:00
Mike Fährmann
9646ccb320
release version 1.10.3 2019-08-30 19:41:16 +02:00
Mike Fährmann
dedea3b4db
[deviantart] fix journal creation (#400) 2019-08-30 18:50:04 +02:00
Mike Fährmann
c6c5cb1898
improve 'deviantart.quality' description 2019-08-30 18:41:18 +02:00
Mike Fährmann
8eaae58045
[downloader:http] change log message level to 'debug' 2019-08-29 23:05:47 +02:00
Mike Fährmann
efb64ad031
[deviantart] generate filenames (#392, #400) 2019-08-29 10:09:21 +02:00
Mike Fährmann
0ce98169b8
improve path generation
- fix 'abspath()' results for Python <3.7 (closes #402)
  - 'abspath()' in Python 3.7+ removes trailing path separators
  - in Python <3.7 it doesn't
- filter empty path segments
2019-08-28 23:25:18 +02:00
Mike Fährmann
b2151f3928
[seiga] support mobile URLs (closes #401) 2019-08-28 22:56:43 +02:00
Mike Fährmann
20fd2d8450
[flickr] skip unavailable images/videos (fixes #398) 2019-08-27 23:26:49 +02:00
Mike Fährmann
60c8e090da
[postprocessor:zip] fix archive names (closes #397)
Remove the trailing path separator introduced in 3284c62 before
adding the archive's filename extension.

[ci skip]
2019-08-24 23:14:26 +02:00
Mike Fährmann
7c09545f70
[downloader:ytdl] add 'outtmpl' option (#395) 2019-08-24 22:47:59 +02:00
Mike Fährmann
5cc7be2536
[piczel] update and improve
- use proper pagination (fixes #396)
- update API host and endpoints
- "fix" double slash // in image URLs
2019-08-24 20:37:33 +02:00
Mike Fährmann
0c1c7abb4d
release version 1.10.2 2019-08-23 22:10:54 +02:00
Mike Fährmann
49f6d7176d
[deviantart] restore filenames (#392)
<title>_by_<user>_<id> --> <title>_by_<user>-<id>
2019-08-23 22:02:03 +02:00
Mike Fährmann
63daa68d67
[deviantart] improvements (#392)
- consistent 'filename' entries, at least as far as possible
  - GIFs and SWFs don't have a <title>_by_<artist>_<id> anywhere in
    their metadata
  - Generating <id> (from 'deviationid'?) might be something that needs
    to be figured out, so we can build those filenames ourselves
- better code structure etc.
- tests for videos, archives, and flash animations
2019-08-23 12:27:19 +02:00
Mike Fährmann
d1db5180ab
[simplyhentai] fix extraction; remove image+video extractors 2019-08-22 23:56:41 +02:00
Mike Fährmann
30d6e284b0
[deviantart] use NAPI for artworks and scraps (#392)
TODO:
- journal downloads
- test for all media types
2019-08-21 23:56:06 +02:00
Mike Fährmann
7d6af936c5
[imgur] simplify gallery extraction 2019-08-20 20:00:43 +02:00
Mike Fährmann
3284c62f22
ensure PathFormat.directory ends with a path separator
... plus some other small optimizations
2019-08-20 00:25:13 +02:00
Mike Fährmann
ebabc5caf1
[downloader:http] treat 416 without downloaded data as error
Downloading https://pbs.twimg.com/media/EB2cGUYX4AI2Vuu.jpg:orig (NSFW)
sometimes returns a 416 status code, even though no 'Range' header was
sent and no data was downloaded prior.
This code usually means a file has already been downloaded completely
and the download method indicates success, but in this case it causes
an exception down the pipeline since no file was created.
2019-08-20 00:15:17 +02:00
Mike Fährmann
2495b99347
[postprocessor:classify] improve path generation (fixes #138)
It still doesn't work for converted ugoira animations thanks to how
those files are handled, but everything else, including files with
unknown or changing file extension, now works as it should.
2019-08-19 23:05:28 +02:00
Mike Fährmann
e77a656437
optimize directory path generation
- use str.join() instead of os.path.join()
  (less "features", but 10x as fast)
- cache directory formatters
- detect and optimize field access for 1-element format strings
2019-08-19 15:56:20 +02:00
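The trade-off mentioned in commit e77a656437 (fewer "features" in exchange for speed) can be illustrated with a small sketch. It is not gallery-dl's code: the segment values are made up, and `posixpath` is used in place of `os.path` only so the result is the same on every platform. No timing is shown; the "10x" figure is the commit's own claim.

```python
import posixpath

# Hypothetical directory segments, as a format string might produce them.
segments = ["gallery-dl", "pixiv", "12345"]

# For plain relative segments both approaches give the same result.
joined_fast = "/".join(segments)
joined_std = posixpath.join(*segments)

# One of the "features" that str.join() lacks: posixpath.join() discards
# everything before an absolute segment, str.join() simply concatenates.
special_fast = "/".join(["base", "/abs"])
special_std = posixpath.join("base", "/abs")

print(joined_fast, joined_std)
print(special_fast, special_std)
```

Using `str.join()` is therefore only safe when the segments are already known to be clean, relative path components.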
Mike Fährmann
51d10783fc
[patreon] include image info in API results (#383) 2019-08-18 23:28:47 +02:00
Mike Fährmann
7a5e78741c
[booru] build directory path for each file (#385) 2019-08-18 23:28:33 +02:00
Mike Fährmann
b1728f512d
[patreon] support multi image posts and post URLs (#383) 2019-08-17 23:24:46 +02:00
Mike Fährmann
454bf1ebf9
preserve enumeration index after 'set_extension()' (#306) 2019-08-16 23:12:33 +02:00
Mike Fährmann
f5039b897f
replace DownloadArchive.check() with __contains__()
Interestingly enough, 'a in obj' is slightly faster than
'obj.check(a)' and is also nicer to look at, I think.
2019-08-16 23:12:32 +02:00
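The change in commit f5039b897f, replacing a `check()` method with `__contains__()`, might look roughly like the sketch below. The set-backed storage and the `add()` method are assumptions made for illustration; only the class name and the `a in obj` usage come from the commit message.

```python
class DownloadArchive:
    """Hypothetical in-memory stand-in for a download archive."""

    def __init__(self):
        self._entries = set()

    def add(self, entry):
        """Remember an entry as downloaded (assumed helper)."""
        self._entries.add(entry)

    def __contains__(self, entry):
        # Invoked by the 'in' operator: 'entry in archive'
        return entry in self._entries


archive = DownloadArchive()
archive.add("pixiv12345")

if "pixiv12345" in archive:
    print("already downloaded, skipping")
```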
Mike Fährmann
5a210991b6
Remove control characters from filesystem paths
- add 'path-remove' option to specify the set of characters that
 should be removed
- rename 'restrict-filenames' to 'path-restrict'
- #348, #380
2019-08-16 23:12:16 +02:00
Mike Fährmann
c50d60a53d
[reactor] fix image URLs 2019-08-16 14:07:22 +02:00
Mike Fährmann
32447d0d24
[pixiv] simplify default filename format
(#366)
2019-08-15 13:32:47 +02:00
Mike Fährmann
5f8621b29d
improve output of active post processor modules 2019-08-15 13:31:04 +02:00
Mike Fährmann
2cbbc3dec4
add a 'whitelist' to '--ugoira-conv' (#382) 2019-08-15 13:27:57 +02:00
Mike Fährmann
829b1ccf04
[imgur] distinguish album and gallery URLs (#380)
A gallery can be either an album or a single image.
2019-08-14 21:40:14 +02:00
Mike Fährmann
23251356cb
require 'extension' data for each URL (#382) 2019-08-14 20:03:03 +02:00
Mike Fährmann
a67413d64f
[xhamster] use input URL domain
Don't rewrite all URLs as 'https://xhamster.com/...'
2019-08-14 00:21:30 +02:00
Mike Fährmann
0bb873757a
update PathFormat class
- change 'has_extension' from a simple flag/bool to a field that
  contains the original filename extension
- rename 'keywords' to 'kwdict' and some other stuff as well
- inline 'adjust_path()'
- put enumeration index before filename extension (#306)
2019-08-12 21:40:37 +02:00
Mike Fährmann
423f68f585
[deviantart] fix scraps extraction (closes #376) 2019-08-11 16:06:15 +02:00
Mike Fährmann
3bf20ffb70
[instagram] add support for story highlights 2019-08-10 14:34:22 +02:00
Mike Fährmann
a732e9c430
[instagram] update query hashes and headers 2019-08-10 14:13:08 +02:00
Mike Fährmann
2ccf6a9e35
[instagram] make extractor tests happy (#373) 2019-08-08 18:50:26 +02:00
Mike Fährmann
8dc42bb178
implement 'enumerate' for 'extractor.skip' (#306)
[ci skip]
2019-08-08 18:37:54 +02:00
Leonardo Taccari
bc5eaf7746
[instagram] Add support for IGTV (#373)
Add support for IGTV profile (instagram.com/<username>/channel/)
and IGTV medias (instagram.com/tv/<short_id>).
2019-08-08 18:33:13 +02:00
Mike Fährmann
b7fb93e2b2
[downloader:http] add 'adjust-extensions' option 2019-08-08 16:54:20 +02:00
Mike Fährmann
eb7da159e2
[imagebam] update URL test results
Image URLs are now using https://, but the website itself is still
served as http://.
2019-08-07 21:47:44 +02:00
Mike Fährmann
189acbeac9
[imgbb] add extractor for individual images (closes #363) 2019-08-05 22:52:08 +02:00
Mike Fährmann
ad3ac02fbc
[pixiv] update metadata entries (#366)
- change 'num' to a simple enumerating integer
- change default filename format
- provide content of the old 'num' field as 'suffix'
- add 'filename' for ugoira
2019-08-05 22:41:56 +02:00
Mike Fährmann
1ff4c4ec03
[adultempire] consistent artist order 2019-08-05 22:06:11 +02:00
Leonardo Taccari
2df050e627
[instagram] Add support for stories (#371)
* [instagram] Add support for stories

Add support for Instagram user's stories
(https://www.instagram.com/stories/<username>/).

First the shared_data in instagram.com/stories/<username> is fetched in
order to retrieve the user_id that is then passed to fetch the stories
via the corresponding graphql query.

Please note that fetching stories is supported only when authentication
is enabled and the corresponding <username> is followed.

* [instagram] Add an only-matching test for stories

* [instagram] Simplify InstagramExtractor.items() and _extract_stories()

Simplify handling of typename in InstagramExtractor.items() and multi-line
string in _extract_stories().  NFCI.
2019-08-05 22:04:34 +02:00
Mike Fährmann
f4bc75e854
fix rate limit handling for OAuth APIs (#368) 2019-08-03 13:43:00 +02:00
Mike Fährmann
3957d27d79
[deviantart] add 'quality' option (#369) 2019-08-03 11:40:35 +02:00
Mike Fährmann
64b2935d8e
[pixiv] provide 'filename' and change default filename format
to '{filename}.{extension}' (closes #366)
2019-08-02 22:35:10 +02:00
Mike Fährmann
2f33bac030
release version 1.10.1 2019-08-02 21:23:06 +02:00
Mike Fährmann
fa60109e97
[exhentai] don't use e-hentai.org for exhentai URLs 2019-08-02 21:10:09 +02:00
Mike Fährmann
dfe552421b
release version 1.10.0 2019-08-01 23:22:58 +02:00
Mike Fährmann
0609afd1e4
update default cache directory ... again
Use a 'gallery-dl' subdirectory in ~/.cache to adhere to how other
programs store their cached data, and call os.makedirs() so it also
works without an existing ~/.cache directory.
2019-08-01 22:11:00 +02:00
Mike Fährmann
4a0c98bfc9
miscellaneous fixes and adjustments 2019-08-01 22:09:43 +02:00
Mike Fährmann
2c839f3760
[imgbb] add user extractor + login support (#361) 2019-08-01 21:39:20 +02:00
Mike Fährmann
a8b60b2bd9
change default cache directory for unix systems
Use either $XDG_CACHE_HOME or ~/.cache (if the former isn't set)
and store potentially sensitive cookies and tokens in a user's
home directory and not in the world-readable /tmp.
2019-07-31 22:56:14 +02:00
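The lookup order described in commit a8b60b2bd9 could be sketched as follows. The function name `default_cache_dir` and its `environ` parameter are hypothetical; the commit only states that `$XDG_CACHE_HOME` is used when set and `~/.cache` otherwise.

```python
import os


def default_cache_dir(environ=None):
    """Return $XDG_CACHE_HOME if set and non-empty, else ~/.cache."""
    environ = os.environ if environ is None else environ
    return environ.get("XDG_CACHE_HOME") or os.path.expanduser("~/.cache")


print(default_cache_dir())
```

Passing the environment as a parameter is a design choice of this sketch that keeps the function easy to test.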
Mike Fährmann
4b6edfbfd2
restrict permissions without importing 'pathlib'
and only on non-Windows systems.

1. On Windows the 'mode' argument for os.open() has no (visible) effect
   on access permissions for new files.
2. The default location for 'cache.file' on Windows is in
   %USERPROFILE%\AppData\Local\Temp which can only be accessed by the
   owner himself (or an admin).
2019-07-31 21:48:09 +02:00
Leonardo Taccari
afce1ee1eb
Avoid possible sensitive information disclosure via cache.file
Previously cache.file could be created world readable leading to
possible sensitive information disclosure on multi-user systems.
Restrict permissions only to the owner by creating an empty file.

Please note that cache.file created before this commit may need a
`chmod 600' or similar!
2019-07-31 15:05:26 +02:00
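One way to create an owner-only file, as commit afce1ee1eb describes, is to pass an explicit mode to `os.open()`. This is a minimal sketch under that assumption, not the commit's actual code; `create_private_file` is a hypothetical name.

```python
import os
import stat


def create_private_file(path):
    """Create 'path' if it is missing, accessible by its owner only."""
    # The mode only applies when the file is newly created; an existing
    # file keeps its permissions (hence the 'chmod 600' note above).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    os.close(fd)


def is_private(path):
    """True if neither group nor others have any access bits set."""
    return stat.S_IMODE(os.stat(path).st_mode) & 0o077 == 0
```

The permission check is only meaningful on POSIX systems; on Windows the mode argument has no visible effect on access permissions.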
Mike Fährmann
2153206093
[imgbb] add album extractor (#361) 2019-07-30 23:11:19 +02:00
Mike Fährmann
beb4fab2e6
[exhentai] improve limit and error handling (#360)
- check image limit before opening the first gallery or image page
- prevent any further exhentai extractors from running after the image
  limit has been reached
2019-07-30 22:58:35 +02:00
Mike Fährmann
81b35ed3cb
[exhentai] catch more error states (#356, #360)
- warn on MPV-enabled galleries
- catch parsing errors for gallery pages and image info
- write page content to debug output
2019-07-29 16:54:31 +02:00
Mike Fährmann
a90280f4e7
[postprocessor:zip] add 'mode' option (#355) 2019-07-29 16:51:26 +02:00
Mike Fährmann
6ce22f606b
[exhentai] update login procedure and tests
Logging in now follows the natural login flow that also happens in a
browser more closely and collects more cookies than just ipb_member_id
and ipb_pass_hash.

Test URLs have been updated and now point to the e-hentai.org domain.
2019-07-28 16:51:05 +02:00
Mike Fährmann
dc73d02d87
[exhentai] always use e-hentai.org as domain + set nw cookie 2019-07-28 10:54:17 +02:00
Mike Fährmann
40637556fa
[ngomik] fix extraction 2019-07-28 10:53:46 +02:00
Mike Fährmann
3969f9cbbd
[behance] fix collection extraction 2019-07-27 14:26:40 +02:00
Mike Fährmann
20f7b07312
ensure postproc finalize() is called during C-c or crash (#355) 2019-07-27 11:14:52 +02:00
Mike Fährmann
17a3426845
[gelbooru] enable all content when not using API 2019-07-27 11:13:38 +02:00
Mike Fährmann
279db2c5b2
[vsco] add collection & image extractor + video support (#331) 2019-07-26 19:06:15 +02:00
Mike Fährmann
547ea71463
[downloader.ytdl] add 'forward-cookies' option (#352)
The "long" name is necessary because just calling it 'cookies' would
clash with how the lookup for '--cookies' is implemented.
2019-07-24 21:19:11 +02:00
Mike Fährmann
d9d44ad953
[tsumino] update test results 2019-07-24 21:17:23 +02:00
Mike Fährmann
b1bea8aaeb
add 'restrict-filenames' option (#348) 2019-07-23 17:41:24 +02:00
Mike Fährmann
60cf40380a
[vsco] add user extractor (#331) 2019-07-23 16:23:11 +02:00
Mike Fährmann
3fe5ccdfa6
[adultempire] add gallery extractor (closes #340) 2019-07-21 22:29:57 +02:00
Mike Fährmann
b3851e01d9
release version 1.9.0 2019-07-19 21:55:25 +02:00
Mike Fährmann
5d968412ca
[deviantart] case-insensitive folder name matching (fixes #343) 2019-07-19 18:05:31 +02:00
Mike Fährmann
a3c736fedc
[500px] fix extraction
Maximum available image dimensions have been reduced to 4096px
on the longest edge. (from 5000px)
A few (unimportant) metadata fields are no longer available or have
been changed to 'null'.
2019-07-19 17:23:03 +02:00
Mike Fährmann
1133b7fcbd
[smugmug] update unit tests
The account used for tests before has been deleted.
2019-07-19 17:16:24 +02:00
Mike Fährmann
21991acc49
add 'ciphers' option; update default User-Agent 2019-07-19 17:14:40 +02:00
Mike Fährmann
84f4d3bc0b
replace urllib3's default cipher list with Firefox's (#342)
Avoids Cloudflare CAPTCHAs on both Linux and Windows without
pyOpenSSL installed.
2019-07-18 19:42:13 +02:00
Mike Fährmann
feb98cf196
[twitter] improve 'content' formatting; add option (#338)
- include emoticons
- leave newlines intact
- remove pic.twitter.com/ links at the end
2019-07-17 16:02:51 +02:00
Mike Fährmann
1740086d8a
add 'repl' and 'sep' arguments to text.replace_html() 2019-07-17 14:48:24 +02:00
Mike Fährmann
8d1ae9b715
[tumblr] enable date-min/-max/-format options (#337) 2019-07-17 14:36:41 +02:00
Mike Fährmann
09f37fde39
[reddit] move date-min/-max handling into Extractor class 2019-07-16 22:54:39 +02:00
Mike Fährmann
7b77ecc35a
fix paths for files without extension (#220) 2019-07-15 16:39:03 +02:00
Mike Fährmann
c41ff9441e
improve find() for downloaders and postprocessors 2019-07-15 16:33:03 +02:00
Mike Fährmann
0151e250f5
[twitter] extract 'content' metadata (closes #333) 2019-07-15 16:25:22 +02:00
Mike Fährmann
16c582aaf9
implement 'mtime' post-processor (#332)
This can set a file's modification time according to a UNIX timestamp
or a datetime object from its metadata.
2019-07-14 22:39:17 +02:00
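Setting a modification time from either kind of value, as commit 16c582aaf9 describes, could be done along these lines. `set_mtime` is a hypothetical helper, and treating naive datetimes as UTC is an assumption of this sketch.

```python
import os
from datetime import datetime, timezone


def set_mtime(path, value):
    """Set the modification time of 'path' from a timestamp or datetime."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive datetime objects are interpreted as UTC
            value = value.replace(tzinfo=timezone.utc)
        timestamp = value.timestamp()
    else:
        timestamp = float(value)
    # os.utime() expects (atime, mtime); both are set to the same value
    os.utime(path, (timestamp, timestamp))
    return timestamp
```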
Mike Fährmann
62097284fe
add 'download' option (#220) 2019-07-14 18:48:18 +02:00
Mike Fährmann
fe7805de7c
improve attribute access in DownloadJob.handle_url()
Storing a value in a local variable and accessing it that way is faster
than going through 'self' if it is accessed more than once.
2019-07-13 21:42:07 +02:00
Mike Fährmann
56c7a66a4a
detect Cloudflare CAPTCHAs and update cipher list 2019-07-10 15:18:20 +02:00
Mike Fährmann
a7b42b37a2
[35photo] fix extraction 2019-07-09 20:33:57 +02:00
Mike Fährmann
04b8d0894a
[newgrounds] improve metadata extraction 2019-07-08 17:53:55 +02:00
Mike Fährmann
12da6bd0c9
[simplyhentai] fix/improve extraction 2019-07-06 20:25:53 +02:00
Mike Fährmann
fdec59f8e2
replace extractor.request() 'expect' argument
with
- 'fatal': allow 4xx status codes
- 'notfound': raise NotFoundError on 404
2019-07-05 00:42:16 +02:00
Mike Fährmann
2ff73873f0
[erolord] add gallery extractor (closes #326) 2019-07-04 20:28:04 +02:00
Mike Fährmann
b4da8c5a97
[sexcom] add extractor for related pins (#325) 2019-07-03 21:04:23 +02:00
Mike Fährmann
69997e92db
[sexcom] skip unavailable pins (#325) 2019-07-02 22:05:54 +02:00
Mike Fährmann
8966930c5c
[downloader:http] try to import SSL exception class from OpenSSL
(#324)
2019-07-01 20:10:26 +02:00
Mike Fährmann
bc6b0cfddc
[shopify] skip consecutive duplicate products
Not filtering duplicate URLs anymore caused the archive ID uniqueness
test to fail.
2019-07-01 20:04:57 +02:00
Mike Fährmann
b89f0d8d3c
update extractor result tests 2019-07-01 20:02:47 +02:00
Mike Fährmann
69205df68d
allow '-1' for infinite retries (#300) 2019-06-30 23:10:47 +02:00
Mike Fährmann
f7b5c4c3e7
use values of 'retries' options correctly
The RE-tries option now specifies exactly that: the maximum number of
times a failed HTTP request is re-tried. For example, a value of 2 will
now correctly stop after 3 attempts: the initial one + 2 re-tries.

The maximum wait-time now also caps at 30min and increases exponentially
for both extractor.request() and downloader.http.download().
2019-06-30 23:10:18 +02:00
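The retry semantics of commit f7b5c4c3e7 can be sketched as below. This is an illustration, not gallery-dl's implementation: the names `request_with_retries` and `wait_times`, the `base_delay` parameter, the doubling schedule, and the choice of `OSError` as the retried exception are all assumptions. Only the "initial attempt + N re-tries" count and the 30-minute cap come from the commit message.

```python
import time

MAX_WAIT = 30 * 60  # cap from the commit message: 30 minutes, in seconds


def wait_times(retries, base_delay=1.0):
    """Delays between attempts: exponential growth, capped at MAX_WAIT."""
    return [min(base_delay * 2 ** n, MAX_WAIT) for n in range(retries)]


def request_with_retries(func, retries=2, base_delay=1.0, sleep=time.sleep):
    """Call func() once, then re-try up to 'retries' more times."""
    tries = 0
    while True:
        try:
            return func()
        except OSError:
            if tries >= retries:
                raise  # initial attempt + 'retries' re-tries all failed
            sleep(min(base_delay * 2 ** tries, MAX_WAIT))
            tries += 1
```

With `retries=2`, a function that always fails is called three times before the exception propagates.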
Mike Fährmann
6393b47db2
add '-A/--abort'; deprecate '--abort-on-skip' 2019-06-30 14:28:28 +02:00
Mike Fährmann
f2000a69aa
implement 'image-unique' and 'chapter-unique' options (#303)
The default value for both is 'false', i.e. duplicate URLs are NOT
ignored.

The previous behavior was to always ignore duplicate URLs to make
'--abort-on-skip' work properly when new images were added to the
beginning of a collection while gallery-dl is running.
2019-06-29 22:50:17 +02:00
Mike Fährmann
40da44b17f
Merge branch 'v1.9.0' 2019-06-29 15:39:52 +02:00
Mike Fährmann
9a216a6c6c
release version 1.8.7 2019-06-28 21:04:00 +02:00
Mike Fährmann
7a99e85943
[kissmanga] fix download URLs and file extensions
The current Blogspot image URLs hosted on Kissmanga end with an
"invalid" query parameter (/000.png&upx=...), which doesn't get
recognized by 'spliturl()' and 'parseurl()' as such and gets therefore
included in the 'extension' field from 'text.nameext_from_url()'.
2019-06-28 20:34:43 +02:00
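The problem described in commit 7a99e85943 can be reproduced with the standard library. The URL below is a made-up placeholder, and `extension_from_url` is a hypothetical workaround, not the fix the commit actually applied.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical URL with an "invalid" query parameter introduced by '&'
URL = "https://example.org/images/000.png&upx=abc"

# urlsplit() only treats '?' as the start of a query string, so the
# '&upx=...' part stays in the path and ends up in the extension.
naive_extension = posixpath.splitext(urlsplit(URL).path)[1].lstrip(".")


def extension_from_url(url):
    """Return the filename extension, ignoring a stray '&...' suffix."""
    filename = posixpath.basename(urlsplit(url).path)
    # Assumed workaround: drop everything from the first '&' onward
    filename = filename.partition("&")[0]
    return posixpath.splitext(filename)[1].lstrip(".")


print(naive_extension)
print(extension_from_url(URL))
```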
Mike Fährmann
055102431f
[hitomi] handle Game CG galleries with scenes (fixes #321) 2019-06-27 20:25:40 +02:00
Mike Fährmann
a9c89085fb
[instagram] implement login support (#195) 2019-06-26 23:58:47 +02:00
Mike Fährmann
f1b0c2bf5c
[downloader:ytdl] forward cookies to youtube-dl
to be able to download private videos from Twitter, Instagram, etc.
2019-06-26 19:32:07 +02:00
Mike Fährmann
7856e5e7dc
[deviantart] "fix" scraps extraction 2019-06-25 18:18:12 +02:00
Mike Fährmann
082cb24acd
[pururin] fix extraction
Missing metadata information would lead to unnecessary exceptions.
2019-06-24 22:27:50 +02:00
Mike Fährmann
98554cbab8
[mangoxo] fix login 2019-06-24 21:57:17 +02:00
Mike Fährmann
108963d138
[imagefap] include Referer headers 2019-06-24 21:31:29 +02:00
Mike Fährmann
e314621366
[nsfwalbum] fix default directory_fmt (#287) 2019-06-24 18:29:54 +02:00
Mike Fährmann
95b1e4c3c0
implement R<old>/<new>/ format option (#318) 2019-06-23 22:45:44 +02:00
Mike Fährmann
18a1f8c6cd
[vanillarock] add post and tag extractors (closes #254) 2019-06-23 22:45:36 +02:00
Mike Fährmann
f0c5093812
[nsfwalbum] add album extractor (closes #287) 2019-06-23 22:45:07 +02:00
Mike Fährmann
15e4ddf46d
implement custom logging formatter
supports custom log message formats for each loglevel and, by
extension, custom ANSI codes and colors for errors and warnings

(#304)
2019-06-21 20:17:58 +02:00
Mike Fährmann
61e413d85d
[hentaifoundry] stop disabling IPv6 addresses
The rogue address mentioned in a138d58 is no longer included in the DNS
results for www.hentai-foundry.com.
2019-06-21 20:03:14 +02:00
Mike Fährmann
76ae9957c2
[deviantart] force legacy version for single deviations
Let's see how long this works ...

DeviantArt is rolling out a new version of their website, including a
new internal and potentially usable API (rewrite incoming, yay).

The issue with the new layout is that it doesn't include the "old"
UUIDs for single deviations, i.e. mapping a numeric deviation ID to its
UUID counterpart is impossible with the new layout.
2019-06-20 19:26:15 +02:00
Mike Fährmann
db3f52881a
add 'mtime' option 2019-06-20 17:19:44 +02:00
Mike Fährmann
ee4d7c3d89
update downloader.find() and related code
Instead of replacing 'https' with 'http' for every URL in
'get_downloader()', this now only happens once during downloader
initialization. Also unit tests.
2019-06-20 16:59:44 +02:00
Mike Fährmann
f4ba98771d
use Last-Modified header to set file modification time
(#236, #277)
2019-06-19 23:16:32 +02:00
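Turning a `Last-Modified` header into a timestamp, the prerequisite for commit f4ba98771d, is possible with the standard library alone. The helper name `last_modified_timestamp` is hypothetical and the sketch makes no claim about how gallery-dl parses the header.

```python
from email.utils import mktime_tz, parsedate_tz


def last_modified_timestamp(header_value):
    """Convert an HTTP Last-Modified value to a UNIX timestamp, or None."""
    parsed = parsedate_tz(header_value)
    return mktime_tz(parsed) if parsed else None


print(last_modified_timestamp("Wed, 21 Oct 2015 07:28:00 GMT"))
```

The returned value can then be passed to `os.utime()` to set the downloaded file's modification time.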
Mike Fährmann
179d112083
[downloader] overhaul http and text modules
Get rid of the modular structure and simplify/specialize those modules.
2019-06-19 22:56:11 +02:00
Mike Fährmann
a01f99728c
[postprocessor:zip] delete empty archives when done (#316) 2019-06-19 18:14:33 +02:00
Mike Fährmann
520c8ba106
[hentaicafe] extract 'tags' and 'artist' metadata (closes #238)
These metadata fields will only be filled in when using a top-level
URL, because that's the only place this information is available. Using
a Foolslide URL (1) will leave these fields empty.

(1) https://hentai.cafe/manga/read/.../en/0/1/
2019-06-18 14:30:26 +02:00
Mike Fährmann
b51baa9a4b
[hitomi] fix empty language detection; parse datetime 2019-06-17 20:02:58 +02:00
Mike Fährmann
258e8b2060
[deviantart] small code improvements 2019-06-17 19:49:50 +02:00
Mike Fährmann
a77340c647
[keenspot] fix extraction for "TwoKinds" 2019-06-17 19:49:39 +02:00
Mike Fährmann
03e6876fbe
[instagram] provide 'description' metadata (#310) 2019-06-16 21:54:01 +02:00
Mike Fährmann
b171befa87
implement 'parse_unicode_escapes()' 2019-06-16 21:47:24 +02:00
Mike Fährmann
3a36a0fa1e
release version 1.8.6 2019-06-14 21:11:58 +02:00
Mike Fährmann
ec3e8601f1
[slickpic] add user extractor (#249) 2019-06-14 18:55:56 +02:00
Mike Fährmann
97ef416218
[8muses] support multi-page listings (#305) 2019-06-14 18:48:22 +02:00
Mike Fährmann
f5961ac968
[deviantart] download deviations with no 'content' field
Some deviations (possibly only from sta.sh sources) are downloadable
(i.e. 'is_downloadable' is true and /deviation/download/ works), but
have no 'content' or similar in their JSON representation.

(fixes #307)
2019-06-13 21:14:12 +02:00
Mike Fährmann
4e07f99e3e
[mangoxo] change token message level to debug
The login page currently doesn't provide and require a login token
(logging in works without a token), so printing a warning during
each login is unnecessary.
2019-06-13 21:09:11 +02:00
Mike Fährmann
d997c10320
[8muses] add album extractor (#305) 2019-06-10 22:17:46 +02:00
Mike Fährmann
e05a96db5e
[deviantart] rename 'stash' to 'extra' (#302)
'stash' is already used as a name for the StashExtractor and therefore
expected to be a dictionary.
2019-06-10 21:05:25 +02:00
Mike Fährmann
2184e3a86b
[slickpic] add album extractor (#249) 2019-06-09 21:59:22 +02:00
Mike Fährmann
c23bf263fe
[deviantart] rename 'external' to 'stash' (#302)
restrict extracted URLs to ones from https://sta.sh/...
2019-06-09 11:16:02 +02:00
Mike Fährmann
c73c2cda50
[pornhub] add gallery & user extractor (#282) 2019-06-07 16:31:20 +02:00
Mike Fährmann
7c6cb908f9
[xhamster] update test results 2019-06-07 16:28:49 +02:00
Mike Fährmann
2fb85178da
[deviantart] add 'external' option (#302)
If a description is available, this will extract URLs from the
description text and try to find Extractors for them.
2019-06-06 18:53:50 +02:00
Mike Fährmann
f85e42cffc
[deviantart] fix --range for deviation & stash extractor 2019-06-06 18:45:10 +02:00
Mike Fährmann
40c7eb3424
[livedoor] improve extraction (fixes #301) 2019-06-06 15:22:27 +02:00
Mike Fährmann
62335b9015
[paheal] adjust test results 2019-06-05 11:42:01 +02:00
Mike Fährmann
aa1ca4ed35
[shopify] skip deleted products (#175)
Product pages which return a 4xx status code will now be skipped instead
of raising an exception.
2019-06-05 11:40:54 +02:00
Mike Fährmann
096009367b
[xhamster] add gallery & user extractor (#281) 2019-06-05 11:11:51 +02:00
Mike Fährmann
208202b962
[tumblr] improve error handling (#297)
In some cases Tumblr's API responds with an HTML document.
Trying to decode it as JSON would raise an uncaught exception.
2019-06-04 14:02:17 +02:00
Mike Fährmann
c08c340178
[directlink] make pattern case insensitive (fixes #296) 2019-06-03 10:56:14 +02:00
Mike Fährmann
95b4a53b9c
[keenspot] improve pagination (#223)
The old code would skip the last comic page for some series.
2019-06-02 22:12:21 +02:00