Mike Fährmann
829ddf4ac1
[sankaku] general improvements
...
- simplify regex
- unquote search tags
- increase default wait-time between HTTP requests
  - downloading several hundred images always resulted
    in '429 Too Many Requests' eventually
- circumvent paging restrictions for unauthenticated users by only
  using the 'next' parameter
  - setting 'page' to a constant, low value (or simply omitting it)
    does the trick
2018-02-27 16:51:14 +01:00
Jad
49463f76bb
support multi-page URL (#79)
...
* support multi-page URL
* fix
* all done.
* fix, again
2018-02-26 11:13:49 +01:00
Mike Fährmann
19aefdfde3
[directlink] update test results
2018-02-26 03:01:23 +01:00
Mike Fährmann
74029c50bb
[directlink] unquote metadata fields
2018-02-26 02:12:47 +01:00
Mike Fährmann
2fad0b1f1b
add 'U' conversion for format strings to unquote their content
...
(#74)
2018-02-25 21:57:59 +01:00
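Note on 2fad0b1f1b: a custom conversion character can be added to Python format strings by subclassing string.Formatter and overriding convert_field(). The sketch below shows that general technique; the class name UnquoteFormatter is hypothetical, and this is not necessarily how the commit implements it.

```python
import string
from urllib.parse import unquote


class UnquoteFormatter(string.Formatter):
    """Hypothetical formatter that understands an extra '!U' conversion."""

    def convert_field(self, value, conversion):
        if conversion == "U":
            # percent-decode the field content, e.g. "%20" -> " "
            return unquote(str(value))
        # fall back to the standard '!s', '!r' and '!a' conversions
        return super().convert_field(value, conversion)


formatter = UnquoteFormatter()
print(formatter.format("{name!U}", name="hello%20world"))
```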
Mike Fährmann
8cdce21dcb
make archive keys user-configurable
2018-02-25 21:57:01 +01:00
Mike Fährmann
8f338347b6
[imagehosts] cleanup
...
removed
- chronos.to - unable to resolve hostname
- coreimg.net - same
- imgmaid.net - same
- hosturimage.com - everything returns 404
- imageontime.org - redirects to some shady site
- imgupload.yt - cloudflare error 522, host down
- img4ever.net - read timeout
2018-02-23 01:05:42 +01:00
Mike Fährmann
edfd3d9fc9
[yeet] remove module
...
- archive.yeet.net returns a 500 server error
- yeet.net moved to yeet.rip, but the archive is gone
2018-02-23 01:05:41 +01:00
Mike Fährmann
e1e0668ca8
add option to set default replacement field value
...
Missing or undefined keywords will now be replaced with the value
set for 'keywords-default'. The default is Python's 'None', which
is equivalent to setting this option to JSON's 'null'.
2018-02-23 00:59:20 +01:00
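Note on e1e0668ca8: one way to get this behaviour with Python's standard library is to override string.Formatter.get_value() and return a fallback for missing fields. The class name DefaultFormatter and its constructor argument are assumptions made for illustration, not the project's actual code.

```python
import string


class DefaultFormatter(string.Formatter):
    """Hypothetical formatter that substitutes a default for missing fields."""

    def __init__(self, default=None):
        # 'default' plays the role of the 'keywords-default' option value
        self.default = default

    def get_value(self, key, args, kwargs):
        try:
            return super().get_value(key, args, kwargs)
        except (KeyError, IndexError):
            # the field name is not among the supplied keywords
            return self.default


print(DefaultFormatter().format("{id}_{title}", id=1))
print(DefaultFormatter("unknown").format("{id}_{title}", id=1))
```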
Mike Fährmann
ac3da8115e
[util] don't add text: URLs to list of downloaded URLs
2018-02-20 18:14:27 +01:00
Mike Fährmann
8704d850bf
add explicit proxy support (#76)
...
- '--proxy' as command-line argument
- 'extractor.*.proxy' as config option
2018-02-19 18:45:06 +01:00
Mike Fährmann
89440382ad
[tumblr] use separate API key for unit tests
2018-02-19 16:54:37 +01:00
Mike Fährmann
367b963d37
[pixiv] fix ugoira extraction ... again (#78)
...
Some animations are not available for mobile devices, so we
pretend to be a desktop browser when requesting the ugoira page.
2018-02-19 16:50:12 +01:00
Mike Fährmann
b79f1f2ca7
[pixiv] fix ugoira extraction (closes #78)
2018-02-19 08:51:09 +01:00
Mike Fährmann
731ffd4986
improve text.filename_from_url() performance
...
- urlsplit() is faster than urlparse()
- rpartition() is faster than rindex() + slicing
- new version is 2.3 times as fast
2018-02-18 16:50:07 +01:00
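Note on 731ffd4986: the two substitutions named in the commit message can be combined as in the following minimal sketch. The function body is an assumption reconstructed from the message, not a copy of the committed code.

```python
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path component of 'url' ("" if the path ends in '/')."""
    # urlsplit() does not separate ';params' from the path, unlike urlparse()
    path = urlsplit(url).path
    # rpartition() returns (head, separator, tail); the tail is the filename
    return path.rpartition("/")[2]


print(filename_from_url("https://example.org/path/to/image.jpg?x=1"))
```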
Mike Fährmann
d122203be1
[mangastream] fix extraction
2018-02-17 22:40:16 +01:00
Mike Fährmann
8809b32aed
release version 1.2.0
2018-02-16 22:29:57 +01:00
Mike Fährmann
5864afc0d3
update CHANGELOG
2018-02-16 22:27:40 +01:00
Mike Fährmann
b50bdbf3d7
change config specifiers in input file format
...
Instead of a dictionary/object, input file options are now specified
by a 'key=value' pair starting with '-' for options only applying to
the next URL or '-G' for Global options applying to all following URLs.
See the docstring of parse_inputfile() for details.
Example option specifiers:
- filename = "{id}.{extension}"
- extractor.pixiv.user.directory = ["Pixiv Users", "{user[id]}"]
-spaces="are_optional"
-G keywords = {"global": "option"}
2018-02-16 03:10:41 +01:00
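Note on b50bdbf3d7: the example specifiers could be parsed roughly as sketched below. The helper name parse_option_line is hypothetical, and two things are assumed rather than taken from the source: that values are JSON, and that '-G' must be followed by whitespace. The docstring of parse_inputfile() remains the authoritative description.

```python
import json


def parse_option_line(line):
    """Return (is_global, key, value) for an option line, None for other lines."""
    if not line.startswith("-"):
        return None  # e.g. a URL
    body = line[1:]
    # assumption: '-G' followed by whitespace marks a global option
    is_global = body[:1] == "G" and body[1:2].isspace()
    if is_global:
        body = body[2:]
    # split at the first '=' only; the value itself may contain '='
    key, sep, value = body.partition("=")
    if not sep:
        raise ValueError("missing '=' in option line: " + line)
    # assumption: the value is valid JSON
    return is_global, key.strip(), json.loads(value.strip())


print(parse_option_line('-G keywords = {"global": "option"}'))
```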
Mike Fährmann
f970a8f13c
fix adding keys to download archive when using skip=false
2018-02-13 23:45:30 +01:00
Mike Fährmann
179bcdd349
adjust archive-ids
2018-02-13 04:50:45 +01:00
Mike Fährmann
be3ea4425d
test archive-id creation and uniqueness
2018-02-12 23:02:09 +01:00
Mike Fährmann
3cec533c28
Merge branch 'archive'
2018-02-12 18:07:58 +01:00
Mike Fährmann
20af86b2ea
add more extractor tests
...
for mangastream, reddit and imgur
2018-02-12 17:07:18 +01:00
Mike Fährmann
b73b8b4f50
add OAuth unittests
2018-02-12 17:07:07 +01:00
Mike Fährmann
4d2fadfb6f
restore skip actions with download archive
2018-02-12 16:56:45 +01:00
Mike Fährmann
65773263fc
[util] implement OAuthSession.urlencode() (closes #75)
...
- Python's own urllib.parse.urlencode() has no quote_via argument in
  Python 3.3 and 3.4, and that argument is needed to follow OAuth 1.0
  quoting rules.
2018-02-10 21:56:13 +01:00
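Note on 65773263fc: OAuth 1.0 percent-encodes everything except unreserved characters and writes a space as '%20', not '+'. A version-independent encoder can be built on urllib.parse.quote() as sketched here. The function names are hypothetical, and this is a stand-alone illustration, not the OAuthSession method itself.

```python
from urllib.parse import quote


def oauth_quote(value):
    # safe="~" keeps '~' unescaped on every Python version and,
    # unlike the default safe="/", also escapes '/'
    return quote(str(value), safe="~")


def oauth_urlencode(params):
    """Encode a mapping as 'k=v&k=v' using OAuth 1.0 percent-encoding."""
    return "&".join(
        oauth_quote(key) + "=" + oauth_quote(value)
        for key, value in params.items()
    )


print(oauth_urlencode({"q": "a b/c~d"}))
```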
Mike Fährmann
7e0207bcf4
[imgur] strip trailing '?1' from 'ext'
2018-02-10 21:33:40 +01:00
Mike Fährmann
cf147dfee9
[hentai2read] fix manga extraction
...
- site changed its HTML structure
2018-02-09 22:24:34 +01:00
Mike Fährmann
f5f2d29f56
[nijie] fix dojin extraction
...
- correctly extract artist_id
- set extension to "jpg" if it was empty and let filetype checks do
the rest
2018-02-09 22:06:26 +01:00
Mike Fährmann
7f7c16ae37
add option to specify additional key-value pairs
2018-02-08 23:10:58 +01:00
Mike Fährmann
d38bf2f54c
[tumblr] recognize /image/... URLs
...
xyz.tumblr.com/image/123 refers to the same images
as xyz.tumblr.com/post/123.
2018-02-08 23:08:14 +01:00
Mike Fährmann
057668e17e
extend input-file format with per-URL config and comments
...
- see docstring of parse_inputfile() for details
- TODO: unittests, recursion (currently setting for example
{"extractor": {"key": "value"}} will override the whole "extractor"
branch instead of merging {"key": "value"} into the already existing
dictionary)
2018-02-07 21:47:27 +01:00
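Note on 057668e17e: the merging behaviour described in the TODO could look like the recursive helper below. The name merge_options is hypothetical, and the sketch only illustrates the intended semantics (merge nested dictionaries instead of replacing the whole branch).

```python
def merge_options(base, update):
    """Merge 'update' into 'base' in place, descending into nested dicts."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            # both sides are dictionaries: merge instead of overwriting
            merge_options(base[key], value)
        else:
            base[key] = value
    return base


config = {"extractor": {"existing": 1}}
merge_options(config, {"extractor": {"key": "value"}})
print(config)
```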
Mike Fährmann
5b3c34aa96
use generic chapter-extractor in more modules
2018-02-07 12:36:39 +01:00
Mike Fährmann
347baf7ac5
improve util.parse_range() performance
...
It is never going to actually matter, but using partition() instead
of split() is twice as fast.
2018-02-05 22:28:11 +01:00
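Note on 347baf7ac5: the sketch below shows how partition() can replace split() when parsing a range specifier. The accepted syntax ("1-3,5,7-") and the use of None for an open end are assumptions for illustration; the real util.parse_range() may differ.

```python
def parse_range(rangespec):
    """Parse e.g. "1-3,5,7-" into a list of (first, last) tuples."""
    ranges = []
    for group in rangespec.split(","):
        if not group.strip():
            continue  # ignore empty groups such as a trailing comma
        # partition() stops at the first '-' and always returns three parts
        first, sep, last = group.partition("-")
        if not sep:
            beg = end = int(first)  # single value, e.g. "5"
        else:
            beg = int(first) if first.strip() else 1
            end = int(last) if last.strip() else None  # None: open-ended
        ranges.append((beg, end))
    return ranges


print(parse_range("1-3,5,7-"))
```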
Mike Fährmann
7b5ba69951
[hentaihere] ensure consistent extraction results
...
sometimes there is a random space before the next <a>
2018-02-05 15:26:25 +01:00
Mike Fährmann
377b78b3c9
[hentai2read] fix manga name extraction
2018-02-04 22:12:24 +01:00
Mike Fährmann
54c36a8a34
[subapics] add chapter- and manga-extractor (#70)
2018-02-04 22:02:10 +01:00
Mike Fährmann
2dd3aeeeae
[komikcast] add chapter- and manga-extractor (#70)
2018-02-04 22:02:10 +01:00
Mike Fährmann
7a412f5c32
implement generic manga-chapter extractor
2018-02-04 22:02:04 +01:00
Mike Fährmann
aa38eab2be
allow not-defined fields in format strings
...
... and replace them with "None", for now
2018-02-03 22:28:41 +01:00
Mike Fährmann
6a07e38366
implement extractor.add() and .add_module()
...
... as a public and non-hacky way to add (external) extractors to
gallery-dl's pool and make them available for extractor.find()
2018-02-02 00:01:41 +01:00
Mike Fährmann
c0dd922c13
add '--download-archive' cmdline option
...
… as well as a config file equivalent
2018-02-01 22:00:44 +01:00
Mike Fährmann
8c3b713362
rework DownloadJob.handle_url(); include archive functionality
...
todo:
"abort" and "exit" skip modes if download is skipped because of archive
2018-02-01 20:49:41 +01:00
Mike Fährmann
34873dbd90
set 'archive_fmt' values
...
These are going to be used to create a unique ID for each image.
2018-02-01 15:30:49 +01:00
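Note on 34873dbd90: a format string such as 'archive_fmt' can be turned into a per-image ID by filling it with the image's keywords. The sketch below assumes a category prefix and an in-memory set as storage; the class name ArchiveSketch and its methods are hypothetical and do not reflect the actual DownloadArchive class.

```python
class ArchiveSketch:
    """Hypothetical in-memory archive keyed by formatted image IDs."""

    def __init__(self, category, archive_fmt):
        # assumption: prefixing the category keeps IDs of different sites apart
        self.prefix = category
        self.archive_fmt = archive_fmt
        self.seen = set()

    def key(self, keywords):
        # fill the format string with this image's metadata
        return self.prefix + self.archive_fmt.format_map(keywords)

    def check(self, keywords):
        """Return True if this image was recorded before."""
        return self.key(keywords) in self.seen

    def add(self, keywords):
        self.seen.add(self.key(keywords))


archive = ArchiveSketch("pixiv", "{id}_{num}")
image = {"id": 123, "num": 0}
if not archive.check(image):
    archive.add(image)  # record the image after a successful download
```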
Mike Fährmann
a34cebc253
[luscious] jump to first image if cover does not link to it
2018-01-30 22:39:01 +01:00
Mike Fährmann
84a52a9256
add DownloadArchive class
2018-01-30 15:23:23 +01:00
Mike Fährmann
915807dd77
log HTTP errors as warnings
2018-01-29 21:55:46 +01:00
Mike Fährmann
db7f04dd97
emit log messages on download failure
...
and when retrying with fallback URLs
2018-01-28 18:44:10 +01:00
Mike Fährmann
d951f13e37
add config option for unsupported-URL file
...
for consistency's sake
2018-01-28 18:42:10 +01:00