Mike Fährmann
904ba08568
[gfycat] fix default filename format
2020-11-13 06:37:21 +01:00
Mike Fährmann
a46561bc16
[500px] update query hashes
2020-11-13 06:36:11 +01:00
Mike Fährmann
2e3a0dff21
[8kun] fix file URLs of older posts (fixes #1101)
2020-11-07 23:10:37 +01:00
Mike Fährmann
00825cddf5
[hentaifoundry] use scheme from input URL (fixes #1095)
Let the user choose between http and https,
instead of always forcing https.
2020-11-07 22:40:02 +01:00
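A minimal sketch of what "use scheme from input URL" amounts to, assuming a simplified URL pattern and a hypothetical `root_from_url()` helper (neither is taken from the extractor's actual code): the scheme is captured from the user's URL and reused for the site root instead of hard-coding https.

```python
import re

# Hypothetical, simplified pattern; the real extractor pattern is more detailed.
PATTERN = re.compile(r"(https?)://(?:www\.)?hentai-foundry\.com")

def root_from_url(url):
    """Build the site root with the same scheme as the input URL."""
    match = PATTERN.match(url)
    if match is None:
        raise ValueError("unsupported URL: " + url)
    scheme = match.group(1)  # "http" or "https", exactly as the user gave it
    return scheme + "://www.hentai-foundry.com"

print(root_from_url("http://www.hentai-foundry.com/user/foo"))
# http://www.hentai-foundry.com
```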
Mike Fährmann
8a98d3549a
[weasyl] create directory for each favorite submission
(#1032)
2020-11-07 18:47:55 +01:00
Mike Fährmann
91db8df1c7
[deviantart] add 'index_base36' metadata field (closes #1099)
This is the same ID as found in 'filename' without the 'd' in front,
which is just 'index' encoded in base36.
2020-11-07 18:39:50 +01:00
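The relationship between 'index' and 'index_base36' described in the commit above can be sketched as plain base36 encoding. The helper name `to_base36()` and the example index are hypothetical, and lowercase digits are assumed to match the IDs seen in filenames. Python's built-in `int(text, 36)` performs the reverse conversion.

```python
import string

# 36 symbols: 0-9 followed by a-z (lowercase assumed)
DIGITS = string.digits + string.ascii_lowercase

def to_base36(num):
    """Encode a non-negative integer as a lowercase base36 string."""
    if num == 0:
        return "0"
    out = []
    while num:
        num, rem = divmod(num, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))  # most significant digit first

index = 123456789                  # hypothetical example value
index_base36 = to_base36(index)
print(index_base36)                # 21i3v9
assert int(index_base36, 36) == index
```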
Mike Fährmann
b9bfa4c675
update extractor test results
2020-11-07 02:03:22 +01:00
Mike Fährmann
1b5b789401
[mangoxo] fix metadata extraction
2020-11-07 01:35:29 +01:00
Mike Fährmann
41d4968866
[twitter] add 'list' extractor (#1096)
2020-11-05 22:55:38 +01:00
Mike Fährmann
5d10520f4c
[twitter] update GraphQL endpoint & fix width/height entries
2020-11-05 22:53:29 +01:00
Mike Fährmann
bc7b1d91bc
fix rST markup in configuration.rst
[ci skip]
2020-11-02 15:32:29 +01:00
Mike Fährmann
9b2e5f72d6
[exhentai] update image URL parsing (#1094)
2020-11-02 15:28:54 +01:00
Mike Fährmann
e3480bc8de
implement 'extension-map' option (#318)
2020-11-02 15:27:07 +01:00
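A sketch of what an extension map does, assuming it is a dictionary from original filename extensions to their replacements that is applied before the filename is built. The mapping shown and the `apply_extension_map()` helper are illustrative, not the option's actual defaults or implementation.

```python
# Illustrative mapping only; not the option's real default value.
EXTENSION_MAP = {"jpeg": "jpg", "jpe": "jpg", "jfif": "jpg"}

def apply_extension_map(kwdict, extension_map=EXTENSION_MAP):
    """Replace kwdict['extension'] when the map has an entry for it."""
    ext = kwdict.get("extension", "")
    kwdict["extension"] = extension_map.get(ext, ext)  # unchanged if unmapped
    return kwdict

print(apply_extension_map({"filename": "cat", "extension": "jpeg"}))
# {'filename': 'cat', 'extension': 'jpg'}
```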
Mike Fährmann
98a4d86a01
[sankakucomplex] extract videos and embeds (closes #308)
2020-10-30 01:21:11 +01:00
Mike Fährmann
c3f01dc4e6
implement 'util.unique()'
2020-10-29 23:33:41 +01:00
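The body of 'util.unique()' is not shown in the commit. A common way to write such a helper, sketched here as an assumption, is an order-preserving generator that remembers already-seen elements in a set (so the elements must be hashable).

```python
def unique(iterable):
    """Yield each element of iterable once, in order of first appearance."""
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element

print(list(unique([3, 1, 3, 2, 1])))  # [3, 1, 2]
```

Because it is a generator, it also works lazily on long or infinite inputs, yielding each new element as soon as it is encountered.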
Mike Fährmann
558cde139c
[paheal] fix extraction (fixes #1088)
2020-10-28 21:51:31 +01:00
Mike Fährmann
0211af7ca8
[hentaifoundry] update 'YII_CSRF_TOKEN' cookie handling
(fixes #1083)
2020-10-28 21:49:03 +01:00
Mike Fährmann
d83b95fd28
[postprocessor:metadata] accept a string-list for 'content-format'
(closes #1080)
2020-10-27 20:09:58 +01:00
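Accepting a list for 'content-format' is convenient in JSON config files, where multi-line strings are awkward to write. A sketch, assuming the list entries are simply joined with newlines; `normalize_format()` is a hypothetical name, and the exact joining rule is not stated in the commit.

```python
def normalize_format(content_format):
    """Return a single format string from a string or a list of lines."""
    if isinstance(content_format, (list, tuple)):
        return "\n".join(content_format)  # assumed joining rule
    return content_format

fmt = normalize_format(["{title}", "{date}"])
print(fmt.format(title="foo", date="2020-10-27"))
```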
Mike Fährmann
198c33ec36
also collect post processors from 'basecategory' entries
(fixes #1084)
2020-10-27 19:56:48 +01:00
Mike Fährmann
350b1afe1c
speed up _list_classes() after iterating over all modules once
2020-10-26 22:18:15 +01:00
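The commit does not show how '_list_classes()' is sped up. One general technique matching its description is to cache the loaded classes once a pass has iterated over every module, so later calls skip the loading step. The `ClassCache` class and the `loader` callable below are hypothetical.

```python
class ClassCache:
    """Cache the output of an expensive loader after one complete pass."""

    def __init__(self, loader):
        self.loader = loader   # callable returning an iterable of classes
        self.classes = None    # set only once a pass runs to completion

    def __iter__(self):
        if self.classes is not None:
            return iter(self.classes)
        return self._first_pass()

    def _first_pass(self):
        collected = []
        for cls in self.loader():
            collected.append(cls)
            yield cls
        # Only reached if the caller consumed everything (no early break).
        self.classes = collected

calls = 0

def loader():
    global calls
    calls += 1
    return [int, str, float]

cache = ClassCache(loader)
next(iter(cache))  # partial pass: nothing is cached yet
list(cache)        # full pass: the result gets cached
list(cache)        # served from the cache, loader is not called
print(calls)       # 2
```

Caching only after a complete pass matters because callers often stop early, for example once a matching extractor is found; a partial result must not be mistaken for the full list.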
Mike Fährmann
5bcf28de93
add an 'extractor.modules' option
2020-10-25 03:05:10 +01:00
Mike Fährmann
18213dc5ba
release version 1.15.2
2020-10-24 18:57:29 +02:00
Mike Fährmann
de4a1e45c9
improve 'generate_csrf_token()'
no need to use hashlib.md5()
2020-10-24 02:56:40 +02:00
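A random hexadecimal token can be produced directly from random bits, without hashing random data through 'hashlib.md5()'. This sketch assumes a 128-bit token rendered as 32 hex digits and is not necessarily the function's actual body. The `random` module is not cryptographically secure; `secrets.token_hex(16)` is the standard-library choice when that matters.

```python
import random
import string

def generate_csrf_token():
    """Return 128 random bits as 32 lowercase hex digits."""
    return random.getrandbits(128).to_bytes(16, "big").hex()

token = generate_csrf_token()
print(len(token), all(c in string.hexdigits for c in token))  # 32 True
```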
Mike Fährmann
b788712844
[fallenangels] fix extraction of '.5' chapters
2020-10-23 16:56:08 +02:00
Mike Fährmann
28d8541cb3
[mangafox] ensure download URLs have a scheme
2020-10-23 02:45:15 +02:00
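Download URLs without a scheme are typically protocol-relative ('//host/path'). A sketch of normalizing them, assuming that is the case here and that https is an acceptable default; `ensure_scheme()` is a hypothetical helper, not the extractor's code.

```python
def ensure_scheme(url, scheme="https"):
    """Prefix protocol-relative ('//host/...') URLs with a scheme."""
    if url.startswith("//"):
        return scheme + ":" + url
    return url  # already absolute (or not protocol-relative): leave as-is

print(ensure_scheme("//cdn.example.org/img/001.jpg"))
# https://cdn.example.org/img/001.jpg
```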
Mike Fährmann
8e3a324c91
[mangakakalot] ignore "Go Home" buttons in chapter pages
2020-10-23 02:33:35 +02:00
Mike Fährmann
c14c5d82d6
[newgrounds] use generator for fallback URLs
2020-10-23 00:39:19 +02:00
Mike Fährmann
a09f42f6b3
improve filename_from_url() performance
Manually extracting the part between the last '/' and '?' instead of
relying on the standard library's 'urllib.parse.urlsplit()' makes this
function roughly 4 times faster:
urlsplit() : 3.64 secs per 1,000,000 iterations
partition(): 0.87 secs per 1,000,000 iterations
2020-10-23 00:14:06 +02:00
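The partition()-based approach from the commit above can be sketched like this, next to an urlsplit()-based version for comparison; it is an illustration, not necessarily the exact function body. One behavioral difference to keep in mind: the partition() version does not strip a '#fragment' from URLs without a query string, while urlsplit() does.

```python
from urllib.parse import urlsplit

def filename_from_url(url):
    """Return the text between the last '/' and the first '?'."""
    return url.partition("?")[0].rpartition("/")[2]

def filename_from_url_urlsplit(url):
    """Slower reference version based on urllib.parse.urlsplit()."""
    return urlsplit(url).path.rpartition("/")[2]

url = "https://example.org/images/photo.jpg?size=large"
print(filename_from_url(url))           # photo.jpg
print(filename_from_url_urlsplit(url))  # photo.jpg
```

Timings like the ones in the commit message can be measured with the standard `timeit` module, e.g. `timeit.timeit(lambda: filename_from_url(url), number=1_000_000)`; absolute numbers depend on the machine and Python version.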
Mike Fährmann
968d3e8465
remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
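With '&' removed from the character classes, a path segment ends only at '/', '?' or '#'. A sketch using a hypothetical site pattern (example.org is a placeholder); note that '&' is now kept as part of the captured segment.

```python
import re

# Hypothetical pattern: a user name runs until '/', '?' or '#',
# the component delimiters listed in RFC 3986.
USER_PATTERN = re.compile(r"(?:https?://)?(?:www\.)?example\.org/user/([^/?#]+)")

for url in ("https://example.org/user/foo?page=2",
            "https://example.org/user/foo#top",
            "https://example.org/user/foo&bar"):
    print(USER_PATTERN.match(url).group(1))
# foo
# foo
# foo&bar
```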
Mike Fährmann
1686dc1757
[twitter] support media from Cards (#1005, #937)
Can be enabled with 'extractor.twitter.cards', but is disabled by
default for now because cards can redirect to rather large videos from
YouTube or Twitch.
2020-10-22 21:33:53 +02:00
Mike Fährmann
ffd38215a4
[hitomi] fix image URLs and URL pattern
- non-webp files are now hosted on [a-c]b.hitomi.la
- removed ampersand from invalid slug characters
2020-10-22 15:15:34 +02:00
Mike Fährmann
bac8af69e8
update configuration.rst
- add some lines to better explain post processor usage
- syntax highlighting for JSON blocks and other smaller stuff
2020-10-19 21:57:26 +02:00
Mike Fährmann
05d7009cc6
rename 'Authentication' entries in supportedsites.rst
- change 'Optional' to 'Supported'
- use 'OAuth' and 'Cookies' on their own
- add link to weasyl API key option
2020-10-19 20:16:17 +02:00
Mike Fährmann
286718950c
[mangahere] ensure download URLs have a scheme (fixes #1070)
2020-10-17 22:43:59 +02:00
Mike Fährmann
76dfa11a65
[reddit] add 'date' metadata field (closes #1068)
2020-10-16 15:48:04 +02:00
Mike Fährmann
3f2ba629ea
[newgrounds] provide fallback URLs for video downloads (#1042)
2020-10-16 01:16:12 +02:00
Mike Fährmann
a3ca2f6080
update fallback URL handling
remove Message.Urllist and use a '_fallback' field inside a kwdict
2020-10-16 01:09:55 +02:00
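With Message.Urllist gone, alternative URLs travel inside the kwdict under '_fallback'. A sketch of how a consumer might try them in order; `download_with_fallback()` and the `fetch` callable are hypothetical stand-ins for the real download logic. Using `itertools.chain` keeps this lazy, so '_fallback' may also be a generator.

```python
import itertools

def download_with_fallback(url, kwdict, fetch):
    """Try url, then each URL in kwdict['_fallback']; return the first success.

    'fetch' is a hypothetical callable returning True if a download worked.
    """
    for candidate in itertools.chain((url,), kwdict.get("_fallback", ())):
        if fetch(candidate):
            return candidate
    return None  # every candidate failed

kwdict = {"_fallback": ("https://example.org/v_720p.mp4",
                        "https://example.org/v_360p.mp4")}
result = download_with_fallback(
    "https://example.org/v_1080p.mp4", kwdict, lambda u: "720p" in u)
print(result)  # https://example.org/v_720p.mp4
```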
Mike Fährmann
43dab3a228
[mangadex] unescape more metadata fields (fixes #1066)
like 'manga', 'author', 'artist', etc.
2020-10-16 00:41:15 +02:00
Mike Fährmann
ec61696316
add 't' format string conversion (closes #1065)
to Trim whitespace from the beginning and end of strings.
Example: '{field!t}' becomes 'foo' for 'field' == " \nfoo\t\r"
2020-10-16 00:37:22 +02:00
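The project's own formatter is not shown here, but the same '!t' behavior can be reproduced as a standalone sketch with the standard library's `string.Formatter`, whose `convert_field()` hook handles conversions such as '!r' and '!s'. The class name `TrimFormatter` is hypothetical.

```python
import string

class TrimFormatter(string.Formatter):
    """string.Formatter that adds a '!t' conversion (strip whitespace)."""

    def convert_field(self, value, conversion):
        if conversion == "t":
            return str(value).strip()
        # fall back to the built-in conversions ('r', 's', 'a', None)
        return super().convert_field(value, conversion)

fmt = TrimFormatter()
print(repr(fmt.format("{field!t}", field=" \nfoo\t\r")))  # 'foo'
```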
Mike Fährmann
5565025221
[xhamster] fix user profile extraction
2020-10-15 18:57:35 +02:00
Mike Fährmann
07432d6262
[seiga] fix flake8 and cookie test (#1063)
2020-10-15 15:37:58 +02:00
Mike Fährmann
d1c6d78477
fix rST markup in configuration.rst
2020-10-15 15:17:19 +02:00
Mike Fährmann
b8daabc3ca
[pinterest] implement login support (closes #1055)
being logged in allows access to secret/protected boards
2020-10-15 15:14:18 +02:00
Mike Fährmann
1b1cf01d0d
add a general 'generate_csrf_token()' function
2020-10-15 15:14:18 +02:00
Mike Fährmann
7a0ba370d1
[gelbooru] rewrite mp4 video URLs (fixes #1048)
2020-10-15 15:14:18 +02:00
Mike Fährmann
6491db3eaf
[blogger] handle URLs with specified width/height (closes #1061)
get the highest quality for images with
/wXXX-hXXX/ instead of the usual /sXXX/
2020-10-15 15:14:18 +02:00
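A sketch of the URL rewrite, assuming (this is not stated in the commit) that the size segment, whether '/sXXX/' or '/wXXX-hXXX/', can be replaced with '/s0/' to request the original image, a widely used convention for Blogger-hosted images. The helper name and example URL are hypothetical.

```python
import re

# Matches a size segment such as /s1600/ or /w400-h300/.
SIZE_SEGMENT = re.compile(r"/(?:s\d+|w\d+-h\d+)/")

def original_size_url(url):
    """Replace the first size segment with /s0/ (assumed: original size)."""
    return SIZE_SEGMENT.sub("/s0/", url, count=1)

print(original_size_url(
    "https://blogger.googleusercontent.com/img/b/abc/w400-h300/pic.jpg"))
# https://blogger.googleusercontent.com/img/b/abc/s0/pic.jpg
```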
Mike Fährmann
783e0af26d
[hentaifoundry] update and simplify
2020-10-15 15:14:17 +02:00
Mike Fährmann
5b844a72b7
[newgrounds] handle embeds without scheme (#1033)
2020-10-15 15:13:54 +02:00
kurumigi
7e0e872f4f
[seiga] Add metadata for single image downloads (#1063)
* [seiga] Support image metadata.
* [seiga] Update test data.
* [seiga] Fix cookie check.
* [test_cookies] [seiga] Fit test_cookies.py to the last commit.
2020-10-15 15:13:27 +02:00
Zanny
3ec60e894a
[weasyl] api-key authentication (#1057)
* [weasyl] support api keys
* [weasyl] document api-key authentication
* [weasyl] usernames can contain ~
2020-10-15 15:12:09 +02:00
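A sketch of API-key authentication, assuming the key is sent in an 'X-Weasyl-API-Key' request header; the header name is an assumption here, not something this log states. `api_headers()` is a hypothetical helper, and the request below is only constructed, never sent.

```python
import urllib.request

def api_headers(api_key=None):
    """Return request headers, adding the API key only if one is configured."""
    headers = {}
    if api_key:
        headers["X-Weasyl-API-Key"] = api_key  # assumed header name
    return headers

# Build (but do not send) a request carrying the key.
request = urllib.request.Request(
    "https://www.weasyl.com/", headers=api_headers("0123abcd"))
print(api_headers("0123abcd"))  # {'X-Weasyl-API-Key': '0123abcd'}
```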