Mike Fährmann
d33f5a7423
[wallhaven] rewrite
...
- use API
- remove login support, add 'api-key' option
- remove support for "alpha" subdomain - alpha.wallhaven.cc used numeric
IDs that can't be translated to the new ID system
- support direct links to wallpapers
2019-05-31 14:53:02 +02:00
Mike Fährmann
5499934ae2
[ngomik] fix extraction
2019-05-30 20:18:36 +02:00
Mike Fährmann
f1893b2b5b
[deviantart] add 'folders' option ( #276 )
2019-05-30 17:28:12 +02:00
Mike Fährmann
c849574def
[keenspot] add comic extractor ( #223 )
...
Doesn't work for
- http://brawlinthefamily.keenspot.com/
- http://flipside.keenspot.com/
- http://lastblood.keenspot.com/
- http://mysticrevolution.keenspot.com/
- http://porcelain.keenspot.com/
- http://twokinds.keenspot.com/
yet, because of custom layouts.
2019-05-28 21:34:38 +02:00
Mike Fährmann
2b1999476e
implement 'text.rextract()'
2019-05-28 21:03:41 +02:00
Mike Fährmann
8bd5a19515
[hentainexus] add '_extractor' data
2019-05-28 00:20:01 +02:00
Mike Fährmann
2a085a5e96
[sankakucomplex] fix 'date' values ( #258 )
2019-05-28 00:18:58 +02:00
Mike Fährmann
bcd1801aa8
[sankakucomplex] add 'tag' extractor ( #258 )
2019-05-27 23:57:44 +02:00
Mike Fährmann
74c2415138
[sankakucomplex] move article extractor to its own module ( #258 )
2019-05-27 23:49:23 +02:00
Mike Fährmann
4465a3ea68
[kissmanga][readcomiconline] add 'captcha' option ( #279 )
...
to configure how to handle CAPTCHA page redirects:
- either interactively wait for the user to solve the CAPTCHA
- or raise StopExtraction like before
2019-05-27 22:24:48 +02:00
Mike Fährmann
e30ada162d
fix cookie tests
...
update _get_extractor():
- always return an Extractor instance with a _login_impl() method
- use Extractor.from_url()
2019-05-26 20:22:04 +02:00
Mike Fährmann
1e3e15c4f3
[sankaku] add article extractor ( #258 )
2019-05-26 17:42:36 +02:00
Mike Fährmann
48233f00c0
[readcomiconline] detect 'AreYouHuman' redirects ( #279 )
2019-05-26 15:58:37 +02:00
Mike Fährmann
1cde38110d
[livedoor] return 'date' as datetime object
2019-05-25 23:45:56 +02:00
Mike Fährmann
e88824e1a7
[livedoor] fix adjustments for https:// URLs
2019-05-25 23:45:22 +02:00
Mike Fährmann
2316e0ed3d
fix strptime workaround from b0e85a4
...
Don't return a modified version of 'date_time' if strptime fails.
2019-05-25 23:22:26 +02:00
Mike Fährmann
b3e4664715
[hentainexus] fix extraction
2019-05-25 22:35:04 +02:00
Mike Fährmann
399e8e965a
also update urllib3's cipher list for versions >= 1.25
2019-05-21 23:02:20 +02:00
Mike Fährmann
f837ea98cb
[deviantart] don't call 'extend()' on folders ( fixes #271 )
2019-05-20 16:24:13 +02:00
Mike Fährmann
bb32a2d490
[patreon] use file extensions from original filenames ( #268 )
2019-05-20 15:46:59 +02:00
Mike Fährmann
efa805c5d7
[sankaku] update pagination end condition ( fixes #265 )
...
Pagination over popular listings (`date:...+order:popular`) never
terminates, not even on the site itself, and at some point returns the
same results over and over again.
2019-05-20 15:46:06 +02:00
Mike Fährmann
d514d49c72
release version 1.8.4
2019-05-17 23:52:09 +02:00
Mike Fährmann
a4ba34c835
[booru] prevent crash when no tags are present ( #259 )
2019-05-17 19:32:53 +02:00
Mike Fährmann
ca3bad1779
[patreon] small fixes and adjustments ( #226 )
...
- fix datetime parsing
- rename 'user' to 'creator'
- convert 'id' to integer
- improve tests
2019-05-17 19:32:41 +02:00
Leonardo Taccari
fb09dd962a
[instagram] Fix extraction after `rhx_gis' field removal
2019-05-17 18:17:42 +02:00
Mike Fährmann
7a14aaed7d
[luscious] fix extraction
2019-05-17 10:48:47 +02:00
Mike Fährmann
e82cadac61
[patreon] add extractors ( #226 )
2019-05-17 10:47:22 +02:00
Mike Fährmann
4891f4a328
[hentainexus] add search extractor ( #256 )
2019-05-16 23:55:30 +02:00
Mike Fährmann
c02f12ce2f
avoid Cloudflare CAPTCHAs for OpenSSL < 1.1.1
...
see https://github.com/Anorov/cloudflare-scrape/pull/242
2019-05-15 12:25:20 +02:00
Mike Fährmann
0b4be57a10
[sankaku] fix error when no tags available ( closes #259 )
...
[ci skip]
2019-05-14 23:40:07 +02:00
Mike Fährmann
6764847349
fix cookie tests
...
'cookies' is a CookieJar, not a dict,
and removing the call to '.keys()' doesn't have the same effect
2019-05-14 22:32:40 +02:00
Mike Fährmann
9890bfdf23
[flickr] improve code and metadata
...
- simplify pagination
- add more metadata and slightly change its structure
- convert suitable values to int or list
- move keys from ["photo"] to the base level
- proper video support (#246 )
- rename method and variable names to better fit with other extractors
2019-05-14 22:10:50 +02:00
Mike Fährmann
aa8e366b90
[luscious] fix tag extraction
2019-05-14 17:35:52 +02:00
Mike Fährmann
a5b060765d
improve code in tests
...
- use 'assertRaises' as context manager
- remove calls to .keys()
2019-05-13 11:48:20 +02:00
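A minimal sketch of the 'assertRaises' context-manager form mentioned in the commit above. The helper `parse_page_number` and the test class are hypothetical and only serve as an illustration; they are not taken from the project's test suite.

```python
import unittest


def parse_page_number(value):
    """Hypothetical helper: convert 'value' to a positive page number."""
    number = int(value)  # raises ValueError for non-numeric input
    if number < 1:
        raise ValueError("page number must be positive")
    return number


class ParseTests(unittest.TestCase):

    def test_non_numeric(self):
        # The block passes only if ValueError is raised inside it.
        with self.assertRaises(ValueError):
            parse_page_number("abc")

    def test_message(self):
        # The context manager also exposes the caught exception.
        with self.assertRaises(ValueError) as cm:
            parse_page_number("0")
        self.assertIn("positive", str(cm.exception))
```

Saved as a module, these tests can be run with `python -m unittest`.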
Mike Fährmann
ba8eb1ffec
[hentainexus] add gallery extractor ( #256 )
2019-05-12 23:59:41 +02:00
Mike Fährmann
bd9cb3d191
improve job class selection code
...
+ consistent argument order for add_argument() calls
2019-05-10 22:05:57 +02:00
Mike Fährmann
e64773ffdd
allow multiple post-processor command-line options ( #253 )
...
... without overwriting any previous ones
2019-05-10 15:32:23 +02:00
Mike Fährmann
b1db194c14
[reactor] update and improve
...
- split 'tags' into a list
- parse 'date' into a datetime object
- fix webm/mp4 URLs
2019-05-09 23:24:49 +02:00
Mike Fährmann
b0e85a42e3
apply workaround from 4736912
in parse_datetime() itself
2019-05-09 21:53:17 +02:00
Mike Fährmann
523ebc9b0b
Fix serialization of 'datetime' objects in '--write-metadata'
...
Simplified universal serialization support in json.dump() can be achieved
by passing 'default=str', which was already the case in DataJob.run()
for -j/--dump-json, but not for the 'metadata' post-processor.
This commit introduces util.dump_json() that (more or less) unifies the
JSON output procedure of both --write-metadata and --dump-json.
(#251 , #252 )
2019-05-09 16:49:22 +02:00
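The 'default=str' approach described in the commit above can be sketched as follows. The function name `dump_metadata` and the sample data are assumptions made for this illustration; the actual signature of util.dump_json() is not shown in this log.

```python
import json
from datetime import datetime


def dump_metadata(obj):
    """Hypothetical helper: serialize 'obj' to a JSON string.

    'default=str' makes json fall back to str() for any value it
    cannot serialize natively, such as datetime objects.
    """
    return json.dumps(obj, default=str, sort_keys=True, indent=4)


metadata = {"id": 1, "date": datetime(2019, 5, 9, 16, 49, 22)}
print(dump_metadata(metadata))
```

Without 'default=str', the same call raises a TypeError because datetime objects are not JSON serializable.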
Mike Fährmann
8de5866fd2
[twitter] replace unit test URLs
...
https://twitter.com/PicturesEarth was deleted
2019-05-09 10:17:55 +02:00
Mike Fährmann
74c7304c6b
[newgrounds] extract 'date', 'favorites', and 'score'
2019-05-08 18:09:17 +02:00
Mike Fährmann
4736912d4e
[pixiv] work around strptime limitations in Python < 3.7
...
"%z" doesn't allow a colon separator in older Python versions:
- "+0900" is OK
- "+09:00" raises an exception
2019-05-08 18:08:03 +02:00
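One possible form of the "%z" workaround from the commit above, as a sketch: remove the colon from a trailing UTC offset before calling strptime(). The function name `parse_with_offset` and the default format string are assumptions for this example, not the project's actual code.

```python
from datetime import datetime, timedelta


def parse_with_offset(date_string, fmt="%Y-%m-%dT%H:%M:%S%z"):
    """Hypothetical helper: parse 'date_string' with a UTC offset.

    Accepts both "+0900" and "+09:00" by dropping the colon from a
    trailing "+HH:MM" or "-HH:MM" offset before parsing.
    """
    if (len(date_string) >= 6 and
            date_string[-3] == ":" and date_string[-6] in "+-"):
        date_string = date_string[:-3] + date_string[-2:]
    return datetime.strptime(date_string, fmt)


with_colon = parse_with_offset("2019-05-08T18:08:03+09:00")
without_colon = parse_with_offset("2019-05-08T18:08:03+0900")
print(with_colon == without_colon)
```

Both inputs produce the same timezone-aware datetime object.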
Mike Fährmann
1f7fa9dc8e
[exhentai] update data extraction code
...
- parse 'date' to datetime object
- use 'text.extract_from()'
2019-05-08 15:44:29 +02:00
Mike Fährmann
80fdb11508
[pixiv] add 'date' metadata field ( closes #248 )
2019-05-08 15:43:59 +02:00
Mike Fährmann
d09864b581
implement text.parse_datetime()
2019-05-08 15:43:59 +02:00
Mike Fährmann
049e9fd6ce
[twitter] fix pagination end condition
...
Some timelines would cause an endless loop because 'has_more_items' is
always True, even if it would return the same list of tweets over and
over again.
2019-05-08 15:43:59 +02:00
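A sketch of one way to end pagination when the server keeps reporting more items but repeats the same page, as described in the commit above. All names here (`paginate`, `fake_fetch`, the "id" key, the cursor values) are hypothetical; the actual extractor code is not part of this log.

```python
def paginate(fetch_page):
    """Yield items page by page; stop on an empty or repeated page.

    'fetch_page(cursor)' is assumed to return a tuple of
    (items, next_cursor, has_more).
    """
    cursor = None
    last_ids = None
    while True:
        items, cursor, has_more = fetch_page(cursor)
        ids = [item["id"] for item in items]
        # Do not trust 'has_more' alone: also stop if this page is
        # empty or identical to the previous one.
        if not ids or ids == last_ids:
            return
        yield from items
        last_ids = ids
        if not has_more:
            return


def fake_fetch(cursor):
    """Stand-in for a server that always claims to have more items."""
    pages = {None: ([{"id": 1}, {"id": 2}], "a"), "a": ([{"id": 3}], "b")}
    # Any other cursor returns the last page again, forever.
    items, next_cursor = pages.get(cursor, ([{"id": 3}], "b"))
    return items, next_cursor, True


result = [item["id"] for item in paginate(fake_fetch)]
print(result)
```

Comparing only against the previous page is the simplest check; a set of all seen IDs would also catch longer repeating cycles.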
Mike Fährmann
51e0e92429
[deviantart] fix GIF downloads ( #242 )
...
The "original" download URL for GIF animations is only a preview version
of the original file.
2019-05-08 15:43:43 +02:00
Leonardo Taccari
f347d2d152
[instagram] Fix for missing `edge_media_to_comment' field and add `date' metadata ( #250 )
...
* [instagram] Remove no longer always present `comments' field
`edge_media_to_comment' is no longer always present in the response
(even for the same media, it is sometimes present and sometimes not).
* [instagram] Add `date' metadata
2019-05-08 15:42:58 +02:00
Mike Fährmann
26b516b328
release version 1.8.3
2019-05-04 22:50:00 +02:00