Mike Fährmann
f5604492c3
update interface of config functions
2019-11-24 00:42:28 +01:00
Mike Fährmann
3fc1e12949
[postprocessor:metadata] filter private entries
...
i.e. keys starting with an underscore
2019-11-21 16:58:44 +01:00
Mike Fährmann
978cb03f81
update misc test results
...
- Livedoor now uses https:// for its image URLs
- Instagram image URLs got simplified
2019-11-20 21:45:48 +01:00
Mike Fährmann
bbbeff4c41
[downloader.http] implement file-specific HTTP headers
2019-11-19 23:50:54 +01:00
Mike Fährmann
3ece3976ae
[newgrounds] implement login support ( #394 )
2019-11-16 23:45:32 +01:00
Mike Fährmann
abfcb356fc
[flickr] support 3k, 4k, 5k, and 6k photo sizes ( closes #472 )
2019-11-10 17:52:51 +01:00
Mike Fährmann
da6789b2b0
disable unique archive id checks for some tests
...
- same image twice in a livedoor blog post
- unreliable results for related pinterest items
2019-11-10 17:04:51 +01:00
Mike Fährmann
ba083b30b2
fix snap build
...
… hopefully
2019-11-08 21:44:12 +01:00
Mike Fährmann
94a94f3b86
miscellaneous stuff
2019-11-08 20:58:53 +01:00
Mike Fährmann
557e2c018b
[8chan] remove module
2019-11-02 20:06:47 +01:00
Mike Fährmann
322c2e7ed4
renaming variables
...
mostly 'keyword(s)' to 'kwdict'
2019-10-29 15:46:35 +01:00
Mike Fährmann
87a87bff7e
[simplyhentai] fix image URLs
2019-10-28 21:11:06 +01:00
Mike Fährmann
b23c822b23
[luscious] use GraphQL
2019-10-22 21:17:08 +02:00
Mike Fährmann
1693d97bd3
update extractor class hierarchies
...
- let the GalleryExtractor class inherit directly from Extractor
- make ChapterExtractor a subclass of GalleryExtractor
- change enumeration field names of GalleryExtractors to 'num'
2019-10-16 18:15:29 +02:00
Mike Fährmann
7ebd984e8d
[imgur] print error message if no JSON data is found ( #446 )
2019-10-16 17:45:14 +02:00
Mike Fährmann
de4e2029d1
[nsfwalbum] update test album
...
the old one is no longer available
2019-09-28 20:48:15 +02:00
Mike Fährmann
913460240d
[reddit] fix 'extractor.blacklist()' arguments
...
The second argument must support 'append()'.
2019-09-24 23:01:12 +02:00
Mike Fährmann
1848788970
update test results etc
2019-09-08 11:33:35 +02:00
Mike Fährmann
d5fbb2d9de
[tumblr] ignore audio links from Spotify etc.
2019-09-07 18:18:12 +02:00
Mike Fährmann
c6c5cb1898
improve 'deviantart.quality' description
2019-08-30 18:41:18 +02:00
Mike Fährmann
49f6d7176d
[deviantart] restore filenames ( #392 )
...
<title>_by_<user>_<id> --> <title>_by_<user>-<id>
2019-08-23 22:02:03 +02:00
Mike Fährmann
e528f3cb77
adjust postprocessor test results
...
see 2495b99
2019-08-21 23:54:08 +02:00
Mike Fährmann
23251356cb
require 'extension' data for each URL ( #382 )
2019-08-14 20:03:03 +02:00
Mike Fährmann
0bb873757a
update PathFormat class
...
- change 'has_extension' from a simple flag/bool to a field that
contains the original filename extension
- rename 'keywords' to 'kwdict' and some other stuff as well
- inline 'adjust_path()'
- put enumeration index before filename extension (#306 )
2019-08-12 21:40:37 +02:00
Mike Fährmann
748e37554c
update .travis.yml
...
- install pyOpenSSL before running tests
- simplify snap tests
2019-08-11 16:03:19 +02:00
Mike Fährmann
b7fb93e2b2
[downloader:http] add 'adjust-extensions' option
2019-08-08 16:54:20 +02:00
Mike Fährmann
eb7da159e2
[imagebam] update URL test results
...
Image URLs are now using https://, but the website itself is still
served as http://.
2019-08-07 21:47:44 +02:00
Mike Fährmann
fa60109e97
[exhentai] don't use e-hentai.org for exhentai URLs
2019-08-02 21:10:09 +02:00
Mike Fährmann
4a0c98bfc9
miscellaneous fixes and adjustments
2019-08-01 22:09:43 +02:00
Mike Fährmann
40637556fa
[ngomik] fix extraction
2019-07-28 10:53:46 +02:00
Mike Fährmann
d9d44ad953
[tsumino] update test results
2019-07-24 21:17:23 +02:00
Mike Fährmann
b3851e01d9
release version 1.9.0
2019-07-19 21:55:25 +02:00
Mike Fährmann
12da6bd0c9
[simplyhentai] fix/improve extraction
2019-07-06 20:25:53 +02:00
Mike Fährmann
b89f0d8d3c
update extractor result tests
2019-07-01 20:02:47 +02:00
Mike Fährmann
40da44b17f
Merge branch 'v1.9.0'
2019-06-29 15:39:52 +02:00
Mike Fährmann
7a99e85943
[kissmanga] fix download URLs and file extensions
...
The current Blogspot image URLs hosted on Kissmanga end with an
"invalid" query parameter (/000.png&upx=...), which doesn't get
recognized by 'spliturl()' and 'parseurl()' as such and gets therefore
included in the 'extension' field from 'text.nameext_from_url()'.
2019-06-28 20:34:43 +02:00
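The problem described in 7a99e85943 can be reproduced with the standard library alone. The sketch below is not the project's 'text.nameext_from_url()'; the helper names, the example.org URL, and the "PLACEHOLDER" parameter value are assumptions made up for illustration. It shows that a URL without '?' has no query component, so the '&upx=' tail ends up in the extension unless it is cut off first.

```python
from urllib.parse import urlsplit


def name_and_extension(url):
    """Split the last path segment of a URL into (name, extension)."""
    path = urlsplit(url).path
    filename = path.rpartition("/")[2]
    name, _, ext = filename.rpartition(".")
    return name, ext


def strip_stray_parameters(url):
    """Drop everything from the first '&' if the URL contains no '?'."""
    if "?" not in url:
        url = url.partition("&")[0]
    return url


# Hypothetical URL; "PLACEHOLDER" stands in for the elided parameter value.
url = "https://example.org/images/000.png&upx=PLACEHOLDER"

# No '?' in the URL, so nothing is recognized as a query string.
print(repr(urlsplit(url).query))

# The '&upx=' tail stays attached to the extension ...
print(name_and_extension(url))

# ... unless it is removed before splitting.
print(name_and_extension(strip_stray_parameters(url)))
```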
Mike Fährmann
a9c89085fb
[instagram] implement login support ( #195 )
2019-06-26 23:58:47 +02:00
Mike Fährmann
b1985d6579
test default format strings during extractor result tests
...
A missing value or an invalid "syntax" for a format replacement field
will raise an exception.
2019-06-25 18:12:32 +02:00
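The two failure modes named in b1985d6579 can be shown with Python's built-in string formatting. This is only a sketch: it uses 'str.format_map()' as a stand-in rather than the project's own formatter, and the 'check_format_string' helper and the sample 'kwdict' values are hypothetical.

```python
def check_format_string(format_string, kwdict):
    """Return None if the format string can be applied, else the exception."""
    try:
        format_string.format_map(kwdict)
    except (KeyError, ValueError) as exc:
        return exc
    return None


# Hypothetical metadata for a single file.
kwdict = {"category": "example", "id": 123, "extension": "jpg"}

# Every field is present and the syntax is valid.
ok = check_format_string("{category}_{id}.{extension}", kwdict)

# 'title' is missing from kwdict, which raises a KeyError.
missing = check_format_string("{category}_{title}.{extension}", kwdict)

# '!z' is not a valid conversion, which raises a ValueError.
invalid = check_format_string("{category}_{id!z}.{extension}", kwdict)

print(ok, type(missing).__name__, type(invalid).__name__)
```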
Mike Fährmann
95b1e4c3c0
implement R<old>/<new>/ format option ( #318 )
2019-06-23 22:45:44 +02:00
Mike Fährmann
70713f0f28
fix extractor result tests
2019-06-20 18:12:36 +02:00
Mike Fährmann
a77340c647
[keenspot] fix extraction for "TwoKinds"
2019-06-17 19:49:39 +02:00
Mike Fährmann
e05a96db5e
[deviantart] rename 'stash' to 'extra' ( #302 )
...
'stash' is already used as a name for the StashExtractor and therefore
expected to be a dictionary.
2019-06-10 21:05:25 +02:00
Mike Fährmann
7c6cb908f9
[xhamster] update test results
2019-06-07 16:28:49 +02:00
Mike Fährmann
62335b9015
[paheal] adjust test results
2019-06-05 11:42:01 +02:00
Mike Fährmann
6a34f4b0c1
skip tests on read timeouts; print list of skipped tests
2019-06-01 20:47:31 +02:00
Mike Fährmann
d33f5a7423
[wallhaven] rewrite
...
- use API
- remove login support, add 'api-key' option
- remove support for "alpha" subdomain - alpha.wallhaven.cc used numeric
IDs that can't be translated to the new ID system
- support direct links to wallpapers
2019-05-31 14:53:02 +02:00
Mike Fährmann
5499934ae2
[ngomik] fix extraction
2019-05-30 20:18:36 +02:00
Mike Fährmann
a5b060765d
improve code in tests
...
- use 'assertRaises' as context manager
- remove calls to .keys()
2019-05-13 11:48:20 +02:00
Mike Fährmann
5582b06ae4
fix tests with 'urllist' messages
2019-04-30 16:31:48 +02:00
Mike Fährmann
5018781898
allow type tests by name
2019-04-29 17:27:59 +02:00
Mike Fährmann
e25ebc4bff
don't disable certificate checks anymore
...
Executables generated with PyInstaller auto-include the root certificate
file and certificate checks now work out-of-the-box.
2019-04-17 13:27:19 +02:00
Mike Fährmann
d6ddb74cde
update test results
...
- deviantart: 'index' is now an integer
- flickr: image file with lower quality
- paheal: image server name changed
- rule34: post got deleted
2019-04-12 09:59:48 +02:00
Mike Fährmann
d9b94a585d
[mangoxo] add login support ( #184 )
...
A very recent change: It is now only possible to see more
than the first 5 images of an album if you are logged in.
2019-04-10 18:55:25 +02:00
Mike Fährmann
e730fc9045
[twitter] add login support ( #214 )
2019-04-09 09:27:49 +02:00
Mike Fährmann
790f15a56f
[photobucket] use HTTPS
2019-04-03 18:30:45 +02:00
Mike Fährmann
c70b21248d
[wikiart] add extractors ( #179 )
...
for
- artists: https://www.wikiart.org/en/thomas-cole
- artist-listings: https://www.wikiart.org/en/artists-by-century/12
- artwork-listings: https://www.wikiart.org/en/paintings-by-media/grisaille
2019-04-02 17:34:57 +02:00
Mike Fährmann
0c991a3155
add convenience targets to Makefile
2019-03-29 15:35:00 +01:00
Mike Fährmann
973a720a7a
[weibo] fix unit test URL patterns
2019-03-15 15:19:39 +01:00
Mike Fährmann
6f57d44ec2
[seaotterscans] remove extractor
...
http://seaotterscans.com/ now redirects to their MangaDex profile
2019-03-13 22:02:45 +01:00
Mike Fährmann
0887fb61f4
[komikcast] update test results
2019-03-07 14:55:52 +01:00
Mike Fährmann
976ccb267f
[myportfolio] combine gallery and user extractors
...
A URL alone isn't good enough to distinguish between a gallery or a
gallery-listing, so the new extractor decides what to do based on the
page's content.
2019-03-06 19:45:01 +01:00
Mike Fährmann
9c0e2f294b
[shopify] add generic collection and product extractors ( #175 )
...
with fashionnova.com as a default domain
2019-03-05 22:33:37 +01:00
Mike Fährmann
e687a6095e
[luscious] raise exception if album is not available
2019-02-19 13:30:39 +01:00
Mike Fährmann
b09a8184ca
move TestJob into test module; test _extractor values
2019-02-17 18:18:31 +01:00
Mike Fährmann
1f3422c28b
[mangahere] fix extraction
2019-02-10 22:10:53 +01:00
Mike Fährmann
84ae72b8d8
[ngomik] fix extraction
2019-02-10 14:19:08 +01:00
Mike Fährmann
bc0951d974
allow for simplified test data structures
...
Instead of a strict list of (URL, RESULTS)-tuples, extractor result
tests can now be a single (URL, RESULTS)-tuple, if it's just one test,
and "only matching" tests can now be a simple string.
2019-02-06 17:24:44 +01:00
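One way to accept all three shapes mentioned in bc0951d974 is to normalize them into a list of (URL, RESULTS)-tuples before running the tests. The function below is a hypothetical sketch, not the project's test code; representing an "only matching" test as (URL, None) is an assumption made for this example.

```python
def normalize_tests(test):
    """Return a list of (URL, RESULTS)-tuples for any accepted test shape."""
    # A plain string is an "only matching" test without expected results.
    if isinstance(test, str):
        return [(test, None)]

    # A single (URL, RESULTS)-tuple: first item is a URL, second is not.
    if (isinstance(test, tuple) and len(test) == 2
            and isinstance(test[0], str) and not isinstance(test[1], str)):
        return [test]

    # Otherwise treat it as a sequence of tests; bare strings inside it
    # are "only matching" tests as well.
    normalized = []
    for entry in test:
        if isinstance(entry, str):
            normalized.append((entry, None))
        else:
            normalized.append(tuple(entry))
    return normalized


# Hypothetical URLs and result dictionaries.
print(normalize_tests("https://example.org/a"))
print(normalize_tests(("https://example.org/a", {"count": 3})))
print(normalize_tests([("https://example.org/a", {"count": 3}),
                       "https://example.org/b"]))
```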
Mike Fährmann
347398f692
fix various tests
2019-02-04 14:40:21 +01:00
Mike Fährmann
0c32dc5858
[hentaifox] add extractor for search results ( #160 )
2019-01-28 22:38:32 +01:00
Mike Fährmann
217a0687ef
[behance] add 'collection' extractor ( closes #157 )
2019-01-19 18:11:20 +01:00
Mike Fährmann
66460337f1
[mangapark] fix extraction
2019-01-17 21:24:53 +01:00
Mike Fährmann
9bbbadd93a
[hbrowse] use HTTPS
2019-01-15 18:07:39 +01:00
Mike Fährmann
98c6520384
[pinterest] update root URL of API calls
2019-01-14 15:22:04 +01:00
Mike Fährmann
751e535948
[nhentai] fix extraction ( closes #156 )
...
Use JSON embedded in webpage since API endpoints have been disabled
2019-01-14 07:57:50 +01:00
Mike Fährmann
1734a6c879
[reactor] detect "circular" redirects ( #148 )
2019-01-09 14:59:15 +01:00
Mike Fährmann
e53cdfd6a8
update build_supportedsites.py
2019-01-09 14:58:35 +01:00
Mike Fährmann
0afa913de4
[tumblr] add tests for hidden and private blogs ( #145 )
...
Hidden / dashboard-only blogs are pretty straightforward and "only"
require a valid 'access-token' and 'access-token-secret' for the given
'api-key' and 'api-secret', so that signed OAuth1.0 requests are possible.
Private / password protected blogs on the other hand are a bit
cumbersome. In addition to a valid 'access-token' and
'access-token-secret', they also require the account belonging to those
tokens to be a member of the blog itself. Knowing the password and
entering it in the website isn't enough to access a blog through the
API. Following a private blog is also impossible, so that option can't
work either.
2019-01-03 16:12:24 +01:00
Mike Fährmann
fa7fa2f8ff
[deviantart] update tests
2019-01-01 15:39:34 +01:00
Mike Fährmann
259123732f
[readcomiconline] improve comic-page parsing
2018-12-30 13:19:23 +01:00
Mike Fährmann
6c71e9cf5d
[deviantart] add separate 'sta.sh' extractor ( #113 )
...
- supports multiple stashed deviations per page
- explicitly mentions sta.sh support on supportedsites.rst
2018-12-26 18:56:57 +01:00
Mike Fährmann
4d73cc785d
update test results
2018-12-14 16:07:32 +01:00
Mike Fährmann
010da8372a
[instagram] relax test pattern
2018-12-11 19:59:28 +01:00
Mike Fährmann
15890930ea
[mangafox] fix extraction
...
use mobile version since desktop version is obfuscated
2018-11-26 16:13:41 +01:00
Mike Fährmann
fb53b5dd55
fix control+c during -j and range tests
2018-11-25 18:54:05 +01:00
Mike Fährmann
59bb434ba5
[flickr] add ability to download all albums of a user
...
for example with 'https://www.flickr.com/photos/shona_s/albums'
2018-11-23 09:09:37 +01:00
Mike Fährmann
041bd501fc
[hentaifoundry] unescape YII_CSRF_TOKEN value
...
This fixes the POST requests to /site/filters
2018-11-19 21:46:17 +01:00
Mike Fährmann
d4b2b73bef
release version 1.6.0
2018-11-17 18:28:02 +01:00
Mike Fährmann
3c25fa2dad
update build_testresult_db.py script
2018-11-15 22:58:14 +01:00
Mike Fährmann
7f6a0be982
adjust some tests
2018-11-15 22:50:04 +01:00
Mike Fährmann
966a9ca3a0
update test results
2018-11-10 19:14:54 +01:00
Mike Fährmann
c9861ca812
adjust message for status_code based exceptions
...
from: 5xx HTTP Error: Reason
to : 5xx: Reason
The "HTTP Error" part was in there to emulate Request's error messages
from response.raise_for_status(), but it reads a lot better without.
2018-10-18 15:09:49 +02:00
Mike Fährmann
c00dce2adc
[behance] enable 'categorytransfer'
2018-10-09 23:40:49 +02:00
Mike Fährmann
1532d1b690
fix 'range' tests and update a few test results
2018-10-08 23:53:58 +02:00
Mike Fährmann
ca6ac4db6a
fix 'content' tests
2018-10-05 21:10:33 +02:00
Mike Fährmann
d70db2d555
Revert "[komikcast] fix extraction"
...
This reverts commit 5507f5ce2e.
2018-10-02 20:38:42 +02:00
Mike Fährmann
5507f5ce2e
[komikcast] fix extraction
2018-09-29 16:37:30 +02:00
Mike Fährmann
17611bfec0
update build_supportedsites.py script
2018-09-28 12:43:19 +02:00
Mike Fährmann
e066f35118
update extractor tests
2018-09-21 11:25:56 +02:00
Mike Fährmann
22ab509a70
[bobx] rename "model" to "idol" extractor
2018-09-14 18:11:36 +02:00
Mike Fährmann
8a23b21d0e
[tests] let 'pattern' require at least 1 URL
2018-09-02 21:19:44 +02:00
Mike Fährmann
0bc8ef51c8
[smugmug] Handle albums with no explicit owner ( #100 )
2018-09-01 12:55:02 +02:00
Mike Fährmann
34b556922d
update/restore tests
2018-08-23 15:47:40 +02:00
Mike Fährmann
e3055d356c
release version 1.5.1
2018-08-17 13:21:36 +02:00
Mike Fährmann
f9ded38d89
[test:results] add support for "range" options in tests
2018-08-15 21:49:44 +02:00
Mike Fährmann
7f4e41c989
increase timeout during extractor tests
...
cloudflare's 522 response takes longer than 30 seconds
2018-08-10 16:51:05 +02:00
Mike Fährmann
b55e39d1ee
[mangadex] improve extraction
...
- cache manga API results
- add artist, author and date fields to chapter metadata
- remove Manga-/ChapterExtractor inheritance
- minor code simplifications and improvements
2018-08-10 16:50:07 +02:00
Mike Fährmann
2a9f3341a2
[behance] fix title extraction
2018-08-08 10:48:58 +02:00
Mike Fährmann
a86f2bfc80
[pinterest] update not-found redirects
2018-08-07 12:13:19 +02:00
Mike Fährmann
7442d2940c
release version 1.5.0
2018-08-03 17:50:27 +02:00
Mike Fährmann
b040ca0718
[rule34] small unit test fixes
2018-08-03 17:28:47 +02:00
Mike Fährmann
f3793660ef
update tests
2018-08-02 14:57:28 +02:00
Mike Fährmann
42a346413b
fix "re:" prefix for keyword tests
2018-08-02 14:48:51 +02:00
Mike Fährmann
bb89a1e6d7
[mangahere] use http://
...
invalid SSL cert for quite some time now
2018-07-26 18:11:31 +02:00
Mike Fährmann
ce34d82cb4
fix skipping tests on 5xx status codes
2018-07-19 18:47:23 +02:00
Mike Fährmann
a6fe2bb594
[whatisthisimnotgoodwithcomputers] remove extractor
2018-07-14 09:53:16 +02:00
Mike Fährmann
0ba93650e0
[8chan] replace unit test URL
...
the other thread is no longer accessible
2018-07-14 09:53:16 +02:00
Mike Fährmann
269dc2bbd5
[sankaku] add 'tags' option ( #94 )
2018-07-14 09:53:01 +02:00
Mike Fährmann
764331823b
release version 1.4.2
2018-07-06 16:02:40 +02:00
Mike Fährmann
2eefaa99a3
[mangapark] support .net and .com mirrors
2018-07-05 14:45:05 +02:00
Mike Fährmann
188e956c4e
[imagefap] use HTTPS + update test results
2018-06-30 19:40:46 +02:00
Mike Fährmann
a699787d01
[deviantart] update URL patterns to new format
...
DeviantArt changed its URL format from
https://<name>.deviantart.com/...
to
https://www.deviantart.com/<name>/...
With this change both formats will be supported.
2018-06-28 20:21:59 +02:00
Mike Fährmann
b8c97d2295
use 'extractor.request()' for more HTTP requests
2018-06-25 23:40:59 +02:00
Mike Fährmann
7a98cc9798
[smugmug] update tests
...
My test account expired and all uploaded images got deleted.
2018-06-22 15:04:31 +02:00
Mike Fährmann
4eb94aca17
[postprocessor:ugoira] pass '-f' if not present
2018-06-22 13:26:17 +02:00
Mike Fährmann
a9e276bc37
reset delete-flag
...
Since 'PathFormat' objects are being reused, setting `delete`
to True once caused all files downloaded after to be deleted as well.
2018-06-20 18:12:59 +02:00
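The bug fixed in a9e276bc37 is a general hazard of reusing one object for many files. The sketch below uses a made-up 'ReusedPath' class, not the real 'PathFormat', and a hypothetical 'reset_flag' switch so that both behaviors can be compared side by side.

```python
class ReusedPath:
    """Hypothetical stand-in for an object that is reused for every file."""

    def __init__(self):
        self.path = None
        self.delete = False

    def set_path(self, path, reset_flag=True):
        self.path = path
        if reset_flag:
            # Start every file with a clean flag.
            self.delete = False


def process(paths, to_delete, reset_flag):
    """Return the files that end up deleted when one object is shared."""
    shared = ReusedPath()
    deleted = []
    for path in paths:
        shared.set_path(path, reset_flag)
        if path in to_delete:
            shared.delete = True
        if shared.delete:
            deleted.append(path)
    return deleted


paths = ["a.jpg", "b.jpg", "c.jpg"]

# Without the reset, the flag set for b.jpg also hits c.jpg.
print(process(paths, {"b.jpg"}, reset_flag=False))

# With the reset, only b.jpg is deleted.
print(process(paths, {"b.jpg"}, reset_flag=True))
```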
Mike Fährmann
6ac403c5d3
add postprocessor config example
2018-06-08 18:31:59 +02:00
Mike Fährmann
a47c6136cd
[simplyhentai] avoid redirects for all-pages.json ( #89 )
2018-06-01 22:06:34 +02:00
Mike Fährmann
0a1863fce3
[pixiv] respect more query parameters for user URLs
...
The API endpoint responsible for user illustrations does not
provide sufficient filter capabilities* to match the actual
website, so we are spinning our own filters.
Respected parameters are
'type': illust, manga, ugoira
'tag' : any image tag (this was already supported)
'p' : the page to start on
*
- API can filter for illustrations and manga, but not for ugoira.
- 'offset' is applied before filtering
- no 'tag' filter
2018-05-18 15:36:30 +02:00
Mike Fährmann
4cea886177
[imgur] allow longer album hashes
2018-05-13 11:21:51 +02:00
Mike Fährmann
e1e23165a0
[pinterest] catch JSON decode errors
2018-05-11 17:37:27 +02:00
Mike Fährmann
e2157f594e
[mangadex] fix manga extraction ( closes #84 )
...
Chapter listings for manga now use
https://mangadex.org/manga/<id>/_/chapters/2/
as URL instead of
https://mangadex.org/manga/<id>/_//2/
2018-05-06 17:43:50 +02:00
Mike Fährmann
3fe653d940
fix test_results for empty sets
...
{} is an empty dict and doesn't support set operations
2018-04-29 22:43:37 +02:00
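The remark in 3fe653d940 refers to a common Python pitfall: the literal '{}' creates a dict, and an empty set has to be written as 'set()'. A minimal illustration, with variable names chosen only for this example:

```python
expected = {"a", "b"}

empty_dict = {}       # this is a dict, not a set
empty_set = set()     # this is the empty set

# Set difference with a dict on the right-hand side is not supported.
try:
    expected - empty_dict
except TypeError as exc:
    error = exc
else:
    error = None

# The same operation works with a real empty set.
difference = expected - empty_set

print(type(empty_dict).__name__, type(empty_set).__name__)
print(type(error).__name__, sorted(difference))
```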
Mike Fährmann
d96b3474e5
[puremashiro] remove module
...
site has been unreachable for a couple of weeks
and now the DNS record is gone as well
2018-04-28 14:24:20 +02:00
Mike Fährmann
b44a296404
[gomanga] remove module
...
site has been unreachable for a couple of weeks
and the cloudflare status page shows host errors
2018-04-28 14:24:21 +02:00
Mike Fährmann
2395d870dd
[pinterest] unquote board and user names, better errors
2018-04-26 16:38:12 +02:00
Mike Fährmann
55d4d23860
[pinterest] use Pinterest's "Web" API ( #83 )
...
no access tokens, no user credentials of any kind ...
2018-04-24 22:28:10 +02:00
Mike Fährmann
10cc59f3b5
fix extractor names
2018-04-18 18:12:57 +02:00
Mike Fährmann
df7e18399e
[luscious] fix image order
2018-04-17 17:32:21 +02:00
Mike Fährmann
d10579edb5
[pinterest] improve PinterestAPI code; remove OAuth mentions
...
on another note: access_tokens have been set to only allow for
10 requests per hour (from 200 yesterday)
2018-04-17 17:12:42 +02:00
Mike Fährmann
4bd182c107
[pinterest] implement oauth:pinterest ( #83 )
...
Pinterest access tokens are rate limited at 200 requests per
hour (or maybe per 2 or 3 hours?) so having just one access token
for all users isn't going to work in the long run.
2018-04-16 20:03:28 +02:00
Mike Fährmann
dbe250f7e5
[pinterest] update access_token ( #83 )
2018-04-16 09:46:45 +02:00
Mike Fährmann
48a83a89e9
[loveisover] remove module
...
archive.loveisover.me was shut down on 2018-03-29;
https://www.archiveteam.org/index.php?title=4chan#archive.loveisover.me
2018-04-09 16:05:15 +02:00
Mike Fährmann
564e12ca8f
replace 'imgyt' with 'imxto'
...
https://img.yt/ wasn't available for a couple of days, but has now
re-emerged as https://imx.to/ with a new web-interface.
Links to older images still work (see tests).
2018-04-09 15:53:20 +02:00
Mike Fährmann
d11fcf4804
smaller changes and fixes
...
- fix the cloudflare challenge result if the last decimal places
are zero (JS's toFixed() removes trailing zeroes)
- fix downloading of kissmanga chapter-pages hosted on blogspot
(accessing blogspot with "kissmanga.com" as referrer yields a 401)
- disable certificate validation for 'mangahere' tests
- update flickr test result
2018-04-06 15:30:09 +02:00
Mike Fährmann
759ba26fb0
[luscious] proper image order for picture albums
...
... and (try) to start with the first image instead of somewhere
in the middle of an album.
2018-04-05 18:12:01 +02:00
Mike Fährmann
0381ae5318
replace error handlers for stdout and co.
...
Python 3.5 and lower throw a UnicodeEncodeError when trying to print
not-encodable characters when not using 'utf-8' as encoding.
Setting their error handlers to 'replace' should help.
2018-04-04 17:30:42 +02:00
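The effect of the 'replace' error handler from 0381ae5318 can be seen on an in-memory stream. This is a sketch, not the project's code: it writes through an 'io.TextIOWrapper' over a 'BytesIO' buffer instead of touching the real stdout, and the 'write_through' helper is hypothetical.

```python
import io


def write_through(text, encoding, errors):
    """Write text to an in-memory text stream and return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# With the 'strict' handler, a character outside the encoding raises.
try:
    write_through("Fährmann", "ascii", "strict")
except UnicodeEncodeError as exc:
    strict_error = exc
else:
    strict_error = None

# With 'replace', the character is substituted and output continues.
replaced = write_through("Fährmann", "ascii", "replace")

print(type(strict_error).__name__)
print(replaced)
```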
Mike Fährmann
64d7c85b55
[exhentai] improve metadata
...
- add 'width', 'height' and 'size' (in bytes) for each image
- change the former 'size' and 'size_units' into 'gallery_size'
2018-04-03 18:59:53 +02:00
Mike Fährmann
a112e3f2a0
[nijie] add doujin extractor
...
adds support for "https://nijie.info/members_dojin.php?id=<artist_id>"
2018-03-31 18:17:41 +02:00
Mike Fährmann
f5c6a2d7f5
[nhentai] use API to get gallery info
2018-03-21 12:58:41 +01:00
Mike Fährmann
8ef790de12
update .travis.yml
...
- restrict builds to master branch and release tags
- implement 'core' and 'results' test categories
2018-03-19 17:57:32 +01:00