Mike Fährmann
6dae6bee37
automatically detect and bypass cloudflare challenge pages
...
TODO: cache and re-apply cf_clearance cookies
2019-03-10 15:31:33 +01:00
Mike Fährmann
25aaf55514
[smugmug] improve format selection (closes #183)
...
- use original image if available
- support video formats
- remove user info for ImageExtractor (it is no longer possible to get
image owner information for a single image)
2019-03-10 15:20:35 +01:00
Mike Fährmann
7c1cb923a4
[myportfolio] replace unit test
...
the old gallery got removed
2019-03-10 15:06:16 +01:00
Mike Fährmann
fffbfd3dce
[imgspice] fix extraction
2019-03-09 20:29:23 +01:00
Mike Fährmann
4ca4631bad
simplify auto-disabling certificate verification
...
if no certificate bundle is found
2019-03-08 16:34:01 +01:00
Mike Fährmann
09d872a2b1
generalize extractor creation code
2019-03-07 22:55:26 +01:00
Mike Fährmann
8dc6be246b
[shopify] add custom retry logic for 430 status codes (#175)
2019-03-07 15:31:15 +01:00
Mike Fährmann
0887fb61f4
[komikcast] update test results
2019-03-07 14:55:52 +01:00
Mike Fährmann
976ccb267f
[myportfolio] combine gallery and user extractors
...
A URL alone isn't enough to distinguish between a gallery and a
gallery listing, so the new extractor decides what to do based on the
page's content.
2019-03-06 19:45:01 +01:00
Mike Fährmann
efd104e45e
[instagram] reject more non-user URLs (#180)
2019-03-06 10:26:01 +01:00
HRXN
56e0e92e0d
[shopify] cosmetic changes in shopify.py (#181)
...
Glanced over the commits, randomly spotted some minor things.
2019-03-06 09:16:27 +01:00
Mike Fährmann
9c0e2f294b
[shopify] add generic collection and product extractors (#175)
...
with fashionnova.com as a default domain
2019-03-05 22:33:37 +01:00
Mike Fährmann
26c4365baa
adjust metadata types for GalleryExtractors
2019-03-02 14:53:04 +01:00
Mike Fährmann
13e0f2a78f
[deviantart] add 'scraps' extractor (closes #168)
2019-03-01 14:13:34 +01:00
Mike Fährmann
3ea11f5d5e
[nhentai] rewrite
...
- use GalleryExtractor as base class
- extract a lot more metadata (artist, tags, etc.)
2019-03-01 14:13:34 +01:00
Mike Fährmann
3595cd582f
use GalleryExtractor as common base class
2019-03-01 14:13:16 +01:00
Mike Fährmann
a138d5873d
[hentaifoundry] improve/fix extraction
...
- Sometimes an ad interfered when trying to get a download URL
- Resolving "www.hentai-foundry.com" yields an invalid(?) IPv6 address
(2607:5300:60:ca9e:feed:dead:beef:1) and urllib3 only tries to connect
to the IPv4 variant after a rather long wait time
2019-02-25 16:16:09 +01:00
Mike Fährmann
280531c8ff
[pururin] add gallery extractor (closes #174)
2019-02-25 14:54:57 +01:00
Mike Fährmann
3159dd79d5
[seiga] use HTTPS
2019-02-21 22:51:11 +01:00
Mike Fährmann
f6734142ee
[komikcast] remove 'width' and 'height' info
2019-02-19 15:12:40 +01:00
Mike Fährmann
d0059cab79
[tumblr] check for null URLs (closes #165)
2019-02-19 13:49:55 +01:00
Mike Fährmann
e687a6095e
[luscious] raise exception if album is not available
2019-02-19 13:30:39 +01:00
Mike Fährmann
22d3a2fcc8
[artstation] add extractor for artwork listings (#80)
...
like https://www.artstation.com/artwork?sorting=latest
or https://www.artstation.com/artwork?sorting=picks
2019-02-18 12:45:44 +01:00
Mike Fährmann
937a802b49
[dynastyscans] add extractors for images and image searches
...
(closes #163)
2019-02-18 12:25:52 +01:00
Mike Fährmann
b09a8184ca
move TestJob into test module; test _extractor values
2019-02-17 18:18:31 +01:00
Mike Fährmann
19860655a3
[weibo] add 'user' and 'status' extractors
2019-02-17 18:18:31 +01:00
Mike Fährmann
f8782c05f2
[paheal] rename "tags" to "search_tags"
...
to better match field names of other booru extractors
2019-02-17 18:18:09 +01:00
Mike Fährmann
c7b8421333
[deviantart] don't match 'www' as a potential username
2019-02-15 16:38:29 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway.)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
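A minimal sketch of the new result shape, assuming a simplified stand-in
for text.nameext_from_url(). The parsing details below are illustrative
assumptions and not the project's actual implementation:

```python
from urllib.parse import urlsplit, unquote


def nameext_from_url(url, data=None):
    """Hypothetical stand-in: fill 'filename' and 'extension' from a URL."""
    if data is None:
        data = {}
    # Take the last path segment and ignore any query string or fragment.
    path = urlsplit(url.strip()).path
    basename = unquote(path.rpartition("/")[2])
    # Split at the last dot; 'filename' no longer includes the extension.
    name, _, ext = basename.rpartition(".")
    if name:
        data["filename"] = name
        data["extension"] = ext
    else:
        # No usable dot: treat the whole segment as the filename.
        data["filename"] = basename
        data["extension"] = ""
    return data


result = nameext_from_url("https://example.org/path/filename.ext")
```

Here `result` holds only the two keys 'filename' and 'extension'; the old
combined 'filename.ext' value is no longer stored.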
2019-02-14 16:07:17 +01:00
Mike Fährmann
32edf4fc7b
add '_extractor' info to manga extractor results
2019-02-13 13:23:36 +01:00
Mike Fährmann
89ee8cd7e4
filter "private" kwdict entries
2019-02-13 13:22:11 +01:00
Mike Fährmann
61741d7333
provide type information for Queue messages
...
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
2019-02-12 21:32:32 +01:00
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url
2019-02-12 18:46:48 +01:00
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
...
- unify and simplify constructors
- rename get_metadata and get_images to just metadata() and images()
- rename self.url to chapter_url and manga_url
2019-02-11 18:38:47 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
ade86da7a1
[tsumino] replace test
2019-02-11 13:25:38 +01:00
Mike Fährmann
1f3422c28b
[mangahere] fix extraction
2019-02-10 22:10:53 +01:00
Mike Fährmann
84ae72b8d8
[ngomik] fix extraction
2019-02-10 14:19:08 +01:00
Mike Fährmann
02d733d219
[simplyhentai] fix and improve tag extraction
...
The "tags" field is now a list instead of a string.
In format strings, use "{tags:J, }" to Join them.
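The effect of the "J" conversion can be approximated with the standard
library. This is only a sketch: JoinFormatter is a hypothetical class
built on string.Formatter, not the formatter the project itself uses.

```python
import string


class JoinFormatter(string.Formatter):
    """Hypothetical formatter: 'J<sep>' joins a list using <sep>."""

    def format_field(self, value, format_spec):
        if format_spec.startswith("J"):
            # Everything after the 'J' is used as the separator.
            return format_spec[1:].join(value)
        return super().format_field(value, format_spec)


joined = JoinFormatter().format("{tags:J, }", tags=["tag1", "tag2", "tag3"])
```

With the list above, `joined` is the single string "tag1, tag2, tag3".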
2019-02-10 13:52:09 +01:00
Mike Fährmann
3a0b4af744
[seiga] recognize /thumb/ URLs
...
https://lohas.nicoseiga.jp/thumb/5977527i
2019-02-09 16:53:27 +01:00
Mike Fährmann
8fc6fbfa34
[artstation] recognize shortened project URLs
...
https://artstn.co/p/<project-id>
2019-02-09 16:53:11 +01:00
Mike Fährmann
9a9cd32461
implement alternative constructor for extractors
2019-02-09 14:42:25 +01:00
Mike Fährmann
abbd45d0f4
update handling of extractor URL patterns
...
When loading extractor classes during 'extractor.find(…)', their
'pattern' attribute will be replaced with a compiled version of itself.
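The idea can be sketched as follows. ExampleExtractor, its pattern, and
this find() function are hypothetical and only illustrate replacing a
string pattern with its compiled form on first use:

```python
import re


class ExampleExtractor:
    # Hypothetical extractor class with a single string URL pattern.
    pattern = r"(?:https?://)?example\.org/gallery/(\d+)"


def find(url, classes):
    """Return (class, match) for the first class whose pattern matches."""
    for cls in classes:
        if isinstance(cls.pattern, str):
            # Replace the string with a compiled pattern, once per class.
            cls.pattern = re.compile(cls.pattern)
        match = cls.pattern.match(url)
        if match:
            return cls, match
    return None


found = find("https://example.org/gallery/42", [ExampleExtractor])
```

After the first call, ExampleExtractor.pattern is a compiled pattern
object, so later lookups skip the compilation step.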
2019-02-08 20:08:16 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
34bab080ae
rewrite URL patterns to use only 1 per extractor
2019-02-08 12:03:10 +01:00
Mike Fährmann
0e46db6f45
rename some base classes
...
They shouldn't be called …Extractor if they don't have 'Extractor' as
their base class.
2019-02-08 11:43:40 +01:00
Mike Fährmann
793b24e513
[imagehosts] fix and improve various extractors
2019-02-06 17:41:26 +01:00
Mike Fährmann
bc0951d974
allow for simplified test data structures
...
Instead of a strict list of (URL, RESULTS)-tuples, extractor result
tests can now be a single (URL, RESULTS)-tuple, if it's just one test,
and "only matching" tests can now be a simple string.
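One way to handle the three accepted shapes is to normalize them before
running the tests. normalize_tests() and the "count" result key are
hypothetical names used for illustration, not the project's test code:

```python
def normalize_tests(test):
    """Return a list of (url, results) tuples; results is None for
    match-only tests. Hypothetical helper for illustration."""
    if test is None:
        return []
    if isinstance(test, str):
        # A plain string is a single "only matching" test.
        return [(test, None)]
    if (isinstance(test, tuple) and len(test) == 2
            and isinstance(test[0], str) and isinstance(test[1], dict)):
        # A single (URL, RESULTS) tuple.
        return [test]
    # Otherwise: a sequence of strings and/or (URL, RESULTS) tuples.
    return [(t, None) if isinstance(t, str) else tuple(t) for t in test]


single = normalize_tests(("https://example.org/a", {"count": 3}))
match_only = normalize_tests("https://example.org/b")
```

Both calls return a one-element list, so the code that runs the tests
only needs to handle one structure.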
2019-02-06 17:24:44 +01:00
Mike Fährmann
050bc1aa4a
[reactor] simplify tests
...
Some posts have, for whatever reason, slightly different text
formatting the first time they are accessed on a given day
compared to any later access.
2019-02-05 10:37:44 +01:00
Mike Fährmann
2f3a021d72
[hentaicafe] restore functionality
2019-02-05 10:22:52 +01:00