Mike Fährmann
c7b8421333
[deviantart] don't match 'www' as a potential username
2019-02-15 16:38:29 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach drops the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway.)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
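A minimal sketch of the new result shape, assuming the function fills a dict
with 'filename' and 'extension' keys. The body below is an illustration written
for this note, not the actual code in text.py, and the handling of URLs without
an extension is an assumption.

```python
from urllib.parse import unquote, urlsplit


def nameext_from_url(url, data=None):
    """Illustrative only: split the last URL path segment into
    'filename' (without extension) and 'extension'."""
    if data is None:
        data = {}
    # Use only the path, so query strings and fragments are ignored.
    name = unquote(urlsplit(url).path.rpartition("/")[2])
    filename, _, ext = name.rpartition(".")
    if filename:
        data["filename"], data["extension"] = filename, ext
    else:
        # No usable dot in the name: assume an empty extension.
        data["filename"], data["extension"] = name, ""
    return data


result = nameext_from_url("https://example.org/path/filename.ext")
```

With the example URL above, 'result' holds "filename" under 'filename' and
"ext" under 'extension'; there is no separate 'name' key anymore.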
2019-02-14 16:07:17 +01:00
Mike Fährmann
148b8f15d0
update tests for util.py
2019-02-14 11:15:19 +01:00
Mike Fährmann
ae353ed3b0
provide "extractor" and "job" keys for logging output
...
This allows for stuff like "{extractor.url}" and "{extractor.category}"
in logging format strings.
Accessing 'extractor' and 'job' in any way will return "None" if those
fields aren't defined, i.e. in general logging messages.
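One way such behavior could be built on the standard logging module is
sketched below. The names Missing, ContextFilter and make_logger are
hypothetical and were made up for this sketch; they are not taken from the
commit. The idea is to attach a placeholder object to every record that lacks
'extractor' or 'job', so that "{extractor.category}" still formats as "None".

```python
import io
import logging
from types import SimpleNamespace


class Missing:
    """Hypothetical placeholder: every attribute lookup yields None."""

    def __getattr__(self, name):
        return None

    def __str__(self):
        return "None"


class ContextFilter(logging.Filter):
    """Ensure each record has 'extractor' and 'job' attributes."""

    def filter(self, record):
        for key in ("extractor", "job"):
            if not hasattr(record, key):
                setattr(record, key, Missing())
        return True


def make_logger(stream):
    handler = logging.StreamHandler(stream)
    # style="{" enables str.format-style fields such as {extractor.category}
    handler.setFormatter(
        logging.Formatter("[{extractor.category}] {message}", style="{"))
    handler.addFilter(ContextFilter())
    logger = logging.getLogger("logging-keys-sketch")
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(logging.INFO)
    return logger


stream = io.StringIO()
log = make_logger(stream)
log.info("general message")  # no extractor attached
log.info("downloading", extra={"extractor": SimpleNamespace(category="deviantart")})
output = stream.getvalue()
```

The first message is written with "[None]" as its prefix, the second with
"[deviantart]".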
2019-02-14 11:09:58 +01:00
Mike Fährmann
32edf4fc7b
add '_extractor' info to manga extractor results
2019-02-13 13:23:36 +01:00
Mike Fährmann
89ee8cd7e4
filter "private" kwdict entries
2019-02-13 13:22:11 +01:00
Mike Fährmann
61741d7333
provide type information for Queue messages
...
Child extractors are now directly constructed with Extractor.from_url()
if the extractor class is known beforehand, instead of using
extractor.find() and searching through all possible extractor classes.
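The difference between the two lookups can be sketched as follows. Only the
names Extractor.from_url() and find() come from the message above; the class
bodies, the 'pattern' attribute as used here, and ExampleExtractor are
assumptions made for illustration.

```python
import re


class Extractor:
    pattern = None  # URL pattern, set by subclasses

    def __init__(self, match):
        self.match = match
        self.url = match.string  # the full URL that was matched

    @classmethod
    def from_url(cls, url):
        """Build an instance directly when the class is already known."""
        match = re.match(cls.pattern, url)
        return cls(match) if match else None


class ExampleExtractor(Extractor):
    # hypothetical extractor used only in this sketch
    pattern = r"https?://example\.org/gallery/(\d+)"


def find(url, classes):
    """Fallback: try every known extractor class until one matches."""
    for cls in classes:
        extr = cls.from_url(url)
        if extr is not None:
            return extr
    return None


url = "https://example.org/gallery/123"
direct = ExampleExtractor.from_url(url)   # class known beforehand
searched = find(url, [ExampleExtractor])  # class unknown, search all
```

Both calls produce an ExampleExtractor here; the direct call simply skips the
loop over all candidate classes.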
2019-02-12 21:32:32 +01:00
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url
2019-02-12 18:46:48 +01:00
Mike Fährmann
580baef72c
change Chapter and MangaExtractor classes
...
- unify and simplify constructors
- rename get_metadata() and get_images() to just metadata() and images()
- rename self.url to chapter_url and manga_url
2019-02-11 18:38:47 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
ade86da7a1
[tsumino] replace test
2019-02-11 13:25:38 +01:00
Mike Fährmann
1f3422c28b
[mangahere] fix extraction
2019-02-10 22:10:53 +01:00
Mike Fährmann
84ae72b8d8
[ngomik] fix extraction
2019-02-10 14:19:08 +01:00
Mike Fährmann
02d733d219
[simplyhentai] fix and improve tag extraction
...
The "tags" field is now a list instead of a string.
In format strings, use "{tags:J, }" to Join them.
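Python's built-in str.format() has no "J" specifier, so a format string like
this needs a custom formatter. The following is a sketch of how such a
specifier could work on top of string.Formatter; JoinFormatter is a
hypothetical name and this is not the project's formatter.

```python
import string


class JoinFormatter(string.Formatter):
    """Illustrative formatter: 'J<separator>' joins a list of strings."""

    def format_field(self, value, format_spec):
        if format_spec.startswith("J"):
            # Everything after the 'J' is used as the separator.
            return format_spec[1:].join(value)
        return super().format_field(value, format_spec)


formatter = JoinFormatter()
joined = formatter.format("{tags:J, }", tags=["tag1", "tag2", "tag3"])
```

Here the separator is ", " (comma and space), so 'joined' becomes
"tag1, tag2, tag3".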
2019-02-10 13:52:09 +01:00
Mike Fährmann
3a0b4af744
[seiga] recognize /thumb/ URLs
...
https://lohas.nicoseiga.jp/thumb/5977527i
2019-02-09 16:53:27 +01:00
Mike Fährmann
8fc6fbfa34
[artstation] recognize shortened project URLs
...
https://artstn.co/p/<project-id>
2019-02-09 16:53:11 +01:00
Mike Fährmann
9a9cd32461
implement alternative constructor for extractors
2019-02-09 14:42:25 +01:00
Mike Fährmann
abbd45d0f4
update handling of extractor URL patterns
...
When loading extractor classes during 'extractor.find(…)', their
'pattern' attribute will be replaced with a compiled version of itself.
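A rough sketch of that idea, assuming 'pattern' starts out as a plain string
on the class. ExampleExtractor and the body of find() are written for this
note and may differ from the real implementation.

```python
import re


class ExampleExtractor:
    # hypothetical extractor; 'pattern' is a plain string at first
    pattern = r"https?://example\.org/gallery/(\d+)"

    def __init__(self, match):
        self.gallery_id = match.group(1)


def find(url, classes):
    for cls in classes:
        # Compile once and store the compiled object back on the class,
        # so later lookups reuse it.
        if isinstance(cls.pattern, str):
            cls.pattern = re.compile(cls.pattern)
        match = cls.pattern.match(url)
        if match:
            return cls(match)
    return None


extr = find("https://example.org/gallery/123", [ExampleExtractor])
```

After the first call, ExampleExtractor.pattern is a compiled pattern object
rather than a string.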
2019-02-08 20:08:16 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
34bab080ae
rewrite URL patterns to use only 1 per extractor
2019-02-08 12:03:10 +01:00
Mike Fährmann
0e46db6f45
rename some base classes
...
They shouldn't be called …Extractor if they don't have 'Extractor' as
their base class.
2019-02-08 11:43:40 +01:00
Mike Fährmann
793b24e513
[imagehosts] fix and improve various extractors
2019-02-06 17:41:26 +01:00
Mike Fährmann
bc0951d974
allow for simplified test data structures
...
Instead of a strict list of (URL, RESULTS)-tuples, extractor result
tests can now be a single (URL, RESULTS)-tuple, if it's just one test,
and "only matching" tests can now be a simple string.
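A sketch of how the accepted shapes could be normalized back into a list of
(URL, RESULTS) tuples. normalize_tests is a hypothetical helper, and the rule
used to tell a single tuple apart from a sequence of tests is an assumption.

```python
def normalize_tests(tests):
    """Return a list of (URL, RESULTS) tuples for any accepted shape."""
    if not tests:
        return []
    if isinstance(tests, str):
        # a single "only matching" test
        return [(tests, None)]
    if (len(tests) == 2 and isinstance(tests[0], str)
            and (tests[1] is None or isinstance(tests[1], dict))):
        # a single (URL, RESULTS) tuple
        return [tuple(tests)]
    # a sequence mixing (URL, RESULTS) tuples and plain URL strings
    return [(t, None) if isinstance(t, str) else tuple(t) for t in tests]


single = normalize_tests(("https://example.org/a", {"count": 3}))
mixed = normalize_tests((
    ("https://example.org/a", {"count": 3}),
    "https://example.org/b",
))
```

'single' contains one test, while 'mixed' contains two, the second one with
None as its RESULTS.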
2019-02-06 17:24:44 +01:00
Mike Fährmann
b49c3c9991
release version 1.7.0
2019-02-05 16:31:39 +01:00
Mike Fährmann
53c2fd4664
add mastodon/foolslide/foolfuuka examples to example config
2019-02-05 16:17:25 +01:00
Mike Fährmann
050bc1aa4a
[reactor] simplify tests
...
Some posts have, for whatever reason, a slightly different text
formatting the first time they are accessed that day
compared to any further time.
2019-02-05 10:37:44 +01:00
Mike Fährmann
2f3a021d72
[hentaicafe] restore functionality
2019-02-05 10:22:52 +01:00
Mike Fährmann
347398f692
fix various tests
2019-02-04 14:40:21 +01:00
Mike Fährmann
00dc37ccbf
replace AsynchronousMixin Extractor with a Mixin
2019-02-04 14:21:19 +01:00
Mike Fährmann
4d656a81ca
replace SharedConfigExtractor class with a Mixin
2019-02-04 13:46:02 +01:00
Mike Fährmann
ccb95d0ba4
[mastodon] changes/improvements based on foolfuuka/-slide
2019-02-04 13:13:58 +01:00
Mike Fährmann
12ff750111
[foolfuuka] smaller code changes and updates
2019-02-04 12:55:33 +01:00
Mike Fährmann
e1bf3b225e
[foolslide] dynamically generate extractor classes
2019-02-04 12:54:07 +01:00
Mike Fährmann
58a9eede38
[foolfuuka] dynamically generate extractor classes
2019-02-03 17:09:45 +01:00
Mike Fährmann
22d7a783d5
update extraction result tests
2019-02-02 15:37:54 +01:00
Mike Fährmann
197d0e99a4
[tsumino] more useful error message (#161)
...
if Tsumino suspects a non-human user and refuses to send gallery pages
2019-02-02 14:57:51 +01:00
Mike Fährmann
d36ec51e5a
[tsumino] add extractor for search results (#161)
2019-02-02 14:56:46 +01:00
Mike Fährmann
1c1367ec5b
[behance] fix empty docstring
2019-02-02 14:41:05 +01:00
Mike Fährmann
373cb07b28
update .travis.yml and run_tests.sh
...
- add python3.8 and pypy3 builds
- remove deprecated 'sudo: true' and 'sudo: false'
- enable builds for 'test-...' branches
2019-01-31 15:58:52 +01:00
Mike Fährmann
45e529ab91
[behance] fix extraction
...
HTML structure for gallery pages changed quite a bit, so it is now using
the embedded JSON data. This changes a lot of metadata field names, but
'gallery_id', 'title', and 'user' are still provided for backwards
compatibility.
The internal API endpoint for user galleries also changed its data
structure, but nothing too major.
2019-01-31 14:33:23 +01:00
Mike Fährmann
e1d3e9a926
add 'ext_from_url' to text.py
2019-01-31 12:23:25 +01:00
Mike Fährmann
bfbbac4495
[tsumino] add login capabilities (#161)
2019-01-30 17:58:48 +01:00
Mike Fährmann
dd358b4564
improve cookie handling during logins
2019-01-30 17:09:32 +01:00
Mike Fährmann
6126615698
update URLs for supportedsites.rst
2019-01-30 16:18:22 +01:00
Mike Fährmann
80a75a1ecf
[tsumino] add gallery extractor (#161)
2019-01-29 17:28:48 +01:00
Mike Fährmann
2d2953a5bf
add 'text.parse_float()' + cleanup in text.py
2019-01-29 16:46:21 +01:00
Mike Fährmann
0c32dc5858
[hentaifox] add extractor for search results (#160)
2019-01-28 22:38:32 +01:00
Mike Fährmann
580947bfce
[hentaifox] rename Chapter- to GalleryExtractor (#160)
2019-01-28 21:49:26 +01:00
Mike Fährmann
8095f5f81a
[mangapark] fix manga title extraction
2019-01-28 18:04:42 +01:00
Mike Fährmann
0156189468
[hentaifox] add chapter extractor (#160)
2019-01-28 18:00:32 +01:00