Mike Fährmann
968d3e8465
remove '&' from URL patterns
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
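Note: a minimal sketch of what dropping '&' from a pattern's character class changes. The two patterns and the example.org URL are hypothetical and written for illustration; they are not copied from any extractor.

```python
import re

# Hypothetical patterns, for illustration only.
# OLD treats '&' as a delimiter; NEW only stops at '/', '?' and '#',
# the three component delimiters named in RFC 3986.
OLD = re.compile(r"https?://example\.org/album/([^/?&#]+)")
NEW = re.compile(r"https?://example\.org/album/([^/?#]+)")


def album_slug(pattern, url):
    """Return the first capture group, or None if the URL does not match."""
    match = pattern.match(url)
    return match.group(1) if match else None


url = "https://example.org/album/rock&roll?page=2#top"
old_slug = album_slug(OLD, url)  # cut short at the '&'
new_slug = album_slug(NEW, url)  # '&' is kept as part of the path segment
```

With the old class the captured segment ends at the first '&'; with the new one it runs up to the '?' that starts the query.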
Mike Fährmann
cb0132e441
[khinsider] add 'format' option ( closes #840 )
2020-07-13 17:17:58 +02:00
Mike Fährmann
19ae6f3fc4
update test results
- twitter:
Don't test the whole kwdict, only the actual content, since the
keyword hash changes whenever that user changes their display name.
- khinsider:
Download host changed
2020-02-22 03:25:32 +01:00
Mike Fährmann
6426e3efc7
[khinsider] fix and improve metadata extraction
2020-02-07 18:20:38 +01:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
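Note: a rough sketch of the new result shape described above. `nameext_sketch` is a hypothetical stand-in written for this note; it is not the project's text.nameext_from_url() and may differ from it in edge cases (the handling of names without a dot is an assumption).

```python
from urllib.parse import unquote, urlsplit


def nameext_sketch(url):
    """Split the last path segment of a URL into 'filename' and 'extension'.

    Hypothetical helper: only mirrors the 'now' layout from the commit
    message, with no separate key for the complete name.
    """
    path = urlsplit(url).path                  # drops query and fragment
    basename = unquote(path.rpartition("/")[2])
    name, dot, ext = basename.rpartition(".")
    if dot and name:
        return {"filename": name, "extension": ext}
    # Assumption: no dot (or a leading dot only) means no extension.
    return {"filename": basename, "extension": ""}


result = nameext_sketch("https://example.org/path/filename.ext")
```

For the example URL from the commit message, `result` holds 'filename' and 'ext' under the keys 'filename' and 'extension'.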
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
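Note: a sketch of what the simplified constants might look like on an extractor class. `ExampleAlbumExtractor`, its URL, and the contents of the test tuple are hypothetical; only the shapes (one pattern string, tuples for 'directory_fmt' and 'test') follow the commit message.

```python
import re


class ExampleAlbumExtractor:
    """Hypothetical extractor showing only the constant layout."""
    category = "example"
    subcategory = "album"
    # tuple instead of list
    directory_fmt = ("{category}", "{album}")
    # one pattern string instead of a list of patterns
    pattern = r"(?:https?://)?example\.org/album/([^/?#]+)"
    # a single (URL, expected-results) tuple instead of a list of such tuples
    test = ("https://example.org/album/some-album", {"count": 3})


def directory_parts(extractor, kwdict):
    """Fill each 'directory_fmt' segment from a keyword dict."""
    return [segment.format_map(kwdict) for segment in extractor.directory_fmt]


match = re.match(ExampleAlbumExtractor.pattern, ExampleAlbumExtractor.test[0])
parts = directory_parts(
    ExampleAlbumExtractor, {"category": "example", "album": match.group(1)})
```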
Mike Fährmann
00dc37ccbf
replace AsynchronousMixin Extractor with a Mixin
2019-02-04 14:21:19 +01:00
Mike Fährmann
95392554ee
use text.urljoin()
2018-04-26 17:00:26 +02:00
Mike Fährmann
179bcdd349
adjust archive-ids
2018-02-13 04:50:45 +01:00
Mike Fährmann
34873dbd90
set 'archive_fmt' values
These are going to be used to create a unique ID for each image.
2018-02-01 15:30:49 +01:00
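Note: one way an 'archive_fmt' string could yield a per-file ID, sketched under assumptions. The format string, the category prefix, and the in-memory set are all hypothetical; the commit message only says the values will be used to build a unique ID.

```python
# Hypothetical format string; the real values are set per extractor.
archive_fmt = "{album}_{filename}.{extension}"


def archive_id(category, kwdict):
    """Build an ID from the category and the file's metadata keywords."""
    return category + archive_fmt.format_map(kwdict)


seen = set()  # stand-in for a persistent archive


def is_new(category, kwdict):
    """Return True the first time an ID is seen, False afterwards."""
    key = archive_id(category, kwdict)
    if key in seen:
        return False
    seen.add(key)
    return True


track = {"album": "Album", "filename": "track01", "extension": "mp3"}
first = is_new("example", track)
second = is_new("example", track)
```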
Mike Fährmann
444008a14a
[khinsider] use urljoin() to complete page URLs
2017-12-17 16:21:05 +01:00
Mike Fährmann
291369eab2
various smaller changes/additions
2017-12-06 21:45:56 +01:00
Mike Fährmann
d275b1d9a3
[khinsider] fix extraction
... again
2017-12-04 12:42:06 +01:00
Mike Fährmann
2b9a783fc7
[khinsider] fix extraction
2017-12-01 14:00:37 +01:00
Mike Fährmann
55c64cad4b
[khinsider] fix filename extension and test-pattern
2017-11-28 19:35:47 +01:00
Mike Fährmann
65c1c53eb8
[khinsider] fix extraction
2017-11-23 15:33:49 +01:00
Mike Fährmann
68a0a7579c
fix/improve some regular expressions
2017-10-09 22:37:50 +02:00
Mike Fährmann
85a2b2ae59
[khinsider] fix extraction
2017-09-28 11:47:26 +02:00
Mike Fährmann
84d4450410
[fallenangels] extract manga metadata
2017-09-15 20:51:40 +02:00
Mike Fährmann
c184e47ee3
put common directory- and filename formats in base classes
2017-05-30 12:10:16 +02:00
Mike Fährmann
13dc5d72bc
update some extractors to use https
2017-04-20 13:32:40 +02:00
Mike Fährmann
94e10f249a
code adjustments according to pep8 nr2
2017-02-01 00:53:19 +01:00
Mike Fährmann
37d4d07d9b
compatibility fixes to make a standalone exe work
2017-01-23 00:07:36 +01:00
Mike Fährmann
828aedd571
[khinsider] unescape soundtrack title
2016-10-25 15:26:32 +02:00
Mike Fährmann
56d810c896
update keyword hashes for tests
2016-09-25 17:28:46 +02:00
Mike Fährmann
19c2d4ff6f
remove explicit (sub)category keywords
2016-09-25 14:22:07 +02:00
Mike Fährmann
49a05c32ed
add missing tests
2016-09-19 16:15:27 +02:00
Mike Fährmann
d7e168799d
consistent extractor naming scheme + docstrings
2016-09-12 10:34:31 +02:00
Mike Fährmann
000df8d1fa
add 'encoding' argument for Extractor.request
2016-07-12 12:06:17 +02:00
Mike Fährmann
2b15b81673
[khinsider] add extractor
2016-04-20 08:34:44 +02:00