Mike Fährmann
a453335a9f
remove test results in extractor modules
...
and add generic example URLs
2023-09-11 16:30:55 +02:00
Mike Fährmann
4883420e67
[generic] revert pattern change
2023-03-09 22:25:23 +01:00
ClosedPort22
34a7fab0e2
[generic] add support for IDNs
...
(internationalized domain name)
2023-03-06 22:42:36 +08:00
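For readers unfamiliar with IDNs, the minimal sketch below shows how a non-ASCII hostname maps to its ASCII-compatible form with Python's built-in `idna` codec. It only illustrates the concept and is not the code from this commit; the hostname is a made-up example.

```python
# Illustration only: convert an internationalized domain name (IDN)
# to its ASCII-compatible form and back, using the stdlib 'idna' codec.
hostname = "bücher.example"  # hypothetical example hostname

ascii_form = hostname.encode("idna")    # bytes with an 'xn--' prefixed label
round_trip = ascii_form.decode("idna")  # back to the Unicode form

print(ascii_form)
print(round_trip == hostname)
```

A URL pattern that only accepts ASCII letters in the host part would reject the Unicode form, which is presumably the kind of case this change addresses.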
Mike Fährmann
b8d268f57e
allow '/' and '?' in URL queries
2022-10-02 19:02:05 +02:00
Mike Fährmann
bd08ee2859
remove most 'yield Message.Version' statements
...
only leave them in oauth.py as noop results
2021-08-16 03:10:48 +02:00
Mike Fährmann
a416e54765
[directlink] manually encode Referer URLs (fixes #1647)
...
Trying to send a non-latin-1-encodable header raises an exception,
so we encode the Referer value ourselves with 'errors=ignore'.
2021-06-21 20:28:19 +02:00
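The idea described in this commit message can be sketched as follows. This is a minimal illustration, assuming the header value is derived from the URL string; `encode_referer` is a hypothetical helper name, not a function from the project.

```python
def encode_referer(url: str) -> str:
    """Drop every character that cannot be represented in latin-1.

    Hypothetical helper: encoding with errors='ignore' silently removes
    the offending characters instead of raising UnicodeEncodeError.
    """
    return url.encode("latin-1", "ignore").decode("latin-1")


# The two non-latin-1 characters in the path are removed.
print(encode_referer("https://example.org/画像/file.jpg"))
```

The resulting value may no longer be an exact copy of the original URL, which is usually acceptable for a Referer header.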
Mike Fährmann
968d3e8465
remove '&' from URL patterns
...
'/?&#' -> '/?#' and '?&#' -> '?#'
According to https://www.ietf.org/rfc/rfc3986.txt, URLs are
"organized hierarchically" by using "the slash ("/"), question
mark ("?"), and number sign ("#") characters to delimit components"
2020-10-22 23:31:25 +02:00
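To illustrate the reasoning above, here is a hedged sketch of a pattern that splits a URL using only the three RFC 3986 component delimiters. `URL_RE` is a hypothetical pattern written for this example, not one taken from the extractor modules.

```python
import re

# Hypothetical pattern: split a URL into host, path, query and fragment
# using only the RFC 3986 component delimiters '/', '?' and '#'.
URL_RE = re.compile(
    r"https?://([^/?#]+)"  # host: stops at '/', '?' or '#'
    r"(/[^?#]*)?"          # path: stops at '?' or '#'
    r"(?:\?([^#]*))?"      # query: stops at '#', may contain '&'
    r"(?:#(.*))?$"         # fragment: rest of the string
)

match = URL_RE.match("https://example.org/a/b?x=1&y=2#top")
print(match.groups())
```

Because '&' only separates parameters inside a query, it never needs to appear in the excluded character classes.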
Mike Fährmann
bf3df3d0b0
[directlink] send Referer headers (closes #536)
2019-12-25 17:17:07 +01:00
Mike Fährmann
db35c3b581
[directlink] separate filenames from paths
...
With this, all default filename formats specify an '{extension}'
and PathFormat.set_extension() reliably works for all files.
2019-11-28 23:50:00 +01:00
Mike Fährmann
c08c340178
[directlink] make pattern case insensitive (fixes #296)
2019-06-03 10:56:14 +02:00
Mike Fährmann
5530871b5a
change results of text.nameext_from_url()
...
Instead of getting a complete 'filename' from a URL and splitting that
into 'name' and 'extension', the new approach gets rid of the complete
version and renames 'name' to 'filename'. (Using anything other than
{extension} for a filename extension doesn't really work anyway.)
Example: "https://example.org/path/filename.ext"
before:
- filename : filename.ext
- name : filename
- extension: ext
now:
- filename : filename
- extension: ext
2019-02-14 16:07:17 +01:00
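The "now" behavior listed above can be sketched roughly as below. `split_name_ext` is a hypothetical stand-in written for illustration; it is not the project's text.nameext_from_url() and may differ from it in edge cases.

```python
from urllib.parse import unquote, urlsplit


def split_name_ext(url: str) -> dict:
    """Return 'filename' (without extension) and 'extension' for a URL."""
    # Take the last path segment, ignoring any query or fragment.
    last_segment = urlsplit(url).path.rpartition("/")[2]
    name, dot, ext = unquote(last_segment).rpartition(".")
    if not dot:
        # No '.' in the segment: rpartition puts everything in 'ext'.
        return {"filename": ext, "extension": ""}
    return {"filename": name, "extension": ext}


print(split_name_ext("https://example.org/path/filename.ext"))
```

For the example URL this yields 'filename' and 'ext', matching the two fields shown in the commit message.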
Mike Fährmann
2e516a1e3e
store the full original URL in Extractor.url
2019-02-12 18:46:48 +01:00
Mike Fährmann
4b1880fa5e
propagate 'match' to base extractor constructor
2019-02-11 13:31:10 +01:00
Mike Fährmann
6284731107
simplify extractor constants
...
- single strings for URL patterns
- tuples instead of lists for 'directory_fmt' and 'test'
- single-tuple tests where applicable
2019-02-08 13:45:40 +01:00
Mike Fährmann
16e014baaa
[smugmug] added image and album extractor
...
just some initial code that still requires a lot of work ...
TODO:
- folders
- old-style albums (which are nearly all of them ...)
- images from users
- OAuth
It could also happen that the API credentials used will become invalid
whenever my 14 day trial period ends (7 days remaining), but that
would just require users to supply their own.
2018-04-29 21:27:25 +02:00
Mike Fährmann
19aefdfde3
[directlink] update test results
2018-02-26 03:01:23 +01:00
Mike Fährmann
74029c50bb
[directlink] unquote metadata fields
2018-02-26 02:12:47 +01:00
Mike Fährmann
179bcdd349
adjust archive-ids
2018-02-13 04:50:45 +01:00
Mike Fährmann
c4713404c8
[directlink] improve URL pattern
2017-08-02 21:06:49 +02:00
Mike Fährmann
c864be479e
[directlink] update URL pattern & PEP 8
...
- combine some file extensions
- don't match '.je'
- line length < 80
2017-07-27 20:46:15 +02:00
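As a rough illustration of the '.je' point, a pattern that makes too many letters optional (for example `jp?e?g?`, given here only as a hypothetical) would also accept 'je'. The sketch below combines the JPEG variants without that side effect; `EXT_RE` and its extension list are assumptions for this example, not the pattern from the commit.

```python
import re

# Hypothetical extension pattern: 'jpe?g' covers 'jpg' and 'jpeg', and
# 'jpe' is listed separately, so a bare 'je' can never match.
EXT_RE = re.compile(r"\.(jpe?g|jpe|png|gif|webp)$")

for name in ("a.jpg", "b.jpeg", "c.jpe", "d.je", "e.webp"):
    print(name, bool(EXT_RE.search(name)))
```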
H R X N
45f9d64c23
Update directlink.py with additional file exts. (#30)
...
Add WebP, still not that common, but it's increasing.
Add 3rd JPEG variant (https://en.wikipedia.org/wiki/JPEG#JPEG_filename_extensions)
Never seen JFIF in the wild, would probably be overkill.
Extend Ogg formats (https://en.wikipedia.org/wiki/Ogg; https://wiki.xiph.org/MIME_Types_and_File_Extensions)
2017-07-27 20:40:00 +02:00
Mike Fährmann
b6fffa9e26
[directlink] update filename format and metadata
2017-05-30 17:33:09 +02:00
Mike Fährmann
c184e47ee3
put common directory- and filename formats in base classes
2017-05-30 12:10:16 +02:00
Mike Fährmann
f79320e35b
fix tests
2017-05-27 11:47:15 +02:00
Mike Fährmann
691c4dd709
support direct image links
2017-05-24 12:51:18 +02:00