mirror of https://github.com/yt-dlp/yt-dlp.git synced 2024-07-01 09:50:05 +02:00

Compare commits

...

62 Commits

Author SHA1 Message Date
Elyse
3709567cac
Merge 6208f7be9c into f3411af12e 2024-06-25 07:47:15 -07:00
megumin
f3411af12e
[ie/matchtv] Fix extractor (#10190)
Authored by: megumintyan
2024-06-25 00:49:09 +02:00
bashonly
6208f7be9c
Merge branch 'master' into yt-live-from-start-range 2024-06-12 01:29:53 -05:00
bashonly
6a84199473
Merge branch 'yt-dlp:master' into pr/live-sections 2024-05-28 13:22:13 -05:00
bashonly
54ad67d785
Merge branch 'yt-dlp:master' into pr/live-sections 2024-05-23 09:48:06 -05:00
bashonly
172dfbeaed
Merge branch 'yt-dlp:master' into pr/live-sections 2024-05-10 13:52:35 -05:00
bashonly
cf96b24de6
Merge branch 'master' into yt-live-from-start-range 2024-04-16 11:01:17 -05:00
bashonly
50c943e8a0
Merge branch 'yt-dlp:master' into pr/yt-live-from-start-range 2024-03-19 15:18:22 -05:00
bashonly
6fc6349ef0
Merge branch 'master' into yt-live-from-start-range 2024-02-29 04:58:30 -06:00
bashonly
5156a16cf9
Merge branch 'master' into yt-live-from-start-range 2024-01-19 17:05:19 -06:00
Elyse
fb2b57a773 Merge remote-tracking branch 'github/yt-live-from-start-range' into yt-live-from-start-range 2023-10-08 01:01:31 -06:00
Elyse
2741b5827d Merge remote-tracking branch 'origin' into yt-live-from-start-range 2023-10-08 00:24:29 -06:00
bashonly
bd730470f2
Cleanup 2023-07-22 13:32:10 -05:00
bashonly
194bc49c55
Merge branch 'yt-dlp:master' into pr/6498 2023-07-22 13:23:54 -05:00
bashonly
1416cee726
Update yt_dlp/options.py 2023-07-22 17:59:48 +00:00
Elyse
622c555356 Fix bug after merge 2023-06-24 14:43:50 -06:00
Elyse
99e6074c5d Merge remote-tracking branch 'origin' into yt-live-from-start-range 2023-06-24 14:30:12 -06:00
Elyse
1f7974690e Merge remote-tracking branch 'origin' into yt-live-from-start-range 2023-06-03 14:39:32 -06:00
Elyse
8ee942a9c8 Add warning about --download-sections without --live-from-start 2023-05-13 13:29:28 -06:00
Elyse
444e02ef3b Merge remote-tracking branch 'origin/master' into yt-live-from-start-range 2023-05-07 00:33:18 -06:00
Elyse
4e93198ae6 Restore README.md
I think this is auto-generated by some script
2023-05-06 23:29:40 -06:00
Elyse
78285eea86 Update options docs 2023-05-06 23:24:58 -06:00
Elyse
7f93eb7a28 Support for epoch timestamps 2023-05-06 23:05:38 -06:00
Elyse
128d30492b Always compute last_seq 2023-04-18 23:17:39 -06:00
Elyse
129555b19a Fix return values of _extract_sequence_from_mpd 2023-03-17 22:39:21 -06:00
Elyse
01f672fe27 Lock less agressively
This gives a speed performance of about 30%
2023-03-17 22:37:31 -06:00
Elyse
2fbe18557b Add some documentation 2023-03-12 01:42:45 -06:00
Elyse
b131f3d1f1 Improve option documentation 2023-03-12 01:37:33 -06:00
Elyse
544836de83 Allow days in parse_duration 2023-03-12 01:37:21 -06:00
pukkandan
6cea8cbe2d
Merge remote-tracking branch 'origin/master' into pr/6498 2023-03-12 11:57:41 +05:30
Elyse
5e4699a623 Fix linter 2023-03-11 20:02:52 -06:00
Elyse
79ae58a5c4 Fix linter 2023-03-11 20:00:34 -06:00
Elyse
3faa1e33ed Add initial documentation 2023-03-11 19:51:14 -06:00
Elyse
fbae888c65 Add debug for selected section 2023-03-11 19:51:14 -06:00
Elyse
cdac7641d6 Remove tz_aware date code 2023-03-11 19:51:14 -06:00
Elyse
a43ba2eff6 Fix unified_timestamp 2023-03-11 19:51:14 -06:00
Elyse
0ed9a73a73 Add fragment count 2023-03-11 19:51:14 -06:00
Elyse
e40132da09 Revert "[utils] Allow using local timezone for 'now' timestamps"
This reverts commit 1799a6ae36.
2023-03-11 19:51:14 -06:00
Elyse
e6e2eb00f1 Support negative durations 2023-03-11 19:51:14 -06:00
pukkandan
9fc70f3f6d [extractor/youtube] Construct fragment list lazily
Building fragment list for all formats take significant time for large videos
2023-03-11 19:51:14 -06:00
pukkandan
5ef1a928a7 [extractor/youtube] Add extractor-arg include_duplicate_formats 2023-03-11 19:51:14 -06:00
Lesmiscore
db62ffdafe [extractor/youtube] Add client name to format_note when -v (#6254)
Authored by: Lesmiscore, pukkandan
2023-03-11 19:51:14 -06:00
vampirefrog
f137666451 [extractor/rokfin] Re-construct manifest url (#6507)
Authored by: vampirefrog
2023-03-11 19:51:14 -06:00
Daniel Vogt
e3ffdf76aa [extractor/opencast] Fix format bug (#6512)
Authored by: C0D3D3V
2023-03-11 19:51:14 -06:00
pukkandan
9f717b69b4 [extractor/hidive] Fix login
Fixes https://github.com/yt-dlp/yt-dlp/issues/6493#issuecomment-1462906556
2023-03-11 19:51:14 -06:00
pukkandan
34d3df72e9 Support loading info.json with a list at it's root 2023-03-11 19:51:14 -06:00
makeworld
96f5d29db0 [extractor/cbc:gem] Update _VALID_URL (#6499)
Authored by: makeworld-the-better-one
Closes #6395
2023-03-11 19:51:13 -06:00
Elyse
c222f6cbfc [extractor/twitch] Fix is_live (#6500)
Closes #6494
Authored by: elyse0
2023-03-11 19:51:13 -06:00
pukkandan
2d1655493f [extractor/youtube] Bypass throttling for -f17
and related cleanup

Thanks @AudricV for the finding
2023-03-11 19:51:13 -06:00
pukkandan
c376b95f95 [downloader/curl] Fix progress reporting
Bug in 8c53322cda
Closes #6490
2023-03-11 19:51:13 -06:00
Daniel Vogt
8df470761e [extractor/opencast] Add ltitools to _VALID_URL (#6371)
Authored by: C0D3D3V
2023-03-11 19:51:13 -06:00
D0LLYNH0
e3b08bac9c [extractor/iq] Set more language codes (#6476)
Authored by: D0LLYNH0
2023-03-11 19:51:13 -06:00
Elyse
932758707f Fix linter 2023-03-09 18:51:10 -06:00
Elyse
317ba03fdf Improve parse_chapters comments 2023-03-09 18:35:20 -06:00
Elyse
e42e25619f Create last_segment_url only if necessary 2023-03-09 18:24:39 -06:00
Elyse
fba1c397b1 [youtube] Support --download-sections for YT Livestream from start 2023-03-09 17:32:19 -06:00
Elyse
b83d7526f2 Add fixme in modified parse_chapters function
A range like '*(now-1hour)-(now-30minutes)' doesn't work
2023-03-09 17:21:02 -06:00
Elyse
fdb9aaf416 Use local timezone for download sections 2023-03-09 17:19:39 -06:00
Elyse
1799a6ae36 [utils] Allow using local timezone for 'now' timestamps 2023-03-09 17:18:44 -06:00
Elyse
367429e238 [common] Extract start and end keys for Dash fragments 2023-03-09 17:17:16 -06:00
Sophire
439be2b4a4 [utils] Add microseconds to unified_timestamp 2023-03-09 12:07:08 -06:00
Elyse
2fbd6de957 [utils] Add hackish 'now' support for --download-sections 2023-03-09 11:30:40 -06:00
9 changed files with 123 additions and 62 deletions

View File

@@ -413,10 +413,15 @@ def test_unified_timestamps(self):
         self.assertEqual(unified_timestamp('Sep 11, 2013 | 5:49 AM'), 1378878540)
         self.assertEqual(unified_timestamp('December 15, 2017 at 7:49 am'), 1513324140)
         self.assertEqual(unified_timestamp('2018-03-14T08:32:43.1493874+00:00'), 1521016363)
+        self.assertEqual(unified_timestamp('2022-10-13T02:37:47.831Z'), 1665628667)
         self.assertEqual(unified_timestamp('December 31 1969 20:00:01 EDT'), 1)
         self.assertEqual(unified_timestamp('Wednesday 31 December 1969 18:01:26 MDT'), 86)
         self.assertEqual(unified_timestamp('12/31/1969 20:01:18 EDT', False), 78)
+        self.assertEqual(unified_timestamp('2023-03-09T18:01:33.646Z', with_milliseconds=True), 1678384893.646)
+        # ISO8601 spec says that if no timezone is specified, we should use local timezone;
+        # but yt-dlp uses UTC to keep things consistent
+        self.assertEqual(unified_timestamp('2023-03-11T06:48:34.008'), 1678517314)
     def test_determine_ext(self):
         self.assertEqual(determine_ext('http://example.com/foo/bar.mp4/?download'), 'mp4')
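
The `with_milliseconds` case exercised in the test hunk above can be illustrated with a small stand-alone sketch. This is not yt-dlp's `unified_timestamp`: the helper name `timestamp_with_fraction` is hypothetical, and it assumes a single UTC input format, whereas the real function tries many date formats and handles timezones.

```python
import calendar
import datetime as dt


def timestamp_with_fraction(date_str, with_milliseconds=False):
    """Hypothetical, simplified stand-in: parses one ISO-like UTC format only."""
    # Assumption: input looks like '2023-03-09T18:01:33.646Z'
    parsed = dt.datetime.strptime(date_str, '%Y-%m-%dT%H:%M:%S.%fZ')
    # timetuple() carries no sub-second field, so timegm() yields whole seconds
    seconds = calendar.timegm(parsed.timetuple())
    # Add the fractional part back only when the caller asks for it
    return seconds + (parsed.microsecond / 1e6 if with_milliseconds else 0)


print(timestamp_with_fraction('2023-03-09T18:01:33.646Z'))
print(timestamp_with_fraction('2023-03-09T18:01:33.646Z', with_milliseconds=True))
```

Keeping the default at whole seconds means existing callers continue to receive an integer, which is presumably why the diff adds the fraction behind an opt-in flag.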

View File

@@ -27,7 +27,12 @@
 from .compat import functools, urllib  # isort: split
 from .compat import compat_os_name, urllib_req_to_req
 from .cookies import LenientSimpleCookie, load_cookies
-from .downloader import FFmpegFD, get_suitable_downloader, shorten_protocol_name
+from .downloader import (
+    DashSegmentsFD,
+    FFmpegFD,
+    get_suitable_downloader,
+    shorten_protocol_name,
+)
 from .downloader.rtmp import rtmpdump_version
 from .extractor import gen_extractor_classes, get_info_extractor
 from .extractor.common import UnsupportedURLIE
@@ -3353,7 +3358,7 @@ def existing_video_file(*filepaths):
                 fd, success = None, True
                 if info_dict.get('protocol') or info_dict.get('url'):
                     fd = get_suitable_downloader(info_dict, self.params, to_stdout=temp_filename == '-')
-                    if fd != FFmpegFD and 'no-direct-merge' not in self.params['compat_opts'] and (
+                    if fd not in [FFmpegFD, DashSegmentsFD] and 'no-direct-merge' not in self.params['compat_opts'] and (
                             info_dict.get('section_start') or info_dict.get('section_end')):
                         msg = ('This format cannot be partially downloaded' if FFmpegFD.available()
                                else 'You have requested downloading the video partially, but ffmpeg is not installed')

View File

@@ -12,6 +12,7 @@
 import optparse
 import os
 import re
+import time
 import traceback
 from .compat import compat_os_name
@@ -332,12 +333,13 @@ def parse_chapters(name, value, advanced=False):
             (?P<end_sign>-?)(?P<end>[^-]+)
         )?'''
+        current_time = time.time()
         chapters, ranges, from_url = [], [], False
         for regex in value or []:
             if advanced and regex == '*from-url':
                 from_url = True
                 continue
-            elif not regex.startswith('*'):
+            elif not regex.startswith('*') and not regex.startswith('#'):
                 try:
                     chapters.append(re.compile(regex))
                 except re.error as err:
@@ -354,11 +356,16 @@ def parse_chapters(name, value, advanced=False):
                     err = 'Must be of the form "*start-end"'
                 elif not advanced and any(signs):
                     err = 'Negative timestamps are not allowed'
-                else:
+                elif regex.startswith('*'):
                     dur[0] *= -1 if signs[0] else 1
                     dur[1] *= -1 if signs[1] else 1
                     if dur[1] == float('-inf'):
                         err = '"-inf" is not a valid end'
+                elif regex.startswith('#'):
+                    dur[0] = dur[0] * (-1 if signs[0] else 1) + current_time
+                    dur[1] = dur[1] * (-1 if signs[1] else 1) + current_time
+                    if dur[1] == float('-inf'):
+                        err = '"-inf" is not a valid end'
                 if err:
                     raise ValueError(f'invalid {name} time range "{regex}". {err}')
                 ranges.append(dur)
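
The difference between the `*` and `#` prefixes in the `parse_chapters` hunk above can be sketched as follows. The helper `resolve_section` and its parameters are hypothetical and only mirror the arithmetic in the diff: `*` keeps the signed values as given, while `#` treats them as offsets from the current time. Parsing of the range string itself is omitted.

```python
import math
import time


def resolve_section(prefix, start, end, now=None):
    """Hypothetical helper mirroring the '*' vs '#' arithmetic from the diff.

    start/end are already-signed values in seconds (end may be math.inf).
    """
    if end == -math.inf:
        raise ValueError('"-inf" is not a valid end')
    if prefix == '#':
        # Relative syntax: offsets are measured from the current time
        now = time.time() if now is None else now
        return start + now, end + now
    if prefix == '*':
        # Absolute syntax: values are used unchanged (e.g. unix timestamps)
        return start, end
    raise ValueError(f'unsupported prefix {prefix!r}')


# "#-1h - 30m": from one hour ago until 30 minutes from now (fixed 'now' for clarity)
print(resolve_section('#', -3600, 1800, now=1_000_000))
# "*1672531200 - 1672549200": an exact unix timestamp range
print(resolve_section('*', 1672531200, 1672549200))
```

Passing `now` explicitly is a choice made for this sketch so the result is reproducible; the diff itself captures `time.time()` once before looping over the ranges.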

View File

@@ -36,6 +36,8 @@ def real_download(self, filename, info_dict):
                 'filename': fmt.get('filepath') or filename,
                 'live': 'is_from_start' if fmt.get('is_from_start') else fmt.get('is_live'),
                 'total_frags': fragment_count,
+                'section_start': info_dict.get('section_start'),
+                'section_end': info_dict.get('section_end'),
             }
             if real_downloader:

View File

@@ -2711,7 +2711,7 @@ def extract_common(source):
                             r = int(s.get('r', 0))
                             ms_info['total_number'] += 1 + r
                             ms_info['s'].append({
-                                't': int(s.get('t', 0)),
+                                't': int_or_none(s.get('t')),
                                 # @d is mandatory (see [1, 5.3.9.6.2, Table 17, page 60])
                                 'd': int(s.attrib['d']),
                                 'r': r,
@@ -2753,8 +2753,14 @@ def extract_Initialization(source):
             return ms_info
         mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
+        availability_start_time = unified_timestamp(
+            mpd_doc.get('availabilityStartTime'), with_milliseconds=True) or 0
         stream_numbers = collections.defaultdict(int)
         for period_idx, period in enumerate(mpd_doc.findall(_add_ns('Period'))):
+            # segmentIngestTime is completely out of spec, but YT Livestream do this
+            segment_ingest_time = period.get('{http://youtube.com/yt/2012/10/10}segmentIngestTime')
+            if segment_ingest_time:
+                availability_start_time = unified_timestamp(segment_ingest_time, with_milliseconds=True)
             period_entry = {
                 'id': period.get('id', f'period-{period_idx}'),
                 'formats': [],
@@ -2933,13 +2939,17 @@ def add_segment_url():
                                     'Bandwidth': bandwidth,
                                     'Number': segment_number,
                                 }
+                                duration = float_or_none(segment_d, representation_ms_info['timescale'])
+                                start = float_or_none(segment_time, representation_ms_info['timescale'])
                                 representation_ms_info['fragments'].append({
                                     media_location_key: segment_url,
-                                    'duration': float_or_none(segment_d, representation_ms_info['timescale']),
+                                    'duration': duration,
+                                    'start': availability_start_time + start,
+                                    'end': availability_start_time + start + duration,
                                 })
                             for s in representation_ms_info['s']:
-                                segment_time = s.get('t') or segment_time
+                                segment_time = s['t'] if s.get('t') is not None else segment_time
                                 segment_d = s['d']
                                 add_segment_url()
                                 segment_number += 1
@@ -2955,6 +2965,7 @@ def add_segment_url():
                         fragments = []
                         segment_index = 0
                         timescale = representation_ms_info['timescale']
+                        start = 0
                         for s in representation_ms_info['s']:
                             duration = float_or_none(s['d'], timescale)
                             for _ in range(s.get('r', 0) + 1):
@@ -2962,8 +2973,11 @@ def add_segment_url():
                                 fragments.append({
                                     location_key(segment_uri): segment_uri,
                                     'duration': duration,
+                                    'start': availability_start_time + start,
+                                    'end': availability_start_time + start + duration,
                                 })
                                 segment_index += 1
+                                start += duration
                         representation_ms_info['fragments'] = fragments
                     elif 'segment_urls' in representation_ms_info:
                         # Segment URLs with no SegmentTimeline
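
The `start`/`end` bookkeeping added for DASH fragments in the hunks above can be sketched in isolation. The function `fragment_times` is hypothetical, and the input is assumed to be a list of dicts with `d` (duration in timescale units) and an optional `r` (repeat count), loosely modelled on the `representation_ms_info['s']` entries in the diff.

```python
def fragment_times(segments, timescale, availability_start_time=0.0):
    """Hypothetical sketch: derive absolute start/end times for each fragment.

    segments: [{'d': duration_in_timescale_units, 'r': optional_repeat_count}, ...]
    """
    fragments = []
    start = 0.0  # running offset from the start of the timeline, in seconds
    for s in segments:
        duration = s['d'] / timescale
        # An entry with repeat count r describes r + 1 consecutive fragments
        for _ in range(s.get('r', 0) + 1):
            fragments.append({
                'duration': duration,
                'start': availability_start_time + start,
                'end': availability_start_time + start + duration,
            })
            start += duration
    return fragments


for frag in fragment_times([{'d': 5000, 'r': 1}, {'d': 2500}], timescale=1000,
                           availability_start_time=100.0):
    print(frag)
```

Anchoring each fragment to `availability_start_time` turns timeline offsets into wall-clock times, which is what allows a later comparison against a requested section given as unix timestamps.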

View File

@@ -1,51 +1,35 @@
-import random
 from .common import InfoExtractor
-from ..utils import xpath_text
 class MatchTVIE(InfoExtractor):
-    _VALID_URL = r'https?://matchtv\.ru(?:/on-air|/?#live-player)'
+    _VALID_URL = [
+        r'https?://matchtv\.ru/on-air/?(?:$|[?#])',
+        r'https?://video\.matchtv\.ru/iframe/channel/106/?(?:$|[?#])',
+    ]
     _TESTS = [{
-        'url': 'http://matchtv.ru/#live-player',
+        'url': 'http://matchtv.ru/on-air/',
         'info_dict': {
             'id': 'matchtv-live',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': r're:^Матч ТВ - Прямой эфир \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
-            'is_live': True,
+            'live_status': 'is_live',
         },
         'params': {
             'skip_download': True,
         },
     }, {
-        'url': 'http://matchtv.ru/on-air/',
+        'url': 'https://video.matchtv.ru/iframe/channel/106',
         'only_matching': True,
     }]
     def _real_extract(self, url):
         video_id = 'matchtv-live'
-        video_url = self._download_json(
-            'http://player.matchtv.ntvplus.tv/player/smil', video_id,
-            query={
-                'ts': '',
-                'quality': 'SD',
-                'contentId': '561d2c0df7159b37178b4567',
-                'sign': '',
-                'includeHighlights': '0',
-                'userId': '',
-                'sessionId': random.randint(1, 1000000000),
-                'contentType': 'channel',
-                'timeShift': '0',
-                'platform': 'portal',
-            },
-            headers={
-                'Referer': 'http://player.matchtv.ntvplus.tv/embed-player/NTVEmbedPlayer.swf',
-            })['data']['videoUrl']
-        f4m_url = xpath_text(self._download_xml(video_url, video_id), './to')
-        formats = self._extract_f4m_formats(f4m_url, video_id)
+        webpage = self._download_webpage('https://video.matchtv.ru/iframe/channel/106', video_id)
+        video_url = self._html_search_regex(
+            r'data-config="config=(https?://[^?"]+)[?"]', webpage, 'video URL').replace('/feed/', '/media/') + '.m3u8'
         return {
             'id': video_id,
             'title': 'Матч ТВ - Прямой эфир',
             'is_live': True,
-            'formats': formats,
+            'formats': self._extract_m3u8_formats(video_url, video_id, 'mp4', live=True),
         }
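
To see which URLs the two new `_VALID_URL` patterns in the hunk above accept, a quick check with plain `re.match` may help. The patterns are copied from the diff; the helper `is_supported` is hypothetical, and yt-dlp's own URL matching machinery is not reproduced here.

```python
import re

# Patterns copied from the new _VALID_URL list in the diff
VALID_URLS = [
    r'https?://matchtv\.ru/on-air/?(?:$|[?#])',
    r'https?://video\.matchtv\.ru/iframe/channel/106/?(?:$|[?#])',
]


def is_supported(url):
    """Hypothetical helper: True if any pattern matches from the start of the URL."""
    return any(re.match(pattern, url) for pattern in VALID_URLS)


for candidate in (
    'http://matchtv.ru/on-air/',
    'https://video.matchtv.ru/iframe/channel/106',
    'http://matchtv.ru/#live-player',  # accepted by the old pattern only
):
    print(candidate, is_supported(candidate))
```

The trailing `(?:$|[?#])` requires the path to end right after the optional slash, so longer paths under `/on-air/` are not picked up by this extractor.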

View File

@@ -2798,17 +2798,17 @@ def refetch_manifest(format_id, delay):
             microformats = traverse_obj(
                 prs, (..., 'microformat', 'playerMicroformatRenderer'),
                 expected_type=dict)
-            _, live_status, _, formats, _ = self._list_formats(video_id, microformats, video_details, prs, player_url)
-            is_live = live_status == 'is_live'
-            start_time = time.time()
+            with lock:
+                _, live_status, _, formats, _ = self._list_formats(video_id, microformats, video_details, prs, player_url)
+                is_live = live_status == 'is_live'
+                start_time = time.time()
         def mpd_feed(format_id, delay):
             """
             @returns (manifest_url, manifest_stream_number, is_live) or None
             """
             for retry in self.RetryManager(fatal=False):
-                with lock:
-                    refetch_manifest(format_id, delay)
+                refetch_manifest(format_id, delay)
                 f = next((f for f in formats if f['format_id'] == format_id), None)
                 if not f:
@@ -2839,6 +2839,11 @@ def _live_dash_fragments(self, video_id, format_id, live_start_time, mpd_feed, m
         begin_index = 0
         download_start_time = ctx.get('start') or time.time()
+        section_start = ctx.get('section_start') or 0
+        section_end = ctx.get('section_end') or math.inf
+        self.write_debug(f'Selected section: {section_start} -> {section_end}')
         lack_early_segments = download_start_time - (live_start_time or download_start_time) > MAX_DURATION
         if lack_early_segments:
             self.report_warning(bug_reports_message(
@@ -2859,9 +2864,10 @@ def _extract_sequence_from_mpd(refresh_sequence, immediate):
                                                or (mpd_url, stream_number, False))
             if not refresh_sequence:
                 if expire_fast and not is_live:
-                    return False, last_seq
+                    return False
                 elif old_mpd_url == mpd_url:
-                    return True, last_seq
+                    return True
             if manifestless_orig_fmt:
                 fmt_info = manifestless_orig_fmt
             else:
@@ -2872,14 +2878,13 @@ def _extract_sequence_from_mpd(refresh_sequence, immediate):
                     fmts = None
                 if not fmts:
                     no_fragment_score += 2
-                    return False, last_seq
+                    return False
                 fmt_info = next(x for x in fmts if x['manifest_stream_number'] == stream_number)
             fragments = fmt_info['fragments']
             fragment_base_url = fmt_info['fragment_base_url']
             assert fragment_base_url
-            _last_seq = int(re.search(r'(?:/|^)sq/(\d+)', fragments[-1]['path']).group(1))
-            return True, _last_seq
+            return True
         self.write_debug(f'[{video_id}] Generating fragments for format {format_id}')
         while is_live:
@@ -2899,11 +2904,19 @@ def _extract_sequence_from_mpd(refresh_sequence, immediate):
                     last_segment_url = None
                     continue
             else:
-                should_continue, last_seq = _extract_sequence_from_mpd(True, no_fragment_score > 15)
+                should_continue = _extract_sequence_from_mpd(True, no_fragment_score > 15)
                 no_fragment_score += 2
                 if not should_continue:
                     continue
+            last_fragment = fragments[-1]
+            last_seq = int(re.search(r'(?:/|^)sq/(\d+)', fragments[-1]['path']).group(1))
+            known_fragment = next(
+                (fragment for fragment in fragments if f'sq/{known_idx}' in fragment['path']), None)
+            if known_fragment and known_fragment['end'] > section_end:
+                break
             if known_idx > last_seq:
                 last_segment_url = None
                 continue
@@ -2913,20 +2926,36 @@ def _extract_sequence_from_mpd(refresh_sequence, immediate):
             if begin_index < 0 and known_idx < 0:
                 # skip from the start when it's negative value
                 known_idx = last_seq + begin_index
             if lack_early_segments:
-                known_idx = max(known_idx, last_seq - int(MAX_DURATION // fragments[-1]['duration']))
+                known_idx = max(known_idx, last_seq - int(MAX_DURATION // last_fragment['duration']))
+            fragment_count = last_seq - known_idx if section_end == math.inf else int(
+                (section_end - section_start) // last_fragment['duration'])
             try:
                 for idx in range(known_idx, last_seq):
                     # do not update sequence here or you'll get skipped some part of it
-                    should_continue, _ = _extract_sequence_from_mpd(False, False)
+                    should_continue = _extract_sequence_from_mpd(False, False)
                     if not should_continue:
                         known_idx = idx - 1
                         raise ExtractorError('breaking out of outer loop')
-                    last_segment_url = urljoin(fragment_base_url, f'sq/{idx}')
-                    yield {
-                        'url': last_segment_url,
-                        'fragment_count': last_seq,
-                    }
+                    frag_duration = last_fragment['duration']
+                    frag_start = last_fragment['start'] - (last_seq - idx) * frag_duration
+                    frag_end = frag_start + frag_duration
+                    if frag_start >= section_start and frag_end <= section_end:
+                        last_segment_url = urljoin(fragment_base_url, f'sq/{idx}')
+                        yield {
+                            'url': last_segment_url,
+                            'fragment_count': fragment_count,
+                            'duration': frag_duration,
+                            'start': frag_start,
+                            'end': frag_end,
+                        }
                 if known_idx == last_seq:
                     no_fragment_score += 5
                 else:
@@ -3974,6 +4003,9 @@ def build_fragments(f):
                     dct['downloader_options'] = {'http_chunk_size': CHUNK_SIZE}
                 yield dct
+        if live_status == 'is_live' and self.get_param('download_ranges') and not self.get_param('live_from_start'):
+            self.report_warning('For YT livestreams, --download-sections is only supported with --live-from-start')
         needs_live_processing = self._needs_live_processing(live_status, duration)
         skip_bad_formats = 'incomplete' not in format_types
         if self._configuration_arg('include_incomplete_formats'):
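
The section filter in the `_live_dash_fragments` hunks above estimates each fragment's time window by counting back from the newest fragment, then keeps only fragments that lie entirely inside the requested section. A minimal sketch of that arithmetic follows; the generator `select_fragments` and its flat parameter list are hypothetical, and it assumes every fragment has the same duration as the last one, as the diff does.

```python
import math


def select_fragments(known_idx, last_seq, last_start, frag_duration,
                     section_start=0, section_end=math.inf):
    """Hypothetical sketch of the section filter; not yt-dlp's generator.

    last_start is the start time of the newest fragment (sequence number last_seq).
    """
    for idx in range(known_idx, last_seq):
        # Walk back from the newest fragment, assuming a constant fragment duration
        frag_start = last_start - (last_seq - idx) * frag_duration
        frag_end = frag_start + frag_duration
        # Keep only fragments that lie entirely inside the requested section
        if frag_start >= section_start and frag_end <= section_end:
            yield {'idx': idx, 'start': frag_start, 'end': frag_end}


picked = list(select_fragments(known_idx=0, last_seq=10, last_start=1000.0,
                               frag_duration=5.0, section_start=960.0, section_end=980.0))
print([frag['idx'] for frag in picked])
```

Because the test is "entirely inside", a fragment that straddles a section boundary is dropped in this sketch, so the selected span can be up to one fragment shorter at each end than the requested range.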

View File

@@ -419,7 +419,14 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
     general.add_option(
         '--live-from-start',
         action='store_true', dest='live_from_start',
-        help='Download livestreams from the start. Currently only supported for YouTube (Experimental)')
+        help=('Download livestreams from the start. Currently only supported for YouTube (Experimental). '
+              'Time ranges can be specified using --download-sections to download only a part of the stream. '
+              'Negative values are allowed for specifying a relative previous time, using the # syntax '
+              'e.g. --download-sections "#-24hours - 0" (download last 24 hours), '
+              'e.g. --download-sections "#-1h - 30m" (download from 1 hour ago until the next 30 minutes), '
+              'e.g. --download-sections "#-3days - -2days" (download from 3 days ago until 2 days ago). '
+              'It is also possible to specify an exact unix timestamp range, using the * syntax, '
+              'e.g. --download-sections "*1672531200 - 1672549200" (download between those two timestamps)'))
     general.add_option(
         '--no-live-from-start',
         action='store_false', dest='live_from_start',

View File

@@ -1212,7 +1212,7 @@ def unified_strdate(date_str, day_first=True):
         return str(upload_date)
-def unified_timestamp(date_str, day_first=True):
+def unified_timestamp(date_str, day_first=True, with_milliseconds=False):
     if not isinstance(date_str, str):
         return None
@@ -1238,7 +1238,7 @@ def unified_timestamp(date_str, day_first=True):
     for expression in date_formats(day_first):
         with contextlib.suppress(ValueError):
             dt_ = dt.datetime.strptime(date_str, expression) - timezone + dt.timedelta(hours=pm_delta)
-            return calendar.timegm(dt_.timetuple())
+            return calendar.timegm(dt_.timetuple()) + (dt_.microsecond / 1e6 if with_milliseconds else 0)
     timetuple = email.utils.parsedate_tz(date_str)
     if timetuple:
@@ -2038,16 +2038,19 @@ def parse_duration(s):
     days, hours, mins, secs, ms = [None] * 5
     m = re.match(r'''(?x)
+            (?P<sign>[+-])?
             (?P<before_secs>
                 (?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?
             (?P<secs>(?(before_secs)[0-9]{1,2}|[0-9]+))
             (?P<ms>[.:][0-9]+)?Z?$
         ''', s)
     if m:
-        days, hours, mins, secs, ms = m.group('days', 'hours', 'mins', 'secs', 'ms')
+        sign, days, hours, mins, secs, ms = m.group('sign', 'days', 'hours', 'mins', 'secs', 'ms')
     else:
         m = re.match(
-            r'''(?ix)(?:P?
+            r'''(?ix)(?:
+                (?P<sign>[+-])?
+                P?
                 (?:
                     [0-9]+\s*y(?:ears?)?,?\s*
                 )?
@@ -2071,17 +2074,19 @@ def parse_duration(s):
                     (?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*s(?:ec(?:ond)?s?)?\s*
                 )?Z?$''', s)
         if m:
-            days, hours, mins, secs, ms = m.groups()
+            sign, days, hours, mins, secs, ms = m.groups()
         else:
-            m = re.match(r'(?i)(?:(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)Z?$', s)
+            m = re.match(r'(?i)(?P<sign>[+-])?(?:(?P<days>[0-9.]+)\s*(?:days?)|(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)Z?$', s)
             if m:
-                hours, mins = m.groups()
+                sign, days, hours, mins = m.groups()
             else:
                 return None
+    sign = -1 if sign == '-' else 1
     if ms:
         ms = ms.replace(':', '.')
-    return sum(float(part or 0) * mult for part, mult in (
+    return sign * sum(float(part or 0) * mult for part, mult in (
         (days, 86400), (hours, 3600), (mins, 60), (secs, 1), (ms, 1)))
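
The sign handling added to `parse_duration` in the hunks above boils down to capturing an optional leading `+` or `-` and multiplying the summed seconds by it. A reduced sketch follows, assuming only the "number plus unit word" form; `signed_duration` is a hypothetical helper and covers far fewer formats than the real function.

```python
import re

# Seconds per unit, keyed by the singular unit word
_UNIT_SECONDS = {'day': 86400, 'hour': 3600, 'min': 60, 'minute': 60}


def signed_duration(s):
    """Hypothetical, reduced take on the signed-duration idea from the diff."""
    m = re.fullmatch(
        r'(?i)(?P<sign>[+-])?(?P<value>[0-9]+(?:\.[0-9]+)?)\s*'
        r'(?P<unit>days?|hours?|mins?|minutes?)', s.strip())
    if not m:
        return None
    sign = -1 if m.group('sign') == '-' else 1
    # Normalise plural forms ('hours' -> 'hour') before the table lookup
    unit = m.group('unit').lower().rstrip('s')
    return sign * float(m.group('value')) * _UNIT_SECONDS[unit]


print(signed_duration('-24hours'))
print(signed_duration('3days'))
print(signed_duration('not a duration'))
```

Returning `None` for unparseable input follows the convention visible in the diff, where `parse_duration` falls through to `return None` when no pattern matches.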