知方号

Transformers 4.37 Chinese Documentation (Part 86)

    import numpy as np


    def sample_frame_indices(clip_len, frame_sample_rate, seg_len):
        """Sample a given number of frame indices from the video.

        Args:
            clip_len (int): Total number of frames to sample.
            frame_sample_rate (int): Sample every n-th frame.
            seg_len (int): Number of frames in the video; every sampled index is below this value.

        Returns:
            indices (np.ndarray): Array of sampled frame indices (int64).
        """
        # Length of the window the clip spans, in frames.
        converted_len = int(clip_len * frame_sample_rate)
        # Pick a random window end; this requires seg_len > converted_len.
        end_idx = np.random.randint(converted_len, seg_len)
        start_idx = end_idx - converted_len
        # Spread clip_len evenly spaced positions over the window...
        indices = np.linspace(start_idx, end_idx, num=clip_len)
        # ...then keep them inside [start_idx, end_idx - 1] and convert to integers.
        indices = np.clip(indices, start_idx, end_idx - 1).astype(np.int64)
        return indices
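For a concrete look at the indices `sample_frame_indices` produces, here is a minimal sketch that repeats its arithmetic in plain Python under one assumption: the window end `end_idx`, which the original draws with `np.random.randint(converted_len, seg_len)`, is passed in explicitly so the result is reproducible. The helper name `sample_frame_indices_fixed` is hypothetical. Its range check mirrors the fact that `np.random.randint(low, high)` raises `ValueError` unless `high > low`, so the video needs more than `clip_len * frame_sample_rate` frames.

```python
def sample_frame_indices_fixed(clip_len, frame_sample_rate, seg_len, end_idx):
    """Deterministic variant of sample_frame_indices (hypothetical helper).

    end_idx replaces the value the original draws with
    np.random.randint(converted_len, seg_len).
    """
    converted_len = int(clip_len * frame_sample_rate)
    # Same half-open range np.random.randint(converted_len, seg_len) draws from.
    if not converted_len <= end_idx < seg_len:
        raise ValueError("end_idx must lie in [clip_len * frame_sample_rate, seg_len)")
    start_idx = end_idx - converted_len
    # Like np.linspace(start_idx, end_idx, num=clip_len): evenly spaced, both ends included.
    step = (end_idx - start_idx) / (clip_len - 1) if clip_len > 1 else 0.0
    points = [start_idx + i * step for i in range(clip_len)]
    # Like np.clip(..., start_idx, end_idx - 1).astype(np.int64):
    # keep every index below end_idx, then truncate toward zero.
    return [int(min(max(p, start_idx), end_idx - 1)) for p in points]


# 6 frames, nominal stride 4, window ending at frame 30.
print(sample_frame_indices_fixed(clip_len=6, frame_sample_rate=4, seg_len=300, end_idx=30))
# [6, 10, 15, 20, 25, 29]
```

The gap between neighboring indices is `converted_len / (clip_len - 1)`, 4.8 frames here, rather than exactly `frame_sample_rate`, because the clip's `clip_len` points are spread across a window of `clip_len * frame_sample_rate` frames; the final clip then pulls the last index from 30 down to 29.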

    # Load the video
    import av
    from huggingface_hub import hf_hub_download

    file_path = hf_hub_download(
        repo_id="nielsr/video-demo", filename="eating_spaghetti.mp4", repo_type="dataset"
    )
    container = av.open(file_path)

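    # Sample frames. `model` and `read_video_pyav` are defined earlier in the full example.
    # num_image_with_embedding is the number of temporal embeddings, i.e. frames per video.
    num_frames = model.config.num_image_with_embedding
    indices = sample_frame_indices(
        clip_len=num_frames, frame_sample_rate=4, seg_len=container.streams.video[0].frames
    )
    frames = read_video_pyav(container, indices)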
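`read_video_pyav` itself is not part of this excerpt. Assuming it decodes the stream in order and keeps the frames at the sampled positions, the selection step can be sketched independently of PyAV with a hypothetical `select_frames` helper that accepts any iterable of frames (with PyAV, something like `container.decode(video=0)`). Integers stand in for decoded frames here.

```python
def select_frames(decoded_frames, indices):
    """Return the frames whose stream position is in `indices` (hypothetical helper)."""
    wanted = {int(i) for i in indices}  # duplicate indices collapse to one frame
    if not wanted:
        return []
    last = max(wanted)
    selected = []
    for position, frame in enumerate(decoded_frames):
        if position in wanted:
            selected.append(frame)
        if position >= last:
            break  # last sampled frame reached; leave the rest of the stream undecoded
    return selected


# Stand-in "video" of 100 frames: frame k is represented by the integer k * 10.
fake_stream = (k * 10 for k in range(100))
print(select_frames(fake_stream, [6, 10, 15, 20, 25, 29]))
# [60, 100, 150, 200, 250, 290]
```

Stopping at the last sampled position matters for video, since frames after it never need to be decoded.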

    # Preprocess the sampled frames into a PyTorch tensor
    pixel_values = processor(images=list(frames), return_tensors="pt").pixel_values

    # Generate caption token ids (each sequence at most 50 tokens long)
    generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
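`max_length=50` caps the total length of each output sequence, counting the prompt tokens it starts with, not only the newly generated ones (`max_new_tokens` is the parameter that counts new tokens only). Assuming the checkpoint's generation settings leave sampling and beam search off, `generate` then picks the highest-scoring token at each step until it emits the end-of-sequence token or hits that cap. The toy loop below, with a made-up five-token scorer, illustrates only that control flow; it is not how `transformers` implements `generate`.

```python
def greedy_decode(next_token_scores, bos_id, eos_id, max_length):
    """Toy greedy decoding loop (illustrative, not the transformers implementation).

    next_token_scores(sequence) returns one score per vocabulary id.
    max_length bounds the total length, starting token included.
    """
    sequence = [bos_id]
    while len(sequence) < max_length:
        scores = next_token_scores(sequence)
        next_id = max(range(len(scores)), key=scores.__getitem__)  # argmax
        sequence.append(next_id)
        if next_id == eos_id:
            break  # end-of-sequence token emitted
    return sequence


def toy_scores(sequence):
    # Toy "model" over a 5-id vocabulary: always favors the id after the last one.
    favorite = (sequence[-1] + 1) % 5
    return [1.0 if i == favorite else 0.0 for i in range(5)]


print(greedy_decode(toy_scores, bos_id=0, eos_id=4, max_length=50))  # [0, 1, 2, 3, 4]
print(greedy_decode(toy_scores, bos_id=0, eos_id=4, max_length=3))   # [0, 1, 2]
```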

    # Decode the ids to text, dropping special tokens
    print("Generated caption:", processor.batch_decode(generated_ids, skip_special_tokens=True))
    # Generated caption: ['a woman is sitting at a table and she is talking about the food she is holding.']
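`batch_decode` maps each row of `generated_ids` back to a string, and `skip_special_tokens=True` drops the tokenizer's special tokens (for a BERT-style vocabulary, markers such as `[CLS]` and `[SEP]`) so only the caption text remains. The sketch below imitates just that filtering with a made-up whole-word vocabulary and made-up ids; a real tokenizer also merges subword pieces and normalizes spacing, which this toy skips.

```python
def toy_batch_decode(batch_ids, vocab, special_ids, skip_special_tokens=False):
    """Toy stand-in for a tokenizer's batch_decode (whole words, no subword merging)."""
    texts = []
    for ids in batch_ids:
        tokens = [vocab[i] for i in ids if not (skip_special_tokens and i in special_ids)]
        texts.append(" ".join(tokens))
    return texts


# Made-up ids; a real vocabulary assigns different ones.
vocab = {0: "[CLS]", 1: "[SEP]", 2: "a", 3: "woman", 4: "is", 5: "eating"}
special_ids = {0, 1}
generated = [[0, 2, 3, 4, 5, 1]]

print(toy_batch_decode(generated, vocab, special_ids))
# ['[CLS] a woman is eating [SEP]']
print(toy_batch_decode(generated, vocab, special_ids, skip_special_tokens=True))
# ['a woman is eating']
```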
