tool

pyannote speaker identification tool.

Functions

cropped_waveform

Crops an audio track and returns its corresponding waveform.

Classes

PyannoteIdentificationTool

pyannote speaker identification tool.

class PyannoteIdentificationTool(model_names, api_token=None, device='cpu', overwrite=False, verbose=True)[source]

Bases: IdentificationTool

pyannote speaker identification tool.

Parameters:
  • model_names (Sequence[str]) – The names of the models to use.

  • api_token (Optional[str]) – The HuggingFace API token to use.

  • device (str) – The device where the computation should be executed.

  • overwrite (bool) – Whether to overwrite existing files. If False, an error is raised when a file already exists.

  • verbose (Union[bool, int]) – Whether to execute the computation verbosely.
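The overwrite behaviour described above can be illustrated with a minimal sketch. The helper name ensure_writable, the choice of FileExistsError, and the file name identification.json are assumptions made for illustration only; the documentation does not state which exception the tool raises or how it performs the check.

```python
import tempfile
from pathlib import Path
from typing import Union


def ensure_writable(path: Union[str, Path], overwrite: bool = False) -> Path:
    """Hypothetical helper: refuse to reuse an existing path unless overwrite is set."""
    path = Path(path)
    if path.exists() and not overwrite:
        # Assumed exception type; the actual tool may raise a different error.
        raise FileExistsError(f"{path} already exists; pass overwrite=True to replace it")
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "identification.json"  # hypothetical output file name
    target.write_text("{}")

    # With overwrite=True the existing file is accepted as an output target.
    ensure_writable(target, overwrite=True)

    # With the default overwrite=False the existing file triggers an error.
    try:
        ensure_writable(target)
    except FileExistsError as error:
        print(f"refused: {error}")
```

Failing early in this way keeps a long-running identification job from silently replacing earlier results.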

inference(mixed_audio_path, diarization_path, mono_audio_paths, identification_path)[source]

pyannote-backed inference method.

Parameters:
  • mixed_audio_path (Union[str, Path]) – Path to the mixed audio track.

  • diarization_path (Union[str, Path]) – Path to the diarization file.

  • mono_audio_paths (Sequence[Union[str, Path]]) – Paths to the mono audio tracks.

  • identification_path (Union[str, Path]) – Path to the identification file.

cropped_waveform(path, start, end, sample_rate=32000)[source]

Crops an audio track and returns its corresponding waveform.

Parameters:
  • path (Union[str, Path]) – Path to the audio track.

  • start (float) – Start of the segment in seconds.

  • end (float) – End of the segment in seconds.

  • sample_rate (int) – Sample rate of the audio track.

Return type:

Tensor

Returns:

Tensor containing the waveform of the audio segment.
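The relationship between start, end, and sample_rate can be sketched with plain Python lists. This is an illustration of the seconds-to-sample-index arithmetic only, not the library's implementation: the name crop_samples is hypothetical, the real function reads from a file path and returns a Tensor, and its rounding and resampling behaviour are not documented here.

```python
from typing import List, Sequence


def crop_samples(samples: Sequence[float], start: float, end: float,
                 sample_rate: int = 32000) -> List[float]:
    """Hypothetical helper: return the samples between start and end (in seconds)."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    # Convert the segment boundaries from seconds to sample indices.
    first = int(round(start * sample_rate))
    last = int(round(end * sample_rate))
    return list(samples[first:last])


# One second of silence at 32 kHz, cropped to the interval [0.25 s, 0.75 s).
one_second = [0.0] * 32000
segment = crop_samples(one_second, start=0.25, end=0.75)
print(len(segment))  # 16000 samples, i.e. half a second at 32 kHz
```

With the default sample rate of 32000, a segment of d seconds therefore corresponds to roughly 32000 × d samples.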