audio¶

psifx audio¶

Command-line interface for processing audio tracks.

usage: psifx audio [-h] [--all-help]
                   {diarization,identification,manipulation,speech,transcription}
                   ...

-h, --help¶: show this help message and exit

--all-help¶: show help recursively and exit

psifx audio diarization¶

Command-line interface for diarizing audio tracks.

usage: psifx audio diarization [-h] [--all-help] {pyannote,visualization} ...

-h, --help¶: show this help message and exit

--all-help¶: show help recursively and exit

psifx audio diarization pyannote¶

Command-line interface for running the pyannote diarization tool.

usage: psifx audio diarization pyannote [-h] [--all-help]
                                        {inference,visualization} ...

-h, --help¶: show this help message and exit

--all-help¶: show help recursively and exit

psifx audio diarization pyannote inference¶

Command-line interface for diarizing an audio track with pyannote.

usage: psifx audio diarization pyannote inference [-h] --audio AUDIO
                                                  --diarization DIARIZATION
                                                  [--num_speakers NUM_SPEAKERS]
                                                  [--model_name MODEL_NAME]
                                                  [--api_token API_TOKEN]
                                                  [--device DEVICE]
                                                  [--overwrite | --no-overwrite]
                                                  [--verbose | --no-verbose]

-h, --help¶: show this help message and exit

--audio <audio>¶: path to the input audio file, such as /path/to/audio.wav

--diarization <diarization>¶: path to the output diarization file, such as /path/to/diarization.rttm

--num_speakers <num_speakers>¶: number of speaking participants; if omitted, the model will try to guess it, so specifying it is advised

--model_name <model_name>¶: name of the diarization model used, cf. https://huggingface.co/pyannote/speaker-diarization/tree/main/reproducible_research

--api_token <api_token>¶: API token for downloading the models from Hugging Face

--device <device>¶: device on which to run the inference, either ‘cpu’ or ‘cuda’

--overwrite, --no-overwrite¶: overwrite existing files, otherwise raises an error

--verbose, --no-verbose¶: verbosity of the script
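As an illustration, the options above can be wrapped in a small shell function. This is only a sketch: the function name `diarize` and the file names in the usage note are hypothetical, and the `cpu` fallback is a choice made for the example, not necessarily the tool's own default.

```shell
# Hypothetical wrapper: diarize one recording with a known number of speakers.
# Usage: diarize AUDIO RTTM NUM_SPEAKERS [DEVICE]
diarize() {
    psifx audio diarization pyannote inference \
        --audio "$1" \
        --diarization "$2" \
        --num_speakers "$3" \
        --device "${4:-cpu}"   # example fallback to 'cpu' when no device is given
}
```

For instance, `diarize interview.wav interview.rttm 2 cuda` would request a two-speaker diarization on the GPU and write it to `interview.rttm`. Add `--overwrite` to the command if the output file may already exist.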

psifx audio diarization pyannote visualization¶

Command-line interface for visualizing the diarization of a track.

usage: psifx audio diarization pyannote visualization [-h] --diarization
                                                      DIARIZATION
                                                      --visualization
                                                      VISUALIZATION
                                                      [--overwrite | --no-overwrite]
                                                      [--verbose | --no-verbose]

-h, --help¶: show this help message and exit

--diarization <diarization>¶: path to the input diarization file, such as /path/to/diarization.rttm

--visualization <visualization>¶: path to the output visualization file, such as /path/to/visualization.png

--overwrite, --no-overwrite¶: overwrite existing files, otherwise raises an error

--verbose, --no-verbose¶: verbosity of the script

psifx audio diarization visualization¶

Command-line interface for visualizing the diarization of a track.

usage: psifx audio diarization visualization [-h] --diarization DIARIZATION
                                             --visualization VISUALIZATION
                                             [--overwrite | --no-overwrite]
                                             [--verbose | --no-verbose]

-h, --help¶: show this help message and exit

--diarization <diarization>¶: path to the input diarization file, such as /path/to/diarization.rttm

--visualization <visualization>¶: path to the output visualization file, such as /path/to/visualization.png

--overwrite, --no-overwrite¶: overwrite existing files, otherwise raises an error

--verbose, --no-verbose¶: verbosity of the script
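When several diarization files need a plot, the command can be run in a loop. The sketch below assumes a directory of `.rttm` files and writes each image next to its source; the function name `visualize_all` and the naming scheme for the `.png` files are hypothetical.

```shell
# Hypothetical helper: render one PNG per .rttm file found in a directory.
# Usage: visualize_all DIRECTORY
visualize_all() {
    dir=$1
    for rttm in "$dir"/*.rttm; do
        # When nothing matches, the glob stays literal, so skip it.
        [ -e "$rttm" ] || continue
        psifx audio diarization visualization \
            --diarization "$rttm" \
            --visualization "${rttm%.rttm}.png"
    done
}
```

Since `psifx audio diarization pyannote visualization` is documented with the same options, the same loop should apply to it as well.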

psifx audio identification¶

Command-line interface for identifying speakers in audio tracks.

usage: psifx audio identification [-h] [--all-help] {pyannote} ...

-h, --help¶: show this help message and exit

--all-help¶: show help recursively and exit

psifx audio identification pyannote¶

Command-line interface for running the pyannote identification tool.

usage: psifx audio identification pyannote [-h] [--all-help] {inference} ...

-h, --help¶: show this help message and exit

--all-help¶: show help recursively and exit

psifx audio identification pyannote inference¶

Command-line interface for identifying speakers from an audio track with pyannote.

usage: psifx audio identification pyannote inference [-h] --mixed_audio
                                                     MIXED_AUDIO --diarization
                                                     DIARIZATION --mono_audios
                                                     MONO_AUDIOS
                                                     [MONO_AUDIOS ...]
                                                     --identification
                                                     IDENTIFICATION
                                                     [--model_names MODEL_NAMES [MODEL_NAMES ...]]
                                                     [--api_token API_TOKEN]
                                                     [--device DEVICE]
                                                     [--overwrite | --no-overwrite]
                                                     [--verbose | --no-verbose]

-h, --help¶: show this help message and exit

--mixed_audio <mixed_audio>¶: path to the input mixed audio file, such as /path/to/mixed-audio.wav

--diarization <diarization>¶: path to the input diarization file, such as /path/to/diarization.rttm

--mono_audios <mono_audios>¶: paths to the input mono audio files, such as /path/to/mono-audio-1.wav /path/to/mono-audio-2.wav

--identification <identification>¶: path to the output identification file, such as /path/to/identification.json

--model_names <model_names>¶: names of the embedding models

--api_token <api_token>¶: API token for downloading the models from Hugging Face

--device <device>¶: device on which to run the inference, either ‘cpu’ or ‘cuda’

--overwrite, --no-overwrite¶: overwrite existing files, otherwise raises an error

--verbose, --no-verbose¶: verbosity of the script
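Because `--mono_audios` accepts one or more paths, a wrapper can take the fixed arguments first and forward the remaining ones as the mono tracks. A minimal sketch, with a hypothetical function name `identify` and hypothetical file names:

```shell
# Hypothetical wrapper: identify speakers from a mix and its per-speaker tracks.
# Usage: identify MIXED_AUDIO RTTM JSON MONO_AUDIO [MONO_AUDIO ...]
identify() {
    mixed=$1
    rttm=$2
    json=$3
    shift 3   # everything left in "$@" is treated as a mono track
    psifx audio identification pyannote inference \
        --mixed_audio "$mixed" \
        --diarization "$rttm" \
        --mono_audios "$@" \
        --identification "$json"
}
```

A call could look like `identify mixed.wav mixed.rttm identification.json left.wav right.wav`.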

psifx audio manipulation¶

Command-line interface for manipulating audio tracks.

usage: psifx audio manipulation [-h] [--all-help]
                                {extraction,conversion,split,mixdown,normalization,trim}
                                ...

-h, --help¶: show this help message and exit

--all-help¶: show help recursively and exit

psifx audio manipulation conversion¶

Command-line interface for converting any audio track to a mono audio track at a 16 kHz sample rate.

usage: psifx audio manipulation conversion [-h] --audio AUDIO --mono_audio
                                           MONO_AUDIO
                                           [--overwrite | --no-overwrite]
                                           [--verbose | --no-verbose]

-h, --help¶: show this help message and exit

--audio <audio>¶: path to the input audio file, such as /path/to/audio.wav (or .mp3, etc.)

--mono_audio <mono_audio>¶: path to the output audio file, such as /path/to/mono-audio.wav

--overwrite, --no-overwrite¶: overwrite existing files, otherwise raises an error

--verbose, --no-verbose¶: verbosity of the script

psifx audio manipulation extraction¶

Command-line interface for extracting the audio track from a video.

usage: psifx audio manipulation extraction [-h] --video VIDEO --audio AUDIO
                                           [--overwrite | --no-overwrite]
                                           [--verbose | --no-verbose]

-h, --help¶: show this help message and exit

--video <video>¶: path to the input video file, such as /path/to/video.mp4 (or .avi, .mkv, etc.)

--audio <audio>¶: path to the output audio file, such as /path/to/audio.wav

--overwrite, --no-overwrite¶: overwrite existing files, otherwise raises an error

--verbose, --no-verbose¶: verbosity of the script
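Extraction is often followed by the conversion command documented above, so that later steps receive a mono 16 kHz track. The following sketch chains the two; the function name `video_to_mono` and the file names are hypothetical.

```shell
# Hypothetical helper: extract the audio track of a video, then convert it
# to a mono 16 kHz track. The conversion only runs if the extraction succeeds.
# Usage: video_to_mono VIDEO AUDIO MONO_AUDIO
video_to_mono() {
    psifx audio manipulation extraction \
        --video "$1" \
        --audio "$2" &&
    psifx audio manipulation conversion \
        --audio "$2" \
        --mono_audio "$3"
}
```

For example: `video_to_mono session.mp4 session.wav session-mono.wav`.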

psifx audio manipulation mixdown¶

Command-line interface for mixing multiple mono audio tracks.

usage: psifx audio manipulation mixdown [-h] --mono_audios MONO_AUDIOS
                                        [MONO_AUDIOS ...] --mixed_audio
                                        MIXED_AUDIO
                                        [--overwrite | --no-overwrite]
                                        [--verbose | --no-verbose]

-h, --help¶: show this help message and exit

--mono_audios <mono_audios>¶: paths to the input mono audio files, such as /path/to/mono-audio-1.wav /path/to/mono-audio-2.wav

--mixed_audio <mixed_audio>¶: path to the output mixed audio file, such as /path/to/mixed-audio.wav

--overwrite, --no-overwrite¶: overwrite existing files, otherwise raises an error

--verbose, --no-verbose¶: verbosity of the script
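A short sketch of a mixdown call, assuming the output path is given first and the mono tracks after it. The function name `mixdown` and the argument check are illustrative additions, not part of psifx.

```shell
# Hypothetical wrapper: mix any number of mono tracks into one file.
# Usage: mixdown MIXED_AUDIO MONO_AUDIO [MONO_AUDIO ...]
mixdown() {
    if [ "$#" -lt 2 ]; then
        echo "usage: mixdown MIXED_AUDIO MONO_AUDIO [MONO_AUDIO ...]" >&2
        return 2
    fi
    mixed=$1
    shift   # the remaining arguments are the mono inputs
    psifx audio manipulation mixdown \
        --mono_audios "$@" \
        --mixed_audio "$mixed"
}
```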

psifx audio manipulation normalization¶

Command-line interface for normalizing an audio track.

usage: psifx audio manipulation normalization [-h] --audio AUDIO
                                              --normalized_audio
                                              NORMALIZED_AUDIO
                                              [--overwrite | --no-overwrite]
                                              [--verbose | --no-verbose]

-h, --help¶: show this help message and exit

--audio <audio>¶: path to the input audio file, such as /path/to/audio.wav

--normalized_audio <normalized_audio>¶: path to the output normalized audio file, such as /path/to/normalized-audio.wav

--overwrite, --no-overwrite¶: overwrite existing files, otherwise raises an error

--verbose, --no-verbose¶: verbosity of the script

psifx audio manipulation split¶

Command-line interface for splitting a stereo audio track into two mono tracks.

usage: psifx audio manipulation split [-h] --stereo_audio STEREO_AUDIO
                                      --left_audio LEFT_AUDIO --right_audio
                                      RIGHT_AUDIO
                                      [--overwrite | --no-overwrite]
                                      [--verbose | --no-verbose]

-h, --help¶: show this help message and exit

--stereo_audio <stereo_audio>¶: path to the input stereo audio file, such as /path/to/stereo-audio.wav

--left_audio <left_audio>¶: path to the output left channel mono audio file, such as /path/to/left-audio.wav

--right_audio <right_audio>¶: path to the output right channel mono audio file, such as /path/to/right-audio.wav

--overwrite, --no-overwrite¶: overwrite existing files, otherwise raises an error

--verbose, --no-verbose¶: verbosity of the script
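The two output paths can be derived from the input name so that only the stereo file has to be given. This is a sketch under that assumption; the function name `split_stereo` and the `-left`/`-right` suffixes are hypothetical choices.

```shell
# Hypothetical helper: split a stereo file into <name>-left.wav and <name>-right.wav.
# Usage: split_stereo STEREO_AUDIO
split_stereo() {
    stereo=$1
    base=${stereo%.*}   # input path without its extension
    psifx audio manipulation split \
        --stereo_audio "$stereo" \
        --left_audio "${base}-left.wav" \
        --right_audio "${base}-right.wav"
}
```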

psifx audio manipulation trim¶

Command-line interface for trimming an audio track.

usage: psifx audio manipulation trim [-h] --audio AUDIO --trimmed_audio
                                     TRIMMED_AUDIO [--start_time START_TIME]
                                     [--end_time END_TIME]
                                     [--overwrite | --no-overwrite]
                                     [--verbose | --no-verbose]

-h, --help¶: show this help message and exit

--audio <audio>¶: path to the input audio file, such as /path/to/audio.wav

--trimmed_audio <trimmed_audio>¶: path to the output trimmed audio file, such as /path/to/trimmed-audio.wav

--start_time <start_time>¶: start time in seconds (None to keep from beginning)

--end_time <end_time>¶: end time in seconds (None to keep until end)

--overwrite, --no-overwrite¶: overwrite existing files, otherwise raises an error

--verbose, --no-verbose¶: verbosity of the script
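Trimming combines naturally with the normalization command documented above. The sketch below keeps a time window given in seconds and then normalizes the result; the function name `trim_and_normalize` and the output suffixes are hypothetical.

```shell
# Hypothetical helper: keep the window [START, END] (in seconds), then normalize it.
# Usage: trim_and_normalize AUDIO START END
trim_and_normalize() {
    audio=$1
    base=${audio%.*}   # input path without its extension
    psifx audio manipulation trim \
        --audio "$audio" \
        --trimmed_audio "${base}-trimmed.wav" \
        --start_time "$2" \
        --end_time "$3" &&
    psifx audio manipulation normalization \
        --audio "${base}-trimmed.wav" \
        --normalized_audio "${base}-normalized.wav"
}
```

For example, `trim_and_normalize talk.wav 30 90` would keep the span from 30 s to 90 s of `talk.wav` before normalizing it.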

psifx audio speech¶

Command-line interface for extracting non-verbal speech features from an audio track.

usage: psifx audio speech [-h] [--all-help] {opensmile} ...

-h, --help¶: show this help message and exit

--all-help¶: show help recursively and exit

psifx audio speech opensmile¶

Command-line interface for running OpenSmile.

usage: psifx audio speech opensmile [-h] [--all-help] {inference} ...

-h, --help¶: show this help message and exit

--all-help¶: show help recursively and exit

psifx audio speech opensmile inference¶

Command-line interface for extracting non-verbal speech features from an audio track with OpenSmile.

usage: psifx audio speech opensmile inference [-h] --audio AUDIO --diarization
                                              DIARIZATION --features FEATURES
                                              [--feature_set FEATURE_SET]
                                              [--feature_level FEATURE_LEVEL]
                                              [--overwrite | --no-overwrite]
                                              [--verbose | --no-verbose]

-h, --help¶: show this help message and exit

--audio <audio>¶: path to the input audio file, such as /path/to/audio.wav

--diarization <diarization>¶: path to the input diarization file, such as /path/to/diarization.rttm

--features <features>¶: path to the output feature archive, such as /path/to/opensmile.tar.gz

--feature_set <feature_set>¶: available sets: [‘ComParE_2016’, ‘GeMAPSv01a’, ‘GeMAPSv01b’, ‘eGeMAPSv01a’, ‘eGeMAPSv01b’, ‘eGeMAPSv02’, ‘emobase’]

--feature_level <feature_level>¶: available levels: [‘lld’, ‘lld_de’, ‘func’]

--overwrite, --no-overwrite¶: overwrite existing files, otherwise raises an error

--verbose, --no-verbose¶: verbosity of the script
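To show how the feature set and level are selected, here is a sketch of a wrapper with optional overrides. The function name `extract_features` is hypothetical, and the fallbacks `eGeMAPSv02` and `func` are picked from the lists above for the example only; they are not claimed to be the tool's defaults.

```shell
# Hypothetical wrapper: extract openSMILE features for a diarized recording.
# Usage: extract_features AUDIO RTTM ARCHIVE [FEATURE_SET] [FEATURE_LEVEL]
extract_features() {
    psifx audio speech opensmile inference \
        --audio "$1" \
        --diarization "$2" \
        --features "$3" \
        --feature_set "${4:-eGeMAPSv02}" \
        --feature_level "${5:-func}"   # example fallbacks, see the lead-in
}
```

For example, `extract_features audio.wav diarization.rttm opensmile.tar.gz ComParE_2016 lld` would request low-level descriptors from the ComParE_2016 set.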

psifx audio transcription¶

Command-line interface for transcribing audio tracks.

usage: psifx audio transcription [-h] [--all-help] {whisperx,enhance} ...

-h, --help¶: show this help message and exit

--all-help¶: show help recursively and exit

psifx audio transcription enhance¶

Command-line interface for enhancing a transcription with diarization and identification.

usage: psifx audio transcription enhance [-h] --transcription TRANSCRIPTION
                                         --diarization DIARIZATION
                                         --identification IDENTIFICATION
                                         --enhanced_transcription
                                         ENHANCED_TRANSCRIPTION
                                         [--overwrite | --no-overwrite]
                                         [--verbose | --no-verbose]

-h, --help¶: show this help message and exit

--transcription <transcription>¶: path to the input transcription file, such as /path/to/transcription.vtt

--diarization <diarization>¶: path to the input diarization file, such as /path/to/diarization.rttm

--identification <identification>¶: path to the input identification file, such as /path/to/identification.json

--enhanced_transcription <enhanced_transcription>¶: path to the output transcription file, such as /path/to/enhanced-transcription.vtt

--overwrite, --no-overwrite¶: overwrite existing files, otherwise raises an error

--verbose, --no-verbose¶: verbosity of the script
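This command depends on three files produced by earlier steps, so a wrapper may want to check that they exist before calling it. A minimal sketch; the function name `enhance` and the pre-flight check are illustrative additions, not psifx behavior.

```shell
# Hypothetical wrapper: verify the three inputs exist, then enhance the transcription.
# Usage: enhance TRANSCRIPTION RTTM IDENTIFICATION ENHANCED_TRANSCRIPTION
enhance() {
    for input in "$1" "$2" "$3"; do
        [ -f "$input" ] || { echo "missing input: $input" >&2; return 1; }
    done
    psifx audio transcription enhance \
        --transcription "$1" \
        --diarization "$2" \
        --identification "$3" \
        --enhanced_transcription "$4"
}
```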

psifx audio transcription whisperx¶

Command-line interface for running WhisperX.

usage: psifx audio transcription whisperx [-h] [--all-help] {inference} ...

-h, --help¶: show this help message and exit

--all-help¶: show help recursively and exit

psifx audio transcription whisperx inference¶

Command-line interface for transcribing an audio track with WhisperX.

usage: psifx audio transcription whisperx inference [-h] --audio AUDIO
                                                    --transcription
                                                    TRANSCRIPTION
                                                    [--language LANGUAGE]
                                                    [--model_name MODEL_NAME]
                                                    [--translate_to_english | --no-translate_to_english]
                                                    [--batch_size BATCH_SIZE]
                                                    [--device DEVICE]
                                                    [--overwrite | --no-overwrite]
                                                    [--verbose | --no-verbose]

-h, --help¶: show this help message and exit

--audio <audio>¶: path to the input audio file, such as /path/to/audio.wav

--transcription <transcription>¶: path to the output transcription file, such as /path/to/transcription.vtt

--language <language>¶: language of the audio; if omitted, the model will try to guess it, so specifying it is advised

--model_name <model_name>¶: size of the model to use (tiny, tiny.en, base, base.en, small, small.en, distil-small.en, medium, medium.en, distil-medium.en, large-v1, large-v2, large-v3, large, distil-large-v2, distil-large-v3, large-v3-turbo, or turbo), a path to a converted model directory, or a CTranslate2-converted Whisper model ID from the HF Hub

--translate_to_english, --no-translate_to_english¶: whether to transcribe the audio in its original language or to translate it to English

--batch_size <batch_size>¶: batch size, reduce if low on GPU memory

--device <device>¶: device on which to run the inference, either ‘cpu’ or ‘cuda’

--overwrite, --no-overwrite¶: overwrite existing files, otherwise raises an error

--verbose, --no-verbose¶: verbosity of the script
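The sketch below shows one way to pass `--language` only when it is known and let the model guess otherwise. The function name `transcribe` is hypothetical, the model name `small` is taken from the list above as an example, and the assumption that a language is given as a short code such as `en` is not confirmed by this page.

```shell
# Hypothetical wrapper: transcribe a recording, optionally with a fixed language.
# Usage: transcribe AUDIO VTT [LANGUAGE]
transcribe() {
    audio=$1
    vtt=$2
    language=${3:-}
    # Rebuild the argument list for psifx from the required options.
    set -- --audio "$audio" --transcription "$vtt" --model_name small
    # Append --language only when one was given; otherwise the model guesses it.
    [ -n "$language" ] && set -- "$@" --language "$language"
    psifx audio transcription whisperx inference "$@"
}
```

For example, `transcribe audio.wav transcription.vtt en` would fix the language, while `transcribe audio.wav transcription.vtt` would leave detection to the model. If GPU memory is tight, `--batch_size` can be lowered as described above.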