CLI¶

psifx command-line interface.

usage: psifx [-h] [-v] [--all-help] {audio,video,text} ...

Named Arguments¶

-v, --version: show version info

Sub-commands¶

audio¶

Command-line interface for processing audio tracks.

psifx audio [-h] [--all-help]
            {diarization,identification,manipulation,speech,transcription} ...

Sub-commands¶

diarization¶

Command-line interface for diarizing audio tracks.

psifx audio diarization [-h] [--all-help] {pyannote,visualization} ...

Sub-commands¶

pyannote¶

Command-line interface for running the pyannote diarization tool.

psifx audio diarization pyannote [-h] [--all-help]
                                 {inference,visualization} ...

Sub-commands¶

inference¶

Command-line interface for diarizing an audio track with pyannote.

psifx audio diarization pyannote inference [-h] --audio AUDIO --diarization
                                           DIARIZATION
                                           [--num_speakers NUM_SPEAKERS]
                                           [--model_name MODEL_NAME]
                                           [--api_token API_TOKEN]
                                           [--device DEVICE]
                                           [--overwrite | --no-overwrite]
                                           [--verbose | --no-verbose]

Named Arguments¶

--audio

path to the input audio file, such as /path/to/audio.wav

--diarization

path to the output diarization file, such as /path/to/diarization.rttm

--num_speakers

number of speaking participants; if omitted, the model will try to guess it, so specifying it is advised

--model_name

name of the diarization model to use, see https://huggingface.co/pyannote/speaker-diarization/tree/main/reproducible_research

Default: “pyannote/speaker-diarization@2.1.1”

--api_token

API token for downloading the models from Hugging Face

--device

device on which to run the inference, either ‘cpu’ or ‘cuda’

Default: “cpu”

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True
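
The arguments above can be assembled programmatically when diarizing many files. The following is a minimal sketch, not part of psifx: the helper name `diarization_command` is hypothetical, and the block only builds and prints the argument list rather than running the tool.

```python
import shlex


def diarization_command(audio, diarization, num_speakers=None, device="cpu", overwrite=False):
    """Build the argument list for `psifx audio diarization pyannote inference`."""
    cmd = [
        "psifx", "audio", "diarization", "pyannote", "inference",
        "--audio", audio,
        "--diarization", diarization,
    ]
    if num_speakers is not None:
        # The reference advises giving the speaker count when it is known.
        cmd += ["--num_speakers", str(num_speakers)]
    cmd += ["--device", device]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


cmd = diarization_command("/path/to/audio.wav", "/path/to/diarization.rttm", num_speakers=2)
print(shlex.join(cmd))
# To execute it (requires psifx to be installed): subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell quoting problems with paths that contain spaces.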

visualization¶

Command-line interface for visualizing the diarization of a track.

psifx audio diarization pyannote visualization [-h] --diarization DIARIZATION
                                               --visualization VISUALIZATION
                                               [--overwrite | --no-overwrite]
                                               [--verbose | --no-verbose]

Named Arguments¶

--diarization

path to the input diarization file, such as /path/to/diarization.rttm

--visualization

path to the output visualization file, such as /path/to/visualization.png

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True

visualization¶

Command-line interface for visualizing the diarization of a track.

psifx audio diarization visualization [-h] --diarization DIARIZATION
                                      --visualization VISUALIZATION
                                      [--overwrite | --no-overwrite]
                                      [--verbose | --no-verbose]

Named Arguments¶

--diarization

path to the input diarization file, such as /path/to/diarization.rttm

--visualization

path to the output visualization file, such as /path/to/visualization.png

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True

identification¶

Command-line interface for identifying speakers in audio tracks.

psifx audio identification [-h] [--all-help] {pyannote} ...

Sub-commands¶

pyannote¶

Command-line interface for running the pyannote identification tool.

psifx audio identification pyannote [-h] [--all-help] {inference} ...

Sub-commands¶

inference¶

Command-line interface for identifying speakers from an audio track with pyannote.

psifx audio identification pyannote inference [-h] --mixed_audio MIXED_AUDIO
                                              --diarization DIARIZATION
                                              --mono_audios MONO_AUDIOS
                                              [MONO_AUDIOS ...]
                                              --identification IDENTIFICATION
                                              [--model_names MODEL_NAMES [MODEL_NAMES ...]]
                                              [--api_token API_TOKEN]
                                              [--device DEVICE]
                                              [--overwrite | --no-overwrite]
                                              [--verbose | --no-verbose]

Named Arguments¶

--mixed_audio

path to the input mixed audio file, such as /path/to/mixed-audio.wav

--diarization

path to the input diarization file, such as /path/to/diarization.rttm

--mono_audios

paths to the input mono audio files, such as /path/to/mono-audio-1.wav /path/to/mono-audio-2.wav

--identification

path to the output identification file, such as /path/to/identification.json

--model_names

names of the embedding models

Default: [‘pyannote/embedding’, ‘speechbrain/spkrec-ecapa-voxceleb’]

--api_token

API token for downloading the models from Hugging Face

--device

device on which to run the inference, either ‘cpu’ or ‘cuda’

Default: “cpu”

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True
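
Because `--mono_audios` accepts one or more paths, it is easy to get the argument order wrong when composing the call by hand. A small sketch under stated assumptions: the helper `identification_command` is hypothetical and the paths are the placeholders used in this reference.

```python
import shlex


def identification_command(mixed_audio, diarization, mono_audios, identification):
    """Build the argument list for `psifx audio identification pyannote inference`."""
    if not mono_audios:
        # The usage line shows MONO_AUDIOS [MONO_AUDIOS ...], i.e. at least one path.
        raise ValueError("at least one mono audio file is required")
    return [
        "psifx", "audio", "identification", "pyannote", "inference",
        "--mixed_audio", mixed_audio,
        "--diarization", diarization,
        "--mono_audios", *mono_audios,  # every path follows the single flag
        "--identification", identification,
    ]


cmd = identification_command(
    "/path/to/mixed-audio.wav",
    "/path/to/diarization.rttm",
    ["/path/to/mono-audio-1.wav", "/path/to/mono-audio-2.wav"],
    "/path/to/identification.json",
)
print(shlex.join(cmd))
```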

manipulation¶

Command-line interface for manipulating audio tracks.

psifx audio manipulation [-h] [--all-help]
                         {extraction,conversion,split,mixdown,normalization}
                         ...

Sub-commands¶

extraction¶

Command-line interface for extracting the audio track from a video.

psifx audio manipulation extraction [-h] --video VIDEO --audio AUDIO
                                    [--overwrite | --no-overwrite]
                                    [--verbose | --no-verbose]

Named Arguments¶

--video

path to the input video file, such as /path/to/video.mp4 (or .avi, .mkv, etc.)

--audio

path to the output audio file, such as /path/to/audio.wav

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True

conversion¶

Command-line interface for converting any audio track to a mono audio track at a 16 kHz sample rate.

psifx audio manipulation conversion [-h] --audio AUDIO --mono_audio MONO_AUDIO
                                    [--overwrite | --no-overwrite]
                                    [--verbose | --no-verbose]

Named Arguments¶

--audio

path to the input audio file, such as /path/to/audio.wav (or .mp3, etc.)

--mono_audio

path to the output audio file, such as /path/to/mono-audio.wav

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True
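
Extraction and conversion are often run back to back, with the output of the first feeding the second. The sketch below assumes that workflow. The helper `preprocessing_commands` and the output naming scheme are hypothetical and not something psifx prescribes.

```python
from pathlib import PurePosixPath


def preprocessing_commands(video):
    """Return the extraction and conversion commands for one video, in order."""
    video = PurePosixPath(video)
    audio = video.with_suffix(".wav")                 # hypothetical naming scheme
    mono = audio.with_name(audio.stem + "-mono.wav")  # hypothetical naming scheme
    base = ["psifx", "audio", "manipulation"]
    return [
        base + ["extraction", "--video", str(video), "--audio", str(audio)],
        base + ["conversion", "--audio", str(audio), "--mono_audio", str(mono)],
    ]


commands = preprocessing_commands("/path/to/video.mp4")
for command in commands:
    print(" ".join(command))
```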

split¶

Command-line interface for splitting a stereo audio track into two mono tracks.

psifx audio manipulation split [-h] --stereo_audio STEREO_AUDIO --left_audio
                               LEFT_AUDIO --right_audio RIGHT_AUDIO
                               [--overwrite | --no-overwrite]
                               [--verbose | --no-verbose]

Named Arguments¶

--stereo_audio

path to the input stereo audio file, such as /path/to/stereo-audio.wav

--left_audio

path to the output left channel mono audio file, such as /path/to/left-audio.wav

--right_audio

path to the output right channel mono audio file, such as /path/to/right-audio.wav

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True

mixdown¶

Command-line interface for mixing multiple mono audio tracks.

psifx audio manipulation mixdown [-h] --mono_audios MONO_AUDIOS
                                 [MONO_AUDIOS ...] --mixed_audio MIXED_AUDIO
                                 [--overwrite | --no-overwrite]
                                 [--verbose | --no-verbose]

Named Arguments¶

--mono_audios

paths to the input mono audio files, such as /path/to/mono-audio-1.wav /path/to/mono-audio-2.wav

--mixed_audio

path to the output mixed audio file, such as /path/to/mixed-audio.wav

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True

normalization¶

Command-line interface for normalizing an audio track.

psifx audio manipulation normalization [-h] --audio AUDIO --normalized_audio
                                       NORMALIZED_AUDIO
                                       [--overwrite | --no-overwrite]
                                       [--verbose | --no-verbose]

Named Arguments¶

--audio

path to the input audio file, such as /path/to/audio.wav

--normalized_audio

path to the output normalized audio file, such as /path/to/normalized-audio.wav

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True

speech¶

Command-line interface for extracting non-verbal speech features from an audio track.

psifx audio speech [-h] [--all-help] {opensmile} ...

Sub-commands¶

opensmile¶

Command-line interface for running OpenSmile.

psifx audio speech opensmile [-h] [--all-help] {inference} ...

Sub-commands¶

inference¶

Command-line interface for extracting non-verbal speech features from an audio track with OpenSmile.

psifx audio speech opensmile inference [-h] --audio AUDIO --diarization
                                       DIARIZATION --features FEATURES
                                       [--feature_set FEATURE_SET]
                                       [--feature_level FEATURE_LEVEL]
                                       [--overwrite | --no-overwrite]
                                       [--verbose | --no-verbose]

Named Arguments¶

--audio

path to the input audio file, such as /path/to/audio.wav

--diarization

path to the input diarization file, such as /path/to/diarization.rttm

--features

path to the output feature archive, such as /path/to/opensmile.tar.gz

--feature_set

available sets: [‘ComParE_2016’, ‘GeMAPSv01a’, ‘GeMAPSv01b’, ‘eGeMAPSv01a’, ‘eGeMAPSv01b’, ‘eGeMAPSv02’, ‘emobase’]

Default: “ComParE_2016”

--feature_level

available levels: [‘lld’, ‘lld_de’, ‘func’]

Default: “func”

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True
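
Checking `--feature_set` and `--feature_level` against the values listed above before launching a long run can catch typos early. This is a sketch only: the helper `opensmile_command` is hypothetical, and the validation is a convenience added here, not a description of how psifx itself validates its input.

```python
FEATURE_SETS = (
    "ComParE_2016", "GeMAPSv01a", "GeMAPSv01b",
    "eGeMAPSv01a", "eGeMAPSv01b", "eGeMAPSv02", "emobase",
)
FEATURE_LEVELS = ("lld", "lld_de", "func")


def opensmile_command(audio, diarization, features,
                      feature_set="ComParE_2016", feature_level="func"):
    """Build the argument list for `psifx audio speech opensmile inference`."""
    if feature_set not in FEATURE_SETS:
        raise ValueError(f"unknown feature set: {feature_set!r}")
    if feature_level not in FEATURE_LEVELS:
        raise ValueError(f"unknown feature level: {feature_level!r}")
    return [
        "psifx", "audio", "speech", "opensmile", "inference",
        "--audio", audio,
        "--diarization", diarization,
        "--features", features,
        "--feature_set", feature_set,
        "--feature_level", feature_level,
    ]


cmd = opensmile_command(
    "/path/to/audio.wav", "/path/to/diarization.rttm", "/path/to/opensmile.tar.gz",
    feature_set="eGeMAPSv02", feature_level="lld",
)
print(" ".join(cmd))
```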

transcription¶

Command-line interface for transcribing audio tracks.

psifx audio transcription [-h] [--all-help] {whisper,enhance} ...

Sub-commands¶

whisper¶

Command-line interface for running Whisper.

psifx audio transcription whisper [-h] [--all-help] {inference,enhance} ...

Sub-commands¶

inference¶

Command-line interface for transcribing an audio track with Whisper.

psifx audio transcription whisper inference [-h] --audio AUDIO --transcription
                                            TRANSCRIPTION
                                            [--language LANGUAGE]
                                            [--model_name MODEL_NAME]
                                            [--translate_to_english | --no-translate_to_english]
                                            [--device DEVICE]
                                            [--overwrite | --no-overwrite]
                                            [--verbose | --no-verbose]

Named Arguments¶

--audio

path to the input audio file, such as /path/to/audio.wav

--transcription

path to the output transcription file, such as /path/to/transcription.vtt

--language

language of the audio; if omitted, the model will try to guess it, so specifying it is advised

--model_name

name of the model, check https://github.com/openai/whisper#available-models-and-languages

Default: “small”

--translate_to_english, --no-translate_to_english

whether to transcribe the audio in its original language or to translate it to English (default: False)

Default: False

--device

device on which to run the inference, either ‘cpu’ or ‘cuda’

Default: “cpu”

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True
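
The paired `--translate_to_english` / `--no-translate_to_english` flags map naturally onto a boolean. A minimal sketch, assuming a hypothetical helper `whisper_command`; the language value `"fr"` in the example is also an assumption about what the underlying Whisper model accepts.

```python
def whisper_command(audio, transcription, language=None,
                    model_name="small", translate_to_english=False):
    """Build the argument list for `psifx audio transcription whisper inference`."""
    cmd = [
        "psifx", "audio", "transcription", "whisper", "inference",
        "--audio", audio,
        "--transcription", transcription,
        "--model_name", model_name,
    ]
    if language is not None:
        # Omitting --language lets the model guess, which the reference advises against.
        cmd += ["--language", language]
    # Exactly one flag of the pair is emitted, mirroring the boolean.
    cmd.append("--translate_to_english" if translate_to_english
               else "--no-translate_to_english")
    return cmd


cmd = whisper_command("/path/to/audio.wav", "/path/to/transcription.vtt", language="fr")
print(" ".join(cmd))
```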

enhance¶

Command-line interface for enhancing a transcription with diarization and identification.

psifx audio transcription whisper enhance [-h] --transcription TRANSCRIPTION
                                          --diarization DIARIZATION
                                          --identification IDENTIFICATION
                                          --enhanced_transcription
                                          ENHANCED_TRANSCRIPTION
                                          [--overwrite | --no-overwrite]
                                          [--verbose | --no-verbose]

Named Arguments¶

--transcription

path to the input transcription file, such as /path/to/transcription.vtt

--diarization

path to the input diarization file, such as /path/to/diarization.rttm

--identification

path to the input identification file, such as /path/to/identification.json

--enhanced_transcription

path to the output transcription file, such as /path/to/enhanced-transcription.vtt

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True

enhance¶

Command-line interface for enhancing a transcription with diarization and identification.

psifx audio transcription enhance [-h] --transcription TRANSCRIPTION
                                  --diarization DIARIZATION --identification
                                  IDENTIFICATION --enhanced_transcription
                                  ENHANCED_TRANSCRIPTION
                                  [--overwrite | --no-overwrite]
                                  [--verbose | --no-verbose]

Named Arguments¶

--transcription

path to the input transcription file, such as /path/to/transcription.vtt

--diarization

path to the input diarization file, such as /path/to/diarization.rttm

--identification

path to the input identification file, such as /path/to/identification.json

--enhanced_transcription

path to the output transcription file, such as /path/to/enhanced-transcription.vtt

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True
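
The enhance command consumes files produced by three other commands, so the order of the audio steps matters. The sketch below derives a valid order from the input and output arguments listed in this reference. The step names are shorthand chosen here, and Python 3.9 or later is assumed for `graphlib`.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps whose output files it takes as input,
# as read from the argument lists in this reference.
dependencies = {
    "diarization": set(),
    "transcription": set(),
    "identification": {"diarization"},
    "enhance": {"transcription", "diarization", "identification"},
}

# static_order() yields every step after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Diarization and transcription do not depend on each other, so their relative order may vary between runs of the sorter.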

video¶

Command-line interface for processing videos.

psifx video [-h] [--all-help] {manipulation,pose,face} ...

Sub-commands¶

manipulation¶

Command-line interface for manipulating videos.

psifx video manipulation [-h] [--all-help] {process} ...

Sub-commands¶

process¶

Command-line interface for processing videos. The trimming, cropping and resizing can be performed all at once, and in that order.

psifx video manipulation process [-h] --in_video IN_VIDEO --out_video
                                 OUT_VIDEO [--start START] [--end END]
                                 [--x_min X_MIN] [--y_min Y_MIN]
                                 [--x_max X_MAX] [--y_max Y_MAX]
                                 [--width WIDTH] [--height HEIGHT]
                                 [--overwrite | --no-overwrite]
                                 [--verbose | --no-verbose]

Named Arguments¶

--in_video

path to the input video file, such as /path/to/video.mp4 (or .avi, .mkv, etc.)

--out_video

path to the output video file, such as /path/to/video.mp4 (or .avi, .mkv, etc.)

--start

trim: timestamp in seconds of the start of the selection

--end

trim: timestamp in seconds of the end of the selection

--x_min

crop: x-axis coordinate of the top-left corner in pixels

--y_min

crop: y-axis coordinate of the top-left corner in pixels

--x_max

crop: x-axis coordinate of the bottom-right corner in pixels

--y_max

crop: y-axis coordinate of the bottom-right corner in pixels

--width

resize: width of the resized output

--height

resize: height of the resized output

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True
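
The trim, crop and resize options can be combined in one call. The following sketch builds such a call and adds basic sanity checks on the trim window and crop box. The helper `process_command` and its checks are hypothetical conveniences; they do not describe validation done by psifx.

```python
def process_command(in_video, out_video, start=None, end=None, crop=None, size=None):
    """Build the argument list for `psifx video manipulation process`."""
    cmd = ["psifx", "video", "manipulation", "process",
           "--in_video", in_video, "--out_video", out_video]
    # Trim: timestamps in seconds.
    if start is not None and end is not None and start >= end:
        raise ValueError("start must be earlier than end")
    if start is not None:
        cmd += ["--start", str(start)]
    if end is not None:
        cmd += ["--end", str(end)]
    # Crop: (x_min, y_min) is the top-left corner, (x_max, y_max) the bottom-right.
    if crop is not None:
        x_min, y_min, x_max, y_max = crop
        if x_min >= x_max or y_min >= y_max:
            raise ValueError("crop box must have positive width and height")
        cmd += ["--x_min", str(x_min), "--y_min", str(y_min),
                "--x_max", str(x_max), "--y_max", str(y_max)]
    # Resize: applied to the trimmed and cropped result.
    if size is not None:
        width, height = size
        cmd += ["--width", str(width), "--height", str(height)]
    return cmd


cmd = process_command("/path/to/video.mp4", "/path/to/out.mp4",
                      start=10, end=70, crop=(0, 0, 640, 480), size=(320, 240))
print(" ".join(cmd))
```

The options are emitted in trim, crop, resize order only for readability; the order of named arguments on the command line should not affect the result.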

pose¶

Command-line interface for estimating human poses from videos.

psifx video pose [-h] [--all-help] {mediapipe,visualization} ...

Sub-commands¶

mediapipe¶

Command-line interface for running MediaPipe.

psifx video pose mediapipe [-h] [--all-help] {inference,visualization} ...

Sub-commands¶

inference¶

Command-line interface for inferring human pose with MediaPipe Holistic.

psifx video pose mediapipe inference [-h] --video VIDEO --poses POSES
                                     [--masks MASKS]
                                     [--mask_threshold MASK_THRESHOLD]
                                     [--model_complexity MODEL_COMPLEXITY]
                                     [--smooth | --no-smooth]
                                     [--device DEVICE]
                                     [--overwrite | --no-overwrite]
                                     [--verbose | --no-verbose]

Named Arguments¶

--video

path to the input video file, such as /path/to/video.mp4 (or .avi, .mkv, etc.)

--poses

path to the output pose archive, such as /path/to/poses.tar.gz

--masks

path to the output segmentation mask video file, such as /path/to/masks.mp4 (or .avi, .mkv, etc.)

--mask_threshold

threshold for the binarization of the segmentation mask

Default: 0.1

--model_complexity

complexity of the model, one of {0, 1, 2}; higher values cost more FLOPs but give more accurate results

Default: 2

--smooth, --no-smooth

temporally smooth the inference results to reduce the jitter (default: True)

Default: True

--device

device on which to run the inference, either ‘cpu’ or ‘cuda’

Default: “cpu”

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True
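
To make the effect of `--mask_threshold` concrete, the sketch below binarizes a small grid of per-pixel segmentation scores. It illustrates thresholding in general. Whether psifx compares with `>` or `>=`, and how it stores the mask, are assumptions here.

```python
def binarize(mask, threshold=0.1):
    """Turn per-pixel scores in [0, 1] into a 0/1 mask (strict comparison assumed)."""
    return [[1 if score > threshold else 0 for score in row] for row in mask]


scores = [
    [0.02, 0.08, 0.40],
    [0.10, 0.75, 0.95],
]
binary = binarize(scores)
print(binary)
```

A lower threshold keeps more uncertain pixels in the mask, and a higher one keeps only confident foreground.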

visualization¶

Command-line interface for visualizing the poses over the video.

psifx video pose mediapipe visualization [-h] --video VIDEO --poses POSES
                                         --visualization VISUALIZATION
                                         [--confidence_threshold CONFIDENCE_THRESHOLD]
                                         [--overwrite | --no-overwrite]
                                         [--verbose | --no-verbose]

Named Arguments¶

--video

path to the input video file, such as /path/to/video.mp4 (or .avi, .mkv, etc.)

--poses

path to the input pose archive, such as /path/to/poses.tar.gz

--visualization

path to the output visualization video file, such as /path/to/visualization.mp4 (or .avi, .mkv, etc.)

--confidence_threshold

confidence threshold below which keypoints are not displayed

Default: 0.0

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True

visualization¶

Command-line interface for visualizing the poses over the video.

psifx video pose visualization [-h] --video VIDEO --poses POSES
                               --visualization VISUALIZATION
                               [--confidence_threshold CONFIDENCE_THRESHOLD]
                               [--overwrite | --no-overwrite]
                               [--verbose | --no-verbose]

Named Arguments¶

--video

path to the input video file, such as /path/to/video.mp4 (or .avi, .mkv, etc.)

--poses

path to the input pose archive, such as /path/to/poses.tar.gz

--visualization

path to the output visualization video file, such as /path/to/visualization.mp4 (or .avi, .mkv, etc.)

--confidence_threshold

confidence threshold below which keypoints are not displayed

Default: 0.0

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True

face¶

Command-line interface for estimating face features from videos.

psifx video face [-h] [--all-help] {openface} ...

Sub-commands¶

openface¶

Command-line interface for running OpenFace.

psifx video face openface [-h] [--all-help] {inference,visualization} ...

Sub-commands¶

inference¶

Command-line interface for inferring face features from videos with OpenFace.

psifx video face openface inference [-h] --video VIDEO --features FEATURES
                                    [--overwrite | --no-overwrite]
                                    [--verbose | --no-verbose]

Named Arguments¶

--video

path to the input video file, such as /path/to/video.mp4 (or .avi, .mkv, etc.)

--features

path to the output feature archive, such as /path/to/openface.tar.gz

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True

visualization¶

Command-line interface for visualizing face features from videos with OpenFace.

psifx video face openface visualization [-h] --video VIDEO --features FEATURES
                                        --visualization VISUALIZATION
                                        [--depth DEPTH] [--f_x F_X]
                                        [--f_y F_Y] [--c_x C_X] [--c_y C_Y]
                                        [--overwrite | --no-overwrite]
                                        [--verbose | --no-verbose]

Named Arguments¶

--video

path to the input video file, such as /path/to/video.mp4 (or .avi, .mkv, etc.)

--features

path to the input feature archive, such as /path/to/openface.tar.gz

--visualization

path to the output video file, such as /path/to/visualization.mp4 (or .avi, .mkv, etc.)

--depth

projection: assumed static depth of the subject in meters

Default: 3.0

--f_x

projection: focal length along the x-axis

--f_y

projection: focal length along the y-axis

--c_x

projection: x-coordinate of the principal point

--c_y

projection: y-coordinate of the principal point

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True
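
The projection arguments correspond to the parameters of a standard pinhole camera model. As a sketch of what they mean, and not a statement of how psifx or OpenFace uses them internally, the function below projects a 3D point onto the image plane. The numeric values are illustrative only.

```python
def project(x, y, z, f_x, f_y, c_x, c_y):
    """Pinhole projection of a 3D point (camera coordinates) to pixel coordinates."""
    if z <= 0:
        raise ValueError("the point must lie in front of the camera (z > 0)")
    u = f_x * x / z + c_x
    v = f_y * y / z + c_y
    return u, v


# A point on the optical axis at the default depth of 3.0 projects
# onto the principal point.
u, v = project(0.0, 0.0, 3.0, f_x=1000.0, f_y=1000.0, c_x=320.0, c_y=240.0)
print(u, v)
```

Under this model a larger depth shrinks the projected offsets from the principal point, which is why an assumed static depth is needed when the true distance is unknown.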

text¶

Command-line interface for processing text.

psifx text [-h] [--all-help] {chat,instruction} ...

Sub-commands¶

chat¶

Command-line interface for a chatbot.

psifx text chat [-h] [--overwrite | --no-overwrite] [--verbose | --no-verbose]
                [--prompt PROMPT] [--output OUTPUT]
                [--provider {ollama,hf,openai,anthropic}] [--model MODEL]
                [--model_config MODEL_CONFIG] [--api_key API_KEY]

Named Arguments¶

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True

--prompt

prompt or path to a .txt file containing the prompt

Default: “”

--output

path to a .txt save file

--provider

Possible choices: ollama, hf, openai, anthropic

The large language model provider. Choices are ‘ollama’, ‘hf’, ‘openai’, or ‘anthropic’. Default is ‘ollama’.

--model

The large language model to use. This depends on the provider.

--model_config

Path to the model .yaml configuration file.

--api_key

Corresponding API key for ‘hf’, ‘openai’, or ‘anthropic’.
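
A short sketch of composing a chat call with a checked provider. The helper `chat_command` is hypothetical. The API key is passed only when given, since this reference lists it for ‘hf’, ‘openai’ and ‘anthropic’ but not for ‘ollama’.

```python
PROVIDERS = ("ollama", "hf", "openai", "anthropic")


def chat_command(prompt="", output=None, provider="ollama", model=None, api_key=None):
    """Build the argument list for `psifx text chat`."""
    if provider not in PROVIDERS:
        raise ValueError(f"provider must be one of {PROVIDERS}")
    cmd = ["psifx", "text", "chat", "--provider", provider]
    if prompt:
        # Either the prompt text itself or a path to a .txt file containing it.
        cmd += ["--prompt", prompt]
    if output is not None:
        cmd += ["--output", output]
    if model is not None:
        cmd += ["--model", model]
    if api_key is not None:
        cmd += ["--api_key", api_key]
    return cmd


cmd = chat_command(prompt="/path/to/prompt.txt", output="/path/to/chat.txt")
print(" ".join(cmd))
```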

instruction¶

Command-line interface for custom instructions.

psifx text instruction [-h] [--overwrite | --no-overwrite]
                       [--verbose | --no-verbose] --input INPUT --output
                       OUTPUT [--provider {ollama,hf,openai,anthropic}]
                       [--model MODEL] [--model_config MODEL_CONFIG]
                       [--api_key API_KEY] --instruction INSTRUCTION

Named Arguments¶

--overwrite, --no-overwrite

overwrite existing files, otherwise raises an error (default: False)

Default: False

--verbose, --no-verbose

verbosity of the script (default: True)

Default: True

--input

path to the input .txt, .csv or .vtt file

--output

path to the output .txt or .csv file

--provider

Possible choices: ollama, hf, openai, anthropic

The large language model provider. Choices are ‘ollama’, ‘hf’, ‘openai’, or ‘anthropic’. Default is ‘ollama’.

--model

The large language model to use. This depends on the provider.

--model_config

Path to the model .yaml configuration file.

--api_key

Corresponding API key for ‘hf’, ‘openai’, or ‘anthropic’.

--instruction

Path to a .yaml file containing the prompt and parser.
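
The input and output file types listed above can be checked before a run. This is a sketch under stated assumptions: the helper `instruction_command` is hypothetical, the extension checks are a convenience derived from the argument descriptions, and the content of the instruction .yaml file is not covered here.

```python
from pathlib import PurePosixPath

INPUT_SUFFIXES = {".txt", ".csv", ".vtt"}
OUTPUT_SUFFIXES = {".txt", ".csv"}


def instruction_command(input_path, output_path, instruction):
    """Build the argument list for `psifx text instruction`."""
    if PurePosixPath(input_path).suffix.lower() not in INPUT_SUFFIXES:
        raise ValueError("input must be a .txt, .csv or .vtt file")
    if PurePosixPath(output_path).suffix.lower() not in OUTPUT_SUFFIXES:
        raise ValueError("output must be a .txt or .csv file")
    if PurePosixPath(instruction).suffix.lower() != ".yaml":
        raise ValueError("instruction must be a .yaml file")
    return ["psifx", "text", "instruction",
            "--input", input_path,
            "--output", output_path,
            "--instruction", instruction]


cmd = instruction_command("/path/to/enhanced-transcription.vtt",
                          "/path/to/result.txt",
                          "/path/to/instruction.yaml")
print(" ".join(cmd))
```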