CLI¶
psifx command-line interface.
usage: psifx [-h] [-v] [--all-help] {audio,video,text} ...
Named Arguments¶
- -v, --version
show version info
Sub-commands¶
audio¶
Command-line interface for processing audio tracks.
psifx audio [-h] [--all-help]
{diarization,identification,manipulation,speech,transcription} ...
Sub-commands¶
diarization¶
Command-line interface for diarizing audio tracks.
psifx audio diarization [-h] [--all-help] {pyannote,visualization} ...
Sub-commands¶
pyannote¶
Command-line interface for running pyannote diarization tool.
psifx audio diarization pyannote [-h] [--all-help]
{inference,visualization} ...
Sub-commands¶
inference¶
Command-line interface for diarizing an audio track with pyannote.
psifx audio diarization pyannote inference [-h] --audio AUDIO --diarization
DIARIZATION
[--num_speakers NUM_SPEAKERS]
[--model_name MODEL_NAME]
[--api_token API_TOKEN]
[--device DEVICE]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --audio
path to the input audio file, such as
/path/to/audio.wav
- --diarization
path to the output diarization file, such as
/path/to/diarization.rttm
- --num_speakers
number of speaking participants; if omitted, the model will try to guess it, so specifying it is advised
- --model_name
name of the diarization model used, see https://huggingface.co/pyannote/speaker-diarization/tree/main/reproducible_research
Default: “pyannote/speaker-diarization@2.1.1”
- --api_token
API token for downloading the models from Hugging Face
- --device
device on which to run the inference, either ‘cpu’ or ‘cuda’
Default: “cpu”
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
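As a minimal sketch of how the arguments above fit together, the following Python snippet assembles the command line as an argument list that could be passed to `subprocess.run`. The helper name `diarization_command` and the file paths are hypothetical; only the flags come from the reference above.

```python
import shlex


def diarization_command(audio, diarization, num_speakers=None, device="cpu"):
    """Build the argv list for `psifx audio diarization pyannote inference`."""
    argv = [
        "psifx", "audio", "diarization", "pyannote", "inference",
        "--audio", audio,
        "--diarization", diarization,
        "--device", device,
    ]
    if num_speakers is not None:
        # The reference advises passing the speaker count when it is known.
        argv += ["--num_speakers", str(num_speakers)]
    return argv


# Hypothetical paths, for illustration only.
argv = diarization_command(
    "/path/to/audio.wav", "/path/to/diarization.rttm", num_speakers=2
)
print(shlex.join(argv))  # shell-ready form of the same command
```

Passing the list to `subprocess.run(argv, check=True)` would execute it, assuming psifx is installed and on the `PATH`.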
visualization¶
Command-line interface for visualizing the diarization of a track.
psifx audio diarization pyannote visualization [-h] --diarization DIARIZATION
--visualization VISUALIZATION
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --diarization
path to the input diarization file, such as
/path/to/diarization.rttm
- --visualization
path to the output visualization file, such as
/path/to/visualization.png
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
visualization¶
Command-line interface for visualizing the diarization of a track.
psifx audio diarization visualization [-h] --diarization DIARIZATION
--visualization VISUALIZATION
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --diarization
path to the input diarization file, such as
/path/to/diarization.rttm
- --visualization
path to the output visualization file, such as
/path/to/visualization.png
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
identification¶
Command-line interface for identifying speakers in audio tracks.
psifx audio identification [-h] [--all-help] {pyannote} ...
Sub-commands¶
pyannote¶
Command-line interface for running pyannote identification tool.
psifx audio identification pyannote [-h] [--all-help] {inference} ...
Sub-commands¶
inference¶
Command-line interface for identifying speakers from an audio track with pyannote.
psifx audio identification pyannote inference [-h] --mixed_audio MIXED_AUDIO
--diarization DIARIZATION
--mono_audios MONO_AUDIOS
[MONO_AUDIOS ...]
--identification IDENTIFICATION
[--model_names MODEL_NAMES [MODEL_NAMES ...]]
[--api_token API_TOKEN]
[--device DEVICE]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --mixed_audio
path to the input mixed audio file, such as
/path/to/mixed-audio.wav
- --diarization
path to the input diarization file, such as
/path/to/diarization.rttm
- --mono_audios
paths to the input mono audio files, such as
/path/to/mono-audio-1.wav /path/to/mono-audio-2.wav
- --identification
path to the output identification file, such as
/path/to/identification.json
- --model_names
names of the embedding models
Default: [‘pyannote/embedding’, ‘speechbrain/spkrec-ecapa-voxceleb’]
- --api_token
API token for downloading the models from Hugging Face
- --device
device on which to run the inference, either ‘cpu’ or ‘cuda’
Default: “cpu”
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
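The usage line shows `--mono_audios MONO_AUDIOS [MONO_AUDIOS ...]`, meaning the flag is given once and followed by one or more paths. A small sketch of building such a command in Python follows; the helper name and paths are hypothetical.

```python
import shlex


def identification_command(mixed_audio, diarization, mono_audios, identification):
    """Build the argv list for `psifx audio identification pyannote inference`."""
    if not mono_audios:
        raise ValueError("--mono_audios needs at least one path")
    return [
        "psifx", "audio", "identification", "pyannote", "inference",
        "--mixed_audio", mixed_audio,
        "--diarization", diarization,
        # One flag, followed by every mono track.
        "--mono_audios", *mono_audios,
        "--identification", identification,
    ]


# Hypothetical paths, for illustration only.
argv = identification_command(
    "/path/to/mixed-audio.wav",
    "/path/to/diarization.rttm",
    ["/path/to/mono-audio-1.wav", "/path/to/mono-audio-2.wav"],
    "/path/to/identification.json",
)
print(shlex.join(argv))
```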
manipulation¶
Command-line interface for manipulating audio tracks.
psifx audio manipulation [-h] [--all-help]
{extraction,conversion,split,mixdown,normalization}
...
Sub-commands¶
extraction¶
Command-line interface for extracting the audio track from a video.
psifx audio manipulation extraction [-h] --video VIDEO --audio AUDIO
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --video
path to the input video file, such as
/path/to/video.mp4
(or .avi, .mkv, etc.)
- --audio
path to the output audio file, such as
/path/to/audio.wav
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
conversion¶
Command-line interface for converting any audio track to a mono audio track at a 16 kHz sample rate.
psifx audio manipulation conversion [-h] --audio AUDIO --mono_audio MONO_AUDIO
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --audio
path to the input audio file, such as
/path/to/audio.wav
(or .mp3, etc.)
- --mono_audio
path to the output audio file, such as
/path/to/mono-audio.wav
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
split¶
Command-line interface for splitting a stereo audio track into two mono tracks.
psifx audio manipulation split [-h] --stereo_audio STEREO_AUDIO --left_audio
LEFT_AUDIO --right_audio RIGHT_AUDIO
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --stereo_audio
path to the input stereo audio file, such as
/path/to/stereo-audio.wav
- --left_audio
path to the output left channel mono audio file, such as
/path/to/left-audio.wav
- --right_audio
path to the output right channel mono audio file, such as
/path/to/right-audio.wav
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
mixdown¶
Command-line interface for mixing multiple mono audio tracks.
psifx audio manipulation mixdown [-h] --mono_audios MONO_AUDIOS
[MONO_AUDIOS ...] --mixed_audio MIXED_AUDIO
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --mono_audios
paths to the input mono audio files, such as
/path/to/mono-audio-1.wav /path/to/mono-audio-2.wav
- --mixed_audio
path to the output mixed audio file, such as
/path/to/mixed-audio.wav
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
normalization¶
Command-line interface for normalizing an audio track.
psifx audio manipulation normalization [-h] --audio AUDIO --normalized_audio
NORMALIZED_AUDIO
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --audio
path to the input audio file, such as
/path/to/audio.wav
- --normalized_audio
path to the output normalized audio file, such as
/path/to/normalized-audio.wav
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
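The manipulation sub-commands can be chained so that each output file feeds the next step. The sketch below builds one possible chain (extraction, then conversion, then normalization). This ordering, the helper name, and the file names are assumptions made for illustration and are not prescribed by the reference.

```python
import shlex


def audio_preparation_commands(video, workdir):
    """Return three argv lists: extraction, conversion, normalization."""
    audio = f"{workdir}/audio.wav"
    mono = f"{workdir}/mono-audio.wav"
    normalized = f"{workdir}/normalized-audio.wav"
    base = ["psifx", "audio", "manipulation"]
    return [
        base + ["extraction", "--video", video, "--audio", audio],
        base + ["conversion", "--audio", audio, "--mono_audio", mono],
        base + ["normalization", "--audio", mono, "--normalized_audio", normalized],
    ]


# Hypothetical paths, for illustration only.
commands = audio_preparation_commands("/path/to/video.mp4", "/path/to")
for argv in commands:
    print(shlex.join(argv))
```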
speech¶
Command-line interface for extracting non-verbal speech features from an audio track.
psifx audio speech [-h] [--all-help] {opensmile} ...
Sub-commands¶
opensmile¶
Command-line interface for running OpenSmile.
psifx audio speech opensmile [-h] [--all-help] {inference} ...
Sub-commands¶
inference¶
Command-line interface for extracting non-verbal speech features from an audio track with OpenSmile.
psifx audio speech opensmile inference [-h] --audio AUDIO --diarization
DIARIZATION --features FEATURES
[--feature_set FEATURE_SET]
[--feature_level FEATURE_LEVEL]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --audio
path to the input audio file, such as
/path/to/audio.wav
- --diarization
path to the input diarization file, such as
/path/to/diarization.rttm
- --features
path to the output feature archive, such as
/path/to/opensmile.tar.gz
- --feature_set
available sets: [‘ComParE_2016’, ‘GeMAPSv01a’, ‘GeMAPSv01b’, ‘eGeMAPSv01a’, ‘eGeMAPSv01b’, ‘eGeMAPSv02’, ‘emobase’]
Default: “ComParE_2016”
- --feature_level
available levels: [‘lld’, ‘lld_de’, ‘func’]
Default: “func”
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
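Since `--feature_set` and `--feature_level` accept only the values listed above, a wrapper script can reject a misspelled value before launching the tool. A minimal sketch, with a hypothetical helper name and hypothetical paths:

```python
import shlex

# Values copied from the reference above.
FEATURE_SETS = (
    "ComParE_2016", "GeMAPSv01a", "GeMAPSv01b",
    "eGeMAPSv01a", "eGeMAPSv01b", "eGeMAPSv02", "emobase",
)
FEATURE_LEVELS = ("lld", "lld_de", "func")


def opensmile_command(audio, diarization, features,
                      feature_set="ComParE_2016", feature_level="func"):
    """Build the argv list for `psifx audio speech opensmile inference`."""
    if feature_set not in FEATURE_SETS:
        raise ValueError(f"unknown feature set: {feature_set!r}")
    if feature_level not in FEATURE_LEVELS:
        raise ValueError(f"unknown feature level: {feature_level!r}")
    return [
        "psifx", "audio", "speech", "opensmile", "inference",
        "--audio", audio,
        "--diarization", diarization,
        "--features", features,
        "--feature_set", feature_set,
        "--feature_level", feature_level,
    ]


# Hypothetical paths, for illustration only.
argv = opensmile_command(
    "/path/to/audio.wav", "/path/to/diarization.rttm", "/path/to/opensmile.tar.gz",
    feature_set="eGeMAPSv02",
)
print(shlex.join(argv))
```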
transcription¶
Command-line interface for transcribing audio tracks.
psifx audio transcription [-h] [--all-help] {whisper,enhance} ...
Sub-commands¶
whisper¶
Command-line interface for running Whisper.
psifx audio transcription whisper [-h] [--all-help] {inference,enhance} ...
Sub-commands¶
inference¶
Command-line interface for transcribing an audio track with Whisper.
psifx audio transcription whisper inference [-h] --audio AUDIO --transcription
TRANSCRIPTION
[--language LANGUAGE]
[--model_name MODEL_NAME]
[--translate_to_english | --no-translate_to_english]
[--device DEVICE]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --audio
path to the input audio file, such as
/path/to/audio.wav
- --transcription
path to the output transcription file, such as
/path/to/transcription.vtt
- --language
language of the audio; if omitted, the model will try to guess it, so specifying it is advised
- --model_name
name of the model, check https://github.com/openai/whisper#available-models-and-languages
Default: “small”
- --translate_to_english, --no-translate_to_english
whether to transcribe the audio in its original language or to translate it to English (default: False)
Default: False
- --device
device on which to run the inference, either ‘cpu’ or ‘cuda’
Default: “cpu”
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
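Boolean options in this reference come in pairs such as `--overwrite | --no-overwrite`. The sketch below maps a Python boolean to the matching flag when assembling a Whisper transcription command. The helper names, the paths, and the language value `"fr"` are assumptions for illustration; the reference does not state which language format is expected.

```python
import shlex


def bool_flag(name, enabled):
    """Return `--name` when enabled, `--no-name` otherwise."""
    return f"--{name}" if enabled else f"--no-{name}"


def whisper_command(audio, transcription, language=None,
                    translate_to_english=False, overwrite=False):
    """Build the argv list for `psifx audio transcription whisper inference`."""
    argv = [
        "psifx", "audio", "transcription", "whisper", "inference",
        "--audio", audio,
        "--transcription", transcription,
    ]
    if language is not None:
        argv += ["--language", language]
    argv.append(bool_flag("translate_to_english", translate_to_english))
    argv.append(bool_flag("overwrite", overwrite))
    return argv


# Hypothetical paths and language value, for illustration only.
argv = whisper_command(
    "/path/to/audio.wav", "/path/to/transcription.vtt",
    language="fr", overwrite=True,
)
print(shlex.join(argv))
```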
enhance¶
Command-line interface for enhancing a transcription with diarization and identification.
psifx audio transcription whisper enhance [-h] --transcription TRANSCRIPTION
--diarization DIARIZATION
--identification IDENTIFICATION
--enhanced_transcription
ENHANCED_TRANSCRIPTION
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --transcription
path to the input transcription file, such as
/path/to/transcription.vtt
- --diarization
path to the input diarization file, such as
/path/to/diarization.rttm
- --identification
path to the input identification file, such as
/path/to/identification.json
- --enhanced_transcription
path to the output transcription file, such as
/path/to/enhanced-transcription.vtt
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
enhance¶
Command-line interface for enhancing a transcription with diarization and identification.
psifx audio transcription enhance [-h] --transcription TRANSCRIPTION
--diarization DIARIZATION --identification
IDENTIFICATION --enhanced_transcription
ENHANCED_TRANSCRIPTION
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --transcription
path to the input transcription file, such as
/path/to/transcription.vtt
- --diarization
path to the input diarization file, such as
/path/to/diarization.rttm
- --identification
path to the input identification file, such as
/path/to/identification.json
- --enhanced_transcription
path to the output transcription file, such as
/path/to/enhanced-transcription.vtt
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
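The `enhance` command is listed both directly under `psifx audio transcription` and under `psifx audio transcription whisper`, with the same arguments. A minimal sketch that builds either form follows; the helper name, the `via_whisper` switch, and the paths are hypothetical.

```python
import shlex


def enhance_command(transcription, diarization, identification,
                    enhanced_transcription, via_whisper=False):
    """Build the argv list for either documented `enhance` entry point."""
    prefix = ["psifx", "audio", "transcription"]
    if via_whisper:
        prefix.append("whisper")
    return prefix + [
        "enhance",
        "--transcription", transcription,
        "--diarization", diarization,
        "--identification", identification,
        "--enhanced_transcription", enhanced_transcription,
    ]


# Hypothetical paths, for illustration only.
paths = (
    "/path/to/transcription.vtt",
    "/path/to/diarization.rttm",
    "/path/to/identification.json",
    "/path/to/enhanced-transcription.vtt",
)
direct = enhance_command(*paths)
nested = enhance_command(*paths, via_whisper=True)
print(shlex.join(direct))
print(shlex.join(nested))
```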
video¶
Command-line interface for processing videos.
psifx video [-h] [--all-help] {manipulation,pose,face} ...
Sub-commands¶
manipulation¶
Command-line interface for manipulating videos.
psifx video manipulation [-h] [--all-help] {process} ...
Sub-commands¶
process¶
Command-line interface for processing videos. The trimming, cropping and resizing can be performed all at once, and in that order.
psifx video manipulation process [-h] --in_video IN_VIDEO --out_video
OUT_VIDEO [--start START] [--end END]
[--x_min X_MIN] [--y_min Y_MIN]
[--x_max X_MAX] [--y_max Y_MAX]
[--width WIDTH] [--height HEIGHT]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --in_video
path to the input video file, such as
/path/to/video.mp4
(or .avi, .mkv, etc.)
- --out_video
path to the output video file, such as
/path/to/video.mp4
(or .avi, .mkv, etc.)
- --start
trim: timestamp in seconds of the start of the selection
- --end
trim: timestamp in seconds of the end of the selection
- --x_min
crop: x-axis coordinate of the top-left corner in pixels
- --y_min
crop: y-axis coordinate of the top-left corner in pixels
- --x_max
crop: x-axis coordinate of the bottom-right corner in pixels
- --y_max
crop: y-axis coordinate of the bottom-right corner in pixels
- --width
resize: width of the resized output
- --height
resize: height of the resized output
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
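Because trimming, cropping and resizing are all optional, a wrapper only needs to pass the flags that apply. The sketch below does that and adds basic sanity checks on the trim window and crop box. Those checks, the helper name, and the example values are assumptions for illustration and are not documented behavior of psifx.

```python
import shlex


def process_command(in_video, out_video, start=None, end=None, crop=None, size=None):
    """Build the argv list for `psifx video manipulation process`.

    crop is (x_min, y_min, x_max, y_max) in pixels; size is (width, height).
    """
    argv = ["psifx", "video", "manipulation", "process",
            "--in_video", in_video, "--out_video", out_video]
    if start is not None and end is not None and end <= start:
        raise ValueError("end must be later than start")
    for name, value in (("start", start), ("end", end)):
        if value is not None:
            argv += [f"--{name}", str(value)]
    if crop is not None:
        x_min, y_min, x_max, y_max = crop
        if x_max <= x_min or y_max <= y_min:
            raise ValueError("crop box must have positive width and height")
        argv += ["--x_min", str(x_min), "--y_min", str(y_min),
                 "--x_max", str(x_max), "--y_max", str(y_max)]
    if size is not None:
        width, height = size
        argv += ["--width", str(width), "--height", str(height)]
    return argv


# Hypothetical paths and values, for illustration only.
argv = process_command(
    "/path/to/video.mp4", "/path/to/processed-video.mp4",
    start=10, end=70, crop=(0, 0, 960, 1080), size=(480, 540),
)
print(shlex.join(argv))
```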
pose¶
Command-line interface for estimating human poses from videos.
psifx video pose [-h] [--all-help] {mediapipe,visualization} ...
Sub-commands¶
mediapipe¶
Command-line interface for running MediaPipe.
psifx video pose mediapipe [-h] [--all-help] {inference,visualization} ...
Sub-commands¶
inference¶
Command-line interface for inferring human pose with MediaPipe Holistic.
psifx video pose mediapipe inference [-h] --video VIDEO --poses POSES
[--masks MASKS]
[--mask_threshold MASK_THRESHOLD]
[--model_complexity MODEL_COMPLEXITY]
[--smooth | --no-smooth]
[--device DEVICE]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --video
path to the input video file, such as
/path/to/video.mp4
(or .avi, .mkv, etc.)
- --poses
path to the output pose archive, such as
/path/to/poses.tar.gz
- --masks
path to the output segmentation mask video file, such as
/path/to/masks.mp4
(or .avi, .mkv, etc.)
- --mask_threshold
threshold for the binarization of the segmentation mask
Default: 0.1
- --model_complexity
complexity of the model: {0, 1, 2}, higher means more FLOPs, but also more accurate results
Default: 2
- --smooth, --no-smooth
temporally smooth the inference results to reduce the jitter (default: True)
Default: True
- --device
device on which to run the inference, either ‘cpu’ or ‘cuda’
Default: “cpu”
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
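As an illustration of the options above, the following sketch builds a MediaPipe inference command, checks that `--model_complexity` is one of the documented values {0, 1, 2}, and passes the mask options only when a mask output is requested. The helper name and paths are hypothetical, and tying `--mask_threshold` to `--masks` is a choice made for this sketch.

```python
import shlex


def mediapipe_command(video, poses, masks=None, mask_threshold=0.1,
                      model_complexity=2, smooth=True):
    """Build the argv list for `psifx video pose mediapipe inference`."""
    if model_complexity not in (0, 1, 2):
        raise ValueError("model_complexity must be 0, 1 or 2")
    argv = [
        "psifx", "video", "pose", "mediapipe", "inference",
        "--video", video,
        "--poses", poses,
        "--model_complexity", str(model_complexity),
        "--smooth" if smooth else "--no-smooth",
    ]
    if masks is not None:
        # Sketch-level choice: only send the threshold along with a mask output.
        argv += ["--masks", masks, "--mask_threshold", str(mask_threshold)]
    return argv


# Hypothetical paths, for illustration only.
argv = mediapipe_command(
    "/path/to/video.mp4", "/path/to/poses.tar.gz",
    masks="/path/to/masks.mp4", model_complexity=1, smooth=False,
)
print(shlex.join(argv))
```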
visualization¶
Command-line interface for visualizing the poses over the video.
psifx video pose mediapipe visualization [-h] --video VIDEO --poses POSES
--visualization VISUALIZATION
[--confidence_threshold CONFIDENCE_THRESHOLD]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --video
path to the input video file, such as
/path/to/video.mp4
(or .avi, .mkv, etc.)
- --poses
path to the input pose archive, such as
/path/to/poses.tar.gz
- --visualization
path to the output visualization video file, such as
/path/to/visualization.mp4
(or .avi, .mkv, etc.)
- --confidence_threshold
threshold for not displaying low confidence keypoints
Default: 0.0
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
visualization¶
Command-line interface for visualizing the poses over the video.
psifx video pose visualization [-h] --video VIDEO --poses POSES
--visualization VISUALIZATION
[--confidence_threshold CONFIDENCE_THRESHOLD]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --video
path to the input video file, such as
/path/to/video.mp4
(or .avi, .mkv, etc.)
- --poses
path to the input pose archive, such as
/path/to/poses.tar.gz
- --visualization
path to the output visualization video file, such as
/path/to/visualization.mp4
(or .avi, .mkv, etc.)
- --confidence_threshold
threshold for not displaying low confidence keypoints
Default: 0.0
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
face¶
Command-line interface for estimating face features from videos.
psifx video face [-h] [--all-help] {openface} ...
Sub-commands¶
openface¶
Command-line interface for running OpenFace.
psifx video face openface [-h] [--all-help] {inference,visualization} ...
Sub-commands¶
inference¶
Command-line interface for inferring face features from videos with OpenFace.
psifx video face openface inference [-h] --video VIDEO --features FEATURES
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --video
path to the input video file, such as
/path/to/video.mp4
(or .avi, .mkv, etc.)
- --features
path to the output feature archive, such as
/path/to/openface.tar.gz
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
visualization¶
Command-line interface for visualizing face features from videos with OpenFace.
psifx video face openface visualization [-h] --video VIDEO --features FEATURES
--visualization VISUALIZATION
[--depth DEPTH] [--f_x F_X]
[--f_y F_Y] [--c_x C_X] [--c_y C_Y]
[--overwrite | --no-overwrite]
[--verbose | --no-verbose]
Named Arguments¶
- --video
path to the input video file, such as
/path/to/video.mp4
(or .avi, .mkv, etc.)
- --features
path to the input feature archive, such as
/path/to/openface.tar.gz
- --visualization
path to the output video file, such as
/path/to/visualization.mp4
(or .avi, .mkv, etc.)
- --depth
projection: assumed static depth of the subject in meters
Default: 3.0
- --f_x
projection: focal length along the x-axis
- --f_y
projection: focal length along the y-axis
- --c_x
projection: x-coordinate of the principal point
- --c_y
projection: y-coordinate of the principal point
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
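The projection arguments (`--depth`, `--f_x`, `--f_y`, `--c_x`, `--c_y`) are the usual parameters of a pinhole camera model. The sketch below shows how such parameters map a 3D point in camera coordinates to pixel coordinates. It illustrates the standard pinhole formula under the assumption that focal lengths and the principal point are given in pixels; it is not taken from the psifx source, and the function name and numeric values are hypothetical.

```python
def project_point(x, y, z, f_x, f_y, c_x, c_y):
    """Project a 3D camera-space point (meters) to pixel coordinates.

    Standard pinhole model: u = f_x * x / z + c_x, v = f_y * y / z + c_y.
    """
    if z <= 0:
        raise ValueError("depth z must be positive")
    u = f_x * x / z + c_x
    v = f_y * y / z + c_y
    return u, v


# Hypothetical values: a point 3.0 m from the camera, matching the default
# --depth, seen by a 1920x1080 camera with a 1500-pixel focal length.
u, v = project_point(0.75, -0.375, 3.0, f_x=1500.0, f_y=1500.0, c_x=960.0, c_y=540.0)
print(u, v)
```

A point on the optical axis (x = y = 0) lands exactly on the principal point, which is a quick way to check the parameters.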
text¶
Command-line interface for processing text.
psifx text [-h] [--all-help] {chat,instruction} ...
Sub-commands¶
chat¶
Command-line interface for a chatbot.
psifx text chat [-h] [--overwrite | --no-overwrite] [--verbose | --no-verbose]
[--prompt PROMPT] [--output OUTPUT]
[--provider {ollama,hf,openai,anthropic}] [--model MODEL]
[--model_config MODEL_CONFIG] [--api_key API_KEY]
Named Arguments¶
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
- --prompt
prompt or path to a .txt file containing the prompt
Default: “”
- --output
path to a .txt save file
- --provider
Possible choices: ollama, hf, openai, anthropic
The large language model provider. Choices are ‘ollama’, ‘hf’, ‘openai’, or ‘anthropic’. Default is ‘ollama’.
- --model
The large language model to use. This depends on the provider.
- --model_config
Path to the model .yaml configuration file.
- --api_key
Corresponding API key for ‘hf’, ‘openai’, or ‘anthropic’.
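A small sketch of assembling a `psifx text chat` command follows, checking the provider against the documented choices and passing `--api_key` only when one is supplied. The helper name, prompt text, output path, and key value are hypothetical placeholders.

```python
import shlex

# Choices copied from the reference above.
PROVIDERS = ("ollama", "hf", "openai", "anthropic")


def chat_command(prompt, output=None, provider="ollama", api_key=None):
    """Build the argv list for `psifx text chat`."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    argv = ["psifx", "text", "chat", "--prompt", prompt, "--provider", provider]
    if output is not None:
        argv += ["--output", output]
    if api_key is not None:
        argv += ["--api_key", api_key]
    return argv


# Hypothetical prompt and output path, for illustration only.
argv = chat_command("Summarize the session.", output="/path/to/chat.txt")
print(shlex.join(argv))
```

In a real script the key would more likely be read from the environment than written into the source file.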
instruction¶
Command-line interface for custom instructions.
psifx text instruction [-h] [--overwrite | --no-overwrite]
[--verbose | --no-verbose] --input INPUT --output
OUTPUT [--provider {ollama,hf,openai,anthropic}]
[--model MODEL] [--model_config MODEL_CONFIG]
[--api_key API_KEY] --instruction INSTRUCTION
Named Arguments¶
- --overwrite, --no-overwrite
overwrite existing files, otherwise raises an error (default: False)
Default: False
- --verbose, --no-verbose
verbosity of the script (default: True)
Default: True
- --input
path to the input .txt, .csv or .vtt file
- --output
path to the output .txt or .csv file
- --provider
Possible choices: ollama, hf, openai, anthropic
The large language model provider. Choices are ‘ollama’, ‘hf’, ‘openai’, or ‘anthropic’. Default is ‘ollama’.
- --model
The large language model to use. This depends on the provider.
- --model_config
Path to the model .yaml configuration file.
- --api_key
Corresponding API key for ‘hf’, ‘openai’, or ‘anthropic’.
- --instruction
Path to a .yaml file containing the prompt and parser.