Psifx: Hands-On Example¶
This document presents a detailed hands-on example showcasing the capabilities of psifx.
Objective¶
The goal of this example is to process a video through the complete psifx pipeline, extracting as much information as possible. The video originates from a real experimental setting featuring a staged discussion between two individuals.
Process Overview¶
We will apply psifx to:
Extract poses and facial features from the video.
Diarize and identify the audio to obtain a full transcript.
Use an LLM to summarize the discussion based on the transcript.
Setup¶
Let’s begin by installing psifx and getting the example video from the git repository.
DOCKER
Install Docker Engine and make sure to follow the post-install instructions. Otherwise, install Docker Desktop.
If you have a GPU and want to use it to accelerate compute:
Install NVIDIA CUDA Toolkit.
Install NVIDIA Container Toolkit.
Clone the repo and navigate to the folder containing the example video.
git clone https://github.com/psifx/psifx.git
cd psifx/example/data
Run the latest image with the example directory mounted.
export DATA_PATH="$(pwd)"
docker run \
    --user $(id -u):$(id -g) \
    --gpus all \
    --mount type=bind,source=$DATA_PATH,target=$DATA_PATH \
    --workdir $DATA_PATH \
    --interactive \
    --tty \
    psifx/psifx:latest
LINUX
For Linux users, install the following system-wide:
sudo apt install ffmpeg ubuntu-restricted-extras \
    build-essential cmake wget \
    libopenblas-dev \
    libopencv-dev \
    libdlib-dev \
    libboost-all-dev \
    libsqlite3-dev
Create a dedicated conda environment:
conda create -y -n psifx-env python=3.11 pip
conda activate psifx-env
Install psifx:
pip install 'psifx @ git+https://github.com/psifx/psifx.git'
Verify your installation with:
psifx
Install OpenFace using our fork:
wget https://raw.githubusercontent.com/GuillaumeRochette/OpenFace/master/install.py && \
python install.py && \
rm install.py
Install Ollama locally. For Linux users, use this command:
curl -fsSL https://ollama.com/install.sh | sh
Clone the repo and navigate to the folder containing the example video.
git clone https://github.com/psifx/psifx.git
cd psifx/example/data
Process video¶
The video features a staged discussion between two individuals. To simplify the workflow, only one point of view is provided.
Pose¶
Detect and analyze human poses using MediaPipe.
psifx video pose mediapipe single-inference \
--video Video.mp4 \
--poses Poses.tar.gz --overwrite
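The resulting Poses.tar.gz is an ordinary gzipped tar archive, so you can peek inside it with Python's standard library. The sketch below is generic and self-contained: the member name and JSON payload are hypothetical stand-ins, since the actual layout of the archive depends on psifx.

```python
import io
import tarfile

# Build a tiny stand-in archive so the sketch is self-contained;
# with real data you would open "Poses.tar.gz" directly.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    payload = b'{"keypoints": []}'              # hypothetical per-frame payload
    info = tarfile.TarInfo(name="000000.json")  # hypothetical member name
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

# List the members of the archive without extracting it to disk.
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    members = tar.getnames()

print(members)  # ['000000.json']
```

With the real file, replace the in-memory buffer with tarfile.open("Poses.tar.gz", "r:gz").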
Create a visual overlay of the poses detected in the video.
psifx video pose mediapipe visualization \
--video Video.mp4 \
--poses Poses.tar.gz \
--visualization VisualizationPoses.mp4 --overwrite
Face¶
Extract facial features from the video using OpenFace.
psifx video face openface single-inference \
--video Video.mp4 \
--features Faces.tar.gz --overwrite
Create a visual overlay of the facial features detected in the video, drawn on top of the pose overlay.
psifx video face openface visualization \
--video VisualizationPoses.mp4 \
--features Faces.tar.gz \
--visualization VisualizationFaces.mp4
Process audio¶
For the purposes of this example, the video has stereo audio: the left and right channels are each fed exclusively by the lavalier microphone of one of the two people. As such, the full ‘audio scene’ is embedded in the single video, and we can demonstrate the full psifx pipeline.
Preprocess audio¶
Extract the stereo audio track from the video (which contains left and right lavalier microphone outputs for this example).
psifx audio manipulation extraction \
--video Video.mp4 \
--audio Audio.wav
Recover the audio from each microphone by splitting the right and left channels.
psifx audio manipulation split \
--stereo_audio Audio.wav \
--left_audio LeftAudio.wav \
--right_audio RightAudio.wav
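psifx does the split for you; purely to illustrate what channel splitting means, here is a standard-library sketch that de-interleaves a 16-bit stereo WAV. In PCM stereo audio the samples alternate L, R, L, R, so taking every other sample recovers each channel. The file here is synthetic, not the example recording.

```python
import io
import struct
import wave

# Create a tiny synthetic stereo file: left channel holds 1000, right holds -1000.
buf = io.BytesIO()
frames = b"".join(struct.pack("<hh", 1000, -1000) for _ in range(4))
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)       # 16-bit PCM
    w.setframerate(16000)
    w.writeframes(frames)
buf.seek(0)

# Read it back and de-interleave: samples alternate L, R, L, R, ...
with wave.open(buf, "rb") as w:
    raw = w.readframes(w.getnframes())
samples = struct.unpack("<" + "h" * (len(raw) // 2), raw)
left, right = samples[0::2], samples[1::2]

print(left, right)  # (1000, 1000, 1000, 1000) (-1000, -1000, -1000, -1000)
```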
Convert the stereo audio to a mono audio track.
psifx audio manipulation conversion \
--audio Audio.wav \
--mono_audio Audio.wav \
--overwrite
Normalize the volume level of each audio file.
psifx audio manipulation normalization \
--audio Audio.wav \
--normalized_audio Audio.wav \
--overwrite
psifx audio manipulation normalization \
--audio LeftAudio.wav \
--normalized_audio LeftAudio.wav \
--overwrite
psifx audio manipulation normalization \
--audio RightAudio.wav \
--normalized_audio RightAudio.wav \
--overwrite
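Normalization brings the three files to a consistent loudness before the downstream steps. As a toy illustration of the idea (psifx's exact normalization method may differ), here is peak normalization: every sample is scaled by the same gain so that the loudest one reaches a target amplitude.

```python
# Toy peak normalization on a list of PCM-style samples.
samples = [100, -400, 250]
peak = max(abs(s) for s in samples)
target = 32000                      # close to the 16-bit limit of 32767
gain = target / peak                # 32000 / 400 = 80
normalized = [round(s * gain) for s in samples]

print(normalized)  # [8000, -32000, 20000]
```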
Speaker Diarization¶
Identify the speech segments of each speaker in the audio file.
psifx audio diarization pyannote inference \
--audio Audio.wav \
--diarization Diarization.rttm
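Diarization.rttm uses the standard RTTM format: space-separated SPEAKER records where the fourth and fifth fields are the segment start and duration in seconds, and the eighth field is the speaker label. A minimal parser (the sample lines below are illustrative, not taken from the example recording):

```python
# Parse RTTM "SPEAKER" records into (speaker, start, duration) tuples.
sample = """\
SPEAKER Audio 1 0.50 3.20 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER Audio 1 3.90 2.10 <NA> <NA> SPEAKER_01 <NA> <NA>
"""

segments = []
for line in sample.splitlines():
    fields = line.split()
    if fields and fields[0] == "SPEAKER":
        start, duration, speaker = float(fields[3]), float(fields[4]), fields[7]
        segments.append((speaker, start, duration))

print(segments)  # [('SPEAKER_00', 0.5, 3.2), ('SPEAKER_01', 3.9, 2.1)]
```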
Diarization Visualization¶
Generate a visual timeline of speaker segments.
psifx audio diarization visualization \
--diarization Diarization.rttm \
--visualization VisualizationDiarization.png
Speaker Identification¶
Associate the speakers detected in the mixed audio file with known audio samples.
psifx audio identification pyannote inference \
--mixed_audio Audio.wav \
--diarization Diarization.rttm \
--mono_audios RightAudio.wav LeftAudio.wav \
--identification Identification.json
Speech Transcription¶
Transcribe speech in the audio file to text.
psifx audio transcription whisperx inference \
--audio Audio.wav \
--transcription Transcription.vtt
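Transcription.vtt is a WebVTT file: a WEBVTT header followed by blank-line-separated cues, each with a "start --> end" timing line and the spoken text. The minimal reader below handles this basic shape; real files may carry extra header metadata or cue identifiers, and the sample text is invented for illustration.

```python
# Extract (start, end, text) cues from a WebVTT string.
sample = """\
WEBVTT

00:00:00.500 --> 00:00:03.700
Hello, how are you?

00:00:04.000 --> 00:00:06.100
I'm doing well, thanks.
"""

cues = []
for block in sample.split("\n\n")[1:]:          # skip the WEBVTT header
    lines = [l for l in block.splitlines() if l.strip()]
    if lines and "-->" in lines[0]:
        start, _, end = lines[0].partition(" --> ")
        cues.append((start, end, " ".join(lines[1:])))

print(cues[0])  # ('00:00:00.500', '00:00:03.700', 'Hello, how are you?')
```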
Enhanced Transcription¶
Enhance transcription with diarization and speaker labels.
psifx audio transcription enhance \
--transcription Transcription.vtt \
--diarization Diarization.rttm \
--identification Identification.json \
--enhanced_transcription TranscriptionEnhanced.vtt
Process text¶
We will use a language model to analyse the content of the transcription.
First, create a YAML file named Instruction.yaml containing:
prompt: |
user: Here is the transcription of a recording: {text}
What are they talking about?
Run the following command to generate the file.
cat <<EOF > Instruction.yaml
prompt: |
user: Here is the transcription of a recording: {text}
What are they talking about?
EOF
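The {text} placeholder in the prompt marks where the transcript is substituted before the prompt is sent to the model. Illustratively (the actual substitution happens inside psifx), this behaves like Python's str.format:

```python
# Illustrate the {text} placeholder substitution with an invented transcript.
template = "Here is the transcription of a recording: {text}\nWhat are they talking about?"
prompt = template.format(text="SPEAKER_00: Hi! SPEAKER_01: Hello.")

print("{text}" in prompt)  # False: the placeholder has been filled in
```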
Use a language model to analyse the content of the transcription according to some instruction.
psifx text instruction \
--instruction Instruction.yaml \
--input TranscriptionEnhanced.vtt \
--output TranscriptionAnalysis.txt