🚀 Getting Started¶
This project uses uv for dependency management and pyenv to ensure you're using the correct Python version. Follow the steps below for your operating system to quickly set up the environment and start developing.
1. Install System Prerequisites¶
First, you need to install direnv, pyenv, uv, and ffmpeg.
🍎 macOS¶
On macOS, the easiest way to install these tools is with Homebrew. If you don't have Homebrew, install it first by following the instructions on the Homebrew website.
# Install direnv for managing per-directory environment variables
$ brew install direnv
# Install pyenv for managing Python versions
$ brew install pyenv
# Install uv (the package manager)
$ brew install uv
# Install FFmpeg for audio/video processing
$ brew install ffmpeg
Note: You must complete the pyenv shell configuration steps printed after installation (e.g., adding eval "$(pyenv init -)" to your shell config file, such as ~/.zshrc). direnv likewise needs its hook added (e.g., eval "$(direnv hook zsh)").
Note: You must run "direnv allow" when you first change into the transcribe directory so that direnv can set the environment variables it needs using the .envrc file.
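For reference, the shell additions usually look something like the following sketch for ~/.zshrc (the installers print the exact lines for your setup, so treat this as illustrative):
# ~/.zshrc (sketch): initialise pyenv and hook direnv into the shell
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
eval "$(direnv hook zsh)"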
🐧 Ubuntu (and WSL2)¶
Install the prerequisites using the Advanced Package Tool (apt).
$ sudo apt update
$ sudo apt install -y make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev
# Install pyenv, uv, and ffmpeg
$ curl https://pyenv.run | bash
$ curl -LsSf https://astral.sh/uv/install.sh | bash
$ sudo apt install -y direnv
$ sudo apt install -y ffmpeg
Note: After running the pyenv installation script, you must add the initialization commands (printed to your console) to your shell's configuration file (~/.bashrc or ~/.zshrc) and then restart your terminal. direnv likewise needs its hook added (e.g., eval "$(direnv hook bash)").
Note: You must run "direnv allow" when you first change into the transcribe directory so that direnv can set the environment variables it needs using the .envrc file.
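For reference, the additions to ~/.bashrc usually look something like this sketch (again, the pyenv installer prints the exact lines for your setup):
# ~/.bashrc (sketch): initialise pyenv and hook direnv into the shell
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
eval "$(direnv hook bash)"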
🪟 Windows¶
For a smooth development experience, the recommended approach is to use the Windows Subsystem for Linux (WSL2) and follow the Ubuntu instructions above.
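If you don't already have WSL2 set up, it can usually be installed from an elevated PowerShell prompt, after which you can open the Ubuntu shell and follow the steps above:
# From an elevated PowerShell prompt
> wsl --install -d Ubuntu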
2. Project Setup¶
The project uses a Makefile to simplify setup. Once your prerequisites are installed, you can set up the entire environment with a single command.
Step 2.1: Install Python and Create the Environment¶
Run the following command from the project root:
$ make install
This will automatically:
1. Use pyenv to install the required Python version.
2. Use uv to create the virtual environment (.venv).
3. Use uv to install all project dependencies.
4. Install the pre-commit hooks.
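Afterwards, you can sanity-check the toolchain with a few version queries (exact output will vary):
$ pyenv version
$ uv --version
$ ffmpeg -version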
Step 2.2: Activate the Environment¶
You can manually activate the virtual environment every time you open a new terminal to work on the project; however, direnv should do this for you automatically based on our .envrc file.
Here are the commands to activate your virtual Python environment manually.
# For macOS/Ubuntu/WSL
$ source .venv/bin/activate
# For native Windows (cmd or PowerShell)
# .venv\Scripts\activate
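For reference, a minimal .envrc that lets direnv activate the environment for you might look like this (a sketch; the project's actual .envrc may differ):
# .envrc (sketch): activate the project's virtual environment on cd
source .venv/bin/activate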
3. Make Commands¶
The project uses make targets to standardize common tasks. Ensure your virtual environment is active before running these commands.
| Command | Description |
|---|---|
| make install | Setup: Install the virtual environment, dependencies, and pre-commit hooks. |
| make check | Quality: Run code quality tools (linters/formatters, dependency checks, and type-checking). |
| make test | Quality: Run the automated tests using pytest. |
| make docs-test | Quality: Test that the documentation builds without warnings or errors. |
| make build | Distribution: Build the distributable wheel (.whl) and source archive (.tar.gz) files in the dist/ directory. |
| make clean-build | Cleanup: Remove build artifacts from the dist/ directory. |
| make docs | Documentation: Build the documentation and serve it locally (usually at http://127.0.0.1:8000). |
| make man | Documentation: Display the transcribe script's --help output in the form of a man page. |
| make transcribe | Execution: Interactively run the main application to transcribe video files to SRT subtitle files. |
| make help | Help: Display this help message with descriptions for all targets. |
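For example, a typical sanity check before pushing changes chains the quality targets:
$ make check && make test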
3.1. Reading the Command "man" Page¶
To see the options you can pass to the transcribe.py script, simply run:
$ make man
This will display the following "man" page:
usage: transcribe.py [-h] [--dry-run] [--include [INCLUDE ...]] [--exclude [EXCLUDE ...]] [--force] [--input-path INPUT_PATH] [--suffix SUFFIX] [--model {tiny.en,base.en,small.en,medium.en}] [--interactive] [--version]
Transcribe audio files using a pre-trained model.
options:
-h, --help show this help message and exit
--dry-run, -n Try a dry run without any actual transcription.
--include [INCLUDE ...]
A list of files or rglob patterns to include when processing. Defaults to **/*.mp4.
--exclude [EXCLUDE ...]
A list of files or rglob patterns to exclude from processing (overrides the include list).
--force Force overwrite of existing output SRT files.
--input-path INPUT_PATH
Directory containing input audio files (required in non-interactive mode).
--suffix SUFFIX Suffix of audio files to process (default: .mp4).
--model {tiny.en,base.en,small.en,medium.en}
Pre-trained model to use (default: base.en, available ['tiny.en', 'base.en', 'small.en', 'medium.en']).
--interactive Run in interactive mode, prompting for missing arguments.
--version, -v Show program's version number and exit.
Giving no arguments causes the command to run in "interactive" mode, prompting you for any information it needs.
Supplying arguments on the command line gives you more control.
You can also override interactive mode's defaults by combining the --interactive option with the command-line options you want to override.
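For example, a fully non-interactive run might look like this (the path is illustrative):
$ uv run python src/transcriber/transcribe.py \
    --input-path ~/projects/Bonsai_Tutorials \
    --model medium.en \
    --force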
3.2. Running the Transcriber using "make"¶
To run the main functionality of the project, use the dedicated "transcribe" target:
$ make transcribe
This will display the current transcription defaults and prompt you interactively for any defaults you want to change. See the next section for more details and an example.
3.2.1. Interacting with the Transcriber¶
The Makefile transcribe target actually calls our script with the following arguments; the uv run part ensures that we are running under the correct Python environment:
transcribe:
@uv run python src/transcriber/transcribe.py --interactive --exclude \
"_Model/sheets/jpgs/output.mp4" \
"_Model/OD_Textures/Open Source/AmbientCG/space-generation-success.mp4" \
"_Model/OD_Textures/Open Source/AmbientCG/space-generation-fail.mp4" \
"_Model/Animation/final video.mp4"
This Makefile target causes the script to run interactively but also excludes some of the mp4 videos,
typically found in the Bonsai_Tutorials directory, which have no audio to transcribe.
In interactive mode the script displays the current settings and then gives you the opportunity to override some of these defaults (the non-interactive mode lets you override anything with command-line options).
Here is my attempt to use the Makefile to run our script and tweak it to use some non-defaults, i.e.:
- my preferred path (~/projects/Bonsai_Tutorials)
- the largest Whisper model available (medium.en instead of the default model, base.en)
Okay, let's run make transcribe:
$ make transcribe
Entering interactive mode. Please provide the required information.
Enter the directory with videos (default: .):
The first thing the script needs to know is where the video files live.
We do not want the default (the current working directory), so we'll enter the location of our videos, ~/projects/Bonsai_Tutorials:
The script then shows the current default settings and asks whether you want to override any of them:
Current settings for transcribe version 1.0.0:
Suffix: .mp4
Model: base.en
Force overwrite: No
Dry run: No
Excluded patterns: (_Model/sheets/jpgs/output.mp4, _Model/OD_Textures/Open Source/AmbientCG/space-generation-success.srt, _Model/OD_Textures/Open Source/AmbientCG/space-generation-fail.srt, _Model/Animation/final video.srt)
Include patterns: (None)
You will now be prompted for any changes to these settings.
Enter suffix to process (or press Enter to keep '.mp4'):
We'll keep the default suffix by hitting the Enter (or Return) key.
Next, we are prompted for the Whisper model we want to use. The default (base.en) gives great results, but we will go with the largest model instead (medium.en). To be honest, the default model's results are pretty similar.
We'll take the default for "Force overwrite" (No); this stops the script from overwriting any existing .srt files.
We'll also take the default for "Enable dry run mode" (No); this means the script will actually perform the transcription instead of just printing what it would have done.
Enter model to use (or press Enter to keep 'base.en', available tiny.en, base.en, small.en, medium.en): medium.en
Force overwrite of existing SRT files? (y/N, default: N):
Enable dry run mode? (y/N, default: N):
Confirm settings for transcribe version 1.0.0:
Suffix: .mp4
Model: medium.en
Force overwrite: No
Dry run: No
Excluded patterns: (_Model/sheets/jpgs/output.mp4, _Model/OD_Textures/Open Source/AmbientCG/space-generation-success.srt, _Model/OD_Textures/Open Source/AmbientCG/space-generation-fail.srt, _Model/Animation/final video.srt)
Include patterns: (None)
Hit Enter to continue, or Ctrl-C to abort.
The script waits for you to confirm by hitting the Enter (or Return) key, or to cancel by hitting Ctrl-C.
We hit Enter and, as the CPU or GPU starts to glow white hot, we eventually get SRT subtitle text files as siblings to all the .mp4 files recursively found in our Bonsai_Tutorials directory.
You should see quite a bit of output as the script processes each video.
Example of How I Use the SRT Files¶
Now that I have all the transcriptions available as SRT subtitle text files (.srt), I can use any text-searching tool I want to find a critical explanation of the topic I'm interested in:
$ find Bonsai_Tutorials -name \*.srt -exec grep -i 'profiles' {} \; -print
Go down to our profiles
Bonsai_Tutorials/077000_20250303_1601 - Working with Arrays/077000_20250303_1601 - Working with Arrays.srt
And I'm going to create a custom profile from that. So go down to our profiles and click on this object to pick it up.
Bonsai_Tutorials/077000_20250303_1601 - Working with Arrays/077000_20250303_1601 - Working with Arrays.base.srt
also you can purge unused profiles and sorry unused types as well but I'm not
Bonsai_Tutorials/113000_20250418_1626 - Purging unused materials and styles from the file/113000_20250418_1626 - Purging unused materials and styles from the file.srt
And there's also you can purge unused profiles and sorry unused types as well.
Bonsai_Tutorials/113000_20250418_1626 - Purging unused materials and styles from the file/113000_20250418_1626 - Purging unused materials and styles from the file.base.srt
To those profiles and layer sets.
Bonsai_Tutorials/093000_20250312_1635 - Annotation tag types/093000_20250312_1635 - Annotation tag types.base.srt
material was already signed to that to those profiles and layer sets.
Bonsai_Tutorials/093000_20250312_1635 - Annotation tag types/093000_20250312_1635 - Annotation tag types.srt
Click on this dropdown, you can see all the different types of profiles that IFC offers.
Bonsai_Tutorials/069000_20250226_1738 - Adding strip footings/069000_20250226_1738 - Adding strip footings.base.srt
Click on this drop down you can see all the different types of profiles that IFC offers
Bonsai_Tutorials/069000_20250226_1738 - Adding strip footings/069000_20250226_1738 - Adding strip footings.srt
You can pull in, you know, I've seen materials, profiles, and types.
Bonsai_Tutorials/080000_20250304_1723 - pulling in content or assets from other files/080000_20250304_1723 - pulling in content or assets from other files.srt
And you can pull in, I've seen materials, profiles and types.
Bonsai_Tutorials/080000_20250304_1723 - pulling in content or assets from other files/080000_20250304_1723 - pulling in content or assets from other files.base.srt
We're going to go to profiles and we're going to use this arbitrary, closed profile
Bonsai_Tutorials/070000_20250227_0930 - Thickened edge with custom profile/070000_20250227_0930 - Thickened edge with custom profile.base.srt
We're going to go to profiles and we're going to use this arbitrary closed profile def.
Bonsai_Tutorials/070000_20250227_0930 - Thickened edge with custom profile/070000_20250227_0930 - Thickened edge with custom profile.srt
The cabinets here are kind of just generic, massing cabinets. They're actually extruded profiles
Bonsai_Tutorials/124000_20250522_1549 - Intro to Git and creating a floor outline with surrounding walls/124000_20250522_1549 - Intro to Git and creating a floor outline with surrounding walls.base.srt
They're actually extruded profiles if I tab into them.
Bonsai_Tutorials/124000_20250522_1549 - Intro to Git and creating a floor outline with surrounding walls/124000_20250522_1549 - Intro to Git and creating a floor outline with surrounding walls.srt
Armed with these clues, I can then use vlc, or some equivalent video player, to find the exact section of the video I need to watch.
You can also use the timings in the SRT file to find the moment the phrase occurred.
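Since SRT files are plain text, numbered cues each carrying a timestamp line followed by the spoken text, e.g. (timestamps illustrative):
42
00:03:17,000 --> 00:03:20,500
Go down to our profiles
you can ask grep for a couple of lines of leading context so the timestamp comes out with each match:
$ find Bonsai_Tutorials -name \*.srt -exec grep -i -B 2 'profiles' {} \; -print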
Any advice for improvement is much appreciated...
AtDhVaAnNkCsE
Doug Scoular dscoular@gmail.com 2025/10/23