praanscribe is a small application for automatically transcribing audio and creating TextGrid files to be used in Praat.
Purpose
Praat uses the ‘TextGrid’ file format to store annotations that are time-aligned with an audio file. These annotations are often transcribed by hand. praanscribe aims to automate this step by transcribing the utterances in an audio file and generating a TextGrid file whose duration matches that of the original audio.
Logic
The program reads an audio file and transcribes it using the Whisper model, generating both word-level and sentence-level transcriptions. It then creates a TextGrid file with 5 tiers: the top 3 are empty, the 4th tier contains word-level transcription, and the 5th tier contains sentence-level transcription. The program identifies pauses between words and adjusts the timing accordingly to ensure accurate timestamps. This TextGrid file can be used in Praat for further analysis, allowing for both word and sentence-based investigations of the audio.
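The pause handling described above can be illustrated with a short sketch. It is not taken from the praanscribe source: the function name `fill_pauses`, the `(xmin, xmax, text)` tuple layout, and the `min_pause` threshold are all assumptions made for illustration. The idea is that an interval tier has to cover the whole recording without gaps, so any silence between two timestamped words becomes an interval with empty text.

```python
from typing import List, Tuple

# Hypothetical layout for one interval: (xmin, xmax, text), times in seconds.
Interval = Tuple[float, float, str]


def fill_pauses(words: List[Interval], duration: float,
                min_pause: float = 0.0) -> List[Interval]:
    """Return contiguous intervals covering [0, duration].

    Gaps longer than `min_pause` become empty-text intervals; shorter
    gaps are absorbed by moving the next word's start back to the
    previous boundary.
    """
    intervals: List[Interval] = []
    cursor = 0.0  # end time of the last interval written so far
    for start, end, text in sorted(words):
        start = max(start, cursor)   # clamp overlapping timestamps
        end = min(end, duration)     # never run past the end of the audio
        if end <= start:
            continue                 # nothing left of this word after clamping
        if start - cursor > min_pause:
            intervals.append((cursor, start, ""))  # pause before the word
        else:
            start = cursor           # absorb a negligible gap into the word
        intervals.append((start, end, text))
        cursor = end
    if duration > cursor:
        intervals.append((cursor, duration, ""))   # trailing silence
    return intervals


if __name__ == "__main__":
    timed_words = [(0.5, 1.0, "bir"), (1.0, 1.4, "iklimdir"), (2.0, 2.5, "iklimi")]
    for interval in fill_pauses(timed_words, duration=3.0):
        print(interval)
```

Raising `min_pause` (for example to 0.05 seconds) would stop very short gaps between words from showing up as separate empty intervals; the right value depends on the recording and is left as an open choice here.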
Usage
python praanscribe.py <language_code> <audio_file>
<language_code>: Language code indicating the language of the audio, such as ‘en’, ‘tr’, ‘fr’, or a more specific variant such as ‘en-US’ or ‘tr-TR’.
<audio_file>: Path to the audio file (.wav) you want to transcribe.
The output is a simple TextGrid file in which every word interval has the same length and the intervals together span the full duration of the audio. This file, along with the audio, can be further edited and analyzed in Praat.
praanscribe uses the SpeechRecognition library to transcribe utterances.
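The equal-length layout can be sketched as follows. This is an illustration under stated assumptions rather than the actual praanscribe code: `wav_duration` and `equal_intervals` are hypothetical helper names, and the transcript is assumed to be available as a plain string that is split on whitespace. The duration is read with Python's standard `wave` module, and each word receives a slice of duration / number-of-words seconds.

```python
import wave
from typing import List, Tuple


def wav_duration(path: str) -> float:
    """Return the length of a .wav file in seconds (frames / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def equal_intervals(transcript: str, duration: float) -> List[Tuple[float, float, str]]:
    """Split `transcript` on whitespace and give every word an equal time slice."""
    words = transcript.split()
    if not words:
        return [(0.0, duration, "")]  # no words: one empty interval
    step = duration / len(words)
    # Compute the boundaries once so neighbouring intervals share exact
    # values, and pin the final boundary to `duration` itself.
    bounds = [i * step for i in range(len(words))] + [duration]
    return [(bounds[i], bounds[i + 1], word) for i, word in enumerate(words)]


if __name__ == "__main__":
    for xmin, xmax, text in equal_intervals("bir iklimdir", 2.0):
        print(xmin, xmax, text)
```

Because the boundaries do not follow the actual speech, the intervals usually need to be dragged into place by hand in Praat afterwards.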
Example Output
Text read by a Turkish speaker:
File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0
xmax = 11.248624999999999
tiers? <exists>
size = 1
item []:
item [1]:
class = "IntervalTier"
name = "words"
xmin = 0
xmax = 11.248624999999999
intervals: size = 18
intervals [1]:
xmin = 0
xmax = 0.6249236111111112
text = "İstanbul'un"
intervals [2]:
xmin = 0.6249236111111112
xmax = 1.2498472222222223
text = "iklimi"
intervals [3]:
xmin = 1.2498472222222223
xmax = 1.8747708333333335
text = "Türkiye'de"
intervals [4]:
xmin = 1.8747708333333335
xmax = 2.4996944444444447
text = "Karadeniz"
intervals [5]:
xmin = 2.4996944444444447
xmax = 3.1246180555555556
text = "iklimi"
intervals [6]:
xmin = 3.1246180555555556
xmax = 3.7495416666666666
text = "ile"
intervals [7]:
xmin = 3.7495416666666666
xmax = 4.3744652777777775
text = "Akdeniz"
intervals [8]:
xmin = 4.3744652777777775
xmax = 4.999388888888888
text = "iklimi"
intervals [9]:
xmin = 4.999388888888888
xmax = 5.624312499999999
text = "arasında"
intervals [10]:
xmin = 5.624312499999999
xmax = 6.24923611111111
text = "geçiş"
intervals [11]:
xmin = 6.24923611111111
xmax = 6.874159722222221
text = "özelliği"
intervals [12]:
xmin = 6.874159722222221
xmax = 7.499083333333332
text = "gösteren"
intervals [13]:
xmin = 7.499083333333332
xmax = 8.124006944444444
text = "bir"
intervals [14]:
xmin = 8.124006944444444
xmax = 8.748930555555555
text = "iklimdir"
intervals [15]:
xmin = 8.748930555555555
xmax = 9.373854166666666
text = "Dolayısıyla"
intervals [16]:
xmin = 9.373854166666666
xmax = 9.998777777777777
text = "İstanbul'un"
intervals [17]:
xmin = 9.998777777777777
xmax = 10.623701388888888
text = "iklimi"
intervals [18]:
xmin = 10.623701388888888
xmax = 11.248624999999999
text = "ılımandır"
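A file like the one above could be produced by a small serializer along these lines. This is a minimal sketch, not the author's implementation: the function name `to_textgrid` and the `(xmin, xmax, text)` tuple layout are assumptions. It writes a single interval tier in Praat's long text format, with indentation added for readability, and doubles any double quotes inside the text, which is how Praat text files escape them.

```python
from typing import List, Tuple


def to_textgrid(intervals: List[Tuple[float, float, str]],
                tier_name: str = "words") -> str:
    """Serialize contiguous (xmin, xmax, text) intervals as a one-tier TextGrid."""
    if not intervals:
        raise ValueError("an interval tier needs at least one interval")
    xmin, xmax = intervals[0][0], intervals[-1][1]
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        f"xmin = {xmin}",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        f"        xmin = {xmin}",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for index, (start, end, text) in enumerate(intervals, start=1):
        escaped = text.replace('"', '""')  # quotes are escaped by doubling
        lines += [
            f"        intervals [{index}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{escaped}"',
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    grid = to_textgrid([(0.0, 0.5, "bir"), (0.5, 1.0, "iklimdir")])
    # UTF-8 keeps Turkish characters such as "ı" and "ş" intact.
    with open("example.TextGrid", "w", encoding="utf-8") as handle:
        handle.write(grid)
```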
Projects That Use praanscribe:
This section is dedicated to projects that have used or are currently using praanscribe for their research and have chosen to be mentioned on this website. I am honored to contribute in even a small way to their work and am grateful for the opportunity to support their research!
Kaya, A. Ç. (2024). Ölçünlü Türkçenin ünlü formant frekansları.
Uzun, İ. P. (2024).