praanscribe

praanscribe is a small application for automatically transcribing audio and creating TextGrid files to be used in Praat.

Purpose

Praat uses the ‘TextGrid’ file format to store time-aligned annotations for audio files. These annotations are often transcribed by hand. praanscribe aims to automate this step by transcribing the utterances in an audio file and generating a TextGrid file whose duration matches that of the original audio.

Logic

The program reads an audio file and transcribes it using the Whisper model, generating both word-level and sentence-level transcriptions. It then creates a TextGrid file with 5 tiers: the first three are empty, the fourth contains the word-level transcription, and the fifth contains the sentence-level transcription. The program identifies pauses between words and adjusts the interval boundaries accordingly so that the timestamps stay accurate. The resulting TextGrid file can be opened in Praat for further analysis at both the word and the sentence level.
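Praat interval tiers must cover the whole recording without gaps, so pauses between words have to be represented as empty intervals. The following is a minimal sketch of one way to do that, not praanscribe's actual implementation. It assumes the word timestamps are already available as (start, end, text) tuples sorted by start time; the function name `fill_pauses` and the `min_pause` parameter are hypothetical.

```python
def fill_pauses(words, duration, min_pause=0.0):
    """Turn timestamped words into contiguous intervals covering [0, duration].

    words: list of (start, end, text) tuples, sorted by start time (assumed input).
    Gaps longer than min_pause become empty intervals; shorter gaps are
    absorbed into the following word.
    """
    intervals = []
    cursor = 0.0  # end of the last interval written so far
    for start, end, text in words:
        # Clip the word to the recording and to the previous interval.
        start = min(max(start, cursor), duration)
        end = min(max(end, start), duration)
        if start - cursor > min_pause:
            intervals.append((cursor, start, ""))  # pause before this word
        else:
            start = cursor  # absorb a tiny gap into the word
        if end > start:
            intervals.append((start, end, text))
            cursor = end
    if duration > cursor:
        intervals.append((cursor, duration, ""))  # trailing silence
    return intervals


words = [(0.5, 1.0, "hello"), (1.0, 1.4, "world"), (2.0, 2.5, "again")]
for xmin, xmax, text in fill_pauses(words, duration=3.0):
    print(xmin, xmax, repr(text))
```

Each interval starts exactly where the previous one ends, which is what Praat expects of an IntervalTier.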

Usage

python praanscribe.py <language_code> <audio_file>

<language_code>: Code for the language of the audio, such as ‘en’, ‘tr’, or ‘fr’, or a more specific variant such as ‘en-US’ or ‘tr-TR’.

<audio_file>: Path to the audio file (.wav) you want to transcribe.

The output is a simple TextGrid file in which every word interval has the same duration, and the intervals together span the full length of the audio. This file, along with the audio, can be further edited and analyzed in Praat.
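The equal-length layout can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the code praanscribe ships: the function names `equal_intervals` and `to_textgrid` are hypothetical, and the list of words and the audio duration are assumed to be known already. The TextGrid layout follows the long text format shown in the Example Output section.

```python
def equal_intervals(words, duration):
    """Split [0, duration] into len(words) equal intervals, one per word."""
    if not words:
        return [(0.0, duration, "")]
    step = duration / len(words)
    intervals = []
    for i, word in enumerate(words):
        xmin = i * step
        # Pin the last boundary to the exact duration to avoid float drift.
        xmax = duration if i == len(words) - 1 else (i + 1) * step
        intervals.append((xmin, xmax, word))
    return intervals


def to_textgrid(intervals, duration, tier_name="words"):
    """Render a single IntervalTier in Praat's long text format."""
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {duration}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        "        xmin = 0",
        f"        xmax = {duration}",
        f"        intervals: size = {len(intervals)}",
    ]
    for n, (xmin, xmax, text) in enumerate(intervals, start=1):
        escaped = text.replace('"', '""')  # Praat doubles quotes inside strings
        lines += [
            f"        intervals [{n}]:",
            f"            xmin = {xmin}",
            f"            xmax = {xmax}",
            f'            text = "{escaped}"',
        ]
    return "\n".join(lines) + "\n"


words = ["İstanbul'un", "iklimi", "ılımandır"]
print(to_textgrid(equal_intervals(words, 3.0), 3.0))
```

With 18 words over roughly 11.25 seconds, this even division gives intervals of about 0.625 seconds each, which matches the boundaries in the example output.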

praanscribe uses the SpeechRecognition library to transcribe utterances.

Example Output

Text read by a Turkish speaker:

File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 11.248624999999999
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 11.248624999999999
        intervals: size = 18
        intervals [1]:
            xmin = 0
            xmax = 0.6249236111111112
            text = "İstanbul'un"
        intervals [2]:
            xmin = 0.6249236111111112
            xmax = 1.2498472222222223
            text = "iklimi"
        intervals [3]:
            xmin = 1.2498472222222223
            xmax = 1.8747708333333335
            text = "Türkiye'de"
        intervals [4]:
            xmin = 1.8747708333333335
            xmax = 2.4996944444444447
            text = "Karadeniz"
        intervals [5]:
            xmin = 2.4996944444444447
            xmax = 3.1246180555555556
            text = "iklimi"
        intervals [6]:
            xmin = 3.1246180555555556
            xmax = 3.7495416666666666
            text = "ile"
        intervals [7]:
            xmin = 3.7495416666666666
            xmax = 4.3744652777777775
            text = "Akdeniz"
        intervals [8]:
            xmin = 4.3744652777777775
            xmax = 4.999388888888888
            text = "iklimi"
        intervals [9]:
            xmin = 4.999388888888888
            xmax = 5.624312499999999
            text = "arasında"
        intervals [10]:
            xmin = 5.624312499999999
            xmax = 6.24923611111111
            text = "geçiş"
        intervals [11]:
            xmin = 6.24923611111111
            xmax = 6.874159722222221
            text = "özelliği"
        intervals [12]:
            xmin = 6.874159722222221
            xmax = 7.499083333333332
            text = "gösteren"
        intervals [13]:
            xmin = 7.499083333333332
            xmax = 8.124006944444444
            text = "bir"
        intervals [14]:
            xmin = 8.124006944444444
            xmax = 8.748930555555555
            text = "iklimdir"
        intervals [15]:
            xmin = 8.748930555555555
            xmax = 9.373854166666666
            text = "Dolayısıyla"
        intervals [16]:
            xmin = 9.373854166666666
            xmax = 9.998777777777777
            text = "İstanbul'un"
        intervals [17]:
            xmin = 9.998777777777777
            xmax = 10.623701388888888
            text = "iklimi"
        intervals [18]:
            xmin = 10.623701388888888
            xmax = 11.248624999999999
            text = "ılımandır"

Projects That Use praanscribe

This section is dedicated to projects that have used or are currently using praanscribe for their research and have chosen to be mentioned on this website. I am honored to contribute in even a small way to their work and am grateful for the opportunity to support their research!

Kaya, A. Ç. (2024). Ölçünlü Türkçenin ünlü formant frekansları.

Uzun, İ. P. (2024).
