Transcribing a video and editing the transcript involves a few straightforward steps.
1. First, find the correct URL for the video and enter it into the downloader of your choice. Downloading only the audio is preferable, but whether that option is available depends on the platform. If you end up with a video file, you can optionally extract the audio from it with a local tool before proceeding, which makes the upload easier.
2. The next phase is transcription. I currently use the Gladia platform, which is fast, reliable, and inexpensive: transcription costs only a few cents per hour of video.
3. After transcription, the text is refined using a large language model. This step is crucial because spoken language differs significantly from written text: speakers meander, leave sentences unfinished, and interrupt themselves. Editing improves readability by completing sentences, correcting grammatical errors, and improving clarity. I use Claude for this editing phase.
The prompt I currently use is the following. (The names of the speakers will change each time, of course.)
Edit the attached transcript to improve clarity and readability while maintaining the speakers’ original message and tone. Speaker 0 is Ayan, Speaker 1 is David. Drop the time codes. Focus on the following objectives:
- Identify and preserve the key points and overall message conveyed by each speaker.
- Correct grammatical errors and improve sentence structure for better comprehension.
- Remove redundant or filler words and phrases that do not contribute to the main ideas.
- Maintain the unique voice and tone of each speaker as much as possible.
- Break down lengthy, run-on sentences into shorter, clearer ones.
- Add transitional words and phrases to enhance the flow and coherence of the conversation.
Please ensure that the edited transcript strikes a balance between staying faithful to the original content and improving its clarity and readability. The final result should be a polished, easy-to-follow conversation that effectively communicates the speakers’ intended messages.
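Since the speaker names change with each transcript, the opening line of the prompt can be generated rather than edited by hand. The following is only a minimal sketch: the function names `speaker_sentence` and `prompt_opening` and the `PROMPT_TEMPLATE` constant are hypothetical helpers for illustration, and it assumes the transcription tool numbers speakers from 0 in the order given.

```python
# Hypothetical helper: fills the speaker names into the opening of the prompt.
# Assumes speakers are labeled "Speaker 0", "Speaker 1", ... in the transcript.

PROMPT_TEMPLATE = (
    "Edit the attached transcript to improve clarity and readability while "
    "maintaining the speakers’ original message and tone. {speakers} "
    "Drop the time codes. Focus on the following objectives:"
)


def speaker_sentence(names):
    """Build 'Speaker 0 is A, Speaker 1 is B.' from an ordered list of names."""
    if not names:
        raise ValueError("at least one speaker name is required")
    parts = [f"Speaker {i} is {name}" for i, name in enumerate(names)]
    return ", ".join(parts) + "."


def prompt_opening(names):
    """Return the first paragraph of the prompt with the names filled in."""
    return PROMPT_TEMPLATE.format(speakers=speaker_sentence(names))


if __name__ == "__main__":
    print(prompt_opening(["Ayan", "David"]))
```

The list of objectives and the closing paragraph stay the same from one transcript to the next, so only this opening needs to be regenerated.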
4. The final step is a thorough review of the edited text. AI tools are highly capable, but minor errors can persist, so it’s essential to check the final text carefully. For this review, I use Google Docs, which also makes it easy to repurpose the refined text.
5. If the video is very long, an optional additional step is to summarize the edited transcript, for example by listing its action items or takeaways. This can again be done using ChatGPT, Claude, or another LLM of your choice.
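For the optional audio extraction mentioned in step 1, one possible approach is to call ffmpeg from a small script. This is a sketch under assumptions: the text above does not name a local tool, so ffmpeg is my choice here, and it must already be installed. The helper names `build_ffmpeg_command` and `extract_audio`, the MP3 output, and the mono 64k settings are illustrative, not requirements of any transcription service.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, audio_path=None):
    """Return the ffmpeg argument list that extracts audio from a video file.

    If no output path is given, the video's extension is swapped for .mp3.
    """
    video = Path(video_path)
    audio = Path(audio_path) if audio_path else video.with_suffix(".mp3")
    return [
        "ffmpeg",
        "-i", str(video),  # input file
        "-vn",             # drop the video stream, keep audio only
        "-ac", "1",        # downmix to mono (assumed sufficient for speech)
        "-b:a", "64k",     # modest audio bitrate to keep the upload small
        str(audio),
    ]


def extract_audio(video_path, audio_path=None):
    """Run ffmpeg and return the path of the extracted audio file."""
    cmd = build_ffmpeg_command(video_path, audio_path)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd[-1]
```

Calling `extract_audio("talk.mp4")` would write `talk.mp3` next to the video, and that file can then be uploaded for transcription in place of the full video.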