CoWork Transcription

The CoWork Transcription configuration controls real-time speech-to-text for voice and video calls in i-net CoWork. You can use either OpenAI (Whisper or GPT-4o-based models via API) or Vosk (offline, on-premise). Transcription can be always on, manually started and stopped per call, or disabled. This page is for administrators who configure transcription and for users who take part in calls where transcription is used.

Configure the options in the Configuration application under CoWork Calls → Communication → CoWork Transcription.

Transcript activation

The following options control when transcription is available and which engine is used. Set the activation mode and provider first; language, chunk duration, and end-of-call summary apply to both providers.

Transcript Activation: When transcription is available and how it runs.
- Values:
  - Off - Transcription is disabled. No transcript controls are shown in calls.
  - Always On - Transcription starts automatically when a call starts and stops when the call ends. No user action is required.
  - Manual - Transcription is available but does not start automatically. A participant in the call must start it from the call UI and can stop it before the call ends.
- Default value: Off
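
The three modes can be read as a small decision table. The following Python sketch is illustrative only: the names Activation, transcription_available, starts_automatically, and needs_manual_start are hypothetical and are not part of i-net CoWork.

```python
from enum import Enum


class Activation(Enum):
    """Hypothetical mirror of the Transcript Activation values."""
    OFF = "Off"
    ALWAYS_ON = "Always On"
    MANUAL = "Manual"


def transcription_available(mode: Activation) -> bool:
    # Off disables transcription entirely; no controls are shown.
    return mode is not Activation.OFF


def starts_automatically(mode: Activation) -> bool:
    # Only Always On starts transcription without user action.
    return mode is Activation.ALWAYS_ON


def needs_manual_start(mode: Activation) -> bool:
    # In Manual mode a participant has to start transcription from the call UI.
    return mode is Activation.MANUAL
```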

Language (ISO-639-1, empty for auto-detect): Optional two-letter language code, e.g., de or en, for the spoken language. Leave empty to let the provider auto-detect the language. For OpenAI, only certain languages are supported; if you set an unsupported language, no transcript is returned. See OpenAI Speech-to-Text: supported languages.
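
To illustrate the expected shape of this value, here is a minimal Python sketch of a check. The function name normalize_language is hypothetical, and lowercasing the input is an assumption. The check only verifies that the value looks like a two-letter code; it does not know which languages a provider actually supports.

```python
import re
from typing import Optional

_TWO_LETTERS = re.compile(r"[a-z]{2}")


def normalize_language(value: str) -> Optional[str]:
    """Return a lowercase two-letter code, or None for auto-detect."""
    code = value.strip().lower()
    if not code:
        return None  # empty field: let the provider auto-detect
    if not _TWO_LETTERS.fullmatch(code):
        raise ValueError(f"not a two-letter language code: {value!r}")
    return code
```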

Max. Chunk Duration (seconds): Maximum length of an audio segment in seconds before it is sent for transcription. Longer values can improve context but delay the transcript. Must be greater than 0.
- Default value: 30
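
The effect of this limit can be sketched in a few lines of Python. The sketch assumes audio is cut into fixed-size segments of at most the configured length; the actual segmentation in i-net CoWork may differ, for example by ending a segment earlier. The function name chunk_bounds is hypothetical.

```python
from typing import List, Tuple


def chunk_bounds(total_seconds: float, max_chunk: float = 30.0) -> List[Tuple[float, float]]:
    """Split a recording of total_seconds into (start, end) segments."""
    if max_chunk <= 0:
        raise ValueError("Max. Chunk Duration must be greater than 0")
    total = float(total_seconds)
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < total:
        # No segment is longer than max_chunk; the last one may be shorter.
        end = min(start + max_chunk, total)
        bounds.append((start, end))
        start = end
    return bounds
```

With the default of 30 seconds, text for a segment can only be returned after that segment is complete, which is why larger values delay the live transcript.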

Generate Summary at End of Call: When enabled, a short summary of the transcript is generated when the call ends. This uses the OpenAI Chat API and requires a valid OpenAI API Key even if the transcription provider is Vosk.
- Default value: activated

Provider: The speech-to-text engine used for transcription.
- Values:
  - OpenAI - Uses the OpenAI API with the Whisper or GPT-4o transcribe models. Requires an API key and network access.
  - Vosk - Uses a local Vosk model. Requires no API key, but you must download a model and set the model path.
- Default value: OpenAI

OpenAI

OpenAI provides cloud-based speech-to-text via the Whisper and GPT-4o transcribe APIs. The following options are shown when Provider is OpenAI or when Generate Summary at End of Call is enabled. An API key is required for transcription and for the optional end-of-call summary.

OpenAI API Key: Your OpenAI API key. Required for OpenAI transcription and for end-of-call summary. Keep this key secure; it is used only for transcription and summary requests.

Transcription Model: The OpenAI model used for speech-to-text.
- Values:
  - gpt-4o-transcribe - GPT-4o-based transcription.
  - gpt-4o-mini-transcribe - Lighter model for transcription.
  - whisper-1 - Whisper speech recognition model.
- Default value: whisper-1

Transcription Prompt: Optional prompt to guide the model, e.g., style, terminology, or domain terms. Used only for OpenAI transcription. If empty, a default prompt is used; it asks for a full transcription of the conversation, with optional sound descriptions in square brackets.

Note: When using OpenAI, set Language only to a supported language. If you specify an unsupported language, no transcript is returned.
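
For orientation, the settings above correspond to parameters of the public OpenAI transcription endpoint. The following Python sketch shows that mapping. It assumes the openai package (version 1 or later) is installed; the function names build_transcription_params and transcribe are hypothetical. How i-net CoWork issues its requests internally is not described here, and the sketch does not reproduce the product's default prompt.

```python
from typing import Any, Dict, Optional


def build_transcription_params(model: str = "whisper-1",
                               language: Optional[str] = None,
                               prompt: Optional[str] = None) -> Dict[str, Any]:
    """Collect keyword arguments for an OpenAI transcription request."""
    params: Dict[str, Any] = {"model": model}
    if language:
        params["language"] = language  # omitted: the API auto-detects
    if prompt:
        params["prompt"] = prompt      # optional style or terminology hints
    return params


def transcribe(path: str, api_key: str, **options: Any) -> str:
    """Send one audio file to the API and return the recognized text."""
    # Imported lazily so the helper above works without the package installed.
    from openai import OpenAI

    client = OpenAI(api_key=api_key)
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            file=audio, **build_transcription_params(**options)
        )
    return result.text
```

A call such as transcribe("segment.wav", api_key, model="gpt-4o-mini-transcribe", language="de") would send one audio file; the file name here is a placeholder.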

Vosk

Vosk runs entirely on your server and does not require an API key or internet access for transcription. The following option is shown when Provider is Vosk.

Vosk Model Path: Path to the folder that contains the unzipped Vosk language model. The folder must exist on the server and be readable by the application.

Note: For Vosk, you must download a model from the official Vosk models list, unzip the archive into a folder, and set Vosk Model Path to that folder. The link Vosk Models (download) in the configuration opens the list of available models. Choose a model that matches the language spoken in your calls.
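
Before entering the path, it can help to verify on the server that the folder exists, is readable, and is not empty. The following Python sketch is one way to do that. The function name check_model_path is hypothetical, and the checks are assumptions about what "exists and is readable" means in practice; they do not validate the model contents.

```python
import os
from pathlib import Path
from typing import Optional


def check_model_path(path: str) -> Optional[str]:
    """Return an error message, or None if the folder looks usable."""
    folder = Path(path)
    if not folder.is_dir():
        return f"not an existing folder: {path}"
    # Listing a directory needs read and execute (search) permission.
    if not os.access(folder, os.R_OK | os.X_OK):
        return f"folder is not readable: {path}"
    if not any(folder.iterdir()):
        return f"folder is empty: {path}"
    return None
```

Run the check as the same operating system user that runs the server, since permissions are evaluated for the current user.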

What users must do to run transcription

Depending on your role and the chosen activation mode, use the following steps. Administrators configure the options once; participants in a call see the live transcript and, in manual mode, can start or stop it from the call interface.

Administrators

- In the Configuration application, open CoWork Calls → Communication → CoWork Transcription.
- Set Transcript Activation to Always On or Manual.
- For OpenAI: Enter a valid OpenAI API Key and optionally choose Transcription Model and Transcription Prompt. If you enable Generate Summary at End of Call, the API key is required regardless of provider.
- For Vosk: Download a model from the Vosk models page, unzip it to a directory on the server, and set Vosk Model Path to that directory.
- Set Language if you want to fix the language; otherwise leave it empty for auto-detect.
- Save the configuration after changing any option.
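
The dependencies between these options can be summarized as a checklist. The following Python sketch encodes them; the class TranscriptionSettings, its field names, and the function find_problems are hypothetical and do not correspond to actual configuration keys. It also assumes that nothing needs to be checked while Transcript Activation is Off.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TranscriptionSettings:
    """Hypothetical container; field names are not the product's keys."""
    activation: str = "Off"       # "Off", "Always On", or "Manual"
    provider: str = "OpenAI"      # "OpenAI" or "Vosk"
    openai_api_key: str = ""
    vosk_model_path: str = ""
    max_chunk_seconds: int = 30
    summary_at_end: bool = True


def find_problems(s: TranscriptionSettings) -> List[str]:
    """List settings that would keep transcription or summary from working."""
    problems: List[str] = []
    if s.activation == "Off":
        return problems  # assumption: nothing to check when disabled
    if s.max_chunk_seconds <= 0:
        problems.append("Max. Chunk Duration must be greater than 0")
    if s.provider == "OpenAI" and not s.openai_api_key:
        problems.append("OpenAI transcription requires an OpenAI API Key")
    if s.provider == "Vosk" and not s.vosk_model_path:
        problems.append("Vosk requires a Vosk Model Path")
    # The summary uses OpenAI even when Vosk does the transcription.
    if s.summary_at_end and not s.openai_api_key:
        problems.append("End-of-call summary requires an OpenAI API Key")
    return problems
```

The last check is the one most easily missed: with Vosk as provider and the summary left at its default, an OpenAI API key is still needed.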

When Transcript Activation is Always On

No action is required during a call. Transcription starts when the call starts and stops when the call ends. Participants see the live transcript in the call UI.

When Transcript Activation is Manual

- Only participants who are in the call can start or stop transcription.
- Start transcription with the transcription control in the call interface, e.g., a button or menu item, after the call has started.
- To end the transcript early, stop transcription with the same control before you leave the call; otherwise it stops when the call ends.

Summary at end of call

If Generate Summary at End of Call is enabled and an OpenAI API key is set, a summary is generated automatically when the call ends. No extra user action is required.