Skip to content

AI Enhancement

Optional Feature - You Can Skip This!

This step is completely optional. Vox works perfectly without AI Enhancement - your transcriptions will be accurate as-is. Only configure this if you want to polish your transcriptions automatically (fix grammar, remove filler words like "um" or "uh").

If you're not sure what this is or don't want to set it up right now, feel free to skip to the next section!

AI Enhancement uses AI (like ChatGPT or Claude) to automatically clean up your transcriptions, making them more polished and professional.

What Does AI Enhancement Do?

In simple terms: It takes your transcription (which is already accurate) and makes it sound more professional.

Example:

  • Before AI: "Um, so basically, like, we need to, uh, schedule a meeting for, you know, next Tuesday"
  • After AI: "We need to schedule a meeting for next Tuesday"

Key Benefits

  • Fixes grammar: Corrects mistakes automatically
  • Removes filler words: Gets rid of "um", "uh", "like", etc.
  • Makes it cleaner: More professional and easier to read
  • Keeps your meaning: Your original message stays the same

Do You Need This?

You might want AI Enhancement if:

  • You use Vox for professional documents or emails
  • You want your transcriptions to look polished without editing
  • You're comfortable using AI services (like ChatGPT)

You DON'T need AI Enhancement if:

  • You're just taking quick notes for yourself
  • You prefer complete privacy (AI Enhancement sends text to external services)
  • You don't mind editing transcriptions manually
  • You're not sure what an "API key" is

Privacy Note

Your audio never leaves your device. Only the text transcription is sent to the AI service you choose. If you want complete privacy, just leave AI Enhancement disabled.

How to Set It Up

Requirements

To use AI Enhancement, you need:

  1. An account with an AI service (like OpenAI, Anthropic, AWS, etc.)
  2. An API key (think of it like a password for the AI service)
  3. A few dollars for AI costs (usually $0.01-0.05 per transcription)

Coming Soon: We're working on a built-in paid option so you won't need to set this up yourself.

AI Enhancement Settings

Step 1: Open AI Enhancement Settings

  1. Open Vox settings
  2. Click on AI Enhancement in the sidebar
  3. Toggle Enhance My Transcriptions with AI to ON

AI Enhancement Enabled

Step 2: Choose Your AI Provider

Select an AI service from the dropdown. Popular options:

  • OpenAI (ChatGPT) - Most popular, easy to use
  • Anthropic (Claude) - Good quality, privacy-focused
  • AWS Bedrock - For advanced users
  • Ollama - Run AI locally (free, but requires setup)

Step 3: Add Your API Key

  1. Get an API key from your chosen provider (see provider sections below)
  2. Paste it into the API Key field
  3. Click Test Connection to make sure it works

First Time?

If you've never used AI APIs before, we recommend starting with OpenAI (ChatGPT). They have clear pricing ($0.002 per request) and good documentation.

Supported Providers

Provider Dropdown

Vox supports multiple AI providers. Click the Provider dropdown to select:

AWS Bedrock

Best for: Production use, enterprise environments, variety of models

AWS Bedrock Configuration

Configuration:

Region

  • Select your AWS region (e.g., eu-west-1)
  • Choose a region close to you for lower latency

AWS Profile

  • Enter your AWS profile name (e.g., sso-bedrock)
  • Uses AWS CLI credentials configured on your system

Access Key ID

  • Optional: Enter your AWS access key
  • Required if not using AWS CLI profiles
  • Format: AKIA... (20 characters)

Secret Access Key

  • Optional: Enter your secret access key
  • Required if not using AWS CLI profiles
  • Stored securely in macOS Keychain / Windows Credential Manager

Model ID

  • Specify the Bedrock model to use
  • Example: global.anthropic.claude-haiku-4-5-20251101-v1:0
  • See AWS Bedrock Models

AWS Setup

AWS Bedrock requires:

  1. An AWS account with Bedrock access
  2. IAM permissions for the selected model
  3. Either AWS CLI configured with SSO or access keys

DeepSeek

Best for: Cost-effective AI enhancement, good performance

DeepSeek Configuration

Configuration:

API Key

  • Get an API key from DeepSeek
  • Stored securely in macOS Keychain / Windows Credential Manager

Model

Endpoint

  • Default: https://api.deepseek.com
  • Change only if using a custom endpoint

Microsoft Foundry

Best for: Azure users, enterprise integration

Configuration:

  • Similar to AWS Bedrock
  • Requires Azure subscription and Foundry access
  • Uses Azure authentication

OpenAI

Best for: High-quality GPT models, simplest setup

Configuration:

  • API Key: Get from OpenAI Platform
  • Model: gpt-4, gpt-4-turbo, gpt-3.5-turbo, etc.
  • Endpoint: Default https://api.openai.com/v1

GLM (Zhipu AI)

Best for: Chinese language transcription, Asia-Pacific users

Configuration:

  • API Key: Get from Zhipu AI
  • Model: glm-4, glm-4-air, etc.

Anthropic

Best for: Claude models with high reasoning ability

Configuration:

  • API Key: Get from Anthropic Console
  • Model: claude-3-5-sonnet-20241022, claude-3-opus-20240229, etc.

LiteLLM

Best for: Advanced users, custom model routing, unified API

Configuration:

Testing Your Connection

Test Connection Success

After configuring your provider:

  1. Click Test Connection
  2. Wait for the test to complete
  3. Look for "Connection successful!" message

If the test fails:

  • Verify your API keys/credentials are correct
  • Check your internet connection
  • Ensure your account has access to the specified model
  • Review error messages for specific issues

Keychain Access

You may need to grant Keychain permission to store API keys securely.

Custom Prompts

Custom Prompt Tab

Customize how AI enhances your transcriptions:

Using Custom Prompts

  1. Click the Custom Prompt tab
  2. Enter your custom instructions
  3. Settings save automatically

Custom Prompt Example

Custom Prompt Example

Example prompt:

turn everything to chinese

This will translate all transcriptions to Chinese.

Default Prompt

If you don't specify a custom prompt, Vox uses a default instruction to:

  • Fix grammar and spelling
  • Remove filler words (um, uh, like, etc.)
  • Preserve technical terms and names
  • Maintain the original meaning

Prompt Tips

Good prompts:

  • "Fix grammar but keep technical jargon"
  • "Remove filler words and format as bullet points"
  • "Translate to Spanish and fix grammar"
  • "Make more formal and professional"

Avoid:

  • Extremely long prompts (may hit token limits)
  • Prompts that change meaning significantly
  • Prompts that add information not in the transcription

Experiment

Test different prompts with the same transcription to find what works best for your use case.

Example Instructions

Example custom instructions shown in the interface:

Example Instructions

Click Example Instructions to see sample prompts:

  • "Add additional instructions on top of Vox's default behavior like grammar, remove filler words, phrases of hesitation, I avoid words to use with the fillwords"
  • Provides guidance on how to structure your custom instructions

Provider-Specific Guides

AWS Bedrock Setup

Prerequisites:

  1. AWS account with Bedrock access
  2. Model access enabled in AWS Console
  3. IAM permissions configured

Using AWS CLI Profile (Recommended):

bash
# Configure AWS CLI with SSO
aws configure sso

# Test your profile
aws bedrock list-foundation-models --profile sso-bedrock

In Vox:

  1. Select AWS Bedrock provider
  2. Enter profile name (e.g., sso-bedrock)
  3. Select region
  4. Enter model ID
  5. Click Test Connection

Using Access Keys:

  1. Create access keys in AWS IAM
  2. Enter Access Key ID and Secret Access Key in Vox
  3. Select region and model
  4. Click Test Connection

Security

Store access keys securely. AWS CLI profiles with SSO are more secure than static access keys.

DeepSeek Setup

Prerequisites:

  1. Account at platform.deepseek.com
  2. API key generated

Setup:

  1. Sign up for DeepSeek
  2. Generate an API key
  3. In Vox, select DeepSeek provider
  4. Enter your API key
  5. Use model deepseek-chat
  6. Click Test Connection

Cost: DeepSeek is cost-effective compared to other providers.

OpenAI Setup

Prerequisites:

  1. OpenAI account
  2. API key with credits

Setup:

  1. Get API key from platform.openai.com/api-keys
  2. In Vox, select OpenAI provider
  3. Enter your API key
  4. Choose model (e.g., gpt-4-turbo, gpt-3.5-turbo)
  5. Click Test Connection

Cost: OpenAI charges per token. GPT-4 is more expensive but higher quality than GPT-3.5.

Cost Considerations

Pricing Overview

AI Enhancement costs vary by provider:

ProviderCost per 1M tokensNotes
DeepSeek~$0.14Most cost-effective
OpenAI GPT-3.5~$0.50Good value
OpenAI GPT-4~$10-30High quality, expensive
AWS Bedrock~$0.25-15Varies by model
Anthropic Claude~$3-15High quality

Estimating Costs

Average transcription: 50-100 tokens Cost per transcription: $0.001-0.01 (depending on provider)

Example usage:

  • 100 transcriptions/day with DeepSeek: ~$0.50/month
  • 100 transcriptions/day with GPT-4: ~$30/month

Save Money

  • Use smaller, cheaper models for simple transcriptions
  • Use GPT-4 or Claude only when you need highest quality
  • DeepSeek offers the best price-to-performance ratio

Future Pricing

Built-in AI Coming Soon

Currently, Vox uses your own API keys for AI enhancement. In the future, we may offer a built-in paid AI model option for convenience.

This would eliminate the need to manage API keys and potentially offer:

  • Simplified setup (no API keys needed)
  • Predictable monthly pricing
  • Integrated billing
  • Optimized models for transcription

Stay tuned for updates!

Best Practices

When to Use AI Enhancement

Use AI Enhancement for:

  • Professional emails and documentation
  • Meeting notes and summaries
  • Content creation and writing
  • Formal communications

Skip AI Enhancement for:

  • Quick personal notes
  • When complete privacy is required
  • Simple, short transcriptions
  • When speed is critical

Choosing a Provider

Choose AWS Bedrock if you:

  • Already use AWS for other services
  • Need enterprise-grade security
  • Want access to multiple model providers
  • Have existing AWS credits

Choose DeepSeek if you:

  • Want the most cost-effective option
  • Need good quality at low cost
  • Transcribe frequently

Choose OpenAI if you:

  • Want the simplest setup
  • Need reliable, high-quality results
  • Already have OpenAI credits

Choose Anthropic if you:

  • Need advanced reasoning and accuracy
  • Work with complex, technical content
  • Want Claude's specific capabilities

Prompt Engineering Tips

  1. Be specific: "Remove filler words and fix grammar" is better than "improve"
  2. Test iterations: Try different prompts to find what works
  3. Combine instructions: "Fix grammar, remove fillers, and format as bullet points"
  4. Consider context: Adjust prompts for different use cases (email vs. code comments)

Troubleshooting

Connection Test Fails

AWS Bedrock:

  • Verify IAM permissions include Bedrock model access
  • Check region matches where model is available
  • Test AWS CLI: aws bedrock list-foundation-models --region <region>
  • Ensure model ID is correct

DeepSeek/OpenAI/Anthropic:

  • Verify API key is valid
  • Check your account has credits/active subscription
  • Ensure endpoint URL is correct
  • Test API key with curl:
    bash
    curl https://api.deepseek.com/v1/models \
      -H "Authorization: Bearer YOUR_API_KEY"

AI Enhancement Takes Too Long

Solutions:

  • Switch to a faster model (e.g., GPT-3.5 instead of GPT-4)
  • Use a provider with lower latency
  • Check your internet connection
  • Reduce custom prompt complexity

Enhanced Text is Wrong

Solutions:

  • Adjust your custom prompt to be more specific
  • Try a different model (larger models are often more accurate)
  • Use a simpler prompt or the default behavior
  • Check that your base transcription is accurate first

API Key Stored Incorrectly

Solution:

  1. Navigate to Settings → AI Enhancement
  2. Re-enter your API key
  3. Grant Keychain access when prompted
  4. Click Test Connection to verify

High API Costs

Solutions:

  • Switch to a cheaper provider (DeepSeek)
  • Use AI Enhancement selectively (disable for quick notes)
  • Monitor usage in your provider's dashboard
  • Consider using smaller models
  • Optimize your custom prompt to reduce output tokens

Security and Privacy

Data Privacy

What is sent to AI providers:

  • Text transcription only (after local Whisper processing)
  • Your custom prompt
  • No audio, no personal info beyond the transcription text

What is NOT sent:

  • Original audio recordings
  • Other transcriptions (each request is independent)
  • Personal information from Vox settings

Secure Storage

  • API Keys: Stored encrypted in macOS Keychain / Windows Credential Manager
  • Credentials: Never transmitted to Vox servers
  • Transcriptions: Can be retained locally (see Audio Retention)

Provider Privacy Policies

Review privacy policies for your chosen provider:

Data Processing

When you enable AI Enhancement, your transcriptions are sent to third-party AI providers. If you work with sensitive information, consider:

  • Using only local transcription (disable AI Enhancement)
  • Choosing providers with strong privacy guarantees
  • Using private deployments (AWS PrivateLink, Azure Private Link)

Advanced Configuration

Custom Endpoints

Some providers allow custom endpoints for:

  • Private deployments
  • On-premises installations
  • Proxy servers
  • Regional optimizations

Enter custom endpoints in the Endpoint field when configuring a provider.

LiteLLM for Advanced Routing

LiteLLM Provider

LiteLLM allows:

  • Unified interface to 100+ LLM providers
  • Automatic fallback and retry
  • Load balancing across multiple providers
  • Cost tracking and budgets

Setup:

  1. Deploy LiteLLM server: https://docs.litellm.ai
  2. Select LiteLLM provider in Vox
  3. Enter your LiteLLM server URL
  4. Configure routing in LiteLLM config

Environment Variables

If you use AWS CLI profiles or environment variables, Vox respects:

  • AWS_PROFILE
  • AWS_REGION
  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY

Next Steps

Built with 💜 by the open-source community & core contributors