How to Mix Audio for YouTube Videos

Audio quality often decides whether a viewer stays with a YouTube video or clicks away. The visual bar on the platform has risen, but the audio bar has risen further: viewers tolerate an imperfect picture, but not boomy room reverb, plosive thumps, or wildly inconsistent loudness between dialogue and music. This guide walks through the practical signal chain (capture, edit, mix, and master) that takes a typical talking-head, vlog, or tutorial channel to a clean final mix delivered at the level YouTube’s loudness normalization targets.

The order below is the order to actually do the work in: get the recording right, then edit, then EQ and dynamics, then loudness. Skipping back is fine; skipping forward (mastering a recording with a 60 Hz hum still in it, for instance) is not.

How We Choose Our Picks

Studio Supplies is an editorial affiliate publication. We do not operate a hands-on testing lab. Our recommendations are based on:

  • Aggregated guidance from Tier-1 pro-audio publications including Sound on Sound, Production Expert, MusicTech, and Tape Op
  • Authoritative technical specifications from the ITU-R, EBU, AES, and the platforms themselves (YouTube Help)
  • Verified manufacturer documentation for named tools (FabFilter, Waves, iZotope, Sennheiser, OBS)
  • Editorial judgment on what applies to a typical YouTube creator workflow

See full methodology at /pages/methodology. All cited sources are listed at the end of this article.

What You’ll Need

  • A directional microphone appropriate for your recording position — a wireless lavalier or handheld for on-camera presenters, a large-diaphragm condenser or dynamic for fixed studio positions.
  • An audio interface or in-camera input with usable preamps. The interface is the analog-to-digital boundary; everything downstream is constrained by what arrives here.
  • Closed-back monitoring headphones for editing and mix decisions. Open-back headphones leak sound into nearby microphones and let room noise color what you hear.
  • A pop filter (mesh or foam windscreen) for any close-mic vocal position.
  • A digital audio workstation (DAW) or NLE with serviceable audio tools — DaVinci Resolve’s Fairlight page, Adobe Audition, Logic Pro, Reaper, and Premiere Pro’s Essential Sound panel are all viable. Free options (Resolve, Reaper’s evaluation) cover everything in this guide.

Recommended capture options

Sennheiser XSW 1 Wireless Microphone System

For on-camera presenters, vloggers, or anyone who needs to move while talking, a wireless lavalier removes the cable as a creative constraint and keeps the mic close to the mouth, which is the single biggest determinant of dialogue clarity. The XSW 1 series is Sennheiser’s entry-level pro wireless line; Sennheiser documents the XSW lavalier system on the product page at sennheiser.com (XSW 1 ME2 product page). For seated talking-head work, pair it with a desk stand and treat it as a fixed position; for walk-and-talk, clip the lavalier 6–8 inches below the chin.

Step 1: Set Up the Recording Environment

The cheapest and most effective audio improvement is a quieter, less reflective room. Soft furnishings absorb high-frequency reflections; a bookshelf full of mixed-height books diffuses mid-range; a heavy rug under the speaking position kills the slap echo off a hard floor. Sound on Sound’s long-running “Studio SOS” column has documented dozens of home rooms transformed by exactly this kind of furniture-grade treatment before any commercial panels are bought (Sound on Sound — Sound Advice / Studio SOS).

For close-mic dialogue, position the microphone 6–8 inches from the speaker’s mouth, slightly off-axis (aimed at the corner of the mouth rather than dead-center) to reduce plosive air hitting the capsule. Use a pop filter for any closer position. Keep the mic above the mouth angled down, or below the mouth angled up, so the capsule still points at the voice but sits out of the direct breath stream. With a directional mic, that placement also lets you turn its least sensitive side (the rear null of a cardioid) toward the noisiest part of the room, which pulls the voice forward.

Step 2: Record Clean Audio

Record at 48 kHz, 24-bit. 48 kHz is the standard for video production (matching most cameras and broadcast delivery specs); 24-bit gives you a theoretical dynamic range on the order of 144 dB to work with in editing and means you can record conservatively low without losing usable resolution. Sound on Sound’s primer on digital recording describes the 24-bit headroom and noise-floor advantages that make 48/24 the de facto capture format for video work (Sound on Sound — “Digital Problems, Digital Solutions”).
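
The 144 dB figure follows from the roughly 6.02 dB that each added bit contributes to the ratio between full scale and one quantization step. A minimal Python sketch of that arithmetic is below; the function name is illustrative, and the result is a theoretical ceiling that real converters and preamps do not reach.

```python
import math

def theoretical_dynamic_range_db(bit_depth: int) -> float:
    """Ratio between full scale and one quantization step, expressed in dB."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (16, 24):
    # Prints "16-bit: 96.3 dB" and "24-bit: 144.5 dB"
    print(f"{bits}-bit: {theoretical_dynamic_range_db(bits):.1f} dB")
```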

Set input gain so that loud peaks of normal speech land between −18 and −12 dBFS on the meter, leaving at least 6 dB of headroom below 0 dBFS for the occasional laugh or shout. Recording too hot leaves no room for unexpected loud moments and forces a destructive limiter on the way in; recording too cold pushes the voice toward the noise floor of the preamp and converter and forces aggressive boosting in the mix. Sound on Sound’s gain-staging coverage repeatedly recommends targets in this window for 24-bit digital tracking (Sound on Sound — “Gain Staging Your DAW Software”).
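
As a rough illustration of that target window, the sketch below converts a take's loudest sample to dBFS and classifies it. It assumes floating-point samples where 1.0 is full scale; the function names and messages are hypothetical, and a hardware or DAW meter does the same job in practice.

```python
import math

def peak_dbfs(samples):
    """Peak level of float samples (full scale = 1.0) in dBFS."""
    peak = max((abs(s) for s in samples), default=0.0)
    return -math.inf if peak == 0 else 20 * math.log10(peak)

def gain_check(samples, low=-18.0, high=-12.0):
    """Classify the loudest peak against the -18 to -12 dBFS window."""
    level = peak_dbfs(samples)
    if level > high:
        return "too hot: lower input gain"
    if level < low:
        return "too cold: raise input gain"
    return "in range"

# A peak of 0.2 is about -14 dBFS, inside the window.
print(gain_check([0.05, -0.2, 0.1]))
```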

Always record a 30-second room tone — the speaker silent, everything else exactly as it was during the take. This becomes the connective tissue you splice into edit gaps so cuts don’t produce abrupt silences that draw the viewer’s ear.

⚠ EQUIPMENT WARNING — Phantom Power and Ribbon Microphones

Phantom power can damage ribbon microphones if applied or removed while the mic is connected. Always disable phantom power, wait 60 seconds, then connect or disconnect ribbon mics. Some modern ribbons (e.g. active ribbons) require phantom power — check your specific mic’s documentation before applying or removing it.

Phantom power (+48 V) is required for most condenser microphones and is harmless to dynamic mics, but it can permanently damage passive ribbon microphones. If your kit includes a ribbon, follow the safety callout above to the letter. Royer Labs, a well-known ribbon-microphone manufacturer, publishes a frequently cited FAQ confirming that phantom power applied to passive ribbons can damage the ribbon element and recommending the disable-wait-connect procedure (Royer Labs — Ribbon Microphone FAQ).

Step 3: Edit Before You Mix

Resist the urge to start applying EQ and compression to a raw timeline. Editing first — tightening pauses, removing breaths or umms where appropriate, splicing in room tone over edit gaps — means you mix the version that will actually be heard. A processed mix of an unedited take usually requires re-mixing once the edit is final, which is wasted work. Production Expert’s dialogue post-production guidance reinforces an edit-then-mix order for voice-led video (Production Expert — Dialogue Editing articles).

For long-form content, automate dialogue level on the timeline before reaching for compression. If a section was 4 dB louder than the rest of the take, pull that section’s clip gain down by 4 dB on the timeline. The compressor is then asked to do less work, which sounds more natural.
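
The clip-gain arithmetic is simple enough to sketch. The example below assumes you have noted an average level for each section of the take (the numbers are made up) and returns the offset that brings each section to a chosen reference; `db_to_linear` shows the amplitude multiplier a given dB change corresponds to.

```python
def db_to_linear(db: float) -> float:
    """Convert a gain change in dB to an amplitude multiplier."""
    return 10 ** (db / 20)

def clip_gain_offsets(section_levels_db, reference_db):
    """Clip-gain change (dB) that brings each section to the reference level."""
    return [round(reference_db - level, 1) for level in section_levels_db]

# Hypothetical section levels: the second section is 4 dB louder than the rest.
offsets = clip_gain_offsets([-20.0, -16.0, -21.0], reference_db=-20.0)
print(offsets)                       # [0.0, -4.0, 1.0]
print(round(db_to_linear(-4.0), 3))  # 0.631
```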

Step 4: Apply EQ for Clarity

EQ on dialogue is mostly subtractive. Sound on Sound’s voice-EQ workshop lays out the “cut first, boost last” approach as the foundation of clear spoken-word mixes (Sound on Sound — “Mixing Vocals”). The standard moves:

  • High-pass filter (HPF) at 80–100 Hz to remove rumble, HVAC, and traffic noise that lives below the speech range. Most male voices have negligible content below 80 Hz; most female voices below 100 Hz. The HPF is the single most useful EQ move on dialogue and should be the first thing in the chain.
  • Cut around 200–400 Hz, gently (1–3 dB) and with a moderate Q, to reduce “mud” or boxiness that close-mic positions add. The exact frequency varies with voice and mic; sweep a narrow boost first to find the offending region, then cut there.
  • Optional gentle boost at 2–5 kHz for presence and intelligibility. Use sparingly (1–3 dB) and back off if it makes sibilance worse.
  • Optional very high shelf (10 kHz+) for “air” on bright voices — useful on voiceover, less useful on already-bright lavaliers.
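
To make the high-pass move concrete, here is a minimal sketch of a second-order high-pass filter and its response at a few frequencies. It uses the widely published "Audio EQ Cookbook" biquad formulas as an assumption; the EQ in your DAW may use a different slope or topology, and the function names are illustrative.

```python
import cmath
import math

def highpass_coeffs(cutoff_hz, sample_rate=48000, q=1 / math.sqrt(2)):
    """Second-order high-pass biquad coefficients, normalized so a[0] == 1."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a

def gain_db(b, a, freq_hz, sample_rate=48000):
    """Magnitude response of the biquad at one frequency (above 0 Hz), in dB."""
    z1 = cmath.exp(-1j * 2 * math.pi * freq_hz / sample_rate)  # z^-1
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))

b, a = highpass_coeffs(80)
for f in (40, 80, 1000):
    print(f"{f} Hz: {gain_db(b, a, f):.1f} dB")
```

With an 80 Hz cutoff, the response is about 3 dB down at 80 Hz and about 12 dB down at 40 Hz, while 1 kHz passes essentially untouched. That is why the filter removes rumble without thinning the voice.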

For precise notching and surgical cuts around sibilant or resonant frequencies, dialogue engineers routinely reach for a transparent parametric EQ such as FabFilter Pro-Q, whose behavior and feature set are documented in the manufacturer’s plug-in manual (FabFilter Pro-Q product documentation). Waves’ F6 and Renaissance EQ ranges are comparable workhorses, documented on the vendor’s plug-in pages (Waves F6 Dynamic EQ).

Step 5: Control Dynamics with Compression

A compressor evens out the loud-quiet swings inherent in spoken delivery. Sound on Sound’s long-form “Compression Made Easy” tutorial walks through the parameter interactions that produce natural-sounding voice compression (Sound on Sound — “Compression Made Easy”). A sensible starting point for dialogue:

  • Ratio: 3:1 (moderate — aggressive enough to control peaks, gentle enough to preserve naturalness)
  • Attack: 5–15 ms (fast enough to catch sibilants and consonant peaks; slow enough to preserve transient clarity)
  • Release: 100–200 ms (medium — long enough to avoid pumping, short enough to recover before the next phrase)
  • Threshold: set so the compressor reduces gain by 3–6 dB on the loudest peaks of normal speech
  • Make-up gain: bring the post-compression level back up by the amount of average reduction so the perceived loudness matches pre-compression

If 6 dB of reduction sounds squashed, raise the threshold or lower the ratio (either gives less reduction) and accept that you’ll handle the rest with timeline automation or a limiter at mastering. Two stages of gentle compression usually sound better than one stage of aggressive compression.
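
The relationship between threshold, ratio, and gain reduction can be sketched with the static curve of a hard-knee compressor. This is a simplified model that ignores attack, release, and knee shape, and the levels are example values rather than measurements.

```python
def gain_reduction_db(level_db, threshold_db, ratio):
    """Static hard-knee gain reduction (dB) for an input level in dBFS."""
    over = level_db - threshold_db
    if over <= 0:
        return 0.0  # below threshold: the compressor does nothing
    return over * (1 - 1 / ratio)

# A -11 dBFS peak through a 3:1 compressor at two threshold settings.
print(round(gain_reduction_db(-11.0, -20.0, 3.0), 2))  # 6.0
print(round(gain_reduction_db(-11.0, -15.5, 3.0), 2))  # 3.0
```

In this example, moving the threshold from −20 to −15.5 dBFS halves the reduction on the same peak.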

Step 6: De-Ess and Tame Sibilance

If the EQ “air” boost or the natural mic response makes “s” and “t” sounds harsh, insert a de-esser after the compressor. A de-esser is a frequency-specific compressor centered around 5–8 kHz — the band where most sibilance lives, per Sound on Sound’s de-essing workshop (Sound on Sound — “De-Essing”). Aim for 2–4 dB of reduction on offending consonants only — constant de-essing means the threshold is too low and the voice will sound lispy.

Step 7: Music and Sound Effects

⚠ COPYRIGHT

All music in published video content must be original, licensed, or drawn from a verified royalty-free source. “Royalty-free” libraries we recommend: Epidemic Sound, Artlist, YouTube Audio Library, and the Free Music Archive (verify each track’s specific Creative Commons license). Major-label recordings, classical recordings on commercial labels, and most film/TV soundtracks are NOT royalty-free regardless of how they’re labeled by aggregator sites.

The fastest way to a Content ID claim is to drop a commercial recording into a YouTube video. If your channel runs music behind dialogue, source it from a license-cleared library (Epidemic Sound, Artlist) or from YouTube’s own Audio Library. The Free Music Archive offers a large catalog under various Creative Commons terms, but each track must be checked individually — FMA’s license metadata is per-track, not per-collection (Free Music Archive).

For mixing music under dialogue, “ducking” (sidechain compression keyed off the dialogue track) automatically dips the music when the speaker is talking. A 6–10 dB dip with a fast attack and a 200–500 ms release is a common starting point; Production Expert’s sidechain-ducking tutorials describe the same range for dialogue-over-music applications (Production Expert — sidechain / ducking articles). Most NLEs and DAWs include a built-in ducking processor; in Resolve’s Fairlight, the “Auto Duck” effect handles this with one source assignment (DaVinci Resolve Reference Manual — Fairlight).
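
A ducker's behavior can be sketched as a gain envelope that chases a target. The example below is a simplified model, not how any particular plug-in is implemented: it assumes a hypothetical per-block flag saying whether speech is present (a real sidechain follows the dialogue level), and smooths the music gain with separate attack and release time constants.

```python
import math

def duck_gains_db(speech_active, depth_db=8.0, attack_ms=10.0,
                  release_ms=300.0, block_ms=10.0):
    """Music gain (dB) per block: dips toward -depth_db while speech is active."""
    attack = math.exp(-block_ms / attack_ms)
    release = math.exp(-block_ms / release_ms)
    gain, out = 0.0, []
    for active in speech_active:
        target = -depth_db if active else 0.0
        coeff = attack if active else release
        # One-pole smoothing: move part of the way toward the target each block.
        gain = target + (gain - target) * coeff
        out.append(gain)
    return out

# One second of speech followed by three seconds of silence, in 10 ms blocks.
gains = duck_gains_db([True] * 100 + [False] * 300)
print(round(gains[99], 2), round(gains[-1], 2))
```

The fast attack gets the music out of the way almost immediately, while the slower release lets it swell back gradually instead of jumping up between phrases.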

Step 8: Master to YouTube’s Loudness Standard

YouTube normalizes uploaded audio to approximately −14 LUFS integrated, measured against the ITU-R BS.1770 / EBU R128 loudness standards that govern broadcast and streaming loudness worldwide (ITU-R BS.1770 — Algorithms for loudness measurement; EBU R128 — Loudness normalisation). Tracks louder than that are turned down; tracks quieter than that are not turned up — they play back quieter than the catalog average. The practical implication: master to −14 LUFS integrated with a true-peak ceiling of −1 dBTP and your mix plays at YouTube’s reference level without further alteration.
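
The turn-down-only behavior described above can be written as a one-line model. This is a sketch of the stated behavior, not YouTube's actual implementation, and the function name is illustrative.

```python
def playback_gain_db(integrated_lufs, target_lufs=-14.0):
    """Gain applied at playback under a turn-down-only normalization model."""
    return min(0.0, target_lufs - integrated_lufs)

for measured in (-9.0, -14.0, -18.0):
    gain = playback_gain_db(measured)
    # A -9 LUFS master is turned down 5 dB; a -18 LUFS master is left at -18.
    print(f"{measured} LUFS -> gain {gain} dB, plays at {measured + gain} LUFS")
```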

Workflow:

  1. Drop a loudness meter on your master bus. Resolve, Reaper, Logic, Premiere, Audition, and most modern DAWs include one; the free YouLean Loudness Meter is a popular third-party choice whose BS.1770 / R128 compliance is documented on the vendor site (YouLean Loudness Meter product page).
  2. Play the entire program from start to finish; note the integrated LUFS reading at the end.
  3. If integrated LUFS is below −14, increase master bus gain or add a brick-wall limiter with output ceiling at −1 dBTP and slowly raise the limiter’s input until the integrated reading hits −14.
  4. If integrated LUFS is above −14, lower the master bus level. There’s no benefit to mastering hotter than −14 for YouTube specifically.
  5. Verify the true-peak ceiling never exceeds −1 dBTP. YouTube’s downstream encoder can produce inter-sample peaks above your sample-peak meter’s reading; the −1 dBTP margin is the standard insurance recommended in the EBU R128 / ITU-R BS.1770 toolchain (EBU R128 specification PDF).
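
The arithmetic behind steps 3 to 5 can be sketched as follows. The inputs are the integrated loudness and true-peak readings from your meter (the example values are hypothetical); the function reports how much gain reaches the target and how much of that the limiter would have to absorb to hold the ceiling.

```python
def mastering_plan(integrated_lufs, true_peak_dbtp,
                   target_lufs=-14.0, ceiling_dbtp=-1.0):
    """Gain needed to reach the target, and the limiting needed to hold the ceiling."""
    gain = target_lufs - integrated_lufs
    peak_after_gain = true_peak_dbtp + gain
    limiting = max(0.0, peak_after_gain - ceiling_dbtp)
    return {
        "gain_db": gain,
        "peak_after_gain_dbtp": peak_after_gain,
        "limiter_reduction_db": limiting,
    }

# A mix at -19 LUFS with peaks at -3 dBTP needs +5 dB, which pushes peaks to
# +2 dBTP, so the limiter must catch about 3 dB on the loudest moments.
print(mastering_plan(-19.0, -3.0))
```

Because limiting lowers the loudest peaks, it also shifts the integrated reading slightly, so re-measure the whole program after any change.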

YouTube’s own help documentation describes the loudness normalization model and confirms that the platform reduces (but does not amplify) the average loudness of uploaded audio (YouTube Help — “Use loudness normalization for consistent volume”). For deeper background on how iZotope’s metering tools apply BS.1770 to the YouTube target, iZotope’s published guide on streaming loudness is a useful cross-reference (iZotope — “Mastering for Spotify, Apple Music, and Other Streaming Services”).

Step 9: Export for Upload

Export the final master as a high-quality intermediate alongside the video:

  • Audio codec: AAC-LC at 384 kbps stereo (or higher), or PCM if your container/upload pipeline accepts it
  • Sample rate: 48 kHz
  • Bit depth (PCM): 24-bit if available

YouTube’s upload encoder will re-encode the audio for delivery, so providing the highest-quality source means the final viewer-side stream loses the least quality in transit. The platform’s recommended upload specs list AAC-LC at 384 kbps stereo (or 512 kbps 5.1) and 48 kHz sample rate as the target for high-quality stereo content (YouTube Help — “Recommended upload encoding settings”). If you are capturing directly from a live-stream or screen-capture workflow via OBS Studio, OBS’s documentation covers the matching AAC-LC / 48 kHz output configuration (OBS Studio — Output Settings).
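
If you assemble the upload file outside your editor, the same settings can be expressed as a command line. The sketch below assumes the ffmpeg command-line tool, which this guide does not otherwise cover, and hypothetical file names; it only builds the argument list, which you could pass to `subprocess.run(args, check=True)` on a machine with ffmpeg installed. ffmpeg's built-in `aac` encoder produces AAC-LC by default.

```python
def ffmpeg_export_args(video_in, audio_in, out_path):
    """Arguments that mux mastered audio with existing video as AAC 384 kbps / 48 kHz."""
    return [
        "ffmpeg",
        "-i", video_in,                    # picture source
        "-i", audio_in,                    # mastered audio (e.g. 24-bit WAV)
        "-map", "0:v:0", "-map", "1:a:0",  # video from input 0, audio from input 1
        "-c:v", "copy",                    # leave the video stream untouched
        "-c:a", "aac", "-b:a", "384k", "-ar", "48000",
        out_path,
    ]

print(" ".join(ffmpeg_export_args("edit.mov", "master.wav", "upload.mp4")))
```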

Pre-Publish Quality Check

  1. Listen on at least two systems — studio headphones and a laptop speaker, or studio monitors and a phone speaker. Dialogue intelligibility on the worst system is what your average viewer experiences.
  2. Verify the integrated LUFS reading on the final exported file (re-import and re-meter). Some export pipelines apply a hidden gain stage that shifts loudness from the timeline reading.
  3. Spot-check the music ducking on a few transitions; over-ducking sounds dramatic in editing but abrupt and choppy in playback.
  4. Confirm no Content ID exposure — every music cue is sourced from a license-cleared library and that license is documented somewhere you can retrieve later.

Troubleshooting

  • Voice sounds boxy or honky: sweep a narrow EQ boost between 200–500 Hz to find the resonance, then cut 2–3 dB at that frequency.
  • Voice sounds thin or distant: the mic was too far away or off-axis; re-record if possible. EQ can’t add proximity that wasn’t captured.
  • Constant background hiss: noise reduction (Resolve Fairlight, iZotope RX, Audition) can clean steady-state noise, but apply gently — aggressive denoising introduces “underwater” artifacts that are worse than the hiss. iZotope’s RX documentation covers the Spectral De-noise and Voice De-noise modules used for this purpose (iZotope RX product documentation).
  • Plosive thumps on P/B sounds: add a pop filter for the next take; for the current take, use clip gain or volume automation to dip the offending consonant.
  • Hum or buzz that wasn’t audible during the take: usually a processing side effect rather than a recording fault. Check that compressor make-up gain isn’t exposing room noise that the original level masked.
  • Music drowns dialogue on phone playback: raise the duck depth (more attenuation when speech is present) and check the loudness ratio — dialogue should sit 8–15 LU above music in a typical talking-head mix.
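
The dialogue-to-music check in the last item is a subtraction of two loudness readings. A minimal sketch, assuming you have metered the dialogue and music stems separately (the values shown are hypothetical):

```python
def dialogue_music_gap_lu(dialogue_lufs, music_lufs):
    """How many LU the dialogue sits above the music."""
    return dialogue_lufs - music_lufs

def gap_ok(dialogue_lufs, music_lufs, low=8.0, high=15.0):
    """True when the gap falls inside the 8 to 15 LU range suggested above."""
    return low <= dialogue_music_gap_lu(dialogue_lufs, music_lufs) <= high

print(gap_ok(-16.0, -28.0))  # True: 12 LU gap
print(gap_ok(-16.0, -20.0))  # False: only 4 LU, music will crowd the voice
```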

Sources & Citations

  1. Gain staging & 48 kHz / 24-bit tracking targets: Sound on Sound, “Gain Staging Your DAW Software,” soundonsound.com/techniques/gain-staging-your-daw-software; Sound on Sound, “Digital Problems, Digital Solutions,” soundonsound.com/techniques/digital-problems-digital-solutions (accessed 2026-04-20)
  2. Room treatment & Studio SOS guidance: Sound on Sound, “Sound Advice / Studio SOS” library, soundonsound.com/sound-advice (accessed 2026-04-20)
  3. Phantom power & passive ribbons: Royer Labs, Ribbon Microphone FAQ, royerlabs.com/ribbon-faq (accessed 2026-04-20)
  4. Subtractive EQ & dialogue-EQ moves: Sound on Sound, “Mixing Vocals,” soundonsound.com/techniques/mixing-vocals (accessed 2026-04-20)
  5. Parametric EQ tools (FabFilter, Waves): FabFilter Pro-Q product documentation, fabfilter.com/products/pro-q-4-equalizer-plug-in; Waves F6 Dynamic EQ, waves.com/plugins/f6-floating-band-dynamic-eq (accessed 2026-04-20)
  6. Compression settings for dialogue: Sound on Sound, “Compression Made Easy,” soundonsound.com/techniques/compression-made-easy (accessed 2026-04-20)
  7. De-esser range (5–8 kHz): Sound on Sound, “De-Essing,” soundonsound.com/techniques/de-essing (accessed 2026-04-20)
  8. Edit-then-mix order & sidechain ducking for dialogue: Production Expert, Dialogue Editing topic archive, production-expert.com — dialogue editing; Production Expert sidechain / ducking topic archive, production-expert.com — sidechain (accessed 2026-04-20)
  9. Auto Duck in Resolve Fairlight: Blackmagic Design, DaVinci Resolve Reference Manual, documents.blackmagicdesign.com — DaVinci Resolve Reference Manual (accessed 2026-04-20)
  10. Loudness measurement algorithm (LUFS): ITU-R, Recommendation BS.1770 — “Algorithms to measure audio programme loudness and true-peak audio level,” itu.int/rec/R-REC-BS.1770 (accessed 2026-04-20)
  11. Loudness normalisation practice & true-peak margin: EBU, Recommendation R128 — “Loudness normalisation and permitted maximum level of audio signals,” tech.ebu.ch/publications/r128; EBU R128 PDF, tech.ebu.ch/docs/r/r128.pdf (accessed 2026-04-20)
  12. YouTube loudness target (−14 LUFS) & normalization behavior: YouTube Help, “Use loudness normalization for consistent volume,” support.google.com/youtube/answer/9598943 (accessed 2026-04-20)
  13. Streaming-loudness practice reference: iZotope, “Mastering for Spotify, Apple Music, and Other Streaming Services,” izotope.com — mastering for streaming services (accessed 2026-04-20)
  14. YouLean Loudness Meter (BS.1770 compliant): YouLean product page, youlean.co/youlean-loudness-meter (accessed 2026-04-20)
  15. YouTube upload encoding settings (AAC-LC 384 kbps / 48 kHz): YouTube Help, “Recommended upload encoding settings,” support.google.com/youtube/answer/1722171 (accessed 2026-04-20)
  16. OBS Studio output settings (AAC / 48 kHz): OBS Project, Output Settings knowledge-base, obsproject.com/kb/output-settings (accessed 2026-04-20)
  17. iZotope RX for noise reduction: iZotope, RX product documentation, izotope.com/en/products/rx.html (accessed 2026-04-20)
  18. Sennheiser XSW 1 lavalier system documentation: Sennheiser, XSW 1 ME2 product page, sennheiser.com — XSW 1 ME2 (accessed 2026-04-20)
  19. License-cleared music libraries referenced: Epidemic Sound, epidemicsound.com; Artlist, artlist.io; YouTube Audio Library, studio.youtube.com/channel/UC/music; Free Music Archive (per-track CC license metadata), freemusicarchive.org (accessed 2026-04-20)

Last verified: 2026-04-20
