Understanding Innovid Video Specs

In the ever-changing world of digital advertising, it can be hard to keep up. Understanding the nuances of video and audio specifications is crucial for advertisers to ensure that their video files meet the requirements to run campaigns seamlessly. This page outlines Innovid’s video and audio spec requirements and their relevant definitions.

Our dedicated account management team ensures that the creative is delivered at its highest possible quality by verifying that all assets are compliant with the established specs. If you have any questions about the spec requirements listed, please don’t hesitate to reach out to your dedicated Innovid Account Manager.

Video Specs

The following Innovid pre-roll video specs are designed to ensure that every ad is delivered at its highest possible quality. By requiring that assets meet the highest-level industry specs, Innovid is able to work from a hi-res base asset to seamlessly encode across every publisher on your plan.

If Innovid receives a video that does not meet the following minimum spec requirements, we will need written approval from the client to proceed with the video as is. Note: This step is not required for self-service clients as consent is implied when the client moves forward with the trafficking process.


  • 1920 x 1080
  • 16:9 display aspect ratio
  • No black bars or intro/outro slates
  • Constant Bitrate (CBR) >15 Mbps*
  • Main Profile @ Main Level (MP@ML)


  • 23.98, 25 or 29.97
  • Constant Frame Rate only
  • Remove any pull-down added for broadcast
  • The Innovid Asset Validation Tool does not detect duplicate or blended frames. To fix blended or duplicate frames, please use a deinterlace filter called “auto-adaptive” or “motion adaptive” to remove interlacing**


  • MPEG-4 (.mp4) format is preferable (especially for DCO)
  • QuickTime movie (.mov) is also acceptable


  • Recommended 200 MB***


  • PCM (preferred) or AAC codec
  • 192 Kbps minimum
  • 16 or 24 bit only
  • 48 kHz sample rate
  • 2 channels only
  • -24 LKFS +/- 2****
  • True Peak -6 to -9 dBTP****
  • Audio silence max 1.5 sec (1500 msec)
  • 1 audio stream max

Note: The above specs are based on Hulu guidelines and are subject to change at any time at Hulu’s discretion. The most up-to-date specs can be found directly on Hulu’s site.

*Constant bitrate requirement is waived when delivering ProRes codec as it is built to be variable.

**To fix interlacing, please revisit the master file and encode a file with an auto-adaptive de-interlace filter.

***Innovid can accept files under 1 GB; however, we recommend videos under 200 MB.

****Full-episode players (FEP) are more sensitive to audio requirements. Please confirm that asset audio is compliant with the following specs if you are running video on FEP inventory.

  • Only applicable to Roku inventory. Roku audio requirements are as follows:
    • LUFS/LKFS: -23 LKFS +/- 2
    • True Peak: Allowed true-peak maximum is -1 dBTP
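As an illustration only, the spec list above can be written as a set of automated checks. This is a minimal sketch, not Innovid's Asset Validation Tool: the function name, the dictionary keys, and the codec labels are hypothetical, and it assumes the metadata has already been read from the file by some inspection tool. It treats 15 Mbps as the floor for video bitrate and applies the ProRes waiver for the constant bitrate requirement.

```python
# Thresholds are copied from the spec list above. The dictionary keys and
# codec labels are hypothetical; a real workflow would fill them in from a
# media inspection tool.
ALLOWED_FRAME_RATES = (23.98, 25.0, 29.97)
ALLOWED_AUDIO_CODECS = ("pcm", "aac")
ALLOWED_BIT_DEPTHS = (16, 24)


def check_asset(meta: dict) -> list:
    """Return a list of human-readable spec violations (empty if none)."""
    problems = []

    # Video checks
    if (meta["width"], meta["height"]) != (1920, 1080):
        problems.append("dimensions must be 1920 x 1080")
    if meta["video_bitrate_mbps"] < 15:
        problems.append("video bitrate must be at least 15 Mbps")
    # The constant bitrate requirement is waived for ProRes.
    if meta["bitrate_mode"] != "CBR" and meta["video_codec"] != "prores":
        problems.append("bitrate mode must be constant (CBR)")
    if not any(abs(meta["frame_rate"] - fps) < 0.01 for fps in ALLOWED_FRAME_RATES):
        problems.append("frame rate must be 23.98, 25 or 29.97")
    if meta["frame_rate_mode"] != "CFR":
        problems.append("frame rate mode must be constant (CFR)")

    # Audio checks
    if meta["audio_codec"] not in ALLOWED_AUDIO_CODECS:
        problems.append("audio codec must be PCM or AAC")
    if meta["audio_bitrate_kbps"] < 192:
        problems.append("audio bitrate must be at least 192 Kbps")
    if meta["audio_bit_depth"] not in ALLOWED_BIT_DEPTHS:
        problems.append("audio bit depth must be 16 or 24")
    if meta["audio_sample_rate_hz"] != 48_000:
        problems.append("audio sample rate must be 48 kHz")
    if meta["audio_channels"] != 2:
        problems.append("audio must have exactly 2 channels")
    if meta["audio_streams"] > 1:
        problems.append("only 1 audio stream is allowed")

    return problems


example = {
    "width": 1920, "height": 1080,
    "video_codec": "h264", "video_bitrate_mbps": 18.0, "bitrate_mode": "CBR",
    "frame_rate": 29.97, "frame_rate_mode": "CFR",
    "audio_codec": "pcm", "audio_bitrate_kbps": 1536, "audio_bit_depth": 16,
    "audio_sample_rate_hz": 48_000, "audio_channels": 2, "audio_streams": 1,
}
print(check_asset(example))
```

Returning a list of messages rather than a single pass/fail value makes it easier to report every issue with an asset in one pass.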

Video Specs Defined

ASPECT RATIO: The ratio of a video’s width to the video’s height. For example, a video with dimensions of 1920x1080 has an aspect ratio of 16:9.
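The reduction from pixel dimensions to an aspect ratio can be sketched in a few lines: divide the width and height by their greatest common divisor. The function name here is illustrative only.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> str:
    """Reduce pixel dimensions to a width:height ratio string."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


print(aspect_ratio(1920, 1080))  # 16:9
print(aspect_ratio(1280, 720))   # 16:9
```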

BITRATE: The number of bits used per second of playback time. High definition video requires a minimum of 15 Mbps (15 megabits per second).
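Because bitrate is bits per second, a rough average can be estimated from file size and duration, and the expected size of a file can be estimated from a target bitrate. The sketch below assumes decimal units (1 megabit = 1,000,000 bits, 1 MB = 1,000,000 bytes) and ignores that a real file also contains audio and container overhead, so treat the results as approximations. The function names are illustrative.

```python
def average_bitrate_mbps(file_size_bytes: int, duration_seconds: float) -> float:
    """Approximate overall bitrate in megabits per second."""
    bits = file_size_bytes * 8  # 8 bits per byte
    return bits / duration_seconds / 1_000_000


def expected_size_mb(bitrate_mbps: float, duration_seconds: float) -> float:
    """Approximate file size in megabytes for a given bitrate and duration."""
    return bitrate_mbps * duration_seconds / 8


# A 30-second spot encoded at 15 Mbps comes to roughly 56 MB.
print(expected_size_mb(15, 30))
```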

BITRATE MODE: The method by which a video file is encoded, either constant bitrate (CBR) or variable bitrate (VBR). Constant bitrate encoding persists the set data rate over the entire video file. Variable bitrate encoding adjusts the data rate based on the data required by the compressor and can result in portions of the video being under the minimum required bitrate.

BLACK BARS: Whether or not the video file contains black bars on the sides of the frame (pillarboxing) or on the top and bottom of the frame (letterboxing).

DIMENSIONS: The width and height of a particular video, measured in pixels. Common high-definition dimensions include 1280x720 and 1920x1080.

FILE FORMAT: One of many standard ways for information to be encoded for storage. Examples for video include .mov, .mp4, etc.

FILE SIZE: The amount of space a file occupies on a storage medium such as a computer hard drive. File sizes can be measured in bytes (B), kilobytes (KB), megabytes (MB), gigabytes (GB), and beyond.
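The file size guidance on this page (recommended under 200 MB, accepted under 1 GB) can be applied as a simple comparison. This sketch assumes decimal megabytes (1 MB = 1,000,000 bytes); the function name and the labels it returns are made up for illustration.

```python
MB = 1_000_000  # assumption: decimal megabytes


def classify_file_size(size_bytes: int) -> str:
    """Label a file size against the recommended and maximum limits."""
    if size_bytes < 200 * MB:
        return "recommended"
    if size_bytes < 1000 * MB:
        return "accepted"
    return "too large"


print(classify_file_size(150 * MB))
```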

FRAME RATE: The number of frames or images that are projected or displayed per second. Common frame rates are 23.98, 25, and 29.97 fps.

FRAME RATE MODE: The method by which a video file is rendered, either constant frame rate (CFR) or variable frame rate (VFR). Constant frame rate encoding persists the set frame rate over the entire video file. Variable frame rate encoding adjusts the frame rate based on the perceived level of motion in a video and can result in a portion of the video being under the minimum required frame rate.
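One way to picture the difference: in a constant frame rate file, the gap between consecutive frame timestamps is always the same, while in a variable frame rate file the gaps differ. The sketch below assumes a list of frame presentation timestamps in seconds is already available from some inspection tool; the function name and the tolerance value are hypothetical.

```python
def is_constant_frame_rate(timestamps, tolerance=1e-3):
    """Return True if the gaps between consecutive frames are uniform."""
    intervals = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if not intervals:
        return True  # zero or one frame: nothing to compare
    return max(intervals) - min(intervals) <= tolerance


cfr = [i / 25 for i in range(10)]      # evenly spaced frames at 25 fps
vfr = [0.0, 0.04, 0.08, 0.16, 0.20]    # one gap is twice as long
print(is_constant_frame_rate(cfr), is_constant_frame_rate(vfr))
```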

Audio Specs Defined

BITRATE: The number of bits used per second of playback time. High definition audio requires a minimum of 192 kbps (192 Kilobits per second).

BIT DEPTH: The number of bits of information in each sample. High definition audio requires a bit depth of either 16 or 24 bits.

CHANNELS: A single stream of recorded sound with a location in a sound field (“left speaker” vs. “right speaker”). Audio delivered to Innovid should have exactly 2 channels (stereo).

CODEC: A device or program used for encoding or decoding a digital data stream. Codec is a portmanteau of coder-decoder.

dBFS: Decibels relative to full scale (dBFS) is a unit of measurement for volume levels in digital systems that have a defined maximum peak level. This term is used to define the optimal audio volume level relative to the systems that will be processing the audio. Recommended dBFS is between -29 dB and -25 dB.

MAX PEAK dB: The loudest single point of an audio file. 

SAMPLE RATE: The number of samples of a sound that are taken per second to represent the event digitally. High definition audio requires a sample rate of exactly 48 kHz.
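For uncompressed PCM audio, the bitrate follows directly from the three values defined above: sample rate multiplied by bit depth multiplied by channel count. The short sketch below works through that arithmetic for the required 48 kHz, 2-channel audio; the function name is illustrative, and the formula does not apply to compressed codecs such as AAC.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


for depth in (16, 24):
    print(depth, pcm_bitrate_kbps(48_000, depth, 2))
```

At 48 kHz and 2 channels, 16-bit PCM comes to 1536 kbps and 24-bit PCM to 2304 kbps, both well above the 192 kbps minimum.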

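LKFS / LUFS: Standard loudness measurement relative to full scale. LKFS stands for Loudness, K-weighted, relative to Full Scale, and LUFS stands for Loudness Units relative to Full Scale; the two terms describe the same measurement. One unit of LUFS or LKFS is equal to one dB.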
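The loudness and true peak targets on this page are ranges, so checking a measured value is a matter of comparing it against a target and tolerance. The sketch below uses the values listed in the audio specs (-24 LKFS +/- 2, true peak between -9 and -6 dBTP) as defaults and shows the Roku values (-23 LKFS +/- 2, true peak maximum of -1 dBTP) as overrides. The function and parameter names are hypothetical, and the measurements themselves would come from a loudness meter.

```python
def loudness_in_range(measured_lkfs, target_lkfs=-24.0, tolerance=2.0):
    """Return True if the measured loudness is within target +/- tolerance."""
    return abs(measured_lkfs - target_lkfs) <= tolerance


def true_peak_ok(peak_dbtp, maximum_dbtp=-6.0, minimum_dbtp=-9.0):
    """Return True if the true peak sits inside the allowed window.

    Pass minimum_dbtp=None when only a maximum applies.
    """
    if minimum_dbtp is not None and peak_dbtp < minimum_dbtp:
        return False
    return peak_dbtp <= maximum_dbtp


# Default targets from the audio spec list
print(loudness_in_range(-25.5), true_peak_ok(-7.5))

# Roku targets: -23 LKFS +/- 2, true peak no higher than -1 dBTP
print(loudness_in_range(-21.0, target_lkfs=-23.0),
      true_peak_ok(-3.0, maximum_dbtp=-1.0, minimum_dbtp=None))
```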

SILENCE: The perceived absence of audio for longer than 1.5 sec (1500 msec).
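A simple way to look for silence is to find the longest run of consecutive samples whose level is at or below a threshold, then convert that run to seconds. The sketch below assumes the audio has already been decoded into a list of sample values; the function name is illustrative, and the threshold is an assumption, since this page does not define how quiet audio must be to count as silence.

```python
def longest_silence_seconds(samples, sample_rate_hz=48_000, threshold=0.0):
    """Length in seconds of the longest run of samples at or below threshold."""
    longest = current = 0
    for sample in samples:
        if abs(sample) <= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest / sample_rate_hz


# Toy signal at 10 samples per second: 1 s of sound, 3 s of silence, 1 s of sound
toy = [0.5] * 10 + [0.0] * 30 + [0.5] * 10
print(longest_silence_seconds(toy, sample_rate_hz=10) > 1.5)
```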

STREAMS: Streams are used as the output source for audio. Examples include: music track, voiceover, and sound effects.

