Pictured: Scarlett features a stylish brushed red aluminum finish.
I have a bit of a history reviewing audio hardware, specifically audio I/O. Over time, the audio interface has moved away from PCIe to USB, which has been the de facto standard for nearly 15 years, ever since USB 2.0 became widespread. I've owned a few external boxes over the course of a decade: briefly, M-Audio's precursor to today's Fast Track (which I returned), the Yamaha GO46 FireWire, and the Native Instruments Audio Kontrol. I recorded two albums using the latter two. I consider myself a bit of an audio geek, but without the audiophile trappings.
Recently I hit a breaking point: the NI Audio Kontrol was unable to accept 1/4 inch unbalanced cables. Mystified, I decided it was time to retire the Audio Kontrol and check out the offerings in 2016. Unsurprisingly, audio interfaces offer far more bang for the buck than they did even 5 years ago. At $180 I was able to score the Focusrite Scarlett 6i6, which offers more high quality inputs and outputs than any of my previous devices at a lower price point. Even more impressive, for $240 the 18i8 offers a whopping 18 potential inputs and 8 output buses.
The weak point of every USB capture device, in my experience, has been and probably always will be drivers (and USB itself). As an OS X (excuse me, macOS) user, my experience with CoreAudio has been mostly positive. For most USB devices that are ASIO/CoreAudio compliant, drivers are barely needed for basic I/O. However, if the interface has custom buttons, internal routing or other features, then drivers are required. In the case of my Audio Kontrol, the drivers were actually mostly a negative, causing glitchy behavior, and the same went for my week with the M-Audio Fast Track. After dealing with years of prosumerish solutions I decided to ante up to Focusrite, renowned for their preamps, skipping budget players like PreSonus and M-Audio.
Fair warning, this is as much an overview of digital audio as a review. Now onto the review.
Focusrite Scarlett 6i6
Pictured: The 6i6 makes for a good speaker rest
The Scarlett 6i6 is 6 in and 6 out, but that doesn't quite accurately sum up the ports. A breakdown includes the following:
Inputs
- 2 front facing Microphone XLR / 1/4 inch Line Inputs with hardware knobs for gain control and level monitoring (supports 48V phantom power)
- 2 1/4 inch Line Inputs
- 1 stereo S/PDIF input
- MIDI in
Outputs
- 2 1/4 inch Headphone outputs with hardware volume knobs
- 2 1/4 inch Line (monitor) outputs with volume knob
- 2 additional line outputs
- 1 stereo S/PDIF output
- MIDI out
If you notice, this doesn't add up to the 6 outputs in the device name but instead to a total of 6 inputs and 10 outputs. The reasoning is that the headphones and monitors are all on the same audio bus, bringing it back down to 6 output buses: one pair for the monitors (speakers/amp + two headphones), an additional set of 1/4 inch outputs and the S/PDIF output. Each of the headphone jacks and the monitors has an independent volume control, but any audio routed to the monitor outputs will be sent to all three. Also notable, the Scarlett only accepts 4 analog channels in. Most users probably won't use the S/PDIF I/O (more on that later). The full tech specs can be found here.
Focusrite surprisingly ships the Scarletts with a host of wall adapter plugs for your country of choice, but being firmly rooted in North America, I had to swap to the North American standard prongs. Other than that, the Scarlett is pretty straightforward: USB cable to the computer, AC adapter to the wall, audio inputs into the device. For me, this meant plugging my Numark NS7s into the back ports, plus a single mic.
Pictured: The mess of cabling...
With digital audio, there's always (as of writing this) buffering, which introduces latency. No matter the device, there will be latency depending on the buffer size. The math for calculating minimum latency is quite simple: buffer size (in samples) / sample rate (in kHz) = latency in milliseconds.
512 samples/44.1 kHz = 11.6 ms
384 samples/44.1 kHz = 8.7 ms
512 samples/96 kHz = 5.3 ms
384 samples/96 kHz = 4 ms
However, this is only the absolute minimum for ONE direction, and lowering the buffer puts more stress on the CPU to be sure that the buffer is never fully depleted. This becomes tougher to accomplish as the CPU is tasked with processing more information, such as more fx and more tracks. Total travel times for buffering would look like the following:
(in) 512 samples/44.1 kHz = 11.6 ms + (out) 512 samples/44.1 kHz = 11.6 ms = 23.2 ms minimum roundtrip travel time
(in) 384 samples/44.1 kHz = 8.7 ms + (out) 384 samples/44.1 kHz = 8.7 ms = 17.4 ms minimum roundtrip travel time
(in) 512 samples/96 kHz = 5.3 ms + (out) 512 samples/96 kHz = 5.3 ms = 10.6 ms minimum roundtrip travel time
(in) 384 samples/96 kHz = 4 ms + (out) 384 samples/96 kHz = 4 ms = 8 ms minimum roundtrip travel time
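For anyone who wants to try other buffer sizes, the buffer math above can be sketched in a few lines of Python. This is only the arithmetic from this section, not anything taken from Focusrite's drivers, and the function names are my own. Figures may differ from hand-rounded ones by a tenth of a millisecond.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: float) -> float:
    """One-way buffering latency in milliseconds (buffer size / sample rate)."""
    return buffer_samples / sample_rate_hz * 1000.0


def round_trip_ms(buffer_samples: int, sample_rate_hz: float) -> float:
    """Minimum in + out buffering latency, ignoring USB and driver overhead."""
    return 2 * buffer_latency_ms(buffer_samples, sample_rate_hz)


if __name__ == "__main__":
    for rate_hz in (44_100, 96_000):
        for buffer_samples in (512, 384):
            one_way = buffer_latency_ms(buffer_samples, rate_hz)
            both = round_trip_ms(buffer_samples, rate_hz)
            print(f"{buffer_samples} samples @ {rate_hz / 1000:g} kHz: "
                  f"{one_way:.1f} ms one way, {both:.1f} ms round trip")
```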
The math above represents the absolute minimum travel time for external audio to go from an input and be routed back to an audio output. As stated, this is the absolute minimum. The audio also travels over USB, which is governed by the USB clock timer firing at 1 ms intervals, so there's an additional latency buffer that has nothing to do with audio samples but rather with the continuous data flow imposed by USB. Lesser devices simply use a buffer of roughly 6 ms for each direction (I/O), which adds more travel time, whereas higher end devices finely tune the USB timing to minimize the delay. Someone using a low end USB device with 384 sample buffering at 44.1 kHz can expect roughly a 29 ms delay (17.4 ms of audio buffering plus about 12 ms of USB buffering). Higher end boxes such as the Scarlett have fine tuned drivers to shave off crucial milliseconds for the USB buffering, and also include onboard DSP to allow onboard mixing to lower travel time delay. If this all sounds a bit confusing, it isn't as bad as it sounds.
I would like to route my mic input directly into my output so I can monitor my inputs without having to route my audio to my computer, to the DAW and then back out over USB, all of which introduces a time delay, hence latency. Doing this skips the travel time through the ASIO buffer and USB clock. The benefit is that I effectively have zero-latency input monitoring; the downside is that I cannot make use of any effects from my DAW in realtime.
Higher end audio interfaces include DSP effects that can be controlled via the software mixer, so basic compression, EQ, reverb and delay can be applied to live monitoring. Some also use other interfaces: FireWire has slightly better clock timing, and Thunderbolt provides even lower latency due to the PCIe bus clock.
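As a rough sketch of how the USB buffering stacks on top of the audio buffer, here is the same estimate in Python. The 6 ms per direction figure is the ballpark quoted above for lesser devices. The 2 ms figure is purely a hypothetical placeholder for a better tuned driver, not a measured Scarlett value, and the function name is my own.

```python
def estimated_round_trip_ms(buffer_samples: int,
                            sample_rate_hz: float,
                            usb_overhead_ms_per_direction: float = 6.0) -> float:
    """Audio buffer latency plus a fixed USB buffer, counted once in and once out."""
    one_way_audio_ms = buffer_samples / sample_rate_hz * 1000.0
    return 2 * (one_way_audio_ms + usb_overhead_ms_per_direction)


if __name__ == "__main__":
    # Low end device: roughly 6 ms of USB buffering each way (figure from the text).
    low_end = estimated_round_trip_ms(384, 44_100)
    # Hypothetical tuned driver: 2 ms each way is an assumption for illustration only.
    tuned = estimated_round_trip_ms(384, 44_100, usb_overhead_ms_per_direction=2.0)
    print(f"low end estimate: {low_end:.1f} ms")
    print(f"tuned estimate (assumed 2 ms USB overhead): {tuned:.1f} ms")
```

Treat the output as a back-of-the-envelope number. Real round trip latency also depends on converter delay and the driver's own safety buffers, which this ignores.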
All in all, the big step of buying the Scarlett line over a prosumer audio interface boils down to slightly better drivers and internal mixing.
Performance: The bits of it all
24 bit is really an unrealistic figure, and it's nearly a meaningless stat when it comes to audio gear. There are measures that more appropriately reflect the dynamic range, but to fully understand them, we have to talk analog and math.
While I may get flak for saying this, despite issues like latency, digital has a massive leg up over its analog predecessors, not simply from an archiving/storage perspective but also in quality. The much loved vinyl format hits a signal-to-noise ratio of roughly 80 dB, meaning the signal power is roughly 100 million (10^8) times stronger than the noise power, which isn't bad. Digital, however, doesn't have an analog noise floor, and volume levels are expressed in bit depth, which is the number of steps available to the digital-to-analog converter (DAC).
To use an analogy I developed that works reasonably well when writing for an audio publication: bit depth in audio is akin to bit depth in digital imagery, but instead of reflecting how many colors an image can have, it reflects how many steps in volume there are. Sample rate is the resolution at which the sound is captured. What becomes interesting is that there's even a formula that explicitly tells you the maximum dynamic range in decibels for any given bit depth. Using the signal-to-quantization-noise ratio formula, 20*log10(2^BITDEPTH-1), we can calculate the signal to noise ratio. 16 bit audio has a theoretical range of 96.33 dB, which is considerably better than vinyl and on par with the best studio reel-to-reel systems. Also, it's important to understand these values represent a theoretical maximum, as analog-to-digital converters (ADC) and digital-to-analog converters (DAC) rarely achieve their maximums. 24 bit audio has a theoretical range of 144.49 dB, far beyond even the best hardware currently on the market. Below I made a simple calculator to play with.
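A minimal version of that calculator, sketched in Python, looks like this. It implements only the 20*log10(2^BITDEPTH-1) formula quoted above. The function name is my own, and the results are theoretical ceilings, not what any real converter delivers.

```python
import math


def theoretical_dynamic_range_db(bit_depth: int) -> float:
    """Signal-to-quantization-noise ceiling for a given bit depth, in dB."""
    return 20 * math.log10(2 ** bit_depth - 1)


if __name__ == "__main__":
    for bits in (8, 16, 18, 20, 24):
        print(f"{bits:2d} bit: {theoretical_dynamic_range_db(bits):7.2f} dB")
```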
The Focusrite features 109 dB dynamic range on its inputs and outputs, which is a little more than 18 bits of depth. For the computer savvy, 18 bit = 2^18, which is effectively 262144 volume levels vs 16 bit's 65536, or 4 times more detail. Focusrite isn't being deceitful listing 24 bit, but rather dealing with the limitations of audio production. Also notably, for a reference point, the theoretical maximum for volume reproduction of 24 bit would be from silence to a NASA rocket launch (140 dB), and arena rock concerts are known to be in excess of 120 dB. It's not realistic to use the entire dynamic range of 24 bit, and your neighbors would not approve if you could.
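To see where the "a little more than 18 bit" figure comes from, the dynamic range formula can be run backwards. This is a sketch that simply inverts 20*log10(2^BITDEPTH-1); the function name is my own, and the 109 dB input is the spec quoted above.

```python
import math


def equivalent_bit_depth(dynamic_range_db: float) -> float:
    """Bit depth whose theoretical dynamic range equals the given dB figure."""
    return math.log2(10 ** (dynamic_range_db / 20) + 1)


if __name__ == "__main__":
    bits = equivalent_bit_depth(109.0)
    print(f"109 dB is roughly {bits:.1f} bits")
    # Number of discrete levels at 18 bit versus 16 bit.
    print(f"18 bit levels: {2 ** 18}, 16 bit levels: {2 ** 16}, "
          f"ratio: {2 ** 18 // 2 ** 16}x")
```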
If I haven't talked sampling rates yet, there's a reason: by most accounts, bit depth matters more than sampling rate after a certain point. 44.1 kHz can reproduce 0 Hz to roughly 22 kHz. Capturing at 96 kHz may actually reduce sound quality through alias noise if your target format is 44.1 kHz. The best way to imagine this is a photo. If you scale proportionally by half, the image will remain clear, whereas scaling to, say, 45.9% of the image size would cause some of the image clarity to be sacrificed. The reason this isn't that big of a problem in applications like Photoshop is resampling (scaling) algorithms. The same principle applies to audio, as the waveform must be recomputed and resampled, which can create what is known as aliasing. Bit depth downconversion uses dithering, which is a lot more predictable as it's a numeric reduction in values, where a range is compressed. Depending on your target format (movies = 48 kHz, music usually = 44.1 kHz), capturing at 2x the sampling rate of the target format is preferred. The Scarlett can capture at 88.2 kHz, but the advantages of a higher sampling rate are less obvious since DACs have become quite good over the years at filling in the gaps, so to speak. What high resolution can do is capture sounds above human hearing and more accurately articulate the effects of things like phasing. It's not night and day, and honestly, I'm mostly hard pressed to tell the difference, as are a lot of people. However, audio processing does better with denser data, and the real advantage exists almost entirely in the DAW.
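The photo scaling comparison can be made concrete with a little arithmetic. The sketch below, with function names of my own, prints the highest frequency each sample rate can represent (half the sample rate) and the exact ratio needed to convert one rate to another. It only shows why 88.2 kHz to 44.1 kHz is a clean halving while 96 kHz to 44.1 kHz is not. How audible the difference is depends on the quality of the resampler, which this does not model.

```python
from fractions import Fraction


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def resample_ratio(source_rate_hz: int, target_rate_hz: int) -> Fraction:
    """Exact target/source ratio, reduced to lowest terms."""
    return Fraction(target_rate_hz, source_rate_hz)


if __name__ == "__main__":
    for rate in (44_100, 48_000, 88_200, 96_000):
        print(f"{rate} Hz captures up to {nyquist_hz(rate):.0f} Hz")
    for source, target in ((88_200, 44_100), (96_000, 48_000), (96_000, 44_100)):
        ratio = resample_ratio(source, target)
        print(f"{source} -> {target}: ratio {ratio} ({float(ratio):.1%})")
```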
Since I touched on analog vs digital, I figure I'll put in a word on the long standing debate. Most of the love for analog has less to do with superior quality than with characteristics left by the various mediums' limitations. It should also be pointed out that analog effects, like harmonic distortion from tube amplification and over-saturation from tape, can be and are captured by digital when recording from analog sources. For audiophiles, much of the desire is to recreate how music "used to sound", hence the love of vintage hardware. There's nothing inherently wrong with this except that it often shapes audio debate in non-quantifiable terms and often leads to absurd claims about analog vs digital. Also adding to the mess of the debate has been the shift in recording techniques, mixing and mastering over time, which also drastically alters the sound of a recording.
Lastly, digital, for all recording/listening intents and purposes, exists in tandem with analog. In any digital audio path, the signal must be converted into analog electrical modulations to be fed into a transducer (speaker), or start as analog from a transducer (microphone) and have the analog signal converted into digital, so the devices that perform this are very important. In short, as it relates to this review, the Focusrite Scarlett offers professional quality at an absurdly low price point, and it's a wonderful time to be a hobbyist as digital solutions are cheap and extremely high quality. Focusrite isn't the only player making low-cost/high-quality computer audio interfaces, but it has one of the more attractive packages.
The Real world
At the price point, the Focusrite is well specced. The 2nd generation, due out this month, gives a modest bump: mostly more headroom on the analog ports, 192 kHz capture/playback and analog protection circuitry for unexpected power surges, all welcome features but not game changing. The 1st gen can be had for $180, a nice $70 price reduction, making it a lot of bang for the buck.
Pictured: Scarlett Mixing Software
The Scarlett drivers are straightforward. The device can be used without them, but you'll miss out on the analog mixing. After installing the drivers, I rebooted and launched the mixer, which immediately updated the firmware of the 6i6 in mere seconds. There was no word on what the firmware update did, but googling revealed that it improves sample rate switching for OS X (macOS) users and enables standalone mode so the device can continue routing audio even if the computer isn't turned on, very cool.
The software mixer is straightforward, with handy routing presets and input gain control, which is useful for the inputs that do not have hardware controls. Any configuration can be saved as a snapshot and instantly reloaded, likely more useful for the Scarletts featuring more inputs and outputs but still welcome. Several of the mixer elements also control hardware switches on the device, like input gain or line level vs instrument for the front facing inputs. It's a gift and a curse: the hardware is small and compact, but that means it's entirely driver and support dependent to change the device settings, whereas with previous devices I've owned, line vs instrument gain control was hardware facing. Even with bad drivers, the Audio Kontrol functioned as a simple USB input/output device regardless. With any luck though, support will be long-lived.
Everyone's home studio will look a little different so to give users a chance to contrast and compare, my current setup is as follows:
Computer: Mac Pro 2008 with oodles of upgrades
Monitoring: Vanatoo Transparent Ones with a MartinLogan Dynamo 300 subwoofer, Beyerdynamic DT-990 headphones
Inputs: Numark NS7 Performance Controller (motorized turntable controller with audio output), various microphones
MIDI: Native Instruments Maschine, Korg Padkontrol, Korg Microkey 37, Korg NanoPad, Korg nanoKontrol (all USB)
DAW: Cubase, Logic, Maschine
My mini studio is very hip hop centric, mostly focusing on beat composition. It's not space intensive, uses only a modest amount of hardware, and I don't have any real plans of expanding much beyond it. The only real upgrade would probably be replacing my Shure SM7B with something a little more forgiving for a wider range of vocals.
Out of the gate, I was already happy to simply be able to listen to my headphones or speakers without having to change my audio settings in OS X, even if that's only an option click away. Swapping between the two was as simple as turning the volume up. This may not seem like a big deal, but for all the love Vanatoo gets, their speakers annoyingly do not have a front facing volume knob. Also the headphone amp, while some audiophiles scoff at it, is without a doubt better than my Mac's internal headphone jack that I was reduced to using. At least the Mac Pro headphone jacks aren't pummeled with white noise like the MacBooks'. If nothing else, I get more accessible volume knobs and better sound via headphones. I was previously debating a headphone amp for my power hungry DT-990s, but they sound better than before and as good as I recall them sounding when I used a mid range Denon receiver as my main headphone amp.
The big hiccup came when trying to get audio to work in Cubase. Part of it was user error, as I could not get audio to output for the life of me in Cubase and only Cubase. In a moment of inspiration I realized that my ports might not be labeled correctly in the VST panel, and noticed that it had carried over a setting disabling two outputs. With those re-enabled, Cubase started showing volume meters for sound but refused to actually output audio. At this point I resorted to a classic audio hack for OS X: creating an aggregate device of one in Audio MIDI Setup. For whatever reason, it worked. Annoying? Yes. All other applications functioned normally without this, meaning the issue lies somewhere between Cubase's VST engine and the drivers for the Scarlett.
Recording was as easy as ever. There isn't much to say: identifying buses was a breeze and recording worked great, noiseless and sounding as rich as it should have for the instrument inputs on the back. The mic preamps are notably a little nicer than my Audio Kontrol's, if only for the fact that they'll accept unbalanced cables. The quality when tested with a Sennheiser e935 without any other preamp was clean and defined, and only required roughly half gain. Compared to the Audio Kontrol, which wasn't terrible, it seemed just a hair "richer", to use a vague, imprecise term. Audio quality is certainly up to professional standards, at least in my book.
The next plus was that, for the first time, I was able to use live monitoring. With my Audio Kontrol, it never worked, if it was even supposed to. I've always done monitoring via software, which has meant delay. Not ideal, but it worked. A quick trip to the mixer control panel and the Scarlett worked as expected. I could play my NS7s regardless of whether I had a track set to monitor. It's a real benefit over the lower end devices I was using.
After a week in Cubase, there are no noticeable glitches, which Cubase on OS X... macOS... is more prone to than many other audio apps. I'm pleased with the device.
A slightly different take - S/PDIF
The 6i6 is almost ideal, but the S/PDIF coax ports are almost useless for most people. For anyone asking what S/PDIF (Sony/Philips Digital Interface Format) is, it's the common format developed for transmitting PCM audio or compressed formats such as AC3 (Dolby Digital) or DTS via 75 ohm coax (RCA) cables or Toslink (optical). Toslink over time became the much preferred format, likely for the "cool" factor, and because optical cables require no shielding, as RF noise does not affect light, the cables are lightweight and small. S/PDIF can be found on many home theater receivers, some standalone CD players, most DVD players, some Blu-Ray players and, in the professional world, DAT systems.
S/PDIF is so ubiquitous that my Mac Pro has I/O via S/PDIF optical, and most Macs (MacBook Pros, iMacs) can output S/PDIF optical with a specialized mini-Toslink cable. Digital coax is a fading format, limited to DAT and some CD/DVD players. Outside of DAT, most media that use S/PDIF, like CDs and DVDs, can be transferred directly from an optical drive bay, and thus it's mostly used as a way to transmit out from a computer to a receiver or speaker system. I'm not sure about user stats, but coax S/PDIF really strikes me as not very useful. I'd much prefer another set of instrument inputs in place of S/PDIF; a 6i4 (6 analog inputs) would be more useful, and I'm guessing most studio musicians would be in the same boat. At the very least, Toslink would be much preferred, as there's a much greater chance someone has a speaker system or receiver that uses it.
The other negative is that I still don't know why Cubase has a problem with the Scarlett. I've used 3 other boxes over the years and never required any workarounds. It works, but it strikes me as a precarious position, as I'm not sure if I'm a DAW update or OS update away from it not working with Cubase. As of writing this, it does work under OS X 10.11.5. As a Mac Cubase user, I'm in the minority, and Logic X works fine with it.
Pros
- Build quality
- Easy to use device mixer software
- 6 outputs linked to the "Monitor" audio bus alone, meaning two separate headphone amps and external speakers, all with independent volume controls
- Low latency for USB
Cons
- Mild driver issues with Cubase, works fine with workaround.
- Coax S/PDIF really could be swapped for more useful ports. It's best to think of this as a 4 input device