Context

When playing Performous with microphone pass-through (i.e. music plays from a laptop, several people sing and have their voices amplified through the same speakers, and the voices are simultaneously recorded on the PC for score-keeping), latency and real-time behavior are serious issues.

While the JACK approach seems to achieve the best that is possible under these constraints, there's only so much that can be done when routing audio through USB (typically the only interface available for multiple sound channels on a laptop).

Note that these are all non-issues when the voices are not amplified (which is the usual scenario in small rooms / home scenarios); this is primarily relevant to stage-like or karaoke-bar setups that, to my current knowledge, require a mixing table and several external sound cards (or one large one).

Requirements

Hardware that can be attached to a laptop that provides

4 analog microphone-level inputs (microphones with direct input to the PC would suffer from USB latency); more would be nice
a stereo analog line-level output
receiving a stereo audio stream from the PC
sending all microphone inputs to the laptop independently
internal mixing of the microphone and the data from the laptop to the output with minimal latency (<1ms; that's not a hard requirement but certainly sufficient)

It appears that even professional sound cards with hardware routing only come with two microphone inputs, and the go-to solution for most other requirements is to go through a large hardware mixing table first.

Solution draft

Dedicated hardware could be built for this purpose. It would consist of

A microprocessor with USB 2.0 High Speed (480 Mbit/s; Full Speed doesn't provide enough data rate)
- outputs an I2S signal provided by the PC, and the associated bit and LR clocks
- takes the preamps' I2S bitstreams
4 XLR-3 microphone sockets
a stereo RCA output socket
preamp-and-ADC chips, e.g. two TI PCM1862 or one TI PCM1865
some processor that takes in three I2S signals (stereo from the USB processor and 4 channels grouped into 2 stereo channels from the ADCs), multiplies them with a mixing vector and sums them up to an output I2S signal within a single sampling time (up to 192kHz). An iCE40 (as popularized by the IceStorm project) would do the job; possibly even a microprocessor would do (with the right DMA setup, maybe even the main one).
An I2S DAC that outputs the former's signal on line level, e.g. a TI PCM5121
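
The mixing step described in the list above can be modelled in a few lines. This is only a behavioural sketch, not firmware: the Q1.15 fixed-point gain format, the saturation to 24 bits and all names (`mix_frame`, `saturate`, `UNITY`) are assumptions made for illustration and are not taken from any of the mentioned chips.

```python
Q = 15            # assumed gain format: Q1.15 fixed point
UNITY = 1 << Q    # gain value that leaves a sample unchanged


def saturate(value, bits=24):
    """Clamp an integer to the signed range of a `bits`-wide sample."""
    hi = (1 << (bits - 1)) - 1
    lo = -(1 << (bits - 1))
    return max(lo, min(hi, value))


def mix_frame(playback, mics, playback_gain, mic_gains_left, mic_gains_right, bits=24):
    """Mix one stereo playback frame with one sample per microphone.

    playback: (left, right) samples coming from the PC
    mics: one sample per microphone channel
    mic_gains_left / mic_gains_right: the mixing vector per output channel
    Returns the (left, right) output frame for the DAC.
    """
    out = []
    for pb, mic_gains in ((playback[0], mic_gains_left), (playback[1], mic_gains_right)):
        acc = pb * playback_gain              # accumulate at full precision
        for sample, gain in zip(mics, mic_gains):
            acc += sample * gain
        out.append(saturate(acc >> Q, bits))  # scale back down, then clamp
    return tuple(out)


# All four microphones at unity gain on both output channels
frame = mix_frame((1000, -1000), [100, 200, 300, 400], UNITY, [UNITY] * 4, [UNITY] * 4)
print(frame)
```

Per output frame this is one multiply-accumulate per input and output channel, which is why a small FPGA or a fast microcontroller looks sufficient.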

This could be easily scaled up to 8 microphone inputs with twice the number of ADCs.
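
To put numbers on the USB data rate, here is a rough back-of-the-envelope calculation. It assumes 24-bit samples packed without padding and ignores all USB protocol overhead, so the real requirement is somewhat higher; the function name and the chosen sample rates are just for illustration.

```python
FULL_SPEED_MBIT = 12    # USB Full Speed signalling rate
HIGH_SPEED_MBIT = 480   # USB 2.0 High Speed signalling rate


def stream_mbit_per_s(channels, sample_rate_hz, bits_per_sample=24):
    """Raw audio payload in Mbit/s, without any protocol overhead."""
    return channels * sample_rate_hz * bits_per_sample / 1e6


for mics in (4, 8):
    channels = mics + 2  # microphone channels to the PC plus stereo playback from it
    for rate in (48_000, 192_000):
        mbit = stream_mbit_per_s(channels, rate)
        verdict = "exceeds" if mbit > FULL_SPEED_MBIT else "is below"
        print(f"{mics} mics at {rate} Hz: {mbit:.3f} Mbit/s, {verdict} the Full Speed signalling rate")
```

At 192kHz the raw payload alone is already more than twice the Full Speed signalling rate, while it stays far below what High Speed offers.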

Cost estimation

When using a single microprocessor, parts for the 4-microphone version would be <10€ for semiconductors in 100pcs quantity; the connectors amount to another 10€ (provided they can be directly PCB-mounted; a front panel solution needs more expensive wiring). A PCB could probably be done in dual-layer 80x40mm and would add another 3€, assembly at 5€ (now I start guessing), and I have no clue how to do the casing.

But it seems it should be manufacturable for about 30€. (Note that this is obviously a manufacturing price that gets you an assembled kit; I'm not claiming any suitability as a retail product. It might be a good starting point for thinking about crowdfunding, though.)
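
For reference, the estimate adds up as follows. The figures are the guesses from the text above (with the "<10€" for semiconductors taken at its upper bound); the dictionary keys are just labels chosen for this sketch.

```python
# Per-unit guesses from the text, in EUR
costs_eur = {
    "semiconductors": 10.0,  # upper bound, at 100pcs quantity
    "connectors": 10.0,      # assumes direct PCB mounting
    "pcb": 3.0,
    "assembly": 5.0,
}

subtotal = sum(costs_eur.values())
target = 30.0
print(f"subtotal without casing: {subtotal:.0f} EUR")
print(f"left for the casing at a {target:.0f} EUR target: {target - subtotal:.0f} EUR")
```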

Microprocessor selection

The critical attributes of the main microprocessor are

USB 2.0 High Speed
3 I2S ports (preferably 5 for the scaled-up version)

The STM32F4 and STM32F7 series seem to have several devices that check these boxes (e.g. the STM32F723IE); their clock rates should suffice to do the calculations in between the frames. It might be necessary to run without DMA so as to receive the right interrupts to do the calculation for one side while the other side is transmitting. There'll be a TX buffer empty interrupt on the sending side twice per sample / four times per cycle, so there's ample opportunity to start the calculation. Given the synchronicity of the channels, the work for all of them should be doable in any one of their interrupts, as they become ready at about the same time. The remaining time (i.e. outside the interrupt handlers) would be needed to pack the audio data for USB and handle the actual packet transmission (which is largely DMA management).
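
A quick sanity check of "their clock rates should suffice": the sketch below computes how many CPU cycles fit into one sample period. The 216MHz core clock is an assumption (I believe that is the maximum for STM32F7 parts, but check the datasheet), and the multiply-accumulate count simply follows from one product per input and output channel; both function names are made up for this example.

```python
def cycles_per_sample(cpu_hz, sample_rate_hz):
    """Whole CPU cycles available within one audio sample period."""
    return cpu_hz // sample_rate_hz


def macs_per_frame(mics, outputs=2):
    """Multiply-accumulates per output frame: (playback + mics) per output channel."""
    return outputs * (1 + mics)


CPU_HZ = 216_000_000  # assumed core clock, see above

for mics in (4, 8):
    budget = cycles_per_sample(CPU_HZ, 192_000)
    macs = macs_per_frame(mics)
    print(f"{mics} mics: {budget} cycles per sample, {macs} MACs, {budget // macs} cycles available per MAC")
```

Even at 192kHz that leaves on the order of a hundred cycles per multiply-accumulate, though interrupt entry/exit and the USB packing have to come out of the same budget.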

Additional uses

The synchronized audio input of four channels could double as a neat tool for toying with microphone arrays; four omnidirectional microphones should give some basic direction sensing, and could (with the stereo output) possibly be combined into some kind of sonar system.
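
As a rough idea of what basic direction sensing could look like, here is a sketch for a single pair of microphones. It assumes a far-field source (plane wave), a known microphone spacing and brute-force cross-correlation; the function names, the 10cm spacing and the test signal are all made up for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C


def best_lag(a, b, max_lag):
    """Lag in samples (positive: b trails a) that maximizes the cross-correlation."""
    def corr(lag):
        return sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=corr)


def arrival_angle_deg(lag_samples, sample_rate_hz, mic_spacing_m):
    """Angle off broadside of the microphone pair, from the time difference of arrival."""
    ratio = SPEED_OF_SOUND * lag_samples / sample_rate_hz / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against lags beyond the physical maximum
    return math.degrees(math.asin(ratio))


# Synthetic test signal: a click that reaches the second microphone 3 samples later
mic_a = [0.0] * 20
mic_b = [0.0] * 20
mic_a[5] = 1.0
mic_b[8] = 1.0

lag = best_lag(mic_a, mic_b, max_lag=5)
print(lag, arrival_angle_deg(lag, 48_000, 0.10))
```

With four microphones, several such pairs could be combined to resolve the left/right ambiguity of a single pair.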

For larger scale location and audio synchronization, it may make sense to synchronize the audio streams of several such devices. That can be accomplished by a dedicated GPIO pin for synchronization (which might reset the LR clock or at least inject a timestamp). On larger scales, radio or GPS-driven clock synchronization would be nice to have.

Incubator status

Too many projects around to follow up on this; it would scratch an itch, but nothing more right now.

The next step before sketching up a schematic (fun as it'd be) would be to verify that no USB sound cards with hardware routing/mixing and sufficient audio inputs already exist.

This page is part of chrysn's public personal idea incubator; go up for its other entries, or read about the idea of having an idea incubator for more information on what this is.