For new recordings, it makes sense to set up a recording situation that is optimized for technologies such as ASR, alignment, emotion detection, facial expression analysis, and more.
Some small guidelines:
- record each speaker on a separate audio channel via a separate microphone
- record the speech with a high sample rate and a high bit depth (not 16 kHz, 16-bit, mono, but 96 kHz, 32-bit, one channel per speaker)
- use microphones that have a more-or-less fixed distance to the mouth
- use microphones that suppress, as much as possible, sound from sources other than the speaker's mouth
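
The format guidelines above can be checked automatically before a recording goes into a processing pipeline. The following is a minimal sketch, assuming the recording is stored as an integer PCM WAV file; the function names `check_recording` and `make_silent_wav` and the default thresholds (96 kHz, 4 bytes per sample) are illustrative choices based on the values mentioned in the list, not a fixed standard. Python's standard `wave` module is used, so files in other encodings may need a different reader.

```python
import io
import wave


def check_recording(wav_file, n_speakers, min_rate_hz=96_000, min_bytes_per_sample=4):
    """Return a list of problems; an empty list means the file passes.

    `wav_file` is a path string or a binary file object (hypothetical helper).
    """
    problems = []
    with wave.open(wav_file, "rb") as wav:
        if wav.getnchannels() < n_speakers:
            problems.append(
                f"{wav.getnchannels()} channel(s) for {n_speakers} speaker(s)"
            )
        if wav.getframerate() < min_rate_hz:
            problems.append(
                f"sample rate {wav.getframerate()} Hz is below {min_rate_hz} Hz"
            )
        if wav.getsampwidth() < min_bytes_per_sample:
            problems.append(
                f"{8 * wav.getsampwidth()}-bit samples, "
                f"expected at least {8 * min_bytes_per_sample}-bit"
            )
    return problems


def make_silent_wav(n_channels, rate_hz, bytes_per_sample, n_frames=10):
    """Build a tiny silent WAV in memory, only to demonstrate the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(n_channels)
        wav.setframerate(rate_hz)
        wav.setsampwidth(bytes_per_sample)
        wav.writeframes(bytes(n_channels * bytes_per_sample * n_frames))
    buffer.seek(0)
    return buffer


# Two speakers, two channels, 96 kHz, 32-bit: no problems reported.
print(check_recording(make_silent_wav(2, 96_000, 4), n_speakers=2))
# Two speakers in a 16 kHz, 16-bit mono file: three problems reported.
print(check_recording(make_silent_wav(1, 16_000, 2), n_speakers=2))
```

Such a check only covers the file format; microphone placement and suppression of other sound sources still have to be verified by listening.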
This approach has clear benefits. Separate channels per speaker make automatic turn-taking detection possible, prevent a louder speaker from drowning out a softer one, and allow the speech to be transcribed even when people talk at the same time.
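
To illustrate why separate channels make turn-taking detection straightforward, here is a minimal sketch of energy-based detection per channel. It assumes each speaker's samples are already available as a list of floats; the names `frame_rms` and `detect_turns`, the frame length, and the threshold are hypothetical and would need tuning on real recordings. Real systems typically use a proper voice activity detector rather than a plain energy threshold.

```python
import math


def frame_rms(samples, frame_len):
    """Root-mean-square energy of each full frame of `frame_len` samples."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_turns(channels, frame_len, threshold):
    """Return (speaker, first_frame, last_frame) for each stretch of speech.

    `channels` maps a speaker label to that speaker's own sample list.
    Each channel is analysed independently, so overlapping speech is kept.
    """
    turns = []
    for speaker, samples in channels.items():
        energies = frame_rms(samples, frame_len)
        start = None
        for idx, energy in enumerate(energies):
            active = energy >= threshold
            if active and start is None:
                start = idx  # speech begins
            elif not active and start is not None:
                turns.append((speaker, start, idx - 1))  # speech ended
                start = None
        if start is not None:  # still speaking at the end of the channel
            turns.append((speaker, start, len(energies) - 1))
    return sorted(turns, key=lambda turn: (turn[1], turn[0]))


# Speaker A is loud in frames 0-1; speaker B is much softer in frames 1-3.
channels = {
    "A": [0.5] * 8 + [0.0] * 8,
    "B": [0.0] * 4 + [0.05] * 12,
}
print(detect_turns(channels, frame_len=4, threshold=0.02))
# → [('A', 0, 1), ('B', 1, 3)]
```

In frame 1 both speakers are active. Because each channel is analysed on its own, the softer speaker B is still detected there, which would be much harder on a single mixed channel.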