I tested all of these solutions and none of them worked for me, so I came up with one that does work and is relatively fast.
Prerequisites:
1. It works with `ffmpeg`
2. It is based on code by Vincent Berthiaume.
3. It requires `numpy` (although it doesn't need much from numpy and a solution without `numpy` would probably be relatively easy to write and further increase speed)
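As a rough sketch of what a `numpy`-free peak check might look like (this is not part of the script below), the standard `array` module can decode one raw chunk of 16-bit PCM. The name `chunk_peak` is made up for illustration, and the function assumes the little-endian `s16le` format and, like the script, only looks at the positive peak.

```python
import array
import sys

def chunk_peak(raw):
    """Return the largest sample in a buffer of raw 16-bit little-endian PCM.

    Hypothetical helper: assumes len(raw) is a multiple of 2 bytes.
    """
    samples = array.array('h')      # 'h' = signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == 'big':
        samples.byteswap()          # the data is little-endian (s16le)
    return max(samples) if samples else 0

# four samples: 0, 100, -200, 50
peak = chunk_peak(b'\x00\x00\x64\x00\x38\xff\x32\x00')
```

Whether this is actually faster than `numpy` would have to be measured.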
Mode of operation, rationale:
1. The solutions provided here were based on AI, were extremely slow, or loaded the entire audio into memory, which was not feasible for my purposes. I wanted to split a recording of all of Bach's Brandenburg Concertos into individual tracks; the 2 LPs are 2 hours long, which at 44 kHz 16-bit stereo is 1.4 GB in memory and very slow to process. Ever since I first stumbled upon this post I kept telling myself that there must be a simple way, as this is a mere threshold filter operation that doesn't need much overhead and could be done on tiny chunks of audio at a time. A couple of months later I stumbled upon another post that gave me the idea of how to split audio relatively efficiently.
2. The command line arguments give the source mp3 (or whatever `ffmpeg` can read), the silence duration and the noise threshold value. For my Bach LP recording, 1-second chunks at 0.01 of full amplitude did the trick.
3. It lets `ffmpeg` convert the input to uncompressed 16-bit 22 kHz mono PCM and pass it back via `subprocess.Popen`, with the advantage that `ffmpeg` does so very fast and in little chunks which do not occupy much memory.
4. Back in Python, two temporary `numpy` arrays holding the previous and the current buffer are concatenated, and their peak is compared with the given threshold. If the peak does not exceed it, the block is silent, and the program (naively, I admit) simply counts the samples that are "silent". If they add up to at least the given minimum silence duration, (again naively) the middle of the current interval is taken as the splitting moment.
5. The program doesn't actually do anything with the source file; instead it creates a batch file that, when run, tells `ffmpeg` to take the segments bounded by these "silences" and save them into separate files.
6. The user can then run the output batch file, perhaps after filtering out the repeated micro-intervals of silence that show up when there are long pauses between songs.
7. This solution both works and is fast (none of the other solutions in this thread worked for me).
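The detection in step 4 can be isolated from the `ffmpeg` plumbing. The following is a minimal sketch under some assumptions: `find_splits` is a made-up name, the chunks are already decoded `int16` arrays of `dur` seconds each, and `rate` is the sampling rate the chunks were decoded at.

```python
import numpy

def find_splits(chunks, dur, thr, rate=22050):
    """Return split times (in seconds) for an iterable of int16 chunks.

    Hypothetical helper: each chunk is assumed to hold `dur` seconds of audio.
    """
    len2 = dur * rate                           # samples per chunk
    prev = numpy.zeros(1, dtype='int16')        # dummy "previous" chunk
    pos = 0.0
    splits = []
    for arr in chunks:
        window = numpy.concatenate([prev, arr])
        # silent if the peak of previous + current chunk stays at or below thr
        if numpy.amax(window) <= thr and window.size >= len2:
            splits.append(pos + dur * 0.5)
        pos += dur
        prev = arr
    return splits

# toy input: 4 samples per second, one loud chunk, two quiet ones, one loud
loud = numpy.array([1000, 0, 0, 0], dtype='int16')
quiet = numpy.zeros(4, dtype='int16')
splits = find_splits([loud, quiet, quiet, loud], dur=1.0, thr=100, rate=4)
```

In the toy input only the third window (two quiet chunks in a row) counts as silence, so a single split point is reported.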
The little code:
import subprocess as sp
import sys
import numpy

FFMPEG_BIN = "ffmpeg.exe"

print('ASplit.py <src.mp3> <silence duration in seconds> <threshold amplitude 0.0 .. 1.0>')

src = sys.argv[1]
dur = float(sys.argv[2])
# scaled by 65535 (the full 16-bit range), although signed samples peak at 32767
thr = int(float(sys.argv[3]) * 65535)

# newline='' keeps the explicit '\r\n' line endings as written
f = open('%s-out.bat' % src, 'w', newline='')

tmprate = 22050
len2 = dur * tmprate        # samples per chunk
buflen = int(len2) * 2      # bytes per chunk: t * rate * 2 bytes (16 bits)

oarr = numpy.arange(1, dtype='int16')   # just a dummy array for the first chunk

command = [FFMPEG_BIN,
           '-i', src,
           '-f', 's16le',
           '-acodec', 'pcm_s16le',
           '-ar', str(tmprate),   # output sampling rate
           '-ac', '1',            # '1' for mono
           '-']                   # '-' sends the output to stdout
pipe = sp.Popen(command, stdout=sp.PIPE, bufsize=10**8)

pos = 0
opos = 0
part = 0

while True:
    raw = pipe.stdout.read(buflen)
    if not raw:
        break   # end of the stream
    arr = numpy.frombuffer(raw, dtype="int16")
    rng = numpy.concatenate([oarr, arr])
    mx = numpy.amax(rng)    # positive peak of the previous + current chunk
    if mx <= thr:
        # the peak in this range does not exceed the threshold value
        trng = (rng <= thr) * 1
        # effectively a mask with all samples <= thr set to 1 and > thr set to 0
        sm = numpy.sum(trng)
        # i.e. simply (naively) count how many 1's there were
        if sm >= len2:
            part += 1
            apos = pos + dur * 0.5
            print(mx, sm, len2, apos)
            f.write('ffmpeg -i "%s" -ss %f -to %f -c copy -y "%s-p%04d.mp3"\r\n' % (src, opos, apos, src, part))
            opos = apos
    pos += dur
    oarr = arr

# the last segment runs from the final split point to the end of the file
part += 1
f.write('ffmpeg -i "%s" -ss %f -to %f -c copy -y "%s-p%04d.mp3"\r\n' % (src, opos, pos, src, part))
f.close()
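
During a long pause every chunk is silent, so the batch file ends up with a run of very short segments. One possible way to filter them, sketched here under the assumption that the split times have been collected in a list rather than written out directly, is to drop any split point that follows the previously kept one too closely. `drop_short_segments` and `min_len` are made-up names, not part of the script.

```python
def drop_short_segments(splits, min_len):
    """Keep only split times at least `min_len` seconds after the last kept one.

    Hypothetical helper: `splits` is assumed to be sorted in ascending order.
    """
    kept = []
    last = 0.0                      # start of the recording
    for t in splits:
        if t - last >= min_len:
            kept.append(t)
            last = t
    return kept

# three split points one second apart come from a single long pause
kept = drop_short_segments([100.5, 101.5, 102.5, 300.5], min_len=30)
```

With this choice the first split of a long pause is kept, so the following track starts with the remainder of the silence; keeping the last split of the run instead would be an equally reasonable variant.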