Musher Module¶
pybind11 musher_python plugin
Audio Decoders¶
-
musher.
load_audio_file
(file_path: str) → numpy.ndarray[numpy.uint8]¶ Load the data from an audio file.
- Parameters
file_path (str) – File path to a .wav file.
- Returns
Audio file data.
- Return type
numpy.ndarray[numpy.uint8]
-
musher.
decode_wav_from_data
(file_data: List[int]) → dict¶ Decode a wav file.
Example
>>> wav_decoded = musher.decode_wav_from_file(path_to_mp3_file) >>> print(wav_decoded) { 'avg_bitrate_kbps': 1411, 'bit_depth': 16, 'channels': 2, 'file_type': 'wav', 'length_in_seconds': 30.0, 'mono': False, 'normalized_samples': array([ [ 0., 0., 0., ..., -0.33203125, -0.32833862, -0.3274536 ], [ 0., 0., 0., ..., -0.29162598, -0.27130127, -0.25457764] ], dtype=float32), 'sample_rate': 44100, 'samples_per_channel': 1323000, 'stereo': True }
See notes for extra details.
- Parameters
file_data (List[int]) – WAV file data.
- Returns
.wav file information.
- Return type
dict
-
musher.
decode_wav_from_file
(file_path: str) → dict¶ Overloaded wrapper around DecodeWav that accepts a file path to a .wav file.
See
musher.decode_wav_from_data()
for an example.See notes for extra details.
- Parameters
file_path (str) – File path to a .wav file.
- Returns
.wav file information.
- Return type
dict
-
musher.
decode_mp3_from_file
(file_path: str) → dict¶ Decode an mp3 file.
Example
>>> mp3_decoded = musher.decode_mp3_from_file(audio_file_data) >>> print(mp3_decoded) { 'avg_bitrate_kbps': 1411, 'channels': 2, 'file_type': 'mp3', 'length_in_seconds': 30.0, 'mono': False, 'normalized_samples': array([ [ 0., 0., 0., ..., -0.33203125, -0.32833862, -0.3274536 ], [ 0., 0., 0., ..., -0.29162598, -0.27130127, -0.25457764] ], dtype=float32), 'sample_rate': 44100, 'samples_per_channel': 1323000, 'stereo': True }
See notes for extra details.
- Parameters
file_path (str) – File path to a .mp3 file.
- Returns
.mp3 file information.
- Return type
dict
NOTES¶
audio_decoded = musher.decode_wav_from_file(abs_audio_file_path)
# Will be audio channel 1.
channel_one = audio_decoded["normalized_samples"][0]
# Will be audio channel 2 OR will not exist if mono audio (audio_decoded["mono"]==False).
channel_two_maybe = audio_decoded["normalized_samples"][1]
Framecutter¶
-
class
musher.
Framecutter
(self: musher.musher_python.Framecutter, buffer: List[float], frame_size: int = 1024, hop_size: int = 512, start_from_center: bool = True, last_frame_to_end_of_file: bool = False, valid_frame_threshold_ratio: float = 0.0) → None¶ This class is an iterator.
Examples
Iterate over the class like so:
>>> buffer = [1., 2., 3., 4., 5.] >>> framecutter = musher.Framecutter(buffer, 3, 2, True, False, 0.) >>> for frame in framecutter: ... print(frame) ... [0.0, 0.0, 1.0] [1.0, 2.0, 3.0] [3.0, 4.0, 5.0] [5.0, 0.0, 0.0]
Construct a new Framecutter object
- Parameters
buffer (List[float]) – Buffer from which to read data.
frame_size (int, optional) – Output frame size. Defaults to 1024.
hop_size (int, optional) – Hop size between frames. Defaults to 512.
start_from_center (bool, optional) – If true start from the center of the buffer (zero-centered at -frameSize/2) or if false the first frame at time 0 (centered at frameSize/2). Defaults to True.
last_frame_to_end_of_file (bool, optional) – Whether the beginning of the last frame should reach the end of file. Only applicable if start_from_center is false. Defaults to False.
valid_frame_threshold_ratio (List[float], optional) – frames smaller than this ratio will be discarded, those larger will be zero-padded to a full frame. (i.e. a value of 0 will never discard frames and a value of 1 will only keep frames that are of length ‘frameSize’). Defaults to 0.0.
- Returns
Cut frame.
- Return type
List[float]
HPCP¶
-
musher.
hpcp
(frequencies: List[float], magnitudes: List[float], size: int = 12, reference_frequency: float = 440.0, harmonics: int = 0, band_preset: bool = True, band_split_frequency: float = 500.0, min_frequency: float = 40.0, max_frequency: float = 5000.0, _weight_type: str = 'squared cosine', window_size: float = 1.0, max_shifted: bool = False, non_linear: bool = False, _normalized: str = 'unit max') → numpy.ndarray[numpy.float64]¶ Computes a Harmonic Pitch Class Profile (HPCP) from the spectral peaks of a signal.
HPCP is a k*12 dimensional list which represents the intensities of the twelve (k==1) semitone pitch classes (corresponsing to notes from A to G#), or subdivisions of these (k>1).
- Parameters
frequencies (List[float]) – Frequencies (positions) of the spectral peaks [Hz].
magnitudes (List[float]) – Magnitudes (heights) of the spectral peaks.
size (int, optional) – Size of the output HPCP (must be a positive nonzero multiple of 12). Defaults to 12.
reference_frequency (float, optional) – Reference frequency for semitone index calculation, corresponding to A3 [Hz]. Defaults to 440.0.
harmonics (int, optional) – Number of harmonics for frequency contribution, 0 indicates exclusive fundamental frequency contribution. Defaults to 0.
band_preset (bool, optional) – Enables whether to use a band preset. Defaults to True.
band_split_frequency (float, optional) – Split frequency for low and high bands, not used if bandPreset is false [Hz]. Defaults to 500.0.
min_frequency (float, optional) – Minimum frequency that contributes to the HPCP [Hz] (the difference between the min and split frequencies must not be less than 200.0 Hz). Defaults to 40.0.
max_frequency (float, optional) – Maximum frequency that contributes to the HPCP [Hz] (the difference between the max and split frequencies must not be less than 200.0 Hz). Defaults to 5000.0.
_weight_type (str, optional) – Type of weighting function for determining frequency contribution. Defaults to ‘squared cosine’.
window_size (float, optional) – Size, in semitones, of the window used for the weighting. Defaults to 1.0.
max_shifted (bool, optional) – Whether to shift the HPCP list so that the maximum peak is at index 0. Defaults to False.
non_linear (bool, optional) – Apply non-linear post-processing to the output (use with _normalized=’unit max’). Boosts values close to 1, decreases values close to 0. Defaults to False.
_normalized (str, optional) – Whether to normalize the HPCP list. Defaults to ‘unit max’.
- Returns
Resulting harmonic pitch class profile.
- Return type
numpy.ndarray[numpy.float64]
-
musher.
hpcp_from_peaks
(peaks: musher.musher_python.peaks, size: int = 12, reference_frequency: float = 440.0, harmonics: int = 0, band_preset: bool = True, band_split_frequency: float = 500.0, min_frequency: float = 40.0, max_frequency: float = 5000.0, _weight_type: str = 'squared cosine', window_size: float = 1.0, max_shifted: bool = False, non_linear: bool = False, _normalized: str = 'unit max') → numpy.ndarray[numpy.float64]¶ Overloaded function for HPCP that accepts a list of peaks.
Refer to original HPCP function for more details.
- Parameters
peaks (musher.peaks) – list of spectral peaks, each peak being a tuple (frequency, magnitude).
size (int, optional) – Size of the output HPCP (must be a positive nonzero multiple of 12). Defaults to 12.
reference_frequency (float, optional) – Reference frequency for semitone index calculation, corresponding to A3 [Hz]. Defaults to 440.0.
harmonics (int, optional) – Number of harmonics for frequency contribution, 0 indicates exclusive fundamental frequency contribution. Defaults to 0.
band_preset (bool, optional) – Enables whether to use a band preset. Defaults to True.
band_split_frequency (float, optional) – Split frequency for low and high bands, not used if bandPreset is false [Hz]. Defaults to 500.0.
min_frequency (float, optional) – Minimum frequency that contributes to the HPCP [Hz] (the difference between the min and split frequencies must not be less than 200.0 Hz). Defaults to 40.0.
max_frequency (float, optional) – Maximum frequency that contributes to the HPCP [Hz] (the difference between the max and split frequencies must not be less than 200.0 Hz). Defaults to 5000.0.
_weight_type (str, optional) – Type of weighting function for determining frequency contribution. Defaults to ‘squared cosine’.
window_size (float, optional) – Size, in semitones, of the window used for the weighting. Defaults to 1.0.
max_shifted (bool, optional) – Whether to shift the HPCP list so that the maximum peak is at index 0. Defaults to False.
non_linear (bool, optional) – Apply non-linear post-processing to the output (use with _normalized=’unit max’). Boosts values close to 1, decreases values close to 0. Defaults to False.
_normalized (str, optional) – Whether to normalize the HPCP list. Defaults to ‘unit max’.
- Returns
Resulting harmonic pitch class profile.
- Return type
numpy.ndarray[numpy.float64]
Mono Mixer¶
-
musher.
mono_mixer
(input: List[List[float]]) → numpy.ndarray[numpy.float64]¶ Downmixes the signal into a single channel given a stereo signal.
If the signal was already a monoaural, it is left unchanged.
- Parameters
input (List[List[float]]) – Stereo or mono audio signal
- Returns
Downmixed audio signal
- Return type
numpy.ndarray[numpy.float64]
Key¶
-
musher.
estimate_key
(pcp: List[float], use_polphony: bool = True, use_three_chords: bool = True, num_harmonics: int = 4, slope: float = 0.6, profile_type: str = 'Bgate', use_maj_min: bool = False) → dict¶ Computes key estimate given a pitch class profile (HPCP).
- Parameters
pcp (List[float]) – The input pitch class profile.
use_polphony (bool, optional) – Enables the use of polyphonic profiles to define key profiles (this includes the contributions from triads as well as pitch harmonics). Defaults to True.
use_three_chords (bool, optional) – Consider only the 3 main triad chords of the key (T, D, SD) to build the polyphonic profiles. Defaults to True.
num_harmonics (int, optional) – Number of harmonics that should contribute to the polyphonic profile (1 only considers the fundamental harmonic). Defaults to 4.
slope (float, optional) – Value of the slope of the exponential harmonic contribution to the polyphonic profile. Defaults to 0.6.
profile_type (str, optional) – The type of polyphic profile to use for correlation calculation. Defaults to ‘Bgate’.
use_maj_min (bool, optional) – Use a third profile called ‘majmin’ for ambiguous tracks. Only available for the edma, bgate and braw profiles. Defaults to False.
- Returns
Details of key estimate.
- Return type
KeyOutput
-
musher.
detect_key
(normalized_samples: List[List[float]], sample_rate: float = 44100.0, profile_type: str = 'Bgate', use_polphony: bool = True, use_three_chords: bool = True, num_harmonics: int = 4, slope: float = 0.6, use_maj_min: bool = False, pcp_size: int = 36, frame_size: int = 4096, hop_size: int = 512, window_type_func: Callable[[List[float]], List[float]] = <built-in method of PyCapsule object at 0x7f6bb3f2d360>, max_num_peaks: int = 100, window_size: float = 0.5) → dict¶ Computes key estimate given normalized samples.
- Parameters
normalized_samples (List[List[float]]) – The input pitch class profile.
sample_rate (float, optional) – Sampling rate of the audio signal [Hz]. Defaults to 44100.0.
profile_type (str, optional) – The type of polyphic profile to use for correlation calculation. Defaults to ‘Bgate’.
use_polphony (bool, optional) – Enables the use of polyphonic profiles to define key profiles (this includes the contributions from triads as well as pitch harmonics). Defaults to True.
use_three_chords (bool, optional) – Consider only the 3 main triad chords of the key (T, D, SD) to build the polyphonic profiles. Defaults to True.
num_harmonics (int, optional) – Number of harmonics that should contribute to the polyphonic profile (1 only considers the fundamental harmonic). Defaults to 4.
slope (float, optional) – Value of the slope of the exponential harmonic contribution to the polyphonic profile. Defaults to 0.6.
use_maj_min (bool, optional) – Use a third profile called ‘majmin’ for ambiguous tracks. Only available for the edma, bgate and braw profiles. Defaults to False.
pcp_size (int, optional) – Number of array elements used to represent a semitone times 12. Defaults to 36.
frame_size (int, optional) – Output frame size of framecutter. Defaults to 4096.
hop_size (int, optional) – Hop size between frames of framecutter. Defaults to 512.
window_type_func (Callable[[List[float]], List[float]], optional) – The window type function. Examples: BlackmanHarris92dB, BlackmanHarris62dB… Defaults to BlackmanHarris62dB.
max_num_peaks (int, optional) – Maximum number of returned peaks (set to 0 to return all peaks) for spectral peaks. Defaults to 100.
window_size (float, optional) – Size, in semitones, of the window used for the weighting for HPCP. Defaults to 0.5.
- Returns
Details of key estimate.
- Return type
KeyOutput
Spectrum¶
-
musher.
convert_to_frequency_spectrum
(audio_frame: List[float]) → numpy.ndarray[numpy.float64]¶ Computes the frequency spectrum of an array of Reals.
The resulting spectrum has a size which is half the size of the input array plus one. Bins contain raw (linear) magnitude values.
- Parameters
frame (List[float]) – Input audio frame.
- Returns
Frequency spectrum of the input audio signal.
- Return type
numpy.ndarray[numpy.float64]
Spectral Peaks¶
-
musher.
spectral_peaks
(input_spectrum: List[float], threshold: float = - 1000.0, sort_by: str = 'position', max_num_peaks: int = 100, sample_rate: float = 44100.0, min_pos: int = 0, max_pos: int = 0) → musher.musher_python.peaks¶ Extracts peaks from a spectrum.
It is important to note that the peak algorithm is independent of an input that is linear or in dB, so one has to adapt the threshold to fit with the type of data fed to it. The algorithm relies on PeakDetect algorithm which is run with parabolic interpolation [1]. The exactness of the peak-searching depends heavily on the windowing type. It gives best results with dB input, a blackman-harris 92dB window and interpolation set to true. According to [1], spectral peak frequencies tend to be about twice as accurate when dB magnitude is used rather than just linear magnitude. For further information about the peak detection, see the description of the PeakDetection algorithm.
References:
[1] Peak Detection, http://ccrma.stanford.edu/~jos/parshl/Peak_Detection_Steps_3.html
- Parameters
input_spectrum (List[float]) – Input spectrum.
threshold (float, optional) – Peaks below this given threshold are not outputted. Defaults to -1000.0.
sort_by (str, optional) – Ordering type of the outputted peaks (ascending by frequency (position) or descending by magnitude (height)). Defaults to ‘position’.
max_num_peaks (int, optional) – Maximum number of returned peaks (set to 0 to return all peaks). Defaults to 100.
sample_rate (float, optional) – Sampling rate of the audio signal [Hz]. Defaults to 44100.0.
min_pos (int, optional) – Maximum frequency (position) of the range to evaluate [Hz]. Defaults to 0.
max_pos (int, optional) – Minimum frequency (position) of the range to evaluate [Hz]. Defaults to 0.
- Returns
List of spectral peaks, each peak being a tuple (frequency, magnitude).
- Return type
musher.peaks
Windowing¶
-
musher.
square
(window: List[float]) → numpy.ndarray[numpy.float64]¶ Square windowing function.
- Parameters
window (List[float]) – Audio signal window.
- Returns
Square window.
- Return type
numpy.ndarray[numpy.float64]
-
musher.
blackmanharris
(window: List[float], a0: float, a1: float, a2: float, a3: float) → numpy.ndarray[numpy.float64]¶ Blackmanharris windowing algorithm.
Window functions help control spectral leakage when doing Fourier Analysis.
- Parameters
window (List[float]) – Audio signal window.
a0 (float) – Constant a0.
a1 (float) – Constant a1.
a2 (float) – Constant a2.
a3 (float) – Constant a3.
- Returns
BlackmanHarris window.
- Return type
numpy.ndarray[numpy.float64]
-
musher.
blackmanharris62dB
(window: List[float]) → numpy.ndarray[numpy.float64]¶ Blackmanharris62db windowing algorithm.
- Parameters
window (List[float]) – Audio signal window.
- Returns
Blackmanharris62db window.
- Return type
numpy.ndarray[numpy.float64]
-
musher.
blackmanharris92dB
(window: List[float]) → numpy.ndarray[numpy.float64]¶ Blackmanharris92db windowing algorithm.
- Parameters
window (List[float]) – Audio signal window.
- Returns
Blackmanharris92db window.
- Return type
numpy.ndarray[numpy.float64]
-
musher.
windowing
(audio_frame: List[float], window_type_func: Callable[[List[float]], List[float]] = <built-in method of PyCapsule object at 0x7f6bb4611f00>, zero_padding_size: int = 0, zero_phase: bool = True, _normalize: bool = True) → numpy.ndarray[numpy.float64]¶ Applies windowing to an audio signal.
It optionally applies zero-phase windowing and optionally adds zero-padding. The resulting windowed frame size is equal to the incoming frame size plus the number of padded zeros. By default, the available windows are normalized (to have an area of 1) and then scaled by a factor of 2.
References:
[1] F. J. Harris, On the use of windows for harmonic analysis with the discrete Fourier transform, Proceedings of the IEEE, vol. 66, no. 1, pp. 51-83, Jan. 1978
[2] Window function - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Window_function
- Parameters
audio_frame (List[float]) – Input audio frame.
window_type_func (Callable[[List[float]], List[float]], optional) – The window type function. Examples: BlackmanHarris92dB, BlackmanHarris62dB… Defaults to BlackmanHarris92dB.
zero_padding_size (int, optional) – Size of the zero-padding. Defaults to 0.
zero_phase (bool, optional) – Enables zero-phase windowing. Defaults to True.
_normalize (bool, optional) – Specify whether to normalize windows (to have an area of 1) and then scale by a factor of 2. Defaults to True.
- Returns
Windowed audio frame.
- Return type
numpy.ndarray[numpy.float64]