Audio Decoders

struct musher::core::AudioDecoded

Decoded audio file information that is common across all audio files.

Subclassed by musher::core::Mp3Decoded, musher::core::WavDecoded

Public Members

uint32_t sample_rate

Sampling rate of the audio signal [Hz].

int channels

Number of audio channels in the buffer.

bool mono

True if audio is mono.

bool stereo

True if audio is stereo.

int samples_per_channel

Number of samples per channel.

double length_in_seconds

Length of the audio in seconds, based on the number of samples and the sample rate.

std::string file_type

Type of the file decoded.

int avg_bitrate_kbps

Average bitrate of the buffer [kbps].

std::vector<std::vector<double>> normalized_samples

Normalized samples of the audio file.

normalized_samples[0] holds channel 1

normalized_samples[1] holds channel 2 (Will not exist if mono audio)

struct musher::core::WavDecoded : public musher::core::AudioDecoded

Decoded WAV file information.

Contains same attributes as AudioDecoded.

Public Members

int bit_depth

Bit depth of each sample.

struct musher::core::Mp3Decoded : public musher::core::AudioDecoded

Decoded Mp3 file information.

Contains same attributes as AudioDecoded.

std::vector<uint8_t> musher::core::LoadAudioFile(const std::string &file_path)

Load the data from an audio file.

Return

std::vector<uint8_t> Audio file data.

Parameters
  • file_path: File path to an audio file.

WavDecoded musher::core::DecodeWav(const std::vector<uint8_t> &file_data)

Decode a wav file.

Return

WavDecoded .wav file information.

Parameters
  • file_data: WAV file data.

WavDecoded musher::core::DecodeWav(const std::string &file_path)

Overloaded wrapper around DecodeWav that accepts a file path to a .wav file.

Return

WavDecoded .wav file information.

Parameters
  • file_path: File path to a .wav file.

Mp3Decoded musher::core::DecodeMp3(const std::string file_path)

Decode an mp3 file.

Return

Mp3Decoded .mp3 file information.

Parameters
  • file_path: File path to a .mp3 file.

FFT Convolve

std::vector<double> musher::core::CenterVector(const std::vector<double> &vec, size_t new_shape)

Center a vector with respect to the full discrete linear convolution of the input.

Return

std::vector<double> Centered vector.

Parameters
  • vec: Vector

  • new_shape: New shape of the vector.
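
A minimal sketch of the centering step, assuming it keeps the middle new_shape elements of the vector in the same way as the SciPy code that FFTConvolve cites. CenterSlice is a hypothetical name used for illustration and is not part of musher.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the middle `new_shape` elements of `vec`.
std::vector<double> CenterSlice(const std::vector<double> &vec, std::size_t new_shape) {
  assert(new_shape <= vec.size());
  // When the size difference is odd, integer division drops one more
  // element from the right than from the left.
  const std::size_t start = (vec.size() - new_shape) / 2;
  return std::vector<double>(vec.begin() + start, vec.begin() + start + new_shape);
}
```

For example, centering {1, 2, 3, 4, 5} to a new shape of 3 keeps {2, 3, 4}.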

std::vector<double> musher::core::FFTConvolve(const std::vector<double> &vec1, const std::vector<double> &vec2)

Perform ‘same’ convolve of two 1-dimensional arrays using FFT.

Convolve vec1 and vec2 using the fast Fourier transform method. The output is the same size as vec1, centered with respect to the full discrete linear convolution of the inputs.

This function was heavily inspired by https://github.com/scipy/scipy/blob/12fa74e97d3d18ca3a4e6991327663e88462f238/scipy/signal/signaltools.py#L551 and https://github.com/scipy/scipy/blob/master/scipy/fft/_pocketfft/pypocketfft.cxx

Return

std::vector<double> A 1-dimensional array containing a subset of the discrete linear convolution of vec1 with vec2.

Parameters
  • vec1: Vector 1

  • vec2: Vector 2
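
To make the 'same' output shape concrete, here is a reference sketch that computes the result by direct summation instead of an FFT. It is slower, at O(N*M), but should agree with an FFT-based result up to rounding. SameConvolveDirect is a hypothetical name and not part of musher.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference implementation using direct summation, not an FFT.
std::vector<double> SameConvolveDirect(const std::vector<double> &vec1,
                                       const std::vector<double> &vec2) {
  assert(!vec1.empty() && !vec2.empty());
  // The full linear convolution has vec1.size() + vec2.size() - 1 elements.
  std::vector<double> full(vec1.size() + vec2.size() - 1, 0.0);
  for (std::size_t i = 0; i < vec1.size(); ++i)
    for (std::size_t j = 0; j < vec2.size(); ++j)
      full[i + j] += vec1[i] * vec2[j];
  // 'same' mode: keep the central vec1.size() elements of the full result.
  const std::size_t start = (full.size() - vec1.size()) / 2;
  return std::vector<double>(full.begin() + start, full.begin() + start + vec1.size());
}
```

With vec1 = {1, 2, 3} and vec2 = {0, 1, 0.5}, the full convolution is {0, 1, 2.5, 4, 1.5} and the 'same' result is {1, 2.5, 4}.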

Framecutter

class musher::core::Framecutter

This class should be treated like an iterator.

Framecutter framecutter(audio_signal);

for (const std::vector<double> &frame : framecutter) {
    perform_work_on_frame(frame);
}

Public Functions

Framecutter(const std::vector<double> buffer, int frame_size = 1024, int hop_size = 512, bool start_from_center = true, bool last_frame_to_end_of_file = false, double valid_frame_threshold_ratio = 0.)

Construct a new Framecutter object.

Parameters
  • buffer: Buffer from which to read data.

  • frame_size: Output frame size.

  • hop_size: Hop size between frames.

  • start_from_center: If true, the first frame is zero-centered (it starts at -frame_size/2); if false, the first frame starts at time 0 (centered at frame_size/2).

  • last_frame_to_end_of_file: Whether the beginning of the last frame should reach the end of file. Only applicable if start_from_center is false.

  • valid_frame_threshold_ratio: Frames smaller than this ratio will be discarded; those larger will be zero-padded to a full frame (i.e. a value of 0 will never discard frames and a value of 1 will only keep frames that are of length frame_size).

std::vector<double> operator*() const

Each iteration returns a frame.

Return

std::vector<double> Cut frame.

std::vector<double> compute()

Computes the actual slicing of the frames, this function is run on each iteration to calculate the next frame.

This function should not be called by the user, it will be called internally while iterating.

Return

std::vector<double> Sliced frame.
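
The slicing idea can be sketched in a simplified form. This sketch assumes frames start at sample 0 (as with start_from_center set to false) and that every partial frame is kept and zero-padded (as with a valid_frame_threshold_ratio of 0). CutFramesSketch is a hypothetical name, and the number of frames it yields is not guaranteed to match Framecutter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplification of frame cutting: no centering, no discarding.
std::vector<std::vector<double>> CutFramesSketch(const std::vector<double> &buffer,
                                                 std::size_t frame_size,
                                                 std::size_t hop_size) {
  assert(frame_size > 0 && hop_size > 0);
  std::vector<std::vector<double>> frames;
  for (std::size_t start = 0; start < buffer.size(); start += hop_size) {
    const std::size_t end = std::min(start + frame_size, buffer.size());
    std::vector<double> frame(buffer.begin() + start, buffer.begin() + end);
    frame.resize(frame_size, 0.0);  // zero-pad a frame that runs past the end
    frames.push_back(frame);
  }
  return frames;
}
```

With a buffer of {1, 2, 3, 4, 5}, a frame size of 4 and a hop size of 2, this yields {1, 2, 3, 4}, {3, 4, 5, 0} and {5, 0, 0, 0}.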

HPCP

int musher::core::ArgMax(const std::vector<double> &input)

Get the arg max of a vector.

Checks if the vector is empty first.

Return

int Arg max

Parameters
  • input: Vector

template<typename T>
void musher::core::NormalizeInPlace(std::vector<T> &vec)

Normalize a vector so its largest value gets mapped to 1.

If the largest value is zero, the vector isn’t touched.

Template Parameters
  • T: Element type of the vector.

Parameters
  • vec: Vector to normalize.
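
A minimal sketch of max-normalization as described above. NormalizeMaxSketch is a hypothetical name, and the handling of an empty vector is an assumption.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch: divide every element by the largest element.
template <typename T>
void NormalizeMaxSketch(std::vector<T> &vec) {
  if (vec.empty()) return;  // assumed: nothing to do for an empty vector
  const T max_value = *std::max_element(vec.begin(), vec.end());
  if (max_value == T(0)) return;  // largest value is zero: leave untouched
  for (T &x : vec) x /= max_value;
}
```

Applied to {1, 2, 4}, the vector becomes {0.25, 0.5, 1}.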

template<typename T>
void musher::core::NormalizeSumInPlace(std::vector<T> &vec)

Normalize a vector so its sum is equal to 1.

The vector is not touched if it contains negative elements or the sum is zero.

Template Parameters
  • T: Element type of the vector.

Parameters
  • vec: Vector to normalize.
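
A minimal sketch of sum-normalization with the two guard conditions described above. NormalizeSumSketch is a hypothetical name used for illustration.

```cpp
#include <numeric>
#include <vector>

// Hypothetical sketch: divide every element by the sum of all elements.
template <typename T>
void NormalizeSumSketch(std::vector<T> &vec) {
  for (const T &x : vec)
    if (x < T(0)) return;  // a negative element: leave the vector untouched
  const T sum = std::accumulate(vec.begin(), vec.end(), T(0));
  if (sum == T(0)) return;  // zero sum: leave the vector untouched
  for (T &x : vec) x /= sum;
}
```

Applied to {1, 1, 2}, the vector becomes {0.25, 0.25, 0.5}, while {-1, 2} is left as it is.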

void musher::core::AddContributionWithWeight(double freq, double mag_lin, double reference_frequency, double window_size, WeightType weight_type, double harmonic_weight, std::vector<double> &hpcp)

Add contribution to the HPCP with weight.

Parameters
  • freq: Frequency [Hz]

  • mag_lin: Magnitude

  • reference_frequency: Reference frequency for semitone index calculation, corresponding to A3 [Hz].

  • window_size: Size, in semitones, of the window used for the weighting.

  • weight_type: Type of weighting function for determining frequency contribution.

  • harmonic_weight: Strength/weight of the harmonic.

  • hpcp: Harmonic pitch class profile.

void musher::core::AddContributionWithoutWeight(double freq, double mag_lin, double reference_frequency, double harmonic_weight, std::vector<double> &hpcp)

Add contribution to the HPCP without weight.

Parameters
  • freq: Frequency [Hz]

  • mag_lin: Magnitude

  • reference_frequency: Reference frequency for semitone index calculation, corresponding to A3 [Hz].

  • harmonic_weight: Strength/weight of the harmonic.

  • hpcp: Harmonic pitch class profile.

void musher::core::AddContribution(double freq, double mag_lin, double reference_frequency, double window_size, WeightType weight_type, std::vector<HarmonicPeak> harmonic_peaks, std::vector<double> &hpcp)

Adds the magnitude contribution of the given frequency as the tonic semitone.

As well as its possible contribution as a harmonic of another pitch.

Parameters
  • freq: Frequency [Hz]

  • mag_lin: Magnitude

  • reference_frequency: Reference frequency for semitone index calculation, corresponding to A3 [Hz].

  • window_size: Size, in semitones, of the window used for the weighting.

  • weight_type: Type of weighting function for determining frequency contribution.

  • harmonic_peaks: Weighting table of harmonic contribution.

  • hpcp: Harmonic pitch class profile.

std::vector<HarmonicPeak> musher::core::InitHarmonicContributionTable(int harmonics)

Builds a weighting table of harmonic contribution.

Higher harmonics contribute less and the fundamental frequency has a full harmonic strength of 1.0.

Return

std::vector<HarmonicPeak> Weighting table of harmonic contribution.

Parameters
  • harmonics: Number of harmonics for frequency contribution, 0 indicates exclusive fundamental frequency contribution.

std::vector<double> musher::core::HPCP(const std::vector<double> &frequencies, const std::vector<double> &magnitudes, unsigned int size = 12, double reference_frequency = 440.0, unsigned int harmonics = 0, bool band_preset = true, double band_split_frequency = 500.0, double min_frequency = 40.0, double max_frequency = 5000.0, std::string _weight_type = "squared cosine", double window_size = 1.0, bool max_shifted = false, bool non_linear = false, std::string _normalized = "unit max")

Computes a Harmonic Pitch Class Profile (HPCP) from the spectral peaks of a signal.

HPCP is a k*12 dimensional vector which represents the intensities of the twelve (k==1) semitone pitch classes (corresponding to notes from A to G#), or subdivisions of these (k>1).

Return

std::vector<double> Resulting harmonic pitch class profile.

Parameters
  • frequencies: Frequencies (positions) of the spectral peaks [Hz].

  • magnitudes: Magnitudes (heights) of the spectral peaks.

  • size: Size of the output HPCP (must be a positive nonzero multiple of 12).

  • reference_frequency: Reference frequency for semitone index calculation, corresponding to A3 [Hz].

  • harmonics: Number of harmonics for frequency contribution, 0 indicates exclusive fundamental frequency contribution.

  • band_preset: Enables whether to use a band preset.

  • band_split_frequency: Split frequency for low and high bands, not used if band_preset is false [Hz].

  • min_frequency: Minimum frequency that contributes to the HPCP [Hz] (the difference between the min and split frequencies must not be less than 200.0 Hz).

  • max_frequency: Maximum frequency that contributes to the HPCP [Hz] (the difference between the max and split frequencies must not be less than 200.0 Hz).

  • _weight_type: Type of weighting function for determining frequency contribution.

  • window_size: Size, in semitones, of the window used for the weighting.

  • max_shifted: Whether to shift the HPCP vector so that the maximum peak is at index 0.

  • non_linear: Apply non-linear post-processing to the output (use with _normalized=’unit max’). Boosts values close to 1, decreases values close to 0.

  • _normalized: Whether to normalize the HPCP vector.
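
The mapping from a peak frequency to an HPCP bin can be sketched as follows. This assumes each frequency is assigned to its nearest bin, measured in octaves from reference_frequency and wrapped into a single octave; the weighting window and harmonic contributions are left out. PitchClassBin is a hypothetical name and not part of musher.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: index of the HPCP bin nearest to `freq`.
int PitchClassBin(double freq, double reference_frequency, int size) {
  assert(freq > 0.0 && reference_frequency > 0.0 && size > 0);
  // Distance from the reference in octaves, scaled to `size` bins per octave.
  const double bin_f = std::log2(freq / reference_frequency) * size;
  int bin = static_cast<int>(std::lround(bin_f)) % size;
  if (bin < 0) bin += size;  // wrap frequencies below the reference
  return bin;
}
```

With size = 12 and a reference of 440 Hz, 440 Hz and 880 Hz both map to bin 0 (A), and 523.25 Hz maps to bin 3 (C).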

std::vector<double> musher::core::HPCP(const std::vector<std::tuple<double, double>> &peaks, unsigned int size = 12, double reference_frequency = 440.0, unsigned int harmonics = 0, bool band_preset = true, double band_split_frequency = 500.0, double min_frequency = 40.0, double max_frequency = 5000.0, std::string _weight_type = "squared cosine", double window_size = 1.0, bool max_shifted = false, bool non_linear = false, std::string _normalized = "unit max")

Overloaded function for HPCP that accepts a vector of peaks.

Refer to original HPCP function for more details.

Return

std::vector<double> Resulting harmonic pitch class profile.

Parameters
  • peaks: Vector of spectral peaks, each peak being a tuple (frequency, magnitude).

  • size: Size of the output HPCP (must be a positive nonzero multiple of 12).

  • reference_frequency: Reference frequency for semitone index calculation, corresponding to A3 [Hz].

  • harmonics: Number of harmonics for frequency contribution, 0 indicates exclusive fundamental frequency contribution.

  • band_preset: Enables whether to use a band preset.

  • band_split_frequency: Split frequency for low and high bands, not used if band_preset is false [Hz].

  • min_frequency: Minimum frequency that contributes to the HPCP [Hz] (the difference between the min and split frequencies must not be less than 200.0 Hz).

  • max_frequency: Maximum frequency that contributes to the HPCP [Hz] (the difference between the max and split frequencies must not be less than 200.0 Hz).

  • _weight_type: Type of weighting function for determining frequency contribution.

  • window_size: Size, in semitones, of the window used for the weighting.

  • max_shifted: Whether to shift the HPCP vector so that the maximum peak is at index 0.

  • non_linear: Apply non-linear post-processing to the output (use with _normalized=’unit max’). Boosts values close to 1, decreases values close to 0.

  • _normalized: Whether to normalize the HPCP vector.

Key

std::vector<std::vector<double>> musher::core::SelectKeyProfile(const std::string profile_type)

Select a key profile given the type.

About the Key Profiles:

  • Diatonic - Binary profile with diatonic notes of both modes. Could be useful for ambient music or diatonic music which is not strictly ‘tonal functional’

  • Tonic Triad - Just the notes of the major and minor chords. Exclusively for testing.

  • Krumhansl - Reference key profiles after cognitive experiments with users. They should work generally fine for pop music.

  • Temperley - Key profiles extracted from corpus analysis of euroclassical music. Therefore, they perform best on this repertoire (especially in minor).

  • Shaath - Profiles based on Krumhansl’s specifically tuned to popular and electronic music.

  • Noland - Profiles from Bach’s ‘Well Tempered Klavier’.

  • Edma - Automatic profiles extracted from corpus analysis of electronic dance music [3]. They normally perform better than Shaath’s.

  • Edmm - Automatic profiles extracted from corpus analysis of electronic dance music and manually tweaked according to heuristic observation. It will report major modes (which are poorly represented in EDM) as minor, but improve performance otherwise [3].

  • Braw - Profiles obtained by calculating the median profile for each mode from a subset of the BeatPort dataset. There is an extra profile obtained from ambiguous tracks that are reported as minor [4].

  • Bgate - Same as Braw, but with the 4 least relevant elements of each profile set to zero [4].

References:

[1] E. Gómez, “Tonal Description of Polyphonic Audio for Music Content Processing,” INFORMS Journal on Computing, vol. 18, no. 3, pp. 294–304, 2006.

[2] D. Temperley, “What’s key for key? The Krumhansl-Schmuckler key-finding algorithm reconsidered,” Music Perception, vol. 17, no. 1, pp. 65–100, 1999.

[3] Á. Faraldo, E. Gómez, S. Jordà, P. Herrera, “Key Estimation in Electronic Dance Music,” Proceedings of the 38th International Conference on Information Retrieval, pp. 335–347, 2016.

[4] Á. Faraldo, S. Jordà, P. Herrera, “A multi-profile method for key estimation in EDM,” Audio Engineering Society Conference: 2017 AES International Conference on Semantic Audio, Audio Engineering Society, June 2017.

essentia: https://github.com/MTG/essentia/blob/master/src/algorithms/tonal/key.cpp

Return

std::vector<std::vector<double>> Key profile

Parameters
  • profile_type: Key profile type.

std::vector<double> musher::core::AddContributionHarmonics(const std::vector<double> &M_chords, const int pitch_class, const double contribution, const int num_harmonics, const double slope)

Add contribution harmonics to chords. Each note contributes to the different harmonics:

1. first harmonic f -> i

2. second harmonic 2*f -> i

3. third harmonic 3*f -> i+7

4. fourth harmonic 4*f -> i

...

The contribution is weighted depending on the slope.

Return

std::vector<double> chords with added contribution harmonics

Parameters
  • M_chords: Chords

  • pitch_class: pitch class

  • contribution: harmonic contribution

  • num_harmonics: Number of harmonics that should contribute to the polyphonic profile (1 only considers the fundamental harmonic).

  • slope: Value of the slope of the exponential harmonic contribution to the polyphonic profile.
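
The harmonic-to-pitch-class mapping above follows from harmonic h lying 12*log2(h) semitones above the fundamental. The sketch below is a simplification under two assumptions: each harmonic is assigned wholly to its nearest pitch class, and the weight is multiplied by slope for each successive harmonic. AddHarmonicsSketch is a hypothetical name and may differ from the library's weighting.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical simplification: nearest pitch class, geometric weighting.
std::vector<double> AddHarmonicsSketch(std::vector<double> chords, int pitch_class,
                                       double contribution, int num_harmonics,
                                       double slope) {
  assert(chords.size() == 12 && pitch_class >= 0 && pitch_class < 12);
  double weight = contribution;
  for (int h = 1; h <= num_harmonics; ++h) {
    // Harmonic h lies 12*log2(h) semitones above the fundamental.
    const long offset = std::lround(12.0 * std::log2(static_cast<double>(h)));
    chords[(pitch_class + offset) % 12] += weight;
    weight *= slope;  // assumed: each successive harmonic is scaled by `slope`
  }
  return chords;
}
```

Starting from twelve zeros with pitch class 0, a contribution of 1, four harmonics and a slope of 0.6, harmonics 1, 2 and 4 land on index 0 (1 + 0.6 + 0.216) and harmonic 3 lands on index 7 (0.36).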

std::vector<double> musher::core::AddMajorTriad(const std::vector<double> &M_chords, const int root, const double contribution, const int num_harmonics, const double slope)

Adds the contribution of a chord with root note ‘root’ to its major triad. A major triad includes notes from three different classes of pitch: the root, the major 3rd and perfect 5th. This is the most relaxed, most consonant chord in all of harmony.

See

http://www.songtrellis.com/directory/1146/chordTypes/majorChordTypes/majorTriad

The three notes of the chord have the same weight.

Return

std::vector<double> Chords with contribution added to its major triad.

Parameters
  • M_chords: Chords

  • root: root note

  • contribution: harmonic contribution

  • num_harmonics: Number of harmonics that should contribute to the polyphonic profile (1 only considers the fundamental harmonic).

  • slope: Value of the slope of the exponential harmonic contribution to the polyphonic profile.

std::vector<double> musher::core::AddMinorTriad(const std::vector<double> &M_chords, const int root, const double contribution, const int num_harmonics, const double slope)

Adds the contribution of a chord with root note ‘root’ to its minor triad. A minor triad includes notes from three different classes of pitch: the root, the minor 3rd and perfect 5th.

See

http://www.songtrellis.com/directory/1146/chordTypes/minorChordTypes/minorTriadMi

The three notes of the chord have the same weight.

Return

std::vector<double> Chords with contribution added to its minor triad.

Parameters
  • M_chords: Chords

  • root: root note

  • contribution: harmonic contribution

  • num_harmonics: Number of harmonics that should contribute to the polyphonic profile (1 only considers the fundamental harmonic).

  • slope: Value of the slope of the exponential harmonic contribution to the polyphonic profile.
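
The two triad helpers above differ only in the third: 4 semitones above the root for a major triad and 3 for a minor triad, with the perfect fifth at 7 in both. The sketch below shows the pitch classes involved. It assumes a 12-element chord vector and only the fundamental of each note (no harmonics); all names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: adds `contribution` to the root, the third and the
// perfect fifth, wrapping around the octave.
std::vector<double> AddTriadSketch(std::vector<double> chords, int root,
                                   double contribution, int third_interval) {
  assert(chords.size() == 12 && root >= 0 && root < 12);
  const int intervals[3] = {0, third_interval, 7};
  for (int interval : intervals) chords[(root + interval) % 12] += contribution;
  return chords;
}

// Major third = 4 semitones, minor third = 3 semitones.
std::vector<double> AddMajorTriadSketch(const std::vector<double> &chords, int root,
                                        double contribution) {
  return AddTriadSketch(chords, root, contribution, 4);
}

std::vector<double> AddMinorTriadSketch(const std::vector<double> &chords, int root,
                                        double contribution) {
  return AddTriadSketch(chords, root, contribution, 3);
}
```

For a root of 9, the major triad touches indices 9, 1 and 4; for a root of 0, the minor triad touches indices 0, 3 and 7.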

std::tuple<std::vector<double>, double, double> musher::core::ResizeProfileToPcpSize(const unsigned int pcp_size, const std::vector<double> &key_profile)

Resizes and interpolates the profiles to fit the pcp size.

Return

std::tuple<std::vector<double>, double, double> Tuple of (resized profile, mean, standard deviation).

Parameters
  • pcp_size: Number of array elements used to represent a semitone times 12.

  • key_profile: Key profile.

double musher::core::StandardDeviation(double mean, const std::vector<double> &vec)

Calculate the standard deviation of a vector.

Return

double Standard deviation

Parameters
  • mean: Mean (Average)

  • vec: Vector
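
A minimal sketch of the computation, assuming the population form (dividing by N rather than N - 1); the source does not say which form is used. StandardDeviationSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sketch: population standard deviation around a given mean.
double StandardDeviationSketch(double mean, const std::vector<double> &vec) {
  assert(!vec.empty());
  double sum_sq = 0.0;
  for (double x : vec) sum_sq += (x - mean) * (x - mean);
  return std::sqrt(sum_sq / static_cast<double>(vec.size()));  // divide by N
}
```

For {2, 4, 4, 4, 5, 5, 7, 9} with mean 5, the squared deviations sum to 32, so the result is sqrt(32 / 8) = 2.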

KeyOutput musher::core::EstimateKey(const std::vector<double> &pcp, const bool use_polphony = true, const bool use_three_chords = true, const unsigned int num_harmonics = 4, const double slope = 0.6, const std::string profile_type = "Bgate", const bool use_maj_min = false)

Computes key estimate given a pitch class profile (HPCP).

Return

KeyOutput A struct containing the following:

  • key: Estimated key, from A to G.

  • scale: Scale of the key (major or minor).

  • strength: Strength of the estimated key.

  • first_to_second_relative_strength: The relative strength difference between the best estimate and second best estimate of the key.

Parameters
  • pcp: The input pitch class profile.

  • use_polphony: Enables the use of polyphonic profiles to define key profiles (this includes the contributions from triads as well as pitch harmonics).

  • use_three_chords: Consider only the 3 main triad chords of the key (T, D, SD) to build the polyphonic profiles.

  • num_harmonics: Number of harmonics that should contribute to the polyphonic profile (1 only considers the fundamental harmonic).

  • slope: Value of the slope of the exponential harmonic contribution to the polyphonic profile.

  • profile_type: The type of polyphonic profile to use for correlation calculation.

  • use_maj_min: Use a third profile called ‘majmin’ for ambiguous tracks [4]. Only available for the edma, bgate and braw profiles.

KeyOutput musher::core::DetectKey(const std::vector<std::vector<double>> &normalized_samples, double sample_rate = 44100., const std::string profile_type = "Bgate", const bool use_polphony = true, const bool use_three_chords = true, const unsigned int num_harmonics = 4, const double slope = 0.6, const bool use_maj_min = false, const unsigned int pcp_size = 36, const int frame_size = 4096, const int hop_size = 512, const std::function<std::vector<double>(const std::vector<double>&)> &window_type_func = BlackmanHarris62dB, unsigned int max_num_peaks = 100, double window_size = .5)

Computes key estimate given normalized samples.

Return

KeyOutput A struct containing the following:

  • key: Estimated key, from A to G.

  • scale: Scale of the key (major or minor).

  • strength: Strength of the estimated key.

  • first_to_second_relative_strength: The relative strength difference between the best estimate and second best estimate of the key.

Parameters
  • normalized_samples: Normalized samples, either stereo or mono.

  • sample_rate: Sampling rate of the audio signal [Hz].

  • profile_type: The type of polyphonic profile to use for correlation calculation.

  • use_polphony: Enables the use of polyphonic profiles to define key profiles (this includes the contributions from triads as well as pitch harmonics).

  • use_three_chords: Consider only the 3 main triad chords of the key (T, D, SD) to build the polyphonic profiles.

  • num_harmonics: Number of harmonics that should contribute to the polyphonic profile (1 only considers the fundamental harmonic).

  • slope: Value of the slope of the exponential harmonic contribution to the polyphonic profile.

  • use_maj_min: Use a third profile called ‘majmin’ for ambiguous tracks [4]. Only available for the edma, bgate and braw profiles.

  • pcp_size: Number of array elements used to represent a semitone times 12.

  • frame_size: Output frame size.

  • hop_size: Hop size between frames.

  • window_type_func: The window type function. Examples: BlackmanHarris92dB, BlackmanHarris62dB…

  • max_num_peaks: Maximum number of returned peaks (set to 0 to return all peaks).

  • window_size: Size, in semitones, of the window used for the weighting.

Mono Mixer

std::vector<double> musher::core::MonoMixer(const std::vector<std::vector<double>> &input)

Downmixes the signal into a single channel given a stereo signal.

If the signal is already monaural, it is left unchanged.

Return

std::vector<double> Downmixed audio signal

Parameters
  • input: Stereo or mono audio signal
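
A minimal sketch of a downmix, assuming the two channels are averaged with equal weight; the source does not state the mixing formula. MonoMixSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: equal-weight average of the two channels.
std::vector<double> MonoMixSketch(const std::vector<std::vector<double>> &input) {
  assert(!input.empty());
  if (input.size() == 1) return input[0];  // already mono: return unchanged
  assert(input[0].size() == input[1].size());
  std::vector<double> mono(input[0].size());
  for (std::size_t i = 0; i < mono.size(); ++i)
    mono[i] = 0.5 * (input[0][i] + input[1][i]);  // assumed: (L + R) / 2
  return mono;
}
```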

Peak Detect

std::tuple<double, double> musher::core::QuadraticInterpolation(double a, double b, double y, int middle_point_index)

Interpolate the peak of a parabola given 3 points on the parabola.

α(a) = left point value of parabola

β(b) = middle point value of parabola

γ(y) = right point value of parabola

Interpolated peak location is given in bins (spectral samples) by:

p = 1/2 ((α - γ) / (α - 2β + γ))

The peak magnitude estimate is:

y(p) = β - 1/4(α - γ)p

Smith, J.O. “Quadratic Interpolation of Spectral Peaks”, in Spectral Audio Signal Processing, https://ccrma.stanford.edu/~jos/sasp/Quadratic_Interpolation_Spectral_Peaks.html, online book, 2011 edition, accessed 12/18/2019.

Return

std::tuple<double, double> Tuple of (location (position) of the peak, peak height estimate).

Parameters
  • a: Left point value of parabola.

  • b: Middle point value of parabola.

  • y: Right point value of parabola.

  • middle_point_index: Position of the middle point in the parabola.
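
The two formulas above translate directly into code. The sketch below assumes the returned location is middle_point_index plus the offset p. QuadraticInterpolationSketch is a hypothetical name.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical sketch of parabolic peak interpolation through three points.
std::tuple<double, double> QuadraticInterpolationSketch(double a, double b, double y,
                                                        int middle_point_index) {
  const double denom = a - 2.0 * b + y;
  assert(denom != 0.0);  // the three points must not be collinear
  const double p = 0.5 * (a - y) / denom;        // offset from the middle bin
  const double height = b - 0.25 * (a - y) * p;  // interpolated peak value
  return std::make_tuple(middle_point_index + p, height);
}
```

As a check, sampling the parabola 1 - (x - 0.25)^2 at x = -1, 0, 1 gives a = -0.5625, b = 0.9375 and y = 0.4375. The sketch recovers an offset of 0.25 and a height of 1, which is the true vertex.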

std::vector<std::tuple<double, double>> musher::core::PeakDetect(const std::vector<double> &inp, double threshold = -1000.0, bool interpolate = true, std::string sort_by = "position", int max_num_peaks = 0, double range = 0., int min_pos = 0, int max_pos = 0)

Detects local maxima (peaks) in a vector.

The algorithm finds positive slopes and detects a peak when the slope changes sign and the peak is above the threshold.

Return

std::vector<std::tuple<double, double>> Vector of peaks, each peak being a tuple (position, height).

Parameters
  • inp: Input vector.

  • threshold: Peaks below this given threshold are not outputted.

  • interpolate: Enables interpolation.

  • sort_by: Ordering type of the outputted peaks (ascending by position or descending by height).

  • max_num_peaks: Maximum number of returned peaks (set to 0 to return all peaks).

  • range: Input range.

  • min_pos: Minimum position of the range to evaluate.

  • max_pos: Maximum position of the range to evaluate.
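
The slope-change rule can be sketched in a reduced form: a sample is a peak when it is strictly greater than both neighbours and not below the threshold. The sketch omits interpolation, sorting options, range scaling and the position limits, so it is not a substitute for PeakDetect. FindPeaksSketch is a hypothetical name.

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical simplification: strict local maxima, ordered by position.
std::vector<std::tuple<double, double>> FindPeaksSketch(const std::vector<double> &inp,
                                                        double threshold) {
  std::vector<std::tuple<double, double>> peaks;
  for (std::size_t i = 1; i + 1 < inp.size(); ++i) {
    const bool rising = inp[i] > inp[i - 1];   // positive slope into i
    const bool falling = inp[i] > inp[i + 1];  // negative slope out of i
    if (rising && falling && inp[i] >= threshold)
      peaks.emplace_back(static_cast<double>(i), inp[i]);
  }
  return peaks;
}
```

For {0, 1, 0, 2, 0.5, 0.6, 0} with a threshold of 0.8, the peaks are (1, 1) and (3, 2); the local maximum of 0.6 at position 5 falls below the threshold.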

Spectral Peaks

std::vector<std::tuple<double, double>> musher::core::SpectralPeaks(const std::vector<double> &input_spectrum, double threshold = -1000.0, std::string sort_by = "position", unsigned int max_num_peaks = 100, double sample_rate = 44100., int min_pos = 0, int max_pos = 0)

Extracts peaks from a spectrum.

It is important to note that the peak algorithm is independent of whether the input is linear or in dB, so the threshold must be adapted to the type of data fed to it. The algorithm relies on the PeakDetect algorithm, which is run with parabolic interpolation [1]. The exactness of the peak search depends heavily on the windowing type. It gives best results with dB input, a Blackman-Harris 92 dB window and interpolation set to true. According to [1], spectral peak frequencies tend to be about twice as accurate when dB magnitude is used rather than just linear magnitude. For further information about the peak detection, see the description of PeakDetect.

References: [1] Peak Detection, http://ccrma.stanford.edu/~jos/parshl/Peak_Detection_Steps_3.html

Return

std::vector<std::tuple<double, double>> Vector of spectral peaks, each peak being a tuple (frequency, magnitude).

Parameters
  • input_spectrum: Input spectrum.

  • threshold: Peaks below this given threshold are not outputted.

  • sort_by: Ordering type of the outputted peaks (ascending by frequency (position) or descending by magnitude (height)).

  • max_num_peaks: Maximum number of returned peaks (set to 0 to return all peaks).

  • sample_rate: Sampling rate of the audio signal [Hz].

  • min_pos: Minimum frequency (position) of the range to evaluate [Hz].

  • max_pos: Maximum frequency (position) of the range to evaluate [Hz].
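
Since peaks are returned as frequencies rather than bin positions, some bin-to-Hz conversion is involved. The sketch below shows one plausible mapping, assuming the spectrum spans 0 Hz to the Nyquist frequency (sample_rate / 2) inclusive. BinToFrequency is a hypothetical name, and the exact scaling used by SpectralPeaks is not stated in the source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: maps a (possibly fractional) bin position to Hz,
// assuming the last bin sits at the Nyquist frequency.
double BinToFrequency(double bin, std::size_t spectrum_size, double sample_rate) {
  assert(spectrum_size > 1);
  return bin * (sample_rate / 2.0) / static_cast<double>(spectrum_size - 1);
}
```

For a 1025-bin spectrum at 44100 Hz, bin 512 maps to 11025 Hz and bin 1024 maps to 22050 Hz.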

Spectrum

double musher::core::Magnitude(const std::complex<double> complex_pair)

Calculate the magnitude (absolute value or modulus) of a complex number.

Return

double The magnitude of a complex number.

Parameters
  • complex_pair: Complex number. Contains 1 real and 1 imaginary number.

double musher::core::NormFct(int inorm, size_t N)
double musher::core::NormFct(int inorm, const pocketfft::shape_t &shape, const pocketfft::shape_t &axes, size_t fct, int delta)
size_t musher::core::NextFastLen(size_t n)

Calculate an efficient length to pad the inputs of the FFT.

Copied from Peter Bell. https://gdoc.pub/doc/e/2PACX-1vR6iXXG1uS9ds47GvDgQk6XtpYzVTtYepu5B8onBrMmoorfKHhnHbN0ArDoXgoA23nZrcrm_DSFMW45

Return

size_t Efficient FFT input size.

Parameters
  • n: Original input size.
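
FFT implementations tend to be fastest when the length factors into small primes. A brute-force sketch of the idea, assuming the allowed primes are 2, 3 and 5 (the actual set used by NextFastLen is not stated here): find the smallest such length that is at least n. NextFastLenSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: smallest m >= n whose only prime factors are 2, 3, 5.
std::size_t NextFastLenSketch(std::size_t n) {
  assert(n > 0);
  const std::size_t primes[] = {2, 3, 5};  // assumed set of "fast" primes
  for (std::size_t m = n;; ++m) {
    std::size_t rest = m;
    for (std::size_t p : primes)
      while (rest % p == 0) rest /= p;  // strip every factor of p
    if (rest == 1) return m;            // nothing but small primes remained
  }
}
```

Under this assumption, 7 pads to 8, 13 pads to 15 and 1021 pads to 1024.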

std::vector<double> musher::core::ConvertToFrequencySpectrum(const std::vector<double> &audio_frame)

Computes the frequency spectrum of an array of Reals.

The resulting spectrum has a size which is half the size of the input array plus one. Bins contain raw (linear) magnitude values.

Return

std::vector<double> Frequency spectrum of the input audio signal.

Parameters
  • audio_frame: Input audio frame.
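
To illustrate the output shape (half the input size plus one) and the linear magnitudes, here is a reference sketch that uses a naive O(N^2) DFT in place of an FFT. MagnitudeSpectrumSketch is a hypothetical name, and no padding or normalization is applied.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical reference: naive DFT of a real frame, non-negative bins only.
std::vector<double> MagnitudeSpectrumSketch(const std::vector<double> &audio_frame) {
  assert(!audio_frame.empty());
  const std::size_t n = audio_frame.size();
  const double pi = std::acos(-1.0);
  std::vector<double> spectrum(n / 2 + 1);
  for (std::size_t k = 0; k < spectrum.size(); ++k) {
    std::complex<double> bin(0.0, 0.0);
    for (std::size_t t = 0; t < n; ++t) {
      const double angle = -2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
      bin += audio_frame[t] * std::polar(1.0, angle);
    }
    spectrum[k] = std::abs(bin);  // raw (linear) magnitude
  }
  return spectrum;
}
```

An 8-sample frame holding one cycle of a cosine produces 5 bins, with a magnitude of 4 (N / 2) in bin 1 and values near zero elsewhere.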

Utilities

std::string musher::core::Uint8VectorToHexString(const std::vector<uint8_t> &v)

Convert uint8_t vector to hex string.

Return

std::string string of hex

Parameters
  • v: Vector of uint8_t.
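
A minimal sketch of the conversion, assuming two lowercase hex digits per byte with no separators; the letter case and formatting of the real function are not stated in the source. ToHexSketch is a hypothetical name.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch: two hex digits per byte, high nibble first.
std::string ToHexSketch(const std::vector<uint8_t> &v) {
  static const char digits[] = "0123456789abcdef";  // assumed: lowercase output
  std::string out;
  out.reserve(v.size() * 2);
  for (uint8_t byte : v) {
    out.push_back(digits[byte >> 4]);    // high nibble
    out.push_back(digits[byte & 0x0F]);  // low nibble
  }
  return out;
}
```

For example, the four bytes 0x52, 0x49, 0x46, 0x46 (ASCII "RIFF") become "52494646".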

std::string musher::core::StrBetweenSQuotes(const std::string &s)

Get string between two single quotes.

NOTE: There must only be 2 quotes in the entire string.

Return

std::string String between single quotes.

Parameters
  • s: String that contains 2 single quotes
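
A minimal sketch of the extraction, assuming the input contains exactly two single quotes as the note above requires. BetweenSingleQuotesSketch is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: returns the text between the first and last quote.
std::string BetweenSingleQuotesSketch(const std::string &s) {
  const std::string::size_type first = s.find('\'');
  const std::string::size_type last = s.rfind('\'');
  assert(first != std::string::npos && first != last);  // needs two quotes
  return s.substr(first + 1, last - first - 1);
}
```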

bool musher::core::IsBigEndian(void)

Check if the architecture of the machine running the code is big endian.

Return

bool True if big endian, false otherwise.

template<typename T>
std::vector<std::vector<T>> musher::core::Deinterweave(const std::vector<T> &interweaved_vector)

Deinterweave a vector in alternating order to form two vectors.

interweaved_vector = {1, 9, 2, 8, 3, 7, 4, 6}

deinterweaved_vector = {
    {1, 2, 3, 4},
    {9, 8, 7, 6}
}

Return

std::vector<std::vector<T>> Deinterweaved vector.

Parameters
  • interweaved_vector: Interleaved vector.
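
The example above can be reproduced with a short sketch that sends even-indexed elements to the first output vector and odd-indexed elements to the second. DeinterweaveSketch is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: split alternating elements into two vectors.
template <typename T>
std::vector<std::vector<T>> DeinterweaveSketch(const std::vector<T> &interweaved) {
  std::vector<std::vector<T>> channels(2);
  for (std::size_t i = 0; i < interweaved.size(); ++i)
    channels[i % 2].push_back(interweaved[i]);  // even -> 0, odd -> 1
  return channels;
}
```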

double musher::core::Median(std::vector<double> &inVec)

Compute the median of a vector.

Return

double Median.

Parameters
  • inVec: Input vector.
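
A minimal sketch of a median, assuming the usual convention of averaging the two middle values for an even-sized input. Unlike the signature above, which takes a non-const reference, this sketch takes its argument by value so the caller's data is never reordered. MedianSketch is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: sort a copy and pick (or average) the middle.
double MedianSketch(std::vector<double> values) {
  assert(!values.empty());
  std::sort(values.begin(), values.end());
  const std::size_t mid = values.size() / 2;
  if (values.size() % 2 == 1) return values[mid];
  return 0.5 * (values[mid - 1] + values[mid]);  // even size: mean of the middle two
}
```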

std::vector<double> musher::core::OnePoleFilter(const std::vector<double> &vec)

Compute a one pole filter on an audio signal.

Return

std::vector<double> Filtered audio signal.

Parameters
  • vec: Audio signal.

Windowing

std::vector<double> musher::core::Square(const std::vector<double> &window)

Square windowing function.

Return

std::vector<double> Square window.

Parameters
  • window: Audio signal window.

std::vector<double> musher::core::BlackmanHarris(const std::vector<double> &window, double a0, double a1, double a2, double a3)

Blackman-Harris windowing algorithm.

Window functions help control spectral leakage when doing Fourier Analysis.

Return

std::vector<double> BlackmanHarris window.

Parameters
  • window: Audio signal window.

  • a0: Constant a0.

  • a1: Constant a1.

  • a2: Constant a2.

  • a3: Constant a3.
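
The four constants are the coefficients of the standard four-term cosine sum. The sketch below generates the window coefficients for a given length, assuming the symmetric form with N - 1 in the denominator. It returns only the coefficients and does not apply them to a signal, so its interface differs from BlackmanHarris above. BlackmanHarrisSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch:
// w[n] = a0 - a1*cos(2*pi*n/(N-1)) + a2*cos(4*pi*n/(N-1)) - a3*cos(6*pi*n/(N-1))
std::vector<double> BlackmanHarrisSketch(std::size_t size, double a0, double a1,
                                         double a2, double a3) {
  assert(size > 1);
  const double pi = std::acos(-1.0);
  const double step = 2.0 * pi / static_cast<double>(size - 1);
  std::vector<double> w(size);
  for (std::size_t n = 0; n < size; ++n) {
    const double x = step * static_cast<double>(n);
    w[n] = a0 - a1 * std::cos(x) + a2 * std::cos(2.0 * x) - a3 * std::cos(3.0 * x);
  }
  return w;
}
```

With the commonly cited 92 dB coefficients (0.35875, 0.48829, 0.14128, 0.01168), an odd-length window peaks at 1 in the centre and falls to 0.00006 at both ends.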

std::vector<double> musher::core::BlackmanHarris62dB(const std::vector<double> &window)

Blackman-Harris 62 dB windowing algorithm.

Return

std::vector<double> Blackmanharris62db window.

Parameters
  • window: Audio signal window.

std::vector<double> musher::core::BlackmanHarris92dB(const std::vector<double> &window)

Blackman-Harris 92 dB windowing algorithm.

Return

std::vector<double> Blackmanharris92db window.

Parameters
  • window: Audio signal window.

std::vector<double> musher::core::Normalize(const std::vector<double> &input)

Normalize a vector (to have an area of 1) and then scale by a factor of 2.

Return

std::vector<double> normalized vector.

Parameters
  • input: Input vector.
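
A minimal sketch of this normalization, taking "area" to mean the sum of the elements: dividing by the sum gives an area of 1, and the extra factor of 2 is then applied. NormalizeAreaSketch is a hypothetical name, and the handling of a zero sum is an assumption.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical sketch: scale so the elements sum to 2.
std::vector<double> NormalizeAreaSketch(const std::vector<double> &input) {
  const double sum = std::accumulate(input.begin(), input.end(), 0.0);
  assert(sum != 0.0);  // assumed precondition: the area must be nonzero
  std::vector<double> out(input);
  for (double &x : out) x *= 2.0 / sum;  // area of 1, then scaled by 2
  return out;
}
```

For {1, 1, 2} the sum is 4, so the result is {0.5, 0.5, 1}.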

std::vector<double> musher::core::Windowing(const std::vector<double> &audio_frame, const std::function<std::vector<double>(const std::vector<double>&)> &window_type_func = BlackmanHarris62dB, unsigned zero_padding_size = 0, bool zero_phase = true, bool _normalize = true)

Applies windowing to an audio signal.

It optionally applies zero-phase windowing and optionally adds zero-padding. The resulting windowed frame size is equal to the incoming frame size plus the number of padded zeros. By default, the available windows are normalized (to have an area of 1) and then scaled by a factor of 2.

References: [1] F. J. Harris, On the use of windows for harmonic analysis with the discrete Fourier transform, Proceedings of the IEEE, vol. 66, no. 1, pp. 51-83, Jan. 1978 [2] Window function - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Window_function

Return

std::vector<double> Windowed audio frame.

Parameters
  • audio_frame: Input audio frame.

  • window_type_func: The window type function. Examples: BlackmanHarris92dB, BlackmanHarris62dB…

  • zero_padding_size: Size of the zero-padding.

  • zero_phase: Enables zero-phase windowing.

  • _normalize: Specify whether to normalize windows (to have an area of 1) and then scale by a factor of 2.