gomel

module

v0.0.10 Latest Latest Go to latest Published: Mar 13, 2026 License: MIT

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/neurlang/gomel

Links

Open Source Insights

README ¶

Golang Mel Spectrogram

Audio-to-spectrogram and spectrogram-to-audio conversion library and command-line tools for Go.

I am neural network, and I'm learning. I hope you learn with me.

Features

Mel Spectrograms: Perceptually-motivated frequency representation for audio processing
Phase-Preserving Spectrograms: High-quality audio reconstruction without phase estimation
Bidirectional Conversion: Audio → Spectrogram → Audio
Multiple Formats: Support for WAV and FLAC input files
PNG Export: Save spectrograms as images for visualization or ML applications

Installation

go get github.com/neurlang/gomel

Command-Line Tools

tomel - Audio to Mel Spectrogram

Convert audio files to mel spectrogram images:

tomel <audio_file>

Example:

tomel glados-1609757458000_.wav
# Creates: glados-1609757458000_.wav.png

Supports .wav and .flac files. If no extension is provided, .wav is assumed.

towav - Mel Spectrogram to Audio

Reconstruct audio from mel spectrogram images using Griffin-Lim algorithm:

towav <png_file> [sample_rate]

Example:

towav spectrogram.png 44100
# Creates: spectrogram.png.wav

Default sample rate: 44100 Hz

tophase - Audio to Phase Spectrogram

Convert audio files to phase-preserving spectrogram images:

tophase <audio_file>

Example:

tophase audio.flac
# Creates: audio.flac.png

Phase spectrograms retain both magnitude and phase information for lossless reconstruction.

fromphase - Phase Spectrogram to Audio

Reconstruct audio from phase-preserving spectrogram images:

fromphase <png_file> [sample_rate]

Example:

fromphase spectrogram.png 48000
# Creates: spectrogram.png.wav

Default sample rate: 44100 Hz

Library API

Mel Package

import "github.com/neurlang/gomel/mel"

// Create mel spectrogram processor
m := mel.NewMel()

// Configure parameters
m.NumMels = 192              // Number of mel bands
m.MelFmin = 0                // Minimum frequency (Hz)
m.MelFmax = 16000            // Maximum frequency (Hz)
m.Window = 1280              // Window size
m.Resolut = 4096             // FFT resolution
m.GriffinLimIterations = 2   // Iterations for audio reconstruction
m.YReverse = true            // Reverse Y-axis in image
m.VolumeBoost = 0.0          // Volume adjustment for reconstruction
m.SampleRate = 44100         // Output sample rate

// Convert audio file to mel spectrogram image
err := m.ToMelWav("input.wav", "output.png")
err := m.ToMelFlac("input.flac", "output.png")

// Convert spectrogram image back to audio
err := m.ToWavPng("input.png", "output.wav")

// Low-level API: work with buffers
samples := mel.LoadWav("input.wav")
samples := mel.LoadFlac("input.flac")

melSpec, err := m.ToMel(samples)
audioSamples, err := m.FromMel(melSpec)

err := mel.SaveWav("output.wav", audioSamples, 44100)

// Get image buffer
imageData := m.Image(melSpec)

Phase Package

import "github.com/neurlang/gomel/phase"

// Create phase spectrogram processor
p := phase.NewPhase()

// Configure parameters
p.NumFreqs = 768             // Number of frequency bins
p.Window = 1280              // Window size
p.Resolut = 4096             // FFT resolution
p.YReverse = false           // Reverse Y-axis in image
p.VolumeBoost = 4.0          // Volume adjustment for reconstruction
p.SampleRate = 44100         // Output sample rate

// Convert audio file to phase spectrogram image
err := p.ToPhaseWav("input.wav", "output.png")
err := p.ToPhaseFlac("input.flac", "output.png")

// Convert spectrogram image back to audio
err := p.ToWavPng("input.png", "output.wav")

// Low-level API: work with buffers
samples := phase.LoadWav("input.wav")
samples := phase.LoadFlac("input.flac")

// Load with sample rate
samples, sampleRate, err := phase.LoadWavSampleRate("input.wav")
samples, sampleRate, err := phase.LoadFlacSampleRate("input.flac")

phaseSpec, err := p.ToPhase(samples)
audioSamples, err := p.FromPhase(phaseSpec)

err := phase.SaveWav("output.wav", audioSamples, 44100)

// Get image buffer
imageData := p.Image(phaseSpec)

Mel vs Phase Spectrograms

Feature	Mel Spectrogram	Phase Spectrogram
Frequency Scale	Perceptual (mel scale)	Linear
Phase Information	Lost	Preserved
Reconstruction	Griffin-Lim (iterative)	Direct (no iteration)
Audio Quality	Good for speech/music	Excellent, near-lossless
Use Case	ML/AI, speech recognition	High-fidelity audio processing
File Size	Smaller	Larger

Requirements

Go 1.0 or higher

Dependencies

github.com/faiface/beep - Audio I/O
github.com/r9y9/gossp - STFT implementation
github.com/mjibson/go-dsp - FFT operations
github.com/mewkiz/flac - FLAC decoding
github.com/x448/float16 - Float16 support

License

See LICENSE file for details.

Directories ¶

Path	Synopsis
cmd
fromphase command Command fromphase converts phase-preserving spectrogram images (PNG) back to audio files (WAV).	Command fromphase converts phase-preserving spectrogram images (PNG) back to audio files (WAV).
tomel command Command tomel converts audio files (WAV/FLAC) to mel spectrogram images (PNG).	Command tomel converts audio files (WAV/FLAC) to mel spectrogram images (PNG).
tophase command Command tophase converts audio files (WAV/FLAC) to phase-preserving spectrogram images (PNG).	Command tophase converts audio files (WAV/FLAC) to phase-preserving spectrogram images (PNG).
towav command Command towav converts mel spectrogram images (PNG) back to audio files (WAV).	Command towav converts mel spectrogram images (PNG) back to audio files (WAV).
mel Package mel provides mel-frequency spectrogram generation and audio synthesis.	Package mel provides mel-frequency spectrogram generation and audio synthesis.
phase Package phase provides phase-preserving spectrogram generation and audio synthesis.	Package phase provides phase-preserving spectrogram generation and audio synthesis.

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL