
Basic Pitch by Spotify: Turning raw audio into expressive MIDI in seconds
Why Real-Time Audio-to-MIDI Conversion Still Matters in 2025
In a world flooded with AI music generators like Suno and Udio, you might think audio-to-MIDI conversion is old news. But MIDI still sits at the heart of music creation — editable, symbolic, and programmable.
Whether you’re remixing vocals, training a machine learning model, or transcribing a jam session, you still need tools to translate performance into data. Spotify’s Basic Pitch addresses this crucial need with surprising elegance and openness.
What Is Basic Pitch?
Basic Pitch is an open-source neural network model developed by Spotify to convert audio recordings (WAV, MP3, FLAC) into MIDI files — quickly, accurately, and polyphonically.
Unlike older pitch tracking systems that rely on traditional signal processing (e.g., FFT, autocorrelation), Basic Pitch uses deep learning to achieve more nuanced transcription:
- Detects notes, onsets, offsets
- Supports pitch bends and vibrato
- Runs fast on a CPU, suited to single files or batch jobs
- Licensed under Apache 2.0 — free for commercial use
It’s not just another hobby project. Basic Pitch is part of Spotify’s larger R&D effort to make music more programmable and machine-readable.
Under the Hood: How Basic Pitch Works Technically
Let’s break it down.
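1. Lightweight Convolutional Architecture
Basic Pitch is not built on CREPE (Convolutional Representation for Pitch Estimation), the monophonic pitch tracker developed at NYU. It uses its own small network, described in Spotify's ICASSP 2022 paper and designed for polyphonic transcription:
- It feeds a harmonic constant-Q transform (HCQT) of the audio into a small stack of 2D convolutional layers, with under 20K parameters in total.
- It predicts three outputs per frame: note onsets, note activity, and a finer multipitch contour (three bins per semitone) that is used for pitch bends.
- Frames are about 11.6 ms long (a 256-sample hop at 22.05 kHz, roughly 86 frames per second).
- Unlike traditional pitch detection methods, it is instrument-agnostic and copes well with expressive performances.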
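Whatever the model, turning its frames into times and its pitch estimates into MIDI numbers is simple arithmetic. The sketch below assumes a 22,050 Hz sample rate and a 256-sample hop, which is our reading of Basic Pitch's defaults rather than a documented contract; the helper names are hypothetical. The deviation in cents from the nearest note is the quantity a pitch-bend message has to encode.

```python
import math
from typing import Tuple

SAMPLE_RATE = 22050  # assumed model input sample rate (Hz)
HOP = 256            # assumed hop between frames (samples)


def frame_to_seconds(frame_index: int, sr: int = SAMPLE_RATE, hop: int = HOP) -> float:
    """Start time of a model frame in seconds."""
    return frame_index * hop / sr


def hz_to_midi(freq_hz: float) -> float:
    """Frequency to fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def nearest_note_and_cents(freq_hz: float) -> Tuple[int, float]:
    """Nearest MIDI note plus the deviation from it in cents (100 cents = 1 semitone)."""
    midi = hz_to_midi(freq_hz)
    note = round(midi)
    return note, (midi - note) * 100


if __name__ == "__main__":
    print(f"frame length: {frame_to_seconds(1) * 1000:.2f} ms")
    print(f"frames per second: {SAMPLE_RATE / HOP:.1f}")
    print("450 Hz ->", nearest_note_and_cents(450.0))
```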
2. Note Event Decoder
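A post-processing step turns the network's probability outputs into discrete MIDI notes:
- A note starts at a peak in the onset output that exceeds an onset threshold (0.5 by default).
- The note continues while the note-activity output stays above a frame threshold (0.3 by default) and ends when it drops below, so offsets are inferred rather than detected by a separate model.
- Notes shorter than a minimum length (about 127.7 ms by default) are discarded, and pitch bends are estimated from the multipitch contour, so slides and vibrato carry over into the MIDI.
- Each pitch is decoded independently, so overlapping notes (polyphony) are handled naturally.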
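The thresholding idea behind note decoding can be sketched for a single pitch in plain Python. This is a deliberately simplified illustration, not Basic Pitch's actual decoder (which processes all pitches at once, tolerates brief dips in activity, and estimates pitch bends). `decode_notes` is hypothetical, and its defaults (onset 0.5, frame 0.3, minimum 11 frames, roughly 127.7 ms at about 86 frames per second) are assumptions modeled on the library's CLI defaults.

```python
from typing import List, Tuple


def decode_notes(
    onsets: List[float],
    frames: List[float],
    onset_threshold: float = 0.5,
    frame_threshold: float = 0.3,
    min_frames: int = 11,
) -> List[Tuple[int, int]]:
    """Turn per-frame onset and note-activity probabilities for ONE pitch
    into (start_frame, end_frame) note events; end_frame is exclusive."""
    if len(onsets) != len(frames):
        raise ValueError("onsets and frames must have the same length")
    n = len(frames)
    notes: List[Tuple[int, int]] = []
    t = 0
    while t < n:
        # A note may only start at a local peak of the onset curve above the threshold
        is_peak = (
            onsets[t] >= onset_threshold
            and (t == 0 or onsets[t] >= onsets[t - 1])
            and (t == n - 1 or onsets[t] >= onsets[t + 1])
        )
        if not is_peak:
            t += 1
            continue
        # The note lasts while note activity stays above the frame threshold;
        # the first frame below it becomes the (inferred) offset
        end = t + 1
        while end < n and frames[end] >= frame_threshold:
            end += 1
        if end - t >= min_frames:  # drop notes that are too short
            notes.append((t, end))
        t = end  # continue scanning after this note
    return notes
```

Multiplying the returned frame indices by the frame duration converts them to seconds.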
3. TensorFlow Implementation
The model is implemented in TensorFlow 2.x and runs on both CPU and GPU. It ships with a simple CLI tool, but you can also:
- Run it in the browser or Node.js through Spotify's TensorFlow.js port (basic-pitch-ts)
- Load the ONNX, TensorFlow Lite, or Core ML versions of the model bundled with recent releases for cross-platform deployment
- Use the lightweight model weights for inference on edge devices
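According to the project README as we read it, recent releases load the first model format whose runtime is installed, in the order TensorFlow, Core ML, TensorFlow Lite, ONNX. The sketch below mimics that "first available backend" pattern with the standard library. The import names listed are the usual ones for those runtimes but are assumptions here, and `pick_backend` is a hypothetical helper, not part of Basic Pitch.

```python
import importlib.util
from typing import Optional, Sequence, Tuple

# (label, import name) pairs in preference order; import names are assumptions
BACKENDS: Tuple[Tuple[str, str], ...] = (
    ("tensorflow", "tensorflow"),
    ("coreml", "coremltools"),
    ("tflite", "tflite_runtime"),
    ("onnx", "onnxruntime"),
)


def pick_backend(candidates: Sequence[Tuple[str, str]] = BACKENDS) -> Optional[str]:
    """Return the label of the first backend whose module is installed, or None."""
    for label, module_name in candidates:
        # find_spec checks availability without actually importing the module
        if importlib.util.find_spec(module_name) is not None:
            return label
    return None


if __name__ == "__main__":
    print("preferred backend:", pick_backend() or "none installed")
```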
Key Features at a Glance
| Feature | Description |
|---|---|
| 🧠 Neural Pitch Tracking | Deep learning-based pitch estimation from raw waveform |
| 🎹 Polyphonic Support | Extracts multiple simultaneous notes (e.g., chords) |
| 🎧 Fast Inference | Lightweight model processes recordings quickly; live input needs a custom streaming setup |
| 🎼 Pitch Bend Detection | Captures glides and expressive modulations |
| 🎛️ Multi-format Input | Supports WAV, MP3, FLAC, M4A |
| 💾 Output Formats | Standard MIDI (.mid), CSV, NPZ feature files |
| 🆓 License | Apache 2.0 — commercial use permitted |
Step-by-Step Guide: Installing and Using Basic Pitch
🔌 Installation (Python CLI)
You’ll need Python ≥3.8 and pip.
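```bash
pip install basic-pitch
```
To verify the installation:
```bash
basic-pitch --help
```
🎵 Converting a File
The CLI takes the output directory first, followed by one or more audio files:
```bash
mkdir -p output
basic-pitch output my_song.wav
```
By default, this writes a single MIDI file, `output/my_song_basic_pitch.mid`. Other outputs are opt-in:
- `my_song_basic_pitch.csv` (note details), with `--save-note-events`
- `my_song_basic_pitch.npz` (raw model outputs), with `--save-model-outputs`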
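For batch jobs, you can drive the CLI from Python. This sketch assembles the positional form the CLI expects (output directory first, then one or more audio files) plus the two optional output flags, and runs it with `subprocess`. `build_command` and `convert` are hypothetical helpers, and the sketch assumes the `basic-pitch` executable is on your `PATH`.

```python
import os
import shutil
import subprocess
from typing import List, Sequence


def build_command(output_dir: str, audio_paths: Sequence[str],
                  save_note_events: bool = False,
                  save_model_outputs: bool = False) -> List[str]:
    """Assemble basic-pitch arguments: output directory first, then the input files."""
    cmd = ["basic-pitch", output_dir, *audio_paths]
    if save_note_events:
        cmd.append("--save-note-events")    # also write a CSV of note events
    if save_model_outputs:
        cmd.append("--save-model-outputs")  # also write raw model outputs as NPZ
    return cmd


def convert(output_dir: str, audio_paths: Sequence[str], **flags: bool) -> None:
    """Convert a batch of files in one CLI call; raises CalledProcessError on failure."""
    if shutil.which("basic-pitch") is None:
        raise RuntimeError("basic-pitch not found on PATH; try `pip install basic-pitch`")
    os.makedirs(output_dir, exist_ok=True)  # make sure the output directory exists
    subprocess.run(build_command(output_dir, audio_paths, **flags), check=True)
```

For example, `convert("midi_out", ["take1.wav", "take2.wav"], save_note_events=True)` converts both takes in a single process, which should avoid reloading the model for every file.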
🛠️ Optional Flags
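| Flag | Purpose |
|---|---|
| `--onset-threshold` | Adjust note-onset detection sensitivity (default 0.5) |
| `--frame-threshold` | Adjust how long notes are held (default 0.3) |
| `--save-note-events` | Also write structured note data as CSV |
| `--save-model-outputs` | Also write raw model outputs as NPZ |
| `--sonify-midi` | Also render the MIDI to a .wav file |

MIDI output is on by default, and the output folder is always the first positional argument rather than a flag.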
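Because the CSV and NPZ files appear only when their flags are set, checking which files to expect can catch silent failures in scripts. The helper below assumes the naming scheme used by recent releases as far as we know (`<input stem>_basic_pitch.<ext>`, with `.wav` for `--sonify-midi`); `expected_outputs` and `missing_outputs` are hypothetical, so confirm the names against your installed version.

```python
from pathlib import Path
from typing import List


def expected_outputs(audio_path: str, output_dir: str, save_midi: bool = True,
                     save_note_events: bool = False, save_model_outputs: bool = False,
                     sonify_midi: bool = False) -> List[Path]:
    """List the files a run should create, assuming the `<stem>_basic_pitch.<ext>` scheme."""
    stem = Path(audio_path).stem
    wanted = [
        (save_midi, "mid"),
        (sonify_midi, "wav"),
        (save_model_outputs, "npz"),
        (save_note_events, "csv"),
    ]
    return [Path(output_dir) / f"{stem}_basic_pitch.{ext}" for on, ext in wanted if on]


def missing_outputs(paths: List[Path]) -> List[Path]:
    """Return the expected files that do not exist (e.g. after a failed run)."""
    return [p for p in paths if not p.exists()]
```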
Live Demo via Web Interface
Spotify provides a simple browser-based demo page (basicpitch.spotify.com) where you can:
- Upload an audio file (max 20MB)
- View pitch analysis in-browser
- Download MIDI results
Great for non-technical users or quick testing.
⚠️ Limitations:
- No batch processing
- Limited to 1-2 minutes of audio
- No advanced export options
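If a recording exceeds the demo's length limit, one workaround is to split it before uploading. This sketch uses Python's standard `wave` module, so it handles uncompressed WAV only; the 60-second chunk length and the `split_wav` name are our assumptions, and any note that straddles a cut will be split in the resulting MIDI.

```python
import wave
from pathlib import Path
from typing import List


def split_wav(src: str, out_dir: str, chunk_seconds: float = 60.0) -> List[Path]:
    """Split a WAV file into consecutive chunks of at most `chunk_seconds` each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = max(1, int(chunk_seconds * reader.getframerate()))
        index = 0
        while True:
            data = reader.readframes(frames_per_chunk)
            if not data:  # end of file
                break
            dest = out / f"{Path(src).stem}_part{index:03d}.wav"
            with wave.open(str(dest), "wb") as writer:
                writer.setparams(params)  # same channels, sample width, and rate
                writer.writeframes(data)  # the header's frame count is patched on write
            written.append(dest)
            index += 1
    return written
```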
Practical Applications Across Music Fields
Basic Pitch isn’t just for coders. Here’s how different users leverage it:
| User Type | Use Case Example |
|---|---|
| 🎹 Songwriters | Hum or sing a melody, convert to MIDI for DAW editing |
| 👨‍🏫 Music Teachers | Turn student voice/guitar into notation for feedback |
| 🤖 AI Developers | Generate symbolic music datasets for training deep models |
| 🎮 Game Developers | Convert sound effects into MIDI triggers for reactive gameplay |
| 🎧 Remix Artists | Extract hooks or riffs for remixing and layering |
Real-World Example: Melody to MIDI
Let’s say you recorded yourself humming this melody:
🎤 melody.mp3
You run:
```bash
basic-pitch . melody.mp3
```
Output:
- `melody_basic_pitch.mid`, usable in Ableton, Logic, or FL Studio
- Pitch bends and timing preserved
- The resulting MIDI can trigger synths, drums, or visuals
Result: A fully editable MIDI line from nothing but your voice.
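To inspect what was detected without opening a DAW, rerun with `--save-note-events` and read the CSV. The parser below assumes a header row followed by `start_time_s,end_time_s,pitch_midi,velocity` columns, with any pitch-bend values in extra columns; that layout matches our understanding of the library's output but is not a documented contract. `load_note_events` and `note_name` are hypothetical helpers.

```python
import csv
import io
from typing import List, NamedTuple

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


class NoteEvent(NamedTuple):
    start: float
    end: float
    pitch: int
    velocity: int


def note_name(pitch: int) -> str:
    """MIDI note number to scientific pitch name (60 is C4)."""
    return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"


def load_note_events(text: str) -> List[NoteEvent]:
    """Parse note-event CSV text, sorted by start time; extra (pitch-bend) columns are ignored."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    events = []
    for row in reader:
        if not row:
            continue
        start, end, pitch, velocity = row[:4]
        events.append(NoteEvent(float(start), float(end), int(float(pitch)), int(float(velocity))))
    return sorted(events, key=lambda e: e.start)


if __name__ == "__main__":
    sample = "start_time_s,end_time_s,pitch_midi,velocity\n0.5,1.0,64,80\n0.0,0.5,60,90\n"
    for ev in load_note_events(sample):
        print(f"{ev.start:.2f}-{ev.end:.2f}s {note_name(ev.pitch)} velocity={ev.velocity}")
```

On a real run you would pass the file contents instead, e.g. `load_note_events(open("melody_basic_pitch.csv").read())`.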
Integrating Basic Pitch Into Your App or Workflow
Web-Based API Wrapper (Flask Example)
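```python
import os
import tempfile
from flask import Flask, request, send_file
from basic_pitch import ICASSP_2022_MODEL_PATH
from basic_pitch.inference import predict

app = Flask(__name__)

@app.route("/convert", methods=["POST"])
def convert():
    upload = request.files["file"]
    work_dir = tempfile.mkdtemp()  # not cleaned up in this example
    audio_path = os.path.join(work_dir, os.path.basename(upload.filename or "input.wav"))
    upload.save(audio_path)  # predict() takes a path, so persist the upload first
    _, midi_data, _ = predict(audio_path, ICASSP_2022_MODEL_PATH)  # (model_output, PrettyMIDI, note_events)
    midi_path = os.path.join(work_dir, "output.mid")
    midi_data.write(midi_path)
    return send_file(midi_path, mimetype="audio/midi", as_attachment=True)

app.run()
```
You can connect this to a React front-end, use it in real-time music education tools, or build batch conversion services.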
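The route reads the upload from the `file` field of a `multipart/form-data` request. A dependency-free client has to build that body itself; the sketch below does so with the standard library. The default URL assumes Flask's development server on `127.0.0.1:5000`, and `build_multipart` and `convert_remote` are hypothetical names.

```python
import os
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes,
                    boundary: str = "") -> Tuple[bytes, str]:
    """Encode one file field as multipart/form-data; returns (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert_remote(audio_path: str, url: str = "http://127.0.0.1:5000/convert") -> bytes:
    """POST an audio file to the /convert endpoint and return the MIDI bytes."""
    with open(audio_path, "rb") as fh:
        body, content_type = build_multipart("file", os.path.basename(audio_path), fh.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": content_type},
                                 method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Writing the returned bytes to a `.mid` file gives you the converted MIDI.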
Basic Pitch vs Competitors: The Full Comparison
| Feature | Basic Pitch | Melodyne Studio | Ableton Audio-to-MIDI | Spleeter + MIDIify |
|---|---|---|---|---|
| Cost | Free, open-source | Paid (~$699) | Included in Ableton Suite | Free, open-source |
| Polyphonic Support | Yes | Yes | Partial | Yes |
| Pitch Bend / Vibrato | Yes | Yes | No | No |
| Drum/Beat Support | No | Yes | Limited | Partial |
| Real-Time Processing | Fast, file-based (no official live mode) | No | No | No |
| Batch Processing | Yes | Yes | No | Yes |
| Platform Integration | CLI / Python / Web | DAW plugin | DAW internal | CLI / API |
Conclusion:
Basic Pitch hits a unique sweet spot — free, accurate, polyphonic, and developer-friendly. While Melodyne is still king for detailed DAW editing, it comes with a steep price tag and learning curve.
User Feedback: What Real Users Say
We gathered testimonials from early adopters across music communities:
🧑‍🎤 “I used Basic Pitch to transcribe a full vocal take into MIDI. It got all my slides and even subtle pitch bends — amazing for a free tool.”
— Julia R., singer-songwriter
🎚️ “For my AI music app, I needed symbolic data from real audio. Basic Pitch let me batch-convert thousands of vocal clips into MIDI in a day.”
— Dev Singh, ML engineer
🎼 “It’s not perfect for dense chords or noisy mixes, but for single-instrument tracks, it’s shockingly accurate.”
— Miguel O., music teacher
Limitations to Keep in Mind
Despite its strengths, Basic Pitch does have limits:
- ❌ Doesn’t support percussion/drums
- 🎹 Dense polyphonic textures (e.g., orchestras) may degrade accuracy
- 💻 No desktop GUI (apart from the web demo, it is CLI or code only)
- 🌐 Web demo is limited in file size and duration
- 🔉 Sensitive to background noise and compression artifacts
To overcome these, many users pre-process their audio (denoising, EQing) before conversion.
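As a rough illustration of that kind of pre-processing, the sketch below applies a first-order high-pass filter (to strip DC offset and low-frequency rumble) and then peak-normalizes a list of float samples. It is a stand-in for proper denoising or EQ tools, not something Basic Pitch does itself; the 40 Hz cutoff, the 0.9 target peak, and the function names are assumptions.

```python
import math
from typing import List


def high_pass(samples: List[float], sample_rate: int, cutoff_hz: float = 40.0) -> List[float]:
    """First-order (RC) high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    out: List[float] = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def peak_normalize(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale so the largest absolute sample equals target_peak; silence is returned unchanged."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]
```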
Future Directions and Community Forks
As of early 2025, there are several forks and enhancements built on Basic Pitch:
- Real-Time DAW Plugins — using TensorFlow Lite and VST bridges
- WebAssembly Ports — for browser-native inference
- Microtonal Pitch Models — training for non-Western scales and tunings
- On-Device iOS Apps — live conversion from mic input to MIDI
Spotify has hinted at continued support via its Research division, but the community has already taken the lead.
Final Verdict: Who Should Use Basic Pitch?
You should absolutely consider using Basic Pitch if you are:
✅ A music app developer needing high-volume, reliable MIDI conversion
✅ A music educator looking to engage students through tech
✅ A producer or remixer who works with real vocals and instruments
✅ An AI builder creating datasets from live performance audio
But skip it if you need:
❌ Drum pattern extraction
❌ Precision editing of dense orchestral content
❌ Full GUI workflows without coding
References
Basic Pitch by Spotify
Spotify’s open-source audio-to-MIDI conversion model for expressive, polyphonic audio
https://github.com/spotify/basic-pitch
Spotify R&D: Real-Time Pitch Estimation
Spotify research publications, including the neural transcription work behind Basic Pitch
https://research.atspotify.com
Melodyne Studio
High-end commercial audio-to-MIDI and pitch editing software
https://www.celemony.com/en/melodyne
Ableton Live Audio-to-MIDI Guide
DAW-integrated conversion tools in Ableton Suite
https://www.ableton.com/en/live/
Spleeter + MIDIify Workflow
Open-source separation + MIDI conversion approach
https://github.com/deezer/spleeter
Tags
#Spotify, #BasicPitch, #AudioToMIDI, #MusicAI, #OpenSource, #DigitalMusic, #TensorFlow, #MusicTechnology, #RealtimeAudio, #MIDITools



