SSTV Audio Challenge

Amir Aliu|25/02/26|1 min read

Challenge Information

Artifact: output.flac
Duration: ~115 seconds
Sample Rate: 48000 Hz
Channels: Mono

Step 1 - Initial Inspection

The file was output.flac, about 115 seconds long.

My first thoughts were:

Maybe spoken audio?
Maybe Morse code?
Maybe spectrogram art?

So I started by reading the file's metadata:

Python

from mutagen.flac import FLAC
 
f = FLAC("output.flac")
print("length", f.info.length)
print("sample_rate", f.info.sample_rate)
print("channels", f.info.channels)
print("tags", dict(f.tags) if f.tags else {})

It returned:

Text

length 3.713832199546485
sample_rate 22050
channels 1
tags {}

Then I tried speech recognition:

Python

import speech_recognition as sr
 
r = sr.Recognizer()
 
with sr.AudioFile("output.flac") as source:
    audio = r.record(source)
 
print(r.recognize_google(audio))

It returned:

Text

TV 459 amazing sound

"TV" seemed like a hint, but it gave me nothing practically useful yet.

Step 2 - Spectrogram Analysis

When audio doesn't make sense audibly, I always check the frequency domain.

I installed matplotlib (the script below also imports soundfile):

Bash

python -m pip install matplotlib

Then generated a spectrogram:

Python

import matplotlib
matplotlib.use('Agg')  # headless backend: render to a file without opening a window
import matplotlib.pyplot as plt
import soundfile as sf

# Load the samples and the sample rate.
x, sr = sf.read('output.flac')

plt.figure(figsize=(18,6))
plt.specgram(x, NFFT=2048, Fs=sr, noverlap=1024, cmap='magma')
plt.ylim(900, 2500)  # zoom in on the 900-2500 Hz band
plt.tight_layout()
plt.savefig('spectrogram_900_2500.png', dpi=180)

The resulting spectrogram showed structured horizontal tone patterns between 900 and 2500 Hz. The moment I saw them, I thought:

That looks like SSTV (Slow-scan television).

And once you see SSTV patterns, you can't unsee them.
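
To make that visual cue concrete, here is a minimal standard-library sketch of how a short audio window could be classified by the SSTV reference tone that dominates it. The tone frequencies are the commonly published SSTV values (1200 Hz sync, 1500 Hz black, 1900 Hz leader, 2300 Hz white), assumed here rather than measured from the challenge file. The helper names `goertzel_power`, `dominant_tone`, and `freq_to_luma` are hypothetical and are not part of the decoder used in this writeup.

```python
import math

# Commonly published SSTV reference tones in Hz (assumed, not read from the file).
SSTV_TONES = {"sync": 1200, "black": 1500, "leader": 1900, "white": 2300}

def goertzel_power(samples, sample_rate, freq):
    """Estimate the signal power at `freq` with the Goertzel algorithm."""
    n = len(samples)
    k = round(n * freq / sample_rate)  # nearest DFT bin to the target frequency
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

def dominant_tone(samples, sample_rate):
    """Return the name of the reference tone with the most power in the window."""
    return max(
        SSTV_TONES,
        key=lambda name: goertzel_power(samples, sample_rate, SSTV_TONES[name]),
    )

def freq_to_luma(freq):
    """Map a video frequency (1500-2300 Hz) linearly to a 0-255 brightness value."""
    luma = round((freq - 1500) / (2300 - 1500) * 255)
    return max(0, min(255, luma))  # clamp out-of-band frequencies

# Synthetic 10 ms window of a 1900 Hz leader tone at 48 kHz.
rate = 48000
window = [math.sin(2 * math.pi * 1900 * i / rate) for i in range(480)]
print(dominant_tone(window, rate))
print(freq_to_luma(1500), freq_to_luma(2300))
```

A real decoder tracks frequency continuously rather than picking from four fixed tones, so this only illustrates why the band between roughly 1200 and 2300 Hz looks so regular on the spectrogram.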

Step 3 - SSTV Decoder

I installed an SSTV decoder:

Bash

python -m pip install git+https://github.com/colaclanth/sstv.git

I tried the CLI first:

Bash

sstv -d output.flac -o decoded.png

But Windows threw a terminal handle error. So instead of wasting time debugging the CLI, I switched to using the Python API directly.

Step 4 - Forcing Modes

Automatic VIS (Vertical Interval Signaling) detection reported an unsupported mode code, 26. That meant one of:

A corrupted VIS header
A custom mode
A detection failure
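
For context on what a VIS code is, the sketch below decodes one from its bits, assuming the usual layout of seven data bits sent least-significant-bit first, followed by an even-parity bit. The table lists widely published VIS codes for the modes tried later in this writeup. `KNOWN_VIS`, `decode_vis`, and `encode_vis` are hypothetical names for illustration, not the decoder library's API.

```python
# Widely published VIS codes for common modes (assumed; not taken from the challenge file).
KNOWN_VIS = {
    8: "Robot 36",
    12: "Robot 72",
    40: "Martin M2",
    44: "Martin M1",
    56: "Scottie S2",
    60: "Scottie S1",
    76: "Scottie DX",
}

def decode_vis(bits):
    """Decode 8 VIS bits (7 data bits, LSB first, then an even-parity bit).

    Returns (code, mode_name); mode_name is None for a code not in KNOWN_VIS.
    Raises ValueError when the parity check fails.
    """
    if len(bits) != 8:
        raise ValueError("expected 7 data bits plus 1 parity bit")
    if sum(bits) % 2 != 0:
        raise ValueError("parity check failed: header is likely corrupted")
    code = sum(bit << i for i, bit in enumerate(bits[:7]))
    return code, KNOWN_VIS.get(code)

def encode_vis(code):
    """Build the 8-bit sequence for a code (helper for the demo below)."""
    data = [(code >> i) & 1 for i in range(7)]
    return data + [sum(data) % 2]  # parity bit makes the total count of ones even

print(decode_vis(encode_vis(44)))
print(decode_vis(encode_vis(26)))
```

In this sketch a parity failure raises an error, while a well-formed but unknown code such as 26 simply has no table entry, so the two failure modes can be told apart. Whether the decoder used here validates the header in exactly this way is an assumption.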

So I brute-forced common SSTV modes:

Python

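from sstv.decode import SSTVDecoder
import sstv.decode as dec
from sstv import spec

# Silence the library's console logging and progress bar.
dec.log_message = lambda *a, **k: None
dec.progress_bar = lambda *a, **k: None

# Candidate modes to force, bypassing VIS-based mode detection.
modes = [
    ('M1', spec.M1), ('M2', spec.M2), ('S1', spec.S1),
    ('S2', spec.S2), ('SDX', spec.SDX), ('R36', spec.R36), ('R72', spec.R72)
]

for name, mode in modes:
    # Note: this relies on the decoder's private (underscore) methods,
    # which may change between library versions.
    d = SSTVDecoder('output.flac')
    h = d._find_header()  # position of the calibration header, or None if not found

    if h is None:
        print(name, 'no header')
        continue

    # Force the mode, then skip the VIS bits (9 x VIS_BIT_SIZE seconds)
    # to reach the start of the image data.
    d.mode = mode
    vis_end = h + round(spec.VIS_BIT_SIZE * 9 * d._sample_rate)
    img_data = d._decode_image_data(vis_end)
    img = d._draw_image(img_data)
    out = f'decoded_forced_{name}.png'
    img.save(out)
    print(name, 'saved', out)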

Mode M1 (Martin M1) produced a clean, readable image, which confirmed the SSTV hypothesis.
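
As a cross-check, the expected transmission length of each candidate mode can be compared with the file's duration. The sketch below uses commonly published nominal line times and line counts, plus about 0.91 s of calibration header (two 300 ms leader tones, a 10 ms break, and ten 30 ms VIS bits). These figures are assumptions taken from general SSTV references, not values measured from this file, and `estimated_duration` and `closest_mode` are hypothetical helpers.

```python
# (line_time_seconds, line_count) per mode; nominal published values (assumed).
MODE_TIMING = {
    "M1": (0.446446, 256),   # Martin M1
    "M2": (0.226798, 256),   # Martin M2
    "S1": (0.428220, 256),   # Scottie S1
    "S2": (0.277692, 256),   # Scottie S2
    "SDX": (1.050300, 256),  # Scottie DX
    "R36": (0.150000, 240),  # Robot 36
    "R72": (0.300000, 240),  # Robot 72
}

# Leader tone + break + leader tone + 10 VIS bit slots (start, 7 data, parity, stop).
HEADER_SECONDS = 0.300 + 0.010 + 0.300 + 10 * 0.030

def estimated_duration(mode):
    """Approximate total transmission time for a mode, in seconds."""
    line_time, lines = MODE_TIMING[mode]
    return HEADER_SECONDS + line_time * lines

def closest_mode(duration_seconds):
    """Return the mode whose estimated duration is nearest to the given one."""
    return min(
        MODE_TIMING,
        key=lambda m: abs(estimated_duration(m) - duration_seconds),
    )

for name in MODE_TIMING:
    print(f"{name}: {estimated_duration(name):.1f} s")
print("closest to 115 s:", closest_mode(115))
```

With these numbers Martin M1 comes out at roughly 115 s, in line with the ~115 second artifact, while the next closest candidate, Scottie S1, is around 110.5 s. A check like this could narrow the brute-force list before decoding anything.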

Final Flag

Text

THJCC{sSTv-is_aMaZINg}
