machine_learning

Challenge

Imported from local notes.md.

Solution

Original Notes

Challenge Summary

Given: A challenge description pointing to a public repo entry for Machine Learnding. The entry initially exposed a broken Google Drive model archive; that archive was later replaced locally with the correct silly_fella.zip asset.
Goal: Recover the flag hidden in the provided "LLM" artifact.
Constraints: The originally published artifact was incomplete as hosted; no endpoint or prompt interface was provided outside the model files.

Initial Recon / Triage

Observations:
- The repo path for the challenge only contains gdrivelink.txt.
- Downloading the linked Drive folder yields exactly one file: Machine Learnding/silly_fella_1.5B.zip.
- The ZIP is missing its central directory and the model payload stream ends mid-deflate.
File identification:
- config.json identifies the model family as Qwen/Qwen2.5-1.5B with model_type: qwen2.
- generation_config.json is standard and contains no challenge-specific metadata.
- model.safetensors has a valid safetensors header but only a truncated payload.
Entry points:
- Recover what can be extracted from the ZIP.
- Compare the recovered model bytes against the public base Qwen weights.

Hypotheses & Approach

Hypothesis 1: The flag was embedded directly in model metadata, prompt text, or other plaintext inside the archive.
Hypothesis 2: The model was a modified derivative of Qwen and the flag would be recoverable by comparing it to the public base model.
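
Hypothesis 1 can be tested cheaply by scanning the recovered bytes for the flag format and for other printable runs. The sketch below is one way to do that, assuming the flag follows the DawgCTF{...} pattern; the helper names and the sample bytes (with an obviously fake flag) are illustrative only.

```python
import re

# Flag pattern is an assumption based on the DawgCTF{...} format.
FLAG_RE = re.compile(rb"DawgCTF\{[^}\n]{1,100}\}")
# Runs of at least 8 printable ASCII bytes, similar in spirit to `strings`.
PRINTABLE_RE = re.compile(rb"[\x20-\x7e]{8,}")


def find_flags(blob: bytes) -> list:
    """Return every flag-shaped substring found in the blob."""
    return [m.group().decode("ascii", "replace") for m in FLAG_RE.finditer(blob)]


def printable_runs(blob: bytes, limit: int = 20) -> list:
    """Return up to `limit` printable runs for manual review."""
    runs = []
    for m in PRINTABLE_RE.finditer(blob):
        runs.append(m.group().decode("ascii"))
        if len(runs) >= limit:
            break
    return runs


# Synthetic sample with a placeholder flag, not data from the challenge.
sample = b"\x00\x01config_value_here\xff\xfeDawgCTF{example_only}\x00"
print(find_flags(sample))
print(printable_runs(sample))
```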

Execution Steps (Reproducible)

Stage 1

Commands:

cd /root/dawg2026CTF/machine_learning/starting_files
python3 -m venv /tmp/ml-venv
/tmp/ml-venv/bin/pip install gdown
/tmp/ml-venv/bin/gdown --folder 'https://drive.google.com/drive/folders/18n42cn_8_LD9yECya0ohBlr1S8lE-1mw?usp=drive_link'

cd '/root/dawg2026CTF/machine_learning/starting_files/Machine Learnding'
7z l silly_fella_1.5B.zip
7z x -y silly_fella_1.5B.zip -oextracted

Results:

The Drive folder contains only silly_fella_1.5B.zip.
7z can list local ZIP entries, but extraction reports "Unexpected end of archive" and "Data Error : silly_fella_1.5B/model.safetensors".
Recoverable files:
- extracted/silly_fella_1.5B/config.json
- extracted/silly_fella_1.5B/generation_config.json
- extracted/silly_fella_1.5B/model.safetensors
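
Because the central directory is missing, the members can only be located through their local file headers. The sketch below shows one way to scan for them by signature; it is an illustration and not the zip_forensics.py script listed under the artifacts. The function name and the small demo archive are assumptions. A signature match inside compressed data would be a false positive, so each hit should be sanity-checked.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"  # local file header signature


def scan_local_headers(data: bytes) -> list:
    """Find local file headers by signature; no central directory needed."""
    entries = []
    pos = data.find(LOCAL_SIG)
    while pos != -1 and pos + 30 <= len(data):
        # Fixed 30-byte local header, followed by the name and extra field.
        (_sig, _ver, flags, method, _mtime, _mdate, _crc,
         _csize, _usize, nlen, elen) = struct.unpack(
            "<IHHHHHIIIHH", data[pos:pos + 30])
        name = data[pos + 30:pos + 30 + nlen].decode("utf-8", "replace")
        entries.append({
            "offset": pos,
            "name": name,
            "method": method,          # 8 means deflate
            "flags": flags,
            "data_start": pos + 30 + nlen + elen,
        })
        pos = data.find(LOCAL_SIG, pos + 4)
    return entries


# Demo: build a small ZIP in memory, then cut off its central directory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.json", "{}" * 50)
    zf.writestr("b.bin", "x" * 500)
whole = buf.getvalue()
damaged = whole[:whole.find(b"PK\x01\x02")]

found = scan_local_headers(damaged)
for entry in found:
    print(entry)
```

The offset of a header found this way is the kind of value that can be fed to a manual `struct.unpack` of the 30-byte header, followed by raw-deflate decompression from `data_start`.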

Stage 2

Commands:

cd '/root/dawg2026CTF/machine_learning/starting_files/Machine Learnding'
python3 -c "import struct, zlib; data=open('silly_fella_1.5B.zip','rb').read(); i=758; hdr=data[i:i+30]; sig,ver,flags,method,mtime,mdate,crc,csize,usize,nlen,elen=struct.unpack('<IHHHHHIIIHH', hdr); start=i+30+nlen+elen; comp=data[start:]; d=zlib.decompressobj(-15); out=d.decompress(comp); print(method, hex(flags), len(comp), len(out), d.eof)"

curl -L -r 0-288356779 'https://huggingface.co/Qwen/Qwen2.5-1.5B/resolve/main/model.safetensors' -o /tmp/qwen_head_fullcmp.bin
python3 - <<'PY'
from pathlib import Path
import struct
import numpy as np

base_path = Path('/tmp/qwen_head_fullcmp.bin')
chall_path = Path('/root/dawg2026CTF/machine_learning/starting_files/Machine Learnding/extracted/silly_fella_1.5B/model.safetensors')

with base_path.open('rb') as fb, chall_path.open('rb') as fc:
    # Each safetensors file starts with a little-endian uint64 header length.
    nb = struct.unpack('<Q', fb.read(8))[0]
    nc = struct.unpack('<Q', fc.read(8))[0]
    # Number of 16-bit values recovered in the truncated challenge payload.
    remain = (chall_path.stat().st_size - 8 - nc) // 2
    fb.seek(8 + nb)
    fc.seek(8 + nc)
    chunk = 10_000_000
    pos = 0
    bad = None
    while pos < remain:
        n = min(chunk, remain - pos)
        base = np.frombuffer(fb.read(n * 2), dtype=np.uint16)
        chall = np.frombuffer(fc.read(n * 2), dtype=np.uint16)
        # Reinterpret BF16 bits as float32, round to FP16, compare bit patterns.
        conv = (base.astype(np.uint32) << 16).view(np.float32).astype(np.float16).view(np.uint16)
        neq = np.nonzero(conv != chall)[0]
        if len(neq):
            bad = pos + int(neq[0])
            print('mismatch_at', bad)
            break
        pos += n
print('result', 'match' if bad is None else 'mismatch')
PY

Results:

The model stream is truncated mid-deflate: decompression yields only the first 288356441 bytes, and zlib reports eof as False.
The expected challenge safetensors size is exactly the public Qwen file size minus the header delta caused by BF16 -> F16 dtype strings.
The entire recovered overlap of tensor data matches the public Qwen2.5-1.5B safetensors when the public model is converted from BF16 to FP16.
No challenge-specific strings, prompts, metadata, or modified weights were found in the published portion of the artifact.
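
The BF16 to FP16 comparison rests on a simple bit-level fact: a BF16 value is the upper 16 bits of an IEEE float32, so shifting it left by 16 bits reproduces the float32 exactly, and that float32 can then be rounded to FP16. A minimal sketch of the conversion on a few hand-picked values follows; the function name is an assumption.

```python
import numpy as np


def bf16_bits_to_fp16_bits(bf16: np.ndarray) -> np.ndarray:
    """Convert raw BF16 bit patterns (uint16) to FP16 bit patterns (uint16)."""
    # Widen to 32 bits and shift so the BF16 bits become the float32 high half.
    as_f32 = (bf16.astype(np.uint32) << 16).view(np.float32)
    # Round to half precision and expose the resulting bit patterns.
    return as_f32.astype(np.float16).view(np.uint16)


# BF16 encodings of 1.0, -2.0, 0.0 and 0.5.
sample = np.array([0x3F80, 0xC000, 0x0000, 0x3F00], dtype=np.uint16)
converted = bf16_bits_to_fp16_bits(sample)
print([hex(v) for v in converted.tolist()])
```

FP16 has a narrower exponent range than BF16, so very large magnitudes become infinity and very small ones become subnormal or zero. Comparing bit patterns rather than float values keeps such cases, and NaN, from hiding a mismatch.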

Stage 3

Commands:

curl -L 'https://huggingface.co/Qwen/Qwen2.5-1.5B/resolve/main/model.safetensors' -o /root/dawg2026CTF/machine_learning/artifacts/qwen2.5-1.5b-base-model.safetensors
python3 /root/dawg2026CTF/machine_learning/artifacts/test_zip_streams.py
python3 -u /root/dawg2026CTF/machine_learning/artifacts/compare_model_stream.py

Results:

The small ZIP members (config.json and generation_config.json) are compressed with ordinary raw-deflate and reproduce exactly with standard zlib compression levels.
The model.safetensors stream in the challenge archive is an exact prefix of the raw-deflate stream obtained by:
- taking the public Qwen2.5-1.5B model.safetensors,
- replacing the safetensors header with the shorter F16 version, and
- converting the full tensor payload from BF16 to FP16 before compressing with standard zlib deflate.
The comparison matched every byte of the provided compressed model stream through offset 221838506; the challenge file simply stops there while the standard compressor would continue for another ~2.4 MB of compressed output.
This means the published archive is not a custom finetuned model in the accessible region and is not using an unusual compression side channel there; it is a truncated standard ZIP of a stock FP16-converted Qwen model.
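
The prefix test described above can be illustrated on synthetic data. The sketch below compresses a payload with raw deflate, truncates the result, and confirms both that the truncated stream is a prefix of the full one and that inflating it stops without reaching end of stream. It is not the compare_model_stream.py script from the artifacts; the compression level of 6 and the helper names are assumptions.

```python
import zlib


def raw_deflate(data: bytes, level: int = 6) -> bytes:
    """Compress with raw deflate (no zlib header or trailer), as in ZIP members."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def inflate_partial(stream: bytes) -> tuple:
    """Inflate as much as possible from a possibly truncated raw-deflate stream."""
    d = zlib.decompressobj(-15)
    out = d.decompress(stream)
    # d.eof is True only if the final deflate block was fully consumed.
    return out, d.eof


payload = bytes(range(256)) * 4096       # synthetic stand-in for tensor bytes
full = raw_deflate(payload)
truncated = full[: len(full) // 2]       # simulate the cut-off archive member

recovered, finished = inflate_partial(truncated)
print(len(full), len(truncated), len(recovered), finished)
print("prefix match:", full.startswith(truncated))
```

A byte-for-byte prefix match is only meaningful when the candidate is compressed with the same library and settings as the original, which is why reproducing the small members first is a useful calibration step.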

Stage 4

Commands:

cd /root/dawg2026CTF/machine_learning/starting_files
7z x -y silly_fella.zip -oextracted_full 'merged_qwen_model/model.safetensors'

python3 -u /root/dawg2026CTF/machine_learning/artifacts/compare_tensors.py > /tmp/ml_compare_tensors.txt 2>&1

rm -rf /root/mlinfer && python3 -m venv /root/mlinfer
TMPDIR=/root/tmp /root/mlinfer/bin/pip install --upgrade pip
TMPDIR=/root/tmp /root/mlinfer/bin/pip install --index-url https://download.pytorch.org/whl/cpu torch
TMPDIR=/root/tmp /root/mlinfer/bin/pip install transformers safetensors

/root/mlinfer/bin/python /root/dawg2026CTF/machine_learning/artifacts/stream_prompt.py 'Finish this sentence exactly: The flag is a red and white striped' 8

Results:

The corrected silly_fella.zip archive contained a full merged model rather than the earlier truncated ZIP prefix.
Tensor comparison against base Qwen showed broad weight changes, confirming this was a genuine merged/finetuned model and not just a repack.
Local CPU inference on the corrected model, using the sentence-completion prompt above, exposed the hidden answer path, and the final flag value was confirmed.
Final flag: DawgCTF{Astr4l_Pr0j3ct_Th1s!}
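
The contents of compare_tensors.py are not reproduced in these notes. As a rough illustration of what a per-tensor comparison against the base model can look like, the sketch below reports the maximum absolute difference for each tensor that changed. The function name, the tensor names, and the toy arrays are all hypothetical.

```python
import numpy as np


def changed_tensors(base: dict, other: dict, tol: float = 0.0) -> list:
    """List (name, max_abs_diff) for shared tensors that differ by more than tol."""
    report = []
    for name in sorted(base.keys() & other.keys()):
        a = base[name].astype(np.float32)
        b = other[name].astype(np.float32)
        if a.shape != b.shape:
            # A shape change is reported as an infinite difference.
            report.append((name, float("inf")))
            continue
        diff = float(np.max(np.abs(a - b))) if a.size else 0.0
        if diff > tol:
            report.append((name, diff))
    return report


# Toy stand-ins for two state dicts (hypothetical tensor names).
base = {
    "layer.0.weight": np.ones((2, 2), dtype=np.float16),
    "layer.1.weight": np.zeros((2, 2), dtype=np.float16),
}
tuned = {
    "layer.0.weight": np.ones((2, 2), dtype=np.float16),
    "layer.1.weight": np.full((2, 2), 0.5, dtype=np.float16),
}
report = changed_tensors(base, tuned)
print(report)
```

A repack of the stock weights would produce an empty report, while a merged or finetuned model would be expected to list many tensors.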

Artifacts Produced

starting_files/silly_fella.zip
starting_files/extracted_full/merged_qwen_model/model.safetensors
starting_files/Machine Learnding/silly_fella_1.5B.zip
starting_files/Machine Learnding/extracted/silly_fella_1.5B/config.json
starting_files/Machine Learnding/extracted/silly_fella_1.5B/generation_config.json
starting_files/Machine Learnding/extracted/silly_fella_1.5B/model.safetensors
artifacts/zip_forensics.py
artifacts/compare_headers.py
artifacts/test_zip_streams.py
artifacts/compare_model_stream.py
artifacts/compare_tensors.py
artifacts/run_model_prompts.py
artifacts/run_single_prompt.py
artifacts/stream_prompt.py
artifacts/continue_assistant_prefix.py
artifacts/qwen2.5-1.5b-base-model.safetensors

Flag

DawgCTF{Astr4l_Pr0j3ct_Th1s!}
