machine_learning
Challenge
Imported from local notes.md.
Solution
Original Notes
machine_learning
Challenge Summary
- Given: A challenge description pointing to a public repo entry for Machine Learnding, which initially exposed a broken Google Drive model archive and was later replaced locally with the correct silly_fella.zip asset.
- Goal: Recover the flag hidden in the provided "LLM" artifact.
- Constraints: The originally published artifact was incomplete as hosted; no endpoint or prompt interface was provided outside the model files.
Initial Recon / Triage
- Observations:
  - The repo path for the challenge only contains gdrivelink.txt.
  - Downloading the linked Drive folder yields exactly one file: Machine Learnding/silly_fella_1.5B.zip.
  - The ZIP is missing its central directory and the model payload stream ends mid-deflate.
- File identification:
  - config.json identifies the model family as Qwen/Qwen2.5-1.5B with model_type: qwen2.
  - generation_config.json is standard and contains no challenge-specific metadata.
  - model.safetensors has a valid safetensors header but only a truncated payload.
- Entry points:
  - Recover what can be extracted from the ZIP.
  - Compare the recovered model bytes against the public base Qwen weights.
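The "valid header, truncated payload" observation can be checked from the safetensors layout itself: an 8-byte little-endian header length, a JSON header, then the tensor bytes. The following is a minimal sketch, not the script used in these notes; the function names and the demo tensor names are hypothetical, and it reads the whole blob into memory, which would need to become a seek-based read for a multi-gigabyte file.

```python
import json
import struct


def safetensors_layout(blob: bytes):
    """Return (header_len, dtypes, expected_payload, actual_payload) for a blob."""
    # The file starts with an 8-byte little-endian header length,
    # followed by that many bytes of JSON describing each tensor.
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    dtypes = sorted({t["dtype"] for t in tensors.values()})
    # data_offsets are [begin, end) positions relative to the end of the header.
    expected = max((t["data_offsets"][1] for t in tensors.values()), default=0)
    actual = len(blob) - 8 - header_len
    return header_len, dtypes, expected, actual


def build_demo_blob(drop_tail: int = 0) -> bytes:
    """Build a tiny synthetic safetensors blob (hypothetical tensor names)."""
    header = {
        "a.weight": {"dtype": "F16", "shape": [2, 2], "data_offsets": [0, 8]},
        "b.weight": {"dtype": "F16", "shape": [4], "data_offsets": [8, 16]},
    }
    raw = json.dumps(header).encode()
    blob = struct.pack("<Q", len(raw)) + raw + bytes(16)
    return blob[:len(blob) - drop_tail] if drop_tail else blob


if __name__ == "__main__":
    _, dtypes, expected, actual = safetensors_layout(build_demo_blob(drop_tail=6))
    print(dtypes, "expected", expected, "actual", actual)
```

A payload shorter than the largest data_offsets end value indicates truncation, while the dtype set shows whether the file was stored as F16 or BF16.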
Hypotheses & Approach
- Hypothesis 1: The flag was embedded directly in model metadata, prompt text, or other plaintext inside the archive.
- Hypothesis 2: The model was a modified derivative of Qwen and the flag would be recoverable by comparing it to the public base model.
Execution Steps (Reproducible)
Stage 1
Commands:
cd /root/dawg2026CTF/machine_learning/starting_files
python3 -m venv /tmp/ml-venv
/tmp/ml-venv/bin/pip install gdown
/tmp/ml-venv/bin/gdown --folder 'https://drive.google.com/drive/folders/18n42cn_8_LD9yECya0ohBlr1S8lE-1mw?usp=drive_link'
cd '/root/dawg2026CTF/machine_learning/starting_files/Machine Learnding'
7z l silly_fella_1.5B.zip
7z x -y silly_fella_1.5B.zip -oextracted
Results:
- The Drive folder contains only silly_fella_1.5B.zip.
- 7z can list local ZIP entries, but extraction reports "Unexpected end of archive" and "Data Error : silly_fella_1.5B/model.safetensors".
- Recoverable files:
  - extracted/silly_fella_1.5B/config.json
  - extracted/silly_fella_1.5B/generation_config.json
  - extracted/silly_fella_1.5B/model.safetensors
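When the central directory is missing, the members can still be located by walking the local file headers, which is presumably how 7z lists the entries. The sketch below shows the idea under stated assumptions: scan_local_headers is a hypothetical helper, the signature search is naive (it could match by accident inside compressed data), and it is not the zip_forensics.py script listed under the artifacts.

```python
import struct

LOCAL_SIG = b"PK\x03\x04"  # local file header signature


def scan_local_headers(data: bytes):
    """Yield (offset, name, method, data_start) for each local file header found."""
    pos = data.find(LOCAL_SIG)
    while pos != -1 and pos + 30 <= len(data):
        # Fixed 30-byte local header, same layout as used in Stage 2.
        (_sig, _ver, _flags, method, _mtime, _mdate, _crc, _csize, _usize,
         name_len, extra_len) = struct.unpack("<IHHHHHIIIHH", data[pos:pos + 30])
        name = data[pos + 30:pos + 30 + name_len].decode("utf-8", "replace")
        data_start = pos + 30 + name_len + extra_len
        yield pos, name, method, data_start
        # Resume the search after this header; sizes are not trusted here
        # because a truncated or streamed archive may not record them.
        pos = data.find(LOCAL_SIG, data_start)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as fh:
        for offset, name, method, start in scan_local_headers(fh.read()):
            print(offset, method, start, name)
```

The reported offset of a member's header is the kind of value that the hard-coded i=758 in Stage 2 stands for, and data_start is where the raw-deflate stream begins.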
Stage 2
Commands:
cd '/root/dawg2026CTF/machine_learning/starting_files/Machine Learnding'
python3 -c "import struct, zlib; data=open('silly_fella_1.5B.zip','rb').read(); i=758; hdr=data[i:i+30]; sig,ver,flags,method,mtime,mdate,crc,csize,usize,nlen,elen=struct.unpack('<IHHHHHIIIHH', hdr); start=i+30+nlen+elen; comp=data[start:]; d=zlib.decompressobj(-15); out=d.decompress(comp); print(method, hex(flags), len(comp), len(out), d.eof)"
curl -L -r 0-288356779 'https://huggingface.co/Qwen/Qwen2.5-1.5B/resolve/main/model.safetensors' -o /tmp/qwen_head_fullcmp.bin
python3 - <<'PY'
from pathlib import Path
import struct
import numpy as np

bp = Path('/tmp/qwen_head_fullcmp.bin')
cp = Path('/root/dawg2026CTF/machine_learning/starting_files/Machine Learnding/extracted/silly_fella_1.5B/model.safetensors')
fb = bp.open('rb'); nb = struct.unpack('<Q', fb.read(8))[0]  # base header length
fc = cp.open('rb'); nc = struct.unpack('<Q', fc.read(8))[0]  # challenge header length
remain = (cp.stat().st_size - 8 - nc) // 2  # number of recovered 16-bit elements
fb.seek(8 + nb); fc.seek(8 + nc)
chunk = 10_000_000; pos = 0; bad = None
while pos < remain:
    n = min(chunk, remain - pos)
    base = np.frombuffer(fb.read(n * 2), dtype=np.uint16)
    chall = np.frombuffer(fc.read(n * 2), dtype=np.uint16)
    # BF16 bits -> float32 (shift into the high half) -> FP16 bits, compared bit for bit
    conv = (base.astype(np.uint32) << 16).view(np.float32).astype(np.float16).view(np.uint16)
    neq = np.nonzero(conv != chall)[0]
    if len(neq):
        bad = pos + int(neq[0]); print('mismatch_at', bad); break
    pos += n
print('result', 'match' if bad is None else 'mismatch')
PY
Results:
- The model stream is truncated mid-deflate: decompression produces only the first 288356441 bytes and zlib reports eof False.
- The expected challenge safetensors size is exactly the public Qwen file size minus the header delta caused by BF16->F16 dtype strings.
- The entire recovered overlap of tensor data matches the public Qwen2.5-1.5B safetensors when the public model is converted from BF16 to FP16.
- No challenge-specific strings, prompts, metadata, or modified weights were found in the published portion of the artifact.
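The BF16 to FP16 step behind that comparison works because BF16 is the upper 16 bits of a float32. The sketch below shows the same conversion for a single value using only the standard library. It is an illustration rather than the vectorized numpy path used in the command above: bf16_bits_to_fp16_bits is a hypothetical name, and mapping out-of-range values to infinity is an assumption about how overflow should be handled.

```python
import struct


def bf16_bits_to_fp16_bits(bits: int) -> int:
    """Convert one BF16 bit pattern to the FP16 bit pattern of the same value."""
    # Shifting left by 16 and reinterpreting as float32 recovers the value exactly.
    (value,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    try:
        # The "e" format is IEEE half precision; packing rounds to nearest even.
        (half,) = struct.unpack("<H", struct.pack("<e", value))
    except OverflowError:
        # BF16 has a wider exponent range than FP16; map overflow to +/-inf.
        half = 0xFC00 if value < 0 else 0x7C00
    return half


if __name__ == "__main__":
    for bits in (0x3F80, 0xC000, 0x3F00):
        print(hex(bits), "->", hex(bf16_bits_to_fp16_bits(bits)))
```

Comparing bit patterns instead of float values keeps the check exact, which is what allows the recovered data to be called an exact match.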
Stage 3
Commands:
curl -L 'https://huggingface.co/Qwen/Qwen2.5-1.5B/resolve/main/model.safetensors' -o /root/dawg2026CTF/machine_learning/artifacts/qwen2.5-1.5b-base-model.safetensors
python3 /root/dawg2026CTF/machine_learning/artifacts/test_zip_streams.py
python3 -u /root/dawg2026CTF/machine_learning/artifacts/compare_model_stream.py
Results:
- The small ZIP members (config.json and generation_config.json) are compressed with ordinary raw-deflate and reproduce exactly with standard zlib compression levels.
- The model.safetensors stream in the challenge archive is an exact prefix of the raw-deflate stream obtained by:
  - taking the public Qwen2.5-1.5B model.safetensors,
  - replacing the safetensors header with the shorter F16 version, and
  - converting the full tensor payload from BF16 to FP16 before compressing with standard zlib deflate.
- The comparison matched every byte of the provided compressed model stream through offset 221838506; the challenge file simply stops there while the standard compressor would continue for another ~2.4 MB of compressed output.
- This means the published archive is not a custom finetuned model in the accessible region and is not using an unusual compression side channel there; it is a truncated standard ZIP of a stock FP16-converted Qwen model.
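The prefix test described above can be sketched as follows. The contents of test_zip_streams.py and compare_model_stream.py are not reproduced in these notes, so raw_deflate and matching_level are hypothetical helpers, and the in-memory comparison is a simplification that would need to be streamed in chunks for a file the size of the real model.

```python
import zlib


def raw_deflate(data: bytes, level: int = 6) -> bytes:
    """Compress data as a raw deflate stream (no zlib header), as stored in ZIP members."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


def matching_level(truncated: bytes, candidate: bytes, levels=range(1, 10)):
    """Return the first zlib level whose raw-deflate output starts with `truncated`."""
    for level in levels:
        if raw_deflate(candidate, level).startswith(truncated):
            return level
    return None


if __name__ == "__main__":
    plain = b"".join(str(i).encode() for i in range(5000))
    stream = raw_deflate(plain, 6)
    cut = stream[:len(stream) // 2]  # simulate a stream that stops mid-deflate
    print("matching level:", matching_level(cut, plain))
```

If a candidate plaintext reproduces the truncated stream byte for byte at a standard level, the archive was most likely produced by an ordinary compressor from that plaintext, which is the conclusion drawn here.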
Stage 4
Commands:
cd /root/dawg2026CTF/machine_learning/starting_files
7z x -y silly_fella.zip -oextracted_full 'merged_qwen_model/model.safetensors'
python3 -u /root/dawg2026CTF/machine_learning/artifacts/compare_tensors.py > /tmp/ml_compare_tensors.txt 2>&1
rm -rf /root/mlinfer && python3 -m venv /root/mlinfer
TMPDIR=/root/tmp /root/mlinfer/bin/pip install --upgrade pip
TMPDIR=/root/tmp /root/mlinfer/bin/pip install --index-url https://download.pytorch.org/whl/cpu torch
TMPDIR=/root/tmp /root/mlinfer/bin/pip install transformers safetensors
/root/mlinfer/bin/python /root/dawg2026CTF/machine_learning/artifacts/stream_prompt.py 'Finish this sentence exactly: The flag is a red and white striped' 8
Results:
- The corrected silly_fella.zip archive contained a full merged model rather than the earlier truncated ZIP prefix.
- Tensor comparison against base Qwen showed broad weight changes, confirming this was a genuine merged/finetuned model and not just a repack.
- Local CPU inference against the corrected model exposed the hidden answer path, and the final flag value was confirmed.
- Final flag: DawgCTF{Astr4l_Pr0j3ct_Th1s!}
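A per-tensor comparison of the kind mentioned above could look like the sketch below. It is not the compare_tensors.py script from the artifacts, whose contents are not shown here. It assumes both blobs are already in the same dtype (the public BF16 weights would need converting first), and parse_safetensors and changed_tensors are hypothetical names.

```python
import json
import struct


def parse_safetensors(blob: bytes):
    """Return (tensor_table, payload) for an in-memory safetensors blob."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    table = json.loads(blob[8:8 + header_len])
    table.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return table, blob[8 + header_len:]


def changed_tensors(base_blob: bytes, other_blob: bytes):
    """List the names of shared tensors whose raw bytes differ between two blobs."""
    base_table, base_data = parse_safetensors(base_blob)
    other_table, other_data = parse_safetensors(other_blob)
    changed = []
    for name in sorted(base_table.keys() & other_table.keys()):
        b0, b1 = base_table[name]["data_offsets"]
        o0, o1 = other_table[name]["data_offsets"]
        if base_data[b0:b1] != other_data[o0:o1]:
            changed.append(name)
    return changed
```

A repack of stock weights would return an empty list, whereas a merged or finetuned model is expected to report changes across many tensors.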
Artifacts Produced
- starting_files/silly_fella.zip
- starting_files/extracted_full/merged_qwen_model/model.safetensors
- starting_files/Machine Learnding/silly_fella_1.5B.zip
- starting_files/Machine Learnding/extracted/silly_fella_1.5B/config.json
- starting_files/Machine Learnding/extracted/silly_fella_1.5B/generation_config.json
- starting_files/Machine Learnding/extracted/silly_fella_1.5B/model.safetensors
- artifacts/zip_forensics.py
- artifacts/compare_headers.py
- artifacts/test_zip_streams.py
- artifacts/compare_model_stream.py
- artifacts/compare_tensors.py
- artifacts/run_model_prompts.py
- artifacts/run_single_prompt.py
- artifacts/stream_prompt.py
- artifacts/continue_assistant_prefix.py
- artifacts/qwen2.5-1.5b-base-model.safetensors
Flag
DawgCTF{Astr4l_Pr0j3ct_Th1s!}