
machine_learning

Challenge

Imported from local notes.md.

Solution

Original Notes


Challenge Summary

  • Given: A challenge description pointing to a public repo entry for Machine Learnding. The entry initially exposed a broken Google Drive model archive, which was later replaced locally with the correct silly_fella.zip asset.
  • Goal: Recover the flag hidden in the provided "LLM" artifact.
  • Constraints: The originally published artifact was incomplete as hosted; no endpoint or prompt interface was provided outside the model files.

Initial Recon / Triage

  • Observations:
    • The repo path for the challenge only contains gdrivelink.txt.
    • Downloading the linked Drive folder yields exactly one file: Machine Learnding/silly_fella_1.5B.zip.
    • The ZIP is missing its central directory and the model payload stream ends mid-deflate.
  • File identification:
    • config.json identifies the model family as Qwen/Qwen2.5-1.5B with model_type: qwen2.
    • generation_config.json is standard and contains no challenge-specific metadata.
    • model.safetensors has a valid safetensors header but only a truncated payload.
  • Entry points:
    • Recover what can be extracted from the ZIP.
    • Compare the recovered model bytes against the public base Qwen weights.
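Because the central directory is missing, tools that start from the end of the archive have nothing to read. Each member's 30-byte local file header still sits directly in front of its data, though. The sketch below walks the PK\x03\x04 signatures and raw-inflates as much of a member as survives. It uses the same header layout as the Stage 2 one-liner. The helper names scan_local_headers and inflate_prefix are hypothetical; this is not the contents of zip_forensics.py. A chance signature match inside compressed data can yield a bogus entry, so treat unexpected names with suspicion.

```python
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"      # ZIP local file header signature
LOCAL_FMT = "<IHHHHHIIIHH"      # fixed 30-byte local header, little-endian


def scan_local_headers(data):
    """Yield (offset, name, method, flags, data_start) for each local header signature found."""
    pos = data.find(LOCAL_SIG)
    while pos != -1 and pos + 30 <= len(data):
        (_sig, _ver, flags, method, _mtime, _mdate,
         _crc, _csize, _usize, nlen, elen) = struct.unpack(LOCAL_FMT, data[pos:pos + 30])
        name = data[pos + 30:pos + 30 + nlen].decode("utf-8", "replace")
        yield pos, name, method, flags, pos + 30 + nlen + elen
        pos = data.find(LOCAL_SIG, pos + 4)


def inflate_prefix(data, data_start):
    """Raw-inflate from data_start; return (recovered bytes, whether the deflate stream ended)."""
    d = zlib.decompressobj(-15)              # negative wbits: raw deflate, as stored in ZIP members
    out = d.decompress(data[data_start:])    # stops at end of stream; extra bytes go to d.unused_data
    return out, d.eof
```

Only members with method 8 (deflate) should be passed to inflate_prefix. A result of eof == False is the same "ends mid-deflate" symptom recorded in Stage 2.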

Hypotheses & Approach

  • Hypothesis 1: The flag was embedded directly in model metadata, prompt text, or other plaintext inside the archive.
  • Hypothesis 2: The model was a modified derivative of Qwen and the flag would be recoverable by comparing it to the public base model.
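Hypothesis 1 can be tested cheaply, before touching any weights, by scanning every file for flag-shaped strings. The following is a minimal sketch that assumes the flag follows the DawgCTF{...} format. The regex, the chunk size, and the helper name scan_file are illustrative choices. Overlapping reads ensure that a flag straddling a chunk boundary is still found in multi-gigabyte model files.

```python
import re
from pathlib import Path

# Assumed flag format; widen or change the pattern for other encodings.
FLAG_RE = re.compile(rb"DawgCTF\{[^}\s]{1,200}\}")


def scan_file(path, chunk_size=1 << 24, overlap=256):
    """Return (offset, flag) pairs found in a file, reading it in overlapping chunks."""
    hits = []
    with Path(path).open("rb") as fh:
        consumed, tail = 0, b""
        while chunk := fh.read(chunk_size):
            window = tail + chunk
            window_start = consumed - len(tail)
            for m in FLAG_RE.finditer(window):
                # Matches lying entirely inside the carried-over tail were reported last round.
                if m.end() > len(tail):
                    hits.append((window_start + m.start(), m.group().decode("ascii", "replace")))
            consumed += len(chunk)
            tail = window[-overlap:]   # default overlap exceeds the longest possible match (209 bytes)
    return hits
```

Running it over every file under the extracted directory (for example via Path.rglob("*")) covers config.json, generation_config.json, and the model payload in one pass. A flag that only appears at inference time, or that is encoded in the weights, will not show up this way. That case is what Hypothesis 2 addresses.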

Execution Steps (Reproducible)

Stage 1

Commands:

cd /root/dawg2026CTF/machine_learning/starting_files
python3 -m venv /tmp/ml-venv
/tmp/ml-venv/bin/pip install gdown
/tmp/ml-venv/bin/gdown --folder 'https://drive.google.com/drive/folders/18n42cn_8_LD9yECya0ohBlr1S8lE-1mw?usp=drive_link'

cd '/root/dawg2026CTF/machine_learning/starting_files/Machine Learnding'
7z l silly_fella_1.5B.zip
7z x -y silly_fella_1.5B.zip -oextracted

Results:

  • The Drive folder contains only silly_fella_1.5B.zip.
  • 7z can list the local ZIP entries, but extraction reports "Unexpected end of archive" and "Data Error : silly_fella_1.5B/model.safetensors".
  • Recoverable files:
    • extracted/silly_fella_1.5B/config.json
    • extracted/silly_fella_1.5B/generation_config.json
    • extracted/silly_fella_1.5B/model.safetensors
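The recovered model.safetensors is truncated, but its header is still readable in full. In the safetensors layout, an 8-byte little-endian header length and a JSON header come before any tensor data. The sketch below lists the tensor dtypes and checks for truncation. It compares the file size implied by the largest data_offsets end (offsets are relative to the start of the tensor data) against the bytes actually on disk. The helper names are hypothetical and not taken from compare_headers.py.

```python
import json
import struct
from collections import Counter
from pathlib import Path


def read_safetensors_header(path):
    """Parse the header: 8-byte little-endian length, then that many bytes of JSON."""
    with Path(path).open("rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len))
    return header, 8 + header_len   # tensor data starts right after the header


def summarize(path):
    """Dtype counts plus expected vs. actual file size for a possibly truncated file."""
    header, data_start = read_safetensors_header(path)
    meta = header.pop("__metadata__", {})
    dtypes = Counter(info["dtype"] for info in header.values())
    data_len = max((info["data_offsets"][1] for info in header.values()), default=0)
    expected, actual = data_start + data_len, Path(path).stat().st_size
    return {"metadata": meta, "dtypes": dict(dtypes), "tensors": len(header),
            "expected_size": expected, "actual_size": actual, "truncated": actual < expected}
```

The dtype counts also make the BF16 vs. F16 difference between the public and challenge headers easy to see.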

Stage 2

Commands:

cd '/root/dawg2026CTF/machine_learning/starting_files/Machine Learnding'
python3 -c "import struct, zlib; data=open('silly_fella_1.5B.zip','rb').read(); i=758; hdr=data[i:i+30]; sig,ver,flags,method,mtime,mdate,crc,csize,usize,nlen,elen=struct.unpack('<IHHHHHIIIHH', hdr); start=i+30+nlen+elen; comp=data[start:]; d=zlib.decompressobj(-15); out=d.decompress(comp); print(method, hex(flags), len(comp), len(out), d.eof)"

curl -L -r 0-288356779 'https://huggingface.co/Qwen/Qwen2.5-1.5B/resolve/main/model.safetensors' -o /tmp/qwen_head_fullcmp.bin
python3 - <<'EOF'
from pathlib import Path
import struct
import numpy as np

# Public BF16 prefix (downloaded above) vs. the recovered FP16 challenge file.
bp = Path('/tmp/qwen_head_fullcmp.bin')
cp = Path('/root/dawg2026CTF/machine_learning/starting_files/Machine Learnding/extracted/silly_fella_1.5B/model.safetensors')
with bp.open('rb') as fb, cp.open('rb') as fc:
    nb = struct.unpack('<Q', fb.read(8))[0]  # base header length
    nc = struct.unpack('<Q', fc.read(8))[0]  # challenge header length
    remain = (cp.stat().st_size - 8 - nc) // 2  # 16-bit values recovered after the header
    fb.seek(8 + nb)
    fc.seek(8 + nc)
    chunk, pos, bad = 10_000_000, 0, None
    while pos < remain:
        n = min(chunk, remain - pos)
        base = np.frombuffer(fb.read(n * 2), dtype=np.uint16)
        chall = np.frombuffer(fc.read(n * 2), dtype=np.uint16)
        # BF16 -> FP32 (bits moved into the high half) -> FP16, compared bit for bit
        conv = (base.astype(np.uint32) << 16).view(np.float32).astype(np.float16).view(np.uint16)
        neq = np.nonzero(conv != chall)[0]
        if len(neq):
            bad = pos + int(neq[0])
            print('mismatch_at', bad)
            break
        pos += n
print('result', 'match' if bad is None else 'mismatch')
EOF

Results:

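  • The model stream is truncated mid-deflate: decompression produces only the first 288356441 bytes, and zlib reports eof as False (the end of the deflate stream is never reached).
  • The expected size of the challenge model.safetensors is exactly the public Qwen file size minus the header delta from rewriting each "BF16" dtype string as the one-character-shorter "F16". The tensor payload itself keeps the same length, because both formats use 2 bytes per value.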
  • The entire recovered overlap of tensor data matches the public Qwen2.5-1.5B safetensors when the public model is converted from BF16 to FP16.
  • No challenge-specific strings, prompts, metadata, or modified weights were found in the published portion of the artifact.
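The comparison in the Stage 2 command relies on BF16 being the upper 16 bits of an IEEE float32. Shifting left by 16 gives the exact float32, which is then rounded to FP16. The sketch below is a pure-Python, per-element version of the same check, suited to spot-checking a handful of values. It is far slower than the NumPy version and assumes both buffers hold little-endian 16-bit values. Overflow is mapped to infinity to mirror NumPy's cast; NaN payloads are not guaranteed to match.

```python
import struct


def bf16_to_fp16_bits(bits):
    """Convert one BF16 bit pattern to the FP16 bit pattern of the same value (round to nearest even)."""
    # BF16 is the top half of an IEEE float32, so shifting left by 16 recovers the float32 exactly.
    value = struct.unpack("<f", struct.pack("<I", bits << 16))[0]
    try:
        return struct.unpack("<H", struct.pack("<e", value))[0]
    except OverflowError:
        # struct refuses values outside the FP16 range; NumPy's cast produces +/-inf instead.
        return 0xFC00 if value < 0 else 0x7C00


def first_mismatch(base_bf16, chall_fp16):
    """Index of the first 16-bit element where converted base != challenge, or None if all match."""
    n = min(len(base_bf16), len(chall_fp16)) // 2
    base = struct.unpack(f"<{n}H", base_bf16[:2 * n])
    chall = struct.unpack(f"<{n}H", chall_fp16[:2 * n])
    for i, (b, c) in enumerate(zip(base, chall)):
        if bf16_to_fp16_bits(b) != c:
            return i
    return None
```

For example, BF16 0x3F80 (1.0) maps to FP16 0x3C00, and BF16 0x47C3 (99840.0) overflows to 0x7C00.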

Stage 3

Commands:

curl -L 'https://huggingface.co/Qwen/Qwen2.5-1.5B/resolve/main/model.safetensors' -o /root/dawg2026CTF/machine_learning/artifacts/qwen2.5-1.5b-base-model.safetensors
python3 /root/dawg2026CTF/machine_learning/artifacts/test_zip_streams.py
python3 -u /root/dawg2026CTF/machine_learning/artifacts/compare_model_stream.py

Results:

  • The small ZIP members (config.json and generation_config.json) are compressed with ordinary raw-deflate and reproduce exactly with standard zlib compression levels.
  • The model.safetensors stream in the challenge archive is an exact prefix of the raw-deflate stream obtained by:
    • taking the public Qwen2.5-1.5B model.safetensors,
    • replacing the safetensors header with the shorter F16 version, and
    • converting the full tensor payload from BF16 to FP16 before compressing with standard zlib deflate.
  • The comparison matched every byte of the provided compressed model stream through offset 221838506; the challenge file simply stops there while the standard compressor would continue for another ~2.4 MB of compressed output.
  • In the accessible region, then, the published archive is neither a custom finetuned model nor an unusual compression side channel; it is a truncated standard ZIP of a stock FP16-converted Qwen model.
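The reproduction test boils down to two steps. First, recompress known plaintext as raw deflate (wbits = -15, the form ZIP stores). Second, measure how long a prefix the result shares with the bytes in the archive. The sketch below works under that assumption and is not the contents of test_zip_streams.py or compare_model_stream.py. matching_levels tries every zlib level against a small member such as config.json, and common_prefix_len reports where two streams first diverge. For the full model file, one would feed the compressor in chunks and compare incrementally instead of holding both streams in memory.

```python
import zlib


def raw_deflate(data, level, mem_level=8):
    """Raw deflate (no zlib header or trailer), the form stored inside ZIP members."""
    c = zlib.compressobj(level, zlib.DEFLATED, -15, mem_level)
    return c.compress(data) + c.flush()


def matching_levels(raw, stored):
    """Compression levels whose raw-deflate output reproduces a ZIP member byte for byte."""
    return [lvl for lvl in range(10) if raw_deflate(raw, lvl) == stored]


def common_prefix_len(a, b, block=1 << 20):
    """Length of the shared prefix of a and b, compared block by block."""
    n = min(len(a), len(b))
    pos = 0
    while pos < n:
        end = min(pos + block, n)
        if a[pos:end] != b[pos:end]:
            # Narrow down to the exact byte inside the differing block.
            for i in range(pos, end):
                if a[i] != b[i]:
                    return i
        pos = end
    return n
```

A result where common_prefix_len(stored, reproduced) == len(stored) < len(reproduced) is exactly the "exact prefix, then stops" pattern described above.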

Stage 4

Commands:

cd /root/dawg2026CTF/machine_learning/starting_files
7z x -y silly_fella.zip -oextracted_full 'merged_qwen_model/model.safetensors'

python3 -u /root/dawg2026CTF/machine_learning/artifacts/compare_tensors.py > /tmp/ml_compare_tensors.txt 2>&1

rm -rf /root/mlinfer && python3 -m venv /root/mlinfer
TMPDIR=/root/tmp /root/mlinfer/bin/pip install --upgrade pip
TMPDIR=/root/tmp /root/mlinfer/bin/pip install --index-url https://download.pytorch.org/whl/cpu torch
TMPDIR=/root/tmp /root/mlinfer/bin/pip install transformers safetensors

/root/mlinfer/bin/python /root/dawg2026CTF/machine_learning/artifacts/stream_prompt.py 'Finish this sentence exactly: The flag is a red and white striped' 8

Results:

  • The corrected silly_fella.zip archive contained a full merged model rather than the earlier truncated ZIP prefix.
  • Tensor comparison against base Qwen showed broad weight changes, confirming this was a genuine merged/finetuned model and not just a repack.
  • Local CPU inference with the corrected model, prompted as in the stream_prompt.py command above, surfaced the hidden answer, and the final flag value was confirmed.
  • Final flag: DawgCTF{Astr4l_Pr0j3ct_Th1s!}
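The tensor comparison can be summarized per tensor as a relative L2 change, ||tuned - base|| / ||base||. The following is a minimal sketch that assumes both models' tensors have already been loaded and flattened into Python float sequences keyed by tensor name. Loading and BF16/FP16 upcasting are left out, and this is not the contents of compare_tensors.py. A plain repack would show zero change, or only FP16 rounding noise, everywhere. A merged or finetuned model shows changes well above that tolerance across many tensors.

```python
import math


def tensor_change(base, tuned):
    """Relative L2 change ||tuned - base|| / ||base|| for two equal-length float sequences."""
    if len(base) != len(tuned):
        raise ValueError("shape mismatch")
    diff = math.sqrt(sum((t - b) ** 2 for b, t in zip(base, tuned)))
    norm = math.sqrt(sum(b * b for b in base))
    if norm == 0:
        return math.inf if diff else 0.0
    return diff / norm


def summarize_changes(base_tensors, tuned_tensors, tol=0.0):
    """Per-tensor relative change plus the fraction of shared tensors that changed beyond tol."""
    shared = sorted(base_tensors.keys() & tuned_tensors.keys())
    changes = {name: tensor_change(base_tensors[name], tuned_tensors[name]) for name in shared}
    changed = [name for name, c in changes.items() if c > tol]
    return {
        "changes": changes,
        "fraction_changed": len(changed) / len(shared) if shared else 0.0,
        "only_in_base": sorted(base_tensors.keys() - tuned_tensors.keys()),
        "only_in_tuned": sorted(tuned_tensors.keys() - base_tensors.keys()),
    }
```

When comparing BF16 base weights against an FP16 model, a small nonzero tol keeps pure rounding differences from being counted as changes.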

Artifacts Produced

  • starting_files/silly_fella.zip
  • starting_files/extracted_full/merged_qwen_model/model.safetensors
  • starting_files/Machine Learnding/silly_fella_1.5B.zip
  • starting_files/Machine Learnding/extracted/silly_fella_1.5B/config.json
  • starting_files/Machine Learnding/extracted/silly_fella_1.5B/generation_config.json
  • starting_files/Machine Learnding/extracted/silly_fella_1.5B/model.safetensors
  • artifacts/zip_forensics.py
  • artifacts/compare_headers.py
  • artifacts/test_zip_streams.py
  • artifacts/compare_model_stream.py
  • artifacts/compare_tensors.py
  • artifacts/run_model_prompts.py
  • artifacts/run_single_prompt.py
  • artifacts/stream_prompt.py
  • artifacts/continue_assistant_prefix.py
  • artifacts/qwen2.5-1.5b-base-model.safetensors

Flag

DawgCTF{Astr4l_Pr0j3ct_Th1s!}