
Respecting max_size on fragments but not on message #1602


Open

gmabey opened this issue Mar 7, 2025 · 7 comments · May be fixed by #1622

Comments


gmabey commented Mar 7, 2025

I don't think this is an asyncio-specific issue or question; I think it's above that level.
What I've got is a websocket receiving and consuming fragments from its peer, just as the docs indicate:

  async for frag in web_skt.recv_streaming():
    consume_fragment(frag)

What I don't see (please correct me if I'm missing this) is a way to limit the size of the fragments (so that no peer can use up all of my memory) without also putting a limit on the total message size.
That is, I can certainly set the max_size arg in websockets.asyncio.server.serve(), but when the sum total of the sizes of the fragments (that I have already consumed) reaches that limit, an exception is raised. Since it's a "streaming" mode, it seems like the total size of a message could be large.
Setting max_size=None certainly bypasses that issue, but then I have no way to limit the size of fragments before they (potentially) exhaust my memory.
Or, am I missing something?
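
For reference, here's a minimal sketch of the setup I'm describing, assuming the new asyncio implementation; the host, port, and 1 MiB limit are illustrative values:

from websockets.asyncio.server import serve

def consume_fragment(frag):
    ...  # user-defined processing of a single fragment

async def handler(web_skt):
    # max_size caps the whole message: once the fragments consumed so far add up
    # to more than the limit, the connection fails with close code 1009, even
    # though the memory for earlier fragments has already been released.
    async for frag in web_skt.recv_streaming():
        consume_fragment(frag)

async def main():
    async with serve(handler, "localhost", 8765, max_size=2**20) as server:
        await server.serve_forever()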

@aaugustin (Member)

max_size limits the sum of the size of all fragments belonging to the same message.

This is tested here:

def test_server_receives_fragmented_text_over_size_limit(self):
    server = Protocol(SERVER, max_size=3)
    # First frame: FIN=0, opcode=text, masked, 2-byte payload (the mask key is
    # all zeros), carrying the first 2 of the 4 UTF-8 bytes of "😀".
    server.receive_data(b"\x01\x82\x00\x00\x00\x00\xf0\x9f")
    self.assertFrameReceived(
        server,
        Frame(OP_TEXT, "😀".encode()[:2], fin=False),
    )
    # Continuation frame: FIN=1, 2 more payload bytes; the running total for
    # the message is now 4 bytes, which exceeds max_size=3.
    server.receive_data(b"\x80\x82\x00\x00\x00\x00\x98\x80")
    self.assertIsInstance(server.parser_exc, PayloadTooBig)
    self.assertEqual(
        str(server.parser_exc),
        "frame with 2 bytes after reading 2 bytes exceeds limit of 3 bytes",
    )
    self.assertConnectionFailing(
        server,
        CloseCode.MESSAGE_TOO_BIG,
        "frame with 2 bytes after reading 2 bytes exceeds limit of 3 bytes",
    )

aaugustin (Member) commented Mar 8, 2025

There's no built-in way to limit the size of each individual fragment. I'm not convinced that this would be very useful in general. If you want it, you can simply do:

async for frag in web_skt.recv_streaming():
    if len(frag) > 100_000:
        raise ValueError("fragment too large")
    consume_fragment(frag)
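
If you'd rather reject the peer at the protocol level instead of raising inside the handler, something along these lines should also work (1009 is the "message too big" close code; the 100 kB threshold is arbitrary):

async for frag in web_skt.recv_streaming():
    if len(frag) > 100_000:
        await web_skt.close(code=1009, reason="fragment too large")
        break
    consume_fragment(frag)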

@aaugustin (Member)

I'm skeptical because RFC 6455 makes it clear that any intermediary can split or reassemble fragments in any way they want. Enforcing limits at that level seems sketchy.

aaugustin (Member) commented Apr 21, 2025

Here's a more structured recap of the situation.

Legacy implementation

Since version 3.2 (exactly 9 years ago!) websockets has had max_size and max_queue parameters to limit the size of the buffer of parsed messages. It provides two parameters because some servers receive very small messages at a high rate and benefit from a larger buffer (e.g. crypto trackers), while others receive larger messages at a slower rate; I've talked to someone building a remote desktop over websockets, where each message contained a video frame!

The legacy implementation kept a buffer of messages, after reassembling fragmented messages. The maximum size of the buffer was literally 4 * max_size * max_queue. The factor of 4 is there because each UTF-8 encoded byte can expand into a character that Python stores on 4 bytes in memory, and the limit is checked against the length of the UTF-8 encoded message: we would have to decode it to know its length in characters, and we cannot process data before applying security limits.
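
To illustrate the worst case behind that factor of 4 (a rough sketch; exact sizes vary across Python versions): a single astral character forces CPython to store every character of the string on 4 bytes, so a mostly-ASCII message can occupy roughly 4x its UTF-8 length in memory:

import sys

ascii_only = "a" * 1000
mixed = "😀" + "a" * 999  # one astral character switches the whole string to 4 bytes per character

print(len(ascii_only.encode()), sys.getsizeof(ascii_only))  # 1000 UTF-8 bytes, roughly 1 KB in memory
print(len(mixed.encode()), sys.getsizeof(mixed))            # 1003 UTF-8 bytes, roughly 4 KB in memory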

The legacy implementation didn't let you access frames with recv_streaming — this was one of the drivers for rewriting it — nor did it provide any control at this level.

Current implementations

The new implementations kept the logic of limiting the size of frames with max_size. When fragmentation is enabled, they keep track of how much room remains in the message and check that incoming fragments don't exceed that limit. This behavior is necessary because fragmentation is uncommon and most users aren't aware that it exists. I don't even expect most users to take five minutes to think about the messages they expect to receive and configure the limit properly...

However, they changed the logic of limiting the size of the buffer with max_queue: indeed, in order to support recv_streaming, the buffer is now a buffer of frames instead of a buffer of messages. This change leaked into the public API. As a consequence (a worked example follows the list):

  • without fragmentation, the limit is still max_size * max_queue
  • with fragmentation, the limit is effectively lower — it's roughly max_size * max_queue / fragments_per_message
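
For a concrete illustration with made-up numbers: with max_size = 1 MiB and max_queue = 16, an unfragmented stream can buffer up to 16 messages of 1 MiB each, i.e. 16 MiB in total; if a peer sends each message as 16 fragments instead, those same 16 queue slots hold a single message whose fragments add up to at most 1 MiB.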

Challenge

We have a conflict here between two goals:

  1. Keeping the semantics of max_size simple to increase the chances that many users configure a reasonable value.
  2. Controlling memory usage accurately — the knobs don't match the actual memory usage anymore.

Changing max_size to be the size of a fragment would break the first goal because users could receive an arbitrarily large message in recv() if an attacker sent a large message in small fragments. That would be unacceptable.

The current implementation fails the second goal in the case of messages with many small fragments, as described in this issue. When I wrote it, I treated fragmentation as a transport-level concern. I considered that the receiver should always be able to handle the full message.

Upon further thought, this seems incorrect. For example, if you want to transfer a large file in a WebSocket message (why not?) you can send it chunk-by-chunk at one end and write it to disk chunk-by-chunk at the other end. That should work even if it's a 10GB file and you don't want to hold it in memory.
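
For instance, a rough sketch of that pattern on the receiving side (the path is illustrative, and the blocking write is kept deliberately simple; a real handler would offload file I/O to a thread):

async def save_message_to_disk(web_skt, path="upload.bin"):
    # Write each fragment as it arrives so the full message never sits in memory.
    with open(path, "wb") as f:
        async for frag in web_skt.recv_streaming():
            f.write(frag if isinstance(frag, bytes) else frag.encode())

On the sending side, send() already accepts an iterable of chunks and transmits them as fragments of a single message.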

Potential solution

I'm considering supporting max_size=(max_message_size, max_fragment_size). This would provide more control for those who want it without increasing the API surface. TBD if this can be implemented with reasonable complexity vs. the benefits.
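
If #1622 lands in that form, configuration would look something like this (the 10 MiB message limit and 64 KiB fragment limit are illustrative values):

async with serve(handler, "localhost", 8765,
                 max_size=(10 * 1024 * 1024, 64 * 1024)):  # (max message size, max fragment size)
    ...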

@aaugustin (Member)

Hello, would you be able to test if #1622 does what you want?

gmabey (Author) commented Apr 23, 2025

I was initially unhappy with the concept of overloading max_size with a tuple/iterable type, but I have to admit your rationale is very good; this use case is infrequent enough that a more arcane form is warranted.
I hope to test #1622 later this week.

@aaugustin (Member)

Yes, I know, I've been hesitating between overloading and adding a new argument... I went for overloading because:

  • there are too many arguments already...
  • having max_size + max_fragment_size would look weird and I didn't want to rename max_size to max_message_size.
