Lines Matching +full:max +full:- +full:frame +full:- +full:size
1 .. SPDX-License-Identifier: GPL-2.0
22 - Ulisses Alonso Camaró <uaca@i.hate.spam.alumni.uv.es>
23 - Johann Baudy
33 On the other hand PACKET_MMAP is very efficient. PACKET_MMAP provides a size
67 [setup] socket() -------> creation of the capture socket
68 setsockopt() ---> allocation of the circular buffer (ring)
70 mmap() ---------> mapping of the allocated buffer to the
73 [capture] poll() ---------> to wait for incoming packets
75 [shutdown] close() --------> destruction of the capture socket and
88 supported and a link level pseudo-header is provided
107 [setup] socket() -------> creation of the transmission socket
108 setsockopt() ---> allocation of the circular buffer (ring)
110 bind() ---------> bind transmission socket with a network interface
111 mmap() ---------> mapping of the allocated buffer to the
114 [transmission] poll() ---------> wait for free packets (optional)
115 send() ---------> send all packets that are set as ready in
120 [shutdown] close() --------> destruction of the transmission socket and
134 know the header size of frames used in the circular buffer.
136 As in capture, each frame contains two parts::
138 --------------------
140 | | of this frame
141 |--------------------|
145 --------------------
159 ioctl(this->socket, SIOCGIFINDEX, &s_ifr);
167 bind(this->socket, (struct sockaddr *)&my_addr, sizeof(struct sockaddr_ll));
174 frame base + TPACKET_HDRLEN - sizeof(struct sockaddr_ll)
179 frame base + TPACKET_ALIGN(sizeof(struct tpacket_hdr))
182 the frame (for payload alignment with SOCK_RAW mode for instance) you
192 - Capture process::
196 - Transmission process::
205 unsigned int tp_block_size; /* Minimal size of contiguous block */
207 unsigned int tp_frame_size; /* Size of frame */
214 related meta-information like timestamps without requiring a system call.
236 +---------+---------+ +---------+---------+
237 | frame 1 | frame 2 | | frame 3 | frame 4 |
238 +---------+---------+ +---------+---------+
241 +---------+---------+ +---------+---------+
242 | frame 5 | frame 6 | | frame 7 | frame 8 |
243 +---------+---------+ +---------+---------+
245 A frame can be of any size, with the only condition that it fits in a block. A block
246 can only hold an integer number of frames, or in other words, a frame cannot
258 Block size limit
259 ----------------
266 order=2 ==> 16384 bytes, etc. The maximum size of a
286 ------------------
292 called pg_vec, its size limits the number of blocks that can be allocated::
294 +---+---+---+---+
296 +---+---+---+---+
305 a pool of pre-determined sizes. This pool of memory is maintained by the slab
310 predetermined sizes that kmalloc uses can be checked in the "size-<bytes>"
318 PACKET_MMAP buffer size calculator
324 <size-max> is the maximum size allocable with kmalloc
326 <pointer size> depends on the architecture -- ``sizeof(void *)``
327 <page size> depends on the architecture -- PAGE_SIZE or getpagesize (2)
328 <max-order> is the value defined with MAX_PAGE_ORDER
329 <frame size> is an upper bound on the frame's capture size (more on this later)
334 <block number> = <size-max>/<pointer size>
335 <block size> = <page size> << <max-order>
337 so, the max buffer size is::
339 <block number> * <block size>
343 <block number> * <block size> / <frame size>
348 <size-max> = 131072 bytes
349 <pointer size> = 4 bytes
351 <max-order> = 11
353 and a value for <frame size> of 2048 bytes. These parameters will yield::
356 <block size> = 4096 << 11 = 8 MiB.
358 and hence the buffer will have a 262144 MiB size. So it can hold
361 Actually, this buffer size is not possible with an i386 architecture.
363 an i386 kernel's memory size is limited to 1GiB.
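The calculator formulas above can be reproduced with a few lines of C. This is a minimal sketch, not kernel code: the helper names max_block_number(), max_block_size() and max_frame_count() and the use of 64-bit integers are assumptions made for illustration, and the inputs are the example values quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: <block number> = <size-max> / <pointer size>. */
static uint64_t max_block_number(uint64_t size_max, uint64_t pointer_size)
{
    assert(pointer_size != 0);
    return size_max / pointer_size;
}

/* Hypothetical helper: <block size> = <page size> << <max-order>. */
static uint64_t max_block_size(uint64_t page_size, unsigned int max_order)
{
    return page_size << max_order;
}

/*
 * Hypothetical helper: number of frames the largest buffer can hold,
 * <block number> * <block size> / <frame size>.
 */
static uint64_t max_frame_count(uint64_t block_number, uint64_t block_size,
                                uint64_t frame_size)
{
    assert(frame_size != 0);
    return block_number * block_size / frame_size;
}
```

With the example values, max_block_number(131072, 4) gives 32768 blocks and max_block_size(4096, 11) gives 8388608 bytes (8 MiB); their product is the 262144 MiB figure, and dividing by a 2048-byte frame size gives 134217728 frames.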
371 -----------------
373 If you check the source code you will see that what I draw here as a frame
374 is not only the link level frame. At the beginning of each frame there is a
375 header called struct tpacket_hdr, used in PACKET_MMAP to hold the link level frame's
376 meta information, such as the timestamp. So what we draw here as a frame is really
380 Frame structure:
382 - Start. Frame must be aligned to TPACKET_ALIGNMENT=16
383 - struct tpacket_hdr
384 - pad to TPACKET_ALIGNMENT=16
385 - struct sockaddr_ll
386 - Gap, chosen so that packet data (Start+tp_net) aligns to
388 - Start+tp_mac: [ Optional MAC header ]
389 - Start+tp_net: Packet data, aligned to TPACKET_ALIGNMENT=16.
390 - Pad to align to TPACKET_ALIGNMENT=16
395 - tp_block_size must be a multiple of PAGE_SIZE (1)
396 - tp_frame_size must be greater than TPACKET_HDRLEN (obvious)
397 - tp_frame_size must be a multiple of TPACKET_ALIGNMENT
398 - tp_frame_nr must be exactly frames_per_block*tp_block_nr
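The four checks listed above can be mirrored in user space before calling setsockopt(). The following is only a sketch under stated assumptions: check_ring_request() and SKETCH_TPACKET_ALIGNMENT are hypothetical names, and the page size and header length are passed in as parameters so that the example does not depend on any particular kernel header. In real code they would come from getpagesize(2) and TPACKET_HDRLEN.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors TPACKET_ALIGNMENT=16 from the text; the name is hypothetical. */
#define SKETCH_TPACKET_ALIGNMENT 16u

/*
 * Hypothetical pre-check of a ring request. Returns 0 when all four
 * conditions hold, otherwise the number (1 to 4) of the first one that fails.
 */
static int check_ring_request(uint32_t block_size, uint32_t block_nr,
                              uint32_t frame_size, uint32_t frame_nr,
                              uint32_t page_size, uint32_t hdrlen)
{
    uint32_t frames_per_block;

    assert(page_size != 0);

    /* 1: tp_block_size must be a multiple of PAGE_SIZE */
    if (block_size == 0 || block_size % page_size != 0)
        return 1;
    /* 2: tp_frame_size must be greater than TPACKET_HDRLEN */
    if (frame_size <= hdrlen)
        return 2;
    /* 3: tp_frame_size must be a multiple of TPACKET_ALIGNMENT */
    if (frame_size % SKETCH_TPACKET_ALIGNMENT != 0)
        return 3;
    /* 4: tp_frame_nr must be exactly frames_per_block * tp_block_nr */
    frames_per_block = block_size / frame_size;
    if (frames_per_block == 0 ||
        (uint64_t)frames_per_block * block_nr != frame_nr)
        return 4;
    return 0;
}
```

Returning the index of the failing condition, rather than a plain boolean, is a design choice of this sketch; it makes it easier to tell which field of the request needs adjusting.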
404 ---------------------------------------------
411 mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
416 the frames. This is because a frame cannot span two
426 rx_ring = mmap(0, size * 2, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
427 tx_ring = rx_ring + size;
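Because a frame never crosses a block boundary, locating a fixed-size frame inside a mapped ring means skipping whole blocks first and then whole frames within the block. The sketch below assumes the fixed frame size layout described for tp_block_size and tp_frame_size; frame_offset() is a hypothetical helper, not part of any kernel or libc API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: byte offset of frame `index` from the start of a
 * mapped ring. Any space left at the end of a block that is too small for
 * another frame is skipped, since frames do not span blocks.
 */
static size_t frame_offset(size_t block_size, size_t frame_size, size_t index)
{
    size_t frames_per_block, block, slot;

    assert(frame_size != 0 && frame_size <= block_size);

    frames_per_block = block_size / frame_size;
    block = index / frames_per_block;   /* which block holds the frame */
    slot = index % frames_per_block;    /* position inside that block  */
    return block * block_size + slot * frame_size;
}
```

For a ring mapped at rx_ring, frame i would then start at rx_ring + frame_offset(block_size, frame_size, i); the same arithmetic applies relative to tx_ring.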
432 At the beginning of each frame there is a status field (see
433 struct tpacket_hdr). If this field is 0, the frame is ready
434 to be used by the kernel; if not, there is a frame the user can read
448 TP_STATUS_COPY This flag indicates that the frame (and associated
486 can use that frame buffer again.
508 #define TP_STATUS_AVAILABLE 0 // Frame is available
509 #define TP_STATUS_SEND_REQUEST 1 // Frame will be sent on next send()
510 #define TP_STATUS_SENDING 2 // Frame is currently in transmission
511 #define TP_STATUS_WRONG_FORMAT 4 // Frame format is not correct
514 packet, the user fills a data buffer of an available frame, sets tp_len to
515 current data buffer size and sets its status field to TP_STATUS_SEND_REQUEST.
525 header->tp_len = in_i_size;
526 header->tp_status = TP_STATUS_SEND_REQUEST;
527 retval = send(this->socket, NULL, 0, 0);
553 - Default if not otherwise specified by setsockopt(2)
554 - RX_RING, TX_RING available
556 TPACKET_V1 --> TPACKET_V2:
557 - Made 64 bit clean due to unsigned long usage in TPACKET_V1
560 - Timestamp resolution in nanoseconds instead of microseconds
561 - RX_RING, TX_RING available
562 - VLAN metadata information available for packets
566 - TP_STATUS_VLAN_VALID bit being set into the tp_status field indicates
568 - TP_STATUS_VLAN_TPID_VALID bit being set into the tp_status field
571 - How to switch to TPACKET_V2:
580 TPACKET_V2 --> TPACKET_V3:
581 - Flexible buffer implementation for RX_RING:
582 1. Blocks can be configured with non-static frame-size
583 2. Read/poll is at a block-level (as opposed to packet-level)
584 3. Added poll timeout to avoid indefinite user-space wait
586 4. Added user-configurable knobs:
591 - RX Hash data available in user space
592 - TX_RING semantics are conceptually similar to TPACKET_V2;
597 Packets with non-zero values of tp_next_offset will be dropped.
607 - PACKET_FANOUT_HASH: schedule to socket by skb's packet hash
608 - PACKET_FANOUT_LB: schedule to socket by round-robin
609 - PACKET_FANOUT_CPU: schedule to socket by the CPU the packet arrives on
610 - PACKET_FANOUT_RND: schedule to socket by random selection
611 - PACKET_FANOUT_ROLLOVER: if one socket is full, rollover to another
612 - PACKET_FANOUT_QM: schedule to socket by skb's recorded queue_mapping
692 while (limit-- > 0) {
740 case -1:
758 AF_PACKET's TPACKET_V3 ring buffer can be configured to use non-static frame
764 * ~15% - 20% reduction in CPU-usage
768 * Non static frame size to capture entire packet payload
773 it with gcc -Wall -O2 blob.c, and try things like "./a.out eth0", etc.)::
775 /* Written from scratch, but kernel-to-user space API usage
845 memset(&ring->req, 0, sizeof(ring->req));
846 ring->req.tp_block_size = blocksiz;
847 ring->req.tp_frame_size = framesiz;
848 ring->req.tp_block_nr = blocknum;
849 ring->req.tp_frame_nr = (blocksiz * blocknum) / framesiz;
850 ring->req.tp_retire_blk_tov = 60;
851 ring->req.tp_feature_req_word = TP_FT_REQ_FILL_RXHASH;
853 err = setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &ring->req,
854 sizeof(ring->req));
860 ring->map = mmap(NULL, ring->req.tp_block_size * ring->req.tp_block_nr,
862 if (ring->map == MAP_FAILED) {
867 ring->rd = malloc(ring->req.tp_block_nr * sizeof(*ring->rd));
868 assert(ring->rd);
869 for (i = 0; i < ring->req.tp_block_nr; ++i) {
870 ring->rd[i].iov_base = ring->map + (i * ring->req.tp_block_size);
871 ring->rd[i].iov_len = ring->req.tp_block_size;
893 struct ethhdr *eth = (struct ethhdr *) ((uint8_t *) ppd + ppd->tp_mac);
896 if (eth->h_proto == htons(ETH_P_IP)) {
902 ss.sin_addr.s_addr = ip->saddr;
908 sd.sin_addr.s_addr = ip->daddr;
912 printf("%s -> %s, ", sbuff, dbuff);
915 printf("rxhash: 0x%x\n", ppd->hv1.tp_rxhash);
920 int num_pkts = pbd->h1.num_pkts, i;
925 pbd->h1.offset_to_first_pkt);
927 bytes += ppd->tp_snaplen;
931 ppd->tp_next_offset);
940 pbd->h1.block_status = TP_STATUS_KERNEL;
945 munmap(ring->map, ring->req.tp_block_size * ring->req.tp_block_nr);
946 free(ring->rd);
968 fd = setup_socket(&ring, argp[argc - 1]);
979 if ((pbd->h1.block_status & TP_STATUS_USER) == 0) {
980 poll(&pfd, 1, -1);
1015 This has the side effect that packets sent through PF_PACKET will bypass the
1057 frames to be updated resp. the frame handed over to the application, iv) walk
1063 in a first step to see if the frame belongs to the application, and then
1078 - Packet sockets work well together with Linux socket filters, thus you also